Accepted Manuscript Instruments for evaluating pharmacotherapy intervention efficacy in violent and aggressive behavior and conduct disorder in youth
Jessica L. Hambly, Sohil Khan, Brett McDermott, William Bor, Alison Haywood
PII: S1359-1789(16)30159-8
DOI: 10.1016/j.avb.2017.04.004
Reference: AVB 1106
To appear in:
Aggression and Violent Behavior
Received date: 15 October 2016
Revised date: 2 March 2017
Accepted date: 13 April 2017
Please cite this article as: Jessica L. Hambly, Sohil Khan, Brett McDermott, William Bor, Alison Haywood, Instruments for evaluating pharmacotherapy intervention efficacy in violent and aggressive behavior and conduct disorder in youth. Avb (2017), doi: 10.1016/j.avb.2017.04.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Abbreviated Title: Instruments measuring pharmacotherapy efficacy in violent and aggressive youth
Authors:
Jessica L Hambly (a), Sohil Khan (a,b), Brett McDermott (c), William Bor (b,d), Alison Haywood (a,b)
(a) School of Pharmacy, Menzies Health Institute Queensland, Griffith University, Gold Coast, QLD, 4222, Australia
(b) Mater Research Institute - The University of Queensland, Brisbane, QLD, Australia
(c) School of Medicine and Dentistry, James Cook University, Townsville, Australia
(d) Mater Child and Youth Mental Health Service, Mater Health Service, South Brisbane, QLD, Australia
Author emails:
Ms. Jessica Hambly – [email protected]
Dr. Sohil Khan – [email protected]
Prof. Brett McDermott – [email protected]
Dr. William Bor – [email protected]
Dr. Alison Haywood – [email protected]
Corresponding author:
Jessica Hambly
School of Pharmacy, Gold Coast Campus
Griffith University QLD 4222
Australia
Tel: + 61 431985230; Fax: + 61 755528804; Email: [email protected]
Abstract
There is a need to identify the most appropriate standardized instruments for research evaluating pharmacotherapy for youth with violent and aggressive behaviors. Youth violence and aggression are heterogeneous behaviors which differ depending on age and gender. Instruments used in randomized controlled trials evaluating efficacy of pharmacotherapy in conduct disorder and its comorbidities were reviewed for psychometric, administrative and practicality evidence. Evidence was rated on a 3-point scale, adapted from the Scientific Advisory Committee’s Instrument Review Criteria.
Of the nine included instruments, the Nisonger Child Behavior Rating Form (NCBRF), Conners’ 3rd Edition, and Behavior Problems Inventory (BPI-01) were rated the highest for their psychometric properties. The Children’s Aggression Scale (CAS), Aberrant Behavior Checklist (ABC) and Disruptive Behavior Disorder Rating Scale (DBDRS) were rated moderate, and the Child Behavior Checklist (CBCL), Modified Overt Aggression Scale (MOAS) and Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) were rated lowest. The NCBRF, BPI-01 and CAS were the only instruments that could be used to measure both frequency and severity of aggressive behaviors. The CAS and MOAS featured the most items pertaining to violence and aggression.
The broad-band scales, the NCBRF and Conners’ 3rd Edition, rated highest for their psychometric properties; however, their usefulness in youth violence and aggression research is limited. The heterogeneity of aggressive and violent behaviors, age, gender, functional level, situational context and the type of informant should be taken into account when considering an appropriate instrument. All items in the CAS and the MOAS can be used to measure violent and/or aggressive behaviors. Further research into the psychometric properties of the MOAS in violent and aggressive youth is required before its use can be recommended. The CAS was found to be the most psychometrically sound and useful instrument that exclusively measures aggressive behaviors in youth.
Keywords: instruments, outcome measures, aggression, violence, youth, efficacy
1.1 BACKGROUND
Youth, those less than 25 years old, comprise approximately 44% of the world’s population (Sawyer et al., 2000b). Half of all cases of diagnosed psychological disorders develop before age 14, emphasising the importance of youth in relation to mental health conditions and disorders (Erskine et al., 2015). Recent reviews of 21st century trends suggest declines in youth violent and aggressive behavior across developed nations (Bor, Dean, Najman, & Hayatbakhsh, 2014; Collishaw, 2015). Despite the declining or stabilizing levels of youth violent and aggressive behaviors, the moderate to high levels of the associated problems result in considerable costs to the affected individuals, their families, and the community (Bor et al., 2014). In the United States, physical violence in school-aged children was reported at 24.7% in 2013 (Frieden, Jaffe, Cono, Richards, & Iademarco, 2014). In China, by contrast, a large cross-sectional sample of urban adolescents revealed self-reported mean rates of 13.38% for physical aggression and 12.95% for verbal aggression in 2012 (Zhang et al., 2012). Further, an Australian population study found that 8.8% of adolescents aged 11 to 17 years reported conduct problems (Lawrence et al., 2015), and in the United Kingdom, 6.6% of adolescents aged 11 to 16 received a clinical diagnosis of conduct disorder (CD) (Etchells, Gage, Rutherford, & Munafo, 2016).
Aggression may take many forms (e.g., physical and verbal) and have many functions (e.g., impulsive and instrumental) (Farmer et al., 2016). Aggressive behavior can be defined as an act directed towards a specific person, object or animal with the intent to hurt or frighten. Children more often present with overt than covert aggressive behaviors, and onset occurs more commonly in childhood than in adolescence (Reebye & Moretti, 2005). Violence can be defined as the intentional use of threatened or actual physical force or power against oneself or another person, which results in, or has a high likelihood of causing, harm (World Health Organisation (WHO), 1991). As a behavioral construct, aggression is one of the core characteristics displayed for a diagnosis of CD in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) (American Psychiatric Association (APA), 2013). CD is often considered a diagnostic challenge in child and adolescent mental health, due to heterogeneity within the classification and the high prevalence of comorbidities (Klahr & Burt, 2014). Youth aggression is a multifaceted phenomenon that is displayed differently depending on age and gender (Halperin, McKay, & Newcorn, 2002). Several points need to be considered before identifying aggressive behavior as a disorder, such as the differentiation between prosocial assertive play and behavior with the intent to hurt or frighten. Aggressive symptoms may change with developmental competence in motor and cognitive domains. For example, preschool-aged children may display instrumental and physical expressions of aggression, whereas school-age children may exhibit hostile aggression through name-calling, criticizing and ridiculing (Reebye, 2005). In the context of intervention monitoring and evaluation studies, there is a requirement to distinguish between normal and abnormal levels of violence and aggression, and the nature and quality of the behaviors that fall outside of typical developmental expectations (Halperin, McKay, Grayson, & Newcorn, 2003).
Behavior rating scales are instruments used to complete a relatively quick, normative-based assessment of child behavior. They can be used to measure clinical outcomes such as the efficacy and safety of an intervention, and/or patient related factors that may influence the success of intervention (e.g., adherence, social environment, quality of life) (Hunsley & Mash, 2010). In clinical research, practitioner assessments are limited by lack of inter-rater and test-retest reliability, difficulties standardizing clinical knowledge and experience, as well as cost and time constraints (Slade, Thornicroft, & Glover, 1999). For the purpose of outcome measurement in clinical research, assessment tools or instruments are often employed. They are often formal, structured instruments that demonstrate psychometric properties such as reliability and validity (Slade et al., 1999). Use of a statistically reliable instrument, appropriately selected based on its applicability to the population being studied and the types of questions being asked, is essential for effective research (Suris et al., 2004). There is a need to identify appropriate instruments for use in violent and aggressive youth, particularly as there are reports of unstandardized instruments with unclear psychometric properties being used in contemporary research (Elson, Mohseni, Breuer, Scharkow, & Quandt, 2014).
Multiple instruments are available for the diagnosis and assessment of CD in youth, including those that assess CD and other behavioral symptoms and comorbidities (i.e., broad-band), and those that are violence and aggression specific (i.e., narrow-band) (Farmer et al., 2016; Hersen, 2006). However, in clinical research it is useful if the instrument measuring aggression and violence can assess the frequency and/or severity of problem behaviors (Halperin et al., 2002). Although pharmacotherapy is seldom first-line in the management of violent and aggressive behaviors in youth, numerous classes of medications have been documented, the large majority supported only by anecdotal evidence and exploratory investigation (Hambly, Khan, McDermott, Bor, & Haywood, 2016). A review investigating the role of pharmacotherapy in CD revealed that no particular instrument was a ‘gold-standard’ measure of outcome to evaluate violent and aggressive behaviors in youth receiving pharmacological intervention (Hambly et al., 2016). It was therefore evident that there is a need to identify the most appropriate standardized instruments for pharmacotherapy intervention efficacy research involving youth with violent and aggressive behaviors.
1.2 METHODS
1.2.1 Search strategy
A review evaluating the psychometric, administrative and practicality evidence of instruments in pharmacotherapeutic efficacy studies in youth with violence and aggression was conducted. Firstly, instruments utilized to evaluate efficacy of intervention in randomised controlled trials (RCTs) of pharmacotherapy in the management of CD and its related comorbidities in youth were included based on the findings from the most recently published systematic review (Hambly et al., 2016). The studies were published in English between January 2000 and January 2016, the date of the updated search. Secondly, the name of the instrument and the key words ‘validity’, ‘reliability’, ‘children’, ‘adolescents’, and ‘youth’ were used to search for literature in MEDLINE, OVID SP and EMBASE. Literature for each tool was investigated for evidence of, or instrument specific information pertaining to, the criteria listed in summary Table 1. Additional literature was identified through searching reference lists of papers identified after initial screening. For the analysis of validity and reliability, the search focused on the use of the instruments in youth populations who displayed violent and/or aggressive behavior. The titles and abstracts of all articles identified were screened independently by two review authors (JH and SK). If conflicting opinions arose during screening, a third author (AH) determined if inclusion in the study was appropriate.
1.2.2 Scoring
1.2.2.1 Psychometric and administrative evaluation
Scoring criteria for psychometric and administrative properties for each instrument are outlined below. Retrieved information was graded on a 3-point scale adapted from the Scientific Advisory Committee’s Instrument Review Criteria (Lohr et al., 1996) by Andresen (2000). Generally, an instrument displaying high quality evidence for the evaluated characteristic was awarded an ‘A’ grade. A ‘B’ grade was given if it displayed moderate quality, a ‘C’ grade if it had poor quality, and an unknown (‘U’) grading if no information was available. To rank the instruments, a numeric score was assigned to the alphabetical grades as follows: U=0, A=1, B=2, C=3.
1.2.2.2 Reliability
Reliability refers to the degree to which an instrument consistently produces the same result at different times, from independent observers, and/or different samples (Wasserman & Bracken, 2003). There are three general classes of reliability used when evaluating instruments:
Inter-Rater/Observer Reliability - assesses the degree to which independent interviewers give consistent estimates using identical instruments (Roberts & Priest, 2006);
Test-Retest Reliability - assesses the consistency of results from two separate administrations of an instrument to the same participants (Roberts & Priest, 2006);
Internal Consistency Reliability - assesses the degree to which items in a given instrument, domain, or sub-test are grouped correctly. This is often measured by Cronbach’s coefficient alpha (Cicchetti, 1994; Cronbach, 1951).
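As an illustration of the internal consistency statistic named above, the following is a minimal sketch of Cronbach’s coefficient alpha using its standard formula, alpha = (k / (k - 1)) × (1 - sum of item variances / variance of total scores). The function name and the example ratings are hypothetical and are not drawn from any instrument reviewed here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both the items and the totals.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's summed score
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: three respondents, two items
ratings = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(ratings)
```

With these made-up ratings the item variances are both 1 and the total-score variance is 3, so alpha works out to 2/3, which would fall below the 0.70 threshold used in this review.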
For the purpose of inter-rater and test-retest reliability scoring, an intraclass correlation coefficient (ICC) or kappa coefficient (κ) ≥0.80 was considered excellent (graded ‘A’), <0.60 poor (graded ‘C’), and values between these ranges (≥0.60, <0.80) moderate (graded ‘B’) (McHugh, 2012). For internal consistency of scales, a Cronbach’s alpha (α) or Kuder-Richardson Formula 20 (KR-20) ≥0.80 scored excellent (‘A’), ≥0.70 and <0.80 adequate (‘B’), and <0.70 low or inadequate (‘C’) (Andresen, 2000; Endicott, Tracy, Burt, Olson, & Coccaro, 2002).
1.2.2.3 Validity
An instrument is considered robust in validity if there is strong evidence and theory that support the appropriateness, meaningfulness, and usefulness of test results and interpretation (Colliver, Conlee, & Verhulst, 2012). If the multitrait-multimethod matrix (MTMM) and/or correlation coefficient were used for the assessment of convergent validity, the following scores were allocated: ‘A’ = strong correlation (≥0.60), ‘B’ = moderate correlation (≥0.30 and <0.60), and ‘C’ = weak correlation (<0.30) (Campbell & Fiske, 1959). The opposite scale for scoring was applied for discriminant validity: ‘A’ = ≤0.30, ‘B’ = >0.30 and ≤0.60, and ‘C’ = >0.60 (Campbell & Fiske, 1959). For factorial validity, a root mean square error of approximation (RMSEA) ≤0.05, non-normed fit index (NNFI) ≥0.95 or comparative fit index (CFI) ≥0.95 (Lecavalier, Aman, Hammer, Stoica, & Mathews, 2004) was considered a good fit (‘A’). An RMSEA >0.05 and ≤0.08, or an NNFI or CFI ≥0.90 and <0.95, was considered an acceptable fit (‘B’), and values outside of these ranges were considered poor fitting (‘C’). All other scores were allocated on face validity or statistical properties depending upon which method was used.
1.2.2.4 Sensitivity - responsiveness
The sensitivity or responsiveness of an instrument refers to the ability to detect any small but meaningful changes due to an intervention (Andresen, 2000; Lim, Liong, Lau, & Yuen, 2015). Effect size and standardized response means are the two most common statistical measures for responsiveness (Lim et al., 2015). Values of effect size index and standardized response means of 0.5 or greater but less than 0.8 are considered a moderate effect and 0.8 or greater deemed as a large effect (Lim et al., 2015). If an instrument had evidence for responsiveness expressed as effect size and derived from longitudinal data with clearly identified population and documented time intervals, it was awarded an ‘A’ rating. If there was weak evidence or it was based solely on statistical significance (p value), it was awarded a ‘C’. If evidence fell between these two extremes, the instrument was rated a ‘B’.
1.2.2.5 Brevity – respondent burden
Respondent burden is defined as the time, energy and demands placed upon those being administered an instrument (Lohr et al., 1996). A maximum of 15 minutes is considered appropriate for a primary outcome measure (Andresen, 2000). The content should be appropriate for the target population and special training or resources should not be required. If a long questionnaire is warranted or particularly inconvenient, a study protocol should include compensation to reduce this burden, e.g., payment (Andresen, 2000). Assessment of respondent burden may take the form of both quantitative and qualitative investigation. Quantitative assessment may include rating respondent satisfaction, whereas qualitative assessment may include reviewing respondent opinions and reactions (Andresen, 2000). A rating of ‘A’ was awarded for brevity (<15 minutes duration) and appropriate content. Grade ‘C’ was applied to instruments that were lengthy and had no evidence of appropriateness. Grade ‘B’ was awarded for those less than 30 minutes with some evidence of appropriate content.
1.2.2.6 Simplicity – administrative burden
An administratively simple questionnaire is considered short, easily scored without the need for a sophisticated computer program, and does not require specialist training or qualifications (Andresen, 2000; Lohr et al., 1996). If data gathered using the instrument was easily scored by hand, understood, inexpensive and required no formal training, it was awarded an ‘A’. If expensive, onerous for administration and scoring, and interpretation not straightforward, it was rated a ‘C’. Ratings falling between the two extremes ranked ‘B’.
1.2.2.7 Acceptability – cultural or language adaptations
The assessment of translation or cultural adaptation of an instrument involves two key criteria: ensuring conceptual and linguistic equivalence, and the evaluation of measurement properties (Lohr et al., 1996). To receive a high rating (‘A’) the instrument had evidence of validity for cultural adaptation and/or had information about methods to achieve linguistic equivalence (e.g., forward and backwards translations). If no evidence could be found for the validity and methodology of translation or adaptation, a ‘C’ grade was awarded. If rated in between, the instrument was given a ‘B’ for acceptability.
1.2.2.8 Accessibility – alternative mediums
The availability and assessment of alternative versions in administration of instruments (other than the original version) is a concept referred to as accessibility (Andresen, 2000; Lohr et al., 1996). To rate an ‘A’ for accessibility, instruments were available in multiple forms and found free of mode effects (i.e., no differences between the original version and transformed version upon administration). To rate a ‘B’ the instrument was available in other forms; however, no testing for mode effects was evident. A ‘C’ grade reflected that there were no alternative forms or mode information available (Andresen, 2000).
1.2.2.9 Conceptual model
A conceptual model is a rationale for, and description of, the concept/s that the instrument is intended to assess and the relationship between those concepts (Lohr et al., 1996). For assessment, instruments were graded on how well they captured the intended conceptual framework. If an instrument carefully defined its measurement construct and domains, its grading reflected this. If domains were completely covered an ‘A’ was awarded, adequately covered attracted a ‘B’, and inadequately covered was given a ‘C’ (Andresen, 2000).
1.2.2.10 Measurement model
A measurement model refers to an instrument’s scale and subscale structure, and the procedure followed to create scale and subscale scores (Lohr et al., 1996). For scoring, an ‘A’ was awarded if there was no skewness or evidence for development of scoring, ‘B’ was awarded if there was skewness or intermediate or conflicting evidence for development of scoring, and ‘C’ was awarded for substantial skewness (>20% of a group’s scores at either extreme of a range) (Andresen, Rothenberg, Panzer, Katz, & McDermott, 1998) or no evidence for development of scoring.
1.2.2.11 Bias
Instruments should not be affected by differences in culture, social circumstances, or impairment type, except for differences the tool intends to measure. Statistical methods such as Rasch Analysis or Confirmatory Factor Analysis (CFA) can be used to assist with the detection of item and instrument bias (Smith, 2004); however, a large proportion of grading qualitative evidence of bias is somewhat subjective. For subjective grading an ‘A’ was awarded to instruments with clear evidence of review. A ‘B’ grade was awarded if formal evidence was lacking but the instrument had good face validity for bias, and a ‘C’ grade was awarded if bias was evident.
1.2.3 Applicability to violence and aggression research
For practicality assessment, instruments were also evaluated independent of their psychometric properties, investigating the ability to measure violent and aggressive behaviors, and their frequency and severity. Instruments were further screened against the following questions:
Can the instrument be used to measure the frequency of violent behaviors?
Can the instrument be used to measure the frequency of aggressive behaviors?
Can the instrument be used to measure the severity of violent behaviors?
Can the instrument be used to measure the severity of aggressive behaviors?
How many items in the instrument measure violent behaviors?
How many items in the instrument measure aggressive behaviors?
Table 1 summarises the quality assessment criteria applied for instruments in the current systematic review.
Table 1: A summary of characteristics considered desirable for an appropriate standardized outcome measure. Criteria for grading are drawn from Andresen (2000), Lohr et al. (1996) and McHugh (2012).

| Characteristic | Explanation | Criteria for Grading |
|---|---|---|
| Reliability | Measurements by different observers, at different times or in parallel ways produce the same result. | Inter-rater/test-retest (ICC or κ): A = ≥0.80; B = ≥0.60, <0.80; C = <0.60. Internal consistency (Cronbach α or KR-20): A = ≥0.80; B = ≥0.70, <0.80; C = <0.70 |
| Validity | The instrument measures the intended outcome. | Convergent (correlation coefficient): A = ≥0.60; B = ≥0.30, <0.60; C = <0.30. Factorial (RMSEA): A = ≤0.05; B = >0.05, ≤0.08; C = >0.08. Factorial (CFI or NNFI): A = ≥0.95; B = ≥0.90, <0.95; C = <0.90 |
| Responsiveness | The instrument captures clinically meaningful degrees of change. | A = Strong evidence with large effect size; B = Moderate or conflicting evidence; C = Weak evidence or based only on statistical significance, not effect size |
| Brevity | The instrument length and content is appropriate. | A = <15 minutes and appropriate content; B = >15 minutes, <30 minutes and appropriate content; C = >30 minutes and/or inappropriate content |
| Simplicity | No formal training is required to use the instrument. It is easy and inexpensive to administer, score and interpret. Ratings are clear. | A = Inexpensive, no training, easy scoring and interpretation; B = Somewhat expensive, more obscure scoring and interpretation; C = Expensive and/or complex scoring and/or interpretation |
| Acceptability | Cultural and language adaptations of the instrument exist. | A = Validity evidence for translation and/or adaptation and methodology; B = Evidence has some problems for translation and/or adaptation; C = No evidence for translation and/or adaptation and methodology |
| Accessibility | The instrument is available and tested in different mediums e.g., computer, self-administration, clinician interviews. | A = Available in multiple forms, no mode effects; B = Available in multiple forms, no testing for mode effects; C = No alternative forms or mode information |
| Conceptual Model | There is a rationale and description of the concept/s the instrument intends to assess, and the relationship between the concepts. | A = Clearly defined measurement construct and domains; B = Adequately defined measurement construct and domains; C = Inadequately defined measurement construct and domains |
| Measurement Model | The instrument captures detail and breadth that is intended. | A = No skewness or evidence for scoring methodology; B = Skewness or intermediate/conflicting evidence for scoring methodology; C = Substantial skewness >20% in extreme range or no evidence for scoring methodology |
| Bias | The instrument is not affected by differences in culture and social context. | A = Evidence of review for bias, no bias present; B = Face validity, no formal evidence; C = Bias evident |
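The numeric thresholds in Table 1 lend themselves to a small lookup routine. The sketch below is one possible encoding of the reliability, convergent validity and factorial validity cut-offs. The function names are hypothetical, and returning ‘U’ for a missing value is an assumption based on the ‘unknown’ grade described in the scoring section; the qualitative criteria (responsiveness, simplicity, bias and so on) are not covered.

```python
def _grade(value, is_a, is_b):
    """Return 'A', 'B' or 'C' from two threshold tests, or 'U' if no value exists."""
    if value is None:
        return "U"  # assumption: no published figure maps to the unknown grade
    if is_a(value):
        return "A"
    if is_b(value):
        return "B"
    return "C"


def grade_agreement(coef):
    """Inter-rater or test-retest reliability (ICC or kappa)."""
    return _grade(coef, lambda v: v >= 0.80, lambda v: v >= 0.60)


def grade_internal_consistency(alpha):
    """Internal consistency (Cronbach's alpha or KR-20)."""
    return _grade(alpha, lambda v: v >= 0.80, lambda v: v >= 0.70)


def grade_convergent(r):
    """Convergent validity from a correlation coefficient."""
    return _grade(r, lambda v: v >= 0.60, lambda v: v >= 0.30)


def grade_rmsea(rmsea):
    """Factorial validity from RMSEA (lower is better)."""
    return _grade(rmsea, lambda v: v <= 0.05, lambda v: v <= 0.08)


def grade_fit_index(index):
    """Factorial validity from CFI or NNFI (higher is better)."""
    return _grade(index, lambda v: v >= 0.95, lambda v: v >= 0.90)
```

Under these thresholds, `grade_internal_consistency(0.85)` returns 'A' and `grade_rmsea(0.056)` returns 'B'.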
1.3 RESULTS
A total of 12 RCTs applied instruments for measuring efficacy of pharmacotherapy in conduct disorders based on the recently published systematic review (Hambly et al., 2016). Twenty-four instruments were identified, of which 15 were excluded, based on their lack of aggressive and/or violent constructs, items assessing violent and/or aggressive behaviors, and published information. Seven instruments identified in the systematic review (Hambly et al., 2016), including the NCBRF (M.G. Aman, De Smedt, Derivan, Lyons, & Findling, 2002; Stocks, Taneja, Baroldi, & Findling, 2012), the Conners’ 3 (Connor, McLaughlin, & Jeffers-Terry, 2008), the BPI-01 (M.G. Aman et al., 2002; Ter-Stepanian, Grizenko, Zappitelli, & Joober, 2010), the ABC (M.G. Aman et al., 2002), the CBCL (Blader, Pliszka, Jensen, Schooler, & Kafantaris, 2010; Connor, Barkley, & Davis, 2000; Padhy et al., 2011; Steiner, Petersen, Saxena, Ford, & Matthews, 2003; Steiner, Saxena, et al., 2007; Wehmeier et al., 2011), the MOAS (Blader et al., 2010; Connor et al., 2008; Donovan et al., 2000; Malone, Delaney, Luebbert, Cater, & Campbell, 2000) and the SNAP-IV (Stocks et al., 2012; Wehmeier et al., 2011), met criteria for analysis (Figure 1). The CAS and the DBDRS were identified through reference searching. All nine instruments were scored based on psychometric and administrative properties (Table 2).
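The Methods assign numeric values to the letter grades (U=0, A=1, B=2, C=3) for ranking, but do not state how the values were combined across criteria. The sketch below assumes a simple sum and reports the number of unknown grades separately, since a ‘U’ contributes zero points and would otherwise be invisible in the total. The function name and the example grades are hypothetical and do not reproduce Table 2.

```python
# Numeric values for letter grades, as given in the scoring section
GRADE_POINTS = {"U": 0, "A": 1, "B": 2, "C": 3}


def tally(grades):
    """Sum grade points over criteria and count unknown grades.

    grades: dict mapping a criterion name to 'A', 'B', 'C' or 'U'.
    Summation is an assumption; the source does not specify the aggregation.
    """
    total = sum(GRADE_POINTS[g] for g in grades.values())
    unknown = sum(1 for g in grades.values() if g == "U")
    return total, unknown


# Hypothetical grades for one instrument
example = {"reliability": "A", "validity": "B", "responsiveness": "U", "brevity": "A"}
total, unknown = tally(example)
```

For the made-up grades above, the total is 4 points with 1 unknown criterion.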
24 instruments screened from 12 drug trials (RCTs) in CD (trial data extracted and evaluated from recently published systematic review)
8 instruments excluded (lacking aggressive/violent behavior constructs/items)
Review of literature for reliability and validity studies for instruments
9 instruments excluded (insufficient data or previous version of included instrument)
Further 2 instruments identified from reference search
9 instruments evaluated against criteria:
8 records included - NCBRF
6 records included - Conners’ 3
6 records included - BPI-01
2 records included - CAS
6 records included - ABC
5 records included - DBDRS
7 records included - CBCL
6 records included - MOAS
3 records included - SNAP-IV
Figure 1: Selection of relevant instruments, presented as a flow diagram
1.3.1 Nisonger Child Behavior Rating Form (NCBRF)
The NCBRF is designed to analyze the behavior of youth aged 3 to 16 with intellectual disability and/or autism spectrum disorders (ASD) (Mircea, Rojahn, & Esbensen, 2010). It is a modified version of the Child Behavior Rating Form (CBRF) (Edelbrock, 1985) and has a teacher and a parent version (Mircea et al., 2010). A Typical IQ Version has also been developed for rating children of normal developmental ability (M. Aman et al., 2008). The NCBRF consists of two key subscales (social competence and problem behavior), eight domains and 76 items (Michael G. Aman, Tassé, Rojahn, & Hammer, 1996; Tassé, Aman, Hammer, & Rojahn, 1996). The instrument is free to access and appears straightforward to score. It exists in a psychometrically sound Romanian form (Mircea et al., 2010). Age and gender bias may be present for the conduct problems and insecure/anxious subscales (Tassé et al., 1996). Several psychometric studies have been conducted on the NCBRF that examined various forms of reliability and validity with mixed results (Mircea et al., 2010). For reliability, internal consistency was rated high for both parent (Cronbach α = 0.85) and teacher forms (Cronbach α = 0.87) in the problem behaviors subscale (Michael G. Aman et al., 1996). For the social competence subscale, internal consistency was moderate for parent reports (Cronbach α = 0.78) and high for teacher reports (Cronbach α = 0.84) (Michael G. Aman et al., 1996). Moderate inter-rater reliability was shown (median correlation coefficient = 0.51) between parents and teachers in the problem behaviors subscale, which may be accounted for by the different situational contexts of the raters. For validity, confirmatory factor analysis (CFA) found that the factor structure of the social competence subscale was acceptable (RMSEA = 0.056); however, the problem behavior section may be suboptimal (RMSEA = 0.086) (Norris & Lecavalier, 2011), findings which were also echoed in youth with ASD (Lecavalier et al., 2004). Comparison with the Aberrant Behavior Checklist (ABC) provided support for both parent (median correlation coefficient = 0.72) and teacher (median correlation = 0.69) versions for convergent validity (Michael G. Aman et al., 1996). Additionally, parent-ratings for the conduct problems subscale compared with the Disruptive Behavior Checklist – Disruptive/Antisocial subscale further supported convergent and criterion validity (Norris & Lecavalier, 2011).
1.3.2 Conners’ 3rd Edition
The use of Conners’ behavioral rating scales has long been documented, with the current edition being the Conners’ 3 (Conners, 2010; Westerlund, Ek, Holmberg, Naswall, & Fernell, 2009). There is a multitude of published literature on previous editions of the scales, with well-established validity and reliability evidence (Conners, Sitarenios, Parker, & Epstein, 1998; Waschbusch & Elgar, 2007). The 57 questionnaire items in Conners’ 3 are based largely on the DSM (American Psychiatric Association (APA), 2013) and principles of the International Statistical Classification of Diseases (Westerlund et al., 2009; World Health Organisation (WHO), 1991). They assess ADHD, common comorbidities (e.g., ODD and CD) and associated problem behaviors in youth aged 6 to 18 (Conners, 2010). Conners’ 3 items are rated on a 4-point Likert scale where higher scores are associated with a greater number and/or frequency of concerns (Conners, 2010). Three different scales can be used to analyze behavior in the Conners’ 3, namely the Content Scales, DSM Symptom Scales and the Validity Scales. Flexible administration by qualified professionals is available in online, offline and written modes, offered in 3 forms (parent, teacher and self-report) (Conners, 2010). The instrument is expensive (~$650 AUD) and appears complex to score either by hand or computer (Conners, 2010). However, scoring is weighted for age and gender (Conners, 2010). There is a lack of published evidence for Conners’ 3 and its DSM-5 version, perhaps due to the regularity of its updates. Overall, the Conners’ 3 had high quality evidence for reliability, with data for internal consistency (parent Cronbach α = 0.91, teacher Cronbach α = 0.94, self-report Cronbach α = 0.88), inter-rater, and test-retest reliability (Conners, 2010). There are reports of moderate to high validity through multiple statistical analyses, although not all data was reported (Conners, 2010). It has been translated into other languages including French (Catale, Geurten, Lejeune, & Meulemans, 2014; Fumeaux et al., 2016), Chinese (Xiao, 2012) and Spanish; however, published translation methodology is lacking (Conners, 2010).
1.3.3 The Behavior Problems Inventory (BPI-01)
The Behavior Problems Inventory (BPI-01) is a 52-item instrument used to evaluate self-injurious, stereotypic, and aggressive/destructive behavior in mental retardation and other developmental disabilities (Rojahn, Matson, Lott, Esbensen, & Smalls, 2001). The evolution of the BPI-01 has been extensively documented over the years, with a well-defined measurement construct and domains (Rojahn et al., 2001). It has been transformed since its first German version in the 1980s and has undergone multiple revisions (Rojahn, 1984; Rojahn et al., 2001). The BPI-01 has been translated into 11 languages (Mircea et al., 2010). Studies have examined the psychometric properties of the BPI-01 across child and adult samples (Rojahn et al., 2013). A comprehensive analysis of the BPI-01 in 1122 adults and children with intellectual disabilities (mean age = 34.4 years) from several sites reported moderate to high internal consistency across frequency and severity of all three subscales (Cronbach’s α = 0.73 to 0.92) (Hastings et al., 2012). Furthermore, a study including participants 14 to 91 years old found moderate (ICC = 0.64 to 0.76) test-retest reliability for the BPI-01 (Rojahn et al., 2001). Evidence for strong convergent and discriminant validity has also been reported, with the factor structure endorsed by confirmatory factor analysis (Matson, Wilkins, Boisjoli, & Smith, 2008; Rojahn et al., 2012). In addition, several studies have found supporting evidence for criterion-related validity of the BPI-01 with the Repetitive Behavior Scale (Spearman correlation = 0.77 (self-injurious), 0.68 (stereotyped behavior)) (Rojahn et al., 2010) and the Aberrant Behavior Checklist in adults (Rojahn, Aman, Matson, & Mayville, 2003).

Few studies have exclusively assessed the use of the BPI-01 in youth. A study of 237 ethnically diverse youth (four to 22 years old) with severe developmental disabilities and/or behavioral challenges supported the validity and reliability of the BPI-01 in this population (Rojahn et al., 2010). For the BPI-01, the mean teacher-teacher single-measures reliability and the test-retest reliability were moderate to excellent, with ICC = 0.76 and ICC = 0.84, respectively. For internal consistency, reliability coefficients ranged from low to high across the three subscales (Cronbach’s α = 0.86 (stereotypic), 0.88 (aggressive/destructive), 0.59 (self-injurious)) (Rojahn et al., 2010). Further, excellent internal consistency for frequency (Cronbach’s α = 0.87) and severity subscales (Cronbach’s α = 0.89) and test-retest reliability has been reported in infants and toddlers at risk for intellectual or developmental disabilities (Rojahn et al., 2013). The BPI-01 has also been found to have high convergent and discriminant validity when
compared to the Nisonger Child Behavior Rating Form (NCBRF), as well as adequate factorial validity in youth (Rojahn et al., 2010).

1.3.4 The Children’s Aggression Scale (CAS)

The CAS is designed to evaluate setting-specific (i.e., home and school) frequency and severity of aggressive acts of non-institutionalized youth aged 5 to 18 years (Halperin et al., 2003). Two English versions exist, a parent and a teacher version. The parent version features 33 items, and the Teacher Rating Form features 23. The CAS features five key domains, namely verbal aggression, aggression against objects and animals, provoked physical aggression, unprovoked physical aggression, and use of weapons. The Parent Rating Form has two additional domains: aggression against family members and aggression against non-family members (Halperin et al., 2002). The items in the CAS are weighted differentially depending on the severity of the act (Halperin et al., 2002). The CAS requires evidence of qualifications for use, and has sophisticated hand or computer-assisted scoring. Despite this, it has high internal consistency for both the parent (Cronbach’s α = 0.93) and teacher (Cronbach’s α = 0.93) versions (Halperin et al., 2002). Differentiation between the diagnostic subgroups of ADHD, ODD and CD and correlations with other instruments including the CBCL provide support for validity (Halperin et al., 2002). The teacher and parent versions both display moderate to high convergent validity (Halperin et al., 2003; Halperin et al., 2002). The parent version has a good measurement model, with a one-way analysis of variance (ANOVA) indicating significant differences in ratings across all five domains, and with a continuum of severity also emerging (Halperin et al., 2002). Bias was not formally investigated; however, the reliability and validity studies featured an ethnically diverse population (Halperin et al., 2002). In the context of youth with violent and aggressive behaviors, 100% of the items in the CAS focus on aggression and violent outbursts, rendering it a valuable tool in CD clinical research.

1.3.5 Aberrant Behavior Checklist (ABC)
The Aberrant Behavior Checklist (ABC) is a 58-item instrument used to measure behavior problems of children and adults with mental retardation across five subscales (Rojahn et al., 2003). It exists in two forms, residential and community (ABC-C) (Aman, Singh, Stewart, & Field, 1985; Schmidt, Huete, Fodstad, Chin, & Kurtz, 2013). It has an excellent measurement model with in-depth reporting of its development (Marshburn & Aman, 1992). The ABC has been, or is currently being, translated into 39 languages as reported by its distributors (Aman, 2012). However, minimal information on translation methods and validation was retrieved. Psychometric studies in toddlers have shown that the ABC-C is a reliable and valid behavior-rating instrument. Overall, moderate to high internal consistency for each subscale has been demonstrated across two studies (Cronbach’s α ranging from 0.68 to 0.90 (Karabekiroglu & Aman, 2009) and from 0.81 to 0.96 (Rojahn et al., 2013)). Moderate convergent validity has been displayed in clinician-rated youth with intellectual disabilities and challenging needs (Hill, Powlitch, & Furniss, 2008). Furthermore, high convergent validity with the CBCL has been reported in outpatient psychiatry clinics in toddlers with a variety of conditions (Karabekiroglu & Aman, 2009). Fewer than 10% of the items in the ABC-C specifically measure violent and aggressive behaviors. However, studies to validate the instrument have included participants with primary diagnoses of aggressive and/or violent behavior such as CD (Karabekiroglu & Aman, 2009; Rojahn & Helsel, 1991).

1.3.6 The Disruptive Behavior Disorders Rating Scale (DBDRS)
The current DBDRS is based on the fourth edition of the DSM (Loona & Kamal, 2011). It consists of 41 items representing four domains of symptoms for CD, ODD and ADHD (Pelham, Gnagy, Greenslade, & Milich, 1992). There are two scoring methods that can be used to assign a DSM diagnosis to the child (Pelham et al., 1992). The DBDRS has been translated into Dutch (Antrop, Roeyers, Oosterlaan, & Van Oost, 2002), Spanish (Silva et al., 2005) and Urdu (Loona & Kamal, 2011). The Urdu version has displayed adequate psychometric properties, demonstrating that it is reliable for screening and diagnosis in the school and home setting (Loona & Kamal, 2011). In the same study, the internal consistency reliability for the English version of the DBDRS ranged from moderate to high (Cronbach’s α = 0.70 to 0.92) (Loona & Kamal, 2011). Furthermore, another study of teacher responses showed excellent internal consistency reliabilities for the DBDRS (Cronbach’s α = 0.91 to 0.95) across the inattentive (Cronbach’s α = 0.93), hyperactive/impulsive (Cronbach’s α = 0.91), and oppositional defiant (Cronbach’s α = 0.94) subscales (Pelletier, Collett, Gimpel, & Crowley, 2006). High convergent validity of the DBDRS with the School Situations Questionnaire (SSQ) has been found in teacher responses in preschool-aged children (Pelletier et al., 2006). Low agreement between parent and teacher ratings has highlighted potential instrument bias (Antrop et al., 2002; Silva et al., 2005). Further studies evaluating the psychometric properties of the DBDRS are recommended, as its role in the context of violent and aggressive behavior rating is not fully understood. Investigation into specific teacher and parent versions is also warranted, although it was unclear whether separate versions exist, with a report that the teacher version excluded the CD domain (Pelletier et al., 2006).
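Internal consistency is reported throughout this section as Cronbach’s α (Cronbach, 1951). As a minimal sketch of how the coefficient is obtained from item-level ratings, the snippet below applies the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the ratings matrix are hypothetical illustrations and are not data from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(ratings[0])  # number of items
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents x 4 items on a 0-3 scale
example = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
alpha = cronbach_alpha(example)
print(round(alpha, 2))  # 0.93
```

For these made-up ratings the item variances sum to 4.5 and the variance of the total scores is 15, so α = (4/3) × (1 − 0.3) ≈ 0.93, which would fall in the range described above as excellent.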
1.3.7 The Child Behavior Checklist (CBCL)

The CBCL is a 120-item parent-report instrument that provides a measure of the behavioral and emotional functioning and social competence of youth aged six to 18 (Achenbach, 1991). The CBCL has been updated to include DSM-oriented scales, and to complement the new preschool version for children aged 18 months to 5 years. The CBCL has two domains of open and closed questions, social competence and problem behaviors (Siddons & Lancaster, 2004). These domains can be further divided into DSM-oriented scales (Nakamura, Ebesutani, Bernstein, & Chorpita, 2009), eight syndrome structures, or ratings of externalising/internalising behaviors (Ivanova et al., 2007). Parents rate each closed question on a three-point Likert scale (higher scores indicating a greater number of behaviors present) based on the preceding 6 months (Ivanova et al., 2007). The CBCL is part of a multi-informant group of instruments developed by Achenbach and colleagues, which also includes the Youth Self Report and the Teachers Report Form (Ivanova et al., 2007). Professional qualifications are required to access the costly (~$700AUD) instrument, and scoring is completed digitally or by hand (Sawyer et al., 2000a). Raw scores, T-scores and thresholds can be used to summarise the CBCL in multiple ways under different scales, syndromes and structures (Sawyer et al., 2000a). The CBCL User Manual reports high inter-rater reliability (ICC = 0.93, p < 0.001) and test-retest reliability (ICC = 0.95, p < 0.001) for item scores (Achenbach, 1991). The manual also reports established evidence for content validity, criterion-related validity and construct validity for the CBCL (Achenbach, 1991). Published evidence supports clinician use of the CBCL, demonstrating moderate reliability (mean Cronbach’s α = 0.77), convergent and discriminant validity and sound factorial structure (CFI = 0.92) compared to parent-reporting for the CBCL (Dutra, Campbell, & Westen, 2004). The CBCL also demonstrates moderate to high factorial validity for the eight-syndrome structure (RMSEA = 0.026 to 0.055) across 30 societies, with 58 051 participants aged six to 18 (Ivanova et al., 2007). Findings also support the concurrent validity of the recently derived DSM-oriented scales (Ebesutani et al., 2010). Due to the multiple structures and subscales, analysis of the conceptual and measurement models of the instrument with regard to its use in violent and aggressive youth has not been conducted. The CBCL features few items (approximately 3.3% of the instrument) related to violent or aggressive outbursts, limiting its usefulness in studies featuring these participants. However, it is useful for obtaining data on the range of comorbidities that often need to be reported or included in the analysis in CD studies. Further research including youth with violent and aggressive behavior, or examining the conduct subscale of the CBCL, may be useful to determine the role of this instrument in the CD population.

1.3.8 Modified Overt Aggression Scale (MOAS)
The MOAS is a 16-item rating scale that measures aggressive behavior over four domains (Kay, Wolkenfeld, & Murrill, 1988; Sorgi, Ratey, Knoedler, Markert, & Reichman, 1991). The items of the MOAS are hand-scored on a 5-point Likert scale of increasing severity, with verbal aggression assigned the lowest weight and physical aggression the highest (Huang et al., 2009). The total weighted score may range from 0 to 40, with higher scores indicating more aggression (Dean, Bor, Adam, Bowling, & Bellgrove, 2014). The MOAS has evidence for validity and reliability in French (Cronbach’s α of 0.84 to 0.89; De Benedictis, Dumais, Stafford, Cote, & Lesage, 2012) and Chinese (ICC = 0.94, p < 0.001; Huang et al., 2009) versions; however, both studies lacked information regarding the methodology used to achieve linguistic equivalence. The validity and reliability of the MOAS have been described in multiple inpatient and outpatient populations including intellectual disability, behavioral disorders and brain injuries (De Benedictis et al., 2012; Endicott et al., 2002; Huang et al., 2009; Kho, Sensky, Mortimer, & Corcos, 1998; Oliver, Crawford, Rao, Reece, & Tyrer, 2007). Reports of acceptable to high reliability and validity exist in multiple studies; however, studies including children are lacking. Dean et al. (2014) report that the validity of the MOAS has been examined in several studies, and that it has demonstrated reliability and sensitivity in children; however, there is minimal evidence of published studies. Despite this, the tool has been used as an outcome measure in many studies with child and adolescent participants (Blader et al., 2010; Dean et al., 2014; Kronenberger et al., 2007). All items in the MOAS focus on aggressive and violent outbursts, giving it the potential to be a valuable tool in CD clinical research. Further studies to validate the psychometric properties of the MOAS in children and adolescents are recommended.
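The weighted MOAS total described above can be sketched as follows. Only the ordering of the weights (verbal aggression lowest, physical aggression highest), the 5-point rating per domain and the 0 to 40 range come from the text. The domain names, the intermediate weights of 2 and 3, and the function name are assumptions, chosen so that four domains rated 0 to 4 give a maximum of 40.

```python
# Assumed domain weights: only the ordering (verbal lowest, physical highest)
# and the 0-40 total range come from the text; the rest is illustrative.
MOAS_WEIGHTS = {
    "verbal": 1,
    "property": 2,
    "auto": 3,
    "physical": 4,
}


def moas_total(ratings):
    """Weighted MOAS total from one 0-4 severity rating per domain."""
    total = 0
    for domain, weight in MOAS_WEIGHTS.items():
        rating = ratings[domain]
        if rating not in range(5):
            raise ValueError(f"{domain} rating must be an integer from 0 to 4")
        total += weight * rating
    return total


# Hypothetical observation week for one child
week = {"verbal": 3, "property": 1, "auto": 0, "physical": 2}
print(moas_total(week))  # 3*1 + 1*2 + 0*3 + 2*4 = 13
```

Under these assumed weights, a child rated 4 in every domain would score 4 × (1 + 2 + 3 + 4) = 40, the stated maximum.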
1.3.9 The Swanson, Nolan and Pelham Teacher and Parent Rating Scale (SNAP-IV)

The 90-item SNAP-IV Rating Scale is a revision of the SNAP Questionnaire designed to measure ADHD and ODD symptoms in children and young adults (Gau et al., 2009). Items 41 to 90 of the SNAP-IV contain criteria from other DSM-IV disorders such as CD, generalized anxiety disorder, and intermittent explosive disorder. These are often symptoms that overlap with ADHD, or may be comorbid disorders (Inoue et al., 2014). The SNAP-IV can be accessed online. Hand scoring appears simple. The SNAP-IV has been used in multiple clinical trials as a measure of efficacy and ADHD symptom severity (Bussing et al., 2008). One study using the SNAP-IV has reported moderate to high internal consistency (Stevens, Quittner, & Abikoff, 1998). There are also published data to support the validity and reliability of the Japanese (Inoue et al., 2014) and Chinese (Gau et al., 2009) versions of the SNAP-IV. The SNAP-IV has been criticized for the lack of published psychometric properties and sparse normative data in a review of ADHD rating scales, limiting its usefulness in research and clinical practice (Collett, Ohan, & Myers, 2003; Gau et al., 2009). For the same reason, it was not possible to extensively review this instrument for its use in violent and aggressive youth. Furthermore, the SNAP-IV only features a handful of symptoms (approximately 9% of the instrument) indicative of violent
or aggressive outbursts, limiting its usefulness.

Out of the nine instruments, two (NCBRF and ABC) can be used to evaluate behaviors in those with low IQs or mental retardation. Both the ABC and BPI-01 can be used to rate behaviors across child and adult populations, and it is unknown if scoring is adjusted for age and gender, as published information regarding their measurement models was lacking. Numbers of items in the instruments varied from 16 (MOAS) to 120 (Conners’ 3rd Edition). Periods of measured behavioral occurrences spanned from the present (CBCL) to the past year (CAS). Instruments ranged in cost from free-access to $650AUD (see Appendix). Responsiveness data were not obtained for any instrument. The NCBRF, Conners’ 3rd Edition, and the BPI-01 were rated the highest for their psychometric properties. The CAS, ABC-C, and the DBDRS were rated moderately, and the CBCL, MOAS and SNAP-IV were rated lowest (Table 4). The NCBRF, BPI-01 and CAS were the only instruments that can be used to measure both frequency and severity of behaviors. The CAS and MOAS featured the most items pertaining to aggression (both 100%).
Table 2: Ranking of Assessed Tools Based on Psychometric and Administrative Properties

Psychometric properties: Reliability, Validity, Responsiveness, Conceptual, Measurement Model, Bias. Administrative properties: Brevity, Simplicity, Acceptability, Accessibility.

| Instrument | Reliability | Validity | Responsiveness | Conceptual | Measurement Model | Bias | Brevity | Simplicity | Acceptability | Accessibility | Total Score | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nisonger Child Behavior Rating Form (NCBRF) | | | 0 | | | | | | | | 23 | 1 |
| Conners’ 3rd Edition | | | 0 | | | | | | | | 20 | 2 |
| Behavior Problems Inventory (BPI-01) | | | 0 | | | | | | | | 19 | 3 |
| Children’s Aggression Scale (CAS) | 3 | 2 | 0 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 18 | 4 |
| Aberrant Behavior Checklist (ABC) | 3 | 2 | 0 | 3 | 0 | 2 | 2 | 2 | 1 | 2 | 17 | 5 |
| Disruptive Behavior Disorder Scale (DBDRS) | 3 | 2 | 0 | 2 | 0 | 1 | 3 | 2 | 3 | 1 | 17 | 5 |
| Child Behavior Checklist (CBCL) | | | 0 | | | | | | | | 16 | 6 |
| Modified Overt Aggression Scale (MOAS) | | | 0 | | | | | | | | 16 | 6 |
| Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) | | | 0 | | | | | | | | 12 | 7 |

3 = A, 2 = B, 1 = C, 0 = Unknown
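As a hedged illustration of how the totals and ranks in Table 2 relate, the sketch below converts letter grades to points using the table legend, sums them, and ranks the instruments so that tied totals share a rank and the next distinct total takes the next rank (dense ranking). This reproduces the pattern of ranks reported for the nine total scores. The function names are hypothetical; the totals and the DBDRS grades are those given in Table 2.

```python
GRADE_POINTS = {"A": 3, "B": 2, "C": 1, "Unknown": 0}


def total_score(grades):
    """Sum the points for a sequence of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades)


def dense_rank(totals):
    """Map instrument -> rank; highest total is rank 1, ties share a rank."""
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {score: i + 1 for i, score in enumerate(distinct)}
    return {name: rank_of[score] for name, score in totals.items()}


# Grades for the DBDRS row of Table 2 (3, 2, 0, 2, 0, 1, 3, 2, 3, 1)
dbdrs = ["A", "B", "Unknown", "B", "Unknown", "C", "A", "B", "A", "C"]
print(total_score(dbdrs))  # 17

# Total scores as reported in Table 2
totals = {
    "NCBRF": 23, "Conners 3": 20, "BPI-01": 19, "CAS": 18, "ABC": 17,
    "DBDRS": 17, "CBCL": 16, "MOAS": 16, "SNAP-IV": 12,
}
ranks = dense_rank(totals)
print(ranks["ABC"], ranks["DBDRS"], ranks["SNAP-IV"])  # 5 5 7
```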
Table 3: Ranking of Instruments and Applicability to Violence and Aggression Research

| Instrument | Psychometric Rank | Measures frequency | Measures severity | Number of items n (%) | Comments |
|---|---|---|---|---|---|
| Nisonger Child Behavior Rating Form (NCBRF) | 1 | Yes | Yes | ~7 (9.2%) | Designed for use in those with sub-average IQ |
| Conners’ 3rd Edition | 2 | Yes | No | ~15 (13.6%) | Lacking published data for DSM-V version |
| Behavior Problems Inventory (BPI-01) | 3 | Yes | Yes | 11 (21.2%) | Published methods for scoring lacking, designed for use in those with sub-average IQ |
| Children’s Aggression Scale (CAS) | 4 | Yes | Yes | 33 (100%) | 100% of scale measures aggression |
| Aberrant Behavior Checklist (ABC) | 5 | No | Yes | ~5 (8.6%) | Developed and standardized in adults |
| Disruptive Behavior Disorder Scale (DBDRS) | 5 | No | Yes | 15 (33.3%) | Extensive published data for ADHD populations |
| Child Behavior Checklist (CBCL) | 6 | Yes | No | ~4 (3.3%) | Part of a multi-informant group of instruments |
| Modified Overt Aggression Scale (MOAS) | 6 | No | Yes | 16 (100%) | Lacking validation in youth populations, 100% of scale measures aggression |
| Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) | 7 | No | Yes | ~8 (8.8%) | Published psychometric and normative data lacking |
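The percentages in Table 3 express the number of items that measure violent or aggressive behavior as a share of all items in the instrument. A minimal sketch of that arithmetic follows, using total item counts stated in the text (52 for the BPI-01, 58 for the ABC, 33 for the parent version of the CAS and 16 for the MOAS). The function name is hypothetical.

```python
def aggression_item_share(aggression_items, total_items):
    """Percentage of an instrument's items that target aggression, to 1 d.p."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return round(100 * aggression_items / total_items, 1)


# (aggression items, total items) taken from the text and Table 3
instruments = {
    "BPI-01": (11, 52),
    "ABC": (5, 58),
    "CAS": (33, 33),
    "MOAS": (16, 16),
}
for name, (n, total) in instruments.items():
    print(name, aggression_item_share(n, total))
```

For example, 11 of the 52 BPI-01 items gives 21.2%, matching the table entry.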
1.4 DISCUSSION

The strengths and weaknesses of the reviewed instruments will be discussed under the following subheadings: instruments accounting for the heterogeneity of aggression and comorbidities of conduct disorder, instruments encompassing weighted scoring for age and gender differences, instruments accounting for the severity and frequency of aggressive behaviors, instruments accounting for average and sub-average IQ, impact of technological advances on instruments and their development, and practicality, findings and recommendations.

1.4.1 Instruments accounting for the heterogeneity of aggression and comorbidities of conduct disorder

The assessment of violence and aggression using behavior rating tools still appears limited, with a predominant focus on oppositional and defiant behavior, rather than aggression itself (Halperin et al., 2002). Instruments that intend to measure aggression were frequently confounded by items that evaluate oppositional and defiant behavior, perhaps due to the heterogeneity and frequency of comorbidities in those with CD (Halperin et al., 2002; Klahr & Burt, 2014). The CAS and the MOAS take into consideration the heterogeneity of aggressive behaviors, featuring items to rate multiple subtypes of aggressive behaviors. The CAS also considers the context of the behavior (whether it occurs in the home, out of the home, between peers, or between siblings) and has the option of multiple informants (teacher version or parent version).
1.4.2 Instruments encompassing weighted scoring for age and gender differences

The display of violent and aggressive behaviors has different implications based on the age and gender of the aggressor (Halperin et al., 2002). The CAS, Conners’ 3rd Edition, CBCL and DBDRS all feature scoring systems that take into account weighted and/or standardized adjustments. Scoring systems that weight more severe acts more heavily than less severe acts may provide a more accurate evaluation of the problem behaviors (Hirsch, Frank, Shapiro, Hazell, & Frank, 2004). Standardising behavior for age and gender can lead to scores that can be compared to age- and gender-appropriate ranges, aiding in the delineation of normative and non-normative behaviors (Steiner, Remsing, & Work Group on Quality, 2007).

The BPI-01 and ABC are instruments that can be used in adults. Minimal published information was found about the scoring system for the BPI-01. It could not be determined whether the age of subjects was taken into account during its development, which may limit its usefulness and be a potential source of bias. There is a further need for evaluating instruments not standardized in youth, since the presentation of their behaviors may differ from adults as a result of developmental factors (Schmidt et al., 2013).

1.4.3 Instruments accounting for the severity and frequency of aggressive behaviors

The ability to assess severity independent of frequency of behaviors is an important and
valuable characteristic. Instruments that do not feature weighted scoring, or that combine frequency and severity, are problematic (Halperin et al., 2002). Weighting items depending on the severity of the act allows for more severe episodes of behavior to be mathematically weighted more heavily than less severe ones (Halperin et al., 2002). The NCBRF, BPI-01, and CAS were the only instruments that measured both frequency and severity of behaviors. The BPI-01 measures frequency and severity of behaviors independently; however, published scoring methods are lacking. The NCBRF requests users to rate behavior over the past month; however, it combines frequency and severity ratings for each item. The CAS requests users to rate behavior over the past year and provides items that escalate in severity. The items are then rated on a five-point Likert scale for frequency. Although there is concern over the lack of precision when estimating frequency, using a quantitative approach (e.g., “never”, “once a month”, “most days”) may overcome error variance that is associated with the users’ perception of subjective terms such as “occasionally” and “often” (Halperin et al., 2002). The CAS was the only scale that weighted various acts of aggression to determine the severity of specific acts of aggression beyond the frequency.

1.4.4 Instruments accounting for average and sub-average IQ

The assessment of learning disabilities is essential in individuals with conduct problems
as a third of children with CD have sub-average IQ (Scott, 2012). Longitudinal studies show those with early onset CD have lower IQ and are predicted to have poorer outcomes in adulthood (Lahey et al., 1995; Scott, 2012). The BPI-01 as well as the NCBRF are designed to measure subjects with sub-average IQ. The NCBRF has a recently developed “Typical IQ” version; however, it is unknown if the BPI-01 is suitable to use in this group. Although the BPI-01 ranked highly in this review, it should be used carefully in those with typical IQ until further validation can be determined. Instruments should be used with caution until validated for use in those with typical IQ and sub-average IQ.

1.4.5 Impact of technological advances on instruments and their development

The emergence of technologically advanced instruments is an area of interest. The
electronic Hamilton Anatomy of Risk Management tool (e-HARM) has recently been released and is a free electronic instrument that monitors a patient to help predict the likelihood of violent and aggressive outbursts (Vogel, 2016). The instrument aggregates and charts captured data over time, while connecting to past episodes and treatments. New technology may revolutionize how aggression is measured, monitored, and managed (Vogel, 2016). Therefore, it should be considered when developing new, or refining existing, instruments.

1.4.6 Practicality, findings and recommendations
Overall, it became apparent that aggressive behaviors typically receive relatively little attention in broad-band behavior rating instruments, with small numbers of designated items that are often confounded and conflated with oppositional and defiant symptoms (Rojahn et al., 2001). The two instruments that ranked highest, NCBRF and Conners’ 3rd Edition (presented in Table 3), were broad-band behavior rating scales that only had approximately 9.2% and 13.6% of items measuring violent and aggressive behaviors, respectively, limiting their usefulness. The CAS is a psychometrically sound instrument that takes into account frequency and severity of behaviors as well as age and gender effects; thus, it is recommended for use in violent and aggressive youth research. All of the items in the CAS and the MOAS can be used to measure violent and/or aggressive behaviors. Further research into the psychometric properties of the MOAS in violent and aggressive youth is required before its use can be recommended.
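To illustrate why weighting acts by severity matters when frequency is also recorded, the sketch below contrasts an unweighted frequency sum with a severity-weighted sum. The act names, weights and frequency ratings are hypothetical and are not the published CAS scoring rules. The point is only that two respondents with identical frequency totals can differ once more severe acts count for more.

```python
# Hypothetical severity weights; NOT the published CAS weights.
SEVERITY_WEIGHT = {"yelled": 1, "broke_object": 2, "hit_peer": 3, "used_weapon": 4}


def frequency_sum(freqs):
    """Unweighted total of frequency ratings (0 = never ... 4 = most days)."""
    return sum(freqs.values())


def weighted_sum(freqs):
    """Frequency ratings weighted by the severity of each act."""
    return sum(SEVERITY_WEIGHT[act] * f for act, f in freqs.items())


child_a = {"yelled": 4, "broke_object": 1, "hit_peer": 0, "used_weapon": 0}
child_b = {"yelled": 0, "broke_object": 1, "hit_peer": 3, "used_weapon": 1}

print(frequency_sum(child_a), frequency_sum(child_b))  # 5 5
print(weighted_sum(child_a), weighted_sum(child_b))    # 6 15
```

Both hypothetical children have the same frequency total, yet the weighted score separates frequent verbal aggression from less frequent but more severe physical aggression.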
1.5 CONCLUSION

Although broad-band scales such as the NCBRF and Conners’ 3rd Edition rated highest for their psychometric properties, their usefulness in youth violence and aggression research is limited. This has led us to conclude that much of the diversity in predictor and criterion measures may be due to a fundamental lack of theoretical models considering the heterogeneity of CD. In order to measure the efficacy of a pharmacotherapeutic intervention in those with violence, aggression and/or CD, age, gender, functional level, situational context and the type of informant should also be taken into account. Most behavior rating instruments contain few designated items specific to violent and aggressive behaviors, and these are often conflated with oppositional and defiant symptoms. All of the items in the CAS and the MOAS can be used to measure violent and/or aggressive behaviors. Further research into the psychometric properties of the MOAS in violent and aggressive youth is required before its use can be recommended. The CAS was found to be the most psychometrically sound and useful instrument that exclusively measures aggressive behaviors in youth.
1.6 ACKNOWLEDGEMENTS

We thank Griffith University for partially funding this study. The study has not received any funding from third parties.

Conflict of interest: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
1.7 REFERENCES

Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Aman, M., Leone, S., Lecavalier, L., Park, L., Buican, B., & Coury, D. (2008). The Nisonger Child Behavior Rating Form: typical IQ version. Int Clin Psychopharmacol, 23(4), 232-242. doi:10.1097/YIC.0b013e3282f94ad0
Aman, M. G. (2012). Aberrant Behavior Checklist: Current Identity and Future Developments. Clin Exp Pharmacol, 2(3), 114.
Aman, M. G., De Smedt, G., Derivan, A., Lyons, B., & Findling, R. L. (2002). Double-blind, placebo-controlled study of risperidone for the treatment of disruptive behaviors in children with subaverage intelligence. Am J Psychiatry, 159(8), 1337-1346.
Aman, M. G., Singh, N. N., Stewart, A. W., & Field, C. J. (1985). The aberrant behavior checklist: a behavior rating scale for the assessment of treatment effects. Am J Ment Defic, 89(5), 485-491.
Aman, M. G., Tassé, M. J., Rojahn, J., & Hammer, D. (1996). The Nisonger CBRF: A child behavior rating form for children with developmental disabilities. Res Dev Disabil, 17(1), 41-57. doi:10.1016/0891-4222(95)00039-9
American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders, 5th Edition: DSM-5. Arlington, VA: American Psychiatric Publishing.
Andresen, E. M. (2000). Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil, 81(12 Suppl 2), S15-20.
Andresen, E. M., Rothenberg, B. M., Panzer, R., Katz, P., & McDermott, M. P. (1998). Selecting a generic measure of health-related quality of life for use among older adults. A comparison of candidate instruments. Eval Health Prof, 21(2), 244-264.
Antrop, I., Roeyers, H., Oosterlaan, J., & Van Oost, P. (2002). Agreement Between Parent and Teacher Ratings of Disruptive Behavior Disorders in Children with Clinically Diagnosed ADHD. Journal of Psychopathology and Behavioral Assessment, 24(1), 67-73. doi:10.1023/A:1014057325752
Blader, J. C., Pliszka, S. R., Jensen, P. S., Schooler, N. R., & Kafantaris, V. (2010). Stimulant-responsive and stimulant-refractory aggressive behavior among children with ADHD. Pediatrics, 126(4), e796-806. doi:10.1542/peds.2010-0086
Bor, W., Dean, A. J., Najman, J., & Hayatbakhsh, R. (2014). Are child and adolescent mental health problems increasing in the 21st century? A systematic review. Aust N Z J Psychiatry, 48(7), 606-616. doi:10.1177/0004867414533834
Bussing, R., Fernandez, M., Harwood, M., Wei, H., Garvan, C. W., Eyberg, S. M., & Swanson, J. M. (2008). Parent and teacher SNAP-IV ratings of attention deficit hyperactivity disorder symptoms: psychometric properties and normative ratings from a school district sample. Assessment, 15(3), 317-328. doi:10.1177/1073191107313888
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull, 56(2), 81-105.
Catale, C., Geurten, M., Lejeune, C., & Meulemans, T. (2014). The Conners Parent Rating Scale: Psychometric properties in typically developing 4- to 12-year-old Belgian French-speaking children. Revue europeenne de psychologie appliquee, 64(5), 221. doi:10.1016/j.erap.2014.07.001
Cicchetti, D. V. (1994). Guidelines, Criteria, and Rules of Thumb for Evaluating Normed and Standardized Assessment Instruments in Psychology. Psychological Assessment, 6(4), 284-290. doi:10.1037/1040-3590.6.4.284
Collett, B. R., Ohan, J. L., & Myers, K. M. (2003). Ten-year review of rating scales. V: scales assessing attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry, 42(9), 1015-1037. doi:10.1097/01.CHI.0000070245.24125.B6
Collishaw, S. (2015). Annual Research Review: Secular trends in child and adolescent mental health. Journal of Child Psychology and Psychiatry, 56(3), 370-393. doi:10.1111/jcpp.12372
Colliver, J. A., Conlee, M. J., & Verhulst, S. J. (2012). From test validity to construct validity … and back? Medical Education, 46(4), 366-371. doi:10.1111/j.1365-2923.2011.04194.x
Conners, C. K. (2010). Test Review: C. Keith Conners Conners 3rd Edition Toronto, Ontario, Canada: Multi-Health Systems, 2008. Journal of Psychoeducational Assessment, 28(6), 598-602. doi:10.1177/0734282909360011
Conners, C. K., Sitarenios, G., Parker, J. D., & Epstein, J. N. (1998). The revised Conners' Parent Rating Scale (CPRS-R): factor structure, reliability, and criterion validity. J Abnorm Child Psychol, 26(4), 257-268.
Connor, D. F., Barkley, R. A., & Davis, H. T. (2000). A pilot study of methylphenidate, clonidine, or the combination in ADHD comorbid with aggressive oppositional defiant or conduct disorder. Clin Pediatr (Phila), 39(1), 15-25.
Connor, D. F., McLaughlin, T. J., & Jeffers-Terry, M. (2008). Randomized controlled pilot study of quetiapine in the treatment of adolescent conduct disorder. J Child Adolesc Psychopharmacol, 18(2), 140-156. doi:10.1089/cap.2006.0007
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334. doi:10.1007/BF02310555
De Benedictis, L., Dumais, A., Stafford, M. C., Cote, G., & Lesage, A. (2012). Factor analysis of the French version of the shorter 12-item Perception of Aggression Scale (POAS) and of a new modified version of the Overt Aggression Scale (MOAS). J Psychiatr Ment Health Nurs, 19(10), 875-880. doi:10.1111/j.1365-2850.2011.01870.x
Dean, A. J., Bor, W., Adam, K., Bowling, F. G., & Bellgrove, M. A. (2014). A randomized, controlled, crossover trial of fish oil treatment for impulsive aggression in children and adolescents with disruptive behavior disorders. J Child Adolesc Psychopharmacol, 24(3), 140-148. doi:10.1089/cap.2013.0093
Donovan, S. J., Stewart, J. W., Nunes, E. V., Quitkin, F. M., Parides, M., Daniel, W., . . . Klein, D. F. (2000). Divalproex treatment for youth with explosive temper and mood lability: a double-blind, placebo-controlled crossover design. Am J Psychiatry, 157(5), 818-820.
Dutra, L., Campbell, L., & Westen, D. (2004). Quantifying clinical judgment in the assessment of adolescent psychopathology: Reliability, validity, and factor structure of the Child Behavior Checklist for clinician report. J Clin Psychol, 60(1), 65-85. doi:10.1002/jclp.10234
Ebesutani, C., Bernstein, A., Nakamura, B. J., Chorpita, B. F., Higa-McMillan, C. K., Weisz, J. R., & The Research Network on Youth Mental Health. (2010). Concurrent Validity of the Child Behavior Checklist DSM-Oriented Scales: Correspondence with DSM Diagnoses and Comparison to Syndrome Scales. Journal of Psychopathology and Behavioral Assessment, 32(3), 373-384. doi:10.1007/s10862-009-9174-9
Edelbrock, C. S. (1985). Child Behavior Rating Form. Psychopharmacological Bulletin, 21, 835-837.
Elson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T. (2014). Press CRTT to measure aggressive behavior: the unstandardized use of the competitive reaction time task in aggression research. Psychol Assess, 26(2), 419-432. doi:10.1037/a0035569
Endicott, J., Tracy, K., Burt, D., Olson, E., & Coccaro, E. F. (2002). A novel approach to assess inter-rater reliability in the use of the Overt Aggression Scale-Modified. Psychiatry Res, 112(2), 153-159. doi:10.1016/S0165-1781(02)00185-3
Erskine, H. E., Moffitt, T. E., Copeland, W. E., Costello, E. J., Ferrari, A. J., Patton, G., . . . Scott, J. G. (2015). A heavy burden on young minds: the global burden of mental and substance use disorders in children and youth. Psychol Med, 45(7), 1551-1563. doi:10.1017/S0033291714002888
Etchells, P. J., Gage, S. H., Rutherford, A. D., & Munafo, M. R. (2016). Prospective Investigation of Video Game Use in Children and Subsequent Conduct Disorder and Depression Using Data from the Avon Longitudinal Study of Parents and Children. PLoS ONE, 11(1), e0147732. doi:10.1371/journal.pone.0147732
Farmer, C. A., Kaat, A. J., Mazurek, M. O., Lainhart, J. E., DeWitt, M. B., Cook, E. H., . . . Aman, M. G. (2016). Confirmation of the Factor Structure and Measurement Invariance of the Children's Scale of Hostility and Aggression: Reactive/Proactive in Clinic-Referred Children With and Without Autism Spectrum Disorder. J Child Adolesc Psychopharmacol, 26(1), 10-18. doi:10.1089/cap.2015.0098
Frieden, T. R., Jaffe, H. W., Cono, J., Richards, C. L., & Iademarco, M. F. (2014). Youth Risk Behavior Surveillance, United States, 2013. Retrieved from Atlanta, Georgia, USA:
Fumeaux, P., Mercier, C., Roche, S., Iwaz, J., Bader, M., Stéphan, P., . . . Revol, O. (2016). Validation of the French Version of Conners' Parent Rating Scale Revised, Short Version: Factorial Structure and Reliability/Validation de la version française de la version révisée et abrégée de l'échelle parents de Conners; structure factorielle et fiabilité. Canadian Journal of Psychiatry, 61(4), 236. doi:10.1177/0706743716635549a
Gau, S. S., Lin, C. H., Hu, F. C., Shang, C. Y., Swanson, J. M., Liu, Y. C., & Liu, S. K. (2009). Psychometric properties of the Chinese version of the Swanson, Nolan, and Pelham, Version IV Scale-Teacher Form. J Pediatr Psychol, 34(8), 850-861. doi:10.1093/jpepsy/jsn133
Halperin, J. M., McKay, K. E., Grayson, R. H., & Newcorn, J. H. (2003). Reliability, validity, and preliminary normative data for the Children's Aggression Scale-Teacher Version. J Am Acad Child Adolesc Psychiatry, 42(8), 965-971. doi:10.1097/01.CHI.0000046899.27264.EB
Halperin, J. M., McKay, K. E., & Newcorn, J. H. (2002). Development, reliability, and validity of the children's aggression scale-parent version. J Am Acad Child Adolesc Psychiatry, 41(3), 245-252. doi:10.1097/00004583-200203000-00003
Hambly, J. L., Khan, S., McDermott, B., Bor, W., & Haywood, A. (2016). Pharmacotherapy of conduct disorder: Challenges, options and future directions. J Psychopharmacol. doi:10.1177/0269881116658985
Hastings, R. P., Didden, H. C. M., Rojahn, J., Matson, J. L., Kroes, D. B. H., Sharber, A. C., . . . Dumont, E. L. M. (2012). The Behavior Problems Inventory-Short Form for individuals with intellectual disabilities: Part II: reliability and validity. Journal of Intellectual Disability Research, 56(5), 546-565. doi:10.1111/j.1365-2788.2011.01506.x
Hersen, M. (2006). Clinician's handbook of child behavioral assessment. Burlington, MA: Elsevier Academic Press.
Hill, J., Powlitch, S., & Furniss, F. (2008). Convergent validity of the aberrant behavior checklist and behavior problems inventory with people with complex needs. Res Dev Disabil, 29(1), 45-60. doi:10.1016/j.ridd.2006.10.002
Hirsch, S., Frank, T. L., Shapiro, J. L., Hazell, M. L., & Frank, P. I. (2004). Development of a questionnaire weighted scoring system to target diagnostic examinations for asthma in adults: a modelling study. BMC Fam Pract, 5(1), 30. doi:10.1186/1471-2296-5-30
Huang, H. C., Wang, Y. T., Chen, K. C., Yeh, T. L., Lee, I. H., Chen, P. S., . . . Lu, R. B. (2009). The reliability and validity of the Chinese version of the Modified Overt Aggression Scale. Int J Psychiatry Clin Pract, 13(4), 303-306. doi:10.3109/13651500903056533
Hunsley, J., & Mash, E. J. (2010). The role of assessment in evidence-based practice. Handbook of assessment and treatment planning for psychological disorders. New York: The Guilford Press.
Inoue, Y., Ito, K., Kita, Y., Inagaki, M., Kaga, M., & Swanson, J. M. (2014). Psychometric properties of Japanese version of the Swanson, Nolan, and Pelham, version-IV Scale-Teacher Form: a study of school children in community samples. Brain Dev, 36(8), 700-706. doi:10.1016/j.braindev.2013.09.003
Ivanova, M. Y., Dobrean, A., Dopfner, M., Erol, N., Fombonne, E., Fonseca, A. C., . . . Chen, W. J. (2007). Testing the 8-syndrome structure of the child behavior checklist in 30 societies. J Clin Child Adolesc Psychol, 36(3), 405-417. doi:10.1080/15374410701444363
Karabekiroglu, K., & Aman, M. G. (2009). Validity of the aberrant behavior checklist in a clinical sample of toddlers. Child Psychiatry Hum Dev, 40(1), 99-110. doi:10.1007/s10578-008-0108-7
Kay, S. R., Wolkenfeld, F., & Murrill, L. M. (1988). Profiles of aggression among psychiatric patients. I. Nature and prevalence. J Nerv Ment Dis, 176(9), 539-546.
Kho, K., Sensky, T., Mortimer, A., & Corcos, C. (1998). Prospective study into factors associated with aggressive incidents in psychiatric acute admission wards. Br J Psychiatry, 172, 38-43.
Klahr, A. M., & Burt, S. A. (2014). Practitioner Review: Evaluation of the known behavioral heterogeneity in conduct disorder to improve its assessment and treatment. J Child Psychol Psychiatry, 55(12), 1300-1310. doi:10.1111/jcpp.12268
Kronenberger, W. G., Giauque, A. L., Lafata, D. E., Bohnstedt, B. N., Maxey, L. E., & Dunn, D. W. (2007). Quetiapine addition in methylphenidate treatment-resistant adolescents with comorbid ADHD, conduct/oppositional-defiant disorder, and aggression: a prospective, open-label study. J Child Adolesc Psychopharmacol, 17(3), 334-347. doi:10.1089/cap.2006.0012
Lahey, B. B., Loeber, R., Hart, E. L., Frick, P. J., Applegate, B., Zhang, Q., . . . Russo, M. F. (1995). Four-year longitudinal study of conduct disorder in boys: patterns and predictors of persistence. J Abnorm Psychol, 104(1), 83-93.
Lawrence, D., Johnson, S., Hafekost, J., Boterhoven De Haan, K., Sawyer, M., Ainley, J., & Zubrick, S. (2015). The Mental Health of Children and Adolescents. Report on the Second Australian Child and Adolescent Survey of Mental Health and Wellbeing. Retrieved from
Lecavalier, L., Aman, M. G., Hammer, D., Stoica, W., & Mathews, G. L. (2004). Factor Analysis of the Nisonger Child Behavior Rating Form in Children with Autism Spectrum Disorders. J Autism Dev Disord, 34(6), 709-721. doi:10.1007/s10803-004-5291-1
Lim, R., Liong, M. L., Lau, Y. K., & Yuen, K. H. (2015). Validity, reliability, and responsiveness of the ICIQ-UI SF and ICIQ-LUTSqol in the Malaysian population: Validation of ICIQ in Malaysia. Neurourology and Urodynamics, n/a-n/a. doi:10.1002/nau.22950
Lohr, K. N., Aaronson, N. K., Alonso, J., Burnam, M. A., Patrick, D. L., Perrin, E. B., & Roberts, J. S. (1996). Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther, 18(5), 979-992.
Loona, M. I., & Kamal, A. (2011). Translation and adaptation of disruptive behaviour disorder rating scale. Pakistan Journal of Psychological Research, 26(2), 149.
Malone, R. P., Delaney, M. A., Luebbert, J. F., Cater, J., & Campbell, M. (2000). A double-blind placebo-controlled study of lithium in hospitalized aggressive children and adolescents with conduct disorder. Arch Gen Psychiatry, 57(7), 649-654.
Marshburn, E. C., & Aman, M. G. (1992). Factor validity and norms for the aberrant behavior checklist in a community sample of children with mental retardation. J Autism Dev Disord, 22(3), 357-373.
Matson, J. L., Wilkins, J., Boisjoli, J. A., & Smith, K. R. (2008). The validity of the autism spectrum disorders-diagnosis for intellectually disabled adults (ASD-DA). Res Dev Disabil, 29(6), 537-546. doi:10.1016/j.ridd.2007.09.006
McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochem Med (Zagreb), 22(3), 276-282.
Mircea, C. E., Rojahn, J., & Esbensen, A. J. (2010). Psychometric Evaluation of Romanian Translations of the Behavior Problems Inventory-01 and the Nisonger Child Behavior Rating Form. Journal of Mental Health Research in Intellectual Disabilities, 3(1), 51. doi:10.1080/19315860903520515
Nakamura, B. J., Ebesutani, C., Bernstein, A., & Chorpita, B. F. (2009). A Psychometric Analysis of the Child Behavior Checklist DSM-Oriented Scales. Journal of Psychopathology and Behavioral Assessment, 31(3), 178-189. doi:10.1007/s10862-008-9119-8
Norris, M., & Lecavalier, L. (2011). Evaluating the validity of the Nisonger Child Behavior Rating Form – Parent Version. Res Dev Disabil, 32(6), 2894-2900. doi:10.1016/j.ridd.2011.05.015
Oliver, P. C., Crawford, M. J., Rao, B., Reece, B., & Tyrer, P. (2007). Modified Overt Aggression Scale (MOAS) for People with Intellectual Disability and Aggressive Challenging Behaviour: A Reliability Study. Journal of Applied Research in Intellectual Disabilities, 20(4), 368-372. doi:10.1111/j.1468-3148.2006.00346.x
Padhy, R., Saxena, K., Remsing, L., Huemer, J., Plattner, B., & Steiner, H. (2011). Symptomatic response to divalproex in subtypes of conduct disorder. Child Psychiatry Hum Dev, 42(5), 584-593. doi:10.1007/s10578-011-0234-5
Pelham, W. E., Jr., Gnagy, E. M., Greenslade, K. E., & Milich, R. (1992). Teacher ratings of DSM-III-R symptoms for the disruptive behavior disorders. J Am Acad Child Adolesc Psychiatry, 31(2), 210-218. doi:10.1097/00004583-199203000-00006
Pelletier, J., Collett, B., Gimpel, G., & Crowley, S. (2006). Assessment of Disruptive Behaviors in Preschoolers: Psychometric Properties of the Disruptive Behavior Disorders Rating Scale and School Situations Questionnaire. Journal of Psychoeducational Assessment, 24(1), 3-18. doi:10.1177/0734282905285235
Reebye, P. (2005). Aggression during early years - infancy and preschool. Can Child Adolesc Psychiatr Rev, 14(1), 16-20.
Reebye, P., & Moretti, M. (2005). Perspectives on childhood and adolescent aggression. Can Child Adolesc Psychiatr Rev, 14(1), 2.
Roberts, P., & Priest, H. (2006). Reliability and validity in research. Nurs Stand, 20(44), 41-45. doi:10.7748/ns2006.07.20.44.41.c6560
Rojahn, J. (1984). Self-injurious behavior in institutionalized, severely/profoundly retarded adults: Prevalence data and staff agreement. Journal of Behavioral Assessment, 6(1), 13-27. doi:10.1007/BF01321457
Rojahn, J., Aman, M. G., Matson, J. L., & Mayville, E. (2003). The Aberrant Behavior Checklist and the Behavior Problems Inventory: convergent and divergent validity. Res Dev Disabil, 24(5), 391-404.
Rojahn, J., & Helsel, W. J. (1991). The Aberrant Behavior Checklist with children and adolescents with dual diagnosis. J Autism Dev Disord, 21(1), 17-28. doi:10.1007/BF02206994
Rojahn, J., Matson, J. L., Lott, D., Esbensen, A. J., & Smalls, Y. (2001). The Behavior Problems Inventory: an instrument for the assessment of self-injury, stereotyped behavior, and aggression/destruction in individuals with developmental disabilities. J Autism Dev Disord, 31(6), 577-588.
Rojahn, J., Rowe, E. W., Macken, J., Gray, A., Delitta, D., Booth, A., & Kimbrell, K. (2010). Psychometric Evaluation of the Behavior Problems Inventory-01 and the Nisonger Child Behavior Rating Form with Children and Adolescents. Journal of Mental Health Research in Intellectual Disabilities, 3(1), 28. doi:10.1080/19315860903558168
Rojahn, J., Rowe, E. W., Sharber, A. C., Hastings, R., Matson, J. L., Didden, R., . . . Dumont, E. L. (2012). The Behavior Problems Inventory-Short Form for individuals with intellectual disabilities: part II: reliability and validity. J Intellect Disabil Res, 56(5), 546-565. doi:10.1111/j.1365-2788.2011.01506.x
Rojahn, J., Schroeder, S. R., Mayo-Ortega, L., Oyama-Ganiko, R., LeBlanc, J., Marquis, J., & Berke, E. (2013). Validity and reliability of the Behavior Problems Inventory, the Aberrant Behavior Checklist, and the Repetitive Behavior Scale-Revised among infants and toddlers at risk for intellectual or developmental disabilities: a multi-method assessment approach. Res Dev Disabil, 34(5), 1804-1814. doi:10.1016/j.ridd.2013.02.024
Sawyer, M. G., Arney, F. M., Baghurst, P. A., Clark, J. J., Graetz, B. W., Kosky, R. J., . . . Zubrick, S. R. (2000a). The Mental Health of Young People in Australia. Appendix A: Mean scores on the child behaviour checklist and youth self-report. Canberra: Commonwealth Department of Health and Aged Care.
Sawyer, M. G., Arney, F. M., Baghurst, P. A., Clark, J. J., Graetz, B. W., Kosky, R. J., . . . Zubrick, S. R. (2000b). The Mental Health of Young People in Australia. Mental Health and Special Programs. Canberra: Commonwealth Department of Health and Aged Care.
Schmidt, J. D., Huete, J. M., Fodstad, J. C., Chin, M. D., & Kurtz, P. F. (2013). An evaluation of the Aberrant Behavior Checklist for children under age 5. Res Dev Disabil, 34(4), 1190-1197. doi:10.1016/j.ridd.2013.01.002
Scott, S. (2012). Conduct Disorders. In I. R. J. (ed) (Series Ed.) e-Textbook of Child and Adolescent Mental Health.
Siddons, H., & Lancaster, S. (2004). An overview of the use of the Child Behavior Checklist within Australia. Camberwell, Vic: ACER.
Silva, R. R., Alpert, M., Pouget, E., Silva, V., Trosper, S., Reyes, K., & Dummit, S. (2005). A rating scale for disruptive behavior disorders, based on the DSM-IV item pool. Psychiatr Q, 76(4), 327-339. doi:10.1007/s11126-005-4966-x
Slade, M., Thornicroft, G., & Glover, G. (1999). The feasibility of routine outcome measures in mental health. Soc Psychiatry Psychiatr Epidemiol, 34(5), 243-249.
Smith, R. M. (2004). Detecting item bias with the Rasch model. J Appl Meas, 5(4), 430-449.
Sorgi, P., Ratey, J., Knoedler, D. W., Markert, R. J., & Reichman, M. (1991). Rating aggression in the clinical setting. A retrospective adaptation of the Overt Aggression Scale: preliminary results. J Neuropsychiatry Clin Neurosci, 3(2), S52-56.
Steiner, H., Petersen, M. L., Saxena, K., Ford, S., & Matthews, Z. (2003). Divalproex sodium for the treatment of conduct disorder: a randomized controlled clinical trial. J Clin Psychiatry, 64(10), 1183-1191.
Steiner, H., Remsing, L., & Work Group on Quality Issues. (2007). Practice parameter for the assessment and treatment of children and adolescents with oppositional defiant disorder. J Am Acad Child Adolesc Psychiatry, 46(1), 126-141. doi:10.1097/01.chi.0000246060.62706.af
Steiner, H., Saxena, K. S., Carrion, V., Khanzode, L. A., Silverman, M., & Chang, K. (2007). Divalproex sodium for the treatment of PTSD and conduct disordered youth: a pilot randomized controlled clinical trial. Child Psychiatry Hum Dev, 38(3), 183-193. doi:10.1007/s10578-007-0055-8
Stevens, J., Quittner, A. L., & Abikoff, H. (1998). Factors influencing elementary school teachers' ratings of ADHD and ODD behaviors. J Clin Child Psychol, 27(4), 406-414. doi:10.1207/s15374424jccp2704_4
Stocks, J. D., Taneja, B. K., Baroldi, P., & Findling, R. L. (2012). A phase 2a randomized, parallel group, dose-ranging study of molindone in children with attention-deficit/hyperactivity disorder and persistent, serious conduct problems. J Child Adolesc Psychopharmacol, 22(2), 102-111. doi:10.1089/cap.2011.0087
Suris, A., Lind, L., Emmett, G., Borman, P. D., Kashner, M., & Barratt, E. S. (2004). Measures of aggressive behavior: overview of clinical and research instruments. Aggression and Violent Behavior, 9(2), 165-227. doi:10.1016/S1359-1789(03)00012-0
Tassé, M. J., Aman, M. G., Hammer, D., & Rojahn, J. (1996). The Nisonger child behavior rating form: Age and gender effects and norms. Res Dev Disabil, 17(1), 59-75. doi:10.1016/0891-4222(95)00037-2
Ter-Stepanian, M., Grizenko, N., Zappitelli, M., & Joober, R. (2010). Clinical response to methylphenidate in children diagnosed with attention-deficit hyperactivity disorder and comorbid psychiatric disorders. Can J Psychiatry, 55(5), 305-312.
Vogel, L. (2016). New tool evaluates risk of patient aggression. CMAJ : Canadian Medical Association journal = journal de l'Association medicale canadienne, 188(10), E200-E200. doi:10.1503/cmaj.109-5276
Waschbusch, D. A., & Elgar, F. J. (2007). Development and validation of the Conduct Disorder Rating Scale. Assessment, 14(1), 65-74. doi:10.1177/1073191106289908
Wasserman, J., & Bracken, B. (2003). Psychometric Characteristics of Assessment Procedures. Handbook of Psychology: John Wiley & Sons, Inc.
Wehmeier, P. M., Schacht, A., Dittmann, R. W., Helsberg, K., Schneider-Fresenius, C., Lehmann, M., . . . Ravens-Sieberer, U. (2011). Effect of atomoxetine on quality of life and family burden: results from a randomized, placebo-controlled, double-blind study in children and adolescents with ADHD and comorbid oppositional defiant or conduct disorder. Qual Life Res, 20(5), 691-702. doi:10.1007/s11136-010-9803-5
Westerlund, J., Ek, U., Holmberg, K., Naswall, K., & Fernell, E. (2009). The Conners' 10-item scale: findings in a total population of Swedish 10-11-year-old children. Acta Paediatr, 98(5), 828-833. doi:10.1111/j.1651-2227.2008.01214.x
World Health Organisation (WHO). (1991). ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines. Geneva: World Health Organization (WHO).
Xiao, Q. (2012). Reliability and validity of the Iowa Conners rating scale in Chinese children. Neuropsychiatrie de l'Enfance et de l'Adolescence, 60(5), S266. doi:10.1016/j.neurenf.2012.04.698
Zhang, P., Roberts, R. E., Liu, Z., Meng, X., Tang, Z., Sun, L., & Yu, Y. (2012). Hostility, Physical Aggression and Trait Anger as Predictors for Suicidal Behavior in Chinese Adolescents: A School-Based Study. PLoS ONE, 7(2), 1-5.
APPENDIX
Features of Included Instruments

| Name | Measures | Number of items (time to complete) | Domains or Subscales | Time measured | Question types | Age (years) | Cost ($AUD), June 2016 |
|---|---|---|---|---|---|---|---|
| Nisonger Child Behaviour Rating Form (NCBRF) | Parent assessment of problem behaviours and social competence in children with low IQs/mental retardation. | 76 | 8 | Over the past month (for closed questions); 1-2 months for open questions. | Closed and 1 open | 3-16 | 0 |
| Conners' 3rd Edition | Behaviours associated with conduct and aggression in ADHD +/- CD/ODD. | 110 (20 minutes) | 7 | Past month | Closed | 6-18 | 650 |
| Children's Aggression Scale (CAS) | Parent observed frequency and severity of aggressive and disruptive behaviours. | 33 (Parent), 23 (Teacher) (10-15 minutes) | 5 | Past year | Closed and 2 open | 5-18 | 380 |
| Aberrant Behaviour Checklist (Community) (ABC-C) | Mental retardation problem behaviours in residential/community/educational settings | 58 | 5 | Not available | Closed | Child to adult, evidence lacking for <5 yrs old [46] | 173 |
| Child Behaviour Checklist (CBCL) | Behavioural and emotional functioning and social competence | 120 (15-20 minutes) | 2 | Present to 6 months | Open and closed | 6-18 years | 180 |
| Behaviour Problems Inventory (BPI-01) | Self-injurious, stereotyped and aggressive/destructive behaviours in the intellectually disabled | 52 | 3 | 2 months | Closed | Children - adults | N/A |
| Modified Overt Aggression Scale (MOAS) | Assess the nature, prevalence and severity of aggression in psychiatric populations. | 16 (<15 minutes) | 4 | Not specified | Closed | Not specified | 0 |
| Disruptive Behaviour Disorder Rating Scale | Identify symptoms of ADHD, ODD, and CD in children and adolescents. | 45 (5-10 minutes) | 4 | 6 months | Closed | Not specified | 0 |
| Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) | Parent assessment of ADHD, ODD and aggression symptoms | 90 items (10 minutes) | Varied | Weekly to life-time | Closed | Not specified | 0 |
Graphical abstract
Highlights
Instruments used to measure violent and aggressive behaviors in youth are lacking
These tools are essential for conducting randomised controlled trials (RCTs) in this population
We reviewed the instruments used in RCTs in violent and aggressive youth
Most instruments aren’t specifically focussed on violence and aggression
The CAS is psychometrically sound and measures aggressive and violent behaviors