Instruments for evaluating pharmacotherapy intervention efficacy in violent and aggressive behavior and conduct disorder in youth


Jessica L. Hambly, Sohil Khan, Brett McDermott, William Bor, Alison Haywood

PII: S1359-1789(16)30159-8
DOI: 10.1016/j.avb.2017.04.004
Reference: AVB 1106

To appear in: Aggression and Violent Behavior

Received date: 15 October 2016
Revised date: 2 March 2017
Accepted date: 13 April 2017

Please cite this article as: Jessica L. Hambly, Sohil Khan, Brett McDermott, William Bor, Alison Haywood, Instruments for evaluating pharmacotherapy intervention efficacy in violent and aggressive behavior and conduct disorder in youth. AVB (2017), doi: 10.1016/j.avb.2017.04.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Instruments for evaluating pharmacotherapy intervention efficacy in violent and aggressive behavior and conduct disorder in youth

Abbreviated Title: Instruments measuring pharmacotherapy efficacy in violent and aggressive youth

Authors:

Jessica L Hambly (a), Sohil Khan (a, b), Brett McDermott (c), William Bor (b, d), Alison Haywood (a, b)

(a) School of Pharmacy, Menzies Health Institute Queensland, Griffith University, Gold Coast, QLD, 4222, Australia
(b) Mater Research Institute - The University of Queensland, Brisbane, QLD, Australia
(c) School of Medicine and Dentistry, James Cook University, Townsville, Australia
(d) Mater Child and Youth Mental Health Service, Mater Health Service, South Brisbane, QLD, Australia

Author emails:

Ms. Jessica Hambly – [email protected]
Dr. Sohil Khan – [email protected]
Prof. Brett McDermott – [email protected]
Dr. William Bor – [email protected]
Dr. Alison Haywood – [email protected]

Corresponding author:

Jessica Hambly
School of Pharmacy, Gold Coast Campus
Griffith University QLD 4222
Australia
Tel: +61 431985230; Fax: +61 755528804; Email: [email protected]

Abstract

There is a need to identify the most appropriate standardized instruments for research evaluating pharmacotherapy for youth with violent and aggressive behaviors. Youth violence and aggression are heterogeneous behaviors that differ by age and gender. Instruments used in randomized controlled trials evaluating the efficacy of pharmacotherapy in conduct disorder and its comorbidities were reviewed for psychometric, administrative and practicality evidence. Evidence was rated on a 3-point scale adapted from the Scientific Advisory Committee's Instrument Review Criteria.

Of the nine included instruments, the Nisonger Child Behavior Rating Form (NCBRF), Conners' 3rd Edition, and Behavior Problems Inventory (BPI-01) were rated highest for their psychometric properties. The Children's Aggression Scale (CAS), Aberrant Behavior Checklist (ABC) and Disruptive Behavior Disorder Rating Scale (DBDRS) were rated moderate, and the Child Behavior Checklist (CBCL), Modified Overt Aggression Scale (MOAS) and Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) were rated lowest. The NCBRF, BPI-01 and CAS were the only instruments that could be used to measure both the frequency and severity of aggressive behaviors. The CAS and MOAS featured the most items pertaining to violence and aggression.

The broad-band scales, the NCBRF and Conners' 3rd Edition, were rated highest for their psychometric properties; however, their usefulness in youth violence and aggression research is limited. The heterogeneity of aggressive and violent behaviors, age, gender, functional level, situational context and the type of informant should be taken into account when selecting an appropriate instrument. All items in the CAS and the MOAS can be used to measure violent and/or aggressive behaviors. Further research into the psychometric properties of the MOAS in violent and aggressive youth is required before its use can be recommended. The CAS was found to be the most psychometrically sound and useful instrument that exclusively measures aggressive behaviors in youth.

Keywords: instruments, outcome measures, aggression, violence, youth, efficacy


1.1 BACKGROUND

Youth, defined as those less than 25 years old, comprise approximately 44% of the world's population (Sawyer et al., 2000b). Half of all diagnosed psychological disorders typically develop before age 14, emphasising the importance of youth in relation to mental health conditions and disorders (Erskine et al., 2015). Recent reviews of 21st century trends suggest declines in youth violent and aggressive behavior across developed nations (Bor, Dean, Najman, & Hayatbakhsh, 2014; Collishaw, 2015). Despite these declining or stabilizing levels, the moderate to high levels of the associated problems result in considerable costs to the affected individuals, their families, and the community (Bor et al., 2014). In the United States, physical violence in school-aged children was reported at 24.7% in 2013 (Frieden, Jaffe, Cono, Richards, & Iademarco, 2014). In China, a large cross-sectional sample of urban adolescents revealed self-reported mean rates of 13.38% for physical aggression and 12.95% for verbal aggression in 2012 (Zhang et al., 2012). Further, an Australian population study found that 8.8% of adolescents aged 11 to 17 years reported conduct problems (Lawrence et al., 2015), and in the United Kingdom, 6.6% of adolescents aged 11 to 16 received a clinical diagnosis of conduct disorder (CD) (Etchells, Gage, Rutherford, & Munafo, 2016).

Aggression may take many forms (e.g., physical and verbal) and serve many functions (e.g., impulsive and instrumental) (Farmer et al., 2016). Aggressive behavior can be defined as an act directed towards a specific person, object or animal with the intent to hurt or frighten. Children more commonly present with overt than covert aggressive behaviors, and onset occurs more often in childhood than in adolescence (Reebye & Moretti, 2005). Violence can be defined as the intentional use of threatened or actual physical force or power against oneself or another person, which results in, or has a high likelihood of resulting in, harm (World Health Organisation (WHO), 1991). As a behavioral construct, aggression is one of the core characteristics considered in the diagnosis of CD in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-V) (American Psychiatric Association (APA), 2013). CD is often considered a diagnostic challenge in child and adolescent mental health, due to heterogeneity within the classification and the high prevalence of comorbidities (Klahr & Burt, 2014).

Youth aggression is a multifaceted phenomenon that is displayed differently depending on age and gender (Halperin, McKay, & Newcorn, 2002). Several points need to be considered before identifying aggressive behavior as a disorder, such as differentiating prosocial assertive play from behavior intended to hurt or frighten. Aggressive symptoms may also change with developmental competence in motor and cognitive domains. For example, preschool-aged children may display instrumental and physical expressions of aggression, whereas school-aged children may exhibit hostile aggression through name-calling, criticizing and ridiculing (Reebye, 2005).

Intervention monitoring and evaluation studies must distinguish between normal and abnormal levels of violence and aggression, and characterize the nature and quality of behaviors that fall outside typical developmental expectations (Halperin, McKay, Grayson, & Newcorn, 2003).

Behavior rating scales are instruments used to complete a relatively quick, normative-based assessment of child behavior. They can be used to measure clinical outcomes such as the efficacy and safety of an intervention, and/or patient-related factors that may influence the success of an intervention (e.g., adherence, social environment, quality of life) (Hunsley & Mash, 2010). In clinical research, practitioner assessments are limited by a lack of inter-rater and test-retest reliability, difficulties standardizing clinical knowledge and experience, and cost and time constraints (Slade, Thornicroft, & Glover, 1999). For outcome measurement in clinical research, assessment tools or instruments are therefore often employed. These are typically formal, structured instruments that demonstrate psychometric properties such as reliability and validity (Slade et al., 1999). Use of a statistically reliable instrument, selected for its applicability to the population being studied and the questions being asked, is essential for effective research (Suris et al., 2004). There is a need to identify appropriate instruments for use in violent and aggressive youth, particularly as there are reports of unstandardized instruments with unclear psychometric properties being used in contemporary research (Elson, Mohseni, Breuer, Scharkow, & Quandt, 2014).

Multiple instruments are available for the diagnosis and assessment of CD in youth, including those that assess CD together with other behavioral symptoms and comorbidities (i.e., broad-band), and those that are specific to violence and aggression (i.e., narrow-band) (Farmer et al., 2016; Hersen, 2006). In clinical research, it is useful if the instrument measuring aggression and violence can assess the frequency and/or severity of problem behaviors (Halperin et al., 2002). Although pharmacotherapy is seldom first-line in the management of violent and aggressive behaviors in youth, numerous classes of medications have been documented, the large majority supported only by anecdotal evidence and exploratory investigation (Hambly, Khan, McDermott, Bor, & Haywood, 2016). A review investigating the role of pharmacotherapy in CD revealed that no particular instrument was a 'gold-standard' measure of outcome for evaluating violent and aggressive behaviors in youth receiving pharmacological intervention (Hambly et al., 2016). There is therefore a need to identify the most appropriate standardized instruments for pharmacotherapy intervention efficacy research involving youth with violent and aggressive behaviors.

1.2 METHODS

1.2.1 Search strategy

A review evaluating the psychometric, administrative and practicality evidence of instruments used in pharmacotherapy efficacy studies in violent and aggressive youth was conducted. First, instruments used to evaluate intervention efficacy in randomized controlled trials (RCTs) of pharmacotherapy for CD and its related comorbidities in youth were identified from the most recently published systematic review (Hambly et al., 2016). The trials were published in English from January 2000 to January 2016, the cut-off of an updated search. Second, the name of each instrument and the keywords 'validity', 'reliability', 'children', 'adolescents', and 'youth' were used to search for literature in MEDLINE, OVID SP and EMBASE. Literature for each tool was investigated for evidence of, or instrument-specific information pertaining to, the criteria summarized in Table 1. Additional literature was identified by searching the reference lists of papers identified after initial screening. For the analysis of validity and reliability, the search focused on use of the instruments in youth populations who displayed violent and/or aggressive behavior. The titles and abstracts of all articles identified were screened independently by two review authors (JH and SK). Where their opinions conflicted, a third author (AH) determined whether inclusion was appropriate.

1.2.2 Scoring

1.2.2.1 Psychometric and administrative evaluation

Scoring criteria for the psychometric and administrative properties of each instrument are outlined below. Retrieved information was graded on a 3-point scale adapted by Andresen (2000) from the Scientific Advisory Committee's Instrument Review Criteria (Lohr et al., 1996). Generally, an instrument displaying high-quality evidence for the evaluated characteristic was awarded an 'A' grade. A 'B' grade was given if the evidence was of moderate quality, a 'C' grade if it was of poor quality, and an unknown ('U') grade if no information was available. To rank the instruments, a numeric score was assigned to the letter grades as follows: U=0, A=1, B=2, C=3.

1.2.2.2 Reliability

Reliability refers to the degree to which an instrument consistently produces the same result at different times, from independent observers, and/or in different samples (Wasserman & Bracken, 2003). Three general classes of reliability are used when evaluating instruments:

• Inter-rater/observer reliability assesses the degree to which independent interviewers give consistent estimates using identical instruments (Roberts & Priest, 2006).
• Test-retest reliability assesses the consistency of results from two separate administrations of an instrument to the same participants (Roberts & Priest, 2006).
• Internal consistency reliability assesses the extent to which items in a given instrument, domain, or sub-test are grouped correctly. This is often measured by Cronbach's coefficient alpha (Cicchetti, 1994; Cronbach, 1951).
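Cronbach's alpha compares the variance of the individual items with the variance of the total score: α = k/(k − 1) × (1 − Σσ²ᵢ / σ²total), where k is the number of items. The Python sketch below computes it from a respondents-by-items score matrix using only the standard library; the function name and the data layout are illustrative assumptions, not part of any instrument's published scoring procedure. Applied to dichotomous (0/1) items, the same formula yields KR-20.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    Each inner list holds one respondent's scores on the same k items.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents (the matrix columns).
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: three respondents answering two items.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))  # 0.667
```

Because the same (sample) variance estimator is used for items and totals, the n/(n − 1) correction cancels in the ratio, so the result matches the population-variance form of the formula.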

For inter-rater and test-retest reliability, an intraclass correlation coefficient (ICC) or kappa coefficient (κ) ≥0.80 was considered excellent (graded 'A'), <0.60 poor (graded 'C'), and values between these ranges (≥0.60, <0.80) moderate (graded 'B') (McHugh, 2012). For internal consistency of scales, a Cronbach's alpha (α) or Kuder-Richardson Formula 20 (KR-20) ≥0.80 was scored excellent ('A'), ≥0.70 and <0.80 adequate ('B'), and <0.70 low or inadequate ('C') (Andresen, 2000; Endicott, Tracy, Burt, Olson, & Coccaro, 2002).

1.2.2.3 Validity

An instrument is considered robust in validity if there is strong evidence and theory that support the appropriateness, meaningfulness, and usefulness of test results and their interpretation (Colliver, Conlee, & Verhulst, 2012). If the multitrait-multimethod matrix (MTMM) and/or a correlation coefficient was used to assess convergent validity, the following grades were allocated: 'A' = strong correlation (≥0.60), 'B' = moderate correlation (≥0.30 and <0.60), and 'C' = weak correlation (<0.30) (Campbell & Fiske, 1959). The opposite scale was applied for discriminant validity: 'A' = ≤0.30, 'B' = >0.30 and ≤0.60, and 'C' = >0.60 (Campbell & Fiske, 1959). For factorial validity, a root mean square error of approximation (RMSEA) ≤0.05, a non-normed fit index (NNFI) ≥0.95 or a comparative fit index (CFI) ≥0.95 (Lecavalier, Aman, Hammer, Stoica, & Mathews, 2004) was considered a good fit ('A'). An RMSEA >0.05 and ≤0.08, or an NNFI or CFI ≥0.90 and <0.95, was considered an acceptable fit ('B'), and values outside these ranges were considered a poor fit ('C'). All other grades were allocated on face validity or statistical properties, depending on which method was used.

1.2.2.4 Sensitivity – responsiveness

The sensitivity, or responsiveness, of an instrument refers to its ability to detect small but meaningful changes due to an intervention (Andresen, 2000; Lim, Liong, Lau, & Yuen, 2015). Effect size and standardized response mean are the two most common statistical measures of responsiveness (Lim et al., 2015). An effect size index or standardized response mean of 0.5 or greater but less than 0.8 is considered a moderate effect, and 0.8 or greater a large effect (Lim et al., 2015). If an instrument had evidence for responsiveness expressed as an effect size and derived from longitudinal data with a clearly identified population and documented time intervals, it was awarded an 'A' rating. If there was weak evidence, or the evidence was based solely on statistical significance (p value), it was awarded a 'C'. If evidence fell between these two extremes, the instrument was rated a 'B'.

1.2.2.5 Brevity – respondent burden

Respondent burden is defined as the time, energy and demands placed upon those being administered an instrument (Lohr et al., 1996). A maximum of 15 minutes is considered appropriate for a primary outcome measure (Andresen, 2000). The content should be appropriate for the target population, and special training or resources should not be required. If a long questionnaire is warranted or particularly inconvenient, a study protocol should include compensation to reduce this burden (e.g., payment) (Andresen, 2000). Assessment of respondent burden may take the form of both quantitative and qualitative investigation. Quantitative assessment may include rating respondent satisfaction, whereas qualitative assessment may include reviewing respondent opinions and reactions (Andresen, 2000). A rating of 'A' was awarded for brevity (<15 minutes duration) and appropriate content. Grade 'B' was awarded for instruments taking less than 30 minutes with some evidence of appropriate content. Grade 'C' was applied to instruments that were lengthy and had no evidence of appropriateness.

1.2.2.6 Simplicity – administrative burden

An administratively simple questionnaire is short, easily scored without the need for a sophisticated computer program, and does not require specialist training or qualifications (Andresen, 2000; Lohr et al., 1996). If data gathered using the instrument were easily scored by hand and understood, and the instrument was inexpensive and required no formal training, it was awarded an 'A'. If it was expensive and onerous to administer and score, and interpretation was not straightforward, it was rated a 'C'. Ratings falling between the two extremes were ranked 'B'.

1.2.2.7 Acceptability – cultural or language adaptations

The assessment of a translation or cultural adaptation of an instrument involves two key criteria: ensuring conceptual and linguistic equivalence, and evaluating measurement properties (Lohr et al., 1996). To receive a high rating ('A'), the instrument had evidence of validity for cultural adaptation and/or information about the methods used to achieve linguistic equivalence (e.g., forward and backward translations). If no evidence could be found for the validity and methodology of translation or adaptation, a 'C' grade was awarded. Instruments falling in between were given a 'B' for acceptability.

1.2.2.8 Accessibility – alternative mediums

The availability and assessment of alternative versions for administering an instrument (other than the original version) is a concept referred to as accessibility (Andresen, 2000; Lohr et al., 1996). To rate an 'A' for accessibility, instruments were available in multiple forms and found free of mode effects (i.e., no differences between the original version and the transformed version upon administration). To rate a 'B', the instrument was available in other forms, but no testing for mode effects was evident. A 'C' grade reflected that no alternative forms or mode information were available (Andresen, 2000).

1.2.2.9 Conceptual model

A conceptual model is a rationale for, and description of, the concept/s that the instrument is intended to assess and the relationship between those concepts (Lohr et al., 1996). For assessment, instruments were graded on how well they captured the intended conceptual framework, reflecting how carefully each instrument defined its measurement construct and domains. If domains were completely covered an 'A' was awarded, adequate coverage attracted a 'B', and inadequate coverage was given a 'C' (Andresen, 2000).

1.2.2.10 Measurement model

A measurement model refers to an instrument's scale and subscale structure, and the procedure followed to create scale and subscale scores (Lohr et al., 1996). For scoring, an 'A' was awarded if there was no skewness or evidence for development of scoring, a 'B' if there was intermediate skewness or conflicting evidence for development of scoring, and a 'C' for substantial skewness (>20% of a group's scores at either extreme of a range) (Andresen, Rothenberg, Panzer, Katz, & McDermott, 1998) or no evidence for development of scoring.

1.2.2.11 Bias

Instruments should not be affected by differences in culture, social circumstances, or impairment type, except for differences the tool intends to measure. Statistical methods such as Rasch analysis or confirmatory factor analysis (CFA) can assist with the detection of item and instrument bias (Smith, 2004); however, grading of qualitative evidence of bias is largely subjective. For subjective grading, an 'A' was awarded to instruments with clear evidence of review for bias. A 'B' grade was awarded if formal evidence was lacking but the instrument had good face validity for bias, and a 'C' grade was awarded if bias was evident.

1.2.3 Applicability to violence and aggression research

For practicality assessment, instruments were also evaluated independently of their psychometric properties, investigating their ability to measure violent and aggressive behaviors and the frequency and severity of those behaviors. Instruments were further screened against the following questions:

• Can the instrument be used to measure the frequency of violent behaviors?
• Can the instrument be used to measure the frequency of aggressive behaviors?
• Can the instrument be used to measure the severity of violent behaviors?
• Can the instrument be used to measure the severity of aggressive behaviors?
• How many items in the instrument measure violent behaviors?
• How many items in the instrument measure aggressive behaviors?
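The six screening questions above amount to a per-instrument record of four yes/no answers and two item counts. A minimal Python sketch of such a record follows; the class, its field names and the `shortlist` helper are hypothetical, and the example instruments are placeholders rather than findings of this review.

```python
from dataclasses import dataclass


@dataclass
class ApplicabilityScreen:
    """Answers to the six applicability questions for one instrument."""

    name: str
    violence_frequency: bool    # Q1: frequency of violent behaviors?
    aggression_frequency: bool  # Q2: frequency of aggressive behaviors?
    violence_severity: bool     # Q3: severity of violent behaviors?
    aggression_severity: bool   # Q4: severity of aggressive behaviors?
    violence_items: int         # Q5: number of items measuring violence
    aggression_items: int       # Q6: number of items measuring aggression

    def covers_aggression_frequency_and_severity(self) -> bool:
        return self.aggression_frequency and self.aggression_severity


def shortlist(screens: list[ApplicabilityScreen]) -> list[ApplicabilityScreen]:
    """Keep instruments that capture both frequency and severity of aggression,
    ordered by the number of aggression items (most first)."""
    eligible = [s for s in screens if s.covers_aggression_frequency_and_severity()]
    return sorted(eligible, key=lambda s: s.aggression_items, reverse=True)


# Placeholder instruments; the answers are invented for illustration only.
screens = [
    ApplicabilityScreen("Instrument X", False, True, False, True, 0, 12),
    ApplicabilityScreen("Instrument Y", True, True, True, True, 4, 20),
    ApplicabilityScreen("Instrument Z", False, True, False, False, 0, 30),
]
print([s.name for s in shortlist(screens)])  # ['Instrument Y', 'Instrument X']
```

Keeping the four yes/no answers separate from the two item counts lets the frequency/severity filter and the item-count ordering be applied independently.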

Table 1 summarises the quality assessment criteria applied to instruments in the current systematic review.

Table 1: A summary of characteristics considered desirable for an appropriate standardized outcome measure

| Characteristic | Explanation | Criteria for grading (Andresen, 2000; Lohr et al., 1996; McHugh, 2012) |
|---|---|---|
| Reliability | Measurements by different observers, at different times or in parallel ways produce the same result. | Inter-rater/test-retest (ICC or κ): A = ≥0.80; B = ≥0.60, <0.80; C = <0.60. Internal consistency (Cronbach α or KR-20): A = ≥0.80; B = ≥0.70, <0.80; C = <0.70 |
| Validity | The instrument measures the intended outcome. | Convergent (correlation coefficient): A = ≥0.60; B = ≥0.30, <0.60; C = <0.30. Factorial (RMSEA): A = ≤0.05; B = >0.05, ≤0.08; C = >0.08. Factorial (CFI or NNFI): A = ≥0.95; B = ≥0.90, <0.95; C = <0.90 |
| Responsiveness | The instrument captures clinically meaningful degrees of change. | A = Strong evidence with large effect size; B = Moderate or conflicting evidence; C = Weak evidence or based only on statistical significance, not effect size |
| Brevity | The instrument length and content is appropriate. | A = <15 minutes and appropriate content; B = >15 minutes, <30 minutes and appropriate content; C = >30 minutes and/or inappropriate content |
| Simplicity | No formal training is required to use the instrument. It is easy and inexpensive to administer, score and interpret. Ratings are clear. | A = Inexpensive, no training, easy scoring and interpretation; B = Somewhat expensive, more obscure scoring and interpretation; C = Expensive and/or complex scoring and/or interpretation |
| Acceptability | Cultural and language adaptations of the instrument exist. | A = Validity evidence for translation and/or adaptation and methodology; B = Evidence has some problems for translation and/or adaptation; C = No evidence for translation and/or adaptation and methodology |
| Accessibility | The instrument is available and tested in different mediums, e.g., computer, self-administration, clinician interviews. | A = Available in multiple forms, no mode effects; B = Available in multiple forms, no testing for mode effects; C = No alternative forms or mode information |
| Conceptual model | There is a rationale and description of the concept/s the instrument intends to assess, and the relationship between the concepts. | A = Clearly defined measurement construct and domains; B = Adequately defined measurement construct and domains; C = Inadequately defined measurement construct and domains |
| Measurement model | The instrument captures the detail and breadth that is intended. | A = No skewness or evidence for scoring methodology; B = Skewness or intermediate/conflicting evidence for scoring methodology; C = Substantial skewness (>20% in extreme range) or no evidence for scoring methodology |
| Bias | The instrument is not affected by differences in culture and social context. | A = Evidence of review for bias, no bias present; B = Face validity, no formal evidence; C = Bias evident |
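The numeric cut-offs in Table 1, together with the responsiveness and skewness rules described in the scoring criteria, lend themselves to a small lookup. The Python sketch below is a hypothetical helper, not software used in this review: the statistic keys and function names are invented labels, and the effect size (mean change divided by the baseline standard deviation) and standardized response mean (mean change divided by the standard deviation of change scores) are assumed common formulations, since the text names these statistics without defining them. Judgement-based criteria such as simplicity or bias are not modelled, and a grade for a single coefficient is not the review's overall rating of an instrument.

```python
from statistics import mean, stdev

# Cut-offs transcribed from Table 1, plus the discriminant-validity rule from
# the validity criteria. Each entry is (test for 'A', test for 'B'); any value
# failing both tests is graded 'C'.
CUTOFFS = {
    "icc_kappa": (lambda v: v >= 0.80, lambda v: v >= 0.60),
    "alpha_kr20": (lambda v: v >= 0.80, lambda v: v >= 0.70),
    "convergent": (lambda v: v >= 0.60, lambda v: v >= 0.30),
    "discriminant": (lambda v: v <= 0.30, lambda v: v <= 0.60),
    "rmsea": (lambda v: v <= 0.05, lambda v: v <= 0.08),
    "cfi_nnfi": (lambda v: v >= 0.95, lambda v: v >= 0.90),
}

# Numeric values assigned to letter grades in Section 1.2.2.1.
GRADE_SCORE = {"U": 0, "A": 1, "B": 2, "C": 3}


def grade(statistic, value=None):
    """Letter grade for one reported coefficient; 'U' when none was reported."""
    if value is None:
        return "U"
    is_a, is_b = CUTOFFS[statistic]
    if is_a(value):
        return "A"
    if is_b(value):
        return "B"
    return "C"


def effect_size(baseline, follow_up):
    """Assumed formulation: mean change divided by the baseline SD."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Assumed formulation: mean change divided by the SD of the changes."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def magnitude(index):
    """Label an effect size or SRM with the 0.5 / 0.8 cut-offs (Lim et al., 2015).

    The absolute value is used because improvement on an aggression scale
    usually appears as a fall in scores, i.e. a negative change.
    """
    size = abs(index)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    return "below moderate"


def substantial_skew(scores, minimum, maximum, threshold=0.20):
    """Measurement-model 'C' rule: >20% of scores at either end of the range."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == minimum) / n
    at_ceiling = sum(1 for s in scores if s == maximum) / n
    return at_floor > threshold or at_ceiling > threshold


# RMSEA values reported for the NCBRF subscales in Section 1.3.1.
print(grade("rmsea", 0.056), grade("rmsea", 0.086))  # B C

# Hypothetical pre/post scores on an aggression scale (lower is better).
before, after = [10, 12, 14, 16], [8, 9, 12, 13]
print(round(effect_size(before, after), 2))  # -0.97
print(magnitude(standardized_response_mean(before, after)))  # large
```

Storing each cut-off as a pair of predicates mirrors the layout of Table 1: one condition for 'A', one for 'B', and 'C' as the remainder.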

1.3 RESULTS

A total of 12 RCTs applied instruments for measuring the efficacy of pharmacotherapy in conduct disorders, based on the recently published systematic review (Hambly et al., 2016). Twenty-four instruments were identified, of which 15 were excluded because they lacked aggressive and/or violent constructs, items assessing violent and/or aggressive behaviors, or published information. Seven instruments identified in the systematic review (Hambly et al., 2016) met the criteria for analysis (Figure 1): the NCBRF (M.G. Aman, De Smedt, Derivan, Lyons, & Findling, 2002; Stocks, Taneja, Baroldi, & Findling, 2012), the Conners' 3 (Connor, McLaughlin, & Jeffers-Terry, 2008), the BPI-01 (M.G. Aman et al., 2002; Ter-Stepanian, Grizenko, Zappitelli, & Joober, 2010), the ABC (M.G. Aman et al., 2002), the CBCL (Blader, Pliszka, Jensen, Schooler, & Kafantaris, 2010; Connor, Barkley, & Davis, 2000; Padhy et al., 2011; Steiner, Petersen, Saxena, Ford, & Matthews, 2003; Steiner, Saxena, et al., 2007; Wehmeier et al., 2011), the MOAS (Blader et al., 2010; Connor et al., 2008; Donovan et al., 2000; Malone, Delaney, Luebbert, Cater, & Campbell, 2000) and the SNAP-IV (Stocks et al., 2012; Wehmeier et al., 2011). The CAS and the DBDRS were identified through reference searching. All nine instruments were scored on their psychometric and administrative properties (Table 2).

[Figure 1 (flow diagram): 24 instruments screened from 12 drug trials (RCTs) in CD (trial data extracted and evaluated from recently published systematic review) → 8 instruments excluded (lacking aggressive/violent behavior constructs/items) → review of literature for reliability and validity studies for instruments, with a further 2 instruments identified from reference search → 9 instruments excluded (insufficient data or previous version of included instrument) → 9 instruments evaluated against criteria. Records included: NCBRF 8, Conners' 3 6, BPI-01 6, CAS 2, ABC 6, DBDRS 5, CBCL 7, MOAS 6, SNAP-IV 3.]

Figure 1: Selection of relevant instruments, presented as a flow diagram

1.3.1 Nisonger Child Behavior Rating Form (NCBRF)

The NCBRF is designed to analyze the behavior of youth aged 3 to 16 with intellectual disability and/or autism spectrum disorders (ASD) (Mircea, Rojahn, & Esbensen, 2010). It is a modified version of the Child Behavior Rating Form (CBRF) (Edelbrock, 1985) and has a teacher and a parent version (Mircea et al., 2010). A Typical IQ Version has also been developed for rating children of normal developmental ability (M. Aman et al., 2008). The NCBRF consists of two key subscales (social competence and problem behavior), eight domains and 76 items (Michael G. Aman, Tassé, Rojahn, & Hammer, 1996; Tassé, Aman, Hammer, & Rojahn, 1996). The instrument is free to access and appears straightforward to score. It exists in a psychometrically sound Romanian form (Mircea et al., 2010). Age and gender bias may be present for the conduct problems and insecure/anxious subscales (Tassé et al., 1996).

Several psychometric studies have examined various forms of reliability and validity of the NCBRF, with mixed results (Mircea et al., 2010). For reliability, internal consistency was high for both the parent (Cronbach α = 0.85) and teacher (Cronbach α = 0.87) forms in the problem behavior subscale (Michael G. Aman et al., 1996). For the social competence subscale, internal consistency was moderate for parent reports (Cronbach α = 0.78) and high for teacher reports (Cronbach α = 0.84) (Michael G. Aman et al., 1996). Moderate inter-rater reliability was shown between parents and teachers in the problem behavior subscale (median correlation coefficient = 0.51), which may be explained by the different situational contexts in which the raters observe the child. For validity, confirmatory factor analysis (CFA) found the factor structure of the social competence subscale to be acceptable (RMSEA = 0.056), whereas that of the problem behavior section may be suboptimal (RMSEA = 0.086) (Norris & Lecavalier, 2011); similar findings were reported in youth with ASD (Lecavalier et al., 2004). Comparison with the Aberrant Behavior Checklist (ABC) supported the convergent validity of both the parent (median correlation coefficient = 0.72) and teacher (median correlation = 0.69) versions (Michael G. Aman et al., 1996). Additionally, comparison of parent ratings on the conduct problems subscale with the Disruptive Behavior Checklist – Disruptive/Antisocial subscale further supported convergent and criterion validity (Norris & Lecavalier, 2011).

1.3.2 Conners' 3rd Edition

The use of Conners' behavioral rating scales has long been documented, the current edition being the Conners' 3 (Conners, 2010; Westerlund, Ek, Holmberg, Naswall, & Fernell, 2009). There is a multitude of published literature on previous editions of the scales, with well-established validity and reliability evidence (Conners, Sitarenios, Parker, & Epstein, 1998; Waschbusch & Elgar, 2007). The 57 questionnaire items in the Conners' 3 are based largely on the DSM (American Psychiatric Association (APA), 2013) and principles of the International Statistical Classification of Diseases (Westerlund et al., 2009; World Health Organisation (WHO), 1991). They assess ADHD, common comorbidities (e.g., ODD and CD) and associated problem behaviors in youth aged 6 to 18 (Conners, 2010). Conners' 3 items are rated on a 4-point Likert scale, where higher scores are associated with a greater number and/or frequency of concerns (Conners, 2010). Three different scales can be used to analyze behavior in the Conners' 3, namely the Content Scales, the DSM Symptom Scales and the Validity Scales. Flexible administration by qualified professionals is available in online, offline and written modes, and the instrument is offered in three forms (parent, teacher and self-report) (Conners, 2010). The instrument is expensive (~$650 AUD) and appears complex to score, either by hand or by computer (Conners, 2010). However, scoring is weighted for age and gender (Conners, 2010). There is a lack of published evidence for the Conners' 3 and its DSM-V version, perhaps due to the regularity of its updates. Overall, the Conners' 3 had high-quality evidence for reliability, with data for internal consistency (parent Cronbach α = 0.91, teacher Cronbach α = 0.94, self-report Cronbach α = 0.88) (Conners, 2010), inter-rater reliability, and test-retest reliability (Conners, 2010). There are reports of moderate to high validity across multiple statistical analyses, although not all data were reported (Conners, 2010). It has been translated into other languages including French (Catale, Geurten, Lejeune, & Meulemans, 2014; Fumeaux et al., 2016), Chinese (Xiao, 2012) and Spanish; however, published translation methodology is lacking (Conners, 2010).

1.3.3 The Behavior Problems Inventory (BPI-01)

The Behavior Problems Inventory (BPI-01) is a 52-item instrument used to evaluate self-injurious, stereotypic, and aggressive/destructive behavior in people with mental retardation and other developmental disabilities (Rojahn, Matson, Lott, Esbensen, & Smalls, 2001). The evolution of the BPI-01 has been extensively documented, with a well-defined measurement construct and domains (Rojahn et al., 2001). Since its first German version in the 1980s, it has undergone multiple revisions (Rojahn, 1984; Rojahn et al., 2001). The BPI-01 has been translated into 11 languages (Mircea et al., 2010). Studies have examined the psychometric properties of the BPI-01 across child and adult samples (Rojahn et al., 2013). A comprehensive analysis of the BPI-01 in 1122 adults and children with intellectual disabilities (mean age = 34.4 years) from several sites reported moderate to high internal consistency for the frequency and severity ratings of all three subscales (Cronbach's α = 0.73 to 0.92) (Hastings et al., 2012). Furthermore, a study including participants aged 14 to 91 years found moderate test-retest reliability for the BPI-01 (ICC = 0.64 to 0.76) (Rojahn et al., 2001). Evidence for strong convergent and discriminant validity has also been reported, with the factor structure endorsed by confirmatory factor analysis (Matson, Wilkins, Boisjoli, & Smith, 2008; Rojahn et al., 2012). In addition, several studies have found evidence supporting the criterion-related validity of the BPI-01 against the Repetitive Behavior Scale (Spearman correlation = 0.77 for self-injurious and 0.68 for stereotyped behavior) (Rojahn et al., 2010) and against the Aberrant Behavior Checklist in adults (Rojahn, Aman, Matson, & Mayville, 2003).

Few studies have assessed the BPI-01 exclusively in youth. A study of 237 ethnically diverse youth (4 to 22 years old) with severe developmental disabilities and/or behavioral challenges supported the validity and reliability of the BPI-01 in this population (Rojahn et al., 2010). In that study, the mean single-measure teacher-teacher reliability and the test-retest reliability were moderate to excellent (ICC = 0.76 and ICC = 0.84, respectively). Internal consistency ranged from low to high across the three subscales (Cronbach's α = 0.86 for stereotypic, 0.88 for aggressive/destructive and 0.59 for self-injurious behavior) (Rojahn et al., 2010). Further, excellent internal consistency for the frequency (Cronbach's α = 0.87) and severity (Cronbach's α = 0.89) subscales, together with test-retest reliability, has been reported in infants and toddlers at risk for intellectual or developmental disabilities (Rojahn et al., 2013). The BPI-01 has also been found to have high convergent and discriminant validity when compared with the Nisonger Child Behavior Rating Form (NCBRF), as well as adequate factorial validity in youth (Rojahn et al., 2010).

1.3.4 The Children's Aggression Scale (CAS)

The CAS is designed to evaluate the setting-specific (i.e., home and school) frequency and severity of aggressive acts in non-institutionalized youth aged 5 to 18 years (Halperin et al., 2003). Two English versions exist: a Parent Rating Form with 33 items and a Teacher Rating Form with 23 items. The CAS covers five key domains, namely verbal aggression, aggression against objects and animals, provoked physical aggression, unprovoked physical aggression, and use of weapons. The Parent Rating Form has two additional domains: aggression against family members and aggression against non-family members (Halperin et al., 2002). Items in the CAS are weighted differentially according to the severity of the act (Halperin et al., 2002). Use of the CAS requires evidence of user qualifications, and its hand or computer-assisted scoring is sophisticated. Despite this administrative burden, it has high internal consistency for both the parent (Cronbach's α = 0.93) and teacher (Cronbach's α = 0.93) versions (Halperin et al., 2002). Its differentiation between the diagnostic subgroups of ADHD, ODD and CD, and its correlations with other instruments including the CBCL, support its validity (Halperin et al., 2002). The teacher and parent versions both display moderate to high convergent validity (Halperin et al., 2003; Halperin et al., 2002). The parent version has a good measurement model: a one-way analysis of variance (ANOVA) indicated significant differences in ratings across all five domains, and a continuum of severity also emerged (Halperin et al., 2002). Bias was not formally investigated; however, the reliability and validity studies featured an ethnically diverse population (Halperin et al., 2002). In the context of youth with violent and aggressive behaviors, all of the items in the CAS focus on aggression and violent outbursts, making it a valuable tool in CD clinical research.

1.3.5 Aberrant Behavior Checklist (ABC)

The Aberrant Behavior Checklist (ABC) is a 58-item instrument used to measure behavior problems in children and adults with mental retardation across five subscales (Rojahn et al., 2003). It exists in two forms, residential and community (ABC-C) (Aman, Singh, Stewart, & Field, 1985; Schmidt, Huete, Fodstad, Chin, & Kurtz, 2013). It has an excellent measurement model, with in-depth reporting of its development (Marshburn & Aman, 1992). According to its distributors, the ABC has been, or is currently being, translated into 39 languages (Aman, 2012); however, little information on translation methods and validation was found. Psychometric studies in toddlers have shown that the ABC-C is a reliable and valid behavior-rating instrument. Overall, moderate to high internal consistency has been demonstrated for each subscale across two studies (Cronbach's α = 0.68 to 0.90 (Karabekiroglu & Aman, 2009) and 0.81 to 0.96 (Rojahn et al., 2013)). Moderate convergent validity has been shown in clinician-rated youth with intellectual disabilities and challenging needs (Hill, Powlitch, & Furniss, 2008). Furthermore, high convergent validity with the CBCL has been reported in toddlers with a variety of conditions attending outpatient psychiatry clinics (Karabekiroglu & Aman, 2009). Fewer than 10% of ABC-C items specifically measure violent and aggressive behaviors. However, studies validating the instrument have included participants with primary diagnoses of aggressive and/or violent behavior, such as CD (Karabekiroglu & Aman, 2009; Rojahn & Helsel, 1991).

1.3.6 The Disruptive Behavior Disorders Rating Scale (DBDRS)

The current DBDRS is based on the fourth edition of the DSM (Loona & Kamal, 2011). It consists of 41 items representing four domains of symptoms for CD, ODD and ADHD (Pelham, Gnagy, Greenslade, & Milich, 1992). Two scoring methods can be used to assign a DSM diagnosis to the child (Pelham et al., 1992). The DBDRS has been translated into Dutch (Antrop, Roeyers, Oosterlaan, & Van Oost, 2002), Spanish (Silva et al., 2005) and Urdu (Loona & Kamal, 2011). The Urdu version has displayed adequate psychometric properties, indicating that it is reliable for screening and diagnosis in the school and home settings (Loona & Kamal, 2011). In the same study, the internal consistency of the English version of the DBDRS ranged from moderate to high (Cronbach's α = 0.70 to 0.92) (Loona & Kamal, 2011). Furthermore, another study of teacher responses showed excellent internal consistency for the DBDRS (Cronbach's α = 0.91 to 0.95) across the inattentive (Cronbach's α = 0.93), hyperactive/impulsive (Cronbach's α = 0.91) and oppositional defiant (Cronbach's α = 0.94) subscales (Pelletier, Collett, Gimpel, & Crowley, 2006). High convergent validity of the DBDRS with the School Situations Questionnaire (SSQ) has been found in teacher responses for preschool-aged children (Pelletier et al., 2006). Low agreement between parent and teacher ratings has highlighted potential instrument bias (Antrop et al., 2002; Silva et al., 2005). Further studies evaluating the psychometric properties of the DBDRS are recommended, as its role in rating violent and aggressive behavior is not yet fully understood. Investigation of specific teacher and parent versions is also warranted: it was unclear whether separate versions exist, although one report indicated that the teacher version excluded the CD domain (Pelletier et al., 2006).
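Internal consistency throughout this section is reported as Cronbach's α (Cronbach, 1951): α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. A minimal Python sketch of that formula follows; the function name and the four-respondent data set are hypothetical, and it assumes complete data with no missing item responses.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes complete data (every respondent answered every item).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))                      # one tuple of scores per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical ratings: 4 respondents x 2 items.
ratings = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(ratings), 2))  # 0.75
```

Because α depends on the number of items as well as on how strongly they intercorrelate, α values from subscales of different lengths are not directly comparable.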

1.3.7 The Child Behavior Checklist (CBCL)

The CBCL is a 120-item parent-report instrument that measures the behavioral and emotional functioning and social competence of youth aged 6 to 18 (Achenbach, 1991). The CBCL has been updated to include DSM-oriented scales and to complement the new preschool version for children aged 18 months to 5 years. The CBCL has two domains, social competence and problem behaviors, assessed through open and closed questions (Siddons & Lancaster, 2004). These domains can be further divided into DSM-oriented scales (Nakamura, Ebesutani, Bernstein, & Chorpita, 2009), eight syndrome structures, or ratings of externalising/internalising behaviors (Ivanova et al., 2007). Parents rate each closed question on a three-point Likert scale (higher scores indicating a greater number of behaviors present) based on the preceding 6 months (Ivanova et al., 2007). The CBCL is part of a multi-informant group of instruments developed by Achenbach and colleagues, which also includes the Youth Self Report and the Teacher's Report Form (Ivanova et al., 2007). Professional qualifications are required to access the costly (~$700AUD) instrument, and scoring is completed digitally or by hand (Sawyer et al., 2000a). Raw scores, T-scores and thresholds can be used to summarise the CBCL in multiple ways under different scales, syndromes and structures (Sawyer et al., 2000a). The CBCL User Manual reports high inter-rater reliability (ICC = 0.93, p < 0.001) and test-retest reliability (ICC = 0.95, p < 0.001) for item scores, as well as established evidence for content, criterion-related and construct validity (Achenbach, 1991). Published evidence supports clinician use of the CBCL, with clinician ratings demonstrating moderate reliability (mean Cronbach's α = 0.77), convergent and discriminant validity, and a sound factorial structure (CFI = 0.92) compared with parent reports (Dutra, Campbell, & Westen, 2004). The CBCL also demonstrates moderate to high factorial validity for the eight-syndrome structure (RMSEA = 0.026 to 0.055) across 30 societies, with 58 051 participants aged 6 to 18 (Ivanova et al., 2007). Findings also support the concurrent validity of the recently derived DSM-oriented scales (Ebesutani et al., 2010). Because of the multiple structures and subscales, the conceptual and measurement models of the instrument have not been analyzed with regard to its use in violent and aggressive youth. The CBCL features few items (approximately 3.3% of the instrument) related to violent or aggressive outbursts, limiting its usefulness in studies of these youth. However, it is useful for obtaining data on the range of comorbidities that often need to be reported or included in the analysis of CD studies. Further research that includes youth with violent and aggressive behavior, or that examines the conduct subscale of the CBCL, may help determine the role of this instrument in the CD population.

1.3.8 Modified Overt Aggression Scale (MOAS)

The MOAS is a 16-item rating scale that measures aggressive behavior across four domains (Kay, Wolkenfeld, & Murrill, 1988; Sorgi, Ratey, Knoedler, Markert, & Reichman, 1991). MOAS items are hand-scored on a 5-point Likert scale of increasing severity, with verbal aggression assigned the lowest weight and physical aggression the highest (Huang et al., 2009). The total weighted score ranges from 0 to 40, with higher scores indicating more aggression (Dean, Bor, Adam, Bowling, & Bellgrove, 2014). There is evidence of validity and reliability for the French (Cronbach's α = 0.84 to 0.89; De Benedictis, Dumais, Stafford, Cote, & Lesage, 2012) and Chinese (ICC = 0.94, p < 0.001; Huang et al., 2009) versions of the MOAS; however, neither study described the methodology used to achieve linguistic equivalence. The validity and reliability of the MOAS have been described in multiple inpatient and outpatient populations, including people with intellectual disability, behavioral disorders and brain injuries (De Benedictis et al., 2012; Endicott et al., 2002; Huang et al., 2009; Kho, Sensky, Mortimer, & Corcos, 1998; Oliver, Crawford, Rao, Reece, & Tyrer, 2007). Several studies report acceptable to high reliability and validity; however, studies including children are lacking. Dean et al. (2014) report that the validity of the MOAS has been examined in several studies and that it has demonstrated reliability and sensitivity in children; however, little published evidence for this was found. Despite this, the tool has been used as an outcome measure in many studies with child and adolescent participants (Blader et al., 2010; Dean et al., 2014; Kronenberger et al., 2007). All items in the MOAS focus on aggressive and violent outbursts, giving it the potential to be a valuable tool in CD clinical research. Further studies to validate the psychometric properties of the MOAS in children and adolescents are recommended.
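The MOAS weighted total described above is a weighted sum: total = Σ w_d × s_d, where s_d is the 0 to 4 severity rating for domain d and w_d is that domain's weight. The text states only that verbal aggression carries the lowest weight and physical aggression the highest. The four domain names and the weights of 1 to 4 used below follow the way the MOAS is commonly described and are an assumption to be checked against the scale itself; with them, a rating of 4 in every domain gives 4 × (1 + 2 + 3 + 4) = 40, matching the 0 to 40 range above. The dictionary keys and function name are hypothetical.

```python
# ASSUMED domain weights (not given in this review): verbal lowest, physical highest.
MOAS_WEIGHTS = {
    "verbal": 1,
    "property": 2,
    "self_directed": 3,
    "physical": 4,
}


def moas_weighted_total(ratings):
    """Weighted MOAS total from per-domain severity ratings on a 0-4 scale."""
    total = 0
    for domain, weight in MOAS_WEIGHTS.items():
        score = ratings.get(domain, 0)  # a domain with no observed behavior scores 0
        if not 0 <= score <= 4:
            raise ValueError(f"{domain} rating must be between 0 and 4, got {score}")
        total += weight * score
    return total


# Hypothetical observation: moderate verbal aggression plus one mild physical act.
print(moas_weighted_total({"verbal": 2, "physical": 1}))  # 6
print(moas_weighted_total({d: 4 for d in MOAS_WEIGHTS}))  # 40, the scale maximum
```

Multiplying each rating by a severity weight before summing is the same general idea as the differential item weighting of the CAS, although the CAS weights are not given here.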

1.3.9 The Swanson, Nolan and Pelham Teacher and Parent Rating Scale (SNAP-IV)

The 90-item SNAP-IV Rating Scale is a revision of the SNAP Questionnaire designed to measure ADHD and ODD symptoms in children and young adults (Gau et al., 2009). Items 41 to 90 of the SNAP-IV contain criteria from other DSM-IV disorders, such as CD, generalized anxiety disorder and intermittent explosive disorder, whose symptoms often overlap with ADHD or which may be comorbid disorders (Inoue et al., 2014). The SNAP-IV can be accessed online, and hand scoring appears simple. The SNAP-IV has been used in multiple clinical trials as a measure of efficacy and ADHD symptom severity (Bussing et al., 2008). One study using the SNAP-IV reported moderate to high internal consistency (Stevens, Quittner, & Abikoff, 1998). Published data also support the validity and reliability of the Japanese (Inoue et al., 2014) and Chinese (Gau et al., 2009) versions of the SNAP-IV. In a review of ADHD rating scales, the SNAP-IV was criticized for its lack of published psychometric properties and sparse normative data, which limit its usefulness in research and clinical practice (Collett, Ohan, & Myers, 2003; Gau et al., 2009). For the same reason, it was not possible to review this instrument extensively for its use in violent and aggressive youth. Furthermore, only a handful of SNAP-IV items (approximately 9% of the instrument) are indicative of violent or aggressive outbursts, limiting its usefulness.

Of the nine instruments, two (NCBRF and ABC) can be used to evaluate behaviors in those with low IQ or mental retardation. Both the ABC and the BPI-01 can be used to rate behaviors in child and adult populations, and it is unknown whether their scoring is adjusted for age and gender, as published information on their measurement models was lacking. The number of items in the instruments varied from 16 (MOAS) to 120 (CBCL). The periods over which behavioral occurrences were measured ranged from the present (CBCL) to the past year (CAS). Instruments ranged in cost from free access to $650AUD (see Appendix). Responsiveness data were not obtained for any instrument. The NCBRF, Conners' 3rd Edition and BPI-01 were rated highest for their psychometric properties; the CAS, ABC-C and DBDRS were rated moderately; and the CBCL, MOAS and SNAP-IV were rated lowest (Table 4). The NCBRF, BPI-01 and CAS were the only instruments that can be used to measure both the frequency and the severity of behaviors. The CAS and MOAS featured the most items pertaining to aggression (both 100%).
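Table 2 converts each letter rating to points (3=A, 2=B, 1=C, 0=Unknown), sums them into a total score and ranks the instruments. In the table, tied totals share a rank and the next total takes the next integer (the ABC and DBDRS both total 17 and share rank 5, the CBCL and MOAS both total 16 and share rank 6, and the SNAP-IV is ranked 7 at 12), which is dense ranking. The Python sketch below reproduces that scheme from the Table 2 totals, assuming all criteria are weighted equally; the function names are hypothetical and the letter grades in the first example are invented for illustration.

```python
GRADE_POINTS = {"A": 3, "B": 2, "C": 1, "Unknown": 0}


def total_score(grades):
    """Sum the points for one instrument's letter grades across all criteria."""
    return sum(GRADE_POINTS[g] for g in grades)


def dense_rank(totals):
    """Rank instruments by total score; ties share a rank and no ranks are skipped."""
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {score: position for position, score in enumerate(distinct, start=1)}
    return {name: rank_of[score] for name, score in totals.items()}


# Hypothetical grades for a single instrument (not taken from Table 2).
print(total_score(["A", "B", "Unknown", "A", "C"]))  # 9

# Total scores as reported in Table 2.
totals = {
    "NCBRF": 23, "Conners' 3": 20, "BPI-01": 19, "CAS": 18, "ABC": 17,
    "DBDRS": 17, "CBCL": 16, "MOAS": 16, "SNAP-IV": 12,
}
print(dense_rank(totals))
```

Dense ranking keeps the ranks consecutive after ties, which is why nine instruments occupy only seven rank positions.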

Table 2: Ranking of Assessed Tools Based on Psychometric and Administrative Properties

Criteria rated: psychometric properties (reliability, validity, responsiveness, conceptual and measurement model, bias) and administrative properties (brevity, simplicity, accessibility, acceptability). Rating key: 3=A, 2=B, 1=C, 0=Unknown.

Instrument | Total score | Rank
Nisonger Child Behavior Rating Form (NCBRF) | 23 | 1
Conners' 3rd Edition | 20 | 2
Behavior Problems Inventory (BPI-01) | 19 | 3
Children's Aggression Scale (CAS) | 18 | 4
Aberrant Behavior Checklist (ABC) | 17 | 5
Disruptive Behavior Disorders Rating Scale (DBDRS) | 17 | 5
Child Behavior Checklist (CBCL) | 16 | 6
Modified Overt Aggression Scale (MOAS) | 16 | 6
Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) | 12 | 7

Table 3: Ranking of Instruments and Applicability to Violence and Aggression Research

Instrument | Psychometric rank | Measures frequency | Measures severity | Aggression-related items, n (%) | Comments
Nisonger Child Behavior Rating Form (NCBRF) | 1 | Yes | Yes | ~7 (9.2%) | Designed for use in those with sub-average IQ
Conners' 3rd Edition | 2 | Yes | No | ~15 (13.6%) | Lacking published data for DSM-V version
Behavior Problems Inventory (BPI-01) | 3 | Yes | Yes | 11 (21.2%) | Published methods for scoring lacking; designed for use in those with sub-average IQ
Children's Aggression Scale (CAS) | 4 | Yes | Yes | 33 (100%) | 100% of scale measures aggression
Aberrant Behavior Checklist (ABC) | 5 | No | Yes | ~5 (8.6%) | Developed and standardized in adults
Disruptive Behavior Disorders Rating Scale (DBDRS) | 5 | No | Yes | 15 (33.3%) | Extensive published data for ADHD populations
Child Behavior Checklist (CBCL) | 6 | Yes | No | ~4 (3.3%) | Part of a multi-informant group of instruments
Modified Overt Aggression Scale (MOAS) | 6 | No | Yes | 16 (100%) | Lacking validation in youth populations; 100% of scale measures aggression
Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) | 7 | No | Yes | ~8 (8.8%) | Published psychometric and normative data lacking

1.4 DISCUSSION

The strengths and weaknesses of the reviewed instruments are discussed under the following subheadings: instruments accounting for the heterogeneity of aggression and comorbidities of conduct disorder; instruments encompassing weighted scoring for age and gender differences; instruments accounting for the severity and frequency of aggressive behaviors; instruments accounting for average and sub-average IQ; impact of technological advances on instruments and their development; and practicality, findings and recommendations.

1.4.1 Instruments accounting for the heterogeneity of aggression and comorbidities of conduct disorder

The assessment of violence and aggression using behavior rating tools still appears limited, with a predominant focus on oppositional and defiant behavior rather than on aggression itself (Halperin et al., 2002). Instruments intended to measure aggression were frequently confounded by items that evaluate oppositional and defiant behavior, perhaps because of the heterogeneity and frequency of comorbidities in those with CD (Halperin et al., 2002; Klahr & Burt, 2014). The CAS and the MOAS take into account the heterogeneity of aggressive behaviors, featuring items that rate multiple subtypes of aggressive behavior. The CAS also considers the context of the behavior (whether it occurs in the home, out of the home, between peers or between siblings) and allows for multiple informants (teacher version or parent version).

1.4.2 Instruments encompassing weighted scoring for age and gender differences

The display of violent and aggressive behaviors has different implications depending on the age and gender of the aggressor (Halperin et al., 2002). The CAS, Conners' 3rd Edition, CBCL and DBDRS all feature scoring systems that incorporate weighted and/or standardized adjustments. Scoring systems that weight more severe acts more heavily than less severe acts may provide a more accurate evaluation of problem behaviors (Hirsch, Frank, Shapiro, Hazell, & Frank, 2004). Standardising behavior for age and gender yields scores that can be compared with age- and gender-appropriate ranges, aiding in the delineation of normative and non-normative behaviors (Steiner, Remsing, & Work Group on Quality, 2007). The BPI-01 and ABC can also be used in adults. Minimal published information was found about the scoring system of the BPI-01, and it could not be determined whether the age of subjects was taken into account during its development, which may limit its usefulness and be a potential source of bias. Instruments not standardized in youth need further evaluation, since the presentation of behaviors in youth may differ from that in adults as a result of developmental factors (Schmidt et al., 2013).

1.4.3 Instruments accounting for the severity and frequency of aggressive behaviors

The ability to assess severity independently of the frequency of behaviors is an important and valuable characteristic. Instruments that lack weighted scoring, or that combine frequency and severity into a single rating, are problematic (Halperin et al., 2002). Weighting items according to the severity of the act allows more severe episodes of behavior to be weighted mathematically more heavily than less severe ones (Halperin et al., 2002). The NCBRF, BPI-01 and CAS were the only instruments that measured both the frequency and the severity of behaviors. The BPI-01 measures frequency and severity independently; however, published scoring methods are lacking. The NCBRF asks users to rate behavior over the past month but combines the frequency and severity ratings for each item. The CAS asks users to rate behavior over the past year and provides items that escalate in severity; each item is then rated for frequency on a five-point Likert scale. Although there is concern over the lack of precision when estimating frequency, using quantitative anchors (e.g., “never”, “once a month”, “most days”) may reduce the error variance associated with users' perceptions of subjective terms such as “occasionally” and “often” (Halperin et al., 2002). The CAS was the only scale that weighted various acts of aggression to determine the severity of specific acts beyond their frequency.

1.4.4 Instruments accounting for average and sub-average IQ

The assessment of learning disabilities is essential in individuals with conduct problems, as a third of children with CD have sub-average IQ (Scott, 2012). Longitudinal studies show that those with early-onset CD have lower IQ and are predicted to have poorer outcomes in adulthood (Lahey et al., 1995; Scott, 2012). Both the BPI-01 and the NCBRF are designed to measure subjects with sub-average IQ. The NCBRF has a recently developed “Typical IQ” version, but it is unknown whether the BPI-01 is suitable for use in this group. Although the BPI-01 ranked highly in this review, it should be used carefully in those with typical IQ until further validation is available. Instruments should be used with caution until they have been validated for use in both typical and sub-average IQ populations.

1.4.5 Impact of technological advances on instruments and their development

The emergence of technologically advanced instruments is an area of interest. The electronic Hamilton Anatomy of Risk Management tool (e-HARM) has recently been released; it is a free electronic instrument that monitors a patient to help predict the likelihood of violent and aggressive outbursts (Vogel, 2016). The instrument aggregates and charts captured data over time, while linking them to past episodes and treatments. New technology may revolutionize how aggression is measured, monitored and managed (Vogel, 2016), and should therefore be considered when developing new instruments or refining existing ones.

1.4.6 Practicality, findings and recommendations

Overall, it became apparent that aggressive behaviors typically receive relatively little attention in broad-band behavior rating instruments, which contain small numbers of designated items that are often confounded and conflated with oppositional and defiant symptoms (Rojahn et al., 2001). The two highest-ranked instruments, the NCBRF and Conners' 3rd Edition (presented in Table 3), are broad-band behavior rating scales in which only approximately 9.2% and 13.6% of items, respectively, measure violent and aggressive behaviors, limiting their usefulness. The CAS is a psychometrically sound instrument that takes into account the frequency and severity of behaviors as well as age and gender effects, and is thus recommended for use in violent and aggressive youth research. All of the items in the CAS and the MOAS can be used to measure violent and/or aggressive behaviors. Further research into the psychometric properties of the MOAS in violent and aggressive youth is required before its use can be recommended.
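The aggression-item percentages used in this comparison are simple proportions of aggression-related items to total items. As a minimal Python check, the sketch below recomputes several Table 3 percentages from the total item counts given in the text (52 for the BPI-01, 58 for the ABC, 120 for the CBCL, 33 for the CAS parent form and 16 for the MOAS) and the aggression-item counts in Table 3, some of which are marked there as approximate (~). The function name is hypothetical.

```python
def aggression_item_share(aggression_items, total_items):
    """Percentage of an instrument's items that target violent or aggressive behavior."""
    if total_items <= 0 or not 0 <= aggression_items <= total_items:
        raise ValueError("need total_items > 0 and 0 <= aggression_items <= total_items")
    return round(100 * aggression_items / total_items, 1)


# (aggression-related items, total items); counts marked "~" in Table 3 are approximate.
item_counts = {
    "BPI-01": (11, 52),
    "ABC": (5, 58),
    "CBCL": (4, 120),
    "CAS (parent form)": (33, 33),
    "MOAS": (16, 16),
}
for name, (aggressive, total) in item_counts.items():
    print(f"{name}: {aggression_item_share(aggressive, total)}%")
```

The results (21.2%, 8.6%, 3.3% and 100%) match the corresponding Table 3 entries; the NCBRF and Conners' 3rd Edition are omitted because their total item counts are not stated in this section.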

1.5 CONCLUSION

Although broad-band scales such as the NCBRF and Conners' 3rd Edition were rated highest for their psychometric properties, their usefulness in youth violence and aggression research is limited. This leads us to conclude that much of the diversity in predictor and criterion measures may be due to a fundamental lack of theoretical models that consider the heterogeneity of CD. To measure the efficacy of a pharmacotherapeutic intervention in those with violence, aggression and/or CD, age, gender, functional level, situational context and the type of informant should also be taken into account. Most behavior rating instruments contain few designated items specific to violent and aggressive behaviors, and these items are often conflated with oppositional and defiant symptoms. All of the items in the CAS and the MOAS can be used to measure violent and/or aggressive behaviors. Further research into the psychometric properties of the MOAS in violent and aggressive youth is required before its use can be recommended. The CAS was found to be the most psychometrically sound and useful instrument that exclusively measures aggressive behaviors in youth.

1.6 ACKNOWLEDGEMENTS

We thank Griffith University for partially funding this study. The study has not received any funding from third parties.

Conflict of interest: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

1.7 REFERENCES

Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Aman, M., Leone, S., Lecavalier, L., Park, L., Buican, B., & Coury, D. (2008). The Nisonger Child Behavior Rating Form: typical IQ version. Int Clin Psychopharmacol, 23(4), 232-242. doi:10.1097/YIC.0b013e3282f94ad0

Aman, M. G. (2012). Aberrant Behavior Checklist: Current Identity and Future Developments. Clin Exp Pharmacol, 2(3), 114.

Aman, M. G., De Smedt, G., Derivan, A., Lyons, B., & Findling, R. L. (2002). Double-blind, placebo-controlled study of risperidone for the treatment of disruptive behaviors in children with subaverage intelligence. Am J Psychiatry, 159(8), 1337-1346.

Aman, M. G., Singh, N. N., Stewart, A. W., & Field, C. J. (1985). The aberrant behavior checklist: a behavior rating scale for the assessment of treatment effects. Am J Ment Defic, 89(5), 485-491.

Aman, M. G., Tassé, M. J., Rojahn, J., & Hammer, D. (1996). The Nisonger CBRF: A child behavior rating form for children with developmental disabilities. Res Dev Disabil, 17(1), 41-57. doi:10.1016/0891-4222(95)00039-9

American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders, 5th Edition: DSM-5. Arlington, VA: American Psychiatric Publishing.

Andresen, E. M. (2000). Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil, 81(12 Suppl 2), S15-20.

Andresen, E. M., Rothenberg, B. M., Panzer, R., Katz, P., & McDermott, M. P. (1998). Selecting a generic measure of health-related quality of life for use among older adults. A comparison of candidate instruments. Eval Health Prof, 21(2), 244-264.

Antrop, I., Roeyers, H., Oosterlaan, J., & Van Oost, P. (2002). Agreement Between Parent and Teacher Ratings of Disruptive Behavior Disorders in Children with Clinically Diagnosed ADHD. Journal of Psychopathology and Behavioral Assessment, 24(1), 67-73. doi:10.1023/A:1014057325752

Blader, J. C., Pliszka, S. R., Jensen, P. S., Schooler, N. R., & Kafantaris, V. (2010). Stimulant-responsive and stimulant-refractory aggressive behavior among children with ADHD. Pediatrics, 126(4), e796-806. doi:10.1542/peds.2010-0086

Bor, W., Dean, A. J., Najman, J., & Hayatbakhsh, R. (2014). Are child and adolescent mental health problems increasing in the 21st century? A systematic review. Aust N Z J Psychiatry, 48(7), 606-616. doi:10.1177/0004867414533834

Bussing, R., Fernandez, M., Harwood, M., Wei, H., Garvan, C. W., Eyberg, S. M., & Swanson, J. M. (2008). Parent and teacher SNAP-IV ratings of attention deficit hyperactivity disorder symptoms: psychometric properties and normative ratings from a school district sample. Assessment, 15(3), 317-328. doi:10.1177/1073191107313888

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull, 56(2), 81-105.

Catale, C., Geurten, M., Lejeune, C., & Meulemans, T. (2014). The Conners Parent Rating Scale: Psychometric properties in typically developing 4- to 12-year-old Belgian French-speaking children. Revue européenne de psychologie appliquée, 64(5), 221. doi:10.1016/j.erap.2014.07.001

Cicchetti, D. V. (1994). Guidelines, Criteria, and Rules of Thumb for Evaluating Normed and Standardized Assessment Instruments in Psychology. Psychological Assessment, 6(4), 284-290. doi:10.1037/1040-3590.6.4.284

Collett, B. R., Ohan, J. L., & Myers, K. M. (2003). Ten-year review of rating scales. V: scales assessing attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry, 42(9), 1015-1037. doi:10.1097/01.CHI.0000070245.24125.B6

Collishaw, S. (2015). Annual Research Review: Secular trends in child and adolescent mental health. Journal of Child Psychology and Psychiatry, 56(3), 370-393. doi:10.1111/jcpp.12372

Colliver, J. A., Conlee, M. J., & Verhulst, S. J. (2012). From test validity to construct validity … and back? Medical Education, 46(4), 366-371. doi:10.1111/j.1365-2923.2011.04194.x
Conners, C. K. (2010). Test Review: C. Keith Conners Conners 3rd Edition Toronto, Ontario, Canada: Multi-Health Systems, 2008. Journal of Psychoeducational Assessment, 28(6), 598-602. doi:10.1177/0734282909360011

Conners, C. K., Sitarenios, G., Parker, J. D., & Epstein, J. N. (1998). The revised Conners' Parent Rating Scale (CPRS-R): factor structure, reliability, and criterion validity. J Abnorm Child Psychol, 26(4), 257-268.

Connor, D. F., Barkley, R. A., & Davis, H. T. (2000). A pilot study of methylphenidate, clonidine, or the combination in ADHD comorbid with aggressive oppositional defiant or conduct disorder. Clin Pediatr (Phila), 39(1), 15-25.

Connor, D. F., McLaughlin, T. J., & Jeffers-Terry, M. (2008). Randomized controlled pilot study of quetiapine in the treatment of adolescent conduct disorder. J Child Adolesc Psychopharmacol, 18(2), 140-156. doi:10.1089/cap.2006.0007

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334. doi:10.1007/BF02310555

De Benedictis, L., Dumais, A., Stafford, M. C., Cote, G., & Lesage, A. (2012). Factor analysis of the French version of the shorter 12-item Perception of Aggression Scale (POAS) and of a new modified version of the Overt Aggression Scale (MOAS). J Psychiatr Ment Health Nurs, 19(10), 875-880. doi:10.1111/j.1365-2850.2011.01870.x

Dean, A. J., Bor, W., Adam, K., Bowling, F. G., & Bellgrove, M. A. (2014). A randomized, controlled, crossover trial of fish oil treatment for impulsive aggression in children and adolescents with disruptive behavior disorders. J Child Adolesc Psychopharmacol, 24(3), 140-148. doi:10.1089/cap.2013.0093

Donovan, S. J., Stewart, J. W., Nunes, E. V., Quitkin, F. M., Parides, M., Daniel, W., . . . Klein, D. F. (2000). Divalproex treatment for youth with explosive temper and mood lability: a double-blind, placebo-controlled crossover design. Am J Psychiatry, 157(5), 818-820.

Dutra, L., Campbell, L., & Westen, D. (2004). Quantifying clinical judgment in the assessment of adolescent psychopathology: Reliability, validity, and factor structure of the Child Behavior Checklist for clinician report. J Clin Psychol, 60(1), 65-85. doi:10.1002/jclp.10234

Ebesutani, C., Bernstein, A., Nakamura, B. J., Chorpita, B. F., Higa-McMillan, C. K., Weisz, J. R., & The Research Network on Youth Mental Health. (2010). Concurrent Validity of the Child Behavior Checklist DSM-Oriented Scales: Correspondence with DSM Diagnoses and Comparison to Syndrome Scales. Journal of Psychopathology and Behavioral Assessment, 32(3), 373-384. doi:10.1007/s10862-009-9174-9

Edelbrock, C. S. (1985). Child Behavior Rating Form. Psychopharmacology Bulletin, 21, 835-837.

Elson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T. (2014). Press CRTT to measure aggressive behavior: the unstandardized use of the competitive reaction time task in aggression research. Psychol Assess, 26(2), 419-432. doi:10.1037/a0035569

Endicott, J., Tracy, K., Burt, D., Olson, E., & Coccaro, E. F. (2002). A novel approach to assess inter-rater reliability in the use of the Overt Aggression Scale-Modified. Psychiatry Res, 112(2), 153-159. doi:10.1016/S0165-1781(02)00185-3

Erskine, H. E., Moffitt, T. E., Copeland, W. E., Costello, E. J., Ferrari, A. J., Patton, G., . . . Scott, J. G. (2015). A heavy burden on young minds: the global burden of mental and substance use disorders in children and youth. Psychol Med, 45(7), 1551-1563. doi:10.1017/S0033291714002888

Etchells, P. J., Gage, S. H., Rutherford, A. D., & Munafo, M. R. (2016). Prospective Investigation of Video Game Use in Children and Subsequent Conduct Disorder and Depression Using Data from the Avon Longitudinal Study of Parents and Children. PLoS ONE, 11(1), e0147732. doi:10.1371/journal.pone.0147732

Farmer, C. A., Kaat, A. J., Mazurek, M. O., Lainhart, J. E., DeWitt, M. B., Cook, E. H., . . . Aman, M. G. (2016). Confirmation of the Factor Structure and Measurement Invariance of the Children's Scale of Hostility and Aggression: Reactive/Proactive in Clinic-Referred Children With and Without Autism Spectrum Disorder. J Child Adolesc Psychopharmacol, 26(1), 10-18. doi:10.1089/cap.2015.0098

Frieden, T. R., Jaffe, H. W., Cono, J., Richards, C. L., & Iademarco, M. F. (2014). Youth Risk Behavior Surveillance, United States, 2013. Retrieved from Atlanta, Georgia, USA:
Fumeaux, P., Mercier, C., Roche, S., Iwaz, J., Bader, M., Stéphan, P., . . . Revol, O. (2016). Validation of the French Version of Conners' Parent Rating Scale Revised, Short Version: Factorial Structure and Reliability/Validation de la version française de la version révisée et abrégée de l'échelle parents de Conners; structure factorielle et fiabilité. Canadian Journal of Psychiatry, 61(4), 236. doi:10.1177/0706743716635549a

Gau, S. S., Lin, C. H., Hu, F. C., Shang, C. Y., Swanson, J. M., Liu, Y. C., & Liu, S. K. (2009). Psychometric properties of the Chinese version of the Swanson, Nolan, and Pelham, Version IV Scale-Teacher Form. J Pediatr Psychol, 34(8), 850-861. doi:10.1093/jpepsy/jsn133

Halperin, J. M., McKay, K. E., Grayson, R. H., & Newcorn, J. H. (2003). Reliability, validity, and preliminary normative data for the Children's Aggression Scale-Teacher Version. J Am Acad Child Adolesc Psychiatry, 42(8), 965-971. doi:10.1097/01.CHI.0000046899.27264.EB

Halperin, J. M., McKay, K. E., & Newcorn, J. H. (2002). Development, reliability, and validity of the children's aggression scale-parent version. J Am Acad Child Adolesc Psychiatry, 41(3), 245-252. doi:10.1097/00004583-200203000-00003

Hambly, J. L., Khan, S., McDermott, B., Bor, W., & Haywood, A. (2016). Pharmacotherapy of conduct disorder: Challenges, options and future directions. J Psychopharmacol. doi:10.1177/0269881116658985

Hastings, R. P., Didden, H. C. M., Rojahn, J., Matson, J. L., Kroes, D. B. H., Sharber, A. C., . . . Dumont, E. L. M. (2012). The Behavior Problems Inventory-Short Form for individuals with intellectual disabilities: Part II: reliability and validity. Journal of Intellectual Disability Research, 56(5), 546-565. doi:10.1111/j.1365-2788.2011.01506.x

Hersen, M. (2006). Clinician's handbook of child behavioral assessment. Burlington, MA: Elsevier Academic Press.

Hill, J., Powlitch, S., & Furniss, F. (2008). Convergent validity of the aberrant behavior checklist and behavior problems inventory with people with complex needs. Res Dev Disabil, 29(1), 45-60. doi:10.1016/j.ridd.2006.10.002

Hirsch, S., Frank, T. L., Shapiro, J. L., Hazell, M. L., & Frank, P. I. (2004). Development of a questionnaire weighted scoring system to target diagnostic examinations for asthma in adults: a modelling study. BMC Fam Pract, 5(1), 30. doi:10.1186/1471-2296-5-30

Huang, H. C., Wang, Y. T., Chen, K. C., Yeh, T. L., Lee, I. H., Chen, P. S., . . . Lu, R. B. (2009). The reliability and validity of the Chinese version of the Modified Overt Aggression Scale. Int J Psychiatry Clin Pract, 13(4), 303-306. doi:10.3109/13651500903056533

Hunsley, J., & Mash, E. J. (2010). The role of assessment in evidence-based practice. Handbook of assessment and treatment planning for psychological disorders. New York: The Guilford Press.

Inoue, Y., Ito, K., Kita, Y., Inagaki, M., Kaga, M., & Swanson, J. M. (2014). Psychometric properties of Japanese version of the Swanson, Nolan, and Pelham, version-IV Scale-Teacher Form: a study of school children in community samples. Brain Dev, 36(8), 700-706. doi:10.1016/j.braindev.2013.09.003

Ivanova, M. Y., Dobrean, A., Dopfner, M., Erol, N., Fombonne, E., Fonseca, A. C., . . . Chen, W. J. (2007). Testing the 8-syndrome structure of the child behavior checklist in 30 societies. J Clin Child Adolesc Psychol, 36(3), 405-417. doi:10.1080/15374410701444363

Karabekiroglu, K., & Aman, M. G. (2009). Validity of the aberrant behavior checklist in a clinical sample of toddlers. Child Psychiatry Hum Dev, 40(1), 99-110. doi:10.1007/s10578-008-0108-7

Kay, S. R., Wolkenfeld, F., & Murrill, L. M. (1988). Profiles of aggression among psychiatric patients. I. Nature and prevalence. J Nerv Ment Dis, 176(9), 539-546.

Kho, K., Sensky, T., Mortimer, A., & Corcos, C. (1998). Prospective study into factors associated with aggressive incidents in psychiatric acute admission wards. Br J Psychiatry, 172, 38-43.

Klahr, A. M., & Burt, S. A. (2014). Practitioner Review: Evaluation of the known behavioral heterogeneity in conduct disorder to improve its assessment and treatment. J Child Psychol Psychiatry, 55(12), 1300-1310. doi:10.1111/jcpp.12268
Kronenberger, W. G., Giauque, A. L., Lafata, D. E., Bohnstedt, B. N., Maxey, L. E., & Dunn, D. W. (2007). Quetiapine addition in methylphenidate treatment-resistant adolescents with comorbid ADHD, conduct/oppositional-defiant disorder, and aggression: a prospective, open-label study. J Child Adolesc Psychopharmacol, 17(3), 334-347. doi:10.1089/cap.2006.0012

Lahey, B. B., Loeber, R., Hart, E. L., Frick, P. J., Applegate, B., Zhang, Q., . . . Russo, M. F. (1995). Four-year longitudinal study of conduct disorder in boys: patterns and predictors of persistence. J Abnorm Psychol, 104(1), 83-93.

Lawrence, D., Johnson, S., Hafekost, J., Boterhoven De Haan, K., Sawyer, M., Ainley, J., & Zubrick, S. (2015). The Mental Health of Children and Adolescents. Report on the Second Australian Child and Adolescent Survey of Mental Health and Wellbeing. Retrieved from

Lecavalier, L., Aman, M. G., Hammer, D., Stoica, W., & Mathews, G. L. (2004). Factor Analysis of the Nisonger Child Behavior Rating Form in Children with Autism Spectrum Disorders. J Autism Dev Disord, 34(6), 709-721. doi:10.1007/s10803-004-5291-1

Lim, R., Liong, M. L., Lau, Y. K., & Yuen, K. H. (2015). Validity, reliability, and responsiveness of the ICIQ-UI SF and ICIQ-LUTSqol in the Malaysian population: Validation of ICIQ in Malaysia. Neurourology and Urodynamics, n/a-n/a. doi:10.1002/nau.22950

Lohr, K. N., Aaronson, N. K., Alonso, J., Burnam, M. A., Patrick, D. L., Perrin, E. B., & Roberts, J. S. (1996). Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther, 18(5), 979-992.

Loona, M. I., & Kamal, A. (2011). Translation and adaptation of disruptive behaviour disorder rating scale. Pakistan Journal of Psychological Research, 26(2), 149.

Malone, R. P., Delaney, M. A., Luebbert, J. F., Cater, J., & Campbell, M. (2000). A double-blind placebo-controlled study of lithium in hospitalized aggressive children and adolescents with conduct disorder. Arch Gen Psychiatry, 57(7), 649-654.

Marshburn, E. C., & Aman, M. G. (1992). Factor validity and norms for the aberrant behavior checklist in a community sample of children with mental retardation. J Autism Dev Disord, 22(3), 357-373.

Matson, J. L., Wilkins, J., Boisjoli, J. A., & Smith, K. R. (2008). The validity of the autism spectrum disorders-diagnosis for intellectually disabled adults (ASD-DA). Res Dev Disabil, 29(6), 537-546. doi:10.1016/j.ridd.2007.09.006

McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochem Med (Zagreb), 22(3), 276-282.

Mircea, C. E., Rojahn, J., & Esbensen, A. J. (2010). Psychometric Evaluation of Romanian Translations of the Behavior Problems Inventory-01 and the Nisonger Child Behavior Rating Form. Journal of Mental Health Research in Intellectual Disabilities, 3(1), 51. doi:10.1080/19315860903520515

Nakamura, B. J., Ebesutani, C., Bernstein, A., & Chorpita, B. F. (2009). A Psychometric Analysis of the Child Behavior Checklist DSM-Oriented Scales. Journal of Psychopathology and Behavioral Assessment, 31(3), 178-189. doi:10.1007/s10862-008-9119-8

Norris, M., & Lecavalier, L. (2011). Evaluating the validity of the Nisonger Child Behavior Rating Form – Parent Version. Res Dev Disabil, 32(6), 2894-2900. doi:10.1016/j.ridd.2011.05.015

Oliver, P. C., Crawford, M. J., Rao, B., Reece, B., & Tyrer, P. (2007). Modified Overt Aggression Scale (MOAS) for People with Intellectual Disability and Aggressive Challenging Behaviour: A Reliability Study. Journal of Applied Research in Intellectual Disabilities, 20(4), 368-372. doi:10.1111/j.1468-3148.2006.00346.x

Padhy, R., Saxena, K., Remsing, L., Huemer, J., Plattner, B., & Steiner, H. (2011). Symptomatic response to divalproex in subtypes of conduct disorder. Child Psychiatry Hum Dev, 42(5), 584-593. doi:10.1007/s10578-011-0234-5

Pelham, W. E., Jr., Gnagy, E. M., Greenslade, K. E., & Milich, R. (1992). Teacher ratings of DSM-III-R symptoms for the disruptive behavior disorders. J Am Acad Child Adolesc Psychiatry, 31(2), 210-218. doi:10.1097/00004583-199203000-00006
Pelletier, J., Collett, B., Gimpel, G., & Crowley, S. (2006). Assessment of Disruptive Behaviors in Preschoolers: Psychometric Properties of the Disruptive Behavior Disorders Rating Scale and School Situations Questionnaire. Journal of Psychoeducational Assessment, 24(1), 3-18. doi:10.1177/0734282905285235

Reebye, P. (2005). Aggression during early years - infancy and preschool. Can Child Adolesc Psychiatr Rev, 14(1), 16-20.

Reebye, P., & Moretti, M. (2005). Perspectives on childhood and adolescent aggression. Can Child Adolesc Psychiatr Rev, 14(1), 2.

Roberts, P., & Priest, H. (2006). Reliability and validity in research. Nurs Stand, 20(44), 41-45. doi:10.7748/ns2006.07.20.44.41.c6560

Rojahn, J. (1984). Self-injurious behavior in institutionalized, severely/profoundly retarded adults: Prevalence data and staff agreement. Journal of Behavioral Assessment, 6(1), 13-27. doi:10.1007/BF01321457

Rojahn, J., Aman, M. G., Matson, J. L., & Mayville, E. (2003). The Aberrant Behavior Checklist and the Behavior Problems Inventory: convergent and divergent validity. Res Dev Disabil, 24(5), 391-404.

Rojahn, J., & Helsel, W. J. (1991). The Aberrant Behavior Checklist with children and adolescents with dual diagnosis. J Autism Dev Disord, 21(1), 17-28. doi:10.1007/BF02206994

Rojahn, J., Matson, J. L., Lott, D., Esbensen, A. J., & Smalls, Y. (2001). The Behavior Problems Inventory: an instrument for the assessment of self-injury, stereotyped behavior, and aggression/destruction in individuals with developmental disabilities. J Autism Dev Disord, 31(6), 577-588.

Rojahn, J., Rowe, E. W., Macken, J., Gray, A., Delitta, D., Booth, A., & Kimbrell, K. (2010). Psychometric Evaluation of the Behavior Problems Inventory-01 and the Nisonger Child Behavior Rating Form with Children and Adolescents. Journal of Mental Health Research in Intellectual Disabilities, 3(1), 28. doi:10.1080/19315860903558168

Rojahn, J., Rowe, E. W., Sharber, A. C., Hastings, R., Matson, J. L., Didden, R., . . . Dumont, E. L. (2012). The Behavior Problems Inventory-Short Form for individuals with intellectual disabilities: part II: reliability and validity. J Intellect Disabil Res, 56(5), 546-565. doi:10.1111/j.1365-2788.2011.01506.x

Rojahn, J., Schroeder, S. R., Mayo-Ortega, L., Oyama-Ganiko, R., LeBlanc, J., Marquis, J., & Berke, E. (2013). Validity and reliability of the Behavior Problems Inventory, the Aberrant Behavior Checklist, and the Repetitive Behavior Scale-Revised among infants and toddlers at risk for intellectual or developmental disabilities: a multi-method assessment approach. Res Dev Disabil, 34(5), 1804-1814. doi:10.1016/j.ridd.2013.02.024

Sawyer, M. G., Arney, F. M., Baghurst, P. A., Clark, J. J., Graetz, B. W., Kosky, R. J., . . . Aubrick, S. R. (2000a). The Mental Health of Young People in Australia. Appendix A: Mean scores on the child behaviour checklist and youth self-report. Canberra: Commonwealth Department of Health and Aged Care.

Sawyer, M. G., Arney, F. M., Baghurst, P. A., Clark, J. J., Graetz, B. W., Kosky, R. J., . . . Aubrick, S. R. (2000b). The Mental Health of Young People in Australia. Mental Health and Special Programs. Canberra: Commonwealth Department of Health and Aged Care.

Schmidt, J. D., Huete, J. M., Fodstad, J. C., Chin, M. D., & Kurtz, P. F. (2013). An evaluation of the Aberrant Behavior Checklist for children under age 5. Res Dev Disabil, 34(4), 1190-1197. doi:10.1016/j.ridd.2013.01.002

Scott, S. (2012). Conduct Disorders. In I. R. J. (ed) (Series Ed.) e-Textbook of Child and Adolescent Mental Health.

Siddons, H., & Lancaster, S. (2004). An overview of the use of the Child Behavior Checklist within Australia. Camberwell, Vic: ACER.

Silva, R. R., Alpert, M., Pouget, E., Silva, V., Trosper, S., Reyes, K., & Dummit, S. (2005). A rating scale for disruptive behavior disorders, based on the DSM-IV item pool. Psychiatr Q, 76(4), 327-339. doi:10.1007/s11126-005-4966-x

Slade, M., Thornicroft, G., & Glover, G. (1999). The feasibility of routine outcome measures in mental health. Soc Psychiatry Psychiatr Epidemiol, 34(5), 243-249.
Smith, R. M. (2004). Detecting item bias with the Rasch model. J Appl Meas, 5(4), 430-449.

Sorgi, P., Ratey, J., Knoedler, D. W., Markert, R. J., & Reichman, M. (1991). Rating aggression in the clinical setting. A retrospective adaptation of the Overt Aggression Scale: preliminary results. J Neuropsychiatry Clin Neurosci, 3(2), S52-56.

Steiner, H., Petersen, M. L., Saxena, K., Ford, S., & Matthews, Z. (2003). Divalproex sodium for the treatment of conduct disorder: a randomized controlled clinical trial. J Clin Psychiatry, 64(10), 1183-1191.

Steiner, H., Remsing, L., & Work Group on Quality Issues. (2007). Practice parameter for the assessment and treatment of children and adolescents with oppositional defiant disorder. J Am Acad Child Adolesc Psychiatry, 46(1), 126-141. doi:10.1097/01.chi.0000246060.62706.af

Steiner, H., Saxena, K. S., Carrion, V., Khanzode, L. A., Silverman, M., & Chang, K. (2007). Divalproex sodium for the treatment of PTSD and conduct disordered youth: a pilot randomized controlled clinical trial. Child Psychiatry Hum Dev, 38(3), 183-193. doi:10.1007/s10578-007-0055-8

Stevens, J., Quittner, A. L., & Abikoff, H. (1998). Factors influencing elementary school teachers' ratings of ADHD and ODD behaviors. J Clin Child Psychol, 27(4), 406-414. doi:10.1207/s15374424jccp2704_4

Stocks, J. D., Taneja, B. K., Baroldi, P., & Findling, R. L. (2012). A phase 2a randomized, parallel group, dose-ranging study of molindone in children with attention-deficit/hyperactivity disorder and persistent, serious conduct problems. J Child Adolesc Psychopharmacol, 22(2), 102-111. doi:10.1089/cap.2011.0087

Suris, A., Lind, L., Emmett, G., Borman, P. D., Kashner, M., & Barratt, E. S. (2004). Measures of aggressive behavior: overview of clinical and research instruments. Aggression and Violent Behavior, 9(2), 165-227. doi:10.1016/S1359-1789(03)00012-0

Tassé, M. J., Aman, M. G., Hammer, D., & Rojahn, J. (1996). The Nisonger child behavior rating form: Age and gender effects and norms. Res Dev Disabil, 17(1), 59-75. doi:10.1016/0891-4222(95)00037-2

Ter-Stepanian, M., Grizenko, N., Zappitelli, M., & Joober, R. (2010). Clinical response to methylphenidate in children diagnosed with attention-deficit hyperactivity disorder and comorbid psychiatric disorders. Can J Psychiatry, 55(5), 305-312.

Vogel, L. (2016). New tool evaluates risk of patient aggression. CMAJ : Canadian Medical Association journal = journal de l'Association medicale canadienne, 188(10), E200. doi:10.1503/cmaj.109-5276

Waschbusch, D. A., & Elgar, F. J. (2007). Development and validation of the Conduct Disorder Rating Scale. Assessment, 14(1), 65-74. doi:10.1177/1073191106289908

Wasserman, J., & Bracken, B. (2003). Psychometric Characteristics of Assessment Procedures. In Handbook of Psychology. John Wiley & Sons, Inc.

Wehmeier, P. M., Schacht, A., Dittmann, R. W., Helsberg, K., Schneider-Fresenius, C., Lehmann, M., . . . Ravens-Sieberer, U. (2011). Effect of atomoxetine on quality of life and family burden: results from a randomized, placebo-controlled, double-blind study in children and adolescents with ADHD and comorbid oppositional defiant or conduct disorder. Qual Life Res, 20(5), 691-702. doi:10.1007/s11136-010-9803-5

Westerlund, J., Ek, U., Holmberg, K., Naswall, K., & Fernell, E. (2009). The Conners' 10-item scale: findings in a total population of Swedish 10-11-year-old children. Acta Paediatr, 98(5), 828-833. doi:10.1111/j.1651-2227.2008.01214.x

World Health Organisation (WHO). (1991). ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines. Geneva: World Health Organization (WHO).

Xiao, Q. (2012). Reliability and validity of the Iowa Conners rating scale in Chinese children. Neuropsychiatrie de l'Enfance et de l'Adolescence, 60(5), S266. doi:10.1016/j.neurenf.2012.04.698

Zhang, P., Roberts, R. E., Liu, Z., Meng, X., Tang, Z., Sun, L., & Yu, Y. (2012). Hostility, Physical Aggression and Trait Anger as Predictors for Suicidal Behavior in Chinese Adolescents: A School-Based Study. PLoS ONE, 7(2), 1-5.


APPENDIX: Features of Included Instruments

Nisonger Child Behaviour Rating Form (NCBRF): 76 items; 8 subscales; parent assessment of problem behaviours and social competence in children with low IQs/mental retardation; closed and open questions (past 1 month for closed questions, 1-2 months for open questions); ages 3-16.

Conners' 3rd Edition: 110 items (20 minutes); closed questions; ages 6-18.

Children's Aggression Scale (CAS): 33 items (parent version), 23 items (teacher version); 5 subscales; parent-observed frequency and severity of aggressive and disruptive behaviours over the past year (10-15 minutes).

Aberrant Behaviour Checklist (Community) (ABC-C): 58 items; 5 subscales; problem behaviours in people with mental retardation in residential/community/educational settings.

Child Behaviour Checklist (CBCL): behavioural and emotional functioning and social competence; open and closed questions; time frame present to 6 months; ages 6-18.

Behaviour Problems Inventory (BPI-01): 52 items; 3 subscales; self-injurious, stereotyped and aggressive/destructive behaviours in the intellectually disabled; closed questions; past 2 months; children to adults; cost N/A.

Modified Overt Aggression Scale (MOAS): 16 items; 4 subscales; assesses the nature, prevalence and severity of aggression in psychiatric populations (<15 minutes).

Disruptive Behaviour Disorder Rating Scale: 45 items; identifies symptoms of ADHD, ODD and CD in children and adolescents (5-10 minutes).

Swanson, Nolan, and Pelham Rating Scale (SNAP-IV): 90 items (10 minutes); parent assessment of ADHD, ODD and aggression symptoms; time frame weekly to lifetime; closed questions.


Highlights
• Instruments used to measure violent and aggressive behaviors in youth are lacking.
• These tools are essential for conducting randomised controlled trials (RCTs) in this population.
• We reviewed the instruments used in RCTs in violent and aggressive youth.
• Most instruments are not specifically focussed on violence and aggression.
• The CAS is psychometrically sound and measures aggressive and violent behaviors.