Accepted Manuscript
High Agreement was Obtained Across Scores from Multiple Equated Scales for Social Anxiety Disorder using Item Response Theory
Matthew Sunderland, Philip Batterham, Alison Calear, Natacha Carragher, Andrew Baillie, Tim Slade
PII: S0895-4356(17)30916-2
DOI: 10.1016/j.jclinepi.2018.04.003
Reference: JCE 9629
To appear in: Journal of Clinical Epidemiology
Received Date: 10 August 2017
Revised Date: 16 March 2018
Accepted Date: 4 April 2018
Please cite this article as: Sunderland M, Batterham P, Calear A, Carragher N, Baillie A, Slade T, High Agreement was Obtained Across Scores from Multiple Equated Scales for Social Anxiety Disorder using Item Response Theory, Journal of Clinical Epidemiology (2018), doi: 10.1016/j.jclinepi.2018.04.003.
High Agreement was Obtained Across Scores from Multiple Equated Scales for Social Anxiety Disorder using Item Response Theory
Matthew Sunderland1,2
Philip Batterham3
Alison Calear3
Natacha Carragher1,4
Andrew Baillie1,5
Tim Slade1,2
1. Centre for Research Excellence in Mental Health and Substance Use, UNSW Sydney, Sydney, Australia.
2. National Drug and Alcohol Research Centre, UNSW Sydney, Sydney, Australia.
3. Centre for Mental Health Research, Australian National University, Canberra, Australia.
4. Office of Medical Education, UNSW Sydney, Sydney, Australia.
5. Centre for Emotional Health, Macquarie University, Sydney, Australia.
Corresponding author: Dr Matthew Sunderland, Building R1, National Drug and Alcohol Research Centre, UNSW Sydney, Sydney, Australia. Email:
[email protected].
ABSTRACT
Objective: There is no standardized approach to the measurement of social anxiety. Researchers and clinicians are faced with numerous self-report scales with varying strengths, weaknesses, and psychometric properties. The lack of standardization makes it difficult to
compare scores across populations that utilise different scales. Item response theory offers one solution to this problem via equating different scales using an anchor scale to set a
standardized metric. This study is the first to equate several scales for social anxiety disorder.
Design and setting: Data from two samples (n=3,175 and n=1,052), recruited from the
Australian community using online advertisements, were utilised to equate a network of 11 self-report social anxiety scales via a fixed parameter item calibration method.
Results: Comparisons between actual and equated scores for most of the scales indicated a high level of agreement with mean differences <0.10 (equivalent to a mean difference of less
than one point on the standardized metric).
Conclusion: This study demonstrates that scores from multiple scales that measure social anxiety can be converted to a common scale. Re-scoring observed scores to a common scale
provides opportunities to combine research from multiple studies and ultimately better assess
social anxiety in treatment and research settings.
Keywords: Item response theory; scale equating; social anxiety disorder; psychometrics.
1. INTRODUCTION
Social anxiety disorder (SAD) is characterised by persistent distress and fear (primarily directed at fear of embarrassment or negative social judgement) during social
interactions or performance situations. Those with social anxiety either endure social situations with intense distress and impaired functioning or avoid social or performance
situations utilising overt or subtle avoidance strategies [1]. SAD is one of the most common mental disorders experienced in developed countries and is strongly related to other comorbid
anxiety and mood disorders, as well as substance use disorders and harmful alcohol use [2–4]. SAD is associated with an early age of onset relative to other mood and anxiety disorders, as well as a rapid transition from first fearing social or performance situations to overt avoidance [5,6]. If left untreated, the disorder is associated with a persistent course, and those with social anxiety often neglect to seek treatment or wait a substantial length of time before
seeking help [3,7].
Given the sizeable burden associated with SAD, it is imperative that measures are available that validly assess and quantify the severity of social anxiety in the population. Yet,
there is no standardized approach to the measurement of social anxiety, and researchers and clinicians are faced with numerous scales that purport to measure SAD with varying
strengths, weaknesses, and psychometric properties [8]. The lack of standardization makes it difficult to compare scores across populations that utilise different scales. This in turn limits or introduces bias among studies that seek to harmonise or combine datasets from various populations or across time and limits clinical trials from generating robust evidence associated with the disorder [9]. Moreover, benchmarking of treatments as well as individual patient data meta-analyses require a standardized metric to draw comparisons between patient populations or different samples for proper evaluation [10]. With standardization, users of
different scales would have a common index of severity of disorder with known precision that would facilitate communication and generate robust evidence associated with SAD.
Several strategies exist to standardize measurement in mental health, including the
assertion that a single scale should be utilised for all future research trials and clinical assessment. Yet this approach is impractical given the different requirements of individual studies, different researcher and clinician preferences, no consensus on the optimal scale, and a need to maintain consistency with existing or historical datasets. An alternative approach
utilises methods under the framework of Item Response Theory (IRT) to equate scores across
a variety of scales on a common or unified metric [11,12]. In other words, regardless of the specific scale used, the severity of SAD associated with an individual could be re-scored onto a common metric to facilitate comparisons and data harmonization. Previous studies have demonstrated the utility of this approach and have equated several scales on a common metric for depression, generalized anxiety, and psychological distress [13–16]. Independent studies
have found that equated or linked scores demonstrate a high degree of accuracy and agreement [17,18]. To our knowledge no study has equated different scales for SAD using
IRT and a common metric.
The first requirement to equate scores for different social anxiety scales is to establish
the common metric used as an anchor for the scale. Recent work by the authors utilised a systematic approach, known as item banking, to develop a valid, reliable, and relevant (based on clinical and patient ratings) set of self-report items for social anxiety disorder (labelled the SAD-bank) [19,20]. The item parameters of this SAD-bank were estimated in a community sample that was weighted to the sex, age, and comorbidity profiles of the Australian population [19]. Thus, scores on the SAD-bank metric can be meaningfully interpreted with respect to the general population (e.g., a mean score on the SAD-bank metric reflects the mean of the general population). A strength of using item banks to establish a common metric
is the rigorous development process under the assumptions of IRT, which makes the item bank well suited for IRT-based scale equating.
The current study is the first to equate several widely used and emerging scales for
SAD using IRT-based equating. The scales selected by the current study included: the Liebowitz Social Anxiety Scale (LSAS; [21]), the Social Interaction Anxiety Scale (SIAS-20; [22]), the Social Phobia Scale (SPS-20; [22]), the Social Phobia Inventory (SPIN; [23]), and their associated short form scales (SIAS-6, SPS-6, Mini-SPIN; [24,25]). Moreover, two
emerging and promising scales for SAD were included: the Social Phobia Screener (SOPHS;
[26]) and the DSM-5 severity scale for social anxiety disorder (DSM-SAD; [27]) recommended by the American Psychiatric Association for dimensional assessment of SAD. All these scales, except the SPIN and Mini-SPIN, are freely available for use in research and clinical settings, which maximises their utility in population-based research. The current study examined IRT-based equating and provided estimates of agreement between the
equated and actual scores on the common metric. Finally, easy-to-use conversion tables were provided to facilitate scoring on the common metric. The primary aim of the current study
was to establish a highly flexible yet unified approach to the measurement of SAD to
encourage scientific gains in future research and clinical work.
2. METHOD
2.1. Samples
Two samples were utilised in the current study to equate the multiple SAD scales.
Sample 1 comprised 3,175 Australian community-dwelling adults (aged 18+) recruited using Facebook advertising during August-December 2014. This sample was utilised in the original development study of the SAD-bank, as well as the development and simulation of adaptive static short forms derived from the full bank with further details described elsewhere [19,28].
In addition to a range of other measures, Sample 1 contained the SAD-bank and the SOPHS. In the current study, this sample was used to equate the SOPHS on the SAD-bank metric.
Sample 2 comprised community-dwelling, English-speaking, Australian adults (18+
years) recruited using Facebook advertisements during November-December 2016. Advertisements were placed as a promoted link on Facebook feeds and as a paid
advertisement appearing on the right-hand side of the Facebook website to targeted users. The data were obtained for the primary purpose of equating additional SAD scales to the
SAD-bank metric and therefore Sample 2 contained the SAD-bank and all other SAD scales
equated in the current study. The study was approved by the UNSW Human Research Ethics Committee (HC no. 16428). A total of 4,147 Facebook users clicked on the advertisement and were taken to a participant information and consent form. Of those, 1,988 (48%) participants provided informed consent and commenced the online survey. Only participants who completed the full battery of assessments were included in the study, due to the
requirements of the equating design, resulting in a final sample size of 1,052 (53% of those providing consent). The samples were skewed towards participants who were female,
younger, more educated, and reported a greater severity of SAD. Descriptive statistics of both samples are provided in Table 1.
2.2. Measures
2.2.1. SAD Item Bank
Further details on the development and calibration of the SAD-bank are provided in
Batterham et al. [19] and Batterham et al. [20]. The SAD-bank included 26 items targeting feelings and behaviours experienced during the past 30 days associated with social or performance anxiety, ranked on a five-point scale (0= Never, 1= Rarely, 2= Sometimes, 3= Often, 4= Always). Scoring for the SAD-bank is based on IRT methods and item parameters
estimated previously using a community sample weighted to the Australian population. Traditional IRT response pattern scores (θ scores) utilise a normal distribution with a mean of 0 and standard deviation of 1. These scores can be converted to a t-metric by multiplying the IRT score by 10 and adding 50, thereby placing scores on a more intuitive metric with a mean of 50 and standard deviation of 10.
2.2.2. Social Phobia Screener (SOPHS)
The SOPHS is a relatively new measure containing five items to assess the degree of fear, embarrassment, avoidance, and interference caused by social or performance situations
experienced in the past 30 days. The SOPHS items are rated on a five-point scale with scores ranging from 0 to 20, higher scores representing more severe social anxiety. The SOPHS has strong psychometric properties among community and clinical samples including high internal consistency, high convergent validity, a strong single factor structure, sensitivity to change, and high sensitivity and specificity when detecting DSM-IV and DSM-5 diagnoses of social anxiety disorder [26,29].
2.2.3. Liebowitz Social Anxiety Scale (LSAS)
The LSAS is one of the most widely used measures of social anxiety and comprises
24 items that cover a range of social interaction or performance situations [21]. The LSAS can be scored on two subscales with respondents asked to rate the level of fear and avoidance for each situation. To reduce the level of respondent burden in the current study, only data on the fear subscale of the LSAS was collected. Items on the fear scale are measured using a four-point scale ranging from 0 (no fear) to 3 (severe fear) for each situation. The LSAS can be administered as a self-report instrument with good test-retest reliability, satisfactory internal consistency, convergent/divergent and discriminant validity, and is sensitive to change [30,31].
2.2.4. Social Phobia Inventory (SPIN) and Mini-Social Phobia Inventory (Mini-SPIN)
The SPIN includes 17 items that ask respondents to indicate how much they have been bothered by fear, avoidance and physical symptoms of SAD in the past seven days.
Scoring the SPIN is based on summing the responses to each of the 17 items on a 5-point response scale ranging from 0 to 4. Higher scores reflect greater severity of social anxiety. The SPIN has good internal consistency, test-retest reliability, acceptable
convergent/divergent and discriminant validity, and sensitivity to change [23].
The Mini-SPIN comprises three items from the SPIN designed to briefly screen for
SAD but also serve as a severity measure [25]. Similar to the SPIN, the Mini-SPIN has demonstrated excellent internal consistency, high convergent/divergent and discriminant validity, good test-retest reliability, sensitivity to change, and can accurately detect SAD from other anxiety or depressive states [32–34]. To reduce respondent burden and remove any
unnecessary repetition of the scales, the analysis of the Mini-SPIN in the current study was based on administration of the items embedded within the SPIN.
2.2.5. Social Interaction Anxiety Scale (SIAS-20), Social Phobia Scale (SPS-20), and short forms
The widely used SIAS-20 and SPS-20 were developed as companion measures to assess fear and distress associated with social interaction, as well as fears of being scrutinised by others during routine activities. The SIAS-20 and SPS-20 each comprise 20 items rated on a five-point scale ranging from 0 (not at all characteristic of me) to 4 (extremely characteristic of me). Total scores on each scale are calculated by summing the responses to each of the items, with higher scores representing more severe social anxiety. The SIAS-20 and SPS-20 have solid psychometric properties including high convergent/divergent validity,
internal consistency, test-retest reliability, and the ability to adequately discriminate between social phobia and normal samples [22].
Peters and colleagues [24] applied nonparametric item response theory to select six
items from each scale with optimal discriminating properties to develop a short form version of the SIAS-20 and SPS-20. Independent validation of the SIAS-6 and SPS-6 confirmed the utility of the scales in place of their longer counterparts [35]. Additional analyses using
bifactor models indicated that items from the two short form measures could be combined
into a single scale, referred to here as the General Social Anxiety Scale-12 (GSAS-12), with
higher total scores representing more severe levels of general social anxiety [36]. All three short form measures were examined in the current study. As with the Mini-SPIN, the analysis of the short form items in the current study was based on administration of the scales embedded within the long forms.
2.2.6. DSM-5 Severity scale for Social Anxiety Disorder (DSM-SAD)
An emerging ten-item severity measure for DSM-5 social anxiety disorder (DSM-SAD) is available for administration in adult populations and recommended for dimensional
assessment of clinical cases to further enhance decision making by the American Psychiatric
Association. The DSM-SAD assesses behavioural responses, fear, and distress associated with usual social situations experienced in the past seven days on a five-point response scale. Total scores can range from 0 to 40 with higher scores indicating greater severity of social anxiety disorder. Evidence for the psychometric properties of the DSM-SAD is limited but initial validation studies in a clinical sample suggest the scale has adequate internal consistency, and convergent and discriminant validity [27].
2.3. Design and equating procedure
The current study utilised a single-group equating design where items from each instrument to be equated are administered to all participants [37]. The within-person design directly controls for differences in response propensities across the scales and is considered the strongest equating design [38]. To reduce the possibility of order effects, the
administration of the SAD scales was randomised for each participant. There are multiple approaches available for scale equating using a single-group design, including IRT and non-IRT based approaches. The current study focused on an IRT-based approach to scale equating given the strong assumptions at the item level and greater flexibility associated with IRT-based approaches [38].
The specific IRT-based approach utilised in this study was fixed-parameter calibration [39]. In this approach, the SAD scales were equated using a single calibration of all items (the anchor scale and one additional scale to be equated). The item parameters for the anchor scale (i.e., the scale that sets the metric, in this case the SAD-bank) were fixed at their previously
published values, whereas the item parameters for the scale to be equated were freely estimated but subject to the metric set by the anchor set [19]. The freely estimated item
parameters for the equated scale could then be used to generate scores that were scaled to the SAD-bank metric. The items were calibrated using the two-parameter logistic graded item
response model appropriate for ordinal categorical data [40]. Fixed-parameter calibration was implemented using the mirt package for R [41].
To examine the accuracy of the equated scores on the SAD-bank metric, the equated
scores generated by responses to items on each of the equated scales were compared to the actual SAD-bank scores generated by the responses provided to items on the SAD-bank. Several IRT scores on the SAD-bank metric were estimated for each participant using actual and equated item parameters [42]. It was also possible to convert raw total scores on each of the equated scales to the SAD-bank metric based on crosswalk or conversion tables [43]. The
equated scores based on IRT response pattern scoring as well as the crosswalk method were compared with the actual SAD-bank IRT scores using the intra-class correlation (ICC) coefficient, mean difference (bias), the standard deviation of the mean difference, and Bland-Altman limits of agreement [44]. Finally, to examine the potential bias between equated scores and actual SAD-bank scores expected in smaller sample sizes, a resampling procedure was used to generate samples of n=50, n=100 and n=200 randomly selected from the full sample over 500 replications. Bias was calculated in each replication and the mean bias and 95% confidence intervals of the bias distribution were reported.
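The agreement statistics described above can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, the 1.96 multiplier for the limits of agreement is the conventional Bland-Altman choice, and the ICC form shown (two-way, absolute agreement, single measures) is an assumption because the manuscript does not state which ICC variant was used.

```python
import numpy as np

def theta_to_t(theta):
    """Convert a theta score to the t-metric (mean 50, SD 10)."""
    return 10.0 * np.asarray(theta, dtype=float) + 50.0

def agreement_summary(actual, equated):
    """Bias, SD of the differences and 95% limits of agreement (theta units)."""
    diff = np.asarray(equated, dtype=float) - np.asarray(actual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return {
        "bias": bias,
        "sd_diff": sd,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }

def icc_absolute_agreement(actual, equated):
    """ICC(2,1): two-way random effects, absolute agreement (assumed form)."""
    x = np.column_stack([actual, equated]).astype(float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((col_means - grand) ** 2).sum() / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because the t-metric is a linear rescaling, a bias of 0.1 in theta units corresponds to one point on the t-metric.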
2.4. Assumption testing
Both IRT and scale equating require several assumptions of the data and the equating procedure [38]. The scales that are to be equated need to be unidimensional, measure the same construct, and be highly correlated. To test these assumptions the total raw scores on
the SAD-bank were correlated with the total raw scores on the scales to be equated. McDonald’s omega coefficient and the corrected item-total correlations were estimated for each scale as well as the combination of each scale with the SAD-bank items to measure
internal consistency [45,46]. Confirmatory factor analysis (CFA) was then conducted using ordinal categorical data and a weighted least square mean and variance adjusted estimator as
implemented by Mplus 7.4 [47]. Unidimensional models were fit to combined items of the SAD-bank and each scale to be equated. Model fit was determined using a variety of fit statistics and previously determined cut-offs for good model fit, including: the comparative fit index (CFI; ≥0.90=adequate fit, ≥0.95 = excellent fit), the Tucker Lewis fit index (TLI; ≥0.90=adequate fit, ≥0.95 = excellent fit) and the root mean square error of approximation (RMSEA; ≤0.10 = adequate fit, ≤0.06 = excellent fit) [48,49].
To further examine the impact of multidimensionality on a single general factor, a series of exploratory bifactor models was estimated on the combined item sets [50,51]. Model fit statistics and eigenvalues were inspected along with the OmegaH coefficient and the percent of common variance explained by the general factor (ECV) [46]. A large ratio between the first and second eigenvalues, as well as high values for OmegaH (≥0.70) and ECV (≥0.60), provided evidence for a strong general factor and indicated that multidimensionality would have little impact on item parameters estimated using a unidimensional model [52]. Clusters
of locally dependent items (providing some indication for unmodeled multidimensionality) were identified by inspection of the residual correlation matrix after fitting the IRT graded
response model (using the mirt R package).
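To make the OmegaH and ECV indices concrete, the following Python sketch computes both from standardized bifactor loadings using their standard definitions. It assumes orthogonal factors and standardized loadings, and the function name and input layout are hypothetical; the study itself estimated these quantities from exploratory bifactor models.

```python
import numpy as np

def bifactor_indices(general, specific):
    """Return (omega_h, ecv) from standardized bifactor loadings.

    general:  general-factor loadings, shape (n_items,)
    specific: group-factor loadings, shape (n_items, n_group_factors),
              with zeros where an item does not load on a group factor
    """
    g = np.asarray(general, dtype=float)
    s = np.asarray(specific, dtype=float)
    uniqueness = 1.0 - g ** 2 - (s ** 2).sum(axis=1)
    if np.any(uniqueness < 0):
        raise ValueError("loadings imply negative unique variance")
    var_general = g.sum() ** 2
    var_group = (s.sum(axis=0) ** 2).sum()
    # OmegaH: share of total-score variance due to the general factor.
    omega_h = var_general / (var_general + var_group + uniqueness.sum())
    # ECV: share of common variance explained by the general factor.
    ecv = (g ** 2).sum() / ((g ** 2).sum() + (s ** 2).sum())
    return omega_h, ecv
```

With four items loading 0.8 on the general factor and 0.3 on one of two group factors, this gives an OmegaH of about 0.85 and an ECV of about 0.88, both above the thresholds quoted above.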
Finally, subgroup invariance of the equating procedures was examined by determining the Root Expected Mean Square Difference (REMSD) of the equating functions (i.e., cross-walk tables) across different subpopulations of interest [53,54]. The REMSD is a
single value summarizing the values of the standardized Root Mean Square Difference (RMSD) of the equating functions over the distribution of the equated scale and can be
considered a type of effect size [55]. Invariance of the equating functions was examined across gender and age (18-45 vs 45+). The RMSD/REMSD values were weighted for the
unequal sample sizes associated with the subpopulations [55]. In the current study, REMSD values less than 0.10 (indexing a very small effect size) were considered small enough to justify population invariance of the equating functions.
2.5. Robustness Analysis
The use of multiple equating methods provides more robust evidence for scale equating whilst helping to identify any potential problems associated with the alternative methods [13]. As such, a separate calibration method for scale equating was also
applied to the data. This method involved freely estimating two different sets of item parameters before equating them using multiplicative and additive constants that linearly transform one set to be equivalent to the other set based on a common set of items [56]. Four methods were utilised to obtain the linking constants: 1) mean/mean, 2) mean/sigma, 3) the
Haebara (1980) method, and 4) the Stocking-Lord (1983) method. The linking constants for each of the four methods were estimated using the plink package for R [58].
The four separate calibration methods to generate equated scores were compared to
the fixed-parameter calibration method by determining the differences in equated scores
along the SAD-bank severity metric. In line with previous studies, the fixed parameter calibration equating procedure could be considered robust if differences between scores across the equating methods were less than 1 expected score point along the common SAD-bank metric [13].
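The two moment-based linking methods named above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the plink implementation: the function names are hypothetical, the parameters are taken to be in the slope/location (a, b) form, and the transformation is written as theta_anchor = A * theta_new + B. The Haebara and Stocking-Lord methods minimise differences between characteristic curves and are not shown.

```python
import numpy as np

def mean_sigma_constants(b_new, b_anchor):
    """Mean/sigma linking constants from common-item location parameters."""
    b_new = np.asarray(b_new, dtype=float)
    b_anchor = np.asarray(b_anchor, dtype=float)
    A = b_anchor.std(ddof=1) / b_new.std(ddof=1)
    B = b_anchor.mean() - A * b_new.mean()
    return A, B

def mean_mean_constants(a_new, b_new, a_anchor, b_anchor):
    """Mean/mean linking constants from common-item slopes and locations."""
    A = np.mean(a_new) / np.mean(a_anchor)
    B = np.mean(b_anchor) - A * np.mean(b_new)
    return A, B

def transform_parameters(a, b, A, B):
    """Place freely estimated parameters on the anchor metric."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return a / A, A * b + B
```

For graded items, each item contributes several location parameters; in this sketch they would simply be flattened into one array per calibration.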
3. RESULTS
There were consistently high correlations between scores from each of the scales and the SAD-bank, ranging from r=0.80 for the SIAS-6 to r=0.87 for the SOPHS, SPIN, and
DSM-SAD (mean correlation across all scales r=0.85). Table 2 displays the McDonald’s
omega coefficients and the minimum, average, and maximum corrected item-total correlations for each of the scales and the combined scales with the SAD-bank. Reliability for the scales was uniformly high with only the SIAS-6 and Mini-SPIN evidencing omega values below 0.9. Moreover, reliability of the combined scales with the SAD-bank was consistently very high (ω ≥ 0.99), as were the average corrected item-total correlations (ranging from 0.74 to 0.84).
Results of the CFA and the exploratory bifactor analyses of the combined scales with the SAD-bank are provided in Table 3. The combined item sets for the SAD-bank+LSAS and
SAD-bank+SPIN displayed evidence of adequate model fit (≥0.90) per the CFI and TLI and the SAD-bank+SIAS-20 displayed evidence of adequate model fit for the TLI whereas the remaining models demonstrated evidence of excellent fit (≥0.95) for the CFI and TLI. RMSEA values for all the unidimensional models were on or just below the cut-off for
acceptable fit (<0.10). The model fit statistics associated with the exploratory bifactor
analyses were all within the excellent range for CFI and TLI and well within the acceptable range for RMSEA. The OmegaH and ECV values were all very high (ωH ≥ 0.98, ECV ≥ 0.87)
suggesting a dominant general factor accounting for most of the variance among items. There was some indication of local dependence among five item pairs as evidenced by relatively
high residual correlations (>0.3) between LSAS6 (“Acting, performing or giving a talk in front of an audience”) and LSAS20 (“Giving a report to a group”), SPIN1 (“I am afraid of people in authority”) and SPIN16 (“I avoid speaking to anyone in authority”), SPIN5 (“Being criticized scares me a lot”) and SPIN12 (“I would do anything to avoid being criticized”),
SAD-Bank26 (“I avoided disagreeing with or expressing disapproval to others”) and SIAS13 (“I find it difficult to disagree with another’s point of view”), and SIAS9 (“I am at ease meeting people at parties”) and SIAS11 (“I find it easy to think of things to talk about”). The
high residual correlations are most likely due to the similar wording associated with each
item pairing and the positive wording of the two SIAS items. Despite the presence of some local dependence among items, we continued to use IRT-based equating because the OmegaH and ECV values were large (indicating any local dependence will likely have a small impact) and inspection of the slope parameters (see Supplementary Material 1) did not provide any indication of inflated discrimination.
In terms of the robustness analysis, there was substantial overlap between the equating functions from the different equating methods (results are provided in the Supplementary Material 2). Differences between the fixed-parameter calibration and the
separate calibration methods on almost all scales were less than 1 point on the expected score across the range of the SAD-bank metric. The only exception was the difference between the fixed-parameter calibration method and the separate calibration method with mean/mean constants for the SPS-20 at the +2 to +2.5 severity range on the SAD-bank metric. However,
differences were only slightly above the 1 point expected score cut-off with the largest
difference in expected scores estimated at 1.05. As such, the fixed-parameter calibration
equating method was deemed robust and was utilised for the remaining analyses. The equated
item parameters for each of the SAD scales based on the fixed-parameter calibration are
provided in Supplementary Material 1.
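The expected score comparison used in the robustness check can be sketched in Python under the graded response model. This is a minimal illustration, not the authors' code: it assumes the slope/threshold (a, b) parameterization, whereas software such as mirt may report a slope/intercept form, and all function names and the item layout are hypothetical.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one graded item with ordered thresholds b."""
    b = np.asarray(b, dtype=float)
    # Probability of responding in category k or higher, for k = 1..m.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    upper = np.concatenate(([1.0], p_star))
    lower = np.concatenate((p_star, [0.0]))
    return upper - lower

def expected_scale_score(theta, items):
    """Expected raw total score at theta; items is a list of (a, b) pairs."""
    total = 0.0
    for a, b in items:
        probs = grm_category_probs(theta, a, b)
        total += float(np.dot(np.arange(probs.size), probs))
    return total

def max_expected_score_gap(items_method_1, items_method_2, grid):
    """Largest absolute expected-score difference between two parameter sets."""
    gaps = [
        abs(expected_scale_score(t, items_method_1)
            - expected_scale_score(t, items_method_2))
        for t in grid
    ]
    return max(gaps)
```

Under the criterion stated in the text, two equating methods would be judged to agree when the largest gap over the severity range is below 1 expected score point.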
Comparisons of the actual SAD-bank scores and the equated scores for each scale using both IRT response pattern scoring and crosswalk scoring are provided in Table 4. The two scoring methods demonstrated similar results; however, a slight decrease in accuracy (SD of mean difference) was identified for the crosswalk scoring. The ICCs indicated that all the
equated scores were highly correlated (>0.88) with the actual SAD-bank scores providing evidence for equity among the scores. Overall, most of the equated scores demonstrated a
high level of agreement with the actual SAD-bank scores with mean differences <0.1 (equivalent to a mean difference of less than one point on the t-metric). The Mini-SPIN,
SIAS-6, and SPS-6 demonstrated slightly higher mean differences but all could be considered small in terms of effect size (<0.2). The full Bland-Altman plots are presented in Supplementary Material 3 and provide some indication of larger differences in SAD-bank scores at below average scores on the SAD-bank (theta<0) for the SOPHS and SPS-6 whereas for the other scales the differences are more equally distributed across mean SAD-bank scores. Equated scores using the SIAS-20 provided the greatest agreement with the actual SAD-bank scores, closely followed by the SPIN, SPS-20, LSAS, and DSM-SAD. Equated
scores using the short form scales (Mini-SPIN, SIAS-6, SPS-6) were less accurate; however, the SOPHS provided a good balance between number of items and accuracy (bias=0.09). The precision of each scale as a function of theta scores on the equated SAD-bank metric is displayed graphically in Figure 1, which shows greater precision across the SAD-bank
metric for the lengthier scales in comparison to the short forms with the SOPHS again
providing a good balance between precision and efficiency. The results of the resampling analysis are provided in Table 5 and indicate a greater spread of potential bias between
equated and actual SAD-bank scores when the sample size was low (n=50) with the upper 95% confidence interval as high as 0.33 (equivalent to a 3 point difference on the t-metric)
for the SIAS-6 crosswalk method; however, the potential bias improved with sample sizes of n=100 or more.
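The resampling analysis summarised above can be sketched in Python as follows. The manuscript does not state whether subsamples were drawn with or without replacement or how the 95% interval was formed, so sampling without replacement and a percentile interval are assumptions here, as are the function name and the seed argument.

```python
import numpy as np

def resampled_bias(actual, equated, n, replications=500, seed=0):
    """Mean bias and a 95% percentile interval across random subsamples."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(equated, dtype=float) - np.asarray(actual, dtype=float)
    biases = np.empty(replications)
    for r in range(replications):
        # Draw n participants without replacement (assumed design).
        idx = rng.choice(diff.size, size=n, replace=False)
        biases[r] = diff[idx].mean()
    lower, upper = np.percentile(biases, [2.5, 97.5])
    return biases.mean(), lower, upper
```

Running the function for n=50, n=100 and n=200 on the same score vectors would show the interval narrowing as the subsample grows, which is the pattern reported in Table 5.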
Population invariance of the equating functions across gender and age groups was confirmed based on REMSD values <0.1. For age, the REMSD values ranged between 0.018
for the SOPHS and 0.095 for the SPS. For gender, REMSD values ranged between 0.018 for the SPS and 0.048 for the LSAS. The crosswalk tables for the total population are presented
in Supplementary Material 4 and could be utilised by researchers and clinicians to equate individual or group means on each of the different scales. IRT response pattern scoring for
each of the scales is available for increased accuracy and precision using the equated IRT parameters provided in the supplementary material; however, this requires the use of software with IRT estimation capability to estimate scores.
4. DISCUSSION
The current study equated a large network of widely used or emerging scales that measure SAD, based on a standardized or common metric (i.e., the SAD-bank). The results indicated that the different SAD scales demonstrated good concurrent validity with the SAD-
ACCEPTED MANUSCRIPT bank metric as evidenced by correlations between raw scores ≥0.80. Moreover, the confirmatory factor analyses of the combined item sets indicated that a unidimensional model sufficiently accounted for most of the variance among items from the SAD-bank and each SAD scale. Comparisons of the different equating procedures indicated a high degree of
RI PT
similarity in equated scores and provided further evidence for the robustness of the findings. Likewise, any degree of population non-invariance of the equating function across gender and age groups was identified as small [54]. All scales demonstrated a relatively small mean
SC
difference (bias) between equated and actual SAD-bank scores in terms of effect size, as well as acceptable degrees of precision. Scales with fewer items, such as the SIAS-6, SPS-6 and
M AN U
Mini-SPIN short forms, tended to demonstrate greater bias in comparison to longer scales and less precision, whereas the SOPHS provided a good balance between scale length (at five items),bias (0.09), and precision across the SAD-bank metric. Finally, the cross-walk scoring method resulted in slightly less accurate scores than the IRT response pattern scoring,
TE D
however the added complexity associated with response pattern scoring may outweigh the minor differences observed between the two methods.
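To give a sense of the added complexity of response pattern scoring, the sketch below computes a Bayesian expected a posteriori (EAP) score under the graded response model [40] by numerical quadrature. It is a minimal illustration rather than the authors' implementation: the item parameters in `ITEMS`, the function names, and the quadrature grid are all hypothetical, and in practice dedicated IRT software would be used with the equated parameters from the supplementary material.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one graded response item.

    theta: 1-D array of quadrature points
    a: discrimination; b: ascending thresholds (one fewer than the number of categories)
    """
    theta = np.asarray(theta, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference adjacent columns
    upper = np.hstack([np.ones((theta.shape[0], 1)), p_star])
    lower = np.hstack([p_star, np.zeros((theta.shape[0], 1))])
    return upper - lower

def eap_score(responses, items, n_quad=81):
    """EAP theta estimate and posterior SD for one response pattern (standard normal prior)."""
    grid = np.linspace(-4.0, 4.0, n_quad)
    posterior = np.exp(-0.5 * grid ** 2)  # unnormalised N(0, 1) prior
    for x, (a, b) in zip(responses, items):
        posterior = posterior * grm_category_probs(grid, a, b)[:, x]
    posterior /= posterior.sum()
    theta_hat = float((grid * posterior).sum())
    post_sd = float(np.sqrt((((grid - theta_hat) ** 2) * posterior).sum()))
    return theta_hat, post_sd

# Hypothetical parameters for three five-category items (responses coded 0-4)
ITEMS = [
    (2.2, [-0.5, 0.4, 1.3, 2.1]),
    (1.8, [-0.2, 0.7, 1.6, 2.5]),
    (2.5, [0.0, 0.8, 1.5, 2.2]),
]

low = eap_score([0, 0, 0], ITEMS)
mid = eap_score([2, 2, 2], ITEMS)
high = eap_score([4, 4, 4], ITEMS)
print(low, mid, high)
```

Unlike a crosswalk lookup, which needs only the sum score, this calculation uses every item response and its parameters, which is why software with IRT estimation capability is required.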
The results of the current study are broadly concordant with other equating studies for constructs of mental health using IRT, particularly the common metrics set by the PROMIS initiative for depression and anxiety [13,15–17]. Yet, to our knowledge, this is the first study to equate several scales specifically developed to measure SAD. The existing PROMIS anxiety common metric and equating functions can be best described as targeting broad or general levels of anxiety, such as those corresponding to generalized anxiety disorder or the broad construct of anxious-distress previously identified in hierarchical models of emotional disorders [15,59]. While it could be assumed that the PROMIS anxiety metric would be highly correlated with the SAD-bank metric, the PROMIS anxiety metric provides little information regarding the more targeted construct of anxiety, fear, and distress associated with performance or social interactions. Clinicians or researchers wishing to assess social anxiety or equate existing measures with a more targeted focus than broad anxiety would benefit from utilising the measures and the equating functions developed in the current study.
Several easy-to-use crosswalk or conversion tables were produced as part of this research study, as well as the data required to score scales using equated IRT parameters for more accurate equated scores on the SAD-bank metric. These tools provide researchers and clinicians with the ability to accurately compare scores from different studies that utilise different scales without the added effects of bias associated with a differential propensity to respond to each scale. The implications of this study include greater capacity for harmonizing datasets that measure SAD, including greater accuracy associated with meta-analyses and benchmarking exercises across clinical populations.
As one example, a technique known as cross-temporal meta-analysis involves estimating trends across cohorts over time on a construct of interest (neuroticism, psychopathology, depression, etc.) using published data and extracting the mean scores from a common scale [60,61]. To date, existing studies have been limited to extracting data using a single scale, and therefore continuity across time is limited given changing trends in the field regarding the use of scales. Extracting mean scores and then converting those scores to the SAD-bank metric would overcome this limitation and enable a greater number of data points to be extracted from studies over a longer period. Likewise, meta-analysis of individual patient data enables detailed and robust investigation of patient-level characteristics in comparison to meta-analysis of summary data, and represents the top tier of the hierarchy in terms of the best available evidence in medicine [62,63]. Using equated scores to conduct individual patient data meta-analysis may overcome several considerable and complex barriers associated with standardization and commensurate measures in the field of mental health. Further work on the application of equated scores in meta-analysis is required.

In terms of moving forward with a standardized metric for SAD, the most accurate scores will be obtained by direct administration of the SAD-bank items. Following that, the next level of accuracy could be obtained by administration of one of the adaptive or static screeners derived from the SAD-bank, which offer increased efficiency and therefore reduced respondent burden [28]. Finally, the next level of accuracy could be obtained by estimating the equated SAD-bank scores using one of the scales included in the current study. Whilst there is a lower level of accuracy and an increased level of imprecision associated with equated scores, their use offers an increased level of flexibility and utility associated with assessment. Moreover, the current study utilised a single-group equating design, considered the most robust form of equating design [38], and tested multiple forms of IRT equating. These factors ensure that the equating functions provided in the current study offer the best approach to approximate a common metric or standardised measurement without administering a single common scale across multiple studies.
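As a rough illustration of how a crosswalk table might be applied to individual scores and group means, consider the sketch below. Everything in it is hypothetical: the `CROSSWALK` values are invented for an imaginary scale scored 0 to 6 and are not taken from Supplementary Material 4, the T-score rescaling assumes the conventional mean of 50 and SD of 10, and the linear interpolation used for non-integer group means is an assumption rather than a procedure described in this study.

```python
import math

# Hypothetical crosswalk (raw sum score -> equated theta) for an imaginary 0-6 scale.
# Real values would come from the published crosswalk tables.
CROSSWALK = {0: -1.10, 1: -0.40, 2: 0.10, 3: 0.55, 4: 1.00, 5: 1.50, 6: 2.10}

def theta_to_t(theta):
    """Rescale a theta score (mean 0, SD 1) to a T-score (assumed mean 50, SD 10)."""
    return 50.0 + 10.0 * theta

def equate_score(raw, table=CROSSWALK):
    """Look up the equated theta for an individual's integer raw sum score."""
    if raw not in table:
        raise ValueError(f"raw score {raw} is outside the crosswalk range")
    return table[raw]

def equate_mean(mean_raw, table=CROSSWALK):
    """Approximate the equated theta for a group mean by linear interpolation
    between the two neighbouring crosswalk entries (an illustrative assumption)."""
    lower, upper = math.floor(mean_raw), math.ceil(mean_raw)
    if lower not in table or upper not in table:
        raise ValueError(f"mean score {mean_raw} is outside the crosswalk range")
    if lower == upper:
        return table[lower]
    weight = mean_raw - lower
    return (1.0 - weight) * table[lower] + weight * table[upper]

individual_theta = equate_score(3)
group_theta = equate_mean(2.5)
print(individual_theta, theta_to_t(individual_theta))
print(group_theta, theta_to_t(group_theta))
```

Because a crosswalk is generally non-linear, converting a group mean in this way is only an approximation of the mean of individually converted scores.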
The limitations of the current study should be kept in mind when interpreting the results. First, participants were recruited using Facebook advertisements and completed an anonymous online survey to generate the data. The characteristics of the sample were markedly different from those of a representative sample of the Australian general population, including a higher proportion of females, higher levels of education, and more severe social anxiety. As such, the results could be influenced by some degree of self-selection bias. However, steps were taken to reduce this bias by utilising IRT parameters for the SAD-bank metric that were estimated in a sample that was weighted to characteristics of the Australian population. Moreover, population invariance of the equating functions was examined to ensure that the equating does not systematically differ depending on gender or age. Second, data for the short form versions of the SIAS, SPS and SPIN were obtained from a single administration of the long forms. This was done to reduce respondent burden and to limit repetitiveness of the survey. However, the ecological validity of the SIAS-6, SPS-6, and Mini-SPIN requires further investigation, as does the use of these equating functions based on administration of the independent short forms. Third, the scale equating functions were evaluated in the current study (see Table 4) using the same sample in which the equating functions were developed. This limitation runs the risk of generating overly optimistic results given the potential for overfitting the model to the data. Additional research is required to evaluate the current equating functions in an independent dataset, and until such studies can be carried out the current results should be treated as preliminary evidence of validity. Fourth, despite the random presentation of the social anxiety scales during data collection, there is still the potential that order effects may have influenced the data, given that multiple, very similar items were presented to each participant and the overall response burden was quite high in both samples (surveys took approximately 30-60 minutes to complete). Finally, the SAD-bank metric is assumed to target a single construct that represents the severity of generalized or broad social anxiety disorder. Likewise, the equated scales are assumed to represent a similar construct. While the results of the CFA and bifactor EFA indicate that a single construct could be assumed, the SAD-bank metric does not provide any information on specific sub-factors or components of social anxiety. Additional work investigating the utility of multidimensional and bifactor models to establish the metric, as suggested for the measurement of depression [64], is required.

This study demonstrates that scores from multiple scales that measure SAD can be converted to a common metric using the IRT parameters (in the case of response pattern scoring) or the cross-walk tables provided in the supplementary material. Re-scoring observed scores to a common scale provides opportunities to combine research from multiple studies, to reduce inconsistency between different measures, and ultimately better assess SAD in treatment and research settings. By providing one approach to standardising the measurement of SAD, it is hoped that additional scientific gains will be achieved regarding the aetiology, identification, treatment, and prevention of social anxiety among the population.
FUNDING

MS, PB, and AC are supported by National Health and Medical Research Council fellowships [1052327, 1083311, 1122544]. The funders had no involvement in the study design, collection, analysis, interpretation, and writing of this report.
REFERENCES

[1] Rapee RM, Heimberg RG. A cognitive-behavioral model of anxiety in social phobia. Behav Res Ther 1997;35:741–56. doi:10.1016/S0005-7967(97)00022-3.
[2] McEvoy PM, Grove R, Slade T. Epidemiology of anxiety disorder in the Australian general population: findings of the 2007 Australian National Survey of Mental Health and Wellbeing. Aust N Z J Psychiatry 2011;45:957–67.
[3] Crome E, Grove R, Baillie AJ, Sunderland M, Teesson M, Slade T. DSM-IV and DSM-5 social anxiety disorder in the Australian community. Aust N Z J Psychiatry 2015;49:227–35. doi:10.1177/0004867414546699.
[4] Grant BF, Hasin DS, Blanco C, Stinson FS, Chou SP, Goldstein RB, et al. The epidemiology of social anxiety disorder in the United States: Results from the National Epidemiologic Survey on Alcohol and Related Conditions. J Clin Psychiatry 2005;66:1351–61.
[5] Kessler RC, Berglund P, Demler O, Jin R, Merikangas KR, Walters EE. Lifetime Prevalence and Age-of-Onset Distributions of DSM-IV Disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry 2005;62:593. doi:10.1001/archpsyc.62.6.593.
[6] Sunderland M, Crome E, Stapinski L, Baillie A, Rapee R. From fear to avoidance: factors associated with the onset of avoidance in people who fear social situations. J Exp Psychopathol 2016;7:534–48. doi:10.5127/jep.055216.
[7] Beesdo-Baum K, Knappe S, Fehm L, Hofler M, Lieb R, Hofmann SG, et al. The natural course of social anxiety disorder among adolescents and young adults. Acta Psychiatr Scand 2012;126:411–25. doi:10.1111/j.1600-0447.2012.01886.x.
[8] Wong QJJ, Gregory B, McLellan LF. A Review of Scales to Measure Social Anxiety Disorder in Clinical and Epidemiological Studies. Curr Psychiatry Rep 2016;18:38. doi:10.1007/s11920-016-0677-2.
[9] Hussong AM, Curran PJ, Bauer DJ. Integrative Data Analysis in Clinical Psychology Research. Annu Rev Clin Psychol 2013;9:61–89. doi:10.1146/annurev-clinpsy-050212-185522.
[10] Griffith L, van den Heuvel E, Fortier I, Hofer S, Raina P, Sohel N, et al. Harmonization of Cognitive Measures in Individual Participant Data and Aggregate Data Meta-Analysis. Harmon. Cogn. Meas. Individ. Particip. Data Aggreg. Data Meta-Analysis, Agency for Healthcare Research and Quality (US); 2013. doi:http://www.ncbi.nlm.nih.gov/books/NBK132553/.
[11] Gibbons R, Perraillon MC, Kim JB. Item Response Theory Approaches to Harmonization and Research Synthesis. Health Serv Outcomes Res Methodol 2014;14:213–31. doi:10.1007/s10742-014-0125-x.
[12] Curran PJ, Hussong AM, Cai L, Huang W, Chassin L, Sher KJ, et al. Pooling data from multiple longitudinal studies: the role of item response theory in integrative data analysis. Dev Psychol 2008;44:365–80. doi:10.1037/0012-1649.44.2.365.
[13] Choi SW, Schalet B, Cook KF, Cella D. Establishing a common metric for depressive symptoms: Linking the BDI-II, CES-D, and PHQ-9 to PROMIS Depression. Psychol Assess 2014;26:513–27. doi:10.1037/a0035768.
[14] Wahl I, Lowe B, Bjorner JB, Fischer F, Langs G, Voderholzer U, et al. Standardization of depression measurement: A common metric was developed for 11 self-report depression measures. J Clin Epidemiol 2014;67:73–86. doi:10.1016/j.jclinepi.2013.04.019.
[15] Schalet BD, Cook KF, Choi SW, Cella D. Establishing a common metric for self-reported anxiety: Linking the MASQ, PANAS, and GAD-7 to PROMIS Anxiety. J Anxiety Disord 2014;28:88–96. doi:10.1016/j.janxdis.2013.11.006.
[16] Kaat AJ, Newcomb ME, Ryan DT, Mustanski B. Expanding a common metric for depression reporting: linking two scales to PROMIS® depression. Qual Life Res 2016:1–10. doi:10.1007/s11136-016-1450-z.
[17] Kim J, Chung H, Askew RL, Park R, Jones SMW, Cook KF, et al. Translating CESD-20 and PHQ-9 Scores to PROMIS Depression. Assessment 2017;24:300–7. doi:10.1177/1073191115607042.
[18] Liegl G, Wahl I, Berghofer A, Nolte S, Pieh C, Rose M, et al. Using Patient Health Questionnaire-9 item parameters of a common metric resulted in similar depression scores compared to independent item response theory model reestimation. J Clin Epidemiol 2016;71:25–34. doi:10.1016/j.jclinepi.2015.10.006.
[19] Batterham PJ, Sunderland M, Carragher N, Calear AL. Development and community-based validation of eight item banks to assess mental health. Psychiatry Res 2016;243:452–63. doi:10.1016/j.psychres.2016.07.011.
[20] Batterham PJ, Brewer JL, Tjhin A, Sunderland M, Carragher N, Calear AL. Systematic item selection process applied to developing item pools for assessing multiple mental health problems. J Clin Epidemiol 2015;68:913–9. doi:10.1016/j.jclinepi.2015.03.022.
[21] Liebowitz MR. Social Phobia. Mod Probl Pharmacopsychiatry 1987;22:141–73. doi:10.1159/000414022.
[22] Mattick RP, Clarke JC. Development and validation of measures of social phobia scrutiny fear and social interaction anxiety. Behav Res Ther 1998;36:455–70. doi:10.1016/S0005-7967(97)10031-6.
[23] Conner KM, Davidson JRT, Churchill LE, Sherwood A, Weisler RH. Psychometric Properties of the Social Phobia Inventory (SPIN). Br J Psychiatry 2000;176:379–86. doi:10.1192/bjp.176.4.379.
[24] Peters L, Sunderland M, Andrews G, Rapee RM, Mattick RP. Development of a short form Social Interaction Anxiety (SIAS) and Social Phobia Scale (SPS) using nonparametric item response theory: The SIAS-6 and the SPS-6. Psychol Assess 2012;24:66–76. doi:10.1037/a0024544.
[25] Conner KM, Kobak KA, Churchill E, Katzelnick DJ, Davidson JRT. Mini-SPIN: a brief screening assessment for generalized social anxiety disorder. Depress Anxiety 2001;14:137–40.
[26] Batterham PJ, Mackinnon AJ, Christensen H. Community-Based Validation of the Social Phobia Screener (SOPHS). Assessment 2016:1073191116636448. doi:10.1177/1073191116636448.
[27] LeBeau RT, Mesri B, Craske MG. The DSM-5 social anxiety disorder severity scale: Evidence of validity and reliability in a clinical sample. Psychiatry Res 2016;244:94–6. doi:10.1016/j.psychres.2016.07.024.
[28] Sunderland M, Batterham PJ, Calear AL, Carragher N. The development and validation of static and adaptive screeners to measure the severity of panic disorder, social anxiety disorder, and obsessive compulsive disorder. Int J Methods Psychiatr Res 2017:e1561. doi:10.1002/mpr.1561.
[29] Griffiths KM, Walker J, Batterham PJ. Help seeking for social anxiety: A pilot randomised controlled trial. Digit Heal 2017;3:205520761771204. doi:10.1177/2055207617712047.
[30] Fresco DM, Coles ME, Heimberg RG, Liebowitz MR, Hami S, Stein MB, et al. The Liebowitz Social Anxiety Scale: a comparison of the psychometric properties of self-report and clinician administered formats. Psychol Med 2001;31:1025–35.
[31] Baker SL, Heinrichs N, Kim HJ, Hofmann SG. The Liebowitz social anxiety scale as a self-report instrument: A preliminary psychometric analysis. Behav Res Ther 2002;40:701–15. doi:10.1016/S0005-7967(01)00060-2.
[32] Newby JM, Mewton L, Williams AD, Andrews G. Effectiveness of transdiagnostic internet cognitive behavioural treatment for mixed anxiety and depression in primary care. J Affect Disord 2014;165:45–52. doi:10.1016/j.jad.2014.04.037.
[33] Fogliati VJ, Terides MD, Gandy M, Staples LG, Johnston L, Karin E, et al. Psychometric properties of the mini-social phobia inventory (Mini-SPIN) in a large online treatment-seeking sample. Cogn Behav Ther 2016;45:236–57. doi:10.1080/16506073.2016.1158206.
[34] Seeley-Wait E, Abbott MJ, Rapee RM. Psychometric properties of the mini-social phobia inventory. Prim Care Companion J Clin Psychiatry 2009;11:231–6. doi:10.4088/PCC.07m00576.
[35] Le Blanc AL, Bruce LC, Heimberg RG, Hope DA, Blanco C, Schneier FR, et al. Evaluation of the Psychometric Properties of Two Short Forms of the Social Interaction Anxiety Scale and the Social Phobia Scale. Assessment 2014;21:312–23. doi:10.1177/1073191114521279.
[36] Gomez R. Factor structure of the Social Interaction Anxiety Scale and the Social Phobia Scale Short Forms. Pers Individ Dif 2016;96:83–7. doi:10.1016/j.paid.2016.02.086.
[37] Kolen MJ, Brennan RL. Test Equating, Scaling, and Linking. 2nd ed. New York, NY: Springer; 2004.
[38] Dorans NJ. Linking scores from multiple health outcome instruments. Qual Life Res 2007;16:85–94. doi:10.1007/s11136-006-9155-3.
[39] Kim S. A comparative study of IRT fixed parameter calibration methods. J Educ Meas 2006;43:355–81. doi:10.1111/j.1745-3984.2006.00021.x.
[40] Samejima F. Graded Response Model. Handb. Mod. Item Response Theory, New York, NY: Springer New York; 1997, p. 85–100. doi:10.1007/978-1-4757-2691-6_5.
[41] Chalmers RP. mirt: A multidimensional item response theory package for the R environment. J Stat Softw 2012;48:1–29. doi:10.18637/jss.v048.i06.
[42] Embretson SE, Reise SP. Item Response Theory for Psychologists. Mahwah: Lawrence Erlbaum Associates; 2000.
[43] Thissen D, Pommerich M, Billeaud K, Williams VSL. Item Response Theory for Scores on Tests Including Polytomous Items with Ordered Responses. Appl Psychol Meas 1995;19:39–49. doi:10.1177/014662169501900105.
[44] Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60. doi:10.1177/096228029900800204.
[45] Dunn TJ, Baguley T, Brunsden V. From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. Br J Psychol 2014;105:399–412. doi:10.1111/bjop.12046.
[46] Rodriguez A, Reise SP, Haviland MG. Evaluating Bifactor Models: Calculating and Interpreting Statistical Indices. Psychol Methods 2016;21:137–50. doi:10.1037/met0000045.
[47] Muthen LK, Muthen BO. Mplus Users’ Guide. Los Angeles, CA: Muthen & Muthen; 2015.
[48] Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model 1999;6:1–55.
[49] Sharma S, Mukherjee S, Kumar A, Dillon WR. A simulation study to investigate the use of cutoff values for assessing model fit in covariance structure models. J Bus Res 2005;58:935–43. doi:10.1016/j.jbusres.2003.10.007.
[50] Jennrich RI, Bentler PM. Exploratory Bi-factor Analysis: The Oblique Case. Psychometrika 2012;77:442–54. doi:10.1007/s11336-012-9269-1.
[51] Cook KF, Kallen MA, Amtmann D. Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT’s unidimensionality assumption. Qual Life Res 2009;18:447–60. doi:10.1007/s11136-009-9464-4.
[52] Reise SP, Scheines R, Widaman KF, Haviland MG. Multidimensionality and Structural Coefficient Bias in Structural Equation Modeling. Educ Psychol Meas 2013;73:5–26. doi:10.1177/0013164412449831.
[53] Dorans NJ, Holland PW. Population Invariance and the Equatability of Tests: Basic Theory and The Linear Case. J Educ Meas 2000;37:281–306. doi:10.1111/j.1745-3984.2000.tb01088.x.
[54] Huggins AC, Penfield RD. An NCME instructional module on population invariance in linking and equating. Educ Meas Issues Pract 2012;31:27–40.
[55] Dorans NJ. Equating, Concordance, and Expectation. Appl Psychol Meas 2004;28:227–46. doi:10.1177/0146621604265031.
[56] Stocking ML, Lord FM. Developing a Common Metric in Item Response Theory. Appl Psychol Meas 1983;7:201–10. doi:10.1177/014662168300700208.
[57] Haebara T. Equating logistic ability scales by a weighted least squares method. Jpn Psychol Res 1980;22:144–9. doi:10.4992/psycholres1954.22.144.
[58] Weeks JP. plink: An R Package for Linking Mixed-Format Tests Using IRT-Based Methods. J Stat Softw 2010;35:1–33. doi:10.18637/jss.v035.i12.
[59] Watson D, O’hara MW, Stuart S. Hierarchical structures of affect and psychopathology and their implications for the classification of emotional disorders. Depress Anxiety 2008;25:282–8.
[60] Twenge JM, Gentile B, DeWall CN, Ma D, Lacefield K, Schurtz DR. Birth cohort increases in psychopathology among young Americans, 1938–2007: A cross-temporal meta-analysis of the MMPI. Clin Psychol Rev 2010;30:145–54. doi:10.1016/j.cpr.2009.10.005.
[61] Mackenzie CS, Erickson J, Deane FP, Wright M. Changes in attitudes toward seeking mental health services: A 40-year cross-temporal meta-analysis. Clin Psychol Rev 2014;34:99–106. doi:10.1016/j.cpr.2013.12.001.
[62] Steinberg KK, Smith SJ, Stroup DF, Olkin I, Lee NC, Williamson GD, et al. Comparison of effect estimates from a meta-analysis of summary data from published studies and from a meta-analysis using individual patient data for ovarian cancer studies. Am J Epidemiol 1997;145:917–25.
[63] Stewart LA, Tierney JF. To IPD or not to IPD? Eval Health Prof 2002;25:76–97. doi:10.1177/0163278702025001006.
[64] Croudace TJ, Böhnke JR. Item bank measurement of depression: Will one dimension work? J Clin Epidemiol 2014;67:4–6. doi:10.1016/j.jclinepi.2013.08.002.
TABLES

Table 1: Sample characteristics

| Characteristic | Category | Sample 1 N | Sample 1 % | Sample 2 N | Sample 2 % |
|---|---|---|---|---|---|
| Age | 18-25 | 371 | 11.7 | 245 | 23.3 |
| | 26-35 | 277 | 8.7 | 233 | 22.2 |
| | 36-45 | 506 | 15.9 | 153 | 14.5 |
| | 46-55 | 830 | 26.1 | 186 | 17.7 |
| | 56-65 | 842 | 26.5 | 173 | 16.4 |
| | 66+ | 349 | 11 | 62 | 5.9 |
| Gender | Male | 648 | 20.4 | 208 | 19.8 |
| | Female | 2527 | 79.6 | 813 | 77.3 |
| | Other | - | - | 23 | 2.2 |
| | Prefer not to answer | - | - | 5 | 0.5 |
| | Missing | - | - | 3 | 0.3 |
| Education | Less than high school | 382 | 12.0 | 64 | 6.1 |
| | High school | 440 | 13.9 | 169 | 16.1 |
| | Certificate/Diploma | 963 | 30.3 | 281 | 26.7 |
| | Bachelor’s degree | 630 | 19.8 | 257 | 24.4 |
| | Higher degree | 747 | 23.5 | 274 | 26.0 |
| | Prefer not to answer | 13 | 0.4 | 7 | 0.7 |
| Area | Metropolitan area (capital city) | 1421 | 44.8 | 508 | 48.3 |
| | Regional area (other city/town) | 1279 | 40.3 | 447 | 42.5 |
| | Rural or remote area | 475 | 15 | 92 | 8.7 |
| | Missing | - | - | 5 | 0.5 |
Table 2: Reliability and item-total correlations for individual scales and combined with SAD item bank

| Scale | Number of items | McDonald’s Omega | Corrected item-total correlation: Min | Average | Max |
|---|---|---|---|---|---|
| SAD-bank | 26 | 0.99 | 0.64 | 0.81 | 0.87 |
| SOPHS | 5 | 0.98 | 0.86 | 0.88 | 0.90 |
| LSAS | 24 | 0.97 | 0.48 | 0.67 | 0.78 |
| SPIN | 17 | 0.96 | 0.58 | 0.70 | 0.82 |
| Mini-SPIN | 3 | 0.88 | 0.62 | 0.70 | 0.78 |
| SIAS-20 | 20 | 0.96 | 0.39 | 0.67 | 0.84 |
| SIAS-6 | 6 | 0.88 | 0.42 | 0.62 | 0.84 |
| SPS-20 | 20 | 0.97 | 0.54 | 0.71 | 0.83 |
| SPS-6 | 6 | 0.94 | 0.63 | 0.77 | 0.84 |
| GSAS-12 | 12 | 0.95 | 0.44 | 0.69 | 0.80 |
| DSM-SAD | 10 | 0.96 | 0.59 | 0.76 | 0.84 |
| SAD item bank + SOPHS | 31 | 0.99 | 0.73 | 0.84 | 0.89 |
| SAD item bank + LSAS | 50 | 0.99 | 0.47 | 0.73 | 0.86 |
| SAD item bank + SPIN | 43 | 0.99 | 0.54 | 0.75 | 0.86 |
| SAD item bank + Mini-SPIN | 29 | 0.99 | 0.65 | 0.80 | 0.87 |
| SAD item bank + SIAS-20 | 46 | 0.99 | 0.36 | 0.73 | 0.86 |
| SAD item bank + SIAS-6 | 32 | 0.99 | 0.45 | 0.77 | 0.87 |
| SAD item bank + SPS-20 | 46 | 0.99 | 0.50 | 0.74 | 0.87 |
| SAD item bank + SPS-6 | 32 | 0.99 | 0.62 | 0.79 | 0.87 |
| SAD item bank + GSAS-12 | 38 | 0.99 | 0.46 | 0.76 | 0.87 |
| SAD item bank + DSM-SAD | 36 | 0.99 | 0.57 | 0.78 | 0.87 |

Notes: SAD item bank = Social Anxiety Disorder item bank, SOPHS = Social Phobia Screener, LSAS = Liebowitz Social Anxiety Scale – Fear subscale, SPIN = Social Phobia Inventory, SIAS = Social Interaction Anxiety Scale, SPS = Social Phobia Scale, GSAS = General Social Anxiety Scale from SIAS-6/SPS-6, DSM-SAD = DSM-5 Severity Screener for Social Anxiety Disorder.
Table 3: Tests for unidimensionality of the combined scales with the SAD item bank

| Model | Unidimensional CFI | Unidimensional TLI | Unidimensional RMSEA | Exploratory bifactor CFI | Exploratory bifactor TLI | Exploratory bifactor RMSEA | Eigenvalue 1 | Eigenvalue 2 | Eigenvalue 3 | Eigenvalue 4 | ECV | OmegaH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SAD-bank + SOPHS | 0.98 | 0.98 | 0.09 | 0.99 | 0.99 | 0.04 | 24.34 | 0.71 | 0.50 | 0.21 | 0.95 | 0.99 |
| SAD-bank + LSAS | 0.93 | 0.93 | 0.09 | 0.98 | 0.97 | 0.06 | 30.72 | 1.65 | 1.63 | 1.37 | 0.87 | 0.98 |
| SAD-bank + SPIN | 0.94 | 0.93 | 0.10 | 0.97 | 0.97 | 0.07 | 27.88 | 2.01 | 1.20 | 0.91 | 0.87 | 0.99 |
| SAD-bank + Mini-SPIN | 0.97 | 0.97 | 0.10 | 0.99 | 0.99 | 0.06 | 20.68 | 0.80 | 0.73 | 0.48 | 0.91 | 0.98 |
| SAD-bank + SIAS-20 | 0.95 | 0.94 | 0.09 | 0.98 | 0.98 | 0.06 | 28.52 | 1.48 | 1.44 | 1.21 | 0.87 | 0.99 |
| SAD-bank + SIAS-6 | 0.97 | 0.96 | 0.10 | 0.99 | 0.99 | 0.06 | 21.66 | 0.93 | 0.81 | 0.74 | 0.90 | 0.98 |
| SAD-bank + SPS-20 | 0.95 | 0.95 | 0.09 | 0.99 | 0.98 | 0.05 | 29.63 | 1.52 | 1.36 | 1.13 | 0.88 | 0.98 |
| SAD-bank + SPS-6 | 0.97 | 0.96 | 0.10 | 0.99 | 0.99 | 0.06 | 22.43 | 1.19 | 0.68 | 0.54 | 0.90 | 0.98 |
| SAD-bank + GSAS-12 | 0.96 | 0.96 | 0.10 | 0.99 | 0.98 | 0.06 | 25.18 | 1.19 | 1.12 | 0.85 | 0.89 | 0.98 |
| SAD-bank + DSM-SAD | 0.97 | 0.96 | 0.09 | 0.99 | 0.98 | 0.06 | 24.98 | 1.07 | 1.05 | 0.50 | 0.90 | 0.98 |

Notes: SAD item bank = Social Anxiety Disorder item bank, SOPHS = Social Phobia Screener, LSAS = Liebowitz Social Anxiety Scale – Fear subscale, SPIN = Social Phobia Inventory, SIAS = Social Interaction Anxiety Scale, SPS = Social Phobia Scale, GSAS = General Social Anxiety Scale from SIAS-6/SPS-6, DSM-SAD = DSM-5 Severity Screener for Social Anxiety Disorder. CFI = Comparative Fit Index, TLI = Tucker-Lewis Index, RMSEA = Root mean square error of approximation. Exploratory bifactor analysis included one general factor and three specific factors. ECV = explained common variance, OmegaH = omega hierarchical.
Table 4: Intraclass correlations, mean differences, standard deviation of differences, and limits of agreement between equated and actual SAD-bank scores

| Scale-Scoring Method | ICC | Mean difference | SD of differences | Limits of agreement: Lower | Upper |
|---|---|---|---|---|---|
| Equated SOPHS IRT scoring | 0.91 | 0.08 | 0.60 | -1.10 | 1.26 |
| Equated SOPHS cross walk scoring | 0.91 | 0.09 | 0.60 | -1.09 | 1.27 |
| Equated LSAS IRT scoring | 0.92 | 0.05 | 0.52 | -0.97 | 1.07 |
| Equated LSAS cross walk scoring | 0.92 | 0.06 | 0.53 | -0.98 | 1.10 |
| Equated SPIN IRT scoring | 0.93 | 0.04 | 0.47 | -0.88 | 0.96 |
| Equated SPIN cross walk scoring | 0.93 | 0.05 | 0.50 | -0.93 | 1.03 |
| Equated Mini-SPIN IRT scoring | 0.89 | 0.15 | 0.57 | -0.97 | 1.27 |
| Equated Mini-SPIN cross walk scoring | 0.88 | 0.16 | 0.57 | -0.96 | 1.28 |
| Equated SIAS-20 IRT scoring | 0.93 | 0.03 | 0.48 | -0.91 | 0.97 |
| Equated SIAS-20 cross walk scoring | 0.92 | 0.04 | 0.50 | -0.94 | 1.02 |
| Equated SIAS-6 IRT scoring | 0.89 | 0.14 | 0.59 | -1.02 | 1.30 |
| Equated SIAS-6 cross walk scoring | 0.88 | 0.17 | 0.60 | -1.01 | 1.35 |
| Equated SPS-20 IRT scoring | 0.92 | 0.05 | 0.51 | -0.95 | 1.05 |
| Equated SPS-20 cross walk scoring | 0.92 | 0.07 | 0.52 | -0.95 | 1.09 |
| Equated SPS-6 IRT scoring | 0.88 | 0.13 | 0.61 | -1.07 | 1.33 |
| Equated SPS-6 cross walk scoring | 0.88 | 0.16 | 0.61 | -1.04 | 1.36 |
| Equated GSAS-12 IRT scoring | 0.92 | 0.07 | 0.50 | -0.91 | 1.05 |
| Equated GSAS-12 cross walk scoring | 0.92 | 0.09 | 0.51 | -0.91 | 1.09 |
| Equated DSM-SAD IRT scoring | 0.93 | 0.06 | 0.46 | -0.84 | 0.96 |
| Equated DSM-SAD cross walk scoring | 0.93 | 0.08 | 0.47 | -0.84 | 1.00 |

Notes: SOPHS = Social Phobia Screener, LSAS = Liebowitz Social Anxiety Scale – Fear subscale, SPIN = Social Phobia Inventory, SIAS = Social Interaction Anxiety Scale, SPS = Social Phobia Scale, GSAS = General Social Anxiety Scale from SIAS-6/SPS-6, DSM-SAD = DSM-5 Severity Screener for Social Anxiety Disorder. SD = standard deviation, ICC = intraclass correlation, IRT scoring = individual response pattern scoring (θ metric with mean=0 and SD=1) based on Bayesian expected a posteriori (EAP), cross walk scoring = scores (θ metric with mean=0 and SD=1) based on sum score to EAP score tables.
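The limits of agreement in Table 4 follow the usual Bland-Altman construction [44] of the mean difference plus or minus 1.96 times the SD of the differences. The sketch below shows that calculation both from raw scores and from the reported summary values. The function names are illustrative, and because the inputs taken from the table are already rounded, recomputed limits can be expected to match only to within rounding.

```python
import numpy as np

def limits_of_agreement(mean_diff, sd_diff):
    """95% limits of agreement from a mean difference and the SD of the differences."""
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

def bland_altman(equated, actual):
    """Mean difference (bias), SD of differences, and limits of agreement from paired scores."""
    diffs = np.asarray(equated, dtype=float) - np.asarray(actual, dtype=float)
    bias = float(diffs.mean())
    sd = float(diffs.std(ddof=1))
    return bias, sd, limits_of_agreement(bias, sd)

# Recompute two rows of Table 4 from their reported mean difference and SD
sophs_irt = limits_of_agreement(0.08, 0.60)
dsm_sad_irt = limits_of_agreement(0.06, 0.46)
print([round(x, 2) for x in sophs_irt], [round(x, 2) for x in dsm_sad_irt])
```

For the SOPHS IRT scoring row this gives limits of roughly -1.10 and 1.26, in line with the tabled values.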
Table 5: Mean bias, SD of mean bias, and 95% confidence intervals of bias of SAD-bank scores with equated scores at sample size n=50, n=100, and n=200 over 500 replications

| Scale-Scoring method | N=50 Mean bias | N=50 SD | N=50 95% CI lower | N=50 95% CI upper | N=100 Mean bias | N=100 SD | N=100 95% CI lower | N=100 95% CI upper | N=200 Mean bias | N=200 SD | N=200 95% CI lower | N=200 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Equated SOPHS IRT scoring | 0.06 | 0.08 | -0.10 | 0.22 | 0.06 | 0.06 | -0.06 | 0.18 | 0.07 | 0.04 | -0.01 | 0.15 |
| Equated SOPHS cross walk scoring | 0.07 | 0.08 | -0.09 | 0.23 | 0.07 | 0.06 | -0.05 | 0.19 | 0.07 | 0.04 | -0.01 | 0.15 |
| Equated LSAS IRT scoring | 0.05 | 0.08 | -0.11 | 0.21 | 0.05 | 0.05 | -0.05 | 0.15 | 0.05 | 0.03 | -0.01 | 0.11 |
| Equated LSAS cross walk scoring | 0.06 | 0.08 | -0.10 | 0.22 | 0.06 | 0.05 | -0.04 | 0.16 | 0.06 | 0.03 | 0.00 | 0.12 |
| Equated SPIN IRT scoring | 0.04 | 0.06 | -0.08 | 0.16 | 0.04 | 0.04 | -0.04 | 0.12 | 0.04 | 0.03 | -0.02 | 0.10 |
| Equated SPIN cross walk scoring | 0.06 | 0.07 | -0.08 | 0.20 | 0.05 | 0.05 | -0.05 | 0.15 | 0.05 | 0.03 | -0.01 | 0.11 |
| Equated Mini-SPIN IRT scoring | 0.15 | 0.08 | -0.01 | 0.31 | 0.15 | 0.05 | 0.05 | 0.25 | 0.15 | 0.03 | 0.09 | 0.21 |
| Equated Mini-SPIN cross walk scoring | 0.16 | 0.08 | 0.00 | 0.32 | 0.16 | 0.05 | 0.06 | 0.26 | 0.16 | 0.03 | 0.10 | 0.22 |
| Equated SIAS-20 IRT scoring | 0.03 | 0.07 | -0.11 | 0.17 | 0.03 | 0.04 | -0.05 | 0.11 | 0.03 | 0.03 | -0.03 | 0.09 |
| Equated SIAS-20 cross walk scoring | 0.05 | 0.07 | -0.09 | 0.19 | 0.04 | 0.05 | -0.06 | 0.14 | 0.04 | 0.03 | -0.02 | 0.10 |
| Equated SIAS-6 IRT scoring | 0.14 | 0.08 | -0.02 | 0.30 | 0.14 | 0.06 | 0.02 | 0.26 | 0.14 | 0.04 | 0.06 | 0.22 |
| Equated SIAS-6 cross walk scoring | 0.17 | 0.08 | 0.01 | 0.33 | 0.17 | 0.06 | 0.05 | 0.29 | 0.17 | 0.04 | 0.09 | 0.25 |
| Equated SPS-20 IRT scoring | 0.05 | 0.07 | -0.09 | 0.19 | 0.05 | 0.05 | -0.05 | 0.15 | 0.05 | 0.03 | -0.01 | 0.11 |
| Equated SPS-20 cross walk scoring | 0.08 | 0.07 | -0.06 | 0.22 | 0.08 | 0.05 | -0.02 | 0.18 | 0.08 | 0.03 | 0.02 | 0.14 |
| Equated SPS-6 IRT scoring | 0.13 | 0.08 | -0.03 | 0.29 | 0.13 | 0.06 | 0.01 | 0.25 | 0.13 | 0.04 | 0.05 | 0.21 |
| Equated SPS-6 cross walk scoring | 0.15 | 0.08 | -0.01 | 0.31 | 0.16 | 0.06 | 0.04 | 0.28 | 0.16 | 0.04 | 0.08 | 0.24 |
| Equated GSAS-12 IRT scoring | 0.08 | 0.07 | -0.06 | 0.22 | 0.07 | 0.05 | -0.03 | 0.17 | 0.07 | 0.03 | 0.01 | 0.13 |
| Equated GSAS-12 cross walk scoring | 0.10 | 0.07 | -0.04 | 0.24 | 0.09 | 0.05 | -0.01 | 0.19 | 0.10 | 0.03 | 0.04 | 0.16 |
| Equated DSM-SAD IRT scoring | 0.06 | 0.06 | -0.06 | 0.18 | 0.06 | 0.05 | -0.04 | 0.16 | 0.06 | 0.03 | 0.00 | 0.12 |
| Equated DSM-SAD cross walk scoring | 0.08 | 0.06 | -0.04 | 0.20 | 0.07 | 0.05 | -0.03 | 0.17 | 0.08 | 0.03 | 0.02 | 0.14 |

Notes: SOPHS = Social Phobia Screener, LSAS = Liebowitz Social Anxiety Scale – Fear subscale, SPIN = Social Phobia Inventory, SIAS = Social Interaction Anxiety Scale, SPS = Social Phobia Scale, GSAS = General Social Anxiety Scale from SIAS-6/SPS-6, DSM-SAD = DSM-5 Severity Screener for Social Anxiety Disorder. SD = standard deviation of bias estimates, 95% CI = 95% confidence intervals of bias estimates, IRT scoring = individual response pattern scoring (θ metric with mean=0 and SD=1) based on Bayesian expected a posteriori (EAP), cross walk scoring = scores (θ metric with mean=0 and SD=1) based on sum score to EAP score tables.
[Figure 1 plot: test information (y-axis "Information", 0 to 40) against SAD-Bank theta score (x-axis, -4 to 4), with one curve each for the SOPHS, LSAS, SPIN, Mini-SPIN, SIAS-20, SIAS-6, SPS-20, SPS-6, GSAS-12, and DSM-SAD.]

Figure 1: Test information curves for each social anxiety scale equated on the SAD-Bank metric using fixed item calibration. Note: Information can be interpreted as the degree of precision associated with the equated scale as a function of the SAD-Bank theta scores (an information value of 11 is approximately equivalent to a reliability coefficient of 0.9).
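The correspondence between information and reliability noted in the caption can be checked with a commonly used approximation: the standard error of theta is 1/sqrt(information), and, assuming theta has unit variance, reliability is approximately 1 minus the squared standard error. The helper names below are illustrative and the unit-variance default is an assumption.

```python
import math

def standard_error(information):
    """Standard error of the theta estimate at a given level of test information."""
    return 1.0 / math.sqrt(information)

def reliability(information, theta_variance=1.0):
    """Approximate reliability implied by test information (theta variance assumed to be 1)."""
    return 1.0 - standard_error(information) ** 2 / theta_variance

def information_needed(target_reliability, theta_variance=1.0):
    """Information required to reach a target reliability under the same approximation."""
    return 1.0 / (theta_variance * (1.0 - target_reliability))

rel_at_11 = reliability(11)
se_at_11 = standard_error(11)
print(round(rel_at_11, 3), round(se_at_11, 3))
```

Under this approximation an information value of 11 corresponds to a reliability of about 0.91 and a standard error of about 0.30 on the theta metric.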
What is new?

• The current study is the first to apply Item Response Theory to equate scores from several scales for social anxiety disorder using a common metric.
• The results indicated a high level of agreement between equated scores and actual scores on the common metric across the sample.
• This study demonstrates that it is possible to effectively standardize the measurement of social anxiety using Item Response Theory, facilitating comparisons across patients and studies as well as enhancing integrative data analysis.
• Researchers and clinicians can utilize the equated item response parameters or the score conversion tables to generate individual or grouped scores on the common metric using any one of the eleven scales that measure social anxiety disorder included in the study.