Journal of Psychosomatic Research 78 (2015) 34–38
Comparison of scoring methods for the Brief Insomnia Questionnaire in a general population sample

Ka-Fai Chung ᵃ,⁎, Wing-Fai Yeung ᵇ, Fiona Yan-Yee Ho ᶜ, Lai-Ming Ho ᵈ, Kam-Ping Yung ᵉ, Yee-Man Yu ᵃ, Chi-Wa Kwok ᵃ

ᵃ Department of Psychiatry, University of Hong Kong, Hong Kong, China
ᵇ School of Chinese Medicine, University of Hong Kong, Hong Kong, China
ᶜ Department of Psychology, University of Hong Kong, Hong Kong, China
ᵈ School of Public Health, University of Hong Kong, Hong Kong, China
ᵉ Department of Psychology, Chinese University of Hong Kong, Hong Kong, China
Article info
Article history: Received 20 May 2014; received in revised form 30 October 2014; accepted 14 November 2014
Keywords: Diagnosis; DSM-IV-TR; DSM-5; ICD-10; ICSD-2; ICSD-3; Insomnia
Abstract
Objective: The Brief Insomnia Questionnaire (BIQ) is a lay-administered, structured interview for deriving insomnia disorder diagnoses according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR), the International Classification of Diseases, Tenth Edition (ICD-10) and the research diagnostic criteria/International Classification of Sleep Disorders, Second Edition (RDC/ICSD-2). In previous validation studies, the concordance between diagnoses derived from the BIQ and from clinical interviews was only moderate, and prevalence estimates based on the BIQ differed significantly from estimates based on clinical interviews. We hypothesized that modifying the scoring algorithm to follow the diagnostic criteria more closely would improve the performance of the BIQ.
Methods: A probability subsample (n = 176) of respondents to a population-based epidemiological survey (n = 2011) completed clinical reappraisal interviews. We compared the modified scoring with the original scoring in sensitivity, specificity, positive and negative predictive values, area under the receiver operating characteristic curve, and Cohen's kappa for detecting DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia diagnoses by the BIQ against clinical interviews.
Results: Diagnostic accuracy improved with the modified scoring. The areas under the receiver operating characteristic curve for the DSM-IV-TR, ICD-10, RDC/ICSD-2 and any of the insomnia diagnoses ranged from 0.76 to 0.87. With the modified scoring, prevalence estimates based on the BIQ classification did not differ significantly from those based on clinical interviews.
Conclusions: The BIQ with modified scoring enhanced case detection and produced more accurate prevalence estimates of DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia disorders. With scoring algorithms now extended to DSM-5 and ICSD-3 diagnoses, the BIQ should be more widely used in clinical and research settings.
© 2014 Elsevier Inc. All rights reserved.
⁎ Corresponding author at: Department of Psychiatry, The University of Hong Kong, Pokfulam Road, Hong Kong, China. Tel.: +852 22554487; fax: +852 28551345. E-mail address: [email protected] (K.-F. Chung).
http://dx.doi.org/10.1016/j.jpsychores.2014.11.015
0022-3999/© 2014 Elsevier Inc. All rights reserved.

Introduction
Insomnia is the most common sleep complaint and is associated with daytime impairment and health risks. Epidemiological studies have reported insomnia prevalence rates ranging from 6% to 48% [1]. In Hong Kong, the prevalence of insomnia in the general population was estimated at 11.9% in one study [2] and 39.4% in another [3]. Estimates of insomnia prevalence vary widely because of differences in diagnostic criteria, questionnaires and interview methods. The Sleep-EVAL interview was developed by Ohayon et al. [4] in the late 1990s to derive sleep disorder diagnoses according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) and the International Classification of Sleep Disorders (ICSD), but it is not generally available to the insomnia research community. Recently, a standardized lay-administered instrument, the Brief Insomnia Questionnaire (BIQ), was developed for use in the America Insomnia Survey [5]. The BIQ covers the diagnostic criteria of insomnia disorder according to the DSM-IV, Text Revision (DSM-IV-TR) [6], the International Classification of Diseases, Tenth Edition (ICD-10) [7] and the research diagnostic criteria/ICSD, Second Edition (RDC/ICSD-2) [8,9]. Following the proposed scoring algorithms [5], DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia disorder diagnoses can all be obtained, which makes the BIQ useful for comparative epidemiological studies and for clinicians and researchers who prefer different diagnostic systems. The original validation study showed that diagnoses derived from the BIQ had moderate concordance with diagnoses based on semi-structured clinical reappraisal interviews, with sensitivity from 41.7% to 67.6% and specificity from 94.4% to 98.5%. There was also a significant difference between BIQ and clinical-interview estimates of the prevalence of insomnia disorder, compromising the BIQ as an epidemiological tool. The discrepancy in prevalence estimates was also found in our validation study of the Hong Kong Chinese version of the BIQ (HK-BIQ) [10]. Compared with estimates from reappraisal interviews, the HK-BIQ accurately estimated the prevalence of RDC/ICSD-2 insomnia disorder, but underestimated DSM-IV-TR and overestimated ICD-10 insomnia diagnoses.

To address this issue, we reviewed the scoring method of the BIQ and modified it to approximate the diagnostic criteria more closely. DSM-IV-TR criterion B requires that the sleep disturbance cause clinically significant distress or impairment in social, occupational, or other important areas of functioning, and ICD-10 criterion D uses similar wording. We modified the scoring so that each of these criteria is met if any one of the BIQ daytime impairment items is positive, instead of two as in the original scoring algorithms. In addition, ICD-10 criterion C requires preoccupation with the sleeplessness and excessive concern over its consequences at night and during the day. We modified the scoring so that this criterion is met only when both the BIQ excessive concern item and the worries or distress item are positive, instead of either one as in the original scoring algorithm. The scoring algorithm for RDC/ICSD-2 insomnia disorder was found to be compatible with the diagnostic criteria; hence, we made no adjustment to it. The modified scoring algorithms are presented in Appendix A. We expected that the more closely the scoring algorithms follow the diagnostic criteria, the greater the agreement between BIQ and clinical diagnoses and the closer the prevalence estimates would be.
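The two rule changes described above can be expressed as a minimal Python sketch. It assumes a hypothetical response record (`BIQResponses`) holding one boolean per daytime impairment item plus one each for the excessive concern and the worries or distress items; the actual item wording, coding and the remaining criteria of each algorithm are in Appendix A and reference [5] and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BIQResponses:
    """Hypothetical record of only the BIQ items affected by the modification."""
    daytime_impairment: List[bool]  # one flag per daytime impairment item (assumed coding)
    excessive_concern: bool         # BIQ excessive concern item
    worry_or_distress: bool         # BIQ worries or distress item


def impairment_criterion_met(r: BIQResponses, modified: bool) -> bool:
    """DSM-IV-TR criterion B / ICD-10 criterion D (distress or impairment)."""
    n_positive = sum(r.daytime_impairment)
    # Original scoring needs two positive impairment items; modified scoring needs one.
    return n_positive >= (1 if modified else 2)


def icd10_concern_criterion_met(r: BIQResponses, modified: bool) -> bool:
    """ICD-10 criterion C (preoccupation and excessive concern)."""
    if modified:
        # Modified scoring: both items must be positive.
        return r.excessive_concern and r.worry_or_distress
    # Original scoring: either item is sufficient.
    return r.excessive_concern or r.worry_or_distress


# Example respondent: one impairment item positive, concern present, no distress.
resp = BIQResponses(daytime_impairment=[True, False, False],
                    excessive_concern=True, worry_or_distress=False)
for modified in (False, True):
    print(modified,
          impairment_criterion_met(resp, modified),
          icd10_concern_criterion_met(resp, modified))
```

For this respondent the impairment criterion switches from unmet to met, while ICD-10 criterion C switches from met to unmet, illustrating how the ICD-10 modification lowers one threshold and raises another.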
The recent launch of the DSM-5 [11] and ICSD-3 [12] might seem to have reduced the need for the older systems; however, comparative epidemiological studies of the changes introduced by the new systems are important, especially for the ICSD, whose main revision was to lengthen the duration criterion from 1 month in RDC/ICSD-2 to 3 months in ICSD-3. Using the same sample and statistical approach as in the original HK-BIQ study [10], we compared the diagnostic accuracy of the original and modified scoring methods, hypothesizing that the modified scoring method would provide more accurate case detection and prevalence estimates.
Method

Sample
The sample and study procedure were reported in detail in a previous study [10]. The study population consisted of Hong Kong residents older than 18 years who were able to communicate in Cantonese or Mandarin. The randomization process had 2 parts: randomization of telephone numbers and selection of respondents within households. Telephone numbers in Hong Kong are listed in telephone directories automatically unless the customer requests that the number be withheld. We selected telephone numbers randomly from computerized residential telephone directories with no stratification applied, and generated unlisted numbers by adding and subtracting 1 and 2 from the selected numbers [13]. Duplicate numbers were screened out. Within each household, the respondent was randomly selected by asking to speak to the person who would next celebrate a birthday. This technique is commonly used to overcome the respondent selection bias associated with administering the survey to the household member most likely to answer the phone. A recent review detected no significant differences in demographic distribution between "next birthday" and true probability samples [14]. Verbal consent was obtained from all participants, and all procedures used in this study were reviewed and approved by the local institutional review board.

Procedure
A fully structured, lay-administered telephone interview was conducted by the Public Opinion Programme, The University of Hong Kong. We successfully interviewed 2011 respondents from July 24 to December 6, 2012. The overall response rate was 64.3%. There were 1019 refusals at the household or respondent level and 97 partial responses. The first section of the interview included an introduction and verbal consent, followed by the HK-BIQ and then sociodemographic questions, including age, gender, occupation and level of education. The last section asked for verbal consent to a further telephone interview about the respondent's sleep problems. In most cases, the telephone interview was completed within 15 min. In line with the American BIQ validation study [5], we randomly selected participants for clinical reappraisal interviews but oversampled BIQ positives. The clinical reappraisal subsample consisted of 73 cases, 51 subthreshold cases and 52 non-cases, which was sufficient to estimate a Cohen's kappa (κ) of 0.7 with a two-sided 95% confidence interval (CI) of 0.1 [15]. The telephone-based clinical reappraisal interviews were conducted blind to the BIQ results, 2–14 days after the first interview, by 2 senior authors (KC and WY). The re-interviews were timed to minimize memory effects and true changes in insomnia status while leaving some flexibility in contacting the respondents. No respondents were paid for participation.

Measures
The BIQ was translated into Chinese according to the World Health Organization guidelines [16], with steps including forward translation, expert review, back-translation, a second expert review, pre-testing, and finalization. The expert panel consisted of experienced clinicians and researchers in sleep disorders. The clinical reappraisal was conducted using a standardized semi-structured questionnaire developed specifically for BIQ validation [5]. It includes DSM-IV-TR, ICD-10 and RDC/ICSD-2 symptom checklists, with each symptom rated as definite, probable, possible or no, and each diagnostic system classified as case or non-case. Inter-rater κ between KC and WY was 1.0 for all diagnostic categories, based on 20 audiotaped interviews.

Data analysis
All statistical analyses were performed with Stata 10.0. The clinical reappraisal subsample was first weighted to adjust for the oversampling of BIQ positives. Validity of the BIQ was assessed by the concordance between diagnoses based on the BIQ and diagnoses based on clinical reappraisal, and validity and reliability were tested at both the aggregate and the individual level. At the aggregate level, we compared prevalence estimates based on the BIQ and on clinical reappraisal using McNemar χ2 tests. Individual-level diagnostic concordance was evaluated using 2 descriptive measures, the area under the receiver operating characteristic curve (AUC) [17] and κ. With all cell counts expressed as proportions of the total sample, AUC was calculated as (TP/P + TN/N)/2 and κ as [TP − FP − P′(1 − 2N)] / [P − P′(1 − 2N)], where TP = true positive, FP = false positive, TN = true negative, FN = false negative, P = TP + FN, P′ = TP + FP, and N = FP + TN [18]. Although AUC and κ are related concepts, they do not always change in the same direction. We also reported sensitivity (SN), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV). The odds ratio (OR), equal to (SN × SP) / [(1 − SN) × (1 − SP)], was also used to assess concordance between diagnoses based on the BIQ and on clinical reappraisal. We used the net reclassification index (NRI) to examine the change in diagnostic accuracy after applying the modified scoring algorithms. Overall NRI was calculated as [Pr(up|event) − Pr(down|event)] + [Pr(down|nonevent) − Pr(up|nonevent)], with higher values indicating greater improvement [19]. To determine whether BIQ symptom-level data improved the prediction of clinical diagnoses, we fitted a series of stepwise logistic-regression models in which clinical diagnoses were treated as dichotomous outcomes and BIQ symptom variables were entered along with BIQ diagnoses as predictors. We compared the AUC based on the dichotomous BIQ classification with that based on the continuous predicted probability derived from BIQ symptom-level data. In line with the original BIQ validation study [5], we also excluded the borderline cases to create a sample with good sleepers as controls and re-analyzed the psychometric performance of the BIQ. Statistical significance in all of the above analyses was evaluated using 2-sided tests at the 0.05 level. The Taylor series linearization method was used to adjust standard errors of estimates in the regression models.

Results

Concordance of diagnoses based on the BIQ and clinical reappraisal interviews
Table 1 presents the sociodemographic characteristics of the total sample and subsamples compared with the population census data. Prevalence estimates based on the BIQ classification were compared with estimates based on clinical interviews for each diagnostic system. Using the original scoring algorithms, McNemar tests showed that the prevalence estimates differed significantly for the DSM-IV-TR and ICD-10 diagnoses (Table 2), but not for the RDC/ICSD-2 diagnosis or for any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 diagnoses. With the modified scoring algorithms, the prevalence estimates did not differ significantly for any diagnostic system.

Using the original scoring algorithms, individual-level concordance between BIQ classification and clinical reappraisal was highest for the DSM-IV-TR and any diagnosis (κ = 0.72 in both cases; AUC = 0.83 and 0.85, respectively); concordance was moderate for the RDC/ICSD-2 (κ = 0.60; AUC = 0.78) and fair for the ICD-10 (κ = 0.48; AUC = 0.86). The modified scoring algorithms slightly improved the diagnostic accuracy of the BIQ for the DSM-IV-TR and any diagnosis (κ = 0.74 and 0.75, respectively; AUC = 0.87 in both cases); for the ICD-10, κ increased slightly (from 0.48 to 0.50) but the AUC decreased (from 0.86 to 0.76).

With the original scoring algorithms, the BIQ was more sensitive in detecting DSM-IV-TR and ICD-10 cases (SN = 68.4% and 77.8%, respectively) than RDC/ICSD-2 cases (SN = 59.5%), and the SN to detect any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 cases was 75.0%. Most DSM-IV-TR and RDC/ICSD-2 cases were confirmed by clinical reappraisal (PPV = 78.6%–89.7%), but confirmation was unsatisfactory for ICD-10 cases, which had a low PPV of 38.9%. The SP and NPV were high for all 3 diagnostic systems (SP = 93.5%–97.8%; NPV = 89.8%–98.7%), indicating that the vast majority of non-cases were correctly classified by the BIQ.

Using the modified scoring algorithms, the SN to detect DSM-IV-TR cases increased from 68.4% to 79.5%, but the SP decreased from 97.8% to 94.2%. The overall NRI was 0.0009 for the DSM-IV-TR, indicating a very slight increase in accuracy with the modified algorithm. For the ICD-10 diagnosis, the modified algorithm increased the SP from 93.5% to 97.0% but reduced the SN from 77.8% to 55.6%. The overall NRI was −0.10 for the ICD-10, suggesting a small reduction in accuracy. The SN of the BIQ for detecting any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia diagnoses improved with the modified scoring algorithms (from 75.0% to 79.5%), while the SP remained unchanged (94.9%). The overall NRI was 0.023 for any of the insomnia diagnoses, suggesting a slight increase in accuracy. In short, the modified scoring algorithms improved the screening performance of the BIQ, except for a moderate reduction in the SN to detect ICD-10 cases, which resulted in a negative NRI.

The use of good sleepers as controls
We re-analyzed diagnostic concordance after excluding respondents who reported some sleep problems (2 or more days a week for 1 month or longer) but did not meet full criteria for insomnia disorder. With the original scoring algorithms, concordance estimates were substantially inflated by this approach for all diagnostic criteria except the ICD-10 diagnosis (Table 2). With good sleepers as controls, κ for the DSM-IV-TR and RDC/ICSD-2 increased from 0.72 to 0.88 and from 0.60 to 0.85, respectively, whereas for the ICD-10 diagnosis κ rose only slightly, from 0.48 to 0.52, and the PPV remained at about 38% (38.9% vs 38.3%). With the modified scoring algorithms and good sleepers as controls, the improvement in diagnostic accuracy for the ICD-10 diagnosis was more apparent, with κ increasing from 0.50 to 0.65 and AUC from 0.76 to 0.98.

Continuous classifications using BIQ symptom data
We performed stepwise logistic-regression analysis to select BIQ items that significantly predicted clinical diagnoses after controlling for the dichotomous BIQ diagnoses. Each respondent in the clinical reappraisal sample was then assigned a predicted probability based on the resulting logistic-regression equations (Table 3). With the original scoring method, the AUCs after including BIQ symptom data were substantially higher than the AUCs based on the dichotomous classification. The improvement was largest for the RDC/ICSD-2 diagnosis, with AUC increasing from 0.78 to 0.96, followed by the DSM-IV-TR (from 0.83 to 0.89) and the ICD-10 (from 0.86 to 0.90). With the modified scoring algorithms, diagnostic accuracy also increased when symptom-level data were included, with AUC increasing from 0.87 to 0.89 for the DSM-IV-TR and from 0.76 to 0.84 for the ICD-10.
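For a continuous score such as a predicted probability, the AUC can be computed as the probability that a randomly chosen clinical case scores higher than a randomly chosen non-case, with ties counted as one half (the Mann-Whitney form of the AUC). The minimal Python sketch below adds sampling weights to that pairwise count. The predicted probabilities are made-up illustrative values standing in for the output of the logistic-regression equations; the stepwise model selection itself is not shown, and this is not the Stata procedure used in the study. Applied to a 0/1 classification, the same function returns (SN + SP)/2, the dichotomous AUC formula given under Data analysis, which is why the dichotomous and continuous AUCs in Table 3 are directly comparable.

```python
from typing import Sequence


def weighted_auc(scores: Sequence[float], truth: Sequence[bool],
                 weights: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; each case-control pair counts w_case * w_control."""
    cases = [(s, w) for s, d, w in zip(scores, truth, weights) if d]
    controls = [(s, w) for s, d, w in zip(scores, truth, weights) if not d]
    concordant = 0.0
    for s1, w1 in cases:
        for s0, w0 in controls:
            if s1 > s0:
                concordant += w1 * w0        # case ranked above control
            elif s1 == s0:
                concordant += 0.5 * w1 * w0  # ties count one half
    total_pairs = sum(w for _, w in cases) * sum(w for _, w in controls)
    return concordant / total_pairs


# Toy data (hypothetical): 3 clinical cases followed by 5 non-cases, equal weights.
truth = [True, True, True, False, False, False, False, False]
weights = [1.0] * 8
dichotomous = [1, 1, 0, 1, 0, 0, 0, 0]                # BIQ case (1) / non-case (0)
predicted = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.2, 0.1]  # hypothetical model probabilities

print(round(weighted_auc(dichotomous, truth, weights), 3))  # 0.733, i.e. (2/3 + 4/5) / 2
print(round(weighted_auc(predicted, truth, weights), 3))    # 0.933
```

The pairwise loop is quadratic in sample size, which is unproblematic for a subsample of 176 respondents.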
Discussion
This is the first study to validate the BIQ in a general population sample using a modified scoring method. Our results show that, compared with the original scoring algorithms, the modified scoring is more accurate in detecting insomnia disorders and provides more accurate prevalence estimates. The modified scoring algorithms are more in line with the diagnostic criteria for the DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia disorders and should be used in future studies.
Table 1
Socio-demographic characteristics of the total sample and subsamples compared with census population data

Variable | Hong Kong general population aged ≥18 yrᵃ (N = 5,999,455) | Total sample (N = 2011) | Clinical reappraisal subsample (N = 176) | Good sleepers within the subsampleᵇ (N = 45)
--- | --- | --- | --- | ---
Age in yr, mean (SD) | 46.51 (17.2) | 52.20 (17.9) | 42.29 (17.5) | 46.1 (18.8)
Sex, male/female | 1/1.18 | 686/1325 (1/1.93) | 57/119 (1/2.09) | 18/27 (1/1.5)
Education, N (%) | | | |
Primary | 23.7% | 520 (26.0) | 41 (23.3) | 8 (17.8)
Secondary | 48.1% | 993 (49.7) | 84 (47.7) | 24 (53.3)
Tertiary | 28.3% | 484 (24.2) | 51 (29.0) | 13 (28.9)
Marital status, N (%) | | | |
Never married | 28.8% | 415 (20.9) | 39 (22.2) | 14 (31.1)
Married | 60.1% | 1432 (72.1) | 127 (72.2) | 30 (66.7)
Divorced | 4.1% | 60 (3.0) | 6 (3.4) | 1 (2.2)
Cohabited, separated or widowed | 7.0% | 80 (4.0) | 4 (2.3) | 0 (0.0)
Occupation, N (%)ᶜ | | | |
Professional and associate professional | 22.0% | 313 (15.8) | 35 (19.9) | 11 (24.4)
Skilled and semi-skilled worker | 26.1% | 406 (20.4) | 34 (19.3) | 9 (20.0)
Unskilled worker | 11.8% | 80 (4.0) | 3 (1.7) | 1 (2.2)
Retired | 18.0% | 530 (26.7) | 39 (22.2) | 9 (20.0)
Students | 2.5% | 123 (6.2) | 17 (9.7) | 9 (20.0)
Homemakers/others | 17.5% | 475 (23.9) | 44 (25.0) | 6 (13.3)
Unemployed | 2.1% | 60 (3.0) | 4 (2.3) | 0 (0.0)
Income, N (%)ᶜ | | | |
No income | 40.3% | 960 (51.1) | 85 (49.4) | 17 (37.8)
<$10,000 | 23.2% | 368 (19.6) | 35 (20.3) | 13 (28.9)
$10,000–19,999 | 20.0% | 288 (15.3) | 25 (14.5) | 5 (11.1)
$20,000–29,999 | 7.3% | 141 (7.5) | 11 (6.4) | 5 (11.1)
>$30,000 | 9.2% | 122 (6.5) | 16 (9.3) | 5 (11.1)

ᵃ Population census 2011; occupation and income data based on population aged ≥20 yr.
ᵇ Insomnia symptom at most once per week.
ᶜ Differences from total N reflect omissions on reporting forms; income in HK$.
Table 2
Consistency of diagnoses based on the Brief Insomnia Questionnaire with diagnoses based on clinical interviews in weighted normal control samples and control samples using good sleepersᵃ

Statistic | DSM-IV-TR, original | DSM-IV-TR, modified | ICD-10, original | ICD-10, modified | RDC/ICSD-2ᵇ, original | Any, original | Any, modified
--- | --- | --- | --- | --- | --- | --- | ---
McNemar χ2 test (p-value), NC | 5.4 (0.04) | 0.0 (1.00) | 6.2 (0.02) | 0.1 (1.00) | 3.9 (0.09) | 0.5 (0.63) | 0.1 (1.00)
OR (95% CI), NC | 96.8 (23.4, 538.6) | 63.0 (19.8, 208.4) | 50.0 (7.8, 516.1) | 40.5 (6.2, 261.2) | 32.3 (10.3, 109.4) | 55.7 (17.8, 183.1) | 71.4 (21.7, 246.6)
Cohen's κ (95% CI), NC | 0.72 (0.59, 0.85) | 0.74 (0.62, 0.86) | 0.48 (0.25, 0.72) | 0.50 (0.22, 0.78) | 0.60 (0.45, 0.76) | 0.72 (0.59, 0.84) | 0.75 (0.63, 0.87)
Cohen's κ (95% CI), GSC | 0.88 (0.85, 0.91) | 0.81 (0.78, 0.84) | 0.52 (0.44, 0.59) | 0.65 (0.56, 0.73) | 0.85 (0.82, 0.89) | 0.83 (0.80, 0.86) | 0.83 (0.80, 0.86)
AUC (95% CI), NC | 0.83 (0.76, 0.91) | 0.87 (0.80, 0.94) | 0.86 (0.71, 1.00) | 0.76 (0.59, 0.94) | 0.78 (0.69, 0.86) | 0.85 (0.78, 0.92) | 0.87 (0.80, 0.94)
AUC (95% CI), GSC | 0.94 (0.93, 0.96) | 0.93 (0.91, 0.94) | 0.95 (0.95, 0.96) | 0.98 (0.97, 0.98) | 0.97 (0.97, 0.98) | 0.93 (0.92, 0.95) | 0.93 (0.92, 0.95)
Sensitivity, % (95% CI), NC | 68.4 (51.3, 82.5) | 79.5 (63.5, 90.7) | 77.8 (40.0, 97.2) | 55.6 (21.2, 86.3) | 59.5 (42.1, 75.2) | 75.0 (58.8, 87.3) | 79.5 (63.5, 90.7)
Sensitivity, % (95% CI), GSC | 91.8 (88.3, 94.5) | 92.8 (89.8, 95.2) | 100.0 (95.2, 100.0) | 100.0 (93.6, 100.0) | 100.0 (98.6, 100.0) | 92.7 (89.6, 95.1) | 93.0 (90.0, 95.3)
Specificity, % (95% CI), NC | 97.8 (93.7, 99.5) | 94.2 (88.9, 97.5) | 93.5 (88.6, 96.7) | 97.0 (93.2, 99.0) | 95.7 (90.8, 98.4) | 94.9 (89.8, 97.9) | 94.9 (89.7, 97.9)
Specificity, % (95% CI), GSC | 97.0 (95.9, 97.9) | 93.0 (91.5, 94.4) | 91.0 (89.3, 92.4) | 95.6 (94.3, 96.7) | 94.6 (93.2, 95.7) | 94.1 (92.6, 95.3) | 93.7 (92.2, 95.0)
PPV, % (95% CI), NC | 89.7 (72.6, 97.8) | 79.5 (63.5, 90.7) | 38.9 (17.3, 64.3) | 50.0 (18.7, 81.3) | 78.6 (59.0, 91.7) | 81.1 (64.8, 92.0) | 81.6 (65.7, 92.3)
PPV, % (95% CI), GSC | 89.1 (85.3, 92.2) | 79.7 (75.7, 83.4) | 38.3 (31.4, 45.5) | 50.0 (40.4, 59.6) | 78.5 (73.6, 82.8) | 82.1 (78.0, 85.6) | 81.8 (77.8, 85.3)
NPV, % (95% CI), NC | 91.8 (86.1, 95.7) | 94.2 (88.9, 97.5) | 98.7 (95.5, 99.8) | 97.6 (93.9, 99.3) | 89.8 (83.7, 94.2) | 92.9 (87.3, 96.5) | 94.2 (88.8, 97.4)
NPV, % (95% CI), GSC | 97.8 (96.8, 98.5) | 97.8 (96.8, 98.5) | 100.0 (99.7, 100.0) | 100.0 (99.7, 100.0) | 100.0 (99.7, 100.0) | 97.8 (96.8, 98.5) | 97.8 (96.8, 98.5)
N (cases/controls), NC | 176 (73/103) | 176 (92/84) | 176 (42/134) | 176 (24/152) | 176 (68/108) | 176 (88/88) | 176 (92/84)
N (cases/controls), GSC | 118 (73/45) | 137 (92/45) | 87 (42/45) | 69 (24/45) | 113 (68/45) | 133 (88/45) | 137 (92/45)

Abbreviations: Any, any of DSM-IV-TR, ICD-10 and RDC/ICSD-2; AUC, area under the receiver operating characteristic curve; CI, confidence interval; DSM-IV-TR, Diagnostic and Statistical Manual, Fourth Edition, Text Revision; GSC, good sleeper control samples; ICD-10, International Classification of Diseases-10; NC, normal control samples; NPV, negative predictive value; OR, odds ratio; PPV, positive predictive value; RDC/ICSD-2, Research Diagnostic Criteria and International Classification of Sleep Disorders-2.
ᵃ Data were weighted to adjust for the oversampling of respondents classified as cases and subthreshold cases by the Brief Insomnia Questionnaire.
ᵇ No modification of the scoring method was required for the RDC/ICSD-2 diagnosis.
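The individual-level statistics in Table 2 can be derived from a weighted 2×2 table. The minimal Python sketch below follows the formulas given under Data analysis: weighted cell totals are converted to proportions of the total weight so that the κ expression [TP − FP − P′(1 − 2N)] / [P − P′(1 − 2N)] applies, and the NRI is computed from up- and down-classifications. For two binary classifications the NRI reduces to ΔSN + ΔSP, which the sketch can be used to check. Several parts are assumptions rather than the study's procedure: the weights are taken to be inverse selection probabilities (full-sample stratum count divided by the number re-interviewed from that stratum), the stratum counts in any call are hypothetical, the McNemar function is the unweighted textbook form and need not reproduce the weighted values in Table 2, and no confidence intervals are computed.

```python
import math
from typing import Dict, Sequence, Tuple


def selection_weights(full_counts: Dict[str, int], reinterviewed: Dict[str, int]) -> Dict[str, float]:
    """Assumed inverse-probability weights per BIQ stratum (full-sample count / subsample count)."""
    return {s: full_counts[s] / reinterviewed[s] for s in reinterviewed}


def weighted_cells(test: Sequence[bool], truth: Sequence[bool],
                   weights: Sequence[float]) -> Dict[str, float]:
    """Weighted TP/FP/FN/TN expressed as proportions of the total weight."""
    cells = {"TP": 0.0, "FP": 0.0, "FN": 0.0, "TN": 0.0}
    for t, d, w in zip(test, truth, weights):
        key = ("T" if t == d else "F") + ("P" if t else "N")
        cells[key] += w
    total = sum(cells.values())
    return {k: v / total for k, v in cells.items()}


def concordance(c: Dict[str, float]) -> Dict[str, float]:
    """SN, SP, PPV, NPV, OR, AUC and kappa from proportional 2x2 cells."""
    tp, fp, fn, tn = c["TP"], c["FP"], c["FN"], c["TN"]
    p, p_prime, n = tp + fn, tp + fp, fp + tn  # P, P' and N as defined in the text
    sn, sp = tp / p, tn / n
    or_denominator = (1 - sn) * (1 - sp)
    return {
        "SN": sn,
        "SP": sp,
        "PPV": tp / p_prime,
        "NPV": tn / (tn + fn),
        "OR": math.inf if or_denominator == 0 else sn * sp / or_denominator,
        "AUC": (sn + sp) / 2,
        "kappa": (tp - fp - p_prime * (1 - 2 * n)) / (p - p_prime * (1 - 2 * n)),
    }


def mcnemar(b: float, c: float) -> Tuple[float, float]:
    """Unweighted McNemar chi-square (1 df, no continuity correction) on discordant counts."""
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def nri(old: Sequence[bool], new: Sequence[bool], truth: Sequence[bool],
        weights: Sequence[float]) -> float:
    """[Pr(up|event) - Pr(down|event)] + [Pr(down|nonevent) - Pr(up|nonevent)]."""
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0.0
    for o, nw, d, w in zip(old, new, truth, weights):
        up, down = (not o) and nw, o and not nw  # reclassified to positive / to negative
        if d:
            n_e += w
            up_e += w * up
            down_e += w * down
        else:
            n_ne += w
            up_ne += w * up
            down_ne += w * down
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# Toy example (hypothetical): 3 clinical cases, 5 non-cases, equal weights.
truth = [True, True, True, False, False, False, False, False]
original = [True, True, False, True, False, False, False, False]
modified = [True, True, True, True, True, False, False, False]
w = [1.0] * 8

stats = concordance(weighted_cells(original, truth, w))
print({k: round(v, 3) for k, v in stats.items()})
print(round(nri(original, modified, truth, w), 3))  # 0.133 = (1 - 2/3) + (0.6 - 0.8)
```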
Table 3
Comparisons of AUC in predicting clinical diagnoses based on the dichotomous Brief Insomnia Questionnaire classification and the continuous predicted probabilities based on item-level data in a weighted sample of 176 adults

Criteria | Scoring | AUC (95% CI), dichotomous | AUC (95% CI), continuous
--- | --- | --- | ---
DSM-IV-TR | Original | 0.83 (0.76, 0.91) | 0.89 (0.83, 0.94)
DSM-IV-TR | Modified | 0.87 (0.80, 0.94) | 0.89 (0.84, 0.95)
ICD-10 | Original | 0.86 (0.71, 1.00) | 0.90 (0.84, 0.97)
ICD-10 | Modified | 0.76 (0.59, 0.94) | 0.84 (0.74, 0.94)
RDC/ICSD-2 | Original | 0.78 (0.69, 0.86) | 0.96 (0.93, 0.99)
Any of DSM-IV-TR, ICD-10 and RDC/ICSD-2 | Original | 0.85 (0.78, 0.92) | 0.89 (0.83, 0.94)
Any of DSM-IV-TR, ICD-10 and RDC/ICSD-2 | Modified | 0.87 (0.80, 0.94) | 0.92 (0.86, 0.98)

Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; DSM-IV-TR, Diagnostic and Statistical Manual, Fourth Edition, Text Revision; ICD-10, International Classification of Diseases-10; RDC/ICSD-2, Research Diagnostic Criteria and International Classification of Sleep Disorders-2.

The modified scoring algorithm for the DSM-IV-TR has a lower diagnostic threshold than the original algorithm; hence, the SN for detecting cases is increased but the SP is reduced. Overall diagnostic accuracy is improved, with increases in both κ and AUC. Most importantly, the new scoring method provides an accurate prevalence estimate of the DSM-IV-TR diagnosis, as there is no significant difference between the estimates by the BIQ and by the “gold standard”, that is, clinician diagnosis based on skilled interviewing and detailed evaluation of the diagnostic criteria [20]. Using the modified scoring method for the DSM-IV-TR, we show that both symptom-level data and the use of good sleepers as controls can enhance diagnostic performance, replicating the findings with the original scoring method.

The modification of the ICD-10 scoring algorithm is less straightforward than that of the DSM-IV-TR, as it both lowers the diagnostic threshold (criterion D) and raises it (criterion C). The modified scoring method for the ICD-10 is more specific and less sensitive than the original scoring method, but it produces an accurate prevalence estimate of the ICD-10 diagnosis. The overall NRI is −0.10, suggesting a small reduction in accuracy with the modified scoring, but this is likely due to the small number of participants fulfilling ICD-10 diagnostic criteria, which makes the NRI vulnerable to small variations in counts. Using good sleepers as controls, the improvement in diagnostic accuracy with the modified scoring becomes more apparent, with increases in κ from 0.52 to 0.65 and in AUC from 0.95 to 0.98 compared with the original scoring. For detecting any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia disorders, the modified scoring method performs better than the original scoring, with increases in SN, PPV, NPV, κ and AUC and an unchanged SP.

There is seldom perfect agreement between screening instruments and clinician diagnosis. Including borderline cases to reflect real-world conditions, the SN of the HK-BIQ with the modified scoring for detecting DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia disorders ranges from 55.6% to 79.5%, the SP from 94.2% to 97.0%, the PPV from 50.0% to 79.5%, and the NPV from 89.8% to 97.6%. Although the HK-BIQ is relatively weak in screening for ICD-10 cases, with a modest SN of 55.6% and PPV of 50.0%, it compares favorably with many screening tools. For example, at the recommended cutoff, the SN and SP of the K6 for detecting 1-month major depression are 0.97 and 0.58, respectively, producing a PPV of 0.04 [21], while the SN and SP of the Mood Disorder Questionnaire for detecting lifetime bipolar disorder are 0.28 and 0.97, respectively [22]. Previous studies have shown that a screening instrument usually performs best in specialist settings, followed by primary care settings, and worst in the general population [23–25]. As the HK-BIQ has been shown to be useful for case detection and prevalence estimation in the general population, the setting in which screening instruments typically perform worst, it is likely to perform at least as well in medical settings, though confirmation is needed.

Our study has several methodological limitations.
First, there are differences in sociodemographic characteristics between the clinical reappraisal subsample and the census population, but age and educational level, the factors most likely to affect understanding of the BIQ, are not markedly different. Hence, the psychometric properties found in the clinical reappraisal sample are likely to generalize to the general population. Second, our response rate was only 64.3%, although this is similar to that of the original validation study [5]. Third, as in the original study, the HK-BIQ was administered by lay interviewers, and it is unclear whether its psychometric performance would improve with trained interviewers. Fourth, systematic bias in rating symptoms and classifying cases and non-cases may have been introduced during the clinical reappraisal interviews, but the interviewers were blind to the BIQ results, and such bias was minimized by using a standardized semi-structured clinical reappraisal questionnaire. Lastly, the accuracy of clinical diagnoses might be improved with in-person assessments, sleep diaries and other additional information.

The DSM-5 was officially released in May 2013 [11], followed by the ICSD-3 in 2014 [12]. The diagnostic criteria for insomnia disorder in the DSM-5 and ICSD-3 differ substantially from those in the DSM-IV-TR, ICD-10 and RDC/ICSD-2. One difference is the duration criterion: 3 months in the DSM-5 and ICSD-3 versus 1 month in the other systems. Until more data show that the DSM-5 and ICSD-3 are better than the previous diagnostic systems, data on DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia disorders remain important. The HK-BIQ has been revised to cover DSM-5 and ICSD-3 insomnia disorder diagnoses (Appendix B). We have previously shown that the HK-BIQ is valid in deriving DSM-5 insomnia disorder [10]. As the diagnostic criteria of ICSD-3 chronic insomnia disorder are similar to those of DSM-5 insomnia disorder, the HK-BIQ is likely also valid in deriving the ICSD-3 diagnosis, though confirmation is needed.

Our findings suggest that the HK-BIQ is an easy-to-use, lay-administered, valid and reliable questionnaire for screening and epidemiological purposes in clinical and research settings. Minor revision of the English-version BIQ and testing it against DSM-5 and ICSD-3 clinician diagnoses are needed to confirm its application with the new diagnostic systems.

Conflict of interest statement
The authors report no conflicts of interest.
Acknowledgments
This project was supported by the Small Project Funding of the University of Hong Kong (Grant No. 104002576).

Appendix A. Supplementary data
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jpsychores.2014.11.015.

References
[1] Ohayon MM. Epidemiology of insomnia: what we know and what we still need to learn. Sleep Med Rev 2002;6:97–111.
[2] Li RH, Wing YK, Ho SC, Fong SY. Gender differences in insomnia--a study in the Hong Kong Chinese population. J Psychosom Res 2002;53:601–9.
[3] Wong WS, Fielding R. Prevalence of insomnia among Chinese adults in Hong Kong: a population-based study. J Sleep Res 2011;20:117–26.
[4] Ohayon MM, Guilleminault C, Sulley J, Palombini L, Raab H. Validation of the Sleep-EVAL system against clinical assessments of sleep disorders and polysomnographic data. Sleep 1999;22:925–30.
[5] Kessler RC, Coulouvrat C, Hajak G, Lakoma MD, Roth T, Sampson N, et al. Reliability and validity of the Brief Insomnia Questionnaire in the America Insomnia Survey. Sleep 2010;33:1539–49.
[6] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR). 4th ed., Text Revision. Washington, DC: American Psychiatric Association; 2000.
[7] World Health Organization. International Classification of Diseases (ICD-10). Geneva, Switzerland: World Health Organization; 1991.
[8] Edinger JD, Bonnet MH, Bootzin RR, Doghramji K, Dorsey CM, Espie CA, et al. Derivation of research diagnostic criteria for insomnia: report of an American Academy of Sleep Medicine Work Group. Sleep 2004;27:1567–96.
[9] American Academy of Sleep Medicine. International Classification of Sleep Disorders: Diagnostic and Coding Manual (ICSD-2). 2nd ed. Westchester, IL: American Sleep Disorders Association; 2005.
[10] Chung KF, Yeung WF, Ho YY, Ho LM, Yung KP, Yu YM, et al. Validity and reliability of the Brief Insomnia Questionnaire in the general population in Hong Kong. J Psychosom Res 2014;76:374–9.
[11] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5). 5th ed. Arlington, VA: American Psychiatric Publishing; 2013.
[12] American Academy of Sleep Medicine. International Classification of Sleep Disorders. 3rd ed. Darien, IL: American Academy of Sleep Medicine; 2014.
[13] Lavrakas PJ. Generating telephone survey sampling pools. In: Telephone survey methods: sampling, selection, and supervision. 2nd ed. California: Sage Publications Inc.; 1993. p. 27–59.
[14] Gaziano C. Comparative analysis of within-household respondent selection techniques. Public Opin Q 2005;69:124–57.
[15] Rotondi MA, Donner A. A confidence interval approach to sample size estimation for interobserver agreement studies with multiple raters and outcomes. J Clin Epidemiol 2012;65:778–84.
[16] Harkness J, Pennell BE, Villar A, Gebler N, Aguilar-Gaxiola S, Bilgen I. Translation procedures and translation assessment in the World Mental Health Survey initiative. In: Kessler RC, Üstün TB, editors. The WHO World Mental Health Survey: global perspectives on the epidemiology of mental disorders. Cambridge: Cambridge University Press; 2008. p. 91–113.
[17] Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
[18] Ben-David A. About the relationship between ROC curves and Cohen's kappa. Eng Appl Artif Intell 2008;21:874–82.
[19] Leening MJ, Vedder MM, Witteman JC, Pencina MJ, Steyerberg EW. Net reclassification improvement: computation, interpretation and controversies. A literature review and clinician's guide. Ann Intern Med 2014;160:122–31.
[20] Buysse DJ, Ancoli-Israel S, Edinger JD, Lichstein KL, Morin CM. Recommendations for a standard research assessment of insomnia. Sleep 2006;29:1155–73.
[21] Cairney J, Veldhuizen S, Wade TJ, Kurdyak P, Streiner DL. Evaluation of 2 measures of psychological distress as screeners for depression in the general population. Can J Psychiatry 2007;52:111–20.
[22] Hirschfeld RM, Holzer C, Calabrese JR, Weissman M, Reed M, Davies M, et al. Validity of the mood disorder questionnaire: a general population study. Am J Psychiatry 2003;160:178–80.
[23] Chung KF, Tso KC, Cheung E, Wong M. Validation of the Chinese version of the Mood Disorder Questionnaire in a psychiatric population in Hong Kong. Psychiatry Clin Neurosci 2008;62:464–71.
[24] Chung KF, Tso KC, Chung TY. Validation of the Mood Disorder Questionnaire in the general population in Hong Kong. Compr Psychiatry 2009;50:471–6.
[25] Poon Y, Chung KF, Tso KC, Chang CL, Tang D. The use of Mood Disorder Questionnaire, Hypomania Checklist-32 and clinical predictors for screening previously unrecognized bipolar disorder in a general psychiatric setting. Psychiatry Res 2012;195:111–7.