Development of a Symptom Score for Dysfunctional Elimination Syndrome
Kourosh Afshar, Amir Mirbagheri, Heidi Scott and Andrew E. MacNeily
From the Department of Urologic Sciences, University of British Columbia, Vancouver, British Columbia, Canada
Purpose: Dysfunctional elimination syndrome is a heterogeneous syndrome with no widely accepted diagnostic criteria. Previously developed questionnaires provide incomplete psychometric assessment. We developed a discriminative questionnaire for diagnosing dysfunctional elimination syndrome and assessed its validity and reliability.
Materials and Methods: A 14-item 5-point Likert scale questionnaire was devised using literature review, expert opinions and patient input. The questionnaire was administered to 62 children 4 to 16 years old (median age 8) clinically diagnosed with dysfunctional elimination syndrome by a pediatric urologist, of whom 71% were female. It was also administered to 50 healthy controls 4 to 16 years old (median age 7), of whom 66% were female. Children with structural abnormalities were excluded from study. To assess reliability 50 participants were asked to complete the questionnaire again 1 week later.
Results: Median total score in cases and controls was 14 of 52 (range 4 to 30) and 6 of 52 (range 1 to 13), respectively. The difference was statistically significant (p = 0.001). Discriminant function analysis showed 80% accuracy. ROC curve showed a score of 11 as the optimum threshold with an AUC of 0.903 (95% CI 0.814–0.948). Test-retest reliability was 84.5% (p = 0.001). Factor analysis showed loading on 4 factors, corresponding to urinary incontinence, urgency, obstructive symptoms and constipation/fecal soiling. Of participants 85% classified the questionnaire as very easy or easy to complete.
Conclusions: This new questionnaire is valid and reliable for diagnosing dysfunctional elimination syndrome. It can be used as a clinical or research instrument.
Abbreviations and Acronyms
DES = dysfunctional elimination syndrome
LUTS = lower urinary tract symptoms
NDLU/B = lower urinary tract or bowel nonneurogenic dysfunction
NLUTD = nonneurogenic lower urinary tract dysfunction
UTI = urinary tract infection
Study received University of British Columbia research ethics board approval.
Key Words: urinary bladder, urinary incontinence, fecal incontinence, constipation, questionnaires
0022-5347/09/1824-1939/0 THE JOURNAL OF UROLOGY® Copyright © 2009 by AMERICAN UROLOGICAL ASSOCIATION
Vol. 182, 1939-1944, October 2009 Printed in U.S.A. DOI:10.1016/j.juro.2009.03.009
Nonneurogenic lower urinary tract dysfunction is one of the most common reasons for referral to pediatric urology clinics, comprising up to 30% of visits to some outpatient clinics.1 These patients present with abnormal voiding habits, UTI and various LUTS, including incontinence.2,3 When this disorder is associated with defecation symptoms, some groups have used the term DES.2 DES may have profound effects on child physical and mental health. Upper or lower urinary tract damage may occur, such as renal scarring due to infection and detrusor decompensation, resulting in an atonic bladder.4 Incontinence may result in child isolation, lack of self-esteem, anxiety and mood disorders.5,6
There are a few published questionnaires that quantify NLUTD or DES but they lack thorough assessment of psychometric properties such as validity and reliability.7–9 Validity is defined as the ability of a scale to measure what it is designed to measure and reliability is the degree to which measurement results can be replicated under identical situations.10,11 We devised a questionnaire to diagnose DES and tested its construct validity and reliability.
METHODS
Approval was obtained from the University of British Columbia research ethics board. A 14-item questionnaire was developed using a combination of literature review, expert opinions and patient/family interviews (fig. 1). Preliminary patient interviews were done by a nurse clinician or one of us (KA). A total of 15 patients clinically diagnosed with NLUTD/DES and their families were asked about patient urinary and gastrointestinal symptoms in systematic fashion. Most questions were based on the results of our literature review. We also included open-ended questions to prompt the patient about symptoms not probed by the interviewer. Also, patients/families were asked to rank symptoms in terms of bother. Findings were tabulated and used to create a first draft of the questionnaire. This was pretested in 7 subsequent cases for face validity and 2 items that were hard to understand or ambiguous were deleted.
The instrument was constructed as a self-administered questionnaire with a nurse clinician available to answer any questions. Items were answered by patients 9 years old or older and by parents when patients were younger. All items were weighted equally. The last question requested feedback on the ease of completing the questionnaire and was not included in the total score. The reading level required for the questionnaire was targeted at between grades 3 and 4 (Flesch-Kincaid grade level 3.9).
A 5-point Likert scale was used for all questions. Each question probed a single symptom. A score of zero denoted no complaints and a score of 4 indicated severe symptoms. Item 3 on frequency was scored differently to capture abnormalities of voiding frequency at both extremes. The neutral choice (5 to 6 voids per day) had a score of zero. Voiding only 1 to 2 times or greater than 8 times per day each generated a score of 4. Voiding 3 or 4 and 7 or 8 times per day each corresponded to a score of 2.
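The scoring rules described above can be sketched in code. This is a minimal illustration only: the function names are hypothetical, the answers are assumed to arrive already coded as integers from 0 to 4, and the count of 13 scored items is inferred from the maximum total of 52 with the ease-of-completion item left out.

```python
# Hypothetical sketch of the scoring rules described in the text.
# Names are illustrative and are not part of the published instrument.

N_SCORED_ITEMS = 13   # assumed: 14 items minus the unscored feedback item
MAX_ITEM_SCORE = 4    # 5-point Likert scale coded 0..4


def score_frequency_item(voids_per_day: int) -> int:
    """Score item 3 (voiding frequency), which penalises both extremes."""
    if voids_per_day < 1:
        raise ValueError("voids_per_day must be at least 1")
    if voids_per_day <= 2 or voids_per_day > 8:
        return 4          # 1 to 2 voids, or more than 8 voids per day
    if voids_per_day in (3, 4, 7, 8):
        return 2
    return 0              # 5 to 6 voids per day is the neutral choice


def total_score(item_scores: list[int]) -> int:
    """Sum the equally weighted item scores (possible range 0 to 52)."""
    if len(item_scores) != N_SCORED_ITEMS:
        raise ValueError(f"expected {N_SCORED_ITEMS} item scores")
    if any(not 0 <= s <= MAX_ITEM_SCORE for s in item_scores):
        raise ValueError("each item score must be between 0 and 4")
    return sum(item_scores)
```

For example, a child who voids 9 times per day would receive 4 points on item 3, and that value would then be summed with the other 12 item scores.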
To assess validity the instrument was then administered to 62 consecutive patients who were referred to our clinic with lower urinary tract symptoms and diagnosed with DES/NLUTD by a pediatric urologist based on history and physical examination. Diagnosis in these cases was based on the subjective expert opinion of the attending urologist. In general when the voiding and/or defecation pattern seemed abnormal in the absence of structural disease, the diagnosis was NLUTD/DES. Questionnaires were completed at the first visit to the pediatric urology clinic at our institution. To assess test-retest reliability participants were asked to repeat the questionnaire at home 1 week later and mail it back to the clinic in a prestamped envelope. From the otolaryngology clinic at our institution 50 controls were selected randomly. They had no history of urinary or gastrointestinal diseases and considered themselves or their child normal in terms of bladder and bowel function. The scores of this group were used to assess construct validity using the extreme groups method. Patients with known anatomical abnormalities such as neurological disorders, bladder outlet obstruction and previous urological surgery were excluded from analysis. We only included patients with English as the primary language. Primary vesicoureteral reflux was not a study exclusion criterion.
Figure 1. Vancouver NLUTD/DES questionnaire
Validity was assessed using the Mann-Whitney U test on total score in NLUTD/DES cases vs controls. Discriminant function analysis (a technique to predict the category of an outcome variable by 1 or more continuous or categorical independent variables) was also done to determine the accuracy of the instrument for classifying the children. Questionnaire internal consistency was evaluated by Cronbach’s α. Factor analysis was done to assess relationships between individual items. Test-retest reliability was estimated by Pearson’s correlation coefficient. An ROC curve was created to estimate the optimal threshold score for diagnosis. SPSS® was used for statistical analysis.
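As an illustration of the internal consistency measure named above, the following sketch computes Cronbach's coefficient from a respondents-by-items matrix using the standard formula, k/(k - 1) × (1 - sum of item variances / variance of total scores). The study used SPSS; this Python version, its function name and its toy data are assumptions made for illustration and do not reproduce the study analysis.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("need at least 2 respondents")
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("every respondent needs the same k >= 2 item scores")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Invented toy data: 4 respondents, 3 items (not study data).
toy = [[0, 1, 0], [2, 2, 1], [3, 4, 3], [1, 1, 2]]
alpha = cronbach_alpha(toy)
```

Values near 1 indicate that items move together. A lower value is expected when items probe distinct symptom domains.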
Figure 2. Total score distribution
RESULTS
Cases and controls were demographically well matched with a median age of 8 (range 4 to 16) and 7 years (range 4 to 16), respectively. Of cases and controls 71% and 66% were female, respectively. Urinary incontinence, irritative LUTS, obstructive LUTS, constipation (hard stool, or 1 stool every 3 days or less often) and fecal soiling were seen in 58%, 87%, 66%, 35% and 10% of cases, and in 0%, 8%, 4%, 14% and 0% of controls, respectively. Of cases 60% had a history of UTI, of which two-thirds were nonfebrile. No controls had a history of UTI.
The mean ± SD total score in cases vs controls was 14.3 ± 5.6 (median 14) vs 6.9 ± 3.7 (median 6) with a possible maximum of 52. The difference in total score was statistically significant (Mann-Whitney U test p = 0.001). The table lists median scores on each item. Figure 2 shows the total score distribution. Discriminant function analysis showed 80% accuracy for correctly classifying sample patients. An ROC curve was generated to determine the optimum cutoff score to diagnose DES/NLUTD. A total score of 11 was associated with 80% sensitivity and 91% specificity (fig. 3). The AUC was 0.903 (95% CI 0.814–0.948).
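The threshold analysis reported above can be sketched as follows. The text does not state whether a score exactly at the cutoff counts as positive or which criterion defined the optimum, so this sketch assumes that scores at or above the cutoff are positive and that the optimum maximises Youden's index (sensitivity + specificity - 1). All function names and the toy scores are hypothetical.

```python
def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Return (sensitivity, specificity), treating score >= cutoff as positive."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


def auc(case_scores, control_scores):
    """Area under the ROC curve: probability that a random case outscores
    a random control, with ties counted as one half."""
    wins = sum(
        (c > k) + 0.5 * (c == k)
        for c in case_scores
        for k in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


def best_cutoff(case_scores, control_scores):
    """Cutoff among observed scores that maximises Youden's index."""
    candidates = sorted(set(case_scores) | set(control_scores))
    return max(
        candidates,
        key=lambda c: sum(sensitivity_specificity(case_scores, control_scores, c)),
    )


# Invented toy scores, not study data.
toy_cases = [12, 14, 9, 20]
toy_controls = [5, 6, 11, 3]
sens, spec = sensitivity_specificity(toy_cases, toy_controls, cutoff=11)
```

The pairwise AUC used here is equivalent to the Mann-Whitney U statistic divided by the product of the group sizes, which is why the two analyses tend to agree.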
Comparing scores related only to urinary symptoms revealed a significant difference in cases vs controls with a median score of 6 vs 3 (Mann-Whitney U test p = 0.0001). At a cutoff score of 6 cases and controls were distinguished with 74% sensitivity and 88% specificity. Bowel function scores were also significantly different (Mann-Whitney U test p = 0.0003).
We assessed test-retest reliability in 50 cases and controls. The response rate was 75%. Pearson’s correlation coefficient was 0.845 (p <0.001), showing excellent reliability when 2 questionnaires were answered 1 week apart. Factor analysis identified 4 domains corresponding to incontinence, urgency, obstructive symptoms and fecal complaints. Collectively they were responsible for 65% of the score variance. Cronbach’s α was modest at 0.445, showing that NLUTD/DES is a heterogeneous clinical syndrome, as described. Of the participants 85% classified the questionnaire as easy or very easy to answer.
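The test-retest estimate reported above is a Pearson correlation between each participant's first and second total scores. A minimal sketch of that calculation follows. The function name and the paired toy scores are invented for illustration, and the study itself used SPSS.

```python
from math import sqrt


def pearson_r(first: list[float], second: list[float]) -> float:
    """Pearson correlation between paired first and repeat total scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need at least 2 paired scores of equal length")
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    sxx = sum((x - mean_x) ** 2 for x in first)
    syy = sum((y - mean_y) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both administrations")
    return sxy / sqrt(sxx * syy)


# Invented paired total scores, 1 week apart (not study data).
week_0 = [14, 6, 20, 9, 11]
week_1 = [13, 7, 18, 10, 12]
r = pearson_r(week_0, week_1)
```

Pearson's coefficient measures linear association and does not detect a systematic shift between administrations. An intraclass correlation coefficient is a common alternative when absolute agreement matters.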
Case and control item scores
Item No.    Control Median Score    Case Median Score
1           0                       1
2           0                       2
3           2                       2
4           1                       2
5           0                       1
6           0                       0
7           0                       1
8           0                       0.5
9           0                       0
10          0                       0
11          1                       1
12          1                       1
13          0                       0
Figure 3. ROC curve
DISCUSSION
Despite the high prevalence of NLUTD/DES there is no unifying definition of the disorder, partly because of symptom variability. Lower urinary and intestinal complaints are characteristic, especially urinary incontinence. UTI and vesicoureteral reflux are common in these patients. The debate continues around the pathogenesis but favored theories are behavioral issues, immature nervous control of the lower urinary tract, detrusor overactivity and sphincteric dyscoordination in the absence of neurological abnormalities.2,12 Although there is a large body of literature on the different aspects of this syndrome, there is no universal agreement on diagnostic criteria. NLUTD/DES is primarily a clinical diagnosis and, thus, it is subject to individual clinician judgment. Lack of a validated instrument to diagnose this disorder has a profound negative impact on clinical research, creating patient population dissimilarities in different studies and so inconsistent reported results. In the last few years some groups developed questionnaires to remedy this problem but their development and psychometric validation processes had multiple shortcomings.
Validity and reliability are 2 basic psychometric requirements for any questionnaire. As a general rule, reliability sets the upper limit of validity and, thus, an unreliable instrument has low validity.10 There are different types of reliability but the most important one in this case is test-retest reliability. If an instrument is reliable, there should be little variance in results when it is reapplied to the same participant, provided that the condition has not changed. Validity is defined as the success of a tool in achieving its goal, which in our study equates to how confident we are regarding inferences that we make about participants based on their scores.11 Furthermore, when developing a clinical instrument, it is important to be clear about its purpose.
For example, the validation process is different for tools used to diagnose as opposed to follow. In 2000 Farhat et al provided a questionnaire.7 They were the first to use a psychometric approach to develop an NLUTD/DES tool. The International Prostate Symptom Score, a validated tool to classify and grade lower urinary tract symptom severity in men with prostate enlargement, was used as the framework for generating their items. Extrapolating these questions to a completely different entity in children does not seem logical. Other flaws in the development of this instrument are the omission of important symptoms such as fecal soiling, the inclusion of time sensitive scores and ambiguity in some question stems. An example of the latter is a question asking about going to the bathroom, which may
indicate voiding or defecating. The main shortcoming of this instrument is that it lacks reliability assessment. Thus, the fundamental question remains unanswered of how much of the score variance is unrelated to the disease process but related to the poor reproducibility of the instrument. Also, they stated that the questionnaire may be used for followup. Their methodology does not support this conclusion. Since there was no assessment of responsiveness (ability of a scale to detect changes in disease status), this instrument cannot be considered valid to measure changes in disease status. In 2001 Sureshkumar et al provided a questionnaire to assess the prevalence of and risk factors for daytime urinary incontinence in children 3.5 to 7 years old.8 The questionnaire had good test-retest reliability but it was not designed or evaluated for its discriminative properties, ie its ability to differentiate between normal and abnormal voiding patterns. Although they concluded that the questionnaire could be used in clinical trials to measure treatment responses, no data in their study suggest that instrument responsiveness to change was assessed. In 2005 Akbal et al provided another symptom scoring system.9 They used the International Reflux Study questionnaire as the basis for item generation. The instrument has 13 items weighted according to ability to differentiate normal and abnormal groups. Five items have a total of 4 ordinal response choices and the rest are binary. They used robust methodology to evaluate validity but did not assess reliability. Also, the questionnaire does not probe defecatory symptoms, which are a major component of DES. The questionnaire was validated in a Turkish population, preventing its use in populations speaking a different language without formal assessment of measurement equivalence, which is a well-known principle in psychometric evaluation of health measurement scales. 
Measurement equivalence in this situation would comprise translation to the target language (English), administration of the tool, back translation of the results to Turkish by a second translator and statistical analysis of the scores.13 To our knowledge this has not been done for this instrument to date so that any application of it in English speaking patients is speculative.
The current questionnaire differs from previous ones in several aspects. Item generation was more thorough, including 3 sources (literature review, patient interview and expert opinions). The items probe incontinence and its severity, and irritative and obstructive LUTS. Since gastrointestinal symptoms may be a prominent part of DES, we included items on constipation and fecal soiling. The scale is self-administered and requires only a few minutes to complete.
We noted excellent test-retest reliability (Pearson’s coefficient 0.845), an important characteristic that was explored in only 1 previous study.8 We chose 1 week for repeat testing because the condition is unlikely to change in such a short time and repeat answers are not likely to be based only on recall from the first questionnaire. Cronbach’s α, a measure of internal consistency, was modest (0.445). This is a well described issue with instruments assessing multifaceted clinical syndromes such as NLUTD/DES.10 The low value only denotes heterogeneity in clinical presentation, which is known to pediatric urologists. Factor analysis, a statistical methodology looking for patterns of association within the instrument, showed loading on 4 factors, which correlated with items on incontinence, irritative LUTS, obstructive LUTS and intestinal components. Together these factors accounted for 65% of the variance in scores.
The good discriminative ability of the questionnaire is an indication of its diagnostic validity. At a score of 11 as a threshold diagnostic sensitivity and specificity are 80% and 91%, respectively. To put these values in perspective serum prostate specific antigen at a cutoff of 4.1 ng/ml to diagnose high grade prostate cancer has only 50% sensitivity and 90% specificity.
There are limitations to this study. Since there is no gold standard for diagnosing DES, we could not test criterion validity. This type of validation requires testing the scores of the tool against a well established diagnostic criterion. Like other clinical entities such as depression or irritable bowel syndrome, NLUTD/DES is a construct. Using the extreme groups method we noted high discriminative construct validity, ie participants with higher scores are more likely to be diagnosed with NLUTD/DES. The mainstay of differentiation between our groups was the currently acceptable criterion, ie clinical diagnosis by an expert. This instrument was not tested for detecting changes after different treatments, ie responsiveness. We are currently collecting data for this purpose. Also, NLUTD/DES is a heterogeneous syndrome with many facets. Treatment should not be based only on the sum of scores but rather tailored to specific clinical presentation aspects in each case. Development of a health measurement scale is an ongoing process requiring reassessment in new settings. One should be cautious about using a scale in a different population or in the context of different baseline definitions of the disease.
CONCLUSIONS
This newly developed symptom score was a reliable and valid tool for diagnosing NLUTD/DES in our sample of English speaking children. This questionnaire is useful for obtaining unified diagnostic criteria for this elusive entity in the contexts of clinical practice and research.
REFERENCES
1. Rushton HG: Wetting and functional voiding disorders. Urol Clin North Am 1995; 22: 75.
2. Koff SA, Wagner TT and Jayanthi VR: The relationship among dysfunctional elimination syndromes, primary vesicoureteral reflux and urinary tract infections in children. J Urol 1998; 160: 1019.
3. Neveus T, von Gontard A, Hoebeke P et al: The standardization of terminology of lower urinary tract function in children and adolescents: report from the Standardisation Committee of the International Children’s Continence Society. J Urol 2006; 176: 314.
4. Varlam DE and Dippell J: Non-neurogenic bladder and chronic renal insufficiency in childhood. Pediatr Nephrol 1995; 9: 1.
5. Hagglof B, Andren O, Bergstrom E et al: Self-esteem before and after treatment in children with nocturnal enuresis and urinary incontinence. Scand J Urol Nephrol, suppl., 1997; 183: 79.
6. Ollendick TH, King NJ and Frary RB: Fears in children and adolescents: reliability and generalizability across gender, age and nationality. Behav Res Ther 1989; 27: 19.
7. Farhat W, Bagli DJ, Capolicchio G et al: The dysfunctional voiding scoring system: quantitative standardization of dysfunctional voiding symptoms in children. J Urol 2000; 164: 1011.
8. Sureshkumar P, Craig JC, Roy LP et al: A reproducible pediatric daytime urinary incontinence questionnaire. J Urol 2001; 165: 569.
9. Akbal C, Genc Y, Burgu B et al: Dysfunctional voiding and incontinence scoring system: quantitative evaluation of incontinence symptoms in pediatric population. J Urol 2005; 173: 969.
10. Streiner DL and Norman GR: Reliability. In: Health Measurement Scales: A Practical Guide to Their Development and Use, 3rd ed. Oxford, United Kingdom: Oxford University Press 2003; pp 126–151.
11. Streiner DL and Norman GR: Validity. In: Health Measurement Scales: A Practical Guide to Their Development and Use, 3rd ed. Oxford, United Kingdom: Oxford University Press 2003; pp 172–194.
12. Feldman AS and Bauer SB: Diagnosis and management of dysfunctional voiding. Curr Opin Pediatr 2006; 18: 139.
13. Guillemin F, Bombardier C and Beaton D: Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol 1993; 46: 1417.
EDITORIAL COMMENT
The increasing focus on NDLU/B in children has also cast light on the increasing time, effort and cost required to manage NDLU/B successfully. As such, we recognize a need for more objective and systematic classification of such cases if we hope to objectively monitor treatment response and rationally develop modern new therapy. When this need was recognized, first by our group (reference 7 in article) and later by others, we did not fully understand (as these authors teach) that the design and implementation of psychometric scoring systems for NDLU/B is a 2-step process. This study is the first step in this process, that is, validation showing that discriminating pediatric NDLU/B using a scoring system is feasible. However, since the authors’ system is admittedly based on their expert opinion and not on the expert opinion of readers, one wonders whether they produced a system that works objectively for them but not for the rest of us. Herein lies the greater problem, that is, the lack of a true gold standard for NDLU/B on which to base the development of such scoring tools.
The authors’ ROC curve showed an optimum diagnostic cutoff total score of 11. In our study cohort this score was 9 in boys and 6 in girls. However, such scores only apply to the cohort from which they were derived. Actual values matter less than the patient population in which they are to apply. Since the local expert needs a tool that can quantify that expert opinion within that practice pattern, perhaps in the future one can envision an off-the-shelf scoring system that must first be calibrated to reflect the view of the expert(s) using it. Much as one trains speech recognition software for accuracy, the clinician would first derive an optimal ROC cutoff score from a validated series of questions with discriminatory ability that reflects their expert diagnostic threshold for NDLU/B in their practice. Although we await an NDLU/B scoring system that we can all use, these authors show by the most rigorous effort to date that there is much more to this task than meets the eye.
Darius J. Bagli
Division of Urology
Hospital for Sick Children
University of Toronto
Toronto, Ontario, Canada