DEVELOPMENT AND VALIDATION OF PATIENT-REPORTED OUTCOMES MEASURES FOR OVERACTIVE BLADDER: A REVIEW OF CONCEPTS

KARIN S. COYNE, ANDREA TUBARO, LINDA BRUBAKER, AND TAMARA BAVENDAM
ABSTRACT Patient-reported outcome (PRO) measures are a valuable means for determining how a disease and its treatment affect patients, including effects on health-related quality of life (HRQL). To ensure that the results obtained with PROs are clinically useful, data must be gathered using valid and reliable instruments. Developing such instruments requires a multistep, structured process that incorporates cognitive psychology, psychometric theory, and patient and clinician input. The process begins by determining the intent and purpose of the PRO and culminates in studies that demonstrate the measure’s validity, reliability, and responsiveness. Several valid and reliable PROs are available for assessing the effects of treatment on symptom severity, symptom bother, and HRQL in patients with overactive bladder. UROLOGY 68 (Suppl 2A): 9–16, 2006. © 2006 Elsevier Inc.
Karin S. Coyne is a consultant to Pfizer Inc. Linda Brubaker receives research funding from Q-Med, Pfizer Inc, and Allergan and receives consultant/speaker honoraria from Q-Med, Novartis, Pfizer Inc, and Astellas. Tamara Bavendam is an employee of Pfizer Inc.

From the Center for Health Outcomes Research, United BioSource Corporation, Bethesda, Maryland, USA (KSC); Sant’Andrea Hospital, “La Sapienza” University, Rome, Italy (AT); Departments of Obstetrics, Gynecology, and Urology, Loyola University Medical Center, Maywood, Illinois, USA (LB); and Pfizer Inc, New York, New York, USA (TB)

Reprint requests: Karin S. Coyne, PhD, MPH, United BioSource Corporation, 7101 Wisconsin Avenue, Suite 600, Bethesda, Maryland 20814. E-mail: karin.coyne@unitedbiosource.com

0090-4295/06/$32.00 doi:10.1016/j.urology.2006.05.042

Patient-reported outcomes (PROs) represent any report coming from a patient about a health condition or its treatment1 and can include health-related quality of life (HRQL), symptom assessments, treatment satisfaction, and functional outcomes. PROs are frequently used by clinical researchers when discussing treatment outcomes, but how exactly can PROs be measured in a meaningful and reproducible manner? Asking a patient “How are you feeling?” provides a global assessment but does not fully capture the impact of a disease or treatment outcome, nor does it allow for meaningful patient comparisons. As discussed in this supplement in an article by Brubaker et al.,2 PROs provide clinicians with additional insights for diagnosis, symptom evaluation, disease impact, and assessment of treatment response. However, to accomplish this, PROs need to be tested to demonstrate their validity and reliability just as is done for any other diagnostic instrument used in clinical practice. The science behind measuring PROs incorporates many fields, including cognitive and quantitative psychology, sociology, and psychometric theory, as well as clinician and patient input. As such, the creation of a patient questionnaire should follow a structured process. The purpose of this article is to review the science behind developing PRO measures and to briefly discuss current PRO questionnaires for overactive bladder (OAB).

BACKGROUND

In urology, early PRO measures, such as the Boyarsky score3 and Stamey incontinence grading,4 were attempts to standardize clinical history to guide treatment decisions or ensure uniformity of data collection in clinical trials. These instruments were not validated and were clinician administered, not self-administered.5 Since then, particularly during the past 10 years, increased recognition of the importance of PROs has led to the development of a large number of patient-completed questionnaires. However, most PRO instruments in urology have been developed for patients with urinary incontinence (UI)5–7; fewer measures have been designed specifically for those with OAB.

DEVELOPING A PATIENT-REPORTED OUTCOME MEASURE

Developing a PRO measure is a multistep process that begins with identifying the need for a specific instrument and culminates with a measure that has demonstrated validity and reliability in the target patient population. To be a useful measurement tool, a PRO instrument must be easy to administer as well as reliable and valid. Several issues must be considered when designing a PRO questionnaire.

INTENT AND PURPOSE

The first task in developing a PRO measure is to determine why the instrument is needed. Given the current number of disease-specific questionnaires available in the field of incontinence and related pelvic disorders, a new PRO measure must fill a need that has not already been met by an existing instrument. Once the need for the measure is recognized, its purpose and clinical usefulness need to be considered because the purpose dictates the design and validation process. For example, measures of symptoms and measures of treatment satisfaction would be developed and validated differently because the outcomes differ. The development stage would focus on the outcome of interest (eg, symptoms patients experience and the significance of each symptom, or what issues patients consider when determining how satisfied they are with treatment), with the items relating directly to that outcome. Validation efforts would include designing a study focused on the outcome of interest, with appropriate patient inclusion/exclusion criteria to enhance generalizability while maintaining internal consistency, and providing opportunities to test, at a minimum, reliability and construct validity.

ITEM DEVELOPMENT

Designing a clinically useful PRO measure involves more than just developing a series of questions.
In addition to clinician input and literature review, questionnaire items should be generated from patient input obtained through focus groups or one-on-one interviews to provide qualitative data on issues pertinent to patients and to identify the words patients use to describe their symptoms or disease impact. Focus groups and one-on-one interviews should be carefully planned to address the goals of the questionnaire being developed. For example, if a measure is intended to assess symptom bother, interview questions should pertain to the patient’s symptom experience. Also, rather than using clinical terminology, the words used during focus groups or interviews should be common to patients. After items are generated, the newly drafted questionnaire should be reviewed by
other patients and experts to ensure its readability and content validity.

An alternative approach to questionnaire development is to adapt an existing measure to meet the needs of the desired questionnaire. Although no patient input is required at the outset when adapting an existing instrument, patients need to be involved after the questionnaire is adapted to ensure that the revised measure is pertinent to the population of interest. Also, the adapted questionnaire must be validated on its own in the target population; the validity of the original questionnaire does not apply to an adapted measure.

For newly developed and adapted questionnaires, think-out-loud interviews or cognitive debriefing interviews should be used to ascertain the correctness and validity of the revised questionnaire. In think-out-loud interviews, patients are asked to review a question and describe what they are thinking as they cognitively process the question; the patients think out loud about what the question means to them. For a cognitive debriefing approach, patients review and respond to the questionnaire items, then are interviewed about what each item meant to them as they completed the questionnaire. Both approaches provide information about what patients consider when responding to each item.

MODE OF ADMINISTRATION

Once items have been generated, the mode of administration must be considered. Will the measure be completed by the patient (ie, self-administered) or administered by an interviewer (ie, interviewer administered)? How the questionnaire will be completed needs to be determined before the validation stage because mode of administration can affect patient responses. For highly personal or intimate questions, a self-administered questionnaire is recommended to avoid response bias.
Questionnaires that are self-administered are preferable to interviewer-administered questionnaires because the data collection burden is reduced, and patients are more likely to provide unbiased information on self-administered questionnaires. Importantly, if a questionnaire has been validated for a particular mode of administration, this does not make the questionnaire valid for all modes of administration. Each mode of administration should be validated separately.

VALIDATION

All PRO measures must demonstrate reliability, validity, and responsiveness in practice. This can be accomplished in several ways: (1) perform a stand-alone cross-sectional study to validate the questionnaire in the patient population for which it was designed, (2) administer the untested questionnaire in a clinical trial and use the baseline data to perform psychometric validation (the end-of-study data can also be used to evaluate responsiveness), or (3) perform a stand-alone longitudinal study with an intervention to determine the instrument’s psychometric performance and responsiveness in a noninvestigational setting.

Reliability refers to the ability of a measure to produce similar results when assessments are repeated (ie, is the measure reproducible?).8,9 Reliability is critical to ensure that change detected by the measure is because of the treatment or intervention and not because of measurement error.10 One measure of reliability is the questionnaire’s internal consistency, which indicates how well individual items within the same domain (or subscale) correlate.11 To ensure that individual items are measuring a similar concept, the items should be correlated.10,11 The Cronbach α-coefficient is used to assess internal consistency reliability, with higher α-coefficients indicating greater correlation.10 Typically, the Cronbach α-coefficient should be between 0.70 and 0.90 for group data; α-coefficient values >0.90 suggest possible redundancy in the questionnaire items.10,11 If the value of the item-to-total α-coefficient is <0.20, the question should be removed or rewritten.8

Test–retest reliability, or reproducibility, indicates how well results can be reproduced with repeated testing. To assess test–retest reliability, the same patient completes the questionnaire more than once, at baseline and again after a few days or weeks.8,10,11 The Pearson correlation coefficient and intraclass correlation coefficient are used to demonstrate reproducibility.
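
As an illustration of the two reliability statistics just described, the following minimal sketch computes a Cronbach α-coefficient for one domain and a Pearson correlation between two administrations of the same scale. It is not taken from any of the cited validation studies: the function names and the tiny data sets are hypothetical, and the intraclass correlation coefficient is left out for brevity.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one domain.

    `responses` holds one row per patient and one column per item.
    """
    k = len(responses[0])  # number of items in the domain
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across patients.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each patient's summed domain score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(first, second):
    """Pearson correlation between two administrations of the same scale."""
    mx, my = mean(first), mean(second)
    sxy = sum((x - mx) * (y - my) for x, y in zip(first, second))
    sxx = sum((x - mx) ** 2 for x in first)
    syy = sum((y - my) ** 2 for y in second)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: 4 patients answering a 2-item domain.
items = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(items)

# Hypothetical test-retest scores for the same 4 patients.
baseline = [1, 2, 3, 4]
retest = [2, 1, 4, 3]
r = pearson_r(baseline, retest)

print(f"alpha = {alpha:.2f}, within 0.70-0.90 band: {0.70 <= alpha <= 0.90}")
print(f"test-retest r = {r:.2f}")
```

With real data, each domain (or subscale) would be passed to `cronbach_alpha` separately, because internal consistency is defined for items within the same domain.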
For group data, a Pearson correlation coefficient and an intraclass correlation coefficient of ≥0.70 demonstrate good test–retest reliability.10,11

Interrater reliability indicates how well scores correlate when a measure is administered by different interviewers or when multiple observers rate the same phenomenon.11 Demonstration of interrater reliability is not necessary for self-administered questionnaires, but it is necessary for instruments based on observer ratings or using multiple interviewers. A correlation ≥0.80 between raters indicates good interrater reliability.11

Validity refers to the ability of an instrument to measure what it was intended to measure.8,10,11 A measure should be validated for each specific condition or outcome for which it will be used. A measure designed to assess stress incontinence would not be valid for OAB unless it was validated in patients with OAB symptoms. Content validity, construct validity, and criterion validity typically are required to validate a questionnaire.11 Content, or face, validity is a qualitative assessment of whether the questionnaire
captures the range of the concept it is intended to measure.10,11 For example, does a measure of symptom severity capture all symptoms that patients with a particular condition have, and if so, is the measure capturing the items in a manner meaningful to patients? To obtain content validity, patients and clinical experts evaluate the measure and judge whether the questions are clear, unambiguous, and comprehensive.8 Construct validity is a quantitative assessment of whether the questionnaire measures the theoretical construct it was intended to measure8,10; it encompasses convergent validity and discriminant validity. Convergent validity indicates whether a questionnaire has stronger relations with similar concepts or variables; discriminant validity indicates whether the questionnaire can differentiate between known patient groups (eg, those with mild, moderate, or severe disease).10 Stronger relations should be seen with the most closely related constructs and weaker relations seen with less-related constructs.10 Criterion validity reflects the correlation between the new questionnaire and an accepted reference, or “gold standard.”10,11 Establishing criterion validity can be difficult because a gold standard measure might not be available.10,11 When criterion validity can be established with an existing measure, the correlation should be 0.40 – 0.70; correlations approaching 1.0 indicate that the new questionnaire may be too similar to the gold standard measure and therefore redundant.11 Responsiveness indicates whether the measure can detect change in a patient’s condition.10 An important aspect of responsiveness is determining not only whether the measure detects change but also whether the change is meaningful to the patient. This can be done by determining the minimal important difference (MID) of the measure. 
The MID is the smallest change in a PRO questionnaire score that would be considered meaningful or important to a patient.10,12–14 A treatment that is statistically significantly better than another may not necessarily have made a meaningful difference to the patient. The MID indicates whether the treatment made such a difference from a patient perspective.12–14 Unfortunately, there is no scientific test for the MID. Determining it is an iterative process that draws on 2 methodologies: an anchor-based approach and a distribution-based approach.15,16 With the anchor-based approach, the MID is determined by comparing the measure with other measures (or “anchors”) that have clinical relevance,16 such as a global measure of well-being or perception of treatment benefit.13 With the distribution-based approach, the MID is determined by the statistical distributions of the data,16
using analyses such as effect size, one-half standard deviation, and standard error of measurement.13,17

ADAPTING A QUESTIONNAIRE FOR DIFFERENT LANGUAGES AND CULTURES

A measure that is valid and reliable for a particular language and culture may not prove so when used in a different population. Linguistic and cultural adaptation of a questionnaire can occur during the development phase before validation, or it can be done after the questionnaire is validated in the language in which it was initially developed, with the latter being the more common approach. Ensuring the linguistic and cultural validity of a questionnaire is especially important for measures used in multinational clinical trials.18 A particular challenge in doing this is the difficulty of translating medical terms and concepts into other languages.19

The principal steps in adapting a measure for different languages and cultures are as follows: (1) development of 2 forward translations of the original instrument into the new language, (2) adoption of quality control procedures that may include a backward translation (translating the instrument back into the original language),20 (3) discussion by an expert panel to ensure clarity of the translated questionnaire, and (4) testing of the translated instrument in monolingual or bilingual patients to ensure that it measures the same concepts as the original instrument.8,11,18,20 However, if a backward translation of the measure does not produce a semantically equivalent instrument, then the instrument may need to be developed in the target language, rather than just translated.20

The steps involved in adapting the Overactive Bladder Questionnaire (OAB-q) for multinational use were as follows19: (1) the original US English version was translated into 12 languages (the questionnaire was translated twice by 2 professional translators, and the 2 versions were reconciled); (2) the translated questionnaires were then translated back into English, and any
discrepancies were discussed and resolved; (3) the revised version was then reviewed by physicians, and a third version was constructed based on their input; and (4) this version was tested with patients with OAB in face-to-face interviews; any difficulties patients had comprehending the translated version were addressed, and a final version was produced.

A considerable amount of work is needed to develop and validate a PRO measure. Using valid and reliable instruments, such as those discussed next, helps ensure that the results obtained in clinical research are valid and useful to clinicians, patients, regulatory authorities, and others making treatment decisions.
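
The distribution-based analyses named earlier in this section (effect size, one-half standard deviation, and standard error of measurement) can be sketched as follows. This is an illustrative sketch only: the function name, the choice of baseline standard deviation as the reference, the reliability value, and the scores are all assumptions made for the example, and none of them comes from the studies cited here.

```python
from math import sqrt
from statistics import mean, stdev


def distribution_based_estimates(baseline, followup, reliability):
    """Distribution-based statistics that can inform an MID estimate.

    `reliability` is a reliability coefficient for the scale (for example a
    test-retest or internal consistency estimate); which one to use is an
    analyst's choice and is an assumption of this sketch.
    """
    sd_baseline = stdev(baseline)  # sample SD at baseline
    changes = [f - b for b, f in zip(baseline, followup)]
    return {
        # Mean change expressed in baseline SD units.
        "effect_size": mean(changes) / sd_baseline,
        # One-half standard deviation of the baseline scores.
        "half_sd": 0.5 * sd_baseline,
        # Standard error of measurement.
        "sem": sd_baseline * sqrt(1 - reliability),
    }


# Hypothetical scores on a 0-100 scale for 5 patients.
baseline = [40, 50, 60, 70, 80]
followup = [30, 40, 50, 60, 70]
estimates = distribution_based_estimates(baseline, followup, reliability=0.84)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

These statistics describe the data only; an anchor-based comparison is still needed to judge whether a change of that size matters to patients.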

PATIENT-REPORTED OUTCOMES MEASURES FOR OVERACTIVE BLADDER

Several PROs are available to assess symptom severity or bother and HRQL in patients with UI, but fewer measures have been designed specifically for patients with OAB. Measures that are particularly useful for OAB are described in the next section, and their psychometric properties are summarized in Table I.7,21–31 Importantly, when selecting a PRO measure, its items and overall performance in terms of reliability, validity, and responsiveness must be considered. Certainly, it is better to use a measure with previously validated and documented performance than a new PRO measure. However, it is equally important to ensure that the PRO measure selected meets the needs of the clinician and researcher. The central question to answer when selecting a PRO measure is: “Does this instrument measure the outcome that I want to measure?”

ASSESSING OAB SYMPTOM BOTHER

Measures that can be used to assess how bothered patients are by OAB symptoms include the OAB-q Symptom Bother Scale (the first 8 items of the OAB-q)21 and the Primary OAB Symptom Questionnaire (POSQ).22 Either instrument can be used to assess symptom bother; however, each is scored differently, so the choice of which instrument to use depends entirely on the outcome being measured. With the OAB-q Symptom Bother Scale, patients are asked to rate from 1 (not at all) to 6 (a very great deal) how bothered they are by the OAB symptoms of urgency, frequency, nocturia, and urgency urinary incontinence (UUI).
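
The Symptom Bother Scale is reported on a 0 to 100 scale, and the exact scoring algorithm is not given in this review. One plausible way to map eight 1-to-6 ratings onto that range is a linear rescaling of the raw sum, sketched here as an assumption. The function name and the handling of invalid input are hypothetical, and the published scoring manual should be consulted before scoring real data.

```python
def symptom_bother_score(ratings, n_items=8, low=1, high=6):
    """Rescale summed item ratings linearly onto 0-100.

    Assumes an unweighted sum and no missing items; the published scoring
    rules may differ.
    """
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    if any(not low <= r <= high for r in ratings):
        raise ValueError(f"ratings must lie between {low} and {high}")
    raw = sum(ratings)
    lowest = n_items * low    # raw sum when every item gets the lowest rating
    highest = n_items * high  # raw sum when every item gets the highest rating
    return (raw - lowest) / (highest - lowest) * 100


print(symptom_bother_score([1] * 8))  # least possible bother
print(symptom_bother_score([3] * 8))  # intermediate bother
print(symptom_bother_score([6] * 8))  # greatest possible bother
```

Under this assumption a higher transformed score corresponds to greater symptom bother, which matches the direction reported for the scale.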
Responses are calculated as single summary scores ranging from 0 to 100 with higher scores indicating greater symptom bother.21 The POSQ, also called the OAB Bother Rating Scale, is a 5-item questionnaire in which patients rate how bothered they are by each OAB symptom (urgency, frequency, nocturia, and UUI) during the preceding 2 weeks and indicate which symptom bothered them the most.22 Unlike the OAB-q Symptom Bother Scale, the POSQ does not provide a summary score. Rather, there are 4 individual Likert responses and a single question asking which symptom bothers the patient the most. Content validity of the POSQ was evaluated through cognitive debriefing interviews with patients.22

TABLE I. Psychometric properties of selected patient-reported outcomes measures for overactive bladder (OAB)
Columns: Instrument; Reliability (Internal Consistency, Reproducibility [Test–Retest]); Validity (Content, Construct); Responsiveness (Treatment Duration); Population Sample
OAB symptom bother: Overactive Bladder Questionnaire (OAB-q) Symptom Bother Scale21,29; Primary OAB Symptom Questionnaire22
Urgency: Urgency Perception Scale23; Indevus Urgency Severity Scale24; Urinary Sensation Scale25; Urgency Rating Scale26; Urgency Questionnaire22,27
Health-related quality of life: OAB-q21,29; OAB-q Short Form28; King’s Health Questionnaire30; Urge Incontinence Impact Questionnaire31
✓ = feature demonstrated; x = feature not demonstrated; ✓/x = feature not clearly demonstrated; MID = minimal important difference; NA = not applicable or data not available. Adapted from J Urol.7

ASSESSING THE IMPACT OF URGENCY

Several instruments have been developed specifically to assess urinary urgency, which is defined by the International Continence Society as “the complaint of a sudden compelling desire to pass urine which is difficult to defer.”32 Urgency is the principal symptom of OAB,33 and, as such, assessing the effect of treatment on this symptom and its impact on HRQL is important. With any measure designed to evaluate urgency, patients must be able to distinguish between the normal desire to urinate (urge) and the difficult-to-postpone need to urinate (urgency).34,35 Wording thus becomes critical in the development of urgency assessment measures. Chapple and Wein36 make a case for describing urgency as a “compelling desire to void in which patients fear leakage of urine” as a means of distinguishing this abnormal sensation from the normal need to void. However, some patients may have a sensation of urgency without fear of leakage, further complicating attempts to define urgency.

The Urgency Perception Scale (UPS) was designed for use in clinical trials to evaluate patient-perceived urgency.23 This instrument consists of a single question asking patients to describe their typical experience when they feel the need to urinate. The 3 possible responses are “I am usually not able to hold urine,” “I am usually able to hold urine until I reach the toilet if I go immediately,” and “I am usually able to finish what I am doing before going to the toilet.”23 This scale was validated in a clinical trial evaluating the efficacy of tolterodine in treating OAB symptoms23; however, its limited responsiveness may preclude its usefulness in clinical practice.37

The Indevus Urgency Severity Scale (IUSS) asks patients to rate their level of urgency on a 4-point scale, from 0 (no urgency) to 3 (extreme urgency discomfort that abruptly stops all activity/tasks).24 This scale has been validated in a clinical trial of trospium in patients with OAB,24 but Chapple et al.37 question whether this scale actually measures urgency or just the normal urge to void.

The Urinary Sensation Scale (USS) is a 5-point scale ranging from 1 (no urgency; can continue activities until it is convenient to use the bathroom) to 5 (urge incontinence; extreme urgency
discomfort, cannot hold urine, and have a wetting accident before arriving at the bathroom).25 The content validity of this scale was established through a physician survey and patient interviews.25

The Urgency Rating Scale, recommended by the European Medicines Agency, consists of a 5-point rating scale to be rated with every void, ranging from 1 (no urgency; I felt no need to empty my bladder but did so for other reasons) to 5 (urge incontinence; I leaked before arriving at the toilet).26 This scale was used in a tolterodine clinical trial, in which responses on this scale were used to calculate sum urgency, a measure that accounts for changes in both urgency and frequency.38

The Urgency Questionnaire consists of 15 items measured on a 5-point rating scale from 1 (none of the time) to 5 (all of the time) that form 4 subscales (fear of incontinence, time to control urge, impact on daily activities, and nocturia), plus 4 visual analog scales used to rate the intensity, severity, impact, and discomfort related to urgency.22,27 In a trial in which the Urgency Questionnaire was completed by patients with OAB who were treated with tolterodine, the instrument was found to have good reliability, construct validity, and responsiveness.27

With some of these scales, patients have the option of indicating that they had UUI (an event) rather than the strongest feeling of urgency (a sensation) itself. In such cases, patients who have severe urgency, but not UUI, do not have an option for endorsing the highest (worst) value, because they are not incontinent. Urgency severity scales that include a UUI response option thus may be less useful than those that do not because such scales are trying to measure 2 things at once, urgency and UUI.

ASSESSING THE IMPACT OF OVERACTIVE BLADDER SYMPTOMS ON HRQL

The OAB-q currently is the only measure specifically designed to assess HRQL in both continent and incontinent patients with OAB.
The King’s Health Questionnaire (KHQ), however, is often used to assess HRQL in clinical trials of antimuscarinic agents for OAB. The Urge Incontinence Impact Questionnaire (Urge-IIQ) is a more focused measure that does not assess the impact on HRQL of the entire range of OAB symptoms.

The OAB-q was developed to assess how much a patient is bothered by OAB symptoms and the impact of these symptoms on the patient’s HRQL. This measure consists of the previously mentioned 8-item Symptom Bother Scale and a 25-item scale assessing 4 HRQL domains (coping, concern, sleep, and social interaction).21 The OAB-q is also available in a shortened form (OAB-q SF) consisting of 6 symptom bother questions and 13 HRQL questions.28 The OAB-q and OAB-q SF have been incorporated into the International Consultation on Incontinence Questionnaire for OAB (ICIQ-OAB).39

The OAB-q has been validated in both continent and incontinent OAB patients.21 In an evaluation study of 990 men and women with OAB, the OAB-q showed good construct validity.21 In a study of 47 patients with OAB, the OAB-q was shown to have good test–retest reliability (reproducibility).22 In a study of 865 patients with OAB treated with tolterodine, the OAB-q was highly responsive to treatment-related change, with statistically significant correlations between OAB-q change scores and changes in bladder diary variables.29 The OAB-q has been used to provide a quantitative assessment of the impact of nocturia on HRQL and sleep, and it discriminated well among patients with varying degrees of nocturia.40 Also, the OAB-q has been used to examine the impact of UUI, stress incontinence, and mixed incontinence on HRQL.41 As a disease-specific measure, the OAB-q was more sensitive than generic instruments in detecting differences among patients with different types of incontinence. In a recent post hoc analysis of 2 clinical trials of tolterodine, distribution- and anchor-based approaches were used to determine that the MID for all OAB-q subscales is 10 points.42

The KHQ, originally designed to assess changes in HRQL in women with UI,43 has since been validated in patients with OAB, but only in those whose symptoms include UUI.30 In a study using data from 2 clinical trials of tolterodine in OAB, Kelleher et al.13 used an anchor-based approach and a distribution-based approach to calculate the MID for the KHQ in OAB. They found that a change of ≥5 points on the KHQ indicated a clinically important difference in HRQL. A shortcoming of the KHQ is that it is written in British English, which does not always translate well to a US population.
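
The two MID values reported in this section (10 points for the OAB-q subscales42 and 5 points for the KHQ13) can be applied to individual change scores as a simple threshold check. The sketch below is illustrative only: the dictionary, the function name, and the patient scores are hypothetical, and the check deliberately ignores the direction of change because that depends on how each scale is scored.

```python
# MID values as reported in the text: 10 points for OAB-q subscales,
# 5 points for the KHQ.
MID_POINTS = {"OAB-q": 10.0, "KHQ": 5.0}


def meets_mid(instrument, baseline_score, followup_score):
    """Return True when the absolute score change reaches the instrument's MID.

    The sign of the change is ignored here; whether a rise or a fall counts
    as improvement depends on the scale and must be handled by the caller.
    """
    change = followup_score - baseline_score
    return abs(change) >= MID_POINTS[instrument]


# Hypothetical patients: (instrument, baseline score, follow-up score).
examples = [("OAB-q", 55.0, 42.0), ("OAB-q", 55.0, 50.0), ("KHQ", 60.0, 54.0)]
results = [meets_mid(*row) for row in examples]
print(results)
```

A check of this kind separates statistically detectable change from change that reaches the size patients are expected to notice.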
A short version of the KHQ has been developed and validated in Japan.44 Instead of the 16 items grouped into 8 domains that constitute the KHQ, the KHQ short form consists of 6 items grouped into 2 domains (limitation of daily life, mental health).44 The KHQ has been incorporated into the International Consultation on Incontinence Questionnaire for Lower Urinary Tract Symptoms and quality of life as the ICIQ-LUTSqol.45

The Urge Incontinence Impact Questionnaire (Urge-IIQ) is an adaptation of the IIQ, which was designed to evaluate symptoms in patients with stress incontinence.46 The 32-item Urge-IIQ was developed by adding and deleting items from the original questionnaire (the IIQ) and by using focus
groups, literature reviews, and expert clinical opinion.5,31 The Urge-IIQ has 7 domains (travel, activities, feelings, physical activities, relationships, sexual function, and nighttime bladder control) and has been demonstrated to be reliable and valid in patients with incontinence.31 CONCLUSION Because OAB is a symptom-defined condition, patient input is needed to evaluate treatment response. Consequently, valid and reliable PRO measures are needed. Developing such measures and ensuring their validity is a time-consuming, multistep process. In clinical trials, researchers need PRO instruments that are brief and easily completed by patients, yet yield useful information on treatment outcome in a meaningful and interpretable manner. A range of measures are available to assess the severity and impact of incontinence, but fewer such measures have been designed and validated specifically for OAB. Currently, the most widely validated PRO measure for OAB is the OAB-q. REFERENCES 1. Office of New Drugs and Office of Medical Policy: Guidance for Industry Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. Rockville, MD, Center for Drug Evaluation and Research (CDER), 2006, vol. 2006. 2. Brubaker L, Chapple C, Coyne KS, et al: Patient-reported outcomes in overactive bladder: importance for determining clinical effectiveness of treatment. Urology 68(suppl 2A): 3– 8, 2006. 3. Boyarsky S, Jones G, Paulson DF, et al: A new look at bladder neck obstruction by the Food and Drug Administration regulators: guidelines for investigation of benign prostatic hyperplasia. Trans Am Assoc Genitourin Surg 68: 29 –32, 1977. 4. Stamey TA: Urinary incontinence in the female, in Harrison JH, Gittes RF, Perlmutter AD, et al (Eds): Campbell’s Urology. Philadelphia, WB Saunders, 1979, vol. 2, pp 2272–2293. 5. Naughton MJ, Donovan J, Badia X, et al: Symptom severity and QOL scales for urinary incontinence. Gastroenterology 126: S114 –S123, 2004. 6. 
Corcos J, Beaulieu S, Donovan J, et al: Quality of life assessment in men and women with urinary incontinence. J Urol 168: 896 –905, 2002. 7. Symonds T: A review of condition-specific instruments to assess the impact of urinary incontinence on health-related quality of life. Eur Urol 43: 219 –225, 2003. 8. Donovan J, Badia X, Corcos J, et al: Committee 6: symptom and quality of life assessment, in Abrams P, Cardozo L, Khoury S et al (Eds): Incontinence. Plymouth, United Kingdom, Plymbridge Distributors, Ltd, 2002, pp 267–316. 9. Gordis L: Assessing the validity and reliability of diagnostic and screening tests. In: Epidemiology. Philadelphia, PA: WB Saunders; 1996: pp 58 –76. 10. Fitzpatrick R, Davey C, Buxton MJ, et al: Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess 2: 1–73, 1998. UROLOGY 68 (Supplement 2A), August 2006
11. Chassany O, Sagnier P, Marquis P, et al: Patient-reported outcomes: the example of health-related quality of life—a European guidance document for the improved integration of health-related quality of life assessment in the drug regulatory process. Drug Inf J 36: 209–238, 2002.
12. Jaeschke R, Singer J, and Guyatt GH: Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials 10: 407–415, 1989.
13. Kelleher CJ, Pleil AM, Reese PR, et al: How much is enough and who says so? The case of the King’s Health Questionnaire and overactive bladder. Br J Obstet Gynaecol 111: 605–612, 2004.
14. Wiklund I: Assessment of patient-reported outcomes in clinical trials: the example of health-related quality of life. Fundam Clin Pharmacol 18: 351–363, 2004.
15. Norman GR, Sridhar FG, Guyatt GH, et al: Relation of distribution- and anchor-based approaches in interpretation of changes in health-related quality of life. Med Care 39: 1039–1047, 2001.
16. Crosby RD, Kolotkin RL, and Williams GR: Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 56: 395–407, 2003.
17. Wyrwich KW, Tierney WM, and Wolinsky FD: Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 52: 861–873, 1999.
18. Boyle P: Cultural and linguistic validation of questionnaires for use in international studies: the nine-item BPH-specific quality-of-life scale. Eur Urol 32(suppl 2): 50–52, 1997.
19. Conway K, Uzun V, Vigner S, et al: Linguistic validation of the Overactive Bladder Questionnaire (OAB-q) into 12 languages. Presented at the 3rd International Conference on Incontinence; June 26-29, 2004; Monte Carlo, Monaco.
20. Maneesriwongul W, and Dixon JK: Instrument translation process: a methods review. J Adv Nurs 48: 175–186, 2004.
21. Coyne K, Revicki D, Hunt T, et al: Psychometric validation of an overactive bladder symptom and health-related quality of life questionnaire: the OAB-q. Qual Life Res 11: 563–574, 2002.
22. Matza LS, Thompson CL, Krasnow J, et al: Test-retest reliability of four questionnaires for patients with overactive bladder: the Overactive Bladder Questionnaire (OAB-q), Patient Perception of Bladder Condition (PPBC), Urgency Questionnaire (UQ), and the Primary OAB Symptom Questionnaire (POSQ). Neurourol Urodyn 24: 215–225, 2005.
23. Cardozo L, Coyne KS, and Versi E: Validation of the Urgency Perception Scale. BJU Int 95: 591–596, 2005.
24. Bowden A, Colman S, Sabounjian L, et al: Psychometric validation of an Urgency Severity Scale (IUSS) for patients with overactive bladder [abstract]. Presented at the 33rd Annual Meeting of the International Continence Society; October 5-9, 2003; Florence, Italy.
25. Brewster-Jordan JL, Guan Z, Green HL, et al: Establishing the content validity of the Urinary Sensation Scale (USS): International Society for Pharmacoeconomics and Outcomes Research. Washington, DC, 2005.
26. European Agency for the Evaluation of Medicinal Products and Committee for Proprietary Medicinal Products: Note for guidance on the clinical investigation of medicinal products for the treatment of urinary incontinence. London, 2002.
27. Coyne KS, Matza L, and Versi E: Urinary urgency: can “gotta go” be measured? Presented at the 10th Annual Conference of the International Society for Quality of Life Research; Prague, Czech Republic; November 12–15, 2003.
28. Coyne KS, Lai J-S, Zyczynski T, et al: An overactive bladder symptom and quality-of-life short form: development of the Overactive Bladder Questionnaire Short Form (OAB-q SF). Presented at the 34th Joint Meeting of the International Continence Society and the International Urogynecological Association; August 23-27, 2004; Paris, France.
29. Coyne KS, Matza LS, and Thompson CL: The responsiveness of the Overactive Bladder Questionnaire (OAB-q). Qual Life Res 14: 849–855, 2005.
30. Reese PR, Pleil AM, Okano GJ, et al: Multinational study of reliability and validity of the King’s Health Questionnaire in patients with overactive bladder. Qual Life Res 12: 427–442, 2003.
31. Lubeck DP, Prebil LA, Peeples P, et al: A health related quality of life measure for use in patients with urge urinary incontinence: a validation study. Qual Life Res 8: 337–344, 1999.
32. Abrams P, Cardozo L, Fall M, et al: The standardisation of terminology in lower urinary tract function: report from the standardisation sub-committee of the International Continence Society. Urology 61: 37–49, 2003.
33. Brubaker L: Urgency: the cornerstone symptom of overactive bladder. Urology 64: 12–16, 2004.
34. Brubaker L: Urinary urgency and frequency: what should a clinician do? Obstet Gynecol 105: 661–667, 2005.
35. Staskin DR: The urge to define urgency: a review of three approaches. Curr Urol Rep 5: 413–415, 2004.
36. Chapple CR, and Wein AJ: The urgency of the problem and the problem of urgency in the overactive bladder. BJU Int 95: 274–275, 2005.
37. Chapple CR, Artibani W, Cardozo LD, et al: The role of urinary urgency and its measurement in the overactive bladder symptom syndrome: current concepts and future prospects. BJU Int 95: 335–340, 2005.
38. Coyne KS, Matza L, Thompson C, et al: A comparison of three approaches to analyze urinary urgency as a treatment outcome. Presented at the 35th Annual Meeting of the International Continence Society; August 28 to September 2, 2005; Montreal, Quebec, Canada.
39. Abrams P, Avery K, Zyczynski T, et al: Measuring patient outcomes in OAB: the OAB-q, OAB-q SF, OAB screener and ICIQ-OAB [abstract]. Neurourol Urodyn 23: 399, 2004.
40. Coyne KS, Zhou Z, Bhattacharyya SK, et al: The prevalence of nocturia and its effect on health-related quality of life and sleep in a community sample in the USA. BJU Int 92: 948–954, 2003.
41. Coyne KS, Zhou Z, Thompson C, et al: The impact on health-related quality of life of stress, urge and mixed urinary incontinence. BJU Int 92: 731–735, 2003.
42. Coyne K, Matza L, Thompson C, et al: Determining the importance of change in the OAB-q. J Urol 176: 627–632, 2006.
43. Kelleher CJ, Cardozo LD, Khullar V, et al: A new questionnaire to assess the quality of life of urinary incontinent women. Br J Obstet Gynaecol 104: 1374–1379, 1997.
44. Homma Y, and Uemura S: Use of the short form of King’s Health Questionnaire to measure quality of life in patients with an overactive bladder. BJU Int 93: 1009–1013, 2004.
45. Abrams P, Avery K, Gardener N, et al: The International Consultation on Incontinence Modular Questionnaire: www.iciq.net. J Urol 175: 1063–1066; discussion 1066, 2006.
46. Matza LS, Zyczynski TM, and Bavendam T: A review of quality-of-life questionnaires for urinary incontinence and overactive bladder: which ones to use and why? Curr Urol Rep 5: 336–342, 2004.