REVIEW
CLINICAL STUDY DESIGNS IN THE UROLOGIC LITERATURE: A REVIEW FOR THE PRACTICING UROLOGIST
ANDREW L. ROSENBERG AND JOHN T. WEI
From the Robert Wood Johnson Clinical Scholars Program, Department of Anesthesiology and Critical Care Medicine, and Section of Urology, Department of Surgery, University of Michigan Medical Center; and Veterans Affairs Health Services Research and Development Program, Ann Arbor, Michigan
Reprint requests: John T. Wei, M.D., Section of Urology, Department of Surgery, University of Michigan Medical Center, 1500 East Medical Center Drive, Box 0330, Ann Arbor, MI 48109-0330
Submitted: August 4, 1999, accepted (with revisions): November 19, 1999
© 2000, ELSEVIER SCIENCE INC. ALL RIGHTS RESERVED
In the current era of evidence-based medicine, urologists are increasingly called on to apply a greater scientific rigor to their daily practice. To do so, they must remain current with the new advances being reported in published studies and at scientific meetings. Furthermore, given the increasing volume of published clinical research, urologists need to discern between reports providing a high level of evidence for a causal effect and those that do not. A necessary tool for interpreting these articles is an understanding of clinical research designs. Although a variety of study designs are used in clinical research, they all possess a common purpose: to explore the associations and outcomes for a given condition or intervention. Ultimately, the goal is to determine the strength of evidence for a causal effect provided by a study and how best to apply these results to patient care. This review begins with a discussion of several basic concepts necessary for understanding clinical study designs. We then review the essential components of the most common clinical study designs: the case report/series, cross-sectional, case-control, cohort, pre-post experimental, nonrandomized clinical, and randomized clinical trial. In particular, their strengths, weaknesses, and threats to validity are discussed using examples drawn from urologic published reports.
CONCEPTS BASIC TO THE UNDERSTANDING OF CLINICAL STUDIES
INFERENCE AND GENERALIZABILITY
At its core, clinical research is a process in which samples of patients are studied to derive conclusions about a larger population of patients with a given condition. For example, the often-referenced “Partin Nomograms” used to predict pathologic stage were based on a sample of 4133 men who underwent radical retropubic prostatectomy for prostate cancer.1 From the analyses of this sample of men, the investigators constructed nomograms that are applicable to most men undergoing a radical prostatectomy. This process is known as inference and is primarily based on statistical terms of probabilities and confidence limits, a discussion of which is beyond the scope of this review. A concept related to inference is generalizability; that is, whether the findings from one study are valid for other populations. A recent report on the diagnostic value of percent free prostate-specific antigen (PSA) concluded that in men with total serum PSA levels of less than 3 ng/mL, a proportion of free PSA greater than 18% confers a very low risk of prostate cancer.2 The findings from this Scandinavian study, although useful for clinical practices of predominantly white European men, may not be generalizable to a urologic practice consisting of predominantly African-American or Asian men. Additionally, the mere fact that this study had a large sample size (1748 men) does not confer greater generalizability. After all, having more men in that study would not change the study population with regard to their ethnicity or race. Critical readers of published reports should ask whether the patient sample used in a particular study is similar to their own practice.
LEVELS OF EVIDENCE
The goal of any clinical study is to establish a causal relationship between an exposure, therapy, or risk factor and a particular outcome. Guidelines for assessing the evidence for causality have been published3,4 (Table I). Of these criteria, the strengths of association, temporal relationships, and dose-response relationships are relevant to the evaluation of study designs.
A strong study design will provide a high level of evidence for causation by addressing all three of these criteria; weaker study designs may only consider one or two of these factors. Furthermore, randomized clinical trials generally provide the strongest evidence for causal relationships because they are affected less by bias and confounding than other study designs.5
UROLOGY 55: 468–476, 2000 • 0090-4295/00/$20.00 PII S0090-4295(99)00599-3
TABLE I. Guidelines for evaluating evidence for causal relationships
1. Strength of association: stronger associations are more likely to be causal
2. Temporal relationships: consistent with cause leading to an effect
3. Dose-response relationship: magnitude of effect increases with increasing exposure
4. Consistency with literature: supported by previously published data, including in different populations
5. Biologic plausibility: consistent with known basic biologic knowledge
6. Limited alternative hypotheses: few other viable explanations exist
ASSOCIATION, BIAS, AND CONFOUNDING
The aim of most clinical research is to determine whether an association between a risk factor or intervention and an outcome exists. A statistically significant association results when subjects have both the risk factor and the outcome more frequently than would be expected by chance alone, often expressed as a P value less than 0.05. A statistically significant association may be seen for a number of reasons. First, an association can be due to a true cause and effect. This type of causal association is the most important clinically, as it offers the best opportunity to alter the natural history of a disease. For example, in a classic study of occupational disease, researchers discovered that dye workers in English factories had a 10- to 50-fold increased risk of developing bladder cancer.6 When this association was subsequently shown to be due to the carcinogenic effects of aromatic amines on the transitional cell layer of the bladder,7 rational preventive interventions were possible to limit exposure to these compounds. However, a statistically significant association may also be due to chance alone or bias. Both chance and bias could explain the reported association between vasectomy and prostate cancer risk.8 In this example, the association may be due to chance alone, since both prostate cancer and vasectomy are very common; however, the more likely explanation may be an increased opportunity for those men who have had vasectomies to receive screening for prostate cancer as part of the continuing care from their urologist. This differential screening of one group is known as surveillance bias.
Indeed, subsequent investigations have found no evidence to support the theory that vasectomy increases the risk of prostate cancer.9,10 UROLOGY 55 (4), 2000
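The notion of an association occurring "more frequently than would be expected by chance alone" can be made concrete with a small calculation. The sketch below is not taken from this review: it assumes a hypothetical 2 x 2 table of counts and uses Fisher's exact test, one common choice that the text does not name, to obtain a two-sided P value. The function name `fisher_exact_two_sided` and every count are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Rows: risk factor present / absent; columns: outcome present / absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability that k subjects with the risk factor
        # have the outcome, given fixed row and column totals
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over every possible table that is at least as unlikely as the observed one
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, not data from any cited study
p_small = fisher_exact_two_sided(3, 1, 1, 3)      # 8 subjects
p_large = fisher_exact_two_sided(30, 10, 10, 30)  # same proportions, 80 subjects
print(f"P value with 8 subjects:  {p_small:.3f}")
print(f"P value with 80 subjects: {p_large:.6f}")
```

Under these assumed counts, the same proportions fall short of the conventional 0.05 threshold in the small sample and pass it in the larger one, which is why a P value describes the role of chance only and says nothing about bias or confounding.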
Bias is perhaps the greatest threat to the validity of a clinical study and also the most difficult for readers to detect. It results from systematic errors in the design and/or conduct of a study and may take many forms. Common forms of bias include selection bias (ascertainment bias), misclassification bias, measurement bias, recall bias, and publication bias. Selection bias (ascertainment bias) occurs when different or incomparable criteria are used to select study and control subjects. It is most often encountered when a selective or convenience sample rather than a true random sample is used. Misclassification bias occurs when incomparable methods are used for identifying outcomes, such as when comparing pathologic staging after radical prostatectomy with clinical staging for radiation therapy. Measurement bias occurs when incomparable methods are used to measure a variable. This is often encountered when the results of different laboratories are used in a study rather than results from a central laboratory. Recall bias occurs when individuals with a certain condition are more likely to remember events related to their condition. Publication bias occurs when journals preferentially publish “positive” findings more frequently than studies with too many “negative” results. Biases are difficult to deal with in clinical research; they must be identified and controlled for as much as possible before the initiation of the study. There are a couple of other types of bias unique to survival analyses that are common and merit watching out for. Lead-time bias is an apparent increase in survival due to earlier detection without any alteration of the natural history of the disease. Length-time bias is the “over-diagnosis” of less aggressive cases that would never have led to death. Failure to address all known biases will limit the validity and value of a research report. 
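Lead-time bias is easiest to see with simple arithmetic. The following minimal sketch assumes a single hypothetical patient whose age at death is the same whether or not the tumor is found by screening; the ages and the helper `survival_after_diagnosis` are illustrative assumptions, not data from any study cited here.

```python
def survival_after_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Years survived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient whose disease course is identical in both scenarios
AGE_AT_DEATH = 75.0
AGE_AT_SYMPTOMATIC_DIAGNOSIS = 70.0
AGE_AT_SCREEN_DETECTION = 66.0  # screening finds the same tumor 4 years earlier

without_screening = survival_after_diagnosis(AGE_AT_SYMPTOMATIC_DIAGNOSIS, AGE_AT_DEATH)
with_screening = survival_after_diagnosis(AGE_AT_SCREEN_DETECTION, AGE_AT_DEATH)
lead_time = AGE_AT_SYMPTOMATIC_DIAGNOSIS - AGE_AT_SCREEN_DETECTION

print(f"Survival after diagnosis without screening: {without_screening:.0f} years")
print(f"Survival after diagnosis with screening:    {with_screening:.0f} years")
print(f"Apparent gain explained by lead time alone: {lead_time:.0f} years")
```

In this assumed scenario the measured survival lengthens by exactly the lead time even though the patient dies at the same age, so comparing survival from the date of diagnosis cannot by itself show that earlier detection changed the natural history.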
In addition to bias, a true association can be affected by an unmeasured variable that is associated with the outcome of interest. This unmeasured variable is often referred to as a confounder. Confounding occurs when a differential distribution of unmeasured variables exists among the study and control groups. Along with the known, measured variables, these confounding variables are also associated with the outcome and are linked with the risk factor of interest. They can cause associations to exist when in reality there are none and vice versa. For example, there are data to suggest that the incidence of testicular cancer is increased among subfertile men.11 However, such studies do not consider that the diagnosis of subfertility often brings a man into the urologist’s office for a testicular examination. This increases the rate of testicular cancer screening among subfertile men and increases their chance of having a tumor detected.
FIGURE 1. Sample selection, strengths, and weaknesses for the commonly encountered types of clinical study design.
Testicular screening, then, may be a potential confounding variable. Confounding may be addressed both in the design of the study and in the analysis. Matching, a common method used to reduce confounding in observational studies, involves selecting samples for the study and control group that are as similar as possible with regard to one or more potentially confounding variables. The most common adjustment for confounding is in the analysis of data by using multivariable statistical modeling. The ability to identify bias and confounding will reduce systematic and nonsystematic errors within the data, improve the validity of studies, and strengthen the evidence for causality.
STUDY DESIGN TYPES
There are two general types of clinical study designs: observational (descriptive) and experimental (interventional) (Fig. 1). In an observational study, the investigator does not interfere with the natural history of the disease but rather measures variables of interest and the outcome. Conversely, in an experimental study, the investigator performs an intervention and subsequently measures its effects on the natural history of the disease. Observational studies can be either prospective or
retrospective.12 In a prospective study, the investigator specifies the variables of interest before the occurrence of the outcome. In a retrospective study, the investigator measures the variables of interest only after the outcome has occurred. Figure 2 compares the sampling strategies and outcome measurements among the commonly encountered observational study designs. Prospective designs can reduce certain biases, especially selection, misclassification, recall, and measurement biases. Observational studies are commonly performed to identify potential variables of interest and/or to estimate their frequency of occurrence within a patient population. The major strengths of observational studies are their lower costs, convenience, and shorter study periods, especially when using previously collected data. The major weaknesses are related to bias and confounding. Experimental studies are performed to establish the effects of interventions or therapies, usually after the initial descriptive studies have been performed. These studies, such as randomized clinical trials, are generally more complex, and tend to consume significantly more time and resources than the other types of studies. However, they minimize bias and confounding and are frequently needed to convince clinicians to adopt a new therapy or change their current practice. Experimental study designs also differ from observational study designs in that the causal element (an intervention) is usually controlled by the investigator. As a result, experimental studies offer the strongest level of evidence toward causality.
FIGURE 2. Time-course, sampling methods, and outcome measurements for observational study designs.
OBSERVATIONAL STUDY DESIGNS
CASE REPORTS AND CASE SERIES
Case reports and case series are the most common and simplest form of observational study. These two study designs evaluate a single case (a report) or a succession (a series) of similar cases. The principal aspect of the case report or series is that they only provide information about those individuals with the disease or outcome of interest. There are no control groups per se, although some investigators will refer to historic controls (previously published works) as a comparison group. Case series can either be retrospective or prospective and typically are from a single institution. In urology, case series are often used to describe an investigator’s experience with a new surgical technique or procedure. For example, Koppie et al.13 reported on the efficacy of cryosurgery in 176 men
with prostate cancer who underwent the procedure. Outcomes were assessed using serum PSA levels, and the actuarial biochemical recurrence-free survival was determined. The investigators found that an undetectable PSA nadir and a low preoperative PSA level were both associated with a more favorable outcome. Although this study suggests an effect of cryosurgery on some prostate cancers, the magnitude of the benefit cannot be determined without a control group. Furthermore, selection bias occurred because 61% of the men had clinical Stage T3 tumor or greater, which may in part explain the high failure rate (51% at 3 years); therefore, it would be inappropriate to compare these results with the results reported in most radical prostatectomy series.13 The main advantage of case series is the ease with which one can collect data, especially when analyzing rare events or disorders. Their major weakness is the lack of a comparison group for developing measures of risks, and as a result, the potential for selection bias is increased. This limits the generalizability of a study’s findings if the study group is quite different from the patients cared for by other urologists. Nevertheless, case reports and series are
helpful in generating hypotheses regarding new clinical ideas or exploring rare but important clinical issues. Many journals, including Urology, have a section dedicated to reports of this format.
CROSS-SECTIONAL STUDIES
Cross-sectional studies evaluate both the risk factors and the disease outcomes simultaneously for all individuals in a sample population and are analogous to viewing a snapshot of the population. This study design is useful for generating hypotheses and is sometimes used as a starting point for setting up cohort studies. In conducting a cross-sectional study, it is essential that the target population be well characterized so that the study conclusions can be generalized to the larger population from which the sample is drawn. It is also important that the sampling plan (ie, the formal process used to select subjects) be clearly stated. If the sampling plan minimizes the possibility of differential data collection, the potential for biases is less and the validity of the study’s conclusion strengthened. Since both risk factors and outcomes are assessed at the same time in a population, it is impossible to establish causal relationships definitively. In a recent report, Stein et al.14 assessed the urinary levels of norepinephrine in a group of patients, of whom some were diagnosed with interstitial cystitis (IC) and others were not. This cross-sectional study found that urinary norepinephrine levels were significantly higher in patients with IC than in the rest of the subjects studied; however, causality cannot be proved, since one does not know which came first, the elevated norepinephrine levels or the IC. Nevertheless, these findings can be helpful in generating hypotheses that can then be tested in additional clinical studies. Bias is another limitation inherent in many cross-sectional studies.
In the previously described study, patients with IC were identified on the basis of the National Institutes of Health criteria.15 This set of criteria was not intended to be inclusive of all patients with IC but rather to establish a homogeneous population suitable for clinical trials. By using these criteria, the investigators have systematically excluded certain patients with IC who have fewer symptoms, and this selection bias may have affected their results. The reader should consider the possibility that those patients with IC who do not fulfill the National Institutes of Health criteria may have low or normal levels of norepinephrine.
CASE-CONTROL STUDIES
Case-control studies are a powerful and commonly used study design in clinical research. The primary objective is to determine the strength of an
association between a risk factor and an outcome. By definition, case-control studies are retrospective because they start by identifying patients with a particular condition or outcome (cases). Comparisons can then be made with a group of similar patients who do not have the condition or outcome of interest (controls). It is important that the control group be sampled from the same underlying population as the cases (Fig. 2). This is done to ensure that both the cases and controls have a similar exposure to the risk factors of interest. The case-control study design is particularly useful when the disease or outcome being studied occurs infrequently relative to the risk factor of interest. For extremely rare disorders, this is the most practical study design to investigate relationships with statistical power. The main measure of association in case-control studies is the odds ratio. The odds ratio measures the strength of an association between the risk factor of interest and the disease or outcome. Odds ratios can range from zero to infinity, and extreme values (approaching zero or very large) indicate stronger negative or positive associations. Values close to 1.0 indicate weak relationships. For example, Jacobsen et al.16 evaluated the association between a digital rectal examination (DRE) and prostate cancer mortality. In this case-control study, cases were identified from the residents of Olmsted County who had prostate cancer listed on their death certificates. The control subjects were selected based on residence in Olmsted County, age, and duration of having a medical record in the community. The investigators found that men who died of prostate cancer were less likely to have undergone a DRE (odds ratio 0.51). In other words, screening with a DRE might have prevented 50% of the deaths due to prostate cancer. This is rather strong single study evidence for a relationship between a screening DRE and prostate cancer mortality. 
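As an illustration of how an odds ratio is obtained from case-control counts, the sketch below computes the ratio and an approximate 95% confidence interval. The counts are hypothetical and are not the Olmsted County data; the function name `odds_ratio` and the use of the Woolf (logit) method for the interval are assumptions made for this example, since the review does not specify how intervals were calculated.

```python
from math import exp, log, sqrt


def odds_ratio(cases_exposed: int, cases_unexposed: int,
               controls_exposed: int, controls_unexposed: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Return (odds ratio, lower limit, upper limit).

    The limits use the Woolf (logit) approximation; z = 1.96 gives a 95% interval.
    """
    estimate = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
    # Standard error of the log odds ratio
    se = sqrt(1 / cases_exposed + 1 / cases_unexposed
              + 1 / controls_exposed + 1 / controls_unexposed)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)


# Hypothetical counts: "exposed" means the subject had undergone a screening DRE
estimate, lower, upper = odds_ratio(
    cases_exposed=25, cases_unexposed=75,       # 100 cases
    controls_exposed=40, controls_unexposed=60  # 100 controls
)
print(f"Odds ratio {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these assumed counts the odds ratio is 0.5 and the interval excludes 1.0, which corresponds to a negative association between the exposure and being a case. A wider interval that included 1.0 would be compatible with no association at all.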
However, temporal relationships cannot be established and causal associations cannot be assessed using this study design. Similar to case series, case-control studies are also subject to bias, particularly sampling and recall bias. In the Olmsted County study above, the subjects in the control group were selected from a database of residents in the county. If this list were outdated and many men who had undergone prescreening had since moved into the county, they would have been systematically excluded from the study. Jacobsen et al.16 addressed this potential bias in the design by requiring subjects to be matched by “duration of medical record” so that contemporaneous samples would be selected. A similar situation occurs with survival bias when dealing with more lethal diseases. In this case, eligible patients (cases) may die before being
enrolled in the study, which would result in a systematic oversampling of healthier individuals as cases. Perhaps the most important bias to be aware of in case-control studies is recall bias. For example, in the Olmsted County study, patients who died of prostate cancer may have been more likely to have had their DREs recorded in the medical chart compared with control subjects who did not have prostate cancer. Although patient “recall” was not involved in this study, the recording bias in the medical record is analogous to recall bias. In another study, reported by Gann et al.,17 a case-control design was used to evaluate the ability of serum PSA to detect prostate cancer. Cases were men participating in the Physicians’ Health Study who were diagnosed with prostate cancer during their follow-up and had serum samples stored. Controls were age-matched individuals from the same study who did not have the diagnosis of prostate cancer reported. As this was a nested case-control study (a case-control study with subjects selected from a prospective cohort), relative risk estimates were obtained rather than odds ratios. These investigators reported a significant association between a single PSA measurement and the subsequent diagnosis of prostate cancer within 4 years. The magnitude of the association increased with increasing PSA levels. This study demonstrates the potentially powerful and valid findings that can be obtained in a case-control design.
COHORT STUDIES
Cohort studies begin by defining a sample of subjects and identifying the subsample who have and have not been exposed to the risk factor or variable of interest (Fig. 2). Importantly, individuals from this “cohort” must not already have the disease or outcome of interest. This cohort is then followed up for a predetermined period, during which time the outcomes are recorded as they occur. The main advantage of this study design is the ability to relate risk factors to the outcome using temporal relationships.
This allows one to test hypotheses on the basis of biologic plausibility and causality. Cohort studies also provide stronger evidence for associations by allowing multiple outcomes related to particular exposures to be studied. The main measure of association in cohort studies is the relative risk (RR), that is, the ratio of the risk of disease in exposed subjects to the risk of disease in unexposed subjects. If the RR is equal to 1, no increased risk is attributable to the factor of interest. However, an RR greater than 1 is evidence of a positive association, which may or may not be causal. An RR less than 1 provides evidence of a negative association or a protective effect. True cohort studies used to investigate urologic conditions are rare. In fact, in urologic reports,
most studies identified as involving a cohort design are actually case series. However, existing cohorts of people from other types of research are occasionally used to study a urologic condition. An example is the study relating the effects of total fluid intake and the subsequent risk of bladder cancer.18 The original cohort was the Health Professionals Follow-up Study initiated in 1986 to study the effects of dietary exposures on subsequent disease. The sample population was drawn from health professionals aged 40 to 75 who denied any history of malignancy. This cohort was then followed up using biennial surveys to update their diet and medical history. After adjusting for age, it was determined that total fluid intake was associated with a decreased risk of bladder cancer. A dose-response relationship was also observed; the highest quintile of total fluid intake (greater than 2531 mL/day) had the lowest risk of bladder cancer (RR = 0.51). In other words, those individuals who had a total fluid intake greater than 2531 mL/day had about one half the risk of bladder cancer compared with those in the lowest quintile (less than 1290 mL/day). The ability to evaluate multiple exposures is a strength of cohort studies. In the last study multiple aspects of dietary exposure were being evaluated, only one of which was total fluid intake. As data are prospectively collected, recall and selection bias are minimized and measurement of exposures that are difficult for patients to remember such as fluid intake can be assessed with accuracy. A weakness of cohort studies is that they are inefficient in studying rare diseases or outcomes. Because cohort studies start by looking at exposures, if a disease or outcome occurs rarely or takes a long time to develop, very large groups and long periods of follow-up will be necessary. Perhaps the greatest threat to the validity of a cohort study occurs when significant follow-up losses occur.
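A minimal sketch of the relative risk calculation, and of how uneven follow-up losses can shift it, is given here. All counts are hypothetical: they were chosen only so that the complete-follow-up result equals 0.51, the value quoted above, and they are not the data of the fluid intake study. The function name `relative_risk` and the loss scenario are assumptions for illustration.

```python
def relative_risk(cases_exposed: int, n_exposed: int,
                  cases_unexposed: int, n_unexposed: int) -> float:
    """Risk of disease in exposed subjects divided by risk in unexposed subjects."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


# Hypothetical cohort with complete follow-up:
# 51 cases among 10,000 exposed subjects, 100 cases among 10,000 unexposed subjects
rr_complete = relative_risk(51, 10_000, 100, 10_000)

# Same cohort, but 3,000 unexposed subjects are lost to follow-up, including
# 50 who would have developed the disease (an assumed, uneven pattern of loss)
rr_with_losses = relative_risk(51, 10_000, 50, 7_000)

print(f"RR with complete follow-up:       {rr_complete:.2f}")
print(f"RR after uneven follow-up losses: {rr_with_losses:.2f}")
```

In this assumed scenario the estimate moves from 0.51 toward 1.0 purely because subjects destined to develop the outcome were lost from one group more often than from the other; no change in the underlying risks is involved.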
Losses to follow-up often occur when subjects die from other causes, move away, or otherwise become too ill to continue participation. If large groups of patients or important subgroups are lost to follow-up, measures of associations may be erroneous. The careful reader should check what percentage of patients originally enrolled were followed up through to the end of the study.
EXPERIMENTAL STUDY DESIGNS
PRE-POST EXPERIMENTAL STUDIES
Pre-post experimental studies provide observations on a single group of subjects both before and after an intervention (Fig. 3). In this type of study, patients have baseline measurements taken for a particular variable (eg, flow rate or symptom score) before an intervention (eg, transurethral resection of the prostate or medical therapy). After the intervention, the measurements are repeated on all the subjects and compared with each individual’s own baseline value. The same subject is used as his own control, minimizing problems of confounding. However, associations between the intervention and outcome are fairly speculative in the absence of a separate control group. That is, the outcome may have occurred independently of the intervention, as a result of the natural history of the disease.
FIGURE 3. Time-course, sampling methods, and outcome measurements for experimental study designs.
Hematuria secondary to prostatic bleeding is a common urologic diagnosis, and on occasion it may be difficult to manage. Several investigators have suggested that finasteride may be effective in treating hematuria from the prostate.19,20 Both of these studies used a pre-post study design in which the hematuria was graded before and after therapy with finasteride. One potential threat to the validity of these studies arises if patients were selected on the basis of extreme measurements (ie, if men with greater amounts of bleeding were more likely to be included in the study). A phenomenon known as “regression to the mean” is the greatest threat to pre-post experimental studies. It results from the statistical likelihood that the next measurement after an extreme value will be closer to
the overall mean of the population. This phenomenon occasionally explains most, if not all, of the observed differences between the before and after treatment measurements. The best way to address this problem is to perform multiple pretreatment measurements on each subject. The average pretreatment value is then used as the baseline measurement for comparison. By itself, a pre-post experimental study can often suggest causality but rarely provides sufficient evidence to warrant a change in clinical practice. The phenomenon of “regression to the mean” is well illustrated in another study that evaluated urinary symptom scores and quality of life after radical prostatectomy.21 The premise of the study was that many men who undergo radical prostatectomy also have significant voiding symptoms that may be alleviated after surgery, as the prostate gland is removed. In their analysis, the investigators found that those subjects with moderate to severe lower urinary tract symptoms had a significant improvement in their total symptoms and quality of life. However, in those subjects who had only mild urinary symptoms, no differences were detected. Indeed, the total symptom score in this group increased by 35% rather than decreased. Without the use of a proper control, the effect of the radical prostatectomy on symptoms and quality of life cannot be definitively evaluated.
TABLE II. An organized approach to reading clinical urologic published reports
1. Were the study conclusions
   Clinically interesting?
   Important?
2. Were the results valid?
   Do the results address the research question?
   Was the proper study design used?
   Were sample data collected and analyzed correctly?
   Did the study include a control group?
   How complete was the follow-up?
   Were the results biologically plausible?
3. What were the results?
   Were all the clinically important variables considered?
   What was the magnitude of the effect?
   Was statistical significance addressed (P values/confidence intervals and sample size/power)?
   Was clinical significance considered?
4. Generalizability
   Was the study population similar or relevant to your practice?
NONRANDOMIZED CLINICAL TRIALS
Nonrandomized clinical trials address many of the problems inherent in the pre-post experimental designs by concurrently evaluating an intervention in an experimental and a control group of subjects (Fig. 3). When the study and control groups are appropriately matched for known confounders, the strength of evidence provided by this study design can be quite high. Unfortunately, it is often difficult to match two groups for all known important variables. In the absence of randomization, the investigators may select patients for each group by overtly or subtly using different criteria. This leads to selection bias, especially if unknown factors turn out to be important determinants of how subjects respond to the intervention. The inability to control for these confounders is the greatest threat to nonrandomized clinical trials.
RANDOMIZED CLINICAL TRIALS
The strongest evidence for a causal effect is offered by randomized clinical trials (RCTs). Frequently, an RCT will be used to evaluate the effect of a new drug, new procedure, or new intervention on an outcome of interest. Because of expense and time, they are usually preceded by observational studies that help to identify the likely risk factors that affect the outcomes. Figure 3 shows the general process of RCTs. Randomization ensures that confounding factors are equally distributed in both the study and control groups; however, since RCTs often use narrow inclusion and exclusion criteria,
the study findings may have limited generalizability. In RCTs, the investigator identifies a group of eligible subjects who are then randomized into an intervention and a control group. McConnell et al.22 recently published the results of an RCT evaluating the efficacy of finasteride for the prevention of urinary retention. In this trial, they randomized men to receive either 5 mg of finasteride or a placebo. By randomization, confounding is reduced, since both study groups should have nearly the same distribution for all confounding factors, even the unmeasured ones; that is, if the randomization was successful. One should always note whether baseline characteristics were equally distributed between the study groups. Although randomization maximizes the comparability of study groups at the beginning of a clinical trial, it does not ensure that the groups will remain similar throughout a study. Patients sometimes cross from one study group to the other. This is most often seen in trials comparing medical and surgical therapies in which the subjects know which study group they are in. In the finasteride study, this was avoided by ensuring that both the patients and the researchers were unaware of which treatment the subjects were receiving (double-blind placebo). Alternatively, patients may just drop out of the trial. This can be expected to happen to some degree in all clinical trials, but the dropout rate will vary directly with the duration of follow-up. During the 4-year follow-up in the finasteride study, 34% of the men dropped out of the intervention group, and 42% dropped out in the placebo group. This difference was statistically significant. Since subjects who are lost to follow-up or drop out of a trial are more likely to have been adversely affected by the intervention, it becomes critical that the primary outcome be measured among the dropouts as well. The analysis should be conducted according to the randomization plan (intention-to-treat). 
An intention-to-treat analysis uses the outcomes from individual subjects based on the group to which they were originally randomized. Although biases exist in an intention-to-treat analysis, they tend to reduce the differences found between the study and control groups. Therefore, an intention-to-treat analysis is more conservative and is the preferred analysis strategy.23 For greater detail regarding the proper conduct and analysis of RCTs, the reader should refer to works by Pocock and others.24–26
CONCLUSIONS
Study designs range from the simple case report to the costly RCT. When reading clinical reports, the urologist must consider the strength of evidence provided by a particular study design when analyzing the study results and conclusions. Observational study designs provide weak to moderate levels of evidence for causality, and RCTs generally provide the strongest level of evidence. The reader should always consider whether an appropriate study design has been used and how bias and confounding are addressed in a study. Ultimately, urologists must determine whether a study is valid and generalizable to their own clinical practice. To aid the practicing urologist in the assimilation of the urologic literature, we have outlined our approach to interpreting clinical reports (Table II).
REFERENCES
1. Partin AW, Kattan MW, Subong EN, et al: Combination of prostate-specific antigen, clinical stage, and Gleason score to predict pathological stage of localized prostate cancer: a multi-institutional update. JAMA 277: 1445–1451, 1997.
2. Tornblom M, Norming U, Adolfsson J, et al: Diagnostic value of percent free prostate-specific antigen: retrospective analysis of a population-based screening study with emphasis on men with PSA levels less than 3.0 ng/mL. Urology 53: 945–950, 1999.
3. Gordis L: Epidemiology. Philadelphia, WB Saunders, 1996, pp 170–182.
4. Gordis L, Kleinman JC, Klerman LV, et al: Criteria for evaluating evidence regarding the effectiveness of prenatal interventions, in Merkatz IR, and Thompson JE (Eds): New Perspectives on Prenatal Care. New York, Elsevier, 1990, pp 31–38.
5. Bailar JC: Introduction to clinical trials, in Shapiro SH, and Louis TA (Eds): Clinical Trials: Issues and Approaches. New York, Marcel Dekker, 1983, pp 1–11.
6. Case RAM, and Hosker ME: Tumors of the urinary bladder in workmen engaged in the manufacture and use of certain dyestuff intermediates in the British chemical industry. Br J Indust Med 11: 75–104, 1954.
7. Cohen SM: Urinary bladder carcinogenesis. Toxicol Pathol 26: 121–127, 1998.
8. Rosenberg L, Palmer JR, Zauber AG, et al: The relation of vasectomy to the risk of cancer. Am J Epidemiol 140: 431–438, 1994.
9. Lesko SM, Louik C, Vezina R, et al: Vasectomy and prostate cancer. J Urol 161: 1848–1852, 1999.
10. Zhu K, Stanford JL, Daling JR, et al: Vasectomy and prostate cancer: a case-control study in a health maintenance organization. Am J Epidemiol 144: 717–722, 1996.
11. Moller H, and Skakkebaek NE: Risk of testicular cancer in subfertile men: case-control study. Br Med J 318: 559–562, 1999.
12. Peipert JF, and Phipps MG: Observational studies. Clin Obstet Gynecol 41: 235–244, 1998.
13. Koppie TM, Shinohara K, Grossfeld GD, et al: The efficacy of cryosurgical ablation of prostate cancer: the University of California, San Francisco experience. J Urol 162: 427–432, 1999.
14. Stein PC, Torri A, and Parsons CL: Elevated urinary norepinephrine in interstitial cystitis. Urology 53: 1140–1143, 1999.
15. Nyberg LM: Advances in the diagnosis and management of interstitial cystitis, in Stephen R (Ed): Urology Annual. Norwalk, Connecticut, Appleton & Lange, 1991, pp 181–191.
16. Jacobsen SJ, Bergstralh EJ, Katusic SK, et al: Screening digital rectal examination and prostate cancer mortality: a population-based case-control study. Urology 52: 173–179, 1998.
17. Gann PH, Hennekens CH, and Stampfer MJ: A prospective evaluation of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 273: 289–294, 1995.
18. Michaud DS, Spiegelman D, Clinton SK, et al: Fluid intake and the risk of bladder cancer in men. N Engl J Med 340: 1390–1397, 1999.
19. Miller MI, and Puchner PJ: Effects of finasteride on hematuria associated with benign prostatic hyperplasia: long-term follow-up. Urology 51: 237–240, 1998.
20. Sieber PR, Rommel FM, Huffnagle HW, et al: The treatment of gross hematuria secondary to prostatic bleeding with finasteride. J Urol 159: 1232–1233, 1998.
21. Schwartz EJ, and Lepor H: Radical retropubic prostatectomy reduces symptom scores and improves quality of life in men with moderate and severe lower urinary tract symptoms. J Urol 161: 1185–1188, 1999.
22. McConnell JD, Bruskewitz R, Walsh P, et al, for the Finasteride Long-Term Efficacy and Safety Study Group: The effect of finasteride on the risk of acute urinary retention and the need for surgical treatment among men with benign prostatic hyperplasia. N Engl J Med 338: 557–563, 1998.
23. Peduzzi P, Wittes J, Detre K, et al: Analysis as randomized and the problem of non-adherence: an example from the Veterans Affairs Randomized Trial of Coronary Artery Bypass Surgery. Stat Med 12: 1185–1195, 1993.
24. Pocock SJ: Clinical Trials: A Practical Approach. New York, John Wiley & Sons, 1984.
25. Blumenstein BA: The randomized clinical trial, in Oesterling JE, and Richie JP (Eds): Urologic Oncology. Philadelphia, WB Saunders, 1997, pp 792–800.
26. Friberg S, Taube A, Sylvester R, et al: Analysis and presentation. Urology 49(suppl 4A): 54–65, 1997.
UROLOGY 55 (4), 2000