Observational Epidemiology
Jennifer L Kelsey, Stanford University School of Medicine, Stanford, CA, USA
Ellen B Gold, University of California Davis School of Medicine, Davis, CA, USA
© 2017 Elsevier Inc. All rights reserved.
Introduction In observational epidemiologic studies, an investigator observes what is occurring in a study population without intervening. In an observational study addressing the question of whether physical activity protects against coronary heart disease, for example, an investigator might note at the beginning of a study, and at intervals throughout the study, the physical activity level of each study participant and relate this to her/his development of coronary heart disease over time. This contrasts with an experimental (or intervention) study, in which the investigator assigns study participants, preferably at random, either to be exposed or not to be exposed to a particular agent or activity. In an experimental study, an investigator would randomly assign some people to a physical activity enhancement program and others not to be in this program, and then follow them over time to determine the frequency of the development of coronary heart disease in each group. Because of the randomization process, on average those assigned to the physical activity program will otherwise be similar to those assigned to the comparison group. In observational studies, however, physically active individuals might tend to differ in many other ways from inactive individuals. Thus, observational studies, which are the subject of this article, present special challenges when used for learning about disease causation, prevention, and therapy as well as for other purposes.
Some Uses of Observational Epidemiology One use of observational studies is to determine the frequency of occurrence, impact, prognosis, and other characteristics of diseases or other conditions in populations or in selected subgroups of the population. Such data are useful in setting priorities for investigation and control, in deciding where preventive efforts should be focused, and in determining what type of treatment facilities are needed. Observational studies can be used to learn about the natural history, clinical course, and pathogenesis of diseases. Observational studies may also be used to evaluate the effectiveness of therapeutic procedures or new modes of health-care delivery, although randomized trials are usually the preferred method of study for such evaluations. Most frequently, observational epidemiologic studies are used to learn about disease causation and prevention, and these applications of epidemiology will be emphasized in this article.
Descriptive Studies Descriptive studies generally provide information about the distribution of diseases and other characteristics without testing specific hypotheses about causation. Often such studies are used
International Encyclopedia of Public Health, 2nd edition, Volume 5
to generate hypotheses about causation or to examine the consistency of the distribution of a possible causative agent and a disease. Many descriptive studies provide information on patterns of disease occurrence in populations according to such attributes as age, sex, race/ethnicity, marital status, social class, occupation, geographic area, and time of occurrence. Routinely collected data from such sources as cancer registries, health insurance plans, death certificates, hospital discharge records, other medical records, and general health surveys are generally used for descriptive studies. This information can be used to indicate the magnitude of a problem or to suggest preliminary hypotheses about disease causation. For instance, Dantes et al. (2013) reported estimates of the number of health care-associated and community-onset invasive methicillin-resistant Staphylococcus aureus (MRSA) infections in the United States during 2005–11. Cases were identified through the Emerging Infections Program-Active Bacterial Core (EIP-ABCs) surveillance system of the U.S. Centers for Disease Control and Prevention. Throughout this period, invasive MRSA infections were most frequent among patients with onset outside of acute care hospitals but who had recent or ongoing exposure to health care services, including recent discharge from acute care hospitals. In 2011, for the first time, fewer infections occurred among patients during hospitalization than among persons in the community who had not had recent health care exposures (Table 1). These results suggest that identification and testing of interventions outside of acute care settings are needed, especially among those with recent hospitalizations.

Table 1 Estimated numbers of new cases of invasive methicillin-resistant Staphylococcus aureus infections by epidemiologic category, United States, 2005 and 2011

Epidemiologic category                     2005    2011
Health care-associated community-onset     3463    2912
Community-associated                        966    1010
Hospital onset                             1601     868
Overall                                    6134    4872

Adapted from Dantes, R., Mu, V., Belflower, R., et al., 2013. National burden of invasive methicillin-resistant Staphylococcus aureus infections, United States, 2011. J. Am. Med. Assoc. Intern. Med. 173, 1970–1978.

Navar-Boggan et al. (2014), in another descriptive study, used data from the U.S. National Health and Nutrition Examination Survey to estimate the percentage of U.S. adults considered to be treatment-eligible for hypertension under guidelines released in 2014 by the Eighth Joint National Committee (JNC 8) compared to the percentage under the guidelines of 2003 from the Seventh Joint National Committee (JNC 7) on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. The 2014 guidelines relaxed blood pressure targets for those aged 60 years and older and those with diabetes or chronic kidney disease. Classification according to the 2014 guidelines was associated with a reduction in the proportion of adults
http://dx.doi.org/10.1016/B978-0-12-803678-5.00310-6
recommended for hypertension treatment, especially among older adults (68.9% to 61.2% among those aged 60 years and older) and a substantial increase in the proportion of adults with treatment-eligible hypertension considered to have achieved blood pressure goals (40.0% to 65.8% among those aged 60 years and older). This study demonstrates the need for more research on the long-term implications of the two sets of guidelines, especially among older adults. In another example, rates of diabetes-related complications (including acute myocardial infarction, death from hyperglycemic crisis, stroke, amputations, and end-stage renal disease) in the United States declined markedly between 1990 and 2010. The steep decline in these complication rates was hypothesized to have resulted from improved preventive care for adults with diabetes, although a large burden of complications remains because of the high prevalence of diabetes (Gregg et al., 2014). The hypothesis about improved preventive care for adults with diabetes requires testing in an appropriately designed hypothesis-testing study. Other types of descriptive studies are case reports and case series. In a case report, characteristics of one person or a very small number of people with a certain disease are described. Rarely can just one case provide definitive, generalizable information, but occasionally observation of one case can lead to important new avenues of research. For instance, the recent use of ‘unbiased next-generation sequencing’ of the cerebrospinal fluid of a 14-year-old boy with severe neurologic symptoms revealed leptospirosis that had eluded conventional tests for months. Appropriate antimicrobial agents were immediately administered, and the infection was eradicated within days (Wilson et al., 2014). It is hoped that this technology can eventually be applied more broadly to the identification of pathogens, but much more work is needed before this occurs. 
In a case series, characteristics of several cases with a given disease are noted. A case series, even if large, also generally does not provide definitive information because a comparison group without the disease is needed to determine if the proportion with the characteristic(s) of interest is in excess of what would be expected in similar non-diseased people. In a case series from an outbreak of severe viral encephalitis associated with Nipah virus in Malaysia (Goh et al., 2000), some 93% of 94 cases reported direct contact with pigs, usually in the 2 weeks before the onset of the illness. This high percentage is strongly suggestive of transmission from pigs to humans, but a properly designed case-control study (to be described in the next section) would be needed to show whether the 93% exceeds what would be expected in similar non-diseased people. In another study, Roux et al. (2014) reported on five children who developed a polio-like syndrome with lower motor neuron injury in California from August 2012 to July 2013. All five had been previously vaccinated against the poliovirus. These cases suggest the possibility of an emerging infection, but further study is needed to begin to learn about causation. Thus, case reports, case series, and other descriptive studies are usually useful mainly in providing leads for future studies designed specifically to test hypotheses. Such hypothesis-testing studies are often referred to as analytic epidemiologic studies.
Analytic Studies Analytic epidemiologic studies are designed to test causal hypotheses that have been generated from descriptive epidemiology, clinical observations, laboratory studies, and other sources, including analytic studies undertaken for other purposes. Analytic studies seek to determine why a disease is distributed in the population in the way that it is. Because analytic studies often necessitate the collection of new data, they tend to be more expensive than descriptive studies, but, if properly designed and executed, generally allow more definitive conclusions to be reached. Common types of analytic observational studies to be described below include case-control, cohort, and cross-sectional studies, as well as some hybrid designs. Ecologic studies will also be briefly described. Although these analytic study designs usually provide more definitive information than descriptive studies, in some instances experimental studies may be needed to demonstrate causality, though experimental studies are ethically appropriate only for potentially beneficial interventions. In presenting examples of study designs in this article, certain statistics will be used with which the reader may not be familiar. These will be described in more detail later under Measures of Association, but, briefly, the terms relative risk, odds ratio, and hazard ratio all roughly provide an idea of how much more likely one group (usually the exposed group) is to develop or die from a disease than a comparison group (usually the unexposed group). The standardized mortality ratio is the ratio of the observed number of deaths in a subgroup of the population to the number of deaths that would be expected on the basis of the age- and sex- (for instance) distribution of a reference population. 
Relative risks, odds ratios, hazard ratios, and standardized mortality ratios greater than 1.0 indicate a positive association between an exposure and a disease, those less than 1.0 indicate a negative association, and those of around 1.0 indicate no association. In addition, 95% confidence intervals are generally presented along with the estimates to provide an idea of the range of plausible values of these estimated parameters.
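The definition of the standardized mortality ratio given above can be made concrete with a short sketch of indirect standardization. The age strata, person-years, reference rates, and observed death count below are hypothetical values chosen only for illustration, and the confidence interval uses a simple log-scale approximation that is an assumption of this sketch rather than a method taken from any study cited in this article.

```python
import math

def expected_deaths(person_years, reference_rates):
    """Expected deaths if the cohort experienced the reference population's
    stratum-specific death rates (per person-year)."""
    return sum(person_years[s] * reference_rates[s] for s in person_years)

def smr_with_ci(observed, expected, z=1.96):
    """SMR = observed / expected, with an approximate 95% CI on the log scale
    (standard error of log SMR taken as 1 / sqrt(observed))."""
    smr = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    return smr, smr * math.exp(-z * se_log), smr * math.exp(z * se_log)

# Hypothetical data: person-years of follow-up in the study cohort and
# death rates in the reference population, by age group.
person_years = {"40-49": 12000, "50-59": 9000, "60-69": 5000}
reference_rates = {"40-49": 0.002, "50-59": 0.005, "60-69": 0.012}
observed = 160  # hypothetical number of deaths observed in the cohort

expected = expected_deaths(person_years, reference_rates)
smr, lower, upper = smr_with_ci(observed, expected)
print(f"expected = {expected:.1f}, SMR = {smr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An SMR above 1.0 with a confidence interval that excludes 1.0 would be read, under these assumptions, as more deaths in the cohort than the reference rates predict.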
Case-Control Studies These are studies in which the investigator selects persons with a given disease (the cases) and similar persons without the given disease (the controls). Certain characteristics or past exposures to possible risk factors (e.g., cigarette smoking) in cases and controls are then determined and their proportion in each group compared. Cases are typically people seeking medical care for the disease. Usually only newly diagnosed cases are included to be more certain that the risk factor or exposure preceded the disease rather than being a consequence of the disease. A useful working concept of a control group was given by Miettinen (1985): The controls should be selected in an unbiased manner from those individuals who would have been included in the case group if they had developed the disease under study. Thus, the optimal control group depends on the source of the cases, but the relative costs of obtaining the various types of controls and the resources available to the investigator are also taken into account in selecting control groups. Sometimes controls are matched to cases either
individually or as a group on certain important characteristics, such as age and sex (see section Confounding). Matching on age and sex (for instance) means that the age and sex distributions of controls will be more closely aligned with those of the cases than would have occurred if controls had been selected without matching. Matching is preferred mainly when a relatively small amount of overlap exists between cases and controls on important variables, such as age and sex. Otherwise, these other characteristics can be taken into account in the statistical analysis. If the cases consist of all people developing the disease of interest within a defined population, then the best single control group would usually be a random sample of individuals from the same source population who have not developed the disease. In the United States and a few other countries, for the past several decades controls have frequently been found using random-digit dialing of landline phones (Waksberg, 1978) to identify members of the same community from which the cases arose, but the proportion of the target population responding may be unacceptably low. Response rates have been diminishing, and the proportion of the population with landline phones has also been decreasing, especially among young people, raising questions about how well participants identified through random-digit dialing represent their source population. Therefore, investigators have been exploring supplementing traditional landline random-digit dialing with random-digit dialing using cell-phone numbers and with address-based sampling (described in Clagett et al., 2013). 
Although these methods are feasible, the response rates for all three methods are low: taking into account both the screening response rate and the field response rate, overall response rates of 11.4% for landline random-digit dialing, 4.1% for cell-phone random-digit dialing, and 1.7% for address-based sampling were reported by Clagett et al. (2013). Thus, the representativeness of samples derived from such methods is indeed of considerable concern. In countries with population registries, controls are typically sampled from the registries. If cases come from a defined group such as members of a health maintenance organization, controls without the disease of interest are generally sampled from among other members of the defined group. Cotterchio et al. (2014) used a case-control design to evaluate a possible protective effect of allergies against the development of pancreatic cancer. The study included 345 newly diagnosed pancreatic cancer cases during 2011 and 2012 identified from the Ontario Cancer Registry, which collects data on all cancer cases diagnosed in the Province of Ontario, Canada. Controls, in a ratio of 3 controls per case, were selected by random-digit dialing using a sampling frame of telephone directories and commercially available lists from the same area. A total of 11 629 households were called, 4549 of which refused or did not answer. Of the 1995 eligible persons identified, 87% agreed to participate, but only 1285 (74%) of those completed a questionnaire. Protective associations were observed for hay fever, dust or mold allergies, and animal or pet allergies, consistent with prior studies, but not for food or medication allergies or allergic asthma. Amadou et al. (2014) recruited women with newly diagnosed breast cancer from 12 hospitals at the major health care institutions in three large regions in Mexico, and sampled controls from the same
5-year age group, the same region and with membership in the same health care institution as the cases. They used this type of case-control approach to determine the association between anthropometric characteristics and breast cancer in Mexico. If cases are identified at certain hospitals that do not cover a defined geographic area, it is usually impossible to specify the source population from which the cases arose. In this instance, controls are often chosen from among patients with other diseases admitted to the same hospitals as the cases, because one wants to obtain a source of controls subject to the same selective factors as the cases. It is usually desirable to include as controls people with a variety of other conditions, so that no single disease is unduly represented in the control group. Having one or more diseases overrepresented among the controls might provide a non-representative estimate of the frequency of exposure among the controls. Generally, it is important to exclude potential controls who have had their disease for a long period of time because, like the cases, the presence of their disease may have influenced their exposure to possible risk factors. For example, such characteristics as physical activity, diet, weight, and medication use may change as a result of many diseases. The classic case-control studies of Doll and Hill (1952) of the association between cigarette smoking and lung cancer (Table 2) identified cases from several hospitals in England and used as controls patients admitted to the same hospitals with other diseases who had a similar age and sex distribution to the cases. Although the vast majority of both cases and controls smoked, a higher percentage of cases than controls smoked, and when the amount of smoking was examined, the cases were much more likely to have been heavy smokers than the controls. Information on exposure to putative risk factors may be obtained in several ways, depending on the nature of the exposure. 
Frequently, questionnaires administered to cases and controls by trained interviewers are used, as this is often the only way to find out about past exposures. Existing records may sometimes be used to find out about exposures such as medication use. Physical measurements or laboratory tests on sera or other tissue drawn from cases and controls may also be used, but it must be kept in mind that measurements of some attributes differ after the disease has occurred compared to before the disease developed. Whichever methods are used, it is important that the same measurement methods be used in cases and controls.
Table 2 Case-control study of the association between cigarette smoking and lung cancer in males, England

Smoking status      Lung cancer cases    Control group
Smokers                   1350               1296
Nonsmokers                   7                 61
Total                     1357               1357
Percent smokers          99.5%              95.5%

Odds ratio = (1350/7) / (1296/61) = 9.1, 95% confidence interval = (4.1–19.9).
Adapted from Doll, R., Hill, A., 1952. A study of the aetiology of carcinoma of the lung. Br. Med. J. 2, 1271–1286.
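The odds ratio reported under Table 2 can be recomputed from the four cell counts. The sketch below uses the log-based (Woolf) interval; the source does not say which interval method was used, so that choice is an assumption, although for these counts it gives values that round to the figures shown with the table. The function and variable names are illustrative only.

```python
import math

def odds_ratio_woolf(exposed_cases, exposed_controls,
                     unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio for a 2x2 case-control table with a Woolf (log-based) CI."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    estimate = (a / c) / (b / d)          # odds of exposure in cases / in controls
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Cell counts from Table 2 (Doll and Hill, 1952): smokers and nonsmokers
# among lung cancer cases and among hospital controls.
odds_ratio, lower, upper = odds_ratio_woolf(1350, 1296, 7, 61)
print(f"OR = {odds_ratio:.1f}, 95% CI = ({lower:.1f}-{upper:.1f})")
```

Because only seven cases were nonsmokers, the interval is wide even though the total sample is large; the smallest cell dominates the standard error.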
Case-control studies can provide a great deal of useful information about risk factors for diseases, and over the years have been the most frequently undertaken type of epidemiologic study. They can generally be carried out in a much shorter period of time than cohort studies (to be discussed in the next section), and under most circumstances do not require nearly so large a sample size as cohort studies. Consequently, they are less expensive. For a rare disease, case-control studies are usually the only practical approach to examining possible risk factors. Nevertheless, certain problems and limitations may affect the conclusions that can be drawn from case-control studies. Among some of the major concerns are that:
1. Information on potential risk factors may not be available with sufficient accuracy either from records or the participants’ memories (see section Measurement Error);
2. Information on other relevant variables that need to be taken into account to ensure comparability of cases and controls may not be available with sufficient accuracy either from records or from the participants’ memories (see section Confounding);
3. Cases may be more likely than controls to search their memories for a cause of their disease and thereby be more likely to report an exposure to a particular risk factor than controls (see section Information Bias);
4. The investigator may be unable to determine with certainty whether the agent was likely to have caused the disease or whether the occurrence of the disease was likely to have caused exposure to the agent, sometimes referred to as ‘temporal bias’;
5. Identifying and assembling a case group representative of all cases may not be feasible (see section Selection Bias);
6. Identifying and assembling an appropriate control group may be difficult (see section Selection Bias);
7. Participation rates may be low and/or different between cases and controls, causing concern about representativeness and bias (see section Selection Bias).
In view of these potential weaknesses, the case-control study is considered by some to be a type of study that merely provides leads to be followed by more definitive cohort studies. However, decisions about preventive actions often have to be made on the basis of information obtained from case-control studies because cohort studies or randomized trials may not be feasible or sufficiently timely. Each case-control study should be evaluated on its own merits because some studies are affected very little by error and bias, while others may be affected a great deal.
Cohort Studies Prospective Cohort Studies In a traditional prospective cohort study, individuals without the disease(s) of interest at the start of the study are classified according to whether they are exposed or not exposed to one or more potential risk factors, or according to their levels of exposure. The cohort is then followed for a period of time (which may be many years) and the incidence rates (number of new cases of disease per person-time of follow-up) or mortality rates (number of deaths per person-time of follow-up) in those exposed and those not exposed (or according to levels of exposure) are compared. A cohort study may also involve measuring exposure status at the beginning of a study and determining how this relates to changes in an attribute (such as blood pressure) over time. Changes in both exposure status and the outcome of interest may be measured over time. Because in most cohort studies people enter and leave the cohort at different times, the total length of time that each cohort member is at risk and under surveillance by the investigator for the outcome of interest has to be taken into account. The sum of the length of time each cohort member is at risk and under observation is the person-time. However, one must be careful that the concept of person-time is valid in any given situation; for example, in studies of pregnancy, one woman observed for 9 months is not equivalent to nine women observed for 1 month in terms of the probability of an outcome of pregnancy. U.S. and Ukrainian investigators conducted a prospective cohort study of whether individuals exposed to higher doses of radiation from the explosion at the Chernobyl nuclear power plant in the Ukraine on 26 April 1986 had a higher risk of thyroid cancer than those exposed to lower doses (Brenner et al., 2011). Analyses of data from 12 514 individuals followed for 73 004 person-years revealed 65 incident thyroid cancers and a linear dose–response relationship. The excess rate of thyroid cancer per gray of I-131 exposure above the background thyroid cancer rate was 2.21 per 10 000 person-years (95% confidence interval 0.04–5.78 per 10 000 person-years). The results indicated that I-131-related thyroid cancer risks persisted for two decades after exposure without evidence of a decrease. Another example is a cohort study conducted in response to concerns that consumption of soy-containing foods might have adverse consequences for women previously diagnosed with breast cancer (Caan et al., 2011). A total of 3088 women with prior early stage breast cancer diagnoses from 1991 to 2000 were followed for a median of 7.3 years to examine breast cancer events and death in relation to their intake of isoflavones, a major component of soy. The results showed that overall mortality declined with increasing isoflavone intake. No association of such intake with invasive breast cancer recurrence or new invasive primary breast cancer was observed (Table 3).

Table 3 Hazard ratios (HR) and 95% confidence intervals (CI) for isoflavone intake associated with breast cancer recurrence and overall mortality in women with prior breast cancer in Women’s Healthy Eating and Living Study, a prospective cohort study

Isoflavone intake    HR (95% CI) for invasive breast cancer         HR (95% CI) for
(mg day⁻¹)           recurrence or new primary breast cancer        overall mortality
0–0.07               1.0                                            1.0
0.07–1.01            0.89 (0.72–1.11)                               0.75 (0.57–0.99)
1.01–16.33           0.99 (0.75–1.32)                               0.79 (0.54–1.15)
16.33–86.9           0.78 (0.46–1.31)                               0.46 (0.2–1.05)

Adapted from Caan, B.J., Natarajan, L., Parker, B., et al., 2011. Soy food consumption and breast cancer prognosis. Cancer Epidemiol. Biomark. Prev. 20, 854–858.
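The idea of person-time in a prospective cohort study, in which members enter and leave follow-up at different times, can be sketched as follows. The three cohort members and their dates are hypothetical, and a year is taken as 365.25 days for simplicity. The last lines compute a crude incidence rate from the totals quoted from Brenner et al. (2011); this crude rate is not the excess rate per gray that the authors reported, which comes from a dose-response model.

```python
from datetime import date

def person_years(entry, exit_):
    """Years at risk and under observation between entry and exit dates."""
    return (exit_ - entry).days / 365.25

# Hypothetical cohort members: (entry date, exit date, developed the outcome?)
cohort = [
    (date(2000, 1, 1), date(2010, 1, 1), False),  # followed to end of study
    (date(2002, 7, 1), date(2006, 7, 1), True),   # follow-up ends at diagnosis
    (date(2005, 1, 1), date(2010, 1, 1), False),  # entered the cohort late
]

total_person_years = sum(person_years(entry, exit_) for entry, exit_, _ in cohort)
events = sum(1 for _, _, is_case in cohort if is_case)
rate_per_10000 = events / total_person_years * 10_000
print(f"{events} event(s) in {total_person_years:.1f} person-years")

# Crude incidence rate from the totals quoted in the text:
# 65 incident thyroid cancers over 73 004 person-years.
chernobyl_crude_rate = 65 / 73_004 * 10_000
print(f"crude rate = {chernobyl_crude_rate:.1f} per 10 000 person-years")
```

Each member contributes only the time actually observed, so a person followed for four years counts for less than one followed for ten.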
Cohort studies have a major advantage over case-control studies in that exposures or other characteristics of interest are generally measured before the disease has developed (or before changes in an attribute take place). On the other hand, cohort studies often require large sample sizes (particularly for relatively rare diseases), long-term follow-up of cohort members (particularly for diseases with long latent periods), large monetary expense, and complex administrative and organizational arrangements. The outcome of primary interest must be relatively frequent, or prohibitively large sample sizes will be needed to ensure adequate numbers experiencing the outcome. Therefore, prospective cohort studies are usually initiated under two circumstances: first, when sufficient (but not definitive) evidence has been obtained from less expensive studies to warrant more expensive cohort studies, and, second, when a new agent (e.g., a widely used medication) is introduced that may alter the risk for several diseases. In addition, cohort studies undertaken for one specific purpose are often used to test other hypotheses of interest as well, so that many cohort studies are multi-purpose, thus making them more cost-effective. Many prospective cohort studies are being undertaken in various parts of the world. Cohorts are occasionally chosen because they are representative of a certain community, such as in the Framingham Heart Study (Dawber et al., 1951), which started in 1948. This study contributed substantially to our knowledge of risk factors for coronary heart disease, which were largely unknown when the study was initiated. This study has been carried out for several decades, is ongoing, and now includes both offspring and subsequent generations of the original cohort members. It has added to our knowledge of risk factors for many other diseases besides coronary heart disease. 
Although the ability to generalize from community-based studies makes them highly desirable, they are usually expensive and often lose a substantial proportion of participants over the course of many years of follow-up, an issue that can reduce the accuracy and generalizability of the results. Furthermore, an exposure of interest may be infrequent in the general population, so that selection of a cohort with a higher proportion who are exposed may be a more efficient approach than a cohort study in a general population or sometimes even a case-control study. People working in a specific industry or occupation are sometimes enrolled in cohort studies because they: (1) often have exposures of particular interest, (2) are less likely to be lost to follow-up, (3) have a certain amount of relevant information recorded in their medical and employment records, and (4) in many instances undergo initial and then periodic medical examinations. Cohorts from health insurance plans also offer various advantages, including the records that are kept of all patient encounters with the health plan. Virtually all cohorts are selected because they offer some special opportunities. A series of cohort studies that has provided a great deal of information about risk factors for several diseases in women is the Nurses’ Health Studies (www.channing.harvard.edu/nhs/). Nurses were selected not because of any particular occupational exposure, but because it was believed that their cooperation would be good and that they would report disease occurrence with a high degree of accuracy. The first nurses’ cohort was established in 1976 and included 121 700 nurses. The second nurses’ cohort, which
includes 116 686 nurses, was started in 1989. As of this writing, nurses for a third nurses’ cohort are being recruited, with the intent that the study be entirely web-based. A large cohort in the United Kingdom, the Million Women Study (Beral et al., 2011), followed 1 129 025 of the women who were invited to the Breast Screening Program of the National Health Service from 1996 to 2001. The cohort members were followed by means of mailed questionnaires, for a total of 4.04 million woman-years of follow-up. Among the many findings from this study, the investigators reported an increased risk of breast cancer associated with the use of menopausal hormone therapy (Table 4). Following such a large cohort over many years might have been prohibitively expensive had most of the information not been collected by mailed questionnaire.
Table 4 Adjustedᵃ hazard ratios (HR) and 95% confidence intervals (CI) for breast cancer related to hormone therapy (HT) use in the Million Women Study, a prospective cohort study

                                          Estrogen only       Estrogen-progestin
                                          HR (95% CI)         HR (95% CI)
Total duration used (years)
  <5                                      1.24 (1.14–1.35)    1.62 (1.54–1.71)
  5+                                      1.44 (1.37–1.52)    2.19 (2.10–2.27)
Years from menopause to first HT use
  <5                                      1.43 (1.36–1.49)    2.04 (1.97–2.12)
  5+                                      1.05 (0.89–1.23)    1.53 (1.38–1.69)

ᵃ Adjusted for age, region, socioeconomic status, age at menopause, parity, age at first birth, alcohol consumption.
Adapted from Beral, V., Reeves, G., Bull, D., Green, J., 2011. Breast cancer risk in relation to the interval between menopause and starting hormone therapy. J. Natl. Cancer Inst. 103, 296–305.

Retrospective Cohort Studies In a retrospective cohort study (also called an historical cohort study), investigators assemble a cohort by reviewing records to identify those who had the exposure(s) of interest in the past (often decades previously). Based on recorded past exposure histories, cohort members are divided into exposed and nonexposed groups, or according to level of exposure. The investigator then reconstructs their subsequent disease or mortality experience up to some defined point in time. For instance, cancer incidence during the period 1953–96 was determined in workers who had been employed for at least 6 months in three Norwegian silicon carbide smelters (Romundstad et al., 2001) and then compared to the cancer incidence in the population of Norway over the same time period. A greater number of lung cancer cases was found among the silicon workers than was expected on the basis of lung cancer rates in the general population. In another example, mortality during the period 4 July 1965 to 31 December 2010 was determined in female veterans who had served in the U.S. military between 4 July 1965 and 28 March 1973, the dates of U.S. military involvement in combat in the Vietnam war (Kang et al., 2014). The authors compared the cause-specific mortality risks of these female veterans to those of women who served in other countries near Vietnam, to non-deployed U.S. military women, and to the general population of U.S. women over the same time period. All three female veteran groups had lower all-cause
300
Observational Epidemiology
mortality risk than the U.S. women after adjustment for race, age and calendar period. However, the Vietnam cohort had well over three times the number of motor vehicle deaths (Standardized Mortality Ratio [SMR] = 3.67, 95% confidence interval [CI] 2.30–5.56) and a greater number of pancreatic cancer deaths (SMR = 1.47, 95% CI 0.97–2.12) than was expected on the basis of the general population of U.S. women. Retrospective cohort studies have many of the advantages of prospective cohort studies, but can be completed in a much more timely fashion; consequently, they are considerably less expensive. However, only when the necessary information on past exposure has been recorded fairly accurately can a retrospective cohort study be undertaken with much likelihood of success. Tracing most of the cohort members must be possible to establish whether they developed the disease of interest (or died). As with prospective cohort studies, a retrospective cohort study is usually feasible only when the outcome of interest is relatively frequent. Obtaining information on characteristics of the cohort members other than the exposure and outcome of primary interest may also be critical, so as to determine whether those with and without the exposure of interest are comparable in other relevant respects (e.g., age, smoking habits) and to allow statistical adjustment for differences. If such information is not available, interpretation of the study results may be ambiguous.
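The standardized ratios quoted for these two retrospective cohorts are simply observed counts divided by expected counts. The sketch below reproduces that arithmetic using the counts the article reports for these studies (22 observed versus 6 expected motor vehicle deaths; 74 observed versus 39.9 expected lung cancer cases). The function name `standardized_ratio` is hypothetical, and the interval uses Byar's approximation to the Poisson limits, which is an assumption of this sketch and not necessarily the method the cited authors used.

```python
import math


def standardized_ratio(observed, expected, z=1.96):
    """Return (ratio, lower, upper) for an SMR or SIR.

    The confidence limits use Byar's approximation to the exact Poisson
    limits (an assumption of this sketch, not taken from the cited studies).
    """
    ratio = observed / expected
    # Lower limit for the observed count
    lower_count = observed * (
        1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
    ) ** 3
    # Upper limit uses observed + 1
    upper_obs = observed + 1
    upper_count = upper_obs * (
        1 - 1 / (9 * upper_obs) + z / (3 * math.sqrt(upper_obs))
    ) ** 3
    return ratio, lower_count / expected, upper_count / expected


# Counts as quoted in the article for the two example cohorts
smr_motor_vehicle = standardized_ratio(22, 6)      # female Vietnam veterans
sir_lung_cancer = standardized_ratio(74, 39.9)     # silicon carbide smelter workers

for label, (ratio, lower, upper) in [
    ("SMR, motor vehicle deaths", smr_motor_vehicle),
    ("SIR, lung cancer", sir_lung_cancer),
]:
    print(f"{label}: {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the expected count of 6 is rounded in the article, the limits computed here can differ slightly in the second decimal place from the published interval.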
Cross-Sectional Studies In a cross-sectional, or prevalence, study, both the exposure to a hypothesized risk factor and the occurrence of a disease are measured at one time (or over a relatively short period of time) in a study population. Prevalence proportions (numbers of cases with existing disease per population at risk at a given point in time or short period of time) among those with and without the exposure or characteristic of interest are then compared. For a quantitative variable such as blood pressure, the distributions of the variables in the exposed and unexposed are compared. Cross-sectional studies include all cases of a disease, new and old, at a given point or period in time. Thus, it is difficult to differentiate cause from effect. For instance, a cross-sectional study reported higher proportions of obesity among persons employed for more than 40 h week⁻¹ and with exposure to a hostile work environment than among others (Luckhaupt et al., 2014). However, it was not possible to determine whether obesity predisposed people to work in certain types of jobs or whether people with such jobs tended to become obese. Interpretation of cross-sectional studies is generally clear only for attributes that do not change as a result of the disease, such as race or genotype. In addition, the cases with disease of long duration tend to be overrepresented because such cases are more likely to be identified at a given point in time than cases who recover or die quickly. Accordingly, any association found between an exposure and a disease may be attributable to survivorship with the disease rather than to development of the disease. Another use of cross-sectional studies is simply to describe the prevalence of a disease in a population. For such studies to be useful, the individuals studied should be representative of the population to whom the results are to be generalized.
Patients seen in tertiary care centers or in the practice of any one physician, for example, are seldom representative of all persons in the community with a disease, many of whom may not have sought medical care. Generalizations from such select groups of patients should be avoided. The Study of Osteoporotic Fractures used a cross-sectional approach to describe the prevalence of radiographic vertebral fractures in black and white women of age 65 years and older (Cauley et al., 2008). The prevalence was 10.6% in black women and 19.1% in white women, and the difference remained when age and other risk factors were taken into account. Because race had to have preceded the fracture, the question of cause and effect does not arise in this study, and vertebral fractures do not have much effect on survivorship. A cross-sectional study with potentially important public health implications examined dietary and weight data from 1999 to 2010 in the U.S. National Health and Nutrition Examination Survey (Bleich et al., 2014). The results indicated that in the 24 h prior to the survey, overweight and obese adults consumed more diet drinks and also more calories from solid food than ‘healthy weight’ adults (1965 kcal and 2058 kcal from solid food for overweight and obese, respectively, vs 1841 kcal for ‘healthy weight’ adults). Their consumption of total calories was similar to overweight (1874 kcal) and obese (1897 kcal) adults who drank sugar-sweetened beverages. These results suggest that overweight and obese adults who consume diet drinks may need to decrease calorie consumption from solid foods to lose weight. Longitudinal studies are needed to be more certain of the time sequence of obesity, diet, and beverage consumption and to examine whether other factors affect these associations.
Hybrid Study Designs Nested Case-Control Studies Sometimes it is possible to increase efficiency and maintain the advantages of a cohort study by designing a case-control study within either a prospective or retrospective cohort study. The cohort study may be either observational or experimental. Such a study is often referred to as a nested case-control study. Case-control studies in which the populations from which the cases and controls arise are well defined may also be considered to be nested within a cohort. Suppose blood samples from a cohort of 10 000 people free of rheumatoid arthritis at baseline have been frozen and stored. Suppose also that after 10 years 200 people have developed rheumatoid arthritis and 9800 have not. The stored sera from the 200 people with rheumatoid arthritis and a sample of, say, 600 of the 9800 people free of rheumatoid arthritis are thawed and analyzed for the presence of some serologic marker. This sampling of nondiseased individuals greatly reduces the cost from what it would be in a traditional cohort study in which sera from all 10 000 cohort members would be tested at the beginning of the study. Nevertheless, the serologic marker was present before the disease developed, thus providing the major advantage of a cohort study. The serologic status of the cases and controls is then compared, as in a traditional case-control study. In a nested case-control study, controls are selected from unaffected cohort members who are still alive and under surveillance at the time the cases developed the disease. Typically,
the controls are matched to cases according to age, sex, and time of entry into the cohort. The availability of many banks of stored serum around the world and the current interest in serologic predictors of disease make nested case-control studies an attractive and economical approach, as long as the serologic marker of interest does not undergo degradation over time. Similarly, genetic markers of disease can be examined in a nested case-control study, provided appropriate cells have been properly stored. Bertisch et al. (2013) conducted a case-control study nested within the Swiss HIV Cohort Study (1988–2011) to examine the relation of human papillomavirus (HPV) infection and other potential risk factors to anal cancer in persons infected with HIV. The cohort study enrolled persons infected with HIV from seven large Swiss hospitals and had 103 000 person-years of follow-up by December 2011. Fifty-nine eligible cases of anal cancer were identified from the cohort, and five controls from among the cohort members who did not develop anal cancer were matched (on hospital, sex, HIV risk category, age, and year at enrollment) to each case. All controls had at least the same length of follow-up as their matched cases, and each could serve as a control for only one case. For a subset of 41 cases and their 114 matched controls with available stored serum samples, HPV antibodies were tested from serum samples that had been taken closest in time prior to the cancer diagnosis. Anal cancer was associated with seropositivity for antibodies against L1 coat proteins of HPV16, non-HPV16 high risk HPV types, and HPV6 and 11 (Table 5). Also, using the full 59 cases and their matched controls, current cigarette smoking and low CD4+ cell counts were identified as risk factors for anal cancer.

Table 5. Odds ratios (OR) and 95% confidence intervals (CI) for association of anal cancer with seropositivity for selected HPV antigens and for other risk factors, a nested case-control study within the Swiss HIV Cohort Study, 1988–2011

| Risk factor | Odds ratioᵃ | 95% CI |
|---|---|---|
| HPV antigens (41 cases of anal cancer and 114 matched controls) | | |
| HPV 16 seropositiveᵇ | 4.52 | 2.00–10.20 |
| Any other non-HPV 16 high risk HPV seropositiveᵇ | 2.30 | 1.03–5.13 |
| HPV6 and 11 seropositiveᵇ | 3.04 | 1.15–8.01 |
| Other risk factors (59 cases of anal cancer and 295 matched controls) | | |
| Current smokerᶜ | 2.59 | 1.25–5.34 |
| Former smokerᶜ | 0.96 | 0.32–2.89 |
| Nadir CD4+ cell count 50–199 cells µL⁻¹ᵈ | 1.68 | 0.77–3.65 |
| Nadir CD4+ cell count <50 cells µL⁻¹ᵈ | 3.96 | 1.82–8.61 |

ᵃ Accounting for controls being matched to cases on hospital, sex, HIV risk category, age, and year at enrollment. ᵇ Referent is seronegative for the specific antigen. ᶜ Referent is never smoker. ᵈ Referent is ≥200. Adapted from Bertisch, B., Franceschi, S., Lise, M., et al., 2013. Risk factors for anal cancer in persons infected with HIV: a nested case-control study in the Swiss HIV Cohort Study. Am. J. Epidemiol. 178, 877–884.
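The control-selection rule described for the Swiss study (controls drawn from cohort members who did not develop the disease, followed at least as long as their matched case, and used for only one case) can be sketched as a small sampling routine. Everything here is illustrative: the function name `sample_matched_controls`, the field names, and the toy cohort are hypothetical, and matching on hospital, sex, age, and other factors is omitted for brevity.

```python
import random


def sample_matched_controls(cohort, n_controls, seed=0):
    """Pick up to n_controls per case from never-diseased cohort members.

    A control must have at least as much follow-up as its case and may be
    used for only one case. Matching on other factors is not implemented.
    """
    rng = random.Random(seed)
    used = set()
    matched = {}
    cases = sorted(
        (m for m in cohort if m["is_case"]), key=lambda m: m["followup_years"]
    )
    for case in cases:
        eligible = [
            m["id"]
            for m in cohort
            if not m["is_case"]
            and m["followup_years"] >= case["followup_years"]
            and m["id"] not in used
        ]
        chosen = rng.sample(eligible, min(n_controls, len(eligible)))
        used.update(chosen)
        matched[case["id"]] = chosen
    return matched


# Hypothetical cohort of 40 members, two of whom became cases
cohort = [
    {"id": i, "followup_years": 1 + (i % 10), "is_case": i in (3, 17)}
    for i in range(40)
]
matched = sample_matched_controls(cohort, n_controls=3)
for case_id, control_ids in matched.items():
    print(f"case {case_id}: controls {control_ids}")
```

Only the sampled members then need laboratory assays, which is where the cost saving over testing the whole cohort comes from.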
Case-Cohort Studies Another hybrid study design being used with increasing frequency is the case-cohort study. A case-cohort study is another
method of increasing efficiency compared to a traditional retrospective or prospective cohort study (observational or experimental), and includes one or more case groups along with the comparison group. It is particularly useful when the associations between a serologic marker (or other variable) and two or more diseases are of interest. As in a nested case-control study, all cases of the disease(s) of interest occurring in a cohort are generally selected for study. However, in a case-cohort study, the comparison group is a sample of the entire cohort, not just those free of disease, and the cohort members are not matched to the cases. Rather, other relevant variables and the possibility that some cases could be included in the comparison group are taken into account in the statistical analysis. Lam et al. (2013) used a case-cohort design among participants in a blood collection survey within the Linxian General Population Nutrition Intervention Trial in China to examine whether low plasma Vitamin C concentrations were associated with subsequent gastric cancer and esophageal cancer. All cases of each type of cancer (618 with esophageal cancer and 467 with gastric cancer) were included, along with an age- and sex-stratified random sample of 948 eligible cohort members who had participated in the blood collection survey. They found that the higher the plasma Vitamin C concentration at the time of the survey, the lower the subsequent risk for gastric cancer, whereas no association was seen for esophageal cancer. Thus, they were able to study two different types of cancer efficiently by using a common comparison group; in addition, the sera for the measurement of Vitamin C concentrations were drawn before the diagnosis of the cancers rather than possibly being a consequence of the cancers.
Ecologic Studies In an ecologic study, a summary measure of exposure and a summary measure of disease frequency are obtained for aggregates of individuals, such as persons in certain geographic areas or time periods. The purpose is to determine whether the units with the highest (or lowest) frequency of exposure tend to be the units with the highest (or lowest) frequency of disease. Ecologic studies may be descriptive or analytic. Analytic studies may be undertaken for a variety of reasons, including as an inexpensive and convenient way to test hypotheses regarding causation. For instance, it has long been observed, using routinely collected data, that countries with the highest per capita consumption of beer have the highest incidence rates of rectal cancer. However, many other differences occur among these countries in addition to their beer consumption. Additional analytic studies are needed to determine whether within these countries the individuals who drink the most beer have the highest rectal cancer rates. Sometimes ecologic studies are carried out because an individual-level study would be impractical or impossible. An ecologic study in the United States reported that as the volume of text messages greatly increased between 2005 and 2008, the number and percentage of all road fatalities attributed to distracted driving also markedly increased (Wilson and Stimpson, 2010). Although such data are limited in as much as they do not indicate whether it was specifically the texting individuals who had an excess of such fatalities, these results are consistent with studies from other disciplines that strongly suggest a causal association. Conducting an individual-level
study of the association between distracted driving and road fatalities would be difficult, if not impossible. In a case-control study, data would be needed on the precise timing of the distraction in relation to the motor vehicle crash for cases and at a comparable time and place for controls who did not crash. Obtaining such information accurately would be impossible. Another reason for using ecologic studies is that sometimes the health effects of ecologic factors, such as characteristics of a neighborhood, are of interest in their own right. Likewise, the effects of a new public health program on the aggregate health of the residents of the area in which it was implemented might be of interest. When appropriate aggregated and individual-level data are available, two or more different levels of aggregation (e.g., state, county, municipality, census tract, block group, block, individual) can be considered in the same study and nested within each other to form a hierarchy of levels, each of which could be contributing to risk for the disease under study. Special methods of statistical analysis, called multilevel analysis or multilevel modeling, are used to try to separate out the contributions of the different levels of aggregation to disease risk.
Selected Epidemiologic Concepts in Observational Studies Confounding The possibility of confounding, which is especially likely to occur in observational studies, usually needs to be considered when trying to interpret associations between exposures and diseases. A confounding variable is a variable that (1) affects the risk of the disease or condition under study independent of the exposure or characteristic of primary interest, and (2) is associated with the exposure or characteristic of primary interest in the study population, but (3) is not a consequence of that exposure. Suppose that an investigator finds that coffee drinking during pregnancy is associated with an increased risk for delivery of a low-birth-weight infant. The investigator would have to be concerned that this statistical association between coffee consumption and delivery of a low-birthweight infant is actually attributable to a greater tendency of coffee drinkers than non-coffee drinkers to smoke cigarettes, and that it is the cigarette smoking that puts them at an elevated risk for delivering a low-birth-weight infant, not the coffee drinking. In this instance, smoking is considered a confounding variable. With any study design, information on potential confounding variables can be obtained during data collection and then taken into account in the statistical analysis. Alternatively, in cohort studies, one or more potential confounding variables may be taken into account in the study design by matching unexposed to exposed individuals on the potential confounding variable(s). In case-control studies, non-diseased (controls) may be matched to diseased (cases) on one or more potential confounding variables. If controls are matched to cases, statistical methods that take the matching into account are used in the analysis. It is also possible to match roughly on certain variables in the study design and then control more tightly in the analysis. Measurement of potential confounding variables is
highly important, because otherwise they cannot be adequately taken into account in the study design or analysis.
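The coffee and smoking example can be made concrete with invented numbers. The sketch below uses hypothetical counts in which coffee drinking has no association with low birth weight within either smoking stratum, yet the crude (unstratified) odds ratio is well above 1 because coffee drinkers smoke more often. The Mantel-Haenszel summary odds ratio is used here as one common way of taking a measured confounder into account in the analysis; the article does not prescribe a particular method, and all counts and function names are assumptions.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts: (coffee & low birth weight, coffee & normal weight,
#                       no coffee & low birth weight, no coffee & normal weight)
strata = {
    "smokers": (80, 320, 20, 80),
    "nonsmokers": (5, 95, 20, 380),
}

# Collapse over smoking to get the crude table
crude_table = [sum(column) for column in zip(*strata.values())]
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(list(strata.values()))

for name, table in strata.items():
    print(f"{name}: OR = {odds_ratio(*table):.2f}")
print(f"crude OR = {crude_or:.2f}, smoking-adjusted OR = {adjusted_or:.2f}")
```

In these invented data, the gap between the crude and adjusted estimates is the signature of confounding by smoking.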
Effect Modification Effect modification, also called statistical interaction, occurs when the magnitude of the association between one variable and another differs according to the level of a third variable. For instance, in affluent Western countries, the association between obesity and risk for breast cancer varies according to whether a woman is premenopausal or postmenopausal. Among postmenopausal women, obese women are at elevated risk for breast cancer, while before menopause, obese women are at decreased risk compared to thin women (Colditz et al., 2006). Thus, menopausal status is an effect modifier of the relation of obesity to breast cancer. An area of considerable current interest is whether the effects on disease risk of certain environmental exposures and lifestyle factors are modified by a person’s genotype. For example, Schmidt et al. (2012) found that compared to lower maternal folic acid intake, higher maternal folic acid intake during the first month of pregnancy was associated with a decreased risk for autism spectrum disorders (ASD) (odds ratio (OR) = 0.61, 95% CI 0.41–0.89 for ≥600 µg day⁻¹ compared to <600 µg day⁻¹). However, the maternal and child MTHFR 677 C > T variant (a genetic polymorphism that is associated with high homocysteine levels and thus makes higher folate intake necessary for appropriate neurodevelopment) modified this association. The association of higher maternal folic acid intake with reduced ASD risk was strongest when mothers and children had the MTHFR 677 CT/TT variant (OR = 0.30, 95% CI 0.10–0.90 for ≥600 µg day⁻¹ compared to <600 µg day⁻¹), whereas if the mother or child had the CT/TT genotype, the protective relation was less strong and the confidence interval included 1.00 (OR = 0.49, 95% CI 0.16–1.50). If the mother or child or both had the CC genotype, the relation was not protective (ORs ranged from 1.15 to 1.29, and all 95% CIs included 1.00).
The MTHFR 677 C > T variant occurs with increased frequency in children with autism, potentially indicating that such children may be genetically predisposed to less efficient folate metabolism and thus would likely benefit from higher maternal folic acid intake. In the statistical analysis, identification of effect modification involves comparing associations between exposures and diseases in subgroups of the population (e.g., in premenopausal vs postmenopausal women; in those with the MTHFR 677 C > T gene variant vs those of other genotypes). Large overall sample sizes are often needed to make such comparisons among subgroups.
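Comparing associations across subgroups can be illustrated with two hypothetical 2 x 2 tables, one per level of the suspected effect modifier. The sketch below computes the stratum-specific odds ratios and a simple Wald test of whether they differ on the log scale. The counts are invented, the function names are hypothetical, and this test is only one of several approaches; it is not the analysis used in the studies cited above.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Return (log OR, variance) for a 2 x 2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def homogeneity_z_test(table_1, table_2):
    """Wald z statistic and two-sided p-value for equality of two odds ratios."""
    log_or_1, var_1 = log_odds_ratio(*table_1)
    log_or_2, var_2 = log_odds_ratio(*table_2)
    z = (log_or_1 - log_or_2) / math.sqrt(var_1 + var_2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (exposed cases, unexposed cases, exposed controls, unexposed controls)
postmenopausal = (120, 80, 90, 110)
premenopausal = (60, 140, 90, 110)

or_post = math.exp(log_odds_ratio(*postmenopausal)[0])
or_pre = math.exp(log_odds_ratio(*premenopausal)[0])
z, p_value = homogeneity_z_test(postmenopausal, premenopausal)
print(f"OR postmenopausal = {or_post:.2f}, OR premenopausal = {or_pre:.2f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

The variance of each log odds ratio depends on the smallest cell counts in its stratum, which is one way to see why subgroup comparisons demand large overall samples.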
Some Measures of Association and Their Attributes Relative Risk In cohort studies, the strength of the association between a putative risk factor and a disease is often measured by what is called a relative risk (or more accurately, a hazard (or rate) ratio or risk ratio; a discussion of hazard ratios and risk ratios is beyond the scope of this article). A relative risk is simply the risk (or incidence rate) of disease in one group (usually
the exposed group) divided by the risk (or incidence rate) of disease in another group (usually the unexposed). When appropriate, a relative risk should be adjusted for confounding variables. A relative risk (or, technically, a hazard ratio) can be computed from the data provided by Brenner et al. (2011), mentioned earlier, on differences in the rates of thyroid cancer by age. The authors reported a fivefold increased risk for 30–39-year-olds relative to those less than 22 years of age (adjusted relative risk = 5.10, 95% CI 2.60–9.99). Before reaching any conclusions, however, one would want to check for confounding by a variety of variables that are associated with age and thyroid cancer, such as sex, smoking habits, alcohol consumption, and many other attributes. If confounding is present, the unadjusted and adjusted relative risks will differ.
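As a minimal illustration of the definition, the sketch below computes an unadjusted risk ratio from hypothetical cohort counts, together with an approximate 95% confidence interval based on the standard error of the log risk ratio. The counts and the function name `risk_ratio` are assumptions for illustration, and the calculation makes no adjustment for confounding.

```python
import math


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Unadjusted risk ratio with a log-scale (Wald-type) confidence interval."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for cumulative-incidence data
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 cases among 1000 exposed, 10 cases among 1000 unexposed
rr, lower, upper = risk_ratio(30, 1000, 10, 1000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```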
Odds Ratio In case-control studies, risks and incidence rates usually cannot be determined because the investigator has selected the study population based on the presence or absence of disease and does not know the risks or rates of disease specifically in the exposed and unexposed groups. Therefore, relative risks cannot be computed. Rather, the odds ratio is calculated to provide an estimate of the relative risk. In an unmatched case-control study, the odds ratio is estimated by the ratio of exposed to unexposed among cases divided by the ratio of exposed to unexposed among controls. In a case-control study in which controls are individually matched to cases, the odds ratio is estimated as the ratio of the number of case-control pairs in which the case is exposed but the control is not to the number of pairs in which the control is exposed but the case is not. It can be shown that for all but the most frequently occurring diseases (i.e., more than 10% of the exposed or unexposed population affected), the odds ratio is a good approximation to the relative risk and can be interpreted in a similar manner. In the case-control study described above of the association between smoking and lung cancer (Table 1), the odds ratio of $\frac{1350/7}{1296/61} = 9.1$ indicates that the odds of lung cancer in smokers were more than nine times those in nonsmokers. The odds ratio in the study mentioned previously of the association between anthropometric characteristics and breast cancer in Mexican women (Amadou et al., 2014) was 0.96, 95% CI 0.64–1.44 for overweight (body mass index 25–29.9 kg m⁻²) versus normal weight (body mass index <25) in postmenopausal women, suggesting no association, either positive or negative, of overweight with postmenopausal breast cancer in Mexican women. Odds ratios should be adjusted for confounding factors as needed.
Standardized Ratios The standardized mortality ratio (SMR) and the standardized incidence ratio (SIR) are generally used when disease rates in the cohort under study are being compared to disease rates in a reference population, such as the general population of the geographic area from which the cohort was selected. The SMR (or SIR) is the ratio of observed number of deaths (or incident cases) in the cohort to the number of deaths (or incident cases) that would be expected, for example, on the basis of age- and sex-specific death (or incidence) rates in the general population. The SIR of 1.85 (74 observed cases/39.9 expected cases) for lung cancer incidence in male Norwegian silicon carbide smelter workers (Romundstad et al., 2001; see section Retrospective Cohort Studies) indicated that almost twice as many lung cancer cases were observed among the smelter workers as were expected based on the age- and time-period-specific incidence rates in the male population of Norway. The SMR of 3.67 (22 observed cases/6 expected cases) for motor vehicle accident deaths in the female Vietnam veterans (Kang et al., 2014) indicated that almost four times as many motor vehicle accident deaths were observed among female Vietnam veterans as were expected based on the age- and time-period-specific mortality rates in the female population of the U.S.
Confidence Interval A confidence interval should be presented along with estimates of the relative risk, odds ratio, hazard ratio, standardized mortality ratio or other parameter to give a range of plausible values for the parameter being estimated. A 95% confidence interval of 1.46–2.75 around a point estimate of relative risk of 2.00, for instance, indicates that a relative risk of less than 1.46 or greater than 2.75 can be ruled out at the 95% confidence level, and that a statistical test of any relative risk outside the interval would yield a probability value less than 0.05.
Attributable Fraction Provided that the association between a risk factor and a disease is causal (see section Guidelines for Assessing Causation from Observational Studies), the attributable fraction provides a rough indication of the proportion of disease occurrence that potentially would be eliminated if exposure to the risk factor were prevented. It should not, however, be confused with the proportion of cases caused by the exposure or the probability of causation. The attributable fraction can be calculated either for exposed individuals only or for the population as a whole. In a cohort study, the attributable fraction for the exposed can be computed as

$$\frac{\text{Risk for exposed} - \text{Risk for unexposed}}{\text{Risk for exposed}}$$

or, equivalently,

$$\frac{\text{Relative risk} - 1}{\text{Relative risk}}$$

The attributable fraction for the population can be computed as

$$\frac{\text{Risk for entire population} - \text{Risk for unexposed}}{\text{Risk for entire population}}$$

or, equivalently,

$$\frac{(\text{Prevalence of exposure in population}) \times (\text{Relative risk} - 1)}{1 + (\text{Prevalence of exposure in population}) \times (\text{Relative risk} - 1)}$$
In case-control studies of infrequent diseases, the odds ratio can be substituted for the relative risk to provide a good
approximation to the attributable fraction that would be computed using the relative risk. It is important to note that in the presence of confounding, other formulae must be used to estimate attributable fractions.
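The attributable fraction formulas can be checked numerically. The sketch below uses a hypothetical cohort (risks of 0.02 in the exposed and 0.01 in the unexposed, with 30% of the population exposed) to confirm that the risk-based and relative-risk-based versions agree. It then substitutes the odds ratio from the smoking and lung cancer counts quoted in the article for the relative risk. The function names are assumptions, and no adjustment for confounding is made, so the figures are illustrative only.

```python
def af_exposed(relative_risk):
    """Attributable fraction among the exposed, from the relative risk."""
    return (relative_risk - 1) / relative_risk


def af_population(prevalence, relative_risk):
    """Attributable fraction for the population, from exposure prevalence and relative risk."""
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)


# Hypothetical cohort values
risk_exposed, risk_unexposed, prevalence = 0.02, 0.01, 0.30
relative_risk = risk_exposed / risk_unexposed
risk_population = prevalence * risk_exposed + (1 - prevalence) * risk_unexposed

# Risk-based versions of the same quantities
af_exposed_from_risks = (risk_exposed - risk_unexposed) / risk_exposed
af_population_from_risks = (risk_population - risk_unexposed) / risk_population

print(f"AF exposed: {af_exposed(relative_risk):.3f} vs {af_exposed_from_risks:.3f}")
print(
    f"AF population: {af_population(prevalence, relative_risk):.3f} "
    f"vs {af_population_from_risks:.3f}"
)

# Case-control substitution: odds ratio from the smoking and lung cancer counts in the text
smoking_odds_ratio = (1350 / 7) / (1296 / 61)
print(f"OR = {smoking_odds_ratio:.1f}, AF exposed = {af_exposed(smoking_odds_ratio):.2f}")
```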
Measurement Error Inaccurate measurement can lead to erroneous conclusions. The possibility of measurement error is of concern for most variables considered in observational epidemiologic studies. Exposures such as diet and physical activity are often measured by questionnaire, and their measurement can entail a great deal of error. Measurement of some diseases, such as arthritic disorders and psychiatric disorders, is difficult because of the frequent absence of definitive diagnostic criteria and the episodic nature of signs or symptoms. The validity or accuracy of a measurement refers to the average closeness of the measurement to the true value. Reliability or reproducibility refers to the extent to which the same value of the measurement is obtained on the same occasion by the same observer, on multiple occasions by the same observer, or by different observers on the same occasion. Precision refers to the amount of variation around the measurement or estimate; a precise measure will have a small amount of variation around it, but may or may not be valid. Measurement error is said to be differential if the magnitude of error for one variable differs according to the actual value of other variables, and nondifferential if the magnitude of error in one variable does not vary according to the actual value of other variables. In a 2 × 2 table (e.g., exposure present or absent, disease present or absent), nondifferential misclassification always causes the relative risk or odds ratio to be closer to 1.0 than the true value (i.e., attenuates the magnitude of the association), provided that errors in measurement of the two variables are independent. Dependent, or differential, misclassification, on the other hand, can cause associations to be overestimated or underestimated, depending on the circumstances. When measurement error occurs for a potential confounding variable, adjusting for the confounding variable in the analysis will not entirely remove its effect.
When both the exposure and confounder are measured with error, the effects are less predictable. Also, when estimates are made from tables larger than 2 × 2, some circumstances can occur under which even nondifferential measurement error can cause an association to appear larger than it really is.
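The attenuation caused by nondifferential misclassification in a 2 × 2 table can be seen with expected counts. In the sketch below, a hypothetical case-control table with a true odds ratio of about 2.7 is passed through an exposure measure with 80% sensitivity and 90% specificity, applied identically to cases and controls. All counts, error rates, and function names are assumptions chosen for illustration.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts after imperfect exposure measurement."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Odds ratio for an unmatched case-control 2 x 2 table."""
    return (case_exposed / case_unexposed) / (control_exposed / control_unexposed)


# Hypothetical true exposure counts
true_cases = (400, 600)
true_controls = (200, 800)

# Same sensitivity and specificity in cases and controls: nondifferential error
sensitivity, specificity = 0.80, 0.90
observed_cases = misclassify(*true_cases, sensitivity, specificity)
observed_controls = misclassify(*true_controls, sensitivity, specificity)

true_or = odds_ratio(*true_cases, *true_controls)
observed_or = odds_ratio(*observed_cases, *observed_controls)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Giving cases and controls different sensitivity or specificity values in the same function turns the error differential, and the observed odds ratio can then move in either direction.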
Frequent Sources of Bias in Observational Studies Bias refers to the tendency of a measurement or a statistic systematically to underestimate or overestimate the true value of that measurement or statistic. Bias can arise from many sources in observational epidemiologic studies. It can affect estimates of disease and exposure frequency and the magnitude of associations between exposures and diseases. Biases from uncontrolled confounding and from measurement error were described in previous sections. Some other frequent sources of bias are described in the following sections.
Information Bias This is systematic error in measuring the exposure or outcome such that data are more accurate or more complete in one group than in another. Interviewer (or other types of observer) bias, recall bias, and reporting bias are examples of information bias. Interviewer (or other types of observer) bias is systematic error occurring when an interviewer (or other observer) does not collect information in a similar manner for each group being compared. For example, in a case-control study if an interviewer believes, whether subconsciously or not, that a certain drug increases the risk for a disease, the interviewer might probe more deeply into the medication history of cases than controls. Recall bias is systematic error resulting from differences in the accuracy or completeness of recall of past events between groups. In a case-control study, mothers whose infants are born with a congenital malformation may think back and remember events during the pregnancy more thoroughly than mothers of apparently healthy infants. Reporting bias is a systematic error resulting from the tendency of people in one group to be more or less likely to report information than others. In a case-control study, cases with certain diseases, for example, might be more likely to deny that they had used alcohol than controls.
Selection Bias This is systematic error occurring as a result of differences between those who are and those who are not selected for inclusion in a study or who are selected to be in a certain group within a study. Examples of selection bias are ascertainment bias, detection bias, and response (or participation) bias. Ascertainment bias is systematic error consequent to failure to identify equally all categories of individuals who are supposed to be represented in a group. For example, a specialty hospital may include mostly very sick or complicated cases who are not representative of all cases of a particular disease. Detection bias is systematic error resulting from greater likelihood of some cases being identified, diagnosed, or verified than others. For instance, a diagnosis of pulmonary embolism may be more likely to be made in oral contraceptive users than in non-users because the oral contraceptive users may have more frequent medical care so as to be able to refill their oral contraceptive prescriptions and thus be more likely to have a lung scan for chest pain. As a result, an association between oral contraceptives and pulmonary embolism might result at least in part from greater likelihood of disease detection in users than occurs in non-users of oral contraceptives. Response (or participation) bias is systematic error resulting from differences between those who do and do not choose to participate in a study, and between those who remain in a cohort study and those who do not. In a study to estimate disease incidence or prevalence, even though a sample is selected by formal sampling procedures to be representative of the source population, the sample is likely to give biased results if a substantial proportion of those who are selected do not participate. In a study trying to estimate the prevalence of a disease, for instance, those with serious disease may be too sick to participate, and busy people may have little interest in
participating. Accordingly, very ill and very busy people may be underrepresented in the study. If people who are sicker are less likely to return for follow-up visits in a cohort study, information on disease status of those who do continue to participate will not be representative of the disease status of all persons who were originally enrolled in the study.
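As a rough illustration of response (participation) bias in a prevalence survey, the following sketch compares the prevalence seen among participants with the true prevalence. All numbers and names are hypothetical assumptions, not values from any study.

```python
def observed_prevalence(true_prevalence, participation_if_diseased, participation_if_healthy):
    """Prevalence among participants when participation depends on disease status."""
    diseased_participants = true_prevalence * participation_if_diseased
    healthy_participants = (1 - true_prevalence) * participation_if_healthy
    return diseased_participants / (diseased_participants + healthy_participants)


true_prev = 0.10  # assumed true prevalence in the source population

# Assumption: people with disease are often too sick to take part (40% participate),
# while 80% of people without disease participate.
biased = observed_prevalence(true_prev, 0.40, 0.80)

# If participation does not depend on disease status, the estimate is unbiased.
unbiased = observed_prevalence(true_prev, 0.80, 0.80)

print(f"true = {true_prev:.3f}, equal participation = {unbiased:.3f}, "
      f"sicker people underrepresented = {biased:.3f}")
```

With these assumed participation rates, the survey would report roughly half the true prevalence even though the sample was drawn by a formally representative procedure.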
Guidelines for Assessing Causation from Observational Studies Because it is unethical to conduct experimental trials in humans of potentially disease-causing agents, many epidemiologic studies of potentially causal agents are observational, and thus conclusions about the likelihood of causation often have to be made on the basis of observational studies, despite their limitations and potential biases. In recent years, counterfactual, graphical, marginal structural models and structural equation models have begun to be applied to the analysis of possible causal relationships. Although beyond the scope of this article, these models are likely to see increasing applications in observational epidemiology in the future. Over the years, practical guidelines have been developed to be used as tests of whether a causal association exists. The first widely used set of causal criteria was developed by Hill (1965), and has been somewhat modified over time. Not all criteria need to be fulfilled in all instances, nor are all equally important or always applicable. However, taken together, they provide some guidance about whether an association between a given exposure and disease is one of cause and effect. The following include most of those promulgated by the Report of the Advisory Committee to the Surgeon General (1964) and Hill (1965), with some additions. However, their criterion that the association be specific for a given agent and given disease has not been included. Researchers and policy-makers have asserted that the criterion of specificity is inappropriate for exposures such as smoking that have systemic effects and thus are rarely specific to one outcome.
• Strength of association: The measure of association (e.g., relative risk or odds ratio) should be elevated (or decreased for a protective factor), indicating that the exposed are at increased risk of disease compared to the unexposed. The higher the relative risk, the more likely the association is to be causal because confounding is unlikely to be responsible for a very strong association. As a rough rule of thumb, a relative risk or odds ratio of 2 suggests a moderate elevation in risk, and a relative risk or odds ratio of 3 or more is considered a strong association.
• Ruling out alternative explanations: Once it has been determined that an association between an exposure and disease exists, other explanations for the observed association, such as methodological deficiencies and confounding, should be carefully considered and tested against any available data or background information.
• Dose–response relationship: If increasing dose or length of exposure is associated with increasing risk, then the case for causality is considerably enhanced. The absence of a dose–response relationship, however, does not disprove causality because other patterns, such as a threshold effect, could also exist.
• Removal of exposure: If the presence of an exposure increases risk of disease and removing the exposure reduces risk, the likelihood that an association is causal is increased.
• Time order: It should be clear that the exposure caused the disease and not that the disease caused the exposure. This issue is especially relevant in cross-sectional studies, in which prevalent cases and exposures are considered simultaneously. Time order is unique among the causal guidelines in that if the disease can be shown to have preceded the exposure, the exposure cannot have caused the disease.
• Predictive ability: Tentative hypotheses regarding causation that can be shown to predict future occurrences better than alternative hypotheses provide strong support for causality.
• Consistency: If associations of similar magnitude are found in different populations by different methods of study, the likelihood of causality is increased because all studies are unlikely to have the same methodological limitations or idiosyncrasies of the study population.
• Biologic plausibility: When a new finding fits well with current knowledge of the biology of a disease, it is more plausible that it is causal than if a whole new theory must be developed to explain the finding. Another way of enhancing biologic plausibility is through laboratory experiments. However, what occurs in a laboratory or in animals may have limited applicability to free-living humans. Also, the biology may not yet have been sufficiently well studied, even though the epidemiologic evidence is strong.
• Coherence of evidence: The various relationships and findings should make biologic and epidemiologic sense to be considered causal.
• Confirmation in experimental studies: When available, the results of well-designed experiments in which exposures are assigned at random are very convincing because the only factor on which groups differ, except by chance, is the exposure of interest. However, in many circumstances, exposures cannot be ethically or practically assigned at random. In addition, experiments on carefully selected people may have limited relevance to the general, free-living population.
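The strength-of-association and dose–response guidelines above lend themselves to a short numerical sketch. The cohort counts below are hypothetical, the function names are assumptions, and the labels returned by `rough_strength` simply encode the rule of thumb quoted in the list (2 as moderate, 3 or more as strong); they are not a formal classification.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed.

    a: exposed, diseased       b: exposed, not diseased
    c: unexposed, diseased     d: unexposed, not diseased
    """
    # Algebraically equal to (a / (a + b)) / (c / (c + d)).
    return (a * (c + d)) / (c * (a + b))


def odds_ratio(a, b, c, d):
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)


def rough_strength(ratio):
    """Crude encoding of the rule of thumb; the cutoffs are only a rough guide."""
    if ratio >= 3:
        return "strong"
    if ratio >= 2:
        return "moderate"
    return "below the rule-of-thumb thresholds"


# Hypothetical cohort: 40 of 1000 exposed and 10 of 1000 unexposed develop disease.
rr = relative_risk(40, 960, 10, 990)
odds = odds_ratio(40, 960, 10, 990)
print(f"RR = {rr:.2f} ({rough_strength(rr)}), OR = {odds:.3f}")

# Hypothetical dose groups (cases, non-cases), each compared with the unexposed.
unexposed = (10, 990)
dose_groups = {"low": (15, 985), "medium": (25, 975), "high": (40, 960)}
rr_by_dose = [relative_risk(cases, non_cases, *unexposed)
              for cases, non_cases in dose_groups.values()]

# A monotonic rise in relative risk with dose supports, but does not prove, causality.
dose_response = all(lower < higher for lower, higher in zip(rr_by_dose, rr_by_dose[1:]))
print(rr_by_dose, dose_response)
```

A monotonic check like this is deliberately simple. As the list notes, the absence of such a pattern would not rule out causality, because a threshold effect would produce a different shape.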
It should be apparent that decisions on the likelihood of causality are of necessity partly judgmental. What one person may believe is a causal association, another person may not. Lilienfeld (1957) divided the degree of evidence for causation into three levels (Table 6). At the first level, the evidence is considered sufficient for further study. The possibility that the risk of food allergies is increased by alterations in the gut microbiota from increased use of antimicrobial agents falls into this category. At the second level, the evidence is considered sufficient to warrant public health action, even if the causal association has not been definitively established. Many people would put the evidence that a healthy diet protects against certain cancers at this second level. At the third level, the evidence is so strong that the causal association is considered part of the body of scientific knowledge. The evidence that smoking causes lung cancer or that the human immunodeficiency virus causes AIDS is at this level of certainty.
Table 6 Levels of evidence for causation
Level 1: The evidence is sufficient to warrant further investigation
Level 2: The evidence is sufficient for recommending preventive action
Level 3: The evidence is sufficient to say that a causal inference has been proved; this causal hypothesis is included in our body of scientific knowledge
Adapted from Lilienfeld, A., 1957. Epidemiologic methods and inferences in studies of non-infectious diseases. Public Health Rep. 72, 51–60.
Conclusion Observational epidemiology plays an important role in learning about disease causation. Major concerns of epidemiologists undertaking observational epidemiologic studies include using the best sources of data, employing the proper study designs, selecting appropriate study populations, using good methods of measurement, quantifying the magnitudes of associations, controlling for confounding, and detecting effect modification. In practice, it is often not possible to meet all these objectives to the extent desired because people may choose not to participate in a study, optimal measurement may not be feasible, fiscal constraints may affect study design and the nature or extent of data collection, and a variety of other problems may arise. It is important to recognize the effects of these inadequacies in various situations because specific inadequacies can affect study results in different ways. It should be readily apparent that conclusions drawn from observational epidemiologic studies, as with other types of scientific inquiry, are often not final. Results generally require confirmation from additional epidemiologic or laboratory studies, experimental studies, or ascertainment of the effect of removal or modification of the suspected risk factor. What was believed to be a causal association may later be found to be attributable to uncontrolled confounding, and an association thought to be attributable to uncontrolled confounding may turn out to be causal. A causal agent in one population may not operate the same way in another population. The best method of measurement at one point in time may later be supplanted by a better method. In any one study, a reported association may have occurred by chance, especially when many possible associations are being examined. 
Thus, it is essential to keep an open mind as new knowledge accumulates about disease causation from epidemiologic, laboratory, and other types of studies and as attempts are made to replicate the results of even the most carefully executed individual studies. In this way, knowledge about disease causation gradually evolves.
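The point that a reported association may arise by chance when many associations are examined can be made concrete with a simple probability calculation. The sketch below assumes that the tests are independent, that no true associations exist, and that each test uses a conventional significance level of 0.05. These are simplifying assumptions, and the function name is hypothetical.

```python
def prob_any_chance_finding(n_tests, alpha=0.05):
    """Probability of at least one false-positive result among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    print(f"{n_tests:>3} associations examined: "
          f"P(at least one chance finding) = {prob_any_chance_finding(n_tests):.2f}")
```

Under these assumptions, examining 20 unrelated associations gives a better than even chance that at least one will appear statistically significant purely by chance, which is one reason replication matters.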
See also: Clinical Epidemiology; Demography, Epidemiology, and Public Health; Genetic Epidemiology; Social Epidemiology; Surveillance of Disease: Overview.
References
Amadou, A., Torres Mejia, G., Fagherazzi, G., et al., 2014. Anthropometry, silhouette trajectory and risk of breast cancer in Mexican women. Am. J. Prev. Med. 46, S52–S64.
Beral, V., Reeves, G., Bull, D., Green, J., 2011. Breast cancer risk in relation to the interval between menopause and starting hormone therapy. J. Natl. Cancer Inst. 103, 296–305.
Bertisch, B., Franceschi, S., Lise, M., et al., 2013. Risk factors for anal cancer in persons infected with HIV: a nested case-control study in the Swiss HIV Cohort Study. Am. J. Epidemiol. 178, 877–884.
Bleich, S.N., Wolfson, J.A., Vine, S., Wang, Y.C., 2014. Diet-beverage consumption and caloric intake among U.S. adults, overall and by body weight. Am. J. Public Health 104, e72–e78.
Brenner, A.V., Tronko, M.D., Hatch, M., et al., 2011. I-131 dose response for incident thyroid cancers in Ukraine related to the Chornobyl (sic) accident. Environ. Health Perspect. 119, 933–939.
Caan, B.J., Natarajan, L., Parker, B., et al., 2011. Soy food consumption and breast cancer prognosis. Cancer Epidemiol. Biomark. Prev. 20, 854–858.
Cauley, J.A., Palmero, L., Vogt, M., et al., 2008. Prevalent vertebral fractures in black women and white women. J. Bone Miner. Res. 23, 1458–1467.
Clagett, B., Nathanson, K.L., Ciosek, S.L., et al., 2013. Comparison of address-based sampling and random-digit dialing methods for recruiting young men as controls in a case-control study of testicular cancer susceptibility. Am. J. Epidemiol. 178, 1638–1647.
Colditz, G.A., Baer, H.J., Tamimi, R.M., 2006. Breast cancer. In: Schottenfeld, D., Fraumeni Jr., J.F. (Eds.), Cancer Epidemiology and Prevention, third ed. Oxford University Press, New York, pp. 995–1012.
Cotterchio, M., Lowcock, E., Hudson, T.J., Greenwood, C., Galinger, S., 2014. Association between allergies and risk of pancreatic cancer. Cancer Epidemiol. Biomark. Prev. 23, 469–480.
Dantes, R., Mu, Y., Belflower, R., et al., 2013. National burden of invasive methicillin-resistant Staphylococcus aureus infections, United States, 2011. J. Am. Med. Assoc. Intern. Med. 173, 1970–1978.
Dawber, T.R., Meadors, G.F., Moore Jr., F.E., 1951. Epidemiological approaches to heart disease: the Framingham Study. Am. J. Public Health 41, 279–286.
Doll, R., Hill, A., 1952. A study of the aetiology of carcinoma of the lung. Br. Med. J. 2, 1271–1286.
Goh, K.J., Tan, C.T., Chew, N.K., et al., 2000. Clinical features of Nipah virus encephalitis among pig farmers in Malaysia. N. Engl. J. Med. 342, 1229–1235.
Gregg, E.W., Li, Y., Wang, J., et al., 2014. Changes in diabetes-related complications in the United States, 1990–2010. N. Engl. J. Med. 370, 1514–1523.
Hill, A.B., 1965. The environment and disease: association or causation? Proc. R. Soc. Med. 58, 295–300.
Kang, H.K., Cypel, Y., Kilbourne, A.M., et al., 2014. Health VIEWS: mortality study of female US Vietnam era veterans, 1965–2010. Am. J. Epidemiol. 179, 721–730.
Lam, T.K., Freedman, N.D., Fan, J.-H., et al., 2013. Prediagnostic plasma vitamin C and risk of gastric adenocarcinoma and esophageal squamous cell carcinoma in a Chinese population. Am. J. Clin. Nutr. 98, 1289–1297.
Lilienfeld, A., 1957. Epidemiologic methods and inferences in studies of non-infectious diseases. Public Health Rep. 72, 51–60.
Luckhaupt, S.E., Cohen, M.A., Li, J., Calvert, G.M., 2014. Prevalence of obesity among U.S. workers and associations with occupational factors. Am. J. Prev. Med. 46, 237–248.
Miettinen, O.S., 1985. The “case-control” study: valid selection of subjects. J. Chronic Dis. 38, 543–548.
Navar-Boggan, A.M., Pencina, M.J., Williams, K., Sniderman, A.D., Peterson, E.D., 2014. Proportion of US adults potentially affected by the 2014 hypertension guidelines. J. Am. Med. Assoc. 311, 1424–1429.
Report of the Advisory Committee to the Surgeon General of the Public Health Service, 1964. Smoking and Health. U.S. Department of Health, Education, and Welfare, Public Health Service. PHS Publication No. 1103. U.S. Government Printing Office, Washington.
Romundstad, P., Andersen, A., Haldorsen, T., 2001. Cancer incidence among workers in the Norwegian silicon carbide industry. Am. J. Epidemiol. 153, 978–986.
Roux, A., Lulu, S., Waubant, E., Glaser, C., Van Haren, K., 2014. A polio-like syndrome in California: clinical, radiologic, and serologic evaluation of five children identified by a statewide laboratory over a twelve-month period. In: Presented at the 66th Annual Meeting of the American Academy of Neurology, Philadelphia: April 29, 2014.
Schmidt, R.J., Tancredi, D.J., Ozonoff, S., et al., 2012. Maternal periconceptional folic acid intake and risk of autism spectrum disorders and developmental delay in the CHARGE (Childhood Autism Risks from Genetics and Environment) case-control study. Am. J. Clin. Nutr. 96, 80–89.
Waksberg, J., 1978. Sampling methods for random digit dialing. J. Am. Stat. Assoc. 73, 40–46.
Wilson, F.A., Stimpson, J.P., 2010. Trends in fatalities from distracted driving in the United States, 1999–2008. Am. J. Public Health 100, 2213–2219.
Wilson, M.R., Naccache, S.N., Samayoa, E., et al., June 4, 2014. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N. Engl. J. Med. 370 (25), 2408–2417. Epub ahead of print.
Further Reading
Austin, H., Hill, H.A., Flanders, W.D., Greenberg, R.S., 1994. Limitations in the application of case-control methodology. Epidemiol. Rev. 16, 65–76.
Fletcher, R., Fletcher, S.W., Fletcher, G.S., 2014. Clinical Epidemiology, fifth ed. Lippincott Williams & Wilkins, Philadelphia.
Friedman, G.D., 2004. Primer of Epidemiology, fifth ed. McGraw-Hill, New York.
Gordis, L., 2009. Epidemiology, fourth ed. WB Saunders Company, Philadelphia.
Greenland, S., 2001. Ecologic versus individual-level sources of bias in ecologic estimates of contextual health effects. Int. J. Epidemiol. 30, 1343–1350.
Hernán, M.A., Robins, J.M., 2006. Instruments of causal inference: an epidemiologist’s dream? Epidemiology 17, 360–372.
Porta, A. (Ed.), 2014. A Dictionary of Epidemiology, sixth ed. Oxford University Press, New York.
Rothman, K.J., Greenland, S., Lash, T.L., 2008. Modern Epidemiology, third ed. Lippincott Williams & Wilkins, Philadelphia.
Szklo, M., Nieto, F.J., 2014. Epidemiology, beyond the Basics, third ed. Jones and Bartlett, Burlington MA.
Weiss, N.S., Koepsell, T.D., 2014. Epidemiologic Methods. Studying the Occurrence of Illness, second ed. Oxford University Press, New York.