Chapter 2
What are the Common Diseases?
2.1 THE COMMON DISEASES OF HUMANS, A SHORT BUT TERRIFYING LIST
“Not everything that counts can be counted, and not everything that can be counted counts.”
—William Bruce Cameron
There are about 7 billion humans living in the world today, and about 57 million people die each year [1,2]. There are about 312 million persons residing in the U.S. [3,1]. The U.S. Central Intelligence Agency estimates that the U.S. crude death rate is 8.36 per 1000 and the world crude death rate is 8.12 per 1000 [4]. Applied to the U.S. population, this rate translates to about 2.6 million deaths in the U.S. in 2011. This figure is slightly higher than the total U.S. deaths calculated independently from the 2003 National Vital Statistics Report [5]. Authoritative death statistics agree reasonably well with the widely used rule of thumb that about 1% of the human population dies every year (crude death rates near 8 per 1000 correspond to about 0.8% per year). What diseases account for all of these deaths? Let us take a look at the diseases that cause the greatest number of human deaths worldwide.
Worldwide deaths in 2008, from the World Health Organization [2]:
Cause of death                             Deaths
Total deaths worldwide                 56,888,289
1. Cardiovascular diseases             17,326,646
2. Infectious and parasitic diseases    8,721,166
3. Malignant neoplasms                  7,583,252
4. Respiratory infections               3,533,652
5. Diabetes mellitus                    1,255,585
6. Alzheimer and other dementias          539,948
7. Other neoplasms                        188,227
Rare Diseases and Orphan Drugs. http://dx.doi.org/10.1016/B978-0-12-419988-0.00002-X © 2014 Elsevier Inc. All rights reserved.
PART | I Understanding the Problem
U.S. deaths in 2003, from the National Vital Statistics Report [5] (total calculated from the report’s table):
Cause of death                             Deaths
Total deaths in the U.S.                2,512,873
1. Diseases of heart                      596,339
2. Malignant neoplasms                    575,313
3. Chronic lower respiratory diseases     143,382
4. Cerebrovascular diseases               128,931
5. Accidents (unintentional injuries)     122,777
6. Alzheimer’s disease                     84,691
7. Diabetes mellitus                       73,282
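The summary figures in this section are simple ratios, and they can be checked against the two tables. The Python sketch below is illustrative only. The dictionaries are transcribed by hand from the tables. The helpers `deaths_from_crude_rate` and `cumulative_share` are hypothetical names, and `cumulative_share` assumes that each dictionary keeps the listed order, which Python dictionaries do from version 3.7 onward.

```python
# Death counts transcribed from the WHO (2008) and NVSR (2003) tables above.
WORLD_2008 = {
    "Cardiovascular diseases": 17_326_646,
    "Infectious and parasitic diseases": 8_721_166,
    "Malignant neoplasms": 7_583_252,
    "Respiratory infections": 3_533_652,
    "Diabetes mellitus": 1_255_585,
    "Alzheimer and other dementias": 539_948,
    "Other neoplasms": 188_227,
}
WORLD_TOTAL = 56_888_289

US_2003 = {
    "Diseases of heart": 596_339,
    "Malignant neoplasms": 575_313,
    "Chronic lower respiratory diseases": 143_382,
    "Cerebrovascular diseases": 128_931,
    "Accidents (unintentional injuries)": 122_777,
    "Alzheimer's disease": 84_691,
    "Diabetes mellitus": 73_282,
}
US_TOTAL = 2_512_873


def deaths_from_crude_rate(population, rate_per_1000):
    """Expected annual deaths for a population with a given crude death rate."""
    return population * rate_per_1000 / 1000


def cumulative_share(counts, total, top_n):
    """Fraction of all deaths due to the first top_n causes, in listed order."""
    return sum(list(counts.values())[:top_n]) / total


print(f"U.S. deaths at 8.36 per 1000: {deaths_from_crude_rate(312e6, 8.36):,.0f}")  # 2,608,320
print(f"World, top 3 causes: {cumulative_share(WORLD_2008, WORLD_TOTAL, 3):.1%}")  # 59.1%
print(f"U.S., top 3 causes: {cumulative_share(US_2003, US_TOTAL, 3):.1%}")  # 52.3%
print(f"World, top 7 causes: {cumulative_share(WORLD_2008, WORLD_TOTAL, 7):.1%}")  # 68.8%

# Dementia deaths as a fraction of cardiovascular deaths (the "3%" comparison).
ratio = WORLD_2008["Alzheimer and other dementias"] / WORLD_2008["Cardiovascular diseases"]
print(f"Dementia vs. cardiovascular deaths: {ratio:.1%}")  # 3.1%
```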
There is much to be learned from these two short lists. Although there are thousands of human diseases, many of which are capable of causing death, only a few diseases account for the bulk of deaths occurring in populations. For both U.S. deaths and worldwide deaths, the first three conditions on each list account for more than 50% of the total number of deaths. The top seven conditions account for nearly 70% of the total number of deaths worldwide.
2.1.1 Rule—A small number of diseases account for most instances of morbidity or mortality.
Brief Rationale—Pareto’s principle applies to biological systems.
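A short numerical sketch shows how strongly a Zipf-type distribution concentrates its mass in a few categories. The sketch assumes an idealized Zipf law with exponent 1 over a hypothetical list of 1,000 causes, in which the k-th ranked cause has a frequency proportional to 1/k. It is not fitted to real disease data, and the helper names are illustrative.

```python
# Hypothetical illustration: an idealized Zipf law, not real disease data.


def zipf_weights(n_items, exponent=1.0):
    """Relative frequency of each ranked item: rank k gets 1 / k**exponent."""
    return [1.0 / k**exponent for k in range(1, n_items + 1)]


def top_share(weights, fraction):
    """Share of the total mass held by the top `fraction` of ranked items."""
    cutoff = max(1, int(len(weights) * fraction))
    return sum(weights[:cutoff]) / sum(weights)


def fraction_above_mean(weights):
    """Fraction of items whose frequency exceeds the arithmetic mean."""
    mean = sum(weights) / len(weights)
    return sum(1 for w in weights if w > mean) / len(weights)


weights = zipf_weights(1000)  # 1,000 hypothetical causes, ranked by frequency
print(f"Share held by the top 20% of causes: {top_share(weights, 0.20):.1%}")  # 78.5%
print(f"Causes more frequent than the mean: {fraction_above_mean(weights):.1%}")  # 13.3%
```

Under these assumptions, the top 20% of causes carry close to 80% of all cases. Only about one cause in eight exceeds the mean, so the mean says little about a "typical" cause.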
Pareto’s principle, also known as the 80/20 rule, holds that in many real-world distributions a small number of causes account for the vast majority of observed instances (see Glossary item, Pareto’s principle). For example, a small number of rich people account for the majority of wealth. A few troublemakers in a classroom may draw the bulk of a teacher’s attention. Just two countries, India and China, account for 37% of the world population. Within most countries, a small number of provinces or geographic areas contain the majority of the population (e.g., the east and west coasts of the U.S.). A small number of books, compared with the total number of published books, account for the majority of book sales. In the realm of medicine, a small number of diseases account for the bulk of human morbidity and mortality. For example, two common types of cancer, basal cell carcinoma of the skin and squamous cell carcinoma of the skin, account for about 1 million new cases of cancer each year in the U.S. This is approximately as many new cases as all other types of cancer combined. Sets of data that follow Pareto’s principle are often said to follow a Zipf distribution, or a power-law distribution (see Glossary item, Zipf distribution). Standard statistical descriptors describe these distributions poorly, because the distributions are not symmetric around a central peak. Simple measurements such as the average and standard deviation have very little practical meaning when applied to Zipf distributions. Furthermore, none of the statistical inferences built upon the assumption of a normal (Gaussian) distribution will apply to data sets that follow Pareto’s principle.
2.1.2 Rule—Funding for disease research adheres to Pareto’s principle.
Brief Rationale—The diseases that kill the greatest number of individuals receive the highest levels of funding, in the simple-minded expectation that advances against common diseases will provide the greatest benefit to society.
National Institutes of Health (NIH) spending by institute for the 2010 budget year, based on data from the American Association for the Advancement of Science [6]:
Institute                                               Budget
National Cancer Institute                               $5.295 billion
National Heart, Lung, and Blood Institute               $3.213 billion
National Institute of Allergy and Infectious Diseases   $4.690 billion
Total NIH budget                                        $32.127 billion
The NIH comprises 27 institutes and centers. The top three institutes account for 41% of the NIH budget. Let us look at cancer funding in relation to cancer incidence. The table below gives National Cancer Institute funding for 2010 in millions of dollars, with cancer sites listed in decreasing order of incidence [7]:
Cancer site             Funding
Lung                      281.9
Prostate                  300.5
Breast                    631.2
Colorectal                270.4
Bladder                    22.6
Melanoma                  102.3
Non-Hodgkin lymphoma      122.4
Kidney                     44.6
Thyroid                    15.6
Endometrial (uterine)      14.2
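Two funding shares can be recomputed from the two tables. The first is the 41% of the NIH budget taken by the three institutes listed. The second is the 82% of top-10 cancer-site funding claimed in the next paragraph for the four most common sites. The sketch below is illustrative. The figures are transcribed by hand, and `share_of_total` is a hypothetical helper.

```python
# NIH appropriations, FY2010, billions of dollars (table above, [6]).
NIH_2010 = {
    "National Cancer Institute": 5.295,
    "National Heart, Lung, and Blood Institute": 3.213,
    "National Institute of Allergy and Infectious Diseases": 4.690,
}
NIH_TOTAL = 32.127

# NCI funding, 2010, millions of dollars, in decreasing order of incidence [7].
CANCER_FUNDING_2010 = [
    ("Lung", 281.9), ("Prostate", 300.5), ("Breast", 631.2), ("Colorectal", 270.4),
    ("Bladder", 22.6), ("Melanoma", 102.3), ("Non-Hodgkin lymphoma", 122.4),
    ("Kidney", 44.6), ("Thyroid", 15.6), ("Endometrial (uterine)", 14.2),
]


def share_of_total(amounts, total):
    """Fraction of `total` represented by the given amounts."""
    return sum(amounts) / total


nih_top3 = share_of_total(NIH_2010.values(), NIH_TOTAL)

all_sites = [amount for _, amount in CANCER_FUNDING_2010]
top4_incidence = CANCER_FUNDING_2010[:4]
top4_share = share_of_total([amount for _, amount in top4_incidence], sum(all_sites))

# Are the four most common sites also the four best-funded sites?
top4_funding = sorted(CANCER_FUNDING_2010, key=lambda site: site[1], reverse=True)[:4]
same_sites = {name for name, _ in top4_incidence} == {name for name, _ in top4_funding}

print(f"Three listed institutes, share of NIH budget: {nih_top3:.1%}")  # 41.1%
print(f"Four most common sites, share of top-10 funding: {top4_share:.1%}")  # 82.2%
print(f"Same four sites by incidence and by funding: {same_sites}")  # True
```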
Once again, Pareto’s principle applies. The four most common cancer sites are also the top four recipients of funding, and together they receive 82% of the funding provided to the top 10 cancer sites. Whenever we look at death rates, we find that a few common diseases have a disproportionate effect on human mortality. For example, WHO data show that a drop of only about 3% in cardiovascular deaths would save roughly as many lives as eliminating all deaths due to Alzheimer’s disease and all other dementias. Is it any wonder that the bulk of research spending at NIH is directed toward cancer, cardiovascular diseases, and infectious diseases? Actually, there are reasons that weigh against spending the bulk of research funds on common diseases. These reasons will be discussed throughout this book. For now, let us see how the argument for investing in common diseases weakens considerably when we take into account the effect of age at diagnosis on life expectancy.
2.1.3 Rule—The cancers that account for the majority of cancer deaths occur in elderly individuals.
Brief Rationale—Common diseases are caused by cellular events that accumulate or arise over time. Hence, the chance of developing a common disease increases steadily as individuals age.
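The age gradient described in this rule can be quantified from the Cancer Research UK incidence rates [8] used in the example that follows. The sketch below is illustrative. The age-band labels and the `age_ratio` helper are hypothetical names, and the rates are transcribed from the text.

```python
# Cancer incidence per 100,000 population in England, by sex and age band [8].
INCIDENCE_PER_100K = {
    ("male", "0-4"): 19.3,
    ("female", "0-4"): 17.4,
    ("male", "85+"): 3393.5,
    ("female", "85+"): 2095.3,
}


def age_ratio(sex, old="85+", young="0-4"):
    """How many times higher incidence is in the older band than the younger."""
    return INCIDENCE_PER_100K[(sex, old)] / INCIDENCE_PER_100K[(sex, young)]


for sex in ("male", "female"):
    print(f"{sex}: {age_ratio(sex):.0f} times")
# male: 176 times
# female: 120 times
```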
For example, let us compare the incidence of cancer in children aged 4 and under with the incidence of cancer in adults aged 85 and older. In England, males and females aged 4 and under have a cancer incidence of 19.3/100,000 population and 17.4/100,000 population, respectively [8]. In this age group, females have a lower rate of cancer than males. In the same statistical survey, the incidence of cancer in males and females aged 85 and older is 3393.5/100,000 and 2095.3/100,000, respectively [8]. Thus, males and females aged 85 and older have a cancer incidence 176 and 120 times that of male and female children aged 4 and under, respectively.
2.1.4 Rule—The most common causes of death, if eliminated entirely, will not greatly increase human life expectancy.
Brief Rationale—Elderly individuals who do not die from one common disease will likely die from some other common disease.
In 1978, Tsai and coworkers calculated the increase in life expectancy that would occur if cancer were eliminated as a human disease. They predicted that eliminating cancer would extend human life by no more than 2.5 years [9]. Readers may be surprised that finally winning the war against cancer would add so little to life expectancy. We would all appreciate having an average of 2.5 years added to our lifespans. Even so, differences in life expectancy of 2.5 years or more already exist among populations living in different countries. For example, life expectancy in the U.S. is 78.6 years. The life expectancies in Canada and Italy are 81.6 and 81.9 years, and the life expectancies in Australia and Japan are 82.0 and 84.2 years [10]. Had we been born in these countries, or in any of the developed European countries, our life expectancies would be extended by about as much as we might expect to gain by eliminating cancer.
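The life-expectancy comparison reduces to a few subtractions. The sketch below compares each country's figure with the U.S. figure and with the 2.5-year upper bound that Tsai and coworkers estimated for eliminating cancer. The constants are transcribed from the text, and `gaps_vs_us` is a hypothetical helper.

```python
# Life expectancy at birth, in years, as quoted from the CIA World Factbook [10].
US_LIFE_EXPECTANCY = 78.6
PEER_COUNTRIES = {"Canada": 81.6, "Italy": 81.9, "Australia": 82.0, "Japan": 84.2}

# Upper-bound gain in years from eliminating cancer, per Tsai and coworkers [9].
CANCER_ELIMINATION_GAIN = 2.5


def gaps_vs_us(countries, us_value):
    """Years of life expectancy each country has over the U.S. figure."""
    return {name: round(value - us_value, 1) for name, value in countries.items()}


for country, gap in gaps_vs_us(PEER_COUNTRIES, US_LIFE_EXPECTANCY).items():
    comparison = "at least" if gap >= CANCER_ELIMINATION_GAIN else "less than"
    print(f"{country}: +{gap} years ({comparison} the {CANCER_ELIMINATION_GAIN}-year cancer gain)")
```

Rounding to one decimal place removes floating-point noise such as 2.9999999999999858 before the comparison is printed.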
Why are the benefits so small? It comes down to Pareto’s principle. Most cancer deaths are caused by a few common cancers, and these cancers occur almost exclusively in elderly patients. The rare cancers receive a small portion of NIH funding, but they strike children in disproportionate numbers.
2.2 THE RECENT DECLINE IN PROGRESS AGAINST COMMON DISEASES
“Despite large public investments in genome-wide association studies of common human diseases, so far, few gene discoveries have led to applications for clinical medicine or public health.”
—Idris Guessous, Marta Gwinn and Muin J. Khoury in 2009 [11]
We like to think that we are living in an era of rapid scientific advancement, more rapid than any prior era in human history. This is nonsense. In the field of medicine, progress in the 50 years between 1913 and 1963 greatly exceeded progress in the 50 years between 1964 and 2014. By 1921, we had insulin. Over the next four decades, we developed antibiotics effective against an enormous range of infectious diseases, including tuberculosis. Civil engineers prevented a wide range of common diseases by providing a clean water supply and improved waste management. Safe methods of preserving food, such as canning, refrigeration, and freezing, saved countless lives. In 1941, Papanicolaou introduced the smear technique to screen for precancerous cervical lesions. Screening brought a 70% drop in the death rate from uterine cervical cancer, formerly one of the leading causes of cancer deaths in women (see Glossary item, Precancer, Precancerous condition). By 1947, we had overwhelming epidemiologic evidence that cigarettes caused lung cancer. The first polio vaccine and the invention of oral contraceptives came in 1954. By the mid-1950s, sterile surgical technique was widely practiced, bringing a precipitous drop in post-surgical and post-partum deaths. The elucidation of the molecular basis of sickle cell anemia came in 1956 [12,13]. The major discoveries of the fundamental chemistry and biology of DNA came in the 1950s. Perhaps the greatest advances against the common diseases in the past several decades have been in the realm of heart disease. The use of statins to prevent heart attacks and strokes, improvements in cardiac surgery, and the use of stents to open narrowed arteries are major therapeutic success stories (see Glossary item, Brain attack).
Nonetheless, few would dispute that the benefits of these interventions are dwarfed by the benefits enjoyed by individuals who adopted healthy eating habits, exercised regularly, attained a trim habitus, and avoided smoking. All of these sensible life choices were available before 1950. The National Cancer Institute is the largest of the research institutes at the NIH, receiving about 16% of the total NIH budget in 2010. Despite intense effort by generations of medical scientists, the cancer death rate today is about the same as it was in 1970 [14]. The cancer death rate has dropped from the last decade of the twentieth century to the present, but that drop was preceded by a rise from 1970 to the early 1990s. This two-decade rise and subsequent two-decade drop followed the rise and subsequent fall of smoking. Countries in which smoking declined earlier than in the U.S. saw their cancer death rates drop earlier than the U.S. rate did. Countries in which smoking is on the increase have increasing rates of cancer death. For the common cancers (lung, colon, prostate, breast, pancreas, esophagus), progress has been impressive in extending survival times after diagnosis. The overall death rate from the common cancers, however, has not changed appreciably. The Human Genome Project was a massive bioinformatics project in which multiple laboratories contributed to sequencing the 3 billion base pairs of the full, haploid human genome (see Glossary item, Haploid). The project began its work in 1990, a draft human genome was prepared in 2000, and a nearly complete genome was announced in 2003, marking the start of the so-called post-genomics era. One of the purposes of the project was to find the genetic causes of common diseases. We have learned much about the genetics of the common diseases, but most of it has taught us that their genetics is much more complex than we had anticipated. Common diseases are associated with hundreds of gene variations, and the gene variations that we have found explain only a small portion of the observed heritability of common diseases [15,16]. Early studies using polygenic variants to predict the risk of developing common diseases have not been clinically useful [15]. If the rate of scientific accomplishment depended upon the number of scientists on the job, you would expect progress to be accelerating, not decelerating.
According to the National Science Foundation, 18,052 science and engineering doctoral degrees were awarded in the U.S. in 1970. By 1997, that number had risen to 26,847, nearly a 50% increase in the number of graduates at the highest level of academic training [17]. This growing workforce of scientists has failed to advance science at the rates achieved by the smaller workforce of an earlier era. The overall rate of medical progress has slowed over the past half century, but research funding has accelerated. In 1953, according to the National Science Foundation, the total U.S. expenditure on research and development was $5.16 billion, expressed in current (not inflation-adjusted) dollars. By 1998, that number had risen to $227.173 billion, more than a 40-fold increase in research spending [17]. There has not been a commensurate 40-fold increase in scientific discoveries. The U.S. Department of Health and Human Services has published a sobering document entitled “Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products” [18]. The authors note that fewer and fewer new medicines and medical devices are reaching the Food and Drug Administration. Significant advances in genomics, proteomics, and nanotechnology have not led to equivalent advances in the treatment of common diseases. The last quarter of the twentieth century has been described as the “era of Brownian motion in health care” [19]. Wurtman and Bettiker, in their review of medical treatments, commented: “Successes have been surprisingly infrequent during the past three decades. Few effective treatments have been discovered for the diseases that contribute most to mortality and morbidity” [20].
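The National Science Foundation growth figures can be checked with two small helpers. The spending figures are in current (nominal) dollars, so the fold change below includes inflation and overstates growth in real terms. The helper names are illustrative.

```python
# Figures from the National Science Board, Science & Engineering Indicators 2000 [17].
DOCTORATES = {1970: 18_052, 1997: 26_847}
RD_SPENDING_BILLIONS = {1953: 5.16, 1998: 227.173}  # current (nominal) dollars


def percent_increase(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100


def fold_change(start, end):
    """Ratio of the end value to the start value."""
    return end / start


phd_growth = percent_increase(DOCTORATES[1970], DOCTORATES[1997])
rd_fold = fold_change(RD_SPENDING_BILLIONS[1953], RD_SPENDING_BILLIONS[1998])

print(f"S&E doctorates, 1970 to 1997: +{phd_growth:.1f}%")  # +48.7%
print(f"R&D spending, 1953 to 1998: {rd_fold:.1f}-fold (nominal)")  # 44.0-fold
```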
2.3 WHY MEDICAL SCIENTISTS HAVE FAILED TO ERADICATE THE COMMON DISEASES
“One does not discover new lands without consenting to lose sight of the shore for a very long time.”
—Andre Gide
Suppose we lived in a society where every adolescent and adult smoked two or three packs of cigarettes every day. The incidence of lung cancer, chronic obstructive pulmonary disease, emphysema, and other smoking-associated disorders would skyrocket. Still, we would be less likely to associate smoking with common diseases than we would be if only a small proportion of society smoked. The cornerstone of research into the causal mechanisms of disease is the comparison of disease occurrence in a group of individuals who share a particular trait (e.g., smoking) against a group of individuals who lack the trait (e.g., non-smokers). When everyone smokes, there is no basis for comparison. Suppose there were familial clusters of lung cancer (i.e., some families at higher risk than others). You could start checking whether high-risk families have certain sets of genes that account for lung cancer heritability. Imagine that you start finding hundreds of gene variants that seem to separate the high-risk families from the low-risk families. How do you begin to determine which of those genes contribute to the pathogenesis of lung cancer? Keep in mind that high-risk families are small study populations, so you may find it impossible to assign any statistical significance to your findings. In the last decades of the twentieth century, scientists hoped that the common diseases, like the rare diseases, were each caused by a single, disease-specific genetic mutation. Once the mutation was found, it could be targeted with a drug. Most scientists today will admit that the common diseases of humans are much more complex than they had ever imagined.
2.3.1 Rule—We may have reached the limit of what we can understand about the common diseases through direct genetic studies.
Brief Rationale—The common diseases of humans are complex, and biological complexity cannot be calculated, predicted, or solved, even with supercomputers.
An objective review of the genetics of common diseases yields only bad news. With no exceptions, the common diseases are genetically complex. Attempts at predicting the behavior of common diseases, based on detailed, yet incomplete, knowledge of their complex genetic attributes, have led to failure after failure [21–23].
Undeterred, data analysts believe that, with the right algorithm and the right supercomputer, the behavior of the common diseases can be predicted despite their complexity. This belief rests, in no small part, on a particular assumption about organisms and cells. The assumption is that they behave much like non-biological devices composed of many parts, each part performing some well-defined function according to well-defined laws of physics, and the parts interacting to produce a predictable and repeatable effect. Physicians have bought into this fantasy. When a sampling of physicians was asked to rank the areas in which they needed additional genetics training, their number one choice was the “genetics of common disease” [24,25].
2.3.2 Rule—Biological systems are much more complex than naturally occurring non-biological systems (e.g., galaxies, mountains, volcanoes) and man-made physical systems (e.g., jet airplanes, computers).
Brief Rationale—The components of biological systems, unlike the components of non-biological systems, have multiple functions, dependencies, and regulatory systems. We cannot predict how any single component of a biological system will react under changing physiologic conditions.
The grim truth is that biological systems are nothing like man-made physical systems. When an engineer builds a radio, she knows that she can assign names to components, and that each component can be relied upon to behave in a manner characteristic of its type. A capacitor will behave like a capacitor, and a resistor will behave like a resistor. The engineer need not worry that the capacitor will behave like a semiconductor or an integrated circuit. What is true for the radio engineer does not hold true for the biologist [26]. In biological systems, components change their functions depending on circumstances. For example, cancer researchers discovered a protein that plays an important role in the development of cancer. This protein, p53, was once considered to be the primary cellular driver of human malignancy. When p53 was mutated, cellular regulation was disrupted, and cells proceeded down a slippery path leading to cancer. Over the past few decades, as more information was obtained, cancer researchers have learned that p53 is just one of many proteins that play a role in carcinogenesis. Its role changes depending on the species, tissue type, cellular micro-environment, genetic background of the cell, and many other factors (see Glossary item, Carcinogenesis). Under one set of circumstances, p53 may play a role in DNA repair; under another, p53 may cause cells to arrest the growth cycle [26,27]. It is difficult to predict a biological outcome when pathways change their primary functionality based on cellular context. Various mutations in the TP53 gene have been linked to 11 clinically distinguishable cancer-related disorders, and there is little reason to assume that p53 plays the same biological role in all 11 of these disorders [28]. Likewise, the Pelger–Huet anomaly and hydrops-ectopic calcification-motheaten (HEM) are both caused by mutations in the gene coding for the lamin B receptor. The Pelger–Huet anomaly is a morphologic aberration of neutrophils in which the normally multi-lobed nuclei become coffee bean-shaped, or bilobed, with abnormally clumped chromatin. The condition is called an anomaly, rather than a disease, because the affected white cells seem to function adequately despite their physical abnormalities. HEM is a congenital chondrodystrophy characterized by hydrops fetalis (i.e., accumulation of fluid in the fetus) and skeletal abnormalities. It would be difficult to imagine any two diseases as unrelated as Pelger–Huet anomaly and HEM. How could these disparate diseases be caused by mutations in the same gene? As it happens, the lamin B receptor has two separate functions: preserving the structure of chromatin and serving as a sterol reductase in cholesterol synthesis [29]. These two different and biologically unrelated functions in one gene product account for two different and biologically unrelated diseases. A gene’s role may be influenced by other genes, a phenomenon called epistasis (see Glossary item, Epistasis). Likewise, the role of a gene is influenced by the timing of its expression (e.g., at precise moments of organismal development) and by its sequential activation (e.g., preceding or succeeding steps in multiple pathways). The activity of a protein encoded by a gene can be influenced by several factors:
- subtle variations in amino acid sequence
- three-dimensional structure
- chemical modifications of the protein
- the quantity of the protein
- the location of the protein molecules in cells
- the type of cell in which the protein is expressed

Attempts to predict the functional effect of single or multiple gene variations are typically futile [30,31]. The most complex man-made physical systems are laughably simplistic compared with human genetics. The fastest supercomputers cannot cope with networks of systems whose individual objects behave in unpredictable and indescribable ways.
With a few exceptions, the common diseases of humans are products of modern life; hence, they are relatively new diseases. Heart disease, diabetes, obesity, and hypertension were not major scourges of ancient humans. Nor are they common among people today who lack modern conveniences, such as fresh food, comfortable shelter, potable water, and hygienic plumbing. It can be assumed that the sets of gene variants that predispose us to most of the common diseases are old genes faced with new tasks. It is also reasonable to expect that the genes associated with common diseases may vary among human populations living in different environments. If this turns out to be the case, such variations may make an impossible job (determining clinical phenotype from genotype) even harder. Infectious diseases, by contrast, have been around for a very long time. In Chapter 7, we will see that the human genome has evolved to cope with infectious organisms that have specifically evolved to live in our bodies (see Glossary item, Genome).
REFERENCES
1. Total Midyear Population for the World: 1950–2050. United States Census Bureau. Available from: http://www.census.gov/population/international/data/idb/worldpoptotal.php, viewed May 21, 2013.
2. Deaths by Age, Sex and Cause for the Year 2008. World Health Organization. Available from: http://www.who.int/healthinfo/global_burden_disease/estimates_regional/en/index.html, viewed May 19, 2008.
3. U.S. and World Population Clocks. U.S. Census Bureau. Available from: http://www.census.gov/main/www/popclock.html, viewed July 20, 2011.
4. The World Factbook. Central Intelligence Agency, Washington, DC, 2009.
5. Hoyert DL, Heron MP, Murphy SL, Kung H-C. Final data for 2003. Natl Vital Stat Rep 54(13), April 19, 2006.
6. NIH Budgets by Institute and Funding Mechanism, FY 1998–2013. American Association for the Advancement of Science. Available from: http://www.aaas.org/spp/rd/fy2013/health13pTBL.pdf, viewed May 21, 2013.
7. Cancer Research Funding. National Cancer Institute. Available from: http://www.cancer.gov/cancertopics/factsheet/NCI/research-funding, viewed August 7, 2013.
8. Cancer Incidence by Age. Cancer Research UK. Available from: http://www.cancerresearchuk.org/cancer-info/cancerstats/incidence/age/, viewed November 13, 2013.
9. Tsai SP, Lee ES, Hardy RJ. The effect of a reduction in leading causes of death: potential gains in life expectancy. Am J Public Health 68:966–971, 1978.
10. Central Intelligence Agency World Factbook. Rank-order life expectancy at birth. Available from: https://www.cia.gov/library/publications/the-world-factbook/rankorder/2102rank.html.
11. Guessous I, Gwinn M, Khoury MJ. Genome-wide association studies in pharmacogenomics: untapped potential for translation. Genome Med 1:46, 2009.
12. Pauling L, Itano HA, Singer SJ, Wells IC. Sickle cell anemia, a molecular disease. Science 110:543–548, 1949.
13. Ingram VM. A specific chemical difference between globins of normal and sickle-cell anemia hemoglobins. Nature 178:792–794, 1956.
14. Berman JJ. Precancer: The Beginning and the End of Cancer. Jones and Bartlett, Sudbury, 2010.
15. Wade N. A decade later, genetic map yields few new cures. The New York Times, June 12, 2010.
16. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature 461:747–753, 2009.
17. National Science Board. Science & Engineering Indicators—2000. National Science Foundation, Arlington, VA, 2000 (NSB-00-1).
18. Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products. U.S. Department of Health and Human Services, Food and Drug Administration, 2004.
19. Crossing the Quality Chasm: A New Health System for the 21st Century. Quality of Health Care in America Committee, eds. Institute of Medicine, Washington, DC, 2001.
20. Wurtman RJ, Bettiker RL. The slowing of treatment discovery, 1965–1995. Nat Med 2:5–6, 1996.
21. Janssens AC, van Duijn CM. Genome-based prediction of common diseases: advances and prospects. Hum Mol Genet 17:166–173, 2008.
22. Ioannidis JP. Is molecular profiling ready for use in clinical decision making? Oncologist 12:301–311, 2007.
23. Venet D, Dumont JE, Detours V. Most random gene expression signatures are significantly associated with breast cancer outcome. PLoS Comput Biol 7:e1002240, 2011.
24. Calefato JM, Nippert I, Harris HJ, Kristoffersson U, Schmidtke J, Ten Kate LP, et al. Assessing educational priorities in genetics for general practitioners and specialists in five countries: factor structure of the Genetic-Educational Priorities (Gen-EP) scale. Genet Med 10:99–106, 2008.
25. Julian-Reynier C, Nippert I, Calefato JM, Harris H, Kristoffersson U, Schmidtke J, et al. Genetics in clinical practice: general practitioners’ educational priorities in European countries. Genet Med 10:107–113, 2008.
26. Madar S, Goldstein I, Rotter V. Did experimental biology die? Lessons from 30 years of p53 research. Cancer Res 69:6378–6380, 2009.
27. Zilfou JT, Lowe SW. Tumor suppressive functions of p53. Cold Spring Harb Perspect Biol 1:a001883, 2009.
28. Vogelstein B, Lane D, Levine AJ. Surfing the p53 network. Nature 408:307–310, 2000.
29. Waterham HR, Koster J, Mooyer P, van Noort G, Kelley RI, Wilcox WR, et al. Autosomal recessive HEM/Greenberg skeletal dysplasia is caused by 3-beta-hydroxysterol delta(14)-reductase deficiency due to mutations in the lamin B receptor gene. Am J Hum Genet 72:1013–1017, 2003.
30. Chi YI. Homeodomain revisited: a lesson from disease-causing mutations. Hum Genet 116:433–444, 2005.
31. Gerke J, Lorenz K, Ramnarine S, Cohen B. Gene environment interactions at nucleotide resolution. PLoS Genet 6:e1001144, 2010.