CHAPTER 8

The Rationale and Challenges of Molecular Medicine

8.1 SUMMARY

An important application of establishing appropriate genomic and evolutionary theories is bridging the translational gap between basic research and molecular medicine. Following the success of the Human Genome Project, various large-scale -omics projects have promised to revolutionize medicine; in particular, "precision medicine" has become a buzzword. In this chapter, a brief history of molecular medicine, as well as its challenges and prospects, is reviewed. Drawing on the case study of p53 research, the relationships among stress-induced variation, cellular adaptation, and its trade-offs are summarized in the context of disease formation. Moreover, the Future Direction section discusses crucial upcoming issues in molecular medicine: increased bio-uncertainty; relationships between big data and theories; biomarker development; and educational improvements in biomedical science, including scientific policy along with essential knowledge structures, scientific culture, and professionalism.
8.2 A BRIEF HISTORY: THE PROMISES OF MOLECULAR MEDICINE

According to the National Cancer Institute (NCI), molecular medicine is defined as "A branch of medicine that develops ways to diagnose and treat disease by understanding the way genes, proteins, and other cellular molecules work. Molecular medicine is based on research that shows how certain genes, molecules, and cellular functions may become abnormal in
Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00008-2
427
Copyright © 2019 Elsevier Inc. All rights reserved.
diseases such as cancer" (NCI). Currently, studies on the genetic basis of disease represent the mainstream of the field, even though research subjects also include enzymes, antibiotics, hormones, carbohydrates, lipids, metals, and vitamins, as well as synthetic, organic, and inorganic polymers. It is thus no surprise that the textbook Molecular Medicine states that "Molecular medicine is the application of gene or DNA based knowledge to the modern practice of medicine" (Trent, 2005). The year 1949 marked the birth of molecular medicine, when Linus Pauling and Harvey Itano, together with their colleagues, published the landmark paper "Sickle cell anemia, a molecular disease," which reported a significant difference between the electrophoretic mobilities of hemoglobin obtained from erythrocytes of normal individuals and those of sickle cell anemic individuals (Pauling et al., 1949). This publication successfully linked a specific disease (sickle cell anemia) to a particular molecular variant (a different form of the metalloprotein hemoglobin in patients' blood) and established sickle cell anemia as a genetic disease, linking a gene to the specific structure of protein molecules. Pauling was one of the most influential scientists of his time and one of the founders of molecular biology. As such, it seems logical that he introduced the molecular medicine concept by illustrating the relationship between gene, protein, and disease phenotype. The journey of searching for the molecular mechanisms (e.g., the genetic basis) of various human diseases, however, started far earlier. Following Mendel's establishment of the theory-based science of genetics in 1865, Sir Archibald Edward Garrod linked alkaptonuria to inborn errors of metabolism (Garrod, 1902).
He correctly assumed that an enzymatic defect in a metabolic pathway was responsible for this phenotype, which became a classical example of using genetics to explain errors of metabolism. As molecular medicine emerged from the application of molecular genetics/genomics to medicine, it is strongly influenced by the concepts and methodologies of genomics and other -omics. Its overall success and limitations mirror those of genomics, as was extensively discussed in Chapters 1 and 2. Some current and future technologies should be mentioned, as they are either commonly used or will likely play an important role in molecular medicine. These platforms include the industrial capability to produce therapeutic proteins; in vitro fertilization methods for reproduction; gene therapy; stem cell therapy; target-specific therapy; prenatal diagnosis; gene mutation screening; immune therapy; organ culture; and CRISPR/Cas9-based targeted genome editing. As with genomics, perhaps the biggest source of excitement for molecular medicine has been the continuous promises that came first from the Human
Genome Project and then from many other large-scale -omics projects. Even before the completion of the Human Genome Project, the field of molecular medicine was convinced that a new era was coming, which would impact every aspect of the field. The following two publications are examples:

The near-completion of the Human Genome Project, which identifies the 3.2 billion base pairs that comprise the human genome (the so-called 'Book of Life'), has exponentially heightened the focus on the importance of molecular studies and how such studies will impact on various aspects of medicine in the 21st century.

Semsarian and Seidman, 2001

The landmark event since the second edition of Molecular Medicine: An Introductory Text has been the completion of the Human Genome Project, which is already living up to the promise that it will provide the framework for important new medical discoveries in the twenty-first century.

Trent, 2005
The above claims actually represent some of the more modest ones, compared with many more exciting predictions mentioned in previous chapters, most of which have not been fulfilled. Nevertheless, a wave of renewed promises continues to come. One of the most popular is the Precision Medicine Initiative launched by then-President Obama in 2015. According to NIH leadership, their understanding of a new initiative on precision medicine was the following:

The concept of precision medicine (prevention and treatment strategies that take individual variability into account) is not new ... But the prospect of applying this concept broadly has been dramatically improved by the recent development of large-scale biologic databases, powerful methods for characterizing patients, and computational tools for analyzing large sets of data. What is needed now is a broad research program to encourage creative approaches to precision medicine, test them rigorously, and ultimately use them to build the evidence base needed to guide clinical practice ...

Collins and Varmus, 2015
Precision medicine soon became a buzzword to replace “personalized medicine” which was popular a few years before and largely overlaps with precision medicine. Also, there are many definitions of precision medicine, but the term broadly refers to the use of molecular diagnostic tools and targeted treatments for individual patients based on genomic (and other -omics), biomarker, or psychosocial characteristics (Ramaswami et al., 2018). Despite its popularity, there have been increasing calls to challenge precision medicine based on the complex reality of medicine and the limitations of current molecular profiling methods (Khoury and Galea, 2016; Joyner et al., 2016). Recently, after examining issues of diagnostic methods, novel therapies, and public health integration, both from
individual patient and general population points of view, some important statements have been made regarding health policy:

Over the past decade, precision medicine (PM) approaches have received significant investment to create new therapies, learn more about disease processes, and potentially prevent diseases before they arise. However, in many ways, PM investments may come at the expense of existing public health measures that could have a greater impact on population health ... We cannot ignore the potential that PM holds for medical progress. However, PM has been disproportionately focused on drug development and strategies for those who have a disease with an intention to improve outcomes, leaving behind the concerns of whole population health.

Ramaswami et al., 2018
Putting the policy issue aside for a moment (while acknowledging that it is a crucial one), it is important to examine whether the "disproportionately focused strategies" themselves will work, through the lens of genome-mediated disease evolution. Knowing the high pressure that the genomic research community has faced to fulfill its promises, it is not surprising to see new initiatives to push translational genomic research and utilize the massive sequencing information for patient care. It is surprising, however, to see the "business as usual" attitude. Besides getting more molecular data and increasing computational power, there has been no call to reexamine the gene-centric genomic framework or how somatic evolution plays a key role in disease initiation and progression. The way of thinking, rather than simply accumulating data, might be the biggest challenge for the precision medicine initiative.
8.3 THE CHALLENGES AND OPPORTUNITIES FOR PRECISION MEDICINE

The success of precision medicine is largely dependent on the precise prediction of phenotype from the genomic profile or other -omics profiles. It is already known that most common diseases cannot be explained by a few key gene mutations. The confidence of achieving precision in medicine rests on a new assumption: that by sequencing more samples, with increased computational power, disease-specific genomic patterns will be identified that are valuable for future medical diagnosis and treatment. The same assumption is also used to rationalize the Cancer Genome Atlas Project, as well as the study of other types of common and complex diseases (see Chapters 1 and 2). It was pointed out that current precision medicine evolved directly from the promises of the Human Genome Project.
Collins (1999) envisioned a genetic revolution in medicine facilitated by the Human Genome Project and described 6 major themes: (1) common diseases will be explained largely by a few DNA variants with strong associations to disease; (2) this knowledge will lead to improved diagnosis; (3) such knowledge will also drive preventive medicine; (4) pharmacogenomics will improve therapeutic decision making; (5) gene therapy will treat multiple diseases; and (6) a substantial increase in novel targets for drug development and therapy will ensue. These 6 ideas have more recently been branded as personalized or precision medicine.

Joyner et al., 2016
The question “What are the key challenges of precision medicine?” indeed belongs to a more general question: “What are the limitations of current genomics, and specifically, the various genome sequencing projects?” As we have extensively discussed in previous chapters, the success of precision medicine is predicted to be low if the current gene-centric framework remains the key principle. From these discussions, there are two key messages regarding the realities we must face. First, if the genomic landscape itself for an individual patient is highly dynamic and cannot be precisely profiled (because of fuzzy inheritance, somatic evolution, and environment-influenced emergence), how can precision medicine work? Secondly, we must search for a new path and attempt to answer the question, how can the genome theory better position precision medicine? In this section, rather than repeating many previously made arguments, a number of case studies will be used to make our point.
8.3.1 The 40-Year Journey of Studying p53: From Certainty to Increased Uncertainty

In discussing efforts to identify and characterize individual molecules, genes, or proteins for use in molecular medicine, one cannot forget the famous p53, undoubtedly one of the most extensively studied genes and proteins in the history of molecular biology! When it was initially described by different groups in 1979, few could have foreseen how important it would be for basic cancer research, how expensive it would be to understand its functions, how confusing it would be to deal with this multifaceted molecule, and how challenging it would be to use it for cancer treatment. An excellent review article published a decade ago by one of the original researchers nicely summarized the history of p53 research, outlining the most important functions of p53 among many:

Thirty years ago, p53 was discovered as a cellular partner of SV40 Large Tumor Antigen, the oncoprotein of this tumor virus. The first decade of p53 research saw the cloning of p53 DNA and the realization that p53 is not an oncogene but a tumor suppressor ... In the second decade, the function of p53, a transcription factor induced
by stress, resulting in cell cycle arrest, apoptosis and senescence, was uncovered. In its third decade new functions were revealed, including regulation of metabolic pathways and cytokines required for embryo implantation. The fourth decade may see new p53-based drugs to treat cancer. What is next is anybody's guess.

Levine and Oren, 2009
One might wonder how the 40-year review will look. Knowing that the fourth decade has failed to deliver p53-based drugs to treat cancer, and that the interactions among p53 and other molecules/pathways have become mind-bogglingly complicated, one thing that is certain is that the p53 story can only become even more uncertain. However, if one examines the status of p53 research through the evolutionary lens, the confusion can be reduced. The key is to focus not solely on p53 itself and its immediate up- and downstream partners, but on the real world of stochastic interaction within the context of emergence and selection. The following research experiments on p53 illustrate the importance of using the correct framework to understand p53's contribution during cancer evolution, a phenomenon that plays a crucial role in establishing the genome theory of somatic cell evolution.

1. p53 is ultimately linked to evolutionary potential through genome instability. To compare tumor cells with normal and defective p53, the human lung cancer cell line H460 and the ovarian carcinoma cell line PA-1 were modified by HPV E6 transfection to generate two pairs of cell lines with p53+/+ and p53-/- status. Following cell harvesting, cytogenetic slide preparation, and spectral karyotyping, the frequencies of nonclonal chromosome aberrations (NCCAs) were scored for each sample. Cells with p53-/- displayed significantly increased frequencies of NCCAs compared with their wild-type p53 counterparts, confirming the importance of p53 to genome instability and evolutionary potential, regardless of the diverse p53-related molecular pathways that may have been involved (Heng et al., 2006a; Heng, 2015) (see Chapters 3 and 4). By further culturing these cells featuring genome instability, many of them formed clonal populations displaying clonal chromosome aberrations, which are likely associated with gene mutations that promote proliferation.
This simple experiment illustrated that system instability is more important than a given cancer gene or pathway, as the stochastic process will select one cancer gene or another, as long as instability is there.

2. Many factors (genomic and environmental alike) can contribute to cancer evolution.
Although the p53 mutation can drastically increase the frequencies of NCCAs, so can many other gene/epigene/chromosomal alterations (like ATM-/-, DMF, and aneuploidy) and environmental factors (like different drug treatments, viral infection, inflammation, culture conditions, and more) (Heng et al., 2006a; Ye et al., 2009). These different factors can work independently or in combination (in either a collaborative or a conflicting manner). Different factors can replace each other when opportunities appear. This agrees with the observation that a large number of gene mutations and other factors can be linked to cancer, but in a complicated fashion. Interestingly, all of these factors can be linked to genome instability, which in turn links them to the stochastic evolutionary process, exemplifying "the transition from certainty of parts to uncertainty of the process selecting parts." This analysis, coupled with additional syntheses, has promoted the evolutionary mechanism of cancer, which unifies the diverse molecular mechanisms under system instability-mediated evolutionary selection (Ye et al., 2009; Heng et al., 2009, 2011a-b, 2013a-b).

3. The two phases of cancer evolution were initially observed in a cellular model of immortalization in which p53-/- was involved. This was confirmed in a mouse model system that lacked initial p53-/- mutations (Lawrenson, 2010; Abdallah et al., 2014). Later, the increased capability to induce genome chaos was also linked to p53-/- status (Liu et al., 2014), which led to the realization that reduced genome constraint is important for the generation of new systems through rapid genome reorganization (genome chaos). During the phase of macrocellular evolution, transcriptome dynamics are very high, and an individual gene mutation is much less important than genome reorganization in terms of timely survival. During the microevolutionary phase, however, the gene's power becomes more visible in the context of clonal expansion.
This experiment supports the viewpoint that different genomic mechanisms are dominant in different phases of cancer evolution, further suggesting that macroevolution does not seem to be caused by the accumulation of microevolution over time. Together, the above experiments, and more importantly, the syntheses, demonstrated that a new strategy is needed to understand the mechanism of p53 or other molecules in the context of cancer evolution. Current molecular characterization of p53, in contrast, is mainly in the context of different pathways which often generate large amounts of conflicting data, as most of these data are from a “parts point of view.” In reality, many changes at the gene
level might be irrelevant for cancer evolutionary mechanisms, at least during the rapid evolutionary phase. With this understanding, it is no longer puzzling why it is so challenging to define p53's functions and to apply them to the treatment of cancer. Namely, there is no defined function of this molecule in evolution (even though scientists can define its function in isolation); instead, all of p53's functions are context-dependent. It is no wonder that there is increased concern about the status of p53 research, particularly in regard to its implications for medicine. Indeed, the issue of "the paradox of p53" has entered into discussions:

As p53 biology continues to surprise, the question of how to efficiently harness the modulation of p53 activities for therapeutic benefit remains tantalizingly unanswered.

Kruiswijk et al., 2015

Unlike the rather stereotypic image by which it was portrayed ..., p53 is now increasingly emerging as a multifaceted transcription factor that can sometimes exert opposing effects on biological processes. This includes pro-survival activities that seem to contradict p53's canonical proapoptotic features, as well as opposing effects on cell migration, metabolism, and differentiation ... Deciphering the mechanisms by which p53 determines which hubs to engage ... remains a major challenge ... the "paradox" of p53 is still far from being resolved. Can we develop the computational, technological, and biological tools to tackle this "super hub" challenge? ... Only the next 35 years will tell. But be ready for new surprises!

Aylon and Oren, 2016
Interestingly, even though the issue of evolutionary selection is mentioned in some pieces that search for new frontiers in p53 research, most of the questions still focus on how to handle the molecular characterization of p53, albeit from a big-data perspective. The critical framework of rethinking genomics and evolution is not on the table. The following analyses, statements, and questions are thus crucial for addressing the paradoxes of p53 in the light of the genome theory of cellular evolution.

1. The "parts" (p53) characterization is extremely limited when the genome context that defines the function of parts is highly dynamic and less predictable. The function of p53 within the context of the genome is analogous to the function of a brick within the context of a building: the function of a brick depends on the bricks around it, which make up various types of structures. Depending on its context, p53 can have many functions. The key is to study the context of the building, rather than a specific brick. Unfortunately, so far, in studies of p53's function, the
issue of different karyotypes has never appeared in the work of most molecular researchers.

2. Studies of "part"-specific interactions (such as p53 DNA-binding specificity) should provide clarity in regard to how p53 and its partners bind. However, because a large number of substrates can be involved, the nonspecificity that p53 faces is overwhelming. There are nearly unlimited possibilities for the interactions of p53 with other molecules, making its predictability very low. For example, there are more than 2000 different mutant p53 proteins, known as the p53 mutome, which affect the interaction of p53 with DNA (Stiewe and Haran, 2018). Nevertheless, such stochasticity works well for a biological system undergoing evolution, which is unfortunate for researchers and their preference for certainty.

3. Emergence is the key. One can characterize thousands of agents in artificially created linear models, but different initial conditions, as well as the evolutionary context, ultimately define the limitations of such basic research. The key is that we cannot take heterogeneity out of the system for the sake of research, and we cannot continue translational research without the evolutionary context. Furthermore, many key molecular methodologies are problematic, including the various mouse models (Heng, 2015). These methods focus on average profiles while ignoring the outliers, focus on the short-term molecular response while ignoring the long-term phenotypic consequences, and assume that what is "good" or "bad" in the short term will lead to long-term "benefit" or "harm," respectively.

4. Everything is linked. This reality also applies to p53. In the past 40 years, p53 studies have touched on many aspects of biological systems, as reflected by nearly 100,000 publications. Despite all of the money and research effort involved, how to apply the molecular knowledge of p53 to the clinic is still a big unknown.
The last decade has brought further increased uncertainty to p53 research through various -omics. Now that p53 is linked to the active fields of metabolomics, stem cell research, and epigenetics, this complexity will only continue to increase. This leads to a practical policy question: when do we say that enough is enough, in terms of continuing to characterize the potential linkages between p53 and other molecules? This is an especially relevant question when considering that most of the cellular systems used (including animal models and clinical samples) involve karyotype dynamics, which means that the genomic context in these systems is constantly changing. It will likely take much longer than 40 years just to know how complicated this issue will be. An even more profound question for the research community is what should be done about the over 20,000 other genes. Should we duplicate the 40-year story of p53 for most of them as well? p53 alone has cost tens of billions of
dollars; imagine how many resources would be needed to build up comparable molecular knowledge of these other genes (with limited clinical implications). This question is important and deserves serious thought and action from any serious scientist. In conclusion, we can continue the effort of chasing the dynamics of parts interactions, gather more information at the parts level, and publish another 100,000 papers, while establishing little that is useful for the clinic. Alternatively, we can start a serious discussion about the direction of molecular medicine in general and, at the same time, vigorously search for better frameworks that can guide future research.
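The contrast between system instability and any single gene, as described in experiment 1 of Section 8.3.1, can be illustrated with a toy simulation. This is only a minimal sketch under invented assumptions: the instability rates, division counts, and the equation of "one NCCA check per division" are hypothetical illustrations, not data or parameters from the cited NCCA studies.

```python
import random

# Toy model (hypothetical numbers, for illustration only): each cell division
# may produce a nonclonal chromosome aberration (NCCA) with a probability set
# by the system's instability rate. A more unstable system yields more
# karyotype variants for selection to act on, regardless of which specific
# gene or pathway produced the instability.

def simulate_nccas(instability_rate, divisions, seed=0):
    """Count how many of `divisions` cell divisions produce an NCCA."""
    rng = random.Random(seed)  # seeded for reproducibility
    return sum(rng.random() < instability_rate for _ in range(divisions))

# Illustrative rates standing in for a stable vs. an unstable system
# (e.g., a p53 wild-type vs. a p53 knockout line); values are invented.
stable = simulate_nccas(instability_rate=0.05, divisions=1000)
unstable = simulate_nccas(instability_rate=0.30, divisions=1000)

print(f"NCCA frequency, stable system:   {stable / 1000:.2f}")
print(f"NCCA frequency, unstable system: {unstable / 1000:.2f}")
```

The point of the sketch is that the observable (NCCA frequency) depends only on the overall instability rate, not on which of many interchangeable factors produced it, which is the sense in which "different factors can replace each other."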
8.3.2 The Relationship Between Stress, Variation, Adaptation and Trade-Off, and Disease

To understand why the genomic and environmental context is so important for defining the significance of individual molecules in molecular medicine, it is necessary to define what disease is in genomic and evolutionary terms. It was recently stated:

The term disease broadly refers to an impairment of the normal physiological function of a tissue/organ, organ system, or of the body and mind, in the context of genetic or developmental errors and unfavorable environmental factors (i.e., infection, poisons, and nutritional deficiency or imbalance). Disease is often associated with specific physiological responses and/or pathological changes caused by stress (Heng, 2008, 2013c, 2015; Heng et al., 2016a). Based on the appreciation of the ultimate importance of genetic heterogeneity to the human species' adaptation and survival, we would like to define disease as genetic-environment interaction-generated variable phenotypes, which display functional disadvantages, discomfort, and/or harmfulness, when they are less fit in the current environment (note, nevertheless, that some potential benefits could be achieved in very different environments).

Heng et al., 2016b (with permission from Wolters Kluwer-Medknow)
From the above definitions of disease, a few key concepts need to be highlighted based on the viewpoints of genome theory. The meanings of some frequently used terms, such as "stress," "variants," "environment," and "interaction," will be discussed in the context of cellular evolution.

1. Stress is not only a double-edged sword for health but also an essential condition for any living system to exist and evolve. Stress can be linked to various diverse molecular mechanisms. Although the word "stress" is often associated with unhealthiness by the general public, it is increasingly appreciated that stress can be bad or good, as many important bioprocesses, such as
development and evolution, depend on it (Horne et al., 2014; Heng, 2017b). Therefore, upcoming research should focus on how to maintain system balance rather than on how to avoid all stress (Zhang et al., 2005; Horne et al., 2014). The following points relate to how to understand stress in molecular medicine:

(a) Any "good" molecular treatment can become a stress to the system; under certain conditions, it can become deadly. The fact that adverse drug reactions remain a major cause of death (Bonn, 1998) reminds us of the challenges that will almost certainly arise when many more molecular interventions are introduced in the future. In addition, changes at lower levels will generate stress at higher levels, and vice versa. However, molecular medicine has traditionally paid more attention to successes at the molecular level. For example, in cancer treatment, some costly molecular treatments can achieve a good response at the molecular or tumor levels but have failed to improve quality of life or prolong life.

(b) Different levels of the biosystem (such as cell, tissue, or individual) may have different responses to a given treatment/stress, which demands that attention be given not only to what occurs at the molecular level. Pushing the maximal dosage of chemotherapy during treatment, for example, could have a very negative impact on some patients, as high-dosage treatment can induce genome chaos. Many molecular treatments can have an impact on the patient's mental health, which can in turn affect the stability of lower levels of the biosystem.

(c) One rationale for identifying a molecular magic bullet (to target a key gene/protein or pathway) is the assumption that stress and the stress response comprise a specific event and that there is a linear, causative relationship between this molecular target and the disease phenotype. As has been discussed, this is often not the case in cancer and most common diseases.
Significantly, the cellular stress response is by and large less specific, as many reported specific responses among molecules can only be observed for a very short period of time and, furthermore, are limited to only some linear models. When the entire genomic network is under investigation, this specificity is unlikely to be observed again (Stevens et al., 2013a-b, 2014). This point has been addressed recently:

The cellular stress response is a reaction to any form of macromolecular damage that exceeds a set threshold, independent of the underlying cause, and the fragmented knowledge of the stress response needs to be unified at the conceptual level to explain its universality for many different species and types of stress (Kultz, 2003). In fact,
many aspects of the cellular stress response are not stressor-specific, because cells monitor stress based on macromolecular damage without regard to the type of stress that causes such damage (Kultz, 2005). There is also limited pathway specificity for the stress response during somatic cell evolution, especially under pathological conditions where stochastic genetic alteration plays an important role.

Horne et al., 2014
To wit, while the wild-type p53 gene can restore some features in p53 knockout cells in some linear models (illustrating the specific function of p53), the strategy of restoring p53 in cancer patients (a setting in which nonlinear systems are involved) has so far failed. The idea of simply putting the wild-type p53 gene back into the unstable system (which was caused by p53-/- in the first place) will not work in highly dynamic evolutionary systems. On the contrary, wild-type p53 can generate further stress for the already unstable system. This gap between research (focused on parts) and the reality of the clinic (determined by systems) once again demonstrates a simple truth: an adaptive system is not like a clock in which specific parts can be replaced without changing the overall system.

2. Multiple levels of genomic/epigenetic variations: there are more important genomic elements than genes. When talking about genomic/nongenetic variable elements or variations, one is mostly referring to gene mutations/splicing, copy number variations, and epigenetic variations (Feuk et al., 2006; Kundaje et al., Roadmap Epigenomics Consortium, 2015; Heng, 2017a). However, the most important genomic variants, karyotypic variations, are often left without the deserved amount of study. As stated, the short-term focus of the Precision Medicine Initiative is, using cancer as an example, to translate knowledge into clinical practice (Collins and Varmus, 2015). As there are two phases of cancer evolution, and gene mutations play a limited role in the macrocellular phase, genome-level alterations must be carefully studied. Unfortunately, current genomic landscape profiling still focuses mainly on gene mutation and copy number variation profiles, with a renewed interest in epigenetics. This situation must change for molecular medicine to include the genomic context, informed by the new knowledge of chromosomal coding and the genome as a gene organizer.
More importantly, chromosomal aberrations display a much higher predictive value than gene mutations (Jamal-Hanjani et al., 2017; Davoli et al., 2017; Ye et al., 2018b). Based on the more detailed discussions in Chapters 3 and 4, a new attitude toward variations is needed: not all variants are equal in different evolutionary contexts. When studying each individual type of variant, one needs to understand the type of disease (is it a
gene-based, a genome-based, or an epigenetic disease?) and which types of cellular evolution are involved (microevolution, macroevolution, or a mixture of both).

3. Understanding different genotype and phenotype relationships: The disease phenotype (using cancer as an example) can be classified into the following relationships of environments and genotypes:

Disease phenotype = Disease genotype + environments (1)
or = Predisposition genotype + environments (2)
or = Normal genotype + environments (3)

Normal phenotype = Normal genotype + environments (4)
or = Predisposition genotype + environments (5)
or = Disease genotype + environments (6)

The relationship between genotype, environment, and phenotype can be explained as the collection of the above six categories. Category (1) suits typical Mendelian single-gene diseases, in which the environment mainly affects the severity of the disease phenotype. Categories (2) and (3) suit most common and complex diseases. There are many examples of category (2) in hereditary and familial cancers. Category (3) likely represents most cases of human diseases, in which environmental conditions seem to play a dominant role; many sporadic cases of various chronic diseases belong to this category. Category (4) is suitable for most healthy individuals. Categories (5) and (6) are suitable for those lucky individuals who have a familial predisposition, or driver cancer gene mutations or chromosomal aberrations, and yet are cancer free. It should be pointed out that the concept of "normal genotype" is now under question. What is the normal genotype, anyway? When copy number variations, genome alterations, and somatic mosaicism are considered together, and especially when single-cell profiling is included, the degree of genetic and nongenetic variation is beyond our imagination (Feuk et al., 2006; Iourov et al., 2008; Heng et al., 2016a).
Based on this evidence, it is starting to make sense why the environment plays a major role in human diseases, as most of the evolutionary potential will be fulfilled by environmental interaction-mediated evolution (Heng et al., 2016b).
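The six categories above can be written down as a small lookup table, which makes the one-to-many mapping between phenotype and genotype explicit. This is only an illustrative sketch: the category numbers and examples come from the text, while the data structure and function are hypothetical conveniences.

```python
# Six genotype-environment-phenotype categories, as listed in the text.
# The dictionary and helper function are illustrative, not a diagnostic tool.
CATEGORIES = {
    1: ("disease", "disease genotype", "typical Mendelian single-gene diseases"),
    2: ("disease", "predisposition genotype", "hereditary and familial cancers"),
    3: ("disease", "normal genotype", "sporadic chronic diseases (environment-dominant)"),
    4: ("normal", "normal genotype", "most healthy individuals"),
    5: ("normal", "predisposition genotype", "predisposed but disease-free individuals"),
    6: ("normal", "disease genotype", "driver-mutation carriers who stay cancer free"),
}

def categories_for(phenotype: str) -> list[int]:
    """Return the category numbers compatible with an observed phenotype."""
    return [k for k, (p, _, _) in CATEGORIES.items() if p == phenotype]

# The same phenotype maps to three distinct genotype backgrounds,
# which is why genotype alone cannot predict disease.
print(categories_for("disease"))  # [1, 2, 3]
print(categories_for("normal"))   # [4, 5, 6]
```

The point of the sketch is simply that the mapping is many-to-many: knowing the genotype column never pins down the phenotype column without the environment.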
4. Environments deserve more attention in molecular medicine: why environmental dynamics are a major challenge for precision medicine. In the field of molecular medicine, understanding the genomic contribution to human diseases has been a priority. In recent
years, due in part to the limited success of identifying common gene mutations responsible for most common and complex diseases, increased attention is now being paid to gene-environment interaction. Because "environment" covers a broad range of factors, its medicine-related functions can be discussed as follows: (a) For a given genotype, the environment determines or "chooses" the specific phenotype within the potential phenotypic range coded by fuzzy inheritance. Different environments will lead to different phenotypes. (b) Different types of environments influence medical strategies. In the case of infectious diseases, for example, because the causative agent is commonly shared among patients for a given disease, there is a more or less linear relationship between the agent and the phenotype; it is logical to perform diagnosis and treatment based on the infectious agents. In contrast, in the case of many common and complex diseases, such as cancer, there are so many contributing factors, and such dynamics of the evolutionary context, that it is challenging to identify any key molecule(s) as a magic bullet. (c) Somatic genomes, including altered genomes, are the genomic environment of individual human genes. In recent years, studies on the microbiome have led to the further inclusion in this equation of the influence of approximately 100 trillion bacteria and other microbes hosted by the human body. It has even been suggested that the human microbiome should be considered part of the human hologenome (Bordenstein and Theis, 2015). While microbiome research represents a long overdue and important frontier (Ursell et al., 2012), data are needed to illustrate its quantitative contribution in the real world rather than in experimental models (as it is currently easier to demonstrate its impact in some linear model systems than in patients). Yes, the microbiome is highly dynamic, but how can we use this feature for medical intervention?
Some interesting questions have been asked: It is important to investigate, for example, to what extent specific microorganisms contribute to the evolutionary selection of hosts. Who controls whom, and to what degree? (Specifically, is the human genome selecting the microbiome, … or is selection based on the interaction package? …) Does the host-microbiome interaction-mediated degree of heterogeneity matter the most, rather than any specific interaction? Should the microbiome be considered an environmental component, no matter how many types and numbers of microbes there are? (Similarly, should the numbers/types of animals/plants surrounding humans be included as part of the hologenome?) When compared with the multiple types and levels of genome heterogeneity (which so far have been largely ignored), which type of impact is more significant? (Heng et al., 2016b, with permission from Wolters Kluwer-Medknow).
(d) Based on the "phenotype = genotype + environments" relationship, if it is difficult to change the genotype for some diseases, we should focus on environmental changes to achieve phenotype correction. Therefore, in the future, more effort should be focused on environmental therapy (reducing the disease phenotype by altering its environmental context). By creating a certain environment, it is possible to eliminate or reduce the disease phenotype for a specific patient group. The strategy of changing the environment will be useful for controlling many common diseases. Furthermore, creating conditions in a medical context that promote self-healing is an important avenue. For example, pain relief medication may be administered to encourage exercise: the medication is not used purely for continuous relief but to promote conditions for direct self-healing. (e) Different types of environmental stress can be measured by considering them as general stress. At the genome level, it can be measured using the frequencies of NCCAs; at the gene level, it can be measured using the increased gene mutation rate across the genome. Standardized methods are needed for this purpose, however.

5. Interaction is defined by emergence within an evolutionary context: Traditional strategies for studying molecular interaction often focus on the physical interaction among molecular partners, as well as up- and downstream relationships in specific pathways. To illustrate the interaction among agents in a complex system, however, the concept of emergence within an evolutionary context is the key. Of equal importance, there are multiple levels of interactions, and many higher-level interactions can constrain molecular interactions. Future medicine needs to give more consideration to the individual's overall health conditions (including mental health and nutrition), as well as familial and societal interactions.
It is likely that, for the general population, focusing on improving lifestyle for prevention is more effective than applying molecular intervention once people are sick. In addition, the conflicting relationships among different organs need more attention. It is known that some drugs are good for one type of organ but bad for another. It is best to have a balanced view based on the individual, rather than just on a specific organ. Similarly, a balanced approach is needed when there is a conflict between short- and long-term benefits.
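Point (e) above proposes measuring general stress at the genome level through the frequency of NCCAs. A minimal sketch of that measurement, assuming karyotypes are summarized as strings and that a karyotype counts as clonal when it appears in at least ~4% of sampled cells (a cutoff offered here only as an assumption; the text itself notes that standardized methods are still needed):

```python
from collections import Counter

def ncca_frequency(karyotypes: list[str], clonal_cutoff: float = 0.04) -> float:
    """Fraction of sampled cells whose karyotype is non-clonal (an NCCA).

    A karyotype is treated as clonal when it is observed in at least
    `clonal_cutoff` of the sampled cells; the 4% default is an assumption,
    not a standardized threshold.
    """
    counts = Counter(karyotypes)
    n = len(karyotypes)
    ncca_cells = sum(c for c in counts.values() if c / n < clonal_cutoff)
    return ncca_cells / n

# Hypothetical sample: 95 normal cells plus 5 cells with unique aberrations.
cells = ["46,XY"] * 95 + ["45,XY,-7", "47,XY,+8", "46,XY,t(1;3)",
                          "44,XY,-5,-9", "46,XY,inv(2)"]
print(round(ncca_frequency(cells), 2))  # 0.05
```

Under this sketch, a rising NCCA frequency across samples would serve as the genome-level stress readout the text describes, independent of which specific aberrations appear.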
The synthesis of these five key points has led to the following conclusion: a specific concept of "stress-promoted, genetic variation-mediated cellular evolution" has been proposed, using the genome theory to synthesize the current status of genomic medicine (Horne et al., 2014; Heng, 2016a, 2016b). In brief, diseases are defined as genotype/environment-induced variants that are not compatible with a selected environment. Since different types of variants are necessary for cellular adaptation, different stresses can promote beneficial variants but can also eventually contribute to disease conditions. Various trigger factors (genetic and environmental alike) can speed up cellular evolution; there is often no stepwise relationship between initial causative factors and the molecular profiles of diseases following years of cellular evolution, where genome instability can be stochastically linked to different molecular pathways. Though causative factors may be diverse, they can all be considered system stress. Therefore, stress, system response, and cellular evolution are the general bases for most common and complex diseases. Accordingly, monitoring diseases should focus on genome-defined system stability and evolutionary potential, rather than the functional status of specific genes, which is in fact constantly changing. The rationale for bringing some key factors together under the evolutionary adaptive process is to search for the common mechanisms of diseases and to unify such diverse molecular mechanisms (Horne et al., 2015a, 2015b, 2015c; Heng, 2017b, with permission from John Wiley and Sons).
With the above understanding, some big-picture questions should also be asked, which are important for the future of molecular medicine. First, how can molecular medicine and traditional medicine be balanced? With the increased use of whole genome sequencing data and other molecular profiles, diagnosis will soon detect molecular indications of diseases long before clinical symptoms become detectable. Although this seems to be a dream come true for molecular medicine, it can also lead to serious confusion, as many molecular indications will not lead to a given disease's phenotype or rapid progression. How "potential patients" deal with this situation represents a big challenge. Even based on current diagnostic platforms, overdiagnosis, which refers to the diagnosis of a disease condition that would otherwise not go on to cause symptoms or death (Welch and Black, 2010), is a serious problem in cancer clinics. Indeed, this phenomenon applies to approximately 25% of mammographically detected breast cancers, 50% of chest X-ray and/or sputum-detected lung cancers, and 60% of prostate-specific antigen-detected prostate cancers (a molecular medicine method) (Heng, 2015). The issue of molecular overdiagnosis will likely become worse. Second, can we cure all diseases? If we can, should we? In recent years, there have been many headlines declaring that with advanced -omics technology and powerful artificial intelligence, molecular
medicine will completely alter human fate by bypassing biological evolutionary selection. As a result, we will cure nearly all diseases, including aging. To support these claims, people have cited genome editing technologies such as CRISPR/Cas9, as well as organ culture and stem cell technology. We have heard similar promises before (see Chapters 1-3). In fact, mankind has a long history of dreaming of becoming immortal and free of suffering, across all different cultures. But this time is different, we are told. This time we have artificial intelligence, which truly knows how to solve the mystery of life, as if we were God-like. In fact, many funding organizations have clearly settled on the goal of curing all diseases in the not-so-distant future. Can we do it? The answer is no. Based on the gene-centric viewpoint, new gene editing methods can precisely replace individual genes, but from a genome-based evolutionary point of view, the system's evolution will soon render target-specific molecular manipulation off-target. This point has been discussed in regard to how DNA transfer methods have an impact on the genome. We conceptualize that diverse experimental manipulations (e.g., transgene overexpression, gene knockout/knockdown, chemical treatments, acute changes in culture conditions, etc.) may act as a system stress, promoting intensive genome-level alterations (chromosomal instability, CIN) and epigenetic and phenotypic alterations, which are beyond the function of the manipulated genes. Such analysis calls for more attention to the reduced specificities of gene-focused methodologies (Stepanenko and Heng, 2017).
Further experiments have examined the majority of current DNA-manipulating methodologies, including transgenes, RNAi, small molecule targeting, and CRISPR/Cas9; all of these have altered the genome, and each run of experimentation led to distinctive karyotypes after evolutionary selection (Heng, unpublished observations). Currently, the precision aspect of CRISPR/Cas9 has been focused on targeted genes or genomic regions; the potential problem is the impact on the entire genome. Hidden genomic rearrangements generated by Cas9 have been reported by another group as well (Boroviak et al., 2017). These results should alarm many. Interestingly, there seems to be a general trend in which types of diseases have been dominant across human history, as defined by the environments and influenced by the advance of medicine. Infectious diseases were dominant before the antibiotics era; now cancer and metabolic diseases dominate, and soon, mental diseases will become more dominant. Of course, many future
diseases will appear as well. Humans will control many diseases, but new forms of current diseases and new types of diseases will come. The only difference is that mankind will use technologies to create artificial selection environments, thereby bypassing some biological evolutionary constraints. The question remains: should we do it (that is, cure all human diseases)? Knowing that genomic heterogeneity is the very reason for many common and complex diseases, and at the same time, that it is essential for the human species to exist and evolve, the answer to this question becomes rather complicated. We probably will try to wipe out many infectious agents (even though superbugs will fight back), change our lifestyles to reduce and manage metabolic diseases, live longer, and live in different environments with the help of artificial technologies; however, we will still live with diseases because of unfit variants. Medicine should do as much as possible to reduce the individual's suffering, but at the same time, it should take care of our own species from a long-term point of view. It is difficult even to accept that, while representing unfortunate circumstances for some individual patients, having many genetic variations in the human population is essential to ensuring the necessary degree of heterogeneity or robustness (regardless of the good or harm to some individuals in current conditions), which could ultimately contribute to the existence of humanity (Heng et al., 2016b).
8.3.3 Genome Alterations and Common/Complex Diseases

The long-term goal of the precision medicine initiative is to establish a platform for the successful diagnosis and treatment of other, noncancer common and complex diseases. Based on the extensive analysis of cancer genomics, it is crucial to appreciate the concept that system instability is a key feature of many diseases. Numerous diseases whose phenotypes and molecular mechanisms are different may have the same underlying cause: genome instability. According to traditional thinking, cancer appears to be a unique problem because such cells have an apparent growth advantage over their normal counterparts. But the reason altered cancer cells can outcompete normal tissue is that they have lost the homeostasis mechanisms (or system constraint) of normal tissues. Despite the very different features of cancer and other common diseases, including differential degrees of genome alteration, they are all in fact system diseases in which system deregulation is the key evolutionary process during disease progression. Rearranged genomes represent altered systems, and the created imbalance of system homeostasis is an important defect that favors the evolution of disease.
8.3.3.1 Key Features and Types of Common and Complex Diseases

The search for the genetic causes of common and complex diseases is a major challenge for molecular medicine. As listed in Table 8.1, many key features of common and complex diseases have handicapped the traditional genetic strategies that are useful for studying Mendelian diseases. While it is difficult to identify commonly shared genomic alterations with high or even modest effects in patient populations, increasing evidence has linked large numbers of rare genetic loci with severe effects on individual patients. Significantly, many detected genetic changes involve genome alterations rather than gene mutations alone. To address this important issue, the genome theory will be applied to illustrate why the 4D genomics view, rather than the 1D gene view, can provide answers. Based on the diverse genetic patterns of various human genetic diseases, it is necessary to classify human inherited diseases into four subtypes (Table 8.2) (Heng, 2010). The first type comprises genomes that feature commonly shared genetic alterations within a population (gene mutations or chromosomal or subchromosomal alterations such as copy number variations). This type of genome can be further classified into two subtypes: (1) common genetic/epigenetic loci
TABLE 8.1 The General Features of Common and Complex Diseases.

* High incidence within populations
* Clear genetic influence; family clusters (tends to aggregate in families)
* Failure to identify common genes responsible for the majority of cases after several decades of searching
* Many genetic loci are indicated, but penetrance is low among patient populations
* Fewer collective effects are observed when multiple loci are used for population screening
* Most genetic loci are stochastically involved
* Diverse genetic/epigenetic heterogeneity among somatic cells
* Common diseases lack common molecular causes, and most genetic changes that lead to diseases are rare within populations but carry serious consequences
* Large numbers of factors (genetic and nongenetic) are involved
* Many experimental models mimic the phenotype under specific conditions
* Some diseases are closely associated or overlapping and share some common genomic regions: cancer and obesity, cancer and aging
* Many diseases share some key molecular pathways, such as mitochondrial dysfunction, linkage with stress pathways, and metabolic pathways
* Many diseases share similar genetic networks
* Longer periods of time are required for a disease to become clinically dominant (possibly representing an evolutionary process: time + probability)
* Clearly related to lifestyle
* Systems diseases involving somatic evolution
TABLE 8.2 Genetic Disease Classification.

Disease Type | Genetic Factors in Patient Populations | Prevalence | Relative Genome Stability | Examples
A | Commonly shared | Infrequent | Stable | Typical Mendelian diseases (for example, sickle cell anemia)
B | Commonly shared | Infrequent | Unstable | Familial cancer syndromes
C | Rare | Infrequent | Stable | Charcot-Marie-Tooth neuropathy
D | Rare | Prevailing | Unstable | Sporadic cancers and neurological and/or behavioral disorders

Reproduced from Heng, H. H. (2010). Missing heritability and stochastic genome alterations. Nat Rev Genet, 11(11), 813. https://doi.org/10.1038/nrg2809-c3.
that have been identified within a relatively stable genome. Well-known examples of this category include cystic fibrosis, sickle cell anemia, Down syndrome with extra chromosomes, fragile X syndrome with expansion of trinucleotide repeats, and diseases that share copy number variations or even single-nucleotide polymorphisms. (2) Common genetic/epigenetic loci that have been identified within unstable genomes. Typical examples are the p53 mutations detected in Li-Fraumeni syndrome and BRCA mutations in familial types of breast cancer. This first type of genetic aberration, with higher penetrance, has been the primary research focus of inherited diseases, where genome-wide association studies and patient validation work well. In fact, the concepts and experimental approaches of medical genetics have so far been based on the understanding of this type of disease. The second disease type features genomes that harbor rare genetic alterations. In contrast to the first type, these are represented by large numbers of rare genetic alterations in individuals or families and are not highly represented within the population. Here, there are also two subtypes: type C, rare loci within relatively stable genomes, and type D, rare loci within unstable genomes. As genome instability-mediated stochastic genome evolution is the driving force of cancer formation, it is likely that most sporadic cancers are caused by rare genetic or epigenetic alterations within, or leading to, unstable genomes: the type D disease. The vast majority of common and complex diseases belong to types C and D, in which the traditional approach of identifying common patterns of genetic/epigenetic alterations has failed. The rationale for such a classification is to illustrate the distinctive patterns among heritable diseases.
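The two axes of Table 8.2 (commonly shared vs. rare genetic factors; stable vs. unstable genome) can be captured in a few lines. The boolean encoding below is a hypothetical convenience for illustration; the type labels and examples are taken from the table.

```python
def disease_type(shared_factors: bool, genome_stable: bool) -> str:
    """Map Table 8.2's two axes to disease types A-D (illustrative sketch)."""
    if shared_factors:
        return "A" if genome_stable else "B"
    return "C" if genome_stable else "D"

# Examples from Table 8.2:
print(disease_type(True, True))    # A: typical Mendelian diseases (sickle cell anemia)
print(disease_type(True, False))   # B: familial cancer syndromes
print(disease_type(False, True))   # C: Charcot-Marie-Tooth neuropathy
print(disease_type(False, False))  # D: sporadic cancers
```

The sketch underlines the chapter's claim: traditional medical genetics works on the top branch (shared factors), while the vast majority of common and complex diseases fall through to types C and D.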
Sometimes both common and rare genetic alterations may lead to the same disorder, as with CML patients with and without Ph chromosomes, as well as inherited hearing loss (Dror and Avraham, 2009). More detailed analysis is needed to define the rarity and instability of the genome for various diseases. Such a classification can resolve many confusing issues. For example, although many rare genetic loci have been found across the human genome within populations, compared with the few commonly shared loci that are responsible for a common disease's phenotype, the current validation concept and methods downplay the important relationship of these rare loci to diseases. Furthermore, even though stochastic genome alterations are much more frequent and dominant than gene mutations, these genome-level alterations are often ignored unless they can be directly linked to specific disease genes. Such genetically mediated phenotypic plasticity has often been confused with nongenetic effects, because genome-level alterations provide additional arrays of phenotypic plasticity that reduce predictability.

8.3.3.2 Stochastic Genomic Alterations Contribute to Most Common Diseases

By linking stochastic genome changes to various common diseases, it is hypothesized that the missing heritability of common and complex diseases is the result of stochastic genome alteration during disease evolution. Thus, for most common diseases, more attention should be focused on the heterogeneity and system dynamics defined by the genome package rather than on common gene mutations. To connect the dots, the above classification has been further synthesized based on the genome theory:

1. Changes at the genome level can have an impact on large numbers of individual gene functions.
2. Any significant stress, such as specific gene mutations, epigenetic abnormalities, or environmental stresses, will cause an increase in genome dynamics, leading to a less stable population with increased NCCAs.
3.
The evolutionary mechanism of a given common and complex disease is equal to or larger than the sum of all molecular mechanisms. While a specific altered genetic locus can be linked to a particular molecular mechanism in an individual case, the entire collection of different genetic loci is necessary to explain large numbers of diverse individuals.
4. There are many cases where the properties of genome-level alterations cannot be explained by individual gene functions. Thus, stochastic genome alterations are the common reason behind the diverse molecular pathways in patient populations.
5. In addition to passing on genes and mutations, it is also possible for individuals to pass on the degree of genome instability (or fuzzy inheritance) without direct impact on specific genes or pathways. In this case, the specific genome is less stable under certain environments, reducing predictability in terms of phenotypes.
6. In the same individual, there is a key difference between the genome of germ cells and the genome of somatic cells. Increasing data indicate that somatic genome variations may contribute to disease conditions (Heng, 2010, 2016a-c; Iourov et al., 2008; Vorsanova et al., 2010; Sgaramella and Astolfi, 2010; Ye et al., 2018b). This level of variation typically occurs in only a proportion of somatic cells (Bruder et al., 2008) and is significantly more abundant in adults than in newborns (Flores et al., 2007). Epigenetic differences also arise during the aging process, as illustrated by the differences arising during the lifetimes of monozygotic twins (Fraga et al., 2005). Collectively, stochastic alterations should be observed in many diseases, especially in aging tissue (Geigl et al., 2004; Vijg and Dollé, 2002).
7. Increasing reports link various common diseases to genome-level alterations, including hypertension, neuropathy, infertility, schizophrenia, autism, aging, and, recently, obesity and Gulf War illness (GWI) (Heng et al., 2013b, 2017b, 2018; Iourov et al., 2008; Liu et al., 2018; Vorsanova et al., 2010; Sgaramella and Astolfi, 2010; Bucan et al., 2009), and diverse karyotypes have been linked to the vast majority of cancer cases. In addition, de novo somatic L1 insertions occur at higher frequencies in the human lung cancer genome (Iskow et al., 2010). Together, this strongly supports the importance of genome variation during the transition from physiological to pathological conditions.

Based on the above synthesis, the following general model is proposed for consideration.

A.
In response to system stress (internal and environmental), the majority of genetic/epigenetic alterations are stochastically distributed among patients' genomes with low penetrance within a population; only a small portion of genetic/epigenetic changes will display higher penetrance in a population.
B. Some of these rare genetic alterations can be linked to particular molecular mechanisms and disease phenotypes through their association with specific gene functions. However, many of these genome-level alterations can impact genome topology rather than only directly affecting specific gene loci, especially when different loci function simultaneously within the genome package based on the self-organization principle. A large portion of the genomic disorders in this category cannot be satisfactorily explained by an individual gene or even the cumulative effects of
multiple loci. This agrees with the finding that there is a major gap between the majority of variations detected by genome-wide association studies (GWAS) and their biological significance based on the knowledge of specific gene functions. It is thus likely that many emergent properties at the level of the genome and its environmental interactions (such as heritability) cannot be dissected down to individual genetic elements.
C. Because of the nature of heterogeneity, a specific genetic alteration that is crucial for one individual patient may or may not be significant for another. The difference between individuals and populations must not be ignored when screening methods are designed.
D. There is a difference between the inherited "germline genome" and the "somatic cell genomes" of the same individual. Germline genomes are much more stable than somatic genomes. Somatic genome variation, which can occur during developmental and physiological processes such as tissue renewal and aging, and particularly during pathological processes, is essential to disease phenotypes.
E. Some diseases can share certain genetic alterations, and all diseases represent different phenotypes of an altered system.
F. The genome-environment-time interaction plays an important role in common diseases. The general cause of many common diseases is system instability, which can arise from combinations of an array of molecular mechanisms and environmental stresses. Furthermore, the "window of opportunity" between the genomes of germline and somatic cells, and in particular between somatic cells, provides yet another layer of complexity on which environmental insults can act. The initiating factors are often untraceable in fully developed complex diseases because their formation is a time-dependent, nonlinear evolutionary process. It is thus likely that identifying "initial" errors (genetic or otherwise) may have minimal benefit for the diagnosis and treatment of most cases of common diseases.
In these cases, it is the 4D genomic dynamic interactions that matter.
G. How genetic variations become associated with disease depends on the environment. With drastic environmental changes, it is anticipated that some "future diseases" will become clinically significant.

The proposed model is illustrated in Fig. 8.1. Of course, the main purpose of using this simple model is to link genome change to specific gene function, which will be more easily accepted by gene-centric researchers. Note that even though many
FIGURE 8.1 The stochastic genome alteration model. Stochastic changes can potentially "hit" anywhere within unstable genomes. However, most of the alterations are not directly associated with disease (not shown). Four chromosomes and nine loci (A-I) are illustrated and represent the entire genome. Some of these diverse genome alterations can be linked to specific genes or pathways, but many of them contribute to altering the genome context or dynamics without being linked to specific genes. Multiple loci can contribute to different diseases, but individual patients may display variable "hit list" profiles of potentially shared loci (in disease type a, loci A, D, E, G, and I are involved in different patients, whereas in disease type b, loci B, D, E, F, and H are involved). Some loci are actually shared among diseases (e.g., D and E for diseases a and b). There are also some genetic alterations that can potentially generate disease conditions given exposure to a particular environment (C) (potential or future disease).
chromosomal translocations can be linked to specific fusion genes, the genomic codes can be changed even without those specific fusion genes. More discussion can be found in Chapter 4.

8.3.3.3 The Search for a General Model for Common and Complex Diseases/Illnesses: A Case Study of Gulf War Illness

GWI is detected in nearly one-third of Gulf War veterans in the United States (Research Advisory Committee on Gulf War Veterans' Illnesses, 2008). Diagnosing and treating GWI is difficult because of its complex etiology and diverse symptoms. This challenge has led to the slow acceptance of GWI as a real clinical condition (Heng et al., 2016a-c). The general mechanism of GWI remains unknown, despite increasing studies that have revealed a number of associated molecular mechanisms, including
mitochondrial dysfunction and altered immune responses (Koslik et al., 2014; Craddock et al., 2015; Parihar et al., 2013; White et al., 2016; Liu et al., 2018). Interestingly, however, all these seemingly diverse observations can be summarized in the following conclusions: (1) there are diverse war-related factors that can be considered GWI triggers (e.g., nerve gases, pesticides, insect repellents, and antinerve agent pills); (2) all these trigger events occurred nearly 30 years ago, suggesting that GWI represents a complex adaptive system; (3) the symptoms are highly diverse; (4) most GWI patients display genome instability, reflected as an elevated level of NCCAs rather than specific and commonly shared chromosomal aberrations; and (5) the majority of identified contributing factors, as well as phenotypes, can be linked to genome instability. By synthesizing all these facts through the lens of the genome theory, a general model of GWI has been proposed that integrates stress, genome alteration, somatic cell evolution, and diverse symptoms (Fig. 8.2).
FIGURE 8.2 A general model of cellular evolution of GWI (modified from Heng et al., 2016c, with permission from Springer Nature).
As illustrated in this model, the GWI process can be divided into three stages. (1) The initial stage: diverse, extremely high stresses incurred during the Gulf War damage cellular systems. (2) The cellular/system evolution stage: many individuals recover from stage one, but in those who cannot recover from the initial damage, the genome becomes destabilized, triggering further cellular evolution; this stage may span a variable period of time for different individuals. (3) The illness stage (with diverse disease phenotypes): the altered genome can impact different cellular mechanisms, leading to diverse symptoms (which is why GWI symptoms range so widely). Increased stress and further genome instability represent key features of this stage and are likely linked to illness severity or progression potential. As GWI has progressed for nearly 30 years, the initial trigger factors might be less visible now, but because both genome instability and various stress pathways are elevated in GWI patients, the stress-genome interaction remains clearly evident. It is thus extremely important to study the ongoing stress-mediated genome evolution process and its implications for diagnosis and treatment. In addition, this model can also be used to explain some other common and complex diseases that involve stress-induced, genome alteration-mediated somatic evolution. Moreover, the three stages of the disease evolutionary model are very useful when integrated with patient-centric health care. Prevention (avoiding initial trigger factors), stabilization (stabilizing the system and slowing down the illness's cellular evolution), and reduction of the illness's impact by promoting system recovery (applying system constraint and self-healing) should play increased roles in health care.
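The qualitative logic of the three-stage model can be caricatured as a Monte Carlo sketch. All probabilities, time scales, and symptom labels below are invented purely for illustration; only the structure (exposure, stochastic recovery or destabilization, variable latency, diverse symptom profiles) follows the model described above.

```python
import random

# Toy sketch of the three-stage GWI model (all numbers are invented).
P_RECOVER = 0.7  # assumed chance of recovering from the initial damage
SYMPTOMS = ["fatigue", "pain", "cognitive", "gastrointestinal", "dermatological"]

def simulate_individual(rng: random.Random) -> dict:
    # Stage 1: initial stage - everyone is exposed to high war-related stress.
    if rng.random() < P_RECOVER:
        return {"outcome": "recovered", "latency_years": 0, "symptoms": []}
    # Stage 2: cellular/system evolution - the destabilized genome evolves
    # over a variable, individual-specific period of time.
    latency = rng.randint(1, 25)
    # Stage 3: illness stage - the altered genome impacts different cellular
    # mechanisms, producing a stochastic (hence highly diverse) symptom profile.
    k = rng.randint(1, len(SYMPTOMS))
    return {"outcome": "ill", "latency_years": latency,
            "symptoms": sorted(rng.sample(SYMPTOMS, k))}

rng = random.Random(0)
cohort = [simulate_individual(rng) for _ in range(1000)]
ill = [x for x in cohort if x["outcome"] == "ill"]
print(f"{len(ill)} of {len(cohort)} individuals become ill")
print("distinct symptom profiles:", len({tuple(x['symptoms']) for x in ill}))
```

Even this crude sketch reproduces the clinical picture qualitatively: roughly a third of the cohort becomes ill, onset times vary widely, and ill individuals display many different symptom combinations rather than one shared profile.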
8.3.3.4 New Model With New Explanations

The concept of stochastic genome alteration as the basis for many common/complex diseases needs to be rigorously examined, particularly through the study of the distribution patterns of the various genome types (a to d) for each disease. This will establish a new standard for validating rare genetic loci and will elucidate the relationship between genome-level alterations and stochastically involved molecular pathways. Once the importance of stochastic genome dynamics, rather than specific common gene mutations, is appreciated, the genome era will finally arrive. This realization will also diminish our zeal for magic bullet therapy to cure common and complex diseases. New methods of medical validation and of conducting clinical trials will emerge. These strategies will deliver the benefit of concurrently targeting numerous diseases whose phenotypes are very different but whose underlying cause, genomic instability, may be the same. In addition, new approaches to public health will focus on changing
lifestyles to reduce the probability of developing disease. The future of medicine will greatly benefit from the genome theory. This model has the potential to explain many issues/questions better than previous explanations. Examples are listed as follows.

1. Why are multiple steps of genome stochasticity the main reason for high levels of genomic heterogeneity? Genome stochasticity can affect at least five key transitions that are directly related to human diseases (Fig. 8.3). First, the specific type of genome alteration acquired by any individual within a population is a stochastic phenomenon. It used to be thought that most inherited disease-related genetic alterations arose mainly from one's own family tree. In fact, many new genetic alterations related to human illness are introduced each generation (McClellan and King, 2010) and are transient. The main reason that most commonly shared alterations can accumulate in populations is not
FIGURE 8.3 Genome stochasticity affects at least five key transitions that are directly related to human diseases. The five key transitional events are illustrated. Event 1 occurs between the population and the individual. Event 2 occurs between the germline and somatic cells (early development). Event 3 occurs among somatic cells during system maintenance. Event 4 occurs in the interval between the normal physiological state and the first appearance of pathological changes during disease progression. Event 5 involves system alterations that occur following medical intervention. In event 1, for example, population size, genetic drift, and geography all play roles. In event 3, somatic cell maintenance is a lifelong challenge to preserve system homeostasis. Regarding events 4 and 5, drastic genome alterations might be a key factor (the altered genome is represented by different shapes of the genome). All five events are influenced by time and environmental interactions, as well as by fuzzy inheritance.
because they are harmful but because many rare alterations can be linked to disease under a given environment (also see the fuzzy inheritance section). Second, variation occurs from the germline level to somatic genomes in a given individual, notably during developmental processes such as T and B cell development and the high-level ploidy dynamics observed in hepatocytes (Duncan et al., 2010). Third, variation occurs during somatic cell maintenance, such as during tissue regeneration. Fourth, additional high levels of variation occur during the transition from normal physiology to pathological conditions, where genome integrity is often lost, as in cancer cells. Finally, from pretreatment conditions to the posttreatment state, pathway switching is a frequently observed phenomenon associated with genome alterations; note the loss of effectiveness of specific drugs during cancer treatment as the cancer cells adapt to new dominant pathways. Such pathway switching is often accomplished by altering karyotypes. Together, all five steps contribute to extremely high levels of genome alteration-mediated genetic and epigenetic heterogeneity, including karyotypes, CNVs, and de novo insertions of endogenous retrotransposons. These steps reflect the important involvement of fuzzy inheritance, which represents a major challenge for precision medicine, as the latter is based only on the characterization and targeting of parts.

2. How can the same genetic defect generate alternate disease phenotypes while different genetic alterations can lead to similar phenotypes? The same genetic defect can be linked to diverse phenotypes because of combinations of different genetic or environmental modifiers. More significantly, the same gene mutation can display a variety of functions within different karyotype-defined genomes. On the one hand, most common diseases occur at the somatic cell and tissue/organ levels, where both the developmental and aging processes are intimately involved.
Environmental impacts such as geographic factors and lifestyle also contribute to the wide variation in phenotypic diversity. In addition, unstable genomes are more likely to become "abnormal" during the development and aging processes, particularly under stress caused either by the altered genomes themselves or by various environmental stresses. This explains the phenotypic variation among individuals who share similar genetic alterations. On the other hand, as the human body has a limited number of major systems (e.g., the cardiovascular and nervous systems), many genetic abnormalities will ultimately impact these same major organs or systems. From a molecular and cellular biological point of view, many key cellular functions will be commonly involved, as many genes can participate in similar pathways, creating a highly interlinked internal organization of the
cell (Albert, 2005), a package deal (Heng, 2009). For instance, common/complex diseases are often associated with abnormal metabolic regulation, increased endoplasmic reticulum stress, and abnormal cell death regulation. Notably, mutations of p53 lead to multiple effects (Vogelstein et al., 2000), while many genomic regions have been linked to various types of disease, such as the four constitutional (germline) genomic disorders and an array of other somatic disorders linked to 17p11.2-p12 (Carvalho et al., 2010), and the association of 16p11.2 with autism and obesity (Weiss et al., 2008; Walters et al., 2010). Similarly, common polymorphic variation at the major histocompatibility complex (MHC) loci has been linked to autoimmune and inflammatory conditions such as multiple sclerosis, type 1 diabetes, systemic lupus erythematosus, ulcerative colitis, Crohn's disease, and rheumatoid arthritis (Fernando et al., 2008). It has recently been noticed that transcriptional signatures and common networks link cancer with diverse human diseases (Hirsch et al., 2010). Many diseases can be derived from an unstable system and can further promote the instability of that system. This explains how different common diseases share many of the same genetic alterations and similar environmental responses, as many common diseases are just varied expressions of an unstable system. It is also possible that the shared network reflects the system dynamics of the "abnormal system" in general rather than specific pathways.

3. Why is epigenetic deregulation particularly important in human disease but hard to target? For a given species, the framework of the genome cannot be drastically altered, due to sexual reproduction, yet increased system complexity is essential for evolution (Heng et al., 2009; Stevens et al., 2013a). Epigenetics therefore serves as another layer of complexity (Huang et al., 2009).
This situation is similar to a person changing the color of a house or rearranging the furniture while being unable to alter the architecture itself. Since epigenetic regulation is more sensitive to environmental stress, it has a profound impact on human diseases. However, despite the possibility that abnormal epigenetic regulation represents the earliest change during the evolution of many diseases, the application of epigenetic therapy is challenging. A potentially dangerous side effect relates to less predictable responses and to the fact that somatic genomes are drastically altered in many late-stage diseases. Simply targeting the epigenetic status will not reverse a changed genome (Heng et al., 2010b); yet drastically challenging the epigenetic status by treatment (especially prolonged treatment) could harm the system because, interestingly, despite generating large amounts of diversity (such as contributing to diverse disease
phenotypes), most epigenetic alterations are in flux. This flux is a key biological advantage: it maintains the genome system while providing phenotypic diversity and complexity in normal tissue. Targeting the epigenetic status therefore could go either way and could just as easily harm the system. In addition, this plasticity is much more subtle than gene mutation-mediated phenotypes, making it harder to target with any precision or predictability.

4. Why are genome alteration-mediated common diseases difficult to study? As collectively illustrated by the above reasons, the main difficulty is the stochasticity of a complex system, which diminishes the significance of linear causative relationships. The emergent property of a disease is not simply based on the quantitative accumulation of individual loci but rather on the genome-level information package. As illustrated in Figs. 3.4 and 8.3, genetic information transfer goes through multiple steps, each of which increases the stochasticity of genome variation. In a sense, for many sporadic cases, the genome is the nondissectible unit of information of an individual's disease condition. Thus, the gene theory-based dissection of an individual's genes will not adequately explain the missing heritability. Furthermore, within a patient population, diverse combinational patterns make pattern identification extremely difficult using current approaches, as the alteration of genome architecture may lead to a failure to replicate the data of a given genetic association study (Greene et al., 2009). It is likely that the combination of rare genetic alterations with ancient, commonly shared polymorphisms contributes to disease. These ancient polymorphisms are shared by all human populations and account for 90% of human variation (Tishkoff and Verrelli, 2003).
Another key challenge when attempting to link individual genetic loci to disease phenotypes is the evolutionary process itself, in which time and historical contingency are important. Many common diseases require years of evolution to become clinically significant, and it is difficult to identify and repeat the historically contingent events for different individuals within the patient population. This is particularly so for many sporadic diseases. In addition, an unstable genome often generates highly diverse genome populations among somatic cells; the heterogeneous cell population further stresses the system, and the stress can damage homeostasis, leading to disease. Interestingly, research has established the relationship between diverse types of molecular mechanisms and common system stresses, which explains why it is
so difficult to predict the probability of diseases based on specific molecular mechanisms (Stevens et al., 2011a-b, 2013a-c; Heng, 2015). This analysis also applies to the large number of microregulators of disease genes. The recent ENCODE project (the Encyclopedia of DNA Elements) has identified a huge number of noncoding sequences, each of which might contribute to disease conditions in a very subtle way. Despite the high level of excitement over these discoveries, it will be extremely challenging to apply this information to clinical situations. If the emergent properties at the genome level are hard to dissect into individual genes, it will be even more challenging to dissect them into individual microregulators of those individual genes. If we cannot make predictions based on a handful of seemingly dominant, influential gene mutations, how can we realistically expect to decipher the more subtle effects of these less prominent elements?

5. If the evidence supporting this model is overwhelming, why was it not recognized earlier? A number of contributing factors have led to a blind spot regarding this important link. First, often what is seen is what is recognizable, familiar, or expected. Second, the knowledge generated from infectious diseases has influenced our approach to diseases in general, as it is believed that each type of disease should have a single, common cause, much as infections are caused by the same type of infectious agent. Third, in genetics, there is a tradition of turning exceptions into general rules. Highly penetrant, widespread gene mutations (disease alleles) with a high correlation to disease phenotypes are the exception, and their uniqueness (exceptional value) is the main reason we analyze them. However, as soon as these molecular mechanisms are established, the fact that they will likely work only for these exceptions is often overlooked.
To further compound this mistake, these exceptions are used to validate general findings. If we search for the genetic basis of diseases in diverse patient populations, a link can often be found with genetic defects in some individuals. But most individuals in the general population will display different alterations representing different genetic loci. Such situations have prevented the establishment of the link between specific diseases and rare genetic alterations as it is difficult to validate them using the concept of “common diseases being caused by common genetic loci.” Fourth, most of us believe that different levels of information can be inferred by data accumulation and synthesis. Despite the fact that there are multiple levels and nonlinear relationships involved, there is a tendency to attempt to understand genes first and then translate this information to higher levels of organization. Unfortunately, this
approach no longer works, as the emergent properties cannot be understood by classifying only the parts at a lower level. The knowledge gap between these levels is the real challenge (Heng, 2013c). Fifth, bio-heterogeneity among patients has traditionally been considered "noise," which can supposedly be eliminated by analyzing large numbers of diverse samples. Sixth, many common and complex diseases are a result of somatic cell evolution, where time is a key element. The issue of time is related to many other physical/pathological and environmental factors, and it is rather difficult to make predictions on an individual basis despite the power of prediction at the population level. Last (but not least), with a reductionist's mindset and the influence of molecular biology, many are not comfortable at all if molecular targets are not identified. If it is not in the genes, then it must be the noncoding parts, including noncoding RNAs, or epigenetics. Something of a molecular nature must be at the root of the problem, or so the thinking goes.
8.4 FUTURE DIRECTION

The goal of achieving precision genetics started with Mendel in 1866, when he introduced the method of calculating the pattern of genetic factors across generations. Sixty years later, when Morgan introduced his gene theory, he declared "… the theory of the gene, enable[s] us to handle problems of genetics on a strictly numerical basis, and allow[s] us to predict, with a great deal of precision, what will occur in any given situation" (Morgan, 1926). Based on the contributions of Ronald Fisher, Sewall Wright, and J. B. S. Haldane, population genetics has played a key role in the neo-Darwinian synthesis. With all of these statistical tools, bioscience has drastically increased its predictive power. So far, so good. Both genetics and evolutionary biology have achieved solid frameworks, and the major work ahead is simply to fill in some details and to apply this exciting knowledge in practice, including in medicine, or so we were told. In particular, people predicted that, with the availability of large datasets (such as those of the Human Genome Project) and increased computational power, precision genetics would finally be within our reach. This is where the problem appeared. It is also the rationale behind our search for new theories. It is thus not at all surprising that, after over 150 years of triumph and failure, many geneticists are determined to achieve the dream of precision medicine, the ultimate implication of the precise genetics that Mendel had foreseen.
The fact is, however, that the gene represents only “parts inheritance,” genomic information is fuzzy, and the evolutionary emergent process is highly dynamic. This explains why it is so challenging to explain some key genomic/evolutionary phenomena and to precisely predict clinical outcomes based on genetic profiling. It comes full circle.
8.4.1 Facing Reality: The Increased Bio-Uncertainty

As described in previous chapters, bio-uncertainty has increased drastically since the birth of genomics, and especially following the start of the Human Genome Project. Initially, the immaturity of technical platforms was blamed. Gradually, it was realized that this newly revealed bio-uncertainty is real. This accumulating realization is based on observations such as the following: (a) various large-scale -omics projects have produced overwhelmingly heterogeneous data, both from experimental systems and from clinical samples; (b) cancer evolutionary studies have questioned the certainty of cancer genes, as well as the pattern of somatic evolution; (c) GWAS studies have delivered disappointing results despite large sample sizes, and the missing heritability issue has become obvious; (d) there is clearly a complex correlation between different layers of molecular profiles (DNA sequence, transcriptome, proteome); (e) single-cell technology has revealed the importance of "noise," and the study of NCCAs led to the concept of fuzzy inheritance; (f) many biological processes seem very wasteful (a large proportion of RNA transcripts is never translated into protein; portions of proteins are degraded soon after synthesis; many insulin molecules are destroyed soon after their synthesis); (g) DNA-binding specificity can be influenced by genomic topology (different cellular locations with different substrates), gene function specificity can be affected by karyotype, and pathway specificity can be influenced by the level of stress and the cellular environment; (h) mosaicism is common in human beings (we all carry a "genomic touch" of disease, with different degrees of genomic impact), and genome chaos can be observed from early developmental stages; and (i) the century-long knowledge of genetics cannot simply be applied to common and complex diseases, given our understanding of how macroevolution really works.
Bio-uncertainty always exists. However, because of the use of many well-established linear model systems and selective data collection, heterogeneity-reflected uncertainty has been ignored ever since Mendel. This strategy has proven ineffective when a given conclusion from basic research needs to be translated into the clinic or when the data need to be validated by different models or different researchers. Here are a few well-discussed experiments and their conclusions:

Over the past decade, before pursuing a particular line of research, scientists (including C.G.B.) in the haematology and oncology department at the biotechnology firm Amgen in Thousand Oaks, California, tried to confirm published findings related to that work. Fifty-three papers were deemed 'landmark' studies
(Note: these articles are published by reputable labs in top scientific journals. Thus, the selection criterion is much higher than that of average papers.) Nevertheless, scientific findings were confirmed in only 6 (11%) cases. Even knowing the limitations of preclinical research, this was a shocking result. Begley and Ellis, 2012
This is an extremely shocking conclusion, but it is in line with other validation experiments. Scientists from Bayer, another pharmaceutical company, examined 67 target validation projects in molecular medicine, including cancer care and cardiovascular medicine. The reproducibility rate was only 21% (Mullard, 2011)! These important analyses of data irreproducibility in academic research are timely. They forcefully reveal a huge problem in molecular medicine: the vast majority of research publications are not reliable. This represents a deep insult to any serious scientist who works hard to search for the truth. But what are the scientific explanations behind this stunning observation? While no researchers have come out to dispute these numbers, many have offered explanations of what has gone wrong with current science. There are many obvious factors that bear the blame: dishonest individuals, insufficient duplication of experiments, the publication of only positive data (and the hiding of negative data), cell line contamination, experimental condition variability, etc., and the list goes on. One personal estimate is that dishonesty accounts for fewer than approximately 10% of cases. The majority of cases are directly caused by bio-uncertainty and the current conceptual and technical limitations of molecular medicine. Because of bio-complexity, any given process can display different molecular features, depending on its context. Most reports have captured one of the many potential statuses
during the dynamic process, although many researchers are guilty of cherry-picking. As illustrated in Chapter 3, the observation that changed our view of molecular reproducibility occurred during the "watching evolution in action" experiments, using an immortalization model. As each run of cellular evolution can be achieved by different genomes (with unique karyotypes) coupled with different molecular pathways, each run of an experiment can be linked to some specific genes, which is good enough for publishing papers. However, if the same experiment is repeated (a task that entails more than a year of time and effort), different genes will be identified. Not only can many molecular results not be duplicated across different runs of evolution; focusing on specific molecular parts will also cause one to fundamentally miss the stochastic nature of cancer evolution. Indeed, as evidenced by the literature, different molecular mechanisms have often been reported by different groups using the same model system. This has been the case even for the same research group when reporting different molecular links on different occasions. The only conclusion is that these reported certainties are based on isolated models. When different contexts are compared, the uncertainty of a given molecular event is high and universal. Interestingly, the main reason that molecular reproducibility has become a big issue now is also linked to the stage of molecular medicine itself. This topic has been briefly discussed:

In molecular biology, most researchers have focused their studies on isolated parts, such as: enzyme activities in vitro, a specific gene's structure, the regulation of a defined pathway, a given cellular structure/response, and molecular characterization of causality in a linear experimental system. To study these individual structures and mechanisms, focusing on the average is justified and even effective.
At this level of understanding, reductionist approaches might work best, as noise becomes less visible and is less important for the understanding of parts. … our bio-knowledge can be classified into three types: (1) parts characterization: description of the parts and study how parts can potentially work. To achieve the goal, many parts are analyzed under the condition of isolation. (2) parts assembly: study the conditions to put many parts together to characterize a specific pathway or biofunction, the assembly of ribosome, the stages of development, and interactions among parts. Current systems biology holds much promise to advance this type of understanding; and (3) how the system as a whole works under evolutionary selection in a laboratory setting or nature, and especially under various stress conditions where the physiological and pathological responses might differ drastically.

Heng, 2015 (with permission from World Scientific).
As increased attention becomes focused on the translational implications of molecular discovery, the third type of research will become dominant, as will the issue of reproducibility. That is the reason why the
TABLE 8.3 A List of Some Contributing Factors That Often Give Molecular Researchers the Illusion of Bio-Certainty

Concepts:
- Cellular evolution is stepwise and accumulative; targeting key genes in the process should cure diseases.
- Common and complex diseases share the same genetic mechanism as single-gene diseases, except that more genes are involved; the key is quantitative analysis based on large sample sizes.

Methods:
- Creating linear models by reducing or eliminating heterogeneity
- Focusing too much on nonrealistic hypotheses
- Cherry-picking (i.e., only reporting data that make sense under a given hypothesis)
- Using averaging-based profiling methods to wash out "noise" and increase statistical support
- Assuming that the same principles apply to both normal physiological and pathological conditions
genome-mediated evolutionary concept has become essential for future molecular medicine (Table 8.3). To illustrate that outliers (and uncertainty) are very important for the evolution of diseases, phenomena that are generally ignored by current molecular methods, a question should be raised: how many times does one need to repeat an experiment to capture the dynamics of outlier-mediated drug resistance? Following a comparison of the short- and long-term effects of drug treatment, the maximal dosage-induced drug resistance experiment needed to be repeated 64-100 times (Heng, 2015; Horne et al., unpublished observation).
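The scale of that repetition requirement can be rationalized with elementary probability. As a sketch (the per-run probability is an assumption for illustration, not a measured value): if an outlier-mediated resistance event arises independently in each run with probability p, then observing it at least once in n runs with confidence c requires 1 - (1 - p)^n >= c, i.e., n >= log(1 - c) / log(1 - p).

```python
import math

def repeats_needed(p: float, c: float = 0.95) -> int:
    """Smallest n such that P(at least one outlier event in n runs) >= c,
    assuming independent runs with per-run event probability p."""
    return math.ceil(math.log(1.0 - c) / math.log(1.0 - p))

# Hypothetical per-run probabilities of roughly 3%-5% already demand
# dozens of repeats for 95% confidence of seeing the event at all.
for p in (0.05, 0.04, 0.03):
    print(f"p = {p:.2f}: {repeats_needed(p)} repeats for 95% confidence")
```

Under this toy model, per-run probabilities of roughly 3%-5% translate into on the order of 60-100 repeats, which is at least consistent with the 64-100 repetitions reported above; the point is that rare outliers are invisible to experiments repeated only a handful of times.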
8.4.2 Big Data, Artificial Intelligence, and Biomarkers for Adaptive Biosystems

Nowadays, "big data" and "artificial intelligence" are exciting topics in molecular medicine. The vast majority of all biomedical data have been generated in the past 2-3 years, and only an estimated 1%-2% have been analyzed. With such a massive amount of data pouring in, is molecular medicine prepared for more? Biomedical research has always been driven by data generation and analysis. Before the massive -omics data era, researchers spent most of their time generating data. Although it was tedious work, researchers thought they knew what the generated data meant, at least in the ballpark. But not anymore, not for traditional molecular researchers, many of whom have worked on one or a few key genes or cellular targets for their entire careers.
In this section, genomic considerations regarding the big data approach in molecular medicine will be discussed. In particular, it will speculate on the relationship between data and theory and on what types of genomic data should be collected in the big data era.

8.4.2.1 The Future of Big Data in Biological Systems

The term "big data" does not simply refer to the rapidly accumulating data itself but also to the evolving technological platform of computer science, which enables us to extract new insights from massive datasets. For example, one simple definition states: "Big data represents the information assets characterized by such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value" (De Mauro et al., 2016). Although many definitions exist, most emphasize the following key elements: new characteristics (volume, velocity, and variety), specific computational technology, and the end product: valuable information. As biologists, we should focus on data generation and on how to use the information revealed by big data to establish and validate new biological theories, which can contribute practical benefits to humanity, including molecular medicine. The increased power of artificial intelligence was evidenced when AlphaGo (a computer program) defeated a top professional player of the board game Go in 2015 (Silver et al., 2016). This event had people hoping that artificial intelligence would also serve as a game changer in molecular medicine. Winning a given game depends on the rules, the algorithm (a set of unambiguous instructions that a mechanical computer can execute) (Domingos, 2015), and extensive training (playing against humans and, more importantly, against other machines). To duplicate such success in molecular medicine, the correct rules, training, and a biological version of algorithms are crucial.
These requirements bring forth a difficult dilemma: computers need a biological theory in the first place to capture and analyze the right types of data among so many, and yet only the data can confirm the theory. To solve this issue and avoid circular arguments, we should collect data based on various theories and then allow the data to validate the theories. For example, the evolutionary concept is now integrated into artificial intelligence. However, if the evolutionary theory needs to be changed in the first place, that change will of course have an impact on how artificial intelligence works. Nevertheless, biologists must play an important role in the big data business when dealing with medical issues or when algorithms are based on biological principles, in particular by providing the correct biological concepts to build and train machines. Computer technologies, no matter how powerful they become, are only helpers to biologists and physicians in treating patients and searching for the truth in biology.
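The integration of evolutionary concepts into artificial intelligence mentioned above can be made concrete with a toy evolutionary algorithm. The sketch below is a minimal, hypothetical illustration (the bit-string "genomes" and the OneMax-style fitness function are our illustrative assumptions, not from the text): variation (crossover and mutation) plus selection drives a population toward higher fitness.

```python
import random

def evolve(fitness, genome_length=10, pop_size=20, generations=50, seed=0):
    """Minimal evolutionary algorithm: truncation selection, one-point
    crossover, and point mutation acting on bit-string 'genomes'.
    Returns the fittest genome found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_length)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(genome_length)       # point mutation
            child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Hypothetical fitness: count of 1-bits ("OneMax"), a standard toy objective.
best = evolve(fitness=sum)
```

The same variation-plus-selection loop underlies the evolutionary computation used in machine learning, which is why a change in the underlying evolutionary theory would change how such algorithms are designed.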
8.4.2.2 Big Data Versus Theories: The End of Theories or the Beginning of Better Theories

Current genomics has collected parts data rather than system data, and the characterization of parts has so far failed to achieve an understanding of the genomic system. Not distinguishing "parts inheritance" from "system inheritance" has led to much confusion (Chapters 1–4). To resolve this confusion, different strategies have been proposed:

1. Analyze more samples to eliminate the "noise" and to identify the pattern of gene mutations, which will ultimately validate current theories of human diseases. This strategy has been used in the current cancer genome project and has not been successful, as the "noise," or heterogeneity, is a key feature of the cancer system. The more samples analyzed, the more diverse the mutations discovered for the majority of cancer types (Chapter 4). The same approach was also unsuccessful for GWAS.

2. Use the big data approach to establish correlations between genetic data and disease phenotypes. As promised by many computational experts, the big data approach will finally establish reliable biomarkers for most diseases. Although there are some great success stories of market prediction based on big and messy data, predicting a biological system within an evolutionary context will surely be more challenging. For example, in the physical and business worlds, predictions can be made based on the general trend of the data. For predicting evolutionary events, however, the outliers rather than the average data make the call, and macroevolution is often based on "accidents" or a "perfect storm" of emergence. Moreover, what if new system-based data, which we do not even know how to collect, are actually the key? In this sense, we do not even have the correct data for molecular medicine in the first place. Obviously, a better genomic and evolutionary theory is essential to guide data generation and analysis.
A correct cancer theory will also convince cancer researchers to accept the idea that correlation between genomic profiles and phenotypes is good enough for cancer diagnosis (while focusing on individual molecular mechanisms is not useful when there are so many, especially for a complex adaptive system). The evolutionary mechanism of cancer predicts that knowing "what happened" is more important than knowing "why it happened," as how one pathway works under a defined condition has little to do with predicting cancer in reality. Last, the strength of big data is identifying associations without understanding their meaning: correlation is good enough for discovering facts, and a new theory is then needed to synthesize these facts.
3. Our recommended approach is to use a correct theory to guide the data collection. Specifically, collect genomic data based on multiple levels of genetic organization (including the ignored karyotype level), apply the results of big data in molecular medicine, and use them to validate and improve the genome theory. As bioscientists, our task is to establish a correct theory of genomics and evolution, use these theories to guide data generation and collection, and use the results of big data to develop, falsify, or improve bio-theories. This approach can also reconcile the opposing viewpoints regarding the value of genomic theory in the big data era.

In 2008, Chris Anderson, then the editor in chief of Wired magazine, published a provocative article, "The end of theory: The data deluge makes the scientific method obsolete" (Anderson, 2008). This piece generated a heated debate regarding the importance of theory in science and the rationale for studying correlation versus causation. Of course, many scientists disagree with Anderson's conclusion even though they acknowledge the importance of this debate. Most of Anderson's observations and evidence are both true and highly significant, including the great limitations of the Mendelian theory in genomics and of the speciation concept in evolution. However, these limitations only point out that the current paradigm of genetics and evolution might no longer work and that science desperately needs new theories, rather than the end of theory altogether. In fact, we have listed much more evidence than Anderson did, but with very different conclusions. In our view, quite the opposite: a way out is to search for a new theory with the help of big data. Once again, this demonstrates the influence of a scientific paradigm on individual observers, as the same facts can lead to drastically different conclusions.
For Thomas Kuhn, perhaps, the observations Anderson mentioned would have represented a clear sign of a paradigm shift; for us, it is time to search for new genome-based genomics and evolutionary theories. Either way, it should be the beginning of new theories with the help of new technologies.

8.4.2.3 How to Collect the Necessary Data to Create a New Generation of Biomarkers?

The triumph of precision medicine is largely contingent on the success of identifying large numbers of reliable biomarkers. Although hundreds to thousands of biomarkers are described each year, only a few reach the clinic. Moreover, some biomarkers suffer from a lack of reproducibility. To change this situation, the National Biomarker Development Alliance organized two think tank meetings in 2017 and 2018, aimed
to "connect the dots" among the topics of "big data, artificial intelligence, and biomarkers." There are shared concerns regarding biomarkers, including the quality of data and databases; data standardization, integration, and sharing; industry participation; the government's role; and patients' rights, to name a few. As for how to improve the strategy for developing biomarkers, the majority seem to favor a "the more the better" approach. The suggested number of -omics platforms that should be used kept growing during discussions, as if biologists' job were to collect all possible information, after which big data analyses would deliver the good biomarkers. There is limited interest in prioritizing current -omics strategies. Overall, it is a bit surprising that only a few presentations discussed the conceptual limitations of the current genomic approach, even though some speakers emphasized the importance of information theory and of the fundamental laws of biology (creating algorithms provides a mechanistic approach to the discovery of biomarkers). We have insisted that biology does need a new framework for collecting data: current genomics has collected only parts data rather than the system data that we need. As genome organization (the system) is more important than genes (the parts) in cellular evolution, and these two levels of genomic organization follow different "laws," different biomarkers are most definitely needed to monitor these different processes. Unfortunately, for systems with multiple levels, information from lower levels is easier to obtain (such as DNA sequence) but has little to do with system control. To further illustrate our viewpoints, the following questions were considered:

1. How can we develop biomarkers to predict emergent properties during the cellular evolutionary process (in which many genetic targets are constantly changing and there is no linear correlation between genetic markers and phenotypes)?

2. Which genetic/genomic or cellular entity (gene, chromosome, epigene, expression profile, overall genome instability, metabolic profile, or a combination) should be our priority when developing biomarkers? Which types of data should we collect, and what purpose will they serve? Now that efforts are being made to study genotype–tissue expression (GTEx Consortium, 2015), what else needs to be done?

3. Knowing that heterogeneity is the key driver of many diseases, how should we deal with "noise" when developing biomarkers? What if the genetic code is not precise but fuzzy in the first place? How should we predict the emergence of outliers (which traditional cancer research has ignored) (Abdallah et al., 2013; Heng, 2015)?
4. Why is correlation good enough in cancer research (while focusing on individual molecular mechanisms is not useful when there are so many, especially for a complex adaptive system)? The evolutionary mechanism of cancer predicts that knowing "what" is more important than knowing "why," as how one pathway works under a defined condition has little to do with predicting cancer in reality.

To address these issues, it was proposed that the field of molecular medicine should search for better frameworks and technical platforms by rethinking current genomic and evolutionary theories. The following suggestions represent some examples of how to achieve this goal:

1. When developing new biomarkers, focus on monitoring system behavior, not specific pathways (e.g., use NCCAs to measure genome instability for different diseases, based on heterogeneity, complexity, and outlier profiles).

2. Record longitudinal datasets to monitor dynamic processes (e.g., when monitoring disease progression as well as drug treatment response).

3. Pay attention to the phases of evolution (in the macrocellular phase, genome data are more important, whereas in the microcellular phase, gene mutation data might be more useful). Different phases need different biomarkers, and the phase transition is of clinical importance.

4. Collect quantitative data (with all positive and negative results, benefits and trade-offs, long and short term).

5. Do not treat all datasets equally (it is important to identify conflicting datasets; it is known that, in different cases, predictions can be improved or degraded by combining data), and phenotype data should be weighed more heavily. The research community needs to compare different types of biomarkers under identical conditions to identify the best stage-specific biomarkers.

6. Outlier data are more important than average data during crisis (drug resistance experiments need to be repeated over 60–100 times, based on our in vitro experience). Methods are needed to predict emergent properties based on outliers.

7. While increasing the quality of data is essential (to avoid the situation of garbage in, garbage out), researchers also need to respect the fuzziness of real data when dealing with messy big data, as bio-uncertainty and nonspecificity are key features of disease conditions. The commonly used statistical platforms need to be modified to reflect such needs.
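The first suggestion, monitoring system behavior through NCCA frequency, can be sketched computationally. The following is a minimal, hypothetical illustration: it assumes each cell's karyotype is summarized as a string and that a "clonal" karyotype is one shared by at least 10% of sampled cells (both the encoding and the cutoff are illustrative assumptions, not specifications from the text).

```python
from collections import Counter

def ncca_frequency(karyotypes, clonal_cutoff=0.1):
    """Fraction of cells carrying a non-clonal karyotype, i.e. one seen in
    fewer than `clonal_cutoff` of the sampled cells. This is a population-
    level (system-behavior) readout of genome instability, deliberately
    agnostic about which specific aberration each cell carries."""
    counts = Counter(karyotypes)
    n = len(karyotypes)
    nonclonal = sum(c for c in counts.values() if c / n < clonal_cutoff)
    return nonclonal / n

# Hypothetical longitudinal samples: each string encodes one cell's karyotype.
timepoints = {
    "baseline":  ["46,XY"] * 18 + ["45,XY,-7", "47,XY,+8"],
    "post-drug": ["46,XY"] * 10 + ["45,XY,-7", "47,XY,+8", "44,XY,-7,-9",
                  "48,XY,+8,+12", "t(3;11)", "46,XY,del(5q)", "92,XXYY",
                  "47,XY,+21", "45,XY,-18", "43,XY,-2,-7,-9"],
}
trend = {t: ncca_frequency(cells) for t, cells in timepoints.items()}
```

On these toy data the non-clonal fraction rises from 0.1 at baseline to 0.5 after treatment: a longitudinal, system-level signal of elevated instability, in line with suggestions 1 and 2 above.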
8.4.2.4 Big Data and Phenotypes

If the big data approach is effective in identifying associations, why not take advantage of this amazing technology for developing biomarkers? If genomic fuzziness is so high, why not focus on the phenotype rather than on gene profiles, especially given the new finding that normal tissues also host a large number of gene mutations, including cancer driver mutations (Martincorena et al., 2018)? Two new trends represent such a shift toward phenotypes rather than molecular parts in the search for biomarkers. One is focusing more on cellular phenotypes than on genotypes to monitor disease progression and treatment responses. For example, many cellular features (cell weight, nuclear morphology, cellular differentiation status, and growth and survival rates) can result from various genetic and environmental conditions. They can all be treated as end products of a complex interactive relationship, without knowing which molecular pathway is currently involved (like a "black box"). Combined with the big data approach, useful biomarkers can then be identified based on cellular phenotypes. Encouragingly, the US Food and Drug Administration considers "black box" algorithms for complex diseases an acceptable strategy for identifying biomarkers (if the inputs and outputs are robust). Interestingly, when the frequencies of various types of NCCAs are used as a biomarker, the measurement is based not on a specific genomic change but on the stability of the cellular population, another phenotype. In fact, these new system-related phenotypes, including the frequencies of outliers, the complexity of karyotypes, the transitions between evolutionary phases, and the emergence of new karyotypes, all of which have also been referred to as system behavior, should be used for establishing new biomarkers.
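The "black box" idea, associating a cellular phenotype score with an outcome without modeling any pathway, can be sketched with a plain correlation. All data below are hypothetical toy values (a made-up nuclear-morphology abnormality score versus a made-up severity index), chosen only to show that the association can be quantified with no mechanistic input.

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Plain Pearson correlation: a measure of association strength that
    makes no claim about the underlying molecular mechanism."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical per-sample data: phenotype score (input) vs. outcome (output).
abnormality_score = [0.1, 0.3, 0.2, 0.6, 0.8, 0.9]
severity_index    = [1.0, 2.1, 1.8, 3.9, 5.2, 5.8]
r = pearson(abnormality_score, severity_index)
```

Here the phenotype is treated purely as an input and the clinical outcome as an output; if the two are robustly associated across samples, the phenotype can serve as a biomarker regardless of which pathways produced it.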
Similarly, the dynamic levels of transcriptome and epigenetic activities, rather than specific gene/epigene functions, can serve as better biomarkers to monitor the disease process (Stevens et al., 2011b, 2013a,b, 2014; Heng, 2015). Recently, using aneuploidy as an example, such points have been explained through the lens of system inheritance, fuzzy inheritance, and the emergence of new genome systems (Ye et al., 2018b). As nonclonal aneuploidy represents a phenotype that can unify diverse molecular mechanisms, its clinical predictability is much better than that of individual gene mutations. The other trend involves using large datasets of normal individuals and patients' health records to study the relationship between genetic profiles and diseases and to connect different diseases to one another, in addition to looking at the interactions among different treatments, the baselines and ranges of normal individuals and patients, and their diseases, along with health and survival consequences. Well-known examples
include the UK Biobank and the recent US Million Veteran Program (MVP). The UK Biobank involves 500,000 participants with links to a wide range of electronic health records (cancer, death, hospital episodes, general practice). In particular, 100,000 participants have worn a 24-hour activity monitor for a week, and 20,000 have undertaken repeat measures; 100,000 selected participants have undergone imaging scans (brain, heart, abdomen, bones, and carotid artery); and all participants have provided blood biochemistry and genotyping data, many with exome sequences (UK Biobank). The goal of the MVP is to partner with veterans receiving their care in the VA Healthcare System to study how genes affect health; other types of information will also be available. Although very promising (Cox, 2017), these types of programs also have their limitations. First, genomic information is mainly based on genes' contributions. It would be more valuable if chromosomal features were also included. For example, it is now known that elevated stochastic chromosomal aberrations can be linked to Gulf War illness (GWI) (Liu et al., 2018). The continued disregard of the relevance of chromosomal aberrations is puzzling. Second, although it is very important to retrieve population data based on large sample sizes, there is still a gap when applying such information to individual patients, especially when the establishment of significant linkages between diseases and contributing factors requires hundreds or thousands of individuals. In a sense, it is still difficult to use population patterns and general trends to precisely diagnose individual patients. Efforts regarding this aspect clearly need improvement. The idea of focusing on phenotypes has generated exciting results (Bastarache et al., 2018). Similarly, different medical records have been applied to study human diseases. For example, using insurance claims covering over one-third of the entire US population to create a subset of 128,989 families (481,657 unique individuals), Wang et al.
have used these data to "(i) estimate the heritability and familial environmental patterns of 149 diseases and (ii) infer the genetic and environmental correlations for disease pairs from a set of 29 complex diseases." They found that "migraine, typically classified as a disease of the central nervous system, appeared to be most genetically similar to irritable bowel syndrome and most environmentally similar to cystitis and urethritis, all of which are inflammatory diseases" (Wang et al., 2017). Many similar research efforts are actively ongoing at nearly all major medical centers, and big data mining of medical records will soon become a major topic in molecular medicine.
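Wang et al.'s actual models are far richer, but the basic logic of inferring heritability from relatives' phenotypic correlations can be shown with the classical Falconer twin-study estimate. The correlation values below are hypothetical, and this textbook simplification is offered only as an orientation, not as the cited study's method.

```python
def falconer_h2(r_mz, r_dz):
    """Classical Falconer estimate of heritability from monozygotic and
    dizygotic twin-pair trait correlations: h^2 = 2 * (r_MZ - r_DZ).
    A textbook simplification; claims-based family studies use far
    richer pedigree structures and statistical models."""
    return 2 * (r_mz - r_dz)

# Hypothetical trait correlations (illustration only).
h2 = falconer_h2(r_mz=0.70, r_dz=0.45)  # approximately 0.5
```

The intuition is that MZ twins share roughly twice the segregating genetic variation of DZ twins, so the excess MZ correlation, doubled, attributes that share of phenotypic variance to genetics.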
8.4.3 Education and the Future of Biomedical Science

Before this book concludes, the issue of how to educate and influence future biomedical scientists deserves a brief discussion. The importance of this subject is obvious: the new generation of scientists will ultimately
decide what the future of molecular medicine will look like and which theories will dominate. However, today's education systems are likely not up to the task. Current PhD training has focused more on keeping up with incremental scientific progress and on equipping students with popular technical skills. This has been particularly true since genomics became big science, in which most trainees serve as a skilled workforce. Such issues have triggered serious rethinking about treating scientists as a workforce and its damage to the future of science (Lazebnik, 2015). In recent years, the bioresearch landscape has drastically changed before our eyes. Traditionally, research environments were far more heterogeneous. Scientists were trained and influenced by different schools of thought (reflected in academic trees going back generations), excelled in different technical skills or experimental systems, and were interested in diverse research topics. They often became experts in their own fields by accumulating decades of experience. The greatest scientists, those who made seminal contributions, all went through decades of dedication, continuously focusing on the research topics that most interested them and seized their curiosity, rather than simply shifting their research direction for the sake of popularity or funding. This aspect of research has changed since the arrival of molecular genetics and the biotechnology industry, and such changes have been greatly accelerated by big science projects such as the Human Genome Project. Most laboratories now use similar popular methods, causing scientists' individuality to gradually disappear.
As large-scale technologies have become a major driving force in the research community, and research funding pays for access to these technologies, success in the current research environment requires high levels of funding and a large workforce, making smart research ideas and unique skills less valuable and less desired. With individual creative thinking rapidly losing its influence on science, as many big projects are goal-oriented and individuals play only their own parts, scientific individuality is being drastically reduced for many, while a small portion of "science managers" continue to gain more credit. The goal of many researchers has become successfully obtaining funding and publications rather than searching for the truth, which should ultimately be the main goal of conducting any kind of research in the first place. It seems as though they have little time for conducting truly meaningful research, which often requires long-term thinking and the courage not to simply "go with the flow." Becoming an exceptional scientist is difficult. It requires decades of knowledge accumulation and long hours of work and dedication (including holidays), all while being in a competitive environment with moderate financial rewards. The driving force for most scientists is curiosity about nature, a passion for searching for the truth, intellectual
satisfaction, and the potential to contribute to humanity. The negative environment of current research communities, which in turn has a huge negative impact on researchers, is too significant to ignore. Such situations greatly affect scientific morale and wellbeing. This phenomenon has often been discussed among scientists but less so in public. One rare publication appeared nearly a decade ago in a cancer research journal, asking a profound question: where is the passion for cancer research?

Scientific research was once considered a pinnacle profession where intellectual rigor was paired with a passion for novel discovery. Today, despite better equipment, more funding and online access to a growing reservoir of data, researchers in some of the largest cancer research centers in the country appear to be spending less time in the lab and, perhaps, less time worrying about how their work impacts people with cancer. (Kern, 2010)
This partially explains why the future for graduate students in the United States is troubling, according to some:

About 60% of graduate students said that they felt overwhelmed, exhausted, hopeless, sad, or depressed nearly all the time. One in 10 said they had contemplated suicide in the previous year. (Arnold, 2014)
A key approach to changing this situation is to bring back the individuality and fun of science. Worthy scientists should be independent, capable thinkers who enjoy challenging the status quo and embracing improved frameworks while working toward a lifelong goal of improvement. To keep genomics an interesting field, balance is needed between key fundamental aspects: between "boring" data collection and exciting data analysis and theoretical synthesis; between tedious daily work and continuous deep thinking on key topics of interest, without constantly chasing fashionable research topics; between one's own technical expertise and newly developed cutting-edge methods; and between the technical progress of one's own projects and the conceptual progress of the entire field. The following educational strategies should be very useful in achieving these balances.

8.4.3.1 Knowledge Structure

To build a solid scientific foundation for future genomic scientists, an appreciation of the following concepts is essential, and many of these subjects should become mandatory courses.

1. Complexity science: Unlike traditional molecular biology, in which "cause and effect" or linear approaches are the key ways of thinking, complexity science focuses more on nonlinear dynamics, which are unpredictable and multidimensional within the context of emergence. In fact, Walter
Elsasser called for a different kind of biology in the 1980s, in which molecular causal chains are no longer the main focus of study (Elsasser, 1981, 1984). With the increasing discoveries of nonspecificity and fuzzy inheritance in genomics, an appreciation of complexity science will finally come into play. In medical schools, courses on "health care as a complex adaptive system" are necessary (Sturmberg, 2013). The adaptive-systems way of thinking about health will have a big impact both on health care delivery and on how to sustain current health care infrastructures (Sturmberg et al., 2017, 2019).

2. Evolutionary medicine: Evolutionary medicine is a newly emerged field that applies the principles of evolutionary biology to understanding health and disease and uses this knowledge to aid disease prevention, diagnosis, and treatment. Since its introduction in the early 1990s, evolutionary medicine has had an impact on various human health issues, such as infectious diseases, immune function, and aging (Nesse et al., 2010). In cancer research in particular, evolutionary analyses have become quite popular (Heng, 2007a, 2015, 2017a). There has been a recent call to classify evolutionary biology as a basic science in medicine (Grunspan et al., 2018). Because cancer evolution is two-phased (see Chapter 3), caution is needed when the term "Darwinian medicine" is used.

3. The philosophy of science: An appreciation of both the history and philosophy of science is important for understanding the key limitations of current genomics, especially when the analysis is done through the eyes of Thomas Kuhn and Karl Popper. The textbook description of the preconditions for a paradigm shift will certainly encourage scientists to push for such a shift in genomics and evolution, including through decisive experiments that scrutinize the gene theory and the mechanism of natural selection.
In addition, many keystone experiments and assumptions, together with their predictions, must undergo serious scrutiny against both past and current data.

4. The limitations of mathematics and physics in biology: With the increasing involvement of mathematics and physics in biomedical science, especially through the integration of various computational technologies and bioinformatic platforms, an illusion will likely develop that mathematical, physical, and computational analyses will solve the mysteries of biology once and for all. It is thus extremely important to emphasize the difference between nonlife and life systems. As discussed in a previous chapter, the "laws" will likely differ as well. Knowing the limitations of using principles derived from nonlife systems to study biology is of great
importance, as the degree of uncertainty is quite high in dynamic biosystems, and bio-evolution is historically context-dependent. Unlike those of mathematics and physics, biological principles and theories often come with many exceptions. When the key predictions of a theory fail for the majority of cases, it is time to find a better theory. Furthermore, most biological issues have close practical implications (for medicine, agriculture, the environment, ecology, and more). Respecting reality is one key to designing experiments if scientists would like to apply their laboratory findings to the real world.

5. Critical thinking: A course on critical thinking is urgently needed in molecular medicine and genomics. Comparing and contrasting the reductionist and holistic approaches in medical science, along with the gene and genome theories, should lead the discussions in this course, among many other topics. The limitations of using molecular mechanisms to understand diseases should also be discussed, including the current statistical platforms, as both "parts characterization" and "average profiling" are limiting when studying cellular evolution, one important basis of an individual's health.

8.4.3.2 Scientific Culture and Professionalism

Scientific professionalism can be judged by attitude, character, behavior, and standards of research and communication. One important attitude toward science is to consider any scientific theory as dynamic and limited. When a theory does not fit scientific reality, no matter how favorable it is to the scientific community, it is time to search for a better one. Furthermore, today's theory, by definition and by historical experience, will likely be wrong or significantly limited tomorrow. This is the biggest rationale for searching for new frameworks, especially when a paradigm shift seems to be around the corner. Individuals can make a difference by changing history, as today's experiments are tomorrow's history.
As the field of biology has not yet experienced a true paradigm shift, the effort to search for a new conceptual framework for future genomics and evolution is highly significant. Another important element of scientific culture is debate. Judging from historical publications, debate appears to have been much more common in the past than it is now. The general public was also much more interested in debating evolution. Both the established side (the constraining force of current knowledge) and the challenging side (the dynamic force for change) are essential for scientific progress. Serious debate is the key to identifying paradoxes and
fundamentally defending and advancing established theories, and to searching for new theories when required. Debate can also recruit scientists from different disciplines and, more importantly, generate exciting opportunities for a new generation of scientists. Leading scientists should set examples for debate, especially where there is much confusion, as we have previously discussed. One suggestion is for the field of molecular medicine to systematically debate the key paradoxes in the field, in collaboration with leading scientific journals. The entire research community, including students, should be encouraged to get involved in such activities. In the field of cancer research, for example, there is increasing debate challenging the gene theory of cancer (Heng, 2015). To create and maintain a healthy research landscape, it is important for individual scientists to adopt high standards of scientific moral principle and integrity. It is scientists' responsibility to question mainstream concepts and platforms when these contradict their own key predictions. Persistently and forcefully challenging the overexaggerated promises in molecular medicine is not only essential for science's self-regulation but can also help the public establish more realistic expectations of science and medicine. For example, acknowledging the limitations of current cancer treatment methods will help patients make more beneficial decisions about their treatment options and encourage the general population to put more effort into healthier lifestyle choices, which may reduce disease prevalence in the first place. This is important because medicine cannot simply fix most problems, despite its ability to sequence patients' DNA. It is equally important to support new academic societies and organizations that aim to promote new concepts and approaches.
For example, there are new societies on evolution and complexity in health, including the International Society for Evolution, Medicine, and Public Health; the International Society for Evolution, Ecology and Cancer; and the International Society for Systems and Complexity Sciences for Health. Their members are extremely motivated to push these concepts into mainstream medical research, and their contributions will soon become obvious.

8.4.3.3 Policy Matters

Since the end of World War II, the US government has gradually become the major sponsor of biomedical research. For decades, the majority of funding for health studies in universities has come from the NIH. In recent years, the sustainability of US biomedical research has become a serious issue, partially because of the field's own rapid expansion and the reduced overall support from the federal budget. This issue has prompted many leaders (including a former president of the National
Academy of Sciences and a former director of the NIH) to call for quick action to rescue US biomedical research.

…the remarkable outpouring of innovative research from American laboratories – high-throughput DNA sequencing, sophisticated imaging, structural biology, designer chemistry, and computational biology – has led to impressive advances in medicine and fueled a vibrant pharmaceutical and biotechnology sector. In the context of such progress, it is remarkable that even the most successful scientists and most promising trainees are increasingly pessimistic about the future of their chosen career. Based on extensive observations and discussions, we believe that these concerns are justified and that the biomedical research enterprise in the United States is on an unsustainable path. (Alberts et al., 2014)
Discussions aimed at uncovering the contributing factors and searching for solutions are on the rise. Many of these discussions concern funding, how to reduce the competitiveness of the environment, and regulation of the workforce size. Fewer discussions shed light on the balance between the increasing reliance on the big science approach and the decreasing individuality of scientists. Even with optimal financial support from the government (from 1965 to 2015), scientific progress was not optimal when compared with the scientific achievements of the previous 50 years (1915–1965), when much less money was spent by a much smaller scientific community. Overall, "Science of the past 50 years seems to be more defined by big projects than by big ideas"; "the advances are mostly incremental, and largely focused on newer and faster ways to gather and store information, communicate, or be entertained"; and "We are awash in small discoveries, most of which are essentially detections of 'statistically significant' patterns in big data. Usually, there is no unifying model or theory that generates predictions, testable or not. That would take too much time and thought" (Geman and Geman, 2016). Despite the impressive development of molecular manipulation technologies and various large-scale -omics projects, there has been limited theoretical progress in molecular biology when compared with the preceding 50 years, when many important concepts and theories were established, including the discovery of DNA, the formation of the modern synthesis, the introduction of the gene theory with Mendel's principles, the establishment of the double-helix model, the illustration of the genetic code, the gene–protein–phenotype relationship, and the central dogma of molecular biology. In comparison, the period of 1965–2015 represents the "normal science" phase of molecular biology, according to Kuhn's definition, characterized by the accumulation of molecular details under the established gene theory without challenging it. 
Currently, the systematic characterization of gene structure, function, and alterations
dominates. Based on the accumulated anomalies described throughout this book, the period of 2015–2065 should be an exciting time to embrace new theories/technologies and perhaps the long-awaited paradigm shift in biology. Given the high proportion of financial support from the federal government, federal science policy directly impacts the behavior of individual scientists and even the culture of the scientific community. Because molecular medicine is intimately related to public health issues, many ethical issues are of utmost importance. Scientists need to pay more attention to laws and ethical guidelines and to establish these regulations at the federal level; this is essential when many biotechnologies are extremely powerful and could thus cause huge damage to humanity when misused. The following policies will play an important role in maintaining a healthy ecosystem for biomedical research: 1. Establish regular national think tanks to critically evaluate major scientific fields. Think tanks should include experts outside of the examined field, experts representing different schools of thought, and scholars of the philosophy of science. The main tasks of these think tanks are to critically examine key theories, conceptual frameworks, and key technical platforms; to evaluate the phase of the science (does the field belong to the normal science phase, where data collection is key, or to the dynamically changing phase, where introducing new concepts is key?); and to identify the key predictions and paradoxes in the field. These analyses would help determine priorities for future research. For example, based on the results of the cancer genome project, there are a few hundred driver gene mutations. How many of them should be systematically studied? Should it be the top 50 or 100, based on their frequency of occurrence in the patient population? How many grants should be given to study the same gene mutation? What about the different technical platforms? 
Should we prioritize them and reflect this in funding? 2. Promote different schools of thought on major theoretical issues. The currently dominant theory needs to be constantly challenged by alternative ones, especially when key paradoxes exist. This kind of challenge is crucial for improving the mainstream theory as well. Providing opportunities for different competing concepts to grow is the best policy for maintaining healthy scientific ecosystems. Thus, investment in alternative concepts is very valuable and much more important than simply repeating experiments to confirm mainstream concepts. Scientific evolution will allow only the final
winners to emerge. The current grant review process has completely ignored the importance of maintaining heterogeneous research ecosystems. A policy change is required to change the behavior of reviewers. Vannevar Bush, the first science adviser to the US president and the visionary behind the establishment of the National Science Foundation, insightfully advised that: At their best they [universities] provide the scientific worker with a strong sense of solidarity and security, as well as a substantial degree of personal intellectual freedom. All of these factors are of great importance in the development of new knowledge, since much of new knowledge is certain to arouse opposition because of its tendency to challenge current beliefs or practice. Bush, 1945
Clearly, as the majority of bioresearch funds in universities come from the NIH, it is best for the NIH to provide the conditions for searching for and discovering new knowledge. 3. The time is ripe to create more faculty positions for bio-theorists. With the rapid accumulation of big data, theoretical synthesis is becoming extremely important. Unlike bioinformaticians and/or computational biologists, theorists are more interested in establishing, validating, or falsifying different bio-theories. Funding mechanisms should also be created for them. In the era of molecular biology, scientists have paid more attention to data generation than to theoretical analysis. In the gene cloning era, the key question was "who clones what gene first," and cloning individual genes involved little theory. This is not surprising, knowing how "normal science" works. However, it is puzzling that, unlike mathematicians and physicists, bioscientists do not get well-deserved credit for insightful analysis and reanalysis of data published by others. It is as if one must perform one's own experiments to earn the right to analyze them and receive credit. There are some exceptions, of course. In the field of evolutionary biology, scientists can often analyze others' data to support their theories. This situation in molecular genomics is now drastically changing, led by waves of analyses using bioinformatic approaches. More bio-theorists will soon find the best research ecosystems, now that there is far more data than most scientists can handle. Notably, some of the most important bio-discoveries/theories are based not on the discoverers' own experiments but on information synthesis: from Darwin's theory of natural selection to the modern synthesis to the model of the DNA double helix. Surely more are to follow.
4. Allow individual scientists to access big science infrastructures. Big science infrastructures, including national bio-information sharing systems and national core facilities, are essential both for quality control and for saving money. For example, a large portion of health researchers nowadays use animal models. As discussed earlier, by eliminating genetic and environmental heterogeneity, these valuable mouse models become less useful in representing clinical conditions; without heterogeneity and evolutionary selection, these models no longer reflect clinical reality (Heng, 2015). Achieving both genetic and environmental heterogeneity is unrealistic in animal model experiments performed by individual investigators. To solve this issue, major animal model centers are needed for systematic evaluation. On the surface, this type of comprehensive animal model center sounds very expensive. However, compared with the money spent on small-scale experiments and their lack of medical applicability, such a collective effort would be justified. In addition, an international database is needed to deposit all animal model data, including all negative data. Strong statistical and simulation teams are also required at these centers to dynamically monitor and apply the data generated from animal models. Heng, 2015 (with permission from World Scientific).
Artificial intelligence-based data analysis centers should also be established, with open access given to individual scientists, rather than spending extra funding to establish small and less efficient labs. Together, these measures will provide huge financial benefits. 5. Push the frontiers and enforce regulation. Molecular medicine now has many exciting frontiers: stem cell-mediated tissue/organ regeneration, CRISPR-Cas9-based gene editing and therapy, whole genome sequencing for individual patients, big data-based diagnosis, specific molecular targeting, in vitro fertilization, etc. While each frontier represents exciting potential to enormously benefit patients, it also represents a high risk when things go wrong or when these powerful tools are used inappropriately or by the wrong people. Therefore, policies regarding how to push these frontiers, and particularly how to carefully regulate and control these technologies, are of great importance. Although science's self-regulation works in most cases, clear federal laws and ethical guidelines are needed to govern all such technologies, especially their combined use. Otherwise, news stories such as "The scientific world erupted with outrage and concern after a scientist claimed he used gene-editing to alter the DNA of a pair of twins" (Fox, 2018) will escalate. In addition, both federal laws
and institutional regulations should be required to punish fraudulent practices in science to ensure its integrity. The current status of molecular medicine reflects a big gap between our scientific knowledge and our technical capabilities. Despite all of our capabilities, we still do not know why we get cancer or how to cure it (based on molecular knowledge alone). The rationale behind all our efforts is for the genome-based somatic evolutionary theory to fill this gap. Even with the ability to land on Mars and all the exciting predictions about artificial intelligence, we have no idea how our own intelligence works; more importantly, we still understand very little about ourselves, where we come from, and what mechanisms make us human and lead us to disease. The new genomics and evolutionary theories discussed in this book will certainly provide the necessary platform to directly debate and then address these questions. Our ultimate job is to study and answer these questions to the best of our ability, using available information and technology, and to deliver this knowledge to the public. If we fail to do this, we have failed in our duty to give back to science and humanity.