CHAPTER 2
Genes and Genomes Represent Different Biological Entities
2.1 SUMMARY
It is the emergent genome, rather than isolated genes, that defines a biosystem. By ignoring the importance of the genome system, the gene-centric view has been limited by its unrealistic simplicity. To illustrate the distinctive features of genes and the genome, this chapter briefly reviews the definition of the genome and a number of gene-centric concepts and their limitations, including the “selfish gene,” “minimal gene sets,” and “speciation genes.” Experimental examples are then discussed to highlight the conflicting relationship between genes and the genome. These analyses show not only that gene functions are constrained by the genome but, perhaps more importantly, that characterizing genetic parts will not lead to an understanding of a system’s emergent genomic properties. Genome-level operation is not simply a matter of “adding up” the functions of individual genes. Accordingly, examples are presented in which genome context determines or influences gene function. Together with Chapter 1, this chapter establishes the rationale for searching for a new framework of genome-based genomic theories.
Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00002-1
Copyright © 2019 Elsevier Inc. All rights reserved.
2.2 THE DEFINITION OF THE GENOME
A few decades after they were first observed by German botanist Wilhelm Hofmeister in 1848, chromosomes were suggested to be the carriers of inheritance. The chromosome theory of inheritance, introduced independently by Walter Sutton and Theodor Boveri in 1902, proposed that chromosomes are the basis of all genetic inheritance. Earlier, in 1889, Hugo de Vries had used the term “pangen” to describe Mendel’s abstract concept of an isolated genetic factor, the smallest hereditary particle (de Vries, 1889). Not until 1909 did Danish botanist Wilhelm Johannsen use the word “gene” to define the fundamental physical and functional units of heredity (Johannsen, 1909). German botanist Hans Winkler coined the term “genome” in 1920 by merging two words: “GENe” and “chromosOME” (Winkler, 1920). Given this history, the word “genome” should clearly encompass both the whole genomic basis (chromosomes) and the units of heredity (genes). It is important to note that Winkler intended the term to link the genome to the foundation of the species (the whole genome system). Similarly, before the establishment of molecular genetics, the genome was referred to as “a set of chromosomes.” Unfortunately, during the gene era and the dominance of the gene-centric view, the importance of the chromosome as a system organizer was ignored, and the chromosome came to be regarded primarily as a vehicle for genes. The term “genome” has been applied specifically to mean the complete set of DNA molecules of a cell (both the nuclear genome and the organelle genomes, such as those of the mitochondria and chloroplasts, of a given species). Gradually, the “chromosome” portion of the genome has been chipped away; in practice, the term “genome” is now reduced to merely “a collection of genes” or “the whole of an organism’s hereditary information encoded in its DNA (or, for some viruses, RNA).” Here are some representative examples:
A genome is an organism’s complete set of DNA, including all of its genes. Each genome contains all of the information needed to build and maintain that organism.
US National Library of Medicine (NIH), https://ghr.nlm.nih.gov/primer/hgp/genome
The genome of an organism is the whole of its hereditary information encoded in its DNA (or, for some viruses, RNA). This includes both the genes and the non-coding sequences of the DNA.
https://simple.wikipedia.org/wiki/Genome
A genome is the full set of instructions needed to make every cell, tissue, and organ in your body. Almost every one of your cells contains a complete copy of these instructions, written in the four-letter language of DNA (A, C, T, and G).
http://www.broadinstitute.org/education/glossary/genome
All the genetic material in the chromosomes of a particular organism. The size of a genome is generally given as its total number of base pairs.
Kevles and Hood, 1992
The haploid set of chromosomes in a gamete or microorganism, or in each cell of a multicellular organism; more specifically, “the complete set of genes or genetic material present in a cell or organism.”
Oxford Living Dictionaries
Keller has listed a range of definitions of the genome (both “official” and semi-official), together with historical discussions of the relationship among genes, the genome, and genomics (Keller, 2011). The common thread among all these definitions is the mention of “genes or genetic materials”; “genetic instructions” are sometimes vaguely mentioned as well. Definitions that do include chromosomes focus mainly on the fact that genes are located on chromosomes. None discusses a specific function of inheritance at the chromosomal level, and none mentions the topological and organizing roles of chromosomes. In short, all definitions focus on the genetic materials rather than on the organization of these materials. It is no wonder, then, that the project of sequencing all genes was termed “the genome project” rather than “the gene sequencing project,” and that most researchers equate decoding individual genes with decoding the genome (the relationship among genes). As a consequence, the current improper use of the term “genome” is one of the biggest sources of confusion in genomics today, and it has generated many misconceptions in related fields of bioscience. For example, many consider the gene to be the independent unit of inheritance, and individual genes or combinations of them to be responsible for genetic traits, whereas few believe that chromosomes are more important for organizing genomic information; the gene is extensively used as the unit of cellular and organismal evolution studies, and only a few researchers still use karyotypes to study macroevolution (those who do are often considered outdated); and it is believed that most complex diseases can be understood by tracing them back to fewer than a handful of genes, with the study of chromosomal aberrations serving only as a tool to identify gene mutations. Almost all molecular genetic research focuses on the gene and its related aspects.
Ironically, the characterization of chromosomal aberrations is often dismissed as “descriptive” research (a very negative comment used in NIH study sections to kill proposals). However, the characterization of gene mutations (clearly, a description at another level) is considered mechanistic rather than descriptive research. This unfortunate bias rests on a common belief that the gene-centric reductionist approach is the preferred method and that studies of gene-mediated genetic information are more important than studies of other types of genomic organization (simply because of their higher resolution). However, these wishful ideas are at odds with the basic principles of complexity science, as well as with accumulated genomic data, which demonstrate that genome context differs from gene content (Heng, 2009). Yet rather than reexamining the basis of genetics, scientists are now pursuing strategies that search for the answers of genomics beyond the genome, because studying individual genes is not working. This situation has been discussed as follows:
Perhaps also due to a common misconception that finishing the genome sequencing phase means the mission for genome-level research has been accomplished, the term post-genome era has been used, as if decoding DNA by sequencing is equal to decoding the genome itself. Furthermore, some have even suggested that the goal of this post-genome era is to search for molecular mechanisms beyond the genome. In fact, “beyond the genome” has become a buzzword, even though it is clear that the genome is not just a bag of all genes and DNA sequences, and the functional organization of the genome is virtually unknown. Obviously, the major promises of sequencing the human genome did not pay off in terms of solving the mystery of life or providing an understanding of the genetic basis for most common human diseases. Such a disappointment has quickly led to the common viewpoint that: “If the answer cannot be obtained from genome sequencing, why not move beyond the genome?” On the surface, this seems logical. However, this view is based on a deep misunderstanding of genetic organization, because the genome system is defined by genomic context rather than gene content. … many features of bio-systems, such as protein interaction pattern and network dynamics (including systems-specific boundaries), are defined by the genome context. Characterization of the genome’s features or behavior is not beyond genomic research; it is the very core of it.
Heng and Regan, 2017 (with permission)
To initiate a timely conversation on this subject, a definition of the genome has been introduced:
A genome is the complete set of genetic material (including gene content) of an organism, which is organized by the unique composition of chromosomes or karyotypes. While genes represent parts inheritance (how individual genes code/regulate individual proteins), the karyotype determines the topological order of genes along and among chromosomes, which ultimately defines interactive relationship among genes, representing the system inheritance (how the blueprint works to instruct the protein network).
Heng, 2017a
Note that the usage of “karyotype” here is not strictly accurate, but it is necessary. Rather than its original definition (the entire set of chromosomes of a cell from a given species, usually displayed as a systematized arrangement of chromosome pairs in descending order of size), it in fact refers to the species-specific gene order (or topological relationship) along and among chromosomes, which serves as a physical platform for gene interaction within the 3D nucleus (see Chapter 4 for more details). The main reason for using “karyotype coding” to refer to the “chromosomal code” is to catch readers’ attention and prompt them to think further. If we simply used the term “chromosomal coding,” readers would probably interpret it as DNA coding and continue to ignore it. In addition, different species often display different karyotypes; the term karyotype coding thus emphasizes that this new coding is species-specific.
2.3 “PARTS VERSUS THE WHOLE”: THE EMERGENT RELATIONSHIP (WHICH CHALLENGES REDUCTIONISM)
The bias favoring genes over chromosomes is no surprise given the popularity of reductionism in current bioscience. According to the gene-centric view, the gene represents the basic unit of inheritance, and the phenotypic contribution of higher levels of genomic organization, such as the chromosome, should be understood simply by dissecting the functions of individual genes and then integrating this information. Furthermore, science education has reinforced the belief that physics and chemistry comprise the mechanistic basis of how biosystems work, and that biological science needs to follow the approaches that led to the success of physics and chemistry (as they are more mature scientific disciplines). Indeed, many physicists and chemists became biologists and contributed greatly to the birth of molecular biology, and mathematical tools have shaped the research landscape of population genetics. In the past two to three decades, computational and bioinformatic analyses have gradually become a key component of biological research, as reflected by the Human Genome Project and many other large-scale -omics projects. In fact, the journal Bioinformatics (initially named Computer Applications in the Biosciences, before switching to its current name in 1998) was established in 1985, 2 years before the birth of the journal Genomics. After more than three decades of massive molecular data collection and bioinformatic analysis, a large number of data generation platforms and bioinformatic tools/packages are now available.
Despite wave after wave of excitement (e.g., who can forget how promising gene expression microarray technologies seemed when they were first introduced?), the flood of data has, contrary to initial expectations, only increased overall biological uncertainty. More diverse data have often meant more complexity and more confusion, and the problem appears to be bigger than computational power itself: it ultimately challenges many of our genetic and biological foundations. When the data do not support the concepts, most researchers blame the data rather than question the biological concepts. One viewpoint holds that inconsistencies between the data generated and genomic concepts are caused by insufficient data: once more data are collected, the patterns predicted by current concepts will become reliable. Another viewpoint holds that we do not yet know how to filter the noise out of the available data: once we have better computational power to filter out this noise, the patterns will become apparent.
Both viewpoints are based on the same rationale: similar to the relationship between defined molecules and chemical reactions, the relationship between genes and phenotypes should have a high degree of certainty/predictability, and the key to finding this certainty (reflected by highly repeatable patterns) is to eliminate the noise. Yet even though (1) it is increasingly recognized that many genetic factors are highly dynamic and heterogeneous (so it is not simply a matter of eliminating noise); (2) the data distribution does not follow the simple patterns defined by the Mendelian laws of genetics (so the laws of genetics need to be reexamined); and (3) environmental factors can have a significant impact on the phenotypes associated with genes, especially for complex traits (so it is not simply a matter of accumulating data on different factors), most scientists still believe or hope that gene-based predictions will be precise once the data issue is solved. After all, most genetic/genomic researchers are typical reductionists. Given the historical relationship between reductionism and classical physics, the general appreciation of the limitations of Newton’s view of the universe in the modern scientific era, and the increasing popularity of complexity science since the 1980s, it is extremely puzzling that most molecular biologists still firmly hold the reductionist viewpoint as the guide for practicing science. Biologists should know better, as life is both complicated and complex, and evolution is not a linear process. Of all scientists, biologists should embrace complexity science with the greatest enthusiasm. Unfortunately, the emergence of complexity science during the 1980s and 1990s overlapped with the golden age of molecular genetics/genomics, a period in which the cloning of disease genes dominated the field.
It is likely that the exciting discoveries of molecular genetics at the time drew biologists’ attention away from the new scientific frontier of complexity. When molecular researchers trained during the 1980s were asked whether they were familiar with complexity science, they all answered that they had heard of it. When asked why they had not taken it seriously in their own research, they gave a common answer: “Back then, we were too busy cloning genes. By the way, the reductionist molecular approaches have been working well. Why change things when they are fine?” Most bio-researchers are skilled at designing and performing experiments with linear models, which lack the complexity of real life. Successfully performed artificial experiments might have given them the illusion that molecular research is going well, so there seems to be no reason to consider complexity. Perhaps even more surprisingly, when asked about complexity science, most current students majoring in the biological sciences (at both the undergraduate and graduate levels) reveal that they have paid no attention to this timely subject at all. This is most likely because most genetics/genomics textbooks have failed to link current genomic challenges to the fundamental limitations of reductionism and to the ultimately important fact that biological systems are complex adaptive systems (a fact that demands the new science of complexity). Moreover, because biological science will lead all scientific disciplines in the 21st century, it is rather troubling that complexity science has not yet been integrated into mainstream bioscience. Clearly, without the necessary understanding and appreciation of the emergent properties of complex systems, there is no way to comprehensively reconcile many conflicting issues in genomics, such as the relationships between genes and the genome, between specific pathways and phenotypes, and between laboratory experimental results and clinical reality. Complexity science can be regarded as a new scientific discipline for studying complex systems, in which many parts (agents) interact to generate emergent global collective behavior that cannot easily be explained by the functions of the individual parts or their interactions. Two simple examples are frequently used to illustrate emergent properties: (1) edible table salt (NaCl) is made up of sodium and chlorine, yet its properties differ drastically from those of a reactive metal and a poisonous gas; and (2) carbon atoms arranged into different molecular architectures can form many types of materials (including graphite, diamond, and C60 buckyballs, also called fullerenes, one of the first discovered nanoparticles), each of which displays different properties. It is important to point out that these examples represent only the simple emergence of molecules from a few well-defined kinds of atoms. In such situations, the relationship between the parts and the emergent properties is highly certain; thus, the properties are highly predictable from the parts and the conditions they are in.
In contrast, most biosystems often have multiple levels of emergence, the parts involved come in a large variety of types, and each type of part, including environments, is highly heterogeneous. Beyond these features, there is also the important factor of time. All of the complicated interactions occurring within biosystems in an adaptive fashion make mechanistic studies aimed at predicting emergent properties from parts characterization highly challenging, if not impossible (Heng et al., 2018; 2019). The key rationale for developing complexity science is to address the issue of complexity and to depart from reductionism. When applied to genomics, complexity science acknowledges that the genotype-phenotype relationship is an adaptive relationship; as such, characterizing the initial genetic conditions offers only limited prediction of the final phenotype, as there is often no strong causation between individual parts and the key emergent features. Furthermore, the nonlinear nature of complex systems requires new platforms for understanding. This nonlinearity helps explain the current confusion and challenges surrounding large-scale genomic data, and it demands that we adopt a new way of understanding and doing genomic science. If the basic principles of complexity science are correct, particularly that the heterogeneity of emergence is a key feature of biocomplexity (for more, see Chapter 8), many genomic concepts/approaches need to be changed. For example, a new type of biomarker needs to be developed to monitor the adaptive process by measuring evolutionary potential. Furthermore, attention should be paid to the relationship between physiological and pathological states, to pre- and posttreatment dynamics, and to short-term responses and long-term benefits. All of these transitions involve punctuated somatic cell evolution (Heng, 2015). Of equal importance, only when we accept that biosystems are adaptive systems, and that there is no simple relationship between genetic parts and phenotypes, can the needed change of attitude regarding reductionism finally occur. For now, let us use the lens of complexity science to reexamine some key predictions of gene theory.
2.4 REEXAMINING GENE THEORY PREDICTIONS
Chapter 1 gave a brief introduction to the rise and fall of the gene concept. To appreciate the fundamental limitations of gene theory, we must examine a series of paradoxes that represent “the accumulation of anomalies” phase of a scientific revolution, according to Kuhn’s criteria. Just as Kuhn described, the current paradigm of gene-based genomics has unintentionally generated numerous observations that challenge the paradigm itself. The following are some examples that have triggered both my curiosity and my critical evaluation.
2.4.1 Selfish Gene or Constrained Genome?
The selfish gene concept had a powerful impact when it was first introduced (Dawkins, 1976). The idea that the sole implicit purpose of the gene was to replicate itself seemed to touch the core of evolution, as it explained evolution in a simple and vivid way. This gene-centric view proposed that evolution occurs through the differential survival of competing genes. On the surface, gene competition shapes the fitness of phenotypes, or “vehicles”; fundamentally, however, it is the gene’s own propagation that matters most. At the time, the gene-centric concept seemed to make sense, as genes were thought to be the genetic material and the units of evolution. The chromosome was merely a vehicle of the gene, just as the individual was a vehicle of the species. The birth of the selfish gene concept was not an isolated event. For decades, population genetics had focused on gene frequencies. Molecular cloning technologies further pushed the gene to the center stage of genetics and biology and spawned the biotech industry. As a result, geneticists and society at large embraced this concept. Dawkins’ The Selfish Gene sold over a million copies, and some credited it with causing a silent and almost immediate revolution in biology. Despite its popularity, however, many criticized this extreme idea of evolutionary selection based on genetic parts rather than on the genetic system. Evolutionary biologist Stephen Jay Gould, who along with Niles Eldredge established the theory of punctuated equilibrium, was critical of the selfish gene concept. According to Gould, the fatal flaw of the selfish gene concept is that the gene is not a unit of selection, because “no matter how much power Dawkins wishes to assign to genes, there is one thing that he cannot give them: direct visibility to natural selection” (Gould, 1990). Eldredge has nicely summarized the difference between Dawkins and Gould using Dawkins’s own analysis: Dawkins sees genes playing a causal role in evolution, while Gould and Eldredge see genes as passive recorders of evolutionary change (Eldredge, 2004). Similarly, Ernst Mayr, one of the 20th century’s leading evolutionary biologists, insisted that the notion of the gene as the object of selection was not a valid evolutionary idea and was totally anti-Darwinian.
He stated:
The idea that a few people have about the gene being the target of selection is completely impractical; a gene is never visible to natural selection, and in the genotype, it is always in the context with other genes, and the interaction with those other genes make a particular gene either more favorable or less favorable. In fact, Dobzhansky, for instance, worked quite a bit on so-called lethal chromosomes which are highly successful in one combination, and lethal in another. Therefore people like Dawkins in England who still think the gene is the target of selection are evidently wrong. In the 30s and 40s, it was widely accepted that genes were the target of selection, because that was the only way they could be made accessible to mathematics, but now we know that it is really the whole genotype of the individual, not the gene.
EDGE interview, 2001 (with permission)
Lynn Margulis, best known for her contribution to the endosymbiotic theory of evolution, even criticized the term “selfish gene”:
… The terminology of most modern evolutionists is not only fallacious but dangerously so, because it leads people to think they know about the evolution of life when in fact they are confused and baffled. The ‘selfish gene’ provides a fine example. What is Richard Dawkins’s selfish gene? A gene is never a self to begin with. A gene alone is only a piece of DNA long enough to have a function. The gene by itself can be flushed down the sink; … There is no life in a gene.
Margulis and Sagan, 2003
Eukaryotic cells require a much longer time to replicate than prokaryotic cells. If the selfish gene concept were correct that the ultimate goal of life is gene duplication, it would be difficult to explain the evolution of eukaryotes from prokaryotes, as the most effective strategy for selfish genes would be to confine themselves to bacteria, the most effective vehicles of gene duplication. The fact that evolution has resulted in the propagation of more complicated yet slower-duplicating genomes is strong evidence for the reduced importance of the selfish gene in the eukaryotic world (Heng, 2009). Does the fact that bacteria are among the most abundant life forms on Earth support the validity of the selfish gene? The answer is also negative. Before the genomic era, individuals from the same “species” of bacteria were thought to share identical copies of “selfish” genes. However, the discovery of high levels of gene dynamics in bacteria indicates that bacterial genes do not just self-replicate but generate large numbers of variants, or “non-self-oriented” genes (Heng, 2007a). In addition, extensive horizontal gene transfer (HGT) indicates the success of cooperation rather than the mere replication of selfish genes. It is no wonder that even Dawkins now uses the term “the cooperative gene” rather than just “the selfish gene” (see also Chapter 1). If the gene is the primary unit of evolution, then gene selfishness is of ultimate importance in evolutionary competition. However, if higher levels of the genomic system, rather than interdependent parts such as genes, are the primary unit of evolution, then cooperation within the genome is the key to evolution (especially macroevolution), as this higher level serves as a regulatory constraint on the lower-level parts. It should be noted that emphasizing the importance of the genome in evolution does not deny the evolutionary selection observed at multiple levels.
One of the main purposes of this book is to raise the following question: which level is more important in a particular context of genetic organization or type of evolution? In other words, when are genes more important, and when are genomes? It is very tempting to speculate, however, that there might have been a period in history when dominant selfish genes did exist (in a specific context). During this period, early life forms contained limited numbers of genes. For these life forms, each gene was essential and served directly as a unit of evolutionary selection; therefore, each gene was selfish. Yet with the success of gene and genome duplication, the increased number of genes changed the game, as, paradoxically, the more genes involved in a system, the less important each gene becomes. At a certain evolutionary tipping point, each individual would contain so many genes that individual genes were no longer selected upon. Once biological features arise from emergent properties of genes that require a certain degree of complexity, selection pressure shifts to the higher level of the whole genome package and reaches a point of no return with regard to gene-based selection. The complexity of such biosystems becomes dominant, and inheritance serves as a powerful constraint preventing a return to a much simpler system with a smaller number of genes.
There are many examples in current life forms where the constraints of a system are dominated by the genome rather than by selfish genes. The classification of bacteria according to similarities in their DNA sequences reflects this constraint. High levels of intraspecies genetic diversity make it difficult to define bacterial species (Konstantinidis et al., 2006). Nevertheless, a bacterial species is defined as a collection of strains whose DNA shows at least 70% cross-hybridization (Wayne et al., 1987) (for more discussion, see Chapter 5). HGT is a common phenomenon that, on the surface, supports the selfishness of individual genes. However, even the efficiency of HGT is influenced by genome constraints across evolutionary distances. HGT is most commonly detected between closely related microorganisms. In contrast, between distantly related microorganisms, such as bacteria and archaea, HGT has not been demonstrated to occur on a large scale (Glansdorff et al., 2009), as there is a donor-recipient similarity barrier (Popa and Dagan, 2011; Popa et al., 2011; Tuller et al., 2011). For example, many horizontally acquired genes are in fact compatible with the recipient genome’s constraints, such as codon usage (related to GC content and/or amino acid usage). Once integrated into the genome, acquired genes still have to adapt within the genome to be retained during evolution. Recently, it was demonstrated that there is a bidirectional association between the similarity of organisms’ tRNA pools and the number of HGT events occurring between them. Here, similar tRNA pools reflect a similar genome context. Interestingly, this study also suggested that frequent HGT may be a homogenizing force that increases the similarity of the tRNA pools of organisms within the same community.
Clearly, genomic constraint plays a dominant role here. HGT is further restricted in multicellular eukaryotes, where an additional genome constraint is visible in the separation of germ cells from somatic cells: sex plays the key role of maintaining the genome in germ cells, while the somatic genome makeup is more flexible (Heng, 2007b; Gorelick and Heng, 2011; Horne et al., 2013a; Ying et al., 2018). For a more in-depth discussion of this point, see Chapter 5. System constraints ensure that evolutionary selection acts at the level of the genetic network rather than on individual genes or pathways. This conclusion has been illustrated by directly manipulating bacterial genetic networks: the Escherichia coli gene network was rewired by constructing 598 new regulatory links involving master regulators (transcription factors). Surprisingly, in the majority of the altered systems, the newly formed networks were functionally unchanged or even superior to the original system in terms of growth, reflecting the plasticity of individual gene function and demonstrating the limited importance of specific genes within a given system (Isalan et al., 2008). Another important and convincing experiment in yeast illustrates the power of genome system alteration to form new systems that compensate for the loss of a key gene. MYO1 plays a central role in cytokinesis, and, as expected, MYO1 deletion is lethal. However, researchers from Rong Li’s laboratory at the Stowers Institute for Medical Research noticed that, surprisingly, some cells grew back after culture plates had been left on the laboratory bench for many days (the plates were supposed to have been disposed of on completion of the deletion experiments)! Detailed studies of the resurrected cells demonstrated that yeast cells with deleted MYO1 rapidly evolved divergent pathways to restore growth and cytokinesis, and that the evolved cytokinesis phenotypes correlated with specific changes in the transcriptome rather than with restoration of the MYO1 gene itself. Significantly, extensive polyploidy and aneuploidy were the initial evolutionary changes detected in these new systems. Similar incidents involving other genes had previously been observed by others, but this resurrection phenomenon was ignored, as most investigators are interested only in the short-term causal relationship between a specific gene and a phenotype. When a key gene is deleted, cell death occurs, seemingly demonstrating the importance of the gene of interest, and that is the end of the story. However, Li’s laboratory realized the importance of this observation and carefully studied it from an evolutionary perspective:
These results demonstrate the evolvability of even a well-conserved process and suggest that changes in chromosome stoichiometry provide a source of heritable variation driving the emergence of adaptive phenotypes when the cell division machinery is strongly perturbed.
Rancati et al. (2008)
This elegant experiment demonstrated that the system robustness and macroevolution of a biological organism do not depend on individual key genes, but rather on the formation of new genome-defined systems. In this case, extensive polyploidy and aneuploidy represent new systems that characterize a new emergent potential irrespective of the individual genes. The experiment also answers the question of why the genome-based evolutionary concept insists that an individual gene's function is extremely limited during macroevolution, even though that function can sometimes be obvious in a given individual. The key is whether or not genome reorganization is involved. When genome reorganization occurs, any chromosomal change can affect thousands of genes (which key gene is more important than hundreds or thousands of genes?), and, as recently illustrated, genome reorganization often completely changes the transcriptome (Chapters 3 and 4). Within the new genome-defined transcriptome, an individual gene's function can be drastically altered. When the MYO1 gene was reintroduced into the altered genome, its original function no longer existed (Rancati et al., 2008). Ongoing experiments are actively testing how many individual genes' functions can be altered by changing the genome. Human genomics originally set out to establish the connection between genes and most human diseases, with an eye to using this knowledge for medical purposes. Yet, inadvertently, these studies have also supported the idea that an individual gene has limited impact on a biological system. For example, data from the 1000 Genomes Project revealed a high level of gene defects in "normal" individuals: on average, each individual in the study carried 250–300 loss-of-function variants in annotated genes. More strikingly, 50–100 variants had previously been implicated in inherited disorders (Abecasis et al., 1000 Genomes Project Consortium, 2010)! Recent data from personal genome sequencing demonstrate that each individual carries more mutations than are necessary for most diseases to occur, which downplays the contribution of these disease genes to disease phenotypes. It is estimated that each human individual carries, on average, about 60 new (de novo) mutations per generation (Conrad et al., 2011). As the most mutable parts of the genome are tandem repeats and satellite DNA, the average number of germline mutations in an individual could be much higher still. And the rate of somatic mutation is even higher (see Chapters 3 and 7). Yet most of us are seemingly normal. This will be discussed in more detail in Chapter 6, but all these surprising observations convey the same message: genome constraints, not gene mutations, matter most for the survival and evolution of biological systems. Individual parts become less important in a genome-defined system, regardless of the selfishness of the genes.
Even transposable elements (TEs), one type of so-called "selfish DNA," must acquiesce to genome constraints. The fact that different types of TEs are detected in different species suggests a genome-defined species constraint. It has been hypothesized that TEs can escape the sexual filters (see Chapter 5) by spreading within host genomes, but there is a limit to how much spreading can occur each generation. More importantly, newly invading TEs are not directly selected but come as part of the whole genome package. A surprising result of sequencing various genomes is that the positions of genes are reorganized in different species, which downplays the importance of the selfishness of the gene and underscores the importance of the topological relationships among genes within a given genome. Before the genome sequencing era, however, most researchers believed that differences between species are caused by the accumulation of different selfish genes, which fits the neo-Darwinian evolutionary principle (see Chapter 6).
By now it should be clear that a preponderance of evidence points to the importance of genome constraint over selfish genes, and of the genome concept over the gene-centric concept. Some might argue that, given the overwhelming genomic evidence, analyzing the fundamental limitations of the gene-centric concept is akin to fighting a straw man or beating a dead horse. However, even though increasing numbers of scholars are abandoning the extreme version of the selfish gene concept, the gene-centric view of life is still very dominant in biology. How many researchers agree that gene-based research does not paint a full picture, yet are still studying genes or gene-based biology? The gene-to-genome transition has surely not yet happened. Consider the following excerpt from the public conversation between Craig Venter and Richard Dawkins at Digital Life Design in 2008, titled "Life: a gene-centric view" (EDGE conversation: Digital Life Design, 2008).
JOHN BROCKMAN (moderator): "... Dawkins is responsible for possibly the most important science book of the last century, The Selfish Gene, ... which has become the basic science agenda for biologists for the last quarter century ..."
Venter: "... Richard's book on The Selfish Gene really influenced most thinking in modern biology. I actually didn't like his book initially ... But I've come to appreciate it immensely. I was looking at the world from a genome-centric view, the collection of genes that put together to lead to any one species ... But I've switched, and I've really come to view the world from a gene-centric point of view ..."
(with permission from Edge)
Venter should be the last person to switch from the genome-centric view back to the gene-centric view, as genomics has illustrated the fundamental limitations of the gene-centric view, partly through his own efforts. Venter found that in most microorganisms it is hard to identify a defined genome because of sequence diversity. This lack of a genomic identity stems from the absence of the sexual filters common in eukaryotic organisms (see Chapter 5). The fact that it is difficult to define microbial systems by the same gene sets clearly downplays the importance of genes and provides further support for the genome-centric concept. This should have convinced Venter to embrace the genome-centric view, not revert to his pregenome thinking. The independent behavior of the parts (genes or chromosomes alike) is only meaningful within the context of the system. Notably, in creating a synthetic life form, Venter's group synthetically copied the genome of a bacterium and incorporated it into a cell to make what they call the world's first synthetic life form (Gibson et al., 2010). One key message from this study is that they used the coding information of an already existing biosystem, and that changing the order of genes resulted in failure to create a functional system. This further illustrates the importance of genome context rather than just of individual genes. Clearly, this observation favors a genome-centric rather than a gene-centric view. Obviously, changing views of genetics to incorporate the relative unimportance of genes will require great effort, as the gene-centric concept continues to influence genomic researchers in the face of mounting evidence that seriously challenges the gene theory itself. This situation illustrates the urgent need to establish a new genome-based paradigm to replace the old gene-based one. Without a new genome theory, gene-based genomic research will likely continue to fail to deliver relevant clinical results. It is true that the selfish gene concept has played an important role in prioritizing gene-based research, but the time has come to make the genome the top priority after decades of focusing on individual genes. Yes, there has been a great deal of discussion regarding the selfish or cooperative gene perspective. However, these discussions still fall within the gene-centric realm, where the gene is the motor of evolution. Despite many critical analyses of the selfish gene concept by well-respected evolutionary scholars (including Ernst Mayr, Stephen Jay Gould, and Lynn Margulis), the notion that the genome itself is a key selection unit that constrains genes has not been systematically discussed and, in particular, has not been discussed within the genome-based evolutionary concept. As a result, the relationship between genetic parts and the genome system remains unclear, even though it is evident to some that a genetic program cannot reside solely within the genes (Keller, 2000). Just as the "selfish gene" expresses the gene-centric viewpoint of evolution, genomic constraint expresses the genome-centric view of evolution (Chapter 6).
2.4.2 Genomes Not Genes Define Biosystems
One crucial step toward understanding the limitations of individual genes is to examine whether genes define a biological system and, if not, what genetic components or structures actually do. A key feature of biological systems is inheritance, so genetics is necessarily involved. The following examples address this issue and are applicable to many related topics important for genomics and evolutionary biology. Further discussion based on genomic coding can be found in Chapter 4.
2.4.2.1 There Are No Common Minimal Gene Sets in Nature
A "minimal gene set" refers to the smallest possible group of genes sufficient to maintain a functioning cellular life form under ideal conditions (sufficient nutrients and absence of environmental stresses) (Koonin, 2000). These minimal gene sets, if they exist, should be evident from the simplest forms of life, such as obligate host-associated bacteria, when their genomes are sequenced and compared. Alternatively, they should become apparent if we delete nonessential genes from a few species and compare the remaining essential genes common to them. The existence of minimal gene sets would powerfully illustrate the importance of these evolutionarily conserved key genes in evolution and contribute to the understanding of the universal principles of life (Glass et al., 2009). In addition, minimal gene set reconstructions could be experimentally testable. These minimal gene sets serve as the basic foundation of synthetic biology, a new scientific discipline that focuses on artificially generating organisms using genetic information and material. Surprisingly, these essential gene sets are difficult to identify, despite the supposition that some genes and their coded functions are absolutely necessary for the survival of any living entity (Juhas et al., 2011). Examination of endosymbiotic genomes revealed that the expected minimal gene sets are highly diverse, largely host-defined, and species-specific, with the exception of a small set of genes required to process information (Klasson and Andersson, 2004). Mycoplasmas are important models for analyzing essential genes because of their small genome size and ease of cultivation. But there, too, the presence of diverse gene sets challenges the idea of a rigid set of minimal genes and suggests instead a minimal set of functional niches (or genome packages, in our view). Global knockout mutagenesis of mycoplasmal genes has further shown that some of the so-called universal or highly conserved genes may not be necessary, suggesting the importance of evolutionary selection based on the overall system rather than on individual components like genes.
Interestingly, as more genomes are compared, the number of shared protein-coding genes that can be identified diminishes, contradicting the notion of a minimal gene set (Table 2.1). Based on this trend, as more and more genomes are added to the analysis, the number of minimal essential genes will become unrealistically small or vanish altogether.
TABLE 2.1 The Reduced Minimal Gene Set
Number of Genomes Compared    Number of Shared Protein-Coding Genes
2                             256
5                             180
7                             156
100                           63
147                           35
Despite the disappointing results from comparative genomics and experimental biology, computational reconstructions and modeling of the minimal genes or metabolic machinery necessary to sustain life are still actively pursued, and numerous models have proposed various numbers of minimal genes (Gil et al., 2004). It is extremely difficult to predict evolution by modeling, particularly in light of the diverse gene sets discovered to date; thus, the value of these models is limited. While the strategy of identifying clear-cut, simple, general genetic patterns is ubiquitous in biological research, it has largely failed to produce useful results. The same rationale that drove the search for the minimal essential genes for life has been used in attempts to identify the common gene mutations in cancer. Such a rationale is based on the gene-centric concept that mutation of a handful of key genes leads to clonal expansion and cancer, and that finding the key genes will lead to an understanding of the underlying science and provide therapeutic targets. This approach ignores genomic and system heterogeneity, which is pervasive in cancer. A key assumption behind minimal gene sets is the imagined optimal conditions that suppress the all-important system heterogeneity. As each system has its own conditions, there are no universal optimal conditions in the first place. Artificial conditions can be created to illustrate the significance or insignificance of genes. For example, under experimental conditions, over 70% of all yeast genes can be deleted without serious consequences. Similarly, 10%–30% of the original genes can be eliminated from E. coli genomes without any detectable effect on bacterial viability.
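The shrinking counts in Table 2.1 follow from a simple property of comparison by intersection: a gene counts as "shared" only if it is present in every genome compared, so adding a genome can keep or remove genes from the shared set but never add any. The Python sketch below is offered as an illustration of that arithmetic, not as the method used to build Table 2.1. It checks that the tabulated counts are non-increasing and reproduces the same effect on hypothetical toy genomes whose gene names (g1, g2, ...) are placeholders, not real data.

```python
from functools import reduce

# Counts transcribed from Table 2.1: genomes compared -> shared protein-coding genes.
TABLE_2_1 = {2: 256, 5: 180, 7: 156, 100: 63, 147: 35}


def shared_genes(genomes):
    """Return the genes present in every genome (the intersection of all gene sets)."""
    return reduce(lambda acc, genome: acc & genome, genomes)


def shared_counts(genomes):
    """Size of the shared set after comparing the first 1, 2, ..., n genomes."""
    return [len(shared_genes(genomes[:k])) for k in range(1, len(genomes) + 1)]


# Hypothetical toy genomes; gene names are illustrative placeholders only.
toy_genomes = [
    {"g1", "g2", "g3", "g4", "g5"},
    {"g1", "g2", "g3", "g4", "g6"},
    {"g1", "g2", "g3", "g7"},
    {"g1", "g2", "g8"},
]

if __name__ == "__main__":
    table_counts = [TABLE_2_1[n] for n in sorted(TABLE_2_1)]
    # An intersection can never grow as more sets are added.
    print("Table 2.1 non-increasing:", all(a >= b for a, b in zip(table_counts, table_counts[1:])))
    print("Retained from 2 to 147 genomes:", f"{TABLE_2_1[147] / TABLE_2_1[2]:.1%}")
    print("Toy shared counts:", shared_counts(toy_genomes))
    print("Toy shared genes:", sorted(shared_genes(toy_genomes)))
```

Run as a script, this reports that the tabulated counts never increase, that about 13.7% of the two-genome shared set survives at 147 genomes, and that the toy shared set falls from 5 to 2 genes as four genomes are compared.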
In fact, the fraction of essential genes proved to be surprisingly low in almost all organisms studied, typically in the range of 10%–30% of the whole gene set (Koonin, 2003; Feher et al., 2007). Such a conclusion has little relevance to natural systems, where survival and competition occur within dynamic natural environments. Not only are individuals with serious deficiencies unable to survive competition in the real world, but system constraints (like the mechanism of sexual reproduction; see Chapter 5) will not allow such drastic gene elimination to occur while the same system is maintained. Indeed, when culture conditions were varied in the same yeast model, many of the previously characterized "nonessential genes" proved to be essential for viability (Hillenmeyer et al., 2008). No wonder Venter has faced such a challenge trying to assemble a "standard" genome for many marine microorganisms collected in nature. Not only is there tremendous diversity at the gene level, but whether or not a gene is essential largely depends on its internal genetic and external environments (Nealson and Venter, 2007). It thus seems likely that there are no minimal gene sets with fixed, conserved genes in nature, as heterogeneity is an inseparable feature of biological systems. Nor is there a standard genome with fixed genes in microorganisms, because of the lack of genome constraint provided by sexual reproduction. So where should we look for the inheritance that defines the pattern of evolution, knowing that inheritance plays a major role in evolution? If it is not in the genes, then in what genetic component does it exist? Interestingly, the efforts to illustrate the ultimate importance of genes (including efforts to identify minimal gene sets) have in fact led to the search for the correct level of inheritance (Heng, 2009; Heng et al., 2011a). It has been stated that "some genes and the functions encoded by them are absolutely necessary for the survival of any living entity" (Juhas et al., 2011). While some functions are absolutely necessary for the survival of any living system, survival can be achieved by a variety of emergent mechanisms based on different genes. The importance of individual genes is accentuated under artificial experimental conditions, but in the natural world their individual importance can be lost or changed amid the collective, interactive effects of the genomic network. Thus, these contradictions call for a new concept that focuses on the whole (the genome package, with its unlimited potential) rather than the parts (diverse genes). Gene coding and system coding are fundamentally different, and it is genome coding that will explain why minimal essential gene sets do not exist. Interestingly, based on genomic information, many researchers now accept that approximately 300 genes are required to support cellular life (Juhas et al., 2011). A newer suggestion is that a certain minimal number of variable parts (genes or DNA sequences) is needed, but the specific parts list could be very different in different systems: as long as a minimal function is achieved, there is no requirement for the same parts.
This idea is consistent with the finding that networks are often conserved across distant species even when specific genes are much less conserved. It would be valuable to analyze the relationship between the number of genes and the degree of system complexity, to investigate whether a certain number of parts is required for the emergence of a given degree of complexity. Both computational simulations and synthetic genomics could be applied to achieve this goal.
2.4.2.2 Are Gene or Genome Alterations Mainly Responsible for Speciation?
There has been a significant bias in this area, largely caused by confusion between the functions of genes and those of chromosomes. Ever since Dobzhansky's work over 80 years ago, the speciation gene concept has been favored as the underlying genetic mechanism of evolution. Using the classic approach to study speciation genes, Dobzhansky initially linked hybrid testis size (a proxy for fertility) to a number of mutant markers by crossing different species of Drosophila pseudoobscura (Dobzhansky, 1936). The Dobzhansky–Muller model of hybrid incompatibility proposed that genetic incompatibility is caused by new combinations of mutations unique to the hybrids (the offspring of matings between species). In the parental populations (different species), each incompatible allele can arise and become fixed without any harmful combinatorial effect; it is cross-hybridization that creates the harmful combinations. Two important issues need to be pointed out. First, the Dobzhansky–Muller model is based on the assumption that a new gene mutation is the basis of a new species. According to this model, different species should have different genes or mutational profiles. If this key assumption is not accurate, the whole model is invalid. Second, many of Dobzhansky's wonderful experiments were actually based on chromosomal studies rather than gene studies. Conclusions drawn at the gene level from chromosomal studies are likely to be incorrect in his cases, because the two levels play different genetic roles. In fact, the reduced hybrid compatibility he observed might simply be caused by chromosomal incompatibility. The jump from the chromosome-based concept to Dobzhansky's gene-based concept may represent his greatest limitation. Speciation studies have now entered the molecular era, and there have been increasingly exciting reports in top science journals claiming to have identified speciation genes. In particular, equipped with high-throughput, large-scale genomic analysis, speciation genomics has raised great expectations. In contrast, there has been limited interest in how chromosomes or genomes initiate speciation. To explore this issue further, we must first decide on the definition of a species. According to Ernst Mayr's "Biological Species Concept," species are "groups of interbreeding natural populations that are reproductively isolated from other such groups" (Mayr, 1963).
The modern version of the definition emphasizes the aspect of potential interbreeding and deemphasizes natural populations, which allows it to apply to the experimental conditions under which most speciation genes have been identified. The following statement is from Douglas Futuyma (Futuyma, 1998): "Species are groups of actually or potentially interbreeding populations that are reproductively isolated from other such groups." Alternatively, the "Morphological Species Concept" considers two organisms to be the same species if they look similar, while the "Evolutionary and Ecological Concept" defines organisms as the same species if they share the same evolutionary or ecological history. Despite some limitations, the Biological Species Concept still dominates. Its major limitations include not explaining how asexual organisms, preserved museum specimens, and extinct taxa fit into this definition of a species, as well as the difficulty of defining what "potentially interbreeding" actually means.
Second, one needs to know the mechanisms of speciation. The commonly accepted view is that reproductive isolation interrupts gene flow. Reproductive isolation can be geographic, genetic, or even behavioral, and many models have been proposed to explain how it happens. The ecological model states that ecological circumstances can cause two populations to become reproductively isolated and that, after a sufficiently long period, a new species will develop, because such isolation gives the original species the opportunity to split after accumulating enough variation. Genetic models also address the mechanisms that lead to genetic incompatibility. Within the modern synthesis school of thought, ecological reproductive isolation plays an important role in addition to genetic mechanisms, as macroevolution (speciation) can be viewed as the compounded effect of microevolution, with ecological isolation providing the time needed for microevolutionary changes to accumulate. One commonly used, simple definition of genetic speciation (for sexually reproducing organisms) is the process that transforms within-population variation into taxonomic differences through the evolution of inherent barriers to gene flow (Noor and Feder, 2006). Interestingly, recent data suggest that much of the divergence leading to incompatibility between species (an assumed key step in speciation) does not appear to be driven by ecological adaptation but may instead result from responses to purely mutational mechanisms or to internal genetic conflicts (Maheshwari and Barbash, 2011). Third, what types of genetic change result in speciation (Noor and Feder, 2006)? Currently, three types of genetic factors are known to contribute to the inherent barriers that serve as the genetic basis of reproductive isolation. From the new genome perspective, the most significant type is chromosomal or genome change.
Chromosomal rearrangements and changes in chromosome number, particularly polyploidization, are the phenomena most commonly identified in association with speciation. This mechanism has been observed frequently in plants. The generation of new plant species through autopolyploidy was observed long ago by Hugo de Vries (1905), one of the rediscoverers of Mendel's experiments. He isolated a tetraploid version of the normally diploid evening primrose, Oenothera lamarckiana, which he named Oenothera gigas (Brown, 2002). There are examples in which duplication of an entire genome or massive chromosome changes occur within one or a few generations, resulting in speciation. Such rapid chromosomal speciation explains cases that do not depend on physical isolation over a long period of time. The fact that the majority of eukaryotic species display different karyotypes strongly supports the importance of chromosomally mediated speciation (White, 1978; King, 1995; Ye et al., 2007; Heng, 2007b, 2009, 2015). Unfortunately, the concept of chromosomal speciation is largely ignored by current research, as it fundamentally contradicts the common belief in gene-mediated stepwise evolution. Extrachromosomal elements, including cytoplasmic symbionts and TEs, can also contribute to speciation (Margulis and Sagan, 2002; Rebollo et al., 2010). It should be pointed out that this category is closely associated with chromosomal speciation. The symbiotic process involves interaction between the symbiont and the host genome. TE-mediated speciation also usually requires genome reorganization. TEs also represent a powerful mechanism for inducing genome alterations by changing the topology of genes within a genome, even without gross chromosomal alterations (Rebollo et al., 2010; more discussion on this process follows). Finally, genic elements, or speciation genes, receive the most attention, as they fit well with the current mainstream evolutionary belief that a gradual accumulation of new genes is the key to speciation. The importance of incompatibilities between the genes of diverging species in causing reproductive isolation was previously suggested by the Dobzhansky–Muller model of hybrid incompatibility (Wu and Ting, 2004). Despite its importance to the gene theory and the efforts to track down speciation genes using cutting-edge technologies, success to date has been rather limited: only a few such putative genes have been identified in various species, the majority of them in Drosophila (Noor and Feder, 2006). Recently, using the latest methods, increasing numbers of speciation genes have been identified, including in plants (Rieseberg and Blackman, 2010), which seems to support the Dobzhansky–Muller model. However, many of the cases of speciation genes are probably better explained by the chromosome-based concept (again, see discussion below).
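The logic of the Dobzhansky–Muller model discussed above can be made concrete with a two-locus sketch. The loci, allele names (a/A, b/B), and the single incompatible pair below are hypothetical, and the code only illustrates the model's reasoning: each derived allele is fixed in its own lineage without ever meeting the other, so neither lineage is incompatible, yet their F1 hybrid carries both. It is not a model of any real species pair, and it says nothing about whether the chromosomal interpretation favored in this chapter better explains the data.

```python
# Hypothetical two-locus Dobzhansky-Muller sketch; loci and alleles are illustrative only.
# Derived alleles "A" (locus 1) and "B" (locus 2) are assumed to be harmful only together.
INCOMPATIBLE_PAIRS = {("A", "B")}

Genotype = dict[str, tuple[str, str]]  # locus name -> the two alleles carried (Python 3.9+)


def is_incompatible(genotype: Genotype) -> bool:
    """True if the genotype carries both alleles of any incompatible pair."""
    alleles = {allele for pair in genotype.values() for allele in pair}
    return any(x in alleles and y in alleles for x, y in INCOMPATIBLE_PAIRS)


def f1_hybrid(parent1: Genotype, parent2: Genotype) -> Genotype:
    """F1 offspring of two homozygous parents: one allele per locus from each."""
    return {locus: (parent1[locus][0], parent2[locus][0]) for locus in parent1}


ancestor = {"locus1": ("a", "a"), "locus2": ("b", "b")}
lineage1 = {"locus1": ("A", "A"), "locus2": ("b", "b")}  # A fixed on the ancestral b background
lineage2 = {"locus1": ("a", "a"), "locus2": ("B", "B")}  # B fixed on the ancestral a background

if __name__ == "__main__":
    for name, g in [("ancestor", ancestor), ("lineage1", lineage1), ("lineage2", lineage2)]:
        print(name, "incompatible:", is_incompatible(g))
    hybrid = f1_hybrid(lineage1, lineage2)
    print("hybrid", hybrid, "incompatible:", is_incompatible(hybrid))
```

The point the sketch captures is that selection within each lineage never "tests" the combination of A and B; the incompatibility appears only on hybridization, which is exactly what the model proposes and what the chapter argues may instead reflect chromosome-level incompatibility.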
Rather than immediately presenting the genome-mediated speciation model as the major form of speciation, this chapter will focus on the difference between genes and the genome. Because of its ultimate importance and sharp contrast with popular gene-centric views, genome-mediated speciation will be discussed in Chapter 6. The following observations and questions illustrate the bias and challenges of the gene-speciation concept, and they are very helpful in illustrating the differential roles of genes and the genome. While it is true that increasing numbers of papers have linked genes to hybrid incompatibility, it remains to be shown that these purported speciation genes are the real causative factors of the observed hybrid incompatibility. When hybrids are formed, multiple genetic alterations are usually involved, which gives different researchers the opportunity to make specific links based on their own concepts. Cancer research illustrates this situation. Researchers with different interests using different technologies will each make their own discoveries. DNA sequencing will surely reveal an array of gene mutations in a tumor, while proteomic studies will surely reveal differential posttranslational modifications of certain proteins, leading to different and conflicting conclusions: cancer is caused by gene mutations in one case and by posttranslational modification in the other! Others will find a strong correlation with any number of changes, including but not limited to copy number variations, specific RNA splicing forms, noncoding RNA, TEs, mitochondrial defects, altered DNA methylation, protease overexpression, endoplasmic reticulum stress, inflammation, abnormal gene expression, microenvironment changes, and, of course, chromosomal alterations. Although efforts have been made to integrate various datasets using computational or systems biology methods, there is currently no unified, established approach for integrating data from multiple levels of detection and, in particular, no systematic comparison to determine which level of alteration is most important and which level should serve as the main window of observation. In cancer research, seemingly unlimited genetic and epigenetic factors can be linked to the cancer phenotype and genome dynamics (see Chapter 3). As in cancer, if multiple levels of genetic and epigenetic alterations are examined in the hybrids in which speciation genes are identified, multiple different links could be established. Moreover, the more cases of hybrid incompatibility that are examined, the more diverse the molecular mechanisms behind the incompatibility that will be identified, and the number of speciation genes will continue to increase.
A quick scan of the recent literature reveals many different genes underlying hybrid incompatibility, and these genes indeed represent a wide array of functions, including oxidative respiration, nuclear trafficking, DNA binding, plant defense, pollination, and nuclear–mitochondrial conflicts (Rieseberg and Blackman, 2010; Johnson, 2010; Chou et al., 2010). In addition, protein-coding genes (sequence and location), copy number variations, TEs, interactions among heterochromatin, noncoding RNA, and other genetic and epigenetic factors are all involved. Clearly, there are no general gene sets responsible for speciation. As with the search for cancer-causing genes, there will likely be unlimited genes or combinations of genes that can be linked to speciation. At the end of the day, the speciation gene concept will become insignificant. This is precisely why researchers now call for a search for a more fundamental evolutionary mechanism of speciation rather than continuing to focus on individual links. It is interesting to cite John Wilkins's similar viewpoint (Wilkins, 2008, with permission):
There is a widespread tendency of biologists to overgeneralise from their study group of organisms to the whole of biology ... For some time now various researchers ... have sought speciation genes. These are genes that cause speciation, in a general sense, but the slide appears to be made to the conclusion that there are particular genes in many if not most cases of speciation. I want to consider this now.
Phadnis and Orr have found that a particular gene is both a gene causing sterility between hybrids of two Drosophilid subspecies, and a segregation distorter - that is, the gene causes itself to be differentially copied when gametes are formed. A similar process of meiotic distortion occurs in mice as Mihola et al. show. A great result, but how general? In the commentary accompanying the online advance publication, Asher Mullard writes and quotes this:
"... With more genes should come greater insight into speciation. Some geneticists wonder whether only particular classes of genes are important in speciation, such as epigenetic genes or segregation distorters, or whether many sorts of genes help to drive species apart. 'What is surprising about the speciation genes that have been identified [so far] is that there is a whole hodgepodge of different kinds of genes with different functions,' says Nachman. 'I don't think we're going to see [trends] until dozens of genes are identified, and there's just a handful now.'" (Mullard, 2008)
But why think that there should be particular classes of genes that contribute to speciation? Sure, there may be genes that are implicated in Drosophilid speciation, or maybe even in insect speciation, but all that matters in sexual species is that some barrier to reproduction exists. It need not be a particular barrier. Consider this - how many ways are there to impede the flow of traffic? Should we expect there are only a couple of ways? Witches' hats and workers' signs may be common, but there are sinkholes, burning barriers of demonstrations, collapsed cranes, street parties ... and the list could be indefinitely extended. I suspect that speciation is like that: it's a negative property, and one that can be arrived at in an indefinitely large number of ways ...
To the surprise of many speciation gene researchers, these diverse speciation genes are often nonessential and have functions only loosely linked to reproductive isolation (Wu and Ting, 2004)! The identified speciation genes also raise several other important issues. First, many specific models do not reflect real-world situations. It is difficult to validate many of the identified speciation genes in natural populations (Noor and Feder, 2006). Furthermore, it has gradually been realized that many speciation models are seriously misleading, as the modeling process changes the nature of the systems under study (Chapter 3). Second, hybrid incompatibility (HI) is not the same as speciation. There is a long-lasting concern within the field as to whether hybrid incompatibility genes are directly involved in causing speciation or evolved after full reproductive isolation (Noor and Feder, 2006). Not all genes affecting reproductive isolation today had a role in speciation (Nosil and Schluter, 2011). When HI is severe, the hybrids cannot survive; this type of HI represents an end to a species rather than the beginning of a new one. More importantly, speciation genes do not address the key question of how an initial gene change leads to genome alterations, a common signature of diverging species. Specifically, because speciation is a whole-genome concept, it is difficult to study evolution through increasing genetic isolation at the gene level alone.
Third, regarding the mechanism whereby evolution causes genetic divergence within a species and subsequent incompatibility between species, much of the genetic/epigenetic divergence unexpectedly does not seem to be driven by ecological adaptation. Rather, it appears to result from genetic mechanisms themselves (the process of genetic alteration and internal genetic conflict), even though ecological adaptation is considered an important evolutionary mechanism causing genetic/epigenetic divergence (Noor and Feder, 2006). Interestingly, when one searches for a common mechanism rather than diverse speciation genes, using an analysis similar to the one used to develop the evolutionary mechanism of cancer (Chapter 3), many cases of genetic isolation turn out to be linked to genome-level alterations. In other words, most of the diverse “speciation genes” associated with hybrid incompatibility can also be linked to genome-defined system changes, reflected as genetic conflicts and chromosomal incompatibility in general. For example, many cases of “gene” speciation occur together with chromosomal changes. Alternatively, if one thinks outside of the (gene) box, the data can be reinterpreted as evidence supporting speciation by chromosomal change. Many investigations have also directly linked diverse cases of reproductive isolation to chromosomes (Faria et al., 2011). For example, the divergence of noncoding repetitive sequences between species can directly cause reproductive isolation by altering chromosome segregation (Ferree and Barbash, 2009), and rapid heterochromatin evolution affects the onset of hybrid sterility (Bayes and Malik, 2009). On the surface, many other cases are less obviously related to specific chromosomal alterations (Maheshwari and Barbash, 2011). These studies have identified “internal genetic conflicts” as key factors contributing to HI.
At a fundamental level, however, this internal genetic conflict is best explained as genome system incompatibility rather than as a set of individually identified, superficial factors. For example, small DNA sequence differences among closely related species can in fact be associated with significant genome-level differences. Even identical genes can be arranged into different chromosomes. In addition, genes represent only a tiny portion of the DNA sequence. As the genome defines the system and its dynamic boundary, and speciation is the definitive process creating new biosystems, genome-level changes should represent the main mechanism of speciation in the majority of cases (Heng, 2009). One important fact is that when genome system changes occur (reflected on a genome scale rather than on the scale of an individual gene or genetic element), they often involve multiple factors, each of which has been discovered by individual researchers who consider “their” factor to be the key. Unfortunately, as mentioned earlier, these are not common or universal factors for system changes; there are so many similar and different factors that each of these “significant” factors becomes less so. Furthermore, most of these identified factors have alternative explanations (this is particularly true when researchers focus on an individual hypothesis and study it using linear models). Again, this situation closely resembles current cancer research, where multiple and varying factors have been identified as likely causes of cancer while the overall molecular understanding is becoming increasingly complex and confusing (Chapter 4). The problem is the same: most researchers have ignored the central importance of the genome. The suggestion that, over evolutionary time, speciation is predominantly chromosome-based rather than driven by individual gene-defined genetic elements has roots long before the establishment of the field of genetics. The nongenic “chromosomal” viewpoint on speciation can be traced back to George Romanes (1886) and William Bateson (1909), both of whom were convinced of the importance of “nongenic factors,” to use modern terminology (Forsdyke, 2003, 2004). Bateson focused his research on the problem of species. Despite his influence (he introduced the term “genetics” and brought Gregor Mendel’s work to the attention of the English-speaking world), he failed to convince contemporary and modern scholars. This represents yet another interesting example of how the holistic or systematic view often loses the battle with its reductionist rival. According to Forsdyke, Bateson’s position on nongenic factors was often misquoted to support the genic viewpoint. Richard Goldschmidt, one of the most controversial biologists of the 20th century, championed the chromosomal speciation concept and argued that micro- and macroevolution are distinctly different and use different mechanisms. He proposed two main mechanisms for macroevolution: systemic mutation and developmental macromutations.
Systemic mutation, he suggested, involved chromosomal changes in speciation in which the linear arrangement of the genetic material had a marked impact on the set of reactions created by its immediate products (Goldschmidt, 1940). Unfortunately, and interestingly, his brilliant idea of chromosomal speciation was ignored, rejected, or overshadowed by his hopeful monster idea. For example, the prominent geneticist and evolutionist Theodosius Dobzhansky stated with regard to Goldschmidt’s theory: ... systemic mutations... have never been observed. It is possible to imagine a mutation so drastic that its product becomes a monster hurling itself beyond the confines of a species, genus, family or class... The assumption that such a product may, however rarely, walk the earth, overtaxes one’s credulity...
Dobzhansky, 1941
Considering that Dobzhansky worked on chromosomal research for decades and was familiar with the diverse chromosomal differences among species, and that Goldschmidt explained systemic mutation as chromosomal alterations, Dobzhansky’s comments are puzzling to me. On the other hand, given his belief that macroevolution results from the accumulation of small genetic changes over an extended historical period, such a blind spot is not a total surprise. Dobzhansky clearly became biased when he moved from chromosome-based research to the gene concept (Heng, 2009). Even those scholars who appreciated Goldschmidt’s idea of macroevolution, like Stephen Gould, misunderstood Goldschmidt’s systemic mutation and developmental macromutations. According to Gould,
I do, however, predict that during this decade Goldschmidt will be largely vindicated in the world of evolutionary biology... As a Darwinian, I wish to defend Goldschmidt’s postulate that macroevolution is not simply microevolution extrapolated, and that major structural transitions can occur rapidly without a smooth series of intermediate stages... In my own, strongly biased opinion, the problem of reconciling evident discontinuity in macroevolution with Darwinism is largely solved by the observation that small changes early in embryology accumulate through growth to yield profound differences among adults... Indeed, if we do not invoke discontinuous change by small alteration in rates of development, I do not see how most major evolutionary transitions can be accomplished at all. Few systems are more resistant to basic change than the strongly differentiated, highly specified, complex adults of “higher” animal groups. How could we ever convert an adult rhinoceros or a mosquito into something fundamentally different? Yet transitions between major groups have occurred in the history of life.
Gould, 1977
It should be pointed out that, even though both involve morphological change, the gross morphological changes that occur during development differ fundamentally from those accompanying the sudden emergence of a species or key evolutionary transitions. Morphological changes occurring during development are mainly determined by gene regulation within a given genome system (with high predictability), while speciation occurs mainly through genome alterations (with less predictability). Small gene alterations that strongly affect developmental processes cannot, in the absence of genome alterations, explain speciation. Gould wanted to reconcile macroevolution and gene-mediated microevolution (the relationship between stepwise gene accumulation and the formation of major features) by invoking developmental genes. While this seems reasonable within gene-influenced evolutionary thinking, and in particular within the overall stepwise patterns of evolution, it does not solve the basic issue. In fact, this confusion was partially initiated by Goldschmidt himself when he proposed two opposing mechanisms of macroevolution, one independent of genes and the other closely associated with genes. This also reflected his effort to reconcile the confusion between the two genetic levels. When two very different mechanisms were proposed to explain macroevolution, others could simply pick the one that best fit their own ideas. That is what happened. According to Michael Dietrich,
The second mechanism that Goldschmidt proposed to explain macroevolution did not depend on the rejection of the classical gene... Goldschmidt proposed that mutations in developmentally important genes could produce large phenotypic effects. He called the results of these developmental macromutations ‘hopeful monsters’ because they were the embodiment of large phenotypic changes that had the potential to succeed as new species. It is important to note that Goldschmidt’s idea of a hopeful monster was not tied to his idea of systemic mutation. In fact, Goldschmidt used hopeful monsters to argue, by analogy, for evolution by systemic mutations. The possibility of mutations in developmentally important genes was intended to make the genetic mechanism of systemic mutation more plausible...
Dietrich, 2003
As pointed out by Dietrich, the idea of developmentally important mutations with large effects was accepted by many well-known researchers. In contrast, the more important theory of speciation through systemic mutation received no support at all (Dietrich, 2003). Looking back, the field was looking for answers but chose to see only those that fit the popular gene-centric paradigm. The “hopeful monster” has taken on a life of its own, but in a different way than Goldschmidt originally intended. Rather than making the genetic mechanism of systemic mutation more plausible, the idea of developmental macromutation has in fact caused many researchers to take their eyes off the ball of chromosomal speciation. It is sad that Goldschmidt is best remembered as the originator of the hopeful monster, an idea that in reality represented only a minor component of his concept of macroevolution (Forsdyke, 2003; Schlichting and Pigliucci, 1998). Perhaps even more unfortunate is that the proposed link between genes and developmental changes delayed or diluted efforts to search for genome-mediated macroevolution. It is interesting to point out that, by focusing on features (phenotypic traits) rather than on a genome-defined system, many researchers, including Bateson and Goldschmidt, searched for the causes of profound new features during macroevolution and for nongenic complementary factors involved in speciation. Some eventually realized that chromosomal alterations themselves are a major force of speciation and that the newly gained features are part of the emergent properties of the newly formed genome. Similarly, many secondary features, including identifiable genes, are consequences of speciation defined by genome alteration. As will be discussed in Chapter 6, this realization comes as a result of genome-based cancer research. Barbara McClintock’s earlier work, in fact, supports “systemic mutation” (chromosome alteration) as a cause of speciation.
By crossing strains of corn, she observed a genetic earthquake triggered by introducing broken chromosomes from both parents into a zygote. Neil Jones nicely summarizes McClintock’s genetic earthquake experiments: In 1944 one of the crosses, in which a broken chromosome 9 was contributed by both parents, led to some unexpected results. It triggered a “genetic earthquake” in the kernels of the ear concerned. The embryos in the kernels were undergoing the chromosome BFB (Chromatid breakage-fusion-bridge) cycle in their early development, and when the seeds were germinated and grown McClintock witnessed an enormous burst of genetic instabilities among the progeny plants.
Jones, 2005
Among these products of the “genetic earthquake,” 87 of 677 kernels did not germinate, 134 died as seedlings, and 73 died as young plants; 383 plants (about 57%) grew to maturity and were capable of self-pollination (150 of which were used for meiotic analysis) (Jones, 2005). The “genetic earthquake” experiment revealed the following: 1. Extensive genome rearrangement was involved. 2. Although many plants died at different stages, some survived and were able to reproduce. 3. Stable phenotypes were observed. The most important message from the genetic earthquake experiment is that genome rearrangements can not only produce diverse features and phenotypes but also generate new genome systems. Although many of these new genomes are unstable and associated with massive cell death, infertility, and growth abnormalities, they represent a greatly enhanced potential for new speciation. At the 1980 Miami Winter Symposium, McClintock stated (McClintock, 1980): There is little doubt that genomes of some if not all organisms are fragile and that drastic changes may occur at rapid rates... It is reasonable to believe that such genome shocks are responsible for the release of otherwise silent elements, which can then initiate changes to overcome disruptive challenges. Since the types of genome restructuring induced by such elements know few limits, their extensive release, followed by stabilization, could give rise to new species or even new genera.
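The kernel counts quoted above can be checked with simple arithmetic. The short Python sketch below uses only the numbers reported by Jones (2005); the dictionary labels and the helper name `percentages` are illustrative choices, not part of the source. It confirms that the four outcome categories account for all 677 kernels and that the 383 plants reaching maturity correspond to roughly 56.6% of them.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total count, in percent (1 decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Kernel fates from McClintock's 1944 "genetic earthquake" cross, as reported by Jones (2005).
kernel_fates = {
    "did not germinate": 87,
    "died as seedlings": 134,
    "died as young plants": 73,
    "grew to maturity": 383,
}

# The four categories should account for every kernel.
assert sum(kernel_fates.values()) == 677

shares = percentages(kernel_fates)
for label, share in shares.items():
    print(f"{label}: {share}%")
# did not germinate: 12.9%
# died as seedlings: 19.8%
# died as young plants: 10.8%
# grew to maturity: 56.6%
```

Rounding to one decimal place is only a presentation choice; because of it, the rounded shares add up to 100.1% rather than exactly 100%.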
Following McClintock’s lead, other molecular biologists also observed evidence linking drastic morphological changes to genome alterations. According to Nina Fedoroff, who cloned jumping genes:
It is as if transposable elements can amplify a small disturbance, turning it into a genetic earthquake. Perhaps such genetic turbulence is an important source of genetic variability, the raw material from which natural selection can sift what is useful for the species.
Fedoroff, 1984
I cannot say for sure that transposable elements are useful to the corn plant, I do know from experience, however, that corn lines with too many active transposable elements are in trouble: Some of their offspring look more like cabbages than corn plants. If I try to think like a corn plant (although sometimes I’m convinced that I think more like a cabbage), I conclude that my best bet is to keep my options open by hanging on to some of these principles of radical change, but shackling them as securely as possible.
Fedoroff, 1992
The message is clear: TEs can induce drastic morphological changes. Based on McClintock’s hypothesis and the genome theory (Chapter 7), high levels of TE activity should be inducible under stress, and recent studies demonstrate that stress conditions do induce TE activity (Wilkins, 2010). However, sex-mediated genome constraints purify most genomes with each generation, defining the extent to which chromosomal restructuring contributes to organismal evolution. The less stringent sexual filter in plants might also allow higher levels of genome dynamics, as drastic genome alterations are better tolerated in plants. In fact, polyploidy is pervasive in plants (some estimates suggest that 30%-80% of living plant species are polyploid). In contrast, polyploid mammals are rare, and polyploidy in mammals most often results in prenatal death. It is thus realized that the jumping gene is in fact not really a gene-based concept but rather a concept of genome reorganization, as it is really about the changing relationships of genes within a genome. When a gene jumps, the effect can, on the surface, be traced to a specific gene function, but in actuality something more profound has occurred: the genome context has changed through the reorganization of the physical interactive relationships between genes within the genome. This leads to a new genetic package with the same genes in an altered organizational topology. In some well-controlled experimental systems, it is possible to link newly formed morphological features to distinct individual genes. In reality, however, system alterations occur that are mediated by chromosomal alterations affecting many genes, and direct linear relationships between genotype and phenotype can be demonstrated only in exceptional cases. No wonder McClintock had such a high appreciation of the genome rather than the gene, even though her work led to the concept of the jumping gene!
As will be discussed in Chapter 4, the jumping gene likely contributes to fuzzy inheritance and changes the chromosomal coding in addition to interrupting specific genes. Of course, large-scale karyotype changes represent a more powerful means of speciation. Michael White estimated that over 90% of speciation events are accompanied by karyotypic changes (White, 1978). The significance of chromosomal speciation was further emphasized by Max King (King, 1993). In sum, George Romanes’ nongenic torch was passed on by a series of scholars including William Bateson, Nicolai Kholodkovskii, Richard Goldschmidt, Barbara McClintock, Michael White, Max King, and Donald Forsdyke. The nongenic factor was really the result of the influence of the chromosomal factor and should now be referred to as the genome context.
Because of the highly heterogeneous nature of biology, it is likely that some genes can play an important role in speciation. Cases of gene speciation, however, are exceptions compared with chromosomal or genome speciation, as genome-level alterations play a much more dominant role in macroevolution. Recent tests have shown that some known speciation genes in yeast do not play a major role in yeast speciation, and that the likely key factor is the diverse sequences that interfere with meiosis and the chromosomal behavior of the system (Greig, 2007). Regarding the previously noted cases that run contrary to chromosomal speciation theory (different species can share the same karyotype, and some species produce sterile hybrids without obvious changes in chromosome arrangement), one explanation is the limited resolution of classic karyotype analysis methods. Using advanced cytogenomic methods, many intrachromosomal rearrangements that distinguish these species can now be detected (Skinner and Griffin, 2012). Large-scale copy number variation can potentially contribute to this phenomenon as well; the key is a change in genome compatibility as defined by sexual reproduction (see Chapters 5 and 6). Besides large-scale copy number variation, speciation can be driven by sequence-level change; however, such change must be large in scale, effectively working as genome-level change. For instance, between closely related yeast species such as Saccharomyces cerevisiae and Saccharomyces paradoxus, there are no identified gross chromosomal translocations, but about 15% of their sequences are divergent.
This genome sequence divergence can affect fertility when the two species are hybridized, suggesting that it is the chromosomal or genome difference that matters, even though there are no visible chromosomal alterations at the karyotype level in these exceptional cases. It is not surprising that the significance of chromosomal speciation is obscured when viewed through the gene-centric lens. In their 1998 article, Coyne and Orr listed some of the empirical problems of chromosomal speciation (Coyne and Orr, 1998). These problems are in fact more relevant to the gene-speciation concept. For example, they noted: “even in species hybrids whose chromosomes fail to pair during meiosis, we do not know whether this failure is caused by difference in chromosome arrangement or difference in genes.” So what is the “conclusive evidence” that genes rather than chromosomes cause reproductive isolation? Why one over the other? Once again, such drastically different conclusions suggest the importance of using a correct framework for data interpretation and call for further study of the differences between genes and genomes. Interestingly, a growing number of researchers favor chromosomes over genes as the basis of speciation (Ye et al., 2007; Heng, 2015). Even those who study how new gene formation leads to new phenotypes realize that the insertion of a new gene into a network can change the gene network by altering its topology. This change can create new pathways by rewiring previous gene networks, ultimately leading to new phenotypes (Chen et al., 2013). What has not been realized is that karyotype changes have a much greater impact on network topology, as the physical order of genes along and among chromosomes functions as the physical platform for gene networks (Chapter 4).
2.5 THE CONFLICTING RELATIONSHIP BETWEEN THE GENE AND THE GENOME
The above examples forcefully demonstrate the conflicting interpretations of evolutionary mechanisms based on gene-centric and genome-based genetic concepts. To understand genome-based genomic thinking, one first needs to show that the genome represents a unique level of genetic organization. Demonstrating that the relationship between the gene and the genome can be conflicting, and that genes and genomes represent distinct levels of genetic organization, will help to eliminate the confusion between the two. The cooperative and conflicting relationship between the gene and the genome is best illustrated in Chapters 5 and 6. The following experiments show that the genome is a unique level of genetic organization that functions separately from the sequence level.
2.5.1 Chromosomal Position and Loop Size: Overall Genomic Architecture Constrains Local Structures
During the gene era, attention was focused on cloning and characterizing individual genes. Much is known about the molecular mechanisms of genes and their regulation under experimental conditions, but there is still no clear picture of how genes are organized within the nucleus. It is known that DNA is packaged into chromatin, which forms loops attached to the nuclear matrix, but it is not clear how genes and regulatory elements are positioned along chromatin loops, what the precise loop size is, or how loop behavior contributes to gene function, particularly within a genetic network. This situation seems not to bother the majority of molecular researchers at all, as the gene has been viewed as a powerful, independent information unit. In addition, molecular biology has been deeply influenced by biochemists, many of whom consider cells to be mini test tubes and hence are not interested in studying cell-based systems. Furthermore, it is much easier to perform elegant experiments on fundamentally simplified gene systems without chromatin context and “prove” a hypothesis that defines particular genetic elements. An example of the importance of genome-level organization comes from studies of meiotic chromosomes, which Peter Moens was the first to examine using fluorescence in situ hybridization (FISH) (Moens and Pearlman, 1989). Using multicolor DNA-protein codetection (Heng et al., 1994; Heng et al., 1996), the structures of various DNA inserts within host meiotic chromosomes were compared, including phage DNA and human DNA inserts within mouse meiotic chromosomes. Meiotic chromosome structure is based on the synaptonemal complex (SC), which forms the backbone of the chromosome and anchors loops of DNA. Inserted phage DNA formed much bigger loops than somatic DNA. The drastic “looping out” morphology of phage DNA suggested that certain DNA sequences that serve as anchors are important for the formation of regular-size chromatin loops. The same type of DNA insert at different positions along the meiotic chromosome was then compared using transgenic mice. Interestingly, the meiotic loop size was determined by the position of the insertion site along the chromosome: the closer to the telomere, the smaller the chromatin loop formed by the insert (Fig. 2.1). Similar cases can be observed by comparing the loop sizes of telomeric sequences located at interstitial positions with those located at the ends of the chromosome. In both Chinese hamsters and golden hamsters, telomeric sequences can be detected both at telomeres and at interstitial positions because of chromosome fusions that occurred during natural evolution. The telomeric loop is extremely small at the chromosomal end, while the interstitial loop of the telomeric sequences is large, similar to that of other interstitial sequences. Together, these observations demonstrated that the size of chromatin loops is determined by the position of the integration site and not just by the DNA sequence itself (Heng et al., 1996). The conclusion that genomic topology constrains the loop size of DNA has gained further support from studies of human DNA insertions within yeast meiotic chromosomes. Human chromatin loops are known to be about 20 times longer than those of yeast. However, when inserted into yeast, human DNA formed small loops similar to those of the host, suggesting that in yeast meiosis, human and yeast DNA adopt a similar organization within the chromatin along the pachytene chromosome cores. More interestingly, the recombination rates of the human segments also increased to match those of yeast, possibly in connection with the loop size change in the host environment (Loidl et al., 1995).
FIGURE 2.1 Transgenic insertions on the meiotic chromosome core. The diagram summarizes the loop sizes of various foreign DNA insertions along mouse meiotic chromosome cores (synaptonemal complex [SC]). The phage DNA insert (P) does not form normal-size loops, possibly because it lacks anchor sequences. Human DNA insertions (red) form loops on mouse chromosomes similar to endogenous mouse DNA loops (blue, M); however, loop size depends on the insertion site. Loop size is smaller when insertion occurs near a telomere (green loop, endogenous mouse telomere; Ha and Hb, peritelomeric human inserts). In contrast, loop size is larger when insertion occurs in non-peritelomeric regions (Hc).
2.5.2 Gene Expression and Chromatin Loops: Chromatin Loop Domains Constrain Gene Function
Gene expression control by promoters and enhancers has been demonstrated in various in vitro systems and particularly in simple expression systems. These experiments have given the impression that only the genes and their directly adjoining sequences matter. On the other hand, it is well known that gene silencing can result from position effects, and results are often inconsistent even within the same experimental system, as the generation of transgenic mice attests. To understand how higher levels of genetic systems operate, such as whether chromatin can constrain the function of genes, the chromatin loop behavior of various transgenes has been studied in transfected cells and transgenic mice. Specifically, the biological significance of nuclear scaffold/matrix-attachment regions (S/MARs) and how loop anchors affect gene expression were investigated. The concept of the nuclear matrix has been a highly debated subject over the past half century. Since its modern introduction by Berezney and Coffey, who showed that the nuclear matrix is a proteinaceous skeleton in the nucleus that resists nuclease digestion (Berezney and Coffey, 1974), the field has made tremendous progress in linking the nuclear matrix to nuclear architecture, chromatin packaging, and large-scale gene regulation. Because of its complexity, however, many questions remain about this dynamic structure and its function. One criticism was based on the incorrect view that the nucleus is a bag of biochemical solution in which free diffusion occurs and no nonchromatin structure is required. Such a view also lacks an appreciation of bioheterogeneity. To determine the biological significance of S/MARs, S/MAR behavior and gene expression dynamics must be investigated. By introducing varying copies of S/MAR-containing constructs, various transgenic mice and transfected cell lines were established. As expected, FISH detection following DNase I digestion, which eliminates the chromatin loop portion not tightly associated with the nuclear matrix, showed that the integrated S/MAR fragments are tightly anchored on the nuclear matrix. Surprisingly, however, FISH analysis also showed that when multiple copies of gene-S/MAR constructs were introduced, the transgenic somatic S/MARs were present in both the loop portion and the nuclear matrix regions. In other words, not every copy of the same S/MAR is used as an anchor. Of the 12 tandemly arrayed copies of a 40-kb human protamine transgene, only one copy is expressed. In addition, only one copy is associated with the nuclear matrix, and the remaining copies reside in the loop portion. The same phenomenon was confirmed with transfected DNA fragments encompassing the human interferon (IFN)-β (huIFNB1) gene that contained the endogenous 5′ S/MAR element. To further document such anchor dynamics, two adjacent bacterial artificial chromosome (BAC) probes were labeled with two distinct colors to “paint” portions of a given endogenous chromatin loop that contains many S/MARs. If the anchor of this loop is fixed, the color configuration of the two probes should be the same across cells. In contrast, if the anchor is dynamic, the configuration should vary. When the color configuration was compared among cell populations representing different stages of the cell cycle, the average pattern of color configurations differed (Fig. 2.2).
These observations demonstrate that a key feature of chromatin loop anchors is that they are dynamic and context-dependent. A dynamic chromatin domain model of transcriptional regulation (a pulling model) was proposed based on the following observations and reasoning regarding the S/MAR-mediated mechanism (Heng et al., 2000, 2001a, 2004a). (1) The chromatin loop domain is an integral component of the transcriptional regulatory mechanism associated with the nuclear matrix. (2) The selective use of S/MARs appears to be directly linked to the movement of the loops, which represents a key component of the functional mechanism of chromatin packaging and gene regulation. (3) Chromatin loops can display specific yet flexible behavior that correlates the number of S/MAR anchors with gene expression status (Fig. 2.3). The concept of the dynamic anchor reconciles many seemingly contradictory attributes previously associated with S/MARs. For example,
2.5 THE CONFLICTING RELATIONSHIP BETWEEN THE GENE
FIGURE 2.2 Dynamic configurations of loops illustrated by two-color fluorescence in situ hybridization (FISH). The diagram summarizes the two-color FISH result of two adjacent bacterial artificial chromosome clones representing a 300 kb genomic region on a nuclear halo. The released chromatin loops form the halo (blue) surrounding the nuclear matrix. In the V-shaped configuration, both the red and green probes are anchored on the nuclear matrix (center panel). In the linear configuration, only one probe (red or green) is anchored on the nuclear matrix, whereas the adjacent probe is not. The color configuration is not fixed, suggesting that the anchor site of the chromatin loops is not static on the nuclear matrix.
[Figure 2.3 diagram, three panels: Forming Association (Initiation); Pulling (Elongation); Disassociation (Inactivation). Labels: Nuclear Matrix, Machinery, Gene, Unbounded MAR, Bounded MAR.]
FIGURE 2.3 Model for the selective use of nuclear scaffold/matrix-attachment region (S/MAR) for transcription/replication regulation. The left side of the figure shows a gene located on the loop with an S/MAR in close proximity. When functional demands require the specific association of this gene with the transcriptional machinery located on the nuclear matrix, the S/MAR moves the gene to the nuclear matrix, thereby initiating gene expression (middle of figure). Following initiation, the gene is pulled in through the transcriptional machinery, allowing transcription to occur (right side of figure). There are two types of S/MARs: functional S/MARs serve as mediators to bring genes to the nuclear matrix to be transcribed and structural S/MARs serve as anchors, which are less dynamic compared with functional S/MARs. Reproduced/adapted with permission from citation Heng et al. (2004a).
now we know that S/MAR anchors are necessary but not sufficient for chromatin loops to form, that there is a direct link between S/MAR behavior/function and gene expression, and, finally, that the function of genes is constrained by dynamic chromatin behavior. This message challenges the presumed primacy of individual genes. This dynamic model has now received wide acceptance, but initial resistance came from two opposite viewpoints. Those who believed that cells were no more than biochemical test tubes did not accept the concept of a nuclear matrix, and those who viewed the nuclear matrix as a static structure did not accept its dynamic features. From a genome perspective, this publication also emphasizes the importance of chromatin topology and its dynamic role in constraining individual genes. Clearly, the reality is that the function of genes is defined by a higher level of genetic organization. In recent years, many novel features of chromosome architecture have been revealed by genome-wide mapping and analysis using Hi-C technology, a chromosome conformation capture method that comprehensively detects chromatin interactions in the nucleus (Schmitt et al., 2016). Specifically, topologically associated domains (TADs) have been identified as an important feature of gene regulation. It is crucial to integrate the information from TADs with the nuclear matrix, as well as with the chromosomal coding system (Chapter 4).
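The three stages of the pulling model in Fig. 2.3 (forming association/initiation, pulling/elongation, disassociation/inactivation) and the two S/MAR types can be read as a simple state machine. The sketch below is a toy formalization under stated assumptions, not the published model: the class and state names are hypothetical, a single boolean "demand" stands in for functional demand, and structural S/MARs, described only as "less dynamic," are simplified here to never mediate association.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LoopState(Enum):
    ON_LOOP = auto()      # gene lies on a chromatin loop; its S/MAR is unbound
    ASSOCIATED = auto()   # S/MAR has brought the gene to the matrix machinery (initiation)
    PULLING = auto()      # gene is pulled through the machinery (elongation)
    DISSOCIATED = auto()  # gene released from the machinery (inactivation)


# Simplified cycle: every engaged gene goes through initiation, elongation, release.
NEXT_STATE = {
    LoopState.ON_LOOP: LoopState.ASSOCIATED,
    LoopState.ASSOCIATED: LoopState.PULLING,
    LoopState.PULLING: LoopState.DISSOCIATED,
    LoopState.DISSOCIATED: LoopState.ON_LOOP,
}


@dataclass
class GeneLocus:
    name: str
    smar_kind: str  # "functional" (mobile mediator) or "structural" (anchor)
    state: LoopState = LoopState.ON_LOOP

    def __post_init__(self):
        if self.smar_kind not in ("functional", "structural"):
            raise ValueError(f"unknown S/MAR kind: {self.smar_kind!r}")

    def step(self, demand: bool) -> LoopState:
        """Advance the locus by one step of the toy cycle.

        A gene leaves the loop only when there is a functional demand and its
        S/MAR is a functional one; in this toy model a structural S/MAR never
        mediates association. Once engaged, the locus completes the cycle.
        """
        if self.state is LoopState.ON_LOOP:
            if demand and self.smar_kind == "functional":
                self.state = NEXT_STATE[self.state]
        else:
            self.state = NEXT_STATE[self.state]
        return self.state

    @property
    def transcribing(self) -> bool:
        return self.state is LoopState.PULLING


# Hypothetical loci, for illustration only.
gene_x = GeneLocus("gene_x", "functional")
trace = [gene_x.step(demand=True) for _ in range(4)]
print([s.name for s in trace])  # ['ASSOCIATED', 'PULLING', 'DISSOCIATED', 'ON_LOOP']
anchor = GeneLocus("anchor", "structural")
print(anchor.step(demand=True).name)  # ON_LOOP
```

The point of the sketch is the dependency it encodes: whether a gene is transcribed is a property of its locus state within the chromatin context, not of the gene alone.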
2.5.3 Loops/Chromosome Length and AT/GC Composition: Why Little Clarification Comes from the Highest Resolution
To study the process of meiotic chromosome pairing, a new method was introduced that combines multicolor FISH with protein co-detection using spectral karyotype (SKY) technology. SKY traces each chromosome with a unique color, while protein detection visualizes the synaptonemal complexes (SCs), highlighted with antibodies specific for SC proteins (Fig. 2.4). The successful application of SKY and SC-protein co-detection led to a surprising discovery: the length of mouse meiotic chromosomes (SCs) does not precisely correspond to the length of the corresponding mitotic chromosomes (Fig. 2.5)! For example, in mitosis, mouse chromosome 4 is longer than chromosome 5, and chromosome 14 is longer than chromosome 15. But when mouse meiotic chromosomes were analyzed by our SKY-SC co-detection, chromosomes 5 and 15 were, respectively, much longer than chromosomes 4 and 14. The length of the SC has been used for karyotyping in the past, on the assumption that relative mitotic length precisely correlates with relative meiotic length. This assumption had not been challenged previously, as there was no effective method to identify each meiotic
FIGURE 2.4 Spectral karyotype (SKY)-protein co-detection on mouse meiotic chromosome cores (synaptonemal complex [SC]). SC proteins were labeled with FITC (fluorescein isothiocyanate)-conjugated SC antibody (left panel), while each individual chromosome was labeled with a unique SKY paint for specific identification (right panel). The length of the SC can then be measured for each specific chromosome. Reproduced with permission from citation Ye et al. (2001).
[Figure 2.5 bar chart: Relative SC length of mouse chromosomes. y-axis: Relative SC Length (0 to 8); x-axis: Chromosome (1 to 19).]
FIGURE 2.5 The inconsistency between the length of mitotic and meiotic chromosomes. Relative average lengths of mouse meiotic chromosomes (measured by the length of the synaptonemal complexes) are shown. Chromosomes 5, 7, 11, 15, and 17 are longer than expected based on the length of mitotic chromosomes, suggesting a packaging disparity between mitotic and meiotic chromosomes.
chromosome for most species and to precisely compare the length of each SC (Ye et al., 2001). This finding challenged the "ABCs" of biology, according to which chromosome 4 must be longer than chromosome 5, so any observed length variation would have to be attributed to a slide preparation artifact. To rule out such an artifact, the SKY data were checked using specific FISH probes combined with SC co-detection (Ye et al., 2006). These experiments were repeated again and again, and each time, during meiosis, chromosome 5 was longer than 4 and 15 was longer than 14 (Fig. 2.6). This important observation was confirmed by others. Using immunofluorescence to examine exchanges in human spermatocytes, Terry Hassold's group reported remarkable variation in the rate of recombination within and among individuals. They then showed that, in humans and mice, this variation was linked to differences in the length of the SC; therefore, the SC reflects genetic rather than physical distance (Lynn et al., 2002). Six months later, Terry Ashley's group showed that mouse SC length is strongly correlated with crossover frequency and distribution. Although the length of most SCs
FIGURE 2.6 Synaptonemal complex (SC) length of chromosomes 14 and 15 in mouse meiotic cells. Confirmation of spectral karyotype data using protein-DNA co-detection. Chromosomes 14 and 15 were specifically labeled with rhodamine (red) and FITC (green), respectively. The SC was detected with an FITC-conjugated antibody. The left image was captured using a dual-color filter showing both the chromosome 14 label and its SC. The right panel is the same image taken with an FITC filter, showing the chromosome 15 paint and its SC. The SC of mouse chromosome 15 is clearly longer than that of chromosome 14. Reproduced with permission from citation Ye et al. (2006).
correspond to that predicted by their mitotic chromosome length-defined rank, several SCs are longer or shorter than expected, with corresponding increases and decreases in MLH1, a mismatch repair protein that colocalizes with recombination foci (Froenicke et al., 2002). But what is the mechanism behind the differential packaging of mitotic and meiotic chromosomes? Why is there disagreement between genetic distance (the length of meiotic chromosomes) and physical distance (the length of mitotic chromosomes)? By comparing the expected relative SC length, based on the relative physical length of each chromosome (calculated as the ratio of the base pairs of DNA on each mitotic chromosome to the total base pairs of all autosomes in the mouse genome), with the actual relative length determined from our SKY and FISH experiments, all mouse chromosomes were divided into three categories. Most had equivalent SC representation; that is, the expected relative length was equal to the actual relative length. Some fell into the SC-overrepresented group, including chromosomes 5 and 15, where the actual relative SC length is longer than the expected relative length. Others fell into the SC-underrepresented group, including chromosomes 4 and 14. What were the structural differences among these three groups of chromosomes, and in particular, what was special about the "outliers" that disrupted the expectations? It seemed likely that AT/GC content might contribute to this phenomenon, as AT/GC content is known to be associated with the distribution of genes along the chromosome. Luckily, sequencing of the mouse genome had just been completed, and AT/GC information for each chromosome was available. Once the AT/GC content of each mouse chromosome was compared, it became quite clear that there is a strong correlation between SC representation and the GC content of the chromosome.
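The expected-versus-actual comparison and the GC correlation described above can be sketched in a few lines. This is an illustrative reconstruction under assumptions: the four chromosomes and all their numbers are invented, the 10% tolerance separating the three categories is an arbitrary choice (the text does not give one), and Spearman rank correlation is our choice of statistic rather than the one reported in the original work.

```python
from dataclasses import dataclass


@dataclass
class Chromosome:
    name: str
    base_pairs: float   # physical (mitotic) DNA content
    sc_length: float    # measured SC length, any consistent unit
    gc_fraction: float  # whole-chromosome GC content


def sc_representation(chromosomes):
    """Actual relative SC length divided by expected relative SC length.

    expected = chromosome bp / total bp of the autosomes supplied
    actual   = chromosome SC length / total SC length of the same set
    A ratio of 1.0 means the SC is exactly as long as its DNA share predicts.
    """
    total_bp = sum(c.base_pairs for c in chromosomes)
    total_sc = sum(c.sc_length for c in chromosomes)
    return {
        c.name: (c.sc_length / total_sc) / (c.base_pairs / total_bp)
        for c in chromosomes
    }


def classify(ratio, tolerance=0.1):
    """Three groups; the 10% tolerance is an arbitrary illustrative choice."""
    if ratio > 1 + tolerance:
        return "overrepresented"
    if ratio < 1 - tolerance:
        return "underrepresented"
    return "equivalent"


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Four invented chromosomes (toy numbers, not mouse measurements).
toy = [
    Chromosome("A", 100, 10, 0.41),
    Chromosome("B", 100, 13, 0.44),
    Chromosome("C", 100, 7, 0.38),
    Chromosome("D", 100, 10, 0.40),
]
ratios = sc_representation(toy)
groups = {name: classify(r) for name, r in ratios.items()}
rho = spearman([ratios[c.name] for c in toy], [c.gc_fraction for c in toy])
print(groups)  # A, D equivalent; B overrepresented; C underrepresented
print(f"Spearman rho = {rho:.3f}")  # 0.949
```

Note that the GC value is computed for each whole chromosome, mirroring the argument that follows: the unit of comparison is the chromosome, not a short sequence window.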
In chromosomes 4 and 14, which have much shorter SCs, the GC content was among the lowest. In contrast, in chromosomes 5 and 15, the GC content was among the highest. The majority of "normal" chromosomes have average GC content. This was an unbelievably simple and exciting correlation, but was it real? It turns out that many groups had already compared genetic recombination rates with GC content. They all failed to establish such a highly significant correlation, despite the use of high-resolution whole genome scanning methods. It was true that, by using cutting-edge technologies, it was possible to compare genetic recombination and GC content in observation windows of 5-10 kb. However, by ignoring the entire chromosome, these studies missed the forest for the trees. The key point is that the entire chromosome, not isolated specific parts, functions as a unit during genetic recombination. The next question was why GC content correlates with the differential length of the SC. Based on previous data that the anchors of
loops are highly dynamic, that loop size can change depending on chromosome architecture, and that the spacing of loops seems fixed along the chromosome core, a model was proposed in which the overrepresented SCs form small chromatin loops, whereas the same amount of DNA packaged into longer loops would require a shorter SC. This is similar to building a house on a lot of variable size: if the land area is large, building a single-story ranch house is not a bad idea, but on a smaller lot, to attain the same square footage, one has to build a house with two or even three stories. Using BAC clones representing both AT-rich and GC-rich regions of the mouse genome, chromatin loop size was examined using the DNA-protein co-detection method. It was observed that the GC-rich clones do form smaller loops! In addition, in the telomeric region, even GC-rich chromatin loops were small! A new meiotic chromosome model was finally built (Heng et al., 2004b). Interestingly, we later found that Nancy Kleckner and her colleagues had suggested a similar model to explain loop size and SC length (Kleckner et al., 2003). Previous efforts had failed to link GC content to genetic maps despite using large-scale high-resolution data with sophisticated mathematical analyses. What these analyses had missed was the basic biological concept that chromosomal pairing is based on the entire chromosome and not simply on a small region of the DNA. This is an example of how basic Biology 101 can trump the most sophisticated analyses! It is now recognized that many factors, including GC content, differential chromosomal locations, sex, and cell type (somatic cells, meiotic cells, or sperm/eggs), can affect chromosomal packaging, reflecting the plasticity of the genome-defined package; but as different species display different boundaries, this plasticity is also constrained.
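The loop-size model above can be turned into a back-of-envelope relation. Assuming (our simplification) that a chromosome's DNA is divided into loops of uniform average size and that loops emanate from the axis at a fixed density, the number of loops is DNA content divided by loop size, and SC length is that number divided by loop density. In the house analogy, DNA content is the square footage, loop size is the number of stories, and SC length is the footprint. The function names and all numbers below are hypothetical and chosen only to make the arithmetic visible.

```python
def sc_length(dna_bp, loop_size_bp, loops_per_um):
    """Axis (SC) length in micrometres needed to hold dna_bp of chromatin
    when loops of average size loop_size_bp sit at a fixed density of
    loops_per_um along the axis (uniform loops assumed)."""
    if min(dna_bp, loop_size_bp, loops_per_um) <= 0:
        raise ValueError("all inputs must be positive")
    number_of_loops = dna_bp / loop_size_bp
    return number_of_loops / loops_per_um


def relative_sc_lengths(chromosomes, loops_per_um):
    """chromosomes: dict name -> (dna_bp, loop_size_bp).
    Returns each chromosome's share of the total SC length."""
    lengths = {
        name: sc_length(bp, loop, loops_per_um)
        for name, (bp, loop) in chromosomes.items()
    }
    total = sum(lengths.values())
    return {name: length / total for name, length in lengths.items()}


# Illustrative numbers only: equal DNA content, GC-rich chromosome with
# loops half as large as the AT-rich one.
toy = {"gc_rich": (100e6, 25e3), "at_rich": (100e6, 50e3)}
print(sc_length(100e6, 50e3, 20))  # 100.0 (larger loops, shorter axis)
print(sc_length(100e6, 25e3, 20))  # 200.0 (smaller loops, longer axis)
shares = relative_sc_lengths(toy, loops_per_um=20)
print(shares)  # gc_rich gets 2/3 of the SC length despite 1/2 of the DNA
```

Because the loop density cancels out of the relative shares, the sketch makes the qualitative prediction independent of its exact value: a chromosome with smaller loops is SC-overrepresented relative to its DNA share.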
The key message here is as follows: choosing the correct level of genetic organization for biological studies is the most important precondition when searching for biological truths, as biological functions differ greatly depending on the level of organization. Using higher resolution technologies can even lead to the wrong biological conclusion. This is the very rationale for searching for genome-based biological concepts in a gene-dominated research world.
2.6 GENOME CONTEXT DETERMINES GENE FUNCTION
The above examples are important for understanding the relationship between DNA/gene and chromosome/genome. Genome-level function is not simply achieved by "adding up" the functions of individual genes, as there is no purely quantitative relationship between the gene and the genome. This has ultimately been proved by the Human
Genome Project: knowing the function of many individual genes has not yielded an understanding of the blueprint coded by system inheritance (see Chapter 4). The current cancer genome sequencing project has failed to detect the long-expected common cancer drivers for most cancers. However, it has revealed that one universal feature of all cancers is elevated genome alteration (see Chapter 3). The most logical conclusion is that genome context defines gene function both in normal individuals and in cancer. This point will be systematically and continuously articulated throughout the book. The gene and the genome dominate two different eras of inheritance studies. Because the gene and the genome are different entities, understanding their respective functions, as well as the collaborative and conflicting relationship between the two, holds the key for future genomics. What separates the gene and the genome is the relationship between "parts" and "whole." While interactions among genes can give rise to the genome's function, the higher-level genome context serves as a constraint on individual genes. Because each individual gene's function must be fulfilled within the genome context, most gene functions are not self-determined but represent potential functions defined or selected by genome-environment interactions. Thus, the genome is the platform that organizes and selects gene responses. As Barbara McClintock insightfully pointed out, the genome (a highly sensitive organ of the cell) monitors genomic activities and corrects common errors, senses unusual and unexpected events, and responds to them, often by restructuring itself (for more, see Chapters 3 and 4).
Such relationships can be used to explain many puzzling phenomena, such as why many genes can be experimentally eliminated without obvious phenotype changes, why “normal” individuals can have many gene mutations, why similar genes can form different genome-defined species, and why genome-level alterations are crucial for macroevolution while gene-level alterations are needed for microcellular evolution.
2.7 ACTION NEEDED
Before finishing this chapter, some common feelings and responses need to be briefly addressed. Some readers might say that the differences between chromosomes and genes are obvious: "We appreciate the importance of the chromosome, of course. What you are doing is just beating a dead horse." But in fact, the majority of researchers do not understand the importance of the genome. If they did, they would already have changed their research direction or priorities. Just look around: how many scientists are focused on gene-based research? Even today, with so many
whole genome profile platforms, the majority of researchers are still focusing on understanding individual domains, namely specific genes. We need to be honest: the much-needed framework to organize these genetic parts does not yet exist. Other readers might suggest that the limitations of the gene theory are obvious: "We know it has problems, but why not just gradually improve it as we are doing?" Well, the answer depends on one's view of how science works. Clearly, we all share the goal of searching for the truth, but our approaches differ drastically. Many might want to improve science through gradual modification within the current gene-centric framework. We, in contrast, believe that a different framework works much better than the field's current one and that many paradoxes can only be solved by a new genome-based paradigm. The good news is that we understand why one may think this way. I used to be passionate about the gene as well. In fact, I was trained by one of the top gene scientists, Dr. Lap-Chee Tsui, the man who cloned the cystic fibrosis gene.