Current Biology, Vol. 12, R470–R471, July 9, 2002, ©2002 Elsevier Science Ltd. All rights reserved.
Rice Genomes: A Grainy View of Future Evolutionary Research Kevin Livingstone and Loren H. Rieseberg
The draft genome sequences from two subspecies of rice are powerful new tools for gene discovery in the grasses. Genome-wide comparisons of gene content and order will also shed new light on evolutionary processes.
Although completion of the heavily anticipated human genome sequence project will provide information needed to combat inherited maladies, the recent completion of two sequences of the rice genome [1,2] may be a far greater gift to humanity. After all, as the Byzantine proverb states, “he who has bread has many problems, he who has no bread has only one problem”. Because of the importance of rice and its status as a model for all grasses, these sequences will provide a basis for future genetic improvement of all the cereal grains, our most important food resource. Beyond the obvious agricultural benefit, these sequences may also provide unparalleled views of the processes operating on DNA sequences that change the function and organization of genes, leading to the formation of new species. The two rice sequences are from subspecies that represent the major cultivated gene pools of rice, Oryza sativa L. ssp. indica and O. sativa ssp. japonica. The indica type is primarily grown in China, and the Beijing Genomics Institute (BGI) determined its sequence [1]. The japonica subspecies is preferred in Japan, and Syngenta AG’s Torrey Mesa Research Institute (TMRI) determined its sequence [2]. These are both draft sequences, produced by randomly sequencing small genomic bits and relying on multiple, offset sequences to assemble the larger pieces. From a functional standpoint, while each draft should contain nearly all the genes in rice, many of the sequences identified as genes are only predicted on the basis of different gene detection algorithms. It will take a long time to validate the expression of the putative genes biologically and take full advantage of these efforts. Structurally, both drafts cover the majority of the rice genome, but many of the intergenic regions are missing. Consequently, each draft resembles a puzzle with tens of thousands of pieces on the table, but only a few joined to start to form a picture of the twelve rice chromosomes. Even in a draft state, however, these sequences provide enormous agricultural benefits. Rice is a crucial staple for much of the world’s population, and rice is also the compact key to other grass genomes [3]. The main differences between rice and maize, wheat, barley and so on are that, while the same Department of Biology, 142 Jordan Hall, Indiana University, Bloomington, Indiana 47405, USA.
PII S0960-9822(02)00949-1
Dispatch
genes are found in each species, they are in different arrangements, amid various amounts of species-specific ‘junk’ DNA. The compactness of the rice genome, coupled with a known sequence, will make identification of important genes easier. In addition, the rice sequence provides a means for directing searches in other grasses to the genes in a particular chromosomal region. The TMRI group [2] has already demonstrated the power of a focused search approach to define candidates for a subset of agronomically important traits mapped in maize. Use of the comparative method to glean and transfer information is of course not limited to the grasses. The intense molecular genetic characterization of Arabidopsis thaliana should also benefit rice research, at least where pathways and processes are conserved. The availability of the completed Arabidopsis genomic sequence [4] also enables genome-wide evolutionary comparisons of rice and Arabidopsis. These comparisons first require gene identification in rice. Using a variety of gene-calling programs, parameters and confidence levels, the BGI group [1] estimate that their sequence contains 46,022–55,615 genes, and the TMRI group [2] conclude that rice has 32,000–50,000 genes. Either estimate suggests that rice has more genes than any other sequenced organism, including humans. When the set of 25,498 genes similarly identified in the Arabidopsis genome sequence was compared to rice, the TMRI group [2] found that 85% of Arabidopsis genes had a match (homolog) in rice, with a mean identity of 49% at the protein level; the BGI group [1] found that 81% of Arabidopsis genes had a homolog in their rice gene set, with a mean identity of 60% at the protein level. When the two groups attempted to classify the functions of the rice genes — for example, disease resistance, metabolism, structural proteins and so on — they both found all of the important categories previously found in Arabidopsis. The BGI group [1] found the functional groups in approximately the same ratios, and the TMRI group [2] showed that the two species have a similar number of gene families (about 15,000). Taken together, this indicates that Arabidopsis and rice share a large set of genes that most likely carry out similar functions. Arabidopsis and rice are markedly different, however, a fact which must ultimately be reflected in their genes. When the BGI gene set was compared to Arabidopsis, only 49% of the rice genes had a homolog in Arabidopsis. While many of the unique sequences in rice may be erroneous predictions, the BGI team [1] did show that at least 15% of these are expressed. A similar attempt to find Arabidopsis homologs for a set of maize genes found matches for only 60–70% of the maize sequences [5]. These genes may, therefore, define a subset that differentiates monocots from dicots. The BGI group [1] suggest these genes might have arisen from genomewide or large-scale segmental gene duplication(s) in rice. The TMRI data [2] also support the view that there
Current Biology R471
are multiple gene duplications in the rice genome relative to that of Arabidopsis, which is itself highly duplicated [6]. Comparison of rice to Arabidopsis illustrates both the power and pitfalls of analyzing highly divergent genomes. On the one hand, the conservation found between proteins indicates functional importance, and the thousands of available homologs may be used to refine mutation rate estimates. On the other hand, the molecular mechanisms and microevolutionary forces responsible for the observed differences are difficult to discern because multiple changes are likely to have occurred at the same region or site. Interesting as the broad comparisons may be, evolution proceeds by discrete events, which are much more likely to be detected in close comparisons. Take, for example, genome rearrangement. The number of clear homologs from each genome should allow the definition of shared ancestral chromosomal blocks and the events that changed them. The BGI [1] and TMRI [2] groups both came to the conclusion, however, that the amount of duplication in the genomes, and the extent of rearrangement, make attempts to identify individual rearrangement events next to impossible. The draft nature of their sequences undoubtedly hampered this effort, but these results still effectively extinguish hopes of a general monocot–dicot comparative framework. Even the rice–grass comparisons may be difficult to interpret: the TMRI group [2] produced rice–maize, rice–barley, and rice–wheat comparative maps, and although conserved blocks are visible, they appear to be substantially interrupted. The availability of the two rice sequences in their completed form, when all their chromosomes are assembled, will thus allow an unprecedented view of molecular mechanisms and microevolutionary forces responsible for genome evolution. Instead of comparing gene pools that have been diverging for close to 200 million years, such as those of Arabidopsis and rice, whole-genome comparisons between subspecies that have been evolving independently for a mere two to three million years will be possible. Revealing comparisons can then be made between the intergenic regions of the two sequences — regions that harbor most of the sequence-level variation in the so-called ‘junk’ DNA [1]. This DNA is composed of multiple copies of different repeated sequences [7], including small autonomous sequences called transposons and retrotransposons. By looking at rice subspecies, we may better understand how these elements proliferate and change within a genome. The sequenced cultivars from these gene pools represent thousands of years of directional selection under cultivation. Crosses between the two gene pools commonly result in the discovery of reproductive barriers [8,9]. Most often, these barriers are characterized by deviations from Mendelian inheritance, as manifested by a failure to recover or introgress alleles from one of the parents, or by partial sterility of hybrids. These sequences thus offer an unprecedented system to study adaptation and reproductive isolation in the speciation process. Studies designed to identify adaptive differences will now be able to compare the genes
involved. Sequence comparisons will also define the number and extent of gene rearrangements, providing clues to their causative mechanism(s). With these data in hand, we will be able to see whether reproductive isolating factors are genic, as predicted by Dobzhansky [10] and Muller [11], or dependent on small chromosomal rearrangements [12], or both. Simply put, having this level of detail about an early stage of speciation should prove invaluable. In the end, evolutionary biologists have a reason to celebrate. Jurassic Park notwithstanding, most evolutionary geneticists work by comparing contemporary DNA sequences — and the more, the better. Plant evolutionists, who have seen multiple animal and microbial genomes sequenced, have finally received the boost they need: two draft genomic sequences from closely related subspecies. Used wisely, these two sequences will not only allow rapid agricultural advances, but may also answer questions regarding the nature of adaptive differences, mechanisms of genome rearrangement, and basis of reproductive isolation. References 1. Yu, J., Hu, S., Wang, J., Wong, G.K.-S., Li, S., Liu B., Deng Y., Dai L., Zhou Y., Zhang X, et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. Indica). Science 296, 79–92. 2. Goff S.A., Ricke D., Lan T.H., Presting G., Wang R., Dunn M., Glazebrook J., Sessions A., Oeller P., Varma H., et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. Japonica). Science 296, 92–100. 3. Devos, K.M. and Gale, M.D. (2000). Genome relationships: the grass model in current research. Plant Cell 12, 637–646. 4. The Arabidopsis Genome Initiative. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815. 5. Brendel, V., Kurtz, S. and Walbot, V. (2002). Comparative genomics of Arabidopsis and maize: prospects and limitations. Genome Biol. 3, 1005. 6. Vision, T.J., Brown, D.G. and Tanksley, S.D. (2000). The origins of genomic duplications in Arabidopsis. Science 290, 2114–2117. 7. SanMiguel, P., Tikhonov, A., Jin, Y.-K., Motchoulskaia, N., Zakharov, D., Melake-Berhan, A., Springer, P.S., Edwards, K.J., Lee, M., Avramova, Z. and Bennetzen, J.L. (1996). Nested retrotransposons in the intergenic regions of the maize genome. Science 274, 765–768. 8. Harushima, Y., Nakagahra, M., Yano, M., Sasaki, T. and Kurata., N. (2002). Diverse variation of reproductive barriers in three intraspecific rice crosses. Genetics 160, 313–322. 9. Kubo, T. and Yoshimura, A. (1999). Complementary genes causing F2 sterility in Japonica/Indica cross of rice. Rice Genetics Newsletter 16, 68–69. 10. Dobzhansky, T. (1951). Genetics and the origin of species, 3rd edn. (New York: Colombia University Press.) 11. Muller, H.J. (1942). Isolating mechanisms, evolution and temperature. In Biological Symposia, T. Dobzhansky, ed. (Lancaster, PA: The Jaques Cattell Press), pp. 71–125. 12. Stebbins, G.L. (1958). The inviability, weakness and sterility of interspecific hybrids. Adv. Genet. 9, 147–215.