COMMENT
Plant genomics takes root, branches out
Outlook
For the past nine years, an international consortium of researchers has collaborated on a project to provide a full set of genomics tools for the model plant species Arabidopsis thaliana. Among the goals of this project was the complete sequence of the Arabidopsis genome, which may be finished in the year 2000, four years ahead of schedule. Arabidopsis was an appropriate first target for plant genomics because of its excellent genetics, outstanding research community and small genome size. Until very recently, it appeared that comprehensive high-throughput plant genomics in the public sector would largely begin and end with Arabidopsis. Over the past two years, this situation has changed completely.
While the public sector has proceeded with Arabidopsis genomics at breakneck pace, agricultural companies such as Pioneer Hi-Bred, Monsanto and DuPont have initiated very large proprietary projects in plant genomics1. The major cash crops, particularly maize, wheat and soybean, were the targets of these projects, which initially centered on identifying genes as expressed sequence tags (ESTs) by sequencing large and diverse collections of cDNAs. As these companies moved from gene discovery to the identification of gene function (and hoped-for patent positions) via functional genomics, academic plant scientists who did not study Arabidopsis foresaw that they would be unable to compete in basic or applied genetic research. Internationally, efforts had been building since the early 1990s to use comparative genetics as a powerful tool for leapfrogging crop genomics to a prominent position (http://www-iggi.bio.purdue.edu). In Japan, scientists at the Rice Genome Program (RGP) decided that it was time to switch from mapping and EST projects in rice (http://www.staff.or.jp) to a complete genomics project, including the sequencing of the rice genome.
In the USA, maize researchers and growers reached a consensus that public-sector genomics was necessary in corn (http://www.inverizon.com/ncgi). Since 1997, all of these projects and initiatives have blossomed through a combination of persuasive science, good fortune and old-fashioned politics.
Rice as the second model plant species
In 1998, the RGP, directed by Dr Takuji Sasaki in Tsukuba, asserted its continuing prominence in plant genomics by acquiring support within Japan for sequencing the rice genome. The program has a commitment of $10 million per year for three years, with the expectation that support will continue at a similar level for ten years. These funds are all for genomic sequencing, which Dr Sasaki hopes to ramp up to 20 Mb per year within a few months. The rice genome is about 440 Mb in size. With their colleagues in Tsukuba, researchers in the USA, the EU, Korea and Taiwan have begun planning and preliminary research to assist in coordinated sequencing of the genome of the rice variety Nipponbare (ftp://genome1.bio.bnl.gov/pub/maize/RiceProject.html). China will also participate in this consortium, generating rice genomic sequence data for a different rice variety. About 20 labs in the public and private sectors in Japan have been granted an additional $1.4 million to pursue identification of rice gene function via transposon tagging2 and microarray analysis3, or gene isolation by map-based techniques4.
Beyond its importance as the world’s number one crop, rice is also an excellent model plant, with a small genome, reasonable transformation competence and a detailed genetic map. Rice is a monocotyledonous plant and a grass, as are most of the world’s other major food crops (e.g. wheat, maize, barley, sorghum, oats, sugarcane). Comparative genetic mapping with DNA markers5 has indicated similar gene content and extensive map colinearity between rice and other grasses, suggesting that rice could provide a road map for the characterization of larger grass genomes, such as those of maize (2400 Mb) or barley (4800 Mb)6. Hence, rice has been put forward as a good second model for plants, allowing valuable comparisons with both a model dicot (Arabidopsis thaliana) and the most important monocots.
0168-9525/99/$ – see front matter © 1999 Elsevier Science. All rights reserved. PII: S0168-9525(98)01683-7
Comparative genomics
Comparative genetic mapping indicated that tremendous added value could come from genomics carried out across a range of species7,8 (Ref. 9 and references therein). A historically unique strength of plant science has been the diverse array of species (literally dozens) that have been investigated physiologically, biochemically, developmentally and genetically. Comparative genetics, and particularly genomics, could bring these diverse data sets together synergistically, revealing the genetic basis both of the commonalities central to plant biology and of the allelic variation responsible for the different biologies of different species7. Animal scientists have recently realized that their inability to understand how genes, pathways and physiologies have evolved leads to a dead end in understanding the biology of any single organism. Hence, studies of the puffer fish, zebrafish and, most recently, apes10 have all been proposed to help provide the links between genes and evolved function. Rather than supplanting the appropriate emphasis on tractable model species, these comparative studies will add value to model work by indicating the nature and significance of allelic and functional variation, and by connecting model observations to applications in crop improvement. A good scientific justification for multi-species genomics did not seem likely to count for much, though, given the tight budgets of the time.
TIG March 1999, volume 15, No. 3
Jeffrey L. Bennetzen
Department of Biological Sciences, Purdue University, West Lafayette, IN 47906-1392, USA.
A good argument, better politics and timing
However, the dawning realization of the potential comparability and synergy of plant genomics dovetailed nicely with the success of the National Corn Growers Association in educating the US Congress about the value of basic research in crop genomics, and with an unexpectedly vigorous expansion of the US economy. The first outcome of this education and lobbying process was the proposal by Senator Christopher Bond of Missouri to apply $40 million in new money at NSF to plant genomics. An Interagency Working Group (IWG), chaired by Dr Ron Phillips, was organized to recommend how a new plant genomics program should be targeted and organized. After interactions with the scientific, industrial and end-user communities, the IWG made recommendations that formed the foundation of an NSF Plant Genome program announcement requesting proposals by a 6 April 1998 deadline. As recommended by a broad spectrum of basic plant science researchers, some of these funds would be used to accelerate the ongoing Arabidopsis Genome Initiative, but most were for the initiation of new genomics projects. Large grant proposals (as high as $3 million per year for up to five years) were encouraged. The funded projects were envisioned as potentially multi-institutional and multi-investigator collaborations. These ‘virtual centers’ would apply high-throughput technologies to produce genomic tools that are valid both in model plant species and in genetically amenable crops.
A plan from the community
The IWG recommendations largely reflected a consensus view within the plant genetics community of the power of comparative genomics in plants. In its simplest outline, the IWG plan involved the development of a potentially universal set of plant genomics tools. One exceptional value of plant genomics research is its unique capacity to empower all plant biologists and agriculturalists. Once clones are available and routine expression data are in user-friendly databases, any research group (regardless of size or research question) will be limited only by its creativity. Compared with the Human Genome Project, a plant genomics project has very different resources and goals. Plant genomics is not just animal genomics with less money. Unlike animal researchers, who have concentrated their molecular and genetic characterizations on a handful of model organisms with the potential to be informative vis-à-vis human biology, plant researchers have extensively characterized dozens of species with known phylogenetic relationships. Plant genomics could provide the connections between these large sets of information and should also provide the multi-species sets of knowledge and genes that will be used in future crop improvement.
The tool development proposed in the IWG document was very similar to that described in various genomics proposals9,11. These tools include genomic sequencing of (at least initially) two plant species, rice and Arabidopsis, to serve as the foundation for gene discovery and characterization in all plants (Fig. 1). Physical maps of a few species would be needed, presumably constructed by established technologies such as bacterial artificial chromosome fingerprinting. The ‘nodal’ species for physical maps would be chosen because they have relatively small genomes and because they can serve as surrogates for important and phylogenetically diverse plant families, such as sorghum for maize and lotus for soybean. Rice and Arabidopsis cannot be the only nodal species, partly because local colinearity of genomes might not extend consistently beyond closely related species8,12. Genetic maps, using comparable DNA markers, would have to be prepared for all of the nodal species that were physically mapped and for many other important plant species. These comparable marker-based genetic maps would provide the foundation for relating all of the genetic, biochemical, physiological, developmental and morphological information on the great wealth of studied plant species. A similarly large number of species would deserve medium-deep (perhaps 50 000-clone) EST projects. ESTs, certainly the most economical route to gene discovery and to investigating allelic diversity in genomes that will not soon be fully sequenced, also provide the species-specific sequences needed for precise DNA-chip analysis of gene expression13. High-throughput expression analyses using microarrays or DNA chips would also be appropriate for as many species as feasible, as would reverse genetic (e.g. transposon tagging) studies of gene function. Different plant species might vary tremendously, or not at all, in the functions provided by orthologous genes. These data will indicate how plants have evolved, and how they can be engineered. Moreover, both the copy number and the functional roles of gene family members differ between species, so a knockout mutation in an equivalent gene will produce an identifiable phenotype only in an initially unpredictable subset of species.
Informatics will be needed to tie together all of these structural (genomic sequencing, ESTs, physical maps, genetic maps) and functional (reverse genetics, expression analysis, mapping of genetic variation) genomics projects (Fig. 1). Tremendous strides are needed to improve our current informatics capacities, both for displaying and for data-mining the many types of genomic data from different species. Most importantly, links must be maintained to the full plant research community to make sure that all of this technology is transferable. Comparative genetic maps can provide that link.
FIGURE 1. The comparative plant genomics tree. Figure labels: Arabidopsis, Rice, Genomic sequencing, Physical maps of nodal species, Genetic maps, EST projects, Reverse genetics, Expression analysis, Many species, Informatics.
The first round: surprises
The first NSF plant genome panel ranked 66 proposals, rating 4 outstanding, 16 meritorious, 22 competitive and 24 not competitive. In all, 24 new proposals were chosen for funding across a range of disciplines and species. Brief descriptions of the funded projects can be found at http://www.fastlane.nsf.gov/cgi-bin/A6QueryList. The proposals chosen, and their high level of support, were a surprise to many plant scientists, including some of the funded researchers. Moreover, the NSF panel edited a few proposals, adding components to some and deleting components from others. Criteria beyond merit were also used, as some proposals rated meritorious were rejected while others with lower ratings were funded. Presumably, the editing and selection processes were meant to ensure that a complete set of tools would be developed in a given genomics area, without unnecessary duplication. The successful proposals received funds ranging from $301 072 for two years to over $12.5 million for five years, amounting to a commitment of over $25 million in year one and over $90 million over the duration of the projects. Some of the projects will produce the plant genomics tools outlined in the IWG document, while others focus more on a particular biological question, such as stress tolerance or cellulose synthesis. Compared with the IWG outline, certain areas of research (e.g. maize functional and structural genomics) were very heavily funded and others (e.g. rice genomic sequencing, a multispecies range of EST projects) were not extensively supported. The absence of a funded rice genomic sequencing project, despite several proposals from established groups, casts some doubt on the degree to which the USA will be able to participate (at least initially) in the International Rice Genome Sequencing Consortium14. For this fiscal year, the US Congress has provided NSF with an additional $50 million to continue this program.
NSF has requested new plant genomics proposals, with a deadline of 29 January 1999. About $20 million of this allocation should be available for the first year of newly funded proposals. Letters of intent for this program were requested by 4 December 1998; about 15% fewer were submitted in this round than in the previous year.
Agricultural genomics at the USDA
The chief plant research organization in the USA, the US Department of Agriculture (USDA), has been supporting a low-throughput, individual-investigator form of plant genomics for the last seven years. The USDA Plant Genome Program, through the National Research Initiative (NRI), will receive about $9 million in funding this year, and has been responsible for the generation of many DNA marker-based genetic maps, the cloning of numerous agronomically significant genes, and the verification of genomics and comparative genetic approaches in many plant species. With this solid research background and a mission of agricultural improvement, it was inevitable that the USDA would propose its own large-scale genomics program. A last-minute decision by Congress killed what could have been $30–$80 million in 1999 for an Agricultural Genomics Initiative (AGI), but the USDA hopes that a similar sum might become available for this purpose next year. If supported, the new USDA AGI would fund genomics research in plants, livestock and agriculturally significant microbes. Partly to compensate the USDA for its loss of expected funds for Agricultural Genomics, the NRI received an unusually large increase in funding (over 20%). In this fiscal year, the USDA plans to provide approximately $2 million for rice genomic sequencing, with a commitment from the NSF to provide a similar level of support. Like the current NSF Plant Genome Program, the USDA expects that eventual AGI funding will go to multi-investigator projects that generate genomics tools (including clones and data sets). Unlike single-investigator, blue-sky research, this first-generation genomics will require coordination of projects within species and between agencies so that the full set of tools is provided in a timely manner. Hence, the USDA, NSF and DOE will continue to participate in steering these projects through a group like the IWG.
A promising future for plant genomics, plant scientists and the consumer
From a few loud voices as recently as early 1997, the shouts for a multi-species approach to plant genomics have swelled to a cacophonous roar. The availability of fiscal resources, particularly at such unheard-of levels for public plant science research, tends to attract interest. In the future, plant genomics will evolve into a powerful but routine tool in the kit of any researcher, much as molecular genetics has become the universal currency of most life science research. For the present, genomics research will need to produce the tools that give individual researchers access to genes and genetic information with a minimum of repetitious technology. A few large research groups will generate and distribute these monotonous and massive data sets, but the real beneficiaries will be the individual plant biologists who can apply creativity to the use of these resources. As long as all plant researchers have complete access to the tools pertinent to their species, the recent bounteous crop of new plant genomics dollars will have been very well spent.
Acknowledgements
Special thanks to the many plant genomics researchers who provided comments on this manuscript and to S. Frank for producing the figure. The preparation of this manuscript was supported by USDA grant #97-35310-5136.
References
1 Meinke, D.W. et al. (1998) Arabidopsis thaliana: A model plant species for genome analysis. Science 282, 662–682
2 Hirochika, H. et al. (1996) Retrotransposons of rice involved in mutations induced by tissue culture. Proc. Natl. Acad. Sci. U. S. A. 93, 7783–7787
3 Schena, M. et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470
4 Yano, M. and Sasaki, T. (1997) Genetic and molecular dissection of quantitative traits in rice. Plant Mol. Biol. 35, 145–153
5 Moore, G. et al. (1995) Grasses, line up and form a circle. Curr. Biol. 5, 737–739
6 Arumuganathan, K. and Earle, E.D. (1991) Nuclear DNA content of some important plant species. Plant Mol. Biol. Rep. 9, 208–218
7 Bennetzen, J.L. and Freeling, M. (1997) The unified grass genome: synergy in synteny. Genome Res. 7, 301–306
8 Gale, M.D. and Devos, K.M. (1998) Plant comparative genetics after ten years. Science 282, 656–658
9 Phillips, R.L. and Freeling, M. (1998) Plant genomics and our food supply: an introduction. Proc. Natl. Acad. Sci. U. S. A. 95, 1969–1970
10 McConkey, E.H. and Goodman, M. (1998) A human genome evolution project is needed. Trends Genet. 13, 350–351
11 Bennetzen, J.L. et al. (1998) A plant genome initiative. Plant Cell 10, 488–493
12 Bennetzen, J.L. et al. (1998) Grass genomes. Proc. Natl. Acad. Sci. U. S. A. 95, 1975–1978
13 Marshall, A. and Hodgson, J. (1998) DNA chips: An array of possibilities. Nat. Biotechnol. 16, 27–31
14 Pennisi, E. (1998) Slow start for US rice genome project. Science 282, 65