Genome studies and molecular genetics
Part 1 Model legumes
Exploring the structure, function and evolution of legume genomes
Editorial overview
Nevin D Young and Randy C Shoemaker
Current Opinion in Plant Biology 2006, 9:95–98
Available online 10th February 2006
1369-5266/$ – see front matter © 2006 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.pbi.2006.01.016
Nevin D Young
Department of Plant Pathology and Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108, USA
e-mail: [email protected]
Nevin’s group focuses on legume genome organization and evolution. They are especially interested in plant disease resistance genes and in large gene families. Nevin is currently principal investigator on the NSF initiative to sequence the genome of the model legume, Medicago truncatula. Further details about Nevin’s research can be found at http://umn.edu/home/neviny.
Randy C Shoemaker
USDA–ARS, Corn Insect and Crop Genetics Research Unit, Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA
e-mail: [email protected]
Randy’s team works on genomics and gene discovery in soybean. They are studying the genome of the soybean to better understand processes related to genome duplication events and genome evolution. The team is working to position gene transcripts onto a detailed molecular genetic map of the soybean genome, to integrate the transcripts into a physical map and to conduct map-based cloning of genes involved in QTL. More information on Randy’s work is available at http://www.agron.iastate.edu/personnel/userspage.aspx?id=207.
Genetic research in legumes (Fabaceae) preceded that of any other plant, beginning with Mendel’s original experiments on peas. Today the plant research community is once again turning its attention to legumes, this time as a subject of genomic research: sequencing, comparative genomics, phylogenomics, transcriptomics, proteomics, metabolic profiling and, of course, translation from model to crop.
Legumes have a long, fascinating, and sometimes controversial history of systematics and phylogenetics. Now, phylogenomics is revealing the complex taxonomic relationships that exist among legumes (see review by Cronk et al., pp. 99–103). Soybean was one of the first species in which paleopolyploidy was described [1]. Today, genomics is providing detailed insights into this extraordinary process and its role in genome dynamics (see review by Shoemaker et al., pp. 104–109). In the field of plant–microbe symbiosis, legumes form the basis for much of what we know today. Extending the tools of genomics, especially the capacity to readily isolate genes that control critical phenotypes, is helping to build an integrated model of the relevant signal transduction pathways (see review by Stacey et al., pp. 110–121).
Central to this revolution in legume research has been a focus on two key models, Medicago truncatula and Lotus japonicus (see reviews by Town [pp. 122–127] and by Sato and Tabata [pp. 128–132]). Both of these model species were originally chosen because they provided convenient and powerful systems for studying plant–Rhizobium interactions. It was soon recognized, however, that they also possess relatively small and simple genomes that are suitable for sequencing. Starting in 2000, large-scale sequencing projects were begun, first for Lotus at the Kazusa center in Japan.
For Medicago, sequencing began with support from the Samuel Roberts Noble Foundation and has since expanded to an international consortium that includes groups throughout the US (University of Oklahoma, The Institute for Genomic Research, University of California-Davis and University of Minnesota) and Europe (Genoscope, INRA-Toulouse, Sanger Centre, John Innes Centre, Munich Information Center for Protein Sequences, University of Ghent and University of Wageningen). In the near future, essentially complete sequences for the gene-rich euchromatin of both Medicago and Lotus will be available. Already, the extensive body of sequence data is providing tools for efficient positional cloning of genes that are central to symbiosis [2,3] and for comparative genomics [4]. In the future, the Medicago and Lotus sequences will enable researchers to discover the genetic and genomic basis for the fascinating array of developmental phenotypes originally described in legumes (see review by Domoney et al., pp. 133–141), genes and gene products that are unique to legumes (see review by Silverstein et al., pp. 142–146), and economically valuable traits that are associated with secondary metabolism.
This section of Current Opinion in Plant Biology highlights many outstanding examples of progress in legume genomics. The reviews describe how genomics research is helping to explore the evolutionary relationships, developmental processes, and genomic features observed among the many thousands of legume species. Still, most economically important legumes are members of a monophyletic subfamily, the Papilionoideae, which includes notable taxa such as Lupinus, Arachis, Glycine, Phaseolus, Lotus, Medicago, and Pisum, so most genomics research today is focused on this relatively small group. This leaves the remainder of the legume family, including tropical woody species, virtually untouched.
The biology of the seed has played a major role in the radiation of legumes. Similarly, flower morphology shows widely divergent developmental processes. Several mutants are available for the study of either organ [5]. Domoney et al. synthesize results of numerous studies, many originating in the EU Grain Legume Integrated Project. These authors point out that no apparent non-legume homolog is known for many of the genes that affect seed morphology or seed composition in legumes. By contrast, many of the genes that are related to legume flower morphology have homologs in Arabidopsis. Interestingly, differences in the regulation of those genes seem to permit a developmental balance between leaf and inflorescence production in some legumes that is not seen in Arabidopsis. This makes legumes ideal models to study unique components of both seed and flower development. A growing repertoire of expression-profiling tools and other ‘-omic’ technologies is now available for model and crop legumes, and the advent of powerful bioinformatic tools facilitates the use of these technologies.
Genetic maps and other high-throughput genomic tools permit the comparative analysis of quantitative trait loci (QTLs) that affect seed composition and seed development in multiple legumes. As in other models, a widely accepted gene and trait ontology plus a common set of genetic markers are needed to make the most of comparative analyses. And of course, interoperability of legume bioinformatic resources is a necessity.
The review by Shoemaker et al. addresses the complexity of genome structure in legumes, especially soybean. Almost without exception, paleopolyploidy contributes to this complexity. The availability of large numbers of ESTs from the Fabaceae has permitted the analysis of multiple paralogous gene pairs. Working independently, two laboratories identified tell-tale signatures of large-scale genome duplication events among legumes: distinctive groups of genes with similar levels of divergence. These studies suggested that soybean underwent a large-scale genome duplication event approximately 14 million years ago (MYA) and another approximately 44 MYA. A similar duplication event in Medicago is presumed to have occurred approximately 58 MYA. When paralogous sets of genes for which three or four copies could be identified were evaluated, tree topologies pointed to both Glycine and Medicago sharing the older duplication event [6]. The timing and extent of the duplication event shared by Medicago and Glycine will have important implications in answering questions about the extent to which polyploidy can account for major adaptive innovations that led to significant evolutionary radiations. Soon we may see genomic resources in other legume clades (e.g. Arachis [groundnut]) that can illuminate evolutionary relationships further.
The review by Cronk et al. discusses the relationships of various legume clades and provides evidence to support an ‘early explosion hypothesis’, whereby much of the morphological diversification of the family occurred during a brief period 50–60 MYA. Given that structural similarity between genomes is greater among closely related genera than among more distant relatives [7], taxonomic relationships among legumes become very important. For example, the probable paleoduplication common to Glycine and Medicago suggests that the same event has affected all 7000+ species within the millettioid and Hologalegina clades. As noted by Cronk et al., the almost simultaneous emergence of all legume clades (‘the explosion’) raises the question of whether that genome duplication is common to all legumes. Again, the growing amount of genomic data within the dalbergioid clade (Arachis), within the phaseoloids (soybean and common bean), and within the galegoid clade (Lotus and Medicago) might help to resolve this question.
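The duplication signature described above, groups of paralogous gene pairs with similar levels of divergence, can be illustrated with a small sketch. This is not the method used by the laboratories cited; it simply bins pairwise divergence values and reports bins that stand out as local maxima. The divergence values, the bin width, and the function names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def divergence_histogram(values, bin_width=0.1):
    """Count paralog pairs per divergence bin; keys are integer bin indices."""
    # The small epsilon guards against floating-point error at bin edges.
    return Counter(math.floor(v / bin_width + 1e-9) for v in values)

def find_peaks(counts, min_count=3):
    """Return bin indices that are strict local maxima with >= min_count pairs."""
    peaks = []
    for b, c in sorted(counts.items()):
        if c >= min_count and c > counts.get(b - 1, 0) and c > counts.get(b + 1, 0):
            peaks.append(b)
    return peaks

# Hypothetical divergence estimates for paralogous gene pairs (not real data).
example_values = [0.12, 0.14, 0.15, 0.16, 0.18,
                  0.25, 0.33,
                  0.41, 0.43, 0.45, 0.46, 0.47, 0.48,
                  0.55, 0.62]

peaks = find_peaks(divergence_histogram(example_values))
print(peaks)  # [1, 4]: clusters near 0.1-0.2 and 0.4-0.5
```

Two separate clusters in such a histogram would be consistent with two rounds of duplication, although converting divergence into dates requires a rate calibration that this sketch does not attempt.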
Sato and Tabata describe the ongoing Lotus genome sequencing project and the clever strategy they used to cover the genome. Combining clone-by-clone sequencing of large insert clones from gene-rich regions with shotgun sequencing, they obtained nearly 90% coverage of the Lotus genome (based on the coverage of expressed sequence tags [ESTs]). Notably, one of the shotgun libraries came from a pool of thousands of large insert clones whose ends did not match anywhere in the anchored sequence, thereby efficiently recovering genome regions that were missed in the clone-by-clone component. Sato and Tabata find that the broad genome features of Lotus resemble those of Arabidopsis. They also find extensive synteny between Lotus, Medicago, soybean and even Arabidopsis. Their review highlights examples in which the Lotus sequence has already enabled the map-based cloning of important symbiosis genes [3]. Coupled with other substantial genomic resources, such as ESTs and TILLING (targeting-induced local lesions in genomes), Lotus should provide a powerful platform for legume research for years to come.
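An EST-based coverage figure of the kind quoted above can be understood as the fraction of known ESTs that find a match in the assembled sequence. The sketch below assumes that the matching step (for example, a sequence alignment) has already been done elsewhere and only shows the final arithmetic. The identifiers and the function name are hypothetical, and the details of how Sato and Tabata computed their estimate are not given here.

```python
def est_coverage(all_ests, ests_with_hit):
    """Fraction of distinct ESTs that have a match in the assembled sequence."""
    ests = set(all_ests)
    if not ests:
        raise ValueError("need at least one EST")
    # Only count hits that belong to the reference EST set.
    return len(ests & set(ests_with_hit)) / len(ests)

# Hypothetical identifiers, for illustration only.
all_ests = [f"EST{i:03d}" for i in range(1, 11)]   # 10 ESTs in total
ests_with_hit = all_ests[:9]                        # 9 of them match the sequence
coverage = est_coverage(all_ests, ests_with_hit)
print(f"{coverage:.0%}")  # 90%
```

Because ESTs sample expressed genes, a figure like this estimates coverage of gene space, not of the whole genome including repeat-rich regions.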
In Medicago, sequencing is at nearly the same stage as that in Lotus, though with greater emphasis on clone-by-clone selection and a larger proportion of clones finished to completion. To make this resource convenient and useful to researchers, the Medicago community established a consortium known as the International Medicago Genome Annotation Group (IMGAG), with the goal of providing logical and consistent gene annotation throughout the project. Town provides an overview of the IMGAG initiative and the process that was used to develop, test and implement the IMGAG annotation pipeline. Lessons learned from initial gene annotation in Arabidopsis and rice were the impetus for the coordinated IMGAG effort. Moreover, funding for comprehensive manual annotation in Medicago (and Lotus) seems unlikely to be repeated because of the high cost. Therefore, the best and most consistent automated annotation is essential. Town describes how the IMGAG annotation pipeline was extensively tested to obtain an estimated accuracy of nearly 80% at the gene level and 95% for exons.
The fixation of atmospheric nitrogen into a form of nitrogen that is useable by plants has major agricultural and environmental impacts. This natural capacity occurs virtually nowhere else in the plant kingdom other than in legumes. The genes that are required to establish symbiosis between plant and soil bacteria and to create the structural environment for nitrogen fixation to occur have been studied for many years [8]. Now, genomics is enhancing our understanding of these processes and of the genes that control them. Stacey et al. describe the recent cloning of several key players in symbiosis that are involved in interactions with both rhizobia and mycorrhizal fungi, which share many overlapping signaling steps. Crucial to these discoveries has been a large set of mutants, frequently in Lotus and Medicago, that enabled the dissection of the developmental pathway. It is now clear, for example, that one or a pair of LysM-type serine/threonine receptor kinases plays an initial role in sensing Nod factors from Rhizobium. Downstream, a leucine-rich repeat receptor kinase, a cation channel, a calmodulin-dependent kinase, and GRAS-type transcription factors are essential for successful nodulation. Stacey et al. integrate the many different components in this pathway into a unified model that should help to inform future research. In the area of transcriptome, proteome, and metabolome research on nodulation, more questions than answers remain. Clearly, a combination of these technologies will eventually help to reveal the global molecular changes that underlie nodule biology. Finally, Stacey et al. speculate on one area that has always fascinated students of nodulation: the ontogeny of nodule development. Is it possible, they ask, that the nodulation pathways of legumes could have evolved from more ancient chitin recognition pathways?
Some of the genes that are involved in the signaling between plant and symbiont appear to be legume-specific. It is thought that many of these genes were either recruited from other metabolic pathways or created de novo. Silverstein et al. review numerous methods for detecting legume-specific gene families. The more than 1 000 000 sequences available from the Fabaceae provide a rich source from which to identify these novel genes. Silverstein et al. point out that the plant–symbiont interaction requires the invasion of the plant tissue by a bacterium, a process strikingly similar to host–pathogen interactions. In order for the process to occur, the legume must shut down prevailing defense mechanisms. Not surprisingly, the plant would benefit from the presence of a backup defense. Emerging evidence suggests that some nodule-specific peptides (so-called cysteine cluster proteins or ‘CCPs’) provide antimicrobial defense to the plant, potentially protecting the nutritionally rich nodule [9,10]. The presence of this antibacterial and antifungal mechanism suggests a novel means of enhancing pathogen resistance or tolerance in crop plants.
Among the reviews in this issue, we see examples of the recent blossoming of legume genomics. Legumes, with their 16 000 species, possess an impressive breadth of phenotypic diversity. The family contains members that are important in food, feed and fiber, that benefit human health and nutrition, and that are promising supplements for the world’s energy needs. To date, genomic initiatives on just a small number of species provide most of the legume sequence information. From these data, it is abundantly clear that genomic diversity among legumes is also fascinatingly broad. Although most legume initiatives are still in their infancy, genomic models such as Lotus and Medicago are beginning to emerge. Rapid developments in technology will soon make whole-genome sequencing in more legume species feasible. The information we gain from these initiatives will open the floodgates to evolutionary studies of legume genomes and permit the phylogenetic resolution of this complex family. Genomic information from key evolutionary nodes will also be crucial in ensuring the quality and quantity of food and feed for a growing world population. The reviews presented in this issue of Current Opinion in Plant Biology represent excellent examples of progress, but only a glimmer of what we can expect to see in legume research in coming years.
References
1. Shoemaker RC, Polzin K, Labate J, Specht J, Brummer EC, Olson T, Young N, Concibido V, Wilcox J, Tamulonis JP, Kochert G, Boerma HR: Genome duplication in soybean (Glycine subgenus soja). Genetics 1996, 144:329-338.
2. Ane JM, Kiss GB, Riely BK, Penmetsa RV, Oldroyd GE, Ayax C, Levy J, Debelle F, Baek JM, Kalo P, Rosenberg C, Roe BA, Long SR, Denarie J, Cook DR: Medicago truncatula DMI1 required for bacterial and fungal symbioses in legumes. Science 2004, 303:1361-1364.
3. Madsen EB, Madsen LH, Radutoiu S, Olbryt M, Rakwalska M, Szczyglowski K, Sato S, Kaneko T, Tabata S, Sandal N, Stougaard J: A receptor kinase gene of the LysM type is involved in legume perception of rhizobial signals. Nature 2003, 425:637-640.
4. Mudge J, Cannon SB, Kalo P, Oldroyd GED, Roe BA, Town CD, Young ND: Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biology 2005, 5:15.
5. Lewis G, Schrire B, Mackinder B, Lock M (Eds): Legumes of the World. Richmond, UK: Royal Botanic Gardens, Kew; 2005.
6. Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ: Placing paleopolyploidy in relation to taxon divergence: A phylogenetic analysis in legumes using 39 gene families. Syst Biol 2005, 54:441-454.
7. Choi HK, Mun JH, Kim DJ, Zhu HY, Baek JM, Mudge J, Roe B, Ellis N, Doyle J, Kiss GB, Young ND, Cook DR: Estimating genome conservation between crop and model legume species. Proc Natl Acad Sci USA 2004, 101:15289-15294.
8. Limpens E, Bisseling T: Signaling in symbiosis. Curr Opin Plant Biol 2003, 6:343-350.
9. Mergaert P, Nikovics K, Kelemen Z, Maunoury N, Vaubert D, Kondorosi A, Kondorosi E: A novel family in Medicago truncatula consisting of more than 300 nodule-specific genes coding for small, secreted polypeptides with conserved cysteine motifs. Plant Physiol 2003, 132:161-173.
10. Graham MA, Silverstein KA, Cannon SB, VandenBosch KA: Computational identification and characterization of novel genes from legumes. Plant Physiol 2004, 135:1179-1197.