Genome studies and molecular genetics
Unwrapping new layers of complexity in plant genomes
Editorial overview
Joseph R Ecker and Doug Cook
Current Opinion in Plant Biology 2004, 7:99–101
1369-5266/$ – see front matter © 2004 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.pbi.2004.01.017
Joseph R Ecker
Plant Biology Laboratory, Salk Institute Genomic Analysis Laboratory (SIGnAL), The Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA
e-mail: [email protected]
Joseph R. Ecker was a principal investigator in the multinational Arabidopsis genome project. He also works on the gaseous plant hormone ethylene, which regulates a variety of basic plant processes including fruit ripening and responses to pathogenic organisms.
Doug Cook
Department of Plant Pathology, University of California, One Shields Avenue, Davis, California 95616-8680, USA
e-mail: [email protected]
Doug is a professor of plant pathology at the University of California – Davis. His genomics research is motivated by an interest in plant–microbe interactions and a desire to apply the fruits of genome analysis to crop improvement strategies. The principal focuses of his laboratory include symbiotic nitrogen fixation, comparative legume genomics, and transcriptional profiling of grapes during pathogen infection and berry development.
In the ‘early’ days of plant genomics, just a scant six years ago, the community of plant scientists was anticipating the arrival of the Arabidopsis whole-genome sequence. Many of us labored under the impression that the conservation of gene content and organization between plant genomes, despite differences in DNA content and chromosome number, would be sufficiently high that the Arabidopsis genome would suffice as a generic road map of plant genome structure. This view provided some solace as inadequate resources and cumbersome technology made tackling the whole genomes of crop plants seem only a distant possibility. As a counterpart to the Arabidopsis effort, numerous projects were initiated on crop and non-Arabidopsis model species, with the recurrent themes of gene discovery through the sequencing of expressed sequence tags (ESTs) and describing the syntenic network of genes between plant families.
In hindsight, genome structure is both more complex and more fluid than many of us had anticipated. Thus, the Arabidopsis paradigm serves plant biology well as a generic model for plant gene function but less well as a specific reference for gene content and conserved gene order. Fortunately the equation is balanced by a massive reduction in sequencing costs and an increased capacity for the generation of higher quality data. As a consequence, the sequencing of several whole plant genomes (including even relatively large and highly complex genomes) is underway, and draft sequencing of multiple genotypes within a species or of closely related species is on the horizon. Thus, the overriding theme in today’s climate is whole-genome analysis. Not surprisingly, in an environment of limiting resources, the choice of which genome to sequence is a matter of some debate: basic biologists argue for selections that are informed by phylogenetic diversity and experimental tractability, whereas crop biotechnologists tend to favor species of agronomic importance.
The microbial associates of plants are being sequenced at an even more rapid pace, with bacterial genome projects becoming the sequencing equivalent of popcorn for sequencing centers, and fungal genomes only an afternoon snack! Within the resulting sea of data, phylogenetics and evolutionary theory are becoming important currencies with which to understand the relationships between genes and genomes. Just as the capacity for DNA sequencing has expanded the number of taxa being analyzed, similar shifts are occurring in functional genomics. Reverse genetics is being used to investigate the function of candidate genes through approaches such as RNA interference (RNAi), TILLING (for targeting induced local lesions in genomes) and the use of systematically generated T-DNA populations. Analysis of natural genetic variation, and the accompanying interest in distinguishing adaptive variation from genetic drift, is
taking center stage. Projects are underway to improve our understanding of complex traits in previously intractable species, such as Douglas fir, using the tools of association genetics to conduct ‘population genomics’. The ability to sample variation and to identify common alleles at the species level will also impact the applied fields of germplasm curation and crop breeding, in which the methodical description of genetic diversity will speed the refinement of core germplasm collections. In turn, the resulting data will enable allele prospecting in land races and wild relatives of crop plants, spanning the breadth from global commodities to species of regional importance. Moreover, in species with suitable genetic attributes, technologies for high-throughput genotyping will revolutionize our ability to map conserved elements between genomes and to identify informative genetic intervals, with corresponding impacts on areas such as the comparative analysis of genetic maps and the analysis of quantitative trait loci. Thus, it seems likely that genotyping and phenotype assessment will achieve ‘omic’ status for basic and applied plant biologists.
Many, but not all, of these topics are covered in the reviews in this section. The most prevalent theme is one of genome complexity, ranging from strategies for the analysis of large and complex genomes to those that improve our understanding of the functional and structural complexity of even the ‘simple’ genomes of Arabidopsis and rice. The revolution in the capacity for genome-scale analysis makes it possible to consider the large and complex genomes of species such as maize as ‘model systems’ in their own right, in which the structural complexity of the genome becomes a target for investigation rather than an impediment to its study. Martienssen et al. (pp. 102–107) describe the relative merits of several approaches to sequencing the highly fluid genome of maize.
Gene-enrichment strategies, such as methyl filtration and high-C0t sequencing, offer the opportunity to capture the majority of maize genic DNA, providing a cost-effective alternative to more conventional BAC-by-BAC whole-genome sequencing. At the same time, there is an increasing capacity to investigate the complex features of even simple genomes, such as the centromeres and telomeres of Arabidopsis, and studies are underway to elucidate the structure, function and evolution of these gene-poor regions. Hall et al. (pp. 108–114) provide a cogent assessment of the state of centromere biology. Comparative genomics approaches, as well as functional assays using mini-chromosomes, hold great promise for understanding both the cis DNA elements and the trans protein components that are necessary for centromere activity. The availability of large amounts of sequence information has also had a profound effect on our understanding of the origin and evolution of repetitive features within plant
genomes. Among such repetitive elements are miniature inverted-repeat transposable elements (MITEs), short repetitive elements that were discovered more than a decade ago to be associated with many genes in grass species. The review by Jiang et al. (pp. 115–119) describes current progress in this exciting area of genome research, including the recent finding that these nonautonomous DNA elements are capable of genomic insertion and excision, and explains how MITEs have played a major role in shaping the genomes of grass species such as rice.
Complexity is also increasingly recognized in the functional attributes of plant genomes, with the simple one gene–one protein hypothesis having been modified significantly to include the role of non-coding small RNAs, the dynamic nature and regulatory role of chromatin, the alternative splicing of transcripts and other levels of regulation. The review by Mallory and Vaucheret (pp. 120–125) provides an overview of recent developments in the area of small non-coding RNAs. These small RNAs appear to operate by distinct mechanisms and towards various ends, with biological consequences ranging from virus resistance and the regulation of transposon activity, to the regulation of plant development and the sequence-specific degradation of transcripts. In no small way, our increasing awareness of this non-coding RNA realm represents a paradigm shift in the biological sciences.
Delseny (pp. 126–131) revisits the issue of conserved gene order among the angiosperms. Although it was once widely anticipated that conserved gene order would be the rule among the angiosperms, it has turned out to be the exception. Gene co-linearity has been disrupted by extensive genome duplication (which has been best documented for Arabidopsis but is likely to be equally prevalent in other plant genomes) accompanied by segmental rearrangements.
As phylogenetic distance increases, evidence of synteny is confounded by a high frequency of gene loss in duplicated regions. Conversely, in cases in which paralogous genes are retained, the functional attributes of genomes diverge, either by the retention of a redundant function in a given lineage or through the divergence and specialization of the respective paralogs. As Delseny points out, the plasticity of plant genome structure confounds comparisons across even moderate phylogenetic distances; this situation increases the impetus to select phylogenetically well-positioned species for detailed characterization, with the expectation that they will serve as references for closely related plant genomes.
The flood of sequence information on many species is providing extensive catalogs of genetic diversity, often in the form of single nucleotide polymorphisms. These catalogs, in combination with technologies for high-throughput genotyping, mean that the rate-limiting step
in the genetic mapping of traits is shifting from genotype analysis to phenotype analysis. Borevitz and Chory (pp. 132–136) provide an overview of recent advances in the analysis of quantitative traits. They discuss the implications for studies of basic plant biology, in which researchers aim to understand the molecular basis of complex traits, and for the more applied area of plant breeding, which focuses on combining valuable alleles into elite lines. Common to both basic and applied interests is the need to understand how environment affects the expression of quantitative traits; unraveling this mystery should enhance both the characterization and the application of natural accessions for crop improvement.
In contrast to the analysis of plant genomes, in which complete sequence characterization remains a formidable task, the genomes of bacteria yield easily to whole-genome sequencing. Thus, the data set for entire genomes of plant-associated bacteria is increasing rapidly, providing opportunities in the area of comparative genomics and for large-scale post-genomic studies of pathogenic mechanisms. The review by Pühler and colleagues (pp. 137–147) discusses the impact of whole-genome sequencing on our understanding of the interactions between plants and pathogenic, symbiotic and associative bacteria. Among the initial findings is a greatly expanded list of genes that are predicted to have a role in host interactions. In turn, these ‘candidate genes’ have become desirable targets for post-genomic investigations of gene function using high-throughput strategies for genome mutagenesis. One particularly interesting finding from the whole-genome data is the fact that type-III secretion systems, once thought to be the sole domain of the pathogens of both plants and animals, are widely distributed in the genomes of symbiotic and non-pathogenic associative bacteria. It seems likely, therefore, that type-III secretion is a general adaptation for prokaryote–eukaryote interactions, which is refined for either pathogenic or non-pathogenic lifestyles. Finally, although describing the gene content of plant-associated bacteria is rewarding in its own right, the real opportunities lie in post-genomics investigations that will reveal transcriptional networks and their products and functions. Thus, the tools of transcriptional profiling, proteomics and high-throughput reverse genetics will become increasingly important for studying plant–bacteria interactions.
The past several years have brought new initiatives to sequence plant and microbial genomes, several of which are described in this section on genome studies and molecular genetics. The resulting information is providing a wealth of new opportunities in plant biology, ranging from basic studies of evolution and molecular-genetic mechanisms to applied aspects of plant biology that focus on crop improvement. However, despite the massive impact that genomics has had on plant sciences, the challenges that lie ahead are undoubtedly more numerous than those we have faced to date. Thus, the tools of bioinformatics and computational genomics are now essential in most plant biology laboratories, yet these disciplines are still in their infancy. The era of research that focuses on one gene, one protein, or one metabolite is fading fast. In its place we find post-genomic research with the goal of determining how genomes function to provide networks of interacting molecules and regulatory mechanisms.