CHAPTER 5
Informatic tools and platforms for enhancing plant R-gene discovery process
Maria Raffaella Ercolano, Giuseppe Andolfo, Luigi Frusciante
Department of Agricultural Sciences, University of Naples ‘Federico II’, Portici, Italy
5.1 Introduction
The vibrant genomics era in which the scientific community is living today, marked by extensive sequencing data and supported by newly developed computational technologies, faces an ultimate challenge: the conspicuous technological advances should be channeled into applications useful for developing sustainable agriculture. The most important task in this respect is the acquisition and collection of basic knowledge on traits of interest, genome organization and gene function. The value of the data generated largely depends on how those data are stored, managed, analyzed, and made accessible to the scientific community as well as to breeding companies. How these data can be analyzed depends, in turn, on how analytical tools and other information sources are made available. Bioinformatic, biometric and advanced data management systems should be designed to support the storage of genomic and genetic resources, and to develop an integrated crop improvement information network. As global agricultural needs increase, so do the requirements for crops that can be efficiently and safely produced. Traditional plant breeding methods have served well in the past, and breakthrough technologies are now available to aid this process. These breakthroughs include the availability of crop genome sequence information and gene engineering techniques (Andolfo et al., 2016). In the past decade, research has yielded extensive databases of entire plant genomes. Powerful bioinformatics tools have been developed for genomic research and are used for a variety of purposes in plant genetics. In recent years, these tools have been applied to the analysis of whole genomes, transcriptomes and epigenomes, helping to acquire an enormous amount of information on the structure of genomes and the nature of genes and gene products (Bevan and Uauy, 2013; Poland and Rutkoski, 2016). These tools have
Applied Plant Biotechnology for Improving Resistance to Biotic Stress https://doi.org/10.1016/B978-0-12-816030-5.00005-7
Copyright © 2020 Palmiro Poltronieri and Yiguo Hong. Published by Elsevier Inc. All rights reserved.
also made it possible to analyze the organization of individual genomes, to assess the phylogenetic relationships among different species, to localize individual genes and to detect genetic diversity among different individuals (Beddows et al., 2017). As sequencing technology improves, the information available to aid crop improvement is expanding rapidly. Basic research information is used to link important agricultural traits to genetic sequence variations and to incorporate this knowledge into crop improvement strategies (Parry and Hawkesford, 2012; Bohra et al., 2014). Plant pathogen resistance is one of the most desired crop traits for improving plant yield and agricultural sustainability. High levels of resistance against specific races of a pathogen are often controlled by single dominant genes. This type of resistance, referred to as monogenic or major gene (R-gene) resistance, has been widely studied and employed by breeders (Michelmore et al., 2013). R-genes acting against a broad range of pathogens and pests, including bacteria, viruses, nematodes, fungi and aphids, have been identified in different plant species. Most of them encode proteins containing a nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains (Andolfo et al., 2014), together with a Toll/Interleukin-1 receptor (TIR) or a coiled-coil (CC) domain. A number of monogenic resistance genes encode transmembrane receptor proteins such as receptor-like proteins (RLPs) and receptor-like kinases (RLKs), or belong to other classes such as mlo genes (Sanseverino et al., 2010). Resistance mediated by multiple genes or quantitative trait loci (QTLs), generally characterized as pathogen species-nonspecific or race-nonspecific, has also been reported (Parlevliet, 2002). Quantitative resistance is more difficult for pathogens to overcome, but it is also more difficult to transfer (Brown, 2015).
Knowledge of the molecular bases of specificity, together with technological advances, is providing new opportunities for pathogen-informed strategies for gene deployment (Brown, 2015). To counteract pathogen attacks, plants have evolved strategies that comprise pathogen perception mediated by hormone sensing and specific receptors (NBS, RLP and RLK as well as other genes), signal transduction and induction of appropriate defense responses (Andolfo and Ercolano, 2015). Regulation of these responses is mediated by a network of signal transduction pathways in which classical signal transmitters such as receptors and MAP (mitogen-activated protein) kinases are triggered by signals from elicitors and signal molecules such as ethylene, salicylic acid (SA) and jasmonic acid (JA) to activate the expression of defense-related genes and proteins. Infection of plants with distinct pathogens results in specific accumulation rates of ethylene, SA, JA and other hormones, and in distinct sets of activated genes, which represent individual signal molecule signatures and gene expression profiles for different pathogens (Birkenbihl et al., 2017). The study of the natural resistance that plants have acquired against their diseases over millennia of evolution, the clarification of the mechanisms of action and the identification of the genes or gene families involved in these processes are the starting points for less invasive crop protection strategies. Plant genome sequences (www.phytozome.it) are already available for important crops such as Vitis, tomato, potato, rice, poplar, soybean, maize, cotton, cucumber, Medicago, apple, etc. Outreach material should help the breeding community to make better use of genetic information and to increase the likelihood that plant breeding will benefit from resistant genotype-based selection processes.
The accessibility of genetic and genomic resources could be improved by developing platforms that link and integrate databases and act as hub portals to the vast number of public crop databases and tools. Numerous systems are already available, and they can be implemented for specific needs.
5.2 Digital tools for promoting R-genes research activities
5.2.1 Literature and germplasm repositories
Basic informatic systems can provide information that facilitates many aspects of the development of resistant varieties. Searches of the scientific literature can help to establish what research, if any, has been conducted on the trait of interest. PubMed is freely available to search; it contains citations, abstracts, and links to full-text articles, as well as helpful tutorials and guides. PubMed is a good place to begin literature research, but it does not contain all agriculture-related journals, and further searches of other databases may be necessary, such as Scopus (www.scopus.com/sources), ISI Web of Science (www.webofknowledge.com), Agricola (www.nal.usda.gov) and the Plant Genetics and Breeding Database (www.cabi.org/plant-genetics-and-breeding-database). The basic requirement for identifying new resistance genes is access to suitable genetic material. Several plant genetic resource centres from governmental and non-governmental organizations and commercial plant breeding companies support informatic resources for preserving, documenting, evaluating and distributing crop germplasm. Germplasm web repositories include the Germplasm Resources Information Network (GRIN) of the National Plant Germplasm System (NPGS), a comprehensive database containing information on plants, animals, microbes, and invertebrates. Scientists using this tool can access genetic diversity to help bring forth new varieties that can resist pests and diseases. A European GenBank Integrated System, maintained in accordance with agreed quality standards, is also freely available (http://eurisco.ecpgr.org). The National Genetic Resources Center (NGRC) in Japan implemented the NGRC database (www.gene.affrc.go.jp/databases_en.php) for the conservation and promotion of agrobiological genetic resources.
Several other database systems have been developed for individual crops or plant families to facilitate searches of genetic resources. A list of the main plant genetic repositories is provided in Table 5.1.
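The literature searches described at the start of this section can also be scripted. The following is a minimal sketch, assuming the public NCBI E-utilities ESearch endpoint is used to query PubMed; the function name `build_pubmed_query` and the choice of searching both terms in the title/abstract field are illustrative assumptions, not a procedure described in this chapter. The sketch only builds the request URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint; this sketch only constructs a URL.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(trait, crop, max_records=20):
    """Return an ESearch URL for PubMed records mentioning a trait and a crop.

    Hypothetical helper: the query layout (both terms restricted to
    title/abstract) is an assumption chosen for illustration.
    """
    term = f'"{trait}"[Title/Abstract] AND "{crop}"[Title/Abstract]'
    params = {
        "db": "pubmed",        # database to search
        "term": term,          # search expression
        "retmax": max_records, # maximum number of record IDs to return
        "retmode": "json",     # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pubmed_query("late blight resistance", "potato"))
```

The resulting URL can be opened in a browser or passed to any HTTP client; the same pattern does not apply to the other literature databases listed above, which have their own interfaces.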
5.2.2 Genome-based prediction strategies
Several genomic methods are employed to facilitate the study and transfer of resistance genes (Ercolano et al., 2012). A promising way of capitalizing on knowledge of R protein sequences is based on prediction approaches. The genome-wide identification of R-gene candidates is facilitated by their distinctive structural features. Classification of proteins and extraction of motifs have become an active bioinformatic research area in recent years. A variety of resources such as Pfam, PROSITE, MEME, InterProScan and SAM (Finn et al., 2014; Sigrist et al., 2010; Bailey et al., 2009; Jones et al., 2014; Holliday et al., 2018) have been developed. These resources are very useful in the analysis of newly discovered protein sequences. By classifying a protein into a family, we can infer its functions from what is known about the family. The approach is based on the assumption that genes with a proven or predicted function in one species (functional candidate genes) could control a similar function in other species. The prerequisite for a prediction approach is a repertoire of well-characterized gene families, so the first step is to select functionally characterized genes. A prediction tool named DRAGO (“Disease Resistance Analysis and Gene Orthology”) was built to computationally predict “putative” R-genes (Andolfo et al., 2014). Detailed analyses
TABLE 5.1 List of main plant genetic repositories.

| Repository | Website | Holder | Aims |
|---|---|---|---|
| National Plant Germplasm (GRIN) | www.ars-grin.gov/ | United States | Platform for Plant Germplasm System (NPGS) USDA |
| NIASGBdb | www.gene.affrc.go.jp/databases_en.php | Japan | Conservation and promotion of agrobiological genetic resources |
| EURISCO ECPGR | http://www.ecpgr.cgiar.org/ | International | Collect data from inventories of European Nations |
| CGIAR | http://www.cgiar.org/ | International | Consultative Group on International Agricultural Research |
| AVRDC – The World Vegetable Center | http://avrdc.org/ | Taiwan | Collect and preserve germplasm |
| N.I.Vavilov Research Institute of Plant Industry | www.vir.nw.ru | Russia | VIR Scientists Preserve PGR Collections |
| Plant Genome Resources Center, IPK-Gatersleben (PGRC) | pgrc.ipk-gatersleben.de | Germany | Platform for plant genome analysis |
| The Centre for Genetic Resources, the Netherlands (CGN) | https://www.wageningenur.nl/en/show/CGN-Centre-for-Genetic-Resources-the-Netherlands.htm | The Netherlands | Conservation and use of vegetable crops |
| UK Plant Genetic Resource | http://ukpgrg.org/ | United Kingdom | Ex-situ plant genetic resources in the United Kingdom |
| WIEWS (PGRFA) | apps3.fao.org/wiews/wiews.jsp | International | World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture |
| Seednet | https://wwwseednet.cbm.slu.se/index.hm | South East European Countries | South East European Network on Plant Genetic Resources |
| NGB The Nordic Gene Bank | http://www.nordgen.org/index.php/skand/content/view/full/62// | Sweden | Plant Genetic Resources Center |
| Global biodiversity information facility | http://www.gbif.at/ | Denmark | Global biodiversity information sharing |
For each website, the address, the holder and a description of the main aims are reported.
performed on conserved profiles of those strong putative R proteins revealed interesting domain features (Sanseverino and Ercolano, 2012). More than 170,000 proteins annotated using this prediction tool have been collected in a dedicated online resource, the pathogen receptor gene (PRG) database (http://PRGdb.org), for molecular and in silico studies on plant R-genes. PRGdb includes well-characterized and candidate plant disease resistance genes belonging to nearly 268 plant species, of which 35 contain cloned reference PRG genes. The latest version of the PRGdb platform organizes the information related to disease resistance genes into four sections: plants, genes, pathogens and diseases. Moreover, it provides a BLAST search tool and version 2.0 of the DRAGO pipeline to annotate resistance genes (Osuna-Cruz et al., 2018). Researchers can download reference genes of interest to design primers for amplifying homologous genes in their favorite species, predict and classify candidates from high-throughput sequences or single sequences, or use various queries to obtain further information linked to genes. So far, several informatic R-gene prediction approaches have been employed in individual studies (Sekhwal et al., 2015). A comprehensive effort to integrate individual tools and datasets is expected to facilitate the processing and standardization of R-gene data organization (Osuna-Cruz et al., 2018). Detection of new candidate R-genes can be made easier by machine learning methods that extract and classify the features of available genomic data (Pal et al., 2016). The resistance gene enrichment and sequencing (RenSeq) method can also improve the annotation and discovery of pathogen resistance genes in plant genome sequences. Indeed, RenSeq technology was successfully applied in the analysis of genome resistance gene complements (Michelmore et al., 2013) and in accelerating the cloning of a potato late blight resistance gene (Jupe et al., 2013; Witek et al., 2016).
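The idea of assigning candidates to R-protein classes from their distinctive domains can be illustrated with a small sketch. It is not the DRAGO pipeline: the function `classify_r_protein`, the domain labels ("TIR", "CC", "NBS", "LRR", "TM", "Kinase") and the class abbreviations (TNL, CNL, NL, RLK, RLP) are assumptions chosen for illustration, and the domain labels are presumed to come from an upstream domain search.

```python
def classify_r_protein(domains):
    """Assign a coarse R-protein class from a collection of domain labels.

    Hypothetical rules: labels and class names are illustrative assumptions,
    not the categories or logic used by any specific published tool.
    """
    found = set(domains)

    # Cytoplasmic receptors: NBS plus LRR, split by the N-terminal domain.
    if "NBS" in found and "LRR" in found:
        if "TIR" in found:
            return "TNL"  # TIR-NBS-LRR
        if "CC" in found:
            return "CNL"  # CC-NBS-LRR
        return "NL"       # NBS-LRR with no recognised N-terminal domain

    # Transmembrane receptors: a kinase domain separates RLK from RLP.
    if "LRR" in found and "TM" in found:
        return "RLK" if "Kinase" in found else "RLP"

    # Partial architectures are kept apart for manual inspection.
    if "NBS" in found:
        return "N"
    return "unclassified"


if __name__ == "__main__":
    # Invented example annotations for three hypothetical proteins.
    examples = {
        "protein_1": ["TIR", "NBS", "LRR"],
        "protein_2": ["LRR", "TM", "Kinase"],
        "protein_3": ["LRR", "TM"],
    }
    for name, doms in examples.items():
        print(name, classify_r_protein(doms))
```

A rule table of this kind is easy to audit, which is useful when predicted candidates are later compared with curated reference genes.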
5.2.3 Integrated genomics-based mapping
The huge amount of data generated by sequencing initiatives allows the “in silico” identification of genetic variation. This genetic variation can be linked to a resistance trait of interest by genetic mapping or linkage disequilibrium (LD) mapping. With the growth of genomic data, such an approach is becoming a powerful tool for more accurate identification of the genes involved in the disease resistance process (Costet et al., 2012). High-density SNP maps increase the potential of linkage disequilibrium mapping and association studies (Cabrera-Bosquet et al., 2012). Studying the sequence variation among alleles of target genes may reveal conserved sequence motifs or conserved SNPs associated with a trait (Caicedo and Purugganan, 2005). Genomic breeding strategies would achieve the greatest gains using whole-genome selection. Establishing methods for the efficient transfer of validated genome signatures into breeding selection procedures is essential for the uptake of marker technology and is a focus of ongoing research (Poland and Rutkoski, 2016). Successful genomic selection requires considerable infrastructure, especially for large-scale genome-wide SNP discovery, because it requires access to high-throughput sequencing and genotyping facilities as well as expertise in bioinformatics to manage the resulting databases. In this context, public genomic consortia appear to be cost-efficient initiatives for generating large amounts of genomic variation data (Rimbert et al., 2018). Excellent examples of such consortium efforts are the International Wheat Genome Sequencing Consortium
(IWGSC) and the Solanaceae Genomics Network (SGN, http://solgenomics.net/). In the first case, polymorphic SNP loci are collected in either hexaploid wheat or its diploid ancestors (Alaux et al., 2018). In the second example, gene allelic variations are linked to valuable traits in cultivated germplasm of potato and tomato. This information has proved very useful for increasing the probability that crops can benefit from genotype-based selection (Liabeuf et al., 2018; Zegeye et al., 2014). High-resolution chromosome maps represent a clear advantage for improving gene tagging. Genetic maps can be combined with sequence data to accelerate R-gene discovery. Graphical genotyping, first proposed by Young and Tanksley (1989), makes it possible to visualize the genotype of single individuals, helping breeders to identify desirable individuals on the basis of their genotype. Genotype visualizers could be very useful for performing genomics-driven selection and cross-interaction with the data, supporting data mining (Belarmino et al., 2012; Dèrozier et al., 2011). Using such tools, it is possible to identify genotypes with desirable alleles for resistance traits and with minimal undesirable alleles elsewhere in the genome. Graphical genotyping is supported by software packages such as Graphical GenoTypes v2.0 (GGT) and Functional Analysis of the Arabidopsis Genome (FLAGdb++) (Van Berloo, 2008; Samson et al., 2004). Visualizing sequence data and integrating them with complementary information can be very useful. The co-location of a predicted gene of similar function within mapping positions, the association of alleles with specific traits or the identification of syntenic regions among genomes can also help to select positional or functional candidate genes for the trait. Knowing the location of a given R-gene locus can be of great benefit for mining its nucleotide sequence using both recombination analysis and protein function prediction. Andolfo et al. (2013a) proposed a combined approach that links prediction data with genetic molecular analysis to discover new R-genes in tomato (Fig. 5.1). Once germplasm with the desired trait has been identified, sequencing and genetic mapping can be used simultaneously to predict the location and the sequence of a resistance gene. Information on chromosome recombination rates and R-gene distribution may be useful to steer future disease resistance breeding schemes and select favorable allele combinations (Nieri et al., 2017).
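Linkage disequilibrium mapping, mentioned earlier in this section, relies on a measure of association between alleles at two loci. A minimal sketch of one widely used measure, the squared correlation r², is given below for two biallelic SNPs; it assumes phased haplotypes coded as 0/1 pairs, and the function name `ld_r2` and the toy data are illustrative assumptions rather than anything prescribed by the studies cited above.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) pairs,
    each allele coded 0 or 1. Uses D = p_AB - p_A * p_B and
    r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # r^2 is undefined when a locus is monomorphic; this sketch
        # returns 0.0 in that case as a simplifying convention.
        return 0.0
    return d * d / denom


if __name__ == "__main__":
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]       # alleles always co-occur
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]     # alleles independent
    print(ld_r2(linked), ld_r2(unlinked))
```

With unphased genotype data, haplotype frequencies would first have to be estimated, which this sketch does not attempt.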
5.2.4 Comparative genomics
Comparative genomics is becoming increasingly important as more genomes are sequenced. Previously, the identification of candidate genes underlying a genomic region linked to a trait involved laborious fine mapping and genetic complementation studies. Functional orthologous R-genes (genes derived from a common ancestor through a speciation event) can now be identified in “target” species. This approach has often been validated in crop improvement for disease resistance; for instance, the “mlo” gene that confers resistance to Erysiphe graminis in Hordeum vulgare is orthologous to the ol2 gene conferring resistance to Oidium in tomato (Pavan et al., 2008) and to genes in other species (Andolfo et al., 2013b). The availability of whole-genome sequence information in dicot and monocot species makes it possible to investigate the genes present within a “candidate region” on the basis of genomic synteny and of the putative function of the genes. Sequenced genomes can be aligned with comparable marker sequences in a “target” region to deduce the putative genes present in that genomic area. Genomes belonging to the same plant family are often colinear, which may offer the possibility of identifying orthologous R-genes on the basis of syntenic genomic regions (Andolfo et al., 2017).
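Candidate orthologs between a reference and a “target” species are often shortlisted with a reciprocal best hit heuristic applied to similarity search results. The sketch below illustrates that heuristic only; it is a common simplification and is not claimed to be the method used in the studies cited above. The input format (query, subject, score) and the function names `best_hits` and `reciprocal_best_hits` are assumptions, and the gene identifiers are invented.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, score) tuples, for example
    parsed from a tabular similarity search report (format assumed here).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    a_to_b = best_hits(a_vs_b)
    b_to_a = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in a_to_b.items()
        if b_to_a.get(gene_b) == gene_a
    )


if __name__ == "__main__":
    # Invented scores between hypothetical genes of species A and species B.
    a_vs_b = [("A1", "B1", 950.0), ("A1", "B2", 400.0), ("A2", "B2", 610.0)]
    b_vs_a = [("B1", "A1", 940.0), ("B2", "A1", 620.0), ("B2", "A2", 600.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here A2's best hit is B2, but B2's best hit is A1, so only the A1/B1 pair is retained. Reciprocal best hits can miss orthologs in expanded gene families such as R-gene clusters, so the output is best treated as a shortlist for synteny-based confirmation.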
FIG. 5.1 Schematic representation of the strategy used by Andolfo et al. (2013a) to identify R-gene candidates in the Solanum lycopersicum genome. The R-gene prediction is based on features from genomic (e.g. protein sequences, physical position of gene loci) and genetic (i.e. molecular markers linked to disease resistance genes) data.
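One step of the strategy summarized in Fig. 5.1, combining the physical position of predicted genes with markers linked to a resistance locus, can be sketched as an interval query. This is a simplified illustration under stated assumptions: the `Gene` record, the function `candidates_in_interval`, the identifiers and the coordinates are all hypothetical and do not reproduce the published analysis.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Physical position of a predicted R-gene candidate (hypothetical record)."""
    gene_id: str
    chrom: str
    start: int
    end: int


def candidates_in_interval(genes, chrom, marker_pos_1, marker_pos_2):
    """Return IDs of genes overlapping the interval between two flanking markers."""
    low, high = sorted((marker_pos_1, marker_pos_2))
    return [
        g.gene_id
        for g in genes
        if g.chrom == chrom and g.end >= low and g.start <= high
    ]


if __name__ == "__main__":
    # Invented coordinates for three hypothetical predicted genes.
    predicted = [
        Gene("cand_001", "chr6", 1_200, 4_800),
        Gene("cand_002", "chr6", 52_000, 55_500),
        Gene("cand_003", "chr9", 3_000, 6_000),
    ]
    # Two hypothetical markers flanking a mapped resistance locus on chr6.
    print(candidates_in_interval(predicted, "chr6", 60_000, 50_000))
```

The markers may be given in either order, and genes that only partly overlap the interval are retained, since marker positions rarely coincide with gene boundaries.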
Comparative genomic studies can provide a basis for the extrapolation of gene functions among species. Comparative analysis among the C. pepo, C. melo, C. sativus, C. lanatus, and A. thaliana proteomes highlighted specific R-gene family expansions in C. pepo. On an even smaller scale, the alignment of the nucleotide or amino acid sequences of genes (e.g. in BLAST searches and subsequent sequence alignments) is an increasingly valuable starting point for the selection of genes that share the same function (Song and Messing, 2003). On the other hand, caution should be taken with the positional candidate approach, since genomic synteny does not always reflect perfect colinearity. How R-genes vary genetically and how many of these genes are conserved remain to be determined. Insight into these questions will greatly help to estimate the richness of resistance germplasm and enable these resources to be preserved and used efficiently. Within a species, divergence into different haplotypes in most cases does not affect the colinearity of genes. However, there are also examples where the order of genes in a given genomic region has changed, for example among breeding lines of maize (Brunner et al., 2005; Fernandez-Pozo et al., 2015). Several comparative genomics software packages are available to manage, compute, query and display synteny between multiple sequenced genomes. The algorithms vary as to whether they are optimized for large genomes, small genomes, regions or genes, and in the number of sequences that can be compared simultaneously. Nowadays, data from many plant sequencing projects are freely available on different web portals. Gramene (www.gramene.org) is a database where users can query and explore genomic colinearity and comparative genomics for genetic and genomic studies on cereals such as rice, sorghum, maize, wild rice, wheat, oats and barley, on other agronomically important crop plants such as poplar and grape, and on the model plant Arabidopsis. The SGN, a family-oriented database dedicated to the Solanaceae, includes a comparative viewer for Solanaceae genomes (Ghiurcuta and Moret, 2014). The variation patterns of R-genes can provide valuable information that may be applied during plant breeding to improve plant resistance. Comparative genomics takes advantage of the information provided by signatures of selection to understand the function of genomes and the evolutionary processes that act on them. Detecting synteny block order and gene micro-arrangements can aid the discovery of functional R-genes. Several systems for constructing multiple genome alignments in the presence of evolutionary events such as rearrangement and inversion can be employed (Darling et al., 2004).
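The detection of syntenic blocks discussed above can be illustrated with a deliberately strict sketch: ortholog pairs are given as gene-order indices in two genomes, and a block is a run of pairs that are adjacent in genome A and adjacent in genome B, in either the same or the inverted orientation. The function `collinear_blocks`, its `min_size` parameter and the toy coordinates are assumptions made for illustration; published tools typically tolerate gaps and missing genes, which this sketch does not.

```python
def collinear_blocks(pairs, min_size=3):
    """Group ortholog pairs into strictly collinear blocks.

    pairs: list of (order_in_genome_a, order_in_genome_b) integer indices.
    A block grows while genome A advances by exactly one gene and genome B
    moves by exactly one gene in a constant direction (+1, or -1 for an
    inverted segment). Blocks shorter than min_size are discarded.
    """
    blocks = []
    current = []
    direction = 0  # 0 = not yet known, +1 = same orientation, -1 = inverted

    for pair in sorted(pairs):
        if current:
            step_a = pair[0] - current[-1][0]
            step_b = pair[1] - current[-1][1]
            step = step_b if (step_a == 1 and step_b in (1, -1)) else 0
            if step != 0 and (direction == 0 or step == direction):
                current.append(pair)
                direction = step
                continue
            # The run is broken: keep it only if it is long enough.
            if len(current) >= min_size:
                blocks.append(current)
        current, direction = [pair], 0

    if len(current) >= min_size:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Invented gene-order indices: one direct block, one inverted block,
    # and one isolated pair.
    pairs = [(1, 10), (2, 11), (3, 12), (4, 30), (5, 29), (6, 28), (8, 50)]
    for block in collinear_blocks(pairs):
        print(block)
```

Allowing both orientations lets an inverted segment be reported as a block of its own rather than being lost, which matters when rearrangements and inversions separate otherwise conserved regions.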
5.2.5 Gene expression profile investigations
The advent of transcriptional profiling, which measures the expression level of thousands of messenger RNA (mRNA) transcripts in a single experiment, has substantially increased the power to unravel disease processes. Massively parallel sequencing platforms for Next Generation Sequencing (NGS) protocols have changed the landscape of genetic studies. Expression levels of specific genes, differential splicing and allele-specific expression of transcripts can be accurately determined by RNA-Seq experiments to address many biological issues. While this technology brings great power to make biological observations and discoveries, it also requires considerable effort in the development of new bioinformatics tools to deal with these massive data files. The datasets can be analyzed with several tools in an integrative way and correlated with plant pathogenesis pathways
enabling comparative studies of gene expression patterns. Gene expression data represent an ideal source for discovering genes that are differentially regulated during plant-pathogen challenge. Several web-based services harbor gene expression data, and cross investigations can help to recover information for individual genes or gene sets. These include GEO (www.ncbi.nlm.nih.gov/geo/), a repository of gene expression experiments at the National Center for Biotechnology Information (NCBI), and the RNASeq-er API (www.ebi.ac.uk/fg/rnaseq/api/), which provides easy access to the results of the systematically updated and continually growing analysis of public RNA-seq data in the European Nucleotide Archive (ENA). Genevestigator characterizes plant genes by identifying the conditions in response to which they are expressed (https://genevestigator.com/gv/). BAR is a bio-analytical resource for plant breeding (http://bar.utoronto.ca/), Gene Expression Resources at TAIR collects curated Arabidopsis expression profile experiments (www.arabidopsis.org/portals/expression/index.jsp), and RiceGE, Gene Expression Atlas Data (http://signal.salk.edu/RiceGE/RiceGE_gAtlas_Source.html), provides gene expression data for rice. The tomato functional genomics database (TFGD) offers a comparative gene analysis service to detect clusters of genes with similar expression patterns across selected stimuli or the complete set of stimuli (http://ted.bti.cornell.edu/). The Gene Ontology (GO) analysis toolkit for the agricultural community (AgriGO, http://bioinfo.cau.edu.cn/agriGO/index.php) is a platform that offers enrichment analyses for plant species. AgBase (www.agbase.msstate.edu) provides structural and functional analysis for agriculturally important plant and pathogen genomes. Repositories of experimental data on the molecular interactions between plants and pathogens during pathogenesis are greatly needed. An effort to build a gene expression resource for plants and plant pathogens was made with Plexdb (www.plexdb.org/).
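Enrichment analyses of the kind mentioned above ask whether an annotation term occurs among differentially regulated genes more often than expected by chance. A minimal sketch using the one-sided hypergeometric test, a common choice for this purpose, is shown below; it is not claimed to be the exact statistic implemented by any platform named in this section, and the function name `enrichment_p_value` and the numbers in the example are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, annotated_in_sample):
    """One-sided hypergeometric p-value, P(X >= annotated_in_sample).

    population:           number of genes in the background set
    annotated:            background genes carrying the term of interest
    sample:               number of differentially regulated genes
    annotated_in_sample:  differentially regulated genes carrying the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Invented numbers: 20,000 background genes, 300 annotated with a
    # defense-related term, 500 regulated genes, 25 of them annotated.
    print(enrichment_p_value(20_000, 300, 500, 25))
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction, which is left out of this sketch.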
A limited number of regulated genes have been annotated in PathoPlant from microarray experiments and experimentally proven direct molecular interactions (Merchant et al., 2016), but more data should be added to promote further discoveries. Information on promoter structures, transcription start sites (TSSs), regulators, microRNAs and transcription factors could also be useful. By using specific databases such as Plant Cis-Acting Regulatory Elements (Plant-CARE), Promomer (bc.botany.utoronto.ca/ntools/cgi-bin/BAR_Promomer.cgi), PlantProm and ppdb, the core promoter structure, the presence of regulatory elements and the distribution of TSS clusters can be identified. The plant miRNA database (PMRD) (http://bioinformatics.cau.edu.cn/PMRD) contains sequence information, microRNA target genes, and expression profile data. The Plant Transcription Factor Database (PlnTFDB) (http://plntfdb.bio.uni-potsdam.de/v3.0/) is an integrative database that provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators (TRs) in plant species (sensu lato) whose genomes have been completely sequenced and annotated. The Database Resource for the Analysis of Signal Transduction In Cells (DRASTIC, http://www.drastic.org.uk/) records plant expressed sequence tags and genes up- or down-regulated in response to various pathogens or other treatments such as drought, salt and low temperature. INSIGHTS (INference of cell SIGnaling HypoTheseS) allows data mining and extraction of information from the DRASTIC database. Potential response pathways can be visualized, and gene expression patterns in response to various treatments can be compared to gain new insights. Reconstructing gene or protein networks is a very important task for deciphering the molecular mechanisms and potential traits involved. Depending on the main target of each experiment, different tools and web-based resources can be used in the data analysis to acquire the information needed.
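A very simple way to approach the network reconstruction mentioned above is to connect genes whose expression profiles are strongly correlated across treatments. The sketch below builds such a co-expression edge list with Pearson correlation; the function names, the 0.9 threshold and the expression values are illustrative assumptions, and correlation alone does not establish a regulatory relationship.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: correlation undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene_1, gene_2, r) for gene pairs with |r| >= threshold.

    profiles: dict mapping gene ID to a list of expression values measured
    under the same ordered set of treatments (hypothetical input format).
    """
    edges = []
    for gene_1, gene_2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_1], profiles[gene_2])
        if abs(r) >= threshold:
            edges.append((gene_1, gene_2, round(r, 3)))
    return edges


if __name__ == "__main__":
    # Invented expression values across four treatments.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [5.0, 1.0, 4.0, 2.0],
    }
    print(coexpression_edges(profiles))
```

In this toy data set only geneA and geneB are linked, because their profiles rise together across all four treatments.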
5.3 Optimizing the use of informatic resources for R-gene discovery
Plant science is characterized by the production of large and heterogeneous collections of biological data, including genomic sequence data, expression profiles, and phenotypic characterization (Bolger et al., 2017). The importance of more extensive annotations is becoming apparent, but this is also associated with a significant increase in curation workload, which slows down the manual annotation of the biological context information found in the literature. There has been a tendency to enrich the annotations of many existing curated databases with previously missing but biologically relevant information, by incorporating new fields or extending the inventory of terms used to describe a subject. In parallel, new manually curated databases have been developed that focus on highly specialized topics not sufficiently covered in general expert databases (Hu et al., 2018). Interdisciplinary approaches to disease gene discovery can be undertaken using these resources. Genomics has changed our outlook on life by providing an exponentially wider view of how biological information interacts and flows, but translating genomics into plant breeding is still difficult. The diversity of phenotypes must be associated with databases of DNA sequences and their expression patterns (Borevitz and Ecker, 2004). Platforms should be designed that allow users to combine different information using a core set of web applications handling phenotypic, genotypic and passport data (Fig. 5.2). Management systems often offer a set of interfaces for interrogating, browsing and displaying a number of data types. For instance, Germinate (http://bioinf.scri.ac.uk/germinate) is a generic plant data management platform designed to act as a scaffold on which complex web-based applications can be developed to leverage information from experimentally derived data.
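The combination of passport, genotypic and phenotypic data described above amounts to joining records on a shared accession identifier and then querying across them. The following sketch shows this with plain dictionaries; it does not reflect the data model of any platform named in this section, and the function names, field names, marker name and disease scores are all hypothetical.

```python
def merge_accession_records(passport, genotypes, phenotypes):
    """Join three data sources on accession ID.

    Each argument maps accession ID to a dict of values. Accessions without
    genotype or phenotype data receive empty dicts rather than being dropped.
    """
    return {
        acc_id: {
            "passport": info,
            "genotype": genotypes.get(acc_id, {}),
            "phenotype": phenotypes.get(acc_id, {}),
        }
        for acc_id, info in passport.items()
    }


def select_accessions(merged, marker, allele, trait, max_score):
    """Return sorted IDs carrying `allele` at `marker` with trait score <= max_score."""
    return sorted(
        acc_id
        for acc_id, record in merged.items()
        if record["genotype"].get(marker) == allele
        and record["phenotype"].get(trait, float("inf")) <= max_score
    )


if __name__ == "__main__":
    # Invented records for three hypothetical accessions.
    passport = {
        "ACC1": {"origin": "Italy"},
        "ACC2": {"origin": "Peru"},
        "ACC3": {"origin": "Spain"},
    }
    genotypes = {"ACC1": {"marker_7": "A"}, "ACC2": {"marker_7": "A"}, "ACC3": {"marker_7": "G"}}
    phenotypes = {"ACC1": {"disease_score": 1}, "ACC2": {"disease_score": 4}}

    merged = merge_accession_records(passport, genotypes, phenotypes)
    print(select_accessions(merged, "marker_7", "A", "disease_score", 2))
```

Accessions with no phenotype record are excluded from the selection rather than assumed resistant, which is the conservative choice when phenotyping data are incomplete.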
Germinate currently handles phenotypic, genotypic and passport data and offers interfaces for interrogating, browsing and displaying them. The Integrated Breeding Platform (https://www.integratedbreeding.net) also brings together the tools, knowledge and support that plant breeders need to plan, conduct and assess the outcomes of both conventional and genomics-assisted breeding. Commercial products such as AGROBASE (http://www.agronomix.com/), PHENOM network (http://www.phenome-networks.com), SEEDBASE (http://www.seedbase.info), DORIANE (http://www.doriane.com) or others with similar characteristics could also be used. Breeders can use graphical interfaces to manage the logistics of their work, keep control of their data and handle tasks such as data analysis, simulation and decision-making. Consistent and robust suites of open-source tools based on common, internationally agreed standards are under development. To meet these requirements, numerous groups have been working together to create the Breeding Application Programming Interface (BrAPI). Independent tools can run on their own or can be easily integrated into a larger system. The goal is to keep the BrAPI community informed of what tools are available and to encourage functionality sharing (https://www.brapi.org). BrAPI is a practical tool that helps to solve problems in accessing, exchanging, and integrating data across systems and applications. Plant researchers make use of scientific information at multiple stages of the discovery process. Knowledge extracted from previous publications and databases is used to define target genes, to select the target trait to be studied, and to extract information relevant for identifying a gene or obtaining desired varieties. After generating and analyzing new data,
FIG. 5.2 A general workflow for high-throughput data analysis from genetic (Genetic Markers: SNP, RAPD, AFLP, SFP, DArT and RFLP; Genetic Resources: ILs, RILs, BCs and DHs), genomic (ChIA-PET, WGBS, DNA-Seq, RIP-Seq, ChIP-Seq, RRBS, RNA-Seq and ATAC-Seq) and mapping (GWAS, GS, QTLmap) studies, using informatic resources (Software, DBs, Statistics and Algorithms). The workflow describes the data processing steps for the identification of novel agronomic traits (gene discovery) and the selection of new cultivated varieties (new cultivars).
information derived from informatic supports is essential to understand and interpret the resulting data and to draw conclusions. The genetic advance achieved through genomic selection depends on the ability to capture superior alleles, the repeatability of the trait and the selection pressure imposed. Modern breeding is a dynamic and evolving research discipline, and approaches for minimizing effort are under development. Traditional selection schemes should be modified and adapted to computational input data in order to exploit this enormous and rapidly changing wealth of data. Many traits are more complex than previously suspected, with complex regulation of gene expression and interactions between regulatory pathways (Dolatabadian et al., 2017). In addition, it has been shown that, besides SNPs, duplications and deletions, large-scale copy number polymorphisms or variations may underlie a diverse range of resistance phenotypes (Slootweg et al., 2017). Differences in expression level among R-gene homologs also seem to cause phenotypic differences (Guo et al., 2008). In such complex situations, genetic variation in combination with high-throughput and sensitive SNP detection methods offers the possibility to screen for allelic differences at the expression level (Schaart et al., 2005; Salentijn et al., 2007) and to discriminate superior allelic forms (haplotypes) of a complex trait within the complete germplasm pool of a species. The research community is
generating thousands of studies in which complex phenotypes are measured in biological populations, but phenotyping data are still fragmented. The bottom line is that biological databases alone cannot capture the richness of scientific information and argumentation needed to conduct a breeding program. Informatic resources provide support for the novel ways in which scientists will interrogate these databases. Even if curators of biological databases were able to keep up with the ever-increasing volume of data, researchers would still need to mine the data and link database entries in order to draw out desired genotypes. Platforms for integrating genotypic and phenotypic data in breeding have been developed. However, expertise is needed to use and combine multidisciplinary information in a new context.
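The integration of genotypic and phenotypic data discussed in this section can be illustrated with a minimal Python sketch. It is not taken from any of the platforms cited here. It assumes that phenotypic observations arrive in a BrAPI-style JSON envelope (a `metadata` block plus `result.data`), and the field names used (`germplasmDbId`, `observationVariableName`, `value`) should be checked against the current BrAPI specification before use. The accessions, the marker calls and the trait name `disease_score` are all invented for illustration, and no network call is made.

```python
import json
from collections import defaultdict
from statistics import mean

# Mock payload shaped like a BrAPI-style response envelope (assumed layout).
# All identifiers and scores are invented for illustration.
MOCK_OBSERVATIONS = json.loads("""
{
  "metadata": {"pagination": {"currentPage": 0, "pageSize": 4,
                              "totalCount": 4, "totalPages": 1}},
  "result": {"data": [
    {"germplasmDbId": "g1", "observationVariableName": "disease_score", "value": "1"},
    {"germplasmDbId": "g2", "observationVariableName": "disease_score", "value": "2"},
    {"germplasmDbId": "g3", "observationVariableName": "disease_score", "value": "7"},
    {"germplasmDbId": "g4", "observationVariableName": "disease_score", "value": "8"}
  ]}
}
""")

# Hypothetical SNP calls at a single marker near a candidate R-gene locus.
GENOTYPES = {"g1": "AA", "g2": "AA", "g3": "GG", "g4": "GG"}


def phenotypes_by_accession(response, trait):
    """Return {germplasmDbId: numeric value} for one trait in the envelope."""
    records = response["result"]["data"]
    return {
        rec["germplasmDbId"]: float(rec["value"])
        for rec in records
        if rec["observationVariableName"] == trait
    }


def mean_score_by_genotype(genotypes, phenotypes):
    """Group phenotype values by marker genotype and average each group."""
    groups = defaultdict(list)
    for accession, call in genotypes.items():
        if accession in phenotypes:  # keep only accessions with both data types
            groups[call].append(phenotypes[accession])
    return {call: mean(values) for call, values in groups.items()}


scores = phenotypes_by_accession(MOCK_OBSERVATIONS, "disease_score")
summary = mean_score_by_genotype(GENOTYPES, scores)
print(summary)  # {'AA': 1.5, 'GG': 7.5}
```

In this toy dataset the accessions carrying the `AA` call have a lower mean disease score than those carrying `GG`. A real analysis would require many markers, replicated phenotypes and an appropriate statistical model, so the sketch only shows the join between the two data types.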
5.4 Concluding remarks
From the examples presented above, it is clear that ongoing bioinformatic efforts provide increasingly available support for developing disease-resistant plants (Bro and Nielsen, 2004; Chapman et al., 2014). How we can use the expanding sources of genomic knowledge to develop disease-resistant crops depends on several factors, such as the availability of proper bioinformatic tools. Structural and comparative mapping can help to identify candidate genes and to find R-gene orthologs in target crops. However, validation of candidate genes by genetic, comparative and physical mapping and by phenotypic studies is still necessary to prove linkage with the desired trait. Random mutagenesis, site-directed mutagenesis and genome editing can help in finding variants with gain of recognition (Andolfo et al., 2016). Despite the complex nature of many crops, we expect that, with the increased possibilities at the technical level and in the field of data integration, genomic research will create valuable tools for disease resistance breeding in crop species. A quickly expanding amount of genomic information for designing desirable ideotypes is being obtained from integrated databases that manage genomic sequence data, literature, expression profiles and phenotype information. The challenge of combining all this information into one system is considerable. A coherent global information infrastructure, with analytical tools linking and integrating these information components, is under development, and newly trained scientists should be able to use it. Plant breeders need continuous support in the selection, identification and integration of existing tools and databases.
References
Alaux M, Rogers J, Letellier T, Flores R, Alfama F, Pommier C, Mohellibi N, Durand S, Kimmel E, Michotey C, Guerche C, Loaec M, Lainé M, Steinbach D, Choulet F, Rimbert H, Leroy P, Guilhot N, Salse J, Feuillet C, Paux E, Eversole K, Adam-Blondon AF, Quesneville H. Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data. Genome Biol 2018;19:111.
Andolfo G, Ercolano MR. Plant innate immunity multicomponent model. Front Plant Sci 2015;6:987.
Andolfo G, Sanseverino W, Aversano R, Frusciante L, Ercolano MR. Genome-wide identification and analysis of candidate genes for disease resistance in tomato. Mol Breed 2013a;33:227.
Andolfo G, Sanseverino W, Rombauts S, Van de Peer Y, Bradeen JM, Carputo D, Frusciante L, Ercolano MR. Overview of tomato (Solanum lycopersicum) candidate pathogen recognition genes reveals important Solanum R locus dynamics. New Phytol 2013b;197(1).
Andolfo G, Jupe F, Witek K, Etherington GJ, Ercolano MR, Jones JD. Defining the full tomato NB-LRR resistance gene repertoire using genomic and cDNA RenSeq. BMC Plant Biol 2014;14:120.
Andolfo G, Iovieno P, Frusciante L, Ercolano MR. Genome-editing technologies for enhancing plant disease resistance. Front Plant Sci 2016;7:1813.
Andolfo G, Di Donato A, Darrudi R, Errico A, Aiese Cigliano R, Ercolano MR. Draft of zucchini (Cucurbita pepo L.) proteome: A resource for genetic and genomic studies. Front Genet 2017;8:181.
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: Tools for motif discovery and searching. Nucleic Acids Res 2009;37:W202.
Beddows I, Reddy A, Kloesges T, Rose LE. Population genomics in wild tomatoes − the interplay of divergence and admixture. Genome Biol Evol 2017;9:3023.
Belarmino LC, da S. Oliveira AR, Brasileiro-Vidal AC, de A. Bortoleti KC, Bezerra-Neto JPI, Abdelnoor RV, Benko-Iseppon AM. Mining plant genome browsers as a means for efficient connection of physical, genetic and cytogenetic mapping: an example using soybean. Genet Mol Biol 2012;35:335.
Bevan MW, Uauy C. Genomics reveals new landscapes for crop improvement. Genome Biol 2013;14:206.
Birkenbihl RP, Liu S, Somssich IE. Transcriptional events defining plant immune responses. Curr Opin Plant Biol 2017;38:1.
Bohra A, Pandey MK, Jha UC, Singh B, Singh IP, Datta D, Chaturvedi SK, Nadarajan N, Varshney RK. Genomics-assisted breeding in four major pulse crops of developing countries: present status and prospects. Theor Appl Genet 2014;127:1263.
Bolger M, Schwacke R, Gundlach H, Schmutzer T, Chen J, Arend D, Oppermann M, Weise S, Lange M, Fiorani F, Spannagl M, Scholz U, Mayer K, Usadel B. From plant genomes to phenotypes. J Biotechnol 2017;261:46.
Borevitz JO, Ecker JR. Plant genomics: the third wave. Annu Rev Genomics Hum Genet 2004;5:443–77.
Bro C, Nielsen J. Impact of ‘ome’ analyses on inverse metabolic engineering. Metab Eng 2004;6:204.
Brown JKM. Durable resistance of crops to disease: a Darwinian perspective. Annu Rev Phytopathol 2015;53:513.
Brunner CA, Bornhold BD, Firth JV. Paleontological investigation on planktonic foraminifers of ODP Hole 169-1036B. PANGAEA, Shipboard Scientific Party. 2005.
Cabrera-Bosquet L, Crossa J, von Zitzewitz J, Serret MD, Araus JL. High-throughput phenotyping and genomic selection: the frontiers of crop breeding converge. J Integr Plant Biol 2012;54:312.
Caicedo AL, Purugganan MD. Comparative plant genomics frontiers and prospects. Plant Physiol 2005;138:545.
Chapman S, Stevens LJ, Boevink PC, Engelhardt S, Alexander CJ, Harrower B, Champouret N, McGeachy K, Van Weymers PS, Chen X, Birch PR, Hein I. Detection of the virulent form of AVR3a from Phytophthora infestans following artificial evolution of potato resistance gene R3a. PLoS ONE 2014;9:e110158.
Costet L, Le Cunff L, Royaert S, Raboin LM, Hervouet C, Toubi L, Telismart H, Garsmeur O, Rousselle Y, Pauquet J, Nibouche S, Glaszmann JC, Hoarau JY, D’Hont A. Haplotype structure around Bru1 reveals a narrow genetic basis for brown rust resistance in modern sugarcane cultivars. Theor Appl Genet 2012;125:825.
Darling ACE, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 2004;14:1394.
Dérozier S, Samson F, Tamby JP, Guichard C, Brunaud V, Grevet P, Gagnot S, Label P, Leplé JC, Lecharny A, Aubourg S. Exploration of plant genomes in the FLAGdb++ environment. Plant Methods 2011;7:8.
Dolatabadian A, Patel DA, Edwards D, Batley J. Copy number variation and disease resistance in plants. Theor Appl Genet 2017;130:2479.
Ercolano MR, Sanseverino W, Carli P, Ferriello F, Frusciante L. Genetic and genomic approaches for R-gene mediated disease resistance in tomato: retrospects and prospects. Plant Cell Rep 2012;31:973.
Fernandez-Pozo N, Rosli HG, Martin GB, Mueller LA. The SGN VIGS tool: user-friendly software to design virus induced gene silencing (VIGS) constructs for functional genomics. Mol Plant 2015;8:486.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Res 2014;42:D222.
Ghiurcuta CG, Moret BM. Evaluating synteny for improved comparative studies. Bioinformatics 2014;30:9.
Guo M, Yang S, Rupe M, Hu B, Bickel DR, Arthur L, Smith O. Genome-wide allele-specific expression analysis using massively parallel signature sequencing (MPSS™) reveals cis- and trans-effects on gene expression in maize hybrid meristem tissue. Plant Mol Biol 2008;66:551.
Holliday GL, Akiva E, Meng EC, Brown SD, Calhoun S, Pieper U, Sali A, Booker SJ, Babbitt PC. Atlas of the radical SAM superfamily: divergent evolution of function using a “plug and play” domain. Methods Enzymol 2018;606:1.
Hu H, Scheben A, Edwards D. Advances in integrating genomics and bioinformatics in the plant breeding pipeline. Agriculture 2018;8:75.
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S. InterProScan 5: genome-scale protein function classification. Bioinformatics 2014;30:1236.
Jupe F, Witek K, Verweij W, Sliwka J, Pritchard L, Etherington GJ, Maclean D, Cock PJ, Leggett RM, Bryan GJ, Cardle L, Hein I, Jones JD. Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations. Plant J 2013;76:530.
Liabeuf D, Sim SC, Francis DM. Comparison of marker-based genomic estimated breeding values and phenotypic evaluation for selection of bacterial spot resistance in tomato. Phytopathology 2018;108:392.
Merchant N, Lyons E, Goff S, Vaughn M, Ware D, Micklos D, Antin P. The iPlant Collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol 2016;14:e1002342.
Michelmore RW, Christopoulou M, Caldwell KS. Impacts of resistance gene genetics, function, and evolution on a durable future. Annu Rev Phytopathol 2013;51:291.
Nieri D, Di Donato A, Ercolano MR. Analysis of tomato meiotic recombination profile reveals preferential chromosome positions for NB-LRR genes. Euphytica 2017;213:206.
Osuna-Cruz CM, Paytuvi-Gallart A, Di Donato A, Sundesha V, Andolfo G, Aiese Cigliano R, Sanseverino W, Ercolano MR. PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes. Nucleic Acids Res 2018;46:D1197.
Pal T, Jaiswal V, Chauhan RS. DRPPP: a machine learning based tool for prediction of disease resistance proteins in plants. Comput Biol Med 2016;78:42.
Parlevliet JE. Durability of resistance against fungal, bacterial and viral pathogens; present situation. Euphytica 2002;124:147.
Parry MA, Hawkesford MJ. An integrated approach to crop genetic improvement. J Integr Plant Biol 2012;54:250.
Pavan S, Zheng Z, Berg P, Lotti C, Giovanni C, Borisova M, Lindhout P, Jong H, Ricciardi L, Visser R, Bai Y. Map- vs homology-based cloning for the recessive gene ol-2 conferring resistance to tomato powdery mildew. Euphytica 2008;162:91.
Poland J, Rutkoski J. Advances and challenges in genomic selection for disease resistance. Annu Rev Phytopathol 2016;54:79.
Rimbert H, Darrier B, Navarro J, Kitt J, Choulet F, Leveugle M, Duarte J, Rivière N, Eversole K, Le Gouis J, Davassi A, Balfourier F, Le Paslier MC, Berard A, Brunel D, Feuillet C, Poncet C, Sourdille P, Paux E. High throughput SNP discovery and genotyping in hexaploid wheat. PLoS ONE 2018;13:e0186329.
Salentijn EMJ, Pereira A, Angenent GC, van der Linden CG, Krens F, Smulders MJM, Vosman B. Plant translational genomics: from model species to crops. Mol Breed 2007;20:1.
Samson F, Brunaud V, Duchene S, De Oliveira Y, Caboche M, Lecharny A, Aubourg S. FLAGdb++: a database for the functional analysis of the Arabidopsis genome. Nucleic Acids Res 2004;32:D347.
Sanseverino W, Ercolano MR. In silico approach to predict candidate R proteins and to define their domain architecture. BMC Res Notes 2012;5:678.
Sanseverino W, Roma G, Simone MD, Faino L, Melito S, Stupka E, Frusciante L, Ercolano MR. PRGdb: a bioinformatics platform for plant resistance gene analysis. Nucleic Acids Res 2010;38:D814.
Schaart JG, Mehli L, Schouten HJ. Quantification of allele-specific expression of a gene encoding strawberry polygalacturonase-inhibiting protein (PGIP) using Pyrosequencing™. Plant J 2005;41:493.
Sekhwal MK, Li P, Lam I, Wang X, Cloutier S, You FM. Disease resistance gene analogs (RGAs) in plants. Int J Mol Sci 2015;16:19248.
Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res 2010;38:D161.
Slootweg EJ, Koropacka K, Roosien J, Dees R, Overmars H, Lankhorst RK, van Schaik C, Pomp R, Bouwman L, Helder J, Schots A, Bakker J, Smant G, Goverse A. Sequence exchange between homologous NB-LRR genes converts virus resistance into nematode resistance, and vice versa. Plant Physiol 2017;175:498.
Song R, Messing J. Gene expression of a gene family in maize based on noncollinear haplotypes. PNAS 2003;100:9055.
Van Berloo R. GGT 2.0: versatile software for visualization and analysis of genetic data. J Hered 2008;99:232–6.
Witek K, Jupe F, Witek AI, Baker D, Clark MD, Jones JD. Accelerated cloning of a potato late blight-resistance gene using RenSeq and SMRT sequencing. Nat Biotechnol 2016;34:656.
Young ND, Tanksley SD. Restriction fragment length polymorphisms maps and the concept of graphical genotypes. Theor Appl Genet 1989;77:95.
Zegeye H, Rasheed A, Makdis F, Badebo A, Ogbonnaya FC. Genome-wide association mapping for seedling and adult plant resistance to stripe rust in synthetic hexaploid wheat. PLoS ONE 2014;9:e105593.
Further reading
Di Donato A, Andolfo G, Ferrarini A, Delledonne M, Ercolano MR. Investigation of orthologous pathogen recognition gene-rich regions in solanaceous species. Genome 2017;60:850.
Thao NP, Tran LS. Enhancement of plant productivity in the post-genomics era. Curr Genomics 2013;17:295.