Molecular Plant Letter to the Editor
SIFGD: Setaria italica Functional Genomics Database

Dear Editor,

Foxtail millet (Setaria italica) is a member of the Poaceae grass family that was initially domesticated from the wild species S. viridis in Northern China over 8700 years ago (Barton et al., 2009). Foxtail millet is distributed in natural and agricultural ecosystems worldwide, and it is grown as a cereal crop for both human food and fodder (Bettinger et al., 2010). With characteristics such as short stature, a short growing season, self-pollination, drought resistance, and high light efficiency, foxtail millet is favored as an important crop in semi-arid tropical areas and is regarded as an excellent model for plant research (Doust et al., 2009), especially for studies of C4 photosynthesis (Li and Brutnell, 2011). In addition, foxtail millet exhibits rich genetic diversity (6000 varieties), which should be useful for gene and genome characterization.

With two genome sequences published in 2012 (Bennetzen et al., 2012; Zhang et al., 2012), functional research such as analyses of transcriptional regulatory factors, microRNAs (Yi et al., 2013), and metabolic pathways has sped up. However, the experimental results from functional studies of S. italica are scattered across different publications and have not been integrated. As a result, researchers cannot easily access the existing analyses and data for this grass. In addition, gene annotation coverage in foxtail millet is still limited compared with some other plant species that have standalone databases for data storage and analysis.

We have established the S. italica Functional Genomics Database (SIFGD) for bioinformatics analyses of gene function or regulatory modules (Figure 1); the URL is http://structuralbiology.cau.edu.cn/SIFGD/. A generic genome browser (GBrowse) is used as a platform to integrate S.
italica genome sequences, transcript sequences, protein sequences, expressed sequence tags (ESTs), and miRNA-seq and RNA-seq data from public data sources (Supplemental Table 1) such as Phytozome, NCBI, and the Beijing Genomics Institute. In addition, several gene functional analysis tools are provided in our database, including gene family identification, advanced search, and motif analysis. First, we combined several analysis tools, such as Pfam, InterProScan, BLAST, and HMM searches, and collected published experimental results from other researchers for domain/motif and pathway identification. We identified 87 gene families with 2968 members that are putative transcription regulators; 5 classes with 1568 protein kinase-encoding genes; 22 subfamilies with 2060 members involved in the ubiquitin proteasome system; 43 families with 357 cytochrome P450 proteins; and 92 families with 1207 members for 6 modules of carbohydrate-active enzymes (Figures 1H and 1I and Supplemental Figure 1). For integrating
data from other researchers, 159 metabolic pathways containing 11,863 enzymes, captured from the KEGG database and Blast2GO results, were manually highlighted in wiring diagrams (Figure 1K), and 623 microRNA sequences collected from the Yu and Prasad groups were predicted to act on 1012 target genes (Figure 1G). Second, with the aim of functional analysis, we collected 978 plant cis-elements from publications and found 754 of them in the promoter regions of foxtail millet genes. These motifs are used to analyze transcriptional regulation (Figure 1B). In addition, 130,000 GO entries integrated from ortholog prediction and public databases are supplied for gene ontology analysis. Third, we set up tools for analysis of the EAR motif, because EAR motif-mediated gene repression is emerging as one of the principal mechanisms of plant gene regulation. Our strategy combined multiple alignments, an HMM model, PHI-BLAST, and orthologous search tools to identify 685 proteins with the EAR motif (Supplemental Figure 2). Among the 685 EAR motif-containing proteins, there are 34 different kinds of transcription regulators (TRs), as listed in Supplemental Table 2. We then compared the TR distribution between S. italica and Arabidopsis thaliana, which revealed both similarities and differences. For example, 18 S. italica proteins are homologous to the rice DWARF 53 (D53) protein, a repressor containing the EAR motif pattern in strigolactone (SL) signaling (Jiang et al., 2013), while no orthologs exist in Arabidopsis; six proteins are orthologous to Zea mays RAMOSA1 (RA1), a key regulator that participates in repressing branching as well as controlling inflorescence architecture (Eveland et al., 2014), and three orthologous genes are present in Arabidopsis.
There is a model describing how the EAR motif negatively regulates SIN3 ASSOCIATED POLYPEPTIDE 18 (SAP18), HISTONE DEACETYLASE 19 (HDA19), and TOPLESS (TPL) through protein–protein interaction to affect plant development and responses to hormonal signals or stress (Kagale and Rozwadowski, 2011). Using the orthologous search, we found that Si018590m (an ortholog of SAP18), Si016908m (an ortholog of HDA19), and Si033987m (an ortholog of TOPLESS-RELATED 1) are present in foxtail millet. HDA19 and SAP18 are reported to promote resistance to stress by regulating hormone signaling pathways, including jasmonic acid (JA) and ethylene signaling; the key regulators of these two pathways in millet, such as AP2/EREBP and the EIL family, also contain the EAR motif (Supplemental Table 2). These EAR motif-containing proteins may negatively regulate the same signaling pathways in S. italica.
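As an illustration of the simplest layer of such an EAR motif screen, the sketch below scans protein sequences for the two consensus patterns described by Kagale and Rozwadowski (2011), LxLxL and DLNxxP. The regular expressions, the function name `find_ear_motifs`, and the toy sequences are assumptions made for this example only; the SIFGD strategy also combined multiple alignments, an HMM model, PHI-BLAST, and orthologous search, none of which is reproduced here.

```python
import re

# Consensus patterns for the EAR motif (x = any residue).
# Assumption: plain regexes stand in for the more sensitive
# HMM / PHI-BLAST steps used to build the database.
# The lookahead form (?=(...)) lets overlapping hits be reported.
EAR_PATTERNS = {
    "LxLxL": re.compile(r"(?=(L.L.L))"),
    "DLNxxP": re.compile(r"(?=(DLN..P))"),
}


def find_ear_motifs(protein_seq):
    """Return (pattern name, 1-based start, matched text) for every hit."""
    seq = protein_seq.upper()
    hits = []
    for name, pattern in EAR_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    # Report hits in order of position along the protein.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequences, not real S. italica proteins.
toy_proteins = {
    "toy_protein_1": "MSSKDLNLPPGSA",
    "toy_protein_2": "MAAGLDLDLSLRT",
    "toy_protein_3": "MKKGSTAAQ",
}

for protein_id, sequence in toy_proteins.items():
    print(protein_id, find_ear_motifs(sequence))
```

A pattern scan of this kind is fast but over-predicts, since a short motif such as LxLxL occurs by chance in many proteins; that is presumably why it is usually combined with alignment- and homology-based evidence.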
Published by the Molecular Plant Shanghai Editorial Office in association with Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and IPPE, SIBS, CAS.
Molecular Plant 8, 967–970, June 2015 © The Author 2015.
Figure 1. Structure of the Setaria italica Functional Genomics Database. The structure of SIFGD displays the main contents of the database. (A) The quick search page for gene information, metabolic pathways, and ortholog transformation. (B) The motif analysis page for motif scans and significance analysis. (C) A download page. (legend continued below)
Lastly, popular functional analysis tools are available at SIFGD. For instance, search tools support batch or single queries for gene information, metabolic pathway descriptions, and ortholog transformation (Figure 1A); a motif analysis tool can discover important cis-elements enriched in the promoter sequences of a set of genes (Figure 1B); GBrowse is used for displaying annotations, gene structures, EST locations, microRNAs, mRNA-seq, miRNA-seq (Figure 1F), and synteny relationships between two genomes (Figure 1E); and the BLAST search tool is available for homolog discovery (Figure 1D). These tools can be used to predict potential regulators or components of biological processes (detailed description in the Supplemental Information). Here, we provide an example of photosynthesis-related gene prediction. Setaria italica and its wild ancestor, Setaria viridis, are regarded as ideal models for C4 plant research (Li and Brutnell, 2011). Therefore, studying candidate key regulatory genes for C4 photosynthesis may help with improving C4 photosynthesis or building C4 systems into valuable C3 crops. SIFGD allows the discovery of major enzymes belonging to the NADP-ME type of C4 photosynthesis system in Setaria italica, including Si005789m (phosphoenolpyruvate carboxylase [PEPC]), Si013632m (NADP-MDH), and Si021174m (pyruvate orthophosphate dikinase [PPDK]), which were also identified in previous publications (Supplemental Figure 3A). As these C4 carbon shuttle genes mainly work in carbon fixation in photosynthetic organisms (KO00710 on the KEGG web site), we selected the foxtail millet enzymes in this pathway, together with major enzymes reported in the literature, as photosynthesis-related proteins. A cis-element enrichment analysis of these candidate photosynthesis genes readily revealed several functional elements.
For example, the light-regulated cis-elements and motifs related to secondary cell wall biosynthesis might all be pertinent for studies of biomass production, which is especially important because Setaria is related to numerous grass species with proposed biofuel potential (Supplemental Figure 3D). Specifically, the motif AGATCCAA, recognized by bZIP transcription factors, is believed to regulate the light signaling process; the motif ACAAAGAA functions in secondary xylem development; and the motifs ACGTGTC and WAAAG (W = A/T) are candidate photosynthesis cis-regulatory elements conserved in both C4 and C3 grasses (Wang et al., 2014). In addition, RGCGR (R = A/G), a motif that occurs multiple times only in the promoters of mesophyll-enriched genes in C4 grasses to increase efficacy, appears several times in the photosynthesis-related genes of foxtail millet (Supplemental Figure 3C). Apart from the discovery of functional proteins, we predicted 193 TRs of C4 photosynthesis based on Zea mays
research results (Wang et al., 2014). Hence, ortholog annotation, pathway mapping, and cis-element analysis together provide a convenient way to discover candidate photosynthesis-related proteins and regulators of the C4 photosynthesis process in S. italica (Supplemental Table 4).

In summary, SIFGD was designed to integrate existing data from publications, to improve the proportion of annotated genes, and to provide popular functional analysis tools in a convenient format for use by Setaria researchers. Functional analysis modules, the major components of SIFGD, are useful for studying biological processes such as regulation, signaling, and metabolism. SIFGD is a comprehensive database that provides search and analysis tools, which we hope will make a major contribution to research on this important model grass.
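The cis-element analysis used in the photosynthesis example can be sketched in a few lines. The code below translates motifs written with IUPAC codes (W = A/T, R = A/G) into regular expressions, counts their occurrences in a promoter, and scores over-representation in a gene set with a one-sided hypergeometric test. The letter does not state which significance statistic the SIFGD motif tool uses, so the hypergeometric test is an assumed, commonly used stand-in; the function names, the toy promoter, and the example counts are likewise hypothetical.

```python
import re
from math import comb

# IUPAC codes needed for the motifs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'WAAAG' into a compiled regex.

    The lookahead wrapper allows overlapping occurrences to be counted.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def count_motif(promoter, motif):
    """Count forward-strand occurrences of a motif in one promoter.

    Assumption: only the given strand is scanned; a fuller analysis
    would also scan the reverse complement.
    """
    return sum(1 for _ in motif_to_regex(motif).finditer(promoter.upper()))


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k motif-carrying promoters
    in a set of n genes, when K of all N promoters carry the motif."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical toy promoter, not a real S. italica sequence.
toy_promoter = "CCTAAAGTTAGCGACCAAAAG"
print(count_motif(toy_promoter, "WAAAG"))  # 2
print(count_motif(toy_promoter, "RGCGR"))  # 1

# Hypothetical counts: 3 of 4 selected genes carry the motif,
# against 5 motif-carrying promoters among 20 genes overall.
print(hypergeom_pvalue(3, 4, 5, 20))
```

In practice the background would be all annotated promoters in the genome, and the resulting p-values would need correction for the number of motifs tested.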
SUPPLEMENTAL INFORMATION
Supplemental Information is available at Molecular Plant Online.

FUNDING
This work was supported by grants from the Ministry of Science and Technology of China (31371291 and 2012CB215301) and the Ministry of Education of China (NCET-09-0735).

ACKNOWLEDGMENTS
No conflict of interest declared.

Received: September 29, 2014
Revised: December 26, 2014
Accepted: February 1, 2015
Published: February 10, 2015
Qi You, Liwei Zhang, Xin Yi, Zhenghai Zhang, Wengying Xu*, and Zhen Su*
State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
*Correspondence: Zhen Su ([email protected]), Wengying Xu ([email protected])
http://dx.doi.org/10.1016/j.molp.2015.02.001
REFERENCES
Barton, L., Newsome, S.D., Chen, F.H., Wang, H., Guilderson, T.P., and Bettinger, R.L. (2009). Agricultural origins and the isotopic identity of domestication in northern China. Proc. Natl. Acad. Sci. USA 106:5523–5528.
Bennetzen, J.L., Schmutz, J., Wang, H., Percifield, R., Hawkins, J., Pontaroli, A.C., Estep, M., Feng, L., Vaughn, J.N., Grimwood, J., et al. (2012). Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 30:555–561.
Figure 1 (legend continued). (D) The BLAST search page. (E) The gbrowse_syn page for investigating CDS synteny and genomic synteny relationships between two genome versions (Bennetzen et al., 2012; Zhang et al., 2012). (F) The genome browser page for integrating gene structure, ESTs, microRNAs, RNA-seq, and miRNA-seq data. (G) An information page showing one microRNA with its sequence, location, target genes, and secondary structure. (H) A table listing information for six gene families, including carbohydrate-active enzymes, transcriptional regulators, kinases, ubiquitin proteasome components, cytochrome P450 enzymes, and EAR motif-containing proteins. (I) Detail for the ubiquitin subfamily (UBC), including gene ID, prediction method, and Pfam domain. (J) An example of an information page for one gene of interest, containing several types of information such as location, orthologous genes, gene family designation, domains, associated pathway(s), and expression profiles. (K) An example of a page for one metabolic pathway showing the participating enzymes and wiring diagrams.
Bettinger, R.L., Barton, L., and Morgan, C. (2010). The origins of food production in north China: a different kind of agricultural revolution. Evol. Anthropol. Issues News Rev. 19:9–21.
Doust, A.N., Kellogg, E.A., Devos, K.M., and Bennetzen, J.L. (2009). Foxtail millet: a sequence-driven grass model system. Plant Physiol. 149:137–141.
Eveland, A.L., Goldshmidt, A., Pautler, M., Morohashi, K., Liseron-Monfils, C., Lewis, M.W., Kumari, S., Hiraga, S., Yang, F., Unger-Wallace, E., et al. (2014). Regulatory modules controlling maize inflorescence architecture. Genome Res. 24:431–443.
Jiang, L., Liu, X., Xiong, G., Liu, H., Chen, F., Wang, L., Meng, X., Liu, G., Yu, H., Yuan, Y., et al. (2013). DWARF 53 acts as a repressor of strigolactone signalling in rice. Nature 504:401–405.
Kagale, S., and Rozwadowski, K. (2011). EAR motif-mediated transcriptional repression in plants: an underlying mechanism for epigenetic regulation of gene expression. Epigenetics 6:141–146.
Li, P., and Brutnell, T.P. (2011). Setaria viridis and Setaria italica, model genetic systems for the Panicoid grasses. J. Exp. Bot. 62:3031–3037.
Wang, L., Czedik-Eysenberg, A., Mertz, R.A., Si, Y., Tohge, T., Nunes-Nesi, A., Arrivault, S., Dedow, L.K., Bryant, D.W., Zhou, W., et al. (2014). Comparative analyses of C4 and C3 photosynthesis in developing leaves of maize and rice. Nat. Biotechnol. 32:1158–1165.
Yi, F., Xie, S., Liu, Y., Qi, X., and Yu, J. (2013). Genome-wide characterization of microRNA in foxtail millet (Setaria italica). BMC Plant Biol. 13:212.
Zhang, G., Liu, X., Quan, Z., Cheng, S., Xu, X., Pan, S., Xie, M., Zeng, P., Yue, Z., Wang, W., et al. (2012). Genome sequence of foxtail millet (Setaria italica) provides insights into grass evolution and biofuel potential. Nat. Biotechnol. 30:549–554.