Accepted Manuscript Rice Information GateWay (RIGW): A Comprehensive Bioinformatics Platform for Indica Rice Genomes Jia-Ming Song, Yang Lei, Cheng-Cheng Shu, Yuduan Ding, Feng Xing, Hao Liu, Jia Wang, Weibo Xie, Jianwei Zhang, Ling-Ling Chen PII: DOI: Reference:
S1674-2052(17)30302-7 10.1016/j.molp.2017.10.003 MOLP 531
To appear in: MOLECULAR PLANT Accepted Date: 8 October 2017
Please cite this article as: Song J.-M., Lei Y., Shu C.-C., Ding Y., Xing F., Liu H., Wang J., Xie W., Zhang J., and Chen L.-L. (2017). Rice Information GateWay (RIGW): A Comprehensive Bioinformatics Platform for Indica Rice Genomes. Mol. Plant. doi: 10.1016/j.molp.2017.10.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. All studies published in MOLECULAR PLANT are embargoed until 3PM ET of the day they are published as corrected proofs on-line. Studies cannot be publicized as accepted manuscripts or uncorrected proofs.
ACCEPTED MANUSCRIPT
1
Rice Information GateWay (RIGW): a comprehensive
2
bioinformatics platform for indica rice genomes
RI PT
3 4
Jia-Ming Song1, 2, Yang Lei1, 2, Cheng-Cheng Shu1, 2, Yuduan Ding1, 2, Feng
6
Xing1,2, Hao Liu1, Jia Wang2, Weibo Xie1, 2, Jianwei Zhang1*, Ling-Ling Chen1, 2*
SC
5
7 1
National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural
M AN U
8
University, Wuhan 430070, P. R. China
9
11
2
Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
TE D
10
12 13
EP
14
AC C
15 16
* Correspondence: Ling-Ling Chen or Jianwei Zhang
17
Ling-Ling Chen
E-mail:
[email protected]
18
Jianwei Zhang
E-mail:
[email protected]
19 20 21
Running title: A bioinformatics platform for indica rice
22 23 1
ACCEPTED MANUSCRIPT 24
Dear Editor,
25
Oryza sativa subsp. indica and O. sativa subsp. japonica are two subspecies of
27
Asian cultivated rice with the fact that indica rice is much more widely grown and
28
genetically diverse. Over the past years, the Rice Annotation Project Database
29
(RAP-DB) (Ohyanagi et al., 2006) and Michigan State University Rice Genome
30
Annotation Project (MSU-RGAP) (Ouyang et al., 2007) are two popular databases
31
that have been developed to manage rice genomic and transcriptomic data based on a
32
unified reference genome of japonica cultivar Nipponbare (International Rice
33
Genome Sequencing Project, 2005). Beijing Genomics Institute Rice Information
34
System (BGI-RIS) (Zhao et al, 2004) is an available resource for indica rice cultivar
35
93-11, however, its application was limited due to the lack of high-quality indica
36
reference genomes. To fill the gap, we constructed an integrative and comprehensive
37
platform, Rice Information GateWay (RIGW, http://rice.hzau.edu.cn/), to provide
38
genomics, transcriptomics, protein-protein interactions (PPIs), metabolic network,
39
metabolites and computational tools by using our newly obtained map-based
40
reference genomes of indica rice Zhenshan 97 (ZS97) and Minghui 63 (MH63)
41
(Zhang et al., 2016). RIGW serves the rice community by making a wealth of
42
genomics and other omics data through an intuitive web-based interface.
TE D
M AN U
SC
RI PT
26
RIGW was implemented in Linux operation system and Apache Tomcat web
44
server (http://tomcat.apache.org/). All the genomic data, annotation, homologs, gene
45
expression, PPIs, metabolites and literatures were organized and stored in MySQL
46
database (http://www.mysql.com/). The architecture, some representative resources
47
and computational tools in RIGW were shown in Figure 1A. A local GBrowse
48
(https://github.com/GMOD/GBrowse) was deployed for visualization of ZS97 and
49
MH63 genomic and transcriptomic data (Figure 1B), the selected tracks included gene
50
annotation and RNA-sequencing evidences in flag leaf, panicle and shoot,
51
respectively. In addition, comparative analysis among ZS97, MH63 and Nipponbare
52
genomes were supplied with the tool of Gbrowse_synteny (Figure 1C), all the
53
syntenic regions with corresponding annotations can be easily displayed in parallel
54
(each genome can be set as a reference).
AC C
EP
43
55
We developed a flexible query interface to retrieve and graphically visualize
56
various data efficiently. For example, a keyword-based search engine was provided to 2
ACCEPTED MANUSCRIPT look for all relevant genes by entering keywords (e.g. gene locus, gene function) and
58
linked to detailed pages (e.g. gene location, gene structure, alternative splicing,
59
homologs in other rice cultivars, nucleotide and amino acid sequences, gene
60
expression levels, etc.). Additionally, Gene Ontology (Harris et al., 2004), InterPro
61
domain information, predicted subcellular location and protein-protein interaction, as
62
well as extra links to external databases, were also listed in the search results if
63
available (Figure 1D). Basic Local Alignment Search Tool (BLAST) were supplied as
64
a sequence-based search engine to find homologs in indica rice ZS97, MH63, 93-11
65
and japonica rice Nipponbare by presenting alignment results in graphical and text
66
format.
SC
RI PT
57
Characteristics of ZS97 and MH63 genomic features were listed in Supplemental
68
Table 1 and RIGW homepage. We manually collected >2,000 cloned genes in
69
different rice cultivars with related references and >2,500 rice metabolites with
70
detailed annotation information. Using the data from CREP (http://crep.ncpgr.cn/), we
71
set up a friendly web interface to query and visualize gene expression levels of 39
72
tissues throughout the life cycle of ZS97, MH63 and their hybrid Shanyou 63 (SY63).
73
For a given gene, its expression profile in all available tissues was visualized in a box
74
plot, which greatly facilitated the investigation of its expression pattern. As
75
genome-wide PPI networks are very useful to study the cellular behavior with a
76
global view, we collected 1,871,563 non-redundant rice PPIs (929 of them are
77
experimentally determined PPIs) from public databases including PRIN (Gu et al.,
78
2011), RiceNet (Lee et al., 2015) and related literatures in RIGW. Users can submit
79
one or more gene IDs of ZS97/ MH63/Nipponbare on the PPI search page, then the
80
server will return proteins that interact with the query proteins, which can help to
81
reveal the relationships among different kinds of proteins with various functions. The
82
query protein and its interaction partners were visualized with Cytoscape
83
(http://www.cytoscape.org/) and different color nodes represented different pathway
84
classifications (Figure 1E). In addition, all the PPIs in Nipponbare, ZS97 and MH63
85
can be downloaded from ‘download’ module.
AC C
EP
TE D
M AN U
67
86
KEGG metabolic pathway maps are graphical diagrams representing knowledge
87
of reaction networks for metabolism, and each map summarizes experimental
88
evidence in published literatures (Kanehisa et al., 2012). Based on KEGG Orthology
89
(KO) groups, we obtained the KEGG orthologs in ZS97 and MH63 genomes, and
90
generated their metabolic pathways. KEGG modules in each pathway map can be 3
ACCEPTED MANUSCRIPT generated by converting nodes to gene identifiers and highlighted in green. Metabolic
92
pathways in ZS97 and MH63 contained four categories (i.e., metabolism, genetic
93
information processing, environmental information processing and cellular processes)
94
and each category contained many pathways. When a specific pathway was selected,
95
the enzymes/proteins that had KEGG orthologs in ZS97 and MH63 were indicated
96
with green (Figure 1F).
RI PT
91
A series of computational tools were integrated in RIGW for comparative,
98
evolutionary and functional analysis of rice and other plants. OrthoMCL (Li et al.,
99
2003) was used to identify homologs in plant genomes including Arabidopsis,
100
Brachypodium, maize, grapevine and sorghum. We performed OrthoMCL (e-value:
101
1e-5) to identify putative orthology and inparalogy relationships and generated
102
disjoint clusters of closely related proteins in rice and the above plants. A total of
103
48,515 putative orthologous groups including in-paralogs were identified and stored
104
in RIGW, which can also be obtained from ‘download’ module. We defined
105
homologous gene pairs with MCscanX (Wang et al., 2012) among chromosomes
106
(e-value < 1e-10) and determined the homologous genic blocks to show the segmental
107
duplicated regions in ZS97 and MH63 genomes (Figure 1G). We also provided a gene
108
ID conversion tool to convert the orthologous gene IDs among ZS97, MH63, 93-11
109
and Nipponbare. Furthermore, KEGG/GO enrichment, GO slim classification tools
110
were supplied to perform functional enrichment analysis. For the convenience of
111
genome editing in different rice cultivars, we integrated CRISPR-P 2.0 (Liu et al.,
112
2017) to design guide RNA sequences for various Clustered Regularly Interspaced
113
Short Palindromic Repeats (CRISPR)-Cas systems, and an example result was shown
114
in Figure 1H. Lastly, a text mining tool was available in RIGW, which allows a user
115
to search papers by gene names or keywords in 27,831 rice-related literatures
116
obtained from PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) (Figure 1I).
M AN U
TE D
EP
AC C
117
SC
97
In summary, we established a comprehensive bioinformatics platform RIGW to
118
provide GBrowse-based view of ZS97 and MH63 genomic and other omics data.
119
RIGW offered homologs among indica and japonica rice, and other plant species. We
120
also provided user-friendly web interfaces to show the predicted PPIs in rice, the
121
metabolic pathways in ZS97/MH63, CRISPR-Cas single guide RNA design tool and
122
GO enrichment in RIGW. All the genomic sequences and annotation could be freely
123
accessed, and useful links to other public databases were offered. In the near further,
124
we will integrate more available resources and extend its functionality with new tools 4
ACCEPTED MANUSCRIPT 125
to make RIGW substantially server the rice community as a comprehensive
126
bioinformatics platform. RIGW is freely available at http://rice.hzau.edu.cn/.
127 128
SUPPLEMENTAL INFORMATION
129
Supplemental Information is available ate Molecular Plant Online.
RI PT
130
FUNDING
132
This work was supported by the National Key Research and Development Program of
133
China (2016YFD0100904), the National Natural Science Foundation of China
134
(31571351) and the National Science Foundation of Hubei Province (2015CFA044).
135
No conflict of interest declared.
M AN U
136
SC
131
137
AUTHOR CONTRIBUTIONS
138
J.M.S. and Y.L. constructed the website; J.M.S., C.C.S., Y.D., F.X., H.L., J.W. and
139
W.X. collected the data; L.L.C. and J.Z. designed the project and wrote the paper.
140
ACKNOWLEDGMENTS
142
No conflict of interest declared.
143
TE D
141
REFERENCES
145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
Gu, H., Zhu, P., Jiao, Y., Meng, Y., and Chen, M. (2011). PRIN: a predicted rice interactome
EP
144
network. BMC Bioinformatics 12: 161.
AC C
Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., et al. (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32: D258-261.
International Rice Genome Sequencing Project. (2005). The map-based sequence of the rice genome. Nature 436: 793-800.
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M., and Tanabe, M. (2012). KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40: D109-114. Lee, T., Oh, T., Yang, S., Shin, J., Hwang, S., Kim, C.Y., Kim, H., Shim, H., Shim, J.E., Ronald, P.C., Lee, I. (2015). RiceNet v2: an improved network prioritization server for rice genes. Nucleic Acids Res. 43: W122-127. Li, L., Stoeckert, C.J. Jr., and Roos, D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13: 2178-2189. Liu, H., Ding, Y., Zhou, Y., Jin, W., Xie, K., and Chen, L.L. (2017). CRISPR-P 2.0: an improved 5
ACCEPTED MANUSCRIPT CRISPR-Cas9 tool for genome editing in plants. Mol. Plant 10: 530-532. Ohyanagi, H., Tanaka, T., Sakai, H., Shigemoto, Y., Yamaguchi, K., Habara, T., Fujii, Y., Antonio, B.A., Nagamura, Y., Imanishi, T., et al. (2006). The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res. 34: D741-744. Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, improvements and new features. Nucleic Acids Res. 35: D846-851.
RI PT
R.L., Lee, Y., Zheng, L., et al. (2007). The TIGR Rice Genome Annotation Resource: Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H., Marler, B., Guo, H., et al. (2012). MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40: e49.
Zhang, J., Chen, L.L., Xing, F., Kudrna, D.A., Yao, W., Copetti, D., Mu, T., Li, W., Song, J.M.,
SC
Xie, W., et al. (2016). Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63. Proc. Natl. Acad. Sci. USA 113: E5163-5171. Zhao, W., Wang, J., He, X., Huang, X., Jiao, Y., Dai, M., Wei, S., Fu, J., Chen, Y., Ren, X., et al. (2004) BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics. Nucleic Acids Res. 32: D377-382.
M AN U
160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177
AC C
EP
TE D
178
6
ACCEPTED MANUSCRIPT Figure Legends
180
Figure 1. The RIGW architecture and some screenshots of representative resources
181
and computational tools.
182
(A) The three-layer architecture of RIGW. (B) GBrowse. (C) GBrowse-synteny. (D)
183
Gene information page. (E) Predicted PPIs. (F) KEGG pathway. (G) Segmental
184
duplications in MH63 genome. (H) Single guide RNAs designed with CRISPR-P 2.0.
185
(I) Literature search results.
RI PT
179
AC C
EP
TE D
M AN U
SC
186
7
AC C
EP
TE D
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT