Rice Information GateWay: A Comprehensive Bioinformatics Platform for Indica Rice Genomes

Rice Information GateWay: A Comprehensive Bioinformatics Platform for Indica Rice Genomes

Accepted Manuscript Rice Information GateWay (RIGW): A Comprehensive Bioinformatics Platform for Indica Rice Genomes Jia-Ming Song, Yang Lei, Cheng-Ch...

3MB Sizes 2 Downloads 44 Views

Accepted Manuscript Rice Information GateWay (RIGW): A Comprehensive Bioinformatics Platform for Indica Rice Genomes Jia-Ming Song, Yang Lei, Cheng-Cheng Shu, Yuduan Ding, Feng Xing, Hao Liu, Jia Wang, Weibo Xie, Jianwei Zhang, Ling-Ling Chen PII: DOI: Reference:

S1674-2052(17)30302-7 10.1016/j.molp.2017.10.003 MOLP 531

To appear in: MOLECULAR PLANT Accepted Date: 8 October 2017

Please cite this article as: Song J.-M., Lei Y., Shu C.-C., Ding Y., Xing F., Liu H., Wang J., Xie W., Zhang J., and Chen L.-L. (2017). Rice Information GateWay (RIGW): A Comprehensive Bioinformatics Platform for Indica Rice Genomes. Mol. Plant. doi: 10.1016/j.molp.2017.10.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. All studies published in MOLECULAR PLANT are embargoed until 3PM ET of the day they are published as corrected proofs on-line. Studies cannot be publicized as accepted manuscripts or uncorrected proofs.

ACCEPTED MANUSCRIPT

1

Rice Information GateWay (RIGW): a comprehensive

2

bioinformatics platform for indica rice genomes

RI PT

3 4

Jia-Ming Song1, 2, Yang Lei1, 2, Cheng-Cheng Shu1, 2, Yuduan Ding1, 2, Feng

6

Xing1,2, Hao Liu1, Jia Wang2, Weibo Xie1, 2, Jianwei Zhang1*, Ling-Ling Chen1, 2*

SC

5

7 1

National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural

M AN U

8

University, Wuhan 430070, P. R. China

9

11

2

Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

TE D

10

12 13

EP

14

AC C

15 16

* Correspondence: Ling-Ling Chen or Jianwei Zhang

17

Ling-Ling Chen

E-mail: [email protected]

18

Jianwei Zhang

E-mail: [email protected]

19 20 21

Running title: A bioinformatics platform for indica rice

22 23 1

ACCEPTED MANUSCRIPT 24

Dear Editor,

25

Oryza sativa subsp. indica and O. sativa subsp. japonica are two subspecies of

27

Asian cultivated rice with the fact that indica rice is much more widely grown and

28

genetically diverse. Over the past years, the Rice Annotation Project Database

29

(RAP-DB) (Ohyanagi et al., 2006) and Michigan State University Rice Genome

30

Annotation Project (MSU-RGAP) (Ouyang et al., 2007) are two popular databases

31

that have been developed to manage rice genomic and transcriptomic data based on a

32

unified reference genome of japonica cultivar Nipponbare (International Rice

33

Genome Sequencing Project, 2005). Beijing Genomics Institute Rice Information

34

System (BGI-RIS) (Zhao et al, 2004) is an available resource for indica rice cultivar

35

93-11, however, its application was limited due to the lack of high-quality indica

36

reference genomes. To fill the gap, we constructed an integrative and comprehensive

37

platform, Rice Information GateWay (RIGW, http://rice.hzau.edu.cn/), to provide

38

genomics, transcriptomics, protein-protein interactions (PPIs), metabolic network,

39

metabolites and computational tools by using our newly obtained map-based

40

reference genomes of indica rice Zhenshan 97 (ZS97) and Minghui 63 (MH63)

41

(Zhang et al., 2016). RIGW serves the rice community by making a wealth of

42

genomics and other omics data through an intuitive web-based interface.

TE D

M AN U

SC

RI PT

26

RIGW was implemented in Linux operation system and Apache Tomcat web

44

server (http://tomcat.apache.org/). All the genomic data, annotation, homologs, gene

45

expression, PPIs, metabolites and literatures were organized and stored in MySQL

46

database (http://www.mysql.com/). The architecture, some representative resources

47

and computational tools in RIGW were shown in Figure 1A. A local GBrowse

48

(https://github.com/GMOD/GBrowse) was deployed for visualization of ZS97 and

49

MH63 genomic and transcriptomic data (Figure 1B), the selected tracks included gene

50

annotation and RNA-sequencing evidences in flag leaf, panicle and shoot,

51

respectively. In addition, comparative analysis among ZS97, MH63 and Nipponbare

52

genomes were supplied with the tool of Gbrowse_synteny (Figure 1C), all the

53

syntenic regions with corresponding annotations can be easily displayed in parallel

54

(each genome can be set as a reference).

AC C

EP

43

55

We developed a flexible query interface to retrieve and graphically visualize

56

various data efficiently. For example, a keyword-based search engine was provided to 2

ACCEPTED MANUSCRIPT look for all relevant genes by entering keywords (e.g. gene locus, gene function) and

58

linked to detailed pages (e.g. gene location, gene structure, alternative splicing,

59

homologs in other rice cultivars, nucleotide and amino acid sequences, gene

60

expression levels, etc.). Additionally, Gene Ontology (Harris et al., 2004), InterPro

61

domain information, predicted subcellular location and protein-protein interaction, as

62

well as extra links to external databases, were also listed in the search results if

63

available (Figure 1D). Basic Local Alignment Search Tool (BLAST) were supplied as

64

a sequence-based search engine to find homologs in indica rice ZS97, MH63, 93-11

65

and japonica rice Nipponbare by presenting alignment results in graphical and text

66

format.

SC

RI PT

57

Characteristics of ZS97 and MH63 genomic features were listed in Supplemental

68

Table 1 and RIGW homepage. We manually collected >2,000 cloned genes in

69

different rice cultivars with related references and >2,500 rice metabolites with

70

detailed annotation information. Using the data from CREP (http://crep.ncpgr.cn/), we

71

set up a friendly web interface to query and visualize gene expression levels of 39

72

tissues throughout the life cycle of ZS97, MH63 and their hybrid Shanyou 63 (SY63).

73

For a given gene, its expression profile in all available tissues was visualized in a box

74

plot, which greatly facilitated the investigation of its expression pattern. As

75

genome-wide PPI networks are very useful to study the cellular behavior with a

76

global view, we collected 1,871,563 non-redundant rice PPIs (929 of them are

77

experimentally determined PPIs) from public databases including PRIN (Gu et al.,

78

2011), RiceNet (Lee et al., 2015) and related literatures in RIGW. Users can submit

79

one or more gene IDs of ZS97/ MH63/Nipponbare on the PPI search page, then the

80

server will return proteins that interact with the query proteins, which can help to

81

reveal the relationships among different kinds of proteins with various functions. The

82

query protein and its interaction partners were visualized with Cytoscape

83

(http://www.cytoscape.org/) and different color nodes represented different pathway

84

classifications (Figure 1E). In addition, all the PPIs in Nipponbare, ZS97 and MH63

85

can be downloaded from ‘download’ module.

AC C

EP

TE D

M AN U

67

86

KEGG metabolic pathway maps are graphical diagrams representing knowledge

87

of reaction networks for metabolism, and each map summarizes experimental

88

evidence in published literatures (Kanehisa et al., 2012). Based on KEGG Orthology

89

(KO) groups, we obtained the KEGG orthologs in ZS97 and MH63 genomes, and

90

generated their metabolic pathways. KEGG modules in each pathway map can be 3

ACCEPTED MANUSCRIPT generated by converting nodes to gene identifiers and highlighted in green. Metabolic

92

pathways in ZS97 and MH63 contained four categories (i.e., metabolism, genetic

93

information processing, environmental information processing and cellular processes)

94

and each category contained many pathways. When a specific pathway was selected,

95

the enzymes/proteins that had KEGG orthologs in ZS97 and MH63 were indicated

96

with green (Figure 1F).

RI PT

91

A series of computational tools were integrated in RIGW for comparative,

98

evolutionary and functional analysis of rice and other plants. OrthoMCL (Li et al.,

99

2003) was used to identify homologs in plant genomes including Arabidopsis,

100

Brachypodium, maize, grapevine and sorghum. We performed OrthoMCL (e-value:

101

1e-5) to identify putative orthology and inparalogy relationships and generated

102

disjoint clusters of closely related proteins in rice and the above plants. A total of

103

48,515 putative orthologous groups including in-paralogs were identified and stored

104

in RIGW, which can also be obtained from ‘download’ module. We defined

105

homologous gene pairs with MCscanX (Wang et al., 2012) among chromosomes

106

(e-value < 1e-10) and determined the homologous genic blocks to show the segmental

107

duplicated regions in ZS97 and MH63 genomes (Figure 1G). We also provided a gene

108

ID conversion tool to convert the orthologous gene IDs among ZS97, MH63, 93-11

109

and Nipponbare. Furthermore, KEGG/GO enrichment, GO slim classification tools

110

were supplied to perform functional enrichment analysis. For the convenience of

111

genome editing in different rice cultivars, we integrated CRISPR-P 2.0 (Liu et al.,

112

2017) to design guide RNA sequences for various Clustered Regularly Interspaced

113

Short Palindromic Repeats (CRISPR)-Cas systems, and an example result was shown

114

in Figure 1H. Lastly, a text mining tool was available in RIGW, which allows a user

115

to search papers by gene names or keywords in 27,831 rice-related literatures

116

obtained from PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) (Figure 1I).

M AN U

TE D

EP

AC C

117

SC

97

In summary, we established a comprehensive bioinformatics platform RIGW to

118

provide GBrowse-based view of ZS97 and MH63 genomic and other omics data.

119

RIGW offered homologs among indica and japonica rice, and other plant species. We

120

also provided user-friendly web interfaces to show the predicted PPIs in rice, the

121

metabolic pathways in ZS97/MH63, CRISPR-Cas single guide RNA design tool and

122

GO enrichment in RIGW. All the genomic sequences and annotation could be freely

123

accessed, and useful links to other public databases were offered. In the near further,

124

we will integrate more available resources and extend its functionality with new tools 4

ACCEPTED MANUSCRIPT 125

to make RIGW substantially server the rice community as a comprehensive

126

bioinformatics platform. RIGW is freely available at http://rice.hzau.edu.cn/.

127 128

SUPPLEMENTAL INFORMATION

129

Supplemental Information is available ate Molecular Plant Online.

RI PT

130

FUNDING

132

This work was supported by the National Key Research and Development Program of

133

China (2016YFD0100904), the National Natural Science Foundation of China

134

(31571351) and the National Science Foundation of Hubei Province (2015CFA044).

135

No conflict of interest declared.

M AN U

136

SC

131

137

AUTHOR CONTRIBUTIONS

138

J.M.S. and Y.L. constructed the website; J.M.S., C.C.S., Y.D., F.X., H.L., J.W. and

139

W.X. collected the data; L.L.C. and J.Z. designed the project and wrote the paper.

140

ACKNOWLEDGMENTS

142

No conflict of interest declared.

143

TE D

141

REFERENCES

145 146 147 148 149 150 151 152 153 154 155 156 157 158 159

Gu, H., Zhu, P., Jiao, Y., Meng, Y., and Chen, M. (2011). PRIN: a predicted rice interactome

EP

144

network. BMC Bioinformatics 12: 161.

AC C

Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., et al. (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32: D258-261.

International Rice Genome Sequencing Project. (2005). The map-based sequence of the rice genome. Nature 436: 793-800.

Kanehisa, M., Goto, S., Sato, Y., Furumichi, M., and Tanabe, M. (2012). KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40: D109-114. Lee, T., Oh, T., Yang, S., Shin, J., Hwang, S., Kim, C.Y., Kim, H., Shim, H., Shim, J.E., Ronald, P.C., Lee, I. (2015). RiceNet v2: an improved network prioritization server for rice genes. Nucleic Acids Res. 43: W122-127. Li, L., Stoeckert, C.J. Jr., and Roos, D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13: 2178-2189. Liu, H., Ding, Y., Zhou, Y., Jin, W., Xie, K., and Chen, L.L. (2017). CRISPR-P 2.0: an improved 5

ACCEPTED MANUSCRIPT CRISPR-Cas9 tool for genome editing in plants. Mol. Plant 10: 530-532. Ohyanagi, H., Tanaka, T., Sakai, H., Shigemoto, Y., Yamaguchi, K., Habara, T., Fujii, Y., Antonio, B.A., Nagamura, Y., Imanishi, T., et al. (2006). The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res. 34: D741-744. Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, improvements and new features. Nucleic Acids Res. 35: D846-851.

RI PT

R.L., Lee, Y., Zheng, L., et al. (2007). The TIGR Rice Genome Annotation Resource: Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H., Marler, B., Guo, H., et al. (2012). MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40: e49.

Zhang, J., Chen, L.L., Xing, F., Kudrna, D.A., Yao, W., Copetti, D., Mu, T., Li, W., Song, J.M.,

SC

Xie, W., et al. (2016). Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63. Proc. Natl. Acad. Sci. USA 113: E5163-5171. Zhao, W., Wang, J., He, X., Huang, X., Jiao, Y., Dai, M., Wei, S., Fu, J., Chen, Y., Ren, X., et al. (2004) BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics. Nucleic Acids Res. 32: D377-382.

M AN U

160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177

AC C

EP

TE D

178

6

ACCEPTED MANUSCRIPT Figure Legends

180

Figure 1. The RIGW architecture and some screenshots of representative resources

181

and computational tools.

182

(A) The three-layer architecture of RIGW. (B) GBrowse. (C) GBrowse-synteny. (D)

183

Gene information page. (E) Predicted PPIs. (F) KEGG pathway. (G) Segmental

184

duplications in MH63 genome. (H) Single guide RNAs designed with CRISPR-P 2.0.

185

(I) Literature search results.

RI PT

179

AC C

EP

TE D

M AN U

SC

186

7

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT