The Chromosome-Based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis

The Chromosome-Based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis

Journal Pre-proof The Chromosome-based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis Jin Liu, Cong Shi...

4MB Sizes 2 Downloads 112 Views

Journal Pre-proof The Chromosome-based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis Jin Liu, Cong Shi, Cheng-Cheng Shi, Wei Li, Qun-Jie Zhang, Yun Zhang, Kui Li, Hui-Fang Lu, Chao Shi, Si-Tao Zhu, Zai-Yun Xiao, Hong Nan, Yao Yue, Xun-Ge Zhu, Yu Wu, Xiao-Ning Hong, Guang-Yi Fan, Yan Tong, Dan Zhang, Chang-Li Mao, Yun-Long Liu, Shi-Jie Hao, Wei-Qing Liu, Mei-Qi Lv, Hai-Bin Zhang, Yuan Liu, Ge-Ran Hu-tang, Jin-peng Wang, Jia-Hao Wang, Ying-Huai Sun, Shu-bang Ni, Wen-Bin Chen, Xing-Cai Zhang, Yuan-nian Jiao, Evan E. Eichler, Guo-hua Li, Xin Liu, Li-Zhi Gao PII: DOI: Reference:

S1674-2052(19)30402-2 https://doi.org/10.1016/j.molp.2019.10.017 MOLP 865

To appear in: MOLECULAR PLANT Accepted Date: 30 October 2019

Please cite this article as: Liu J., Shi C., Shi C.-C., Li W., Zhang Q.-J., Zhang Y., Li K., Lu H.-F., Shi C., Zhu S.-T., Xiao Z.-Y., Nan H., Yue Y., Zhu X.-G., Wu Y., Hong X.-N., Fan G.-Y., Tong Y., Zhang D., Mao C.-L., Liu Y.-L., Hao S.-J., Liu W.-Q., Lv M.-Q., Zhang H.-B., Liu Y., Hu-tang G.-R., Wang J.-p., Wang J.-H., Sun Y.-H., Ni S.-b., Chen W.-B., Zhang X.-C., Jiao Y.-n., Eichler E.E., Li G.-h., Liu X., and Gao L.-Z. (2020). The Chromosome-based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis. Mol. Plant. doi: https://doi.org/10.1016/ j.molp.2019.10.017. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. All studies published in MOLECULAR PLANT are embargoed until 3PM ET of the day they are published as corrected proofs on-line. Studies cannot be publicized as accepted manuscripts or uncorrected proofs.

© 2019 The Author

1 2 3

The Chromosome-based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis

4 5

Jin Liu1,2,13, Cong Shi2,3,13, Cheng-Cheng Shi4,13, Wei Li5,13, Qun-Jie Zhang5, 13, Yun

6

Zhang6, Kui Li7,8, Hui-Fang Lu9, Chao Shi2, Si-Tao Zhu4, Zai-Yun Xiao1, Hong

7

Nan2,3, Yao Yue4, Xun-Ge Zhu2,3, Yu Wu1, Xiao-Ning Hong4, Guang-Yi Fan4,9, Yan

8

Tong2, Dan Zhang5, Chang-Li Mao1, Yun-Long Liu2, Shi-Jie Hao4, Wei-Qing Liu9,

9

Mei-Qi Lv4, Hai-Bin Zhang2, Yuan Liu2, Ge-Ran Hu-tang2,3, Jin-peng Wang3,10, Jia-

10

Hao Wang4, Ying-Huai Sun9, Shu-bang Ni1, Wen-Bin Chen9, Xing-Cai Zhang11, Yuan-nian Jiao10, Evan E. Eichler12, Guo-hua Li1, Xin Liu4,9,*, and Li-Zhi Gao2,5,*

11 12 13

1

Yunnan Institute of Tropical Crops, Jinghong 666100, China

14

2

Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in

15

Southwestern China, Kunming Institute of Botany, Chinese Academy of Sciences,

16

Kunming 650204, China

17

3

University of Chinese Academy of Sciences, Beijing 100049, China

18

4

BGI-Qingdao, Qingdao 266555, China

19

5

Institution of Genomics and Bioinformatics, South China Agricultural University,

20

Guangzhou 510642, China

21

6

22

University, Kunming 650224, China

23

7

School of Life Sciences, Nanjing University, Nanjing 210023, China

24

8

Novogene Bioinformatics Institute, Beijing 100083, China

25

9

BGI-Shenzhen, Shenzhen 518083, China

26

10

27

Chinese Academy of Sciences, Beijing100093, China

28

11

29

Cambridge, MA 02138, USA

30

12

31

USA

32

13

33

*

34

([email protected])

Asia-Pacific Tropical Forestry Germplasm Institution, Southwest China Forestry

State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany,

John A. Paulson School of Engineering and Applied Sciences, Harvard University,

Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195,

These authors contributed equally to this article.

Correspondence: Li-Zhi Gao ([email protected]) or Xin Liu

35

SHORT SUMMARY

36

This study reports a chromosome-based genome for rubber tree and provides novel

37

insight into the spurge chromosome evolution. Moreover, it is found that significant

38

expansion of genes associated with whole rubber biosynthesis processes relevant to

39

the latex production. A number of candidate domestication genes are also identified,

40

some of which are associated with the rubber biosynthesis in the cultivated rubber

41

trees.

42

43

ABSTRACT

44

The rubber tree, Hevea brasiliensis, produces natural rubber that serves as an essential

45

industrial raw material. Here, we present a high-quality reference genome for a rubber

46

tree cultivar GT1 using single-molecule real-time sequencing (SMRT) and Hi-C

47

technologies to anchor the ~1.47-Gb genome assembly into 18 pseudo-chromosomes.

48

The chromosome-based genome analysis enabled us to establish a model of spurge

49

chromosome evolution since the common paleopolyploid event occurred before the

50

split of Hevea and Manihot. We show recent and rapid bursts of the three Hevea-

51

specific LTR-retrotransposon families during the last ten million years (MYR),

52

leading to the massive expansion ~60.44% (~890 Mbp) of the whole rubber tree

53

genome since the divergence from Manihot. We identify large-scale expansion of

54

genes associated with whole rubber biosynthesis processes, such as basal metabolic

55

processes, ethylene biosynthesis, and the activation of polysaccharide and

56

glycoprotein lectin, which are important properties for the latex production. We report

57

the first map of genomic variation between the cultivated and wild rubber trees and

58

obtained ~15.7 million high-quality single nucleotide polymorphisms (SNPs). We

59

identified hundreds of candidate domestication genes with drastically lowered

60

genomic diversity in the cultivated but not wild rubber trees despite a relatively short

61

domestication history, some of which are involved in the rubber biosynthesis. This

62

genome assembly represents key resources for future rubber tree breeding programs

63

providing novel gene targets to improve biotic and abiotic tolerance and rubber

64

production.

65 66

Key words: rubber tree, rubber biosynthesis, chromosome evolution, whole genome

67

duplication, domestication

68 69

70

INTRODUCTION

71

The rubber tree (Hevea brasiliensis (Willd. ex A. Juss.) Muell. Arg.), together with a

72

number of other economically important plant species, such as castor bean (Ricinus

73

communis L.), cassava (Manihot esculenta Crantz), and Barbados nut (Jatropha

74

curcas), belong to the spurge family (Euphorbiaceae). Among ~2,500 latex-yielding

75

plants, such as Eucommia ulmoides Oliver (Wuyun et al., 2018) and Taraxacum kok-

76

saghyz Rodin (Lin et al., 2018), only H. brasiliensis produces commercially viable

77

amount of natural rubber (cis-1, 4-polyisoprene), making up more than 98% of the

78

world’s natural rubber production (Backhaus, 1985; Bowers, 1990). These high-

79

quality isoprenoid polymers possess unique physical and chemical properties that are

80

incomparable to any synthetic alternatives (van Beilen and Poirier, 2007). Natural

81

rubber is, thus, an indispensable source material for numerous rubber products

82

worldwide. In sharp contrast to the environmental pollution of the industrial

83

synthetics, the rubber tree is able to sustainably yield natural rubber that is still

84

imperative for worldwide high-performance engineering components, such as heavy-

85

duty tires. H. brasiliensis remains a longstanding target for genetic manipulation in

86

order to improve and industrially enhance the commercial production of natural

87

rubber.

88 89

H. brasiliensis is a cross-pollinated tropical tree that grows to 30–40 m tall and can

90

live up to 100 years in the wild (Priyadarshan and Clement-Demange, 2004).

91

Historical records have documented that the domestication of the rubber tree, which is

92

native to the Amazon Basin in South America, began in 1896, and then dispersed to

93

Southeast Asia with the transfer of H. brasiliensis seedlings, mainly including

94

Malaysia, Indonesia, Thailand, and China today (Chan, 2000). Over a century’s

95

traditional breeding has greatly increased rubber productivity from 650 kg ha–1

96

yielded from wild natural populations of H. brasiliensis during the 1920s to 2,500 kg

97

ha–1 harvested in elite cultivars by the 1990s (Priyadarshan and Goncalves, 2003),

98

which is far below the hypothetical yield of 7,000-12,000 kg ha–1 in the rubber tree

99

(Webster and Baulkwill, 1989). Notwithstanding the origin of the rubber tree from the

100

Amazonian basin, the production of natural rubber in South America has merely 2%

101

of the total production worldwide because of the destructive spread of South

102

American leaf blight (SALB) disease affected by ascomycete Microcyclus ulei in the

103

1930s (Lieberei, 2007). Thus, although human selection efforts that have brought a

104

slow increase in rubber productivity, it is even more urgently required to raise new H.

105

brasiliensis varieties with desirable traits, including the tolerance to cold, drought and

106

wind, and particularly resistance to diseases caused by pathogenic fungi.

107

Nevertheless, the domestication from a small number of wild individuals originating

108

from the Amazonian basin of South America created a severe bottleneck that has led

109

to a restricted gene pool for cultivated rubber tree germplasms. This stands in contrast

110

to the wealth of phenotypic diversity and genetic adaptations residing in natural wild

111

population. These have the potential to be exploited through breeding programs that

112

would help expand genomic diversity important for the generation of more

113

environmentally resilient and high-yielding varieties. The widespread application of

114

marker-assisted selection promises to advance the breeding efficiency but requires

115

access to a high-quality rubber tree genome sequence in order to accelerate the pace at

116

which the untapped reservoir of agronomically important genes can be exploited from

117

the wild.

118 119

Considering the tremendous economic importance of the rubber tree, there has been a

120

longstanding effort to obtain a high-quality reference genome assembly (Lau et al.,

121

2016; Pootakham et al., 2017; Rahman et al., 2013; Tang et al., 2016). The first draft

122

genome sequence of RRIM600 clone, generated by a Malaysia research team, was

123

very fragmented but provided our first glimpse of the genome structure (Rahman et al.,

124

2013). The second draft rubber tree genome of the cultivar Reyan7-33-97 was

125

reported based on sequence data from both whole-genome shotgun (WGS) and pooled

126

BAC clones. While this assembly was significantly improved with respect to

127

annotation, the use of short Illumina reads posed difficulties in resolving highly

128

heterozygous and repetitive DNA sequences (Tang et al., 2016). The RRIM 600

129

genome assembly was subsequently improved based on ~155-fold combined coverage

130

with Illumina and PacBio sequence data. As part of that effort, 100 SMRT Cells were

131

sequenced using a 10-kbp SMRTbell library yielding 45.25 Gb (~21-fold coverage)

132

with an average read length of 6,852 bp (Lau et al., 2016). Until recently, deep-

133

coverage 454/Illumina short-read and Pacific Biosciences (PacBio) long-read

134

sequence data were combined to generate a de novo hybrid assembly of the

135

preliminary BPM24 rubber tree draft genome, which was subsequently scaffolded

136

using a long-range “Chicago” technique to obtain the best assembly of 1.26Gb (N50=

137

96.8 kb) to date. Using a SNP-based genetic map, only ~28.9% of the genome

138

assembly (~363 Mb) was successfully anchored into rubber tree’s 18 linkage groups

139

(Pootakham et al., 2017). Notwithstanding the complexity of the rubber tree genome,

140

a high-quality chromosome-level genome assembly is required in order to provide a

141

comprehensive understanding of the latex biosynthesis, genetic improvement of

142

desirable agronomic traits and efficient utilization of introduced alleles from wild

143

populations to expand the genetic basis of the cultivated rubber tree.

144 145

Here, we present a high-quality genome sequence of the rubber tree cultivar GT1, an

146

elite cultivar worldwide cultivated based on the single-molecular long-read

147

sequencing (SMRT) technology. This assembly has a length of 1.47-Gb and spans

148

∼94% of the estimated genome size of 1.56-Gb. We anchored this assembly into 18

149

chromosomes and generated the first chromosome-scale genome assembly assisted by

150

Hi-C technology. Together with the data analysis of resequenced genomes and

151

comparative transcriptomics of cultivated and wild rubber trees, we provide novel

152

insights into gene and genome evolution, domestication and the latex biosynthesis in

153

the rubber tree.

154

RESULTS

155

Genome sequencing, assembly, and feature annotation

156

We first performed a whole-genome shotgun sequencing (WGS) analysis of a rubber

157

tree cultivar GT1 with the next-generation sequencing (NGS) Illumina Hiseq 2000

158

platform. This generated clean sequence datasets of ~348.14 Gb and yielded

159

approximately 261.4-fold coverage (Supplementary Table 1). Using a 17-mer

160

analysis method, we estimated the genome size to be ~1.56 Gb (Supplementary

161

Table 2; Supplementary Figure 1). The genome heterozygosity was estimated to be

162

1.60-1.62% using GenomeScope (Vurture et al., 2017). The genome was assembled

163

using SOAPdenovo (Li et al., 2010), resulting in the final assembly of ~1.59 Gb

164

(Supplementary Table 3). The contig and scaffold N50 lengths were ~8.79 Kb and

165

~31.3 Kb, respectively (Supplementary Table 3).

166 167

In order to resolve the repetitive structure and heterozygous regions (Lau et al., 2016;

168

Pootakham et al., 2017; Rahman et al., 2013; Tang et al., 2016), we also sequenced

169

the same GT1 individual using PacBio single-molecule long-read sequencing (SMRT)

170

sequencing platform. We generated a total of ~161.86 Gb (103.75-fold sequence

171

coverage) of long read sequence data from 20 Kb and 40 Kb insert libraries with

172

subread N50 lengths of 9.07 Kb and 18.34 Kb, respectively (Supplementary Table 4;

173

Supplementary Figure 2). We subsequently performed a PacBio-only assembly

174

using an overlap layout-consensus method implemented in FALCON (version 0.3.0)

175

(Chin et al., 2013), and obtained a 1.47-Gb genome assembly with a contig N50 of

176

152.7 Kb (Table 1; Supplementary Tables 5-6). This final assembly of the rubber

177

tree genome comprises 16,023 scaffolds, of which 15,885 scaffolds (>10 kb) represent

178

99.93% of the 1.56-Gb genome (Supplementary Tables 5-7). We employed ~30.9

179

Gb high-quality next-generation sequencing (NGS) data with 19.8-fold genome

180

coverage using the Illumina HiSeq 2500 platform to polish the assembled genome

181

(Supplementary Table 4). Next, ~119.63 Gb sufficiently high-quality Hi-C data

182

(76.69×) were used to further superscaffold the genome assembly (Supplementary

183

Table 8; Supplementary Figure 3). We finally obtained a reference genome of the

184

rubber tree on the chromosome level by anchoring ~1,442 Mbp-sized contigs into 18

185

chromosomes using AllHic v0.8.12 (Zhang et al., 2019) (Figure 1; Supplementary

186

Tables 9-10; Supplementary Figure 4). The total length of the assembled genome

187

sequences accounts for ~92.4% of the estimated genome size (Table 1), about four

188

times longer than the previously reported genome assembly of BPM24 that was linked

189

using a high-density SNP-based genetic map (Pootakham et al., 2015). The lengths of

190

18 chromosomes of the GT1 genome ranged from ~36 Mbp (Chr17) to ~104 Mbp

191

(Chr09) with an average size of ~80 Mbp (Figure 1; Supplementary Table 10;

192

Supplementary Figure 4).

193 194

To evaluate the quality of the assembled rubber tree genome, we first mapped ~61

195

Gbp Illumina short reads. Our results showed that more than 98% of NGS reads could

196

be unambiguously represented with an expected insert size distribution, indicating a

197

high confidence of genome scaffolding (Supplementary Table 11). We then aligned

198

51,701 expressed sequence tags (ESTs) retrieved from public databases and 102,235

199

unigenes assembled through RNA-sequencing (RNA-Seq) data of the six GT1 tissues

200

with our assembled genome, showing that ~97.24% and 97.14% of protein-coding

201

genes could be covered, respectively (Supplementary Tables 12-13). Finally, we

202

assessed core gene statistics using BUSCO (Simao et al., 2015) to verify the

203

sensitivity of gene prediction and the completeness and appropriate haplotig merging

204

of the genome assembly. Our predicted genes recovered 1,359 of the 1,440 (94.4%)

205

highly conserved core proteins in the Embryophyta lineage (Supplementary Table

206

14).

207 208

Whole genome comparisons of the GT1 genome with the three other rubber tree

209

genome assemblies showed that the ordered syntenic genomic sequences displayed a

210

good collinearity between GT1 and the three other rubber tree genomes; ~86-95% of

211

the GT1 genome sequences with lengths of >1 kbp were covered by any of the other

212

three genomes (Supplementary Table 15; Supplementary Figures 5-6). However,

213

the GT1 genome assembly using long-read PacBio sequencing data alone in this study

214

revealed closer syntenic relationships with hybrid assemblies of RRIM600 and

215

BMP24 with Illumina and PacBio sequencing data (Lau et al., 2016; Pootakham et al.,

216

2017) than the fairly fragmented genome assembly of Reyan7-33-97 using the

217

Illumina sequencing data alone (Tang et al., 2016). We annotated approximately

218

1042.42 Mbp (~70.82%) of repetitive sequences, among which the GT1 genome

219

comprised ~65.88% (~969.72 Mbp) of LTR retrotransposons (Supplementary Table

220

16). Total TE content of the GT1 genome is larger than the formerly reported genome

221

assemblies of the two rubber tree cultivars (Reyan7-33-97: 66.46%; RRIM600:

222

69.80%) and comparable to BPM24 (BPM24: 71.10%) (Supplementary Table 16),

223

indicating a high-quality performance to de novo assemble genomic regions

224

containing highly repetitive sequences using long PacBio reads (Supplementary

225

Figure 6).

226 227

Combining ab initio prediction and transcriptome sequence alignments from RNA-

228

Seq data of the six tissues (Supplementary Tables 17-18), we predicted a total of

229

44,187 protein-coding genes with an overall support of 96.76% (Table 1;

230

Supplementary Tables 19-20; Supplementary Figure 7a). Of them, 79.34%,

231

75.80%, 94.70% and 96.37% could be functioned with SwissProt, PFAM, TrEMBL

232

and Interpro databases, respectively (Supplementary Table 20; Supplementary

233

Figure 7b). We further performed homology searches and annotated noncoding RNA

234

(ncRNA) genes (Supplementary Table 21), yielding 945 transfer RNA (tRNA)

235

genes, 93 ribosomal RNA (rRNA) genes, 61 small nucleolar RNA (snoRNA) genes,

236

396 small nuclear RNA (snRNA) genes and 373 microRNA (miRNA) genes. We

237

annotated ~689,255 simple sequence repeats, which will provide valuable genetic

238

markers to assist future genetic improvemet of the rubber tree (Supplementary Table

239

22; Supplementary Figure 8).

240 241

Chromosomal evolution after a common paleotetraploidy in the spurge family

242

Chromosome-scale genome assembly ensures the detection of whole genome

243

duplication (WGD) events that affected chromosomal evolution in the rubber tree. We

244

examined the rubber tree genome for homeologous genomic segments based on

245

sequence similarities of paralogous genes. A total of 1,901 syntenic blocks containing

246

26,403 paralogous gene pairs were identified in the rubber tree genome

247

(Supplementary Table 23). Comparative analyses of paralogous gene pairs from

248

these intragenomic homeologous blocks unquestionably showed that rubber tree has

249

experienced two paleotetraploidization events corresponding to sequence divergence

250

peaks at ~0.124 and 0.530 synonymous transversions per site, respectively

251

(Supplementary Figure 9). Comparisons among paralogues and orthologues of the

252

five Malpighiales species show that a recent paleotetraploidy event occurred prior to

253

the split of the Hevea and Manihot lineages, and an ancient eurosid WGD was shared

254

by all these examined species (Supplementary Figure 9). This finding strongly

255

supports the hypothesis that a recent paleotetraploidy event took place before the

256

divergence of the Hevea and Manihot species but after the split of the castor bean

257

(Pootakham et al., 2017).

258 259

The rubber tree contains the same number of chromosomes (2n=36) as the cassava,

260

which is almost twice as many as that of castor bean (2n = 20) (Chan et al., 2010).

261

The genus Hevea is closest to Manihot in the spurge family (Euphorbiaceae), and they

262

diverged from each other approximately 36 Myr (Bredeson and Lyons, 2016;

263

Pootakham et al., 2017). The chromosome-based genome assembly we obtained for

264

GT1 permits us to identify the 1,901 syntenic blocks of the rubber tree genome by

265

inter-chromosomal comparisons, within which 26,403 paralogous gene pairs are

266

spread across the 18 chromosomes, falling into the five two-by-two chromosome pairs

267

and two four-by-four chromosome groups (Supplementary Table 23; Figure 1;

268

Figure 2a;Supplementary Figure 10). We similarly detected 38,833 orthologous

269

gene pairs corresponding to 1,829 conserved syntenic blocks between the rubber tree

270

and cassava genomes (Supplementary Table 23; Supplementary Figure 11). These

271

can also be divided into five two-by-two pairs and two four-by-four groups

272

(Supplementary Table 23; Supplementary Figure 12). Bredeson et al. (2016)

273

previously reported a similar pattern of conserved synteny comprising five groups of

274

two-by-two chromosome pairs and two chromosome groups of four-by-four as a

275

result of the paleotetraploidy in the cassava. These two chromosome-based high-

276

quality genome assembles ensured a reliable discovery of macrosynteny conservation

277

between the two euphorbs. Macrosynteny conservation of rubber tree and cassava that

278

comprise the same chromosome nui ombers further supports the hypothesis that a

279

whole genome duplication event occurred in the common ancestor of Hevea and

280

Manihot.

281 282

To trace the palaeohistory of euphorbs and understand the chromosomal evolution

283

after the polyploidization event, we performed a comparative genomic analysis of the

284

rubber tree with cassava (Bredeson et al., 2016) using the grape (Jaillon et al., 2007)

285

as the closest modern representative of the ancestral eudicot karyotype (AEK)

286

(Badouin et al., 2017; Salse, 2016). Since the recent WGD event occurred before the

287

split of the rubber tree and cassava lineages, these two paleopolyploid plants have

288

experienced diploidization through structural and functional changes, providing an

289

unprecedented opportunity to understand chromosomal evolution in spurge plants. We

290

reconstructed a model to infer the scenario of chromosomal evolution in rubber tree

291

and cassava based on genome assemblies at the chromosome level (Figure 2b). In the

292

rubber tree genome, Hbr04, Hbr08, Hbr10, and Hbr13 did not experience many

293

rearrangements while other chromosomes have suffered from a large number of

294

fission and fusion events (Figure 2b). In the cassava genome, Mes03 and Mes04

295

retained few regions derived from the eudicot ancestor compared with other

296

chromosomes (Figure 2b). Our comparative genomic analysis of the rubber tree and

297

cassava further identified a large number of genomic structural variation after a shared

298

polyploidization event (Supplementary Figure 13). The model of chromosomal

299

evolution for rubber tree and cassava reveals that substantial genomic rearrangement

300

events have extensively shaped the chromosome structure leading to the 18 modern

301

chromosomes since the common paleopolyploid ancestor. `

302 303

The three LTR retrotransposon families are drivers of the expanded rubber tree

304

genome

305

Using our chromosome-based genome assembly, we investigated the evolution of

306

LTR-retrotransposons and their potential contribution to the growth of the rubber tree

307

genome. The rubber tree has experienced a rapid growth of genome size as a result of

308

the Hevea-specific proliferation of transposable elements compared to the three other

309

closely related species of Euphorbiaceae, including cassava, castor bean and physic

310

nut (Table 1; Figure 3; Supplementary Tables 24-25; Supplementary Figures 14-

311

15). Retrotransposons in the rubber tree genome are particularly abundant (~1042.42

312

Mb; ~70.82%) compared to five other plant species of Malpighiales, including

313

Manihot esculenta, Jatropha curcas, Ricinus communis, Populus trichocapa and

314

Linum usitatissimum. Specifically, the rubber tree genome is enriched in LTR

315

retrotransposons (Ty1/copia, Ty3/gypsy and non-autonomous LTR retrotransposons)

316

with a total length of ~969.72 Mb (~65.88%) (Figure 3; Supplementary Figures 15-

317

16; Supplementary Tables 24-25). Dating transposable elements shows that

318

retrotransposons and DNA transposons were almost separately amplified among the

319

six sequenced plant species of Malpighiales (Supplementary Figure 14).

320

Comparative analyses of the Ty1/copia, Ty3/gypsy and other classes of LTR

321

retrotransposons suggest that they have experienced three retrotransposition bursts

322

over the last 10, 20 and 30 million years ago (mya), respectively (Supplementary

323

Figure 14). Ty3/gypsy LTR retrotransposon families dominate the rubber tree genome

324

contributing ~585.69 Mb (~39.79%) and are ~3.13-fold more abundant than

325

Ty1/copia with ~187.34 Mb (~12.73%) (Supplementary Tables 24-25; Figure 3a;

326

Supplementary Figure 15). We exclusively observed that the amplification of Tekay

327

(~430.60 Mb; ~29.25%) of Ty3/gypsy and Angela (~96.70 Mb; ~6.57%) of Ty1/copia

328

has largely contributed to the expansion of the rubber tree genome (Figure 3;

329

Supplementary Figure 15a, b). Surprisingly, the largest Ty3/gypsy retrotransposon

330

family HB0001 (~368.08 Mb; ~25.01%) belonging to Tekay has long been active over

331

the last 45 million years (myr). Whereas, the two Ty1/copia retrotransposon families,

332

HB0002 (~96.70 Mb; ~6.57%) and HB0003 (~62.52 Mb; ~4.25%) belonging to

333

Angela and Tekay, respectively, have recently proliferated over the last 10 MYR

334

(Supplementary Figure 15c; Supplementary Figure 17). It is these three LTR

335

retrotransposon families that have predominantly amplified during the last 10 MYR;

336

they together account for ~54.38% of LTR retrotransposons and ~35.83% of the

337

whole rubber tree genome (Figure 3; Supplementary Figure 15c; Supplementary

338

Figure 17; Supplementary Table 26). Further annotation and comparative analyses

339

of the repeated elements among the four Euphorbiaceae species (Supplementary

340

Figure 15c; Supplementary Figure 17), including M. esculenta, J. curcas, R.

341

communis and H. brasiliensis, suggests that only the H. brasiliensis genome has

342

undergone specific bursts of these three LTR-retrotransposon families during the past

343

5 MYR. This resulted in the accumulation of a large number of retroelements driving

344

the growth of ~890 Mb of the rubber tree genome after the divergence from the M.

345

esculenta lineage around 36 Myr (Supplementary Figure 17).

346 347

The rapid evolution of gene families and the rubber biosynthesis

348

Defining the rapidly evolving gene families among flowering plants has been helpful

349

to identify genomic basis underlying physiological changes of metabolite constituents

350

during evolution. We compared the predicted proteomes of the rubber tree, cassava,

351

castor bean, physic nut, poplar, flax, Arabidopsis thaliana, and rice, yielding a total of

352

24,562 orthologous gene families that comprised 225,207 genes (Supplementary

353

Table 27; Supplementary Figure 18a). This revealed a core set of 121,256 genes

354

belonging to 7,498 gene clusters that were shared among all eight plant species,

355

representing ancestral gene families (Supplementary Figure 18a). We obtained a

356

total of 1,352 gene clusters containing 4,791 genes unique to the rubber tree,

357

potentially related to biosynthetic processes associated with the production of latex.

358

PFAM functional analysis showed that gene functions are enriched in aspartic acid

359

proteinase gene family (PF16845, P < 0.05), which encodes enzymes associated with

360

the production of lutoid membranes necessary for the aggregation of rubber particles

361

(Supplementary Table 28). Remarkably, we found that the rubber tree-specific gene

362

families are significantly enriched in functions related to the phosphorylation activity,

363

which is key to biosynthetic processes of latex (PF08645, P < 0.05; PF03372, P <

364

0.001) (Supplementary Table 28).

365 366

In flowering plants, the expansion or contraction of gene families is an important

367

driver of phenotypic diversification and the formation of phytochemical properties

368

(Chen et al., 2013; Ohno, 1970). We characterized gene families that underwent

369

discernible changes and divergently evolved along different branches with a particular

370

emphasis on those involved in the latex biosynthesis of the rubber tree. Our results

371

revealed that, of the 14,253 gene families inferred to be present in the most recent

372

common ancestor of the eight studied plant species, 5,034 gene families comprising

373

14,103 genes show significant expansions (P < 0.05) in the rubber tree lineage

374

(Supplementary Figure 18b; Supplementary Figure 19). Functional enrichment

375

analyses of these genes by both Gene Ontology (GO) terms and PFAM domains

376

surprisingly reveals that they were mainly enriched in a number of functional

377

categories involved in whole latex biosynthesis process that comprises twelve

378

pathways (Rahman et al., 2013) (Supplementary Table 29). For examples, functional

379

annotation of these genes demonstrates that they were mainly enriched in a number of

380

functional categories involved in basal metabolic processes, such as fructose 6-

381

phosphate metabolic process (GO:0006002; P < 0.001), glucose metabolic process

382

(PF11721; P < 0.001), phospholipase (PF00388, PF12357, PF00614, PF09279,

383

PF00387; P < 0.001), ribosomal protein (PF01020, PF00453, PF00347; P < 0.001),

384

argonaute (PF08699, PF16487, PF16488, PF16486; P < 0.001), aminotransferase

385

(PF00202; P < 0.001) and dehydrogenase (PF00725, PF09265; P < 0.001).

386

Furthermore, gene families are significantly enriched in a number of functions related

387

to carbohydrate metabolism, such as hydrolase (PF00702, PF03662, PF07748,

388

PF01915, PF01738, PF03644, PF00332, PF01074, PF12215, PF04685, PF00933,

389

PF00232, PF01301, PF12710; P < 0.001), glutamine synthetase (PF03951, PF00120;

390

P < 0.001), carbohydrate-binding protein of the ER (PF12819; P < 0.001), glutamate

391

decarboxylase (GO:0004970; P < 0.001), and UDP-glucose/GDP-mannose

392

dehydrogenase family (PF00984, PF03720; P < 0.001) (Supplementary Table 29).

393

In addition, gene families are significantly enriched in a number of functions related

394

to pyruvate metabolism involved in the MVA pathway (PF02887; P < 0.001). Notably,

395

we find that gene families encoding S-adenosyl-L-methionine synthase (SAMS) are

396

significantly enriched in the ethylene biosynthesis, including S-adenosylmethionine

397

biosynthetic process (GO:0006556; P < 0.001), methionine adenosyltransferase

398

activity (GO:0004478; P < 0.001) S-adenosylmethionine synthetase (PF02773,

399

PF00438, PF02772; P < 0.001), and adenosylmethionine decarboxylase (PF01536; P

400

< 0.001). Gene families are also significantly enriched in a number of functions

401

related to the biosynthesis of metabolic compounds, including polysaccharide

402

(PF03033,PF05686,PF00982,PF13641; P < 0.001) and glycoprotein lectin

403

(PF00139,PF02140,PF01453; P < 0.001) (Supplementary Table 29). Enrichment

404

analyses show that the expanded gene families are enriched in a total of 18 KEGG

405

pathways, including fifteen metabolism-related pathways, such as pyruvate

406

metabolism (ko00620; P < 0.05) relevant to biosynthetic processes of latex

407

(Supplementary Table 30; Supplementary Figure 18c).

408 409

As an important biological feature of the rubber tree, rubber is sequentially

410

synthesized by twelve pathways, in which hundreds of gene families are involved

411

(Rahman et al., 2013). In H. brasiliensis, natural rubber is a high molecular weight

412

biopolymer that comprises cis-isoprene units resulting from isopentenyl diphosophate

413

(IPP). The biosynthesis of IPP takes place via two distinct paths, that is, MVA and

414

MEP pathways. In total, we identified 70 so-called rubber biosynthesis-related genes,

415

including 11 genes involved in the MVA pathway, 18 genes associated with MEP

416

pathway, 11 genes responsible for initiator synthesis in the cytosol, and 30 genes

417

related to the rubber elongation (Supplementary Table 32; Figure 4). Compared to

418

non-rubber-produced plant species, the rubber tree shows the largest number of genes

419

involved in the synthesis of IPP to the final rubber polymer, which may largely

420

enhance the formation of isoprenoids (Supplementary Table 33). We particularly

421

observed a significant expansion of rubber biosynthesis-related gene families that

422

correlates with its capacity to produce high levels of latex in H. brasiliensis, such as

423

the eight f1-deoxy-D-xylulose 5-phosphate synthase (DXS) encoding DXP synthase

424

that catalyzes the first step in the MEP pathway, twelve cis-prenyltransferase (CPT),

425

eight rubber elongation factor (REF) and ten small rubber particle (SRPP) genes

426

(Supplementary Table 33). Besides a significant expansion (13/18) of the

427

REF/SRPP gene family that make gene clusters on Chromosome 9 (Figure 4), we

428

also found that nine of the identified twelve CPT genes are clustered on Chromosome

429

14 (Figure 4), suggesting that many of them appear to have arisen by tandem

430

duplication events.

431 432

To gain insights into the molecular mechanisms underlying the production of natural

433

rubber we examined tissue-specific expression patterns of genes involved in the

434

rubber biosynthesis using RNA-Seq datasets from barks, roots, flowers, stems, leaves,

435

and seeds. Our results show that these 20 rubber biosynthesis-related gene families

436

exhibit distinct patterns of gene expression among tissues; interestingly, they are most

437

highly expressed in flowers when compared to barks, followed by stems

438

(Supplementary Table S32). We paid particular attention to tissue-specific

439

expression patterns of REF/SRPP and CPT gene families that are functionally known

440

to encode enzymes that are responsible for the rubber elongation. The 18 REF/SRPP

441

genes exhibit dissimilar expression patterns among six tissues of the rubber tree, of

442

which we intriguingly found that REF1, REF2, REF3, REF4 and REF5 are highly

443

expressed in flowers, while REF6, REF7 and REF8 are highly expressed in seeds.

444

However, the majority of REF genes are expressed in barks. Interestingly, we

445

observed that SRPP3, SRPP5, SRPP6, SRPP8, SRPP9, and SRPP10 are highly

446

expressed in flowers compared to the barks, while SRPP2 and SRPP4 are favorably

447

expressed in barks when compared to flowers. Our results indicated that CPT1, CPT4,

448

CPT7, and CPT11 are highly expressed in flowers, while others are preferentially

449

expressed in barks (CPT12), roots (CPT2 and CPT9), leaves (CPT4 and CPT5) and

450

seeds (CPT6), respectively. Taken together, we show that, besides the high expression

451

of a small number of genes in roots, stems, leaves, and seeds, the rubber may be

452

actively synthesized in both flowers and barks, but predominantly accumulates in

453

barks. Our differential expression analysis further reveals that 276 DEGs related to the

454

rubber biosynthesis are commonly up- regulated in barks compared with five other

455

tissues (Supplementary Table 34; Supplementary Figure 20), of which we

456

identified two latex-produced genes involved in TCA-cycle and one gene associated

457

with starch metabolism (Supplementary Table 35).

458 459

To provide a transcriptomic overview of expression divergence present between

460

cultivated and wild rubber trees that underlies the variation in latex yields, we

461

examined tissue-specific expression patterns of genes involved in the rubber

462

biosynthesis using RNA-Seq data from barks of the five elite cultivars of the rubber

463

tree and the five accessions of wild rubber tree representatives of its global geographic

464

range. Our results show that these 20 rubber biosynthesis-related gene families

465

exhibited distinct patterns of gene expression across different accessions of rubber

466

trees (Supplementary Figure 21; Supplementary Tables 36-38). PCA analysis

467

based on expression levels of rubber biosynthesis-related genes further suggests that,

468

compared to abundant expression bias across GT1 tissues, there are no remarkably

469

preferential expression patterns between the cultivated and wild rubber trees

470

(Supplementary Figures 21-22; Supplementary Tables 36-37). Our statistic test

471

further showed that, among all 69 rubber biosynthesis-related genes, only seven genes

472

are significantly differentially expressed between cultivated and wild rubber trees

473

(Supplementary Table 38), which were experimentally validated by real-time PCRs

474

(Supplementary Figure 23; Supplementary Table 39).

475 476

Genomic insights into population divergence and artificial selection on cultivated

477

rubber tree

478

The first high-quality rubber tree genome allows us to examine genomic variation

479

present within the cultivated and wild rubber trees and detect potential signatures of

480

selection during the domestication. We employed the Illumina short-read technology

481

with paired-end libraries on the HiSeq2000 sequencing platform to resequence eight

482

elite cultivars of the rubber tree and six accessions of wild rubber tree according to

483

native geographic range throughout the world (Supplementary Table 40). Each of

484

these 14 cultivated and wild representative genomes (first reported in this study), were

485

sequenced to > 3-fold sequence coverage using paired-end (2 × 100-bp) Illumina

486

sequencing. In addition, we included two previously reported genomes of cultivated

487

rubber trees (RRIM 600 and Reyan7-33-97) with >10-fold genome sequence coverage

488

(Lau et al., 2016; Tang et al., 2016). We first mapped high-quality short reads back to

489

the reference genome sequence of the rubber tree, and obtained mapping rates ranging

490

from 95.81% to 98.86% (Supplementary Table 40). They represent the largest

491

whole-genome sequencing data for a diverse panel including 16 accessions with

492

sufficient depths of genome coverage to represent the genomic diversity of cultivated

493

and wild rubber trees.

494 495

After aligning reads against the rubber tree reference genome sequence, we obtained a

496

total of 15,728,276 common single nucleotide polymorphisms (SNPs), more than 90%

497

of which were located on intergenic regions (Supplementary Tables 40-41;

498

Supplementary Figure 24). The density of SNPs across all 16 cultivated and wild

499

rubber trees averages approximately 11.9 SNPs per kilobase, of which 14,645,744 and

500

14,497,454 SNPs were totally identified across these 10 cultivars and 6 wild

501

accessions of the rubber tree, respectively. We observed 1,547,989 SNPs (9.84%) in

502

genic regions, including 174,394 synonymous, 258,938 nonsynonymous, 426,705

503

exonic, and 1,121,284 intronic SNPs. We used this large dataset of SNPs from both

504

wild and cultivated rubber trees to evaluate genome-wide levels of nucleotide

505

diversity (π and θw) to be 0.00345±0.00021 and 0.00291±0.00041. The wild rubber

506

trees show higher levels of nucleotide diversity than rubber tree cultivars (π per kb,

507

3.43 vs. 2.86, P = 2.29 × 10-5; θw per kb, 2.59 vs. 2.55, P = 4.21 × 10-1)

508

(Supplementary Table 42).

509 510

The genome-wide SNP dataset provides an unprecedented opportunity to investigate

511

the population structure of the rubber tree. Principal-component analysis (PCA) of

512

SNP variation suggests that the top two eigenvectors strongly correlate with sampling

513

sources: PC1 was correlated with cultivated rubber trees admixed by two accessions

514

of wild rubber tree (Wild258 and Wild166), and PC2 was correlated with wild rubber

515

trees (Figure 5a). These inferred relationships are also supported by phylogenetic

516

analysis based on SNPs (Figure 5b), showing that the ten cultivars (Cult053, Cult169,

517

Cult171, Cult223, Cult293, Cult589, KEN, RRIM 600, RP, and RY7-33-97) formed

518

one cluster admixed by two accessions of wild rubber tree (Wild258 and Wild166),

519

while the four wild rubber trees (Wild162, Wild194, Wild232 and Wild778) were

520

clustered into another group. To generate an alternative view of population

521

stratification, we used the population clustering program STRUCTURE (Pritchard,

522

2010), which inferred the optimal number of genetic clusters comprising the

523

cultivated and wild rubber tree genomes to be K = 7 (Figure 5c; for other K values,

524

see Supplementary Figure 25). We found that cultivated rubber trees predominantly

525

form three major clusters, two of which are grouped with Wild258 and Wild166,

526

respectively. However, the four accessions of wild rubber trees split into four distinct

527

groups, of which Wild778 has contribute the most to cultivated rubber trees, probably

528

serving as an ancestral population.

529 530

Population genomics provides an opportunity to track the dispersal, genetic

531

bottlenecks and signature of artificial selection during the domestication in most

532

cultivated crop species (Doebley et al., 2006). To detect the footprint of artificial

533

selection in the rubber tree genome we attempted to identify genomic regions with

534

significantly lowered levels of polymorphisms in cultivated populations compared to

535

wild rubber trees. While comparing to the FST values and levels of polymorphisms

536

between cultivated and wild accessions of the rubber tree (Figures 5d, 5e), we

537

performed a whole-genome screening and identified ~83.7 Mbp of the genome

538

potentially subjected to selective sweeps. These regions harbored 578 genes, which

539

we consider candidate domestication genes of the rubber tree (Supplementary Table

540

43). KEGG enrichment analysis shows that these candidate domestication genes are

541

enriched in twelve pathways, of which nine genes enriched in ko00900 associate with

542

the terpenoid backbone biosynthesis that is key to the latex biosynthesis

543

(Supplementary Figures 26-27). Further functional annotation indicates, for example,

544

that the two genes (GT005609, GT009352) encode the first enzyme (DXS) in the

545

MEP pathway (Supplementary Table 43).

546 547

DISCUSSION

548

De novo sequencing and assembly of large, highly repetitive, and heterozygous plant

549

genomes have long been challenging and problematic. Here we construct a high-

550

quality reference for one such genome, the GT1 genome for H. brasiliensis. We

551

generated a 1.3-Gbp de novo assembly of the rubber tree genome highlighting the

552

potential of combining PacBio long-read and Illumina short-read assemblies to create

553

improved reference genomes, in sharp contrast to relatively fragmented genome

554

assemblies using the Illumina sequencing data alone (Rahman et al., 2013; Tang et al.,

555

2016) or hybrid assemblies with Illumina and PacBio sequence data (Lau et al., 2016;

556

Pootakham et al., 2017). We thus established the efficiency of employing long SMRT

557

reads to resolve ambiguous genomic regions harboring predominantly repetitive

558

sequences, particularly LTR-retrotransposons.

559 560

Considering the difficulty in generating a high-density linkage map for the rubber tree

561

as a long lifespan tree species, we demonstrated an efficient methodology that

562

leverages long-range Hi-C data to scaffold contigs assembled from PacBio reads and

563

successfully anchor ~71% of the 1.3-Gb genome assembly into 18 pseudo-

564

chromosomes. After the availability of four draft genome assemblies for the rubber

565

tree (Lau et al., 2016; Pootakham et al., 2017; Rahman et al., 2013; Tang et al., 2016),

566

we obtain a chromosome-based reference genome with considerable improvement in

567

sequence contiguity. The advent of the GT1 reference genome together with

568

companion transcriptome resources presented in this study provides important insights

569

into spurge genome evolution. It also further strengthens interest in the rubber tree as

570

a model for ~2,500 rubber-produced plants to accelerate our understanding of the

571

molecular mechanisms underlying the rubber biosynthesis.

572 573

Our comparative genomics analysis of the five Malpighiales species convincingly

574

demonstrates that the rubber tree has experienced two paleotetraploidization events.

575

Besides an ancient eurosid WGD (Salse, 2016), we confirm a recent paleotetraploidy

576

event occurred before the divergence of the Hevea and Manihot species but after the

577

split of the castor bean by genome-scale comparative analysis of H. brasiliensis and

578

other members in Euphorbiaceae, M. esculenta, R. communis and J. curcas (Bredeson

579

and Lyons, 2016; Chan et al., 2010; Pootakham et al., 2017; Sato et al., 2011).

580

Chromosome-based analysis of the two high-quality genomes of M. esculenta

581

(Bredeson and Lyons, 2016) and H. brasiliensis reported in this study further reveal

582

that macrosynteny conservation derived from the shared polyploidization event were

583

disrupted by widespread genomic rearrangement events that drive chromosomal

584

evolution in the spurge family.

585 586

The rubber tree possesses an unusually large genome when compared to the majority

587

of other spurge plants. We compared the GT1 genome with three other previously

588

released genome assemblies, including RRIM600 (Lau et al., 2016; Rahman et al.,

589

2013), Reyan7-33-97 (Tang et al., 2016) and BMP24 (Pootakham et al., 2017),

590

showing nearly completely annotation of most transposable elements, including a

591

large number of LTR retrotransposons in the GT1 genome. We identify a large

592

Hevea-specific proliferation of LTR retrotransposons, which contributed to the rapid

593

increase in genome size when compared to the three other closely related species,

594

including cassava, castor bean and physic nut. Among the three LTR retrotransposon

595

families that have predominantly amplified and altogether account for over half of

596

whole rubber tree genome, we observed a pattern of longstanding and incessant LTR

597

retrotransposon bursts of the largest Ty3/gypsy retrotransposon family over the last 45

598

million years. This is similar to what has been reported for tea tree genome (Xia et al.,

599

2017). However, we also document the recent proliferation of two other large LTR

600

retrotransposon families during the last 10 MYR. Once again, this pattern is consistent

601

to a previous study that reported a recent proliferation of the three LTR-

602

retrotransposon families in the wild rice genome, Oryza australiensis, leading to a

603

two-fold increase in genome size during the last three MYR (Piegu et al., 2006).

604 605

The well-established WGD event occurred before the divergence of the Hevea and

606

Manihot species but after the split of the castor bean. This occurred in conjunction

607

with lineage-specific segmental duplication leading to the expansion of gene families

608

relevant to the activation of the latex biosynthesis and thus the increase of rubber

609

production. We have identified a large number of rubber tree-expanded genes

610

encoding enzymes widely involved in numerous functional categories related to basal

611

metabolic processes, carbohydrate metabolism as well as pyruvate metabolism

612

relevant to the MVA pathway, which have greatly enhanced the accumulation of

613

sufficient precursors for the subsequent latex production. The rubber tree-expanded

614

genes encoding SAMS are significantly enriched in GO terms and PFAM domains

615

involved in the ethylene biosynthesis that have greatly improved the latex production

616

(Dusotoit-Coucaud et al., 2010; Yang and Hoffman, 2003). We detected a significant

617

enrichment in a number of functions related to glycoprotein lectin and polysaccharide

618

activities that are well-known to regulate latex coagulation and the blocking of latex

619

flow—important for controlling the rubber production and yield of the rubber tree.

620

The completion of such a high-quality reference genome of rubber tree also permits us

621

to fully identified almost all gene families involved in sequentially long pathways of

622

the rubber biosynthesis from sucrose to rubber polymerization. Our comparative

623

analyses with non-rubber-produced plant species show that the rubber tree harbors the

624

largest number of genes involved in the latex biosynthesis. The rapid expansion of

625

rubber biosynthesis-related gene families, such as DXS, CPT and REF/SRPP, which is

626

in a good agreement with former observations (Lau et al., 2016; Pootakham et al.,

627

2017; Tang et al., 2016), suggests a strong correlation with its capacity to produce

628

high levels of latex in H. brasiliensis. We thus hypothesize that, instead of the

629

assumption that the expansion of the REF/SRPP family is associated with correlation

630

with the rubber biosynthesis (Tang et al., 2016), the formation of rubber biosynthesis

631

network and wide-ranging expansion of rubber biosynthesis-related gene families

632

enabled the rubber tree to significantly strengthen the latex biosynthesis and to

633

efficiently yield high-quality natural rubber.

634 635

Comprehensive comparative transcriptomic analyses aided by this high-quality

636

genome assembly also provided new insights into the molecular mechanisms

637

underlying the production of natural rubber. An assessment of transcriptomic data

638

shows that these rubber biosynthesis-related gene families are more highly expressed

639

in flowers than barks and stems, suggesting that a potential subfunctionalization or

640

neofunctionalization after gene duplication might explain functional divergence under

641

strong selection pressures that exaggerates the complexity in rubber biosynthesis. We

642

previously reported that the majority of genes involved in the ginsenoside

643

biosynthesis are highly expressed in flowers compared with leaves and roots in Panax

644

notoginseng (Zhang et al., 2017). The results suggested that ginsenosides might be

645

actively synthesized in flowers besides roots and leaves but accumulated in roots,

646

supported by a speedy accumulation of total triterpene saponins after flowering. We

647

thus assume that the rubber may be actively synthesized in flowers besides barks but

648

accumulated in barks, but the hypothesis that there is an increased yield of rubber in

649

bark after flowering requires to be experimentally validated in H. brasiliensis.

650 651

In addition to confirming earlier observations that REF/SRPP gene families are highly

652

expressed in latexs (Lau et al., 2016; Pootakham et al., 2017; Tang et al., 2016), we

653

observed differentially expressed patterns of REF/SRPP and CPT genes in flowers

654

and barks. Further experimental studies are required to examine the complicated

655

transcriptional regulation of these expanded rubber biosynthesis-related gene families.

656

Such investigations will likely offer clues to understanding the unique properties of H.

657

brasiliensis needed to produce extraordinary amounts of rubber.

658 659

We also provided new insights into natural standing genetic variation and divergence

660

between the cultivated and wild rubber trees. Compared to wild rubber trees, we

661

observe considerable reduction in genomic diversity among cultivated rubber trees.

662

This is likely the result of a strong genetic bottleneck and artificial selection during

663

the relatively short period of domestication since the late nineteenth century. Despite

664

limited phenotypic divergence and gene flow, there is a strong genetic split between

665

wild and the cultivated rubber trees. A subset of wild rubber trees, however, may

666

either represent the ancestral populations of the domesticated rubber tree or have

667

experienced more recent and possibly frequent genomic introgression with rubber tree

668

cultivars. Extensively sampling of cultivated and wild rubber trees will be required to

669

distinguish among these possibilities. Finally, we identify hundreds of candidate

670

domestication genes many of which are involved in specific pathways important for

671

rubber biosynthesis. More analyses and experimental validation are needed to

672

determine whether these represent bona fide domestication genes or whether they are

673

simply genetic hitchhikers of regions that were targeted by artificial selection over

674

one hundred years’ domestication of the rubber tree.

675 676

The rubber tree genome assembly and transcriptomic and genomic variation datasets

677

presented in this study will offer valuable information to aid efficient germplasm

678

exploration and the improvement of economically important traits of rubber tree to

679

meet global market’s increasing demand for natural rubber.

680 681

METHODS

682

Genome sequencing and assembly

683

DNA was extracted from a GT1 individual for PacBio RSII and Hiseq sequencing

684

platforms. One library with 20 Kb insert size was constructed and sequenced for 100

685

SMRT cells on PacBio RSII. The PacBio data were corrected with MHAP (Berlin et

686

al., 2015) algorithm and assembled by canu (v1.1) (Koren et al., 2017). We generated

687

~31 Gb Illumina data with 500 bp insert size on Hiseq 2500 platform to polish

688

assembled genome sequences using pilon (Walker et al., 2014). For Hi-C sequencing,

689

chromosome structure was fixed by formaldehyde crosslinking, and then MboI

690

enzyme was used to shear DNA. Hi-C library with 200-600 bp insert size was

691

constructed, which was sequenced on Hiseq 2000 platform. The Hi-C sequence data

692

were qualified with HIC-pro (Servant et al., 2015), in which the validly mapped reads

693

were selected to cluster and order the assembled genome sequences using AllHic

694

v0.8.12 (Zhang et al., 2019).

695 696

Genome annotation

697

Repetitive elements were identified based on homologous detection and de novo

698

searches. For homolog strategy, whole genome sequences were aligned with RepBase

699

21.01 (Jurka et al., 2005) using RepeatMasker (v4-0-6) program

700

(www.repeatmasker.org). LTR_FINDER1.0.6 (Xu and Wang, 2007) was applied to

701

identifying LTR retrotransposon elements to construct de novo repeat library, and

702

genomic locations were also detected using RepeatMasker (v4-0-6).

703 704

Gene models were predicted based on the five closely related plant species of the

705

rubber tree (M. esculenta, R, communis, J. carcass, L. usitatissimum and P.

706

trichocarpa) and RNA-seq data. Amino acid sequences from these genomes were

707

mapped using BLAT (Kent, 2002) with the rubber tree genome assembly to search for

708

candidate protein-coding sequences, and then GeneWise (Birney et al., 2004) was

709

performed to predict gene models. RNA-seq reads from bark, stem, seed, root, leaf

710

and inflorescence tissues were mapped to the assembled GT1 genome sequences with

711

TOPHAT (Trapnell et al., 2009), and transcripts were determined by Cufflink

712

(Trapnell et al., 2010); these evidences were subsequently integrated by GLEAN to

713

generate conserved gene models. The integrated genes were compared with Cufflink

714

and GeneWise results based on cultivar Reyan7-33-97 (Tang et al., 2016), and

715

transcripts or functional genes were finally added to the GLEAN (Elsik et al., 2007)

716

gene set. For function annotation, amino acid sequences were searched with known

717

UniProt (2009), KEGG (Kanehisa and Goto, 2000) databases. InterProscan (Jones et

718

al., 2014) were performed to annotate protein domains.

719 720

Analysis of gene family evolution

721

Homologous genes from different species were combined using all vs all BLASTP.

722

Gene families were clustered with OrthoMCL (Li et al., 2003) according to blast

723

results. Gene family expansion and contraction were calculated using CAFÉ program

724

based on gene cluster statistics. Homologous genes were detected by BLASTP

725

(Altschul et al., 1990), and then synteny blocks were identified with MCscanX (Wang

726

et al., 2012).

727 728

Population genomic analysis

729

The reads from 10 cultivated and 6 wild species were mapped to assemble the GT1

730

genome using BWA (Li, 2013) with mem algorithm. The duplicated reads were

731

removed using Picard tools (http://broadinstitute.github.io/picard/). SNP calling was

732

performed using the Genome Analysis Toolkit (GATK) (McKenna et al., 2010). The

733

phylogenetic tree of all individuals was reconstructed using Neighbor-

734

Joining/UPGMA method, which was visualized using the software MEGA6 (Tamura

735

et al., 2013). Population structure was speculated by ADMIXTURE (Alexander et al.,

736

2009), which is based on a maximum likelihood method. We predefined the number

737

of genetic clusters K from 2–7, and PCA analysis was performed using R language.

738

The average pairwise diversity within a population (θπ) and FST values were

739

calculated on sliding windows of 100 kb. Considering that genomic regions under

740

selection have a lowered diversity and reduced allele frequencies in the cultivated

741

populations compared with the same region in wild ancestral populations, and thus log

742

(Pwild/Pcult) > 1 and FST > 0.25 were set as cutoff values to identify these regions.

743 744

RNA isolation, sequencing and assembly

745

Tissues from leaves, flowers, seeds, barks, roots and stems of GT1 as well as the

746

eleven other bark samples of the rubber tree, including 5 cultivated (W169, W589,

747

W293, W53, W171) and 5 wild individuals (Y1187, Y379, Y809, Y4211, Y478) were

748

collected in Jinghong City, Yunnan Province, China. The high-quality RNA was

749

separately extracted and then sequenced on Hiseq 2500 platform. We filtered the low-

750

quality reads by following the quality-control procedures used for the genome

751

assembly. The transcriptome assembly for each rubber tree was generated using

752

Trinity (version r20140717) (Grabherr et al., 2011) with default parameters. The gene

753

expression levels were computed as the number of reads per kilobase of gene length

754

per million mapped reads (FPKM) using RSEM software (Li and Dewey, 2011).

755 756

ACCESSION NUMBERS

757

All sequencing reads and genome assembly have been deposited in the NCBI and BIG

758

Data Center under accession numbers PRJNA587314 and PRJCA001891,

759

respectively.

760 761

SUPPLEMENTAL INFORMATION

762

Supplemental Information is available at Molecular Plant Online.

763 764

FUNDING

765

This work was supported by Yunnan Innovation Team Project and the start-up grant

766

from South China Agricultural University (to L. G.).

767 768

AUTHOR CONTRIBUTION

769

L.-Z.G. and G.-H. L. conceived and designed the study; J. L., C. S., Ch. Sh., Y. W.,

770

C.-L. M., H.-B. Zh., G.-R. H.-T., X.-G. Zh. and S.-B. N. contributed to the sample

771

preparation and genome sequencing; Ch.-Ch. Sh., G.-Y. F. and W. L. performed

772

genome assembly; Y.T. performed flow cytometry experiments; Ch.-Ch. Sh., W.-B.

773

Ch., G.-Y. F., Q.-J. Zh., and Y.Zh., W.L. performed genome annotation; Ch.-Ch. Sh.,

774

C. Sh., Q.-J. Zh., W. L., G.-Y. F., H.-F. L., Z.-Y. X., S.-T. Zh., Y. Y., X.-N. H., Sh.-J.

775

H., W.-Q. L., M.-Q. L., J.-H. W., Y.-H. S., H. N., D. Zh., Y.-L. L.,Y. L.,K. L., J.-P.

776

W. and Y.-N. J. performed data analyses; L.-Z.G., J. L., C. Sh., Ch.-Ch. Sh., Q.-J. Zh.,

777

and W. L. wrote the manuscript; L.-Z.G., X. L. X.-C. Zh., E. E. E. and X. L. revised

778

the manuscript.

779 780

ACKNOWLEDGMENTS

781

The authors have no conflicts to declare.

782 783

REFERENCES

784

UniProt Consortium. (2009). The Universal Protein Resource (UniProt) 2009.

785

Nucleic Acids Res. 37:D169-174.

786

Alexander, D.H., Novembre, J., and Lange, K. (2009). Fast model-based estimation

787

of ancestry in unrelated individuals. Genome Res. 19:1655-1664.

788

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic

789

local alignment search tool. J. Mol. Biol. 215:403-410.

790

Backhaus, R.A. (1985). Rubber formation in plants—a mini-review. Israel J. Bot.

791

34:283-293.

792

Badouin, H., Gouzy, J., Grassa, C.J., Murat, F., Staton, S.E., Cottret, L.,

793

Lelandais-Briere, C., Owens, G.L., Carrere, S., Mayjonade, B., et al. (2017). The

794

sunflower genome provides insights into oil metabolism, flowering and Asterid

795

evolution. Nature 546:148-152.

796

Berlin, K., Koren, S., and Chin, C.S. (2015). Assembling large genomes with single-

797

molecule sequencing and locality-sensitive hashing. Nat. Biotechnol. 33:623-630.

798

Birney, E., Clamp, M., and Durbin, R. (2004). GeneWise and Genomewise.

799

Genome Res. 14:988-995.

800

Bowers, J.E. (1990). Natural rubber-producing plants for the United States. PNAS

801

USA 100:1352-1357.

802

Bredeson, J.V., and Lyons, J.B. (2016). Sequencing wild and cultivated cassava and

803

related species reveals extensive interspecific hybridization and genetic diversity. Nat.

804

Biotechnol. 34:562-570.

805

Chan, A.P., Crabtree, J., Zhao, Q., Lorenzi, H., Orvis, J., Puiu, D., Melake-

806

Berhan, A., Jones, K.M., Redman, J., Chen, G., et al. (2010). Draft genome

807

sequence of the oilseed species Ricinus communis. Nat. Biotechnol. 28:951-956.

808

Chan, H. (2000). Milestones in Rubber Research (Malaysian Rubber Board).

809

Chen, S., Krinsky, B.H., and Long, M. (2013). New genes as drivers of phenotypic

810

evolution. Nat. Rev. Genet. 14:645-660.

811

Chin, C.-S., Alexander, D.H., Marks, P., Klammer, A.A., Drake, J., Heiner, C.,

812

Clum, A., Copeland, A., Huddleston, J., and Eichler, E.E. (2013). Nonhybrid,

813

finished microbial genome assemblies from long-read SMRT sequencing data. Nature

814

Methods 10:563.

815

Doebley, J.F., Gaut, B.S., and Smith, B.D. (2006). The molecular genetics of crop

816

domestication. Cell 127:1309-1321.

817

Dusotoit-Coucaud, A., Kongsawadworakul, P., Maurousset, L., Viboonjun, U.,

818

Brunel, N., Pujade-Renaud, V., Chrestin, H., and Sakr, S. (2010). Ethylene

819

stimulation of latex yield depends on the expression of a sucrose transporter

820

(HbSUT1B) in rubber tree (Hevea brasiliensis). Tree Physiology 30:1586-1598.

821

Elsik, C.G., Mackey, A.J., Reese, J.T., Milshina, N.V., Roos, D.S., and Weinstock,

822

G.M. (2007). Creating a honey bee consensus gene set. Genome Biol. 8:R13.

823

Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I.,

824

Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q. et al. (2011). Full-length

825

transcriptome assembly from RNA-Seq data without a reference genome. Nat.

826

Biotechnol. 29: 644-U130.

827

Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A.,

828

Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al. (2007). The grapevine genome

829

sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature

830

449:463-467.

831

Jones, P., Binns, D., Chang, H.Y., Fraser, M., Li, W., McAnulla, C., McWilliam,

832

H., Maslen, J., Mitchell, A., Nuka, G., et al. (2014). InterProScan 5: genome-scale

833

protein function classification. Bioinformatics 30:1236-1240.

834

Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and

835

Walichiewicz, J. (2005). Repbase Update, a database of eukaryotic repetitive

836

elements. Cytogenet. Genome Res. 110:462-467.

837

Kanehisa, M., and Goto, S. (2000). KEGG: kyoto encyclopedia of genes and

838

genomes. Nucleic Acids Res. 28:27-30.

839

Kent, W.J. (2002). BLAT--the BLAST-like alignment tool. Genome Res. 12:656-664.

840

Lau, N.S., Makita, Y., Kawashima, M., Taylor, T.D., Kondo, S., Othman, A.S.,

841

Shu-Chien, A.C., and Matsui, M. (2016). The rubber tree genome shows expansion

842

of gene family associated with rubber biosynthesis. Sci. Rep. 6:28594.

843

Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-

844

Seq data with or without a reference genome. BMC Bioinformatics 12:323.

845

Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with

846

BWA-MEM. arXiv (1303.3997). 1303.

847

Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL: identification of

848

ortholog groups for eukaryotic genomes. Genome Res. 13:2178-2189.

849

Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G.,

850

Kristiansen, K., et al. (2010). De novo assembly of human genomes with massively

851

parallel short read sequencing. Genome Res. 20:265-272.

852

Lieberei, R. (2007). South American leaf blight of the rubber tree (Hevea spp.): new

853

steps in plant domestication using physiological features and molecular markers. Ann.

854

Bot. 100:1125-1142.

855

Lin, T., Xu, X., Ruan, J., Liu, S., Wu, S., Shao, X., Wang, X., Gan, L., Qin, B.,

856

and Yang, Y. (2018). Genome analysis of Taraxacum kok-saghyz Rodin provides new

857

insights into rubber biosynthesis. Natl. Sci. Rev. 5:78-87.

858

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky,

859

A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome

860

Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA

861

sequencing data. Genome Res. 20:1297-1303.

862

Ohno, S. (1970). Introduction in Evolution by gene duplication 1-2 (Berlin,

863

Heidelberg: Springer).

864

Piegu, B., Guyot, R., Picault, N., Roulin, A., Sanyal, A., Kim, H., Collura, K.,

865

Brar, D.S., Jackson, S., Wing, R.A., et al. (2006). Doubling genome size without

866

polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza

867

australiensis, a wild relative of rice. Genome Res. 16:1262-1269.

868

Pootakham, W., Ruang-Areerate, P., Jomchai, N., Sonthirod, C., Sangsrakru, D.,

869

Yoocha, T., Theerawattanasuk, K., Nirapathpongporn, K., Romruensukharom,

870

P., Tragoonrung, S., et al. (2015). Construction of a high-density integrated genetic

871

linkage map of rubber tree (Hevea brasiliensis) using genotyping-by-sequencing

872

(GBS). Front. Plant Sci. 6:367.

873

Pootakham, W., Sonthirod, C., Naktang, C., Ruang-Areerate, P., Yoocha, T.,

874

Sangsrakru, D., Theerawattanasuk, K., Rattanawong, R., Lekawipat, N., and

875

Tangphatsornruang, S. (2017). De novo hybrid assembly of the rubber tree genome

876

reveals evidence of paleotetraploidy in Hevea species. Sci. Rep. 7:41457.

877

Pritchard, J. (2010). Documentation for STRUCTURE software: version 2.2.

878

41:55–63.

879

Priyadarshan, P.M., and Clement-Demange, A. (2004). Breeding Hevea rubber:

880

formal and molecular genetics. Adv. Genet. 52:51-115.

881

Priyadarshan, P.M., and Goncalves, P.D.S. (2003). Hevea gene pool for breeding.

882

Genet. Resour. Crop Ev. 50:101-114.

883

Rahman, A.Y., Usharraj, A.O., Misra, B.B., Thottathil, G.P., Jayasekaran, K.,

884

Feng, Y., Hou, S., Ong, S.Y., Ng, F.L., Lee, L.S., et al. (2013). Draft genome

885

sequence of the rubber tree Hevea brasiliensis. BMC Genomics 14:75.

886

Salse, J. (2016). Ancestors of modern plant crops. Curr. Opin. Plant Biol. 30:134-142.

887

Sato, S., Hirakawa, H., Isobe, S., Fukai, E., Watanabe, A., Kato, M., Kawashima,

888

K., Minami, C., Muraki, A., Nakazaki, N., et al. (2011). Sequence analysis of the

889

genome of an oil-bearing tree, Jatropha curcas L. DNA Res. 18:65-76.

890

Servant, N., Varoquaux, N., Lajoie, B.R., Viara, E., Chen, C.J., Vert, J.P., Heard,

891

E., Dekker, J., and Barillot, E. (2015). HiC-Pro: an optimized and flexible pipeline

892

for Hi-C data processing. Genome Biol. 16:259.

893

Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov,

894

E.M. (2015). BUSCO: assessing genome assembly and annotation completeness with

895

single-copy orthologs. Bioinformatics 31:3210-3212.

896

Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013).

897

MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol.

898

30:2725-2729.

899

Tang, C., Yang, M., Fang, Y., Luo, Y., Gao, S., Xiao, X., An, Z., Zhou, B., Zhang,

900

B., Tan, X., et al. (2016). The rubber tree genome reveals new insights into rubber

901

production and species adaptation. Nature Plants 2:16073.

902

Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice

903

junctions with RNA-Seq. Bioinformatics 25:1105-1111.

904

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren,

905

M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and

906

quantification by RNA-Seq reveals unannotated transcripts and isoform switching

907

during cell differentiation. Nat. Biotechnol. 28:511-515.

908

van Beilen, J.B., and Poirier, Y. (2007). Establishment of new crops for the

909

production of natural rubber. Trends Biotechnol. 25:522-529.

910

Vurture, G.W., Sedlazeck, F.J., Nattestad, M., Underwood, C.J., Fang, H.,

911

Gurtowski, J., and Schatz, M. C. (2017). GenomeScope: Fast reference-free genome

912

profiling from short reads. Bioinformatics 33: 2202-2204.

913

Walker, B.J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S.,

914

Cuomo, C.A., Zeng, Q., Wortman, J., Young, S.K., et al. (2014). Pilon: an

915

integrated tool for comprehensive microbial variant detection and genome assembly

916

improvement. PLoS One 9:e112963.

917

Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H.,

918

Marler, B., Guo, H., et al. (2012). MCScanX: a toolkit for detection and evolutionary

919

analysis of gene synteny and collinearity. Nucleic Acids Res. 40:e49.

920

Webster, C., and Baulkwill, W. (1989). Rubber (Longman Scientific and Technical).

921

Wuyun, T.N., Wang, L., Liu, H., Wang, X., Zhang, L., Bennetzen, J.L., Li, T.,

922

Yang, L., Liu, P., Du, L., et al. (2018). The hardy rubber tree genome provides

923

insights into the evolution of polyisoprene biosynthesis. Molecular Plant 11:429-442.

924

Xia, E.H., Zhang, H.B., Sheng, J., Li, K., Zhang, Q.J., Kim, C., Zhang, Y., Liu,

925

Y., Zhu, T., Li, W., et al. (2017). The tea tree genome provides insights into tea flavor

926

and independent evolution of caffeine biosynthesis. Molecular Plant 10:866-877.

927

Xu, Z., and Wang, H. (2007). LTR_FINDER: an efficient tool for the prediction of

928

full-length LTR retrotransposons. Nucleic Acids Res. 35:W265-268.

929

Yang, S.F., and Hoffman, N.E. (2003). Ethylene biosynthesis and its regulation in

930

higher plants. Annu. Rev. Plant Physiol 35:155-189.

931

Zhang, D., Li, W., Xia, E.H., Zhang, Q.J., Liu Y., Zhang, Y., Tong, Y., Zhao, Y.,

932

Niu, Y. C., Xu, J.H., Gao, L.Z. (2017). The medicinal herb Panax notoginseng

933

genome provides insights into ginsenoside biosynthesis and genome evolution.

934

Molecular Plant 10: 903–907.

935

Zhang, X., Zhang, S., Zhao, Q., Ming, R., Tang, H. (2019). Assembly of allele-

936

aware, chromosomal scale autopolyploid genomes based on Hi-C data. Nature Plants,

937

5: 833–845.

938

Tables

939

Table 1. Statistics of the genome assembly and gene annotation for rubber tree.

940 Assembly Illumina / PacBio sequencing depth (×) Estimated genome size (Mb) Chromosome number (2n = 2x=) Total length of assembly (Mb) No. of contigs Contig N50 (Kb) No. of scaffolds Contig number on chromosomes Chromosome length (Mb) GC content of the genome (%) Annotation Number of gene models Average gene length (bp) Mean exon length (bp) Average exon per gene Mean intron length (bp) Number of function annotated tRNAs rRNAs snoRNAs snRNAs miRNAs Repeat sequence length (Mb) Percentage of repeat sequence (%) 941 942

261.38 / 103.69 1,561 36 1,472 16,023 152.7 600 15,324 1,442 33.87 44,187 3,918.4 222.08 5.13 672.1 42,758 945 93 61 396 373 1,042.4 70.81

943

Figure Legends

944

Figure 1. The rubber tree genome features. (A) Circular representation of the 18

945

pseudochromosomes; (B) the density of genes; (C) the density of non-coding RNA;

946

(D) the distribution of transposable elements (TEs); (E) the distribution of gypsy-type

947

retrotrasnposons; (F) the distribution of copia-type retrotrasnposons; (G) the

948

distribution of DNA transposons; (H) SSR density; (I) the distribution of GC content;

949

(J) whole genome duplication (WGD) event shown by syntenic relationships among

950

duplication blocks containing more than fifteen paralogous gene pairs.

951 952

Figure 2. The chromosome evolution after the shared paleotetraploidy of rubber

953

tree and cassava. (A) Conserved synteny within the rubber tree genome. Diagrams

954

show genomic colinearity within the H. brasiliensis genome. Lines link the position

955

of paralogous gene sets among linkage group/chromosome set. The ten chromosomes

956

arranged in the upper circle illustrate 1:1 synteny between the five duplicated pairs of

957

chromosomes. The eight chromosomes depicted in the lower circle each share

958

syntenic regions with two other chromosomes, owing to chromosomal rearrangements

959

that occurred after the whole-genome duplication; (B) Evolutionary patterns for the

960

rubber tree and cassava chromosomes from the AEK of 7 (pre-WGT-γ)

961

protochromosomes. Genome rearrangements of these two genomes are elucidated

962

with different colors that represent the origins from the seven ancestral chromosomes

963

from the n= 7 AEK. Rubber tree linkage groups are designated with “Hbr” followed

964

by the linkage group numbers, and cassava chromosomes are designated with “Mes”

965

followed by the chromosome numbers.

966 967

Figure 3. Genome size variation and evolution of retrotransposon families in the

968

rubber tree genome. (A) shows genome sizes and proportions of different types of

969

transposable elements (TEs) in R. communis (RC), J. curcas (JC), M. esculenta (ME),

970

H. brasiliensis (HB), P. trichocarpa (PT) and L. usitatissimum (LU) using

971

Arabidopsis thaliana (AT) as outgroup. The unrooted phylogenetic trees were

972

constructed on the basis of Ty1/copia (B) and Ty3/gypsy (C) aligned sequences

973

corresponding to the RT domains without premature termination codon. LTR

974

retrotransposon family names and proportion of each are indicated.

975

976

Figure 4. Rubber biosynthesis and the expansion of the REF/SRPP and CPT gene

977

families in rubber tree. (A) Levels of expression (reads per kilobase per million

978

reads mapped; RPKM) of genes involved in the rubber biosynthesis in bark, root,

979

flower, stem, leaf and seed; (B) genomic locations of REF/SRPP and CPT genes.

980

Chromosomes are represented as solid bars with their names on left. Note that most of

981

the REF/SRPP and CPT genes are located on Hbr_9 and Hbr_14, respectively.

982 983

Figure 5. Population divergence and artificial selection of H. brasiliensis. (A)

984

Two-way principal components analysis (PCA) of the sixteen H. brasiliensis

985

accessions using identified SNPs. (B) Population structure of H. brasiliensis. Each

986

color and vertical bar represents one population and one accession, respectively. The

987

y-axis shows the proportion of each accession contributed from ancestral populations.

988

(C) Neighbor-Joining (NJ) phylogenetic tree of the sixteen H. brasiliensis accessions

989

constructed using SNP data. The scale bar represents the evolutionary distances

990

measured by p-distance. (D) the distribution of the FST values and levels of nucleotide

991

variation between the cultutivated and wild rubber trees. (E) genomic regions under

992

artificial selection on the 18 chromosomes of rubber tree.

993

Pyruvate Acetyl-CoA ACAT Acetoacetyl-CoA

ACAT1

1

ACAT2

0

ACAT3

−1

ACAT4

HMGS

ACAT5

HMG-CoA HMGR

HMGS1 HMGR1 HMGR2

MVA MVK

HMGR3 HMGR4 MVK1

MVD1

PMD1

rk

2

Ba

Ro

DXS1

DXS3 DXS4 DXS5 DXS6

CME CMK

DXS7 DXS8 DXR

PCME MCS

CMS1 CMS2

CMEC HDS

CMK MCS1

HMED

MCS2

HDR

HDS1 HDS2

rS we ed af Flo tem Le Se

HDR1 HDR2

CPT1

IPP

CPT2

1

CPT3

0

rk

Ba

IPPI2

IPPI

CPT6

cis-1,4polyisopre ne

CPT7 CPT8 CPT9 CPT10

REF

CPT11

rS we ot rk ed af Ba Ro Flo tem Le Se

CPT12

REF2

SRPP1

2

SRPP2

1

SRPP3 0

SRPP4

−1

0

REF4

−1

REF5

−2

FPS1

GPP FPS

FPS2

FPP GGPS

GGPS1

FPS3

GGPS2 GGPS3

GGPP

1

REF3

GPS2

GGPS4

REF6

SRPP5

−2

2

REF1

rS we ot ed af Ro Flo tem Le Se

B

GPS1

DMAPP GPS

SRPP

ark

1.5 1 0.5 0 −0.5 −1 −1.5

IPPI1

CPT5

−2

rS ot lowe em eaf eed t F L S

Ro

CPT

CPT4 −1

1.5 1 0.5 0 −0.5 −1 −1.5

DXS2

MEP CMS

PMD

PMD2

rS ot lowe em eaf eed t F L S

Ro

DXP DXR

MVA-5-pp

MVD2

rk

Ba

DXS

MVA-5-p MVD

MVK2

ot

Pyruvate

GA-3-p

MEP Pathway

rS we ot rk ed af Ba Ro Flo tem Le Se

MVA Pathway

A

REF7

SRPP6

REF8

SRPP7 SRPP8 SRPP9

B

SRPP10

2900000

56900000

Chr_1 Chr_1

Chr_2

86100000

Chr_3

Chr_9

Chr_3 15600000

Chr_7 Chr_8

Chr_10

Chr_15

Chr_16

37900000

6360000

22200000

86150000

1610000

47450000

86600000

56950000 71300000 3000000

71350000 3050000

37950000 71750000

71800000 72600000

22250000

86650000

86250000

1600000 38550000

6560000

86350000

6560000 6610000

22300000 13240000

22350000 13290000

22400000 13340000

22450000 13390000

47550000

47600000

47650000

47700000

86750000 65100000

65500000 1650000 38600000

65200000

65250000

65550000 38650000

65600000 38700000

38750000

77800000

77850000

2050000

79500000

79550000

79600000

79650000

79700000

79750000

16950000

17000000

17050000

17100000

17150000

17200000

3750000

3800000

REF

SRPP

REF

80250000

6660000 6560000

50600000 6560000

13440000

50650000

13490000

13540000

6560000 13590000

86850000

86800000 65150000

2000000

3700000

SRPP

CPT

72700000 80200000

72650000 86300000

CPT

3150000

6510000

86700000

1550000 38500000

71400000 3100000

6460000

1660000

47500000

65050000

Chr_12

86200000

6410000

65000000

Chr_13 Chr_14

Chr_14

2950000

38800000

77900000

78300000

38850000

78350000

38900000

78400000

39050000

78450000

39100000

78500000

78550000

78700000

78750000

A Principal component 2

B Cult053 K=2 Cult169 Cult171 Cult223 Cult293 Cult589 KEN RRIM 600 RP K=3 RY7−33−97 Wild162 Wild166 Wild194 Wild232 Wild258 Wild778 K=4

0.5

0.0

−0.25 0.00 Principal component 1

RR 0

Cu

78

23

RP

ld7

53

lt0

Cult293

60

4

19

IM

ild

W

Wi

K=5

lt2

C

0.25

Cu

−0.5 −0.50

Cult171 K=6

Wild232

Cul

t589

162

lt5 89 33 − Cu 97 lt2 Cu 23 l RR t293 IM 6 W 00 ild 25 8 KE Cu N lt1 W 69 ild 7 W 78 ild 1 W 94 ild 1 W 62 ild 23 2

RP

Cu lt0 W 53 ild 1 Cu 66 lt1 71

RY

7−

Cu

6 d16

Wil

Wild258

KE N

K=7

7

-9

0.02

33

9

t16

Cul

7-

W

RY

ild

D

E

1

2

log (pw/pc)

3

3

−3 −2 −1 0

log(pw/pc)

4

2

1

0.0

0.2

0.4 FST

0.6

0.8

0

1 2 3 4 5 6 7

8

9 10 1112131415 16 17 18 Chromosomes