Journal Pre-proof The Chromosome-based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis Jin Liu, Cong Shi, Cheng-Cheng Shi, Wei Li, Qun-Jie Zhang, Yun Zhang, Kui Li, Hui-Fang Lu, Chao Shi, Si-Tao Zhu, Zai-Yun Xiao, Hong Nan, Yao Yue, Xun-Ge Zhu, Yu Wu, Xiao-Ning Hong, Guang-Yi Fan, Yan Tong, Dan Zhang, Chang-Li Mao, Yun-Long Liu, Shi-Jie Hao, Wei-Qing Liu, Mei-Qi Lv, Hai-Bin Zhang, Yuan Liu, Ge-Ran Hu-tang, Jin-peng Wang, Jia-Hao Wang, Ying-Huai Sun, Shu-bang Ni, Wen-Bin Chen, Xing-Cai Zhang, Yuan-nian Jiao, Evan E. Eichler, Guo-hua Li, Xin Liu, Li-Zhi Gao PII: DOI: Reference:
S1674-2052(19)30402-2 https://doi.org/10.1016/j.molp.2019.10.017 MOLP 865
To appear in: MOLECULAR PLANT Accepted Date: 30 October 2019
Please cite this article as: Liu J., Shi C., Shi C.-C., Li W., Zhang Q.-J., Zhang Y., Li K., Lu H.-F., Shi C., Zhu S.-T., Xiao Z.-Y., Nan H., Yue Y., Zhu X.-G., Wu Y., Hong X.-N., Fan G.-Y., Tong Y., Zhang D., Mao C.-L., Liu Y.-L., Hao S.-J., Liu W.-Q., Lv M.-Q., Zhang H.-B., Liu Y., Hu-tang G.-R., Wang J.-p., Wang J.-H., Sun Y.-H., Ni S.-b., Chen W.-B., Zhang X.-C., Jiao Y.-n., Eichler E.E., Li G.-h., Liu X., and Gao L.-Z. (2020). The Chromosome-based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis. Mol. Plant. doi: https://doi.org/10.1016/ j.molp.2019.10.017. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. All studies published in MOLECULAR PLANT are embargoed until 3PM ET of the day they are published as corrected proofs on-line. Studies cannot be publicized as accepted manuscripts or uncorrected proofs.
© 2019 The Author
1 2 3
The Chromosome-based Rubber Tree Genome Provides New Insights into Spurge Genome Evolution and Rubber Biosynthesis
4 5
Jin Liu1,2,13, Cong Shi2,3,13, Cheng-Cheng Shi4,13, Wei Li5,13, Qun-Jie Zhang5, 13, Yun
6
Zhang6, Kui Li7,8, Hui-Fang Lu9, Chao Shi2, Si-Tao Zhu4, Zai-Yun Xiao1, Hong
7
Nan2,3, Yao Yue4, Xun-Ge Zhu2,3, Yu Wu1, Xiao-Ning Hong4, Guang-Yi Fan4,9, Yan
8
Tong2, Dan Zhang5, Chang-Li Mao1, Yun-Long Liu2, Shi-Jie Hao4, Wei-Qing Liu9,
9
Mei-Qi Lv4, Hai-Bin Zhang2, Yuan Liu2, Ge-Ran Hu-tang2,3, Jin-peng Wang3,10, Jia-
10
Hao Wang4, Ying-Huai Sun9, Shu-bang Ni1, Wen-Bin Chen9, Xing-Cai Zhang11, Yuan-nian Jiao10, Evan E. Eichler12, Guo-hua Li1, Xin Liu4,9,*, and Li-Zhi Gao2,5,*
11 12 13
1
Yunnan Institute of Tropical Crops, Jinghong 666100, China
14
2
Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in
15
Southwestern China, Kunming Institute of Botany, Chinese Academy of Sciences,
16
Kunming 650204, China
17
3
University of Chinese Academy of Sciences, Beijing 100049, China
18
4
BGI-Qingdao, Qingdao 266555, China
19
5
Institution of Genomics and Bioinformatics, South China Agricultural University,
20
Guangzhou 510642, China
21
6
22
University, Kunming 650224, China
23
7
School of Life Sciences, Nanjing University, Nanjing 210023, China
24
8
Novogene Bioinformatics Institute, Beijing 100083, China
25
9
BGI-Shenzhen, Shenzhen 518083, China
26
10
27
Chinese Academy of Sciences, Beijing100093, China
28
11
29
Cambridge, MA 02138, USA
30
12
31
USA
32
13
33
*
34
(
[email protected])
Asia-Pacific Tropical Forestry Germplasm Institution, Southwest China Forestry
State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany,
John A. Paulson School of Engineering and Applied Sciences, Harvard University,
Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195,
These authors contributed equally to this article.
Correspondence: Li-Zhi Gao (
[email protected]) or Xin Liu
35
SHORT SUMMARY
36
This study reports a chromosome-based genome for rubber tree and provides novel
37
insight into the spurge chromosome evolution. Moreover, it is found that significant
38
expansion of genes associated with whole rubber biosynthesis processes relevant to
39
the latex production. A number of candidate domestication genes are also identified,
40
some of which are associated with the rubber biosynthesis in the cultivated rubber
41
trees.
42
43
ABSTRACT
44
The rubber tree, Hevea brasiliensis, produces natural rubber that serves as an essential
45
industrial raw material. Here, we present a high-quality reference genome for a rubber
46
tree cultivar GT1 using single-molecule real-time sequencing (SMRT) and Hi-C
47
technologies to anchor the ~1.47-Gb genome assembly into 18 pseudo-chromosomes.
48
The chromosome-based genome analysis enabled us to establish a model of spurge
49
chromosome evolution since the common paleopolyploid event occurred before the
50
split of Hevea and Manihot. We show recent and rapid bursts of the three Hevea-
51
specific LTR-retrotransposon families during the last ten million years (MYR),
52
leading to the massive expansion ~60.44% (~890 Mbp) of the whole rubber tree
53
genome since the divergence from Manihot. We identify large-scale expansion of
54
genes associated with whole rubber biosynthesis processes, such as basal metabolic
55
processes, ethylene biosynthesis, and the activation of polysaccharide and
56
glycoprotein lectin, which are important properties for the latex production. We report
57
the first map of genomic variation between the cultivated and wild rubber trees and
58
obtained ~15.7 million high-quality single nucleotide polymorphisms (SNPs). We
59
identified hundreds of candidate domestication genes with drastically lowered
60
genomic diversity in the cultivated but not wild rubber trees despite a relatively short
61
domestication history, some of which are involved in the rubber biosynthesis. This
62
genome assembly represents key resources for future rubber tree breeding programs
63
providing novel gene targets to improve biotic and abiotic tolerance and rubber
64
production.
65 66
Key words: rubber tree, rubber biosynthesis, chromosome evolution, whole genome
67
duplication, domestication
68 69
70
INTRODUCTION
71
The rubber tree (Hevea brasiliensis (Willd. ex A. Juss.) Muell. Arg.), together with a
72
number of other economically important plant species, such as castor bean (Ricinus
73
communis L.), cassava (Manihot esculenta Crantz), and Barbados nut (Jatropha
74
curcas), belong to the spurge family (Euphorbiaceae). Among ~2,500 latex-yielding
75
plants, such as Eucommia ulmoides Oliver (Wuyun et al., 2018) and Taraxacum kok-
76
saghyz Rodin (Lin et al., 2018), only H. brasiliensis produces commercially viable
77
amount of natural rubber (cis-1, 4-polyisoprene), making up more than 98% of the
78
world’s natural rubber production (Backhaus, 1985; Bowers, 1990). These high-
79
quality isoprenoid polymers possess unique physical and chemical properties that are
80
incomparable to any synthetic alternatives (van Beilen and Poirier, 2007). Natural
81
rubber is, thus, an indispensable source material for numerous rubber products
82
worldwide. In sharp contrast to the environmental pollution of the industrial
83
synthetics, the rubber tree is able to sustainably yield natural rubber that is still
84
imperative for worldwide high-performance engineering components, such as heavy-
85
duty tires. H. brasiliensis remains a longstanding target for genetic manipulation in
86
order to improve and industrially enhance the commercial production of natural
87
rubber.
88 89
H. brasiliensis is a cross-pollinated tropical tree that grows to 30–40 m tall and can
90
live up to 100 years in the wild (Priyadarshan and Clement-Demange, 2004).
91
Historical records have documented that the domestication of the rubber tree, which is
92
native to the Amazon Basin in South America, began in 1896, and then dispersed to
93
Southeast Asia with the transfer of H. brasiliensis seedlings, mainly including
94
Malaysia, Indonesia, Thailand, and China today (Chan, 2000). Over a century’s
95
traditional breeding has greatly increased rubber productivity from 650 kg ha–1
96
yielded from wild natural populations of H. brasiliensis during the 1920s to 2,500 kg
97
ha–1 harvested in elite cultivars by the 1990s (Priyadarshan and Goncalves, 2003),
98
which is far below the hypothetical yield of 7,000-12,000 kg ha–1 in the rubber tree
99
(Webster and Baulkwill, 1989). Notwithstanding the origin of the rubber tree from the
100
Amazonian basin, the production of natural rubber in South America has merely 2%
101
of the total production worldwide because of the destructive spread of South
102
American leaf blight (SALB) disease affected by ascomycete Microcyclus ulei in the
103
1930s (Lieberei, 2007). Thus, although human selection efforts that have brought a
104
slow increase in rubber productivity, it is even more urgently required to raise new H.
105
brasiliensis varieties with desirable traits, including the tolerance to cold, drought and
106
wind, and particularly resistance to diseases caused by pathogenic fungi.
107
Nevertheless, the domestication from a small number of wild individuals originating
108
from the Amazonian basin of South America created a severe bottleneck that has led
109
to a restricted gene pool for cultivated rubber tree germplasms. This stands in contrast
110
to the wealth of phenotypic diversity and genetic adaptations residing in natural wild
111
population. These have the potential to be exploited through breeding programs that
112
would help expand genomic diversity important for the generation of more
113
environmentally resilient and high-yielding varieties. The widespread application of
114
marker-assisted selection promises to advance the breeding efficiency but requires
115
access to a high-quality rubber tree genome sequence in order to accelerate the pace at
116
which the untapped reservoir of agronomically important genes can be exploited from
117
the wild.
118 119
Considering the tremendous economic importance of the rubber tree, there has been a
120
longstanding effort to obtain a high-quality reference genome assembly (Lau et al.,
121
2016; Pootakham et al., 2017; Rahman et al., 2013; Tang et al., 2016). The first draft
122
genome sequence of RRIM600 clone, generated by a Malaysia research team, was
123
very fragmented but provided our first glimpse of the genome structure (Rahman et al.,
124
2013). The second draft rubber tree genome of the cultivar Reyan7-33-97 was
125
reported based on sequence data from both whole-genome shotgun (WGS) and pooled
126
BAC clones. While this assembly was significantly improved with respect to
127
annotation, the use of short Illumina reads posed difficulties in resolving highly
128
heterozygous and repetitive DNA sequences (Tang et al., 2016). The RRIM 600
129
genome assembly was subsequently improved based on ~155-fold combined coverage
130
with Illumina and PacBio sequence data. As part of that effort, 100 SMRT Cells were
131
sequenced using a 10-kbp SMRTbell library yielding 45.25 Gb (~21-fold coverage)
132
with an average read length of 6,852 bp (Lau et al., 2016). Until recently, deep-
133
coverage 454/Illumina short-read and Pacific Biosciences (PacBio) long-read
134
sequence data were combined to generate a de novo hybrid assembly of the
135
preliminary BPM24 rubber tree draft genome, which was subsequently scaffolded
136
using a long-range “Chicago” technique to obtain the best assembly of 1.26Gb (N50=
137
96.8 kb) to date. Using a SNP-based genetic map, only ~28.9% of the genome
138
assembly (~363 Mb) was successfully anchored into rubber tree’s 18 linkage groups
139
(Pootakham et al., 2017). Notwithstanding the complexity of the rubber tree genome,
140
a high-quality chromosome-level genome assembly is required in order to provide a
141
comprehensive understanding of the latex biosynthesis, genetic improvement of
142
desirable agronomic traits and efficient utilization of introduced alleles from wild
143
populations to expand the genetic basis of the cultivated rubber tree.
144 145
Here, we present a high-quality genome sequence of the rubber tree cultivar GT1, an
146
elite cultivar worldwide cultivated based on the single-molecular long-read
147
sequencing (SMRT) technology. This assembly has a length of 1.47-Gb and spans
148
∼94% of the estimated genome size of 1.56-Gb. We anchored this assembly into 18
149
chromosomes and generated the first chromosome-scale genome assembly assisted by
150
Hi-C technology. Together with the data analysis of resequenced genomes and
151
comparative transcriptomics of cultivated and wild rubber trees, we provide novel
152
insights into gene and genome evolution, domestication and the latex biosynthesis in
153
the rubber tree.
154
RESULTS
155
Genome sequencing, assembly, and feature annotation
156
We first performed a whole-genome shotgun sequencing (WGS) analysis of a rubber
157
tree cultivar GT1 with the next-generation sequencing (NGS) Illumina Hiseq 2000
158
platform. This generated clean sequence datasets of ~348.14 Gb and yielded
159
approximately 261.4-fold coverage (Supplementary Table 1). Using a 17-mer
160
analysis method, we estimated the genome size to be ~1.56 Gb (Supplementary
161
Table 2; Supplementary Figure 1). The genome heterozygosity was estimated to be
162
1.60-1.62% using GenomeScope (Vurture et al., 2017). The genome was assembled
163
using SOAPdenovo (Li et al., 2010), resulting in the final assembly of ~1.59 Gb
164
(Supplementary Table 3). The contig and scaffold N50 lengths were ~8.79 Kb and
165
~31.3 Kb, respectively (Supplementary Table 3).
166 167
In order to resolve the repetitive structure and heterozygous regions (Lau et al., 2016;
168
Pootakham et al., 2017; Rahman et al., 2013; Tang et al., 2016), we also sequenced
169
the same GT1 individual using PacBio single-molecule long-read sequencing (SMRT)
170
sequencing platform. We generated a total of ~161.86 Gb (103.75-fold sequence
171
coverage) of long read sequence data from 20 Kb and 40 Kb insert libraries with
172
subread N50 lengths of 9.07 Kb and 18.34 Kb, respectively (Supplementary Table 4;
173
Supplementary Figure 2). We subsequently performed a PacBio-only assembly
174
using an overlap layout-consensus method implemented in FALCON (version 0.3.0)
175
(Chin et al., 2013), and obtained a 1.47-Gb genome assembly with a contig N50 of
176
152.7 Kb (Table 1; Supplementary Tables 5-6). This final assembly of the rubber
177
tree genome comprises 16,023 scaffolds, of which 15,885 scaffolds (>10 kb) represent
178
99.93% of the 1.56-Gb genome (Supplementary Tables 5-7). We employed ~30.9
179
Gb high-quality next-generation sequencing (NGS) data with 19.8-fold genome
180
coverage using the Illumina HiSeq 2500 platform to polish the assembled genome
181
(Supplementary Table 4). Next, ~119.63 Gb sufficiently high-quality Hi-C data
182
(76.69×) were used to further superscaffold the genome assembly (Supplementary
183
Table 8; Supplementary Figure 3). We finally obtained a reference genome of the
184
rubber tree on the chromosome level by anchoring ~1,442 Mbp-sized contigs into 18
185
chromosomes using AllHic v0.8.12 (Zhang et al., 2019) (Figure 1; Supplementary
186
Tables 9-10; Supplementary Figure 4). The total length of the assembled genome
187
sequences accounts for ~92.4% of the estimated genome size (Table 1), about four
188
times longer than the previously reported genome assembly of BPM24 that was linked
189
using a high-density SNP-based genetic map (Pootakham et al., 2015). The lengths of
190
18 chromosomes of the GT1 genome ranged from ~36 Mbp (Chr17) to ~104 Mbp
191
(Chr09) with an average size of ~80 Mbp (Figure 1; Supplementary Table 10;
192
Supplementary Figure 4).
193 194
To evaluate the quality of the assembled rubber tree genome, we first mapped ~61
195
Gbp Illumina short reads. Our results showed that more than 98% of NGS reads could
196
be unambiguously represented with an expected insert size distribution, indicating a
197
high confidence of genome scaffolding (Supplementary Table 11). We then aligned
198
51,701 expressed sequence tags (ESTs) retrieved from public databases and 102,235
199
unigenes assembled through RNA-sequencing (RNA-Seq) data of the six GT1 tissues
200
with our assembled genome, showing that ~97.24% and 97.14% of protein-coding
201
genes could be covered, respectively (Supplementary Tables 12-13). Finally, we
202
assessed core gene statistics using BUSCO (Simao et al., 2015) to verify the
203
sensitivity of gene prediction and the completeness and appropriate haplotig merging
204
of the genome assembly. Our predicted genes recovered 1,359 of the 1,440 (94.4%)
205
highly conserved core proteins in the Embryophyta lineage (Supplementary Table
206
14).
207 208
Whole genome comparisons of the GT1 genome with the three other rubber tree
209
genome assemblies showed that the ordered syntenic genomic sequences displayed a
210
good collinearity between GT1 and the three other rubber tree genomes; ~86-95% of
211
the GT1 genome sequences with lengths of >1 kbp were covered by any of the other
212
three genomes (Supplementary Table 15; Supplementary Figures 5-6). However,
213
the GT1 genome assembly using long-read PacBio sequencing data alone in this study
214
revealed closer syntenic relationships with hybrid assemblies of RRIM600 and
215
BMP24 with Illumina and PacBio sequencing data (Lau et al., 2016; Pootakham et al.,
216
2017) than the fairly fragmented genome assembly of Reyan7-33-97 using the
217
Illumina sequencing data alone (Tang et al., 2016). We annotated approximately
218
1042.42 Mbp (~70.82%) of repetitive sequences, among which the GT1 genome
219
comprised ~65.88% (~969.72 Mbp) of LTR retrotransposons (Supplementary Table
220
16). Total TE content of the GT1 genome is larger than the formerly reported genome
221
assemblies of the two rubber tree cultivars (Reyan7-33-97: 66.46%; RRIM600:
222
69.80%) and comparable to BPM24 (BPM24: 71.10%) (Supplementary Table 16),
223
indicating a high-quality performance to de novo assemble genomic regions
224
containing highly repetitive sequences using long PacBio reads (Supplementary
225
Figure 6).
226 227
Combining ab initio prediction and transcriptome sequence alignments from RNA-
228
Seq data of the six tissues (Supplementary Tables 17-18), we predicted a total of
229
44,187 protein-coding genes with an overall support of 96.76% (Table 1;
230
Supplementary Tables 19-20; Supplementary Figure 7a). Of them, 79.34%,
231
75.80%, 94.70% and 96.37% could be functioned with SwissProt, PFAM, TrEMBL
232
and Interpro databases, respectively (Supplementary Table 20; Supplementary
233
Figure 7b). We further performed homology searches and annotated noncoding RNA
234
(ncRNA) genes (Supplementary Table 21), yielding 945 transfer RNA (tRNA)
235
genes, 93 ribosomal RNA (rRNA) genes, 61 small nucleolar RNA (snoRNA) genes,
236
396 small nuclear RNA (snRNA) genes and 373 microRNA (miRNA) genes. We
237
annotated ~689,255 simple sequence repeats, which will provide valuable genetic
238
markers to assist future genetic improvemet of the rubber tree (Supplementary Table
239
22; Supplementary Figure 8).
240 241
Chromosomal evolution after a common paleotetraploidy in the spurge family
242
Chromosome-scale genome assembly ensures the detection of whole genome
243
duplication (WGD) events that affected chromosomal evolution in the rubber tree. We
244
examined the rubber tree genome for homeologous genomic segments based on
245
sequence similarities of paralogous genes. A total of 1,901 syntenic blocks containing
246
26,403 paralogous gene pairs were identified in the rubber tree genome
247
(Supplementary Table 23). Comparative analyses of paralogous gene pairs from
248
these intragenomic homeologous blocks unquestionably showed that rubber tree has
249
experienced two paleotetraploidization events corresponding to sequence divergence
250
peaks at ~0.124 and 0.530 synonymous transversions per site, respectively
251
(Supplementary Figure 9). Comparisons among paralogues and orthologues of the
252
five Malpighiales species show that a recent paleotetraploidy event occurred prior to
253
the split of the Hevea and Manihot lineages, and an ancient eurosid WGD was shared
254
by all these examined species (Supplementary Figure 9). This finding strongly
255
supports the hypothesis that a recent paleotetraploidy event took place before the
256
divergence of the Hevea and Manihot species but after the split of the castor bean
257
(Pootakham et al., 2017).
258 259
The rubber tree contains the same number of chromosomes (2n=36) as the cassava,
260
which is almost twice as many as that of castor bean (2n = 20) (Chan et al., 2010).
261
The genus Hevea is closest to Manihot in the spurge family (Euphorbiaceae), and they
262
diverged from each other approximately 36 Myr (Bredeson and Lyons, 2016;
263
Pootakham et al., 2017). The chromosome-based genome assembly we obtained for
264
GT1 permits us to identify the 1,901 syntenic blocks of the rubber tree genome by
265
inter-chromosomal comparisons, within which 26,403 paralogous gene pairs are
266
spread across the 18 chromosomes, falling into the five two-by-two chromosome pairs
267
and two four-by-four chromosome groups (Supplementary Table 23; Figure 1;
268
Figure 2a;Supplementary Figure 10). We similarly detected 38,833 orthologous
269
gene pairs corresponding to 1,829 conserved syntenic blocks between the rubber tree
270
and cassava genomes (Supplementary Table 23; Supplementary Figure 11). These
271
can also be divided into five two-by-two pairs and two four-by-four groups
272
(Supplementary Table 23; Supplementary Figure 12). Bredeson et al. (2016)
273
previously reported a similar pattern of conserved synteny comprising five groups of
274
two-by-two chromosome pairs and two chromosome groups of four-by-four as a
275
result of the paleotetraploidy in the cassava. These two chromosome-based high-
276
quality genome assembles ensured a reliable discovery of macrosynteny conservation
277
between the two euphorbs. Macrosynteny conservation of rubber tree and cassava that
278
comprise the same chromosome nui ombers further supports the hypothesis that a
279
whole genome duplication event occurred in the common ancestor of Hevea and
280
Manihot.
281 282
To trace the palaeohistory of euphorbs and understand the chromosomal evolution
283
after the polyploidization event, we performed a comparative genomic analysis of the
284
rubber tree with cassava (Bredeson et al., 2016) using the grape (Jaillon et al., 2007)
285
as the closest modern representative of the ancestral eudicot karyotype (AEK)
286
(Badouin et al., 2017; Salse, 2016). Since the recent WGD event occurred before the
287
split of the rubber tree and cassava lineages, these two paleopolyploid plants have
288
experienced diploidization through structural and functional changes, providing an
289
unprecedented opportunity to understand chromosomal evolution in spurge plants. We
290
reconstructed a model to infer the scenario of chromosomal evolution in rubber tree
291
and cassava based on genome assemblies at the chromosome level (Figure 2b). In the
292
rubber tree genome, Hbr04, Hbr08, Hbr10, and Hbr13 did not experience many
293
rearrangements while other chromosomes have suffered from a large number of
294
fission and fusion events (Figure 2b). In the cassava genome, Mes03 and Mes04
295
retained few regions derived from the eudicot ancestor compared with other
296
chromosomes (Figure 2b). Our comparative genomic analysis of the rubber tree and
297
cassava further identified a large number of genomic structural variation after a shared
298
polyploidization event (Supplementary Figure 13). The model of chromosomal
299
evolution for rubber tree and cassava reveals that substantial genomic rearrangement
300
events have extensively shaped the chromosome structure leading to the 18 modern
301
chromosomes since the common paleopolyploid ancestor. `
302 303
The three LTR retrotransposon families are drivers of the expanded rubber tree
304
genome
305
Using our chromosome-based genome assembly, we investigated the evolution of
306
LTR-retrotransposons and their potential contribution to the growth of the rubber tree
307
genome. The rubber tree has experienced a rapid growth of genome size as a result of
308
the Hevea-specific proliferation of transposable elements compared to the three other
309
closely related species of Euphorbiaceae, including cassava, castor bean and physic
310
nut (Table 1; Figure 3; Supplementary Tables 24-25; Supplementary Figures 14-
311
15). Retrotransposons in the rubber tree genome are particularly abundant (~1042.42
312
Mb; ~70.82%) compared to five other plant species of Malpighiales, including
313
Manihot esculenta, Jatropha curcas, Ricinus communis, Populus trichocapa and
314
Linum usitatissimum. Specifically, the rubber tree genome is enriched in LTR
315
retrotransposons (Ty1/copia, Ty3/gypsy and non-autonomous LTR retrotransposons)
316
with a total length of ~969.72 Mb (~65.88%) (Figure 3; Supplementary Figures 15-
317
16; Supplementary Tables 24-25). Dating transposable elements shows that
318
retrotransposons and DNA transposons were almost separately amplified among the
319
six sequenced plant species of Malpighiales (Supplementary Figure 14).
320
Comparative analyses of the Ty1/copia, Ty3/gypsy and other classes of LTR
321
retrotransposons suggest that they have experienced three retrotransposition bursts
322
over the last 10, 20 and 30 million years ago (mya), respectively (Supplementary
323
Figure 14). Ty3/gypsy LTR retrotransposon families dominate the rubber tree genome
324
contributing ~585.69 Mb (~39.79%) and are ~3.13-fold more abundant than
325
Ty1/copia with ~187.34 Mb (~12.73%) (Supplementary Tables 24-25; Figure 3a;
326
Supplementary Figure 15). We exclusively observed that the amplification of Tekay
327
(~430.60 Mb; ~29.25%) of Ty3/gypsy and Angela (~96.70 Mb; ~6.57%) of Ty1/copia
328
has largely contributed to the expansion of the rubber tree genome (Figure 3;
329
Supplementary Figure 15a, b). Surprisingly, the largest Ty3/gypsy retrotransposon
330
family HB0001 (~368.08 Mb; ~25.01%) belonging to Tekay has long been active over
331
the last 45 million years (myr). Whereas, the two Ty1/copia retrotransposon families,
332
HB0002 (~96.70 Mb; ~6.57%) and HB0003 (~62.52 Mb; ~4.25%) belonging to
333
Angela and Tekay, respectively, have recently proliferated over the last 10 MYR
334
(Supplementary Figure 15c; Supplementary Figure 17). It is these three LTR
335
retrotransposon families that have predominantly amplified during the last 10 MYR;
336
they together account for ~54.38% of LTR retrotransposons and ~35.83% of the
337
whole rubber tree genome (Figure 3; Supplementary Figure 15c; Supplementary
338
Figure 17; Supplementary Table 26). Further annotation and comparative analyses
339
of the repeated elements among the four Euphorbiaceae species (Supplementary
340
Figure 15c; Supplementary Figure 17), including M. esculenta, J. curcas, R.
341
communis and H. brasiliensis, suggests that only the H. brasiliensis genome has
342
undergone specific bursts of these three LTR-retrotransposon families during the past
343
5 MYR. This resulted in the accumulation of a large number of retroelements driving
344
the growth of ~890 Mb of the rubber tree genome after the divergence from the M.
345
esculenta lineage around 36 Myr (Supplementary Figure 17).
346 347
The rapid evolution of gene families and the rubber biosynthesis
348
Defining the rapidly evolving gene families among flowering plants has been helpful
349
to identify genomic basis underlying physiological changes of metabolite constituents
350
during evolution. We compared the predicted proteomes of the rubber tree, cassava,
351
castor bean, physic nut, poplar, flax, Arabidopsis thaliana, and rice, yielding a total of
352
24,562 orthologous gene families that comprised 225,207 genes (Supplementary
353
Table 27; Supplementary Figure 18a). This revealed a core set of 121,256 genes
354
belonging to 7,498 gene clusters that were shared among all eight plant species,
355
representing ancestral gene families (Supplementary Figure 18a). We obtained a
356
total of 1,352 gene clusters containing 4,791 genes unique to the rubber tree,
357
potentially related to biosynthetic processes associated with the production of latex.
358
PFAM functional analysis showed that gene functions are enriched in aspartic acid
359
proteinase gene family (PF16845, P < 0.05), which encodes enzymes associated with
360
the production of lutoid membranes necessary for the aggregation of rubber particles
361
(Supplementary Table 28). Remarkably, we found that the rubber tree-specific gene
362
families are significantly enriched in functions related to the phosphorylation activity,
363
which is key to biosynthetic processes of latex (PF08645, P < 0.05; PF03372, P <
364
0.001) (Supplementary Table 28).
365 366
In flowering plants, the expansion or contraction of gene families is an important
367
driver of phenotypic diversification and the formation of phytochemical properties
368
(Chen et al., 2013; Ohno, 1970). We characterized gene families that underwent
369
discernible changes and divergently evolved along different branches with a particular
370
emphasis on those involved in the latex biosynthesis of the rubber tree. Our results
371
revealed that, of the 14,253 gene families inferred to be present in the most recent
372
common ancestor of the eight studied plant species, 5,034 gene families comprising
373
14,103 genes show significant expansions (P < 0.05) in the rubber tree lineage
374
(Supplementary Figure 18b; Supplementary Figure 19). Functional enrichment
375
analyses of these genes by both Gene Ontology (GO) terms and PFAM domains
376
surprisingly reveals that they were mainly enriched in a number of functional
377
categories involved in whole latex biosynthesis process that comprises twelve
378
pathways (Rahman et al., 2013) (Supplementary Table 29). For examples, functional
379
annotation of these genes demonstrates that they were mainly enriched in a number of
380
functional categories involved in basal metabolic processes, such as fructose 6-
381
phosphate metabolic process (GO:0006002; P < 0.001), glucose metabolic process
382
(PF11721; P < 0.001), phospholipase (PF00388, PF12357, PF00614, PF09279,
383
PF00387; P < 0.001), ribosomal protein (PF01020, PF00453, PF00347; P < 0.001),
384
argonaute (PF08699, PF16487, PF16488, PF16486; P < 0.001), aminotransferase
385
(PF00202; P < 0.001) and dehydrogenase (PF00725, PF09265; P < 0.001).
386
Furthermore, gene families are significantly enriched in a number of functions related
387
to carbohydrate metabolism, such as hydrolase (PF00702, PF03662, PF07748,
388
PF01915, PF01738, PF03644, PF00332, PF01074, PF12215, PF04685, PF00933,
389
PF00232, PF01301, PF12710; P < 0.001), glutamine synthetase (PF03951, PF00120;
390
P < 0.001), carbohydrate-binding protein of the ER (PF12819; P < 0.001), glutamate
391
decarboxylase (GO:0004970; P < 0.001), and UDP-glucose/GDP-mannose
392
dehydrogenase family (PF00984, PF03720; P < 0.001) (Supplementary Table 29).
393
In addition, gene families are significantly enriched in a number of functions related
394
to pyruvate metabolism involved in the MVA pathway (PF02887; P < 0.001). Notably,
395
we find that gene families encoding S-adenosyl-L-methionine synthase (SAMS) are
396
significantly enriched in the ethylene biosynthesis, including S-adenosylmethionine
397
biosynthetic process (GO:0006556; P < 0.001), methionine adenosyltransferase
398
activity (GO:0004478; P < 0.001) S-adenosylmethionine synthetase (PF02773,
399
PF00438, PF02772; P < 0.001), and adenosylmethionine decarboxylase (PF01536; P
400
< 0.001). Gene families are also significantly enriched in a number of functions
401
related to the biosynthesis of metabolic compounds, including polysaccharide
402
(PF03033,PF05686,PF00982,PF13641; P < 0.001) and glycoprotein lectin
403
(PF00139,PF02140,PF01453; P < 0.001) (Supplementary Table 29). Enrichment
404
analyses show that the expanded gene families are enriched in a total of 18 KEGG
405
pathways, including fifteen metabolism-related pathways, such as pyruvate
406
metabolism (ko00620; P < 0.05) relevant to biosynthetic processes of latex
407
(Supplementary Table 30; Supplementary Figure 18c).
408 409
As an important biological feature of the rubber tree, rubber is sequentially
410
synthesized by twelve pathways, in which hundreds of gene families are involved
411
(Rahman et al., 2013). In H. brasiliensis, natural rubber is a high molecular weight
412
biopolymer that comprises cis-isoprene units resulting from isopentenyl diphosophate
413
(IPP). The biosynthesis of IPP takes place via two distinct paths, that is, MVA and
414
MEP pathways. In total, we identified 70 so-called rubber biosynthesis-related genes,
415
including 11 genes involved in the MVA pathway, 18 genes associated with MEP
416
pathway, 11 genes responsible for initiator synthesis in the cytosol, and 30 genes
417
related to the rubber elongation (Supplementary Table 32; Figure 4). Compared to
418
non-rubber-produced plant species, the rubber tree shows the largest number of genes
419
involved in the synthesis of IPP to the final rubber polymer, which may largely
420
enhance the formation of isoprenoids (Supplementary Table 33). We particularly
421
observed a significant expansion of rubber biosynthesis-related gene families that
422
correlates with its capacity to produce high levels of latex in H. brasiliensis, such as
423
the eight f1-deoxy-D-xylulose 5-phosphate synthase (DXS) encoding DXP synthase
424
that catalyzes the first step in the MEP pathway, twelve cis-prenyltransferase (CPT),
425
eight rubber elongation factor (REF) and ten small rubber particle (SRPP) genes
426
(Supplementary Table 33). Besides a significant expansion (13/18) of the
427
REF/SRPP gene family that make gene clusters on Chromosome 9 (Figure 4), we
428
also found that nine of the identified twelve CPT genes are clustered on Chromosome
429
14 (Figure 4), suggesting that many of them appear to have arisen by tandem
430
duplication events.
431 432
To gain insights into the molecular mechanisms underlying the production of natural
433
rubber we examined tissue-specific expression patterns of genes involved in the
434
rubber biosynthesis using RNA-Seq datasets from barks, roots, flowers, stems, leaves,
435
and seeds. Our results show that these 20 rubber biosynthesis-related gene families
436
exhibit distinct patterns of gene expression among tissues; interestingly, they are most
437
highly expressed in flowers when compared to barks, followed by stems
438
(Supplementary Table S32). We paid particular attention to tissue-specific
439
expression patterns of REF/SRPP and CPT gene families that are functionally known
440
to encode enzymes that are responsible for the rubber elongation. The 18 REF/SRPP
441
genes exhibit dissimilar expression patterns among six tissues of the rubber tree, of
442
which we intriguingly found that REF1, REF2, REF3, REF4 and REF5 are highly
443
expressed in flowers, while REF6, REF7 and REF8 are highly expressed in seeds.
444
However, the majority of REF genes are expressed in barks. Interestingly, we
445
observed that SRPP3, SRPP5, SRPP6, SRPP8, SRPP9, and SRPP10 are highly
446
expressed in flowers compared to the barks, while SRPP2 and SRPP4 are favorably
447
expressed in barks when compared to flowers. Our results indicated that CPT1, CPT4,
448
CPT7, and CPT11 are highly expressed in flowers, while others are preferentially
449
expressed in barks (CPT12), roots (CPT2 and CPT9), leaves (CPT4 and CPT5) and
450
seeds (CPT6), respectively. Taken together, we show that, besides the high expression
451
of a small number of genes in roots, stems, leaves, and seeds, the rubber may be
452
actively synthesized in both flowers and barks, but predominantly accumulates in
453
barks. Our differential expression analysis further reveals that 276 DEGs related to the
454
rubber biosynthesis are commonly up- regulated in barks compared with five other
455
tissues (Supplementary Table 34; Supplementary Figure 20), of which we
456
identified two latex-produced genes involved in TCA-cycle and one gene associated
457
with starch metabolism (Supplementary Table 35).
458 459
To provide a transcriptomic overview of expression divergence present between
460
cultivated and wild rubber trees that underlies the variation in latex yields, we
461
examined tissue-specific expression patterns of genes involved in the rubber
462
biosynthesis using RNA-Seq data from barks of the five elite cultivars of the rubber
463
tree and the five accessions of wild rubber tree representatives of its global geographic
464
range. Our results show that these 20 rubber biosynthesis-related gene families
465
exhibited distinct patterns of gene expression across different accessions of rubber
466
trees (Supplementary Figure 21; Supplementary Tables 36-38). PCA analysis
467
based on expression levels of rubber biosynthesis-related genes further suggests that,
468
compared to abundant expression bias across GT1 tissues, there are no remarkably
469
preferential expression patterns between the cultivated and wild rubber trees
470
(Supplementary Figures 21-22; Supplementary Tables 36-37). Our statistic test
471
further showed that, among all 69 rubber biosynthesis-related genes, only seven genes
472
are significantly differentially expressed between cultivated and wild rubber trees
473
(Supplementary Table 38), which were experimentally validated by real-time PCRs
474
(Supplementary Figure 23; Supplementary Table 39).
475 476
Genomic insights into population divergence and artificial selection on cultivated
477
rubber tree
478
The first high-quality rubber tree genome allows us to examine genomic variation
479
present within the cultivated and wild rubber trees and detect potential signatures of
480
selection during the domestication. We employed the Illumina short-read technology
481
with paired-end libraries on the HiSeq2000 sequencing platform to resequence eight
482
elite cultivars of the rubber tree and six accessions of wild rubber tree according to
483
native geographic range throughout the world (Supplementary Table 40). Each of
484
these 14 cultivated and wild representative genomes (first reported in this study), were
485
sequenced to > 3-fold sequence coverage using paired-end (2 × 100-bp) Illumina
486
sequencing. In addition, we included two previously reported genomes of cultivated
487
rubber trees (RRIM 600 and Reyan7-33-97) with >10-fold genome sequence coverage
488
(Lau et al., 2016; Tang et al., 2016). We first mapped high-quality short reads back to
489
the reference genome sequence of the rubber tree, and obtained mapping rates ranging
490
from 95.81% to 98.86% (Supplementary Table 40). They represent the largest
491
whole-genome sequencing data for a diverse panel including 16 accessions with
492
sufficient depths of genome coverage to represent the genomic diversity of cultivated
493
and wild rubber trees.
494 495
After aligning reads against the rubber tree reference genome sequence, we obtained a
496
total of 15,728,276 common single nucleotide polymorphisms (SNPs), more than 90%
497
of which were located on intergenic regions (Supplementary Tables 40-41;
498
Supplementary Figure 24). The density of SNPs across all 16 cultivated and wild
499
rubber trees averages approximately 11.9 SNPs per kilobase, of which 14,645,744 and
500
14,497,454 SNPs were totally identified across these 10 cultivars and 6 wild
501
accessions of the rubber tree, respectively. We observed 1,547,989 SNPs (9.84%) in
502
genic regions, including 174,394 synonymous, 258,938 nonsynonymous, 426,705
503
exonic, and 1,121,284 intronic SNPs. We used this large dataset of SNPs from both
504
wild and cultivated rubber trees to evaluate genome-wide levels of nucleotide
505
diversity (π and θw) to be 0.00345±0.00021 and 0.00291±0.00041. The wild rubber
506
trees show higher levels of nucleotide diversity than rubber tree cultivars (π per kb,
507
3.43 vs. 2.86, P = 2.29 × 10-5; θw per kb, 2.59 vs. 2.55, P = 4.21 × 10-1)
508
(Supplementary Table 42).
509 510
The genome-wide SNP dataset provides an unprecedented opportunity to investigate
511
the population structure of the rubber tree. Principal-component analysis (PCA) of
512
SNP variation suggests that the top two eigenvectors strongly correlate with sampling
513
sources: PC1 was correlated with cultivated rubber trees admixed by two accessions
514
of wild rubber tree (Wild258 and Wild166), and PC2 was correlated with wild rubber
515
trees (Figure 5a). These inferred relationships are also supported by phylogenetic
516
analysis based on SNPs (Figure 5b), showing that the ten cultivars (Cult053, Cult169,
517
Cult171, Cult223, Cult293, Cult589, KEN, RRIM 600, RP, and RY7-33-97) formed
518
one cluster admixed by two accessions of wild rubber tree (Wild258 and Wild166),
519
while the four wild rubber trees (Wild162, Wild194, Wild232 and Wild778) were
520
clustered into another group. To generate an alternative view of population
521
stratification, we used the population clustering program STRUCTURE (Pritchard,
522
2010), which inferred the optimal number of genetic clusters comprising the
523
cultivated and wild rubber tree genomes to be K = 7 (Figure 5c; for other K values,
524
see Supplementary Figure 25). We found that cultivated rubber trees predominantly
525
form three major clusters, two of which are grouped with Wild258 and Wild166,
526
respectively. However, the four accessions of wild rubber trees split into four distinct
527
groups, of which Wild778 has contribute the most to cultivated rubber trees, probably
528
serving as an ancestral population.
529 530
Population genomics provides an opportunity to track the dispersal, genetic
531
bottlenecks and signature of artificial selection during the domestication in most
532
cultivated crop species (Doebley et al., 2006). To detect the footprint of artificial
533
selection in the rubber tree genome we attempted to identify genomic regions with
534
significantly lowered levels of polymorphisms in cultivated populations compared to
535
wild rubber trees. While comparing to the FST values and levels of polymorphisms
536
between cultivated and wild accessions of the rubber tree (Figures 5d, 5e), we
537
performed a whole-genome screening and identified ~83.7 Mbp of the genome
538
potentially subjected to selective sweeps. These regions harbored 578 genes, which
539
we consider candidate domestication genes of the rubber tree (Supplementary Table
540
43). KEGG enrichment analysis shows that these candidate domestication genes are
541
enriched in twelve pathways, of which nine genes enriched in ko00900 associate with
542
the terpenoid backbone biosynthesis that is key to the latex biosynthesis
543
(Supplementary Figures 26-27). Further functional annotation indicates, for example,
544
that the two genes (GT005609, GT009352) encode the first enzyme (DXS) in the
545
MEP pathway (Supplementary Table 43).
546 547
DISCUSSION
548
De novo sequencing and assembly of large, highly repetitive, and heterozygous plant
549
genomes have long been challenging and problematic. Here we construct a high-
550
quality reference for one such genome, the GT1 genome for H. brasiliensis. We
551
generated a 1.3-Gbp de novo assembly of the rubber tree genome highlighting the
552
potential of combining PacBio long-read and Illumina short-read assemblies to create
553
improved reference genomes, in sharp contrast to relatively fragmented genome
554
assemblies using the Illumina sequencing data alone (Rahman et al., 2013; Tang et al.,
555
2016) or hybrid assemblies with Illumina and PacBio sequence data (Lau et al., 2016;
556
Pootakham et al., 2017). We thus established the efficiency of employing long SMRT
557
reads to resolve ambiguous genomic regions harboring predominantly repetitive
558
sequences, particularly LTR-retrotransposons.
559 560
Considering the difficulty in generating a high-density linkage map for the rubber tree
561
as a long lifespan tree species, we demonstrated an efficient methodology that
562
leverages long-range Hi-C data to scaffold contigs assembled from PacBio reads and
563
successfully anchor ~71% of the 1.3-Gb genome assembly into 18 pseudo-
564
chromosomes. After the availability of four draft genome assemblies for the rubber
565
tree (Lau et al., 2016; Pootakham et al., 2017; Rahman et al., 2013; Tang et al., 2016),
566
we obtain a chromosome-based reference genome with considerable improvement in
567
sequence contiguity. The advent of the GT1 reference genome together with
568
companion transcriptome resources presented in this study provides important insights
569
into spurge genome evolution. It also further strengthens interest in the rubber tree as
570
a model for ~2,500 rubber-produced plants to accelerate our understanding of the
571
molecular mechanisms underlying the rubber biosynthesis.
572 573
Our comparative genomics analysis of the five Malpighiales species convincingly
574
demonstrates that the rubber tree has experienced two paleotetraploidization events.
575
Besides an ancient eurosid WGD (Salse, 2016), we confirm a recent paleotetraploidy
576
event occurred before the divergence of the Hevea and Manihot species but after the
577
split of the castor bean by genome-scale comparative analysis of H. brasiliensis and
578
other members in Euphorbiaceae, M. esculenta, R. communis and J. curcas (Bredeson
579
and Lyons, 2016; Chan et al., 2010; Pootakham et al., 2017; Sato et al., 2011).
580
Chromosome-based analysis of the two high-quality genomes of M. esculenta
581
(Bredeson and Lyons, 2016) and H. brasiliensis reported in this study further reveal
582
that macrosynteny conservation derived from the shared polyploidization event were
583
disrupted by widespread genomic rearrangement events that drive chromosomal
584
evolution in the spurge family.
585 586
The rubber tree possesses an unusually large genome when compared to the majority
587
of other spurge plants. We compared the GT1 genome with three other previously
588
released genome assemblies, including RRIM600 (Lau et al., 2016; Rahman et al.,
589
2013), Reyan7-33-97 (Tang et al., 2016) and BMP24 (Pootakham et al., 2017),
590
showing nearly completely annotation of most transposable elements, including a
591
large number of LTR retrotransposons in the GT1 genome. We identify a large
592
Hevea-specific proliferation of LTR retrotransposons, which contributed to the rapid
593
increase in genome size when compared to the three other closely related species,
594
including cassava, castor bean and physic nut. Among the three LTR retrotransposon
595
families that have predominantly amplified and altogether account for over half of
596
whole rubber tree genome, we observed a pattern of longstanding and incessant LTR
597
retrotransposon bursts of the largest Ty3/gypsy retrotransposon family over the last 45
598
million years. This is similar to what has been reported for tea tree genome (Xia et al.,
599
2017). However, we also document the recent proliferation of two other large LTR
600
retrotransposon families during the last 10 MYR. Once again, this pattern is consistent
601
to a previous study that reported a recent proliferation of the three LTR-
602
retrotransposon families in the wild rice genome, Oryza australiensis, leading to a
603
two-fold increase in genome size during the last three MYR (Piegu et al., 2006).
604 605
The well-established WGD event occurred before the divergence of the Hevea and
606
Manihot species but after the split of the castor bean. This occurred in conjunction
607
with lineage-specific segmental duplication leading to the expansion of gene families
608
relevant to the activation of the latex biosynthesis and thus the increase of rubber
609
production. We have identified a large number of rubber tree-expanded genes
610
encoding enzymes widely involved in numerous functional categories related to basal
611
metabolic processes, carbohydrate metabolism as well as pyruvate metabolism
612
relevant to the MVA pathway, which have greatly enhanced the accumulation of
613
sufficient precursors for the subsequent latex production. The rubber tree-expanded
614
genes encoding SAMS are significantly enriched in GO terms and PFAM domains
615
involved in the ethylene biosynthesis that have greatly improved the latex production
616
(Dusotoit-Coucaud et al., 2010; Yang and Hoffman, 2003). We detected a significant
617
enrichment in a number of functions related to glycoprotein lectin and polysaccharide
618
activities that are well-known to regulate latex coagulation and the blocking of latex
619
flow—important for controlling the rubber production and yield of the rubber tree.
620
The completion of such a high-quality reference genome of rubber tree also permits us
621
to fully identified almost all gene families involved in sequentially long pathways of
622
the rubber biosynthesis from sucrose to rubber polymerization. Our comparative
623
analyses with non-rubber-produced plant species show that the rubber tree harbors the
624
largest number of genes involved in the latex biosynthesis. The rapid expansion of
625
rubber biosynthesis-related gene families, such as DXS, CPT and REF/SRPP, which is
626
in a good agreement with former observations (Lau et al., 2016; Pootakham et al.,
627
2017; Tang et al., 2016), suggests a strong correlation with its capacity to produce
628
high levels of latex in H. brasiliensis. We thus hypothesize that, instead of the
629
assumption that the expansion of the REF/SRPP family is associated with correlation
630
with the rubber biosynthesis (Tang et al., 2016), the formation of rubber biosynthesis
631
network and wide-ranging expansion of rubber biosynthesis-related gene families
632
enabled the rubber tree to significantly strengthen the latex biosynthesis and to
633
efficiently yield high-quality natural rubber.
634 635
Comprehensive comparative transcriptomic analyses aided by this high-quality
636
genome assembly also provided new insights into the molecular mechanisms
637
underlying the production of natural rubber. An assessment of transcriptomic data
638
shows that these rubber biosynthesis-related gene families are more highly expressed
639
in flowers than barks and stems, suggesting that a potential subfunctionalization or
640
neofunctionalization after gene duplication might explain functional divergence under
641
strong selection pressures that exaggerates the complexity in rubber biosynthesis. We
642
previously reported that the majority of genes involved in the ginsenoside
643
biosynthesis are highly expressed in flowers compared with leaves and roots in Panax
644
notoginseng (Zhang et al., 2017). The results suggested that ginsenosides might be
645
actively synthesized in flowers besides roots and leaves but accumulated in roots,
646
supported by a speedy accumulation of total triterpene saponins after flowering. We
647
thus assume that the rubber may be actively synthesized in flowers besides barks but
648
accumulated in barks, but the hypothesis that there is an increased yield of rubber in
649
bark after flowering requires to be experimentally validated in H. brasiliensis.
650 651
In addition to confirming earlier observations that REF/SRPP gene families are highly
652
expressed in latexs (Lau et al., 2016; Pootakham et al., 2017; Tang et al., 2016), we
653
observed differentially expressed patterns of REF/SRPP and CPT genes in flowers
654
and barks. Further experimental studies are required to examine the complicated
655
transcriptional regulation of these expanded rubber biosynthesis-related gene families.
656
Such investigations will likely offer clues to understanding the unique properties of H.
657
brasiliensis needed to produce extraordinary amounts of rubber.
658 659
We also provided new insights into natural standing genetic variation and divergence
660
between the cultivated and wild rubber trees. Compared to wild rubber trees, we
661
observe considerable reduction in genomic diversity among cultivated rubber trees.
662
This is likely the result of a strong genetic bottleneck and artificial selection during
663
the relatively short period of domestication since the late nineteenth century. Despite
664
limited phenotypic divergence and gene flow, there is a strong genetic split between
665
wild and the cultivated rubber trees. A subset of wild rubber trees, however, may
666
either represent the ancestral populations of the domesticated rubber tree or have
667
experienced more recent and possibly frequent genomic introgression with rubber tree
668
cultivars. Extensively sampling of cultivated and wild rubber trees will be required to
669
distinguish among these possibilities. Finally, we identify hundreds of candidate
670
domestication genes many of which are involved in specific pathways important for
671
rubber biosynthesis. More analyses and experimental validation are needed to
672
determine whether these represent bona fide domestication genes or whether they are
673
simply genetic hitchhikers of regions that were targeted by artificial selection over
674
one hundred years’ domestication of the rubber tree.
675 676
The rubber tree genome assembly and transcriptomic and genomic variation datasets
677
presented in this study will offer valuable information to aid efficient germplasm
678
exploration and the improvement of economically important traits of rubber tree to
679
meet global market’s increasing demand for natural rubber.
680 681
METHODS
682
Genome sequencing and assembly
683
DNA was extracted from a GT1 individual for PacBio RSII and Hiseq sequencing
684
platforms. One library with 20 Kb insert size was constructed and sequenced for 100
685
SMRT cells on PacBio RSII. The PacBio data were corrected with MHAP (Berlin et
686
al., 2015) algorithm and assembled by canu (v1.1) (Koren et al., 2017). We generated
687
~31 Gb Illumina data with 500 bp insert size on Hiseq 2500 platform to polish
688
assembled genome sequences using pilon (Walker et al., 2014). For Hi-C sequencing,
689
chromosome structure was fixed by formaldehyde crosslinking, and then MboI
690
enzyme was used to shear DNA. Hi-C library with 200-600 bp insert size was
691
constructed, which was sequenced on Hiseq 2000 platform. The Hi-C sequence data
692
were qualified with HIC-pro (Servant et al., 2015), in which the validly mapped reads
693
were selected to cluster and order the assembled genome sequences using AllHic
694
v0.8.12 (Zhang et al., 2019).
695 696
Genome annotation
697
Repetitive elements were identified based on homologous detection and de novo
698
searches. For homolog strategy, whole genome sequences were aligned with RepBase
699
21.01 (Jurka et al., 2005) using RepeatMasker (v4-0-6) program
700
(www.repeatmasker.org). LTR_FINDER1.0.6 (Xu and Wang, 2007) was applied to
701
identifying LTR retrotransposon elements to construct de novo repeat library, and
702
genomic locations were also detected using RepeatMasker (v4-0-6).
703 704
Gene models were predicted based on the five closely related plant species of the
705
rubber tree (M. esculenta, R, communis, J. carcass, L. usitatissimum and P.
706
trichocarpa) and RNA-seq data. Amino acid sequences from these genomes were
707
mapped using BLAT (Kent, 2002) with the rubber tree genome assembly to search for
708
candidate protein-coding sequences, and then GeneWise (Birney et al., 2004) was
709
performed to predict gene models. RNA-seq reads from bark, stem, seed, root, leaf
710
and inflorescence tissues were mapped to the assembled GT1 genome sequences with
711
TOPHAT (Trapnell et al., 2009), and transcripts were determined by Cufflink
712
(Trapnell et al., 2010); these evidences were subsequently integrated by GLEAN to
713
generate conserved gene models. The integrated genes were compared with Cufflink
714
and GeneWise results based on cultivar Reyan7-33-97 (Tang et al., 2016), and
715
transcripts or functional genes were finally added to the GLEAN (Elsik et al., 2007)
716
gene set. For function annotation, amino acid sequences were searched with known
717
UniProt (2009), KEGG (Kanehisa and Goto, 2000) databases. InterProscan (Jones et
718
al., 2014) were performed to annotate protein domains.
719 720
Analysis of gene family evolution
721
Homologous genes from different species were combined using all vs all BLASTP.
722
Gene families were clustered with OrthoMCL (Li et al., 2003) according to blast
723
results. Gene family expansion and contraction were calculated using CAFÉ program
724
based on gene cluster statistics. Homologous genes were detected by BLASTP
725
(Altschul et al., 1990), and then synteny blocks were identified with MCscanX (Wang
726
et al., 2012).
727 728
Population genomic analysis
729
The reads from 10 cultivated and 6 wild species were mapped to assemble the GT1
730
genome using BWA (Li, 2013) with mem algorithm. The duplicated reads were
731
removed using Picard tools (http://broadinstitute.github.io/picard/). SNP calling was
732
performed using the Genome Analysis Toolkit (GATK) (McKenna et al., 2010). The
733
phylogenetic tree of all individuals was reconstructed using Neighbor-
734
Joining/UPGMA method, which was visualized using the software MEGA6 (Tamura
735
et al., 2013). Population structure was speculated by ADMIXTURE (Alexander et al.,
736
2009), which is based on a maximum likelihood method. We predefined the number
737
of genetic clusters K from 2–7, and PCA analysis was performed using R language.
738
The average pairwise diversity within a population (θπ) and FST values were
739
calculated on sliding windows of 100 kb. Considering that genomic regions under
740
selection have a lowered diversity and reduced allele frequencies in the cultivated
741
populations compared with the same region in wild ancestral populations, and thus log
742
(Pwild/Pcult) > 1 and FST > 0.25 were set as cutoff values to identify these regions.
743 744
RNA isolation, sequencing and assembly
745
Tissues from leaves, flowers, seeds, barks, roots and stems of GT1 as well as the
746
eleven other bark samples of the rubber tree, including 5 cultivated (W169, W589,
747
W293, W53, W171) and 5 wild individuals (Y1187, Y379, Y809, Y4211, Y478) were
748
collected in Jinghong City, Yunnan Province, China. The high-quality RNA was
749
separately extracted and then sequenced on Hiseq 2500 platform. We filtered the low-
750
quality reads by following the quality-control procedures used for the genome
751
assembly. The transcriptome assembly for each rubber tree was generated using
752
Trinity (version r20140717) (Grabherr et al., 2011) with default parameters. The gene
753
expression levels were computed as the number of reads per kilobase of gene length
754
per million mapped reads (FPKM) using RSEM software (Li and Dewey, 2011).
755 756
ACCESSION NUMBERS
757
All sequencing reads and genome assembly have been deposited in the NCBI and BIG
758
Data Center under accession numbers PRJNA587314 and PRJCA001891,
759
respectively.
760 761
SUPPLEMENTAL INFORMATION
762
Supplemental Information is available at Molecular Plant Online.
763 764
FUNDING
765
This work was supported by Yunnan Innovation Team Project and the start-up grant
766
from South China Agricultural University (to L. G.).
767 768
AUTHOR CONTRIBUTION
769
L.-Z.G. and G.-H. L. conceived and designed the study; J. L., C. S., Ch. Sh., Y. W.,
770
C.-L. M., H.-B. Zh., G.-R. H.-T., X.-G. Zh. and S.-B. N. contributed to the sample
771
preparation and genome sequencing; Ch.-Ch. Sh., G.-Y. F. and W. L. performed
772
genome assembly; Y.T. performed flow cytometry experiments; Ch.-Ch. Sh., W.-B.
773
Ch., G.-Y. F., Q.-J. Zh., and Y.Zh., W.L. performed genome annotation; Ch.-Ch. Sh.,
774
C. Sh., Q.-J. Zh., W. L., G.-Y. F., H.-F. L., Z.-Y. X., S.-T. Zh., Y. Y., X.-N. H., Sh.-J.
775
H., W.-Q. L., M.-Q. L., J.-H. W., Y.-H. S., H. N., D. Zh., Y.-L. L.,Y. L.,K. L., J.-P.
776
W. and Y.-N. J. performed data analyses; L.-Z.G., J. L., C. Sh., Ch.-Ch. Sh., Q.-J. Zh.,
777
and W. L. wrote the manuscript; L.-Z.G., X. L. X.-C. Zh., E. E. E. and X. L. revised
778
the manuscript.
779 780
ACKNOWLEDGMENTS
781
The authors have no conflicts to declare.
782 783
REFERENCES
784
UniProt Consortium. (2009). The Universal Protein Resource (UniProt) 2009.
785
Nucleic Acids Res. 37:D169-174.
786
Alexander, D.H., Novembre, J., and Lange, K. (2009). Fast model-based estimation
787
of ancestry in unrelated individuals. Genome Res. 19:1655-1664.
788
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic
789
local alignment search tool. J. Mol. Biol. 215:403-410.
790
Backhaus, R.A. (1985). Rubber formation in plants—a mini-review. Israel J. Bot.
791
34:283-293.
792
Badouin, H., Gouzy, J., Grassa, C.J., Murat, F., Staton, S.E., Cottret, L.,
793
Lelandais-Briere, C., Owens, G.L., Carrere, S., Mayjonade, B., et al. (2017). The
794
sunflower genome provides insights into oil metabolism, flowering and Asterid
795
evolution. Nature 546:148-152.
796
Berlin, K., Koren, S., and Chin, C.S. (2015). Assembling large genomes with single-
797
molecule sequencing and locality-sensitive hashing. Nat. Biotechnol. 33:623-630.
798
Birney, E., Clamp, M., and Durbin, R. (2004). GeneWise and Genomewise.
799
Genome Res. 14:988-995.
800
Bowers, J.E. (1990). Natural rubber-producing plants for the United States. PNAS
801
USA 100:1352-1357.
802
Bredeson, J.V., and Lyons, J.B. (2016). Sequencing wild and cultivated cassava and
803
related species reveals extensive interspecific hybridization and genetic diversity. Nat.
804
Biotechnol. 34:562-570.
805
Chan, A.P., Crabtree, J., Zhao, Q., Lorenzi, H., Orvis, J., Puiu, D., Melake-
806
Berhan, A., Jones, K.M., Redman, J., Chen, G., et al. (2010). Draft genome
807
sequence of the oilseed species Ricinus communis. Nat. Biotechnol. 28:951-956.
808
Chan, H. (2000). Milestones in Rubber Research (Malaysian Rubber Board).
809
Chen, S., Krinsky, B.H., and Long, M. (2013). New genes as drivers of phenotypic
810
evolution. Nat. Rev. Genet. 14:645-660.
811
Chin, C.-S., Alexander, D.H., Marks, P., Klammer, A.A., Drake, J., Heiner, C.,
812
Clum, A., Copeland, A., Huddleston, J., and Eichler, E.E. (2013). Nonhybrid,
813
finished microbial genome assemblies from long-read SMRT sequencing data. Nature
814
Methods 10:563.
815
Doebley, J.F., Gaut, B.S., and Smith, B.D. (2006). The molecular genetics of crop
816
domestication. Cell 127:1309-1321.
817
Dusotoit-Coucaud, A., Kongsawadworakul, P., Maurousset, L., Viboonjun, U.,
818
Brunel, N., Pujade-Renaud, V., Chrestin, H., and Sakr, S. (2010). Ethylene
819
stimulation of latex yield depends on the expression of a sucrose transporter
820
(HbSUT1B) in rubber tree (Hevea brasiliensis). Tree Physiology 30:1586-1598.
821
Elsik, C.G., Mackey, A.J., Reese, J.T., Milshina, N.V., Roos, D.S., and Weinstock,
822
G.M. (2007). Creating a honey bee consensus gene set. Genome Biol. 8:R13.
823
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I.,
824
Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q. et al. (2011). Full-length
825
transcriptome assembly from RNA-Seq data without a reference genome. Nat.
826
Biotechnol. 29: 644-U130.
827
Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A.,
828
Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al. (2007). The grapevine genome
829
sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature
830
449:463-467.
831
Jones, P., Binns, D., Chang, H.Y., Fraser, M., Li, W., McAnulla, C., McWilliam,
832
H., Maslen, J., Mitchell, A., Nuka, G., et al. (2014). InterProScan 5: genome-scale
833
protein function classification. Bioinformatics 30:1236-1240.
834
Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and
835
Walichiewicz, J. (2005). Repbase Update, a database of eukaryotic repetitive
836
elements. Cytogenet. Genome Res. 110:462-467.
837
Kanehisa, M., and Goto, S. (2000). KEGG: kyoto encyclopedia of genes and
838
genomes. Nucleic Acids Res. 28:27-30.
839
Kent, W.J. (2002). BLAT--the BLAST-like alignment tool. Genome Res. 12:656-664.
840
Lau, N.S., Makita, Y., Kawashima, M., Taylor, T.D., Kondo, S., Othman, A.S.,
841
Shu-Chien, A.C., and Matsui, M. (2016). The rubber tree genome shows expansion
842
of gene family associated with rubber biosynthesis. Sci. Rep. 6:28594.
843
Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-
844
Seq data with or without a reference genome. BMC Bioinformatics 12:323.
845
Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with
846
BWA-MEM. arXiv (1303.3997). 1303.
847
Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL: identification of
848
ortholog groups for eukaryotic genomes. Genome Res. 13:2178-2189.
849
Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G.,
850
Kristiansen, K., et al. (2010). De novo assembly of human genomes with massively
851
parallel short read sequencing. Genome Res. 20:265-272.
852
Lieberei, R. (2007). South American leaf blight of the rubber tree (Hevea spp.): new
853
steps in plant domestication using physiological features and molecular markers. Ann.
854
Bot. 100:1125-1142.
855
Lin, T., Xu, X., Ruan, J., Liu, S., Wu, S., Shao, X., Wang, X., Gan, L., Qin, B.,
856
and Yang, Y. (2018). Genome analysis of Taraxacum kok-saghyz Rodin provides new
857
insights into rubber biosynthesis. Natl. Sci. Rev. 5:78-87.
858
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky,
859
A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome
860
Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA
861
sequencing data. Genome Res. 20:1297-1303.
862
Ohno, S. (1970). Introduction in Evolution by gene duplication 1-2 (Berlin,
863
Heidelberg: Springer).
864
Piegu, B., Guyot, R., Picault, N., Roulin, A., Sanyal, A., Kim, H., Collura, K.,
865
Brar, D.S., Jackson, S., Wing, R.A., et al. (2006). Doubling genome size without
866
polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza
867
australiensis, a wild relative of rice. Genome Res. 16:1262-1269.
868
Pootakham, W., Ruang-Areerate, P., Jomchai, N., Sonthirod, C., Sangsrakru, D.,
869
Yoocha, T., Theerawattanasuk, K., Nirapathpongporn, K., Romruensukharom,
870
P., Tragoonrung, S., et al. (2015). Construction of a high-density integrated genetic
871
linkage map of rubber tree (Hevea brasiliensis) using genotyping-by-sequencing
872
(GBS). Front. Plant Sci. 6:367.
873
Pootakham, W., Sonthirod, C., Naktang, C., Ruang-Areerate, P., Yoocha, T.,
874
Sangsrakru, D., Theerawattanasuk, K., Rattanawong, R., Lekawipat, N., and
875
Tangphatsornruang, S. (2017). De novo hybrid assembly of the rubber tree genome
876
reveals evidence of paleotetraploidy in Hevea species. Sci. Rep. 7:41457.
877
Pritchard, J. (2010). Documentation for STRUCTURE software: version 2.2.
878
41:55–63.
879
Priyadarshan, P.M., and Clement-Demange, A. (2004). Breeding Hevea rubber:
880
formal and molecular genetics. Adv. Genet. 52:51-115.
881
Priyadarshan, P.M., and Goncalves, P.D.S. (2003). Hevea gene pool for breeding.
882
Genet. Resour. Crop Ev. 50:101-114.
883
Rahman, A.Y., Usharraj, A.O., Misra, B.B., Thottathil, G.P., Jayasekaran, K.,
884
Feng, Y., Hou, S., Ong, S.Y., Ng, F.L., Lee, L.S., et al. (2013). Draft genome
885
sequence of the rubber tree Hevea brasiliensis. BMC Genomics 14:75.
886
Salse, J. (2016). Ancestors of modern plant crops. Curr. Opin. Plant Biol. 30:134-142.
887
Sato, S., Hirakawa, H., Isobe, S., Fukai, E., Watanabe, A., Kato, M., Kawashima,
888
K., Minami, C., Muraki, A., Nakazaki, N., et al. (2011). Sequence analysis of the
889
genome of an oil-bearing tree, Jatropha curcas L. DNA Res. 18:65-76.
890
Servant, N., Varoquaux, N., Lajoie, B.R., Viara, E., Chen, C.J., Vert, J.P., Heard,
891
E., Dekker, J., and Barillot, E. (2015). HiC-Pro: an optimized and flexible pipeline
892
for Hi-C data processing. Genome Biol. 16:259.
893
Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov,
894
E.M. (2015). BUSCO: assessing genome assembly and annotation completeness with
895
single-copy orthologs. Bioinformatics 31:3210-3212.
896
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013).
897
MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol.
898
30:2725-2729.
899
Tang, C., Yang, M., Fang, Y., Luo, Y., Gao, S., Xiao, X., An, Z., Zhou, B., Zhang,
900
B., Tan, X., et al. (2016). The rubber tree genome reveals new insights into rubber
901
production and species adaptation. Nature Plants 2:16073.
902
Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice
903
junctions with RNA-Seq. Bioinformatics 25:1105-1111.
904
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren,
905
M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and
906
quantification by RNA-Seq reveals unannotated transcripts and isoform switching
907
during cell differentiation. Nat. Biotechnol. 28:511-515.
908
van Beilen, J.B., and Poirier, Y. (2007). Establishment of new crops for the
909
production of natural rubber. Trends Biotechnol. 25:522-529.
910
Vurture, G.W., Sedlazeck, F.J., Nattestad, M., Underwood, C.J., Fang, H.,
911
Gurtowski, J., and Schatz, M. C. (2017). GenomeScope: Fast reference-free genome
912
profiling from short reads. Bioinformatics 33: 2202-2204.
913
Walker, B.J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S.,
914
Cuomo, C.A., Zeng, Q., Wortman, J., Young, S.K., et al. (2014). Pilon: an
915
integrated tool for comprehensive microbial variant detection and genome assembly
916
improvement. PLoS One 9:e112963.
917
Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H.,
918
Marler, B., Guo, H., et al. (2012). MCScanX: a toolkit for detection and evolutionary
919
analysis of gene synteny and collinearity. Nucleic Acids Res. 40:e49.
920
Webster, C., and Baulkwill, W. (1989). Rubber (Longman Scientific and Technical).
921
Wuyun, T.N., Wang, L., Liu, H., Wang, X., Zhang, L., Bennetzen, J.L., Li, T.,
922
Yang, L., Liu, P., Du, L., et al. (2018). The hardy rubber tree genome provides
923
insights into the evolution of polyisoprene biosynthesis. Molecular Plant 11:429-442.
924
Xia, E.H., Zhang, H.B., Sheng, J., Li, K., Zhang, Q.J., Kim, C., Zhang, Y., Liu,
925
Y., Zhu, T., Li, W., et al. (2017). The tea tree genome provides insights into tea flavor
926
and independent evolution of caffeine biosynthesis. Molecular Plant 10:866-877.
927
Xu, Z., and Wang, H. (2007). LTR_FINDER: an efficient tool for the prediction of
928
full-length LTR retrotransposons. Nucleic Acids Res. 35:W265-268.
929
Yang, S.F., and Hoffman, N.E. (2003). Ethylene biosynthesis and its regulation in
930
higher plants. Annu. Rev. Plant Physiol 35:155-189.
931
Zhang, D., Li, W., Xia, E.H., Zhang, Q.J., Liu Y., Zhang, Y., Tong, Y., Zhao, Y.,
932
Niu, Y. C., Xu, J.H., Gao, L.Z. (2017). The medicinal herb Panax notoginseng
933
genome provides insights into ginsenoside biosynthesis and genome evolution.
934
Molecular Plant 10: 903–907.
935
Zhang, X., Zhang, S., Zhao, Q., Ming, R., Tang, H. (2019). Assembly of allele-
936
aware, chromosomal scale autopolyploid genomes based on Hi-C data. Nature Plants,
937
5: 833–845.
938
Tables
939
Table 1. Statistics of the genome assembly and gene annotation for rubber tree.
940 Assembly Illumina / PacBio sequencing depth (×) Estimated genome size (Mb) Chromosome number (2n = 2x=) Total length of assembly (Mb) No. of contigs Contig N50 (Kb) No. of scaffolds Contig number on chromosomes Chromosome length (Mb) GC content of the genome (%) Annotation Number of gene models Average gene length (bp) Mean exon length (bp) Average exon per gene Mean intron length (bp) Number of function annotated tRNAs rRNAs snoRNAs snRNAs miRNAs Repeat sequence length (Mb) Percentage of repeat sequence (%) 941 942
261.38 / 103.69 1,561 36 1,472 16,023 152.7 600 15,324 1,442 33.87 44,187 3,918.4 222.08 5.13 672.1 42,758 945 93 61 396 373 1,042.4 70.81
943
Figure Legends
944
Figure 1. The rubber tree genome features. (A) Circular representation of the 18
945
pseudochromosomes; (B) the density of genes; (C) the density of non-coding RNA;
946
(D) the distribution of transposable elements (TEs); (E) the distribution of gypsy-type
947
retrotrasnposons; (F) the distribution of copia-type retrotrasnposons; (G) the
948
distribution of DNA transposons; (H) SSR density; (I) the distribution of GC content;
949
(J) whole genome duplication (WGD) event shown by syntenic relationships among
950
duplication blocks containing more than fifteen paralogous gene pairs.
951 952
Figure 2. The chromosome evolution after the shared paleotetraploidy of rubber
953
tree and cassava. (A) Conserved synteny within the rubber tree genome. Diagrams
954
show genomic colinearity within the H. brasiliensis genome. Lines link the position
955
of paralogous gene sets among linkage group/chromosome set. The ten chromosomes
956
arranged in the upper circle illustrate 1:1 synteny between the five duplicated pairs of
957
chromosomes. The eight chromosomes depicted in the lower circle each share
958
syntenic regions with two other chromosomes, owing to chromosomal rearrangements
959
that occurred after the whole-genome duplication; (B) Evolutionary patterns for the
960
rubber tree and cassava chromosomes from the AEK of 7 (pre-WGT-γ)
961
protochromosomes. Genome rearrangements of these two genomes are elucidated
962
with different colors that represent the origins from the seven ancestral chromosomes
963
from the n= 7 AEK. Rubber tree linkage groups are designated with “Hbr” followed
964
by the linkage group numbers, and cassava chromosomes are designated with “Mes”
965
followed by the chromosome numbers.
966 967
Figure 3. Genome size variation and evolution of retrotransposon families in the
968
rubber tree genome. (A) shows genome sizes and proportions of different types of
969
transposable elements (TEs) in R. communis (RC), J. curcas (JC), M. esculenta (ME),
970
H. brasiliensis (HB), P. trichocarpa (PT) and L. usitatissimum (LU) using
971
Arabidopsis thaliana (AT) as outgroup. The unrooted phylogenetic trees were
972
constructed on the basis of Ty1/copia (B) and Ty3/gypsy (C) aligned sequences
973
corresponding to the RT domains without premature termination codon. LTR
974
retrotransposon family names and proportion of each are indicated.
975
976
Figure 4. Rubber biosynthesis and the expansion of the REF/SRPP and CPT gene
977
families in rubber tree. (A) Levels of expression (reads per kilobase per million
978
reads mapped; RPKM) of genes involved in the rubber biosynthesis in bark, root,
979
flower, stem, leaf and seed; (B) genomic locations of REF/SRPP and CPT genes.
980
Chromosomes are represented as solid bars with their names on left. Note that most of
981
the REF/SRPP and CPT genes are located on Hbr_9 and Hbr_14, respectively.
982 983
Figure 5. Population divergence and artificial selection of H. brasiliensis. (A)
984
Two-way principal components analysis (PCA) of the sixteen H. brasiliensis
985
accessions using identified SNPs. (B) Population structure of H. brasiliensis. Each
986
color and vertical bar represents one population and one accession, respectively. The
987
y-axis shows the proportion of each accession contributed from ancestral populations.
988
(C) Neighbor-Joining (NJ) phylogenetic tree of the sixteen H. brasiliensis accessions
989
constructed using SNP data. The scale bar represents the evolutionary distances
990
measured by p-distance. (D) the distribution of the FST values and levels of nucleotide
991
variation between the cultutivated and wild rubber trees. (E) genomic regions under
992
artificial selection on the 18 chromosomes of rubber tree.
993
Pyruvate Acetyl-CoA ACAT Acetoacetyl-CoA
ACAT1
1
ACAT2
0
ACAT3
−1
ACAT4
HMGS
ACAT5
HMG-CoA HMGR
HMGS1 HMGR1 HMGR2
MVA MVK
HMGR3 HMGR4 MVK1
MVD1
PMD1
rk
2
Ba
Ro
DXS1
DXS3 DXS4 DXS5 DXS6
CME CMK
DXS7 DXS8 DXR
PCME MCS
CMS1 CMS2
CMEC HDS
CMK MCS1
HMED
MCS2
HDR
HDS1 HDS2
rS we ed af Flo tem Le Se
HDR1 HDR2
CPT1
IPP
CPT2
1
CPT3
0
rk
Ba
IPPI2
IPPI
CPT6
cis-1,4polyisopre ne
CPT7 CPT8 CPT9 CPT10
REF
CPT11
rS we ot rk ed af Ba Ro Flo tem Le Se
CPT12
REF2
SRPP1
2
SRPP2
1
SRPP3 0
SRPP4
−1
0
REF4
−1
REF5
−2
FPS1
GPP FPS
FPS2
FPP GGPS
GGPS1
FPS3
GGPS2 GGPS3
GGPP
1
REF3
GPS2
GGPS4
REF6
SRPP5
−2
2
REF1
rS we ot ed af Ro Flo tem Le Se
B
GPS1
DMAPP GPS
SRPP
ark
1.5 1 0.5 0 −0.5 −1 −1.5
IPPI1
CPT5
−2
rS ot lowe em eaf eed t F L S
Ro
CPT
CPT4 −1
1.5 1 0.5 0 −0.5 −1 −1.5
DXS2
MEP CMS
PMD
PMD2
rS ot lowe em eaf eed t F L S
Ro
DXP DXR
MVA-5-pp
MVD2
rk
Ba
DXS
MVA-5-p MVD
MVK2
ot
Pyruvate
GA-3-p
MEP Pathway
rS we ot rk ed af Ba Ro Flo tem Le Se
MVA Pathway
A
REF7
SRPP6
REF8
SRPP7 SRPP8 SRPP9
B
SRPP10
2900000
56900000
Chr_1 Chr_1
Chr_2
86100000
Chr_3
Chr_9
Chr_3 15600000
Chr_7 Chr_8
Chr_10
Chr_15
Chr_16
37900000
6360000
22200000
86150000
1610000
47450000
86600000
56950000 71300000 3000000
71350000 3050000
37950000 71750000
71800000 72600000
22250000
86650000
86250000
1600000 38550000
6560000
86350000
6560000 6610000
22300000 13240000
22350000 13290000
22400000 13340000
22450000 13390000
47550000
47600000
47650000
47700000
86750000 65100000
65500000 1650000 38600000
65200000
65250000
65550000 38650000
65600000 38700000
38750000
77800000
77850000
2050000
79500000
79550000
79600000
79650000
79700000
79750000
16950000
17000000
17050000
17100000
17150000
17200000
3750000
3800000
REF
SRPP
REF
80250000
6660000 6560000
50600000 6560000
13440000
50650000
13490000
13540000
6560000 13590000
86850000
86800000 65150000
2000000
3700000
SRPP
CPT
72700000 80200000
72650000 86300000
CPT
3150000
6510000
86700000
1550000 38500000
71400000 3100000
6460000
1660000
47500000
65050000
Chr_12
86200000
6410000
65000000
Chr_13 Chr_14
Chr_14
2950000
38800000
77900000
78300000
38850000
78350000
38900000
78400000
39050000
78450000
39100000
78500000
78550000
78700000
78750000
A Principal component 2
B Cult053 K=2 Cult169 Cult171 Cult223 Cult293 Cult589 KEN RRIM 600 RP K=3 RY7−33−97 Wild162 Wild166 Wild194 Wild232 Wild258 Wild778 K=4
0.5
0.0
−0.25 0.00 Principal component 1
RR 0
Cu
78
23
RP
ld7
53
lt0
Cult293
60
4
19
IM
ild
W
Wi
K=5
lt2
C
0.25
Cu
−0.5 −0.50
Cult171 K=6
Wild232
Cul
t589
162
lt5 89 33 − Cu 97 lt2 Cu 23 l RR t293 IM 6 W 00 ild 25 8 KE Cu N lt1 W 69 ild 7 W 78 ild 1 W 94 ild 1 W 62 ild 23 2
RP
Cu lt0 W 53 ild 1 Cu 66 lt1 71
RY
7−
Cu
6 d16
Wil
Wild258
KE N
K=7
7
-9
0.02
33
9
t16
Cul
7-
W
RY
ild
D
E
1
2
log (pw/pc)
3
3
−3 −2 −1 0
log(pw/pc)
4
2
1
0.0
0.2
0.4 FST
0.6
0.8
0
1 2 3 4 5 6 7
8
9 10 1112131415 16 17 18 Chromosomes