De novo assembly and analysis of the Pugionium cornutum (L.) Gaertn. transcriptome and identification of genes involved in the drought response

De novo assembly and analysis of the Pugionium cornutum (L.) Gaertn. transcriptome and identification of genes involved in the drought response

Accepted Manuscript De novo assembly and analysis of the Pugionium cornutum (L.) Gaertn. transcriptome and identification of genes involved in the dro...

995KB Sizes 0 Downloads 22 Views

Accepted Manuscript De novo assembly and analysis of the Pugionium cornutum (L.) Gaertn. transcriptome and identification of genes involved in the drought response

Ping Wang, Fengfeng Wang, Jing Yang PII: DOI: Reference:

S0378-1119(17)30410-9 doi: 10.1016/j.gene.2017.05.053 GENE 41953

To appear in:

Gene

Received date: Revised date: Accepted date:

7 December 2016 28 April 2017 24 May 2017

Please cite this article as: Ping Wang, Fengfeng Wang, Jing Yang , De novo assembly and analysis of the Pugionium cornutum (L.) Gaertn. transcriptome and identification of genes involved in the drought response, Gene (2017), doi: 10.1016/j.gene.2017.05.053

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Title De novo assembly and analysis of the Pugionium cornutum (L.) Gaertn. transcriptome and identification of genes involved in the

PT

drought response

RI

Authors

SC

Ping Wang*, Fengfeng Wang, Jing Yang

Affiliations

NU

College of Agronomy, Inner Mongolia Agricultural University,Inner Mongolia

MA

Autonomous Region Key Laboratory of Wild Peculiar Vegetable Germplasm Resource and Germplasm Enhancement, Huhhot 010019, China to

Dr.

Ping

Wang,

Tel:+864714318437,

D

*Correspondence

CE

Abstract

PT E

E-mail:[email protected]

AC

Pugionium cornutum (L.) Gaertn. is a xerophytic plant species widely distributed in sandy and desert habitats in northwest China. However, the molecular mechanism of drought tolerance in P. cornutum has received little attention. At present, there is limited available transcriptome information for P. cormutum in public databases. Illumina sequencing was used to identify drought-responsive genes and to further characterize the molecular basis of drought tolerance in P. cornutum. In total, 51,385 unigenes with an average length of 825.32 bp were obtained by de novo transcriptome 1 / 37

ACCEPTED MANUSCRIPT assembly. Among these unigenes, 35,276 were annotated with gene descriptions, conserved domains, gene ontology terms, and metabolic pathways. In addition, the results showed that differentially expressed genes (DEGs) were mainly involved in photosynthesis, nitrogen metabolism, and plant hormone signal transduction

PT

pathways, notably ascorbate and aldarate metabolism, which could be an alternative

RI

pathway to enhance antioxidant capacity in P. cornutum in response to drought stress.

SC

These results provide an important clue about the effects of accumulation of ROS on ascorbic acid biosynthesis in P. cornutum. In addition, we found that transcription of

NU

most genes involved in ascorbic acid metabolism was altered under drought stress.

MA

Additionally, 93 drought-inducible transcription factor genes were identified in the DEGs under drought conditions; these included DREB, AP2/EREBP, B-2a, ERF2,

D

MYB and Zinc finger family. The results of this study provide further insight into the

PT E

molecular mechanisms of stress tolerance in P. cornutum, and also identify some attractive candidate genes and valuable information for improving drought stress

CE

tolerance in other species through genetic engineering.

AC

Keywords: Pugionium cornutum (L.) Gaertn.; drought-stress response; Illumina sequencing, transcriptome

2 / 37

ACCEPTED MANUSCRIPT 1. Introduction With the expansion of deserts in many parts of the world, large-scale anti-desertification schemes have proposed, and these include the search for sand plants that are suitable for growing in desert regions[1]. These plants can be planted

PT

on a large scale for sustainable reforestation to combat desertification[2]. Currently,

RI

many local sand-fixing shrubs have been shown to be effective in combatting

SC

desertification[3].

Pugionium is a small genus in the botanical family Brassicaceae that is endemic

NU

to central Asia [4]. Pugionium. cornutum (L.) Gaertn. occurs in an area of the typical

MA

steppe belt that includes the top and the base of the leeward side of mobile or semi-mobile sand dunes. Plants of this genus are typical psammophytes[5], which are

D

mainly distributed in Badain Jaran Desert, Kubuqi Desert, Mu Us Desert, Horqin and

PT E

Hulunbuir sandy land [4]. As a xerophytic desert plant, P. cornutum is very tolerant of drought conditions, because it has a well-developed taproot which can extend to a

CE

depth of 160 cm with a root range of 60-80 cm, this may contribute to its ability to

AC

absorb water from deep layers of the sand (Figure S1a). Additionally, the lower leaf blades are pinnatipartite, but the upper leaves are lanceolate, which reduces the amount of water lost to evaporation (Figure S1b). Furthermore, P. cornutum has a strong ability to fix sand because of its branching stems and well-developed root system. P. cornutum has also been consumed as a vegetable by the local people for a long time[6]. As a food, P. cornutum contains protein, fat, carbohydrates, vitamins, and minerals, and has a higher vitamin C content than do many common vegetables. 3 / 37

ACCEPTED MANUSCRIPT Moreover, P. cornutum has a high medicinal value, and the whole plant can be used as medicine[1]. When plants suffer from abiotic stresses such as drought, low temperatures, or high temperatures, it can lead to changes in the transcription of some functionally

PT

related genes that are involved in the adaptive changes to multiple metabolic and

RI

physiological processes[7]. The relative effect of drought on plants can differ, as can

SC

their responses to drought stress[8]. Some plants can reduce the negative effects of stress through adaptive physiological and biochemical mechanisms, such as changes

NU

in the levels of phtohormones and enzymes in response to water stress, and the

MA

generation of reactive oxygen species (ROS) [9]. More importantly, stress-tolerance genes have evolved in plants through long-term natural selection[10]. Recent

D

advances in molecular biology allow us to study drought responses in plants, and also

PT E

reveal the underlying response mechanisms[11]. Therefore, a comprehensive understanding of the genetic basis of drought tolerance will be of great significance in

AC

varieties[12].

CE

exploring drought tolerance genes and in cultivating drought-resistant crop

Recently, due to habitat deterioration and human overexploitation, the number of species in the genus Pugionium is gradually decreasing[13, 14]. To explore its drought survival mechanism as it relates to conservation biology, many studies have been conducted on the physiological adaptation of P. cornutum to drought in recent years[1, 15]. However, complete transcriptome data is presently unavailable for species of Pugionium; therefore, the genetic basis of drought tolerance in P. cornutum 4 / 37

ACCEPTED MANUSCRIPT remains unknown. The objective of this study was to use high-throughput Illumina DNA sequencing technology to perform de novo transcriptome sequencing on P. cornutum leaves and roots at different stages of drought stress. To obtain complete

PT

transcriptome information under drought stress, we obtained functional annotation

RI

and classification of unigenes using bioinformatics methods. Based on this, a

SC

comparative analysis was then conducted on gene expression profiles before and after drought stress treatment. This allowed us to systematically study the classes of

NU

differentially expressed genes, the relative gene expression levels, and the temporal

MA

and spatial specificity of genes expressed in response to drought stress. To the best of our knowledge, this is the first report on the transcriptome of P.

D

cornutum. The results of our study will not only provide further insight into the

PT E

molecular mechanisms of stress tolerance in P. cornutum, but will also offer some attractive candidate genes and valuable information for improving drought stress

CE

tolerance in other species through genetic engineering.

AC

2. Materials and methods 2.1 Plant material and sample processing We collected the fruits of P. cornutum from Mu Us Sandland located in Ordos City in the Inner Mongolia Autonomous Region of China. No specific permissions were required for the described locations. The silicles (fruits) of P. cornutum have narrow wings and an acuminate apex (Figure S1c). After 8 months of dry storage at room temperature, seeds with the pericarp removed were sterilized in 2% (v/v) 10% NaOCl 5 / 37

ACCEPTED MANUSCRIPT for 10 min, followed by several rinses with sterile water. The surface-sterilized seeds were then germinated in Petri dishes (9 cm) covered with filter paper. Germinated seeds were then sown in PVC pots containing 1.5 kg of a mixture of sand and manure (4:1; v/v) (the top and bottom diameters of the pots were 17 cm and 11 cm,

PT

respectively, and the height was 15 cm). The mixture contains 2.36 g/kg nitrogen,

RI

1.01g/kg available phosphorus, and 1.58g/kg available potassium at 13.8% field

SC

water-holding capacity (WHC). Seedlings were maintained in a greenhouse at mean temperatures of 26/19oC (day/night), and relative humidity of 75–85%. The study was

NU

carried out in a shade-house covered with a clear plastic rain shield. We confirm that

MA

the field studies did not involve endangered or protected species. Morphologically uniform seedlings at the 6-leaf stage were selected for the

D

experiments. Seedlings were randomly divided into three groups (n=20 per group) and

PT E

assigned to different treatments as described previously by Hsiao (1973)[16]: (1) control seedlings maintained at 75-80% water-holding capacity (WHC) conditions,

CE

(2) mildly drought-stressed at 40-45% WHC, and (3), for the severe drought

AC

treatment, water was restricted to 30-35% of field capacity. Root tissues and fully expanded leaves at the third or fourth positions from the shoot apex were then harvested at 7:30 a.m., and the samples from both the control and treated seedlings were flash frozen in liquid nitrogen for 5 min and stored at -80oC for subsequent RNA isolation. 2.2 RNA isolation Total RNA was extracted from the leaf and root samples according to the instruction 6 / 37

ACCEPTED MANUSCRIPT manual of the Quick RNA isolation Kit (Huayueyang Biotech Co., Ltd., Beijing, China). These samples were divided into two categories: 1) leaves; L1 (control treatment), L2 (moderate drought stress), and L3 (severe drought stress), and 2) roots; R1 (control treatment), R2 (moderate drought stress), and R3 (severe drought stress).

PT

The extracted RNA was treated with RNase-free DNaseI (TaKaRa Biotech Co., Ltd.,

RI

Dalian, China) to eliminate genomic DNA contamination. Equivalent amounts of total

SC

RNA extracted from leaves or roots of the same genotype were mixed, the RNA integrity was then examined by agarose gel electrophoresis (1.2%), and the RNA

NU

concentration was quantified on an Agilent 2100 Bioanalyzer (Agilent Technologies,

2.3 cDNA library construction

MA

Inc., Santa Clara, CA, USA).

D

High-quality RNA samples from P. cornutum plants were sent to Biomarker

PT E

Technologies Corporation (Beijing, China) for cDNA library construction and sequencing. Poly(A)-containing mRNA was purified using oligo-dT magnetic beads

CE

and broken into short fragments (~200 nt) in fragmentation buffer. RNA-seq libraries

AC

were constructed using the TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA) with multiplexing primers, according to the manufacturer’s protocol. The first-strand cDNA was synthesized using random hexamer primers with the mRNA fragments as templates. The second strand cDNA was synthesized with RNase H and DNA polymerase I and was purified using a QiaQuick PCR extraction kit (Qiagen, Inc., Hilden, Germany). Following end repair and A-tailing, the short cDNA fragments were ligated to Illumina paired-end adaptors and purified by agarose gel 7 / 37

ACCEPTED MANUSCRIPT electrophoresis. The cDNA fragments were then amplified by PCR to create the final cDNA library. Sequencing was performed from both the 5' and 3' ends on the Illumina HiSeq2000 system, generating more than 300 million read pairs. 2.3 Data analysis

PT

2.3.1 Raw data preprocessing and filtering

RI

To obtain clean sequence reads, a sliding-window method[17] was applied to remove

SC

duplicate, redundant, and poor quality reads and reads containing 'N' bases, after which the clean reads were used in subsequent analyses.

NU

2.3.2 De novo transcriptome assembly

MA

In order to facilitate the subsequent research, at least five seasons of self-reproduction are required to reduce the frequency of heterozygosity for this materials. The

D

remaining clean reads were used for de novo transcriptome assembly using

PT E

Trinity[18], which is especially robust for de novo assembly of short reads without a reference genome. We used a single Trinity assembly based on combining all reads

CE

across all samples as inputs, due to the high level of polymorphism in the Pugionium

AC

cornutum genome. Redundant contigs were removed by using the CD-hit program[19]. A single set of nonredundant unigenes was acquired with TGICL software[20] after removing the alternatively-spliced and redundant sequences. 2.3.3 Functional annotation of unigenes Gene ontology (GO) [21] terms were assigned to each contig using the Blast2GO tool[22]. Functional annotation results were obtained from BLAST[23] searches against the following public databases; NCBI Nr, SwissProt[24], Cluster of 8 / 37

ACCEPTED MANUSCRIPT Orthologous Groups (COG) [25], and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database[26]. MISA software[27] (http://pgrc.ipk-gatersleben.de/misa/) was used to screen microsatellite sequences in the unigenes with lengths >1,000 nt. The minimum motif repeat size was set to 10 for mononucleotides, 6 for dinucleotides,

PT

and 5 for other motifs. Open reading frames (ORFs) were predicted using Getorf

RI

software[28]. A popular SNP detection tool SOAPsnp[29] was used to identify single

SC

nucleotide polymorphisms (SNPs). 2.3.4 Differential expression analysis of the unigenes

NU

The unigene expression levels were measured as the numbers of clean reads mapped

MA

to the reference sequence. The RPKM (Reads Per kb per Million reads) method[28] was used to calculate the relative expression of each unigene. To reveal the molecular

D

events involved in the drought response, pairwise comparisons were performed

PT E

between the samples with IDEG6[30]. Unigenes with a false discovery rate (FDR) <0.001 and an absolute value of log2 Ratio ≥1 were defined as statistically significant

CE

DEGs. Functional enrichment analysis of the DEGs was implemented by the GOseq

AC

R package based on Wallenius non-central hyper-geometric distribution[31]. 2.4 Quantitative real-time PCR analysis Although RNA-seq is a powerful tool for interrogating the global transcriptome, there are inevitably some errors in a transcriptome assembly that is derived from billions of short RNA-seq reads. To validate the assembled sequences and the expression profiles obtained by RNA-seq, 10 of the above DEGs were randomly selected for qRT-PCR (reverse transcription polymerase chain reaction) analyses using the Power SYBR 9 / 37



ACCEPTED MANUSCRIPT Premix Ex TaqTM II Kit (Perfect RealTime, TaKaRa, Dalian, China) on a Bio-Rad iQ5 multicolor real-time PCR detection system according to the manufacturer’s instructions. Reactions were performed by starting with an intitial denaturation step at 95°C for 3 min, followed by 45 cycles of 95°C for 5 s and 60°C for 30 s and 71 cycles

PT

of 60–95 °C for 15 s increasing by 0.5 °C at every cycle. The β-actin

RI

(R1_Unigene_BMK.22570) gene was used as an internal control. The relative

SC

expression of selected unigenes was calculated by the 2-ΔΔCt method. Two technical replicates were performed for each sample. All primers used in this study are listed in

NU

Table S1.

MA

3. Results

3.1 De novo transcriptome assembly and functional annotation of the assembled

D

cDNA contigs

PT E

Illumina sequencing produced 139,550,391 high quality reads for all of the samples (Table 1). The Cycle Q20 value refers to the percentage of the bases with a quality

CE

(Q) score ≥20, and is frequently used to assess sequencing quality. The Q scores were

AC

100% in this study, indicating that the sequencing data is highly reliable. The GC content of the nucleotides ranged between 45.37% and 45.89%.

10 / 37

ACCEPTED MANUSCRIPT Table 1. Summary of Illumina RNA sequencing for six samples isolated from P. cornutum (L.) Gaertn. GC

reads

percentage

percentage

L1

28,632,191 5,783,112,390

100%

45.74%

R1

22,175,072 4,478,906,443

100%

L2

20,141,273 4,068,130,202

100%

R2

21,302,429 4,302,655,395

100%

45.89%

L3

21,341,410 4,310,523,191

100%

45.37%

R3

25,958,016 5,242,916,698

100%

45.65%

PT

CycleQ20

RI

45.51%

SC

Total nucleotides

NU

Total

45.73%

MA

Samples

After removing the low-quality sequences and assemblies, a total of 51,385

D

unigenes were obtained from the six samples, of which 13,096 unigenes had lengths

PT E

>1,000 nt, accounting for 12.74 % of the total number. The average length and N50 of the unigenes were approximately 825 bp and 1,582 bp, respectively. It was predicted

CE

that approximately 99.3% of the unigenes had ORFs.

AC

As shown in Figure 1a, there were 34,257 unigenes that showed homology to known protein sequences in the NCBI Nr database; 60.9% of these had significant E values (E value threshold ≤1E-50). Consistent with this, the unigenes with homology similarity >80% accounted for 60.2%. Arabidopsis lyrata (39.3%) was the top-hit species (Figure 1b). 27,370 unigenes had significant similarity with sequences deposited

in

the

Swiss-Prot

2). 11 / 37

database

(Table

AC

CE

PT E

D

MA

NU

SC

RI

PT

ACCEPTED MANUSCRIPT

Figure 1. Characteristics of unigene BLAST searches. (a). E value distribution of blast hits for assembled unigenes against the Nr database with a cutoff E value of 10−5. (b). NR species distribution shown as a percent of the total top homologies. Table 2. Annotation results of the P. cornutum transcriptome

12 / 37

ACCEPTED MANUSCRIPT

Annotated

All sequence

>=300 nt

>= 1000 nt

NR

34,257

27,427

12,945

SwissProt

27,370

22,083

11,332

GO

26,783

21,681

10,714

KEGG

8,845

6,897

COG

11,012

9,393

Total

35,276

27,896

PT

databases

SC

RI

3,270 5,746 12,977

NU

A total of 26,783 (52.1%) unigenes were assigned at least one GO term. In many

MA

cases, multiple terms were assigned to the same transcript; thus, 149,510 were assigned at least one GO term in the biological process category, 38,549 in the

D

molecular function category and 111,700 in the cellular component category. As

PT E

shown in Figure S2, for each of the three main categories of the GO classification, ''cell part'', ''binding'' and ''cellular process'' were dominant, respectively.

CE

After searching against the COG database, 15,141 unigenes were assigned to 25

AC

specific functional groups (Figure 2). The top three clusters were “general function prediction only” (2,626; 17.3%), “translation, ribosomal structure and biogenesis” (1,846; 12.2%), and “replication, recombination and repair” (1,362; 9%). In addition, 9,174 unigenes were assigned to 122 pathways by searching the KEGG database; the pathway “ribosome” (ko03010) contained the largest number of unigenes (1,081), followed by “oxidative phosphorylation” (ko00190; 398 unigenes) and “plant hormone signal transduction” (ko04075; 294 unigenes). 13 / 37

SC

RI

PT

ACCEPTED MANUSCRIPT

NU

Figure 2. Histogram of Clusters of Orthologous Groups (COG) classification.

MA

The unigenes were aligned to the COG database to predict and classify possible functions. Out of 34,257 hits in the NCBI non-redundant (nr) database, 15,141

D

unigenes were annotated and separated into 25 clusters.

PT E

3.2 Analysis of differential gene expression in response to drought stress To identify the drought-responsive genes, we performed differentially expressed

CE

gene (DEG) analysis in both leaves and roots. A large number of differentially Notably, more genes

AC

expressed unigenes were detected in the pairwise comparison.

were differentially expressed under severe stress compared to mild stress, indicating that a severe stress treatment is more likely to stimulate the expression of drought stress-related genes than is mild drought stress. In general, 1385, 2567, and 1280 genes were differentially expressed in comparisons of L2 vs. L1, L3 vs. L1, and L3 vs. L2, respectively (Figure 3a). Similarly, we identified 1494, 2478, and 1649 differentially expressed genes in the R2 vs. R1, R3 vs. R1, and R3 vs. R2 14 / 37

ACCEPTED MANUSCRIPT comparisons, respectively (Figure 3b). Based on the above results, differences between leaves and roots were further compared with a focus on the DEGs; we identified 592 up-regulated and 388 down-regulated genes in the leaves and roots,

MA

NU

SC

RI

PT

respectively.

D

Figure 3. Venn diagrams showing DEGs in leaves (a) and roots (b) across three

PT E

comparisons (mild stress vs. control, severe stress vs. control, and severe stress vs. mild stress). The overlapping values correspond to the number of

CE

differentially expressed genes in the two sets of comparisons.

AC

In our study, approximately equal numbers of DEGs were identified in roots and leaves; we identified 1,307 up-regulated genes in leaves and 1,412 in roots; and 1,260 and 1,066 that were down-regulated in leaves and roots, respectively (Figure S3). Specifically, when the up- and down-regulated gene sets identified in the L3 and L1 comparison were compared with the corresponding gene sets from the R3/R1 comparison, overlap was observed for only 27.83% (up) and 20.02% (down), respectively. The result suggested that a considerable number of genes were 15 / 37

ACCEPTED MANUSCRIPT differentially regulated in a tissue-specific manner. This further indicated that plants can adapt to drought stress by physiological and tissue-specific changes[32] in gene expression. Transcription factors are widely involved in the regulation of plant drought stress

PT

responses, especially DREB-type TF genes, which encode dehydration-responsive

RI

element-binding proteins capable of responding to abiotic stresses. In this study, we

SC

identified 93 differentially expressed TF genes (Table S2) in the L3-L1 comparison, based on the Swiss-Prot annotation, which belong to a diverse range of TF families

NU

such as MYB, zinc finger, WRKY, AP2/ERF, and DREB. In parallel, a total of 99

MA

differentially expressed TFs were found in the R3-R1 comparison. 40 genes were overlapped with the L3-L1 comparison; however, there were 10 genes that did not

D

match any genes in the Swissprot database; we speculate that the p value of these

PT E

genes were not sufficiently small. Generally, Swiss-Prot database provide the most conservative, verified, and up-to-date sequence information and functional role

CE

identification. Additionally, ethylene-responsive TFs, heat stress TFs, and bHLH

AC

family TFs were found to be related to the drought response. To further validate the changes in gene expression observed in the differential expression analysis, qRT-PCR assays were performed on ten randomly selected DEGs, which are from the intersection of genes from leaves and roots, respectively, compared with the control groups. All 10 genes showed trends similar to those in the differential expression analysis (Figure S4). Thus, the transcriptome-based DEG results were reliable for the identification of drought-responsive genes in the present 16 / 37

ACCEPTED MANUSCRIPT study. 3.3 DEGs involved in signal transduction and regulation in P. cornutum underlying drought stress Gene ontology enrichment analysis was performed to identify GO terms and KEGG

PT

pathways that were over-represented in the list of DEGs. In total, 6,753 DEGs were

RI

enriched in 61 GO terms, as shown in Figure 4, in the biological process category;

SC

GO terms related to the defense response, such as “metabolic process” (5160), “response to stimulus” (4099), “biological regulation” (3688), and “developmental

NU

process” (2872) were overrepresented, as expected. Similarly, the terms "binding",

MA

"catalytic activity", "nucleic acid binding transcription factor activity", "transporter activity", and "structural molecule activity" were predominant in the molecular

D

function category. Functional classification based on GO terms showed that genes in

PT E

the functional categories “metabolic process”, “response to stimulus”, and “biological

AC

CE

regulation” were more active in P. cornutum during drought treatment.

Figure 4. Functional classification of the DEGs based on Gene Ontology (GO) 17 / 37

ACCEPTED MANUSCRIPT categorization. The results are summarized for the three main GO categories of biological processes, cellular components, and molecular functions. The x-axis indicates the subcategories, and the y-axis indicates the numbers of genes related to the total number of GO terms present. The unigene numbers that are assigned

PT

to the same GO terms are indicated at the top of the bars.

RI

KEGG pathway analysis suggested that a variety of signaling pathways were

SC

altered in this plant cell (Table S3), including phenylpropanoid biosynthesis, photosynthesis, phenylalanine metabolism, plant hormone signal transduction, and

NU

carotenoid biosynthesis. However, only 1,305 differentially expressed unigenes in the

MA

L3-L1 comparison were annotated in the KEGG pathway database. Importantly, we found that the ascorbate and aldarate metabolism pathway was among them. It has

D

previously been indicated that ascorbic acid (AsA) can act as a free-radical and active

PT E

oxygen species scavenger. There is a rapid and transient production of ROS, when plants subjected to abiotic stress. Then antioxidant protection systems represented by

CE

ascorbic acid will function. These results also further indicate that a higher ascorbic

AC

acid content in the samples may play a role in mitigating the stress responses. We found that the important enzymes in ascorbate metabolism were differentially expressed (Figure 5a). Moreover, we found that nitrogen metabolism pathway (Figure 5b) were significantly correlated with the differentially expressed genes (P-value= 5.4E-4). Nitrogen metabolism related enzymes such as nitrate reductase (NR) for nitrate reduction, and glutamine synthetase (GS) and glutamate synthase (GOGAT) for ammonia assimilation play important roles in plant drought stress responses. In 18 / 37

ACCEPTED MANUSCRIPT our research, the ferredoxin-nitrite reductase (1.7.7.1) gene was up-regulated, while genes for glutamine synthetase (6.3.1.2) and glutamate synthase (1.4.7.1) were down-regulated, leading to the accumulation of ammonia, which can produce a toxic effect on plants. From another perspective, glutamate dehydrogenase (1.4.1.2) has a

PT

dual function, and can as as an ammonia remover. Therefore, it is important to

RI

improve nitrogen metabolism under drought stress conditions in order to promote

PT E

D

MA

NU

SC

drought tolerance in P. cornutum.

Figure 5. Two biochemical pathways that were significantly enriched during

CE

drought stress. a) The putative ascorbic acid metabolism pathway, b) The nitrogen metabolism pathway. Up-regulated genes are shown in red,

blue.

AC

down-regulated genes in green, and genes expressed in both cases are labeled in

3.4 Functional roles of DREB transcription factors in the adaptation to drought stress Previous research has shown that the dehydration responsive element binding (DREB) transcription factor family can enhance plant tolerance to low temperature, drought, and salt stresses [33]. A systematic study of DREB genes involved in drought stress 19 / 37

ACCEPTED MANUSCRIPT tolerance in P. cornutum is limited. Our results show that 14 unigenes encoding DREB-like proteins were identified by gene annotation, of which eight were differentially

expressed;

T1_Unigene_BMK.15688;

T2_Unigene_BMK.20577;

T2_Unigene_BMK.20579;

T2_Unigene_BMK.21446;

T2_Unigene_BMK.22591;

PT

T2_Unigene_BMK.4148; T2_Unigene_BMK.8143; T4_Unigene_BMK.8528.

RI

In order to confirm the functions of the eight DREB genes, gene ontology (GO)

SC

was used to describe the function of the annotated unique sequences based on the three GO categories, from which we obtained many meaningful GO term annotations:

“sequence-specific

DNA

binding”

(GO:0043565);

Cellular

MA

(GO:0003700),

NU

Molecular Function: “sequence-specific DNA binding transcription factor activity”

Component: “nucleus” (GO:0005634), “plastid” (GO:0009536); Biological Process:

D

“regulation of transcription, DNA-dependent” (GO:0006355), “response to heat”

stress”

PT E

(GO:0009408), “response to water deprivation” (GO:0009414), “response to salt (GO:0009651),

“regulation

of

gibberellin

biosynthetic

process”

CE

(GO:0010371), “cell growth” (GO:0016049), “regulation of timing of transition from

AC

vegetative to reproductive phase” (GO:0048510), and “response to freezing” (GO:0050826).

3.5 Identification of molecular markers To further evaluate the assembly quality and identify new molecular markers, 13,096 unigenes with lengths >1,000nt was used to mine for potential microsatellites. Using MISA software, we identified 5,377 unigenes that contained simple sequence repeats (SSRs); of these, 1,026 unigenes contained one >1 SSR (Table S4). The percentages 20 / 37

ACCEPTED MANUSCRIPT of mononucleotide, di-nucleotide, and tri-nucleotide repeats were 41.4%, 29.7%, and 28.1%, respectively. SOAPsnp was used to assemble the consensus sequences and call genotypes in the target regions. SNP (single nucleotide polymorphism) analysis between the RNA samples revealed 75,059 SNP sites, which had Q scores >30 with

PT

read depth between 10X and 100X. As expected, SNPs between the samples exposed

RI

to different drought conditions accounted for the majority of above sites; 5,122 (L3

SC

vs. L1), 4,925 (R3 vs. R1), and 4,217 (R2 vs. R1) (Table S5). RNA-seq provides a cost-effective way to analyze a subset of the transcriptome. The transcriptomic data

NU

and microsatellite markers will provide valuable resources for future functional gene

MA

analyses, genetic map construction, and quantitative trait loci mapping in the genus Pugionium[34].

D

4. Discussion

PT E

Pugionium cornutum (L.) Gaertn has received increased attention in recent years because of its high nutritional value and excellent drought tolerance. To adapt to arid

CE

environments, P. cornutum has developed various physiological and biochemical

AC

mechanisms to reduce the effects of drought stress. However, at present, there is no study describing the biochemical, physiological, and molecular mechanisms that underly drought tolerance in this species. We expect that NGS technology will be a valuable tool to further our understanding of these molecular mechanisms. In the present study, we used Illumina DNA sequencing technology to identify drought-responsive genes and to further characterize the molecular basis of drought tolerance in P. cornutum. 21 / 37

ACCEPTED MANUSCRIPT In this study, the raw sequencing reads were processed following the standard Illumina protocol, thus ensuring the accuracy of the RNA-seq data. Additionally, the qRT-PCR results showed that randomly selected DEGs exhibited similar expression patterns to those obtained from the RNA-seq analysis. Currently, publicly available

PT

transcription data for drought resistant plants is very limited[35]; a previous study

RI

reported that de novo assembly of the transcriptome of the desert tree Haloxylon

SC

ammodendron provided new perspectives on the drought response, gene discovery, and marker identification[36]. Our efforts will lead to a better understanding of

NU

drought resistance in plants. Although a large number of unigenes were identified in

MA

this study, intrinsic factors are often more difficult to grasp without prior information about the system, and are definitely more ambiguous from a decision standpoint[37].

D

Some important genes have inevitably been missed, including non-coding RNAs and

PT E

those that were developmentally silenced. Nonetheless, our findings indicate a high quality transcriptome assembly for P. cornutum and highlight the potential of using

CE

Illumina sequencing in non-model species. It goes without saying that more novel

AC

genes will be discovered in the future. We identified a large number of DEGs in the pairwise comparison of samples; these promising genes are expected to make significant contributions to the cultivation of new varieties, and the inter-relationships of functional annotation terms will clarify the processes involved in the plant response to drought stress including stress signal perception, and adaptation at the morphological, physiological, and molecular levels[38]. It is well known that extensive changes in gene expression 22 / 37

ACCEPTED MANUSCRIPT occur when plants are exposed to drought stress[39]. In general, both up- and down-regulation of gene expression is observed in plants exposed to drought conditions. It has been reported that many genes tend to be up-regulated under drought stress in model plants[36, 40]. In the present study, we did not find this

PT

tendency, although the number of DEGs increased to 4,763, but the proportion of the

RI

genome that is devoted to drought stress is unknown due to a lack of a reference

SC

genome for this species. In comparison to the studies of Long et al. and Ma et al. [36, 41], more DEGs were found and annotated in our study, which allowed us to further

enrichment

analysis

showed

that

phenylpropanoid

biosynthesis,

MA

pathway

NU

describe the mechanisms of drought tolerance in a xerophytic desert plant. The KEGG

photosynthesis, and phenylalanine metabolism pathways were significantly enriched

D

in the DEGs, and plant hormone signal transduction, carotenoid biosynthesis, and the

PT E

ascorbate and aldarate metabolism pathway were also in the top 10 (p-value ≤0.01). Abiotic stresses such as drought, low temperatures, and high light levels can lead

CE

to increased production of free radicals and reactive oxygen species in plants.

AC

Although high concentrations of ROS can cause irreversable damage and cell death[42], they can also function to regulate the expression of specific protein kinase and transcription factor genes to enhance plant drought tolerance [43, 44]. A growing body of evidence suggests that plants respond to abiotic stresses by increasing their capacity to scavenge ROS[45]. Efforts to understand this acclimation process for P. cornutum have focused on the main components of the antioxidant system, i.e. ascorbate peroxidase, superoxide dismutase, and the low molecular weight 23 / 37

ACCEPTED MANUSCRIPT antioxidants ascorbate and glutathione. A case in point is ascorbate and aldarate metabolism; ascorbic acid (AsA), also known as vitamin C, has an important antioxidant and metabolic function in plants. As a necessary antioxidant, AsA protects plants from oxidative damage by scavenging free radicals and ROS that are generated

PT

during photosynthesis, oxidative metabolism, and in response to various stresses[46].

RI

Enhanced ascorbate recycling limits the deleterious effects of environmental oxidative

SC

stress. Therefore, it is important to clarify the effects of accumulation of ROS on AsA in plants under drought stress. Additionally, we also confirmed changes in the

NU

phenylalanine metabolism pathway; previous research found that under stress

MA

conditions, phenylalanine may represent an alternative pathway for photochemical energy dissipation that can enhance the antioxidant capacity of the cell[47].

D

Several transcription factors have been shown to be related to drought tolerance

PT E

in plants[48]. In contrast to previous studies, more differentially-expressed transcription factor genes were identified and annotated in our study. Our results were

CE

consistent with previous studies[33] showing that DERB TFs have vital roles in the

AC

plant response to drought stress. In our study, more DEGs for TF families were identified in the plants subjected to drought treatment. Among these TFs, genes for AP2/EREBP, Myb family, BZip, ethylene-responsive, bHLH, and zinc finger family TFs have been shown to have crucial roles in plant development and plant responses to biotic and abiotic stresses. In addition, some TFs are key regulators in signal transduction pathways; ethylene-responsive transcription factors (ERFs) play a key role in the integration of ethylene and abscisic acid (ABA) signals for the regulation 24 / 37

ACCEPTED MANUSCRIPT of stress response genes[49]. Jin et al.[50] found a novel gene for an ethylene-responsive element binding protein (EREBP) and showed that ABA could induce the expression of EREBP in cotton. This further indicates that EREBP is involved in both ethylene and ABA signaling pathways, and has great significance to

PT

investigate cross-talk between different pathways and identify the common

RI

transcription factors. Shinshi et al.[51] showed that ERFs can act in the ethylene and

SC

jasmonic acid pathways through simultaneous transcriptional and post-transcriptional regulation.

NU

5. Conclusions

MA

In this study, high-throughput DNA sequencing data and DEG analysis provided comprehensive transcriptome information for the xerophytic plant P. cornutum under

D

varying conditions of drought stress. We built digital gene expression profiles for six

PT E

tissue samples, and a large number of candidate genes potentially involved in the drought stress response were identified. In addition, functional annotation analysis

CE

revealed significant alterations in the nitrogen metabolism and ascorbate metabolism

AC

pathways during drought stress. These novel drought-tolerance genes will provide critical clues for understanding the physiological mechanisms of drought tolerance.

25 / 37

ACCEPTED MANUSCRIPT Supplementary Materials Table S1. Oligonucleotide primers used in qRT-PCR analyses. Table S2.

Transcription

factor genes

differentially

expressed

between

comparisons L3 and L1.

PT

Table S3. List of DEGs in significantly biochemical enriched pathways.

RI

Table S4. Summary of EST-SSRs in the transcriptome of P. cornutum.

SC

Table S5. Homozygous SNP sites identified by SOAPsnp between pairwise comparisons among all tissue samples.

NU

Figure S1. Gross morphology of P. cornutum (L.) Gaertn. (a) leaves and roots; (b)

MA

aerial parts of the plant; (c) silicles (fruits).

Figure S2. Functional classification of the assembled unigenes based on Gene

D

Ontology (GO) categorization. The results are summarized for the three main

PT E

GO categories: biological processes, cellular components and molecular functions. The x-axis indicates the subcategories, and the y-axis indicates the

CE

numbers of genes related to the total number of GO terms present; the unigene

bars.

AC

numbers that are assigned the same GO terms are indicated at the top of the

Figure S3. Venn diagram analysis of differentially expressed gene sets for the L3/L1 and R3/R1 comparisons. Blue represents the up-regulated genes in leaves, and green represents down-regulated genes in leaves; similarly, yellow represents the up-regulated genes in roots, and red represents down-regulated genes in roots. 26 / 37

ACCEPTED MANUSCRIPT Figure S4. Validation of RNA-seq results using qRT-PCR. Ten DEGs were randomly selected for qRT-PCR analyses. Theβ-actin gene was used as an

AC

CE

PT E

D

MA

NU

SC

RI

PT

internal control.

27 / 37

ACCEPTED MANUSCRIPT Acknowledgments This work was supported by grants from the National Natural Science Foundation of China (Grant Nos. 31260475,30960236,30560088), Innovation (Foster) Team of Inner Mongolia Agricultural University, China (Grant No. NDPYTD2013-3).

PT

Author Contributions

RI

WP designed the study, WFF and YJ planted and sampled the material, WP wrote the

Conflicts of Interest

AC

CE

PT E

D

MA

NU

The authors declare no conflict of interest.

SC

manuscript. All authors have read and approved the final version.

28 / 37

ACCEPTED MANUSCRIPT References 1.

Li, H.Y., et al., Compositional and gastrointestinal prokinetic studies of Pugionium (L.). Food Chemistry, 2015. 186: p. 285-291.

2.

Zorner, R.J., et al., Climate change mitigation: A spatial analysis of global

PT

land suitability for clean development mechanism afforestation and

RI

reforestation. Agriculture Ecosystems & Environment, 2008. 126(1-2): p.

3.

SC

67-80.

Wang, T., et al., Combating Aeolian Desertification in Northern China. Land

Wang, Q., et al., Pleistocene climate change and the origin of two desert plant

MA

4.

NU

Degradation & Development, 2015. 26(2): p. 118-132.

species, Pugionium cornutum and Pugionium dolabratum (Brassicaceae), in

Zhao, Y., A taxological revision and floristic analysis of the genus Pugionium.

PT E

5.

D

northwest China. New Phytol, 2013. 199(1): p. 277-87.

Acta Scientiarum Naturalium Universitatis NeiMonggol (Natural Science

Mosaddegh, M., et al., Ethnobotanical survey of herbal remedies traditionally

AC

6.

CE

Edition), 1998. 30(2): p. 197-199.

used in Kohghiluyeh va Boyer Ahmad province of Iran. Journal of Ethnopharmacology, 2012. 141(1): p. 80-95. 7.

Mahajan, S. and N. Tuteja, Cold, salinity and drought stresses: An overview. Archives of Biochemistry and Biophysics, 2005. 444(2): p. 139-158.

8.

Chaves, M.M., J.P. Maroco, and J.S. Pereira, Understanding plant responses to drought - from genes to the whole plant. Functional Plant Biology, 2003. 29 / 37

ACCEPTED MANUSCRIPT 30(3): p. 239-264. 9.

Feder, M.E. and G.E. Hofmann, Heat-shock proteins, molecular chaperones, and the stress response: Evolutionary and ecological physiology. Annual Review of Physiology, 1999. 61: p. 243-282. Ceccarelli, S., et al., Plant breeding and climate changes. Journal of

PT

10.

Hasegawa, P.M., et al., Plant cellular and molecular responses to high

SC

11.

RI

Agricultural Science, 2010. 148: p. 627-637.

salinity. Annual Review of Plant Physiology and Plant Molecular Biology,

Cattivelli, L., et al., Drought tolerance improvement in crop plants: An

MA

12.

NU

2000. 51: p. 463-499.

integrated view from breeding to genomics. Field Crops Research, 2008.

Cai, X.Y., et al., Genetic diversity and population structure of an endangered

PT E

13.

D

105(1-2): p. 1-14.

Orchid (Dendrobium loddigesii Rolfe) from China revealed by SRAP markers.

Grishkan, I., R.L. Jia, and X. Rong, Influence of sand burial on cultivable

AC

14.

CE

Scientia Horticulturae, 2011. 129(4): p. 877-881.

micro-fungi inhabiting biological soil crusts. Pedobiologia, 2015. 58(2-3): p. 89-96. 15.

Hao, L., et al., A study on vegetative growth law and leaf anatomy of Pugionium cornutum (L.) gaertn. Acta Agriculturae Boreali-Sinica, 2003. 19(4): p. 66-69.

16.

Hanscom, Z. and I.P. Ting, Responses of Succulents to Plant Water Stress. 30 / 37

ACCEPTED MANUSCRIPT Plant Physiology, 1978. 61(3): p. 327-330. 17.

Takami, H., et al., Complete genome sequence of the alkaliphilic bacterium Bacillus halodurans and genomic sequence comparison with Bacillus subtilis. Nucleic Acids Research, 2000. 28(21): p. 4317-4331. Grabherr, M.G., et al., Full-length transcriptome assembly from RNA-Seq data

PT

18.

Li, W. and A. Godzik, Cd-hit: a fast program for clustering and comparing

SC

19.

RI

without a reference genome. Nature Biotechnology, 2011. 29(7): p. 644-U130.

large sets of protein or nucleotide sequences. Bioinformatics, 2006. 22(13): p.

Pertea, G., et al., TIGR Gene Indices clustering tools (TGICL): a software

MA

20.

NU

1658-9.

system for fast clustering of large EST datasets. Bioinformatics, 2003. 19(5):

Ashburner, M., et al., Gene ontology: tool for the unification of biology. The

PT E

21.

D

p. 651-2.

Gene Ontology Consortium. Nat Genet, 2000. 25(1): p. 25-9. Conesa, A., et al., Blast2GO: a universal tool for annotation, visualization and

CE

22.

AC

analysis in functional genomics research. Bioinformatics, 2005. 21(18): p. 3674-6. 23.

Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997. 25(17): p. 3389-402.

24.

Apweiler, R., et al., UniProt: the Universal Protein knowledgebase. Nucleic Acids Res, 2004. 32(Database issue): p. D115-9. 31 / 37

ACCEPTED MANUSCRIPT 25.

Tatusov, R.L., et al., The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res, 2000. 28(1): p. 33-6.

26.

Kanehisa, M., et al., The KEGG resource for deciphering the genome. Nucleic Acids Res, 2004. 32(Database issue): p. D277-80. Kanehisa, M., et al., KEGG for linking genomes to life and the environment.

PT

27.

Mortazavi, A., et al., Mapping and quantifying mammalian transcriptomes by

SC

28.

RI

Nucleic Acids Res, 2008. 36(Database issue): p. D480-4.

RNA-Seq. Nat Methods, 2008. 5(7): p. 621-8.

Li, R., et al., SNP detection for massively parallel whole-genome

NU

29.

30.

MA

resequencing. Genome Res, 2009. 19(6): p. 1124-32. Romualdi, C., et al., IDEG6: a web tool for detection of differentially

D

expressed genes in multiple tag sampling experiments. Physiol Genomics,

31.

PT E

2003. 12(2): p. 159-62.

Young, M.D., et al., Gene ontology analysis for RNA-seq: accounting for

Tosens, T., et al., Developmental changes in mesophyll diffusion conductance

AC

32.

CE

selection bias. Genome Biol, 2010. 11(2): p. R14.

and photosynthetic capacity under different light and water availabilities in Populus tremula: how structure constrains function. Plant Cell and Environment, 2012. 35(5): p. 839-856. 33.

Agarwal, P.K., et al., Role of DREB transcription factors in abiotic and biotic stress tolerance in plants. Plant Cell Reports, 2006. 25(12): p. 1263-1274.

34.

Sehgal, D., et al., Integration of gene-based markers in a pearl millet genetic 32 / 37

ACCEPTED MANUSCRIPT map for identification of candidate genes underlying drought tolerance quantitative trait loci. Bmc Plant Biology, 2012. 12. 35.

Cushman, J.C. and H.J. Bohnert, Genomic approaches to plant stress tolerance. Curr Opin Plant Biol, 2000. 3(2): p. 117-24. Long, Y., et al., De novo assembly of the desert tree Haloxylon ammodendron

PT

36.

RI

(C. A. Mey.) based on RNA-Seq data provides insight into drought response,

37.

SC

gene discovery and marker identification. BMC Genomics, 2014. 15: p. 1111. Wang, J., et al., Transcriptomic profiling of the salt-stress response in the

Guo, Y., Screening for drought tolerance in Brassica rapa L.: from genetic

MA

38.

NU

halophyte Halogeton glomeratus. BMC Genomics, 2015. 16: p. 169.

variation to gene expression, 2015, The University of Western Australia. Hajela,

R.K.,

et

al.,

Molecular-Cloning

and

Expression

of

Cor

D

39.

PT E

(Cold-Regulated) Genes in Arabidopsis-Thaliana. Plant Physiology, 1990. 93(3): p. 1246-1252.

Gao, J.P., D.Y. Chao, and H.X. Lin, Understanding abiotic stress tolerance

CE

40.

AC

mechanisms: Recent studies on stress response in rice. Journal of Integrative Plant Biology, 2007. 49(6): p. 742-750. 41.

Ma, X.W., et al., De novo transcriptome sequencing and comprehensive analysis of the drought-responsive genes in the desert plant Cynanchum komarovii. Bmc Genomics, 2015. 16.

42.

Apel, K. and H. Hirt, Reactive oxygen species: Metabolism, oxidative stress, and signal transduction. Annual Review of Plant Biology, 2004. 55: p. 33 / 37

ACCEPTED MANUSCRIPT 373-399. 43.

Thapa, G., et al., An insight into the drought stress induced alterations in plants. Biologia Plantarum, 2011. 55(4): p. 603-613.

44.

Tian, Z.G., et al., Antioxidant Mechanism and Lipid Peroxidation Patterns in

PT

Leaves and Petals of Marigold in Response to Drought Stress. Horticulture

Pitzschke, A. and H. Hirt, Mitogen-activated protein kinases and reactive

SC

45.

RI

Environment and Biotechnology, 2012. 53(3): p. 183-192.

oxygen species signaling in plants. Plant Physiology, 2006. 141(2): p.

Stevens, R., et al., Candidate genes and quantitative trait loci affecting fruit

MA

46.

NU

351-356.

ascorbic acid content in three tomato populations. Plant Physiol, 2007.

Grace, S.C. and B.A. Logan, Energy dissipation and radical scavenging by the

PT E

47.

D

143(4): p. 1943-53.

plant phenylpropanoid pathway. Philos Trans R Soc Lond B Biol Sci, 2000.

Chen, L.M., et al., Genome-wide transcriptional analysis of two soybean

AC

48.

CE

355(1402): p. 1499-510.

genotypes under dehydration and rehydration conditions. Bmc Genomics, 2013. 14. 49.

Jin, L.G., et al., Expression profiles and transactivation analysis of a novel ethylene-responsive transcription factor gene GhERF5 from cotton. Progress in Natural Science, 2009. 19(5): p. 563-572.

50.

Jin, L., et al., Expression profiles and transactivation analysis of a novel 34 / 37

ACCEPTED MANUSCRIPT ethylene-responsive transcription factor gene GhERF5 from cotton. Progress in Natural Science, 2009. 19(5): p. 563-572. Shinshi, H., Ethylene-regulated transcription and crosstalk with jasmonic

CE

PT E

D

MA

NU

SC

RI

PT

acid. Plant Science, 2008. 175(1-2): p. 18-23.

AC

51.

35 / 37

ACCEPTED MANUSCRIPT Abbreviations P. cornutum

PT E

D

MA

NU

SC

RI

PT

-differentially expressed genes

CE

DEGs

-reactive oxygen species

AC

ROS

-Pugionium cornutum (L.) Gaertn

36 / 37

ACCEPTED MANUSCRIPT Highlights DEGs were mainly involved in ascorbate and aldarate metabolism. The accumulation of ROS may affect ascorbic acid biosynthesis in P. cornutum.

AC

CE

PT E

D

MA

NU

SC

RI

PT

Many drought-inducible transcription factor were identified under drought conditions.

37 / 37