Genome-wide expression profiling of the transcriptomes of four Paulownia tomentosa accessions in response to drought

Genome-wide expression profiling of the transcriptomes of four Paulownia tomentosa accessions in response to drought

Genomics 104 (2014) 295–305 Contents lists available at ScienceDirect Genomics journal homepage: www.elsevier.com/locate/ygeno Genome-wide expressi...

2MB Sizes 1 Downloads 42 Views

Genomics 104 (2014) 295–305

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Genome-wide expression profiling of the transcriptomes of four Paulownia tomentosa accessions in response to drought☆ Yanpeng Dong a,b, Guoqiang Fan a,b,⁎, Minjie Deng a, Enkai Xu a,b, Zhenli Zhao b a b

Institute of Paulownia, Henan Agricultural University, Zhengzhou, Henan, China College of Forestry, Henan Agricultural University, Zhengzhou, Henan, China

a r t i c l e

i n f o

Article history: Received 5 November 2013 Accepted 18 August 2014 Available online 2 September 2014 Keywords: Diploidy Tetraploidy Droughts Transcriptome

a b s t r a c t Paulownia tomentosa is an important foundation forest tree species in semiarid areas. The lack of genetic information hinders research into the mechanisms involved in its response to abiotic stresses. Here, short-read sequencing technology (Illumina) was used to de novo assemble the transcriptome on P. tomentosa. A total of 99,218 unigenes with a mean length of 949 nucleotides were assembled. 68,295 unigenes were selected and the functions of their products were predicted using Clusters of Orthologous Groups, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes annotations. Afterwards, hundreds of genes involved in drought response were identified. Twelve putative drought response genes were analyzed by quantitative real-time polymerase chain reaction. This study provides a dataset of genes and inherent biochemical pathways, which will help in understanding the mechanisms of the water-deficit response in P. tomentosa. To our knowledge, this is the first study to highlight the genetic makeup of P. tomentosa. © 2014 Elsevier Inc. All rights reserved.

1. Introduction Plants often suffer from adverse environmental changes. Among the major unfavorable abiotic stresses that affect plant growth and productivity, drought is one of the most devastating. In this context, many plants have evolved specific mechanisms for adapting to drought and other environmental stresses. Drought resistant plants are potential sources of genes for breeding plants with improved water-use efficiency. Drought response and tolerance mechanisms are polygenic traits, which trigger genes that are involved in a wide range of biological processes [1]. Next-generation sequencing techniques have made the cost of sequencing affordable, and this has led to the availability and analysis of genome-wide transcriptome data from many non-model organisms. In many plants for which genome information is unavailable, transcriptome sequencing has proven to be an efficient method to study drought adaptation and other biological features [2]. Genes that are triggered by drought have been identified in many plant species. For example, in various species of Populus (poplars), many droughtinduced genes have been identified [3,4] and the functions and expression patterns of some drought-induced genes [5,6] have been examined. ☆ Sequence data from this article have been deposited with the National Center for Biotechnology Information Sequence Read Archive database (http://www.ncbi.nlm.nih. gov/sra) under Accession No. SRP031515. ⁎ Corresponding author at: 95 Wenhua Road, Jinshui Area, Zhengzhou 450002, Henan, China. E-mail addresses: [email protected] (Y. Dong), [email protected] (G. Fan), [email protected] (M. Deng), [email protected] (E. Xu), [email protected] (Z. Zhao).

http://dx.doi.org/10.1016/j.ygeno.2014.08.008 0888-7543/© 2014 Elsevier Inc. All rights reserved.

Intraspecies variations among Populus species have been described [7] and some specific genotypes have been compared [8] using large-scale transcriptome sequencing methods. A recent study focused on changes in the transcriptome of cambial tissues from Populus in response to drought stress [9]. Transcriptomic response has also been studied in other species of angiosperm forest trees, although not to the same depth. For instance, Quercus spp. (oak) has been suggested as a model forest tree species [10]. A study of the physiological and biochemical responses controlled by a network of differentially expressed genes (DEGs), which belong to different functional groups underlying drought stress, has provided an important insight into the complex molecular mechanisms that underlie the response and adaption of plants to water deficiency [11]. Based on the predicted functional roles of these genes, the DEGs were classified into two major groups [12]: (1) genes that encode proteins related to signaling cascades and transcriptional control, including protein kinases and phosphatases, phytohormones (e.g. abscisic acid (ABA), ethylene, salicylic acid, and jasmonic acid), and transcription factors (TFs) [2,12]; and (2) genes that encode heat shock proteins, osmoprotectants, ion channel proteins, membrane protectants, dehydrins, late embryogenesis abundant proteins, transporters (e.g. sugar transporters, ABC transporters, and aquaporins), reactive oxygen species, antioxidants (e.g. peroxiredoxins, peroxidases, catalases, superoxide dismutase, and glutathione peroxidases), various kinds of metabolism-related (e.g. carbohydrate metabolism and alcohol metabolism) proteins, and senescence-related proteins that can act as protectors of plant cells against drought [5,13–15]. Two major pathways in the response of plants to water deficiency, the abscisic acid (ABA)-dependent and ABA-independent pathways, have

296

Y. Dong et al. / Genomics 104 (2014) 295–305

been well described [2]. These pathways involve complex interactions; for example, TFs in the ABA-dependent pathway include bZIPs, NACs, MYBs and MYCs [16] and, TFs in the ABA-independent pathway involve NAC and DREB [6,17]. However, the functions of many of the genes that have been found to be differentially expressed in response to drought are still unknown and their putative functions are difficult to predict, especially in non-model forest tree species. Paulownia tomentosa is a forest tree species that has been native in Asia for over 2000 years. This plant has been introduced gradually into Europe, America, and Australia [18,19]. Paulownia species are very widely distributed in China and are intercropped on 2.5 million ha of farmland [19,20]. In most areas, the productivity of the tree is affected adversely by water deficiency, making it imperative to understand the molecular mechanisms underlying the drought response of Paulownia. In our previous studies, the physiological responses of tetraploid and diploid Paulownia plants to drought stress were measured when the relative soil water content was 25% (drought-treated plants) and 75% (well-watered plants as the control) [21]. We found that changes in the leaf physiological and biochemical indexes with drought stress were similar in the tetraploid and diploid plants. The water and chlorophyll contents of the leaves decreased with drought stress, and were higher in the tetraploid than in the diploid in both the droughttreated and the well-watered plants. The relative conductivity and content of malonaldehyde increased with drought stress. Both the superoxide dismutase activity and soluble protein content were found to first increase and then decrease. The soluble sugar and proline contents increased with drought stress, and were higher in the tetraploid than in the diploid plants. In the present study, Illumina/Solexa sequencing technology was used to perform a genome-wide transcriptome analysis to identify P. tomentosa genes that are responsive to water deficiency. This is the first reported transcriptome of this economically important species. 2. Results 2.1. Comparison of the physiological responses of diploid and tetraploid Paulownia to drought stress With 25% and 75% relative soil water contents, the physiological responses of tetraploids and diploid Paulownia plants to drought stress

tolerance were studied. In the tetraploid and diploid Paulownia plants, the changing trends of leaf physiological and biochemical indexes were consistent with aggravation of drought stress [21]. The loss of water increased gradually with time (3-d, 6-d, 9-d and 12-d) and the dehydrated symptoms gradually worsened. Leaves began to curl from the edges to the extent that the entire pages slowly shrink. The colors of the plants gradually altered from bright green to dark green. The stems were bending and were hardly able to afford the weight of the leaves. The morphological specificities of the well-watered and the 12-d drought-treated accessions were shown in Fig. 1. 2.2. RNA-Seq sequencing and assembly of the Paulownia leaf transcripts Sampling 18 seedlings of drought-treated and well-watered diploid (2n = 40) and autotetraploid (4n = 80) P. tomentosa plants resulted in 6 different reproductive transcriptomes (named PT2T, PT2, PT2-1, PT4T, and PT4, PT4-1 respectively). Among them, a total of 12 samples were collected from two biological replicates of well-watered diploid and autotetraploid P. tomentosa plants (PT2R and PT4R). The Illumina/Solexa GAIIx platform generated a total of 312,865,162 raw reads comprising 24 billion nucleotides (nt) (75.1 million for PT2, 80.6 million for PT2T, 77.3 million for PT4 and 79.8 million for PT4). After removing of the adaptor sequences, contaminated reads, and short reads, a total of 268,588,730 clean reads comprising 94,200,622 nucleotides (nt) of the four accessions were de novo assembled into unigenes (75,790 for PT2, 89,165 for PT2T, 70,565 for PT4 and 92,439 for PT4T). The four libraries of unigenes were then combined into one (99,218 unigenes with a mean length of 949 nt and an N50 of 1568 nt). 47,518 of the unigenes were distinct clusters and 51,700 were singletons. All the unigenes were over 200-nt long, and half of them were more than 1400 nt, indicating that the assembly quality met the criteria (the assembled sequences with N50 N 1000; while reads being aligned back to unigenes, the alignment ratio N 85%) for transcriptome sequencing (Fig. S1). Details of the sequencing and assembly of the transcriptome are shown in Table 1. The correlation coefficients of the expression of both the well-watered duplicate diploid and tetraploid were shown in Fig. 2. Both duplicates showed linear correlations with their corresponding ones. The Spearman r values of the diploids and the tetraploids were 0.75 and 0.72, respectively, while the Pearson r values were 0.81 and 0.87, respectively.

Fig. 1. Tissues used for the transcriptome analysis. (a) PT2, well-watered diploid, (b) PT2T, 12 days drought treated diploid, (c) PT4, well-watered tetraploid, (d) PT4T, 12 days drought treated tetraploid.

Y. Dong et al. / Genomics 104 (2014) 295–305

297

Table 1 Overview of the sequencing and assembly. Statistics of data production

PT2

PT2T

PT4

PT4T

Number of clean reads Total nucleotides (nt) Q20 percentage (%) N percentage GC percentage (%)

64,638,340 5,817,450,600 95.14% 0.00% 46.77%

69,318,810 6,238,692,900 95.30% 0.00% 45.87%

65,842,456 5,925,821,040 94.94% 0.00% 46.75%

68,789,124 6,191,021,160 95.33% 0.00% 46.00%

Contigs

PT2

PT2T

PT4

PT4T

Number of contigs Total nucleotides (nt) in contigs Length of N50 (bp) Average length of contigs (nt)

132,005 42,589,063 528 323

141,285 46,261,979 520 327

131,743 40,038,137 478 304

145,541 47,400,990 510 326

Unigenes

PT2

PT2T

PT4

PT4T

Number of unigenes Total nucleotides (nt) in unigenes Length of N50 (bp) Average length of unigenes (bp)

75,790 55,218,964 1238 729

89,165 65,091,733 1305 730

70,565 45,983,351 1045 652

92,439 65,409,107 1237 708

All-unigenes Number of All-unigenes Total nucleotides (nt) in All-unigenes Length of N50 (bp) Average length of All-unigenes (bp)

99,218 94,200,622 1568 949

2.3. Annotation of the Paulownia leaf transcripts 2.3.1. Database searches Genome information for P. tomentosa is not available. Thus, the unigenes were searched against a number of protein databases (National Center for Biotechnology Information (NCBI) nr, Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Clusters of Orthologous Groups (COG) in that order) using BLASTX with a cutoff E-value of 1.0E− 5. The proteins in each of the databases that shared the highest similarities with the unigenes were retrieved. We found that 68.8% of all the unigenes, a total of 68,295 distinct sequences, matched known proteins. 65,639 of the unigenes matched proteins in the nr-protein sequence database (Table S1), 34.1% of these shared strong homology with the nr sequences (E-value ≤ 1.0E−100), and most of the alignments had E-values in the range 1.0E−15–1.0E−100

(Fig. 3a). The similarity distribution showed that 69.5% of the unique unigene sequences shared similarities higher than 60% with the nr sequences, while the others shared similarities that ranged from 19 to 60% (Fig. 3b). The annotated unigene matched sequences that originated from many plant species: 31,322 (47.7%) unigenes shared strong homology with Vitis vinifera sequences, 8670 (13.21%) shared homology with Ricinus communis sequences, followed by sequences from Populus trichocarpa (7204 unigenes, 10.98%), Glycine max (3916 unigenes, 5.97%), Medicago truncatula (1339 unigenes, 2.04%), Nicotiana tabacum (1321 unigenes, 1.71%), Solanum lycopersicum (1123 unigenes, 1.71%), and other plant species (10,743 unigenes, 16.36%) (Fig. 3c). The sequence similarity of the genes with homologues to the different reference species was shown in Table S2. Thus, the P. tomentosa sequences shared the highest similarities with sequences from the most closely related plant species.

Fig. 2. Correlation coefficients of the expression of duplicate samples. X-axis represents the logarithmic value of diploid (a) or tetraploid (b) expression, while Y-axis represents the logarithmic value of the corresponding duplicate samples.

298

Y. Dong et al. / Genomics 104 (2014) 295–305

Fig. 3. Distribution of E-values (a), similarity (b), and species (c) for the unigene BLAST matches against the nr-protein database. For the BLASTX matches the E-value cutoff was b1.0E−5.

2.3.2. Functional annotation COG assignments were used to predict and classify possible functions of the unigenes. Based on sequence homology, 25,890 (26.1%) unigenes could be categorized into 25 COG groups (Fig. 4). A general function group with 8645 (8.71%) unigenes was the largest group. This was followed by a transcription group (4834 (4.87%) unigenes), a replication, recombination, and repair group (3982 (4.01%) unigenes), a post-translational modification, protein turnover, and chaperones group (3647 (3.68%) unigenes,), a signal transduction mechanisms group (3613, 3.64%), a translation, ribosomal structure, and biogenesis group (3130, 3.15%), a carbohydrate transport and metabolism group (3012, 3.04%), and an amino acid transport and metabolism group (2193, 2.21%). The nuclear structure group was the smallest with only 10 unigenes (0.01%). Unigenes were also annotated using Gene Ontology (GO) terms, and 54,287 unigenes (54.7%) were categorized into 58 functional groups at level two in the three basic ontologies: molecular function, cellular component, and biological process. Cell and cell part were the two largest groups, each containing 43,858 unigenes, and the smallest groups were virion and virion part, with only three unigenes predicted for each group (Fig. 5).

2.3.3. Metabolic pathway analysis The unigene metabolic pathways were investigated using the KEGG annotation system. 39,428 (39.7%) unigenes were mapped to 128 KEGG reference pathways. Among them, metabolic pathways, which included 9228 unigenes (23.4%), were found to be significantly larger than the other pathways, namely, biosynthesis of secondary metabolites (4505 unigenes, 11.38%), plant–pathogen interaction (2324 unigenes, 5.89%), plant hormone signal transduction (2087 unigenes, 5.29%), spliceosome

(1532 unigenes, 3.89%), RNA transport (1331, 3.38%), endocytosis (1232 unigenes, 3.12%), and glycerophospholipid metabolism (1215, 3.08%). The smallest group was betalain synthesis with only three unigenes (Table S3).

2.4. Comparison of gene expression profiles between drought-treated and well-watered Paulownia leaves By comparing the transcriptomes from the drought-treated and well-watered plants, a large number of DEGs were detected using NOISeq [22] (Tables S4, S5, S6 and S7). Pair-wise comparisons of the number of DEGs between summed P. tomentosa libraries (PT2 and PT4) are shown in Fig. 6. We identified a large amount of coregulated DEGs between the two comparisons of the droughttreated vs. the well-watered (PT2T vs. PT2R and PT4T vs. PT4R). Among them, 77 DEGs were up-regulated, while 520 were downregulated (Table S8). The same analysis was done for the comparison between the tetraploid vs. the diploid (PT4R vs. PT2 R and PT4T vs. PT2T), 116 DEGs were up-regulated and 196 were down-regulated (Table S9). The DEGs were categorized into the three GO groups: biological process, cellular component, and molecular function. In the cellular component group, most of the genes were in the cell and cell part (328 genes for the drought-treated vs. the well-watered; 120 genes for the tetraploid vs. the diploid) categories. In the biological process group, for the drought-treated vs. the well-watered, 320 and 297 genes were involved in metabolic process and cellular process respectively; similarly, for the tetraploid vs. the diploid, 99 and 97 genes were involved in metabolic process and cellular process respectively. In the molecular function group, 219 genes (the drought-treated

Y. Dong et al. / Genomics 104 (2014) 295–305

299

Fig. 4. Clusters of Orthologous Groups (COG) classification of the P. tomentosa unigenes. 25,890 unigenes (26.09% of the total) were annotated and grouped into 25 specific categories.

vs. the well-watered) and 70 genes (the tetraploid vs. the diploid) were involved in catalytic activity (Figs. S2 and S3). In the KEGG analysis, the number of DEGs in the metabolic pathway category was significantly higher for both the drought-treated vs. the well-watered and the tetraploid vs. the diploid compared with their numbers in any of the other pathways (Tables S10 and S11).

2.5. Real-time reverse transcription-polymerase chain reaction (PCR) validation of selected DEGs The expression patterns of 12 selected DEGs from the droughttreated PT2T and PT4T plants and the well-watered PT2 and PT4 plants were measured by quantitative real-time PCR (qRT-PCR) to confirm the

Fig. 5. Gene ontology (GO) classification of the P. tomentosa unigenes. 54,289 unigenes (54.71% of total) were annotated and categorized into 58 function groups.

300

Y. Dong et al. / Genomics 104 (2014) 295–305

Fig. 6. Statistics of differentially expressed genes in each pairwise comparison. Red bars represent the up-regulated genes while blue bars represent the down-regulated ones. PT4T, 12 days drought treated tetraploid. PT4R, well-watered tetraploid biological replicate. PF2T, 12 days drought treated diploid. PT2R, well-watered diploid biological replicate.

results of the fragments per kilobase of transcript per million fragments mapped (FPKM) method that we used to identify the DEGs from the transcriptome data. For 11 of the DEGs, the results obtained with the qRT-PCR reflected the same up- and down-regulation trends as those obtained using the FPKM method. Further, in both the tetraploid and diploid Paulownia plants, the changing trends of gene expression were consistent with aggravation of drought stress (Fig. 7). For the gene that represented the inconsistency between the qRT-PCR and transcriptome methods, it was likely due to the fact that transcriptome was more sensitive in the detection of low-abundant transcripts and small changes in gene expression than qRT-PCR method, just as reported by Yan et al. [23]. The predicted gene functions are shown in Table S12. One of the selected DEGs failed to be amplified in the first step. 3. Discussion Of the 99,218 assembled unigenes, 30,923 (31.2%) were not assigned functional annotations. Insufficient information in the public databases may be one explanation for the limited number of annotations that could be assigned. Another explanation might be that the unannotated sequences represent regions in the P. tomentosa genome that are poorly conserved in other plant species [24]. In this study, biological replicates were used; two libraries were created from well-watered diploid and tetraploid P. tomentosa, respectively. The correlation coefficients of logarithmic values of the expression of both the duplicated samples indicate that both of the duplicate wellwatered samples were significantly correlated. We ran NOISeq for different ploidy and different treatment comparisons on each library. Moreover, eleven out of twelve DEGs from NOISeq were confirmed by qRT-PCR, which indicates the high reliability of NOISeq for identifying DEGs. NOISeq was chosen mainly because the method can compute noise from replicates when replicates are available. Evaluation shows that the method is robust against different levels of sequencing depth and is stable across a number of variable conditions. Comparing with other statistical methods for gene expression analyses, the method has a better precision/recall and much lower false discovery rate [22]. Therefore, we were able to calculate differential gene expression profiles between treatments by combining the FPKM of four ploidy libraries using NOISeq. The present study has provided a dataset of genes and their inherent biochemical pathways. Various genes and pathways that may be responsible for drought response and adaption in P. tomentosa were identified. The droughttreated (PT2T and PT4T) and well-watered (PT2 and PT4) plants showed significant variations in leaf size and leaf rolling shapes [21].

At the 25% water stress level, PT2T and PT4T displayed leaf morphologies that could limit water loss by evaporation by restricting the exposed leaf surface. Expression profiling of the RNA-Seq data identified a large number of genes, such as those encoding oxidoreductases, monooxygenases, peroxidases, transmembrane transporters, glutamate synthase, phosphoenolpyruvate carboxykinase, alcohol dehydrogenase, heat shock protein 70, osmotin, chalcone synthase, chitinase, proline oxidase, sodium hydrogen exchanger, and brassinosteroids, that were differentially expressed in PT2T and PT4T compared with PT2 and PT4, respectively (Tables S4 and S5). Many of the DEGs were predicted to regulate or balance osmotic pressure, thus protecting plants instress conditions [25–28]. Many of the identified DEGs were found to be involved in diverse metabolic pathways that may be involved in exposure to drought stress (Tables S6 and S7). The transcriptional modulation that these findings reflect may trigger the synthesis of secondary metabolites that can confer plants responsive advantages to drought stress [28]. Interestingly, the GO and KEGG analyses of the DEGs suggested the possibility of crosstalk between pathways, which might help the plants to survive in water-scarce conditions. A large number of GO terms and KEGG pathways were present exclusively in the transcriptomes from PT2T and/or PT4T, and most of them were related to abiotic stresses; or example, cellular response to a stimulus, response to hypoxia, response to jasmonic acid, response to an abiotic stimulus, response to misfolded protein, response to oxidative stress, response to water deprivation, response to cold, response to salinity, response to desiccation, response to a hormone stimulus, ion transporter, kinase activity, response to inorganic substances, and response to hyperosmotic stress. The enrichment of stress-related GO terms and KEGG pathways among the DEGs is indicative of improved drought-stress management in the PT2T and PT4T plants [29]. The KEGG analysis revealed significant enrichment of proline, betaalanine, and sucrose metabolism pathways in PT2T and PT4T (Tables S10 and S11). In Arabidopsis, wheat, and maize under water-stress conditions, the concentrations of these small solute molecules have been reported to increase [30–32]. It is probable that they act as osmoprotective molecules under drought conditions, where they may play an important part in balancing the osmotic pressure across membranes. Secondary metabolites, like some flavonoids, were found to accumulate in the drought-treated plants, in agreement with findings reported previously [33]. Although osmoticum deposition rates and the concentration of various secondary metabolites have been found to increase to adjust osmotic pressure in drought-stressed plants, this may not be enough to maintain turgor pressure [28,34]. In such cases, various transporter proteins and ion channels may be required to regulate the unbalanced osmotic pressure [32]. In our study, we found that a variety of membrane transporters, such as sugar transporters, ABC transporters, and ion channels, were enriched in PT2T and PT4T (Tables S4 and S5). These transporters may play roles in maintaining the homeostasis of intracellular ion concentrations as well as in stabilizing the physiological balance of plants under drought stress [35]. Thus, the DEGs that were related to membrane transporters may have very plastic or adaptive behaviors in drought conditions. Phytohormones are known to play an important part in plant response and adaption to abiotic stresses. In the present study, we identified various DEGs that were predicted to encode plant hormones. The well-described ABA-dependent pathway uses ABA as the main phytohormone. ABA is known to be related to drought response in plants where it acts as an endogenous messenger in drought response and adaption processes. ABA is induced by water deficit, which triggers major alterations to gene expression patterns in plant cells [36]. This hormone also contributes to other important processes like plant growth, development, and seed dormancy [37]. In the present study, some of the DEGs were annotated as being involved in crosstalk among ABA and other important phytohormones, such as ethylene, auxin, zeatin, and brassinosteroid. It has been reported that ethylene is involved in almost all kinds of abiotic stress, including drought [38].

Y. Dong et al. / Genomics 104 (2014) 295–305

A gene that encodes 1-aminocyclopropane-1-carboxylate oxidase, which is the last enzyme in the ethylene biosynthesis pathway [37], was found to be elevated in both PT2T and PT4T. How drought stress is perceived by ABA is still an issue that needs to be investigated further. A series of ABA synthesis-related enzymes and transporters were found in the PT2T and PT4T transcriptomes (Tables S4 and S5). It has been reported that water-scarce conditions can trigger an immediate hydraulic signal, which may activate ABA biosynthesis over a great distance [39]. The signal can then activate enzymes like cytochrome P450 to catalyze ABA synthesis only a few minutes after the signal is received [40]. ABA transporters are also important during drought-response processes. Recent studies have shown that on perception of a signal, ABA is synthe-

301

sized primarily in vascular tissues and then exported to other cells in the plant. The family of ATP-dependent ABC transporters can stimulate the absorption of ABA. This mechanism is useful in the rapid distribution of ABA hormone to the surrounding tissues [41,42]. The data presented in this article will greatly improve our understanding of the molecular mechanisms involved in the response of P. tomentosa to water deficit. The findings reported here can be extended: (1) To identify the differentially expressed unigenes that were not annotated in the present study. Identification of these unigenes may reveal additional novel response pathways and the genes that control them. (2) To identify and examine DEGs in other tissues and at different time points. We are particularly interested in determining whether such

Fig. 7. qRT-PCR analysis of candidate drought response genes. The following genes were validated: (a) CL12194.Contig2, a late embryogenesis abundant protein 4; (b) CL4688.Contig7, an amine oxidase; (c) Unigene19816, a dormancy-associated protein 1; (d) Unigene20122, a dehydrin; (e) Unigene30149, a flavonoid glycosyltransferase; (f) CL2632.Contig3, a bidirectional sugar transporter; (g) CL2782.Contig1, a heat shock protein 70; (h) CL2843.Contig1, a 1-aminocyclopropane-1-carboxylate oxidase 12; (i) Unigene35597, a heat shock protein 70; (j) Unigene5056, an anthocyanin 5-aromatic acyltransferase; and (k) Unigene50562, an ABC transporter C family member 10. 18S rRNA was used as the internal reference gene. Standard error of the mean for three technical replicates is represented by the error bars. PT2, diploid accession; PT4, tetraploid accession. * represents statistically significant differences between the diploid and the tetraploid under the same conditions (P-value was less than 0.05).

302

Y. Dong et al. / Genomics 104 (2014) 295–305

Fig. 7 (continued).

genes and their transcripts can be used to detect stress responses at early stages of the plants' exposure to drought. (3) To address new questions and to design experiments to obtain more information about the biology of drought response and adaptation in plants. Further work is required to understand the contributions of the differentially expressed P. tomentosa genes before they can serve as a basis to identify candidate genes that can be targeted to enhance plant tolerance or adaptation to drought. In addition, strategies to incorporate the identified genes into molecular breeding programs should be developed. (4) To determine whether the observed changes in response to water-scarce conditions reflect changes in either gene expression or the posttranslational modification of the encoded proteins. It will also be interesting to find out whether the differentially regulated unigenes reflect a direct response to drought or are a secondary effect of the symptoms of drought. Further studies are required to understand the roles of the

DEGs and the proteins that they encode. (5) The results of this study lay a foundation for a further study on small RNA. 4. Conclusions In this study, we generated and investigated systematically the first global transcriptome of the leaves of drought-treated diploid and autotetraploid P. tomentosa by RNA-Seq sequencing, thereby providing a useful resource for this species. The availability and analysis of these sequences will greatly enrich the molecular information on P. tomentosa. The DEGs and their associated pathways have shed some light on the biology of the drought response and tolerance in Paulownia species. In addition, the data have provided candidate genes and pathways for future functional genomics studies. Further, 30,923 sequences could not be mapped to known sequences in the various protein databases and,

Y. Dong et al. / Genomics 104 (2014) 295–305

303

therefore, remained unannotated. These novel sequences will become a rich Paulownia gene pool for further study. The comprehensive analysis of the novel P. tomentosa sequence resource reported here provides an integrated easy-to-use collection of information that will contribute to a deeper understanding of the response of Paulownia trees to drought, and mining of this data will provide new directions for further experimental studies. The dataset of genes and their inherent biochemical pathways can act as a source of potential targets for improving the drought tolerance of Paulownia and other related species.

and single nucleotide A (adenine) addition. Then, the short fragments were connected with adapters. Suitable fragments were used as the templates for PCR amplification. An Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) and ABI StepOnePlus Real-Time PCR System (ABI, New York, NY, USA) were used to quantify and assess the quality of the sample library. Finally, the library was sequenced using the Illumina HiSeq™ 2000 platform (Illumina, San Diego, CA, USA).

5. Material and methods

5.3. Bioinformatics analysis

5.1. Plant material

Base calling was performed using the Illumina software supplied with the HiSeq 2000. There were two softwares used in this process. One was the real-time analysis software RTA which was installed in the DNA sequencer. It could analyze the results while sequencing and the result was to convert the image files into bcl files after analysis. The bcl files were then transmitted to the mainframe. Another was a module of OLB (version 1.9.4) which was used for data processing. It could convert the bcl files into sequence data files. The parameters (setupBclToQseq. py -b path/to/BaseCalls -o path/to/output/directory) used were ordinary input and output ones and no special parameters were involved. After that, we added another parameter (–ignore-missing-stats) to ignore the situations of the missed files. These data are called raw reads and were stored in fastq format. The raw reads included dirty reads that contained adapters and unknown or low quality bases. These contaminants were filtered out by the program filter_fq (parameters: --Nrate 0.05 --Q_low 10, 0.2 -G 5 -3 ****** -5 ***** (* represents the adaptor sequence)) to produce the clean reads. The clean reads have been deposited with the NCBI Sequence Read Archive database under Accession No. SRP031515. The short reads assembling program, Trinity [46], was used for the de novo assembly of the clean reads. The assembled sequences are called unigenes. Because multiple samples from the same species were sequenced, the unigenes from each assembly were processed using sequence clustering software to remove spliced and redundant sequences to acquire non-redundant unigenes that were as long as possible. The clustering generated unigenes shared more than 70% sequence similarity, which were named using the prefix CL, and singletons, which were named using the prefix Unigene. BLASTX alignments (E-value b 1.0E− 5) were performed between the unigenes and sequences in the nr, Swiss-Prot, KEGG, and COG protein databases. The directions of the unigene sequences were decided based on the best alignment results over all these databases. When different databases produced conflicting results the sequence direction was assigned based on a priority order of nr, Swiss-Prot, KEGG, and COG. When a unigene could not be aligned to any of the sequences in any of the four databases, ESTScan [47] was used to decide its direction.

The plant material was sourced from the Institute of Paulownia, Henan Agricultural University, Zhengzhou, Henan Province, China. Tissue culture seedlings of diploid and tetraploid P. tomentosa (derived from the same tissue culture, no genetic variation) were cultured for 30 days before being clipped from the roots. These seedlings of about 1 m high (18 diploids and 18 tetraploids) were transferred into nutrition blocks containing ordinary garden soil for 30 days. Samples with the same growing consistency were then transferred into nutrition pots 30 cm in diameter with trays underneath. Each pot was filled with the same amount of ordinary garden soil with one plant in each pot. After 50 days, tissue culture seedlings with the same growing consistency were subjected to drought conditions using a water controlling experiment according to the method of Zhang et al. [21]. Diploid and tetraploid Paulownia were grown in soil with 25% (PT2T and PT4T) and 75% (PT2 and PT4) relative soil water content. After 3 d, 6 d, 9 d, and 12 d (wilting state), the second pairs of leaves from the growing apex of the young sprout of the plants were picked from the droughttreated samples. The 3-d, 6-d, 9-d, and 12-d diploid leaf samples were named PT2T-1, PT2T-2, PT2T-3 and PT2T, respectively, and the 3-d, 6-d, 9-d, and 12-d tetraploid leaf samples were named PT4T-1, PT4T-2, PT4T-3, and PT4T, respectively. The well-watered PT2 and PT4 samples were picked only after 12 d. The water and chlorophyll contents were measured according to the methods of Bao [43] and An et al. [44], respectively. The relative conductivity, the content of malonaldehyde, the superoxide dismutase activity, the soluble protein content, and the proline content were measured according to the method of Li [45]. 5.2. Construction of cDNA libraries of Paulownia A total of 18 single plants were used for Illumina short-read sequencing. Three individuals taken in day 12 of leaves were pooled for the PT2T and PT4T library, respectively (approximately 8 μg RNA/library). Additionally, leaves from six plants from PT2 and six plants from PT4 were pooled and prepared for four total RNA libraries. (The leaves from three individual plants were combined to form one biological replicate and there were two biological replicates for each accession, approximately 8 μg RNA/library, labeled as PT2, PT2-1, PT4, PT4-1, respectively.) Total RNA was extracted from the cells using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), followed by RNA purification using the RNeasy MiniElute Cleanup Kit (Qiagen, Dusseldorf, Germany) according to the manufacturer's protocol. With a NanoVue UV–Vis spectrophotometer (GE Healthcare Bio-Science, Uppsala, Sweden), RNA was quantified by measuring absorbance at 260 nm. Absorbance ratios of OD260/280 and OD260/230 were used to assess the purity of the RNA samples. The integrity of the RNA was checked by 1% agarose gel electrophoresis. After total RNA extraction and DNase I treatment, magnetic beads with Oligo (dT) were used to isolate mRNA. The mRNA was mixed with fragmentation buffer to obtain short fragments. cDNA was synthesized using the mRNA fragments as templates. The short cDNA fragments were purified and resolved with EB buffer for end reparation

5.4. Unigene functional annotation 5.4.1. Database searches Unigene sequences are first aligned to the protein databases nr (ftp://ftp.ncbi.nih.gov/blast/db/nr), Swiss-Prot (http://www.ebi.ac.uk/ swissprot/), KEGG (http://www.genome.jp/kegg/) and COG (http:// www.ncbi.nlm.nih.gov/COG/) (E-value cutoff was b1.0E − 5) using BLASTX, and to the NCBI nucleotide nt database (ftp://ftp.ncbi.nih.gov/ blast/db/nt) (E-value cutoff b 1.0E − 5) using BLASTN. The matched sequences that shared the highest sequence similarity with the corresponding unigenes were retrieved along with their functional annotations. The KEGG database provided pathway annotations for the matched unigenes. The unigenes are also aligned to the COG sequences to help predict and classify their possible functions.

304

Y. Dong et al. / Genomics 104 (2014) 295–305

5.4.2. GO classification We used the Blast2GO program [48] to retrieve the GO annotations for each of the matched unigenes from the nr database. To understand the distribution of the predicted functions of the genes in P. tomentosa transcriptome at the macro level, the GO annotations were subjected to GO functional classification using the WEGO software [49]. 5.4.3. Protein coding sequence (CDS) prediction Unigenes that aligned to sequences in a higher priority database also aligned to sequences in the lower priority databases. The highest ranked alignments were used to predict the CDS of each unigene. The predicted CDSs were translated into amino acid sequences using the standard codon usage. 5.5. Unigene expression difference analysis

we chose a Q-value of ≤0.05 to identify the pathways that were significantly enriched among the DEGs. 5.8. Correlation coefficients of the expression of the duplicate samples The raw data of duplicate well-watered diploid and tetraploid samples were filtered as Section 5.3. The processed clean data were then compared to the original project of the assembled All-Unigene.fa. The expression of the unigenes was calculated using FPKM methods [49]. The correlation coefficients of logarithmic values of the expression of both the duplicated samples were then calculated. 5.9. qRT-PCR analysis

where FPKM is the expression of unigene A, C is the number of fragments that aligned uniquely to unigene A, N is the total number of fragments that aligned uniquely to all unigenes, and L is the number of bases in the coding sequence of unigene A. Differentially expressed genes (DEGs) were selected by a novel method called NOISeq [22] with a threshold value of P ≥ 0.8 and an absolute value of log2Ratio N 2. This non-parametric algorithm compares replicates within the same condition to estimate noise distribution and performs pairwise comparison of the samples to identify differentially expressed genes. GO functional analysis and KEGG pathway analysis were carried out on the identified DEGs.

The RNA from the leaves of the PT2T-1, PT2T-2, PT2T-3, PT2T, PT2, PT2-1, PT4T-1, PT4T-2, PT4T-3, PT4T, PT4 and PT4-1 samples was extracted with TRIzol (Sangon, Shanghai, China). The RNA was then precipitated with isopropanol. Purified and concentrated RNA was denatured and first-strand cDNA for all the samples was synthesized using a PrimeScript RT reagent Kit (Takara, Dalian, China). The cDNAs were then amplified in a Bio-Rad CFX96TM Real-Time System (BioRad, Hercules, CA, USA) with SYBR Premix Ex Taq TM II (Takara). The following PCR parameters were used: 50 °C for 2 min, 95 °C for 30 s followed by 40 cycles of 94 °C for 15 s and 60 °C for 1 min. Three replicates were analyzed for each gene. The average threshold cycle (Ct) was normalized and the relative expression changes were calculated using the 2−△△Ct method. The 18S rRNA of Paulownia was chosen as an internal reference gene for normalization. The primers used for the qRT-PCR analysis, amplicon sizes, and potential gene functions are shown in Table S12. Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.ygeno.2014.08.008.

5.6. GO functional enrichment analysis for the DEGs

Acknowledgments

All the DEGs from NOISeq were mapped to terms in the GO database (http://www.geneontology.org/) and the numbers of genes that were annotated with each GO term were calculated to obtain a gene list and the numbers of genes for every assigned GO term. To identify significantly enriched GO terms among the DEGs, a hypergeometric test was used. The formula used to calculate the P-value was:

This work was supported by the joint funds of the National Natural Science Foundation of China (NSFC) (Grant No. U1204309), the fund of the Transformation Project of the National Agricultural Scientific and Technological Achievement of China (Grant No. 2012GB2D000271), the Central Financial Forestry Science Promotion Project (Grant No. GTH [2012]01), the fund of the Science Key Program of Department of Henan Education (Grant No. 12A220003), the fund of the Technology Innovation Team Project of Zhengzhou (Grant No. 121PCXTD515), and the Fund of Henan Agricultural University (Grant No. 30600405).

Unigene expression was calculated using the FPKM (fragments per kb per million fragments) method [50] as: FPKM ¼

106 C NL=103

m−1 X

P ¼ 1−

i¼0

   M N−M i n‐i   N n

where N is the total number of unigenes with GO annotations, n is the number of DEGs in N, M is the total number of unigenes annotated to a particular GO term, and m is the number of DEGs in M. A P-value after Bonferroni correction of ≤ 0.05 was used as the threshold. GO terms that fulfilled this condition were defined as significantly enriched in the DEGs in the context of the whole transcriptome background. The GO functional enrichment analysis was used to recognize the main biological functions of the DEGs as well as to integrate the clustering analysis of the expression patterns to obtain the expression patterns of the DEGs annotated with GO terms. 5.7. KEGG pathway analysis for the DEGs Pathway enrichment analysis was used to identify significantly enriched pathways in the DEGs in the context of the whole transcriptome background. The formula used to calculate the P-value was similar to the formula used in the GO analysis. A Q-value was defined as the FDR analog of the P-value. After multiple testing and correction,

References [1] A. Ashoub, T. Beckhaus, T. Berberich, M. Karas, W. Bruggemann, Comparative analysis of barley leaf proteome as affected by drought stress, Planta 237 (2013) 771–781. [2] T. Hirayama, K. Shinozaki, Research on plant abiotic stress responses in the postgenome era: past, present and future, Plant J. Cell Mol. Biol. 61 (2010) 1041–1052. [3] A. Caruso, F. Chefdor, S. Carpin, C. Depierreux, F.M. Delmotte, G. Kahlem, D. Morabito, Physiological characterization and identification of genes differentially expressed in response to drought induced by PEG 6000 in Populus canadensis leaves, J. Plant Physiol. 165 (2008) 932–941. [4] E.K. Bae, H. Lee, J.S. Lee, E.W. Noh, Isolation and characterization of osmotic stressinduced genes in poplar cells by suppression subtractive hybridization and cDNA microarray analysis, Plant Physiol. Biochem. PPB/Soc. Fr. Physiol. Veg. 48 (2010) 136–141. [5] E.K. Bae, H. Lee, J.S. Lee, E.W. Noh, Differential expression of a poplar SK2-type dehydrin gene in response to various stresses, BMB Rep. 42 (2009) 439–443. [6] J. Chen, X. Xia, W. Yin, A poplar DRE-binding protein gene, PeDREB2L, is involved in regulation of defense response against abiotic stress, Gene 483 (2011) 36–42. [7] E.T. Hamanishi, M.M. Campbell, Genome-wide responses to drought in forest trees, Forestry 84 (2011) 273–293. [8] D. Cohen, M.B. Bogeat-Triboulot, E. Tisserant, S. Balzergue, M.L. Martin-Magniette, G. Lelandais, N. Ningre, J.P. Renou, J.P. Tamby, D. Le Thiec, I. Hummel, Comparative transcriptomics of drought responses in Populus: a meta-analysis of genome-wide expression profiling in mature leaves and root apices across two genotypes, BMC Genomics 11 (2010) 630.

Y. Dong et al. / Genomics 104 (2014) 295–305 [9] F. Yang, Y. Wang, L.F. Miao, Comparative physiological and proteomic responses to drought stress in two poplar species originating from different altitudes, Physiol. Plant. 139 (2010) 388–400. [10] O. Gailing, B. Vornam, L. Leinemann, R. Finkeldey, Genetic and genomic approaches to assess adaptive genetic variation in plants: forest trees as a model, Physiol. Plant. 137 (2009) 509–519. [11] J. Ingram, D. Bartels, The molecular basis of dehydration tolerance in plants, Annu. Rev. Plant Physiol. Plant Mol. Biol. 47 (1996) 377–403. [12] K. Shinozaki, K. Yamaguchi-Shinozaki, Gene networks involved in drought stress response and tolerance, J. Exp. Bot. 58 (2007) 221–227. [13] M.J. Oliver, J.C. Cushman, K.L. Koster, Dehydration tolerance in plants, Methods Mol. Biol. 639 (2010) 3–24. [14] W. Wang, B. Vinocur, O. Shoseyov, A. Altman, Role of plant heat-shock proteins and molecular chaperones in the abiotic stress response, Trends Plant Sci. 9 (2004) 244–252. [15] D. Wanke, H.U. Kolukisaoglu, An update on the ABCC transporter family in plants: many genes, many proteins, but how many functions? Plant Biol. 12 (Suppl. 1) (2010) 15–25. [16] A.N. Olsen, H.A. Ernst, L.L. Leggio, K. Skriver, NAC transcription factors: structurally distinct, functionally diverse, Trends Plant Sci. 10 (2005) 79–87. [17] J. Chen, X. Xia, W. Yin, Expression profiling and functional characterization of a DREB2-type gene from Populus euphratica, Biochem. Biophys. Res. Commun. 378 (2009) 483–487. [18] A.D. Krikorian, Paulownia in China: cultivation and utilization, Asian Network for Bio Sci & Inter Devel Res Centre, Canada, 1986, p. 65. [19] A. Lyons, Paulownia, in: D. Race (Ed.), Agroforestry: Trees for Productive Farming, Agmedia, Victoria, 1993, pp. 149–154. [20] S. Newman, K. Bennett, Y. Wu, Performance of maize, beans and ginger as intercrops in Paulownia plantations in China, Agrofor. Syst. 39 (1998) 23–30. [21] X.S. Zhang, X.Q. Zhang, M.J. Deng, Y.P. Dong, Zh.L. Zhao, G.Q. Fan, Comparative studies on physiological responses of diploid Paulownia and its tetraploid to drought stress, J. Henan Agric. Sci. 42 (2013) 118–123. [22] S. Tarazona, F. Garcıa-Alcalde, J. Dopazo, A. Ferrer, A. Conesa, Differential expression in RNA-seq: a matter of depth, Genome Res. 21 (2012) 2213–2223. [23] X. Yan, C. Dong, J. Yu, W. Liu, C. Jiang, J. Liu, Q. Hu, X. Fang, W. Wei, Transcriptome profile analysis of young floral buds of fertile and sterile plants from the selfpollinated offspring of the hybrid between novel restorer line NR1 and Nsa CMS line in Brassica napus, BMC Genomics 14 (2013) 26. [24] R. Hou, Z. Bao, S. Wang, H. Su, Y. Li, H. Du, J. Hu, S. Wang, X. Hu, Transcriptome sequencing and de novo analysis for Yesso scallop (Patinopecten yessoensis) using 454 GS FLX, PLoS ONE 6 (2011) e21560. [25] G. Taramino, M. Sauer, J.L. Stauffer Jr., D. Multani, X. Niu, H. Sakai, F. Hochholdinger, The maize (Zea mays L.) RTCS gene encodes a LOB domain protein that is a key regulator of embryonic seminal and post-embryonic shoot-borne root initiation, Plant J. Cell Mol. Biol. 50 (2007) 649–659. [26] Y. Okushima, H. Fukaki, M. Onoda, A. Theologis, M. Tasaka, ARF7 and ARF19 regulate lateral root formation via direct activation of LBD/ASL genes in Arabidopsis, Plant Cell 19 (2007) 118–130. [27] M.S. Hanafy, A. El-Banna, H.M. Schumacher, H.J. Jacobsen, F.S. Hassan, Enhanced tolerance to drought and salt stresses in transgenic faba bean (Vicia faba L.) plants by heterologous expression of the PR10a gene from potato, Plant Cell Rep. 32 (2013) 663–674. [28] P. Rampino, S. Pataleo, C. Gerardi, G. Mita, C. Perrotta, Drought stress response in wheat: physiological and molecular analysis of resistant and sensitive genotypes, Plant Cell Environ. 29 (2006) 2143–2152. [29] R. Sanchez-Fernandez, M. Fricker, L.B. Corben, N.S. White, N. Sheard, C.J. Leaver, M. Van Montagu, D. Inze, M.J. May, Cell proliferation and hair tip growth in the Arabidopsis root are under mechanistically different forms of redox control, Proc. Natl. Acad. Sci. U. S. A. 94 (1997) 2745–2750.

305

[30] P.H. Yancey, Organic osmolytes as compatible, metabolic and counteracting cytoprotectants in high osmolarity and other stresses, J. Exp. Biol. 208 (2005) 2819–2830. [31] D.J. Wohlbach, B.F. Quirino, M.R. Sussman, Analysis of the Arabidopsis histidine kinase ATHK1 reveals a connection between vegetative osmotic stress sensing and seed maturation, Plant Cell 20 (2008) 1101–1117. [32] H.J. Bohnert, D.E. Nelson, R.G. Jensen, Adaptations to environmental stresses, Plant Cell 7 (1995) 1099–1111. [33] G. Lu, C. Gao, X. Zheng, B. Han, Identification of OsbZIP72 as a positive regulator of ABA response and drought tolerance in rice, Planta 229 (2009) 605–615. [34] D.J. Cosgrove, Loosening of plant cell walls by expansins, Nature 407 (2000) 321–326. [35] D. Bartels, R. Sunkar, Drought and salt tolerance in plants, Crit. Rev. Plant Sci. 24 (2005) 23–58. [36] A. Christmann, D. Moes, A. Himmelbach, Y. Yang, Y. Tang, E. Grill, Integration of abscisic acid signalling into plant responses, Plant Biol. 8 (2006) 314 (Stuttgart, Germany). [37] T. Hirayama, K. Shinozaki, Perception and transduction of abscisic acid signals: keys to the function of the versatile plant hormone ABA, Trends Plant Sci. 12 (2007) 343–351. [38] C.N. Lu PL, R. An, Z. Su, B.S. Qi, F. Ren, J. Chen, X.C. Wang, A novel droughtinducible gene, ATAF1, encodes a NAC family protein that negatively regulates the expression of stress-responsive genes in Arabidopsis, Plant Mol. Biol. 63 (2007) 289–305. [39] A. Christmann, E.W. Weiler, E. Steudle, E. Grill, A hydraulic signal in root-to-shoot signalling of water shortage, Plant J. Cell Mol. Biol. 52 (2007) 167–174. [40] M. Okamoto, Y. Tanaka, S.R. Abrams, Y. Kamiya, M. Seki, E. Nambara, High humidity induces abscisic acid 8′-hydroxylase in stomata and vasculature to regulate local and systemic abscisic acid responses in Arabidopsis, Plant Physiol. 149 (2009) 825–834. [41] J. Kang, J.U. Hwang, M. Lee, Y.Y. Kim, S.M. Assmann, E. Martinoia, Y. Lee, PDR-type ABC transporter mediates cellular uptake of the phytohormone abscisic acid, Proc. Natl. Acad. Sci. U. S. A. 107 (2010) 2355–2360. [42] T. Kuromori, T. Miyaji, H. Yabuuchi, H. Shimizu, E. Sugimoto, A. Kamiya, Y. Moriyama, K. Shinozaki, ABC transporter AtABCG25 is involved in abscisic acid transport and responses, Proc. Natl. Acad. Sci. U. S. A. 107 (2010) 2361–2366. [43] S.D. Bao, Agricultural Soil Analysis, China Agriculture Press, Beijing, 2000. [44] Y.Y. An, Z.C. Liang, R.L. Han, Water use characteristics and drought adaption of three native shrubs in the loess plateau, Sci. Silvae Sin. 47 (2011) 8–15. [45] H. Li, Principle and Technology of Plant Physiology and Biochemistry, Higher Education Press, Beijing, 2000. [46] M.G. Grabherr, B.J. Haas, M. Yassour, J.Z. Levin, D.A. Thompson, I. Amit, X. Adiconis, L. Fan, R. Raychowdhury, Q. Zeng, Z. Chen, E. Mauceli, N. Hacohen, A. Gnirke, N. Rhind, F. di Palma, B.W. Birren, C. Nusbaum, K. Lindblad-Toh, N. Friedman, A. Regev, Fulllength transcriptome assembly from RNA-Seq data without a reference genome, Nat. Biotechnol. 29 (2011) 644–652. [47] C. Iseli, C.V. Jongeneel, P. Bucher, ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences, Proceedings/… International Conference on Intelligent Systems for Molecular Biology; ISMB. International Conference on Intelligent Systems for Molecular Biology, 1999, pp. 138–148. [48] A. Conesa, S. Gotz, J.M. Garcia-Gomez, J. Terol, M. Talon, M. Robles, Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research, Bioinformatics 21 (2005) 3674–3676. [49] J. Ye, L. Fang, H. Zheng, Y. Zhang, J. Chen, Z. Zhang, J. Wang, S. Li, R. Li, L. Bolund, J. Wang, WEGO: a web tool for plotting GO annotations, Nucleic Acids Res. 34 (2006) W293–W297. [50] A. Mortazavi, B.A. Williams, K. McCue, L. Schaeffer, B. Wold, Mapping and quantifying mammalian transcriptomes by RNA-Seq, Nat. Methods 5 (2008) 621–628.