Transcriptome analysis of the plateau fish (Triplophysa dalaica): Implications for adaptation to hypoxia in fishes Ying Wang, Liandong Yang, Bo Wu, Zhaobin Song, Shunping He PII: DOI: Reference:
S0378-1119(15)00432-1 doi: 10.1016/j.gene.2015.04.023 GENE 40429
To appear in:
Gene
Received date: Revised date: Accepted date:
15 December 2014 2 April 2015 7 April 2015
Please cite this article as: Wang, Ying, Yang, Liandong, Wu, Bo, Song, Zhaobin, He, Shunping, Transcriptome analysis of the plateau fish (Triplophysa dalaica): Implications for adaptation to hypoxia in fishes, Gene (2015), doi: 10.1016/j.gene.2015.04.023
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT Transcriptome analysis of the plateau fish (Triplophysa dalaica): implications for adaptation to hypoxia in fishes
The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy
SC R
1
IP
T
Ying Wang1, 2, Liandong Yang1, 2, Bo Wu3, Zhaobin Song3, Shunping He1*
of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, PR China
University of Chinese Academy of Sciences, Beijing 100049, PR China
3
Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College
MA
NU
2
TE
D
of Life Sciences, Sichuan University, Chengdu 610065, China
CE P
*Correspondence: S.P. He. The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese
AC
Academy of Sciences, Wuhan 430072, PR China E-mail addresses:
[email protected]
1
ACCEPTED MANUSCRIPT Abstract Triplophysa dalaica, endemic species of Qinghai-Tibetan Plateau, is informative for
IP
T
understanding the genetic basis of adaptation to hypoxic conditions of high altitude
SC R
habitats. Here, a comprehensive gene repertoire for this plateau fish was generated using the Illumina deep paired-end high-throughput sequencing technology. De novo assembly yielded 145, 256 unigenes with an average length of 1632 bp. Blast searches
NU
against GenBank non-redundant database annotated 74,594 (51.4%) unigenes
MA
encoding for 30,047 gene descriptions in T. dalaica. Functional annotation and classification of assembled sequences were performed using Gene Ontology (GO),
and
Genomes
(KEGG) analysis. After comparison with other fish
TE
Genes
D
clusters of euKaryotic Orthologous Groups (KOG) and Kyoto Encyclopedia of
CE P
transcriptomes, including silver carp (Hypophthalmichthys molitrix) and mud loach (Misgurnus anguillicaudatus), 2621 high-quality orthologous gene alignments were
AC
constructed among these species. 61 (2.3%) of the genes were identified as having undergone positive selection in the T. dalaica lineage. Within the positively selected genes, 13 genes were involved in hypoxia response, of which 11 were listed in HypoxiaDB. Furthermore, duplicated hif-α (hif-1αA/B and hif-2αA/B), EGLN1 and PPARA candidate genes involved in adaptation to hypoxia were identified in T. dalaica transcriptome. Branch-site model in PAML validated that hif-1αB and hif-2αA genes have undergone positive selection in T.dalaica. Finally, 37,501 simple sequence repeats (SSRs) and 19,497 high-quality single nucleotide polymorphisms (SNPs) were identified in T. dalaica. The identified SSR and SNP markers will 2
ACCEPTED MANUSCRIPT facilitate the genetic structure, population geography and ecological studies of
IP
T
Triplophysa fishes.
SC R
Keywords: Triplophysa dalaica, Transcriptome, Hypoxic adaptation, Positive
NU
selection, Branch-site model
MA
1. Introduction
The genus Triplophysa is one member of the subfamily Nemacheilinae
D
(Cypriniformes: Balitoridae) (Nelson, 2006), which is a strongly diverged fish group,
TE
with 137 valid species reported in the Fishbase (Froese and Pauly, 2014). Triplophysa
CE P
fishes, also known as Plateau-dwelling loaches, are mainly distributed at high altitude rivers and lakes on the Qinghai-Tibetan Plateau (QTP) and its peripheral regions (Zhu,
AC
1989; Prokofiev, 2001; He et al., 2011). The elevation in this area ranges from 700 to 5000 m, and the high altitudes are accompanied by hypoxia and low temperatures (Wu YF, 1992). As representatively endemic fishes of the QTP, Triplophysa dalaica(Kessler, 1876) (Cypriniformes: Balitoridae: Nemacheilinae) is adapted to the cold and hypoxic conditions of high-altitude habitats. Recent studies of adaptation to high-altitude hypoxia mainly focused on endothermic vertebrates, including Ground tit, yak, Tibetan Mastiffs, Tibetan wild boar, and Tibetans (Qiu et al., 2012a; Cai et al., 2013; Li et al., 2013b; Qu et al., 2013; Li
et
al.,
2014).
EGLN1
(1egl
9 3
homolog
1),
PPARA
(peroxisome
ACCEPTED MANUSCRIPT proliferator-activated receptor-a genes) and EPAS1, also known as HIF2α (endothelial PAS domain 1) were significantly associated with hypoxia adaptation (Beall et al.,
IP
T
2010; Bigham et al., 2010; Simonson et al., 2010; Storz, 2010; Yi et al., 2010).
SC R
However, few studies have been performed on poikilothermic species. By comparing the two transcriptomes of Rana chensinensis and R. kukunoris which inhabit respective low- and high-elevation habitats, candidate genes were identified that may
NU
be involved in high-elevation adaption in poikilothermic species (Yang et al., 2012).
MA
Guan et al. (2014) concluded HIF-1αB may be the most important regulator in the adaptation of schizothoracine fish to the extreme environment of the Tibetan Plateau.
D
In fact, as an important transcriptional factor, hypoxia-inducible factor (HIF) has
TE
evolved specialized roles in both low temperature acclimation and hypoxia response
CE P
(Rissanen et al., 2006; Kaelin and Ratcliffe, 2008). For example, a study on a poikilothermic vertebrate, crucian carp (Carassius carassius) revealed that
AC
hypoxia-inducible factor-1 (HIF-1) could have a role in low temperature acclimation (Rissanen et al., 2006). Generally speaking, HIF is mainly regulated at the posttranscriptional level (Kallio et al., 1997), however, transcriptional control may be central for the regulation of HIFs in fishes. For example, in grass carp (Ctenopharyngodon idella), two distinct HIF-α isoforms (gcHIF-1alpha and gcHIF-4alpha) were differentially regulated at the transcriptional level in response to hypoxic stress in different organs (Law et al., 2006). Similar studies were performed in Atlantic croaker (Micropogonias undulatus), Chinese sucker (Myxocyprinus asiaticus) and Wuchang bream (Megalobrama amblycephala) and also showed that 4
ACCEPTED MANUSCRIPT HIF-α isoforms were regulated in hypoxic conditions at the transcriptional level and potentially useful molecular indicators of environmental hypoxia exposure (Chan et
IP
T
al., 2005; Rahman and Thomas, 2007; Chen et al., 2012). Nevertheless, the
SC R
contribution of HIF and the genetics basis to high-altitude adaptation in Triplophysa fishes were yet to be investigated. On the other hand, as an iconic symbol species of Qinghai-Tibetan Plateau, only a few genetic markers of Triplophysa fishes are
NU
available including partially mitochondrial DNA fragments, few nuclear genes and
MA
microsatellites (Tang et al., 2006; Hou et al., 2012; Liu et al., 2012; Wang et al., 2012; Yao et al., 2012; Li et al., 2013a; Tang et al., 2013). The limited genetic resources,
D
especially the scarcity of HIF information in Triplophysa fishes have greatly hindered
TE
understanding the genetic basis of adaptation to high-altitude environment.
CE P
With respect to non-model organisms, transcriptome sequencing is an effective way to obtain large amounts of coding sequences and identify large numbers of
AC
molecular makers (Wang et al., 2009; Sánchez et al., 2011; Wang et al., 2011; Fu and He, 2012; Ramayo-Caldas et al., 2012). Therefore, we attempt to reveal the genetic basis about how Triplophysa fishes cope with high-altitude environment using transcriptome data. It has been proved that de novo assembly of high-throughput sequences from single tissue usually results in relative short sequences in some animal models, including rainbow trout (Oncorhynchus mykiss) transcriptome researches (Salem et al., 2010; Le Cam et al., 2012). In this study, we adopted paired-end Illumina deep sequencing to characterize the T. dalaica transcriptome from five mixed tissues (heart, 5
ACCEPTED MANUSCRIPT liver, brain, spleen and kidney). The primary goal of this study was to generate a large amount of T. dalaica transcriptomic reads from multiple tissues, and provide a wealth
IP
T
of sequence data to serve as a reference transcriptome for future studies. The second
SC R
goal is to detect candidate genes that may be involved in adaptation to high altitude hypoxia. The last one is to gather and apply a large number of genetic markers (microsatellites and SNPs) to the study of the genetic diversity, geographical
D
2. Materials and methods
MA
NU
distribution, expansion and origins of plateau-dwelling loaches.
TE
2.1. Ethics statement
This study was approved by the ethics committee of Institute of Hydrobiology,
CE P
Chinese Academy of Sciences.
AC
2.2. Sample collection, cDNA library construction and Illumina deep sequencing An adult male T. dalaica was collected in upper reaches of the Yellow River in Ruoergai County of Sichuan Province. The fish was netted directly into an ice water bath until no movement was dissected (<30 seconds). Total RNA from five tissues (heart,
liver,
brain,
spleen
and
kidney)
was
extracted
using
SV
Total RNA Isolation System (Premega). Equal quantities of RNA from each tissue were pooled to generate one mixed sample. Then, the mRNA was fragmented using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer 6
ACCEPTED MANUSCRIPT primer and M-MuLV Reverse Transcriptase (RNase H-). Second strand cDNA was subsequently synthesized. Sequencing library was generated using TruSeq SR Cluster
IP
T
Kit v3-cBot-HS (Illumina) following manufacturer’s recommendations. In the final
SC R
step, the library preparations were sequenced on an Illumina Hiseq 2000 platform and 100 bp paired-end reads were generated. The sequencing reads were submitted through the NCBI SRA and can be accessible via NCBI BioProject accession
MA
NU
SRP050972.
2.3. Data filtration, de novo assembly and assembly assessment
D
The quality of raw reads was checked by FastQC firstly. Then, the raw reads were
reads
(
using
the
Cutadapt
application
CE P
quality
TE
cleaned by removing reads containing adapter, reads containing ploy-N and low (version
1.6;
http://code.google.com/p/cutadapt/)andTrimGalore software (Barbraham,Boinformati De
novo
AC
cs) (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/).
assembly of clean reads was performed using Trinity software with default parameters (Grabherr et al., 2011). Then CD-HIT-EST was used to remove redundant transcripts with sequence identity threshold of 0.95 and word length
of 10 (Fu et al.,
2012), resulting in a non-redundant transcripts which were considered as unigenes for the following analyses and stored in Fasta format. To assess the quality of our assembled unigenes, we made comparisons with other fish
transcriptomes
sequenced
within
Cypriniformes,
Carassius
auratus,
Megalobrama amblycephala, Hypophthalmichthys molitrix, Elopichthys bambusa and 7
ACCEPTED MANUSCRIPT Cyprinus carpio (Fu and He, 2012; Gao et al., 2012; Ji et al., 2012; Liao et al., 2013; Long et al., 2013; Zou et al., 2014). Furthermore, cDNA template was prepared using
IP
T
high-quality RNA with a first-strand DNA synthesis kit (Promega). Reverse
SC R
transcription polymerase chain reaction (RT-PCR) assay was used to validate the quality of assembled unigenes. Ten pairs of primers for 10 T. dalaica unigenes with different expression levels (FPKM, ranged from 306 to 2218) were designed. The
NU
primer sequences, amplicon length and sequence description are displayed in
MA
Supplementary Table S1.
D
2.4. Sequence annotation
TE
The unigenes were first annotated by searching sequence homologies against
CE P
SwissProt database (released on March 19, 2014 at http://www.uniprot.org/downloads) and the National Center for Biotechnology Information (NCBI) nonredundant protein
AC
(Nr) database (released on May 3, 2014 at http://www.ncbi.nlm.nih.gov) using blastx (Altschul et al., 1997) with a cutoff E-value set at 1e-5. The results of blastx searches against the Nr database were imported into Blast2GO (Götz et al., 2008) for GO term mapping. The results of Blast2GO analysis were classified into three main GO categories, biological processes, molecular functions and cellular components using WEGO software (Ye et al., 2006)(http://wego.genomics.org.cn/cgi-bin/wego/index.pl). Additionally, KOG database (ftp://ftp.ncbi.nih.gov/pub/COG/KOG) was also used to predict and classify functions. The overview of metabolic pathway analysis was conducted
by
the
online
KEGG
Automatic 8
Annotation
Server
(KAAS)
ACCEPTED MANUSCRIPT (http://www.genome.jp/tools/kaas/) (Moriya et al., 2007). The Bi-directional Best Hit
SC R
2.5 Orthologous gene identification and alignment
IP
T
(BBH) method was used to obtain KEGG Orthology (KO) assignment.
We downloaded the transcriptome data of silver carp (H. molitrix) (Fu and He, 2012) and mud loach (M. anguillicaudatus) (Long et al., 2013) from the National Center for
NU
Biotechnology Information (http://www.ncbi.nih.gov/). The quality control and
MA
assembly were performed same as the T. dalaica data. Putative orthologues among these three species were obtained using a reciprocal best hit with blast matches with
D
zebrafish transcriptome as a reference. EMBOSS-GETORF was employed to predict
TE
the open reading frame (ORF) for each orthologous group (Rice et al., 2000). Putative
CE P
orthologues were aligned at the codon level with the option “-codon” using the Prank program (Loytynoja and Goldman, 2005). After alignments were generated, the
AC
Gblocks program was used to remove poorly aligned and potentially unreliable positions with the default parameters (the sequence type being codon “-t=c”) (Castresana, 2000). Additionally, to eliminate the effect of uncertain bases on the test of positive selection, all positions that had gaps (“-”) and “N” from the alignments were deleted. After the above trimming process, if the remaining alignment was shorter than 90bp (30 codons), then the entire alignment was excluded.
2.6. Detection of natural selection For each of the remaining orthologs, CODEML’s branch-site model (model=2, 9
ACCEPTED MANUSCRIPT Nsites=2) in the PAML package (version 4.7) was used to detect positive selection (Yang and Nielsen, 2002; Zhang et al., 2005; Yang, 2007). The T. dalaica lineage was
IP
T
set as the foreground branch and the other lineages were defined as background
SC R
branches. We compared a neutral model that constrained this additional class of sites to have dN/dS = 1 (ModelA1, fix_omega = 1, omega = 1) with a selection model that allow a class of codons on the foreground branch to have dN/dS>1 (ModelA2,
NU
fix_omega = 0, omega =1.5). A likelihood ratio test (LRT) with a χ2 approximation
MA
was conducted to obtain the P-values (Zhang et al., 2005). Here, only if the P-value was less than 0.01, the orthologous genes were inferred to be positively selected genes
D
(PSGs) (Sun et al., 2013). After the PSGs were identified, functional enrichment
TE
analysis was performed using DAVID (Huang da et al., 2009). Additionally, the
CE P
functional annotated information was assigned to the PSGs according to zebrafish genome resources. Then, we compared our PSGs with the complete list of proteins in
AC
HypoxiaDB (A database of hypoxia regulated proteins) which is a first ever manually-curated non-redundant catalogue of human hypoxia-regulated proteins (Khurana et al., 2013). Because the Mud Loach transcriptome only cover one tissue (skin), the above genome-wide screening positively selected genes method may have resulted in elimination of candidate genes involved in high-altitude hypoxia. Therefore, we chose some known genes involved in high altitude hypoxia to perform further analysis. Duplicated hif-α (hif-1αA, hif-1αB, hif-2αA, and hif-2αB), EGLN1 (1egl 9 homolog 1) and PPARA (peroxisome proliferator-activated receptor-a genes) in cyprinids 10
ACCEPTED MANUSCRIPT (Rytkonen et al., 2013; Guan et al., 2014) were downloaded from the National Center for
Biotechnology
Information
(NCBI)
and
Ensembl
version
76
IP
T
(http://www.ensembl.org/). Each of zebrafish genes was used as query to search
SC R
against the above corresponding genes obtained from the T. dalaica Swissprot annotations, respectively. Only the reciprocal BLAST best-hit result was chosen. The accession numbers of candidate genes used in this study were listed in Supplementary
NU
Table S2. The alignment of sequences was based on notional translations of
MA
nucleotide sequences using MEGA 4.0 (Tamura et al., 2007). Additionally, the optimum HKY+G model was adopted for the phylogeny detected by the Akaike
D
Information Criterion (AIC) in Modeltest 3.7 (Posada and Crandall, 1998). RAxML
TE
with 1,000 nonparametric bootstrap replicates were used to construct a phylogenetic
CE P
tree (Stamatakis, 2006). The above analysis and statistic method was performed to determine whether the above candidate genes have undergone statistically significant
AC
differences in selection pressures (Yang, 2007) setting T. dalaica as the foreground branch for each time.
2.7. Microsatellite and SNP marker discovery A microsatellite identification tool (MISA) with default parameters was used to search for
microsatellite
loci
within
all
the
unigenes
(Thiel
et
al.,
2003)
(http://pgrc.ipk-gatersleben.de/misa/misa.html). The repeat thresholds for mono- di-, tri-, tetra-, penta-, hexa-nucleotide motifs were set as 10, 6, 5, 5, 5 and 5, respectively. Unigenes contained microsatellites at locations between 50-bp flanking regions were
11
ACCEPTED MANUSCRIPT considered sufficient for primer design (Rozen and Skaletsky, 1999). Additionally, single nucleotide polymorphism (SNP) sites were identified using SNP detection
IP
T
software SAMtools (Li et al., 2009) after mapping all clean reads to the assembled
SC R
unigenes with BWA (Li and Durbin, 2009). Filter threshold was set as a minimal quality (-Q) and coverage (-d) of 25. The maximum read depth (-D) was set at 100. Indels were not considered because alternative splicing impedes reliable indel
MA
NU
discovery.
D
3. Results
TE
3.1. Illumina paired-end sequencing and de novo assembly
CE P
High-throughput sequencing was performed using Illumina Hiseq 2000 platform, which generated approximately 43.44 million 100 bp paired-end reads for T. dalaica.
AC
After trimming and filtering the raw reads with quality score threshold (Q>20), approximately 42.14 million clean 100 bp paired-end reads were retained. The workflow of Illumina deep sequencing and bioinformatics analysis were described in the Supplementary Fig.S1. Finally, a total of 145,256 unigenes (ranging from 200 to 23,847 bp) were generated for T. dalaica. The detailed de novo assembly results were summarized in Table 1 and the length distribution of all the unigenes was shown in Figure 1.
12
ACCEPTED MANUSCRIPT 3.2. Assembly assessment Comparisons to other fish transcriptomes sequenced within Cypriniformes showed
IP
T
that the T. dalaica transcriptome had the largest transcript number, the longest N50
SC R
length and the most transcripts being annotated, suggesting the quality of our assembly was high (Table 2).
In addition to above computing method, reverse transcription polymerase chain
NU
reaction (RT-PCR) assay was further used to validate the quality of assembled
MA
unigenes. Ten pairs of primers for 10 randomly chosen unigenes of T. dalaica with different expression levels were amplified successfully and accurately. RT-PCR
TE
D
results are shown in Fig.S2.
CE P
3.3. Sequence annotation
Among the 145,256 assembled unigenes, 66,599 (45.85%) and 74,594 (51.35%)
AC
unigenes had blast hits to the Swiss-prot database and Nr database, respectively. A blastx top-hit species distribution of gene annotations from Nr database for T. dalaica showed highest homology to zebrafish (Danio rerio), followed by Nile tilapia (Oreochromis niloticus) and Spotted gar (Lepisosteus oculatus) (Figure 2).
Annotation of the 145,256 assembled unigenes using clusters of predicted orthologs (Eukaryotic Ortholog Groups,KOGs) yielded good results for 65,212 (44.89%) in T. dalaica. The KOG-annotated putative proteins were classified into 26 groups, among which the cluster for “General function prediction only” is the largest group (10,484, 20% of the matched unigenes), followed by ‘Signal transduction 13
ACCEPTED MANUSCRIPT mechanisms’ (9,271, 18%) for T. dalaica. The detailed information about the KOG annotations of putative proteins for T. dalaica is depicted in Figure 3.
IP
T
The KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway is a
SC R
knowledgeable database for systematic analysis of gene functions in terms of the networks of genes and molecules, which is useful for understanding gene functions, metabolic pathways and complicated biological behaviors (Ogata et al., 1999;
NU
Kanehisa and Goto, 2000). A total of 17,718 (12.2%,) assembled unigenes were found
MA
to be assigned to 348 KEGG pathways in T. dalaica (Supplementary Table S3 ), and the number of unigenes in different pathways ranged from 1 to 992.
The metabolic
D
pathways (KO01100) were the most abundant, followed by Biosynthesis of secondary
TE
metabolites (KO01110) and Pathways in cancer (KO05200). Considering the
CE P
importance of animal oxygen tolerance, we found 41 unigenes involved with glycolysis/gluconeogenesis pathway in T. dalaica. Additionally, a total of 79 unigenes
4.
AC
were mapped to HIF-1 signaling pathway (KO04066), which was depicted in Figure
BLAST result accessions from Nr database were used to retrieve associated gene names and Gene ontology (GO) terms. Among the 74,594 annotated unigenes of T. dalaica, 30,225 had the GO annotations assigned to 52 level-2 GO terms. The GO annotations were summarized under three main GO categories (Figure 5). GO analysis showed that 23,208, 17,406 and 24,161 sequences were found to be involved in biological process, cellular component and molecular function categories, respectively. The top GO term in the biological process category was “cellular 14
ACCEPTED MANUSCRIPT process,” and the GO terms “biological regulation”, “metabolic process”, and “multicellular organismal process” were also well represented. In the cellular
IP
T
component, the GO terms “cell”, “cell part”, “organelle”, “organelle part” and
SC R
“extracelluar region part” were overrepresented. For the molecular function category, the GO terms “binding” and “catalytic” were the most abundant.
NU
3.4. Candidate genes related to high-altitude hypoxia
MA
Using the RBH method, we identified 3803 orthologous genes among the three species. After alignment and trimming for quality control (see Materials and Methods),
D
2621 single copy orthologous genes were used to test for positive selection. We used
TE
branch-site model to identify genes that evolved under positive selection using the T.
CE P
dalaica as the foreground branch, and 61 genes were subjected to positive selection (P<0.01) (Supplementary TableS4). Function enrichment analysis for these PSGs
AC
showed that some categories were enriched, including “anchoring junction” (P = 0.045), “adherens junction” (P = 0.045), and “collagen” (P = 0.067). Previous study reported endothelial adherens junctions play important roles in vascular permeability and angiogenesis (Wallez and Huber, 2008). Therefore, the adherens junction category may be related to the hypoxia response. On the other hand, 11 of 61 genes were reported in HypoxiaDB (Table 3), which is cct5, ptp4a1, ctnna1, slc33a1, colec12, qrsl1, dido1, med23, CTBP1, phtf2 and vldlr (Khurana et al., 2013). ubfd1and SIK1 were sensitive to the chronic hypoxia (Mense et al., 2006; Mosqueira et al., 2012). Additionally, based on ML phylogenetic tree from individual candidate 15
ACCEPTED MANUSCRIPT genes (hif-1αA, hif-1αB, hif-2αA, hif-2αB, EGLN1 and PPARA), standard branch-site models in PAML were used to assess the different pressures experienced.
IP
T
T. dalaica was specified as the foreground branch in the above analysis. The
SC R
likelihood ratio test (LRT) of branch-site models showed that hif-1αB and hif-2αA probably subject to positive selection (Table 4). According to the LRT of branch-site models, for hif-1αB, the likelihood obtained under model A2 was significantly higher
NU
than that obtained under the neutral model A1 (2ΔlnL= 5.26, p< 0.05, df=1) and ω>1
MA
(ω= 40.32). For hif-2αA, the likelihood obtained under model A2 was also significantly higher than that obtained under the neutral model A1 (2ΔlnL= 17.416,
D
p< 0.01, df=1) and w >1 (ω= 25.51). The ML phylogenetic trees of hif-1αB and
CE P
TE
hif-2αA were depicted in Figure 6, respectively.
3.5. SSR/SNP marker identification
AC
Using the MISA program, 37, 501 SSR markers were identified within T. dalaica unigenes, including mono-, di-, tri-, tetra-, pentra- and hexanucleotides. SSR markers were classified into perfect SSR markers and compound SSR markers. Among the SSRs identified, 1884 compound SSR markers were possessed by T.dalaica. With respect to T. dalaica microsatellites, dinucleotide repeats were found to be the richest (20,066, 53.5%) of all SSRs, followed by the mono- (12,429, 33.1%), tri- (4255, 11.3%), tetra- (724, 1.9%), pentra- (15, 0.3%) pentranucleotide repeats, and only 9 hexanucleotide repeats was identified (Table 5). Among SSRs of T.dalaica detected, 22,428 unigenes contained microsatellites at locations between 50-bp flanking regions 16
ACCEPTED MANUSCRIPT which are expected to be useful for genetic linkage mapping and other genetic studies. Furthermore, under stringent criteria using BWA and SAMtools, 19,497 putative
IP
T
high-quality SNPs were detected for T. dalaica. SNPs of T. dalaica contained 10,586
SC R
transitions and 8911 transversions, and the Ts: Tv ratio was 1.19:1. Among all the SNPs, GC/CG types were the smallest SNP types while the AG/GA, CT/TC and
NU
AT/TA types were the most abundant (Figure 7).
MA
4. Discussion
This is the first report of T. dalaica transcriptome data. Compared with other fish
D
transcriptomes sequenced within Cypriniformes (Fu and He, 2012; Gao et al., 2012; Ji
TE
et al., 2012; Liao et al., 2013; Long et al., 2013; Zou et al., 2014), the de novo
CE P
assembly and annotations of T. dalaica transcriptome were best (Table 2). In terms of the gene expression level, T. dalaica transcriptome showed the highest similarity to
AC
zebrafish, which can be explained by the fact that Triplophysa fishes and zebrafish belonging to the same order Cypriniformes. Among the pathways, the greatest number of assembled unigenes was found to be involved in the metabolic pathways, which is consistent with the sliver carp transcriptome analysis (Fu and He, 2012), but this pattern didn’t occur for the Crucian Carp (Liao et al., 2013). The previous study reported that glycolysis metabolism could compensate for insufficient levels of oxygen in freely diving birds and mammals (Butler and Jones, 1997). The anoxic animals emerald notothen (Trematomus bernacchii), Crucian carp (Carassius auratus), Shark (Hemiscyllium ocellatum), frog 17
ACCEPTED MANUSCRIPT (Rana pipiens) and turtle (Chrysemys picta) utilized glycolysis/gluconeogenesis pathway maintain active and responsive functions in oxygen absence (Nilsson and
IP
T
Renshaw, 2004; Gorr et al., 2010; Huth and Place, 2013; Liao et al., 2013). In this
SC R
study, 41 unigenes involved with glycolysis/ gluconeogenesis pathway in T. dalaica transcriptome. Additionally, HIF-1 signaling pathway of T. dalaica also may offer clues for the molecular adaptation involved in hypoxia tolerance, which plays a
NU
pivotal role in the response to high altitude hypoxia.
MA
HypoxiaDB is a database of hypoxia regulated proteins extracted from 73 peer-reviewed publications selected from PubMed (Khurana et al., 2013). In our study,
D
11 out of 61 positive selected genes have been deposited in HypoxiaDB, which play a
TE
critical role in hypoxia adaptation. Although the current version of HypoxiaDB only
CE P
has human proteins, 11 hypoxia-regulated proteins could be identified as the products of positive selected genes in T. dalaica transcriptome. This result could to some
AC
degree provide implications for the genetic basis for hypoxia response of T. dalaica. Among them, some genes have been reported to be involved in hypoxia. The genes, cct5, slc33a1, qrsl1, med23, and phtf2 significantly down-regulated while the genes ptp4a1, ctnna1, colec12, dido1, and vldlr significantly up-regulated induced by hypoxia, suggesting an important role in hypoxia response (Beyer et al., 2008; Khurana et al., 2013). CTBP1 (C-terminal binding protein 1) is sensitive to hypoxia-induced apoptosis through directly interacts with Ikaros (Dorman et al., 2012). In addition, ubfd1 remained down-regulated during the entire protocol of chronic hypoxia while SIK1 possess hypoxia effect on astrocytes (Mense et al., 2006; 18
ACCEPTED MANUSCRIPT Mosqueira et al., 2012). These hypoxia related genes, found to be under positive selection in T. dalaica, serve as important targets in understanding the genetic
IP
T
mechanisms of adaptation to high-altitude in fishes in future functional studies.
SC R
Some candidate genes involved in the HIF-1 signaling pathway have been shown to exhibit signatures of positive selection in Tibetans, yak, ground tit, Tibetan mastiff, and schizothoracine fish living in the Tibetan Plateau (Beall et al., 2010; Bigham et al.,
NU
2010; Storz, 2010; Qiu et al., 2012b; Qu et al., 2013; Guan et al., 2014; Li et al., 2014).
MA
EPAS1 (HIF-2α), EGLN1, and PPARA in Tibetans, ADAM17 and Arg2 in yak, SRF, TXNRD2, and WNT7B in the ground tit, EPAS1, SIRT7, PLXNA4, and MAFG in the
D
Tibetan mastiff, HIF-1αB in schizothoracine fish could be identified in T. dalaica
TE
transcriptome (Supplementary Table S5). In fact, previous study revealed that
CE P
duplicated HIF alpha paralogs were retained in cyprinids and evolved specialized roles in development and the hypoxic stress response (Rytkonen et al., 2013). In this
AC
study, duplicated hif-α (hif-1αA, hif-1αB, hif-2αA, and hif-2αB), EGLN and PPARA were first identified and then used as candidate genes to explore the molecular genetic basis of high altitude adaptation. Hif-1αB and hif-2αA genes of T. dalaica were also found to experience positive selection, which may have led to changes in function to cope with the high-altitude life. On the one hand, the results validated the conclusion that the specialized and highly specialized schizothoracine fish HIF-1αB have experienced significantly selective pressure (Guan et al., 2014). On the other hand, hif-2αA genes experienced positive selection in T. dalaica indicated that fish and mammals may have developed similar mechanisms to adapt to the extreme 19
ACCEPTED MANUSCRIPT environment of the Tibetan Plateau. However, the definite mechanism of adaptation to
IP
T
the Tibetan Plateau needs to be investigated by functional studies in the future.
SC R
Conclusions
In this study, a high-throughput, deep RNA sequencing technology was performed to characterize the T. dalaica transcriptome. Given that this work is the first to
NU
characterize the transcriptome of Triplophysa fishes with 136 valid species, it will
MA
provide valuable resources for future phylogeny and evolutionary genomic analyses and serve as an assembly reference transcriptome. The identification and validation of
D
13 genes involved in hypoxia response and hif-1αB and hif-2αA candidate genes in T.
TE
dalaica transcriptome facilitated our understanding of the genetic mechanism of
CE P
adaptation to high-altitude hypoxia.
AC
Conflict of interest
There is no conflict of interest.
Acknowledgements This work was supported by the Pilot projects (Grant No. XDB13020100) and the Major Research plan of the National Natural Science Foundation of China (Grant No. 91131014).
20
ACCEPTED MANUSCRIPT
References
T
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic
IP
acids research 25, 3389-3402.
Beall, C.M., Cavalleri, G.L., Deng, L., Elston, R.C., Gao, Y., Knight, J., Li, C., Li, J.C., Liang, Y., McCormack,
SC R
M., Montgomery, H.E., Pan, H., Robbins, P.A., Shianna, K.V., Tam, S.C., Tsering, N., Veeramah, K.R., Wang, W., Wangdui, P., Weale, M.E., Xu, Y., Xu, Z., Yang, L., Zaman, M.J., Zeng, C., Zhang, L., Zhang, X., Zhaxi, P. and Zheng, Y.T., 2010. Natural selection on EPAS1 (HIF2alpha) associated 11459-64.
NU
with low hemoglobin concentration in Tibetan highlanders. Proc Natl Acad Sci U S A 107, Beyer, S., Kristensen, M.M., Jensen, K.S., Johansen, J.V. and Staller, P., 2008. The histone demethylases JMJD1A and JMJD2B are transcriptional targets of hypoxia-inducible factor HIF. Journal of
MA
Biological Chemistry 283, 36542-52.
Bigham, A., Bauchet, M., Pinto, D., Mao, X.Y., Akey, J.M., Mei, R., Scherer, S.W., Julian, C.G., Wilson, M.J., Herraez, D.L., Brutsaert, T., Parra, E.J., Moore, L.G. and Shriver, M.D., 2010. Identifying Data. Plos Genetics 6.
D
Signatures of Natural Selection in Tibetan and Andean Populations Using Dense Genome Scan
TE
Butler, P.J. and Jones, D.R., 1997. Physiology of diving of birds and mammals. Physiol Rev 77, 837-99. Cai, Q., Qian, X., Lang, Y., Luo, Y., Xu, J., Pan, S., Hui, Y., Gou, C., Cai, Y. and Hao, M., 2013. Genome sequence of ground tit Pseudopodoces humilis and its adaptation to high altitude. Genome
CE P
biology 14, R29.
Castresana, J., 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17, 540-52. Chan, D.A., Sutphin, P.D., Yen, S.E. and Giaccia, A.J., 2005. Coordinate regulation of the
AC
oxygen-dependent degradation domains of hypoxia-inducible factor 1 alpha. Mol Cell Biol 25, 6415-26.
Chen, N., Chen, L.P., Zhang, J., Chen, C., Wei, X.L., Gul, Y., Wang, W.M. and Wang, H.L., 2012. Molecular characterization and expression analysis of three hypoxia-inducible factor alpha subunits, HIF-1alpha/2alpha/3alpha of the hypoxia-sensitive freshwater species, Chinese sucker. Gene 498, 81-90. Dorman, K., Shen, Z., Yang, C., Ezzat, S. and Asa, S.L., 2012. CtBP1 interacts with Ikaros and modulates pituitary tumor cell survival and response to hypoxia. Mol Endocrinol 26, 447-57. Froese, R. and Pauly, D., 2014. FishBase.World Wide Web electronic publication.www.fishbase.org, version (11/2014). Fu, B. and He, S., 2012. Transcriptome analysis of silver carp (Hypophthalmichthys molitrix) by paired-end RNA sequencing. DNA Res 19, 131-42. Fu, L., Niu, B., Zhu, Z., Wu, S. and Li, W., 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152. Götz, S., García-Gómez, J.M., Terol, J., Williams, T.D., Nagaraj, S.H., Nueda, M.J., Robles, M., Talón, M., Dopazo, J. and Conesa, A., 2008. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic acids research 36, 3420-3435. 21
ACCEPTED MANUSCRIPT Gao, Z., Luo, W., Liu, H., Zeng, C., Liu, X., Yi, S. and Wang, W., 2012. Transcriptome analysis and SSR/SNP markers information of the blunt snout bream (Megalobrama amblycephala). PLoS One 7, e42637. Gorr, T.A., Wichmann, D., Hu, J., Hermes-Lima, M., Welker, A.F., Terwilliger, N., Wren, J.F., Viney, M.,
T
Morris, S., Nilsson, G.E., Deten, A., Soliz, J. and Gassmann, M., 2010. Hypoxia tolerance in animals: biology and application. Physiol Biochem Zool 83, 733-52.
IP
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R. and Zeng, Q., 2011. Full-length transcriptome assembly from RNA-Seq data
SC R
without a reference genome. Nature biotechnology 29, 644-652.
Guan, L., Chi, W., Xiao, W., Chen, L. and He, S., 2014. Analysis of hypoxia-inducible factor alpha polyploidization reveals adaptation to Tibetan Plateau in the evolution of schizothoracine fish. BMC Evol Biol 14, 192.
NU
He, C., Song, Z. and Zhang, E., 2011. Triplophysa fishes in China and the status of its taxonomic studies. Sichuan J Zoo 30, 150-155.
Hou, F., Zhang, X., Zhang, X., Yue, B. and Song, Z., 2012. High intra-population genetic variability and
MA
inter-population differentiation in a plateau specialized fish, Triplophysa orientalis. Environmental biology of fishes 93, 519-530.
Huang da, W., Sherman, B.T. and Lempicki, R.A., 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44-57.
D
Huth, T.J. and Place, S.P., 2013. De novo assembly and characterization of tissue specific
TE
transcriptomes in the emerald notothen, Trematomus bernacchii. BMC Genomics 14, 805. Ji, P., Liu, G., Xu, J., Wang, X., Li, J., Zhao, Z., Zhang, X., Zhang, Y., Xu, P. and Sun, X., 2012. Characterization of common carp transcriptome: sequencing, de novo assembly, annotation
CE P
and comparative genomics. PloS one 7, e35152. Kaelin, W.G. and Ratcliffe, P.J., 2008. Oxygen sensing by metazoans: The central role of the HIF hydroxylase pathway. Molecular Cell 30, 393-402. Kallio, P.J., Pongratz, I., Gradin, K., McGuire, J. and Poellinger, L., 1997. Activation of hypoxia-inducible
AC
factor 1alpha: posttranscriptional regulation and conformational change by recruitment of the Arnt transcription factor. Proc Natl Acad Sci U S A 94, 5667-72.
Kanehisa, M. and Goto, S., 2000. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27-30.
Khurana, P., Sugadev, R., Jain, J. and Singh, S.B., 2013. HypoxiaDB: a database of hypoxia-regulated proteins. Database (Oxford) 2013, bat074. Law, S.H., Wu, R.S., Ng, P.K., Yu, R.M. and Kong, R.Y., 2006. Cloning and expression analysis of two distinct HIF-alpha isoforms--gcHIF-1alpha and gcHIF-4alpha--from the hypoxia-tolerant grass carp, Ctenopharyngodon idellus. BMC Mol Biol 7, 15. Le Cam, A., Bobe, J., Bouchez, O., Cabau, C., Kah, O., Klopp, C., Lareyre, J.-J., Le Guen, I., Lluch, J. and Montfort, J., 2012. Characterization of rainbow trout gonad, brain and gill deep cDNA repertoires using a Roche 454-Titanium sequencing approach. Gene 500, 32-39. Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754-1760. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R., 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078-2079. Li, J., Si, S., Guo, R., Wang, Y. and Song, Z., 2013a. Complete mitochondrial genome of the stone loach, 22
ACCEPTED MANUSCRIPT Triplophysa stoliczkae (Teleostei: Cypriniformes: Balitoridae). Mitochondrial DNA 24, 8-10. Li, M., Tian, S., Jin, L., Zhou, G., Li, Y., Zhang, Y., Wang, T., Yeung, C.K., Chen, L. and Ma, J., 2013b. Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars. Nature genetics 45, 1431-1438.
T
Li, Y., Wu, D.-D., Boyko, A.R., Wang, G.-D., Wu, S.-F., Irwin, D.M. and Zhang, Y.-P., 2014. Population variation revealed high-altitude adaptation of Tibetan mastiffs. Molecular biology and
IP
evolution 31, 1200-1205.
Liao, X., Cheng, L., Xu, P., Lu, G., Wachholtz, M., Sun, X. and Chen, S., 2013. Transcriptome analysis of
SC R
crucian carp (Carassius auratus), an important aquaculture and hypoxia-tolerant species. PLoS One 8, e62308.
Liu, S.-q., Mayden, R.L., Zhang, J.-b., Yu, D., Tang, Q.-y., Deng, X. and Liu, H.-z., 2012. Phylogenetic relationships of the Cobitoidea (Teleostei: Cypriniformes) inferred from mitochondrial and
NU
nuclear genes with analyses of gene evolution. Gene 508, 60-72.
Long, Y., Li, Q., Zhou, B., Song, G., Li, T. and Cui, Z., 2013. De novo assembly of mud loach (Misgurnus anguillicaudatus) skin transcriptome to identify putative genes involved in immunity and
MA
epidermal mucus secretion. PLoS One 8, e56998.
Loytynoja, A. and Goldman, N., 2005. An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci U S A 102, 10557-62. Mense, S.M., Sengupta, A., Zhou, M., Lan, C., Bentsman, G., Volsky, D.J. and Zhang, L., 2006. Gene
D
expression profiling reveals the profound upregulation of hypoxia-responsive genes in
TE
primary human astrocytes. Physiol Genomics 25, 435-49. Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C. and Kanehisa, M., 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic acids research 35, W182-W185.
CE P
Mosqueira, M., Willmann, G., Zeiger, U. and Khurana, T.S., 2012. Expression profiling reveals novel hypoxic biomarkers in peripheral blood of adult mice exposed to chronic hypoxia. PLoS One 7, e37497.
Nelson, J.S., 2006. Fishes of the World,
John Wiley & Sons.
AC
Nilsson, G.E. and Renshaw, G.M., 2004. Hypoxic survival strategies in two fishes: extreme anoxia tolerance in the North European crucian carp and natural hypoxic preconditioning in a coral-reef shark. J Exp Biol 207, 3131-9.
Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. and Kanehisa, M., 1999. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic acids research 27, 29-34. Posada, D. and Crandall, K.A., 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817-8. Prokofiev, A., 2001. Four new species of the Triplophysa stoliczkai-complex from China (Pisces: Cypriniformes: Balitoridae). Zoosystematica Rossica 10, 193-207. Qiu, Q., Zhang, G., Ma, T., Qian, W., Wang, J., Ye, Z., Cao, C., Hu, Q., Kim, J. and Larkin, D.M., 2012a. The yak genome and adaptation to life at high altitude. Nature genetics 44, 946-949. Qiu, Q., Zhang, G.J., Ma, T., Qian, W.B., Wang, J.Y., Ye, Z.Q., Cao, C.C., Hu, Q.J., Kim, J., Larkin, D.M., Auvil, L., Capitanu, B., Ma, J., Lewin, H.A., Qian, X.J., Lang, Y.S., Zhou, R., Wang, L.Z., Wang, K., Xia, J.Q., Liao, S.G., Pan, S.K., Lu, X., Hou, H.L., Wang, Y., Zang, X.T., Yin, Y., Ma, H., Zhang, J., Wang, Z.F., Zhang, Y.M., Zhang, D.W., Yonezawa, T., Hasegawa, M., Zhong, Y., Liu, W.B., Zhang, Y., Huang, Z.Y., Zhang, S.X., Long, R.J., Yang, H.M., Wang, J., Lenstra, J.A., Cooper, D.N., Wu, Y., Wang, J., Shi, P., Wang, J. and Liu, J.Q., 2012b. The yak genome and adaptation to life at high 23
ACCEPTED MANUSCRIPT altitude. Nature Genetics 44, 946-+. Qu, Y., Zhao, H., Han, N., Zhou, G., Song, G., Gao, B., Tian, S., Zhang, J., Zhang, R. and Meng, X., 2013. Ground tit genome reveals avian adaptation to living at high altitudes in the Tibetan plateau. Nature communications 4.
T
Rahman, M.S. and Thomas, P., 2007. Molecular cloning, characterization and expression of two hypoxia-inducible factor alpha subunits, HIF-1alpha and HIF-2alpha, in a hypoxia-tolerant
IP
marine teleost, Atlantic croaker (Micropogonias undulatus). Gene 396, 273-82. Ramayo-Caldas, Y., Mach, N., Esteve-Codina, A., Corominas, J., Castelló, A., Ballester, M., Estellé, J.,
SC R
Ibáñez-Escriche, N., Fernández, A.I. and Pérez-Enciso, M., 2012. Liver transcriptome profile in pigs with extreme phenotypes of intramuscular fatty acid composition. BMC genomics 13, 547. Suite. Trends Genet 16, 276-7.
NU
Rice, P., Longden, I. and Bleasby, A., 2000. EMBOSS: the European Molecular Biology Open Software Rissanen, E., Tranberg, H.K., Sollid, J., Nilsson, G.E. and Nikinmaa, M., 2006. Temperature regulates hypoxia-inducible factor-1 (HIF-1) in a poikilothermic vertebrate, crucian carp (Carassius
MA
carassius). J Exp Biol 209, 994-1003.
Rozen, S. and Skaletsky, H., 1999. Primer3 on the WWW for general users and for biologist programmers, Bioinformatics methods and protocols. Springer, pp. 365-386. Rytkonen, K.T., Akbarzadeh, A., Miandare, H.K., Kamei, H., Duan, C., Leder, E.H., Williams, T.A. and
D
Nikinmaa, M., 2013. Subfunctionalization of cyprinid hypoxia-inducible factors for roles in
TE
development and oxygen sensing. Evolution 67, 873-82. Sánchez, C.C., Weber, G.M., Gao, G., Cleveland, B.M., Yao, J. and Rexroad, C.E., 2011. Generation of a reference transcriptome for evaluating rainbow trout responses to various stressors. BMC
CE P
genomics 12, 626.
Salem, M., Rexroad, C.E., Wang, J., Thorgaard, G.H. and Yao, J., 2010. Characterization of the rainbow trout transcriptome using Sanger and 454-pyrosequencing approaches. BMC genomics 11, 564.
AC
Simonson, T.S., Yang, Y.Z., Huff, C.D., Yun, H.X., Qin, G., Witherspoon, D.J., Bai, Z.Z., Lorenzo, F.R., Xing, J.C., Jorde, L.B., Prchal, J.T. and Ge, R.L., 2010. Genetic Evidence for High-Altitude Adaptation in Tibet. Science 329, 72-75.
Stamatakis, A., 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-90. Storz, J.F., 2010. Evolution. Genes for high altitudes. Science 329, 40-1. Sun, Y.B., Zhou, W.P., Liu, H.Q., Irwin, D.M., Shen, Y.Y. and Zhang, Y.P., 2013. Genome-wide scans for candidate genes involved in the aquatic adaptation of dolphins. Genome Biol Evol 5, 130-9. Tamura, K., Dudley, J., Nei, M. and Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24, 1596-9. Tang, Q., Huang, Y., Wang, J., Huang, J., Wang, Z. and Peng, Z., 2013. The complete mitochondrial genome sequence of Triplophysa bleekeri (Teleostei, Balitoridae, Nemacheilinae). Mitochondrial DNA 24, 25-27. Tang, Q., Liu, H., Mayden, R. and Xiong, B., 2006. Comparison of evolutionary rates in the mitochondrial DNA cytochrome< i> b gene and control region and their implications for phylogeny of the Cobitoidea (Teleostei: Cypriniformes). Molecular Phylogenetics and Evolution 39, 347-357. 24
ACCEPTED MANUSCRIPT Thiel, T., Michalek, W., Varshney, R. and Graner, A., 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical and Applied Genetics 106, 411-422. Wallez, Y. and Huber, P., 2008. Endothelial adherens and tight junctions in vascular homeostasis,
T
inflammation and angiogenesis. Biochim Biophys Acta 1778, 794-809. Wang, J., Tang, Q., Wang, Z., Zhang, Y., Wu, Q. and Peng, Z., 2012. The complete mitogenome
IP
sequence of a cave loach Triplophysa rosa (Teleostei, Balitoridae, Nemacheilinae). Mitochondrial DNA 23, 366-368.
SC R
Wang, X.-W., Luan, J.-B., Li, J.-M., Su, Y.-L., Xia, J. and Liu, S.-S., 2011. Transcriptome analysis and comparison reveal divergence between two invasive whitefly cryptic species. BMC genomics 12, 458.
Wang, Z., Gerstein, M. and Snyder, M., 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nature
NU
Reviews Genetics 10, 57-63.
Wu YF, W.C., 1992. The Fishes of the Qinghai–Xizang Plateau, Sichuan Publishing House of Science and Technology, Sichuan.
MA
Yang, W.Z., Qi, Y., Bi, K. and Fu, J.Z., 2012. Toward understanding the genetic basis of adaptation to high-elevation life in poikilothermic species: A comparative transcriptomic analysis of two ranid frogs, Rana chensinensis and R-kukunoris. Bmc Genomics 13. Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 1586-91.
D
Yang, Z.H. and Nielsen, R., 2002. Codon-substitution models for detecting molecular adaptation at
TE
individual sites along specific lineages. Molecular Biology and Evolution 19, 908-917. Yao, Y., Wang, D.Q., He, L. and Yu, L.N., 2012. Isolation and characterization of sixteen microsatellite loci of a blind cavefish, Triplophysa xiangxiensis. Conservation Genetics Resources 4, 371-373.
CE P
Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S., Li, R., Bolund, L. and Wang, J., 2006. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res 34, W293-7. Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z.X.P., Pool, J.E., Xu, X., Jiang, H., Vinckenbosch, N., Korneliussen, T.S., Zheng, H.C., Liu, T., He, W.M., Li, K., Luo, R.B., Nie, X.F., Wu, H.L., Zhao,
AC
M.R., Cao, H.Z., Zou, J., Shan, Y., Li, S.Z., Yang, Q., Asan, Ni, P.X., Tian, G., Xu, J.M., Liu, X.A., Jiang, T., Wu, R.H., Zhou, G.Y., Tang, M.F., Qin, J.J., Wang, T., Feng, S.J., Li, G.H., Huasang, Luosang, J.B., Wang, W., Chen, F., Wang, Y.D., Zheng, X.G., Li, Z., Bianba, Z.M., Yang, G., Wang, X.P., Tang, S.H., Gao, G.Y., Chen, Y., Luo, Z., Gusang, L., Cao, Z., Zhang, Q.H., Ouyang, W.H., Ren, X.L., Liang, H.Q., Zheng, H.S., Huang, Y.B., Li, J.X., Bolund, L., Kristiansen, K., Li, Y.R., Zhang, Y., Zhang, X.Q., Li, R.Q., Li, S.G., Yang, H.M., Nielsen, R., Wang, J. and Wang, J.A., 2010. Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude. Science 329, 75-78. Zhang, J.Z., Nielsen, R. and Yang, Z.H., 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Molecular Biology and Evolution 22, 2472-2479. Zhu, S., 1989. The loaches of the subfamily Nemacheilinae in China (Cypriniformes: Cobitidae), Jiangsu Science and Technology Publishing House. Zou, M., Guo, B. and Ma, X., 2014. Characterizing the transcriptome of yellow-cheek carp (Elopichthys bambusa) enables evolutionary analyses within endemic East Asian Cyprinidae. Gene 547, 267-72.
25
ACCEPTED MANUSCRIPT Figure captions Figure1. Overview of Triplophysa dalaica transcriptome sequence length distributions
IP
T
for all transcripts
databases.
SC R
Figure2. Top 20 hit species distributions for T.dalaica based on BLASTX against Nr A bar plot showing the hits to Nr databases (displaying only top 20 hits
species) for T.dalaica unigenes.
NU
Figure3. KOG classifications of the 65,212 unigenes were classified into 25 functional
MA
categories in T.dalaica.
Figur4. HIF-1 signaling pathway KEGG map populated with transcripts coding for
D
corresponding enzymes. The enzymes annotated are shown in green boxes.
TE
Figure5. The distribution of GO terms assigned to the T.dalaica transcriptome.
genes.
CE P
Figure6. Phylogenetic trees were constructed for the selected fish hif-1αB and hif-2αA
AC
Figure7. Distribution of putative single nucleotide polymorphisms(SNP) in T. dalaica assembled transcriptome. Supplementary figure S1. Flow chart of Illumina deep sequencing and analysis. It contains sample collections, cDNA library construction and Illumina sequencing, data analysis including assemble, blast, GO assignments, candidate genes analyses of high -altitude adaptation, SSR and SNP analysis,etc. Supplementary figure S2. The RT-PCR results for validating the quality of the T. dalaica assembled transcriptome.
26
AC
Fig. 1
CE P
TE
D
MA
NU
SC R
IP
T
ACCEPTED MANUSCRIPT
27
AC
CE P
TE
D
MA
NU
SC R
IP
T
ACCEPTED MANUSCRIPT
Fig. 2
28
AC
CE P
TE
D
MA
NU
SC R
IP
T
ACCEPTED MANUSCRIPT
Fig. 3
29
AC
Fig. 4
CE P
TE
D
MA
NU
SC R
IP
T
ACCEPTED MANUSCRIPT
30
AC
CE P
TE
D
MA
NU
SC R
IP
T
ACCEPTED MANUSCRIPT
Fig. 5
31
AC
CE P
TE
D
MA
NU
SC R
IP
T
ACCEPTED MANUSCRIPT
Fig. 6
32
AC
CE P
TE
D
MA
NU
SC R
IP
T
ACCEPTED MANUSCRIPT
Fig. 7
33
AC
CE P
TE
D
MA
NU
IP
SC R
Table 1. Summary of transcriptome data for T. dalaica. T.dalaica Total number of raw reads 86,875,190 Total number of clean reads 84,281,578 Read length (bp) 100 Total length of reads (bp) 8,428,157,800 Total length of assembly (bp) 237,076,947 Total sequences of assembly 145,256 Average length of assembly (bp) 1632 N50 length of assembly (bp) 3541 Maximal length of assembly (bp) 23847
T
ACCEPTED MANUSCRIPT
34
ACCEPTED MANUSCRIPT Table 2 Assembly comparison to other fish transcriptomes sequenced within Cypriniformes. Assembler
Number of transcripts
N50
Transcripts annotated
References
T.dalaica
Trinity
145,256
3541
74,594
In this study
M. anguillicaudatus
Trinty, SOAP
40,364
611
17336
Long et al.,2013
H. molitrix
ABySS, SSPACE
85,796
270
54198
C.carpio
Newbler, MIRA, CAP3
36,811
1,002
28,443
Ji et al., 2012
M.Amblycephala
MIRA
10,0477
C. auratus
Newbler
127,711
Elopichthysbambusa
Trinity
124,135
Fu and He 2012
505
40,000
Gao et al., 2012
547
22,273
Liao et al., 2013
552
61,197
Zou et al., 2014
AC
CE P
TE
D
MA
NU
SC R
IP
T
Species
35
ACCEPTED MANUSCRIPT Table3 Positively selected genes involved in hypoxia response. Gene name
comp13203_c0_seq1
cct5
comp16096_c0_seq1
ptp4a1
comp16890_c0_seq1
ctnna1
comp17198_c0_seq1
slc33a1
comp27855_c0_seq1
colec12
comp28239_c1_seq1
qrsl1
comp29992_c1_seq5
dido1
comp30402_c1_seq6
med23
comp32366_c3_seq7
CTBP1
comp33786_c1_seq43
phtf2
comp34021_c4_seq10
vldlr
comp24790_c0_seq1
ubfd1
comp24003_c0_seq2
SIK1
References
GO:0051082 unfolded protein binding GO:0016791 phosphatase activity GO:0005198 structural molecule activity GO:0008521 acetyl-CoA transporter activity GO:0030246 carbohydrate binding GO:0016884 carbon-nitrogen ligase activity
IP
SC R
NU
MA
D
TE
(Khurana et al., 2013)
GO:0005634 nucleus GO:0004616 phosphogluconate dehydrogenase (decarboxylating) activity GO:0005515 protein binding GO:0005515 protein binding GO:0004674 protein serine/threonine activity
AC
CE P
GO annotation
T
Gene ID
36
kinase
(Mosqueira et al., 2012) (Mense et al., 2006)
ACCEPTED MANUSCRIPT Table4 Positively selected sites detected in hif-1αB and hif-2αA candidate genes of Triplophysa dalaica. Transcript IDs
Log likehood
P value
Number of positive sites
hif-1αB hif-2αA
comp29194_c0_seq2 -9565.457 -9562.825 2.177e-02 120 comp25674_c0_seq1 -7660.628 -7651.920 3.028e-05 12
AC
CE P
TE
D
MA
NU
SC R
IP
T
Candidate genes
37
ACCEPTED MANUSCRIPT Table 5. Statistics of microsatellite identified in T. dalaica transcriptome.
AC
CE P
TE
D
MA
NU
SC R
IP
T
Microsatellites identified Compound microsatellite Mono- nucleotide repeats Di-nucleotide repeats Tri- nucleotide repeats Tetra- nucleotide repeats Penta- nucleotide repeats Hex- nucleotide repeats Number of microsatellites with sufficient flanking sequence (>50 bp) for PCR primer design
38
37,501 1,884 12,429 20,066 4,255 724 18 9 22,428
ACCEPTED MANUSCRIPT
AC
CE P
TE
D
MA
NU
SC R
IP
T
Abbreviations QTP, Qinghai-Tibetan Plateau; EGLN1, Egl nine homolog 1; PPARA, peroxisome proliferator-activated receptor-a genes; EPAS1, endothelial PAS domain-containing protein 1; RT-PCR, Reverse transcription polymerase chain reaction; FPKM, Fragments Per Kilobase Of Exon Per Million Fragments Mapped; AIC, Akaike Information Criterion; SNP, single nucleotide polymorphism.
39
ACCEPTED MANUSCRIPT
Highlights
IP
T
We sequenced the transcriptome of a plateau fish (Triplophysa dalaica).
SC R
We identified 61 positively selected genes in Triplophysa dalaica, of which 13 may be involved in hypoxia response.
AC
CE P
TE
D
MA
NU
We provided lots of SSR and SNP markers for Triplophysa dalaica.
40