Transcriptome analysis reveals positive selection on the divergent between topmouth culter and zebrafish

Gene 552 (2014) 265–271 Contents lists available at ScienceDirect Gene journal homepage: www.elsevier.com/locate/gene Transcriptome analysis reveal...

Download PDF

722KB Sizes 0 Downloads 24 Views

Report

PDF Reader
Full Text

Gene 552 (2014) 265–271

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Transcriptome analysis reveals positive selection on the divergent between topmouth culter and zebraﬁsh Li Ren a,1, Xing-Jun Tan a,1, Ya-Feng Xiong a, Kang Xu a, Yi Zhou a, Huan Zhong a, Yun Liu a, Yun-Han Hong a,b,⁎, Shao-Jun Liu a,⁎⁎ a b

Key Laboratory of Protein Chemistry and Fish Developmental Biology of Education Ministry of China, College of Life Sciences, Hunan Normal University, Changsha 410081, China Department of Biological Sciences, National University of Singapore, Singapore 117543, Singapore

a r t i c l e

i n f o

Article history: Received 17 June 2014 Received in revised form 29 August 2014 Accepted 25 September 2014 Available online 28 September 2014 Keywords: Liver Natural selection Repeat sequence

a b s t r a c t The topmouth culter (Erythroculter ilishaeformis) is a predatory cyprinid ﬁsh that distributes widely in the East Asia. Here we report the liver transcriptome in this organism as a model of predatory ﬁsh. Sequencing of 5 Gb raw reads led to 27,741 unigenes and produced 11,131 annotatable genes. A total of 7093 (63.7%) genes were found to have putative functions by gene ontology analysis. Importantly, a blast search revealed 4033 culter genes that were orthologous to the zebraﬁsh. Extracted from 38 candidate positive selection genes, 4 genes exhibit strong positive selection based on the ratio of nonsynonymous (Ka) to synonymous substitutions (Ks). In addition, the four genes also indicated the strong positive selection by comparing them between blunt snout bream (Megalobrama amblycephala) and zebraﬁsh. These genes were involved in activator of gene expression, metabolic processes and development. The transcriptome variation may be reﬂective of natural selection in the early life history of Cyprinidae. Based on Ks ratios, date of the separation between topmouth culter and zebraﬁsh is approximately 64 million years ago. We conclude that natural selection acts in diversifying the genomes between topmouth culter and zebraﬁsh. © 2014 Elsevier B.V. All rights reserved.

1. Introduction Up to date, compared with zebraﬁsh (Danio rerio) as an omnivorous and small-sized ﬁsh, topmouth culter belonging to the genera of Culter ﬁshes, is a predatory ﬁsh distributed in the East Asia region and is different to several economically cultivated carp, including common carp (Ji et al., 2012), crucian carp (Liao et al., 2013) and grass carp (Chen et al., 2012). It shows rapid growth rates which could be up to 10 kg. With regard to artiﬁcial breeding of ﬁsh, topmouth culter shows a great culture performance as a cultured fresh water ﬁsh (Zhang et al., 1999; Chen et al., 2003; Zhao et al., 2009). The Cyprinidae are the largest ﬁsh family of freshwater ﬁshes, which are an extremely diverse group of ﬁshes that show a remarkable level of Abbreviations: bp, base pair; COG, Clusters of Orthologous Groups of proteins; ESTs, expressed sequence tags; gata4, GATA-binding protein 4; GO, gene ontology; GY, Goldman and Wang method; Ka, nonsynonymous substitution rate; KEGG, Kyoto Encyclopedia of Genes and Genomes; Ks, synonymous substitution rate; MA, model averaging; MS, model selection; MYN, modiﬁed Yang and Nielsen method; nceh1, neutral cholesterol ester hydrolase 1; NG, Nei–Gojobori method; NGS, next generation sequencing; NR, nonredundant nucleotide database; NT, nucleotide(s); ORFs, open reading frames; phf13, PHD ﬁnger protein 13; tbl3, transducin (beta)-like 3; YN, Yang and Nielsen method. ⁎ Correspondence to: Y. Hong, Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543, Singapore. ⁎⁎ Corresponding author. E-mail addresses: [email protected] (Y.-H. Hong), [email protected] (S.-J. Liu). 1 These authors contributed equally to this work.

http://dx.doi.org/10.1016/j.gene.2014.09.053 0378-1119/© 2014 Elsevier B.V. All rights reserved.

diversity affecting their morphology, ecology, behavior and genomes, as well as multiple other facets of their biology. As to tectonic movements and climatic changes, the divergence of genera Culter and Danio occurred in the early life history of Cyprinidae based on the phylogenetic analysis (Bermingham and Avise, 1986; Dominguez-Dominguez et al., 2008; He et al., 2008). Although traditionally recognized subfamily groupings of Cyprinidae were identiﬁed by monophyletic analysis (Chen et al., 1984), the phylogenetic analysis based on S7 ribosomal protein-coding gene helps us in understanding the divergence events between the genera Culter and Danio clearly (He et al., 2008). The lack of information in numerous genes referring to different types of evolution limited us to understand the change of the genetic structure that is shaped by aquatic environment (Bermingham and Avise, 1986). Divergent time events are often estimated by information of the genome, mitochondrial DNA, and microRNA sequences. Sequencing and comparative analysis of whole-genome sequences of teleost ﬁshes, such as fugu, tetraodon, and medaka (Aparicio et al., 2002; Jaillon et al., 2004; Kasahara et al., 2007), have enabled calculation of the rate of evolution in ﬁsh lineage. In recent years, along with the rapid development of next generation sequencing (NGS) technology, the distribution of synonymous substitution (Ks value) based on coding sequences has been used to calculate the divergent event in species (Thorne and Kishino, 2002), especially for the divergence of ﬁsh, which comprise the largest amount of vertebrates (Elmer et al., 2010; Wang et al., 2012). This calculation is an effective method for

266

L. Ren et al. / Gene 552 (2014) 265–271

determining the divergence time. The whole-genome sequence of topmouth culter is not available yet. Therefore, transcriptome analysis will help us to determine the molecular genetics of topmouth culter. Meanwhile, comparison with principal Cyprinidae ﬁsh models (zebraﬁsh) (Volff, 2005) will help understand the mechanisms driving differential evolution. NGS has been regarded as a fast and effective method to investigate not only species diversity but also genetic marker (Rokas and Abbot, 2009; Metzker, 2010). The combination of NGS and microsatellite markers has opened new horizons for simple sequence repeat markers (Blanca et al., 2011; Dutta et al., 2011). The study of morphological variations among seven populations of topmouth culter showed that the species could be divided into three patterns (Wang et al., 2007). However, we still lack detailed molecular marker research on this species. In the current study, we compared the genes between topmouth culter and zebraﬁsh: 1) to discover new molecular markers in coding sequences in topmouth culter; 2) to identify the genes under positive selection for biological function; and 3) to assess the relative age of separation between topmouth culter and zebraﬁsh. 2. Materials and methods 2.1. Animal materials The topmouth culter was obtained from the Engineering Center of Polyploidy Fish Breeding of the National Education Ministry located in Hunan Normal University, China. Topmouth culter was kept in a tank (23 °C) for 2 days before being killed. After anesthetizing the ﬁsh with 2-phenoxyethanol, liver tissue was carefully excised. Samples were stored in RNALater (Ambion) at −80 °C. The total RNA was extracted from the liver tissue of topmouth culter. After RNALater was removed, the samples were homogenized using a pestle and mortar. RNA isolation was performed according to the standard Trizol protocol (Invitrogen). After using agarose gel electrophoresis to detecting, the purity and concentration were used for assessing the quality of the RNA by NanoDrop (NanoDrop Technologies, Wilmington, DE, USA; http://www.nanodrop.com).

sequence data into many individual de Bruijn graphs, and then processed each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Firstly, Module Inchworm was used to assemble the RNA-seq data into unique sequences of transcripts. Secondly, Module Inchworm was used to cluster and construct complete de Bruijn graphs for each cluster. Finally, Module Butterﬂy processed the individual graphs in parallel, tracing the paths that reads and pairs of reads took within the graph. This ultimately reported full-length transcripts for alternatively spliced isoforms, and teased apart transcripts that corresponded to paralogous genes. Finally, the resulting sequences of Trinity underwent the further process of sequence splicing, and redundancy was removed using sequence clustering software to acquire non-redundant contigs as long as possible. 2.4. Gene annotation BLASTX alignment (e-values b 1e−5) of the above-mentioned contigs was performed using NCBI nonredundant nucleotide database (NR), NCBI-nucleotide (NT), and Swiss-Prot as the reference databases. The best alignment results were used to decide the sequence direction of contigs. The associated gene name and gene ontology (GO) term accession number were obtained from BLASTX alignment (e-values b 1e−6) with zebraﬁsh in Ensembl BioMart (Flicek et al., 2013). WeGo software was used for analysis of the GO annotations (Ye et al., 2006). 2.5. Predicting and extracting open reading frames (ORFs) Contigs were aligned by BLASTX (e-value b 1e−5) to protein databases of the NCBI-NR, Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Clusters of Orthologous Groups of proteins (COG) for extracting the ORFs using a Perl script. We ﬁrst aligned contigs to NR, the unaligned contigs were aligned to Swiss-Prot, KEGG, and ﬁnally COG, step by step. Contigs that could not be aligned to the above-mentioned databases were scanned by the software package ESTScan (Iseli et al., 1999), which produced nucleotide sequences with direction (5′–3′) and amino sequences of the predicted coding region.

2.2. The obtaining and processing quality control of raw Illumina data 2.6. Identiﬁcation of SSRs Poly(A) mRNA isolation was performed using beads with oligo(dT) after total RNA collection from eukaryotes. Fragmentation buffer was added for interrupting mRNA to short fragments. After taking these short fragments as templates, ﬁrst-strand cDNA was synthesized by random hexamer primer. Second-strand cDNA was then synthesized using buffer, dNTPs, RNaseH, and DNA polymerase I. Short fragments were puriﬁed with the QiaQuick PCR extraction kit (Qiagen) and resolved with elution buffer. These fragments were run with agarose gel electrophoresis after adding sequencing adapters. PCR ampliﬁcation templates of the suitable fragment were selected as PCR ampliﬁcation templates. During the quality control steps, the Agilent 2100 Bioanalyzer and ABI StepOnePlus™ Real-Time PCR System were used to qualify and quantify the sample library. Finally, the library was sequenced using Illumina HiSeq™ 2000. The blunt snout bream liver transcriptome data was downloaded from the NCBI's Sequence Read Archive (SRA) (Accession ID SRX685580). 2.3. Assembly of the Illumina contigs After raw reads were produced from sequencing machines, some read adaptors and unknown nucleotides larger than 5% were removed, including low quality reads in which the percentage of low quality bases (base quality ≤ 10) was more than 20%. Transcriptome de novo assembly was carried out with a short read assembling program (Trinity) (Grabherr et al., 2011), including three independent software modules called Inchworm, Chrysalis, and Butterﬂy. Trinity partitioned the

The various types of adaptors were removed by data processing software in a sequencing system. Therefore, the transcriptome data could be used for identifying SSRs using MIcroSAtellite (http://pgrc.ipkgatersleben.de/misa/mis-a.html). Contigs with a length less than 150 bp were removed. The remaining sequences were obtained to design primers. The microsatellite loci were identiﬁed by Perl5 script. The different microsatellites were classiﬁed as dimers, trimers, tetramers, pentamers, and hexamers. After SSRs containing expressed sequence tags (ESTs) were identiﬁed, ﬂanking primers were designed using primer3 software with a Perl script. Finally, primers which were aligned to more than one contig or SSRs never found in the primer were removed. 2.7. Identiﬁcation of Ka and Ks in orthologs To identify putative orthologs between topmouth culter and zebraﬁsh, the sequences from topmouth culter and zebraﬁsh were aligned with the reciprocal BLAST (BLASTN) hit with an e-value of 1e− 20 (Blanc and Wolfe, 2004). Two sequences were deﬁned as orthologs if each of them was the best hit of the other and if the sequences were aligned over 300 bp. Using the available zebraﬁsh protein database in Ensembl, the sequences from topmouth culter and zebraﬁsh were aligned with the reciprocal BLAST (BLASTX) if the aligned regions were N100 amino acids and a hit with the expected e-value was b 1e−15 (Blanc and Wolfe, 2004). If no same best match was found in the two

L. Ren et al. / Gene 552 (2014) 265–271

methods, the pair of sequences was discarded. Using the corresponding best match protein of zebraﬁsh as a guide, the nucleotide sequence was then translated using the BioEdit program (version 7.0.9) (Hall, 1999). The alignment of orthologous topmouth culter and zebraﬁsh protein was obtained using ClustalW (Li, 2003). After deletion of unmatched regions using PAL2NAL (Suyama et al., 2006), nonsynonymous and synonymous substitution rates were estimated using six methods in the software package KaKs_Calculator (Zhang et al., 2006).

3. Results and discussion 3.1. Transcriptome assembly Illumina HiSeq 2000 technology was employed to obtain liver transcriptome data of topmouth culter. The results of the sequencing data are shown in Table 1. A total of 4.94 Gb of topmouth culter liver transcriptome data were generated. The number of total raw reads was 61,052,660. After low quality reads were removed, sequences of 54,896,738 reads (89.9%) remained for the next analysis. The remaining reads were assembled into 47,663 contigs, with a mean length 593 bp (Table 1). The average depth of a read was 91×. The paired end reads of topmouth culter were submitted to the NCBI SRA under the Accession ID SRA099003.

3.2. Functional analyses By BLASTX (e-value b 1e−5) against three databases such as NCBINR, NCBI-NT, and Swiss-Prot, the contigs annotated were identiﬁed. There were 29,205 genes with BLASTX hits to the NCBI-NR database, with 23,879 genes (81.7%) annotation in zebraﬁsh database. A total of 40,416 genes and 25,918 genes matched in the NCBI-NT and SwissProt databases, respectively. A total of 27,741 contigs (11,131 genes) were annotated with BLASTX alignment (e-values b 1e−6) against the zebraﬁsh protein database in Ensembl (Flicek et al., 2013). We then performed gene ontology (GO) analysis (level 2) of 7093 (63.7%) genes (Fig. 1 and Table 1). Eleven groups were found in topmouth culter refers to “cellular component”. The largest categories were cell (GO: 0005623) and component (GO: 0043226) (Fig. 1). For the “Molecular function” level, 13 groups were found in topmouth culter. The most abundant categories were binding (GO: 0005488) and catalytic activity (GO: 0003824) (Fig. 1). “Biological process” contigs were assigned to 22 groups, the largest two GO categories were cellular process (GO: 0009987) and metabolic process (GO: 0008152), which reﬂect the biological functional role of liver tissue in the ﬁsh (Fig. 1). With regard to the contig assembly, 29,242 ORFs were obtained, ranging from 60 to 7227 bp after BLASTX hits against the NCBI-NR, Swiss-Prot, KEGG, and COG databases with an e-value b 1e5 (Fig. 2). The unaligned contigs were used to predict ORFs by the software ESTScan (version 3.0.2) (Iseli et al., 1999). There were 500 contigs of the predicted coding region obtained from it. Table 1 Summary of statistics for transcriptome data. Topmouth culter Total clean nucleotides (bp) Mean length (bp) Total raw reads Q20 percentage N percentage GC percentage Total length (bp) Total clean reads N50

4,940,706,420 593 61,052,660 97.77% 0.01% 47.85% 28,268,033 54,896,738 796

267

3.3. In silico mining of SSRs Several SSR motifs have been identiﬁed and characterized as potential molecular markers in the topmouth culter contigs. The standard for SSR selection was shown as follows: six for dinucleotide, ﬁve for trinucleotide, ﬁve for tetranucleotide, and three for penta and hexanucleotide motifs. According to these settings, we identiﬁed 3111 SSRs within 2633 sequences (Table 2). As expected, the most frequent type of microsatellite corresponded to dimers (64.0%), trimers, tetramers, pentamers, and hexamers was present at much lower frequencies (30.5%, 3.4%, 1.6%, and 0.5% respectively) (Supplementary Table 1 and Fig. 3). Similar results have been found in oak and Nothofagus nervosa (Ueno et al., 2010). Then, the 384 SSRs (12.3%) had sufﬁcient ﬂanking sequences to allow the design of appropriate unique primers.

3.4. Ka/Ks calculation in orthologs Using an e-value of 1e−20, we identiﬁed 4033 orthologous pairs between topmouth culter and zebraﬁsh. These orthologous pairs were used to calculate the Ka/Ks ratio (Fig. 4). The Ka/Ks ratio, which was based on extracted orthologous ORFs, was estimated with pairwise measurements. Different algorithms have been taken into account in research on the relation of evolutionary lineages of selection strength (Li et al., 2009; Luo and Hughes, 2012). Therefore, six methods including the Nei–Gojobori (NG), the Yang and Nielsen (YN), the modiﬁed method of YN (MYN), the Goldman and Wang (GY), model selection (MS) and model averaging (MA) were used for analysis in our study. The Ka/Ks ratio between topmouth culter and zebraﬁsh was within the range from 0 to 3.58 (NG) (average value = 0.10), which is calculated by using KaKs_Calculator with the method of NG (Fig. 4). Two orthologous ESTs showed Ka/Ks ratio N 1 (Table 3). The ratio of Ka/Ks value of tbl3 was 1.11 and gata4 was 3.58. A Ka/Ks ratio N 1 suggests that strong positive selection plays an important role in changes in protein DNA sequence (Yang and Bielawski, 2000). Other methods, such as approximate methods (YN and MYN) and the maximum likelihood method (GY, MS and MA), were also used to calculate the Ka/Ks ratio for validation of orthologous ESTs. Four orthologous ESTs had a Ka/Ks ratio N 1 in all of these methods. For accurately extracting evolutionary information through model selection and model averaging, the results were shown in Table 3. The Kruskal–Wallis test was used to analyze the different methods of Ka/Ks calculation with the software SigmaPlot 12.5 (H = 7.298 with 5 degrees of freedom, P = 0.199). The median values were 0.0772 (NG), 0.0806 (YN), 0.0765 (MYN), 0.0784 (GY), 0.0773 (MS), and 0.0771 (MA), with no signiﬁcant difference between methods. As a closed related species (He et al., 2008), blunt snout bream help to validated the results. So the four orthologous genes between blunt snout bream and zebraﬁsh were obtained from reciprocal BLAST. Under the same analysis, the four orthologous pairs were extracted for the calculation of the Ka/Ks ratio. The results ranged from 0.95 to 2.82 showed the strong positive selection in the four genes (Table 3). In addition, the longer orthologous pairs also raise credibility of the results. After the Ka/Ks calculations among the six methods, four genes were under positive selection with mean Ka/Ks ratio ranging from 0.97 to 4.54 (Table 3). These genes were relevant to zinc ion binding, regulation of transcription, and the cell cycle (Table 3). Divergent protein sequences experienced adaptive evolution (Ravi and Venkatesh, 2008), especially gata4, which is related to development of heart valves (Rivera-Feliciano et al., 2006). Protein-coding sequence mutations are the foundation for evolution of species. A higher neutral mutation rate in teleosts offers a higher chance for selection to act and retain favorable mutations (Ravi and Venkatesh, 2008). Although the high Ka/Ks ratio may be related to the lack of information in full-length gene, the rapid evolution of ESTs also indirectly reﬂected the mutation in four genes adapted to new environment.

268

L. Ren et al. / Gene 552 (2014) 265–271

Fig. 1. Gene ontology (GO) assignments for the topmouth culter transcriptome. GO assignments (level 2) were used to predict the distribution of functional genes in transcriptome data. Three comparisons are shown: (A) cellular component ontology; (B) molecular function ontology; and (C) biological processes ontology.

In our study, while 3909 (97.1%) orthologous ESTs showed Ka/Ks ratio b 1, 60 orthologous ESTs had a Ka/Ks ratio between 0.5 and 1. However, a Ka/Ks ratio between 0.5 and 1 is a less conservative cut-off that has proven useful for identifying genes under positive selection (Swanson et al., 2004). We included ESTs with Ka/Ks ratio N 0.5 in our genes of interest because we lacked full-length genes, which would

decrease the Ka/Ks ratio and cause loss of interesting genetic information relevant to positive selection. ESTs (Ka/Ks ratio between 0.5 and 1) are candidate genes for analysis of rapid mutation between two species (Swanson et al., 2004). GO analysis of 60 genes was obtained from Web Gene Ontology (Ye et al., 2006). The largest category of GO is the cell part (GO: 0044464) in “biological process”. The second largest

Fig. 2. The length distribution of ORFs in the topmouth culter transcriptome. The protein sequences predicted by BLASTX hits (e-value b 1e−5) are shown.

L. Ren et al. / Gene 552 (2014) 265–271

269

Table 2 The basic information of EST-SSR in topmouth culter transciptome. Topmouth culter Total number of sequences examined Total size of examined sequences (bp) Total number of identiﬁed SSRs Number of SSR containing sequences Number of sequences containing more than 1 SSR Number of SSRs present in compound formation

47,663 28,268,033 4870 4266 490 232

GO term is cellular process (GO: 0009987) about “cellular component”. The third largest GO term regards binding (GO: 0005488), which belongs to “molecular function”. Genes with Ka/Ks ratio between 0.5 and 1 are relevant to the regulation of development in the liver (Supplementary Table 2). Among 34 genes, high genetic variation occurred in immune genes tnfrsf1a and cfb (Supplementary Table 2). Strong disease resistance and wide adaptability play an important role in adaptation to evolution. Changes in protein coding sequences result in adaptation to a new environment. Some reports have shown strong disease resistance in topmouth culter (Zhao et al., 2009). In addition, the function of cellular responses to glucose starvation also plays an important role in survival. Mutation sequences, such as rrp8, are usually relevant for adapting to a lack of food (Supplementary Table 2). Teleost protein-coding sequences offer higher chances for survival in a changed environment compared with higher vertebrates (Ravi and Venkatesh, 2008). Such changes in the regulation of genes can result in genetic isolation between populations, ultimately leading to speciation. Further, sequence variation between topmouth culter and zebraﬁsh has demonstrated selection. Additional research needs to be carried out to identify the genes under positive selection, including detecting ongoing or recent selection, population genetic data, and comparisons (Nielsen et al., 2007). However, 116 (2.9%) orthologous ESTs were identiﬁed as synonymous mutations, which are often assumed to be neutral, meaning that this does not affect the ﬁtness of the individual. 3.5. Detection of divergence time Mean Ks increases with genetic distance between species, which is expected under the assumption of a molecular clock. In our study, the 4033 orthologous ESTs were obtained by calculating Ks values. Ks values obtained by six methods were calculated by the Kruskal–Wallis test in

Fig. 4. Distribution of the Ka/Ks ratio of topmouth culter and zebraﬁsh based on six algorithms. Similar values between the six methods show the reliability of the results.

SigmaPlot 12.5 (H = 139.656 with 5 degrees of freedom, P b 0.001). There was a signiﬁcant difference in the median values of 0.418 (NG), 0.383 (YN), 0.397 (MYN), 0.397 (GY), 0.404 (MS), and 0.417 (MA) (Fig. 5). Ks indicates the sites under natural selection and is useful for calculating the time of divergence (Hurst, 2002). The clock-like rate of mammals is generally 2.2 × 10−9 per site per year (Kumar and Subramanian, 2002); humans are 3.0 × 10−8 per site per generation (Xue et al., 2009). The molecular evolutionary rate of ﬁsh is known to be fast. Protein sequences in ﬁsh evolve signiﬁcantly faster than their orthologs in mammals, both for duplicated genes and those retained in a single copy (Ravi and Venkatesh, 2008). For example, protein sequences between the two pufferﬁsh species Takifugu rubripes and Tetraodon nigroviridis are more divergent than their homologs in mammals, although pufferﬁsh diverged 32 MYA and mammals (humans and mice) diverged 61 MYA (Ravi and Venkatesh, 2008). Other reports on pairwise comparisons of a genome-wide set of 5800 orthologous genes in tetraodon, fugu, humans, and mice have shown that the neutral mutation rate between tetraodon and fugu (approximately 32 MYA) is higher than in humans and mice (approximately 61 MYA) (Jaillon et al., 2004). Comparative analysis of whole-genome sequences of fugu, tetraodon, and medaka (Aparicio et al., 2002; Jaillon et al., 2004; Kasahara et al., 2007) has provided a platform to investigate the evolution rate in ﬁsh lineage. The substitution rate can be elevated by population-level polymorphism population genetic and mutation rate estimates by mitochondrial DNA. However, the true

Fig. 3. The distribution of ﬁve types of SSRs. Five types of microsatellites corresponded to dimers (64.0%), trimers (30.5%), tetramers (3.4%), pentamers (1.6%), and hexamers (0.5%).

270

L. Ren et al. / Gene 552 (2014) 265–271

Table 3 Characteristics of genes (Ka/Ks N 1) in topmouth culter and blunt snout bream. Gene name

Gene symbol

GO annotation

Ka/Ks (topmouth culter vs zebraﬁsh) (methods)

Length (topmouth culter vs zebraﬁsh) (bp)

Ka/Ks (blunt snout bream vs zebraﬁsh) (methods)

GATA-binding protein 4

gata4

Transcription cofactor; DNA-dependent; nucleus; DNA binding protein; zinc ion binding protein

570

Transducin (beta)-like 3

tbl3

Lymphocyte differentiation; small-subunit processome

PHD ﬁnger protein 13

phf13

Cell cycle; cell division; DNA condensation; mitosis; nucleus; chromatin binding; methylated histone residue binding; zinc ion binding

Neutral cholesterol ester hydrolase 1

nceh1

Lipid catabolic process; integral component of membrane; endoplasmic reticulum

3.57871 (NG) 4.52284 (GY) 4.43876 (YN) 4.47918 (MYN) 4.53576 (MS) 4.53277 (MA) 1.11065 (NG) 1.30488 (GY) 1.11702 (YN) 1.05032 (MYN) 1.30488 (MS) 1.28137 (MA) 0.983024 (NG) 0.999647 (GY) 2.09969 (YN) 4.06167 (MYN) 0.987124 (MS) 0.988919 (MA) 0.968628 (NG) 1.1169 (GY) 1.14624 (YN) 1.06139 (MYN) 1.04266 (MS) 1.06114 (MA)

1.92094 (NG) 2.75827 (GY) 2.61955 (YN) 2.53654 (MYN) 2.74172 (MS) 2.73966 (MA) 0.954899 (NG) 0.997821 (GY) 1.20336 (YN) 1.0876 (MYN) 0.997821 (MS) 0.997931 (MA) 1.24853 (NG) 1.72872 (GY) 1.68216 (YN) 1.41133 (MYN) 1.5224 (MS) 1.51597 (MA) 2.14107 (NG) 2.55423 (GY) 2.74683 (YN) 2.81853 (MYN) 2.55423 (MS) 2.56812 (MA)

divergence time is less than that estimated from mtDNA, then the substitution rate would be faster (Elmer et al., 2010). Although the Ka/Ks ratio is relative to the category of gene, the population size and species classiﬁcation (Thorne and Kishino, 2002), then little consensus or knowledge exists based on genome- or transcriptomelevel molecular clocks in aquatic organisms (Kumar, 2005; Pulquerio and Nichols, 2007). We used the clock-like rate of 3.51 × 10−9 for coding region analysis that was detected by multiple algorithms in vertebrate (Li et al., 2009). Therefore, based on these clock-like rate, the average time of divergence between topmouth culter and zebraﬁsh was calculated as ranging from 61.85 to 67.33 MYA (mean = 64.39) (Lynch, 2000). The prediction will be the further estimation of speciation time, which exhibited the divergence of the genera Danio and Culter earlier than the divergence of the genera Danio and Cyprinus 50 MYA based on MHC genes (Kruiswijk et al., 2002; He et al., 2008).

420

741

850

Length (blunt snout bream vs zebraﬁsh) (bp)

Database accession

744

UniProtKB: B7ZKZ4

888

ZFIN ID: ZDB-GENE-041114-104

900

UniProtKB: Q86YI8

1197

UniProtKB-KW: KW-0442; UniProtKB-KW: KW-0812; UniProtKB-KW: KW-0256

4. Conclusion Analysis on geographic patterns of genetic differentiation between topmouth culter and zebraﬁsh will demonstrate the early divergence in Cyprinidae. 4 genes under strong positive selection performed the diversity adaption, and 34 candidate genes show signs of positive selection. Adaptive sequence evolution as a source of variation is used to explain the diversity between topmouth culter and zebraﬁsh. In addition, approximately 64.39 MYA has been predicted based on transcriptome evolutionary analysis. Coupled with studies of parallel evolution, functional effects and ﬁtness experiments, we are embarking on understanding how genetic and ecological factors interact and potentially drive speciation. Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.gene.2014.09.053.

Fig. 5. Distribution of Ks values based on six methods to identify events of speciation. The bold lines in the boxes indicate the average Ks and the thick lines in the boxes indicate the median.

L. Ren et al. / Gene 552 (2014) 265–271

Availability of supporting data The data set supporting the results of this article is available in the NCBI SRA under the Accession ID SRA099003. Authors' contributions LR, XJT and YFX carried out bioinformatics analyses and wrote the manuscript. SJL, YHH and YL contributed to the conception and design of the study. YZ and HZ provided assistance in the preparation of the manuscript. KX provided assistance extracting the raw material. All authors read and approved the ﬁnal manuscript. Acknowledgments This work was supported by the National Special Fund for Scientiﬁc Research in Public Beneﬁts (Grant No. 200903046), the National Natural Science Foundation of China (Grant No. 30930071), Major International Cooperation Projects of the National Natural Science Foundation of China (Grant No. 31210103918), Doctoral Fund of the Ministry of Education of China (Grant No. 20114306130001), the National High Technology Research and Development Program of China (Grant No. 2011AA100403), the National Key Basic Research Program of China (Grant No. 2012CB722305), Scientiﬁc Research Fund of Hunan Provincial Education Department (12A089), Training Program of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 91331105), Key Item of the National Natural Science Foundation of China (Grant No. 31430088), the Cooperative Innovation Center of Engineering and New Products for Developmental Biology of the Education Department of Hunan Province (Grant No. 20134486). References Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., et al., 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 1301–1310. Bermingham, E., Avise, J.C., 1986. Molecular zoogeography of freshwater ﬁshes in the southeastern United States. Genetics 113, 939–965. Blanc, G., Wolfe, K.H., 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16, 1667–1678. Blanca, J., Canizares, J., Roig, C., Ziarsolo, P., Nuez, F., et al., 2011. Transcriptome characterization and high throughput SSRs and SNPs discovery in Cucurbita pepo (Cucurbitaceae). BMC Genomics 12, 104. Chen, X.L., Yue, P.Q., Lin, R.D., 1984. Major groups within the family cyprinidae and their phylogenetic relationships. Acta Zootaxon. Sin. 4, 022. Chen, J.M., Ye, J.Y., Pan, Q., 2003. A nutrition composition analysis of dorsal ﬂesh of Erythroculter ilishaeformis. J. Zhejiang Ocean Univ. 4, 003. Chen, J., Li, C., Huang, R., Du, F., Liao, L., et al., 2012. Transcriptome analysis of head kidney in grass carp and discovery of immune-related genes. BMC Vet. Res. 8, 108. Dominguez-Dominguez, O., Alda, F., de Leon, G.P., Garcia-Garitagoitia, J.L., Doadrio, I., 2008. Evolutionary history of the endangered ﬁsh Zoogoneticus quitzeoensis (Bean, 1898) (Cyprinodontiformes: Goodeidae) using a sequential approach to phylogeography based on mitochondrial and nuclear DNA data. BMC Evol. Biol. 8, 161. Dutta, S., Kumawat, G., Singh, B.P., Gupta, D.K., Singh, S., et al., 2011. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]. BMC Plant Biol. 11, 17. Elmer, K.R., Fan, S., Gunter, H., Jones, J., Boekhoff, S., et al., 2010. Rapid evolution and selection inferred from the transcriptomes of sympatric crater lake cichlid ﬁshes. Mol. Ecol. 19, 197–211. Flicek, P., Ahmed, I., Amode, M.R., Barrell, D., Beal, K., et al., 2013. Ensembl 2013. Nucleic Acids Res. 41, D48–D55. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., et al., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652. Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, pp. 95–98. He, S., Mayden, R.L., Wang, X., Wang, W., Tang, K.L., et al., 2008. Molecular phylogenetics of the family Cyprinidae (Actinopterygii: Cypriniformes) as evidenced by sequence variation in the ﬁrst intron of S7 ribosomal protein-coding gene: further evidence from a nuclear gene of the systematic chaos in the family. Mol. Phylogenet. Evol. 46, 818–829.

271

Hurst, L.D., 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 18, 486–487. Iseli, C., Jongeneel, C.V., Bucher, P., 1999. ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 138–48. Jaillon, O., Aury, J.M., Brunet, F., Petit, J.L., StangeThomann, N., et al., 2004. Genome duplication in the teleost ﬁsh Tetraodon nigroviridis reveals the early vertebrate protokaryotype. Nature 431, 946–957. Ji, P., Liu, G., Xu, J., Wang, X., Li, J., et al., 2012. Characterization of common carp transcriptome: sequencing, de novo assembly, annotation and comparative genomics. PLoS One 7, e35152. Kasahara, M., Naruse, K., Sasaki, S., Nakatani, Y., Qu, W., et al., 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature 447, 714–719. Kruiswijk, C.P., Hermsen, T.T., Westphal, A.H., Savelkoul, H.F., Stet, R.J., 2002. A novel functional class I lineage in zebraﬁsh (Danio rerio), carp (Cyprinus carpio), and large barbus (Barbus intermedius) showing an unusual conservation of the peptide binding domains. J. Immunol. 169, 1936–1947. Kumar, S., 2005. Molecular clocks: four decades of evolution. Nat. Rev. Genet. 6, 654–662. Kumar, S., Subramanian, S., 2002. Mutation rates in mammalian genomes. Proc. Natl. Acad. Sci. 99, 803–808. Li, K.B., 2003. ClustalW-MPI: ClustalW analysis using distributed and parallel computing. Bioinformatics 19, 1585–1586. Li, J., Zhang, Z., Vang, S., Yu, J., Wong, G.K., et al., 2009. Correlation between Ka/Ks and Ks is related to substitution model and evolutionary lineage. J. Mol. Evol. 68, 414–423. Liao, X., Cheng, L., Xu, P., Lu, G., Wachholtz, M., et al., 2013. Transcriptome analysis of crucian carp (Carassius auratus), an important aquaculture and hypoxia-tolerant species. PLoS One 8, e62308. Luo, H., Hughes, A.L., 2012. d(N)/d(S) does not show positive selection drives separation of polar-tropical SAR11 populations. Mol. Syst. Biol. 8, 625. Lynch, M., 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155. Metzker, M.L., 2010. Sequencing technologies — the next generation. Nat. Rev. Genet. 11, 31–46. Nielsen, M., Hes, F.J., Nagengast, F.M., Weiss, M.M., Mathus-Vliegen, E.M., et al., 2007. Germline mutations in APC and MUTYH are responsible for the majority of families with attenuated familial adenomatous polyposis. Clin. Genet. 71, 427–433. Pulquerio, M.J., Nichols, R.A., 2007. Dates from the molecular clock: how wrong can we be? Trends Ecol. Evol. 22, 180–184. Ravi, V., Venkatesh, B., 2008. Rapidly evolving ﬁsh genomes and teleost diversity. Curr. Opin. Genet. Dev. 18, 544–550. Rivera-Feliciano, J., Lee, K.H., Kong, S.W., Rajagopal, S., Ma, Q., et al., 2006. Development of heart valves requires Gata4 expression in endothelial-derived cells. Development 133, 3607–3618. Rokas, A., Abbot, P., 2009. Harnessing genomics for evolutionary insights. Trends Ecol. Evol. 24, 192–200. Suyama, M., Torrents, D., Bork, P., 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612. Swanson, W.J., Wong, A., Wolfner, M.F., Aquadro, C.F., 2004. Evolutionary expressed sequence tag analysis of Drosophila female reproductive tracts identiﬁes genes subjected to positive selection. Genetics 168, 1457–1465. Thorne, J.L., Kishino, H., 2002. Divergence time and evolutionary rate estimation with multilocus data. Syst. Biol. 51, 689–702. Ueno, S., Le Provost, G., Leger, V., Klopp, C., Noirot, C., et al., 2010. Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak. BMC Genomics 11, 650. Volff, J.N., 2005. Genome evolution and biodiversity in teleost ﬁsh. Heredity (Edinb.) 94, 280–294. Wang, W., Chen, L.Q., Gu, Z.M., Peng, S.M., Li, Y.K., 2007. Analysis of morphological variations among seven populations of Erythroculter ilishaeformis. Freshw. Fish. 3, 009. Wang, J.T., Li, J.T., Zhang, X.F., Sun, X.W., 2012. Transcriptome analysis reveals the time of the fourth round of genome duplication in common carp (Cyprinus carpio). BMC Genomics 13, 96. Xue, Y., Wang, Q., Long, Q., Ng, B.L., Swerdlow, H., et al., 2009. Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree. Curr. Biol. 19, 1453–1457. Yang, Z., Bielawski, J.P., 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15, 496–503. Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., et al., 2006. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 34, W293–W297. Zhang, J.S., Zhang, W.M., Wang, M.Y., Gu, S.P., 1999. Study on artiﬁcial propagation and summer ﬁngerlings cultivating techniques of Erythroculter ilishaeformis. Fish. Sci. Technol. Inf. 3. Zhang, Z., Li, J., Zhao, X.Q., Wang, J., Wong, G.K.S., et al., 2006. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 4, 259–263. Zhao, Y.F., Liu, B., Zhou, Q.L., Wang, G.Y., He, Y.J., et al., 2009. Effect of biological ﬁlter on growth, water quality and bacteria quantity of top-mouth culter (Erythroculter ilishaeformis Bleeker). J. Guangdong Ocean Univ. 4, 009.

Transcriptome analysis reveals positive selection on the divergent between topmouth culter and zebrafish

Transcriptome analysis reveals positive selection on the divergent between topmouth culter and zebrafish

Recommend Documents