Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs

Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs

GENE-40716; No. of pages: 7; 4C: Gene xxx (2015) xxx–xxx Contents lists available at ScienceDirect Gene journal homepage: www.elsevier.com/locate/ge...

845KB Sizes 0 Downloads 27 Views

GENE-40716; No. of pages: 7; 4C: Gene xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Research paper

Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs Huibin Song a, Lihua Zhu a, Yan Li a, Changping Ma a, Kaifeng Guan a, Xuanyan Xia b, Fenge Li a,⁎ a Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture & Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, PR China b College of Informatics, Huazhong Agricultural University, Wuhan 430070, PR China

a r t i c l e

i n f o

Article history: Received 22 March 2015 Received in revised form 7 July 2015 Accepted 16 July 2015 Available online xxxx Keywords: Pig Testis Spermatogenesis RNA-sequencing

a b s t r a c t Mammalian testis development and spermatogenesis play critical roles in male fertility. However, little genomic information is available for porcine sexually mature and immature testis. Presently, we detected approximately 76% of previously annotated genes that were expressed in the porcine testes by RNA sequencing. Taking an FDR of 0.001 and a |log2Ratio| of 1 as cutoffs, 10,095 genes were significantly differentially expressed between two stages, including 242 spermatogenesis-associated genes. These genes were significantly enriched to GO BP terms concerning spermatogenesis, male gamete generation, developmental process and sexual reproduction; to the KEEG pathways, including focal adhesion, ECM–receptor interaction, and phagosome. 186 extended transcripts, 1273 alternative splicing events and 2846 SNPs were detected in spermatogenesis-associated DEGs. Two PIWIL4 SNPs were successfully validated and suggested to be the potential molecular markers for semen quality. This study will help identify the specific genes and isoforms that are active in porcine spermatogenesis and sexual maturity. © 2015 Published by Elsevier B.V.

1. Introduction Spermatogenesis is a complex process of cellular divisions and developmental changes that occur within the seminiferous tubules of the testis (de Kretser et al., 1998). The process of spermatogenesis is composed of three major phases, including mitotic process of spermatogonia to form spermatocytes, meiosis to reduce the number of chromosomes to form spermatids, and spermiogenesis in which haploid spermatids develop into spermatozoa (de Kretser et al., 1998; Ro et al., 2007). Spermatogenesis involves many cellular and molecular events unique to germ cells, such as cell growth and development, cell adhesion, signaling and cell migration (Kleene, 2001). This complex process is developmentally regulated by thousands of genes that express specifically at both transcriptional and translational levels (Yu et al., 2003; Hecht, 1998; Eddy and O'Brien, 1998). For example, approximately half of the testis-specific genes are expressed in late stage spermatogenesis, sperm (30%) and spermatids (21%), whereas

Abbreviations: VOL, semen volume per ejaculate; SCON, sperm concentration; MOT, motility; ASR, abnormal sperm rate; DEGs, differentially expressed genes; RPKM, per million mapped reads; AS, alternative splicing; SNP, single nucleotide polymorphism; QTL, quantitative trait locus. ⁎ Corresponding author at: College of Animal Science, Huazhong Agricultural University, Wuhan 430070, PR China. E-mail address: [email protected] (F. Li).

only a few are expressed in earlier stages, spermatogonia (3%) and spermatocytes (4%) (Djureinovic et al., 2014). To our knowledge, there are three approaches to study gene expression during spermatogenesis. One uses cell sorting to separate mitotic, meiotic, and postmeiotic germ cells, and characterize their gene expressions compared with somatic cells or other germ cells (Yu et al., 2003; Chalmel et al., 2007; Fallahi et al., 2010). Using a mouse 1.2 cDNA expression arrays with 1176 unique gene probes, 260 were detected during spermatogenesis in at least one of the six cell types (Yu et al., 2003). Only 46% (120 of 260 genes) were expressed in all six types, but a number of genes showed a differential expression pattern (Yu et al., 2003). The second approach is based on examining gene expression in the testis at different time points during spermatogenesis (Margolin et al., 2014; Schultz et al., 2003; Shima et al., 2004; Laiho et al., 2013). By analyzing the SOLiD 4 next-generation sequencing data from the whole testes at 5 time points (days 7, 14, 17, 21 and 28), Laiho et al. (2013) identified 2494 differentially expressed genes associated mostly with meiosis, Piwi-interacting RNA metabolism. As the wild-type testis always contains a mixed population of germ cells at different stages of their differentiation process, it is impossible to make profiling gene expression within Sertoli cells or specific germ cells. Therefore, the third approach, developed very recently, synchronizes spermatogenesis without affecting fertility by manipulating retinoic acid levels within the neonatal testis from the RiboTag transgenic mouse line that allows for the isolation of polyribosomes from specific cell types using a Cre/Lox

http://dx.doi.org/10.1016/j.gene.2015.07.057 0378-1119/© 2015 Published by Elsevier B.V.

Please cite this article as: Song, H., et al., Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.057

2

H. Song et al. / Gene xxx (2015) xxx–xxx

system (Evans et al., 2014). A microarray analysis showed that 392 and 194 transcripts were significantly changed in expression in germ and Sertoli cells during a synchronized first wave of spermatogenesis in RiboTag/Stra8-cre or RiboTag/Amh-cre mice (Evans et al., 2014). Onset of spermatogenesis appeared at boars of the typical US breeds including Large White pigs less than 120 days of age (Lunstra et al., 1997, 2003). Large White pig is one of the most common pig breeds used in the world-wide pork production. During the pre-pubertal period of development such as at 60 days of age, germ cells multiply and differentiate in a synchronous fashion in testis; but at 180 days of age, spermatozoa are produced asynchronously and the testis contains all kinds of germ cells. Therefore, the analysis of changes in gene expression based on major stages of development, especially in regard to male germ cell development, will provide a powerful tool to determine the cellular processes involved in the formation of spermatozoa. Several previous studies, based on expressed sequence tags (ESTs), microarray and RNA-sequencing (RNA-Seq), have identified an unusual and diverse set of genes expressed in testes (Margolin et al., 2014; Laiho et al., 2013). The high-throughput RNA-Seq approach has been used to detect the dynamic changes of various genes in a more sensitive and more robust way (Margolin et al., 2014). However, the transcriptomic profiles related to porcine sexual maturity are still not fully studied. Hence, we analyzed gene expression in the testis samples of 60-day-old (60-d) and 180-day-old (180-d) Large White boars using RNA deep sequencing in order to discover genes and isoforms that are active in porcine spermatogenesis and meiosis. The results from the study will allow us to ascertain differential dynamics of gene expression in swine testis, and gain a global understanding of the underlying functions of differentially expressed genes involved in spermatogenesis and sexual maturity. 2. Materials and methods 2.1. Animals Whole testes were removed from 3 young Large White boars at 60 days of age (sexually immature) and 180 days of age (sexually mature) Large White boars (Luo et al., 2010; Egbunike, 1979). Total RNAs were extracted by Trizol reagent (Invitrogen). The association analyses were conducted in Duroc (n = 186), Large White (n = 87) and Landrace (n = 50) pigs from Yangxiang Group Corporation. Genomic DNAs were isolated using standard phenol/chloroform extraction from sperm. Semen volume per ejaculate (VOL), sperm concentration (SCON), motility (MOT) and abnormal sperm rate (ASR) of above boars were recorded. 2.2. Construction and sequencing of cDNA libraries After the total RNA extraction and DNase I treatment, the pools were made up of equally mixed RNAs of 3 testes at the same age. Magnetic beads with Oligo (dT) were used to isolate mRNAs. Simply mixed with the fragmentation buffer (Ambion), the mRNA was fragmented into short fragments of 200–300 nt. Then cDNAs were synthesized, purified and resolved with EB buffer for end reparation and single nucleotide A (adenine) addition. After that, the short cDNA fragments were connected with Illumina PE adapters. After size-selected on agarose gel, the libraries were amplified by 15 cycles of PCR with Phusion DNA polymerase (New England Biolabs) and primers containing barcode sequences to distinguish different libraries during sequencing and data analysis. During the quality control steps, Agilent 2100 Bioanaylzer and ABI StepOnePlus Real-Time PCR System were used in quantification and qualification of the sample library. Finally, the library was sequenced using Illumina HiSeqTM 2000 sequencer. Raw reads were filtered into clean reads which were then aligned to the reference swine genome Sscrofa10.2 with SOAPaligner/SOAP2. No more than 5 mismatches were allowed in the alignment. After that,

we proceeded with the deep analyses including gene expression, gene structure refinement, alternative splicing (AS) and SNP analysis. 2.3. Expression analysis Gene expression levels were measured as numbers of reads per kilobase of exon model in a gene per million mapped reads (RPKM). Differentially expressed genes (DEGs) between 60-d and 180-d testes were identified with an R package named DEGseq. The genes with false discovery rate (FDR) ≤0.001 and |log2Ratio| ≥ 1 were taken as DEGs. 2.4. Functional enrichment analysis To annotate the function of these DEGs, we firstly mapped all DEGs to GO terms in the database (http://www.geneontology.org/), calculating gene numbers for every term. Then in order to find significantly enriched GO terms in the input list of DEGs, we used a strict in house algorithm, developed by BGI which was based on GO::TermFinder (Boyle et al., 2004) (http://smd.stanford.edu/help/GO-TermFinder/ GO_TermFinder_help.shtml/), to do hypergeometric test. KEGG was used to perform pathway enrichment analysis of DEGs (Kanehisa et al., 2014). Only GO-BP terms or KEGG pathways with a P value less than 0.05 were considered as significant. 2.5. Gene structure refinement We assembled transcripts with the reads using Cufflink (Trapnell et al., 2010). Through comparisons of assembled transcripts and gene annotation from swine genome assembly Sscrofa10.2, the assembled transcript(s) that extended 5′ or 3′ end of an annotated gene were found, and therefore that gene structure was refined. 2.6. Alternative splicing SOAPsplice (v1.1) was used for genome-wide ab initio detection of both known and novel splice junction sites from RNA-Seq (Huang et al., 2011). All initially unmapped reads were searched to find a junction alignment. Splice junctions were utilized to identify alternative spliced transcripts. 2.7. SNP identification SOAPsnp is the tool to detect SNPs (Langmead et al., 2009). The SNPs were identified on the consensus sequence through comparisons with the reference. We then compared the SNPs between 60-d and 180-d, and got the age-specific SNPs. As we focused on the spermatogenesisassociated mutations, we filtered with the chromosomal positions of the age-specific SNPs against those of spermatogenesis associated DEGs, and retained those variants which mapped to spermatogenesis associated DEGs. 2.8. Real-time RT-PCR Total RNAs were reverse transcribed into cDNAs by the M-MLV Reverse Transcriptase (Promega). Each 15 μl real-time RT-PCR reaction included 7.5 μl SYBR Green Real-time PCR Master Mix, 1 μl cDNAs, 0.4 μl primers (S1 Table). PCR conditions consisted of 1 cycle at 94 °C for 5 min, followed by 40 cycles at 94 °C for 40 s, 57 °C for 30 s, and 72 °C for 30 s, with fluorescence acquisition at 74 °C in single mode. The relative expression level was determined using the 2−ΔΔCt method with β-actin as the control. 2.9. SNP validations and population association analyses SNPs were validated by PCR-RFLPs using the primers listed in S1 Table. The associations between genotypes and semen quality traits in

Please cite this article as: Song, H., et al., Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.057

H. Song et al. / Gene xxx (2015) xxx–xxx

pigs were evaluated with the general linear model (GLM) procedure of SAS version 8.0. Both additive and dominance effects were also estimated using regression (REG) procedure. The model: Y ij = μ + genotypei + farmj + eij , where Yij is the observation of semen quality trait, μ is the overall population mean; genotypei is the fixed effect of the ith genotype (i = 1, 2 and 3); farmj is the fixed effect of the jth farm (j = 1, 2); and eij is the random residual (Ma et al., 2013).

3. Results 3.1. An overview of the sequenced RNAs from the pig testes We obtained a number of 10.94 and 11.02 millions of clean reads from samples prepared at 60-d and 180-d, respectively (Table 1). The total length of the reads was over 9.84 gigabases (Gbs), representing 3.7-fold of the whole pig genome (2.7 Gbs). 77.40% (84,705,790/ 109,435,712) and 72.14% (79,537,649/110,248,938) of the total reads were mapped to the pig genome. Of which, 65.57–70.34% were uniquely mapped, while 6.58–7.06% showed multiple matches. A majority of the reads were located in annotated genic regions, with 45.37% for 60-d and 52.13% for 180-d, respectively. Of all 25,322 annotated pig genes, 19,245 and 19,167 were expressed in 60-d and 180-d testes, respectively (S2 Table).

3

3.3. GO and pathway analysis of DEGs DEGs were undergoing GO analysis, and 5995 of them were assigned to 34 GO BP terms with P b 0.05 (Table 2). The DEGs were significantly enriched to GO BP terms concerning spermatogenesis (GO:0007283), male gamete generation (GO:0048232), developmental process (GO:0032502) and sexual reproduction (GO:0019953) (Table 2). There were 242 spermatogenesis-associated DEGs, including Piwi-like (PIWIL), spermatogenesis associated (SPATA) and KIT genes (S2 Table). PIWIL4 (ENSSSCG00000014959) mRNA level was elevated in testes of 60-d piglets when compared to 180-d males. SPATA3 (ENSSSCG00000016269), SPATA4 (ENSSSCG00000015767), SPATA6 (ENSSSCG00000003881), SPATA9 (ENSSSCG00000030114), SPATA9-like (ENSSSCG00000014142), SPATA19 (ENSSSCG00000025612) and SPATA25 (ENSSSCG00000007429) were up-regulated in the adult boars (S2 Table). These DEGs were enriched to the pathways including focal adhesion (ko04510), ECM-receptor interaction (ko04512), and phagosome (ko04145) (S3 Table). In our study, the down-regulated FAK, c-Jun and Bcl-2 in 180-d testes, were closely related to the germ cell proliferation and survival (S2 Table).

3.2. Identification and validation of DEGs We quantified the expression levels of all genes in our RNA-Seq data set by determining RPKM values (Howe et al., 2011). A total of 20,002 annotated genes were detected with RPKM N 0 at least in one sample (S2 Table). Taking an FDR of 0.001 and a |log2Ratio| of 1 as cutoffs, 10,095 genes showed significantly different expression patterns between 60-d and 180-d porcine testes. In 180-d boars, 2 sets of 5199 and 4896 transcripts belonged to the up-regulated and down-regulated group, respectively. To confirm the gene expression pattern identified by RNA sequencing, 9 genes including BCL2-associated athanogene 6 (BAG6), dynactin (DCTN), inhibin, alpha (INHA), inhibin, beta A (INHBA), protein kinase, cAMP-dependent, catalytic, alpha (PRKACA), protein kinase, cAMPdependent, regulatory, type I, alpha (PRKAR1A), sp1 transcription factor (SP1) and testis-specific kinase 2 (TESK2), Piwi-like RNA-mediated gene silencing 4 (PIWIL4) were selected to be assessed by real-time RT-PCR in the samples prepared from 60-d and 180-d testes. The results showed that expression patterns of 8 genes were consistent with those of the RNA sequencing results with a contradiction expression profile in DCTN gene (Pearson correlation coefficient = 0.96, P = 3.31E − 05) (Fig. 1A, B).

Table 1 Summary of reads and matches. Group

Total reads Total base pairs Mapped reads Perfect match ≤5 bp mismatch Unique match Multi-position match Unmapped reads

60-d

180-d

Map to genome

Map to gene

Map to genome

Map to gene

109,435,712 9,849,214,080 84,705,790 53,013,946 31,691,844 76,977,819 7,727,971

109,435,712 9,849,214,080 49,653,357 33,563,849 16,089,508 46,367,396 3,285,961

110,248,938 9,922,404,420 79,537,649 54,555,983 24,981,666 72,287,227 7,250,422

110,248,938 9,922,404,420 57,467,665 42,934,680 14,532,985 53,624,075 3,843,590

24,729,922

59,782,355

30,711,289

52,781,273

Fig. 1. Nine mRNA expression profiles in the porcine testes. (A) Nine mRNA expressions were measured using q-PCR. The X-axis represents the mRNAs and the Y-axis shows the relative expression levels of small RNAs (2−ΔΔCt values for qRT-PCR). The significance of differences for the expression between 60-day (sexually immature) and 180-day (sexually mature) testes of Large White pigs was calculated using two-tailed T-test. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. (B) Correlation of fold change values measured using q-PCR and RNA-Seq methods.

Please cite this article as: Song, H., et al., Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.057

4

H. Song et al. / Gene xxx (2015) xxx–xxx

Table 2 GO analysis of the significantly differentially expressed transcripts in 60-d and 180-d testes. GO ID

Gene Ontology term

Counta Corrected P valueb

GO:0007283 GO:0048232 GO:0032502 GO:0007276 GO:0019953 GO:0007275 GO:0048856 GO:0048869 GO:0044767 GO:0032504 GO:0048609 GO:0030154 GO:0048731 GO:0044702 GO:0006066 GO:0009653 GO:0048513 GO:0007507 GO:0022414 GO:0000003 GO:0003341 GO:0051239 GO:0003351 GO:0046165 GO:1901615

Spermatogenesis Male gamete generation Developmental process Gamete generation Sexual reproduction Multicellular organismal development Anatomical structure development Cellular developmental process Single-organism developmental process Multicellular organism reproduction Multicellular organismal reproductive process Cell differentiation System development Single organism reproductive process Alcohol metabolic process Anatomical structure morphogenesis Organ development Heart development Reproductive process Reproduction Cilium movement Regulation of multicellular organismal process Epithelial cilium movement Alcohol biosynthetic process Organic hydroxy compound metabolic process Positive regulation of response to stimulus Regulation of cellular component movement Spermatid development Lipid metabolic process Spermatid differentiation Cilium morphogenesis Regulation of response to stimulus Regulation of localization Regulation of cell communication

242 242 2012 285 320 1805 1783 1264 1555 357 353 1194 1540 381 152 933 1081 224 674 710 25 907 22 57 178

3.29E−18 3.29E−18 3.98E−12 6.00E−11 1.97E−10 2.59E−10 3.74E−09 1.21E−08 2.14E−08 3.35E−08 3.99E−08 2.48E−06 3.49E−06 4.48E−06 1.16E−05 0.0001 0.0006 0.0013 0.0014 0.0029 0.0035 0.0039 0.0042 0.0053 0.0058

556 280 50 475 51 73 1108 747 1003

0.0081 0.0108 0.0169 0.0181 0.0219 0.0282 0.0294 0.0310 0.0448

GO:0048584 GO:0051270 GO:0007286 GO:0006629 GO:0048515 GO:0060271 GO:0048583 GO:0032879 GO:0010646 a b

The DE gene members, which belong to an annotation term. The Bonferroni correction P value representing the degree of enrichment of the GO term.

3.4. DEG structure refinement As the swine genome is not fully annotated yet, we here refined gene structure with the reads to further complete gene annotation. We assembled transcripts with reads using Cufflink (Trapnell et al., 2010). Through comparisons of assembled transcripts and an annotated gene from reference information, 6827 transcripts from 5804 unique genes, and 6433 transcripts from 5536 unique genes could be extended at the 5′ or 3′ ends from data generated from 60-d and 180-d testes, respectively (S4 Table). One hundred and eighty six spermatogenesisassociated differentially expressed transcripts including BAG6 and PIWIL4 were extended (S4 Table). Twenty four extended events from 11 BAG6 transcripts were detected with 12 extended in both 5 and 3 flanking regions, respectively (S4 Table), which may cause some changes in BAG6 protein structure and expression level.

3.5. Alternative splicing and gene expression Alternative transcript isoforms are believed to be a principal driving force for the evolution of the complex transcriptome of mammals. We used SOAPsplice to explore potential alternative splicing events. In total, 7588 and 7466 porcine testis-expressed genes had evidence for 30,243 and 29,204 AS events in 60-d and 180-d testes (Fig. 2A). At least 39.43% and 38.95% (7588/19,245 and 7466/19,167) of testisexpressed genes were alternatively spliced. These AS events were categorized into seven known types according to the structures of exons. Intron retention was the most prevalent form of alternative splicing in 60-d testes (40% of the AS events), while intron

retention, alternative 5′ splice site, alternative 3′ splice site (A3S) and exon skipping shared relative high frequencies in 180-d testes (Fig. 2A). We next analyzed to what extent do AS events were observed specifically in one stage. One thousand two hundred and seventy three (1273) AS events in this data set were age-specific and spermatogenesis-associated (S5 Table). Sperm-associated antigen 6 (SPAG6, ENSSSCG00000011080) had two A3Ss (58159904–58160044:1570.801418:1, 58226471– 58226625:1441.935484:1) and a skipped exon 10 (58159899– 58160044:1521.13:1) event in data from 180-d testes (S5 Table, Fig. 2B). The skip of exon 10 led to a huge change in the amino acid sequence. RT-PCR on pig RNAs from the above Large White pigs with RNA-Seq data showed the evidence for the skipped exon isoform (Fig. 2C). 3.6. Novel SNP identification in the DEGs We identified 19,996 and 15,964 SNPs in the sequence originated from testes of 60-d and 180-d pigs compared with the reference genome, respectively. One thousand two hundred and twenty four (1224) and 1622 SNPs, from 106 and 174 spermatogenesis associated DEGs, were fixed in 60-d and 180-d, respectively (S6 Table). Two PIWIL4 SNPs (g.572 TNC, g.561 ANG) detected by RNA-Seq were validated by PCR-RFLPs, and then their associations with semen quality were analyzed in Duroc, Large White and Landrace populations (S7 Table). The results showed that PIWIL4 g.572 CT boars had significantly higher semen volume than TT boars (P b 0.05) in Landrace pigs, with allelic T substitution effect − 26.77 ml; Duroc PIWIL4 g.561 AA boars had significantly higher sperm concentration than those of GG boars (P b 0.05). 4. Discussion Examination of the transcriptome can provide gene expression data, which can be used to detect expression signatures associated with phenotypes of interest, for example reproduction. The primary function of the male sex organ testis is responsible for the production of sperm (spermatogenesis). Spermatogenesis is mainly regulated by genes uniquely expressed at different developmental stages. The development of RNA-Seq allows for characterization of the porcine testis with high resolution. We here analyzed gene expression in the testes from sexually immature (60-d) and sexually mature (180-d) Large White boars using high throughput RNA deep sequencing, in order to discover genes and isoforms that are likely to be involved in testicular functions, especially spermatogenesis. 4.1. RNA profiling analysis of porcine testes from Large White boars By analyzing the RNA-Seq data from the porcine testes of Large White pigs, 19,245 and 19,167 were expressed in 60-d and 180-d testes, respectively (S2 Table). Such a high proportion (76%) of expressed genes may be partially contributed by the presence of many cell types including Sertoli cells, Leydig's cells and germ cells in the pig testis. Although the vast majority of genes were expressed in our samples, 5320 genes including 2083 protein coding genes were not detected. Olfactory receptor (OLFR) genes were highly enriched among those non-expressed genes (P = 2.1E−17), possibly because most olfactory receptors are specifically expressed in olfactory neurons (Margolin et al., 2014; Fleischer et al., 2009). The top five genes with the highest transcribed level in both stages were cytochrome c oxidase subunit I (MT-COX1), cytochrome c oxidase subunit 3 (MT-COX3), clusterin (CLU), cytochrome c oxidase subunit 2 (MT-COX2) and ATP synthase subunit 6 (MT-ATP6), in which all were associated with mitochondrial function. Clusterin inhibits c-Myc-mediated apoptosis by interacting with conformation-altered Bax in mitochondria (Zhang et al., 2005). Those mitochondrial genes were enriched in the porcine testis, possibly

Please cite this article as: Song, H., et al., Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.057

H. Song et al. / Gene xxx (2015) xxx–xxx

5

Fig. 2. AS events observed by RNA-Seq and the validation by RT-PCR in 60-day (sexually immature) and 180-day (sexually mature) testes of Large White pigs. (A) Predicted alternative splicing (AS) events in pigs. (B) Gene model for ENSSSCT00000012127 (annotated as encoding SPAG6) from Sus scrofa genome assembly 10.2 (http://www.ensembl.org/Sus_scrofa/Info/Index). Exons are numbered 1–11. The 10th exon is skipped in a splice variant. (C) RT-PCR analysis of SPAG6 transcripts in RNA samples from porcine 60-d and 180-d testes. Marker: DL2000.

because the mitochondrial-mediated pathway played an important role in maintaining the germ cell population (Vaithinathan et al., 2010). 4.2. DEGs in porcine sexually immature and mature testes Taking an FDR of 0.001 and a |log2Ratio| of 1 as cutoffs, 10,095 genes showed significantly different expression patterns between 60-d and 180-d porcine testes. We then assigned these DEGs to 34 GO BP terms with P b 0.05 (Table 2). 242 DEGs were spermatogenesis associated, including Piwi-like (PIWIL), spermatogenesis associated (SPATA) and KIT genes (S2 Table). PIWIL4 belongs to the Argonaute family of proteins, which function in development and maintenance of germline stem cells (Aravin et al., 2008). Conditional inactivation of PIWIL4 (MIWI2) reveals that PIWIL4 (MIWI2) is only essential for prospermatogonial development in mice (Bao et al., 2014). Mouse Piwi-like proteins associate with piRNAs of characteristic size—26 nt for MILI, 28 nt for PIWIL4 (MIWI2), and 30 nt for MIWI. piRNAs derived from ping-pong biogenesis could be loaded onto MIWI2, then MIWI2/piRNA complex may enter the nucleus to direct transcriptional silencing of transposons (Aravin et al., 2008; De Fazio et al., 2011). PIWIL4 (ENSSSCG00000014959) mRNA level was elevated in testes of 60-d piglets when compared to 180-d males. Similar expression pattern was found in testes of 3-day-old piglets compared

to 2-year males (Kowalczykiewicz et al., 2012). Spermatogenesis associated (SPATA) genes are a large gene family which plays a very important role in testis development and spermatogenesis. SPATA3 and SPATA6 mRNA highest levels were detected in spermatids (Wu et al., 2010). SPATA19 expression increased gradually over time during mouse testis development (Matsuoka et al., 2004). SPATA4 expression in the testis can only be detected in the rats elder than 30-d, and the expression increase till 65-d (Liu et al., 2004). Consistent with above studies, here SPATA3 (ENSSSCG00000016269), SPATA4 (ENSSSCG00000015767), SPATA6 (ENSSSCG00000003881), SPATA9 (ENSSSCG00000030114), SPATA9-like (ENSSSCG00000014142), SPATA19 (ENSSSCG00000025612) and SPATA25 (ENSSSCG00000007429) were up-regulated in the adult boars (S2 Table). These DEGs were enriched to the pathways including focal adhesion (ko04510), ECM–receptor interaction (ko04512) and phagosome (ko04145) (S3 Table). Focal adhesion can sense the extracellular matrix (ECM), cell surface, physiological stress and mechanical stress through integrin, caveolin and receptor tyrosine kinase (RTK) (Siu et al., 2009; Zhang et al., 2013). The responses of focal adhesion pathway included cell growth or death, cell motility, cytoskeleton reorganization and gene expression. Focal adhesion kinase (FAK) is likely to be involved in the events of Sertoli–Sertoli and Sertoli–germ cell adhesion, as well as germ cell transport during spermatogenesis because of active

Please cite this article as: Song, H., et al., Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.057

6

H. Song et al. / Gene xxx (2015) xxx–xxx

restructuring at the cell junctions (Zhang et al., 2013). In accordance with our present results, focal adhesion pathway was also the most significantly enriched for the differentially expressed genes between GC-1spg and GC-2spd (Zhang et al., 2013). In our study, the downregulated FAK, c-Jun and Bcl-2 in 180-d testes, was closely related to the germ cell proliferation and survival (S2 Table). 4.3. Gene structure refinement and novel isoform identification Recent progress in DNA sequencing technologies is making it possible to determine thousands of porcine genomic sequences (Ai et al., 2015). However, the annotation of each genome was not fully explored. Here 186 spermatogenesis-associated differentially expressed transcripts including BAG6 and PIWIL4 were extended (S4 Table). BAG6 inactivation in mice resulted in meiotic male GC apoptosis and male infertility (Sasaki et al., 2008). Twenty four extended events from 11 BAG6 transcripts were detected with 12 extended in both 5 and 3 flanking regions, respectively (S4 Table), which may cause some changes in BAG6 protein structure and expression level. Previous studies have shown that alternative splicing is critical for testicular development and spermatogenesis (Elliott and Grellscheid, 2006). By analyzing the RNA-Seq data from the porcine testes, we found that 39.43% and 38.95% (7588/19,245, 7466/19,167) of testisexpressed genes were alternatively spliced. Recent evidence based on next-generation sequencing indicated a high incidence of AS_86% of annotated genes in human, possibly because 15 diverse human tissue and cell line transcriptome data were used (Wang et al., 2008). Thirty percent of human ESTs in the testis were recorded to undergo alternative splicing, a frequency just less than that in the brain (Yeo et al., 2004). A comprehensive microarray study of human exons showed that after brain, testis has the highest number of tissue-specific exons (Clark et al., 2007). These AS events were categorized into seven known types according to the structures of exons. Intron retention was the most prevalent form of alternative splicing in 60-d testes (40% of the AS events), while intron retention, alternative 5′ splice site, alternative 3′ splice site (A3S) and exon skipping shared relative high frequencies in 180-d testes (Fig. 2A). High prevalence of intron retention was also found in other mammalian transcriptomes with integrated RNA-Seq data from more than 40 cell and tissue types of human and mouse (Braunschweig et al., 2014). We next analyzed to what extent AS events observed specifically in one stage. One thousand two hundred and seventy three (1273) AS events in this data set were age-specific and spermatogenesis-associated (S5 Table). Sperm-associated antigen 6 (SPAG6, ENSSSCG00000011080) had two A3Ss (58159904–58160044:1570.801418:1, 58226471– 58226625:1441.935484:1) and a skipped exon 10 (58159899– 58160044:1521.13:1) event in data from 180-d testes (S5 Table, Fig. 2B). The skip of exon 10 led to a huge change in the amino acid sequence. SPAG6 is important for sperm flagellar motility and mature sperm structural integrity (Sapiro et al., 2002). Alternatively spliced SPAG6 variants that encode different protein isoforms have been described in humans (http://www.ncbi.nlm.nih.gov/gene?term=9576), indicating the conservation property of alternative splicing in mammalians. RT-PCR on pig RNAs from above Large White pigs with RNA-Seq data showed the evidence for the skipped exon isoform (Fig. 2C). 4.4. Discovery of SNPs for semen quality traits Spermatogenesis, the process of producing sperm, is vital for semen quality and male fertility. Sperm concentration, motility and abnormal sperm rate have usually been used as criteria for semen quality evaluation (Colenbrander et al., 1993). A genome-wide scan was performed in the White Duroc × Erhualian intercross and 18 QTL for semen quality traits were detected, including 4 genome-wide significant QTL each for semen pH on SSC2 and SSC12, for semen volume on SSC15, and for ejaculation times on SSC17 (Xing et al., 2009). A single-SNP genome-

wide association study detected six significant SNPs on SSC1 (position 117.26-119.56Mb) in animals from a Large White-based pig population (Diniz et al., 2014). Candidate gene approach was also used to identify the genetic markers for semen quality, such as deleted in azoospermialike (DAZL) (Ma et al., 2013) and phosphoglycerate kinase 2 (PGK2) gene (Chen et al., 2004). Presently, RNA-Seq provided an efficient means to identify 19,996 and 15,964 genetic variations in the expressed portion of the genome from testes of 60-d and 180-d pigs compared with the reference genome, respectively. Furthermore, g.572 TNC and g.561 ANG in a spermatogenesis associated DEG (PIWIL4) were associated with semen quality traits in three commercial pig populations. This is the first report to analyze the PIWIL4 SNPs with semen quality traits, and PIWIL4 gene might be a potential molecular marker related to the reproductive traits in pig. 5. Conclusions The article described a genome-wide comparison between porcine transcriptomes derived from the 60-day-old and 180-day-old porcine testes by RNA-sequencing. We identified 10,095 genes significantly differentially expressed including 242 spermatogenesis-associated. We then analyzed SNPs, extended transcripts and AS events in the spermatogenesis-associated DEGs. The results of our study identified comprehensive transcriptome changes and provided new insights in the differently expressed genes that associated spermatogenesis. Meanwhile, the genes and isoforms that are active in spermatogenesis and meiosis offer a rich resource for further targeted studies. Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.gene.2015.07.057. Funding This work was supported financially by Key Projects in Doctoral Fund of Ministry of Education of China (20120146110018), National Science R&T Program (2014BAD20B01), Hubei Science R&T Program (2014BBB008, 2014BBA194), Fok Ying Tung Education Foundation (131028), and Fundamental Research Funds for the Central Universities (2013PY049). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests The authors have declared that no competing interests exist. Author contributions Conceived and designed the experiments: FL. Performed the experiments: HS. Analyzed the data: FL, HS, LZ, YL and XX. Contributed reagents/materials/analysis tools: HS, LZ, CM and KG. Wrote the manuscript: FL and HS. Acknowledgments We thank Dr. Jiuzhou Song (University of Maryland) and Prof. Chunyan Mou (Huazhong Agricultural University) for their efforts in revising this manuscript. We are grateful to Beijing Genomics Institute (Shenzhen) for technical assistance with RNA extraction and RNA sequencing. The authors also acknowledge the staff of the pig farm of Huazhong Agricultural University for providing pig samples. References Ai, H., Fang, X., Yang, B., Huang, Z., Chen, H., et al., 2015. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat. Genet. 47, 217–225.

Please cite this article as: Song, H., et al., Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.057

H. Song et al. / Gene xxx (2015) xxx–xxx Aravin, A.A., Sachidanandam, R., Bourc'his, D., Schaefer, C., Pezic, D., et al., 2008. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice. Mol. Cell 31, 785–799. Bao, J., Zhang, Y., Schuster, A.S., Ortogero, N., Nilsson, E.E., et al., 2014. Conditional inactivation of Miwi2 reveals that MIWI2 is only essential for prospermatogonial development in mice. Cell Death Differ. 21, 783–796. Boyle, E.I., Weng, S., Gollub, J., Jin, H., Botstein, D., et al., 2004. GO: TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20, 3710–3715. Braunschweig, U., Barbosa-Morais, N.L., Pan, Q., Nachman, E.N., Alipanahi, B., et al., 2014. Widespread intron retention in mammals functionally tunes transcriptomes. Genome Res. 24, 1774–1786. Chalmel, F., Rolland, A.D., Niederhauser-Wiederkehr, C., Chung, S.S.W., Demougin, P., et al., 2007. The conserved transcriptome in human and rodent male gametogenesis. Proc. Natl. Acad. Sci. U. S. A. 104, 8346–8351. Chen, K., Knorr, C., Moser, G., Gatphayak, K., Brenig, B., 2004. Molecular characterization of the porcine testis-specific phosphoglycerate kinase 2 (PGK2) gene and its association with male fertility. Mamm. Genome 15, 996–1006. Clark, T.A., Schweitzer, A.C., Chen, T.X., Staples, M.K., Lu, G., et al., 2007. Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol. 8, R64. Colenbrander, B., Feitsma, H., Grooten, H.J., 1993. Optimizing semen production for artificial insemination in swine. J. Reprod. Fertil. Suppl. 48, 207–215. De Fazio, S., Bartonicek, N., Di Giacomo, M., Abreu-Goodger, C., Sankar, A., et al., 2011. The endonuclease activity of Mili fuels piRNA amplification that silences LINE1 elements. Nature 480, 259–263. de Kretser, D.M., Loveland, K.L., Meinhardt, A., Simorangkir, D., Wreford, N., 1998. Spermatogenesis. Hum. Reprod. 13 (Suppl. 1), S1–S8. Diniz, D.B., Lopes, M.S., Broekhuijse, M.L., Lopes, P.S., Harlizius, B., et al., 2014. A genomewide association study reveals a novel candidate gene for sperm motility in pigs. Anim. Reprod. Sci. 151, 201–207. Djureinovic, D., Fagerberg, L., Hallström, B., Danielsson, A., Lindskog, C., et al., 2014. The human testis-specific proteome defined by transcriptomics and antibody-based profiling. Mol. Hum. Reprod. 20, 476–488. Eddy, E.M., O'Brien, D.A., 1998. Gene expression during mammalian meiosis. Curr. Top. Dev. Biol. 37, 141–200. Egbunike, G.N., 1979. Development of puberty in Large White boars in a humid tropical environment. Acta Anat. (Basel) 104, 400–405. Elliott, D.J., Grellscheid, S.N., 2006. Alternative RNA splicing regulation in the testis. Reproduction 132, 811–819. Evans, E., Hogarth, C., Mitchell, D., Griswold, M., 2014. Riding the spermatogenic wave: profiling gene expression within neonatal germ and sertoli cells during a synchronized initial wave of spermatogenesis in mice. Biol. Reprod. 90 (108), 1–12. Fallahi, M., Getun, I.V., Wu, Z.K., Bois, P.R.J., 2010. A global expression switch marks pachytene initiation during mouse male meiosis. Genes 1, 469–483. Fleischer, J., Breer, H., Strotmann, J., 2009. Mammalian olfactory receptors. Front. Cell. Neurosci. 3 (9), 1–10. Hecht, N.B., 1998. Molecular mechanisms of male germ cell differentiation. Bioessays 20, 555–561. Howe, E.A., Sinha, R., Schlauch, D., Quackenbush, J., 2011. RNA-Seq analysis in MeV. Bioinformatics 27, 3209–3210. Huang, S., Zhang, J., Li, R., Zhang, W., He, Z., et al., 2011. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data. Front. Genet. 2, 46,1–46,12. Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., et al., 2014. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 42 (Database issue), D199–D205. Kleene, K.C., 2001. A possible meiotic function of the peculiar patterns of gene expression in mammalian spermatogenic cells. Mech. Dev. 106, 3–23. Kowalczykiewicz, D., Pawlak, P., Lechniak, D., Wrzesinski, J., 2012. Altered expression of porcine Piwi genes and piRNA during development. PLoS One 7, e43816. Laiho, A., Kotaja, N., Gyenesei, A., Sironen, A., 2013. Transcriptome profiling of the murine testis during the first wave of spermatogenesis. PLoS One 8, e61558.

7

Langmead, B., Schatz, M.C., Lin, J., Pop, M., Salzberg, S.L., 2009. Searching for SNPs with cloud computing. Genome Biol. 10, R134. Liu, S.F., He, S., Liu, B.W., Zhao, Y., Wang, Z., 2004. Cloning and characterization of testis-specific spermatogenesis associated gene homologous to human SPATA4 in rat. Biol. Pharm. Bull. 27, 1867–1870. Lunstra, D.D., Ford, J.J., Klindt, J., Wise, T.H., 1997. Physiology of the Meishan boar. J. Reprod. Fertil. Suppl. 52, 181–193. Lunstra, D.D., Wise, T.H., Ford, J.J., 2003. Sertoli cells in the boar testis: changes during development and compensatory hypertrophy after hemicastration at different ages. Biol. Reprod. 68, 140–150. Luo, L., Ye, L., Liu, G., Shao, G., Zheng, R., et al., 2010. Microarray-based approach identifies differentially expressed microRNAs in porcine sexually immature and mature testes. PLoS One 5, e11744. Ma, C., Li, J., Tao, H., Lei, B., Li, Y., et al., 2013. Discovery of two potential DAZL gene markers for sperm quality in boars by population association studies. Anim. Reprod. Sci. 143, 97–101. Margolin, G., Khil, P.P., Kim, J., Bellani, M.A., Camerini-Otero, R.D., 2014. Integrated transcriptome analysis of mouse spermatogenesis. BMC Genomics 15, 39,1–39,19. Matsuoka, Y., Iguchi, N., Kitamura, K., Nishimura, H., Manabe, H., et al., 2004. Cloning and characterization of a mouse spergen-1 localized in sperm mitochondria. Int. J. Androl. 27, 152–160. Ro, S., Park, C., Song, R., Nguyen, D., Jin, J., et al., 2007. Cloning and expression profiling of testis-expressed piRNA-like RNAs. RNA 13, 1693–1702. Sapiro, R., Kostetskii, I., Olds-Clarke, P., Gerton, G.L., Radice, G.L., et al., 2002. Male infertility, impaired sperm motility, and hydrocephalus in mice deficient in sperm-associated antigen 6. Mol. Cell. Biol. 22, 6298–6305. Sasaki, T., Marcon, E., McQuire, T., Arai, Y., Moens, P.B., et al., 2008. Bat3 deficiency accelerates the degradation of Hsp70-2/HspA2 during spermatogenesis. J. Cell Biol. 182, 449–458. Schultz, N., Hamra, F.K., Garbers, D.L., 2003. A multitude of genes expressed solely in meiotic or postmeiotic spermatogenic cells offers a myriad of contraceptive targets. Proc. Natl. Acad. Sci. U. S. A. 100, 12201–12206. Shima, J.E., McLean, D.J., McCarrey, J.R., Griswold, M.D., 2004. The murine testicular transcriptome-characterizing gene expression in the testis during the progression of spermatogenesis. Biol. Reprod. 71, 319–330. Siu, E.R., Wong, E.W., Mruk, D.D., Porto, C.S., Cheng, C.Y., 2009. Focal adhesion kinase is a blood–testis barrier regulator. Proc. Natl. Acad. Sci. U. S. A. 106, 9298–9303. Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., et al., 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515. Vaithinathan, S., Saradha, B., Mathur, P.P., 2010. Methoxychlor induces apoptosis via mitochondria- and FasL-mediated pathways in adult rat testis. Chem. Biol. Interact. 185, 110–118. Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., et al., 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476. Wu, Q., Song, R., Yan, W., 2010. SPATA3 and SPATA6 interact with KLHL10 and participate in spermatogenesis. Biol. Reprod. 83, 177. Xing, Y., Ren, J., Ren, D., Guo, Y., Wu, Y., et al., 2009. A whole genome scanning for quantitative trait loci on traits related to sperm quality and ejaculation in pigs. Anim. Reprod. Sci. 114, 210–218. Yeo, G., Holste, D., Kreiman, G., Burge, C.B., 2004. Variation in alternative splicing across human tissues. Genome Biol. 5, R74. Yu, Z., Guo, R., Ge, Y., Ma, J., Guan, J., et al., 2003. Gene expression profiles in different stages of mouse spermatogenic cells during spermatogenesis. Biol. Reprod. 69, 37–47. Zhang, H., Kim, J.K., Edwards, C.A., Xu, Z., Taichman, R., et al., 2005. Clusterin inhibits apoptosis by interacting with activated Bax. Nat. Cell Biol. 7, 909–915. Zhang, X., Hao, L., Meng, L., Liu, M., Zhao, L., et al., 2013. Digital gene expression tag profiling analysis of the gene expression patterns regulating the early stage of mouse spermatogenesis. PLoS One 8, e58680.

Please cite this article as: Song, H., et al., Exploiting RNA-sequencing data from the porcine testes to identify the key genes involved in spermatogenesis in Large White pigs, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.057