Biochemical and Biophysical Research Communications xxx (2018) 1-7
Biochemical and Biophysical Research Communications journal homepage: www.elsevier.com/locate/ybbrc
Dynamics of alternative polyadenylation in human preimplantation embryos
Jen-Yun Chang a, Wen-Hsuan Yu b, Hsueh-Fen Juan a, b, c, *, Hsuan-Cheng Huang d, **
a Institute of Molecular and Cellular Biology, National Taiwan University, Taipei 10617, Taiwan
b Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan
c Department of Life Science, National Taiwan University, Taipei 10617, Taiwan
d Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan
Article info
Article history: Received 24 August 2018; Accepted 6 September 2018; Available online xxx
Keywords: Alternative polyadenylation (APA); 3' untranslated region (3'-UTR); lncRNA; Preimplantation embryo development; RNA-seq

Abstract
Alternative polyadenylation (APA) affects the length of the 3' untranslated region (3'-UTR) and the regulation of microRNAs. Previous studies have shown that cancer cells tend to have shorter 3'-UTRs than normal cells. A plausible explanation for this is that it enables cancer cells to escape the regulation of microRNAs. Here, we extend this concept to an opposing context: changes in 3'-UTR length in the development of the human preimplantation embryo. Unlike cancer cells, during early development 3'-UTRs tended to become longer, and gene expression was negatively correlated with 3'-UTR length. Moreover, our functional enrichment results showed that length changes are part of the development mechanism. We also investigated the analogy of 3'-UTR length variation with respect to lncRNAs and found that, similarly, lncRNA length tended to increase during embryo development.
© 2018 Elsevier Inc. All rights reserved.
* Corresponding author. Department of Life Science, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan.
** Corresponding author. Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Taipei 11221, Taiwan.
E-mail addresses: [email protected] (H.-F. Juan), [email protected] (H.-C. Huang).
https://doi.org/10.1016/j.bbrc.2018.09.027
0006-291X/© 2018 Elsevier Inc. All rights reserved.
Please cite this article in press as: J.-Y. Chang, et al., Dynamics of alternative polyadenylation in human preimplantation embryos, Biochemical and Biophysical Research Communications (2018), https://doi.org/10.1016/j.bbrc.2018.09.027

1. Introduction
Human genes can possess multiple polyadenylation sites. Their differential use, known as alternative polyadenylation (APA), can result in mRNA isoforms with different coding regions or can affect the lengths of 3' untranslated regions (3'-UTRs; Fig. 1A) [1]. Differences in 3'-UTR length can affect regulation by microRNAs through the gain or loss of target sites [2]. It has been shown that the expression ratio of long and short 3'-UTR isoforms (LSR) varies between cell types and tissues [3]. In our previous study, we compared the LSR between cancer cells and normal cells; overall, cancer cells express more short 3'-UTR isoforms than normal cells [4]. Moreover, cancer cells are sometimes thought to exemplify reversed cell development. This idea inspired us to investigate LSR variability during early human embryo development.
Understanding human preimplantation embryo development enables us to understand the processes involved in embryogenesis. It is also important for assisted reproductive technologies and for therapies based on human embryonic stem cells (hESCs). Recently, advanced noninvasive imaging techniques and next-generation sequencing at the single-cell level have helped us better understand this mysterious and fascinating stage of embryo development. Differences in gene expression patterns during this phase [5], lineage commitment [6-8], and embryonic genome activation [9] have been established. The new techniques have also made examination of the preimplantation embryo or fetus more affordable. The understanding gained has improved the success rate of embryo transfer and the birth rate, and reduced the likelihood of disability in newborns. However, the cellular and molecular mechanisms involved in this early developmental stage have not yet been well studied. Owing to technical limitations, previous studies have mostly focused on differences in gene expression levels between stages. In this article, we analyze the gene expression data from a different perspective. Our objective is to reveal the variations in LSR during the early stage of development and to uncover their relationship with gene expression.
In addition, long noncoding RNAs (lncRNAs) have gained widespread attention in recent years as a newly discovered class of regulatory factors [10]. They have been shown to play important roles in development, such as in dosage compensation [11,12], genomic imprinting [13-15], control of pluripotency, and lineage specification [16]. Here, we consider the full length of an lncRNA as analogous to the 3'-UTR and define the LSR of an lncRNA as the expression ratio of its long and short isoforms (Fig. 1B). We investigate the trends in this ratio during the development of the human preimplantation embryo.

Fig. 1. Schematic diagram for LSR. To simplify the problem, we consider only the longest and shortest isoforms of both genes and lncRNAs. (A) Illustration of the 3'-UTR length variation of a gene. Different polyadenylation sites (AAUAAA signals inside the 3'-UTR) result in different 3'-UTR lengths. (B) Length variations in the isoforms of an lncRNA.

2. Materials and methods
2.1. RNA sequencing datasets
All sequencing data were downloaded from public databases. E-MTAB-3929 and E-MTAB-567 were downloaded from ArrayExpress; SRP081272 was downloaded from the NCBI SRA. One dataset covered human preimplantation embryo development (E-MTAB-3929), and two datasets covered normal-tumor paired tissues (E-MTAB-567 and SRP081272). All datasets were sequenced using an Illumina HiSeq 2000. Only E-MTAB-3929 used single-end reads; the others used paired-end reads.
Briefly, E-MTAB-3929 is a single-cell RNA-seq dataset that includes 1529 individual cells from 88 human preimplantation embryos, ranging from Day 3 to Day 7. Some of the samples can be divided into smaller groups as follows: compacted, uncompacted, and not applicable under Day 4; epiblast, primitive endoderm, trophectoderm, and not applicable under Day 5; epiblast, primitive endoderm, trophectoderm mural, and trophectoderm polar under Day 6; and epiblast, endoderm, trophectoderm mural, and trophectoderm polar under Day 7 [17]. E-MTAB-567 is an RNA sequencing dataset of 14 primary prostate cancers and their paired normal counterparts from the Chinese population [18]. SRP081272 is another RNA sequencing dataset of eight gastric cancers and their paired normal tissues [19].
2.2. Reference dataset
Untranslated region (UTR) sequences were downloaded from UTRdb (UTRfull collection) [20], which is derived from full-length transcripts collected in ASPicDB [21].
The dataset contains 86,032 3'-UTR records; however, a gene can have many 3'-UTR records with different lengths, ranging from <10 bp to >1000 bp. To reduce complexity, we used only the longest and shortest 3'-UTR records for each gene to calculate the LSR. Moreover, to isolate the effect of changes in 3'-UTR length, we required the longest and shortest 3'-UTRs of each gene to share the same start position, so that the shortest 3'-UTR could be aligned perfectly to the longest 3'-UTR from the first position. Finally, to reduce noise, we required the shortest 3'-UTR to be ≥100 bp and the difference between the longest and shortest 3'-UTR of each gene to be ≥100 bp. After processing, 12,920 3'-UTR records (6460 genes) remained and were used as the reference database for long- and short-form 3'-UTR alignment. For gene expression, lncRNA expression, and lncRNA LSR calculation, we used Gencode V24 [22] as the reference database. As with the 3'-UTR data, for simplicity, we used only the longest and shortest lncRNA isoforms to calculate the LSR. After filtering, 3914 lncRNAs were retained (7828 records).
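The selection rules in Section 2.2 can be illustrated with a short sketch. It is not the authors' code: the record layout `(gene, start, length)`, the function name `select_utr_pairs`, and the constant names are hypothetical. It also assumes that both 100 bp thresholds are lower bounds, and it checks only that the two records share a start coordinate rather than verifying a base-by-base sequence match.

```python
from collections import defaultdict

# Assumed thresholds (bp); both are treated as lower bounds in this sketch.
MIN_SHORT_LEN = 100
MIN_LEN_DIFF = 100


def select_utr_pairs(records):
    """Pick one (shortest, longest) 3'-UTR length pair per gene.

    records: iterable of (gene, start, length) tuples (hypothetical layout).
    Returns {gene: (short_length, long_length)} for genes passing all filters.
    """
    by_gene = defaultdict(list)
    for gene, start, length in records:
        by_gene[gene].append((start, length))

    pairs = {}
    for gene, utrs in by_gene.items():
        longest = max(utrs, key=lambda u: u[1])
        shortest = min(utrs, key=lambda u: u[1])
        if shortest[0] != longest[0]:
            continue  # the two records must share the same start position
        if shortest[1] < MIN_SHORT_LEN:
            continue  # shortest 3'-UTR too short
        if longest[1] - shortest[1] < MIN_LEN_DIFF:
            continue  # the two isoforms are too similar in length
        pairs[gene] = (shortest[1], longest[1])
    return pairs


example_records = [
    ("G1", 100, 150), ("G1", 100, 400), ("G1", 100, 900),  # kept
    ("G2", 50, 80), ("G2", 50, 500),                        # shortest < 100 bp
    ("G3", 10, 200), ("G3", 20, 600),                       # different starts
    ("G4", 5, 300), ("G4", 5, 350),                         # difference < 100 bp
]
selected = select_utr_pairs(example_records)
```

In this toy input only `G1` survives, which mirrors how the filters reduce the full record set to one long/short pair per retained gene.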
2.3. RNA sequencing data processing
All FASTQ files were filtered with the FASTX-Toolkit using the following criteria. First, any base with a Q score <30 was discarded; then, any read that retained less than 70% of its original length after this base filtering was also discarded. Filtered reads were aligned to our modified 3'-UTR database for LSR calculation and to Gencode V24 to evaluate the expression of both mRNAs and lncRNAs, using Sailfish V0.9.0. Quantile normalization was performed at the sample level to reduce bias between samples before further data processing. Briefly, the values in each sample were sorted from highest to lowest, and the 75th percentile of each sample was collected. The median of these 75th percentiles across samples was then calculated. The normalization factor for each sample was its 75th percentile divided by this median. Finally, all values in the sample were divided by the normalization factor to complete the normalization. For each gene, the LSRs for the different stages were calculated if more than 80% of samples had an LSR value. For each lncRNA, 50% of samples were required to have an LSR value. Pearson correlation coefficients (PCCs) between LSRs and expression were calculated under the same criteria, with the additional requirement that each gene have both LSR and expression values in the sample. PCCs between LSRs and developmental stages were calculated for genes that were expressed across all stages. For expression, isoform values were averaged to represent the expression of each gene or lncRNA.
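The normalization step described above amounts to scaling each sample by its 75th percentile relative to the median 75th percentile across samples (a scheme sometimes referred to as upper-quartile scaling). The sketch below illustrates that arithmetic only. The function names and the dictionary layout are hypothetical, and the percentile is computed with linear interpolation over all values in a sample, which is an assumption because the text does not specify the percentile definition or whether zero values are excluded.

```python
from statistics import median


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of an empty sample")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def normalize_samples(samples):
    """Scale every sample so that the 75th percentiles agree across samples.

    samples: {sample_name: [expression values]} (hypothetical layout).
    """
    q75 = {name: percentile(vals, 75) for name, vals in samples.items()}
    center = median(q75.values())
    if center == 0:
        raise ValueError("median 75th percentile is zero")
    normalized = {}
    for name, vals in samples.items():
        factor = q75[name] / center  # normalization factor for this sample
        if factor == 0:
            raise ValueError(f"sample {name!r} has a zero 75th percentile")
        normalized[name] = [v / factor for v in vals]
    return normalized


raw = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 4, 6, 8, 10],
    "s3": [4, 8, 12, 16, 20],
}
scaled = normalize_samples(raw)
```

The three toy samples differ only by a constant factor, so after scaling they become identical, which is the behavior the procedure is meant to produce for samples that differ only in overall depth.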
Table 1
Number of genes and lncRNAs with LSRs calculated for each stage in preimplantation embryo development datasets.
Stage    Genes with LSR calculated    lncRNAs with LSR calculated
Day 3    916                          129
Day 4    1097                         117
Day 5    874                          65
Day 6    1147                         81
Day 7    1213                         97
2.4. LSR calculation
APA can result in mRNA isoforms with variable 3'-UTR lengths, and different 3'-UTR lengths may have different regulatory consequences. To quantify this phenomenon, we defined the long isoform (L-form) and short isoform (S-form) ratio (LSR) as follows:
\[ \mathrm{LSR} = \frac{L}{L + S} \]
The expression levels of the L-form (L) and S-form (S) for each gene in each sample were obtained from the quantification of RNA sequencing data aligned to the modified 3'-UTR database. Before calculating the LSR, quantile normalization was also performed on the alignment results to eliminate bias among samples.
2.5. Functional enrichment analysis
The Cytoscape plug-in BiNGO [23] was used to perform functional enrichment analysis, which assesses the overrepresentation of GO terms in a gene set. Parameters were set as follows: the hypergeometric test was used with a significance level of <0.05; Benjamini and Hochberg FDR correction was used to correct for multiple testing; to eliminate bias from our modified 3'-UTR database for LSR calculation, the reference set was changed to the gene list of that database; all enriched functions were under Biological Process (GO term); and the enriched results of all stages were compared together to identify which functions were listed at different stages.

3. Results
3.1. Establishment of LSR database
To analyze the LSR of genes, we downloaded the human UTRfull collection from the UTRdb database [20]. 3'-UTR data were selected and reorganized. For simplicity, we retained only the longest and shortest 3'-UTR records for each gene. In addition, the paired longest and shortest records had to match perfectly from the first base. After this selection, 6460 genes remained in our 3'-UTR reference dataset. To analyze the LSR of lncRNAs, Gencode V24 [22] was used; lncRNA data were selected and filtered using the same criteria as for the gene data. The longest and shortest isoforms of each lncRNA were retained and used as the reference dataset. A reference dataset for quantifying the expression of genes and lncRNAs was also downloaded from Gencode V24.
3.2. Validation of analysis pipeline
With the reference database established, we first downloaded two cancer datasets from a public database to compare the results with our previous findings [4]. Both datasets consisted of RNA sequencing data for tumor-normal paired tissues. Sequencing data were filtered based on the following criteria: bases with a quality score <30 (Q < 30) were discarded, and each read had to retain at least 70% of its original length after these bases had been discarded. Processed data were aligned to the reference database described above using Sailfish V0.9.0 [24]. Quantile normalization was then performed to reduce bias between samples, and the LSR for each gene or lncRNA was calculated for each sample in both datasets. For every gene with a calculated LSR, we performed a Wilcoxon rank-sum test to identify genes with a significant difference in LSR between the normal and tumor groups. The number of genes for which we could calculate the LSR in these two datasets was about 2300 and 4200, respectively. For the Wilcoxon rank-sum test, these totals were reduced to around 2000 and 4000, respectively, because some LSR data were missing for various samples. The number of genes with significant differences in LSR was 345 and 192, respectively. Among these genes, the LSR tended to be lower in the tumor group in both datasets. These results are consistent with our previous findings for cancer cells [4] and show that our LSR analysis pipeline using typical RNA-seq data is feasible.

Fig. 2. (A) Cumulative distributions of LSRs for protein-coding genes and (B) lncRNAs across the developmental stages of human preimplantation embryos. (C) Histograms of Pearson correlation coefficients between LSR and developmental stage for protein-coding genes and (D) lncRNAs during early human preimplantation embryo development. A higher r indicates that the LSR increases with developmental stage.

3.3. LSR tends to increase during preimplantation embryo development
With the analysis pipeline established, we applied it to human embryo RNA sequencing data. We downloaded the single-cell RNA sequencing dataset E-MTAB-3929 from a public database; it contains data collected during early preimplantation embryo development and comprises 1529 samples from human preimplantation embryos on days 3-7 of development [17]. The data were processed as described above: all data were filtered by quality score and aligned to our reference database, and quantile normalization was performed to eliminate bias between samples before further processing. The LSR of the genes was calculated for each stage of development (Table 1; Fig. 2A-B).
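As a concrete illustration of the quantities used in this section, the sketch below computes the LSR defined in Section 2.4, applies the sample-coverage rule from Section 2.3, and correlates stage-level LSRs with developmental day. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, a sample is assumed to "have an LSR value" only when both isoforms show nonzero expression, and the per-stage summary is taken as the mean over samples, which the text does not specify.

```python
import math


def lsr(long_expr, short_expr):
    """LSR = L / (L + S); None when either isoform is not expressed (assumption)."""
    if long_expr <= 0 or short_expr <= 0:
        return None
    return long_expr / (long_expr + short_expr)


def stage_lsr(samples, min_fraction=0.8):
    """Mean LSR of one gene in one stage, or None if coverage is too low.

    samples: list of (L, S) expression pairs, one per sample in the stage.
    The LSR is reported only if more than min_fraction of samples have a value.
    """
    values = [v for v in (lsr(l, s) for l, s in samples) if v is not None]
    if not samples or len(values) / len(samples) <= min_fraction:
        return None
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Toy example: a gene whose stage-level LSR rises from Day 3 to Day 7.
days = [3, 4, 5, 6, 7]
lsr_by_day = [0.30, 0.35, 0.40, 0.45, 0.50]
r = pearson(days, lsr_by_day)
```

A positive `r` for a gene corresponds to the right-hand side of the histograms in Fig. 2C, that is, a gene whose long isoform becomes relatively more abundant as development proceeds.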
For each gene, the LSR was calculated when its long and short 3'-UTR isoforms were expressed in more than 80% of the samples from each stage. The results showed that the LSR had a slight tendency to increase over development. An analysis using Pearson correlation coefficients (PCCs) confirmed that the LSR tended to increase during early embryo development (Fig. 2C). We also used the PCC to test whether higher LSRs were associated with lower mRNA expression (Fig. 3A). There was a negative correlation between LSR and expression, indicating that overall, higher LSRs were indeed associated with lower gene expression, and vice versa. We also analyzed the lncRNAs to evaluate whether their LSRs change during early development. The numbers of lncRNAs with a calculated LSR are shown in Table 1. As for the genes, the LSR was calculated for lncRNAs with both long and short isoforms expressed in more than 50% of the samples from each stage. Again, the PCC showed that the LSR tended to increase with developmental stage (Fig. 2D), and there was also a negative correlation between LSR and lncRNA expression during development (Fig. 3B). Together, these results indicate that LSRs tend to increase during early human embryo development, and that this could have an effect on the expression of genes and lncRNAs.

Fig. 3. Histograms of Pearson correlation coefficients between LSRs and expression levels of protein-coding genes (left) and lncRNAs (right). Different colors represent different developmental stages. Higher LSRs tend to be associated with lower expression, which yields a negative Pearson r. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

3.4. Enriched functions for LSR variation
We also aimed to understand which functions are affected by variations in LSR, and therefore categorized genes by the stages during which their LSRs were lowest or highest. We performed a functional enrichment analysis for these genes using a Cytoscape plug-in called Biological Networks Gene Ontology (BiNGO) [23]. The BiNGO analysis used Gene Ontology (GO) [25] as the reference database for gene function and focused on the biological process ontology (one of the three aspects of gene function covered by GO) to reveal the functions affected by changes in LSR. To avoid any bias from the limited number of genes with LSR information, the reference gene list used in BiNGO was replaced by the list of genes from our 3'-UTR reference database. All of the enrichment results were analyzed together to identify the relationship between biological functions (GO terms) and changes in LSR at different stages. There were 117 unique GO terms in total. The stage with the most enriched functions in both the highest and the lowest LSR was E-MTAB-3929 Day 3 (Fig. 4). This is the stage during which the maternal-zygotic transition [26] occurs, which could explain why many functions are enriched.

Fig. 4. Functional enrichment analysis results for (A) E-MTAB-3929, Day 3, highest LSR and (B) E-MTAB-3929, Day 3, lowest LSR. Each node represents one function. Nodes with significance are colored; darker colors indicate higher levels of significance. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Table 2
Enriched function list and the associated LSR state during early embryo development stages. Only GO terms with a GO level of 5 or higher are listed. Stages and LSR states are named according to the following rule: Dataset, Stage, LSR state.
Enriched function                                       GO term     GO level  Stages and LSR states
ribosome assembly                                       GO:0042255  5         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 6, Highest LSR
ribonucleoprotein complex assembly                      GO:0022618  5         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 6, Highest LSR
regulation of cytokinesis                               GO:0032465  5         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 5, Highest LSR
maintenance of chromatin silencing                      GO:0006344  5         E-MTAB-3929, Day 7, Lowest LSR; E-MTAB-3929, Day 3, Highest LSR
translational elongation                                GO:0006414  5         E-MTAB-3929, Day 4, Lowest LSR; E-MTAB-3929, Day 3, Highest LSR
cellular macromolecular complex assembly                GO:0034622  5         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 6, Highest LSR
DNA catabolic process, endonucleolytic                  GO:0000737  6         E-MTAB-3929, Day 3, Lowest LSR; SRP018525, 2-cell, Highest LSR
posttranscriptional regulation of gene expression       GO:0010608  6         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 6, Highest LSR
nuclear-transcribed mRNA catabolic process              GO:0000956  7         E-MTAB-3929, Day 3, Lowest LSR; SRP018525, oocyte, Highest LSR
DNA fragmentation involved in apoptotic nuclear change  GO:0006309  7         E-MTAB-3929, Day 3, Lowest LSR; SRP018525, 2-cell, Highest LSR
mRNA metabolic process                                  GO:0016071  5         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 3, Highest LSR; E-MTAB-3929, Day 5, Highest LSR
mRNA processing                                         GO:0006397  6         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 3, Highest LSR; E-MTAB-3929, Day 5, Highest LSR
translation                                             GO:0006412  5         E-MTAB-3929, Day 4, Lowest LSR; E-MTAB-3929, Day 6, Lowest LSR; E-MTAB-3929, Day 3, Highest LSR; SRP018525, pronucleus, Highest LSR
RNA processing                                          GO:0006396  5         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 6, Lowest LSR; E-MTAB-3929, Day 3, Highest LSR; E-MTAB-3929, Day 5, Highest LSR
RNA splicing                                            GO:0008380  6         E-MTAB-3929, Day 3, Lowest LSR; E-MTAB-3929, Day 6, Lowest LSR; E-MTAB-3929, Day 3, Highest LSR; E-MTAB-3929, Day 5, Highest LSR

To go a step further, we filtered out the GO terms with a GO level <5 and those that featured in only a single stage, which left 15 GO terms (Table 2), 10 of which appear in only two stages. These coincided with one highest- and one lowest-LSR stage, which suggests that LSR variations might play a role in regulating the gene expression affecting early embryo development. Moreover, the results show that genes related to the "maintenance of chromatin silencing" have the highest LSR on day 3 and the lowest LSR on day 7, which indicates a repression of their function on day 3 of embryo development. This matches the time of the maternal-zygotic transition and suggests that the silencing of the chromatin is reversed before the compaction step and before the embryo itself begins producing mRNA.
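The statistics named in Section 2.5, a hypergeometric overrepresentation test followed by Benjamini and Hochberg FDR correction, can be sketched as follows. This is a generic illustration of those two procedures and makes no claim about BiNGO's internal implementation. The function names are hypothetical, and the population size `N` stands for the custom reference list (the genes in the 3'-UTR reference database).

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated genes.

    N: genes in the reference list, K: reference genes carrying the GO term,
    n: genes in the tested set, k: tested genes carrying the GO term.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 4 GO terms tested for one gene set.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```

Using the 3'-UTR reference list as `N`, rather than the whole genome, reflects the design choice described in the text: a term is only counted as enriched relative to genes for which an LSR could have been computed at all.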
4. Discussion
Our results reveal a trend in which the LSR tends to increase during early embryo development, and in which the expression and LSR of genes are negatively correlated. This may provide a perspective on how embryos regulate the expression of genes. It has been established that the initial stage of embryo development uses the mRNA preserved in the oocyte until the maternal-zygotic transition occurs. After this, the embryo gradually starts to produce its own mRNA; LSR could play a role in gene regulation at this early stage [26]. The five enriched functions that were present in more than two stages are shown in Table 2. Four of these were related to RNA and were present during the stage that featured both the highest and the lowest LSRs, E-MTAB-3929 Day 3, the stage during which the maternal-zygotic transition occurs. This provides further indirect evidence that changes in LSR regulate gene expression during early embryo development.
The lncRNAs were found to exhibit patterns similar to those of the genes, both in LSR changes and in the correlation between LSR and expression. However, most lncRNAs still lack functional annotation, so we used the genes co-expressed with lncRNAs to identify their plausible functions. We calculated PCCs for every possible lncRNA-gene pair during early embryo development and retained pairs with a PCC ≥0.80. From these pairs, we compiled a gene list and performed functional enrichment analysis with BiNGO, as previously described, to predict the function of those lncRNAs. The only enriched functions we thereby identified were those related to translation. Although it has been reported that lncRNAs have a regulatory effect on translation [27], this has mainly been discussed with respect to adults. The function of lncRNAs during the early embryonic development stages requires further research to be fully revealed.

Transparency document
Transparency document related to this article can be found online at https://doi.org/10.1016/j.bbrc.2018.09.027.
Acknowledgements This work was supported by the Ministry of Science and Technology, Taiwan (MOST 104-2628-E-010-001-MY3, 105-2320-B002-057-MY3 and 07-2221-E-010-017-MY2).
Appendix A. Supplementary data
Supplementary data related to this article can be found at https://doi.org/10.1016/j.bbrc.2018.09.027.

References
[1] G. Edwalds-Gilbert, K.L. Veraldi, C. Milcarek, Alternative poly(A) site selection in complex transcription units: means to an end? Nucleic Acids Res. 25 (1997) 2547-2561.
[2] R. Sandberg, J.R. Neilson, A. Sarma, P.A. Sharp, C.B. Burge, Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites, Science 320 (2008) 1643-1647.
[3] H. Zhang, J.Y. Lee, B. Tian, Biased alternative polyadenylation in human tissues, Genome Biol. 6 (2005) R100.
[4] H.H. Liaw, C.C. Lin, H.F. Juan, H.C. Huang, Differential microRNA regulation correlates with alternative polyadenylation pattern between breast cancer and normal cells, PLoS One 8 (2013) e56958.
[5] P. Zhang, M. Zucchelli, S. Bruce, F. Hambiliki, A. Stavreus-Evers, L. Levkov, H. Skottman, E. Kerkela, J. Kere, O. Hovatta, Transcriptome profiling of human pre-implantation development, PLoS One 4 (2009) e7844.
[6] R.G. Edwards, C. Hansis, Initial differentiation of blastomeres in 4-cell human embryos and its significance for early embryogenesis and implantation, Reprod. Biomed. Online 11 (2005) 206-218.
[7] A. Galan, D. Montaner, M.E. Poo, D. Valbuena, V. Ruiz, C. Aguilar, J. Dopazo, C. Simon, Functional genomics of 5- to 8-cell stage human embryos by blastomere single-cell cDNA analysis, PLoS One 5 (2010) e13615.
[8] B. Sozen, A. Can, N. Demir, Cell fate regulation during preimplantation development: a view of adhesion-linked molecular interactions, Dev. Biol. 395 (2014) 73-83.
[9] R. Vassena, S. Boue, E. Gonzalez-Roca, B. Aran, H. Auer, A. Veiga, J.C. Izpisua Belmonte, Waves of early transcriptional activation and pluripotency program initiation during human preimplantation development, Development 138 (2011) 3699-3709.
[10] J.T. Kung, D. Colognori, J.T. Lee, Long noncoding RNAs: past, present, and future, Genetics 193 (2013) 651-669.
[11] G. Furlan, C. Rougeulle, Function and evolution of the long noncoding RNA circuitry orchestrating X-chromosome inactivation in mammals, Wiley Interdiscip. Rev. RNA 7 (2016) 702-722.
[12] D. Tian, S. Sun, J.T. Lee, The long noncoding RNA, Jpx, is a molecular switch for X chromosome inactivation, Cell 143 (2010) 390-403.
[13] C. Kanduri, N. Thakur, R.R. Pandey, The length of the transcript encoded from the Kcnq1ot1 antisense promoter determines the degree of silencing, EMBO J. 25 (2006) 2096-2106.
[14] S.T. da Rocha, C.A. Edwards, M. Ito, T. Ogata, A.C. Ferguson-Smith, Genomic imprinting at the mammalian Dlk1-Dio3 domain, Trends Genet. 24 (2008) 306-316.
[15] J. Zhao, T.K. Ohsumi, J.T. Kung, Y. Ogawa, D.J. Grau, K. Sarma, J.J. Song, R.E. Kingston, M. Borowsky, J.T. Lee, Genome-wide identification of polycomb-associated RNAs by RIP-seq, Mol. Cell 40 (2010) 939-953.
[16] J.L. Deuve, P. Avner, The coupling of X-chromosome inactivation to pluripotency, Annu. Rev. Cell Dev. Biol. 27 (2011) 611-629.
[17] S. Petropoulos, D. Edsgard, B. Reinius, Q. Deng, S.P. Panula, S. Codeluppi, A. Plaza Reyes, S. Linnarsson, R. Sandberg, F. Lanner, Single-cell RNA-seq reveals lineage and X chromosome dynamics in human preimplantation embryos, Cell 165 (2016) 1012-1026.
[18] S. Ren, Z. Peng, J.H. Mao, Y. Yu, C. Yin, X. Gao, Z. Cui, J. Zhang, K. Yi, W. Xu, C. Chen, F. Wang, X. Guo, J. Lu, J. Yang, M. Wei, Z. Tian, Y. Guan, L. Tang, C. Xu, L. Wang, X. Gao, W. Tian, J. Wang, H. Yang, J. Wang, Y. Sun, RNA-seq analysis of prostate cancer in the Chinese population identifies recurrent gene fusions, cancer-associated long noncoding RNAs and aberrant alternative splicings, Cell Res. 22 (2012) 806-821.
[19] W.F. Ooi, M. Xing, C. Xu, X. Yao, M.K. Ramlee, M.C. Lim, F. Cao, K. Lim, D. Babu, L.F. Poon, J. Lin Suling, A. Qamra, A. Irwanto, J. Qu Zhengzhong, T. Nandi, A.P. Lee-Lim, Y.S. Chan, S.T. Tay, M.H. Lee, J.O. Davies, W.K. Wong, K.C. Soo, W.H. Chan, H.S. Ong, P. Chow, C.Y. Wong, S.Y. Rha, J. Liu, A.M. Hillmer, J.R. Hughes, S. Rozen, B.T. Teh, M.J. Fullwood, S. Li, P. Tan, Epigenomic profiling of primary gastric adenocarcinoma reveals super-enhancer heterogeneity, Nat. Commun. 7 (2016) 12983.
[20] G. Grillo, A. Turi, F. Licciulli, F. Mignone, S. Liuni, S. Banfi, V.A. Gennarino, D.S. Horner, G. Pavesi, E. Picardi, G. Pesole, UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs, Nucleic Acids Res. 38 (2010) D75-D80.
[21] P.L. Martelli, M. D'Antonio, P. Bonizzoni, T. Castrignano, A.M. D'Erchia, P. D'Onorio De Meo, P. Fariselli, M. Finelli, F. Licciulli, M. Mangiulli, F. Mignone, G. Pavesi, E. Picardi, R. Rizzi, I. Rossi, A. Valletti, A. Zauli, F. Zambelli, R. Casadio, G. Pesole, ASPicDB: a database of annotated transcript and protein variants generated by alternative splicing, Nucleic Acids Res. 39 (2011) D80-D85.
[22] T. Derrien, R. Johnson, G. Bussotti, A. Tanzer, S. Djebali, H. Tilgner, G. Guernec, D. Martin, A. Merkel, D.G. Knowles, J. Lagarde, L. Veeravalli, X. Ruan, Y. Ruan, T. Lassmann, P. Carninci, J.B. Brown, L. Lipovich, J.M. Gonzalez, M. Thomas, C.A. Davis, R. Shiekhattar, T.R. Gingeras, T.J. Hubbard, C. Notredame, J. Harrow, R. Guigo, The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression, Genome Res. 22 (2012) 1775-1789.
[23] S. Maere, K. Heymans, M. Kuiper, BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks, Bioinformatics 21 (2005) 3448-3449.
[24] R. Patro, S.M. Mount, C. Kingsford, Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms, Nat. Biotechnol. 32 (2014) 462-464.
[25] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G. Sherlock, Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet. 25 (2000) 25-29.
[26] A.F. Schier, The maternal-zygotic transition: death and birth of RNAs, Science 316 (2007) 406-407.
[27] F. Rashid, A. Shah, G. Shan, Long non-coding RNAs in the cytoplasm, Dev. Reprod. Biol. 14 (2016) 73-80.