Accepted Manuscript Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell carcinoma Xiaofen Wu, Lei Ruan, Yi Yang, Qi Mei PII:
S0890-8508(16)30019-6
DOI:
10.1016/j.mcp.2016.02.009
Reference:
YMCPR 1201
To appear in:
Molecular and Cellular Probes
Received Date: 11 September 2015 Revised Date:
22 January 2016
Accepted Date: 19 February 2016
Please cite this article as: Wu X, Ruan L, Yang Y, Mei Q, Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell carcinoma, Molecular and Cellular Probes (2016), doi: 10.1016/j.mcp.2016.02.009. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT Identification of crucial regulatory relationships between long non-coding RNAs
2
and protein-coding genes in lung squamous cell carcinoma
3
Running title: Bioinformatic analysis of LUSC
4
Xiaofen Wu1, MD; Lei Ruan1, MD; Yi Yang1, MM; Qi Mei2*, MD
5
1 Department of Gerontology, Tongji Hospital, Tongji Medical College, Huazhong
6
University of Science and Technology, Wuhan 430030, China
7
2 Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong
8
University of Science and Technology, Wuhan 430030, China
9
*
SC
RI PT
1
M AN U
Correspondence author: Qi Mei
Address: Department of Oncology, Tongji Hospital, Tongji Medical College,
11
Huazhong University of Science and Technology, Jiefang Avenue 1095, Wuhan
12
430030, China
13
Tel: 027-83663407; Fax: 027-83662834
14
E-mail:
[email protected]
17 18 19 20
EP
16
AC C
15
TE D
10
21 22 23
1
ACCEPTED MANUSCRIPT Abstract
25
Purpose: This study aimed to analyze the relationships of long non-coding RNAs
26
(lncRNAs) and protein-coding genes in lung squamous cell carcinoma (LUSC).
27
Methods: RNA-seq data of LUSC deposited in the TCGA database were used to
28
identify differentially expressed protein-coding genes (DECGs) and differentially
29
expressed lncRNA genes (DE-lncRNAs) between LUSC samples and normal samples.
30
Functional enrichment analysis of DECGs was then performed. Subsequently, the
31
target genes and regulators of DE-lncRNAs were predicted from the DECGs.
32
Additionally, expression levels of target genes of DE-lncRNAs were validated by
33
RT-qPCR after the silence of DE-lncRNAs.
34
Results: In total, 5162 differentially expressed genes (DEGs) were screened from the
35
LUSC samples, and there were seven upregulated lncRNA genes in the DEGs. The
36
upregulated DECGs were enriched in GO terms like RNA binding and metabolic
37
process. Meanwhile, the downregulated DECGs were enriched in GO terms like cell
38
cycle. Furthermore, the lncRNAs PVT1 and TERC targeted multiple DECGs. PVT1
39
targeted genes related to cell cycle (e.g. POLA2, POLD1, MCM4, MCM5 and MCM6),
40
and reduced expression of PVT1 decreased expression of the genes. TERC regulated
41
several genes (e.g. NDUFAB1, NDUFA11 and NDUFB5), and reduced expression of
42
TERC increased expression of the genes. Additionally, PVT1 was regulated by
43
multiple transcription factors (TFs) identified from DECGs, such as HSF1; and TERC
44
was modulated by TFs, such as PIR.
45
Conclusion: A set of regulatory relationships between PVT1 and its targets and
AC C
EP
TE D
M AN U
SC
RI PT
24
2
ACCEPTED MANUSCRIPT 46
regulators, as well as TERC and its targets and regulators, may play crucial roles in
47
the progress of LUSC.
48
Key words: lung squamous cell carcinoma, differentially expressed protein-coding
50
gene, long non-coding RNA, regulatory network
51
Introduction
RI PT
49
Lung squamous cell carcinoma (LUSC) is a common type of non small-cell lung
53
cancer and the second leading cause of death resulting from lung cancer, causing
54
about 400,000 deaths per year worldwide [1, 2]. It is urgent to reveal the molecular
55
mechanisms underlying LUSC oncogenesis.
M AN U
SC
52
In the past years, various molecular players have been demonstrated to be
57
correlated with LUSC, such as protein-coding genes and lncRNAs. For instance, it
58
has been demonstrated that in the mouse lung, biallelic inactivation of Lkb1 and Pten
59
that causes elevated PD-L1 expression, leads to LUSC [3]. PRKCI and SOX2
60
oncogenes cooperate to stimulate activation of Hedgehog signaling and drive a
61
stem-like phenotype in LUSC [4]. Besides, high expression of FGFR1 was found in
62
16% of a clinical cohort of LUSC patients [5]. Furthermore, high expressions of the
63
lncRNAs MALAT1 and HOTAIR are associated with poor prognosis of LUSC [6, 7]. A
64
recent study has been reported that the lncRNA LINC01133 is highly expressed in
65
LUSC and it predicts survival time [8]. However, the interactions between
66
differentially expressed protein-coding genes (DECGs) and differentially expressed
67
lncRNA genes (DE-lncRNAs) are still elusive.
AC C
EP
TE D
56
3
ACCEPTED MANUSCRIPT In this study, RNA-seq data of LUSC deposited in The Cancer Genome Atlas
69
(TCGA) database were used to identify DECGs and DE-lncRNAs. Following the
70
functional analysis of DECGs, the regulatory relationships between DECGs and
71
DE-lncRNAs were analyzed. Additionally, expression levels of target genes of
72
DE-lncRNAs were validated after the silence of DE-lncRNAs. These results were
73
expected to provide novel information for the understanding of molecular
74
mechanisms of LUSC tumorigenesis.
75
Materials and methods
76
RNA sequencing data
M AN U
SC
RI PT
68
The preprocessed RNA-seq level 3 data of LUSC were downloaded from TCGA
78
database (http://cancergenome.nih.gov/), which provides a platform for researchers to
79
download the data sets of genomic abnormalities driving tumorigenesis. In total, the
80
dataset contains 17 pairs of matched primary solid tumor samples and adjacent normal
81
tissue samples, and RPKM (Reads Per Kilobase of exon model per Million mapped
82
reads) values of each gene in these samples were included.
83
DEGs and DE-lncRNAs screening
EP
AC C
84
TE D
77
The paired-sample T test was used to identify the DEGs (including DECGs and
85
non-protein coding genes) between the two matched samples, and p-value was
86
adjusted by the Benjamin and Hochberg (BH) method [9]. Only the genes with
87
adjusted p-value < 0.05 and FC (fold change) > 2 were chosen as DEGs.
88 89
Based
on
annotation
information
in
the
GENCODE
database
(http://www.gencodegenes.org/) [10], the DE-lncRNAs were screened from the 4
ACCEPTED MANUSCRIPT 90
identified DEGs.
91
Enrichment analysis of DECGs To further reveal the functions of DECGs, the Gene Ontology (GO) functional
93
and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment
94
analyses of DEGs were performed using the clusterProfiler package of R [11]. The
95
GO terms and pathway terms with adjusted p-value < 0.05 were selected as the
96
significant terms.
97
Prediction of DE-lncRNA target genes and their functions
M AN U
SC
RI PT
92
Based on the coexpression relationships of lncRNA and protein-coding genes,
99
the target genes of DE-lncRNAs were predicted from the screened DECGs, and only
100
genes with p-value < 0.05 and expression similarity > 0.8 were chosen for subsequent
101
analyses. The expression similarity of DECGs and lncRNAs was measured as
102
previous researchers did [12, 13]. Regulatory networks of lncRNAs and their target
103
genes were then visualized by Cytoscape (http://cytoscape.org/) [14]. Additionally, the
104
GO functional enrichment analysis of target genes was conducted using the
105
clusterProfiler package of R.
106
Prediction of upstream transcription factors (TFs) of DE-lncRNAs
EP
AC C
107
TE D
98
To further analyze the differential expression mechanisms of DE-lncRNAs,
108
based on the TF information in TRANSFAC database [15], we used MotifDb package
109
of R [16] to predict the potential TFs regulating DE-lncRNAs from DECGs. This
110
package is able to search the binding motif in TF of the promoter sequence, based on
111
the gene promoter sequences of lncRNAs provided by users. The matching degree of 5
ACCEPTED MANUSCRIPT motif sequence and promoter sequence was calculated by pulse-width modulation
113
(PWM) algorithm [17], and TFs with matching degree ≥ 85% were considered as the
114
potential upstream TFs regulating DE-lncRNAs. The TF-DE-lncRNA regulatory
115
network was visualized by Cytoscape.
116
Validation of downstream gene expression of significant lncRNAs
RI PT
112
Based on the gene sequences of lncRNAs in the National Center For
118
Biotechnology Information (NCBI) database, three small interfering RNA (siRNA)
119
sequences and one siRNA-scramble sequence for each lncRNAs (Table 6) were
120
designed using BLOCK-iT™ RNAi Designer (www.invitrogen.com/rnai). Besides,
121
based on the sequences of downstream genes of lncRNAs in NCBI, gene primers
122
were designed using the Primer5 software (PRIMER-E Ltd, Plymouth, UK). Primer
123
sequences of genes were listed in Table 6.
TE D
M AN U
SC
117
Human LUSC cell line SK-MES-1 purchased from the Cell Bank at the Chinese
125
Academy of Sciences were placed in a 12 well plate and transfected with siRNA
126
sequences and siRNA-scramble sequence using Lipofectamine 2000 (Invitrogen,
127
Carlsbad, CA, USA), respectively. After transfection of 24 h, total RNA for mRNA
128
expression detection was extracted from SK-MES-1 cells by TriZol reagent
129
(Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol.
130
Afterwards, total RNA was reverse transcribed using PrimeScript RT Master Mix
131
(Takara, RR036A, Shiga, Japan). After cDNA synthesis, mRNA expression levels
132
were tested using Power SYBR Green PCR Master Mix (#4367659, Applied
133
Biosystems, Foster City, CA, USA), and glyceraldehyde-3-phosphate dehydrogenase
AC C
EP
124
6
ACCEPTED MANUSCRIPT 134
(GAPDH) was used as the reference gene.
135
Statistical methods Statistical analysis was performed using the SPSS 19.0 software (SPSS, Chicago,
137
USA). One-way analysis of variance (ANOVA) followed by Duncan test was used to
138
compare group means. Difference with P<0.05 was considered significant.
RI PT
136
Results
141
Predicted DEGs and DE-lncRNAs
M AN U
140
SC
139
In total, 5162 DEGs (4489 upregulated ones and 673 downregulated ones) were
143
identified from the LUSC samples, comparing the normal samples. Among these
144
DEGs, there were 7 upregulated lncRNA genes (Table 1).
145
Functions of DECGs
TE D
142
According to GO functional enrichment analysis of DECGs in the DEGs, the
147
upregulated DECGs were significantly enriched in a series of GO terms, such as RNA
148
binding (e.g. CEBPZ, TRIM28 and HNRNPR), membrane-enclosed lumen (e.g.
149
MRPL52 and COL1A1) and metabolic process (e.g. CEBPG and CDH3) (Table 2).
150
Meanwhile, the downregulated DEGs were distinctly enriched in multiple GO terms,
151
such as ion binding (e.g. KIF20A and TRAIP), microtubule cytoskeleton (e.g. KIF18B
152
and CEP152) and cell cycle (e.g. KIF20A and CENPA) (Table 3).
AC C
EP
146
153
Furthermore, the upregulated DEGs were significantly several pathways,such as
154
RNA transport (e.g. PRMT5 and TACC3), spliceosome (e.g. CHERP and USP39) and
155
DNA replication (e.g. MCM2 and RFC3) (Table 4). While, no downregulated DEGs 7
ACCEPTED MANUSCRIPT 156
were enriched in significant pathways.
157
Target genes of DE-lncRNAs and their functions Totally, 347 target genes of five DE-lncRNAs (PVT1, TERC, HAR1A, TTTY16
159
and FAM138B) were identified. Among the five DE-lncRNAs, PVT1 and TERC
160
targeted multiple genes. For example, PVT1 regulated a set of genes, such as HSF1,
161
USF2 and CEBPG; TERC modulated a series of genes, such as PIR, NDUFB5 and
162
PRPS2 (Figure 1).
SC
RI PT
158
According to GO functional enrichment analysis in biological process (BP) of
164
target genes of PVT1 and TERC, PVT1 target genes were mainly enriched in GO
165
terms of cell cycle, such as nucleobase−containing compound metabolic process,
166
heterocycle metabolic process and cellular nitrogen compound metabolic process
167
(Figure 2A). Furthermore, a set of PVT1 target genes were significantly enriched in
168
six pathways, such as DNA replication (e.g. POLA2, POLD1, and MCM5), purine
169
metabolism (e.g. POLA2 and POLD1) and cell cycle (e.g. MCM4, MCM5and MCM6)
170
(Table 5).
EP
TE D
M AN U
163
TERC target genes were mainly enriched in the metabolic process and biological
172
process (Figure 2B). A series of target genes (e.g. NDUFAB1, NDUFA11, NDUFB5,
173
NDUFA8 and NDUFB4) were markedly enriched in several pathways, such as
174
oxidative phosphorylation and metabolic pathways (Table 5).
175
Analysis of TF-DE-lncRNA regulatory network
AC C
171
176
A total of 163 TFs were identified from DECGs, including 139 upregulated ones
177
and 24 downregulated ones. Among these TFs, 73 TFs potentially modulated 7 8
ACCEPTED MANUSCRIPT DE-lncRNAs. The regulatory network consisted of 339 interactions, including 7
179
DE-lncRNAs and 73 TFs. There were four coexpressed relationships in this network,
180
including PVT1 and its three potential upstream TFs (HSF1, USF2 and CEBPG), as
181
well as TERC and its potential upstream TF PIR (Figure 3).
182
Gene expression validated by real-time quantitative polymerase chain reaction
183
(RT-qPCR)
RI PT
178
To validate the regulation of lncRNAs PVT1 and TERC on the expression of
185
their downstream genes, RT-qPCR was used to determine the gene expression after
186
lncRNAs were silenced by siRNAs. Among the three siRNAs, mRNA expression of
187
PVT1 silenced by siRNA 3 and TERC silenced by siRNA 1 was the lowest (Figure 4A
188
and B). Therefore, siRNA 3 for PVT1 and siRNA 1 for TERC were used for the
189
determination of expression of downstream target genes. After the silence of PVT1, all
190
of the mRNA expression of target genes (POLD1, POLA2, MCM4, MCM5 and
191
MCM6) was significantly decreased (p < 0.05), comparing with the control (Figure
192
4C). Furthermore, after the silence of TERC, all of the mRNA expression of target
193
genes (NDUFB5, NDUFA11, NDUFAB1, NDUFB4 and NDUFA8) was significantly
194
increased (p < 0.05), comparing with the control (Figure 4D).
196
M AN U
TE D
EP
AC C
195
SC
184
Discussion
197
In the current study, 5162 DEGs (4489 upregulated ones and 673 downregulated
198
ones) were screened from the LUSC samples, comparing the normal samples. Among
199
these DEGs, there were seven upregulated lncRNA genes, some of which (PVT1 and 9
ACCEPTED MANUSCRIPT 200
TERC) targeted multiple genes. PVT1 is encoded by a gene that resides in the well-known cancer risk region
202
8q24, and it plays a pivotal role in cancers, such as LUSC [18]. In this study, PVT1
203
targeted a set of genes, such as POLA2, POLD1, MCM4, MCM5 and MCM6, and it
204
was validated that reduced expression of PVT1 decreased the expression of the five
205
target genes. These five genes were significantly enriched in several GO terms of cell
206
cycle, such as DNA replication, purine metabolism and cell cycle. Both POLA2 and
207
POLD1 encode DNA polymerase. POLD1 is involved in DNA single-strand break
208
repair process, and increase genomic instability of POLD1 leads to a high incidence
209
of epithelial tumours in mice [19]. Dysregulated cell cycle results in aberrant cell
210
growth, which may lead to tumorigenesis [20]. Thus, POLA2 and POLD1 might play
211
key roles in LUSC. MCM4, MCM5 and MCM6 encode minichromosome maintenance
212
(MCM) complex. Study has demonstrated that MCM4 expression is higher in LUSC
213
cells than in adjacent normal bronchial epithelial cells [21]. There is no evidence that
214
MCM5 and MCM6 are associated with LUSC, and they may participate in the
215
progress of LUSC via DNA replication.
SC
M AN U
TE D
EP
AC C
216
RI PT
201
In this study, PVT1 was also regulated by some TFs, such as HSF1, CEBPG and
217
USF2, which were identified from the DEGs. HSF1 (Heat Shock Transcription Factor
218
1) was found to be enriched in the metabolic process in the present study. A previous
219
study has discovered copy number gain in HSF1 gene region in LUSC [22]. Besides,
220
CEBPG (CCAAT/enhancer binding protein (C/EBP), gamma) is a member of the
221
bZIP family of DNA binding proteins, and it is expressed allele-specifically in human 10
ACCEPTED MANUSCRIPT bronchial epithelial cells [23]. USF2 encodes a member of the helix-loop-helix
223
leucine zipper family, and can function as a cellular TF [24]. USF-2 overexpression in
224
lung squamous cell carcinoma cell line NCI-H2170 significantly stimulates in vitro
225
cell proliferation [25]. Taken together, the lncRNA PVT1 might exert essential roles in
226
LUSC, through its regulation on targets (e.g. POLA2, POLD1, MCM4, MCM5 and
227
MCM6) and the modulation by TFs (e.g. HSF1, CEBPG and USF2).
RI PT
222
Another lncRNA TERC is a telomerase RNA component encoding gene, and its
229
expression is common in LUSC [26]. Study has shown that TERC is upregulated in
230
LUSC cell lines, which is consistent with the result in this study [27]. It is a likely
231
target of the 3q26 amplification, and may be a useful biomarker for diagnosis of
232
non-small cell lung cancer [27]. In this study, TERC regulated a series of DEGs, such
233
as NDUFAB1, NDUFA11, NDUFB5, NDUFA8 and NDUFB4. It was validated that
234
reduced expression of TERC increased the expression of the five target genes. These
235
genes all encode NADH dehydrogenase (ubiquinone) 1, and were enriched in
236
oxidative phosphorylation (OXPHOS) and metabolic pathways. Inhibition of the
237
activity of OXPHOS stimulates a rapid increase in their rates of aerobic glycolysis in
238
LUSC cells. There is no evidence to prove the correlation of the five genes with
239
LUSC, while, NADH dehydrogenase (ubiquinone) alpha subcomplex-1 (NDUFA1) is
240
an accessory component of OXPHOS complex-I, which is indispensable for
241
respiratory activity [28]. NDUFA10 is significantly dysregulated in head and neck
242
squamous cell carcinoma [29]. Therefore, TERC and its targets might function in
243
LUSC through the oxidative phosphorylation pathway. Additionally, TERC was found
AC C
EP
TE D
M AN U
SC
228
11
ACCEPTED MANUSCRIPT to be regulated by the TF PIR. PIR encodes an Fe(II)-containing nuclear protein,
245
which is a member of the cupin superfamily [30]. It interacts with B cell lymphoma
246
3-encoded oncoprotein and nuclear factor I/CCAAT box transcription factor,
247
indicating that PIR may be involved in the modulation of DNA replication and
248
transcription [31]. There is no evidence that PIR is associated with LUSC so far.
249
Thereby, PIR may play pivotal roles in LUSC, via the regulation on TERC expression.
250
However, there were some limitations in this study. The aforementioned results,
251
including the expression level of lncRNAs and genes, as well as the functions of them,
252
were needed to be validated by experiments, which would be conducted in our further
253
studies.
M AN U
SC
RI PT
244
In conclusion, 5162 DEGs were screened from the LUSC samples, and there
255
were seven upregulated lncRNAs in the DEGs. Among the DECGs and DE-lncRNAs,
256
the lncRNA PVT1 and its targets (e.g. POLA2, POLD1, MCM4, MCM5 and MCM6)
257
as well as its regulators (e.g. HSF1, CEBPG and USF2), TERC and its targets (e.g.
258
NDUFAB1, NDUFA11 and NDUFB5) as well as its regulators (e.g. PIR) might exert
259
crucial functions in the tumorigenesis of LUSC. These findings were expected to
260
provide a theoretical basis for further experimental studies, and helpful to shed light
261
on the molecular mechanisms of LUSC.
EP
AC C
262
TE D
254
263 264
Acknowledgement
265
This study was supported by Hubei Provincial Natural Science Foundation of
266
China(No. 2014CFB366). 12
ACCEPTED MANUSCRIPT 267
Conflict of interests
268
All authors declare that they have no conflict of interests to state.
269
RI PT
270 271 272
SC
273
M AN U
274 275 276 277
TE D
278 279 280
EP
281
283 284
AC C
282
285 286
References
287
[1] Network CGAR. Comprehensive genomic characterization of squamous cell lung
288
cancers. Nature. 2012;489:519-25. 13
ACCEPTED MANUSCRIPT [2] Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin.
290
2013;63:11-30.
291
[3] Xu C, Fillmore CM, Koyama S, Wu H, Zhao Y, Chen Z, et al. Loss of Lkb1 and
292
Pten leads to lung squamous cell carcinoma with elevated PD-L1 expression. Cancer
293
Cell. 2014;25:590-604.
294
[4] Justilien V, Walsh MP, Ali SA, Thompson EA, Murray NR, Fields AP. The PRKCI
295
and SOX2 oncogenes are coamplified and cooperate to activate Hedgehog signaling
296
in lung squamous cell carcinoma. Cancer Cell. 2014;25:139-51.
297
[5] Heist RS, Mino-Kenudson M, Sequist LV, Tammireddy S, Morrissey L, Christiani
298
DC, et al. FGFR1 amplification in squamous cell carcinoma of the lung. J Thorac
299
Oncol. 2012;7:1775.
300
[6]
301
andthymosinbeta4predictmetastasisandsurvivalinearly
302
smallcelllungcancer. Oncogene. 2003;22:8031.
303
[7] Liu X-h, Liu Z-l, Sun M, Liu J, Wang Z-x, De W. The long non-coding RNA
304
HOTAIR indicates a poor prognosis and promotes metastasis in non-small cell lung
305
cancer. BMC Cancer. 2013;13:464.
306
[8] Zhang J, Zhu N, Chen X. A novel long noncoding RNA LINC01133 is upregulated
307
in lung squamous cell cancer and predicts survival. Tumor Biology. 2015:1-7.
308
[9] Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and
309
powerful approach to multiple testing. Journal of the Royal Statistical Society Series
310
B (Methodological). 1995:289-300.
D,
Wang
W.
mALAT
1,
anovelnoncoding
RnA,
stage
non
AC C
EP
TE D
JiP
M AN U
SC
RI PT
289
14
ACCEPTED MANUSCRIPT [10] Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al.
312
GENCODE: the reference human genome annotation for The ENCODE Project.
313
Genome Res. 2012;22:1760-74.
314
[11] Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing
315
biological themes among gene clusters. OMICS. 2012;16:284-7.
316
[12] Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of
317
genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863-8.
318
[13] Herzel H, Beule D, Kielbasa S, Korbel J, Sers C, Malik A, et al. Extracting
319
information from cDNA arrays. CHAOS: an interdisciplinary journal of nonlinear
320
science. 2001;11:98-107.
321
[14] Kohl M, Wiese S, Warscheid B. Cytoscape: software for visualization and
322
analysis of biological networks.
323
291-303.
324
[15] Wingender E. The TRANSFAC project as an example of framework technology
325
that supports the analysis of genomic regulation. Brief Bioinform. 2008;9:326-32.
326
[16] Shannon P. MotifDb: an annotated collection of protein-DNA binding sequence
327
motifs. R package version 1.2. 2.
328
[17] Beckstette M, Homann R, Giegerich R, Kurtz S. Fast index based algorithms and
329
software for matching position specific scoring matrices. BMC Bioinformatics.
330
2006;7:389.
331
[18] Colombo T, Farina L, Macino G, Paci P. PVT1: A Rising Star among Oncogenic
332
Long Noncoding RNAs. BioMed research international. 2015;2015.
M AN U
SC
RI PT
311
AC C
EP
TE D
Data Mining in Proteomics: Springer; 2011. p.
15
ACCEPTED MANUSCRIPT [19] Parsons JL, Preston BD, O'Connor TR, Dianov GL. DNA polymerase
334
δ-dependent repair of DNA single strand breaks containing 3′-end proximal lesions.
335
Nucleic Acids Res. 2007;35:1054-63.
336
[20] Baserga R. The relationship of the cell cycle to tumor growth and control of cell
337
division: A review. Cancer Res. 1965;25:581-95.
338
[21] Kikuchi J, Kinoshita I, Shimizu Y, Kikuchi E, Takeda K, Aburatani H, et al.
339
Minichromosome maintenance (MCM) protein 4 as a marker for proliferation and its
340
clinical and clinicopathological significance in non-small cell lung cancer. Lung
341
Cancer. 2011;72:229-37.
342
[22] Baik S-H, Jee B-K, Choi J-S, Yoon H-K, Lee K-H, Kim Y-H, et al. DNA
343
profiling by array comparative genomic hybridization (CGH) of peripheral blood
344
mononuclear cells (PBMC) and tumor tissue cell in non-small cell lung cancer
345
(NSCLC). Mol Biol Rep. 2009;36:1767-78.
346
[23] Blomquist TM, Brown RD, Crawford EL, de la Serna I, Williams K, Yoon Y, et
347
al. CEBPG exhibits allele-specific expression in human bronchial epithelial cells.
348
Gene regulation and systems biology. 2013;7:125.
349
[24] Groenen PM, Garcia E, Debeer P, Devriendt K, Fryns JP, Van de Ven WJ.
350
Structure, Sequence, and Chromosome 19 Localization of HumanUSF2and Its
351
Rearrangement in a Patient with Multicystic Renal Dysplasia. Genomics.
352
1996;38:141-8.
353
[25] Ocejo‐Garcia M, Baokbah TA, Louise Ashurst H, Cowlishaw D, Soomro I,
354
Coulson JM, et al. Roles for USF‐2 in lung cancer proliferation and bronchial
AC C
EP
TE D
M AN U
SC
RI PT
333
16
ACCEPTED MANUSCRIPT carcinogenesis. J Pathol. 2005;206:151-9.
356
[26] Soder AI, Going JJ, Kaye SB, Keith WN. Tumour specific regulation of
357
telomerase RNA gene expression visualized by in situ hybridization. Oncogene.
358
1998;16:979-83.
359
[27] Yokoi S, Yasui K, Iizasa T, Imoto I, Fujisawa T, Inazawa J. TERC identified as a
360
probable target within the 3q26 amplicon that is detected frequently in non-small cell
361
lung cancers. Clin Cancer Res. 2003;9:4705-13.
362
[28] Mamelak AJ, Kowalski J, Murphy K, Yadava N, Zahurak M, Kouba DJ, et al.
363
Downregulation of NDUFA1 and other oxidative phosphorylation‐related genes is a
364
consistent feature of basal cell carcinoma. Exp Dermatol. 2005;14:336-48.
365
[29] Teh M-T, Gemenetzidis E, Patel D, Tariq R, Nadir A, Bahta AW, et al. FOXM1
366
induces a global methylation signature that mimics the cancer epigenome in head and
367
neck squamous cell carcinoma. Plos One. 2012;7:e34329.
368
[30] Wendler WM, Kremmer E, Förster R, Winnacker E-L. Identification of pirin, a
369
novel highly conserved nuclear protein. J Biol Chem. 1997;272:8482-9.
370
[31] Pang H, Bartlam M, Zeng Q, Miyatake H, Hisano T, Miki K, et al. Crystal
371
structure of human pirin An iron-binding nuclear protein and transcription cofactor. J
372
Biol Chem. 2004;279:1491-8.
SC
M AN U
TE D
EP
AC C
373
RI PT
355
374
Figure legends
375
Figure 1 The networks composed of differentially expressed long non-coding RNAs
376
(DE-lncRNAs) and their target genes. The diamond nodes represent DE-lncRNAs; 17
ACCEPTED MANUSCRIPT oval red nodes represent upregulated differentially expressed protein-coding genes
378
(DECGs); and oval green nodes represent downregulated DECGs.
379
Figure 2 The top 10 Gene Ontology (GO) terms with the highest p-value for target
380
genes of differentially expressed long non-coding RNAs (DE-lncRNAs). The value in
381
x-axis represents the number of genes enriched in the GO terms.
382
Figure 3 The regulatory network composed of differentially expressed long
383
non-coding RNAs (DE-lncRNAs) and transcription factors (TFs) identified from
384
differentially expressed protein-coding genes (DECGs). The diamond nodes represent
385
DE-lncRNAs; the triangular red nodes represent upregulated DECGs; and triangular
386
green nodes represent downregulated DECGs.
387
Figure 4 Relative mRNA expression levels of differentially expressed long
388
non-coding RNAs (DE-lncRNAs) and their target genes. (A) Relative mRNA
389
expression levels of lncRNA PVT1 silenced by three small interfering RNAs
390
(siRNAs). (B) Relative mRNA expression levels of lncRNA TERC silenced by three
391
siRNAs. (C) Relative mRNA expression levels of target genes after the silence of
392
PVT1. (D) Relative mRNA expression levels of target genes after the silence of TERC.
393
The NC group represents that lncRNA is silenced by siRNA-scramble sequence as a
394
negative control.
AC C
EP
TE D
M AN U
SC
RI PT
377
18
ACCEPTED MANUSCRIPT
Chr
Start
End
Strand
DIO3OS
chr14
101552221
101560431
-
HAR1A
chr20
63102205
63104386
+
FAM138B chr2
113577382
113578852
+
BCYRN1
chr2
47331060
47344517
+
TERC
chr3
169764520
169765060
-
HCG9
chr6
29975112
29978410
+
PVT1
chr8
127794533
128101253
+
M AN U
SC
LncRNA
RI PT
Table 1 The details of the identified differentially expressed lncRNAs
AC C
EP
TE D
Chr, chromosome; the columns of “Start” and “End” represent the start site and end site of the gene sequence of lncRNA, respectively. “+” and “-” represent positive-sense strand and antisense strand, respectively.
ACCEPTED MANUSCRIPT
Term
Adjusted p-value
Gene count
MF
GO:0003674
molecular_function
9.38E-257
3569
GO:0003723
RNA binding
1.03E-132
703
GO:0044822
poly(A) RNA binding
5.68E-125
569
GO:0005488
binding
4.57E-115
3006
GO:1901363
heterocyclic compound binding
1.33E-92
1641
GO:0031974
membrane-enclosed lumen
7.93E-122
GO:0044428
nuclear part
GO:0070013
intracellular organelle lumen
GO:0043233
organelle lumen
GO:0044424
intracellular part
GO:0008150 GO:0008152
BP
TE D
1114
1010
1.91E-116
1069
8.30E-115
1081
1.41E-107
3215
biological_process
1.41E-214
3624
metabolic process
1.71E-114
2815
EP
4.85E-117
AC C
CC
Genes
CEBPG, ITGAV, INCENP, CEBPZ, TRIM28, SLC25A13, SLC25A15, CDK2, ZNF256, CDK3… CEBPZ, TRIM28, ALYREF, MPHOSPH10, DDX39A, HNRNPR, HRSP12, POP7, NUDT5, WDHD1, STRAP… TRAP1, G3BP1, SUGP2, CEBPZ, TRIM28, ALYREF, SNRNP200, RBM34, RRP1B, WDR43… HSF1, CEBPG, CD24, PDCD6, BCL2L11, CHAF1A, PARP2, HMGXB4, DNAJB6, SMC4… HSF1, USF2, CEBPG, SMC4, UBA2, FARSB, ABCC5, ABCB6, DNM1L, ABCF2… MRPL52, LEO1, EARS2, NR2C2AP, COL1A1, COL1A2, COL3A1, MRPL55, CSTF2, SPC24… PDCD6, CHAF1A, PARP2, HMGXB4, SMC4, UBA2, SAE1, SNUPN, LRPPRC, CDK4… UBA2, SAE1, PPIF, LRPPRC, PDIA6, TRAP1, TRIM28, CDK2, ALYREF, CDK4… SMC4, UBA2, SAE1, PPIF, LRPPRC, PDIA6, TRAP1, TRIM28, CDK2, ALYREF… HSF1, CEBPG, CDH3, POM121C, OST4, ZNF737, PDCD6, BCL2L11, TOMM6, FRAT1 CEBPG, HNRNPR, RABEPK, TIMM17B, HRSP12, POP7, CNKSR1, SF3B4, NET1, UBE4B… HSF1, CEBPG, CDH3, ZNF737, CD24, PDCD6, BCL2L11, TOMM6,
SC
ID
M AN U
Category
RI PT
Table 2 The top 5 enriched GO terms of the upregulated differentially expressed protein-coding genes
ACCEPTED MANUSCRIPT
4.33E-105
2579
GO:0044237
cellular metabolic process
7.33E-104
2533
GO:0044260
cellular macromolecule metabolic process
1.55E-102
2032
RI PT
primary metabolic process
SC
GO:0044238
CHAF1A, PARP2… HSF1, CEBPG, ZNF737, CD24, PDCD6, BCL2L11, TOMM6, CHAF1A, PARP2, MAMLD1… HSF1, CEBPG, PPIF, PREB, ARL4C, LRPPRC, PDIA6, TRAP1, CEBPZ, TRIM28… HSF1, CEBPG, CHAF1A, PARP2, MAMLD1, DNAJB6, SMC4, UBA2, SAE1, FARSB…
AC C
EP
TE D
M AN U
GO, gene ontology; MF, molecular function; CC, cellular component; BP, biological process.
ACCEPTED MANUSCRIPT
Adjusted p-value
Gene count
MF
GO:0003674
molecular_function
1.35E-31
505
GO:0005488
binding
2.54E-14
427
GO:0043167
ion binding
8.19E-08
228
GO:0097367
carbohydrate binding
GO:0003777
microtubule motor activity
0.000287
GO:0005575
cellular_component
1.24E-11
GO:0000775
chromosome, centromeric region
GO:0015630
microtubule cytoskeleton
GO:0005819
spindle
GO:0000793
condensed chromosome
0.000246
18
GO:0008150
biological_process
1.06E-25
521
GO:0044699
single-organism process
1.98E-13
446
BP
0.000127
95 11
556
20
2.76E-05
57
7.14E-05
24
EP
8.44E-06
AC C
CC
derivative
Genes
RI PT
Term
SNRPGP15, TROAP, MEF2B, ZNF605, KIF20A, KLRG1, KCNMB2, TRAIP, NMUR1, CENPA… ZNF605, KIF20A, KLRG1, TRAIP, KLHL41, ABCA9, CORO2B, CENPA, CENPF, IGF2BP3… KIF20A, TRAIP, ABCA9, DMRT2, PLK4, MASP2, CAPN9, CFTR, ADAM28, DBF4… CFTR, KIF2C, KIF12, PSTK, TDRD9, EGFLAM, KIF18B, CTSG, NEK10, ABCA13… KIF20A, KIF2C, KIF12, KIF18B, DYNC1I1, KIF26A, KIF26B, KIF15, DNAI2, DYNC2H1… TMEM170B, LRRC70, SNRPGP15, SMIM6, TROAP, MEF2B, ZNF605, KIF20A, KLRG1, KCNMB2… CENPA, CENPF, KIF2C, CDCA5, OIP5, SGOL2, ESCO2, DYNC1I1, HELLS, BIRC5… KIF12, CCSAP, KIF18B, CKAP2L, SASS6, DYNC1I1, CEP152, NCAPH, POC1A, ASPM… KIF20A, CENPF, CCSAP, KIF18B, CKAP2L, DYNC1I1, POC1A, ASPM, WDR62, BIRC5… CENPA, CENPF, KIF2C, CDCA5, SGOL2, DYNC1I1, NCAPH, BIRC5, CENPW, MKI67… TROAP, MEF2B, ZNF605, KIF20A, KLRG1, KCNMB2, TRAIP, NMUR1, KLHL41, ABCA9… MEF2B, KIF20A, KLRG1, KCNMB2, NMUR1, KLHL41, ABCA9,
SC
ID
M AN U
Category
TE D
Table 3 The top 5 enriched GO terms of the downregulated differentially expressed protein-coding genes
ACCEPTED MANUSCRIPT
GO:0007049 GO:0000278
cellular
6.96E-12
411
cell cycle
5.82E-11
97
mitotic cell cycle
6.91E-11
68
RI PT
single-organism process
SC
GO:0044763
CORO2B, CENPA, CENPF… KIF20A, KLRG1, KCNMB2, TRAIP, NMUR1, KLHL41, ABCA9, CORO2B, CENPA, CENPF… KIF20A, CENPA, CENPF, PLK4, DBF4, KIF2C, CDCA5, OIP5, PARD3B, TDRD9… KIF20A, CENPA, CENPF, PLK4, DBF4, KIF2C, CDCA5, OIP5, IQGAP3, KIF18B…
AC C
EP
TE D
M AN U
GO, gene ontology; MF, molecular function; CC, cellular component; BP, biological process.
ACCEPTED MANUSCRIPT
ID
Term
Adjusted p-value
Gene count
hsa03013
RNA transport
8.41E-21
93
hsa03040
Spliceosome
1.31E-19
81
hsa03008
Ribosome biogenesis in eukaryotes
2.20E-11
50
hsa03050
Proteasome
8.92E-09
31
hsa03030
DNA replication
4.18E-08
26
Genes
RI PT
Table 4 The top 5 enriched pathways with the highest p-value of the upregulated differentially expressed protein-coding genes
AC C
EP
TE D
M AN U
SC
POP7, PRMT5, TACC3, RPP30, PAIP1, NUP50, RPP40, RNPS1, NUPL2, STRAP… PPIH, CHERP, USP39, SRSF10, TXNL4A, TCERG1, SF3A3, SF3B2, U2AF2, HNRNPA1L2… PWP2, RAN, NOL6, TCOF1, XPO1, RIOK1, WDR75, UTP15, CIRH1A, IMP4… PSMD2, PSMD3, PSMD4, PSMD7, PSMD8, PSMD11, PSMD12, PSMD13, SHFM1, PSMF1… FEN1, POLA2, RNASEH1, LIG1, MCM2, MCM3, MCM4, MCM5, RFC2, RFC3…
ACCEPTED MANUSCRIPT
Table 5 The results of pathway enrichment analysis of target genes of PVT1 and TERC Adjusted p-value
Gene count
PVT1 target genes
hsa03030
DNA replication
2.93E-12
11
hsa03040
Spliceosome
2.54E-07
12
hsa03410 hsa00230 hsa04110 hsa00240
Base excision repair Purine metabolism Cell cycle Pyrimidine metabolism
9.58E-06 0.001842 0.001842 0.002039
6 8 7 6
hsa05012
Parkinson's disease
1.81E-12
hsa05016
Huntington's disease
3.85E-12
14
hsa05010
Alzheimer's disease
3.05E-10
12
hsa00190
Oxidative phosphorylation
hsa01100
Metabolic pathways
13
TE D EP
AC C
TERC target genes
Genes
RI PT
Term
POLA2, PCNA, PRIM1, FEN1, MCM5, LIG1, POLE, MCM6, MCM4, PRIM2, POLD1 SNRPA, SNRNP70, SNRPD1, SF3A2, SF3B4, PUF60, PRPF3, U2AF2, EFTUD2, TXNL4A, SNRPE, HNRNPC PCNA, FEN1, LIG1, POLE, MUTYH, POLD1 PAICS, POLA2, PRIM1, POLE, GART, POLR2I, PRIM2, POLD1 PCNA, PTTG1, MCM5, E2F1, MCM6, MCM4, CDK2 POLA2, PRIM1, POLE, POLR2I, PRIM2, POLD1 NDUFAB1, COX5B, SLC25A5, NDUFA11, NDUFA12, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, CASP3, NDUFA8, NDUFB4 NDUFAB1, COX5B, SLC25A5, NDUFA11, NDUFA12, POLR2H, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, CASP3, NDUFA8, NDUFB4 NDUFAB1, COX5B, NDUFA11, NDUFA12, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, CASP3, NDUFA8, NDUFB4 NDUFAB1, COX5B, NDUFA11, NDUFA12, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, NDUFA8, NDUFB4 NDUFAB1, PRPS2, FUT5, COX5B, NDUFA11, PTS, ALG3, POLR2H, PDHA1, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, NDUFA8, NDUFB4
SC
ID
M AN U
Category
3.05E-10
11
0.000663
16
ACCEPTED MANUSCRIPT
Table 6 Sequences of small interfering RNAs and gene primers Name
Sequences
siRNA
PVT1-siRNA 1
Positive-sense strand: CCAGAAUCCUGUUACACCUdTdT Antisense strand: AGGUGUAACAGGAUUCUGGdTdT Positive-sense strand: GCACAUUUCAGGAUACUAAdTdT Antisense strand: UUAGUAUCCUGAAAUGUGCdTdT Positive-sense strand: GGCCUCGUGUCUAUUAAAUdTdT Antisense strand: AUUUAAUAGACACGAGGCCdTdT Positive-sense strand: GUUCAUUCUAGAGCAAACAdTdT Antisense strand: UGUUUGCUCUAGAAUGAACdTdT Positive-sense strand: GUCUAACCCUAACUGAGAAdTdT Antisense strand: UUCUCAGUUAGGGUUAGACdTdT Positive-sense strand: CACCGUUCAUUCUAGAGCAdTdT Antisense strand: UGCUCUAGAAUGAACGGUGdTdT Positive-sense strand: UUCUCCGAACGUGUCACGUdTdT Antisense strand: ACGUGACACGUUCGGAGAAdTdT Forward: ATCTTCAGTGGTCTGGGGAATAACG Reverse: CCATGGACATCCAAGCTGTAAGAAT Forward: TGGCCATTTTTTGTCTAACCCTAAC Reverse: TTTGCTCTAGAATGAACGGTGGAAG Forward: TACACCACACCTTCAAAGGGTTCTC Reverse: ACTTGACGGTGAGAGTAGCTGATGG Forward: GTGACCTGCAACGGGAGCTGAACTT Reverse: CCGTGGTACCCAAACATGCTCTCT Forward: AATCTTCTTTGACCGTTACCCTGAC Reverse: ATGAGCTGGTCAATGTCTTCTGGAT
TERC-siRNA 3 siRNA-scramble Primer
PVT1 TERC POLA2 POLD1 MCM4
SC
M AN U
TERC-siRNA 2
TE D
TERC-siRNA 1
EP
PVT1-siRNA 3
AC C
PVT1-siRNA 2
RI PT
Category
ACCEPTED MANUSCRIPT
NDUFA8 NDUFB4 GAPDH
RI PT
SC
NDUFB5
M AN U
NDUFA11
TE D
NDUFAB1
EP
MCM6
Forward: TTCCAGCATTCGTAGCCTGAAGTCG Reverse: GGCACTGGATAGAGATGCGGGT Forward: CAAGACCTGCCTACCAGACACAAGA Reverse: AGCACAGAAAAGTTCCGCTCACAAG Forward: TATGACAAGATTGACCCAGAGAAGC Reverse: CGTCTTCCATGGCCATGATAATCTC Forward: TCACACTCAATCCTCCGGGCACCTT Reverse: TGATGCAGGTGGTGAGGCCAAACA Forward: AAAGAACAATGGCCGTCCTTCAGAT Reverse: TAATACCAGGGTCCATCTCCTCTCA Forward: AACAGCAGGCAAAGTTTGACGAGTG Reverse: GGTCTTGGTCTTGAGTGATAGGGAT Forward: TACCTGCTTCAGTACAACGATCCCA Reverse: ACAGAGCTCCCATGAGTGAGTTTTT Forward: AGAAGGCTGGGGCTCATTTG Reverse: AGGGGCCATCCACAGTCTTC
AC C
MCM5
AC C
EP
TE D
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
AC C
EP
TE D
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
AC C
EP
TE D
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
AC C
EP
TE D
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
ACCEPTED MANUSCRIPT Highlights
In total, 5162 DEGs were screened from LUSC samples, and 7 ones were lncRNA genes. The lncRNAs PVT1 and TERC targeted multiple DEGs.
PVT1 and TERC were also regulated by a set of TFs identified from DECGs.
AC C
EP
TE D
M AN U
SC
RI PT