Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell carcinoma

Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell carcinoma

Accepted Manuscript Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell car...

2MB Sizes 0 Downloads 37 Views

Accepted Manuscript Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell carcinoma Xiaofen Wu, Lei Ruan, Yi Yang, Qi Mei PII:

S0890-8508(16)30019-6

DOI:

10.1016/j.mcp.2016.02.009

Reference:

YMCPR 1201

To appear in:

Molecular and Cellular Probes

Received Date: 11 September 2015 Revised Date:

22 January 2016

Accepted Date: 19 February 2016

Please cite this article as: Wu X, Ruan L, Yang Y, Mei Q, Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell carcinoma, Molecular and Cellular Probes (2016), doi: 10.1016/j.mcp.2016.02.009. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Identification of crucial regulatory relationships between long non-coding RNAs

2

and protein-coding genes in lung squamous cell carcinoma

3

Running title: Bioinformatic analysis of LUSC

4

Xiaofen Wu1, MD; Lei Ruan1, MD; Yi Yang1, MM; Qi Mei2*, MD

5

1 Department of Gerontology, Tongji Hospital, Tongji Medical College, Huazhong

6

University of Science and Technology, Wuhan 430030, China

7

2 Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong

8

University of Science and Technology, Wuhan 430030, China

9

*

SC

RI PT

1

M AN U

Correspondence author: Qi Mei

Address: Department of Oncology, Tongji Hospital, Tongji Medical College,

11

Huazhong University of Science and Technology, Jiefang Avenue 1095, Wuhan

12

430030, China

13

Tel: 027-83663407; Fax: 027-83662834

14

E-mail: [email protected]

17 18 19 20

EP

16

AC C

15

TE D

10

21 22 23

1

ACCEPTED MANUSCRIPT Abstract

25

Purpose: This study aimed to analyze the relationships of long non-coding RNAs

26

(lncRNAs) and protein-coding genes in lung squamous cell carcinoma (LUSC).

27

Methods: RNA-seq data of LUSC deposited in the TCGA database were used to

28

identify differentially expressed protein-coding genes (DECGs) and differentially

29

expressed lncRNA genes (DE-lncRNAs) between LUSC samples and normal samples.

30

Functional enrichment analysis of DECGs was then performed. Subsequently, the

31

target genes and regulators of DE-lncRNAs were predicted from the DECGs.

32

Additionally, expression levels of target genes of DE-lncRNAs were validated by

33

RT-qPCR after the silence of DE-lncRNAs.

34

Results: In total, 5162 differentially expressed genes (DEGs) were screened from the

35

LUSC samples, and there were seven upregulated lncRNA genes in the DEGs. The

36

upregulated DECGs were enriched in GO terms like RNA binding and metabolic

37

process. Meanwhile, the downregulated DECGs were enriched in GO terms like cell

38

cycle. Furthermore, the lncRNAs PVT1 and TERC targeted multiple DECGs. PVT1

39

targeted genes related to cell cycle (e.g. POLA2, POLD1, MCM4, MCM5 and MCM6),

40

and reduced expression of PVT1 decreased expression of the genes. TERC regulated

41

several genes (e.g. NDUFAB1, NDUFA11 and NDUFB5), and reduced expression of

42

TERC increased expression of the genes. Additionally, PVT1 was regulated by

43

multiple transcription factors (TFs) identified from DECGs, such as HSF1; and TERC

44

was modulated by TFs, such as PIR.

45

Conclusion: A set of regulatory relationships between PVT1 and its targets and

AC C

EP

TE D

M AN U

SC

RI PT

24

2

ACCEPTED MANUSCRIPT 46

regulators, as well as TERC and its targets and regulators, may play crucial roles in

47

the progress of LUSC.

48

Key words: lung squamous cell carcinoma, differentially expressed protein-coding

50

gene, long non-coding RNA, regulatory network

51

Introduction

RI PT

49

Lung squamous cell carcinoma (LUSC) is a common type of non small-cell lung

53

cancer and the second leading cause of death resulting from lung cancer, causing

54

about 400,000 deaths per year worldwide [1, 2]. It is urgent to reveal the molecular

55

mechanisms underlying LUSC oncogenesis.

M AN U

SC

52

In the past years, various molecular players have been demonstrated to be

57

correlated with LUSC, such as protein-coding genes and lncRNAs. For instance, it

58

has been demonstrated that in the mouse lung, biallelic inactivation of Lkb1 and Pten

59

that causes elevated PD-L1 expression, leads to LUSC [3]. PRKCI and SOX2

60

oncogenes cooperate to stimulate activation of Hedgehog signaling and drive a

61

stem-like phenotype in LUSC [4]. Besides, high expression of FGFR1 was found in

62

16% of a clinical cohort of LUSC patients [5]. Furthermore, high expressions of the

63

lncRNAs MALAT1 and HOTAIR are associated with poor prognosis of LUSC [6, 7]. A

64

recent study has been reported that the lncRNA LINC01133 is highly expressed in

65

LUSC and it predicts survival time [8]. However, the interactions between

66

differentially expressed protein-coding genes (DECGs) and differentially expressed

67

lncRNA genes (DE-lncRNAs) are still elusive.

AC C

EP

TE D

56

3

ACCEPTED MANUSCRIPT In this study, RNA-seq data of LUSC deposited in The Cancer Genome Atlas

69

(TCGA) database were used to identify DECGs and DE-lncRNAs. Following the

70

functional analysis of DECGs, the regulatory relationships between DECGs and

71

DE-lncRNAs were analyzed. Additionally, expression levels of target genes of

72

DE-lncRNAs were validated after the silence of DE-lncRNAs. These results were

73

expected to provide novel information for the understanding of molecular

74

mechanisms of LUSC tumorigenesis.

75

Materials and methods

76

RNA sequencing data

M AN U

SC

RI PT

68

The preprocessed RNA-seq level 3 data of LUSC were downloaded from TCGA

78

database (http://cancergenome.nih.gov/), which provides a platform for researchers to

79

download the data sets of genomic abnormalities driving tumorigenesis. In total, the

80

dataset contains 17 pairs of matched primary solid tumor samples and adjacent normal

81

tissue samples, and RPKM (Reads Per Kilobase of exon model per Million mapped

82

reads) values of each gene in these samples were included.

83

DEGs and DE-lncRNAs screening

EP

AC C

84

TE D

77

The paired-sample T test was used to identify the DEGs (including DECGs and

85

non-protein coding genes) between the two matched samples, and p-value was

86

adjusted by the Benjamin and Hochberg (BH) method [9]. Only the genes with

87

adjusted p-value < 0.05 and FC (fold change) > 2 were chosen as DEGs.

88 89

Based

on

annotation

information

in

the

GENCODE

database

(http://www.gencodegenes.org/) [10], the DE-lncRNAs were screened from the 4

ACCEPTED MANUSCRIPT 90

identified DEGs.

91

Enrichment analysis of DECGs To further reveal the functions of DECGs, the Gene Ontology (GO) functional

93

and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment

94

analyses of DEGs were performed using the clusterProfiler package of R [11]. The

95

GO terms and pathway terms with adjusted p-value < 0.05 were selected as the

96

significant terms.

97

Prediction of DE-lncRNA target genes and their functions

M AN U

SC

RI PT

92

Based on the coexpression relationships of lncRNA and protein-coding genes,

99

the target genes of DE-lncRNAs were predicted from the screened DECGs, and only

100

genes with p-value < 0.05 and expression similarity > 0.8 were chosen for subsequent

101

analyses. The expression similarity of DECGs and lncRNAs was measured as

102

previous researchers did [12, 13]. Regulatory networks of lncRNAs and their target

103

genes were then visualized by Cytoscape (http://cytoscape.org/) [14]. Additionally, the

104

GO functional enrichment analysis of target genes was conducted using the

105

clusterProfiler package of R.

106

Prediction of upstream transcription factors (TFs) of DE-lncRNAs

EP

AC C

107

TE D

98

To further analyze the differential expression mechanisms of DE-lncRNAs,

108

based on the TF information in TRANSFAC database [15], we used MotifDb package

109

of R [16] to predict the potential TFs regulating DE-lncRNAs from DECGs. This

110

package is able to search the binding motif in TF of the promoter sequence, based on

111

the gene promoter sequences of lncRNAs provided by users. The matching degree of 5

ACCEPTED MANUSCRIPT motif sequence and promoter sequence was calculated by pulse-width modulation

113

(PWM) algorithm [17], and TFs with matching degree ≥ 85% were considered as the

114

potential upstream TFs regulating DE-lncRNAs. The TF-DE-lncRNA regulatory

115

network was visualized by Cytoscape.

116

Validation of downstream gene expression of significant lncRNAs

RI PT

112

Based on the gene sequences of lncRNAs in the National Center For

118

Biotechnology Information (NCBI) database, three small interfering RNA (siRNA)

119

sequences and one siRNA-scramble sequence for each lncRNAs (Table 6) were

120

designed using BLOCK-iT™ RNAi Designer (www.invitrogen.com/rnai). Besides,

121

based on the sequences of downstream genes of lncRNAs in NCBI, gene primers

122

were designed using the Primer5 software (PRIMER-E Ltd, Plymouth, UK). Primer

123

sequences of genes were listed in Table 6.

TE D

M AN U

SC

117

Human LUSC cell line SK-MES-1 purchased from the Cell Bank at the Chinese

125

Academy of Sciences were placed in a 12 well plate and transfected with siRNA

126

sequences and siRNA-scramble sequence using Lipofectamine 2000 (Invitrogen,

127

Carlsbad, CA, USA), respectively. After transfection of 24 h, total RNA for mRNA

128

expression detection was extracted from SK-MES-1 cells by TriZol reagent

129

(Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol.

130

Afterwards, total RNA was reverse transcribed using PrimeScript RT Master Mix

131

(Takara, RR036A, Shiga, Japan). After cDNA synthesis, mRNA expression levels

132

were tested using Power SYBR Green PCR Master Mix (#4367659, Applied

133

Biosystems, Foster City, CA, USA), and glyceraldehyde-3-phosphate dehydrogenase

AC C

EP

124

6

ACCEPTED MANUSCRIPT 134

(GAPDH) was used as the reference gene.

135

Statistical methods Statistical analysis was performed using the SPSS 19.0 software (SPSS, Chicago,

137

USA). One-way analysis of variance (ANOVA) followed by Duncan test was used to

138

compare group means. Difference with P<0.05 was considered significant.

RI PT

136

Results

141

Predicted DEGs and DE-lncRNAs

M AN U

140

SC

139

In total, 5162 DEGs (4489 upregulated ones and 673 downregulated ones) were

143

identified from the LUSC samples, comparing the normal samples. Among these

144

DEGs, there were 7 upregulated lncRNA genes (Table 1).

145

Functions of DECGs

TE D

142

According to GO functional enrichment analysis of DECGs in the DEGs, the

147

upregulated DECGs were significantly enriched in a series of GO terms, such as RNA

148

binding (e.g. CEBPZ, TRIM28 and HNRNPR), membrane-enclosed lumen (e.g.

149

MRPL52 and COL1A1) and metabolic process (e.g. CEBPG and CDH3) (Table 2).

150

Meanwhile, the downregulated DEGs were distinctly enriched in multiple GO terms,

151

such as ion binding (e.g. KIF20A and TRAIP), microtubule cytoskeleton (e.g. KIF18B

152

and CEP152) and cell cycle (e.g. KIF20A and CENPA) (Table 3).

AC C

EP

146

153

Furthermore, the upregulated DEGs were significantly several pathways,such as

154

RNA transport (e.g. PRMT5 and TACC3), spliceosome (e.g. CHERP and USP39) and

155

DNA replication (e.g. MCM2 and RFC3) (Table 4). While, no downregulated DEGs 7

ACCEPTED MANUSCRIPT 156

were enriched in significant pathways.

157

Target genes of DE-lncRNAs and their functions Totally, 347 target genes of five DE-lncRNAs (PVT1, TERC, HAR1A, TTTY16

159

and FAM138B) were identified. Among the five DE-lncRNAs, PVT1 and TERC

160

targeted multiple genes. For example, PVT1 regulated a set of genes, such as HSF1,

161

USF2 and CEBPG; TERC modulated a series of genes, such as PIR, NDUFB5 and

162

PRPS2 (Figure 1).

SC

RI PT

158

According to GO functional enrichment analysis in biological process (BP) of

164

target genes of PVT1 and TERC, PVT1 target genes were mainly enriched in GO

165

terms of cell cycle, such as nucleobase−containing compound metabolic process,

166

heterocycle metabolic process and cellular nitrogen compound metabolic process

167

(Figure 2A). Furthermore, a set of PVT1 target genes were significantly enriched in

168

six pathways, such as DNA replication (e.g. POLA2, POLD1, and MCM5), purine

169

metabolism (e.g. POLA2 and POLD1) and cell cycle (e.g. MCM4, MCM5and MCM6)

170

(Table 5).

EP

TE D

M AN U

163

TERC target genes were mainly enriched in the metabolic process and biological

172

process (Figure 2B). A series of target genes (e.g. NDUFAB1, NDUFA11, NDUFB5,

173

NDUFA8 and NDUFB4) were markedly enriched in several pathways, such as

174

oxidative phosphorylation and metabolic pathways (Table 5).

175

Analysis of TF-DE-lncRNA regulatory network

AC C

171

176

A total of 163 TFs were identified from DECGs, including 139 upregulated ones

177

and 24 downregulated ones. Among these TFs, 73 TFs potentially modulated 7 8

ACCEPTED MANUSCRIPT DE-lncRNAs. The regulatory network consisted of 339 interactions, including 7

179

DE-lncRNAs and 73 TFs. There were four coexpressed relationships in this network,

180

including PVT1 and its three potential upstream TFs (HSF1, USF2 and CEBPG), as

181

well as TERC and its potential upstream TF PIR (Figure 3).

182

Gene expression validated by real-time quantitative polymerase chain reaction

183

(RT-qPCR)

RI PT

178

To validate the regulation of lncRNAs PVT1 and TERC on the expression of

185

their downstream genes, RT-qPCR was used to determine the gene expression after

186

lncRNAs were silenced by siRNAs. Among the three siRNAs, mRNA expression of

187

PVT1 silenced by siRNA 3 and TERC silenced by siRNA 1 was the lowest (Figure 4A

188

and B). Therefore, siRNA 3 for PVT1 and siRNA 1 for TERC were used for the

189

determination of expression of downstream target genes. After the silence of PVT1, all

190

of the mRNA expression of target genes (POLD1, POLA2, MCM4, MCM5 and

191

MCM6) was significantly decreased (p < 0.05), comparing with the control (Figure

192

4C). Furthermore, after the silence of TERC, all of the mRNA expression of target

193

genes (NDUFB5, NDUFA11, NDUFAB1, NDUFB4 and NDUFA8) was significantly

194

increased (p < 0.05), comparing with the control (Figure 4D).

196

M AN U

TE D

EP

AC C

195

SC

184

Discussion

197

In the current study, 5162 DEGs (4489 upregulated ones and 673 downregulated

198

ones) were screened from the LUSC samples, comparing the normal samples. Among

199

these DEGs, there were seven upregulated lncRNA genes, some of which (PVT1 and 9

ACCEPTED MANUSCRIPT 200

TERC) targeted multiple genes. PVT1 is encoded by a gene that resides in the well-known cancer risk region

202

8q24, and it plays a pivotal role in cancers, such as LUSC [18]. In this study, PVT1

203

targeted a set of genes, such as POLA2, POLD1, MCM4, MCM5 and MCM6, and it

204

was validated that reduced expression of PVT1 decreased the expression of the five

205

target genes. These five genes were significantly enriched in several GO terms of cell

206

cycle, such as DNA replication, purine metabolism and cell cycle. Both POLA2 and

207

POLD1 encode DNA polymerase. POLD1 is involved in DNA single-strand break

208

repair process, and increase genomic instability of POLD1 leads to a high incidence

209

of epithelial tumours in mice [19]. Dysregulated cell cycle results in aberrant cell

210

growth, which may lead to tumorigenesis [20]. Thus, POLA2 and POLD1 might play

211

key roles in LUSC. MCM4, MCM5 and MCM6 encode minichromosome maintenance

212

(MCM) complex. Study has demonstrated that MCM4 expression is higher in LUSC

213

cells than in adjacent normal bronchial epithelial cells [21]. There is no evidence that

214

MCM5 and MCM6 are associated with LUSC, and they may participate in the

215

progress of LUSC via DNA replication.

SC

M AN U

TE D

EP

AC C

216

RI PT

201

In this study, PVT1 was also regulated by some TFs, such as HSF1, CEBPG and

217

USF2, which were identified from the DEGs. HSF1 (Heat Shock Transcription Factor

218

1) was found to be enriched in the metabolic process in the present study. A previous

219

study has discovered copy number gain in HSF1 gene region in LUSC [22]. Besides,

220

CEBPG (CCAAT/enhancer binding protein (C/EBP), gamma) is a member of the

221

bZIP family of DNA binding proteins, and it is expressed allele-specifically in human 10

ACCEPTED MANUSCRIPT bronchial epithelial cells [23]. USF2 encodes a member of the helix-loop-helix

223

leucine zipper family, and can function as a cellular TF [24]. USF-2 overexpression in

224

lung squamous cell carcinoma cell line NCI-H2170 significantly stimulates in vitro

225

cell proliferation [25]. Taken together, the lncRNA PVT1 might exert essential roles in

226

LUSC, through its regulation on targets (e.g. POLA2, POLD1, MCM4, MCM5 and

227

MCM6) and the modulation by TFs (e.g. HSF1, CEBPG and USF2).

RI PT

222

Another lncRNA TERC is a telomerase RNA component encoding gene, and its

229

expression is common in LUSC [26]. Study has shown that TERC is upregulated in

230

LUSC cell lines, which is consistent with the result in this study [27]. It is a likely

231

target of the 3q26 amplification, and may be a useful biomarker for diagnosis of

232

non-small cell lung cancer [27]. In this study, TERC regulated a series of DEGs, such

233

as NDUFAB1, NDUFA11, NDUFB5, NDUFA8 and NDUFB4. It was validated that

234

reduced expression of TERC increased the expression of the five target genes. These

235

genes all encode NADH dehydrogenase (ubiquinone) 1, and were enriched in

236

oxidative phosphorylation (OXPHOS) and metabolic pathways. Inhibition of the

237

activity of OXPHOS stimulates a rapid increase in their rates of aerobic glycolysis in

238

LUSC cells. There is no evidence to prove the correlation of the five genes with

239

LUSC, while, NADH dehydrogenase (ubiquinone) alpha subcomplex-1 (NDUFA1) is

240

an accessory component of OXPHOS complex-I, which is indispensable for

241

respiratory activity [28]. NDUFA10 is significantly dysregulated in head and neck

242

squamous cell carcinoma [29]. Therefore, TERC and its targets might function in

243

LUSC through the oxidative phosphorylation pathway. Additionally, TERC was found

AC C

EP

TE D

M AN U

SC

228

11

ACCEPTED MANUSCRIPT to be regulated by the TF PIR. PIR encodes an Fe(II)-containing nuclear protein,

245

which is a member of the cupin superfamily [30]. It interacts with B cell lymphoma

246

3-encoded oncoprotein and nuclear factor I/CCAAT box transcription factor,

247

indicating that PIR may be involved in the modulation of DNA replication and

248

transcription [31]. There is no evidence that PIR is associated with LUSC so far.

249

Thereby, PIR may play pivotal roles in LUSC, via the regulation on TERC expression.

250

However, there were some limitations in this study. The aforementioned results,

251

including the expression level of lncRNAs and genes, as well as the functions of them,

252

were needed to be validated by experiments, which would be conducted in our further

253

studies.

M AN U

SC

RI PT

244

In conclusion, 5162 DEGs were screened from the LUSC samples, and there

255

were seven upregulated lncRNAs in the DEGs. Among the DECGs and DE-lncRNAs,

256

the lncRNA PVT1 and its targets (e.g. POLA2, POLD1, MCM4, MCM5 and MCM6)

257

as well as its regulators (e.g. HSF1, CEBPG and USF2), TERC and its targets (e.g.

258

NDUFAB1, NDUFA11 and NDUFB5) as well as its regulators (e.g. PIR) might exert

259

crucial functions in the tumorigenesis of LUSC. These findings were expected to

260

provide a theoretical basis for further experimental studies, and helpful to shed light

261

on the molecular mechanisms of LUSC.

EP

AC C

262

TE D

254

263 264

Acknowledgement

265

This study was supported by Hubei Provincial Natural Science Foundation of

266

China(No. 2014CFB366). 12

ACCEPTED MANUSCRIPT 267

Conflict of interests

268

All authors declare that they have no conflict of interests to state.

269

RI PT

270 271 272

SC

273

M AN U

274 275 276 277

TE D

278 279 280

EP

281

283 284

AC C

282

285 286

References

287

[1] Network CGAR. Comprehensive genomic characterization of squamous cell lung

288

cancers. Nature. 2012;489:519-25. 13

ACCEPTED MANUSCRIPT [2] Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin.

290

2013;63:11-30.

291

[3] Xu C, Fillmore CM, Koyama S, Wu H, Zhao Y, Chen Z, et al. Loss of Lkb1 and

292

Pten leads to lung squamous cell carcinoma with elevated PD-L1 expression. Cancer

293

Cell. 2014;25:590-604.

294

[4] Justilien V, Walsh MP, Ali SA, Thompson EA, Murray NR, Fields AP. The PRKCI

295

and SOX2 oncogenes are coamplified and cooperate to activate Hedgehog signaling

296

in lung squamous cell carcinoma. Cancer Cell. 2014;25:139-51.

297

[5] Heist RS, Mino-Kenudson M, Sequist LV, Tammireddy S, Morrissey L, Christiani

298

DC, et al. FGFR1 amplification in squamous cell carcinoma of the lung. J Thorac

299

Oncol. 2012;7:1775.

300

[6]

301

andthymosinbeta4predictmetastasisandsurvivalinearly

302

smallcelllungcancer. Oncogene. 2003;22:8031.

303

[7] Liu X-h, Liu Z-l, Sun M, Liu J, Wang Z-x, De W. The long non-coding RNA

304

HOTAIR indicates a poor prognosis and promotes metastasis in non-small cell lung

305

cancer. BMC Cancer. 2013;13:464.

306

[8] Zhang J, Zhu N, Chen X. A novel long noncoding RNA LINC01133 is upregulated

307

in lung squamous cell cancer and predicts survival. Tumor Biology. 2015:1-7.

308

[9] Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and

309

powerful approach to multiple testing. Journal of the Royal Statistical Society Series

310

B (Methodological). 1995:289-300.

D,

Wang

W.

mALAT

1,

anovelnoncoding

RnA,

stage

non

AC C

EP

TE D

JiP

M AN U

SC

RI PT

289

14

ACCEPTED MANUSCRIPT [10] Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al.

312

GENCODE: the reference human genome annotation for The ENCODE Project.

313

Genome Res. 2012;22:1760-74.

314

[11] Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing

315

biological themes among gene clusters. OMICS. 2012;16:284-7.

316

[12] Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of

317

genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863-8.

318

[13] Herzel H, Beule D, Kielbasa S, Korbel J, Sers C, Malik A, et al. Extracting

319

information from cDNA arrays. CHAOS: an interdisciplinary journal of nonlinear

320

science. 2001;11:98-107.

321

[14] Kohl M, Wiese S, Warscheid B. Cytoscape: software for visualization and

322

analysis of biological networks.

323

291-303.

324

[15] Wingender E. The TRANSFAC project as an example of framework technology

325

that supports the analysis of genomic regulation. Brief Bioinform. 2008;9:326-32.

326

[16] Shannon P. MotifDb: an annotated collection of protein-DNA binding sequence

327

motifs. R package version 1.2. 2.

328

[17] Beckstette M, Homann R, Giegerich R, Kurtz S. Fast index based algorithms and

329

software for matching position specific scoring matrices. BMC Bioinformatics.

330

2006;7:389.

331

[18] Colombo T, Farina L, Macino G, Paci P. PVT1: A Rising Star among Oncogenic

332

Long Noncoding RNAs. BioMed research international. 2015;2015.

M AN U

SC

RI PT

311

AC C

EP

TE D

Data Mining in Proteomics: Springer; 2011. p.

15

ACCEPTED MANUSCRIPT [19] Parsons JL, Preston BD, O'Connor TR, Dianov GL. DNA polymerase

334

δ-dependent repair of DNA single strand breaks containing 3′-end proximal lesions.

335

Nucleic Acids Res. 2007;35:1054-63.

336

[20] Baserga R. The relationship of the cell cycle to tumor growth and control of cell

337

division: A review. Cancer Res. 1965;25:581-95.

338

[21] Kikuchi J, Kinoshita I, Shimizu Y, Kikuchi E, Takeda K, Aburatani H, et al.

339

Minichromosome maintenance (MCM) protein 4 as a marker for proliferation and its

340

clinical and clinicopathological significance in non-small cell lung cancer. Lung

341

Cancer. 2011;72:229-37.

342

[22] Baik S-H, Jee B-K, Choi J-S, Yoon H-K, Lee K-H, Kim Y-H, et al. DNA

343

profiling by array comparative genomic hybridization (CGH) of peripheral blood

344

mononuclear cells (PBMC) and tumor tissue cell in non-small cell lung cancer

345

(NSCLC). Mol Biol Rep. 2009;36:1767-78.

346

[23] Blomquist TM, Brown RD, Crawford EL, de la Serna I, Williams K, Yoon Y, et

347

al. CEBPG exhibits allele-specific expression in human bronchial epithelial cells.

348

Gene regulation and systems biology. 2013;7:125.

349

[24] Groenen PM, Garcia E, Debeer P, Devriendt K, Fryns JP, Van de Ven WJ.

350

Structure, Sequence, and Chromosome 19 Localization of HumanUSF2and Its

351

Rearrangement in a Patient with Multicystic Renal Dysplasia. Genomics.

352

1996;38:141-8.

353

[25] Ocejo‐Garcia M, Baokbah TA, Louise Ashurst H, Cowlishaw D, Soomro I,

354

Coulson JM, et al. Roles for USF‐2 in lung cancer proliferation and bronchial

AC C

EP

TE D

M AN U

SC

RI PT

333

16

ACCEPTED MANUSCRIPT carcinogenesis. J Pathol. 2005;206:151-9.

356

[26] Soder AI, Going JJ, Kaye SB, Keith WN. Tumour specific regulation of

357

telomerase RNA gene expression visualized by in situ hybridization. Oncogene.

358

1998;16:979-83.

359

[27] Yokoi S, Yasui K, Iizasa T, Imoto I, Fujisawa T, Inazawa J. TERC identified as a

360

probable target within the 3q26 amplicon that is detected frequently in non-small cell

361

lung cancers. Clin Cancer Res. 2003;9:4705-13.

362

[28] Mamelak AJ, Kowalski J, Murphy K, Yadava N, Zahurak M, Kouba DJ, et al.

363

Downregulation of NDUFA1 and other oxidative phosphorylation‐related genes is a

364

consistent feature of basal cell carcinoma. Exp Dermatol. 2005;14:336-48.

365

[29] Teh M-T, Gemenetzidis E, Patel D, Tariq R, Nadir A, Bahta AW, et al. FOXM1

366

induces a global methylation signature that mimics the cancer epigenome in head and

367

neck squamous cell carcinoma. Plos One. 2012;7:e34329.

368

[30] Wendler WM, Kremmer E, Förster R, Winnacker E-L. Identification of pirin, a

369

novel highly conserved nuclear protein. J Biol Chem. 1997;272:8482-9.

370

[31] Pang H, Bartlam M, Zeng Q, Miyatake H, Hisano T, Miki K, et al. Crystal

371

structure of human pirin An iron-binding nuclear protein and transcription cofactor. J

372

Biol Chem. 2004;279:1491-8.

SC

M AN U

TE D

EP

AC C

373

RI PT

355

374

Figure legends

375

Figure 1 The networks composed of differentially expressed long non-coding RNAs

376

(DE-lncRNAs) and their target genes. The diamond nodes represent DE-lncRNAs; 17

ACCEPTED MANUSCRIPT oval red nodes represent upregulated differentially expressed protein-coding genes

378

(DECGs); and oval green nodes represent downregulated DECGs.

379

Figure 2 The top 10 Gene Ontology (GO) terms with the highest p-value for target

380

genes of differentially expressed long non-coding RNAs (DE-lncRNAs). The value in

381

x-axis represents the number of genes enriched in the GO terms.

382

Figure 3 The regulatory network composed of differentially expressed long

383

non-coding RNAs (DE-lncRNAs) and transcription factors (TFs) identified from

384

differentially expressed protein-coding genes (DECGs). The diamond nodes represent

385

DE-lncRNAs; the triangular red nodes represent upregulated DECGs; and triangular

386

green nodes represent downregulated DECGs.

387

Figure 4 Relative mRNA expression levels of differentially expressed long

388

non-coding RNAs (DE-lncRNAs) and their target genes. (A) Relative mRNA

389

expression levels of lncRNA PVT1 silenced by three small interfering RNAs

390

(siRNAs). (B) Relative mRNA expression levels of lncRNA TERC silenced by three

391

siRNAs. (C) Relative mRNA expression levels of target genes after the silence of

392

PVT1. (D) Relative mRNA expression levels of target genes after the silence of TERC.

393

The NC group represents that lncRNA is silenced by siRNA-scramble sequence as a

394

negative control.

AC C

EP

TE D

M AN U

SC

RI PT

377

18

ACCEPTED MANUSCRIPT

Chr

Start

End

Strand

DIO3OS

chr14

101552221

101560431

-

HAR1A

chr20

63102205

63104386

+

FAM138B chr2

113577382

113578852

+

BCYRN1

chr2

47331060

47344517

+

TERC

chr3

169764520

169765060

-

HCG9

chr6

29975112

29978410

+

PVT1

chr8

127794533

128101253

+

M AN U

SC

LncRNA

RI PT

Table 1 The details of the identified differentially expressed lncRNAs

AC C

EP

TE D

Chr, chromosome; the columns of “Start” and “End” represent the start site and end site of the gene sequence of lncRNA, respectively. “+” and “-” represent positive-sense strand and antisense strand, respectively.

ACCEPTED MANUSCRIPT

Term

Adjusted p-value

Gene count

MF

GO:0003674

molecular_function

9.38E-257

3569

GO:0003723

RNA binding

1.03E-132

703

GO:0044822

poly(A) RNA binding

5.68E-125

569

GO:0005488

binding

4.57E-115

3006

GO:1901363

heterocyclic compound binding

1.33E-92

1641

GO:0031974

membrane-enclosed lumen

7.93E-122

GO:0044428

nuclear part

GO:0070013

intracellular organelle lumen

GO:0043233

organelle lumen

GO:0044424

intracellular part

GO:0008150 GO:0008152

BP

TE D

1114

1010

1.91E-116

1069

8.30E-115

1081

1.41E-107

3215

biological_process

1.41E-214

3624

metabolic process

1.71E-114

2815

EP

4.85E-117

AC C

CC

Genes

CEBPG, ITGAV, INCENP, CEBPZ, TRIM28, SLC25A13, SLC25A15, CDK2, ZNF256, CDK3… CEBPZ, TRIM28, ALYREF, MPHOSPH10, DDX39A, HNRNPR, HRSP12, POP7, NUDT5, WDHD1, STRAP… TRAP1, G3BP1, SUGP2, CEBPZ, TRIM28, ALYREF, SNRNP200, RBM34, RRP1B, WDR43… HSF1, CEBPG, CD24, PDCD6, BCL2L11, CHAF1A, PARP2, HMGXB4, DNAJB6, SMC4… HSF1, USF2, CEBPG, SMC4, UBA2, FARSB, ABCC5, ABCB6, DNM1L, ABCF2… MRPL52, LEO1, EARS2, NR2C2AP, COL1A1, COL1A2, COL3A1, MRPL55, CSTF2, SPC24… PDCD6, CHAF1A, PARP2, HMGXB4, SMC4, UBA2, SAE1, SNUPN, LRPPRC, CDK4… UBA2, SAE1, PPIF, LRPPRC, PDIA6, TRAP1, TRIM28, CDK2, ALYREF, CDK4… SMC4, UBA2, SAE1, PPIF, LRPPRC, PDIA6, TRAP1, TRIM28, CDK2, ALYREF… HSF1, CEBPG, CDH3, POM121C, OST4, ZNF737, PDCD6, BCL2L11, TOMM6, FRAT1 CEBPG, HNRNPR, RABEPK, TIMM17B, HRSP12, POP7, CNKSR1, SF3B4, NET1, UBE4B… HSF1, CEBPG, CDH3, ZNF737, CD24, PDCD6, BCL2L11, TOMM6,

SC

ID

M AN U

Category

RI PT

Table 2 The top 5 enriched GO terms of the upregulated differentially expressed protein-coding genes

ACCEPTED MANUSCRIPT

4.33E-105

2579

GO:0044237

cellular metabolic process

7.33E-104

2533

GO:0044260

cellular macromolecule metabolic process

1.55E-102

2032

RI PT

primary metabolic process

SC

GO:0044238

CHAF1A, PARP2… HSF1, CEBPG, ZNF737, CD24, PDCD6, BCL2L11, TOMM6, CHAF1A, PARP2, MAMLD1… HSF1, CEBPG, PPIF, PREB, ARL4C, LRPPRC, PDIA6, TRAP1, CEBPZ, TRIM28… HSF1, CEBPG, CHAF1A, PARP2, MAMLD1, DNAJB6, SMC4, UBA2, SAE1, FARSB…

AC C

EP

TE D

M AN U

GO, gene ontology; MF, molecular function; CC, cellular component; BP, biological process.

ACCEPTED MANUSCRIPT

Adjusted p-value

Gene count

MF

GO:0003674

molecular_function

1.35E-31

505

GO:0005488

binding

2.54E-14

427

GO:0043167

ion binding

8.19E-08

228

GO:0097367

carbohydrate binding

GO:0003777

microtubule motor activity

0.000287

GO:0005575

cellular_component

1.24E-11

GO:0000775

chromosome, centromeric region

GO:0015630

microtubule cytoskeleton

GO:0005819

spindle

GO:0000793

condensed chromosome

0.000246

18

GO:0008150

biological_process

1.06E-25

521

GO:0044699

single-organism process

1.98E-13

446

BP

0.000127

95 11

556

20

2.76E-05

57

7.14E-05

24

EP

8.44E-06

AC C

CC

derivative

Genes

RI PT

Term

SNRPGP15, TROAP, MEF2B, ZNF605, KIF20A, KLRG1, KCNMB2, TRAIP, NMUR1, CENPA… ZNF605, KIF20A, KLRG1, TRAIP, KLHL41, ABCA9, CORO2B, CENPA, CENPF, IGF2BP3… KIF20A, TRAIP, ABCA9, DMRT2, PLK4, MASP2, CAPN9, CFTR, ADAM28, DBF4… CFTR, KIF2C, KIF12, PSTK, TDRD9, EGFLAM, KIF18B, CTSG, NEK10, ABCA13… KIF20A, KIF2C, KIF12, KIF18B, DYNC1I1, KIF26A, KIF26B, KIF15, DNAI2, DYNC2H1… TMEM170B, LRRC70, SNRPGP15, SMIM6, TROAP, MEF2B, ZNF605, KIF20A, KLRG1, KCNMB2… CENPA, CENPF, KIF2C, CDCA5, OIP5, SGOL2, ESCO2, DYNC1I1, HELLS, BIRC5… KIF12, CCSAP, KIF18B, CKAP2L, SASS6, DYNC1I1, CEP152, NCAPH, POC1A, ASPM… KIF20A, CENPF, CCSAP, KIF18B, CKAP2L, DYNC1I1, POC1A, ASPM, WDR62, BIRC5… CENPA, CENPF, KIF2C, CDCA5, SGOL2, DYNC1I1, NCAPH, BIRC5, CENPW, MKI67… TROAP, MEF2B, ZNF605, KIF20A, KLRG1, KCNMB2, TRAIP, NMUR1, KLHL41, ABCA9… MEF2B, KIF20A, KLRG1, KCNMB2, NMUR1, KLHL41, ABCA9,

SC

ID

M AN U

Category

TE D

Table 3 The top 5 enriched GO terms of the downregulated differentially expressed protein-coding genes

ACCEPTED MANUSCRIPT

GO:0007049 GO:0000278

cellular

6.96E-12

411

cell cycle

5.82E-11

97

mitotic cell cycle

6.91E-11

68

RI PT

single-organism process

SC

GO:0044763

CORO2B, CENPA, CENPF… KIF20A, KLRG1, KCNMB2, TRAIP, NMUR1, KLHL41, ABCA9, CORO2B, CENPA, CENPF… KIF20A, CENPA, CENPF, PLK4, DBF4, KIF2C, CDCA5, OIP5, PARD3B, TDRD9… KIF20A, CENPA, CENPF, PLK4, DBF4, KIF2C, CDCA5, OIP5, IQGAP3, KIF18B…

AC C

EP

TE D

M AN U

GO, gene ontology; MF, molecular function; CC, cellular component; BP, biological process.

ACCEPTED MANUSCRIPT

ID

Term

Adjusted p-value

Gene count

hsa03013

RNA transport

8.41E-21

93

hsa03040

Spliceosome

1.31E-19

81

hsa03008

Ribosome biogenesis in eukaryotes

2.20E-11

50

hsa03050

Proteasome

8.92E-09

31

hsa03030

DNA replication

4.18E-08

26

Genes

RI PT

Table 4 The top 5 enriched pathways with the highest p-value of the upregulated differentially expressed protein-coding genes

AC C

EP

TE D

M AN U

SC

POP7, PRMT5, TACC3, RPP30, PAIP1, NUP50, RPP40, RNPS1, NUPL2, STRAP… PPIH, CHERP, USP39, SRSF10, TXNL4A, TCERG1, SF3A3, SF3B2, U2AF2, HNRNPA1L2… PWP2, RAN, NOL6, TCOF1, XPO1, RIOK1, WDR75, UTP15, CIRH1A, IMP4… PSMD2, PSMD3, PSMD4, PSMD7, PSMD8, PSMD11, PSMD12, PSMD13, SHFM1, PSMF1… FEN1, POLA2, RNASEH1, LIG1, MCM2, MCM3, MCM4, MCM5, RFC2, RFC3…

ACCEPTED MANUSCRIPT

Table 5 The results of pathway enrichment analysis of target genes of PVT1 and TERC Adjusted p-value

Gene count

PVT1 target genes

hsa03030

DNA replication

2.93E-12

11

hsa03040

Spliceosome

2.54E-07

12

hsa03410 hsa00230 hsa04110 hsa00240

Base excision repair Purine metabolism Cell cycle Pyrimidine metabolism

9.58E-06 0.001842 0.001842 0.002039

6 8 7 6

hsa05012

Parkinson's disease

1.81E-12

hsa05016

Huntington's disease

3.85E-12

14

hsa05010

Alzheimer's disease

3.05E-10

12

hsa00190

Oxidative phosphorylation

hsa01100

Metabolic pathways

13

TE D EP

AC C

TERC target genes

Genes

RI PT

Term

POLA2, PCNA, PRIM1, FEN1, MCM5, LIG1, POLE, MCM6, MCM4, PRIM2, POLD1 SNRPA, SNRNP70, SNRPD1, SF3A2, SF3B4, PUF60, PRPF3, U2AF2, EFTUD2, TXNL4A, SNRPE, HNRNPC PCNA, FEN1, LIG1, POLE, MUTYH, POLD1 PAICS, POLA2, PRIM1, POLE, GART, POLR2I, PRIM2, POLD1 PCNA, PTTG1, MCM5, E2F1, MCM6, MCM4, CDK2 POLA2, PRIM1, POLE, POLR2I, PRIM2, POLD1 NDUFAB1, COX5B, SLC25A5, NDUFA11, NDUFA12, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, CASP3, NDUFA8, NDUFB4 NDUFAB1, COX5B, SLC25A5, NDUFA11, NDUFA12, POLR2H, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, CASP3, NDUFA8, NDUFB4 NDUFAB1, COX5B, NDUFA11, NDUFA12, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, CASP3, NDUFA8, NDUFB4 NDUFAB1, COX5B, NDUFA11, NDUFA12, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, NDUFA8, NDUFB4 NDUFAB1, PRPS2, FUT5, COX5B, NDUFA11, PTS, ALG3, POLR2H, PDHA1, ATP5G3, COX7B, COX5A, NDUFB5, COX6A1, NDUFA8, NDUFB4

SC

ID

M AN U

Category

3.05E-10

11

0.000663

16

ACCEPTED MANUSCRIPT

Table 6 Sequences of small interfering RNAs and gene primers Name

Sequences

siRNA

PVT1-siRNA 1

Positive-sense strand: CCAGAAUCCUGUUACACCUdTdT Antisense strand: AGGUGUAACAGGAUUCUGGdTdT Positive-sense strand: GCACAUUUCAGGAUACUAAdTdT Antisense strand: UUAGUAUCCUGAAAUGUGCdTdT Positive-sense strand: GGCCUCGUGUCUAUUAAAUdTdT Antisense strand: AUUUAAUAGACACGAGGCCdTdT Positive-sense strand: GUUCAUUCUAGAGCAAACAdTdT Antisense strand: UGUUUGCUCUAGAAUGAACdTdT Positive-sense strand: GUCUAACCCUAACUGAGAAdTdT Antisense strand: UUCUCAGUUAGGGUUAGACdTdT Positive-sense strand: CACCGUUCAUUCUAGAGCAdTdT Antisense strand: UGCUCUAGAAUGAACGGUGdTdT Positive-sense strand: UUCUCCGAACGUGUCACGUdTdT Antisense strand: ACGUGACACGUUCGGAGAAdTdT Forward: ATCTTCAGTGGTCTGGGGAATAACG Reverse: CCATGGACATCCAAGCTGTAAGAAT Forward: TGGCCATTTTTTGTCTAACCCTAAC Reverse: TTTGCTCTAGAATGAACGGTGGAAG Forward: TACACCACACCTTCAAAGGGTTCTC Reverse: ACTTGACGGTGAGAGTAGCTGATGG Forward: GTGACCTGCAACGGGAGCTGAACTT Reverse: CCGTGGTACCCAAACATGCTCTCT Forward: AATCTTCTTTGACCGTTACCCTGAC Reverse: ATGAGCTGGTCAATGTCTTCTGGAT

TERC-siRNA 3 siRNA-scramble Primer

PVT1 TERC POLA2 POLD1 MCM4

SC

M AN U

TERC-siRNA 2

TE D

TERC-siRNA 1

EP

PVT1-siRNA 3

AC C

PVT1-siRNA 2

RI PT

Category

ACCEPTED MANUSCRIPT

NDUFA8 NDUFB4 GAPDH

RI PT

SC

NDUFB5

M AN U

NDUFA11

TE D

NDUFAB1

EP

MCM6

Forward: TTCCAGCATTCGTAGCCTGAAGTCG Reverse: GGCACTGGATAGAGATGCGGGT Forward: CAAGACCTGCCTACCAGACACAAGA Reverse: AGCACAGAAAAGTTCCGCTCACAAG Forward: TATGACAAGATTGACCCAGAGAAGC Reverse: CGTCTTCCATGGCCATGATAATCTC Forward: TCACACTCAATCCTCCGGGCACCTT Reverse: TGATGCAGGTGGTGAGGCCAAACA Forward: AAAGAACAATGGCCGTCCTTCAGAT Reverse: TAATACCAGGGTCCATCTCCTCTCA Forward: AACAGCAGGCAAAGTTTGACGAGTG Reverse: GGTCTTGGTCTTGAGTGATAGGGAT Forward: TACCTGCTTCAGTACAACGATCCCA Reverse: ACAGAGCTCCCATGAGTGAGTTTTT Forward: AGAAGGCTGGGGCTCATTTG Reverse: AGGGGCCATCCACAGTCTTC

AC C

MCM5

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIPT Highlights 

In total, 5162 DEGs were screened from LUSC samples, and 7 ones were lncRNA genes. The lncRNAs PVT1 and TERC targeted multiple DEGs.



PVT1 and TERC were also regulated by a set of TFs identified from DECGs.

AC C

EP

TE D

M AN U

SC

RI PT