Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer

Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer

Journal Pre-proof Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer Leimarem...

9MB Sizes 0 Downloads 18 Views

Journal Pre-proof Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer

Leimarembi Devi Naorem, Vigneshwar Mathavan Muthaiyan, Amouda Venkatesan

Suriya

Prakash,

PII:

S0141-8130(19)38947-0

DOI:

https://doi.org/10.1016/j.ijbiomac.2019.12.196

Reference:

BIOMAC 14224

To appear in:

International Journal of Biological Macromolecules

Received date:

3 November 2019

Revised date:

21 December 2019

Accepted date:

21 December 2019

Please cite this article as: L.D. Naorem, V.S. Prakash, M. Muthaiyan, et al., Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer, International Journal of Biological Macromolecules(2019), https://doi.org/10.1016/j.ijbiomac.2019.12.196

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

Journal Pre-proof Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer Leimarembi Devi Naorem, Vigneshwar Suriya Prakash, Mathavan Muthaiyan and Amouda Venkatesan* Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry, India. *To whom the correspondence should be addressed.

of

Address: Centre for Bioinformatics, School of Life Sciences, Pondicherry University, R. V. Nagar Kalapet, Pondicherry-605014, India. E-mail: [email protected], [email protected]

-p

ro

Tel: +91-413-2654-582

re

Abstract

The study aimed to explore the molecular mechanism underlying triple-negative breast

lP

cancer (TNBC) and to identify their potential diagnostic/ prognostic biomarkers. The differentially expressed lncRNAs (DElncRNAs) were identified by meta-analysis and

na

machine learning feature selection methods. The dysregulated lncRNA-miRNA-mRNA network was constructed based on the competing endogenous RNA (ceRNA) hypothesis. A

Jo ur

total of 26 DElncRNAs were identified with a meta-analysis approach of which 18 DElncRNAs attained high accuracy in training and test dataset by Support Vector MachineRecursive Feature Elimination (SVM-RFE) which could act as diagnostic biomarkers. Among the identified DElncRNAs, LINC01315 and CTA-384D8.35 could act as prognostic biomarkers. Finally, two important sub-modules from lncRNA-miRNA-mRNA network were identified which consists of DElncRNAs (LINC01087, LINC01315, and SOX9-AS1) interacting with co-expressed DEmRNAs and DEmiRNAs. Thus, the study indicated the importance of DElncRNAs and highlighted the efficacy as potential biomarkers in TNBC.

Keywords: Triple-negative breast cancer, long non-coding RNAs, competing endogenous RNA networks, Support Vector Machine-Recursive Feature Elimination

Journal Pre-proof Introduction

Breast cancer (BC) is the most commonly diagnosed and deadly cancer among women worldwide. Triple-negative breast cancer (TNBC) is a subclass of BC which is well known for its aggressiveness and associated with poor prognosis. The targeted therapies are not significantly improved for the survival of TNBC patients due to a lack of promising target [1]. Recent advancement in transcriptomics has produced a large number of coding and noncoding RNAs such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). Their

of

dysregulation is often associated with various cancers, suggesting an important role in

ro

tumorigenesis and acts as diagnostic/ prognostic biomarkers [2]. LncRNAs are a class of noncoding transcripts which are of greater than 200 nucleotides and cannot encode proteins. In

-p

the past, they were thought to be a transcriptional noise but, studies reported their

re

resemblance to mRNAs which exerted a broad range of diverse biological processes involved in various cancer hallmarks [3]. They can act as either oncogenes or tumor suppressor genes.

lP

Moreover, recent studies show that a variety of RNAs can serve as competitive endogenous RNAs (ceRNAs) including lncRNAs and mRNAs. CeRNAs are endogenous transcripts that

na

share miRNA-response elements (MREs) and regulating each other by reducing miRNA availability through competing for shared miRNAs. Thus, ceRNAs and miRNAs can form a

Jo ur

complex regulatory network named as ceRNA networks (ceRNETs). Moreover, miRNAs are reported to involve in various diseases including cancer. Perturbing in any of the key nodes of cancer-associated ceRNETs such as miRNAs, mRNAs and lncRNAs may disturb the stability and robustness of the network thus leading to cancer initiation and progression [4]. Moreover, few dysregulated lncRNAs have been found to regulate TNBC progression. For instance, several lncRNAs such as mitotically-associated long noncoding RNA (MANCR; LINC00704), metastasis associated lung adenocarcinoma transcript 1 (MALAT1), long noncoding RNA antidifferentiation ncRNA (DANCR), long noncoding RNA highly up in liver cancer (HULC), long-intergenic noncoding RNA for kinase activation (LINK-A), small nucleolar RNA host gene 12 (SNHG12) and antisense non-coding RNA in the INK4 locus (ANRIL) are up-regulated in TNBC tissues. They could inhibit cell progression, migration, invasion, proliferation and induce cell apoptosis of TNBC cells. Also, down-regulation of lncRNAs such as rhabdomyosarcoma 2-associated transcript (RMST) and growth arrest-

Journal Pre-proof specific 5 (GAS5) were observed in TNBC tissues, which behave as tumor suppressors [5–9]. A study conducted by analysis of 116 TNBC samples with 11 normal tissues from TCGA revealed that 6 key hub lncRNAs are also enriched in the ceRNETs construction [10]. Nevertheless, the research work on lncRNAs for TNBC is very limited and understanding the association of lncRNA-miRNA-mRNA ceRNET is lacking. Therefore, there is a need for a comprehensive study of lncRNA expression profiles in a larger number of samples to characterize their expression profile, functions and involvement in the molecular mechanism underlying TNBC tumorigenesis. Thus, analyses on lncRNAs expression profiles of different datasets were performed to

of

identify the meta-signature lncRNAs in TNBC. The lncRNA expression pattern was analyzed

ro

that can distinguish TNBC from non-TNBC samples. Also, the study offers a deeper understanding of ceRNETs, to explore the molecular mechanism underlying the disease and

-p

to identify potential diagnostic/ prognostic biomarkers of TNBC.

re

Materials and Methods

lP

Datasets Collection

For the study, six microarray-based gene expression datasets were retrieved from the publicly

na

available NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) database. The gene accession numbers for the collected datasets include GSE3744 [11],

Jo ur

GSE7904 [12], GSE21653 [13], GSE45827 [14], GSE65194 [15], and GSE76275 [16] respectively. All the datasets consist of TNBC and non-TNBC samples from the Affymetrix Human Genome U133 plus 2.0 microarray platform. From each dataset, raw CEL files were downloaded and pre-processed with robust multi-array average (RMA) for background correction, log2 transformed and quantile normalization [17]. Further, the pre-processed data were checked for quality control with arrayQualitymetrics in R [18]. Annotation of lncRNAs The probes for lncRNAs were selected from the Affymetrix U133 plus 2.0 microarray platform. According to previous studies [19,20] for the annotation of lncRNAs, the latest version of NetAffx annotation file (Release 36) was downloaded from the Affymetrix website. Each Probeset ID consists of gene symbol, Ensembl gene ID, RefSeq Transcript ID followed by other meaningful information and those probes labelled as ‘NR_’, ‘long noncoding’, ‘long intergenic’, ‘lincRNA’, and ‘antisense’ were considered. Also, the probe sets

Journal Pre-proof were re-mapped to the human genome (GRCh38) and lncRNAs from the GENCODE project, using SeqMap tool. Finally, the expression profiles of 4958 lncRNA transcripts were considered and subjected for further analysis. Identification of Differentially Expressed lncRNAs The differentially expressed lncRNAs (DElncRNAs) of each GEO dataset were identified using R LIMMA package with adjusted p-value < 0.05 and fold change (FC) of 1 [21]. Robust Rank Aggregation Method for Meta-Analysis

of

The lists of DElncRNA obtained from different datasets were merged using robust rank aggregation (RRA) by RobustRankAggreg in R [22], which uses a probabilistic model for

ro

aggregation and suitable to find the consistent genes. The DElncRNAs from each dataset were ranked based on the p-values and FC, where p-values are subjected to Bonferroni

re

Cluster Analysis of DElncRNAs

-p

correction to prevent false-positive results.

lP

Hierarchical cluster analysis was performed on DElncRNAs using heatmap.2 function with ggplot package in R [23], to find the expression patterns in different samples for each dataset.

na

Feature Selection and Classification using SVM-RFE & Boruta To extract minimal set of lncRNAs with high prediction accuracy and for model construction,

Jo ur

feature selection was performed on training dataset (GSE21653) with SVM-RFE (Support Vector Machine based on Recursive Feature Elimination), in conjunction with 10-fold crossvalidation using the caret package in R. Further, Boruta feature selection algorithm was used to measure genes based on their importance [24]. Performance Evaluation on Test Datasets The model was build using SVM-RFE and tested on test datasets such as GSE7904, GSE65194, GSE76275, and GSE45827. The performances of the test datasets were evaluated based on different evaluation measures. Validation of DElncRNAs The validation of DElncRNAs was carried out using LncRNADisease database accessible at http://www.rnanut.net/lncrnadisease/ http://cbrc.kaust.edu.sa/farna.

and

FARNA

tool

available

at

Journal Pre-proof Expression level of DElncRNAs The verification of DElncRNAs was performed on an independent RNA-seq expression dataset from TCGA database through GEPIA. GEPIA, available at http://gepia.cancerpku.cn/ is a web-based tool used to analyze RNA-sequencing expression data obtained from TCGA and Genotype-Tissue Expression (GTex) projects. The differential expression analysis was performed for ‘BRCA’ with their subtype filter ‘basal-like/ triple-negative’ of 135 samples and 291 normal samples with log2FC Cutoff >1 and p-value of <0.01. Survival Analysis

of

Survival analysis which includes both overall survival (OS) and disease-free survival (DFS)

ro

were performed on BRCA dataset with the basal-like/ triple-negative of TCGA obtained by RNA-seq of 135 patient tissue samples using GEPIA tool. Samples were divided into high

-p

and low expression groups based on the median for logrank test, known as the Mantel-Cox test. It also calculates the hazard ratios (HR) based on Cox PH model with 95% confidence

re

intervals (CI). The p-value of less than 0.05 was considered as statistically significant.

lP

Prediction of Target DElncRNAs and DEmRNAs of DEmiRNAs The differentially expressed mRNAs (DEmRNAs) and miRNAs (DEmiRNAs) were

na

collected from our previous studies [25,26]. DEmiRNA-DEmRNA interaction pairs were constructed by selecting the target genes from DEmRNAs, which express inversely with that

Jo ur

of miRNAs since miRNAs decrease the expression of the target mRNAs. A total of six databases both experimentally validated and predicted algorithms (miRDB, Targetscan 7.1, DIANA microtCDS, miRTarbase and DIANA Tarbase v8) were used to retrieve the targeted DEmRNAs of DEmiRNAs. miRDB (http://www.mirdb.org/) is a web interface for predicting targets

of

miRNAs,

and

their

functional

annotations.

Targetscan

7.1

(http://www.targetscan.org/), predicted miRNA targets by searching the conserved 8 mer, 7 mer and 6 mer sites that match to the seed region of each miRNA. DIANA microtCDS (http://www.microrna.gr/microT-CDS) is the latest version of microT algorithm that uses both the positive and negative set of MREs present in 3’-UTR and CDS regions. miRTarbase (http://mirtarbase.mbc.nctu.edu.tw) is a curated database that has experimentally validated miRNA-target interactions from more than 8,500 articles. DIANA Tarbase v8 (http://www.microrna.gr/tarbase) is an online database which supports miRNA- target interactions obtained via, both experimental as well as in silico studies. In the study, the targeted DEmRNAs produced by at least two prediction algorithms along with targets from

Journal Pre-proof miRTarbase and DIANA TarBase v8 were considered. Further, based on DEmiRNADEmRNA pairs, the interaction network was constructed in Cytoscape. DElncRNAs express inversely with DEmiRNAs, as miRNA may tend to regulate the lncRNA expression similar to mRNA, thus DEmiRNA-DElncRNA pairs were also obtained. miRcode database (http://www.mircode.org) was used to find targeted DElncRNAs of DEmiRNAs. This database provides whole human transcriptome miRNA target predictions according to GENCODE gene annotation with more than 10,000 lncRNA genes. The interaction network based on DElncRNA-DEmiRNA pairs was constructed in Cytoscape

of

[27].

ro

DElncRNAs-DEmRNAs Co-expression

Pearson’s correlation coefficient (PCC) was used to determine the co-expression relationships

-p

between DElncRNAs and DEmRNAs in all datasets, with PCC >0.6 and p-value <0.05.

re

Construction of the ceRNA Regulatory Network and Centrality Analysis The DElncRNA-DEmiRNA-DEmRNA interaction network was constructed in Cytoscape

lP

according to DEmiRNA-DEmRNA, DEmiRNA-DElncRNA, and DEmRNA-DElncRNA interaction pairs. In these pairs, the nodes represent biological molecules such as mRNA,

na

miRNA and lncRNA and edges are the interactions between them. For the constructed network, topological parameters such as degree and betweenness were calculated to explore

Jo ur

the important hubs by NetworkAnalyser, a Cytoscape plugin [28]. Functional and Pathway Enrichment Analysis The co-expressed mRNAs of DElncRNAs were used as input to identify the enriched functions and pathways using GeneCodis tool (http://genecodis.dacya.ucm.es/). Module Analysis of the Sub-Network The functional modules were identified from the network using MCODE, a Cytoscape plugin with default scoring parameters [29]. Results Differentially Expressed lncRNAs and Hierarchical Clustering A total of 253 DElncRNAs were identified across all the datasets consisting of 64 upregulated and 189 down-regulated DElncRNAs, in TNBC compared to non-TNBC samples

Journal Pre-proof by LIMMA package of R. The number of DElncRNAs obtained across all the datasets is tabulated in Table S1. Further through RRA, 26 DElncRNAs (17 down-regulated and 9 upregulated) with an adjusted p-value of 0.05 and the scores obtained were listed in Table S2. A hierarchical cluster of 26 DElncRNAs revealed distinct clusters of TNBC-associated expression in all datasets, which distinguishes TNBC from non-TNBC (Fig. 1). Feature Selection and Model Construction In total, 18 minimal sets of DElncRNAs were obtained with an accuracy of 0.95 through SVM-RFE (Support Vector Machine – Recursive Feature Selection) machine learning

of

method, as listed in Table 1. The lncRNA sets have the highest predicted accuracy in comparison to other combinations of gene sets with 10-fold cross-validation. Fig. 2A

ro

depicted the obtained RFE plot. Further, the Boruta function selected features among 18

re

Performance Evaluation on Test Datasets

-p

DElncRNAs confirms that all are important as shown in green color (Fig. 2B).

The test datasets include GSE7904, which consists of 42 samples and correctly classify; 17 of

lP

20 TNBC samples, and 22 of 22 non-TNBC samples. Likewise, GSE65194 of 148 samples correctly classify; 49 of 58 TNBC samples, and 88 of 90 non-TNBC samples, GSE76275 of

na

250 samples classify; 140 of 142 TNBC samples correctly, and 60 of 108 non-TNBC samples. Also, GSE45827 of 145 samples correctly classify; 47 of 56 TNBC samples, and 87

Jo ur

of 89 non-TNBC samples. The performance measures for various datasets are shown in Table 2 and found that the test datasets showed an accuracy of more than 80 %. Thus, the selected features could be considered as diagnostic biomarkers. Verification of DElncRNAs

DElncRNAs were verified using FARNA tool and LncRNADisease database, where it is found that 9 DElncRNAs are associated with BC (Table S3). Validation of DElncRNAs The DElncRNAs were verified on an independent RNA-seq expression dataset from TCGA database through GEPIA tool. Box plots were produced to compare different lncRNA levels, where 6 DElncRNAs are up-regulated and 9 DElncRNAs are down-regulated in tumor samples (Fig. S1).

Journal Pre-proof DElncRNAs as Prognostic Biomarkers The validation of DElncRNAs as prognostic biomarkers was carried out on an independent RNA-seq dataset by GEPIA tool, to identify the association of DElncRNAs with OS and DFS in TNBC patients. Kaplan-Meier curves were plotted and a logrank test was performed for OS and DFS of DElncRNAs, within high-risk and low-risk TNBC groups. It is observed that high expression of CTA-384D8.35 [HR (high) = 0.37, p (HR) =0.059, logrank p=0.049] was found to correlate with OS in TNBC patients. High expression of CTA-384D8.35 [HR (high) = 0.34, p (HR) =0.04, logrank p=0.032] and LINC01315 [HR (high) = 0.39, p (HR) =0.057, logrank p=0.049] was found to correlate with a better DFS in TNBC patients (Fig. 3). Out of

of

18 DElncRNAs, two DElncRNAs viz., CTA-384D8.35 and LINC01315, could act as

ro

prognostic biomarkers.

-p

DEmiRNA-DEmRNA Interaction Network

Based on DEmiRNA-DEmRNA interaction pairs from various databases, four up-regulated

re

DEmiRNAs were found to target 209 down-regulated DEmRNAs and two down-regulated

lP

DEmiRNAs interacted with 34 up-regulated DEmRNAs. Finally, the interaction network was constructed and visualised in Cytoscape, which consists of 249 nodes with 321

na

interaction/edges (Fig. S2).

DEmiRNA-DElncRNA Interaction Network

Jo ur

Based on DEmiRNA-DElncRNA interactions predicted by miRcode, it was found that two down-regulated miRNAs were targeted by two up-regulated lncRNAs and seven downregulated lncRNAs by two up-regulated miRNAs. The interaction network was constructed and visualised in Cytoscape, which consists of 13 nodes and 11 edges (Fig. S3). DElncRNA-DEmRNA Interaction Network A total of 1382 DElncRNA and DEmRNA interaction pairs were obtained with PCC > 0.6 and p-value < 0.05, and the network was constructed and visualised in Cytoscape. Construction of ceRNA Network Based on the above interaction data, DElncRNA-DEmiRNA-DEmRNA ceRNA network was obtained, which consists of 18 DElncRNAs, six DEmiRNAs and 668 DEmRNAs, further, it was visualized through Cytoscape (Fig. 4). The network has 692 nodes and 1714 interactions, where nodes represent lncRNA/miRNA/mRNA and edges are the interactions between them.

Journal Pre-proof Topological Analysis and their Sub Modules The topological properties of the network with defined parameters were calculated and found that the degree distribution of the whole network follows power-law distributions, with a slope of -1.7; R2, the measure of the goodness of the fit of data points to the curve of 0.7 and correlation of 0.9 thus revealing a scale-free network. In order to explore the hubs, degree and betweenness centrality are considered to be important parameters and found that 17 lncRNAs have higher nodes based on degree > 35 (Table 3). The lncRNAs tend to form hubs with the highest degree and betweenness

of

centrality compared to mRNA. From Fig. 4, it is observed that a large number of mRNAs

ro

interacted with lncRNAs, thus lncRNAs act as ceRNAs to communicate with multiple mRNAs by competing specific binding with shared miRNAs. Two important sub-modules

-p

were obtained with MCODE scores >3 from the constructed network. It was observed that

re

LINC01087, LINC01315 and SOX9-AS1 are key lncRNAs involved in ceRNETs (Fig. 5).

lP

Annotation of DElncRNAs

The co-expressed DEmRNAs of DElncRNAs were involved in the mitotic cell cycle, signal transduction biological processes and various cancer-related pathways. 120 GO biological

na

processes enriched terms including mitotic cell cycle, cell division, cell proliferation, and mitosis. For GO molecular function, 47 enriched terms were obtained that include protein

Jo ur

binding, nucleotide binding, and ATP binding. Also, 80 enriched GO cellular component terms such as nucleus, cytoplasm, and cytosol were obtained (Table 4). Further, KEGG pathway analysis revealed that, 30 significant enriched terms with FDR of < 0.05 including cell cycle, oocyte meiosis, pathways in cancer, progesterone-mediated oocyte maturation, DNA replication and TGF-beta signaling pathway (Table 5). Discussion The absence of promising and specific target for TNBC therapy necessitates identifying effective targets/ biomarkers. However, the development of transcriptomics studies has advanced knowledge in identifying a different class of non-coding RNAs named lncRNAs. Since lncRNAs play an important role in cancer progression and development, and gene regulatory network, the present study explored their crucial role as diagnostic and prognostic biomarkers and in the regulation of ceRNET. Even though numerous studies have indicated

Journal Pre-proof lncRNA expression profiles in TNBC compared to non-TNBC, the study is limited and only focussed on a smaller number of cancer samples. Hence, a comprehensive analysis of lncRNA expression profiles was conducted on a large number of samples >800 and also the regulatory relationship between DEmRNA-DEmiRNA-DElncRNA network was carried out. Therefore, different datasets were compiled and integrated to find consistent DElncRNAs. A total of 26 DElncRNAs were detected through RRA analysis and subjected to SVM-RFE to obtain a significant number of DElncRNAs. Further, 18 DElncRNAs were obtained with more than 80% accuracy in all test datasets, with high sensitivity and specificity thus could act as diagnostic biomarkers. Also, two DElncRNAs viz., CTA-384D8.35 and LINC01315

of

could act as prognostic biomarkers correlating with better OS and DFS. Finally, ceRNET was

ro

constructed with 18 DElncRNAs along with 668 DEmRNAs and six DEmiRNAs, to explore the role of lncRNAs played in TNBC development and progression. It was found from the

-p

sub-network of the constructed network that the three lncRNAs are regulating miRNAs, and

re

co-expressed with mRNAs.

Through literature studies, it is found that some of the DElncRNAs are already reported in

lP

TNBC studies. For instance, LINC00511 is an oncogene promoting cell proliferation, apoptosis, invasion, metastasis, and worse prognosis. The up-regulation of LINC00511 in

na

TNBC has been reported in a few studies. The study made an analysis of two lncRNA microarray datasets of TNBC and found that LINC00511 is one of the up-regulated lncRNA

Jo ur

[30]. It is reported that they function as ceRNA for miR-185-3p and target E2F1 protein thus promoting BC stemness and tumorigenesis [31]. BLACAT1, an up-regulated lncRNA, has been already reported in various cancers such as bladder, glioma, cervical, gastric, small-cell lung and colorectal cancer [32]. Also, their up-regulation in BC influences cell proliferation as well as metastasis through miR-150-5p by targeting CCR2 protein [33]. LINC00993 is found as one of the most down-regulated lncRNA in TNBC [34]. In the study, the dysregulation of lncRNAs such as LINC01315, FOXP4-AS1, CTD-2284J15.1, LINC01087, DSCAM-AS1, LINC01116, RP11-774O3.3 and DRAIC were observed in BC subtype such as basal-like [35]. Notably, accumulating evidence indicates the importance of the candidate lncRNAs as key biomarkers in various cancers, which led the researchers to look into their importance in TNBC. CTA-384D8.35 was found to be up-regulated in the study and also predicted as a biomarker for tongue tumor tissue [36]. AC093642.1, AC008268.1, AC000095.10, AC110619.2, and CTD-2311B13.7 were found to be down-regulated and have not been reported elsewhere.

Journal Pre-proof From the module analysis, it can be inferred that ceRNETs exhibited the interaction of DElncRNAs (SOX9-AS1 and LINC01315) with DEmRNAs (MRAS, CDK6, CALML4 and SFT2D2) as regulated by both hsa-miR-190b and hsa-miR-449a. Also, DElncRNA (LINC01087) is co-expressed with SUOX, HECTD2 and SLC39A6, regulated by hsa-miR135b-5p. Eventually, it was observed that LINC01315 can act as a prognostic biomarker as well as ceRNA for hsa-miR-190b, co-expressed with various targets such as MRAS, CDK6, SFT2D2, CALML4 and CDK6. An earlier study suggested that hsa-miR‐ 449a/b targets and inhibits the expression of oncogenic proteins viz., CDK6 and CDC25A (cell division cycle 25 homolog A), to arrest the

of

cell cycle in the breast cancer cell line MCF‐ 7 [37]. However, in the current study

ro

DElncRNAs, LINC01315 and SOX9-AS1 were found to be associated with breast cancer. Thus, these results proposed that significant DElncRNAs which could act as diagnostic as

re

which play a vital role in TNBC development.

-p

well as prognostic biomarkers. Also, some of the identified lncRNA function as ceRNAs

lP

Conclusion

The study identified 18 DElncRNAs that could distinguish TNBC from non-TNBC with high diagnostic specificity and sensitivity. Among the identified DElncRNAs, LINC01315 and

na

CTA-384D8.35 act as prognostic biomarkers. The dysregulated lncRNA-miRNA-mRNA network was constructed based on ceRNA hypothesis and sub-modules were identified

Jo ur

including three lncRNAs (LINC01087, LINC01315, and SOX9-AS1). Enrichment analysis of co-expressed genes indicated their involvement in the mitotic cell cycle, cancer-related pathways, etc. Hence the study revealed the usefulness of lncRNAs as a potential biomarker for TNBC diagnosis and treatment. Conflict of Interest The authors declare no conflict of interest Funding Information None Acknowledgements The authors thank Centre for Bioinformatics, Pondicherry University for providing the computer facility to carry out the work. Leimarembi Devi Naorem acknowledges a Senior

Journal Pre-proof Research Fellowship from the Council of Scientific & Industrial Research (CSIR). Mathavan Muthaiyan acknowledges a Senior Research Fellowship from the Rajiv Gandhi National Fellowship (RGNF). The authors also thank the technical group of the Centre for their technical assistance. References [1]

A.C. Garrido-Castro, N.U. Lin, K. Polyak, Insights into molecular classifications of triple-negative breast cancer: Improving patient selection for treatment, Cancer Discov. (2019). doi:10.1158/2159-8290.CD-18-1177. M. Xie, L. Ma, T. Xu, Y. Pan, Q. Wang, Y. Wei, Y. Shu, Potential Regulatory Roles

of

[2]

ro

of MicroRNAs and Long Noncoding RNAs in Anticancer Therapies, Mol. Ther. Nucleic Acids. (2018). doi:10.1016/j.omtn.2018.08.019.

T. Gutschner, S. Diederichs, The hallmarks of cancer: A long non-coding RNA point

-p

[3]

[4]

re

of view, RNA Biol. (2012). doi:10.4161/rna.20481. X. Qi, Y. Lin, J. Chen, B. Shen, Decoding competing endogenous RNA networks for

S. Sha, D. Yuan, Y. Liu, B. Han, N. Zhong, Targeting long non-coding RNA DANCR

na

[5]

lP

cancer biomarker discovery, Brief. Bioinform. (2019). doi:10.1093/bib/bbz006.

inhibits triple negative breast cancer progression, Biol. Open. (2017).

[6]

Jo ur

doi:10.1242/bio.023135.

F. Shi, F. Xiao, P. Ding, H. Qin, R. Huang, Long Noncoding RNA Highly Upregulated in Liver Cancer Predicts Unfavorable Outcome and Regulates Metastasis by MMPs in Triple-negative Breast Cancer, Arch. Med. Res. (2016). doi:10.1016/j.arcmed.2016.11.001.

[7]

K.M. Tracy, C.E. Tye, P.N. Ghule, H.L.H. Malaby, J. Stumpff, J.L. Stein, G.S. Stein, J.B. Lian, Mitotically-associated lncRNA (MANCR) affects genomic stability and cell division in aggressive breast cancer, Mol. Cancer Res. (2018). doi:10.1158/15417786.MCR-17-0548.

[8]

R. Rodríguez Bautista, A. Ortega Gómez, A. Hidalgo Miranda, A. Zentella Dehesa, C. Villarreal-Garza, F. Ávila-Moreno, O. Arrieta, Long non-coding RNAs: Implications in targeted diagnoses, prognosis, and improved therapeutic strategies in human nonand triple-negative breast cancer, Clin. Epigenetics. (2018). doi:10.1186/s13148-018-

Journal Pre-proof 0514-z. [9]

X. Kong, W. Liu, Y. Kong, Roles and expression profiles of long non-coding RNAs in triple-negative breast cancers, J. Cell. Mol. Med. (2018). doi:10.1111/jcmm.13327.

[10] N. Yuan, G. Zhang, F. Bie, M. Ma, Y. Ma, X. Jiang, Y. Wang, X. Hao, Integrative analysis of lnCRNAs and miRNAs with coding RNAs associated with ceRNA crosstalk network in triple negative breast cancer, Onco. Targets. Ther. (2017). doi:10.2147/OTT.S149308. [11] A. Alimonti, A. Carracedo, J.G. Clohessy, L.C. Trotman, C. Nardella, A. Egia, L.

of

Salmena, K. Sampieri, W.J. Haveman, E. Brogi, A.L. Richardson, J. Zhang, P.P.

(2010) 454. https://doi.org/10.1038/ng.556.

ro

Pandolfi, Subtle variations in Pten dose determine cancer susceptibility, Nat. Genet. 42

-p

[12] A.L. Richardson, Z.C. Wang, A. De Nicolo, X. Lu, M. Brown, A. Miron, X. Liao, J.D.

re

Iglehart, D.M. Livingston, S. Ganesan, X chromosomal abnormalities in basal-like human breast cancer, Cancer Cell. (2006). doi:10.1016/j.ccr.2006.01.013.

lP

[13] R. Sabatier, P. Finetti, J. Adelaide, A. Guille, J.P. Borg, M. Chaffanet, L. Lane, D. Birnbaum, F. Bertucci, Down-regulation of ECRG4, a candidate tumor suppressor

na

gene, in human breast cancer, PLoS One. (2011). doi:10.1371/journal.pone.0027656. [14] T. Gruosso, V. Mieulet, M. Cardon, B. Bourachot, Y. Kieffer, F. Devun, T. Dubois, M.

Jo ur

Dutreix, A. Vincent‐ Salomon, K.M. Miller, F. Mechta‐ Grigoriou, Chronic oxidative stress promotes H2AX protein degradation and enhances chemosensitivity in breast cancer patients, EMBO Mol. Med. (2016). doi:10.15252/emmm.201505891. [15] P. Den Hollander, K. Rawls, A. Tsimelzon, J. Shepherd, A. Mazumdar, J. Hill, S.A.W. Fuqua, J.C. Chang, C.K. Osborne, S.G. Hilsenbeck, G.B. Mills, P.H. Brown, Phosphatase PTP4A3 promotes triple-negative breast cancer growth and predicts poor patient survival, Cancer Res. (2016). doi:10.1158/0008-5472.CAN-14-0673. [16] S. Maubant, B. Tesson, V. Maire, M. Ye, G. Rigaill, D. Gentien, F. Cruzalegui, G.C. Tucker, S. Roman-Roman, T. Dubois, Transcriptome analysis of wnt3a-treated triplenegative breast cancer cells, PLoS One. (2015). doi:10.1371/journal.pone.0122333. [17] L. Gautier, L. Cope, B.M. Bolstad, R.A. Irizarry, Affy - Analysis of Affymetrix GeneChip data at the probe level, Bioinformatics. (2004).

Journal Pre-proof doi:10.1093/bioinformatics/btg405. [18] A. Kauffmann, R. Gentleman, W. Huber, arrayQualityMetrics - A bioconductor package for quality assessment of microarray data, Bioinformatics. (2009). doi:10.1093/bioinformatics/btn647. [19] G. Zhang, H. Sun, Y. Zhang, H. Zhao, W. Fan, J. Li, Y. Lv, Q. Song, J. Li, M. Zhang, H. Shi, Characterization of dysregulated lncRNA-mRNA network based on ceRNA hypothesis to reveal the occurrence and recurrence of myocardial infarction, Cell Death Discov. (2018). doi:10.1038/s41420-018-0036-7.

of

[20] M. Zhou, H. Zhao, X. Wang, J. Sun, J. Su, Analysis of long noncoding RNAs

ro

highlights region-specific altered expression patterns and diagnostic roles in Alzheimer’s disease, Brief. Bioinform. (2018). doi:10.1093/bib/bby021.

-p

[21] G.K. Smyth, Limma: linear models for microarray data BT - Bioinformatics and

re

Computational Biology Solutions Using R and Bioconductor, Bioinforma. Comput. Biol. Solut. Using R Bioconductor. (2005). doi:10.1007/0-387-29362-0_23.

lP

[22] R. Kolde, S. Laur, P. Adler, J. Vilo, Robust rank aggregation for gene list integration

na

and meta-analysis, Bioinformatics. (2012). doi:10.1093/bioinformatics/btr709. [23] G.R. Warnes, B. Bolker, L. Bonebakker, R. Gentleman, W.H.A. Liaw, T. Lumley, M. Maechler, A. Magnusson, S. Moeller, M. Schwartz, B. Venables, Package “gplots”:

Jo ur

Various R programming tools for plotting data, R Packag. Version 2.17.0. (2016). doi:10.1111/j.0022-3646.1997.00569.x. [24] P. Guo, Y. Luo, G. Mai, M. Zhang, G. Wang, M. Zhao, L. Gao, F. Li, F. Zhou, Gene expression profile based classification models of psoriasis, Genomics. (2014). doi:10.1016/j.ygeno.2013.11.001. [25] L.D. Naorem, M. Muthaiyan, A. Venkatesan, Integrated network analysis and machine learning approach for the identification of key genes of triple-negative breast cancer, J. Cell. Biochem. 120 (2019) 6154–6167. doi:10.1002/jcb.27903. [26] L.D. Naorem, M. Muthaiyan, A. Venkatesan, Identification of dysregulated miRNAs in triple negative breast cancer: A meta-analysis approach, J. Cell. Physiol. 234 (2019) 11768–11779. doi:10.1002/jcp.27839. [27] P. Shannon, A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, D. Ramage, N. Amin, B.

Journal Pre-proof Schwikowski, T. Ideker, Cytoscape: a software environment for integrated models of biomolecular interaction networks., Genome Res. (2003). doi:10.1101/gr.1239303. [28] Y. Assenov, F. Ramírez, S.E.S.E. Schelhorn, T. Lengauer, M. Albrecht, Computing topological parameters of biological networks, Bioinformatics. (2008). doi:10.1093/bioinformatics/btm554. [29] G.D. Bader, C.W. V Hogue, An automated method for finding molecular complexes in large protein interaction networks., BMC Bioinformatics. (2003). [30] T. Tian, Z. Gong, M. Wang, R. Hao, S. Lin, K. Liu, F. Guan, P. Xu, Y. Deng, D. Song,

of

N. Li, Y. Wu, Z. Dai, Identification of long non-coding RNA signatures in triple-

ro

negative breast cancer, Cancer Cell Int. (2018). doi:10.1186/s12935-018-0598-8. [31] G. Lu, Y. Li, Y. Ma, J. Lu, Y. Chen, Q. Jiang, Q. Qin, L. Zhao, Q. Huang, Z. Luo, S.

-p

Huang, Z. Wei, Long noncoding RNA LINC00511 contributes to breast cancer

re

tumourigenesis and stemness by inducing the miR-185-3p/E2F1/Nanog axis, J. Exp. Clin. Cancer Res. (2018). doi:10.1186/s13046-018-0945-6.

lP

[32] H. Lu, H. Liu, X. Yang, T. Ye, P. Lv, X. Wu, Z. Ye, LncRNA BLACAT1 May Serve as a Prognostic Predictor in Cancer: Evidence from a Meta-Analysis, Biomed Res. Int.

na

(2019). doi:10.1155/2019/1275491.

[33] X. Hu, Y. Liu, Y. Du, T. Cheng, W. Xia, Long non-coding RNA BLACAT1 promotes

Jo ur

breast cancer cell proliferation and metastasis by miR-150-5p/CCR2, Cell Biosci. 9 (2019) 14. doi:10.1186/s13578-019-0274-2. [34] C. Chen, Z. Li, Y. Yang, T. Xiang, W. Song, S. Liu, Microarray expression profiling of dysregulated long non-coding RNAs in triple-negative breast cancer, Cancer Biol. Ther. (2015). doi:10.1080/15384047.2015.1040957. [35] C. Mathias, E.P. Zambalde, P. Rask, D.F. Gradia, J.C. de Oliveira, Long non-coding RNAs differential expression in breast cancer subtypes: What do we know?, Clin. Genet. (2019). doi:10.1111/cge.13502. [36] A. Mohammed, G. Biegert, J. Adamec, T. Helikar, Identification of potential tissuespecific cancer biomarkers and development of cancer versus normal genomic classifiers, Oncotarget. (2017). doi:10.18632/oncotarget.21127. [37] X. Yang, M. Feng, X. Jiang, Z. Wu, Z. Li, M. Aau, Y. Qiang, miR-449a and miR-449b

Journal Pre-proof are direct transcriptional targets of E2F1 and negatively regulate pRb-E2F1 activity through a feedback loop by targeting CDK6 and CDC25A, Genes Dev. (2009).

Jo ur

na

lP

re

-p

ro

of

doi:10.1101/gad.1819009.

Journal Pre-proof

Jo ur

na

lP

Fig. 1: Hierarchical clustering of datasets

re

-p

ro

of

Figures

Journal Pre-proof

Jo ur

na

lP

re

-p

ro

of

Fig. 2: A. Recursive Feature Elimination (RFE) and B. Boruta plot for DElncRNAs

of

Journal Pre-proof

Jo ur

na

lP

re

-p

ro

Fig. 3: Overall survival and disease-free survival for DElncRNAs

lP

re

-p

ro

of

Journal Pre-proof

Jo ur

na

Fig. 4: DElncRNA-DEmiRNA-DEmRNA interaction network

ro

of

Journal Pre-proof

Jo ur

na

lP

re

-p

Fig. 5: The sub-modules of ceRNA network

Journal Pre-proof Tables Table 1. The list of 18 minimal sets of DElncRNAs S.No. Probesets

1. 227452_at

lncRNA

Regul

Full name

LINC00511/// LINC00673

ation

long intergenic non-protein coding RNA Up 511 /// long intergenic non-protein coding RNA 673

2. 243754_at

CTA-384D8.35 uncharacterized LOC105373098

Up

3. 232105_at

BLACAT1

Up

4. 229930_at

LINC01315

5. 230419_at

SOX9-AS1

SOX9 antisense RNA 1

6. 238898_at

FOXP4-AS1

FOXP4 antisense RNA 1

7. 1555893_at

CTD-2284J15.1 uncharacterized LOC105375624

10. 1560850_at

of

ro

-p

re

Up Down Down

LINC01087

1087

AC008268.1

Jo ur

13. 214858_at

uncharacterized FLJ38379

Up

long intergenic non-protein coding RNA Down

11. 1562821_a_at DSCAM-AS1 12. 213089_at

1315

AC093838.7///

na

9. 1557843_at

long intergenic non-protein coding RNA Up

lP

8. 1556474_a_at AC093642.1

bladder cancer associated transcript 1

AC000095.10 AC110619.2/// PP14571

uncharacterized LOC101926959

Down

DSCAM antisense RNA 1

Down

uncharacterized LOC100272216

Down

uncharacterized LOC100130449

Down

AC017048.3///

long intergenic non-protein coding RNA Down

LINC01116

1116

15. 229130_at

RP11-774O3.3

---

16. 237339_at

LINC00993

17. 239594_at

DRAIC

18. 243241_at

CTD-2311B13.7

14. 228564_at

Down

long intergenic non-protein coding RNA Down 993 downregulated RNA in cancer, inhibitor Down of cell invasion and migration uncharacterized LOC105378171 /// uncharacterized LOC105378174

Down

Journal Pre-proof Table 2. Performance measures of different independent test datasets Accuracy

Kappa

Sensitivity

Specificity

Precision

Recall

F1

GSE7904

0.9286

0.8558

1

0.8800

0.8500

1

0.9189

GSE76275

0.80

0.5706

0.7447

0.9677

0.9859

0.7447

0.8485

GSE65194

0.9257

0.8406

0.9608

0.9072

0.8448

0.9608

0.8991

GSE45827

0.9241

0.8362

0.9592

0.9062

0.8393

0.9592

0.8952

Jo ur

na

lP

re

-p

ro

of

Dataset

Journal Pre-proof Table 3. The list of DElncRNAs with degree and betweenness centrality

Betweenness Degree 0.159735

130

AC017048.3///LINC01116

0.202788

129

RP11-328M4.2///FOXP4-AS1 0.432991

122

AC000095.10

0.101842

107

RP11-774O3.3

0.098943

104

CTD-2311B13.7

0.10924

100

CTA-384D8.35

0.364581

97

LINC01315

0.211189

83

AC093642.1

0.054693

79

DRAIC

0.06348

68

0.038579

62

0.176324

60

0.054975

60

0.105958

47

0.101899

43

AC093838.7///LINC01087

0.020751

39

DSCAM-AS1

0.018883

38

-p

of

LINC00993

ro

DElncRNAs

LINC00511

BLACAT1

Jo ur

na

SOX9-AS1

lP

CTD-2284J15.1

re

AC008268.1

Journal Pre-proof Table 4. The top 10 most significantly enriched GO terms for the co-expressed DEmRNAs of DElncRNAs

Biological Process GO Term

No. of genes

FDR

0000278 mitotic cell cycle

31

4.48E-13

0051301 cell division

29

2.40E-12

0008283 cell proliferation

24

1.36E-07

0007049 cell cycle

28

2.02E-07

0007067 mitosis

of

GO ID

17

2.84E-06

12

9.61E-06

14

1.95E-05

11

2.07E-05

44

2.55E-05

0045892 negative regulation of transcription, DNA-dependent 23

2.58E-05

0000082 G1/S transition of mitotic cell cycle

-p

0000236 mitotic prometaphase

ro

0000087 M phase of mitotic cell cycle

re

0007165 signal transduction

0005515 protein binding

lP

Molecular Function 199

2.52E-42

93

9.56E-17

68

1.04E-12

91

2.41E-08

0042803 protein homodimerization activity

25

0.000106

0017124 SH3 domain binding

11

0.000283

0004674 protein serine/threonine kinase activity

20

0.000297

0003677 DNA binding

53

0.000612

0003723 RNA binding

26

0.000643

0003682 chromatin binding

14

0.000971

0005634 nucleus

209

3.02E-35

0005737 cytoplasm

205

4.42E-35

0005730 nucleolus

70

1.95E-14

0005829 cytosol

86

1.51E-13

0005654 nucleoplasm

47

5.26E-11

0005524 ATP binding

na

0000166 nucleotide binding

Jo ur

0046872 metal ion binding

Cellular Component

Journal Pre-proof 121

2.11E-10

0005856 cytoskeleton

44

3.32E-10

0005886 plasma membrane

104

2.78E-08

0031012 extracellular matrix

16

5.74E-08

0005739 mitochondrion

51

1.02E-06

Jo ur

na

lP

re

-p

ro

of

0016020 membrane

Journal Pre-proof Table 5. The top 10 most significantly enriched KEGG pathway terms for the co-expressed DEmRNAs of DElncRNAs

Pathways

No. of genes

FDR

Cell cycle

18

1.99E-10

04114

Oocyte meiosis

12

1.68E-05

05200

Pathways in cancer

16

0.00206

04914

Progesterone-mediated oocyte maturation 8

0.00239

04970

Salivary secretion

8

0.00274

05222

Small cell lung cancer

8

0.00336

03030

DNA replication

5

0.0054

04350

TGF-beta signaling pathway

7

0.00724

00240

Pyrimidine metabolism

7

0.01243

04962

Vasopressin-regulated water reabsorption 5

Jo ur

na

lP

re

-p

ro

04110

of

KEGG ID

0.0125

Journal Pre-proof Author statement

Jo ur

na

lP

re

-p

ro

of

Not Applicable

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5