Journal Pre-proof Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer
Leimarembi Devi Naorem, Vigneshwar Mathavan Muthaiyan, Amouda Venkatesan
Suriya
Prakash,
PII:
S0141-8130(19)38947-0
DOI:
https://doi.org/10.1016/j.ijbiomac.2019.12.196
Reference:
BIOMAC 14224
To appear in:
International Journal of Biological Macromolecules
Received date:
3 November 2019
Revised date:
21 December 2019
Accepted date:
21 December 2019
Please cite this article as: L.D. Naorem, V.S. Prakash, M. Muthaiyan, et al., Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer, International Journal of Biological Macromolecules(2019), https://doi.org/10.1016/j.ijbiomac.2019.12.196
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier.
Journal Pre-proof Comprehensive analysis of dysregulated lncRNAs and their competing endogenous RNA network in triple-negative breast cancer Leimarembi Devi Naorem, Vigneshwar Suriya Prakash, Mathavan Muthaiyan and Amouda Venkatesan* Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry, India. *To whom the correspondence should be addressed.
of
Address: Centre for Bioinformatics, School of Life Sciences, Pondicherry University, R. V. Nagar Kalapet, Pondicherry-605014, India. E-mail:
[email protected],
[email protected]
-p
ro
Tel: +91-413-2654-582
re
Abstract
The study aimed to explore the molecular mechanism underlying triple-negative breast
lP
cancer (TNBC) and to identify their potential diagnostic/ prognostic biomarkers. The differentially expressed lncRNAs (DElncRNAs) were identified by meta-analysis and
na
machine learning feature selection methods. The dysregulated lncRNA-miRNA-mRNA network was constructed based on the competing endogenous RNA (ceRNA) hypothesis. A
Jo ur
total of 26 DElncRNAs were identified with a meta-analysis approach of which 18 DElncRNAs attained high accuracy in training and test dataset by Support Vector MachineRecursive Feature Elimination (SVM-RFE) which could act as diagnostic biomarkers. Among the identified DElncRNAs, LINC01315 and CTA-384D8.35 could act as prognostic biomarkers. Finally, two important sub-modules from lncRNA-miRNA-mRNA network were identified which consists of DElncRNAs (LINC01087, LINC01315, and SOX9-AS1) interacting with co-expressed DEmRNAs and DEmiRNAs. Thus, the study indicated the importance of DElncRNAs and highlighted the efficacy as potential biomarkers in TNBC.
Keywords: Triple-negative breast cancer, long non-coding RNAs, competing endogenous RNA networks, Support Vector Machine-Recursive Feature Elimination
Journal Pre-proof Introduction
Breast cancer (BC) is the most commonly diagnosed and deadly cancer among women worldwide. Triple-negative breast cancer (TNBC) is a subclass of BC which is well known for its aggressiveness and associated with poor prognosis. The targeted therapies are not significantly improved for the survival of TNBC patients due to a lack of promising target [1]. Recent advancement in transcriptomics has produced a large number of coding and noncoding RNAs such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). Their
of
dysregulation is often associated with various cancers, suggesting an important role in
ro
tumorigenesis and acts as diagnostic/ prognostic biomarkers [2]. LncRNAs are a class of noncoding transcripts which are of greater than 200 nucleotides and cannot encode proteins. In
-p
the past, they were thought to be a transcriptional noise but, studies reported their
re
resemblance to mRNAs which exerted a broad range of diverse biological processes involved in various cancer hallmarks [3]. They can act as either oncogenes or tumor suppressor genes.
lP
Moreover, recent studies show that a variety of RNAs can serve as competitive endogenous RNAs (ceRNAs) including lncRNAs and mRNAs. CeRNAs are endogenous transcripts that
na
share miRNA-response elements (MREs) and regulating each other by reducing miRNA availability through competing for shared miRNAs. Thus, ceRNAs and miRNAs can form a
Jo ur
complex regulatory network named as ceRNA networks (ceRNETs). Moreover, miRNAs are reported to involve in various diseases including cancer. Perturbing in any of the key nodes of cancer-associated ceRNETs such as miRNAs, mRNAs and lncRNAs may disturb the stability and robustness of the network thus leading to cancer initiation and progression [4]. Moreover, few dysregulated lncRNAs have been found to regulate TNBC progression. For instance, several lncRNAs such as mitotically-associated long noncoding RNA (MANCR; LINC00704), metastasis associated lung adenocarcinoma transcript 1 (MALAT1), long noncoding RNA antidifferentiation ncRNA (DANCR), long noncoding RNA highly up in liver cancer (HULC), long-intergenic noncoding RNA for kinase activation (LINK-A), small nucleolar RNA host gene 12 (SNHG12) and antisense non-coding RNA in the INK4 locus (ANRIL) are up-regulated in TNBC tissues. They could inhibit cell progression, migration, invasion, proliferation and induce cell apoptosis of TNBC cells. Also, down-regulation of lncRNAs such as rhabdomyosarcoma 2-associated transcript (RMST) and growth arrest-
Journal Pre-proof specific 5 (GAS5) were observed in TNBC tissues, which behave as tumor suppressors [5–9]. A study conducted by analysis of 116 TNBC samples with 11 normal tissues from TCGA revealed that 6 key hub lncRNAs are also enriched in the ceRNETs construction [10]. Nevertheless, the research work on lncRNAs for TNBC is very limited and understanding the association of lncRNA-miRNA-mRNA ceRNET is lacking. Therefore, there is a need for a comprehensive study of lncRNA expression profiles in a larger number of samples to characterize their expression profile, functions and involvement in the molecular mechanism underlying TNBC tumorigenesis. Thus, analyses on lncRNAs expression profiles of different datasets were performed to
of
identify the meta-signature lncRNAs in TNBC. The lncRNA expression pattern was analyzed
ro
that can distinguish TNBC from non-TNBC samples. Also, the study offers a deeper understanding of ceRNETs, to explore the molecular mechanism underlying the disease and
-p
to identify potential diagnostic/ prognostic biomarkers of TNBC.
re
Materials and Methods
lP
Datasets Collection
For the study, six microarray-based gene expression datasets were retrieved from the publicly
na
available NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) database. The gene accession numbers for the collected datasets include GSE3744 [11],
Jo ur
GSE7904 [12], GSE21653 [13], GSE45827 [14], GSE65194 [15], and GSE76275 [16] respectively. All the datasets consist of TNBC and non-TNBC samples from the Affymetrix Human Genome U133 plus 2.0 microarray platform. From each dataset, raw CEL files were downloaded and pre-processed with robust multi-array average (RMA) for background correction, log2 transformed and quantile normalization [17]. Further, the pre-processed data were checked for quality control with arrayQualitymetrics in R [18]. Annotation of lncRNAs The probes for lncRNAs were selected from the Affymetrix U133 plus 2.0 microarray platform. According to previous studies [19,20] for the annotation of lncRNAs, the latest version of NetAffx annotation file (Release 36) was downloaded from the Affymetrix website. Each Probeset ID consists of gene symbol, Ensembl gene ID, RefSeq Transcript ID followed by other meaningful information and those probes labelled as ‘NR_’, ‘long noncoding’, ‘long intergenic’, ‘lincRNA’, and ‘antisense’ were considered. Also, the probe sets
Journal Pre-proof were re-mapped to the human genome (GRCh38) and lncRNAs from the GENCODE project, using SeqMap tool. Finally, the expression profiles of 4958 lncRNA transcripts were considered and subjected for further analysis. Identification of Differentially Expressed lncRNAs The differentially expressed lncRNAs (DElncRNAs) of each GEO dataset were identified using R LIMMA package with adjusted p-value < 0.05 and fold change (FC) of 1 [21]. Robust Rank Aggregation Method for Meta-Analysis
of
The lists of DElncRNA obtained from different datasets were merged using robust rank aggregation (RRA) by RobustRankAggreg in R [22], which uses a probabilistic model for
ro
aggregation and suitable to find the consistent genes. The DElncRNAs from each dataset were ranked based on the p-values and FC, where p-values are subjected to Bonferroni
re
Cluster Analysis of DElncRNAs
-p
correction to prevent false-positive results.
lP
Hierarchical cluster analysis was performed on DElncRNAs using heatmap.2 function with ggplot package in R [23], to find the expression patterns in different samples for each dataset.
na
Feature Selection and Classification using SVM-RFE & Boruta To extract minimal set of lncRNAs with high prediction accuracy and for model construction,
Jo ur
feature selection was performed on training dataset (GSE21653) with SVM-RFE (Support Vector Machine based on Recursive Feature Elimination), in conjunction with 10-fold crossvalidation using the caret package in R. Further, Boruta feature selection algorithm was used to measure genes based on their importance [24]. Performance Evaluation on Test Datasets The model was build using SVM-RFE and tested on test datasets such as GSE7904, GSE65194, GSE76275, and GSE45827. The performances of the test datasets were evaluated based on different evaluation measures. Validation of DElncRNAs The validation of DElncRNAs was carried out using LncRNADisease database accessible at http://www.rnanut.net/lncrnadisease/ http://cbrc.kaust.edu.sa/farna.
and
FARNA
tool
available
at
Journal Pre-proof Expression level of DElncRNAs The verification of DElncRNAs was performed on an independent RNA-seq expression dataset from TCGA database through GEPIA. GEPIA, available at http://gepia.cancerpku.cn/ is a web-based tool used to analyze RNA-sequencing expression data obtained from TCGA and Genotype-Tissue Expression (GTex) projects. The differential expression analysis was performed for ‘BRCA’ with their subtype filter ‘basal-like/ triple-negative’ of 135 samples and 291 normal samples with log2FC Cutoff >1 and p-value of <0.01. Survival Analysis
of
Survival analysis which includes both overall survival (OS) and disease-free survival (DFS)
ro
were performed on BRCA dataset with the basal-like/ triple-negative of TCGA obtained by RNA-seq of 135 patient tissue samples using GEPIA tool. Samples were divided into high
-p
and low expression groups based on the median for logrank test, known as the Mantel-Cox test. It also calculates the hazard ratios (HR) based on Cox PH model with 95% confidence
re
intervals (CI). The p-value of less than 0.05 was considered as statistically significant.
lP
Prediction of Target DElncRNAs and DEmRNAs of DEmiRNAs The differentially expressed mRNAs (DEmRNAs) and miRNAs (DEmiRNAs) were
na
collected from our previous studies [25,26]. DEmiRNA-DEmRNA interaction pairs were constructed by selecting the target genes from DEmRNAs, which express inversely with that
Jo ur
of miRNAs since miRNAs decrease the expression of the target mRNAs. A total of six databases both experimentally validated and predicted algorithms (miRDB, Targetscan 7.1, DIANA microtCDS, miRTarbase and DIANA Tarbase v8) were used to retrieve the targeted DEmRNAs of DEmiRNAs. miRDB (http://www.mirdb.org/) is a web interface for predicting targets
of
miRNAs,
and
their
functional
annotations.
Targetscan
7.1
(http://www.targetscan.org/), predicted miRNA targets by searching the conserved 8 mer, 7 mer and 6 mer sites that match to the seed region of each miRNA. DIANA microtCDS (http://www.microrna.gr/microT-CDS) is the latest version of microT algorithm that uses both the positive and negative set of MREs present in 3’-UTR and CDS regions. miRTarbase (http://mirtarbase.mbc.nctu.edu.tw) is a curated database that has experimentally validated miRNA-target interactions from more than 8,500 articles. DIANA Tarbase v8 (http://www.microrna.gr/tarbase) is an online database which supports miRNA- target interactions obtained via, both experimental as well as in silico studies. In the study, the targeted DEmRNAs produced by at least two prediction algorithms along with targets from
Journal Pre-proof miRTarbase and DIANA TarBase v8 were considered. Further, based on DEmiRNADEmRNA pairs, the interaction network was constructed in Cytoscape. DElncRNAs express inversely with DEmiRNAs, as miRNA may tend to regulate the lncRNA expression similar to mRNA, thus DEmiRNA-DElncRNA pairs were also obtained. miRcode database (http://www.mircode.org) was used to find targeted DElncRNAs of DEmiRNAs. This database provides whole human transcriptome miRNA target predictions according to GENCODE gene annotation with more than 10,000 lncRNA genes. The interaction network based on DElncRNA-DEmiRNA pairs was constructed in Cytoscape
of
[27].
ro
DElncRNAs-DEmRNAs Co-expression
Pearson’s correlation coefficient (PCC) was used to determine the co-expression relationships
-p
between DElncRNAs and DEmRNAs in all datasets, with PCC >0.6 and p-value <0.05.
re
Construction of the ceRNA Regulatory Network and Centrality Analysis The DElncRNA-DEmiRNA-DEmRNA interaction network was constructed in Cytoscape
lP
according to DEmiRNA-DEmRNA, DEmiRNA-DElncRNA, and DEmRNA-DElncRNA interaction pairs. In these pairs, the nodes represent biological molecules such as mRNA,
na
miRNA and lncRNA and edges are the interactions between them. For the constructed network, topological parameters such as degree and betweenness were calculated to explore
Jo ur
the important hubs by NetworkAnalyser, a Cytoscape plugin [28]. Functional and Pathway Enrichment Analysis The co-expressed mRNAs of DElncRNAs were used as input to identify the enriched functions and pathways using GeneCodis tool (http://genecodis.dacya.ucm.es/). Module Analysis of the Sub-Network The functional modules were identified from the network using MCODE, a Cytoscape plugin with default scoring parameters [29]. Results Differentially Expressed lncRNAs and Hierarchical Clustering A total of 253 DElncRNAs were identified across all the datasets consisting of 64 upregulated and 189 down-regulated DElncRNAs, in TNBC compared to non-TNBC samples
Journal Pre-proof by LIMMA package of R. The number of DElncRNAs obtained across all the datasets is tabulated in Table S1. Further through RRA, 26 DElncRNAs (17 down-regulated and 9 upregulated) with an adjusted p-value of 0.05 and the scores obtained were listed in Table S2. A hierarchical cluster of 26 DElncRNAs revealed distinct clusters of TNBC-associated expression in all datasets, which distinguishes TNBC from non-TNBC (Fig. 1). Feature Selection and Model Construction In total, 18 minimal sets of DElncRNAs were obtained with an accuracy of 0.95 through SVM-RFE (Support Vector Machine – Recursive Feature Selection) machine learning
of
method, as listed in Table 1. The lncRNA sets have the highest predicted accuracy in comparison to other combinations of gene sets with 10-fold cross-validation. Fig. 2A
ro
depicted the obtained RFE plot. Further, the Boruta function selected features among 18
re
Performance Evaluation on Test Datasets
-p
DElncRNAs confirms that all are important as shown in green color (Fig. 2B).
The test datasets include GSE7904, which consists of 42 samples and correctly classify; 17 of
lP
20 TNBC samples, and 22 of 22 non-TNBC samples. Likewise, GSE65194 of 148 samples correctly classify; 49 of 58 TNBC samples, and 88 of 90 non-TNBC samples, GSE76275 of
na
250 samples classify; 140 of 142 TNBC samples correctly, and 60 of 108 non-TNBC samples. Also, GSE45827 of 145 samples correctly classify; 47 of 56 TNBC samples, and 87
Jo ur
of 89 non-TNBC samples. The performance measures for various datasets are shown in Table 2 and found that the test datasets showed an accuracy of more than 80 %. Thus, the selected features could be considered as diagnostic biomarkers. Verification of DElncRNAs
DElncRNAs were verified using FARNA tool and LncRNADisease database, where it is found that 9 DElncRNAs are associated with BC (Table S3). Validation of DElncRNAs The DElncRNAs were verified on an independent RNA-seq expression dataset from TCGA database through GEPIA tool. Box plots were produced to compare different lncRNA levels, where 6 DElncRNAs are up-regulated and 9 DElncRNAs are down-regulated in tumor samples (Fig. S1).
Journal Pre-proof DElncRNAs as Prognostic Biomarkers The validation of DElncRNAs as prognostic biomarkers was carried out on an independent RNA-seq dataset by GEPIA tool, to identify the association of DElncRNAs with OS and DFS in TNBC patients. Kaplan-Meier curves were plotted and a logrank test was performed for OS and DFS of DElncRNAs, within high-risk and low-risk TNBC groups. It is observed that high expression of CTA-384D8.35 [HR (high) = 0.37, p (HR) =0.059, logrank p=0.049] was found to correlate with OS in TNBC patients. High expression of CTA-384D8.35 [HR (high) = 0.34, p (HR) =0.04, logrank p=0.032] and LINC01315 [HR (high) = 0.39, p (HR) =0.057, logrank p=0.049] was found to correlate with a better DFS in TNBC patients (Fig. 3). Out of
of
18 DElncRNAs, two DElncRNAs viz., CTA-384D8.35 and LINC01315, could act as
ro
prognostic biomarkers.
-p
DEmiRNA-DEmRNA Interaction Network
Based on DEmiRNA-DEmRNA interaction pairs from various databases, four up-regulated
re
DEmiRNAs were found to target 209 down-regulated DEmRNAs and two down-regulated
lP
DEmiRNAs interacted with 34 up-regulated DEmRNAs. Finally, the interaction network was constructed and visualised in Cytoscape, which consists of 249 nodes with 321
na
interaction/edges (Fig. S2).
DEmiRNA-DElncRNA Interaction Network
Jo ur
Based on DEmiRNA-DElncRNA interactions predicted by miRcode, it was found that two down-regulated miRNAs were targeted by two up-regulated lncRNAs and seven downregulated lncRNAs by two up-regulated miRNAs. The interaction network was constructed and visualised in Cytoscape, which consists of 13 nodes and 11 edges (Fig. S3). DElncRNA-DEmRNA Interaction Network A total of 1382 DElncRNA and DEmRNA interaction pairs were obtained with PCC > 0.6 and p-value < 0.05, and the network was constructed and visualised in Cytoscape. Construction of ceRNA Network Based on the above interaction data, DElncRNA-DEmiRNA-DEmRNA ceRNA network was obtained, which consists of 18 DElncRNAs, six DEmiRNAs and 668 DEmRNAs, further, it was visualized through Cytoscape (Fig. 4). The network has 692 nodes and 1714 interactions, where nodes represent lncRNA/miRNA/mRNA and edges are the interactions between them.
Journal Pre-proof Topological Analysis and their Sub Modules The topological properties of the network with defined parameters were calculated and found that the degree distribution of the whole network follows power-law distributions, with a slope of -1.7; R2, the measure of the goodness of the fit of data points to the curve of 0.7 and correlation of 0.9 thus revealing a scale-free network. In order to explore the hubs, degree and betweenness centrality are considered to be important parameters and found that 17 lncRNAs have higher nodes based on degree > 35 (Table 3). The lncRNAs tend to form hubs with the highest degree and betweenness
of
centrality compared to mRNA. From Fig. 4, it is observed that a large number of mRNAs
ro
interacted with lncRNAs, thus lncRNAs act as ceRNAs to communicate with multiple mRNAs by competing specific binding with shared miRNAs. Two important sub-modules
-p
were obtained with MCODE scores >3 from the constructed network. It was observed that
re
LINC01087, LINC01315 and SOX9-AS1 are key lncRNAs involved in ceRNETs (Fig. 5).
lP
Annotation of DElncRNAs
The co-expressed DEmRNAs of DElncRNAs were involved in the mitotic cell cycle, signal transduction biological processes and various cancer-related pathways. 120 GO biological
na
processes enriched terms including mitotic cell cycle, cell division, cell proliferation, and mitosis. For GO molecular function, 47 enriched terms were obtained that include protein
Jo ur
binding, nucleotide binding, and ATP binding. Also, 80 enriched GO cellular component terms such as nucleus, cytoplasm, and cytosol were obtained (Table 4). Further, KEGG pathway analysis revealed that, 30 significant enriched terms with FDR of < 0.05 including cell cycle, oocyte meiosis, pathways in cancer, progesterone-mediated oocyte maturation, DNA replication and TGF-beta signaling pathway (Table 5). Discussion The absence of promising and specific target for TNBC therapy necessitates identifying effective targets/ biomarkers. However, the development of transcriptomics studies has advanced knowledge in identifying a different class of non-coding RNAs named lncRNAs. Since lncRNAs play an important role in cancer progression and development, and gene regulatory network, the present study explored their crucial role as diagnostic and prognostic biomarkers and in the regulation of ceRNET. Even though numerous studies have indicated
Journal Pre-proof lncRNA expression profiles in TNBC compared to non-TNBC, the study is limited and only focussed on a smaller number of cancer samples. Hence, a comprehensive analysis of lncRNA expression profiles was conducted on a large number of samples >800 and also the regulatory relationship between DEmRNA-DEmiRNA-DElncRNA network was carried out. Therefore, different datasets were compiled and integrated to find consistent DElncRNAs. A total of 26 DElncRNAs were detected through RRA analysis and subjected to SVM-RFE to obtain a significant number of DElncRNAs. Further, 18 DElncRNAs were obtained with more than 80% accuracy in all test datasets, with high sensitivity and specificity thus could act as diagnostic biomarkers. Also, two DElncRNAs viz., CTA-384D8.35 and LINC01315
of
could act as prognostic biomarkers correlating with better OS and DFS. Finally, ceRNET was
ro
constructed with 18 DElncRNAs along with 668 DEmRNAs and six DEmiRNAs, to explore the role of lncRNAs played in TNBC development and progression. It was found from the
-p
sub-network of the constructed network that the three lncRNAs are regulating miRNAs, and
re
co-expressed with mRNAs.
Through literature studies, it is found that some of the DElncRNAs are already reported in
lP
TNBC studies. For instance, LINC00511 is an oncogene promoting cell proliferation, apoptosis, invasion, metastasis, and worse prognosis. The up-regulation of LINC00511 in
na
TNBC has been reported in a few studies. The study made an analysis of two lncRNA microarray datasets of TNBC and found that LINC00511 is one of the up-regulated lncRNA
Jo ur
[30]. It is reported that they function as ceRNA for miR-185-3p and target E2F1 protein thus promoting BC stemness and tumorigenesis [31]. BLACAT1, an up-regulated lncRNA, has been already reported in various cancers such as bladder, glioma, cervical, gastric, small-cell lung and colorectal cancer [32]. Also, their up-regulation in BC influences cell proliferation as well as metastasis through miR-150-5p by targeting CCR2 protein [33]. LINC00993 is found as one of the most down-regulated lncRNA in TNBC [34]. In the study, the dysregulation of lncRNAs such as LINC01315, FOXP4-AS1, CTD-2284J15.1, LINC01087, DSCAM-AS1, LINC01116, RP11-774O3.3 and DRAIC were observed in BC subtype such as basal-like [35]. Notably, accumulating evidence indicates the importance of the candidate lncRNAs as key biomarkers in various cancers, which led the researchers to look into their importance in TNBC. CTA-384D8.35 was found to be up-regulated in the study and also predicted as a biomarker for tongue tumor tissue [36]. AC093642.1, AC008268.1, AC000095.10, AC110619.2, and CTD-2311B13.7 were found to be down-regulated and have not been reported elsewhere.
Journal Pre-proof From the module analysis, it can be inferred that ceRNETs exhibited the interaction of DElncRNAs (SOX9-AS1 and LINC01315) with DEmRNAs (MRAS, CDK6, CALML4 and SFT2D2) as regulated by both hsa-miR-190b and hsa-miR-449a. Also, DElncRNA (LINC01087) is co-expressed with SUOX, HECTD2 and SLC39A6, regulated by hsa-miR135b-5p. Eventually, it was observed that LINC01315 can act as a prognostic biomarker as well as ceRNA for hsa-miR-190b, co-expressed with various targets such as MRAS, CDK6, SFT2D2, CALML4 and CDK6. An earlier study suggested that hsa-miR‐ 449a/b targets and inhibits the expression of oncogenic proteins viz., CDK6 and CDC25A (cell division cycle 25 homolog A), to arrest the
of
cell cycle in the breast cancer cell line MCF‐ 7 [37]. However, in the current study
ro
DElncRNAs, LINC01315 and SOX9-AS1 were found to be associated with breast cancer. Thus, these results proposed that significant DElncRNAs which could act as diagnostic as
re
which play a vital role in TNBC development.
-p
well as prognostic biomarkers. Also, some of the identified lncRNA function as ceRNAs
lP
Conclusion
The study identified 18 DElncRNAs that could distinguish TNBC from non-TNBC with high diagnostic specificity and sensitivity. Among the identified DElncRNAs, LINC01315 and
na
CTA-384D8.35 act as prognostic biomarkers. The dysregulated lncRNA-miRNA-mRNA network was constructed based on ceRNA hypothesis and sub-modules were identified
Jo ur
including three lncRNAs (LINC01087, LINC01315, and SOX9-AS1). Enrichment analysis of co-expressed genes indicated their involvement in the mitotic cell cycle, cancer-related pathways, etc. Hence the study revealed the usefulness of lncRNAs as a potential biomarker for TNBC diagnosis and treatment. Conflict of Interest The authors declare no conflict of interest Funding Information None Acknowledgements The authors thank Centre for Bioinformatics, Pondicherry University for providing the computer facility to carry out the work. Leimarembi Devi Naorem acknowledges a Senior
Journal Pre-proof Research Fellowship from the Council of Scientific & Industrial Research (CSIR). Mathavan Muthaiyan acknowledges a Senior Research Fellowship from the Rajiv Gandhi National Fellowship (RGNF). The authors also thank the technical group of the Centre for their technical assistance. References [1]
A.C. Garrido-Castro, N.U. Lin, K. Polyak, Insights into molecular classifications of triple-negative breast cancer: Improving patient selection for treatment, Cancer Discov. (2019). doi:10.1158/2159-8290.CD-18-1177. M. Xie, L. Ma, T. Xu, Y. Pan, Q. Wang, Y. Wei, Y. Shu, Potential Regulatory Roles
of
[2]
ro
of MicroRNAs and Long Noncoding RNAs in Anticancer Therapies, Mol. Ther. Nucleic Acids. (2018). doi:10.1016/j.omtn.2018.08.019.
T. Gutschner, S. Diederichs, The hallmarks of cancer: A long non-coding RNA point
-p
[3]
[4]
re
of view, RNA Biol. (2012). doi:10.4161/rna.20481. X. Qi, Y. Lin, J. Chen, B. Shen, Decoding competing endogenous RNA networks for
S. Sha, D. Yuan, Y. Liu, B. Han, N. Zhong, Targeting long non-coding RNA DANCR
na
[5]
lP
cancer biomarker discovery, Brief. Bioinform. (2019). doi:10.1093/bib/bbz006.
inhibits triple negative breast cancer progression, Biol. Open. (2017).
[6]
Jo ur
doi:10.1242/bio.023135.
F. Shi, F. Xiao, P. Ding, H. Qin, R. Huang, Long Noncoding RNA Highly Upregulated in Liver Cancer Predicts Unfavorable Outcome and Regulates Metastasis by MMPs in Triple-negative Breast Cancer, Arch. Med. Res. (2016). doi:10.1016/j.arcmed.2016.11.001.
[7]
K.M. Tracy, C.E. Tye, P.N. Ghule, H.L.H. Malaby, J. Stumpff, J.L. Stein, G.S. Stein, J.B. Lian, Mitotically-associated lncRNA (MANCR) affects genomic stability and cell division in aggressive breast cancer, Mol. Cancer Res. (2018). doi:10.1158/15417786.MCR-17-0548.
[8]
R. Rodríguez Bautista, A. Ortega Gómez, A. Hidalgo Miranda, A. Zentella Dehesa, C. Villarreal-Garza, F. Ávila-Moreno, O. Arrieta, Long non-coding RNAs: Implications in targeted diagnoses, prognosis, and improved therapeutic strategies in human nonand triple-negative breast cancer, Clin. Epigenetics. (2018). doi:10.1186/s13148-018-
Journal Pre-proof 0514-z. [9]
X. Kong, W. Liu, Y. Kong, Roles and expression profiles of long non-coding RNAs in triple-negative breast cancers, J. Cell. Mol. Med. (2018). doi:10.1111/jcmm.13327.
[10] N. Yuan, G. Zhang, F. Bie, M. Ma, Y. Ma, X. Jiang, Y. Wang, X. Hao, Integrative analysis of lnCRNAs and miRNAs with coding RNAs associated with ceRNA crosstalk network in triple negative breast cancer, Onco. Targets. Ther. (2017). doi:10.2147/OTT.S149308. [11] A. Alimonti, A. Carracedo, J.G. Clohessy, L.C. Trotman, C. Nardella, A. Egia, L.
of
Salmena, K. Sampieri, W.J. Haveman, E. Brogi, A.L. Richardson, J. Zhang, P.P.
(2010) 454. https://doi.org/10.1038/ng.556.
ro
Pandolfi, Subtle variations in Pten dose determine cancer susceptibility, Nat. Genet. 42
-p
[12] A.L. Richardson, Z.C. Wang, A. De Nicolo, X. Lu, M. Brown, A. Miron, X. Liao, J.D.
re
Iglehart, D.M. Livingston, S. Ganesan, X chromosomal abnormalities in basal-like human breast cancer, Cancer Cell. (2006). doi:10.1016/j.ccr.2006.01.013.
lP
[13] R. Sabatier, P. Finetti, J. Adelaide, A. Guille, J.P. Borg, M. Chaffanet, L. Lane, D. Birnbaum, F. Bertucci, Down-regulation of ECRG4, a candidate tumor suppressor
na
gene, in human breast cancer, PLoS One. (2011). doi:10.1371/journal.pone.0027656. [14] T. Gruosso, V. Mieulet, M. Cardon, B. Bourachot, Y. Kieffer, F. Devun, T. Dubois, M.
Jo ur
Dutreix, A. Vincent‐ Salomon, K.M. Miller, F. Mechta‐ Grigoriou, Chronic oxidative stress promotes H2AX protein degradation and enhances chemosensitivity in breast cancer patients, EMBO Mol. Med. (2016). doi:10.15252/emmm.201505891. [15] P. Den Hollander, K. Rawls, A. Tsimelzon, J. Shepherd, A. Mazumdar, J. Hill, S.A.W. Fuqua, J.C. Chang, C.K. Osborne, S.G. Hilsenbeck, G.B. Mills, P.H. Brown, Phosphatase PTP4A3 promotes triple-negative breast cancer growth and predicts poor patient survival, Cancer Res. (2016). doi:10.1158/0008-5472.CAN-14-0673. [16] S. Maubant, B. Tesson, V. Maire, M. Ye, G. Rigaill, D. Gentien, F. Cruzalegui, G.C. Tucker, S. Roman-Roman, T. Dubois, Transcriptome analysis of wnt3a-treated triplenegative breast cancer cells, PLoS One. (2015). doi:10.1371/journal.pone.0122333. [17] L. Gautier, L. Cope, B.M. Bolstad, R.A. Irizarry, Affy - Analysis of Affymetrix GeneChip data at the probe level, Bioinformatics. (2004).
Journal Pre-proof doi:10.1093/bioinformatics/btg405. [18] A. Kauffmann, R. Gentleman, W. Huber, arrayQualityMetrics - A bioconductor package for quality assessment of microarray data, Bioinformatics. (2009). doi:10.1093/bioinformatics/btn647. [19] G. Zhang, H. Sun, Y. Zhang, H. Zhao, W. Fan, J. Li, Y. Lv, Q. Song, J. Li, M. Zhang, H. Shi, Characterization of dysregulated lncRNA-mRNA network based on ceRNA hypothesis to reveal the occurrence and recurrence of myocardial infarction, Cell Death Discov. (2018). doi:10.1038/s41420-018-0036-7.
of
[20] M. Zhou, H. Zhao, X. Wang, J. Sun, J. Su, Analysis of long noncoding RNAs
ro
highlights region-specific altered expression patterns and diagnostic roles in Alzheimer’s disease, Brief. Bioinform. (2018). doi:10.1093/bib/bby021.
-p
[21] G.K. Smyth, Limma: linear models for microarray data BT - Bioinformatics and
re
Computational Biology Solutions Using R and Bioconductor, Bioinforma. Comput. Biol. Solut. Using R Bioconductor. (2005). doi:10.1007/0-387-29362-0_23.
lP
[22] R. Kolde, S. Laur, P. Adler, J. Vilo, Robust rank aggregation for gene list integration
na
and meta-analysis, Bioinformatics. (2012). doi:10.1093/bioinformatics/btr709. [23] G.R. Warnes, B. Bolker, L. Bonebakker, R. Gentleman, W.H.A. Liaw, T. Lumley, M. Maechler, A. Magnusson, S. Moeller, M. Schwartz, B. Venables, Package “gplots”:
Jo ur
Various R programming tools for plotting data, R Packag. Version 2.17.0. (2016). doi:10.1111/j.0022-3646.1997.00569.x. [24] P. Guo, Y. Luo, G. Mai, M. Zhang, G. Wang, M. Zhao, L. Gao, F. Li, F. Zhou, Gene expression profile based classification models of psoriasis, Genomics. (2014). doi:10.1016/j.ygeno.2013.11.001. [25] L.D. Naorem, M. Muthaiyan, A. Venkatesan, Integrated network analysis and machine learning approach for the identification of key genes of triple-negative breast cancer, J. Cell. Biochem. 120 (2019) 6154–6167. doi:10.1002/jcb.27903. [26] L.D. Naorem, M. Muthaiyan, A. Venkatesan, Identification of dysregulated miRNAs in triple negative breast cancer: A meta-analysis approach, J. Cell. Physiol. 234 (2019) 11768–11779. doi:10.1002/jcp.27839. [27] P. Shannon, A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, D. Ramage, N. Amin, B.
Journal Pre-proof Schwikowski, T. Ideker, Cytoscape: a software environment for integrated models of biomolecular interaction networks., Genome Res. (2003). doi:10.1101/gr.1239303. [28] Y. Assenov, F. Ramírez, S.E.S.E. Schelhorn, T. Lengauer, M. Albrecht, Computing topological parameters of biological networks, Bioinformatics. (2008). doi:10.1093/bioinformatics/btm554. [29] G.D. Bader, C.W. V Hogue, An automated method for finding molecular complexes in large protein interaction networks., BMC Bioinformatics. (2003). [30] T. Tian, Z. Gong, M. Wang, R. Hao, S. Lin, K. Liu, F. Guan, P. Xu, Y. Deng, D. Song,
of
N. Li, Y. Wu, Z. Dai, Identification of long non-coding RNA signatures in triple-
ro
negative breast cancer, Cancer Cell Int. (2018). doi:10.1186/s12935-018-0598-8. [31] G. Lu, Y. Li, Y. Ma, J. Lu, Y. Chen, Q. Jiang, Q. Qin, L. Zhao, Q. Huang, Z. Luo, S.
-p
Huang, Z. Wei, Long noncoding RNA LINC00511 contributes to breast cancer
re
tumourigenesis and stemness by inducing the miR-185-3p/E2F1/Nanog axis, J. Exp. Clin. Cancer Res. (2018). doi:10.1186/s13046-018-0945-6.
lP
[32] H. Lu, H. Liu, X. Yang, T. Ye, P. Lv, X. Wu, Z. Ye, LncRNA BLACAT1 May Serve as a Prognostic Predictor in Cancer: Evidence from a Meta-Analysis, Biomed Res. Int.
na
(2019). doi:10.1155/2019/1275491.
[33] X. Hu, Y. Liu, Y. Du, T. Cheng, W. Xia, Long non-coding RNA BLACAT1 promotes
Jo ur
breast cancer cell proliferation and metastasis by miR-150-5p/CCR2, Cell Biosci. 9 (2019) 14. doi:10.1186/s13578-019-0274-2. [34] C. Chen, Z. Li, Y. Yang, T. Xiang, W. Song, S. Liu, Microarray expression profiling of dysregulated long non-coding RNAs in triple-negative breast cancer, Cancer Biol. Ther. (2015). doi:10.1080/15384047.2015.1040957. [35] C. Mathias, E.P. Zambalde, P. Rask, D.F. Gradia, J.C. de Oliveira, Long non-coding RNAs differential expression in breast cancer subtypes: What do we know?, Clin. Genet. (2019). doi:10.1111/cge.13502. [36] A. Mohammed, G. Biegert, J. Adamec, T. Helikar, Identification of potential tissuespecific cancer biomarkers and development of cancer versus normal genomic classifiers, Oncotarget. (2017). doi:10.18632/oncotarget.21127. [37] X. Yang, M. Feng, X. Jiang, Z. Wu, Z. Li, M. Aau, Y. Qiang, miR-449a and miR-449b
Journal Pre-proof are direct transcriptional targets of E2F1 and negatively regulate pRb-E2F1 activity through a feedback loop by targeting CDK6 and CDC25A, Genes Dev. (2009).
Jo ur
na
lP
re
-p
ro
of
doi:10.1101/gad.1819009.
Journal Pre-proof
Jo ur
na
lP
Fig. 1: Hierarchical clustering of datasets
re
-p
ro
of
Figures
Journal Pre-proof
Jo ur
na
lP
re
-p
ro
of
Fig. 2: A. Recursive Feature Elimination (RFE) and B. Boruta plot for DElncRNAs
of
Journal Pre-proof
Jo ur
na
lP
re
-p
ro
Fig. 3: Overall survival and disease-free survival for DElncRNAs
lP
re
-p
ro
of
Journal Pre-proof
Jo ur
na
Fig. 4: DElncRNA-DEmiRNA-DEmRNA interaction network
ro
of
Journal Pre-proof
Jo ur
na
lP
re
-p
Fig. 5: The sub-modules of ceRNA network
Journal Pre-proof Tables Table 1. The list of 18 minimal sets of DElncRNAs S.No. Probesets
1. 227452_at
lncRNA
Regul
Full name
LINC00511/// LINC00673
ation
long intergenic non-protein coding RNA Up 511 /// long intergenic non-protein coding RNA 673
2. 243754_at
CTA-384D8.35 uncharacterized LOC105373098
Up
3. 232105_at
BLACAT1
Up
4. 229930_at
LINC01315
5. 230419_at
SOX9-AS1
SOX9 antisense RNA 1
6. 238898_at
FOXP4-AS1
FOXP4 antisense RNA 1
7. 1555893_at
CTD-2284J15.1 uncharacterized LOC105375624
10. 1560850_at
of
ro
-p
re
Up Down Down
LINC01087
1087
AC008268.1
Jo ur
13. 214858_at
uncharacterized FLJ38379
Up
long intergenic non-protein coding RNA Down
11. 1562821_a_at DSCAM-AS1 12. 213089_at
1315
AC093838.7///
na
9. 1557843_at
long intergenic non-protein coding RNA Up
lP
8. 1556474_a_at AC093642.1
bladder cancer associated transcript 1
AC000095.10 AC110619.2/// PP14571
uncharacterized LOC101926959
Down
DSCAM antisense RNA 1
Down
uncharacterized LOC100272216
Down
uncharacterized LOC100130449
Down
AC017048.3///
long intergenic non-protein coding RNA Down
LINC01116
1116
15. 229130_at
RP11-774O3.3
---
16. 237339_at
LINC00993
17. 239594_at
DRAIC
18. 243241_at
CTD-2311B13.7
14. 228564_at
Down
long intergenic non-protein coding RNA Down 993 downregulated RNA in cancer, inhibitor Down of cell invasion and migration uncharacterized LOC105378171 /// uncharacterized LOC105378174
Down
Journal Pre-proof Table 2. Performance measures of different independent test datasets Accuracy
Kappa
Sensitivity
Specificity
Precision
Recall
F1
GSE7904
0.9286
0.8558
1
0.8800
0.8500
1
0.9189
GSE76275
0.80
0.5706
0.7447
0.9677
0.9859
0.7447
0.8485
GSE65194
0.9257
0.8406
0.9608
0.9072
0.8448
0.9608
0.8991
GSE45827
0.9241
0.8362
0.9592
0.9062
0.8393
0.9592
0.8952
Jo ur
na
lP
re
-p
ro
of
Dataset
Journal Pre-proof Table 3. The list of DElncRNAs with degree and betweenness centrality
Betweenness Degree 0.159735
130
AC017048.3///LINC01116
0.202788
129
RP11-328M4.2///FOXP4-AS1 0.432991
122
AC000095.10
0.101842
107
RP11-774O3.3
0.098943
104
CTD-2311B13.7
0.10924
100
CTA-384D8.35
0.364581
97
LINC01315
0.211189
83
AC093642.1
0.054693
79
DRAIC
0.06348
68
0.038579
62
0.176324
60
0.054975
60
0.105958
47
0.101899
43
AC093838.7///LINC01087
0.020751
39
DSCAM-AS1
0.018883
38
-p
of
LINC00993
ro
DElncRNAs
LINC00511
BLACAT1
Jo ur
na
SOX9-AS1
lP
CTD-2284J15.1
re
AC008268.1
Journal Pre-proof Table 4. The top 10 most significantly enriched GO terms for the co-expressed DEmRNAs of DElncRNAs
Biological Process GO Term
No. of genes
FDR
0000278 mitotic cell cycle
31
4.48E-13
0051301 cell division
29
2.40E-12
0008283 cell proliferation
24
1.36E-07
0007049 cell cycle
28
2.02E-07
0007067 mitosis
of
GO ID
17
2.84E-06
12
9.61E-06
14
1.95E-05
11
2.07E-05
44
2.55E-05
0045892 negative regulation of transcription, DNA-dependent 23
2.58E-05
0000082 G1/S transition of mitotic cell cycle
-p
0000236 mitotic prometaphase
ro
0000087 M phase of mitotic cell cycle
re
0007165 signal transduction
0005515 protein binding
lP
Molecular Function 199
2.52E-42
93
9.56E-17
68
1.04E-12
91
2.41E-08
0042803 protein homodimerization activity
25
0.000106
0017124 SH3 domain binding
11
0.000283
0004674 protein serine/threonine kinase activity
20
0.000297
0003677 DNA binding
53
0.000612
0003723 RNA binding
26
0.000643
0003682 chromatin binding
14
0.000971
0005634 nucleus
209
3.02E-35
0005737 cytoplasm
205
4.42E-35
0005730 nucleolus
70
1.95E-14
0005829 cytosol
86
1.51E-13
0005654 nucleoplasm
47
5.26E-11
0005524 ATP binding
na
0000166 nucleotide binding
Jo ur
0046872 metal ion binding
Cellular Component
Journal Pre-proof 121
2.11E-10
0005856 cytoskeleton
44
3.32E-10
0005886 plasma membrane
104
2.78E-08
0031012 extracellular matrix
16
5.74E-08
0005739 mitochondrion
51
1.02E-06
Jo ur
na
lP
re
-p
ro
of
0016020 membrane
Journal Pre-proof Table 5. The top 10 most significantly enriched KEGG pathway terms for the co-expressed DEmRNAs of DElncRNAs
Pathways
No. of genes
FDR
Cell cycle
18
1.99E-10
04114
Oocyte meiosis
12
1.68E-05
05200
Pathways in cancer
16
0.00206
04914
Progesterone-mediated oocyte maturation 8
0.00239
04970
Salivary secretion
8
0.00274
05222
Small cell lung cancer
8
0.00336
03030
DNA replication
5
0.0054
04350
TGF-beta signaling pathway
7
0.00724
00240
Pyrimidine metabolism
7
0.01243
04962
Vasopressin-regulated water reabsorption 5
Jo ur
na
lP
re
-p
ro
04110
of
KEGG ID
0.0125
Journal Pre-proof Author statement
Jo ur
na
lP
re
-p
ro
of
Not Applicable
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5