Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation

Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation

Biomedicine & Pharmacotherapy 124 (2020) 109810 Contents lists available at ScienceDirect Biomedicine & Pharmacotherapy journal homepage: www.elsevi...

11MB Sizes 0 Downloads 18 Views

Biomedicine & Pharmacotherapy 124 (2020) 109810

Contents lists available at ScienceDirect

Biomedicine & Pharmacotherapy journal homepage: www.elsevier.com/locate/biopha

Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation

T

Mingjun Zhenga,b,c,1, Yuexin Hua,b,1, Rui Goua,b, Xin Niea,b, Xiao Lia,b, Juanjuan Liua,b, Bei Lina,b,* a

Department of Gynaecology and Obstetrics, Shengjing Hospital Affiliated to China Medical University, No. 36 Sanhao Street, Heping District, Shenyang 110004, Liaoning, China b Key Laboratory of Maternal-Fetal Medicine of Liaoning Province, Key Laboratory of Obstetrics and Gynecology of Higher Education of Liaoning Province, China c Department of Obstetrics and Gynecology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany

ARTICLE INFO

ABSTRACT

Keywords: lncRNAs Ovarian cancer Copy number alterations Multi-omics analysis Subtypes Prognostic biomarkers

Purpose: Ovarian cancer is one of the most common malignant tumors of the female reproductive system, which seriously threatens the health of patients. It is of great significance to identify biomarkers to improve the clinical status of ovarian cancer patients. Methods: Methylation, RNA- sequencing, Copy number variation (CNV), mutation and clinical characteristics of ovarian cancer and control samples were downloaded from The Cancer Genome Atlas database (TCGA). The “iClusterPlus”. R package was used to cluster the molecular subtypes. The copy number variation of the entire lncRNA genome was analyzed using GISTIC. The prognosis-associated lncRNA related to CNV was screened as potential targets for ovarian cancer. Results: Six molecular subtypes were identified based on multi-omics analysis and DElncRNAs are significantly enriched in specific molecular subtypes. The deletion or amplification of lncRNA copy number affects the occurrence and development of ovarian cancer to some extent. Three prognostic-associated lncRNA including LOC101927151, LINC00861 and LEMD1-AS1 were selected. These lncRNAs can be used as biomarkers to predict survival in patients with ovarian cancer. The accuracy of results were verified using the Gene Expression Omnibus (GEO) dataset. Conclusion: Based on genome-wide copy number variation, prognostic-associated lncRNAs were identified as new biomolecular markers for ovarian cancer.

1. Introduction Ovarian cancer is one of the three major malignant tumors of the female reproductive system. However, ovaries are located in deep pelvic cavities, it is not easily detected in the early stage. As a result, 70 % of the cases was diagnosed with distant metastasis. Most patients relapse within 2 years, since effective treatment is lacking for recurrent cases, the mortality rate ranks first among gynecological malignant tumors [1,2]. Ovarian epithelial malignant tumors are the most common pathological type, accounting for 80 %–95 % of ovarian malignant tumors. Such tumors show easy relapse after surgery, and become drug resistant [3]. Therefore, we aimed to study the mechanism

of ovarian epithelial serous tumors and to explore prognostic-associated lncRNAs, which is of great significance in providing guidance for the diagnosis and prognosis of ovarian cancer. LncRNA is a recently discovered RNA of more than 200 bp in length that does not encode proteins, but is important for development, differentiation and metabolism [4,5]. Recent studies have shown that lncRNAs play an important role not only in tissue development, but also in the proliferation and metastasis of cancer cells by regulating gene expression and nuclear transcription. Studies on the mechanisms of lncRNAs are helpful in the diagnosis and treatment of tumors [6–8]. In recent years, studies on lncRNA as biomarkers for ovarian cancer have been reported extensively. Martini P [9] et al. suggested that lnc-

Abbreviations: TCGA, The Cancer Genome Atlas database; CNV, Copy number variation; GEO, Gene Expression Omnibus; lncRNA, long non-coding RNA; FPKM, fragments per kilobase million; TPM, transcripts per kilobase millions; GISTIC, Genomic Identification of Significant Targets in Cancer; MCODE, the molecular complex detection ⁎ Corresponding author at: Department of Obstetrics and Gynecology, Shengjing Hospital Affiliated to China Medical University, 36 Sanhao Street, Shenyang, Liaoning, 110004, China. E-mail address: [email protected] (B. Lin). 1 These authors have contributed equally to this work. https://doi.org/10.1016/j.biopha.2019.109810 Received 9 September 2019; Received in revised form 10 November 2019; Accepted 20 November 2019 0753-3322/ © 2019 Published by Elsevier Masson SAS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

SERTAD2-3, lnc-SOX4-1, lnc-HRCT1-1 and PVT1 can be used as novel indicators of patients’ prognosis in stage I ovarian cancer through retrospective and multicentric study. Wu DD [10] et al. suggested that overexpression of lncRNA ABHD11-AS1 promotes tumorigenesis and progression of epithelial ovarian cancer through targeted regulation of RhoC in ovarian cancer cell A2780 and OVCAR3. Subcutaneous injection of tumor cells in vivo showed that ABHD11-AS1 significantly promoted tumor growth. In the study of Mitra R [11] et al., an integrated analysis of 700 ovarian cancer molecular profiles, including genomic datasets, from four patient cohorts was performed. The results showed that overexpression of lncRNA DNM3OS often lead to the deterioration of patients' overall survival and it was closely related to EMT in ovarian cancer. Liang H [12] et al. revealed that LncRNA PTAR promotes EMT and invasion-metastasis in serous ovarian cancer by competitively binding miR-101-3p to regulate ZEB1 expression. Copy number variation is widespread in the human genome, including deletions, insertions, duplications, and complex multi-site variation ranging from 1000 bp to millions of bp [13]. CNVs can lead to different degrees of differential gene expression, and multiple CNVs in the genome can cause heterogeneity of genomic and molecular phenotypes, leading to the occurrence and development of complex diseases including cancer [14]. DNA copy number variation (CNV) of germline is a ubiquitous source of genetic variation and remains largely unexplored in association with epithelial ovarian cancer (EOC) risk. Reid BM [15] et al. performed CNV quantitative analysis in the DNA of approximately 3500 cases and a genome-wide association study of common (> 1 %) CNV regions (CNVRs) with EOC and high-grade serous (HGSOC) risk. In addition to ovarian cancer, the incidence of other tumors is also closely related to copy number variation. Washaakh [16] et al. downloaded 547 copies of transcriptome data and 816 copies of breast cancer genome data from The Cancer Genome Atlas (TCGA) database in order to determine the prognostic correlation between CNVs and clinical pathological features. The results showed that 86 % of gene CNV expression was positively correlated. Shai [17] et al. collected a cohort of 197 patients with anaplastic oligodendroglioma and clinical, pathological, and molecular information were recorded. CNV analysis was performed using single-nucleotide polymorphism arrays. The results showed that CNV is closely related to survival. Currently, studies on lncRNA copy number variation and ovarian cancer have not been reported. Therefore, combined with the correlation analysis of lncRNA expression and CNVs, finding transcriptional dysregulation lncRNA caused by CNV abnormality is of vital importance for diagnosis, treatment and prognosis of ovarian cancer. Three prognostic biomarkers of lncRNA associated with CNV in ovarian cancer were identified. The present findings shed light on new therapeutic targets for ovarian cancer treatment.

removed type differences were downloaded. Single nucleotide mutation data processed by MuTect software were downloaded from the TCGA GDC. 2.2. Identification of prognostic sites using Univariate regression analysis In order to classify the samples better, we analyzed the influence of coding genes, CNV and methylation on prognosis. We chose samples with a prognostic follow-up time of more than 30 days and established a univariate Cox proportional risk regression model, with a significant p threshold of 0.05. We eventually screened 1152 coding genes, 14130 CNV regions and 847 CpG sites. 2.3. Copy number profile of lncRNA in ovarian cancer Genomic Identification of Significant Targets in Cancer (GISTIC) is an algorithm used to identify regions of variation that are more likely to trigger cancer pathogenesis [18]. It can visualize regions in the genome to show amplification and deletion in thousands of samples. 564 ovarian cancer copy number data downloaded from TCGA using GISTIC 2.0 software was analyzed and the copy number spectrum of lncRNA was extracted. Taking the number of copies greater than 1 as the threshold of copy amplification and less than −1 as the threshold of copy loss, we calculated the ratio of copy amplification and loss for each lncRNA, and observed their distribution in the genome. 2.4. Functional pathway enrichment analysis of lncRNA target genes Metascape (http://metascape.org) [19] is a free gene list analysis tool for gene annotation and analysis. It can not only complete pathway enrichment and biological process annotation, but also analyze generelated protein networks. In this study, Metascape was used as a functional tool to analyze the enrichment of target genes in lncRNAs. The conditions included: P < 0.01, minimum count of 3, and enrichment factor > 1.5 to obtain significant statistical differences. The following databases were used for PPI enrichment analysis in Metascape: BioGrid [20], InWeb_IM [21]and OmniPath [22]. In addition, the molecular complex detection (MCODE) algorithm was used to mine molecules with deeper network regulation relationships. 2.5. Survival analysis Data validation was realized using the KM-plot website (http:// kmplot.com/analysis/) [23].This database system integrated eight independent datasets resulting in a total of 1657 TCGA Ovarian Cancer (TCGA-OV) samples. Ovarian cancer patients were divided into two groups according to the expression of a certain gene (high or low expression). Kaplan–Meier plots were used to analyze overall survival in patients with ovarian cancer. The hazard ratio (HR) with 95 % confidence intervals (CIs) and log rank p-values were displayed online.

2. Materials and methods 2.1. Data downloaded and preprocessing

3. Results

We first downloaded all fragments per kilobase million (FPKM) and counts data from TCGA Genomic Data Commons (GDC). We converted FPKM to transcripts per kilobase millions (TPM). According to the types of genes in the genecode.v22.gtf file, we classified the mRNA of 3 prime overlapping (non-coding) ncRNAs, sense-intronic, sense-overlapping, antisense, and processed-transcripts as lncRNAs. In this way, we extracted the FPKM expression profile of lncRNAs. Those for which the type of gene was protein-coding were classified as protein-coding genes, from which we extracted the expression profiling of a protein-coding gene’s (PCGs) FPKM. We downloaded the expression profile data of all 27k samples from the TCGA GDC and removed CpG probes with NA in sample expression. The CNVs of all samples from TCGA GDC that

3.1. Prognostic analysis of six molecular subtypes based on multi-omics We analyzed the relationship between prognosis and PCGs, CNV and methylation, respectively. The coding genes, CNV and methylation sites significantly correlated with prognosis were screened with a threshold value of P < 0.05. R package iClusterPlus was used for multiomics analysis and R package sigclust was used for cluster significance analysis. The most significant clustering result (k = 6) was selected. As shown in Table 1, the samples of each subtype are classified evenly. As shown in Fig. 1A, we further analyzed the prognostic differences of these six subtypes and found significant differences in prognosis

2

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

SYSTEM、ALPHA_LINOLENIC_ACID_METABOLISM、GNRH_SIGNALING_PATHWAY. The C54 subtype is mainly related to abnormal TEROID_BIOSYNTHESIS、CELL CYCLE、PRIMARY_BILE_ACID_BIOSYNTHESIS、GLYCOSPHINGOLIPID_BIOSYNTHESIS_GANGLIO_SERIES. The C6 subtype is mainly related to abnormal TAURINE_AND_HYPOTAURINE_METABOLISM、PENTOSE_PHOSPHATE_PATHWAY. In summary, these subtypes have different abnormal pathways, which may be the reason why they have different clinical phenotypes. Furthermore, we compared the relationship between the four molecular subtypes previously reported based on TCGA data [24] and our six molecular subtypes as shown in Fig. 2B. The proportion of differentiated in C1, C2 and C4 is higher than that in the other three groups, and the proportion of immunoreactive in C3, C5 and C6 is higher than that in the other three groups. However, mesenchymal and proliferative are not significantly different in the distribution of each subtypes.

Table 1 The number of samples of six subtypes. Cluster

Sample Count

C1 C2 C3 C4 C5 C6

62 36 81 77 37 64

(p < 0.001), which indicated that the ovarian cancer molecular subtype based on gene expression, methylation and CNVs expression profile can predict the prognosis of ovarian cancer patients. Furthermore, we calculated the proportions of gene mutations in different subtypes. We selected 10 genes with the highest mutation proportion in each subtype, and identified 39 genes. This suggests that the six subtypes have less overlap between genes with the highest frequency of gene mutation. We further undertook hierarchical clustering analysis after extracting the mutation proportions of these 39 genes in each subtype, as shown in Fig. 1B, from which we can see that the mutation frequency of each subtype in the 39 genes is significantly different. The prognosis of C5 subtype was the worse and the mutations of MGA, WDFY3, ARID4A and FRAS1 in this subtype were significantly higher than those in other subtypes, indicating that these mutated genes could be used as unique prognostic biomarkers in this molecular subtype.

3.3. DElncRNA analysis in different subtypes R package DEseq2 was used to eliminate genes with an average count number of less than 1 in the expression profile. We further used a fold change greater than twice and FDR < 0.05 as thresholds to screen for different lncRNAs and coding genes in each subtype. The differences in each subtype are shown in Table 2, from which we found that there were a few DEmRNAs and DElncRNAs in C3 samples. C4 and C5 samples have the largest number of DEmRNAs and DElncRNAs. Volcano plot of DElncRNA for each subtype is shown in Fig. 3A–G, from this we observed that the number of Up-regulated lncRNAs was generally smaller than that of Down-regulated lncRNAs. A total of 889 DElncRNAs were obtained by combining all DElncRNAs in these subtypes. We further downloaded lncRNAs closely related to the disease from the LncRNA Disease and Lnc2Cancer database, and obtained 611 lncRNAs. Compared with 889 DElncRNAs, 83 common disease-related DElncRNAs were screened out in Fig. 3G. Moreover, hypergeometry was used to test the significance (p = 0.0004136).

3.2. Pathway analysis of molecular subtypes We used the R package ssGSEA to analyze the enrichment scores of each sample in each KEGG, further compared the differences in the enrichment scores of each subtype pathway with the t-test, selected p value < 0.01 as the threshold to identify specific pathways, and finally 60 pathways were screened as shown in Fig. 2A. The C1 subtype is mainly related to abnormal NON_SMALL_CELL_LUNG_CANCER、 LYSINE_DEGRADATION、RIBOSOME、SMALL_CELL_LUNG_CANCER TYPE_II_DIABETES_MELLITUS、MTOR_SIGNALING_PATHWAY. The C2 subtype is mainly related to abnormal immune pathways such as LEISHMANIA_INFECTION、ETHER_LIPID_METABOLISM、CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION、PRIMARY_IMMUNODEFICIENCY、B_CELL_RECEPTOR_SIGNALING_PATHWAY、T_CELL_RECEPTOR_SIGNALING_PATHWAY、REGULATION_OF_ACTIN_CYTOSKELETON. There are few abnormal pathways of C3 subtype. The C4 subtype is mainly related to abnormal PHOSPHATIDYLINOSITOL_SIGNALING_

3.4. DElncRNAs gene enrichment analysis Taking the fold change of all the genes of 6 subtypes as the background and the fold change of DElncRNAs of the corresponding 6 subtypes as the input geneset, the enrichment significance of these DElncRNAs in the fold change of all genes was analyzed by GSEA. (Fig. 4A–F). It can be seen that most of the fold change in differentially expressed lncRNAs we screened were concentrated in areas with high expression multiples. We analyzed the intersection of lncRNAs between the six

Fig. 1. A: Kaplan–Meier (KM) curves of six subtypes of differential lncRNAs. The x-coordinate represents days, and the y-coordinate represents survival rate. The test method is log-rank; B: A hierarchical clustering of thermograms of the top 10 genes with the highest mutation in each subtype. The x-coordinate represents different subtypes, the y-coordinate represents genes, red represents high expression, and blue represents low expression. 3

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

Fig. 2. Pathway analysis of molecular subtypes. A: Pathways involved in 6 molecular subtypes, The x-coordinate represents the molecular subtype, the y-coordinate represents the pathway involved, and the darker the color, the more significant was the result. B: Comparison of 6 molecular subtypes with previous models. The x-coordinate represents the six molecular subtypes, and the y-coordinate represents the proportion.

subtypes using R package UpsetR, as shown in Fig. 4G, and found that differences between the six subtypes have less lncRNA intersection. Most DElncRNAs are only enriched in one molecular subtype. For example, 216 lncRNAs are all enriched in molecular subtype 4, 195 lncRNAs are all enriched in molecular subtype 5, and 18 lncRNAs are fully enriched in molecular subtype 3.

Table 2 Statistical data of the differences among subtypes. Type

C1

C2

C3

C4

C5

C6

PCG_Down PCG_Up PCG_All Lnc_Down Lnc_Up Lnc_All

261 77 338 162 40 202

296 150 446 94 37 131

118 32 150 29 17 46

456 21 477 309 26 335

381 167 548 173 110 283

196 29 225 82 46 128

3.5. Copy number profile of lncRNA in ovarian cancer Copy number change (CNA) is the most common type of gene modification, which is closely related to the occurrence and development of tumors. Genomic Identification of Significant Targets in Cancer (GISTIC) is

Fig. 3. A–E: Differential long non-coding RNA (lncRNA) volcano plot for six subtypes of differential lncRNAs, in which red represents an upward lncRNA and green represents a downward lncRNA. G: Venn diagram of differential lncRNAs related to disease. 4

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

Fig. 4. A–F: Gene set enrichment analysis (GSEA) of each subtype based on difference multiples as the rank. G: Intersection of the six subtypes of differential lncRNAs. The black dot in the figure indicates that the position has data, and the gray dot indicates no. The different points of the line indicate the existence of an intersection. The bar chart above represents the number of intersections. The bar graph on the left represents the total number of DElncRNAs of different molecular subtypes.

more than 5 % in each sample, and a correlation coefficient between lncRNA and CNV greater than 0.1. In this way, we obtained 28 lncRNAs (S_Table 2.). We further used univariate survival analysis to group samples into high and low expression groups according to the median level of lncRNA expression. We conducted a survival analysis of each of the 28 lncRNAs expressed (S_Table 3). The selected threshold was p < 0.05, and three lncRNAs with a significant prognosis were obtained eventually as shown in Table 3. From the table, it can be seen that LINC00861 has a higher CNV frequency of 32 %, and LOC101927151 has a higher correlation coefficient with CNV, which is 0.57. LOC101927151 and LEMD1-AS1 are differentially expressed in the worst prognostic C5 subtype. The correlation between these three lncRNA expressions and CNV is shown in Fig. 7A-C. It can be seen that with the increase of copy number, the expression of lncRNA showed an upward trend; According to the median expression of lncRNA, the samples were divided into high expression group (H) and low expression group (L), survival curve as shown by Fig. 7D-F. It can be seen that the lower the expression of LOC101927151, the worse the prognosis of ovarian cancer patients, and the higher the expression of LINC00861 and LEMD1-AS1, the worse the prognosis of the patients.

an algorithm used to identify regions of variation that are more likely to trigger cancer pathogenesis. It can visualize regions in the genome in order to show amplification and deletion in thousands of samples. We used GISTIC 2.0 software to analyze the copy number data of 564 ovarian cancers downloaded from TCGA. We first extracted the copy number spectrum from lncRNAs. Taking the number of copies greater than 1 as the threshold of copy amplification and less than −1 as the threshold of copy deletion, we calculated the ratio of copy amplification and deletion for each lncRNA and observed their distribution in the genome as shown in Fig. 5A. From this, we can see that the proportion of copy deletion is obviously less than that of copy amplification, and the ratio of amplification and deletion is the most on chromosome 8. We calculated the correlation distribution between the lncRNA expression profile and copy number, as shown in Fig. 5B. It can be seen that there is a positive correlation between copy number and lncRNA expression (p < 1e-16). We used a GISTIC algorithm to identify frequently changing areas in the ovarian cancer genome. Many areas were found with significant copy amplification and deletion of lncRNAs. The frequency of lncRNA copy deletion was more than that of copy amplification, as shown in Fig. 5C–D. This suggests that lncRNA copy deletion may be associated with the occurrence and development of ovarian cancer. To further investigate the relationship between lncRNA expression and CNV, we selected lncRNA with CNV change ratio of more than 30 %. A total of 32 copy number amplified lncRNAs were screened (S_Table 1), which means that the copy number deleted lncRNAs in the ovarian cancer genome was filtered. Further analysis was made of the expression differences in each lncRNA in copy number amplification and normal samples, as shown in the Fig. 6. It was found that the expression of 20 lncRNAs in copy number amplification samples was significantly higher than that in normal samples. This suggests that the change of lncRNA CNV is closely related to the expression of lncRNA.

3.7. External datasets validate the prognosis of three lncRNAs In order to observe and verify the relationship between the expression and prognosis of these three lncRNAs, we first compared the three lncRNAs to the chip of a GPL570 platform; the relationship between the three lncRNAs and probe is shown in Table 4. It can be seen that three lncRNAs were annotated to three probes, of which the ENSG00000245164 is not annotated to the probe. We annotated the probes in Kaplan–Meier (KM) plotter (http:// kmplot.com) online to analyze the relationship between these probes and the prognosis of ovarian cancer patients. The (KM) curve for overall survival is shown in Fig. 8A–C, from which we can see that the prognosis is consistent with our data. The KM curve for disease-free survival is shown in Fig. 8D–F. Significant prognostic differences were observed among the three probes, similar to the prognosis in overall survival.

3.6. LncRNA-CNV based prognostic biomarkers in OV patients To investigate whether the expression profile of lncRNA regulated by copy number deletion or amplification affects the survival status of patients with ovarian cancer, we analyzed copy numbers of lncRNAs that differed in each subtype by choosing lncRNAs with a CNV ratio of 5

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

Fig. 5. Copy number profile of lncRNA in ovarian cancer. A: Long non-coding RNA (lncRNA) copy loss and copy amplification proportion distribution in the genome; B: The correlation distribution between lncRNA expression and copy number variation (CNV). Gray represents the distribution under random conditions and pink represents the distribution under actual conditions. The difference between the two distributions was tested by t-test. C-D: The lncRNAs located in the focal CNA peaks are OV-related. False-discovery rates (q values) and scores from GISTIC 2.0 for alterations (x-axis) are plotted against genome positions (y-axis); dotted lines indicate the centromeres. The amplifications (C) and deletions (D) of lncRNA genes are also shown. The green line represents 0.25 q value cut-off point that determines significance.

3.8. Functional and pathway analysis of DElncRNAs

in Fig. 9A. The target genes of LEMD1-AS1 are mainly enriched in the mRNA processing, Cilium Assembly and other pathways (Fig. 9B). The target genes regulated by LOC101927151 are mainly enriched in SA PROGRAMMED CELL DEATH, Small CELL lung cancer and other signaling pathways (Fig. 9C).

We examined the correlation between the expression level of the LINC00861, LEMD1-AS1, LOC101927151 and each protein coding gene (PCGs) using two-sided Pearson correlation coefficients and the z-test. The PCGs positively correlated with these lncRNAs were considered as lncRNArelated PCGs. The top 200 PCGs were screened from high to low according to the Pearson correlation coefficient with P value < 0.05 (S _Table 4). Then we input these genes into Metascape database for functional and pathway enrichment analysis. It was found that the target genes regulated by Linc00861 is mainly enriched in tumor-related pathways such as lymphocyte activation, T cell activation, and adaptive immune response as shown

3.9. Pathway and protein-protein interaction enrichment analysis in LINC00861 To further capture the relationships between the terms of LINC00861 targeted genes, a subset of enriched terms have been selected and rendered as a network plot, where terms with a 6

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

Fig. 6. The difference in expression of 32 long non-coding RNA (lncRNAs) in samples with normal and amplified copies. The x-ordinate represents the expression level, the y-coordinate represents the copy number amplification and the normal group. Table 3 Significant prognostic differences in lncRNA with a copy number variation frequency of more than 5%. LncRNA

Symbol

p.value

Diff in subtype

CNV frequency

PCC

ENSG00000245164 ENSG00000267575 ENSG00000226235

LINC00861 LOC101927151 LEMD1-AS1

0.012960442 0.015641301 0.01955133

C2 C5 C5

32% 12% 5%

0.12 0.57 0.18

Fig. 7. A-C: Correlation between expression of three lncRNAs and copy number; B: Kaplan–Meier (KM) curve for the relationship between the expression of three lncRNAs and prognosis. The abscissa represents the copy number change; the ordinate represents the expression of lncRNA. D-F: Kaplan–Meier (KM) curve for the relationship between the expression of three lncRNAs and prognosis. The abscissa represents the time to live (day) and the ordinate represents the survival rate. The red line represents the high expression group and the blue represents the low expression group. 7

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

3.10. Data analysis flow chart

Table 4 Three lncRNA corresponding to GPL570 probes. LncRNA ENSG ID

GPL570 Probes

ENSG00000245164 ENSG00000267575 ENSG00000226235 ENSG00000226235

None 1569478_s_at 1561280_at 1561281_a_at

To make our study better understand. The workflow of the proposed method was developed as shown in Fig. 11. 4. Discussion In this study, we identified six molecular subtypes related to the prognosis of ovarian cancer patients based on multi-omic data including PCGs, CNV and methylation, among which subtype 5 had the worst prognosis. The mutations of MGA, WDFY3, ARID4A and FRAS1 in subtype 5 were significantly higher than those in other subtypes, indicating that these mutated genes could be used as unique prognostic markers in this molecular subtype. Pathway enrichment scores in each subtype shown that patients of subtype 1 is related to abnormal pathways such as NON_SMALL_CELL_LUNG_CANCER、LYSINE_ DEGRADATION、RIBOSOME、SMALL_CELL_LUNG_CANCER, patients of subtype 2 is related to abnormal immune pathways such as PRIMARY_IMMUNODEFICIENCY、B_CELL_RECEPTOR_SIGNALING_ PATHWAY、T_CELL_RECEPTOR_SIGNALING_PATHWAY, patients of subtype 3 is related to PHOSPHATIDYLINOSITOL_SIGNALING_SYSTEM、GNRH_SIGNALING_PATHWAY, patients of subtype 4 is related to abnormal pathways such as CELL CYCLE、PRIMARY_BILE_ ACID_BIOSYNTHESIS and patients of subtype 6 is related to abnormal pathway such as TAURINE_AND_HYPOTAURINE_METABOLISM pathway. Then 889 DElncRNAs were screened from six molecular subtypes, suggesting that these identified DElncRNAs are significantly enriched in specific molecular subtypes. Their expression may play a key role in the development of ovarian cancer and cause different clinical outcomes.

similarity > 0.3 are connected by edges. We select the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using Cytoscape, where each node represents an enriched term and is colored first by its p-value (Fig. 10A) and then by cluster ID (Fig. 10B). For each given gene list, proteinprotein interaction enrichment analysis has been carried out with the following databases: BioGrid, InWeb_IM, OmniPath. The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. The MCODE networks identified for individual gene lists have been gathered and are shown in Fig. 10C-D. Pathway and process enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components, shown in the tables underneath corresponding network plots within Fig. 10E. We can see that target genes in MCODE1 mainly enriched in Chemokine receptors bind chemokines、Class A/1 (Rhodopsin-like receptors)、G alpha (i) signaling events and target genes in MCODE2 mainly enriched in PID CD8 TCR PATHWAY、T cell receptor signaling pathway, TCR signaling.

Fig. 8. Survival analysis of the three lncRNAs. A–C: Kaplan–Meier (KM) curves of the overall survival of ovarian cancer with three probes in the KM plot database; D–F: KM curves of disease-free survival of ovarian cancer with three probes in the KMplot database. The abscissa represents the time of life and the ordinate represents the survival rate. The red line represents the high expression group and the black represents the low expression group. 8

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

Fig. 9. Analysis of Pathway and biological function of LncRNA Target Genes. A: Functional Analysis of LINC00861 Target Genes. B: Functional Analysis of LEMD1-AS1 Target Genes. C: Functional Analysis of LOC101927151 Target Genes. The darker the color, the more significant the enrichment.

selected from each sample. Among them, the expression of 20 lncRNA in the copy amplified samples was significantly higher than that in the samples with normal copies, which indicated that the change of lncRNA copy number was closely related to the expression of lncRNA. These abnormal copy number lncRNA may lead to dysregulation of transcription and affect the development of ovarian cancer. To investigate whether the expression profile of lncRNA regulated by copy number deletion or amplification affects the survival status of patients with ovarian cancer. 28 lncRNAs were selected according to CNV ratio and CNV correlation, univariate survival analysis was used and three lncRNAs with a significant prognosis were obtained including LINC00861, LEMD1-AS1 and LOC101927151. It can be seen that the high expression of LOC101927151 was related to a poor prognosis in ovarian cancer. In comparison, the low expression of LINC00861 and LEMD1-AS1 was related to a poor prognosis in ovarian cancer. Studies have shown that LINC00861 is involved in the occurrence and development of many kinds of cancers. The differential expression of LINC00861 between early and non-early recurrence groups in patients

CNVs can also cause genomic and molecular phenotype heterogeneity, leading to the occurrence and development of complex diseases, including cancer [14].Studies have shown that long non-coding RNAs identified from CNV show tissue-specific expression, and prognostic models based on lncRNA CNV, and targeting proteins can provide better predictions for cancer patients [25]. However, the study on the changes of lncRNA CNV and ovarian cancer has not been reported. Therefore, we downloaded 564 cases of lncRNA copy number spectrum in ovarian cancer from the TCGA database, respectively counted the proportion of each lncRNA with copy amplification and copy deletion, and observed its distribution in the genome. It can be seen that the copy number is positively correlated with the expression of lncRNA as a whole. GISTIC algorithm was used to identify the frequently changing regions in the ovarian cancer genome, and it was found that the frequent copy deletion of lncRNA was significantly more than the copy amplification region, suggesting that lncRNA copy deletion may be related to the occurrence and development of ovarian cancer. A total of 32 lncRNAs with copy number amplification were 9

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

Fig. 10. Network of enriched terms. A: colored by p-value, where terms containing more genes tend to have a more significant p-value. B: colored by cluster ID, where nodes that share the same cluster ID are typically close to each other; C: Protein-protein interaction network and MCODE1 components identified in the gene lists. D: Protein-protein interaction network and MCODE2 components identified in the gene lists. E: Pathway for gene enrichment in the network (top three)

with hepatocellular carcinoma can be used as a biomarker to predict the early recurrence of hepatocellular carcinoma after curable resection [26]. In addition, microRNA–mRNA–lncRNA networks in patients with chronic obstructive pulmonary disease, with or without a history of smoking, also showed that miR-218-5p/miR15a-RORALOC101928100/LINC00861 interactions may be associated with nonsmoking chronic obstructive pulmonary disease [27]. LINC00861 is also involved in the major biological processes of breast cancer; a

possible interaction between LINC00861 and clinical biomarkers (receptor tyrosine protein kinase ERBB-2, estrogen receptor and progesterone receptor) was found [28]. No direct association between LOC101927151, LINC00861 and LEMD1-AS1, and ovarian cancer was found in the current study. We found that the enrichment pathways of target genes regulated by LINC00861 are mainly in tumor related pathways such as lymphocyte activation、T cell activation、adaptive immune response. We

Fig. 11. The workflow of the proposed method.

10

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

Data availability

hypothesized that LINC00861 regulates the immune response and T cell activation of ovarian cells through the above pathways, and participates in the immune-related functions of ovarian cancer, which opens up a new direction in the study of ovarian cancer. At present, studies on the involvement of lncRNA in the regulation of immune function have been reported. Xu M [29] found that lncRNA SATB2-AS1 inhibits tumor metastasis and affects the tumor immune cell microenvironment in colorectal cancer by regulating SATB2. Metascape database was used to analyze the protein interaction network regulated by target genes of linc00861.It was found that target genes in MCODE1 mainly enriched in Chemokine receptors bind chemokines、Class A/1 (Rhodopsin-like receptors)、G alpha (i) signaling events and target genes in MCODE2 mainly enriched in PID CD8 TCR PATHWAY、T cell receptor signaling pathway、TCR signaling pathway. The target genes regulated by LOC101927151 are mainly enriched in SA PROGRAMMED CELL DEATH, Small CELL lung cancer and other signaling pathways. So far, no reports on LOC101927151 and LEMD1-AS1 have been reported in tumors.

The following information regarding data availability was supplied: GitHub: https://github.com/mingjunzheng/Analysis-of-lncRNAand-CNV Acknowledgement The authors sincerely thank BioMed Proofreading LLC for guidance in the English copyediting of our paper. Appendix A. Supplementary data Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.biopha.2019.109810. References [1] A. Jemal, R. Siegel, E. Ward, T. Murray, J. Xu, C. Smigal, Cancer statistics, 2006, CA Cancer J. Clin. 56 (2006) 106–130. [2] T.J. Herzog, Recurrent ovarian cancer: how important is it to treat to disease progression? Clin. Cancer Res. 10 (2004) 7439–7449. [3] P.E. Colombo, M. Fabbro, C. Theillet, F. Bibeau, P. Rouanet, I. Ray-Coquard, Sensitivity and resistance to treatment in the primary management of epithelial ovarian cancer, Crit. Rev. Oncol. Hematol. 89 (2) (2014) 207–216. [4] K. Kim, I. Jutooru, G. Chadalapaka, G. Johnson, J. Frank, R. Burghardt, S. Kim, S. Safe, HOTAIR is a negative prognostic factor and exhibits pro-oncogenic activity in pancreatic cancer, Oncogene 32 (13) (2012) 1616–1625. [5] G. Wu, J. Cai, Y. Han, J. Chen, Z.P. Huang, C. Chen, Y. Cai, H. Huang, Y. Yang, Y. Liu, Z. Xu, D. He, X. Zhang, X. Hu, L. Pinello, D. Zhong, F. He, G.C. Yuan, D.Z. Wang, C. Zeng, LincRNA-p21 regulates neointima formation, vascular smooth muscle cell proliferation, apoptosis, and atherosclerosis by enhancing p53 activity, Circulation 130 (17) (2014) 1452–1465. [6] R.A. Gupta, N. Shah, K.C. Wang, J. Kim, H.M. Horlings, D.J. Wong, M.C. Tsai, T. Hung, P. Argani, J.L. Rinn, Y. Wang, P. Brzoska, B. Kong, R. Li, R.B. West, M.J. van de Vijver, S. Sukumar, H.Y. Chang, Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis, Nature 464 (7291) (2010) 1071–1076. [7] P.W. Lu, L. Li, F. Wang, Y.T. Gu, Inhibitory role of large intergenic noncoding RNAROR on tamoxifen resistance in the endocrine therapy of breast cancer by regulating the PI3K/Akt/mTOR signaling pathway, J. Cell. Physiol. 234 (2) (2018) 1904–1912. [8] Z. Xue, Z. Zhang, H. Liu, W. Li, X. Guo, Z. Zhang, Y. Liu, L. Jia, Y. Li, Y. Ren, H. Yang, L. Zhang, Q. Zhang, Y. Da, J. Hao, Z. Yao, Zhang R lincRNA-Cox2 regulates NLRP3 inflammasome and autophagy mediated neuroinflammation, Cell Death Differ. 26 (1) (2018) 130–145. [9] P. Martini, L. Paracchini, G. Caratti, M. Mello-Grand, R. Fruscio, L. Beltrame, E. Calura, G. Sales, A. Ravaggi, E. Bignotti, F.E. Odicino, E. Sartori, P. Perego, D. Katsaros, I. Craparotta, G. Chiorino, S. Cagnin, L. Mannarino, L. Ceppi, C. Mangioni, C. Ghimenti, M. D’Incalci, S. Marchini, C. Romualdi, lncRNAs as novel indicators of patients’ prognosis in stage I epithelial ovarian Cancer: a retrospective and multicentric study, Clin. Cancer Res. 23 (9) (2017) 2356–2366. [10] D.D. Wu, X. Chen, K.X. Sun, L.L. Wang, S. Chen, Y. Zhao, Role of the lncRNA ABHD11-AS1 in the tumorigenesis and progression of epithelial ovarian cancer through targeted regulation of RhoC, Mol. Cancer 16 (1) (2017) 138. [11] R. Mitra, X. Chen, E.J. Greenawalt, U. Maulik, W. Jiang, Z. Zhao, C.M. Eischen, Decoding critical long non-coding RNA in ovarian cancer epithelial-to-mesenchymal transition, Nat. Commun. 8 (1) (2017) 1604. [12] H. Liang, T. Yu, Y. Han, H. Jiang, C. Wang, T. You, X. Zhao, H. Shan, R. Yang, L. Yang, H. Shan, Y. Gu, LncRNA PTAR promotes EMT and invasion-metastasis in serous ovarian cancer by competitively binding miR-101-3p to regulate ZEB1 expression, Mol. Cancer 17 (1) (2018) 119. [13] B. Trost, S. Walker, Z. Wang, B. Thiruvahindrapuram, J.R. MacDonald, W.W.L. Sung, S.L. Pereira, J. Whitney, A.J.S. Chan, G. Pellecchia, M.S. Reuter, S. Lok, R.K.C. Yuen, C.R. Marshall, D. Merico, S.W. Scherer, A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data, Am. J. Hum. Genet. 102 (1) (2018) 142–155. [14] M. Qiu, W. Xia, R. Chen, S. Wang, Y. Xu, Z. Ma, W. Xu, E. Zhang, J. Wang, T. Fang, J. Hu, G. Dong, R. Yin, J. Wang, L. Xu, The circular RNA circPRKCI promotes tumor growth in lung adenocarcinoma, Cancer Res. 78 (11) (2018) 2839–2851. [15] B.M. Reid, J.B. Permuth, Y. Ann Chen, B.L. Fridley, E.S. Iversen, Z. Chen, H. Jim, R.A. Vierkant, J.M. Cunningham, J.S. Barnholtz-Sloan, S. Narod, H. Risch, J.M. Schildkraut, E.L. Goode, A.N. Monteiro, T.A. Sellers, Genome-wide analysis of common copy number variation and epithelial ovarian cancer risk. Cancer epidemiology, Biomarkers 7 (2019). [16] W. Ahmed, M.F.A. Malik, M. Saeed, F. Haq, Copy number profiling of Oncotype DX genes reveals association with survival of breast cancer patients, Mol. Biol. Rep. 45 (6) (2018) 2185–2192. [17] S. Rosenberg, F. Ducray, A. Alentorn, C. Dehais, N. Elarouci, A. Kamoun, Y. Marie, M.L. Tanguy, A. De Reynies, K. Mokhtari, D. Figarella-Branger, J.Y. Delattre, Idbaih

5. Conclusion With the development of internet and big data era coming, constructing databases [30–32] and establishing powerful webserver [33,34] will provide the convenience to most scholar. Through the genome sequencing information in the TCGA database, we can deeply explore the mechanism of tumor development and development. At present, studies on lncRNA CNV changes and ovarian cancer have not been reported, this is the first study combined TCGA-OV lncRNA expression profile analysis with CNVs correlation analysis. LncRNA with transcriptional dysregulation caused by CNV abnormality were found, which were LOC101927151, LINC00861 and LEMD1-AS1, respectively. The effect of CNVs on lncRNA expression is likely to be one of the pathogenesis of ovarian cancer and other diseases. Our results provide new insights into the diagnosis, treatment, prognosis of ovarian cancer. However, current large-scale multi-group and clinical data on ovarian cancer are still inadequate. With the emergence of more largescale multi-group and clinical data, the prediction accuracy and reliability of this method will be further improved. And in the present study, we mainly focused on finding transcriptional dysregulation lncRNA caused by CNV abnormality, screening for CNV-associated lncRNA prognostic biomarkers in ovarian cancer. We think that existing analysis may not be optimal, but should be sufficient to draw a conclusion that copy number variations primed lncRNAs deregulation contribute to poor prognosis in ovarian cancer. This part of the functional experiments of these lncRNAs will be the in-depth work in the follow-up study. Authorship Mingjun Zheng and Yuexin Hu made substantial contributions to the conception and design, acquisition, analysis and interpretation of data. Bei Lin was involved in drafting the manuscript and revising it critically for important intellectual content. Mingjun Zheng, Yuexin Hu, Rui Gou, Xin Nie, Xiao Li, Juanjuan Liu, and Bei Lin gave final approval of the version to be published. Funding This study was supported by grants from the Shengjing Freedom Researchers’ Plan (201804). The funding body had no role in the design or conduct of the study. Declaration of Competing Interest The authors declare no conflict of interest. 11

Biomedicine & Pharmacotherapy 124 (2020) 109810

M. Zheng, et al.

[18] [19] [20] [21]

[22] [23] [24] [25] [26]

A Machine Learning for Better Prognostic Stratification and Driver Gene Identification Using Somatic Copy Number Variations in Anaplastic Oligodendroglioma, Oncologist 23 (12) (2018) 1500–1510. C.H. Mermel, S.E. Schumacher, B. Hill, M.L. Meyerson, R. Beroukhim, G. Getz, GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers, Genome Biol. 12 (4) (2011) R41. Y. Zhou, B. Zhou, L. Pache, M. Chang, A.H. Khodabakhshi, O. Tanaseichuk, C. Benner, S.K. Chanda, Metascape provides a biologist-oriented resource for the analysis of systems-level datasets, Nat. Commun. 10 (1) (2019) 1523. C. Stark, B.J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, M. Tyers, BioGRID: a general repository for interaction datasets, Nucleic Acids Res. 1 (34) (2006) D535–9. T. Li, R. Wernersson, R.B. Hansen, H. Horn, J. Mercer, G. Slodkowicz, C.T. Workman, O. Rigina, K. Rapacki, H.H. Stærfeldt, S. Brunak, T.S. Jensen, K. Lage, A scored human protein-protein interaction network to catalyze genomic interpretation, Nat. Methods 14 (1) (2017) 61–64. D. Türei, T. Korcsmáros, J. Saez-Rodriguez, OmniPath: guidelines and gateway for literature-curated signaling pathway resources, Nat. Methods 13 (12) (2016) 966–967. A. Nagy, A. Lánczky, O. Menyhárt, B. Győrffy, Validation of miRNA prognostic power in hepatocellular carcinoma using expression data of independent datasets, Sci. Rep. 8 (1) (2018) 9227. Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma, Nature 474 (7353) (2011) 609–615. T.R. Sarkar, A.K. Maity, Y. Niu, K. Bani, Mallick multiple omics data integration to identify long noncoding RNA responsible for breast cancer–related, Mortality Cancer Informatics 8 (24) (2019). Y. Lv, W. Wei, Z. Huang, Z. Chen, Y. Fang, L. Pan, X. Han, Z. Xu, A long non-coding

[27]

[28] [29]

[30] [31] [32] [33] [34]

12

RNA expression profile can predict early recurrence in hepatocellular carcinoma after curative resection, Hepatol. Res. 48 (13) (2018) 1140–1148. Y. Qiao, Z. Mao, Y.J. Shi, Z.G. Liu, Q. Cao, Q. Zhang, Comprehensive Analysis of miRNA-mRNA-lncRNA Networks in Non-Smoking and Smoking Patients with Chronic Obstructive Pulmonary Disease, Cell. Physiol. Biochem. 50 (3) (2018) 1140–1153. Y. Zhang, Y. Li, Q. Wang, X. Zhang, D. Wang, H.C. Tang, X. Meng, X. Ding, Identification of an lncRNA-miRNA-mRNA interaction mechanism in breast cancer based on bioinformatic analysis, Mol. Med. Rep. (2017). M. Xu, X. Xu, B. Pan, X. Chen, K. Lin, K. Zeng, X. Liu, T. Xu, L. Sun, J. Qin, B. He, Y. Pan, H. Sun, S. Wang, LncRNA SATB2-AS1 inhibits tumor metastasis and affects the tumor immune cell microenvironment in colorectal cancer by regulating SATB2, Mol. Cancer 18 (1) (2019) 135. T. Cui, L. Zhang, Y. Huang, Y. Yi, P. Tan, Y. Zhao, Y. Hu, L. Xu, E. Li, D. Wang, MNDR v2.0: an updated resource of ncRNA-disease associations in mammals, Nucleic Acids Res. 4 (46) (2018) D371–D374. Z.Y. Liang, H.Y. Lai, H. Yang, C.J. Zhang, H. Yang, H.H. Wei, X.X. Chen, Y.W. Zhao, Z.D. Su, W.C. Li, E.Z. Deng, H. Tang, W. Chen, H. Lin, Pro54DB: a database for experimentally verified sigma-54 promoters, Bioinformatics 33 (3) (2017) 467–469. B. Hu, L. Zheng, C. Long, M. Song, T. Li, L. Yang, Y. Zuo, EmExplorer: a database for exploring time activation of gene expression in mammalian embryos, Open Biol. 9 (6) (2019) 190054. H. Yang, H. Lv, H. Ding, W. Chen, H. Lin, iRNA-2OM: a sequence-based predictor for identifying 2’-O-Methylation sites in Homo sapiens, J. Comput. Biol. 25 (11) (2018) 1266–1277. Y. Zuo, Y. Li, Y. Chen, G. Li, Z. Yan, L. Yang, PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition, Bioinformatics 33 (1) (2017) 122–124.