Identification and validation of immune-related lncRNA prognostic signature for breast cancer

Journal Pre-proof

Yong Shen, Xiaowei Peng, Chuanlu Shen

PII: S0888-7543(20)30065-3
DOI: https://doi.org/10.1016/j.ygeno.2020.02.015
Reference: YGENO 9479
To appear in: Genomics
Received date: 22 January 2020
Revised date: 10 February 2020
Accepted date: 18 February 2020

Please cite this article as: Y. Shen, X. Peng and C. Shen, Identification and validation of immune-related lncRNA prognostic signature for breast cancer, Genomics (2020), https://doi.org/10.1016/j.ygeno.2020.02.015

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier.

Identification and Validation of Immune-related lncRNA Prognostic Signature for Breast Cancer

Yong Shen a,#, Xiaowei Peng a,#, Chuanlu Shen a,*

a Department of Pathology and Pathophysiology, School of Medicine, Southeast University, 210009, Nanjing, China

Keywords: TCGA, LncRNA, Breast Cancer, Risk score, Immune Prognostic Model

* Corresponding author. Email address: [email protected] (C Shen) and [email protected] (Y Shen)
# Yong Shen and Xiaowei Peng contributed equally to this work.

ABSTRACT

The prognosis of patients with breast cancer is closely related to both the infiltration of immune cells and the expression of lncRNAs. In this study, we evaluated the infiltration of immune cells in 1109 breast cancer samples obtained from TCGA by applying ssGSEA to the transcriptome of these samples, thereby generating a high immune cell infiltration group and a low immune cell infiltration group. On the basis of these groupings, we found 696 differentially expressed lncRNAs, which were sequentially subjected to univariate Cox regression and stepwise multiple Cox regression analysis; 11 lncRNAs were identified as a prognostic signature for breast cancer. Kaplan-Meier analysis, univariate Cox regression, multivariate Cox regression, and ROC analyses further revealed that this 11-lncRNA signature was a novel and important prognostic factor independent of multiple clinicopathological parameters. The TIMER database showed that this 11-lncRNA prognostic signature for breast cancer was associated with the infiltration of immune cell subtypes.

1. Introduction

Breast cancer is one of the most common malignancies in women worldwide[1]. Its morbidity rate is increasing year by year, and its mortality rate ranks second among female malignant tumors[2]. Fortunately, owing to improvements in diagnosis and treatment in recent years, the mortality rate of breast cancer has been greatly reduced[3]. Breast cancer is a highly heterogeneous tumor, and its etiology and pathological manifestations vary from person to person[4]. However, the prognosis of patients with breast cancer is mostly related to immunity[5]. A large number of inflammatory cells infiltrate breast cancer, not only around the tumor but also in the tumor matrix[6]. Some studies have shown that the density of CD8+ T cells (cytotoxic T cells) is highly correlated with immune escape in breast cancer, and the infiltration of CD8+ T and CD4+ T cells is also significantly related to the prognosis of breast cancer[7]. Macrophages are another important component of breast cancer tumor-infiltrating immune cells, reaching about 50%[8]. Cleaning up the wreckage of cells and conducting antigenic reactions are their main functions[8]. Antigen-presenting cells (APC) and dendritic cells (DC) also play an important role in antigen presentation and cytotoxicity to tumor antigens[9, 10]. Therefore, in order to improve the prognosis of breast cancer and to provide reliable information to guide the correct individual treatment strategies, we urgently need to screen reliable immune predictors and prognostic indicators.

Long non-coding RNA (lncRNA) is a class of RNA molecules with transcripts longer than 200 nt[11]. They do not encode proteins but regulate gene expression at various levels (epigenetic, transcriptional or post-transcriptional regulation, etc.) in the form of RNA[11]. As a new type of gene regulator, lncRNA is associated with the development, progression, and prognosis of human diseases, especially cancer. The abnormal expression of some lncRNAs may be related to excessive cell growth, repressed apoptosis, invasion, metastasis, epithelial-mesenchymal transition (EMT) and poor prognosis of breast cancer[12]. For example, lncRNA-Hh strengthens cancer stem cell generation in Twist-positive breast cancer via activation of the hedgehog signaling pathway[13]. LncRNA-HOXA11-AS has been found to be overexpressed in breast cancer, contributing to the invasion and metastasis of breast cancer cells[14]. LncRNAs regulating the immune microenvironment of human breast cancer have become a research hot spot. Some studies showed that a large number of different types of immune cells infiltrate breast cancer, not only in the cancer nest but also in the tumor matrix, and that the prognosis of breast cancer is closely related to the type and number of immune cells infiltrating around the neoplasm[15, 16].

Therefore, the establishment of tools to accurately predict the prognosis of breast cancer patients is very important to guide clinical diagnosis and treatment. Because abnormal phenotypes are closely related to the poor prognosis of breast cancer, it is reasonable to identify lncRNAs related to breast cancer phenotype to predict breast cancer prognosis[17]. In the present study, we analyzed the data set of lncRNA expression in The Cancer Genome Atlas (TCGA) and screened the lncRNAs related to tumor phenotype by single-sample gene set enrichment analysis (ssGSEA), ESTIMATE, Cox and other analysis methods. We demonstrated that 11 survival-related and grade-related lncRNAs were closely related to the prognosis of breast cancer.

2. Materials and methods

2.1 Collection and grouping of breast cancer data

FPKM (fragments per kilobase of transcript per million mapped reads) transcriptome data, lncRNA count data, and the corresponding clinical data for breast cancer were downloaded from the TCGA program (https://portal.gdc.cancer.gov). The grouping of the TCGA breast cancer transcriptome data was realized by ssGSEA. We obtained a set of marker genes for immune cell types from Bindea et al. Using 29 immune data sets, including immune cell types, immune-related pathways, and immune-related functions, we applied the ssGSEA method of the R package Gene Set Variation Analysis (GSVA) to analyze the infiltration level of different immune cells and the activity of immune-related pathways and immune-related functions in the breast cancer expression profile data. The ssGSEA applied the gene signatures expressed by immune cell populations to individual cancer samples. According to the results of ssGSEA, the TCGA breast cancer samples were classified into a high immune cell infiltration group and a low immune cell infiltration group by using the R function "hclust".
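The idea behind a single-sample enrichment score can be illustrated with a minimal Python sketch. This is not the GSVA implementation used in the study: it is a simplified rank-based score, the function name `enrichment_score`, the weight exponent `alpha = 0.25`, and all gene names and values are assumptions made for illustration, and the normalization applied by GSVA is omitted.

```python
def enrichment_score(expression, gene_set, alpha=0.25):
    """Simplified ssGSEA-style score for a single sample.

    expression: dict of gene name -> expression value for one sample.
    gene_set:   iterable of marker gene names for one immune term.
    alpha:      exponent applied to the rank weights (assumed value).
    """
    # Order genes from highest to lowest expression.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    members = set(gene_set) & set(ordered)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the sample without covering it")

    # A gene's weight is its rank (top gene has rank n) raised to alpha.
    weights = {gene: (n - i) ** alpha for i, gene in enumerate(ordered)}
    total_in = sum(weights[gene] for gene in members)
    n_out = n - len(members)

    cum_in = cum_out = score = 0.0
    for gene in ordered:
        if gene in members:
            cum_in += weights[gene] / total_in
        else:
            cum_out += 1.0 / n_out
        # Accumulate the gap between the two running distributions.
        score += cum_in - cum_out
    return score


# Hypothetical toy samples: gene names and values are made up.
marker_set = ["immuneA", "immuneB"]
hot_sample = {"immuneA": 9.0, "immuneB": 8.0, "geneC": 5.0, "geneD": 4.0, "geneE": 1.0}
cold_sample = {"immuneA": 1.0, "immuneB": 2.0, "geneC": 9.0, "geneD": 8.0, "geneE": 5.0}

print(enrichment_score(hot_sample, marker_set))   # positive: markers rank at the top
print(enrichment_score(cold_sample, marker_set))  # negative: markers rank at the bottom
```

Computing such a score for every sample and every immune term yields a matrix of scores, which is the kind of input a hierarchical clustering step would then separate into high and low infiltration groups.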

2.2 Verification of the effectiveness of immune grouping

The analysis of differentially expressed genes (DEG) in the expression profile data was carried out by using the ESTIMATE algorithm. The Stromal Score, Immune Score, ESTIMATE Score, and Tumor Purity were also calculated with the ESTIMATE algorithm from the transcriptome expression profiles of breast cancer to verify the effect of the ssGSEA grouping, and were used to draw a clustering heat map and statistical plots. The gene expression levels of human leukocyte antigen (HLA) genes and CD274 (PD-L1) were used to verify the differences between the two groups. The CIBERSORT deconvolution algorithm was used to estimate the composition of immune cells in bulk tumor samples containing mixed cell types, and the differences between the two groups were verified again.

2.3 Identification of immune-related lncRNAs in breast cancer

According to the above-mentioned groups, the TCGA lncRNA count expression profiles were divided into a high immune cell infiltration group and a low immune cell infiltration group. Differentially expressed lncRNAs were identified with the edgeR package according to the criteria of |log2FC|>1 and p<0.05. The same criteria were applied to a differential analysis between the cancer group and the paracancerous group to screen lncRNAs that affect tumorigenesis. Venn analysis was then used to identify the immune-related lncRNAs shared by the two analyses.
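The filtering and Venn steps amount to applying the same two thresholds to each comparison and intersecting the resulting sets. The following Python sketch shows that logic only; it does not reproduce the edgeR statistics, and the function name `significant` and all lncRNA names and numbers are hypothetical.

```python
def significant(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return names passing |log2FC| > lfc_cutoff and p < p_cutoff.

    results: dict mapping lncRNA name -> (log2 fold change, p-value).
    """
    return {
        name
        for name, (log2fc, p) in results.items()
        if abs(log2fc) > lfc_cutoff and p < p_cutoff
    }


# Hypothetical toy results; names and numbers are made up.
tumor_vs_paracancerous = {
    "lncA": (2.1, 0.001),
    "lncB": (-1.4, 0.02),
    "lncC": (0.3, 0.0001),   # fails the fold-change cutoff
    "lncD": (1.8, 0.20),     # fails the p-value cutoff
}
high_vs_low_infiltration = {
    "lncA": (-1.2, 0.01),
    "lncB": (0.5, 0.03),     # fails the fold-change cutoff
    "lncC": (1.5, 0.001),
    "lncE": (3.0, 0.004),
}

# Two-way Venn overlap: lncRNAs significant in both comparisons.
immune_related = significant(tumor_vs_paracancerous) & significant(high_vs_low_infiltration)
print(sorted(immune_related))
```

In this toy example only `lncA` passes both thresholds in both comparisons, so it is the only member of the overlap.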

2.4 Identification of immune-related lncRNA prognostic signature for breast cancer

According to the clinical data of breast cancer cases in the TCGA, univariate Cox proportional hazard regression (PHR) analysis was used to screen survival-related lncRNAs from the immune-related lncRNAs, with p<0.001 as the criterion. Then multivariate Cox PHR analysis was used to construct a prognostic signature and the risk score was calculated. Kaplan-Meier survival analysis was performed to compare the survival difference between the two groups. LASSO Cox analysis identified the lncRNAs most correlated with overall survival, and 10-fold cross-validation was performed to prevent overfitting. The risk score for each patient was then calculated based on the expression levels of the lncRNAs. According to the median risk score, breast cancer patients were divided into a high-risk group and a low-risk group. The risk score was calculated using the following formula[18]:

$$\text{Risk score} = \sum_{i=1}^{n} \text{coef}_i \times \text{id}_i$$

Here n is the number of lncRNAs in the signature, coef_i is the regression coefficient of the i-th lncRNA, and id_i is its expression level, as in the explicit formula given in Section 3.3.

Univariate and multivariate Cox regression analyses were used to evaluate the prognostic relationship between risk score and age, sex, grade, clinical stage and T stage (N stage and M stage had a large number of uncertain values and were not included in the study).
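The median split and the Kaplan-Meier comparison can be sketched in a few lines of Python. This is an illustrative product-limit estimator, not the R survival code used in the study; the patient tuples are invented, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not state a tie rule.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time of each patient.
    events: 1 if death was observed at that time, 0 if censored.
    Returns [(time, survival probability)] at each observed death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical patients: (risk score, follow-up time in years, event flag).
patients = [
    (0.2, 4, 0), (0.5, 6, 1), (0.9, 8, 0), (1.4, 9, 0),
    (1.8, 2, 1), (2.2, 3, 1), (2.9, 5, 0), (3.5, 7, 1),
]

# Median split of the risk score (ties go to the low-risk group by assumption).
cutoff = median(score for score, _, _ in patients)
high = [(t, e) for score, t, e in patients if score > cutoff]
low = [(t, e) for score, t, e in patients if score <= cutoff]

km_high = kaplan_meier([t for t, _ in high], [e for _, e in high])
km_low = kaplan_meier([t for t, _ in low], [e for _, e in low])
print("high-risk:", km_high)
print("low-risk:", km_low)
```

A formal comparison of the two curves would additionally require a log-rank test, which is not shown here.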

2.5 Correlation analysis of immune cell infiltration

Immune infiltration data for B cells, CD4+ T cells, CD8+ T cells, dendritic cells, macrophages, and neutrophils were downloaded from the Tumor Immune Estimation Resource (TIMER) database (https://cistrome.shinyapps.io/timer/). The correlation between risk scores and immune infiltration was calculated by Pearson correlation.
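For reference, the Pearson coefficient between a risk score and an infiltration estimate can be computed as in the following Python sketch. The function name `pearson_r` and the toy vectors are illustrative assumptions and are not TIMER data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: risk scores and an infiltration estimate per sample.
risk_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
infiltration = [0.9, 0.7, 0.6, 0.4, 0.2]

# A negative coefficient means infiltration falls as the risk score rises.
print(pearson_r(risk_scores, infiltration))
```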

2.6 Statistical Analysis

All statistical analyses were performed with R version 3.6.1 (Institute for Statistics and Mathematics, Vienna, Austria; https://www.r-project.org) (packages: impute, UpSetR, ggplot2, rms, glmnet, preprocessCore, forestplot, survminer, survivalROC, beeswarm)[18]. For descriptive statistics, mean ± standard deviation was used for continuous variables with a normal distribution, while the median (range) was used for continuous variables with a non-normal distribution. Categorical variables were described by counts and percentages. A two-tailed p<0.05 was regarded as statistically significant[18].

3. Results

3.1 Construction and verification of breast cancer groupings

We obtained 1109 breast cancer samples and 113 paracancerous samples from the TCGA. The ssGSEA method was applied to the transcriptome of breast cancer samples to evaluate the infiltration of immune cells. Twenty-four immune-related terms were included to estimate the abundance of multiple immune cell types in breast cancer. By using an unsupervised hierarchical clustering algorithm, breast cancer samples were divided into two groups according to immune infiltration: the high immune cell infiltration group (n = 943) and the low immune cell infiltration group (n = 166) (Figure 1a). To verify the feasibility of this grouping strategy, the ESTIMATE algorithm was used to calculate Tumor Purity, ESTIMATE Score, Immune Score, and Stromal Score from the expression profile of breast cancer. Compared with the low immune cell infiltration group, the high immune cell infiltration group had lower Tumor Purity but higher ESTIMATE Score, Immune Score and Stromal Score (Figure 1a). The box plot also showed that the high immune cell infiltration group (Immunity-H) was significantly and positively associated with ESTIMATE Score, Immune Score and Stromal Score, while the low immune cell infiltration group (Immunity-L) was positively associated with Tumor Purity (Figure 1b). Compared with the low immune cell infiltration group, the high immune cell infiltration group had higher immune components and lower tumor purity (p<0.05). Also, the expression of HLA family genes and CD274 (PD-L1) in the high immune cell infiltration group was significantly higher than that in the low immune cell infiltration group (p<0.01) (Figure 1c and 1d). In addition, we used the CIBERSORT method to verify the above groups and found that the high immune cell infiltration group contained larger amounts of various kinds of immune cells (Figure 1e). In aggregate, these results indicate that this breast cancer grouping can be used for follow-up analysis.

3.2 Analysis of differentially expressed lncRNAs between tumor group and paracancerous group and between high immune cell infiltration group and low immune cell infiltration group

According to the criteria of |log2FC|>1 and FDR<0.05, we compared the breast cancer group (1109 cases) with the paracancerous group (113 cases). We found 2999 differentially expressed lncRNAs, of which 2208 were up-regulated and 791 were down-regulated (Figure 2a). According to the same criteria, 1422 differentially expressed lncRNAs were identified in the high immune cell infiltration group compared with the low immune cell infiltration group, with 455 up-regulated and 967 down-regulated (Figure 2b). A two-way Venn analysis identified a total of 696 lncRNAs that were differentially expressed in both comparisons, that is, tumor versus paracancerous and high versus low immune cell infiltration (Figure 2c). Together, these results suggest that there were immune-related lncRNAs in breast cancer tissue.

3.3 Identification and assessment of 11 immune-related lncRNA prognostic signature for breast cancer

Based on the survival data of breast cancer samples, we applied univariate Cox regression to the expression profiles of the 696 lncRNAs. A total of 18 of these differentially expressed lncRNAs met the criterion of p<0.001 (Figure 3a). To avoid overfitting the prognostic signature, we performed Lasso regression on these lncRNAs and retained 17 differentially expressed lncRNAs related to immune cell infiltration in breast cancer (Figure 3b); the optimal value of the penalty parameter was determined by 10-fold cross-validation (Figure 3c). By stepwise multiple Cox regression analysis, 11 lncRNAs (LINC00668, LINC02418, AL356515.1, LINC01010, AP005131.6, AL772337.1, AC027514.1, AL161646.2, AC004847.1, AC243773.2 and AL591686.1) were further identified from the above 17 lncRNAs (Table 1). The risk score for each sample was then calculated based on the expression levels of these 11 lncRNAs (Table 1):

$$
\begin{aligned}
\text{Risk score} ={}& 0.06 \times \text{LINC00668} + 0.13 \times \text{LINC02418} + 0.24 \times \text{AL356515.1} \\
&- 0.23 \times \text{LINC01010} - 0.15 \times \text{AP005131.6} + 0.18 \times \text{AL772337.1} \\
&+ 0.21 \times \text{AC027514.1} + 0.17 \times \text{AL161646.2} - 0.13 \times \text{AC004847.1} \\
&+ 0.07 \times \text{AC243773.2} - 0.13 \times \text{AL591686.1}
\end{aligned}
$$

According to the median risk score, breast cancer samples were divided into a high-risk group and a low-risk group. The Kaplan-Meier curve showed that samples in the high-risk group exhibited worse overall survival (OS) than those in the low-risk group, indicating that the risk score is an effective prognostic signature (p = 2.493e-10) (Figure 3d). The risk curve and scatterplot were generated to show the risk score and survival status of each breast cancer sample. The risk coefficient and mortality of samples in the high-risk group were higher than those in the low-risk group (Figure 3e and 3f). The heatmap of these 11 lncRNA expression profiles in breast cancer samples showed that LINC01010, AP005131.6, AC004847.1, and AL591686.1 were highly expressed in the low-risk group, while LINC00668, LINC02418, AL356515.1, AC027514.1, AL772337.1, AL161646.2, and AC243773.2 were highly expressed in the high-risk group (Figure 3g). Collectively, these results identify 11 immune-related lncRNAs as a prognostic signature for breast cancer.
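The reported formula can be applied directly as a weighted sum. The Python sketch below uses the coefficients stated in the text; the expression profiles are invented, the scale on which expression is measured is not specified in this section and is left to the caller, and sending scores equal to the median to the low-risk group is an assumption.

```python
from statistics import median

# Coefficients as reported in the text for the 11-lncRNA signature.
COEFFICIENTS = {
    "LINC00668": 0.06,
    "LINC02418": 0.13,
    "AL356515.1": 0.24,
    "LINC01010": -0.23,
    "AP005131.6": -0.15,
    "AL772337.1": 0.18,
    "AC027514.1": 0.21,
    "AL161646.2": 0.17,
    "AC004847.1": -0.13,
    "AC243773.2": 0.07,
    "AL591686.1": -0.13,
}


def risk_score(expression):
    """Weighted sum of expression values over the 11 signature lncRNAs."""
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the cohort median."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}


# Hypothetical expression profiles; values are invented for illustration.
uniform = {name: 1.0 for name in COEFFICIENTS}
samples = {
    "sample_1": uniform,
    "sample_2": {**uniform, "AL356515.1": 3.0},  # more of a positive-weight lncRNA
    "sample_3": {**uniform, "LINC01010": 3.0},   # more of a negative-weight lncRNA
}

scores = {sample: risk_score(expr) for sample, expr in samples.items()}
print(scores)
print(split_by_median(scores))
```

Raising a positively weighted lncRNA moves a sample toward the high-risk group and raising a negatively weighted one moves it toward the low-risk group, which matches the expression pattern described for the heatmap.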

3.4 Evaluation of 11 immune-related lncRNAs as independent prognostic factors in patients with breast cancer

Univariate and multivariate Cox regression analyses were used to explore whether the above 11 immune-related lncRNAs were prognostic factors for breast cancer independent of clinicopathological factors, such as age, gender, and pathological stage. The hazard ratio (HR) of the risk score was 1.328 (95% CI 1.256-1.404, p<0.001) in univariate Cox regression analysis and 1.266 (95% CI 1.188-1.349, p<0.001) in multivariate Cox regression analysis, suggesting that the 11 lncRNAs were independent prognostic factors in patients with breast cancer (Figure 4a and 4b). To assess the sensitivity and specificity of the risk score for the prognosis of patients with breast cancer, time-dependent receiver operating characteristic (ROC) analysis was performed. The area under the ROC curve (AUC) of the risk score was 0.836 (Figure 4c), suggesting that the 11-lncRNA prognostic signature for breast cancer was highly reliable. In aggregate, these results indicate that the 11 immune-related lncRNAs were independent prognostic factors in patients with breast cancer.
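The reported hazard ratios and confidence intervals can be checked for internal consistency with a short Python sketch. It assumes the intervals are Wald intervals that are symmetric on the log scale, which the text does not state; the function name `wald_from_hr` and the critical value 1.96 are likewise assumptions.

```python
import math


def wald_from_hr(hr, lower, upper, z_crit=1.96):
    """Recover the log hazard ratio, its standard error and the z statistic.

    Assumes the 95% CI was built as exp(beta +/- z_crit * se).
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return beta, se, beta / se


# Values reported in the text for the risk score.
reported = {
    "univariate": (1.328, 1.256, 1.404),
    "multivariate": (1.266, 1.188, 1.349),
}

for label, (hr, lower, upper) in reported.items():
    beta, se, z = wald_from_hr(hr, lower, upper)
    # Under the assumption, the geometric mean of the CI bounds equals the HR.
    midpoint = math.sqrt(lower * upper)
    print(label, round(beta, 3), round(se, 4), round(z, 2), round(midpoint, 3))
```

Under this assumption the z statistics come out at roughly 10 and 7, both well beyond the value of about 3.29 that corresponds to a two-sided p of 0.001, which agrees with the reported p<0.001.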

3.5 Correlation between 11 immune-related lncRNA prognostic signature for breast cancer and the infiltration of immune cell subtypes

Given that these 11 lncRNAs were related to tumor immunity, we next analyzed the correlation between the 11-lncRNA prognostic signature and the infiltration of immune cell subtypes in breast cancer using data from the TIMER database. As shown in Figure 5a-5f, the correlation coefficients of B cells, CD4+ T cells, CD8+ T cells, DC, neutrophils, and macrophages with the risk score were -0.111, -0.205, -0.169, -0.208, -0.204 and -0.097, respectively, suggesting that the infiltration of these immune cell subtypes was significantly negatively correlated with the risk score, so that higher infiltration accompanied a lower predicted risk. Taken together, these results indicate that the 11-lncRNA prognostic signature for breast cancer was associated with the infiltration of these immune cell subtypes.

4. Discussion

Breast cancer is the most common and fatal malignant tumor among women in the world, with highly heterogeneous biological and clinical features[19]. The high heterogeneity of breast cancer exists not only in the genotypes and phenotypes of tumor cells but also in the tumor microenvironment[20]. Breast cancer tissue is composed not only of breast cancer cells but also of many kinds of normal cells, such as immune cells, stromal cells and fibroblasts[21]. These different types of cells interact with each other, evolve together, and eventually form a complex whole. In the current study, therefore, we focused on the heterogeneity of breast cancer and the interaction between tumor-infiltrating immune cells and tumor cells, which is of great significance for studying the mechanism of tumor development and progression and for developing new diagnostic and therapeutic approaches. In addition, with the wide application of high-throughput technology and the continuous maturation of data sharing mechanisms, unprecedented large-scale multi-omics tumor data have accumulated in international public databases, and tumor research has entered the era of "big data"[22]. Using the transcriptome sequencing data, especially on lncRNAs, and the clinicopathological features of breast cancer obtained from the TCGA, we identified and verified an 11-lncRNA prognostic signature related to immune cell infiltration in this study.

The heterogeneity of the immune microenvironment in breast cancer is very high, and the type and number of infiltrating immune cells vary greatly in different locations[23]. In this study, there were significant differences in Tumor Purity, ESTIMATE Score, Immune Score, and Stromal Score between the high immune cell infiltration group and the low immune cell infiltration group. Furthermore, the heterogeneity of the immune microenvironment in breast cancer was verified by the expression of HLA and CD274 as well as by the CIBERSORT algorithm.

In recent years, in-depth transcriptome sequencing studies have found that about 4/5 of the transcripts in the human genome are non-protein-coding, including lncRNAs[24]. LncRNAs have been shown to participate in the development, progression, invasion, and metastasis of breast cancer in a variety of ways[12, 25]. In this study, we identified 11 lncRNAs, including LINC01010, LINC00668, and LINC02418, as a prognostic signature for breast cancer. Similarly, some studies have shown that LINC01010 was significantly related to the survival and prognosis of patients with neuroblastoma[26]. LINC00668 promotes the development of breast cancer by inhibiting apoptosis and accelerating the cell cycle[27]. LINC02418 was a promising new tumor marker for the diagnosis and prognosis of colorectal cancer[28]. To explore the feasibility of the prognostic signature in clinical application, we compared it with the clinical indexes of breast cancer patients, such as gender, age, and pathological stage, using univariate and multivariate Cox analyses as well as ROC analysis, and confirmed that the 11-lncRNA prognostic signature could be an independent prognostic factor in patients with breast cancer.

Based on lncRNA sequencing data, tumor immune infiltrating cells account for a high proportion in many kinds of tumors, such as breast cancer, skin melanoma, non-small cell lung cancer and colon cancer[29, 30]. They are the key to tumor immunotherapy[31]. The complementarity-determining regions of T cell and B cell receptors play a decisive role in their recognition of tumor-specific antigens[32]. Therefore, the study of the sequence characteristics of tumor immune infiltrating T cell and B cell surface receptors is helpful for analyzing the interaction between tumor cells and T cells or B cells and for developing new methods for tumor diagnosis and treatment. Postoperative tumor tissue usually contains a certain amount of immune infiltrating cells, so tumor tissue RNA sequencing data are mixed with all kinds of information from the tumor immune microenvironment[33]. In this study, we found that the 11-lncRNA prognostic signature for breast cancer was associated with the infiltration of immune cell subtypes using data from the TIMER database.

In conclusion, this study identifies 11 lncRNAs as a prognostic signature for breast cancer. The 11-lncRNA prognostic signature for breast cancer was associated with the infiltration of immune cell subtypes.

Acknowledgements

This work was supported by grants (Nos. 81071803, 81272261, and 30971144) from the National Natural Science Foundation of China (http://www.nsfc.gov.cn/).

References

[1] A. Dumas, I. Vaz Luis, T. Bovagnet, M. El Mouhebb, A. Di Meglio, S. Pinto, C. Charles, S. Dauchy, S. Delaloge, P. Arveux, C. Coutant, P. Cottu, A. Lesur, F. Lerebours, O. Tredan, L. Vanlemmens, C. Levy, J. Lemonnier, C. Mesleard, F. Andre, G. Menvielle, Impact of Breast Cancer Treatment on Employment: Results of a Multicenter Prospective Cohort Study (CANTO), J Clin Oncol, (2019) JCO1901726.
[2] A.M. Afifi, A.M. Saad, M.J. Al-Husseini, A.O. Elmehrath, D.W. Northfelt, M.B. Sonbol, Causes of death after breast cancer diagnosis: A US population-based analysis, Cancer, (2019).
[3] T. Wang, L.E. McCullough, A.J. White, P.T. Bradshaw, X. Xu, Y.H. Cho, M.B. Terry, S.L. Teitelbaum, A.I. Neugut, R.M. Santella, J. Chen, M.D. Gammon, Prediagnosis aspirin use, DNA methylation, and mortality after breast cancer: A population-based study, Cancer, 125 (2019) 3836-3844.
[4] V. Cremasco, J.L. Astarita, A.L. Grauel, S. Keerthivasan, K. MacIsaac, M.C. Woodruff, M. Wu, L. Spel, S. Santoro, Z. Amoozgar, T. Laszewski, S.C. Migoni, K. Knoblich, A.L. Fletcher, M. LaFleur, K.W. Wucherpfennig, E. Pure, G. Dranoff, M.C. Carroll, S.J. Turley, FAP Delineates Heterogeneous and Functionally Divergent Stromal Cells in Immune-Excluded Breast Tumors, Cancer Immunol Res, 6 (2018) 1472-1485.
[5] E. Mamessier, F. Bertucci, R. Sabatier, D. Birnbaum, D. Olive, "Stealth" tumors: Breast cancer cells shun NK-cells anti-tumor immunity, Oncoimmunology, 1 (2012) 366-368.
[6] N. Eiro, B. Fernandez-Garcia, L.O. Gonzalez, F.J. Vizoso, Cytokines related to MMP-11 expression by inflammatory cells and breast cancer metastasis, Oncoimmunology, 2 (2013) e24010.
[7] M. Harao, M.A. Forget, J. Roszik, H. Gao, G.V. Babiera, S. Krishnamurthy, J.A. Chacon, S. Li, E.A. Mittendorf, S.M. DeSnyder, K.F. Rockwood, C. Bernatchez, N.T. Ueno, L.G. Radvanyi, L. Vence, C. Haymaker, J.M. Reuben, 4-1BB-Enhanced Expansion of CD8(+) TIL from Triple-Negative Breast Cancer Unveils Mutation-Specific CD8(+) T Cells, Cancer Immunol Res, 5 (2017) 439-445.
[8] P. Bieniasz-Krzywiec, R. Martin-Perez, M. Ehling, M. Garcia-Caballero, S. Pinioti, S. Pretto, R. Kroes, C. Aldeni, M. Di Matteo, H. Prenen, M.V. Tribulatti, O. Campetella, A. Smeets, A. Noel, G. Floris, J.A. Van Ginderachter, M. Mazzone, Podoplanin-Expressing Macrophages Promote Lymphangiogenesis and Lymphoinvasion in Breast Cancer, Cell Metab, 30 (2019) 917-936 e910.
[9] C.D. Stefanski, K. Keffler, S. McClintock, L. Milac, J.R. Prosperi, APC loss affects DNA damage repair causing doxorubicin resistance in breast cancer cells, Neoplasia, 21 (2019) 1143-1150.
[10] P. Michea, F. Noel, E. Zakine, U. Czerwinska, P. Sirven, O. Abouzid, C. Goudot, A. Scholer-Dahirel, A. Vincent-Salomon, F. Reyal, S. Amigorena, M. Guillot-Delost, E. Segura, V. Soumelis, Adjustment of dendritic cells to the breast-cancer microenvironment is subset specific, Nat Immunol, 19 (2018) 885-897.
[11] P. Cai, A.B. Otten, B. Cheng, M.A. Ishii, W. Zhang, B. Huang, K. Qu, B.K. Sun, A genome-wide long noncoding RNA CRISPRi screen identifies PRANCR as a novel regulator of epidermal homeostasis, Genome Res, (2019).
[12] Q.Y. Huang, G.F. Liu, X.L. Qian, L.B. Tang, Q.Y. Huang, L.X. Xiong, Long Non-Coding RNA: Dual Effects on Breast Cancer Metastasis and Clinical Applications, Cancers (Basel), 11 (2019).
[13] M. Zhou, Y. Hou, G. Yang, H. Zhang, G. Tu, Y.E. Du, S. Wen, L. Xu, X. Tang, S. Tang, L. Yang, X. Cui, M. Liu, LncRNA-Hh Strengthen Cancer Stem Cells Generation in Twist-Positive Breast Cancer via Activation of Hedgehog Signaling Pathway, Stem Cells, 34 (2016) 55-66.
[14] W. Li, G. Jia, Y. Qu, Q. Du, B. Liu, B. Liu, Long Non-Coding RNA (LncRNA) HOXA11-AS Promotes Breast Cancer Invasion and Metastasis by Regulating Epithelial-Mesenchymal Transition, Med Sci Monit, 23 (2017) 3393-3403.
[15] I. Bar, I. Theate, S. Haussy, G. Beniuga, J. Carrasco, J.L. Canon, P. Delree, A. Merhi, MiR-210 Is Overexpressed in Tumor-infiltrating Plasma Cells in Triple-negative Breast Cancer, J Histochem Cytochem, (2019) 22155419892965.
[16] F. Pages, J. Galon, M.C. Dieu-Nosjean, E. Tartour, C. Sautes-Fridman, W.H. Fridman, Immune infiltration in human tumors: a prognostic factor that should not be ignored, Oncogene, 29 (2010) 1093-1102.
[17] F.O. Beltran-Anaya, S. Romero-Cordoba, R. Rebollar-Vega, O. Arrieta, V. Bautista-Pina, C. Dominguez-Reyes, F. Villegas-Carlos, A. Tenorio-Torres, L. Alfaro-Riuz, S. Jimenez-Morales, A. Cedro-Tanda, M. Rios-Romero, J.P. Reyes-Grajeda, E. Tagliabue, M.V. Iorio, A. Hidalgo-Miranda, Expression of long non-coding RNA ENSG00000226738 (LncKLHDC7B) is enriched in the immunomodulatory triple-negative breast cancer subtype and its alteration promotes cell migration, invasion, and resistance to cell death, Mol Oncol, 13 (2019) 909-927.
[18] T. Meng, R. Huang, Z. Zeng, Z. Huang, H. Yin, C. Jiao, P. Yan, P. Hu, X. Zhu, Z. Li, D. Song, J. Zhang, L. Cheng, Identification of Prognostic and Metastatic Alternative Splicing Signatures in Kidney Renal Clear Cell Carcinoma, Front Bioeng Biotechnol, 7 (2019) 270.
[19] B. Sousa, A.S. Ribeiro, J. Paredes, Heterogeneity and Plasticity of Breast Cancer Stem Cells, Adv Exp Med Biol, 1139 (2019) 83-103.
[20] J.H.E. Baker, A.H. Kyle, S.A. Reinsberg, F. Moosvi, H.M. Patrick, J. Cran, K. Saatchi, U. Hafeli, A.I. Minchinton, Heterogeneous distribution of trastuzumab in HER2-positive xenografts and metastases: role of the tumor microenvironment, Clin Exp Metastasis, 35 (2018) 691-705.
[21] F. Bai, Y. Jin, P. Zhang, H. Chen, Y. Fu, M. Zhang, Z. Weng, K. Wu, Bioinformatic profiling of prognosis-related genes in the breast cancer immune microenvironment, Aging (Albany NY), 11 (2019) 9328-9347.
[22] H. Li, C. Gao, L. Liu, J. Zhuang, J. Yang, C. Liu, C. Zhou, F. Feng, C. Sun, 7-lncRNA Assessment Model for Monitoring and Prognosis of Breast Cancer Patients: Based on Cox Regression and Co-expression Analysis, Front Oncol, 9 (2019) 1348.
[23] A.S. Dias, C.R. Almeida, L.A. Helguero, I.F. Duarte, Metabolic crosstalk in the breast cancer microenvironment, Eur J Cancer, 121 (2019) 154-171.
[24] A. Piovesan, F. Antonaros, L. Vitale, P. Strippoli, M.C. Pelleri, M. Caracausi, Human protein-coding genes and gene feature statistics in 2019, BMC Res Notes, 12 (2019) 315.
[25] Q. Guo, S. Lv, B. Wang, Y. Li, N. Cha, R. Zhao, W. Bao, B. Jia, Long non-coding RNA PRNCR1 has an oncogenic role in breast cancer, Exp Ther Med, 18 (2019) 4547-4554.
[26] L. Gao, P. Lin, P. Chen, R.Z. Gao, H. Yang, Y. He, J.B. Chen, Y.G. Luo, Q.Q. Xu, S.W. Liang, J.H. Gu, Z.G. Huang, Y.W. Dang, G. Chen, A novel risk signature that combines 10 long noncoding RNAs to predict neuroblastoma prognosis, J Cell Physiol, (2019).
[27] X. Qiu, J. Dong, Z. Zhao, J. Li, X. Cai, LncRNA LINC00668 promotes the progression of breast cancer by inhibiting apoptosis and accelerating cell cycle, Onco Targets Ther, 12 (2019) 5615-5625.
[28] Y. Zhao, T. Du, L. Du, P. Li, J. Li, W. Duan, Y. Wang, C. Wang, Long noncoding RNA LINC02418 regulates MELK expression by acting as a ceRNA and may serve as a diagnostic marker for colorectal cancer, Cell Death Dis, 10 (2019) 568.
[29] R. Huang, Z. Zeng, G. Li, D. Song, P. Yan, H. Yin, P. Hu, X. Zhu, R. Chang, X. Zhang, J. Zhang, T. Meng, Z. Huang, The Construction and Comprehensive Analysis of ceRNA Networks and Tumor-Infiltrating Immune Cells in Bone Metastatic Melanoma, Front Genet, 10 (2019) 828.
[30] W.D. Yu, H. Wang, Q.F. He, Y. Xu, X.C. Wang, Long noncoding RNAs in cancer-immunity cycle, J Cell Physiol, 233 (2018) 6518-6523.
[31] R. Jiang, J. Tang, Y. Chen, L. Deng, J. Ji, Y. Xie, K. Wang, W. Jia, W.M. Chu, B. Sun, The long noncoding RNA lnc-EGFR stimulates T-regulatory cells differentiation thus promoting hepatocellular carcinoma immune evasion, Nat Commun, 8 (2017) 15129.
[32] I. Sela-Culang, M.R. Benhnia, M.H. Matho, T. Kaever, M. Maybeno, A. Schlossman, G. Nimrod, S. Li, Y. Xiang, D. Zajonc, S. Crotty, Y. Ofran, B. Peters, Using a combined computational-experimental approach to predict antibody-specific B cell epitopes, Structure, 22 (2014) 646-657.
[33] Z.Q. Zhou, J.J. Zhao, Q.Z. Pan, C.L. Chen, Y. Liu, Y. Tang, Q. Zhu, D.S. Weng, J.C. Xia, PD-L1 expression is a predictive biomarker for CIK cell-based immunotherapy in postoperative patients with breast cancer, J Immunother Cancer, 7 (2019) 228.

Figure 1. Construction and verification of breast cancer grouping. a. The immune cells were highly expressed in the Cluster1 group, which was named the high immune cell infiltration group (Immunity_H), and lowly expressed in the Cluster2 group, which was named the low immune cell infiltration group (Immunity_L). The Tumor Purity, ESTIMATE Score, Immune Score and Stromal Score of each sample, computed with the ESTIMATE algorithm, were displayed together with the grouping information. b. The box plots showed statistically significant differences in Tumor Purity, ESTIMATE Score, Immune Score and Stromal Score between the two groups (p<0.01). c and d. The expression levels of HLA family genes and CD274 in the high immune cell infiltration group (red) were all significantly higher than those in the low immune cell infiltration group (green) (p<0.01). e. The CIBERSORT results showed the difference in the proportion of each immune cell type between the high immune cell infiltration group (red) and the low immune cell infiltration group (green).

Figure 2. Analysis of differentially expressed lncRNAs. a. The volcano plot showed that 2208 genes were up-regulated and 791 were down-regulated between breast cancer and paracancerous tissues. Each red dot represents an up-regulated gene and each green dot represents a down-regulated gene (fold change >4, p = 0.001). b. Consistent with Figure 3a, the volcano plot showed that 455 genes were up-regulated and 967 were down-regulated between the high and low immune cell infiltration groups of breast cancer. c. Taking the intersection with an R software package, we obtained a total of 696 differentially expressed genes.
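The intersection step in Figure 2c amounts to a set intersection of two lists of differentially expressed lncRNA IDs. The following is a minimal sketch, assuming each comparison has already produced a list of IDs. The function name and the placeholder IDs are hypothetical, and the authors performed this step with an R package rather than Python.

```python
def intersect_deg(tumor_vs_normal, high_vs_low):
    """Return the IDs present in both differential-expression lists, sorted."""
    return sorted(set(tumor_vs_normal) & set(high_vs_low))


# Hypothetical placeholder IDs, not genes from the study.
tumor_vs_normal = ["lncA", "lncB", "lncC", "lncD"]
high_vs_low = ["lncB", "lncD", "lncE"]

shared = intersect_deg(tumor_vs_normal, high_vs_low)
print(len(shared), shared)
```

Sorting the result only makes the output reproducible; the intersection itself does not depend on order.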

Figure 3. Identification and assessment of the immune-related lncRNA prognostic signature for breast cancer. a. The HR and p-value from the univariable Cox regression of selected genes in the immune terms (criterion: p-value<0.001). b. The LASSO Cox analysis identified the 17 lncRNAs most correlated with prognosis. c. The optimal values of the penalty parameter were determined by 10-round cross-validation. d. Patients in the high-risk group (red) exhibited worse overall survival (OS) than those in the low-risk group (blue). e. The risk curve of each sample reordered by risk score. f. The scatter plot of the sample survival overview. The green and red dots represent survival and death, respectively. g. The heatmap shows the expression profiles of the signature in the low-risk and high-risk groups. The pink bar represents the low-risk group, and the blue bar represents the high-risk group. Gene expression levels from 0 to 4 are represented by the color gradient from green to red.

Figure 4. The Cox regression analysis for evaluating the independent prognostic value of the risk score. The univariate (a) and multivariate (b) Cox regression analysis of risk score, age, gender, grade, and TNM stage. c. The AUC of the risk score, age, gender, grade, and TNM stage for overall survival, calculated from the ROC curves.

Figure 5. Correlation between the 11-lncRNA prognostic signature for breast cancer and the infiltration of immune cell subtypes. The six most significant correlations of the risk score with the immune cell infiltration ssGSEA scores. a. B cells. b. CD4+ T cells. c. CD8+ T cells. d. Dendritic cells. e. Neutrophils. f. Macrophages.
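The correlations summarized in Figure 5 can be illustrated with a small rank-correlation sketch. The caption does not state which coefficient was used, so Spearman's rho is an assumption here. The risk scores and infiltration scores below are made-up values, not data from the study, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-patient values for illustration only.
risk_scores = [0.42, 1.10, 0.85, 1.73, 0.31, 1.25]
infiltration_scores = [0.61, 0.35, 0.48, 0.22, 0.58, 0.40]
print(round(spearman(risk_scores, infiltration_scores), 3))
```

A negative rho in such a comparison would indicate that patients with higher risk scores tend to have lower infiltration scores for that cell type.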

Author Statement

Chuanlu Shen and Yong Shen conceived the study design. Yong Shen drafted the manuscript. Yong Shen and Xiaowei Peng performed statistical analysis. All the authors participated in the discussion, provided conceptual comments, and have read and approved the final manuscript.

Table 1. The expression levels of these 11 lncRNAs

id            coef        HR          HR.95L      HR.95H      p-value
LINC00668     0.05743     1.059112    1.000718    1.120913    0.047172
LINC02418     0.131969    1.141073    1.042715    1.248708    0.004112
AL356515.1    0.242501    1.274432    1.109549    1.463818    0.000602
LINC01010     -0.22616    0.797594    0.70333     0.904491    0.000425
AP005131.6    -0.15287    0.858243    0.75443     0.976341    0.020128
AL772337.1    0.180778    1.198149    1.057565    1.357421    0.004527
AC027514.1    0.206782    1.229715    1.046417    1.44512     0.012042
AL161646.2    0.168444    1.183462    1.03545     1.35263     0.013474
AC004847.1    -0.12587    0.881727    0.76218     1.020025    0.090409
AC243773.2    0.072279    1.074956    0.995029    1.161303    0.066723
AL591686.1    -0.1254     0.882147    0.769362    1.011466    0.072397
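The columns of Table 1 are linked by simple identities that can be checked numerically. The sketch below makes three assumptions: that HR = exp(coef), that the 95% interval is a Wald interval on the log-hazard scale, and that a patient's risk score is the coefficient-weighted sum of the expression values of the 11 lncRNAs. The last formula is the usual construction for Cox-based signatures and is an assumption here, as are the function names and the example expression values.

```python
from math import exp, log, erfc, sqrt

# (coef, HR, HR.95L, HR.95H, p-value) copied from Table 1.
TABLE1 = {
    "LINC00668": (0.05743, 1.059112, 1.000718, 1.120913, 0.047172),
    "LINC02418": (0.131969, 1.141073, 1.042715, 1.248708, 0.004112),
    "AL356515.1": (0.242501, 1.274432, 1.109549, 1.463818, 0.000602),
    "LINC01010": (-0.22616, 0.797594, 0.70333, 0.904491, 0.000425),
    "AP005131.6": (-0.15287, 0.858243, 0.75443, 0.976341, 0.020128),
    "AL772337.1": (0.180778, 1.198149, 1.057565, 1.357421, 0.004527),
    "AC027514.1": (0.206782, 1.229715, 1.046417, 1.44512, 0.012042),
    "AL161646.2": (0.168444, 1.183462, 1.03545, 1.35263, 0.013474),
    "AC004847.1": (-0.12587, 0.881727, 0.76218, 1.020025, 0.090409),
    "AC243773.2": (0.072279, 1.074956, 0.995029, 1.161303, 0.066723),
    "AL591686.1": (-0.1254, 0.882147, 0.769362, 1.011466, 0.072397),
}


def wald_p_from_ci(coef, hr_low, hr_high, z_crit=1.959964):
    """Two-sided Wald p-value recovered from a 95% CI of the hazard ratio.

    Assumes the interval is exp(coef +/- z_crit * se).
    """
    se = (log(hr_high) - log(hr_low)) / (2 * z_crit)
    z = coef / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


def risk_score(expression):
    """Assumed risk score: sum of coef * expression over the 11 lncRNAs."""
    return sum(TABLE1[gene][0] * expression[gene] for gene in TABLE1)


# Consistency check: compare exp(coef) and the recovered p-value with the table.
for gene, (coef, hr, lo, hi, p) in TABLE1.items():
    print(gene, round(exp(coef), 6), hr, round(wald_p_from_ci(coef, lo, hi), 6), p)

# Hypothetical expression profile (every lncRNA at 1.0) for illustration only.
example = {gene: 1.0 for gene in TABLE1}
print(round(risk_score(example), 6))
```

Under these assumptions, positive coefficients (HR above 1) raise the risk score as expression increases, while the four lncRNAs with negative coefficients lower it.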

Highlights

• A prognostic model based on 11 differentially expressed lncRNAs was established for breast cancer.

• Our prognostic model of breast cancer was correlated with immune cell infiltration.
