Accepted Manuscript Systematic bioinformatic approaches reveal novel gene expression signatures associated with acquired resistance to EGFR targeted therapy in lung cancer
Marjan Mojtabavi Naeini, Manoochehr Tavassoli, Kamran Ghaedi PII: DOI: Reference:
S0378-1119(18)30459-1 doi:10.1016/j.gene.2018.04.077 GENE 42801
To appear in:
Gene
Received date: Revised date: Accepted date:
8 February 2018 23 April 2018 25 April 2018
Please cite this article as: Marjan Mojtabavi Naeini, Manoochehr Tavassoli, Kamran Ghaedi , Systematic bioinformatic approaches reveal novel gene expression signatures associated with acquired resistance to EGFR targeted therapy in lung cancer. The address for the corresponding author was captured as affiliation for all authors. Please check if appropriate. Gene(2017), doi:10.1016/j.gene.2018.04.077
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT Systematic Bioinformatic Approaches Reveal Novel Gene Expression Signatures Associated with Acquired Resistance to EGFR Targeted Therapy in Lung Cancer
PT
Marjan Mojtabavi Naeini a, Manoochehr Tavassoli †a, Kamran Ghaedi b
b
Cellular and Molecular Biology Division, Biology Department,
SC
of Isfahan, Isfahan, Iran;
RI
Affiliations: a Division of Genetics, Department of Biology, Faculty of Science, University
Faculty of Sciences, University of Isfahan, Isfahan, Iran
Faculty
of
Science,
University
MA
Biology,
NU
† Address correspondence to: Manoochehr Tavassoli, Division of Genetics, Department of
[email protected]
of
Isfahan,
Isfahan,
Iran.
AC
CE
PT E
D
Running Title: Gene Expression Profile in Acquired Resistance to TKIs in NSCLC
Email:
ACCEPTED MANUSCRIPT Funding Source: None. Financial Disclosure: The authors have no financial relationships relevant to this article to disclose. Conflicts of Interest: The authors have no conflicts of interest to declare regarding the
PT
publication of the current article.
RI
Abbreviations: Differentially Expressed Genes, DEGs; effect size, ES; epidermal Growth
SC
Factor Receptor, EGFR; False Discovery Rate, FDR; Gene Expression Omnibus, GEO; Gene Set Enrichment Analysis, GSEA; International Molecular Exchange, IMEx; Molecular
NU
Signature Database, MSigDB; National Center for Biotechnology Information, NCBI; NonSmall Cell Lung Cancer, NSCLC; Normalized Enrichment Score, NES; Principal Component
MA
Analysis, PCA; Protein-Protein Interaction, PPI; Random Effect Model, REM; Tyrosine
AC
CE
PT E
D
Kinase Inhibitors, TKIs; Variance Stabilizing Normalization, VSN
ACCEPTED MANUSCRIPT Abstract Objectives: Human non-small cell lung cancer (NSCLC) that harbors activating mutations in epidermal growth factor receptor (EGFR) initially responds to treatment with EGFR tyrosine kinase inhibitors (TKIs) such as gefitinib and erlotinib but eventually tumor cells acquire
PT
resistance. To date, several gene expression profiles have been reported in TKIs-resistant EGFR-mutant NSCLC. The objective of this study is to identify robust gene expression
RI
signatures, biological processes, and promising overcoming targets for TKIs-resistant EGFR-
SC
mutant NSCLC.
NU
Materials and Methods: Five publicly available microarray datasets were integrated by performing two network-based meta-analyses following by protein-protein interaction (PPI)
MA
network and gene set enrichment analysis.
Results and Conclusion: According to our meta-analyses, 830 and 1286 genes were
PT E
D
differentially expressed in the TKIs-resistant EGFR-mutant NSCLC cell lines compared to TKIs-sensitive EGFR-mutant NSCLC cell lines in the absence and presence of TKIs treatment, respectively. PPI network analysis identified ESR1 and ELAVL1 to be the most
CE
highly ranked hub genes involved in the NSCLC acquired TKI-resistance. Moreover, gene set
AC
enrichment analyses indicated that up-regulated genes are mainly distributed in hallmarks "Glycolysis", some "E2F targets". Down-regulated genes mainly contribute to hallmarks "interferon alpha response", "interferon gamma response", and also "E2F targets. For the first time, this study has demonstrated several robust candidate genes and pathways of the NSCLC acquired TKI-resistance. Further experimental verifications are highly recommended to examine our findings. Keywords: EGFR-mutant; Erlotinib; Gefitinib; NSCLC; TKIs
ACCEPTED MANUSCRIPT 1. Introduction Lung cancer is the leading cause of cancer-related deaths worldwide (1). Non-small cell lung cancer (NSCLC), approximately 85% of lung carcinomas, has served as a model for genotype-directed targeted cancer therapy (2). To illustrate, NSCLC cells harboring
PT
epidermal growth factor receptor (EGFR) activating kinase domain mutations are frequently "addicted" to the EGFR signaling pathway; hence, often initially are sensitive to treatment
RI
with an EGFR tyrosine kinase inhibitors (TKIs) small molecule such as gefitinib (3) and
SC
erlotinib (4-6). However, acquired resistance to EGFR TKIs-treatment shortly emerges (7, 8) due to a secondary resistance mutation within the EGFR kinase domain (T790M), c-MET
NU
amplification, and activation of the NFκB pathway (9-14). Nevertheless, there are still some
EGFR-mutant NSCLC patients (15).
MA
unknown mechanisms underlying acquired resistance to EGFR TKI-treatment in over 40% of
D
High-throughput genomics technologies such as microarrays and RNA-Seq technologies
PT E
have widely enriched our understanding of complex gene networks in biological and cellular processes underlying the progression of different diseases, response to treatment, and so on
CE
(16-18). Microarrays measure the expression of thousands of genes simultaneously on a genome-wide scale to disclose altered genetic expression profiles (19). However, many
AC
microarray studies have produced lists of the differentially expressed genes (DEGs) which tend to be inconsistencies between studies due to the limitations of small sample sizes and variables during performing experiments (20). To address these challenges, available datasets in public repositories such as Gene Expression Omnibus (GEO) (21) can be further integrated by meta-analysis approach to identify robust gene expression signatures that would be otherwise unidentifiable in individual studies (22). Meta-analysis approaches combining multiple gene expression datasets can eliminate biological and technical biases associated
ACCEPTED MANUSCRIPT with individual studies and enhance statistical power, the reliability, and generalizability of studies to obtain a more precise estimate of gene expression profiles (23). To date, no comprehensive meta-analysis of cellular resistance to EGFR TKIs has been reported. In this study, we performed cross-platform meta-analyses of several public gene
PT
expression datasets focused on acquired resistance to EGFR TKI-treatment in mutant EGFRaddicted NSCLC (TKIs sensitive) cell lines to identify related DEGs and cellular mechanisms
RI
that are consistently modified in acquired resistance to EGFR TKI-treatment. Briefly, an
SC
user-friendly microarray meta-analysis tool called NetworkAnalyst was employed (24, 25) to obtain the gene expression profiles associated with acquired TKIs resistance. The aim of the
NU
current study is to overcome the statistical limitation of each individual study, to resolve
MA
inconsistencies, and to reduce the likelihood of false-positive or -negative associations by
PT E
2. Materials and methods
D
integrating different gene expression data collected from EGFR-mutant NSCLC cell lines.
2.1. Selection of eligible microarray datasets for meta-analysis related to TKIs-
CE
resistance
AC
We implemented a systematic mining of microarray datasets submitted on the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/geo/) during June and July 2017. The keywords used in the search included "lung cancer tyrosine kinase inhibitor resistance and/or resistance", "lung cancer TKI", and "lung cancer gefitinib and/or erlotinib". The related details were extracted from each identified study to include a dataset in the meta-analysis if it contained [1] microarray expression profiling of EGFR-mutant NSCLC cell lines that acquire gefitinib or erlotinib resistance by a long-term and continuous exposure to a TKI (almost 6 months) [2]
ACCEPTED MANUSCRIPT all included datasets studied on more than 4 samples [3] sufficient data and a robust platform to facilitate analysis (Table S1). Studies that reported non-human data, patient subjects, or used intrinsic drug-resistant cells (e.g., EGFR wild type NSCLC cell lines) were excluded in selection process of microarray datasets. Briefly, included datasets provided gene expression of TKI-sensitive and -resistant EGFR-mutant NSCLC cell lines which were treated with the
PT
different conditions, no treatment and a TKI-treatment. In the present report, for each
RI
condition, separate meta-analysis was performed. The workflow of this study is presented in
SC
Supplementary Figure 1 (Figure S1).
NU
2.2. Data preparation, normalization, and statistical meta-analysis Statistical analysis of microarray datasets with different platforms has been conducted using
MA
NetworkAnalyst, a web interface for the integrative analysis of gene expression data (http://www.networkanalyst.ca/) according its step-wise description (24, 25). Before each
D
meta-analysis, text tab delimited files (*.txt) of the microarray datasets were manually
PT E
prepared according to NetworkAnalyst tutorial. To illustrate, the prepared (*.txt) files included ID for probes in rows and for samples in columns. The first and second lines
CE
determined sample names and their two-group class, respectively. The two-group class for the first meta-analysis was "RN" for TKIs-resistant cells and "SN" for TKIs-sensitive cells.
AC
Likewise, the two-group class for the second meta-analysis was "RD" for TKI-treated TKIsresistant cells and "SD" for TKI-treated TKIs-sensitive cells. Next, the prepared files were uploaded in the "Multiple gene expression datasets" panel provide by the NetworkAnalyst. In the "Data Conversion" step, probe IDs were converted to Entrez IDs by selecting "H. sapiens (human)" as the organism type and the microarray platform of the uploaded study (listed in Table S1).
ACCEPTED MANUSCRIPT Heterogeneity factors between individual studies may distort meta-analysis; therefore, in the next step, individual datasets were normalized using the NetworkAnalyst tool. The variance stabilizing normalization (VSN) in combination with quantile normalization method was employed for normalization. In addition, to reduce potential study-specific batch effects, the normalized data sets were further adjusted by the well-established ComBat procedures
PT
offered by NetworkAnalyst (26). After applying the normalization, to check and visualize the
RI
sample clustering patterns, the results were examined using the principal component analysis
SC
(PCA).
Afterward, the combining effect size (ES) method was chosen for statistical meta-analysis of
NU
multiple gene expression datasets with different platforms (25). ES is a difference between
MA
group means divided by its standard deviation (i.e. Z-score). Remarkably, the combining ES method usually provides less but more accurate DEGs. According to NetworkAnalyst developers' protocol (25), when the Cochran’s Q test estimation (27) shows significant cross-
PT E
D
study deviations, the random effect model (REM) (28, 29) is most appropriate model for calculating combined ES. In fact, Cochran’s Q test is the weighted sum of squared differences between each study effects and the pooled effect across studies and if the
CE
calculated Q values deviated significantly from a chi-squared distribution, the REM would be
AC
appropriate. In the present meta-analyses, considering the deviation of Q values from a chisquared distribution (caused by heterogeneity between experiments), the REM was utilized to calculate the combined ES. The significance level for p-values was considered < 0.01.
2.3. Data visualization and functional interpretation To further interpret the biological implications of the identified DEGs involved in NSCLC TKI-treatment resistance, the merged expression data containing all DEGs was explored using various interactive visualization methods offered within NetworkAnalyst (24, 25).
ACCEPTED MANUSCRIPT Heatmap visualizations of a subset top 50 significantly dysregulated genes from metaanalyses were performed using the heatmap builder tool from the NetworkAnalyst. International Molecular Exchange (IMEx) provided by NetworkAnalyst was used to generate protein-protein interaction (PPI) networks under the literature-curated comprehensive data
PT
from InnateDB (30, 31). For avoiding complicated network and providing simpler visualization, "Zero order" interaction networks were created. The properties of nodes in PPI
RI
networks were displayed as hub degree (the number of connections that a node has to other
Set
Enrichment
Analysis
(GSEA)
desktop
software
v3.0
NU
Gene
SC
nodes) and betweenness values (the number of shortest paths passing through the node) (25).
(http://software.broadinstitute.org/gsea/index.jsp) was employed for determining whether
MA
identified DEGs lists are significantly enriched in a gene set pre-ranked by their correlation with the TKIs-resistance. Gene sets from molecular signature database (MSigDB) v6.0 (using
D
the H “Hallmark” collection) were evaluated for differential expression (32, 33). Notably, R
PT E
program (version 3.2.1) (34) was utilized for data preparation of GSEA related results. In this study, the enrichment scores (ES) significance of gene sets estimated by an empirical
CE
phenotype-based permutation test procedure. GSEA normalizes the ES for each gene set to account for the variation in set sizes, yields normalized enrichment score (NES), and finally
AC
calculates a false discovery rate (FDR) corresponding to each NES. The FDR estimates the probability that a list with a given NES represents a false positive finding; it is computed by comparing the tails of the observed and permutation-computed null distributions for the NES. In this study permutation test iterations number was adjusted to 10000. FDR q-value of <0.25 was applied as the statistical significance cut-of value.
ACCEPTED MANUSCRIPT 3. Results 3.1. Data collection and filtering Initially collected datasets were filtered based on our inclusion-exclusion criteria and the remaining five datasets were selected for further meta-analyses. Included datasets examined
PT
gene expression microarray profile from parental EGFR-mutant NSCLC cell lines (TKIs-
RI
sensitive) and acquired TKIs-resistant NSCLC cell lines in the presence or absence of a TKItreatment. Detailed characteristics of the datasets are presented in Table S1. Importantly,
SC
samples from GSE38310, GSE34228, and GSE38302 were further separated into two
NU
subgroups during our meta-analyses. Briefly, in this study, gene expression pattern of resistance towards TKIs in NSCLC cells in the presence or absence of a TKI-treatment was
MA
tested by meta-analysis approach.
D
3.2. Data normalization and batch effect adjustment
PT E
The aim of the current study was to identify the DEGs in TKIs-resistant NSCLC cell lines by performing two separate meta-analyses of the selected datasets. Initially, before meta-
CE
analysis, expression values should be compared at log scales with identical distribution across different arrays. Therefore, the selected GEO datasets were normalized by the well-
AC
established VSN in combination with quantile normalization. In addition, unbiased data integration is prevented by batch effects due to non-biological variations; thus, batch effect adjustment was conducted by NetworkAnalyst. Simply, the well-established ComBat procedures (26) removed batch effect of the study. To compare the sample clustering patterns after applying the ComBat procedures, the results were visualized using the PCA (Figure S2).
ACCEPTED MANUSCRIPT 3.3. Meta-analyses results Identification of DEGs signatures among acquired TKIs-resistant NSCLC cell lines (TKIs-resistant cells vs TKIs-sensitive cells) To find an expression pattern shared between erlotinib or gefitinib long-term resistant EGFR-
PT
mutant NSCLC cell lines, five microarray datasets (GSE38310 (15); GSE34228; GSE38302
RI
(35); GSE60189 (36); GSE83666) were analyzed using the Cochran’s Q test REM statistical analysis providing by NetworkAnalyst. This statistical procedure considers true effect size
SC
varied from study to study by integrating unknown cross-study heterogeneities. According to
NU
our meta-analysis, a total of 830 genes (named DEGs list 1), including 387 up-regulated and 443 down-regulated genes were dysregulated across the datasets under an adjusted p-value
MA
cut-off of 0.01 (Table S2). The top 10 most up-regulated genes and top 10 most downregulated genes are listed in Table 1. Moreover, Figure 1 shows the heatmap of top 50
D
significantly DEGs involved in TKIs resistance in the absence of TKIs-treatment.
PT E
Identification of DEGs signatures induced by TKIs-treatment in acquired TKIs-
sensitive cells)
CE
resistant NSCLC cell lines (TKI-treated TKIs-resistant cells vs TKI-treated TKIs-
AC
The same meta-analysis statistical procedure was performed to identify common DEGs involved in erlotinib or gefitinib resistant NSCLC cell lines in the presence of a TKItreatment by integration of three microarray datasets (GSE38310 (15); GSE34228; GSE38302 (35)). 1286 DEGs (named DEGs list 2) including 770 up-regulated and 516 down-regulated genes were identified. A list of the top 10 most up-regulated and downregulated genes are presented in Table 2 and the complete list of DEGs is provided in Supplementary Table 3 (Table S3). The heatmap representation of top 50 DEGs involved in TKIs-resistance in the presence of TKIs-treatment is provided in Figure 2.
ACCEPTED MANUSCRIPT What's more, identification of the overlapped genes between the lists of the two obtained DEGs was conducted. A total of 126 DEGs were identified to be common between our two lists. Remarkably, although 119 out of the 126 DEGs were dysregulated among our two comparisons with the same pattern, the 7 remaining DEGs, including CCDC102A, RAB26, SCO1, HNRNPAB, PKD1L2, URB2, and MCM10 were expressed across the two groups with
PT
the converse pattern (Table S4).
RI
3.4. Functional enrichment analysis
SC
To find the key hub genes among the obtained DEGs from our two meta-analyses, "Zero
NU
order" PPI networks were generated by IMEx under the InnateDB interactome information. The top 10 hub genes involved in the TKIs-resistance based on network topology calculations
MA
are listed in Table S5. Continent PPI networks were created with 250 nodes connected with 352 edges for TKIs resistance in the no treatment condition (Figure S3A) and with 705 nodes
D
connected with 1530 edges for TKIs resistance in the TKIs-treatment condition (Figure S3B).
PT E
In addition, list of all hub genes involved in TKIs resistance identified in the two metaanalyses with network topology information (betweenness centrality; degree) and meta-
CE
analysis details (combined ES; p-value) are provided in Supplementary Table 5 and 6, respectively (Table S5 and S6).
AC
To gain insights into the biological functions of genes involved in TKIs resistance, DEGs obtained from the two meta-analyses were ordered according to up- or down-regulation and then were used for performing pre-ranked GSEA with 10000 permutation tests. The GSEA analysis resulted in gene sets that enriched positively and negatively in our ranked DEGs lists. Positively and negatively regulated gene sets (pathways) and calculated NES and FDR q-value are listed in Table S7. Moreover, enriched gene sets and their involved genes in core enrichment are represented in Figure 3.
ACCEPTED MANUSCRIPT 4. Discussion EGFR TKIs have been widely used for treating EGFR-mutant NSCLC (11). In this study, five publicly available microarray datasets were jointly analyze to identify consistent and common pattern of gene expression across TKIs-resistant EGFR-mutant NSCLC cell lines
PT
comparing to TKIs-sensitive EGFR-mutant NSCLC cell lines in different treatment condition, no treatment and a TKI-treatment. The aim of this study was to shed additional
RI
information in molecular mechanism of NSCLS TKIs-resistance by integrating datasets from
SC
multiple studies (Table S1) in comprehensive network-based meta-analyses. A total of 830
NU
DEGs was reported in the TKIs resistance without TKIs treatment (DEGs list 1; Table S2) and 1286 DEGs were identified in the second meta-analysis (DEGs list 2; Table S3) under
MA
the significance threshold of p-value <0.01 which 126 DEGs were overlapped between the two DEGs lists (Table S4). Top 10 up- and down-regulated DEGs identified in each meta-
D
analysis are listed in table 1 and 3. While previous reports suggested that AXL kinase and
PT E
chemokines, including CXCL1, CXCL2, CXCL6, and CXCL8 (IL-8), FGF2 are up-regulated in EGFR TKI acquired resistance in EGFR-mutant NSCLCs (15, 35, 36), these genes
CE
expression were not altered significantly in our analyses. Moreover, in contrast to Liu et al. report (36), we found that ALDH1A1 is down-regulated in TKIs-resistant NSCLC cell lines
AC
treated with a TKI (DEGs list 2; Combined ES = -1.5262; p-value = 0.001234). Consistent with a prior study (35), we observed that FGFR1 up-regulation was associated with acquired resistance to TKIs (DEGs list 1; Combined ES = 0.90472; p-value = 0.0092201). Lee et al. found that CFB gene was up-regulated during erlotinib treatment (37); however, according to our results, CFB is down-regulated in TKIs-resistant NSCLC cell lines comparing with TKIssensitive NSCLC cell lines during TKI-treatment (DEGs list 2; Combined ES = -5.8458; pvalue = 4.06E-11).
ACCEPTED MANUSCRIPT Our meta-analyses illustrate that a broad range of DEGs are correlated with TKIs-resistance across EGFR mutant NSCLC cell lines. Initially, to evaluate the biological significance and molecular complexity of each DEGs list, related PPI network and its hub genes were recognized (Table S5 and S6). Top 10 TKIs-resistant hub genes using the first DEGs are three up-regulated genes (ESR1, CCND1, and YWHAG), six down-regulated genes (EP300,
PT
IKBKE, MDM2, CRK, DDX3X, and CPSF6), and one external node (HIST1H4E). Hamilton
RI
et al. demonstrated a robust association between therapeutic resistance of lung carcinoma
SC
cells and the expression of ESR1 which is in agreement with our result (DEGs list 1; Combined ES = 0.95154; p-value = 0.0054282). YWHAG has been reported to act as a tumor
NU
oncogene in NSCLC (38) and also was up-regulated in the present meta-analysis (DEGs list 1; Combined ES = 1.1176; p-value = 0.00084169). CCND1 is reportedly down-regulated by
MA
erlotinib treatment (39); interestingly, here, we found that this gene was significantly upregulated in TKIs-resistant NSCLC cell lines (DEGs list 1; Combined ES = 0.94275; p-value
D
= 0.0078723). EP300 expression had converse correlation with the resistance to TKIs
PT E
treatment (DEGs list 1; Combined ES = -1.4528; p-value = 2.05E-05) and it also has been reported to be a hub gene in NSCLC (40). For IKBKE, we identified that it was down-
CE
regulated in TKIs-resistant NSCLC cells (DEGs list 1; Combined ES = -1.2518; p-value = 0.0002273); however, J et al. have reported that over-expression of IKBKE induces
AC
chemoresistance in NSCLC (41). MDM2 has been identified as either a favorable or unfavorable prognostic biomarker in NSCLC (42), and in this study, its expression was conversely associated with TKIs-resistant NSCLC cell line (DEGs list 1; Combined ES = 1.2224; p-value = 0.00627). Although CRK has been recognized as an oncogene in NSCLC (43); surprisingly, we found that the down-regulation of CRK can be linked to the TKIsresistance in NSCLC (DEGs list 1; Combined ES = -1.1837; p-value = 0.00041723). Similarly, there is an inverse observation comparing this study (DEGs list 1; Combined ES =
ACCEPTED MANUSCRIPT -1.1939; p-value = 0.0066982) and previously reported study (44) for role of DDX3X gene in resistance to TKIs in NSCC. Furthermore, we identified CPSF6 (cleavage and polyadenylation specific factor 6) as a novel down-regulated hub gene in TKIs-resistant NSCLC (DEGs list 1; Combined ES = -1.1065; p-value = 0.0011359), which has not been
PT
reported in lung cancer to date. Moreover, top 10 hub genes in the second DEGs list are seven up-regulated genes (ELAVL1,
RI
CUL4A, PCNA, PAXIP1, RAD21, COPS6, and PSMA3) and three down-regulated genes
SC
(IKBKE, EP300, and SP1). While some of the top hub genes, including ELAVL1, PCNA, PAXIP1, RAD2, have been identified as potential lung cancer related genes (45-48), this is
NU
the first study which reports the relationship of these genes and NSCLC TKIs-resistance.
MA
Wang et al. revealed that CUL4A activates EGFR transcriptional expression (49). Likewise, we found that CUL4A is over-expressed in TKIs-resistant NSCLC cell lines by a TKI treatment (DEGs list 2; Combined ES = 1.2595; p-value = 0.0084691). Importantly, we
PT E
D
recognized COPS6 and PSMA3 as novel up-regulated hub genes induced by a TKI treatment in TKIs-resistant NSCLC cell lines (DEGs list 2; Combined ES = 2.1631; p-value = 1.53E-05 and Combined ES = 1.4301; p-value = 0.0026263) since these genes have not been reported
CE
before in lung cancer. In consistent with our first DEGs list, IKBKE was down-regulated in
AC
TKIs-resistant NSCLC cells when they are treated with a TKI (DEGs list 2; Combined ES = 1.5034; p-value = 0.0020303); nevertheless, it has been reported that over-expression of IKBKE induces chemoresistance in NSCLC (41). EP300 was down-regulated in the second DEGs list (DEGs list 2; Combined ES = -2.0116; p-value = 3.96E-05) as well as DEGs list 1. In spite of the positive correlation between SP1 and EGFR signal cascade (50), we found its down-regulation in EGFR TKIs-resistant NSCLC cells in the presence of a TKI-treatment (DEGs list 2; Combined ES = -1.9496; p-value = 7.10E-05).
ACCEPTED MANUSCRIPT Next, to further investigate the functional mechanisms of the obtained DEGs, we performed GSEA. GSEA revealed that glycolysis related genes are up-regulated and some of E2F targets are mainly down-regulated in TKIs-resistant NSCLC cell lines when they are not treated with a TKI. On the other hand, up-regulated genes in the second DEGs list, obtained from comparison between TKIs-resistant and -sensitive NSCLC cell lines treated by a TKI,
PT
are mainly involved in E2F targets, MYC targets, G2M checkpoint, and IL2 STAT5 signaling
RI
gene sets defined by MSigDB. Moreover, members of following gene sets are down-
SC
regulated in TKI-resistant NSCLC cell lines when treated with a TKI: interferon alpha response, interferon gamma response, coagulation, heme metabolism, complement,
NU
inflammatory response, apoptosis, allograft rejection, P53 pathway, myogenesis, TNFα signaling via NFκB, estrogen response early, and xenobiotic metabolism (Figure 3 and Table
MA
S7). In agreement with our result, Xie et al. reported that glycolysis metabolism was significantly up-regulated in erlotinib-resistant NSCLC cell line cells (51). Interestingly,
D
although some of genes encoding cell cycle related targets of E2F transcription factors in
PT E
TKIs-resistant NSCLC cells with no TKI-treatment were down-regulated, some targets of E2F transcription factors were up-regulated in TKIs-resistant NSCLC cells in the presence of
CE
a TKI (Figure 3; Table S7). To our knowledge, almost all enriched gene sets found in the
AC
present study are novel NSCLC TKIs-resistance related pathways. While the current investigation my gain insights into the robust genetic signatures and pathways, there are several limitations associated with the current study. Firstly, in spite of performing individual normalization for included datasets, heterogeneity across datasets (e.g., different platforms or TKIs-treatment condition) may cause biased impact on meta-analyses results. Secondly, although we made an attempt to collect all relevant datasets, another limitation is the relatively small number of datasets; only five datasets were used in the
ACCEPTED MANUSCRIPT present meta-analyses. Last, the results are based on cell lines which may not be able to thoroughly generalize molecular signatures of resistant NSCLC among patient.
5. Conclusion
PT
We identified two common DEGs lists that are likely to be involved in the NSCLC TKIsresistance by performing two separate meta-analyses of five public microarray datasets. The
RI
systemic PPI network analysis of the significant DEGs introduced some novel hub genes
SC
underlying NSCLC TKIs-resistance. In addition, several contributing pathways were enriched using two DEGs NSCLC TKIs-resistance lists. The present study may gain insight
NU
into the complex molecular features and also provide a novel gene expression signature of
AC
CE
PT E
D
MA
NSCLC TKIs-resistance.
ACCEPTED MANUSCRIPT 6. References 1.
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA: a cancer journal for
clinicians. 2016;66(1):7-30. 2.
Lantermann AB, Chen D, McCutcheon KJ, Hoffman GR, Frias E, Ruddy DA, et al.
PT
Inhibition of casein kinase 1 alpha prevents acquired drug resistance to erlotinib in EGFRmutant non-small cell lung cancer. Cancer research. 2015:canres. 1113.2015. Fichter CD, Gudernatsch V, Przypadlo CM, Follo M, Schmidt G, Werner M, et al.
RI
3.
SC
ErbB targeting inhibitors repress cell migration of esophageal squamous cell carcinoma and
NU
adenocarcinoma cells by distinct signaling pathways. Journal of Molecular Medicine. 2014;92(11):1209-23.
Lynch TJ, Bell DW, Sordella R, Gurubhagavatula S, Okimoto RA, Brannigan BW, et
MA
4.
al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of lung
cancer
gefitinib.
New
England
Journal
of
Medicine.
PT E
2004;350(21):2129-39. 5.
to
D
non–small-cell
Pao W, Miller V, Zakowski M, Doherty J, Politi K, Sarkaria I, et al. EGF receptor
CE
gene mutations are common in lung cancers from “never smokers” and are associated with sensitivity of tumors to gefitinib and erlotinib. Proceedings of the National Academy of
AC
Sciences of the United States of America. 2004;101(36):13306-11. 6.
Paez JG, Jänne PA, Lee JC, Tracy S, Greulich H, Gabriel S, et al. EGFR mutations in
lung
cancer:
correlation
with
clinical
response
to
gefitinib
therapy.
Science.
2004;304(5676):1497-500. 7.
Gazdar AF. Personalized medicine and inhibition of EGFR signaling in lung cancer.
The New England journal of medicine. 2009;361(10):1018. 8.
Gower A, Wang Y, Giaccone G. Oncogenic drivers, targeted therapies, and acquired
resistance in non-small-cell lung cancer. Journal of molecular medicine. 2014;92(7):697-707.
ACCEPTED MANUSCRIPT 9.
Ohashi K, Maruvka YE, Michor F, Pao W. Epidermal growth factor receptor tyrosine
kinase inhibitor–resistant disease. Journal of Clinical Oncology. 2013;31(8):1070-80. 10.
Pao W, Miller VA, Politi KA, Riely GJ, Somwar R, Zakowski MF, et al. Acquired
resistance of lung adenocarcinomas to gefitinib or erlotinib is associated with a second mutation in the EGFR kinase domain. PLoS medicine. 2005;2(3):e73. Kobayashi S, Boggon TJ, Dayaram T, Jänne PA, Kocher O, Meyerson M, et al.
PT
11.
RI
EGFR mutation and resistance of non–small-cell lung cancer to gefitinib. N Engl J Med.
12.
SC
2005;2005(352):786-92.
Engelman JA, Zejnullahu K, Mitsudomi T, Song Y, Hyland C, Park JO, et al. MET
NU
amplification leads to gefitinib resistance in lung cancer by activating ERBB3 signaling. science. 2007;316(5827):1039-43.
Bivona TG, Hieronymus H, Parker J, Chang K, Taron M, Rosell R, et al. FAS and
MA
13.
NF-κB signalling modulate dependence of lung cancers on mutant EGFR. Nature.
Arcila ME, Oxnard GR, Nafa K, Riely GJ, Solomon SB, Zakowski MF, et al.
PT E
14.
D
2011;471(7339):523.
Rebiopsy of lung cancer patients with acquired resistance to EGFR inhibitors and enhanced
CE
detection of the T790M mutation using a locked nucleic acid-based assay. Clinical cancer research. 2011;17(5):1169-80. Zhang Z, Lee JC, Lin L, Olivas V, Au V, LaFramboise T, et al. Activation of the
AC
15.
AXL kinase causes resistance to EGFR-targeted therapy in lung cancer. Nature genetics. 2012;44(8):852. 16.
Mesrian Tanha H, Mojtabavi Naeini M, Rahgozar S, Moafi A, Honardoost MA.
Integrative computational in-depth analysis of dysregulated miRNA-mRNA interactions in drug-resistant pediatric acute lymphoblastic leukemia cells: an attempt to obtain new
ACCEPTED MANUSCRIPT potential gene-miRNA pathways involved in response to treatment. Tumor Biology. 2016;37(6):7861-72. 17.
Jha PK, Vijay A, Sahu A, Ashraf MZ. Comprehensive Gene expression meta-analysis
and integrated bioinformatic approaches reveal shared signatures between thrombosis and myeloproliferative disorders. Scientific reports. 2016;6. Park R, Kim T-H, Ji JD. Gene Expression Profile in Patients with Axial
PT
18.
RI
Spondyloarthritis: Meta-analysis of Publicly Accessible Microarray Datasets. Journal of
19.
SC
Rheumatic Diseases. 2016;23(6):363-72.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al.
monitoring. science. 1999;286(5439):531-7.
Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJ, Marra MA. Sequence
MA
20.
NU
Molecular classification of cancer: class discovery and class prediction by gene expression
biases in large scale gene expression profiling data. Nucleic acids research. 2006;34(12):e83-
Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression
PT E
21.
D
e.
and hybridization array data repository. Nucleic acids research. 2002;30(1):207-10. Rung J, Brazma A. Reuse of public genome-wide gene expression data. Nature
CE
22.
reviews Genetics. 2013;14(2):89. Griffith OL, Melck A, Jones SJ, Wiseman SM. Meta-analysis and meta-review of
AC
23.
thyroid cancer gene expression profiling studies identifies important diagnostic biomarkers. Journal of Clinical Oncology. 2006;24(31):5043-51. 24.
Xia J, Fjell CD, Mayer ML, Pena OM, Wishart DS, Hancock RE. INMEX—a web-
based tool for integrative meta-analysis of expression data. Nucleic acids research. 2013;41(W1):W63-W70.
ACCEPTED MANUSCRIPT 25.
Xia J, Gill EE, Hancock RE. NetworkAnalyst for statistical, visual and network-based
meta-analysis of gene expression data. Nature protocols. 2015;10(6):823-44. 26.
Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression
data using empirical Bayes methods. Biostatistics. 2007;8(1):118-27. 27.
Cochran WG. The combination of estimates from different experiments. Biometrics.
Choi JK, Yu U, Kim S, Yoo OJ. Combining multiple microarray studies and
RI
28.
PT
1954;10(1):101-29.
29.
SC
modeling interstudy variation. Bioinformatics. 2003;19(suppl_1):i84-i90. Marot G, Foulley J-L, Mayer C-D, Jaffrézic F. Moderated effect size and P-value
30.
NU
combinations for microarray meta-analyses. Bioinformatics. 2009;25(20):2692-9. Orchard S, Kerrien S, Abbani S, Aranda B, Bhate J, Bidwell S, et al. Protein
MA
interaction data curation: the International Molecular Exchange (IMEx) consortium. Nature methods. 2012;9(4):345-50.
Breuer K, Foroushani AK, Laird MR, Chen C, Sribnaia A, Lo R, et al. InnateDB:
D
31.
PT E
systems biology of innate immunity and beyond—recent updates and continuing curation. Nucleic acids research. 2012;41(D1):D1228-D33. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP.
CE
32.
Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739-40. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al.
AC
33.
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):1554550. 34.
Gentleman R. Using Categories to Model Genomic Data 2009. 2013.
ACCEPTED MANUSCRIPT 35.
Terai H, Soejima K, Yasuda H, Nakayama S, Hamamoto J, Arai D, et al. Activation
of the FGF2-FGFR1 autocrine pathway: a novel mechanism of acquired resistance to gefitinib in NSCLC. Molecular cancer research. 2013;11(7):759-67. 36.
Liu Y-N, Chang T-H, Tsai M-F, Wu S-G, Tsai T-H, Chen H-Y, et al. IL-8 confers
resistance to EGFR inhibitors by inducing stem cell properties in lung cancer. Oncotarget.
Lee H-J, Zhuang G, Cao Y, Du P, Kim H-J, Settleman J. Drug resistance via feedback
RI
37.
PT
2015;6(12):10415.
38.
SC
activation of Stat3 in oncogene-addicted cancer cells. Cancer cell. 2014;26(2):207-21. Wang P, Deng Y, Fu X. MiR-509-5p suppresses the proliferation, migration, and
NU
invasion of non-small cell lung cancer by targeting YWHAG. Biochemical and biophysical research communications. 2017;482(4):935-41.
Rothenberg SM, Concannon K, Cullen S, Boulay G, Turke AB, Faber AC, et al.
MA
39.
Inhibition of mutant EGFR in lung cancer cells triggers SOX2-FOXO6-dependent survival
El-aarag SA, Mahmoud A, Hashem MH, Elkader HA, Hemeida AE, ElHefnawi M. In
PT E
40.
D
pathways. Elife. 2015;4:e06132.
silico identification of potential key regulatory factors in smoking-induced lung cancer. BMC
41.
CE
medical genomics. 2017;10(1):40.
Guo J, Kim D, Gao J, Kurtyka C, Chen H, Yu C, et al. IKBKE is induced by STAT3
AC
and tobacco carcinogen and determines chemosensitivity in non-small cell lung cancer. Oncogene. 2013;32(2):151. 42.
Jiang L-Y, Bi R, Ding F-B, Wang M-S, Mei J, He Y. Prognostic significance of
overexpressed matrix metalloproteinase-2, mouse-double minute: 2 homolog and epidermal growth factor receptor in non-small cell lung cancer. Journal of BU ON: official journal of the Balkan Union of Oncology. 2016;21(2):341-8.
ACCEPTED MANUSCRIPT 43.
Mortazavi F, Lu J, Phan R, Lewis M, Trinidad K, Aljilani A, et al. Significance of
KRAS/PAK1/Crk pathway in non-small cell lung cancer oncogenesis. BMC cancer. 2015;15(1):381. 44.
Nozaki K, Kagamu H, Shoji S, Igarashi N, Ohtsubo A, Okajima M, et al. DDX3X
induces primary EGFR-TKI resistance based on intratumor heterogeneity in lung cancer cells
Yang Z, Zheng R, Gao Y, Zhang Q. RETRACTED ARTICLE: Gene expression
RI
45.
PT
harboring EGFR-activating mutations. PloS one. 2014;9(10):e111019.
SC
profiles on predicting protein interaction network and exploring of new treatments for lung cancer. Molecular Biology Reports. 2014;41(12):8203-10.
Setia S, Sanyal SN. Downregulation of NF-κB and PCNA in the regulatory pathways
NU
46.
of apoptosis by cyclooxygenase-2 inhibitors in experimental lung cancer. Molecular and
47.
MA
cellular biochemistry. 2012;369(1-2):75-86.
Jhuraney A, Woods NT, Wright G, Rix L, Kinose F, Kroeger JL, et al. PAXIP1
D
potentiates the combination of WEE1 inhibitor AZD1775 and platinum agents in lung cancer.
48.
PT E
Molecular cancer therapeutics. 2016;15(7):1669-81. Kang H, Yoo S, Choi J, Hong M, Do S, Jin C, et al. Polymorphisms in mitotic
CE
checkpoint-related genes can influence survival outcomes of early-stage non-small cell lung cancer. Oncotarget. 2017.
Wang Y, Zhang P, Liu Z, Wang Q, Wen M, Wang Y, et al. CUL4A overexpression
AC
49.
enhances lung tumor growth and sensitizes lung cancer cells to erlotinib via transcriptional regulation of EGFR. Molecular cancer. 2014;13(1):252. 50.
Tominaga T, Tsuchiya T, Mochinaga K, Arai J, Yamasaki N, Matsumoto K, et al.
Epidermal growth factor signals regulate dihydropyrimidine dehydrogenase expression in EGFR-mutated non-small-cell lung cancer. BMC cancer. 2016;16(1):354.
ACCEPTED MANUSCRIPT 51.
Xie C, Jin J, Bao X, Zhan W-H, Han T-Y, Gan M, et al. Inhibition of mitochondrial
glutaminase activity reverses acquired erlotinib resistance in non-small cell lung cancer.
AC
CE
PT E
D
MA
NU
SC
RI
PT
Oncotarget. 2016;7(1):610.
ACCEPTED MANUSCRIPT 7. Legends 7.1. Figure legends Figure 1- Heatmap representation of expression profiles for the top 50 significantly TKIs DEGs identified by the meta-analysis.
PT
Figure 2- Heatmap representation of expression profiles for the top 50 significantly TKIs DEGs induced by
RI
TKIs treatment.
Figure 3- Enriched signaling pathways (gene sets) and their important counterparts identified by GSEA using
SC
the two DEGs lists. Red and green boxes represent positively and negatively enriched gene sets, with FDR q
7.2. Supplementary materials legends
NU
value < 0.25.
MA
Figure S1- The pipeline of the present study. Two separate meta-analyses were performed identically. Figure S2- PCA plots after batch effect correction (A) The first meta-analysis (TKIs-resistant cells vs TKIs-
D
sensitive cells); (B) The second meta-analysis (TKI-treated TKIs-resistant cells vs TKI-treated TKIs-sensitive
PT E
cells). Different datasets and samples are represented by distinct color and shapes, respectively. SN, untreated sensitive cells; RN, untreated resistant cells; SD, treated sensitive cells; RD, treated resistant cells
CE
Figure S3- PPI networks analyses of the defined DEGs. (A) The "Zero order" PPI network topology obtained using DEGs in TKIs resistance; (B) The "Zero order" PPI network topology obtained using DEGs in TKIs
AC
resistance in the presence of TKIs treatment. Nodes in red indicate upregulated genes and nodes in green indicate down-regulated genes. Table S1- The complete list of differentially expressed genes identified from the first meta-analysis using adjusted p value cut-off of 0.01. (DEGs list 1) Table S2- The complete list of differentially expressed genes identified from the second meta-analysis using adjusted p value cut-off of 0.01. (DEGs list 2) Table S3- Common differentially expressed genes between the two meta-analyses.
ACCEPTED MANUSCRIPT Table S4- The complete list of nodes in the PPI network recognized by DEGs list 1 Table S5- The complete list of nodes in the PPI network recognized by DEGs list 2
AC
CE
PT E
D
MA
NU
SC
RI
PT
Table S6- All GSEA results
ACCEPTED MANUSCRIPT Table 1- Top 20 shared TKIs resistant DEGs identified in the meta-analysis. Genes were ranked based on combined ES. The corresponding combined ES and P values were calculated by REM analysis. EntrezID
Gene symbol
Combined ES
P value
Top 10 upregulated DEGs TRPC1
3.4499
8.05E-05
27122
DKK3
3.4041
0.0008
123920
CMTM3
3.2033
0.0000
91156
IGFN1
3.1897
0.0007
5792
PTPRF
3.0375
0.0000
9482
STX8
2.6423
2.46E-11
28996
HIPK2
2.6404
23413
NCS1
2.6326
84255
SLC37A3
2.6082
10447
FAM3C
2.6004
RI
SC 0.0038
NU
2.46E-11
MA
Top 10 downregulated DEGs
PT
7220
1.08E-08 0.0003
RARRES3
-3.8741
0.0015
197259
MLKL
-3.1458
6.70E-05
9743
ARHGAP32
55572
FOXRED1
-3.0263
3.02E-10
2537
IFI6
-2.9502
0.0042
81610
FAM83D
-2.9035
0.0014
FAM120A
-2.8258
5.14E-12
SLC9A3R1
-2.8011
5.14E-12
OAS1
-2.7399
0.0003
SLC31A1
-2.7025
3.03E-11
4938 1317
-3.062
PT E
CE
9368
AC
23196
D
5920
ES, effect size; DEG, differentially expressed gene
0.0000
ACCEPTED MANUSCRIPT Table 2- Top 20 shared TKIs resistant DEGs identified in the meta-analysis with a TKI treatment condition. Genes were ranked based on combined ES. The corresponding combined ES and P values were calculated by REM analysis. EntrezID
Gene symbol
Combined ES
P value
Top 10 upregulated DEGs SPRED1
8.8373
1.22E-07
22822
PHLDA1
7.3298
1.17E-07
8187
ZNF239
7.2459
0.0024
7804
LRP8
6.7936
0.0014
3655
ITGA6
6.7223
0.0043
54206
ERRFI1
6.5075
136
ADORA2B
6.3026
1723
DHODH
5.2714
1964
EIF1AX
5.2099
55646
LYAR
5.1873
1.44E-08
-7.4067
2.73E-11
RI
SC
4.06E-11
NU
4.06E-11
MA
Top 10 downregulated DEGs
PT
161742
2.60E-10 0.0023
PRR15L
7134
TNNC1
79153
GDPD3
-6.5721
0.0014
389337
ARHGEF37
-6.5109
5.99E-09
93082
NEURL3
-6.2121
4.06E-11
PDZK1IP1
-6.1245
8.55E-05
LY6D
-5.9753
0.0034
CFB
-5.8458
4.06E-11
SLC7A7
-5.7815
0.0036
TM7SF2
-5.7484
0.0062
629 9056 7108
-6.9102
PT E
CE
8581
AC
10158
D
79170
ES, effect size; DEG, differentially expressed gene
0.0007
ACCEPTED MANUSCRIPT Key messages
Gene expression pattern of resistance towards TKIs in NSCLC is tested by metaanalysis approach. The key hub genes involved in the TKIs-resistance are introduced.
Positively and negatively enriched signaling pathways involved in the TKIs-resistance
PT
are introduced.
RI
Several robust candidate genes and pathways of the NSCLC acquired TKI-resistance
CE
PT E
D
MA
NU
SC
are demonstrated for the first time.
AC
Figure 1
Figure 2
Figure 3