Methylation signature genes identification of cancers occurrence and pattern recognition


Computational Biology and Chemistry 85 (2020) 107198

Contents lists available at ScienceDirect

Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/cbac

Research Article


Xuedong Wang, Wenhui Shang, Xiaoqin Li*, Yu Chang

College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China

ARTICLE INFO

Keywords: Cancer occurrence; TCGA; Cancer genomics; Feature selection; Bioinformatics

ABSTRACT

To identify the signature genes of tumorigenesis, we applied pattern-recognition methods to gene methylation (ME) data collected from The Cancer Genome Atlas (TCGA) database, using only normal and cancer samples. We analyzed the DNA methylation profiles of six types of cancer and selected the ME signature genes for each cancer by combining correlation analysis, Student's t-test and Elastic Net. With support vector machine modeling, the accuracy of the ME signature genes was as high as 98 % for the training set and as high as 97 % for the independent test set; the recognition accuracy for stage I was more than 97 % for the training set and more than 98 % for the test set. We then obtained the common signature genes and common pathways that emerge in multiple cancers. A functional analysis of these signature genes indicates that the identified signatures are directly related to tumorigenesis and are important for understanding the pathogenesis of cancer and for early therapy.

1. Introduction

Cancer, also known as malignant tumor, is a very complex disease that affects human tissues in many ways (Dalkic et al., 2010). It poses a serious threat to human health and shows high morbidity and mortality worldwide (Jemal et al., 2011). Clinically, the "three early" principle of early detection, early diagnosis, and early treatment is generally advocated, but many patients already have advanced disease or distant metastases at the time of treatment and have lost the best treatment opportunity (Pujol et al., 2004). In lung cancer, for example, the 5-year survival rate of non-small cell lung cancer is only 15 %, and the 2-year survival rate of small cell lung cancer is only 1 %. When diagnosed, 70 % of patients have locally advanced disease or distant metastasis (stage IV); they have missed the opportunity for surgical treatment and lack effective treatment. If the patient can be diagnosed in stage Ia, the 5-year survival can be increased to 80 % (Sotiriou et al., 2006). This shows that studying the pathogenesis of cancer and identifying the key genes that cause tumors is conducive to early diagnosis of cancer at the molecular level, to gaining more treatment time for patients and to gene-targeted therapy, and that it has important theoretical and clinical value.

As one of the vital epigenetic mechanisms, DNA methylation (DNAm) plays a key role in genome stability, chromatin structure, and maintenance of pluripotency in normal somatic cells (Carvalho et al., 2010; Meissner, 2010). In particular, variations in DNAm accompany the early stages of human carcinogenesis, and these changes can be reversed by drugs (Wang et al., 2016a; Yoo and Jones, 2006). As more methylation (ME) changes have been discovered, it has gradually been recognized that this epigenetic modification of DNA plays an important role in the early diagnosis of cancer. Abbaszadegan et al. found that the p16 promoter is hypermethylated in the early stage of gastric cancer and is associated with the malignancy of gastric cancer, so it can be used as a marker for early detection of gastric cancer (Abbaszadegan et al., 2008). Diaz-Lagares et al. found that the CpG islands of the BCAT1, CDO1, TRIM58 and ZNF177 genes were significantly hypermethylated in tumor tissues and were highly specific in the diagnosis of early lung cancer (Diaz-Lagares et al., 2016).

To identify cancer-associated genes for diagnosis and prognosis, high-throughput microarrays have recently been used to study the relationship between genetic changes and diseases such as cancer. However, microarray data are ultrahigh-dimensional and noisy and have small sample sizes, so a few important genes are easily submerged in the noise of tens of thousands of genome-wide genes. There is therefore substantial interest in reducing the dimension. Bibikova et al. performed a cluster analysis of the methylation status of 1536 loci in 371 genes in multiple cancers as well as in normal cells, and discovered 55 sites that better differentiate between cancer cells and normal cells and between various cancer cell types (Bibikova et al., 2006). Wang et al. extracted 48 smoking-related methylation signature genes based on 127 lung adenocarcinoma samples downloaded from the TCGA database, which can distinguish between smoking and non-smoking samples of lung adenocarcinoma (accuracy 87.5 %) (Wang et al., 2016b).

The above studies provide very little information on the early stages of cancer. In this study, we analyzed the early occurrence of 6 cancers based on their methylation and gene expression (GE) data. Concentrating on the ME signature genes and GE signature genes closely related to the early occurrence of cancer, we collected normal (paracancerous) and stage I samples of 6 cancers from the TCGA database and identified signature genes by means of bioinformatics methods. The analysis of the signature genes reveals the similarities and differences in gene ME and GE levels between normal and stage I samples, and provides a theoretical basis for further explaining the early pathogenesis of cancer.

* Corresponding author.
E-mail addresses: [email protected] (X. Wang), [email protected] (W. Shang), [email protected] (X. Li), [email protected] (Y. Chang).

https://doi.org/10.1016/j.compbiolchem.2019.107198
Received 23 January 2019; Received in revised form 19 November 2019; Accepted 30 December 2019; Available online 22 February 2020
1476-9271/© 2020 Elsevier Ltd. All rights reserved.

Table 2 The Summary Information of Gene Expression Data.
Cancer Type | Normal | Stage I | No. Gene
breast invasive carcinoma | 113 | 182 | 57179
kidney renal clear cell carcinoma | 72 | 272 | 57985
kidney renal papillary cell carcinoma | 32 | 172 | 57415
liver hepatocellular carcinoma | 50 | 173 | 54487
lung squamous cell carcinoma | 49 | 245 | 57852
thyroid carcinoma | 58 | 286 | 56988

2. Data and methods

2.1.3. Gene expression data

We selected the same six cancers as for the methylation data and screened the normal and stage I samples. The TCGA RNA-Seq data (level three) generated on the IlluminaHiSeq_RNASeqV2 platform were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). We removed the genes with '0' values across all samples and normalized the remaining expression values with "MinMaxScaler" in Python. The GE data are summarized in Table 2.
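The expression pretreatment described here can be sketched as follows. This is a minimal illustration rather than the authors' script: the samples-by-genes layout, the function name `normalize_expression` and the toy matrix are assumptions, and scaling each gene (column) independently is the default behavior of scikit-learn's `MinMaxScaler`, which the text names without stating the axis.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def normalize_expression(expr):
    """Drop genes that are 0 in every sample, then min-max scale each gene.

    expr: array of shape (n_samples, n_genes); this layout is an assumption.
    Returns the scaled matrix and a boolean mask of the genes that were kept.
    """
    expr = np.asarray(expr, dtype=float)
    keep = ~(expr == 0).all(axis=0)           # gene is non-zero in at least one sample
    scaled = MinMaxScaler().fit_transform(expr[:, keep])  # each column mapped to [0, 1]
    return scaled, keep


# Toy matrix (hypothetical values): 3 samples x 3 genes, middle gene is 0 everywhere.
toy = np.array([[1.0, 0.0, 10.0],
                [3.0, 0.0, 20.0],
                [5.0, 0.0, 40.0]])
scaled, keep = normalize_expression(toy)
```

In this toy case the all-zero gene is dropped and the two remaining genes are rescaled to the range 0 to 1.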

2.1. Data and pretreatment

2.2. Identification of signature genes

2.1.1. Methylation data

In this study, we selected 6 cancers with at least 40 normal samples each: breast invasive carcinoma (BRCA), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung squamous cell carcinoma (LUSC) and thyroid carcinoma (THCA). Normal and stage I status is based on the patients' clinical information provided by TCGA; the subclasses of stage I were ignored (IA and IB are both treated as stage I). Because multiple cancers are analyzed and the sample size is limited, we temporarily ignored confounding variables such as age, smoking history, and drinking history. The DNA methylation data (level three) were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). The methylation level of each probe is measured as a beta value, which ranges from 0 (unmethylated) to 1 (completely methylated); missing values are represented as 'NA'.

Our study is a binary classification problem, one of the major problems in machine learning. Because the dimension of the features is much larger than the number of samples and the relationships between the features are relatively complicated, many problems arise in the study. To overcome these adverse factors and improve the accuracy of feature recognition, we applied a combination of several selection methods to the cancer methylation data. All screening was done in Python. Fig. 1 shows the flow chart of signature gene identification for cancer occurrence. The methods are as follows:

1. Correlation selection. We computed Pearson correlation coefficients (PCCs) between the methylation profiles and the expression profiles of genes for each cancer using the matched RNA-seq and HM450 data. Genes whose absolute PCC was higher than 0.5 were retained as the first part of the candidate gene set.

2. Significant difference selection. First, we selected the genes that correlate significantly with the category labels (Spearman correlation coefficient > 0.5). Then we selected significantly different genes using Student's t-test (t-test); homogeneity-of-variance tests

2.1.2. Methylation data pretreatment

Using Python, we integrated the methylation data with the clinical information and screened out the normal and stage I samples. The data generated on the HM450 platform were used as the training set, and the data generated on the HM27 platform were used as the independent test set. We selected the probes located in the genomic region from 200 bp upstream to the transcription start site (TSS200) and removed all sites located on the sex chromosomes. To improve the accuracy of the classification results and the speed of model training, we removed the probes with 'NA' values in > 50 % of samples and imputed the remaining 'NA' values using the 10-nearest-neighbors imputation procedure with the 'knnimpute' function (Fleischer et al., 2014) in R. We used the mean beta value of all probes located in the TSS200 region of a gene as the ME level of that gene. The resulting ME data for the six cancers are summarized in Table 1.

Table 1 The Summary Information of Methylation Data.
Cancer Type | Normal | Stage I | No. Gene | Test Set
breast invasive carcinoma | 95 | 126 | 22766 | Yes
kidney renal clear cell carcinoma | 154 | 152 | 22794 | Yes
kidney renal papillary cell carcinoma | 45 | 168 | 22794 | No
liver hepatocellular carcinoma | 49 | 174 | 22793 | No
lung squamous cell carcinoma | 42 | 172 | 22793 | Yes
thyroid carcinoma | 56 | 287 | 22771 | No
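The three pretreatment operations (probe filtering, imputation, gene-level averaging) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the paper imputes with the 'knnimpute' function in R, whereas this sketch swaps in scikit-learn's `KNNImputer`, which fills a value from the most similar samples and may therefore give different numbers. The function name, the samples-by-probes layout and the toy values are hypothetical.

```python
import numpy as np
from sklearn.impute import KNNImputer


def pretreat_methylation(beta, probe_gene, max_na_frac=0.5, k=10):
    """Filter, impute and summarise probe-level beta values per gene.

    beta       : (n_samples, n_probes) array, np.nan marks 'NA' (assumed layout)
    probe_gene : gene symbol of each TSS200 probe column
    """
    beta = np.asarray(beta, dtype=float)
    probe_gene = np.asarray(probe_gene)

    # 1) Drop probes that are 'NA' in more than 50 % of the samples.
    keep = np.isnan(beta).mean(axis=0) <= max_na_frac
    beta, probe_gene = beta[:, keep], probe_gene[keep]

    # 2) Fill the remaining 'NA' values from the k most similar samples
    #    (stand-in for the R imputation step used in the paper).
    imputed = KNNImputer(n_neighbors=k).fit_transform(beta)

    # 3) Gene-level ME value = mean beta over that gene's TSS200 probes.
    genes = sorted(set(probe_gene.tolist()))
    gene_level = np.column_stack(
        [imputed[:, probe_gene == g].mean(axis=1) for g in genes]
    )
    return genes, gene_level


# Toy data (hypothetical): 4 samples, 4 probes; the last probe is mostly 'NA'.
beta = np.array([[0.10, 0.30, 0.80, np.nan],
                 [0.20, 0.40, 0.75, np.nan],
                 [np.nan, 0.50, 0.70, np.nan],
                 [0.30, 0.50, 0.60, 0.50]])
genes, gene_level = pretreat_methylation(beta, ["A", "A", "B", "B"], k=2)
```

In the toy call the fourth probe is discarded, so gene B is represented by its single remaining probe.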

Fig. 1. Flow chart of signature genes identification for cancers occurrence.


must be done before the t-test. For candidate genes that meet the homogeneity of variance assumption, the t-test was used to select signature genes, and significantly different genes were retained [p < 0.05, false discovery rate (FDR) < 0.01]; for genes that do not meet it, the Kruskal–Wallis test was used instead, with the same thresholds (p < 0.05, FDR < 0.01). The significantly different genes formed the second part of the candidate gene set.

3. Elastic Net filter (Hui and Hastie, 2005). The candidate genes collected in the previous two steps were pooled, and the final selection was made with "ElasticNetCV" (cv = 10) in Python.
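The three selection steps can be sketched end to end as follows. Several details are assumptions because the text does not fix them: Levene's test stands in for the unnamed homogeneity test, the FDR is computed with the Benjamini-Hochberg procedure, the Spearman threshold is applied to the absolute value, and "selected by ElasticNetCV" is read as keeping genes with a non-zero coefficient. All function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import ElasticNetCV


def correlated_genes(meth, expr, cut=0.5):
    """Step 1: genes whose methylation/expression |PCC| exceeds `cut`."""
    m = meth - meth.mean(axis=0)
    e = expr - expr.mean(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (m * e).sum(axis=0) / np.sqrt((m ** 2).sum(axis=0) * (e ** 2).sum(axis=0))
    return set(np.flatnonzero(np.abs(r) > cut).tolist())


def bh_adjust(p):
    """Benjamini-Hochberg adjustment (assumed; the text only says 'FDR')."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    scaled = p[order] * p.size / np.arange(1, p.size + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]   # enforce monotonicity
    adjusted = np.empty_like(p)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def significant_genes(X, y, rho_cut=0.5, p_cut=0.05, fdr_cut=0.01):
    """Step 2: label-correlation filter, then t-test or Kruskal-Wallis per gene."""
    normal, stage1 = X[y == 0], X[y == 1]
    kept, pvals = [], []
    for j in range(X.shape[1]):
        rho, _ = stats.spearmanr(X[:, j], y)
        if not abs(rho) > rho_cut:           # also skips NaN from constant columns
            continue
        _, p_var = stats.levene(normal[:, j], stage1[:, j])
        if p_var > 0.05:                     # variances look equal: Student's t-test
            _, p = stats.ttest_ind(normal[:, j], stage1[:, j])
        else:                                # otherwise: Kruskal-Wallis test
            _, p = stats.kruskal(normal[:, j], stage1[:, j])
        kept.append(j)
        pvals.append(p)
    if not kept:
        return set()
    pvals = np.array(pvals)
    ok = (pvals < p_cut) & (bh_adjust(pvals) < fdr_cut)
    return set(np.array(kept)[ok].tolist())


def elastic_net_filter(X, y, candidates, cv=10):
    """Step 3: keep pooled candidates with a non-zero Elastic Net coefficient."""
    cols = sorted(candidates)
    model = ElasticNetCV(cv=cv).fit(X[:, cols], y)
    return [c for c, w in zip(cols, model.coef_) if w != 0]


# Synthetic example: 60 samples, 30 genes; genes 0-2 are shifted in stage I.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
meth = rng.normal(0.5, 0.05, size=(60, 30))
meth[y == 1, :3] += 0.3
expr = rng.normal(size=(60, 30))
expr[:, :3] = -2.0 * meth[:, :3] + rng.normal(0, 0.05, size=(60, 3))

candidates = correlated_genes(meth, expr) | significant_genes(meth, y)
signature = elastic_net_filter(meth, y, candidates)
```

Pooling the two candidate sets with a set union mirrors the statement that the genes from the first two steps were combined before the Elastic Net filter.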

2.3. Model and evaluation

To make full use of the data, the training set and the test set were each divided by six-fold cross-validation, and a support vector machine (SVM) was used for modeling and prediction. Accuracy (ACC), sensitivity (SEN), specificity (SPE), and the Matthews correlation coefficient (MCC) were used to evaluate the classification results of the model. They are defined as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

$$\mathrm{SEN} = \frac{TP}{TP + FN} \tag{2}$$

$$\mathrm{SPE} = \frac{TN}{FP + TN} \tag{3}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{4}$$

The positive samples are cancer stage I samples and the negative samples are normal samples. True positive (TP) is the number of correctly classified stage I samples; false positive (FP) is the number of normal (adjacent) samples misclassified as stage I; true negative (TN) is the number of correctly classified normal samples; and false negative (FN) is the number of stage I samples misclassified as normal.

2.4. Pathway enrichment analysis

To evaluate the functional relevance of the signature genes to cancer, we performed pathway enrichment analysis for the methylation signature genes. The databases used for the enrichment analysis were Gene Ontology (GO) biological processes and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The functional annotation tool of the DAVID Bioinformatics Resources 6.8 (NIAID) was used for the enrichment analysis. Significantly enriched pathways were those with P < 0.05.

3. Results and discussion

3.1. Pattern recognition results

The aim of this paper is to identify signature genes that are crucial for the development of cancer and to lay the foundation for revealing the mechanism of cancer. For this reason, the identified genes must have a sufficiently high ability to classify cancer, and the classification model must have high classification accuracy, in order to demonstrate the validity of the genes. Using the signature gene screening method described above, the numbers of ME signature genes for the six cancers are 256, 184, 163, 236, 219 and 111. The data set was divided by six-fold cross-validation, and the classification results obtained by SVM modeling are shown in Table 3. Table 3 shows that the sensitivity for all six cancers exceeds 0.97, which indicates that the identified signature genes are highly accurate in identifying stage I. Across the six cancers, the classification accuracy of the signature genes exceeds 98 % and the MCC exceeds 0.93, indicating that the signature genes screened in this paper distinguish well between adjacent normal and stage I samples, which reflects the reliability of the screening method.

Table 3 Pattern Recognition Results for Training Set of Methylation Data.
Cancer Type | Genes no. | ACC | SEN | SPE | MCC
breast invasive carcinoma | 256 | 0.9864 | 0.9762 | 1.0 | 0.9728
kidney renal clear cell carcinoma | 184 | 0.9934 | 0.9934 | 0.9935 | 0.9869
kidney renal papillary cell carcinoma | 163 | 0.9953 | 0.9940 | 1.0 | 0.9861
liver hepatocellular carcinoma | 236 | 0.9910 | 0.9885 | 1.0 | 0.9745
lung squamous cell carcinoma | 219 | 0.9907 | 0.9942 | 0.9762 | 0.9704
thyroid carcinoma | 111 | 0.9825 | 0.9861 | 0.9643 | 0.9371

In addition, the independent test set of the ME data was used to validate the robustness of the screening method across data platforms. The gene sets identified from the training set were matched to the test set data: 35 of the LUSC and KIRC signature genes and 51 of the BRCA signature genes were matched in the test set. Using the matched signature genes, the data set was divided by six-fold cross-validation, and the prediction accuracy obtained by SVM modeling is shown in Table 4. Table 4 shows that the prediction accuracy for BRCA, KIRC and LUSC exceeds 97 %. Across these three cancers, the sensitivity is higher than 98 %, that is, the recognition accuracy for stage I is higher than 98 %, and the differences in the indexes between the training set and the test set are all within 3 %, which indicates that the signature genes identified in this study are robust to different platforms.

It is also necessary to select, for each cancer, a small set of signature genes with higher confidence and equal quantity, and to study the influence of the size of the signature gene set on the classification results. Genes were ranked by their contribution rate to the pattern-recognition classification, and the top-ranked genes were selected as optimal signature genes. The number of features was increased from 1 to 20, and the accuracy and sensitivity of the signature genes were calculated for each size. Fig. 2 shows the relationship between ACC, SEN and the size of the ME signature gene set. As Fig. 2 shows, the number of signature genes affects the predictive accuracy of the model for cancer stage I and normal samples. The number of signature genes corresponding to the maximum SEN and ACC for each of the six cancers in Fig. 2, together with the other model evaluation results, is listed in Table 5. As Table 5 shows, all indicators for BRCA, KIRC, KIRP and LIHC are 1, and the sensitivity for all cancers is above 99 %, indicating that these few genes have the highest recognition accuracy for stage I cancer. Compared with the data in Table 3, when the number of signature genes is reduced to fewer than 20, the ACC and SPE are flat or rising. This indicates that a small number of high-confidence signature genes can effectively distinguish early cancer from normal samples.
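The evaluation procedure of Section 2.3, six-fold cross-validation with an SVM scored by Eqs. (1) to (4), can be sketched as below. The linear kernel, the stratified and shuffled folds, the function names and the synthetic data are assumptions, since the text specifies none of them. The confusion counts in the first example are inferred from the reported BRCA rates and the sample sizes in Table 1 (95 normal, 126 stage I); they are not figures stated in the paper.

```python
import math

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC


def classification_metrics(tp, fp, tn, fn):
    """ACC, SEN, SPE and MCC from confusion-matrix counts, following Eqs. (1)-(4)."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    sen = tp / (tp + fn)
    spe = tn / (fp + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SEN": sen, "SPE": spe, "MCC": mcc}


def six_fold_svm(X, y, seed=0):
    """Six-fold cross-validated SVM; label 1 = stage I, label 0 = normal."""
    cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=seed)
    pred = cross_val_predict(SVC(kernel="linear"), X, y, cv=cv)  # kernel is assumed
    tp = int(np.sum((pred == 1) & (y == 1)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    tn = int(np.sum((pred == 0) & (y == 0)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    return classification_metrics(tp, fp, tn, fn)


# Counts inferred from the BRCA training results: 126 stage I, 95 normal samples.
brca = classification_metrics(tp=123, fp=0, tn=95, fn=3)

# Synthetic, well-separated two-class data (hypothetical) to exercise the CV path.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
y = np.repeat([0, 1], 30)
toy = six_fold_svm(X, y)
```

With these inferred counts the four metrics round to the BRCA training values reported for the ME signature genes (ACC 0.9864, SEN 0.9762, SPE 1.0, MCC 0.9728).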

Table 4 Pattern Recognition Result for Matched Signature Genes.
Cancer | ACC (Train) | ACC (Test) | SEN (Train) | SEN (Test) | SPE (Train) | SPE (Test) | MCC (Train) | MCC (Test)
BRCA | 0.9759 | 0.9864 | 0.9821 | 0.9841 | 0.9629 | 0.9894 | 0.9451 | 0.9723
KIRC | 0.9903 | 1.0 | 0.9824 | 1.0 | 0.9949 | 1.0 | 0.9792 | 1.0
LUSC | 0.9901 | 0.9906 | 0.9865 | 0.9941 | 1.0 | 0.9762 | 0.9753 | 0.9704
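Matching a signature gene set identified on one platform to the genes measured on another, as done for the independent test set, amounts to an ordered intersection followed by column selection. A minimal sketch, with hypothetical gene names, function name and matrix layout (samples by genes):

```python
import numpy as np


def match_signature_genes(signature, test_genes, test_matrix):
    """Keep the signature genes that are also measured in the test set.

    Returns the matched gene names and the test matrix restricted to them
    (columns follow the order of `signature`).
    """
    column_of = {gene: i for i, gene in enumerate(test_genes)}
    matched = [gene for gene in signature if gene in column_of]
    sub = np.asarray(test_matrix)[:, [column_of[g] for g in matched]]
    return matched, sub


signature = ["GENE_A", "GENE_B", "GENE_C"]      # hypothetical training-platform signature
test_genes = ["GENE_C", "GENE_X", "GENE_A"]     # hypothetical genes on the test platform
test_matrix = np.array([[0.1, 0.2, 0.3],
                        [0.4, 0.5, 0.6]])
matched, sub = match_signature_genes(signature, test_genes, test_matrix)
```

Here `GENE_B` is absent from the test platform, so only two of the three signature genes are carried forward to modeling.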


Fig. 2. The relationship between ACC and SPE and size of ME signature genes set.
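A curve of this kind, performance as a function of the number of top-ranked genes, can be produced along the following lines. The paper ranks genes by their "contribution rate" without defining it; this sketch assumes the absolute weights of a linear SVM as the ranking, which is only one possible reading. Ranking on the full data before cross-validation, as done here for brevity, gives optimistic estimates. Function name and data are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC


def top_k_curve(X, y, max_k=20, seed=0):
    """ACC and SEN of a six-fold SVM using the k top-ranked genes, k = 1..max_k."""
    # Assumed ranking: absolute weight of each gene in a linear SVM.
    weights = np.abs(SVC(kernel="linear").fit(X, y).coef_).ravel()
    order = np.argsort(weights)[::-1]
    cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=seed)
    curve = []
    for k in range(1, min(max_k, X.shape[1]) + 1):
        pred = cross_val_predict(SVC(kernel="linear"), X[:, order[:k]], y, cv=cv)
        acc = accuracy_score(y, pred)
        sen = recall_score(y, pred)          # recall of label 1 (stage I) = SEN
        curve.append((k, acc, sen))
    return order, curve


# Synthetic data (hypothetical): 8 genes, only gene 2 separates the classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 8))
X[y == 1, 2] += 4.0
order, curve = top_k_curve(X, y, max_k=20)
```

The gene count giving the highest ACC and SEN on such a curve would then be reported as the optimal signature set size.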

Compared with the study of Yang et al., we focused on the methylation changes from normal samples to cancer stage I. All of these results indicate that inflammation and immunity play an important role in the early stage of tumorigenesis.

Table 5 The Pattern Recognition Result of Optimal Signature Gene Set for Each Cancer.
Disease Type (number of genes) | ACC | SPE | SEN | MCC
Breast Invasive Carcinoma (11) | 1 | 1 | 1 | 1
Lung Squamous Cell Carcinoma (2) | 0.9906 | 0.9703 | 0.9941 | 0.9761
Kidney Renal Papillary Cell Carcinoma (3) | 1 | 1 | 1 | 1
Kidney Renal Clear Cell Carcinoma (11) | 1 | 1 | 1 | 1
Thyroid Carcinoma (6) | 0.9825 | 0.9352 | 0.9930 | 0.9285
Liver Hepatocellular Carcinoma (13) | 1 | 1 | 1 | 1

3.3. Common functional pattern of signature genes

For the methylation signature genes of each cancer, we selected the differentially methylated genes and divided them into hypermethylated and hypomethylated genes according to the direction of the methylation change. As shown in Fig. 3, after this division of the differentially methylated genes in the 6 cancers, an average of 56 % of the differentially methylated genes are hypomethylated, indicating

3.2. Enrichment analysis results

The results of the pathway enrichment analysis are shown in Table A1 in the Appendix. Pathways that occur in multiple cancers are called common pathways of carcinogenesis. Yang et al. (2017) found that methylation genes associated with gene expression are significantly enriched in pathways that control the cell cycle and cell death, such as the p53 signaling pathway and the TRAIL-activated apoptotic signaling pathway. In our study, however, we found that multiple cancers share two enriched pathways: Complement and coagulation cascades and Cytokine-cytokine receptor interaction. It has been shown that Complement and coagulation cascades (Yuan et al., 2018) is mainly involved in the inflammatory process, the recruitment of immunocompetent cells and the direct killing of pathogens, and that cytokines and cytokine-interacting networks (Dong et al., 2017) are thought to be key factors in inflammation and tumor immunology.

Fig. 3. Proportion and distribution of differential methylation in six cancers.


that, among the wide range of genes with promoter methylation changes, hypomethylated genes account for a non-negligible proportion. DNA hypomethylation usually occurs on DNA repeats, and hypomethylation of these repeats leads to decreased stability of the cancer genome sequence (Ehrlich et al., 2006; Ferguson et al., 1997). However, the role of hypomethylation of gene promoters in cancer has been neglected by researchers (Ehrlich, 2009). For the hypomethylated genes of each cancer, Gene Ontology (GO) analysis was used to determine the biological processes in which the hypomethylated genes are significantly enriched. The results of the GO analysis are shown in Table B1 in the Appendix. They show that hypomethylated genes are significantly enriched in biological processes associated with immune responses and inflammatory responses. We therefore believe that hypomethylated genes have specific links to functions such as immunity, defense response, and inflammatory response (Dedeurwaerder et al., 2011; Son et al., 2010; Wang et al., 2010). Considering also that the common pathways are immune- and inflammation-related pathways, we speculate that the occurrence of early cancer is closely related to immune and inflammatory responses (Grivennikov et al., 2010; Kelly-Spratt et al., 2011; Lawrence, 2007).

We also screened for ME signature genes that occur in at least four of the six cancers and obtained SNORD114-1 (small nucleolar RNA, C/D box 114-1) and MIR21 (microRNA 21). It has been shown that the variant of SNORD114-1 promoted cell growth through cell cycle modulation and that its expression was implicated in the G0/G1 to S phase transition mediated by the Rb/p16 pathways. MIR21 is significantly elevated in a variety of cancer tissues; its target genes regulate cell proliferation, differentiation, apoptosis and migration, and it can act as an oncogene and plays an important role in tumorigenesis and development (Chi et al., 2016; Cissell et al., 2008).
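Two operations from this section are simple to make concrete: labelling a gene as hyper- or hypomethylated, and finding genes that recur in at least four of the six signature sets. The sketch below assumes that direction is decided by the change in mean beta value from normal to stage I, which the text does not state explicitly; gene names, values and function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def methylation_direction(normal_beta, stage1_beta):
    """'hyper' if the mean beta rises from normal to stage I, else 'hypo'."""
    return "hyper" if mean(stage1_beta) > mean(normal_beta) else "hypo"


def recurrent_genes(signatures, min_cancers=4):
    """Genes present in the signature sets of at least `min_cancers` cancers."""
    counts = Counter(g for genes in signatures.values() for g in set(genes))
    return sorted(g for g, n in counts.items() if n >= min_cancers)


# Hypothetical signature sets for the six cancers.
signatures = {
    "BRCA": ["GENE_A", "GENE_B"],
    "KIRC": ["GENE_A", "GENE_C"],
    "KIRP": ["GENE_A", "GENE_B"],
    "LIHC": ["GENE_A", "GENE_B"],
    "LUSC": ["GENE_B", "GENE_C"],
    "THCA": ["GENE_D"],
}
common = recurrent_genes(signatures)
direction = methylation_direction([0.8, 0.7], [0.3, 0.4])
```

In this toy input two genes reach the four-cancer threshold, and the example gene is labelled hypomethylated because its mean beta value falls from normal to stage I.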

4. Conclusion

In this paper, we used bioinformatics methods to study the changes in DNA methylation from normal to cancer stage I. A screening method for cancer signature genes was established by combining relevant statistical methods, and the six cancer datasets from the TCGA database were screened separately to obtain the ME signature gene set of each cancer. With support vector machine modeling, the accuracy of the ME signature genes was as high as 98 % for the training set and as high as 97 % for the independent test set; the recognition accuracy for stage I was more than 97 % for the training set and more than 98 % for the test set. These results indicate that the screened signature genes effectively distinguish early cancerous samples from normal samples, and that the screening method is general and effective. The common ME genes SNORD114-1 and MIR21, which emerge in multiple cancers, were obtained. We also obtained two common pathways of cancer by KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis. Functional analysis of the pathways and genes shows that the inflammatory response is closely related to the occurrence of cancer, which is important for understanding the pathogenesis of cancer and for early therapy.

CRediT authorship contribution statement

Xuedong Wang: Conceptualization, Methodology, Software, Writing - review & editing. Wenhui Shang: Data curation, Writing - original draft. Xiaoqin Li: Software, Validation, Writing - review & editing. Yu Chang: Supervision.

Declaration of Competing Interest

There is no conflict of interest in the article.

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Grant No. 11572014) and Major Research Projects in The Field of Intelligent Manufacturing (Grant No. 01500054631751).

Appendix A

Table A1 The table of KEGG analysis of 6 cancers.
Cancers, in row order: BRCA, KIRC, KIRP, LIHC, LUSC, THCA
Term | PValue | Genes
Chemokine signaling pathway | 0.009968 | PIK3CG, CCL22, CCR7, CXCR6, GNG11, CXCR3
Complement and coagulation cascades | 0.012804 | PLAT, CFH, CFI, C4BPA
Cytokine-cytokine receptor interaction | 0.028429 | CSF2, CCL22, CCR7, TNFRSF10D, CXCR6, CXCR3
T cell receptor signaling pathway | 0.033914 | PIK3CG, CSF2, GRAP2, CD28
Olfactory transduction | 0.044444 | OR10A6, OR4F6, OR8H3, OR2T3, OR8B4, OR8H2
MicroRNAs in cancer | 0.001328 | MIR200A, MIR26B, MIR200B, MIR199A2, MIR27A, MIR155, MIR21, MIR30C1
Complement and coagulation cascades | 0.049291 | SERPING1, C4BPA, CPB2
Olfactory transduction | 1.04E-04 | OR2G3, OR2L8, OR14A16, OR51M1, OR1I1, OR4F6, OR1L6, OR52E2, OR51L1, OR2L3, OR2M3, OR4C46
Malaria | 0.049029 | CSF3, ICAM1, KLRK1
Cytokine-cytokine receptor interaction | 7.51E-05 | TNF, IL18RAP, CXCR5, IL18, CXCR6, TNFRSF13B, EDA2R, IL1B, CXCL10
African trypanosomiasis | 9.47E-04 | TNF, IL18, IL1B, HPR
Rheumatoid arthritis | 0.001765 | TNF, IL18, CTLA4, IL1B, CD28
Cytosolic DNA-sensing pathway | 0.006365 | IL18, IL1B, AIM2, CXCL10
Inflammatory bowel disease | 0.006365 | TNF, IL18RAP, IL18, IL1B
Graft-versus-host disease | 0.016285 | TNF, IL1B, CD28
Influenza A | 0.019334 | CIITA, TNF, IL18, IL1B, CXCL10
Type I diabetes mellitus | 0.025677 | TNF, IL1B, CD28
Malaria | 0.034167 | TNF, IL18, IL1B
Legionellosis | 0.040811 | TNF, IL18, IL1B
NOD-like receptor signaling pathway | 0.043594 | TNF, IL18, IL1B
Staphylococcus aureus infection | 9.31E-04 | C3AR1, C1QB, CFH, HLA-DOB
Cytokine-cytokine receptor interaction | 0.010708 | TNFRSF9, IL2RB, IL12RB1, IL18, CSF2RB
Inflammatory bowel disease | 0.022278 | IL12RB1, IL18, HLA-DOB
Tuberculosis | 0.025418 | IL18, CAMP, TLR1, HLA-DOB
Complement and coagulation cascades | 0.02564 | C3AR1, C1QB, CFH


Table B1 The table of the results of GO analysis for hypomethylated genes of 6 cancers.
Cancer | Term | PValue
BRCA | chemotaxis | 2.18E-04
BRCA | inflammatory response | 2.93E-04
BRCA | G-protein coupled receptor signaling pathway | 0.001291
BRCA | immune response | 0.003449
BRCA | neutrophil chemotaxis | 0.011304
BRCA | chemokine-mediated signaling pathway | 0.012995
BRCA | innate immune response | 0.020353
BRCA | negative thymic T cell selection | 0.02654
BRCA | positive regulation of neutrophil chemotaxis | 0.052393
BRCA | negative regulation of endothelial cell proliferation | 0.068495
BRCA | monocyte chemotaxis | 0.097693
KIRC | negative regulation of endopeptidase activity | 0.007663
KIRC | detection of chemical stimulus involved in sensory perception of smell | 0.013456
KIRC | cell adhesion | 0.018194
KIRC | regulation of apoptotic process | 0.034124
KIRC | inflammatory response | 0.037427
KIRC | leukotriene metabolic process | 0.048893
KIRC | negative regulation of peptidase activity | 0.055232
KIRC | leukocyte migration | 0.06254
KIRC | positive regulation of cytokine secretion | 0.080175
KIRC | signal transduction | 0.089917
KIRP | inflammatory response | 0.008509
KIRP | negative regulation of fibrinolysis | 0.021239
KIRP | innate immune response | 0.063923
KIRP | regulation of intracellular pH | 0.074428
KIRP | acute-phase response | 0.080381
LIHC | detection of chemical stimulus involved in sensory perception of smell | 8.47E-06
LIHC | G-protein coupled receptor signaling pathway | 3.49E-05
LIHC | regulation of immune response | 0.001988
LIHC | sensory perception of smell | 0.008308
LIHC | positive regulation of cellular extravasation | 0.014933
LIHC | regulation of cell adhesion | 0.019554
LIHC | negative regulation of phosphatase activity | 0.026909
LIHC | signal transduction | 0.029561
LIHC | adaptive immune response | 0.038093
LIHC | detection of chemical stimulus involved in sensory perception | 0.081951
LIHC | positive regulation of natural killer cell mediated cytotoxicity | 0.086356
LIHC | single organismal cell-cell adhesion | 0.090956
LIHC | water transport | 0.095484
LUSC | immune response | 3.70E-07
LUSC | inflammatory response | 1.22E-04
LUSC | G-protein coupled receptor signaling pathway | 2.67E-04
LUSC | positive regulation of NF-kappaB transcription factor activity | 6.53E-04
LUSC | positive regulation of NF-kappaB import into nucleus | 0.00176
LUSC | lipopolysaccharide-mediated signaling pathway | 0.004071
LUSC | tumor necrosis factor-mediated signaling pathway | 0.005214
LUSC | positive regulation of interferon-gamma production | 0.008272
LUSC | positive regulation of calcidiol 1-monooxygenase activity | 0.008907
LUSC | sequestering of triglyceride | 0.011858
LUSC | cellular response to organic cyclic compound | 0.013344
LUSC | positive regulation of fever generation | 0.014801
LUSC | positive regulation of JNK cascade | 0.016039
LUSC | regulation of establishment of endothelial barrier | 0.020662
LUSC | positive regulation of protein kinase B signaling | 0.025937
LUSC | positive regulation of granulocyte macrophage colony-stimulating factor production | 0.029388
LUSC | positive regulation of chemokine biosynthetic process | 0.029388
THCA | innate immune response | 0.003384
THCA | defense response to bacterium | 0.004932
THCA | sodium ion homeostasis | 0.025901
THCA | positive regulation of I-kappaB kinase/NF-kappaB signaling | 0.056211
THCA | inflammatory response | 0.061018

References

Abbaszadegan, M.R., Moaven, O., Sima, H.R., et al., 2008. p16 promoter hypermethylation: a useful serum marker for early detection of gastric cancer. World J. Gastroenterol. 14, 2055–2060.
Bibikova, M., Lin, Z., Zhou, L., et al., 2006. High-throughput DNA methylation profiling using universal bead arrays. Genome Res. 16, 383–393.
Carvalho, D.D.D., You, J.S., Jones, P.A., 2010. DNA methylation and cellular reprogramming. Trends Cell Biol. 20, 609–617.
Chi, L.A.Y., Co, N.N., Tsuruga, T., et al., 2016. Exosomal transfer of stroma-derived miR21 confers paclitaxel resistance in ovarian cancer cells through targeting APAF1. Nat. Commun. 7, 11150.
Cissell, K.A., Rahimi, Y., Shrestha, S., et al., 2008. Bioluminescence-based detection of microRNA, miR21 in breast cancer cells. Anal. Chem. 80, 2319–2325.
Dalkic, E., Wang, X., Wright, N., et al., 2010. Cancer-drug associations: a complex system. PLoS One 5, e10031.
Dedeurwaerder, S., Desmedt, C., Calonne, E., et al., 2011. DNA methylation profiling reveals a predominant immune component in breast cancers. EMBO Mol. Med. 3, 726–741.
Diaz-Lagares, A., Mendez-Gonzalez, J., Hervas, D., et al., 2016. A novel epigenetic signature for early diagnosis in lung cancer. Clin. Cancer Res. 22, 3361.
Dong, C., Wang, X., Xu, H., et al., 2017. Identification of a cytokine-cytokine receptor interaction gene signature for predicting clinical outcomes in patients with colorectal cancer. Int. J. Clin. Exp. Med. 10, 9009–9018.
Ehrlich, M., 2009. DNA hypomethylation in cancer cells. Epigenomics 1, 239–259.
Ehrlich, M., Woods, C.B., Yu, M.C., et al., 2006. Quantitative analysis of associations between DNA hypermethylation, hypomethylation, and DNMT RNA levels in ovarian tumors. Oncogene 25, 2636–2645.
Ferguson, A.T., Vertino, P.M., Spitzner, J.R., et al., 1997. Role of estrogen receptor gene demethylation and DNA methyltransferase·DNA adduct formation in 5-Aza-2′-deoxycytidine-induced cytotoxicity in human breast cancer cells. J. Biol. Chem. 272, 32260–32266.
Fleischer, T., Frigessi, A., Johnson, K.C., et al., 2014. Genome-wide DNA methylation profiles in progression to in situ and invasive carcinoma of the breast with impact on gene transcription and prognosis. Genome Biol. 15, 435.
Grivennikov, S.I., Greten, F.R., Karin, M., 2010. Immunity, inflammation, and cancer. Cell 140, 883–899.
Hui, Z., Hastie, T., 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. 67, 768.
Jemal, A., Siegel, R., Xu, J., et al., 2011. Cancer statistics. CA Cancer Clin. 61, 69–90.
Kelly-Spratt, K.S., Pitteri, S.J., Gurley, K.E., et al., 2011. Plasma proteome profiles associated with inflammation, angiogenesis, and cancer. PLoS One 6, e19721.
Lawrence, T., 2007. Inflammation and cancer: a failure of resolution? Trends Pharmacol. Sci. 28, 162–165.
Meissner, A., 2010. Epigenetic modifications in pluripotent and differentiated cells. Nat. Biotechnol. 28, 1079–1088.
Pujol, J.L., Molinier, O., Ebert, W., et al., 2004. CYFRA 21-1 is a prognostic determinant in non-small-cell lung cancer: results of a meta-analysis in 2063 patients. Br. J. Cancer 90, 2097–2105.
Son, K.S., Kang, H.S., Sun, J.K., et al., 2010. Hypomethylation of the interleukin-10 gene in breast cancer tissues. Breast 19, 484–488.
Sotiriou, C., Wirapati, P., Loi, S., et al., 2006. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 98, 262–272.
Wang, Z., Zhang, J., Zhang, Y., et al., 2010. SPAN-Xb expression in myeloma cells is dependent on promoter hypomethylation and can be upregulated pharmacologically. Int. J. Cancer 118, 1436–1444.
Wang, S., Zhang, F., Wang, L., et al., 2016a. Genome-wide smoke related methylation signature genes identification for lung adenocarcinomas. Chin. J. Biomed.
Wang, S., Zhang, F., Wang, L., et al., 2016b. Genome-wide smoke related methylation signature genes identification for lung adenocarcinomas. Chin. J. Biomed. Eng.
Yang, X.F., Gao, L., Zhang, S.H., 2017. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Brief. Bioinformatics 18 (5), 761–773.
Yoo, C.B., Jones, P.A., 2006. Epigenetic therapy of cancer: past, present and future. Nat. Rev. Drug Discov. 5, 37–50.
Yuan, L., Zeng, G., Chen, L., et al., 2018. Identification of key genes and pathways in human clear cell renal cell carcinoma (ccRCC) by co-expression analysis. Int. J. Biol. Sci. 14, 266–279.