Bioinformatics analysis of gene expression profile data to screen key genes involved in pulmonary sarcoidosis

Bioinformatics analysis of gene expression profile data to screen key genes involved in pulmonary sarcoidosis

    Bioinformatics analysis of gene expression profile data to screen key genes involved in pulmonary sarcoidosis Hongyan Li MD, Xiaonan ...

908KB Sizes 1 Downloads 77 Views

    Bioinformatics analysis of gene expression profile data to screen key genes involved in pulmonary sarcoidosis Hongyan Li MD, Xiaonan Zhao MD, Jing Wang MD, Minru Zong MD, Hailing Yang MM PII: DOI: Reference:

S0378-1119(16)30775-2 doi:10.1016/j.gene.2016.09.037 GENE 41599

To appear in:

Gene

Received date: Revised date: Accepted date:

24 February 2016 29 August 2016 24 September 2016

Please cite this article as: Li, Hongyan, Zhao, Xiaonan, Wang, Jing, Zong, Minru, Yang, Hailing, Bioinformatics analysis of gene expression profile data to screen key genes involved in pulmonary sarcoidosis, Gene (2016), doi:10.1016/j.gene.2016.09.037

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Bioinformatics analysis of gene expression profile data to screen key genes involved in pulmonary sarcoidosis

IP

T

Running title: Genes related to pulmonary sarcoidosis

SC R

Hongyan Li 1MD, Xiaonan Zhao1 MD,Jing Wang2 MD, Minru Zong3 MD, Hailing Yang4* MM 1

Infections Department, China-Japan Union Hospital, Jilin University, Changchun,

Pneumology Department, China-Japan Union Hospital, Jilin University, Changchun,

MA

2

NU

130033, China

130033, China

Rehabilitation Department, China-Japan Union Hospital, Jilin University,

Emergency Department, China-Japan Union Hospital, Jilin University, Changchun,

CE P

4

TE

Changchun, 130033, China

D

3

AC

130033, China

*Corresponding authors: Hailing Yang Tel: +86-0431-84995120; Fax: +86-0431-84641026 Email: [email protected] Address: Emergency Department, China-Japan Union Hospital, Jilin University, Changchun, 130033, China

1

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Abstract Background: Sarcoidosis is a multisystemic inflammatory and granulomatous

IP

T

disease that occurs in almost all populations and affects multiple organs. Meanwhile,

SC R

its most common manifestation is pulmonary sarcoidosis. This study aimed to identify effective biomarkers for the diagnosis and therapy of pulmonary sarcoidosis. Methods: GSE16538 was downloaded from Gene Expression Omnibus, including 6

NU

pulmonary sarcoidosis samples and 6 normal lung samples. Then, differentially

MA

expressed genes (DEGs) were identified by limma package in R. After the expression values of the DEGs were extracted, hierarchical clustering analysis was performed for

D

the DEGs using the pheatmap package in R. Subsequently, protein-protein interaction

TE

(PPI) pairs among the DEGs were searched by STRING or REACTOME databases,

CE P

and then PPI networks were visualized by Cytoscape software. Using DAVID and KOBAS, functional and pathway enrichment analyses separately were performed for

AC

the DEGs involved in the PPI network. Results: Total 208 DEGs were identified in pulmonary sarcoidosis samples, including 179 up-regulated genes and 29 down-regulated genes. Hierarchical clustering showed that the DEGs could clearly distinguish the pulmonary sarcoidosis samples from the normal lung samples. In the PPI network constructed by STRING database, CXCL9, STAT1, CCL5, CXCL11 and GBP1 had higher degrees and betweenness values, and could interact with each other. Functional enrichment showed that CXCL9, CXCL11 and CCL5 were enriched in immune response. Moreover, STAT1 was enriched in pathways of chemokine signaling pathway and JAK-STAT signaling pathway. 2

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Conclusion: CXCL9, CXCL11, STAT1, CCL5 and GBP1 might be implicated in pulmonary sarcoidosis through interacting with each other.

IP

T

Keywords: pulmonary sarcoidosis; differentially expressed genes; protein-protein

AC

CE P

TE

D

MA

NU

SC R

interaction network; enrichment analysis

3

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

1. Introduction As a multisystemic inflammatory and granulomatous disease (Hunninghake et al.,

IP

T

1999), sarcoidosis affects young adults of both sexes and its incidence peaks in the

SC R

individuals in the third decade of life or later (Syed and Myers, 2004; Rybicki and Iannuzzi, 2007). The most common manifestation of sarcoidosis is pulmonary sarcoidosis (Longo et al., 2011), and approximately 90% sarcoidosis patients have

NU

experienced lung involvement (Lin et al., 2011). The clinical presentation of

MA

sarcoidosis is highly variable and often nonspecific (Judson et al., 2003). About 40% of sarcoidosis patients are asymptomatic, thus chest physical findings are usually

D

minimal in pulmonary sarcoidosis (Veltkamp and Grutters, 2014). In addition,

TE

sarcoidosis shares similarities with other lung diseases (such as tuberculosis, lung

CE P

cancer and cryptogenic organizing pneumonia) in clinical manifestation (Zhou et al., 2008; Liu et al., 2013). Therefore, effective biomarkers are urgently needed for the

AC

diagnosis and therapy of pulmonary sarcoidosis. The pathogenesis of sarcoidosis remains largely unknown, but researches have made some progress in understanding sarcoidosis over the past few years. Several infectious agents are found to implicate in the etiology of sarcoidosis, including mycobacteria, fungi, borrelia and rickettsia (Müller-Quernheim et al., 2012; Saidha et al., 2012). Complex immune responses are frequently observed in sarcoidosis as well (Zissel et al., 2010). Moller et al. (Moller and Chen, 2002) find that specific combinations of antigen, human leukocyte antigen (HLA) class II molecules and T-cell receptors may be required for the development of sarcoidosis. Furthermore, a number of genes also 4

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

have been considered as potential pathogenic factors of the disease. Several evidences suggest that HLA genes are strongly associated with this disease in patients from

IP

T

America, India, Britain and Netherlands (Sato et al., 2002; Rossman et al., 2003;

SC R

Rybicki et al., 2003; Sharma et al., 2003; Voorter et al., 2005). The studies in Caucasian patients show that butyrophilin-like 2 (BTNL2) may be a risk factor for sarcoidosis (Rybicki et al., 2005; Valentonyte et al., 2005). Additionally, Levin et al.

NU

(Levin et al., 2014) reveal that XAF1 is a novel sarcoidosis susceptibility gene through

MA

admixture mapping in African Americans.

Several sarcoid genomic studies have been carried out in recent years. For example,

D

Koth et al. demonstrate that blood gene expression analysis can be used for

TE

identifying inflammatory pathways in lung sarcoidosis, and that the genes involved in

CE P

specific inflammatory pathways in sarcoidosis share overlap with those in tuberculosis (Koth et al., 2011). Through genome-wide peripheral blood transcriptome

AC

analysis, Zhou et al. identify a 20-gene signature that can distinguish sarcoidosis from healthy individuals and differentiate patients with complicated sarcoidosis from those without complicated sarcoidosis (Zhou et al., 2012). In 2009, Crouser et al. (Crouser et al., 2009) carried out gene expression analysis to identify the differentially expressed genes (DEGs) between sarcoidosis samples and normal lung samples, and obtained 319 DEGs in sarcoidosis samples. However, Crouser et al. (Crouser et al., 2009) have not performed a comprehensive bioinformatics analysis to identify the key genes involved in pulmonary sarcoidosis. It was well-known that different analysis procedures could lead to different results. To further reveal the molecular mechanisms 5

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

of pulmonary sarcoidosis, we identified the key genes involved in the disease using the following bioinformatics methods. Using the same data by Crouser et al. (Crouser

IP

T

et al., 2009), we screened the DEGs based on different methods and thresholds.

SC R

Followed by protein-protein interaction (PPI) network analysis and enrichment analysis successively were conducted. 2. Materials and Methods

NU

2.1 Microarray data

MA

The gene expression profile of GSE16538 was downloaded from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) database (Barrett et al., 2007),

D

which was sequenced on the platform of GPL570 [HG-U133_Plus_2] Affymetrix

TE

Human Genome U133 Plus 2.0 Array. GSE16538 included 6 pulmonary sarcoidosis

CE P

samples and 6 normal lung samples. Pulmonary sarcoidosis patients met the operational diagnosis of sarcoidosis. Especially, sarcoidosis samples exhibited

AC

non-necrotizing, well-formed epithelioid granuloma without foreign objects or identifiable infection, according to the American Thoracic Society's Joint Statement on Sarcoidosis (Society, 1999). Sarcoidosis samples with atypical pathological features and infection were excluded. Normal lung tissues were isolated during the operation of lung resection or in the immediate postmortem period from organ donors (Crouser et al., 2009). This study used gene expression data downloaded from the public database, thus no patient consent or ethics committee approval were necessary. 2.2 Data preprocessing and DEGs screening The raw data were preprocessed by utilizing affy package in R (available at 6

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

http://www.bioconductor.org/packages/release/bioc/html/affy.html) (Gautier et al., 2004). Briefly, the processes of data preprocessing included background correction,

IP

T

quantile normalization and probe summarization. For multiple probes mapped to one

SC R

gene, their average value was considered as the gene expression value. After preprocessing, total 54674 probes corresponding to 43284 genes were obtained. Then, limma

package

(available

at

NU

http://www.bioconductor.org/packages/release/bioc/html/limma.html) (Diboun et al.,

MA

2006) in R was used to identify the DEGs between pulmonary sarcoidosis samples and normal lung samples. The t-test method was utilized to calculate the p-values of

D

genes. Then, Benjamini & Hochberg’s method (Benjamini and Hochberg, 1995) was

TE

used to calculate the adjusted p-values (that was false discovery rate, FDR). Only the

CE P

genes with the |log2fold change (FC)| >1 and FDR < 0.05 were considered as DEGs. 2.3 Hierarchical clustering analysis

AC

Hierarchical clustering can group similar elements in a binary tree and is widely applied for microarray data analysis (Bar-Joseph et al., 2001). The expression values of the DEGs were extracted from the gene expression profile. Afterwards, bidirectional hierarchical clustering was performed for the DEGs using the pheatmap package (Wang et al., 2014) in R. 2.4 Construction of PPI network STRING (version 9.1, http://www.string-db.org/) (Franceschini et al., 2013) is a web resource and biological database that includes comprehensively predicted and known interaction information. Through importing the DEGs into STRING database, the 7

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

interaction relationships among the protein encoded by the DEGs were searched. The combined score > 0.4 was used as the cut-off criterion. As an open source access

IP

T

bioinformatics database includes interactions of proteins involved in pathways (Croft

SC R

et al., 2010), REACTOME (http://www.reactome.org) was also used to search the PPI among the DEGs. Then, the PPI pairs were inputted into Cytoscape software (available at http://www.cytoscape.org/) (Saito et al., 2012) to construct the PPI

NU

network. The hub nodes (highly connected proteins which possess important

MA

biological function) were identified by calculating the degree (the number of line connections between proteins) and betweenness value (how much nodes that are not

D

directly connected by a certain node) of each node.

TE

2.5 Enrichment analysis of DEGs

CE P

The Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/) (Jiao et al., 2012) includes a comprehensive biological

AC

knowledge base and a series of analytic tools used for extracting biological themes for genes or proteins. Gene Ontology (GO, http://www.geneontology.org/) database describes functions of genes and their products from three categories: biological process (BP), cellular component (CC) and molecular function (MF) (Consortium, 2004). Using DAVID database (used in Sep, 2015) (Jiao et al., 2012), GO functional enrichment analysis was performed for the DEGs involved in PPI network. Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/) database shows the way that genes or other molecules act (Kanehisa and Goto, 2000). KEGG Orthology Based Annotation System (KOBAS, http://kobas.cbi.pku.edu.cn/home.do) 8

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

(Xie et al., 2011) is a web server for annotating and identifying enriched pathways and associated diseases for large gene lists. Using KOBAS software (Xie et al., 2011),

IP

T

KEGG pathway enrichment analysis was also conducted for the DEGs involved in

SC R

PPI network based on hypergeometric distribution algorithm. The p-value < 0.05 was chosen as the cut-off criterion for both GO functional and KEGG pathway enrichment analyses.

NU

3. Results

MA

3.1 DEGs screening

Total 54674 probes corresponding to 43284 genes were obtained by data

D

preprocessing. Then, the log2FC and FDR values of these genes were calculated to

TE

identify the DEGs in sarcoidosis samples compared with normal lung samples. Finally,

CE P

a total of 208 DEGs were identified in pulmonary sarcoidosis samples in comparison to normal lung samples, including 179 up-regulated genes and 29 down-regulated

AC

genes (Table 1).

3.2 Hierarchical clustering analysis of DEGs After extracting the expression values of the DEGs, hierarchical clustering analysis was conducted for the DEGs. As shown in Figure 1, the DEGs could clearly distinguish the pulmonary sarcoidosis samples from the normal lung samples. In pulmonary sarcoidosis samples, there were more up-regulated genes than down-regulated genes (Figure 1). 3.3 PPI network analysis By submitting the DEGs to the STRING database, 155 PPI pairs were obtained. Then, 9

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

the PPI pairs were imported into Cytoscape software to construct PPI network. The PPI network had 94 nodes (represent proteins encoded by the DEGs, including 84

IP

T

up-regulated genes and 10 down-regulated genes) and 155 edges (line connections

SC R

between nodes) (Figure 2). In the PPI network, chemokine (C-X-C motif) ligand 9 (CXCL9; degree = 16, betweenness = 1273.56), signal transducer and activator of transcription 1 (STAT1; degree = 16, betweenness = 1798.33), chemokine (C-C motif)

NU

ligand 5 (CCL5; degree = 13, betweenness = 1126.34), chemokine (C-X-C motif)

MA

ligand 11 (CXCL11; degree = 10, betweenness = 339.09) and guanylate binding protein 1 (GBP1; degree = 10, betweenness = 148.83) had higher degrees and

D

betweenness values. In addition, CXCL9, STAT1, CCL5, CXCL11 and GBP1 could

TE

interact with each other in the PPI network.

CE P

Furthermore, the PPI network constructed by REACTOME database had 27 nodes (including 26 up-regulated genes and 1 down-regulated gene) and 43 edges (Figure 3).

AC

In the PPI network, G protein gamma 2 subunit (GNG2; degree = 10, betweenness = 63), guanine nucleotide binding protein, alpha 13 (GNA13; degree = 8, betweenness = 60), (CXCL9; degree = 7, betweenness = 0), (CCL5; degree = 7, betweenness = 0) and (CXCL11; degree = 7, betweenness = 0) had higher degrees. Additionally, the PPI networks constructed by STRING database and REACTOME database had 26 common nodes and 32 common edges (Figure 4). 3.4 Functional and pathway enrichment analyses GO functional and KEGG pathway enrichment analyses separately were conducted for the DEGs involved in PPI network constructed by STRING database. Total 19 GO 10

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

terms were significantly enriched for the DEGs involved in PPI network, including immune response (p-value = 1.31E-09; which involved CXCL9, CCL5 and CXCL11),

IP

T

response to organic substance (p-value = 1.30E-05; which involved CCL5 and STAT1),

SC R

and cellular defense response (p-value = 4.16E-04; which involved CXCL9 and CCL5) (Table 2). The results of KEGG pathway analysis showed that the DEGs involved in PPI network were mainly enriched in pathways of chemokine signaling pathway

NU

(p-value = 2.37E-06; which involved CXCL9, CXCL11, STAT1 and CCL5),

MA

cytokine-cytokine receptor interaction (p-value = 5.81E-03; which involved CXCL9, CCL5 and CXCL11), natural killer cell mediated cytotoxicity (p-value = 2.51E-02)

D

and JAK-STAT signaling pathway (p-value = 4.08E-02; which involved STAT1)

TE

(Table 3).

CE P

Besides, functional and pathway enrichment analyses also were performed for the DEGs involved in the PPI network constructed by REACTOME database. Total 40

AC

GO terms and 124 KEGG terms were enriched for the DEGs involved in PPI network. The enriched GO terms included taxis (p-value = 4.87E-06; which involved CXCL9, CCL5 and CXCL11), response to organic substance (p-value = 1.00E-04; which involved CCL5 and STAT1), and cellular defense response (p-value = 1.26E-04; which involved CXCL9 and CCL5) (Table 4). The enriched KEGG terms included chemokine signaling pathway (p-value = 6.12E-06; which involved CXCL9, CCL5, STAT1 and CXCL11), GABAergic synapse (p-value = 1.44E-02), and morphine addiction (p-value = 1.60E-02) (Table 5).

11

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

4. Discussion In this study, total 208 DEGs (including 179 up-regulated genes and 29

IP

T

down-regulated genes) were identified in pulmonary sarcoidosis samples relative to

SC R

normal lung samples. Hierarchical clustering analysis showed that the DEGs could clearly distinguish the pulmonary sarcoidosis samples from the normal lung samples. Besides, CXCL9, STAT1, CCL5, CXCL11 and GBP1 had higher degrees and

NU

betweenness values in the PPI network constructed by STRING database. Afterwards,

MA

REACTOME database was used to validate the PPI pairs searched by STRING database. In the PPI network constructed by REACTOME database, CXCL9, CCL5

D

and CXCL11 also had higher degrees. There were 26 common nodes and 32 common

TE

edges between the two PPI networks. In addition, the DEGs involved in the PPI

CE P

networks both were enriched in chemokine signaling pathway, as well as functions of response to organic substance and cellular defense response.

AC

Functional enrichment showed that CXCL9, CXCL11 and CCL5 were enriched in immune response. CXCL9 (monokine induced by interferon γ) and CXCL11 (interferon inducible T cell alpha chemoattractant) are the chemokines (C-X-C motif) that promote T helper type 1 (Th1) responses (Xanthou et al., 2003). As reported, the immune response in sarcoidosis is primarily mediated by Th1 lymphocytes, and interferon γ is implicated in Th1-type immune responses (Iannuzzi and Fontana, 2011). As the ligands of chemokine receptor chemokine (C-X-C motif) receptor 3 (CXCR3), CXCL9 and CXCL11 are proved to participate in the accumulation of Th1 lymphocytes in sarcoid lungs by enzyme linked immunosorbent assay (ELISA), 12

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

quantitative real-time polymerase chain reaction (qRT-PCR) and immunostaining (Nishioka et al., 2007). A similar study shows that CXCR3 and its ligands (CXCL9

IP

T

and CXCL11) are involved in the formation of granulomatous lesions of pulmonary

SC R

sarcoidosis (Busuttil et al., 2009). Furthermore, a recent study indicates that CXCL9 can predict the sarcoidosis outcomes longitudinally (Su et al., 2014). As a member of GTPases family, GBP1 is also strongly induced by interferons (Cheng et al., 1983).

NU

Ingenuity canonical pathway analysis shows that GBP1 is differentially expressed in

MA

sarcoidosis (Su et al., 2014). As a member of the STAT family, STAT1 has been identified as a potential immune-related susceptibility gene in sarcoidosis (Tanaka et

D

al., 2005). Not surprisingly, the study of Rosenbaum et al. (Rosenbaum et al., 2009)

TE

also demonstrates that the sarcoidosis may be a disease mediated by STAT1. In

CE P

addition, STAT1 mediates chemokines CXCL9 and CXCL11 which are up-regulated in sarcoidosis tissues as well (Rosenbaum et al., 2009). The CC chemokine (C-C motif)

AC

CCL5 has been proved to be elevated in all stages of pulmonary sarcoidosis (Palchevskiy et al., 2011), while CXC chemokines are only associated with early stages of sarcoidosis (Nishioka et al., 2007; Busuttil et al., 2009). Pathway enrichment analysis showed that CXCL9, CCL5 and CXCL11 were enriched in chemokine signaling pathway and cytokine-cytokine receptor interaction. Meanwhile, STAT1 was enriched in pathways of chemokine signaling pathway and JAK-STAT signaling pathway. Thus, CXCL9, CXCL11, STAT1, CCL5 and GBP1 might play important roles in the pathogenesis of sarcoidosis. In the PPI network constructed by STRING database, CXCL9, CXCL11, STAT1, CCL5 and GBP1 had interactions with each 13

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

other, indicating that these genes might also implicate in pulmonary sarcoidosis through interacting with each other. Additionally, the key genes screened by this study

IP

T

and Crouser et al. (Crouser et al., 2009) did not overlap with those identified by Zhou

SC R

et al (Zhou et al., 2012), suggesting that tissue and peripheral transcriptome analysis of pulmonary sarcoidosis were different.

In conclusion, total 208 DEGs were identified in pulmonary sarcoidosis samples using

NU

bioinformatics analyses. Besides, CXCL9, CXCL11, STAT1, CCL5 and GBP1 might

MA

implicate in the pathogenesis of pulmonary sarcoidosis through interacting with each other. These findings provided new sights into the pathogenesis of pulmonary

D

sarcoidosis, and lay the foundation for the target therapy of the disease. However,

TE

lacking of experimental verification is a limitation of this study. Aa a classical

CE P

database used for enrichment analysis, DAVID has not been updated for long. Thus, the results of GO functional enrichment analysis is correct, but may be incomplete. In

AC

future, these predicted results obtained from bioinformatics analysis can be validated by further experimental researches, such as qRT-PCR. Conflict of interest: none

14

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

References Bar-Joseph, Z., Gifford, D.K. and Jaakkola, T.S.2001. Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 17, S22-S29.

T

Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., Rudnev, D., Evangelista, C., Kim, I.F., Soboleva, A., Tomashevsky, M. and Edgar, R.2007. NCBI GEO: mining tens of millions of expression

IP

profiles—database and tools update. Nucleic acids research 35, D760-D765. Benjamini, Y. and Hochberg, Y.1995. Controlling the false discovery rate: a practical and powerful

SC R

approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289-300.

Busuttil, A., Weigt, S., Keane, M., Xue, Y., Palchevskiy, V., Burdick, M., Huang, C., Zisman, D., Fishbein, M. and Lynch, J.2009. CXCR3 ligands are augmented during the pathogenesis of

NU

pulmonary sarcoidosis. European Respiratory Journal 34, 676-686. Cheng, Y., Colonno, R.J. and Yin, F.H.1983. Interferon induction of fibroblast proteins with guanylate binding activity. Journal of Biological Chemistry 258, 7746-7750.

MA

Consortium, G.O.2004. The Gene Ontology (GO) database and informatics resource. Nucleic acids research 32, D258-D261.

Croft, D., O’Kelly, G., Wu, G., Haw, R., Gillespie, M., Matthews, L., Caudy, M., Garapati, P., Gopinath, G. and Jassal, B.2010. Reactome: a database of reactions, pathways and biological processes.

D

Nucleic acids research, gkq1018.

TE

Crouser, E.D., Culver, D.A., Knox, K.S., Julian, M.W., Shao, G., Abraham, S., Liyanarachchi, S., Macre, J.E., Wewers, M.D. and Gavrilin, M.A.2009. Gene expression profiling identifies MMP-12 and ADAMDEC1 as potential pathogenic mediators of pulmonary sarcoidosis.

CE P

American journal of respiratory and critical care medicine 179, 929-938. Diboun, I., Wernisch, L., Orengo, C.A. and Koltzenburg, M.2006. Microarray analysis after RNA amplification can detect pronounced differences in gene expression using limma. BMC genomics 7, 252.

AC

Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth, A., Lin, J., Minguez, P., Bork, P. and von Mering, C.2013. STRING v9. 1: protein-protein interaction networks, with increased coverage and integration. Nucleic acids research 41, D808-D815.

Gautier, L., Cope, L., Bolstad, B.M. and Irizarry, R.A.2004. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307-315. Hunninghake, G., Costabel, U., Ando, M., Baughman, R., Cordier, J., Du Bois, R., Eklund, A., Kitaichi, M., Lynch, J. and Rizzato, G.1999. ATS/ERS/WASOG statement on sarcoidosis. American Thoracic Society/European Respiratory Society/World Association of Sarcoidosis and other Granulomatous Disorders. Sarcoidosis, vasculitis, and diffuse lung diseases: official journal of WASOG/World Association of Sarcoidosis and Other Granulomatous Disorders 16, 149. Iannuzzi, M.C. and Fontana, J.R.2011. Sarcoidosis: clinical presentation, immunopathogenesis, and therapeutics. Jama 305, 391-399. Jiao, X., Sherman, B.T., Huang, D.W., Stephens, R., Baseler, M.W., Lane, H.C. and Lempicki, R.A.2012. DAVID-WS: a stateful web service to facilitate gene/protein list analysis. Bioinformatics 28, 1805-1806. Judson, M., Baughman, R., Thompson, B., Teirstein, A., Terrin, M., Rossman, M., Yeager Jr, H., McLennan, G., Bresnitz, E. and DePalo, L.2003. Two year prognosis of sarcoidosis: the 15

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

ACCESS experience. Sarcoidosis, vasculitis, and diffuse lung diseases: official journal of WASOG/World Association of Sarcoidosis and Other Granulomatous Disorders 20, 204-211. Kanehisa, M. and Goto, S.2000. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27-30.

T

Koth, L.L., Solberg, O.D., Peng, J.C., Bhakta, N.R., Nguyen, C.P. and Woodruff, P.G.2011. Sarcoidosis blood transcriptome reflects lung inflammation and overlaps with tuberculosis. American

IP

journal of respiratory and critical care medicine 184, 1153-1163.

Levin, A.M., Iannuzzi, M.C., Montgomery, C.G., Trudeau, S., Datta, I., Adrianto, I., Chitale, D.A.,

SC R

McKeigue, P. and Rybicki, B.A.2014. Admixture Fine-Mapping in African Americans Implicates XAF1 as a Possible Sarcoidosis Risk Gene. PloS one 9, e92646. Lin, B.J., Longo, D., Fauci, A., Kasper, D., Hauser, S. and Loscalzo, J.2011. Harrison's principles of internal medicine.

NU

Liu, J., Zhu, Y., Pei, Q., Di, J. and Zhang, S.2013. Serum concentrations of A disintegrin and metalloproteinase 9 (ADAM9) mRNA as a promising novel marker for the detection of pulmonary sarcoidosis. Journal of International Medical Research 41, 1236-1241.

MA

Longo, D., Fauci, A., Kasper, D., Hauser, S., Jameson, J. and Loscalzo, J., Harrison's principles of internal medicine. McGraw Hill Professional (2011). Müller-Quernheim, J., Prasse, A. and Zissel, G.2012. Pathogenesis of sarcoidosis. La Presse Médicale 41, e275-e287.

D

Moller, D.R. and Chen, E.S.2002. Genetic basis of remitting sarcoidosis: triumph of the trimolecular

TE

complex? American journal of respiratory cell and molecular biology 27, 391-395. Nishioka, Y., Manabe, K., Kishi, J., Wang, W., Inayama, M., Azuma, M. and Sone, S.2007. CXCL9 and 11 in patients with pulmonary sarcoidosis: a role of alveolar macrophages. Clinical &

CE P

Experimental Immunology 149, 317-326. Palchevskiy, V., Hashemi, N., Weigt, S.S., Xue, Y.Y., Derhovanessian, A., Keane, M.P., Strieter, R.M., Fishbein, M.C., Deng, J.C. and Lynch III, J.P.2011. Immune response CC chemokines CCL2 and CCL5 are associated with pulmonary sarcoidosis. Fibrogenesis Tissue Repair 4, 10.

AC

Rosenbaum, J.T., Pasadhika, S., Crouser, E.D., Choi, D., Harrington, C.A., Lewis, J.A., Austin, C.R., Diebel, T.N., Vance, E.E. and Braziel, R.M.2009. Hypothesis: sarcoidosis is a STAT1-mediated disease. Clinical Immunology 132, 174-183.

Rossman, M.D., Thompson, B., Frederick, M., Maliarik, M., Iannuzzi, M.C., Rybicki, B.A., Pandey, J.P., Newman, L.S., Magira, E. and Beznik-Cizman, B.2003. < i> HLA-DRB1* 1101: A Significant Risk Factor for Sarcoidosis in Blacks and Whites. The American Journal of Human Genetics 73, 720-735. Rybicki, B.A. and Iannuzzi, M.C.: Epidemiology of sarcoidosis: recent advances and future prospects, Seminars in respiratory and critical care medicine. New York: Thieme Medical Publishers, c1994- (2007), pp. 22-35. Rybicki, B.A., Maliarik, M.J., Poisson, L.M., Sheffer, R., Chen, K.M., Major, M., Chase, G.A. and Iannuzzi, M.C.2003. The major histocompatibility complex gene region and sarcoidosis susceptibility in African Americans. American journal of respiratory and critical care medicine 167, 444-449. Rybicki, B.A., Walewski, J.L., Maliarik, M.J., Kian, H. and Iannuzzi, M.C.2005. The< i> BTNL2 Gene and Sarcoidosis Susceptibility in African Americans and Whites. The American Journal of Human Genetics 77, 491-499. 16

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Saidha, S., Sotirchos, E.S. and Eckstein, C.2012. Etiology of sarcoidosis: does infection play a role? The Yale journal of biology and medicine 85, 133. Saito, R., Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.-L., Lotia, S., Pico, A.R., Bader, G.D. and Ideker, T.2012. A travel guide to Cytoscape plugins. Nature methods 9, 1069-1076.

T

Sato, H., Grutters, J.C., Pantelidis, P., Mizzon, A.N., Ahmad, T., van Houte, A.-J., Lammers, J.-W.J., van den Bosch, J.M., Welsh, K.I. and du Bois, R.M.2002. HLA-DQB1* 0201: a marker for

IP

good prognosis in British and Dutch patients with sarcoidosis. American journal of respiratory cell and molecular biology 27, 406-412.

SC R

Sharma, S.K., Balamurugan, A., Pandey, R.M., Saha, P.K. and Mehra, N.K.2003. Human leukocyte antigen-DR alleles influence the clinical course of pulmonary sarcoidosis in Asian Indians. American journal of respiratory cell and molecular biology 29, 225-231. Society, A.T.1999. Statement on sarcoidosis. Am J Respir Crit Care Med 160, 736-755.

NU

Su, R., Li, M.M., Bhakta, N.R., Solberg, O.D., Darnell, E.P., Ramstein, J., Garudadri, S., Ho, M., Woodruff, P.G. and Koth, L.L.2014. Longitudinal analysis of sarcoidosis blood transcriptomic signatures and disease outcomes. European Respiratory Journal 44, 985-993.

MA

Syed, J. and Myers, R.2004. Sarcoid heart disease. The Canadian journal of cardiology 20, 89-93. Tanaka, G., Matsushita, I., Ohashi, J., Tsuchiya, N., Ikushima, S., Oritsu, M., Hijikata, M., Nagata, T., Yamamoto, K. and Tokunaga, K.2005. Evaluation of microsatellite markers in association studies: a search for an immune-related susceptibility gene in sarcoidosis. Immunogenetics 56,

D

861-870.

TE

Valentonyte, R., Hampe, J., Huse, K., Rosenstiel, P., Albrecht, M., Stenzel, A., Nagy, M., Gaede, K.I., Franke, A. and Haesler, R.2005. Sarcoidosis is associated with a truncating splice site mutation in BTNL2. Nature genetics 37, 357-364.

CE P

Veltkamp, M. and Grutters, J.C.: The Pulmonary Manifestations of Sarcoidosis, Pulmonary Sarcoidosis. Springer (2014), pp. 19-39. Voorter, C.E., Drent, M. and van den Berg-Loonen, E.M.2005. Severe pulmonary sarcoidosis is strongly associated with the haplotype HLA-DQB1* 0602–DRB1* 150101. Human

AC

immunology 66, 826-835. Wang, L., Cao, C., Ma, Q., Zeng, Q., Wang, H., Cheng, Z., Zhu, G., Qi, J., Ma, H. and Nian, H.2014. RNA-seq analyses of multiple meristems of soybean: novel and alternative transcripts, evolutionary and functional implications. BMC plant biology 14, 169.

Xanthou, G., Duchesnes, C.E., Williams, T.J. and Pease, J.E.2003. CCR3 functional responses are regulated by both CXCR3 and its ligands CXCL9, CXCL10 and CXCL11. European journal of immunology 33, 2241-2250. Xie, C., Mao, X., Huang, J., Ding, Y., Wu, J., Dong, S., Kong, L., Gao, G., Li, C.-Y. and Wei, L.2011. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic acids research 39, W316-W322. Zhou, T., Zhang, W., Sweiss, N.J., Chen, E.S., Moller, D.R., Knox, K.S., Ma, S.-F., Wade, M.S., Noth, I. and Machado, R.F.2012. Peripheral blood gene expression as a novel genomic biomarker in complicated sarcoidosis. PLoS One 7, e44818. Zhou, Y., Li, H. and Li, Q.2008. Differentiation of sarcoidosis from tuberculosis using real-time PCR assay for the detection and quantification of Mycobacterium tuberculosis. Sarcoidosis vasculitis and diffuse lung disease 25, 93-99. Zissel, G., Prasse, A. and Müller-Quernheim, J.: Immunologic response of sarcoidosis, Seminars in 17

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

AC

CE P

TE

D

MA

NU

SC R

IP

T

respiratory and critical care medicine. © Thieme Medical Publishers (2010), pp. 390-403.

18

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Figure legends Figure 1. The result of hierarchical clustering analysis for the differentially expressed

IP

T

genes (DEGs). Red and blue indicate higher expression and lower expression,

SC R

respectively.

Figure 2. The protein-protein interaction (PPI) network constructed by STRING database for the differentially expressed genes (DEGs). Red and green circles

NU

represent up-regulated and down-regulated genes, respectively. Nodes with yellow

MA

band are key nodes, and purple edges stand for their interactions. Figure 3. The protein-protein interaction (PPI) network constructed by REACTOME

D

database for the differentially expressed genes (DEGs). Red and green circles

TE

represent up-regulated and down-regulated genes, respectively.

CE P

Figure 4. The common protein-protein interactions (PPI) pairs between the PPI network constructed by STRING database and the PPI network constructed by

AC

REACTOME database. Red and green circles represent up-regulated and down-regulated genes, respectively. Nodes with yellow band are key nodes.

19

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Table 1. The statistical metrics for key differentially expressed genes (DEGs). P-value

False discovery rate

log fold change

CXCL11 CXCL9 GBP1 STAT1 CCL5 ALPL GLDN GDF15 RBP4 PGC

2.06E-04 7.20E-04 1.88E-04 8.30E-05 8.13E-04 3.34E-04 6.18E-04 7.85E-05 1.40E-03 1.52E-04

0.02159 0.03591 0.02077 0.01528 0.03763 0.02584 0.03335 0.01494 0.0478 0.01956

3.292573 3.241653 1.978243 1.697385 1.455779 -1.85024 -1.87851 -1.92919 -1.94572 -3.13401

AC

CE P

TE

D

MA

NU

SC R

IP

T

Gene

20

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Table 2. The Gene Ontology (GO) terms enriched for the differentially expressed genes (DEGs) involved in the protein-protein interaction (PPI) network constructed by STRING database. P-value

Gene symbol

21

1.31E-09

GO:0010033: response to organic 16 substance

1.30E-05

GO:0006952: defense response

14

4.51E-05

to 11

4.66E-05

LY75, FYB, GZMA, IL7, CXCL9, CCL19, ZEB1, IL15, TRBC1, CCL5, CXCL11, SLC11A1, SH2D1A, TAP2, BCL2, TAP1, FCN1, CCR2, CFI, GBP4, GBP1 ALPL, GATM, LDLR, DICER1, NR3C1, CCL5, STAT1, CD48, PRKAR2B, SLC11A1, TAP2, BCL2, SLC25A36, GNB4, GNG2, PRKACB LY75, CXCL9, CCL19, IL15, CXCL11, CCL5, CD48, SLC11A1, SH2D1A, TAP2, BCL2, TAP1, CCR2, CFI ALPL, PRKAR2B, GATM, LDLR, SLC25A36, BCL2, GNG2, GNB4, PRKACB, CCL5, STAT1 GNA13, LY75, GATM, CXCL9, CCL19, IL15, CXCL11, CCL5, SOD2, SLC11A1, BCL2, CCR2, CFI CD48, SLC11A1, IL7, BCL11B, BCL2, BCL11A, IL15 ALPL, PRKAR2B, GATM, LDLR, SLC25A36, BCL2, GNG2, GNB4, PRKACB, CCL5, STAT1 CD48, VCAM1, SLC11A1, IL7, BCL11B, BCL2, BCL11A, IL15 GNA13, CD48, VCAM1, SLC11A1, IL7, BCL11B, BCL2, BCL11A, IL15

7

CE P

GO:0042110: T cell activation

NU

MA

D

to 13

4.74E-05

TE

GO:0009725: response hormone stimulus GO:0009611: response wounding

8.6E-05 1.06E-04

GO:0030098: differentiation

3.15E-04

AC

GO:0009719: response to 11 endogenous stimulus GO:0046649: lymphocyte 8 activation GO:0001775: cell activation 9 lymphocyte 6

GO:0030097: hemopoiesis GO:0006968: cellular defense response GO:0002684: positive regulation of immune system process GO:0055066: di-, tri-valent inorganic cation homeostasis GO:0045321: leukocyte activation GO:0030217: T cell differentiation

SC R

number GO:0006955: immune response

T

Gene

IP

Term

1.47E-04 2.41E-04

VCAM1, IL7, BCL11B, BCL2, BCL11A, IL15

8

4.15E-04

VCAM1, IL7, BCL11B, BCL2, BCL11A, CDK6, IL15, SOD2

5

4.16E-04

SH2D1A, TAP2, CCR2, CXCL9, CCL5

8

4.37E-04

8

4.48E-04

8

4.82E-04

5

5.30E-04 21

VCAM1, SLC11A1, SH2D1A, IL7, TAP2, IL15, CFI, AQP3 GNA13, SLC11A1, BCL2, CCR2, CCL19, CCL5, PRKCB, SOD2 CD48, VCAM1, SLC11A1, IL7, BCL11B, BCL2, BCL11A, IL15 IL7, BCL11B, BCL2, BCL11A, IL15

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

6.89E-04 7.38E-04

LY75, SLC11A1, CCR2, CXCL9, CCL19, IL15, CFI, CXCL11, CCL5 GNA13, ATXN1, SLC11A1, LDLR, BCL2, CCR2, CCL19, ABCA1, CCL5, PRKCB, SOD2 VCAM1, IL7, BCL11B, BCL2, BCL11A, CDK6, IL15, SOD2

T

5.54E-04

AC

CE P

TE

D

MA

NU

SC R

IP

GO:0006954: inflammatory 9 response GO:0048878: chemical 11 homeostasis GO:0048534: hemopoietic or 8 lymphoid organ development

22

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Table 3. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched for the differentially expressed genes (DEGs) involved in the protein-protein interaction (PPI) network constructed by STRING database. Gene symbol

T

P-value

IP

hsa04062: Chemokine signaling pathway

Gene number 11

2.37E-06 SOS1, CCR2, CXCL9, CCL19, GNG2, GNB4, PRKACB, CXCL11, CCL5, STAT1, PRKCB 5.81E-03 IL7, CCR2, CXCL9, CCL19, IL15, CXCL11, CCL5, IFNGR1 2.51E-02 CD48, SH2D1A, SOS1, IFNGR1, PRKCB

SC R

Term

NU

hsa04060: Cytokine-cytokine receptor 8 interaction hsa04650: Natural killer cell mediated 5 cytotoxicity hsa04630: Jak-STAT signaling pathway 5

AC

CE P

TE

D

MA

4.08E-02 IL7, SOS1, IL15, STAT1, IFNGR1

23

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Table 4. The top 10 Gene Ontology (GO) terms enriched for the differentially expressed genes (DEGs) involved in the protein-protein interaction (PPI) network

Gene

P-value

Gene symbol

IP

Term

T

constructed by REACTOME database.

number 6 6

4.87E-06 4.87E-06

ROBO1, CCR2, CXCL9, CCL19, CXCL11, CCL5 ROBO1, CCR2, CXCL9, CCL19, CXCL11, CCL5

10

3.34E-05

GNA13, GPR18, LPAR6, CCR2, CXCL9, CCL19, GNG2, PRKACB, CXCL11, CCL5

12

5.16E-05

5

5.69E-05

PRKAR2B, GNG2, GNB4, PRKACB, STAT1

6

6.52E-05

ROBO1, CCR2, CXCL9, CCL19, CXCL11, CCL5

GO:0007610~behavior

7

NU

MA

D

TE

7.81E-05

GNA13, GPR18, LPAR6, SOS1, CCR2, CXCL9, CCL19, GNG2, PRKACB, CXCL11, STAT1, CCL5

PRKAR2B, ROBO1, CCR2, CXCL9, CCL19, CXCL11, CCL5

7.83E-05

PRKAR2B, GNG2, GNB4, PRKACB

1.00E-04

PRKAR2B, TAP2, DICER1, PRKACB, STAT1, CCL5

1.26E-04

TAP2, CCR2, CXCL9, CCL5

AC

CE P

GO:0009755~hormone-mediated 4 signaling GO:0010033~response to organic 8 substance GO:0006968~cellular defense 4 response

SC R

GO:0042330~taxis GO:0006935~chemotaxis GO:0007186~G-protein coupled receptor protein signaling pathway GO:0007166~cell surface receptor linked signal transduction GO:0032870~cellular response to hormone stimulus GO:0007626~locomotory behavior

24

GNG2,

GNB4,

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Table 5. The top 10 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched for the differentially expressed genes (DEGs) involved in the protein-protein interaction (PPI) network constructed by REACTOME database. P-value

T

Term

SC R

IP

Gene number

11

6.12E-06

MA

NU

hsa04062: Chemokine signaling pathway

4

1.44E-02

hsa05032: Morphine addiction

4

1.60E-02

hsa04713: Circadian entrainment

4

1.83E-02

hsa04723: Retrograde endocannabinoid signaling 4

2.22E-02

hsa04620: Toll-like receptor signaling pathway

4

2.58E-02

hsa04725: Cholinergic synapse

4

2.98E-02

hsa04726: Serotonergic synapse

4

3.06E-02

AC

CE P

TE

D

hsa04727: GABAergic synapse

25

Gene symbol CXCL9, CXCL11, CCL5, PRKCB, PRKACB, STAT1, SOS1, CCR2, CCL19, GNG2, GNB4 GNG2, PRKACB, PRKCB, GNB4 GNG2, PRKACB, PRKCB, GNB4 GNG2, PRKACB, PRKCB, GNB4 GNG2, PRKACB, PRKCB, GNB4 CXCL9, CXCL11, CCL5, STAT1 GNG2, PRKACB, PRKCB, GNB4 GNG2, PRKACB, PRKCB,

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

3

T

hsa05140: Leishmaniasis

IP

4

AC

CE P

TE

D

MA

NU

SC R

hsa04724: Glutamatergic synapse

26

GNB4 GNG2, PRKACB, 3.23E-02 PRKCB, GNB4 IFNGR1, 4.08E-02 STAT1, PRKCB

ACCEPTED MANUSCRIPT

SC R

IP

T

Zhaoet al,Genes related to sarcoidosis

AC

CE P

TE

D

MA

NU

Figure 1

27

ACCEPTED MANUSCRIPT

MA

NU

SC R

IP

T

Zhaoet al,Genes related to sarcoidosis

AC

CE P

TE

D

Figure 2

28

ACCEPTED MANUSCRIPT

MA

NU

SC R

IP

T

Zhaoet al,Genes related to sarcoidosis

AC

CE P

TE

D

Figure 3

29

ACCEPTED MANUSCRIPT

MA

NU

SC R

IP

T

Zhaoet al,Genes related to sarcoidosis

AC

CE P

TE

D

Figure 4

30

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Highlights: 1. There were 208 differentially expressed genes in pulmonary sarcoidosis samples.

IP

T

2. CXCL9, CXCL11 and CCL5 were enriched in immune response and defense response.

SC R

3. CXCL9, CXCL11, STAT1, CCL5 and GBP1 might affect sarcoidosis via

AC

CE P

TE

D

MA

NU

interactions.

31

ACCEPTED MANUSCRIPT Zhaoet al,Genes related to sarcoidosis

Abbreviations

differentially expressed genes (DEGs); protein-protein interaction (PPI); human

IP

T

leukocyte antigen (HLA); butyrophilin-like 2 (BTNL2); Gene Expression Omnibus

SC R

(GEO); Gene Ontology (GO); biological process (BP);cellular component (CC);

AC

CE P

TE

D

MA

NU

molecular function (MF);

32