IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data


Saurav Mallik, Sagnik Sen, Ujjwal Maulik

PII: S0378-1119(16)30246-3
DOI: 10.1016/j.gene.2016.03.056
Reference: GENE 41262

To appear in: Gene

Received date: 2 December 2015
Revised date: 22 February 2016
Accepted date: 30 March 2016

Please cite this article as: Mallik, Saurav, Sen, Sagnik, Maulik, Ujjwal, IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data, Gene (2016), doi: 10.1016/j.gene.2016.03.056

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

IDPT: Insights into Potential Intrinsically Disordered Proteins Through Transcriptomic Analysis of Genes for Prostate Carcinoma Epigenetic Data

Saurav Mallik a, Sagnik Sen b, and Ujjwal Maulik c

a Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, India.
b Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India.
c Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India.

Abstract

The involvement of intrinsically disordered proteins (IDPs) in severe diseases such as cancer is an interesting research topic. To gain novel insights into the regulation of IDPs, in this article we perform a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a human multi-omics prostate carcinoma dataset having both gene expression and methylation data. First, we select the genes that have both expression and methylation data and that correspond to the cancer-related, prostate-tissue-specific disordered proteins of the MobiDb database. We apply the standard t-test to determine differentially expressed genes as well as differentially methylated genes. A network comprising these genes, their targeter miRNAs from the DIANA TarBase v7.0 database and the corresponding transcription factors from the TRANSFAC and ITFP databases is then built. Thereafter, we perform a literature search, and KEGG pathway and Gene Ontology analyses using the DAVID database. Finally, we report several significant potential gene-markers (with the corresponding IDPs) that have an inverse relationship between differential expression and methylation patterns, and that are hub genes of the TF-miRNA-gene network.

Preprint submitted to Elsevier, 22 February 2016

Key words: Intrinsically disordered proteins, Epigenetic gene-markers, Transcriptomic analysis of genes, Multi-omics prostate carcinoma epigenetic dataset, t-test, TF-miRNA-gene network, hub genes.

1 Introduction

Intrinsically disordered proteins (IDPs) are proteins that lack a stable ordered structure [1], [2], [3]. An IDP may contain a specific disordered region, or it may be structurally disordered throughout. IDPs occur in different categories of proteins (viz., membrane, soluble, aqueous proteins, etc.) [4], and they are indicators of evolutionary rate [5]. IDPs may be involved in different types of functional activities. In regulatory and signaling pathways, IDPs that behave like lead compounds interact with other proteins through their disordered regions. It is known that many proteins need to acquire a well-defined structure in order to perform their function, while a significant portion of the proteome of any organism consists of polypeptide segments that are not likely to form a defined 3D structure but are still functional. These segments are called intrinsically disordered regions (IDRs) [6]. Proteins that do not have IDRs are called structured proteins, whereas proteins with completely disordered sequences that do not acquire any tertiary structure are called intrinsically disordered proteins (IDPs) [7]. An IDP is thus either completely disordered or contains IDRs of different lengths at different positions [8]. Several IDRs have also been predicted within otherwise ordered proteins. Approximately 30% of eukaryotic proteins have disordered regions, but these proteins are not necessarily differentially expressed.

Email addresses: [email protected], chasaurav [email protected] (Saurav Mallik), [email protected] (Sagnik Sen), [email protected] (Ujjwal Maulik).

Because of their unusual structural characteristics and important functional roles, the presence of IDPs in a cell needs to be observed very carefully. Differentially expressed IDPs can give rise to severe diseases such as cancers, carcinomas, cardiac diseases and neurodegenerative diseases [8].

Gene expression is no longer regarded as an isolated matter [9]. Epigenetic factors such as DNA methylation [10], [11], [12] can affect the regulation of genes. DNA methylation is the addition of a methyl group to the 5th carbon of the cytosine pyrimidine ring or to the 6th nitrogen of the adenine purine ring in genomic DNA. Methylation in general decreases gene expression levels. It has two states (hyper-methylation and hypo-methylation). Approximately 80-90% of CpG sites are methylated across the whole human genome. Until now, IDPs have not been analyzed with a multi-omics epigenetic dataset.

Transcription factors (TFs) [13] also play an important role in gene regulation. TFs may regulate genes and miRNAs either positively or negatively. A TF controls the flow of genetic information from DNA to mRNA, either alone or together with other proteins in a group, by promoting or blocking the recruitment of RNA polymerase to specific genes. TFs have a significant role in gene expression levels as well as methylation levels.

Prostate carcinoma is a type of cancer that affects the male prostate gland. It is most common in older men and is uncommon below the age of 45; the average age of prostate cancer patients may be about 70. Three risk groups are distinguished (viz., genetic, obesity and age). The disease often has no clear symptoms of its own, and the symptoms it does produce resemble those of benign prostatic hyperplasia. Common symptoms include hematuria (i.e., blood in urine) and painful urination. In advanced stages, prostate carcinoma may affect other parts of the body such as the spinal cord and the femur.

The equilibrium level of a protein [14] depends on the rate of its synthesis relative to the rate of its degradation. The quantity of a protein produced in the cell is affected by the expression level of its mRNA (gene) transcript. If proteins need to remain in the disordered state for any length of time, they must either bypass the endogenous degradation pathways that target unfolded proteins (e.g., the ATP-dependent proteolytic 26S proteasome) or be produced in sufficient quantity to temporarily overload those pathways. To determine the expression patterns of mRNAs (genes) for transcripts encoding IDPs, and to gain novel insights into the biomolecular mechanisms of transcriptional regulation, a bioinformatics analysis can be carried out on any publicly available tissue-specific mRNA (gene) expression data having experimental (diseased) and control (normal) samples. Until now, no epigenetic study has been attempted for this purpose; using multi-omics epigenetic data should therefore enhance the analysis. Therefore, to find novel insights into the regulation of IDPs, in this article we perform a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a publicly available human multi-omics prostate carcinoma (epigenetic) dataset having both gene expression and methylation data. We identify epigenetic gene-markers from their differential expression and methylation patterns by integrating statistical and TF-miRNA-gene network analyses. Thereafter, the IDPs corresponding to the resulting gene-markers are obtained. These IDPs are called potential IDPs since their corresponding mRNA transcripts are both differentially expressed and differentially methylated, and are also hub genes (centralized genes) of the TF-miRNA-gene network, i.e., genes associated with a higher number of other biomolecules (TFs and miRNAs).

In brief, we first collect from the MobiDb database [15] the IDPs that are associated with cancers in different tissue types of the human body (viz., prostate, ovarian, lung, oral, cardiac, lymph, pancreatic and breast tissue). Since a protein is the translational product of an mRNA (gene), and epigenetic data are not available at the protein level, we take a multi-omics prostate carcinoma epigenetic dataset having both gene expression and methylation data. We then consider the matched genes (G_match) that are common to both datasets (i.e., the gene expression and methylation datasets), and intersect G_match with the genes corresponding to the IDPs collected from MobiDb (= G_integrated). Notably, only the samples matched between the expression and methylation datasets are considered for the intersected genes. After that, we run a standard t-test [16] on the expression data and on the methylation data of the selected genes to identify differentially expressed genes (DE_genes) and differentially methylated genes (DM_genes), respectively. A volcano plot is used to determine whether the genes are up-regulated (DE_up) or down-regulated (DE_down), and hyper-methylated (DM_hyper) or hypo-methylated (DM_hypo). We also determine the genes that have an inverse relationship between expression and methylation patterns, i.e., (DE_up ∩ DM_hypo) or (DE_down ∩ DM_hyper), which are together denoted G_inv. In addition, we take all the intersected genes (G_integrated) and identify their targeter miRNAs using DIANA TarBase v7.0. We also determine the transcription factors (TFs) that regulate these genes through the TRANSFAC [16], [17] and ITFP [16], [18] databases. We then build a network comprising these genes, TFs and miRNAs, and highlight the genes that are associated with a higher number of TFs and/or miRNAs. We compute the degree centrality of the genes and determine the hub genes of the network (denoted G_hub). Next, we intersect G_inv and G_hub; the resulting genes are stated as epigenetic markers for prostate carcinoma. In addition, we perform a literature search and Gene Ontology analysis using the DAVID database. Finally, the top epigenetic gene-markers and the corresponding potential IDPs are reported.

The rest of the article is organized as follows. Section 2 surveys the literature on IDPs and related work. Section 3 describes our method, and Section 4 presents the dataset together with the experimental results and discussion. Finally, Section 5 concludes the article.

2 Literature Review

Research on IDPs is largely restricted to in vivo techniques [19]; in the area of in silico studies, only a few computational methods have been developed. Defining the structure of a protein is one of the important objectives in computational biology. Earlier computational work on IDPs was based on X-ray crystallography and nuclear magnetic resonance (NMR) predictions [20], [21]. Recent studies report that differentially expressed IDPs are involved in diseases such as cancers, neurodegenerative diseases, HIV [22] and heart diseases. Computational techniques are mainly used to predict the disordered regions [20], [21]. As mentioned above, early computational research depended on core structural studies; later, different ad hoc techniques and energy computations were used to predict IDRs. Several well-known tools have been developed for this purpose, one of which is DISOPRED (viz., DISOPRED2 [23] and DISOPRED3 [24]), which is designed for IDR prediction. The relationship between diseases and IDPs is now an important research area [1], [19]. Proteins with IDRs can cause different neurodegenerative diseases (e.g., Alzheimer's disease). Sometimes gene regulatory networks or pathways [1] are completely or partially affected by IDPs. If an IDP acts as a lead compound, its IDR becomes an active site. As a result, proteins with IDRs behave differently during interaction, which affects the normal course of protein-protein interactions as well as regulatory networks and pathways. The functional classes associated with IDPs have been proposed to cover a wider range of biological processes than the classes related to structured proteins. The concept of IDPs does not belong to the classical structure theory of proteins: the traditional structure-function relationship applies to stable, folded proteins, whose structural features are identified for interaction with neighboring molecules [8], [25].

In addition, [26] studied significant highly disordered proteins and their functions in human cells by analyzing gene expression data. The association of IDPs with cell signaling and cancer is demonstrated in [1] and [27]. Furthermore, several works have examined the transcriptome of Nef-expressing U937 cells and their exosomes [28], [29].

From the literature, it is observed that no epigenetic study has previously been performed for determining IDPs. Such a study might help to refine previous findings regarding IDPs and might also produce novel results. Therefore, to find novel insights into the regulation of IDPs, in this article we perform a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a publicly available human multi-omics prostate carcinoma (epigenetic) dataset having both gene expression and methylation data. We identify epigenetic gene-markers from their differential expression and methylation patterns by integrating statistical and TF-miRNA-gene network analyses. Here, the epigenetic gene-markers must show both differential expression and differential methylation, and must also be hub genes of the network. Finally, the IDPs corresponding to the resulting gene-markers are obtained and reported. These IDPs are stated as potential IDPs because their corresponding mRNA transcripts are both differentially expressed and differentially methylated, and are also hub genes (centralized genes) of the TF-miRNA-gene network.

3 Materials and Methods

In this article, we carry out a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a publicly available human multi-omics prostate carcinoma (epigenetic) dataset having both gene expression and methylation data. We determine epigenetic gene-markers from their differential expression and methylation patterns by integrating statistical and TF-miRNA-gene network analyses. The IDPs corresponding to the resulting gene-markers are also identified. The steps of the transcriptomic analysis are described as follows.

3.1 Initial set of IDPs from the MobiDb database

MobiDb [15] is a structural database of IDPs that combines data from DisProt [37], PDB [38], EMBL, UniProt and other structural databases. MobiDb provides tissue specificity and annotation for each IDP. IDPs can be chosen from MobiDb according to their cancer-specific tissue types (viz., prostate, lung, ovarian, oral, pancreatic, cardiac, lymph and breast tissues). In this article, we use MobiDb to collect the IDPs related to one of these tissues, viz., prostate tissue.

3.2 Initial geneset determination

We initially consider the genes that have both expression and methylation values in the multi-omics dataset, restricted to the (matched) samples common to both data types (= G_match). We then intersect G_match with the set of genes corresponding to the collected IDPs (= G_integrated). These intersected genes are used for further study.
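The gene-set construction above amounts to two set intersections. The following is a minimal sketch under stated assumptions: the function name `build_gene_sets` and the toy gene lists (including the placeholder symbols `GENE_A`, `GENE_B`, `GENE_C`) are hypothetical and are not taken from the dataset.

```python
def build_gene_sets(expr_genes, meth_genes, idp_genes):
    """Return (G_match, G_integrated) as sets of gene symbols.

    G_match      = genes present in both expression and methylation data
    G_integrated = G_match restricted to genes that encode collected IDPs
    """
    g_match = set(expr_genes) & set(meth_genes)
    g_integrated = g_match & set(idp_genes)
    return g_match, g_integrated


# Hypothetical toy inputs, for illustration only.
expr_genes = ["PHF19", "HTRA1", "SPSB1", "GENE_A"]
meth_genes = ["PHF19", "HTRA1", "SPSB1", "GENE_B"]
idp_genes = ["PHF19", "SPSB1", "GENE_C"]

g_match, g_integrated = build_gene_sets(expr_genes, meth_genes, idp_genes)
print(sorted(g_match))
print(sorted(g_integrated))
```

Using sets makes the result independent of the order in which the genes are listed in either data file.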

3.3 Data-normalization

The microarray technique is a useful tool for measuring gene expression across different experimental and control samples. Before applying any statistical test, we need to scale the values of each gene. Therefore, we first apply a normalization technique to each of the G_integrated genes, for the expression data as well as the methylation data. We use zero-mean normalization [11], which converts the mean value of each gene to zero:

\[ y_{jk}^{norm} = \frac{y_{jk} - \mu}{\sigma}, \tag{1} \]

where $\mu$ and $\sigma$ stand for the mean and standard deviation, respectively, of the expression/methylation data of the j-th gene before normalization, and $y_{jk}$ and $y_{jk}^{norm}$ denote the value of the j-th gene in the k-th sample before and after normalization, respectively.
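As an illustration of Eq. (1), the sketch below normalizes the profile of a single gene across its samples. It is not the authors' implementation: the function name `zero_mean_normalize` and the toy values are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def zero_mean_normalize(values):
    """Apply Eq. (1) to one gene: subtract the mean, divide by the std."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("a constant gene profile cannot be normalized")
    return [(y - mu) / sigma for y in values]


# Hypothetical expression values of one gene across four samples.
profile = [2.0, 4.0, 6.0, 8.0]
normalized = zero_mean_normalize(profile)
print(normalized)
```

After normalization the gene has mean zero and unit standard deviation, so genes measured on different scales become comparable.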

3.4 Identification of differentially expressed and differentially methylated genes

A statistical test is needed to determine differentially expressed (DE) genes as well as differentially methylated (DM) genes. The t-test [9] is a popular test that is widely used on microarray data for determining DE/DM genes. The (two-sample) t-test compares the difference between the means of two groups while taking the variation of the data into account. The p-value is the probability of observing a t-value as large as or larger than the actually observed t-value when the null hypothesis is true; it is computed from either a t-table or the cumulative distribution function (cdf). For each gene n, let group 1 consist of $m_1$ diseased samples with mean $a_1$ and standard deviation $s_{m_1}$, and let group 2 contain $m_2$ control samples with mean $a_2$ and standard deviation $s_{m_2}$. The t-statistic is then defined as

\[ t = \frac{a_1 - a_2}{e_m}, \tag{2} \]

where $e_m$ refers to the standard error of the difference between the group means, formulated as

\[ e_m = e_{pooled} \sqrt{\frac{1}{m_1} + \frac{1}{m_2}}. \tag{3} \]

Here, $e_{pooled}$ is the pooled estimate of the population standard deviation, i.e.,

\[ e_{pooled} = \sqrt{\frac{(m_1 - 1)\, s_{m_1}^2 + (m_2 - 1)\, s_{m_2}^2}{dof}}, \tag{4} \]

where dof refers to the degrees of freedom of the test (i.e., $dof = m_1 + m_2 - 2$).

In this article, we apply the t-test to the normalized expression data of each gene in G_integrated to identify DE genes. Similarly, we apply the t-test to the normalized methylation data of each gene in G_integrated to identify DM genes. If the p-value of a gene for its (normalized) expression data is less than 0.05 (5%), the gene is considered DE. Similarly, if the p-value of a gene for its (normalized) methylation data is less than 0.05, the gene is considered DM.
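To make Eqs. (2)-(4) concrete, the sketch below computes the pooled two-sample t-statistic and its degrees of freedom for one gene. The function name `pooled_t_statistic` and the toy values are hypothetical. The p-value step is left out because the standard library has no t-distribution cdf; in practice a routine such as SciPy's `scipy.stats.ttest_ind` (which assumes equal variances by default) could supply it.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group1, group2):
    """Return (t, dof) for the equal-variance two-sample t-test."""
    m1, m2 = len(group1), len(group2)
    a1, a2 = mean(group1), mean(group2)
    dof = m1 + m2 - 2
    # Eq. (4): pooled standard deviation from the two sample variances.
    e_pooled = sqrt(
        ((m1 - 1) * variance(group1) + (m2 - 1) * variance(group2)) / dof
    )
    # Eq. (3): standard error of the difference between the group means.
    e_m = e_pooled * sqrt(1 / m1 + 1 / m2)
    # Eq. (2): the t-statistic.
    return (a1 - a2) / e_m, dof


# Hypothetical normalized values of one gene.
diseased = [1.0, 2.0, 3.0]
control = [4.0, 5.0, 6.0]
t_value, dof = pooled_t_statistic(diseased, control)
print(t_value, dof)
```

A negative t-value here means the gene has a lower mean in the diseased group than in the control group.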

After obtaining the DE/DM genes, we use a volcano plot to identify up-regulated (over-expressed) genes (DE_up) and down-regulated (under-expressed) genes (DE_down) from the expression data. Similarly, hyper-methylated genes (DM_hyper) and hypo-methylated genes (DM_hypo) are obtained from the methylation data using a volcano plot.
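A volcano plot places each gene at (log fold change, -log10 p-value), so the direction of change can be read from the sign of the fold change once the p-value passes the significance cutoff. The sketch below shows one way to express that rule. It rests on assumptions: the function names, the toy values and the absence of a minimum fold-change cutoff (`fc_cutoff=0.0`) are hypothetical, because the text only states the 5% p-value threshold.

```python
from math import log10


def volcano_point(log2_fold_change, p_value):
    """Coordinates of one gene on a volcano plot."""
    return log2_fold_change, -log10(p_value)


def volcano_status(log2_fold_change, p_value, p_cutoff=0.05, fc_cutoff=0.0):
    """Classify a gene as 'up', 'down' or 'not significant'.

    For methylation data, 'up' corresponds to hyper-methylated and
    'down' to hypo-methylated.
    """
    if p_value >= p_cutoff or abs(log2_fold_change) <= fc_cutoff:
        return "not significant"
    return "up" if log2_fold_change > 0 else "down"


# Hypothetical (log2 fold change, p-value) pairs, for illustration only.
toy_genes = {
    "GENE_A": (1.2, 0.001),
    "GENE_B": (-0.8, 0.02),
    "GENE_C": (0.5, 0.30),
}
status = {gene: volcano_status(fc, p) for gene, (fc, p) in toy_genes.items()}
print(status)
```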

3.5 Integrated study of methylation on expression

We now identify the genes whose differential expression status is the inverse of their differential methylation status (G_inv). In other words, we consider the genes showing either the (i) DE_up ∩ DM_hypo or the (ii) DE_down ∩ DM_hyper pattern. These genes are later used for selecting the disease markers.
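The two inverse patterns can be written as a union of two set intersections. In the sketch below, the gene lists are transcribed from Tables 1 and 2 of this article; the function name `inverse_genes` is hypothetical and the code is an illustration rather than the authors' implementation.

```python
# Gene lists transcribed from Table 1 (expression) and Table 2 (methylation).
de_up = {"STX19", "MED25", "SNRPB"}
de_down = {
    "PHF19", "SAMD3", "CCDC151", "FYCO1", "HTRA1", "RXRA",
    "HNRNPA1", "SF3B2", "SPSB1", "SMAD9", "CDCA4",
}
dm_hyper = {"SPSB1", "PHF19", "HTRA1", "CCDC151"}
dm_hypo = {"APP", "STX19", "LOXL4"}


def inverse_genes(de_up, de_down, dm_hyper, dm_hypo):
    """G_inv = (DE_up ∩ DM_hypo) ∪ (DE_down ∩ DM_hyper)."""
    return (de_up & dm_hypo) | (de_down & dm_hyper)


g_inv = inverse_genes(de_up, de_down, dm_hyper, dm_hypo)
print(sorted(g_inv))
# ['CCDC151', 'HTRA1', 'PHF19', 'SPSB1', 'STX19']
```

The result contains one gene from the first pattern (STX19) and four from the second, in agreement with the five genes reported in Section 4.2.1.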

3.6 Accumulation of TFs and miRNAs of the genes

We then collect the targeter miRNAs of the genes in G_integrated using the DIANA TarBase v7.0 database [30], a web server that provides validated interactions between genes and their targeter miRNAs. In addition, we obtain the TFs of the genes in G_integrated from the TRANSFAC [16], [17] and ITFP [16], [18] databases.

3.7 TF-miRNA-gene regulatory network formation and hub gene selection

A network comprising the genes, the corresponding TFs and the targeter miRNAs is then built. The TF-miRNA-gene network is analyzed with a well-known topological measure, degree centrality (DC) [36], [40], [41], since the whole network is treated as a symmetric (undirected) network.

The degree centrality measure considers only the degree of a vertex (gene/TF/miRNA), i.e., the number of vertices directly connected to it (its nearest neighbors). Each neighbor of the vertex carries equal importance in the calculation. Here, the degree centrality of a gene in the final network counts the total number of interactions of that gene with its miRNAs and TFs. After computing the DC values, we rank the genes from highest (best) to lowest (worst) DC and choose the top-ranked genes. These top genes are stated as hub genes (centralized genes) of the network (G_hub) and are used for epigenetic marker discovery in the next step. The TF-miRNA-gene network is visualized using Cytoscape 3.2.1.
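The degree-centrality ranking described above can be sketched with a plain adjacency map. This is a minimal illustration under stated assumptions: the function names, the toy edge list and the cutoff `k` are hypothetical, and DC is taken as the raw neighbor count (not normalized by network size), which matches the integer DC values reported in Section 4.2.2.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each vertex of an undirected network to its number of neighbors."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(adj) for node, adj in neighbors.items()}


def top_hub_genes(edges, genes, k):
    """Return the k genes with the highest degree (ties broken by name)."""
    dc = degree_centrality(edges)
    ranked = sorted(genes, key=lambda g: (-dc.get(g, 0), g))
    return ranked[:k]


# Hypothetical gene-miRNA and gene-TF interactions, for illustration only.
edges = [
    ("GENE_A", "miR-1"), ("GENE_A", "miR-2"), ("GENE_A", "TF-1"),
    ("GENE_B", "miR-1"), ("GENE_B", "TF-2"),
    ("GENE_C", "miR-3"),
]
genes = ["GENE_A", "GENE_B", "GENE_C"]

dc = degree_centrality(edges)
hubs = top_hub_genes(edges, genes, k=2)
print(dc["GENE_A"], dc["GENE_B"], dc["GENE_C"])
print(hubs)
```

Storing neighbors in sets means that a duplicated interaction record is counted only once.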

3.8 Epigenetic marker identification

We then intersect the G_inv genes with the G_hub genes. The intersected genes are the epigenetic gene-markers: they are hub genes and also show an inverse relationship between their differential expression and differential methylation status. In addition, the IDPs corresponding to these epigenetic gene-markers are reported.

3.9 Biological validation

Furthermore, we perform a literature search as well as KEGG pathway and Gene Ontology (GO) analyses using the DAVID database in order to validate the gene-markers. A flowchart of the proposed framework is provided in Fig. 1.

Fig. 1. A flow chart describing the proposed framework.

4 Epigenetic Dataset and Experimental Result

In this section, we describe the real epigenetic data used for the experiment, and then present the experimental results and related discussion.

4.1 Dataset Information

In this article, we use a multi-omics prostate carcinoma (epigenetic) dataset (NCBI Ref. id: GSE55599), which consists of 18,309 common genes having both gene expression and methylation values (= G_match). It has 32 common prostate carcinoma (experimental) samples and 16 common control (normal) samples.

4.2 Experimental Results and Discussion

Here, we discuss the experimental results.

4.2.1 Determination of Differentially Expressed and Differentially Methylated genes

We first accumulate 18,309 genes as G_match and then identify 40 genes as G_integrated. Thereafter, as described in Section 3, we perform data normalization followed by the t-test on the expression/methylation data of the G_integrated genes. We obtain three genes in the DE_up category and eleven genes in the DE_down category (see Table 1). Similarly, four genes are identified as DM_hyper and three genes as DM_hypo (see Table 2 and Fig. 2). We then identify only one gene with the DE_up ∩ DM_hypo pattern and four genes in the DE_down ∩ DM_hyper category (see Fig. 3). The p-values and the differential expression and methylation status of these five (= 1 + 4) genes are given in Table 3.

4.2.2 Resultant Network and Hub gene identification

We build a network consisting of the genes of the G_integrated set, the targeter miRNAs collected from the DIANA TarBase v7.0 database, and the corresponding TFs accumulated from the TRANSFAC and ITFP databases. Fig. 4 shows the whole network, whereas Fig. 5 shows another view of the same network in which all the genes are placed at the top left, all the miRNAs on the right, and all the TFs in the bottom-middle portion.

In the TF-miRNA-gene network, four genes, namely PHF19, HTRA1, SPSB1 and CCDC151, have higher degree centrality (DC) values than the others: 38, 11, 10 and 3, respectively (see Table 4). Based on degree centrality, we therefore identify these four genes as the hub genes (G_hub) of the network in Fig. 4. The genes that are both differentially expressed and differentially methylated have also been observed.

In addition, we have performed a further analysis of the network. We observe that very few targeter miRNAs are shared between the corresponding genes. For example, PHF19 and HTRA1 have three common miRNAs (viz., hsa-miR-107, hsa-miR-103a-3p and hsa-miR-16-5p). Similarly, PHF19 and SPSB1 have two common miRNAs (viz., hsa-miR-522-5p and hsa-miR-96-5p), whereas only one miRNA is common to PHF19 and CCDC151 (hsa-miR-424-5p) and only one to SPSB1 and HTRA1 (hsa-miR-335-5p). In addition, three genes are each related to a different TF: SMN1 is associated with the TF CEBPB, KLHDC4 is connected with the TF OTX1, and SF3B2 is related to the TF MESP2. See Table 4 for details.
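The shared-miRNA comparison can be sketched as pairwise set intersections over each gene's targeter list. The lists below are deliberately partial: they contain only the miRNAs named in the paragraph above, not the full targeter sets of Table 4, so the absence of a pair in the output says nothing about the complete data. The function name `shared_targeters` is hypothetical.

```python
from itertools import combinations

# Partial targeter lists: only the miRNAs named in the text are included.
targeters = {
    "PHF19": {
        "hsa-miR-107", "hsa-miR-103a-3p", "hsa-miR-16-5p",
        "hsa-miR-522-5p", "hsa-miR-96-5p", "hsa-miR-424-5p",
    },
    "HTRA1": {
        "hsa-miR-107", "hsa-miR-103a-3p", "hsa-miR-16-5p", "hsa-miR-335-5p",
    },
    "SPSB1": {"hsa-miR-522-5p", "hsa-miR-96-5p", "hsa-miR-335-5p"},
    "CCDC151": {"hsa-miR-424-5p"},
}


def shared_targeters(targeters):
    """Return {(gene1, gene2): common miRNAs} for pairs that share any."""
    shared = {}
    for g1, g2 in combinations(sorted(targeters), 2):
        common = targeters[g1] & targeters[g2]
        if common:
            shared[(g1, g2)] = common
    return shared


shared = shared_targeters(targeters)
for pair, mirnas in shared.items():
    print(pair, sorted(mirnas))
```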

4.2.3 Epigenetic gene markers

We consider every possible combination of differential expression and differential methylation status (i.e., DE_up ∩ DM_hyper, DE_up ∩ DM_hypo, DE_down ∩ DM_hyper and DE_down ∩ DM_hypo). Table 3 lists the genes whose combination shows an inverse relationship between expression and methylation (i.e., DE_up ∩ DM_hypo or DE_down ∩ DM_hyper); these five genes, denoted G_inv, are PHF19, SPSB1, HTRA1, CCDC151 and STX19. Among them, only STX19 is hypo-methylated and up-regulated; the other four genes are hyper-methylated and down-regulated. We then intersect G_hub and G_inv and identify the resulting genes (viz., PHF19, HTRA1, SPSB1 and CCDC151) as epigenetic markers. The IDPs corresponding to the markers are given in column 1 of Table 4 and column 1 of Table 5.
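The final marker selection is the intersection of G_inv and G_hub. The sketch below uses the gene names and DC values reported in this section; the variable names are hypothetical, and ordering the markers by DC is an illustrative choice rather than a step stated in the text.

```python
# G_inv as listed above; DC values of the hub genes as reported in 4.2.2.
g_inv = {"PHF19", "SPSB1", "HTRA1", "CCDC151", "STX19"}
hub_degree = {"PHF19": 38, "HTRA1": 11, "SPSB1": 10, "CCDC151": 3}
g_hub = set(hub_degree)

# Epigenetic markers: genes that are both inverse-pattern genes and hubs,
# ordered from highest to lowest degree centrality.
markers = sorted(g_inv & g_hub, key=lambda gene: -hub_degree[gene])
print(markers)
# ['PHF19', 'HTRA1', 'SPSB1', 'CCDC151']
```

STX19 drops out at this step because it is not among the hub genes.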

315

4.2.4 Biological Validation of genes and related IDP s

316

We further perform a literature search together with pathway and Gene Ontology analyses. The association between the gene HTRA1 and prostate carcinoma is described in [31]. Similarly, [43] reports the association of the gene SPSB1 with prostate carcinoma, whereas the relation between the gene PHF19 and prostate carcinoma is described in [35]. Besides these, SPSB1 is enriched in a prostate-carcinoma-related GO:BP term (viz., GO:0007242 intracellular signaling cascade, p-value=0.017996325). The relationship between this GO:BP term and prostate carcinoma is reported in [33]. Therefore, SPSB1 is indirectly associated with the disease through the Gene Ontology analysis. Similarly, PHF19 is enriched in prostate-carcinoma-related GO:BP terms such as GO:0006350 transcription (p-value=5.85E-04) [32], [39] and GO:0045449 regulation of transcription (p-value=0.002157192) [42]. HTRA1 is associated with the related GO:BP terms GO:0010648 negative regulation of cell communication (p-value=0.046487613) [34] and GO:0009968 negative regulation of signal transduction (p-value=0.034877541).

Fig. 2. Volcano plot for identifying hyper-methylated and hypo-methylated genes (DMhyper and DMhypo, respectively) through simultaneous verification of the significance level of the p-value and the fold change in log scale for the methylation data.
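The volcano-plot criterion of Fig. 2, which checks the p-value and the log-scale fold change at the same time, can be sketched as a small classifier. This is only an illustration under stated assumptions: the cutoffs `p_cut` and `lfc_cut`, the function name `classify`, and the example inputs are hypothetical and are not taken from the paper.

```python
import math


def classify(p_value, fold_change, p_cut=0.05, lfc_cut=1.0):
    """Label a gene as 'hyper', 'hypo' or 'none'.

    ASSUMPTION: p_cut and lfc_cut are illustrative thresholds, not the
    values used in the paper. fold_change is the ratio of mean
    methylation in the diseased group to that in the normal group.
    """
    if p_value >= p_cut:
        return "none"  # not statistically significant
    log_fc = math.log2(fold_change)
    if log_fc >= lfc_cut:
        return "hyper"  # significant increase in methylation
    if log_fc <= -lfc_cut:
        return "hypo"  # significant decrease in methylation
    return "none"  # significant p-value but small effect size


# Hypothetical inputs, for illustration only.
examples = {
    "gene_a": (0.001, 4.0),
    "gene_b": (0.001, 0.25),
    "gene_c": (0.200, 4.0),
    "gene_d": (0.001, 1.5),
}
labels = {gene: classify(p, fc) for gene, (p, fc) in examples.items()}
print(labels)
```

A gene must pass both tests to be labelled, which is why `gene_c` (large fold change, weak p-value) and `gene_d` (strong p-value, small fold change) are both left unlabelled.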

Among the four gene-markers, three (viz., PHF19, HTRA1, and SPSB1) are already known through literature search, Gene Ontology analysis, or both, whereas the remaining marker (viz., CCDC151) is reported as a novel epigenetic marker since no supporting information was found through either the literature search or the Gene Ontology analysis. The corresponding IDPs of the resulting gene-markers are provided in Table 5.

Table 1. The list of up-regulated and down-regulated genes with p-values for the expression data of the Prostate Carcinoma dataset.

| Gene Name | P-value | Status of gene |
|---|---|---|
| STX19 | 0.0001056 | DEup |
| MED25 | 0.009166984 | DEup |
| SNRPB | 0.016748568 | DEup |
| PHF19 | 8.32E-05 | DEdown |
| SAMD3 | 0.000100528 | DEdown |
| CCDC151 | 0.000245824 | DEdown |
| FYCO1 | 0.000337727 | DEdown |
| HTRA1 | 0.001240815 | DEdown |
| RXRA | 0.001511319 | DEdown |
| HNRNPA1 | 0.005020765 | DEdown |
| SF3B2 | 0.038755606 | DEdown |
| SPSB1 | 0.040726192 | DEdown |
| SMAD9 | 0.043757832 | DEdown |
| CDCA4 | 0.045839458 | DEdown |

Table 2. The list of hyper-methylated and hypo-methylated genes with p-values for the methylation data of the Prostate Carcinoma dataset.

| Gene Name | P-value | Status of gene |
|---|---|---|
| SPSB1 | 5.93E-08 | DMhyper |
| PHF19 | 3.20E-05 | DMhyper |
| HTRA1 | 0.001479375 | DMhyper |
| CCDC151 | 0.002158725 | DMhyper |
| APP | 0.000387211 | DMhypo |
| STX19 | 0.000517349 | DMhypo |
| LOXL4 | 0.000953542 | DMhypo |


Fig. 3. Venn diagram that shows the intersections between each pair of different patterns of differential expression and differential methylation (viz., DEup , DEdown , DMhyper , DMhypo ).

Table 3. The list of genes having an inverse relationship between their differential expression and methylation status for the integrated Prostate Carcinoma epigenetic dataset, where "exp" and "methyl" denote the expression and methylation data, respectively.

| Gene Name | P-value (exp) | P-value (methyl) | Status of gene |
|---|---|---|---|
| STX19 | 0.0001056 | 0.000517349 | DEup ∩ DMhypo |
| PHF19 | 8.32E-05 | 5.93E-08 | DEdown ∩ DMhyper |
| CCDC151 | 0.000245824 | 3.20E-05 | DEdown ∩ DMhyper |
| HTRA1 | 0.001240815 | 0.001479375 | DEdown ∩ DMhyper |
| SPSB1 | 0.040726192 | 0.002158725 | DEdown ∩ DMhyper |


Fig. 4. A simple view of the complete TF-miRNA-gene network for the experiment, specially highlighting (enlarging) the four genes PHF19, HTRA1, CCDC151, and SPSB1. Here, the OVAL shape with light orange color denotes genes, the PARALLELOGRAM shape with pink color shows the TFs, whereas the TRIANGLE shape with light orange color represents miRs or miRNAs.


Fig. 5. Second view of the above figure (Fig. 4) in which all the genes are kept together in the left-top side of the network, and all the miRNAs are arranged together in the right side, whereas all the TFs are shown in the middle-bottom portion of the network.

Table 4. The forty IDPs and their corresponding transcripts (genes) of the Gintegrated set, with their corresponding targeter miRNAs and TFs.

| IDPs | Gene Names | Degree centrality | Targeter miRNAs | TFs |
|---|---|---|---|---|
| PHD finger protein 19 | PHF19 | 38 | let-7g-5p, miR-182-5p, miR-105-5p, miR-23a-3p, miR-18a-5p, let-7f, let-7d-5p, let-7b-5p, let-7a-5p, miR-1301, miR-590-3p, miR-BART5, miR-571, miR-328, miR-107, miR-98, let-7c, miR-9-3p, let-7e-5p, miR-1283, miR-423-5p, miR-522-5p, miR-92b-3p, miR-138-5p, miR-218-5p, miR-34a-5p, miR-103a-3p, miR-96-5p, miR-16-5p, miR-192-5p, miR-193b-3p, miR-124-3p, miR-33a-3p, miR-29a-5p, miR-15a-3p, miR-18b-5p, miR-424-5p, miR-23b-3p | |
| Serine protease HTRA1 | HTRA1 | 11 | miR-107, miR-103a-3p, miR-16-5p, miR-196a-3p, miR-99a-3p, miR-31, miR-190a, miR-197-3p, miR-155-5p, miR-26b-5p, miR-335-5p | |
| SPRY domain-containing SOCS box protein 1 | SPSB1 | 10 | miR-522-5p, miR-96-5p, miR-27a-3p, miR-26a-5p, miR-567, miR-122-5p, miR-24-3p, miR-941, miR-185-5p, miR-335-5p | |
| Coiled-coil domain-containing protein 151 | CCDC151 | 3 | kshv-miR-K12-6-5p, miR-10a-5p, miR-424-5p | |
| Syntaxin-19 | STX19 | 1 | miR-335-5p | |
| Survival motor neuron protein | SMN1 | 1 | | CEBPB |
| Kelch domain containing protein 4 | KLHDC4 | 1 | | OTX1 |
| Splicing factor 3B subunit 2 | SF3B2 | 1 | | MESP2 |
| Amyloid beta A4 protein | APP | 1 | miR-298 | |
| ARHGEF1 protein | ARHGEF1 | 1 | miR-124 | |
| Cell division cycle-associated protein 4 | CDCA4 | 1 | miR-424 | |
| Chromodomain-helicase-DNA-binding protein 3 | CHD3 | 1 | miR-486-3p | |
| Methylosome subunit pICln | CLNS1A | 1 | miR-138 | |
| DnaJ homolog subfamily A member 3, mitochondrial | DNAJA3 | 1 | miR-129-5p | |
| FYVE and coiled-coil domain-containing protein 1 | FYCO1 | 1 | miR-218-5p | |
| Glutathione S-transferase C-terminal domain-containing protein | GSTCD | 1 | miR-570 | |
| Histone deacetylase 3 | HDAC3 | 1 | miR-1261 | |
| HIG1 domain family member 1A, mitochondrial | HIGD1A | 1 | miR-641 | |
| Histone H3.1t | HIST3H3 | 1 | miR-1276p | |
| MICOS complex subunit MIC60 | IMMT | 1 | miR-199a-3p | |
| Lysyl oxidase homolog 4 | LOXL4 | 1 | miR-455-3p | |
| Nuclear receptor coactivator 1 | NCOA1 | 1 | miR-498 | |
| Glycylpeptide N-tetradecanoyltransferase 1 | NMT1 | 1 | miR-623 | |
| Oxysterols receptor LXR-alpha | NR1H3 | 1 | miR-622 | |
| Phosphatidylinositol 3-kinase regulatory subunit alpha | PIK3R1 | 1 | miR-455-3p | |
| Peroxisome proliferator-activated receptor delta | PPARD | 1 | miR-1270 | |
| Retinoic acid receptor alpha | RARA | 1 | miR-939 | |
| Ras association domain-containing protein 2 | RASSF2 | 1 | miR-302c | |
| REST corepressor 1 | RCOR1 | 1 | miR-198p | |
| RXRA protein | RXRA | 1 | miR-574-3p | |
| Mothers against decapentaplegic homolog 9 | SMAD9 | 1 | miR-512-3p | |
| Zinc finger protein SNAI1 | SNAI1 | 1 | miR-199a-5p | |
| Vacuolar-sorting protein SNF8 | SNF8 | 1 | miR-506 | |
| U1 small nuclear ribonucleoprotein A | SNRPA | 1 | miR-1108 | |
| Small nuclear ribonucleoprotein-associated proteins B and B' | SNRPB | 1 | miR-1207-5p | |
| Putative Polycomb group protein ASXL1 | ASXL1 | 1 | miR-1207-5p | |
| Small nuclear ribonucleoprotein Sm D3 | SNRPD3 | 1 | miR-570 | |
| Suppressor of cytokine signaling 7 | SOCS7 | 1 | miR-145 | |
| Signal transducer and activator of transcription 1-alpha/beta | STAT1 | 1 | miR-1303 | |
| Splicing factor U2AF 65 kDa subunit | U2AF2 | 1 | miR-145 | |
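The degree-centrality column of Table 4 can be illustrated by counting the distinct regulators that target each gene. The sketch below is a simplified assumption rather than the authors' procedure: the helper `regulator_degree` is hypothetical, the edge list contains only a few miRNA-gene pairs taken from Table 4, and the hub cutoff (degree greater than 1) is chosen for illustration only.

```python
from collections import Counter


def regulator_degree(edges):
    """Count the distinct regulators (miRNAs or TFs) that target each gene.

    edges is an iterable of (regulator, gene) pairs; duplicate pairs
    are counted once.
    """
    return Counter(gene for _regulator, gene in set(edges))


# A small excerpt of regulator-gene pairs from Table 4.
edges = [
    ("kshv-miR-K12-6-5p", "CCDC151"),
    ("miR-10a-5p", "CCDC151"),
    ("miR-424-5p", "CCDC151"),
    ("miR-129-5p", "DNAJA3"),
    ("miR-218-5p", "FYCO1"),
]

degree = regulator_degree(edges)

# ASSUMPTION: genes with more than one regulator are treated as hubs.
hubs = {gene for gene, d in degree.items() if d > 1}

print(dict(degree))
print(sorted(hubs))
```

On this excerpt, CCDC151 has degree 3 and is the only gene that passes the assumed hub cutoff, in line with its entry in Table 4.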

Table 5. Biological validations of the epigenetic gene-markers, and the corresponding (potential) IDPs.

| IDP | Gene name | Gene Ontology | Literature search | Status of marker |
|---|---|---|---|---|
| PHD finger protein 19 | PHF19 | GO:BP of GO:0006350 transcription (p-value=5.85E-04) [39]*, [32]*, GO:0045449 regulation of transcription (p-value=0.002157192) [42]*. | [35] | Existing |
| Serine protease HTRA1 | HTRA1 | GO:BP of GO:0010648 negative regulation of cell communication (p-value=0.046487613) [34]*, GO:0009968 negative regulation of signal transduction (p-value=0.034877541). | [31] | Existing |
| SPRY domain-containing SOCS box protein 1 | SPSB1 | GO:BP of GO:0007242 intracellular signaling cascade (p-value=0.017996325) [33]*. | [43] | Existing |
| Coiled-coil domain-containing protein 151 | CCDC151 | | | Novel |

* The reference represents that the corresponding GO-term has a relation with Prostate Carcinoma.

5 Conclusion

In this article, we perform a transcriptomic analysis of genes for transcripts encoding IDPs on a multi-omics prostate cancer dataset that consists of both gene expression and methylation data. As a result, we identify some potential epigenetic gene-markers (having an inverse relationship between their differential expression and methylation patterns) by integrating statistical and TF-miRNA-gene network centrality methods.

Notably, this study has two hypotheses. The first is that integrating the association between methylation and gene expression patterns with hub-gene selection identifies potential gene-markers (and the corresponding IDPs) for the disease more reliably than considering the expression pattern alone. Methylation generally decreases the gene expression level. Since we exploit this phenomenon (i.e., the inverse relationship between expression and methylation patterns), our study is an enhanced version of analyses that consider only the expression pattern. The functional significance is that the expression patterns of these biomarkers depend on their methylation patterns rather than on other factors. The second hypothesis is that the four epigenetic gene-markers identified here (viz., PHF19, HTRA1, SPSB1, and CCDC151) are associated with the disease. To test the second hypothesis, we perform a literature search and Gene Ontology analyses on the genes using the DAVID database to find associations between the genes and the disease. Among these markers, three (viz., PHF19, HTRA1, and SPSB1) are existing markers, since associations between the genes and the disease are found through literature evidence, Gene Ontology analysis, or both, whereas the remaining marker (viz., CCDC151) is a novel marker. The novel marker may have significant therapeutic value for the disease. Finally, these markers and their corresponding IDPs are reported.

References

[1] M. Babu, R. van der Lee, de Groot and J. Gsponer, Intrinsically disordered proteins: regulation and disease, Current Opinion in Structural Biology, 21(3):432-440, 2011.

[2] M. Fuxreiter, I. Simon, P. Friedrich and P. Tompa, Preformed Structural Elements Feature in Partner Recognition by Intrinsically Unstructured Proteins, Journal of Molecular Biology, 338(5):1015-1026, 2004.

[3] P. Romero, Z. Obradovic, X. Li, E. Garner, C. Brown and A. Dunker, Sequence complexity of disordered protein, Proteins: Structure, Function, and Genetics, 42(1):38-48, 2000.

[4] F. Gallat, A. Laganowsky, K. Wood, F. Gabel, L. van Eijck, J. Wuttke, M. Moulin, M. Hartlein, D. Eisenberg, J. Colletier, G. Zaccai and M. Weik, Dynamical Coupling of Intrinsically Disordered Proteins and Their Hydration Water: Comparison with Folded Soluble and Membrane Proteins, Biophysical Journal, 103(1):129-136, 2012.

[5] C. Brown, S. Takayama, A. Campen, P. Vise, T. Marshall, C. Oldfield et al., Evolutionary Rate Heterogeneity in Proteins with Long Disordered Regions, Journal of Molecular Evolution, 55(1):104-110, 2002.

[6] J. Cheng, M. Sweredoski and P. Baldi, Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data, Data Mining and Knowledge Discovery, 11(3):213-222, 2005.

[7] R. van der Lee, M. Buljan, B. Lang, R. Weatheritt, G. Daughdrill, A. Dunker, M. Fuxreiter, J. Gough, J. Gsponer, D. Jones, P. Kim, R. Kriwacki, C. Oldfield, R. Pappu, P. Tompa, V. Uversky, P. Wright and M. Babu, Classification of Intrinsically Disordered Regions and Proteins, Chemical Reviews, 114(13):6589-6631, 2014.

[8] V. Uversky, Intrinsically disordered proteins and their (disordered) proteomes in neurodegenerative disorders, Frontiers in Aging Neuroscience, 7, 2015.

[9] S. Bandyopadhyay, S. Mallik and A. Mukhopadhyay, A Survey and Comparative Study of Statistical Tests for Identifying Differential Expression from Microarray Data, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(1):95-115, 2013.

[10] U. Maulik, S. Mallik, A. Mukhopadhyay and S. Bandyopadhyay, Analyzing Gene Expression and Methylation Data Profiles using StatBicRM: Statistical Biclustering-based Rule Mining, PLoS ONE, 10(4):e0119448, 2015.

[11] S. Mallik, A. Mukhopadhyay, U. Maulik and S. Bandyopadhyay, Integrated Analysis of Gene Expression and Genome-wide DNA Methylation for Tumor Prediction: An Association Rule Mining-based Approach, Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), IEEE Symposium Series on Computational Intelligence - SSCI 2013, Singapore, pp. 120-127, April 16, 2013.

[12] S. Mallik, A. Mukhopadhyay and U. Maulik, RANWAR: Rank-Based Weighted Association Rule Mining from Gene Expression and Methylation Data, IEEE Nanobioscience, 14(1):59-66, 2015.

[13] S. Mallik and U. Maulik, MiRNA-TF-Gene Network Analysis through Ranking of Biomolecules for Multi-Informative Uterine Leiomyoma Dataset, Journal of Biomedical Informatics, 57:308-319, 2015.

[14] D. Baek, J. Villen, C. Shin, F. Camargo, S. Gygi and D. Bartel, The impact of microRNAs on protein output, Nature, 455(7209):64-71, 2008.

[15] T. Di Domenico, I. Walsh, A. Martin and S. Tosatto, MobiDB: a comprehensive database of intrinsic protein disorder annotations, Bioinformatics, 28(15):2080-2081, 2012.

[16] D. Sengupta and S. Bandyopadhyay, Topological patterns in microRNA gene regulatory network: studies in colorectal and breast cancer, Molec. BioSyst., 9:1360-1371, 2013.

[17] V. Matys et al., TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes, Nucleic Acids Res., 34:D108-D110, 2006.

[18] G. Zheng, K. Tu, Q. Yang, Y. Xiong, C. Wei, L. Xie, Y. Zhu and Y. Li, ITFP: an integrated platform of mammalian transcription factors, Bioinformatics, 24(20):2416-2417, 2008.

[19] A. Dunker, I. Silman, V. Uversky and J. Sussman, Function and structure of inherently disordered proteins, Current Opinion in Structural Biology, 18(6):756-764, 2008.

[20] T. Harmon and R. Pappu, An Evolutionary Algorithm for the Design of Different Degrees of Secondary Structure in Intrinsically Disordered Proteins (IDPs), Biophysical Journal, 108(2):228a, 2015.

[21] M. Jensen, P. Markwick, S. Meier, C. Griesinger, M. Zweckstetter, S. Grzesiek et al., Quantitative Determination of the Conformational Properties of Partially Folded and Intrinsically Disordered Proteins Using NMR Dipolar Couplings, Structure, 17(9):1169-1185, 2009.

[22] S. Shojania and J. D. O'Neil, Intrinsic Disorder and Function of the HIV-1 Tat Protein, Protein & Peptide Letters, 17(8):999-1011, 2010.

[23] J. Ward, L. McGuffin, K. Bryson, B. Buxton and D. Jones, The DISOPRED server for the prediction of protein disorder, Bioinformatics, 20(13):2138-2139, 2004.

[24] D. Jones and D. Cozzetto, DISOPRED3: precise disordered region predictions with annotated protein-binding activity, Bioinformatics, 31(6):857-863, 2014.

[25] P. Ruy, R. Torrieri, J. Toledo, V. Alves, A. Cruz and J. Ruiz, Intrinsically disordered proteins (IDPs) in trypanosomatids, BMC Genomics, 15(1):1100, 2014.

[26] Y. Edwards, A. Lobley, M. Pentony and D. Jones, Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data, Genome Biol, 10(5):R50, 2009.

[27] L. Iakoucheva, C. Brown, J. Lawson, Z. Obradović and A. Dunker, Intrinsic Disorder in Cell-signaling and Cancer-associated Proteins, Journal of Molecular Biology, 323(3):573-584, 2002.

[28] M. Aqil, A.R. Naqvi, S. Mallik, S. Bandyopadhyay, U. Maulik and S. Jameel, The HIV Nef protein modulates cellular and exosomal miRNA profiles in human monocytic cells, Journal of Extracellular Vesicles, 3:23129, 2014.

[29] M. Aqil, S. Mallik, S. Bandyopadhyay, U. Maulik and S. Jameel, Transcriptomic Analysis of mRNAs in Human Monocytic Cells Expressing the HIV-1 Nef Protein and Their Exosomes, BioMed Research International, vol. 2015, article id: 492395, pp. 1-10, 2015.

[30] I. S. Vlachos, M. D. Paraskevopoulou, D. Karagkouni, G. Georgakilas, T. Vergoulis, I. Kanellos, I. L. Anastasopoulos, S. Maniou, K. Karathanou, D. Kalfakakou, A. Fevgas, T. Dalamagas and A. G. Hatzigeorgiou, DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions, Nucl. Acids Res, 43:D153-D159, 2014.

[31] J. Chien, M. Campioni, V. Shridhar and A. Baldi, HtrA Serine Proteases as Potential Therapeutic Targets in Cancer, Current Cancer Drug Targets, 9(4):451-468, 2009.

[32] S. Lee, M. Ha, J. Lee, P. Nguyen, Y. Choi, F. Pirnia et al., Inhibition of the 3-Hydroxy-3-methylglutaryl-coenzyme A Reductase Pathway Induces p53-independent Transcriptional Regulation of p21WAF1/CIP1 in Human Prostate Carcinoma Cells, Journal of Biological Chemistry, 273(17):10618-10623, 1998.

[33] F. Yuan, Y. Zhou, M. Wang et al., Identifying New Candidate Genes and Chemicals Related to Prostate Cancer Using a Hybrid Network and Shortest Path Approach, Comput Math Methods Med., 2015:462363, 2015, doi: 10.1155/2015/462363.

[34] http://cmrn.systemsbiology.net/cluster/BPH%20Prostate%20Dhanasekaran/6

[35] John R. Prensner, Discovery and Characterization of Long Non-Coding RNAs in Prostate Cancer, PhD Dissertation, 2012, http://deepblue.lib.umich.edu/bitstream/handle/2027.42/107221/prensner_1.pdf?sequence=1

[36] S. Servia-Rodríguez, A. Noulas, C. Mascolo, A. Fernández-Vilas and R. Díaz-Redondo, The Evolution of Your Success Lies at the Centre of Your Co-Authorship Network, PLOS ONE, 10(3):e0114302, 2015.

[37] M. Sickmeier, J. Hamilton, T. LeGall, V. Vacic, M. Cortese, A. Tantos et al., DisProt: the Database of Disordered Proteins, Nucleic Acids Research, 35(Database):D786-D793, 2007.

[38] H. Berman, The Protein Data Bank, Nucleic Acids Research, 28(1):235-242, 2000.

[39] S. Bilke, R. Schwentner, F. Yang et al., Oncogenic ETS fusions deregulate E2F3 target genes in Ewing sarcoma and prostate cancer, Genome Research, 23:1797-1809, 2013.

[40] L.C. Freeman, Centrality in social networks: conceptual clarification, Sociometry, vol. 1, pp. 215-239, 1979.

[41] A. Ozgur, T. Vu, G. Erkan and D.R. Radev, Identifying gene-disease associations using centrality on a literature mined gene-interaction network, Bioinformatics, 24:i277-i285, 2008, doi: 10.1093/bioinformatics/btn182.

[42] G. Xu, J. Wu, L. Zhou, B. Chen, Z. Sun, F. Zhao and Z. Tao, Characterization of the Small RNA Transcriptomes of Androgen Dependent and Independent Prostate Cancer Cell Line by Deep Sequencing, PLoS One, 5(11):e15519, 2010, doi: 10.1371/journal.pone.0015519.

[43] D. G. Tandefelt, J. L. Boormans, H. A. van der Korput, G. W. Jenster and J. Trapman, A 36-gene Signature Predicts Clinical Progression in a Subgroup of ERG-positive Prostate Cancers, European Urology, 64(6):941-950, 2013.

List of Abbreviations

IDPs - Intrinsically Disordered Proteins
TF - Transcription Factors
miRNA - MicroRNA
HTRA1 - Serine protease HTRA1
SPSB1 - SPRY domain-containing SOCS box protein 1
PHF19 - PHD finger protein 19
CCDC151 - Coiled-coil domain-containing protein 151
CEBPB - CEBPB protein 1
OTX1 - Homeobox protein OTX1
MESP2 - Mesoderm posterior protein 2
IDPT - Insights into Potential Intrinsically Disordered Proteins Through Transcriptomic Analysis of Genes

Highlights

· We perform a transcriptomic analysis of genes for transcripts encoding IDPs on a prostate cancer multi-omics dataset.
· We obtain differentially expressed and differentially methylated genes using the t-test.
· We determine hub genes from the TF-miRNA-gene network using degree centrality.
· We identify some epigenetic gene-markers by integrating statistical and TF-miRNA-gene network analyses.