IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data
Saurav Mallik, Sagnik Sen, Ujjwal Maulik
PII: S0378-1119(16)30246-3
DOI: 10.1016/j.gene.2016.03.056
Reference: GENE 41262
To appear in: Gene
Received date: 2 December 2015
Revised date: 22 February 2016
Accepted date: 30 March 2016
Please cite this article as: Mallik, Saurav, Sen, Sagnik, Maulik, Ujjwal, IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data, Gene (2016), doi: 10.1016/j.gene.2016.03.056
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
IDPT: Insights into Potential Intrinsically Disordered Proteins Through Transcriptomic Analysis of Genes for Prostate Carcinoma Epigenetic Data

Saurav Mallik a, Sagnik Sen b, and Ujjwal Maulik c

a Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, India.
b Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India.
c Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India.

Abstract
The involvement of intrinsically disordered proteins (IDPs) in severe diseases such as cancer is an active research topic. To gain novel insights into the regulation of IDPs, in this article we perform a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a human multi-omics prostate carcinoma dataset having both gene expression and methylation data. First, we select the genes that have both expression and methylation data and that correspond to the cancer-related, prostate-tissue-specific disordered proteins of the MobiDb database. We apply the standard t-test to determine differentially expressed genes as well as differentially methylated genes. We then build a network comprising these genes, their targeter miRNAs from the DIANA TarBase v7.0 database, and the corresponding Transcription Factors from the TRANSFAC and ITFP databases. Thereafter, we perform a literature search, and KEGG pathway and gene ontology analyses using the DAVID database. Finally, we report several significant potential gene-markers (with the corresponding IDPs) that have an inverse relationship between their differential expression and methylation patterns, and that are hub genes of the TF-miRNA-gene network.

Preprint submitted to Elsevier, 22 February 2016
Key words: Intrinsically disordered proteins, Epigenetic gene-markers, Transcriptomic analysis of genes, Multi-omics prostate carcinoma epigenetic dataset, t-test, TF-miRNA-gene network, hub genes.
1 Introduction
Intrinsically disordered proteins (IDPs) are proteins without any stable ordered structure [1], [2], [3]. An IDP may contain a specific disordered region, or it may be structurally disordered as a whole. IDPs occur in different categories of proteins (viz., membrane, soluble, aqueous proteins etc.) [4], and they are indicators of evolutionary rate [5]. IDPs are involved in different types of functional activities. In regulatory and signaling pathways, IDPs that act like lead compounds interact with other proteins through their disordered regions. Many proteins need to acquire a well-defined structure in order to perform their function, whereas a significant portion of the proteome of any organism consists of polypeptide segments that are unlikely to form a defined 3D structure but are still functional. These protein segments are termed intrinsically disordered regions (IDRs) [6]. Proteins that do not have IDRs are denoted as structured proteins, whereas proteins with completely disordered sequences that do not acquire any tertiary structure are termed intrinsically disordered proteins (IDPs) [7]. In practice, an IDP is either completely disordered or contains IDRs of different lengths at different positions [8]. Several IDRs have also been predicted within otherwise ordered proteins. Approximately 30% of eukaryotic proteins have disordered regions, but these proteins are not necessarily differentially expressed.

Email addresses: [email protected], chasaurav [email protected] (Saurav Mallik), [email protected] (Sagnik Sen), [email protected] (Ujjwal Maulik).
Due to their unusual structural characteristics and important functional roles, the presence of IDPs in a cell needs to be monitored carefully. Severe diseases such as cancers (including carcinomas), cardiac diseases and neurodegenerative diseases can arise from differentially expressed IDPs [8].
Gene expression is not governed by a single factor [9]. Different epigenetic factors, such as DNA methylation [10], [11], [12], can affect the regulation of genes. DNA methylation is the addition of a methyl group to the 5th carbon of the cytosine pyrimidine ring or to the 6th nitrogen of the adenine purine ring in genomic DNA. Methylation has two states (hyper-methylation and hypo-methylation), and in general it decreases gene expression levels. Approximately 80-90% of CpG sites are methylated across the whole human genome. Until now, IDPs have not been analyzed with a multi-omics epigenetic dataset.
Transcription Factors (TFs) [13] also play an important role in gene regulation. TFs may regulate genes and miRNAs either positively or negatively. A TF controls the flow of genetic information from DNA to mRNA, either alone or together with other proteins in a complex, by promoting or blocking the recruitment of RNA polymerase to specific genes. TFs therefore have a significant influence on gene expression levels as well as methylation levels.
Prostate carcinoma is a type of cancer that affects the male prostate gland. It is most common among elderly men, and its three main risk factors are genetics, obesity and age. Prostate cancer is uncommon below the age of 45, and the average age of patients is about 70. It often has no clear symptoms of its own; some of its symptoms resemble those of benign prostatic hyperplasia. Common symptoms include hematuria (i.e., blood in urine) and painful urination. In advanced stages, prostate carcinoma may spread to other parts of the body such as the spinal cord and the femur.
The equilibrium level of a protein [14] depends on the rate of its synthesis relative to the rate of its degradation. The quantity of a protein produced in the cell is affected by the expression level of its mRNA (gene) transcript. If proteins need to remain in the disordered state for any length of time, they have to either bypass the endogenous degradation pathways that target unfolded proteins (e.g., the ATP-dependent proteolytic 26S proteasome) or be produced in sufficient quantity to temporarily overload the pathways associated with protein degradation. To determine the expression patterns of mRNAs (genes) for transcripts encoding IDPs, and to discover novel insights into the biomolecular mechanisms of transcriptional regulation, bioinformatics analyses can be carried out on any publicly available tissue-specific mRNA (gene) expression data having experimental (diseased) and control (normal) samples. Until now, no epigenetic study has been attempted for this purpose; using multi-omics epigenetic data should therefore enhance the analysis. Hence, to find novel insights into the regulation of IDPs, in this article we perform a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a publicly available human multi-omics prostate carcinoma (epigenetic) dataset having both gene expression and methylation data. We identify epigenetic gene-markers from their differential expression and methylation patterns by integrating statistical and TF-miRNA-gene network analyses. Thereafter, the IDPs corresponding to the resulting gene-markers are obtained. These IDPs are called potential IDPs, since their corresponding mRNA transcripts are both differentially expressed and differentially methylated, and are also hub genes (centralized genes) of the TF-miRNA-gene network, i.e., genes associated with a higher number of other biomolecules (TFs and miRNAs).

In brief, we first collect from the MobiDb database [15] the IDPs that are associated with cancers in different tissue types of the human body (viz., prostate, ovarian, lung, oral, cardiac, lymph, pancreatic and breast tissues). Since a protein is the translational product of an mRNA (gene), and epigenetic data is not available for proteins, we use a multi-omics prostate carcinoma epigenetic dataset having both gene expression and methylation data. We consider the matched genes (say, Gmatch) that are common to both datasets (i.e., the gene expression and methylation datasets). We then intersect Gmatch with the genes corresponding to the IDPs collected from the MobiDb database; the result is denoted as Gintegrated. Notably, for the intersected genes, only the samples matched between the expression and methylation datasets are used in our experiment. After that, we run the standard t-test [16] on the gene expression data as well as the methylation data of the selected genes to identify differentially expressed genes (DEgenes) and differentially methylated genes (DMgenes), respectively. A volcano plot is used to determine whether the genes are up-regulated (DEup) or down-regulated (DEdown), and hyper-methylated (DMhyper) or hypo-methylated (DMhypo). We also determine the genes that have an inverse relationship between their expression and methylation patterns (i.e., (DEup ∩ DMhypo) or (DEdown ∩ DMhyper)), which are together denoted as Ginv.

Besides this, we consider all the intersected genes (Gintegrated) and identify their targeter miRNAs using DIANA TarBase v7.0. Simultaneously, we determine the Transcription Factors (TFs) that regulate these genes through the TRANSFAC [16], [17] and ITFP [16], [18] databases. We then build a network comprising these genes, TFs and miRNAs, and highlight the genes that are associated with a higher number of TFs and/or miRNAs. We compute the degree centrality of the genes and determine the hub genes of the network (denoted as Ghub). We then intersect Ginv and Ghub, and the resultant genes are stated as epigenetic markers for prostate carcinoma. In addition, we perform a literature search and Gene Ontology analysis using the DAVID database. Finally, the top epigenetic gene-markers and the corresponding potential IDPs are reported.
The rest of the article is organized as follows. Section 2 surveys the literature on IDPs and related works. Section 3 describes our method, whereas Section 4 presents the dataset, the experimental results and the discussion. Finally, Section 5 concludes the article.
2 Literature Review

Research on IDPs has been dominated by in vivo techniques [19]; only a few in silico (computational) methods have been developed. Defining the structure of a protein is one of the important objectives in computational biology. Earlier computational works related to IDPs were based on X-ray crystallography and Nuclear Magnetic Resonance (NMR) predictions [20], [21]. Recent studies link differentially expressed IDPs to diseases such as cancers, neurodegenerative diseases, HIV [22], and heart diseases. The computational techniques are basically used to predict the disordered regions [20], [21]. As mentioned above, early computational research depended on core structural studies. Later, different ad hoc techniques and energy computations were used for the prediction of IDRs, and several well-known tools were developed. One of them is DISOPRED (viz., DISOPRED2 [23] and DISOPRED3 [24]), which is designed for IDR prediction. Nowadays, the relationship between diseases and IDPs is an important research area [1], [19]. Proteins with IDRs can cause different neurodegenerative diseases (e.g., Alzheimer's disease). Sometimes gene regulatory networks or pathways [1] are completely or partially affected by IDPs. Basically, if the IDP is a lead compound, its IDR becomes an active site. As a result, proteins with IDRs behave differently during interaction, which affects the normal course of protein-protein interactions as well as regulatory networks and pathways. The functional classes associated with IDPs have been proposed to cover a wider range of biological processes than the classes related to structured proteins. The concept of IDPs does not belong to the classical structure theory of proteins. The traditional structure-function relationship applies to stable, folded proteins, in which structural features are identified for interaction with neighboring molecules [8], [25].
Besides the above, [26] studied significantly highly disordered proteins to find their functions in human cells by analyzing gene expression data. The association of IDPs with cell signaling and cancer is demonstrated in [1] and [27]. Furthermore, several works have been performed on the transcriptome level of Nef-expressing U937 cells and their exosomes [28], [29].
From the literature, it is observed that no epigenetic study has previously been performed for determining IDPs. Such a study might help to refine the previous findings regarding IDPs and might additionally produce some novel results. Therefore, to find novel insights into the regulation of IDPs, in this article we perform a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a publicly available human multi-omics prostate carcinoma (epigenetic) dataset having both gene expression and methylation data. We identify epigenetic gene-markers from their differential expression and methylation patterns by integrating statistical and TF-miRNA-gene network analyses. Here, the epigenetic gene-markers must have both differential expression patterns and differential methylation patterns, and must also behave as hub genes of the network. Finally, the IDPs corresponding to the resulting gene-markers are obtained and reported. These IDPs are stated as potential IDPs, as their corresponding mRNA transcripts are both differentially expressed and differentially methylated, and are also hub genes (centralized genes) of the TF-miRNA-gene network.
3 Materials and Methods

In this article, we carry out a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a publicly available human multi-omics prostate carcinoma (epigenetic) dataset having both gene expression and methylation data. We determine epigenetic gene-markers from their differential expression and methylation patterns by integrating statistical and TF-miRNA-gene network analyses. The IDPs corresponding to the resulting gene-markers are also identified. The steps of the transcriptomic analysis are described as follows.

3.1 Initial set of IDPs from MobiDb database

MobiDb [15] is a structural database of IDPs that combines data from DisProt [37], PDB [38], EMBL, UniProt and other structural databases. MobiDb provides tissue specificity and annotation for each IDP. IDPs can be chosen from MobiDb based on their cancer-specific tissue types (viz., prostate, lung, ovarian, oral, pancreatic, cardiac, lymph, and breast tissues).

In this article, we use the MobiDb database to collect the IDPs related to one of these tissues (viz., prostate tissue).
3.2 Initial geneset determination

We initially consider the genes that have both expression and methylation values, together with the (matched) samples common to both data types, from the multi-omics dataset (= Gmatch). We then intersect Gmatch with the geneset corresponding to the collected IDPs (= Gintegrated). These intersected genes are used for further study.
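The two intersections described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `integrated_geneset` and the toy gene lists are assumptions made for the example, and they do not reproduce the actual pipeline or data.

```python
def integrated_geneset(expression_genes, methylation_genes, idp_genes):
    """Return (Gmatch, Gintegrated) as Python sets.

    Gmatch      : genes present in both the expression and methylation data.
    Gintegrated : genes of Gmatch that also encode one of the collected IDPs.
    """
    g_match = set(expression_genes) & set(methylation_genes)
    g_integrated = g_match & set(idp_genes)
    return g_match, g_integrated


# Toy input (hypothetical lists, chosen only to show the set logic).
expression_genes = ["PHF19", "HTRA1", "GAPDH", "TP53"]
methylation_genes = ["PHF19", "HTRA1", "GAPDH", "BRCA1"]
idp_genes = ["PHF19", "HTRA1", "BRCA1"]

g_match, g_integrated = integrated_geneset(
    expression_genes, methylation_genes, idp_genes
)
print(sorted(g_match))
print(sorted(g_integrated))
```

Sets are used here because only membership matters at this step; the per-sample values are attached to the surviving genes afterwards.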
3.3 Data-normalization

The microarray technique is one of the useful tools for measuring gene expression data across different experimental and control samples. Before applying any statistical test, we need to scale the values of each gene. Therefore, we first apply a normalization technique to each of the Gintegrated genes, for the expression data as well as the methylation data. To normalize the expression/methylation dataset, we perform zero-mean normalization [11], which transforms the values of each gene so that their mean becomes zero:

\[ y_{jk}^{norm} = \frac{y_{jk} - \mu}{\sigma}, \tag{1} \]

where µ and σ stand for the mean and standard deviation, respectively, of the expression/methylation data of the j-th gene before normalization, whereas yjk and yjk^norm denote the value of the j-th gene in the k-th sample before and after normalization, respectively.
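As a minimal sketch of Eq. (1), the normalization of a single gene could be written as follows. The helper name `zero_mean_normalize` is hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, since the text does not state which estimator is used.

```python
import statistics


def zero_mean_normalize(values):
    """Apply Eq. (1) to the values of one gene across all samples."""
    mu = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sigma = statistics.stdev(values)
    if sigma == 0:
        raise ValueError("constant gene: standard deviation is zero")
    return [(y - mu) / sigma for y in values]


# Toy values for one gene across three samples (hypothetical numbers).
normalized = zero_mean_normalize([2.0, 4.0, 6.0])
print(normalized)
```

After this transformation each gene has mean zero and unit standard deviation, so genes measured on different scales become comparable.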
3.4 Identification of differentially expressed and differentially methylated genes

A statistical test is needed to determine differentially expressed (DE) genes as well as differentially methylated (DM) genes. The t-test [9] is a popular test that is widely used on microarray data for determining DE/DM genes. The (two-sample) t-test compares the difference between the means of two groups while also taking the variation of the data into account. The p-value is the probability of observing a t-value at least as extreme as the actually observed t-value, assuming that the null hypothesis is true. The p-value is computed from either a t-table or the cumulative distribution function (cdf). For each gene, let group 1 consist of m1 diseased samples with mean a1 and standard deviation sm1, and let group 2 consist of m2 control samples with mean a2 and standard deviation sm2. The t-statistic is then defined as

\[ t = \frac{a_1 - a_2}{e_m}, \tag{2} \]

where em refers to the standard error of the difference between the two group means, which is formulated as

\[ e_m = e_{pooled} \sqrt{\frac{1}{m_1} + \frac{1}{m_2}}. \tag{3} \]

Here, epooled is the pooled estimate of the population standard deviation, i.e.,

\[ e_{pooled} = \sqrt{\frac{(m_1 - 1)\, s_{m_1}^2 + (m_2 - 1)\, s_{m_2}^2}{dof}}, \tag{4} \]

where dof refers to the degrees of freedom of the test (i.e., dof = m1 + m2 − 2).

In this article, we apply the t-test to the normalized expression data of each gene in Gintegrated to identify DE genes. Similarly, we apply the t-test to the normalized methylation data of each gene in Gintegrated to identify DM genes. If the p-value of a gene for its (normalized) expression data is less than 0.05 (5%), the gene is considered DE. Similarly, if the p-value of a gene for its (normalized) methylation data is less than 0.05, the gene is considered DM.
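Eqs. (2)-(4) can be traced step by step in a short sketch. The function name `pooled_t_statistic` and the toy sample values are assumptions for illustration; the sketch returns only the t-statistic and the degrees of freedom, and leaves the p-value lookup to a t-table or a statistics library.

```python
import math
import statistics


def pooled_t_statistic(diseased, control):
    """Two-sample t-statistic with pooled standard deviation, Eqs. (2)-(4)."""
    m1, m2 = len(diseased), len(control)
    a1, a2 = statistics.mean(diseased), statistics.mean(control)
    # statistics.variance returns the sample variance, i.e. s^2.
    var1, var2 = statistics.variance(diseased), statistics.variance(control)
    dof = m1 + m2 - 2
    e_pooled = math.sqrt(((m1 - 1) * var1 + (m2 - 1) * var2) / dof)  # Eq. (4)
    e_m = e_pooled * math.sqrt(1.0 / m1 + 1.0 / m2)                  # Eq. (3)
    t = (a1 - a2) / e_m                                              # Eq. (2)
    return t, dof


# Toy values for one gene (hypothetical numbers).
t_value, dof = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t_value, 4), dof)
```

If SciPy is available, `scipy.stats.ttest_ind(diseased, control, equal_var=True)` computes the same pooled statistic together with a two-sided p-value, which could then be compared against the 0.05 threshold.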
After obtaining the DE/DM genes, we use a volcano plot to identify up-regulated/over-expressed genes (DEup) and down-regulated/under-expressed genes (DEdown) from the expression data. Similarly, hyper-methylated genes (DMhyper) and hypo-methylated genes (DMhypo) are obtained from the methylation data using a volcano plot.
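A volcano plot places each gene at (log fold change, −log10 p-value), so the direction of change can be read from the sign of the horizontal coordinate. The sketch below shows one way such a classification might be coded. The function names, the 0.05 default and the optional fold-change cutoff `min_abs_log_fc` are assumptions for illustration; the text does not specify a fold-change threshold.

```python
import math


def volcano_point(p_value, log_fc):
    """Coordinates of one gene on a volcano plot."""
    return log_fc, -math.log10(p_value)


def classify(p_value, log_fc, alpha=0.05, min_abs_log_fc=0.0):
    """Label a gene as 'up', 'down' or 'not significant'.

    For expression data 'up'/'down' correspond to DEup/DEdown; for
    methylation data they correspond to DMhyper/DMhypo.
    """
    if p_value >= alpha or abs(log_fc) <= min_abs_log_fc:
        return "not significant"
    return "up" if log_fc > 0 else "down"


# Hypothetical genes: (p-value, log fold change).
print(classify(0.0001, 1.2))
print(classify(0.0001, -0.8))
print(classify(0.2, 2.0))
```

Applying the same function to expression and methylation data keeps the two classifications directly comparable in the integration step.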
3.5 Integrated study of methylation on expression

Now, we identify the genes whose differential expression status is the inverse of their differential methylation status (say, Ginv). In other words, we consider the genes having either (i) the DEup ∩ DMhypo pattern or (ii) the DEdown ∩ DMhyper pattern. These genes are later used for selecting the disease-markers.
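The definition of Ginv translates directly into set operations. As an illustrative sketch, the snippet below uses the gene lists reported in Tables 1 and 2 of this article; the variable names are chosen for the example only.

```python
# Gene lists as reported in Table 1 (expression) and Table 2 (methylation).
de_up = {"STX19", "MED25", "SNRPB"}
de_down = {
    "PHF19", "SAMD3", "CCDC151", "FYCO1", "HTRA1", "RXRA",
    "HNRNPA1", "SF3B2", "SPSB1", "SMAD9", "CDCA4",
}
dm_hyper = {"SPSB1", "PHF19", "HTRA1", "CCDC151"}
dm_hypo = {"APP", "STX19", "LOXL4"}

# (i) up-regulated and hypo-methylated, (ii) down-regulated and hyper-methylated.
up_hypo = de_up & dm_hypo
down_hyper = de_down & dm_hyper
g_inv = up_hypo | down_hyper

print(sorted(up_hypo))
print(sorted(down_hyper))
```

The two remaining combinations (DEup ∩ DMhyper and DEdown ∩ DMhypo) are empty for these lists, which can be checked the same way.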
3.6 Accumulation of TFs and miRNAs of the genes

We then collect the targeter miRNAs of the genes in the Gintegrated set using the DIANA TarBase v7.0 database [30], a web server that provides validated interactions between genes and their targeter miRNAs. Besides this, we obtain the corresponding TFs of the genes in the Gintegrated set using the TRANSFAC [16], [17] and ITFP [16], [18] databases.
3.7 TF-miRNA-gene regulatory network formation and hub gene selection

A network comprising the genes, the corresponding TFs and the targeter miRNAs is then built. The TF-miRNA-gene network is analyzed using a well-known topological measure, degree centrality (DC) [36], [40], [41], since the whole network is treated as a symmetric (undirected) network.

The degree centrality measure considers only the degree of a vertex (gene/TF/miRNA), i.e., the number of vertices that are directly connected to it (its nearest neighbors). Each neighbor of the vertex carries equal importance in calculating its degree centrality. Here, the degree centrality of a gene in the final network counts the total number of interactions of the gene with its corresponding miRNAs and TFs. After computing the DC values, we rank the genes by DC value from highest (best) to lowest (worst) and choose the top genes. These top genes are stated as hub genes (centralized genes) of the network. These hub genes (say, Ghub) are used for epigenetic marker discovery in the next step. The TF-miRNA-gene network is visualized using Cytoscape 3.2.1 software.
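The degree count and the ranking can be sketched without any graph library. In the snippet below, the edge list is a small hand-picked subset of interactions mentioned in this article, so the resulting degrees are far smaller than the DC values of the full network; the function names and the cutoff `k` are assumptions for illustration.

```python
from collections import defaultdict


def gene_degree_centrality(edges):
    """Count the distinct miRNA/TF neighbours of every gene.

    `edges` is an iterable of (gene, regulator) pairs; the network is
    treated as undirected, so each pair adds one neighbour to the gene.
    """
    neighbours = defaultdict(set)
    for gene, regulator in edges:
        neighbours[gene].add(regulator)
    return {gene: len(regs) for gene, regs in neighbours.items()}


def top_hub_genes(dc, k):
    """Return the k genes with the highest degree centrality."""
    ranked = sorted(dc.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k]]


# Illustrative subset of edges only (not the complete network).
edges = [
    ("PHF19", "hsa-miR-107"),
    ("PHF19", "hsa-miR-103a-3p"),
    ("PHF19", "hsa-miR-16-5p"),
    ("HTRA1", "hsa-miR-107"),
    ("HTRA1", "hsa-miR-16-5p"),
    ("SMN1", "CEBPB"),
]
dc = gene_degree_centrality(edges)
print(dc)
print(top_hub_genes(dc, k=2))
```

Storing neighbours in a set means that a duplicated interaction record does not inflate the degree; ties are broken alphabetically so that the ranking is deterministic.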
3.8 Epigenetic marker identification

We then intersect the Ginv genes with the Ghub genes. The intersected genes are the epigenetic gene-markers: they are hub genes, and they also have an inverse relationship between their differential expression and differential methylation status. In addition, the IDPs corresponding to these epigenetic gene-markers are reported.
3.9 Biological validation

Furthermore, we perform a literature search, as well as KEGG pathway and gene ontology (GO) analyses using the DAVID database, in order to validate the gene-markers. A flowchart of the proposed framework is provided in Fig. 1.
Fig. 1. A Flow chart to describe the proposed framework.
4 Epigenetic Dataset and Experimental Result

In this section, we describe the real epigenetic data used in the experiment. Thereafter, we provide the experimental results and the related discussion.
4.1 Dataset Information

In this article, we use a multi-omics prostate carcinoma (epigenetic) dataset (NCBI Ref. id: GSE55599), which consists of 18,309 common genes having both gene expression and methylation values (= Gmatch). It has 32 common prostate carcinoma (experimental) samples and 16 common control (normal) samples.
4.2 Experimental Results and Discussion

Here, we discuss the experimental results.
4.2.1 Determination of Differentially Expressed and Differentially Methylated genes

At first, we accumulate 18,309 genes as Gmatch, and then identify 40 genes as Gintegrated. Thereafter, as described in Section 3, we perform data-normalization followed by the t-test on the expression/methylation data of the Gintegrated genes. We obtain three genes in the DEup category and eleven genes in the DEdown category (see Table 1). Similarly, four genes are identified as DMhyper genes, whereas three genes are DMhypo (see Table 2 and Fig. 2). Thereafter, we identify only one gene having the DEup ∩ DMhypo pattern, and four genes belonging to the DEdown ∩ DMhyper category (see Fig. 3). We provide the p-values and the differential expression and differential methylation status of these five (= 1 + 4) genes in Table 3.
4.2.2 Resultant Network and Hub gene identification

We build a network consisting of the genes of the Gintegrated set, the targeter miRNAs collected from the DIANA TarBase v7.0 database, and the corresponding TFs collected from the TRANSFAC and ITFP databases. Fig. 4 shows the whole network, whereas Fig. 5 shows another view of Fig. 4 in which all the genes are placed at the top left and all the miRNAs on the right side of the network, whereas all the TFs are placed in the bottom-middle portion of the network.
In the TF-miRNA-gene network, we observe that four genes, namely PHF19, HTRA1, SPSB1 and CCDC151, have higher degree centrality (DC) values than the others (see Table 4 and Fig. 4). Their DC values are 38, 11, 10 and 3, respectively. Based on degree centrality, these four genes are identified as the hub genes (Ghub). Next, the genes that are both differentially expressed and differentially methylated are examined.
Besides this, we perform a further analysis. We observe that the genes share very few targeter miRNAs. E.g., PHF19 and HTRA1 have three common miRNAs (viz., hsa-miR-107, hsa-miR-103a-3p and hsa-miR-16-5p). Similarly, PHF19 and SPSB1 have two common miRNAs (viz., hsa-miR-522-5p and hsa-miR-96-5p), whereas only one miRNA is common to PHF19 and CCDC151 (viz., hsa-miR-424-5p), and only one to SPSB1 and HTRA1 (viz., hsa-miR-335-5p). In addition, three genes are each related to a different TF: the SMN1 gene is associated with the TF CEBPB, KLHDC4 is connected with the TF OTX1, and SF3B2 is related to the TF MESP2. See Table 4 for details.
4.2.3 Epigenetic gene markers

We consider every possible combination of the differential expression and differential methylation status of genes (i.e., DEup ∩ DMhyper, DEup ∩ DMhypo, DEdown ∩ DMhyper, and DEdown ∩ DMhypo). Table 3 lists the genes whose combinations show an inverse relationship between expression and methylation values. Five genes have such an inverse relationship (i.e., DEup ∩ DMhypo or DEdown ∩ DMhyper); together they are denoted as Ginv. These genes are PHF19, SPSB1, HTRA1, CCDC151 and STX19. Among them, only STX19 is hypo-methylated as well as up-regulated; the other four genes are hyper-methylated as well as down-regulated. Thereafter, we intersect the Ghub and Ginv genes, and identify the resultant genes (viz., PHF19, HTRA1, SPSB1 and CCDC151) as epigenetic markers. Furthermore, the IDPs corresponding to the markers are listed in column 1 of Table 4 as well as column 1 of Table 5.
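The final selection step is a single set intersection. The sketch below replays it with the gene names and DC values reported in this section; the variable names are illustrative only.

```python
# Hub genes with the degree-centrality values reported in Section 4.2.2.
hub_dc = {"PHF19": 38, "HTRA1": 11, "SPSB1": 10, "CCDC151": 3}
g_hub = set(hub_dc)

# Genes with inverse expression/methylation status (Table 3).
g_inv = {"STX19", "PHF19", "CCDC151", "HTRA1", "SPSB1"}

# Epigenetic gene-markers: inverse-status genes that are also hub genes.
markers = g_inv & g_hub
# Inverse-status genes that are not hubs and are therefore dropped.
dropped = g_inv - g_hub

# List the markers from highest to lowest degree centrality.
print(sorted(markers, key=lambda gene: -hub_dc[gene]))
print(sorted(dropped))
```

STX19 shows the inverse pattern but is not among the hub genes, which is why it does not appear in the final marker list.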
4.2.4 Biological Validation of genes and related IDPs

We further perform a literature search, and pathway and Gene Ontology analyses. The association between the gene HTRA1 and prostate carcinoma is described in [31] and [35]. Similarly, [43] reports the association of the gene SPSB1 with prostate carcinoma. Besides these, SPSB1 is enriched in a prostate-carcinoma-related GO:BP term (viz., GO:0007242 intracellular signaling cascade, p-value = 0.017996325). The relationship between GO:0007242 (intracellular signaling cascade) and prostate carcinoma is reported in [33]. Therefore, SPSB1 is indirectly involved in the disease according to the gene ontology analysis. Similarly, PHF19 is enriched in prostate-carcinoma-related GO:BP terms such as GO:0006350 transcription (p-value = 5.85E-04) [32], [39] and GO:0045449 regulation of transcription (p-value = 0.002157192) [42]. HTRA1 is associated with the related GO:BP terms GO:0010648 negative regulation of cell communication (p-value = 0.046487613) [34] and GO:0009968 negative regulation of signal transduction (p-value = 0.034877541).

Fig. 2. Volcano plot for identifying hyper-methylated and hypo-methylated genes (DMhyper and DMhypo, respectively) through simultaneous verification of the significance level of the p-value and the fold change in log scale for the methylation data.
Among the four gene-markers, three (viz., PHF19, HTRA1, and SPSB1) are already supported by the literature search, the Gene Ontology analysis, or both, whereas the remaining marker (viz., CCDC151) is stated as a novel epigenetic marker, since no information about it is found through the literature search or the Gene Ontology analysis. Moreover, the IDPs corresponding to the resulting gene-markers are provided in Table 5.
Table 1
The list of up-regulated and down-regulated genes with p-values for the expression data of the Prostate Carcinoma dataset.

Gene Name    P-value        Status of gene
STX19        0.0001056      DEup
MED25        0.009166984    DEup
SNRPB        0.016748568    DEup
PHF19        8.32E-05       DEdown
SAMD3        0.000100528    DEdown
CCDC151      0.000245824    DEdown
FYCO1        0.000337727    DEdown
HTRA1        0.001240815    DEdown
RXRA         0.001511319    DEdown
HNRNPA1      0.005020765    DEdown
SF3B2        0.038755606    DEdown
SPSB1        0.040726192    DEdown
SMAD9        0.043757832    DEdown
CDCA4        0.045839458    DEdown
Table 2
The list of hyper-methylated and hypo-methylated genes with p-values for the methylation data of the Prostate Carcinoma dataset.

Gene Name    P-value        Status of gene
SPSB1        5.93E-08       DMhyper
PHF19        3.20E-05       DMhyper
HTRA1        0.001479375    DMhyper
CCDC151      0.002158725    DMhyper
APP          0.000387211    DMhypo
STX19        0.000517349    DMhypo
LOXL4        0.000953542    DMhypo
Fig. 3. Venn diagram that shows the intersections between each pair of different patterns of differential expression and differential methylation (viz., DEup, DEdown, DMhyper, DMhypo).
Table 3. The list of genes having inverse relationship between their differential expression and methylation status for the integrated Prostate Carcinoma epigenetic dataset, where "exp" and "methyl" denote the expression and methylation data, respectively.

| Gene Name | P-value | Status of gene |
|---|---|---|
| STX19 | 0.0001056 (exp), 0.000517349 (methyl) | (DEup ∩ DMhypo) |
| PHF19 | 8.32E-05 (exp), 5.93E-08 (methyl) | (DEdown ∩ DMhyper) |
| CCDC151 | 0.000245824 (exp), 3.20E-05 (methyl) | (DEdown ∩ DMhyper) |
| HTRA1 | 0.001240815 (exp), 0.001479375 (methyl) | (DEdown ∩ DMhyper) |
| SPSB1 | 0.040726192 (exp), 0.002158725 (methyl) | (DEdown ∩ DMhyper) |
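The inverse-relationship selection in Table 3 amounts to two set intersections over the gene lists of Tables 1 and 2. A minimal Python sketch follows; the gene symbols are transcribed from those tables, while the variable names are chosen here for illustration.

```python
# Gene sets transcribed from Table 1 (expression) and Table 2 (methylation).
de_up = {"STX19", "MED25", "SNRPB"}
de_down = {"PHF19", "SAMD3", "CCDC151", "FYCO1", "HTRA1", "RXRA",
           "HNRNPA1", "SF3B2", "SPSB1", "SMAD9", "CDCA4"}
dm_hyper = {"SPSB1", "PHF19", "HTRA1", "CCDC151"}
dm_hypo = {"APP", "STX19", "LOXL4"}

# Inverse relationship: up-regulated with hypo-methylation,
# or down-regulated with hyper-methylation.
up_and_hypo = de_up & dm_hypo
down_and_hyper = de_down & dm_hyper

print("DEup and DMhypo:", sorted(up_and_hypo))
print("DEdown and DMhyper:", sorted(down_and_hyper))
```

The two intersections together give the five genes listed in Table 3.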
Fig. 4. A simple view of the complete TF-miRNA-gene network for the experiment, highlighting (by enlarging) the four genes PHF19, HTRA1, CCDC151, and SPSB1. Here, oval shapes in light orange denote genes, parallelogram shapes in pink show the TFs, and triangle shapes in light orange represent miRs or miRNAs.
Fig. 5. Second view of the network in Fig. 4, in which all the genes are grouped at the top left, all the miRNAs are arranged on the right, and all the TFs are shown in the bottom-middle portion of the network.
Table 4. The forty IDPs and their corresponding transcripts (genes) of the Gintegrated set, with their corresponding targeter miRNAs and TFs.

| IDPs | Gene Names | Degree centrality | Targeter miRNAs | TFs |
|---|---|---|---|---|
| PHD finger protein 19 | PHF19 | 38 | let-7g-5p, miR-182-5p, miR-105-5p, miR-23a-3p, miR-18a-5p, let-7f, let-7d-5p, let-7b-5p, let-7a-5p, miR-1301, miR-590-3p, miR-BART5, miR-571, miR-328, miR-107, miR-98, let-7c, miR-9-3p, let-7e-5p, miR-1283, miR-423-5p, miR-522-5p, miR-92b-3p, miR-138-5p, miR-218-5p, miR-34a-5p, miR-103a-3p, miR-96-5p, miR-16-5p, miR-192-5p, miR-193b-3p, miR-124-3p, miR-33a-3p, miR-29a-5p, miR-15a-3p, miR-18b-5p, miR-424-5p, miR-23b-3p | - |
| Serine protease HTRA1 | HTRA1 | 11 | miR-107, miR-103a-3p, miR-16-5p, miR-196a-3p, miR-99a-3p, miR-31, miR-190a, miR-197-3p, miR-155-5p, miR-26b-5p, miR-335-5p | - |
| SPRY domain-containing SOCS box protein 1 | SPSB1 | 10 | miR-522-5p, miR-96-5p, miR-27a-3p, miR-26a-5p, miR-567, miR-122-5p, miR-24-3p, miR-941, miR-185-5p, miR-335-5p | - |
| Coiled-coil domain-containing protein 151 | CCDC151 | 3 | kshv-miR-K12-6-5p, miR-10a-5p, miR-424-5p | - |
| Syntaxin-19 | STX19 | 1 | miR-335-5p | - |
| Survival motor neuron protein | SMN1 | 1 | - | CEBPB |
| Kelch domain containing protein 4 | KLHDC4 | 1 | - | OTX1 |
| Splicing factor 3B subunit 2 | SF3B2 | 1 | - | MESP2 |
| Amyloid beta A4 protein | APP | 1 | miR-298 | - |
| ARHGEF1 protein | ARHGEF1 | 1 | miR-124 | - |
| Cell division cycle-associated protein 4 | CDCA4 | 1 | miR-424 | - |
| Chromodomain-helicase-DNA-binding protein 3 | CHD3 | 1 | miR-486-3p | - |
| Methylosome subunit pICln | CLNS1A | 1 | miR-138 | - |
| DnaJ homolog subfamily A member 3, mitochondrial | DNAJA3 | 1 | miR-129-5p | - |
| FYVE and coiled-coil domain-containing protein 1 | FYCO1 | 1 | miR-218-5p | - |
| Glutathione S-transferase C-terminal domain-containing protein | GSTCD | 1 | miR-570 | - |
| Histone deacetylase 3 | HDAC3 | 1 | miR-1261 | - |
| HIG1 domain family member 1A, mitochondrial | HIGD1A | 1 | miR-641 | - |
| Histone H3.1t | HIST3H3 | 1 | miR-1276p | - |
| MICOS complex subunit MIC60 | IMMT | 1 | miR-199a-3p | - |
| Lysyl oxidase homolog 4 | LOXL4 | 1 | miR-455-3p | - |
| Nuclear receptor coactivator 1 | NCOA1 | 1 | miR-498 | - |
| Glycylpeptide N-tetradecanoyltransferase 1 | NMT1 | 1 | miR-623 | - |
| Oxysterols receptor LXR-alpha | NR1H3 | 1 | miR-622 | - |
| Phosphatidylinositol 3-kinase regulatory subunit alpha | PIK3R1 | 1 | miR-455-3p | - |
| Peroxisome proliferator-activated receptor delta | PPARD | 1 | miR-1270 | - |
| Retinoic acid receptor alpha | RARA | 1 | miR-939 | - |
| Ras association domain-containing protein 2 | RASSF2 | 1 | miR-302c | - |
| REST corepressor 1 | RCOR1 | 1 | miR-198p | - |
| RXRA protein | RXRA | 1 | miR-574-3p | - |
| Mothers against decapentaplegic homolog 9 | SMAD9 | 1 | miR-512-3p | - |
| Zinc finger protein SNAI1 | SNAI1 | 1 | miR-199a-5p | - |
| Vacuolar-sorting protein SNF8 | SNF8 | 1 | miR-506 | - |
| U1 small nuclear ribonucleoprotein A | SNRPA | 1 | miR-1108 | - |
| Small nuclear ribonucleoprotein-associated proteins B and B' | SNRPB | 1 | miR-1207-5p | - |
| Putative Polycomb group protein ASXL1 | ASXL1 | 1 | miR-1207-5p | - |
| Small nuclear ribonucleoprotein Sm D3 | SNRPD3 | 1 | miR-570 | - |
| Suppressor of cytokine signaling 7 | SOCS7 | 1 | miR-145 | - |
| Signal transducer and activator of transcription 1-alpha/beta | STAT1 | 1 | miR-145 | - |
| Splicing factor U2AF 65 kDa subunit | U2AF2 | 1 | miR-145 | - |
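The degree values in Table 4 count how many regulators (miRNAs or TFs) are connected to each gene in the network. The Python sketch below recomputes that count for three of the genes, using the regulator lists given in Table 4; the data layout and variable names are assumptions made for illustration, and the count is left unnormalised because the table reports raw degrees.

```python
from collections import defaultdict

# Regulators of three genes, transcribed from Table 4.
regulators = {
    "HTRA1": ["miR-107", "miR-103a-3p", "miR-16-5p", "miR-196a-3p", "miR-99a-3p",
              "miR-31", "miR-190a", "miR-197-3p", "miR-155-5p", "miR-26b-5p",
              "miR-335-5p"],
    "SPSB1": ["miR-522-5p", "miR-96-5p", "miR-27a-3p", "miR-26a-5p", "miR-567",
              "miR-122-5p", "miR-24-3p", "miR-941", "miR-185-5p", "miR-335-5p"],
    "CCDC151": ["kshv-miR-K12-6-5p", "miR-10a-5p", "miR-424-5p"],
}

# Flatten into an undirected edge list (regulator, gene).
edges = [(reg, gene) for gene, regs in regulators.items() for reg in regs]

# Degree of a node = number of edges that touch it.
degree = defaultdict(int)
for reg, gene in edges:
    degree[reg] += 1
    degree[gene] += 1

gene_degree = {gene: degree[gene] for gene in regulators}
ranked = sorted(gene_degree, key=gene_degree.get, reverse=True)
print(gene_degree, ranked)
```

Ranking genes by this count is one simple way to pick hub genes; any cutoff for calling a gene a hub would be a further assumption.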
Table 5. Biological validations of the epigenetic gene-markers, and the corresponding (potential) IDPs.

| IDP | Gene name | Gene Ontology | Literature search | Status of marker |
|---|---|---|---|---|
| PHD finger protein 19 | PHF19 | GO:BP of GO:0006350 transcription (p-value=5.85E-04) [39]*, [32]*, GO:0045449 regulation of transcription (p-value=0.002157192) [42]*. | [35] | Existing |
| Serine protease HTRA1 | HTRA1 | GO:BP of GO:0010648 negative regulation of cell communication (p-value=0.046487613) [34]*, GO:0009968 negative regulation of signal transduction (p-value=0.034877541). | [31] | Existing |
| SPRY domain-containing SOCS box protein 1 | SPSB1 | GO:BP of GO:0007242 intracellular signaling cascade (p-value=0.017996325) [33]*. | [43] | Existing |
| Coiled-coil domain-containing protein 151 | CCDC151 | - | - | Novel |

* The reference represents that the corresponding GO-term has a relation with Prostate Carcinoma.
5 Conclusion

In this article, we perform a transcriptomic analysis of genes for transcripts encoding IDPs on a multi-omics prostate cancer dataset that consists of both gene expression and methylation data. As a result, we identify some potential epigenetic gene-markers (having an inverse relationship between their differential expression and methylation patterns) by integrating statistical and TF-miRNA-gene network centrality methods.

Notably, this study rests on two hypotheses. The first is that combining the association between methylation and gene expression patterns with hub-gene selection identifies potential gene-markers (and the corresponding IDPs) for the disease more reliably than considering the expression pattern alone. Methylation generally decreases the gene expression level. Since we exploit this phenomenon (i.e., the inverse relationship between expression and methylation patterns), our study improves on an analysis that considers only the expression pattern. The functional significance is that the expression patterns of these biomarkers depend on their methylation patterns rather than on other factors. The second hypothesis is that the four identified epigenetic gene-markers (viz., PHF19, HTRA1, SPSB1, and CCDC151) are associated with the disease. To test this second hypothesis, we first perform a literature search and gene ontology analyses on the genes using the DAVID database to find associations between the genes and the disease. Among these markers, three (viz., PHF19, HTRA1, and SPSB1) are existing markers, since associations between the genes and the disease are found through literature evidence, Gene Ontology analysis, or both, whereas the remaining marker (viz., CCDC151) is a novel marker. The novel marker may have significant therapeutic value for the disease. Finally, these markers and their corresponding IDPs are reported.
References

[1] M. Babu, R. van der Lee, N. de Groot and J. Gsponer, Intrinsically disordered proteins: regulation and disease, Current Opinion in Structural Biology, 21(3):432-440, 2011.

[2] M. Fuxreiter, I. Simon, P. Friedrich and P. Tompa, Preformed Structural Elements Feature in Partner Recognition by Intrinsically Unstructured Proteins, Journal of Molecular Biology, 338(5):1015-1026, 2004.

[3] P. Romero, Z. Obradovic, X. Li, E. Garner, C. Brown and A. Dunker, Sequence complexity of disordered protein, Proteins: Structure, Function, and Genetics, 42(1):38-48, 2000.

[4] F. Gallat, A. Laganowsky, K. Wood, F. Gabel, L. van Eijck, J. Wuttke, M. Moulin, M. Hartlein, D. Eisenberg, J. Colletier, G. Zaccai and M. Weik, Dynamical Coupling of Intrinsically Disordered Proteins and Their Hydration Water: Comparison with Folded Soluble and Membrane Proteins, Biophysical Journal, 103(1):129-136, 2012.

[5] C. Brown, S. Takayama, A. Campen, P. Vise, T. Marshall, C. Oldfield et al., Evolutionary Rate Heterogeneity in Proteins with Long Disordered Regions, Journal of Molecular Evolution, 55(1):104-110, 2002.

[6] J. Cheng, M. Sweredoski and P. Baldi, Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data, Data Mining and Knowledge Discovery, 11(3):213-222, 2005.

[7] R. van der Lee, M. Buljan, B. Lang, R. Weatheritt, G. Daughdrill, A. Dunker, M. Fuxreiter, J. Gough, J. Gsponer, D. Jones, P. Kim, R. Kriwacki, C. Oldfield, R. Pappu, P. Tompa, V. Uversky, P. Wright and M. Babu, Classification of Intrinsically Disordered Regions and Proteins, Chemical Reviews, 114(13):6589-6631, 2014.

[8] V. Uversky, Intrinsically disordered proteins and their (disordered) proteomes in neurodegenerative disorders, Frontiers in Aging Neuroscience, 7, 2015.
[9] S. Bandyopadhyay, S. Mallik, and A. Mukhopadhyay, A Survey and Comparative Study of Statistical Tests for Identifying Differential Expression from Microarray Data, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(1):95-115, 2013.

[10] U. Maulik, S. Mallik, A. Mukhopadhyay and S. Bandyopadhyay, Analyzing Gene Expression and Methylation Data Profiles using StatBicRM: Statistical Biclustering-based Rule Mining, PLoS ONE, 10(4):e0119448, 2015.

[11] S. Mallik, A. Mukhopadhyay, U. Maulik and S. Bandyopadhyay, Integrated Analysis of Gene Expression and Genome-wide DNA Methylation for Tumor Prediction: An Association Rule Mining-based Approach, Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), IEEE Symposium Series on Computational Intelligence - SSCI 2013, Singapore, pp. 120-127, April 16, 2013.

[12] S. Mallik, A. Mukhopadhyay, and U. Maulik, RANWAR: Rank-Based Weighted Association Rule Mining from Gene Expression and Methylation Data, IEEE Transactions on NanoBioscience, 14(1):59-66, 2015.

[13] S. Mallik and U. Maulik, MiRNA-TF-Gene Network Analysis through Ranking of Biomolecules for Multi-Informative Uterine Leiomyoma Dataset, Journal of Biomedical Informatics, 57:308-319, 2015.

[14] D. Baek, J. Villen, C. Shin, F. Camargo, S. Gygi and D. Bartel, The impact of microRNAs on protein output, Nature, 455(7209):64-71, 2008.

[15] T. Di Domenico, I. Walsh, A. Martin and S. Tosatto, MobiDB: a comprehensive database of intrinsic protein disorder annotations, Bioinformatics, 28(15):2080-2081, 2012.

[16] D. Sengupta and S. Bandyopadhyay, Topological patterns in microRNA gene regulatory network: studies in colorectal and breast cancer, Molecular BioSystems, 9:1360-1371, 2013.

[17] V. Matys et al., TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes, Nucleic Acids Research, 34:D108-D110, 2006.
[18] G. Zheng, K. Tu, Q. Yang, Y. Xiong, C. Wei, L. Xie, Y. Zhu and Y. Li, ITFP: an integrated platform of mammalian transcription factors, Bioinformatics, 24(20):2416-2417, 2008.

[19] A. Dunker, I. Silman, V. Uversky and J. Sussman, Function and structure of inherently disordered proteins, Current Opinion in Structural Biology, 18(6):756-764, 2008.

[20] T. Harmon and R. Pappu, An Evolutionary Algorithm for the Design of Different Degrees of Secondary Structure in Intrinsically Disordered Proteins (IDPs), Biophysical Journal, 108(2):228a, 2015.

[21] M. Jensen, P. Markwick, S. Meier, C. Griesinger, M. Zweckstetter, S. Grzesiek et al., Quantitative Determination of the Conformational Properties of Partially Folded and Intrinsically Disordered Proteins Using NMR Dipolar Couplings, Structure, 17(9):1169-1185, 2009.

[22] S. Shojania and J. D. O'Neil, Intrinsic Disorder and Function of the HIV-1 Tat Protein, Protein & Peptide Letters, 17(8):999-1011, 2010.

[23] J. Ward, L. McGuffin, K. Bryson, B. Buxton and D. Jones, The DISOPRED server for the prediction of protein disorder, Bioinformatics, 20(13):2138-2139, 2004.

[24] D. Jones and D. Cozzetto, DISOPRED3: precise disordered region predictions with annotated protein-binding activity, Bioinformatics, 31(6):857-863, 2014.

[25] P. Ruy, R. Torrieri, J. Toledo, V. Alves, A. Cruz and J. Ruiz, Intrinsically disordered proteins (IDPs) in trypanosomatids, BMC Genomics, 15(1):1100, 2014.

[26] Y. Edwards, A. Lobley, M. Pentony and D. Jones, Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data, Genome Biology, 10(5):R50, 2009.

[27] L. Iakoucheva, C. Brown, J. Lawson, Z. Obradovic and A. Dunker, Intrinsic Disorder in Cell-signaling and Cancer-associated Proteins, Journal of Molecular Biology, 323(3):573-584, 2002.
[28] M. Aqil, A.R. Naqvi, S. Mallik, S. Bandyopadhyay, U. Maulik and S. Jameel, The HIV Nef protein modulates cellular and exosomal miRNA profiles in human monocytic cells, Journal of Extracellular Vesicles, 3:23129, 2014.

[29] M. Aqil, S. Mallik, S. Bandyopadhyay, U. Maulik and S. Jameel, Transcriptomic Analysis of mRNAs in Human Monocytic Cells Expressing the HIV-1 Nef Protein and Their Exosomes, BioMed Research International, vol. 2015, article id: 492395, pp. 1-10, 2015.

[30] I. S. Vlachos, M. D. Paraskevopoulou, D. Karagkouni, G. Georgakilas, T. Vergoulis, I. Kanellos, I. L. Anastasopoulos, S. Maniou, K. Karathanou, D. Kalfakakou, A. Fevgas, T. Dalamagas and A. G. Hatzigeorgiou, DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions, Nucleic Acids Research, 43:D153-D159, 2014.

[31] J. Chien, M. Campioni, V. Shridhar and A. Baldi, HtrA Serine Proteases as Potential Therapeutic Targets in Cancer, Current Cancer Drug Targets, 9(4):451-468, 2009.

[32] S. Lee, M. Ha, J. Lee, P. Nguyen, Y. Choi, F. Pirnia, et al., Inhibition of the 3-Hydroxy-3-methylglutaryl-coenzyme A Reductase Pathway Induces p53-independent Transcriptional Regulation of p21WAF1/CIP1 in Human Prostate Carcinoma Cells, Journal of Biological Chemistry, 273(17):10618-10623, 1998.

[33] F. Yuan, Y. Zhou, M. Wang, et al., Identifying New Candidate Genes and Chemicals Related to Prostate Cancer Using a Hybrid Network and Shortest Path Approach, Computational and Mathematical Methods in Medicine, 2015:462363, 2015, doi: 10.1155/2015/462363.

[34] http://cmrn.systemsbiology.net/cluster/BPH%20Prostate%20Dhanasekaran/6

[35] J. R. Prensner, Discovery and Characterization of Long Non-Coding RNAs in Prostate Cancer, PhD Dissertation, 2012, http://deepblue.lib.umich.edu/bitstream/handle/2027.42/107221/prensner_1.pdf?sequence=1
[36] S. Servia-Rodríguez, A. Noulas, C. Mascolo, A. Fernández-Vilas and R. Díaz-Redondo, The Evolution of Your Success Lies at the Centre of Your Co-Authorship Network, PLoS ONE, 10(3):e0114302, 2015.

[37] M. Sickmeier, J. Hamilton, T. LeGall, V. Vacic, M. Cortese, A. Tantos et al., DisProt: the Database of Disordered Proteins, Nucleic Acids Research, 35(Database):D786-D793, 2007.

[38] H. Berman, The Protein Data Bank, Nucleic Acids Research, 28(1):235-242, 2000.

[39] S. Bilke, R. Schwentner, F. Yang, et al., Oncogenic ETS fusions deregulate E2F3 target genes in Ewing sarcoma and prostate cancer, Genome Research, 23:1797-1809, 2013.

[40] L.C. Freeman, Centrality in social networks: conceptual clarification, Social Networks, 1:215-239, 1979.

[41] A. Ozgur, T. Vu, G. Erkan, and D.R. Radev, Identifying gene-disease associations using centrality on a literature mined gene-interaction network, Bioinformatics, 24:i277-i285, 2008, doi: 10.1093/bioinformatics/btn182.

[42] G. Xu, J. Wu, L. Zhou, B. Chen, Z. Sun, F. Zhao, and Z. Tao, Characterization of the Small RNA Transcriptomes of Androgen Dependent and Independent Prostate Cancer Cell Line by Deep Sequencing, PLoS ONE, 5(11):e15519, 2010, doi: 10.1371/journal.pone.0015519.

[43] D. G. Tandefelt, J. L. Boormans, H. A. van der Korput, G. W. Jenster, and J. Trapman, A 36-gene Signature Predicts Clinical Progression in a Subgroup of ERG-positive Prostate Cancers, European Urology, 64(6):941-950, 2013.
List of Abbreviations

IDPs - Intrinsically Disordered Proteins
TF - Transcription Factors
miRNA - MicroRNA
HTRA1 - Serine protease HTRA1
SPSB1 - SPRY domain-containing SOCS box protein 1
PHF19 - PHD finger protein 19
CCDC151 - Coiled-coil domain-containing protein 151
CEBPB - CEBPB protein 1
OTX1 - Homeobox protein OTX1
MESP2 - Mesoderm posterior protein 2
IDPT - Insights into Potential Intrinsically Disordered Proteins Through Transcriptomic Analysis of Genes
Highlights

· We perform a transcriptomic analysis of genes for transcripts encoding IDPs on a prostate cancer multi-omics dataset.
· We obtain differentially expressed and differentially methylated genes using t-test.
· We determine hub-genes from TF-miRNA-gene network using degree centrality.
· We identify some epigenetic gene-markers by integrating statistical and TF-miRNA-gene network analyses.