Accepted Manuscript A DNA methylation-based definition of biologically distinct breast cancer subtypes Olafur A. Stefansson, Sebastian Moran, Antonio Gomez, Sergi Sayols, Carlos Arribas-Jorba, Juan Sandoval, Holmfridur Hilmarsdottir, Elinborg Olafsdottir, Laufey Tryggvadottir, Jon G. Jonasson, Jorunn Eyfjord, Dr. Manel Esteller PII:
S1574-7891(14)00261-0
DOI:
10.1016/j.molonc.2014.10.012
Reference:
MOLONC 610
To appear in:
Molecular Oncology
Received Date: 27 December 2013 Revised Date:
6 October 2014
Accepted Date: 29 October 2014
Please cite this article as: Stefansson, O.A., Moran, S., Gomez, A., Sayols, S., Arribas-Jorba, C., Sandoval, J., Hilmarsdottir, H., Olafsdottir, E., Tryggvadottir, L., Jonasson, J.G., Eyfjord, J., Esteller, M., A DNA methylation-based definition of biologically distinct breast cancer subtypes, Molecular Oncology (2014), doi: 10.1016/j.molonc.2014.10.012. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
A DNA methylation-based definition of biologically distinct breast cancer subtypes
RI PT
Olafur A. Stefansson1, Sebastian Moran1, Antonio Gomez1, Sergi Sayols1, Carlos Arribas-Jorba1, Juan Sandoval1, Holmfridur Hilmarsdottir2, Elinborg Olafsdottir3,
SC
Laufey Tryggvadottir3, Jon G. Jonasson3,4, Jorunn Eyfjord2, Manel Esteller1, 5, 6, * 1. Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute,
M AN U
08908 L’Hospitalet, Barcelona, Catalonia, Spain
2. The Cancer Research Laboratory, Medical Faculty, University of Iceland, Reykjavik, Iceland 3. The Icelandic Cancer Registry, Reykjavik, Iceland
4. Department of Pathology, Landspitali University Hospital, Reykjavik, Iceland 5. Department of Physiological Sciences II, School of Medicine, University of Barcelona,
TE D
Barcelona, Catalonia, Spain
6. Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia Spain
Dr. Manel Esteller.
EP
*Corresponding author:
AC C
IDIBELL - Bellvitge Biomedical Research Institute Cancer Epigenetics and Biology Program (PEBC) Hospital Duran i Reynals, Gran Via de l'Hospitalet, 199 08907 L'Hospitalet de Llobregat, Barcelona, Catalonia, Spain Email:
[email protected] Phone: 00 34 93 260 71 40, Fax: 00 34 93 260 72 19
Running title: DNA methylation in breast cancer. Key words: Breast cancer; DNA methylation; microarrays; biological subtypes; prognosis.
1
ACCEPTED MANUSCRIPT Financial support: The Icelandic Centre for Research RANNIS (J.E., H.H.), the Cellex Foundation (ME), Fundación Sandra Ibarra de Solidaridad frente al Cáncer (M.E.), Junta de Barcelona of the Asociación Española Contra el Cáncer (O.A.S.), and the Health and
RI PT
Science Departments of the Catalan Government (Generalitat de Catalunya) (M.E.). M.E. is an ICREA Research Professor.
AC C
EP
TE D
M AN U
SC
Conflicts of interest: The authors declare no conflicts of interests.
2
ACCEPTED MANUSCRIPT
Abstract
In cancer, epigenetic states are deregulated and thought to be of significance in cancer development and progression. We explored DNA methylation-based signatures in association with breast cancer subtypes to assess their impact on clinical presentation and
RI PT
patient prognosis. DNA methylation was analyzed using Infinium 450K arrays in 40 tumors and 17 normal breast samples, together with DNA copy number changes and subtypespecific markers by tissue microarrays. The identified methylation signatures were
SC
validated against a cohort of 212 tumors annotated for breast cancer subtypes by the PAM50 method (The Cancer Genome Atlas). Selected markers were pyrosequenced in an
M AN U
independent validation cohort of 310 tumors and analysed with respect to survival, clinical stage and grade. The results demonstrate that DNA methylation patterns linked to the luminal-B subtype are characterized by CpG island promoter methylation events. In contrast, a large fraction of basal-like tumors are characterized by hypomethylation events
TE D
occurring within the gene body. Based on these hallmark signatures, we defined two DNA methylation-based subtypes, Epi-LumB and Epi-Basal, and show that they are associated with unfavorable clinical parameters and reduced survival. Our data show that distinct
EP
mechanisms leading to changes in CpG methylation states are operative in different breast cancer subtypes. Importantly, we show that a few selected proxy markers can be used to
AC C
detect the distinct DNA methylation-based subtypes thereby providing valuable information on disease prognosis.
3
ACCEPTED MANUSCRIPT
1. Introduction
Breast cancer is the most common cancer among women and ranks among the leading causes of cancer-related deaths (Hortobagyi et al 2005). Nevertheless, the overall prognosis of breast cancer patients has been improving over time, with marked changes
RI PT
seen since the introduction of adjuvant chemotherapy and ionizing radiation, coupled with the use of tamoxifen for patients with hormone receptor-positive tumors and, more recently, trastuzumab for those displaying overexpression and amplification of the HER-2
SC
oncogene (López-Tarruella et al 2009). Further improvements in treatment of breast cancer patients are likely to emerge from our greater understanding of why only some
M AN U
patients develop aggressive disease and require adjuvant chemotherapy. A major milestone on the way to this goal is the definition of five biologically and clinically meaningful breast cancer subtypes based on genome-wide expression analyses: LuminalA, Luminal-B, HER-2, Normal-like and Basal-like (Sorlie et al 2003; Perou &Børresen-Dale
TE D
2011). An integrated analysis of genome-wide gene expression and copy number changes was recently carried out to further refine the previously established breast cancer subtypes (Curtis et al 2012).
EP
Different breast cancer subtypes are thought to arise through different “evolutionary paths” reflecting distinct patterns of mutated cancer genes (Stephens et al 2012; Cancer
AC C
Genome Atlas Network 2012; Saal et al 2008; Chin et al 2006; Diaz-Cruz et al 2010; Curtis et al 2012). Less is known about the contribution of epigenetic changes to the development of biologically distinct breast cancer subtypes. The epigenome-wide studies so far published on this subject have either lacked comprehensive profiling technologies or have not validated any clinical relevance in independent patient cohorts (Cancer Genome Atlas Network 2012; Holm et al 2010; Bediaga et al 2010; Fang et al 2011; Dedeurwaerder et al 2011; Hon et al 2012). Recent efforts to sequence cancer genomes, including those of breast cancer, have led to the identification of novel cancer genes and previously
4
ACCEPTED MANUSCRIPT
unrecognized signatures of mutational processes (Stephens et al 2012; Cancer Genome Atlas Network 2012). A recurring theme emerging from these studies is that acquired mutations often affect genes involved in regulating chromatin dynamics or the processing of epigenetic marks as seen in various cancer types (Stephens et al 2012; Cancer
RI PT
Genome Atlas Network 2012; Jones et al 2010; Mamo et al 2012). This highlights the importance the epigenome in cancer development and opens up new potentials for identifying patterns of potential relevance to patient prognosis and personalized medicine
AC C
EP
TE D
M AN U
SC
(Stefansson & Esteller 2012; Heyn & Esteller 2012).
5
ACCEPTED MANUSCRIPT
2. Materials and Methods 2.1. DNA samples
The study material consisted of 350 primary invasive breast tumors and 25 normal breast
RI PT
tissue samples obtained adjacent to tumour lesions of patients. The samples were obtained from the Department of Pathology, University Hospital, Iceland. The normal breast tissue samples obtained from the subset of 25 patients were derived from regions
SC
adjacent to the site of tumour growth. DNA was isolated by the commonly used phenolchloroform/proteinase-K method. Patient information about the breast cancer samples
M AN U
came from the population-based Icelandic Cancer Registry (Sigurdardottir et al 2012). Data on clinical staging and histological grade from the Icelandic Cancer Registry were based on analyses by pathologists at the Department of Pathology, Landspitali University Hospital. For clinical staging, the TNM system was followed (tumour size and nodal
TE D
status), with histological grade assessed according to the Nottingham system. Informed consent was obtained from all patients and all work was carried out according to permissions from the Icelandic Data Protection Commission (2006050307) and Bioethics
EP
Committee (VSNb2006050001/03-16).
AC C
2.2. DNA methylation analysis
Infinium HumanMethylation450 BeadChips were used to analyze DNA methylation on a genome-wide scale in the discovery cohort (40 tumors and 17 normal breast tissues). The 450K Infinium microarray data tissue samples have been deposited to NCBI´s GEO database and can be downloaded from the following link: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=enijueysntsbvyd&acc=GSE52865 The experimental procedures used were those recommended by the manufacturer (Supplementary Materials & Methods).
6
ACCEPTED MANUSCRIPT
The PyroMark Q96 system for pyrosequencing was used to assess selected markers in the validation cohort. The primer sequences used in this analysis were designed using Qiagen’s Pyromark Assay Design 2·0 software (details and primer sequences are available in the Supplementary Methods).
RI PT
2.3. Statistical analyses & bioinformatics procedures Cluster analysis (carried out in section 3.1.) of the epigenome-wide data was performed using unsupervised hierarchical clustering with complete linkage and
SC
Manhattan distance on the top 5000 CpG´s found differentially methylated between breast tumors and normal breast samples (see further in supplementary methods). We then used
M AN U
the pvclust package in R to define statistically significant tumour clusters indicating patterns of potential biological relevance. The DNA methylation data were then analysed in a supervised design to identify sites associated with expression-based subtypes using the SAMr univariate multi-class marker discovery procedure (samr package in R) wherein
TE D
permutations are used to estimate the false discovery rate (FDR) (with FDR <5% considered statistically significant) as carried out in section 3.2. In this analysis, we additionally applied a threshold for the significant sites using mean differences on a
EP
subtype-by-subtype basis (also including the group of normal breast tissue samples) with
AC C
minimum change of +/- 0.10 in beta values required in all subtype-subtype comparisons for each site (wherein the direction of change was required to be consistent). The CpG sequence context of differentially methylated genes between subtypes was then analyzed (carried out in section 3.2.) by hypothesis testing for differences in the counting of CpGs by categories reflecting functionally relevant sequences, i.e. 1) those located proximal to the transcription start site (TSS) (inferred as promoter regions, i.e. CpG´s located within 200b of the TSS or within the 5´UTR/1stExon), 2) those located more distantly from the TSS lying within the interval from 200 - 1500bp upstream of the TSS site and 3) those found within the gene body (excluding the 1st Exon). More detailed analysis in this regards
7
ACCEPTED MANUSCRIPT
included the classification of CpG´s within promoter regions into three distinct groups, i.e. those found within 1) CpG islands, 2) CpG shores and 3) CpG poor regions. In section 3.3., we define DNA methylation-based subtypes by a cross-validation procedure that enables classification of each tumour according to the identified hallmark signatures (as
RI PT
defined in section 3.2). This was carried out using a well-known pattern recognition algorithm called prediction analysis for microarrays (PAM) in a leave-one-out crossvalidation design (Tibshirani et al 2002).
SC
Tabular data were analyzed by the two-sided chi-squared contingency test for count data (carried out in section 3.6). The Kaplan–Meier method was used to generate survival
M AN U
curves while relative hazards were estimated in multivariate analyses using the Cox proportional hazards model (carried out in sections 3.6 – 3.7). Patients were followed from the date of diagnosis until death or last date of follow-up (December 31st 2009). The outcome was breast cancer-specific survival, defined as the time from diagnosis to patient
TE D
death from breast cancer as registered on death certificates. Patients who died of causes other than breast cancer were censored at the date of death. Survival analyses were carried out in R (survival package). Hazard ratios and the corresponding P-values are
EP
reported for each methylation-based subtype, coded as a categorical variable, after having
AC C
adjusted for the potential confounding factors of patient age and year of diagnosis.
8
ACCEPTED MANUSCRIPT
3. Results
3.1. Genome-wide DNA methylation patterns and breast cancer subtypes CpGs that were differentially methylated between breast tumors and normal breast
RI PT
tissues were identified and explored in relation to biologically relevant subtypes by cluster analysis (Figure 1A; Supplementary Fig. 1). This revealed a distinctive DNA methylation pattern (significant at AU > 95%; by the pvclust method) enriched with tumors of the
SC
Luminal-B (LumB) phenotype (indicated as “Cluster 1”, Figure 1A). These tumors show extensive DNA methylation of CpG islands implying that they have acquired a methylator
M AN U
phenotype. Interestingly, these characteristics did not appear to be restricted to LumB tumors as a few members of other subtypes also displayed this pattern, whereby one tumour was classified as Luminal-A (LumA) and the other as HER2 (see in “Cluster 1”; Figure 1A). The DNA methylation patterns of most, but not all, Basal-like tumors were also
TE D
distinctive (AU > 95%; indicated as “Cluster 2”, Figure 1A) in that the changes affected a different set of CpGs from those affected in most other tumors. In contrast, HER2 and Luminal-A (LumA) breast tumors were more heterogeneous in terms of their DNA
EP
methylation patterns as these tumors were more widely dispersed throughout the cluster dendrogram. Indeed, only a small subset of the LumA breast tumors (four out of twelve)
AC C
showed evidence of a distinctive (i.e., statistically significant; AU > 95%) pattern of DNA methylation changes (indicated as “Cluster 3”, Figure 1A), emphasizing the biological heterogeneity within this subtype (Supplementary Fig. 1). In summary, this analysis supports the notion that CpG methylation changes observed on a genome-wide scale are systematically linked to distinct breast cancer subtypes, in particular the LumB and Basallike subtypes (Supplementary Table 1). 3.2. Distinct epigenomic characteristics between breast cancers of the Luminal-B and Basal-like subtypes
9
ACCEPTED MANUSCRIPT
Given the observed potential of DNA methylation changes for identifying biologically aggressive breast cancer subtypes, notably those identified as LumB and Basal-like (as revealed through the cluster analysis), we derived methylation signatures for each subtype through a multi-class procedure (uni-variate) with permutations to adjust for multiple
RI PT
hypothesis testing (Figure 1B; Supplementary Fig. 2). The subtype-specific CpG´s identified through this procedure were validated against an independent cohort (The Cancer Genome Atlas; TCGA) wherein breast cancer subtypes have been annotated for
SC
each tumour by the PAM50 assay using expression arrays (as described in Parker et al 2009). The overlap between CpG´s identified as specific for the expression-based
M AN U
subtypes in our cohort (PEBC) and the TCGA cohort is shown in Figure 2A. This analysis revealed consistent changes specific for breast cancers of the LumB and Basal-like subtypes with 254 and 202 CpG´s found specific for LumB and Basal-like breast cancers, respectively, in both cohorts (Figure 2A; Supplementary Table 2). In contrast, breast cancers of the LumA and HER2 subtypes showed very limited or no overlap at all (Figure
TE D
2A). The validated set of subtype-specific CpG’s are displayed in Figure 2B. Here, we note that while most LumB tumors appear to robustly display the methylation pattern associated
EP
with the LumB subype, a subset of LumB tumors do not and, furthermore, an appreciable number of LumA and HER2 tumors appear to display LumB-methylation characteristics
AC C
(Figure 2B). In contrast, the Basal-like specific CpG methylation pattern appears specific for basal-like tumors - especially when looking at the hypomethylated CpG’s (Figure 2B). Nonetheless, it is clear that some basal-like tumors do not conform to the basal-like methylation pattern, i.e. a few basal-like tumors appear to be “out of place” (Figure 2B). These observations are addressed and extended further in section 3.3. The resulting DNA methylation signatures for LumB and Basal-like tumors, based on the validated catalogue of subtype-specific CpG´s (i.e. the overlapping 254 LumB specific CpG´s and 202 Basal-like specific CpG´s), were analysed further in terms of
10
ACCEPTED MANUSCRIPT
functionally relevant DNA sequence elements. This analysis revealed that the LumB signature predominantly involves CpG methylation of promoter sequences (54%, 137 of 254; see Figure 3A) whereas the Basal-like signature predominantly involved hypomethylation events occurring in gene body regions (26%, 53 of 202; see Figure 3B).
RI PT
As this analysis only includes the catalogue of validated CpG´s between the two cohorts (i.e. the overlap between subtype-specific CpG´s identified in both the PEBC and TCGA cohorts), the LumA and HER2 subtypes were excluded due to the lack of consistently
SC
associated CpG´s (see further in Figure 2A).
In exploring events involving CpG promoter methylation further, i.e. focusing only on
M AN U
promoter regions, we found that promoter methylation events within the LumB signature almost exclusively involve CpG islands rather than shores (Figure 3C). In contrast, Basallike breast cancers are more or less equally likely to involve CpG shores and other CpG poor promoter regions (Figure 3C). We further demonstrate that the 90 genes uniquely
TE D
affected by promoter methylation over the 137 CpG´s consistently identified in association with the LumB subtype (see Supplementary Table 2) are significantly enriched as targets of the Polycomb group repressor complex-2 (PCR2) in embryonic stem cells (Padjusted <
EP
0.01; Figure 3D). Specifically, 31 out of these 90 genes (31 of 90; 34.4%) are known targets of the PCR2 complex. This fraction (34.4%) is significantly higher than can be
AC C
expected by chance assuming that the list of known PCR2 target genes encompasses 652 genes (Padjusted < 0.01; Figure 3D; see also Supplementary Methods). The same was not true for Basal-like breast cancers regardless of whether we based the analysis on the catalogue of 19 validated promoter associated CpG´s (i.e. the catalogue of 19 promoter methylation linked CpG´s associated with Basal-like tumors in both PEBC & TCGA cohorts), or the catalogue of “non-validated” promoter associated CpG´s (i.e. those identified as Basal-like specific events in either the PEBC or TCGA cohorts separately; data not shown).
11
ACCEPTED MANUSCRIPT 3.3. DNA methylation-based definition of breast cancer subtypes Given the distinctive epigenomic features observed in association with LumB and Basal-like breast cancers, i.e. CpG island promoter methylation and gene body
RI PT
hypomethylation in LumB and Basal-like tumors, respectively, we set out to determine how unique these “hallmark features” are to each of the two expression-based subtypes. This was carried out using a well-established algorithm for pattern recognition, by which we
SC
determined the “degree of similarity” (reflected in the cross-validation probabilities) for each tumour to the signatures of LumB associated CpG island promoter methylation
M AN U
(consisting of 129 CpG´s as shown in Figure 3C) and Basal-like associated gene body hypomethylation (consisting of 53 CpG´s as shown in the lower panel of Figure 3B). In this analysis, we found that the signature of LumB-associated CpG island promoter methylation events effectively characterized most, but not all, of the LumB tumors (6 of 7;
TE D
85.7%) while also characterizing two tumors of the LumA (2 of 11; 18.2%) and one of the HER2 subtype (1 of 3; 33.3%) (Figure 4A). In summary, out of the 40 tumors analysed in our cohort, 10 tumors (10 of 40; 25%) robustly displayed the LumB-signature. Similar
EP
results were obtained based on the analysis of the TCGA cohort, wherein we find that the
AC C
majority but not all of the LumB tumors robustly display the LumB-associated CpG island promoter methylation signature (33 of 46; 71.7%) (Figure 4A). While, at the same time, a few members of the LumA (29 of 108; 26.8%) and HER2-enriched subtype (4 of 14; 28.6%) also display the LumB-signature. In summary, out of the 212 tumors analysed in the TCGA cohort, 66 tumors (66 of 212; 31.1%) robustly displayed the LumB associated signature of CpG island promoter methylation events. The presence of LumB-linked methylome characteristics in an appreciable proportion of LumA and HER2 associated tumors provides the basis for defining a novel subtype hereafter referred to as Epi-LumB (the “Epi” prefix indicating its epigenetic nature).
12
ACCEPTED MANUSCRIPT
In looking at the gene body hypomethylation signature associated with the Basallike subtype, we found that most (though not all), of the basal-like tumors in our cohort robustly display this signature (11 of 14; 78.6%) (Figure 4B). In summary, we find no members of any other subtype displaying the basal-like associated gene body
RI PT
hypomethylation signature in our cohort (the PEBC cohort). Thus, out of the 40 tumors analysed in the PEBC cohort, 11 displayed this basal-like signature (11 of 40; 27.5%). In the TCGA cohort, we similarly find that most but not all of the basal-like tumors (30 of 39;
SC
76.9%) robustly display the basal-like associated gene body hypomethylation signature. In this cohort, only one member of another subtype displayed this signature, i.e. a HER2-
M AN U
enriched tumour (1 of 14; 7.1%). Thus, to summarize, out of the 212 tumors analysed in the TCGA cohort, a total of 31 robustly displayed the basal-like gene body hypomethylation
signature
(31
of
212;
14.6%).
Given
the
unique
methylome
characteristics, i.e. gene body hypomethylation, we refer to the group of tumors that robustly display the gene body hypomethylation signature as the Epi-Basal subtype. The
TE D
CpG methylation signatures leading to the classification of tumors as either Epi-LumB or Epi-Basal are shown in Figures 4C & D, respectively.
EP
3.4. Gene promoter methylation events affecting known cancer genes found in
AC C
association with the DNA methylation-based subtypes We found that the inherent propensity of Epi-LumB tumors to acquire CpG island promoter methylation events affected a subset of previously established tumour suppressor genes (based on the catalogue of 716 TSG, see Zhao et al 2013) of which at least five are shown to be strongly down-regulated upon CpG promoter methylation, i.e. L3MBTL4, ID4, IRX1, PTCH2 and RASSF10 (Table 1). The Epi-Basal subtype, in contrast, was not found to be associated with CpG methylation over the promoter region of known tumour suppressor genes. We note, however, that the analysis of both cohorts supports the observation that basal-like breast cancers, regardless of whether or not they are Epi-
13
ACCEPTED MANUSCRIPT
Basal, significantly associate with promoter methylation of the BRCA1 gene - a central tumour suppressor gene in breast cancer (data no shown). BRCA1 promoter methylation, however, does not hold significantly associated with the Epi-Basal subtype (data not shown). This event, i.e. BRCA1 promoter methylation, is therefore likely to be more
RI PT
specifically related to the basal-like phenotype rather than the gene body hypomethylation phenotype.
3.5. Distinct tumour evolutionary paths in association with Epi-LumB and Epi-Basal
SC
tumors
The different patterns, or signatures, of changes in CpG methylation states are
M AN U
indicative of divergent evolutionary paths. To test this hypothesis, we studied the patterns of DNA copy number changes in the same cohort of breast tumors as those analyzed using the 450K Infinium methylation method. Here, we identified highly recurrent copy number changes that were significantly associated with Epi-LumB tumors in both cohorts
TE D
involving deletions over chromosomes 13q and 16p, together with copy number gains of chromosomes 12q, 17q and 20q (Figure 5A). In this analysis, we identified subtypespecific events for Epi-LumB tumors involving either DNA copy number loss or CpG
EP
promoter methylation over DZIP1, TNFSF11, ZIC5, COL4A2, COL4A1 and PCDH8
AC C
indicating candidate tumour suppressor genes (Figure 5A; Supplementary Figure 3). Of these genes, DZIP1 and COL4A2 show a significant relation between promoter methylation and down-regulated expression based on data from the TCGA cohort (data not shown).
As expected, given the overlap with the basal-like subtype, we found that tumors of the Epi-Basal subtype displayed highly aberrational landscapes of DNA copy number changes. Many of the DNA copy number changes observed in Epi-Basal tumors were recurrent such as deletions over 4p, 5q, 12q and 17p as well as copy number gains over
14
ACCEPTED MANUSCRIPT
7q, 9p and 12p (Figure 5B). Here, subtype-specific events involving either DNA copy number loss or CpG promoter methylation were found to involve TENC1, RARA, PCDHB11, PCDHB14, PCDHGA2 and PCDHGB2 indicative of candidate tumour suppressor genes in this context (Figure 5B; Supplementary Figure 3). Based on the
RI PT
TCGA cohort, the TENC1, PCDHB11 and PCDHGA2 genes show a clear relation between promoter methylation and down-regulated expression (data not shown). 3.6. The clinical relevance of DNA methylation-based subtypes
SC
To determine the clinical relevance of the DNA methylation-based subtypes (the Epi-LumB and Epi-Basal subtypes), we developed locus-specific assays for analyzing a
M AN U
few selected markers that could serve as proxies for tumour classification. We selected TTBK1, ZNF132 and KCNA3 from the catalogue of top significant CpG island promoter methylation events (only taking into account CpG´s located in gene promoters containing CpG islands) associated with Epi-LumB tumors (Supplementary Table 3). In designing an
TE D
assay for the Epi-Basal subtype, there are a number of problems that arise with detecting gene body hypomethylation including the identification of a region that is recurrently hypomethylated while, at the same time, the change from high to low levels of methylation
EP
(in normal tissue compared with tumors, respectively), would need to be substantial to
AC C
enable accurate and sensitive detection in clinical samples. The presence of stromal cells, the normal breast epithelial tissue and lymphocytes in such samples all add to the difficulties with this type of analysis. Although these factors were not an obstacle in discovering the gene body hypomethylation phenotype (i.e. in relation to the Epi-Basal subtype), see Supplementary Figure 4, they can lead to unreliable results in the clinical setting. For these reasons we selected CpG promoter methylation events, rather than body hypomethylation events, identified as top significant markers in association with tumors classified as Epi-Basal. As this assay does not directly measure the phenotype of interest, i.e. gene body hypomethylation, it should be understood that it may simply reflect
15
ACCEPTED MANUSCRIPT
another way for classifying basal-like breast cancers (an alternative to expression analyses) – especially since nearly all of the basal-like breast cancers included in the discovery cohort were found to display the gene body hypomethylation phenotype as revealed in Figure 4B& D. Regardless of this, it is of interest to determine whether a DNA
RI PT
methylation-based assay for the Epi-Basal subtype can improve the prognostic value for this subtype contrasted against the expression-based definition alone. Based on this reasoning, we selected ZNF671 and TENC1 promoter methylation as putative markers of
SC
Epi-Basal breast cancers from the top promoter methylation events identified as significantly associated with tumors classified as Epi-Basal (Supplementary Table 3).
M AN U
Although the selection of subtype-specific methylation markers was based on the top significant CpG´s indicative of different breast cancer subtypes (not taking into account their effects on gene expression), we find that the proxy markers KCNA3, ZNF132, TENC1 and ZNF671 are down-regulated in association with promoter methylation based on RNA
TE D
sequencing and methylation data from the TCGA cohort (Supplementary Figure 5). Pyrosequencing was then used to analyze the selected proxy markers for Epi-LumB and Epi-Basal tumors in an independent validation cohort of primary breast tumour
EP
samples from 310 patients. The Epi-LumB subtype was then assigned to tumors displaying methylation over the promoter region of two out of the three surrogate markers
AC C
(TTBK1, ZNF132 and KCNA3). The Epi-Basal subtype was then assigned to tumors negative for the Epi-LumB phenotype while positive for methylation of either TENC1 or ZNF671. In support for the validity of this classification system, we find that LumB tumors are enriched within the Epi-LumB subtype while basal-like tumors are enriched within the Epi-Basal subtype (Table 2). The Epi-LumB and Epi-Basal subtypes, defined according to this proxy-based classification system, were both found to be significantly associated with greater tumour size and poorly differentiated phenotypes (Table 3). Importantly, the results revealed significantly shorter survival times for patients that develop Epi-LumB subtype
16
ACCEPTED MANUSCRIPT
breast tumors after adjustment for tumour size, the presence of lymph node metastases along with age- and year at diagnosis (Hazards-ratio=1.83; P=0.035) (Figure 6A). 3.7. Improved identification of highly aggressive breast cancers by combining methylation- and expression based subtype definitions
RI PT
We find that breast cancer-specific survival times in LumB breast cancer patients do not differ depending on whether or not the tumors are positive for the Epi-LumB phenotype (data not shown). Similarly, survival of patients with basal-like breast cancers does not
SC
differ depending on whether or not the tumors are positive for the Epi-Basal phenotype (data not shown). These results indicate that the prognostic value associated with
M AN U
methylation- and expression-based breast cancer subtype definitions do not differ significantly – although we note that the number of patients in the independent cohort with available information on both definitions entails limited statistical power.
TE D
Based on these results, we addressed the question of whether methylation markers can improve the prognostic value associated with standard expression-based definition of breast cancer subtypes. This was carried out using multivariate Cox proportional hazards
EP
regression analyses in estrogen positive and negative disease separately to then test whether the addition of the Epi-LumB and Epi-Basal subtype definitions, respectively,
AC C
would provide additional information on disease prognosis beyond that provided by expression-based markers alone. Using this approach, we first focused exclusively on estrogen-receptor (ER) positive disease and show that ER+ tumors displaying either the LumB expression-based phenotype or positivity for the Epi-LumB markers (grouped together as “high-risk luminal”) have approximately 5-fold increased risk for breast cancerspecific death (Hazards ratio=4.63; P=0.035) and, importantly, this effect was independent of the LumB expression-based phenotype alone in a model allowing adjustment for ageand year at diagnosis (Figure 6B). In estrogen-receptor negative disease, the low number
17
ACCEPTED MANUSCRIPT
of ER- tumors did not reveal formal statistical significance (Figure 6C). However, we note that a trend can be observed wherein ER- tumors that are positive for either the basal-like expression markers (CK5/6 or EGFR positivity) or the Epi-Basal markers (grouped together as “high-risk non-luminal“) are marginally associated with approximately 10-fold
RI PT
increased risk for breast cancer-specific death independently of the expression-based definition for basal-like cancers alone in a model with adjustment for age- and year at
AC C
EP
TE D
M AN U
SC
diagnosis (Hazards ratio=9.79; P=0.079) (Figure 6C, lower panel).
18
ACCEPTED MANUSCRIPT
4. Discussion
In this report we identify DNA methylation signatures that are associated with biologically distinct breast cancer subtypes. We show that the signatures identified are characterized not only by differences in the catalogue of genes affected but also in the
RI PT
CpG context, i.e. in the DNA sequences that tend to undergo changes in methylation states. This involves a significantly higher propensity for CpG island methylation at promoter sequences in breast tumors of the Luminal-B subtype (LumB), whereas gene
SC
body hypomethylation events characterize those of the Basal-like subtype. This suggests that the mechanisms predominantly responsible for changes in the landscape of breast
M AN U
cancer methylomes differ between biologically distinct cancer subtypes. Based on these results, we explored the clinical relevance of defining novel DNA methylation-based subtypes according to the hallmark features associated with each of the two identified signatures. To address this, we developed locus-specific assays for selected proxy
TE D
markers to identify 1) the signature of LumB-associated CpG island promoter methylation events defining the Epi-LumB subtype, and 2) the signature of Basal-like associated gene body hypomethylation events defining the Epi-Basal subtype. The analysis of these
EP
selected proxy markers in a larger independent cohort of patients reveals rapid disease
AC C
progression in the context of poorly differentiated phenotypes in tumors classified as either Epi-LumB or Epi-Basal. The clinical importance of this finding relates to the identification of patients that will require adjuvant treatment options and close follow-up. The failure to identify patients with aggressive disease can result in sub-optimal outcomes and, possibly, patient death that could otherwise have been avoided or substantially delayed. In turn, it is well established that a considerable fraction of patients are being unnecessarily treated with cytotoxic drugs. The novel DNA methylation-defined subtypes described here could therefore be helpful as prognostic parameters for making decision regarding the treatment of breast cancer patients.
19
ACCEPTED MANUSCRIPT
It has previously been shown that CpG island promoter methylation events in cancers tend to affect genes regulated by the Polycomb repressor group complex 2 (PRC2) marked for repression by histone modifications involving the H3K27Me3 mark in embryonic stem cells (Viré et al 2006). Our results support this finding and demonstrate
RI PT
that this property characterizes the majority of breast cancers classified as LumB. In contrast, we find that the tendency to display CpG island promoter methylation events does not specifically characterize Basal-like tumors. These tumors, however, are
SC
preferentially characterized by gene body hypomethylation events suggesting that a fundamentally different mechanism is effectively more responsible for driving epigenetic
M AN U
changes in breast cancers of the Basal-like subtype. The gene bodies of actively transcribed genes are generally found heavily methylated (Jones PA, 2012). The functional relevance of gene body hypomethylation events is still unclear, however, a recent paper describes hypomethylated gene bodies as a characteristic of repressed genes in a cancer cell line model (Hon et al 2012). Further support for a causative relation
TE D
between hypomethylation of gene bodies and loss of expression has been reported by other research groups (Yang et al 2014; Varley et al 2013). Thus, loss of gene body
EP
methylation described here as a hallmark feature of Basal-like tumors could represent a mechanism of gene silencing.
AC C
By using data on DNA copy number changes, we identified genes of potential interest as tumour suppressors in the development of Epi-LumB and Epi-Basal tumors, including DZIP1 and COL4A2 (Epi-LumB) and TENC1 (Epi-Basal). In fact, CpG promoter methylation of the TENC1 gene ranked among the top scoring genes as a proxy marker for the Epi-Basal subtype. The TENC1 gene (officially named tensin like C1 domain containing phosphatase tensin 2) has not previously been identified as a tumour suppressor gene although it has been shown to physically interact with DLC1 gene products (a well-established tumour suppressor gene in liver cancer) through which it
20
ACCEPTED MANUSCRIPT
contributes to growth suppression (Chan et al 2009). Likewise, none of the Epi-LumB specific markers, i.e. ZNF132, TTBK1 and KCNA3 have been described as tumour suppressors and it is not clear at this point whether epigenetic inactivation of these genes represent driver events that contribute to cancer development. Indeed, the possibility
RI PT
remains that CpG island promoter methylation of these genes may simply reflect the outcome of processes taking place on a genome-wide scale. As an example, it has been suggested that loss of protective barriers guarding CpG islands against cytosine
SC
methylation are responsible for CpG island promoter methylation in cancer; in which case PRC2-regulated genes can simply be seen as more vulnerable than other genes for CpG
M AN U
island promoter methylation as they are already targeted for repression in normal cells. In this scenario, the vast majority of CpG island promoter methylated genes represent passenger events with only a few true cancer driving events found in each case; in this context, DZIP1, COL4A2, L3MBTL4, IRX1 and RASSF10 are named as likely candidates. Nonetheless, CpG island promoter methylation of the three Epi-LumB proxy markers
TE D
identified here (ZNF132, TTBK1 and KCNA3) are more consistently observed across all tumors of this subtype and thus more appropriate as proxies for identifying tumors of this
EP
subtype regardless of whether they represent true cancer driving events or not. The causative factors implicated in the methylator phenotype are currently
AC C
emerging with exciting advances taking place in gliomas following the identification of acquired mutations in the IDH1 gene (Noushmehr et al 2010). IDH1 mutations were found to be correlated with high frequency of CpG island methylation events in gliomas and subsequently shown to be a causative factor in this regard (Noushmehr et al 2010; Turcan et al 2012). More recently, the value of IDH1 inhibitors has now been established in preclinical models as a possible anti-cancer drug specifically active in IDH1 mutated cancer cells leading to delayed growth and the induction of differentiation (Rohle et al 2013). However, the IDH1 gene is rarely mutated in cancers other than gliomas and is therefore
21
ACCEPTED MANUSCRIPT
not a likely candidate as a causative factor for the CpG methylator phenotype in breast cancers (Kandoth et al 2013). Nevertheless, recent efforts aimed at cancer genome sequencing have revealed recurrent mutations in epigenetic genes, i.e. genes involved in regulating chromatin dynamics and the processing of epigenetic marks (Stephens et al
RI PT
2012; Cancer Genome Atlas Network 2012). Examples include mutations in the ARID1A gene occurring in a subset of ovarian cancers (Jones et al 2010) and breast cancers (Mamo et al 2012) along with mutations in the MLL2, MLL3, SMARCD1, NCOR1 and
SC
ARID1B genes found in a subset of breast cancers (Stephens et al 2012; Cancer Genome Atlas Network 2012, Kandoth et al 2013). The distinct DNA methylation signatures we
M AN U
describe can be a great complement to the recently described genetic and expression (Stephens et al 2012, Curtis et al 2012, Dvinge et al 2013) profiles to translate genomic
AC C
EP
TE D
analyses to clinical applications in breast cancer.
22
ACCEPTED MANUSCRIPT
5. Conclusions
Breast cancer is a heterogeneous disease with different subtypes showing distinct biological and clinical features. An important problem in breast cancer treatment is the definition of patient subsets that will require aggressive treatment options and close follow-
RI PT
up after treatment. By studying the DNA methylation landscape of breast cancers, we have discovered signatures associated with two biologically distinct and aggressive breast cancer subtypes known as Basal-like and Luminal-B. The identified signatures underlie the
SC
definition of two novel DNA methylation-based subtypes referred to here as Epi-Basal and Epi-LumB, respectively. We show that the DNA methylation defined subtypes hold
M AN U
prognostic value that could provide helpful information beyond that of other clinical parameters for making decisions on whether or not cytotoxic therapy will be necessary. In this context, we show that only a few selected proxy markers are sufficient to classify tumors according to DNA methylation-based subtypes using a cost-effective method and
AC C
EP
TE D
therefore easily incorporated into the clinic.
23
ACCEPTED MANUSCRIPT
References
1. Bediaga N.G., Acha-Sagredo A., Guerra I., Viguri A., Albaina C., Ruiz Diaz I., Rezola R., Alberdi M.J., Dopazo J., Montaner D., Renobales M., Fernández A.F., Field J.K., Fraga M.F., Liloglou T., de Pancorbo M.M. 2010. DNA methylation
RI PT
epigenotypes in breast cancer molecular subtypes. Breast Cancer Res. 12(5), R77. 2. Cancer Genome Atlas Network. 2012. Comprehensive molecular portraits of human
SC
breast tumours. Nature. 490(7418), 61-70.
3. Chan LK, Ko FC, Ng IO, Yam JW. 2009. Deleted in liver cancer 1 (DLC1) utilizes a
M AN U
novel binding site for Tensin2 PTB domain interaction and is required for tumorsuppressive function. PLoS One. 4(5):e5572.
4. Chin K., DeVries S., Fridlyand J., Spellman P.T., Roydasgupta R., Kuo W.L., Lapuk A., Neve R.M., Qian Z., Ryder T., Chen F., Feiler H., Tokuyasu T., Kingsley C.,
TE D
Dairkee S., Meng Z., Chew K., Pinkel D., Jain A., Ljung B.M., Esserman L., Albertson D.G., Waldman F.M., Gray J.W. 2006. Genomic and transcriptional
EP
aberrations linked to breast cancer pathophysiologies. Cancer Cell. 10(6), 529-41. 5. Curtis C., Shah S.P., Chin S.F., Turashvili G., Rueda O.M., Dunning M.J., Speed
AC C
D., Lynch A.G., Samarajiwa S., Yuan Y., Gräf S., Ha G., Haffari G., Bashashati A., Russell R., McKinney S; METABRIC Group., Langerød A., Green A., Provenzano E., Wishart G., Pinder S., Watson P., Markowetz F., Murphy L., Ellis I., Purushotham A., Børresen-Dale A.L., Brenton J.D., Tavaré S., Caldas C., Aparicio S. 2012. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 486(7403), 346-52. 6. Dedeurwaerder S., Desmedt C., Calonne E., Singhal SK., Haibe-Kains B., Defrance M., Michiels S., Volkmar M., Deplus R., Luciani J., Lallemand F., Larsimont D.,
24
ACCEPTED MANUSCRIPT
Toussaint J., Haussy S., Rothé F., Rouas G., Metzger O., Majjaj S., Saini K., Putmans P., Hames G., van Baren N., Coulie PG., Piccart M., Sotiriou C., Fuks F. 2011. DNA methylation profiling reveals a predominant immune component in breast cancers. EMBO Mol Med. 3(12), 726-741.
RI PT
7. Diaz-Cruz ES, Cabrera MC, Nakles R, Rutstein BH, Furth PA. 2010. BRCA1 deficient mouse models to study pathogenesis and therapy of triple negative breast cancer. Breast Dis. 32(1-2), 85-97.
SC
8. Dvinge H., Git A., Gräf S., Salmon-Divon M., Curtis C., Sottoriva A., Zhao Y., Hirst M., Armisen J., Miska E.A., Chin S.F., Provenzano E., Turashvili G., Green A., Ellis
M AN U
I., Aparicio S., Caldas C. 2013. The shaping and functional consequences of the microRNA landscape in breast cancer. Nature. 497(7449), 378-82. 9. Fang F., Turcan S., Rimner A., Kaufman A., Giri D., Morris L.G., Shen R., Seshan
TE D
V., Mo Q., Heguy A., Baylin S.B., Ahuja N., Viale A., Massague J., Norton L., Vahdat L.T., Moynahan M.E., Chan T.A. 2011. Breast cancer methylomes establish an epigenomic foundation for metastasis. Sci Transl Med. 3(75), 75ra25.
EP
10. Heyn H., Esteller M. 2012. DNA methylation profiling in the clinic: applications and
AC C
challenges. Nat Rev Genet. 13(10), 679-692. 11. Holm K., Hegardt C., Staaf J., Vallon-Christersson J., Jönsson G., Olsson H., Borg A., Ringnér M. 2010. Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Res. 12(3), R36. 12. Hon G.C., Hawkins R.D., Caballero OL., Lo C., Lister R., Pelizzola M., Valsesia A., Ye Z., Kuan S., Edsall L.E., Camargo A.A., Stevenson B.J., Ecker J.R., Bafna V., Strausberg R.L., Simpson A.J., Ren B. 2012. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer.
25
ACCEPTED MANUSCRIPT
Genome Res. 22(2), 246-258.
13. Hortobagyi G.N., de la Garza Salazar J., Pritchard K., Amadori D., Haidinger R., Hudis C.A., Khaled H., Liu M.C., Martin M., Namer M., O'Shaughnessy J.A., Shen Z.Z., Albain K.S.; ABREAST Investigators. 2005. The global breast cancer burden:
RI PT
variations in epidemiology and survival. Clin Breast Cancer. 6(5), 391-401. 14. Jones PA. 2012. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 13(7):484-92.
SC
15. Jones S., Wang T.L., Shih IeM., Mao T.L., Nakayama K., Roden R., Glas R., Slamon D., Diaz L.A Jr., Vogelstein B., Kinzler K.W., Velculescu V.E.,
M AN U
Papadopoulos N. 2010. Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma. Science. 330(6001), 228-231. 16. Kandoth C., McLellan M.D., Vandin F., Ye K., Niu B., Lu C., Xie M., Zhang Q., McMichael J.F., Wyczalkowski M.A., Leiserson M.D., Miller C.A., Welch J.S., Walter
TE D
M.J., Wendl M.C., Ley T.J., Wilson R.K., Raphael B.J., Ding L. 2013. Mutational landscape and significance across 12 major cancer types. Nature. 17; 502(7471),
EP
333-9.
17. López-Tarruella S., Martín M. 2009. Recent advances in systemic therapy.
AC C
Advances in adjuvant systemic chemotherapy of early breast cancer. Breast Cancer Res. 11, 204.
18. Mamo A., Cavallone L., Tuzmen S., Chabot C., Ferrario C., Hassan S., Edgren H., Kallioniemi O., Aleynikova O., Przybytkowski E., Malcolm K., Mousses S., Tonin PN., Basik M. 2012. An integrated genomic approach identifies ARID1A as a candidate tumor-suppressor gene in breast cancer. Oncogene. 31(16), 2090-2100. 19. Noushmehr H., Weisenberger D.J., Diefes K., Phillips H.S., Pujara K., Berman B.P., Pan F., Pelloski C.E., Sulman E.P., Bhat K.P., Verhaak R.G., Hoadley K.A., Hayes
26
ACCEPTED MANUSCRIPT
D.N., Perou C.M., Schmidt H.K., Ding L., Wilson R.K., Van Den Berg D., Shen H., Bengtsson H., Neuvial P., Cope L.M., Buckley J., Herman J.G., Baylin S.B., Laird P.W., Aldape K; Cancer Genome Atlas Research Network. 2010. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer
RI PT
Cell. 17(5), 510-22. 20. Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, Quackenbush JF, Stijleman IJ, Palazzo J, Marron JS, Nobel AB,
SC
Mardis E, Nielsen TO, Ellis MJ, Perou CM, Bernard PS. 2009. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol.; 27(8):1160-7.
M AN U
21. Perou C.M., Børresen-Dale A.L. 2011. Systems biology and genomics of breast cancer. 2011. Cold Spring Harb Perspect Biol. 3(2).
22. Rohle D., Popovici-Muller J., Palaskas N., Turcan S., Grommes C., Campos C.,
TE D
Tsoi J., Clark O., Oldrini B., Komisopoulou E., Kunii K., Pedraza A., Schalm S., Silverman L., Miller A., Wang F., Yang H., Chen Y., Kernytsky A., Rosenblum M.K., Liu W., Biller S.A., Su S.M., Brennan C.W., Chan T.A., Graeber T.G., Yen K.E.,
EP
Mellinghoff I.K. 2013. An inhibitor of mutant IDH1 delays growth and promotes differentiation of glioma cells. Science. 340(6132), 626-30.
AC C
23. Saal L.H., Gruvberger-Saal S.K., Persson C., Lövgren K., Jumppanen M., Staaf J., Jönsson G., Pires M.M., Maurer M., Holm K., Koujak S., Subramaniyam S., VallonChristersson J., Olsson H., Su T., Memeo L., Ludwig T., Ethier S.P., Krogh M., Szabolcs M., Murty V.V., Isola J., Hibshoosh H., Parsons R., Borg A. 2008. Recurrent gross mutations of the PTEN tumor suppressor gene in breast cancers with deficient DSB repair. Nat Genet. 40(1), 102-7. 24. Sigurdardottir L.G., Jonasson J.G., Stefansdottir S., Jonsdottir A., Olafsdottir G.H., Olafsdottir E.J., Tryggvadottir L.2012. Data quality at the Icelandic Cancer Registry:
27
ACCEPTED MANUSCRIPT
comparability, validity, timeliness and completeness. Acta Oncol. 51(7), 880-889. 25. Sorlie T., Tibshirani R., Parker J., Hastie T., Marron J.S., Nobel A., Deng S., Johnsen H., Pesich R., Geisler S., Demeter J., Perou C.M., Lønning P.E., Brown P.O., Børresen-Dale A.L., Botstein D. 2003. Repeated observation of breast tumor
RI PT
subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A. 100(14), 8418-23.
26. Stefansson O.A., Esteller M. 2013. Epigenetic modifications in breast cancer and
SC
their role in personalized medicine. Am J Pathol. 183(4), 1052-63.
M AN U
27. Stephens P.J., Tarpey P.S., Davies H., Van Loo P., Greenman C., Wedge D.C., Nik-Zainal S., Martin S., Varela I., Bignell G.R., Yates L.R., Papaemmanuil E., Beare D., Butler A., Cheverton A., Gamble J., Hinton J., Jia M., Jayakumar A., Jones D., Latimer C., Lau K.W., McLaren S., McBride D.J., Menzies A., Mudie L.,
TE D
Raine K., Rad R., Chapman M.S., Teague J., Easton D., Langerød A.; Oslo Breast Cancer Consortium (OSBREAC), Lee M.T., Shen C.Y., Tee B.T., Huimin B.W., Broeks A., Vargas A.C., Turashvili G., Martens J., Fatima A., Miron P., Chin S.F.,
EP
Thomas G., Boyault S., Mariani O., Lakhani S.R., van de Vijver M., van 't Veer L., Foekens J., Desmedt C., Sotiriou C., Tutt A., Caldas C., Reis-Filho J.S., Aparicio
AC C
S.A., Salomon A.V., Børresen-Dale A.L., Richardson A.L., Campbell P.J., Futreal P.A., Stratton M.R. 2012. The landscape of cancer genes and mutational processes in breast cancer. Nature. 486(7403), 400-4. 28. Tibshirani R, Hastie T, Narasimhan B, Chu G. 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression.Proc Natl Acad Sci U S A. 99(10), 6567-72. 29. Turcan S., Rohle D., Goenka A., Walsh L.A., Fang F., Yilmaz E., Campos C., Fabius A.W., Lu C., Ward P.S., Thompson C.B., Kaufman A., Guryanova O., Levine
28
ACCEPTED MANUSCRIPT
R., Heguy A., Viale A., Morris L.G., Huse J.T., Mellinghoff I.K., Chan T.A. 2012. IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype. Nature. 483(7390), 479-83. 30. Varley KE1, Gertz J, Bowling KM, Parker SL, Reddy TE, Pauli-Behn F, Cross MK,
RI PT
Williams BA, Stamatoyannopoulos JA, Crawford GE, Absher DM, Wold BJ, Myers RM. 2013. Dynamic DNA methylation across diverse human cell lines and tissues. Genome Res. 23(3):555-67.
SC
31. Viré E., Brenner C., Deplus R., Blanchon L., Fraga M., Didelot C., Morey L., Van Eynde A., Bernard D., Vanderwinden JM., Bollen M., Esteller M., Di Croce L., de
M AN U
Launoit Y., Fuks F. 2006. The Polycomb group protein EZH2 directly controls DNA methylation. Nature. 439(7078), 871-874.
32. Yang X, Han H, De Carvalho DD, Lay FD, Jones PA, Liang G. 2014. 2014. Gene
TE D
Body Methylation Can Alter Gene Expression and Is a Therapeutic Target in Cancer. Cancer Cell. pii: S1535-6108(14)00316-X. 33. Zhao M, Sun J, Zhao Z. TSGene: a web resource for tumor suppressor genes.
AC C
EP
Nucleic Acids Res. 2013; 41(Database issue): D970-6.
29
ACCEPTED MANUSCRIPT
Figure Legends
Figure 1: DNA methylation changes in breast tumors are non-random and define patterns correlated with clinically and biologically relevant subtypes. A) Cluster analysis of differentially methylated CpGs between breast cancers and normal breast tissue (the top
RI PT
5000 most significant CpGs). Tumor characteristics (by columns on top of the heat-map) in terms of breast cancer subtype along with the presence of acquired mutations in the TP53 geneare displayed together with the CpG context (by rows on the left-hand side) according
SC
to the color scheme shown at the right-hand side and bottom of the figure, respectively. The statistically significant tumor patterns/clusters (identified by the pvclust method in R)
M AN U
are shown as colored bars immediately below the dendrogram. B) The top 10 significant CpG´s specifically characterizing each of the four “core” subtypes are shown, i.e. the LumA, LumB, HER2 and Basal-like subtypes. Note, 5NP (i.e. unclassified tumors due to negativity for all five phenotypic markers, i.e. ER, PR, HER2, CK5/6 and EGFR) and
TE D
breast tumors with unknown subtype information are grouped together as 5NP/NA and were not included in this analysis. The normal breast tissue samples are shown and indicated in black on top of the heat-map. Note, the heat-map colors reflect beta-values
EP
representing the degree of methylation from low to high as green to red, respectively (wherein black represents heterogenous/hemi-methylation), as shown on the scale at the
AC C
top-right hand side of the figure.
Figure 2: Validation of CpG methylation patterns associated with breast cancer subtypes. A) The subtype-specific CpG methylation changes identified in relation to each of the four breast cancer subtypes (LumA, LumB, HER2 and Basal-like) were validated in an independent cohort obtained through the Cancer Genome Atlas. The overlap, i.e. the number of CpG´s consistently associated with each of the subtypes in both the TCGA and
30
ACCEPTED MANUSCRIPT
PEBC cohorts, is indicated by an arrow. B) The validated set of 254 LumB and 202 Basallike specific CpG’s shown in both cohorts, i.e. the PEBC cohort (left) and the TCGA cohort (right). The expression-based subtypes are shown on top of the heat-maps according to the color scheme displayed at the bottom of the figure. Note the heterogeneity among
RI PT
LumB breast tumors, i.e. although most LumB tumors appear to robustly display the LumB methylation pattern there are still those that do not and, furthermore, some of the tumors classified as either LumA or HER2 by expression analysis appear to display the LumB-
SC
associated CpG methylation pattern. The CpG methylation states identified as methylated or hypomethylated (relative to normal breast tissue samples) are indicated on the right
M AN U
hand side of the heat-maps. The heat-map colors reflect beta-values representing the degree of methylation from low to high as green to red, respectively (wherein black represents heterogenous/hemi-methylation), as shown on the scale at the bottom-right
TE D
hand side of the figure.
Figure 3: CpG sequence characteristics of the identified subtype-specific methylation changes. A) The validated DNA methylation signatures specific for LumB (254 CpG´s) and
EP
B) Basal-like (202 CpG´s) breast cancers differ significantly with respect to the sequence
AC C
context in which CpG methylation changes tend to occur. The P-values corresponding to a chi-squared contingency tests indicate “hallmark” features statistically significant for each of the two subtypes involving promoter methylation events for the LumB subtype (χ2=14.7; P-value=0.002) and gene body hypomethylation for the Basal-like subtype (χ2=9.8; Pvalue=0.02). The percentages are computed over all subtype-specific CpG´s wherein the total number was 254 CpG´s for the LumB subtype and 202 CpG´s for the Basal-like subtype (with the number of CpG´s given for each count; on the top of each bar). The categories analysed include 1) gene promoter regions, 2) -1500bp TSS indicative of CpG sites located between 1500bp and 200bp upstream of transcription start sites (TSS); thus
31
ACCEPTED MANUSCRIPT
representing potential cis regulatory regions without involving the promoter region and 3) the gene body sites representing sequences located within the gene body. Lastly, the “other” category predominantly represents CpG´s found in intergenic regions, i.e. those found outside of the cis regulatory regions (i.e. those not included in the categories of
RI PT
either promoter or -1500bp CpG´s) while also not included in the gene body category. C) Promoters displaying methylation in association with either the Basal-like or LumB subtypes analysed in terms of CpG islands, CpG shores or CpG poor promoter regions.
SC
The percentages are computed for each subtype separately (i.e. Basal-like and LumB) based on the total number of promoter methylation events specific for each of the two
M AN U
subtypes, i.e. a total of 19 CpG´s for the Basal-like subtype and 137 CpG´s for the LumB subtype. D) The genes affected by promoter methylation within the validated subtypespecific signatures analyzed in terms of whether or not they have been identified as targets of the Polycomb group repressor complex 2 (PRC2). The P-values were derived from Fisher’s exact hypothesis testing and the threshold line indicated (grey line) reflects
EP
testing.
TE D
the Padjusted< 0.01 significance level after Bonferroni adjustment for multiple hypothesis
AC C
Figure 4: The definition of DNA methylation-based subtypes in breast tumors. A) Crossvalidated probability values derived from a well-established pattern recognition algorithm (PAMr implemented in R) indicating how robustly each tumour displays the validated signature of LumB-associated CpG island promoter methylation events (based on the validated catalogue of LumB associated CpG island promoter methylation events, i.e. the 129 CpG´s indicated in Figure 3C). The barplots show the probability values on the y-axis for each tumour ordered on the x-axis from left to right according to the probability values from low to high, respectively, for each of the two cohorts i.e. the PEBC cohort (barplot on the left) and the TCGA cohort (barplot on the right). Tumors scoring positive for this
32
ACCEPTED MANUSCRIPT
signature have a cross-validated probability> 0.50 as indicated by the dashed line over each of the two barplots. The color of each bar (representing tumors) is indicative of the expression-based subtype as given at the bottom of the figure. B) The cross-validated probability values derived from PAMr indicating how robustly each tumour displays the
RI PT
Basal-like associated gene body hypomethylation signature (based on the 53 CpG´s as indicated in the lower panel of Figure 3B) shown for both the PEBC (left) and TCGA cohorts (right). C) DNA methylation data over the validated catalogue of 129 “hallmark”
SC
CpG´s characteristic of LumB tumors (i.e. those identified within CpG island promoters in association with the LumB subtype consistently in both the PEBC and TCGA cohorts)
M AN U
shown with respect to the novel Epi-LumB subtype. The expression-based subtypes are shown according to the color scheme displayed at the bottom of the figure. D) Similarly, the DNA methylation data over the validated catalogue of 53 “hallmark” CpG´s characteristic of Basal-like tumors (i.e. gene body CpG´s consistently associated with
novel Epi-Basal subtype.
TE D
Basal-like tumors in both the PEBC and TCGA cohorts) are shown with respect to the
EP
Figure 5: Patterns of DNA copy number changes associated with Epi-LumB and Epi-Basal
AC C
tumors revealing divergent tumor evolutionary paths and candidate tumor suppressor genes. A) Frequencies of DNA copy number gains and losses in Epi-LumB tumors are displayed as proportions on the y-axis (positive and negative signs represent gains and losses, respectively) according to genomic position along the x-axis from p-arm to q-arm (left to right, respectively) for all 23 chromosomes. The blue and red lines are frequencies derived from the PEBC and TCGA cohorts, respectively. The bars (black) on top and below the frequency plot indicate positions wherein copy number gains and losses, respectively, were found to be associated with Epi-LumB tumors (comparing Epi-LumB against all other tumors). Only regions wherein the statistical significance achieved less
33
ACCEPTED MANUSCRIPT
than 5% FDR in both the PEBC and TCGA cohorts are shown. A subset of the genes found within significant regions of copy number loss were also found to be promoter methylated in association with the Epi-LumB subtype. The identity of these candidate tumour suppressor genes are given below the frequency plot. B) Event frequency plot for
RI PT
Epi-Basal tumors (upper), with the corresponding significance analysis (black bars above and below the frequency plot indicative of significant copy number gains and losses, respectivley). Epi-Basal associated promoter methylation events (gene symbols) found
SC
within significant regions of copy number losses are shown.
M AN U
Figure 6: Breast cancer-specific survival analysis with respect to the novel DNA methylation defined subtypes, i.e. the Epi-LumB and Epi-Basal subtypes. A) Patients with breast tumors classified as either Epi-LumB or Epi-Basal on the basis of proxy CpG methylation markers (see section 3.6. for details) are associated with reduced time to
TE D
death due to breast cancer (i.e. breast cancer-specific death). The lower panel displays a multivariate Cox´s proportional hazards modelling of the survival data wherein the EpiLumB subtype was found to be an independent prognostic factor after adjustment for
EP
tumour size, lymph node metastases and age- and year at diagnosis. B) In analyzing a
AC C
subset of the tumors wherein information on hormone receptor status (along with subtypespecific markers) was available, we show that ER positive tumors classified as either EpiLumB (based on proxy-methylation markers) or LumB (based on the Ki-67 expression marker) represents an improvement above the LumB definition alone for predicting reduced time to breast cancer-specific death. The lower panel shows the multivariate Cox´s proportional hazards regression model for ER positive tumors classified as either Epi-LumB (based on the proxy-methylation markers) or LumB (based on high Ki-67 expression) referred to as “high-risk luminal tumors”, demonstrating the superior prognostic value in combining epigenetic and expression data above that of expression
34
ACCEPTED MANUSCRIPT
data alone. C) The Epi-Basal subtype combined with the basal-like expression-based definition (using EGFR and CK5/6) was not significantly associated with reduced time to breast cancer specific deaths in ER negative breast cancers. However, as shown in the lower panel, the combination of epigenetic and expression data provides superior
RI PT
prognostic value of marginal statistical significance above that of expression data alone. Here, ER negative tumors classified as either Epi-Basal (on the basis of proxy-methylation markers) or basal-like (based on expression of either EGFR or CK5/6), referred to as
SC
“high-risk non-luminal tumors” remain marginally significant for predicting breast cancer-
AC C
EP
TE D
M AN U
specific deaths while the basal-like definition alone does not hold significant.
35
ACCEPTED MANUSCRIPT Table 1: Epi-LumB specific CpG methylation events (significant at the <1% FDR in both the PEBC and TCGA cohorts) were found to affect a subset of previously known tumor suppressor genes (based on the catalouge of 716 TSG´s). The statistics shown describe the relation between CpG methylation and expression over Epi-LumB associated TSG´s in the TCGA cohort where data was available on both CpG methylation and expression (by RNA sequencing) for 731 tumours and 82 normal breast tissue samples. 2 Statistically significant hits are shown (including a miminum threshold of R > 0.10 and fold change > 2).
L3MBTL4 L3MBTL4 L3MBTL4 L3MBTL4 L3MBTL4 L3MBTL4 ID4 IRX1 L3MBTL4 ID4 PTCH2 RASSF10 IRX1
Fold change in expression (Unmethylated / Methylated)
P-value (adjusted)
0.264 0.259 0.255 0.253 0.245 0.241 0.238 0.191 0.177 0.147 0.116 0.109 0.106
2.667 3.004 2.621 2.902 2.652 2.710 2.850 6.137 2.625 2.676 2.028 9.006 5.251
4.96E-55 1.12E-53 7.90E-53 2.17E-52 2.02E-50 1.43E-49 8.89E-49 2.08E-38 1.99E-35 5.32E-29 7.70E-23 2.33E-21 8.22E-21
RI PT
cg14352983 cg08336641 cg14155416 cg12924825 cg18556788 cg17688525 cg03715143 cg09232937 cg05724871 cg14271531 cg21167628 cg20918243 cg10530883
R2
SC
Gene symbol
AC C
EP
TE D
M AN U
TargetID
ACCEPTED MANUSCRIPT Table 2: DNA methylation defined subtypes in an independent cohort validating the relation to the classification of breast cancers according to expression-based subtypes*. HER2 2 (8%) 1 (4%) 0
LumA 7 (29%) 6 (24%) 25 (62%)
LumB 11 (46%) 7 (28%) 14 (35%)
5NP** 2 (8%) 1 (4%) 1 (3%)
Total 24 (100%) 25 (100%) 40 (100%)
RI PT
EpiLumB EpiBasal Other
Basal-like 2 (8%) 10 (40%) 0
2
X =31.0; P =0.00014
* Information on expression-based subtype classification was available in 89 of the tumours included in the validation cohort.
AC C
EP
TE D
M AN U
SC
* * 5NP represents unclassified tumours due to negativity for the five phenoptypic markers ER, PR, HER2, CK5/6 and EGFR
ACCEPTED MANUSCRIPT Table 3: The clinical relevance of Epi-LumB and Epi-Basal tumours, defined according to selected proxy markers, analysed with respect to parameters of clinical staging (tumour size and nodal metastasis status) and degree of differentiation (histological grade). T1a-c
T2 - T3
Total
EpiLumB
18 (26%)
52 (74%)
68 (100%)
EpiBasal
17 (30%)
40 (70%)
56 (100%)
Other
46 (52%)
42 (48%)
RI PT
Tumour size
88 (100%)
2
X =13.7; P =0.0010
Positive
EpiLumB
23 (35%)
43 (65%)
64 (100%)
EpiBasal
19 (36%)
34 (64%)
52 (100%)
Other
37 (49%)
38 (51%)
75 (100%)
Histological grading
1+/2+ 12 (32%)
EpiBasal
11 (32%)
Other
47 (78%)
AC C
EP
TE D
EpiLumB
SC
Negative
M AN U
Nodal metastases
3+
Total
X 2=3.8; P = 0.15 Total
25 (67%)
35 (100%)
23 (68%)
33 (100%)
13 (22%)
54 (100%) X 2=27.6; P < 0.0001
ACCEPTED MANUSCRIPT
B
Cluster 3
**TP53 mutations Mutated
Not mutated
TE D
Not determined
***Subtype
HER2
Luminal-B
Luminal-A
CpG island/shore Non-CpG island/shore
AC C
CpG contex
CpG contex
EP
Basal-like
Normal breast tissue n = 17
5NP Unclassified
Normal breast tissue
Lum-A specific markers
Cluster 2
M AN U
Cluster 1
Lum-B HER2 Basal-like tumors tumors tumors
Lum-B specific markers
Subtype***
Lum-A tumors
100%
Top 10 significant CpG´s for each breast cancer subtype
HER2 specific markers
*Clusters (significance anlaysis)
0%
Basal-like specific markers
Clusters* TP53 mutations**
SC
RI PT
Methylation (%)
Normal breast tissue n = 17
5NP/NA tumors
ACCEPTED MANUSCRIPT
PEBC cohort
LumB specific CpG´s
PEBC
TCGA
6004
975
LumA specific CpG´s
EP
12
Hypo
PEBC
51
Hypo
TCGA
TE D
1 CpG
Meth
8
Meth
1212
254 CpG´s
202 Basal-like specific CpG´s
PEBC
Hypo
TCGA
Hypo
HER2 specific CpG´s
Meth
202 CpG´s
M AN U
430
Meth
254 LumB specific CpG´s
PEBC
8779
SC
TCGA
TCGA cohort
254 LumB specific CpG´s
B
Basal-like specific CpG´s
202 Basal-like specific CpG´s
A
RI PT
e2
HER-2
Luminal-B
AC C
No overlap
Luminal-A
Methylation (%)
Basal-like
Normal-like
Normal breast tissue
0%
100%
ACCEPTED MANUSCRIPT
137
Other
Promoter
LumB specific CpG hypomethylation
40 30
8
2
3
2
53
8
10
11
3 5
D
Polycomb repressive complex 2 gene targets
4 3
Padj < 0.01
2 1
Promoter
0 -1500bp TSS
Promoter
-1500bp TSS
AC C
Gene body
0
Other
0
41
1
0
5
Gene body
10
20
6
20
8
40
40
EP
%
60
50
30
20
19
Basal-like specific CpG hypomethylation
50
%
13
TE D
-1500bp TSS
0 Gene body
0 Other
10
%
LumB
23
10
34
LumB
27
20
HER2
32
%
10
Basal-like
30
43
Other
20
CpG islands CpG shores CpG poor
80
Gene body
30
%
129
LumA
40
Promoter associated CpG methylation
Promoter
40
M AN U
50
-1500bp TSS
50
C
SC
Basal-like specific CpG methylation
Basal-like
B
LumB specific CpG methylation
-Log10(P-value)
A
RI PT
Figure 3
ACCEPTED MANUSCRIPT
4
1.0
1.00
0.5
RI PT
0.50 0
CV probabilities
M AN U
1.0
1.00
0.5
0.50 0
0.0
CV probabilities
PEBC cohort
TCGA cohort
EpiBasal (CV > 0.50)
Methylation (%)
HER-2
Luminal-B
Luminal-A
Basal-like
Normal-like
EpiBasal (CV > 0.50)
53 CpG´s (Basal-like signature)
D
53 CpG´s (Basal-like signature)
AC C
129 CpG´s (LumB signature)
EpiLumB (CV > 0.50)
EP
TCGA cohort
EpiLumB (CV > 0.50)
129 CpG´s (LumB signature)
PEBC cohort
212 tumours & 38 normal breast samples
TE D
40 tumours & 17 normal breast samples
C
EpiBasal (CV > 0.50)
TCGA cohort
EpiBasal (CV > 0.50)
1 .0 0 .5
1.00 0.50 0
0 .0
CV probabilities
PEBC cohort
212 tumours & 38 normal breast samples
SC
40 tumours & 17 normal breast samples
B
EpiLumB (CV > 0.50)
TCGA cohort
EpiLumB (CV > 0.50)
1 .0 0 .5 0 .0
1.00 0.50 0
CV probabilities
PEBC cohort
0.0
A
Normal breast tissue
0%
100%
ACCEPTED MANUSCRIPT
Epi-LumB tumours (Frequency Plot)
M AN U
SC
Loss (-) /Gains (+)
A
RI PT
5
PEBC cohort TCGA cohort
ZIC5 (13q32.3) COL4A2(13q34) COL4A1(13q34)
Epi-Basal tumours (Frequency Plot)
PCDHB11 (5q31.3) PCDHB14 (5q31.3) PCDHGA2 (5q31.3) PCDHGB2 (5q31.3)
AC C
PEBC cohort TCGA cohort
EP
TE D
Loss (-) /Gains (+)
B
TNFSF11 (13q14.11) PCDH8 (13q14.3) DZIP1 (13q32.1)
TENC1 (12q13.13)
RARA (17q21.2)
ACCEPTED MANUSCRIPT
Figure 6 B
2
4
6
8
10
12
4
8
TE D
High-risk Lum
Epi-LumB
LumB
Age
*** *** 0.5
1.0
2.0
5.0
Hazards ratio
* P < 0.10
** P < 0.05
0.6 0.4
14
**
0
2
4
Year
Age
Age
Tumour size
Tumour size
LN
LN
2.0
5.0
20.0
Hazards ratio
*** P < 0.01
8
10
High risk non-Lum
Year
0.0
6
12
14
Time (years)
*
Basal-like
AC C
Year
LN
12
EP
**
Tumour size
10
Time (years)
Epi-Basal
0.0
6
0.8
1.0
2
0.2
SC 0
14
Time (years)
High-risk (Non-Luminal) (n=25) Other (Non-Luminal) (n=8)
0.0
0.0 0
M AN U
High-risk (Luminal) (n=42) Other (Luminal) (n=38)
0.0
Epi-LumB (n=76) Epi-Basal (n=70) Other (n=94)
Estrogen negative (Non-Luminal)
RI PT
0.8 0.6
Survival (proportion)
0.2
0.4
1.0 0.8 0.6 0.4 0.2
Survival (proportion)
C
Estrogen positive (Luminal) 1.0
Validation cohort (n = 210)
Survival (proportion)
A
***
0.0
20.0
125.0
Hazards ratio
ACCEPTED MANUSCRIPT
Highlights:
•
We describe distinct signatures associated with two aggressive breast
•
RI PT
cancer subtypes known as the luminal-B and basal-like subtypes.
The signatures identified are characterized not only by differences in the
SC
catalogue of genes affected but also in the CpG context, i.e., in the DNA
A selected set of proxy markers for each of the two distinct signatures were then analyzed in an independent cohort of breast cancer patients – revealing
EP
TE D
their clinical relevance.
AC C
•
M AN U
sequences that tend to undergo changes in methylation states.
ACCEPTED MANUSCRIPT
Necessary Additional Data Stefansson OA, Moran S, et al. A DNA methylation-based definition of biologically distinct breast cancer subtypes.
RI PT
Inventory:
Supplementary Figures 1, 2, 3, 4 and 5
M AN U
Supplementary Tables 1, 2,and 3
SC
Supplementary Methods
Supplementary Methods
TE D
Infinium HumanMethylation450 BeadChips High-quality DNA samples obtained from tumors (n=40) and normal breast tissue (n=17) were selected for bisulfite conversion (Zymo Research; EZ-96
EP
DNA Methylation™ Kit) and hybridization to Infinium HumanMethylation450 BeadChips (Illumina) following Illumina’s Infinium HD Methylation protocol.
AC C
The quality control sample standards were DNA concentration measurements by the PicoGreen method (Invitrogen) coupled with assessment of DNA purity based on A260/A280 ratios (ranging between 1·75 and 1·95) and A260/A230 ratios (ranging between 2·00 and 2·20). Analysis by 1% agarose gel permitted exclusion of samples with possible DNA fragmentation or RNA contamination. The Infinium HumanMethylation450 BeadChip provides coverage of > 450,000 CpG sites targeting nearly all RefSeq genes (> 99%).1 The chips were designed to cover coding and non-coding genes without bias against those lacking CpG islands. The design further aimed to cover not only promoter regulatory regions, but also CpGs across gene regions in order to include 5'-
ACCEPTED MANUSCRIPT untranslated regions (5´UTRs), the first exons, the gene bodies and 3´untranslated regions (3´UTRs). Approximately 96% of CpG islands were covered along with regions proximal to the CpG islands (“CpG shores”) and, more distal, CpG shelves. A total of 600 ng high-quality DNA samples were bisulfite converted. Whole-genome amplification and hybridization were then
RI PT
carried out on the BeadChip, followed by single-base extension and analysis on the HiScan SQ module (Illumina) to assess cytosine methylation states. Pyrosequencing
subtype
were
selected
as
proxies
SC
The most significant markers within the signature defining the Epi-LumB (TTBK1,
KCNA3and
ZNF132)
(Supplementary Table 3). In designing assays for the Epi-Basal subtype,
M AN U
several problems arise when detecting gene body hypomethylation, including the identification of a region that is recurrently hypomethylated, while the change from high to low levels of methylation in normal tissue compared with tumors would need to be substantial to allow accurate and sensitive detection in clinical samples. The presence of stromal cells, the normal breast epithelial tissue and lymphocytes in such samples all add to the difficulties with this type
TE D
of analysis. For these reasons, we selected CpG promoter methylation events, rather than hypomethylation, identified within the signature for the Epi-Basal subtype as potential surrogate markers of this phenotype (Supplementary
EP
Table 3). In this way, we identified CpG island promoter methylation of the ZNF671 and TENC1 genes as correlated features for the Epi-Basal phenotype (Supplementary Table 3). The PyroMark Q96 system (Qiagen) was used to
AC C
carry out pyrosequencing of bisulfite-converted DNA. The EZ-96 DNA Methylation-Gold™ Kit (Zymo Research) was used for bisulfite conversion. The pyrosequencing method enables quantification of the degree of methylation over selected CpG sites. The primer sequences used in this analysis were designed using Qiagen’s Pyromark Assay Design 2·0 software. Five markers were sequenced: CpG island promoter methylation of TTBK1, ZNF132, KCNA3, ZNF671 and TENC1. The primer sequences used were (all given 5´ > ´3): TTBK1 prePCR fwd: [Btn]ATGTTAGGGTTGGTGTTGA, TTBK1 pre-PCR
reverse:
CCCCRCCTCACCTACTCTATACC,
ATAACCCRCTCCCTCC.
ZNF132
prePCR
TTBK1
seq: fwd:
ACCEPTED MANUSCRIPT AGAGGTTTGTTGTAGTTTAATTTATTAGT,
ZNF132
[Btn]ATAAACCCCATTCCTCAATCACCTTAAC, GTTTTTATTGGTTTAAGGATTTT.
prePCR
rev:
ZNF132
ZNF671
seq:
prePCR
fwd:
GGAGTYGGAGAAAGGGTGATTGA, ZNF671 prePCR rev: [Btn]CATTTTATTTCTATCAAACTATCCCTAACC, TENC1
prePCR
[Btn]TTAGGTTGGAGTTTGGTTGGGATAT,
TENC1
CCAACTCTAAAATTATCCCACACCT, CCATAATCTTAAAATTCTACCC.
The
prePCR
KCNA3
PrePCR
was
prePCR
carried
M AN U
ACTACAAAAAATACAAACCC.
prePCR
SC
KCNA3
AACCRACTCTACTAAATAAACATTCC,
prePCR
TENC1 KCNA3
[Btn]GGTAGYGGTGAGGTTAGGT,
seq:
RI PT
GGGGAYGTAGGTATT.
ZNF671
fwd: rev: seq: fwd: rev: seq:
out
using
IMMOLASE (Bioline, USA) hot-start DNA polymerase in a touchdown PCR reaction (96°C 10 min; 45 cycles of {98°C for 30 s,
61°C annealing
temperature for 30 s w/-0.2°C each cycle, 72°C for 30 s} end of cycling; 72°C for 15 min). For quality control, at least five samples were run on agarose gels to validate the presence of a single sharp band of the expected size with no
TE D
extra background-amplified products present. Pyro Q-CpG software (Qiagen) was used to assess the degree of methylation, expressed as percentages from 0% (no methylation detected) to 100% (all sequenced CpGs detected as methylated). The definition of promoter methylation by pyrosequencing was
EP
based on the analysis of the 25 normal breast samples (derived from the normal breast adjacent to the site of tumorous tissue) with three standard
AC C
deviations above the mean of each marker (mean + 3*SD). This criteria gives the following marker-specific thresholds: KCNA3 (mean=3.54%; SD=4.68; KCNA3-threshold > 20%), ZNF132 (mean=1.99%; SD=3.17; ZNF132threshold > 10%), TTBK1 (mean=3.37; SD=1.85, TTBK1-threshold > 10%), TENC1
(mean=14.7%;
SD=4.12;
TENC1-threshold
>
30%),
ZNF671
(mean=2.0; SD=2.72; ZNF671-threshold > 10%). Cases of potential errors due to incomplete bisulfite conversion or low signal peaks, detected by the Pyro QCpG software, were discarded. The sequencing of the promoter regions of TTBK1, ZNF132 and ZNF671 was successful in 325, 287 and 279 of these tumors, respectively (performed in both discovery and validation cohorts, i.e.,
ACCEPTED MANUSCRIPT all 350 tumors in the study). DNA copy number analysis and tissue microarrays DNA copy number changes were analyzed by aCGH (array comparative genomic hybridization) using 385K oligonucleotide aCGH microarrays (Roche
RI PT
NimbleGen, Inc., Reykjavik, Iceland) as previously reported.2 This design contains 385,000 probes covering the human genome with ~1 probe for each 7000 bp. DNA was labeled using Cy3- and Cy5 for DNA derived from tumor and normal blood samples from the same patients, respectively, followed by
SC
hybridization on microarrays according to protocols developed by the manufacturer (NimbleGen Arrays User’s Guide-CGH Analysis, Roche NimbleGen, Inc., Reykjavik, Iceland). Tissue microarrays (TMAs) used in this
M AN U
study were constructed beforehand and analyzed for the expression of subtype-specific markers, i.e., ER, PR, HER-2, Ki-67, EGFR and CK5/6, by immunohistochemistry (IHC).3 Based on this analysis, tumors were assigned to breast cancer subtypes according to a validated classification scheme.4,5,6 Tumors positive for either ER or PR were classified as Luminal. High levels of expression of Ki-67 (> 14%) in breast tumors with a Luminal phenotype were
TE D
classified as Luminal-B (LumB), the remainder being classified as Luminal-A (LumA). Tumors negative for both ER and PR while positive for HER2 (IHC score 3+) were classified as HER2 subtype. Positivity for either CK5/6 or
EP
EGFR in tumors negative for both ER and PR were classified as Basal-like. Negativity for ER, PR, HER2, CK5/6 and EGFR was designated as 5NP (five
AC C
negative phenotype).
TP53 mutation analyses PCR amplification and constant denaturation gradient electrophoresis (CDGE) were carried out and mutations were confirmed by direct DNA sequencing. PCR conditions, CDGE and sequencing conditions were as described earlier.7,8 All samples were screened for mutations in exons 5 to 8 of the TP53
gene. Bioinformatic analyses DNA
methylation data were
normalized
using the quantile
method
ACCEPTED MANUSCRIPT implemented in lumi (R 2·12·2).9 CpGs differentially methylated between normal and breast tumor tissue samples were identified using the samr method, wherein random permutations are made to adjust for multiple hypothesis testing (samr package in R 2·12·1).10 We defined significant associations as those in which the false-discovery rate was < 5%. In exploring
RI PT
the potential of DNA methylation patterns to define significant tumor subtypes/clusters, we selected differentially methylated CpGs showing at least a 10% mean difference (in absoloute beta values, i.e. +/-0.10) between normal and breast tissue samples and a standard deviation of at least 15% within the
SC
group of tumor samples. These CpGs were then ranked according to their F score, computed as the difference in average methylation between the normal and tumor samples divided by the ratio between the deviation in normal breast
M AN U
tissue and that in tumor samples. This score is designed to identify CpG sites that differ strongly between samples of normal and tumor tissues while at the same time being highly stable, or similar, within the group of normal breast tissue samples.
In section 3.1, we used the top 5,000 CpG (rankedaccordingto the F-
TE D
score described above) to construct cluster dendrograms using hierarchical clustering with complete linkage and the Manhattan distance as a measure of similarity (heatmap.2 function implemented in the gplot package for R). By using the pvclust package (R 2·12·1), we defined significant tumor clusters as
EP
those showing AU > 90 (at least four cluster members). In section 3.2. we used the multiclass approach using the samr package
AC C
for R to identify CpG sites differentially methylated between the different expression-based subtypes (i.e. LumA, LumB, HER2 and Basal-like). This analysis was uni-variate (i.e. not adjusted for histological grade and clinical staging). The normal-like breast cancer subtype given in the TCGA cohort was excluded from this analysis as the biological relevance of this subtype is still debated. Statistically significant CpG´s were then identified based on a falsediscovery rate (FDR) within 5%. Subsequently, the mean beta value for each CpG site was computed for each subtype and the differences in means determined for each subtype against each of the others (resulting in three values for each CpG site for each of the four subtypes). A threshold of at least
ACCEPTED MANUSCRIPT +/- 0.10 in mean differences was applied to call subtype-specific CpGs (the direction of change required to be the same for a given subtype being compared against each of the others). In section 3.2, we further analyse the identified subtype-specific patterns in terms promoter-associated CpGs defined as those located within 200 bp of
RI PT
the transcription start site, or those within the 5´UTR or first exon, i.e. CpG´s proximal to the TSS (transcription start site) and we therefore performed separate analyses for CpG´s located more than 1500bp away from the TSS (those referred to as TSS1500). We further analysed the patterns in terms of
SC
curated gene catalogues with known polycomb group repressor complex 2 (PRC2) occupied genes and those marked for repression by the H3K27Me3
M AN U
mark (lysine residue 27 of histone 3 being attached with the mark of three methyl groups) were downloaded from the Broad Institute of MIT and Harvard website
hosting
the
gene
set
enrichment
analyzer
(http://www.broadinstitute.org/gsea/msigdb/genesets.jsp). In section 3.3., the signatures specific for each subtype were used asinput for the pamr algorithm (pamr package for R). The signatures used
TE D
were 1) CpG island promoter methylation events associated with the LumB subtype consisting of 129 CpG´s, see Supplementary Table 2 for a list of CpG methylation events associated with LumB together with information on which
EP
of these are located within CpG promoter Islands, and 2) gene body hypomethylation events associated with the Basal-like subtype consisting of 53 CpG´s,see Supplementary Table 2 for a list of CpG hypomethylation events
AC C
determined as Basal-like specific together with information on which of these are located within gene bodies. This involves classifying each tumor according to a leave-one-out cross-validation procedure carried out separately for each of the two signatures. Here, a “conventional” probability threshold of more than 0·50 was defined as significant for a positive classification. The results are then presented in Figure 4. In section 3.4, we analysed our results in terms of the tumor suppressor catalogue of 716 genes compiled previously by Zhao and colleagues.11 This catalogue
was
downloaded
from
the
TSGene
database;
ACCEPTED MANUSCRIPT http://bioinfo.mc.vanderbilt.edu/TSGene/. The overlap between these 716 TSG´s and the catalogue of Epi-LumB specific promoter methylated genes was determined. In Table 1, we list out the overlapping genes (their promoter CpG´s) that additionally were found to be strongly downregulated by methylation events. The relation between CpG methylation and expression
RI PT
was determined in the TCGA cohort where data was available on both methylation and expression in the same samples; this resulting in 713 primary breast tumours and 82 normal breast tissue samples. The P-values were derived from a linear regression model followed by a Benjamini-Hochberg adjustment for multiple hypothesis testing. The significant hits shown in Table
SC
1 fulfill additional thresholding of R2 > 0.10 and at least a two-fold change in expression (i.e. median expression in unmethylated tumours / median
M AN U
expression in methylated tumours).
In section 3.5., the aCGH data were pre-processed by obtaining the normalized log2 ratios,12 represented as means within windows of ten probes (10x data representation), then segmented using the CBS algorithm (DNAcopy package for R).13 The segmented profiles of each tumor were obtained by
TE D
replacing each of the ten-probe representations by the corresponding segment means. Event frequency plots were obtained using a threshold of ± 0·075 for the segmented profiles to obtain counts of DNA copy number gains and deletions at each location and histogram plots using standard graphic
EP
functions in R. Differences in DNA copy number changes between tumor groups or subtypes were determined using the segmented profiles as input in
AC C
samr to adjust for multiple hypothesis testing, significant differences being defined as < 5% the false-discovery rate threshold. Additional criteria for reporting characteristic subtype-specific DNA copy number changes were at least two-fold differences in event frequencies and a minimum of three observed events (to avoid conclusion of spurious relationships). The same approach was used for the TCGA cohort making use of the “nocnv” option (somatic CNV´s excluded) in previously segmented Level 3 data. In section 3.6, we re-analysed the methylation data (both PEBC and TCGA cohorts) in terms of the novel subtypes, i.e. Epi-LumB and Epi-Basal to identify specific promoter methylation events associated with each of the two
ACCEPTED MANUSCRIPT subtypes for use as proxy markers. The identification of promoter methylation events associated with the Epi-LumB subtype included only CpG´s found within CpG island promoter regions found unmethylated in normal breast tissues. CpG´s were defined as unmethylated in normal breast tissue if their distribution was found to be within the range of 0 <β< 0·25 (considering only
RI PT
the normal breast tissue samples). The SAMr method was then used to identify significantly associated CpG island promoter methylation events separately for tumours classified as either Epi-LumB or Epi-Basal. For the EpiLumB subtype we applied an FDR threshold of <1% whereas few events
SC
fulfilled this criteria for the Epi-Basal subtype and we therefore applied a more relaxed criteria of < 5% FDR. The same procedure was applied for both our cohort (PEBC) and the TCGA cohort. The number of CpG´s found associated
M AN U
with the Epi-LumB and Epi-Basal subtypes in both the PEBC and TCGA cohorts (the overlap) included 5852 and 68 CpG´s, respectively. To prepare a top gene list, we then further trimmed down this list of CpG´s by selecting only those that achieve a within-subtype recurrence rate of at least 50%, i.e. for a given CpG to be determined valid as a potential proxy marker they must be
TE D
found methylated in at least half of either the Epi-LumB and Epi-Basaltumours. The list of CpG´s fulfilling this criteria for the Epi-LumB and Epi-Basal subtypes in both the PEBC and TCGA cohorts included 3034 and 61 CpG´s, respectively. The top candidate gene promoters were then determined based
EP
on ranking the fold-change output (derived from the samr analysis above) for promoters with at least two CpG´s fulfilling the criteria above (in both cohorts).
AC C
Supplementary Table 3 shows the top 30 gene promoter regions for each of the two Epi subtypes. Note, only 8 candidate gene promoter regions fulfilled all the criteria for the Epi-Basal subtype. Thus, Supplementary Table 3 comprehensively includes all the candidate proxy markers (involving promoter methylation events) for the Epi-Basal subtype whereas only the top 30 are listed for the Epi-LumB subtype.
Supplementary References 1. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, Esteller M. Validation of a DNA methylation microarray for 450,000 CpG sites
ACCEPTED MANUSCRIPT in the human genome. Epigenetics 2011; 6(6): 692-702. 2. Stefansson OA, Jonasson JG, Johannsson OT, Olafsdottir K, Steinarsdottir M, Valgeirsdottir S, Eyfjord JE. Genomic profiling of breast tumours in relation to BRCA abnormalities and phenotypes. Breast Cancer Res. 2009; 11(4): R47.
RI PT
3. Stefansson OA, Jonasson JG, Olafsdottir K, Hilmarsdottir H, Olafsdottir G, Esteller M, Johannsson OT, Eyfjord JE. CpG island hypermethylation of BRCA1 and loss of pRb as co-occurring events in basal/triple-negative breast
SC
cancer. Epigenetics 2011; 6(5): 638-649.
4. Cheang MC, Chia SK, Voduc D, Gao D, Leung S, Snider J, Watson M, Davies S, Bernard PS, Parker JS, Perou CM, Ellis MJ, Nielsen TO. Ki67 index,
M AN U
HER2 status, and prognosis of patients with luminal B breast cancer. J Natl Cancer Inst. 2009; 101(10): 736-50.
5. Cheang MC, Voduc D, Bajdik C, Leung S, McKinney S, Chia SK, Perou CM, Nielsen TO. Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype. Clin Cancer Res.
TE D
2008; 14(5): 1368-76.
6. Blows FM, Driver KE, Schmidt MK, Broeks A, van Leeuwen FE, Wesseling J, Cheang MC, Gelmon K, Nielsen TO, Blomqvist C, Heikkilä P, Heikkinen T,
EP
Nevanlinna H, Akslen LA, Bégin LR, Foulkes WD, Couch FJ, Wang X, Cafourek V, Olson JE, Baglietto L, Giles GG, Severi G, McLean CA, Southey MC, Rakha E, Green AR, Ellis IO, Sherman ME, Lissowska J, Anderson WF,
AC C
Cox A, Cross SS, Reed MW, Provenzano E, Dawson SJ, Dunning AM, Humphreys M, Easton DF, García-Closas M, Caldas C, Pharoah PD, Huntsman D. Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies. PLoS Med. 2010; 7(5): e1000279. 7. A L Børresen, E Hovig, B Smith-Sørensen, D Malkin, S Lystad, T I Andersen, J M Nesland, K J. Isselbacher, and S H Friend. Constant denaturant gel electrophoresis as a rapid screening technique for p53
ACCEPTED MANUSCRIPT mutations. Proc Natl Acad Sci U S A. 1991; 88(19): 8405-8409. 8. Thorlacius S, Börresen AL, Eyfjörd JE. Somatic p53 mutations in human breast carcinomas in an Icelandic population: a prognostic factor. Cancer Res. 1993; 53(7): 1637-1641.
Bioinformatics 2008; 24(13): 1547-8.
RI PT
9. Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray.
10. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;
SC
98(9): 5116–5121.
11. Zhao M, Sun J, Zhao Z. TSGene: a web resource for tumor suppressor
M AN U
genes. Nucleic Acids Res. 2013; 41(Database issue): D970-6. 12. Workman C, Jensen LJ, Jarmer H, Berka R, Gautier L, Nielser HB, Saxild HH, Nielsen C, Brunak S, Knudsen S. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 2002; 3(9): research0048.
TE D
13. Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data.
AC C
EP
Biostatistics 2004; 5(4): 557-72.
TE D
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
Supplementary Figure 1: Significance analysis of tumor clusters (carried out with the pvclust package for R). Three patterns (other than those of normal breast tissues)
AC C
“cluster 3”.
EP
are significant at AU > 95%, as indicated (boxed in grey): “cluster 1”, “cluster 2” and
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
TE D
Supplementary Figure 2: Hierarchical clustering of all the 1425 CpG´s identified as subtype-specific in the PEBC cohort. The CpG markers (rows) are displayed according to their subtype-specific association in colours as indicated at the bottom of the figure. In summary, this includes 430 CpG´s uniquely associated with the Basal-
EP
like subtype, 975 CpG´s uniquely associated with the LumB subtype, 8 CpG´s uniquely associated with the HER2 subtype and 12 CpG´s uniquely associated with
AC C
the LumA subtype. The subtype class is shown for each sample (columns) as indicated in colors at the upper-right hand side of the figure.
SC
RI PT
ACCEPTED MANUSCRIPT
Supplementary Figure 3: Examples of co-occurring events involving CpG promoter
M AN U
methylation and copy number loss (hypothetically reflecting loss of one allele and promoter methylation of the retained allele). A) TENC1 promoter methylation is frequently observed in association with copy number loss in Epi-Basal tumors. B) Epi-LumB tumors frequently display loss of chromosome 13q wherein CpG island promoter methylation events can be seen as co-occurring events in DZIP1, PCDH8 and TNFSF11. The P-values displayed are those from Student’s t tests for
AC C
EP
indicated in each plot.
TE D
differences in methylation levels between tumors with and without loss of genes
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
Supplementary Figure 4: Analysis of the TCGA cohort where information on the percentage of tumor cell nuclei per sample was available (reflecting tumor cell content of the sample). No differences were detected in comparing the %tumor nucei of Epi-LumB tumor to that of other tumors (left panel) or when comparing %tumor
TE D
nuclei in Epi-Basal tumors to that of other tumors (right panel). The P-values were
AC C
EP
obtained from the Student’s t-test (two-tailed).
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
TE D
Supplementary Figure 5: CpG promoter methylation of KCNA3, ZNF132, TTBK1, TENC1 and ZNF671 analyzed in relation with expression. The data are based on Illumina RNA sequencing and Infinium methylation arrays derived from the Cancer
EP
Genome Atlas (TCGA). The P-values displayed were derived from hypothesis testing
AC C
of the Pearson´s correlation coefficient using the cor.test function in R.
ACCEPTED MANUSCRIPT
Supplementary Table 1: The relation between overall methylation patterns (as determined through the definition of distinct clusters) analysed in relation to expression-based subtype definitions. LumB
HER2
5NP*
Basal-like
Unknown
Total
Cluster 1
1 (14.3%)
5 (71.4%)
1 (14.3%)
0
0
0
7
Cluster 2
1 (6.7%)
0
1 (6.7%)
0
11 (73.3%)
Cluster 3
4 (100%)
0
0
0
0
Non-Clustered
6 (42.9%)
2 (14.3%)
1 (7.1%)
1 (7.1%)
3 (21.4%)
RI PT
LumA
2 (13.3%)
15
0
4
1 (7.1%)
14
AC C
EP
TE D
M AN U
*5NP (five negative phenotype; i.e. negativity for ER, PR, HER2, CK5/6 and EGFR)
SC
X2-squared=40.6 P=0.0004
ACCEPTED MANUSCRIPT Supplementary Table 2: Subtype-specific CpG’s. LumB-specific CpG methylated markers:
C1QL2;C1QL2 PTGS2 RSPO2;RSPO2
1stExon;5'UTR Body 1stExon;5'UTR
SFRP1
TSS200
KCNK9 SHISA3 CCDC67;CCDC67
TSS1500 TSS200 1stExon;5'UTR
NR2E1;NR2E1 FAM78B
5'UTR;1stExon Body
CPNE7;CPNE7 OBSL1 PDGFRA CFD MCF2L2;B3GNT5
TSS1500;TSS1500 Body 5'UTR Body Body;5'UTR
SFRP2 USP44;USP44 KIAA1755;KIAA1755 RSPO2;RSPO2 FBN1 HBM S1PR1;S1PR1 GSX2 SPSB4 COL23A1 TMEM155;LOC100192379 NID2 DFNA5;DFNA5;DFNA5;DFNA5;DFNA5 GFI1;GFI1;GFI1 PAQR9
TSS200 5'UTR;5'UTR 5'UTR;1stExon 1stExon;5'UTR 5'UTR Body 1stExon;5'UTR Body TSS200
TSS1500 TSS200;Body TSS1500 1stExon;TSS1500;1stExon;5'UTR;5'UTR 5'UTR;5'UTR;TSS1500 TSS1500
PROX1 GSX2 KCNH5;KCNH5;KCNH5 DAB1 DNM3;DNM3 FOXL2;C3orf72;FOXL2 T RASGRF2
5'UTR 1stExon 5'UTR;TSS1500;TSS1500 5'UTR 1stExon;1stExon 1stExon;TSS1500;5'UTR 5'UTR TSS200
HOXA7 RRN3P1 GPRC5B
TSS200 TSS1500 5'UTR
ELMO1 DSC3;DSC3 CXCL2 HBM
5'UTR TSS200;TSS200 Body TSS200
RUNX3;RUNX3;RUNX3 SFRP5 EN1 SIX6;SIX6 EMILIN2 C1QL2 CDX2 DFNA5;DFNA5;DFNA5 P2RX6;P2RX6;MGC16703 RELN;RELN ADCY4 RGMA;RGMA;RGMA WNT2;WNT2;WNT2 GABRB2;GABRB2 LAMA1 MDGA2 HBM DDIT4L CHST2 CHST2;CHST2 POU4F1 CLDN5;CLDN5 POU3F2 SLC6A2 COL25A1;COL25A1;COL25A1;COL25A1 DHH KCNH5;KCNH5;KCNH5 SFRP1
1stExon;Body;5'UTR 1stExon 1stExon 1stExon;5'UTR 1stExon TSS200 1stExon 5'UTR;5'UTR;5'UTR TSS1500;TSS1500;Body TSS200;TSS200 Body Body;5'UTR;1stExon Body;5'UTR;1stExon 5'UTR;5'UTR 1stExon TSS1500 TSS200 TSS1500 5'UTR 5'UTR;1stExon TSS200 1stExon;Body TSS1500 TSS200 1stExon;5'UTR;5'UTR;1stExon Body 5'UTR;TSS1500;TSS1500 TSS200
HAND2;NBLA00301;HAND2 FOXL2;C3orf72 ETS1;ETS1;ETS1
5'UTR;TSS1500;1stExon 1stExon;TSS1500 Body;TSS1500;TSS1500
Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island S_Shore Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island N_Shore Island Island N_Shore Island Island Island Island N_Shore Island Island Island Island Island Island Island Island Island Island Island Island Island Island S_Shore Island Island Island Island Island Island Island Island Island Island Island Island Island S_Shore
RI PT
5'UTR;TSS200 1stExon TSS200 TSS200
SC
ADCYAP1;ADCYAP1 FADS6 KIAA1024 KIAA1024
RELATION_TO_UC SC_CPG_ISLAND
M AN U
185942529 905156 72889549 79724794 79724802 106442588 119916017 186649354 109095568 166666838 6703803 41167087 82136116 140715557 42399851 93063888 96953051 108487388 166134699 66136094 89641296 220417030 55099058 863136 182972048 23908776 154710353 95942287 36889157 109095572 48937213 216450 101702519 54967423 140770588 35300874 178017937 122686453 52536175 24797363 92950450 142682652 5001047 214162545 54967018 63512588 58715539 171810910 138665654 166581538 80256535 47225326 27196365 21831636 19895321 35290307 37487613 28622912 74964493 215960 41883879 25256369 99531416 119604702 60975964 2847301 119916510 28542727 24796981 21368419 103630013 24803679 93631578 116963259 160974837 7117680 48145108 215791 101111872 142839578 142838938 79177763 19511987 99281178 55690381 110223700 49484169 63512760 41167113 41883890 174450722 138665006 128392708
EP
4 18 17 15 15 6 2 1 8 6 7 8 4 8 4 11 15 6 1 12 16 2 4 19 3 22 4 12 20 8 15 16 1 4 3 17 5 4 14 7 1 3 17 1 4 14 1 1 3 6 5 12 7 16 16 17 7 18 4 16 4 1 10 2 14 18 2 13 7 22 7 14 15 7 5 18 14 16 4 3 3 13 22 6 16 4 12 14 8 4 4 3 11
UCSC_REFGENE_GROUP
AC C
cg00043819 cg00128702 cg00266322 cg00462168 cg00544449 cg00624404 cg00690148 cg00690431 cg00910695 cg01349775 cg01447112 cg01495122 cg01508023 cg01532168 cg01565320 cg01923218 cg01926877 cg02062480 cg02229993 cg02262056 cg02449821 cg02527669 cg02722188 cg02813298 cg02958321 cg03133266 cg03202804 cg03308628 cg03378876 cg04050867 cg04125371 cg04141813 cg04156369 cg04275490 cg04353095 cg04588165 cg04591034 cg04638468 cg04701034 cg04770504 cg04838747 cg04912999 cg04974290 cg04976324 cg05196820 cg05299793 cg05375728 cg05377226 cg05386493 cg05478631 cg05542262 cg05728754 cg05743885 cg05949903 cg06088324 cg06094642 cg06159352 cg06263193 cg06312813 cg06319822 cg06375628 cg06377278 cg06576393 cg06704122 cg06785999 cg07012770 cg07212778 cg07213036 cg07320646 cg07448795 cg07482935 cg07485357 cg07491495 cg07697895 cg07763047 cg07846220 cg08217024 cg08393070 cg08663159 cg08774368 cg08858437 cg09010671 cg09092054 cg09276565 cg09774787 cg10074544 cg10086212 cg10162522 cg10406295 cg10454246 cg10541864 cg10641721 cg10679277
UCSC_REFGENE_NAME
TE D
TargetID CHR MAPINFO
ACCEPTED MANUSCRIPT
PROX1 RUNX3;RUNX3
TSS1500 Body;TSS200
COL23A1 DSC3;DSC3 SCG3;SCG3;SCG3;SCG3
TSS1500 TSS200;TSS200 5'UTR;1stExon;5'UTR;1stExon
FAM110B ANKRD34C RASGRF2
5'UTR 5'UTR 1stExon
TTBK1 FAM43B C8orf47;C8orf47 VIM
5'UTR 1stExon TSS200;TSS200 Body
MFSD2B ITGA11 NID2;NID2 ADCY4
TSS200 Body 5'UTR;1stExon TSS200
AKR1B1 TMEM90A RGS7;RGS7
TSS200 TSS1500 5'UTR;1stExon
USP44;USP44 GFI1;GFI1;GFI1 PAX7;PAX7;PAX7;PAX7;PAX7;PAX7 PTGS2
TSS1500;5'UTR 5'UTR;5'UTR;TSS1500 5'UTR;1stExon;5'UTR;1stExon;1stExon;5'UTR Body
FLRT2 EOMES
5'UTR 1stExon
M AN U
PDGFRA SHISA3 ANKLE1
5'UTR TSS200 Body
SPSB4 LOC100190940 SFRP1 MFSD2B SNX32 CD38 LOC283999
TSS200 TSS200
1stExon TSS200 TSS200 Body Body
EMX1 GALR2 CRHBP NRXN1;NRXN1;NRXN1;NRXN1
Body TSS1500 Body Body;1stExon;5'UTR;Body
ADRB3 PAQR9 SFRP1 CHD5 RFX4 TMEM22;TMEM22;TMEM22 FZD10 VGLL2;VGLL2 TLX3
1stExon TSS1500 TSS200 Body Body 5'UTR;5'UTR;5'UTR TSS1500 Body;Body TSS1500
SRD5A2 SIX6 TRIM9;TRIM9 TMEM22;TMEM22;TMEM22
TSS200 TSS200 1stExon;1stExon 5'UTR;5'UTR;5'UTR
RCSD1
Body
TRIM71 NBLA00301;HAND2 DMRTA2 SIX6 TRIM71 TRIM67 FZD10
TSS200 TSS1500;1stExon TSS1500 1stExon 1stExon TSS1500 TSS1500
BMP3 ST6GAL2;ST6GAL2;ST6GAL2 P2RY1 GDF7 RFX4 SFRP1
TSS200 TSS200;5'UTR;5'UTR 1stExon TSS200 Body TSS200
C1QL2 CLDN5;CLDN5;CLDN5 FZD10 PROX1 TLX3 TRPC7;TRPC7;TRPC7
TSS200 1stExon;3'UTR;3'UTR TSS200 5'UTR Body TSS200;TSS200;TSS200
LEPREL1;LEPREL1 MME;MME;MME;MME MFSD2B PROX1
5'UTR;1stExon TSS200;5'UTR;TSS200;5'UTR TSS200 TSS1500
TRIM71 GFI1;GFI1;GFI1
TSS200 5'UTR;5'UTR;TSS1500
Island S_Shore Island Island Island Island N_Shore S_Shore Island Island Island Island Island Island Island S_Shore Island N_Shore Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island N_Shore Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island
RI PT
5'UTR;1stExon TSS1500
SC
HCRTR2;HCRTR2 COL23A1
TE D
55039232 178017827 22625465 106974623 214161368 25256939 239549798 178017879 28622813 51973661 68113478 59058273 79576216 80256738 35290523 43211491 20879547 99076734 17271519 46632811 24232884 68723690 52535758 24803903 114140 134143919 74893112 241520331 14348844 95942907 92950265 18958084 186649330 88137909 96721468 133536793 133536345 85999933 27763581 146888890 55098130 42399843 17392923 131769581 140770617 130527046 67875033 41166530 24232886 65601265 15780306 76227996 43250569 106442911 73151595 74070682 76249776 50574708 174443529 37823475 142682682 41167109 6209018 106977641 136538801 130646256 117591684 170736021 46956496 31806042 60975846 51561293 136538934 30449172 167599719 99281070 137819267 32859407 174450408 50889427 60976285 32859587 231297555 130646370 144511212 81951951 107502615 152553673 20866231 106979874 41167107 24154607 119916486 19510977 130647004 214162711 170737255 135701240 47224954 189838112 154797917 24232889 214161371 8076277 32859438 92950249
EP
6 5 10 12 1 1 1 5 18 15 15 8 15 5 17 6 1 8 10 1 2 15 14 14 9 7 14 1 9 12 1 1 1 9 9 9 9 14 3 5 4 4 19 10 3 12 8 8 2 11 4 17 10 6 2 17 5 2 4 8 3 8 1 12 3 12 6 5 1 2 14 14 3 6 1 6 6 3 4 1 14 3 1 12 8 4 2 3 2 12 8 19 2 22 12 1 5 5 12 3 3 2 1 10 3 1
AC C
cg10725720 cg10730712 cg10732215 cg10790429 cg10827893 cg11018723 cg11158057 cg11256387 cg11722699 cg11831981 cg12002303 cg12615766 cg12628659 cg12631351 cg12655190 cg12691866 cg12768681 cg12845520 cg12874092 cg13104713 cg13259296 cg13434638 cg13592399 cg13631572 cg13768269 cg13801416 cg13829550 cg13849378 cg13849454 cg13879483 cg13930300 cg13969001 cg13986130 cg14058647 cg14169333 cg14192957 cg14236735 cg14400498 cg14557534 cg14704758 cg14777772 cg14866200 cg15110403 cg15344419 cg15562912 cg15584445 cg15684724 cg15839448 cg15871441 cg15890274 cg15994026 cg16195091 cg16325777 cg16573266 cg16781647 cg16898239 cg17448335 cg17526573 cg17609931 cg17619823 cg17627617 cg17816908 cg17831269 cg18061259 cg18172881 cg18209212 cg18392016 cg18468394 cg18475643 cg18529845 cg18639233 cg18683604 cg18740893 cg18750167 cg18800085 cg18917544 cg19003797 cg19127283 cg19178853 cg19380001 cg19456540 cg19741945 cg19806849 cg20058043 cg20387387 cg20642710 cg20803857 cg20944305 cg21039778 cg21176643 cg21517947 cg21534423 cg21858380 cg21872764 cg21885046 cg22489498 cg22686881 cg22770135 cg22868282 cg23015696 cg23209255 cg23220551 cg23260547 cg23316253 cg23528400 cg23807354
Island Island Island Island Island Island Island Island N_Shore Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island
ACCEPTED MANUSCRIPT SLC30A10;SLC30A10 RUNDC3A;RUNDC3A;RUNDC3A PROX1 TMEM132E PROX1 TOX SFRP1 EOMES
5'UTR;1stExon Body;Body;Body TSS1500 Body 5'UTR Body TSS200 1stExon
RGS20 BEND4;BEND4 RASGRF2 RGS7 POU3F2 COL25A1;COL25A1;COL25A1;COL25A1 C8orf47;C8orf47 CAV2;CAV2 SLC30A10;SLC30A10 DSC3;DSC3 MEOX2;MEOX2 LRAT DNM3;DNM3 IRAK3;IRAK3 IRAK3;IRAK3;IRAK3;IRAK3 PRR16 DSC3;DSC3
Body Body;Body TSS1500 TSS200 TSS1500 1stExon;5'UTR;5'UTR;1stExon Body;Body Body;Body 5'UTR;1stExon TSS200;TSS200 1stExon;5'UTR TSS200 TSS200;TSS200 TSS200;TSS200 1stExon;1stExon;5'UTR;5'UTR 5'UTR TSS200;TSS200
VIM C8orf47;C8orf47;C8orf47;C8orf47 MEOX2 TTBK1 POU4F1 SOX21 ADRB3 FAM101A TRIM67 LOC283392;TRHDE;LOC283392 RAX
5'UTR 1stExon;5'UTR;1stExon;5'UTR TSS200 TSS200 TSS200 TSS1500 1stExon 5'UTR TSS200 Body;1stExon;Body Body
LBXCOR1
MAPINFO 20 1 12 17 17 17 7 17 7 2 17 12 17 20 7
UCSC_REFGENE_NAME
53088317 209852361 78581770 67710309 68070495 26419687 24474486 68100565 24837854 68149422 68100975 130590137 70026605 58533022 24798855
NAV3
UCSC_REFGENE_GROUP
Island Island Island Island N_Shore Island Island Island Island Island
RELATION_TO_UCSC_CPG_IS LAND N_Shelf S_Shelf
Body
KCNJ16;KCNJ16 NLK
TSS1500;TSS1500 Body
KCNJ16;KCNJ16;KCNJ16 OSBPL3;OSBPL3;OSBPL3;OSBPL3
5'UTR;TSS1500;5'UTR 3'UTR;3'UTR;3'UTR;3'UTR
KCNJ16;KCNJ16;KCNJ16
5'UTR;TSS200;5'UTR
CDH26 DFNA5;DFNA5
TSS1500 TSS1500;TSS1500
TE D
CHR
cg03539903 cg08847382 cg12871652 cg13451937 cg13606025 cg16342597 cg16717546 cg18432768 cg20110978 cg20436458 cg21871080 cg23653492 cg24310711 cg25325053 cg26712096
Island Island Island Island S_Shore Island Island Island Island
Body
LumB-specific CpG hypomethylated markers: TargetID
Island S_Shore Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island Island
RI PT
220101945 42394004 214160848 32909042 214162604 60030763 41166990 27763102 42329446 54789978 42153708 80256166 241520598 99281500 110223598 99077132 116140128 220101962 28622940 15726166 155664993 171810543 66582805 66583048 119801431 28622918 35292390 17271051 99076837 15726466 43211208 79177700 95364675 37823292 124779532 231298651 72667150 56936434 46632696 68120393
SC
1 17 1 17 1 8 8 3 2 8 4 5 1 6 4 8 7 1 18 7 4 1 12 12 5 18 17 10 8 7 6 13 13 8 12 1 12 18 1 15
M AN U
cg23815582 cg23965689 cg23992410 cg24071532 cg24083324 cg24158594 cg24319902 cg24434959 cg24631482 cg24645214 cg24657817 cg24718971 cg24843474 cg24942226 cg25088758 cg25248750 cg25274503 cg25317664 cg25789861 cg25859468 cg26084529 cg26261793 cg26279550 cg26415547 cg26464221 cg26861703 cg26946821 cg26983469 cg26989202 cg27096779 cg27363327 cg27398263 cg27442308 cg27498114 cg27504292 cg27504802 cg27579953 cg27602263 cg27636310 cg27649239
S_Shore
TargetID
CHR
MAPINFO 13 2 6 2 1 1 1 13 19 1 8 15 10 8
110966036 119887680 56406084 119615889 201481480 202091933 20810527 113637903 58238600 114423726 30284255 63673286 8106003 74225117
cg04246144
5
140739501
cg04874286 cg04920032 cg05369926 cg05483021 cg05572417 cg05859267 cg06070625 cg06159269 cg06303936 cg06311422 cg06815715 cg07094847
12 12 17 8 13 16 3 1 3 6 15 7
95477853 50262986 79057147 54569668 113637815 77469667 69812798 8767347 172711298 56406336 49148520 27160586
cg07242860
5
140739522
cg07931024 cg07945618 cg08129759
20 15 1
55925586 99926398 202091944
UCSC_REFGENE_NAME
UCSC_REFGENE_GROUP
COL4A2
Body
DST;DST;DST;DST;DST
Body;Body;Body;Body;Body
GPR37L1 CAMK2N1 MCF2L;MCF2L ZNF671 BCL2L15 RBPMS;RBPMS;RBPMS;RBPMS CA12;CA12 GATA3;GATA3 RDH10 PCDHGA4;PCDHGA2;PCDHGB2;PCDHGA1 ;PCDHGB1;PCDHGB2;PCDHGA3 FGD6 FAIM2 BAIAP2;BAIAP2;BAIAP2;BAIAP2
TSS200 Body Body;Body Body 3'UTR Body;Body;Body;Body Body;Body Body;Body Body
MCF2L;MCF2L ADAMTS18 MITF;MITF RERE;RERE SPATA16 DST;DST;DST;DST;DST SHC4 HOXA3;HOXA3 PCDHGA4;PCDHGA2;PCDHGB2;PCDHGA1 ;PCDHGB2;PCDHGB1;PCDHGA3 RAE1;RAE1 LRRC28 GPR37L1
Body;Body TSS1500 Body;TSS200 5'UTR;5'UTR Body Body;Body;Body;Body;Body Body 5'UTR;TSS1500
AC C
cg00145419 cg00428457 cg01358444 cg01502588 cg02104952 cg02124912 cg02941008 cg03202693 cg03215105 cg03325407 cg03633268 cg04192393 cg04213746 cg04238758
EP
Basal-like specific CpG methylated markers:
Body;Body;TSS1500;Body;Body;TSS1500;Body
RELATION_TO_UCSC_CPG_IS LAND Island N_Shore N_Shore
Island S_Shelf Island
N_Shore
N_Shore
Body 3'UTR Body;Body;Body;Body Island S_Shore S_Shore S_Shelf N_Shore N_Shore
Body;Body;TSS200;Body;TSS200;Body;Body
N_Shore
TSS1500;TSS1500 3'UTR TSS200
N_Shore
ACCEPTED MANUSCRIPT
1
103573989
cg13107973 cg13300939 cg13628006 cg13726218 cg14685146 cg15996882 cg16051002 cg16474184 cg17077762 cg17585421 cg17652792 cg17835606 cg18003970 cg18362770 cg19007141
1 1 10 13 17 17 5 12 15 5 5 15 11 14 4
215178658 202091868 34446040 72438249 42088751 17715815 140579160 48111507 64996104 140536897 140562007 48839797 12410134 21966454 140808552
cg19595107
5
140734543
cg19836589 cg20349803 cg20366549 cg20688917 cg21776417 cg22332589 cg22367250 cg22701534 cg22865713 cg22925840 cg23092956 cg23361862 cg23440816 cg23756265 cg23903597 cg24016939 cg24195796 cg24865110 cg25087851 cg25487404 cg25602543
20 2 12 7 3 13 4 11 6 2 2 10 1 11 17 19 15 17 11 8 20
55925603 121036440 6486123 27160674 147142415 110966052 153199823 19935158 25779897 225043045 54846562 15256422 223948910 127891323 61704154 58239135 29362669 57407818 60623918 37551945 44507533
cg26337998
5
140749783
cg26373541 cg26777836 cg27024272 cg27139424 cg27281882 cg27313021 cg27580905
4 3 16 11 20 1 2
6644336 26665815 86834676 58695093 9820740 202091995 39335085
Body
LZTS1 MEF2D ZFHX3;ZFHX3 PTPRB ZNF589 DTNBP1;DTNBP1;DTNBP1 ZNF671;ZNF671
TSS200 5'UTR 5'UTR;5'UTR TSS1500 3'UTR Body;Body;Body 1stExon;5'UTR
ZIC1 COL11A1;COL11A1;COL11A1;COL11A1;C OL11A1;COL11A1;COL11A1;COL11A1 KCNK2 GPR37L1 PARD3 DACH1;DACH1;DACH1 TMEM101 SREBF1;SREBF1 PCDHB11 P11 OAZ2 PCDHB17 PCDHB16;PCDHB16 FBN1 PARVA TOX4;METTL3 MAML3 PCDHGA2;PCDHGA4;PCDHGA1;PCDHGA4 ;PCDHGB1;PCDHGA3 RAE1;RAE1 RALB SCNN1A;SCNN1A HOXA3;HOXA3
3'UTR
COL4A2
Body
NAV2;NAV2;NAV2 SLC17A4
Body;Body;Body 3'UTR
SPTBN1;SPTBN1 FAM171A1 CAPN2;CAPN2 MAP3K3;MAP3K3 ZNF671 APBA2;APBA2 YPEL2 GPR44 ZNF703 ZSWIM3 PCDHGA4;PCDHGA2;PCDHGB2;PCDHGA1 ;PCDHGB3;PCDHGB3;PCDHGA5;PCDHGB1 ;PCDHGA3 MRFAP1 LRRC3B
PAK7;PAK7 GPR37L1 SOS1
N_Shelf
Island S_Shore
5'UTR;5'UTR;1stExon;5'UTR;1stExon;1stExon;5'UTR;1stExon TSS1500 TSS200 Body Body;Body;Body 3'UTR 3'UTR;3'UTR TSS200 Body TSS1500 Body 5'UTR;1stExon Body Body 3'UTR;Body Body
N_Shore N_Shelf Island N_Shore
RI PT
cg12884406
COG3
S_Shore Island N_Shore
Body;TSS1500;Body;TSS1500;Body;Body TSS1500;TSS1500 Body Body;TSS1500 5'UTR;TSS1500
N_Shore
SC
89525620 46098921 75025753 20112959 156465749 73019173 71031813 48310616 15622032 58238987 229359488 147131860
M AN U
15 13 16 8 1 16 12 3 6 19 1 3
Body;Body Body Body;Body
N_Shore
N_Shore S_Shore Island
S_Shore
Body;Body TSS200 Body;Body TSS1500 TSS1500 TSS1500 3'UTR
S_Shelf S_Shore
Body;Body;Body;Body;TSS200;TSS200;Body;Body;Body
N_Shore
3'UTR 5'UTR
S_Shore N_Shore
TE D
cg08291506 cg08589212 cg10020890 cg10121113 cg10128479 cg10298741 cg10988187 cg11307417 cg11938455 cg11977686 cg12653626 cg12732998
TSS1500;TSS1500 TSS200 Body
N_Shore S_Shelf Island
S_Shore
cg00038736 cg00343906 cg00577950 cg00643333 cg00792053 cg01095389 cg01288184 cg01521397 cg01727651 cg02026204 cg02371401 cg02394578 cg02625638 cg02742918 cg03332897 cg03559389 cg04121731 cg04386759 cg05099145 cg05138133 cg05303559 cg05753693 cg06042504 cg06373940 cg06527318 cg06775420 cg07139527 cg07838495 cg07856071 cg07935320 cg08086385 cg08550317 cg08613144 cg08701134 cg08913523
CHR
MAPINFO 19 1 21 22 4 11 18 20 1 22 5 1 8
38877134 1696692 45176447 45705037 25097886 75511563 20811408 60590872 111772026 30663007 676784 197191366 124218648 153284103 740442 679322 31886923 170554795 1796333 84075503 158512380 10192930 55087323 128052778 25349006 1742245 80969525 8287763 15928114 124218630 72430211 3850732 133428836 69969618 126649807
UCSC_REFGENE_NAME
UCSC_REFGENE_GROUP
GGN NADK PDXK FAM118A;FAM118A
Body Body 3'UTR TSS1500;TSS200
AC C
TargetID
EP
Basal-like specific CpG hypomethylated markers:
X
19 10 16 6 16 16 7 8 8 2 1 16 17 17 5 8 3 6 2 16 8
RELATION_TO_UCSC_CPG_IS LAND Island N_Shore N_Shore Island
DGAT2 CABLES1;CABLES1;CABLES1 TAF4 CHI3L2;CHI3L2;CHI3L2 OSM TPPP
3'UTR Body;Body;Body Body Body;TSS1500;Body TSS200 Body
FAM83A;FAM83A IRAK1;IRAK1;IRAK1 PALM;PALM DIP2C ZNF267
Body;Body Body;Body;Body Body;Body Body Body
N_Shore N_Shore Island Island S_Shore
MAPK8IP3;MAPK8IP3 SLC38A8
Body;Body Body
N_Shore
MSRA;MSRA;MSRA
Body;Body;Body
S_Shore
ERCC3
TSS1500
S_Shore
HN1L B3GNTL1 RPL26 FBXL7 FAM83A;FAM83A RYBP FAM50B LYPD1;LYPD1;LYPD1 WWP2;WWP2
Body Body TSS1500 Body Body;Body Body Body 1stExon;5'UTR;TSS1500 Body;Body
S_Shelf
N_Shore
N_Shore S_Shore Island N_Shore Island Island
ACCEPTED MANUSCRIPT Body 1stExon
PRX;PRX UMODL1;UMODL1 FBXL7 CPEB3 ELANE
3'UTR;Body Body;Body Body Body Body
ATXN7L1;ATXN7L1 MAG;MAG DYSF;DYSF;DYSF;DYSF RTEL1;RTEL1
Body;Body Body;Body Body;Body;Body;Body Body;Body
FAM129A C13orf27 HSP90AA1 ALDH1A3 STK10
Body TSS1500 Body Body Body
COL23A1 ZFR2 FAM118A;FAM118A ASAP2;ASAP2
Body Body TSS1500;TSS200 Body;Body
RPTOR;RPTOR CACNA1H;CACNA1H HIC1;HIC1 TLE2;TLE2;TLE2
Body;Body 3'UTR;3'UTR TSS1500;5'UTR 3'UTR;3'UTR;3'UTR
HDAC4
Body
N_Shelf N_Shore Island
Island N_Shelf N_Shelf Island Island N_Shore
Island Island S_Shelf
RI PT
MYO1D LOC389333
Island Island
N_Shore Island N_Shore
HDAC4
TSS1500;3'UTR;TSS1500 Body
S_Shelf Island Island S_Shore Island N_Shelf N_Shore Island
Body
GTPBP5 ADHFE1 B3GNTL1 NRF1;NRF1
TMC8;TMC6
Body Body
N_Shelf
Body Body;Body
N_Shore
5'UTR;5'UTR
SRRM3 ZNF396 BCAN;BCAN
M AN U
KRTCAP3;NRBP1;KRTCAP3 EPHB3
SC
Island
Body TSS1500
TE D
76087149 30821827 138729283 138032270 101196790 25097821 40901963 43545792 15928441 93976249 855536 100805303 555251 116231420 105283982 35786580 71823484 62321166 43020716 184836205 103426862 102593156 101420708 171538334 62888566 25097558 177713841 3821095 45705034 9526703 25097661 112462079 78735596 1271153 1959124 2997638 836679 240174650 1615992 27665017 184297380 211688020 44695348 240169161 184344219 60773564 67358546 820835 80969547 129305192 168619344 1615843 76127608 76361166 75915065 32957683 4566575 156613060 193587264 3285299 66626141 3517953 8978201 157102769 124218573 37275069 113953812 168132832 1959066 30662994 5576256 166982925 240169280 131589189 2150016 92613280 63275602 174120078
5'UTR;5'UTR
Island Island S_Shelf S_Shore Island S_Shore N_Shore
SLC22A23;SLC22A23 PC;PC;LRFN4;PC
Body;Body Body;Body;1stExon;Body
Island
KIDINS220
TSS1500
S_Shore
FAM83A;FAM83A C20orf95 ZBTB16;ZBTB16
Body;Body Body Body;Body
HIC1;HIC1 OSM
TSS1500;5'UTR TSS200
N_Shore Island Island N_Shelf Island
HDAC4
Body
MAD1L1;MAD1L1;MAD1L1 SLCO3A1;SLCO3A1 LOC100132215
Body;Body;Body Body;Body TSS1500
EP
7 17 5 5 13 4 19 21 5 10 19 7 5 8 7 19 2 20 5 1 13 14 15 5 2 4 5 19 22 2 4 2 17 16 17 19 19 2 6 2 3 1 6 2 4 20 8 10 17 7 6 6 17 17 7 18 19 1 3 6 11 6 2 7 8 20 11 6 17 22 10 2 2 10 7 15 2 5
AC C
cg08944347 cg09010067 cg09156097 cg09476006 cg10092251 cg10092607 cg10684794 cg10851763 cg11005831 cg11509491 cg11683663 cg11693986 cg11833938 cg11888982 cg11895696 cg11902728 cg12245706 cg12600109 cg12887985 cg13100449 cg13221899 cg13263472 cg13615592 cg13656752 cg14210094 cg14533992 cg14864276 cg14950747 cg15477144 cg15545942 cg15773230 cg15809077 cg16565901 cg16577083 cg17029019 cg17389150 cg17473898 cg17939889 cg18346398 cg18428193 cg18990407 cg19234983 cg19254793 cg19324997 cg19465124 cg19573208 cg20040765 cg20199549 cg20664445 cg20712980 cg20762346 cg21010202 cg21282054 cg21348406 cg21938845 cg22194807 cg22308726 cg22439792 cg22525069 cg22544679 cg22740006 cg22745369 cg23421509 cg23590389 cg23698536 cg23884187 cg24452821 cg24687806 cg25432975 cg25666403 cg25887076 cg26056277 cg26495711 cg26664254 cg27109748 cg27132471 cg27344587 cg27615363
N_Shelf
Island
ACCEPTED MANUSCRIPT Supplementary Table 3: CpG promoter methylation events identified as candidate proxy markers for the novel Epi-Subtypes (i.e. Epi-LumB and Epi-Basal).
TargetID´s
BICC1
cg07857251
cg19940537
CHST2
cg08858437
cg08774368
CUGBP2
cg03813164
cg12356890
cg11472279
DNM3
cg26261793
cg06211893
cg05377226
EOMES
cg24434959
cg14557534
FMN2
cg01535698
cg25208017
FSD1
cg13355047
cg19752891
GFI1
cg11785652
cg11412935
cg23807354
GLB1L3
cg21692846
cg03323292
cg05621343
IGFBP7
cg11597475
cg21921384
cg01959562
ITGA4
cg06952671
cg21995919
cg25024074
KCNA3
cg20302133
cg26013553
cg11595545
LHX1
cg05527869
cg10043865
cg10356613
cg12518410
cg09392940
LOC285548
cg16292168
cg10461004
MFSD2B
cg02687055
cg23220551
POU4F1
cg27398263
cg09010671
QRFPR
cg19971716
cg13643914
RUNX3
cg06377278
cg11018723
SHISA3
cg06269673
cg16862295
SLC30A10
cg17721710
cg25317664
cg01565320
cg20776829
cg08708229
cg02192967
cg00119181
STK33
cg08788717
cg00450824
cg00393798 cg20568196
cg26281453
cg09775582
TM6SF1
cg26460092
cg03063639
TMEM155
cg10863741
cg04638468
TMEM22
cg22304507
cg18740893
cg27363327
cg16620382
VWC2
cg01893212
cg02467990
ZNF132
cg12042659
TE D
TTBK1
cg01423964
cg14866200
SNCA TBX21
cg13930300
cg20813081
M AN U
LOC100287216
cg07678904
SC
Gene Symbol
RI PT
Epi-LumB associated CpG island promoter methylation events (listing only the top 30 significantly associated genes; only CpG island associated)
cg03735888
Epi-Basal associated CpG promoter methylation events (listing all the 8 significantly associated gene promoter regions; CpG island or non-island associated) TargetID´s
FAM198B
cg03304437
cg24811290
cg12543949
cg03450635
MARCH11
cg25092681
cg00339556
cg01791874
cg17030173
cg17712694
PCDHB7
cg24168884
cg25412979
cg08048222
cg24016939
EP
Gene Symbol
cg22184507
cg11011625
TENC1
cg17094065
cg18037834
ZNF257
cg24175803
cg02912127
ZNF486 ZNF671
AC C
PCDHGA1
cg07242860
cg03570035
cg07227744
cg24749688
cg12074025
cg19246110
cg11977686
cg16150752
cg21901718