Article
AML Subtype Is a Major Determinant of the Association between Prognostic Gene Expression Signatures and Their Clinical Significance Graphical Abstract
Authors Caroline R.M. Wiggers, Mirna L. Baak, Edwin Sonneveld, Edward E.S. Nieuwenhuis, Marije Bartels, Menno P. Creyghton
Correspondence
[email protected] (M.B.),
[email protected] (M.P.C.)
In Brief Predicting relapse risk using prognostic gene expression signatures can help direct therapeutic strategies in the management of malignancies. Wiggers et al. demonstrate that a prognostic signature that predicts relapse risk across AML patients contains subsets of genes that are valid within defined patient subgroups only and do not transcend subgroups.
Highlights d
Regulatory DNA and expression analysis predict relapse risk in AML patients
d
Both patients and prognostic signatures are composed of defined subsets
d
Analyzing gene and patient subsets separately enhances prognostic performance
d
Prognostic gene subsets are valid exclusively in defined patient subgroups
Wiggers et al., 2019, Cell Reports 28, 2866–2877 September 10, 2019 ª 2019 The Author(s). https://doi.org/10.1016/j.celrep.2019.08.012
Cell Reports
Article AML Subtype Is a Major Determinant of the Association between Prognostic Gene Expression Signatures and Their Clinical Significance Caroline R.M. Wiggers,1,2 Mirna L. Baak,1 Edwin Sonneveld,3,4 Edward E.S. Nieuwenhuis,2 Marije Bartels,2,* and Menno P. Creyghton1,5,* 1Hubrecht
Institute-KNAW & University Medical Center Utrecht, 3584 CT Utrecht, the Netherlands of Pediatric Hematology, University Medical Center Utrecht, 3584 EA Utrecht, the Netherlands 3Prinsess Ma ´ xima Center for Pediatric Oncology, 3584 CS Utrecht, the Netherlands 4Dutch Childhood Oncology Group (DCOG), Princess Ma ´ xima Center for Pediatric Oncology, 3584 CS Utrecht, the Netherlands 5Lead Contact *Correspondence:
[email protected] (M.B.),
[email protected] (M.P.C.) https://doi.org/10.1016/j.celrep.2019.08.012 2Department
SUMMARY
Relapse in acute myeloid leukemia (AML) may result from variable genetic origins or convergence on common biological processes. Exploiting the specificity and sensitivity of regulatory DNA, we analyze patient samples of multiple clinical outcomes covering various AML molecular subtypes. We uncover regulatory variation among patients translating into a transcriptional signature that predicts relapse risk. In addition, we find clusters of coexpressed genes within this signature selectively link to relapse risk in distinct patient subgroups defined by molecular subtype or AML maturation. Analyzing these gene clusters and the AML subtypes separately enhances their prognostic value substantially and provides insight in the mechanisms underlying relapse risk across the distinct patient subgroups. We propose that prognostic gene expression signatures in AML are valid only within patient subgroups and do not transcend these subgroups. INTRODUCTION Acute myeloid leukemia (AML) is a heterogeneous disease affecting both adult and pediatric patients in which variable genetic aberrations lead to the expansion of undifferentiated myeloid cells at the expense of the hematopoietic system (Bolouri et al., 2018; Tarlock and Meshinchi, 2015). Although initial chemotherapeutical regimens induce efficient remissions, relapse rates are high, the cause of which is poorly understood (de Rooij et al., 2015; Gamis et al., 2013). Evidence exists for the selection of rare chemotherapy-resistant subpopulations during treatment in adult AML, but in other cases, therapy resistance may be prevalent in a larger proportion of blast cells at initial diagnosis (Ding et al., 2012; Jan et al., 2012; Li et al., 2016; Parkin et al., 2013). This could underlie some paucity in identifying consistent secondary mutations in patients that may explain their propensity to relapse (Ferrando and Lo´pez-Otı´n,
2017; Landau et al., 2014). In adult patients, relapse risk can be linked to overrepresented gene sets with stem cell-like properties, which may point to a link between the relapse risk and the developmental state from which the AML emerged (Eppert et al., 2011; Gentles et al., 2010; Levine et al., 2015; Ng et al., 2016). Nevertheless, these analyses have not yielded comprehensive insights into the processes underlying relapse potential and have yielded few viable targets for intervention. Furthermore, by focusing on one expression signature that is predictive across all patients, these studies assume that the propensity to relapse converges on a single biological program that is the same for all patients. A more sensitive approach to assessing cell state changes, compared with analyzing gene expression only, is the analysis of the gene regulatory elements (GREs) underlying these gene expression networks (Corces et al., 2016). This is largely because of the unique cell-specific activity of these elements (Kundaje et al., 2015). Direct analysis of GREs provides additional insight into the hierarchy underlying gene expression changes. As such, mutations in non-coding GREs are progressively being recognized as drivers of various diseases, thereby substantially expanding the mutational space beyond genes (Herz et al., 2014; Hnisz et al., 2015; Mansour et al., 2014). Thus, mutations and/or altered regulation of GREs may underlie some cases in which mutations in genes alone cannot reliably account for the progression of AML into relapse (Hnisz et al., 2013; Kundaje et al., 2015; Lee and Young, 2013). Activity of GREs can be predicted genome-wide by using histone signatures as a proxy (Calo and Wysocka, 2013). In the current study, we have employed large-scale epigenomic analysis to determine GRE activity in a set of pediatric patient samples matching initial diagnosis and relapse samples, as well as samples from patients that did not relapse. Following integration of these data with transcriptomic analysis, we were able to link GREs that changed in relapsed patients to an expression signature that was predictive of relapse risk in pediatric and adult AML. Coexpression analysis suggests that this signature is a mixture of smaller signatures that are exclusively relevant for specific subgroups of patients. Our data provide insight into the regulatory basis of prognostic gene expression
2866 Cell Reports 28, 2866–2877, September 10, 2019 ª 2019 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
B
A
C 1500
100
Pediatric AML at initial diagnosis Relapse state RP-Dx
Dimension 2
RP-Rel
chemotherapy
remission
NRP
500
Mol. subtype NK
0
NK (Flt3-ITD) Other (Flt3-ITD)
−500
CBFa-ETO CBFb-MYH11
non-relapse
relapse
80
% variance explained
1000
60
40
20
MLL-AF9
−1000
MLL-ENL
0
500
1000
1500
E
4 kB
F
nt
oi
5 5 5 5 5 5 5 5 5
27
25
26
1.0
22
0.7
5
23 24 23 24 27
5
21
2,387 enriched regions
4 kB
114kB
16
0
(99,511,133 − 99,626,029)
20
rpm
4 kB
chr10
0.6
NK (Flt3-ITD) MLL NRP RP NRP RP
MLL NK(Flt3-ITD) CBF
CBF NRP RP
D
Ti
m
ep
R
el
ap
Dimension 1
se st a (D te xM R ol . s el) ub t FA ype Bcl a Fl ss t3 -IT D N PM 1 W T1 R R AS es id ua l
0
−1000 −500
RP-Dx
5
RP-Rel
NKX2-3
SLC25A28
Figure 1. Regulatory Variation in AML Patients Primarily Results from Molecular Subtype Variability (A) Leukemic blast cells were isolated from pediatric AML patients at initial diagnosis and, in the case of relapsed patients, at relapse. (B) t-SNE analysis of 36 pediatric AML patient samples based on 62,284 H3K27ac-enriched regions. Shape represents relapse state, where matched diagnosis (Dx) and relapse (Rel) samples are connected by dashed lines. Color coding represents the molecular subtypes. (C) Contribution of multiple covariates assessed by a multivariate linear regression using ANOVA on all 62,284 H3K27ac-enriched regions across all 36 patient samples. (D) Heatmaps of reads per million (RPM)-normalized H3K27ac read counts at regions that were differentially enriched among common AML molecular subtypes, including MLL translocations, CBF translocations, and NK-(Flt3-ITD) patient samples (4 samples merged for each subtype). (E) RPM-normalized ChIP-seq tracks (axis limit 5) for MLL-rearranged patient samples (red), NK-(Flt3-ITD) patient samples (blue), and CBF-rearranged patient samples (green) for the locus surrounding NKX2-3. (F) Hierarchical clustering (Pearson correlation) of 18 patient samples (9 relapsed patients at Dx and at Rel) based on normalized H3K27ac enrichment of 50,364 H3K27ac-enriched regions. Left dendrogram shows hierarchical clustering (Pearson correlation) of patient samples. Numbers indicate patient numbers (Table S1), colored boxes at the bottom indicate relapse state and molecular subtypes as shown in (B). See also Figure S1 and Table S1.
signatures while suggesting that their prognostic capability can best be realized by defining and analyzing patient subgroups separately. RESULTS Regulatory Variation in AML Patients Primarily Results from Molecular Subtype Variability To analyze the regulatory landscape of blast populations in pediatric AML patients, we collected 36 patient samples covering
common AML molecular aberrations (Table S1). We included 15 non-relapsed patients (NRPs) and 12 relapsed patients (RPs), of which we matched samples taken at initial diagnosis (Dx) and at relapse (Rel) for 9 RPs (Figure 1A; Table S1). We generated whole-genome chromatin immunoprecipitation sequencing (ChIP-seq) data for histone 3 lysine 27 acetylation (H3K27ac) to analyze activity changes at GREs among patients (Table S1). This is a robust and widely used assay to identify active GREs on a global scale (Arnold et al., 2013; Bonn et al., 2012; Creyghton et al., 2010; Nord et al., 2013; Rada-Iglesias
Cell Reports 28, 2866–2877, September 10, 2019 2867
et al., 2011; Vermunt et al., 2014; Villar et al., 2015). In addition, we generated RNA sequencing (RNA-seq) data covering a selection of patient samples (Table S1). By combining all H3K27acenriched regions, we prioritized a set of 62,284 enriched regions (ERs) across all samples, including promoters and enhancers, that were found in at least 2 patient samples (Figure S1A). As expected, genes near active regulatory elements were overall more active (Figure S1B). Similarity of the samples based on all active H3K27ac enriched regions was visualized in a t-Distributed Stochastic Neighbor Embedding (t-SNE) analysis (Figure 1B). We observed an overall separation of AML samples based on their molecular aberration with patients harboring a mixed-lineage leukemia (MLL) translocation, core binding factor (CBF) translocation, or normal karyotype (NK) with a fms-related tyrosine kinase 3-internal tandem duplication (Flt3-ITD) roughly coclustering. Most variation in the H3K27ac ChIP-seq dataset could be explained by molecular subgroups and maturation stage as defined by French-American-British (FAB) classification (Bennett et al., 1976) (Figure 1C). Significant differentially enriched (DE) regions between the main molecular aberrations (MLL rearranged, CBF rearranged, and NK with Flt3-ITD), as determined using DESeq2 analysis (false discovery rate [FDR] < 0.001, fold change [FC] > 2), included either uniquely activated or inactivated regions associated with each subtype (n = 2,387) (Figure 1D). For example, enrichment of NKX2-3 was specific for NK patients with Flt3-ITD, MEIS1 enrichment was observed in patients with a MLL translocation and NK patients with Flt3ITD, and an enhancer near KAT6A was significantly more enriched in CBF-rearranged samples only (Figure 1E; Figures S1C and S1D). In addition to the overall clustering based on genomic aberration, we observed that almost all matched DxRel samples were highly similar (Figures 1B and 1F). For one NK patient, patient 27, we did not observe high correlation between initial diagnoses and relapse, suggesting that clonal selection may have occurred after treatment (Figure 1F). This was supported by the presence of a distinct mutational profile present in the relapse sample (Table S1). Thus, major regulatory changes between initial diagnosis and relapse were not identified for most patients, while molecular subtype-specific differences were prominent. A Regulatory Signature Separates Relapsing from Nonrelapsing MLL-AF9 Patients When assessing the t-SNE analysis on all enriched regions, we observed a clear separation of MLL-AF9 patient samples that relapsed from MLL-AF9 patients that did not (Figure 1B). This was also observed for the other subtypes in a principalcomponent analysis (Figure S1E). Nonetheless, variation caused by molecular subtypes and maturation stage (FAB class) was larger than differences between patients that did and those that did not relapse (Figure 1C; Figure S1E). Because this variation can complicate the identification of relapse-linked regions, we first focused on one subtype. By analyzing only MLL-AF9 samples on all enriched regions, we found that most variation was between RPs and NRPs (principal component 1 [PC1], 34.5%) (Figure 2A; Figure S2A). Their direct comparison yielded 527 enriched regions that were DE
2868 Cell Reports 28, 2866–2877, September 10, 2019
(FDR < 0.001, FC > 2) between RPs and NRPs (Figure 2B). Focusing on these regions only revealed that most of their variation was related to relapse state (Figure S2B). For instance, strong H3K27ac enrichment at the BMI1 proto-oncogene, a regulator of self-renewal and stem cell maintenance (Schuringa and Vellenga, 2010), was observed in MLL-AF9 RP samples (Figure 2C). Gene expression analysis confirmed expression changes of linked genes (Table S2), correlating with changes in H3K27ac enrichment at GREs (p < 0.001, Wilcoxon rank-sum test) (Figure 2D). This was also observed in independent RNA-seq data (Therapeutically Applicable Research to Generate Effective Treatments [TARGET] AML study) (Bolouri et al., 2018) covering an additional 16 MLLAF9 patients (p < 0.001, Wilcoxon rank-sum test) (Figures 2D and 2E). Functional analysis of these genes demonstrated an overrepresentation of proto-oncogenes, as well as genes involved in the regulation of cell differentiation (Figure 2F). Consistent with their function, the expression of relapse genes was increased in hematopoietic stem cells (HSCs) compared with monocytes, and H3K27ac enrichment, as well as DNA accessibility, of their linked GREs was enhanced (Figures S2C–S2E). Thus, our analysis uncovers regulatory changes near early developmental genes and relapse potential in MLL-AF9 patients. GREs Linked to Relapse in MLL-Rearranged Patients Associate with Relapse across Molecular Subtypes To investigate our signature in patients carrying other molecular aberrations, we refined our dataset to GREs that were also enriched in at least two of the other major molecular subtypes of AML, including NK with Flt3-ITD and CBF-related subtypes. The remaining 307 regions could similarly differentiate between relapse and non-relapse samples harboring other molecular subtypes (Figure 3A; Figure S3A). Expression analysis leveraging available RNA-seq data from the TARGET AML study (Bolouri et al., 2018) using 20 patient samples equally distributed between RPs and NRPs for both NK with Flt3-ITD (n = 10) and CBF translocations (n = 10) confirmed that the genes near relapse GREs similarly changed (Figure 3B). Therefore, we focused on the relapse genes (n = 165) for further analysis (Table S2). Several of these genes coded for transcription factors that had their binding motifs overrepresented at relapse GREs. These include the Wnt regulator and proto-oncogene TCF4, the cellcycle regulator TFDP1, and ZNF563 (Table S3). In addition, these genes showed significant overlap with genes more highly expressed in leukemic stem cell (LSC) fractions (Gentles et al., 2010; Ng et al., 2016; Pabst et al., 2016) (Figure S3B). Functional analysis of relapse genes revealed that they were still disproportionally linked to developmental regulators, transcription factors, proto-oncogenes, and genes involved in signaling cascades, including Wnt signaling (Figure S3C). For instance, increased enrichment at the SPHK1 gene, which is important for survival in several cancers, including AML (Ogretmen, 2018; Yang et al., 2015), was observed in RPs across other molecular subtypes (Figure 3C). Thus, we identified a gene expression program based on GRE enrichment that is changed in RPs across molecular subtypes.
A
B
PC1: 34.5% −500
−250
0
250
4 kB
500
4 kB
4 kB
4 kB
4 kB
4 kB
4 kB
4 kB
4 kB
1.5 rpm 0 MLL-AF9 RP−Dx
MLL-AF9 RP−Rel
215 ER
MLL-AF9 NRP
C chr10
(22,304,170 − 22,332,036)
27kB
312 ER
RP
6 6 6 6
NRP
6 6 6 6 6
COMMD3
BMI1
MLL-AF9 RP−Dx
MLL-AF9 NRP
1.5
0
NRP genes
RP
0
NRP
High in NRP
Non-relapse genes
2
***
all genes
0.0
2
NRP
−2
5
High in RP
RP
TARGET AML samples
RP genes
0.4
RP
***
−2
pvalue < 0.001 FDR < 0.001 NES = 2.55
0.0
pvalue < 0.001 FDR < 0.001 NES = -2.41
-0.5
High in RP
0
Semantic space y
0
NRP
all genes
Relapse genes RP
***
−1.5
NRP genes
3
***
RP genes
all genes
0
NRP
DCOG samples
all genes
Log2 fold change
MLL-AF9 RP-Rel
F
Enrichment score
−3
E
Enrichment score
D
Differentiation
Cellular process
Transcription
Signaling
Histone/protein modification
Cell growth Cell migration
cellular cell component differentiation chromatin organization hematopoietic modification development cell migration leukocyte immune differentiation metabolism system process
protein ubiquitination kinase activity
0
−5
cell proliferation RNA polymerase II transcription
10000
20000
MAPK cascade
signal transduction by protein phosphorylation sphingosine-1-phosphate
JAK-STAT
High in NRP
Gene rank in dataset
cell growth developmental process
histone protein methylation methylation
−5
tgf-beta egf-receptor
0
5
Semantic space x
Figure 2. A Regulatory Signature Separates Relapsing from Non-relapsing MLL-AF9 Patients (A) Principal-component analysis (PC1, 34.5%) on reads per kilobase per million (RPKM)-normalized H3K27ac enrichment of 62,284-enriched regions in 9 MLLAF9 patient samples. Shape represents relapse state. (B) Heatmaps of RPM-normalized H3K27ac reads at differentially enriched regions (n = 527) between MLL-AF9 relapsed patients and non-relapsed patients. Shape represents relapse state, including relapsed patient samples taken at diagnosis (RP-Dx), at relapse (RP-Rel) and non-relapsed patients (NRPs). (C) RPM-normalized ChIP-seq tracks (axis limit 6) for relapsed and non-relapsed MLL-AF9 patient samples for the locus surrounding the proto-oncogene BMI1. (D) Log2 fold changes in expression (derived from DESeq2) of relapse and non-relapse genes between MLL-AF9 RP and MLL-AF9 NRP samples coming from our dataset (top, Dutch Childhood Oncology Group [DCOG], 2 RPs versus 2 NRPs) (Table S1), as well as from public data (bottom, TARGET AML study, 8 RPs versus 8 NRPs). Red and blue shading indicate genes more highly expressed in RPs or NRPs, respectively. Outliers are not plotted. Dissimilarity among distributions was calculated using a Wilcoxon rank-sum test (***p < 0.001). (E) Gene set enrichment analysis (GSEA, Broad Institute) of relapse genes (top) and non-relapse genes (bottom) of DCOG and TARGET AML samples (10 RPs versus 10 NRPs). (F) Overrepresented functional categories of 527 relapse and non-relapse genes analyzed using Database for Annotation, Visualization, and Integrated Discovery (DAVID). Functional categories are clustered and visualized based on similarity using Reduce + Visualize Gene Ontology (REVIGO) in a two-dimensional annotation space (semantic space). Similar categories are clustered together and colored. The size of the circles represents the p value derived from DAVID, where larger circles represent lower p values. See also Figure S2 and Tables S1 and S2.
Cell Reports 28, 2866–2877, September 10, 2019 2869
A
C chr17
RP
RP-Dx
20
MLL
RP-Rel
NRP
Mol. subtype
4 4 4 4
−20
NK (Flt3-ITD) Other (Flt3-ITD)
−30
0
30
RP
NK (Flt3-ITD)
CBFb-MYH11 NK
1
2
−2
−1
0
1
2
* ***
NRP
RP
NRP
CBF
ns
***
all genes RP genes
CBF
0
RP
4
Log2 fold change
NRP
−1
4
60
Log2 fold change −2
4 4
PC1: 34.9%
B
NRP genes
12kB
CBFa-ETO
NRP
PC2: 12.9%
NRP
0
(76,377,925 − 76,389,987)
4
Relapse state
4 4 4
RP
NK (Flt3-ITD)
SPHK1
Figure 3. GREs Linked to Relapse in MLL-Rearranged Patients Associate with Relapse across Molecular Subtypes (A) Principal-component analysis of non-MLL-rearranged pediatric AML patient samples based on 307 H3K27ac-enriched regions. Relapse state is represented by the shape (triangle, square, or circle), where matched diagnosis (Dx) and relapse (Rel) samples are connected by dashed lines. Color coding represents molecular subtypes. (B) Log2 fold changes in expression (derived from DESeq2) of relapse and non-relapse genes between RPs and NRPs harboring a CBF rearrangement or NK with Flt3-ITD (TARGET AML, 8 RPs versus 8 NRPs). Red and blue shading indicate genes more highly expressed in RPs or NRPs, respectively. Outliers are not plotted. Dissimilarity among distributions was calculated using a Wilcoxon rank-sum test (***p < 0.001). ns, not significant. (C) RPM-normalized ChIP-seq tracks for MLL-rearranged patient samples (red), NK-(Flt3-ITD) patient samples (blue), and CBF-related patient samples (green) for the locus surrounding the SPHK1 gene. See also Figure S3 and Tables S1, S2, and S3.
Genes Linked to Relapse GREs Are Predictive of Clinical Outcome in Pediatric and Adult AML Patients To analyze whether our gene expression signature is predictive of relapse and clinically relevant, we investigated the relationship between the expression of our gene set and the relapse risk in 231 pediatric AML patient samples, consisting of 82 RPs and 149 NRPs, for which clinical information and gene expression data were available (TARGET AML) (Bolouri et al., 2018). We calculated relapse scores based on genes linked to relapse GREs for each patient (see STAR Methods) and divided them into two groups below and above the median score as shown previously (Eppert et al., 2011). A significant difference in patient outcome (relapsed or not relapsed) (p = 0.004 logrank test, hazard ratio [HR] = 1.6; 95% confidence interval [CI], 1.2–2.2) was observed, with patients with a low relapse score relapsing less frequently than patients with a high relapse score despite their mixed genetic abnormalities (Figure 4A). Multivariate analysis demonstrated that this prediction of relapse risk did not depend on known risk factors (Table 1), although we did see some significant associations of individual mutations with a high relapse score (Figure 4B). Furthermore, our signature was still associated with increased risk of relapse in a multivariate analysis with known risk factors and a previously established prognostic gene signature (LSC17; Ng et al., 2016) (Table S4).
2870 Cell Reports 28, 2866–2877, September 10, 2019
To expand our analysis, we investigated the link between our signature and clinical outcome in adult AML. Similar to pediatric data, our signature was significantly associated with relapse risk in adult data of mixed genetic abnormalities (Verhaak et al., 2009) (p = 0.005 log-rank test, HR = 1.5; 95% CI, 1.1–1.9) (Figure 4C; Table S5) and in adult data of patients with NK (Metzeler et al., 2008) (p < 0.001 log-rank test, HR = 2.18; 95% CI, 1.44–3.31) (Figure 4D; Table S6). Thus, our findings based on analyzing GREs yield a prognostic gene set for relapse risk that has relevance across patients and studies. Furthermore, our GRE data provide additional information on the underlying regulatory circuitry. Expression of Coregulated Relapse Genes Is Variable across Distinct AML Subgroups Although our relapse signature is clinically relevant upon analysis of all AML patients combined, it was not equally relevant for the molecular subgroups when analyzed separately (Figure S4A). Similar observations were made when analyzing a distinct prognostic signature (LSC17) (Figure S4B). To better understand the cause-consequence relationships of the genes within this set and the link to relapse risk, we performed a robust weighted gene coexpression network analysis (WGCNA) to generate clusters of genes that share a similar expression pattern across pediatric AML patient samples (n = 231)
A
C
Pediatric AML, TARGET AML (n=231)
1.00
High relapse score
Low relapse score
0.25
Relapse free survival
Relapse free survival
0.50
0.75
0.50
0.25
0.75
0.50
0.25 p < 0.001
p = 0.005
p = 0.004
0.00
0.00
0.00
1
High relapse score
Low relapse score
0.75
0
Adult AML, GSE12417 (NK) (n=123)
1.00
High relapse score
Low relapse score
Relapse free survival
D
Adult AML, GSE6891 (n=349)
1.00
2
3
4
5
0
1
Time (years)
2
3
4
5
6
0
1
Time (years)
2
3
4
5
Time (years)
B Mutations Cytogenetic aberrations
Relapse score Relapse state
−3 0 3 Log2 odds ratio
MLL t(8;21) inv(16) del7q del9q trisomy 8 trisomy 21 minus Y minus X complex cytogenetics Flt3-ITD Flt3-PM NPM1
**
Relapse score Low relapse score High relapse score
Relapse state Relapsed
* *
CEBPA WT1 C-KIT
Non-relapsed
Aberration Present Absent Not diagnosed
Figure 4. Genes Linked to Relapse GREs Are Predictive of Clinical Outcome in Pediatric and Adult AML Patients (A) Kaplan-Meier relapse-free survival analysis of 231 pediatric AML patient samples (TARGET AML study, 149 RPs and 82 NRPs). Samples were separated based on low or high relapse score using the median score as a cutoff (p = 0.004, log-rank test). (B) Heatmap depicting cytogenetic aberrations and mutations in pediatric AML patients separated by low or high relapse score. Odds ratios of significant associations between aberrations and high relapse scores (Fisher exact test, **p < 0.01, *p < 0.05) are shown in the right-hand column. (C and D) Kaplan-Meier relapse-free survival analysis in adult AML of mixed genetic abnormalities (GEO: GSE6891, n = 349) (C) and adult patients with NK (GEO: GSE12417, n = 123) (D). Samples were separated based on low or high relapse score using the median score as a cutoff (p = 0.005, C, and p < 0.001, D, log-rank test). See also Figure S4 and Tables S2 and S4–S6.
(Figure S5A). We identified 12 distinct clusters of genes that coexpress across samples, of which six clusters contained more than five coregulated relapse genes (Figure S5B; Table S2). We focused on these six clusters for further analysis (Figure 5A). To test whether the clusters represented particular patient characteristics, we analyzed these in relation to expression variation across patients. We found that only a small proportion of the variance could be linked to molecular subtype and to maturation stage, with the rest of the variation unexplained (Figures S5C and S5D). This suggests that the clusters do not directly represent one of the tested characteristics. Testing the relapse genes separated by their clusters for their prognostic value across patients revealed that the prognostic relevance was markedly reduced for all clusters, leaving only cluster 1 with a significant HR (Figure 5B). However, expression variation of relapse genes in cluster 1 was still better explained by molecular subtype and maturation than by relapse risk (Figure S6). This raises the possibility that the prognostic value of the relapse genes separated in distinct clusters could partly depend on molecular aberration and maturation.
Coregulated Relapse Genes Are Heterogeneous in Predicting Relapse Risk across AML Subgroups To explore whether the gene expression variability across patient groups was consequential for their prognostic value, we tested the clinical relevance of the relapse genes in the clusters in separate patient groups (i.e., separated by molecular aberration, maturation stage, and molecular risk). We found that isolating specific patient sets for the analysis considerably increased the prognostic performance of the relapse genes in the clusters, with some clusters predicting relapse risk exclusively in one AML subgroup (Figures 5C–5E; Figure S7A). For instance, cluster 9 was highly predictive of relapse risk in MLLrearranged patients and patients classified as FAB-M5 (monocytic leukemia), but not for M0/M1 FAB-classified AML patients (Figures 5D and 5F). This cluster contains several genes, including DNA methyltransferases DNMT3A and DNMT3B. DNMT3A is found mutated in 50% of adult patients with monocytic AML FAB-M5 (Ley et al., 2010). Relapse genes in cluster 9 were also significantly associated with increased relapse risk in M5-classified adult patients (Figure S7B), even though
Cell Reports 28, 2866–2877, September 10, 2019 2871
Table 1. Univariate and Multivariate Cox Regression Analysis of Relapse Score in Pediatric AML (TARGET AML Study) Multivariate Analysis Continuous Relapse Scorea
Univariate Analysis
Multivariate Analysis Dichotomous Relapse Scoreb
Relapse-free Survival
Hazard Ratio (95% CI)
p Value
Hazard Ratio (95% CI)
p Value
Hazard Ratio (95% CI)
p Value
Relapse scorea
1.31 (1.07–1.62)
0.011
1.38 (1.13–1.68)
0.002
NA
NA
High relapse scoreb
1.59 (1.15–2.21)
0.004
NA
NA
1.63 (1.15–2.28)
0.005
CBF rearrangedc
0.75 (0.53–1.06)
0.102
0.43 (0.27–0.68)
<0.001
0.46 (0.29–0.73)
0.001
Complex cytogenetics
1.41 (0.83–2.40)
0.211
1.13 (0.65–1.97)
0.658
1.20 (0.69–2.08)
0.517
Flt3-ITDc
1.43 (0.92–2.23)
0.114
1.19 (0.72–1.97)
0.505
1.23 (0.74–2.04)
0.424
NPM1c
0.40 (0.18–0.91)
0.029
0.22 (0.09–0.52)
<0.001
0.23 (0.10–0.55)
<0.001
c
0.23 (0.08–0.72)
0.012
0.16 (0.05–0.52)
0.002
0.18 (0.06–0.58)
0.004
WT1c
1.35 (0.67–2.77)
0.401
1.25 (0.57–2.70)
0.579
1.21 (0.55–2.64)
0.635
cKITc
1.25 (0.77–2.03)
0.364
1.75 (0.95–3.22)
0.071
1.72 (0.94–3.15)
0.081
Genderd
0.77 (0.56–1.07)
0.118
0.81 (0.58–1.13)
0.210
0.81 (0.58–1.13)
0.206
Age (days)
1.00 (1.00–1.00)
0.960
1.00 (1.00–1.00)
0.382
1.00 (0.99–1.00)
0.299
CEBPA
CI, confidence interval; NA, not applicable. a Relapse scores per interquartile range. b Low versus high relapse score. c Mutated versus wild type. d Male versus female.
monocytic AML FAB-M5 is dominated by MLL translocations in pediatric patients while it is dominated by patients with NK in adults (Figure S7C). This demonstrates that the link between cluster 9 and relapse risk in patients with monocytic AML is related not to the molecular aberration but to the maturation stage. In contrast to cluster 9, cluster 1 is predictive of relapse risk in patients with myeloblastic leukemia, which are classified according to the FAB classification as M0, M1, or M2 but is not predictive for M4-classified AML (Figures 5E and 5G). The relapse genes in this cluster were overrepresented in Wnt signaling genes (Figure S7D) and contain several genes previously linked to overall survival in adult patients, including TCF4 (in ‘t Hout et al., 2014) and GATA2, the latter of which is also a poor prognostic marker that is found mutated in pediatric AML (Bolouri et al., 2018; Luesink et al., 2012). Although both cluster 1 and cluster 3 were composed of relapse genes that were enriched in HSCs compared with monocytes (Figure S8A), and relapse scores were higher in immature AML compared with mature AML (Figure S8B), their clinical relevance is specific to distinct patient groups (Figures 5C and 5E; Figure S7A). Likewise, for many other clusters, there was no clear relationship between the expression of the genes within a cluster and the patient group for which the cluster was predictive of relapse (Figures S8B–S8J). For instance, relapse scores in cluster 9 were selectively lower for patients with CBFa-MYH11 (inv(16)), while this cluster was only predictive in MLL-rearranged patients (Figure S8C). Nonetheless, relapse scores in cluster 2 were particularly high in high-molecular-risk samples, for which this cluster was also predictive of clinical outcome (Figure S8D). However, this was not a feature of the entire cluster, because all genes present in the cluster were not selectively elevated in high-molecular-risk samples, suggesting that the clusters are not direct
2872 Cell Reports 28, 2866–2877, September 10, 2019
representations of the patient subgroups for which they are predictive of relapse (Figure S5). Altogether, our results demonstrate that the association between increased relapse risk and our signature is a mixture of associations between clusters of the signature and AML samples with specific biological features. DISCUSSION Differential expression of genes in blast populations among AML patients is a likely contributor to the variable clinical outcome of these patients and of potential prognostic use to make patientspecific predictions that can be used to refine therapy (Eppert et al., 2011; Gentles et al., 2010; Levine et al., 2015; Ng et al., 2016). Nonetheless, translating these expression signatures to the clinic has been difficult. Our analysis demonstrates that the expression of the genes within these networks, while linked overall to relapse potential across patients, was variable among groups of patients as defined by molecular or cellular characteristics. We observe that sets of genes that were coregulated across all samples varied in their ability to predict relapse risk dependent on which subset of patients was analyzed (i.e., molecular aberration, FAB classification, or molecular risk). This demonstrates that subsets of the relapse genes are prognostic in a subgroup-specific manner. As such, these coregulated subsignatures were more efficient in predicting relapse risk when analyzing specific patient populations. One subgroup is defined based on the maturation stage of the AML blast cells by using FAB classification. This classification has been phased out and replaced by the World Health Organization (WHO) classification, which has greater prognostic value (Walter et al., 2013). However, our data show the prognostic capacity for a specific gene cluster in a FAB subgroup that is
(legend on next page)
Cell Reports 28, 2866–2877, September 10, 2019 2873
independent of the underlying molecular aberration. This cluster of relapse genes (cluster 9) was predictive of relapse in monoblastic AML-M5 samples in both adult and pediatric patients, while in pediatric patients, these are dominated by MLL translocations as opposed to in adults, for whom MLL translocations are rare. Thus, we propose that phasing out the maturation stage may be premature, because this information may help assign a patient to a more specific AML subgroup. Our data demonstrate that the gene set identified by directly comparing RPs versus NRPs is a heterogeneous mixture of smaller sets of genes that only predict relapse risk in particular subgroups of patients. They also explain why so many samples are typically needed to generate prognostic signatures using gene expression analysis (Eppert et al., 2011; Gentles et al., 2010; Levine et al., 2015; Ng et al., 2016). Furthermore, they demonstrate that efforts to develop one gene expression signature that will predict clinical outcome in all patient subgroups are unlikely to succeed. A more direct consequence of this heterogeneity is that variable intervention strategies may be needed toward patient-specific features of the blast population. For instance, relapse genes in cluster 1, which predicted relapse risk in myeloblastic leukemia (FAB-M0/M1/M2), were enriched in Wnt signaling genes, including TCF4, GATA2, and YES1. Genome sequencing and DNA methylation analysis suggest frequent mutations or deregulation of Wnt signaling genes in AML (Bolouri et al., 2018; Valencia et al., 2009), and while LSCs critically depend on Wnt signaling for their self-renewal, normal HSCs are not (Wang et al., 2010). Furthermore, YES1, a kinase closely linked to cancer progression and therapy resistance, has been shown to cooperate with the Wnt signaling component b-catenin in oncogenesis (Rosenbluh et al., 2012; Touil et al., 2014). This suggests that limiting Wnt signaling may be of particular benefit for this patient group. In contrast, relapse genes in cluster 9 were highly predictive of relapse risk in FAB-M5 regardless of age or the underlying molecular aberration (predominantly MLL rearrangements in pediatric patients and NK in adults). This cluster contained both DNMT3B and DNMT3A, the latter of which is often found mutated in patients with FAB-M5 (Ley et al., 2010). It will be interesting to explore how expression changes in DNMT3A and DNMT3B relate to the DNMT3A mutations in patients that are typically loss-of-function mutations. It is possible that expression changes reflect a feedback loop in an attempt to compen-
sate for loss of function. Alternatively, mere deregulation of DNA methylation by either gain or loss of DNMT3A activity may render a cell epigenetically instable and as such more likely to yield variants that escape therapy. Our data provide an important starting point from which patient group-specific analyses may be further explored. Nonetheless, although our data demonstrate that the identification of better-defined subcategories should be a priority, it is unclear how many subgroups need to be characterized to yield clinically useful prognostic gene expression signatures. Furthermore, because treatment intensity varies depending on karyotyping and clinical behavior, and clinical outcome depends on treatment regimen, the predictive value of specific gene expression signatures on clinical outcome across AML subtypes in relation to treatment intensity needs to be further evaluated. More extensive analysis of larger patient groups will have to be performed to identify all potential subgroups that are relevant for prognostic purposes. STAR+METHODS Detailed methods are provided in the online version of this paper and include the following: d d d d
d
KEY RESOURCES TABLE LEAD CONTACT AND MATERIALS AVAILABILITY EXPERIMENTAL MODEL AND SUBJECT DETAILS B Pediatric AML patient samples METHOD DETAILS B Handling of AML patient samples B Chromatin Immunoprecipitation-sequencing B ChIP-Sequencing alignment and read counting B Hierarchical clustering, t-SNE and PCA B RNA-sequencing B RNA-Sequencing alignment B Analysis of changes in gene expression B Gene ontology analysis B Transcription factor motif enrichment B Relapse scores and relapse free survival analysis B Weighted gene co-expression analysis (WGCNA) QUANTIFICATION AND STATISTICAL ANALYSIS B Analysis fraction of variance B Determination of differentially enriched regions B Gene expression analysis
Figure 5. Coregulated Relapse Genes Are Heterogeneous in Predicting Relapse Risk across AML Subgroups (A) Six clusters derived from WGCNA contained more than 5 relapse genes and are used for further analysis. Networks are visualized using Cytoscape. (B) Hazard ratios of the association between increased relapse risk and high relapse scores based on relapse genes present in network clusters in the pediatric AML dataset (TARGET AML) (*p < 0.05, log-rank test). Mol. risk, molecular risk; mol. subtype, molecular subtype. (C) Visualization of significant associations between increased relapse risk (p < 0.05, log-rank test) and each of the network clusters containing more than 5 relapse genes. Colored pieces of the pie chart represent significant associations among high relapse scores with an increased risk of relapse in that particular subgroup. See hazard ratios of cluster 9 in Figure 5D, cluster 1 in Figure 5E, and clusters 2–4 and 7 in Figure S7A. LMR, low molecular risk; SMR, standard molecular risk; HMR, high molecular risk. (D and E) Hazard ratios of relapse scores in cluster 9 (D) and cluster 1 (E) in separate AML patient groups. Hazard ratios were calculated using dichotomous relapse scores. Asterisks indicate a significant association of increased relapse risk with a high relapse score (log-rank test, ***p < 0.001, **p < 0.01, *p < 0.05). (F and G) Kaplan-Meier relapse-free survival analysis for cluster 9 relapse scores for FAB-M0/M1 (left) and FAB-M5 (right) (F) and for cluster 1 for FAB-M0/M1 (left) and FAB-M4 (right) (G). Samples were separated based on low or high relapse score using the median score as a cutoff. The p value is derived using a log-rank test. See also Figures S5–S8 and Table S2.
2874 Cell Reports 28, 2866–2877, September 10, 2019
B
Overlap with known LSC-genes Relapse free survival analysis DATA AND CODE AVAILABILITY B
d
SUPPLEMENTAL INFORMATION Supplemental Information can be found online at https://doi.org/10.1016/j. celrep.2019.08.012. ACKNOWLEDGMENTS We thank the Utrecht Sequencing Facility for providing sequencing service and data. The Utrecht Sequencing Facility is subsidized by the University Medical Center Utrecht, Hubrecht Institute, and Utrecht University. C.R.M.W. and M.B. were supported by the Foundation Friends of the UMC Utrecht/Wilhelmina Children’s Hospital. M.P.C. was supported by a grant from the Dutch Cancer Foundation (KWF/HUBR 2012-5392). AUTHOR CONTRIBUTIONS M.P.C., M.B., and C.R.M.W. conceived the project. C.R.M.W. performed experiments. C.R.M.W. and M.L.B. analyzed data. C.R.M.W., M.B., and M.P.C. interpreted the data. E.S. collected patient samples. M.P.C., C.R.M.W., M.B., and E.E.S.N. wrote the manuscript. All authors reviewed and approved the manuscript. DECLARATION OF INTERESTS The authors declare no competing interests. Received: February 15, 2019 Revised: May 24, 2019 Accepted: July 31, 2019 Published: September 10, 2019 REFERENCES Anders, S., Pyl, P.T., and Huber, W. (2015). HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169. , L.M., Rath, M., and Stark, A. Arnold, C.D., Gerlach, D., Stelzer, C., Boryn (2013). Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077. Bennett, J.M., Catovsky, D., Daniel, M.T., Flandrin, G., Galton, D.A., Gralnick, H.R., and Sultan, C. (1976). Proposals for the classification of the acute leukaemias. French-American-British (FAB) co-operative group. Br. J. Haematol. 33, 451–458. Bolouri, H., Farrar, J.E., Triche, T., Jr., Ries, R.E., Lim, E.L., Alonzo, T.A., Ma, Y., Moore, R., Mungall, A.J., Marra, M.A., et al. (2018). The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions. Nat. Med. 24, 103–112. Bonn, S., Zinzen, R.P., Girardot, C., Gustafson, E.H., Perez-Gonzalez, A., Delski, B., Riddell, A., and Furlong, E.E. homme, N., Ghavi-Helm, Y., Wilczyn (2012). Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet. 44, 148–156. Calo, E., and Wysocka, J. (2013). Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837.
(2010). Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. USA 107, 21931– 21936. de Rooij, J.D., Zwaan, C.M., and van den Heuvel-Eibrink, M. (2015). Pediatric AML: From Biology to Clinical Management. J. Clin. Med. 4, 127–149. Ding, L., Ley, T.J., Larson, D.E., Miller, C.A., Koboldt, D.C., Welch, J.S., Ritchey, J.K., Young, M.A., Lamprecht, T., McLellan, M.D., et al. (2012). Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510. Eppert, K., Takenaka, K., Lechman, E.R., Waldron, L., Nilsson, B., van Galen, P., Metzeler, K.H., Poeppl, A., Ling, V., Beyene, J., et al. (2011). Stem cell gene expression programs influence clinical outcome in human leukemia. Nat. Med. 17, 1086–1093. Ferrando, A.A., and Lo´pez-Otı´n, C. (2017). Clonal evolution in leukemia. Nat. Med. 23, 1135–1145. Gamis, A.S., Alonzo, T.A., Perentesis, J.P., and Meshinchi, S.; COG Acute Myeloid Leukemia Committee (2013). Children’s Oncology Group’s 2013 blueprint for research: acute myeloid leukemia. Pediatr. Blood Cancer 60, 964–971. Gentles, A.J., Plevritis, S.K., Majeti, R., and Alizadeh, A.A. (2010). Association of a leukemic stem cell gene expression signature with clinical outcomes in acute myeloid leukemia. JAMA 304, 2706–2715. Herz, H.M., Hu, D., and Shilatifard, A. (2014). Enhancer malfunction in cancer. Mol. Cell 53, 859–866. Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-Andre´, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934–947. Hnisz, D., Schuijers, J., Lin, C.Y., Weintraub, A.S., Abraham, B.J., Lee, T.I., Bradner, J.E., and Young, R.A. (2015). Convergence of developmental and oncogenic signaling pathways at transcriptional super-enhancers. Mol. Cell 58, 362–370. Huang, D.W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57. in ’t Hout, F.E., van der Reijden, B.A., Monteferrario, D., Jansen, J.H., and Huls, G. (2014). High expression of transcription factor 4 (TCF4) is an independent adverse prognostic factor in acute myeloid leukemia that could guide treatment decisions. Haematologica 99, e257–e259. Jan, M., Snyder, T.M., Corces-Zimmerman, M.R., Vyas, P., Weissman, I.L., Quake, S.R., and Majeti, R. (2012). Clonal evolution of preleukemic hematopoietic stem cells precedes human acute myeloid leukemia. Sci. Transl. Med. 4, 149ra118. Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360. Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al.; Roadmap Epigenomics Consortium (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330. Landau, D.A., Carter, S.L., Getz, G., and Wu, C.J. (2014). Clonal evolution in hematological malignancies and therapeutic implications. Leukemia 28, 34–43. Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831.
Corces, M.R., Buenrostro, J.D., Wu, B., Greenside, P.G., Chan, S.M., Koenig, J.L., Snyder, M.P., Pritchard, J.K., Kundaje, A., Greenleaf, W.J., et al. (2016). Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203.
Langfelder, P., and Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559.
Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
Langfelder, P., Zhang, B., and Horvath, S. (2008). Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719–720.
Cell Reports 28, 2866–2877, September 10, 2019 2875
Lee, T.I., and Young, R.A. (2013). Transcriptional regulation and its misregulation in disease. Cell 152, 1237–1251. Levine, J.H., Simonds, E.F., Bendall, S.C., Davis, K.L., Amir, el-, A.D., Tadmor, M.D., Litvin, O., Fienberg, H.G., Jager, A., Zunder, E.R., et al. (2015). DataDriven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell 162, 184–197. Ley, T.J., Ding, L., Walter, M.J., McLellan, M.D., Lamprecht, T., Larson, D.E., Kandoth, C., Payton, J.E., Baty, J., Welch, J., et al. (2010). DNMT3A mutations in acute myeloid leukemia. N. Engl. J. Med. 363, 2424–2433. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
Rada-Iglesias, A., Bajpai, R., Swigut, T., Brugmann, S.A., Flynn, R.A., and Wysocka, J. (2011). A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283. Robinson, J.T., Thorvaldsdo´ttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat. Biotechnol. 29, 24–26. Rosenbluh, J., Nijhawan, D., Cox, A.G., Li, X., Neal, J.T., Schafer, E.J., Zack, T.I., Wang, X., Tsherniak, A., Schinzel, A.C., et al. (2012). b-Catenin-driven cancers require a YAP1 transcriptional complex for survival and tumorigenesis. Cell 151, 1457–1473. Schuringa, J.J., and Vellenga, E. (2010). Role of the polycomb group gene BMI1 in normal and leukemic hematopoietic stem and progenitor cells. Curr. Opin. Hematol. 17, 294–299.
Li, S., Garrett-Bakelman, F.E., Chung, S.S., Sanders, M.A., Hricik, T., Rapaport, F., Patel, J., Dillon, R., Vijay, P., Brown, A.L., et al. (2016). Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat. Med. 22, 792–799.
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.
Shen, L., Shao, N., Liu, X., and Nestler, E. (2014). ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics 15, 284.
Luesink, M., Hollink, I.H., van der Velden, V.H., Knops, R.H., Boezeman, J.B., de Haas, V., Trka, J., Baruchel, A., Reinhardt, D., van der Reijden, B.A., et al. (2012). High GATA2 expression is a poor prognostic marker in pediatric acute myeloid leukemia. Blood 120, 2064–2075.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., and Mesirov, J.P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550. Supek, F., Bosnjak, M., Skunca, N., and Smuc, T. (2011). REVIGO summarizes
Mansour, M.R., Abraham, B.J., Anders, L., Berezovskaya, A., Gutierrez, A., Durbin, A.D., Etchin, J., Lawton, L., Sallan, S.E., Silverman, L.B., et al. (2014). Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373– 1377. McLeay, R.C., and Bailey, T.L. (2010). Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data. BMC Bioinformatics 11, 165. Metzeler, K.H., Hummel, M., Bloomfield, C.D., Spiekermann, K., Braess, J., Sauerland, M.C., Heinecke, A., Radmacher, M., Marcucci, G., Whitman, S.P., et al.; Cancer and Leukemia Group B; German AML Cooperative Group (2008). An 86-probe-set gene-expression signature predicts survival in cytogenetically normal acute myeloid leukemia. Blood 112, 4193–4201. Moreno-Betancur, M., Sadaoui, H., Piffaretti, C., and Rey, G. (2017). Survival Analysis with Multiple Causes of Death: Extending the Competing Risks Model. Epidemiology 28, 12–19. Ng, S.W., Mitchell, A., Kennedy, J.A., Chen, W.C., McLeod, J., Ibrahimova, N., Arruda, A., Popescu, A., Gupta, V., Schimmer, A.D., et al. (2016). A 17-gene stemness score for rapid determination of risk in acute leukaemia. Nature 540, 433–437. Nord, A.S., Blow, M.J., Attanasio, C., Akiyama, J.A., Holt, A., Hosseini, R., Phouanenavong, S., Plajzer-Frick, I., Shoukry, M., Afzal, V., et al. (2013). Rapid and pervasive changes in genome-wide enhancer usage during mammalian development. Cell 155, 1521–1531. Ogretmen, B. (2018). Sphingolipid metabolism in cancer signalling and therapy. Nat. Rev. Cancer 18, 33–50. Pabst, C., Bergeron, A., Lavalle´e, V.P., Yeh, J., Gendron, P., Norddahl, G.L., Krosl, J., Boivin, I., Deneault, E., Simard, J., et al. (2016). GPR56 identifies primary human acute myeloid leukemia cells with high repopulating potential in vivo. Blood 127, 2018–2027. Parikshak, N.N., Luo, R., Zhang, A., Won, H., Lowe, J.K., Chandran, V., Horvath, S., and Geschwind, D.H. (2013). Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008– 1021. Parkin, B., Ouillette, P., Li, Y., Keller, J., Lam, C., Roulston, D., Li, C., Shedden, K., and Malek, S.N. (2013). Clonal evolution and devolution after chemotherapy in adult acute myelogenous leukemia. Blood 121, 369–377. Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
2876 Cell Reports 28, 2866–2877, September 10, 2019
and visualizes long lists of gene ontology terms. PLoS ONE 6, e21800. Tarlock, K., and Meshinchi, S. (2015). Pediatric acute myeloid leukemia: biology and therapeutic implications of genomic variants. Pediatr. Clin. North Am. 62, 75–93. R Development Core Team (2012). R: A language and environment for statistical computing (R Foundation for Statistical Computing). Touil, Y., Igoudjil, W., Corvaisier, M., Dessein, A.F., Vandomme, J., Monte, D., Stechly, L., Skrypek, N., Langlois, C., Grard, G., et al. (2014). Colon cancer cells escape 5FU chemotherapy-induced cell death by entering stemness and quiescence associated with the c-Yes/YAP axis. Clin. Cancer Res 20, 837–846. Valencia, A., Roma´n-Go´mez, J., Cervera, J., Such, E., Barraga´n, E., Bolufer, P., Moscardo´, F., Sanz, G.F., and Sanz, M.A. (2009). Wnt signaling pathway is epigenetically regulated by methylation of Wnt antagonists in acute myeloid leukemia. Leukemia 23, 1658–1666. van der Maarten, L., and Hinton, G. (2008). Visualizing High-Dimensional Data Using t-SNE. J. Mach. Learn. Res. 9, 2579–2605. Verhaak, R.G., Wouters, B.J., Erpelinck, C.A., Abbas, S., Beverloo, H.B., Lugthart, S., Lo¨wenberg, B., Delwel, R., and Valk, P.J. (2009). Prediction of molecular subtypes in acute myeloid leukemia based on gene expression profiling. Haematologica 94, 131–134. Vermunt, M.W., Reinink, P., Korving, J., de Bruijn, E., Creyghton, P.M., Basak, O., Geeven, G., Toonen, P.W., Lansu, N., Meunier, C., et al.; Netherlands Brain Bank (2014). Large-scale identification of coregulated enhancer networks in the adult human brain. Cell Rep. 9, 767–779. Vermunt, M.W., Tan, S.C., Castelijns, B., Geeven, G., Reinink, P., de Bruijn, E., Kondova, I., Persengiev, S., Bontrop, R., Cuppen, E., et al.; Netherlands Brain Bank (2016). Epigenomic annotation of gene regulatory alterations during evolution of the primate brain. Nat. Neurosci. 19, 494–503. Villar, D., Berthelot, C., Aldridge, S., Rayner, T.F., Lukk, M., Pignatelli, M., Park, T.J., Deaville, R., Erichsen, J.T., Jasinska, A.J., et al. (2015). Enhancer evolution across 20 mammalian species. Cell 160, 554–566. Walter, R.B., Othus, M., Burnett, A.K., Lo¨wenberg, B., Kantarjian, H.M., Ossenkoppele, G.J., Hills, R.K., van Montfort, K.G., Ravandi, F., Evans, A., et al. (2013). Significance of FAB subclassification of ‘‘acute myeloid leukemia, NOS’’ in the 2008 WHO classification: analysis of 5848 newly diagnosed patients. Blood 121, 2424–2431.
Wang, Y., Krivtsov, A.V., Sinha, A.U., North, T.E., Goessling, W., Feng, Z., Zon, L.I., and Armstrong, S.A. (2010). The Wnt/beta-catenin pathway is required for the development of leukemia stem cells in AML. Science 327, 1650–1653. Yang, L., Weng, W., Sun, Z.X., Fu, X.J., Ma, J., and Zhuang, W.F. (2015). SphK1 inhibitor II (SKI-II) inhibits acute myelogenous leukemia cell
growth in vitro and in vivo. Biochem. Biophys. Res. Commun. 460, 903–908. Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., and Liu, X.S. (2008). Modelbased analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.
Cell Reports 28, 2866–2877, September 10, 2019 2877
STAR+METHODS KEY RESOURCES TABLE
REAGENT or RESOURCE
SOURCE
IDENTIFIER
Abcam
#ab4729; RRID: 2118291
Dutch Childhood Oncology Group
https://www.skion.nl
TruSeq ChIP Sample Prep Kit
Illumina
IP-202-1012
TruSeq Stranded mRNA Library Prep
Illumina
#20020594
EasySep Human CD33 Positive Selection Kit
StemCell Technologies
#18257
EasySep Human CD34 Positive Selection Kit
StemCell Technologies
#18056
Dynabeads Protein G
Life Technologies
#10004D
This paper
GEO: GSE117994
Pediatric AML RNA-sequencing data
TARGET AML
https://ocg.cancer.gov/programs/target/data-matrix/
ATAC hematopoietic cell types
Corces et al., 2016
GEO: GSE74912
RNA-seq hematopoietic stem cells
Corces et al., 2016
GEO: GSE74246
Adult AML RNA microarray data
Verhaak et al., 2009
GEO: GSE6891
Adult AML RNA microarray data
Metzeler et al., 2008
GEO: GSE12417
CD34+ primary cells H3K27ac ChIP-seq
Roadmap Epigenomics
GEO: GSE17312
CD14+ primary cells H3K27ac ChIP-seq
ENCODE
ENCSR000ASJ
Antibodies H3K27acetylation antibody Biological Samples Pediatric AML patient samples Critical Commercial Assays
Deposited Data Raw and processed data
Software and Algorithms Bowtie version 1.1.1
Langmead and Salzberg, 2012
http://bowtie-bio.sourceforge.net
Samtools version 1.5
Li et al., 2009
http://samtools.sourceforge.net/
MACS2 version 2.1.1
Zhang et al., 2008
http://liulab.dfci.harvard.edu/MACS/
Bedtools version 2.26.0
Quinlan and Hall, 2010
http://bedtools.readthedocs.io/en/latest/
R version 3.5.1
R Development Core Team, 2012
https://cran.r-project.org
IGV version 2.3.40
Robinson et al., 2011
http://software.broadinstitute.org/software/igv
T-SNE R package
van der Maarten and Hinton, 2008
https://cran.r-project.org/web/packages/tsne/
DESeq2 R package
Love et al., 2014
https://bioconductor.org/packages/release/bioc/ html/DESeq2.html
GSEA
Subramanian et al., 2005
https://software.broadinstitute.org/gsea/index.jsp
HISAT2 version 2.0.5
Kim et al., 2015
https://login.proxy.library.uu.nl/login?auth= eng&qurl=https://ccb.jhu.edu%2fsoftware% 2fhisat2%2fmanual.shtml
DAVID Gene Ontology version 6.8
Huang et al., 2009
https://david.ncifcrf.gov
REVIGO
Supek et al., 2011
http://revigo.irb.hr
AME, MEME Suite
McLeay and Bailey, 2010
http://meme-suite.org/doc/ame.html
Survival R package
Moreno-Betancur et al., 2017
https://cran.r-project.org/web/packages/survival/ index.html
WGCNA R package
Langfelder and Horvath, 2008
https://cran.r-project.org/web/packages/WGCNA/ index.html
Cytoscape version 3.5.1.
Shannon et al., 2003
https://cytoscape.org/
LEAD CONTACT AND MATERIALS AVAILABILITY Further information and requests for resources and reagents should be directed to and will fulfilled by the Lead Contact, Menno P. Creyghton (
[email protected]). This study did not generate new unique reagents.
e1 Cell Reports 28, 2866–2877.e1–e5, September 10, 2019
EXPERIMENTAL MODEL AND SUBJECT DETAILS Pediatric AML patient samples Viable frozen mononuclear cells derived from peripheral blood or bone marrow of pediatric AML patients were provided by the Dutch Childhood Oncology Group (DCOG, Utrecht, the Netherlands) (Table S1). In this study we included 36 AML patient samples covering 27 pediatric AML patients. For 9 patient samples we have collected matched samples at initial diagnosis and at relapse. Patient characteristics (gender, age, karyotype, mutations, maturation stage) can be found in Table S1. Written informed consent was obtained from parents or legal guardians according to the principles expressed in the Declaration of Helsinki. Protocols were approved by the institutional review board of the University Medical Center Utrecht. We included patient samples that covered the main molecular aberrations in pediatric AML (MLL-rearranged, CBF-rearranged and NK with Flt3-ITD) (provided by the Dutch Childhood Oncology Group) (Table S1). Two experimental groups were defined. Patients that did not relapse within five years were classified as non-relapse. Patients that reached clinical remission and subsequently relapsed were classified as relapsed patients (Table S1). METHOD DETAILS Handling of AML patient samples Pediatric AML patient samples were thawed and, if necessary, purified for leukemic blasts to obtain > 90% blast purity. For each patient sample, DCOG provided extensive information of cell markers analyzed by fluorescence-activated cell sorting (FACS). Blast purification was done using EasySep CD33 or CD34 (depending on blasts characteristics of patient sample provided by DCOG) Positive Selection Kit (StemCell Technologies, #18257 and #18056) according to protocol with some minor changes: incubation time with the CD33 or CD34 selection cocktail was extended to 20 minutes and incubation time with magnetic beads was extended to 15 minutes. For purification, 25ul CD33 or CD34 selection cocktail and 20ul magnetic beads were used per 1*10^7 cells. Chromatin Immunoprecipitation-sequencing Sample preparation was done as shown previously with minor modifications (Vermunt et al., 2016). 2*10^6 – 10*10^6 cells were cross-linked using 1% formaldehyde for 10’ at room-temperature (RT). Samples were quenched with 125mM glycine and centrifuged for 50 at 4,000 rpm at 4 C. Fixed cell pellets were washed twice with PBS, snap-freezed in liquid nitrogen and stored at 80 C. Samples were lysed in Lysis Buffer 1 (50mM HEPES pH7.5, 140mM NaCl, 1mM EDTA, 10% glycerol, 0.5% Igepal, 0.25% Triton X-100) for 10’ at RT while rotating. Lysed cells were pelleted (4,000rpm, 50 , 4 C), resuspended in Lysis Buffer 2 (200mM NaCl, 1mM EDTA, 0.5mM EGTA, 10mM Tris-HCl pH8) and kept for 10’ on a rotator at RT. After centrifugation (4,000rpm 50 , 4 C), cells were resuspended in Lysis Buffer 3 (100mM NaCl, 1mM EDTA, 0.5mM EGTA, 10mM Tris-HCl pH8, 0.1% Deoxycholate, 0.5% N-lauroyl sarcosine) and subjected to sonification (Covaris S2). Cells were sheared at: duty cycle: 20%, intensity: 3, cycles per burst: 200, 12 cycles of 60 s. After shearing, 550ul of LB3 containing 10% Triton X-100 was added. Samples were centrifuged for 10’ at 14,000rpm at 4 C and cleared supernatant was used for immunoprecipitation. 50ul of Dynabeads Protein G (Life Technologies, #10004D) were conjugated with 5ug H3K27acetylation antibody (Abcam, #ab4729) for at least 3 hours at 4 C while rotating. Cleared supernatant was added to the resulting antibody-beads complexes after washing (twice with LB3, once with LB3 + 10% Triton X-100) and incubated overnight (o/n) at 4 C while rotating. Beads were washed 4 times with cold RIPA buffer (50mM HEPES pH7.5, 1mM EDTA, 0.7% Deoxycholate, 1% Igepal, 0.5M LithiumChloride) and once with 50mM NaCl in cold TE. DNA was eluted and de-crosslinked in 200ul elution buffer (50mM Tris-HCl pH 8.0, 10mM EDTA, 1% SDS) at 65 C o/n. Samples were centrifuged for 1’ at 14,000rpm to pellet Dynabeads Protein G and the eluate was diluted 1:1 with TE. Samples were treated with RNase (final concentration 0.2 mg/ml) for 2 hours at 37 C, followed by proteinase K treatment (final concentration 0.2mg/ml) for 2 hours at 55 C. DNA was extracted using phenol/chloroform with MaXtract High Density gel tubes (QIAGEN, #129046) followed by ethanol precipitation with glycogen. 10ng of DNA was used for library preparation (Truseq ChIP Library Preparation, Illumina # IP-202-1012) following protocol. Sequencing libraries were checked for correct size using DNA High Sensitivity analysis on the Bioanalyzer 2100 (Agilent, G2939BA). Samples were sequenced at Illumina NextSeq500 at Utrecht Sequencing Facility (http://useq.nl/). ChIP-Sequencing alignment and read counting For each sample, at least 18*10^6 reads were sequenced (Table S1). Sequencing reads were aligned on human hg38 using bowtie version 1.1.1 (Langmead and Salzberg, 2012) excluding reads with more than 1 mismatch (seed length of 40 base pairs (bp)) or with multiple alignment. Duplicate reads were discarded using Samtools version 1.5 (Li et al., 2009). Percentage of aligned and unique reads was higher than 70% for all samples (Table S1). Statistically significant H3K27ac enriched regions compared to background were calculated for each sample using MACS2 version 2.1.1. (p value = 105, extsize 300, local lambda 100,000) (Zhang et al., 2008). Regions were extended to a minimum of 1500bp in length (+/ 750bp from peak center) which reflects the typical resolution of H3K27ac histone enrichment given the fragment sizes (Bonn et al., 2012; Vermunt et al., 2014; Villar et al., 2015). Enriched regions from all samples were merged into one non-redundant list for further analysis, where overlapping peaks were merged. Regions that readed significance in only one patient sample were removed from the non-redundant list. Reads in peaks were counted using bedtools version 2.26.0 (Quinlan and Hall, 2010) and fraction of reads in peaks (FRiP) per sample exceeded the 1% threshold used by the
Cell Reports 28, 2866–2877.e1–e5, September 10, 2019 e2
Encyclopedia of DNA Elements (ENCODE) (Landt et al., 2012) (Table S1). Enriched regions that overlapped within 1,000bp from a transcriptional start site from the hg38 RefSeq list were annotated as promoters, other enriched regions were annotated as enhancers. ChIP-seq panels were visualized using the Integrated Genomics Viewer (IGV version 2.3.40, Broad Institute) (Robinson et al., 2011). Hierarchical clustering, t-SNE and PCA For t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA), read counts were normalized for library size and peak size (reads per kilobase per million (RPKM)). T-SNE scaling coordinates were calculated in R using the t-SNE package for dimensionality reduction (van der Maarten and Hinton, 2008). First, a distance matrix was calculated from the correlation of all samples, after which the t-SNE analysis was performed with 3,000 iterations. PCA was performed using the prcomp function in R (R Development Core Team, 2012). The variance per principal component was plotted using the R package ggplot2. For hierarchical clustering across samples, Pearson correlations between samples were calculated using log2 normalized read counts. The Pearson distance matrix, calculated by average linkage, was visualized using the heatmap.2 function from gplots in R. For hierarchical clustering across regions, RPKM values were z-normalized for each region and visualized using the heatmap.2 function from gplots in R. RNA-sequencing RNA of purified AML patient blast cells was extracted using TRIzol reagent (Invitrogen, #15596018) according to the manufacturers protocol. RNA integrity was calculated using Bioanalyzer 2100 (Agilent, G2939BA) and RNA samples with RNA integrity values (RIN) lower than 7 were discarded. Total mRNA was amplified using poly A+ enrichment and sequencing libraries were prepared using the TruSeq Stranded mRNA Library Prep (Illumina, #20020594) according to protocol. Samples were sequenced using the Illumina NextSeq500 at the Utrecht Sequencing Facility (http://useq.nl/). RNA-Sequencing alignment For each sample, at least 19*10^6 RNA sequencing reads were aligned on the human reference genome build hg38 from RefSeq with annotated transcripts using HISAT2 version 2.0.5 (Kim et al., 2015) (Table S1). Reads were mapped with maximum and minimum mismatch penalties of 6 and 2, respectively. Reads that mapped to multiple locations and duplicate reads were removed using Samtools version 1.5. Reads in genes were counted using HTSeq-counts in the union mode q (Anders et al., 2015) and more than 70% of the reads were located within genes (Table S1). Analysis of changes in gene expression To generate a gene list out of H3K27ac enriched regions, enhancers were linked with their closest H3K27ac enriched transcriptional start site in the hg38 RefSeq list. We used the R package DESeq2 (Love et al., 2014) to analyze fold changes in expression of genes linked with differentially enriched regions. Gene counts, of our own generated AML patient samples as well as public RNAsequencing data of the TARGET AML study, were sample normalized using DESeq2 and DESeq2 derived log2 fold changes were used for gene expression analysis. Fold changes were plotted as boxplots using the function boxplot() in R with default settings. The boxes indicate the interquartile range with the center line representing the median value. The whiskers represent the maximum and minimum values. See section ‘Quantification and statistical analysis’ for statistical analysis. Gene ontology analysis Overrepresented gene ontologies of relapsed and non-relapse genes were analyzed using DAVID (version 6.8) (Huang et al., 2009). Significant gene ontologies (p value < 0.05) were clustered and visualized using REVIGO (Supek et al., 2011). We allowed similarity of gene ontologies of 0.5 and used SimRel as semantic similarity measure. Transcription factor motif enrichment To identify overrepresented transcription factor motifs in our differentially enriched regions, we used AME, MEME Suite analysis (McLeay and Bailey, 2010). In order to refine our relatively large ChIP-seq regions to the open chromatin regions where transcription factors bind, we overlapped the ChIP-seq enriched regions with narrowPeaks derived from ATAC-sequencing experiments of hematopoietic cell types (GSE74912). Overrepresented transcription factor binding motifs were analyzed in ATAC-refined differentially H3K27ac enriched regions with an ATAC-refined non-redundant ChIP-seq list as background. This non-redundant list contained the same ratio of promoters and enhancers as the analyzed list. Each sequence was scored for matches to a single motif with an average odds score. A rank sum test was used for statistical analysis for overrepresentation where motifs with a p value < 0.05 were significantly enriched. Relapse scores and relapse free survival analysis Relapse free survival analysis was performed in a pediatric AML dataset (TARGET AML study), an adult dataset with mixed genetic abnormalities (GSE6891) (Verhaak et al., 2009) and in an adult dataset with normal karyotype (GSE12417) (Metzeler et al., 2008). RNA values and clinical information of the TARGET AML dataset was downloaded via https://ocg.cancer.gov/programs/target/ data-matrix/. Clinical information of GSE6891 and GSE12417 was kindly provided by the authors. RPKM values were used in the
e3 Cell Reports 28, 2866–2877.e1–e5, September 10, 2019
RNA-sequencing datasets. For the RNA microarray datasets, RNA microarray probe names were converted into gene symbols, where probes with the highest mean expression were selected in case of the presence of multiple probes per gene symbol. RNA values were log2 transformed and normalized to zero mean. The relapse scores were calculated by taking the sum of the log2 transformed zero mean normalized values for the genes present in the relapse prediction gene-set, as shown previously (Eppert et al., 2011). Non-relapsed patients were at least 3-years relapse free and relapsed patients reached clinical remission. Samples were separated based on low or high relapse scores using the median score as a cut-off. LSC17 scores were calculated by taking the sum of the normalized RNA-seq values weighted by the regression coefficients as following: (DNMT3B * 0.0874) + (ZBTB46 * 0.0347) + (NYNRIN * 0.00865) + (ARHGAP22 * 0.0138) + (LAPTM4B * 0.00582) + (MMRN1 * 0.0258) + (DPYSL3 * 0.0284) + (KIAA0125 * 0.0196) + (CDK6 * 0.0704) + (CPXM1 * 0.0258) + (SOCS2 * 0.0271) + (SMIM24 * 0.0226) + (EMP1 * 0.0146) + (NGFRAP1 * 0.0465) + (CD34 * 0.0338) + (AKR1C3 * 0.0402) + (GPR56 * 0.0501) (Ng et al., 2016). Kaplan-Meier relapse free survival curves were generated using the survival package in R (Moreno-Betancur et al., 2017). Weighted gene co-expression analysis (WGCNA) Log2 zero mean normalized RPKM RNA-sequencing values from the TARGET AML study (Bolouri et al., 2018) were used for WGCNA. The WGCNA R package was used to create signed co-expression networks (Langfelder and Horvath, 2008). Network reproducibility and module robustness was ensured by random sampling of 66% of the dataset samples 50 times (Parikshak et al., 2013). Outlier samples were removed above 240 in the sample correlation tree. A soft-threshold power of 7 was used to obtain approximate scale-free topology (R2 > 0.8). Network interconnectedness was calculated for 14,147 genes using the blockwiseConsensusModules function for each gene pair. The resulting topological overlap dissimilarity matrix (1-topological overlap matrix (TOM)) was used for hierarchical clustering using average linkage and modules were generated using a hybrid dynamic tree cutting algorithm (Langfelder et al., 2008). Modules were generated with a minimum size of 50 genes and with the deepSplit parameter set on 2. The first principal component of the scaled expression in each module is represented as the module eigengene. Modules eigengenes were clustered on dissimilarity and cutting the dendrogram on 0.25 (mergeCutHeight) resulted in merged modules. Module membership (kME), defined as correlation to the module eigengene, was calculated for each gene (minimum kME < 0.2) within the modules. Genes that were not assigned to a specific module were grouped in module gray (ME 0). Relapse genes were extracted from the modules that had a connection with another relapse gene to generate the relapse network. Gene networks with an edge weight of more than 0.055 were visualized in Cytoscape (Shannon et al., 2003) to show most interconnected genes. The thickness of the edges represented the weight. QUANTIFICATION AND STATISTICAL ANALYSIS Analysis fraction of variance In order to test the contribution of multiple covariates across different AML subgroup in the ChIP-seq dataset and RNA-seq dataset, we performed a multivariate linear regression model with an ANOVA in R. For each H3K27ac enriched region in the ChIP-seq dataset using all patient samples, we analyzed the fraction of variance that could be attributed to the following subgroups: relapse state (nonrelapse and relapsed patients), molecular subtype, FAB-class, Flt3-ITD, NPM1, WT1 and RAS. For the analysis of MLL-AF9 samples: high EVI1 H3K27ac enrichment, KRAS, trisomy 8, age and gender were used. For the analysis of the variation in gene expression in the TARGET AML study we analyzed the variation of each gene across the following patient characteristics: molecular subtype, FABclass, molecular risk, relapse state (non-relapse and relapsed patients), Flt3-ITD-mutated, NPM1-mutated, CEBPA, WT1, cKIT, gender and age. Determination of differentially enriched regions To determine H3K27ac enriched regions with significant differences between molecular aberration subgroups MLL-rearranged, CBF-related and Flt3-ITD, we used the DESeq2 package in R (Love et al., 2014). Four patient samples were used for each molecular subtype to analyze differentially enriched regions. Regions were considered significantly differentially enriched (DE) when at least a 2-fold change was observed with an FDR smaller than 0.001. All significant differentially enriched regions were combined and visualized using ngs.plot in R (Shen et al., 2014), where all four samples per molecular subtype were merged. To determine H3K27ac enriched regions with significant differences between relapsed and non-relapsed MLL-AF9-rearranged patient samples, we used the DESeq2 package in R. Regions were considered significantly differentially enriched (DE) when at least a 2-fold change was observed with an FDR smaller than 0.001. Additionally, DE regions had to be significantly enriched in either 2/4 relapsed or 3/5 non-relapsed MLL-AF9-rearranged patients to prevent the analysis of background variation in regions that are specific to other molecular subtypes. Gene expression analysis Changes in gene expression of relapse and non-relapse genes was calculated using the DESeq2 package in R (Love et al., 2014). Dissimilarity between distributions of log2 fold changes of relapse and non-relapse genes compared to log2 fold changes of all genes was calculated using a Wilcoxon rank-sum test. For the analysis of complete gene sets, Gene Set Enrichment Analysis (GSEA, Broad Institute) (Subramanian et al., 2005) was performed using 1,000 permutations with the gene set as permutation type. Gene sets were
Cell Reports 28, 2866–2877.e1–e5, September 10, 2019 e4
identified to be significant with a p value < 0.05 and an FDR < 0.001. Significant expression differences of log2 RPKM expression values of relapse genes between normal hematopoietic cell types HSCs and monocytes were tested using a Wilcoxon signedrank test. Overlap with known LSC-genes We performed an hypergeometric test to evaluate the overlap between our relapse genes with genes known to be higher expressed in LSC fractions compared to non-LSC-fractions in other studies (Gentles et al., 2010; Ng et al., 2016; Pabst et al., 2016). Relapse free survival analysis Significance of relapse free survival in an univariate analysis was calculated using a log-rank test using the survival package in R. Hazard ratios were analyzed with dichotomous scores (high relapse score versus low relapse score) and plotted using ggplot2 in R. To test whether our relapse score was significantly associated with increased risk of relapse disregard other features, we performed a multivariate analysis using a Cox proportional hazard model from the R package survival. Multivariate statistical analysis was performed using the Wald test. Dichotomous and continuous relapse scores were analyzed together with CBF-rearrangements, complex cytogenetics, Flt3-ITD mutated, NPM1 mutated, CEPBA mutated, WT1 mutated, cKIT mutated, gender, age. All covariates were analyzed as analyzed dichotomous values, except for age. DATA AND CODE AVAILABILITY The accession number for the sequencing data reported in this paper is GEO: GSE117994. RNA-sequencing data from the TARGET AML study was downloaded via: https://ocg.cancer.gov/programs/target/data-matrix/. Other used datasets can be found via accession numbers GSE74912 (ATAC hematopoietic cell types), GSE74246 (RNA-seq hematopoietic stem cells), GSE6891 (RNA microarray adult AML), GSE12417 (RNA microarray normal karyotype adult AML), ENCODE ENCSR000ASJ (H3K27ac ChIP-seq CD14+ cells) and Roadmap Epigenomics GSE17312 (ChIP-seq H3K27ac CD34+ primary cells).
e5 Cell Reports 28, 2866–2877.e1–e5, September 10, 2019