Resource
Analysis of Normal Human Mammary Epigenomes Reveals Cell-Specific Active Enhancer States and Associated Transcription Factor Networks Graphical Abstract
Authors Davide Pellacani, Misha Bilenky, Nagarajan Kannan, ..., Samuel Aparicio, Martin Hirst, Connie J. Eaves
Correspondence
[email protected] (M.H.),
[email protected] (C.J.E.)
In Brief Pellacani et al. present comprehensive histone and DNA modification profiles for four cell types in normal human breast tissue and three immortalized human mammary epithelial cell lines. Analysis of activated enhancers place luminal progenitors in between bipotent progenitor-containing basal cells and nonproliferative luminal cells. Explore consortium data at the Cell Press IHEC webportal at http://www.cell.com/ consortium/IHEC.
Highlights d
Epigenomes of four normal human breast cell fractions and three cell lines are reported
d
Luminal progenitor and mature luminal cell epigenomes differ greatly
d
Enhancers define luminal progenitors as intermediate between basal and luminal cells
d
Transcription factor binding sites in active enhancers point to distinct regulators
Pellacani et al., 2016, Cell Reports 17, 2060–2074 November 15, 2016 ª 2016 The Author(s). http://dx.doi.org/10.1016/j.celrep.2016.10.058
Cell Reports
Resource Analysis of Normal Human Mammary Epigenomes Reveals Cell-Specific Active Enhancer States and Associated Transcription Factor Networks Davide Pellacani,1 Misha Bilenky,2 Nagarajan Kannan,1 Alireza Heravi-Moussavi,2 David J.H.F. Knapp,1 Sitanshu Gakkhar,2 Michelle Moksa,3 Annaick Carles,3 Richard Moore,2 Andrew J. Mungall,2 Marco A. Marra,2 Steven J.M. Jones,2 Samuel Aparicio,4,5 Martin Hirst,2,3,* and Connie J. Eaves1,6,* 1Terry
Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada 3Michael Smith Laboratories, Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada 4Department of Molecular Oncology, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada 5Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC V6T 2B5, Canada 6Lead Contact *Correspondence:
[email protected] (M.H.),
[email protected] (C.J.E.) http://dx.doi.org/10.1016/j.celrep.2016.10.058 2Canada’s
SUMMARY
The normal adult human mammary gland is a continuous bilayered epithelial system. Bipotent and myoepithelial progenitors are prominent and unique components of the outer (basal) layer. The inner (luminal) layer includes both luminal-restricted progenitors and a phenotypically separable fraction that lacks progenitor activity. We now report an epigenomic comparison of these three subsets with one another, with their associated stromal cells, and with three immortalized, non-tumorigenic human mammary cell lines. Each genome-wide analysis contains profiles for six histone marks, methylated DNA, and RNA transcripts. Analysis of these datasets shows that each cell type has unique features, primarily within genomic regulatory regions, and that the cell lines group together. Analyses of the promoter and enhancer profiles place the luminal progenitors in between the basal cells and the non-progenitor luminal subset. Integrative analysis reveals networks of subset-specific transcription factors. INTRODUCTION The normal adult human female mammary gland is a continuous bilayered, branching epithelial structure containing three phenotypically distinct subsets of cells. These can now be isolated and biologically distinguished (Fu et al., 2014). They are referred to as basal cells (BCs), luminal progenitors (LPs), and luminal cells (LCs) because their surface marker expression profiles identify their location within the outer (basal) or inner (luminal) layer of the gland. A prominent fourth cell population in the breast is comprised of mesodermally derived stromal cells (SCs). Approximately 20%–30% of human mammary BCs and LPs have progenitor (clonogenic) activity demonstrable in both 2D and
3D cultures (Eirew et al., 2008; Kannan et al., 2013; Raouf et al., 2008). Most colonies derived from the BCs contain a mixture of cells with either basal or luminal features, although sometimes only basal progeny are produced. Human LPs contain all of the in vitro progenitor activity of the luminal compartment and produce colonies of cells expressing luminal features exclusively. Because LCs lack substantial proliferative potential in vitro, they are assumed to represent a mature subset of luminal cells (Fridriksdottir et al., 2015; Kannan et al., 2013). When transplanted into immunodeficient mice, a small proportion (<1%) of human BCs, but not LPs or LCs, can also individually regenerate normal-appearing bilayered mammary structures that contain the same spectrum of cell types and progenitors found in the normal human gland. This property suggests the cells of origin of these structures represent a human mammary stem cell population (Eirew et al., 2008; Kuperwasser et al., 2004; Lim et al., 2009; Nguyen et al., 2014). Despite these advances, the molecular regulators that specify the unique features of each of these normal human mammary subsets have only recently become amenable to interrogation. Transcriptome profiling (Choudhury et al., 2013; Gascard et al., 2015; Huh et al., 2015; Kannan et al., 2013; Lim et al., 2009, 2010; Makarem et al., 2013; Maruyama et al., 2011; Raouf et al., 2008; Dos Santos et al., 2015), together with functional studies, has enabled several key transcription factors (TFs) to be identified in the subpopulations of normal human as well as analogous mouse mammary cell types (Fu et al., 2014; Visvader and Stingl, 2014). Thus implicated in BCs are SLUG, TP63 and TP53, SOX9, STAT3, MYC, and TAZ. In cells with luminal features, CEBPB and NOTCH3 (Raouf et al., 2008), together with GATA3, ELF5 (Chakrabarti et al., 2012), and FOXA1, which regulates estrogen receptor activity (Theodorou et al., 2013), have been identified as important. Previous human mammary epigenome profiles produced as part of the Roadmap Epigenomics Consortium (Roadmap Epigenomics Consortium et al., 2015) have provided an initial global annotation of active enhancer regions (Gascard et al., 2015). We now report a more extensive set of reference epigenome profiles
2060 Cell Reports 17, 2060–2074, November 15, 2016 ª 2016 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
A
Normal breast samples
FACS
6 samples pooled ( 5x107 cells per fraction)
Molecular analyses
Integrative analyses
RNA RNA-seq DNA WGBS Chromatin H3K4me1 H3K4me3 H3K9me3 H3K27ac H3K27me3 H3K36me3
Datasets
n=6 LC EpCAM-PE
CD45- CD31single cells
LC
LP
LP
BC
BC
SC
SC CD49f-APC
Chromatin states
Expression
Enhancers Promoters Transcription factor networks
Colony assays
BC CEMT BC Nguyen BC Nguyen BC Nguyen myo REMC myo REMC myo REMC LP CEMT LP Nguyen LP Nguyen LP Nguyen lum REMC LC CEMT lum REMC lum REMC MCF10A CEMT 184−L9 CEMT 184−L2 CEMT vHMEC REMC vHMEC REMC SC CEMT fibr REMC fibr REMC 0.20
0.15
0.10
0.05
Type
Study
LP vs BC
LP vs LC
Up: 806 Do
Do
1 LP log10(RPKM)
1 LP log10(RPKM)
BC vs LC
DE genes
Do
LP vs BC
LP vs LC
BC vs LC 1 BC log10(RPKM)
0.00
Dist. = 1−Spearman | Link. = complete
D
C
LC log10(RPKM) 1
Study
BC log10(RPKM) 1
Type
LC log10(RPKM) 1
B
2
4
6
8
10
absolute log2(
E
myo REMC BC CEMT myo REMC SC CEMT CEMT LP CEMT LC lum REMC lum REMC fibr REMC fibr REMC vHMEC REMC MCF10A CEMT 184−L9 CEMT 184−L2 CEMT 0.4 0.3 0.2 0.1 0.0 Dist. = 1−Spearman | Link. = complete Figure 1. Isolation of Purified Cell Subsets and Epigenomic Profiling (A) Experimental design. (B) Unsupervised clustering of RNA-sequencing experiments comparing the samples analyzed in this study (CEMT) with other studies (Nguyen et al., 2015; Roadmap Epigenomics Consortium et al., 2015; Gascard et al., 2015).
(legend continued on next page)
Cell Reports 17, 2060–2074, November 15, 2016 2061
that include a separate analysis of human LPs and LCs and profiles for SCs isolated from the same normal breast tissue, as well as profiles generated from three widely used immortalized, but non-tumorigenic, human mammary epithelial cell lines: i.e., MCF-10A cells (Debnath et al., 2003) and 184-hTERT-L2 (184L2) and 184-hTERT-L9 (184-L9) cells (Burleigh et al., 2015). The results demonstrate that the cell lines are not representative of any freshly isolated normal human mammary subset. They have also allowed a hierarchy of enhancer states in normal human mammary tissue and their corresponding cell-specific TF networks to be identified. RESULTS AND DISCUSSION Generation and Validation of Comprehensive Reference Epigenomes Reference epigenomic profiles were generated following the recommendations of the International Human Epigenome Consortium (IHEC; http://ihec-epigenomes.org). The subsets of normal human breast cells isolated were >95% pure CD49f+ EPCAMlow/ cells (BCs), CD49f+EPCAMhigh cells (LPs), CD49f EPCAMhigh cells (LCs), and CD49fEPCAM cells (SCs). These subsets were initially isolated separately from each of six donor samples (aged 17-49 years) using gates that excluded CD45+ hematopoietic cells, CD31+ endothelial cells, and dead (DAPI+) cells (Figures 1A and S1). Like fractions of cells from each sample were then pooled for epigenomic profiling. In vitro 2D colony-forming cell (CFC) assays confirmed the expected content of mammary progenitors in each subset (24% of BCs, 18% of LPs, and <0.3% of LCs). Chromatin immunoprecipitation sequencing (ChIP-seq) data for six histone marks (H3K4me3, H3K4me1, H3K27ac, H3K27me3, H3K9me3, and H3K36me3), whole-genome bisulfite sequencing (WGBS), and RNA sequencing (RNA-seq) datasets (both mRNA and miRNA) were obtained for all four subsets. Parallel data were obtained on MCF-10A cells and 184-L2 and 184-L9 cells. Fully annotated tracks can be viewed in current genome browsers or downloaded directly for offline analysis at http://www. epigenomes.ca. Quality measures and methods to access the raw and processed data are provided in Experimental Procedures and Table S1. Unsupervised hierarchical clustering analysis of the transcriptome data indicated all three mammary epithelial subsets are more similar to each other than to the SCs but, nevertheless, distinct, with the LPs clustering closer to LCs than BCs. (Figure 1B). Paired comparison of the gene expression profiles confirmed a similarity of LP and LC transcriptomes (p < 0.001, Mann-Whitney) across both protein coding (Figures 1C and S2B; Table S2) and non-coding space (Figure S2C; Table S3).
These findings are consistent with and extend previously reported findings obtained both by RNA-seq (Gascard et al., 2015; Nguyen et al., 2015) and by 30 tag-based and microarray analysis (Kannan et al., 2013; Lim et al., 2009; Raouf et al., 2008), including an identification of many of the same genes as ‘‘BC specific’’ (e.g., TP63 (deltaN), ACTA2, MME, KRT14, THY1, and KRT5) or ‘‘LP specific’’ (e.g., MUC1, PROM1, KIT, KRT18, EPCAM, and KRT8) (Figure S2A). Similar clustering and differences between the same four cell types were observed when either microRNA (miRNA) or large intergenic noncoding RNA (lincRNA) profiles were analyzed (Figures 1D, S2D, and S2E). Notably, the transcriptomes for all three immortalized human mammary cell lines clustered independently from both the primary epithelial subsets and the SCs isolated directly from the same freshly dissociated normal human breast tissue (Figure 1B). Moreover, 66 of the 100 genes whose differential expression contributed most to the separate clustering of the mammary cell lines (highest rank differences) included genes of known importance in the mammary gland (e.g., PROM1, KIT, CXCR4, SOX10, ELF3, RARRES1, MMP7, TGFBI, and NT5E) and were found to be differentially expressed in the different subsets of primary epithelial cells (Figure 1E). These results show that these culture-adapted immortalized cell lines, commonly used as models of ‘‘normal’’ human mammary cells, exhibit significant and consistent differences in their transcriptional profiles from all subsets of epithelial mammary cells isolated directly from normal human breast tissue. Variations in the Chromatin States of Different Normal Human Breast Cell Phenotypes To examine and compare the chromatin states of the same primary breast cell types as well as the three cell lines, we used ChromHMM (Ernst and Kellis, 2012) to generate an 18-state model from the ChIP-seq profiles (Figures 2A and 2B). The results recapitulate many features of the 18-state model recently published for 111 primary human tissues and cell types (Roadmap Epigenomics Consortium et al., 2015), including the reported relationship between chromatin states and DNA methylation (Figure 2C), with two notable differences. One was the detection of a state characterized by short regions (median size = 400 bp) containing both H3K9me3 and H3K27me3 (Figure S3A, ‘‘Repr’’ state) that frequently overlapped with lamina-associated domains (LADs) across the entire genome (15%, 1.3-fold enrichment). The second was a split of each of the previously designated ‘‘quiescent’’ (Quies and Quies_G – no marks) and ‘‘transcribed’’ (Tx and Tx_3p – H3K36me3) states into two states specified by their transition rather than their emission probabilities (Figure 2A).
(C) Differential gene expression analysis for protein-coding genes. Each dot-plot shows all protein coding genes (gray), those with upregulated expression (green), and those with downregulated expression (red). Bottom right panel: boxplot showing the distribution of absolute fold changes in the three comparisons made. (D) Unsupervised clustering of miRNA sequencing data for the samples analyzed in this study (CEMT) together with those previously published within the REMC (Gascard et al., 2015). (E) Heatmap showing the expression (Z scores) of the 100 genes with the largest rank difference in cultured cells compared to freshly isolated cells. (Diff. Expr., gene differentially expressed between the primary epithelial subpopulations profiled here). See also Figures S1 and S2 and Tables S1, S2, and S3.
2062 Cell Reports 17, 2060–2074, November 15, 2016
B
TS TSS_A TSS_Fl TSS_Flnk_1 Tx S_Flnk_2 nk Tx _D En_3p Enh Enh_A Enh_G TSh_G _ EnS_B A i Reh_B v i Zn pr_Pv f _ He Rp C ts Ret Qupr Quies ies _G
A
173.3
173.35
173.4
Mb
Overlap (fold enrichment)
1.0 0.8 0.6 0.4 0.2 0.0
PDK1
ChromHMM 10
H3K4me3 0 2
max
H3K4me1 0 3.5
min
H3K27ac 0.0 2
Transition to state
Fractional Methylation
C
173.25
ITGA6
Emission probability
H3K36me3 H3K9me3 H3K27me3 H3K4me1 H3K27ac H3K4me3 Genome % CpGIsland Exon Gene TES TSS TSS±2kb LADs TSS_A TSS_Flnk_1 TSS_Flnk_2 TSS_Flnk_D Tx Tx_3p Enh Enh_A Enh_G Enh_G_A TSS_Biv Enh_Biv Repr_PC Znf_Rpts Het Repr Quies Quies_G
chr2
Transition probability
H3K36me3 1.0 0.8 0.6 0.4 0.2 0.0
0 2
H3K27me3 0 2
H3K9me3 0
D
1.0
184−L2 18/18
0.8
184−L9
14/18
MCF10A
0.6
18/18
SC 16/18
0.4
BC 17/18
LP
0.2
16/18
LC
E
Quies
Quies_G
Het
Repr
Znf_Rpts
Enh_Biv
Repr_PC
TSS_Biv
Enh_G
Enh_G_A
Enh
Enh_A
Tx
Tx_3p
TSS_Flnk_2
TSS_Flnk_D
TSS_A
TSS_Flnk_1
0.0 1.0
0.8 0.6 0.4 0.2 0.0 1 − Mean Norm. Jaccard
F 0.0
0.2
Jaccard value 0.4 0.6
0.8
TSS_A TSS_Flnk_1 TSS_Flnk_2 TSS_Flnk_D Tx Tx_3p Enh Enh_A Enh_G Enh_G_A TSS_Biv Enh_Biv Repr_PC Znf_Rpts Het Repr Quies Quies_G
Figure 2. Chromatin State Differences between Cell Types (A) Chromatin state definitions, histone mark probabilities, overlap with genomic features, and transition probabilities for an 18-state model defined using the ChromHMM software. (B) Example of the chromatin state model (ChromHMM) and ChIP-seq signal tracks (y axis = signal per million reads [SPMR]) around the ITGA6 gene (CD49f) for the LP subset. (C) Distributions of DNA methylation levels across all chromatin states in the four primary samples analyzed. Boxplots show 10%, 25%, 50%, 75%, and 90% quantiles. (D) Aggregated hierarchical clustering based on the normalized Jaccard values of each chromatin state. Fractions show the number of states for which each node is present in the individual dendrograms. (E) Heatmap showing the Jaccard value of enhancers associated states with the ChromHMM profiles generated by the Roadmap Epigenomics Consortium. Only the top five matches for each sample are represented (black indicates high Jaccard). See complete dataset in Figure S3B. (F) Jaccard values of each chromatin state between cell types (primary samples only). Each dot shows one pairwise comparison (lines indicate medians). See also Figure S3.
Cell Reports 17, 2060–2074, November 15, 2016 2063
A
C
B
D
E
F
G
H
(legend on next page)
2064 Cell Reports 17, 2060–2074, November 15, 2016
Comparison of these chromatin states again showed a consistent difference between all three mammary cell lines and the four primary cell types (Figure 2D). In fact, a comparison of the enhancer states of these cells with those of 127 cell and tissue types reported by the Roadmap Epigenomics Consortium (Roadmap Epigenomics Consortium et al., 2015) showed our three cell lines were similar to cultured epithelial cells derived from breast and skin, and the BC fraction was most similar to previously analyzed freshly isolated breast myoepithelial cells (Figures 2E and S3B). The present findings thus underscore the magnitude of changes imposed by culture adaptation on the chromatin landscape (Antequera et al., 1990), in addition to effects on the transcriptome. Pairwise comparisons between the chromatin states of each of the four primary mammary cell types revealed substantive additional differences, involving 25% of the genome in each case. Moreover, differences distinguishing LPs, LCs, and BCs from each other proved to be as extensive as those distinguishing any one of these from the developmentally unrelated SCs (i.e., for each chromatin state, the Jaccard indices of paired comparisons of SCs with each of the mammary cell types were not significantly different, p R 0.1 Mann-Whitney; Figure 2F). Enhancer states (abbreviated as Enh, Enh-A, Enh-G, and Enh-AG) appeared to be the most variable features, as reported previously (Roadmap Epigenomics Consortium et al., 2015). Many promoter-associated states (abbreviated as TSS-Flnk-1, TSS-Flnk-2, and TSS-Flnk-D) were also highly variable between all cell types (Figures S3C and S3D). In summary, the chromatin state models of the data reported here show that the three major epithelial cell types in normal adult human mammary tissue possess many unique features extending throughout a large part of their genomes. Nevertheless, their epigenomes are more different from those of three immortalized non-tumorigenic human mammary cell lines by comparison to the epigenome of developmentally unrelated SCs isolated from the same human breast tissue samples. The present primary mammary cell epigenome datasets also confirm a marked cell-type specificity in the chromatin states of both enhancer and promoter regions.
Chromatin Features at Promoters Point to a Hierarchy of Normal Mammary Epithelial Cell States The importance of H3K4me3 and H3K27me3 modifications in developmental specification processes (Voigt et al., 2013) prompted us to examine these modifications at gene promoters genome-wide, both individually and in combination (Figure 3). H3K4me3 levels at promoters were highly consistent across all primary cell types (Spearman correlation coefficient R 0.9). These correlated positively with the transcript levels of the associated protein-coding genes that were differentially expressed in different cell types. A similar trend was seen for noncoding RNAs. In general, H3K27me3 levels at promoters were more variable in all cell types (average Spearman correlation coefficient = 0.69), and these correlated negatively with the levels of transcripts of the associated differentially expressed genes (Figures 3A–3C and S4A–S4E). We next used false discovery rate (FDR)-based thresholds (q < 0.01 and at least 400 bp covered) to categorize gene promoters based on the presence of H3K4me3, H3K27me3, or both (bivalent) in all three primary mammary subsets (Figure 3D; Table S4). LCs showed a higher number of H3K27me3-marked and bivalent promoters compared to BCs, consistent with an important role of this polycomb-deposited mark in establishing LC identity, as suggested previously (Maruyama et al., 2011; Pal et al., 2013). H3K27me3-marked promoters specific to each cell type were associated with repressed gene expression (p < 0.01, Mann-Whitney on the group of genes; Figure S4F). However, a majority of these promoters lacked H3K4me3 in the other cell types (Figure 3E). Thus, for these genes, transcription appears to be independent of promoter modification by H3K4me3. Approximately 20%–30% of these promoters were marked by H3K4me1 (Figures 3F and 3G). They were also enriched in noncoding RNAs (Figure 3H), of which miRNAs and lincRNAs were the most frequent types. This suggests that the role of H3K4me1 and H3K27me3 in regulating noncoding RNA production seen in other tissues (Amin et al., 2015) also holds for subsets of freshly isolated normal human mammary epithelial cells. We identified 2,700 bivalent promoters in each of the three primary mammary cell types. This included the LCs, even though these cells are thought to represent a terminally differentiated
Figure 3. H3K27me3 Is Highly Variable at Promoters (A) Heatmaps showing Spearman correlation coefficients of H3K4me3 (top) and H3K27me3 (bottom) marks at promoters in all pairwise comparisons of the four subsets of primary cells analyzed. (B) Density plot showing the distribution of promoters marked by H3K4me3 (top) and H3K27me3 (bottom) in pairwise comparisons between epithelial cell types (warmer colors indicate higher densities). SPKM, signal per kilobase per million mapped reads. (C) Dot plot showing the difference in median levels of histone marks at promoters of differentially expressed protein-coding genes. Each dot shows one pairwise comparison. (D) Venn diagrams showing the numbers of H3K4me3, H3K27me3, and bivalently marked promoters marked in each epithelial cell type. (E) Hive plots showing the proportion of H3K27me3-marked promoters specific to each cell type (dark red), being bivalent (blue), and H3K4me3-marked (dark green), or not marked (black) in other cell types. Ribbons are proportional to the number of promoters changing marking between two cell types. (F) Fraction of cell-specific H3K27me3-marked promoters not marked by H3K4me3 in other cell types but marked by H3K4me1. Groups of bars represent cellspecific H3K27me3-marked promoters. Individual bars show the H3K4me1 marking in each cell type. (G) Average plots of H3K4me1-marked cell-type-specific H3K27me3-marked promoters, not marked by H3K4me3 in other cell types. Left, BCs; middle, LPs; and right, LCs. (H) Fraction of genes associated with cell-type-specific H3K27me3-marked promoters, but not marked by H3K4me3 in other cell types, that are classified as protein coding. Gray boxplots indicate distributions of equally sized random gene sets. See also Figure S4 and Table S4.
Cell Reports 17, 2060–2074, November 15, 2016 2065
(legend on next page)
2066 Cell Reports 17, 2060–2074, November 15, 2016
human mammary cell type (Figure 3D). Although heterogeneity of promoter marking between individual LCs cannot be excluded as an explanation for the bivalency observed, a similar variability across donors in gene expression of affected genes was evident between cell types in three independent studies (Figure S5). In addition, we found that bivalent promoters in BCs overlap significantly with the bivalent chromatin states identified in the Roadmap Epigenomics Consortium dataset for a myoepithelial cell population (essentially the same as BCs) obtained from a single donor (Fisher’s p value < 105). Thus it seems unlikely that the bivalent promoters identified here reflect a mixture of different promoter marking in the same phenotypes of cells isolated from different donors. Most promoters that were bivalent in a given subset of mammary epithelial cells contained at least one of the two marks in the other two subsets. These changes in promoter marking correlated with expected changes in gene expression; i.e., increased transcript levels were associated with loss of promoter H3K27me3 marks and decreased transcript levels were associated with loss of promoter H3K4me3 marks (Figures 4A–4C). More than 90% of the bivalent promoters specific to BCs (422/ 459) ‘‘resolved’’ to a display of the same single modification in both LPs and LCs, with 60% showing H3K4me3 and 40% showing H3K27me3 modifications. These findings are reminiscent of the changes in bivalent promoter modifications that accompany the differentiation of embryonic stem cells (Bernstein et al., 2006). Gene promoters that were bivalently marked in BCs, but marked by H3K27me3 in LPs or LCs, included many genes that encode proteins and TFs with a reported role in regulating stem cell properties (e.g., SOX17, HOXD10, TCF15, TAL1, PIWIL1, and the imprinted gene, IGF2). Conversely, gene promoters marked by H3K4me3 in both LCs and LPs with were significantly enriched in genes encoding proteins implicated in their differentiation (e.g., ESR1, SOX9, ERBB4, NOTCH2, RUNX1, CCND1, and EPCAM; p < 0.01, hypergeometric distribution with Bonferroni correction; Figure 4D). These findings support a model in which genes crucial for luminal differentiation may be poised in BCs but appear activated in cells that initiate luminal differentiation. 60% (269/449) of LP-specific bivalent promoters were marked by H3K4me3 in BCs and LCs. Associated genes were significantly enriched in genes involved in epithelial tissue morphogenesis and development (e.g., BMP4, TGFB1, PDGFA, and RXRA; p < 0.01, hypergeometric distribution with Bonferroni correction). This result is again consistent with LPs occupying a state
intermediate between BCs and LCs. Interestingly, in LPs, only 2% (11/449) of the bivalent promoters were marked by H3K27me3 in BCs and H3K4me3 in LCs, whereas 15% (68/449) showed the opposite pattern. The latter were also significantly enriched in genes encoding proteins that regulate extracellular matrix organization (p < 0.01, hypergeometric distribution with Bonferroni correction). Surprisingly, in both BCs and LPs, 88% (538/610) of the bivalent promoters specific to LCs were classified as H3K4me3, and these were enriched in genes encoding proteins responsible for cell migration, proliferation, and receptor tyrosine kinases (e.g., VIM, SNAI2, CDK6, KIT, and EGFR; p < 0.01, hypergeometric distribution with Bonferroni correction). This latter association suggests an unanticipated role of this promoter state in repressing processes specific to BCs and LPs. Previous studies in breast and other tissues have indicated an association of bivalently marked promoters with restricted expression of genes important for lineage determination (Bernstein et al., 2006; Maruyama et al., 2011; Visvader and Lindeman, 2012; Voigt et al., 2013). The present results suggest that promoter bivalency plays a broader role in regulating gene expression among normal human mammary cells. Our analysis also supports a hierarchical organization of mammary epithelial cell states in which a bivalent promoter landscape established in BCs is resolved in LPs and is then maintained in LCs. This includes some de novo gains of bivalency in LCs suggesting that this chromatin state may play a role in the repression of genes involved in proliferation. Chromatin Features at Enhancers Point to a Hierarchy of Normal Mammary Epithelial Cell States We next examined the distribution of enhancers in each subset of primary cells based on the identification of sites marked by H3K4me1, H3K4me3, H3K27ac and H3K27me3 (Figure 5; Table S5; see Experimental Procedures for details of this analysis). Distal ‘‘primed’’ enhancers were defined as any region of chromatin marked by H3K4me1, but not also high in H3K4me3, assuming the latter to have promoter rather than enhancer activity (Figures 5B, 5C, and S6). We termed those also strongly marked by H3K27ac as ‘‘active,’’ those strongly co-marked by H3K27me3 as ‘‘poised,’’ and the very rare regions strongly marked by both H3K27me3 and H3K27ac (<0.1%) as ‘‘mixed’’ (anticipating this latter group to represent regions of differential marking in individual cells within the same subset). Using these definitions, we identified 141,400 putative primed enhancers (Figures 5D and 5E) with a median distance to their
Figure 4. Bivalent Promoter Differences between Human Mammary Epithelial Cell Types (A) Genome browser tracks of the ChIP-seq signals for H3K4me3 (dark green) and H3K27me3 (dark red) (y axis = signal per million reads [SPMR) in the epithelial cell types around the NOTCH3 gene (bivalent in BCs), the MME gene (bivalent in ]LPs) and the EGFR gene (bivalent in LCs). Light gray rectangles indicate gene promoters. (B) Hive plots showing the proportion of bivalent promoters specific to each cell type (blue), being H3K27me3-marked (dark red), H3K4me3-marked (dark green), or not marked (black) in other cell types. Ribbons are proportional to the number of differently marked promoters in two cell types. (C) Levels of expression of genes associated with cell-specific bivalent promoters that show different marks in other cell types. Promoter categories are shown below the x axis label. Black lines indicate medians of each group. p values are from a two-tailed Mann-Whitney U test. (D) Top five gene ontologies enriched in association with cell-specific bivalent promoters that show different marks in other cell types. Promoter categories are shown in the top right corner. See also Figure S5.
Cell Reports 17, 2060–2074, November 15, 2016 2067
Figure 5. Identification of Different Classes of Enhancers in Different Human Mammary Cell Subsets (A) Table showing the different categories of enhancers identified, the histone marking used to define each category (X for absent, O for present, X/O for optional), and the numbers of enhancers in each cell type. (B) Expression of THY1 in the four mammary cell types analyzed. (C) Example of the enhancers identified (Enh track), ChIP-seq signal tracks (y axis = signal per million reads), unmethylated regions (UMR track), and DNA methylation levels (Fr. Meth.) and coverage (log2 scale) in all cell types around the THY1 gene. Light gray rectangles indicate all enhancers identified. (D) Heatmap showing the signal for H3K4me1, H3K27ac, and H3K27me3 on each enhancer region identified in LPs, ordered by region size and categorized as off (gray), primed (green), active (red), poised (blue), and mixed (magenta). Each row represents one enhancer region. (E) H3K4me1 levels (SPKM) at primed enhancers in all cell types. Each column represents one individual enhancer. Enhancers are grouped based on their presence in each cell type. (F) H3K27ac levels (SPKM) at active enhancers in all cell types. Each column represents one individual enhancer. Enhancers are grouped based on their presence in each cell type. See also Figure S6 and Table S5.
2068 Cell Reports 17, 2060–2074, November 15, 2016
A
B
C
D
E
G
F
H
I
(legend on next page)
Cell Reports 17, 2060–2074, November 15, 2016 2069
closest transcription start site (TSS) of 22 kb. Notably, these enhancers overlap significantly (Fisher’s p value < 105 for both robust and permissive datasets) with enhancers previously inferred from RNA expression data (Andersson et al., 2014). Of these, 374 overlap with functionally in vivo validated enhancers (Fisher’s p value < 105) present in the VISTA enhancer browser (Visel et al., 2007). They were characterized by an overall low average level of methylation (on average, 50% at enhancer midpoints, and 75% on equal numbers of randomly selected genomic regions; Figure S7A). They also showed significant overlap (Fisher’s p value < 105) with unmethylated regions (UMRs) and low-methylated regions (LMRs) (Figure S7B), as defined by applying a hidden Markov model to the WGBS dataset (Stadler et al., 2011). The repertoire of active enhancers comprised 40% of those defined as primed (Figure 5D), of which 30% to 50% were present in only one mammary subset (using our thresholds, Figures 5F and 6A). These were associated with 40% to 60% of the genes showing upregulated expression but only 10% to 30% of the genes showing downregulated expression in pairwise comparisons of all cell types (Figures 5B, 5C, 6B, and S6). A similar, but generally less significant, association was seen with primed enhancers (excluding those defined as active, poised or mixed; Figure 6C). Interestingly, regions corresponding to those identified as super enhancers in the mouse mammary gland (Shin et al., 2016) significantly overlapped with sites identified as active enhancers in human mammary epithelial cells, but not in the human SCs (Figure S7C). Thus, at least some of the enhancers identified here as active in human mammary cells appear to also be active in the mouse mammary gland. This analysis also revealed cell-type-specific enhancer activation of many genes with an established role in mammary epithelial biology. For example, many enhancers in the vicinity of BC-specific genes were defined as active only in BCs (e.g., TP63, THY1, VIM, and SNAI2). Similarly, multiple enhancers near LC-specific genes were active only in these cells (e.g., ESR1, KLF4, FOXA1, S100P, and TBX3), and several enhancers near KIT and ELF5 were active exclusively in LPs. Likewise, we found shared active enhancers for many genes that are expressed at higher levels in BCs and LPs, the two mammary subsets that show high progenitor activity (e.g., KRT5, EGFR, ITGA6,
HOXA4, and FOXO1). Significantly fewer distal enhancers were defined as ‘‘poised’’ (on average, only 5% of the primed enhancers in each cell type), and these showed a weak inverse relationship with gene expression (less with upregulated genes and more with downregulated genes, Figure S7D). We then compared the state of enhancer priming and activation within the different mammary subsets. This analysis showed that up to 40% of the enhancers that were specifically active in a particular subset were in a primed state in the other two subsets (Fisher p value < 105). This change in enhancer state was seen more often in LPs than in either LCs or BCs (Figure 6D). A deeper analysis of H3K4me1 marking of active enhancers revealed a significant imbalance in cell-specific marking (Figure 6E). Thus, of the enhancers classified as active only in LCs, 50% (2,500) of those that were primed in LPs were unmarked in BCs. However, only 30% (1,000) of those primed in BCs were unmarked in LPs. An analogous pattern was seen for enhancers classified as active only in BCs, with 60% (4,000) of those identified as primed in LPs being unmarked in LCs and only 30% (1,200) of those identified as primed in LCs being unmarked in LPs. Similar numbers of active LP-specific enhancers were also found to be primed in BCs and LCs. However, approximately half of these were unmarked in the other cell type (55% and 50% in BCs and LCs, respectively). This evolving pattern of enhancer activation states is again consistent with an underlying hierarchy of human mammary cell differentiation states in which LPs represent an epigenomic intermediate between BCs and LCs, with enhancer activation and decommissioning being governed by H3K27ac addition and removal, respectively. Importantly, functional annotation of genes associated with enhancers found to be active in BCs and primed in LPs showed enrichment for terms related to cell motility and adhesion (p < 0.01 FDR, binomial testing; Figure 6F). Genes associated with these terms and active enhancers in BCs were also expressed at a higher level in BCs than in LPs and LCs (Figure 6G). Strikingly, >60% of the enhancers uniquely active in one of the mammary cell types were not marked by H3K4me1 in either of the other two. This suggests that a majority of enhancers are activated de novo in each cell type. However, 50% of the genes associated with de novo activated enhancers were also found to be associated with active enhancers that are primed or active in
Figure 6. Differential Enhancer Activation in Different Human Mammary Cell Types (A) Venn diagrams showing the numbers of enhancers identified in each epithelial cell type as primed, poised, active, and mixed. (B and C) Fraction of differentially expressed genes associated with at least one active (B) or primed (C) enhancer in only one of the two cell types in each pairwise comparison. p values are from a two-tailed Fisher’s test. (D) Fraction of active enhancers present in only one epithelial cell type marked by H3K4me1 in the other two samples. p values are from a two-tailed Fisher’s test of a comparison between cell types. (E) Hive plots showing the proportion of active enhancers specific to each cell type (red), found to be off (gray), primed (green), poised (blue), or mixed (magenta) in the other cell types. Ribbons are proportional to the number of enhancers changing marking between two cell types. Left: active enhancers specific to BCs. Middle: active enhancers specific to LPs. Right: active enhancers specific to LCs. (F) Top eight gene ontologies enriched in association with enhancers classified as active in BCs, primed in LPs, and off in LCs. (G) Expression of genes associated with the gene ontologies shown in (F) and with enhancers classified as active in BCs, primed in LPs, and off in LCs. (Black lines represent medians). (H) Number of genes whose expression was associated only with active enhancers not primed in other cell types or with active enhancers that were either primed or not in other cell types. (I) Heatmap showing the levels of DNA methylation of the most variable CpG sites within UMRs and LMRs. Each row represents one CpG site. Sites are grouped according to their changes in methylation across the epithelial cell types. See also Figure S7.
2070 Cell Reports 17, 2060–2074, November 15, 2016
(legend on next page)
Cell Reports 17, 2060–2074, November 15, 2016 2071
one of the other two mammary cell types (Figure 6H). This association might be explained by two coexisting modes of enhancer activation within these mammary epithelial subsets: one in which active enhancers fall close to enhancers primed or active in the other cell subpopulations and one independent of pre-existing enhancer states. In summary, from a complete description of human mammary epithelial cell enhancer activation states, we show that their enhancer profiles display extensive cell-type specificity, with no greater similarity between LPs and LCs than between LPs and BCs, or between LCs and BCs. This epigenomic distinction of LPs and LCs suggests that enhancer activation plays an important role in regulating their distinct transcriptional profiles, with many BC- and LC-specific genes primed for activation or re-activation in LPs, even though they maintain a BC- or LC-specific transcriptional profile. The DNA methylation data obtained on the same three cell samples also showed most of the changes in DNA methylation followed a progressive shift from hyper- to hypo-methylation, or vice versa from hypo- to hyper-methylation, in BCs, then LPs, and then LCs. Many CpG sites were also specifically hypo-methylated in LPs, indicative of a marked cell-type-specific methylation profile (Figure 6I). Identification of Distinct TF Regulatory Networks We next mapped the DNA binding sites of 319 TFs to genomic regions identified as cell-type-specific active enhancers (Figures 7A and 7B). After filtering for significance (p < 106) and expression (reads per kilobase of transcript per million mapped reads [RPKM] > 1), binding sites for 64 TFs were found to be enriched in BCs, 21 in LPs, and 25 in LCs (an average of 72% of the H3K27ac peaks analyzed contained at least one binding site for the TFs enriched in the corresponding cell type). This analysis revealed TP63 and FOXA1 to have the most prevalent binding sites in BCs and LCs, respectively, consistent with their reported selective functions in these cells (Bernardo et al., 2010; Chakrabarti et al., 2014). Analogously identified TFs in LPs were EHF and ELF5, two members of the ETS family of TFs with very similar DNA binding motifs. ELF5 is a recently identified luminal-specific TF, but a role of EHF in the mammary gland has not been previously reported. However, EHF has a known role in the establishment and maintenance of epithelial cell phenotypes in other tissues (Albino et al., 2012; Stephens et al., 2013). An examination of several other reported mammary datasets confirmed higher levels of ELF5 and EHF transcripts in LPs as compared to LCs and their very low levels in BCs. Comparison of the relative prevalence of each of these four enhancer-based TF binding sites confirmed their respective cell-type specificities. This analysis also identified a number of other TF binding sites uniquely prevalent in one of the three normal human mammary
cell types, independently of whether all or only inter-genic or only intra-genic enhancers were considered (Figure 7B; Table S6). In BCs, these included binding sites for EGR1 and EGR2, as well as members of the TEAD EBF and TCF families, and TP53. In LCs, this analysis identified binding sites for many members of the AP-1 TF complex, together with ESR1 and FOXP1. The TFs similarly identified in LPs included GRHL2, ELF1, and ETS1. Lastly, we used these enriched TF binding sites to derive a TF regulatory network for each of the three human mammary cell types. To construct these networks, we inferred a positive interaction between two TFs whenever a binding site for one of the TFs was found in an active enhancer within 40 kb (62% of all regions analyzed) of the gene for the other TF and expression of the two TF genes was positively or negatively correlated (absolute Spearman correlation coefficient > 0.25 from a published array dataset; Kannan et al., 2013). The networks thus generated show a high degree of independence in cell-type-specific TFs, with very few shared nodes and edges (Figures 7C–7E; Table S7). The network operative in BCs shows a highly interconnected central group of TFs in which TP63, NF1 (NFIB), EGR2, and ETS1 are the most prominent. In contrast, the network generated for LCs is much smaller, centered on FOXA1, BATF, and STAT5, with many TFs potentially contributing to the regulation of ESR1. The network derived for LPs is dominated by ELF5 and EHF, which regulate each other, as well as EHF regulating itself. Interestingly, the LP network contains a module common to the BC network that involves NFIB and ETS1, and it includes five TFs present in the LC network (ATF3, BATF, FOS, FOSL1, and FOSL2), reinforcing the concept of LPs as an epigenomically intermediate state between BCs and LCs. These findings illustrate the power of complete epigenome profiles to reveal novel information about mechanisms that control the transcriptional landscape of biologically complex normal adult human tissues. We anticipate the networks defined here will be further refined by a deeper subsetting of the populations analyzed, as has been suggested (Visvader and Stingl 2014). However, the fact that the present datasets have been generated from very large pools of highly purified cells isolated from six donors makes it likely that they will be generally representative of normal breast cells from premenopausal women. EXPERIMENTAL PROCEDURES Breast Tissue Dissociation, Cell Sorting, and In Vitro Progenitor Assays Histologically normal breast tissue was obtained with informed consent from six healthy premenopausal adult female donors undergoing reduction mammoplasty surgeries and used according to procedures approved by the Research Ethics Board of the University of British Columbia. Fresh tissue
Figure 7. Distinct Transcriptional Regulatory Networks Derived for Each Mammary Epithelial Subpopulation (A) Table showing the top five TF binding sites enriched in the sequences associated with active enhancers specific to each epithelial cell type. (B) Heatmap showing the normalized enrichment value (ln(p value)/max((ln(p value))) of all TF binding sites found to be significantly enriched (p < 106) and expressed (RPKM > 1) in each epithelial cell type. (C–E) TF regulatory networks constructed for BCs (C), LPs (D), and LCs (E). Node size is proportional to the enrichment value shown in (B), and edge thickness is proportional to the gene expression correlation of the two nodes connected. Green arrows between the two genes indicate a positive correlation. Magenta lines indicate a negative correlation. See also Tables S6 and S7.
2072 Cell Reports 17, 2060–2074, November 15, 2016
was viably cryopreserved as enzymatically isolated organoids and then thawed and dissociated into single-cell suspensions and the four defined mammary cell types isolated by fluorescence-activated cell sorting (FACS) (Figure 1A) as previously described (Kannan et al., 2014). Similar subsets from each sort were combined in the same proportions to obtain pools of R50 3 106 cells for each subset. The cells were washed in PBS containing protease inhibitors and frozen until processed for ChIP-seq or DNA/RNA extraction. The progenitor content of each subset was determined independently, as previously described (Kannan et al., 2013).
REFERENCES
Production of Reference Epigenomes RNA and DNA extraction and ChIP library construction and sequencing were performed following the guidelines formulated by the IHEC (http://www. ihec-epigenomes.org). These guidelines, as well as the standard operating procedures for RNA-seq, ChIP-seq, and whole-genome bisulfite sequencing library construction, are available at http://www.epigenomes.ca/ protocols-and-standards or by request.
Andersson, R., Gebhard, C., Miguel-Escalada, I., Hoof, I., Bornholdt, J., Boyd, M., Chen, Y., Zhao, X., Schmidl, C., Suzuki, T., et al.; FANTOM Consortium (2014). An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461.
Epigenomic Data Analysis Detailed methods explaining the data analysis performed for the RNA-seq, ChIP-seq, and WGBS datasets are provided in Supplemental Experimental Procedures. ACCESSION NUMBERS The accession numbers for the raw sequence datasets reported in this paper are EGA: EGAS00001000552 and EpIRR: IHECRE00000223, IHECRE00000228, IHECRE00000231, IHECRE00000242, IHECRE00000001, IHECRE00000225, and IHECRE00000227. Processed signal tracks are also available at http://www.epigenomes.ca. SUPPLEMENTAL INFORMATION Supplemental Information includes Supplemental Experimental Procedures, seven figures, and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.celrep.2016.10.058. AUTHOR CONTRIBUTIONS D.P., N.K., M.H., and C.J.E. designed the project, and C.J.E. and M.H. were jointly responsible for its overall execution and quality control. N.K. sorted the primary samples and performed in vitro assays. M. Moksa, M. Marra, S.J.M.J., R.M., A.M., and S.A. oversaw the generation of sequence data under the guidance of M.H., and D.P., M.B., A.H.-M., D.J.H.F.K., S.G., and A.C. analyzed it. D.P., M.H., and C.J.E. wrote the manuscript. All authors contributed to the interpretation of the results and read and approved the manuscript. ACKNOWLEDGMENTS The authors thank D. Wilkinson, G. Edin, and M. Hale for excellent technical support; Drs. E. Bovill, J. Boyle, S. Bristol, P. Gdalevitch, A. Seal, J. Sproul, and N. van Laeken for access to discarded reduction mammoplasty tissue; and E. Chun and Dr. A. Gagliardi for critical reading of this manuscript. This work was supported by the Canadian Cancer Society Research Institute (grants 702294 and 702851) and by Genome British Columbia and the Canadian Institutes of Health Research as part of the Canadian Epigenetics, Environment and Health Research Consortium Network (CIHR-262119). N.K. was supported by a MITACS Elevate Fellowship (application ref. IT02817). D.J.H.F.K. received a Vanier Canada Graduate Scholarship from CIHR (CGV-121184). S.A. is supported by a Canada Research Chair in Molecular Oncology. Received: January 14, 2016 Revised: August 10, 2016 Accepted: September 30, 2016 Published: November 15, 2016
Albino, D., Longoni, N., Curti, L., Mello-Grand, M., Pinton, S., Civenni, G., Thalmann, G., D’Ambrosio, G., Sarti, M., Sessa, F., et al. (2012). ESE3/EHF controls epithelial cell differentiation and its loss leads to prostate tumors with mesenchymal and stem-like features. Cancer Res. 72, 2889–2900. Amin, V., Harris, R.A., Onuchic, V., Jackson, A.R., Charnecki, T., Paithankar, S., Lakshmi Subramanian, S., Riehle, K., Coarfa, C., and Milosavljevic, A. (2015). Epigenomic footprints across 111 reference epigenomes reveal tissue-specific epigenetic regulation of lincRNAs. Nat. Commun. 6, 6370.
Antequera, F., Boyes, J., and Bird, A. (1990). High levels of de novo methylation and altered chromatin structure at CpG islands in cell lines. Cell 62, 503–514. Bernardo, G.M., Lozada, K.L., Miedler, J.D., Harburg, G., Hewitt, S.C., Mosley, J.D., Godwin, A.K., Korach, K.S., Visvader, J.E., Kaestner, K.H., et al. (2010). FOXA1 is an essential determinant of ERalpha expression and mammary ductal morphogenesis. Development 137, 2045–2054. Bernstein, B.E., Mikkelsen, T.S., Xie, X., Kamal, M., Huebert, D.J., Cuff, J., Fry, B., Meissner, A., Wernig, M., Plath, K., et al. (2006). A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326. Burleigh, A., McKinney, S., Brimhall, J., Yap, D., Eirew, P., Poon, S., Ng, V., Wan, A., Prentice, L., Annab, L., et al. (2015). A co-culture genome-wide RNAi screen with mammary epithelial cells reveals transmembrane signals required for growth and differentiation. Breast Cancer Res. 17, 4. isin, M., Chakrabarti, R., Hwang, J., Andres Blanco, M., Wei, Y., Lukac Romano, R.-A., Smalley, K., Liu, S., Yang, Q., Ibrahim, T., et al. (2012). Elf5 inhibits the epithelial-mesenchymal transition in mammary gland development and breast cancer metastasis by transcriptionally repressing Snail2. Nat. Cell Biol. 14, 1212–1222. Chakrabarti, R., Wei, Y., Hwang, J., Hang, X., Andres Blanco, M., Choudhury, A., Tiede, B., Romano, R.-A., DeCoste, C., Mercatali, L., et al. (2014). DNp63 promotes stem cell activity in mammary gland development and basal-like breast cancer by enhancing Fzd7 expression and Wnt signalling. Nat. Cell Biol. 16, 1004–1015, 1–13. Choudhury, S., Almendro, V., Merino, V.F., Wu, Z., Maruyama, R., Su, Y., Martins, F.C., Fackler, M.J., Bessarabova, M., Kowalczyk, A., et al. (2013). Molecular profiling of human mammary gland links breast cancer risk to a p27(+) cell population with progenitor characteristics. Cell Stem Cell 13, 117–130. Debnath, J., Muthuswamy, S.K., and Brugge, J.S. (2003). Morphogenesis and oncogenesis of MCF-10A mammary epithelial acini grown in three-dimensional basement membrane cultures. Methods 30, 256–268. Dos Santos, C.O., Dolzhenko, E., Hodges, E., Smith, A.D., and Hannon, G.J. (2015). An epigenetic memory of pregnancy in the mouse mammary gland. Cell Rep. 11, 1102–1109. Eirew, P., Stingl, J., Raouf, A., Turashvili, G., Aparicio, S., Emerman, J.T., and Eaves, C.J. (2008). A method for quantifying normal human mammary epithelial stem cells with in vivo regenerative ability. Nat. Med. 14, 1384–1389. Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216. Fridriksdottir, A.J., Kim, J., Villadsen, R., Klitgaard, M.C., Hopkinson, B.M., Petersen, O.W., and Rønnov-Jessen, L. (2015). Propagation of oestrogen receptor-positive and oestrogen-responsive normal human breast cells in culture. Nat. Commun. 6, 8786. Fu, N., Lindeman, G.J., and Visvader, J.E. (2014). The mammary stem cell hierarchy. Curr. Top. Dev. Biol. 107, 133–160. Gascard, P., Bilenky, M., Sigaroudinia, M., Zhao, J., Li, L., Carles, A., Delaney, A., Tam, A., Kamoh, B., Cho, S., et al. (2015). Epigenetic and transcriptional determinants of the human breast. Nat. Commun. 6, 6351.
Cell Reports 17, 2060–2074, November 15, 2016 2073
Huh, S.J., Clement, K., Jee, D., Merlini, A., Choudhury, S., Maruyama, R., Yoo, R., Chytil, A., Boyle, P., Ran, F.A., et al. (2015). Age- and pregnancy-associated DNA methylation changes in mammary epithelial cells. Stem Cell Reports 4, 297–311. Kannan, N., Huda, N., Tu, L., Droumeva, R., Aubert, G., Chavez, E., Brinkman, R.R., Lansdorp, P., Emerman, J., Abe, S., et al. (2013). The luminal progenitor compartment of the normal human mammary gland constitutes a unique site of telomere dysfunction. Stem Cell Reports 1, 28–37. Kannan, N., Nguyen, L.V., Makarem, M., Dong, Y., Shih, K., Eirew, P., Raouf, A., Emerman, J.T., and Eaves, C.J. (2014). Glutathione-dependent and -independent oxidative stress-control mechanisms distinguish normal human mammary epithelial cell subsets. Proc. Natl. Acad. Sci. USA 111, 7789–7794. Kuperwasser, C., Chavarria, T., Wu, M., Magrane, G., Gray, J.W., Carey, L., Richardson, A., and Weinberg, R.A. (2004). Reconstruction of functionally normal and malignant human breast tissues in mice. Proc. Natl. Acad. Sci. USA 101, 4966–4971. Lim, E., Vaillant, F., Wu, D., Forrest, N.C., Pal, B., Hart, A.H., Asselin-Labat, M.-L., Gyorki, D.E., Ward, T., Partanen, A., et al.; kConFab (2009). Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Nat. Med. 15, 907–913. Lim, E., Wu, D., Pal, B., Bouras, T., Asselin-Labat, M.-L., Vaillant, F., Yagita, H., Lindeman, G.J., Smyth, G.K., and Visvader, J.E. (2010). Transcriptome analyses of mouse and human mammary cell subpopulations reveal multiple conserved genes and pathways. Breast Cancer Res. 12, R21. Makarem, M., Kannan, N., Nguyen, L.V., Knapp, D.J.H.F., Balani, S., Prater, M.D., Stingl, J., Raouf, A., Nemirovsky, O., Eirew, P., and Eaves, C.J. (2013). Developmental changes in the in vitro activated regenerative activity of primitive mammary epithelial cells. PLoS Biol. 11, e1001630. Maruyama, R., Choudhury, S., Kowalczyk, A., Bessarabova, M., BeresfordSmith, B., Conway, T., Kaspi, A., Wu, Z., Nikolskaya, T., Merino, V.F., et al. (2011). Epigenetic regulation of cell type-specific expression patterns in the human mammary epithelium. PLoS Genet. 7, e1001369. Nguyen, L.V., Makarem, M., Carles, A., Moksa, M., Kannan, N., Pandoh, P., Eirew, P., Osako, T., Kardel, M., Cheung, A.M.S., et al. (2014). Clonal analysis via barcoding reveals diverse growth and differentiation of transplanted mouse and human mammary stem cells. Cell Stem Cell 14, 253–263. Nguyen, L.V., Pellacani, D., Lefort, S., Kannan, N., Osako, T., Makarem, M., Cox, C.L., Kennedy, W., Beer, P., Carles, A., et al. (2015). Barcoding reveals
2074 Cell Reports 17, 2060–2074, November 15, 2016
complex clonal dynamics of de novo transformed human mammary cells. Nature 528, 267–271. Pal, B., Bouras, T., Shi, W., Vaillant, F., Sheridan, J.M., Fu, N., Breslin, K., Jiang, K., Ritchie, M.E., Young, M., et al. (2013). Global changes in the mammary epigenome are induced by hormonal cues and coordinated by Ezh2. Cell Rep. 3, 411–426. Raouf, A., Zhao, Y., To, K., Stingl, J., Delaney, A., Barbara, M., Iscove, N., Jones, S., McKinney, S., Emerman, J., et al. (2008). Transcriptome analysis of the normal human mammary cell commitment and differentiation process. Cell Stem Cell 3, 109–118. Roadmap Epigenomics Consortium; Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330. Shin, H.Y., Willi, M., Yoo, K.H., Zeng, X., Wang, C., Metser, G., and Hennighausen, L. (2016). Hierarchy within the mammary STAT5-driven Wap superenhancer. Nat. Genet. 48, 904–911. Stadler, M.B., Murr, R., Burger, L., Ivanek, R., Lienert, F., Scho¨ler, A., van Nimwegen, E., Wirbelauer, C., Oakeley, E.J., Gaidatzis, D., et al. (2011). DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495. Stephens, D.N., Klein, R.H., Salmans, M.L., Gordon, W., Ho, H., and Andersen, B. (2013). The Ets transcription factor EHF as a regulator of cornea epithelial cell identity. J. Biol. Chem. 288, 34304–34324. Theodorou, V., Stark, R., Menon, S., and Carroll, J.S. (2013). GATA3 acts upstream of FOXA1 in mediating ESR1 binding by shaping enhancer accessibility. Genome Res. 23, 12–22. Visel, A., Minovitsky, S., Dubchak, I., and Pennacchio, L.A. (2007). VISTA Enhancer Browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92. Visvader, J.E., and Lindeman, G.J. (2012). Cancer stem cells: current status and evolving complexities. Cell Stem Cell 10, 717–728. Visvader, J.E., and Stingl, J. (2014). Mammary stem cells and the differentiation hierarchy: current status and perspectives. Genes Dev. 28, 1143–1158. Voigt, P., Tee, W.-W., and Reinberg, D. (2013). A double take on bivalent promoters. Genes Dev. 27, 1318–1338.