Chromatin and transcriptional signatures for Nodal signaling during endoderm formation in hESCs

Chromatin and transcriptional signatures for Nodal signaling during endoderm formation in hESCs

Developmental Biology 357 (2011) 492–504 Contents lists available at ScienceDirect Developmental Biology j o u r n a l h o m e p a g e : w w w. e l ...

1MB Sizes 26 Downloads 41 Views

Developmental Biology 357 (2011) 492–504

Contents lists available at ScienceDirect

Developmental Biology j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / d eve l o p m e n t a l b i o l o g y

Genomes and Developmental Control

Chromatin and transcriptional signatures for Nodal signaling during endoderm formation in hESCs Si Wan Kim a, 1, Se-Jin Yoon a, 1, Edward Chuong a, 1, Chuba Oyolu b, Andrea E. Wills a, Rakhi Gupta a, Julie Baker a, c,⁎ a b c

Department of Genetics, Stanford University, Stanford, CA 94305, USA Department of Bioengineering, Stanford University, Stanford, CA 94305, USA Department of Obstetrics and Gynecology, Stanford University, Stanford, CA 94305, USA

a r t i c l e

i n f o

Article history: Received for publication 29 March 2011 Revised 6 June 2011 Accepted 7 June 2011 Available online 29 June 2011 Keywords: Nodal SMAD Histone methylation JMJD3 Endoderm hESC

a b s t r a c t The first stages of embryonic differentiation are driven by signaling pathways hardwired to induce particular fates. Endoderm commitment is controlled by the TGF-β superfamily member, Nodal, which utilizes the transcription factors, SMAD2/3, SMAD4 and FOXH1, to drive target gene expression. While the role of Nodal is well defined within the context of endoderm commitment, mechanistically it is unknown how this signal interacts with chromatin on a genome wide scale to trigger downstream responses. To elucidate the Nodal transcriptional network that governs endoderm formation, we used ChIP-seq to identify genomic targets for SMAD2/3, SMAD3, SMAD4, FOXH1 and the active and repressive chromatin marks, H3K4me3 and H3K27me3, in human embryonic stem cells (hESCs) and derived endoderm. We demonstrate that while SMAD2/3, SMAD4 and FOXH1 associate with DNA in a highly dynamic fashion, there is an optimal bivalent signature at 32 gene loci for driving endoderm commitment. Initially, this signature is marked by both H3K4me3 and H3K27me3 as a very broad bivalent domain in hESCs. Within the first 24 h, SMAD2/3 accumulation coincides with H3K27me3 reduction so that these loci become monovalent marked by H3K4me3. JMJD3, a histone demethylase, is simultaneously recruited to these promoters, suggesting a conservation of mechanism at multiple promoters genome-wide. The correlation between SMAD2/3 binding, monovalent formation and transcriptional activation suggests a mechanism by which SMAD proteins coordinate with chromatin at critical promoters to drive endoderm specification. © 2011 Elsevier Inc. All rights reserved.

Introduction Specialized signaling pathways drive the differentiation of human embryonic stem cells (hESCs) toward specific cell fates, but how these signaling networks cooperate with chromatin state has not been extensively investigated. hESCs have highly euchromatic chromatin and, upon differentiation, heterochromatic regions begin accumulating (Meshorer and Misteli, 2006). Within this euchromatic signature, hESCs have a prevalent histone signature, called a bivalent domain, where promoters are associated with both active (H3K4me3) and repressive (H3K27me3) histone marks (Bernstein et al., 2006; Ku et al., 2008). A particular bivalent conformation, with broad marks of both H3K4me3 and H3K27me3 across promoters, exhibits a significant association with the promoters of developmentally regulated genes. This bivalent domain in hESCs has been hypothesized to ‘poise’ developmental genes for rapid activation (Bernstein et al., 2006). ⁎ Corresponding author at: Department of Genetics, Stanford University, Stanford, CA 94305, USA. Fax: + 1 650 725 1534. E-mail address: [email protected] (J. Baker). 1 Authors contributed equally to this work. 0012-1606/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ydbio.2011.06.009

Indeed, several reports have shown that bivalent domains are resolved into either repressive (H3K27me3) or active (H3K4me3) states upon differentiation, suggesting that cell fate commitment may require the release of this primed bivalent state (Bernstein et al., 2006; Mikkelsen et al., 2007; Pan et al., 2007; Zhao et al., 2007). While hESCs harbor a significant number of bivalent domains at developmental promoters, this chromatin conformation exists to a lesser degree in more differentiated cell types as well, including mouse embryonic fibroblasts (MEFs), neural progenitor cells (NPCs) and T cells, suggesting that these domains continue to mark promoters for further functional and rapid activation (Bernstein et al., 2006; Cui et al., 2009; Mikkelsen et al., 2007; Pan et al., 2007; Zhao et al., 2007). While the bivalent domain is hypothesized to allow rapid activation of molecules critical to guiding specific differentiation events, there is very little molecular or biological evidence that supports this general hypothesis, other than the important biochemical observation describing this interesting domain. Signaling molecules must play a role – either actively or passively – in the depletion of the repressive mark, H3K27me3, and the accumulation of the active mark, H3K4me3, at these specific developmentally important regions. It is imperative that the interplay between these marks and specific transcription

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

factors are monitored through this process, together, during the differentiation from pluripotency to a more differentiated state. Endoderm is one of the first cell types to emerge during embryogenesis and does so under the control of the Nodal signaling pathway. hESCs can be driven toward endodermal fates by activation of the Nodal signaling pathway, which utilizes serine threonine kinase receptors to phosphorylate the intracellular proteins SMAD2 and SMAD3 (Attisano and Wrana, 2000; Schier, 2003; Shen, 2007). These proteins translocate into the nucleus and form an association with FOXH1 at regions within the genome. Several direct targets of SMAD2/3/4 and FOXH1 have been elucidated, including GSC, PITX2, LEFTY1, LEFTY2, NODAL and CADHERIN (Izzi et al., 2007; Saijoh et al., 2003; Shiratori et al., 2001; Takaoka et al., 2006; von Both et al., 2004). Genome-wide analysis of EOMES suggests that it is a key regulator of SMAD2/3 at overlapping promoters leading to the initial phases of endoderm formation (Teo et al., 2011). However, little is known regarding how the SMAD2/3/4 and FOXH1 complex assembles at specific genomic regions in a cell type specific manner. Much less is known about how this complex interacts with chromatin to release a repressive state. Recently, a mechanism has been proposed: Dahle et al. (2010) demonstrated that SMAD2/3 was capable of recruiting the histone demethylase, JMJD3, to the NODAL promoter in mouse ESCs, causing the loss of the repressive mark H3K27me3. This strongly suggests that a SMAD2/3/JMJD3 complex acts directly on the repressive chromatin state, leading to transcriptional activation. Nodal signaling is also required for self-renewal in hESCs (Besser, 2004; James et al., 2005; Vallier et al., 2005, 2009; Xu et al., 2008) which appears contradictory as it is also involved in the first stages of endoderm commitment (D'Amour et al., 2005, 2006). As Nodal has strong dose-dependent effects on cell fate specification, it is likely that the decision between maintaining pluripotency versus differentiation is due to significant changes in downstream effects in response to varying levels of Nodal signaling. The effect of Nodal in self-renewal may also be dependent upon the distinct chromatin state existing in hESCs, including open chromatin and large array of bivalent domains (Bernstein et al., 2006; Ku et al., 2008; Meshorer and Misteli, 2006). To examine how Nodal signaling interacts and effects particular chromatin states during hESC self renewal and endoderm commitment, we used ChIP-seq to generate genome-wide target maps for Nodal downstream transcription factors, including SMAD2/3, SMAD3, SMAD4, FOXH1 and the chromatin marks, H3K4me3 and H3K27me3, in hESCs and derived endoderm (D'Amour et al., 2005, 2006). Interestingly, we find that endoderm derived from hESCs has a wider array of bivalent domain structures than hESCs. At promoters of genes critical for differentiation, we observe a significant depletion of H3K27me3 only with a specific bivalent context. These ‘resolving’ bivalent domains are highly correlated with SMAD2/3 binding. We further show that the resolution of these regions occurs together with SMAD2/3 accumulation within the first 24 h of differentiation and are associated with JMJD3. This suggests that the resolution of the bivalent domain is not a cause or effect of SMAD2/3 binding, but a cooperative association between the transcription factor and chromatin context. Material and methods ChIP ChIP was performed as previously described (Johnson et al., 2007). 5 × 106 cells were used for each ChIP. Cells were cross-linked with 1% formaldehyde for 15 min at room temperature and the cross-linking was stopped by adding Glycine (to be 125 mM) for 5 min. The crosslinked cells were lysed in 500 μl of lysis buffer (50 mM Tris pH8.1, 10 mM EDTA, 1% SDS) with protease inhibitor cocktail (Roche). After being diluted in 1.5 ml of IP buffer (25 mM Tris pH 8.0, 150 mM NaCl, 0.01% SDS, 0.5% Deoxycholate, 1% TritonX-100, 5 mM EDTA), the cell lysate was sonicated to generate 200- to 600-bp fragments. Fragmented

493

chromatin was immunoprecipitated with magnetic beads coupled with 5 μg of each antibody. The antibodies used were anti-SMAD2/3 (A; Santa Cruz, sc-8332, B; R&D Systems, AF3797), anti-SMAD3 (Abcam, ab28379), anti-SMAD4 (R&D Systems, AF2097), anti-FOXH1 (R&D Systems, AF4248), anti-JMJD3 (Abgent, AP1022a), anti-H3K4me3 (Abcam, ab8580) and anti-H3K27me3 antibody (Upstate, 07–449). The pulldowned beads with immune complexes were washed twice each with low salt buffer (50 mM Tris pH 8.0, 150 mM NaCl, 0.1% SDS, 0.5% Deoxycholate, 1% NP40, 1 mM EDTA), high salt buffer (50 mM Tris pH 8.0, 500 mM NaCl, 0.1% SDS, 0.5% Deoxycholate, 1% NP40, 1 mM EDTA), LiCl wash buffer (50 mM Tris pH 8.0, 250 mM LiCl, 0.5% Deoxycholate, 1% NP40, 1 mM EDTA) and Tris-EDTA buffer (10 mM Tris pH 8.0, 1 mM EDTA) for 10 min at 4 °C. Immune complexes were eluted in elution buffer (1% SDS, 0.1 M NaHCO3) for 1 h at 65 °C and reverse cross-linked for 6 h at 65 °C. For sequential ChIP, the immune complexes from the first pulldown were eluted in the elution buffer for 1.5 h at 37 °C. After diluted in IP buffer, the complexes were immunoprecipitated with beads coupled with second antibody, and washed and eluted again in the elution buffer for 1 h at 65 °C. Reverse cross-linked DNA was purified by phenol/chloroform extraction and further by QIAquick PCR purification kit (Qiagen) and an aliquot was used for PCR validation. We performed these ChIPs at least six times for each of the antibodies and the qPCR levels had to be above 10× enriched above input control in each of the six replicates in order to move forward with making a sequencing library. Sequencing and data processing Sequencing libraries were prepared using Genomic DNA Sample Kit (Illumina) according to the manufacturer's protocol. The ChIP-seq libraries were sequenced by Genome Analyzer II (Illumina). Reads were 36 bp, and we used Eland to map the first 25 bp to human genome assembly hg18 and mismatches up to 3 bp were allowed. Only uniquely mapping reads were kept. Shifts were calculated within the peak calling programs QuEST 2.4 (Valouev et al., 2008) and CisGenome (Ji et al., 2008). The negative control for sequencing was the Input control library. Transcription factor ChIP-seq reads were processed to call peaks using CisGenome. The setting for peak-calling and sliding window size was 300 bp and the threshold number of reads required for peak to be called was 11 reads. The false discovery rate allowed was 0.01. The resulting peaks were mapped to the human genome hg18 to identify the locations and numbers of peaks around annotated genes. Histone H3K4me3 and H3K27me3 peaks were called using QuEST 2.4. We used the “histone” bandwidth setting with “relaxed” peak-calling parameters. Transcription factor binding regions and associated genes We parsed the peaks to determine distributions across the gene body. UCSC Known Genes (Human browser hg18) were used to locate the peaks into annotated genomic regions, exon, intron, promoter (±10 kb from transcription start site (TSS)) or intergenic region. We then associated genes for each peak and the nearest genes within 1 Mb, 100 kb, 10 kb and 1 kb from TSS were counted. If additional genes were within 25% of the distance to the closest genes, they were also associated with the binding site. Further, we examined the numbers of genes lying within the binding sites between hESCs and endoderm within the same distance categories. Xenopus tropicalis ortholog identification and in situ hybridization X. tropicalis orthologs of candidate human Nodal signaling target genes were identified by overall sequence homology validated by synteny between the human and X. tropicalis sequences (identified using Metazome). ESTs for the corresponding X. tropicalis gene were identified using the JGI genome browser, and IMAGE clones

494

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

containing appropriate ESTs were used as probes for in situ hybridization. Linearized plasmid was reverse-transcribed and labeled with DIG-UTP according to standard methods, and in situ hybridization using a multibasket technique was carried out as described in Khokha et al. (2002). Histone peak clustering We used K-means clustering (http://bonsai.ims.u-tokyo.ac.jp/ ~mdehoon/software/cluster/software.htm) to visualize the H3K4me3 and H3K27me3 surrounding TSS in the genome. The wiggle/enrichment plots represent normalized enrichment over the background. The data points were the normalized enrichment values that are calculated by QuEST. The log2(enrichment) values were used for clustering and plotting. H3K4me3 and H3K27me3 marks were analyzed depending on their patterns within ±5 kb of the TSS from UCSC Known Genes. For gene loci with isoforms with alternate TSS's, we chose the TSS with the largest H3K4me3 peak. Genes with a TSS within 10 kb of another gene TSS were discarded for clustering analysis. To functionally define these clusters, GO analysis was performed using PANTHER (Protein Analysis through Evolutionary Relationships) (www.pantherdb.org). In addition, we examined how Groups 1–9 are correlated with transcriptional activation in the context of endoderm. To this end we used microarray timecourse of hESC differentiation into endoderm to associate the behavior of transcripts, whether induced, constitutive, inactive or repressed with a specific histone grouping (1–9). We compared the transcriptional levels of day 5 derived endoderm (d5) with hESCs (d0). For each gene, we calculated the fold change (R), difference (D) between the means of the two groups, and the Welch's t-test p-value using dChip (Li and Wong, 2001). Induced genes were defined by R N 2 and D N 100 of d5 over d0, and P ≤ 0.05. Repressed genes were defined by R N 2 and D N 100 of d0 over d5, and P value ≤ 0.05. We also calculated the logarithm-transformed average (A) and difference (M) of the means of d0 and d5 for each gene. We calculated the z-scores of A (ZA) and the z-scores of M (ZM) for all genes. Constitutive genes were defined by ZA N 1 and ZM b 1. Inactive genes were defined by ZA b −1 and ZM b 1. Results Timecourse microarrays and extensive protein validation support endoderm formation from hESCs To characterize the downstream Nodal targets during the differentiation of hESCs into the endodermal lineage and to determine how this important signaling pathway interacts with chromatin, we first extensively validated our derivation of endodermal cells from hESCs. To this end, endodermal cells were derived from hESCs after a 5-day treatment with Activin (Fig. 1A), which acts as a Nodal analog (McLean et al., 2007; Schier, 2003; Shen, 2007). The efficiency of endoderm differentiation was rigorously examined using multiple methodologies, including determining the transcriptome of cells through time, analyzing the expression of endoderm specific proteins – with and without Nodal chemical inhibitors – and validating each timecourse using qPCR. To this end, we performed an extensive microarray timecourse during hESC differentiation into endoderm post Activin treatment, examining at days 0, 1, 3 and 5. The timecourse microarray data were deposited to the publicly available repository of GEO (accession number GSE29421). Several critical lineage specification genes including GSC, MIXL1 and EOMES are highly enriched (more than 40 fold) after the first 24 h of differentiation (Fig. 1B). Mesodermal markers are also expressed after 24 h of differentiation, including BRA and FGF4 suggesting the presence of a mesendodermal intermediate (Fig. 1B) (D'Amour et al., 2005; Liu et al., 2007). The transcripts of endoderm markers SOX17, FOXA2 and CXCR4 are upregulated (6–22 fold) as early as day 3 of differentiation while markers for other germ

layers, including mesoderm, are no longer expressed. Also, starting at day 3 of differentiation the protein levels of endoderm markers GATA4, GATA6, SOX17 and FOXA2 are induced (Fig. 1C). While extensive transcript profiles have been examined during endoderm commitment after exposure of hESCs to Activin, the dynamics of protein expression and sensitivity to Activin signals have not been demonstrated (D'Amour et al., 2005; Liu et al., 2007). Using western analysis to examine protein expression and its dependency on Nodal signaling, we examined multiple endodermal markers as well as the SMAD2/3 and FOXH1 proteins themselves throughout the course of endoderm differentiation at days 0, 1, 3, 4 and 5 post Activin treatment in the presence or absence of the Nodal chemical inhibitor, SB431542. This SB431542 inhibitor is known to be specific for SMAD2/3 signaling by inhibiting type I TGF-β receptors and has been studied in hESCs (Vallier et al., 2009; Xu et al., 2008). Here we show that the expression of phosphorylated SMAD2 (pSMAD2) and FOXH1 is upregulated in the nucleus as early as day 1 of differentiation after exposure to Activin. These proteins are clearly co-localized within the nucleus of the derived endodermal cells (Fig. 1D, E and F). Furthermore, pSMAD2 and FOXH1 expression is highly dependent on Nodal signaling as both proteins diminish following SB431542 treatment (Fig. 1E). Additionally, the endoderm markers LEFTY, GATA4, GATA6 and SOX17 were highly induced during differentiation as early as day 3 and maintained high level of expression throughout the rest of the timecourse (Fig. 1G). OCT4, a pluripotency marker, was strongly expressed in the nucleus of hESCs but decreased sharply during endoderm differentiation (Fig. 1C and E). This pattern is not the same for the OCT4 transcript, which remains abundant during the first few weeks of hESC differentiation in our timecourse microarray data, suggesting that the expression of OCT4 is more tightly controlled posttranscriptionally. When SB431542 was added during the third day of differentiation in Activin, LEFTY and GATA6 expression rapidly decreased within 24 h. Conversely, GATA4 and SOX17 showed weak or little reduction in protein in response to Nodal signaling inhibition when occurring on the third day of endoderm differentiation suggesting they are less dependent upon Nodal signaling or that these proteins are more stable. Interestingly, we have also shown that protein expressions of LEFTY, GATA4, GATA6 and SOX17 were completely inhibited if we treated cells with SB431542 at the beginning of differentiation of hESCs rather than day 3 differentiated cells, confirming that the initial expression of GATA4, GATA6, SOX17 and LEFTY are all dependent on Activin treatment (Yoon et al., unpublished results). Taken together, these results validate that, as in earlier studies, Activin/Nodal signaling drives definitive endoderm formation in hESCs and demonstrate fundamental expression differences in response to Nodal signaling. Genome-wide target analysis of SMAD2/3/4, FOXH1, H3K4me3 and H3K27me3 in hESCs and derived endoderm To examine the dynamics of the Nodal signaling pathway and associated chromatin marks during differentiation of hESCs into endoderm, we performed genome-wide location analysis using ChIPseq for SMAD2/3, SMAD3, SMAD4, FOXH1, H3K4me3 and H3K27me3. To this end, we examined multiple antibodies against SMAD2/3, SMAD3, SMAD4 and FOXH1 for their ability to pull down chromatin in both hESCs and derived-endoderm and found several, including two SMAD2/3 antibodies (anti-rabbit; SMAD2/3_A and anti-goat; SMAD2/3_B, Table 1) neither of which were phospho-specific, but which were highly effective for ChIP based upon extensive validation shown in Fig. S1. By using ChIPqPCR, we analyzed the enrichment of several known SMAD targets, including LEFTY1 and LEFTY2 (Fig. S1A) in all SMAD2/3, SMAD3, SMAD4 and FOXH1 ChIP samples. GAPDH intronic sequences were used as negative control. After validation, three biological replicate ChIPs were pooled from each antibody, as well as input controls, for both hESCs and derived endoderm. Libraries were then generated and sequenced with

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

A

Activin 100 ng/ml in RPMI medium

hESC 0%

0.2%

Endo

2%

2% FBS day 5

day 3

day 2

day 1

day 0

B Average Expression

8000 day 0

day 1

day 3

23914 15967 11533

day 5

6000

10284

4000 2000 0

L

DA

NO

C

495

F8

FG

T3

WN

F4

1

A

BR

FG

XF

FO

D

Activin

d0 d0 d1 d3 d5 d5 Hep GATA4

X1

S

C

GS

LH

ME

EO

E

Activin

1

XL

MI

K562 d0 d1 d3 d5

R

CE

2

4

7

X1

SO

CR

CX

XA

FO

d0 d3 d4 d5 o t t c t t Nu Cy Ac Ac SB Ac SB

pSMAD2

SMAD2/3

GATA6

pSMAD3

pSMAD2

SOX17

SMAD2/3

FOXH1

FOXA2

SMAD3

OCT4

OCT4

Tubulin

Tubulin

Tubulin

F

G

d4 (24 h ) d3

SMAD2/ 3

FOXH1

DAP I

Merge

t Ac V

d5 (48 h) t

SB Ac

V

SB

SMAD2/3 SMAD4 FOXH1 LEFTY GATA4 GATA6 SOX17 Tubulin

Fig. 1. Endoderm differentiation and validation. (A) Diagram for endoderm differentiation from hESCs. (B) Microarray expression data for lineage markers. Average expression was presented from array datasets of individual biological replicates for each gene over the timecourse. (C) Western blots for known endoderm regulators GATA4, GATA6, SOX17 and FOXA2. OCT4 is a marker for hESCs. Hep; HepG2 cell lysate as a positive control for FOXA2. (D) Western blots for phosphorylated SMAD2 and SMAD3 (pSMAD2 and pSMAD3). K562; positive control cell lysate for total SMADs but negative for pSMADs. (E) Western blots for samples after cellular fractionation or in the presence of SB431542. Nuc; nuclear lysate, Cyto; cytosolic lysate, Act; Activin treated, SB; SB431542 treated. (F) SMAD2/3 and FOXH1 immunohistochemistry on derived endoderm. (G) Western blots for known endoderm regulators and Nodal targets in the presence of SB431542 for specified duration. Act; Activin treated, V; the vehicle (ethanol) only, SB; SB431542 treated.

an Illumina Genome Analyzer II. The resulting sequence tags were mapped to the human genome (hg18) using Eland and binding sites were identified using CisGenome (Ji et al., 2008). The ChIP-seq data are available in GEO (accession number GSE29422). For SMAD2/3_A, SMAD2/3_B, SMAD3, SMAD4 and FOXH1, we generated 10.2, 8.7, 6.9, 9 and 10.4 million mapped reads in hESCs and 9.6, 5.9, 6.1, 6.1 and 10.8 million mapped reads in derived endoderm, respectively (Table 1). Since there were thousands of SMAD bound regions in both hESCs and endoderm, we validated these datasets using two methods. First, we compared bound regions elucidated from the two SMAD2/3 antibodies (A and B) and found a high degree of overlap in both hESCs and derived

endoderm (93% and 78%, respectively), showing that two independent antibodies pulled down highly overlapping regions. As the overlap was extensive, and the B dataset was far more comprehensive, all subsequent analysis was performed using the B dataset. The degree of overlap generated from two independent antibodies highly validates the dataset. Examples of bound regions are illustrated in Fig. 2. For further validation, we performed ChIP-qPCR at predicted loci with or without SMAD signaling present. To this end, we treated hESCs with SB431542 during the first 24 h of differentiation. We then examined by ChIP-qPCR whether SMAD2/3 and FOXH1 were still associated with the predicted binding regions in the absence of Nodal signaling. We found that every

496

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

Table 1 ChIP-seq analysis summary. The numbers of reads, peaks and associated genes of all transcription factors and histone marks studied in hESCs (hESC) and derived endoderm (Endo). Associated genes (kb) ChIP SMAD2/3_A SMAD2/3_B SMAD3 SMAD4 FOXH1 Input H3K4me3 H3K27me3 Input

Cell

Reads

hESC Endo hESC Endo hESC Endo hESC Endo hESC Endo hESC Endo hESC Endo hESC Endo hESC Endo

Peaks

10,200,000 9,605,287 8,708,351 5,910,789 6,928,056 6,055,629 8,959,821 6,066,743 10,400,000 10,800,000 9,465,441 9,716,862 7,338,695 10,326,110 17,893,702 19,595,165 8,824,050 10,876,757

27730000

4,032 1,037 14,833 2,915 2,688 2,296 3,936 4,531 9,702 29,292 – – 24,030 29,688 13,936 26,293 – –

1000

100

10

1

3,588 1,117 9,777 2,604 2,511 2,107 3,533 2,768 6,897 11,631 – –

3,077 827 9,052 1,905 2,062 1,466 3,223 2,753 5,797 10,385 – –

1,916 272 7,057 567 1,197 400 2,293 906 2,646 4,734 – –

1,249 72 5,715 106 745 67 1,702 207 1,123 1,324

27735000

27740000

27745000

EOMES

chr3

SMAD2/3_A SMAD3 50 -

hESC

SMAD2/3_B 50 -

SMAD4 50 -

K4me3

50 -

K27me3

Endoderm

SMAD2/3_A SMAD3 50 -

SMAD2/3_B 50 -

SMAD4 50 -

K4me3 50 -

K27me3

0

_

94300000 chr14

94305000

94310000

GSC

SMAD2/3_A SMAD3 50 -

hESC

SMAD2/3_B 50 -

SMAD4 50 -

K4me3 50 -

K27me3

Endoderm

SMAD2/3_A SMAD3

50 -

SMAD2/3_B

*

50 -

SMAD4 50 -

K4me3 50 -

K27me3

0_

Fig. 2. SMAD2/3/4 binding, H3K4me3 and H3K27me3 at EOMES and GSC loci in hESCs and derived endoderm. UCSC genome browser screen shots of EOMES and GSC in hESCs (hESC, blue) and derived endoderm (Endoderm, red). Binding regions of SMAD2/3_A and SMAD3 ChIP-seq dataset are shown as bars and others are shown as peaks. Dotted boxes indicate unique regions of SMAD2/3 and SMAD4 binding in derived endoderm, and asterisk indicates the known Activin response element in the GSC promoter region (Danilov et al., 1998). K4me3 and K27me3 stand for H3K4me3 and H3K27me3, respectively.

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

%/Total Peak Regions

A

497

Endoderm

hESC 50

50

40

40

30

30

20

20

10

10

0

0

Exon

SMAD2/3

SMAD3

B

SMAD4

FOXH1

Intron Promoter Intergenic SMAD2/3

SMAD3

Endo 14,374

C

hESC

1,879

26

Endo 7,918

459 2,456

FOXH1

FOXH1

1,134 771

8,506

59

SMAD4

SMAD2/3 629

SMAD4

SMAD2/3

FOXH1

Genes

Peaks hESC

SMAD4

2,694

227

7,691

914

1,004 206

45

D

GADD45A

GSC (+control)

LHX1

211

FZD8

SMAD3

CYP26A1

A

A

A

A

A

V

V

V

V

V

Fig. 3. SMAD proteins exhibit dynamic binding during endoderm differentiation. (A) Genomic distribution of transcription factor binding sites. SMAD2/3, SMAD4 and FOXH1 peaks were classified into annotated genomic regions — exon, intron, promoter or intergenic region using UCSC Known Genes (Human browser hg18). Promoter regions are defined as regions within 10 kb from TSS. (B) Venn diagrams showing comparison of SMAD2/3 targets in hESCs (blue) and endoderm (red). Comparison of peaks are Venn diagrams to left and comparison of associated genes are to the right (chi-square P b 1.6E–130). (C) Venn diagram representing the overlap of different SMAD and FOXH1 targets within the endoderm at 100 kb from TSS. Left diagrams are comparison of SMAD2/3 and FOXH1(chi-square P b 1.0E–228). Center diagrams are comparison of SMAD4 and FOXH1 targets (chi-square P b 1.0E– 228). Right diagrams are comparison of all SMAD targets (all chi-square P of each overlap between two SMADs were b1.0E–228). (D) Wholemount in situ hybridization analysis of SMAD2/3 target genes in X. tropicalis embryos (gastrula, stage 11). Embryos are shown in dorsal views with animal (A) to the top and vegetal (V) to the bottom. A known Nodal signaling target expressed in the dorsal mesoderm, GSC, is shown as a positive control. Examples are shown for SMAD2/3 targets expressed in the dorsal mesoderm (GADD45A, LHX1), dorsal endoderm (FZD8), and throughout the mesoderm (CYP26A1) regions of the embryo that are highly responsive to Nodal signaling.

examined region was affected by SB431542 treatment, strongly indicating that they are regions bound directly by SMAD2/3 and FOXH1 and are downstream targets of Nodal/Activin signaling (Fig. S1B). For H3K4me3 and H3K27me3, we generated 7.3 and 17.9 million mapped reads in hESCs and 10.3 and 19.6 million mapped reads in derived endoderm, respectively (Table 1). Since H3K4me3 and H3K27me3 marks have far wider distribution than the binding of transcription factors, we sought to address whether our depth of sequencing reached saturation. To this end, we called peaks from pooled reads (two biological replicates for H3K4me3 and three for H3K27me3) and checked the levels of saturation of unique peaks called. H3K4me3 reads reached saturation, but not H3K27me3 even after additional sequencing (Fig. S2). These data suggest that either more extensive sequence depth might be necessary or that H3K27me3 marks are more variable than H3K4me3 marks, consistent with initial data generated in

hESCs and analysis performed by Sharov and Ko (2007) (Pan et al., 2007; Zhao et al., 2007; available in GEO: GSE29422). SMAD2/3/4 associate with different regions in hESCs and derived endoderm To determine how dynamic SMAD and FOXH1 binding is between hESCs and derived endoderm, we examined the genomic distribution of identified Nodal signaling peaks. We first designed a Python script to search through UCSC Known Genes (Human browser hg18) to locate the peaks into annotated exon, intron, promoter or intergenic regions (Fig. 3A). We found that in hESCs, SMAD2, SMAD3 and SMAD4 are bound at similar frequencies to each of these genomic regions with 50–60 % of binding occurring in exons and promoters. In contrast, in derived endoderm, only 10–15 % of SMAD binding occurs in exons and promoters. Surprisingly, the genomic distribution of

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

B

hESC

4096

4096

256

256

16

16

1

1

C

1 k < 10 b < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 < Mb 1 M b

65536

SMAD2/3 SMAD3 SMAD4 FOXH1

None

SMAD2/3 SMAD3 SMAD4 FOXH1

D

hESC

4096

256

256

16

16

1

1

None

1 2 s >= si ite 3 tes si te s

4096

1 2 s >= si ite 3 tes si te s 1 s 2 >= si ite 3 tes si te s 1 s 2 >= si ite 3 tes si te s < 10 < kb 1 M b

65536

SMAD2/3 SMAD3 SMAD4 FOXH1

None

Endoderm

65536

1 2 s >= si ite 3 tes si te s

log2 (Expression)

Endoderm

65536

1 k < 10 b < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 M 1 kb < b 10 < 1 kb 10 kb < kb 1 < Mb 1 M b

log2 (Expression)

A

1 2 s >= si ite 3 tes si te s 1 2 s >= si ite 3 tes si te s 1 s 2 >= si ite 3 tes si te s < 10 < kb 1 M b

498

SMAD2/3 SMAD3 SMAD4 FOXH1

None

Fig. 4. Expression correlation of transcription factor binding at distances from TSS and correlation with multiple binding events. Expression levels of genes neighboring transcription factor binding sites at different distances (b 1 kb, 1 kb b 10 kb, and 10 kb b 1 Mb) in hESCs (A) and endoderm (B). “None” (in A and B) indicates genes that do not have surrounding sites for SMAD2/3, SMAD3, SMAD4 and FOXH1 within given distance. Expression levels of genes with multiple transcription factor binding sites in hESCs (C) and endoderm (D). Genes bound by transcription factors within 10 kb were analyzed. “None” (in C and D) are genes that do not have any binding regions of SMAD2/3, SMAD3, SMAD4 and FOXH1 at the given distance. Box plots show 25th, 50th, and 75th percentiles of log2 expression levels for genes in each group. Whiskers represent 5 and 95 percentile of genes in each group. Student t-tests were performed on each group comparing with appropriate ‘None’ group. One asterisk denotes P b 0.05 and two asterisks P b 0.01. The presence of a SMAD2/3, SMAD3, SMAD4 or FOXH1 binding event within 1 kb from TSS is significantly correlated with an increase in transcriptional levels, above background levels (In hESCs, P = 1.5E–50, 5.6E–38, 2.5E–15 and 5.6E–16, respectively; In endoderm, P = 3.5E–12, 8.7E–12, 1.7E–05 and 4.3E–12, respectively). Further, the presence of three or more binding regions of SMAD2/3, SMAD3, SMAD4 or FOXH1 is highly correlated with increased transcription levels in endoderm, above background levels (In hESCs, P = 2E–10, 9.2E–08, 6E–05 and 7.8E–04, respectively; In endoderm, P = 1.5E–22, 9.5E–16, 1.7E–09 and 7.9E–07, respectively).

FOXH1 peaks remains more constant between these two cell types, exhibiting a high degree of binding (80–85%) outside exons and promoters. This mimics the distribution of SMADs within derived endoderm, but not in hESCs. These observations suggest that SMAD transcription factors are highly dynamic in regard to their genomic distribution even within the 5 days that separate hESCs from endoderm. As SMAD binding is dynamic between hESCs and derived endoderm, we sought to define how these peaks are utilized in the different cells. By analyzing the common sites for each transcription factor in both hESCs and derived endoderm, we found that most of the SMAD peaks change upon differentiation. Only 3% (459/14,833) of SMAD2/3 peaks in hESCs are retained in the derived endodermal cells (Fig. 3B). A similar pattern is observed for SMAD3 (6.7%; 180/2688) and SMAD4 (8.8%; 345/3936). On the other hand, FOXH1 retains almost 50% (4346/9702) of its hESCs peaks upon differentiation toward endoderm, while the large number of additional sites is observed in derived endoderm. Together, this suggests that a vast change in transcription factor occupancy is triggered upon differentiation toward endoderm. SMAD2/3/4 associate with similar genes in hESCs and derived endoderm Although SMAD2/3 binding is highly dynamic at bound regions during differentiation into endoderm, we asked whether these

regions are associated with similar genes. To this end, we associated bound regions for all transcription factors with the nearest gene TSS (UCSC Known Gene) within 1000 kb (1 Mb), 100 kb, 10 kb and 1 kb (Table 1). To validate whether these genes might be involved in SMAD signaling, we stringently selected a high value group of 322 target genes by examining regions bound by all SMAD2/3, SMAD3 and SMAD4 in both of hESCs and endoderm (see Supplemental excel file). To obtain spatial information in the embryo, we then cloned the X. tropicalis orthologs of 42 of these target genes and performed in situ hybridization during gastrulation stages. We found that 45% (19/42) of these had specific endoderm or mesoderm expression patterns during gastrulation. This is an extremely high hit rate, especially considering that 29% (12/42) of the probes did not provide a signal, which could simply be due to probe design. Some of these expression patterns are represented in Fig. 3D, while the rest are in Supplemental Figure S3. Overall this suggests that the SMAD target datasets do indeed identify neighboring genes that are expressed embryonically at the temporal and spatially appropriate times for being involved in Nodal signaling. To determine the overlap between genes neighboring SMAD bound regions, we compared target genes between hESCs and derived endoderm. We found that the genes associated near SMAD bound regions remained more consistent between hESCs and derived endoderm compared to the regions themselves. For example, 60%

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

A

hESC K4

K27

499

Endoderm K4

K27

1 2 3 4

6

7

Peak Intensity (log2 R)

Cluster Groups

5

8

9

TSS

Fold Enrichment

B

TSS

TSS

8 0 -8

TSS

50 hESC

40

Endoderm

30 20 10

K2 K2 7_K 7 4 K2 _K 2 K2 7_I 7 7( gG 10 % K2 ) 7 K2 _K 7 4 K2 _K 2 K2 7_I 7 7( gG 10 % K2 ) K2 7_K 7 4 K2 _K 2 K2 7_I 7 7( gG 10 % K2 ) 7 K2 _K 7 4 K2 _K 2 K2 7_I 7 7( gG 10 % K2 ) 7 K2 _K 7 4 K2 _K 2 K2 7_I 7 7( gG 10 % K2 ) K2 7_K 7 4 K2 _K 2 K2 7_I 7 7( gG 10 % )

0

MYOD

ASCL1

KCND2

LECT1

CLDN10

GAPDH

Fig. 5. H3K4me3 and H3K27me3 clusters demonstrate new bivalent groups. (A) K-means clustering was performed to visualize H3K4me3 (K4) and H3K27me3 (K27) marks surrounding 16,621 TSSs. Promoter regions covered were ± 5 kb from TSS. Yellow areas are the regions of the log2 peak intensity higher than zero; black areas close to zero; and blue areas lower than zero. TSS indicated by blue arrow (bottom). (B) Sequential ChIP for H3K4me3 (K4) and H3K27me3 (K27) to validate bivalent domains in Group 1, Group 5 and Group 6. Sequential ChIPs were performed with anti H3K27me3 and H3K4me3 antibodies in hESCs and CXCR4 positive endodermal cells. Fold enrichment for each ChIP was calculated compared to the value of SERPINA1, a negative control for H3K4me3 and H3K27me3. MYOD and ASCL1 belong to Group 1, KCND2 to Group 5, LECT1 and CLDN10 to Group 6, and GAPDH to Group 3. K27_K4; 1st ChIP with K27 and 2nd ChIP with K4, K27_K27; both 1st and 2nd ChIP with K27, K27_IgG; 1st ChIP with K27 and 2nd ChIP with rabbit total IgGs, K27(10%); 10% of 1st K27 ChIP.

(1134/1905) of the genes neighboring SMAD2/3 sites in derived endoderm were also associated in hESCs, compared to 16% (459/2915) when comparing genomic binding sites alone (Fig. 3B and Supplemental file). Thus while individual SMAD proteins associate with DNA in a dynamic fashion during differentiation, they tend to occupy regions surrounding similar genes, suggesting that different loci within gene regulatory domains are used to mediate distinct cellular transcriptional responses. SMAD2, SMAD3, SMAD4 and FOXH1 are known to regulate similar downstream targets in a variety of cellular contexts and are known to form complexes at these sites (Attisano et al., 2001; Silvestri et al., 2008). Therefore, we examined the overlapping targets between these transcription factors. We found that, in both hESCs and derived endoderm, all SMAD transcription factors are bound near a highly

overlapping set of genes, regardless of the distance examined from TSS (Fig. 3C, S4 and Supplemental file) with most bound by all three proteins (1004 within 100 kb). Comparison between the putative target genes for the SMAD2, SMAD3 and SMAD4 with those of FOXH1 shows that while some overlap exists in hESCs (Fig. S5), due to the overwhelming genome-wide occupancy of FOXH1, it is extensive in derived endoderm, encompassing almost all (99%; 1879/1905) of SMAD2/3 and (98%; 2694/2753) of SMAD4 target genes within 100 kb (Fig. 3C). Proximal SMAD2/3/4 and FOXH1 complexes are predictive of gene transcription SMAD2, SMAD3, SMAD4 and FOXH1 bind thousands of regions genome wide, but since transcription factor binding does not

500

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

examined all regions in the genome within 1 kb, 10 kb and 1 Mb from a TSS, and identified those bound by SMAD2/3/4 or FOXH1 in both hESCs and derived endoderm. We next correlated these binding events with neighboring gene transcription levels, which were compared with transcriptional levels of genes with no detectable binding. Interestingly, we find that in both hESCs and derived endoderm, the presence of a SMAD2/3, SMAD3, SMAD4 or FOXH1 binding event within 1 kb of a TSS is significantly correlated with an increase in transcriptional levels, above background levels (Student's t-test; In hESCs, P = 1.5E–50, 5.6E– 38, 2.5E–15 and 5.6E–16, respectively; In endoderm, P = 3.5E–12, 8.7E– 12, 1.7E–05 and 4.3E–12, respectively; Fig. 4A and B). Once this distance is expanded to 10 kb or 1 Mb, this correlation becomes less significant for all transcription factors. We next examined only the 10 kb interval surrounding TSS and asked whether the presence of multiple SMAD2/3/4 or FOXH1 binding

necessarily affect transcriptional activity, we sought to understand how these binding signatures correlate with gene expression output. By using the timecourse microarray (days 0, 1, 3 and 5) and the ChIP-seq target datasets, we can identify the bound sites correlated with increased transcription. Several critical lineage specification genes including GSC, MIXL1 and EOMES are highly enriched (more than 40 fold) after the first 24 h of differentiation (Fig. 1B). SMAD proteins are bound to regions surrounding each of these developmentally important genes in both hESCs and derived endoderm. In each cellular context the nature of SMAD2/3/4 binding changes (Fig. 2). For example, upon differentiation to endoderm, EOMES and GSC are bound by SMAD2/3/4 in regions not bound in hESCs (Fig. 2 dotted black boxes). Conversely, several regions bound in hESCs are lost in endoderm. To elucidate the most predictive signature of SMAD2/3/4 or FOXH1 binding that can be correlated with transcriptional activation, we

A

Group 1

8

Group 2 31 27 559 617

4 32 2 456 490

0

Peak Level (log2 R)

-4

Group 3

8

Group 4 159 135 938 1232

4

61 55 1324 1440

0 -4

Group 5

8

Group 6 103 126 1887 2116

51 87 1838 1976

4 0 -4 TSS

TSS

K4

K27

K4

TSS

K27

hESC

Endoderm

Induced

Repressed

B Peak Level (log2 R)

TSS

TSS

K4

TSS

K27 hESC

TSS

K4

TSS

K27

Endoderm

Uncategorized

All

Group 1 8 4 Induced & SMAD2/3-bound 21 11 458 490

0 -4 TSS

K4

TSS

K27 hESC

TSS

K4

TSS

Induced & SMAD2/3-unbound Uncategorized All

K27

Endoderm

Fig. 6. Differentially expressed and SMAD2/3 bound genes are associated with H3K27me3 depletion. (A) The peak levels (histograms) of H3K4me3 (K4) and H3K27me3 (K27) in both hESCs and endoderm. Black solid lines indicate the histograms of all genes in each group. Red lines indicate histograms of genes that become transcribed in endoderm. Blue lines indicate histograms of genes that become repressed in endoderm. R represents normalized enrichment over the background. (B) The histograms of H3K4me3 (K4) and H3K27me3 (K27) peaks of Group 1 genes which become transcribed and are also bound by SMAD2/3 in endoderm. SMAD2/3 bound and unbound genes were represented in red and blue lines, respectively.

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

A

800

501

B

GSC_5'

80

GSC_3'

Fold Enrichment

Fold Enrichment

GSC_TSS 600

EOMES_5' 400

EOMES_TSS EOMES_3'

200

GAPDH

hESC 60

Endoderm

40 20 0

0 K27me3

K4me3

hESC

K27me3

K2 7 K2 _K4 7_ K2 K27 7_ K2 IgG 7( 10 % )

K4me3

Endoderm

EOMES GSC_5’, TSS EOMES_TSS, 3’

K27

hESC

C

SMAD/FOXH1

GSC_5’, 3’ EOMES_5’, 3’

Endo

GSC_5'

D

GSC_TSS GSC_3' EOMES_5'

12

100

EOMES_TSS

GAPDH 8

60 6 40

hESC

Activin_6 h

Activin_12 h

Activin_ 24 h

Activin _12 h

Activin_ 6 h

K27me3

FOXH1

SMAD2/3

K27me3

FOXH1

SMAD2/3

K27me3

FOXH1

0 SMAD2/3

0 K27me3

2

FOXH1

20

hESC

4

SMAD2/3

Fold Enrichment

EOMES_3'

10

80

Activin_24 h JMJD3

Fig. 7. SMAD2/3 binding and H3K27me3 depletion occur simultaneously. (A) H3K4me3 (K4) and H3K27me3 (K27) binding dynamics in promoter regions of GSC and EOMES. ChIPqPCR was performed in hESCs (left) and endoderm (right). 5′; loci further upstream 5 kb from TSS, TSS; loci within 5 kb from TSS, 3′; loci downstream of transcription termination site. GAPDH is a positive control for H3K4me3 and a negative control for H3K27me3. (B) H3K27me3 depletion in EOMES promoter occurs within the same nucleosome. Sequential ChIP was performed with anti H3K27me3 (K27) and H3K4me3 (K4) antibodies in hESCs and CXCR4 positive endodermal cells. Fold enrichment for each ChIP was calculated compared to the value of SERPINA1, a negative control for H3K4me3 and H3K27me3. The EOMES_TSS primer set was used for qPCR. K27_K4; 1st ChIP with K27 and 2nd ChIP with K4, K27_K27; both 1st and 2nd ChIP with K27, K27_IgG; 1st ChIP with K27 and 2nd ChIP with rabbit total IgGs, K27(10%); 10% of 1st K27 ChIP. (C) ChIP-qPCR for SMAD2/3, FOXH1 and H3K27me3 (K27) in the promoter regions of GSC and EOMES during hESC differentiation at 6, 12 and 24 h after Activin treatment. GAPDH is a negative control for SMAD2/3 and FOXH1 binding. (D) ChIP-qPCR for JMJD3 in the promoter regions of GSC and EOMES during hESC differentiation at 6, 12 and 24 h after Activin treatment. GAPDH is a negative control for JMJD3 binding.

events were more predictive of transcriptional activity than isolated binding events. In derived endoderm, we find that multiple binding events within the extended promoter region are highly correlated with increased transcription levels. Further, the more bound regions within this interval, the more significant the correlation. This correlation is particularly strong for regions containing three or more SMAD2/3 or SMAD3 bound sites in derived endoderm (Student's t-test; P = 1.5E–22 and 9.5E–16, respectively; Fig. 4C and D). In hESCs, all SMAD association within 10 kb is highly predictive of transcriptional increase. Overall, this data strongly suggests that in endodermal cells, Nodal signaling targets

are more likely to be transcriptionally activated if surrounded by concentrated regions of binding within 10 kb of the TSS. Endoderm differentiation involves establishment of new bivalent domains As it is known that occupancy of H3K4me3 and H3K27me3 changes during differentiation, particularly at bivalent regions, we sought to examine how these marks change during endoderm commitment. To this end, we used K-means clustering to visualize

502

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

H3K4me3 and H3K27me3 enrichment in a 10 kb centered window around 16,621 TSSs in both hESCs and derived endoderm (Heintzman et al., 2007; Hon et al., 2008). This analysis led to a clear demarcation of nine different groups (1–9) containing unique signatures that exist in both cell types (Fig. 5A and Supplemental file). Gene ontology (GO) analysis revealed highly significant and unique biological functions for different groups (Table S1 and Supplemental file), particularly the bivalent group with the strongest and widest H3K27me3 marks, which is highly significant for genes with roles in developmental processes (Group 1; Developmental group; P = 1.1E–88) (Bernstein et al., 2006; Ku et al., 2008). Indeed Group 1 contains the key regulators of endoderm formation, including EOMES, GSC, PITX2, SOX17, GATA4 and GATA6. While it is known that the bivalent motif exists in various forms (Cui et al., 2009; Ku et al., 2008; Mikkelsen et al., 2007; Pan et al., 2007), we were surprised to see how many different patterns emerged upon clustering. Interestingly, differentiation into endoderm appears to not only maintain many bivalent domains from hESCs, but also establishes new bivalent domains as well. To test whether these groups reflected true bivalent domains where both marks were present on the same nucleosome and not the result of heterogenous endoderm populations, we performed sequential ChIP-qPCR for H3K4me3 and H3K27me3 at specific promoters in both hESCs and derived endoderm. We examined the promoters of MYOD and ASCL1 (which belong to Group 1), KCND2 (Group 5), LECT1 and CLDN10 (Group 6), and GAPDH (Group 3). These promoters were selected from each group to represent the different bivalent domains depicted in Fig. 5A. In both hESCs and derived endoderm, we performed ChIP first with anti-H3K27me3 and then reprecipitated with antiH3K4me3. Total rabbit IgGs were a negative control for the second pulldown. We find that in each case the sequential ChIP results support the finding that both marks are present within a single nucleosome or, at least, the histone marks are on adjacent nucleosomes (as we sonicated DNA to generate 200- to 600-bp fragments). For example, the clustering of Group 1 promoters demonstrates broad H3K4me3 and H3K27me3 in both hESCs and derived endoderm (Fig. 5A). Sequential ChIP shows that, at the Group 1 genes, MYOD and ASCL1, the bivalent mark is present in the same or neighboring nucleosomes in both cell types (Fig. 5B). Whereas in Groups 5 and 6 promoters, which become bivalent only in derived endoderm, sequential ChIP at KCND2 (Group 5), LECT1 (Group 6) and CLDN10 (Group 6) promoters show both marks present together only in derived endoderm (Fig. 5B). Overall, sequential ChIP supports the hypothesis that bivalent domains are both maintained and enhanced in derived endoderm. Endoderm-specific transcription in Group 1 is associated with H3K27me3 depletion and SMAD2/3 accumulation While bivalent regions are prevalent in endoderm, we asked whether these domains were altered at actively transcribed genes. To this end, we correlated each promoter group (1–9) with induction of transcription during differentiation into endoderm. Histogram plots of the amount of H3K4me3 and H3K27me3 at each induced region for each group are shown in Fig. 6A. While the bivalent conformation is still observed, H3K27me3 levels are depleted and H3K4me3 increased only at promoters in Group 1 and in Group 4 in this endodermally expressed subset (Groups 1 and 4; all P for H3K4me3 and H3K27me3 were b1.0E–06, see Material and methods for statistical analysis). Overall this suggests that the promoters of Group 1 genes that are associated with transcriptional activation, exhibit relative depletion of H3K27me3 and accumulation of H3K4me3. We next determined whether SMAD2/3 binding could be associated with the chromatin changes observed at these few loci in the Group 1 bivalent domains. To this end, we examined whether SMAD2/3 binding within 100 kb of the TSS could be associated with the H3K27me3 depleted regions. Of the 32 Group 1 promoter regions

associated with endodermal transcriptional activation, 21 genes were bound within 100 kb by SMAD2/3. All 21 of these regions displayed almost complete depletion of H3K27me3 in endoderm compared to the other bivalent genes in Group 1 (Figs. 2 and 6B) (both P for H3K4me3 and H3K27me3 were b1.0E–06). Interestingly, the 21 bound regions included EOMES, GSC, SOX17, GATA4, GATA6 and FOXA2, all known to play important roles in endoderm commitment (Fig. 2 and Table S2). The remaining 11 regions of Group 1, which are associated with transcriptional activation and chromatin alterations, are not bound by SMAD2/3 within 100 kb. However, they are bound by SMAD4 or by SMAD2/3 using other criteria: this includes four genes, NOG, HNF1B, AHNAK and MSX2, which are bound by SMAD4 within 100 kb, and four genes, NOG, PCDH7, CYP26A1 and MSX2, which are bound by SMAD2/3 within 1 Mb. We further validated this result by examining biochemically H3K4me3 and H3K27me3 marks in hESCs and derived endoderm at two of the 21 ‘resolving promoters’ (Fig. 7A). We performed ChIP-qPCR using primers that tiled the GSC and EOMES promoters for H3K4me3 and H3K27me3 (Fig. 2). Importantly, in agreement with our findings from genomic analysis, H3K27me3 is depleted or significantly reduced across the promoters only after endoderm differentiation. In addition, we examined whether the H3K27me3 depletion occurs in the same nucleosome during differentiation. To this end, we performed sequential ChIP-qPCR for H3K4me3 and H3K27me3 in hESCs and endoderm at the EOMES promoter. Both marks are precipitated together in the hESC nucleosomes of the EOMES promoter, but not in derived endoderm, further indicating that at these loci and within the same nucleosome the bivalent domain has been resolved (Fig. 7B). Taken together, this suggests that SMAD2/3 or a family member plays an important role in bivalent resolution within these regions which are critical for endodermal specification and provides a system in which to study how these key ‘poised’ regions become activated. SMAD2/3, FOXH1, JMJD3 and H3K27me3 association dynamics during the first 24 h of differentiation As H3K27me3 depletion occurs at active Group 1 bivalent promoters associated with SMAD2/3, we next investigated the kinetics of this depletion over the course of 24 h. Since it was shown recently that SMAD2/3 can recruit the histone demethylase, JMJD3, to the NODAL promoter where it actively demethylates H3K27me3 (Dahle et al., 2010), we further propose that this might be a common mechanism occurring within the 32 Group 1 resolving promoters. Therefore, we performed timecourse ChIP-qPCR using SMAD2/3, FOXH1, H3K27me3 and JMJD3 during hESC differentiation into endoderm at 6, 12, and 24 h at two sites near GSC and EOMES. Surprisingly, we find that most of the SMAD2/3, FOXH1 and JMJD3 accumulation and H3K27me3 depletion occur simultaneously, even as early as 6 h post Activin treatment (Fig. 7C and D). Interestingly, while SMAD2/3 and FOXH1 binding at these regions continues to increase throughout the 24 h window, JMJD3 appears to peak at 6 h, diminishing by 24 h. This is consistent with JMJD3 role as a demethylase as by 24 h there is far less association of H3K27me3 to these regions, suggesting that the methylation is lost rapidly within the first day of differentiation and that JMJD3 plays a key role, not only at the NODAL locus, but also at many others as well. These protein dynamics are highly consistent with the transcriptional activation of GSC and EOMES, which are rapidly transcribed in the first 24 h after Activin treatment, at levels 40–50 fold higher than found in hESCs. Discussion A single broad class of bivalent domains known to be highly enriched at promoters for important developmental factors has been proposed to be ‘poised’ for rapid differentiation. Here we provide genomic and biochemical evidence that substantiate this observation,

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

at least during endoderm commitment from hESCs. We find that within the developmental class of bivalent promoters (Group1), SMAD2/3, FOXH1 and JMJD3 accumulate simultaneously with the depletion of H3K27me3 within the first 24 h of differentiation. This occurs at critical promoters for endoderm commitment, including GSC and EOMES, correlating with their transcriptional activation within the 24 h of Activin treatment. Furthermore, this implies that resolution of bivalent domains upon differentiation at critical promoters involves interplay between transcription factors and associated chromatin marks. In the case of endoderm, this association appears to be occurring at very few regions. This study presents thousands of regions bound by the SMAD complex and then narrows these regions down to a critical 32 that appear to be central to endoderm commitment. In the future it will be very interesting to understand mechanistically how these 32 regions are singled out for JMJD3 association and subsequent bivalent resolution. Intriguingly, we observed an increase in bivalent domains as hESCs differentiate into endoderm. This maintenance and expansion of bivalent promoters is distinct from many other cell types. While bivalent domains are prevalent in hESCs, encompassing more than 2000 promoters in the genome, most of these bivalent domains become monovalent in more differentiated cell types (Bernstein et al., 2006; Cui et al., 2009; Mikkelsen et al., 2007; Pan et al., 2007; Zhao et al., 2007). While a small fraction (less than 20%) of monovalent genes have been shown to become bivalent in more differentiated cell types including MEFs and NPCs (Mikkelsen et al., 2007; Zhao et al., 2007), we showed that most of the monovalent genes marked by H3K4me3 appear to become bivalent during endoderm formation (as observed in Groups 3, 5, 6 and 7). We suggest that this increase in bivalent promoters in endodermal cells is due to the fact that endoderm is one of the first cell types that arise in the embryo and therefore must maintain a degree of plasticity. It may therefore be expected that multipotent cell types in general retain a bivalent conformation at many promoters and may even utilize new subtleties in this conformation to activate gene transcription. As cells differentiate into endoderm, SMAD transcription factors bind to discrete target sequences. Some of these binding events lead to gene transcription, while others remain ‘inert’. Though the mechanics that govern the selection between these activities is unknown, it is becoming clear that chromatin plays a key role. Recently, a number of studies have shown that the levels of histone methylation and the recruitment of histone methyltransferase with transcription factors are critical for their transcriptional activity (Cheng et al., 2009; Demers et al., 2007; McKinnell et al., 2008; Miller et al., 2008). In agreement with this view, we have defined subtle classes of bivalent domains, each with distinct annotations, transcriptional responses, and binding variability within the endoderm. Group 1 represents the bivalent domain whose function is to regulate ‘Developmental Genes’ which recapitulates previous findings (Bernstein et al., 2006). In addition, we showed another subclass bivalent group, Group 4, which is strongly annotated to neuronal activities (122/414 genes (29.5%), Bonferroni P = 1.08E–20) and cell communication (167/849 genes (19.7%), Bonferroni P = 3.27E– 1). Group 4 is likely to represent the bivalent domains whose function is to fulfill cell signaling as previously discussed (Ku et al., 2008). Since these various bivalent groups revealed in derived endoderm are associated with distinct annotations and display unique histone marks, they can be further classified into the types associated with Polycomb repressive complexes (PRC) (Ku et al., 2008). Groups 1 and 4 are likely to be PRC1-positive because they exhibit large H3K27me3 regions and maintain the bivalent conformation during differentiation as well as are strongly annotated to development and cell signaling. Interestingly, Groups 5, 6 and 7 are likely to contain PRC1negative bivalent domains which emerge during endoderm formation as they display narrow H3K27me3 regions and are associated with non-developmental functions such as protein and DNA metabolism.

503

While many inroads have been made in understanding endoderm formation in vertebrates, the next paradigm shifts in embryology will be advanced by the application of new technologies. As ChIP-seq becomes more utilized in the scientific community, many reports have described transcription factor binding in hESCs and other developmental cell types (Boyer et al., 2005; Cole et al., 2008; Fujiwara et al., 2009; Yu et al., 2009). To date, our datasets are unique, representing not just a single transcription factor, but a complex of factors. Furthermore, these datasets follow the dynamics of this complex of transcription factors through developmental time — from hESC pluripotency to the transformation of hESC into endoderm. An in depth analysis of the relationship between the datasets for SMAD2/3, SMAD3, SMAD4, FOXH1, H3K4me3 and H3K27me3 provides insight into mechanisms underlying how SMAD transcription factors mediate Nodal signaling to specify endoderm. One of the surprising findings from this study is that SMAD2/3, SMAD3 and SMAD4 binding is highly dynamic; few specific bound regions are maintained from hESCs to endoderm. This suggests that the SMAD transcription complex is constantly in flux, using a variety of different sites to elicit activation of individual loci. Furthermore, we also show that FOXH1 has very different binding behavior than the SMAD proteins. First, throughout differentiation, FOXH1 maintains association with the same general genomic locations, whereas SMAD proteins become far more localized in intergenic regions once cells have become endoderm. Second, upon differentiation FOXH1 exhibits widespread binding throughout the genome whereas the SMADs become far more restricted to specific locales. This is consistent with a role of FOXH1, not specifically as a transcriptional activator, but as a pioneer protein which associates with chromatin to recruit histone modifiers to these loci (Cirillo et al., 2002; Cirillo and Zaret, 1999). Nodal signaling is reused throughout development to guide the formation of a plethora of tissue types. It has also been implicated in several cancers (Gupta et al., 2004; Lee et al., 2010; Mangone et al., 2010; Xu et al., 2004). Despite the importance of this signaling pathway, few direct targets have been elucidated since the SMAD transcription factors were identified more than 15 years ago. Here we provide a toolbox for the regions bound by the transcription factors SMAD2/3, SMAD3, SMAD4 and FOXH1 and the histone modifications H3K4me3 and H3K27me3 in both hESCs and derived endoderm that can be used for the functional examination of thousands of additional targets. These targets, several of which are bound by the SMAD complex in both hESCs and derived endoderm, may also be bound and activated in a multitude of other normal and diseased cell types. Thus, we anticipate that the analysis of these factors will have wide-spread benefit to the scientific community by providing both thousands of SMAD targets for further analysis and genomic data that can continually be mined. Supplementary materials related to this article can be found online at doi:10.1016/j.ydbio.2011.06.009.

Acknowledgments We thank Zhengqing Ouyang and Wing Wong for performing microarray analysis, Ziming Weng, Phil Lacroute, and Arend Sidow for performing high throughput sequencing, Yuqiong Pan for assistance with the tissue culture work, and Richard Harland for giving us access to gene clone collection and for supplying HEX and SOX17B probes. This work was supported by the California Institute of Regenerative Medicine to J.C.B.

References Attisano, L., Silvestri, C., Izzi, L., Labbe, E., 2001. The transcriptional role of Smads and FAST (FoxH1) in TGFbeta and activin signalling. Mol. Cell. Endocrinol. 180, 3–11. Attisano, L., Wrana, J.L., 2000. Smads as transcriptional co-modulators. Curr. Opin. Cell Biol. 12, 235–243.

504

S.W. Kim et al. / Developmental Biology 357 (2011) 492–504

Bernstein, B.E., Mikkelsen, T.S., Xie, X., Kamal, M., Huebert, D.J., Cuff, J., Fry, B., Meissner, A., Wernig, M., Plath, K., Jaenisch, R., Wagschal, A., Feil, R., Schreiber, S.L., Lander, E. S., 2006. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326. Besser, D., 2004. Expression of nodal, lefty-a, and lefty-B in undifferentiated human embryonic stem cells requires activation of Smad2/3. J. Biol. Chem. 279, 45076–45084. Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther, M.G., Kumar, R.M., Murray, H.L., Jenner, R.G., Gifford, D.K., Melton, D.A., Jaenisch, R., Young, R.A., 2005. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956. Cheng, Y., Wu, W., Kumar, S.A., Yu, D., Deng, W., Tripic, T., King, D.C., Chen, K.B., Zhang, Y., Drautz, D., Giardine, B., Schuster, S.C., Miller, W., Chiaromonte, F., Blobel, G.A., Weiss, M.J., Hardison, R.C., 2009. Erythroid GATA1 function revealed by genomewide analysis of transcription factor occupancy, histone modifications, and mRNA expression. Genome Res. 19, 2172–2184. Cirillo, L.A., Lin, F.R., Cuesta, I., Friedman, D., Jarnik, M., Zaret, K.S., 2002. Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Mol. Cell 9, 279–289. Cirillo, L.A., Zaret, K.S., 1999. An early developmental transcription factor complex that is more stable on nucleosome core particles than on free DNA. Mol. Cell 4, 961–969. Cole, M.F., Johnstone, S.E., Newman, J.J., Kagey, M.H., Young, R.A., 2008. Tcf3 is an integral component of the core regulatory circuitry of embryonic stem cells. Genes Dev. 22, 746–755. Cui, K., Zang, C., Roh, T.Y., Schones, D.E., Childs, R.W., Peng, W., Zhao, K., 2009. Chromatin signatures in multipotent human hematopoietic stem cells indicate the fate of bivalent genes during differentiation. Cell Stem Cell 4, 80–93. D'Amour, K.A., Agulnick, A.D., Eliazer, S., Kelly, O.G., Kroon, E., Baetge, E.E., 2005. Efficient differentiation of human embryonic stem cells to definitive endoderm. Nat. Biotechnol. 23, 1534–1541. D'Amour, K.A., Bang, A.G., Eliazer, S., Kelly, O.G., Agulnick, A.D., Smart, N.G., Moorman, M.A., Kroon, E., Carpenter, M.K., Baetge, E.E., 2006. Production of pancreatic hormone-expressing endocrine cells from human embryonic stem cells. Nat. Biotechnol. 24, 1392–1401. Dahle, O., Kumar, A., Kuehn, M.R., 2010. Nodal signaling recruits the histone demethylase Jmjd3 to counteract polycomb-mediated repression at target genes. Sci. Signal. 3, ra48. Danilov, V., Blum, M., Schweickert, A., Campione, M., Steinbeisser, H., 1998. Negative autoregulation of the organizer-specific homeobox gene goosecoid. J. Biol. Chem. 273, 627–635. Demers, C., Chaturvedi, C.P., Ranish, J.A., Juban, G., Lai, P., Morle, F., Aebersold, R., Dilworth, F.J., Groudine, M., Brand, M., 2007. Activator-mediated recruitment of the MLL2 methyltransferase complex to the beta-globin locus. Mol. Cell 27, 573–584. Fujiwara, T., O'Geen, H., Keles, S., Blahnik, K., Linnemann, A.K., Kang, Y.A., Choi, K., Farnham, P.J., Bresnick, E.H., 2009. Discovering hematopoietic mechanisms through genome-wide analysis of GATA factor chromatin occupancy. Mol. Cell 36, 667–681. Gupta, V., Harkin, D.P., Kawakubo, H., Maheswaran, S., 2004. Transforming growth factor-beta superfamily: evaluation as breast cancer biomarkers and preventive agents. Curr. Cancer Drug Targets 4, 165–182. Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O., Van Calcar, S., Qu, C., Ching, K.A., Wang, W., Weng, Z., Green, R.D., Crawford, G.E., Ren, B., 2007. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318. Hon, G., Ren, B., Wang, W., 2008. ChromaSig: a probabilistic approach to finding common chromatin signatures in the human genome. PLoS Comput. Biol. 4, e1000201. Izzi, L., Silvestri, C., von Both, I., Labbe, E., Zakin, L., Wrana, J.L., Attisano, L., 2007. Foxh1 recruits Gsc to negatively regulate Mixl1 expression during early mouse development. EMBO J. 26, 3132–3143. James, D., Levine, A.J., Besser, D., Hemmati-Brivanlou, A., 2005. TGFbeta/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells. Development 132, 1273–1282. Ji, H., Jiang, H., Ma, W., Johnson, D.S., Myers, R.M., Wong, W.H., 2008. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat. Biotechnol. 26, 1293–1300. Johnson, D.S., Mortazavi, A., Myers, R.M., Wold, B., 2007. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502. Khokha, M.K., Chung, C., Bustamante, E.L., Gaw, L.W., Trott, K.A., Yeh, J., Lim, N., Lin, J.C., Taverner, N., Amaya, E., Papalopulu, N., Smith, J.C., Zorn, A.M., Harland, R.M., Grammer, T.C., 2002. Techniques and probes for the study of Xenopus tropicalis development. Dev. Dyn. 225, 499–510. Ku, M., Koche, R.P., Rheinbay, E., Mendenhall, E.M., Endoh, M., Mikkelsen, T.S., Presser, A., Nusbaum, C., Xie, X., Chi, A.S., Adli, M., Kasif, S., Ptaszek, L.M., Cowan, C.A., Lander, E.S., Koseki, H., Bernstein, B.E., 2008. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 4, e1000242. Lee, C.C., Jan, H.J., Lai, J.H., Ma, H.I., Hueng, D.Y., Lee, Y.C., Cheng, Y.Y., Liu, L.W., Wei, H.W., Lee, H.M., 2010. Nodal promotes growth and invasion in human gliomas. Oncogene 29, 3110–3123. Li, C., Wong, W.H., 2001. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl. Acad. Sci. U. S. A. 98, 31–36.

Liu, H., Dalton, S., Xu, Y., 2007. Transcriptional profiling of definitive endoderm derived from human embryonic stem cells. Comput. Syst. Bioinformatics Conf. 6, 79–82. Mangone, F.R., Walder, F., Maistro, S., Pasini, F.S., Lehn, C.N., Carvalho, M.B., Brentani, M. M., Snitcovsky, I., Federico, M.H., 2010. Smad2 and Smad6 as predictors of overall survival in oral squamous cell carcinoma patients. Mol. Cancer 9, 106. McKinnell, I.W., Ishibashi, J., Le Grand, F., Punch, V.G., Addicks, G.C., Greenblatt, J.F., Dilworth, F.J., Rudnicki, M.A., 2008. Pax7 activates myogenic genes by recruitment of a histone methyltransferase complex. Nat. Cell Biol. 10, 77–84. McLean, A.B., D'Amour, K.A., Jones, K.L., Krishnamoorthy, M., Kulik, M.J., Reynolds, D.M., Sheppard, A.M., Liu, H., Xu, Y., Baetge, E.E., Dalton, S., 2007. Activin a efficiently specifies definitive endoderm from human embryonic stem cells only when phosphatidylinositol 3-kinase signaling is suppressed. Stem Cells 25, 29–38. Meshorer, E., Misteli, T., 2006. Chromatin in pluripotent embryonic stem cells and differentiation. Nat. Rev. Mol. Cell Biol. 7, 540–546. Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., Lee, W., Mendenhall, E., O'Donovan, A., Presser, A., Russ, C., Xie, X., Meissner, A., Wernig, M., Jaenisch, R., Nusbaum, C., Lander, E.S., Bernstein, B.E., 2007. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560. Miller, S.A., Huang, A.C., Miazgowicz, M.M., Brassil, M.M., Weinmann, A.S., 2008. Coordinated but physically separable interaction with H3K27-demethylase and H3K4-methyltransferase activities are required for T-box protein-mediated activation of developmental gene expression. Genes Dev. 22, 2980–2993. Pan, G., Tian, S., Nie, J., Yang, C., Ruotti, V., Wei, H., Jonsdottir, G.A., Stewart, R., Thomson, J.A., 2007. Whole-genome analysis of histone H3 lysine 4 and lysine 27 methylation in human embryonic stem cells. Cell Stem Cell 1, 299–312. Saijoh, Y., Oki, S., Ohishi, S., Hamada, H., 2003. Left-right patterning of the mouse lateral plate requires nodal produced in the node. Dev. Biol. 256, 160–172. Schier, A.F., 2003. Nodal signaling in vertebrate development. Annu. Rev. Cell Dev. Biol. 19, 589–621. Sharov, A.A., Ko, M.S., 2007. Human ES cell profiling broadens the reach of bivalent domains. Cell Stem Cell 1, 237–238. Shen, M.M., 2007. Nodal signaling: developmental roles and regulation. Development 134, 1023–1034. Shiratori, H., Sakuma, R., Watanabe, M., Hashiguchi, H., Mochida, K., Sakai, Y., Nishino, J., Saijoh, Y., Whitman, M., Hamada, H., 2001. Two-step regulation of left-right asymmetric expression of Pitx2: initiation by nodal signaling and maintenance by Nkx2. Mol. Cell 7, 137–149. Silvestri, C., Narimatsu, M., von Both, I., Liu, Y., Tan, N.B., Izzi, L., McCaffery, P., Wrana, J.L., Attisano, L., 2008. Genome-wide identification of Smad/Foxh1 targets reveals a role for Foxh1 in retinoic acid regulation and forebrain development. Dev. Cell 14, 411–423. Takaoka, K., Yamamoto, M., Shiratori, H., Meno, C., Rossant, J., Saijoh, Y., Hamada, H., 2006. The mouse embryo autonomously acquires anterior–posterior polarity at implantation. Dev. Cell 10, 451–459. Teo, A.K., Arnold, S.J., Trotter, M.W., Brown, S., Ang, L.T., Chng, Z., Robertson, E.J., Dunn, N.R., Vallier, L., 2011. Pluripotency factors regulate definitive endoderm specification through eomesodermin. Genes Dev. 25, 238–250. Vallier, L., Alexander, M., Pedersen, R.A., 2005. Activin/Nodal and FGF pathways cooperate to maintain pluripotency of human embryonic stem cells. J. Cell Sci. 118, 4495–4509. Vallier, L., Mendjan, S., Brown, S., Chng, Z., Teo, A., Smithers, L.E., Trotter, M.W., Cho, C.H., Martinez, A., Rugg-Gunn, P., Brons, G., Pedersen, R.A., 2009. Activin/Nodal signalling maintains pluripotency by controlling Nanog expression. Development 136, 1339–1349. Valouev, A., Johnson, D.S., Sundquist, A., Medina, C., Anton, E., Batzoglou, S., Myers, R.M., Sidow, A., 2008. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat. Methods 5, 829–834. von Both, I., Silvestri, C., Erdemir, T., Lickert, H., Walls, J.R., Henkelman, R.M., Rossant, J., Harvey, R.P., Attisano, L., Wrana, J.L., 2004. Foxh1 is essential for development of the anterior heart field. Dev. Cell 7, 331–345. Xu, G., Zhong, Y., Munir, S., Yang, B.B., Tsang, B.K., Peng, C., 2004. Nodal induces apoptosis and inhibits proliferation in human epithelial ovarian cancer cells via activin receptor-like kinase 7. J. Clin. Endocrinol. Metab. 89, 5523–5534. Xu, R.H., Sampsell-Barron, T.L., Gu, F., Root, S., Peck, R.M., Pan, G., Yu, J., AntosiewiczBourget, J., Tian, S., Stewart, R., Thomson, J.A., 2008. NANOG is a direct target of TGFbeta/activin-mediated SMAD signaling in human ESCs. Cell Stem Cell 3, 196–206. Yoon, S.J., Wills, A., Chuong, E., Baker, J.C., Unpublished results. HEB and E2A function as SMAD/FOXH1 cofactors. submitted. Yu, M., Riva, L., Xie, H., Schindler, Y., Moran, T.B., Cheng, Y., Yu, D., Hardison, R., Weiss, M. J., Orkin, S.H., Bernstein, B.E., Fraenkel, E., Cantor, A.B., 2009. Insights into GATA-1mediated gene activation versus repression via genome-wide chromatin occupancy analysis. Mol. Cell 36, 682–695. Zhao, X.D., Han, X., Chew, J.L., Liu, J., Chiu, K.P., Choo, A., Orlov, Y.L., Sung, W.K., Shahab, A., Kuznetsov, V.A., Bourque, G., Oh, S., Ruan, Y., Ng, H.H., Wei, C.L., 2007. Whole-genome mapping of histone H3 Lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell 1, 286–298.