Article
Resolving Cell Fate Decisions during Somatic Cell Reprogramming by Single-Cell RNA-Seq Graphical Abstract
Authors Lin Guo, Lihui Lin, Xiaoshan Wang, ..., Jing Liu, Duanqing Pei, Jiekai Chen
Correspondence
[email protected] (D.P.),
[email protected] (J.C.)
In Brief Guo et al. report the cell fate continuum during induced pluripotent stem cell (iPSC) reprogramming at single-cell resolution. By developing SOT as a new analytic tool, they identify several previously unknown bifurcation points along the reprogramming path and propose a generic bifurcation model for cell fate decisions.
Highlights d
Cell fate continuum generated by somatic reprogramming
d
Single-cell Orientation Tracing (SOT) for fate trajectory detection
d
Two non-reprogramming trajectories regulated by Klf4 and IFN-g
d
A generic bifurcation model for cell fate decisions
Guo et al., 2019, Molecular Cell 73, 815–829 February 21, 2019 ª 2019 Elsevier Inc. https://doi.org/10.1016/j.molcel.2019.01.042
Data Resources GSE103221
Molecular Cell
Article Resolving Cell Fate Decisions during Somatic Cell Reprogramming by Single-Cell RNA-Seq Lin Guo,1,2,3,5,7 Lihui Lin,1,2,4,7 Xiaoshan Wang,1,2,4,7 Mingwei Gao,1,2,4 Shangtao Cao,1,2,4 Yuanbang Mai,1,2 Fang Wu,1,2,4 Junqi Kuang,1,2,4 He Liu,1,2,4 Jiaqi Yang,1,2 Shilong Chu,1,2 Hong Song,1,2 Dongwei Li,1,2,4 Yujian Liu,1,2,3 Kaixin Wu,1,2,4 Jiadong Liu,1,2,4 Jinyong Wang,1,2,4 Guangjin Pan,1,2,4 Andrew P. Hutchins,1,2,8 Jing Liu,1,2 Duanqing Pei,1,2,3,4,5,6,* and Jiekai Chen1,2,3,4,5,9,* 1CAS Key Laboratory of Regenerative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China 2Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou 510530, China 3Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou Medical University, Guangzhou 511436, China 4University of Chinese Academy of Sciences, Beijing 100049, China 5Guangzhou Regenerative Medicine and Health GuangDong Laboratory (GRMH-GDL), Guangzhou 510005, China 6CUHK-GIBH Joint Laboratory of Stem Cell and Regenerative Medicine, the Chinese University of Hong Kong, Hong Kong, China 7These authors contributed equally 8Present address: Department of Biology, Southern University of Science and Technology of China, Shenzhen 518055, China 9Lead Contact *Correspondence:
[email protected] (D.P.),
[email protected] (J.C.) https://doi.org/10.1016/j.molcel.2019.01.042
SUMMARY
Somatic cells can be reprogrammed into induced pluripotent stem cells (iPSCs), which is a highly heterogeneous process. Here we report the cell fate continuum during somatic cell reprogramming at single-cell resolution. We first develop SOT to analyze cell fate continuum from Oct4/Sox2/Klf4- or OSKmediated reprogramming and show that cells bifurcate into two categories, reprogramming potential (RP) or non-reprogramming (NR). We further show that Klf4 contributes to Cd34+/Fxyd5+/Psca+ keratinocyte-like NR fate and that IFN-g impedes the final transition to chimera-competent pluripotency along the RP cells. We analyze more than 150,000 single cells from both OSK and chemical reprograming and identify additional NR/RP bifurcation points. Our work reveals a generic bifurcation model for cell fate decisions during somatic cell reprogramming that may be applicable to other systems and inspire further improvements for reprogramming.
INTRODUCTION Reprogramming of somatic cells to a pluripotent state should be regarded as a breakthrough for basic biology (Takahashi et al., 2007; Takahashi and Yamanaka, 2006, 2016; Wernig et al., 2007; Yu et al., 2007). In theory, reprogramming should generate a spectrum of cell fates between somatic and pluripotent states. As such, investigation into this process should provide valuable
insight into how cell fates and cell fate transitions are regulated. Indeed, by analyzing the cellular and molecular processes associated with reprogramming, it is clear that the process goes through a series of intermediate states with distinct molecular signatures (Buganim et al., 2012; Cacchiarelli et al., 2015; Chronis et al., 2017; Li et al., 2010; Polo et al., 2012; Samavarchi-Tehrani et al., 2010) that have been mapped by comprehensive transcriptomic, proteomic, and epigenetic studies (Cacchiarelli et al., 2015; Chen et al., 2013a, 2013b; Chronis et al., 2017; Di Stefano et al., 2016; Polo et al., 2012; Soufi et al., 2012; Tonge et al., 2014), based on populations of cells. The analysis of bulk samples suggests that MEFs are induced to a pluripotent state passing through several distinct biological processes, such as an early mesenchyme-epithelial transition, autophagy, and histone and DNA demethylations (Buganim et al., 2013; Hochedlinger and Jaenisch, 2015; Li et al., 2010; Papp and Plath, 2013; Polo et al., 2012; Theunissen and Jaenisch, 2014). Despite progresses made through bulk analyses as outlined above, very little is known at the single-cell level. The averaging of populations of cells tends to mask infrequent events during reprogramming, thus obscuring very rare essential cellular transitions, or to overemphasize irrelevant biological processes not required for reprogramming. The fact that only a small fraction of the starting cells eventually become pluripotent demands a vigorous reappraisal of principles and mechanisms established by bulk analysis at single-cell resolution (Buganim €n and van Oudenaarden, 2015). On the other et al., 2013; Gru hand, single-cell analysis may be able to help us uncover rare but important mechanisms that have evaded us so far (Gawad et al., 2016; Tanay and Regev, 2017; Wen and Tang, 2016). Indeed, several studies have been reported that single-cell analyses can help identify transient intermediate stages of
Molecular Cell 73, 815–829, February 21, 2019 ª 2019 Elsevier Inc. 815
reprogramming (Buganim et al., 2012; Lujan et al., 2015; Zhao et al., 2018; Zunder et al., 2015). Yet, cell fate decision and the underlying mechanism for the decision process remains largely unknown. In this report, we mapped the cell fate changes during reprogramming at single-cell resolution. Cell fate continuum was reconstructed between MEFs and ESCs/iPSCs based on the single-cell RNA-seq dataset generated in two distinct reprogramming systems, i.e., the classic Yamanaka system (Chen et al., 2011a; Takahashi and Yamanaka, 2006) and the recently reported chemical method (Cao et al., 2018). Along the cell fate continuum, a common reprogramming pathway is evident that bifurcates toward pluripotency or alternative fates. This generic mechanism not only predicts the existence of alternative reprogramming systems but also guides the improvements of current ones by eliminating alternative fates rationally. RESULTS Sampling Cell Fates during Yamanaka Reprogramming by Single-Cell RNA Sequencing Cells make fate decisions based on both intrinsic and extrinsic cues. Reprogramming of somatic cells to pluripotent ones represents perhaps the best and most unique system to study cell fate decisions, due to the well-defined nature of culture condition and factors. We hypothesize that MEFs, upon the introduction of reprogramming factors, will choose various fates between the starting fibroblastic cells to chimera-competent pluripotent ones. In order to map the cell fate continuum during reprogramming with confidence, we embarked on the optimization of our reprogramming system. We took advantage of our previously described iCD1-OSK (OCT4, SOX2, KLF4) reprogramming system with MEFs carrying the OG2 (Oct4-GFP) reporter. This system results in 10% of cells becoming GFP+ iPSC colonies, and GFP+ cells begin to appear as early as day 3 (Chen et al., 2011a). In order to ascertain when the reprogramming cells achieve pluripotency, we isolated GFP+ cells and injected them into mouse blastocysts directly without further culturing (Figure 1A). This functional assay allows us to conclude that we can generate chimera-competent cells at D7, as 33.3% (6 out of 18) of live pups were chimeras (Figures 1B and 1C; Table S1). It is of interest to note that this rate decreases after D11, perhaps due to the fact that the reprogramming medium iCD1 is not suitable for maintenance of iPSCs as 2iL (Chen et al., 2011a; Ying et al., 2008). After ascertaining the presence of chimera-competent iPSCs in our reprogramming system, we collected single cells for RNA sequencing (RNA-seq) with the Fluidigm C1 platform. In total, we isolated 1,045 single cells at D0, D1, D2, D3, D5, D7, and D8, plus MEFs, iPSCs, and ESCs (Figure S1A), and 912 of them passed quality control. We detected a median of 6,850 genes and a minimum of at least 4,247 genes in each cell (Figures S1B and S1C). Pearson correlation between the mean expression for all cells at each time point to the bulk RNA-seq samples showed very good correlation (R > 0.7) (Figure S1D). We also estimated the infection efficiency of each exogenous transgene by using reads from the RNA-seq that align cross the CDS and LTR (Figure S1E). We showed that around 90%
816 Molecular Cell 73, 815–829, February 21, 2019
of the MEFs expressed all three exogenous transgenes, and the correlation between the expression level of the three transgenes was high (Figures S1F and S1G). Projection of the RNAseq expression into a t-SNE space allows the visualization of the fate transitions during reprogramming (Figure 1D). Compared to the tight clustering of single cells from MEFs and ESCs, the reprogramming cells are more scattered, as expected (Figure 1D). When key genes were mapped onto the t-SNE, it is clear that cells transit through distinct phases from Thy1+ to Nanog+ states in 7 days (Figure 1E). Cells Bifurcate Early into Reprogramming versus Non-reprogramming Cells It is apparent that cells at D0 and D1 are tightly clustered in t-SNE projection (Figure 1D), suggesting the MEF-D1 transitions are nearly synchronized. PCA analysis of MEF-D1 transition supports this idea (Figure S1H), and the left side of the PC1 loading represents mesenchymal-related genes, consistent with our earlier finding that reprogramming starts with a mesenchymalto-epithelial transition (MET) (Li et al., 2010) (Figures S1I and S1J). As a known heterogeneous marker for reprogramming, Thy1 was reported previously to mark cells resistant to reprogramming (Polo et al., 2012) but almost completely silenced in our system at D1, suggesting that cells in our highly efficient system complete MET rapidly (Figure S1K). Interestingly, the right side of the PC1 loading represents genes critical for metabolism (Figures S1I and S1J), suggesting that cells at D1 are leaving the mesenchymal state and reaching a metabolically active intermediate state. Together, these results further confirm the welldefined MET process at single-cell resolution, thus validating our single-cell data with confidence. Indeed, because we eliminated TGF-b from our reprogramming system, the reprogramming cells transitioned through MET at near synchrony (Figure S1K). Furthermore, consistent with previous report that cell surface markers such as CD73 and CD104/CD49d are expressed by intermediate reprogramming prone cells (Lujan et al., 2015), we show that Nt5e (coding for CD73) and Itgb4 (coding for CD104) are upregulated in most of cells by day 2, and Itga4 (coding for CD49d) at day 1 (Figure S1L). The upregulation of Itgb4 is positively correlated with an early epithelial program, as measured by the co-expression of Cdh1 (Figure S1M). Additionally, although Itgb4 is only expressed briefly at day 1, we show that CD104+ cells are biased toward successful reprogramming (Figures S1N and S1O). These results demonstrate the robustness of our experimental system and the power of single-cell RNA-seq datasets. To identify critical decision points along the cell fate continuum during reprogramming, we developed an analytic framework named Single-cell Orientation Tracing or SOT. Briefly, after filtering out genes that contribute to noise factors such as cell cycle, SOT performs affinity propagation (AP) clustering (Frey and Dueck, 2007) to call out the co-expressed gene groups representing non-redundant information along the continuum. Then SOT applies a diffusion map approach (Haghverdi et al., 2015) to construct the high-dimension structure of the continuum trajectory and builds a k-nearest neighbors graph (k-NN graph) for visualization (Kobourov, 2012) that orders cells along the developmental process and branch detection (Figure S2A). As shown
A
B S KLF4 SOX2
140
OCT4
120
Number of injected blastocysts Live pups Live chimeras
127
Numbers
80 60
20 0 Chimera mice
D7 iPSCs
62
60
40
iPSCs at different days
MEF
C
96
100 High efficient iPSCs induce system
13
30 18 0
D6
34 6
D7
9 D8
19
13
1 D11
2
D14
Blastocysts injection
D
E Timepoints
Thy1
Cdh1
Sall4
Day5 Day7 Day8 iPSC ESC
t-SNE 2
MEF Day0 Day1 Day2 Day3
Itgb4
Cd44
log2 (Expression + 1)
OCT4/SOX2/KLF4 Reprogramming
12.5 10.0 7.5 5.0 2.5 0.0
Nanog
912 cells t-SNE 1
Figure 1. Single-Cell RNA Sequencing of Validated Reprogramming Cells (A) Schematic diagram of OCT4/SOX2/KLF4 (OSK)-driven reprogramming and validation of chimera-competent cells. iPSCs isolated by FACS or directly picking at different days are injected into mouse embryos for chimera generation. (B) Summary results of Figure 1A. Note that iPSCs give rise to chimera as early as day 7 (D7). See Table S1 for more details. (C) Chimera with germline transmission from iPSCs at D7. (D) t-SNE projection of 912 single cells revealing cell fate transition during reprogramming. Each dot represents one cell, with color code for sampling time points and conditions. (E) Expression of representative markers is shown using the same layout (log2(Expression + 1)). See also Figure S1 and Table S1.
in Figure 2A, two fate branches emerged in SOT analysis during the day 2 to day 8 reprogramming process, which is highly scrambled in t-SNE plotting (Figures 2A and 1D). We define the bifurcating branches by Wishbone (Setty et al., 2016) and identify the differentially expressed genes between the two branches (Figures 2A and 2B; Table S2). We show that one branch is likely to represent reprogramming potential fate (RP, rightward), as these cells express mostly pluripotent genes such as Tet2 and Sall4, while the second branch represents a non-reprogramming (NR, leftward) fate in which cells express Psca, Klk10, and Cd34 (Figures 2A and 2B). When we mapped specific genes to the SOT map, it is clear that while Cdh1 appears to be present in both NR and RP cells, Klk10, Cd34, Fxyd5, and Psca appear with the NR, Sall4, and Nanog and Dppa5a with RP (Figures 2C and S3A). To examine if this bifurcation represents cell fate determination, we took advantage of the fact that CD34 is a well-known cell surface marker denoting a significant portion of the NR cells. To see if CD34 can denote the bifurcation at early time point, we sorted D3 cells into CD34+ and CD34 populations and re-planted them for further reprogramming (Figures 2D and 2E). As shown
in Figure 2E, when we scored Oct4-GFP-positive colonies at D7, we show that the CD34+ cells are ten times less efficient than the CD34 ones to give rise to GFP+ iPSCs, indicating that CD34 marks cells for non-reprogramming fate. Flow cytometry analysis of reprogramming cells at D7 also showed that most of the CD34+ population at D3 remained CD34+ cells at D7 and hardly became Oct4-GFP+, while CD34 cells rarely gained CD34 expression, indicating that CD34+ cells are committed to a non-reprogramming fate (Figure S3B). Although CD34 only marks a fraction of the NR branch shown in Figure 2A, these results suggest that cells in the NR branch have been driven into a stable non-reprogramming fate. To see whether non-reprogramming colonies come from the NR branch, we then randomly picked Oct4-GFP+ and GFP colonies at day 9 and analyzed them by RNA-seq. We showed that the Oct4GFP colonies express a gene set similar to that of the NR branch, indicating that they have reached a stable alternative fate during reprogramming (Figure S3C). These results suggest that SOT may be a useful tool to define bifurcation during cell fate transitions.
Molecular Cell 73, 815–829, February 21, 2019 817
A SOT (Single-cell Orientation Tracing) NR bran
ch
FR 2
bran
FR 2
FR 2
ch
RP
Timepoints MEF Day0 Day1 Day2 Day3
Day5 Day7 Day8 iPSC ESC
Pseudotime 0
FR 1
FR 1
Dppa5a Nr5a2 Nanog Esrrb Epcam Sall4 Etv5 Cd59a Morc1 Tet2 Dnmt3b Tet1
1
C
Branch NR RP
Gene expression trend 1.0
Normalized expression
B
0.5
FR 1
Nt5e Cd34 Psca Krt14 Klk8 Lypd2 Krt6a Fxyd5 Cd44 Klk10 S100a4
Sall4 Dppa5a Klk10 Cd34 RP NR
0.8 0.6 0.4 0.2 0.0 -0.2
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Pseudotime
Z-Score −1
0
1
2
D Day 3 FACS
CD34- CD34+
Oct4-GFP positive colonies
E 500
***
400 300 200 100 0
Function test
CD34-
CD34+
Figure 2. Bifurcation of the OSK Trajectory (A) The force-directed graph of 912 cells pre-processed with SOT. Each dot is a single cell with colors corresponding to sampling time. FR, force-directed layout algorithm by Fruchterman and Reingold (left panel). The bifurcating branches defined as NR and RP separately (middle panel) and the ordering of D1–D8 cells along pseudotime (right panel). The sequential color indicates the position of each cell in pseudotime starts from 0 to 1. The pseudotime and branch associations were inferred based on the top diffusion components using Wishbone algorithm. NR, non-reprogramming. RP, reprogramming potential. (B) Identification of the differentially expressed genes between NR and RP branches using Beta-Poisson model for single-cell RNA-seq data analyses (BPSC) (Vu et al., 2016) with adjusted p value < 0.01 and lfc > 0.5. Z score of log2(Expression + 1) was shown. See Table S2 for more details. (C) Expression kinetics of selected maker genes along pseudotime. (D) Validation scheme for bifurcated cells. CD34 ± cells were separated by FACS at post-infection D3 and replated for further reprogramming. (E) Results from Figure 2D showing, after replanting, the emergency of Oct4-GFP-positive colonies at post-infection D7, n = 3, technical replicates, data are represented as mean ± SD. ***p < 0.001, two-tailed, unpaired Student’s t test. See also Figures S2 and S3 and Table S2.
A Balancing Act between Klf4 and Sox2 Controls Keratinocyte-like NR/RP Bifurcation To understand the molecular basis of the bifurcation point called out by SOT, we enrich potential upstream transcription factors
818 Molecular Cell 73, 815–829, February 21, 2019
and relevant pathways based on the differentially expressed genes between two separate cell populations (Croft et al., 2014; Fabregat et al., 2018; Janky et al., 2014) (Figure 3A). This combined strategy identifies two potential TF families and three
A
B
C
D
Figure 3. Key TFs and Pathways Separating NR versus RP Branches (A) The strategy to infer TF motifs and pathway enrichment based on the differentially expressed gene list. (B) Dot plot showing the enrichment for potential TF motifs (top panel) and pathway (bottom panel) for each branch. NES, normalized enrichment score. See STAR Methods for more details. (C) MEF cells were infected with OCT4, SOX2, KLF4, OCT4/SOX2, OCT4/KLF4, SOX2/KLF4, and OCT4/SOX2/KLF4, respectively, for 48 h and then changed to reprogramming medium. CD34-positive cells were analyzed by FACS at post-infection D3. n = 2, biological replicates, data are represented as mean ± SD. ***p < 0.001, two-tailed, unpaired Student’s t test. Mock cells were treated with same culture condition without infection. (D) MEF cells were infected with OCT4, SOX2, KLF4, and OCT4/SOX2/KLF4 as described in Figure 3C and then analyzed by qRT-PCR for the expression of the selected genes. n = 2, biological replicates, data are represented as mean ± SEM. See also Figures S3 and S4.
pathways in the NR and RP fates, respectively (Figure 3B). AP-1, known previously as key inhibitory TFs during reprogramming (Li et al., 2017a; Liu et al., 2015), is enriched in the NR branch (Figure 3B). However, it is quite surprising that the KLF family is also enriched in this NR fate, as KLF4 is one of the original Yamanaka factors (Figure 3B). We also notice that the motif for SOX family is enriched in RP cell fate (Figure 3B). One potential explanation for the differences between RP and NR fates could be the differential expression of reprogramming factors from viral-delivered transgenes. We found a modest correlation between the expres-
sion levels of exogenous transgene Klf4 and NR marker genes such as Cd34, Fxyd5, and Klk10 (Figure S3D). We also show that the keratinocyte differentiation pathway is enriched in the NR branch (Figure 3B). NR cells express keratins such as Krt6a and Krt14 (Figure 2B). Since CD34 marks hair follicle stem cells and keratinocytes progenitors (Trempus et al., 2003, 2007) and Klf4 is important for epidermal development (Segre et al., 1999), we hypothesize that KLF4 is responsible for the keratinocyte-like NR cell fate during or after MET and counterbalanced by OCT4 or SOX2 to keep the reprogramming cells toward
Molecular Cell 73, 815–829, February 21, 2019 819
pluripotency. To test this hypothesis, we infect the MEFs with different combinations of OSK and score the CD34+ cells at day 3. We found that Klf4 alone can indeed induce 70% cells into CD34+, but this efficiency would decrease to 20% in Klf4/Sox2-infected cells (Figure 3C). The Klf4-infected cells not only become CD34 positive but also express a series of NR marker genes such as Psca and Klk10, which are not expressed in MEFs, indicating that Klf4 alone induces the keratinocyte-like NR cell fate (Figure 3D). We performed RNA-seq on MEFs expressing O, S, K, OS, OK, and SK (Figure S3E) and showed that each factor or combination induces unique sets of genes: Oct4 upregulates genes related to circulatory system process, Sox2 activates genes for neural fate, and Klf4 induces genes for keratinocyte differentiation (Figure S3E). We also showed that these potential NR fates are inhibited by other reprogramming factors (Figure S3E, right panel), suggesting that the reprogramming factors cooperate to minimize the fate biases of each individual while guiding the reprogramming cells toward pluripotency. We further show that each factor and combination opens unique set of chromatin loci (Figures S3F and S3G), providing a mechanistic explanation for the observed fate choices. Specifically, we show that KLF4 opens chromatin loci controlling keratinocyte-like fate attenuated by Sox2, but not Oct4 (Figure S3H). We then tested whether other members of KLF family have a similar effect. By overexpressing Klf4/Klf2/Klf5 individually with Oct4 and Sox2, we showed that Klf2 can help generate 200 Oct4-GFP-positive colonies at D7, while Klf5 cannot (Figure S4A). Interestingly, Oct4/Sox2/Klf2 induce a similar ratio of CD34-positive cells as Oct4/Sox2/Klf4, suggesting that both Klf2 and Klf4 are capable of generating similar levels of CD34+ NR cells (Figure S4B). The discrepancy in reprogramming efficiency between Klf2 and Klf4 can be further explained by the apparent difference in their ability to mediate MET, i.e., Klf2 is far less effective in inducing Cdh1 and repressing Zeb2 (Figure S4C). By examining the NR marker genes, we show that Klf2/4/5 alone can induce the expression of NR genes significantly, but only Klf4 can increase NR genes when co-expressed with Oct4/Sox2 (Figure S4D). In one of our early studies (Chen et al., 2011b), we demonstrated with an inducible Klf4 system that Klf4 is required at the initial MET stage during reprogramming. We then reproduced this experiment and showed that the extended expression of Klf4 neither impairs reprogramming efficiency nor increases CD34+ cell number (Figures S4E and S4F), suggesting that the MET/ CD34+ activities of KLF4 might have been coupled together. As KLF4 has been widely used in different reprogramming studies, we are able to detect NR markers by qPCR during reprogramming in a secondary inducible Col1a1-OKSM system (second MEF) (Carey et al., 2010) (Figure S4G). Furthermore, we then examined the datasets from previously published works (Chen et al., 2016; Di Stefano et al., 2014) and found that NR marker genes were all upregulated in these systems, consistent with our hypothesis that Klf4 induces keratinocytes-like NR fate during reprogramming (Figure S4H). Identifying Single Pluripotent Cells We then continued to follow the RP branch and wished to determine additional decision points toward pluripotency by focusing
820 Molecular Cell 73, 815–829, February 21, 2019
on the emergence of chimera-competent pluripotent cells or PCs at D7 and D8 (Figures 1A and 1B). To this end, we found 12 cells that are separated from the scattered D2–D8 mixture and move closer to the ESCs/iPSCs (Figures 1D and 4A). When we clustered the D7 and D8 cells into six groups with an unsupervised SC3 method (Kiselev et al., 2017) (Figure 4B; Table S3), we could again identify these 12 cells all in cluster 6. Furthermore, when plotted with MEFs and ESCs as reference on t-SNE, these 12 cells again are clustered with ESCs (Figure 4C). These results suggest that they may be the PC candidates capable of contributing to chimeras as demonstrated in Figure 1B. We fit the regularized regression model (Simon et al., 2013; Zou and Hastie, 2005) to select genes that could interpret the clustering result and show that a series of pluripotent genes such as Esrrb, Dppa5a, Tdgf1, and Utf1 are expressed by these PC candidates (Figure 4B). We analyzed the expression of these pluripotent genes in D5, D7, and PC candidates and ESCs to show a very close relationship between PC candidates and ESCs based on the PC signature genes (Figure S5A). In addition to the PCs, the rest of the cells may represent a stage we tentatively call pre-PCs along the cell fate continuum of the RP branch. Indeed, we further identify signature genes for them with sequential upregulated pattern along pseudo-time on RP branch (Figure S5B), which is similar to results in Figure 4B. By qRT-PCR analysis of select genes, we show that PC signature genes are indeed expressed exclusively in day 7 cultures in contrast to the expression of pre-PC marker genes such as Sall4/Zfp42 (Figure S5C). We then examined published datasets (Chen et al., 2016; Di Stefano et al., 2014) and found that these ‘‘PC candidate’’ signature genes are expressed at very low or undetectable levels until the very last stage of reprogramming in previous bulk analyses, consistent with our conclusion that these genes might serve as markers for late-stage pluripotency with chimera competency (Figure S5D). PC Candidates Represent Chimera-Competent Fate To isolate PC candidates with precision, we generated a Dppa5a-tdTomato knockin reporter in the OG2 background and show that colonies at D7 can be identified as Oct4-GFP+ and Dppa5a-tdTomato+ double positive (G+R+) and (G+R) (Figure 4D). We further show by FACS analysis of cells at D6 and D7 that Dppa5a-tdTomato-positive cells can emerge with 0.3% and 0.7% ratio (Figure S5E). We isolated G+R+ cells and G+R cells and found that G+R+ cells have not only silenced exogenous retroviral vectors but also activated PC marker genes to a similar level as ESCs (Figures S5F and S5G). In addition, G+R+ cells also have much improved performance in clone formation (Figure S5H). To confirm that G+R+ cells are responsible for the live chimeras, we then performed chimera assay again by picking G+R+ or G+R colonies at D7 and injecting these cells into blastocysts respectively. We showed that the G+R+ cells are capable, but that G+R cell are not capable, of giving rise to live chimeras (Figures 4E and 4F; Table S1). Taken together, these results support the idea that the PC candidates defined by single-cell analysis may be the ones chimera competent as shown in Figure 1 and that the Dppa5a-reporter can serve as a fate tracer for reprogramming cells.
A
C
t-SNE 2
Cluster 4 1
MEFs
2
5
iPSCs
3
6
ESCs
Rate-limiting step
PC candidates
t-SNE 1
Dppa2 Esrrb Dppa5a Tdgf1 Prdm14 Tet1 Zscan10 Sall4 Foxh1 Gdf3 Tdh Zfp42 Prmt5 Tdg Cdh1 Rab25 Cldn3 Epcam Gbp2 Ifitm3 Tgtp1 Irf7 Ccl2 Stat1 Ifi202b Vtcn1 Myof Anxa1 Klk10 Cd44 Fxyd5 Cav1 S100a4 Psca Anxa8 Klk8 Cd34
Set A
Set B
Set C
Set D
Set E
Set F
GO enrichment
D
DNA modification stem cell population maintenance blastocyst formation
Phase
Oct4-GFP
Dppa5a-Tdtomato
ncRNA metabolic process RNA splicing DNA repair
epithelial cell morphogenesis regulation of epidermal cell differentiation cell junction organization response to interferon-beta
D7
E
25
Live pups Live chimeras
21
response to interferon-gamma
20
response to type I interferon myeloid cell differentiation negative regulation of cell activation
Numbers
C
lu
st
C er 5 lu ES ste C r iP s 6 S M Cs EF s
4
3
er lu
st
er C
C
lu
st
er st
lu
C
C
lu s
te r
1
B
2
PC candidates
15 10
13 10
negative regulation of cell adhesion·
5 keratinocyte proliferation response to wounding cardiac chamber morphogenesis regulation of vasculature development
0
0 G+R+
G+R-
Z-Score −1
0
1
2
F
Figure 4. Identification of Single Pluripotent Cells (A) Highlight the pluripotent candidates in Figure 1D. PC, pluripotent cell. (B) SC3 clustering partitioned a total of 123 cells from D7 and D8 into six clusters. A total of 730 signature genes were identified to interpret the clustering result based on the regularized regression model. The genes were further clustering into six coordinate gene set (set A–F) (see STAR Methods and Table S3 for more details). The normalized expression was calculated by removing mean and divided by SD across D7 and D8 cells on logarithmical expression level. (C) t-SNE showing the clustering results corresponding to Figure 4B and the relationship among D7, D8, MEFs, iPSCs, and ESCs. (D) MEF cells with Oct4-GFP and Dppa5a-tdTomato double reporter system (OD MEFs) were derived at 13.5 dpc (see STAR Methods) and infected with OSK. Dppa5a-tdTomato-positive colonies emerged from some Oct4-GFP-positive colonies at post-infection D7. Scale bar, 250 mm. (E) Chimera results showed the chimera-competent iPSCs as Oct4-GFP and Dppa5a-tdTomato double-positive (G+R+) colonies. Data were from four independent experiments (see Table S1 for more details). n = 2, biological replicates. (F) Chimera mice derived from Oct4-GFP and Dppa5a-tdTomato double-positive (G+R+) colonies. See also Figure S5, Table S1, and Table S3.
IFN-g Impedes the Final Decision to Pluripotency Given the fact that only a small fraction of RP cells can reach pluripotency (0.7%/25% = 3%, D7; Figure S5E), we propose the Oct4/Dppa5a bifurcating point as a rate-limited step in reprog-
ramming. To identify potential barriers preventing the generation of chimera-competent iPSCs within the RP branch, we compared specific genes upregulated in either PC candidates or the pre-PCs (Figure 5A; Table S4). We then applied these
Molecular Cell 73, 815–829, February 21, 2019 821
A
C
B
D
E
G
F
H
(legend on next page)
822 Molecular Cell 73, 815–829, February 21, 2019
genes on the analysis strategy shown in Figure 3B and showed that PC candidates appear to have acquired an active cell cycle pathway and the corresponding E2F, NF-Y TF activity (Figure 5B), consistent with the fact that pluripotent stem cells have a unique cell cycle (Savatier et al., 1994, 2002; Stead et al., 2002). On the other hand, pre-PCs are enriched with interferon response genes and TF activity of the IRF TF family, which is part of an interferon response (Figure 5B). To confirm this, we re-analyzed the expression patterns from the single-cell RNAseq and identified a group of primarily immune response genes specifically activated during the middle to late stages of reprogramming (D2–D7) but silenced at the end stage of reprogramming and in ESCs (Figure S6A). To confirm this immune response, we performed ELISA assays on serum-free media collected during reprogramming. As shown in Figure 5C, IFN-g starts to rise at D3 and peaks at D4 and D5, then declines around D7, consistent with the qPCR data (Figures 5C and S6B). On the other hand, we did not observe any appreciable activation of IFN-b (Figure S6C). To see if IFN-g influences cell fate decision during reprogramming, we added IFN-g or IFN-b to reprogramming cells. We show that the two cytokines activate different immune response genes as expected (Schneider et al., 2014) (Figure 5D), but critically, only IFN-g can suppress the expression of PC signature genes, such as Dppa5a and Ooep (Figure 5E). We further show that IFN-g completely blocks the formation of Dppa5a-Tdtomato-positive colonies but only has a limited impact on the formation of Oct4-GFP+ colonies (Figure 5F). We showed that IFN-g almost completely blocks the formation of Dppa5a-Tdtomato-positive colonies at 0.8 ng/mL and that the inhibitory effect persists during the entire reprogramming process (Figures S6D and S6E). Interestingly, by blocking IFN-g pathway with anti-IFN-g and anti-IFN-g receptor antibodies, we show that the antibodies do not enhance Oct4-GFP+ colonies but instead enhance Dppa5a-Tdtomato-positive colonies and the expression of late reprogramming markers such as Dppa5a and Ooep (Figures 5G and 5H and S6F–S6H). Consistently, these genes failed to be upregulated by antibodies for IFNAR1 (one of the receptor of IFN-b) and IFN- b (Figures S6I–S6K), indicating that IFN-g, but not IFN-b, is a barrier prior to the acquisition of chimera competency stage during reprogramming. We identified 9/140, 12/147, 1/87, 2/78, and 1/54 cells expressing Ifng at D2, D3, D5, D7, and D8, respectively (Figure S6L),
in agreement with bulk data showing that Ifng began to be expressed around D3. On the other hand, we showed that these cells tend to express GFP, while they are not enriched in the CD34+ lineage, as confirmed by qPCR (Figures S6M and S6N). Curiously, Ifng appears to be expressed in Oct4-GFP+ cells at D3 and D5, but not in Dppa5a+ cells (Figure S6N), suggesting that they may form a sub-branch within RP. Finally, we show that even though the OKSM-mediated reprogramming does not express IFN-g, IFN-g can still diminish the Dppa5a+ colonies when added exogenously, further suggesting that cells at a late stage of reprogramming are sensitive to IFN-g (Figures S6O– S6Q). These results demonstrate that Ifng impedes reprogramming at the step of acquiring chimera competency. Improving Cell Number Shows Detailed Cell Continuum during Reprogramming A higher-throughput platform like the 103 genomics system provides an easy way to get transcriptome from thousands of cells with single-cell resolution. We performed scRNA-seq with the 103 system on OSK-mediated reprogramming and obtained data from 59,141 cells (Figures 6A and S7A–S7C). Projection of these data with UMAP (Mclnnes et al., 2018) showed that the data structure is similar to that from the C1 system, with MEF/D0/D1 cells well separated from each other and D3–D7 cells quite mixed together (Figure 6A). We further performed SOT analysis on the cells of D3–D7 and found three significant cell fate branches throughout the reprogramming process (Figure 6B). We constructed a cell-fate transition network through PAGA (Wolf et al., 2017) based on Figure 6B and defined that there are at least three endpoints for cell types (Figures 6C and 6D). By picking up the clusters representing these three endpoints and the initial cell population at D3, we identified the signature genes for each cell population and found that one is Cd34+/Fxyd5+ NR cells, as shown with the C1 system; another one is neuron-like Ascl1+ cells; and the last one is pluripotencyprone cells (Figures 6C–6F). We then performed the analysis strategy shown in Figure 3B and showed that the neuron-like cell population appears to have acquired SOX, MAZ/PATZ1 TF activity (Figures 6F and S7D), suggesting a potential NR branch which might be driven by Sox2 and Patz1 (Ma et al., 2014). This result is consistent with Sox2-infected MEFs highly expressing neural-fate-relative genes such as Ascl1 (Figure S3E). We also identified PC candidates by SC3 clustering, which is Dppa5a
Figure 5. IFN-g Impedes the Transition to Chimera Competency (A) The signature genes mark pre-PC and PC candidates. The genes ranked at top 1,000 (Student’s t test) from both types of cells were selected. Z score of log2(Expression + 1) was shown. NR, non-reprogramming. RP, reprogramming potential. PC, pluripotent cell. See Table S4 for more details. (B) Dot plot showing the enrichment for potential TF motifs (top panel) and pathway (bottom panel) of pre-PC and PC candidates. NES, normalized enrichment score. (C) IFN-g in cell culture supernatant was quantified by ELISA kit, n = 2, biological replicates, data are represented as mean ± SD. (D and E) IFN-g (20 ng/mL) or IFN-b (20 ng/mL) was added to the medium during OSK-derived reprogramming. The cells were analyzed by qRT-PCR for IFN downstream genes and PC candidate signature genes, n = 3, biological replicates, data are represented as mean ± SEM. (F) Oct4-GFP and Dppa5a-tdTomato positive -olonies from OSK-driven reprogramming with or without the treatment of IFN-g. Colonies were counted at postinfection D7, n = 3, technical replicates, data are represented as mean ± SD. (G) Oct4-GFP- and Dppa5a-tdTomato-positive colonies were analyzed with or without the treatment of IFN-g antibodies. Colonies were counted at post-infection D7, n = 3, technical replicates, data are represented as mean ± SD. **p < 0.01, two-tailed, unpaired Student’s t test. (H) qRT-PCR analysis of the expression level of PC signature genes with or without the treatment of IFN-g antibodies at post-infection D7. n = 3, biological repeats, data are represented as mean ± SEM. See also Figure S6 and Table S4.
Molecular Cell 73, 815–829, February 21, 2019 823
A
B
C
D
E
F
Figure 6. Dissecting Diverse Reprogramming Trajectories using 103 Genomics Single-Cell RNA-Seq (A) Uniform Manifold Approximation and Projection (UMAP) analysis of 59,141 single cells of OSK reprogramming process. Cells are colored by sampling time. (B) Subtle branches structure of D3–D8 cells visualized by FR layout. (C) Mapping the topology 103 OSK reprogramming. The cells were clustering using Louvain algorithm, and the relationship of clusters was measured using partition-based graph abstraction (PAGA) algorithm. (D) The mean expression of representative genes in each cluster was showed in PAGA layout. (E) Main clusters identified by Louvain clustering algorithm. (F) Heatmap shows the top 100 signature genes of reprogramming progenitor, non-reprogramming, neuro-like and reprogramming potential states separately. Bimod test as implanted in R package Seurat (see Table S5 for more details). See also Figure S7 and Table S5.
824 Molecular Cell 73, 815–829, February 21, 2019
A
B
iP
SC
Branch1
Branch1
D26 D32 D36 D40
Branch2
Timepoints D0 D12 D3 D16 D6 D18 D9 D22
No n-r Cdkn ep rog 2a+ ram mi ng
Chemical-medated reprogramming
s
Branch3
iti
Branch2
al
st
at
e
Branch3
In
XEN-like
Cdkn2a
Dcn
Wnt5a
0246
0246
0 2 4
Sox17
Gata4
Gata6
0 2 4 6
0 2 4
0 2 4
Pou5f1
Sox2
Nanog
0 2 4
0 2 4
0 2 4
C NR1
NR2
... ...
NR-X
Figure 7. Resolving the Fate Complexity of Chemical Reprogramming (A) SOT reveals three branches of CIP reprogramming. A total of 12,000 cells were sampling from the dataset for visualization. (B) The expression of signature genes corresponding to each branch. The expression level is transformed by log2(UMI +1). (C) Schematic diagram shows diverse non-reprogramming branches during cell reprogramming. NR, non-reprogramming. See also Figure S7.
positive (Figures S7E and S7F). These results demonstrate that both C1 and 103 systems generate consistent results, despite the differences in platform technologies and throughputs. Resolving the Fate Complexity of Chemical Reprogramming by scRNA-seq Yamanaka induction of pluripotency or YIP has been well studied for the past 10 years or so, and yet we have been able to discover two previously unknown cell fate checkpoints by scRNA-seq, highlighting the resolving power of this technology. We then start to look for the reprogramming route of other reprogramming systems, such as chemical induction of pluripotency (CIP) (Cao et al., 2018; Hou et al., 2013). Unlike YIP, CIP remains in its infancy with a 40 day lengthy induction period (Cao et al., 2018), which may require a higher throughput platform like the 103 genomics system. By performing scRNAseq with the 103 system on CIP system, we then obtained 93,566 cells for 12 time points during CIP reprogramming (Figures S7A, S7G, and S7H). Projection of the expression of these cells into a t-SNE space reveals more than 23 distinct populations (Figure S7I). Based on the clustering of cells, a significant Sox17+/Gata6+ XEN-like cell population (clusters 1/4/10/15) emerged as early as D9–D12 and peaked at D18/D22 (Figures S7I and S7J). Activation of Oct4/Dppa5a can be observed in cluster 15 around D22 and acquisition of late pluripotency
markers such as Sox2 and Nanog in cluster 12 at D36–D40 (Figures S7I and S7J). We then applied SOT framework with optimization for the data produced by 103 systems, revealing a non-reprogramming branch marked by Cdkn2a (Ink4a) and Dcn (Figures 7A and 7B). The rest cells access a XEN-like state that highly expressed Sox17, Gata4, and Gata6 at first (Figures 7A and 7B). The Pou5f1+/Nanog+ iPS cell fate is derived from this XEN-like cell branch (Figures 7A and 7B), demonstrating that XEN-like cell fate switching is a critical intermediate step of chemical-induced pluripotency (CIP) (Cao et al., 2018; Zhao et al., 2015). However, we failed to detect any significant activation of 2C-like genes during our CIP process (Figure S7K), further highlighting the differences among the CIP protocols reported so far (Zhao et al., 2018). DISCUSSION As the basic unit of life, cell assumes a particular fate based on both intrinsic and extrinsic instructions. How cell decides its fate is one of the fundamental questions in biology. During this developmental process, a continuum of cell fates is generated from totipotency of the fertilized egg to fully differentiated cells in an adult, with remarkable precision. With similar precision, a fibroblast has been shown to embark on a reverse journey that traverses from a differentiated state to
Molecular Cell 73, 815–829, February 21, 2019 825
a pluripotent one found only in cells of the inner cell mass (ICM) from a blastocyst, thus generating a cell fate continuum in a way kind of mirroring that of development. By taking advantage of the simplicity for the latter experimentally, we mapped this fate continuum by single-cell RNA sequencing. In addition to validation of previously identified mechanisms such as MET, our results demonstrate that fate decisions have evaded all previous investigations based on bulk cell analyses, thus demonstrating the enormous power of single-cell RNA sequencing. This approach and the findings may have several implications in basic biology. Validating Chimera-Competent Cells In Situ One key feature of our experimental system is the validation of chimera-competent cells at the conclusion of the reprogramming experiments. Previous practice entails the picking of GFP+ colonies from reprogramming dishes and allowing them to expand into stable clones under standard ESC culture conditions. As the subsequent culturing and passaging may take weeks and months, it is not clear if the cells become pluripotent at the time of picking or during the ensuring culture and growing phases. To this end, we show in Figures 1A–1C that the reprogramming cells acquire chimera competency at D7. This direct assay confirms that the PC candidates exist in reprogramming culture and may be identified through single-cell analysis, thus confirming cell identity by relying not only on transcriptome data analysis solely but also on functional validation. This strategy is instrumental in our effort to define the PCs as well as identify the final Dppa5a-marked PC decision post-Oct4 activation (Figure 4). Criteria for Cell Fate Decision Cell fate decision is the specification step for a cell to assume a particular fate. In the developmental field, transplantation experiments are employed to define what state the cell is in in the fate continuum based on functional assays. Even though cell fate transitions in cell culture system may not proceed as precise as in vivo, a definitive functional readout is required for the definition of cell fate decision points. This criterion for cell fate decision should be instrumental in excluding events that are merely correlative in nature but unable to mediate cell fate change. With such criterion, we defined two fate decision points that have evaded us for more than 10 years of investigations since the discovery of iPSC (Takahashi and Yamanaka, 2016). Specifically, the first is a decision between a keratinocyte-like fate and reprograming potential (RP) fate (Figures 2 and 3), and the second is a decision toward a bona fide pluripotent fate (Figure 5). We have developed extensive validation assays to confirm these two fate choices and believe that more fate decisions can be identified and validated from our databases,such a Ascl1+ neural-like fate (Figures 6B–6F). As with MET, which as we described earlier now can be also confirmed during differentiation (Li et al., 2010, 2017b), cell fate decisions we uncovered here may help explain similar decisions in vivo. We believe that, by combining single-cell RNA sequencing data analysis with stringent criteria for experimental validation, one can define more cell fate decision points relevant to both biology and medicine.
826 Molecular Cell 73, 815–829, February 21, 2019
Innate Immunity and IFN-g as Barriers of Reprogramming Our finding that IFN-g impedes the last transition to pluripotency suggests that stem cell fate and immune responses are interconnected. Indeed, it is interesting to point out that earlier reports have identified components of innate immune response activated by the reprogramming factors or viral infection as a positive event required for the acquisition of pluripotency. For example, toll-like receptor 3 (TLR3) has been shown to be required for reprogramming (Lee et al., 2012), and interleukin-6 (IL-6) can facilitate reprogramming (Mosteiro et al., 2016). It is plausible that innate immunity influences cell fate differentially through individual components such as TLR3, IL6, and IFN-g. For example, IFN-g may block the activation of endogenous retroviral elements or other transposable elements so that it prevents the activation of the pluripotent state (Hutchins and Pei, 2015). Furthermore, the inhibitory effect of IFN-g only shows in the final transition step into full pluripotency, but not in early steps, suggesting that this type II interferon response may be only relevant to core pluripotency network. Interestingly, it is reported that embryonic stem cells get a unique property of attenuated innate immunity (Guo et al., 2015), suggesting innate immunity response might be incompatible to pluripotency. It is expected that more mechanistic insights can be generated that may inform us on how to improve reprogramming technologies. Proposed Mode of Cell Fate Decisions We propose that the induction of iPSCs from MEFs is executed in a series of bifurcating decisions. We have shown that one of the earliest bifurcating decisions for the MEFs to make is whether or not to undergo MET (Li et al., 2010). In this study, we have now identified two additional bifurcations in Yamanaka-factor system, i.e., the keratinocyte-like-NR/R and the Dppa5a bifurcations (Figures 5A and S7L), and predicted bifurcations in CIP system (Figures 7A and S7L), thus supporting a generic model of cell fate continuum during somatic cell reprogramming (Figure 7C). While quite preliminary, these findings should provide a conceptual framework for us to rationalize cell fate decision in reprogramming, and perhaps during development. We also showed that optimized reprogramming condition could influence such bifurcation points, as MET is complete in the highly efficient system, further suggesting potential utility of this bifurcation model. Our studies also predict that more of these bifurcation points exist and wait to be discovered, perhaps through single-cell analysis. It is our hope that with improved means to extract and analyze single cells, we should be able to add more bifurcation points along the reprogramming path. The utility of each point will be realized in multiple ways. For instance, each point represents a real decision that requires both intrinsic factors such as the reprogramming factors at the NR/RP point, i.e., Klf4 for and Sox2 against that decision. On the other hand, other points may require extrinsic factors such as IFN-g, as we discovered for the Dppa5a decision. A comprehensive understanding of these decisions will allow improvement of reprogramming technology. For instance, one may develop small molecule inhibitors for the IFN-g pathway, which should be able to improve the
Dppa5a+ chimera-competent iPSCs. Finally, investigation of these intrinsic and extrinsic factors involved in fate bifurcations would help map and understand the specification processes between somatic and pluripotent fates.
STAR+METHODS Detailed methods are provided in the online version of this paper and include the following: d d d
d
d d
KEY RESOURCES TABLE CONTACT FOR REAGENT AND RESOURCE SHARING EXPERIMENTAL MODEL AND SUBJECT DETAILS B Mice B Cell lines METHOD DETAILS B Generation of iPSCs cells B Assessed for pluripotency by direct blastocyst injection B Single cell sequencing B Single colony sequencing B Quantitative PCR B Generation of Dppa5a-tdTomato reporter cell lines B Flow cytometry B Live cell staining B ELISA B Preprocessing of Reprogramming Data QUANTIFICATION AND STATISTICAL ANALYSIS B Statistical Analysis DATA AND SOFTWARE AVAILABILITY B Data Resources B Code Availability
SUPPLEMENTAL INFORMATION
AUTHOR CONTRIBUTIONS L.G., X.W., J.C., and D.P. initiated the project. L.G. and J.C designed the experiments. L.G., H.L., and J.Y. performed single-cell sequencing experiments with G.P.’s help. L.L., X.W., and Y.M. performed the bioinformatics analysis of the data under J.C.’s supervision. L.L., Y.M., and J.C. performed the bifurcation node analysis. L.G., J.L., and J.C., with assistance from S. Chu and H.S., ascertained the presence of chimera-competent iPSCs. L.G., M.G., F.W., and Y. Liu performed the reprogramming experiments. S.Cao performed the CIP experiment. J.K. constructed the Dppa5a-reporter. L.G. and M.G. performed FACS experiments with J.W.’s help. M.G., F.W., K.W., and Jiadong Liu performed qPCR experiments. M.G. performed the ELISA experiments. D.L. performed the ATAC-seq analysis. L.G., L.L., and X.W. wrote the STAR Methods, and A.P.H helped to improve it. D.P. and J.C. supervised the project and wrote the manuscript. DECLARATION OF INTERESTS The authors declare no competing interests. Received: April 23, 2018 Revised: July 25, 2018 Accepted: January 29, 2019 Published: February 13, 2019 REFERENCES Bacher, R., Chu, L.F., Leng, N., Gasch, A.P., Thomson, J.A., Stewart, R.M., Newton, M., and Kendziorski, C. (2017). SCnorm: robust normalization of single-cell RNA-seq data. Nat. Methods 14, 584–586. Buganim, Y., Faddah, D.A., Cheng, A.W., Itskovich, E., Markoulaki, S., Ganz, K., Klemm, S.L., van Oudenaarden, A., and Jaenisch, R. (2012). Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222. Buganim, Y., Faddah, D.A., and Jaenisch, R. (2013). Mechanisms and models of somatic cell reprogramming. Nat. Rev. Genet. 14, 427–439. Butler, A., Hoffman, P., Smibert, P., Papalexi, E., and Satija, R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420.
Supplemental Information includes seven figures and six tables and can be found with this article at https://doi.org/10.1016/j.molcel.2019.01.042.
Cacchiarelli, D., Trapnell, C., Ziller, M.J., Soumillon, M., Cesana, M., Karnik, R., Donaghey, J., Smith, Z.D., Ratanasirintrawoot, S., Zhang, X., et al. (2015). Integrative analyses of human reprogramming reveal dynamic nature of induced pluripotency. Cell 162, 412–424.
ACKNOWLEDGMENTS
Cao, S., Yu, S., Li, D., Ye, J., Yang, X., Li, C., Wang, X., Mai, Y., Qin, Y., Wu, J., et al. (2018). Chromatin accessibility dynamics during chemical induction of pluripotency. Cell Stem Cell 22, 529–542.
We appreciate the assistance of Xiangjie Zhao, Yongqiang Chen, Zhaoming Chen, Ning Guo, Jiaoyang Sun, Yunyun Fu, Rongping Luo, and Shijuan Huang on experiments and analysis. We are grateful to Lei Gu, Wanqiang Sheng, and Yunhao Tan for their helpful discussions and advice. We thank the Guangzhou Branch of the Supercomputing Center of Chinese Academy of Sciences for its support. All single-cell RNA sequencing data reported in this study are available at the Gene Expression Omnibus (GEO) under accession GSE103221. This work is supported by the National Key R&D Program of China (2017YFA0504100 and 2016YFA0100400), the ‘‘Strategic Priority Research Program’’ of the Chinese Academy of Sciences (XDA16010401), the National Natural Science Foundation of China (31522033, 31421004, 31501185, and 31530038), the Science and Technology Planning Project of Guangdong Province (2014B020225002 and 2017B030314056), the National Basic Research Program of China (2014CB965200), the Key Research & Development Program of Guangzhou Regenerative Medicine and Health Guangdong Laboratory (2018GZR110104003), and the Science and Technology Program of Guangzhou (201804020052). The authors gratefully acknowledge the support from CAS Youth Innovation Promotion Association (to L.G.) and Fountain-Valley Life Sciences Fund of University of Chinese Academy of Sciences Education Foundation (to D.P.).
Carey, B.W., Markoulaki, S., Beard, C., Hanna, J., and Jaenisch, R. (2010). Single-gene transgenic mouse strains for reprogramming adult somatic cells. Nat. Methods 7, 56–59. Chen, J., Liu, J., Chen, Y., Yang, J., Chen, J., Liu, H., Zhao, X., Mo, K., Song, H., Guo, L., et al. (2011a). Rational optimization of reprogramming culture conditions for the generation of induced pluripotent stem cells with ultra-high efficiency and fast kinetics. Cell Res. 21, 884–894. Chen, J., Liu, J., Yang, J., Chen, Y., Chen, J., Ni, S., Song, H., Zeng, L., Ding, K., and Pei, D. (2011b). BMPs functionally replace Klf4 and support efficient reprogramming of mouse fibroblasts by Oct4 alone. Cell Res. 21, 205–212. Chen, J., Guo, L., Zhang, L., Wu, H., Yang, J., Liu, H., Wang, X., Hu, X., Gu, T., Zhou, Z., et al. (2013a). Vitamin C modulates TET1 function during somatic cell reprogramming. Nat. Genet. 45, 1504–1509. Chen, J., Liu, H., Liu, J., Qi, J., Wei, B., Yang, J., Liang, H., Chen, Y., Chen, J., Wu, Y., et al. (2013b). H3K9 methylation is a barrier during somatic cell reprogramming into iPSCs. Nat. Genet. 45, 34–42. Chen, J., Chen, X., Li, M., Liu, X., Gao, Y., Kou, X., Zhao, Y., Zheng, W., Zhang, X., Huo, Y., et al. (2016). Hierarchical Oct4 binding in concert with primed
Molecular Cell 73, 815–829, February 21, 2019 827
epigenetic rearrangements during somatic cell reprogramming. Cell Rep. 14, 1540–1554. Chronis, C., Fiziev, P., Papp, B., Butz, S., Bonora, G., Sabri, S., Ernst, J., and Plath, K. (2017). Cooperative binding of transcription factors orchestrates reprogramming. Cell 168, 442–459. Croft, D., Mundo, A.F., Haw, R., Milacic, M., Weiser, J., Wu, G., Caudy, M., Garapati, P., Gillespie, M., Kamdar, M.R., et al. (2014). The Reactome pathway knowledgebase. Nucleic Acids Res. 42, D472–D477. Di Stefano, B., Sardina, J.L., van Oevelen, C., Collombet, S., Kallin, E.M., Vicent, G.P., Lu, J., Thieffry, D., Beato, M., and Graf, T. (2014). C/EBPa poises B cells for rapid reprogramming into induced pluripotent stem cells. Nature 506, 235–239. Di Stefano, B., Collombet, S., Jakobsen, J.S., Wierer, M., Sardina, J.L., Lackner, A., Stadhouders, R., Segura-Morales, C., Francesconi, M., Limone, F., et al. (2016). C/EBPa creates elite cells for iPSC reprogramming by upregulating Klf4 and increasing the levels of Lsd1 and Brd4. Nat. Cell Biol. 18, 371–381. Fabregat, A., Jupe, S., Matthews, L., Sidiropoulos, K., Gillespie, M., Garapati, P., Haw, R., Jassal, B., Korninger, F., May, B., et al. (2018). The Reactome Pathway Knowledgebase. Nucleic Acids Res. 46 (D1), D649–D655. Frey, B.J., and Dueck, D. (2007). Clustering by passing messages between data points. Science 315, 972–976. Gawad, C., Koh, W., and Quake, S.R. (2016). Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17, 175–188. €n, D., and van Oudenaarden, A. (2015). Design and analysis of single-cell Gru sequencing experiments. Cell 163, 799–810. Guo, Y.L., Carmichael, G.G., Wang, R., Hong, X., Acharya, D., Huang, F., and Bai, F. (2015). Attenuated innate immunity in embryonic stem cells and its implications in developmental biology and regenerative medicine. Stem Cells 33, 3165–3173. Haghverdi, L., Buettner, F., and Theis, F.J. (2015). Diffusion maps for highdimensional single-cell analysis of differentiation data. Bioinformatics 31, 2989–2998. Haghverdi, L., Lun, A.T.L., Morgan, M.D., and Marioni, J.C. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427. Hochedlinger, K., and Jaenisch, R. (2015). Induced pluripotency and epigenetic reprogramming. Cold Spring Harb. Perspect. Biol. 7, a019448. Hou, P., Li, Y., Zhang, X., Liu, C., Guan, J., Li, H., Zhao, T., Ye, J., Yang, W., Liu, K., et al. (2013). Pluripotent stem cells induced from mouse somatic cells by small-molecule compounds. Science 341, 651–654. Hutchins, A.P., and Pei, D. (2015). Transposable elements at the center of the crossroads between embryogenesis, embryonic stem cells, reprogramming, and long non-coding RNAs. Sci. Bull. (Beijing) 60, 1722–1733.
Li, R., Liang, J., Ni, S., Zhou, T., Qing, X., Li, H., He, W., Chen, J., Li, F., Zhuang, Q., et al. (2010). A mesenchymal-to-epithelial transition initiates and is required for the nuclear reprogramming of mouse fibroblasts. Cell Stem Cell 7, 51–63. Li, D., Liu, J., Yang, X., Zhou, C., Guo, J., Wu, C., Qin, Y., Guo, L., He, J., Yu, S., et al. (2017a). Chromatin accessibility dynamics during iPSC reprogramming. Cell Stem Cell 21, 819–833. Li, Q., Hutchins, A.P., Chen, Y., Li, S., Shan, Y., Liao, B., Zheng, D., Shi, X., Li, Y., Chan, W.Y., et al. (2017b). A sequential EMT-MET mechanism drives the differentiation of human embryonic stem cells towards hepatocytes. Nat. Commun. 8, 15166. Liu, F.T., Ting, K.M., and Zhou, Z.H. (2008). Isolation Forest. Proceedings of the 2008 Eighth IEEE International Conference on Data Mining. IEEE Computer Society, 413–422. Liu, F.T., Ting, K.M., and Zhou, Z.H. (2012). Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data 6, 1–39, https://doi.org/10.1145/ 2133360.2133363. Liu, J., Han, Q., Peng, T., Peng, M., Wei, B., Li, D., Wang, X., Yu, S., Yang, J., Cao, S., et al. (2015). The oncogene c-Jun impedes somatic cell reprogramming. Nat. Cell Biol. 17, 856–867. Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. Lujan, E., Zunder, E.R., Ng, Y.H., Goronzy, I.N., Nolan, G.P., and Wernig, M. (2015). Early reprogramming regulators identified by prospective isolation and mass cytometry. Nature 521, 352–356. Ma, H., Ow, J.R., Tan, B.C., Goh, Z., Feng, B., Loh, Y.H., Fedele, M., Li, H., and Wu, Q. (2014). The dosage of Patz1 modulates reprogramming process. Sci. Rep. 4, 7519. Maaten, L.d., and Hinton, G. (2008). Visualizing high-dimensional data using tSNE. J. Mach. Learn. Res. 9, 2579–2605. Macosko, E.Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., Tirosh, I., Bialas, A.R., Kamitaki, N., Martersteck, E.M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214. Mclnnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv, arXiv: 1802.03426 https://arxiv.org/abs/1802.03426. Mosteiro, L., Pantoja, C., Alcazar, N., Mario´n, R.M., Chondronasiou, D., Rovira, M., Fernandez-Marcos, P.J., Mun˜oz-Martin, M., Blanco-Aparicio, C., Pastor, J., et al. (2016). Tissue damage and senescence provide critical signals for cellular reprogramming in vivo. Science 354, aaf4445. Papp, B., and Plath, K. (2013). Epigenetics of reprogramming to induced pluripotency. Cell 152, 1324–1343. Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models with R (Springer-Verlag).
Janky, R., Verfaillie, A., Imrichova´, H., Van de Sande, B., Standaert, L., Christiaens, V., Hulselmans, G., Herten, K., Naval Sanchez, M., Potier, D., et al. (2014). iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput. Biol. 10, e1003731.
Picelli, S., Faridani, O.R., Bjo¨rklund, A.K., Winberg, G., Sagasser, S., and Sandberg, R. (2014). Full-length RNA-seq from single cells using Smartseq2. Nat. Protoc. 9, 171–181.
Kiselev, V.Y., Kirschner, K., Schaub, M.T., Andrews, T., Yiu, A., Chandra, T., Natarajan, K.N., Reik, W., Barahona, M., Green, A.R., and Hemberg, M. (2017). SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486.
Polo, J.M., Anderssen, E., Walsh, R.M., Schwarz, B.A., Nefzger, C.M., Lim, S.M., Borkent, M., Apostolou, E., Alaei, S., Cloutier, J., et al. (2012). A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 151, 1617–1632.
Kobourov, S.J. (2012). Spring embedders and force directed graph drawing algorithms. arXiv, arXiv:1201.3011 https://arxiv.org/abs/1201.3011.
Qiu, X., Mao, Q., Tang, Y., Wang, L., Chawla, R., Pliner, H.A., and Trapnell, C. (2017). Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982.
Kolaczyk, E.D., and Csa´rdi, G. (2014). Visualizing network data. Statistical Analysis of Network Data with R. Use R! (Springer), pp. 29–41. Lee, J., Sayed, N., Hunter, A., Au, K.F., Wong, W.H., Mocarski, E.S., Pera, R.R., Yakubov, E., and Cooke, J.P. (2012). Activation of innate immunity is required for efficient nuclear reprogramming. Cell 151, 547–558.
Samavarchi-Tehrani, P., Golipour, A., David, L., Sung, H.K., Beyer, T.A., Datti, A., Woltjen, K., Nagy, A., and Wrana, J.L. (2010). Functional genomics reveals a BMP-driven mesenchymal-to-epithelial transition in the initiation of somatic cell reprogramming. Cell Stem Cell 7, 64–77.
Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323.
Savatier, P., Huang, S., Szekely, L., Wiman, K.G., and Samarut, J. (1994). Contrasting patterns of retinoblastoma protein expression in mouse embryonic stem cells and embryonic fibroblasts. Oncogene 9, 809–818.
828 Molecular Cell 73, 815–829, February 21, 2019
Savatier, P., Lapillonne, H., Jirmanova, L., Vitelli, L., and Samarut, J. (2002). Analysis of the cell cycle in mouse embryonic stem cells. Methods Mol. Biol. 185, 27–33.
Tonge, P.D., Corso, A.J., Monetti, C., Hussein, S.M., Puri, M.C., Michael, I.P., Li, M., Lee, D.S., Mar, J.C., Cloonan, N., et al. (2014). Divergent reprogramming routes lead to alternative stem-cell states. Nature 516, 192–197.
Schneider, W.M., Chevillotte, M.D., and Rice, C.M. (2014). Interferon-stimulated genes: a complex web of host defenses. Annu. Rev. Immunol. 32, 513–545.
Trempus, C.S., Morris, R.J., Bortner, C.D., Cotsarelis, G., Faircloth, R.S., Reece, J.M., and Tennant, R.W. (2003). Enrichment for living murine keratinocytes from the hair follicle bulge with the cell surface marker CD34. J. Invest. Dermatol. 120, 501–511.
Scholz, F.W., and Stephens, M.A. (1987). K-sample Anderson-Darling tests. J. Am. Stat. Assoc. 82, 918–924. Segre, J.A., Bauer, C., and Fuchs, E. (1999). Klf4 is a transcription factor required for establishing the barrier function of the skin. Nat. Genet. 22, 356–360. Setty, M., Tadmor, M.D., Reich-Zeliger, S., Angel, O., Salame, T.M., Kathail, P., Choi, K., Bendall, S., Friedman, N., and Pe’er, D. (2016). Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat. Biotechnol. 34, 637–645.
Trempus, C.S., Morris, R.J., Ehinger, M., Elmore, A., Bortner, C.D., Ito, M., Cotsarelis, G., Nijhof, J.G., Peckham, J., Flagler, N., et al. (2007). CD34 expression by hair follicle stem cells is required for skin tumor development in mice. Cancer Res. 67, 4173–4181. Vu, T.N., Wills, Q.F., Kalari, K.R., Niu, N., Wang, L., Rantalainen, M., and Pawitan, Y. (2016). Beta-Poisson model for single-cell RNA-seq data analyses. Bioinformatics 32, 2128–2135. Wen, L., and Tang, F. (2016). Single-cell sequencing in stem cell biology. Genome Biol. 17, 71.
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504.
Wernig, M., Meissner, A., Foreman, R., Brambrink, T., Ku, M., Hochedlinger, K., Bernstein, B.E., and Jaenisch, R. (2007). In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448, 318–324.
Simon, N., Friedman, J., and Hastie, T. (2013). A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. arXiv, arXiv:1311.6529 https://arxiv.org/abs/1311.6529.
Wolf, F.A., Hamey, F., Plass, M., Solana, J., Dahlin, J.S., Gottgens, B., Rajewsky, N., Simon, L., and Theis, F.J. (2017). Graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. bioRxiv. https://doi.org/10.1101/208819.
Soufi, A., Donahue, G., and Zaret, K.S. (2012). Facilitators and impediments of the pluripotency reprogramming factors’ initial engagement with the genome. Cell 151, 994–1004. Stead, E., White, J., Faast, R., Conn, S., Goldstone, S., Rathjen, J., Dhingra, U., Rathjen, P., Walker, D., and Dalton, S. (2002). Pluripotent cell division cycles are driven by ectopic Cdk2, cyclin A/E and E2F activities. Oncogene 21, 8320–8333. Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676. Takahashi, K., and Yamanaka, S. (2016). A decade of transcription factormediated reprogramming to pluripotency. Nat. Rev. Mol. Cell Biol. 17, 183–193. Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., and Yamanaka, S. (2007). Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872. Tanay, A., and Regev, A. (2017). Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338. Theunissen, T.W., and Jaenisch, R. (2014). Molecular control of induced pluripotency. Cell Stem Cell 14, 720–734. Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: a retrospective. J. R. Stat. Soc. Series B Stat. Methodol. 73, 273–282.
Ying, Q.L., Wray, J., Nichols, J., Batlle-Morera, L., Doble, B., Woodgett, J., Cohen, P., and Smith, A. (2008). The ground state of embryonic stem cell self-renewal. Nature 453, 519–523. Yu, J., Vodyanik, M.A., Smuga-Otto, K., Antosiewicz-Bourget, J., Frane, J.L., Tian, S., Nie, J., Jonsdottir, G.A., Ruotti, V., Stewart, R., et al. (2007). Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920. Yu, G., Wang, L.G., Han, Y., and He, Q.Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287. Zhang, B., and Horvath, S. (2005). A general framework for weighted gene coexpression network analysis. Stat. Appl. Genet. Mol. Biol. 4, Article 17. Zhao, Y., Zhao, T., Guan, J., Zhang, X., Fu, Y., Ye, J., Zhu, J., Meng, G., Ge, J., Yang, S., et al. (2015). A XEN-like state bridges somatic cells to pluripotency during chemical reprogramming. Cell 163, 1678–1691. Zhao, T., Fu, Y., Zhu, J., Liu, Y., Zhang, Q., Yi, Z., Chen, S., Jiao, Z., Xu, X., Xu, J., et al. (2018). Single-cell RNA-seq reveals dynamic early embryonic-like programs during chemical reprogramming. Cell Stem Cell 23, 31–45. Zou, H., and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. A Stat. Soc. 67, 301–320. Zunder, E.R., Lujan, E., Goltsev, Y., Wernig, M., and Nolan, G.P. (2015). A continuous molecular roadmap to iPSC reprogramming through progression analysis of single-cell mass cytometry. Cell Stem Cell 16, 323–337.
Molecular Cell 73, 815–829, February 21, 2019 829
STAR+METHODS KEY RESOURCES TABLE
REAGENT or RESOURCE
SOURCE
IDENTIFIER
PE anti-mouse CD34
Biolegend
Cat# 119307; RRID: AB_493401
Alexa Fluor 647 anti-mouse CD104
Biolegend
Cat# 123607; RRID: AB_2563511
Mouse IFN-a/b R1 Antibody
R&D Systems
Cat# AF3039; RRID: AB_664107
Anti-IFNAR1-1, clone MARI-5A3
Millipore
Cat# 04-151; RRID: AB_11213383
Anti-Mouse CD119 (IFN gamma recepter 1) Functional Grade Purified
Thermo Fisher Scientific
Cat# 16-1193; RRID: AB_2573079
Anti-Mouse Interferon Beta, Rabbit Serum
R&D Systems
Cat# 32400-1; RRID: AB_2121718
Anti-Mouse IFN gamma Functional Grade purified
Thermo Fisher Scientific
Cat# 16-7311; RRID: AB_469243
Antibodies
Chemicals, Peptides, and Recombinant Proteins Recombinant Mouse IFNb
R&D Systems
Cat# 8234-MB-010
Recombinant Mouse IFNg
R&D Systems
Cat# 485-MI-100
Leukemia Inhibitory Factor (LIF)
Millipore
Cat# ESGE107
CHIR99021
This paper
N/A
PD0325901
This paper
N/A
Y27632
Chembest
Cat# C14357
iCD1 mouse iPS inducing complete medium for the reprogramming of mouse iPS cells
DeliCell
Cat# 820250
Fetal Bovine Serum
Moregate
Batch# 67301120
DMEM-Dulbecco’s Modified Eagle Medium, High Glucose
Hyclone
Cat# SH30022-2B
GlutaMAX
GIBCO
Cat# 35050079
Non-Essential Amino Acids Solution
GIBCO
Cat# 11140076
2-Mercaptoethoethanol
GIBCO
Cat# 21985-023
Sodium Pyruvate Solution
GIBCO
Cat# 11360070
Penicillin-Streptomycin Solution
Hyclone
Cat# SV30010
Phosphate-Buffer Saline (PBS)
GIBCO
Cat# C14190500BT
Trypsin-EDTA (0.05%), phenol red
GIBCO
Cat# 25300120
Trypsin-EDTA (0.25%), phenol red
GIBCO
Cat# 25200114
0.1% Gelatin Solution
Millipore
Cat# ES-006-B
DAPI
Sigma
Cat# D9542
TRI Reagent
MRC
Cat# TR118-200
Puromycin Dihydrochloride
Thermo Fisher Scientific
Cat# A1113803
Mineral Oil, Light
Thermo Fisher Scientific
Cat# O121-20
KSOM Embryo Culture (1x)
Millipore
Cat# MR-020P-5F
M2 Medium (1x)
Millipore
Cat# MR-015P-5F
ReverTra Ace
Toyobo
Cat# A3477L
Oligo-dT
TaKaRa
Cat# D511
RRI
TaKaRa
Cat# D2313A
SsoAdvanced Universal SYBR Green Supermix
Bio-Rad
Cat# 172-5274
SMARTer Ultra Low RNA Kit for Illumina Sequencing
Clonetech
Cat# 643833
C1 Single-Cell Auto Prep Reagent Kit for mRNA Seq
Fluidigm
Cat# 100-6201
Quant-iT PicoGreen dsDNA Assay Kit
Thermo Fisher
Cat# P7589
C1 IFC for mRNA seq (10-17 mm)
Fluidigm
Cat# 100-5760
Critical Commercial Assays
(Continued on next page)
e1 Molecular Cell 73, 815–829.e1–e7, February 21, 2019
Continued REAGENT or RESOURCE
SOURCE
IDENTIFIER
C1 IFC for mRNA seq (5-10 mm)
Fluidigm
Cat# 100-5759
VAHTS mRNA-seq v2 Library Prep Kit for Illumina
Vazyme
Cat# NR601
Nextera XT DNA Sample Preparation Kit
Illumina
Cat# FC-131-1096
Chromium Single Cell 30 Library Kit
10x Genomics
Cat# 120230
Chromium Single Cell 30 Gel Bead Kit
10x Genomics
Cat# 120231
Chromium Single Cell 30 Chip Kit
10x Genomics
Cat# 120232
NextSeq500 Mid output 150 cycles
Illumina
FC-404-2001
NextSeq500 High output 150 cycles
Illumina
FC-404-2002
Murine IFN-g ELISA Kits
Beijing 4A Biotech
Cat# CME0003
Mouse IFN-b ELISA Kit
PBL Assay Science
Cat# 42400
Mouse ES Cell Nucleofector Kit (25 RCT)
LONZA
VVPH-1001
Deposited Data Raw and analyzed data
This paper
GEO: GSE103221
Original data
This paper
https://doi.org/10.17632/p5pjm2smmp.1
2nd OSKM MEF data
Chen et al., 2016
GEO: GSE67462
Pulse B cells data
Di Stefano et al., 2014
GEO: GSE52396
OG2 mouse embryonic stem cells
This study
N/A
Primary mouse embryonic fibroblast
This study
N/A
OSK mB4-4 (mouse induced pluripotent stem cell lines)
This study
N/A
OSK mB1-1 (mouse induced pluripotent stem cell lines)
This study
N/A
Platinum-E (Plat-E)
A gift from The Fourth Military Medical University
N/A
Experimental Models: Cell Lines
Experimental Models: Organisms/Strains OG2 mice
The Jackson Laboratory
Mouse strain datasheet: 004654
129 mice
Beijing Vital River Laboratory Animal Technology
Stock No.: 217
CD-1 mice
Beijing Vital River Laboratory Animal Technology
Stock No.: 201
R26rtTA;Col1a14F2A mice
The Jackson Laboratory
Mouse strain datasheet: 011004
This study
N/A
Oligonucleotides Oligonucleotides are summarized in Table S6 Recombinant DNA pMXs-Oct3/4
Addgene
Cat# 13366
pMXs-Sox2
Addgene
Cat# 13367
pMXs-Klf4
Addgene
Cat# 13370
pMXs-cMyc
Addgene
Cat# 13375
pMXs-Flag
This study
N/A
pMXs-Klf2
This study
N/A
pMXs-Klf5
This study
N/A
pW-TRE-Klf4
This study
N/A
pMM-rtTA
This study
N/A
Px330-U6-Chimeric_BB-CBh-hSpCas9
Addgene
Cat# 42230
FlowJo
FlowJo LLC
https://www.flowjo.com/solutions/flowjo
ZEN
Zeiss
https://www.zeiss.com/microscopy/int/ software-cameras.html
Software and Algorithms
(Continued on next page)
Molecular Cell 73, 815–829.e1–e7, February 21, 2019 e2
Continued REAGENT or RESOURCE
SOURCE
IDENTIFIER
Bio-RAD CFX Manager
BIO-RAD
http://www.bio-rad.com/en-us/product/ cfx-manager-software?tab=Download
Prism 6
GraphPad
https://www.graphpad.com/scientificsoftware/prism/
Beacon Designer
PREMIER Biosoft
http://www.premierbiosoft.com/molecular_beacons/
Isolation Forest
Liu et al., 2008
http://scikit-learn.org/dev/modules/ generated/sklearn.ensemble.IsolationForest. html#sklearn.ensemble.IsolationForest
Anderson-Darling test
Scholz and Stephens, 1987
R package kSamples
t-SNE
Maaten and Hinton, 2008
https://lvdmaaten.github.io/tsne/
UMAP
McInnes et al., 2018
python package umap
AP clustering
Frey and Dueck, 2007
R package apcluster
Diffusion map
Haghverdi et al., 2015
R package destiny
Monocle2
Qiu et al., 2017
R package monocle
Wishbone (pseudotime ordering)
Setty et al., 2016
https://github.com/ManuSetty/wishbone
Fruchterman-Reingold layout (visualization)
Kolaczyk and Csa´rdi, 2014
R package igraph
DESeq2
Love et al., 2014
R package DESeq2
BPSC
Vu et al., 2016
R package BPSC
Seurat
Butler et al., 2018
R package Seurat
iRegulon
Janky et al., 2014
Cytoscape software plugin
clusterProfile (pathway enrichment)
Yu et al., 2012
R package clusterProfiler
SC3
Kiselev et al., 2017
R package SC3
Elastic-net
Zou and Hastie, 2005
R package glmnet
Dynamic linear model and Kalman smoother
Petris et al., 2009
R package dlm
PAGA
Wolf et al., 2017
python package scanpy
scran
Haghverdi et al., 2018
R package scran
CONTACT FOR REAGENT AND RESOURCE SHARING Further information and requests for reagents may be directed to the Lead Contact, Jiekai Chen (
[email protected]). EXPERIMENTAL MODEL AND SUBJECT DETAILS Mice All animal procedures were performed according to NIH guidelines. All mouse work performed in Duanqing Pei’s laboratory were approved by the Institutional Animal Care and Use Committee of Guangzhou institutes of biomedicine and health, Chinese Academy of Sciences. OG2 transgenic mice (CBA/CaJ 3 C57BL/6J) were original from Jackson Laboratory (Mouse strain datasheet: 004654); CD-1 (Stock No.: 201) and 129 (Stock No.: 217) mice were obtained from Beijing Vital River Laboratory Animal Technology Co., Ltd. The mice were housed with a 12 hr light/dark cycle between 07:00 and 19:00 in a temperature controlled room (22 - 23 C) with free access to water and food. Cell lines Primary mouse embryonic fibroblasts were derived from 13.5 d.p.c. mouse embryos from 129 female mice crossing male OG2 mice. Plat-E cell line was a gift from The Fourth Military Medical University. MEFs and Plat-E cells were maintained in 10% FBS medium (high glucose DMEM supplemented with 10% FBS, GlutaMAX and NEAA). OG2 mouse embryonic stem cells were derived from 3.5 d.p.c ICM from 129 female mice crossing male OG2 mice. iPSCs lines were derived from OCT4/SOX2/KLF4 retrovirus induced MEF cells with the same genetic background as the OG2 mESCs. Mouse ESCs (mESCs) and iPSCs were cultured on feeder layers with 2iL medium (high glucose DMEM, 15%FBS, NEAA, GlutaMAX, LIF) plus 2i (1 mm PD0325901, 3 mm CHIR99021) and LIF. All the ESC and iPSC lines sequenced in this paper had a correct karyotype, and were able to generate germline transmission chimera mice.
e3 Molecular Cell 73, 815–829.e1–e7, February 21, 2019
METHOD DETAILS Generation of iPSCs cells Plat-E cells were plated at density of 8 3 106 cells per 100 mm dish one day before transfection. Retroviruses were prepared by transfecting pMXs vectors containing reprogramming factors using calcium phosphate method. To perform calcium phosphate transfection, the medium was replaced with 7.5 mL fresh 10%FBS when cell density was around 80%. 25 mg plasmid DNA (final concentration 1 mg/mL) was added into 1068.75 ml water. Then 156.25 ml 2 M CaCl2 solution was added to the DNA-water mixture. Next, 1250 ml 2 3 HBS (0.05mol/L HEPES, 0.012 mol/L D-(+)-Glucose, 0.28 mol/L NaCl, 0.023 mol/L KCl, 0.0015 mol/L Na2HPO4) was added into the well-mixed solution with the pipette at several times. The final mixture was incubated statically for 3 min at room temperature and added drop by drop onto Plat-E cells. Cells were then incubated overnight at 37 C with 5% CO2. 12 hours after transfection, the medium was replaced with fresh 10% FBS medium. The viral supernatants were collected 48 hours and 72 hours post-transfection and filtered by 0.45 mm filter (Millipore). Retroviral supernatants were stored at room temperature and used within 24 hours. MEFs at passage 1 were digested by 0.25% trypsin and replanted at 4000 cells/cm2 12 hours before infection. Retroviral supernatants supplemented with 4 mg/mL polybrene were used to infect MEFs with 150 ml/cm2. The same infection procedure was repeated once the following day. The day that viral supernatants were removed and indicated media were added was defined as day 0 post-treatment. Medium was changed every day. OSK iPSCs were generated in iCD1 medium (Chen et al., 2011a). In the part of neutralization experiment, the IFN-g antibodies were added following anti-IFNGR1 1 mg/mL from D0 to D7 and anti-IFN-g 1 mg/mL from D3 to D7. Chemical induced iPSCs (CIP) were generated as previously described (Cao et al., 2018). Assessed for pluripotency by direct blastocyst injection During reprogramming, iPS cells derived from OG2/129 MEFs were sorting out in different days through two methods: (I) Flow cytometry sorting Oct4-GFP positive cells. After sorting, cells were centrifuged at 300 g, and resuspended in mES plus 2i medium; (II) Manually picked Oct4-GFP positive colonies were pooled together. After washing with PBS, the colonies were digested by 0.25% trypsin, and then neutralizing trypsin with mES plus 2i medium. Cells were centrifuged at 300 g, and resuspended in mES plus 2i medium. The iPS cells were put on ice ready for blastocyst injection. 10-15 of the digested cells were selected by inspection of the Oct4-GFP or together with Dppa5a-Tdtomato fluorescence, and injected into each E3.5 embryo. Afterward, about 15 injected embryos were transplanted into the uterine horn of 2.5 d.p.c. pseudo-pregnant mouse. CD-1 mice were used as embryo donors and the pseudopregnant recipients for blastocyst injection. The chimera mice were crossing with CD-1 mice to identify germline transmission. Single cell sequencing MEFs, ESCs, or reprogramming cells were digested by 0.25% trypsin, and washed twice with PBS. Single cells were captured using the Fluidigm C1 Single-Cell Auto Prep System. 5-10 mm C1 chips for ESCs, iPSCs and late reprogramming cells (D3, D5, D7) and 10-17 mm C1 chips for MEFs and early reprogramming cells (D0, D1, D2) dependent on the cell diameters. A concentration of 2 3 105 cells per ml was used for chip loading. After cell capture, chips were observed under the microscope to exclude capturing multiple cells. Cell lysis and cDNA synthesis were performed on-chip with the SMARTer Ultra Low RNA kit for the Fluidigm C1 system and following the manufacturer’s instructions. After cDNA harvesting, the concentrations were analysis with PicoGreen (Thermo Fisher) following the protocol of Single-Cell mRNA Seq PicoGreen Template (Fluidigm, PN 100-6160). 0.5 ng amplified products were used for Nextera XT library preparation, and following the protocol of Using the C1 Single-Cell Auto Prep System to Generate mRNA from Single Cells and Libraries for Sequencing (Fluidigm, PN 100-5950 B1). For 10x single-cell RNA-seq, we prepared the libraries following the Chromium Single Cell 30 Reagent Kits User Guide.The single cell libraries were quantified by Quant-iT dsDNA Assay Kit, high sensitivity (Thermo Fisher) on Qubit 2.0, and then sequenced on illumina NextSeq 500. Single colony sequencing MEFs were plated at 5000 per well in a 6-well plate. iPS cells were induced as described above. Cells were digested in situ at postinfected day 9. The single colony libraries were prepared following the Smart-seq2 (Picelli et al., 2014) protocol. The single colony libraries were sequenced on NextSeq 500 (Illumina). Quantitative PCR Total RNAs were extracted by chloroform-isopropanol method. The first-strand cDNAs were synthesized with ReverTra Ace (Toyobo) and oligo-dT (Takara), and then qRT-PCR was performed on a CFX96 real-time system (Bio-Rad) with SsoAdvanced Universal SYBR Green Supermix (Bio-Rad). The primers used for qRT-PCR were listed in Table S6. Generation of Dppa5a-tdTomato reporter cell lines The Dppa5a-tdTomato reporter ESC line was generated by genome editing using CRISPR/Cas9. The donor vector containing a tdTomato-poly(A)-loxp-PGK-Puro-loxp cassette was targeted between the 50 UTR and the first exon of Dppa5a locus. The pX330 vector (#42230, addgene) containing U6-sgDppa5a and linearized donor vectors were electroporated into OG2 mES cells in a ratio
Molecular Cell 73, 815–829.e1–e7, February 21, 2019 e4
of 3:2 (w/w) using Nucleofector 2b Device (LONZA) following the instruction of the manual. The colonies were selected by puromycin (2 mg/mL) for the following three days. The mES cell lines with right genotype and karyotype were injected into CD-1 blastocysts and the male chimeras were bred with female C57BL/6 or OG2 to get the offspring. The sgRNA used for genome editing was listed in Table S6. Flow cytometry Reprogramming cells at different days were digested by 0.25% trypsin, and wash with PBS. The cells were resuspension with flow cytometry buffer (PBS with 2% FBS), and CD34 antibody (BioLegend, 119307) or CD104 antibody (BioLegend, 123607) was used as recommended by the manufacturer for 15 min at 4 C. After staining, cells were washed by flow cytometry buffer and resuspension with flow cytometry buffer containing DAPI (2.5 mg/mL, Sigma), and analyzed with Fortessa (BD Biosciences) or sorted with Arial (BD Biosciences). The FACS data was analysis with FlowJo software. Live cell staining Reprogramming cells were changed with fresh medium containing CD34 antibody (BioLegend, 119307) at a final concentration of 2 mg/mL. The cells were kept under normal culture conditions in an incubator for one hour, and then the medium was removed. Cells were washed with DPBS twice, and then the medium was changed to fresh medium. The light was avoided during this process. After staining, cells were scanned in a BiosStation CT. ELISA Samples were collected from cell culture supernatant during OSK induced reprogramming and centrifuged for 5 min at 2500rpm. Particulates were carefully removed, and samples were assayed immediately, or stored in aliquots at 20 C. Murine IFN-b ELISA Kits were acquired from PBL. Murine IFN-g ELISA Kits were acquired from Beijing 4A Biotech. Samples were detected following the manufacturer’s instructions. Preprocessing of Reprogramming Data Fluidigm C1 data The raw bcl files produced by Illumina Next500 were converted to fastq files by bcl2fastq (v1.84, illumina) scripts. Data features were checked by FastQC. Reads were aligned to mouse reference genome mm10(GRCm38.p5) with gene annotation file from ensemble database (Ensemble release 86) by RSEM (Li and Dewey, 2011). The doublets observed in photomicrograph were excluded. We also removed outliers using Isolation Forest (Liu et al., 2008, 2012), which is based on randomly generated binary trees on gene expression values. We trimmed top 5% extreme value of each gene by winsorization to reduce the effect of extreme expression. 10x data The cellranger-2.1.1 was used for mapping of the 10x single cell RNA-seq data. The read1 data of pooled cells were split into singlecell data using the barcode sequences contained in the first 16 bp. The next 10 bp were recorded as unique molecular identifiers (UMIs). Read2 with 75 bp were aligned to the mm10 genome. We used Seurat (v2.3.0) to pre-processing the data. For OSK data of MEF, D0, D1, D3, D5, D6, D7, D8, ESC, we excluded the cells with detected genes less than 2500 and larger than 6000, sum of the non-normalized UMI counts larger than 50000, percentage of mitochondrion UMI values larger than 0.15. For CIP data of D0, D3, D6, D9, D12, D16, D18, D22, D26, D32, D36, D40, we excluded the cells with detected genes less than 1000 and larger than 6000, sum of the non-normalized UMI counts larger than 40000, percentage of mitochondrion UMI values larger than 0.2. The total UMIs of each cell were normalized to 10000 and logarithmic transformed. Winsorization was performed on each gene where the top 100 values across cells were replaced by the 101th value to reduce the effect of possibly spurious outliers. Overall, 59141 (OSK) and 93585 (CIP) cells passed the quality control criteria. On median, there were 13803/11710 UMI counts and 3417/3121 detected genes for each cell of OSK/CIP data. Single-cell Orientation Tracing (SOT) SOT contains four steps (Figure S2A): 1. Gene filter; 2. Grouping co-expressed genes and calculating the core expression pattern; 3. Recovering the low-dimensional representation of the heterogeneity via manifold learning; 4. Identification of bifurcating reprogramming trajectories. 1. Gene filter. The sequencing depth were first normalized by SCnorm (Bacher et al., 2017) for C1 or Seurat for 10x data. The high variable genes (HVGs) were then identified using R package scran or Seurat. The annotated cell cycle genes from Macosko et al. (Macosko et al., 2015) were excluded from the gene list. We next test if gene have similar distribution across different time points using Anderson-Darling test. The genes with adjusted p value < 1e-3 were retained for further analysis. 2. Grouping co-expressed genes. To extract the continuous heterogeneities that related to the cell fate transition, HVGs were organized into co-expression groups using affinity propagation (AP) clustering (Frey and Dueck, 2007) implemented by R package apcluster. The pairwise correlations of genes were used as input for clustering while the cluster number was determined automatically by recursively transmitting messages between data points. After the gene clusters were identified, the pattern of each cluster was summarized by calculating the first eigenvector of PCA, called ‘‘eigengene’’ (Zhang and Horvath, 2005), which provides a denoised view
e5 Molecular Cell 73, 815–829.e1–e7, February 21, 2019
of the data. The direction of eigengenes was then corrected according to the correlation with the mean cluster expression. The gene clusters were further clustering into several gene groups to generate non-redundant view of the cellular function. Similarly, the eigengenes of each gene group can be calculated on gene expression. The gene groups can be examined for biological meanings and for their character of expression pattern. We performed GO and Reactome pathway enrichment for each group genes, the group that enriched for cell-cycle effect were excluded. In addition, we normalized the eigengenes by removing mean value followed by adjusting to 1 deviation. This procedure balances the weights of each eigengene so as to equivalently reflect heterogeneities in rare as well as abundant populations. 3. Low-dimensional representation of the heterogeneity. Diffusion-map approach was used to reconstruct the reprogramming trajectories. Diffusion-map is a dimension reduction algorithm that preserves the non-linear structure of data as a continuum in high dimension space. The normalized eigengenes of gene groups with biological meanings were used as input for diffusion map. We calculated a diffusion-map as implemented in the R package destiny using the default parameters. 4. Identification of bifurcating reprogramming trajectories. Fruchterman-Reingold algorithm (Kobourov, 2012) was applied to visualize the continuous reprogramming process. R package FNN is used to build k-nearest neighbor graph (k-NN graph) on top diffusion components, each cell is a node that extends edges to the k other nodes with the shortest Euclidean distance. The FR layout used to order single cells and detect branches simultaneously along bifurcating reprogramming trajectories by Wishbone algorithm (Setty et al., 2016). TF-Motifs and pathway enrichment TF-Motifs enrichment. We inferred the transcription factors (TFs) and the corresponding binding motifs of different cell cluster using iRegulon (Janky et al., 2014) plugins in Cytoscape software. For the signature genes we identified, the over-represented motifs and associate TFs were enriched in the 10kb search space around Transcription Start Site (TSS). TFs with expression in more than 50% of origin cells in a cluster were considered as potential regulatory factors. The similar motifs were clustered (Shannon et al., 2003) to provide concise results, and the normalized enrichment score (NES) of each motif cluster was defined by the maximum NES among cluster. The motif clusters and the representative sequence logo are selected to display in Figures 3B and 5B. Pathway enrichment. To explore the biological events of different cell clusters, we performed gene set over-representation test on cluster signature genes using R package clusterProfile with Reactome (https://reactome.org/) database. The pathways with hypergeometric test p value < 0.01 and q-value < 0.2 were considered significance. Unsupervised clustering of D7 and D8 cells Unsupervised clustering. To reveal the intrinsic structure and potential subtypes of D7 and D8 cells, we perform unsupervised clustering using SC3 algorithm (Kiselev et al., 2017). For D7 and D8 expression data with 123 cells, we used 2940 high variable gene as input and six clusters were identified. For feature selection, we seek for genes that are relevant for the clustering results by fitting a regularized generalized linear model via penalized maximum likelihood using R package glmnet (Tibshirani, 2011). Because the clustering results are categorical variables, we fitted the data using multinomial model. When fitting 3086 genes and 123 cells of 6 clusters with grouped-lasso penalty, we set a = 0.45, and the tuning parameter l is determined by the value that gives minimum mean of 10fold cross-validation error. The elastic-net solver results in 398 genes with nonzero coefficients. In addition, we performed ‘‘recall’’ procedure. The genes were clustering into 6 groups underlying pairwise Pearson correlation coefficient using apclusterK as a function implement in R package apcluster. The eigenvalues were calculated as the ‘‘hub gene’’ of each group, and the direction of each eigenvalue were corrected according to mean expression of associate group. The correlations between each ‘‘hub gene’’ and the filtered genes were computed, those genes with r>0.4 were recalled. This resulted in 729 genes, which were clustered again into 6 clusters using AP clustering. Reduce dimension for 10x reprogramming data We used SOT to discover the reprogramming transition and branches. For the OSK data, we first performed clustering analysis in mixed D3-D8 cells using Seurat. The 35025 cells were clustering into 7 groups using PC3-PC10 while PC1 and PC2 were highly correlated with sampling time and cell cycle respectively. The top 100 marker genes of each group were summarized as eigengenes for diffusion map. For CIP data, we performed two level AP clustering on HVGs, which resulted in 9 gene groups. There is one group highly related to the heterogeneity caused by changing medium. This group was excluded to ensure the continuous structure of data. The eigengenes of resulting gene group were then used for performing diffusion map. Partition based graph abstraction We first performed Louvain clustering for 10x-OSK data using top 5 diffusion components (DCs) preprocessed by SOT. The relationship of resulting 15 clusters were computed using partition based graph abstraction (PAGA). The signature of each cluster was represented by averaging the gene expression of all cell in each cluster. QUANTIFICATION AND STATISTICAL ANALYSIS Statistical Analysis Statistical and graphical data analyses were performed using Microsoft Excel and Prism 6. Statistical parameters including statistical analysis, statistical significance, and n value are reported in the figure legends and supplemental figure legends.
Molecular Cell 73, 815–829.e1–e7, February 21, 2019 e6
DATA AND SOFTWARE AVAILABILITY Data Resources The accession number for the sequencing data reported in this paper is NCBI GEO: GSE103221. Code Availability SOT is available as an R package at https://github.com/JiekaiLab/SOT.
e7 Molecular Cell 73, 815–829.e1–e7, February 21, 2019