Clonal origin in normal adults of all blood lineages and circulating hematopoietic stem cells

Category: Normal hematopoiesis

Journal Pre-proof

Clonal Origin in Normal Adults of All Blood Lineages and Circulating Hematopoietic Stem Cells

Kai Wang, Zi Yan, Shouping Zhang, Boris Bartholdy, Connie J. Eaves, Eric E. Bouhassira

PII: S0301-472X(20)30025-4
DOI: https://doi.org/10.1016/j.exphem.2020.01.005
Reference: EXPHEM 3789

To appear in: Experimental Hematology

Received date: 20 December 2019
Revised date: 14 January 2020
Accepted date: 16 January 2020

Please cite this article as: Kai Wang , Zi Yan , Shouping Zhang , Boris Bartholdy , Connie J. Eaves , Eric E. Bouhassira , Clonal Origin in Normal Adults of All Blood Lineages and Circulating Hematopoietic Stem Cells, Experimental Hematology (2020), doi: https://doi.org/10.1016/j.exphem.2020.01.005

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier Inc. on behalf of ISEH – Society for Hematology and Stem Cells.

Clonal Origin in Normal Adults of All Blood Lineages and Circulating Hematopoietic Stem Cells

Kai Wang1, Zi Yan1, Shouping Zhang1, Boris Bartholdy1, Connie J. Eaves2, Eric E. Bouhassira1*

1 Department of Cell Biology, Albert Einstein College of Medicine, Bronx, New York 10706
2 Terry Fox Laboratory, British Columbia Cancer and University of British Columbia, Vancouver, BC V5Z 1L3

*Corresponding author and lead contact:
Email: [email protected]
Mailing address: Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, New York, 10461

Category: Normal hematopoiesis

Summary: Characterization of the human cells that sustain lifelong blood cell production has historically been inferred from phenotypically defined subsets of cells assayed in vitro, in transplanted immunodeficient mice, or in patients transplanted with genetically marked cells. These approaches have led to the concept of a persistent complex hierarchical process of differentiation divisions originating from a rare population of CD34+CD38-CD45RA-CD90+CD49f+ cells with an average self-renewal potential of >0.5, and an ability to produce some or all blood cell types for >1 year. However, the role of these “49f” cells in the unperturbed adult has remained poorly understood. To address this gap, somatic single nucleotide polymorphisms (SNPs) have recently been exploited as lineage tracing markers to enumerate and characterize active hematopoietic clones in normal adults using a capture and recapture approach. We show here that the use of somatic transversions to identify somatically acquired variant alleles enabled their detection in bulk populations at frequencies as low as approximately 1 in 80,000 cells. We then applied this method to blood cells isolated from 2 normal adults (aged 31 and 53 years) over a 1 to 3 year period. The results revealed in both donors a continued clonal output of both T- and B-lymphoid cells as well as myeloid cells identified by the same unique transversions found to distinguish single 49f cells isolated from the same donors’ initial blood samples. These findings provide the first evidence of a continuing hematopoietic stem cell-derived source of all mature blood cell types in normal (unperturbed) adult humans.

Highlights:

Lineage tracing of somatically acquired DNA transversions enables a sensitive method for tracking hematopoietic clones

This approach has revealed multi-lineage clones persisting in 2 normal individuals up to 53 years old

Introduction

Understanding the origin and mechanisms that lead to hematologic disease is informed by a knowledge of the cell types responsible for blood cell production and their regulation in the unperturbed adult. Morphology has been widely used to interrogate the homeostatic regulation of cells undergoing the terminal stages of blood cell differentiation, but preceding events controlling their generation have had to rely on alternative approaches. The most developed of these have led to models in which phenotypically distinct subpopulations are hierarchically organized based on the different types and durations of mature blood cells they produce in various in vitro or in vivo (transplantation) assays optimized for this purpose. Such systems have been powerful sources of information about the changing molecular and immediate viability, mitogenic and growth factor-induced signaling properties of the different phenotypes thus characterized.

However, there is also recent evidence of a greater underlying complexity in matching this model to how individual hematopoietic cells behave. This relates to the finding that both mouse and human cells with the greatest regenerative activity measured in clonal transplantation assays, and hence assigned an apical position in the hierarchy, still show marked and independent variability in the self-renewal and lineage outputs they display in such assays as well as in their diverse molecular features, differentiation trajectories, and evolution of their clonal outputs and persistent activity during development and aging (Haas et al., 2018; Knapp et al., 2018, 2019; Laurenti and Göttgens, 2018). Interestingly, in adult mice, lineage tracing techniques have indicated that normal blood cell production is supported extensively by cells that display limited self-sustaining ability in transplantation assays, albeit with a continuous contribution from the phenotypes classified as long-term hematopoietic stem cells (LT-HSCs) based on their extensive regenerative activity in serial transplantation assays (Carrelha et al., 2018; Hadland and Yoshimoto, 2018; Rodriguez-Fraticelli et al., 2018).

Human LT-HSCs have been defined experimentally as cells capable of reconstituting blood cell production for at least 20 weeks in transplanted irradiated immunodeficient mice. In 2011, Notta et al. (Notta et al., 2011) reported, and we subsequently confirmed (Knapp et al., 2017), that these cells can be selectively isolated at 10% purity by flow sorting the 49f+ subset of the CD34+CD38-CD45RA-CD90+ fraction of low-density human cord blood (CB) collections (hereafter referred to as 49f cells). The same markers can be used to identify a cell population in normal adult human bone marrow (BM) or peripheral blood (PB), but their content of functionally defined LT-HSC activity is lower (about 1/400 BM cells (Wang et al., 2019)).

In the unperturbed adult, the number of clones contributing to the production of mature human blood cells at any one time was initially estimated at a few hundred from analyses of skewed distributions of X-linked variant properties measured in blood cells from female heterozygotes (Buescher et al., 1985; Catlin et al., 2011). More recently, the number of transplantable LT-HSCs remaining active in individual gene therapy patients was shown to be much larger (tens of thousands) (Biasco et al., 2016). However, this latter calculation necessarily ignores potential influences introduced as a result of the systemic perturbations caused by the transplantation procedure. To examine the clonal contributions to blood cell production in untreated individuals, Lee-Six et al. (Lee-Six et al., 2018) developed an approach that exploits spontaneously arising mutations as clonal tracing markers.
In this study, somatic variants were first identified in different CD34+ phenotypes by sequencing their clonal progeny generated in vitro, and the representation of these variants in mature blood cell isolates was then traced (recaptured) to infer clonal evolution dynamics. In accordance with previous data (Wang et al., 2019), the CD34+ cells thus characterized carried on average slightly more than a thousand somatic mutations, many of which were shared, indicating their acquisition early in development. Using data from the recapture phase, the authors calculated the number of active clones to have reached 50,000–200,000 by adolescence. Further analysis demonstrated the existence of adult stem cells producing granulocytes and B cells, but evidence of clones also producing T cells was not obtained, either because their frequency was below the limit of detection or they were no longer present. A subsequent study using a similar approach confirmed many of these findings, including the continued detection of bi-lineage myeloid/B cell clones, but without data for T cells (Osorio et al., 2018).

In a related study, we identified >2,000 somatic single nucleotide polymorphisms (SNPs) in the 49f cells of 8 normal (young) adults and confirmed a similar rate of their acquisition to that reported by Lee-Six et al. (Lee-Six et al., 2018) (i.e., ~11 SNPs/year (Wang et al., 2019)). We now report the use of transversions rather than single nucleotide transitions to increase the frequency of detecting cells with somatically acquired 49f cell-specific variant alleles by more than an order of magnitude, and the consequent identification in 2 normal adults 31 and 53 years old of a continued shared clonal output of T cells as well as B cells and myeloid cells (monocytes).
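The distinction between transitions and transversions that underlies this strategy can be illustrated with a short Python sketch. It is only an illustration of the standard definition (purine-to-purine or pyrimidine-to-pyrimidine changes are transitions, everything else is a transversion); the function names and the tuple layout used for a SNP record are assumptions made for this example and are not taken from the study's analysis code.

```python
# Minimal sketch: classify single-nucleotide substitutions and keep only
# transversions. Function names and the (chrom, pos, ref, alt) record layout
# are hypothetical choices for this illustration.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref -> alt substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    # A transition stays within one chemical family (A<->G or C<->T).
    same_family = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_family else "transversion"


def keep_transversions(snps):
    """Filter an iterable of (chrom, pos, ref, alt) records to transversions."""
    return [s for s in snps if substitution_class(s[2], s[3]) == "transversion"]


if __name__ == "__main__":
    # Placeholder coordinates, for demonstration only.
    example = [("chr1", 100, "A", "G"), ("chr1", 200, "A", "C"), ("chr2", 300, "C", "T")]
    print(keep_transversions(example))
```

Of the 12 possible single-base substitutions, 4 are transitions and 8 are transversions, so a panel restricted to transversions discards only the substitution classes that share the A to G and C to T error modes discussed in the Results.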

Results

Standard SNP tracing strategy fails to detect recently acquired variant allele frequencies (VAFs).

In a first experiment (Figure 1A), we used a conventional recapture panel for 232 of the somatic SNPs we had previously identified in 4 normal 18 to 31 year-old adults (Wang et al., 2019). This SNP panel encompassed from 5 to 15 SNPs identified in 4 to 9 49f cells from each of the 4 donors (232 SNPs in all). To recapture these SNPs in the low-density mononuclear cells (MNCs) isolated from the PB and the BM of these 4 individuals, 232 multiplex amplicons (~150 bp each) were designed and the amplified DNA products sequenced on a single lane of an Illumina HiSeq machine at an average read depth of 280,000x. Somatic SNPs present in the PB and/or the BM MNCs were then identified by applying a chi-square test using, as a control, the average error rate observed for each of the 6 possible types of nucleotide substitutions for the entire panel. Seven of the 232 SNPs tested (3.0%) were significant (q-value <0.01) with VAFs ranging between 1 in 10,000 and 1 in 30 cells, thus demonstrating recapture of the original 49f cell SNPs in the MNCs of the same PB sample (Figures 1B and 1C).

Comparison of results from paired BM and PB MNC samples showed that the VAFs for 4 of the 7 original 49f cells that could be traced in these were similar, suggesting that the cells carrying these SNPs were present in similar numbers in both sample types. In contrast, for 2 of the other 49f cells the VAFs were much higher in the BM than in the PB MNCs and, for the 7th 49f cell, the VAFs were higher in the PB MNCs. These differences in VAF suggest that the ancestors of the 7 different 49f cells tracked had contributed variably to their concurrently detectable mature progeny. However, the expected low yield of recaptured SNPs (7 of 232), and the failure to recapture more than one SNP from any given 49f cell originally sequenced, indicated the need for improved sensitivity to analyze the lineage distribution and tracking of clonally related elements.

Increased detection of clonally-related blood cells from 49f cell-specific somatic transversions

We then sought to investigate the possibility that the sensitivity of VAF detection might be improved by restricting the recapture step to an analysis of transversions. This was based on a previous report of the high error rate in detecting transitions incurred by the use of Taq polymerase (McInerney et al., 2014). Consistent with this prediction, we found the average error rate for detecting transversions in the recapture data to be up to 20 times lower than the average error rate for detecting transitions (1.8 ± 0.2 × 10⁻³ for A to G and 1.7 ± 1 × 10⁻³ for C to T; see Figure 2A). In addition, this analysis revealed that a subset of genomic positions exhibited very high error rates (Figure S1), suggesting that the optimal method to estimate VAFs was to calculate error rates at each genomic position. Based on these findings, we designed a new recapture panel to detect 61 transversions that were originally identified in 16 of the 49f cells isolated from 5 normal adults ranging in age from 30 to 53 years (Wang et al., 2019). Transversion amplicons expanded from DNA extracts of the PB MNCs isolated from these 5 individuals were then sequenced to an average read depth of 550,000x.
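The testing logic described above (compare the observed variant read count at a position with the count expected from a background error rate, then correct for multiple testing to obtain q-values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a chi-square goodness-of-fit test with 1 degree of freedom, treats only an excess of variant reads as evidence (an assumption of this sketch), and applies the Benjamini-Hochberg procedure. All function names and the example counts are hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Survival function of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def variant_p_value(alt_reads: int, depth: int, error_rate: float) -> float:
    """Chi-square p-value for an excess of variant reads over the error rate.

    error_rate is the background (per-position or per-substitution-type)
    error frequency, assumed to lie strictly between 0 and 1.
    """
    expected_alt = depth * error_rate
    expected_ref = depth - expected_alt
    if alt_reads <= expected_alt:
        return 1.0  # no excess over background; assumption: one-sided use
    observed_ref = depth - alt_reads
    chi2 = ((alt_reads - expected_alt) ** 2 / expected_alt
            + (observed_ref - expected_ref) ** 2 / expected_ref)
    return chi2_sf_1df(chi2)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical (alt_reads, depth, error_rate) triples for three positions.
    positions = [(200, 500_000, 1e-4), (55, 500_000, 1e-4), (40, 500_000, 1e-4)]
    p = [variant_p_value(*pos) for pos in positions]
    print(benjamini_hochberg(p))
```

Using an error rate estimated separately at each genomic position, as the text suggests, simply means passing a different `error_rate` for each position instead of one panel-wide average.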
The results of analyzing this sequence data revealed 3 transversions that uniquely identified a single clone in each of 2 donors at the time of the first blood collection (i.e., CP1 in donor NY22 aged 53 years and CP2 in donor SB10 aged 31 years). This finding is consistent with the expectation that focusing on transversions would increase the sensitivity of recaptured variant detection and also favor detection of recent clonal outputs of mature cells sharing the same unique genomic features evident in a single 49f cell in each of the 2 corresponding PB samples.

Detection of monocyte, B- and T-cell outputs in clones identified by somatically acquired single 49f cell-specific transversions

To characterize the lineage content and dynamics of these 49f cell-associated clones, we isolated pure populations of monocytes, B and T cells from the same PB samples of individuals NY22 and SB10 by fluorescence-activated cell sorting (FACS) of suitably antibody-stained cells and then sequenced their extracted DNA. In addition, we analyzed multiple technical repeats of the unfractionated MNCs to assess in parallel the reproducibility of the recapture data. To evaluate the durability of these clones and their outputs, additional PB samples were obtained from NY22 and SB10 14 and 36 months after the initial blood draw and the cells from these were similarly analyzed. A total of 20 amplicon libraries were then prepared and sequenced to an average depth of about 300,000x. The SNPs identified in these libraries were again all features of the 49f cell-related NY22-CP1 and SB10-CP2 clones (Figure 3A). Examination of technical repeats showed the same transversions to be consistently called significant and the VAFs to be reproducible at a level ranging between 1 in 5,000 and 1 in 80,000 cells (Figure S2).

Analysis of the monocytes isolated from the original PB sample from donor NY22 revealed that the same 3 NY22-CP1 transversions that we had previously detected in the MNC fractions were present at an average VAF of 7.1 ± 1.4 × 10⁻⁴, corresponding to one cell in 2,800 ± 562 cells. Fourteen months later, this value had decreased ~2-fold (to 1 in 5,500 ± 2,600 cells), but after 36 months had recovered to the level initially measured (to 1 in 2,400 ± 700 cells). The fourth NY22 CP2-specific transversion, although not initially detected in the monocytes, became evident in both the later PB samples and also in the bulk MNCs isolated from those. This continued presence of all 4 transversions in monocytes monitored over a 3-year period provides strong evidence of their continued output from a 49f cell related closely to the one originally annotated.
In absolute terms, their measured frequencies imply a clonal output sufficient to sustain a representation in the total PB monocyte pool ranging between 0.04% and 0.08%. Similar analyses revealed that 3 of the same 4 transversions were present in the B cells in both the initial PB sample and that obtained 14 months later, but at a 10-fold lower frequency than found for the monocytes (i.e., 1 in 25,000 ± 18,100 cells). Analysis of the T cells in the same initial PB sample detected only one of the 4 transversions, at a frequency of 1 in 85,000 cells, and different ones in the T cells isolated from the PB samples obtained 14 and 36 months later (also at low frequencies). Nevertheless, these findings indicate that both T and B cells, as well as monocytes, related clonally to a 49f cell from this person were being produced, albeit at limiting levels of detection.

Analysis of the initial PB sample from donor SB10 revealed that 2 of the 6 identifying transversions of the CP2 clone were present in the monocytes at frequencies of 1 in 1,900 to 1 in 10,600 cells. One of these transversions was also found in the T cells isolated from this same sample at a frequency of 1 in 54,000 cells. Three years later, the same finding was noted for the monocytes and the T cells, and one of the same transversions was then also identified for the first time in the circulating B cells. In addition, at this later time point, another of the 4 transversions was detected in the monocytes. Together, these results establish, in a second normal adult, the clonal origin of cells belonging to all 3 major mature blood cell lineages monitored as well as the 49f cell compartment.
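The bookkeeping behind a statement such as "all 3 lineages share a clone" can be sketched in a few lines of Python. The example below tabulates, for each clone and time point, the set of sorted populations in which at least one clone-specific transversion was called significant. The records loosely mirror the SB10 description above, but the transversion labels (`tv1` to `tv3`), the choice of which transversion appears in which lineage, and all function names are placeholders invented for this illustration.

```python
from collections import defaultdict

# Hypothetical detection records:
# (clone, months after first blood draw, lineage, transversion label).
# Labels tv1..tv3 are placeholders, not identifiers from the study.
DETECTIONS = [
    ("SB10-CP2", 0, "monocyte", "tv1"),
    ("SB10-CP2", 0, "monocyte", "tv2"),
    ("SB10-CP2", 0, "T", "tv1"),
    ("SB10-CP2", 36, "monocyte", "tv1"),
    ("SB10-CP2", 36, "monocyte", "tv2"),
    ("SB10-CP2", 36, "monocyte", "tv3"),
    ("SB10-CP2", 36, "T", "tv1"),
    ("SB10-CP2", 36, "B", "tv1"),
]


def lineages_by_timepoint(detections):
    """Map clone -> month -> set of lineages with at least one detection."""
    table = defaultdict(lambda: defaultdict(set))
    for clone, month, lineage, _transversion in detections:
        table[clone][month].add(lineage)
    return {clone: dict(months) for clone, months in table.items()}


def is_trilineage(lineages, required=("monocyte", "B", "T")):
    """True if every required lineage is present in the detected set."""
    return set(required) <= set(lineages)


summary = lineages_by_timepoint(DETECTIONS)

if __name__ == "__main__":
    for month, lineages in sorted(summary["SB10-CP2"].items()):
        print(month, sorted(lineages), is_trilineage(lineages))
```

A set-based summary of this kind records only presence or absence; it deliberately ignores the VAF of each call, which would need a separate table.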

Discussion

Measurements of changing distributions of somatically acquired mutations to track the clonal dynamics and differentiation activity of individual cell types offer an important approach to analyze processes that regulate normal tissue homeostasis in humans. This topic has recently become of great interest in the hematopoiesis field for 2 reasons. The first is prompted by a growing body of data indicating a discrepancy in the mature cell outputs of phenotypically distinct subsets of primitive mouse hematopoietic cells when these are examined in situ versus after their isolation and stimulated growth in vitro or in vivo (Carrelha et al., 2018; Hadland and Yoshimoto, 2018; Rodriguez-Fraticelli et al., 2018), now corroborated by recent evidence that this is also the case in normal humans (Lee-Six et al., 2018; Osorio et al., 2018). A second source of interest in this topic derives from the finding of a dramatic increase with age of skewed clonal contributions to mature blood cell outputs in normal adult humans (see other contributions to this issue of Experimental Hematology). Adding to this is the unexpected discovery of the frequent presence in these enlarged clones of mutations historically considered to be “drivers” of AML, but found to predispose only modestly to the subsequent development of malignant transformation (Bowman et al., 2018; Gibson and Steensma, 2018; Wiedmeier et al., 2016).

A limitation of the use of somatically acquired SNPs for lineage tracing in humans has been the low recapture rate attributed to the high error that Taq polymerase introduces in the recapture DNA amplification step (McInerney et al., 2014). In addition, transitions have been found to be the predominant type of mutation acquired during early development (Osorio et al., 2018), reflecting a higher rate at that time of spontaneous deamination of methylated cytosines and a subsequent disproportionate acquisition of C to T transitions (Alexandrov et al., 2013, 2015). Here, we confirmed the predicted finding of a low detection frequency of initially identified SNPs in the recaptured MNC DNA (3.0%), which precluded examining their frequency in different lineages of mature cells isolated from the same donors. Because our SNP detection panel was based on sequences originally obtained from derivative sibling populations and had a validation rate close to 100% (Wang et al., 2019), the failure to recapture most SNPs was unlikely to be due to errors in their initial identification but rather due to errors introduced in the recapture step.

To overcome these limitations, we focused on tracking only somatically acquired transversions. This enabled 13% of the SNPs then tested (8 of 61) to be detected in recaptured DNA and allowed clonally related cells to be detected at frequencies as low as 1 in 80,000 cells. Application of this approach to FACS-purified monocytes, B and T cells from the PB of 2 healthy donors (aged 31 and 53; i.e., young and middle-aged, respectively) revealed a shared clonal origin of all of these mature blood cell types with a uniquely identified 49f cell from each donor. In addition, evidence of a continued tri-lineage output history of the clones for at least 3 years was demonstrated. Taken together, these results strongly support the sustained output of T, B and myeloid blood cells from single 49f cells (HSCs) in normal adult humans, at least up to 53 years of age.

The majority of transversions tracked here were unlikely to have had any functional significance because they were located relatively far from genes and none modified the coding sequences of any known protein.
However, this does not preclude the possibility that the clones identified may have possessed other unique mutations with biological effects mediated directly on RNA expression or accessibility of epigenomic modifiers or DNA binding regulators. On the other hand, the sensitivity of the neutral somatic transversion tracking methodology described here could offer a useful new approach to assess the development, output dynamics, and evolution of the hematopoietic system at a clonal level in aging individuals in an unbiased and non-invasive manner both during normal aging and in association with disease pressures.
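The two recapture rates quoted in this Discussion (7 of 232 SNPs for the mixed panel, 8 of 61 for the transversion-only panel) can be checked with a few lines of Python. The counts are those reported in the text; the helper name `recapture_rate` is a hypothetical choice for this sketch.

```python
def recapture_rate(detected: int, tested: int) -> float:
    """Fraction of panel variants that were detected in the recapture step."""
    if tested <= 0 or not 0 <= detected <= tested:
        raise ValueError("need 0 <= detected <= tested and tested > 0")
    return detected / tested


# Counts reported in the text for the two recapture panels.
mixed_panel = recapture_rate(7, 232)        # all substitution types
transversion_panel = recapture_rate(8, 61)  # transversions only
fold_change = transversion_panel / mixed_panel

if __name__ == "__main__":
    print(f"mixed panel:        {mixed_panel:.1%}")
    print(f"transversion panel: {transversion_panel:.1%}")
    print(f"fold change:        {fold_change:.1f}x")
```

On these counts the per-variant recapture rate rises from 3.0% to 13.1%, roughly a 4.3-fold change. This is a separate quantity from the improvement in the lowest detectable clone frequency, which depends on the background error rate rather than on the fraction of panel variants recovered.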

Acknowledgments

KW, ZY, SZ, BB and EEB were supported by grants NYSTEM C030135, C029154, NIH HL130764, and Doris Duke Foundation 2017087. CE was supported by a Terry Fox Foundation New Frontiers Program Project grant (#1074) and grants from Genome British Columbia and the Canadian Institutes of Health Research (CIHR) as part of the Canadian Epigenetics, Environment and Health Research Consortium Network (CIHR-262119). We thank Daqian Sun from the Stem Cell flow Cytometry and Xenotransplantation facility for expert help with flow sorting. We thank Robert Durbin from the Einstein Epigenomic Core for processing the raw sequencing data. We thank Dr. Kenny Ye (Albert Einstein College of Medicine) for helpful discussions.

Author contributions

KW, ZY and SZ performed the experiments and contributed to the experimental design. BB contributed to data analysis. CE contributed to the experimental design, interpretation of the data and manuscript writing. EEB conceptualized and supervised the execution of the project and contributed to the experimental designs, data analysis and manuscript writing.

Declaration of Interests

The authors declare no competing interest.

References:

Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Aparicio, S.A.J.R., Behjati, S., Biankin, A. V., Bignell, G.R., Bolli, N., Borg, A., Børresen-Dale, A.-L., et al. (2013). Signatures of mutational processes in human cancer. Nature 500, 415–421.
Alexandrov, L.B., Jones, P.H., Wedge, D.C., Sale, J.E., Campbell, P.J., Nik-Zainal, S., and Stratton, M.R. (2015). Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B 57, 289–300.
Biasco, L., Pellin, D., Scala, S., Dionisio, F., Basso-Ricci, L., Leonardelli, L., Scaramuzza, S., Baricordi, C., Ferrua, F., Cicalese, M.P., et al. (2016). In Vivo Tracking of Human Hematopoiesis Reveals Patterns of Clonal Dynamics during Early and Steady-State Reconstitution Phases. Cell Stem Cell 19, 107–119.
Bowman, R.L., Busque, L., and Levine, R.L. (2018). Clonal Hematopoiesis and Evolution to Hematopoietic Malignancies. Cell Stem Cell 22, 157–170.
Buescher, E.S., Alling, D.W., and Gallin, J.I. (1985). Use of an X-linked human neutrophil marker to estimate timing of lyonization and size of the dividing stem cell pool. J. Clin. Invest. 76, 1581–1584.
Carrelha, J., Meng, Y., Kettyle, L.M., Luis, T.C., Norfo, R., Alcolea, V., Boukarabila, H., Grasso, F., Gambardella, A., Grover, A., et al. (2018). Hierarchically related lineage-restricted fates of multipotent haematopoietic stem cells. Nature 554, 106–111.
Catlin, S.N., Busque, L., Gale, R.E., Guttorp, P., and Abkowitz, J.L. (2011). The replication rate of human hematopoietic stem cells in vivo. Blood 117, 4460–4466.
Gibson, C.J., and Steensma, D.P. (2018). New Insights from Studies of Clonal Hematopoiesis. Clin. Cancer Res. 24, 4633–4642.
Haas, S., Trumpp, A., and Milsom, M.D. (2018). Causes and Consequences of Hematopoietic Stem Cell Heterogeneity. Cell Stem Cell 22, 627–638.
Hadland, B., and Yoshimoto, M. (2018). Many layers of embryonic hematopoiesis: new insights into B-cell ontogeny and the origin of hematopoietic stem cells. Exp. Hematol. 60, 1–9.
Knapp, D.J.H.F., Hammond, C.A., Miller, P.H., Rabu, G.M., Beer, P.A., Ricicova, M., Lecault, V., Da Costa, D., VanInsberghe, M., Cheung, A.M., et al. (2017). Dissociation of Survival, Proliferation, and State Control in Human Hematopoietic Stem Cells. Stem Cell Reports 8, 152–162.
Knapp, D.J.H.F., Hammond, C.A., Hui, T., van Loenhout, M.T.J., Wang, F., Aghaeepour, N., Miller, P.H., Moksa, M., Rabu, G.M., Beer, P.A., et al. (2018). Single-cell analysis identifies a CD33+ subset of human cord blood cells with high regenerative potential. Nat. Cell Biol. 20, 710–720.
Knapp, D.J.H.F., Hammond, C.A., Wang, F., Aghaeepour, N., Miller, P.H., Beer, P.A., Pellacani, D., VanInsberghe, M., Hansen, C., Bendall, S.C., et al. (2019). A topological view of human CD34+ cell state trajectories from integrated single-cell output and proteomic data. Blood 133, 927–939.
Laurenti, E., and Göttgens, B. (2018). From haematopoietic stem cells to complex differentiation landscapes. Nature 553, 418–426.
Lee-Six, H., Øbro, N.F., Shepherd, M.S., Grossmann, S., Dawson, K., Belmonte, M., Osborne, R.J., Huntly, B.J.P., Martincorena, I., Anderson, E., et al. (2018). Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595.
McInerney, P., Adams, P., and Hadi, M.Z. (2014). Error Rate Comparison during Polymerase Chain Reaction by DNA Polymerase. Mol. Biol. Int. 2014, 1–8.
Notta, F., Doulatov, S., Laurenti, E., Poeppl, A., Jurisica, I., and Dick, J.E. (2011). Isolation of single human hematopoietic stem cells capable of long-term multilineage engraftment. Science 333, 218–221.
Osorio, F.G., Rosendahl Huber, A., Oka, R., Verheul, M., Patel, S.H., Hasaart, K., de la Fonteijne, L., Varela, I., Camargo, F.D., and van Boxtel, R. (2018). Somatic Mutations Reveal Lineage Relationships and Age-Related Mutagenesis in Human Hematopoiesis. Cell Rep. 25, 2308-2316.e4.
Rodriguez-Fraticelli, A.E., Wolock, S.L., Weinreb, C.S., Panero, R., Patel, S.H., Jankovic, M., Sun, J., Calogero, R.A., Klein, A.M., and Camargo, F.D. (2018). Clonal analysis of lineage fate in native haematopoiesis. Nature 553, 212–216.
Wang, K., Guzman, A.K., Yan, Z., Zhang, S., Hu, M.Y., Hamaneh, M.B., Yu, Y.K., Tolu, S., Zhang, J., Kanavy, H.E., et al. (2019). Ultra-High-Frequency Reprogramming of Individual Long-Term Hematopoietic Stem Cells Yields Low Somatic Variant Induced Pluripotent Stem Cells. Cell Rep. 26, 2580-2592.e7.
Wiedmeier, J.E., Kato, C., Zhang, Z., Lee, H., Dunlap, J., Nutt, E., Rattray, R., McKay, S., Eide, C., Press, R., et al. (2016). Clonal hematopoiesis as determined by the HUMARA assay is a marker for acquired mutations in epigenetic regulators in older women. Exp. Hematol. 44, 857-865.e5.

Main figure titles and legends

Figure 1: Recapture of somatic SNPs in mononuclear cells.
A: Capture Recapture method. Blood cells were collected from healthy donors, 49f cells isolated, and single 49f cells expanded in vitro and sequenced to identify (capture) somatic SNPs as described (Wang et al., 2019). Monocytes, B and T cells were isolated by FACS from thawed aliquots of MNCs obtained from the original and additional PB samples and stored viably in liquid N2. Amplicons were designed and sequenced to measure the VAF of a selected subset of somatic SNPs (recapture) in the previously stored bulk populations.
B: Histogram illustrating the results of a recapture experiment in which 232 SNPs from 4 individuals were recaptured. Each bar represents a 49f cell that was isolated either from BM (CB clones) or from PB (CP clones). The total height of the bar represents the number of SNPs characteristic of each clone in the panel. The red fraction of the bar represents the number of detected SNPs (in the BM or PB), the blue fraction the number of undetected SNPs. Seven of the 232 were detected in the BM, 6 in the PB. At most one SNP per 49f cell was detected.
C: Table summarizing the observed VAFs and their q-values in the BM and the PB. Since only a very small fraction of the SNPs in individual 49f cells were detected, the VAFs likely represent the frequency of the progeny of ancestors of the target 49f cells rather than their direct progeny. The mean error rates are the average error rates summarized in Figure 2A. VAFs and q-value calculations are described in the methods. Significant q-values are colored in pink.

Figure 2: Detection of 49f cells using transversion panels.
A: Error rates during the recapture phase calculated on about 40 kb of amplified genomic DNA (sum of amplicon size in panel 1 and 2) sequenced to a total depth > 2 million. The transition error rates are up to 20-fold higher than the transversion error rates.
B: Histogram summarizing the results of the recapture experiments in which 61 transversions originally detected in 16 49f cells were traced in PB MNCs of 5 individuals. Three transversions from 49f cell NY22-CP1 and 3 from SB10-CP2 were detected. The graph is organized as Figure 1B.
C: Table summarizing the results. Significant q-values are colored in pink.

Figure 3: 49f cells can differentiate into monocytes, B and T cells.
A: Heatmap illustrating the q-values observed after the recapture of 61 transversions in PB MNCs, and sorted monocytes, B cells and T cells from the NY22 and SB10 samples. NY22 (0) and SB10 (0): cells collected at time 0. NY22 (14), NY22 (36) and SB10 (36): cells collected 14 or 36 months after the initial blood draw. Multiple transversions characteristic of clones NY22-CP1 and SB10-CP2 were detected, indicating that 49f cells can differentiate into monocytes, B cells and T cells.
B: Calculated VAFs for the detected transversions.

Main Tables and Legends:

Table 1: Characteristics of the healthy volunteers who contributed samples to the study.

Sample ID   Age   Sex   Ethnicity   Sample type   # of 49f cells analyzed
SB10        31    F     Caucasian   PB            4
NY22        53    M     Caucasian   PB            3
SB02        31    F     Asian       PB            3
SB06        32    M     Caucasian   PB            3
SB08        32    F     Hispanic    PB            3
L1          24    M     Black       PB/BM         6
L2          18    M     Black       PB/BM         6
L3          23    M     Caucasian   PB/BM         9

STAR Methods

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Eric Bouhassira ([email protected]).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Human samples: Heparinized BM aspirates and PB cells from 4 healthy individuals were purchased from Lonza (Basel, Switzerland) and shipped to Albert Einstein College of Medicine (AECOM) overnight at 4°C. Additional PB samples were obtained from 5 normal adults at AECOM, in both cases according to IRB-approved protocols. MNCs were isolated by centrifugation on Histopaque as recommended by the manufacturer (Sigma-Aldrich, St Louis, MO) and residual red blood cells lysed by incubation in an ice-cold aqueous solution of 790 mg/L of ammonium bicarbonate and 7.7 g/L ammonium chloride, and then viably frozen within 36 hours of the sample collection. The age and gender of all donors are reported in Table 1.

METHOD DETAILS
Isolation of 49f cells: PB MNCs were stained with Rhodamine123 (100 ng/mL) and with a panel of fluorophore-conjugated monoclonal antibodies (see Key Resources Table) and sorted on a BD FACSAria III. 49f cells are defined as Lin-CD34+CD38-CD90+CD45RA-CD49f+Rhodamine123low (Notta et al., 2011).

Isolation of monocytes, B cells and T cells: PB MNCs were stained with anti-human CD3-BV480, anti-human CD19-FITC, anti-human CD11b-BV510 and a near-infrared live/dead dye, and sorted on a BD FACSAria III. T cells and B cells were isolated as CD3+ and CD19+ cells, respectively. Monocytes were identified first based on high forward and side light scattering properties and then as CD3-CD19-CD11b+ cells.

Custom multiplex-PCR amplicon libraries and sequencing: 232 custom TargetGxOne amplicon primers (GeneWiz, South Plainfield, NJ) 100 to 300 bp in size were designed and used to prepare 10 TargetGxOne amplicon libraries, which were then sequenced on a single lane of an Illumina HiSeq sequence analyzer (150 bp x2). Somatic SNPs included in the panel were picked randomly from the list of ~2,000 SNPs that we had identified previously and filtered to eliminate SNPs located in repetitive regions or in regions not compatible with the design of multiple primers using the TargetGxOne technology.

A second set of panels to detect transversions was created using 61 custom TargetGxOne amplicon primers (GeneWiz, South Plainfield, NJ) spanning ≤150 bp. These were used to prepare 7 TargetGxOne amplicon libraries from DNA extracted from purified PB MNCs from 5 individuals. All somatic transversions that were identified in the 49f cells analyzed from the 5 individuals tested and that were compatible with the design of multiple primers using the TargetGxOne technology were included. Libraries were then sequenced on a single lane of an Illumina HiSeq sequence analyzer (150 bp x2). The same panel was then reused to generate 20 additional libraries to analyze monocytes, B and T cells from individuals NY22 and SB10 at time zero and at the later time points.

QUANTIFICATION AND STATISTICAL ANALYSIS
Data analysis. Demultiplexed fastq files were trimmed of flanking adapter sequences (trim galore, v0.3.7) and then aligned to the human genome hg19 (bwa mem, v0.7.10) (Li and Durbin, 2010). Aligned reads were filtered to retain those with a mapping quality (MAPQ) score of at least 40 (samtools, v1.9) and showing an exact match to the index sequence. Remaining reads were sorted using SortSam (Picard, v1.119). Allele frequencies for selected SNPs and amplicons were determined using mpileup and call (bcftools, v1.9).

Statistical analysis. Error rates for each of the 6 possible types of single nucleotide substitutions (A → C, A → G, A → T, C → A, C → G, C → T) were calculated from the read counts obtained for the ~40 kb of genomic DNA that was sequenced to a cumulative average total depth of about 20,000,000x. Significantly detected SNPs in Figure 1 were called by comparing the ratio of the number of reference (ref) to alternate (alt) allele reads for each somatic SNP to the respective average error rate for each of the 6 possible types of single nucleotide substitutions (i.e., the average number of ref and alt reads observed in the entire panel for each of the 6 possible types of single nucleotide substitutions).
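The pooling of read counts into 6 substitution classes described above can be sketched as follows. This is an illustrative sketch only, not the authors' script: the function names, the input layout (one tuple per sequenced position) and the choice to express the error rate as pooled alt reads over pooled ref reads are assumptions made here for the example. The only fact relied on is that the 12 possible base changes collapse to the 6 listed classes once a substitution and its reverse complement are treated as the same event.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse the 12 possible substitutions into 6 strand-independent classes.

    Substitutions whose reference base is G or T are reported as their
    reverse-complement equivalent, so every class starts with A or C.
    """
    if ref in ("G", "T"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def pooled_error_rates(sites):
    """Pool ref and alt read counts per class and return alt/ref ratios.

    `sites` is a hypothetical input: an iterable of
    (ref_base, alt_base, ref_reads, alt_reads) tuples, one per position
    and alternate base. The alt/ref definition is an assumption.
    """
    totals = defaultdict(lambda: [0, 0])  # class -> [ref_reads, alt_reads]
    for ref, alt, ref_reads, alt_reads in sites:
        key = substitution_class(ref, alt)
        totals[key][0] += ref_reads
        totals[key][1] += alt_reads
    return {k: a / r for k, (r, a) in totals.items() if r > 0}
```

For example, a G → T call and a C → A call both contribute to the single class `C>A`, so their read counts are summed before the ratio is taken.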


In the experiments performed with the transversion panel (Figures 2 and 3), significantly detected SNPs were called using error rates specific for each transversion. These error rates were computed using the libraries not expected to contain the clone-specific somatic transversions. For instance, in Figure 2C, the VAFs for NY22 were based on error rates observed in individuals SB02, SB06, SB08 and SB10, and those for SB10 on error rates observed in individuals SB02, SB06, SB08 and NY22. In all cases, comparisons were performed using Pearson’s Chi-squared test for count data (base R chisq.test function), imputing the average number of ref and alt reads observed in the controls as expected values in the chi-square test. P-values were corrected for multiple testing by computing q-values using the Benjamini and Hochberg FDR method (Benjamini and Hochberg, 1995) (base R p.adjust function). The VAFs of the significant SNPs (chi-square q-value < 0.01, fdr method) were then estimated by subtracting the error rate calculated from the control samples from the observed frequency of each significant SNP. The observed frequency was calculated as the ratio of the number of alternate to reference reads obtained (Alt/Ref column in Figure 2). The frequencies of the mature cell outputs of each 49f-associated clone in each fraction of PB cells were estimated by multiplying the VAFs by 2, since all of the transversions are heterozygous. Calculating VAFs using average or transversion-specific error rates in the experiments depicted in Figures 2 and 3 led to the same general conclusions, but the use of transversion-specific error rates was more sensitive.
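The three steps described above (chi-square test against control-derived expected counts, Benjamini-Hochberg correction, and error-subtracted VAF doubled to a clone frequency) can be sketched as follows. The analysis in the paper was run with the base R functions chisq.test and p.adjust; this standard-library Python version is an illustrative equivalent under stated assumptions, not the authors' code. The function names and argument layout are hypothetical, and the test is written as a two-category goodness-of-fit test with 1 degree of freedom, which is one reading of "imputing the average number of ref and alt reads observed in the controls as expected values".

```python
import math


def chisq_pvalue(ref_reads, alt_reads, ctrl_ref, ctrl_alt):
    """Goodness-of-fit chi-square p-value (1 degree of freedom).

    Expected counts come from the control ref/alt proportions scaled to
    the depth of the tested sample (an assumption of this sketch).
    """
    n = ref_reads + alt_reads
    p_alt = ctrl_alt / (ctrl_ref + ctrl_alt)
    expected = (n * (1.0 - p_alt), n * p_alt)
    if min(expected) <= 0:
        raise ValueError("expected counts must be positive")
    observed = (ref_reads, alt_reads)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def clone_frequency(ref_reads, alt_reads, error_rate):
    """Return (VAF, clone frequency) for one significant transversion.

    The observed frequency is Alt/Ref as stated in the text; the clone
    frequency is 2 x VAF because the transversions are heterozygous.
    """
    observed = alt_reads / ref_reads
    vaf = observed - error_rate
    return vaf, 2.0 * vaf
```

In use, one p-value would be computed per transversion and library, the full list passed to `benjamini_hochberg`, and `clone_frequency` applied only to entries with q < 0.01.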

ADDITIONAL RESOURCES Not applicable


KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
CD2 Monoclonal Antibody (RPA-2.10), PE-Cyanine5 | ThermoFisher | Cat# 15-0029-42; RRID: AB_10736743
CD3 Monoclonal Antibody (UCHT1), PE-Cyanine5 | ThermoFisher | Cat# 15-0038-42; RRID: AB_10598354
CD4 Monoclonal Antibody (S3.5), TRI-COLOR | ThermoFisher | Cat# MHCD0406; RRID: AB_10392548
CD7 Monoclonal Antibody (CD7-6B7), TRI-COLOR | ThermoFisher | Cat# MHCD0706; RRID: AB_10373996
CD8 Monoclonal Antibody (3B5), TRI-COLOR | ThermoFisher | Cat# MHCD0806; RRID: AB_10372207
CD10 Monoclonal Antibody (eBioCB-CALLA (CB-CALLA)), PE-Cyanine5 | ThermoFisher | Cat# 15-0106-42; RRID: AB_10596518
CD14 Monoclonal Antibody (TuK4), TRI-COLOR | ThermoFisher | Cat# MHCD1406; RRID: AB_10373566
CD19 Monoclonal Antibody (HIB19), PE-Cyanine5 | ThermoFisher | Cat# 15-0199-42; RRID: AB_10853658
CD20 Monoclonal Antibody (2H7) | ThermoFisher | Cat# 15-0209-42; RRID: AB_10548510
CD56 Monoclonal Antibody (MEM-188), TRI-COLOR | ThermoFisher | Cat# MHCD5606; RRID: AB_10372520
CD38 Monoclonal Antibody (HIT2), PE-Cyanine7 | ThermoFisher | Cat# 25-0389-42; RRID: AB_1724057
CD90 (Thy-1) Monoclonal Antibody (eBio5E10 (5E10)), Biotin | ThermoFisher | Cat# 13-0909-82; RRID: AB_763525
CD45RA Monoclonal Antibody (HI100), Super Bright 600 | ThermoFisher | Cat# 63-0458-42; RRID: AB_2688186
CD49f Monoclonal Antibody (eBioGoH3 (GoH3)), PE | ThermoFisher | Cat# 12-0495-82; RRID: AB_891474
CD45 Monoclonal Antibody (HI30), APC | ThermoFisher | Cat# 17-0459-42; RRID: AB_10667894
CD45.1 Monoclonal Antibody (A20), eFluor 450 | ThermoFisher | Cat# 48-0453-82; RRID: AB_1272189
CD33 Monoclonal Antibody (WM-53 (WM53)), PE | ThermoFisher | Cat# 12-0338-42; RRID: AB_10855036
CD19 Monoclonal Antibody (HIB19), FITC | ThermoFisher | Cat# 11-0199-42; RRID: AB_10669461
PE-Cy™5 Mouse Anti-Human CD235a | BD Biosciences | Cat# 561776
APC Mouse Anti-Human CD34 | BD Biosciences | Cat# 555824
BV421 Streptavidin | BD Biosciences | Cat# 563259
PE Mouse Anti-Human CD123 | R&D Systems | Cat# 555644

Bacterial and Virus Strains
NA | NA | NA
Biological Samples
Unprocessed Human Bone Marrow | Lonza | Cat#: 1M-105
Unprocessed Human Peripheral Blood | Lonza | Cat#: 1W-500
Chemicals, Peptides, and Recombinant Proteins
Histopaque-1077 | Sigma-Aldrich | Cat# 10771
Critical Commercial Assays
TargetGxOne amplicon primers library prep | GeneWiz | Custom order
Next Generation Sequencing (HiSeq 2x150bp) | GeneWiz | Custom order
Wizard Genomic DNA purification kit | Promega |
Deposited Data
XX fastq and XX vcf have been deposited to SRA | NA | In process
Experimental Models: Cell Lines
N/A | NA | NA
Experimental Models: Organisms/Strains
NA | NA | NA
Oligonucleotides
N/A | NA | NA

Recombinant DNA
N/A | NA | NA
Software and Algorithms
Bwa (version 0.7.10, MEM algorithm) | SourceForge | http://bio-bwa.sourceforge.net/bwa.shtml
GATK (version 3.8) | GATKForum | https://gatkforums.broadinstitute.org/gatk/discussion/11188/gatk-version3-8-download
Samtools, v1.9 | Samtools | http://www.htslib.org/
Picard, v1.119 | Broad Institute | https://broadinstitute.github.io/picard/
Bcftools, v1.9 | bcftools | http://www.htslib.org/
Trim galore, v0.3.7 | Babraham Bioinformatics | https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
R-base | CRAN Project | https://cran.r-project.org/web
Other
NA | NA | NA