Somatic mutations – Evolution within the individual

Journal Pre-proofs
Somatic mutations – Evolution within the individual
Satoshi Oota

PII: S1046-2023(18)30382-7
DOI: https://doi.org/10.1016/j.ymeth.2019.11.002
Reference: YMETH 4822

To appear in: Methods

Received Date: 31 December 2018
Revised Date: 31 October 2019
Accepted Date: 7 November 2019

Please cite this article as: S. Oota, Somatic mutations – Evolution within the individual, Methods (2019), doi: https://doi.org/10.1016/j.ymeth.2019.11.002

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier Inc.

Somatic mutations – Evolution within the individual

Satoshi Oota
Image Processing Research Team, Center for Advanced Photonics, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan

Abstract
With the rapid advancement of sequencing technologies over the last two decades, it is becoming feasible to detect rare variants in somatic tissue samples. Studying such somatic mutations can provide deep insights into various senescence-related diseases, including cancer, inflammation, and sporadic psychiatric disorders. Although identifying true somatic mutations is still difficult, sustained efforts to combine experimental and computational methods have made it possible to obtain reliable data. Furthermore, state-of-the-art machine-learning approaches have greatly improved the efficiency and sensitivity of these methods. Meanwhile, somatic mutations can be regarded as a counterpart of germline mutations, so the well-formulated mathematical frameworks developed for population genetics and molecular evolution can be applied to analyze this ‘somatic evolution’. For example, retrospective cell lineage tracing using single-cell RNA-sequencing (scRNA-seq) data is a promising technique for elucidating the mechanisms of pre-disease states.

1. Introduction
In multicellular organisms, all cells inherit their genetic information from a single-celled zygote [1]. In principle, these cells should therefore share identical genetic information, with differences between cell types arising from differential gene expression at the levels of gene transcription, nuclear RNA processing, mRNA translation, and protein modification [2]. This assertion is, however, not exactly accurate: the post-zygotic cells in an individual do not always share identical genetic information. Owing to replication errors and various other types of mutations caused by endogenous and exogenous factors, somatic cells gradually accumulate spontaneous and induced alterations (Fig. 1). The genome of somatic cells is, therefore, subject to mosaicism. In other words, our somatic genome is no longer identical to the ‘original’ zygotic genome but varies considerably across different parts of the body. In that sense, the genome drifts, or rather ‘evolves’, away from that of the single-celled zygote. Somatic mutation and the resulting mosaicism are involved in various biological traits, usually with negative consequences such as inflammation, cancer, and ageing. In particular, senescence-related diseases are putatively linked to somatic mutations in a variety of contexts [3]. Meanwhile, mosaicism in the central nervous system may contribute to neuronal complexity as a form of epigenetic effect, which may play important roles in behavior and cognition [4, 5]. In this review, I first explain the biological significance of somatic mutations and then introduce methods for analyzing them, including how currently available technologies can be used to detect them.

2. Overview of somatic mutations

Somatic mutation is an alteration in DNA that occurs after conception. In contrast to germline mutations, somatic mutations are not transmitted to progeny [6] (Fig. 1). Strictly speaking, however, some post-zygotic mutations (PZMs) can be transmitted to subsequent generations, as described below. As somatic mutations and PZMs are not clearly distinguished in various contexts [7-9], I use the term ‘somatic mutation’ in a broad sense in this review. Somatic mutations can be subdivided into the same categories as germline mutations [10-13]:

- Single nucleotide variations (SNV)
- Structural variations (SV)
  - Copy number variations (CNV)
  - Retrotransposon long interspersed element-1 (LINE-1 or L1) insertions
  - Deletions associated with transposable elements

Somatic mutations have variant allele fractions (VAFs) that "evolve" from those of the germline mutations [14]:

$$\mathrm{VAF} = \frac{x(i)}{n(i)},$$

where $x(i)$ is the sampled frequency of allele $i$, and $n(i)$ is the total number of sampled cells from which allele $i$ is derived. Therefore, the VAF of a somatic mutation is determined by two factors: prevalence and heterogeneity. Prevalence describes how widely the somatic mutation is distributed. Heterogeneity in this context refers to the heterogeneity of the tissue from which the sequenced sample was obtained. In an extreme case, if a heterozygous mutation occurs during the first cell division and the sample is not heterogeneous, half of the cells carry the mutation on one of their two alleles, so the VAF would be around $0.5 \times 0.5 = 0.25$.

One of the key features of somatic mutations is their expected mutation rate. For SNVs, the median somatic mutation rates are $2.8 \times 10^{-7}$ and $4.4 \times 10^{-7}$ per base pair (bp) per generation in humans and mice, respectively, more than an order of magnitude higher than the germline mutation rates in those species [15].

Kong et al. [16] used the following five criteria to call de novo single-nucleotide mutations from 78 Icelandic trios, where R is the reference allele, A the alternative allele, and $L(\cdot)$ the likelihood of a genotype:

1. Variants with a likelihood ratio $L(\mathrm{AA})/L(\mathrm{RR})$ or $L(\mathrm{AR})/L(\mathrm{RR})$ of less than $10^{4}$ were excluded as false positives, yielding 6,221 candidates.
2. There should be at least 16 quality reads at the candidate site.
3. Likelihood ratio $L(\mathrm{AR})/L(\mathrm{RR}) > 10^{10}$.
4. Likelihood ratio $L(\mathrm{RR})/L(\mathrm{AA}) > 10^{2}$ for both parents.
5. The A allele accounts for more than 30 % of the proband’s sequence reads.

We can also set relaxed criteria for detecting de novo somatic mutations by comparing them against germline mutations. However, detecting rare variants at the genome-wide level with next-generation sequencing (NGS) technologies remains challenging. For example, even the Illumina platform can, at best, detect variants present at a frequency of around 0.1 %, which is far from sufficient to detect truly rare mutations [17]. Therefore, we need sophisticated yet feasible methods to detect somatic mutations accurately.
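A minimal Python sketch of the VAF relationship described above. It assumes an idealized diploid sample in which every mutant cell is heterozygous, and it models tissue heterogeneity with a single hypothetical `purity` parameter (the fraction of sampled cells that descend from the lineage in which the mutation arose); neither assumption comes from the text, and the function names are illustrative only.

```python
def observed_vaf(alt_count: int, total_count: int) -> float:
    """VAF = x(i) / n(i): observed count of allele i over the total sampled count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= alt_count <= total_count:
        raise ValueError("alt_count must lie between 0 and total_count")
    return alt_count / total_count


def expected_vaf(mutant_cell_fraction: float, purity: float = 1.0,
                 heterozygous: bool = True) -> float:
    """Expected VAF in an idealized diploid sample (illustrative assumptions).

    mutant_cell_fraction: prevalence, i.e. the fraction of cells in the lineage
        that carry the mutation.
    purity: hypothetical heterogeneity parameter; fraction of sampled cells from
        that lineage (all other cells are assumed to carry only the reference allele).
    heterozygous: if True, each carrier cell contributes one mutant allele out of two.
    """
    for name, value in (("mutant_cell_fraction", mutant_cell_fraction), ("purity", purity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    alleles_per_cell = 0.5 if heterozygous else 1.0
    return mutant_cell_fraction * purity * alleles_per_cell


# Heterozygous mutation in one of the two cells after the first division,
# sampled from a homogeneous tissue: 0.5 * 1.0 * 0.5.
print(expected_vaf(0.5))             # 0.25
# The same mutation diluted by a heterogeneous sample (60 % of cells share its origin).
print(expected_vaf(0.5, purity=0.6))
print(observed_vaf(24, 100))         # 0.24
```

Keeping prevalence and heterogeneity as separate arguments mirrors the two factors named in the text; real samples would also require accounting for copy number changes and read-sampling noise.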

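The five criteria of Kong et al. listed above can be expressed as a simple filter. This is a hedged sketch: the `GenotypeLikelihoods` and `Candidate` structures, the use of log10 likelihoods, the choice to evaluate criteria 1, 3, and 5 on the proband, the reading of criterion 1 as "keep the site if either ratio reaches $10^4$", and the use of quality reads as the denominator in criterion 5 are all illustrative assumptions, not details taken from [16]. Only the numerical thresholds come from the list.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenotypeLikelihoods:
    """log10 likelihoods of the three diploid genotypes at one site (hypothetical layout)."""
    rr: float  # homozygous reference
    ar: float  # heterozygous
    aa: float  # homozygous alternative


@dataclass
class Candidate:
    proband: GenotypeLikelihoods
    parents: Sequence[GenotypeLikelihoods]
    quality_reads: int  # quality reads covering the site in the proband
    alt_reads: int      # proband reads carrying the A allele


def passes_kong_criteria(c: Candidate) -> bool:
    """Apply the five thresholds in log10 units (ratios become differences)."""
    p = c.proband
    # 1. Initial screen: a non-reference genotype must be >= 10^4 times more likely than RR.
    if max(p.aa - p.rr, p.ar - p.rr) < 4:
        return False
    # 2. At least 16 quality reads at the candidate site.
    if c.quality_reads < 16:
        return False
    # 3. L(AR)/L(RR) > 10^10.
    if not p.ar - p.rr > 10:
        return False
    # 4. L(RR)/L(AA) > 10^2 for both parents.
    if len(c.parents) != 2 or any(not par.rr - par.aa > 2 for par in c.parents):
        return False
    # 5. More than 30 % of the proband's (quality) reads carry the A allele.
    return c.alt_reads / c.quality_reads > 0.30


parent = GenotypeLikelihoods(rr=-1.0, ar=-4.0, aa=-9.0)
site = Candidate(proband=GenotypeLikelihoods(rr=-20.0, ar=-3.0, aa=-15.0),
                 parents=[parent, parent], quality_reads=30, alt_reads=14)
print(passes_kong_criteria(site))  # True
```

Working with log10 likelihoods turns each ratio threshold into a simple difference, which avoids overflow when ratios such as $10^{10}$ are involved.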
2.1 Cancer and somatic mutations
The most typical pathology caused by somatic mutations is cancer [18]. NGS technology has made it possible to analyze in detail the causal links between somatic mutations and malignancy, i.e., how normal cells evolve into cancer cells. Answering this question, however, requires longitudinal investigations of normal cells [19]. Beyond Alfred Knudson’s classic two-hit model, which involves an inherited germline mutation and a spontaneous somatic mutation [20], metastatic cancers putatively arise from multiple spontaneous mutations in a single cell [21]. To understand the mechanisms underlying cancer evolution, we need systematic analyses of the ageing process. From a clinical point of view, identifying pre-symptomatic traits is important for predicting the fate of ‘normal’ cells that carry spontaneous mutations [22]. We should also note that detecting the mosaicism caused by somatic mutations is an almost intractable problem with conventional bulk sampling [23, 24]. Single-cell whole-genome sequencing (WGS) is expected to provide comprehensive assessments of somatic mutations [5].

2.2 Ageing and somatic mutations
Modern ageing theories can be categorized into two schools of thought: programmed theories and error theories [25]. ‘The somatic mutation theory’, or somatic DNA damage theory, belongs to the latter. According to the somatic mutation theory, ageing is characterized by a decline in genome integrity [26]: DNA damage gradually accumulates in cells as DNA polymerases and other DNA repair mechanisms fail to correct errors, leading to older, more deteriorated cells that are prone to malfunction. Notably, telomere damage in the nucleus can trigger the activation of p53, which is associated with mitochondrial dysfunction, causing apoptosis and age-related dysfunction of mitochondrion-rich quiescent tissues [27, 28]. The most striking evidence supporting the somatic mutation theory comes from patients with inherited DNA repair deficits who exhibit premature ageing-like phenotypes, a condition called Werner syndrome (WS or WRN) [29, 30]. These patients carry mutations that abolish the WRN gene product, resulting in early-onset cataracts, scleroderma, thinning grey hair, atherosclerosis, diabetes, myocardial infarction, stroke, osteoporosis, and cancers [29, 31]. The WRN gene is involved in homologous recombination, telomere maintenance, and DNA repair, and its loss presumably causes rapid accumulation of various types of somatic mutations [32]. Many studies suggest that the somatic mutation burden increases with age, as seen in the age-dependent increase of mutations in human lymphocytes [33], T-cell receptor genes [34], and the hypoxanthine phosphoribosyltransferase (HPRT) gene [35]. These mutations induce apoptosis or senescence and reduce the number of stem cell lineages [36]. The remaining stem cell lineages must compensate for this loss and consequently bear a higher somatic mutation burden. This may impair replicative fitness, resulting in catastrophic breakdown, cell attrition, and loss of replicative homeostasis [31].

2.3 Inflammation and somatic mutations
While some ageing-related somatic mutations are loss-of-function mutations [37], many inflammation-related somatic mutations are gain-of-function mutations. In the context of blood vessel damage caused by inflammatory reactions that result in Alzheimer’s dementia (AD), for example, gain-of-function somatic mutations in NLRP3, APP, TREX1, NOTCH3, and ColA1 putatively damage the microvasculature of the brain [38]. Through gain-of-function mutations, the gene encoding the tumor suppressor protein p53 (TP53) likely regulates the malignant phenotypes of glioblastoma (GBM) [39]. Ham et al. also showed that a TP53 gain-of-function mutation promotes inflammation in GBM [39]. Many human inflammatory hepatocellular adenomas (IHCAs, benign liver tumors) carry mutations in the IL6 signal transducer (IL6ST or CD130) that activate interleukin 6 (IL-6) signaling. Meanwhile, 21 % of IHCAs without IL6ST mutations harbor somatic mutations in signal transducer and activator of transcription 3 (STAT3 or APRF) [40]. Three-prime repair exonuclease 1 (TREX1) is known to prevent a type I interferon-associated inflammatory response [41]. If TREX1 function is undermined by a structural mutation (e.g., an insertion of the endogenous retrotransposon LINE-1), autoimmune diseases can occur and lead to neuroinflammation [42]. Although the involvement of retrotransposition in the human brain remains controversial, it has been suggested that inflammatory stress in the fetal brain can cause somatic retrotransposition in a mouse model [43].

2.4 Post-zygotic germline mutations and an evolutionary implication
Since the discovery of Ac/Ds transposition by Barbara McClintock [10, 44], the scientific community has recognized the importance of various types of post-zygotic mutations [45, 46]. Post-zygotic mutations at earlier stages of development have larger phenotypic impacts [45] (Fig. 2). In the most extreme case, a mutation can occur during the initial mitosis, producing two cells with different genotypes. In mice, the mosaicism shows left-right separation if post-zygotic mutations occur at the eight-cell stage [47]. While mutations that occur before left-right determination likely affect both sides of the individual, mutations after the determination point will affect only one side [48]. Importantly, the gonads are no exception: the germline can also be subject to mosaicism, and such post-zygotic mutations can potentially be transmitted to the next generation. The paternal bias of de novo post-zygotic mutations is another issue in transmission genetics [49]. While primary oocytes are arrested in prophase of meiosis I, spermatogonia undergo mitosis to self-renew and to produce sperm precursors. As a result, the mutation rate of the paternal germ lineage is much higher than that of the maternal germ lineage, because the former undergoes more mitotic division cycles than the latter [50, 51]; that is, its mitotic risk is higher. For example, the germline of a 30-year-old male has undergone around 400 mitoses, an order of magnitude more than that of an oocyte [45]. The male germline thus remains exposed throughout life to mitotic risk that is potentially associated with exogenous environmental factors experienced by the individual. This implies that genetic traits (and consequent phenotypic traits) caused by external stimuli may be inherited transgenerationally. However, it would be premature to discuss this Lamarckian issue here. With the rapid accumulation of post-zygotic mutation data, it may become possible in the near future to elucidate the evolutionary relationships between post-zygotic mutations and phenotypes.
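Why earlier post-zygotic mutations leave a larger footprint can be illustrated with a toy calculation. It assumes, as a strong simplification not made in the text, that every cell of an n-cell embryo contributes equally to the adult body and that the mutant clone's share is not changed afterwards by selection or drift. A mutation arising in one of those cells is then carried by 1/n of all descendant cells, and a heterozygous mutation yields an expected bulk VAF of half that. The function name is hypothetical.

```python
def mutant_fraction_at_stage(n_cells: int, heterozygous: bool = True) -> tuple[float, float]:
    """Return (fraction of descendant cells carrying the mutation, expected bulk VAF).

    Assumes the mutation arises in one of n_cells equally contributing cells and
    that the clone's share stays constant afterwards (no selection or drift).
    """
    if n_cells < 1:
        raise ValueError("n_cells must be >= 1")
    cell_fraction = 1.0 / n_cells
    vaf = cell_fraction * (0.5 if heterozygous else 1.0)
    return cell_fraction, vaf


for stage in (2, 4, 8, 16):
    fraction, vaf = mutant_fraction_at_stage(stage)
    print(f"{stage:>2}-cell stage: {fraction:.4f} of cells, expected VAF {vaf:.4f}")
```

The 2-cell case reproduces the VAF of about 0.25 quoted in Section 2 for a mutation arising during the first division; each further doubling halves both values, which is one way to see why later mutations are harder to detect in bulk samples.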

2.5 Transposable elements (TE) and mosaicism
One of the transposable elements, the retrotransposon long interspersed element 1 (LINE-1 or L1), is extensively integrated into the mammalian germline [52]. In somatic cells, L1 retrotransposition can lead to somatic mosaicism. The behavior of L1 is still not well understood, except for its involvement in L1 mosaicism in early embryonic development and in neural cells. While typical somatic mutations are not inherited, the donor L1s that cause de novo L1 insertions in somatic cells are transmitted through the germline [53]. Therefore, both somatic and germline L1s are subject to selection. This is an L1-specific feature that SNVs do not share. Another evolutionary issue is that L1 activity may affect the fitness of the individual: a highly active donor L1 may decrease fitness by causing disease. L1 is rarely expressed in differentiated stem cells and adult somatic cells. However, it is expressed in epithelial cancers [54] and in undifferentiated stem cells, indicating that L1 activity is linked to cell proliferation and differentiation [55]. Several studies show that L1 elements “jump” during neuronal differentiation, inducing variations that may affect neuronal plasticity and behavior [56]. An outstanding characteristic of the nervous system is its diversity. Analogous to the way the immune system improves its recognition of pathogens during an infection [57], this may explain how the neuronal diversity of the brain can be encoded by only approximately 20,000 genes: i.e., cellular diversity may be generated by somatic DNA rearrangement by L1. However, this hypothesis still needs to be evaluated [58, 59].

2.6 Mosaicism in the brain
Psychiatric disorders are notoriously complex in terms of gene-environment interactions. SNVs and CNVs provide important genetic information on inheritable risk factors. In the past, all non-inherited risks were attributed to environmental factors [43] and were considered technically intractable. However, recent genomic analyses have begun to revise this view. Intraindividual genetic diversity (somatic mosaicism) may play an important role in cognition and behavior [5]. Mosaicism due to somatic mutations is known to be prominent in the central nervous system (brain). Because neurons are not replenished, except in the dentate gyrus and the subventricular zone where neurogenesis occurs, somatic mutations become fixed after the brain matures [60]. Other ageing tissues, in contrast, are subject to mosaicism with clonal dominance of expanded mutant stem and progenitor cell populations [26]; in such tissues, mitotic risk therefore increases with age. In a sense, the brain is carefully protected from mitotic risk. Somatic mutations in the brain are, however, not always ‘deleterious’. Mosaicism in the central nervous system is prominent owing to retrotransposons, through which subpopulations of neurons may acquire genetic diversity, as described in the previous section. For example, L1 mobilization in the later phase of neurogenesis makes each neuron in the cohort carrying somatic L1 insertions genetically ‘unique’. Since the hippocampus continues neurogenesis into adulthood, this mosaicism accumulates persistently. Such mosaicism leads to genetic diversity among subpopulations of neurons, potentially contributing to functional complexity with a limited number of genes. Of course, misregulation of mobile elements can cause neural disorders, including Rett syndrome and schizophrenia [61].

2.7 Gene expression and somatic mutations
Somatic mutations in a tissue can alter its gene expression patterns. For example, global gene expression patterns associated with somatic mutations were assessed with RNA sequencing (RNA-seq) to study intracranial aneurysms (IA) [62]. Li et al. [62] found a set of hub proteins (IKBKG, ACTB, and MKI67IP) associated with inflammatory processes and other aspects of IA pathogenesis. They postulated that potential somatic alterations in the differentially expressed genes (DEGs) are essential in IA. Many DEG studies associated with somatic mutations have been conducted in cancer research. Using a Bayesian statistical model, xseq, Ding et al. quantified the effects of somatic mutations on expression profiles in 12 tumor types [63]. From a mutation matrix, a gene interaction network, and a gene expression matrix, they estimated the posterior marginal probabilities that each gene ($P(D)$) and each mutation ($P(F)$) influence expression patterns, as well as the regulatory probabilities of the genes connected to the mutated gene in a patient ($P(G)$). Another systematic analysis of 33 cancer types revealed genes that were associated with somatic mutations and formed the cores of a co-expression network [64].

3. Detection of somatic mutations
Next-generation sequencing (NGS), or second-generation sequencing (2GS), appeared at the end of the twentieth century to overcome the limitations of Sanger-based sequencing technology [65, 66]. The impact of NGS was so enormous that it revolutionized basic experimental designs. However, NGS has three main drawbacks: (1) short reads may lose information contained in the original DNA sequence; (2) post-processing incurs large computational costs; and (3) error rates are notoriously high [67, 68]. These shortcomings are particularly problematic when detecting rare variants. To overcome them, many methods have been proposed (Fig. 3). The following are examples of such methods.

3.1 Duplex Sequencing (Duplex-seq) [69, 70]

To identify true mutations, the two strands of each DNA molecule are tagged with random sequences (barcodes) and adaptors. Using these tags, we can recognize which strand, as well as which original molecule, a potential mutation comes from. After PCR amplification, reads derived from each strand can be collected using the tags. We then compare the copied reads derived from the two complementary strands of the same original DNA molecule. Because PCR and sequencing errors are unlikely to occur at the same position on both strands, a potential mutation that appears in the reads from both strands can be identified as a true mutation.

3.2 O2n-seq [70]

To eliminate sequencing errors, the O2n-seq method places two different copies of one original molecule into a pair of paired-end (PE) reads, improving data efficiency and reducing library bias. O2n-seq combines the advantages of the barcode method and the rolling circle amplification (RCA) method, which efficiently produces tandemly copied sequences (O2n-seq reads) [71, 72]. The authors claim that this method can detect ‘low- and ultralow-frequency mutations with ultralow error rate’. Genomic DNA (gDNA) is sheared into fragments shorter than the length of a single PE read, so that each fragment is sequenced twice independently. The DNA fragments with Y-shaped adaptors are denatured into single-stranded molecules and circularized by a single-strand DNA ligase. The second strand is then synthesized using a high-fidelity DNA polymerase and primers. The circularized double strands are linearized by the USER enzyme, followed by heteroduplex formation between the first and second strands and subsequent strand displacement. This produces tandem copies of the DNA sequence (O2n-seq reads), which form an NGS-ready standard library. If a variant exists on only one DNA copy, it should be an error; if a variant is supported by both DNA copies, it is considered a genuine mutation.

3.3 Bottleneck sequencing system (BotSeqS) [17]

This is a simple method that increases sequencing sensitivity by diluting libraries. The dilution makes it possible to randomly sample double-stranded template molecules such that both strands of each sampled molecule are sequenced with a higher likelihood than in other methods. In other words, the “Watson” and “Crick” strands of the same DNA molecule can be sampled efficiently, leading to fewer artifacts and increased specificity [73].

3.4 CypherSeq [74]

Sequencing errors can, in principle, be ruled out if we achieve high-coverage sequencing. In practice, however, cost and time must be taken into account, especially in clinical applications. CypherSeq is a newer method that explores the use of low-coverage data. With double-stranded barcoding, each PCR product can be traced back to its original DNA molecule, which allows errors to be corrected. In addition, the circular nature of the barcoded CypherSeq vectors allows specific targets to be enriched and amplified via RCA.
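The error-suppression logic shared by Duplex-seq, O2n-seq, and the barcode-based correction in CypherSeq (group reads by their original molecule, then accept a variant only if independent copies agree) can be sketched as follows. This is a simplified illustration under stated assumptions, not any of these published pipelines: reads are assumed to be already aligned, trimmed to the same window, reported in the same reference orientation for both strands, and keyed by a hypothetical `(barcode, strand)` pair. The 0.7 majority threshold and the minimum family size of 3 are arbitrary illustrative values.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# (barcode, strand, sequence); both strands are assumed to be reported in the
# same reference orientation and trimmed to the same aligned window.
Read = Tuple[str, str, str]


def single_strand_consensus(seqs: List[str], min_family: int = 3,
                            min_agreement: float = 0.7) -> Optional[str]:
    """Majority-vote consensus of the reads derived from one strand of one tagged molecule.

    Returns None for families smaller than min_family; positions where no base
    reaches min_agreement are masked with 'N'.
    """
    if len(seqs) < min_family:
        return None
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(reads: List[Read], min_family: int = 3) -> Dict[str, str]:
    """Per molecule (barcode), keep a base only where both strand consensuses agree."""
    families: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for barcode, strand, seq in reads:
        families[(barcode, strand)].append(seq)

    duplexes: Dict[str, str] = {}
    for barcode in {bc for bc, _ in families}:
        plus = single_strand_consensus(families.get((barcode, "+"), []), min_family)
        minus = single_strand_consensus(families.get((barcode, "-"), []), min_family)
        if plus is None or minus is None:
            continue  # a duplex call needs reads from both strands
        duplexes[barcode] = "".join(a if a == b else "N" for a, b in zip(plus, minus))
    return duplexes


# Hypothetical reads; the reference sequence at this window is "ACTTA".
reads = (
    # BC1: true T>G mutation at index 2 on both strands, plus one PCR error on "-".
    [("BC1", "+", "ACGTA")] * 4 + [("BC1", "-", "ACGTA")] * 3 + [("BC1", "-", "ACGAA")]
    # BC2: an early error copied into every "+" read but absent from "-".
    + [("BC2", "+", "ACCTA")] * 4 + [("BC2", "-", "ACTTA")] * 4
    # BC3: only one strand was recovered, so no duplex call is possible.
    + [("BC3", "+", "ACTTA")] * 4
)
result = duplex_consensus(reads)
print(sorted(result.items()))  # [('BC1', 'ACGTA'), ('BC2', 'ACNTA')]
```

The true mutation in BC1 survives, the single-strand error in BC2 is masked, and BC3 is dropped. For the O2n-seq rule, the two tandem copies of one molecule can play the role of the two strands: labeling them "+" and "-" and calling `duplex_consensus(..., min_family=1)` reduces the logic to "keep a variant only if both copies show it".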

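The benefit of the BotSeqS bottleneck can be illustrated with a Poisson approximation. This approximation is a simplifying assumption of the sketch, not the model used in [17]. If R read pairs are spread evenly over M unique double-stranded template molecules, each strand of a molecule receives on average λ = R/(2M) reads, and the probability that both the Watson and the Crick strand are observed at least once is (1 - e^(-λ))^2. Diluting the library lowers M and therefore raises this probability. All numbers below are arbitrary.

```python
import math


def both_strands_observed(read_pairs: float, molecules: float) -> float:
    """Probability that a given template molecule is sequenced from both strands.

    Poisson approximation: each strand of each molecule independently receives
    read_pairs / (2 * molecules) reads on average.
    """
    if read_pairs < 0 or molecules <= 0:
        raise ValueError("read_pairs must be >= 0 and molecules > 0")
    lam = read_pairs / (2 * molecules)
    p_strand = 1.0 - math.exp(-lam)  # P(at least one read from a given strand)
    return p_strand ** 2


reads = 1_000_000
for molecules in (10_000_000, 1_000_000, 100_000):  # progressively stronger dilution
    p = both_strands_observed(reads, molecules)
    print(f"{molecules:>10,} molecules -> P(both strands observed) = {p:.3f}")
```

The trade-off is that a stronger bottleneck samples fewer distinct molecules, so fewer genomic positions are interrogated per run.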
4. Detection of mutations with third-generation sequencing (3GS): the nanopore technology
While NGS technologies are significant in clinical diagnostics, their relatively high costs (e.g., capital costs and complex library preparation) remain major obstacles to their adoption. Third-generation sequencing (3GS) reads nucleotide sequences at the single-molecule level [75]: i.e., it produces longer reads in a relatively short time without the need for amplified libraries [76]. MinION/PromethION (Oxford Nanopore Technologies) [77] and PacBio/Sequel (Pacific Biosciences, Menlo Park, CA, USA) [78] are examples of such paradigm-shifting devices. One drawback of 3GS is its higher error rate (5 %–15 %) compared with NGS, owing to the instability of the molecular machinery during the sequencing process [79]. Various computational strategies have been proposed to translate raw sequencing data into base calls and to obtain consensus data [80]. For example, MinION was used to detect variants of the TP53 and ABL1 genes in chronic lymphocytic leukemia (CLL) [81]. However, we should note that SNV detection depends on the VAF threshold applied to the MinION reads [82]. Another point regarding 3GS is that long-read sequencing is likely to be more advantageous for investigating structural variants and retrotransposons, as well as somatic CNVs [43]. In terms of epigenetics, meanwhile, it is possible to distinguish 5-methylcytosine from unmethylated cytosine in low-coverage sequencing data by using a hidden Markov model [83].
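Because 3GS error rates (5 %–15 %) are of the same order as the VAFs of many somatic variants, a VAF threshold on its own can be misleading. One way to illustrate this (an assumption of this sketch, not the method of [82]) is a one-sided binomial test: how likely is it to see at least k alternative-allele reads out of n from sequencing errors alone, if each read reports the candidate alternative base with probability e? Treating every error as landing on the candidate base is a deliberately pessimistic simplification, and the threshold values are illustrative.

```python
from math import comb


def error_tail_probability(alt_reads: int, depth: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate): the chance of observing
    at least this many alternative reads from sequencing errors alone."""
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must be between 0 and depth")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    return sum(comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
               for k in range(alt_reads, depth + 1))


def call_snv(alt_reads: int, depth: int, error_rate: float,
             vaf_threshold: float = 0.2, alpha: float = 1e-3) -> bool:
    """Call an SNV only if the VAF passes a threshold AND errors are an unlikely explanation."""
    if depth <= 0:
        return False
    vaf = alt_reads / depth
    return vaf >= vaf_threshold and error_tail_probability(alt_reads, depth, error_rate) < alpha


# Same VAF (0.25) at two depths, assuming a 10 % per-read error rate.
print(call_snv(5, 20, 0.10))    # False: 5 of 20 reads is plausible as error alone
print(call_snv(50, 200, 0.10))  # True
```

The example shows why the VAF threshold and read depth have to be chosen together: the same fraction of alternative reads is convincing at high depth but not at low depth.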

5. Machine-learning approaches

In cancer therapy, sensitive detection of rare variants in subclonal and/or low-purity tumor samples is a challenging problem. How to distinguish genuine somatic mutations from germline mutations or from artifacts caused by PCR amplification or by the sequencing itself is also an open question. To increase the accuracy of identifying somatic mutations, machine-learning approaches have been applied to these problems. For example, Cerebro detects true somatic mutations with high sensitivity and positive predictive value (97 % and 98 %, respectively) [84] (Fig. 4). The strategy used in Cerebro is as follows: paired tumor and normal whole-exome sequence data are mapped to the reference genome using “dual alignment” (alignment of the two kinds of sequences to the reference genome) to obtain consensus mutation calls. Using features that characterize the data, the system constructs an extremely randomized trees classification model, i.e., a forest of decision trees that is evaluated to give a confidence score for each candidate variant. Another recent example of a machine-learning approach is Sentieon TNscope [85]. While Cerebro is based on a classic machine-learning approach (decision trees), TNscope uses a novel variant score that combines two log-odds ratios (NLOD and TLOD). The ICGC-TCGA DREAM Genomic Mutation Calling Challenge was an international effort to develop standard methods for identifying cancer-associated mutations and rearrangements in whole-genome sequencing (WGS) data [86, 87]. The authors of TNscope claimed that it led in accuracy for SNVs, indels, and SVs in the most recent challenge. Several other variant callers are based on machine-learning frameworks: e.g., MutationSeq [88], SomaticSeq [89], SNooPer [90], and BAYSIC [91]. MutationSeq trains four kinds of classifiers (random forest, Bayesian additive regression trees, support vector machine, and logistic regression) using relevant features extracted at each site together with a set of ground-truth somatic mutations. SomaticSeq uses a stochastic boosting algorithm for both SNVs and small structural variants. SNooPer is designed particularly for low-coverage data. BAYSIC applies an unsupervised latent class model to calls from multiple variant callers.
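TNscope’s exact score is not described here, but the idea behind a tumor log-odds score (TLOD) and a normal log-odds score (NLOD) can be sketched with a simplified, MuTect-style formulation, which is an assumption of this sketch rather than TNscope’s implementation. Reads are classified as showing the candidate alternative base or not; a read from a variant-carrying molecule shows it with probability 1 - e, and a reference read shows it by error with probability e/3. TLOD compares a variant fraction f = k/n with f = 0 in the tumor; NLOD compares f = 0 with a heterozygous f = 0.5 in the matched normal. Function names are hypothetical.

```python
import math


def log10_likelihood(alt_reads: int, depth: int, f: float, error_rate: float) -> float:
    """log10 P(reads | variant allele fraction f) under a binary alt / non-alt read model."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be in (0, 1)")
    p_alt = f * (1 - error_rate) + (1 - f) * error_rate / 3
    return alt_reads * math.log10(p_alt) + (depth - alt_reads) * math.log10(1 - p_alt)


def tlod(alt_reads: int, depth: int, error_rate: float) -> float:
    """Tumor log-odds: variant present at f = k/n versus absent (f = 0)."""
    if depth <= 0:
        return 0.0
    f_hat = alt_reads / depth
    return (log10_likelihood(alt_reads, depth, f_hat, error_rate)
            - log10_likelihood(alt_reads, depth, 0.0, error_rate))


def nlod(alt_reads: int, depth: int, error_rate: float) -> float:
    """Normal log-odds: reference-only (f = 0) versus germline heterozygous (f = 0.5)."""
    return (log10_likelihood(alt_reads, depth, 0.0, error_rate)
            - log10_likelihood(alt_reads, depth, 0.5, error_rate))


# Hypothetical site: 12 of 60 tumor reads and 0 of 40 normal reads show the alt base.
print(round(tlod(12, 60, 0.001), 1), round(nlod(0, 40, 0.001), 1))
```

A large TLOD supports a real variant in the tumor, and a large NLOD supports the normal being reference, i.e., the variant being somatic rather than germline; how the two are combined and thresholded is caller-specific.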

6. Tracing cell lineages by somatic mutations

Clarifying the lineage relationships between cells is not only a long-standing interest in developmental biology but also important for understanding pathological states. Lineage tracing became a state-of-the-art technique with the introduction of retroviral libraries [92]. The rapid development of this methodology has made it possible to perform single-cell sequencing [93] as well as advanced exogenous labeling [94]. Using these tools, we can track cell lineages on a far larger scale. There are two primary approaches to lineage tracing: prospective and retrospective. Prospective lineage tracing introduces an exogenous lineage mark into a single founder cell, which is ‘inherited’ by its progeny: i.e., it traces cell lineages forward along the timeline. In contrast, retrospective lineage tracing works backwards from sampled cells by observing endogenous marks accumulated over a lifetime: i.e., it traces cell lineages backwards against the timeline. Representative prospective lineage-tracing methods include (1) sparse retroviral labeling [95, 96], (2) the transposon plasmid vector system [97], (3) tissue-specific genetic recombination with Cre transgenic lines [98], (4) multicolor mosaics with Cre transgenic lines [99], and (5) CRISPR-Cas9 genome-editing systems [100, 101]. In retrospective lineage tracing, we can estimate the ‘history’ of cells from the somatic mutations accumulated in them (Fig. 5). In this context, somatic mutations are categorized more finely than above: SNVs, CNVs, retrotransposition events (L1), microsatellites, and single-strand lesions (cytosine deamination). To reconstruct the genealogy of cell types, we need to identify mutations that are shared between different cells. However, it has been difficult to discover low-frequency somatic mutations in a mixed population. Owing to deep NGS, single-cell genome sequencing, and RNA-seq, it is now possible to directly identify rare mutations that mark cell lineages [102]. Note that the required depth of coverage varies with the type of somatic mutation and the sequencing method: 0.35x-40x for L1 retrotransposition events; 0.05x-50x for CNVs; 15x-500x for SNVs (500x for targeted sequencing); and 40x for microsatellites [94]. While single-cell genome sequencing can be a powerful tool for retrospective lineage tracing, its drawback is technical artifacts that manifest as sequencing errors. Furthermore, whole-genome sequencing requires far more DNA than is present in a single cell. Pre-sequencing genome amplification may therefore yield complex data that require careful interpretation to correct for amplification errors [103]. In the meantime, new methods based on single-cell RNA-sequencing (scRNA-seq) data are emerging [104, 105].
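Reconstructing a genealogy from mutations shared between cells can be sketched under the infinite-sites assumption, i.e., that each somatic mutation arises once and is never lost (a simplifying assumption of this sketch). Each mutation then defines a clade, namely the set of cells carrying it, and any two clades must be either nested or disjoint; an overlapping but non-nested pair signals an error or a violation of the assumption. The cell and mutation names are hypothetical, and real single-cell data would additionally need handling of allelic dropout and the sequencing and amplification errors mentioned above.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def clades_from_mutations(genotypes: Dict[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Map each mutation to the set of cells carrying it (genotypes: cell -> mutations)."""
    clades: Dict[str, Set[str]] = {}
    for cell, mutations in genotypes.items():
        for m in mutations:
            clades.setdefault(m, set()).add(cell)
    return {m: frozenset(cells) for m, cells in clades.items()}


def check_compatibility(clades: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Return pairs of mutations whose cell sets overlap without one containing the other."""
    conflicts = []
    names = sorted(clades)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ca, cb = clades[a], clades[b]
            if ca & cb and not (ca <= cb or cb <= ca):
                conflicts.append((a, b))
    return conflicts


def parent_map(clades: Dict[str, FrozenSet[str]]) -> Dict[str, Optional[str]]:
    """Parent of each mutation = mutation with the smallest clade strictly containing it."""
    parents: Dict[str, Optional[str]] = {}
    for m, cells in clades.items():
        containing = [o for o, oc in clades.items() if o != m and cells < oc]
        parents[m] = min(containing, key=lambda o: len(clades[o])) if containing else None
    return parents


genotypes = {"c1": {"m1", "m2"}, "c2": {"m1", "m2"}, "c3": {"m1", "m3"}, "c4": {"m4"}}
clades = clades_from_mutations(genotypes)
print(check_compatibility(clades))   # []
parents = parent_map(clades)
print(dict(sorted(parents.items())))  # {'m1': None, 'm2': 'm1', 'm3': 'm1', 'm4': None}
```

In this toy example, m1 marks an early lineage that later splits into the sublineages marked by m2 (cells c1 and c2) and m3 (cell c3), while c4 belongs to a separate lineage marked by m4.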

7. Conclusion

Somatic mutations have significant biological impact in terms of ageing-related diseases, e.g., cancer, inflammation, and neurodegenerative diseases. Previous studies suggest that the ageing process itself may be a consequence of somatic mutations [31, 106]. To understand the mechanisms underlying senescence-related diseases, it is important to analyze somatic mutation patterns in post-zygotic cells. However, detecting genuine somatic mutations is not a straightforward task. While NGS is a powerful tool for analyzing somatic mutations, we need both experimental and computational methods to improve efficiency and sensitivity. Detecting somatic mutations from RNA-seq data is more challenging than from DNA-seq data. Considering the rapid accumulation of RNA data, however, RNA-seq-based detection of somatic mutations has the advantage of data abundance. Emerging single-cell sequencing technologies are promising for detecting stochastic age-related errors. 3GS technology is expected to accelerate the acquisition of RNA-seq data, even though it requires further computational post-processing. Meanwhile, machine learning is a powerful tool for distinguishing genuine somatic mutations among candidate rare variants. Genetic alterations caused by somatic mutations can be regarded as a type of ‘evolution’ within an individual: i.e., post-zygotic genetic alterations can be interpreted as analogous to germline mutations. In this analogy, mosaicism in a somatic cell population corresponds to polymorphism in a population that shares a gene pool, and the proliferative efficiency of particular (genetically altered) cells corresponds to fitness. Cell lineages inferred retrospectively correspond to a phylogenetic tree. While this type of analogy has been used in oncology, it still lacks rigorous mathematical frameworks in other fields, such as ageing, inflammation, and sporadic psychiatric diseases. Single-cell transcriptome analysis has advantages for detailed assessments of somatic mutations: e.g., detection of stochastic age-related errors, transcriptional noise and signs of fate drift, and the modality of error accumulation [107]. Single-cell-level somatic mutation data are expected to accumulate rapidly and can be a good resource for constructing mathematical models that describe ‘somatic evolution’.
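The population-genetic analogy above can be made concrete with a standard haploid Wright-Fisher model with selection. It is used here purely as an illustration and is not a model proposed in this review: a mutant clone with relative proliferative fitness 1 + s starts at frequency p0 in a population of N cells; each generation its expected share after selection is p(1 + s)/(1 + ps), and the next generation is drawn binomially. All parameter values are arbitrary.

```python
import random
from typing import List, Optional


def wright_fisher_clone(n_cells: int, p0: float, s: float, generations: int,
                        seed: Optional[int] = None) -> List[float]:
    """Simulate the frequency of a mutant clone with relative fitness 1 + s.

    Each generation: p_sel = p(1 + s) / (1 + p s) after selection, then the
    mutant count in the next generation is drawn from Binomial(n_cells, p_sel).
    """
    if n_cells < 1 or not 0.0 <= p0 <= 1.0 or s <= -1.0 or generations < 0:
        raise ValueError("invalid parameters")
    rng = random.Random(seed)
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p_sel = p * (1 + s) / (1 + p * s)
        count = sum(rng.random() < p_sel for _ in range(n_cells))  # binomial draw
        p = count / n_cells
        freqs.append(p)
    return freqs


neutral = wright_fisher_clone(1000, 0.01, 0.0, 200, seed=1)
advantaged = wright_fisher_clone(1000, 0.01, 0.05, 200, seed=1)
print(neutral[-1], advantaged[-1])
```

With s = 0, the clone frequency only drifts (the neutral analogue of a mosaic polymorphism); with s > 0, proliferative advantage tends to drive clonal expansion, as in the clonal dominance of mutant stem and progenitor cells described in Section 2.6.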

Acknowledgements
I would like to thank the anonymous reviewers for their comments, which helped improve this paper. I am grateful to Dr. Kazuho Ikeo for sharing an evolutionary perspective on interpreting somatic mutations. This work is dedicated to my former superior, the late Dr. Fukami-Kobayashi. This work was supported by KAKENHI (17H06399).

References
[1] B. Alberts, A. Johnson, and J. Lewis, "Chapter 21, Development of Multicellular Organisms," in Molecular Biology of the Cell, 4th ed. New York: Garland Science, 2002.
[2] S. F. Gilbert, "Principles of Development: Developmental Genetics," in Developmental Biology, 6th ed. Sunderland (MA): Sinauer Associates, 2000.
[3] R. A. Risques and S. R. Kennedy, "Aging and the rise of somatic cancer-associated mutations in normal tissues," PLoS Genet, vol. 14, no. 1, p. e1007108, Jan 2018.
[4] E. B. Keverne, D. W. Pfaff, and I. Tabansky, "Epigenetic changes in the developing brain: Effects on behavior," Proceedings of the National Academy of Sciences, vol. 112, no. 22, pp. 6789-6795, 2015.
[5] A. C. M. Paquola, J. A. Erwin, and F. H. Gage, "Insights into the role of somatic mosaicism in the brain," Curr Opin Syst Biol, vol. 1, pp. 90-94, Feb 2017.
[6] A. J. F. Griffiths, J. H. Miller, and D. T. Suzuki, An Introduction to Genetic Analysis, 7th ed. New York: W. H. Freeman, 2000.
[7] R. Acuna-Hidalgo et al., "Post-zygotic Point Mutations Are an Underrecognized Source of De Novo Genomic Variation," Am J Hum Genet, vol. 97, no. 1, pp. 67-74, Jul 2 2015.
[8] E. T. Lim et al., "Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder," Nat Neurosci, vol. 20, no. 9, pp. 1217-1224, Sep 2017.
[9] Communications Department of Karolinska Institutet. (2017, December 17). Mutations occurring after fertilisation could play a role in autism. Available: https://ki.se/en/news/mutations-occurring-after-fertilisation-could-play-a-role-in-autism
[10] B. McClintock, "The origin and behavior of mutable loci in maize," Proc Natl Acad Sci U S A, vol. 36, no. 6, pp. 344-55, Jun 1950.
[11] C. Xu, "A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data," Comput Struct Biotechnol J, vol. 16, pp. 15-24, 2018.
[12] M. Fontanilles et al., "Non-invasive detection of somatic mutations using next-generation sequencing in primary central nervous system lymphoma," Oncotarget, vol. 8, no. 29, pp. 48157-48168, Jul 18 2017.
[13] L. Koch, "A catalogue of somatic mutations," Nature Reviews Genetics, vol. 17, p. 378, May 9 2016.
[14] Y. Dou, H. D. Gold, L. J. Luquette, and P. J. Park, "Detecting Somatic Mutations in Normal Cells," Trends Genet, vol. 34, no. 7, pp. 545-557, Jul 2018.
[15] B. Milholland, X. Dong, L. Zhang, X. Hao, Y. Suh, and J. Vijg, "Differences between germline and somatic mutation rates in humans and mice," Nat Commun, vol. 8, p. 15183, May 9 2017.
[16] A. Kong et al., "Rate of de novo mutations and the importance of father's age to disease risk," Nature, vol. 488, no. 7412, pp. 471-5, Aug 23 2012.
[17] M. L. Hoang et al., "Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing," Proceedings of the National Academy of Sciences, vol. 113, no. 35, pp. 9846-9851, 2016.
[18] I. Martincorena and P. J. Campbell, "Somatic mutation in cancer and normal cells," Science, vol. 349, no. 6255, pp. 1483-1489, 2015.
[19] C. A. Aktipis, V. S. Kwan, K. A. Johnson, S. L. Neuberg, and C. C. Maley, "Overlooking evolution: a systematic analysis of cancer relapse and therapeutic resistance research," PLoS One, vol. 6, no. 11, p. e26100, 2011.
[20] A. G. Knudson, Jr., "Mutation and cancer: statistical study of retinoblastoma," Proc Natl Acad Sci U S A, vol. 68, no. 4, pp. 820-3, Apr 1971.
[21] R. C. Bast, Jr., B. Hennessy, and G. B. Mills, "The biology of ovarian cancer: new opportunities for translation," Nat Rev Cancer, vol. 9, no. 6, pp. 415-28, Jun 2009.
[22] L. N. Lodder et al., "Presymptomatic testing for BRCA1 and BRCA2: how distressing are the pre-test weeks? Rotterdam/Leiden Genetics Working Group," J Med Genet, vol. 36, no. 12, pp. 906-13, Dec 1999.
[23] N. McGranahan and C. Swanton, "Clonal Heterogeneity and Tumor Evolution: Past, Present, and the Future," Cell, vol. 168, no. 4, pp. 613-628, Feb 9 2017.
[24] Y. Liu, Q. He, and W. Sun, "Association analysis using somatic mutations," PLoS Genet, vol. 14, no. 11, p. e1007746, Nov 2018.
[25] K. Jin, "Modern Biological Theories of Aging," Aging Dis, vol. 1, no. 2, pp. 72-74, Oct 1 2010.
[26] P. D. Adams, H. Jasper, and K. L. Rudolph, "Aging-Induced Stem Cell Mutations as Drivers for Disease and Cancer," Cell Stem Cell, vol. 16, no. 6, pp. 601-12, Jun 4 2015.
[27] E. Sahin et al., "Telomere dysfunction induces metabolic and mitochondrial compromise," Nature, vol. 470, p. 359, Feb 9 2011.
[28] D. P. Kelly, "Ageing theories unified," Nature, vol. 470, p. 342, Feb 9 2011.
[29] R. J. Monnat, Jr., "Human RECQ helicases: roles in DNA metabolism, mutagenesis and cancer biology," Semin Cancer Biol, vol. 20, no. 5, pp. 329-39, Oct 2010.
[30] C. E. Yu et al., "Positional cloning of the Werner's syndrome gene," Science, vol. 272, no. 5259, pp. 258-62, Apr 12 1996.
[31] S. R. Kennedy, L. A. Loeb, and A. J. Herr, "Somatic mutations in aging, cancer and neurodegeneration," Mech Ageing Dev, vol. 133, no. 4, pp. 118-26, Apr 2012.
[32] S. Kyoizumi, Y. Kusunoki, T. Seyama, A. Hatamochi, and M. Goto, "In vivo somatic mutations in Werner's syndrome," Hum Genet, vol. 103, no. 4, pp. 405-10, Oct 1998.
[33] S. A. Grist, M. McCarron, A. Kutlaca, D. R. Turner, and A. A. Morley, "In vivo human somatic mutation: frequency and spectrum with age," Mutat Res, vol. 266, no. 2, pp. 189-96, Apr 1992.
[34] M. Akiyama, S. Kyoizumi, Y. Hirai, Y. Kusunoki, K. S. Iwamoto, and N. Nakamura, "Mutation frequency in human blood cells increases with age," Mutat Res, vol. 338, no. 1-6, pp. 141-9, Oct 1995.
[35] R. F. Branda et al., "Measurement of HPRT mutant frequencies in T-lymphocytes from healthy human populations," Mutat Res, vol. 285, no. 2, pp. 267-79, Feb 1993.
[36] J. Campisi and J. Vijg, "Does damage to DNA and other macromolecules play a role in aging? If so, how?," J Gerontol A Biol Sci Med Sci, vol. 64, no. 2, pp. 175-8, Feb 2009.
[37] J. Vijg, "Somatic mutations and aging: a re-evaluation," Mutat Res, vol. 447, no. 1, pp. 117-35, Jan 17 2000.
[38] V. T. Marchesi, "Gain-of-function somatic mutations contribute to inflammation and blood vessel damage that lead to Alzheimer dementia: a hypothesis," FASEB J, vol. 30, no. 2, pp. 503-6, Feb 2016.
[39] S. W. Ham et al., "TP53 gain-of-function mutation promotes inflammation in glioblastoma," Cell Death & Differentiation, May 21 2018.
[40] C. Pilati et al., "Somatic mutations activating STAT3 in human inflammatory hepatocellular adenomas," The Journal of Experimental Medicine, vol. 208, no. 7, pp. 1359-1366, 2011.
[41] A. Richards et al., "C-terminal truncations in human 3'-5' DNA exonuclease TREX1 cause autosomal dominant retinal vasculopathy with cerebral leukodystrophy," Nat Genet, vol. 39, no. 9, pp. 1068-70, Sep 2007.
[42] C. A. Thomas et al., "Modeling of TREX1-Dependent Autoimmune Disease using Human Stem Cells Highlights L1 Accumulation as a Source of Neuroinflammation," Cell Stem Cell, vol. 21, no. 3, pp. 319-331.e8, Sep 7 2017.
[43] M. Nishioka, M. Bundo, K. Iwamoto, and T. Kato, "Somatic mutations in the human brain: implications for psychiatric research," Molecular Psychiatry, Aug 7 2018.
[44] B. McClintock, "Chromosome organization and genic expression," Cold Spring Harb Symp Quant Biol, vol. 16, pp. 13-47, 1951.
[45] I. M. Campbell, C. A. Shaw, P. Stankiewicz, and J. R. Lupski, "Somatic mosaicism: implications for disease and transmission genetics," Trends Genet, vol. 31, no. 7, pp. 382-92, Jul 2015.
[46] S. A. Frank, "Somatic evolutionary genomics: Mutations during development cause highly variable genetic mosaicism with risk of cancer and neurodegeneration," Proceedings of the National Academy of Sciences, vol. 107, suppl. 1, p. 1725, 2010, doi: 10.1073/pnas.0909343106.
[47] R. L. Gardner, "Normal bias in the direction of fetal rotation depends on blastomere composition during early cleavage in the mouse," PLoS One, vol. 5, no. 3, p. e9610, Mar 10 2010.
[48] R. Chander, B. Varghese, M. Jabeen, T. Garg, and M. Jain, "CHILD syndrome with thrombocytosis and congenital dislocation of hip: A case report from India," Dermatol Online J, vol. 16, no. 8, p. 6, Aug 15 2010.
[49] R. Ma et al., "A clear bias in parental origin of de novo pathogenic CNVs related to intellectual disability, developmental delay and multiple congenital anomalies," Scientific Reports, vol. 7, p. 44446, Mar 21 2017.
[50] R. Rahbari et al., "Timing, rates and spectra of human germline mutation," Nat Genet, vol. 48, no. 2, pp. 126-133, Feb 2016.
[51] A. Scally, "Mutation rates and the evolution of germline structure," Philos Trans R Soc Lond B Biol Sci, vol. 371, no. 1699, Jul 19 2016.
[52] G. J. Faulkner and J. L. Garcia-Perez, "L1 Mosaicism in Mammals: Extent, Effects, and Evolution," Trends Genet, vol. 33, no. 11, pp. 802-816, Nov 2017.
[53] G. J. Faulkner and J. L. Garcia-Perez, "L1 Mosaicism in Mammals: Extent, Effects, and Evolution," Trends in Genetics, vol. 33, no. 11, pp. 802-816, Nov 2017.
[54] D. Rangasamy, N. Lenka, S. Ohms, J. E. Dahlstrom, A. C. Blackburn, and P. G. Board, "Activation of LINE-1 Retrotransposon Increases the Risk of Epithelial-Mesenchymal Transition and Metastasis in Epithelial Cancer," Current Molecular Medicine, vol. 15, no. 7, pp. 588-597, 2015.
[55] D. Rangasamy, N. Lenka, S. Ohms, J. E. Dahlstrom, A. C. Blackburn, and P. G. Board, "Activation of LINE-1 Retrotransposon Increases the Risk of Epithelial-Mesenchymal Transition and Metastasis in Epithelial Cancer," Curr Mol Med, vol. 15, no. 7, pp. 588-97, 2015.
[56] T. Singer, M. J. McConnell, M. C. Marchetto, N. G. Coufal, and F. H. Gage, "LINE-1 retrotransposons: mediators of somatic variation in neuronal genomes?," Trends Neurosci, vol. 33, no. 8, pp. 345-54, Aug 2010.
[57] B. Pakkenberg et al., "Aging and the human neocortex," Exp Gerontol, vol. 38, no. 1-2, pp. 95-9, Jan-Feb 2003.
[58] A. Abeliovich, D. Gerber, O. Tanaka, M. Katsuki, A. M. Graybiel, and S. Tonegawa, "On somatic recombination in the central nervous system of transgenic mice," Science, vol. 257, no. 5068, pp. 404-10, Jul 17 1992.
[59] M. J. McConnell et al., "Failed clearance of aneuploid embryonic neural progenitor cells leads to excess aneuploidy in the Atm-deficient but not the Trp53-deficient adult cerebral cortex," J Neurosci, vol. 24, no. 37, pp. 8090-6, Sep 15 2004.
[60] M. J. McConnell et al., "Intersection of diverse neuronal genomes and neuropsychiatric disease: The Brain Somatic Mosaicism Network," Science, vol. 356, no. 6336, Apr 28 2017.
[61] J. A. Erwin, M. C. Marchetto, and F. H. Gage, "Mobile DNA elements in the generation of diversity and complexity in the brain," Nat Rev Neurosci, vol. 15, no. 8, pp. 497-506, Aug 2014.
[62] Z. Li et al., "Global Gene Expression Patterns and Somatic Mutations in Sporadic Intracranial Aneurysms," World Neurosurg, vol. 100, pp. 15-21, Apr 2017.
[63] J. Ding et al., "Systematic analysis of somatic mutations impacting gene expression in 12 tumour types," Nat Commun, vol. 6, p. 8554, Oct 5 2015.
[64] H. Kim and Y.-M. Kim, "Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types," Scientific Reports, vol. 8, no. 1, p. 6041, Apr 16 2018.
[65] F. Sanger and A. R. Coulson, "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase," J Mol Biol, vol. 94, no. 3, pp. 441-8, May 25 1975.
[66] F. Sanger, S. Nicklen, and A. R. Coulson, "DNA sequencing with chain-terminating inhibitors," Proc Natl Acad Sci U S A, vol. 74, no. 12, pp. 5463-7, Dec 1977.
[67] X. V. Wang, N. Blades, J. Ding, R. Sultana, and G. Parmigiani, "Estimation of sequencing error rates in short reads," BMC Bioinformatics, vol. 13, p. 185, Jul 30 2012.
[68] F. Pfeiffer et al., "Systematic evaluation of error rates and causes in short samples in next-generation sequencing," Scientific Reports, vol. 8, no. 1, p. 10950, Jul 19 2018.
[69] V. Marx, "Cancer: hunting rare somatic mutations," Nat Methods, vol. 13, no. 4, pp. 295-9, Mar 30 2016.
[70] K. Wang et al., "Ultrasensitive and high-efficiency screen of de novo low-frequency mutations by o2n-seq," Nature Communications, vol. 8, p. 15335, 2017.
[71] T. Yoshimura, K. Nishida, K. Uchibayashi, and S. Ohuchi, "Microwave assisted rolling circle amplification," Nucleic Acids Symposium Series, vol. 50, no. 1, pp. 305-306, 2006.
[72] R. Johne, H. Müller, A. Rector, M. van Ranst, and H. Stevens, "Rolling-circle amplification of viral DNA genomes using phi29 polymerase," Trends in Microbiology, vol. 17, no. 5, pp. 205-211, 2009.
[73] M. W. Schmitt, S. R. Kennedy, J. J. Salk, E. J. Fox, J. B. Hiatt, and L. A. Loeb, "Detection of ultra-rare mutations by next-generation sequencing," Proceedings of the National Academy of Sciences, vol. 109, no. 36, pp. 14508-14513, 2012.
[74] M. T. Gregory et al., "Targeted single molecule mutation detection with massively parallel sequencing," Nucleic Acids Res, vol. 44, no. 3, p. e22, Feb 18 2016.
[75] C. S. Pareek, R. Smoczynski, and A. Tretyn, "Sequencing technologies and genome sequencing," J Appl Genet, vol. 52, no. 4, pp. 413-35, Nov 2011.
[76] M. Kchouk, J.-F. Gibrat, and M. Elloumi, "Generations of Sequencing Technologies: From First to Next Generation," Biology and Medicine, vol. 9, no. 3, pp. 1-8, 2017.
[77] M. Eisenstein, "Oxford Nanopore announcement sets sequencing sector abuzz," Nat Biotechnol, vol. 30, no. 4, pp. 295-6, Apr 10 2012.
[78] A. Rhoads and K. F. Au, "PacBio Sequencing and Its Applications," Genomics Proteomics Bioinformatics, vol. 13, no. 5, pp. 278-89, Oct 2015.
[79] P. K. Gupta, "Single-molecule DNA sequencing technologies for future genomics research," Trends in Biotechnology, vol. 26, no. 11, pp. 602-611, Nov 2008.
[80] F. J. Rang, W. P. Kloosterman, and J. de Ridder, "From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy," Genome Biology, vol. 19, no. 1, p. 90, Jul 13 2018.
[81] P. Orsini et al., "Design and MinION testing of a nanopore targeted gene sequencing panel for chronic lymphocytic leukemia," Scientific Reports, vol. 8, no. 1, p. 11798, Aug 7 2018.
[82] A. Suzuki et al., "Sequencing and phasing cancer mutations in lung cancers using a long-read portable sequencer," DNA Research, vol. 24, no. 6, pp. 585-596, 2017.
[83] J. T. Simpson, R. Workman, P. C. Zuzarte, M. David, L. J. Dursi, and W. Timp, "Detecting DNA Methylation using the Oxford Nanopore Technologies MinION sequencer," bioRxiv, p. 047142, 2016.
[84] D. E. Wood et al., "A machine learning approach for somatic mutation discovery," Science Translational Medicine, vol. 10, no. 457, p. eaar7939, 2018.
[85] D. Freed, R. Pan, and R. Aldana, "TNscope: Accurate Detection of Somatic Mutations with Haplotype-based Variant Candidate Detection and Machine Learning Filtering," bioRxiv, p. 250647, 2018.
[86] The International Cancer Genome Consortium, "International network of cancer genome projects," Nature, vol. 464, p. 993, Apr 15 2010.
[87] K. Tomczak, P. Czerwinska, and M. Wiznerowicz, "The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge," Contemp Oncol (Pozn), vol. 19, no. 1A, pp. A68-77, 2015.
[88] J. Ding et al., "Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data," Bioinformatics, vol. 28, no. 2, pp. 167-75, Jan 15 2012.
[89] L. T. Fang et al., "An ensemble approach to accurately detect somatic mutations using SomaticSeq," Genome Biol, vol. 16, p. 197, Sep 17 2015.
[90] J. F. Spinella et al., "SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing," BMC Genomics, vol. 17, no. 1, p. 912, Nov 14 2016.
[91] B. L. Cantarel, D. Weaver, N. McNeill, J. Zhang, A. J. Mackey, and J. Reese, "BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity," BMC Bioinformatics, vol. 15, p. 104, Apr 12 2014.
[92] A. Gerrits et al., "Cellular barcoding tool for clonal analysis in the hematopoietic system," Blood, vol. 115, no. 13, pp. 2610-8, Apr 1 2010.
[93] L. Kester and A. van Oudenaarden, "Single-Cell Transcriptomics Meets Lineage Tracing," Cell Stem Cell, vol. 23, no. 2, pp. 166-179, Aug 2 2018.
[94] M. B. Woodworth, K. M. Girskis, and C. A. Walsh, "Building a lineage from single cells: genetic techniques for cell lineage tracking," Nature Reviews Genetics, vol. 18, p. 230, Jan 23 2017.
[95] S. P. Beddington, "An autoradiographic analysis of the potency of embryonic ectoderm in the 8th day postimplantation mouse embryo," J Embryol Exp Morphol, vol. 64, pp. 87-104, Aug 1981.
[96] G. N. Serbedzija, M. Bronner-Fraser, and S. E. Fraser, "A vital dye analysis of the timing and pathways of avian trunk neural crest cell migration," Development, vol. 106, no. 4, pp. 809-16, Aug 1989.
[97] C. E. Holt, N. Garlick, and E. Cornel, "Lipofection of cDNAs in the embryonic vertebrate central nervous system," Neuron, vol. 4, no. 2, pp. 203-14, Feb 1990.
[98] P. C. Orban, D. Chui, and J. D. Marth, "Tissue- and site-specific DNA recombination in transgenic mice," Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 15, pp. 6861-6865, 1992.
[99] D. Cai, K. B. Cohen, T. Luo, J. W. Lichtman, and J. R. Sanes, "Improved tools for the Brainbow toolbox," Nature Methods, vol. 10, p. 540, May 5 2013.
[100] A. McKenna, G. M. Findlay, J. A. Gagnon, M. S. Horwitz, A. F. Schier, and J. Shendure, "Whole organism lineage tracing by combinatorial and cumulative genome editing," Science, p. aaf7907, 2016.
[101] J. P. Junker et al., "Massively parallel whole-organism lineage tracing using CRISPR/Cas9 induced genetic scars," bioRxiv, p. 056499, 2016.
[102] E. Shapiro, T. Biezuner, and S. Linnarsson, "Single-cell sequencing-based technologies will revolutionize whole-organism science," Nat Rev Genet, vol. 14, no. 9, pp. 618-30, Sep 2013.
[103] D. Grün and A. van Oudenaarden, "Design and Analysis of Single-Cell Sequencing Experiments," Cell, vol. 163, no. 4, pp. 799-810, Nov 5 2015.
[104] J. M. Kebschull and A. M. Zador, "Cellular barcoding: lineage tracing, screening and beyond," Nature Methods, vol. 15, no. 11, pp. 871-879, Nov 2018.
[105] J. Ding, C. Lin, and Z. Bar-Joseph, "Cell lineage inference from SNP and scRNA-Seq data," 2018.
[106] A. A. Morley, "The somatic mutation theory of ageing," Mutat Res, vol. 338, no. 1-6, pp. 19-23, Oct 1995.
[107] M. Enge et al., "Single-Cell Analysis of Human Pancreas Reveals Transcriptional Signatures of Aging and Somatic Mutation Patterns," Cell, vol. 171, no. 2, pp. 321-330.e14, Oct 5 2017.
[108] K. Wang et al., "Using ultra-sensitive next generation sequencing to dissect DNA damage-induced mutagenesis," Scientific Reports, vol. 6, p. 25310, Apr 28 2016.

Figure legends
Fig. 1 A schematic diagram of somatic mutation and cell lineage. Dots of different colors represent various types of mutations. The blue trapezoid represents clonal amplification of cells with genomic drift. A star-shaped mark represents a cancer-causing mutation. The grey trapezoid represents a clonal tumor lineage.
Fig. 2 An earlier mutation (a) produces a larger population of mutant cells than a later mutation (b). Grey triangle: whole cell lineages of an individual; red triangle: a mutant cell lineage; yellow triangle: a germline cell lineage. Depending on the timing of a mutation event, the size of the affected cell population varies, as does whether it includes germline cells.

Fig. 3 Representative methods to detect mutations with ultra-low allele frequencies. There are two kinds of methods: barcode-based and rolling circle amplification (RCA)-based methods. The RCA-based methods are more sensitive and more efficient [108].
Fig. 4 An example of a machine-learning framework to detect somatic mutations. The system maps paired tumor-normal whole-exome sequence data to the human reference genome (a dual mapping protocol) to obtain consensus mutation calls. The candidate mutation calls are then assessed, e.g., by an extremely randomized trees classification model (Cerebro), which evaluates a large ensemble of decision trees to estimate a confidence score.

Fig. 5 Retrospective lineage tracing. (a) De novo or somatic mutations during development. Grey triangle: whole cell lineages of an individual; red triangle: a mutant cell lineage; red dots: observed somatic mutations. Some somatic mutations are shared between different cells, while others are not. By detecting a phylogenetic (genealogical) signal in the observed somatic mutations, we can reconstruct cell lineages retrospectively. (b) A cell lineage reconstructed from the somatic mutations observed in an adult.

Highlights:
• Biological significance of somatic mutations in terms of senescence
• Somatic mutations as conceptual counterparts to heritable germline mutations
• The state-of-the-art machine learning approach to detect true rare variants
• Selection and structural somatic mutations
• Mosaicism of the central nervous system
• Retrospective lineage tracing by using somatic mutations