Accepted Manuscript

Title: Complete mitochondrial genome database and standardized classification system for Canis lupus familiaris
Author: Anna Duleba, Katarzyna Skonieczna, Wiesław Bogdanowicz, Boris Malyarchuk, Tomasz Grzybowski
PII: S1872-4973(15)30034-X
DOI: http://dx.doi.org/10.1016/j.fsigen.2015.06.014
Reference: FSIGEN 1381
To appear in: Forensic Science International: Genetics
Received date: 21-5-2015
Revised date: 22-6-2015
Accepted date: 29-6-2015
Please cite this article as: Anna Duleba, Katarzyna Skonieczna, Wiesław Bogdanowicz, Boris Malyarchuk, Tomasz Grzybowski, Complete mitochondrial genome database and standardized classification system for Canis lupus familiaris, Forensic Science International: Genetics, http://dx.doi.org/10.1016/j.fsigen.2015.06.014
Title: Complete mitochondrial genome database and standardized classification system for Canis lupus familiaris

Anna Duleba (a), Katarzyna Skonieczna (a), Wiesław Bogdanowicz (b), Boris Malyarchuk (c), Tomasz Grzybowski (a)

(a) Department of Molecular and Forensic Genetics, Institute of Forensic Medicine, Ludwik Rydygier Collegium Medicum, Nicolaus Copernicus University, 9 Sklodowskiej-Curie Street, 85-094 Bydgoszcz, Poland
(b) Museum and Institute of Zoology, Polish Academy of Sciences, 64 Wilcza Street, 00-679 Warsaw, Poland
(c) Institute of Biological Problems of the North, Far-East Branch of the Russian Academy of Sciences, 18 Portovaya Street, 685000 Magadan, Russia

Corresponding author: Tomasz Grzybowski, Department of Molecular and Forensic Genetics, Institute of Forensic Medicine, Ludwik Rydygier Collegium Medicum, Nicolaus Copernicus University, 9 Sklodowskiej-Curie Street, 85-094 Bydgoszcz, Poland, tel.: +48 52 585 35 49; fax: +48 52 585 35 53; e-mail: [email protected]
Highlights

1. We performed a detailed phylogenetic analysis of 555 complete mitogenomes of C. l. familiaris.
2. We reconstructed the global phylogenetic tree of modern dogs’ mtDNA.
3. We standardized the nomenclature of mtDNA haplogroups of modern dogs.
4. The tree of dogs’ mtDNA could be used for data quality control in population and forensic genetics.
Abstract

To contribute to the complete mitogenome database of the species C. l. familiaris and shed more light on its origin, we have sequenced the mitochondrial genomes of 120 modern dogs from worldwide populations. Together with all the previously published mitogenome sequences of acceptable quality, these were used to reconstruct a global phylogenetic tree of 555 C. l. familiaris mitogenomes and to standardize haplogroup nomenclature. The phylogenetic tree presented here and available online at http://clf.mtdna.tree.cm.umk.pl/ could be further used by forensic and evolutionary geneticists, as well as cynologists, for data quality control and unambiguous haplogroup classification. Our in-depth phylogeographic analysis of all C. l. familiaris mitogenomes confirmed that domestic dogs may have originated in East Asia during the Mesolithic and Upper Paleolithic time periods and started to expand to other parts of the world during Neolithic times.

Keywords: dog, mitochondrial genome, phylogenetic tree, haplogroup, haplotype.
Introduction

It is agreed that the dog (Canis lupus familiaris) is a domesticated form of the gray wolf (Canis lupus lupus) [1]. Nevertheless, extensive research performed over the last two decades with a wide range of genetic markers has provided no consensus on the place of origin of C. l. familiaris [2-13]. Some studies based on mitochondrial DNA (mtDNA) lineage analysis in modern dog populations indicated that C. l. familiaris originated in Southeastern Asia [2, 7, 12-15]; in others, alternative places of origin, such as Africa, the Middle East and Europe, were proposed [16-18]. Similarly, the mtDNA studies performed so far have not provided a consensus on the time of divergence of dogs and their initial expansion. The different extents of mtDNA sequencing and the different molecular clocks used in these studies have produced a wide range of time estimates. Some studies suggested that dogs diverged from wolves about 100 kya [2], whereas others gave a time frame of 5-16 kya [13]. In contrast, recent studies based on the comparison of complete mitochondrial genome variability of modern dogs with that of archaeological material from extinct wolf-like and dog-like specimens indicated that wolf domestication took place in Europe about 19-32 kya [18]. Interestingly, a recent draft of a complete nuclear genome sequence from a 35,000-year-old wolf from the Taimyr Peninsula in northern Siberia indicates an even longer timescale, 27-40 kya, for the divergence of the dog and wolf lineages [19]. This early divergence suggests that domesticated dogs might have accompanied the early colonizers into the Americas [19].

The reliability of mtDNA phylogenetic analysis increases with the range of sequencing (the targeted mtDNA regions). In this respect, it is worth noting that recent studies were based mainly on control region haplotypes [2, 7, 12, 14-17]. Only Pang et al. [13], Thalmann et al. [18] and, most recently, Skoglund et al. [19] considered whole mitochondrial genome variability in their phylogenetic reconstructions. Intriguingly, their results led to different conclusions. Importantly, the accuracy of phylogeographic interpretation of complete mitogenomes strongly depends on representative sample collection and sufficient quality of the sequencing data. As for the latter, Shi et al. [20] clearly demonstrated errors in the existing dog and wolf datasets, which had never been subjected to phylogenetic scrutiny. Therefore, due to limited resolution, insufficient sampling and poor data quality, the origin of dogs as revealed by mtDNA analysis is still being debated.

Recently, the forensic community has also become interested in mtDNA analysis of dogs, due to its usefulness in the identification of biological traces [21, 22]. However, despite some early efforts to standardize the nomenclature of dog mtDNA haplotypes [23], no consistent classification system based on complete genome data has been proposed. For this reason, we performed a new study on complete mtDNA sequences obtained from 120 modern dogs from Europe, Asia, Central America and Oceania. Together with 435 previously published entire mitochondrial genome haplotypes, these allowed a detailed phylogenetic analysis of 555 complete mitogenomes of C. l. familiaris. Based on the reconstructed tree, we present a consistent and universal phylogenetic classification system of dogs’ mitochondrial genome sequences, which standardizes the nomenclature of mtDNA haplogroups. Moreover, based on the modern dogs’ tree, we have interpreted all currently available complete (or nearly complete) mitochondrial genome sequences of modern and ancient C. l. lupus specimens. Accompanied by analysis of the regional specificity of particular haplogroups and their molecular dating, our reconstruction provides new insight into the origin and timing of the divergence of domesticated dogs. The reconstructed tree of dogs’ mitogenomes could be further used for data quality control in population and forensic genetics.

Materials and methods

Samples.
The study was approved by the Animal Research Local Ethics Committee of the University of Technology and Life Sciences in Bydgoszcz, Poland (statement no. 44/2012). Buccal swabs were randomly collected from 120 maternally unrelated, predominantly free-breeding C. l. familiaris specimens (9 samples from the Caucasus, 9 from Jamaica, 19 from New Caledonia, 18 from Poland, 10 from Fiji, 11 from Lazovsky Region (Primorsky Krai, Russia), 5 from Shkotovsky Region (Primorsky Krai, Russia), 5 from Ukraine, 13 from Saint Petersburg (Russia), 10 from Costa Rica and 11 from Tajikistan; Table S5).

Sequencing of dogs’ mitochondrial genomes.
Complete mitochondrial genome amplification of 120 samples was performed as described by Bjornerfeld et al. [24].
PCR products were purified with Amicon Ultra-0.5 mL Centrifugal Filters (Millipore). Sequencing reactions were performed with the BigDye® Terminator v3.1 Cycle Sequencing Kit (Life Technologies) according to the manufacturer’s protocol, using the primers designed by Bjornerfeld et al. [24]. PCR and sequencing reactions were performed using a GeneAmp PCR System 9700 (Applied Biosystems). Capillary electrophoresis was performed using a 3130xl Genetic Analyzer (Applied Biosystems). All mtDNA sequences were compared with the reference sequence published by Kim et al. [25] (GenBank accession number: NC_002008) using SeqScape v. 2.5 software (Applied Biosystems). The GenBank accession numbers for the complete mitochondrial genomes reported in this paper are KM061475-KM061594.

Data analysis.
The phylogenetic tree of 555 (435 previously published and 120 newly sequenced, Table S5) complete mitochondrial genomes of C. l. familiaris was reconstructed with the maximum parsimony (MP) method using mtPhyl software v. 4.015 (http://eltsov.org). Point indels and transversions located at nucleotide positions (nps) 1486-1493, 5444, 15510-15532, 15932-15938, 16025, 16040-16550 and 16661-16674 were disregarded during phylogenetic analysis. The tree was rooted using a published coyote (Canis latrans) mitochondrial genome (GenBank accession number: NC_008093.1). The resulting maximum parsimony tree of 555 complete mitochondrial genomes of C. l. familiaris is available at http://clf.mtdna.tree.cm.umk.pl/. Coalescence ages were calculated from maximum likelihood (ML) estimates of branch lengths obtained with PhyML v. 3.0 software [26], assuming the HKY85 [27] mutation model and a γ distribution. The obtained values were converted to time using the molecular clock suggested by Pang et al. [13], assuming one nucleotide substitution per 3,200-9,600 years (95% HPD: 2,500-13,500 years).
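The positional filter described in the data analysis above can be sketched in a few lines. This is only an illustration, not the mtPhyl implementation: the ranges are copied from the text, while the `(position, kind)` tuple format, the function names and the example variants are assumptions made for this sketch.

```python
# Ranges quoted in the text (1-based, inclusive, reference NC_002008 numbering).
EXCLUDED_RANGES = [
    (1486, 1493),
    (5444, 5444),
    (15510, 15532),
    (15932, 15938),
    (16025, 16025),
    (16040, 16550),
    (16661, 16674),
]

# Per the text, only point indels and transversions are disregarded there.
DISREGARDED_KINDS = {"indel", "transversion"}


def in_excluded_range(position: int) -> bool:
    """Return True if the position falls inside one of the listed ranges."""
    return any(start <= position <= end for start, end in EXCLUDED_RANGES)


def is_disregarded(position: int, kind: str) -> bool:
    """Return True if a variant should be ignored during tree reconstruction."""
    return kind in DISREGARDED_KINDS and in_excluded_range(position)


def informative_variants(variants):
    """Keep variants that are not disregarded.

    `variants` is a hypothetical list of (position, kind) tuples, where kind
    is "transition", "transversion" or "indel".
    """
    return [(pos, kind) for pos, kind in variants if not is_disregarded(pos, kind)]


# Hypothetical example input, not data from the study.
example = [
    (1490, "indel"),
    (1493, "transition"),
    (5444, "indel"),
    (7923, "transition"),
    (16100, "transversion"),
]
print(informative_variants(example))  # [(1493, 'transition'), (7923, 'transition')]
```

Under this reading, a transition inside a listed range (such as position 1493 in the example) remains informative, while indels and transversions in the same range are dropped.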
Population structure and genetic variation of complete mitochondrial genome sequences were calculated using Arlequin software v. 3.01. Multidimensional scaling (MDS) analysis of pairwise interpopulation FST values was performed using STATISTICA software v. 7.1 (StatSoft, Inc., USA). The mtDNA discrimination power was calculated as previously described [28].

Changes in the effective population size of dogs through time were inferred using Bayesian skyline plots (BSPs) produced from 120 complete mtDNA sequences with the program BEAST 1.7.5 [29]. The GTR+G model was selected as the best-fit model of nucleotide substitution for the molecular data set by the Bayesian Information Criterion using the program MEGA 5.05 [30]. Preliminary analyses were performed with an uncorrelated lognormal relaxed clock to test whether a strict molecular clock could be rejected for our dataset. Because in our analyses of complete mtDNAs the ucld.stdev parameter values were >0, with a frequency histogram not abutting zero, we chose an uncorrelated lognormal relaxed clock for the analyses [31]. Three independent Markov chain analyses were run for 100 million generations, sampled every 10,000 steps, with the first 10 million generations discarded as burn-in. Tracer 1.4 was used to analyze the data generated by BEAST. Convergence to the stationary distribution and sufficient sampling were checked by inspection of posterior samples. Effective sample size (ESS) values were calculated for each parameter to ensure adequate mixing of the Markov chain Monte Carlo (ESS > 200). Calibration with the time of separation between wolf and coyote (~1.5 Ma) gives an evolutionary rate of 1.82 × 10^-8 substitutions per site per year (95% highest posterior density (HPD): 1.2 × 10^-8 to 2.6 × 10^-8).

Results and Discussion

Phylogenetic tree of dogs’ mitochondrial genomes: standardization of classification system.
Since phylogenetic topologies based on different parts of the mitogenome can be incompatible, the best way to reconstruct the mtDNA tree of dogs is to analyze complete haplotypes. Here, the 120 complete mitochondrial genome sequences obtained in this study and 435 previously published haplotypes (555 mitogenomes in total) of C. l. familiaris from all over the world were used to reconstruct the phylogeny of mitochondrial genomes. Ambiguous sites were excluded from the phylogeny reconstruction: highly unstable tandem repeats located between positions 16040-16550; homopolymer regions located between positions 1486-1493, 15932-15938 and 16661-16674; the hypervariable position 16025; and positions that were not defined in many haplotypes deposited in GenBank (position 5444 and positions 15510-15532). Overall, 1049 polymorphic positions were observed (Table 1). The majority of mutations were transitions (about 92.5%). Transversions were observed about 12 times less frequently than transitions. Only 1.7% of all changes were insertions or deletions (Table 1). Most of the polymorphic positions (about 76%) were observed in protein-coding genes, although the frequency of mutations was highest in the noncoding region (Table 1). rRNA and tRNA genes were found to be the most evolutionarily stable (Table 1).

The previous mtDNA classification system of C. l. familiaris was restricted to control region sequence positions only [12, 23]. Even the classification of complete mitochondrial genomes into mtDNA haplogroups was based solely on control region information [13, 24, 32]. Here we present for the first time a phylogenetic classification system that is based on the entire mitochondrial genomes of C. l. familiaris (Fig. S1). To reconstruct the maximum parsimony phylogenetic tree of dog mitogenomes, we used a coyote mitochondrial DNA sequence as an outgroup (Fig. S1).
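The split of changes into transitions, transversions and indels reported in this section (Table 1) rests on a simple base-class rule: a transition swaps two purines (A, G) or two pyrimidines (C, T), and any other single-base substitution is a transversion. A minimal sketch of that rule follows; the `(ref, alt)` pair format, the use of "-" for a missing base and the example changes are assumptions for illustration, not the study's data or software.

```python
from collections import Counter

BASES = frozenset("ACGT")
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_change(ref: str, alt: str) -> str:
    """Classify one change as 'transition', 'transversion' or 'indel'."""
    ref, alt = ref.upper(), alt.upper()
    # Anything that is not a one-base-for-one-base swap is treated as an indel.
    if len(ref) != 1 or len(alt) != 1 or "-" in (ref, alt):
        return "indel"
    if ref not in BASES or alt not in BASES:
        raise ValueError(f"unsupported base in change {ref}>{alt}")
    if ref == alt:
        raise ValueError("reference and variant base are identical")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def summarize(changes):
    """Return the fraction of each change class in a list of (ref, alt) pairs."""
    counts = Counter(classify_change(ref, alt) for ref, alt in changes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no changes to summarize")
    return {kind: counts[kind] / total
            for kind in ("transition", "transversion", "indel")}


# Hypothetical example: three transitions, one transversion, one insertion.
example_changes = [("A", "G"), ("C", "T"), ("T", "C"), ("A", "C"), ("-", "T")]
print(summarize(example_changes))
```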
In population and forensic genetics, mtDNA polymorphisms are usually reported by comparing the obtained sequence with a reference sequence. To unify the results and make the analysis straightforward, the reference sequence should be the same for all research groups. Here we report dog haplotypes by comparing the entire mitochondrial genome sequences with the dog mitogenome determined by Kim et al. [25], which was previously recommended as the reference sequence in population and forensic genetics [23]. The consistent and universal phylogenetic classification system presented here unifies the nomenclature of dogs’ mitochondrial haplogroups used so far [mainly 12 and 33, 13 and 23] with the generally accepted rules of nomenclature used in human mtDNA phylogenetic analysis [34-36]. We defined each (sub)haplogroup by a specific mutation motif encompassing both the coding and control regions. We denoted 6 major clades, marked with capital letters: A, B, C, D, E and F. Subsequent sub-haplogroups were designated alternately with positive integers and lowercase letters (e.g., A1a1b1a1). Within the A clade, we identified 6 sub-branches (A1 to A6). Moreover, within each of the B, C and D clades we defined two minor subclades (B1 and B2, C1 and C2, D1 and D2) (Fig. 1). In total, we defined 178 branching points, of which 93 were terminal branches (Fig. S1). In the previous study by Pang et al. [13], 10 minor subhaplogroups within the major A, B and C clades were also identified, but the mutational motifs that defined the individual branches were not specified. The complete maximum parsimony tree is available in the Supporting Information (Fig. S1) and is also freely available at http://clf.mtdna.tree.cm.umk.pl/. The complete mtDNA tree will be regularly updated in the future.

Mitochondrial genome sequence variability of modern dogs.
The most frequent haplogroup in all populations was haplogroup A, which, on average, encompassed 65% of the mtDNA pool. Haplogroups B and C were less common, with frequencies of approximately 23% and 10%, respectively. The percentage distribution of specimens belonging to the major A, B and C clades was similar in all the analyzed populations, regardless of their geographic origin (Fig. 2).
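Because sub-haplogroup names alternate between positive integers and lowercase letters, a label such as A1a1b1a1 can be split into its nested levels, and the major clade needed for frequency tallies like those above can be read from its first letter. The sketch below assumes labels follow that alternating pattern exactly; the regular expression, the function names and the example labels are illustrative assumptions, and combined node names such as A'B'C'E'F are deliberately not handled.

```python
import re
from collections import Counter

# Capital clade letter A-F, then alternating integer / lowercase-letter levels.
LABEL_RE = re.compile(r"[A-F](?:\d+(?:[a-z]\d+)*[a-z]?)?")
TOKEN_RE = re.compile(r"[A-F]|\d+|[a-z]")


def lineage(label: str):
    """Return all ancestral (sub)haplogroup names of a label, root first."""
    if not LABEL_RE.fullmatch(label):
        raise ValueError(f"label does not follow the alternating pattern: {label!r}")
    tokens = TOKEN_RE.findall(label)
    return ["".join(tokens[: i + 1]) for i in range(len(tokens))]


def major_clade_frequencies(labels):
    """Return the share of each major clade (A-F) among the given labels."""
    counts = Counter(lineage(label)[0] for label in labels)
    total = sum(counts.values())
    return {clade: n / total for clade, n in counts.items()}


print(lineage("A1a1b1a1"))
# ['A', 'A1', 'A1a', 'A1a1', 'A1a1b', 'A1a1b1', 'A1a1b1a', 'A1a1b1a1']

# Hypothetical assignments, not the frequencies reported in the study.
print(major_clade_frequencies(["A1a1b1a", "A2a", "B1a1a", "C1b"]))
# {'A': 0.5, 'B': 0.25, 'C': 0.25}
```

Keeping the full lineage rather than only the final label makes it straightforward to tally a sample at any depth of the tree.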
Haplogroups D, E and F were observed only sporadically in dog populations and their occurrence was limited to certain geographical regions (Fig. 2). These results are comparable to previous reports [13]. The analysis of the geographical origin of the studied sequences shows that the mtDNA variability of some populations may be underestimated due to the small number of specimens analyzed and thus may not fully reflect the actual distribution of haplotypes in different regions. Two-dimensional MDS plots of pairwise FST values, constructed for 34, 15, 6 and 3 groups, revealed low population differentiation in dogs (Fig. S2). Moreover, all populations were characterized by similar values of genetic variation parameters (Table S1). The lowest values of gene diversity (H) were observed in the French (70%), Indian (83.3%) and German (85.7%) populations (Table S1). Gene diversity values for the other populations ranged from 90% to 100% (Table S1). Nucleotide diversity (π) and the average number of pairwise differences (πm) were lowest for the Indian (π = 0.1%, πm = 17.5) and Ukrainian (π = 0.1%, πm = 19.8) populations. The highest values for these parameters were observed in Switzerland (π = 0.5%, πm = 80), Japan (π = 0.5%, πm = 83.3) and Spain (π = 0.6%, πm = 105) (Table S1). Gene and nucleotide diversities did not differ significantly with population size. When the genetic variation parameters are compared between Asian and European populations, the values of gene diversity (H), nucleotide diversity (π) and the average number of pairwise differences (πm) are very similar (Table S1). Analysis of molecular variance (AMOVA) showed that more genetic variation was found within populations (about 93%) than among populations (about 10%) (Table S2). In summary, the analysis of each population using the basic parameters of genetic diversity (Table S1), multidimensional scaling (MDS) based on FST genetic distances (Fig. S2), and the results of AMOVA (Table S2) indicate the lack of a clear stratification of the global population of dogs.

Among 555 sequences, 1047 polymorphic sites and 380 haplotypes were identified. The greatest number of polymorphic sites (522) was identified in the Chinese population (85 samples), in which 84 haplotypes were observed (Table S1). Interestingly, in the largest population, the Belgian one (161 samples, 113 haplotypes), only 312 polymorphic sites were observed (Table S1). In contrast, the North American population (94 samples) was characterized by 403 polymorphic sites and 76 different haplotypes (Table S1). Overall, both the number of different haplotypes relative to population size and the number of polymorphic positions were higher for the Eastern Asian population than for the European or American populations (Table S1). This underlines the unique status of East Asia as the region characterized by the greatest mitochondrial genome diversity.

We used mitochondrial genome sequences of dogs to estimate changes in population size over time by Bayesian skyline analysis (Fig. 3). The BSP for dogs shows an increase in population size starting at approximately 23.5 kya (95% HPD, 15.3-42.0), followed by a sharp increase at about 15 kya. It is worth noting that the time of the initial increase in population size broadly coincides with the recently suggested separation of the ancestors of dogs from present-day wolves before the Last Glacial Maximum [19]. Estimation of the increment rate, corresponding to the number of times the effective population size increased during a period of time, indicates that there was a 10-fold increase in the population size of dogs after 15 kya. This suggests that the increase in population size may be attributable to dog domestication events, consistent with the demographic dependence of dogs on human populations [18].

Origin of the major C. l. familiaris haplogroups

The topology of the maximum parsimony tree shows that, in general, none of the major haplogroups had a star-like phylogeny structure.
A star-like phylogeny potentially indicates a demographic expansion starting from a small group of founders. Some elements of star-like topology can be seen only in some of the younger branches, such as subhaplogroups A1a1b1a, A1b1a1a or B1a1a (Fig. S1). Thus, it seems most likely that wolf domestication occurred in one geographical region, from where dogs further expanded to other parts of the world without explicit founder effect episodes. Nevertheless, one cannot exclude the possibility of hybridization between dogs and gray wolves after the first domestication. The population mixing that led to "homogenization" of the dog mtDNA gene pool could have been further augmented by past migrations. Some evidence for hybridization between dogs and wolves was provided by analysis of entire autosomal genome variation [36], as well as by studies of haploid markers at low resolution [38, 39].

Mitochondrial genome variation of wolf populations could be informative in the context of the time and place of wolf domestication. In this paper, we have analyzed all currently published complete (or nearly complete) mitochondrial genome sequences of modern and ancient wolves (75 haplotypes, Table S4) against our modern dogs’ mtDNA phylogenetic tree. Unfortunately, the results of this analysis are not entirely satisfactory, due to numerous gaps in the published wolf sequences (Table S4). In the vast majority of cases, these sequences were classified into the deepest, and thus the oldest, clades of the phylogenetic tree, which are not represented by any contemporary C. l. familiaris samples (Fig. S3). For example, the mtDNA of a 35,000-year-old wolf from the Taimyr Peninsula in northern Siberia [19] harbors only two out of seven diagnostic mutations for the deep node B’C’E, most probably representing its ancient sister lineage (Fig. S3, accession number PRJEB7788). These results may indicate that C. l. lupus lines are very old, but they may equally indicate that incomplete wolf sequences lack many diagnostic mutations characteristic of younger haplogroups (Table S4). The above data suggest that modern and fossil samples of C. l. lupus are of limited use for the consideration of regional domestication processes, partly due to the loss of the original genetic diversity, and partly because of the simple absence of data or its poor quality. It seems therefore that the origin of dogs may be better understood in the light of the phylogeography and evolutionary age of mtDNA haplogroups in modern dog populations.

To estimate the time of origin of mtDNA (sub)haplogroups we used the mutation rate calculated by Pang et al. [13]: one nucleotide substitution per 3,200-9,600 years (95% HPD: 2,500-13,500 years). The almost three-fold difference between the lower and upper limits of the mutation rate is associated with uncertainty about the time of the wolf/dog and coyote divergence, which was used to calibrate the molecular clock. In fact, based on fossil studies the divergence took place approximately 1.5 million to 4.5 million years ago [40]. Nevertheless, on the basis of morphological and molecular data the divergence time was set to approximately 1 to 1.5 million years [41, 42]. Therefore, we discuss below the dating results based on the lower limit of the evolutionary age.

Although the mtDNA gene pool of current dogs is similar between regions (especially subcontinental), a detailed analysis of the tree topology within its main haplogroups A, B and C allows the identification of several subclades (A2a, A2b, A3, A4, A5, B2a and C1b) that are represented almost exclusively in modern dogs from Eastern Asia (Fig. S4). These haplogroups are evolutionarily old, with ages ranging from about 15 kya to 38.7 kya (Table S3). It seems that the mtDNA clades described above represent the initial C. l. familiaris gene pool and suggest that domestication could have taken place in East Asia at a time ranging from the Upper Paleolithic to the Mesolithic (38-10 kya). This observation is consistent with the results of recent complete nuclear genome analysis [37], mtDNA studies performed at lower resolution [12, 13] and Y-chromosome analysis [9-11], further pointing to the higher variability of the Y chromosome in the southern part of the Yangtze River [9].
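The conversion from branch length to age used for the dates above is a single multiplication by the clock calibration. The sketch below applies the 3,200-9,600 years per substitution range quoted from Pang et al. [13]; the function names and the example branch length of 5 substitutions are hypothetical, and the study's own ages come from ML branch lengths estimated in PhyML rather than from this simplified calculation.

```python
# Years per nucleotide substitution (whole mitogenome), as quoted in the text.
YEARS_PER_SUBSTITUTION_LOW = 3_200
YEARS_PER_SUBSTITUTION_HIGH = 9_600


def substitutions_to_years(substitutions: float):
    """Convert a branch length in substitutions to a (low, high) age in years."""
    if substitutions < 0:
        raise ValueError("branch length cannot be negative")
    return (
        substitutions * YEARS_PER_SUBSTITUTION_LOW,
        substitutions * YEARS_PER_SUBSTITUTION_HIGH,
    )


def to_kya(years: float) -> float:
    """Express an age in thousands of years ago."""
    return years / 1000


# Hypothetical clade whose tips lie 5 substitutions from its root on average.
low, high = substitutions_to_years(5)
print(to_kya(low), to_kya(high))  # 16.0 48.0
```

The three-fold spread between the two outputs mirrors the three-fold spread in the calibration itself; the ages discussed in the text correspond to the lower figure.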
Moreover, our dating of mitochondrial subhaplogroups of putative Eastern Asian origin does not contradict a recent suggestion of a longer timescale (27-40 kya) for the divergence of the dog and wolf lineages based on ancient nuclear DNA data [19]. Indeed, as Skoglund et al. [19] pointed out, the time of initial divergence need not have coincided with further expansion and human-mediated domestication processes. It is worth noting that the C1a subclade, which is seen mainly in the Middle East and Europe, is much younger (about 4.8 kya, Table S3) than its sister Eastern Asian C1b clade. It is possible that the entire C1 haplogroup (about 22.8 kya, Table S3) arose in East Asia, as two haplotypes located on sister branches to the C1a and C1b clades are also of Eastern Asian origin. Samples from Eastern Asia are also present in many other, much younger branches and occur as founder or peripheral haplotypes for these clades (Fig. S4). Consequently, East Asia represents the area with almost the whole range of mtDNA variation of modern domestic dog populations.

Unlike Asian haplotypes, European haplotypes localize within evolutionarily younger subclades (A1a1a3, A1b1a1a, A1b2a1a1, B1a1a, B1a1p, C1a2, C1a4, C2a2, C2a3) with ages ranging from about 431 ya to 15.6 kya (Fig. S4, Table S3). It seems therefore that haplogroups represented mainly by European haplotypes arose in the Neolithic period (6-7 kya).

The D, E and F haplogroups are likely to have evolutionary histories distinct from those of the main A, B and C clades. Indeed, the occurrence of the relatively old D, E and F clades is limited to specific geographic regions. This may suggest hybridization of dogs with local populations of wolves. Haplogroup E, represented by only one entire mitochondrial genome from Korea (Fig. S4), was previously observed in a relatively wide area of Eastern Eurasia (South-East Asia, Korea, Japan) [7].
Clade F, represented by two mitogenomes from Japan, may have formed due to hybridization of dogs with wolves in Japan (Fig. S4). Haplogroup D, a sister clade to the A'B'C'E'F macrohaplogroup (Fig. S4), may be the only haplogroup that formed outside East Asia, separately from all the other haplogroups. The low frequency and geographically limited occurrence of the D clade suggest hybridization of dogs with wolves rather than an independent domestication episode. A distinct formation of the D haplogroup was previously suggested by Pang et al. [13], Klütsch et al. [8] and Ardalan et al. [7]. On the basis of the current data it can be assumed that the founder haplotype for the entire haplogroup D originated in South-East Asia. This clade further expanded to the northern and western parts of the Asian continent and into some parts of Europe, and shows a clear, recent founder effect in the Scandinavian population (Fig. S4).

Canis l. familiaris mitogenome tree as a quality control measure in forensic and population genetics

Due to the high variability of mtDNA and its resistance to degradation, this marker is used in forensic genetics for the identification of biological traces, especially those for which nuclear DNA profiling fails to give results. In forensic practice, mtDNA analysis of dog specimens has usually been limited to the HVS I and HVS II regions. However, it has been shown that inclusion of polymorphic positions located within the coding region significantly increases the discrimination power in forensic mtDNA testing [28, 32, 43, 44]. The maximum parsimony tree reconstructed in this study allows these polymorphic positions located outside the control region to be identified. In the 555 complete mitochondrial genomes analyzed, 959 polymorphic positions were identified in the coding region (Table 1). High values for the power of discrimination (93.3% in the Western Europe population and about 99% for most of the remaining populations, Table 2) show the high utility of the complete mitogenome as a marker in forensic genetics. Importantly, in the 555 complete mitochondrial genome sequences 17 hotspots were identified, six of which were located in the coding region (1493, 1634, 5367, 7923, 12063 and 14977), and 11 in the control region (15553, 15625, 15627, 15632, 15639, 15652, 15665, 15931, 15938, 15955, 16025).
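The power of discrimination quoted above is calculated in the paper as described in [28], and the exact formula is not reproduced here. A minimal sketch, assuming the commonly used definition (one minus the sum of squared haplotype frequencies) together with Nei's unbiased gene diversity, is given below; the function names and the four-sample example are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return the relative frequency of each distinct haplotype."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return [count / n for count in counts.values()]


def random_match_probability(haplotypes):
    """Probability that two samples drawn with replacement share a haplotype."""
    return sum(p * p for p in haplotype_frequencies(haplotypes))


def discrimination_power(haplotypes):
    """Assumed definition: 1 minus the random match probability."""
    return 1 - random_match_probability(haplotypes)


def gene_diversity(haplotypes):
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    return n / (n - 1) * discrimination_power(haplotypes)


# Hypothetical population: one haplotype seen twice, two seen once.
sample = ["h1", "h1", "h2", "h3"]
print(discrimination_power(sample))  # 0.625
print(gene_diversity(sample))
```

With these definitions, adding coding-region positions can only split existing haplotypes into more classes, which lowers the sum of squared frequencies and so raises the discrimination power.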
Hotspots were identified when substitution at a given position was observed at least 5 times in the reconstructed phylogenetic tree. Five of the hotspots were located in the control region between 15595 and 15653 nucleotide positions. This region is considered to be the most variable, with the largest number of mutations relative to the number of nucleotides [45, 46]. The phylogenetic analysis based on the dogs’ mtDNA tree allows errors in mitochondrial datasets to be detected. These errors can be introduced at various stages of the analytical process, as was shown previously on numerous examples of human mtDNA analysis [47-49]. Indeed, the key step in assessing the quality of the raw data is a detailed phylogenetic scrutiny of mtDNA sequencing results [47, 48, 50-54]. Data quality control based on phylogenetic analysis allows incomplete mutation patterns to be identified, detection of contamination or sequencing artifacts. For instance, phylogenetic analysis of 447 dog haplotypes deposited in GenBank revealed errors in at least 16 of them [20]. Therefore, the comprehensive phylogenetic tree presented here, including all available dog mitogenomes, could be a useful tool in the quality check of data in population and forensic genetics. Acknowledgements. The study was supported by the European Social Fund (no. 48/9/POKL/4.1.1/2010). Additional funding was provided by the National Science Centre (grant No. 2011/01/B/NZ8/02978) in Poland. The following people are acknowledged for their help with the swab samples: Sergey Belokobylskij, Bernal Rodríguez Herrera, Grzegorz Kłys, Ragde Sanchez, Oleksandr Radchenko, Ronald S. Stewart, Jörn Theuerkauf and Eduard Yavruyan. We thank Dr. Miroslava Derenko for her valuable comments and help with the manuscript’s preparation. References 1. Clutton-Brock J (1999) A Natural History of Domesticated Mammals. Cambridge University Press. Cambridge, UK. 2. Vilà C1 Savolainen P, Maldonado JE, Amorim IR, Rice JE, Honeycutt RL,
Crandall KA, Lundeberg J, Wayne RK (1997) Multiple and ancient origins of the domestic dog. Science 276(5319): 1687-1689. (doi: 10.1126/science.276.5319.1687)
3. Vilà C, Walker C, Sundqvist AK, Flagstad Ø, Andersone Z, Casulli A, Kojola I, Valdmann H, Halverson J, Ellegren H (2003) Combined use of maternal, paternal and bi-parental genetic markers for the identification of wolf-dog hybrids. Heredity 90(1): 17-24. (doi: 10.1038/sj.hdy.6800175)
4. Vilà C, Seddon J, Ellegren H (2005) Genes of domestic mammals augmented by backcrossing with wild ancestors. Trends Genet 21(4): 214-218. (http://dx.doi.org/10.1016/j.tig.2005.02.004)
5. Adams JR, Leonard JA, Waits LP (2003) Widespread occurrence of a domestic dog mitochondrial DNA haplotype in southeastern US coyotes. Mol Ecol 12(2): 541-546. (doi: 10.1046/j.1365-294X.2003.01708.x)
6. Verardi A, Lucchini V, Randi E (2006) Detecting introgressive hybridization between free-ranging domestic dogs and wild wolves (Canis lupus) by admixture linkage disequilibrium analysis. Mol Ecol 15(10): 2845-2855. (doi: 10.1111/j.1365-294X.2006.02995.x)
7. Ardalan A, Kluetsch CF, Zhang AB, Erdogan M, Uhlén M, Houshmand M, Tepeli C, Ashtiani SR, Savolainen P (2011) Comprehensive study of mtDNA among Southwest Asian dogs contradicts independent domestication of wolf, but implies dog–wolf hybridization. Ecol Evol 1(3): 373-385. (doi: 10.1002/ece3.35)
8. Klütsch CF, Seppälä EH, Fall T, Uhlén M, Hedhammar A, Lohi H, Savolainen P (2011) Regional occurrence, high frequency but low diversity of mitochondrial DNA haplogroup d1 suggests a recent dog-wolf hybridization in Scandinavia. Animal Genetics 42(1): 100-103. (doi: 10.1111/j.1365-2052.2010.02069.x)
9. Ding ZL, Oskarsson M, Ardalan A, Angleby H, Dahlgren LG, Tepeli C, Kirkness E, Savolainen P, Zhang YP (2012) Origins of domestic dog in southern East Asia is supported by analysis of Y-chromosome DNA. Heredity 108(5): 507-514. (doi: 10.1038/hdy.2011.114)
10. Brown SK, Pedersen NC, Jafarishorijeh S, Bannasch DL, Ahrens KD, Wu JT, Okon M, Sacks BN (2011) Phylogenetic distinctiveness of Middle Eastern and Southeast Asian village dog Y chromosomes illuminates dog origins. PLoS One 6(12): e28496. (doi: 10.1371/journal.pone.0028496)
11. Sacks BN, Brown SK, Stephens D, Pedersen NC, Wu JT, Berry O (2013) Y chromosome analysis of dingoes and Southeast Asian village dogs suggests a Neolithic continental expansion from Southeast Asia followed by multiple Austronesian dispersals. Mol Biol Evol 30(5): 1103-1118. (doi: 10.1093/molbev/mst027)
12. Savolainen P, Zhang YP, Luo J, Lundeberg J, Leitner T (2002) Genetic evidence for an East Asian origin of domestic dogs. Science 298(5598): 1610-1613. (doi: 10.1126/science.1073906)
13. Pang JF, et al. (2009) mtDNA Data Indicate a Single Origin for Dogs South of Yangtze River, Less Than 16,300 Years Ago, from Numerous Wolves. Mol Biol Evol 26(12): 2849-2864. (doi: 10.1093/molbev/msp195)
14. Tsuda K, Kikkawa Y, Yonekawa H, Tanabe Y (1997) Extensive interbreeding occurred among multiple matriarchal ancestors during the domestication of dogs: evidence from inter- and intraspecies polymorphisms in the D-loop region of mitochondrial DNA between dogs and wolves. Genes Genet Syst 72(4): 229-238.
15. Angleby H, Savolainen P (2005) Forensic informativity of domestic dog mtDNA control region sequences. Forensic Sci Int 154(2-3): 99-110. (http://dx.doi.org/10.1016/j.forsciint.2004.09.132)
16. Boyko AR, et al. (2009) Complex population structure in African village dogs and its implications for inferring dog domestication history. Proc Natl Acad Sci USA 106(33): 13903-13908. (doi: 10.1073/pnas.0902129106)
17. vonHoldt BM, et al. (2010) Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Nature 464(7290): 898-902. (doi: 10.1038/nature08837)
18. Thalmann O, et al. (2013) Complete Mitochondrial Genomes of Ancient Canids Suggest a European Origin of Domestic Dogs. Science 342(6160): 871-874. (doi: 10.1126/science.1243650)
19. Skoglund P, Ersmark E, Palkopoulou E, Dalén L (2015) Ancient wolf genome reveals an early divergence of domestic dog ancestors and admixture into high-latitude breeds. Curr Biol 25(11): 1515-1519. (doi: 10.1016/j.cub.2015.04.019)
20. Shi NN, Fan L, Yao YG, Peng MS, Zhang YP (2014) Mitochondrial genomes of domestic animals need scrutiny. Mol Ecol 23(22): 5393-5397. (doi: 10.1111/mec.12955)
21. Eichmann C, Parson W (2007) Molecular characterization of the canine mitochondrial DNA control region for forensic applications. Int J Legal Med 121: 411-416. (doi: 10.1007/s00414-006-0143-5)
22. Imaizumi K, Akutsu T, Miyasaka S, Yoshino M (2007) Development of species identification tests targeting the 16S ribosomal RNA coding region in mitochondrial DNA. Int J Legal Med 121: 184-191. (doi: 10.1007/s00414-006-0127-5)
23. Pereira L, Van Asch B, Amorim A (2004) Standardisation of nomenclature for dog mtDNA D-loop: a prerequisite for launching a Canis familiaris database. Forensic Sci Int 141(2-3): 99-108. (http://dx.doi.org/10.1016/j.forsciint.2003.12.014)
24. Björnerfeldt S, Webster MT, Vilà C (2006) Relaxation of selective constraint on dog mitochondrial DNA following domestication. Genome Res 16(8): 990-994. (doi: 10.1101/gr.5117706)
25. Kim KS, Lee SE, Jeong HW, Ha JH (1998) The complete nucleotide sequence of the domestic dog (Canis familiaris) mitochondrial genome. Mol Phylogenet Evol 10(2): 210-220.
26. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology 59(3): 307-321. (doi: 10.1093/sysbio/syq010)
27. Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22(2): 160-174.
28. Verscheure S, Backeljau T, Desmyter S (2014) Dog mitochondrial genome sequencing to enhance dog mtDNA discrimination power in forensic casework. Forensic Sci Int Genet 12: 60-68. (doi: 10.1016/j.fsigen.2014.05.001)
29. Drummond AJ, Ho SYW, Rawlence N, Rambaut A (2007) A Rough Guide to BEAST 1.4. Available at: http://beast.bio.ed.ac.uk/
30. Drummond AJ, Suchard MA, Xie D, Rambaut A (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29: 1969-1973. (doi: 10.1093/molbev/mss075)
31. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28: 2731-2739. (doi: 10.1093/molbev/msr121)
32. Webb KM, Allard MW (2009) Mitochondrial genome DNA analysis of the domestic dog: identifying informative SNPs outside of the control region. J Forensic Sci 54(2): 275-288. (doi: 10.1111/j.1556-4029.2008.00952.x)
33. Savolainen P, Leitner T, Wilton AN, Matisoo-Smith E, Lundeberg J (2004) A detailed picture of the origin of the Australian dingo, obtained from the study of mitochondrial DNA. Proc Natl Acad Sci USA 101(33): 12387-12390. (doi: 10.1073/pnas.0401814101)
34. Richards MB, Macaulay VA, Bandelt HJ, Sykes BC (1998) Phylogeography of mitochondrial DNA in western Europe. Ann Hum Genet 62(3): 241-260. (doi: 10.1046/j.1469-1809.1998.6230241.x)
35. Macaulay V, Richards M, Hickey E, Vega E, Cruciani F, Guida V, Scozzari R, Bonné-Tamir B, Sykes B, Torroni A (1999) The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet 64(1): 232-249. (http://dx.doi.org/10.1086/302204)
36. Richards M, et al. (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67(5): 1251-1276. (http://dx.doi.org/10.1016/S0002-9297(07)62954-1)
37. Freedman AH, et al. (2014) Genome sequencing highlights the dynamic early history of dogs. PLoS Genet 10(1): e1004016. (doi: 10.1371/journal.pgen.1004016)
38. Randi E, Hulva P, Fabbri E, Galaverni M, Galov A, Kusak J, Bigi D, Bolfíková BČ, Smetanová M, Caniglia R (2014) Multilocus Detection of Wolf x Dog Hybridization in Italy, and Guidelines for Marker Selection. PLoS One 9(1): e86409. (doi: 10.1371/journal.pone.0086409)
39. Kopaliani N, Shakarashvili M, Gurielidze Z, Qurkhuli T, Tarkhnishvili D (2014) Gene Flow between Wolf and Shepherd Dog Populations in Georgia (Caucasus). Journal of Heredity 105(3): 345-353. (doi: 10.1093/jhered/esu014)
40. Nowak RM (2003) Wolf evolution and taxonomy. In: Mech LD, Boitani L (eds) Wolves: behavior, ecology, and conservation. University of Chicago Press, Chicago.
41. Perini FA, Russo CAM, Schrago CG (2009) The evolution of South American endemic canids: a history of rapid diversification and morphological parallelism. J Evol Biol 23(2): 311-322. (doi: 10.1111/j.1420-9101.2009.01901.x)
42. vonHoldt BM, et al. (2011) A genome-wide perspective on the evolutionary history of enigmatic wolf-like canids. Genome Res 21(8): 1294-1305. (doi: 10.1101/gr.116301.110)
43. Parsons TJ, Coble MD (2001) Increasing the forensic discrimination of mitochondrial DNA testing through analysis of the entire mitochondrial DNA genome. Croat Med J 42(3): 304-309.
44. Imes DL, Wictum EJ, Allard MW, Sacks BN (2012) Identification of single nucleotide polymorphisms within the mtDNA genome of the domestic dog to discriminate individuals with common HVI haplotypes. Forensic Sci Int Genet 6(5): 630-639. (doi: 10.1016/j.fsigen.2012.02.004)
45. Himmelberger AL, Spear TF, Satkoski JA, George DA, Garnica WT, Malladi VS, Smith DG, Webb KM, Allard MW, Kanthaswamy S (2008) Forensic utility of the mitochondrial hypervariable region 1 of domestic dogs, in conjunction with breed and geographic information. J Forensic Sci 53(1): 81-89. (doi: 10.1111/j.1556-4029.2007.00615.x)
46. Webb KM, Allard MW (2009) Identification of forensically informative SNPs in the domestic dog mitochondrial control region. J Forensic Sci 54(2): 289-304. (doi: 10.1111/j.1556-4029.2008.00953.x)
47. Bandelt HJ, Lahermo P, Richards M, Macaulay V (2001) Detecting errors in mtDNA data by phylogenetic analysis. Int J Legal Med 115(2): 64-69. (doi: 10.1007/s004140100228)
48. Bandelt HJ, Salas A, Lutz-Bonengel S (2004) Artificial recombination in forensic mtDNA population databases. Int J Legal Med 118(5): 267-273. (doi: 10.1007/s00414-004-0455-2)
49. Bandelt HJ, Kong QP, Richards M, Macaulay V (2006) Lab-Specific Mutation Processes. In: Bandelt HJ, Macaulay V, Richards M (eds) Human mitochondrial DNA and the evolution of Homo sapiens. Springer-Verlag, Berlin Heidelberg.
50. Richards M (2004) The mitochondrial DNA tree and forensic science. Int Congress Series 1261: 91-93. (doi: 10.1016/S0531-5131(03)01704-7)
51. Yao YG, Bravi CM, Bandelt HJ (2004) A call for mtDNA data quality control in forensic science. Forensic Sci Int 141(1): 1-6. (http://dx.doi.org/10.1016/j.forsciint.2003.12.004)
52. Salas A, Carracedo A, Macaulay V, Richards M, Bandelt HJ (2005) A practical guide to mitochondrial DNA error prevention in clinical, forensic, and population genetics. Biochem Biophys Res Commun 335(3): 891-899.
53. Bandelt HJ, Salas A (2012) Current Next Generation Sequencing technology may not meet forensic standards. Forensic Sci Int Genet 6(1): 143-145. (doi: 10.1016/j.fsigen.2011.04.004)
54. Parson W, et al. (2014) DNA Commission of the International Society for Forensic Genetics: Revised and extended guidelines for mitochondrial DNA typing. Forensic Sci Int Genet 13: 134-142. (doi: 10.1016/j.fsigen.2014.07.010)

Figure legends
Fig. 1. Schematic mtDNA phylogenetic tree of C. l. familiaris. The tree is rooted with a coyote (COY) sequence.
Fig. 2. Frequencies of the major mtDNA haplogroups in dogs.
Fig. 3. Bayesian skyline plot derived from complete mitochondrial genomes of dogs. The x-axis shows time before the present in millions of years, and the y-axis shows Neμ (the product of the effective population size and the mutation rate). The thick solid line represents the median posterior estimate of effective population size through time, and the thin lines show the 95% highest posterior density limits.

Table 1. Characterization of polymorphic positions in complete mitochondrial genome sequences of Canis lupus familiaris.
Columns: D-loop; coding genes (Protein, rRNA, tRNA); Noncoding = noncoding positions outside D-loop; Summary.

Parameter                                                         D-loop    Protein   rRNA      tRNA      Noncoding Summary
Length (bp)                                                       722       11410     2536      1519      81        16268
No. of transitions                                                76        746       89        51        9         971
No. of transversions                                              7         58        9         4         0         78
Transitions/transversions                                         10.86     12.86     9.89      12.75     -         12.45
No. of indels                                                     2         1         6         9         0         18
No. of evolutionarily stable positions                            641       10615     2436      1455      72        15219
No. of polymorphic sites                                          81        795       100       64        9         1049
No. of evolutionarily stable positions/No. of polymorphic sites   7.91      13.35     24.36     22.73     8.00      14.51
Frequency of mutations                                            0.11      0.07      0.04      0.04      0.11      0.06
No. of sites with 1 change                                        78        785       97        64        9         1033
No. of sites with 2 changes                                       2         10        2         0         0         14
No. of sites with 3 or more changes                               1         0         1         0         0         2
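
The derived rows of Table 1 can be cross-checked from the raw counts. The sketch below is only an illustration: the variable and function names are hypothetical, and it assumes that "Frequency of mutations" means the number of polymorphic sites divided by the region length, which is consistent with the tabulated values but is not stated explicitly in the table.

```python
# Counts transcribed from Table 1; column order:
# D-loop, protein-coding, rRNA, tRNA, noncoding outside D-loop.
LENGTH = [722, 11410, 2536, 1519, 81]
TRANSITIONS = [76, 746, 89, 51, 9]
TRANSVERSIONS = [7, 58, 9, 4, 0]
STABLE = [641, 10615, 2436, 1455, 72]
POLYMORPHIC = [81, 795, 100, 64, 9]


def ratio(numerator, denominator):
    """Return the ratio rounded to 2 decimals, or None when undefined."""
    if denominator == 0:
        return None
    return round(numerator / denominator, 2)


def with_summary(counts):
    """Append the column total, mirroring the table's Summary column."""
    return counts + [sum(counts)]


def derived_rows():
    """Recompute the three ratio rows of Table 1 from the raw counts."""
    ts, tv = with_summary(TRANSITIONS), with_summary(TRANSVERSIONS)
    stable, poly = with_summary(STABLE), with_summary(POLYMORPHIC)
    length = with_summary(LENGTH)
    return {
        "ts_tv": [ratio(a, b) for a, b in zip(ts, tv)],
        "stable_per_polymorphic": [ratio(a, b) for a, b in zip(stable, poly)],
        # Assumption: frequency of mutations = polymorphic sites / length.
        "mutation_frequency": [ratio(a, b) for a, b in zip(poly, length)],
    }


if __name__ == "__main__":
    for name, values in derived_rows().items():
        print(name, values)
```

The undefined transition/transversion ratio for the noncoding positions (no transversions observed) is returned as None, matching the "-" entry in the table.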

Table 2. Discrimination power of dog complete mitochondrial genomes.

Population                    Power of discrimination
Africa                        0.9999
Northern America              0.9730
Central America               0.9989
South Asia                    0.9998
North Asia                    0.9999
North-eastern Asia            0.9973
Central Asia                  0.9994
East Asia                     0.9634
Central and eastern Europe    0.9938
Southern Europe               0.9999
Western Europe                0.9375
Northern Europe               0.9973
Europe (unspecified origin)   0.9999
Middle East                   0.9980
Oceania                       0.9966
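
As an illustration of how a value such as those in Table 2 can be obtained, the sketch below assumes the commonly used definition of power of discrimination, PD = 1 - Σ p_i², where p_i is the observed frequency of haplotype i in a population sample. Whether exactly this estimator (for example, with or without a sample-size correction) was used for Table 2 is not stated in this section, and the function name and haplotype labels are hypothetical.

```python
from collections import Counter


def power_of_discrimination(haplotypes):
    """PD = 1 - sum(p_i^2) over the observed haplotype frequencies p_i.

    Assumption: the uncorrected estimator; no sample-size correction.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical haplotype labels, not data from the study.
    sample = ["A1a", "A1a", "B1", "C2"]
    print(power_of_discrimination(sample))
```

Under this definition, a sample of n individuals that all carry distinct haplotypes gives 1 - 1/n, so a value of 0.9375 is what 16 samples with 16 different haplotypes would yield. The sample sizes behind Table 2 are not given in this section, so this is only a plausibility check.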