Mitogenomic diversity in Russians and Poles


Accepted Manuscript
Title: Mitogenomic diversity in Russians and Poles
Authors: Boris Malyarchuk, Andrey Litvinov, Miroslava Derenko, Katarzyna Skonieczna, Tomasz Grzybowski, Aleksandra Grosheva, Yuri Shneider, Sergei Rychkov, Olga Zhukova
PII: S1872-4973(17)30135-7
DOI: http://dx.doi.org/10.1016/j.fsigen.2017.06.003
Reference: FSIGEN 1734
To appear in: Forensic Science International: Genetics
Received date: 25-4-2017
Revised date: 1-6-2017
Accepted date: 11-6-2017

Please cite this article as: Boris Malyarchuk, Andrey Litvinov, Miroslava Derenko, Katarzyna Skonieczna, Tomasz Grzybowski, Aleksandra Grosheva, Yuri Shneider, Sergei Rychkov, Olga Zhukova, Mitogenomic diversity in Russians and Poles, Forensic Science International: Genetics, http://dx.doi.org/10.1016/j.fsigen.2017.06.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Mitogenomic diversity in Russians and Poles

Boris Malyarchuk^a, Andrey Litvinov^a, Miroslava Derenko^a, Katarzyna Skonieczna^b, Tomasz Grzybowski^b, Aleksandra Grosheva^c, Yuri Shneider^c, Sergei Rychkov^c, Olga Zhukova^c

^a Institute of Biological Problems of the North, Far-East Branch of the Russian Academy of Sciences, Portovaya Street 18, Magadan 685000, Russia
^b Division of Molecular and Forensic Genetics, Department of Forensic Medicine, Ludwik Rydygier Collegium Medicum, Nicolaus Copernicus University, Sklodowskiej-Curie Street 9, Bydgoszcz 85-094, Poland
^c N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkin Street 3, Moscow 119991, Russia

Corresponding author: Boris Malyarchuk at Institute of Biological Problems of the North, Portovaya Street 18, Magadan 685000, Russia, e-mail: [email protected], Tel.: +7 4132 631164.

Highlights
• Complete mitogenomes were sequenced in populations of Russians and Poles.
• High diversity but low differentiation was observed among six Russian populations.
• Bayesian skyline analysis demonstrates the Bronze Age expansion ~4.3 kya.

Abstract
Complete mtDNA genome sequencing improves molecular resolution for distinguishing variation between individuals and populations, but mitogenomic population data are still scarce. To overcome this limitation, we used a Sanger-based protocol to generate complete mtDNA sequences of 376 Russian individuals from six populations of the European part of Russia and 100 Polish individuals from northern Poland. Nearly complete resolution of mtDNA haplotypes was achieved: about 97% of haplotypes were unique in both Russians and Poles, and no haplotypes were shared between them when indels were considered. While European populations showed a low but statistically significant level of between-population differentiation (Fst = 0.66%, p = 0), Russian populations showed no significant between-population differences (Fst = 0.22%, p = 0.15). Bayesian skyline analysis of Russian mitogenomes reveals not only a post-Last Glacial Maximum expansion but also rapid population growth starting about 4.3 kya (95% CI: 2.9-5.8 kya), i.e., in the Bronze Age. This expansion strongly correlates with the Kurgan model established by archaeologists and confirmed by paleogeneticists.

Keywords: complete mtDNA sequencing, human populations, molecular phylogeography

1. Introduction
Analysis of mitochondrial DNA (mtDNA) polymorphism has become a useful tool for studies of human populations and molecular evolution, allowing researchers to infer patterns of female migration and the peopling of different regions of the world. In addition, mtDNA data have been used extensively in human identity testing [1]. Over the years, population and forensic geneticists have focused mainly on mtDNA hypervariable segments I and II (HVSI and HVSII), which are characterized by a high mutation rate. However, recurrent mutations at hypervariable positions may generate phylogenetic uncertainty, limiting the resolution of phylogenetic inferences [2-4]. Sequencing of whole mitogenomes, by contrast, significantly increases discrimination power and is a uniquely informative tool for assessing female-specific aspects of the demographic history of human populations [4]. Although the mitochondrial gene pools of different Slavic populations have been intensively studied with respect to HVSI variability, no complete mitogenome data have been published for population samples of Slavs. Only some mitochondrial haplogroups (such as U2e, U3, U4, U5, U7, U8, H5 and H6) have been characterized at the complete mtDNA level in Russians, Belarusians, Poles, Czechs, Slovaks and Serbians [5-9]. Therefore, to obtain a better characterization of Slavic mtDNA variability, we present here complete mtDNA diversity data for population samples of Russians and Poles, representing Eastern and Western Slavs, respectively.

2. Materials and methods

2.1 Population samples and DNA extraction
DNA was obtained from blood samples of 376 unrelated individuals representing six Russian populations from the European part of Russia: 64 individuals from Belgorod region, 48 from Orel region, 68 from Pskov region, 64 from Velikiy Novgorod region, 59 from Tula region and 73 from Vladimir region (Figure 1). Genomic DNA was extracted from blood using a standard phenol/chloroform method. A population sample of 100 individuals from the Pomerania-Kujawy region of northern Poland was also studied. DNA from the Polish samples was extracted from buccal swabs using the GeneMatrix Bio-Trace DNA Purification Kit according to the manufacturer's protocols (Eurx, Gdansk, Poland). Informed consent was obtained from all donors, together with genealogical information on their birthplace and on their parents and grandparents.

2.2 Complete mtDNA genome sequencing and data analysis
We performed complete mtDNA sequencing as previously described [10] using ABI 3500xL and ABI 3130xL Genetic Analyzers. DNA sequence data were analyzed using SeqScape 2.5 software (Applied Biosystems) and compared with the revised Cambridge reference sequence (rCRS) [11]. Sequences were aligned following the phylogenetic alignment rules recommended by the International Society for Forensic Genetics (ISFG) [1,12,13]. All cases of point heteroplasmy were confirmed by independent PCR and sequencing reactions, as recommended by Just et al. [14]. DnaSP 5.10.01 software [15] was used to calculate the basic parameters of genetic diversity. The analysis of molecular variance (AMOVA) was carried out with Arlequin 3.5.1.2 [16]. The statistical significance of Fst values was estimated by permutation analysis with 10,000 permutations. For these analyses, all indels were removed and point heteroplasmies were treated as differences. STATISTICA 10 (StatSoft Inc., Tulsa, OK, USA) was used for multidimensional scaling (MDS) analysis based on the pairwise Fst values between populations. The probability that two randomly selected individuals from a population have identical mtDNA haplotypes (random match probability, RMP) was defined as in Stoneking et al. [17]. The most parsimonious trees of the complete mtDNA sequences were reconstructed using mtPhyl v4.015 software (http://eltsov.org). Because this software uses an earlier version of PhyloTree (Build 11) as its reference phylogeny, the trees were modified manually to take into account the updated mtDNA variation incorporated in PhyloTree Build 17. This updated variation was also considered when assigning each haplotype to the appropriate haplogroup. New subclades were defined when at least two different mitogenomes shared at least one mutation that is not at a hotspot [18]. Insertions at nucleotide positions 309, 315, 515, 524, 574 and 16193, transversions at positions 16182 and 16183, and the transition at position 16519 were not used in phylogenetic analysis because of their high instability.

To detect population growth, we obtained Bayesian skyline plots (BSPs) for population sets of complete mtDNA sequences using BEAST 1.7.5 [19]. The HKY + G + I model was selected with MEGA 5.05 [20] as the best-fit model of nucleotide substitution for the data. We used a strict molecular clock because analysis of the ucld.stdev parameter showed that this clock cannot be rejected for our datasets. A mutation rate of 1.665 × 10^-8 substitutions per site per year [3] was used. Markov chains were run for 100 million generations for the 376 Russian complete mitogenomes and for 60 million generations for the 100 Polish mitogenomes. Tracer 1.4 was used to analyze the BEAST output. We checked for convergence to the stationary distribution and for sufficient sampling by inspecting the posterior samples. Effective sample size (ESS) values were calculated for each parameter to ensure adequate mixing of the Markov chains (ESS > 200).

Published complete mtDNA data for Tatars from the Volga-Ural region [21], Estonians [22] and Sardinians from eastern Sardinia [23] were included in the comparative analysis. The GenBank (http://www.ncbi.nlm.nih.gov/genbank) accession numbers for the 389 novel complete mtDNA sequences (289 Russian and 100 Polish mtDNAs) reported in this paper are KY670838-KY671126 for Russians and KY782150-KY782249 for Poles. These haplotypes are also available for forensic searches under accession numbers EMP00685 (Poles) and EMP00692 (Russians) in the EMPOP database (www.empop.org; [24]). The remaining 87 Russian mitogenomes were taken from our previously published studies [5-8,21,25-30].
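The ESS > 200 criterion used for the BEAST runs can be illustrated with a minimal Python sketch. It assumes the common autocorrelation-based estimator ESS = N / (1 + 2 Σ ρ_k), with the sum truncated at the first non-positive autocorrelation. This truncation rule is chosen for simplicity; Tracer's own estimator differs in detail. The AR(1) chain (`ar1_chain`, a hypothetical helper) stands in for a BEAST parameter trace and is not data from this study.

```python
import random


def effective_sample_size(trace: list[float]) -> float:
    """Estimate ESS as N / (1 + 2 * sum of lag-k autocorrelations).

    The sum stops at the first non-positive autocorrelation (a simple
    illustrative rule; Tracer's estimator differs in detail).
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float(n)  # constant trace: nothing to correct for
    rho_sum = 0.0
    for lag in range(1, n // 2):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_chain(n: int, phi: float, seed: int) -> list[float]:
    """Hypothetical autocorrelated trace: x_t = phi * x_(t-1) + Gaussian noise."""
    rng = random.Random(seed)
    x, trace = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


rng = random.Random(7)
independent = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # well-mixed chain
sticky = ar1_chain(5000, 0.95, seed=7)                    # poorly mixed chain

for name, trace in (("independent", independent), ("sticky", sticky)):
    ess = effective_sample_size(trace)
    print(f"{name}: ESS = {ess:.0f}, ESS > 200: {ess > 200}")
```

Both traces have 5,000 samples, but the strongly autocorrelated one carries far fewer effectively independent draws. This is one reason very long runs may be needed before every parameter clears the threshold.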

3. Results

3.1 mtDNA diversity in Russians and Poles
Summary statistics describing mitogenomic diversity in European populations are shown in Table 1. In Russians, 96.8% of mtDNA profiles (364 haplotypes) were unique, whereas the remaining 3.2% (12 profiles) were observed more than once; 10 of these 12 remained identical even when indel polymorphisms were taken into account (Table S1). The random match probability (RMP) for mitogenomic data in Russians was therefore 0.28%, which is lower than in the U.S. Caucasian (0.39% for n = 263), African American (0.6% for n = 170) and Hispanic (0.72% for n = 155) populations [14], as well as in Estonians (1.2% for n = 114) [22]. However, within individual Russian populations (sample sizes from 48 to 73), the RMP ranges from 1.6% to 2.1%, consistent with the previous observation that random match probabilities depend on sample size [14]. In 100 Poles, only three haplotypes were observed twice when indels were ignored, and only two haplotypes were completely identical when all indels were taken into account (Table S1). The RMP for Polish mitogenomes was therefore 1.06%. Only two mtDNA haplotypes were shared between Russians and Poles, and these became distinct when indels were considered (Table S1). Point heteroplasmy was detected in 12 Russian (3.2%) and 7 Polish (7%) samples (Table S2).
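Assuming the RMP is the plain sum of squared haplotype frequencies (our reading of the Stoneking et al. [17] definition), the Polish value of 1.06% can be reproduced in a few lines of Python. Ninety-four singleton haplotypes plus three haplotypes observed twice give (94 + 3 × 4)/100² = 1.06%. The sketch also shows why the RMP depends on sample size: when every haplotype is unique, it is exactly 1/n. For example, 1/64 ≈ 1.6% and 1/48 ≈ 2.1%, of the same order as the values reported for individual Russian populations. The haplotype labels are placeholders, not real sequences.

```python
from collections import Counter


def random_match_probability(haplotypes: list[str]) -> float:
    """RMP as the sum of squared haplotype frequencies (assumed estimator)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((count / n) ** 2 for count in counts.values())


# Hypothetical labels: three haplotypes seen twice, 94 seen once (n = 100).
poles_like = [f"shared_{i}" for i in range(3) for _ in range(2)]
poles_like += [f"unique_{i}" for i in range(94)]

# Every haplotype unique, as in a sample of 64 distinct mitogenomes.
all_unique_64 = [f"hap_{i}" for i in range(64)]

print(f"{random_match_probability(poles_like):.4f}")     # 0.0106
print(f"{random_match_probability(all_unique_64):.4f}")  # 0.0156
```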

3.2 mtDNA pairwise nucleotide differences in Russians and Poles
As shown in Table 1, the mean number of pairwise differences (MPD) ranged from 27.9 in Estonians to 35.2 in Volga Tatars, whereas in Sardinians it was much lower (24.3). The MPD for Russians as a whole is 29.0, and among Russian populations it varies from 26.9 in Pskov region to 31.4 in Vladimir region. Tajima's D is significantly negative in all European populations, suggesting recent population expansion (Table 1). All European populations analyzed show a bimodal distribution of pairwise nucleotide differences (Figure S1a). In Russians, the number of differences ranges from 0 to 96, with two peaks at around 13 and 30 differences (Figure S1b). A similar but less distinct distribution of pairwise differences was found in Poles (Figure S1c). In general, a bimodal distribution could result from subdivision of ancestral populations, even in the context of exponential population growth [31]. Alternatively, it may reflect two groups of pairwise comparisons: the first, smaller peak representing differences between recently diverged lineages, and the second, larger peak representing differences between more distantly related haplotypes, as described by Just et al. [14].
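The mismatch distribution and MPD discussed above can be sketched in Python as follows. The sketch assumes equal-length, gap-free aligned sequences, with indels removed as in the diversity analyses, and the four toy sequences are hypothetical. Dividing MPD by the number of aligned sites gives per-site nucleotide diversity. For example, 29.02/16,569 ≈ 0.0018, the value in Table 1 for Russians; the exact site count used by DnaSP after removing indels may differ slightly.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seqs: list[str]) -> list[int]:
    """Count differing sites for every unordered pair of aligned sequences."""
    return [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]


def mismatch_distribution(seqs: list[str]) -> dict[int, int]:
    """Histogram mapping number of differences -> number of sequence pairs."""
    return dict(sorted(Counter(pairwise_differences(seqs)).items()))


def mean_pairwise_differences(seqs: list[str]) -> float:
    diffs = pairwise_differences(seqs)
    return sum(diffs) / len(diffs)


toy = ["AAAAAA", "AAAAAT", "AAAATT", "TTTAAA"]  # hypothetical alignment
print(mismatch_distribution(toy))                  # {1: 2, 2: 1, 3: 1, 4: 1, 5: 1}
mpd = mean_pairwise_differences(toy)
print(round(mpd, 3), round(mpd / len(toy[0]), 3))  # 2.667 0.444
print(round(29.02 / 16569, 4))                     # 0.0018
```

With real mitogenomes, plotting the histogram from `mismatch_distribution` is what reveals the bimodal shape described for Russians and Poles.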

3.3 Bayesian skyline plots in Russians and Poles
To investigate changes in population size over time, BSPs were constructed from complete mtDNA sequences. The BSP for the six Russian populations shows an increase in population size up to around 50 kya (95% CI: 46-61 kya), followed by a prolonged period of slow decline until 24.5 kya (95% CI: 21.7-26.1 kya) during the Last Glacial Maximum (LGM), and then an expansion lasting until 13-11 kya (Figure 2). A further sharp expansion, with an 11-fold increase in population size, occurred about 4.3 kya (95% CI: 2.9-5.8 kya). The BSP for the 100 Poles is less informative, showing only the post-LGM expansion (Figure S2). This may be due to the smaller number of complete mtDNA sequences analyzed: for instance, a BSP based on 112 Russian mitogenomes likewise shows only the post-LGM expansion (Figure S3).
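The dates in this section depend on the whole-mitogenome rate of 1.665 × 10^-8 substitutions per site per year [3]. A short Python sketch converts that rate into more intuitive units, assuming the 16,569-bp length of the rCRS. The result is about one substitution per lineage every ~3,625 years, so a single lineage accumulates on average only about one substitution over the ~4.3 ky since the inferred Bronze Age expansion. The function names are illustrative, not part of BEAST.

```python
RCRS_LENGTH_BP = 16569             # length of the revised Cambridge reference sequence
RATE_PER_SITE_PER_YEAR = 1.665e-8  # whole-mitogenome rate used for the BSPs [3]


def substitutions_per_year(rate: float = RATE_PER_SITE_PER_YEAR,
                           length: int = RCRS_LENGTH_BP) -> float:
    """Expected substitutions per year along one lineage, whole mitogenome."""
    return rate * length


def years_per_substitution(rate: float = RATE_PER_SITE_PER_YEAR,
                           length: int = RCRS_LENGTH_BP) -> float:
    """Mean waiting time (years) between substitutions on one lineage."""
    return 1.0 / substitutions_per_year(rate, length)


def expected_substitutions(years: float,
                           rate: float = RATE_PER_SITE_PER_YEAR,
                           length: int = RCRS_LENGTH_BP) -> float:
    """Expected substitutions accumulated on one lineage over `years`."""
    return years * substitutions_per_year(rate, length)


print(round(years_per_substitution()))         # 3625
print(round(expected_substitutions(4300), 2))  # 1.19
```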


3.4 mtDNA differentiation of European populations
Genetic differentiation among European populations was investigated by AMOVA of mitogenome variability in Russians, Poles, Estonians, Volga Tatars and Sardinians (Table S3). The results showed a low but statistically significant level of between-population differentiation (Fst = 0.66%, p = 0). Significant pairwise differences (p < 0.05) were found between Sardinians and all other European populations. Volga Tatars also differed significantly from all populations except Poles. Differentiation among Russian populations, Poles and Estonians was very low (Fst = 0.22%, p = 0.09). Only Russians from Vladimir were significantly distinguishable from Tula and Pskov Russians; overall, Russian populations showed no significant between-population differentiation (Fst = 0.22%, p = 0.15). To visualize the relationships among European populations based on complete mtDNA genome sequences, an MDS plot was constructed from the pairwise Fst values (Figure 3). It shows a cluster comprising most Russian populations and Estonians, while Poles and Russians from Vladimir region are separated from it.
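The permutation logic behind the Fst p values can be sketched in Python. For brevity, the sketch swaps in a simple Hudson-style estimator, Fst = 1 - H_within / H_between, computed from mean pairwise differences. This is not Arlequin's AMOVA variance-component estimator, and the toy sequences are hypothetical. Individuals are reshuffled between the two samples, and the p value is the fraction of permutations with Fst at least as large as the observed value. Under this counting convention (which the reported p = 0 suggests), p = 0 means no permuted value reached the observed one, i.e. p < 1/10,000 for 10,000 permutations.

```python
import random
from itertools import combinations, product


def n_diff(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop: list[str]) -> float:
    """Mean pairwise differences within one sample (needs >= 2 sequences)."""
    pairs = list(combinations(pop, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1: list[str], pop2: list[str]) -> float:
    """Hudson-style Fst = 1 - H_within / H_between (illustrative estimator)."""
    h_within = (mean_within(pop1) + mean_within(pop2)) / 2
    h_between = sum(n_diff(a, b) for a, b in product(pop1, pop2)) / (len(pop1) * len(pop2))
    return 0.0 if h_between == 0 else 1.0 - h_within / h_between


def fst_permutation_test(pop1: list[str], pop2: list[str],
                         n_perm: int = 1000, seed: int = 1) -> tuple[float, float]:
    """Return (observed Fst, fraction of permutations with Fst >= observed)."""
    rng = random.Random(seed)
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign individuals to samples at random
        if hudson_fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, hits / n_perm


pop_a = ["AAAA", "AAAT", "AATA"]  # hypothetical sample 1
pop_b = ["TTTT", "TTTA", "TTAT"]  # hypothetical sample 2
fst, p = fst_permutation_test(pop_a, pop_b)
print(round(fst, 4), p)
```

Hudson-style Fst can be slightly negative when the two samples are drawn from the same pool, which is one reason near-zero estimates such as those among Russian populations need a permutation test rather than a fixed cut-off.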

3.5 mtDNA haplogroup assignment
We assigned haplogroups to each individual using mtPhyl, in accordance with the PhyloTree Build 17 nomenclature (Figure S4). A total of 198 different mtDNA subclades were identified in Russians and Poles, 40 of which were shared between them (Table S4). Twenty-nine subclades that are new with respect to the latest version of PhyloTree were identified (Figure S4): 14 within H, 2 within V, 7 within JT, 4 within U and 2 within W. In 9 cases, subclade-specific nucleotide motifs were redefined (3 in U, 3 in H, 3 in JT). In general, the mtDNA haplogroups found in Russians and Poles were of West Eurasian origin. Only a few subclades typical of Siberian and East Asian populations were found, at frequencies of 1.6% in Russians and 4% in Poles. Because the data were submitted to EMPOP, which uses a maximum likelihood approach (EMMA) [32] for mtDNA haplogroup assignment, we compared the haplogroup assignments obtained with mtPhyl and EMMA (Table S5). EMMA confirmed the haplogroup status defined by mtPhyl, so the correspondence between the two approaches is very high. The discrepancies between them were associated either with the detection of new mtDNA subgroups or with the redescription of subhaplogroup-specific motifs, since shorter sets of subgroup-specific motifs were found in the samples studied. For instance, in Poles, 5 samples could be assigned to 5 new subgroups (H1c23, T2b39, U4a1a1a, U4a2a4, W6a1), and 3 samples belong to subclades that could be defined by fewer mutations (H4a1a1, HV1a1a, J1c2d) (Table S5, Figure S4). Russians show more inconsistencies, mainly because of the appearance of new mtDNA subgroups (Table S5). In some cases, however, EMMA classifies a sample into a subgroup according to PhyloTree Build 17, ignoring the mutations that define the new subgroups of Russian mitogenomes. For instance, samples 13_VN and 39_Vl were assigned by EMMA to T2b4a, defined (according to PhyloTree) by a transition at position 16172, whereas both of them, together with sample 128_T, carry a transition at 6593 and were therefore classified into a new subgroup, T2b4j, which includes T2b4j1 characterized by the 16172 transition (Figure S4). Sample 70_OR, identified by EMMA as belonging to subgroup H1bh (defined by the motif 11377-16239), carries a transition at 10463 and could be assigned, together with sample 60_PS, to a new subgroup, H1cl (Figure S4).
Similarly, sample II-18_BG, identified by EMMA as H2a5b1 (defined by the 249delA variant according to PhyloTree Build 17), should be placed in a new subgroup, H2a5b3, because two samples (II-18_BG and 43_PS) share the transition at coding-region position 12052 and thus form a new sub-branch of the H2a5b phylogeny characterized by a recurrent deletion of adenine at non-coding position 249 (Figure S4).
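The subclade-definition rule from Section 2.2 (at least two different mitogenomes sharing at least one mutation that is not at a hotspot) can be sketched in Python. Several details are assumptions for illustration:

- The input format lists mutations relative to the parent haplogroup.
- The hotspot set is hypothetical.
- The excluded positions are those listed in Section 2.2, simplified so that any variant at them is ignored.

The toy input is loosely modelled on the T2b4j example. Three hypothetical samples share 6593, and two of them also share 16172; all other mutations are invented. The function flags candidate mutations only; naming and placing the subclade is still a manual step.

```python
import re
from collections import defaultdict

# Positions excluded from phylogenetic analysis (Section 2.2). Simplification:
# any variant at these positions is ignored, not only the specific
# insertions, transversions and transition named in the text.
EXCLUDED_POSITIONS = {309, 315, 515, 524, 574, 16193, 16182, 16183, 16519}


def position(mutation: str) -> int:
    """Leading nucleotide position of a label such as '6593', '249d' or '309.1C'."""
    match = re.match(r"\d+", mutation)
    if match is None:
        raise ValueError(f"cannot parse a position from {mutation!r}")
    return int(match.group())


def candidate_subclades(haplotypes: dict[str, frozenset[str]],
                        hotspots: set[int]) -> dict[str, list[str]]:
    """Mutations shared by at least two *different* mitogenomes, skipping
    excluded and hotspot positions (assumed input: mutations relative to
    the parent haplogroup)."""
    ignored = EXCLUDED_POSITIONS | hotspots
    carriers: dict[str, set[str]] = defaultdict(set)
    for sample, mutations in haplotypes.items():
        for mutation in mutations:
            if position(mutation) not in ignored:
                carriers[mutation].add(sample)
    result = {}
    for mutation, samples in carriers.items():
        # Identical mitogenomes count once: the rule needs different sequences.
        if len({haplotypes[s] for s in samples}) >= 2:
            result[mutation] = sorted(samples)
    return result


# Toy input loosely modelled on the T2b4j case; other mutations are invented.
toy = {
    "sample_A": frozenset({"6593", "16172", "4500"}),
    "sample_B": frozenset({"6593", "16172", "8800"}),
    "sample_C": frozenset({"6593", "16311", "16519"}),
    "sample_D": frozenset({"16311", "16519", "12100"}),
}
HYPOTHETICAL_HOTSPOTS = {16311}
print(candidate_subclades(toy, HYPOTHETICAL_HOTSPOTS))
```

In this toy run, 6593 (three samples) and 16172 (two of them) are returned, mirroring the nested T2b4j / T2b4j1 structure. The mutation 16311 is dropped as a hypothetical hotspot, and 16519 is dropped as an excluded position.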

4. Discussion
Complete mtDNA genome sequencing revealed high genetic diversity in Russian populations, but the level of Fst-based between-population differentiation is very low and statistically non-significant (Fst = 0.22%, p = 0.15). Notably, earlier AMOVA results based on mtDNA HVSI variation in eleven Russian populations showed a low but statistically significant level of mtDNA differentiation in Russians (Fst = 0.42%, p = 0.005) [33]. Moreover, MDS analysis of pairwise Fst values for mtDNA HVSI sequences showed that Russian populations do not cluster together and can be differentiated into subregions of European Russia [33]. Similar results, indicating considerable heterogeneity of Russian populations, were obtained previously from the distribution of mtDNA haplogroups [33-36]. For instance, northwestern Russians (Pskov and Velikiy Novgorod) and northeastern Poles (Suwalki) were found to be separated from other regional Polish and Russian populations [34]. Morozova et al. [35] found that northern Russians (including the Pskov and Velikiy Novgorod populations) are much closer to Finno-Ugric-speaking populations than southern Russians are. Kushniarevich et al. [36] reported that the maternal gene pool of Balto-Slavic populations shares some features with the autosomal and Y-chromosomal ones, such as the differentiation of northern Russians (from Arkhangelsk region) and the overlap between Eastern Slavic (Russians, Belarusians and Ukrainians) and Baltic groups (Latvians and Lithuanians). However, the complete mtDNA genome sequences presented here indicate a tight cluster of most Russian populations and Estonians, with Poles and Russians from Vladimir region distanced from them. This discordance between the analysis of complete mitogenomes and that of mtDNA data with lower phylogenetic resolution (such as HVSI sequences and haplogroup frequencies) may be explained by the relatively small number of populations characterized so far at the complete mtDNA level. Adding samples from more northern regions of European Russia may slightly change the estimated level of mtDNA differentiation and diversity in Russians.

Bayesian skyline analysis is known to be well suited to detecting relatively recent demographic events [37-40]. Using mitogenomic data, previous studies found that major population expansions in Europe began before the Neolithic and the advent of agriculture, after the LGM, as temperatures started to rise [38,39]. Our results support this conclusion. Furthermore, we were able to obtain more detailed information on demographic expansions in Eastern Europe, probably because of the much larger number of completely sequenced mtDNA genomes used in our study. In particular, we observed an episode of rapid population growth starting ~4.3 kya (95% CI: 2.9-5.8 kya), i.e., in the Bronze Age. This expansion strongly correlates with the Kurgan model established by archaeologists and confirmed by paleogeneticists [41,42]. Notably, the evolutionary ages of most mtDNA lineages specific to eastern and central Europeans (subclades within haplogroups H5, U2e, U3, U4 and U5 [5,7,9]) were estimated at approximately 4 kya (from 2.3 to 5.9 kya), corresponding to the population expansions of the Corded Ware culture (also known as the Battle-Axe culture), which flourished 5.2-3.8 kya in eastern and central Europe [43].

In conclusion, we present population data for 376 Russian and 100 Polish mtDNA genomes.
Overall, the data presented here may serve as a reference database of complete mitogenomes from two populous Slavic-speaking nations of Central and Eastern Europe and may therefore be of interest to both forensic geneticists and anthropologists. The results show that complete mitogenomic data increase the power of discrimination among individuals, which is essential for forensic applications of mtDNA markers. At the same time, these data give better insight into various features of the mtDNA landscape at the population level. A more comprehensive view of European genetic history requires additional complete mtDNA-based studies of different European populations.

Conflict of interest
The authors declare no conflict of interest.

Acknowledgements
This study was funded by the Russian Foundation for Basic Research (grant number 14-04-00131). KS was supported by a grant from the Faculty of Medicine, CM UMK, Poland (MN-4/WL/2016). We are grateful to Maria Perkova (Institute of Biological Problems of the North, Magadan, Russia), Ewa Lewandowska and Aneta Jakubowska (CM UMK, Poland) for technical assistance.

Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:.


References
[1] W. Parson, L. Gusmão, D.R. Hares, J.A. Irwin, W.R. Mayr, N. Morling, et al., DNA Commission of the International Society for Forensic Genetics. DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing, Forensic Sci. Int. Genet. 13 (2014) 134-142, doi:10.1016/j.fsigen.2014.07.010.
[2] B.A. Malyarchuk, I.B. Rogozin, V.B. Berikov, M.V. Derenko, Analysis of phylogenetically reconstructed mutational spectra in human mitochondrial DNA control region, Hum. Genet. 111 (1) (2002) 46-53.
[3] P. Soares, L. Ermini, N. Thomson, M. Mormina, T. Rito, A. Röhl, et al., Correcting for purifying selection: an improved human mitochondrial molecular clock, Am. J. Hum. Genet. 84 (6) (2009) 740-759, doi:10.1016/j.ajhg.2009.05.001.
[4] T. Kivisild, Maternal ancestry and population history from whole mitochondrial genomes, Investig. Genet. 6 (2015) 3, doi:10.1186/s13323-015-0022-2.
[5] B. Malyarchuk, T. Grzybowski, M. Derenko, M. Perkova, T. Vanecek, J. Lazur, et al., Mitochondrial DNA phylogeny in Eastern and Western Slavs, Mol. Biol. Evol. 25 (8) (2008) 1651-1658, doi:10.1093/molbev/msn114.
[6] B. Malyarchuk, M. Derenko, T. Grzybowski, M. Perkova, U. Rogalla, T. Vanecek, I. Tsybovsky, The peopling of Europe from the mitochondrial haplogroup U5 perspective, PLoS One. 5 (4) (2010) e10285, doi:10.1371/journal.pone.0010285.
[7] M. Mielnik-Sikorska, P. Daca, B. Malyarchuk, M. Derenko, K. Skonieczna, M. Perkova, et al., The history of Slavs inferred from complete mitochondrial genome sequences, PLoS One. 8 (1) (2013) e54360, doi:10.1371/journal.pone.0054360.


[8] M. Derenko, B. Malyarchuk, G. Denisova, M. Perkova, A. Litvinov, T. Grzybowski, et al., Western Eurasian ancestry in modern Siberians based on mitogenomic data, BMC Evol. Biol. 14 (2014) 217, doi:10.1186/s12862-014-0217-9.
[9] S. Davidovic, B. Malyarchuk, J.M. Aleksic, M. Derenko, V. Topalovic, A. Litvinov, et al., Mitochondrial DNA perspective of Serbian genetic diversity, Am. J. Phys. Anthropol. 156 (3) (2015) 449-465, doi:10.1002/ajpa.22670.
[10] A. Torroni, C. Rengo, V. Guida, F. Cruciani, D. Sellitto, A. Coppa, et al., Do the four clades of the mtDNA haplogroup L2 evolve at different rates? Am. J. Hum. Genet. 69 (6) (2001) 1348-1356.
[11] R.M. Andrews, I. Kubacka, P.F. Chinnery, R.N. Lightowlers, D.M. Turnbull, N. Howell, Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA, Nat. Genet. 23 (2) (1999) 147.
[12] A. Carracedo, W. Bär, P. Lincoln, W. Mayr, N. Morling, B. Olaisen, et al., DNA Commission of the International Society for Forensic Genetics: guidelines for mitochondrial DNA typing, Forensic Sci. Int. 110 (2) (2000) 79-85, doi:10.1016/S0379-0738(00)00161-4.
[13] H.J. Bandelt, W. Parson, Consistent treatment of length variants in the human mtDNA control region: a reappraisal, Int. J. Legal Med. 122 (1) (2008) 11-21, doi:10.1007/s00414-006-0151-5.
[14] R.S. Just, S.A. Fast, M.K. Scheible, K. Sturk-Andreaggi, A.W. Rock, J.M. Bush, et al., Full mtGenome reference-data: development and characterization of 588 forensic-quality haplotypes representing three U.S. populations, Forensic Sci. Int. Genet. 14 (2015) 141-155, doi:10.1016/j.fsigen.2014.09.021.


[15] P. Librado, J. Rozas, DnaSP v5: a software for comprehensive analysis of DNA polymorphism data, Bioinformatics. 25 (11) (2009) 1451-1452, doi:10.1093/bioinformatics/btp187.
[16] L. Excoffier, H.E. Lischer, Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (3) (2010) 564-567, doi:10.1111/j.1755-0998.2010.02847.x.
[17] M. Stoneking, D. Hedgecock, R.G. Higuchi, L. Vigilant, H.A. Erlich, Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes, Am. J. Hum. Genet. 48 (2) (1991) 370-382.
[18] T. Heinz, M. Pala, A. Gómez-Carballa, M.B. Richards, A. Salas, Updating the African human mitochondrial DNA tree: Relevance to forensic and population genetics, Forensic Sci. Int. Genet. 27 (2017) 156-159, doi:10.1016/j.fsigen.2016.12.016.
[19] A.J. Drummond, M.A. Suchard, D. Xie, A. Rambaut, Bayesian phylogenetics with BEAUti and the BEAST 1.7, Mol. Biol. Evol. 29 (8) (2012) 1969-1973, doi:10.1093/molbev/mss075.
[20] K. Tamura, D. Peterson, N. Peterson, G. Stecher, M. Nei, S. Kumar, MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods, Mol. Biol. Evol. 28 (10) (2011) 2731-2739, doi:10.1093/molbev/msr121.
[21] B. Malyarchuk, M. Derenko, G. Denisova, O. Kravtsova, Mitogenomic diversity in Tatars from the Volga-Ural region of Russia, Mol. Biol. Evol. 27 (10) (2010) 2220-2226, doi:10.1093/molbev/msq065.
[22] M. Stoljarova, J.L. King, M. Takahashi, A. Aaspõllu, B. Budowle, Whole mitochondrial genome genetic diversity in an Estonian population sample, Int. J. Legal Med. 130 (1) (2016) 67-71, doi:10.1007/s00414-015-1249-4.

[23] C. Fraumene, E.M. Belle, L. Castri, S. Sanna, G. Mancosu, M. Cosso, et al., High resolution analysis and phylogenetic network construction using complete mtDNA sequences in Sardinian genetic isolates, Mol. Biol. Evol. 23 (11) (2006) 2101-2111, doi:10.1093/molbev/msl084.
[24] W. Parson, A. Dür, EMPOP – a forensic mtDNA database, Forensic Sci. Int. Genet. 1 (2007) 88-92, doi:10.1016/j.fsigen.2007.01.018.
[25] M. Derenko, B. Malyarchuk, T. Grzybowski, G.A. Denisova, I. Dambueva, M. Perkova, et al., Phylogeographic analysis of mitochondrial DNA in North Asian populations, Am. J. Hum. Genet. 81 (5) (2007) 1025-1041, doi:10.1086/522933.
[26] B.A. Malyarchuk, M. Derenko, M. Perkova, T. Grzybowski, T. Vanecek, J. Lazur, Reconstructing the phylogeny of African mitochondrial DNA lineages in Slavs, Eur. J. Hum. Genet. 16 (9) (2008) 1091-1096, doi:10.1038/ejhg.2008.70.
[27] M. Derenko, B. Malyarchuk, T. Grzybowski, G. Denisova, U. Rogalla, M. Perkova, et al., Origin and post-glacial dispersal of mitochondrial DNA haplogroups C and D in northern Asia, PLoS One. 5 (12) (2010) e15214, doi:10.1371/journal.pone.0015214.
[28] M.G. Palanichamy, C.L. Zhang, B. Mitra, B. Malyarchuk, M. Derenko, T.K. Chaudhuri, Y.P. Zhang, Mitochondrial haplogroup N1a phylogeography, with implication to the origin of European farmers, BMC Evol. Biol. 10 (2010) 304, doi:10.1186/1471-2148-10-304.
[29] M. Derenko, B. Malyarchuk, G. Denisova, M. Perkova, U. Rogalla, T. Grzybowski, et al., Complete mitochondrial DNA analysis of eastern Eurasian haplogroups rarely found in populations of northern Asia and eastern Europe, PLoS One. 7 (2) (2012) e32179, doi:10.1371/journal.pone.0032179.


[30] M. Derenko, B. Malyarchuk, A. Bahmanimehr, G. Denisova, M. Perkova, S. Farjadian, L. Yepiskoposyan, Complete mitochondrial DNA diversity in Iranians, PLoS One. 8 (11) (2013) e80673, doi:10.1371/journal.pone.0080673.
[31] P. Marjoram, P. Donnelly, Pairwise comparisons of mitochondrial DNA sequences in subdivided populations and implications for early human evolution, Genetics. 136 (2) (1994) 673-683.
[32] A.W. Röck, A. Dür, M. van Oven, W. Parson, Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA), Forensic Sci. Int. Genet. 7 (6) (2013) 601-609, doi:10.1016/j.fsigen.2013.07.005.
[33] B. Malyarchuk, M. Derenko, T. Grzybowski, A. Lunkina, J. Czarny, S. Rychkov, et al., Differentiation of mitochondrial DNA and Y chromosomes in Russian populations, Hum. Biol. 76 (6) (2004) 877-900.
[34] T. Grzybowski, B.A. Malyarchuk, M.V. Derenko, M.A. Perkova, J. Bednarek, M. Woźniak, Complex interactions of the Eastern and Western Slavic populations with other European groups as revealed by mitochondrial DNA analysis, Forensic Sci. Int. Genet. 1 (2) (2007) 141-147, doi:10.1016/j.fsigen.2007.01.010.
[35] I. Morozova, A. Evsyukov, A. Kon'kov, A. Grosheva, O. Zhukova, S. Rychkov, Russian ethnic history inferred from mitochondrial DNA diversity, Am. J. Phys. Anthropol. 147 (3) (2012) 341-351, doi:10.1002/ajpa.21649.
[36] A. Kushniarevich, O. Utevska, M. Chuhryaeva, A. Agdzhoyan, K. Dibirova, I. Uktveryte, et al., Genetic heritage of the Balto-Slavic speaking populations: a synthesis of autosomal, mitochondrial and Y-chromosomal data, PLoS One. 10 (9) (2015) e0135820, doi:10.1371/journal.pone.0135820.


[37] C.R. Gignoux, B.M. Henn, J.L. Mountain, Rapid, global demographic expansions after the origins of agriculture, Proc. Natl. Acad. Sci. USA. 108 (15) (2011) 6044-6049, doi:10.1073/pnas.0914274108.
[38] H.X. Zheng, S. Yan, Z.D. Qin, L. Jin, MtDNA analysis of global populations support that major population expansions began before Neolithic Time, Sci. Rep. 2 (2012) 745, doi:10.1038/srep00745.
[39] S. Lippold, H. Xu, A. Ko, M. Li, G. Renaud, A. Butthof, et al., Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences, Investig. Genet. 5 (2014) 13, doi:10.1186/2041-2223-5-13.
[40] M. Karmin, L. Saag, M. Vicente, M.A. Wilson Sayres, M. Järve, U.G. Talas, et al., A recent bottleneck of Y chromosome diversity coincides with a global change in culture, Genome Res. 25 (4) (2015) 459-466, doi:10.1101/gr.186684.114.
[41] M.E. Allentoft, M. Sikora, K.G. Sjögren, S. Rasmussen, M. Rasmussen, J. Stenderup, et al., Population genomics of Bronze Age Eurasia, Nature. 522 (7555) (2015) 167-172, doi:10.1038/nature14507.
[42] W. Haak, I. Lazaridis, N. Patterson, N. Rohland, S. Mallick, B. Llamas, et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature. 522 (7555) (2015) 207-211, doi:10.1038/nature14317.
[43] M. Gimbutas, The Slavs, New York: Praeger Publishing, 1971.


Titles and legends to figures

Figure 1. Geographic locations of the Russian and Polish samples used in this study.

Figure 2. BSP indicating the median of the hypothetical effective population size through time based on 376 Russian complete mtDNA genome sequences. The x-axis is the time from the present in units of thousands of years, and the y-axis is equal to Neμ (the product of the effective population size and mutation rate). The thick solid line represents the median posterior effective population size through time, and the thin lines show the 95% highest posterior density limits.

Figure 3. MDS plot based on Fst values calculated from complete mtDNA sequences for population samples from Europe.


Table 1. Summary statistics for Russians, Poles, Estonians, Volga Tatars and Sardinians based on complete mitogenome sequences

Population | N | k | s | Haplotype diversity | Nucleotide diversity | MPD | Tajima's D (p value)
Russians (total) | 376 | 361 | 1097 | 1 ± 0 | 0.0018 ± 0.0001 | 29.02 | -2.57 (< 0.001)
  Belgorod | 64 | 64 | 437 | 1 ± 0.003 | 0.0018 ± 0.0001 | 30.18 | -2.39 (< 0.01)
  Orel | 48 | 48 | 310 | 1 ± 0.004 | 0.0017 ± 0.0001 | 28.4 | -2.18 (< 0.01)
  Tula | 59 | 59 | 418 | 1 ± 0.003 | 0.0018 ± 0.0002 | 29.38 | -2.42 (< 0.01)
  Vladimir | 73 | 71 | 433 | 0.999 ± 0.002 | 0.0019 ± 0.0001 | 31.38 | -2.27 (< 0.01)
  Pskov | 68 | 66 | 368 | 0.999 ± 0.003 | 0.0016 ± 0.0001 | 26.88 | -2.29 (< 0.01)
  Velikiy Novgorod | 64 | 63 | 404 | 1 ± 0.003 | 0.0017 ± 0.0001 | 27.99 | -2.39 (< 0.01)
Poles | 100 | 97 | 582 | 0.999 ± 0.002 | 0.002 ± 0.0001 | 32.48 | -2.43 (< 0.01)
Estonians | 119 | 106 | 481 | 0.999 ± 0.001 | 0.0017 ± 0.0001 | 27.93 | -2.33 (< 0.01)
Volga Tatars | 73 | 68 | 507 | 0.998 ± 0.003 | 0.0021 ± 0.0001 | 35.2 | -2.33 (< 0.01)
Sardinians | 63 | 50 | 234 | 0.992 ± 0.004 | 0.0015 ± 0.0001 | 24.31 | -1.8 (< 0.05)

N – number of individuals in each population; k – number of haplotypes; s – number of polymorphic sites; MPD – mean of pairwise differences between mtDNA sequences.
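Tajima's D in the last column of Table 1 can be recomputed from N, s and MPD with the standard formula for Tajima's D under the infinite-sites model. This minimal Python sketch assumes that s is the number of segregating sites entering the formula. With the Russian totals (N = 376, s = 1097, MPD = 29.02) it gives about -2.56, and with the Sardinian values about -1.8, matching the tabulated -2.57 and -1.8 to within rounding. The p values in the table come from DnaSP and are not reproduced here.

```python
import math


def tajimas_d(n: int, s: int, pi: float) -> float:
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences pi."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = s / a1  # Watterson's estimator of theta
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(round(tajimas_d(376, 1097, 29.02), 2))  # -2.56 (Table 1: -2.57)
print(round(tajimas_d(63, 234, 24.31), 2))    # -1.8  (Table 1: -1.8)
```

A strongly negative D arises when the pairwise-difference estimate of theta (MPD) falls well below Watterson's estimate from the number of segregating sites. This is the excess of rare variants expected after population expansion.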
