Forensically relevant phylogeographic evaluation of mitogenome variation in the Basque Country


Forensic Science International: Genetics 46 (2020) 102260

Contents lists available at ScienceDirect

Forensic Science International: Genetics journal homepage: www.elsevier.com/locate/fsigen


Óscar García (a,**), Santos Alonso (b), Nicole Huber (c), Martin Bodner (c), Walther Parson (c,d,*)

a Forensic Science Unit, Forensic Genetics Section, Basque Country Police, Erandio (Bizkaia), Spain
b Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country, Spain
c Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria
d Forensic Science Program, The Pennsylvania State University, University Park, Pennsylvania, USA

ARTICLE INFO

Keywords: Mitochondrial DNA; Massively parallel next generation sequencing; Haplogroups; Phylogeny; Phylogeography; Forensic science

ABSTRACT

The Basque Country has been the focus of population genetic and evolutionary studies for decades because of a distinctive feature: it is the only region in Europe where a non-Indo-European language with no known living or extinct relatives is still spoken today. Early studies based on anatomical and serological methods, as well as subsequent molecular genetic investigations, have produced controversial interpretations of their data. Several studies have also analysed mitochondrial DNA (mtDNA), which is maternally inherited and thus suitable for examining the maternal phylogeny of a population. Early mtDNA studies, however, were restricted to the control region or its hypervariable segments. These are known to harbour little phylogenetic information, particularly for haplogroup H, which is dominant in West Eurasian populations including the Basques. Later studies analysed complete mitogenome sequences. Their information content is limited, however, either because the number of samples was low or because only particular haplogroups were considered. In this study we present the full mitogenome sequences of 178 autochthonous Basque individuals who were carefully selected based on their familial descent. We discuss the observed phylogenetic signals in the light of earlier published findings. We confirm the presence of Basque-specific mtDNA lineages and extend the knowledge of these lineages by providing data on their distribution compared with other Basque and non-Basque populations. This dataset improves our understanding of the Basque mtDNA phylogeny and serves as a high-quality dataset, provided via EMPOP, for forensic genetic purposes.

* Corresponding author at: Institute of Legal Medicine, Medical University of Innsbruck, Müllerstrasse 44, 6020 Innsbruck, Austria.
** Corresponding author at: Forensic Science Unit, Forensic Genetics Section, Basque Country Police, Larrauri Mendotxe Bidea 18, 48950 Erandio (Bizkaia), Spain.

https://doi.org/10.1016/j.fsigen.2020.102260
Received 12 August 2019; Received in revised form 26 November 2019; Accepted 1 February 2020; Available online 06 February 2020
1872-4973/© 2020 Elsevier B.V. All rights reserved.

1. Introduction

Basque is a non-Indo-European language that is today spoken mainly on both sides of the western Pyrenees along the Bay of Biscay. It is the only language in Europe that has no known living or extinct relatives. Consequently, Basques have long been the subject of demographic and evolutionary studies. In the 19th century, Paul Broca's idea that the morphology of the cranium provides useful information for establishing relationships between populations was in vogue. In this context, human remains showing a so-called 'Basque cranial type' were found in at least some prehistoric sites of the area. These finds were taken as support for the hypothesis of a local evolution of the Basque population since the Upper Palaeolithic [1]. Although this hypothesis was later proven wrong [2], the idea of an indigenous, local evolution of Cro-Magnon humans towards the 'Basque type' persisted. Basques were found to show very high frequencies of Rhesus-negative blood types as well as high frequencies of blood group 0 [3]. This was interpreted as indicating that Basques could be a relict population without significant admixture of elements common in the general western European populations [4]. This line of argument was later supported by several independent studies of classical and genetic markers [5–10]. These data supported a pre-Neolithic origin and an in situ evolution for at least some of the lineages of present-day Basques.

Based on genome-wide data, Laayouni et al. [11] found that Basques are not the most differentiated of the Spanish populations. They concluded that the presumed genetic individuality of the Basques does not really exist when analysed at the genomic level. By contrast, Rodriguez Ezpeleta et al. [12] analysed genome-wide SNPs and concluded that Basques (French and Spanish) form a homogeneous population that appears to cluster apart from the rest of the European populations. Nevertheless, the inclusion of further samples of "resident" Basque individuals in the Basque Country (individuals with at least one grandparent from outside the Basque Country) seems to fill the gap between Basques and other Europeans. Some of these "resident" Basques nonetheless fit well into the presumed Basque cluster.

At the mitochondrial DNA (mtDNA) level, which is the focus of this study, the early work of Bertranpetit et al. [13] analysed hypervariable segment I (HVS-I) of the control region (CR). It could not confirm this distinctiveness of the Basques. Instead, it reflected an apparent lack of geographical clustering of European populations, possibly due to the high mutation rates and relatively low diversity indices for HVS-I. Later, Richards et al. [14] demonstrated that the Basques were the only consistently distinct European population and explained their genetic individuality by isolation and drift. Nevertheless, several researchers continued to use Basques as representatives of the first Upper Palaeolithic settlers (e.g. [15]). A broad mtDNA study of European populations including Basques focused on the distribution of haplogroup V [16]. The authors reported higher diversity in Basques, with diversity declining towards the North. They attributed this to a post-glacial population expansion from an area comprising the Basque Country towards Northern Europe. Follow-up studies, however, failed to confirm the existence of this haplogroup in prehistoric (4–5 thousand years ago [kya]) and historic (6th–7th centuries BC) times [17,18]. A more recent study also found no evidence to support the Franco-Cantabrian refuge-expansion hypothesis. All its measures of gene diversity pointed to the Cantabrian fringe in general, and the Basques in particular, as being less polymorphic for mtDNA haplogroups V, H1 and H3 than other southern regions in Iberia or Central Europe [19]. These findings contrast with those of Achilli et al. [20]. Their results suggest that the Franco-Cantabrian refuge area was indeed the source of late-glacial expansions of hunter-gatherers that repopulated Central and Northern Europe after the Last Glacial Maximum.

In the forensic genetic context, this study adds full mitochondrial genome (mitogenome) sequences of 178 autochthonous Basque individuals to the body of data relevant for forensic purposes. This is particularly important because the forensic field has moved to massively parallel sequencing (MPS) of mtDNA. MPS enables the generation of full mitogenome sequences even from degraded forensically relevant specimens (e.g. [21–25]). Entire mitogenome sequences provide substantially increased discrimination power (e.g. [26]) and more detailed haplogroup information (e.g. [22]). This study also sheds more light on the maternal phylogenetic background of this population. It provides a reference dataset at the highest resolution for future population genetic and forensic studies. One mitogenome sequence had been published earlier [19]; it was added to the 177 novel mitogenomes from this study because it fitted the sampling scheme.

2. Material and methods

2.1. Samples and DNA extraction

Blood samples were collected from 178 unrelated, healthy, autochthonous Basque donors following informed consent. Individuals were considered autochthonous based on the eight surnames and the birthplaces of their grandparents. This sampling scheme excludes recent migration, mainly from the early to mid-20th century, which resulted from the strong development of the Basque industrial sector. The same samples were previously analysed for autosomal STRs [27], Y-chromosomal STRs [28], SNPs [29,30] and INDELs [31]. DNA was extracted using a standard phenol-chloroform procedure.

2.2. Library preparation, emulsion PCR and enrichment

Samples were amplified with the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific (TFS), MA, USA) using 1 ng of genomic DNA. DNA libraries were prepared using the Ion AmpliSeq DL8 chemistry on the Ion Chef instrument (TFS) according to the manufacturer's recommendations. Libraries were quantified using the Ion Library Quantitation Kit (TFS) and pooled at an equimolar concentration of 30 pM for template preparation. Template preparation, enrichment of template-containing beads and chip loading were performed on the Ion Chef instrument with the Ion PGM Hi Q Kit, according to the manufacturer's recommendations (TFS).

2.3. Sequencing and data analyses

Ion 318 v2 chips (TFS) were sequenced on the Ion PGM instrument (TFS) with the Ion PGM Hi Q View Chef Supplies Kit (TFS). Primary sequencing data were aligned and reported relative to the rCRS (GenBank: NC_012920 [32]) using the Torrent Suite Software v5.2.1 (TFS) under default settings. Secondary sequence analyses were performed using the Coverage Analysis (v5.2.1.2) and Variant Caller (v5.2.1.38) plugins. Some scored variants in the VCF (variant call format) files showed unexpected or missing mutations. For these, we manually checked the corresponding BAM (binary alignment map) files with the Integrative Genomics Viewer (IGV v2.3.72; Broad Institute, MA, USA [33]) and GeneMarker HTS (demo version, v1.0.17.1266; SoftGenetics, PA, USA [34]). Mitochondrial haplogroups were estimated using HaploGrep 2 [35] according to PhyloTree Build 17 (www.phylotree.org [36]). They were then confirmed and/or adapted using SAM2 [37], provided by EMPOP (https://empop.online [38]). The FASTA-format sequences are available at GenBank under accession numbers MN046411–MN046587 and GQ888723 (mitogenome Basque_98 [19]). They are also available on EMPOP in phylogenetic alignment [37,39] under accession number EMP00756. The entire dataset underwent EMPOP quality control according to [38].

2.4. Statistical analyses

Mean read length and coverage were determined according to [40]. Twenty-four randomly selected samples were used to determine strand bias and forward/reverse strand coverage. Strand balance was calculated by dividing the forward-strand coverage by the total coverage. Strand bias was assumed when the resulting value was < 0.3 or > 0.7. Coverage was estimated as the number of reads multiplied by the read length, divided by the effective mitochondrial genome size (16,568 bp). General data handling and statistical testing were performed with Excel 2016 (Microsoft Corporation, Redmond, WA).

Inter- and intra-population comparisons, forensic and population genetic molecular diversity indices and the analysis of molecular variance (AMOVA) were calculated with Arlequin v3.5.1.2 [41]. These calculations used the two hypervariable segments (HVS-I: nps 16024–16365; HVS-II: nps 73–340), excluding homopolymeric C-stretches. Calculations for the 177 mitogenomes presented in this study, plus the one mitogenome published earlier [19], were performed on complete mitogenomes and on CR data. Both excluded the C-stretches around nps 309, 573 and 16193. Power of discrimination, haplotype diversity and random match probability (RMP) were calculated as suggested in [42].

The novel Basque population was compared with 18 published mtDNA datasets from surrounding Basque and non-Basque populations from Spain, Portugal and France. The comparison used the HVS-I and HVS-II regions excluding cytosine runs (nps 16024–16193, 16194–16365, 73–309, 310–340). The 178 mitotypes presented herein were compared with a total of 3236 previously published mitotypes, comprising (i) six further Basque population datasets (1157 samples): the Basque samples included in Prieto et al. 2011 (n = 84, EMP00293 [43]), Cardoso et al. 2012 (n = 106, EMP00365 [44]), Cardoso et al. 2013 (n = 100 [45]), Núñez et al. 2016 (n = 107, EMP00556 and EMP00557 [46]), Palencia-Madrid
et al. 2017 (n = 158, EMP00668 [47]), as well as the mitotypes from Araba, Biscaye, Biscaye W, Guipuscoa, Guipuscoa SW, Labourdin, Navarra CW, Navarra NE, Navarra NW, Navarra France and Soule published in Behar et al. 2012 [48] (n = 602, excluding those exhibiting base mis-scoring [49]); (ii) three population datasets from Portugal (523 samples): the Portuguese samples included in Prieto et al. 2011 (n = 240, EMP00292, EMP00294, EMP00295 [43]), Marques et al. 2015 (n = 162, EMP00552, EMP00553, EMP00554 [50]) and Mairal et al. 2013 (n = 121, EMP00617 [51]); (iii) one population dataset from France (162 samples): the French samples from Bigorre, Bearn and Chalosse included in Behar et al. 2012 [48] (excluding those exhibiting base mis-scoring); and (iv) eight population datasets from Spain (1394 samples): Ramos et al. 2013 (n = 101, EMP00555 [52]), Crespillo et al. unpublished (n = 154, EMP00023), Lopez Parra et al. unpublished (n = 438, EMP000533), the Spanish samples included in Prieto et al. 2011 (n = 154, EMP00024; n = 249, EMP00290 [43]), Cardoso et al. 2010 (n = 61, EMP00400 [53]), Mairal et al. 2013 (n = 113, EMP00618 [51]) and the Spanish samples from Aragon, Burgos, Cantabria and La Rioja included in Behar et al. 2012 (n = 124 [48]). A multidimensional scaling (MDS) plot was created from pairwise FST values using SPSS v22 (IBM, Armonk, NY) to visualize the genetic relationships among these populations.

3. Results and discussion

3.1. Overall run performance

We analysed 178 mitogenomes from autochthonous Basque individuals using a total of 17 Ion 318 v2 chips (PGM). The mean raw accuracy values were above 99 %, i.e. more than 99 % of the aligned bases matched the reference sequence. An average ISP (Ion Sphere Particle) loading of 76 % was obtained, and the average proportion of final usable reads was 47 % (Table S1). The average number of reads generated for all sequenced samples was 33,067,553, with a mean read length of 117 bp. The total number of reads per sample ranged from 96,601 to 773,633, with an average of 269,728 reads per sample (Table S2). Read depth ranged from 676 to 6306 reads, with an average of 2008 reads across all samples (Table S2). The raw data of 24 randomly selected samples were used to study forward/reverse sequence coverage (Fig. S1) and strand bias (Fig. S2). Our data confirm earlier findings (e.g. [54]). A few coverage dips were observed along the mitogenome, e.g. between positions 10409–10464, while the majority of regions were covered by both sequencing strands. Most analysed regions fell within the 30–70 % strand-balance range. The exceptions were positions 546–593, 7332–7351, 10945–11000, 12975–13073, 13650–13698 and 14754–14843 (Fig. S2). A possible explanation for the low coverage and strand bias detected in these regions is the flow-based base calling of the proton-based Ion system in homopolymer regions [55–57]. The affected regions and their homopolymers were 10409–10464 (3–4-mers), 546–593 (3–6-mers), 7332–7351 (4-mer), 10945–11000 (4–6-mers), 12975–13073 (3–4-mers), 13650–13698 (3–4-mers) and 14754–14843 (3–5-mers).

3.2. Forensic and population genetic parameters

We present the analysis of entire mitogenome sequences from 177 novel and one published [19] autochthonous Basque individuals. They were selected based on their family names and the names and birthplaces of their grandparents. This Basque sample (n = 178) contained 141 (79.2 %) different mitogenomes and 96 (53.9 %) different control region sequences, disregarding length variants around nps 16193, 309 and 573. Of these, 122 (86.5 %) and 67 (69.8 %), respectively, were singletons (Table 1). For the mitogenome and the control region, the RMP was 1.08 % and 2.25 %, respectively. The power of discrimination was 99.5 % and 98.3 %, respectively. The mean number of pairwise differences (MNPD) was 24.8 ± 10.92 and 7.8 ± 3.64, respectively. This confirms the higher discriminatory capacity of mitogenomes compared with the standard CR range in this population as well (Table 1).

The most common mitogenome, with nine observations (5.1 %), belonged to haplogroup U5b1f1a (PhyloTree Build 17 [36]). Its mitotype is 73G 150T 263G 315.1C 533G 750G 1438G 2706G 3197C 3507A 4769G 5656G 7028T 7768G 8860G 9477A 11467G 11719A 12308G 12372A 13617C 14182C 14766T 15326G 16192T 16270T 16319A (relative to the rCRS [32]). Two mitogenomes followed with five observations each (2.8 %). The first is the haplogroup U5b1g mitogenome 73G 150T 151T 228A 263G 315.1C 750G 1438G 2706G 3197C 4769G 5656G 7028T 7768G 8860G 9477A 10654T 11467G 11719A 12308G 12372A 13617C 13759A 14182C 14577C 14766T 15326G 16192T 16270T 16519C. The second is the haplogroup V22 mitogenome 72C 150T 263G 315.1C 709A 750G 1438G 2706G 4580A 4769G 7028T 7765G 8860G 15326G 15904T 16298C. The top-ranking mitogenome was also the most frequent when considering only the CR range, with 12 observations (6.7 %). Another haplotype, 152C 263G 315.1C 456T 16304C, occurred with the same frequency. The third most common CR mitotype was 263G 315.1C 16129A 16519C, with nine observations (5.1 %).

We observed 16 instances of point heteroplasmy (about 9 % of the samples). They occurred at positions 1555R, 3497Y, 4659R, 4811R, 4935R, 5460R, 6248Y, 6267R, 9993Y, 11914R, 12286R, 12295Y, 13111Y, 14305R, 14318Y and 15916Y. The heteroplasmy threshold was set at 10 % of total coverage.

3.3. Phylogeographic evaluation

Numerous (mito-)genetic studies have aimed to shed light on the genetic history of Basque populations. Early investigations focused on their uniqueness and homogeneity, given the varying frequencies of genetic markers found in different Basque samples [58–60]. One study compared mtDNA and Y-chromosomal diversity in Basque- and non-Basque-speaking neighbouring populations [61]. It found that only 2 % of the variation was attributable to inter-group differences. These differences were best explained by geographical assignment to the hypothetical tribal structure described by Roman historians, and no genetic structure relating to Basque dialects was found [61]. A pivotal study observed that haplogroup V diversity declined towards the North [16]. From this, it inferred a northward population expansion from a refuge comprising the Basque Country ∼12–13 kya, probably after the Last Glacial Maximum. Later studies did not confirm this hypothesis. The Cantabrian fringe populations, in particular the Basques, were found to be less diverse for mtDNA haplogroups V, H1 and H3 than other southern regions in Iberia or Central Europe [19]. No genetic continuity since prehistoric (4–5 kya) and historic (6th–7th centuries BC) times was found either [17,18]. Another study reported that Basques, like all Iberian populations, underwent a 40 % replacement of their ancestry [62]. This included an almost complete replacement of male lineages by Pontic Steppe and Northern and Central European populations arriving during the Bronze Age and Iron Age [62]. Present-day Basques were described as a typical Iron Age population without the Mediterranean and North African admixture events that later affected the rest of Iberia. The replacement did not, however, affect the language: Basque is the only surviving pre-Indo-European language in Europe [62].

On the maternal side, Basque-specific mtDNA lineages analysed at maximum resolution led to the postulate of partial genetic continuity between contemporary Basques and the preceding Palaeolithic and Mesolithic settlers [48]. Ten mtDNA clades have since been suggested as autochthonous to the Basques, based on their frequency and dispersal patterns. These are H1e1a1, H1j1, H1t1, H1av1, H3c2a [48], H2a5a1 [48,63], HV4a1a [64], J1c5c1, U5b1f1a and V22 [45]. The 178 mitogenomes presented in this study revealed a broad West Eurasian haplogroup spectrum. Using all available sequence information, they were classified into 86 named clades according to PhyloTree Build 17 [36]. The top-ranking haplogroups were U5b1f1a (17 occurrences, 9.55 %), V22 (7 occurrences, 3.93 %) and H1j1a (6 occurrences, 3.37 %).
At the other end of the spectrum, 53 clades were represented by a single sample each in this dataset (Table S3). The mitogenome-based clades can be grouped into their descriptive, yet not coequal, "superhaplogroups" at the one-letter level (plus HV0 and R0). This grouping revealed major contributions of superhaplogroups H (48.9 %) and U (20.2 %) and smaller proportions of V (8.4 %), K (7.3 %), T (7.3 %), J (3.3 %) and X (2.8 %). HV0, M and W were each observed once (0.6 % each) (Fig. S3). All ten mtDNA lineages (including subclades) described as Basque-specific in the literature (see above) were also found in the present dataset. Together they comprised 34.27 % of the samples (Table S3). Comparing the novel complete mitogenome data with the previous studies identified six further potentially Basque-specific mtDNA clades:
(i) H1b4 (four published samples [48], two in this study);
(ii) H1c4a1 (seven published samples [45–48], five in this study);
(iii) H1r1 (four published samples [19,45,48], two in this study);
(iv) H3at1 (five published samples [48], three in this study);
(v) H5a3a1 (eight published samples [29–32], three in this study); and
(vi) H14b1 (three published samples [48], three in this study) (Table S3).
Including these six clades, a total of 44.38 % of our samples may be attributed to 16 potentially autochthonous Basque mtDNA lineages.

Table 1
Forensic and population genetic parameters of the 178 mitotypes. (a) Mitogenome range, excluding the C-stretches around nps 309, 573 and 16193. (b) CR range 16024–576, excluding the C-stretches around nps 309, 573 and 16193. (c) HVS-I and HVS-II range, excluding the C-stretches around nps 309 and 16193. MNPD: mean number of pairwise differences; RMP: random match probability; hg: haplogroup.

Range | Segments (nps) | No. of samples | No. of different mitotypes | No. of singletons | MNPD | Power of discrimination (%) | RMP (%) | No. of hgs | Most frequent hg
Mitogenome (a) | 1–309, 310–573, 574–16193, 16194–16569 | 178 | 141 | 122 | 24.8 ± 10.9 | 99.5 | 1.079 | 86 | 17 occurrences
Control region (b) | 16024–16193, 16194–309, 310–573, 574–576 | 178 | 96 | 67 | 7.8 ± 3.6 | 98.3 | 2.254 | 62 | 25 occurrences
HVS-I/HVS-II (c) | 16024–16193, 16194–16365, 73–309, 310–340 | 178 | 85 | 56 | 5.8 ± 2.8 | 97.2 | 3.358 | 59 | 29 occurrences

3.4. Population genetic comparison of the 178 mitotypes with surrounding Basque and non-Basque populations

The comparison used the largest common sequenced segment: nps 16024–16193, 16194–16365, 73–309 and 310–340 (HVS-I and HVS-II excluding length variants; Table S4). In this segment, the novel Basque dataset shared 81.2 % of its haplotypes (comprising 91.0 % of its samples) with the 18 surrounding population datasets gathered from the literature. The largest proportion of shared haplotypes was found with the six Basque populations (1157 samples [43–48]). On average, 34.0 % (median 34.6 %) of their haplotypes were shared with the novel dataset, comprising 54.1 % (median 57.0 %) of their samples. The French sample (n = 162 [48]) ranked second, sharing 17.5 % of its haplotypes, which comprised 34.0 % of its samples. The eight Spanish population samples ranked third (n = 1394; Ramos et al. 2013 [52], Crespillo et al. unpublished, Lopez Parra et al. unpublished, Prieto et al. 2011 [43], Cardoso et al. 2010 [53], Mairal et al. 2013 [51], Behar et al. 2012 [48]). They shared an average of 15.2 % (median 14.7 %) of haplotypes and 28.0 % (median 26.3 %) of samples. The geographically most distant set, the three datasets from Portugal (n = 523; Prieto et al. 2011 [43], Marques et al. 2015 [50], Mairal et al. 2013 [51]), shared the fewest haplotypes (9.2 %, median 10.3 %) and samples (21.9 %, median 22.2 %). Only haplotype 263G 315.1C was shared among all populations in the investigated segment.

Comparisons between the novel Basque dataset and the 18 published populations in this segment yielded generally low pairwise FST values. The majority were significant: 15 at a nominal cut-off of 5 % and 10 after a standard Bonferroni correction. The single French dataset showed the least differentiation (0.002), followed by the six published Basque samples (mean and median 0.006), the eight Spanish samples (mean 0.011, median 0.008) and the three Portuguese samples (mean 0.016, median 0.014) (Table S5, Table S6). AMOVA revealed that 99.08 % of the variation was attributable to differences within populations and only 0.92 % to differences among populations (Table S7). In the correspondence analysis performed on pairwise FST values, the novel Basque population (this study) clustered together with the six published Basque datasets. This Basque cluster was closest to a cluster formed by the Spanish and French datasets, while the Portuguese samples were located slightly further away. The smallest (Spanish) population set included in this study [53] derived from an isolated population (Pas Valley, Cantabria). Consistent with this isolation, it was also an outlier in this analysis (Fig. S4).

4. Conclusions

As Michelena [65] put it over fifty years ago, "[it] can be assured that the search will continue in the future undeterred by the scarcity of previous harvests. May Fortune, friend always to the bold and persevering, ever deign to smile to us". Part of the problem may be that these studies have traditionally been interpretive. Researchers first carried out a population genetic study and only then looked to historical events in an attempt to explain the observed results. This is bound to lead to contradictory interpretations, because "almost any finding of population genetics may be associated with a historical episode potentially accounting for it, but also with other historical episodes suggesting the contrary" [66]. Moreover, preconceived ideas can inadvertently interfere with rigorous interpretation. Olalde et al. [62] suggest that approximately 40 % of the local population and almost 100 % of the men in Iberia were replaced with the arrival of people from the Steppe in the Bronze Age (see above). If so, an apparent contradiction arises. Basques have a strong Indo-European component at the genetic level, yet they speak the only surviving pre-Indo-European language in Europe. However, the replacement seems to have affected men in particular rather than women. This would explain our confirmed observation of Basque-specific mtDNA lineages that predate that period [45] and might even go back to the Palaeolithic [48,67]. It would also explain the maintenance of the Basque language, possibly through maternal transmission.

Supplementary material

The complete mitogenome sequences are also available from GenBank (accession numbers MN046411–MN046587 for the 177 novel sequences). Mitogenome Basque_98 was previously deposited (accession number GQ888723) [19].
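The quantities defined in Section 2.4 and reported in Sections 3.1 and 3.2 and Table 1 can be illustrated with a short Python sketch. These are the RMP, the power of discrimination, strand balance, coverage and the heteroplasmy threshold. The sketch assumes that the RMP is the sum of squared haplotype frequencies. It also assumes that the power of discrimination corresponds to the unbiased haplotype diversity n/(n - 1)(1 - RMP). With n = 178 and RMP = 1.079 %, this gives about 0.995, in line with the 99.5 % in Table 1. All function names, the toy haplotype labels and the ">= 10 %" reading of the heteroplasmy threshold are hypothetical. This is not the software actually used, which was Torrent Suite, Arlequin and Excel.

```python
"""Illustrative forensic and sequencing-QC metrics (hypothetical helper names)."""
from collections import Counter
from typing import Hashable, Iterable

# Effective mitogenome length used as the coverage denominator (Section 2.4).
EFFECTIVE_MITOGENOME_BP = 16_568


def random_match_probability(haplotypes: Iterable[Hashable]) -> float:
    """Sum of squared haplotype frequencies: chance that two random draws match."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity_from_rmp(rmp: float, n: int) -> float:
    """Unbiased haplotype diversity n/(n-1) * (1 - RMP); assumed to equal the power of discrimination."""
    return n / (n - 1) * (1.0 - rmp)


def haplotype_diversity(haplotypes: Iterable[Hashable]) -> float:
    """Unbiased haplotype diversity computed directly from a list of haplotypes."""
    haps = list(haplotypes)  # materialise once so it can be both counted and measured
    return haplotype_diversity_from_rmp(random_match_probability(haps), len(haps))


def strand_balance(forward_cov: int, reverse_cov: int) -> float:
    """Forward-strand coverage divided by total coverage."""
    return forward_cov / (forward_cov + reverse_cov)


def is_strand_biased(forward_cov: int, reverse_cov: int,
                     low: float = 0.3, high: float = 0.7) -> bool:
    """Flag strand bias when the balance falls below 0.3 or above 0.7."""
    balance = strand_balance(forward_cov, reverse_cov)
    return balance < low or balance > high


def mean_coverage(n_reads: int, mean_read_length: float,
                  genome_bp: int = EFFECTIVE_MITOGENOME_BP) -> float:
    """Lander-Waterman style estimate: reads x read length / genome size."""
    return n_reads * mean_read_length / genome_bp


def is_point_heteroplasmy(minor_reads: int, total_reads: int,
                          threshold: float = 0.10) -> bool:
    """Assumption: a minor allele at >= 10 % of total coverage is reported."""
    return minor_reads / total_reads >= threshold


if __name__ == "__main__":
    toy = ["hapA", "hapA", "hapB", "hapC"]  # toy haplotype labels, not study data
    print(random_match_probability(toy))  # 0.375
    print(haplotype_diversity(toy))
    # Table 1 mitogenome values (n = 178, RMP = 1.079 %)
    print(round(haplotype_diversity_from_rmp(0.01079, 178), 3))  # 0.995
    print(is_strand_biased(10, 90), is_strand_biased(50, 50))  # True False
    # Average reads per sample and mean read length from Section 3.1
    print(round(mean_coverage(269_728, 117), 1))  # 1904.8
```

Applying the same formula to the CR and HVS-I/HVS-II RMP values in Table 1 (2.254 % and 3.358 %) gives about 0.983 and 0.972. These also match the tabulated power of discrimination, which supports, but does not prove, the assumed definition. The coverage estimate from the average read count is close to, but not identical with, the 2008-read average in Table S2. A possible reason is that per-sample estimates were averaged rather than computed from pooled means.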


Acknowledgements

The authors would like to thank all individuals who donated their DNA for this study. SA was funded by the Spanish Ministry of Economy, Industry and Competitiveness (CGL2014-58526-P) and the Basque Government (IT1138-16). The authors would also like to thank David Ballard (London) for reviewing and commenting on the manuscript.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.fsigen.2020.102260.

References

[1] T. Aranzadi, J.M. Barandiarán, Excavaciones de la cueva de Urtiaga 1928–1936, Eusko Jakintza 11 (1948) 285–330.
[2] J. Altuna, C. de la Rúa, Dataciones absolutas de los cráneos del yacimiento prehistórico de Urtiaga, Munibe 41 (1989) 23–28.
[3] M.A. Etcheberry, El factor rhesus, su genética e importancia clínica, El Día Médico 17 (1945) 1237.
[4] A.E. Mourant, The blood groups of the Basques, Nature 160 (1947) 505–506.
[5] L.L. Cavalli-Sforza, The Basque population and ancient migrations in Europe, Munibe (Antropología y Arqueología) 6 (1988) 129–137.
[6] J. Bertranpetit, L.L. Cavalli-Sforza, A genetic reconstruction of the history of the population of the Iberian Peninsula, Ann. Hum. Genet. 55 (1991) 51–67.
[7] F. Calafell, J. Bertranpetit, Principal component analysis of gene frequencies and the origin of Basques, Am. J. Phys. Anthropol. 93 (1994) 201–215.
[8] L.L. Cavalli-Sforza, P. Menozzi, A. Piazza, The History and Geography of Human Genes, Princeton University Press, Princeton (NJ), 1994.
[9] S. Alonso, J.A.L. Armour, MS205 minisatellite diversity in Basques: evidence for a pre-Neolithic component, Genome Res. 8 (1998) 1289–1298.
[10] S. Alonso, C. Flores, V. Cabrera, A. Alonso, P. Martín, C. Albarrán, N. Izagirre, C. de la Rúa, O. García, The place of the Basques in the European Y-chromosome diversity landscape, Eur. J. Hum. Genet. 13 (2005) 1293–1302.
[11] H. Laayouni, F. Calafell, J.

[24] C. Strobl, M. Eduardoff, M.M. Bus, M. Allen, W. Parson, Evaluation of the Precision ID whole MtDNA genome panel for forensic analyses, Forensic Sci. Int. Genet. 35 (2018) 21–25.
[25] C. Strobl, J.C. Cichlar, R. Lagacé, S. Wootton, C. Roth, N. Huber, L. Schnaller, B. Zimmermann, G. Huber, S.L. Hong, R. Moura-Neto, R. Silva, F. Alshamali, L. Souto, K. Anslinger, B. Egyed, R. Jankova-Ajanovska, A. Casas-Vargas, W. Usaquén, D. Silva, C. Barletta-Carrillo, D.H. Tineo, C. Vullo, R. Würzner, C. Xavier, L. Gusmão, H. Niederstätter, M. Bodner, B. Budowle, W. Parson, Evaluation of mitogenome sequence concordance, heteroplasmy detection, and haplogrouping in a worldwide lineage study using the Precision ID mtDNA Whole Genome Panel, Forensic Sci. Int. Genet. 42 (2019) 244–251.
[26] R.S. Just, J.A. Irwin, W. Parson, Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing, Forensic Sci. Int. Genet. 18 (2015) 131–139.
[27] O. García, I. Uriarte, P. Martín, C. Albarrán, A. Alonso, STR data from Basque Country autochthonous population, Forensic Sci. Int. 115 (2001) 111–112.
[28] S. Alonso, C. Flores, V. Cabrera, A. Alonso, P. Martín, C. Albarrán, N. Izagirre, C. de la Rúa, O. García, The place of the Basques in the European Y-chromosome diversity landscape, Eur. J. Hum. Genet. 13 (2005) 1293–1302.
[29] O. García, A. Soto, I. Yurrebaso, Allele frequencies and other forensic parameters of the HID-Ion AmpliSeq™ Identity Panel markers in Basques using the Ion Torrent PGM™ platform, Forensic Sci. Int. Genet. 28 (2017) e8–e10.
[30] O. García, J.A. Ajuriagerra, A. Alday, S. Alonso, J.A. Pérez, A. Soto, I. Uriarte, I. Yurrebaso, Frequencies of the Precision ID Ancestry Panel markers in Basques using the Ion Torrent PGM™ platform, Forensic Sci. Int. Genet. 31 (2017) e1–e4.
[31] P. Martín, O. García, B. Heinrichs, I. Yurrebaso, A. Aguirre, A. Alonso, Population genetic data of 30 autosomal indels in Central Spain and the Basque Country populations, Forensic Sci. Int. Genet. 7 (2013) e27–e30.
[32] R.M. Andrews, I. Kubacka, P.F. Chinnery, R.N. Lightowlers, D.M. Turnbull, N. Howell, Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA, Nat. Genet. 23 (1999) 147.
[33] H. Thorvaldsdóttir, J.T. Robinson, J.P. Mesirov, Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration, Brief. Bioinform. 14 (2013) 178–192.
[34] M.M. Holland, E.D. Pack, J.A. McElhoe, Evaluation of GeneMarker® HTS for improved alignment of mtDNA MPS data, haplotype determination, and heteroplasmy assessment, Forensic Sci. Int. Genet. 28 (2017) 90–98.
[35] H. Weissensteiner, D. Pacher, A. Kloss-Brandstätter, L. Forer, G. Specht, H.J. Bandelt, F. Kronenberg, A. Salas, S. Schönherr, HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing, Nucleic Acids Res. 44 (2016) W58–W63.
[36] M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation, Hum. Mutat. 30 (2009) e386–e394.
[37] N. Huber, W. Parson, A. Dür, Next generation database search algorithm for forensic mitogenome analyses, Forensic Sci. Int. Genet. 37 (2018) 204–214.
[38] W. Parson, A. Dür, EMPOP: a forensic mtDNA database, Forensic Sci. Int. Genet. 1 (2007) 88–92.
[39] W. Parson, L. Gusmão, D.R. Hares, J.A. Irwin, W.R. Mayr, N. Morling, E. Pokorak, M. Prinz, A. Salas, P.M. Schneider, T.J. Parsons, DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing, Forensic Sci. Int. Genet. 13 (2014) 134–142.
[40] E. Lander, M. Waterman, Genomic mapping by fingerprinting random clones: a mathematical analysis, Genomics 2 (1988) 231–239.
[41] L. Excoffier, H.E. Lischer, Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (2010) 564–567.
[42] M. Stoneking, D. Hedgecock, R.G. Higuchi, L. Vigilant, H.A. Erlich, Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes, Am. J. Hum. Genet. 48 (1991) 370–382.
[43] L. Prieto, B. Zimmermann, A. Goios, A. Rodriguez-Monge, G.G. Paneto, C. Alves, A. Alonso, C. Fridman, S. Cardoso, G. Lima, M.J. Anjos, M.R. Whittle, M. Montesino, R.M. Cicarelli, A.M. Rocha, C. Albarrán, M.M. de Pancorbo, M.F. Pinheiro, M. Carvalho, D.R. Sumita, W. Parson, The GHEP-EMPOP collaboration on mtDNA population data: a new resource for forensic casework, Forensic Sci. Int. Genet. 5 (2011) 146–151.
[44] S. Cardoso, M.J. Villanueva-Millán, L. Valverde, A. Odriozola, J.M. Aznar, S. Piñeiro-Hermida, M.M. de Pancorbo, Mitochondrial DNA control region variation in an autochthonous Basque population sample from the Basque Country, Forensic Sci. Int. Genet. 6 (2012) e106–e108.
[45] S. Cardoso, L. Valverde, M.A. Alfonso-Sánchez, L. Palencia-Madrid, X. Elcoroaristizabal, J. Algorta, S. Catarino, D. Arteta, R.J. Herrera, M.T. Zarrabeitia, J.A. Peña, M.M. de Pancorbo, The expanded mtDNA phylogeny of the Franco-Cantabrian region upholds the pre-Neolithic genetic substrate of Basques, PLoS One 8 (2013) e67835.
[46] C. Núñez, M. Baeta, S. Cardoso, L. Palencia-Madrid, N. García-Romero, A. Llanos, M.M. de Pancorbo, Mitochondrial DNA reveals the trace of the ancient settlers of a violently devastated late Bronze and Iron Ages village, PLoS One 11 (2016) e0155342.
[47] L. Palencia-Madrid, S. Cardoso, C. Keyser, J.C. López-Quintana, A. Guenaga-Lizasu, M.M. de Pancorbo, Ancient mitochondrial lineages support the prehistoric maternal root of Basques in Northern Iberian Peninsula, Eur. J. Hum. Genet. 25 (2017) 631–636.
[48] D.M. Behar, C. Harmant, J. Manry, M. van Oven, W. Haak, B. Martinez-Cruz, J. Salaberria, B. Oyharçabal, F. Bauduer, D. Comas, L. Quintana-Murci, The Genographic Consortium, the Basque Paradigm: genetic evidence of a maternal
Bertranpetit, A genome-wide survey does not show the genetic distinctiveness of Basques, Hum. Genet. 127 (2010) 455–458. [12] N. Rodriguez-Ezpeleta, J. Alvarez-Busto, L. Imaz, M. Regueiro, M.N. Azcarate, R. Bilbao, M. Iriondo, A. Gil, A. Estonba, A.M. Aransay, High-density SNP genotyping detects homogeneity of Spanish and French Basques, and confirms their genomic distinctiveness from other European populations, Hum. Genet. 128 (2010) 113–117. [13] J. Bertranpetit, J. Sala, F. Calafell, P.A. Underhill, P. Moral, D. Comas, Human mitochondrial DNA variation and the origin of Basques, Ann. Hum. Genet. 59 (1995) 63–81. [14] M. Richards, H. Côrte-Real, P. Forster, V. Macaulay, H. Wilkinson-Herbort, D. Demaine, S. Papiha, R. Hedges, H.J. Bandelt, B. Sykes, Paleolithic and Neolithic lineages in the European mitochondrial gene pool, Am. J. Hum. Genet. 59 (1996) 185–203. [15] L. Chikhi, R.A. Nichols, G. Barbujani, M.A. Beaumont, Y genetic data support the Neolithic demic diffusion model, PNAS 99 (2002) 11008–11013. [16] A. Torroni, H.J. Bandelt, L. D’Urbano, P. Lahermo, P. Moral, D. Sellitto, C. Rengo, P. Forster, M.L. Savontaus, B. Bonné-Tamir, R. Scozzari, mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe, Am. J. Hum. Genet. 62 (1998) 1137–1152. [17] N. Izagirre, C. de la Rúa, An mtDNA analysis in ancient Basque populations: implications for haplogroup V as a marker for a major paleolithic expansion from southwestern Europe, Am. J. Hum. Genet. 65 (1999) 199–207. [18] A. Alzualde, N. Izagirre, S. Alonso, A. Alonso, C. de la Rua, Temporal mitochondrial DNA variation in the Basque country: influence of post-Neolithic events, Ann. Hum. Genet. 69 (2005) 665–679. [19] O. García, R. Fregel, J.M. Larruga, V. Álvarez, I. Yurrebaso, V.M. Cabrera, A.M. González, Using mitochondrial DNA to test the European post glacial human recolonization from the Franco Cantabrian refuge, Heredity 106 (2011) 37–45. [20] A. Achilli, C. 
Rengo, C. Magri, V. Battaglia, A. Olivieri, R. Scozzari, F. Cruciani, M. Zeviani, E. Briem, V. Carelli, P. Moral, J.M. Dugoujon, U. Roostalu, E.L. Loogväli, T. Kivisild, H.J. Bandelt, M. Richards, R. Villems, A.S. SantachiaraBenerecetti, O. Semino, A. Torroni, The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool, Am. J. Hum. Genet. 75 (2004) 910–918. [21] W. Parson, C. Strobl, G. Huber, B. Zimmermann, S.M. Gomes, L. Souto, L. Fendt, R. Delport, R. Langit, S. Wootton, R. Lagacé, J. Irwin, Evaluation of next generation mtGenome sequencing using the ion torrent personal genome machine (PGM), Forensic Sci. Int. Genet. 7 (2013) 543–549. [22] J.L. King, B.L. La Rue, N.M. Novroski, M. Stoljarova, S. Bum Seo, X. Zeng, D.H. Warshauer, C.P. Davis, W. Parson, A. Sajantila, B. Budowle, High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq, Forensic Sci. Int. Genet. 12 (2014) 128–135. [23] W. Parson, G. Huber, L. Moreno, M.B. Madel, M.D. Brandhagen, S. Nagl, C. Xavier, M. Eduardoff, T.C. Callaghan, J.A. Irwin, Massively parallel sequencing of complete mitochondrial genomes from hair shaft samples, Forensic Sci. Int. Genet. 15 (2015)

5

Forensic Science International: Genetics 46 (2020) 102260

Ó. García, et al.

[49] [50]

[51]

[52]

[53]

[54]

[55]

[56]

[57]

[58]

[59]

[60] [61]

continuity in the Franco-Cantabrian region since pre-neolithic times, Am. J. Hum. Genet. 90 (2012) 486–493. H.-J. Bandelt, P. Lahermo, M. Richards, V. Macaulay, Detecting errors in mtDNA data by phylogenetic analysis, Int. J. Legal Med. 115 (2001) 64–69. S.L. Marques, A. Goios, A.M. Rocha, M.J. Prata, A. Amorim, L. Gusmão, C. Alves, L. Alvarez, Portuguese mitochondrial DNA genetic diversity: an update and a phylogenetic revision, Forensic Sci. Int. Genet. 15 (2015) 27–32. Q. Mairal, C. Santos, M. Silva, S.L. Marques, A. Ramos, M.P. Aluja, A. Amorim, M.J. Prata, L. Alvarez, Linguistic isolates in Portugal: insights from the mitochondrial DNA pattern, Forensic Sci. Int. Genet. 7 (2013) 618–623. A. Ramos, C. Santos, L. Mateiu, M. del, M. Gonzalez, L. Alvarez, L. Azevedo, A. Amorim, M.P. Aluja, Frequency and pattern of heteroplasmy in the complete human mitochondrial genome, PLoS One 8 (2013) e74636. S. Cardoso, M.T. Zarrabeitia, L. Valverde, A. Odriozola, M.Á. Alfonso-Sánchez, M.M. de Pancorbo, Variability of the entire mitochondrial DNA control region in a human isolate from the Pas Valley (northern Spain), J. Forensic Sci. 55 (2010) 1196–1201. J.L. King, B.L. La Rue, N.M. Novroski, M. Stoljarova, S. Bum Seo, X. Zeng, D.H. Warshauer, C.P. Davis, W. Parson, A. Sajantila, B. Budowle, High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq, Forensic Sci. Int. Genet. 12 (2014) 128–135. L.M. Bragg, G. Stone, M.K. Butler, P. Hugenholtz, G.W. Tyson, Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data, PLoS Comput. Biol. 9 (2013) e1003031. S.B. Seo, X. Zeng, J.L. King, B.L. Larue, M. Assidi, M.H. Al-Qahtani, A. Sajantila, B. Budowle, Underlying data for sequencing the mitochondrial genome with the massively parallel sequencing platform ion torrentTM PGMTM, BMC Genomics 16 (Suppl. 1) (2015) S4. J.D. Churchill, J.L. King, R. Chakraborty, B. 
Budowle, Effects of the Ion PGM™ HiQ™ sequencing chemistry on sequence data quality, Int. J. Legal Med. 130 (2016) 1169–1180. A. Aguirre, A. Vicario, L.I. Mazon, A. Estomba, M. Martinez de Pancorbo, V. ArrietaPico, F. Perez-Elortondo, C.M. Lostao, Are the Basques a single and unique population? Am. J. Hum. Genet. 49 (1991) 450–458. C. Manzano, A.I. Aguirre, M. Iriondo, M. Martín, L. Osaba, C. de la Rúa, Genetic polymorphisms of the Basques from Gipuzkoa: genetic heterogeneity of the Basque population, Ann. Hum. Biol. 23 (1996) 285–296. M. Iriondo, M.C. Barbero, C. Manzano, DNA polymorphisms detect ancient barriers to gene flow in Basques, Am. J. Phys. Anthropol. 122 (2003) 73–84. B. Martínez-Cruz, C. Harmant, D.E. Platt, W. Haak, J. Manry, E. Ramos-Luis, D.F. Soria-Hernanz, F. Bauduer, J. Salaberria, B. Oyharçabal, L. Quintana-Murci,

[62]

[63]

[64]

[65] [66] [67]

6

D. Comas, the Genographic Consortium, Evidence of Pre-Roman Tribal Genetic Structure in Basques from Uniparentally Inherited Markers, Mol. Biol. Evol. 29 (2012) 2211–2222. I. Olalde, S. Mallick, N. Patterson, N. Rohland, V. Villalba-Mouco, M. Silva, K. Dulias, C.J. Edwards, F. Gandini, M. Pala, P. Soares, M. Ferrando-Bernal, N. Adamski, N. Broomandkhoshbacht, O. Cheronet, B.J. Culleton, D. Fernandes, A.M. Lawson, M. Mah, J. Oppenheimer, K. Stewardson, Z. Zhang, J.M. Jiménez Arenas, I.J. Toro Moyano, D.C. Salazar-García, P. Castanyer, M. Santos, J. Tremoleda, M. Lozano, P. García Borja, J. Fernández-Eraso, J.A. Mujika-Alustiza, C. Barroso, F.J. Bermúdez, E. Viguera Mínguez, J. Burch, N. Coromina, D. Vivó, A. Cebrià, J.M. Fullola, O. García-Puchol, J.I. Morales, F.X. Oms, T. Majó, J.M. Vergès, A. Díaz-Carvajal, I. Ollich-Castanyer, F.J. López-Cachero, A.M. Silva, C. Alonso-Fernández, G. Delibes de Castro, J. Jiménez Echevarría, A. MorenoMárquez, G. Pascual Berlanga, P. Ramos-García, J. Ramos-Muñoz, E. Vijande Vila, G. Aguilella Arzo, Á. Esparza Arroyo, K.T. Lillios, J. Mack, J. Velasco-Vázquez, A. Waterman, L. Benítez de Lugo Enrich, M. Benito Sánchez, B. Agustí, F. Codina, G. de Prado, A. Estalrrich, Á. Fernández Flores, C. Finlayson, G. Finlayson, S. Finlayson, F. Giles-Guzmán, A. Rosas, V. Barciela González, G. García Atiénzar, M.S. Hernández Pérez, A. Llanos, Y. Carrión Marco, I. Collado Beneyto, D. LópezSerrano, M. Sanz Tormo, A.C. Valera, C. Blasco, C. Liesau, P. Ríos, J. Daura, M.J. de Pedro Michó, A.A. Diez-Castillo, R. Flores Fernández, J. Francès Farré, R. GarridoPena, V.S. Gonçalves, E. Guerra-Doce, A.M. Herrero-Corral, J. Juan-Cabanilles, D. López-Reyes, S.B. McClure, M. Merino Pérez, A. Oliver Foix, M. Sanz Borràs, A. Catarina Sousa, J.M. Vidal Encinas, D.J. Kennett, M.B. Richards, K. Werner Alt, W. Haak, R. Pinhasi, C. Lalueza-Fox, D. Reich, The genomic history of the Iberian Peninsula over the past 8000 years, Science 363 (2019) 1230–1234. V. 
Álvarez-Iglesias, A. Mosquera-Miguel, M. Cerezo, B. Quintáns, M.T. Zarrabeitia, I. Cuscó, M.V. Lareu, Ó. García, L. Pérez-Jurado, Á. Carracedo, A. Salas, New population and phylogenetic features of the internal variation within mitochondrial DNA macro-haplogroup R0, PLoS One 4 (2009) e5112. A. Gómez-Carballa, A. Olivieri, D.M. Behar, A. Achilli, A. Torroni, A. Salas, Genetic continuity in the Franco-Cantabrian region: new clues from autochthonous mitogenomes, PLoS One 7 (2012) e32851. L. Michelena, Sobre el pasado de la lengua vasca, Ed, Auñamendi, San Sebastián (Spain) (1964). G. Barbujani, Geographic Patterns: How to Identify Them and Why, Hum. Biol. 72 (2000) 133–153. A.M. González, O. García, J.M. Larruga, V.M. Cabrera, The mitochondrial lineage U8a reveals a Paleolithic settlement in the Basque country, BMC Genomics 7 (2006) 124.