A novel multiplex assay amplifying 13 Y-STRs characterized by rapid and moderate mutation rate


Accepted Manuscript

Title: A novel multiplex assay amplifying 13 Y-STRs characterized by rapid and moderate mutation rate

Author: Urszula Rogalla, Marcin Woźniak, Jacek Swobodziński, Miroslava Derenko, Boris A. Malyarchuk, Irina Dambueva, Marek Koziński, Jacek Kubica, Tomasz Grzybowski

PII: S1872-4973(14)00244-0
DOI: http://dx.doi.org/doi:10.1016/j.fsigen.2014.11.004
Reference: FSIGEN 1267

To appear in: Forensic Science International: Genetics

Received date: 14-8-2014
Revised date: 4-11-2014
Accepted date: 6-11-2014

Please cite this article as: U. Rogalla, M. Woźniak, J. Swobodziński, M. Derenko, B.A. Malyarchuk, I. Dambueva, M. Koziński, J. Kubica, T. Grzybowski, A novel multiplex assay amplifying 13 Y-STRs characterized by rapid and moderate mutation rate, Forensic Science International: Genetics (2014), http://dx.doi.org/10.1016/j.fsigen.2014.11.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Authors: Urszula Rogalla¹, Marcin Woźniak¹, Jacek Swobodziński¹, Miroslava Derenko², Boris A. Malyarchuk², Irina Dambueva³, Marek Koziński⁴, Jacek Kubica⁴, Tomasz Grzybowski¹

¹ Institute of Molecular and Forensic Genetics, Nicolaus Copernicus University, Collegium Medicum in Bydgoszcz, M. Skłodowskiej-Curie 9, 85-094 Bydgoszcz, Poland

² Institute of Biological Problems of the North, Far-East Branch of the Russian Academy of Sciences, Portovaya str. 18, Magadan 685000, Russia

³ Institute of General and Experimental Biology, Russian Academy of Sciences, Ulan-Ude, Russia

⁴ Chair of Cardiology and Internal Disease, Nicolaus Copernicus University, Collegium Medicum in Bydgoszcz, M. Skłodowskiej-Curie 9, 85-094 Bydgoszcz, Poland

Email addresses: U. Rogalla: [email protected]; M. Woźniak: [email protected]; J. Swobodziński: [email protected]; M. Derenko: [email protected]; B.A. Malyarchuk: [email protected]; I. Dambueva: [email protected]; M. Koziński: [email protected]; J. Kubica: [email protected]; T. Grzybowski: [email protected]

Corresponding author: Urszula Rogalla at The Nicolaus Copernicus University, Ludwik Rydygier Collegium Medicum, Institute of Forensic Medicine, Department of Molecular and Forensic Genetics, Skłodowskiej-Curie 9, Bydgoszcz, 85-094, Poland [email protected], +48 525853556

Keywords: RM Y-STRs, Y chromosome, individual identification, Buryats

Abstract

As microsatellites located on the Y chromosome mutate at different rates, they may be exploited in evolutionary studies, in genealogical testing of a variety of populations and even, as proven recently, to aid individual identification. Currently available commercial Y-STR kits encompass mostly low to moderately mutating loci, making them well suited to the first two applications. Some attempts have been made so far to utilise Y-STRs as a discriminatory tool for forensic purposes. Although all thirteen rapidly mutating (RM) Y-STRs have already been multiplexed, no single assay based on single-copy markers that allows at least a portion of close male relatives to be differentiated from one another is available. To fill this gap, we constructed and validated an assay composed only of single-copy Y-STR markers with mutation rates ranging from 8×10⁻³ to 1×10⁻². Performance of the resulting combination of nine RM Y-STRs and four moderately mutating ones was tested on 361 father-son pairs and 1326 males from 9 populations, revealing an overall mutation rate of 1.607×10⁻¹ for the assay as a whole. Application of the proposed 13 Y-STR set to differentiation of haplotypes present in the homogenous population of Buryats resulted in a three-fold increase in discrimination as compared with the 10 Y-STRs from the PowerPlex® Y.

1. Introduction

Y chromosome analysis is an exceptionally valuable tool in both forensic and human population genetics, mainly owing to its inheritance along the paternal line and its susceptibility to genetic drift and patrilocality [1,2]. During criminal investigations, Y-STRs are most commonly used for solving sexual assault cases, where identification of the DNA from the male perpetrator requires overcoming a huge excess of the female component [3]. They are also of great use in examining DNA mixtures donated by multiple males, in conducting deep studies of human population history [4] and even in deficiency paternity testing [5].

Its major disadvantage, as seen from the forensic genetics community's point of view, is the inability to unambiguously differentiate between plausible male donors. Results obtained with most of the commercially available Y-STR kits, encompassing up to 17 loci, are usually restricted to identification of the male lineage the sample originates from, and yield no potentially vital information on a particular individual. Meanwhile, it seems that it is not a rare occurrence for close relatives to be involved in the same crime [6].

A broad screening of Y-STRs performed by Ballantyne et al. [7] revealed significant differences in the mutation rates of these loci and identified an exceptional set of thirteen among them, termed "rapidly mutating" (RM), whose mutation rates exceed 1×10⁻². As was shown later [8], these RM Y-STRs are capable of differentiating nearly 27% of father-son pairs and 56.3% of brothers, while for the most broadly used Y-STR kit, Yfiler® (Yfiler, Life Technologies, Foster City, CA), these values are 4.5% and 10%, respectively. That outstanding ability was quickly noticed by the community, which resulted in the recent launch of the commercial Promega PowerPlex® Y23 kit (PPY23, Promega Corporation, Madison, WI) [9-10], which contains two tetranucleotide RM loci (DYS570 and DYS576). Notably, the YHRD database [11] has also been adapted to handle these new markers. In its most up-to-date version (release 47) one may browse 21909 PPY23 haplotypes and 873 haplotypes obtained with the only just announced AmpFLSTR® Yfiler® Plus kit [12], which encompasses, inter alia, 7 RM Y-STRs.

Rapidly mutating Y-STRs are undoubtedly of great use when attempting to differentiate male lineages in populations that underwent a recent bottleneck or founder effect resulting in generally low haplotype diversity, as seen for example among Buryats or Finns [13-15]. Studies of this kind performed using Y-STRs with midrange mutation rates usually fail to give satisfactory results.

Although multi-copy Y-STRs are mostly characterized by the highest mutation rates, they suffer from some major drawbacks that need to be mentioned. If the evidence contains DNA from multiple donors, multi-copy STRs hinder proper determination of the number of males involved. Moreover, for inferring genealogical relationships of any kind, it would be beneficial to rely only on information from single-copy markers, as the multi-copy ones often exhibit values that cannot be unequivocally traced back to the ancestral state [16].

To address these issues, we asked whether a carefully chosen set of single-copy Y-STR markers characterized by medium to rapid mutation rates is capable of aiding male individual identification. We also aimed to construct a single-tube multiplex assay amplifying the chosen loci and to check whether it can also be of use in the field of population genetics.

2. Materials and Methods

2.1. Samples

Mutation rates of the chosen markers and the ability of the proposed assay to differentiate between male relatives were tested using 361 father-son pairs, all sampled among Poles. The assay's performance and the diversity of the markers were checked using 1329 samples collected in Europe (406 from Poland, 84 from Ukraine, 45 from the European part of the Russian Federation including Pskov, Veliky Novgorod and Volot, 24 from Nogais, 289 from Austria, 95 from Northern Italy), the Middle East (141 Palestinian Arabs) and Asia (210 Buryats from South Siberia and 35 Kazakhs from the Kereit clan). Sampling locations are shown in Figure 1. In order to check the assay's ability to resolve genealogies of closely related males, we additionally analyzed 14 samples from potentially related men sharing a common last name.

2.2. Loci selection and primer design

The assay encompasses 13 markers characterized by medium (DYS458, DYS516, DYS534, DYS611) to rapid (DYS449, DYS518, DYS526b, DYS547, DYS570, DYS576, DYS612, DYS626, DYS627) mutation rates. All of the markers were selected from the study by Ballantyne et al. [7] based upon 1) the highest mutation rate (lower threshold set at 6×10⁻³ to cover sampling bias), 2) the broadest allelic range (arbitrarily set at 8 or more), and 3) being a single-copy marker. The only exception was DYS526, normally present in two copies. For this marker, primers were designed so that only one allele is amplified. All the primers were designed specifically for this study using Primer3Plus software [17]. Seven of the chosen markers (DYS611, DYS612, DYS526b, DYS516, DYS626, DYS547 and DYS534) are not present in any commercially available Y-STR system. For details on the markers' repeat structure, mutation rate, genomic location and observed allelic ranges, refer to Table 1.

2.3. Multiplex amplification

All thirteen markers were amplified in a single multiplex reaction. The PCR reaction, in a final volume of 10 µL, contained 1x GoTaq G2 Reaction Buffer, 3 mM MgCl2, 160 µM of each dNTP, 0.8 U GoTaq G2 HotStart Polymerase (Promega) and 400 ng/µL bovine serum albumin (BSA, Promega), with forward and reverse primers in the concentrations listed in Table 2. Full profiles were obtained with 0.5 ng of DNA. Amplification was performed in GeneAmp 9700 thermal cyclers (Applied Biosystems) and required 5 minutes of initial denaturation at 96 °C. Subsequent cycles of denaturation at 96 °C for 30 seconds, annealing at 61 °C for 45 seconds and elongation at 72 °C for 45 seconds were repeated a total of 30 times, followed by 10 minutes of final elongation at 72 °C to avoid split peaks.
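As a rough illustration of the cycling programme described above, the programmed hold times can be summed as in the following sketch. The hold times and cycle number are taken from the text; the constant and function names are hypothetical, and the time the thermal cycler spends ramping between temperatures is not given in the text and is ignored, so the real run time will be somewhat longer.

```python
# Programmed hold times taken from the protocol described in the text (seconds).
INITIAL_DENATURATION_S = 5 * 60
CYCLE_STEPS_S = {"denaturation": 30, "annealing": 45, "elongation": 45}
CYCLES = 30
FINAL_ELONGATION_S = 10 * 60


def programmed_hold_minutes() -> float:
    """Sum of programmed hold times in minutes; temperature ramping is ignored."""
    per_cycle_s = sum(CYCLE_STEPS_S.values())
    total_s = INITIAL_DENATURATION_S + CYCLES * per_cycle_s + FINAL_ELONGATION_S
    return total_s / 60


print(programmed_hold_minutes())
```

Under these assumptions the holds alone add up to 75 minutes.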

2.4. Detection and genotyping

Products were separated using capillary electrophoresis on an ABI PRISM® 3130xl analyzer (Applied Biosystems) equipped with 36 cm capillaries and POP-7 polymer. The injection time was 23 s and the applied voltage was set to 1.2 kV. Samples for capillary electrophoresis were prepared by mixing 1 µl of PCR product with 9 µl Hi-Di Formamide™ (Applied Biosystems) and 0.3 µl GeneScan™ 600 LIZ® (Applied Biosystems). A peak detection threshold of 50 RFU was applied.

All genotyping was performed with GeneMapper ID v.3.2 software (Applied Biosystems) with a custom allelic ladder and bin sets for each marker. All alleles present in the allelic ladder were sequenced to confirm length and repeat unit structure, using Big Dye Terminator v.3.1 chemistry (Applied Biosystems). All father-son mutations revealed during this study were confirmed by a second amplification or by sequencing.

2.5. Sensitivity, specificity and inhibition

Sensitivity testing was performed using a series of diluted samples from male donors. The DNA concentration of these samples had previously been assessed with the Quantifiler® Duo kit (Applied Biosystems). To test whether the assay is specific for male DNA only, we performed 5 amplifications with 1 ng of female DNA template. We also checked the ability of the assay to amplify male DNA in the presence of an excess of female material (3 pairs of samples tested; ratios 1:1, 1:10, 1:100 and 1:1000), which reflects the actual conditions encountered when analysing post-coital swabs. To assess the efficacy of the assay in amplifying DNA from multiple donors present in different proportions, we analysed extracts containing mixtures of male DNA (3 pairs tested) in the following ratios: 1:1, 1:5 and 1:10. Species specificity was assessed by testing the performance of the assay in amplifying 10 ng of template DNA from the animals most likely to appear at a crime scene: dog, cat, hamster, guinea pig, rabbit, horse and cattle. A BLAST cross-search reveals a high probability of successful amplification of most markers from DNA extracted from various primates. We have not tested this possibility experimentally, as primate species other than humans are virtually absent in Europe. Resistance to inhibition was assessed using the two most common inhibitors encountered in our practice, hematin and humic acid (HA), at varying concentrations (for details, see the Results section).

2.6. Statistics

Mutations were counted directly and the mutation rate was calculated as the number of observed mutations divided by the number of father-son pairs. 95% confidence intervals from the binomial probability distribution were estimated using the formula available at [18]. Haplotype diversities and discrimination capacity were calculated according to standard methodology [10]. Fst and Rst were computed using Arlequin v.3.5 software [19], and their graphical presentation in heatmap form was constructed using the gplots package for R [20]. Multidimensional scaling of both linearized Fst and Rst was performed using the Statistica package v.9.1 (StatSoft). Haplotype networks were constructed using the median-joining algorithm [21] embedded in the Network v.4.6.12 software [22].
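The per-locus calculation described above can be sketched as follows. This is a minimal illustration only: it assumes that the interval in question is the exact (Clopper-Pearson) binomial interval, as the title of [18] suggests, and it is not the calculator used in the study. All function names are hypothetical, and the interval limits are found by simple bisection using the Python standard library.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of a function that decreases from positive at lo to negative at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mutation_rate_ci(mutations: int, pairs: int, conf: float = 0.95):
    """Point estimate and exact (Clopper-Pearson) confidence limits for one locus."""
    alpha = 1 - conf
    rate = mutations / pairs
    # Lower limit: the p at which P(X >= mutations) equals alpha / 2.
    lower = 0.0 if mutations == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(mutations - 1, pairs, p))
    )
    # Upper limit: the p at which P(X <= mutations) equals alpha / 2.
    upper = 1.0 if mutations == pairs else _bisect_decreasing(
        lambda p: binom_cdf(mutations, pairs, p) - alpha / 2
    )
    return rate, lower, upper


# A locus with no mutations observed in 361 father-son pairs.
print(mutation_rate_ci(0, 361))
```

For a locus with zero observed mutations the point estimate is 0, but the upper limit is not; with 361 pairs it equals 1 − 0.025^(1/361), i.e. roughly 1×10⁻².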

3. Results and discussion

3.1. Nomenclature

Nomenclature used for all the loci follows the ISFG recommendations [23] and is identical to that published in [7], including the DYS449 locus (see [24]). The only discrepancy in nomenclature concerns the DYS534 locus, which in our assay contains an additional (ATCT)11-13 microsatellite (rs72167351), making it necessary to add a value of 11 to 13 to the repeat number of the DYS534 marker.

3.2. Overall assay's performance

Sensitivity: As any assay to be used in forensics needs to be capable of amplifying the low amounts of DNA found at crime scenes (LT-DNA samples), we tested our PCR reaction using varying amounts of DNA collected from female (1 ng) and male donors (31.25 pg-10 ng). For female samples the reaction gave no signal of amplification, as expected. For male samples, full, well-balanced profiles, evenly distributed across all loci in all channels, were obtained in the presence of 500 pg to 1 ng of DNA, which is comparable to commercially available kits (see Supplementary Figure 3). With lower amounts of DNA template, down to 125 pg, we still observed full profiles, yet signal intensity was much weaker; therefore one needs to assume that some drop-outs may occur.

Mixtures: Most of the instances where Y-STR assays are employed are sexual assault cases; therefore it is crucial to test the assay's ability to amplify samples consisting of female:male and male:male DNA mixtures present in variable ratios. Our results suggest that the 13 Y-STR assay is able to amplify male DNA even if there is a huge excess of female material added to the reaction (up to 100:1). However, commercial assays successfully amplify male profiles in female-admixed samples at ratios ranging from 1:1000 (as for Yfiler [25]) to 1:24000 (for PPY23 [26]). The 13 Y-STR assay is also capable of retrieving full profiles of two male donors if their DNA is present even in the 1:10 proportion (the highest ratio tested).

Species specificity: We tested the assay's specificity for human DNA template by attempting to amplify non-human samples collected from dog, cat, guinea pig, hamster, rabbit, horse and cattle, the species most likely to be present at a crime scene in our climatic zone. As expected, amplification of DNA from all the aforementioned species did not yield detectable products at any locus.

Inhibition: Overcoming inhibition is a daily problem in forensic genetics, as samples from crime scenes can contain inhibitors that affect amplification. Therefore, we analysed the resistance of the assay to the two most commonly encountered inhibitors, hematin and humic acid. For hematin, the range of examined concentrations was 1 to 30 μM, with 20 μM being the highest concentration allowing successful amplification across all 13 loci. This result is much worse than the one reported for PPY23 (500 μM [26]), yet comparable with Yfiler, which exhibits overall inhibition if the hematin concentration exceeds 16 μM [25]. In the case of humic acid, we applied 1 to 30 ng/μL of inhibitor and observed unproblematic amplification up to the addition of 20 ng/μL of HA. It is worth noting, however, that the only PCR enhancer tested was BSA, without which overcoming HA inhibition was not possible at all (data not shown).

3.3. Efficacy for differentiating individuals

In order to test the efficacy of the assay for differentiating individuals, we tested 361 father-son pairs and performed a broad population study based on samples collected from individuals from 9 populations representing Europe, Asia and the Middle East. We found 58 mutations within father-son pairs (see Table 1 for details), corresponding to an overall mutation rate of 16.07% (the sum of the mutation rates of the separate loci), which is remarkably higher than the one given for Yfiler [27] and PPY23 and comparable with the data for Yfiler Plus [12], yet still much lower than that reported for the set built of 13 RM Y-STRs only [8]. One should note, however, that the highest mutation rates have been reported so far for multi-allelic markers [7], which we purposely avoided in our assay. According to our results, the most mutable single-allelic marker was DYS518, which is also included in the Yfiler Plus kit. The second most mutable marker in our study was DYS458, a well-established marker described in previous studies [7,28] as having a moderate mutation rate. We observed no single-meiosis mutations in our dataset for the DYS611 marker, yet given its extremely complex layout, one may presume that sequencing could reveal some trinucleotide repeat loss counterbalanced by a repeat gain, which is impossible to trace with fragment length electrophoresis. However, these estimates would definitely be different if a few thousand father-son pairs were analysed. Most of the observed mutations were single-step ones (13.5:1), with a slight majority of gains over losses (1.23:1).
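The overall rate quoted above follows directly from the two counts given in the text, as the short sketch below shows. The second quantity is an added illustration rather than a result from the study: assuming that mutations at different loci arise independently and are rare (a Poisson approximation), it estimates the share of father-son pairs expected to differ at one locus or more, which is slightly below the summed rate because a few pairs carry more than one mutation. Variable names are hypothetical.

```python
from math import exp

MUTATIONS = 58  # mutations observed across all 13 loci (from the text)
PAIRS = 361     # father-son pairs typed (from the text)

# Sum of the per-locus rates = total mutations / number of pairs.
overall_rate = MUTATIONS / PAIRS

# Assumption: independent, rare mutations (Poisson approximation).
p_at_least_one_difference = 1 - exp(-overall_rate)

print(f"{overall_rate:.4f} {p_at_least_one_difference:.3f}")
```

With these inputs the sketch gives an overall rate of 0.1607 and an expected share of distinguishable pairs of about 0.148.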

Haplotype diversity and discrimination capacity values, as well as the number of haplotypes including unique ones, are summarized in Table 3. There were no shared haplotypes between samples from different populations under study. For the Polish population, which we analysed most extensively, there were 403 haplotypes in 406 individuals, resulting in haplotype diversity and discrimination capacity values reaching 0.99996 and 0.993, respectively. Overall, for all the populations studied (excluding Buryats and Kereits, which would severely bias the result due to their high homogeneity), HD and DC show values of 0.999931 and 0.969, respectively, which seem to be promising in terms of individual identification, especially if compared with other previously available Y-STR assays [10].
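To make the two summary statistics concrete, the sketch below uses the commonly applied definitions, assumed here to be what "standard methodology" refers to: Nei's estimator of haplotype diversity, HD = n/(n−1) · (1 − Σ pᵢ²), and discrimination capacity as the number of distinct haplotypes divided by the number of individuals. The haplotype count vector is hypothetical: the text gives only 403 haplotypes in 406 individuals, and 400 haplotypes seen once plus 3 seen twice is one distribution consistent with those totals.

```python
def haplotype_diversity(counts):
    """Nei's estimator: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = sum(counts)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))


def discrimination_capacity(counts):
    """Number of distinct haplotypes divided by the number of individuals."""
    return len(counts) / sum(counts)


# Hypothetical distribution consistent with 403 haplotypes in 406 individuals:
# 400 haplotypes observed once and 3 haplotypes observed twice.
polish_counts = [1] * 400 + [2] * 3

hd = haplotype_diversity(polish_counts)
dc = discrimination_capacity(polish_counts)
print(round(hd, 5), round(dc, 3))
```

Under this assumed distribution the sketch reproduces the reported values of 0.99996 and 0.993.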

3.3.1. Practical application of the assay

We also took the opportunity to verify our assay's performance in distinguishing closely related males whilst performing genealogical testing for 14 males inhabiting diverse regions of central Poland but sharing the same last name, who were eager to investigate whether they have a common male ancestor. The gathered documentation was scarce, yet it contained evidence that there were at least four individuals representing one male lineage and separated from each other by six or nine meioses, depending on the branch of the genealogy. Yfiler testing excluded one of these individuals as a close relative of the others but simultaneously revealed that two more men from the tested group bear exactly the same haplotype as the supposedly related men. Altogether, 5 out of the 14 males tested shared one Yfiler haplotype. In contrast, results obtained with the assay proposed in this study clearly distinguished all five men from each other by one to four mutations, which is concordant with the calculated mutation rate of 16% (equivalent to one mutation every 6.2 meioses).
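The figure of 6.2 meioses is simply the reciprocal of the overall mutation rate, and the same rate gives the number of haplotype differences expected between two men separated by a given number of meioses. The sketch below works this through for the separations of six and nine meioses mentioned above; it assumes a constant rate per meiosis and ignores back-mutations, and the names are hypothetical.

```python
OVERALL_RATE = 58 / 361  # mutations per meiosis for the whole 13-locus haplotype

# Average number of meioses needed to observe one mutation in the haplotype.
meioses_per_mutation = 1 / OVERALL_RATE


def expected_mutations(meioses: int) -> float:
    """Expected number of mutations separating two men `meioses` apart."""
    return OVERALL_RATE * meioses


print(round(meioses_per_mutation, 1))
for m in (6, 9):
    print(m, round(expected_mutations(m), 2))
```

On average, then, about one difference is expected at six meioses and about one and a half at nine, which is of the same order as the one to four mutations observed.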

3.4. Use of the assay in population and evolutionary genetics

Y-STR markers characterized by a rapid mutation rate may potentially be of great use not only for individual identification purposes but also for resolving phylogenies of homogenous populations whose diversity is low or has been dramatically reduced, for instance by a bottleneck effect. To answer the question of whether our assay may aid identification of individuals sampled from homogenous populations, we chose Buryats from southern Siberia and the Kereit clan from Kazakhstan, both of which have been previously screened for Y-STR diversity [13, 29]. Among Buryats (n=202), 58 haplotypes were reported based on the PowerPlex Y analyses. Three of these haplotypes were shared by 59% of all males in the group. Applying our set of markers allowed 163 unique haplotypes to be distinguished, which is an almost three-fold increase in resolution (see Figure 4 for a comparison of haplotype networks). Combining our 13 markers with the 10 provided by PowerPlex Y yielded only 11 more distinct haplotypes. In the case of Kereits (n=36), the difference was less spectacular, yet still remarkable. Using the 10 PowerPlex Y markers, only 15 haplotypes could be identified, whereas our set used separately allowed 25 haplotypes to be distinguished. These values do not point to an ability to unambiguously identify individuals, but they do show that inclusion of some of the RM Y-STRs in analyses results in a massive increase in the differentiation capabilities of the assay.
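The gain in resolution described above can be expressed as a simple ratio of haplotype counts, as in the sketch below. The counts and sample sizes are those quoted in the text; treating the 163 Buryat haplotypes as the total number of distinct haplotypes, and reporting the share of distinct haplotypes per sample, are assumptions made for illustration only. Names are hypothetical.

```python
# (sample size, haplotypes with PowerPlex Y, haplotypes with the 13 Y-STR set)
POPULATIONS = {
    "Buryats": (202, 58, 163),
    "Kereits": (36, 15, 25),
}


def fold_increase(before: int, after: int) -> float:
    """Ratio of haplotype counts obtained with the two marker sets."""
    return after / before


for name, (n, ppy, rm13) in POPULATIONS.items():
    print(
        name,
        round(fold_increase(ppy, rm13), 2),  # gain in resolution
        round(ppy / n, 3),                   # distinct haplotypes per individual, PowerPlex Y
        round(rm13 / n, 3),                  # distinct haplotypes per individual, 13 Y-STR set
    )
```

For Buryats the ratio is about 2.8, in line with the "almost three-fold" increase; for Kereits it is about 1.7.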

In order to investigate further the potential relationships between the populations under study as revealed with our set of markers, we calculated Fst and Rst distances between groups of samples and visualised them using heatmaps and MDS plots, as presented in Figure 2 and Figure 3. The results obtained with the two methods differed significantly from one another. On the MDS plot of Fst distances, all the populations grouped together with the exception of Kereits and Buryats, which were evidently separated. In the case of Rst, which is the usual choice for STR analysis, no single cluster could be distinguished, although the greatest differences were observed between Arabs, Kereits and the remaining populations. Thus, it seems that the chosen set of markers performs poorly in terms of providing data for inferring relationships between populations and in practice can probably be outperformed by a set of slowly or moderately mutating markers that preserves the uniqueness of populations or clinal changes across geographic regions. In the case of RM Y-STRs the chance of random convergence of haplotypes is high, making them less useful in analysing populations with well-resolved phylogenies.

4. Conclusions

Rapidly mutating markers have recently proven their ability to increase the differentiation of related males, yet no single assay has been reported so far and it seems that analyses of multi-allelic markers may be cumbersome in some cases. The proposed multiplex assay, composed of markers exhibiting single-copy alleles, allows fathers to be distinguished from their sons in at least 16% of cases. It also appears to be useful in testing homogenous populations characterised by low genetic diversity, as it is capable of differentiating many of the samples at the level of the individual. On the other hand, it seems that there is no point in applying the tested marker set to population genetics research focused on highly diverse populations, as the results would bring no new information and would rather blur the overall picture. Certainly, use of RM Y-STRs should also be avoided in missing persons cases (when comparison with potential relatives is performed) and paternity cases, as it would create interpretational issues.

It has been shown that the assay is human specific, requires only 0.5 ng of DNA for successful amplification, offers some, although limited, level of resistance to common inhibitors (humic acid and hematin) and is complete in just 2.5 hours including CE, making it a promising alternative to the commercially available kits.

For further comparative purposes, all the obtained haplotypes are available in Supplementary Table 1. Supplementary Figures 1 and 2 depict a representative electropherogram for the assay and the custom allelic ladder, respectively.

Acknowledgements

This study is funded by the Polish Ministry of Science and Higher Education Preludium Grant no. 2012/05/N/NZ8/00801 and the Faculty of Medicine CM NCU Grant for young scientists no. 07/WL/2013. We are also grateful to Mrs Mariola Mrozek for the excellent technical assistance.

Conflict of interest:

Authors declare no conflict of interests.

Ethical approval:

All samples were obtained with informed consent for studies of gene frequency variation from anonymous donors and from individuals participating in forensic paternity testing performed by Institute of Molecular and Forensic Genetics in Bydgoszcz, Poland.

References

[1] M. Kayser. Uni-parental markers in human identity testing including forensic DNA analysis. Biotechniques 43 (2007) 3042.

[2] P.A. Underhill, T. Kivisild. Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Annu. Rev. Genet. 41 (2007) 539–564.

[3] W. Parson, H. Niederstatter, A. Brandstatter, B. Berger. Improved specificity of Y-STR typing in DNA mixture samples. Int. J. Legal Med. 117 (2003) 109–114.

[4] W. Shi, Q. Ayub, M. Vermeulen, R.G. Shao, S. Zuniga, K. van der Gaag, P. de Knijff, M. Kayser, Y. Xue, C. Tyler-Smith. A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations. Mol. Biol. Evol. 27 (2010) 385–393.

[5] M.A. Jobling, A. Pandya, C. Tyler-Smith. The Y chromosome in forensic analysis and paternity testing. Int. J. Leg. Med. 110 (1997) 118–124.

[6] C.J. Gershaw, A.J. Schweighardt, L.C. Rourke, M.M. Wallace. Forensic utilization of familial searches in DNA databases. Forensic Sci. Int. Genet. 5 (2011) 16–20.

[7] K.N. Ballantyne, M. Goedbloed, R. Fang, O. Schaap, O. Lao, A. Wollstein, Y. Choi, K. van

340 

Duijn, M. Vermeulen, S. Brauer, R. Decorte, M. Poetsch, N. von-Wurmb-Schwark, P. de

341 

Knijff, D. Labuda, H. Vezina, K. Knoblauch, R. Lessig, L. Roewer, R. Ploski, T. Dobosz, L.

342 

Henke, J. Henke, M.R. Furtado, M. Kayser, Mutability of Ychromosomal microsatellites:

343 

rates, characteristics, molecular bases, and forensic implications, Am. J. Hum. Genet. 87

344 

(2010) 341–353.

345 

[8] N. Ballantyne, A. Ralf, R. Aboukhalid, N. M. Achakzai, M.J. Anjos et al. Toward Male

346 

Individualization with Rapidly Mutating Y-Chromosomal Short Tandem Repeats. Hum Mut 35

347 

(2014) 1021-1032.

348 

[9] J.M. Thompson, M.M. Ewing, W.E. Frank, J.J. Pogemiller, C.A. Nolde, D.J. Koehler, A.M.

349 

Shaffer, D.R. Rabbach, P.M. Fulmer, C.J. Sprecher, D.R. Storts, Developmental validation of

350 

the PowerPlex Y23 System: a single multiplex Y-STR analysis system for casework and

351 

database samples, Forensic Sci. Int. Genet. 7 (2013) 240–250.

352 

[10] J. Purps et al. A global analysis of Y-chromosomal haplotype diversity for 23 STR loci.

353 

Forensic Sci Int Genet. (2014) doi: 10.1016/j.fsigen.2014.04.008

354 

M

an

us

cr

ip t

339 

[11] YHRD database: http://yhrd.org 

[12] Life Technologies, 2013 Future Trends in Forensic DNA Technology Seminar Series: Development of a Next Generation Y-STR Multiplex for Forensic Applications: Development of Next Generation Y-STR Multiplex for Forensic Applications. http://www.slideshare.net/Lifetech_HID/2013-hid-universityyfilerplus, last accessed 14.10.2014.

[13] M. Woźniak, M. Derenko, B.A. Malyarchuk, I. Dambueva, T. Grzybowski, D. Miścicka-Śliwka, Allelic and haplotypic frequencies at 11 Y-STR loci in Buryats from South-East Siberia, Forensic Sci. Int. 164 (2006) 271–275.

[14] M. Hedman, V. Pimenoff, M. Lukka, P. Sistonen, A. Sajantila, Analysis of 16 Y STR loci in the Finnish population reveals a local reduction in the diversity of male lineages, Forensic Sci. Int. 142 (2004) 37–43.

[15] M. Hedman, A.M. Neuvonen, A. Sajantila, J.U. Palo, Dissecting the Finnish male uniformity: The value of additional Y-STR loci, Forensic Sci. Int. Genet. 5 (2011) 199–201.

[16] M. Vermeulen, A. Wollstein, K. van der Gaag, O. Lao, Y. Xue, Q. Wang, L. Roewer, H. Knoblauch, C. Tyler-Smith, P. de Knijff, M. Kayser, Improving global and regional resolution of male lineage differentiation by simple single-copy Y-chromosomal short tandem repeat polymorphisms, Forensic Sci. Int. Genet. 3 (2009) 205–213.

[17] A. Untergasser, I. Cutcutache, T. Koressaar, J. Ye, B.C. Faircloth, M. Remm, S.G. Rozen, Primer3--new capabilities and interfaces, Nucleic Acids Res. 40 (2012) e115.

[18] Exact Binomial and Poisson Confidence Intervals: http://statpages.org/confint.html

[19] L. Excoffier, H.E.L. Lischer, Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (2010) 564–567.

[20] R: http://cran.r-project.org

[21] H.-J. Bandelt, P. Forster, A. Röhl, Median-joining networks for inferring intraspecific phylogenies, Mol. Biol. Evol. 16 (1999) 37–48.

[22] Network v.4.6.12 software: fluxus-engineering.com

[23] L. Gusmao, J.M. Butler, A. Carracedo, P. Gill, M. Kayser, W.R. Mayr, et al., DNA Commission of the International Society of Forensic Genetics (ISFG): an update of the recommendations on the use of Y-STRs in forensic analysis, Forensic Sci. Int. 157 (2006) 187–197.

[24] J. Mulero, J. Ballantyne, K. Ballantyne, B. Budowle, M. Coble, L. Gusmao, L. Roewer, M. Kayser, Nomenclature update and allele repeat structure for the markers DYS518 and DYS449, Forensic Sci. Int. Genet. (2014), http://dx.doi.org/10.1016/j.fsigen.2014.04.009

[25] J.J. Mulero, C.W. Chang, L.M. Calandro, R.L. Green, Y. Li, C.L. Johnson, L.K. Hennessy, Development and validation of the AmpFlSTR Yfiler PCR amplification kit: a male specific, single amplification 17 Y-STR multiplex system, J. Forensic Sci. 51 (2006) 64–75.

[26] J.M. Thompson, M.M. Ewing, W.E. Frank, J.J. Pogemiller, C.A. Nolde, D.J. Koehler, A.M. Shaffer, D.R. Rabbach, P.M. Fulmer, C.J. Sprecher, D.R. Storts, Developmental validation of the PowerPlex® Y23 System: a single multiplex Y-STR analysis system for casework and database samples, Forensic Sci. Int. Genet. 7 (2013) 240–250.

[27] M. Goedbloed, M. Vermeulen, R.N. Fang, M. Lembring, A. Wollstein, K. Ballantyne, O. Lao, S. Brauer, C. Krüger, L. Roewer, R. Lessig, R. Ploski, T. Dobosz, L. Henke, J. Henke, M.R. Furtado, M. Kayser, Comprehensive mutation analysis of 17 Y-chromosomal short tandem repeat polymorphisms included in the AmpFlSTR Yfiler PCR amplification kit, Int. J. Legal Med. 123 (2009) 471–482.

[28] K.N. Ballantyne, V. Keerl, A. Wollstein, Y. Choi, S.B. Zuniga, A. Ralf, M. Vermeulen, P. de Knijff, M. Kayser, A new future of forensic Y-chromosome analysis: rapidly mutating Y-STRs for differentiating male relatives and paternal lineages, Forensic Sci. Int. Genet. 6 (2012) 208–218.

[29] S. Abilev, B.A. Malyarchuk, M. Derenko, M. Wozniak, T. Grzybowski, I. Zakharov, The Y-chromosome C3* star-cluster attributed to Genghis Khan’s descendants is present at high frequency in the Kerey clan from Kazakhstan, Hum. Biol. 84 (2012) 79–89.

Captions for figures:

Figure 1. Geographic distribution of samples: POL – Poles, UKR – Ukrainians, AUS – Austrians, IT – Italians, ROS – Russians (Pskov, Veliky Novgorod, Volot), NG – Nogais, BR – Buryats, KER – Kereits, AR – Palestinian Arabs.

Figure 2. MDS plots based on Fst and Rst for all the populations under study.

Figure 3. Heatmaps constructed with the R gplots package based on Fst and Rst results.

Figure 4. Networks constructed using the MJ algorithm for the Buryat population, obtained with data for 10 Y-STRs (from PowerPlex® Y) and for the 13 Y-STRs from our assay.

Number of mutations

DYS458

DYS449

Motif

95% confidence interval

Location

Obs. allelic range Total gains losses single-step multi-step

(GAAA)11-24

simple

9.6x10-3 2.22x10-2 4.32x10-2

Yp11.2

10-21

(TTCT)13-19N22(TTCT)3N12(TTCT)13-19

complex

3x10-3 1.11x10-2 2.81x10-2

Yp11.2

26-38

complex

-

simple

4.5x10-3 1.39x10-2 3.2x10-2

complex

4.5x10-3 1.39x10-2 3.2x10-2

4

4

7

4

3

1

1

3

1

(TTC)5N9(TTC)4(CTC)1(TTC)3N9(TTC)5 (CTC)1(TTC)3N15(TTC)4 (CT)1(TTC)3 (CTC)1(TTC)3 N20 (TTC)3T(TTC)3N7(TTC)3N9 (TTC)4(TCC)1 (TTC)7-21N23 (TTC)4N4 [(TTC)1(CTC)1]2[(CTC)1(TTC)1]3

-

-

-

-

Yq11.221 12-24

5

3

2

5

0

Yp11.2

19-31

5

5

0

4

1

7x10-4 5.54x10-3 1.99x10-2

Yp11.2

11-28

2

0

2

2

0

1.5x10-2 3.05x10-2 5.39x10-2

Yq11.21

24-38

11

6

5

11

0

complex

1.7x10-3 8.31x10-3 2.4x10-2

Yq11.221 36-49

3

2

1

3

0

complex

6.1x10-3 1.66x10-2 3.58x10-2

Yq11.223 9-18

6

3

3

6

0

DYS627 (AGAA)3N16(AGAG)3(AAAG)12-24N81(AAGG)3

complex

7x10-4 5.54x10-3 1.99x10-2

Yp11.2

2

1

1

2

0

DYS534 (CTTT)3N8(CTTT)9-20N9(CTTT)3N169(ACTC)11-13

complex

1.7x10-3 8.31x10-3 2.4x10-2

Yq11.221 22-32

3

2

1

2

1

simple

1.7x10-3 8.31x10-3 2.4x10-2

Yp11.2

12-21

3

2

1

3

0

complex

6.1x10-3 1.66x10-2 3.58x10-2

Yq11.221 29-41

6

2

4

6

0

summed for all loci

16.07x10-2 (95% confidence interval: 12.43x10-2 - 20.27x10-2)

(CCT)5(CTT)1(TCT)4(CCT)1(TCT)19-31

DYS518 (AAAG)3(GAAG)1(AAAG)14-22(GGAG)1(AAAG)4N6(AAAG)11-19N27(AAGG)4

DYS516

DYS576

complex

DYS547 (CCTT)9-13T(CTTC)4-5N56(TTTC)10-22N10(CCTT)4(TCTC)1(TTTC)9-16N14(TTTC)3

DYS626 (GAAA)14-23N24(GAAA)3N6(GAAA)5(AAA)1(GAAA)2-3(GAAG)1(GAAA)3 complex

DYS612

(TTTC)14-24

(TTCT)4N30(TTCT)9-18

(AAAG)13-22

DYS526b (CCCT)3N20(CTTT)11-17(CCTT)6-10N113(CCTT)10-17

Yq11.221 15-20


0

DYS570

-


DYS611

8


Locus

Complexity

Mean mutation rate

8-22

Table 1. Summary of all 13 Y-STRs from the assay, including the structure of mutations
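The mean mutation rates and 95% confidence intervals in Table 1 can be illustrated with a short sketch. It assumes the intervals are exact binomial (Clopper-Pearson) intervals, in line with the calculator cited as [18], and it assumes 361 father-son pairs, a figure that is not stated in this section and is only inferred from the ratio of mutation counts to mean rates (for example 8 / 2.22x10-2). The function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(g, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Bisection for a function that goes from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for x events observed in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2.0 - binom_cdf(x, n, p))
    return lower, upper


N_PAIRS = 361   # ASSUMPTION: inferred from counts / rates, not stated in the text
MUTATIONS = 8   # total for the locus whose mean rate is 2.22x10-2

rate = MUTATIONS / N_PAIRS
low, high = exact_binomial_ci(MUTATIONS, N_PAIRS)
print(f"rate = {rate:.2e}, 95% CI = ({low:.2e}, {high:.2e})")
```

Under these assumptions the interval should land close to the 9.6x10-3 to 4.32x10-2 reported for the locus with 8 mutations; a different interval method or a different number of pairs would shift the bounds.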


Locus

Primers (3'-5')

C[μM] Dye

DYS458 F: TGCAGACTGAGCAACAGGAAT

Size range

0.32 6-FAM 166-218

DYS449 F: TGGAGTCTCTCAAGCCTGTTC

0.4

6-FAM 294-342

DYS611 F: CTGAAGCGATCCCCTGAGTAG


R: GGTTGGACAACAAGAGTAAGACAG 0.24 6-FAM 413-459

0.32 VIC

DYS612 F: TTCACACAGGTTCAGAGGTTTG

0.4

VIC

R: CTTGACACTTGCCATGGGTAT

DYS518 F: CTGGGCAACACAAGTGAAACT

107-147


R: GCTGAAATGCAGATATTCCCTA

R: ACTTGGCAACATAGCAGATCC

DYS570 F: GCTGTGTCCTCCAAGTTCCT

R: TTTCCTGACCTTGTGATCCAG

0.16 VIC

189-225

310-378

R: GCATCACATGTAGCACTCTGG


DYS547 F: GTTCCAATTCTATCCATGTTACTGC 0.8

VIC

412-508


R: CCTGAGTGACAGAGCATAAACG

0.32 NED

177-213

0.24 NED

240-288

0.64 NED

389-433

0.24 PET

165-205

0.8

PET

215-274

DYS526b F: CATTATGTATTCTGTTTGTTTTCAGC 1.2

PET

400-496


DYS516 F: GCCATGGTTTCTTGCTTCTTT

R: ACGAACCTGCAAATTGTTCAC

DYS627 F: AGCGCAGGATTCCATCTAAAA

R: GCCTTTCATTCTCTCCTTCGT

DYS534 F: TCATCCCTCATCTACCCAACA

R: TCAGTTCTTAACTCAACCAAACAA

DYS576 F: TTGGGCTGAGGAGTTCAATC

R: GGCAGTCTCATTTCCTGGAG

DYS626 F: CTGGGTGACAGAGTGCAAGAC

R: TTTGGGACATGTTTGTTCTTTC

R: GTTTGGGTTACTTCGCCAGA


Table 2. List of all the primers designed for the assay with dye-labels and expected product sizes in bp


                           Number of haplotypes
Population           Unique (n=1)   n=2   n=3   n≥4   HD        DC
Poles         406    400            3     -     -     0.99996   0.993
Ukrainians    84     80             2     -     -     0.99942   0.976
Russians      45     43             1     -     -     0.99899   0.978
Nogais        24     20             2     -     -     0.99275   0.917
Buryats       210    150            13    3     3     0.99416   0.805
Italians      95     89             3     -     -     0.99932   0.968
Austrians     289    274            6     1     -     0.99978   0.972
Arabs         141    118            5     3     1     0.99797   0.901
Kereits       36     19             3     2     1     0.96984   0.694
Total                                                 0.99977   0.935

Table 3. Summary of distinct haplotypes as seen in various populations, with haplotype diversity (HD) and discrimination capacity (DC) values
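The HD and DC columns of Table 3 can be checked from the haplotype counts alone. The sketch below assumes the usual definitions, which this section does not spell out: Nei's haplotype diversity, HD = n/(n-1) x (1 - sum of squared haplotype frequencies), and DC as the number of distinct haplotypes divided by the number of samples. The function names are illustrative. It uses the Russians row (45 samples: 43 haplotypes seen once and one seen twice).

```python
def haplotype_diversity(counts):
    """Nei's haplotype diversity from a list of per-haplotype sample counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two samples")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_p2)


def discrimination_capacity(counts):
    """Number of distinct haplotypes divided by the number of samples."""
    return len(counts) / sum(counts)


# Russians row of Table 3: 43 unique haplotypes plus one haplotype seen twice.
russian_counts = [1] * 43 + [2]

hd = haplotype_diversity(russian_counts)
dc = discrimination_capacity(russian_counts)
print(f"HD = {hd:.5f}, DC = {dc:.3f}")  # HD = 0.99899, DC = 0.978
```

For rows with haplotypes in the n≥4 class, the exact size of each shared haplotype is needed as well, and the table does not list those sizes.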


Highlights

- Thirteen carefully selected Y-STRs are capable of differentiating fathers and sons in ca. 16% of cases

- The newly designed 13 Y-STR assay successfully amplifies 0.5-1 ng of DNA

- Rapidly mutating Y-STRs aid lineage differentiation in populations characterized by low genetic diversity

Figure_1

Figure_1_Grayscale

Figure_2

Figure_2_Grayscale

Figure_3

Figure_3_Grayscale

Figure_4

Figure_4_Grayscale