Forensic Science International: Genetics Supplement Series 3 (2011) e204–e205
Sequences of microvariant/"off-ladder" STR alleles

Eszter Rockenbauer a,*, Magdalena B. Holgersson a, Sarah L. Fordyce b, Maria C. Ávila Arcos b, Claus Børsting a, Anders J. Hansen a, Rune Frank-Hansen a, Eske Willerslev b, M. Thomas P. Gilbert b, Niels Morling a

a Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, Denmark
b Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark
ARTICLE INFO
Article history: Received 26 August 2011; accepted 29 August 2011
Keywords: Microvariant allele; Short tandem repeat; Second generation sequencing

ABSTRACT
We have recently developed a sequencing method using a second generation sequencing platform. We typed 111 samples for one or two of 10 short tandem repeat loci and discovered a high degree of sequence diversity. Most variation was seen in the D21S11 locus and least variation was found in the D18S51 locus. Our sequencing method gave a better resolution than traditional capillary electrophoresis methods, and also better resolution than other techniques, such as pyrosequencing and mass spectrometry. We found that poly-A/T stretches longer than six bases decreased the quality of reads, making sequence composition an important factor for the quality of sequencing results.
© 2011 Elsevier Ireland Ltd. All rights reserved.
1. Introduction

Commercial short tandem repeat (STR) multiplex kits and capillary electrophoresis (CE) platforms are widely used for DNA profiling. However, analysis of the lengths of the PCR amplicons cannot reveal sequence variation within the amplicons and, thus, important information may be overlooked. We have recently developed a sequencing method and an algorithm specially designed to process reads of repetitive sequences from high-throughput Roche Genome Sequencers (GS) [1]. The aim of this study was to further investigate the sequence variation in common and microvariant STR alleles.

2. Materials and methods

A total of 111 samples, previously typed with commercial STR kits (the AmpFℓSTR® Identifiler® kit from Applied Biosystems and/or the PowerPlex® 16 System kit from Promega), were sequenced for one or two of 10 STR loci (CSF1PO, FGA, TH01, VWA, D5S818, D7S820, D13S317, D19S433, D21S11 and D18S51). According to CE data, 86 of the samples had microvariant alleles, and 10 had three alleles within an STR locus. DNA was extracted from the samples and PCR-amplified using previously published primer sets [1,2]. The amplified samples were MID-tagged, pooled into libraries, and sequenced on a GS FLX (Roche) or a GS Junior (Roche) following recommendations from the manufacturer.
* Corresponding author. Tel.: +45 353 26284; fax: +45 353 26270. E-mail address: [email protected] (E. Rockenbauer).
1875-1768/$ – see front matter © 2011 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.fsigss.2011.08.102
Sequences were matched to the correct samples using an algorithm developed in-house [1]. Alleles were identified by aligning the sequences for each sample. Allele calls were analyzed by comparing the consensus sequences to a GenBank reference sequence.
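The read-processing steps described above can be illustrated with a minimal Python sketch. It is not the algorithm of ref. [1], which is not reproduced in this paper. As simplifying assumptions, reads are assigned to samples by an exact match on a leading MID tag, candidate alleles are taken as the distinct sequences supported by a minimum fraction of reads, and comparison to a reference is done position by position for sequences of equal length instead of by alignment. The tag sequences, the 20% threshold and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical MID tags for illustration only; the tags used in the study are not listed here.
MID_TO_SAMPLE = {"AACCGGTTAC": "S14", "GGTTAACCGT": "S34"}
TAG_LEN = 10


def demultiplex(reads, mid_to_sample=MID_TO_SAMPLE, tag_len=TAG_LEN):
    """Group reads by the sample whose MID tag starts the read, stripping the tag."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = mid_to_sample.get(read[:tag_len])
        if sample is not None:  # reads with an unknown tag are discarded
            by_sample[sample].append(read[tag_len:])
    return dict(by_sample)


def candidate_alleles(seqs, min_fraction=0.2):
    """Return distinct sequences supported by at least min_fraction of the reads."""
    counts = Counter(seqs)
    total = len(seqs)
    return [s for s, n in counts.most_common() if n / total >= min_fraction]


def mismatch_positions(allele, reference):
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(allele) != len(reference):
        raise ValueError("sequences differ in length; align them first")
    return [i + 1 for i, (a, b) in enumerate(zip(allele, reference)) if a != b]
```

With real pyrosequencing reads, exact matching would be too strict because of homopolymer length errors, so the grouping step would need an alignment or error-tolerant clustering in its place.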
3. Results and discussion

A total of 10,144 sequence reads were successfully matched, giving an average of 85 reads per sample. For 85 samples, all expected alleles were obtained, while 18 samples had no or very few reads. There was a correlation between the number of sequence reads and the DNA concentration. This indicates suboptimal PCR amplification in these samples, rather than poor sequencing. For another 8 samples, only one allele was identified.

Most sequence variation was seen among the D21S11 alleles, where 12 alleles with different repeat and/or sequence composition were observed in just 6 samples (data not shown). Least sequence variation was seen among the D18S51 alleles, where 30 alleles had the same repeat pattern and only two alleles differed from the GenBank reference sequence by a single base (data not shown).

Single nucleotide polymorphisms (SNPs) and repeat structure variations that go undetected on CE are easily discovered with sequencing, as demonstrated in the following examples. According to CE data, sample S14 has two alleles for the D13S317 locus with 9 and 12 STR repeats, respectively. The peak heights of the two alleles were significantly different (Fig. 1A), suggesting the presence of a sequence variation that altered the efficiency of
Fig. 1. (A) CE results of S14, where two alleles were detected in the D13S317 locus. Note the difference between the peak heights of the two alleles. (B) GS sequence data of S14 revealed a hidden, third allele.
Fig. 2. (A) CE data of a boy with three alleles in the D21S11 locus and his parents. (B) Sequencing data showed that the third allele originated from the mother.
the PCR amplification of the shorter allele. However, GS sequencing data showed that the individual has three alleles in the D13S317 locus: 9, 12 and 12′. Allele 12′ has the same length and sequence composition as allele 12 except for a SNP in position 75 (Fig. 1B).

Samples S34, S35 and S36 were from a boy and his parents. CE data revealed the existence of three D21S11 alleles in the child's sample (Fig. 2A). However, all three family members had an allele with 30 repeats, and there was no way to tell from the CE data alone which parent had passed the allele with 30 repeats to the child. Sequencing data, on the other hand, revealed a difference in repeat composition between the alleles of the parents and showed that the child's allele 30 originated from his mother (Fig. 2B).

In conclusion, GS sequencing offers better resolution of the STRs than traditional CE methods and also better resolution than other techniques, such as pyrosequencing [3] and mass spectrometry [4]. However, when designing a new experiment, it is crucial to take the sequence composition of the chosen STR alleles into consideration. We found that poly-A/T stretches longer than 6 bases decreased the fidelity of the sequencing result. STR loci such as D7S820, which has a stretch of 6–8 T residues downstream of the STR region, gave sequence reads with a varying number of T residues in this position (data not shown).
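When choosing loci and primers, a candidate amplicon can be screened for such stretches before sequencing. The following Python sketch is one possible way to do this and was not part of the study. It reports every run of A or T longer than six bases; the function name, the default threshold of seven and the example sequence are illustrative assumptions, and the example is not the actual D7S820 flanking sequence.

```python
import re


def long_homopolymers(seq, bases="AT", min_len=7):
    """Return (start, base, length) for each run of A or T of at least min_len bases.

    start is a 0-based index into seq; min_len=7 corresponds to
    'longer than six bases'.
    """
    # Builds a pattern such as "A{7,}|T{7,}"
    pattern = re.compile("|".join(f"{b}{{{min_len},}}" for b in bases))
    return [(m.start(), m.group()[0], len(m.group()))
            for m in pattern.finditer(seq.upper())]


# Made-up amplicon: three GATA repeats followed by eight T residues.
example = "GATA" * 3 + "TTTTTTTT" + "CAG"
flagged = long_homopolymers(example)
```

Here `flagged` holds a single entry for the eight-base T run that starts directly after the repeat region. A run of exactly six bases is not reported with the default threshold.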
Conflict of interest

None.

Role of funding

None.

Acknowledgements

We thank Trine Leerhøj Hansen, Anja Jørgensen and Maibritt Sigvardt for technical assistance and Martin Mikkelsen and Morten Rasmussen for advice concerning sequencing.

References

[1] S.L. Fordyce, M.C. Avila-Arcos, E. Rockenbauer, et al., High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform, Biotechniques 51 (2) (2011) 127–133.
[2] M.C. Kline, C.R. Hill, A.E. Decker, J.M. Butler, STR sequence analysis for characterizing normal, variant, and null alleles, Forensic Sci. Int. Genet. 5 (4) (2010) 329–332.
[3] A.M. Divne, H. Edlund, M. Allen, Forensic analysis of autosomal STR markers using Pyrosequencing, Forensic Sci. Int. Genet. 4 (2010) 122–129.
[4] J.V. Planz, B. Budowle, T. Hall, et al., Enhancing resolution and statistical power by utilizing mass spectrometry for detection of SNPs within the short tandem repeats, Forensic Sci. Int. Genet. Suppl. Ser. 2 (2009) 529–531.