Characterising stutter in forensic STR multiplexes




Forensic Science International: Genetics 6 (2012) 58–63

Contents lists available at ScienceDirect

Forensic Science International: Genetics journal homepage: www.elsevier.com/locate/fsig

Clare Brookes a, Jo-Anne Bright b, SallyAnn Harbison b, John Buckleton b,*

a Department of Chemistry, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
b Institute of Environmental Science and Research Ltd, Private Bag 92021, Auckland 1142, New Zealand

ARTICLE INFO

Article history:
Received 6 October 2010
Received in revised form 23 December 2010
Accepted 1 February 2011

Keywords:
Forensic DNA
STR
Stutter
Synthetic oligonucleotides

ABSTRACT

Stutter is an artefact seen when amplifying short tandem repeats and typically occurs at one repeat unit shorter in length than the parent allele. In forensic analysis, stutter complicates the analysis of DNA profiles from multiple contributors, known as mixed profiles, a common profile type. Consequently it is important both to understand and to predict stutter behaviour in order to improve the resolution and interpretation of these profiles. Whilst stutter is well recognised and documented, little information is available that identifies and quantifies what influences the formation of stutter. In this work we use a novel approach to examine this. We have used synthetic oligonucleotides comprising multiple repeat units to test the influence of repeat number, the influence of repeat sequence and the impact of interruptions to the repeat sequence length. Using multiple replicates allows detailed statistical analysis. We have confirmed a linear relationship between stutter ratio and repeat number. We have shown that increased A–T content increases stutter ratio and that interruptions in repeating sequences decreased stutter ratios to levels similar to those of the longest uninterrupted repeat stretch. We also found that there was no relationship between stutter ratio and repeat number for a repeat unit with an A–T content of 1/4 and that half of the interrupted repeat sequences stuttered significantly less than their longest uninterrupted repeat stretches. We have applied the knowledge gained to examine specific features of the loci present in the AmpFlSTR® SGM Plus® multiplex kit used in our laboratory.

© 2011 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Modern forensic DNA analysis is dominated by the use of polymerase chain reaction (PCR) of short tandem repeat (STR) loci [1] since it is a sensitive technique permitting the analysis of degraded and small samples. Tetranucleotide or pentanucleotide repeats are favoured and can be classified into different categories based on their structure [2]. Multiplexing of STR loci efficiently enables greater discrimination to be achieved but necessarily involves some compromise of conditions for each locus, and the PCR does not always operate at complete fidelity. Occasional "miscopies" are produced. The most prevalent of these miscopies is the complete loss of one repeat unit, usually referred to as stuttering.

A proposed mechanism for stuttering is slipped strand mispairing (SSM) during PCR [3–10], resulting in either the insertion or deletion of one repeat unit on the new strand [4]. Generally the template strand loops out, resulting in the new strand being one repeat unit shorter than the template strand [10], although stutter products corresponding to a difference of two or more repeat units smaller or one unit larger have been observed

* Corresponding author. Tel.: +64 9 8153 904; fax: +64 9 8496 046. E-mail address: [email protected] (J. Buckleton). 1872-4973/$ – see front matter © 2011 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.fsigen.2011.02.001

[11,12]. A looping-out model to account for DNA mutation was first proposed by Fresco and Alberts in 1960 [13]. SSM is also considered to be the mechanism for microsatellite instability in vivo, with the consequent mutations generating new alleles [7,14,15]. Mutations and stutter share considerable characteristics [14,15], including higher mutation rates for longer alleles and events typically involving the insertion or deletion of whole repeat units.

In the forensic literature, stutter peak size has been characterised numerically using either a stutter ratio or a stutter proportion [16,17]:

\[ \text{Stutter ratio} = SR = \frac{f_S}{f_A} \]

\[ \text{Stutter proportion} = S_x = \frac{f_S}{f_A + f_S} \]

where \(f_S\) is the area or height of the stutter peak and \(f_A\) is the area or height of the allelic peak. Since \(f_S\) is typically small relative to \(f_A\), there is little difference between these two statistics. Because forensic biologists tend to quantify stutter using a stutter ratio, we have chosen to report the size of the stutter peaks using this statistic.

Schlötterer and Tautz [18] concluded previously that the A–T base pair content has some influence on the amount of stutter, with repeat units having a high A–T content producing greater amounts of stutter product. This would be consistent with the


Fig. 1. The AT base pair (left) and the GC base pair (right) showing two and three hydrogen bonds respectively.

Fig. 2. The synthetic AGAT repeating sequence (number of repeat units experiments) and primers.

weaker bonding between A and T nucleotides compared with C and G. The A–T base pair has two hydrogen bonds whereas the G–C pair has three, and the actual energies of stabilisation are nearer a ratio of 1:2 (Fig. 1).

There is a known relationship between stutter size and the number of repeat units (allele length) [5,10,19], with stutter ratio increasing with increasing repeat number. However, if the allele contains several repeat sequences interrupted by a conserved or non-consensus segment, there appears to be more correlation with the longest uninterrupted repeat sequence [5,6,10]. These are challenging observations. It is not immediately obvious why a compound allele would not behave as the sum of its parts.

There are other factors that are thought to affect stuttering. Low template levels should lead to stochastic variation in stutter ratio. Hence we would expect greater variation in stutter ratios for low template samples, and this could affect profiles where elevated cycle number or other enhancement techniques have been used. Other factors likely to affect stutter are DNA over-amplification, analysis of imbalanced mixtures, specific PCR chemistry and conditions, and the influence of flanking primer sequence.

Since stutter is largely an undesirable phenomenon, particularly in the analysis of casework samples where mixtures may be present and obscured, it would be advantageous to understand more fully the underlying mechanisms influencing stutter formation. To this end we have investigated stutter using synthetic oligonucleotides comprising multiple repeat units and multiple replicates, testing the influence of repeat number, the influence of repeat sequence and the impact of interruptions to the repeat sequence length.

2. Method

Synthetic oligonucleotides, corresponding in sequence with the repeat sequence and length to be examined, were obtained from


Invitrogen™ Life Technologies at a 50 nmol scale of synthesis and desalted purity. The synthetic sequences had forward and reverse primer binding sites of 20 bases each, corresponding to the locus D16S539 [20], at either end. The use of synthetic sequences eliminated the need to sequence samples because the number of repeating units and the position of any interruptions were known. Using synthetic sequences also enabled interruptions to be made in selected positions, and allowed comparisons between two different repeating units while keeping primers and primer-binding sequences constant.

2.1. AGAT repeat structure

The synthetic oligonucleotides and primers used to assess the influence of repeat number using the repeat (AGAT) are shown in Fig. 2, where n was 7, 9, 11, 13 and 15. The repeat structure AGAT has an A–T content of 3/4.

2.2. AGCG repeat structure

To examine the effects of A–T content, the synthetic repeat sequence AGCG was tested using the same odd numbers of repeating units, (AGCG)n with n = 7, ..., 15. Primers and other conditions were held constant. The synthetic oligonucleotide and primers used to assess the influence of repeat sequence composition using the repeat (AGCG) are shown in Fig. 3, where n was 7, 9, 11, 13 and 15.

2.3. Sequence interruptions

The (AGAT)15 repeat sequence was interrupted at six different locations. The interruptions were a single base change from AGAT to TGAT. This change maintained the repeating unit's AT content of 3/4, whilst not creating any dinucleotide repeats inside the repeating unit. The interruption of the 15 repeat allele with a non-consensus repeat created two uninterrupted sequences that

Forward primer: GAT CCC AAG CTC TTC CTC TT
Template: GAT CCC AAG CTC TTC CTC TT (AGCG)n ACA GAT GCA CAC ACA AAC GT 3’
Reverse primer (drawn beneath the 3’ flank as its complement): TGT CTA CGT GTG TGT TTG CA

Fig. 3. The synthetic AGCG repeating sequence (number of repeat unit experiments) and primers.

2:12   (AGAT)2(TGAT)(AGAT)12
4:10   (AGAT)4(TGAT)(AGAT)10
5:9    (AGAT)5(TGAT)(AGAT)9
6:8    (AGAT)6(TGAT)(AGAT)8
7:7    (AGAT)7(TGAT)(AGAT)7
12:2   (AGAT)12(TGAT)(AGAT)2

Fig. 4. The allele sequence for the interrupted sequences.

[Plot: stutter ratio (y-axis, 0.00 to 0.25) against repeat number (x-axis, 6 to 16) for the AGAT and AGCG series. AGAT trendline: y = 0.0084x + 0.0203, R2 = 0.6088.]

Fig. 5. Stutter ratio for peak areas for the AGAT and AGCG repeat sequences. The whole number of repeats has been offset by 0.1 on the x-axis to allow all of the data to be seen. The trendline, equation and R2 value for the AGAT data are shown.

Table 1 Optimised amplification concentrations and the amount of DNA in each reaction mix for each experiment: number of repeat units (AGAT repeat sequences), sequence composition (AGCG repeat sequences), and interrupted repeat sequences.

Sequence                          Optimised concentration    In reaction mix
AGAT repeat
  (AGAT)5, 7, 9, 11               0.05 fg μL⁻¹               0.025 fg
  (AGAT)13                        0.1 fg μL⁻¹                0.05 fg
  (AGAT)15                        0.5 fg μL⁻¹                0.25 fg
AGCG repeat
  (AGCG)7                         0.07 fg μL⁻¹               0.03 fg
  (AGCG)9                         0.2 fg μL⁻¹                0.1 fg
  (AGCG)11                        1 fg μL⁻¹                  0.5 fg
  (AGCG)13                        5 fg μL⁻¹                  2.5 fg
  (AGCG)15                        5 fg μL⁻¹                  2.5 fg
Interruptions
  All interruptions except 4:10   0.5 fg μL⁻¹                0.25 fg
  4:10                            0.2 fg μL⁻¹                0.1 fg
totalled 14 repeats. In the nomenclature used here the repeat stretch nearest the 5′ end is given first. Hence 2:12 describes a sequence interrupted after 2 repeats from the 5′ end. The alleles 2:12, 4:10, 5:9, 6:8, 7:7, and 12:2 were examined (Fig. 4).

Amplification is typically carried out with 1 ng of genomic DNA per reaction. We calculated the amount of synthetic oligonucleotide needed in each amplification to approximate the number of copies of repeated sequence present in a typical non-synthetic reaction. These data are shown in Table 1. The calculated amount of oligonucleotide (0.05–5 fg per μL) was amplified in 25 μL reactions using AmpliTaq Gold® PCR master mix (Applied Biosystems). The optimal primer concentration was found to be 1 μM in the PCR reaction. Primers were reconstituted as a 100 μM stock solution in TE buffer and used as a 10 μM working solution in distilled water. Amplification cycle conditions were 5 min at 95 °C, followed by 30 cycles of 95 °C, 15 s; 50 °C, 15 s; 72 °C, 15 s; followed by a 60 min extension at 72 °C and a 4 °C soak.

Amplified products were analysed on an ABI 3130 Genetic Analyzer using the Genescan®-120 Liz internal size standard (Applied Biosystems), the Identifiler™ protocol, the HID fragment 36-POP4-1 run module, and the G5 dye set. Profiles were analysed using GeneScan® version 3.1. Thirty replicates were used for the AGAT and interrupted sequences, and 60 replicates for the AGCG sequences. Amplification negatives were used in all assays.

In order to negate any loading effects on the proportion of stutter observed, samples for analysis on the ABI 3130 Genetic Analyzer were randomly assigned positions in the 96-well plate prior to loading. In order to determine the stability of the system between plates and over time, a set of control samples was run on every plate. These samples were initially selected randomly.
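The random assignment of samples to plate positions described above can be illustrated with a short Python sketch. It is only one possible way of doing this and is not the procedure used in the study: the function name `randomise_plate`, the sample and control identifiers, and the use of a fixed seed for reproducibility are all assumptions made for illustration.

```python
import random

ROWS = "ABCDEFGH"        # 8 rows of a standard 96-well plate
COLUMNS = range(1, 13)   # 12 columns


def randomise_plate(sample_ids, control_ids, seed=None):
    """Assign every sample and control to a distinct, randomly chosen well.

    Hypothetical helper: returns a dict mapping identifier -> well name
    (e.g. "C7"). Controls are included on every plate, as in the text.
    """
    wells = [f"{row}{col}" for row in ROWS for col in COLUMNS]
    ids = list(control_ids) + list(sample_ids)
    if len(set(ids)) != len(ids):
        raise ValueError("sample and control identifiers must be unique")
    if len(ids) > len(wells):
        raise ValueError("more samples than wells on a 96-well plate")
    rng = random.Random(seed)             # seeded so a layout can be reproduced
    chosen = rng.sample(wells, len(ids))  # sampling without replacement
    return dict(zip(ids, chosen))


if __name__ == "__main__":
    # Illustrative identifiers only: 30 replicates of one synthetic allele.
    samples = [f"AGAT15_rep{i:02d}" for i in range(1, 31)]
    controls = ["control_1", "control_2", "amp_negative"]
    layout = randomise_plate(samples, controls, seed=1)
    for name in controls:
        print(name, "->", layout[name])
```

Sampling wells without replacement guarantees that no two samples share a position, and recording the seed alongside the plate map is one simple way to make a layout auditable.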
Pawlowski and Maciejewska [21] found no difference between using peak height or area for characterising stutter. However, area is expected to deal with issues such as peak morphology better than height, and thus peak area was used in the experiments described here. Stutter was quantified as a stutter ratio based on peak areas. Such ratios would usually be examined on a log scale. However, comparison of plots of stutter ratio and ln(stutter ratio) suggested little advantage to the log plot and the simpler option was taken.

Fig. 6. Stutter ratio for peak areas for interrupted (AGAT)15 repeating sequences against a background of the stutter ratios for the uninterrupted alleles (with trendline). The data are offset from the whole number of repeats by 0.1 on the x-axis to allow the data to be seen. The interrupted alleles are plotted against the longest uninterrupted stretch, e.g. 2:12 is plotted at 11.9 and 12:2 is plotted at 12.1.

3. Results

3.1. AGAT stutter data analysis

Examination of the plot of stutter ratio vs. repeat number for the synthetic AGAT repeat sequences (Fig. 5) suggests a linear relationship over the repeat number range examined. The residual plot (not shown) suggested that a linear model explains the data well.

3.2. AGCG stutter data analysis

Fig. 5 also presents the stutter data for the synthetic AGCG repeat numbers 7, 9, 13 and 15. Repeat number 11 was not used due to concerns about the purity of the synthetic sequence. There is no evidence for a relationship between repeat number and stutter for the synthetic AGCG repeat sequences. This is at variance with the prediction of existing theory. A comparison of the stutter ratios for the synthetic (AGCG)9–15 and (AGAT)7–15 sequences showed that the synthetic AGAT repeats had, on average, a higher stutter ratio, with the exception of the data for 7 repeats. This is consistent with the lower bond strength of the high AT content repeat. We cannot explain the higher variance


Fig. 7. A comparison of stutter ratio vs. allele designation and LUS for the locus TH01.

for the 7 repeat AGCG allele. However this higher variability was maintained upon replication.
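As a hedged illustration of the analysis in Sections 3.1 and 3.2, the following Python sketch computes the stutter ratio and stutter proportion from peak areas and fits a straight line of stutter ratio against repeat number, reporting R2. The function names and the peak areas are hypothetical placeholders chosen for illustration; they are not the measurements behind Fig. 5, and the original analysis software is not stated in the text.

```python
import numpy as np


def stutter_ratio(stutter_area, allele_area):
    """SR = f_S / f_A, using peak areas."""
    return stutter_area / allele_area


def stutter_proportion(stutter_area, allele_area):
    """S_x = f_S / (f_A + f_S), using peak areas."""
    return stutter_area / (allele_area + stutter_area)


def fit_line(repeat_numbers, ratios):
    """Least-squares line ratio = slope * repeats + intercept, plus R^2."""
    x = np.asarray(repeat_numbers, dtype=float)
    y = np.asarray(ratios, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 fit: [slope, intercept]
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical peak areas for illustration only (not data from the study).
    repeats = [7, 9, 11, 13, 15]
    allele_areas = [10000.0, 10000.0, 10000.0, 10000.0, 10000.0]
    stutter_areas = [800.0, 950.0, 1150.0, 1280.0, 1470.0]
    ratios = [stutter_ratio(s, a) for s, a in zip(stutter_areas, allele_areas)]
    slope, intercept, r2 = fit_line(repeats, ratios)
    print(f"slope={slope:.4f} intercept={intercept:.4f} R2={r2:.3f}")
```

Because the two statistics are related by S_x = SR / (1 + SR), they differ little when the stutter peak is small compared with the allelic peak, which is why either could be used for a fit of this kind.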

3.3. Interrupted (AGAT)15 sequences stutter data analysis

The results of the interruption experiments are displayed in a series of dot diagrams. The data from each interrupted sequence have been placed in the position (on the x-axis) of its longest uninterrupted sequence against a background of the stutter ratios for the simple AGAT uninterrupted alleles (Fig. 6). The longest uninterrupted repeat stretch was a better predictor of stutter than the total number of AGAT repeat units (14 total repeats) for the interrupted sequences. This confirms previous observations [5,6,10]. Stutter for the interrupted sequences was similar to or even slightly less than expected for uninterrupted AGAT repeat sequences corresponding to the interrupted sequences' longest uninterrupted repeat stretches.

A plausible hypothesis for stutter in interrupted sequences might have been that the interrupted repeat sequences would stutter slightly more than their longest uninterrupted repeat stretches, due to the possibility of stutter also occurring in their shorter repeat stretch and the effect being additive. For example, for the interrupted sequence 5:9, the majority of the stuttering would be dictated by the longest uninterrupted repeat stretch of nine AGAT repeats, but in addition there would be a smaller amount of stutter from the shorter repeat stretch of five AGAT repeats. Therefore the observation that stutter is similar to or even less than that predicted from the longest stretches was unexpected. It remains to be explained why the shorter stretch of an interrupted allele does not add to the stuttering.

3.4. Analysis of reference data

Stutter data from reference samples analysed using the AmpFlSTR® Identifiler™ multiplex were examined in view of these findings. This multiplex contains simple, compound and complex STR loci with two A–T base pair contents: 1/2 and 3/4. Only data with an allele height of 300 RFU or higher were used. A threshold of 30 RFU was applied and data below this were assigned a value of 15 RFU. In total, 6949 observations of stutter ratio were obtained.

If we hypothesise that the primary determinants of stutter ratio are A–T base pair content and longest uninterrupted stretch (LUS), then a plot of stutter ratio vs. LUS should be more nearly linear than a plot of stutter ratio vs. allele designation. Allele sequences were obtained, where possible, from the literature. In some cases there were several sequences for the same allele designation. In such cases the average LUS has been used.

The improvement in the linearity of the plot when using LUS rather than allele designation is obvious at the individual locus TH01 (Fig. 7). In this comparison we see the 9.3 allele (plotted at 9.75) sitting well below the trendline. This allele has the structure [AATG]6ATG[AATG]3 and hence an LUS of 6. When plotted at this position it fits much better with the general trend. Comparisons of the stutter ratio vs. allele designation and LUS for loci with an A–T base content of 1/2 and 3/4 are shown in Figs. 8 and 9, respectively. Because of overstrike of data points we have

Fig. 8. A comparison of stutter ratio vs. allele designation and LUS for those loci (D2S1338, D19S433) with an A–T base content of 1/2.


Fig. 9. A comparison of stutter ratio vs. allele designation and LUS for those loci (FGA, vWA, D3S1358, D16S539, D18S51, D21S11, D8S1179, CSF1PO, D13S317, D5S818, D7S820, TPOX) with an A–T base content of 3/4.

Table 2 The results of linear regression of stutter ratio vs. LUS and A–T content.

Term            Coefficient    t-Stat    p-Value
Intercept       0.0203         7.1       0.0000
%A–T            0.0173         5.9       0.0000
LUS             0.0075         32.2      0.0000
LUS × %A–T      0.0001         0.5       0.6375

spread the data by adding and subtracting small random amounts to the x variable in Figs. 7–9.

The results of linear regression of stutter ratio vs. LUS and A–T content are shown in Table 2. The linear regression suggests, as expected, a large and significant effect of LUS and A–T content. The interaction term (LUS × %A–T) allows different slopes for plots of stutter vs. LUS for %AT of 1/2 and 3/4. Since the coefficient (0.0001) is small and the p-value large (0.6375), these data suggest that the interaction term is unnecessary, implying that the same slope may be used for plots of stutter vs. LUS for %AT of 1/2 and 3/4. This simple model implies

\[ \text{stutter} = 0.0203 + 0.0075 \times \text{LUS} + 0.0173 \quad \text{for } \%\text{AT} = \tfrac{1}{2} \]

\[ \text{stutter} = 0.0203 + 0.0075 \times \text{LUS} \quad \text{for } \%\text{AT} = \tfrac{3}{4} \]
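The LUS idea and the simple model can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code. The helper names are hypothetical, the parser assumes alleles are written in the bracketed notation used in the text (for example [AATG]6ATG[AATG]3) with each uninterrupted stretch already collapsed into a single bracketed block, and the coefficients are taken from Table 2 as printed, with the 0.0173 term applied to loci with an A–T content of 1/2 because the text reports that those loci show the higher stutter.

```python
import re

# One bracketed block: a repeat motif followed by its repeat count.
REPEAT_BLOCK = re.compile(r"\[([ACGT]+)\](\d+)")

# Coefficients as printed in Table 2. Treating 0.0173 as an offset added for
# loci with A-T content 1/2 is an assumption based on the surrounding text.
INTERCEPT = 0.0203
LUS_SLOPE = 0.0075
LOW_AT_OFFSET = 0.0173


def longest_uninterrupted_stretch(allele):
    """Return the largest repeat count among the bracketed blocks.

    Assumes each uninterrupted stretch is written as one [MOTIF]n block.
    """
    counts = [int(count) for _motif, count in REPEAT_BLOCK.findall(allele)]
    if not counts:
        raise ValueError(f"no bracketed repeat blocks found in {allele!r}")
    return max(counts)


def predicted_stutter_ratio(lus, at_content):
    """Stutter ratio under the simple model (no LUS x %A-T interaction).

    at_content must be the string "1/2" or "3/4", the two A-T contents
    present in the reference data.
    """
    if at_content == "1/2":
        return INTERCEPT + LUS_SLOPE * lus + LOW_AT_OFFSET
    if at_content == "3/4":
        return INTERCEPT + LUS_SLOPE * lus
    raise ValueError("at_content must be '1/2' or '3/4'")


if __name__ == "__main__":
    th01_9_3 = "[AATG]6ATG[AATG]3"
    print("LUS of TH01 9.3:", longest_uninterrupted_stretch(th01_9_3))
    for lus in (8, 10, 12):
        print(lus,
              round(predicted_stutter_ratio(lus, "1/2"), 4),
              round(predicted_stutter_ratio(lus, "3/4"), 4))
```

Keeping the LUS calculation separate from the prediction makes it straightforward to substitute an averaged LUS where several sequences share one allele designation, as was done for the reference data.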

This finding differs from the expectation, based on the synthetic sequence experiments, that high AT content should lead to higher stutter. In the reference data the two loci with low AT content have higher stutter ratios, by 0.0173. Low template levels should lead to stochastic variation in stutter ratio. We can observe this in the data and are considering ways to model it.

4. Conclusions

Existing theory and observation suggest that allele length (the number of repeat units) and base pair content influence the amount of stutter, with longer alleles or those having a high A–T content producing greater amounts of stutter product. There are other factors that are thought to affect stuttering. Low template levels should lead to stochastic variation in stutter ratio, and this can be observed in the reference data. DNA over-amplification, analysis of imbalanced mixtures, specific PCR chemistry and conditions, and the influence of flanking primer sequence could also affect stutter ratios but cannot be commented upon using the data in these experiments. Our synthetic sequence experiments support the suggestion that repeat sequences with high AT content have high stutter but

this finding is contradicted by the analysis of reference data (note that the two loci with AT content of 1/2 have, on average, higher stutter ratios, by 0.0173). If the allele contains several repeat sequences interrupted by a conserved or non-consensus segment, there is a better correlation with the longest uninterrupted repeat sequence. This hypothesis was supported by the reference data examined here. However, the finding that stutter product correlates with the longest uninterrupted sequence, as opposed to the sum of the sequence lengths, is unexplained and suggests that there is some undiagnosed subtlety in stutter production. Since loci with shorter uninterrupted sequences are likely to produce less stutter product, this feature is likely to be of some forensic utility.

Acknowledgements

This work was conducted by Clare Brookes as part of the requirements for a Master of Science degree in Forensic Science at The University of Auckland, New Zealand. We gratefully acknowledge the comments from Susan Vintiner that have greatly improved this manuscript.

References

[1] A. Edwards, H.A. Hammond, L. Jin, C.T. Caskey, R. Chakraborty, DNA typing and genetic mapping at five trimeric and tetrameric tandem repeats, Am. J. Hum. Genet. 49 (1991) 746–756.
[2] A. Urquhart, C.P. Kimpton, T.J. Downes, P. Gill, Variation in short tandem repeat sequences – a survey of twelve microsatellite loci for use as forensic identification markers, Int. J. Legal Med. 107 (1994) 13–20.
[3] P. Gill, R. Sparkes, C.P. Kimpton, Development of guidelines to designate alleles using an STR multiplex system, Forensic Sci. Int. 89 (3) (1997) 185–197.
[4] X.Y. Hauge, M. Litt, A study of the origin of "shadow bands" seen when typing dinucleotide repeat polymorphisms by the PCR, Hum. Mol. Genet. 2 (4) (1993) 411–415.
[5] M. Klintschar, P. Wiegand, Polymerase slippage in relation to the uniformity of tetrameric repeat stretches, Forensic Sci. Int. 135 (2) (2003) 163–166.
[6] K. Lazaruk, J. Wallin, C. Holt, T. Nguyen, P.S. Walsh, Sequence variation in humans and other primates at six short tandem repeat loci used in forensic identity testing, Forensic Sci. Int. 119 (1) (2001) 1–10.
[7] G. Levinson, G.A. Gutman, Slipped-strand mispairing: a major mechanism for DNA sequence evolution, Mol. Biol. Evol. 4 (3) (1987) 203–221.
[8] M. Meldgaard, N. Morling, Detection and quantitative characterization of artificial extra peaks following polymerase chain reaction amplification of 14 short tandem repeat systems used in forensic investigations, Electrophoresis 18 (11) (1997) 1928–1935.
[9] R. Sparkes, C.P. Kimpton, S. Gilbard, P. Carne, J. Anderson, N. Oldroyd, et al., The validation of a 7-locus multiplex STR test for use in forensic casework. (II) Artefacts, casework studies and success rates, Int. J. Legal Med. 109 (1996) 195–204.
[10] P.S. Walsh, N.J. Fildes, R. Reynolds, Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA, Nucleic Acids Res. 24 (14) (1996) 2807–2812.
[11] P.M. Schneider, S. Meuser, W. Waiyawuth, Y. Seo, C. Rittner, Tandem repeat structure of the duplicated Y chromosomal STR locus DYS385 and frequency studies in the German and three Asian populations, Forensic Sci. Int. 97 (1998) 61–70.
[12] B.E. Krenke, L. Viculis, M.L. Richard, M. Prinz, S.C. Milne, C. Ladd, et al., Validation of a male-specific 12-locus fluorescent short tandem repeat (STR) multiplex, Forensic Sci. Int. 148 (1) (2005) 1–14.
[13] J.R. Fresco, B.M. Alberts, The accommodation of noncomplementary bases in helical polyribonucleotides and deoxyribonucleic acids, Proc. Natl. Acad. Sci. U.S.A. 46 (1960) 311–321.
[14] B. Brinkmann, M. Klintschar, F. Neuhuber, J. Hühne, B. Rolf, Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat, Am. J. Hum. Genet. 62 (6) (1998) 1408–1415.
[15] E.M. Dauber, W. Bär, M. Klintschar, F. Neuhuber, W. Parson, B. Glock, et al., Mutation rates at 23 different short tandem repeat loci, Int. Congr. Ser. 1239 (2003) 565–567.
[16] P. Gill, J. Buckleton, Biological basis for DNA evidence, in: J. Buckleton, C. Triggs, S.J. Walsh (Eds.), Forensic DNA Evidence Interpretation, Florida, CRC Press, 2005, pp. 1–27.
[17] P. Gill, R. Sparkes, L. Fereday, D.J. Werrett, Report of the European Network of Forensic Science Institutes (ENFSI): formulation and testing of principles to evaluate STR multiplexes, Forensic Sci. Int. 108 (1) (2000) 1–29.
[18] C. Schlötterer, D. Tautz, Slippage synthesis of simple sequence DNA, Nucleic Acids Res. 20 (2) (1992) 211–215.
[19] Applied Biosystems. AmpFlSTR® SGM Plus® PCR Amplification Kit: User's Manual. California, USA; 2005 Contract No.: Document Number.
[20] J. Murray, V. Sheffield, J.L. Weber, G. Duyk, K.H. Buetow. Cooperative Human Linkage Center (Accession G07925) 1995 [updated 1995]; Available from http://www.ncbi.nlm.nih.gov/genbank/.
[21] R. Pawlowski, A. Maciejewska, Forensic validation of a multiplex containing nine STRs – population genetics in Northern Poland, Int. J. Legal Med. 114 (2000) 45–49.