Assigning confidence to sequence comparisons for species identification: A detailed comparison of the cytochrome b and cytochrome oxidase subunit I mitochondrial genes




Forensic Science International: Genetics Supplement Series 3 (2011) e246–e247


Shanan S. Tobe a,*, Andrew C. Kitchener b,c, Adrian Linacre d

a Centre for Forensic Science, WestCHEM, Department of Pure and Applied Chemistry, University of Strathclyde, UK
b Department of Natural Sciences, National Museums Scotland, Edinburgh, UK
c Institute of Geography, School of Geosciences, University of Edinburgh, Edinburgh, UK
d School of Biological Sciences, Flinders University, Adelaide, Australia

ARTICLE INFO

Article history: Received 29 August 2011; accepted 31 August 2011

Keywords: Cytochrome b; Cytochrome oxidase I; Species identification; Statistical confidence

ABSTRACT

Species identification is a tool used extensively in forensic science, particularly in the investigation of wildlife crime. The two most commonly used genetic loci in species identification are the cytochrome oxidase I gene (COI) and the cytochrome b gene (cyt b), and identification is generally carried out by DNA sequencing. However, there is currently no standard method to quantify the data from sequence comparisons for presentation in reports and to courts, as there have been no detailed studies of the expected levels of inter- and intraspecific variation. For the first time, this study provides a detailed comparison of the effectiveness of these two loci. Interspecific and intraspecific variation are assessed and statistical confidence is applied to sequence comparisons. Comparison of 217 different mammalian species revealed that cyt b more accurately reconstructed their phylogeny and known relationships, and gave better resolution when separating species based on sequence data. Intraspecific variation was assessed using three model species and showed variation ranging from 0 to 1.16% (Kimura 2-parameter p-distance (K2P) × 100%), indicating that some level of variation should be expected. Interspecific variation was greater in cyt b than in COI. Using a K2P (×100) threshold of 1.5, cyt b gives better resolution for separating species, with a lower false positive rate and a higher positive predictive value than COI. This study allows, for the first time, application of statistical confidence to sequence comparisons for species identification. © 2011 Elsevier Ireland Ltd. All rights reserved.

* Corresponding author. Tel.: +44 0141 548 5992; fax: +44 0141 548 2532. E-mail address: [email protected] (S.S. Tobe).
1875-1768/$ – see front matter © 2011 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.fsigss.2011.08.122

1. Introduction

Species identification for forensic purposes is being used increasingly as the value of non-human evidence is realized. The species must be identified before individual analysis can take place. This is generally accomplished through sequence comparisons of an unknown with either reference sequences or an online database. Traditionally the cytochrome b (cyt b) gene was used, but in 2003 the cytochrome c oxidase subunit I (COI) gene was introduced under the terminology ‘barcoding’. This started a continuing debate as to which gene offers the best template for species identification (high inter-species variability and low intra-species variation) [1]. Both genes have their staunch supporters. Until now, no research has investigated the probabilities associated with a sequence match for either gene, or undertaken an unbiased comparison of the two genes’ ability to identify species and their phylogenetic interrelationships [2].

2. Materials and methods

Sequence data for the cyt b and COI genes were obtained from the NCBI database GenBank and aligned for 217 species of mammal, comprising 29 orders, 89 families and 174 genera, for analysis of interspecific variation. Sequence data were also aligned from samples of 945 humans, Homo sapiens, 130 domestic cattle, Bos taurus, and 35 domestic dogs, Canis familiaris, to analyse intraspecific variation. Kimura 2-parameter p-distances (K2P) were calculated pairwise. K2P values were plotted according to frequency, and thresholds were identified where there was a split between K2P values within species (low values) and between species (high values). Data sets were combined (1343 sequences in total) and K2P values were re-calculated and compared to the thresholds (Table 1).

Table 1
A two-by-two contingency table for K2P frequencies for the tabulation of the same species (A) or different species (a) with a K2P value (×100) falling below or above (B or b) a threshold.

                       Same species (A)   Different species (a)   Total
Below threshold (B)    nAB                naB                     nB
Above threshold (b)    nAb                nab                     nb
Total                  nA                 na                      n

nAB represents true positives; naB represents false positives; nAb represents false negatives; nab represents true negatives; nA represents all samples/values from the same species; na represents all samples/values from different species; nB represents total positive samples; nb represents total negative samples; and n represents the total number of samples/values. Adapted from [3].

3. Results

Cyt b contained 21.3% more variable base positions than COI [2]. Cyt b also demonstrated more variability (+3.1%) than COI in a sequence that was 408 bp shorter [2]. It was determined that a comparison threshold of 1.5% would provide the best discrimination and confidence for both genes (Table 2) [2].

Intraspecific variation was assessed using three model species and showed variation ranging from 0 to 1.16% (Kimura 2-parameter p-distance (K2P) × 100%), indicating that some level of variation should be expected. Interspecific variation was greater in cyt b than in COI (28.79 ± 1.01% and 24.54 ± 0.75% K2P × 100, respectively). Using a K2P (×100) threshold of 1.5, cyt b gives better resolution for separating species, with a lower false positive rate and a higher positive predictive value than COI. At a threshold of 1.5%, cyt b showed a false positive rate less than half that of COI.

Table 2
Results of the analyses from the two-by-two contingency table compared against a threshold of 1.5. This table has been adapted from the full version, which can be found in [2].

Measure (at 1.5%)                         Cyt b       COI
Total greater than (nb)                   445,097     444,971
Total less than (nB)                      456,056     456,182
False negative (nAb)                      0           0
False positive (naB)                      90          216
Rate of false positive (naB/na)           0.000202    0.000485
Rate of false negative (nAb/nA)           0           0
Sensitivity (nAB/nA)                      1           1
Specificity (nab/na)                      0.999798    0.999515
Positive predictive value (nAB/nB)        0.999803    0.999527
Negative predictive value (nab/nb)        1           1
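The threshold comparison summarised in Tables 1 and 2 can be illustrated with a short sketch. It is not the authors' implementation: the function names `k2p_distance` and `contingency_metrics` are hypothetical, the distance uses the standard Kimura 2-parameter formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)] (P and Q being the proportions of transitions and transversions), and skipping gaps and ambiguity codes is an assumption. The cell counts nAB and nab are not printed in Table 2; here they are derived from its marginals as nAB = nB - naB and nab = nb - nAb.

```python
import math


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    compared = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        if {x, y} <= purines or {x, y} <= pyrimidines:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # Raises ValueError for saturated pairs, where the formula is undefined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def contingency_metrics(n_AB, n_Ab, n_aB, n_ab):
    """Derived statistics for the two-by-two layout of Table 1."""
    n_A = n_AB + n_Ab  # all same-species comparisons
    n_a = n_aB + n_ab  # all different-species comparisons
    n_B = n_AB + n_aB  # all comparisons below the threshold
    n_b = n_Ab + n_ab  # all comparisons above the threshold
    return {
        "false_positive_rate": n_aB / n_a,
        "false_negative_rate": n_Ab / n_A,
        "sensitivity": n_AB / n_A,
        "specificity": n_ab / n_a,
        "positive_predictive_value": n_AB / n_B,
        "negative_predictive_value": n_ab / n_b,
    }


THRESHOLD = 1.5  # K2P x 100, as in Table 2

# Toy pair with one transition in ten sites (illustrative data only).
d = 100 * k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(f"K2P x 100 = {d:.2f}; below threshold: {d < THRESHOLD}")

# Counts from Table 2; n_AB and n_ab are derived from the marginals.
cyt_b = contingency_metrics(n_AB=456_056 - 90, n_Ab=0, n_aB=90, n_ab=445_097)
coi = contingency_metrics(n_AB=456_182 - 216, n_Ab=0, n_aB=216, n_ab=444_971)
for name, m in (("cyt b", cyt_b), ("COI", coi)):
    print(name, {k: round(v, 6) for k, v in m.items()})
```

Under these assumptions both genes give the same number of different-species comparisons (na = 445,187) and the same grand total (n = 901,153, which equals the number of distinct pairs among 1343 sequences), so the difference in false positive rate comes entirely from the 90 versus 216 different-species pairs that fall below the threshold.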

4. Discussion and conclusion

COI and cyt b showed similar results when used in species identification. However, cyt b demonstrates: (i) greater variation in base pairs in a shorter sequence; (ii) intraspecific variation that is similar to that of COI and remains below a nominal threshold; and (iii) a false positive rate less than half that of COI and a greater positive predictive value. This is the first study to compare the relative values of cyt b and COI for phylogenetic reconstruction and identification of mammalian species, despite much investment in the previous use of both these loci. For the first time, statistical confidence has been applied to species identification.

Role of funding

This research was funded by the Leverhulme Trust (Grant number A20080076). The funder was not involved in study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Conflict of interest

None.

References

[1] S.S. Tobe, A. Kitchener, A. Linacre, Cytochrome b or cytochrome c oxidase subunit I for mammalian species identification – an answer to the debate, Forensic Sci. Int. Gen. Sup. 2 (2009) 306–307.
[2] S.S. Tobe, A.C. Kitchener, A.M.T. Linacre, Reconstructing mammalian phylogenies: a comparison of two mitochondrial genes, PLoS ONE 5 (11) (2010) e14156.
[3] C. Aitken, F. Taroni, Statistics and the Evaluation of Evidence for Forensic Scientists, 2nd ed., John Wiley & Sons, Ltd, Chichester, 2004.