Forensic Science International: Genetics 13 (2014) 3–9
Exploring iris colour prediction and ancestry inference in admixed populations of South America

A. Freire-Aradas a, Y. Ruiz a, C. Phillips a,*, O. Maroñas a, J. Söchtig a, A. Gómez Tato b, J. Álvarez Dios b, M. Casares de Cal b, V.N. Silbiger c, A.D. Luchessi c,d, M.A. Chiurillo e, Á. Carracedo a,f, M.V. Lareu a
a Unidad de Genética Forense, Instituto de Ciencias Forenses Luis Concheiro, Grupo de Medicina Xenómica, Universidade de Santiago de Compostela, Spain
b Facultade de Matemáticas, Universidade de Santiago de Compostela, Spain
c Faculdade de Farmácia, Universidade Federal do Rio Grande do Norte, Natal, Brazil
d Faculdade de Ciências Aplicadas, Universidade Estadual de Campinas, Limeira, Brazil
e University Centro-Occidental Lisandro Alvarado, Barquisimeto, Venezuela
f Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
Article info
Article history: Received 5 March 2014; received in revised form 23 May 2014; accepted 11 June 2014

Abstract
New DNA-based predictive tests for physical characteristics and inference of ancestry are highly informative tools that are increasingly used in forensic genetic analysis. Two eye colour prediction models, the Bayesian classifier Snipper and a multinomial logistic regression (MLR) system for the Irisplex assay, have been described for the analysis of unadmixed European populations. Since multiple SNPs in combination contribute in varying degrees to eye colour predictability in Europeans, these predictive tests are likely to perform differently in admixed populations with European co-ancestry than in unadmixed Europeans. In this study we examined 99 individuals from two admixed South American populations, comparing eye colour with ancestry to assess whether light eye colour phenotypes correlate with European co-ancestry in admixed individuals. Additionally, six eye colour prediction models, using varying numbers of SNPs and based on Snipper and MLR, were applied to the study populations. Furthermore, patterns of eye colour prediction were inferred for a set of publicly available admixed and globally distributed populations from the HGDP-CEPH panel and 1000 Genomes databases, with a special emphasis on admixed American populations similar to those of the study samples. © 2014 Elsevier Ireland Ltd. All rights reserved.

Keywords: SNPs; Admixture; America; Genetic ancestry; Iris colour prediction
* Corresponding author at: Forensic Genetics Unit, Institute of Legal Medicine, University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain. Tel.: +34 981 582 327; fax: +34 981 580 336. E-mail address: [email protected] (C. Phillips).
http://dx.doi.org/10.1016/j.fsigen.2014.06.007
1872-4973/© 2014 Elsevier Ireland Ltd. All rights reserved.

1. Introduction
Forensic genetics is entering a new phase of enhanced DNA analysis with the development of predictive tests for physical characteristics, such as hair and eye colour [1,2], together with the inference of genetic ancestry [3–6] using sets of single nucleotide polymorphisms (SNPs). Hair and eye colour variation is largely confined to European populations. Although one SNP in particular, rs12913832 in HERC2, is responsible for the greatest proportion of eye colour predictability, several other SNPs in five other genes are also informative for differentiating blue and brown iris phenotypes, and these have been brought together in the Irisplex test [1]. Similarly, Ruiz et al. [7] reported an extended set of 23 SNPs for the same purpose. Both prediction models are based on data from unadmixed European populations. Since multiple SNPs in combination contribute to eye colour predictability to varying degrees in Europeans, and additional variation has yet to be identified to account for the poor success rate of intermediate eye colour predictions, the overall predictive performance of SNP combinations is likely to differ between admixed populations with European co-ancestry and unadmixed Europeans. This study examined 99 individuals from two admixed South American population samples, collected in Natal in Northeast Brazil and Barquisimeto in Northwest Venezuela, to assess the correlation of light eye colour phenotypes with European
co-ancestry in admixed individuals. Additionally, we assessed the predictive performance of six eye colour prediction models based on subsets of the extended 23-SNP set [7], using both the USC Bayesian forensic SNP classifier Snipper and the recently launched online Irisplex prediction webtool based on a multinomial logistic regression (MLR) model [8]. Many of the additional SNPs in the extended sets are weakly predictive for blue and brown eyes, but we previously indicated that they can contribute to intermediate eye colour predictive performance [7]. The proportions of European, Native American and African co-ancestry were assessed in the sampled populations by genotyping 60 autosomal ancestry-informative marker SNPs (AIM-SNPs), some of which were specifically developed to improve differentiation of American from European and African ancestries. We also examined differences in the eye colour prediction patterns inferred with the six prediction methods for a set of admixed populations distributed across the globe from the HGDP-CEPH panel and 1000 Genomes databases, with a special emphasis on admixed American populations similar to those of the study samples. Although these samples lack eye colour phenotypes, we aimed to compare results from populations showing significant European co-ancestry with the eye colour predictions made for unadmixed HGDP-CEPH human genome diversity panel populations in the original Irisplex study [1], as well as with our own analyses of this panel using extended SNP sets.

2. Materials and methods
2.1. Population samples
Study DNA samples were obtained from two South American regions with high levels of population admixture from European, African and Native American contributors: Natal in Northeast Brazil (N = 79) and Barquisimeto in Northwest Venezuela (N = 20). Study DNA samples were collected with informed consent in all cases, and ethical approval was granted by the ethics committee of the University of Santiago de Compostela, Spain. Digital photography (Canon EOS 1000D) with a macro lens, under uniform lighting and distance-to-subject conditions, recorded the donors’ iris colours, and these were independently scrutinized by two investigators in the genotyping laboratory. Eye colour phenotypes were assigned to three groups: blue, green-hazel or brown. Where a brown peripupillary ring was present within a blue or green-hazel outer iris, the phenotype was recorded as the outer iris colour. Additionally, a total of 931 samples from seven population groups of the CEPH human genome diversity panel (HGDP-CEPH), from Africa, Europe, the Middle East, Central South Asia, East Asia, America and Oceania, were analyzed. Eye colour-associated SNP genotypes plus AIM-SNP data were also collected for 437 samples from seven 1000 Genomes populations: African Americans in Southwest USA (ASW), Mexicans in Los Angeles (MXL), Colombians of Medellin (CLM) and Puerto Ricans (PUR), as well as Finnish, British and Spanish Europeans (FIN, GBR, IBS). Lastly, as reference material for the three sets of samples above, data from 256 Europeans phenotyped for eye colour and genotyped for 23 eye colour-associated SNPs in a previous study [7] were used as the training set for the Bayesian classifier Snipper, which was adapted to enable closely linked SNPs to be uploaded as haplotype frequencies. This function was used to accommodate data for the closely linked HERC2 SNP pair rs12913832 and rs1129038. All SNP data from the training set were included in the original publication as supplementary material [7] and can be uploaded directly to the Snipper portal, allowing 23-SNP profiles, or subsets thereof, to be classified into blue, brown and intermediate eye colour categories.

2.2. DNA extraction and SNP genotyping
DNA was extracted from buccal swabs using standard phenol/chloroform extraction [9]. A total of 60 AIM-SNPs were genotyped using SNaPshot assays as previously described [3,5,10] to estimate co-ancestry proportions in the study samples. Genotypes for all these AIM-SNPs are available for 1000 Genomes and HGDP-CEPH populations. Study samples were genotyped for 23 SNPs that have been associated with eye colour in a range of genome-wide studies of European pigmentation [11–15]. The 23 eye colour SNPs were genotyped using two SNaPshot assays, SHEP1 and SHEP2, as previously described [7]; these assays were developed to analyze skin, hair and eye colour variation and to identify the best forensic predictors in each case. Genotypes for all 23 SNPs are available from 1000 Genomes, but only 15 of the 23 are available for HGDP-CEPH populations. The 15 SNPs typed in the HGDP-CEPH panel therefore form a novel combination not previously assessed by us, comprising: rs1015362, rs11636232, rs12203592, rs12896399, rs12913832, rs1393350, rs1408799, rs1667394, rs4778138, rs4778232, rs4778241, rs683, rs7183877, rs7495174 and rs8024968 (of which rs12203592, rs12896399, rs12913832 and rs1393350 are Irisplex SNPs). We completed the genotyping of the other eight SNPs in selected HGDP-CEPH populations outside Europe in which light eye colour had previously been predicted in certain individuals by Irisplex analysis (see Fig. 5 of [1]), specifically: Algerians, three Israeli and eight Pakistani-region populations. The five American HGDP-CEPH populations (Karitiana, Surui, Colombian, Mayan and Pima) were also genotyped for the full set of 23 SNPs.

2.3. Statistical analysis
Ancestry was analyzed using the clustering algorithm Structure v2.3.3 [16] with a burn-in period of 200,000 steps followed by 200,000 MCMC steps. We applied the admixture model with correlated allele frequencies and used prior labelling of reference populations (POPFLAG = 1) while treating test samples as unknown (POPFLAG = 0). HGDP-CEPH populations from Africa, Europe, East Asia and America were used as reference data to analyze the South American test populations, applying a four-population model (K = 4). Five independent runs were made at this K value, and plots were constructed using CLUMPP 1.1.2 [17] and Distruct 1.1 [18]. Eye colour predictions were made using the recently enhanced multinomial logistic regression model provided by the Erasmus Irisplex and HIrisplex Eye and Hair Colour DNA Phenotyping Webtool (http://www.erasmusmc.nl/fmb/resources/Irisplex_HIrisplex/, herein termed simply the Erasmus Irisplex Webtool) [8] and by the Snipper forensic classifier (http://mathgene.usc.es/snipper/). Besides the 99 study individuals, SNP genotypes were collected for HGDP-CEPH and 1000 Genomes populations using SPSmart [19] and compared to reference data from previously constructed training sets [7], with the SNPs sub-grouped into five prediction models analyzed with Snipper: Snipper6 (the 6 Irisplex SNPs); Snipper5+1* (5 Irisplex SNPs plus rs1129038, counted in combination with Irisplex rs12913832); Snipper13* (the 13 best candidate SNPs from Ruiz [7]); Snipper15 (the SNPs present in the HGDP-CEPH online data) and Snipper23* (all 23 SNPs studied by Ruiz). Erasmus6 (the 6 Irisplex SNPs analyzed with the Erasmus Irisplex Webtool) was additionally tested. It should be noted that the Erasmus Irisplex Webtool is based on a larger reference dataset [8] than the Excel calculator included in the original Irisplex publication [1] and gives improved MLR probabilities for a majority of SNP profiles. Supplementary Fig. S1 compares the 91 parallel MLR analyses made for all complete Irisplex profiles (the Excel version of the MLR calculator does not handle missing data). For Snipper analysis we applied the frequency-based
model in order to analyze the rs12913832-rs1129038 combination as a haplotype when present in the different SNP combinations (Snipper5+1*, Snipper13* and Snipper23*). A likelihood ratio of at least 3:1 was used to assign eye colours to three classes, blue, brown and green-hazel (i.e. the highest eye colour likelihood had to be at least 3 times the next highest likelihood). Profiles with likelihood ratios below this threshold were marked as unclassified. For Erasmus6, individuals with prediction probabilities below 0.7 were also marked as unclassified. ROC curves were constructed using R software v2.15.0 and the ROCR package [20].

3. Results
3.1. Eye colour prediction performance in study samples from Brazil and Venezuela
The eye colour classification results and the corresponding AUC, sensitivity and specificity values for the 99 study individuals are summarized in Table 1 for all six prediction methods. The corresponding ROC curves are shown in
Supplementary File S1. The Snipper23 predictions comprised 60 successful classifications, 24 incorrect classifications and 15 unclassified profiles. Four of the 24 incorrect classifications were blue predictions for brown-eyed individuals; the other 20 involved the green-hazel category (green-hazel individuals predicted as blue or brown, or brown-eyed individuals predicted as green-hazel), highlighting the consistently observed lower success rate for predicting intermediate eye colour [1,7], which is also reflected in the lower AUC (0.6676) and sensitivity (0.2105) for this phenotype. The 15-SNP subset, Snipper15, performed similarly to the 23-SNP set for blue and brown but was more conservative for green-hazel: fewer green-hazel individuals were falsely predicted as blue (47.8% versus 60.9%, a difference of 13.1 percentage points), with more of them left unclassified. Snipper13 gave success rates comparable to 23 SNPs for blue and slight increases for green-hazel (21.7%) and brown (76.6%). Furthermore, this SNP subset misclassified only a single brown-eyed individual as blue. Snipper13 and Snipper5+1 gave similar results for blue and brown classifications, whereas green-hazel prediction increased slightly to 26.1% with Snipper5+1, which provided the highest sensitivity for the green-hazel phenotype (0.3158) of all six methods tested. The Erasmus6 analysis does not classify any of the green-hazel phenotypes, with
Table 1. Prediction success estimated from the study samples of Brazil and Venezuela. Three eye colour phenotypes were predicted applying six eye colour prediction methods: Erasmus6, Snipper6, Snipper5+1, Snipper13, Snipper15 and Snipper23. For each method, rows give the observed phenotype; the first four value columns give the percentage of those individuals predicted as blue, green-hazel (intermediate for Erasmus6), brown or unclassified, and the last three give the corresponding summary statistics (AUC, sensitivity and specificity). Correct predictions lie where observed and predicted phenotypes match (shown in bold on the diagonal in the original table).

Erasmus6
Observed       Blue   Intermediate  Brown   Unclassified  AUC      Sensitivity  Specificity
Blue           75     0             0       25            0.85     1            0.7179
Intermediate   60.9   0             17.4    21.7          No data  0            1
Brown          12.5   0             81.25   6.25          0.9097   0.8667       0.8519

Snipper6
Observed       Blue   Green-hazel   Brown   Unclassified  AUC      Sensitivity  Specificity
Blue           58.3   0             0       41.7          0.8793   1            0.8636
Green-hazel    21.7   4.3           4.3     69.6          0.7291   0.1250       1
Brown          6.2    0             59.4    34.4          0.8440   0.7755       0.9677

Snipper5+1*
Observed       Blue   Green-hazel   Brown   Unclassified  AUC      Sensitivity  Specificity
Blue           100    0             0       0             0.9569   1            0.8219
Green-hazel    52.2   26.1          4.3     17.4          0.6281   0.3158       0.9403
Brown          1.6    6.2           76.6    15.6          0.9214   0.7778       0.9706

Snipper13*
Observed       Blue   Green-hazel   Brown   Unclassified  AUC      Sensitivity  Specificity
Blue           100    0             0       0             0.9358   1            0.8354
Green-hazel    56.6   21.7          4.3     17.4          0.6688   0.2632       0.9538
Brown          1.6    7.8           76.6    14            0.9393   0.8167       0.9697

Snipper15
Observed       Blue   Green-hazel   Brown   Unclassified  AUC      Sensitivity  Specificity
Blue           100    0             0       0             0.9071   1            0.8049
Green-hazel    47.8   17.4          8.7     26.1          0.6602   0.2353       0.8939
Brown          7.8    10.9          65.6    15.7          0.9237   0.7778       0.9412

Snipper23*
Observed       Blue   Green-hazel   Brown   Unclassified  AUC      Sensitivity  Specificity
Blue           100    0             0       0             0.9464   1            0.7857
Green-hazel    60.9   17.4          4.3     17.4          0.6676   0.2105       0.9231
Brown          6.2    7.8           68.8    17.2          0.9411   0.8          0.9688
*HERC2 haplotype, rs12913832 plus rs1129038. Erasmus6/Snipper6: rs12203592, rs12896399, rs12913832, rs1393350, rs16891982, rs1800407. Snipper5+1*: rs1129038, rs12203592, rs12896399, rs12913832, rs1393350, rs16891982, rs1800407. Snipper13*: rs1129038, rs11636232, rs12203592, rs12896399, rs12913832, rs1393350, rs1667394, rs16891982, rs1800407, rs4778232, rs4778241, rs7183877, rs8024968. Snipper15: rs1015362, rs11636232, rs12203592, rs12896399, rs12913832, rs1393350, rs1408799, rs1667394, rs4778138, rs4778232, rs4778241, rs683, rs7183877, rs7495174, rs8024968. Snipper23*: rs1015362, rs1129038, rs11636232, rs12203592, rs12592730, rs12896399, rs12913832, rs1375164, rs1393350, rs1408799, rs1667394, rs16891982, rs1800407, rs26722, rs4778138, rs4778232, rs4778241, rs6058017, rs683, rs7183877, rs7495174, rs8024968, rs916977.
most predicted as blue. The predictive success for blue decreases to 75%, but for brown it rises to 81.25%. The six Irisplex SNPs analyzed with Snipper also gave poor green-hazel classification accuracy (4.3%), but the main difference from the MLR model is that, instead of being classified as blue, most green-hazel individuals are unclassified. The higher non-classification rate also affects blue and brown predictions (41.7% and 34.4% unclassified, respectively). In summary, the Snipper13 set appears to give a suitable balance between the number of SNPs typed in a reasonably compact test and the success rate, with some 50% lower error for brown eye classifications than the 15-SNP set and a lower error than the 23-SNP set. This finding is in line with that of Ruiz et al. when assessing the effect of adding extra weak-effect SNPs to the six Irisplex SNPs [7]. However, blue-eyed individuals represented only 12% of the study samples, and slight differences in predictive performance cannot be reliably gauged from such a small number of individuals. Supplementary File S2 lists the genotypes of all SNPs analyzed for the study samples, ranked within each eye colour group by decreasing European co-ancestry proportion. The Erasmus6 profiles are also listed separately with the prediction values obtained from the Erasmus Irisplex Webtool, which gave 87 predictions. These are aligned with the predictions made with the other SNP sets using Snipper and with the recorded phenotypes. Eight brown-eyed individuals predicted to be blue are listed separately. One of these individuals (sample code 17) had blue eye predictions with all SNP sets, whereas all the others had at least two 'no prediction' results out of the five Snipper analyses. Also noteworthy, all 12 blue-eyed study individuals were predicted as blue with every extended SNP set, whereas 3/12 and 5/12 received no prediction with Erasmus6 and Snipper6, respectively.
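As a check on the arithmetic linking Table 1 to the counts quoted for Snipper23 (60 correct, 24 incorrect, 15 unclassified), the percentages can be converted back to numbers of individuals. This is a minimal sketch under one stated assumption: only the 12 blue-eyed individuals are given explicitly in the text, while the group sizes of 23 green-hazel and 64 brown-eyed individuals are inferred here from the table percentages (for example, 4.3% corresponds to 1 of 23 and 1.6% to 1 of 64).

```python
# Observed group sizes: 12 blue is stated in the text; 23 green-hazel and 64 brown
# are ASSUMED, inferred from the Table 1 percentages (12 + 23 + 64 = 99).
GROUP_SIZES = {"blue": 12, "green-hazel": 23, "brown": 64}

# Snipper23 rows of Table 1: % of each observed phenotype assigned to each predicted class.
SNIPPER23 = {
    "blue":        {"blue": 100.0, "green-hazel": 0.0,  "brown": 0.0,  "unclassified": 0.0},
    "green-hazel": {"blue": 60.9,  "green-hazel": 17.4, "brown": 4.3,  "unclassified": 17.4},
    "brown":       {"blue": 6.2,   "green-hazel": 7.8,  "brown": 68.8, "unclassified": 17.2},
}


def to_counts(table, sizes):
    """Convert row percentages back to whole-number counts per observed group."""
    return {
        observed: {pred: round(pct * sizes[observed] / 100) for pred, pct in row.items()}
        for observed, row in table.items()
    }


def summarize(counts):
    """Return (correct, incorrect, unclassified) totals over all observed groups."""
    correct = sum(row[observed] for observed, row in counts.items())
    unclassified = sum(row["unclassified"] for row in counts.values())
    total = sum(sum(row.values()) for row in counts.values())
    return correct, total - correct - unclassified, unclassified
```

Rounding each cell this way reproduces the 60/24/15 split for Snipper23, including the four brown-eyed individuals predicted as blue; the reconstruction only holds if every published percentage was rounded from a whole-number count.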
3.2. Comparison of European co-ancestry and eye colour in study samples
Ancestry was assessed with Structure using four reference population groups and an optimum of K = 4 clusters (likelihood of K plot not shown). Supplementary Fig. S2A shows the cluster pattern for the study group, ordered by decreasing European cluster membership. Supplementary Fig. S2B shows identical analyses but depicts separate clusters for each study population location.
Both populations show similar patterns, with European co-ancestry as the majority ancestral component in 85% of individuals. Amongst the Brazilians, with a mean European co-ancestry of 0.672, African co-ancestry is the second major component (mean: 0.177), while American ancestry is a much smaller third component in almost all cases (mean: 0.086). We note that one or two Brazilians show non-negligible East Asian co-ancestry. Amongst Venezuelans (mean European co-ancestry: 0.730), American co-ancestry is the second major component (mean: 0.178), and African co-ancestry is much less evident in most samples from this region (mean: 0.063). According to these data, the mean ratio of European to non-European co-ancestry is roughly 1:0.5 for the Brazilian samples (0.672:0.328) and 1:0.4 for the Venezuelan samples (0.730:0.270). Fig. 1 plots the 99 study individuals arranged by descending European co-ancestry proportion (dark grey, European; light grey, non-European) together with the recorded eye colour phenotype and the phenotypes predicted by the six methods assessed. Blue, green and brown denote the corresponding recorded or predicted eye colour phenotypes, whereas grey denotes unclassified samples. Phenotype predictions largely mirror the data reported in Table 1, and the corresponding genotypes and ancestry proportions are given in Supplementary File S2. Nearly all Brazilian and Venezuelan study samples are admixed with predominantly European co-ancestry: 53.5% of individuals have European co-ancestry above 0.7. Not surprisingly, the proportion of brown eye phenotypes rises towards the right side of the chart, as European co-ancestry decreases. The highest number of unclassified individuals was found with the Snipper6 prediction method.

3.3. Patterns of eye colour prediction in global population panels without phenotypes
As the CEPH-15 combination of SNPs (Snipper15) had not been tested previously, its predictive performance was assessed by analyzing the 256 European reference samples of Ruiz et al. [7] using cross-validation in Snipper, which gave classification success rates of 95.17% for blue, 95.00% for green-hazel and 94.12% for brown. When compared to the 23 SNPs of the original study [7], green-hazel
Fig. 1. Study samples (N = 99) ranked by decreasing European ancestry proportion (dark grey, European; light grey, non-European) aligned with the recorded eye colour and the output from the six eye colour prediction methods (Erasmus6, Snipper6, Snipper5+1, Snipper13, Snipper15 and Snipper23). Blue, green or brown denote the corresponding iris phenotype predictions, and grey depicts no prediction (prediction probability below 0.7 for Erasmus6 or likelihood ratio below 3:1 for Snipper).
Fig. 2. Worldwide landscape of predicted eye colour phenotypes based on 15 predictive SNPs genotyped in 59 populations (HGDP-CEPH and 1000 Genomes). Blue, green or brown segments denote the corresponding iris colour predictions, and grey depicts no prediction (likelihood ratio below 3:1).
predictive success drops by 4% to 95.00% and blue by around 3% to 95.17%, but there is a slight rise in brown predictive success. Applying a 3:1 likelihood ratio threshold in Snipper with the Snipper15 set to all HGDP-CEPH and selected 1000 Genomes samples (1368 individuals in 59 populations) gave the prediction results summarized in Fig. 2. As widely observed [1,7,10–12], light iris colour predominates in northern Europe, reaching a maximum of 72% predicted blue eyes in the Finnish. Other parts of Eurasia show some predicted eye colour variation, but on a reduced scale and largely confined to green-hazel. Blue eyes were predicted in a small proportion of individuals in Israel-Druze (10%), Israel-Palestinian (2.2%), Pakistani-Balochi (4.2%), Pakistani-Burusho (4%) and Pakistani-Kalash (8.7%). Predictions for Africans, East Asians (including Siberian Yakut) and Oceanians are exclusively brown. Native American populations are predicted to have predominantly brown eyes, although green-hazel was predicted in a larger than expected proportion of individuals. Results of eye colour prediction analysis of HGDP-CEPH Karitiana, Surui, Colombian, Maya and Pima plus Algerians, Israel-region populations and Pakistanis, using all six SNP sets, are summarized in Fig. 3. Predictions made with 13, 15 or 23 SNPs are very similar, with generally close matches for the inferred proportions of green-hazel and brown eye classifications. The very small proportion of blue eye predictions in each population is comparable across all SNP sets applied, although fewer were predicted amongst the 1000 Genomes admixed populations with the Irisplex SNPs. Additionally, the small number of blue eye predictions in Pakistani-Burusho and -Balochi were not inferred with the Irisplex SNPs.
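The two decision rules used throughout these comparisons, a 3:1 likelihood ratio for Snipper and a 0.7 probability cut-off for the Erasmus MLR output, can be sketched as follows. This is a minimal, hypothetical illustration rather than the code of Snipper or the Erasmus webtool: a generic naive-Bayes likelihood product stands in for the Bayesian classifier, and the linked HERC2 pair rs12913832-rs1129038 is treated as a single combined locus, mirroring the haplotype-frequency approach described in the Methods. The frequencies, the genotype labels ("h1h1" and so on), the locus name "SNP_B" and the 1e-6 floor for unseen genotypes are all invented for illustration.

```python
from math import prod

# HYPOTHETICAL per-class genotype frequencies (illustration only, not real training data).
# "HERC2" is the linked rs12913832-rs1129038 pair treated as ONE locus, so its joint
# two-SNP genotype frequency is looked up directly instead of multiplying two SNPs.
FREQS = {
    "blue":        {"HERC2": {"h1h1": 0.80, "h1h2": 0.15, "h2h2": 0.05},
                    "SNP_B": {"TT": 0.50, "GT": 0.40, "GG": 0.10}},
    "green-hazel": {"HERC2": {"h1h1": 0.40, "h1h2": 0.45, "h2h2": 0.15},
                    "SNP_B": {"TT": 0.35, "GT": 0.45, "GG": 0.20}},
    "brown":       {"HERC2": {"h1h1": 0.05, "h1h2": 0.45, "h2h2": 0.50},
                    "SNP_B": {"TT": 0.20, "GT": 0.50, "GG": 0.30}},
}
UNSEEN = 1e-6  # assumed floor for genotypes absent from the training data


def naive_bayes_likelihoods(profile, freqs=FREQS):
    """Likelihood of a genotype profile under each class, assuming independent loci.

    Only loci present in the profile contribute, so untyped SNPs are skipped.
    """
    return {
        cls: prod(table[locus].get(genotype, UNSEEN) for locus, genotype in profile.items())
        for cls, table in freqs.items()
    }


def classify_by_lr(likelihoods, threshold=3.0):
    """Return the top class only if its likelihood is >= threshold times the runner-up."""
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top > 0 and (second == 0 or top / second >= threshold):
        return best
    return "unclassified"


def classify_by_probability(probabilities, min_p=0.7):
    """Return the most probable class only if its probability reaches min_p."""
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] >= min_p else "unclassified"
```

With these toy frequencies, the profile {"HERC2": "h1h1", "SNP_B": "TT"} is left unclassified because blue beats green-hazel by only 0.40/0.14 (about 2.9:1), whereas {"HERC2": "h2h2", "SNP_B": "GG"} is called brown at 0.15/0.03 = 5:1. Raising `threshold` for green-hazel alone, as suggested in the Discussion, would be a one-line change to this rule.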
For certain cases in American and Eurasian populations, the Snipper13 and Snipper15 sets appear to over-estimate the proportion of green-hazel eyes, and Erasmus6 makes no predictions of this phenotype for any of the SNP genotypes found in the populations assessed. Snipper6 performs very similarly to the recently released online MLR model, in this case making minimal green-hazel predictions in a minority of populations; however, it gives the largest unclassified proportions of all six SNP sets. Finally, eye colour predictions from each SNP set are aligned against European co-ancestry proportions for the admixed American 1000 Genomes populations in Supplementary Fig. S3.

4. Discussion
Individuals from admixed populations are characterized by widely contrasting co-ancestry proportions, and this can hinder accurate forensic ancestry analysis made with small numbers of AIM-SNPs. However, one advantage of ancestry inference compared with phenotype prediction in a forensic context is that sufficient allowance can be made for mixed patterns detected by comparative analyses such as Structure and principal component analysis (PCA). These tests can reveal admixed individuals through distinct cluster membership proportions or displaced positions on PCA plots. To aid detection of admixed individuals in Snipper analyses of single SNP profiles, this classifier now includes an estimation of co-ancestry proportions when these are detected at significant levels. Although eye colour prediction is more categorical by comparison, we observed in this study that Irisplex SNPs analyzed with the MLR model (Erasmus6) in American or Eurasian populations do not predict any green-hazel phenotypes, and a higher proportion of individuals are unclassified compared with Europeans. This matches the analyses of HGDP-CEPH populations in the first Irisplex study [1] and suggests that Irisplex SNPs can handle non-European population analyses robustly, as originally claimed, despite making no intermediate eye colour predictions in these populations. The same cannot be said for Irisplex SNPs analyzed using Snipper (Snipper6), with a much larger number of admixed American and Pakistani individuals unclassified. However, a large proportion of these do obtain a high enough likelihood ratio when the
Fig. 3. Comparative pie charts from the analysis of six eye colour prediction models: Erasmus6, Snipper6, Snipper5+1, Snipper13, Snipper15 and Snipper23. Populations evaluated comprise: admixed 1000 Genomes (ASW, CLM, MXL, PUR), the 99 study individuals (Brazil-Venezuela), HGDP-CEPH American (Brazil-Karitiana, Brazil-Surui, Colombia-Piapoco/Curripaco, Mexico-Maya, Mexico-Pima) and Eurasian populations (Algeria, Israeli-region and Pakistan regions).
rs12913832-rs1129038 haplotype is included, and Fig. 3 shows that results are very similar for Snipper5+1 and the 13 best SNPs of Ruiz et al. (Snipper13) [7]. The admixed study individuals show lower predictive accuracy than Europeans, and fewer of them reach the 0.7 MLR probability threshold. Therefore, for such samples, the use of forensic ancestry tests alongside eye colour tests can alert the user to the complexities of analyzing individuals with admixed ancestry. Although the authors of the original Irisplex study [1] made the point that ancestry information is not a necessary adjunct to Irisplex analyses, given the demographic profiles of most urban populations worldwide we suggest that ancestry tests be considered in order to alert forensic users to cases where the usual eye colour SNP genotype combinations do not apply. While the use of online SNP data to analyze the predictive success of different marker sets is obviously hindered by the absence of phenotype descriptions for the samples, it still provides an interesting set of patterns against which to compare outcomes with increasing numbers of weak-effect eye colour predictors. In particular, we note that blue eyes are predicted for individuals in many of the admixed American and Eurasian populations studied here, and these are inferred with a degree of consistency by all SNP sets, albeit at a slightly higher frequency with 13, 15 and 23 SNPs than with Erasmus6/Snipper6. Blue eye colour in individuals from such population backgrounds represents an atypical physical appearance that would have considerable value to police investigators. Whether the green-hazel predictions made with expanded SNP sets for HGDP-CEPH and 1000 Genomes samples are accurate cannot be confirmed, but amongst the study samples the larger SNP sets gave a consistent 17% correct classification rate for
green-hazel that could be further adapted to reduce error by raising the likelihood ratio threshold for this phenotype above 3:1. The inclusion of rs1129038 in these SNP sets can contribute to improved green-hazel prediction, as noted by Ruiz et al. [7]. However, there appear to be many more green-hazel eye colours predicted than might be expected in American as well as Pakistani and Middle Eastern populations. The pie-chart proportions of green-hazel phenotypes suggest that predictions with a large number of weak predictors could be over-estimating their numbers in non-European populations, although these may include a large proportion of light brown eye colour phenotypes, which all studies to date have had difficulty differentiating from green-hazel. It is informative to compare our results with two Irisplex eye colour studies similar to this one. Firstly, a study of Irisplex performance by Prestes et al. [21], analyzing admixed European-East Asian individuals, found a higher non-classification rate (22%) than in analyses of Europeans. Blue and brown eye colour were predicted with 100% accuracy, but no green-eyed individuals were correctly predicted as intermediate (although the sample size for this eye colour was very small). Interestingly, this study suggested that the number of generations since population admixture has a bearing on predictive success, observing that levels of admixture below 1:7 were less accurately predicted than levels of 1:1 or 1:3. This may be reflected in our study populations, where admixture events in Brazil and Venezuela are likely to have occurred several generations earlier for a large proportion of individuals. In the second study, by Yun et al. [22], Irisplex SNPs in 905 European, intermediate Eurasian and East Asian population samples were compared using the Snipper prediction system and a version of the MLR calculator [1] adapted for the FROG-kb website [23] that preserves the original formulae.
The Snipper likelihood ratio threshold was identical to ours, and the populations were not phenotyped for eye colour. This study therefore gauged differences in the scope and nature of the predictions made by two statistical models for the same genotypes, in a similar way to the first two columns of pie charts in Fig. 3. Inconsistent predictions between Snipper and the FROG-kb calculator mainly comprised brown versus green-hazel calls, respectively, for identical 6-SNP profiles, and this effect could partly explain the higher levels of green-hazel phenotypes in our survey. The study of Yun et al. also examined the effect of incomplete profiles on predictive consistency, noting the very weak contribution of rs12203592 as previously suggested [1,7]. More importantly, missing rs12913832 data led to unreliable predictions with the FROG-kb MLR calculator (26 Han Chinese with no rs12913832 genotypes were predicted to have blue eyes). The Erasmus Irisplex Webtool has been designed to decline predictions for Irisplex 6-SNP profiles that lack rs12913832 data (and the HIrisplex section declines profiles lacking MC1R genotypes). Therefore, given the improved prediction values we observed with this calculator and the safeguards described above against unreliable predictions when critical SNP data are missing, the Erasmus Irisplex Webtool is a good step forward in the reliability and usefulness of the Irisplex system. Nevertheless, bearing in mind the eight brown-eyed study samples incorrectly predicted as blue with the Erasmus Irisplex Webtool, we also advocate genetic analysis of ancestry as important support in interpreting SNP eye colour predictors from individuals with admixed backgrounds. We also continue to favour the inclusion of rs1129038 as a SNP that both reduces the non-classification rate for blue and brown and improves prediction of green-hazel eye colour.
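The safeguard described above, declining an Irisplex prediction when the key HERC2 SNP rs12913832 has no genotype, amounts to a simple gate placed in front of any predictor. The sketch below is a hypothetical illustration of that behaviour, not the Erasmus Irisplex Webtool's implementation: the function names, the choice of rs12913832 as the only critical SNP and the "declined" message format are assumptions, while the six Irisplex SNPs are those listed under Table 1.

```python
from typing import Callable, Mapping, Optional

# The six Irisplex SNPs (the Erasmus6/Snipper6 set listed under Table 1).
IRISPLEX_SNPS = ("rs12203592", "rs12896399", "rs12913832",
                 "rs1393350", "rs16891982", "rs1800407")
# SNPs whose absence blocks a prediction (ASSUMED here: only rs12913832).
CRITICAL_SNPS = ("rs12913832",)

Profile = Mapping[str, Optional[str]]


def missing_snps(profile: Profile, snps=IRISPLEX_SNPS) -> list:
    """List SNPs with no genotype call (absent, None or empty string)."""
    return [snp for snp in snps if not profile.get(snp)]


def guarded_predict(profile: Profile, predictor: Callable[[Profile], str],
                    critical=CRITICAL_SNPS) -> str:
    """Decline to predict when a critical SNP is missing; otherwise defer to predictor."""
    lacking = missing_snps(profile, critical)
    if lacking:
        return "declined: missing " + ", ".join(lacking)
    return predictor(profile)
```

A profile typed only for rs12203592 and rs12896399 is declined outright, whichever predictor is supplied, whereas a profile that includes rs12913832 is passed through even if weaker SNPs such as rs12203592 are absent, which is the asymmetry Yun et al. found to matter.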
We note that the recent study of eye colour predictive models by Allwood and Harbison [24] found that rs1129038 provided the root split in their proposed brown and blue classification tree predictions. When combined with rs12913832, the additional HERC2 SNP rs1129038 contributed particularly strongly to the success of brown vs. non-brown classifications.

Acknowledgements
AFA was supported by a research grant from Deputación da Coruña and an academic mobility grant from the Asociación Universitaria Iberoamericana de Postgrado. MVL was supported by funding from Xunta de Galicia INCITE 09 208163PR. The authors would like to thank all the anonymous donors who participated in the study. The authors are especially grateful to Promega Biotech Ibérica, Promega Corporation, for sponsoring the printing of the colour figures.

Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.fsigen.2014.06.007.

References
[1] S. Walsh, F. Liu, K.N. Ballantyne, M. van Oven, O. Lao, M. Kayser, Irisplex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information, Forensic Sci. Int. Genet. 5 (2011) 170–180.
[2] S. Walsh, F. Liu, A. Wollstein, L. Kovatsi, A. Ralf, A. Kosiniak-Kamysz, W. Branicki, M. Kayser, The HIrisplex system for simultaneous prediction of hair and eye colour from DNA, Forensic Sci. Int. Genet. 7 (2013) 98–115.
[3] C. Phillips, A. Salas, J.J. Sánchez, M. Fondevila, A. Gómez-Tato, J. Alvarez-Dios, M. Calaza de Cal, D. Ballard, M.V. Lareu, Á. Carracedo, The SNPforID Consortium, Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs, Forensic Sci. Int. Genet. 1 (2007) 273–280.
[4] O. Lao, P.M. Vallone, M.D. Coble, T.M. Diegoli, M. van Oven, K.J. van der Gaag, J. Pijpe, P. de Knijff, M. Kayser, Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA, Hum. Mutat. 31 (2010) E1875–E1893.
[5] M. Fondevila, C. Phillips, C. Santos, A. Freire-Aradas, P.M. Vallone, J.M. Butler, M.V. Lareu, Á. Carracedo, Revision of the SNPforID 34-plex forensic ancestry test: assay enhancements, standard reference sample genotypes and extended population studies, Forensic Sci. Int. Genet. 7 (2013) 63–74.
[6] R. Pereira, C. Phillips, N. Pinto, C. Santos, S.E. dos Santos, A. Amorim, Á. Carracedo, L. Gusmão, Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing, PLoS One 7 (2012) e29684.
[7] Y. Ruiz, C. Phillips, A. Gomez-Tato, J. Alvarez-Dios, M. Casares de Cal, R. Cruz, O. Maroñas, J. Söchtig, M. Fondevila, M.J. Rodriguez-Cid, Á. Carracedo, M.V. Lareu, Further development of forensic eye colour predictive tests, Forensic Sci. Int. Genet. 7 (2013) 28–40.
[8] S. Walsh, L. Chaitanya, L. Clarisse, L. Wirken, J. Draus-Barini, L. Kovatsi, H. Maeda, T.i. Ishikawa, T. Sijen, P. de Knijff, W. Branicki, F. Liu, M. Kayser, Developmental validation of the HIrisplex system: DNA-based eye and hair colour prediction for forensic and anthropological usage, Forensic Sci. Int. Genet. 9 (2014) 150–161.
[9] X. Liu, S. Harada, DNA isolation from mammalian samples, Curr. Protoc. Mol. Biol. (2013), Chapter 2: Unit 2.14.
[10] C. Phillips, L. Porras-Hurtado, A. Freire-Aradas, M. Fondevila, C. Santos, A. Salas, J. Henao, C. Isaza, L. Beltrán, V. Nogueira-Silbiger, A. Castillo, A. Ibarra, F. Moreno Chavez, J. Söchtig, Y. Ruiz, C. Carvalho-Gontijo, S. de Oliveira, G. Barreto, F. Rondon, W. Zabala, L. Borjas, Á. Carracedo, M.V. Lareu, The PIMA SNP Panel: a Population Informative Multiplex for The Americas, 2014 (in preparation).
[11] F. Liu, K. van Duijn, J.R. Vingerling, A. Hofman, A.G. Uitterlinden, A.C. Janssens, M. Kayser, Eye colour and the prediction of complex phenotypes from genotypes, Curr. Biol. 19 (2009) R192–R193.
[12] P. Sulem, D.F. Gudbjartsson, S.N. Stacey, A. Helgason, T. Rafnar, K.P. Magnusson, A. Manolescu, A. Karason, A. Palsson, G. Thorleifsson, M. Jakobsdottir, S. Steinberg, S. Palsson, F. Jonasson, B. Sigurgeirsson, K. Thorisdottir, R. Ragnarsson, K.R. Benediktsdottir, K.K. Aben, L.A. Kiemeney, J.H. Olafsson, J. Gulcher, A. Kong, U. Thorsteinsdottir, K. Stefansson, Genetic determinants of hair, eye and skin pigmentation in Europeans, Nat. Genet. 39 (2007) 1443–1452.
[13] P. Sulem, D.F. Gudbjartsson, S.N. Stacey, A. Helgason, T. Rafnar, M. Jakobsdottir, S. Steinberg, S.A. Gudjonsson, A. Palsson, G. Thorleifsson, S. Palsson, B. Sigurgeirsson, K. Thorisdottir, R. Ragnarsson, K.R. Benediktsdottir, K.K. Aben, S.H. Vermeulen, A.M. Goldstein, M.A. Tucker, L.A. Kiemeney, J.H. Olafsson, J. Gulcher, A. Kong, U. Thorsteinsdottir, K. Stefansson, Two newly identified genetic determinants of pigmentation in Europeans, Nat. Genet. 40 (2008) 835–837.
[14] R.P. Stokowski, P.V. Pant, T. Dadd, A. Fereday, D.A. Hinds, C. Jarman, W. Filsell, R.S. Ginger, M.R. Green, F.J. van der Ouderaa, D.R. Cox, A genome wide association study of skin pigmentation in a South Asian population, Am. J. Hum. Genet. 81 (2007) 1119–1132.
[15] J. Han, P. Kraft, H. Nan, Q. Guo, C. Chen, A. Qureshi, S.E. Hankinson, F.B. Hu, D.L. Duffy, Z.Z. Zhao, N.G. Martin, G.W. Montgomery, N.K. Hayward, G. Thomas, R.N. Hoover, S. Chanock, D.J. Hunter, A genome-wide association study identifies novel alleles associated with hair colour and skin pigmentation, PLoS Genet. 4 (2008) e1000074.
[16] J.K. Pritchard, M. Stephens, P. Donnelly, Inference of population structure using multilocus genotype data, Genetics 155 (2000) 945–959.
[17] M. Jakobsson, N.A. Rosenberg, CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure, Bioinformatics 23 (2007) 1801–1806.
[18] N.A. Rosenberg, DISTRUCT: a program for the graphical display of population structure, Mol. Ecol. Notes 4 (2004) 137–138.
[19] HGDP-CEPH and 1000 Genomes data obtained from SPSmart at: http://spsmart.cesga.es (accessed April 2014).
[20] T. Sing, O. Sander, N. Beerenwinkel, T. Lengauer, ROCR: visualizing classifier performance in R, Bioinformatics 21 (2005) 3940–3941.
[21] P.R. Prestes, R.J. Mitchell, R. Daniel, K.N. Ballantyne, R.A.H. van Oorschot, Evaluation of the Irisplex system in admixed individuals, Forensic Sci. Int. Genet. Suppl. 3 (2011) e283–e284.
[22] L. Yun, Y. Gu, H. Rajeevan, K.K. Kidd, Application of six Irisplex SNPs and comparison of two eye colour prediction systems in diverse Eurasia populations, Int. J. Leg. Med. 128 (2014) 447–453.
[23] H. Rajeevan, U. Soundararajan, A.J. Pakstis, K.K. Kidd, Introducing the Forensic Research/Reference on Genetics knowledge base, FROG-kb, Invest. Genet. 3 (2012) 18.
[24] J.S. Allwood, S.A. Harbison, SNP model development for the prediction of eye colour in New Zealand, Forensic Sci. Int. Genet. 7 (2013) 444–452.