Forensic Science International: Genetics Supplement Series 4 (2013) e35–e36
Soil sample metagenome NGS data management for forensic investigation
Liisa Lilje, Triin Lillsaar, Ranno Rätsep, Jaak Simm, Anu Aaspõllu *
Tallinn University of Technology, Centre for Biology of Integrated Systems, Akadeemia tee 15A, Tallinn, Estonia
ARTICLE INFO
Article history: Received 31 August 2013; accepted 2 October 2013
ABSTRACT
Soil analysis is a valuable resource in forensic investigation. Classical forensic soil analysis examines the physical characteristics and chemical composition of soil, such as soil type, colour, particle size and shape, pH, and elemental, mineral and organic content. However, the limited variability of these parameters does not always allow adequate discrimination between soil samples. As soil supports an extreme diversity of microorganisms and eukaryotic communities, microbiological approaches have been proposed. Several molecular approaches for microbial DNA profiling are available; however, there is a lack of published data on the implementation of next generation sequencing (NGS) approaches for forensic soil analysis. The aim of the current study was to elaborate criteria for soil metagenome data management and database searching. We used our previously sequenced set of 11 sample collections from different environments (forests, fields, grasslands, urban park) with different flora. Each sample collection comprises 9 soil samples taken from one sampling area (30 m × 30 m) at 15 m spacing. In the current study we concentrated mainly on the V2-V3 region of the 18S rRNA gene for fungi; however, the SSU rRNA region for arbuscular mycorrhizal fungi (AMF) and the V2-V3 hypervariable region of the 16S rRNA gene for bacterial communities were also taken into account. Sequencing was performed on the Roche/454 platform. For data analysis, an OTU-based approach in the mothur software and NCBI BLASTN searches were used. NCBI BLASTN analysis revealed 2983 AMF matches and 8997 18S matches in total, and 25477 OTUs (16S) were determined. Several data filtration approaches were used for data management. We found that the 18S marker results could be used to create and run a filtered database that is computationally much more efficient and flexible. Our results have broad impact; however, more samples have to be analysed, additional studies performed, and cooperation between soil scientists and forensic scientists is required before these novel techniques can be implemented in routine forensic practice. © 2013 Elsevier Ireland Ltd. All rights reserved.
Keywords: Forensic soil analysis; Metagenome; 16S rRNA; 18S rRNA; SSU rRNA; Next generation sequencing
* Corresponding author. Tel.: +372 620 4428; fax: +372 620 4401. E-mail address: [email protected] (A. Aaspõllu).
1875-1768/$ – see front matter © 2013 Elsevier Ireland Ltd. All rights reserved. http://dx.doi.org/10.1016/j.fsigss.2013.10.017
1. Introduction
Soil analysis is a valuable resource in forensic investigation. Classical forensic soil analysis examines the physical characteristics and chemical composition of soil, such as soil type, colour, particle size and shape, pH, and elemental, mineral and organic content. However, the limited variability of these parameters does not always allow adequate discrimination between soil samples. As soil supports an extreme diversity of microorganisms and eukaryotic communities, microbiological approaches have been proposed. Several molecular approaches for microbial DNA profiling are available; however, there is a lack of published data on the implementation of next generation sequencing (NGS) approaches for forensic soil analysis. The aim of the current study is to elaborate criteria for soil metagenome data management and database searching.
2. Materials and methods
We used our previously sequenced set of 11 sample collections from different environments (forests, fields, grasslands, town park) with different flora (Table 1). Each sample collection comprises 9 soil samples taken from one sampling area (30 m × 30 m) at 15 m spacing. In the current study we concentrated mainly on the V2-V3 region of the 18S rRNA gene for fungi [1]; however, the SSU rRNA region for arbuscular mycorrhizal fungi (AMF) [2] and the V2-V3 hypervariable region of the 16S rRNA gene for bacterial communities [3] were also taken into account. Sequencing was performed on the Roche/454 platform.
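The sampling layout described above can be illustrated with a short sketch. It assumes that the 9 samples of one sampling area lie on a regular 3 × 3 grid with points at 0, 15 and 30 m along each side; the text gives only the area size and the spacing, so the exact placement and the function name `sampling_grid` are illustrative assumptions.

```python
from itertools import product


def sampling_grid(side_m=30, spacing_m=15):
    """Return (x, y) coordinates in metres for a regular square sampling grid.

    Assumption: samples are taken at every grid node, including the edges
    of the sampling area (0, 15 and 30 m for the default values).
    """
    steps = range(0, side_m + 1, spacing_m)
    return list(product(steps, steps))


points = sampling_grid()
for x, y in points:
    print(f"sample at x = {x} m, y = {y} m")
```

With the default values the grid has three nodes per side and therefore nine sampling points, which agrees with the 9 samples per sampling area stated in the text.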
Table 1
Sampling areas.
Sample code    Sample type
INT1           Intensively cultivated field
KH1            Virtsu cocksfoot field
KO             Koeru eutrophic boreo-nemoral forest
KTM2           Kiviõli gangue hill
KU2            Kurtna successive forest resulting from human intervention
Mahe1          Organic food field
PU             Laelatu woodland
Ray            Rasina field of rape
TR             Tartu Riia street park
JSI1           Järvselja spruce plantation
VI             Virtsu fallow
Fig. 1. Number of discriminative 18S GI matches found in all 9 samples of the particular sampling area and unique to that sampling area.
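The definition of a discriminative match in the caption above can be expressed as a set operation. The following is a minimal sketch, not the authors' implementation: it assumes each soil sample is represented as a set of matched GI identifiers, and it interprets "unique to that sampling area" as "absent from every sample of every other area". The function name, the optional `min_samples` threshold and the toy identifiers are hypothetical.

```python
from collections import Counter


def discriminative_matches(areas, min_samples=None):
    """Find GI matches that characterise a single sampling area.

    areas: dict mapping an area code to a list of sets, one set of
           GI identifiers per soil sample.
    min_samples: minimum number of samples of an area in which a GI must
           occur; None means it must occur in all samples of that area.
    """
    # All GIs observed in any sample of each area.
    seen = {area: set().union(*samples) for area, samples in areas.items()}
    result = {}
    for area, samples in areas.items():
        required = len(samples) if min_samples is None else min_samples
        counts = Counter(gi for sample in samples for gi in sample)
        common = {gi for gi, n in counts.items() if n >= required}
        # GIs observed in any other area are not unique to this one.
        elsewhere = set().union(*(s for a, s in seen.items() if a != area))
        result[area] = common - elsewhere
    return result


# Toy data with 3 samples per area (the study uses 9); the IDs are made up.
toy = {
    "KO": [{"gi1", "gi2", "gi3"}, {"gi1", "gi2"}, {"gi1", "gi2", "gi4"}],
    "TR": [{"gi2", "gi5"}, {"gi2", "gi5", "gi6"}, {"gi2", "gi5"}],
}
print(discriminative_matches(toy))
```

In the toy data, gi2 occurs in every sample of both areas and is therefore not discriminative, whereas gi1 and gi5 each occur in all samples of exactly one area. Lowering `min_samples` relaxes the requirement that a match be present in every sample of the area.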
For data analysis, an operational taxonomic unit (OTU) based approach in the mothur software [4] and NCBI BLASTN searches were used. Several data filtration approaches (5–9 common sequences per sampling area, as well as sequence IDs) were used for data management and model building.
3. Results
NCBI BLASTN analysis revealed 2983 AMF matches and 8997 18S matches in total, and 25477 OTUs (16S) were determined. Data filtration yielded discriminative GI matches, that is, GI matches found in all 9 samples of a particular sampling area and unique to that sampling area. In total, 42 AMF GI matches, 252 18S GI matches and 26 16S GI matches were found. The 252 18S matches are distributed among the individual sampling areas (Fig. 1).
4. Discussion and conclusions
Our earlier proposed statistical modelling approach using logistic regression showed good performance in identifying the sampling area from a limited number of samples. The most informative statistical approach for finding the true sampling area was based on OTU clustering using all available sequences (16S bacterial communities) (data not published). However, for data management and database handling, we found in the current study that the 18S marker results could be used to create and run a filtered database that is computationally much more efficient and flexible. After searching the filtered 18S database, the user can perform deeper statistical analysis using the 16S and AMF data. It should be noted that different database matches may actually point to the same sequence (species), because the same sequence may have been entered into the database repeatedly (a question of database quality). In conclusion, more samples have to be analysed, additional studies performed, and cooperation between soil scientists and forensic scientists is required before these novel techniques can be implemented in routine forensic practice.
Role of funding
This research was funded by Tallinn University of Technology.
Conflict of interest
None.
References
[1] C. Schabereiter-Gurtner, G. Piñar, W. Lubitz, et al., Analysis of fungal communities on historical church window glass by denaturing gradient gel electrophoresis and phylogenetic 18S rDNA sequence analysis, J. Microbiol. Methods 47 (2001) 345–354.
[2] J. Lee, S. Lee, J.P. Young, Improved PCR primers for the detection and identification of arbuscular mycorrhizal fungi, FEMS Microbiol. Ecol. 65 (2008) 339–349.
[3] F. Armougom, R. Didier, Exploring microbial diversity using 16S rRNA high-throughput methods, Journal of Crop Science and Biotechnology 2 (2009) 074–092.
[4] P.D. Schloss, S.L. Westcott, T. Ryabin, et al., Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities, Appl. Environ. Microbiol. 75 (2009) 7537–7541.