Performance of cytochrome c oxidase subunit I (COI), ribosomal DNA Large Subunit (LSU) and Internal Transcribed Spacer 2 (ITS2) in DNA barcoding of Collembola


European Journal of Soil Biology 69 (2015) 1–7

Contents lists available at ScienceDirect

European Journal of Soil Biology journal homepage: http://www.elsevier.com/locate/ejsobi

Original article

Performance of cytochrome c oxidase subunit I (COI), ribosomal DNA Large Subunit (LSU) and Internal Transcribed Spacer 2 (ITS2) in DNA barcoding of Collembola

Sten Anslan a,*, Leho Tedersoo b

a Institute of Ecology and Earth Sciences, University of Tartu, 14A Ravila, 50411 Tartu, Estonia
b Natural History Museum, University of Tartu, 14A Ravila, 50411 Tartu, Estonia

Article info

Article history: Received 30 September 2014; received in revised form 6 April 2015; accepted 10 April 2015; available online

Keywords: Springtails; Molecular identification; Internal transcribed spacer (ITS); Cytochrome c oxidase subunit I (COI, CO1, cox1, coxI); Ribosomal large subunit (LSU, 28S); DNA barcoding

Abstract

Studies of soil meiofauna using traditional methods are highly laborious, and identification based exclusively on morphological traits usually underestimates species richness. Although the mitochondrial Cytochrome c Oxidase subunit I gene (COI, CO1, cox1, coxI) has been proposed as the standard DNA barcode for animals, nuclear ribosomal DNA markers such as the Large Subunit (LSU, 28S) and the Internal Transcribed Spacer (ITS) have proven efficient for distinguishing among species of animals, eukaryotic microbes and plants. The COI and LSU markers discriminate among closely related species of Collembola, but the efficiency of ITS2 has never been tested. We evaluated the relative performance of these three markers for DNA barcoding of Collembola. All three markers proved highly efficient in discriminating species, but LSU exhibited very small differences among certain closely related species. We conclude that ITS2 may serve as an alternative barcode for identifying Collembola and has potential for use in metabarcoding analyses. © 2015 Elsevier Masson SAS. All rights reserved.

* Corresponding author. E-mail address: [email protected] (S. Anslan).
http://dx.doi.org/10.1016/j.ejsobi.2015.04.001
1164-5563/© 2015 Elsevier Masson SAS. All rights reserved.

1. Introduction

Identification of organisms is relatively easy in large and medium-sized animals that exhibit abundant macroscopic morphological characters [but see 1]. By contrast, species-level identification of small invertebrates such as Collembola may be highly labor-intensive because of the poor resolution of microscopic characters. In addition, juveniles cannot be reliably identified based on morphological characters. These issues can lead to serious underestimation of biodiversity [2]. Moreover, the paucity of taxonomic experts constitutes a serious bottleneck for numerous large-scale ecological projects. Therefore, molecular methods are commonly used for more accurate identification and separation of species. Unfortunately, numerous inconsistencies occur between molecular and morphological identification methods, indicating that morphologically similar, cryptic species are common in meiofauna [3–5]. DNA barcoding of selected genes has become a popular tool for identifying species across all ontogenetic stages and both sexes, and it therefore improves precision in estimating species richness. This method relies on sequence comparison of a short genetic marker and assumes that intraspecific genetic variation is consistently lower than interspecific genetic variation [6]. The most commonly used DNA barcode for animals is the mitochondrial Cytochrome c Oxidase subunit I gene (COI, CO1, cox1, coxI). It has proven efficient in separating closely related species of Collembola and in distinguishing cryptic taxa [7,8]. The ribosomal DNA Large Subunit (LSU, 28S) has been demonstrated to be similarly suited for species discrimination [8]. Although the efficiency of the nuclear Internal Transcribed Spacer (ITS) for identifying Collembola species has not been evaluated, several studies have outlined the potential of the ITS2 subregion in particular for species-level identification in several groups of animals, plants, fungi and protists [9–12]. Ribosomal DNA benefits from highly conserved domains suitable for non-degenerate primer design, ease of amplification due to multiple copies per genome, and sufficient variability in unconserved regions for distinguishing among species [9]. Therefore, ITS2 is regarded as one of the most promising candidates for DNA barcoding and ecological metabarcoding studies of multiple organisms [9,13]. Whereas the COI region has been found not to be fully applicable for metabarcoding studies [14], ITS2 has long been used by botanists, mycologists and eukaryotic microbiologists for the latter approach. For example, O'Brien et al. [15] found in their mycological study that other eukaryotic organisms were also co-amplified and sequenced with ITS primers. The flow of data produced by high-throughput sequencing (HTS) of environmental DNA extracts is rapidly increasing, but without knowledge of the taxonomic affiliation of the sequences these data cannot be interpreted. The primary goal of DNA metabarcoding is the simultaneous identification of large sets of taxa present in single environmental samples [16]. This task is facilitated by the greater sequencing depth of recently developed HTS methods. Full-length ITS2 reads are usually short enough to be sequenced on most HTS platforms, and public databases contain a large and growing number of ITS2 sequences from a wide range of soil microbes. Thus, ITS barcodes from soil animals would also support the primary goal of metabarcoding by enabling simultaneous identification of a wide range of organisms from environmental samples.

This study aims to evaluate the relative suitability of COI, LSU and ITS for DNA barcoding of collembolans. In particular, we address the usefulness of ITS2 because of its potential for simultaneous use in metabarcoding surveys of multiple taxa.

2. Methods

Litter and soil samples (ca. 500 cm³) were collected throughout the vegetation periods of 2012 and 2013 from different sites in Estonia (Table S.1). Collembola were extracted using the Tullgren funnel method and stored in 96% ethanol for morphological and genetic analyses. A total of 206 specimens were identified according to Fjellberg [17,18]. Because DNA extraction was destructive, the identified specimens were compared with voucher specimens, which were deposited in the Natural History Museum of the University of Tartu (TUZ).
Genomic DNA was extracted from the whole body (smaller specimens) or from one leg (larger specimens) in a lysis buffer (0.8 M Tris-HCl, 0.2 M (NH4)2SO4, 0.2% w/v Tween-20; Solis BioDyne, Tartu, Estonia) using a proteinase K method (100 µl lysis buffer and 2.5 µl proteinase K; incubation at 56 °C for 24 h and at 98 °C for 15 min). The COI marker was amplified using the primers LCO1490 and HCO2198 [19]. The D1 and D2 domains of LSU were amplified with the primers CTB6 (https://unite.ut.ee/primers.php) and TW13 (T.J. White, unpublished) or with C1', C2, C2'coll and D2coll [20]. The latter primers were used for samples in which CTB6 and TW13 failed to produce a readable sequence chromatogram. To recover the full ITS or ITS2, samples were amplified using the universal primers ITS5 and ITS4 [21] or 5.8SF (modified 58A1F from Ref. [22]) and ITS4, respectively. Because specimens were sometimes contaminated with fungal DNA originating from the gut or appendages, which resulted in low-quality sequence chromatograms, we designed the primers ITS7-Coll and ITS4-Coll to discriminate against fungal DNA. These primers amplify only the ITS2 subregion and ca. 50 bp of the flanking genes. ITS7-Coll and ITS4-Coll were designed based on high-quality full-length ITS sequences of Collembola compared against sequences of other soil invertebrates and, particularly, of all fungal groups. Samples that failed to yield clean Collembola sequences, as well as all samples collected in 2013, were amplified with these newly designed primers. All primer sequences are given in Table 1.

Table 1
Primers used in this study.

Primer name | Sequence (5'–3')
LCO1490 | GGTCAACAAATCATAAAGATATTGG
HCO2198 | TAAACTTCAGGGTGACCAAAAAATCA
ITS5 | GGAAGTAAAAGTCGTAACAAGG
ITS4 | TCCTCCGCTTATTGATATGC
5.8SF | ATGCATCGATGAAGAACGC
ITS7-Coll (a) | GTGAACTGCAGGACACATG
ITS4-Coll (a) | GCTTAAATTTAGCGGGTAATC
CTB6 | GCATATCAATAAGCGGAGG
TW13 | GGTCCGTGTTTCAAGACG
C1' | ACCCGCTGAATTTAAGCAT
C2 | TGAACTCTCTCTTCAAAGTTCTTTTC
C2'coll | GAGACCGATAGCGAACAAGTACCGTGA
D2coll | ACCACGCATGCWTTAGATTG

(a) Primers designed in this study.

PCR was carried out in 25 µl containing 0.5 µl of each primer (100 mM), 5 µl FirePol Mastermix (10 mM MgCl2, premixed ready-to-use solution; Solis BioDyne, Tartu, Estonia), 1 µl of 10-fold diluted DNA template and sterilised distilled water. For amplifying the COI region, PCR conditions consisted of 1 min at 94 °C; five cycles of 1 min at 94 °C, 1 min at 45 °C and 1 min at 72 °C; 35 cycles of 1 min at 94 °C, 1 min at 51 °C and 1 min at 72 °C; and a final 5 min at 72 °C [8]. For amplifying the ITS and LSU regions, PCR conditions consisted of 15 min at 95 °C; 35 cycles of 30 s at 95 °C, 30 s at 55 °C and 1 min at 72 °C; and a final 10 min at 72 °C. Amplification success was checked by electrophoresis on a 1% agarose gel. PCR products were cleaned with Exo-SAP enzymes (GE Healthcare, Freiburg, Germany) by incubation at 37 °C for 45 min and at 85 °C for 15 min. PCR products were sequenced at Macrogen Inc. (Amsterdam, The Netherlands) using the PCR primers. Quality checking and manual editing were performed in Sequencher 5.1 (Gene Codes Corporation, Ann Arbor, USA). All sequences are deposited in the PlutoF cloud database [23] (http://plutof.ut.ee) and in GenBank (accession numbers are given in Table S.2).

DNA sequences were aligned using MAFFT 7 [24] and edited in SeaView 4.4.2 [25]. Because the 5' and 3' ends of some COI sequences were of poor quality, all COI sequences were trimmed to 606 bp. LSU sequences were manually trimmed to include the D1–D2 domains. ITS sequences were trimmed to include only the ITS2 subregion, without the flanking genes, using the ITS2 database annotation tool [26]. All sequences were verified to belong to Collembola using the BLASTn algorithm against GenBank. Intraspecific and interspecific sequence divergence was calculated based on Levenshtein distance (raw distance, weighting indels and substitutions equally) and the more widely used Kimura two-parameter (K2P) distance, using usearch 7.0.1090 [27] and MEGA6 [28], respectively. Neighbor-Joining trees of COI, LSU and ITS2 were used to illustrate clustering among species (conducted in MEGA6 [28]).

3. Results

For the COI, LSU and ITS2 markers, we included 162, 154 and 162 high-quality sequences, respectively, from 33 species (range: 2–13 specimens per species) to determine barcoding gaps.
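The two distance measures named in the Methods treat sequence differences differently: Levenshtein distance counts insertions, deletions and substitutions equally on unaligned sequences, whereas K2P works on aligned sites and separates transitions from transversions. The following is a minimal plain-Python sketch of both, not the usearch or MEGA6 implementation. Two choices are our own illustrative assumptions: the edit distance is normalised by the length of the longer sequence to give a percentage, and K2P skips gapped or ambiguous sites (pairwise deletion).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-base insertions, deletions or substitutions
    needed to turn a into b (indels and substitutions weighted equally)."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def levenshtein_percent(a: str, b: str) -> float:
    """Edit distance as % of the longer sequence (normalisation is an assumption)."""
    return 100.0 * levenshtein(a, b) / max(len(a), len(b))


def k2p_distance(aligned_a: str, aligned_b: str) -> float:
    """Kimura two-parameter distance for two aligned sequences of equal length.
    Sites with gaps or non-ACGT characters are skipped (pairwise deletion)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    arg1, arg2 = 1 - 2 * p - q, 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        raise ValueError("K2P is undefined: sequences are too divergent")
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)
```

For example, `levenshtein("ACGTACGT", "ACGTCGT")` is 1 (a single deletion), i.e. 12.5% of the longer sequence. Because K2P needs an alignment and ignores indels, length-variable markers such as ITS2 can yield quite different values under the two measures.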
Sequences were considered high-quality reads when 85% of base pairs had quality scores of 40–60 (in Sequencher 5.1). Although we were able to discriminate among 33 species, 6 of them remained identified only to genus level (Table S.2) because appropriate morphological features for reliable species determination could not be found and reference sequences were lacking in public databases. The lengths of the COI, LSU and ITS2 sequences were 606 bp, 729–756 bp and 182–434 bp, respectively. The universal COI primers successfully amplified 96.1% of samples and yielded 74.3% high-quality sequences. Primers CTB6 and TW13 amplified the collembolan LSU region in 82.5% of samples and yielded 87.5% high-quality sequences. Fungal DNA was amplified in 8.3% of samples; these samples were re-amplified with C1', C2, C2'coll and D2coll, resulting in successful sequencing of 88.2% of them. On average, the ITS primers successfully amplified 90.0% of samples and yielded 57.3% high-quality sequences. The universal ITS primers (ITS5, 5.8SF, ITS4) successfully amplified 82.1% of samples and yielded high-quality sequences for 43.8% of individuals. Samples with traces of fungal DNA (39.6%) or failed amplifications were re-amplified with ITS7-Coll and ITS4-Coll. These primers amplified 97.9% of samples and yielded high-quality sequences for 70.8% of individuals. The primer pair ITS7-Coll and ITS4-Coll, designed on the basis of the Collembola ITS2 region, successfully eliminated co-amplification of fungal DNA and was therefore more reliable for direct sequencing. However, the use of specific primers did not improve sequence quality when the two ITS2 alleles differed in length.

The maximum distance between our sampling plots was ~330 km. Within that range, the intraspecific variability of collembolan COI was 0–10.7% (0–11.2% K2P distance). The highest intraspecific variability was found in Isotomiella minor (Schäffer, 1896) (0–10.7% Levenshtein distance and 0–11.2% K2P distance). There was a substantial barcoding gap among species (between 10.7 and 13.9% Levenshtein distance and 11.2–14.3% K2P distance) (Fig. 1). As an exception, the interspecific variability between Isotoma viridis (Bourlet, 1839) and Isotoma riparia (Nicolet, 1842) remained within the boundaries of intraspecific variability (10.1–10.7% Levenshtein and 10.4–11.2% K2P). The NJ algorithm was unable to place congeneric species together (Fig. 2), indicating that genetic differences become obscure at the genus level.

The length of the D1–D2 domains of LSU was strongly conserved within species (0 bp difference). There was no intraspecific variability in LSU sequences, whereas interspecific variability ranged from 0.7% to 28.7% (up to 31.8% K2P) (Fig. 1). The NJ algorithm placed congeneric species and confamilial genera together (Fig. 3).
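The barcoding gap discussed in the Results is the margin between the largest intraspecific distance and the smallest interspecific distance. Below is a minimal sketch, assuming pairwise distances (in %) have already been computed and every specimen carries a species label. The function name, the specimen IDs and the toy values are hypothetical illustrations, not data from this study.

```python
from itertools import combinations
from typing import Dict, Tuple


def barcoding_gap(labels: Dict[str, str],
                  distances: Dict[Tuple[str, str], float]) -> Tuple[float, float, float]:
    """Return (max intraspecific, min interspecific, gap).

    labels maps specimen ID -> species name; distances maps a pair of specimen
    IDs (stored in either order) -> pairwise distance in %. A positive gap means
    every intraspecific distance is smaller than every interspecific distance;
    a negative gap means the two distributions overlap.
    """
    def lookup(a: str, b: str) -> float:
        return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

    intra, inter = [], []
    for a, b in combinations(sorted(labels), 2):
        (intra if labels[a] == labels[b] else inter).append(lookup(a, b))
    if not intra or not inter:
        raise ValueError("need both intra- and interspecific pairs")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


if __name__ == "__main__":
    # Hypothetical toy data: two species with two specimens each.
    labels = {"s1": "sp_A", "s2": "sp_A", "s3": "sp_B", "s4": "sp_B"}
    distances = {("s1", "s2"): 1.0, ("s3", "s4"): 2.0, ("s1", "s3"): 12.0,
                 ("s1", "s4"): 13.0, ("s2", "s3"): 11.5, ("s2", "s4"): 12.5}
    print(barcoding_gap(labels, distances))
```

A negative gap corresponds to the COI situation reported for the two Isotoma species, where interspecific distances fell within the range of intraspecific ones.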
The length of ITS2 sequences differed more than two-fold among species but was relatively conserved within species and individuals (0–3 bp difference). Intraspecific ITS2 sequence variability ranged from 0 to 3.7% (Levenshtein distance), with the highest variability in Orchesella flavescens (Bourlet, 1839) (0–3.7%) and Orchesella bifasciata (Bourlet, 1839) (0–3.6%). Isotomiella minor, which exhibited the highest variability in COI sequences (10.7%), also showed a high (3.1%) difference in ITS2. All interspecific ITS2 differences were greater than 6% (6.4 to ca. 60%), and the barcoding gap ranged between 3.7 and 6.4% sequence dissimilarity (Fig. 1). Whereas the interspecific COI divergence between the two Isotoma species remained within the limits of intraspecific divergence, this was not the case for ITS2 (interspecific divergence 6.4%). Taken together, the intraspecific pairwise Levenshtein distances of COI and ITS2 did not exhibit a significant correlation (rs = 0.09, N = 430, p < 0.06), and there was no correlation between the maximum intraspecific variabilities of the two markers (rs = 0.131, N = 33, p = 0.468) (Table 2). Analysis of ITS2 sequences using K2P distances was unable to separate several species that were clearly delimited using Levenshtein distance, including Orchesella cincta (Linnaeus, 1758) and O. bifasciata (K2P: 0–4.2%), Entomobrya nivalis (Linnaeus, 1758) and Entomobrya marginata (Tullberg, 1871) (0–2.1%), and Tomocerus sp. and Pogonognathellus flavescens (Tullberg, 1871) (0–2.1%). According to ITS2 K2P distance, the greatest intraspecific variability of 4.2% occurred among specimens of O. bifasciata, whereas the lowest interspecific variability of 4.3% was evident between Hypogastrura socialis (Uzel, 1890) and Ceratophysella denticulata (Bagnall, 1941) L3 [5]. NJ phylograms of ITS2 placed congeneric species together but did not resolve relationships among families (Fig. 4).
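The rank correlations (rs) reported above compare COI and ITS2 distances without assuming a linear relationship. A dependency-free sketch of Spearman's rank correlation follows: it computes the Pearson correlation of average ranks, so tied values share the mean of their ranks. It returns only rs (no p-value), and the text does not state which statistical software the authors used, so this is an illustration rather than their procedure.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rs as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (sx * sy)
```

Tie handling matters for data like these, because many species show zero intraspecific variability for one marker (see the many zeros in Table 2).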
Two different lineages of the cryptic morphospecies Parisotoma notabilis (Schäffer, 1896) were detected and named L2 [according to 8] and L4 (according to BLAST results in the BOLD database, unpublished data). These distinct lineages were confirmed by all markers, and the Levenshtein distances between them were 15.5–17% (16.4–18.4% K2P), 1.3% (0.8% K2P) and 18–18.3% (6.4–8.7% K2P) for COI, LSU and ITS2, respectively. In addition, several conspecific individuals of Collembola collected in Estonia differed by >14% in COI (Levenshtein distance) from specimens originating from Canada (e.g. Entomobrya marginata, Lepidocyrtus lignorum (Fabricius, 1775), Isotoma viridis, Isotoma riparia) [29].

Fig. 1. Pairwise comparisons of COI, LSU and ITS2 sequences based on dissimilarity (Levenshtein distance). Closed columns represent intraspecific variability and open columns interspecific variability. Most of the pairwise comparisons of ITS2 sequences exceed 30% difference.
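The NJ phylograms in Figs. 2–4 were built in MEGA6 from uncorrected p-distances. To show what the clustering step does, here is a compact textbook Neighbor-Joining (Saitou and Nei) sketch that takes a distance matrix and returns an unrooted tree in Newick format with branch lengths. It is not MEGA6's implementation: it omits bootstrapping, breaks ties by taking the first minimal pair, and may produce negative branch lengths for non-additive data.

```python
from typing import List


def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Build an unrooted NJ tree and return it as a Newick string.

    names: taxon labels; d: symmetric distance matrix in the same order.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    dist = [list(row) for row in d]  # work on a copy
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in dist]
        # choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        best, pair = None, (0, 1)
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * dist[i][j] - totals[i] - totals[j]
                if best is None or q < best:
                    best, pair = q, (i, j)
        i, j = pair
        # branch lengths from the new internal node to i and j
        li = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        new_node = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # distance from the new node to every remaining node
        new_row = [0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in keep]
        dist = [[dist[a][b] for b in keep] for a in keep]
        for row, value in zip(dist, new_row):
            row.append(value)
        dist.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_node]
    # resolve the final three nodes as a trifurcation
    x, y, z = nodes
    dxy, dxz, dyz = dist[0][1], dist[0][2], dist[1][2]
    lx = 0.5 * (dxy + dxz - dyz)
    ly = 0.5 * (dxy + dyz - dxz)
    lz = 0.5 * (dxz + dyz - dxy)
    return f"({x}:{lx:.4g},{y}:{ly:.4g},{z}:{lz:.4g});"
```

On the standard five-taxon textbook matrix (a–e), this returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. NJ is fast and usually groups closely related sequences, which is why it suits the illustrative clustering in Figs. 2–4. It does not by itself assign unknown specimens to species.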

4. Discussion

Our data reveal the usefulness of ITS2 sequences for identifying Collembola species and are consistent with the utility of COI and LSU reported in previous studies [7,8]. Although the K2P distance is widely used in barcoding studies, its application is not always justified and problems are common [30,31]. In our study, K2P distances of COI and LSU sequences did not change species delimitation success, but K2P distances of ITS2 performed poorly and proved unsuitable for representing sequence variation among species. Based on Levenshtein distance, the nuclear ribosomal DNA markers (ITS2 and LSU) invariably exhibited lower intra- than interspecific differences. For COI, we found an overlap between intra- and interspecific variability in two closely related species. All tested markers proved useful for species identification, although the distinct gaps for ITS2 and LSU (Fig. 1) may represent artefacts of insufficient sampling across taxa and geographical scales [32].

Fig. 2. Neighbor-Joining phylogram of uncorrected p-distances based on an analysis of the COI sequences. The numbers above nodes represent bootstrap support (1000 replications). Taxa in bold font highlight the unsuccessful grouping of confamilial genera, using Isotomidae as an example.

Fig. 3. Neighbor-Joining phylogram of uncorrected p-distances based on an analysis of the LSU sequences. The numbers above nodes represent bootstrap support (1000 replications). Congeneric species and confamilial genera are placed together (except for highlighted taxa in bold font).

We found that mitochondrial and nuclear ribosomal markers differ in their contamination risk. The benefits of COI include better amplification and sequencing success from single individuals using standard primers. This is largely ascribed to the lack of co-amplification of non-animal DNA and the lack of intra-individual sequence length polymorphism. However, other studies have revealed that the universal COI primers can amplify endosymbiotic bacteria (genus Wolbachia) from Arthropoda, even if only legs are used for DNA extraction [33]. Here we found only a few cases in which the amplified and sequenced barcode belonged to groups other than Collembola. Because fungal particles may be found in gut contents and on body surfaces of arthropods [34], there is a potential risk of fungal contamination when identifying arthropods by means of ITS2 or LSU DNA barcoding with universal primers. Mixed chromatograms that probably resulted from co-amplification of fungal DNA were common in our study. The universal ITS primers often yielded PCR products indicating the presence of fungal DNA in the appendages and gut contents of Collembola (S. Anslan, unpublished). Our newly designed ITS2 primers and the previously published Collembola-specific LSU primers [20] proved efficient in minimizing contamination during amplification and sequencing. With careful primer selection and checking of sequence affinities, it is possible to recognize potential contamination by endosymbiotic bacteria or fungi and thereby prevent sequences from being misassigned to the specimens of interest.

Table 2
Intraspecific variability (Levenshtein distance, %) of ITS2 and COI sequences, sorted from smallest to largest ITS2 values. There was no intraspecific variability in LSU sequences.

Species | ITS2 | COI
Allacma fusca (Linnaeus, 1758) | 0 | 0–0.5
Dicyrtoma fusca (Lubbock, 1873) | 0 | 0.5
Folsomia quadrioculata (Tullberg, 1871) | 0 | 0
Folsomia sp. | 0 | 0–3
Hypogastrura socialis (Uzel, 1890) | 0 | 0–0.5
Lepidocyrtus lignorum (Fabricius, 1775) | 0 | 0–4.3
Megalothorax minimus (Willem, 1900) | 0 | 0
Orchesella cincta (Linnaeus, 1758) | 0 | 0
Parisotoma notabilis (Schäffer, 1896) L2 | 0 | 0–4.6
Pseudachorutes subcrassus (Tullberg, 1871) | 0 | 1–5.9
Willowsia buskii (Lubbock, 1870) | 0 | 0–1.8
Xenylla humicola (Fabricius, 1780) | 0 | 0.3–5.1
Isotoma riparia (Nicolet, 1842) | 0 | 0–0.2
Isotoma viridis (Bourlet, 1839) | 0–0.4 | 0–4.3
Protaphorura sp. | 0–0.4 | 0–0.8
Sminthurinus sp2 | 0–0.5 | 1.3–6.1
Entomobrya marginata (Tullberg, 1871) | 0–0.5 | 1.8–6.8
Lipothrix lubbocki (Tullberg, 1872) | 0.5 | 0
Neanura muscorum (Templeton, 1835) | 0–0.6 | 0–0.5
Folsomia fimetaria (Linnaeus, 1758) | 0.7 | 0.2
Entomobrya corticalis (Nicolet, 1842) | 0.3–0.9 | 0.3
Sminthurinus sp1 | 1 | 0
Pogonognathellus flavescens (Tullberg, 1871) | 0–1.1 | 0–0.2
Ceratophysella denticulata (Bagnall, 1941) L3 | 1.2 | 0
Entomobrya nivalis (Linnaeus, 1758) | 0–1.2 | 0.3–6.6
Dicyrtomina minuta (Fabricius, 1783) | 0.4–1.2 | 2–4
Anurophorus septentrionalis (Palissa, 1966) | 0–1.8 | 0–0.2
Tomocerus sp. | 0–2.1 | 0–0.3
Dicyrtoma atra (Linnaeus, 1758) | 0–2.5 | 0–3.5
Isotomiella minor (Schäffer, 1896) | 0–3.1 | 0–10.7
Parisotoma notabilis (Schäffer, 1896) L4 | 0.4–3.2 | 0.8–2
Orchesella bifasciata (Bourlet, 1839) | 0–3.6 | 0–5.1
Orchesella flavescens (Bourlet, 1839) | 0–3.7 | 0–4

For the COI marker, it has been proposed that differences greater than 3% delimit species in various animal groups [6]. In the present study, COI in some cases exhibited much higher intraspecific variation, suggesting lineage separation, while LSU sequences of the same individuals were invariably identical. For example, intraspecific COI divergence reached 10.7% (in I. minor), far beyond the general DNA barcoding threshold of 97% similarity (3% divergence). There is a wide range of evidence that morphologically defined species are not biologically meaningful [e.g. 5,35]. Although cryptic Collembola species are a growing problem for morphology-based species determination, specimens of I. minor exhibited only up to 2.7% sequence variation in ITS2 and were identical based on LSU. In this study we detected only a single case of cryptic species (in the ubiquitous P. notabilis), but comparisons with database entries provided ample evidence that several morphospecies comprise cryptic species. COI sequence divergence among Collembola species appears to be greater than in other animals [7,29]. Interspecific COI sequence divergence exceeded 8% (Levenshtein distance) [7] and 13.5% (K2P distance) [29] in data sets of 19 and 41 species, respectively. These findings roughly corroborate our result of >10.7% interspecific sequence variation. However, the corresponding 89.3% sequence similarity threshold failed to discriminate between two Isotoma species (I. riparia and I. viridis). These species were also closely related according to ITS2 (Levenshtein distance 6.4%). Burkhardt and Filser [36] found that I. viridis and I. riparia were sister species with K2P distances below 9% based on COII; hence, similar values are expected for COI. Nevertheless, LSU clearly delimited these species in our study, which is consistent with the species delimitation power of LSU in Collembola [8].

Fig. 4. Neighbor-Joining phylogram of uncorrected p-distances based on an analysis of the ITS2 sequences. The numbers above nodes represent bootstrap support (1000 replications). Congeneric species and confamilial genera are placed together (except for highlighted taxa in bold font).

4.1. The promise of alternative barcodes

Molecular methods are relatively well established for studying soil microbial communities, and second-generation sequencing methods make it possible to collect and process very large amounts of data in a relatively short time. Typically, soil animal communities are studied with methods in which specimens are first separated from soil and then identified microscopically [37]. Only a few published studies have addressed soil animal communities directly from soil samples using second-generation molecular methods [e.g. 38–40]. Finding primers that are suitable for particular environmental applications is one of the major challenges of metabarcoding [41]. Because COI has the largest library of animal barcodes, its applicability has been tested for metabarcoding purposes [42–44]. However, the COI region has been found not to be entirely suitable for environmental applications [14,45,46]. The rDNA Small Subunit (SSU, 18S) region has been used successfully to characterize soil animal communities [38,40]. This conserved gene can be amplified selectively from various groups of animals or together with most other eukaryotes. However, SSU does not allow identification of organisms at the species level [2,47]. Conversely, the LSU region has high discriminating power but usually also requires high sequence similarity thresholds. It has been recognized that sequencing errors in HTS may inflate diversity estimates [48].
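The interplay between similarity thresholds and sequencing errors can be made concrete with simple arithmetic. A threshold t applied to reads of length L tolerates at most floor(L × (1 − t)) differences, and a per-base error rate e adds on average L × e errors per read. The sketch below uses a hypothetical 300-bp amplicon and a hypothetical 0.5% per-base error rate. Both values are illustrative assumptions, not measurements from this study. The 97% and 89.3% thresholds are the ones discussed above, and the function names are our own.

```python
import math


def max_mismatches(length_bp: int, similarity_threshold: float) -> int:
    """Largest number of differences two reads of this length may have and
    still be clustered together at the given similarity threshold (0-1)."""
    # the small epsilon guards against floating-point round-off, e.g. 300 * 0.03
    return math.floor(length_bp * (1.0 - similarity_threshold) + 1e-9)


def expected_errors(length_bp: int, per_base_error_rate: float) -> float:
    """Mean number of sequencing errors per read under a uniform error model."""
    return length_bp * per_base_error_rate


def same_cluster(differences: int, length_bp: int, similarity_threshold: float) -> bool:
    """True if a pair with this many differences falls within the threshold."""
    return differences <= max_mismatches(length_bp, similarity_threshold)


if __name__ == "__main__":
    L = 300    # hypothetical amplicon length (bp)
    e = 0.005  # hypothetical per-base HTS error rate
    for t in (0.97, 0.99, 0.995):
        print(f"threshold {t:.1%}: up to {max_mismatches(L, t)} mismatches; "
              f"expected errors per read {expected_errors(L, e):.1f}")
```

Under these assumed numbers, a 97% threshold tolerates nine differences on a 300-bp read, whereas a 99.5% threshold tolerates only one, which is fewer than the 1.5 errors expected per read. Random errors alone could then split reads of a single species into several OTUs.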
Therefore, a few random HTS errors may blur differences among OTUs when sequence similarity thresholds are high. By contrast, the ITS2 marker has long been used in metabarcoding studies and has high species-level discrimination power across eukaryotes [9]. To our knowledge, however, the ITS region has not been tested for identifying soil animals from soil samples. Our study shows that the ITS2 marker can be used successfully to discriminate among Collembola species and provides sufficient molecular signal for detecting cryptic species. Although DNA barcoding focuses on species delimitation rather than on relationships among species, we found that closely related species of Collembola clustered together based on the ITS2 and LSU markers (Figs. 3 and 4). NJ trees of COI sequences typically perform poorly for specimen identification [49,50]; therefore, genus-level identification of an unknown specimen may be more reliable with nuclear markers. Moreover, the secondary structure of ITS2 may further improve the accuracy of phylogenetic trees [51]. Thus, ITS2 has potential for simultaneous species-level determination of several eukaryotic groups in metabarcoding approaches. However, potential drawbacks of the ITS region include great sequence length variation, the presence of pseudogenes [52] and the incompleteness of ITS2 reference data for animals, although these reference data are growing rapidly with the ongoing development of sequencing technology.

Acknowledgments

We thank Edite Jucevica for assistance in identifying Collembola and the Axios Review system for primary reviews and referrals. This study received funding from the Estonian Science Foundation grants 9286, PUT0171, EMP265, and FIBIR.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.ejsobi.2015.04.001.

References

[1] A.L. Roca, N. Georgiadis, J. Pecon-Slattery, S.J. O'Brien, Genetic evidence for two species of elephant in Africa, Science 293 (2001) 1473–1477.
[2] C.Q. Tang, F. Leasi, U. Obertegger, A. Kieneke, T.G. Barraclough, D. Fontaneto, The widely used small subunit 18S rDNA molecule greatly underestimates true diversity in biodiversity surveys of the meiofauna, Proc. Natl. Acad. Sci. U. S. A. 109 (2012) 16208–16212.
[3] F. Cicconardi, F. Nardi, B.C. Emerson, F. Frati, P.P. Fanciulli, Deep phylogeographic divisions and long-term persistence of forest invertebrates (Hexapoda: Collembola) in the North-Western Mediterranean basin, Mol. Ecol. 19 (2010) 386–400.
[4] S. Schaeffer, T. Pfingstl, S. Koblmueller, K.A. Winkler, C. Sturmbauer, G. Krisper, Phylogenetic analysis of European Scutovertex mites (Acari, Oribatida, Scutoverticidae) reveals paraphyly and cryptic diversity: a molecular genetic and morphological approach, Mol. Phylogen. Evol. 55 (2010) 677–688.
[5] D. Porco, A. Bedos, P. Greenslade, C. Janion, D. Skarzynski, M.I. Stevens, B.J. van Vuuren, L. Deharveng, Challenging species delimitation in Collembola: cryptic diversity among common springtails unveiled by DNA barcoding, Invertebr. Syst. 26 (2012) 470–477.
[6] P.D.N. Hebert, A. Cywinska, S.L. Ball, J.R. DeWaard, Biological identifications through DNA barcodes, Proc. R. Soc. B 270 (2003) 313–321.
[7] I.D. Hogg, P.D.N. Hebert, Biological identification of springtails (Hexapoda: Collembola) from the Canadian Arctic, using mitochondrial DNA barcodes, Can. J. Zool. 82 (2004) 749–754.
[8] D. Porco, M. Potapov, A. Bedos, G. Busmachiu, W.M. Weiner, S. Hamra-Kroua, L. Deharveng, Cryptic diversity in the ubiquist species Parisotoma notabilis (Collembola, Isotomidae): a long-used chimeric species? PLoS One 7 (2012).
[9] H. Yao, J. Song, C. Liu, K. Luo, J. Han, Y. Li, X. Pang, H. Xu, Y. Zhu, P. Xiao, S. Chen, Use of ITS2 region as the universal DNA barcode for plants and animals, PLoS One 5 (2010).
[10] H. Gou, G. Guan, A. Liu, M. Ma, Z. Xu, Z. Liu, Q. Ren, Y. Li, J. Yang, Z. Chen, H. Yin, J. Luo, A DNA barcode for Piroplasmea, Acta Trop. 124 (2012) 92–97.
[11] C.L. Schoch, K.A. Seifert, S. Huhndorf, V. Robert, J.L. Spouge, C.A. Levesque, W. Chen, E. Bolchacova, K. Voigt, P.W. Crous, A.N. Miller, M.J. Wingfield, M.C. Aime, K.D. An, F.Y. Bai, R.W. Barreto, D. Begerow, M.J. Bergeron, M. Blackwell, T. Boekhout, M. Bogale, N. Boonyuen, A.R. Burgaz, B. Buyck, L. Cai, Q. Cai, G. Cardinali, P. Chaverri, B.J. Coppins, A. Crespo, P. Cubas, C. Cummings, U. Damm, Z.W. de Beer, G.S. de Hoog, R. Del-Prado, B. Dentinger, J. Dieguez-Uribeondo, P.K. Divakar, B. Douglas, M. Duenas, T.A. Duong, U. Eberhardt, J.E. Edwards, M.S. Elshahed, K. Fliegerova, M. Furtado, M.A. Garcia, Z.W. Ge, G.W. Griffith, K. Griffiths, J.Z. Groenewald, M. Groenewald, M. Grube, M. Gryzenhout, L.D. Guo, F. Hagen, S. Hambleton, R.C. Hamelin, K. Hansen, P. Harrold, G. Heller, G. Herrera, K. Hirayama, Y. Hirooka, H.M. Ho, K. Hoffmann, V. Hofstetter, F. Hognabba, P.M. Hollingsworth, S.B. Hong, K. Hosaka, J. Houbraken, K. Hughes, S. Huhtinen, K.D. Hyde, T. James, E.M. Johnson, J.E. Johnson, P.R. Johnston, E.B. Jones, L.J. Kelly, P.M. Kirk, D.G. Knapp, U. Koljalg, G.M. Kovacs, C.P. Kurtzman, S. Landvik, S.D. Leavitt, A.S. Liggenstoffer, K. Liimatainen, L. Lombard, J.J. Luangsa-Ard, H.T. Lumbsch, H. Maganti, S.S. Maharachchikumbura, M.P. Martin, T.W. May, A.R. McTaggart, A.S. Methven, W. Meyer, J.M. Moncalvo, S. Mongkolsamrit, L.G. Nagy, R.H. Nilsson, T. Niskanen, I. Nyilasi, G. Okada, I. Okane, I. Olariaga, J. Otte, T. Papp, D. Park, T. Petkovits, R. Pino-Bodas, W. Quaedvlieg, H.A. Raja, D. Redecker, T. Rintoul, C. Ruibal, J.M. Sarmiento-Ramirez, I. Schmitt, A. Schussler, C. Shearer, K. Sotome, F.O. Stefani, S. Stenroos, B. Stielow, H. Stockinger, S. Suetrong, S.O. Suh, G.H. Sung, M. Suzuki, K. Tanaka, L. Tedersoo, M.T. Telleria, E. Tretter, W.A. Untereiner, H. Urbina, C. Vagvolgyi, A. Vialle, T.D. Vu, G. Walther, Q.M. Wang, Y. Wang, B.S. Weir, M. Weiss, M.M. White, J. Xu, R. Yahr, Z.L. Yang, A. Yurkov, J.C. Zamora, N. Zhang, W.Y. Zhuang, D. Schindel, Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi, Proc. Natl. Acad. Sci. U. S. A. 109 (2012) 6241–6246.
[12] J. Pawlowski, S. Audic, S. Adl, D. Bass, L. Belbahri, C. Berney, S.S. Bowser, I. Cepicka, J. Decelle, M. Dunthorn, A.M. Fiore-Donno, G.H. Gile, M. Holzmann, R. Jahn, M. Jirků, P.J. Keeling, M. Kostka, A. Kudryavtsev, E. Lara, J. Lukes, D.G. Mann, E.A.D. Mitchell, F. Nitsche, M. Romeralo, G.W. Saunders, A.G.B. Simpson, A.V. Smirnov, J.L. Spouge, R.F. Stern, T. Stoeck, J. Zimmermann, D. Schindel, C. de Vargas, CBOL protist Working group: barcoding eukaryotic richness beyond the animal, plant, and fungal Kingdoms, PLoS Biol. 10 (2012).
[13] A.W. Coleman, Is there a molecular key to the level of "biological species" in eukaryotes? A DNA guide, Mol. Phylogen. Evol. 50 (2009) 197–203.
[14] B.E. Deagle, S.N. Jarman, E. Coissac, F. Pompanon, P. Taberlet, DNA metabarcoding and the cytochrome c oxidase subunit I marker: not a perfect match, Biol. Lett. 10 (2014).
[15] H.E. O'Brien, J.L. Parrent, J.A. Jackson, J.M. Moncalvo, R. Vilgalys, Fungal community analysis by large-scale sequencing of environmental samples, Appl. Environ. Microbiol. 71 (2005) 5544–5550.

[16] P. Taberlet, E. Coissac, F. Pompanon, C. Brochmann, E. Willerslev, Towards next-generation biodiversity assessment using DNA metabarcoding, Mol. Ecol. 21 (2012) 2045-2050.
[17] A. Fjellberg, The Collembola of Fennoscandia and Denmark. Part I: Poduromorpha, Fauna Entomol. Scand. 35 (1998) 1-183.
[18] A. Fjellberg, The Collembola of Fennoscandia and Denmark. Part II: Entomobryomorpha and Symphypleona, Fauna Entomol. Scand. 42 (2007) 1-264, i-vi.
[19] O. Folmer, M. Black, W. Hoeh, R. Lutz, R. Vrijenhoek, DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates, Mol. Mar. Biol. Biotechnol. 3 (1994) 294-299.
[20] C.A. D'Haese, Were the first springtails semi-aquatic? A phylogenetic approach by means of 28S rDNA and optimization alignment, Proc. R. Soc. B 269 (2002) 1143-1151.
[21] T.J. White, T. Bruns, S. Lee, J. Taylor, Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics, in: PCR Protocols: a Guide to Methods and Applications, vol. 18, 1990, pp. 315-322.
[22] K.J. Martin, P.T. Rygiewicz, Fungal-specific PCR primers developed for analysis of the ITS region of environmental DNA extracts, BMC Microbiol. 5 (2005).
[23] K. Abarenkov, L. Tedersoo, R.H. Nilsson, K. Vellak, I. Saar, V. Veldre, E. Parmasto, M. Prous, A. Aan, M. Ots, O. Kurina, I. Ostonen, J. Jogeva, S. Halapuu, K. Poldmaa, M. Toots, J. Truu, K.-H. Larsson, U. Koljalg, PlutoF - a web based workbench for ecological and taxonomic research, with an online implementation for fungal ITS sequences, Evol. Bioinform. 6 (2010) 189-196.
[24] K. Katoh, K. Misawa, K. Kuma, T. Miyata, MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform, Nucleic Acids Res. 30 (2002) 3059-3066.
[25] M. Gouy, S. Guindon, O. Gascuel, SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building, Mol. Biol. Evol. 27 (2010) 221-224.
[26] A. Keller, T. Schleicher, J. Schultz, T. Mueller, T. Dandekar, M. Wolf, 5.8S-28S rRNA interaction and HMM-based ITS2 annotation, Gene 430 (2009) 50-57.
[27] R.C. Edgar, Search and clustering orders of magnitude faster than BLAST, Bioinformatics 26 (2010) 2460-2461.
[28] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: molecular evolutionary genetics analysis version 6.0, Mol. Biol. Evol. 30 (2013) 2725-2729.
[29] D. Porco, D. Skarzynski, T. Decaens, P.D.N. Hebert, L. Deharveng, Barcoding the Collembola of Churchill: a molecular taxonomic reassessment of species diversity in a sub-Arctic area, Mol. Ecol. Resour. 14 (2014) 249-261.
[30] A. Srivathsan, R. Meier, On the inappropriate use of Kimura-2-parameter (K2P) divergences in the DNA-barcoding literature, Cladistics 28 (2012) 190-194.
[31] R.A. Collins, L.M. Boykin, R.H. Cruickshank, K.F. Armstrong, Barcoding's next top model: an evaluation of nucleotide substitution models for specimen identification, Methods Ecol. Evol. 3 (2012) 457-465.
[32] J. Bergsten, D.T. Bilton, T. Fujisawa, M. Elliott, M.T. Monaghan, M. Balke, L. Hendrich, J. Geijer, J. Herrmann, G.N. Foster, I. Ribera, A.N. Nilsson, T.G. Barraclough, A.P. Vogler, The effect of geographical scale of sampling on DNA barcoding, Syst. Biol. 61 (2012) 851-869.
[33] M.A. Smith, C. Bertrand, K. Crosby, E.S. Eveleigh, J. Fernandez-Triana, B.L. Fisher, J. Gibbs, M. Hajibabaei, W. Hallwachs, K. Hind, J. Hrcek, D.W. Huang, M. Janda, D.H. Janzen, Y. Li, S.E. Miller, L. Packer, D. Quicke, S. Ratnasingham, J. Rodriguez, R. Rougerie, M.R. Shaw, C. Sheffield, J.K. Stahlhut, D. Steinke, J. Whitfield, M. Wood, X. Zhou, Wolbachia and DNA barcoding insects: patterns, potential, and problems, PLoS One 7 (2012).
[34] E.A. Lilleskov, T.D. Bruns, Spore dispersal of a resupinate ectomycorrhizal fungus, Tomentella sublilacina, via soil food webs, Mycologia 97 (2005) 762-769.
[35] F. Cicconardi, P.P. Fanciulli, B.C. Emerson, Collembola, the biological species concept and the underestimation of global species richness, Mol. Ecol. 22 (2013) 5382-5396.
[36] U. Burkhardt, J. Filser, Molecular evidence for a fourth species within the Isotoma viridis group (Insecta, Collembola), Zool. Scr. 34 (2005) 177-185.
[37] D.C. Coleman, D. Crossley, P.F. Hendrix, Fundamentals of Soil Ecology, Academic Press, 2004.
[38] H.C. Hamilton, M.S. Strickland, K. Wickings, M.A. Bradford, N. Fierer, Surveying soil faunal communities using a direct molecular approach, Soil Biol. Biochem. 41 (2009) 1311-1314.
[39] F. Bienert, S. De Danieli, C. Miquel, E. Coissac, C. Poillot, J.-J. Brun, P. Taberlet, Tracking earthworm communities from soil DNA, Mol. Ecol. 21 (2012) 2017-2030.
[40] T. Wu, E. Ayres, R.D. Bardgett, D.H. Wall, J.R. Garey, Molecular study of worldwide distribution and diversity of soil animals, Proc. Natl. Acad. Sci. U. S. A. 108 (2011) 17720-17725.
[41] E. Coissac, T. Riaz, N. Puillandre, Bioinformatic challenges for DNA metabarcoding of plants and animals, Mol. Ecol. 21 (2012) 1834-1847.
[42] I. Meusnier, G.A.C. Singer, J.-F. Landry, D.A. Hickey, P.D.N. Hebert, M. Hajibabaei, A universal DNA mini-barcode for biodiversity analysis, BMC Genomics 9 (2008).
[43] M. Hajibabaei, S. Shokralla, X. Zhou, G.A.C. Singer, D.J. Baird, Environmental barcoding: a next-generation sequencing approach for biomonitoring applications using river benthos, PLoS One 6 (2011).
[44] D.W. Yu, Y. Ji, B.C. Emerson, X. Wang, C. Ye, C. Yang, Z. Ding, Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring, Methods Ecol. Evol. 3 (2012) 613-623.
[45] G.F. Ficetola, E. Coissac, S. Zundel, T. Riaz, W. Shehzad, J. Bessiere, P. Taberlet, F. Pompanon, An in silico approach for the evaluation of DNA barcodes, BMC Genomics 11 (2010).
[46] L.S. Epp, S. Boessenkool, E.P. Bellemain, J. Haile, A. Esposito, T. Riaz, C. Erseus, V.I. Gusarov, M.E. Edwards, A. Johnsen, H.K. Stenoien, K. Hassel, H. Kauserud, N.G. Yoccoz, K. Brathen, E. Willerslev, P. Taberlet, E. Coissac, C. Brochmann, New environmental metabarcodes for analysing soil DNA: potential for studying past and present ecosystems, Mol. Ecol. 21 (2012) 1821-1833.
[47] H.M. Bik, D.L. Porazinska, S. Creer, J.G. Caporaso, R. Knight, W.K. Thomas, Sequencing our way towards understanding global eukaryotic biodiversity, Trends Ecol. Evol. 27 (2012) 233-243.
[48] I.A. Dickie, Insidious effects of sequencing errors on perceived diversity in molecular surveys, New Phytol. 188 (2010) 916-918.
[49] R. Meier, K. Shiyang, G. Vaidya, P.K.L. Ng, DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success, Syst. Biol. 55 (2006) 715-728.
[50] R.A. Collins, R.H. Cruickshank, The seven deadly sins of DNA barcoding, Mol. Ecol. Resour. 13 (2013) 969-975.
[51] A. Keller, F. Foerster, T. Mueller, T. Dandekar, J. Schultz, M. Wolf, Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees, Biol. Direct 5 (2010).
[52] L.M. Marquez, D.J. Miller, J.B. MacKenzie, M.J.H. van Oppen, Pseudogenes contribute to the extreme diversity of nuclear ribosomal DNA in the hard coral Acropora, Mol. Biol. Evol. 20 (2003) 1077-1086.