Applying microbial biogeography in soil forensics

Accepted Manuscript

Title: Applying microbial biogeography in soil forensics
Authors: Habteab Habtom, Zohar Pasternak, Ofra Matan, Chen Azulay, Ron Gafny, Edouard Jurkevitch
PII: S1872-4973(18)30581-7
DOI: https://doi.org/10.1016/j.fsigen.2018.11.010
Reference: FSIGEN 2002
To appear in: Forensic Science International: Genetics
Received date: 2 July 2017
Revised date: 10 October 2018
Accepted date: 7 November 2018

Please cite this article as: Habtom H, Pasternak Z, Matan O, Azulay C, Gafny R, Jurkevitch E, Applying microbial biogeography in soil forensics, Forensic Science International: Genetics (2018), https://doi.org/10.1016/j.fsigen.2018.11.010

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Applying microbial biogeography in soil forensics

Habteab Habtom1*, Zohar Pasternak1*, Ofra Matan1, Chen Azulay1, Ron Gafny2, Edouard Jurkevitch1

1 Department of Plant Pathology and Microbiology, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel.

2 Forensic Biology Laboratory, Division of Identification and Forensic Science, Israel Police, National Headquarters, Haim Bar-Lev Road, Jerusalem, Israel.

* Equal contribution

Corresponding author: Zohar Pasternak

E-mail: [email protected]

Tel: +972-8-9489235

Highlights

Soil microbial DNA profiles correlate with geographic location more than with soil type.

At 25-1000 m apart, there is a strong distance-decay relationship between soil samples.

This may enable pinpointing the source location of a sample to within at least 25 m.

At 1-260 km apart, the difference between samples correlates with annual precipitation.

Abstract

The ubiquity, heterogeneity and transferability of soil make it useful as evidence in criminal investigations, especially using new methods that survey the microbial DNA it contains. However, for such evidence to be used effectively and reliably, more needs to be learned about the natural distribution patterns of microbial communities in soil. In this study we examine these patterns in detail, at local to regional scales (2 m - 260 km), across an environmental gradient in three different soil types. Geographic location was found to be more important than soil type in determining the microbial community composition: communities from the same site but different soil types, although significantly different from each other, were still much more similar to each other than were communities from the same soil type but from different sites. At a local scale (25 – 1000 m), distance-decay relationships were observed in all soil types: the farther apart two soil communities were located, even in the same soil type, the more they differed. At regional-scale distances (1 - 260 km), differences between communities did not increase with increased geographic distance between them, and the dominant factor determining the community profile was the physico-chemical environment, most notably annual precipitation (R²=0.69), soil sodium (R²=0.49) and soil ammonium (R²=0.47) levels. We introduce a likelihood-ratio framework for quantitative evaluation of soil microbial DNA profile evidence in casework. In conclusion, these profiles, along with detailed knowledge of natural soil microbial biogeography, provide valuable forensic information on soil sample comparison and allow the determination of approximate source location on large (hundreds of km) spatial scales. Moreover, at small spatial scales they may enable pinpointing the source location of a sample to within at least 25 m, regardless of soil type and environmental conditions.

Keywords: microbial DNA; soil; biogeography; TRFLP; forensics; distance-decay.

Introduction

Spatial distribution of macro-organisms is subject to geographical barriers. For microbes, however, as Beijerinck suggested, “everything is everywhere” (O’Malley, 2008), to which Baas Becking added “but the environment selects” (Quispel, 1998). Many studies support the hypothesis of microbial cosmopolitanism (Nagler et al., 2016; Finlay, 2002), which may be explained by multiple mechanisms. Almost anything that moves, whether deliberately or accidentally, can become a viable means of dispersal for microorganisms; this, coupled with very high abundance (e.g. 10⁹ bacteria per gram of soil [Torsvik et al. 1990] and 10⁶ bacteria per ml of lake water [Fenchel and Finlay, 2004]) and effective survival strategies such as sporulation (Lücking et al., 2013), greatly enhances the probability of dispersal. Finally, extensive lateral gene transfer (Treangen and Rocha, 2011) lowers the probability of speciation and contributes to the lack of unique taxa specific to a given geographic region (Finlay, 2002). Yet patterns of local microbial community composition frequently emerge (Liu et al., 2016), driven by heterogeneity in environmental parameters (Green and Bohannan, 2006; Nagler et al., 2016) and dispersal limitation (Sul et al. 2013). This is particularly pertinent in the highly spatially-structured soil habitat, where organic matter, nitrogen, moisture, pH, mineral ions and plant cover may all affect local selection (Gartzia-Bengoetxea et al., 2016). However, significant variations in community structure may also appear in the absence of detectable ecological gradients (Adams et al., 2013; Lear et al., 2014), particularly distance-decay relationships, where bacterial communities become increasingly different as the geographical distance between them increases (Nekola and White, 1999). Even seemingly-ubiquitous soil species may show distance-related population differentiation caused by genetic drift due to limited dispersal (Vos et al. 2013).

The ubiquity, heterogeneity and transferability of soil provide valuable geographic information that can help differentiate between locations. As such, soil-derived information is useful as evidence in criminal investigations: it may enable the provenance of a soil sample of unknown origin to be determined, may help restrict search areas, and may provide association through the comparison between samples from a suspect and from a crime scene (Fitzpatrick, 2011). The current toolbox available to the forensic investigator mostly relies upon details of soil physical appearance and chemical composition, including soil type, color and particle size, as well as its elemental, mineral, and organic content (Woods et al., 2016), providing useful data that are sometimes instrumental for solving crimes (Fitzpatrick and Raven, 2012). Accordingly, in a number of countries (e.g. the UK, Australia and the USA) soil analysis is successfully used in legal submissions (Uitdehaag et al. 2016). Yet these methods may not be sufficiently discriminative if samples originate from landscapes whose geology and mineralogy are relatively homogeneous and very similar over large areas (Giampaoli et al., 2014), are in close proximity, or are from soils with low inorganic content (Young et al., 2014). Additionally, most of these methods are costly and involve considerable time and specialist soil expertise in their use and interpretation; therefore, soil information is under-used or not used at all in criminal investigations in many countries around the world (Lorna Dawson, Personal Communication). To address this issue, DNA-based technologies are being developed and assessed (Kuiper, 2016). These include, among others, optimized microbial DNA-based methods, such as 16S-rRNA TRFLP and 16S-rRNA next-generation sequencing, that are suitable for differentiation of soil samples (Habtom et al. 2017; Demanèche et al. 2017). For these methods to be used effectively in investigations and in court, more needs to be learned about the natural distribution patterns of microbial communities in soil and how these relate to the forensic context. Accordingly, in the current study we use these DNA-based technologies to evaluate bacterial distribution patterns and assess their resolution at local (meters) to regional (hundreds of km) scales, between and within different soil types forming similar clusters at various sites across a climatic gradient.

Material and Methods

Soil sampling. Sampling was undertaken at five research sites across the rainfall gradient of Israel, spanning a geographic line of 260 km from the desert region in the south (100 mm rain per year) to the Mediterranean climatic region in the north (1300 mm rain per year; Figure 1). At each site, 2 to 4 different soil types were sampled within three km of each other, with all sampling points similar in landscape structure: flat to mild slopes with mostly no vegetation. Each sample weighed 100 g, with five replicates per soil type taken 1 to 2 m apart, using sterilized collecting and storage equipment. At the Gimzo site only, for each of the three soil types available there, samples were collected at locations 25 m, 75 m and 100 m from each other, again with five replicates each, 1 to 2 m apart. All samples were kept in airtight plastic bags at 4 °C until processing within 24 h. Physical and chemical measurements were collected from each soil type (Table 1) using standard soil analytical methods (Page et al. 1982): saturation percentage (by gravimetric method, in Saturated Soil Extract [SSE]), pH (in SSE), electrical conductivity (salinity; in SSE), sodium (in SSE), calcium and magnesium (in SSE by ICP [inductively coupled plasma] spectroscopy and flame photometer), sodium adsorption ratio (in SSE), phosphorus adsorption ratio (in SSE), percentage organic matter (by dichromate method), N as nitrate, N as ammonium (including adsorbed N), phosphorus in extract, and potassium in extract. Due to technical difficulties, physical and chemical measurements are not available for the two southern sites (Avdat and Beer-Sheva).

Microbial profiling. Terminal restriction fragment length polymorphism (TRFLP) analysis was performed as described previously (Habtom et al. 2017). In short, DNA was extracted from 0.25 g soil using PowerSoil (MoBIO, USA), concentrated and purified using DNA Clean & Concentrator-25 (Zymo Research, USA), and then amplified by PCR using primers 341F [5’-CCTACGGGAGGCAGCAG] and 907R [5’-CCGTCAATTCMTTTRAGTTT] fluorescently 5’-tagged with 6FAM. Amplicons were digested with Mung Bean nuclease (New England BioLabs, USA), purified with Wizard SV (Promega) and digested with nuclease TaqαI (New England BioLabs, USA) for 3 h at 65 °C, followed by enzyme inactivation at 80 °C for 20 min. The digested DNA was again treated with DNA Clean & Concentrator-25 (Zymo Research), and the resulting fragments were separated on a 3500xl Genetic Analyzer (Applied Biosystems, USA) with three technical replicates per sample. Electropherograms were read using GeneMarker (SoftGenetics, USA), and peaks (i.e. TRFs, terminal restriction fragments) were identified between 50 and 500 bp with a minimal intensity of 50 RFU (relative fluorescence units). TRFs were binned to the closest basepair, and for each sample with three capillary electrophoresis (CE) runs, the highest intensity level of the three was taken to represent the intensity at each TRF. In addition to TRFLP, soil profiling at the Gimzo site was also performed by high throughput sequencing (HTS) as described in Habtom et al. (2017). In short, primer pair 515F-806R, targeting the V4 region of the 16S rRNA gene, was used with Illumina MiSeq in standard targeted-amplicon sequencing (TAS; Green et al. 2015). Data were analyzed with MOTHUR (Schloss et al. 2009). Quality control, trimming and de-noising were performed as outlined in the standard MOTHUR protocol (Kozich et al. 2013). All sequences were aligned to the SILVA reference alignment database (Quast et al. 2013) and trimmed at both ends so that the alignment contained no ‘overhangs’. To further reduce sequencing errors, sequences were pre-clustered based on the algorithm of Huse et al. (2010) and chimeric reads were removed with MOTHUR’s implementation of the UCHIME method (Edgar et al. 2011). Pairwise distances were calculated between all DNA reads, and reads were subsequently clustered into operational taxonomic units (OTUs) at the 0.03 level, meaning that sequences that displayed >97% similarity with each other were considered the same OTU. The taxonomic affiliation of each OTU was identified according to current SILVA taxonomy, and taxon abundances in each sample were not rarefied, following the recommendation of McMurdie and Holmes (2014). TRFLP and HTS data were relativized according to the sample total, i.e. the sum of abundance in each sample. Multivariate analyses were performed in PC-ORD v6.12 (MjM Software, USA) using Bray-Curtis distances. Ordinations were performed with non-metric multidimensional scaling (NMDS; Mather 1976) at 500 iterations, and cluster analyses were performed with flexible beta linkages (β=-0.25). Differences between sample groups were calculated with the multi-response permutation procedure (MRPP; Mielke et al. 1984), where the size of the difference between sample groups (i.e. sites or soil types) was represented by the A-statistic of the MRPP test while its significance was represented by the MRPP P-value. To account for multiple testing bias, we employed a Bonferroni-style adjustment to the significance threshold.
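The peak handling and profile comparison described above (binning TRFs to the nearest base pair, taking the highest intensity across the three CE runs, relativizing by the sample total, and comparing samples by Bray-Curtis distance) can be illustrated with a minimal sketch. This is not the GeneMarker or PC-ORD workflow used in the study. The function names, the input layout, the rounding rule and the order in which the size and intensity filters are applied are assumptions made only for illustration, and the peak values are hypothetical.

```python
from collections import defaultdict

# Thresholds quoted in the text: peaks between 50 and 500 bp, at least 50 RFU.
MIN_BP, MAX_BP, MIN_RFU = 50, 500, 50


def build_profile(runs):
    """Combine technical replicate runs into one relativized TRF profile.

    `runs` is a list of CE runs; each run is a list of (size_bp, intensity_rfu)
    peaks. Returns {binned_size_bp: relative_abundance}.
    """
    profile = defaultdict(float)
    for run in runs:
        for size_bp, rfu in run:
            bin_bp = int(size_bp + 0.5)  # assumed rule: round half up to the nearest bp
            if MIN_BP <= bin_bp <= MAX_BP and rfu >= MIN_RFU:
                # keep the highest intensity observed across the runs
                profile[bin_bp] = max(profile[bin_bp], rfu)
    total = sum(profile.values())
    # relativize by the sample total so that the profile sums to 1
    return {bp: rfu / total for bp, rfu in profile.items()} if total else {}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    keys = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


# Hypothetical peaks for two samples (sample_a has three runs, sample_b one).
sample_a = build_profile([
    [(100.2, 400), (250.6, 100)],
    [(99.8, 500), (251.0, 80)],
    [(100.1, 300), (30.0, 900)],  # the 30 bp peak falls outside the size window
])
sample_b = build_profile([[(100.0, 200), (300.0, 200)]])
print(bray_curtis(sample_a, sample_b))
```

A dictionary keyed by fragment size keeps the sketch independent of any fixed list of TRFs; a full analysis would more likely hold all samples in one abundance matrix.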

Evaluation of forensic soil microbial evidence by likelihood ratio. The TRFLP profiles from six soil types (loess, desert skeletal soil, sand, Mediterranean mountain soil, rendzina and terra rossa) at five locations (Avdat, Beer-Sheva, Dalya, Gimzo and Idmit) were used as a test set for modelling and validation of a likelihood-ratio (LR) framework for evidence evaluation. Overall, 15 soil type/location combinations were taken with five replicates each, for a total of 75 profiles. Bray-Curtis distances were calculated for two groups, "same source" (i.e. replicates sampled 1 – 2 m apart; N=143) and "different source" (sampled 25 – 260,000 m apart; N=2485). Probability distributions were charted for both groups, followed by fitting a Gaussian model to each probability density function. The cost of log-likelihood ratio was calculated using the Bosaris toolkit (Brummer and du Preez 2006).

Results

Soil vs. site effects. TRFLP analysis of the microbial community detected a total of 447 TRFs (fragments) in our study. Of these, 29 (6.5%) were found in all 15 site × soil type combinations, whereas 55 (12.3%) were specific to a single soil type at a specific site, i.e. they were not found at any other site or soil type. No obvious trend was found in the richness or diversity of TRFs in specific soil types or sites (Supplementary Table S1). Geographical sites formed tight ordination clusters without regard to the soil types they consisted of (Figure 2A), whereas the same soil types from different sites did not form clusters (Figure 2B). In other words, site location was more important than soil type in determining the microbial community composition: communities from the same site but different soil types were much more similar to each other than were communities from the same soil type but from different sites (Figure 3). Nevertheless, at each site, the bacterial communities from different soil types were still significantly different (MRPP tests, P<0.003, Supplementary Table S2). Average annual precipitation was highly correlated to soil community composition in all sites (R²=0.69). Due to technical difficulties, all other physical and chemical measurements are not available for the two southern sites (Avdat and Beer-Sheva); between sites, of all the chemical and physical parameters that were measured at the three northern sites (Idmit, Dalya and Gimzo), soil community composition was most correlated to soil sodium (R²=0.49) and soil ammonium (R²=0.47) levels (Figure 4). At a local scale (within each site), where precipitation is highly similar, differences between bacterial communities were primarily driven by soil type (rendzina, terra rossa and sand; Figures 2, 3) as determined by various soil parameters which were highly correlated (R²>0.7) to within-site microbial community structures: water saturation, phosphorus and organic matter levels at Gimzo (Figure 4A); sodium and sodium adsorption ratio (SAR) at Idmit (Figure 4B); and sodium and potassium levels at Dalya (Figure 4C).
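The MRPP comparisons reported above yield an effect size (the A-statistic) and a P-value for a grouping of samples. The following is a minimal permutation-based sketch of that idea, not the PC-ORD procedure used in the study. Weighting each group by its share of the samples and obtaining the P-value by random relabelling are assumptions, as are the function names, the number of permutations and the toy distance matrix. The `bonferroni_threshold` helper only illustrates the simplest form of a Bonferroni-style adjustment; the number of tests used in the study is not stated here.

```python
import random
from itertools import combinations


def mrpp(dist, groups, n_perm=999, seed=0):
    """Permutation MRPP on a square distance matrix.

    dist   : list of lists, dist[i][j] = dissimilarity between samples i and j
    groups : list of group labels, one per sample
    Returns (A, p_value).
    """
    n = len(groups)

    def delta(labels):
        # weighted mean of the within-group average pairwise distances
        total = 0.0
        for g in set(labels):
            members = [i for i in range(n) if labels[i] == g]
            pairs = list(combinations(members, 2))
            if pairs:
                mean_d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                total += len(members) / n * mean_d  # assumed weighting: n_i / N
        return total

    rng = random.Random(seed)
    observed = delta(groups)
    shuffled = list(groups)
    perm_deltas = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm_deltas.append(delta(shuffled))
    expected = sum(perm_deltas) / n_perm
    a_stat = 1.0 - observed / expected  # A = 1 - observed delta / expected delta
    # share of relabellings whose delta is at least as small as the observed one
    hits = sum(d <= observed + 1e-12 for d in perm_deltas)
    return a_stat, (1 + hits) / (n_perm + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a plain Bonferroni adjustment."""
    return alpha / n_tests


# Toy example: two hypothetical soil types, five replicates each.
labels = ["sand"] * 5 + ["rendzina"] * 5
toy = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9)
        for j in range(10)] for i in range(10)]
a_stat, p_value = mrpp(toy, labels)
print(a_stat, p_value, bonferroni_threshold(0.05, 10))
```

A value of A near 0 means that within-group distances are about what random grouping would give, while values approaching 1 indicate tight, well-separated groups.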

Distance-decay analysis of bacterial communities along geographical distances. A significant correlation was discovered between the geographical distance between soil samples and the difference between their microbial communities: the farther apart two soil communities were located, the more significantly they differed (Figure 5A) and the larger this difference was (Figure 5B). This distance-decay relationship was observed from distances of tens of meters to tens of km, across all soil types. Differences between communities were non-significant at 2 m apart (MRPP tests, P>0.35), increasing logarithmically in significance (P-value) when distances between samples grew from 25 m (P≈10⁻³) to 60 km (P≈10⁻⁸) and then remaining steady. The inflection point of the logarithmic trend line in all soils (sand, rendzina and terra rossa) was located at 200-1000 m; dissimilarity between samples from the same soil type strongly increased over local-scale distances (25-1000 m), but did not significantly increase as regional-scale distances increased from 60 to 260 km (R²=0.2; Figure 5A-B insets).

High throughput sequencing-based taxonomic comparison. We further examined the bacterial composition of three adjacent soils using high-throughput sequencing of the 16S rRNA gene. A total of 1,429,852 reads were sequenced from 15 samples at the Gimzo site, representing five replicates each of rendzina, sand and terra rossa with 95,323±24,230 reads per sample (Supplementary Table S3; Supplementary Figure S4). Using phylum-level taxonomy, it was impossible to differentiate rendzina from terra rossa, and even sand was not well differentiated from both (Figure 6A). The main phyla in all three soil types were Proteobacteria and Actinobacteria, with Bacilli levels being significantly higher in sand (indicator value analysis, IV=80.4, P<0.01) and Planctomycetacia and Verrucomicrobia significantly lower (IV=70.2 and 71.1, respectively, with P<0.01 for both). Using genus-level taxonomy, the sand samples displayed the lowest diversity (Shannon index 4.55±0.10) compared to rendzina and terra rossa (4.70±0.05 and 4.75±0.05, respectively). The communities at the three soil types were significantly different from each other (MRPP tests, P<0.006), with the rendzina and terra rossa communities about three times more similar to each other (MRPP-A=0.11) than either of them to the sand (A=0.35 and A=0.28, respectively). Variability between communities from the same soil type was higher in sand (mean within-group Bray-Curtis distance of 0.26) than in rendzina or terra rossa (0.16 and 0.18, respectively). The most abundant genera (each comprising >3.4% of reads per sample) found in the rendzina and terra rossa soils were the radiotolerant Rubrobacter, the nitrogen-fixing Microvirga and unidentified members of the Acidobacteria, whereas in sand they were Microvirga and the stress-tolerant genera Arthrobacter and Bacillus.

Evaluation of forensic soil microbial evidence by likelihood ratio. To evaluate the evidence, we use the likelihood ratio (LR) framework as recommended by the ENFSI (Willis, 2015). Within this framework, two competing propositions are considered: first, the prosecution hypothesis (termed Hp), which states that the two soil samples under comparison (e.g. from a crime scene and a suspect's shoe) originate from a single source; and second, the defence hypothesis (termed Hd), which states that the two soil samples originate from different sources. For each hypothesis, we calculate the probability of observing each Bray-Curtis score (S) if that hypothesis is correct, namely Pr(S|Hp) and Pr(S|Hd). This calculation was performed by a validation experiment involving two populations of S, one that came from samples known to originate from the same soil type and location (1 – 2 m from each other) and the other that came from samples originating from different soils (25 – 260,000 m apart). After the distribution of both groups was known, we modelled the probability density functions as Gaussians (Table 2; Figure 7). The likelihood ratio can now be calculated for each new pair of soil samples from casework where their connection is unknown. The LR is the ratio between both probabilities, and using our model we get for each Bray-Curtis score (S):

\[ LR = \frac{\Pr(S \mid H_p)}{\Pr(S \mid H_d)} = \frac{0.17\,\exp\left(-35.3\,(S-0.33)^2\right)}{0.19\,\exp\left(-42.2\,(S-0.58)^2\right)} \]

The "same source" proposition is supported by LR values larger than 1, with the support strengthening as LR increases, whereas the "different source" proposition is supported by LR values smaller than 1, with the support strengthening as LR decreases towards 0. When LR=1 both propositions are equally likely and the soil evidence has no probative value. In our model, this "non-probative" point is set around a Bray-Curtis score of 0.456. The accuracy of our model, measured as the cost of log-likelihood ratio (Cllr; Brummer and du Preez, 2006), was found to be Cllr=0.57.

Discussion

Two main mechanisms are thought to govern compositional similarity in biological communities. First, if organisms are dispersal-limited, a decrease in community similarity would occur with distance, as community members are more likely to colonize nearby locations. Second, as environmental attributes become more heterogeneous with distance, adaptation to local conditions may also lead to a distance-decay in similarity. Many studies of bacteria, focusing on either local spatial scales (cm to hundreds of meters) (e.g. Bell 2010; Stursova et al. 2016) or regional and continental scales (km to hundreds and thousands of km) (e.g. Jiao et al. 2016; Zhang et al. 2016), examined the relative importance of these two mechanisms, and usually concluded that community composition reflects some combination of both (Langenheder and Ragnarsson, 2007; Ma et al., 2016). However, it could be suggested that dispersal limitation is more important in determining bacterial community composition at local scales than at larger regional or cross-continental scales (Lear et al. 2014). Studies performed in different ecosystems across scales, from local to continental, provided support for this hypothesis (Martiny et al., 2011; Lear et al. 2013), including in soils (Bissett et al., 2010). Yet, we are not aware of studies which examined local versus regional effects across clusters of similar soil types. This gap is important in the forensic context, and to address it, our study investigated bacterial community distribution patterns within clusters of the same soil types at both scales simultaneously, along the rainfall gradient of Israel. At the regional scale, dissimilarity between bacterial communities located 60 to 260 km apart was not significantly correlated to geographical distance. Within these distances there exists a steep rainfall gradient ranging from 100 to 1300 mm per year (Halfon et al., 2009), with an additional gradient in temperatures and heterogeneous distributions of soil factors such as pH, organic matter, nitrogen and mineral ions. This environmental variability, which seems to be the main effector of community selection behind regional-scale microbial assemblage patterns (Nagler et al., 2016; Rousk et al., 2010; Bååth and Anderson, 2003; Pei et al., 2016; Fierer and Jackson, 2006), also blurs differences between soil types.

At the local scale of 25 to 100 m we found a strong distance-decay relationship, i.e. a correlation between bacterial community dissimilarity and the geographical distance between communities, across soil types, supporting soil type as a significant factor in determining community similarity. Microbial communities respond differentially to different soil types (e.g. Rousk et al., 2010), due to different tolerances and preferences to organic matter and mineral contents (Marković et al., 2016; Cai et al., 2016), soil alkalinity (Dlamini et al., 2016), pH and much more. Rendzina and terra rossa, two similar soil types that are both rich in clay, supported quite similar bacterial communities, yet both were very different from the communities in sand, as shown in TRFLP profiling, OTU binning and taxonomic analysis. As all soil types were located near each other in each of the study sites, these differences are likely due to the soil parameters that drive microbial distribution patterns. Most notably, sand displays ca. half the water saturation of rendzina and terra rossa from the same site, and other studies from Israel found microbial distribution patterns to be driven by ecosystem type, especially water availability (Angel et al. 2010; 2013). In highly-diverse habitats such as soil, biogeographical patterns of the bacterial communities may be affected by the physiological capacities of the composing populations (Bru et al., 2011). Indeed, in a study of the Antarctic extreme environment, which allows for the mitigation of confounding factors, taxon niche widths decreased as growth rates decreased, diversity decreased, and β-diversity increased with decreasing temperature (Okie et al., 2015). In another study using temperate soils, hardy bacteria (e.g. spore formers such as Bacillaceae and Clostridiaceae) showed no evidence of a biogeographic signature, in contrast to poor dispersers/colonizers (e.g. Rhizobiaceae, Bradyrhizobiaceae and Xanthomonadaceae) (Bissett et al., 2010). Accordingly, 16S rRNA gene sequencing of the three soil types used in this study showed that sandy soils contained significantly higher relative levels of bacteria highly resistant to desiccation, such as Bacillus spp. and Arthrobacter (Kumar et al., 2016). However, even within spore-forming bacteria, clades may not all behave similarly: Myxococcus xanthus, a gram-negative spore former, showed significant isolation by distance (Vos and Velicer 2008). Taken together, these results suggest that soil DNA-based forensic analyses should not be restricted to specific taxonomical or physiological (e.g. spore forming) clades but should rather rely on whole community analyses.

Notably, differences within a distance of 25 m between samples in a specific soil were also significant. At the local scale, environmental parameters such as rainfall, which in our system is a main factor of community change (Angel et al. 2010; 2013), are more homogeneous, and dispersal limitation emerges as the main factor of community selection. Indeed, a number of studies have shown that microbial communities exhibit non-random variations at the meter scale (summarized in Martiny et al., 2006), with ecological drift a possible contributing factor (Martiny et al., 2011). We further show an inflection point depending on the soil type, i.e. the distance where dominance shifts from dispersal limitation to environmental selection, between 200 and 1000 m. Below this distance, dispersal limitation is more dominant in determining community composition; as geographical distances between communities become larger, the environment becomes more heterogeneous, and above the inflection distance, environmental selection becomes dominant in determining community structure. The steep rain gradient between the sites could explain this regional effect (Angel et al., 2010). Interestingly, changes from a dispersal-limited regime to an environmental selection regime have been found at similar scales in studies of soils (Damaso et al., 2018; Bissett et al., 2010) or sediments (Martiny et al., 2011). Thus, it may be hypothesized that in environments where the climatic and/or physicochemical heterogeneity is less pronounced, the "local scale" will stretch to longer distances and the inflection distance will become longer; and indeed, studies of more homogeneous environments such as oceans (Sul et al. 2013), streams (Lear et al. 2013) and arid steppe (Yao et al. 2017) display significant microbial distance-decay relationships for up to hundreds of kilometers.
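A distance-decay relationship with a logarithmic trend, as discussed above, can be explored by regressing community dissimilarity on the natural logarithm of geographic distance. The sketch below is only an illustration under stated assumptions: the functional form (dissimilarity = a + b·ln(distance)), the function name and the data points are hypothetical and are not the trend line or measurements reported in this study.

```python
import math


def fit_log_distance_decay(distances_m, dissimilarities):
    """Least-squares fit of dissimilarity = a + b * ln(distance).

    Returns (a, b, r_squared).
    """
    x = [math.log(d) for d in distances_m]
    y = list(dissimilarities)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope: change in dissimilarity per unit ln(distance)
    a = mean_y - b * mean_x      # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical pairwise distances (m) and Bray-Curtis dissimilarities.
distances = [2, 25, 75, 100, 1000]
dissims = [0.30, 0.42, 0.47, 0.49, 0.58]
a, b, r2 = fit_log_distance_decay(distances, dissims)
print(f"intercept={a:.3f}, slope={b:.3f}, R2={r2:.3f}")
```

Fitting local-scale and regional-scale pairs separately would be one simple way to look for a change in slope between the two regimes.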

The ubiquity, heterogeneity and transferability of soil makes it useful as evidence in criminal

SC R

investigations, especially using new methods that survey the microbial DNA it contains (Young et al. 2017). In one study, of 13 blind tests, 10 successfully identified the origin of the sample; no false

U

positives were obtained (Quaak et al., 2011). In an additional study of six mock crime scenarios,

N

eukaryotic 18S rRNA gene marker analysis in conjunction with infrared (MIR) spectroscopy

A

successfully linked the questioned samples to locations (Young et al., 2015). Our study furthers the

M

harnessing of microbial soil biogeography for forensic applications, as the use of improved technical procedures (Habtom et al., 2017) enabled us to statistically and repeatedly differentiate between different soil types from the same geographical site, and between different geographical sites within the same soil type. Therefore, forensic soil DNA investigation has the potential to not only determine the approximate ("large scale") source of a sample (Pasternak et al., 2013), but also pinpoint its location to within at least 25 m (Demanèche et al., 2017; Damaso et al., 2018). Here, we employed a likelihood ratio (LR) approach to quantitatively evaluate the probability of two soil samples from casework originating from the same source (1 – 2 m apart) relative to their probability of originating from different sources (25 – 260,000 m apart). Within the (limited) scope of soil types and geographical locations in our test set, the Bray-Curtis scores of these two groups were well differentiated and provided an accurate model for distinction (Cllr=0.57). As Bray-Curtis scores drop below 0.456, LR values grow larger than 1 and the support for the "same source" proposition is strengthened; as Bray-Curtis scores rise above 0.456, LR values decrease below 1 and the support for the "different source" proposition is strengthened. When LR=1 the evidence is equally probable under both propositions and the soil evidence has no probative value. However, there are still limitations for implementing this method in forensic casework. These include temporal resolution, e.g. collection of evidence at different time points; sample storage (Pasternak et al., 2018); lack of standardized ways to process soil DNA (Damaso et al., 2018); and the inability to analyze samples containing a mixture of soils (Quaak et al., 2011; Young et al., 2017). This study adds to the growing body of studies in forensic soil genetics and confirms that soil samples, e.g. from a suspect's shoe or spade, can be reliably compared to possible source or alibi soils as shown in blind tests (Quaak et al., 2011; Demanèche et al., 2017). The recent developments in DNA-based approaches of soil analysis suggest that a combination of criminalistics and technical forensics expertise including soil microbiome analyses may expand the soil forensic toolbox.
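
The relation between Bray-Curtis score and LR described above can be illustrated with a minimal sketch. It assumes that the LR is taken as the ratio of the two Gaussian curves reported in Table 2; the function names are hypothetical and the coefficients are the rounded values from that table, so the crossover point found here only approximates the 0.456 threshold quoted in the text.

```python
import math

# Gaussian fits copied from Table 2 (s is the Bray-Curtis score).
def same_source_density(s):
    return 0.17 * math.exp(-35.3 * (s - 0.33) ** 2)

def different_source_density(s):
    return 0.19 * math.exp(-42.2 * (s - 0.58) ** 2)

def likelihood_ratio(s):
    """Assumed LR: ratio of the two fitted curves at score s."""
    return same_source_density(s) / different_source_density(s)

def crossover(lo=0.33, hi=0.58, tol=1e-6):
    """Bisection for the score where LR = 1, searched between the two peaks."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if likelihood_ratio(mid) > 1:
            lo = mid  # still supports "same source": move right
        else:
            hi = mid  # supports "different source": move left
    return (lo + hi) / 2

if __name__ == "__main__":
    for score in (0.30, 0.45, 0.60):
        print(score, likelihood_ratio(score))
    print("LR = 1 near", crossover())
```

With the rounded coefficients, the LR is above 1 at the same-soil mean score (0.36) and below 1 at the different-soil mean score (0.60), as the text describes.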

Acknowledgements

We are grateful to Shlomit Avraham, Moshe Shpitzen and Merav Amiel from the Forensic Biology Laboratory, Division of Identification and Forensic Science, Israel Police, for their dedication and expertise in analysing TRFLP samples. This work was supported by the FP7 program of the European Commission (grant No. 313149).

References

Adams RI, Miletto M, Taylor JW, Bruns TD (2013) Dispersal in microbes: fungi in indoor air are dominated by outdoor air and show dispersal limitation at short distances. The ISME Journal 7: 1262-1273.

Angel R, Pasternak Z, Soares MIM, Conrad R, Gillor O (2013) Active and total prokaryotic communities in dryland soils. FEMS Microbiology Ecology 86(1): 130-138.

Angel R, Soares MIM, Ungar ED, Gillor O (2010) Biogeography of soil archaea and bacteria along a steep precipitation gradient. The ISME Journal 4: 553-563.

Bååth E, Anderson TH (2003) Comparison of soil fungal/bacterial ratios in a pH gradient using physiological and PLFA-based techniques. Soil Biology & Biochemistry 35: 955-963.

Bell T (2010) Experimental tests of the bacterial distance-decay relationship. The ISME Journal 4: 1357-1365.

Bissett A, Richardson AE, Baker G, Wakelin S, Thrall PH (2010) Life history determines biogeographical patterns of soil bacterial communities over multiple spatial scales. Molecular Ecology 19: 4315-4327.

Bru D, Ramette A, Saby NPA, Dequiedt S, Ranjard L, et al. (2011) Determinants of the distribution of nitrogen-cycling microbial communities at the landscape scale. The ISME Journal 5(3): 532.

Brümmer N, du Preez J (2006) Application-Independent Evaluation of Speaker Detection. Computer Speech and Language 20: 230-275.

Cai A, Feng W, Zhang W, Xu M (2016) Climate, soil texture, and soil types affect the contributions of fine fraction-stabilized carbon to total soil organic carbon in different land uses across China. Journal of Environmental Management 172: 2-9.

Damaso N, Mendel J, Mendoza M, von Wettberg EJ, Narasimhan G, et al. (2018) Bioinformatics Approach to Assess the Biogeographical Patterns of Soil Communities: The Utility for Soil Provenance. Journal of Forensic Sciences 63(4): 1033-1042.

Demanèche S, Schauser L, Dawson L, Franqueville L, Simonet P (2017) Microbial soil community analyses for forensic science: Application to a blind test. Forensic Science International 270: 153-158.

Dlamini P, Chivenge P, Chaplot V (2016) Overgrazing decreases soil organic carbon stocks the most under dry climates and low soil pH: A meta-analysis shows. Agriculture, Ecosystems & Environment 221: 258-269.

Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194-2200.

Fenchel T, Finlay BJ (2004) The ubiquity of small species: patterns of local and global diversity. BioScience 54: 777-784.

Fierer N, Jackson RB (2006) The diversity and biogeography of soil bacterial communities. Proceedings of the National Academy of Sciences of the United States of America 103: 626-631.

Finlay BJ (2002) Global Dispersal of Free-Living Microbial Eukaryote Species. Science 296: 1061-1063.

Fitzpatrick R, Raven M, Self P (2011) Detailed mineralogical characterization of small brick and soil fragments (< 0.5 mm diameter) by Synchrotron X-ray diffraction analyses for further forensic comparisons relating to operation Dargan. Centre for Australian Forensic Soil Science Restricted Client Report, 1-116.

Fitzpatrick RW, Raven MD (2012) How Pedology and Mineralogy Helped Solve a Double Murder Case: Using Forensics to Inspire Future Generations of Soil Scientists. Soil Horizons 53: 14-29.

Gartzia-Bengoetxea N, Kandeler E, Martínez de Arano I, Arias-González A (2016) Soil microbial functional activity is governed by a combination of tree species composition and soil properties in temperate forests. Applied Soil Ecology 100: 57-64.

Giampaoli S, Berti A, Di Maggio RM, Pilli E, Valentini A, et al. (2014) The environmental biological signature: NGS profiling for forensic comparison of soils. Forensic Science International 240: 41-47.

Green J, Bohannan BJM (2006) Spatial scaling of microbial biodiversity. Trends in Ecology and Evolution 2: 501-507.

Green SJ, Venkatramanan R, Naqib A (2015) Deconstructing the polymerase chain reaction: understanding and correcting bias associated with primer degeneracies and primer-template mismatches. PLoS One 10(5): e0128122.

Habtom H, Demanèche S, Dawson L, Azulay C, Matan O, et al. (2017) Soil characterisation by bacterial community analysis for forensic applications: a quantitative comparison of environmental technologies. Forensic Science International: Genetics 26: 21-29.

Halfon N, Levin Z, Alpert P (2009) Temporal rainfall fluctuations in Israel and their possible link to urban and air pollution effects. Environmental Research Letters 4: 1-12.

Huse SM, Welch DM, Morrison HG (2010) Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology 12: 1889-98.

Jiao S, Liu Z, Lin Y, Yang J, Chen W, et al. (2016) Bacterial communities in oil contaminated soils: biogeography and co-occurrence patterns. Soil Biology and Biochemistry 98: 64-73.

Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD (2013) Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Applied and Environmental Microbiology 79(17): 5112-5120.

Kuiper I (2016) Microbial forensics: next-generation sequencing as catalyst. EMBO Reports e201642794.

Kumar R, Singh D, Swarnkar MK, Singh AK, Kumar S (2016) Complete genome sequence of Arthrobacter alpinus ERGS4:06, a yellow pigmented bacterium tolerant to cold and radiations isolated from Sikkim Himalaya. Journal of Biotechnology 220: 86-87.

Langenheder S, Ragnarsson H (2007) The role of environmental and spatial factors for the composition of aquatic and bacterial communities. Ecology 9: 2154-2161.

Lear G, Bellamy J, Case BS, Lee JE, Buckley HL (2014) Fine-scale spatial patterns in bacterial community composition and function within freshwater ponds. The ISME Journal 8: 1715-1726.

Lear G, Washington V, Neale MW, Case B, Buckley HL, et al. (2013) The biogeography of stream bacteria. Global Ecology and Biogeography 22: 544-554.

Liu J, Su Y, Yu Z, Yao Q, Shi Y, et al. (2016) Diversity and distribution patterns of acidobacterial communities in the black soil zone of northeast China. Soil Biology & Biochemistry 95: 212-222.

Lücking G, Stoeckel M, Atamer Z, Hinrichs J, Ehling-Schulz M (2013) Characterization of aerobic spore-forming bacteria associated with industrial dairy processing environments and product spoilage. International Journal of Food Microbiology 166: 270-279.

Ma J, Ibekwe AM, Yang CH, Crowley DE (2016) Bacterial diversity and composition in major fresh produce growing soils affected by physiochemical properties and geographic locations. Science of the Total Environment 563-564: 199-209.

Marković J, Jović M, Smičiklas I, Pezo L, Šljivić-Ivanović M, et al. (2016) Chemical speciation of metals in unpolluted soils of different types: Correlation with soil characteristics and an ANN modelling approach. Journal of Geochemical Exploration 165: 71-80.

Martiny JB, Eisen JA, Penn K, Allison SD, Horner-Devine MC (2011) Drivers of bacterial β-diversity depend on spatial scale. Proceedings of the National Academy of Sciences: 201016308.

Martiny JBH, Bohannan BJ, Brown JH, Colwell RK, Fuhrman JA, et al. (2006) Microbial biogeography: putting microorganisms on the map. Nature Reviews Microbiology 4(2): 102.

Mather PM (1976) Computational methods of multivariate analysis in physical geography. John Wiley and Sons, Inc, London.

McMurdie PJ, Holmes S (2014) Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Computational Biology 10(4): e1003531.

Mielke P (1984) Meteorological applications of permutation techniques based on distance functions. Elsevier Science Publishers, pp. 813-830.

Nagler M, Ascher J, Gómez-Brandón M, Insam H (2016) Soil microbial communities along the route of a venturous cycling trip. Applied Soil Ecology 99: 13-18.

Nekola JC, White PS (1999) The distance decay of similarity in biogeography and ecology. Journal of Biogeography 26: 867-878.

O'Malley MA (2008) 'Everything is everywhere: but the environment selects': ubiquitous distribution and ecological determinism in microbial biogeography. Studies in History and Philosophy of Biological and Biomedical Sciences 39: 314-325.

Okie JG, Van Horn DJ, Storch D, Barrett JE, Gooseff MN, et al. (2015) Niche and metabolic principles explain patterns of diversity and distribution: theory and a case study with soil bacterial communities. Proceedings of the Royal Society B 282(1809): 20142630.

Page AL, Miller RH, Keeney DR (1982) Methods of soil analysis, Part 2: chemical and microbiological properties. American Society of Agronomy, Madison, WI.

Pasternak Z, Al-Ashhab A, Gatica J, Gafny R, Avraham S, et al. (2013) Spatial and temporal biogeography of soil microbial communities in arid and semiarid regions. PLoS One 8: e69705.

Pasternak Z, Luchibia AO, Matan O, Dawson L, Gafny R, et al. (2018) Mitigating temporal mismatches in forensic soil microbial profiles. Australian Journal of Forensic Sciences, 1-10.

Pei Z, Eichenberg D, Bruelheide H, Krober W, Kühn P, et al. (2016) Soil and tree species traits both shape soil microbial communities during early growth of Chinese subtropical forests. Soil Biology and Biochemistry 96: 180-190.

Quaak FC, Kuiper I (2011) Statistical data analysis of bacterial t-RFLP profiles in forensic soil comparisons. Forensic Science International 210(1-3): 96-101.

Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 41(D1): D590-D596.

Quispel A (1998) Lourens G. M. Baas Becking (1895-1963), Inspirator for many (micro)biologists. International Microbiology 1: 69-72.

Rousk J, Bååth E, Brookes PC, Lauber CL, Lozupone C, et al. (2010) Soil bacterial and fungal communities across a pH gradient in an arable soil. The ISME Journal 4: 1340-1351.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, et al. (2009) Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Applied and Environmental Microbiology 75(23): 7537-7541.

Stursova M, Barta J, Santruckova H, Baldrian P (2016) Small-scale spatial heterogeneity of ecosystem properties, microbial community composition and microbial activities in a temperate mountain forest soil. FEMS Microbiology Ecology 92: frw185.

Sul WJ, Oliver TA, Ducklow HW, Amaral-Zettler LA, Sogin ML (2013) Marine bacteria exhibit a bipolar distribution. Proceedings of the National Academy of Sciences USA 110: 2342-2347.

Torsvik V, Goksoyr J, Daae FL (1990) High Diversity in DNA of Soil Bacteria. Applied and Environmental Microbiology 56: 782-787.

Treangen TJ, Rocha EP (2011) Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genetics 7: e1001284.

Uitdehaag S, Quaak F, Kuiper I (2016) Soil Comparisons Using Small Soil Traces, A Case Report. In: Kars H., van den Eijkel L. (eds) Soil in Criminal and Environmental Forensics. Soil Forensics. Springer, Cham.

Vos M, Velicer GJ (2008) Isolation by distance in the spore-forming soil bacterium Myxococcus xanthus. Current Biology 18: 386-391.

Vos M, Wolf AB, Jennings SJ, Kowalchuk GA (2013) Micro-scale determinants of bacterial diversity in soil. FEMS Microbiology Reviews 37(6): 936-954.

Willis SE (2015) ENFSI Guideline for Evaluative Reporting in Forensic Science - Strengthening the Evaluation of Forensic Results across Europe. European Network of Forensic Science Institutes.

Woods B, Lennard C, Kirkbride KP, Robertson J (2016) Soil examination for a forensic trace evidence laboratory - part 3: a proposed protocol for the effective triage and management of soil examinations. Forensic Science International 262: 46-55.

Yao M, Rui J, Niu H, Heděnec P, Li J, et al. (2017) The differentiation of soil bacterial communities along a precipitation and temperature gradient in the eastern Inner Mongolia steppe. Catena 152: 47-56.

Young JM, Austin JJ, Weyrich LS (2017) Soil DNA metabarcoding and high-throughput sequencing as a forensic tool: considerations, potential limitations and recommendations. FEMS Microbiology Ecology 93(2): fiw207.

Young JM, Rawlence NJ, Weyrich LS, Cooper A (2014) Limitations and recommendations for successful DNA extraction from forensic soil samples: a review. Science & Justice 54(3): 238-244.

Young JM, Weyrich LS, Breen J, Macdonald LM, Cooper A (2015) Predicting the origin of soil evidence: High throughput eukaryote sequencing and MIR spectroscopy applied to a crime scene scenario. Forensic Science International 251: 22-31.

Zhang Y, Cong J, Lu H, Deng Y, Liu X, et al. (2016) Soil bacterial endemism and potential functional redundancy in natural broadleaf forest along a latitudinal gradient. Scientific Reports 6: 28819.

Figures

[Map showing the study sites Idmit, Dalya, Gimzo, Beer-Sheva and Avdat, the Mediterranean Sea, a north arrow and a 0-100 km scale bar.]

Figure 1: Study sites and annual average precipitation. Precipitation map adapted from the Water Data Banks Project, http://www.macam.ac.il. The map does not imply political boundaries.

Figure 2: NMDS ordination (stress=12%) of soil bacterial TRFLP communities grouped according to geographic site (A) and soil type (B).

[Bar chart. Axes: Significance of difference (MRPP-P), log scale from 1.E+00 to 1.E-07; Size of difference (MRPP-A), 0 to 0.5. Categories: same site, same soil type; same site, different soil types; different sites, same soil type; different sites, different soil types.]

Figure 3: Significance (MRPP-P, black) and size (MRPP-A, grey) of the differences between soil bacterial communities. Data for "same site, same soil type" column are mean±SD values from the detailed analysis at the Gimzo site (rendzina, sand and terra rossa were not significantly different from each other); all other data are mean±SD of all soil types from all five sites.

[Four NMDS panels (A-D), each plotting Axis 1 against Axis 2. Hulls by soil type: sand, rendzina, terra rossa and Med. moun.; in panel D: loess (Avdat), skeletal soil (Avdat), loess (Beer-Sheva) and sand (Beer-Sheva). Vector labels appearing in the panels: saturation, OM, phosphorus, sodium, SAR, potassium.]

Figure 4: Bacterial community distribution patterns and the contribution of physico-chemical properties to these distributions at different sites and soil types. Data analysed using non-metric multidimensional scaling (NMDS) ordination with stress<11%. A, Gimzo; B, Idmit; C, Dalya; D, Avdat and Beer-Sheva. All samples from the same soil type are similarly colored and connected by convex hulls. Physico-chemical parameters with Spearman correlation of R2>0.7 to either of the axes are drawn as vectors (red lines) arising from the origin (0,0), with vector length representing the strength of correlation and vector direction representing its axis of increase. At the southern sites (Avdat and Beer-Sheva), soil physico-chemical parameters were not available.

Figure 5: Logarithmic distance-decay relationships (using TRFLP) between sites for all soil types. Each data point represents a pair of soil samples, with values of (A) MRPP-P as significance of difference and (B) MRPP-A as size of difference. Lines, equations and R2 values are for trendlines. Local data (2-100 m) is from the Gimzo site; regional data (60-260 km) is from five sites along Israel. Insets are the regional data points with linear x-axis.
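
As an illustration of a logarithmic distance-decay trendline, the following minimal sketch fits MRPP-A against log10(distance) by ordinary least squares. The data points are the rendzina comparisons listed in Supplementary Table S2; pooling the local and regional pairs into a single fit and the function name `fit_log_trend` are assumptions made for this example, not the fitting procedure behind Figure 5.

```python
import math

# (distance in m, MRPP-A) for rendzina pairs, copied from Supplementary Table S2.
RENDZINA_PAIRS = [
    (2, -0.02116), (25, 0.04244), (75, 0.06584), (100, 0.097264),
    (75000, 0.34541), (86000, 0.237665), (161000, 0.417503),
]

def fit_log_trend(pairs):
    """Least-squares fit of A = slope * log10(distance) + intercept."""
    xs = [math.log10(d) for d, _ in pairs]
    ys = [a for _, a in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

if __name__ == "__main__":
    slope, intercept, r_squared = fit_log_trend(RENDZINA_PAIRS)
    print(f"A = {slope:.4f} * log10(d) + {intercept:.4f}, R2 = {r_squared:.2f}")
```

A positive slope in this fit corresponds to community differences growing with the logarithm of geographic distance.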

[Two heat maps with sample dendrograms (scale: Information Remaining, 0-100%). Samples: R11, R21, R22, R31, R32 (rendzina); S11, S13, S21, S23, S32 (sand); T11, T13, T21, T25, T31 (terra rossa).
Panel A (abundance scale 0-43.5%), rows: Proteobacteria, Actinobacteria, Sphingobacteriia, Acidobacteria, Bacilli, Planctomycetacia, Verrucomicrobia, Unclassified, Chloroflexi, Gemmatimonadetes, Nitrospira, Armatimonadetes, Chlorobia, Chlamydiae.
Panel B (genus abundance scale 0-8%), rows: Microvirga, Unclassified bacteria, Sphingomonas, Solirubrobacter, Rubrobacter, Unclassified Acidobacteria 6, Skermanella, Flavisolibacter, Unclassified Chitinophagaceae, Unclassified RB41, Gemmatimonas, Bryobacter, Blastocatella, Ferruginibacter, Steroidobacter, Gaiella, Conexibacter, Unclassified Actinobacteria, Unclassified Solirubrobacterales, Unclassified Phycisphaerae, Unclassified Acidimicrobiales, Gemmata, Unclassified Rhizobiales, Lamia, Ohtaekwangia, Unclassified Verrucomicrobia, Arthrobacter, Bacillus, Massilia, Sporosarcina, Adhaeribacter, Unclassified Flammeovirgaceae, Unclassified Cytophagaceae.]

Figure 6: Cluster analysis of bacterial communities (using 16S HTS) of rendzina, terra rossa and sand soil types at Gimzo with taxonomic clustering of phyla (A) and genera (B). Only the 33 most abundant taxa are shown in (B), together comprising 53-63% of the reads in each sample. Rows represent taxa; columns show sample clustering; gray scale (A) and color intensity (B) mark the relative abundance of the taxon in a specific sample.

[Histogram with fitted Gaussians. x-axis: Difference between samples (Bray-Curtis), 0 to 0.95; y-axis: Probability, 0.00 to 0.25; series: same soil, different soil.]

Figure 7: Probability distributions and Gaussians for Bray-Curtis distances from the same soil source (1 – 2 m apart; N=143) and different soil source (25 – 260,000 m apart; N=2485) in our data set.

Tables

| Site | Coordinates (N) | Coordinates (E) | Soil type | Rain (mm) | Sat. (%) | pH | EC (dS/m) | Na (meq/L) | Ca (mg/L) | Mg (mg/L) | SAR (ratio) | NO3 (mg/kg) | NH4 (mg/kg) | P (mg/kg) | K (mg/L) | OM (%) | PAR (ratio) | Delta F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avdat | 30°48' | 34°45'39" | L | 100 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Avdat | 30°48'20" | 34°45' | SL | 100 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Beer Sheva | 31°11'24" | 34°36' | L | 200 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Beer Sheva | 31°9'40" | 34°30'30" | S | 200 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Gimzo | 31°57' | 34°56'30" | R | 385 | 103 | 7.3 | 2.06 | 0.9 | 553 | 54 | 0.22 | 5 | 91.9 | 10.5 | 18.9 | 12.0 | 0.12 | 3307 |
| Gimzo | 31°54'57" | 34°47'50" | S | 385 | 50 | 7.2 | 1.36 | 0.7 | 274 | 31 | 0.25 | 12 | 56 | 20.1 | 35.8 | 2.92 | 0.32 | 2719 |
| Gimzo | 31°56'54" | 34°58'26" | T | 385 | 70 | 7.3 | 0.88 | 0.9 | 196 | 12 | 0.38 | 65 | 108 | 10.9 | 5.6 | 4.13 | 0.06 | 3715 |
| Dalya | 32°35'30" | 35°4' | M | 700 | 98 | 6.7 | 1.23 | 1.2 | 273 | 20 | 0.44 | 123 | 14.9 | 123.3 | 14.4 | 7.72 | 0.13 | 3242 |
| Dalya | 32°36'59" | 35°1'30" | R | 700 | 94 | 7 | 1.12 | 0.9 | 279 | 15 | 0.32 | 53 | 28.7 | 52.7 | 5.6 | 6.55 | 0.05 | 3816 |
| Dalya | 32°36'52" | 34°55'47" | S | 700 | 54 | 6.9 | 1.8 | 1.4 | 351 | 33 | 0.45 | 114 | 15.0 | 113.6 | 18.9 | 3.8 | 0.15 | 3170 |
| Dalya | 32°38' | 35°1'57" | T | 700 | 82 | 6.9 | 0.74 | 1.3 | 131 | 14 | 0.65 | 37 | 9.4 | 37.4 | 13.2 | 3.82 | 0.17 | 3089 |
| Idmit | 33°5' | 35°14' | M | 1300 | 72 | 6.5 | 1.08 | 0.9 | 175 | 37 | 0.39 | 82 | 9.8 | 81.8 | 9.8 | 3.72 | 0.1 | 3397 |
| Idmit | 33°5'39" | 35°12'44" | R | 1300 | 83 | 7.1 | 1.22 | 0.5 | 305 | 14 | 0.17 | 133 | 20.8 | 133.5 | 16.9 | 11.2 | 0.15 | 3174 |
| Idmit | 33°3'33" | 35°6'21" | S | 1300 | 29 | 7.4 | 0.77 | 0.7 | 131 | 18 | 0.33 | 17 | 2.5 | 17 | 9.3 | 0.43 | 0.12 | 3307 |
| Idmit | 33°4'37" | 35°12'35" | T | 1300 | 73 | 7.2 | 1.68 | 0.6 | 397 | 35 | 0.18 | 162 | 18.9 | 161.8 | 29.5 | 8.97 | 0.23 | 2932 |

Table 1. Physical and chemical characteristics of the soils in each site. L, loess; S, sand; SL, desert skeletal soil; R, rendzina; T, terra rossa; M, Mediterranean brown soil. Rain, average annual rainfall; Sat., saturation percentage; EC, electrical conductivity (salinity); Na, sodium; Ca, calcium; Mg, magnesium; SAR, sodium adsorption ratio; PAR, phosphorus adsorption ratio; OM, organic matter; NO3, nitrate; NH4, ammonium (including adsorbed N); P, phosphorus; K, potassium.

| Bray-Curtis scores | N | Mean | Standard Deviation | Minimum | Maximum | Gaussian | R2 of Gaussian |
|---|---|---|---|---|---|---|---|
| same soil | 143 | 0.36 | 0.11 | 0.08 | 0.59 | y=0.17*exp(-35.3*(S-0.33)^2) | 0.94 |
| different soil | 2485 | 0.60 | 0.10 | 0.24 | 0.84 | y=0.19*exp(-42.2*(S-0.58)^2) | 0.99 |

Table 2. Validation experiment for creating a likelihood-ratio model. "same soil" samples were taken within 1 – 2 m of each other. S denotes the Bray-Curtis score.
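
The performance of a likelihood-ratio model of this kind is summarised in the text by Cllr. The following is a minimal sketch of that metric, assuming the standard log-likelihood-ratio cost definition of Brümmer and du Preez (2006); the function name and the LR values in the usage lines are hypothetical and are not taken from the study data.

```python
import math

def cllr(lr_same_source, lr_different_source):
    """Log-likelihood-ratio cost for two lists of LR values.

    lr_same_source: LRs computed for pairs known to share a source.
    lr_different_source: LRs computed for pairs known to differ in source.
    """
    # Same-source pairs are penalised when their LR is small.
    penalty_same = sum(math.log2(1 + 1 / lr) for lr in lr_same_source)
    penalty_same /= len(lr_same_source)
    # Different-source pairs are penalised when their LR is large.
    penalty_diff = sum(math.log2(1 + lr) for lr in lr_different_source)
    penalty_diff /= len(lr_different_source)
    return 0.5 * (penalty_same + penalty_diff)

if __name__ == "__main__":
    # A system that always reports LR = 1 carries no information: Cllr = 1.
    print(cllr([1.0, 1.0], [1.0, 1.0]))
    # Hypothetical, well-separated LRs give a cost close to 0.
    print(cllr([50.0, 200.0], [0.01, 0.002]))
```

Under this definition, values below 1 indicate that the LRs carry information about the source, with lower values being better.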

Supplementary material

Supplementary Table S1. Diversity of TRFs in the different soils.

| Site | Soil type | Total TRF# | mean±SD TRF per sample | Shannon diversity |
|---|---|---|---|---|
| Beer Sheva | Loess | 146 | 89±22 | 3.65 |
| Beer Sheva | Sand | 124 | 85±7 | 3.56 |
| Avdat | Loess | 232 | 135±25 | 3.93 |
| Avdat | Skeletal soil | 213 | 146±18 | 4.05 |
| Gimzo | Rendzina | 197 | 128±13 | 3.96 |
| Gimzo | Sand | 200 | 129±10 | 3.90 |
| Gimzo | Terra rossa | 185 | 127±8 | 3.87 |
| Dalya | Med. Moun. | 201 | 135±10 | 4.00 |
| Dalya | Rendzina | 286 | 158±28 | 4.17 |
| Dalya | Sand | 223 | 147±11 | 4.09 |
| Dalya | Terra rossa | 244 | 151±12 | 4.12 |
| Idmit | Med. Moun. | 204 | 146±8 | 4.06 |
| Idmit | Rendzina | 221 | 136±9 | 3.83 |
| Idmit | Sand | 202 | 139±6 | 3.93 |
| Idmit | Terra rossa | 272 | 154±40 | 3.96 |
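
For readers unfamiliar with the index in the last column, the following minimal sketch computes a Shannon diversity value from a TRF profile. It assumes the natural-logarithm form H = -sum(p_i * ln p_i) with p_i taken as relative peak height; the log base and abundance measure used for this table are not stated here, and the function name and example profile are hypothetical.

```python
import math

def shannon_diversity(peak_heights):
    """Shannon index H = -sum(p * ln p) over TRFs with non-zero signal."""
    total = sum(peak_heights)
    proportions = [h / total for h in peak_heights if h > 0]
    return -sum(p * math.log(p) for p in proportions)

if __name__ == "__main__":
    # Hypothetical profile of five TRFs (arbitrary fluorescence units).
    print(shannon_diversity([120, 80, 40, 30, 10]))
    # An even profile of n TRFs gives the maximum possible value, ln(n).
    print(shannon_diversity([1] * 100), math.log(100))
```

Because H cannot exceed the logarithm of the number of TRFs, more TRFs per sample allow, but do not guarantee, a higher index.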

Supplementary Table S2. Bacterial community comparison between sites (controlling for soil type) and between soil types (controlling for site).

| Soil type | Sites compared | distance (m) | MRPP A | MRPP p |
|---|---|---|---|---|
| Loess soil | Avdat vs. Beer Sheva | 60000 | 0.276293 | 3.00E-08 |
| Med. Mountain soil | Dalya vs. Idmit | 75000 | 0.340604 | 3.00E-08 |
| Rendzina | Gimzo vs. Dalya | 86000 | 0.237665 | 1.00E-08 |
| Rendzina | Gimzo vs. Idmit | 161000 | 0.417503 | 2.00E-08 |
| Rendzina | Dalya vs. Idmit | 75000 | 0.34541 | 1.00E-08 |
| Rendzina | Gimzo | 100 | 0.097264 | 6.69E-05 |
| Rendzina | Gimzo | 75 | 0.06584 | 0.00633 |
| Rendzina | Gimzo | 25 | 0.04244 | 0.002884 |
| Rendzina | Gimzo | 2 | -0.02116 | 0.783999 |
| Sand | Beer Sheva vs. Gimzo | 96000 | 0.383587 | 3.00E-08 |
| Sand | Beer Sheva vs. Dalya | 182000 | 0.377793 | 3.00E-08 |
| Sand | Beer Sheva vs. Idmit | 257000 | 0.38241 | 1.00E-08 |
| Sand | Gimzo vs. Dalya | 86000 | 0.205887 | 2.30E-07 |
| Sand | Gimzo vs. Idmit | 161000 | 0.315714 | 7.00E-08 |
| Sand | Dalya vs. Idmit | 75000 | 0.345827 | 1.00E-08 |
| Sand | Gimzo | 100 | 0.206896 | 0.000275 |
| Sand | Gimzo | 75 | 0.213634 | 0.001725 |
| Sand | Gimzo | 25 | 0.127417 | 0.000235 |
| Sand | Gimzo | 2 | 0.006063 | 0.354453 |
| Terra Rossa | Gimzo vs. Dalya | 86000 | 0.21788 | 2.03E-06 |
| Terra Rossa | Gimzo vs. Idmit | 161000 | 0.248725 | 1.90E-07 |
| Terra Rossa | Dalya vs. Idmit | 75000 | 0.349247 | 1.00E-08 |
| Terra Rossa | Gimzo | 100 | 0.167051 | 0.000188 |
| Terra Rossa | Gimzo | 75 | 0.116023 | 0.002782 |
| Terra Rossa | Gimzo | 25 | 0.086733 | 0.000932 |
| Terra Rossa | Gimzo | 2 | -0.00036 | 0.429812 |

| Research site | Soil types compared | Distance | MRPP A | MRPP p |
|---|---|---|---|---|
| Avdat | Loess soil vs. Desert skeletal soil | <3000 | 0.399508 | 6E-08 |
| Beer Sheva | Loess soil vs. Sand | <3000 | 0.254453 | 4E-08 |
| Dalya | Med. Mountain soil vs. Rendzina | <3000 | 0.22765 | 2E-08 |
| Idmit | Med. Mountain soil vs. Rendzina | <3000 | 0.417582 | 5E-08 |
| Dalya | Med. Mountain soil vs. Sand | <3000 | 0.252576 | 1E-08 |
| Idmit | Med. Mountain soil vs. Sand | <3000 | 0.451838 | 1E-08 |
| Dalya | Med. Mountain soil vs. Terra Rossa | <3000 | 0.12512 | 4.68E-05 |
| Idmit | Med. Mountain soil vs. Terra Rossa | <3000 | 0.374726 | 3E-08 |
| Dalya | Rendzina vs. Sand | <3000 | 0.233057 | 1E-08 |
| Gimzo | Rendzina vs. Sand | <3000 | 0.208624 | 8.03E-06 |
| Idmit | Rendzina vs. Sand | <3000 | 0.37069 | 2E-08 |
| Dalya | Rendzina vs. Terra Rossa | <3000 | 0.13616 | 1.67E-06 |
| Gimzo | Rendzina vs. Terra Rossa | <3000 | 0.111828 | 0.002859 |
| Idmit | Rendzina vs. Terra Rossa | <3000 | 0.200376 | 1.05E-06 |
| Dalya | Sand vs. Terra Rossa | <3000 | 0.164875 | 1.41E-06 |
| Idmit | Sand vs. Terra Rossa | <3000 | 0.223881 | 3.6E-07 |

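Supplementary Table S3. Bacterial MiSeq sequencing parameters at the Gimzo site. Operational taxonomic units (OTUs) were created using 98% similarity.

| Soil type | sample | # reads | coverage | # OTUs |
|---|---|---|---|---|
| Rendzina | R11 | 60931 | 0.948942 | 7936 |
| Rendzina | R21 | 96993 | 0.945295 | 13094 |
| Rendzina | R22 | 54441 | 0.914256 | 11677 |
| Rendzina | R31 | 55341 | 0.920999 | 10132 |
| Rendzina | R32 | 118707 | 0.950053 | 13984 |
| Sand | S11 | 75387 | 0.961253 | 6899 |
| Sand | S13 | 81702 | 0.957994 | 8431 |
| Sand | S21 | 105790 | 0.960535 | 9928 |
| Sand | S23 | 131654 | 0.965744 | 11151 |
| Sand | S32 | 116782 | 0.962811 | 10707 |
| Terra rossa | T11 | 99440 | 0.953037 | 11879 |
| Terra rossa | T13 | 109755 | 0.954881 | 12334 |
| Terra rossa | T21 | 105564 | 0.953412 | 12168 |
| Terra rossa | T25 | 114024 | 0.948107 | 13509 |
| Terra rossa | T31 | 103341 | 0.953794 | 12002 |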

Supplementary figure S4. Rarefaction curves of Gimzo samples at a cutoff level of 98%, showing the number of observed OTUs (sharing at least 98% identity) as a function of the number of sequences.

[Line plot. x-axis: No. of reads, 0 to 140000; y-axis: No. of OTUs (2% difference), 0 to 14000; series: R11, R12, R21, R22, R31, R32, S11, S13, S21, S23, S32, S34, T11, T13, T21, T25, T31, T33.]