Microbiome-based body fluid identification of samples exposed to indoor conditions




Accepted Manuscript

Title: Microbiome-based body fluid identification of samples exposed to indoor conditions

Authors: Akos Dobay, Cordula Haas, Geoffrey Fucile, Nora Downey, Hilary G Morrison, Adelgunde Kratzer, Natasha Arora

PII: S1872-4973(18)30491-5

DOI: https://doi.org/10.1016/j.fsigen.2019.02.010

Reference: FSIGEN 2047

To appear in: Forensic Science International: Genetics

Received date: 5 September 2018

Revised date: 14 January 2019

Accepted date: 10 February 2019

Please cite this article as: Dobay A, Haas C, Fucile G, Downey N, Morrison HG, Kratzer A, Arora N, Microbiome-based body fluid identification of samples exposed to indoor conditions, Forensic Science International: Genetics (2019), https://doi.org/10.1016/j.fsigen.2019.02.010 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Microbiome-based body fluid identification of samples exposed to indoor conditions


Authors: Akos Dobay1, Cordula Haas1, Geoffrey Fucile2, Nora Downey3, Hilary G Morrison3, Adelgunde Kratzer1, Natasha Arora1

Affiliations:

1 Zurich Institute of Forensic Medicine, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland.

2 SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland.

3 Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA 02543, USA.


Corresponding authors: Akos Dobay, Natasha Arora


Highlights 

Evaluation of forensically-relevant samples exposed outside their natural human body habitats

Metagenomic 16S rRNA gene sequencing data suitable for forensic body fluid/tissue identification

Exposed samples harbor microbial signatures characteristic for their bodily origin


Abstract

In the forensic reconstruction of crime scene activities, the identification of biological traces and their bodily origin is valuable evidence that can be presented in court. While several presumptive and confirmatory tests are currently available, the limitations in specificity and sensitivity have instigated a search for alternative methods. Bacterial markers have been proposed as a novel approach for forensic body fluid/tissue identification. Bacteria are not only ubiquitous throughout the human body, but also, as shown by recent microbiome sequencing studies of the 16S rRNA gene, bacterial community structures are distinct across body sites. Traces and stains at crime scenes are, however, often exposed to the environment outside the human body for variable periods of time before laboratory processing. Thus, it is not clear whether exposed samples continue to harbor microbial signatures characteristic of their body site of origin. In this proof-of-concept study we collected samples from six different body sites: saliva, skin, peripheral blood, vaginal fluid, menstrual blood and semen. We exposed a subset of these samples to indoor conditions for 30 days while the remaining samples were processed directly after extraction. Our analyses of 16S rRNA gene sequence data for a total of 46 control and exposed samples show that both types of samples group by body site, although a few outliers are observed. Based on our results, vaginal and menstrual samples share their microbial signatures, and cannot be distinguished using bacterial markers. Overall, our findings indicate that bacterial markers are a promising avenue for forensic body fluid/tissue identification.


Keywords: Microbiome, Body fluid identification, 16S rRNA sequencing, Alpha diversity, Beta diversity, Principal component/coordinates analysis


Introduction

Determining the presence of biological traces at a crime scene and identifying their body site


of origin are important aspects of the reconstruction of crime scene events and for the

selection of samples for further downstream analyses, including DNA testing (Virkler and Lednev 2009). Establishing whether a stain corresponds to saliva, skin, vaginal fluid, semen,


or menstrual blood, among others, is valuable information that may aid in clarifying the


activities that took place, for instance, in the case of sexual assault. Forensic body fluid/tissue


identification is conducted using presumptive and/or confirmatory tests, which are based on


chemical, enzymatic, immunological, spectroscopic or microscopic methods. Many of these methods present limitations in sensitivity, specificity or both and have thus spurred forensic


scientists to explore novel approaches based on human tissue-specific markers such as mRNA, microRNA (miRNA), or methylation markers (Virkler and Lednev 2009; Kayser and


de Knijff 2011; Sijen 2015).

In recent years, bacterial markers have been posited as a potential new method for body site


identification both due to their ubiquitous presence at human body sites as well as their patterns of distribution (Sijen 2015). Bacteria are found throughout the human body and are


estimated to outnumber human cells overall, although there is variation across different body areas and individuals (Sender et al. 2016). These numbers nonetheless suggest that bacterial

DNA from body sites may still be detectable when human DNA or RNA are in low copy number or degraded. Our understanding and characterization of the human microbiome has improved over the last decade, which has witnessed an exponential increase in marker gene and shotgun metagenomic sequencing studies. Particularly since the start of the Human Microbiome Project, a wealth of data on microbial community composition has been


generated in an effort to understand what constitutes a healthy biome and to generate

reference databases. Typically, marker gene data comprise sequences from variable stretches


of the gene encoding 16S ribosomal RNA (rRNA), one of the structural RNAs integral to

prokaryotic ribosomes, present across bacteria and archaea. These marker gene sequencing studies have consistently shown that microbial communities in the human body are specific to


the sites at which they are found. The benchmark study by Costello et al. (2009) examined the


V2 region of the 16S rRNA gene of bacteria in the human gut, oral cavity, nostril, hair and


various skin sites. Using distance-based ordination methods, the authors showed that the


samples grouped first according to body site, and then at a given body site, they grouped


according to the individual. Since then, numerous other studies conducted on these as well as other body sites in different subjects have confirmed this distinctive pattern (Caporaso et al. 2011; 2013; Human Microbiome Project Consortium 2012; Lloyd-Price et al. 2017). Site-specificity of microbes has even been found at fine spatial scales, with for instance an


ecological gradient in the human mouth that may be associated with salivary flow (Proctor et al. 2018). Interestingly, the distinctiveness and differentiation of microbial communities


across body sites appear to develop early on in an infant’s life (Costello et al. 2013; Chu et al. 2017).

This association between bacteria and body sites has been exploited in forensic studies of body fluid/tissue identification, although most of these have focused on the presence/absence patterns of a targeted and limited set of markers per site (Nakanishi et al. 2009; Benschop et al. 2012; Giampaoli et al. 2012; Choi et al. 2014a). For instance, Giampaoli et al. (2012) investigated the distinction of a total of 47 vaginal, oral and fecal samples using six microbial


markers: Lactobacillus crispatus, Lactobacillus gasseri, Streptococcus salivarius,

Streptococcus mutans, Staphylococcus aureus and Enterococcus sp. In another study, Choi et


al. (2014b) extended a methylation-based approach to incorporate PCRs to amplify the 16S rRNA gene regions of two saliva-specific bacteria (Streptococcus salivarius, Veillonella atypica), and two vaginal fluid and menstrual blood-specific bacteria (Lactobacillus


crispatus, Lactobacillus gasseri). While these studies showed the potential of microbial


detection, the selected markers were limited and tested on small datasets, occasionally


producing false positives or false negatives. Relying on very few or a limited set of bacterial


markers alone may be problematic as there is temporal variation in the specific operational


taxonomic units (OTUs), as well as in the abundance patterns, found at a site (Caporaso et al. 2011; Flores et al. 2014). In their study of the temporal changes for three body sites (gut, mouth and skin) in two individuals, Caporaso et al. (2011) found few taxa to be consistently


present across all sampling events for those three sites, although the distinction across body


sites remained. Hence, non-targeted marker gene sequencing approaches may enable bypassing the challenge posed by transient taxa. Such an approach was recently applied by


Hanssen et al. (2017) in their study detecting saliva deposited on skin.


While the dynamic changes of the microbiome have been explored in studies examining intra-individual stability (Costello et al. 2009; Caporaso et al. 2011; Flores et al. 2014), less attention has been given to stability over time for samples outside the human body. This is an important concern for the forensic prediction of the body source from which a stain or fluid originated, as samples from a crime scene may be exposed outside their natural human body


habitat prior to their collection, or during storage prior to laboratory analysis. For the purpose

of individual identification, Fierer and colleagues (2010) found that the 16S rRNA gene-based


microbial community composition on keyboard keys matched that of the skin from the user when keyboards were swabbed more than 30 minutes after use and stored at -80 °C in the freezer before laboratory processing. Interestingly, keyboard samples and skin samples still


grouped according to the user even when the skin samples were left exposed in the laboratory


for either 3 or 14 days before freezer storage and processing (Fierer et al. 2010). However, in


another study linking household surfaces to the skin of the occupants of the house, Wilkins et al. (2017) could match surfaces to occupants with 67% accuracy only when both were sampled in the same season. Sampling in different seasons led to a sharp decrease in accuracy. As these studies were addressing individual identification, it is as yet unclear to what extent fluids or stains that have been outside the human body for a given


length of time will continue to harbor the characteristic microbiota to identify a body site, and


not the individual, especially when there is a time gap between a crime and the moment when the crime scene is effectively discovered. Our goal was to assess whether samples exposed at


room temperature for a given length of time continue to show the distinctive microbial communities that would be expected. We therefore collected samples from six different body

sites/fluids (skin, semen, saliva, peripheral blood, menstrual blood, vaginal fluid) and compared the taxonomic composition of samples processed right after collection with those that were processed after exposure in a room, choosing an exposure time of 30 days for this study.


Materials and Methods

Sample collection and bacterial DNA extraction


A total of 70 samples from six body fluids or tissues were collected directly using Sterile Catch-All™ Sample Collection Swabs (Epicentre Biotechnologies, Inc., Madison WI, USA) or transferred to these after collection. These fluids/tissues were saliva (Sa), vaginal fluid (Vg),


skin (Sk) from the palm of the right hand, menstrual blood (Mb), peripheral blood (Bl), and


semen (Se). Samples were obtained from a total of 19 subjects, all of whom were above 18


years of age, of European (n = 18) and Asian descent (n = 1), and reported no history of


antibiotic use for a month prior to the study. Each subject received information about the study’s purpose and consented to participation. In order to examine variation across body


sites and assess the microbial signature in control and exposed samples, for each body site we collected two samples from 4-5 different individuals, extracting the DNA at different time


points: i) for one sample, extraction was carried out right after collection (t1); ii) for the other sample, extraction followed exposure for 30 days at indoor environmental conditions (t3).


Exposure was carried out by placing the swabs on a test tube rack, and on the top of a shelf in a laboratory. The laboratory was open, used on a daily basis and was at room temperature,


with the window open at times. Additionally, for one individual per body site, we incorporated two measures capturing intra-individual variation. First, to examine sampling

baseline variability between two consecutive swabbing actions, we included biological replicates (labelled “a” and “b”) from the same individual, extracting the DNA right after collection. Second, from one individual per body site we obtained another swab (t2) a few days after the first (t1), with a maximum time difference of 7 days. Four negative controls (collection swabs without any body fluids) were also incorporated: two of these were extracted right after collection and two were exposed in the laboratory. The details on the number of samples per regime are provided in Supplementary Table S1a.


DNA extraction was carried out using the MoBio BiOstic® Bacteremia DNA Isolation Kit

(Mo Bio Laboratories, Inc., Carlsbad CA, USA) following the manufacturer’s guidelines with modifications (detailed in the supplementary methods). Quantification of total DNA was



performed with the Quantus™ Fluorometer (Promega, Inc., Madison WI, USA).


16S ribosomal RNA gene amplification and sequencing


Amplicon libraries for the V4-V5 region of the bacterial 16S rRNA gene (positions cognate to 518-926 in Escherichia coli, ~408 nt) were constructed using the primers, reaction components, and cycling conditions described in the supplementary methods and in Huse et al. (2014). For a subset of samples which displayed strong amplification in the 18S rRNA


region relative to the 16S rRNA region, samples were pooled and size-selected at 425-625 bp


in order to isolate the 16S rRNA region and prevent sequencing of the 18S rRNA product. Size selection was conducted using a Pippin Prep (Sage Biosciences, Inc., Beverly MA, USA)


and 1.5% gel cartridge, and the selected pool was cleaned and concentrated with a Qiagen MinElute column (Qiagen, GmbH., Hilden, Germany). Sequencing was carried out on an

Illumina MiSeq platform (Illumina, Inc., Hayward CA, USA), generating paired-end reads of 250 nt.

Data processing

The raw demultiplexed Illumina reads were merged, quality filtered and clustered using


USEARCH version 9.2.64 (Edgar 2010). Quality control of merged reads was performed using the parameters recommended by Edgar and Flyvbjerg 2015 (default USEARCH


filtering parameter: maximum expected error threshold = 1). To obtain OTU representative sequences, chimera filtering and clustering were performed with UPARSE at 97% sequence similarity, with singleton OTUs discarded. Following these steps, we obtained a total of


2,666,056 OTU sequences, with an average of 60,592 sequences per sample.
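The expected-error criterion used for read filtering can be illustrated with a minimal sketch. It assumes the standard Phred relationship p = 10^(-Q/10) and a threshold of 1, as stated above; the function names and the toy quality scores are hypothetical and are not taken from USEARCH itself.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, using p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_ee=1.0):
    """Keep a read only if its expected number of errors is at most max_ee."""
    return expected_errors(phred_scores) <= max_ee


# Hypothetical 250-nt reads with uniform quality, for illustration only.
high_quality = [40] * 250  # error probability 1e-4 per base
low_quality = [20] * 250   # error probability 1e-2 per base

keep_high = passes_filter(high_quality)
keep_low = passes_filter(low_quality)
```

Under these assumptions, a read is retained or discarded based on the sum of its per-base error probabilities rather than on its mean quality score.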


Taxonomic assignment for the OTU representative sequences was done using the mothur-compatible reference database adapted from the Silva Reference SSU NR 128 database


(https://www.mothur.org/wiki/Silva_reference_files), and applying the default naïve Bayesian


classifier on mothur (Wang et al. 2007; Schloss et al. 2009). We also compared the output with that obtained using two other methods: i) the Ribosomal Database Project (RDP) classifier training database and the default naïve Bayesian classifier (Wang et al. 2007); ii)


using the entire Silva Reference Database 128 NR 99 and a custom hidden Markov model


(HMM) method. Three OTUs that were unclassified with the naïve Bayesian classifier on mothur (Otu160, Otu255, and Otu345) were removed from the alignment. Among these 3


OTUs i) Otu160 was classified as Proteobacteria by the HMM algorithm and the RDP classifier; ii) Otu255 was classified as Proteobacteria using the HMM algorithm, and

Firmicutes with the RDP classifier; and iii) Otu345 was classified as Firmicutes by the HMM algorithm and the RDP classifier. A BLAST (Altschul et al. 1990) search for all three OTUs was conducted. Otu160, Otu255 and Otu345 shared over 95% identities with uncultured bacteria. These three OTUs were therefore not included in further analyses. Further BLAST


searches were also conducted for other OTUs.

Multiple Sequence Alignment


Subsequently, we did a de novo multiple sequence alignment using Infernal version 1.1.2

(Nawrocki et al. 2009). Alignment filtering was carried out in QIIME version 1.9 (Caporaso et al. 2010) using an entropy setting of e = 0.10. We used RAxML version 7.2.8 (Stamatakis



2006) to generate the maximum likelihood phylogenetic tree.


Statistical analyses


We used the OTU table generated by USEARCH, the taxonomic table generated by mothur,


and the RAxML phylogenetic tree as input for subsequent analyses in R version 3.4.4 (R Core Team 2018) using phyloseq (McMurdie and Holmes 2013) and vegan (Oksanen et al. 2016). Both intra-sample (α-diversity) and inter-sample variation (β-diversity)


were computed. β-diversity was visualized using adaptive generalized principal components


analysis (agPCA) (Fukuyama 2017), and principal coordinates analyses (PCoA) with weighted UniFrac distances and Bray-Curtis dissimilarity measures (Lozupone and Knight


2005; Lozupone et al. 2007; Bray and Curtis 1957). For both agPCA and the PCoA of weighted UniFrac distances, we applied a logarithmic transformation to the OTU relative

abundance, after incorporating a small pseudocount of 0.00001. For the PCoA of Bray-Curtis dissimilarity measures, we applied the logarithmic transformation of the OTU relative abundance after adding a pseudocount of 1.

We carried out a permutational multivariate analysis of variance (PERMANOVA) and tested homogeneity of dispersion among body sites with the adonis and the betadisper functions


available in the vegan package in R. Figures were generated with the help of the R package


ggplot2 (Wilkinson 2011).
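To make the transformation and the PERMANOVA test more concrete, the following is a minimal pure-Python sketch, not the vegan implementation used in the study. It assumes a one-factor design, a log(x + 1) transformation of relative abundances before Bray-Curtis, and unrestricted label permutations; all function names and the toy counts are hypothetical, and the dispersion test (betadisper) is not reproduced.

```python
import math
import random


def log_relative_abundance(counts, pseudocount=1.0):
    """Convert raw counts to relative abundances, then apply log(x + pseudocount)."""
    total = sum(counts)
    return [math.log(c / total + pseudocount) for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pseudo_f(dist, groups):
    """Return (pseudo-F, R2) for a one-factor design from a distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        idx = [i for i in range(n) if groups[i] == label]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    f_stat = (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))
    return f_stat, ss_among / ss_total


def permanova(dist, groups, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles with pseudo-F >= observed."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Toy OTU counts (rows = samples); values are invented for illustration only.
counts = [[80, 15, 5], [75, 20, 5], [85, 10, 5],
          [5, 10, 85], [10, 5, 85], [5, 15, 80]]
body_site = ["Sa", "Sa", "Sa", "Vg", "Vg", "Vg"]

profiles = [log_relative_abundance(row) for row in counts]
dist = [[bray_curtis(p, q) for q in profiles] for p in profiles]
f_obs, r2, p_value = permanova(dist, body_site)
```

With only six toy samples the number of distinct label arrangements is small, so the attainable p-value is bounded well above the values reported for real datasets; the sketch is meant to show the mechanics rather than to reproduce any result.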

Results


We collected a total of 70 samples from 19 individuals, and from multiple body sites/fluids:


saliva (n = 13), vaginal fluid (n = 12), menstrual blood (n = 10), peripheral blood (n = 11),


skin (n = 12), and semen (n = 12). In addition, we also incorporated 4 negative controls. As a


test to examine whether the bacterial communities of exposed samples were different, we separated the samples into two sets: for one set, DNA extraction was carried out immediately


after collection (control samples), while for the other set, DNA extraction was carried out after 30 days following exposure to indoor conditions (exposed samples). Following PCR


amplification of the 16S rRNA gene V4-V5 region, sequencing and quality control, we obtained data from 47 samples (Supplementary Table S1b). The negative controls did not


yield amplicons for further sequencing. PCR amplification and sequencing success was 100% for vaginal and menstrual blood samples, and 92% for saliva samples. For skin, 67% of


samples yielded V4-V5 PCR products, and success dropped to much lower levels for semen (42%) and peripheral blood (9%, only 1 out of 11 samples). These two fluids, semen and

blood, have been considered sterile body fluids. Overall, samples displayed a large variation in read depth, ranging from 9,543 to 97,386 reads (Supplementary Figure S1). Lower read depths were generally associated with samples that initially amplified more strongly at the 18S rRNA region, found in eukaryotic cells, and which were then pooled and size-selected to maximize the 16S rRNA sequence reads (data not shown). Menstrual blood samples were


associated with higher numbers of reads compared to skin, saliva and semen (ANOVA, p <

0.001 and Supplementary Figure S1). The comparison between menstrual blood and vaginal


fluid was not significant. As we only had one peripheral blood sample, we excluded it from further analyses.


Overview of taxonomic diversity


Across all 46 samples, we found 10 different phyla (Actinobacteria, Bacteroidetes,


Cyanobacteria, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetae, Synergistetes,


Tenericutes, and Verrucomicrobia), 73 families, and a total of 150 different genera,


corresponding to 353 OTUs. At each body site we observed between 90 and 143 genera, with the highest number of genera in skin, followed by saliva, menstrual blood, vaginal fluid and semen. However, a considerable proportion of this diversity is present at very low


frequencies: setting a threshold of a minimum of 1% relative abundance resulted in an


average of 20-21 genera for skin, semen and saliva, 3 for vaginal fluid and 11 for menstrual blood, when considering control and exposed samples together (for the number of genera


above 1% relative abundance for control and exposed samples at each body site, see Table 1).
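The 1% relative-abundance threshold can be sketched as follows. This is an illustrative example under the assumption that the threshold is applied per sample to genus-level read counts; the function name and the counts are hypothetical and do not come from the study data.

```python
def genera_above_threshold(genus_counts, threshold=0.01):
    """Return the genera whose relative abundance is at least `threshold`."""
    total = sum(genus_counts.values())
    return sorted(g for g, c in genus_counts.items() if c / total >= threshold)


# Invented read counts for a single Lactobacillus-dominated sample.
sample = {
    "Lactobacillus": 9000,
    "Prevotella": 600,
    "Streptococcus": 350,
    "Veillonella": 45,
    "Bacteroides": 5,
}

dominant = genera_above_threshold(sample)
```

In this toy sample five genera are detected, but only three reach the 1% cut-off, which mirrors how a long tail of rare genera can inflate raw genus counts.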


We further investigated within-sample diversity patterns at each body site by computing α-diversity estimates (Supplementary Figures S2 and S3). Observed species and phylogenetic distance (PD) revealed higher taxon richness and evenness in skin and saliva compared to vaginal fluid and menstrual blood (paired two-sample Welch’s t-test with 100 permutations, p < 0.01). The comparisons with semen did not yield significant results, probably due to the


small sample size for semen. Visual inspection of the α-diversity boxplots for both rarefied

and unrarefied data (Supplementary Figure S3) appears to indicate that skin control samples


have higher diversity compared to exposed samples while in menstrual blood this trend is

reversed. However, larger sample sizes and tests for statistical significance are warranted to


check whether this pattern holds true.
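As a rough illustration of richness and evenness, the sketch below computes observed richness together with the Shannon index and Pielou's evenness. Note that Shannon and Pielou are swapped in here as simple stand-ins; the study itself reports observed species and phylogenetic distance (PD), which requires the phylogenetic tree and is not reproduced. Function names and counts are hypothetical.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H / ln(S); defined as 0.0 when fewer than two taxa are present."""
    s = observed_richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Invented OTU counts: an even community and one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1, 0]

even_j = pielou_evenness(even_sample)
skewed_j = pielou_evenness(skewed_sample)
```

Both toy samples have the same observed richness, yet their evenness differs sharply, which is the kind of contrast described above between diverse sites such as skin and Lactobacillus-dominated sites.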


Samples cluster according to body site


Differences in microbial composition across body sites were visualized through


phylogenetically-informed ordination with adaptive generalized principal component analysis


(agPCA) (Fukuyama et al. 2017). The agPCA for the 30 most abundant genera ranked across all samples (Figure 1), and that for all genera (Supplementary Figure S4) showed similar sample clustering; therefore we focused on the agPCA for the top 30 genera. As illustrated in


Figure 1A, the agPCA for the sample points reveals clustering according to body site,


although vaginal and menstrual samples show extensive overlap. Three exposed samples are notable as outliers: one exposed saliva sample (8Sat3) which clusters with vaginal fluid and


menstrual blood; and one exposed menstrual blood sample (16Mbt3) as well as an exposed skin sample (16Skt3), both of which cluster together with semen. In order to assess the effect of

the taxa on the clustering, we also investigated the taxon loadings on the principal axes (Figure 1B). These projections show the positive scores on the first axis of a subset of the Firmicutes, Lactobacillus (center right from the origin), associated with menstrual/vaginal samples; while for saliva we observe an association with Fusobacteria such as Fusobacterium and Leptotrichia, as well as Actinomyces, Prevotella, Alloprevotella, and Veillonella (lower left from the origin). Along the second axis (upper left from the origin), we observe diverse genera with positive scores, including Acinetobacter, Escherichia-Shigella,


Enterobacteriaceae, Pseudomonas, Bifidobacteria and Blautia, characterising skin samples. We also used principal coordinates analysis (PCoA) to explore differences in microbial composition, utilizing two measures: Bray-Curtis dissimilarity measure (Bray and Curtis


1957) and weighted UniFrac distances (Lozupone and Knight 2005; Lozupone et al. 2007)


(Supplementary Figure S5 and S6). The latter, like agPCA, takes into account the


phylogenetic relationship among taxa, whereas Bray-Curtis dissimilarity measures do not


utilize phylogenetic tree branch lengths. While overall samples grouped according to body


site, as also observed with agPCA, the PCoA plot of Bray-Curtis dissimilarity measures shows vaginal and menstrual samples forming part of two groups that contain both kinds of samples. In each group we observed enrichment for one of two lactobacilli OTUs, either


OTU1 in the case of individuals 14 and 3, or OTU2 for all other individuals, indicative of


individual-specificity (Supplementary Figure S7). For samples in the first group, OTU1 has a relative abundance between 16-98% (or 54-98% when excluding the outlier sample 16Mbt3),


while for samples in the second group OTU2 has a relative abundance between 42-75%. The BLAST searches on the NCBI nucleotide collections gave a match for OTU1 with either

Lactobacillus acidophilus or Lactobacillus crispatus, and for OTU2 a match with Lactobacillus iners. For both the Bray-Curtis dissimilarity measures and the weighted UniFrac distances, the permutational multivariate analysis of variance (PERMANOVA) revealed significant effects of body site on the clustering of samples (unfiltered dataset with all genera: weighted UniFrac: PERMANOVA R² = 0.59, p = 0.001; Bray-Curtis: PERMANOVA R² =


0.46, p = 0.001). To examine more closely the effect of body site, we also carried out

PERMANOVA analyses on pairs of body sites. For most pairs, p values were below 0.001,


except in the case of vaginal fluid and menstrual blood, for which p values were higher (p = 0.021, weighted UniFrac) or non-significant (p = 0.061, Bray-Curtis), indicating the

difficulty in distinguishing samples from these two human habitats (Supplementary Table


S2). This result is also visible in the PCoA plots where we observe the overlap in location for menstrual blood and vaginal fluid samples.

Differences across samples per body site


We examined the similarities and differences across samples for the dominant genera, focusing on the 20 with the highest mean relative abundance across all samples. These top 20 genera constitute at least 50% of the microbial reads at a body site, increasing to a minimum


of 80% for vaginal fluid and menstrual blood (Figure 2). Overall, samples from the same


body site show similar signatures for these dominant genera. For example, skin shows the highest taxonomic diversity, and is characterised by Propionibacterium (mean relative


abundance across all samples 15%), Staphylococcus (8%), Streptococcus (7%), Bacteroides (5%), Blautia (5%), Bifidobacterium (4%), and Lachnospiraceae (4%). Saliva samples are

characterised by Prevotella (16%), Streptococcus (15%), Veillonella (7%), Haemophilus (6%) and Fusobacterium (4%). The microbial signature in semen comprises Bacteroides (19%), Lachnospiraceae (15%), Streptococcus (11%), Blautia (9%) and Bifidobacterium (7%). Vaginal fluid and menstrual blood have the most skewed taxonomic distribution, as Lactobacillus makes up on average 75% and 86% of the bacterial reads at each of these,


respectively. This genus is dominant whether the samples are control or have been exposed for 30 days.


Nonetheless, at each body site, we also observe considerable variation in the relative

abundance of these genera (for a summary of the top 10 genera for control and exposed samples per body site, see Table 1). The changes are particularly strong for some of the


exposed samples, notably the three outliers in Figure 1A: 8Sat3, 16Mbt3, and 16Skt3. In


saliva, 8Sat3 displays a skewed abundance of Alcaligenaceae (80% relative abundance) and a


drastic decrease in Firmicutes, especially Prevotella (0.06%) and Streptococcus (0.04%)


compared to the other saliva samples. In most of these, Prevotella and Streptococcus are


among the top 4 most abundant taxa, with relative frequencies between 6-26% and 4-30% respectively. Another unusual characteristic of the taxonomic composition of 8Sat3 is that it is the only saliva sample to feature Achromobacter (4%), Stenotrophomonas (3%),


Lactobacillus (0.24%), and Bacteroides (0.17%) among its top 10 genera. Although


Lactobacillus and Bacteroides are also found in other saliva samples, including control samples, at higher relative abundances, they are not present among the top 10. The exposed


saliva sample 9Sat3 also shows an unusual abundance of Alcaligenaceae (42%), which was not the case for the other two exposed saliva samples (1Sat3 and 6Sat3). The exposed skin

(16Skt3) and exposed menstrual sample (16Mbt3) clustering with semen in Figure 1A both show enrichment of four Proteobacteria genera: Enterobacteriaceae (26% in 16Skt3, 12% in 16Mbt3), Stenotrophomonas (19% in 16Skt3, 11% in 16Mbt3), Pseudomonadaceae (19% in 16Skt3, 11% in 16Mbt3) and Acinetobacter (16% in 16Skt3, 11% in 16Mbt3), which may be driving the similarities between these two samples. Compared to other skin samples, 16Skt3


shows a decrease in the abundance of Firmicutes. For 16Mbt3, there is a reduction in the

characteristic genus Lactobacillus (31%) that is otherwise found at a much higher relative


abundance in menstrual samples. Interestingly, these two samples originate from the same

individual (16). More broadly, beyond these specific three samples, exposed skin and semen samples reveal a decrease in the relative frequency of some characteristic taxa, for example


Propionibacterium and Staphylococcus in skin, Bacteroides and Blautia in semen. However,


given the limited sample size, we could not identify consistent patterns in the variation


between control and exposed samples at each body site (Supplementary Figures S8 to S12).


We also checked, for one individual per body site, the baseline differences between two


samples obtained in two consecutive swabbing actions only a few minutes apart (labelled “a” and “b”). Semen was excluded for this test. We examined the change in relative abundance for the genera that were among the ten most abundant taxa in each sample and shared by both


samples. Our results revealed varying degrees of differences depending on the body site


(Supplementary Table S3). While the comparisons are drawn on only one pair of samples per site, thus precluding firm conclusions, the data suggest that intrapersonal diversity for


consecutive sampling actions was highest for skin and lowest for menstrual blood and vaginal fluid (for semen a comparison was not possible). The variation between two consecutive 17

swabbing actions minutes apart (“a” and “b”) was not necessarily smaller than that between two time points days apart (t1 and t2).

Discussion

In this study we analyzed the 16S rRNA V4-V5 regions of the bacterial communities from 46 different samples encompassing five different human body sites/fluids. We investigated the taxonomic composition and clustering patterns, focusing on the comparison between samples processed shortly after collection and samples exposed for 30 days. Our analyses showed that bacterial communities group according to the body site they originated from: even exposed samples continue to harbor a microbial signature that can be used to identify this bodily origin, although outliers are observed.

We successfully obtained read data for 47 of the 70 body site samples, including one of the eleven peripheral blood samples. The amplification of 16S rRNA gene regions was most successful with menstrual blood, vaginal fluid, and saliva samples, and also possible for skin and semen samples. In contrast, peripheral blood was particularly challenging, most probably due to the low concentration of microorganisms present. Further work will be needed to improve the amplification and sequencing from these body sites, by increasing sequencing depth, modifying laboratory protocols, or both. Our examination of within-sample diversity patterns also pointed to higher species richness in skin samples compared to the others, as also found by Flores et al. (2014) in their study of the gut, tongue and skin (forehead and palm) microbial communities.

We included intra-individual samples that we obtained either by swabbing the same site consecutively (biological replicates) or after a few days up to a maximum of 1 week. We incorporated these samples as controls to check whether baseline variation while sampling or variation across time would impact the effect of body site on grouping patterns. Despite the limited intra-individual sampling, our results show that intra-individual variation did not preclude grouping by body site. Our exploratory analyses revealed an overlap between menstrual and vaginal samples in the agPCA as well as in the PCoA plots based on the weighted UniFrac distance. This overlap was driven by their similar taxonomic compositions: both menstrual and vaginal samples exhibited a very skewed distribution, with heavy dominance of lactobacilli OTUs. Thus, the distinction of these two body sites may necessitate additional information, for example the detection of proteins specific to blood or menstrual blood using a mass spectrometer (Yang et al. 2012; Van Steendam et al. 2013), or detection of mRNA markers specific to menstrual blood (Haas et al. 2014). The PCoA based on the Bray-Curtis dissimilarity measure does not take phylogenetic information into account, which drives the separation into two clusters that each comprise both vaginal and menstrual samples. When we examined the most abundant taxa in each of these two clusters, we found that each one was characterized by the dominance of one particular OTU from the Lactobacillus genus: either L. crispatus/acidophilus or L. iners. Interestingly, microbiome profiles dominated by L. crispatus or L. iners have been observed in prior studies of the vaginal microbiome. These studies have revealed the distinction of at least six types of microbiome profiles, termed community state types (CSTs), four of which show the predominance of one particular Lactobacillus species: either L. crispatus (CST I), L. gasseri (CST II), L. iners (CST III), or L. jensenii (CST V). In our study, it is thus probable that the vaginal-menstrual cluster dominated by L. crispatus/acidophilus corresponds to CST I, and that the vaginal-menstrual cluster dominated by L. iners belongs to CST III. The other two CSTs (IV-A and IV-B) are characterized by diverse anaerobic bacteria and low abundance of Lactobacillus spp. (Ravel et al. 2011; Gajer et al. 2012). CST IV was not detected in our study, which could be due to the small sample size but also to the ethnicities sampled, as previous studies indicate that black women are more likely to have CST IV while white women are more likely to have CST I (Gajer et al. 2012). Hence, approaches relying strongly or solely on presence/absence of lactobacilli to predict vaginal origin could produce false negatives.
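The link between a dominant Lactobacillus species and a community state type can be illustrated with a simple lookup. This is a deliberately simplified sketch: in the cited studies CSTs were derived from clustering of whole profiles, whereas the function below only checks the single most abundant species. The function name, the 50% dominance cut-off and the example profile are assumptions made for illustration and do not come from the study.

```python
# Dominant Lactobacillus species and the CST each one characterizes,
# as listed in the text above.
CST_BY_DOMINANT_SPECIES = {
    "L. crispatus": "CST I",
    "L. gasseri": "CST II",
    "L. iners": "CST III",
    "L. jensenii": "CST V",
}


def guess_cst(profile, dominance=0.5):
    """Guess a CST label from species-level relative abundances (fractions).

    `dominance` is a hypothetical cut-off, not a value from the study.
    """
    species, share = max(profile.items(), key=lambda kv: kv[1])
    if share >= dominance and species in CST_BY_DOMINANT_SPECIES:
        return CST_BY_DOMINANT_SPECIES[species]
    return "no dominant Lactobacillus (possibly CST IV)"


# Hypothetical profile, for illustration only.
example = {"L. iners": 0.80, "L. crispatus": 0.10, "Prevotella": 0.10}
print(guess_cst(example))
```

A rule of this kind also makes the false-negative risk concrete: a vaginal sample without a dominant Lactobacillus species falls through to the last branch and would be missed by any test that looks only for lactobacilli.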

Our results also revealed close proximity of the semen and skin samples in the PCoA with weighted UniFrac and Bray-Curtis. This outcome is not surprising given that skin bacteria may be transferred to semen. Also, while a proportion of the seminal fluid is produced in the seminal vesicles, it goes through the urinary tract system, picking up bacteria on the way. In future forensic studies, it would thus be relevant to examine the distinction of seminal fluid and urine microbiomes. Previous studies have also found shared taxa between semen and vaginal fluid, including Lactobacillus (Hou et al. 2013; Weng et al. 2014). In our study, however, Lactobacillus did not account for a high relative abundance in the semen microbiome.

As mentioned earlier, an important finding in our study was that, overall, samples exposed for 30 days continued to display the microbial signatures expected for their bodily origin. Nonetheless, several outliers were observed, corresponding to three of the exposed samples (8Sat3, 16Skt3, 16Mbt3). These three had microbiome profiles that stood out among other samples from the same body site in the agPCA, and it is interesting to note that two of these samples originate from the same individual (individual 16). The taxonomic similarities between 16Skt3 and 16Mbt3 could potentially be indicative of contamination during collection of the sample or during exposure.

It is also noteworthy that in the PCoA carried out with the weighted UniFrac and Bray-Curtis metrics, only two (8Sat3 and 16Mbt3) of the three samples mentioned earlier appear to be outliers. It is not unexpected that different distance-based ordination metrics should lead to different visualization outcomes of the data, although in this study there is general agreement among the three methods used. Ecological ordination metrics compute the dissimilarities based on binary information (presence/absence using an indicator function) or quantitative information (for example, abundance using read counts) from two samples. Achieving an optimal distance value that captures the true differences among samples, while using limited information, for example using only the 16S rRNA read data from a sample, is a matter of combining mathematical techniques. For instance, the Bray-Curtis dissimilarity measure uses the difference in relative abundance (Bray and Curtis 1957), while weighted UniFrac uses phylogenetic tree branch lengths in addition to the difference in relative abundance, and unweighted UniFrac utilizes branch lengths and presence/absence information. Another metric, the generalized UniFrac, weights the branch lengths according to the total relative abundance found in both samples in addition to the difference in relative abundance between the two specimens (Chen et al. 2012). This, in turn, should correct for low-abundance branches. In a study examining the microbiome data from skin, hair, nostril, gut and oral sites produced by Costello et al. (2009), the researchers found that the gut and oral samples could be distinguished with unweighted UniFrac but not with the Bray-Curtis dissimilarity measure (Knights et al. 2011). In our study, the Bray-Curtis metric was useful to distinguish two vaginal-menstrual clusters that probably correspond to two community state types. While the vaginal microbiome appears to vary through time considerably for some women, the distinction of “types” could potentially contribute additional individualizing information in forensic cases. It is not in the scope of this study to discuss which of these distance-based ordination measures is most suitable for metagenomics studies since this is in itself an ongoing field of research.
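The difference between a quantitative and a binary dissimilarity can be made concrete with a small sketch. It implements the Bray-Curtis dissimilarity on relative abundances and, as a stand-in for presence/absence measures, the Jaccard distance; the Jaccard distance is swapped in here for simplicity because UniFrac additionally requires a phylogenetic tree. The function names, taxon labels and abundance values are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum of absolute abundance differences
    divided by the sum of all abundances in both samples."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def jaccard_distance(a, b):
    """Presence/absence distance: 1 minus shared taxa over all taxa seen."""
    present_a = {t for t, x in a.items() if x > 0}
    present_b = {t for t, x in b.items() if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


# Two hypothetical samples containing the same taxa with swapped dominance.
sample_1 = {"TaxonX": 0.9, "TaxonY": 0.1}
sample_2 = {"TaxonX": 0.1, "TaxonY": 0.9}

print(bray_curtis(sample_1, sample_2))
print(jaccard_distance(sample_1, sample_2))
```

In this toy case the presence/absence distance sees two identical samples, while the abundance-based measure separates them clearly. This mirrors the way an abundance-based metric can split samples that share the same taxa but are dominated by different ones.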

Importantly, ordination methods such as PCA or PCoA are unsupervised learning approaches used mainly for data exploration and visualization, rather than prediction of unlabeled data, so it is not clear to what extent the outliers in this study would be misclassified when carrying out prediction. For classification tasks, supervised learning is more appropriate than unsupervised learning as it provides predictive scores and the expected prediction errors can be estimated. Among supervised methods, machine learning algorithms such as random forest classifiers have been shown to yield high predictive power for body site classification (Knights et al. 2011; Statnikov et al. 2013). As illustrated in the investigation of the Costello body sites dataset by Knights et al. (2011), the samples from the ear canal, hair, nostril, and skin are difficult to visually distinguish in the PCoA, but nonetheless, the random forest classifier is able to predict labels with low expected error rates. This classifier was trained on datasets of over 2,400 samples. Due to the limited size of our dataset, we did not conduct predictive modelling with a supervised approach here. However, we expect that generating a large training dataset that incorporates samples exposed over varying periods of time would result in high performing models and high classification accuracies.

Importantly, for forensic purposes the statistical output needs to be not only accurate but also in a suitable format for expert forensic evaluation. In cases where DNA matches between allelic profiles are investigated, this output is often the likelihood ratio between the hypothesis being tested (for example, that a particular individual is the source of the questioned profile) and the null hypothesis that the source of the profile is a random individual in the population. While DNA matching is routinely conducted in forensics, and is therefore an extensively investigated area, novel applications require rigorous testing frameworks adapted to the forensic setting, as exemplified by a molecular investigation of pathogen transmission networks. In that study, González-Candelas et al. (2013) reconstructed a phylogenetic tree to check whether the strains of hepatitis C in patients and the anesthesiologist suspected of infecting them were more closely related to each other than to other hepatitis C strains. They then computed likelihoods for the phylogenetic tree under the hypothesis of infection by the presumed suspect and under the hypothesis of infection from a different source, and used the likelihood ratio as evidence in court. In the case of microbiome sequencing to identify body fluids, such a statistical testing framework based on the likelihood ratios of competing hypotheses is expected to be of great value and thus requires further exploration.
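For reference, the likelihood ratio of two competing hypotheses has the following general form. The notation is an assumption introduced here for illustration: E stands for the observed evidence (for example, the microbiome profile of a stain), H_1 for the hypothesis being tested (for example, that the stain originates from a given body fluid) and H_2 for the alternative hypothesis (for example, that it originates from a different body fluid).

```latex
\[
\mathrm{LR} \;=\; \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}
\]
```

A value above 1 indicates that the evidence is more probable under H_1 than under H_2, and a value below 1 indicates the opposite. How the two probabilities would be estimated from microbiome data is left open here, as it is in the text.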

Another general point regarding microbiome sequencing that requires investigation is whether to focus on 16S rRNA gene sequencing data or utilize other markers within bacterial genomes for body site classification. For individual identification, and as highlighted by Schmedes et al. (2017; 2018), bacterial markers that provide strain-level resolution are preferred over 16S rRNA gene data. In a recent study, Schmedes and colleagues (2018) examined publicly available shotgun metagenomics data from various skin sites (Oh et al. 2016) in order to select a set of over 250 candidate markers, which they incorporated into a next-generation sequencing assay. Their evaluation of these bacterial markers showed average classification accuracies above 70%, not only for the identification of the donor of the skin sample, but also when distinguishing from which of the three skin sites the sample originated. These results on skin sites indicate the potential of utilizing strain-level diversity for body site classification. However, further investigation is necessary to determine a set of markers that would maximise classification accuracy across body sites other than skin as well as across individuals. To achieve this goal, shotgun metagenomic sequencing of body fluid/tissue samples will be valuable for data mining and for comparisons with 16S rRNA gene data.

Given the promising findings of our study, it will be interesting to explore the advantages and the limitations of microbial sequencing for forensic body fluid/tissue identification. Much work is yet to be conducted, including tests over shorter and longer periods of time with larger datasets, as well as the examination of mock samples placed on various kinds of surfaces typical of crime scenes.


Conclusion

In this proof-of-concept study, we investigated the reliability of microbial signatures for forensic body fluid/tissue identification by comparing the bacterial community structures of samples exposed to indoor conditions for 30 days versus control samples. We obtained a total of 70 samples from six body sites (saliva, semen, vaginal fluid, menstrual blood, skin and peripheral blood). We successfully amplified the 16S rRNA V4-V5 regions of most samples that were not peripheral blood. Our findings also show that even when samples have been exposed for a month, they still exhibit microbial signatures that are characteristic of their body site of origin. Thus, this approach is a valuable tool that should be further explored to test its applicability and limitations in the forensic setting.


Acknowledgements

We are very grateful to Murat A. Eren for his support setting up the study and his valuable advice. Special thanks to Corinne Moser for excellent laboratory assistance and to all participants who contributed samples to the study.

Consent for publication

Subjects were informed of the project goals and consented to the publication of data from the present study.

Competing interests

The authors declare no competing interests regarding the publication of this article.

References


Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman. 1990. “Basic Local Alignment Search Tool.” Journal of Molecular Biology 215 (3): 403–10.
Benschop, Corina C. G., Frederike C. A. Quaak, Mathilde E. Boon, Titia Sijen, and Irene Kuiper. 2012. “Vaginal Microbial Flora Analysis by next Generation Sequencing and Microarrays; Can Microbes Indicate Vaginal Origin in a Forensic Context?” International Journal of Legal Medicine 126 (2): 303–10.
Bray, J. Roger, and J. T. Curtis. 1957. “An Ordination of the Upland Forest Communities of Southern Wisconsin.” Ecological Monographs 27 (4): 325–49.
Caporaso, J. Gregory, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D. Bushman, Elizabeth K. Costello, Noah Fierer, et al. 2010. “QIIME Allows Analysis of High-Throughput Community Sequencing Data.” Nature Methods 7 (5): 335–36.
Caporaso, J. Gregory, Christian L. Lauber, Elizabeth K. Costello, Donna Berg-Lyons, Antonio Gonzalez, Jesse Stombaugh, Dan Knights, et al. 2011. “Moving Pictures of the Human Microbiome.” Genome Biology 12 (5): R50.
Chen, Jun, Kyle Bittinger, Emily S. Charlson, Christian Hoffmann, James Lewis, Gary D. Wu, Ronald G. Collman, Frederic D. Bushman, and Hongzhe Li. 2012. “Associating Microbiome Composition with Environmental Covariates Using Generalized UniFrac Distances.” Bioinformatics 28 (16): 2106–13.
Choi, Ajin, Kyoung-Jin Shin, Woo Ick Yang, and Hwan Young Lee. 2014a. “Body Fluid Identification by Integrated Analysis of DNA Methylation and Body Fluid-Specific Microbial DNA.” International Journal of Legal Medicine 128 (1): 33–41.
Choi, Ajin, Kyoung-Jin Shin, Woo Ick Yang, and Hwan Young Lee. 2014b. “Body Fluid Identification by Integrated Analysis of DNA Methylation and Body Fluid-Specific Microbial DNA.” International Journal of Legal Medicine 128 (1): 33–41.
Chu, Derrick M., Jun Ma, Amanda L. Prince, Kathleen M. Antony, Maxim D. Seferovic, and Kjersti M. Aagaard. 2017. “Maturation of the Infant Microbiome Community Structure and Function across Multiple Body Sites and in Relation to Mode of Delivery.” Nature Medicine 23 (3): 314–26.
Costello, Elizabeth K., Erica M. Carlisle, Elisabeth M. Bik, Michael J. Morowitz, and David A. Relman. 2013. “Microbiome Assembly across Multiple Body Sites in Low-Birthweight Infants.” mBio 4 (6): e00782–13.
Costello, Elizabeth K., Christian L. Lauber, Micah Hamady, Noah Fierer, Jeffrey I. Gordon, and Rob Knight. 2009. “Bacterial Community Variation in Human Body Habitats across Space and Time.” Science 326 (5960): 1694–97.
Edgar, Robert C. 2010. “Search and Clustering Orders of Magnitude Faster than BLAST.” Bioinformatics 26 (19): 2460–61.
Edgar, Robert C., and Henrik Flyvbjerg. 2015. “Error Filtering, Pair Assembly and Error Correction for Next-Generation Sequencing Reads.” Bioinformatics 31 (21): 3476–82.
Fierer, Noah, Christian L. Lauber, Nick Zhou, Daniel McDonald, Elizabeth K. Costello, and Rob Knight. 2010. “Forensic Identification Using Skin Bacterial Communities.” Proceedings of the National Academy of Sciences of the United States of America 107 (14): 6477–81.
Flores, Gilberto E., J. Gregory Caporaso, Jessica B. Henley, Jai Ram Rideout, Daniel Domogala, John Chase, Jonathan W. Leff, et al. 2014. “Temporal Variability Is a Personalized Feature of the Human Microbiome.” Genome Biology 15 (12): 531.
Fukuyama, Julia. 2017. “Adaptive gPCA: A Method for Structured Dimensionality Reduction.” arXiv. https://arxiv.org/abs/1702.00501.
Fukuyama, Julia, Laurie Rumker, Kris Sankaran, Pratheepa Jeganathan, Les Dethlefsen, David A. Relman, and Susan P. Holmes. 2017. “Multidomain Analyses of a Longitudinal Human Microbiome Intestinal Cleanout Perturbation Experiment.” PLoS Computational Biology 13 (8): e1005706.
Gajer, Pawel, Rebecca M. Brotman, Guoyun Bai, Joyce Sakamoto, Ursel M. E. Schütte, Xue Zhong, Sara S. K. Koenig, et al. 2012. “Temporal Dynamics of the Human Vaginal Microbiota.” Science Translational Medicine 4 (132): 132ra52.
Giampaoli, Saverio, Andrea Berti, Federica Valeriani, Gianluca Gianfranceschi, Antonio Piccolella, Laura Buggiotti, Cesare Rapone, Alessio Valentini, Luigi Ripani, and Vincenzo Romano Spica. 2012. “Molecular Identification of Vaginal Fluid by Microbial Signature.” Forensic Science International. Genetics 6 (5): 559–64.
González-Candelas, Fernando, María Alma Bracho, Borys Wróbel, and Andrés Moya. 2013. “Molecular Evolution in Court: Analysis of a Large Hepatitis C Virus Outbreak from an Evolving Source.” BMC Biology 11 (July): 76.
Haas, C., E. Hanson, M. J. Anjos, K. N. Ballantyne, R. Banemann, B. Bhoelai, E. Borges, et al. 2014. “RNA/DNA Co-Analysis from Human Menstrual Blood and Vaginal Secretion Stains: Results of a Fourth and Fifth Collaborative EDNAP Exercise.” Forensic Science International. Genetics 8 (1): 203–12.
Hanssen, Eirik Nataas, Ekaterina Avershina, Knut Rudi, Peter Gill, and Lars Snipen. 2017. “Body Fluid Prediction from Microbial Patterns for Forensic Application.” Forensic Science International. Genetics 30 (September): 10–17.
Hou, Dongsheng, Xia Zhou, Xue Zhong, Matthew L. Settles, Jessica Herring, Li Wang, Zaid Abdo, Larry J. Forney, and Chen Xu. 2013. “Microbiota of the Seminal Fluid from Healthy and Infertile Men.” Fertility and Sterility 100 (5): 1261–69.
Human Microbiome Project Consortium. 2012. “Structure, Function and Diversity of the Healthy Human Microbiome.” Nature 486 (7402): 207–14.
Huse, Susan M., Vincent B. Young, Hilary G. Morrison, Dionysios A. Antonopoulos, John Kwon, Sushila Dalal, Rose Arrieta, et al. 2014. “Comparison of Brush and Biopsy Sampling Methods of the Ileal Pouch for Assessment of Mucosa-Associated Microbiota of Human Subjects.” Microbiome 2 (1): 5.
Kayser, Manfred, and Peter de Knijff. 2011. “Improving Human Forensics through Advances in Genetics, Genomics and Molecular Biology.” Nature Reviews. Genetics 12 (3): 179–92.
Knights, Dan, Elizabeth K. Costello, and Rob Knight. 2011. “Supervised Classification of Human Microbiota.” FEMS Microbiology Reviews 35 (2): 343–59.
Lloyd-Price, Jason, Anup Mahurkar, Gholamali Rahnavard, Jonathan Crabtree, Joshua Orvis, A. Brantley Hall, Arthur Brady, et al. 2017. “Strains, Functions and Dynamics in the Expanded Human Microbiome Project.” Nature 550 (7674): 61–66.
Lozupone, Catherine A., Micah Hamady, Scott T. Kelley, and Rob Knight. 2007. “Quantitative and Qualitative Diversity Measures Lead to Different Insights into Factors That Structure Microbial Communities.” Applied and Environmental Microbiology 73 (5): 1576–85.
Lozupone, Catherine, and Rob Knight. 2005. “UniFrac: A New Phylogenetic Method for Comparing Microbial Communities.” Applied and Environmental Microbiology 71 (12): 8228–35.
McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PloS One 8 (4): e61217.
Nakanishi, Hiroaki, Akira Kido, Takeshi Ohmori, Aya Takada, Masaaki Hara, Noboru Adachi, and Kazuyuki Saito. 2009. “A Novel Method for the Identification of Saliva by Detecting Oral Streptococci Using PCR.” Forensic Science International 183 (1-3): 20–23.
Nawrocki, Eric P., Diana L. Kolbe, and Sean R. Eddy. 2009. “Infernal 1.0: Inference of RNA Alignments.” Bioinformatics 25 (10): 1335–37.
Oh, Julia, Allyson L. Byrd, Morgan Park, Heidi H. Kong, and Julia A. Segre. 2016. “Temporal Stability of the Human Skin Microbiome.” Cell 165 (4): 854–66.
Oksanen, J., F. Blanchet, R. Kindt, P. Legendre, and R. O’Hara. 2016. Vegan: Community Ecology Package (version 2.4-6). https://doi.org/10.4135/9781412971874.n145.
Proctor, Diana M., Julia A. Fukuyama, Peter M. Loomer, Gary C. Armitage, Stacey A. Lee, Nicole M. Davis, Mark I. Ryder, Susan P. Holmes, and David A. Relman. 2018. “A Spatial Gradient of Bacterial Diversity in the Human Oral Cavity Shaped by Salivary Flow.” Nature Communications 9 (1): 681.
Ravel, Jacques, Pawel Gajer, Zaid Abdo, G. Maria Schneider, Sara S. K. Koenig, Stacey L. McCulle, Shara Karlebach, et al. 2011. “Vaginal Microbiome of Reproductive-Age Women.” Proceedings of the National Academy of Sciences of the United States of America 108 Suppl 1 (March): 4680–87.
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Schloss, Patrick D., Sarah L. Westcott, Thomas Ryabin, Justine R. Hall, Martin Hartmann, Emily B. Hollister, Ryan A. Lesniewski, et al. 2009. “Introducing Mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities.” Applied and Environmental Microbiology 75 (23): 7537–41.
Schmedes, Sarah E., August E. Woerner, and Bruce Budowle. 2017. “Forensic Human Identification Using Skin Microbiomes.” Applied and Environmental Microbiology, September. https://doi.org/10.1128/AEM.01672-17.
Schmedes, Sarah E., August E. Woerner, Nicole M. M. Novroski, Frank R. Wendt, Jonathan L. King, Kathryn M. Stephens, and Bruce Budowle. 2018. “Targeted Sequencing of Clade-Specific Markers from Skin Microbiomes for Forensic Human Identification.” Forensic Science International. Genetics 32: 50–61.
Sender, Ron, Shai Fuchs, and Ron Milo. 2016a. “Are We Really Vastly Outnumbered? Revisiting the Ratio of Bacterial to Host Cells in Humans.” Cell 164 (3): 337–40.
Sender, Ron, Shai Fuchs, and Ron Milo. 2016b. “Revised Estimates for the Number of Human and Bacteria Cells in the Body.” PLoS Biology 14 (8): e1002533.
Sijen, Titia. 2015. “Molecular Approaches for Forensic Cell Type Identification: On mRNA, miRNA, DNA Methylation and Microbial Markers.” Forensic Science International. Genetics 18 (September): 21–32.
Stamatakis, Alexandros. 2006. “RAxML-VI-HPC: Maximum Likelihood-Based Phylogenetic Analyses with Thousands of Taxa and Mixed Models.” Bioinformatics 22 (21): 2688–90.
Statnikov, Alexander, Mikael Henaff, Varun Narendra, Kranti Konganti, Zhiguo Li, Liying Yang, Zhiheng Pei, Martin J. Blaser, Constantin F. Aliferis, and Alexander V. Alekseyenko. 2013. “A Comprehensive Evaluation of Multicategory Classification Methods for Microbiomic Data.” Microbiome 1 (1): 11.
Van Steendam, Katleen, Marlies De Ceuleneer, Maarten Dhaenens, David Van Hoofstat, and Dieter Deforce. 2013. “Mass Spectrometry-Based Proteomics as a Tool to Identify Biological Matrices in Forensic Science.” International Journal of Legal Medicine 127 (2): 287–98.
Virkler, Kelly, and Igor K. Lednev. 2009. “Analysis of Body Fluids for Forensic Purposes: From Laboratory Testing to Non-Destructive Rapid Confirmatory Identification at a Crime Scene.” Forensic Science International 188 (1-3): 1–17.
Wang, Qiong, George M. Garrity, James M. Tiedje, and James R. Cole. 2007. “Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy.” Applied and Environmental Microbiology 73 (16): 5261–67.
Weng, Shun-Long, Chih-Min Chiu, Feng-Mao Lin, Wei-Chih Huang, Chao Liang, Ting Yang, Tzu-Ling Yang, et al. 2014. “Bacterial Communities in Semen from Men of Infertile Couples: Metagenomic Sequencing Reveals Relationships of Seminal Microbiota to Semen Quality.” PloS One 9 (10): e110152.
Wilkins, David, Marcus H. Y. Leung, and Patrick K. H. Lee. 2017. “Microbiota Fingerprints Lose Individually Identifying Features over Time.” Microbiome 5 (1): 1.
Wilkinson, Leland. 2011. “ggplot2: Elegant Graphics for Data Analysis by WICKHAM, H.” Biometrics 67 (2): 678–79.
Yang, Heyi, Bo Zhou, Mechthild Prinz, and Donald Siegel. 2012. “Proteomic Analysis of Menstrual Blood.” Molecular & Cellular Proteomics: MCP 11 (10): 1024–35.

Figure Legends

Figure 1. Generalized adaptive principal components analysis for the top 30 genera dataset. The first two principal axes are displayed. A) Sample plot with samples colored according to body site, and separated into control and exposed samples. Contour plots correspond to 2D kernel density estimates, providing a continuous representation of the distribution of the samples. B) Taxon plot with taxa color-coded according to phylum, with size representing the genus. C) Taxon plots shown separately for each phylum, with color coding as in B. Similar scores (positive or negative) along the axes (first or second) for body fluid/tissue samples and for taxa provide information on their association.


Figure 2. Relative abundance for the top 20 genera. The stacked barplot for each sample shows the relative abundance of the top 20 genera within each body fluid/tissue sample. The top 20 genera were calculated by examining the mean relative abundance across all samples per body site.
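The ranking step described in the Figure 2 legend can be sketched as follows. This is one possible reading of the procedure, not the authors' code: it averages each genus over all samples of a body site (counting a genus as zero where it is absent) and keeps the highest-ranked genera per site. The function name, the input layout and the example values are hypothetical.

```python
from collections import defaultdict


def top_genera_by_site(samples, n=20):
    """samples: list of (body_site, {genus: relative_abundance}) pairs.

    Returns {body_site: [(genus, mean_abundance), ...]} with the n genera
    that have the highest mean relative abundance at each site.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for site, profile in samples:
        counts[site] += 1
        site_sums = sums[site]
        for genus, value in profile.items():
            site_sums[genus] += value

    result = {}
    for site, site_sums in sums.items():
        # Dividing by the number of samples treats absent genera as zero.
        means = {g: total / counts[site] for g, total in site_sums.items()}
        ranked = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
        result[site] = ranked[:n]
    return result


# Hypothetical data, for illustration only.
samples = [
    ("skin", {"GenusA": 0.7, "GenusB": 0.3}),
    ("skin", {"GenusA": 0.5, "GenusC": 0.5}),
]
top = top_genera_by_site(samples, n=2)
print([genus for genus, _ in top["skin"]])
```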


Tables

Table 1. Most abundant genera in the control (nc for “number of control samples”) and the exposed (ne for “number of exposed samples”) samples per body site. The number of genera with relative abundance above 1% is given for both the control and exposed samples, with the total number of genera in parentheses. Also shown are the 10 most abundant genera (with relative abundance in % in parentheses) for the control and exposed samples.

Body fluid type: Skin (nc = 6, ne = 2)
Control, no. genera above 1% and total: 19 (139)
Exposed, no. genera above 1% and total: 18 (110)
Top 10 Control (%): Propionibacterium (20.1), Staphylococcus (10.0), Streptococcus (7.6), Blautia (5.8), Bacteroides (4.8), Lachnospiraceae (4.6), Escherichia-Shigella (3.3), Bifidobacterium (3.1), Corynebacterium (2.6), Lactobacillus (2.4)
Top 10 Exposed (%): Enterobacteriaceae (13.7), Lactobacillus (12.6), Stenotrophomonas (10.3), Pseudomonadaceae (9.8), Acinetobacter (8.3), Subdoligranulum (6.4), Bifidobacterium (5.8), Streptococcus (5.6), Bacteroides (4.0), Lachnospiraceae (2.3)

Body fluid type: Semen (nc = 3, ne = 2)
Control, no. genera above 1% and total: 15 (85)
Exposed, no. genera above 1% and total: 20 (72)
Top 10 Control (%): Bacteroides (29.3), Blautia (12.2), Lachnospiraceae (11.5), Streptococcus (9.3), Bifidobacterium (8.3), Subdoligranulum (4.4), Faecalibacterium (3.4), Erysipelotrichaceae (3.4), Enterobacteriaceae (2.1), Anaerostipes (1.8)
Top 10 Exposed (%): Lachnospiraceae (19.2), Streptococcus (12.4), Alcaligenaceae (11.7), Enterobacteriaceae (4.7), Erysipelatoclostridium (4.6), Thiorhodococcus (4.4), Bacteroides (3.8), Bifidobacterium (3.7), Blautia (3.6), Escherichia-Shigella (3.2)

Body fluid type: Saliva (nc = 6, ne = 4)
Control, no. genera above 1% and total: 23 (119)
Exposed, no. genera above 1% and total: 17 (108)
Top 10 Control (%): Prevotella (18.9), Streptococcus (14.7), Veillonella (8.1), Haemophilus (7.8), Fusobacterium (6.0), Alloprevotella (4.3), Leptotrichia (2.5), Granulicatella (2.4), Neisseriaceae (2.2), Alcaligenaceae (2.1)
Top 10 Exposed (%): Alcaligenaceae (33.2), Streptococcus (14.6), Prevotella (10.6), Aquamicrobium (4.3), Haemophilus (3.9), Veillonella (3.9), Alloprevotella (2.0), Granulicatella (1.8), Aggregatibacter (1.7), Actinomyces (1.7)

Body fluid type: Vaginal fluid (nc = 7, ne = 4)
Control, no. genera above 1% and total: 3 (102)
Exposed, no. genera above 1% and total: 5 (97)
Top 10 Control (%): Lactobacillus (87.5), Lachnospiraceae (2.3), Prevotella (1.1), Alcaligenaceae (1.0), Erysipelatoclostridium (0.9), Corynebacterium (0.7), Peptoniphilus (0.6), Bifidobacterium (0.6), Anaerococcus (0.5), Staphylococcus (0.5)
Top 10 Exposed (%): Lactobacillus (85.3), Lachnospiraceae (3.4), Subdoligranulum (2.3), Blautia (1.7), Bifidobacterium (1.7), Alcaligenaceae (0.9), Escherichia-Shigella (0.5), Prevotella (0.5), Anaerostipes (0.4), Stenotrophomonas (0.3)

Body fluid type: Menstrual blood (nc = 7, ne = 5)
Control, no. genera above 1% and total: 8 (107)
Exposed, no. genera above 1% and total: 11 (106)
Top 10 Control (%): Lactobacillus (80.1), Alcaligenaceae (2.9), Corynebacterium (2.7), Bacteroides (2.0), Lachnospiraceae (1.7), Escherichia-Shigella (1.3), Prevotella (1.1), Finegoldia (1.1), Anaerococcus (0.7), Corynebacteriales (0.7)
Top 10 Exposed (%): Lactobacillus (67.3), Enterobacteriaceae (3.2), Pseudomonadaceae (2.9), Stenotrophomonas (2.8), Acinetobacter (2.8), Corynebacterium (2.6), Blautia (2.1), Lachnospiraceae (1.8), Bifidobacterium (1.3), Bacteroides (1.1)
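The summary quantities in Table 1 (number of genera above 1% and the most abundant genera) can be derived from a mean abundance profile along the following lines. This is a minimal sketch under stated assumptions: the function name is hypothetical, the strict comparison against the 1% threshold is an assumption, and the input holds only the ten rounded percentages that Table 1 lists for vaginal fluid in the list headed by Lactobacillus (87.5), so the total number of genera reported in the table cannot be reproduced from it.

```python
def summarize_profile(mean_abundance_pct, threshold=1.0, top_n=10):
    """Return (genera above threshold, genera present, top_n ranked genera)
    for a {genus: mean relative abundance in %} profile."""
    present = {g: v for g, v in mean_abundance_pct.items() if v > 0}
    above = sum(1 for v in present.values() if v > threshold)
    ranked = sorted(present.items(), key=lambda kv: kv[1], reverse=True)
    return above, len(present), ranked[:top_n]


# Rounded percentages as listed in Table 1 for vaginal fluid.
vaginal_top10 = {
    "Lactobacillus": 87.5, "Lachnospiraceae": 2.3, "Prevotella": 1.1,
    "Alcaligenaceae": 1.0, "Erysipelatoclostridium": 0.9,
    "Corynebacterium": 0.7, "Peptoniphilus": 0.6, "Bifidobacterium": 0.6,
    "Anaerococcus": 0.5, "Staphylococcus": 0.5,
}

above, present, top3 = summarize_profile(vaginal_top10, top_n=3)
print(above, present, [genus for genus, _ in top3])
```

With the strict threshold, three genera in this list exceed 1%, which is consistent with the entry "3 (102)" reported for vaginal fluid. A value listed as exactly 1.0 after rounding is the kind of borderline case where the choice between `>` and `>=` would matter.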