Journal Pre-proof Comparative metagenomic analysis of microbial taxonomic and functional variations in untreated surface and reclaimed waters used in irrigation applications Jessica Chopyk, Daniel J. Nasko, Sarah Allard, Anthony Bui, Todd Treangen, Mihai Pop, Emmanuel F. Mongodin, Amy R. Sapkota PII:
S0043-1354(19)31024-3
DOI:
https://doi.org/10.1016/j.watres.2019.115250
Reference:
WR 115250
To appear in:
Water Research
Received Date: 11 December 2018 Revised Date:
8 October 2019
Accepted Date: 27 October 2019
Please cite this article as: Chopyk, J., Nasko, D.J., Allard, S., Bui, A., Treangen, T., Pop, M., Mongodin, E.F., Sapkota, A.R., Comparative metagenomic analysis of microbial taxonomic and functional variations in untreated surface and reclaimed waters used in irrigation applications, Water Research (2019), doi: https://doi.org/10.1016/j.watres.2019.115250. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.
Irrigation Water Source
Brackish River
Reclamation Facility
Freshwater Pond
Taxonomy
CRISPR
Microbial Community
Resistome
Functional Potential
Freshwater Creek
1
Comparative metagenomic analysis of microbial taxonomic and functional variations in
2
untreated surface and reclaimed waters used in irrigation applications
3 4 5 6
Jessica Chopyk1, Daniel J. Nasko2, Sarah Allard1, Anthony Bui1, Todd Treangen3, Mihai Pop2, Emmanuel F. Mongodin4, Amy R. Sapkota1*
7
1
8
Health, College Park, Maryland, USA; 2 Center for Bioinformatics and Computational Biology,
9
Institute for Advanced Computer Sciences, University of Maryland, College Park, Maryland,
Maryland Institute for Applied Environmental Health, University of Maryland School of Public
10
USA; 3Department of Computer Science, Rice University, Houston, Texas, USAs; 4 Institute for
11
Genome Sciences and Department of Microbiology and Immunology, University of Maryland
12
School of Medicine, Baltimore, Maryland, USA
13 14 15 16 17 18 19 20
*Corresponding Author:
21
Amy R. Sapkota, Ph.D., M.P.H.
22
Maryland Institute for Applied Environmental Health
23
University of Maryland School of Public Health
24
School of Public Health Building (255)
25
4200 Valley Drive, Room 2234P
26
College Park, MD 20742
27
Phone 301-405-1772
28
Fax 301-314-1012
1
29 30
Abstract The use of irrigation water sourced from reclamation facilities and untreated surface
31
water bodies may be a practical solution to attenuate the burden on diminishing groundwater
32
aquifers. However, comprehensive microbial characterizations of these water sources are
33
generally lacking, especially with regard to variations through time and across multiple water
34
types. To address this knowledge gap we used a shotgun metagenomic approach to characterize
35
the taxonomic and functional variations of microbial communities within two agricultural ponds,
36
two freshwater creeks, two brackish rivers, and three water reclamation facilities located in the
37
Mid-Atlantic, United States. Water samples (n=24) were collected from all sites between
38
October and November 2016, and filtered onto 0.2 µm membrane filters. Filters were then
39
subjected to total DNA extraction and shotgun sequencing on the Illumina HiSeq platform. From
40
these data, we found that Betaproteobacteria dominated the majority of freshwater sites, while
41
Alphaproteobacteria were abundant at times in the brackish waters. One of these brackish sites
42
was also host to a greater abundance of the bacterial genera Gimesia and Microcystis.
43
Furthermore, predicted microbial features (e.g. antibiotic resistance genes (ARGs) and Clustered
44
Regularly Interspaced Short Palindromic Repeats (CRISPR) arrays) varied based on specific site
45
and sampling date. ARGs were found across samples, with the diversity and abundance highest
46
in those from a reclamation facility and a wastewater-impacted freshwater creek. Additionally,
47
we identified over 600 CRISPR arrays, containing ~2,600 unique spacers, suggestive of a diverse
48
and often site-specific phage community. Overall, these results provide a better understanding of
49
the complex microbial community in untreated surface and reclaimed waters, while highlighting
50
possible environmental and human health impacts associated with their use in agriculture.
51
Keywords: metagenome, shotgun sequencing, agricultural irrigation, antibiotic resistance,
2
52
CRISPR, microbial communities, bacteria, bacteriophage
53 54 55 56 57
Abbreviations: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats); CARD
58
(Comprehensive Antibiotic Resistance Database); GO (Gene Ontology); ARG (Antibiotic
59
Resistance Gene); ORF (Open Reading Frame)
Running title: Microbial communities of irrigation waters
60 61
3
62 63
1. Introduction Steady declines in groundwater supplies, coupled with the estimation that by 2050 water
64
withdrawals for irrigation will increase by ~10%, has strengthened the demand for alternative
65
sources of water for agricultural applications (Bruinsma, 2009). The use of reclaimed (advanced
66
treated municipal wastewater) and untreated surface waters (e.g. brackish rivers, freshwater
67
creeks, and ponds) may provide effective solutions to reduce pressures on groundwater sources.
68
However, to date, there are very few studies that have comprehensively evaluated the taxonomy
69
and functional potential of microbial communities in alternative water sources. Moreover, there
70
has been limited research that has compared these microbial features across different sources.
71
Filling these knowledge gaps would identify the alternative water sources with the highest and
72
lowest degrees of putative risk with regard to both produce safety and public health. This
73
information is critical for informing the development of water reuse regulations.
74
Currently, reclaimed water is used as an alternative source of non-potable water,
75
especially in arid and semi-arid regions around the world (Dieter et al., 2018). However, because
76
it is the end-result of wastewater treatment, concerns remain about the levels of harmful
77
microbiological constituents that may persist in the water. Previous studies have identified that,
78
while wastewater treatment does reduce bacterial diversity, potential human pathogens
79
(Legionella spp., Mycobacterium spp., and Streptococcus spp.) may still be present and, in some
80
cases, selectively enriched by the disinfection process (Lin et al., 2016;Kulkarni et al., 2018).
81
Additionally, antibiotics introduced through municipal influent and agricultural runoff can
82
persist throughout the treatment process, resulting in high selection pressure for antibiotic-
83
resistant bacteria (Szczepanowski et al., 2009;Zhang and Li, 2011;Luczkiewicz et al., 2015).
84
However, reclaimed water characteristics are likely dependent on the quality of the influent and
4
85
the treatment practices employed by the wastewater treatment facility (Munir et al., 2011;Shanks
86
et al., 2013).
87
While reclaimed water is commonly considered the standard for water reuse, untreated
88
surface water sources may also represent practical alternatives for water management. For
89
instance, ponds are common features across the United States, with an estimated abundance
90
between 2.6 and 9 million (Renwick et al., 2006). They can occur naturally (e.g. floodplains,
91
isolated depressions), but are often human-constructed for a variety of utilitarian and aesthetic
92
purposes (Renwick et al., 2006). In agricultural settings, ponds may be constructed as a means of
93
capture and storage of freshwater for localized irrigation. While ponds are not as widely studied
94
as large lakes and marine systems, their unique topography may influence their microbial
95
community composition. For instance, ponds, and other surface water sites, are subjected to
96
environmental and anthropogenic factors that can largely be avoided when using groundwater
97
(e.g. animal fecal contamination and agricultural/urban runoff) (Suslow et al., 2003). Ponds also
98
tend to have a higher terrestrial-aquatic interchange compared to larger bodies of water (e.g.
99
lakes) that may drive the abundance of terrestrial microorganisms (Chopyk et al., 2018).
100
In contrast, lotic ecosystems (e.g. rivers, creeks) are marked by a natural flow, usually
101
toward another body of water such as an ocean or lake. Therefore, in addition to traditional
102
environmental factors, lotic systems may be impacted by connected waterways and upstream
103
discharge facilities. For instance, discharge of wastewater effluent into urban and suburban rivers
104
was reported to increase downstream organic and inorganic nutrient content and decrease overall
105
bacterial diversity in sediment (Drury et al., 2013). Alterations in bacterial community structure
106
have also been associated with catchment area (e.g. agricultural, forested, urban), likely due to
107
variations in nutrient concentration stemming from runoff (Miller et al., 2011;Staley et al., 2014).
5
108
Despite the potential importance of these water sources to regional biodiversity and water
109
management, previous studies on alternative water sources have relied on the amplification and
110
analysis of specific marker genes (e.g., 16S rRNA gene). While useful for broad community
111
analyses, these studies cannot explore functionality nor capture larger complex genomic features;
112
many of which are capable of providing a more holistic picture of the microbial community, such
113
as the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system. Because
114
the CRISPR-Cas system functions by integrating pieces of foreign DNA (i.e. plasmid, phage)
115
into recognizable arrays, predicting CRISPR can provide valuable information on phage
116
infection history, as well as provide a potential means to subtype pathogens (Shariat and Dudley,
117
2014). Previous studies exploring CRISPR-Cas from human body sites (Rho et al., 2012), dairy
118
operations (Horvath et al., 2008), and extreme environments (e.g. microbial mats) (Heidelberg et
119
al., 2009) such as hot springs (Snyder et al., 2010) and Antarctic snow (Lopatina et al., 2016)
120
have given us profound insights into the infection history and defense strategies of their
121
microbial communities. This is of particular importance when assessing the quality of irrigation
122
waters, because, while they are not direct human pathogens, phage are responsible for shaping
123
the diversity and genetic architecture of their host(s) and are often left unexplored (Frost et al.,
124
2005).
125
Therefore, the overall aim of this study was to provide comprehensive, foundational
126
knowledge on the microbial communities in multiple alternative irrigation water sources,
127
highlighting putative risks to produce safety and public health. To address this aim, we used
128
high-throughput shotgun metagenomic sequencing, which enabled us to evaluate the full
129
genomic spectrum present in the samples. With these data, our specific objectives were to: 1)
130
characterize and compare bacterial community composition; 2) predict and compare functional,
6
131
pathogenic, and antibiotic resistance genes; and 3) improve understanding of phage-host
132
relationships across multiple untreated surface and reclaimed water sites sampled over the course
133
of five weeks from October to November 2016 in the Mid Atlantic, United States.
7
134
2. Materials and Methods
135
2.1 Study Sites
136
Water samples (n=24) were collected from nine sites of four different water body types in
137
Maryland, United States: two tidal brackish rivers (MA04, MA08); two freshwater ponds
138
(MA10, MA11); two non-tidal freshwater creeks (MA03, MA07); and three water reclamation
139
facilitates (MA01, MA02, MA06) (Table 1).
140 141 142
2.2 Sample Collection: All sites were sampled on the following dates: 10/10/16, 10/24/16, and 11/14/16, with the
143
exception of the reclamation facilities (MA01, MA02, MA06) where samples were only
144
collected on 10/24/16 and 11/14/16. At each site, 1 L of water was collected. For the surface
145
water sites (e.g. creek, pond, river), sterile polypropylene sampling containers (Thermo Fisher
146
Scientific, MA, USA) were submerged 15-30 cm below the surface using a long-range grabbing
147
tool. For the reclamation facilities, 1 L of water was collected from a spigot, irrigation line or
148
storage lagoon, depending on the facility and sampling feasibility on that date. Samples were
149
transported on ice to the laboratory and stored at 4°C. In addition, a ProDSS digital sampling
150
system (YSI, Yellow Springs, OH, United States) was used to measure, in triplicate, the water
151
temperature (°C) and pH described in Table S1. Ambient temperature was also collected for the
152
time and date of sampling via the National Weather Services historical data archive.
153 154
2.3 Sample processing:
155
To remove the cellular fraction, each water sample was vacuum filtered through a 0.2 µm
156
membrane filter (Pall Corporation, MI, USA). Microbial DNA was then extracted from the filters
8
157
using an enzymatic and mechanical lysis procedure currently used in our lab to extract DNA
158
from various environmental biomes (Chopyk et al., 2017;Chopyk et al., 2018). Briefly, the filters
159
were added to lysing matrix tubes along with a cocktail of PBS buffer, lysozyme, lysostaphin,
160
and mutanolysin. After incubating, samples were subjected to a second lysing cocktail
161
(Proteinase K and SDS) followed by another incubation and mechanical lysis via bead beating.
162
The resulting DNA was purified with the QIAmp DNA mini kit (Qiagen, CA, USA) and
163
assessed with the NanoDrop 2000 Spectrophotometer.
164 165 166
2.4 Shotgun sequencing For each sample, DNA was used in a tagmentation reaction, followed by 12 cycles of
167
PCR amplification using Nextera i7 & i5 index primers per the modified Nextera XT protocol.
168
The final libraries were then quantitated by Quant-iT hsDNA kit. The libraries were pooled
169
based on their concentrations as determined by Quantstudio 5 and loaded onto an Agilent High
170
Sensitivity D1000 ScreenTape System. The samples were run across 8 lanes of an Illumina
171
Hiseq X10 flow (Illumina, San Diego, CA, United States) cell targeting 100 bp paired-end reads
172
per sample.
173 174
2.5 Metagenomic assembly
175
The resulting paired-end reads were quality trimmed using Trimmomatic ver. 0.36
176
(sliding window:4:30 min len:60) (Bolger et al., 2014), merged with FLASh ver. 1.2.11 (Magoč
177
and Salzberg, 2011), and assembled de novo with metaSPAdes ver. 3.10.1 (without read error
178
correction) (Nurk et al., 2016). Open reading frames (ORFs) were predicted from the assembled
179
contigs using MetaGene and translated into their corresponding peptides (Noguchi et al., 2006).
9
180 181 182
2.6 Taxonomic and functional classification Translated peptide ORFs were searched against UniRef 100 (retrieved May 2018) using
183
protein-protein BLAST (BLASTp ver. 2.6.0+) (E value ≤ 1e-3) (Altschul et al.,
184
1990;Consortium, 2009). Taxonomic classifications were then made to contigs by max
185
cumulative bit score. This was calculated by summing the bit scores of all taxa with a hit to
186
translated peptide ORF encoded by the contig. Functional assignments were made by assigning
187
Gene Ontology (GO) terms to translated ORFs. UniProt sequences are continually assigned GO
188
terms by the Gene Ontology Annotation (GOA) program (Huntley et al., 2014). Translated ORFs
189
were assigned all GO terms that were linked to UniRef 100 peptides within 3% of the top hit’s
190
bit score.
191
Coverage was calculated for each contig by recruiting quality-controlled reads to
192
assembled contigs using Bowtie2 ver. 2.3.3 (very sensitive local mode) and then using the
193
“depth” function of Samtools ver. 1.4.1 to compute the per-contig coverage (Langmead and
194
Salzberg, 2012) . To normalize abundances across libraries, contig and ORF coverages were
195
divided by the sum of coverage per million, similar to the transcripts per million (TPM) metric
196
used in RNA-Seq (Li et al., 2009;Conesa et al., 2016). Scripts performing these assignments and
197
normalization are available at https://github.com/dnasko/baby_virome. Taxonomic and
198
functional data were visualized using the R packages ggplot2 ver. 3.1.0 and pheatmap ver 1.0.10
199
(Wickham, 2009;Kolde and Kolde, 2018). Significance tests were conducted using an ANOVA
200
with post-hoc Tukey’s HSD Test.
201 202
2.7 Translated peptide ORF clustering
10
203
To assess the shared and unique functional profiles among the sampling sites, all
204
complete translated peptide ORFs were clustered at 60% with CD-HIT (Li and Godzik, 2006).
205
Cluster files were then parsed using the clstr2txt.pl script.
206 207 208
2.8 Identification of Antibiotic Resistance genes Translated peptide ORFs were searched against the “Comprehensive Antibiotic
209
Resistance Database” (CARD; retrieved July 2018) using protein-protein BLAST (BLASTp ver.
210
2.6.0+) (E value ≤ 1e-3) (Altschul et al., 1990;McArthur et al., 2013). A queried translated ORF
211
was regarded as ARG-like if >40% coverage and >80% amino-acid identity to a protein in the
212
CARD database (Zankari et al., 2012;Enault et al., 2017). Using the CARD database as a
213
reference, the putative ARGs were assigned a gene name and a drug class. For those conferring
214
resistance through mutations (i.e. kasA, gyrA, gyrB, murA, ndh, thyA, rpsL, rpsJ), a post-
215
processing step (MAFFT alignment with reference sequences available at CARD) was taken to
216
confirm the presence of resistance-conferring mutations (Katoh et al., 2002).
217 218 219
2.9 Prediction and analysis of CRISPRs CRISPR arrays were predicted from assembled contigs using the CRISPR detection and
220
validation tool, CASC available at https://github.com/dnasko/CASC (Nasko et al., 2019). CASC
221
utilizes a modified version of the CRISPR Recognition Tool (CRT) to call putative CRISPR
222
spacers (Bland et al., 2007). CASC then validates these spacers by searching against a database
223
of Cas proteins and CRISPR repeats to remove false positives and outputs FASTA files
224
containing: (1) valid CRISPR spacers; (2) false-positive CRISPR spacers; (3) valid CRISPR
225
repeats; and (4) false-positive CRISPR repeats. Valid CRISPR spacers were clustered with CD-
11
226
HIT-EST at 97% nucleotide similarity to determine the number of unique and shared spacers
227
within and among the sites (Li and Godzik, 2006).
228 229
2.10 Data availability
230
Metagenomic reads were submitted to NCBI’s Sequence Read Archive under the
231
BioProject accession number PRJNA473136 (SAMN10386131- SAMN10386154). The
232
assembled contigs were also submitted to NCBI under the same BioProject accession number
233
and BioSample IDs.
234 235
12
236
3 Results
237
3.1 Sequencing effort, assembly and taxonomy
238
All processed samples (n=24) were sequenced on the Illumina HiSeq for a total of
239
803,403,499 read pairs (Table S2), with an average of 33,475,146 per metagenome (+/− SD
240
9,122,444). After metagenomic assembly, an average of 50% of reads within a library recruited
241
to contigs from its assembly (11,383,477 contigs total; average contigs/library = 474,310 +/-
242
150,989 SD), with ~48% of reads within a library able to recruit to assembled contigs > 500 bp.
243
Using homology searching with BLAST we were able to identify the putative bacterial
244
genera and phyla that dominated each site at each time point. On average, 65% of contigs could
245
be confidently assigned a taxonomic representative (Table S3). Of these, between 78% and 98%
246
(mean: 91%) were assigned as Bacteria, followed by Eukaryota (min: 0.6%, max: 17%, mean:
247
4%), and Viruses (min: 0.6%, max: 10%, mean: 3%). For the contigs that could be identified, a
248
normalized abundance was calculated to account for sequencing effort and assembly/recruitment
249
proficiency. For those assigned as Eukaryota, the majority of the abundance was classified as
250
Streptophyta (min: 6%, max: 51%, mean: 21%), Arthropoda (min: 6%, max: 43%, mean: 15%),
251
and Chordata (min: 8%, max: 30%, mean: 15%) (Figure S1).
252
For those assigned as Bacteria, the most frequently observed bacterial phyla relative to
253
each sample was Proteobacteria (min: 35%, max: 83%, mean: 55%) (Figure 1A). This was
254
followed by Actinobacteria (min: 2%, max: 25%, mean: 13%), Bacteriodetes (min: 6%, max:
255
21%, mean: 13%) or Firmicutes (min: 4%, max: 12%, mean: 7%). Within the Proteobacteria
256
phylum, the class Betaproteobacteria was the most abundant at each of the sites, with the
257
exception of sampling dates 10/24/16 and 11/14/16 within the tidal brackish river, MA04, and
13
258
11/14/16 within MA08, in which bacteria from the Alphaproteobacteria class were the most
259
abundant within the Proteobacteria phylum (Figure 1B).
260
To further classify the taxonomic composition, we considered the bacterial assignments
261
at the genus level and compared them within and among sites (Figure 2). Between 74-90% of
262
contigs could be assigned to a bacterial genus. In total, there were 2,207 different genera, with
263
789 identified at some abundance in all of the samples. Of these, 44 occurred at a relative
264
abundance ≥1% in at least one sample. Interestingly, the majority of sites were dominated by the
265
same bacterial genera. However, there were some trends that demarcated specific sites. For
266
instance, Variovorax was the most abundant genus in 17 of the 24 samples, (min: 2%, max: 32%,
267
mean: 11%). However, in all the samples collected from the freshwater pond, MA10, and the
268
sample collected on 10/24/16 from the reclamation facility, MA01, Streptomyces was the most
269
abundant. Similarly, Pusillimonas was the most abundant in MA03, a freshwater creek
270
(11/14/16), and MA04, a brackish river (10/10/16), while Nostoc was the most abundant in
271
MA04 (10/24/16).
272
Furthermore, when we compared the normalized abundance of the dominant genera
273
among the different sites and tested their significance via an ANOVA with post-hoc Tukey HSD
274
we found significant differences for the genera Mesorhizobium, Cystobacter, Microvirga,
275
Microcystis, Gimesia, and Prochlorococcus (Table S4). Notably, we found that MA04, a
276
brackish river, had a significantly (p <0.05) higher abundance of Gimesia and Microcystis than
277
all of the other sites except MA01, a reclamation facility. Alternatively, MA01 had a
278
significantly (p <0.05) higher abundance of Prochlorococcus than all of the other sites except
279
MA04.
280
14
281 282
3.2 Functional analysis of bacterial-assigned translated peptide ORFs In addition to the taxonomic analysis, the metagenomic data generated in this study
283
allowed us to look at the putative function assigned to each of the translated peptide ORFs. On
284
average, 56% (min: 31%, max: 75%) of translated ORFs were assigned at least one GO-term,
285
with the majority (min: 81%, max: 98%, mean: 93%) coming from contigs assigned as bacteria.
286
Overall, the dominant putative molecular functions and biological processes that could be
287
assigned were similar among the various water sources. The GO-terms assigned at the greatest
288
frequency to all of the water sources were: transferase activity (GO:0016740) (min: 25%, max:
289
29%, mean: 27%), hydrolase activity (GO:0016787) (min: 25%, max: 27%, mean: 26%), ATP
290
binding (GO:0005524) (min: 18%, max: 25%, mean: 20%), oxidation-reduction process
291
(GO:0055114) (min: 17%, max: 20%, mean: 18%), and catalytic activity (GO:0003824) (min:
292
15%, max: 18%, mean: 17%) (Figure 3A).
293
Additionally, we explored the presence of the GO-term for pathogenesis (GO:0009405),
294
as well as its child term, toxin activity (GO:0090729), and terms associated with antibiotic
295
GO:0046677) and drug resistance (GO:0042493) (Korves and Colosimo, 2009). Again, the
296
normalized abundance was calculated and totaled for bacterial contigs containing translated
297
ORFs assigned to one or more of these GO-terms (Figure 3B). The GO-term for pathogenesis
298
was assigned to the greatest portion between 0.7 and 4% (mean: 2%), with the majority of these
299
contigs taxonomically assigned as Pseudomonas (min: 6%, max: 14%, mean: 9%) and
300
Pusillimonas (min: 1%, max: 12%, mean: 5%) (Figure S3). While the greatest percentage of
301
pathogenesis containing contigs was identified at MA03 on 11/14/16, there was no statistically
302
significant difference among the sites.
303
15
304 305
3.3 Translated peptide ORF clustering In addition to functional analysis, the complete translated peptide ORFs were clustered at
306
60% to assess the shared and unique functional profiles among the different sampling sites
307
(Figure S2, Table S5) (Li and Godzik, 2006). Despite the majority of ORFs assigned to the same
308
GO-term, most of the translated ORFs clustered within site, highlighting the unique peptide
309
sequences composition at each water source. The two tidal brackish rivers, MA04 and MA08,
310
had the greatest fraction of unique translated ORFs, 63% and 51%, respectively, followed by
311
MA07 (48%), MA11 (41%), MA03 (36%), MA02 (36%), MA01 (34%), MA10 (33%), and
312
MA06 (31%). The remaining fraction of translated ORFs from each site clustered with one or
313
more different sites and, in some cases, revealed a putative connection between reclaimed water
314
sites and ponds. In total, 51%, 41%, and 38% of the translated ORFs from the reclaimed water
315
sites MA02, MA06, and MA01 clustered with translated ORFs from the ponds MA11, MA10,
316
and MA10, respectively. In addition to MA10, a large percentage of translated ORFs (47%) from
317
MA06 also clustered with those from the non-tidal freshwater creek, MA03.
318
Additionally, between 4% and 9% of the translated ORFs clustered with at least one
319
member from all the other sites, representing a “functional core” among all the sampled water
320
sources. Within this core, 92% of the translated ORFs were assigned a GO-term. Similar to the
321
functional analysis, the majority of these GO-terms were for transferase activity (GO:0016740,
322
23%), oxidation-reduction process (GO:0055114, 20%), and hydrolase activity (GO:0016787,
323
19%), as well as translation (GO:0006412, 18%) and metal ion binding (GO:0046872, 18%).
324
Suggesting that the most abundant functions are those shared among the different sites. Not
325
surprisingly, in addition to these molecular functions and biological processes, we also saw an
16
326
abundance of cellular components assigned to the core, such as membrane (GO:0016020, 24%),
327
integral component of membrane (GO:0016021, 22%), and cytoplasm (GO:0005737, 21%).
328 329
3.4 Antibiotic Resistance
330
To further assess antibiotic resistance, we conducted a BLAST analysis of translated
331
peptide ORFs against CARD. Across all samples, 114 translated ORFs were identified as 32
332
unique ARGs conferring resistance to over ten drug classes, including resistance mechanisms
333
associated with target mutations and those associated with dedicated antibiotic resistance gene
334
products (McArthur et al., 2013). For the former, proteins were confirmed to carry the following
335
mutations: KasA, R121K (Lee et al., 1999); GyrA, S95T (Sulochana et al., 2007); MurA, C117D
336
(De Smet et al., 1999), RpsL, K88R, K43R (Brzostek et al., 2004;Ballif et al., 2012), RpsJ,
337
V57M (Kubanov et al., 2016), and EF-Tu Q124K (Zuurmond et al., 1999). Overall, the
338
reclamation water sampled on 11/14/16 from MA06 contained the greatest diversity of ARGs,
339
with 22 unique ARGs (identified from 25 ARG-like ORFs) (Figure 4). This was followed by the
340
non-tidal freshwater creek, MA03, sampled on 10/24/16, which had 13 unique ARGs (identified
341
from 14 ARG-like ORFs). The other non-tidal freshwater creek, MA07, contained the lowest
342
diversity of ARGs, with just one unique ARG identified throughout the entire sampling period.
343
We also identified the source genera and phyla of each ARG-like ORFs by parsing the
344
contig taxa identified previously (Figure 5). All the ARG-like ORFs originated from contigs
345
assigned as Bacteria, with the majority coming from Actinobacteria (59 of 114; 52%). While the
346
majority of these were rpsL genes, Actinobacteria genera were also associated with 12 other
347
unique ARGs. Furthermore, 31 ARG-like ORFs were classified as Proteobacteria (11 alpha, 6
348
beta, 14 gamma) and encompassed the majority of ARG diversity, with 26 unique ARGs.
17
349 350 351
3.5 CRISPR array abundance and taxonomy In addition to identifying traditional genes of concern we also sought to determine the
352
phage-host relationships within and among sites using CRISPR arrays. CRISPR arrays were
353
predicted in every library for a total of 612 arrays on 604 contigs. For the contigs that had a
354
predicted CRISPR array, we calculated their normalized abundance (Figure 6A). Overall,
355
CRISPR containing contigs only accounted for between 0.003% and 0.04% of the total contig
356
abundance among all samples. However, on average, the tidal brackish water site, MA04, had
357
both the greatest number of detected CRISPR arrays (238 across 234 contigs) and the highest
358
normalized abundance of the CRISPR-containing contigs.
359
To identify the taxonomy of the contigs containing putative CRISPR arrays, we parsed
360
the BLASTp assigned taxa. However, similar to previous studies (Gogleva et al., 2014) making
361
taxonomic predictions for CRISPR-containing contigs was difficult. Only 22% (130/604) of
362
CRISPR containing contigs could be assigned a taxa (Figure S4), the majority of which were of
363
the phyla Proteobacteria (45%) made up of 38% Gammaproteobacteria, 29%
364
Alphaproteobacteria and 17% Betaproteobacteria. This was followed by Firmicutes (21%) and
365
Cyanobacteria (11%).
366 367 368
3.6 CRISPR spacers within and among sites To determine the number of unique spacers, we clustered all the sampling dates within
369
each site at 97% nucleotide similarity. Not surprisingly, considering it had the greatest number of
370
CRISPR arrays, MA04, a brackish river site, also had the greatest number of unique spacers
371
(1173 spacers) followed by MA10 (398 spacers), MA11 (321 spacers), MA01 (293 spacers),
18
372
MA06 (269 spacers), MA03 (124 spacers), MA08 (161 spacers), MA07 (25 spacers), and MA02
373
(21 spacers) (Figure 6B). These unique spacers were then clustered together to produce the
374
number of shared spacers among the different sites. We observed similar trends to what was
375
found with the peptide clustering, where ponds and reclamation sites had the greatest similarity.
376
Here, the reclaimed sites shared the highest degree of spacers with pond sites, with MA01 and
377
MA10 sharing the most (69 spacers).
378 379
19
380 381
4. Discussion: Water reuse is an important practice to mitigate our dependence on dwindling
382
groundwater supplies. However, across the farm-to-fork continuum, irrigation water is a known
383
source of microbial contamination of fresh produce and, therefore, must be subjected to scrutiny.
384
In the present study, we utilized metagenomics to characterize multiple facets of the microbial
385
communities from a variety of irrigation water sites, including the bacterial community and
386
functional composition, phage infection history, and the presence of pathogenic genes and
387
ARGs. Together, these data provide a more complete picture of the microbial community
388
consortium within alternative water sources and showcase those sources with the greatest
389
putative risk to public health and food safety.
390
When comparing the bacterial communities amongst the different water sources the
391
greatest differences were between the fresh and brackish waters (Figure 1, 2, S2). This is likely
392
due to the mixing of fresh and saline water in the brackish sites enabling the co-occurrence of
393
bacteria typically associated with both freshwater and marine systems (e.g. Alphaproteobacteria
394
and Betaproteobacteria)(Piwosz et al., 2013;Gołębiewski et al., 2017). Additionally, the influx
395
of saline water likely selected for microorganisms tolerant of high salinity, such as some species
396
of Microcystis (Tonk et al., 2007;Ferreira et al., 2016). This may be of concern considering that
397
species of Microcystis are capable of producing hepatotoxins that have demonstrated some
398
ability to bioaccumulate in produce (Crush et al., 2008) – a potential source of toxin exposure not
399
often considered when assessing irrigation water quality.
400
In addition to Microcystis found in brackish water, genera commonly associated with
401
human disease (e.g. Streptococcus, Enterococcus) were identified in all samples. However,
402
typical environmental genera such as Variovorax were the most abundant (Figure 2).
20
403
Variovorax falls within the family Comamonadaceae and is phylogenetically closely related to
404
Acidovorax, another widespread environmental genera (Anzai et al., 2000;Lee et al., 2010).
405
Variovorax has been found throughout a variety of environments, such as: drinking water (Lee et
406
al., 2010;Delafont et al., 2013), freshwater riverine water (Nakayama et al., 2017), groundwater
407
(Humphries et al., 2005;Posman et al., 2017), and soil (Lipson et al., 2009). As a result, it has
408
been suggested that Variovorax can adapt to different environmental constraints, likely due to its
409
ability to degrade a variety of organic compounds, including pollutants (Bers et al., 2011).This
410
heterogeneity in metabolic potential may be an effective means for out-competing foreign
411
invaders, such as human-associated bacteria, and likely the explanation for its dominance among
412
the different water sources.
413
To further investigate the potential public health impacts of these water sources we also
414
investigated the presence of ARGs. Previous studies have identified a diverse range of ARGs in
415
reclaimed waters (Fahrenfeld et al., 2013), natural surface water sites (Vaz-Moreira et al., 2014),
416
and even pristine environments (e.g. ancient permafrost (D’Costa et al., 2011;Van Goethem et
417
al., 2018). These studies reflect both the natural production of antibiotics by bacteria and those
418
introduced to the environment by way of human pollution. In this study, we identified ARGs in
419
metagenomes from the majority of sites. However, comparing the sites revealed the highest ARG
420
diversity in samples from the reclamation facility, MA06, and the non-tidal freshwater creek,
421
MA03 (Figure 4). In MA06, most of the ARGs confer resistance to antibiotics commonly used in
422
clinical and agricultural applications including aminoglycosides, sulfonamides, rifamycins,
423
macrolides, cephalosporins, fluoroquinolones, and tetracyclines that function through a suite of
424
resistance mechanisms, including antibiotic inactivation and efflux, as well as antibiotic target
425
alteration, protection, and replacement (McArthur et al., 2013;Control and Prevention, 2014;Jia
21
426
et al., 2016). This broad range of resistance is not surprising as wastewater treatment plants are
427
considered “hotspots” of antimicrobial resistance due to traces of antibiotics in the wastewater
428
driving selection pressure (Guo et al., 2017;Karkman et al., 2017). One of the most abundant
429
ARGs detected in the reclaimed water was ermF, which confers resistance to macrolide-
430
lincosamide-streptogramin (MLS) antibiotics. These antibiotics are used frequently to treat
431
Gram-positive infections and have been found to withstand wastewater treatment, persisting at
432
high levels in the effluent (McArdell et al., 2003;Szekeres et al., 2017).
433
For the freshwater creek, MA03, we also observed a high ARG diversity. This is likely
434
due to discharge from a wastewater treatment facility located upstream of the sampling location.
435
Previous studies have identified that wastewater effluent discharge into natural lotic systems may
436
increase the presence of antibiotics, antibiotic-resistant bacteria, and ARGs downstream (Iwane
437
et al., 2001;Sabri et al., 2018). A study of the Grote Beerze River in the Netherlands found
438
increased amounts of antibiotics and ARGs up to 20 km downstream of an effluent discharge
439
point compared to upstream samples (Sabri et al., 2018). However, it is important to note in our
440
study that ARG diversity was not consistently high throughout the entire sampling period. Due to
441
the limited number of time points it is difficult to determine whether this was the result of
442
changing abiotic and biotic factors throughout time or variability in sampling. Nevertheless, the
443
presence of these ARGs in irrigation water sources raises potential concerns for the use of these
444
waters in agricultural applications and should be investigated further. For instance, the ARG at
445
the highest abundance in the majority of the other water sources besides MA03 was the mutant
446
of rpsL (Figure 4). Mutants of rpsL are resistant to aminoglycosides, including streptomycin, an
447
antibiotic produced by the soil Actinobacteria Streptomyces griseus and used widely in clinical
448
and agricultural applications (Brzostek et al., 2004;Ballif et al., 2012). Mutations in rpsL confer
22
449
resistance by disrupting interactions between the ribosomal protein and the antibiotic. These
450
types of target mutations are frequently found in environmental bacteria and are thought to be the
451
result of spontaneous pleotropic mutations (Paulander et al., 2009;Allen et al., 2010). This is
452
consistent with the potential taxonomic origin of the majority of rpsL mutants, Ferrimicrobium,
453
an iron-oxidizing Actinobacteria found in mine waters (Johnson and Hallberg, 2003), geothermal
454
soils (Stott et al., 2008), and a waterlogged bog (Sanyika et al., 2012). In fact, while pathogen-
455
associated bacterial genera (e.g. Listeria, Vibrio, Escherichia) were identified to be potential
456
hosts of some ARGs, most of the putative ARG hosts were traditional environmental genera
457
(Figure 5). The presence of ARGs even in environmental genera may still be a concern, as
458
previous research suggests that ARGs may pass through horizontal gene transfer from
459
indigenous environmental bacteria to pathogens (Guo et al., 2017).
460
Finally, to extend our study to phage, a traditionally understudied component of
461
microbial ecosystems, we leveraged the CRISPR-Cas system. Overall, we identified CRISPR
462
arrays at all time points and sampling locations, suggesting that this defense system is present in
463
bacteria from surface and reclaimed waters. Interestingly, the tidal brackish river site, MA04,
464
had the greatest abundance of CRISPR arrays (Figure 6). As described earlier, the influx of
465
saline water at this site may have selected for a unique bacterial composition, including an
466
increased abundance of Microcystis (Figure 2, S2). Previously, CRISPR arrays have been
467
identified in strains of the cyanobacterium Microcystis aeruginosa isolated from a shallow
468
eutrophic reservoir (Kuno et al., 2012). In addition, the authors found that the Microcystis
469
spacers were rarely shared among the strains, which is similar to the results observed in this
470
study.
23
471
Within the CRISPR arrays, the majority of spacers were unique to each site, suggesting
472
specific interactions between phage and hosts. This is likely due to the native bacteria having
473
adapted to the local phage populations at each site (Kunin et al., 2008). As a result, it has been
474
hypothesized that spacers can be used as a molecular fingerprint to subtype bacteria and
475
potentially track pathogen outbreaks (Shariat and Dudley, 2014). However, there were samples
476
across sites that demonstrated shared spacers, with the most apparent between the reclaimed
477
water site, MA01, and the freshwater pond site, MA10. This suggests a similarity in community
478
composition that may be indicative of similar bacterial lineages and/or environmental exposures.
479
For instance, the reclaimed water sites were stored at least temporarily in irrigation
480
ponds/lagoons, which have similar topographical features to ponds. This may also explain the
481
high percentage of shared translated peptide ORFs observed between the two sites (Figure S2).
482
However, this connection needs to be explored in greater detail.
483
Essential for the use of untreated surface and reclaimed water sources is knowledge of
484
their risks to environmental health and food safety. Although this study provides valuable
485
foundational data, there are some limitations. In particular, our study utilizes shotgun
486
sequencing, a method that surveys all DNA in a sample regardless of its host viability and
487
activity. As a result, accurately assigning taxonomy and function to unknown DNA fragments
488
with this methodology is a challenge (Sczyrba et al., 2017). Not only does the approach used for
489
taxonomic assignment (e.g. k-mer-based vs. alignment-based) influence the accuracy of the
490
assignments (McIntyre et al., 2017), but the database used in this process also plays a role
491
(Nasko et al., 2018). Here, we sought to circumvent these challenges by utilizing a widely
492
benchmarked similarity-searching tool (i.e. BLAST) and one of the most comprehensive non-
493
redundant reference database (i.e. UniRef). Using this approach we are confident that our
24
494
assignments accurately reflect microorganisms in the database with the highest degree of
495
taxonomic and functional homology. Additionally, due to financial constraints we were only able
496
to shotgun sequence samples from a limited number of sampling time points that took place
497
during a larger 2-year longitudinal study. Therefore, we are limited in our power to detect
498
changes through time within this subsample. This is particularly evident for the reclamation
499
facilities in which we were only able to shotgun sequence samples from two dates, further
500
limiting the assessment of these sites. Nevertheless, the foundational knowledge gained through
501
this study can be used to inform future studies and support the implementation of adaptive on-
502
farm technologies (e.g. water treatment approaches and drip irrigation) that can reduce the spread
503
of pathogenic microbial constituents.
504 505 506
5. Conclusions: The overall aim of our study was to provide comprehensive, foundational knowledge on the
507
microbial communities in multiple alternative water sources through the use of shotgun
508
metagenomic sequencing, with the ultimate goal of highlighting putative risks to produce safety
509
and public health. The main conclusions drawn from these data are:
510
•
511 512
vary depending on the date sampled and the specific site. •
513 514
Features of the bacterial community in untreated surface and reclaimed water sources
In particular, sites impacted by saline water had the greatest divergence with regard to putative bacterial taxonomy and functionality.
•
Potential public health threats, such as ARGs, were at the greatest abundance and
515
diversity in samples from a reclamation facility and a creek impacted by run-off from a
516
nearby wastewater facility.
25
517 518
•
The high degree of CRISPR spacers shared between certain sites (e.g. pond, reclamation facility) suggests similar environmental constraints and phage-host relationships.
519 520 521
Declaration of interests
522
The authors declare that they have no competing interests.
523 524
Acknowledgements:
525
This work was supported by the United States Department of Agriculture-National Institute of
526
Food and Agriculture, Grant number 2016-68007-25064, awarded to the University of Maryland
527
School of Public Health, that established CONSERVE: A Center of Excellence at the Nexus of
528
Sustainable Water Reuse, Food and Health. J.C. was supported by the NSF Research Traineeship
529
program, COMBINE, Grant number 1632976. This work also utilized the Extreme Science and
530
Engineering Discovery Environment (XSEDE), Grant number ACI-1548562.TBA. The authors
531
would also like to thanks Anup Mahurkar and Tom Emmel (University of Maryland School of
532
Medicine, Institute for Genome Sciences) for assistance with high-throughput computing.
533 534
26
535 536
Figure Legends
A
Brackish River
MA04
MA08
Reclamation Facility
MA01 MA02 MA06
Freshwater Pond
MA10
MA11
Freshwater Creek
MA03
MA07
Phyla
100
Acidobacteria Actinobacteria Bacteroidetes can. Magasanikbacteria can. Nomurabacteria
Relative Abundance (%)
75
Chloroflexi Cyanobacteria Firmicutes Unclassified 50
Planctomycetes Proteobacteria Verrucomicrobia Other
25
Alpha Beta
75
Delta 50
Epsilon Gamma
25
Other Unclassified
10 -2 11 4 -1 4 10 -2 4 11 -1 4 10 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4
0
10 -1 0 10 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4
Relative Abundance (%)
B
Proteobacterial classes
0 100
Sampling Date
537 538 539
Figure 1: Normalized relative abundance of the bacterial taxa present in reclaimed and
540
untreated surface water sites at each sampling date. (A) Stacked bar chart depicting the
541
community structure at the phylum level. (B) Stacked bar chart of the Proteobacteria phyla split
542
into classes. Sites are grouped by water type and arranged in the order of sampling date.
543
Normalized abundance measured as contig coverage divided by the sum contig coverage per
544
million and presented as relative.
27
545 546
Figure 2: Taxonomic heatmap of the bacterial communities present in reclaimed and
547
untreated surface water sites at each sampling date. Heatmap based on the log-transformed
548
normalized abundance of the most dominant genera (>1% in at least one sample). Normalized
549
abundance measured as contig coverage divided by the sum contig coverage per million.
550 551 552 553 554 555 28
556 29
557
Figure 3: Functional profiles of bacterial communities present in reclaimed and untreated
558
surface water sites at each sampling date. (A) Heatmap of the number of translated peptide
559
ORFs assigned to the top ten GO terms for the biological and molecular categories in each
560
sample. The corresponding GO IDs are presented in parentheses. One translated ORF may be
561
matched to multiple GO terms. (B) Normalized abundance of contigs containing translated
562
ORFs assigned at each site to the following GO terms: pathogenesis (GO:0009405), toxin
563
activity (GO:0090729), response to antibiotic GO:0046677), response to drug (GO:0042493).
564
Sites are grouped by water type and arranged in the order of sampling date.
565 566
30
567 568
Figure 4: Antibiotic resistance genes (ARGs) predicted in reclaimed and untreated surface
569
water sites at each sampling date. Dotplot showing the ARG-like translated peptide ORFs
570
present at each water site, with the size of each dot equivalent to the number of translated ORFs
571
with homology to each ARG listed on the y-axis, and the color representative of the water type.
572
ARG drug class designations consistent with the ontology from the Comprehensive Antibiotic
573
Resistance Database. *MLS: macrolide, lincosamide, streptogramin antibiotic.
574
31
575 576
Figure 5: Bipartite network of the bacterial taxa with predicted antibiotic resistance genes
577
(ARGs) from reclaimed and untreated surface water sites. Uncolored circles represent ARGs
578
connected by an edge to its putative bacterial host, with each edge colored by the water type of
579
the sample it was identified in. Bacterial host defined as the taxa assigned to the contig the ARG-
580
like translated peptide ORF originated from and colored accordingly.
581
32
582 33
583
Figure 6: CRISPR array abundance and spacer overlap within and among reclaimed and
584
untreated surface water sites. (A) Bar plot of the normalized abundance of contigs containing
585
CRISPR arrays. Bars are colored by water type and labeled with the number of contigs
586
containing CRISPR arrays. Normalized abundance measured as contig coverage divided by the
587
sum contig coverage per million. (B) Network of shared spacers (97% identity) among
588
alternative water sites. Donut-shaped centroid nodes represent each of the nine sampling sites,
589
with the size equivalent to the number of spacers within that site, and the color representative of
590
the water type. Nodes connecting the centroids represent shared spacers between and among
591
sites.
34
592
References
593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634
Allen, H.K., Donato, J., Wang, H.H., Cloud-Hansen, K.A., Davies, J., and Handelsman, J. (2010). Call of the wild: antibiotic resistance genes in natural environments. Nature Reviews Microbiology 8, 251. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. Journal of molecular biology 215, 403-410. Anzai, Y., Kim, H., Park, J.-Y., Wakabayashi, H., and Oyaizu, H. (2000). Phylogenetic affiliation of the pseudomonads based on 16S rRNA sequence. International journal of systematic and evolutionary microbiology 50, 1563-1589. Ballif, M., Harino, P., Ley, S., Coscolla, M., Niemann, S., Carter, R., Coulter, C., Borrell, S., Siba, P., and Phuanukoonnon, S. (2012). Drug resistance-conferring mutations in Mycobacterium tuberculosis from Madang, Papua New Guinea. BMC microbiology 12, 191. Bers, K., Sniegowski, K., Albers, P., Breugelmans, P., Hendrickx, L., De Mot, R., and Springael, D. (2011). A molecular toolbox to estimate the number and diversity of Variovorax in the environment: application in soils treated with the phenylurea herbicide linuron. FEMS microbiology ecology 76, 14-25. Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C., and Hugenholtz, P. (2007). CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC bioinformatics 8, 209. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, btu170. Bruinsma, J. (Year). "The resource outlook to 2050: By how much do land, water and crop yields need to increase by 2050", in: Expert meeting on how to feed the world in), 2426. Brzostek, A., Sajduda, A., Śliwiński, T., Augustynowicz-Kopeć, E., Jaworski, A., Zwolska, Z., and Dziadek, J. (2004). Molecular characterisation of streptomycin-resistant Mycobacterium tuberculosis strains isolated in Poland. The International Journal of Tuberculosis and Lung Disease 8, 1032-1035. Chopyk, J., Allard, S., Nasko, D.J., Bui, A., Mongodin, E.F., and Sapkota, A.R. (2018). Agricultural freshwater pond supports diverse and dynamic bacterial and viral populations. Frontiers in Microbiology 9, 792. Chopyk, J., Chattopadhyay, S., Kulkarni, P., Claye, E., Babik, K.R., Reid, M.C., Smyth, E.M., Hittle, L.E., Paulson, J.N., and Cruz-Cano, R. (2017). Mentholation affects the cigarette microbiota by selecting for bacteria resistant to harsh environmental conditions and selecting against potential bacterial pathogens. Microbiome 5, 22. Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., Mcpherson, A., Szcześniak, M.W., Gaffney, D.J., Elo, L.L., and Zhang, X. (2016). A survey of best practices for RNA-seq data analysis. Genome biology 17, 1. Consortium, U. (2009). The universal protein resource (UniProt) in 2010. Nucleic acids research 38, D142-D148. Control, C.F.D., and Prevention (2014). "Outpatient antibiotic prescriptions—United States".).
35
635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678
Crush, J., Briggs, L., Sprosen, J., and Nichols, S. (2008). Effect of irrigation with lake water containing microcystins on microcystin content and growth of ryegrass, clover, rape, and lettuce. Environmental Toxicology 23, 246-252. D’costa, V.M., King, C.E., Kalan, L., Morar, M., Sung, W.W., Schwarz, C., Froese, D., Zazula, G., Calmels, F., and Debruyne, R. (2011). Antibiotic resistance is ancient. Nature 477, 457. De Smet, K.A., Kempsell, K.E., Gallagher, A., Duncan, K., and Young, D.B. (1999). Alteration of a single amino acid residue reverses fosfomycin resistance of recombinant MurA from Mycobacterium tuberculosis. Microbiology 145, 3177-3184. Delafont, V., Brouke, A., Bouchon, D., Moulin, L., and Héchard, Y. (2013). Microbiome of freeliving amoebae isolated from drinking water. Water research 47, 6958-6965. Dieter, C.A., Maupin, M.A., Caldwell, R.R., Harris, M.A., Ivahnenko, T.I., Lovelace, J.K., Barber, N.L., and Linsey, K.S. (2018). "Estimated use of water in the United States in 2015". US Geological Survey). Drury, B., Rosi-Marshall, E., and Kelly, J.J. (2013). Wastewater treatment effluent reduces the abundance and diversity of benthic bacterial communities in urban and suburban rivers. Applied and environmental microbiology, AEM. 03527-03512. Enault, F., Briet, A., Bouteille, L., Roux, S., Sullivan, M.B., and Petit, M.-A. (2017). Phages rarely encode antibiotic resistance genes: a cautionary tale for virome analyses. The ISME journal 11, 237. Fahrenfeld, N.L., Ma, Y., O'brien, M., and Pruden, A. (2013). Reclaimed water as a reservoir of antibiotic resistance genes: distribution system and irrigation implications. Frontiers in microbiology 4, 130. Ferreira, C., Soares, A.R., Lamosa, P., Santos, M.A., and Da Costa, M.S. (2016). Comparison of the compatible solute pool of two slightly halophilic planctomycetes species, Gimesia maris and Rubinisphaera brasiliensis. Extremophiles 20, 811-820. Frost, L.S., Leplae, R., Summers, A.O., and Toussaint, A. (2005). Mobile genetic elements: the agents of open source evolution. Nature Reviews Microbiology 3, 722. Gogleva, A.A., Gelfand, M.S., and Artamonova, I.I. (2014). Comparative analysis of CRISPR cassettes from the human gut metagenomic contigs. BMC genomics 15, 202. Gołębiewski, M., Całkiewicz, J., Creer, S., and Piwosz, K. (2017). Tideless estuaries in brackish seas as possible freshwater‐marine transition zones for bacteria: the case study of the Vistula river estuary. Environmental microbiology reports 9, 129-143. Guo, J., Li, J., Chen, H., Bond, P.L., and Yuan, Z. (2017). Metagenomic analysis reveals wastewater treatment plants as hotspots of antibiotic resistance genes and mobile genetic elements. Water research 123, 468-478. Hagström, Å., Pommier, T., Rohwer, F., Simu, K., Stolte, W., Svensson, D., and Zweifel, U.L. (2002). Use of 16S ribosomal DNA for delineation of marine bacterioplankton species. Applied and environmental microbiology 68, 3628-3633. Heidelberg, J.F., Nelson, W.C., Schoenfeld, T., and Bhaya, D. (2009). Germ warfare in a microbial mat community: CRISPRs provide insights into the co-evolution of host and viral genomes. PLoS One 4, e4169. Horvath, P., Romero, D.A., Coûté-Monvoisin, A.-C., Richards, M., Deveau, H., Moineau, S., Boyaval, P., Fremaux, C., and Barrangou, R. (2008). Diversity, activity, and evolution
36
679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724
of CRISPR loci in Streptococcus thermophilus. Journal of bacteriology 190, 14011412. Humphries, J., Ashe, A.M., Smiley, J., and Johnston, C. (2005). Microbial community structure and trichloroethylene degradation in groundwater. Canadian journal of microbiology 51, 433-439. Huntley, R.P., Sawford, T., Mutowo-Meullenet, P., Shypitsyna, A., Bonilla, C., Martin, M.J., and O'donovan, C. (2014). The GOA database: gene ontology annotation updates for 2015. Nucleic acids research 43, D1057-D1063. Iwane, T., Urase, T., and Yamamoto, K. (2001). Possible impact of treated wastewater discharge on incidence of antibiotic resistant bacteria in river water. Water Science and Technology 43, 91-99. Jia, B., Raphenya, A.R., Alcock, B., Waglechner, N., Guo, P., Tsang, K.K., Lago, B.A., Dave, B.M., Pereira, S., and Sharma, A.N. (2016). CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic acids research, gkw1004. Johnson, D.B., and Hallberg, K.B. (2003). The microbiology of acidic mine waters. Research in microbiology 154, 466-473. Karkman, A., Do, T.T., Walsh, F., and Virta, M.P. (2017). Antibiotic-resistance genes in waste water. Trends in microbiology. Katoh, K., Misawa, K., Kuma, K.I., and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research 30, 3059-3066. Kolde, R., and Kolde, M.R. (2018). Package ‘pheatmap’. Korves, T., and Colosimo, M.E. (2009). Controlled vocabularies for microbial virulence factors. Trends in microbiology 17, 279-285. Kubanov, A., Vorobyev, D., Chestkov, A., Leinsoo, A., Shaskolskiy, B., Dementieva, E., Solomka, V., Plakhova, X., Gryadunov, D., and Deryabin, D. (2016). Molecular epidemiology of drug-resistant Neisseria gonorrhoeae in Russia (Current Status, 2015). BMC infectious diseases 16, 389. Kulkarni, P., Olson, N.D., Paulson, J.N., Pop, M., Maddox, C., Claye, E., Goldstein, R.E.R., Sharma, M., Gibbs, S.G., and Mongodin, E.F. (2018). Conventional wastewater treatment and reuse site practices modify bacterial community structure but do not eliminate some opportunistic pathogens in reclaimed water. Science of The Total Environment 639, 1126-1137. Kunin, V., He, S., Warnecke, F., Peterson, S.B., Martin, H.G., Haynes, M., Ivanova, N., Blackall, L.L., Breitbart, M., and Rohwer, F. (2008). A bacterial metapopulation adapts locally to phage predation despite global dispersal. Genome Research 18, 293-297. Kuno, S., Yoshida, T., Kaneko, T., and Sako, Y. (2012). Intricate interactions between the bloom-forming cyanobacterium Microcystis aeruginosa and foreign genetic elements, revealed by diversified clustered regularly interspaced short palindromic repeat (CRISPR) signatures. Applied and environmental microbiology 78, 5353-5360. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359. Lee, A.S., Lim, I.H., Tang, L.L., Telenti, A., and Wong, S.Y. (1999). Contribution of kasA Analysis to Detection of Isoniazid-Resistant Mycobacterium tuberculosisin Singapore. Antimicrobial agents and chemotherapy 43, 2087-2089. 37
725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768
Lee, J., Lee, C., Hugunin, K., Maute, C., and Dysko, R. (2010). Bacteria from drinking water supply and their fate in gastrointestinal tracts of germ-free mice: a phylogenetic comparison study. Water research 44, 5050-5058. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078-2079. Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659. Lin, Y., Li, D., Zeng, S., and He, M. (2016). Changes of microbial composition during wastewater reclamation and distribution systems revealed by high-throughput sequencing analyses. Frontiers of Environmental Science & Engineering 10, 539-547. Lipson, D.A., Monson, R.K., Schmidt, S.K., and Weintraub, M.N. (2009). The trade-off between growth rate and yield in microbial communities and the consequences for under-snow soil respiration in a high elevation coniferous forest. Biogeochemistry 95, 23-35. Lopatina, A., Medvedeva, S., Shmakov, S., Logacheva, M.D., Krylenkov, V., and Severinov, K. (2016). Metagenomic analysis of bacterial communities of Antarctic surface snow. Frontiers in microbiology 7, 398. Luczkiewicz, A., Kotlarska, E., Artichowicz, W., Tarasewicz, K., and Fudala-Ksiazek, S. (2015). Antimicrobial resistance of Pseudomonas spp. isolated from wastewater and wastewater-impacted marine coastal zone. Environmental Science and Pollution Research 22, 19823-19834. Magoč, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957-2963. Mark Ibekwe, A., Leddy, M.B., Bold, R.M., and Graves, A.K. (2012). Bacterial community composition in low-flowing river water with different sources of pollutants. FEMS microbiology ecology 79, 155-166. Mcardell, C.S., Molnar, E., Suter, M.J.-F., and Giger, W. (2003). Occurrence and fate of macrolide antibiotics in wastewater treatment plants and in the Glatt Valley Watershed, Switzerland. Environmental Science & Technology 37, 5479-5486. Mcarthur, A.G., Waglechner, N., Nizam, F., Yan, A., Azad, M.A., Baylay, A.J., Bhullar, K., Canova, M.J., De Pascale, G., and Ejim, L. (2013). The comprehensive antibiotic resistance database. Antimicrobial agents and chemotherapy, AAC. 00419-00413. Miller, J.D., Schoonover, J.E., Williard, K.W., and Hwang, C.R. (2011). Whole catchment land cover effects on water quality in the lower Kaskaskia River watershed. Water, Air, & Soil Pollution 221, 337. Munir, M., Wong, K., and Xagoraraki, I. (2011). Release of antibiotic resistant bacteria and genes in the effluent and biosolids of five wastewater utilities in Michigan. Water research 45, 681-693. Nakayama, T., Hoa, T.T.T., Harada, K., Warisaya, M., Asayama, M., Hinenoya, A., Lee, J.W., Phu, T.M., Ueda, S., and Sumimura, Y. (2017). Water metagenomic analysis reveals low bacterial diversity and the presence of antimicrobial residues and resistance genes in a river containing wastewater from backyard aquacultures in the Mekong Delta, Vietnam. Environmental pollution 222, 294-306.
38
769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812
Nasko, D.J., Ferrell, B.D., Moore, R.M., Bhavsar, J.D., Polson, S.W., and Wommack, K.E. (2019). CRISPR spacers indicate preferential matching of specific virioplankton genes. mBio 10, e02651-02618. Newton, R.J., Jones, S.E., Eiler, A., Mcmahon, K.D., and Bertilsson, S. (2011). A guide to the natural history of freshwater lake bacteria. Microbiology and Molecular Biology Reviews 75, 14-49. Noguchi, H., Park, J., and Takagi, T. (2006). MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic acids research 34, 5623-5630. Nurk, S., Meleshko, D., Korobeynikov, A., and Pevzner, P. (2016). metaSPAdes: a new versatile de novo metagenomics assembler. arXiv preprint arXiv:1604.03071. Paulander, W., Maisnier-Patin, S., and Andersson, D.I. (2009). The fitness cost of streptomycin resistance depends on rpsL mutation, carbon source and RpoS (σS). Genetics 183, 539-546. Piwosz, K., Salcher, M.M., Zeder, M., Ameryk, A., and Pernthaler, J. (2013). Seasonal dynamics and activity of typical freshwater bacteria in brackish waters of the Gulf of Gdańsk. Limnology and Oceanography 58, 817-826. Posman, K.M., Derito, C.M., and Madsen, E.L. (2017). Benzene degradation by a Variovorax species within a coal tar-contaminated groundwater microbial community. Applied and environmental microbiology 83, e02658-02616. Renwick, W.H., Sleezer, R.O., Buddemeier, R.W., and Smith, S.V. (Year). "Small artificial ponds in the United States: impacts on sedimentation and carbon budget", in: Proceedings of the Eighth Federal Interagency Sedimentation Conference), 738-744. Rho, M., Wu, Y.-W., Tang, H., Doak, T.G., and Ye, Y. (2012). Diverse CRISPRs evolving in human microbiomes. PLoS genetics 8, e1002441. Sabri, N., Schmitt, H., Van Der Zaan, B., Gerritsen, H., Zuidema, T., Rijnaarts, H., and Langenhoff, A. (2018). Prevalence of antibiotics and antibiotic resistance genes in a wastewater effluent-receiving river in the Netherlands. Journal of Environmental Chemical Engineering. Sanyika, T.W., Stafford, W., and Cowan, D.A. (2012). The soil and plant determinants of community structures of the dominant actinobacteria in Marion Island terrestrial habitats, Sub-Antarctica. Polar biology 35, 1129-1141. Shanks, O.C., Newton, R.J., Kelty, C.A., Huse, S.M., Sogin, M.L., and Mclellan, S.L. (2013). Comparison of microbial community structure in untreated wastewater from different geographic locales. Applied and environmental microbiology, AEM. 0344803412. Shariat, N., and Dudley, E.G. (2014). CRISPRs: molecular signatures used for pathogen subtyping. Appl. Environ. Microbiol. 80, 430-439. Snyder, J.C., Bateson, M.M., Lavin, M., and Young, M.J. (2010). Use of cellular CRISPR (clusters of regularly interspaced short palindromic repeats) spacer-based microarrays for detection of viruses in environmental samples. Applied and environmental microbiology 76, 7251-7258. Staley, C., Gould, T.J., Wang, P., Phillips, J., Cotner, J.B., and Sadowsky, M.J. (2014). Bacterial community structure is indicative of chemical inputs in the Upper Mississippi River. Frontiers in microbiology 5, 524.
39
813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850
Stott, M.B., Crowe, M.A., Mountain, B.W., Smirnova, A.V., Hou, S., Alam, M., and Dunfield, P.F. (2008). Isolation of novel bacteria, including a candidate division, from geothermal soils in New Zealand. Environmental Microbiology 10, 2030-2041. Sulochana, S., Narayanan, S., Paramasivan, C., Suganthi, C., and Narayanan, P. (2007). Analysis of fluoroquinolone resistance in clinical isolates of Mycobacterium tuberculosis from India. Journal of chemotherapy 19, 166-171. Suslow, T., Oria, M., Beuchat, L., Garrett, E., Parish, M., Harris, L., Farber, J., and Busta, F. (2003). Production practices as risk factors in microbial food safety of fresh and fresh‐cut produce. Comprehensive Reviews in Food Science and Food Safety 2, 3877. Szczepanowski, R., Linke, B., Krahn, I., Gartemann, K.-H., Guetzkow, T., Eichler, W., Pühler, A., and Schlueter, A. (2009). Detection of 140 clinically relevant antibiotic-resistance genes in the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to selected antibiotics. Microbiology 155, 2306-2319. Szekeres, E., Baricz, A., Chiriac, C.M., Farkas, A., Opris, O., Soran, M.-L., Andrei, A.-S., Rudi, K., Balcázar, J.L., and Dragos, N. (2017). Abundance of antibiotics, antibiotic resistance genes and bacterial community composition in wastewater effluents from different Romanian hospitals. Environmental Pollution 225, 304-315. Tonk, L., Bosch, K., Visser, P.M., and Huisman, J. (2007). Salt tolerance of the harmful cyanobacterium Microcystis aeruginosa. Aquatic Microbial Ecology 46, 117-123. Van Goethem, M.W., Pierneef, R., Bezuidt, O.K., Van De Peer, Y., Cowan, D.A., and Makhalanyane, T.P. (2018). A reservoir of ‘historical’antibiotic resistance genes in remote pristine Antarctic soils. Microbiome 6, 40. Vaz-Moreira, I., Nunes, O.C., and Manaia, C.M. (2014). Bacterial diversity and antibiotic resistance in water habitats: searching the links with the human microbiome. FEMS microbiology reviews 38, 761-778. Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer Science & Business Media. Zankari, E., Hasman, H., Cosentino, S., Vestergaard, M., Rasmussen, S., Lund, O., Aarestrup, F.M., and Larsen, M.V. (2012). Identification of acquired antimicrobial resistance genes. Journal of antimicrobial chemotherapy 67, 2640-2644. Zhang, T., and Li, B. (2011). Occurrence, transformation, and fate of antibiotics in municipal wastewater treatment plants. Critical reviews in environmental science and technology 41, 951-998. Zuurmond, A.-M., Olsthoorn-Tieleman, L.N., De Graaf, J.M., Parmeggiani, A., and Kraal, B. (1999). Mutant EF-tu species reveal novel features of the enacyloxin IIa inhibition mechanism on the ribosome1. Journal of molecular biology 294, 627-637.
40
Table 1: Description of study sites. Water Type
Site ID
Catchment land use
MA10
Agricultural
MA11
Agricultural
MA04
Marshland/Fore sted
Pond
River MA08
MA03
Creek
MA07
Reclaimed
Marsh grasses on both sides (~25-50m wide) then pine woods. Wooded, Agronomic cropland adjacent to the creek (~30-50 m) Flood plain grasses and woodland (hardwoods)
MA01
Wooded pines with grass lanes
MA02
Agronomic cropland (Corn and Soybeans)
MA06
Native grass
Description Freshwater pond with a maximum depth of ~3.35 meters and a surface area of ~ 0.26 ha Freshwater pond with a maximum depth of ~3 meters and a surface area of ~0.40 ha Tidal brackish river flowing into the Choptank River. At sampling site, width of ~ 76 meters and depth of ~ 0.3-0.6 meters. Tidal brackish river flowing into the Chesapeake Bay. At sampling site, width of ~15 meters and depth of ~2-3 meters. Within 1-1.5 miles downstream from broiler concentrated animal feeding operations (CAFOs). Non-tidal freshwater creek tributary of the Nanticoke River. At sampling site width of ~3 meters and a depth of ~1 meter. Within 1 mile downstream from wastewater treatment discharge facility. Non-tidal freshwater creek tributary of the Nanticoke River. At sampling site width of ~10 meters and a depth of ~1 meter. Within 2.5 miles downstream from CAFO poultry houses. Influent is treated through activated sludge processing (Sequential Batch Reactor), filtration, UV light, chlorination, and then stored in an openair lagoon before land application. Influent is treated through activated sludge processing (Sequential Batch Reactor), filtration, UV light, chlorination, and then stored in an openair lagoon before land application. Influent is treated through grinding, activated sludge processing, secondary clarification, and then stored in an open-air lagoon. It is chlorinated prior to land application.
Highlights (limited to 85 characters each, with spaces) •
Metagenomes from irrigation water sources were analyzed across time and among different sites
•
Proteobacteria, especially Betaproteobacteria, dominated the majority of sites.
•
Alphaproteobacteria were abundant at specific times in the brackish river samples.
•
Genes associated with pathogenesis were detected at all sites.
•
ARG and CRISPR abundance/diversity varied based on specific site and sampling date.
Declaration of interests X The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. ☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: