Comparative metagenomic analysis of microbial taxonomic and functional variations in untreated surface and reclaimed waters used in irrigation applications

Comparative metagenomic analysis of microbial taxonomic and functional variations in untreated surface and reclaimed waters used in irrigation applications

Journal Pre-proof Comparative metagenomic analysis of microbial taxonomic and functional variations in untreated surface and reclaimed waters used in ...

2MB Sizes 0 Downloads 35 Views

Journal Pre-proof Comparative metagenomic analysis of microbial taxonomic and functional variations in untreated surface and reclaimed waters used in irrigation applications Jessica Chopyk, Daniel J. Nasko, Sarah Allard, Anthony Bui, Todd Treangen, Mihai Pop, Emmanuel F. Mongodin, Amy R. Sapkota PII:

S0043-1354(19)31024-3

DOI:

https://doi.org/10.1016/j.watres.2019.115250

Reference:

WR 115250

To appear in:

Water Research

Received Date: 11 December 2018 Revised Date:

8 October 2019

Accepted Date: 27 October 2019

Please cite this article as: Chopyk, J., Nasko, D.J., Allard, S., Bui, A., Treangen, T., Pop, M., Mongodin, E.F., Sapkota, A.R., Comparative metagenomic analysis of microbial taxonomic and functional variations in untreated surface and reclaimed waters used in irrigation applications, Water Research (2019), doi: https://doi.org/10.1016/j.watres.2019.115250. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.

Irrigation Water Source

Brackish River

Reclamation Facility

Freshwater Pond

Taxonomy

CRISPR

Microbial Community

Resistome

Functional Potential

Freshwater Creek

1

Comparative metagenomic analysis of microbial taxonomic and functional variations in

2

untreated surface and reclaimed waters used in irrigation applications

3 4 5 6

Jessica Chopyk1, Daniel J. Nasko2, Sarah Allard1, Anthony Bui1, Todd Treangen3, Mihai Pop2, Emmanuel F. Mongodin4, Amy R. Sapkota1*

7

1

8

Health, College Park, Maryland, USA; 2 Center for Bioinformatics and Computational Biology,

9

Institute for Advanced Computer Sciences, University of Maryland, College Park, Maryland,

Maryland Institute for Applied Environmental Health, University of Maryland School of Public

10

USA; 3Department of Computer Science, Rice University, Houston, Texas, USAs; 4 Institute for

11

Genome Sciences and Department of Microbiology and Immunology, University of Maryland

12

School of Medicine, Baltimore, Maryland, USA

13 14 15 16 17 18 19 20

*Corresponding Author:

21

Amy R. Sapkota, Ph.D., M.P.H.

22

Maryland Institute for Applied Environmental Health

23

University of Maryland School of Public Health

24

School of Public Health Building (255)

25

4200 Valley Drive, Room 2234P

26

College Park, MD 20742

27

Phone 301-405-1772

28

Fax 301-314-1012

1

29 30

Abstract The use of irrigation water sourced from reclamation facilities and untreated surface

31

water bodies may be a practical solution to attenuate the burden on diminishing groundwater

32

aquifers. However, comprehensive microbial characterizations of these water sources are

33

generally lacking, especially with regard to variations through time and across multiple water

34

types. To address this knowledge gap we used a shotgun metagenomic approach to characterize

35

the taxonomic and functional variations of microbial communities within two agricultural ponds,

36

two freshwater creeks, two brackish rivers, and three water reclamation facilities located in the

37

Mid-Atlantic, United States. Water samples (n=24) were collected from all sites between

38

October and November 2016, and filtered onto 0.2 µm membrane filters. Filters were then

39

subjected to total DNA extraction and shotgun sequencing on the Illumina HiSeq platform. From

40

these data, we found that Betaproteobacteria dominated the majority of freshwater sites, while

41

Alphaproteobacteria were abundant at times in the brackish waters. One of these brackish sites

42

was also host to a greater abundance of the bacterial genera Gimesia and Microcystis.

43

Furthermore, predicted microbial features (e.g. antibiotic resistance genes (ARGs) and Clustered

44

Regularly Interspaced Short Palindromic Repeats (CRISPR) arrays) varied based on specific site

45

and sampling date. ARGs were found across samples, with the diversity and abundance highest

46

in those from a reclamation facility and a wastewater-impacted freshwater creek. Additionally,

47

we identified over 600 CRISPR arrays, containing ~2,600 unique spacers, suggestive of a diverse

48

and often site-specific phage community. Overall, these results provide a better understanding of

49

the complex microbial community in untreated surface and reclaimed waters, while highlighting

50

possible environmental and human health impacts associated with their use in agriculture.

51

Keywords: metagenome, shotgun sequencing, agricultural irrigation, antibiotic resistance,

2

52

CRISPR, microbial communities, bacteria, bacteriophage

53 54 55 56 57

Abbreviations: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats); CARD

58

(Comprehensive Antibiotic Resistance Database); GO (Gene Ontology); ARG (Antibiotic

59

Resistance Gene); ORF (Open Reading Frame)

Running title: Microbial communities of irrigation waters

60 61

3

62 63

1. Introduction Steady declines in groundwater supplies, coupled with the estimation that by 2050 water

64

withdrawals for irrigation will increase by ~10%, has strengthened the demand for alternative

65

sources of water for agricultural applications (Bruinsma, 2009). The use of reclaimed (advanced

66

treated municipal wastewater) and untreated surface waters (e.g. brackish rivers, freshwater

67

creeks, and ponds) may provide effective solutions to reduce pressures on groundwater sources.

68

However, to date, there are very few studies that have comprehensively evaluated the taxonomy

69

and functional potential of microbial communities in alternative water sources. Moreover, there

70

has been limited research that has compared these microbial features across different sources.

71

Filling these knowledge gaps would identify the alternative water sources with the highest and

72

lowest degrees of putative risk with regard to both produce safety and public health. This

73

information is critical for informing the development of water reuse regulations.

74

Currently, reclaimed water is used as an alternative source of non-potable water,

75

especially in arid and semi-arid regions around the world (Dieter et al., 2018). However, because

76

it is the end-result of wastewater treatment, concerns remain about the levels of harmful

77

microbiological constituents that may persist in the water. Previous studies have identified that,

78

while wastewater treatment does reduce bacterial diversity, potential human pathogens

79

(Legionella spp., Mycobacterium spp., and Streptococcus spp.) may still be present and, in some

80

cases, selectively enriched by the disinfection process (Lin et al., 2016;Kulkarni et al., 2018).

81

Additionally, antibiotics introduced through municipal influent and agricultural runoff can

82

persist throughout the treatment process, resulting in high selection pressure for antibiotic-

83

resistant bacteria (Szczepanowski et al., 2009;Zhang and Li, 2011;Luczkiewicz et al., 2015).

84

However, reclaimed water characteristics are likely dependent on the quality of the influent and

4

85

the treatment practices employed by the wastewater treatment facility (Munir et al., 2011;Shanks

86

et al., 2013).

87

While reclaimed water is commonly considered the standard for water reuse, untreated

88

surface water sources may also represent practical alternatives for water management. For

89

instance, ponds are common features across the United States, with an estimated abundance

90

between 2.6 and 9 million (Renwick et al., 2006). They can occur naturally (e.g. floodplains,

91

isolated depressions), but are often human-constructed for a variety of utilitarian and aesthetic

92

purposes (Renwick et al., 2006). In agricultural settings, ponds may be constructed as a means of

93

capture and storage of freshwater for localized irrigation. While ponds are not as widely studied

94

as large lakes and marine systems, their unique topography may influence their microbial

95

community composition. For instance, ponds, and other surface water sites, are subjected to

96

environmental and anthropogenic factors that can largely be avoided when using groundwater

97

(e.g. animal fecal contamination and agricultural/urban runoff) (Suslow et al., 2003). Ponds also

98

tend to have a higher terrestrial-aquatic interchange compared to larger bodies of water (e.g.

99

lakes) that may drive the abundance of terrestrial microorganisms (Chopyk et al., 2018).

100

In contrast, lotic ecosystems (e.g. rivers, creeks) are marked by a natural flow, usually

101

toward another body of water such as an ocean or lake. Therefore, in addition to traditional

102

environmental factors, lotic systems may be impacted by connected waterways and upstream

103

discharge facilities. For instance, discharge of wastewater effluent into urban and suburban rivers

104

was reported to increase downstream organic and inorganic nutrient content and decrease overall

105

bacterial diversity in sediment (Drury et al., 2013). Alterations in bacterial community structure

106

have also been associated with catchment area (e.g. agricultural, forested, urban), likely due to

107

variations in nutrient concentration stemming from runoff (Miller et al., 2011;Staley et al., 2014).

5

108

Despite the potential importance of these water sources to regional biodiversity and water

109

management, previous studies on alternative water sources have relied on the amplification and

110

analysis of specific marker genes (e.g., 16S rRNA gene). While useful for broad community

111

analyses, these studies cannot explore functionality nor capture larger complex genomic features;

112

many of which are capable of providing a more holistic picture of the microbial community, such

113

as the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system. Because

114

the CRISPR-Cas system functions by integrating pieces of foreign DNA (i.e. plasmid, phage)

115

into recognizable arrays, predicting CRISPR can provide valuable information on phage

116

infection history, as well as provide a potential means to subtype pathogens (Shariat and Dudley,

117

2014). Previous studies exploring CRISPR-Cas from human body sites (Rho et al., 2012), dairy

118

operations (Horvath et al., 2008), and extreme environments (e.g. microbial mats) (Heidelberg et

119

al., 2009) such as hot springs (Snyder et al., 2010) and Antarctic snow (Lopatina et al., 2016)

120

have given us profound insights into the infection history and defense strategies of their

121

microbial communities. This is of particular importance when assessing the quality of irrigation

122

waters, because, while they are not direct human pathogens, phage are responsible for shaping

123

the diversity and genetic architecture of their host(s) and are often left unexplored (Frost et al.,

124

2005).

125

Therefore, the overall aim of this study was to provide comprehensive, foundational

126

knowledge on the microbial communities in multiple alternative irrigation water sources,

127

highlighting putative risks to produce safety and public health. To address this aim, we used

128

high-throughput shotgun metagenomic sequencing, which enabled us to evaluate the full

129

genomic spectrum present in the samples. With these data, our specific objectives were to: 1)

130

characterize and compare bacterial community composition; 2) predict and compare functional,

6

131

pathogenic, and antibiotic resistance genes; and 3) improve understanding of phage-host

132

relationships across multiple untreated surface and reclaimed water sites sampled over the course

133

of five weeks from October to November 2016 in the Mid Atlantic, United States.

7

134

2. Materials and Methods

135

2.1 Study Sites

136

Water samples (n=24) were collected from nine sites of four different water body types in

137

Maryland, United States: two tidal brackish rivers (MA04, MA08); two freshwater ponds

138

(MA10, MA11); two non-tidal freshwater creeks (MA03, MA07); and three water reclamation

139

facilitates (MA01, MA02, MA06) (Table 1).

140 141 142

2.2 Sample Collection: All sites were sampled on the following dates: 10/10/16, 10/24/16, and 11/14/16, with the

143

exception of the reclamation facilities (MA01, MA02, MA06) where samples were only

144

collected on 10/24/16 and 11/14/16. At each site, 1 L of water was collected. For the surface

145

water sites (e.g. creek, pond, river), sterile polypropylene sampling containers (Thermo Fisher

146

Scientific, MA, USA) were submerged 15-30 cm below the surface using a long-range grabbing

147

tool. For the reclamation facilities, 1 L of water was collected from a spigot, irrigation line or

148

storage lagoon, depending on the facility and sampling feasibility on that date. Samples were

149

transported on ice to the laboratory and stored at 4°C. In addition, a ProDSS digital sampling

150

system (YSI, Yellow Springs, OH, United States) was used to measure, in triplicate, the water

151

temperature (°C) and pH described in Table S1. Ambient temperature was also collected for the

152

time and date of sampling via the National Weather Services historical data archive.

153 154

2.3 Sample processing:

155

To remove the cellular fraction, each water sample was vacuum filtered through a 0.2 µm

156

membrane filter (Pall Corporation, MI, USA). Microbial DNA was then extracted from the filters

8

157

using an enzymatic and mechanical lysis procedure currently used in our lab to extract DNA

158

from various environmental biomes (Chopyk et al., 2017;Chopyk et al., 2018). Briefly, the filters

159

were added to lysing matrix tubes along with a cocktail of PBS buffer, lysozyme, lysostaphin,

160

and mutanolysin. After incubating, samples were subjected to a second lysing cocktail

161

(Proteinase K and SDS) followed by another incubation and mechanical lysis via bead beating.

162

The resulting DNA was purified with the QIAmp DNA mini kit (Qiagen, CA, USA) and

163

assessed with the NanoDrop 2000 Spectrophotometer.

164 165 166

2.4 Shotgun sequencing For each sample, DNA was used in a tagmentation reaction, followed by 12 cycles of

167

PCR amplification using Nextera i7 & i5 index primers per the modified Nextera XT protocol.

168

The final libraries were then quantitated by Quant-iT hsDNA kit. The libraries were pooled

169

based on their concentrations as determined by Quantstudio 5 and loaded onto an Agilent High

170

Sensitivity D1000 ScreenTape System. The samples were run across 8 lanes of an Illumina

171

Hiseq X10 flow (Illumina, San Diego, CA, United States) cell targeting 100 bp paired-end reads

172

per sample.

173 174

2.5 Metagenomic assembly

175

The resulting paired-end reads were quality trimmed using Trimmomatic ver. 0.36

176

(sliding window:4:30 min len:60) (Bolger et al., 2014), merged with FLASh ver. 1.2.11 (Magoč

177

and Salzberg, 2011), and assembled de novo with metaSPAdes ver. 3.10.1 (without read error

178

correction) (Nurk et al., 2016). Open reading frames (ORFs) were predicted from the assembled

179

contigs using MetaGene and translated into their corresponding peptides (Noguchi et al., 2006).

9

180 181 182

2.6 Taxonomic and functional classification Translated peptide ORFs were searched against UniRef 100 (retrieved May 2018) using

183

protein-protein BLAST (BLASTp ver. 2.6.0+) (E value ≤ 1e-3) (Altschul et al.,

184

1990;Consortium, 2009). Taxonomic classifications were then made to contigs by max

185

cumulative bit score. This was calculated by summing the bit scores of all taxa with a hit to

186

translated peptide ORF encoded by the contig. Functional assignments were made by assigning

187

Gene Ontology (GO) terms to translated ORFs. UniProt sequences are continually assigned GO

188

terms by the Gene Ontology Annotation (GOA) program (Huntley et al., 2014). Translated ORFs

189

were assigned all GO terms that were linked to UniRef 100 peptides within 3% of the top hit’s

190

bit score.

191

Coverage was calculated for each contig by recruiting quality-controlled reads to

192

assembled contigs using Bowtie2 ver. 2.3.3 (very sensitive local mode) and then using the

193

“depth” function of Samtools ver. 1.4.1 to compute the per-contig coverage (Langmead and

194

Salzberg, 2012) . To normalize abundances across libraries, contig and ORF coverages were

195

divided by the sum of coverage per million, similar to the transcripts per million (TPM) metric

196

used in RNA-Seq (Li et al., 2009;Conesa et al., 2016). Scripts performing these assignments and

197

normalization are available at https://github.com/dnasko/baby_virome. Taxonomic and

198

functional data were visualized using the R packages ggplot2 ver. 3.1.0 and pheatmap ver 1.0.10

199

(Wickham, 2009;Kolde and Kolde, 2018). Significance tests were conducted using an ANOVA

200

with post-hoc Tukey’s HSD Test.

201 202

2.7 Translated peptide ORF clustering

10

203

To assess the shared and unique functional profiles among the sampling sites, all

204

complete translated peptide ORFs were clustered at 60% with CD-HIT (Li and Godzik, 2006).

205

Cluster files were then parsed using the clstr2txt.pl script.

206 207 208

2.8 Identification of Antibiotic Resistance genes Translated peptide ORFs were searched against the “Comprehensive Antibiotic

209

Resistance Database” (CARD; retrieved July 2018) using protein-protein BLAST (BLASTp ver.

210

2.6.0+) (E value ≤ 1e-3) (Altschul et al., 1990;McArthur et al., 2013). A queried translated ORF

211

was regarded as ARG-like if >40% coverage and >80% amino-acid identity to a protein in the

212

CARD database (Zankari et al., 2012;Enault et al., 2017). Using the CARD database as a

213

reference, the putative ARGs were assigned a gene name and a drug class. For those conferring

214

resistance through mutations (i.e. kasA, gyrA, gyrB, murA, ndh, thyA, rpsL, rpsJ), a post-

215

processing step (MAFFT alignment with reference sequences available at CARD) was taken to

216

confirm the presence of resistance-conferring mutations (Katoh et al., 2002).

217 218 219

2.9 Prediction and analysis of CRISPRs CRISPR arrays were predicted from assembled contigs using the CRISPR detection and

220

validation tool, CASC available at https://github.com/dnasko/CASC (Nasko et al., 2019). CASC

221

utilizes a modified version of the CRISPR Recognition Tool (CRT) to call putative CRISPR

222

spacers (Bland et al., 2007). CASC then validates these spacers by searching against a database

223

of Cas proteins and CRISPR repeats to remove false positives and outputs FASTA files

224

containing: (1) valid CRISPR spacers; (2) false-positive CRISPR spacers; (3) valid CRISPR

225

repeats; and (4) false-positive CRISPR repeats. Valid CRISPR spacers were clustered with CD-

11

226

HIT-EST at 97% nucleotide similarity to determine the number of unique and shared spacers

227

within and among the sites (Li and Godzik, 2006).

228 229

2.10 Data availability

230

Metagenomic reads were submitted to NCBI’s Sequence Read Archive under the

231

BioProject accession number PRJNA473136 (SAMN10386131- SAMN10386154). The

232

assembled contigs were also submitted to NCBI under the same BioProject accession number

233

and BioSample IDs.

234 235

12

236

3 Results

237

3.1 Sequencing effort, assembly and taxonomy

238

All processed samples (n=24) were sequenced on the Illumina HiSeq for a total of

239

803,403,499 read pairs (Table S2), with an average of 33,475,146 per metagenome (+/− SD

240

9,122,444). After metagenomic assembly, an average of 50% of reads within a library recruited

241

to contigs from its assembly (11,383,477 contigs total; average contigs/library = 474,310 +/-

242

150,989 SD), with ~48% of reads within a library able to recruit to assembled contigs > 500 bp.

243

Using homology searching with BLAST we were able to identify the putative bacterial

244

genera and phyla that dominated each site at each time point. On average, 65% of contigs could

245

be confidently assigned a taxonomic representative (Table S3). Of these, between 78% and 98%

246

(mean: 91%) were assigned as Bacteria, followed by Eukaryota (min: 0.6%, max: 17%, mean:

247

4%), and Viruses (min: 0.6%, max: 10%, mean: 3%). For the contigs that could be identified, a

248

normalized abundance was calculated to account for sequencing effort and assembly/recruitment

249

proficiency. For those assigned as Eukaryota, the majority of the abundance was classified as

250

Streptophyta (min: 6%, max: 51%, mean: 21%), Arthropoda (min: 6%, max: 43%, mean: 15%),

251

and Chordata (min: 8%, max: 30%, mean: 15%) (Figure S1).

252

For those assigned as Bacteria, the most frequently observed bacterial phyla relative to

253

each sample was Proteobacteria (min: 35%, max: 83%, mean: 55%) (Figure 1A). This was

254

followed by Actinobacteria (min: 2%, max: 25%, mean: 13%), Bacteriodetes (min: 6%, max:

255

21%, mean: 13%) or Firmicutes (min: 4%, max: 12%, mean: 7%). Within the Proteobacteria

256

phylum, the class Betaproteobacteria was the most abundant at each of the sites, with the

257

exception of sampling dates 10/24/16 and 11/14/16 within the tidal brackish river, MA04, and

13

258

11/14/16 within MA08, in which bacteria from the Alphaproteobacteria class were the most

259

abundant within the Proteobacteria phylum (Figure 1B).

260

To further classify the taxonomic composition, we considered the bacterial assignments

261

at the genus level and compared them within and among sites (Figure 2). Between 74-90% of

262

contigs could be assigned to a bacterial genus. In total, there were 2,207 different genera, with

263

789 identified at some abundance in all of the samples. Of these, 44 occurred at a relative

264

abundance ≥1% in at least one sample. Interestingly, the majority of sites were dominated by the

265

same bacterial genera. However, there were some trends that demarcated specific sites. For

266

instance, Variovorax was the most abundant genus in 17 of the 24 samples, (min: 2%, max: 32%,

267

mean: 11%). However, in all the samples collected from the freshwater pond, MA10, and the

268

sample collected on 10/24/16 from the reclamation facility, MA01, Streptomyces was the most

269

abundant. Similarly, Pusillimonas was the most abundant in MA03, a freshwater creek

270

(11/14/16), and MA04, a brackish river (10/10/16), while Nostoc was the most abundant in

271

MA04 (10/24/16).

272

Furthermore, when we compared the normalized abundance of the dominant genera

273

among the different sites and tested their significance via an ANOVA with post-hoc Tukey HSD

274

we found significant differences for the genera Mesorhizobium, Cystobacter, Microvirga,

275

Microcystis, Gimesia, and Prochlorococcus (Table S4). Notably, we found that MA04, a

276

brackish river, had a significantly (p <0.05) higher abundance of Gimesia and Microcystis than

277

all of the other sites except MA01, a reclamation facility. Alternatively, MA01 had a

278

significantly (p <0.05) higher abundance of Prochlorococcus than all of the other sites except

279

MA04.

280

14

281 282

3.2 Functional analysis of bacterial-assigned translated peptide ORFs In addition to the taxonomic analysis, the metagenomic data generated in this study

283

allowed us to look at the putative function assigned to each of the translated peptide ORFs. On

284

average, 56% (min: 31%, max: 75%) of translated ORFs were assigned at least one GO-term,

285

with the majority (min: 81%, max: 98%, mean: 93%) coming from contigs assigned as bacteria.

286

Overall, the dominant putative molecular functions and biological processes that could be

287

assigned were similar among the various water sources. The GO-terms assigned at the greatest

288

frequency to all of the water sources were: transferase activity (GO:0016740) (min: 25%, max:

289

29%, mean: 27%), hydrolase activity (GO:0016787) (min: 25%, max: 27%, mean: 26%), ATP

290

binding (GO:0005524) (min: 18%, max: 25%, mean: 20%), oxidation-reduction process

291

(GO:0055114) (min: 17%, max: 20%, mean: 18%), and catalytic activity (GO:0003824) (min:

292

15%, max: 18%, mean: 17%) (Figure 3A).

293

Additionally, we explored the presence of the GO-term for pathogenesis (GO:0009405),

294

as well as its child term, toxin activity (GO:0090729), and terms associated with antibiotic

295

GO:0046677) and drug resistance (GO:0042493) (Korves and Colosimo, 2009). Again, the

296

normalized abundance was calculated and totaled for bacterial contigs containing translated

297

ORFs assigned to one or more of these GO-terms (Figure 3B). The GO-term for pathogenesis

298

was assigned to the greatest portion between 0.7 and 4% (mean: 2%), with the majority of these

299

contigs taxonomically assigned as Pseudomonas (min: 6%, max: 14%, mean: 9%) and

300

Pusillimonas (min: 1%, max: 12%, mean: 5%) (Figure S3). While the greatest percentage of

301

pathogenesis containing contigs was identified at MA03 on 11/14/16, there was no statistically

302

significant difference among the sites.

303

15

304 305

3.3 Translated peptide ORF clustering In addition to functional analysis, the complete translated peptide ORFs were clustered at

306

60% to assess the shared and unique functional profiles among the different sampling sites

307

(Figure S2, Table S5) (Li and Godzik, 2006). Despite the majority of ORFs assigned to the same

308

GO-term, most of the translated ORFs clustered within site, highlighting the unique peptide

309

sequences composition at each water source. The two tidal brackish rivers, MA04 and MA08,

310

had the greatest fraction of unique translated ORFs, 63% and 51%, respectively, followed by

311

MA07 (48%), MA11 (41%), MA03 (36%), MA02 (36%), MA01 (34%), MA10 (33%), and

312

MA06 (31%). The remaining fraction of translated ORFs from each site clustered with one or

313

more different sites and, in some cases, revealed a putative connection between reclaimed water

314

sites and ponds. In total, 51%, 41%, and 38% of the translated ORFs from the reclaimed water

315

sites MA02, MA06, and MA01 clustered with translated ORFs from the ponds MA11, MA10,

316

and MA10, respectively. In addition to MA10, a large percentage of translated ORFs (47%) from

317

MA06 also clustered with those from the non-tidal freshwater creek, MA03.

318

Additionally, between 4% and 9% of the translated ORFs clustered with at least one

319

member from all the other sites, representing a “functional core” among all the sampled water

320

sources. Within this core, 92% of the translated ORFs were assigned a GO-term. Similar to the

321

functional analysis, the majority of these GO-terms were for transferase activity (GO:0016740,

322

23%), oxidation-reduction process (GO:0055114, 20%), and hydrolase activity (GO:0016787,

323

19%), as well as translation (GO:0006412, 18%) and metal ion binding (GO:0046872, 18%).

324

Suggesting that the most abundant functions are those shared among the different sites. Not

325

surprisingly, in addition to these molecular functions and biological processes, we also saw an

16

326

abundance of cellular components assigned to the core, such as membrane (GO:0016020, 24%),

327

integral component of membrane (GO:0016021, 22%), and cytoplasm (GO:0005737, 21%).

328 329

3.4 Antibiotic Resistance

330

To further assess antibiotic resistance, we conducted a BLAST analysis of translated

331

peptide ORFs against CARD. Across all samples, 114 translated ORFs were identified as 32

332

unique ARGs conferring resistance to over ten drug classes, including resistance mechanisms

333

associated with target mutations and those associated with dedicated antibiotic resistance gene

334

products (McArthur et al., 2013). For the former, proteins were confirmed to carry the following

335

mutations: KasA, R121K (Lee et al., 1999); GyrA, S95T (Sulochana et al., 2007); MurA, C117D

336

(De Smet et al., 1999), RpsL, K88R, K43R (Brzostek et al., 2004;Ballif et al., 2012), RpsJ,

337

V57M (Kubanov et al., 2016), and EF-Tu Q124K (Zuurmond et al., 1999). Overall, the

338

reclamation water sampled on 11/14/16 from MA06 contained the greatest diversity of ARGs,

339

with 22 unique ARGs (identified from 25 ARG-like ORFs) (Figure 4). This was followed by the

340

non-tidal freshwater creek, MA03, sampled on 10/24/16, which had 13 unique ARGs (identified

341

from 14 ARG-like ORFs). The other non-tidal freshwater creek, MA07, contained the lowest

342

diversity of ARGs, with just one unique ARG identified throughout the entire sampling period.

343

We also identified the source genera and phyla of each ARG-like ORFs by parsing the

344

contig taxa identified previously (Figure 5). All the ARG-like ORFs originated from contigs

345

assigned as Bacteria, with the majority coming from Actinobacteria (59 of 114; 52%). While the

346

majority of these were rpsL genes, Actinobacteria genera were also associated with 12 other

347

unique ARGs. Furthermore, 31 ARG-like ORFs were classified as Proteobacteria (11 alpha, 6

348

beta, 14 gamma) and encompassed the majority of ARG diversity, with 26 unique ARGs.

17

349 350 351

3.5 CRISPR array abundance and taxonomy In addition to identifying traditional genes of concern we also sought to determine the

352

phage-host relationships within and among sites using CRISPR arrays. CRISPR arrays were

353

predicted in every library for a total of 612 arrays on 604 contigs. For the contigs that had a

354

predicted CRISPR array, we calculated their normalized abundance (Figure 6A). Overall,

355

CRISPR containing contigs only accounted for between 0.003% and 0.04% of the total contig

356

abundance among all samples. However, on average, the tidal brackish water site, MA04, had

357

both the greatest number of detected CRISPR arrays (238 across 234 contigs) and the highest

358

normalized abundance of the CRISPR-containing contigs.

359

To identify the taxonomy of the contigs containing putative CRISPR arrays, we parsed

360

the BLASTp assigned taxa. However, similar to previous studies (Gogleva et al., 2014) making

361

taxonomic predictions for CRISPR-containing contigs was difficult. Only 22% (130/604) of

362

CRISPR containing contigs could be assigned a taxa (Figure S4), the majority of which were of

363

the phyla Proteobacteria (45%) made up of 38% Gammaproteobacteria, 29%

364

Alphaproteobacteria and 17% Betaproteobacteria. This was followed by Firmicutes (21%) and

365

Cyanobacteria (11%).

366 367 368

3.6 CRISPR spacers within and among sites To determine the number of unique spacers, we clustered all the sampling dates within

369

each site at 97% nucleotide similarity. Not surprisingly, considering it had the greatest number of

370

CRISPR arrays, MA04, a brackish river site, also had the greatest number of unique spacers

371

(1173 spacers) followed by MA10 (398 spacers), MA11 (321 spacers), MA01 (293 spacers),

18

372

MA06 (269 spacers), MA03 (124 spacers), MA08 (161 spacers), MA07 (25 spacers), and MA02

373

(21 spacers) (Figure 6B). These unique spacers were then clustered together to produce the

374

number of shared spacers among the different sites. We observed similar trends to what was

375

found with the peptide clustering, where ponds and reclamation sites had the greatest similarity.

376

Here, the reclaimed sites shared the highest degree of spacers with pond sites, with MA01 and

377

MA10 sharing the most (69 spacers).

378 379

19

380 381

4. Discussion: Water reuse is an important practice to mitigate our dependence on dwindling

382

groundwater supplies. However, across the farm-to-fork continuum, irrigation water is a known

383

source of microbial contamination of fresh produce and, therefore, must be subjected to scrutiny.

384

In the present study, we utilized metagenomics to characterize multiple facets of the microbial

385

communities from a variety of irrigation water sites, including the bacterial community and

386

functional composition, phage infection history, and the presence of pathogenic genes and

387

ARGs. Together, these data provide a more complete picture of the microbial community

388

consortium within alternative water sources and showcase those sources with the greatest

389

putative risk to public health and food safety.

390

When comparing the bacterial communities amongst the different water sources the

391

greatest differences were between the fresh and brackish waters (Figure 1, 2, S2). This is likely

392

due to the mixing of fresh and saline water in the brackish sites enabling the co-occurrence of

393

bacteria typically associated with both freshwater and marine systems (e.g. Alphaproteobacteria

394

and Betaproteobacteria)(Piwosz et al., 2013;Gołębiewski et al., 2017). Additionally, the influx

395

of saline water likely selected for microorganisms tolerant of high salinity, such as some species

396

of Microcystis (Tonk et al., 2007;Ferreira et al., 2016). This may be of concern considering that

397

species of Microcystis are capable of producing hepatotoxins that have demonstrated some

398

ability to bioaccumulate in produce (Crush et al., 2008) – a potential source of toxin exposure not

399

often considered when assessing irrigation water quality.

400

In addition to Microcystis found in brackish water, genera commonly associated with

401

human disease (e.g. Streptococcus, Enterococcus) were identified in all samples. However,

402

typical environmental genera such as Variovorax were the most abundant (Figure 2).

20

403

Variovorax falls within the family Comamonadaceae and is phylogenetically closely related to

404

Acidovorax, another widespread environmental genera (Anzai et al., 2000;Lee et al., 2010).

405

Variovorax has been found throughout a variety of environments, such as: drinking water (Lee et

406

al., 2010;Delafont et al., 2013), freshwater riverine water (Nakayama et al., 2017), groundwater

407

(Humphries et al., 2005;Posman et al., 2017), and soil (Lipson et al., 2009). As a result, it has

408

been suggested that Variovorax can adapt to different environmental constraints, likely due to its

409

ability to degrade a variety of organic compounds, including pollutants (Bers et al., 2011).This

410

heterogeneity in metabolic potential may be an effective means for out-competing foreign

411

invaders, such as human-associated bacteria, and likely the explanation for its dominance among

412

the different water sources.

413

To further investigate the potential public health impacts of these water sources we also

414

investigated the presence of ARGs. Previous studies have identified a diverse range of ARGs in

415

reclaimed waters (Fahrenfeld et al., 2013), natural surface water sites (Vaz-Moreira et al., 2014),

416

and even pristine environments (e.g. ancient permafrost (D’Costa et al., 2011;Van Goethem et

417

al., 2018). These studies reflect both the natural production of antibiotics by bacteria and those

418

introduced to the environment by way of human pollution. In this study, we identified ARGs in

419

metagenomes from the majority of sites. However, comparing the sites revealed the highest ARG

420

diversity in samples from the reclamation facility, MA06, and the non-tidal freshwater creek,

421

MA03 (Figure 4). In MA06, most of the ARGs confer resistance to antibiotics commonly used in

422

clinical and agricultural applications including aminoglycosides, sulfonamides, rifamycins,

423

macrolides, cephalosporins, fluoroquinolones, and tetracyclines that function through a suite of

424

resistance mechanisms, including antibiotic inactivation and efflux, as well as antibiotic target

425

alteration, protection, and replacement (McArthur et al., 2013;Control and Prevention, 2014;Jia

21

426

et al., 2016). This broad range of resistance is not surprising as wastewater treatment plants are

427

considered “hotspots” of antimicrobial resistance due to traces of antibiotics in the wastewater

428

driving selection pressure (Guo et al., 2017;Karkman et al., 2017). One of the most abundant

429

ARGs detected in the reclaimed water was ermF, which confers resistance to macrolide-

430

lincosamide-streptogramin (MLS) antibiotics. These antibiotics are used frequently to treat

431

Gram-positive infections and have been found to withstand wastewater treatment, persisting at

432

high levels in the effluent (McArdell et al., 2003;Szekeres et al., 2017).

433

For the freshwater creek, MA03, we also observed a high ARG diversity. This is likely

434

due to discharge from a wastewater treatment facility located upstream of the sampling location.

435

Previous studies have identified that wastewater effluent discharge into natural lotic systems may

436

increase the presence of antibiotics, antibiotic-resistant bacteria, and ARGs downstream (Iwane

437

et al., 2001;Sabri et al., 2018). A study of the Grote Beerze River in the Netherlands found

438

increased amounts of antibiotics and ARGs up to 20 km downstream of an effluent discharge

439

point compared to upstream samples (Sabri et al., 2018). However, it is important to note in our

440

study that ARG diversity was not consistently high throughout the entire sampling period. Due to

441

the limited number of time points it is difficult to determine whether this was the result of

442

changing abiotic and biotic factors throughout time or variability in sampling. Nevertheless, the

443

presence of these ARGs in irrigation water sources raises potential concerns for the use of these

444

waters in agricultural applications and should be investigated further. For instance, the ARG at

445

the highest abundance in the majority of the other water sources besides MA03 was the mutant

446

of rpsL (Figure 4). Mutants of rpsL are resistant to aminoglycosides, including streptomycin, an

447

antibiotic produced by the soil Actinobacteria Streptomyces griseus and used widely in clinical

448

and agricultural applications (Brzostek et al., 2004;Ballif et al., 2012). Mutations in rpsL confer

22

449

resistance by disrupting interactions between the ribosomal protein and the antibiotic. These

450

types of target mutations are frequently found in environmental bacteria and are thought to be the

451

result of spontaneous pleotropic mutations (Paulander et al., 2009;Allen et al., 2010). This is

452

consistent with the potential taxonomic origin of the majority of rpsL mutants, Ferrimicrobium,

453

an iron-oxidizing Actinobacteria found in mine waters (Johnson and Hallberg, 2003), geothermal

454

soils (Stott et al., 2008), and a waterlogged bog (Sanyika et al., 2012). In fact, while pathogen-

455

associated bacterial genera (e.g. Listeria, Vibrio, Escherichia) were identified to be potential

456

hosts of some ARGs, most of the putative ARG hosts were traditional environmental genera

457

(Figure 5). The presence of ARGs even in environmental genera may still be a concern, as

458

previous research suggests that ARGs may pass through horizontal gene transfer from

459

indigenous environmental bacteria to pathogens (Guo et al., 2017).

460

Finally, to extend our study to phage, a traditionally understudied component of

461

microbial ecosystems, we leveraged the CRISPR-Cas system. Overall, we identified CRISPR

462

arrays at all time points and sampling locations, suggesting that this defense system is present in

463

bacteria from surface and reclaimed waters. Interestingly, the tidal brackish river site, MA04,

464

had the greatest abundance of CRISPR arrays (Figure 6). As described earlier, the influx of

465

saline water at this site may have selected for a unique bacterial composition, including an

466

increased abundance of Microcystis (Figure 2, S2). Previously, CRISPR arrays have been

467

identified in strains of the cyanobacterium Microcystis aeruginosa isolated from a shallow

468

eutrophic reservoir (Kuno et al., 2012). In addition, the authors found that the Microcystis

469

spacers were rarely shared among the strains, which is similar to the results observed in this

470

study.

23

471

Within the CRISPR arrays, the majority of spacers were unique to each site, suggesting

472

specific interactions between phage and hosts. This is likely due to the native bacteria having

473

adapted to the local phage populations at each site (Kunin et al., 2008). As a result, it has been

474

hypothesized that spacers can be used as a molecular fingerprint to subtype bacteria and

475

potentially track pathogen outbreaks (Shariat and Dudley, 2014). However, there were samples

476

across sites that demonstrated shared spacers, with the most apparent between the reclaimed

477

water site, MA01, and the freshwater pond site, MA10. This suggests a similarity in community

478

composition that may be indicative of similar bacterial lineages and/or environmental exposures.

479

For instance, the reclaimed water sites were stored at least temporarily in irrigation

480

ponds/lagoons, which have similar topographical features to ponds. This may also explain the

481

high percentage of shared translated peptide ORFs observed between the two sites (Figure S2).

482

However, this connection needs to be explored in greater detail.

483

Essential for the use of untreated surface and reclaimed water sources is knowledge of

484

their risks to environmental health and food safety. Although this study provides valuable

485

foundational data, there are some limitations. In particular, our study utilizes shotgun

486

sequencing, a method that surveys all DNA in a sample regardless of its host viability and

487

activity. As a result, accurately assigning taxonomy and function to unknown DNA fragments

488

with this methodology is a challenge (Sczyrba et al., 2017). Not only does the approach used for

489

taxonomic assignment (e.g. k-mer-based vs. alignment-based) influence the accuracy of the

490

assignments (McIntyre et al., 2017), but the database used in this process also plays a role

491

(Nasko et al., 2018). Here, we sought to circumvent these challenges by utilizing a widely

492

benchmarked similarity-searching tool (i.e. BLAST) and one of the most comprehensive non-

493

redundant reference database (i.e. UniRef). Using this approach we are confident that our

24

494

assignments accurately reflect microorganisms in the database with the highest degree of

495

taxonomic and functional homology. Additionally, due to financial constraints we were only able

496

to shotgun sequence samples from a limited number of sampling time points that took place

497

during a larger 2-year longitudinal study. Therefore, we are limited in our power to detect

498

changes through time within this subsample. This is particularly evident for the reclamation

499

facilities in which we were only able to shotgun sequence samples from two dates, further

500

limiting the assessment of these sites. Nevertheless, the foundational knowledge gained through

501

this study can be used to inform future studies and support the implementation of adaptive on-

502

farm technologies (e.g. water treatment approaches and drip irrigation) that can reduce the spread

503

of pathogenic microbial constituents.

504 505 506

5. Conclusions: The overall aim of our study was to provide comprehensive, foundational knowledge on the

507

microbial communities in multiple alternative water sources through the use of shotgun

508

metagenomic sequencing, with the ultimate goal of highlighting putative risks to produce safety

509

and public health. The main conclusions drawn from these data are:

510



511 512

vary depending on the date sampled and the specific site. •

513 514

Features of the bacterial community in untreated surface and reclaimed water sources

In particular, sites impacted by saline water had the greatest divergence with regard to putative bacterial taxonomy and functionality.



Potential public health threats, such as ARGs, were at the greatest abundance and

515

diversity in samples from a reclamation facility and a creek impacted by run-off from a

516

nearby wastewater facility.

25

517 518



The high degree of CRISPR spacers shared between certain sites (e.g. pond, reclamation facility) suggests similar environmental constraints and phage-host relationships.

519 520 521

Declaration of interests

522

The authors declare that they have no competing interests.

523 524

Acknowledgements:

525

This work was supported by the United States Department of Agriculture-National Institute of

526

Food and Agriculture, Grant number 2016-68007-25064, awarded to the University of Maryland

527

School of Public Health, that established CONSERVE: A Center of Excellence at the Nexus of

528

Sustainable Water Reuse, Food and Health. J.C. was supported by the NSF Research Traineeship

529

program, COMBINE, Grant number 1632976. This work also utilized the Extreme Science and

530

Engineering Discovery Environment (XSEDE), Grant number ACI-1548562.TBA. The authors

531

would also like to thanks Anup Mahurkar and Tom Emmel (University of Maryland School of

532

Medicine, Institute for Genome Sciences) for assistance with high-throughput computing.

533 534

26

535 536

Figure Legends

A

Brackish River

MA04

MA08

Reclamation Facility

MA01 MA02 MA06

Freshwater Pond

MA10

MA11

Freshwater Creek

MA03

MA07

Phyla

100

Acidobacteria Actinobacteria Bacteroidetes can. Magasanikbacteria can. Nomurabacteria

Relative Abundance (%)

75

Chloroflexi Cyanobacteria Firmicutes Unclassified 50

Planctomycetes Proteobacteria Verrucomicrobia Other

25

Alpha Beta

75

Delta 50

Epsilon Gamma

25

Other Unclassified

10 -2 11 4 -1 4 10 -2 4 11 -1 4 10 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4

0

10 -1 0 10 -2 11 4 -1 4 10 -1 10 0 -2 11 4 -1 4

Relative Abundance (%)

B

Proteobacterial classes

0 100

Sampling Date

537 538 539

Figure 1: Normalized relative abundance of the bacterial taxa present in reclaimed and

540

untreated surface water sites at each sampling date. (A) Stacked bar chart depicting the

541

community structure at the phylum level. (B) Stacked bar chart of the Proteobacteria phyla split

542

into classes. Sites are grouped by water type and arranged in the order of sampling date.

543

Normalized abundance measured as contig coverage divided by the sum contig coverage per

544

million and presented as relative.

27

545 546

Figure 2: Taxonomic heatmap of the bacterial communities present in reclaimed and

547

untreated surface water sites at each sampling date. Heatmap based on the log-transformed

548

normalized abundance of the most dominant genera (>1% in at least one sample). Normalized

549

abundance measured as contig coverage divided by the sum contig coverage per million.

550 551 552 553 554 555 28

556 29

557

Figure 3: Functional profiles of bacterial communities present in reclaimed and untreated

558

surface water sites at each sampling date. (A) Heatmap of the number of translated peptide

559

ORFs assigned to the top ten GO terms for the biological and molecular categories in each

560

sample. The corresponding GO IDs are presented in parentheses. One translated ORF may be

561

matched to multiple GO terms. (B) Normalized abundance of contigs containing translated

562

ORFs assigned at each site to the following GO terms: pathogenesis (GO:0009405), toxin

563

activity (GO:0090729), response to antibiotic GO:0046677), response to drug (GO:0042493).

564

Sites are grouped by water type and arranged in the order of sampling date.

565 566

30

567 568

Figure 4: Antibiotic resistance genes (ARGs) predicted in reclaimed and untreated surface

569

water sites at each sampling date. Dotplot showing the ARG-like translated peptide ORFs

570

present at each water site, with the size of each dot equivalent to the number of translated ORFs

571

with homology to each ARG listed on the y-axis, and the color representative of the water type.

572

ARG drug class designations consistent with the ontology from the Comprehensive Antibiotic

573

Resistance Database. *MLS: macrolide, lincosamide, streptogramin antibiotic.

574

31

575 576

Figure 5: Bipartite network of the bacterial taxa with predicted antibiotic resistance genes

577

(ARGs) from reclaimed and untreated surface water sites. Uncolored circles represent ARGs

578

connected by an edge to its putative bacterial host, with each edge colored by the water type of

579

the sample it was identified in. Bacterial host defined as the taxa assigned to the contig the ARG-

580

like translated peptide ORF originated from and colored accordingly.

581

32

582 33

583

Figure 6: CRISPR array abundance and spacer overlap within and among reclaimed and

584

untreated surface water sites. (A) Bar plot of the normalized abundance of contigs containing

585

CRISPR arrays. Bars are colored by water type and labeled with the number of contigs

586

containing CRISPR arrays. Normalized abundance measured as contig coverage divided by the

587

sum contig coverage per million. (B) Network of shared spacers (97% identity) among

588

alternative water sites. Donut-shaped centroid nodes represent each of the nine sampling sites,

589

with the size equivalent to the number of spacers within that site, and the color representative of

590

the water type. Nodes connecting the centroids represent shared spacers between and among

591

sites.

34

592

References

593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634

Allen, H.K., Donato, J., Wang, H.H., Cloud-Hansen, K.A., Davies, J., and Handelsman, J. (2010). Call of the wild: antibiotic resistance genes in natural environments. Nature Reviews Microbiology 8, 251. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. Journal of molecular biology 215, 403-410. Anzai, Y., Kim, H., Park, J.-Y., Wakabayashi, H., and Oyaizu, H. (2000). Phylogenetic affiliation of the pseudomonads based on 16S rRNA sequence. International journal of systematic and evolutionary microbiology 50, 1563-1589. Ballif, M., Harino, P., Ley, S., Coscolla, M., Niemann, S., Carter, R., Coulter, C., Borrell, S., Siba, P., and Phuanukoonnon, S. (2012). Drug resistance-conferring mutations in Mycobacterium tuberculosis from Madang, Papua New Guinea. BMC microbiology 12, 191. Bers, K., Sniegowski, K., Albers, P., Breugelmans, P., Hendrickx, L., De Mot, R., and Springael, D. (2011). A molecular toolbox to estimate the number and diversity of Variovorax in the environment: application in soils treated with the phenylurea herbicide linuron. FEMS microbiology ecology 76, 14-25. Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C., and Hugenholtz, P. (2007). CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC bioinformatics 8, 209. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, btu170. Bruinsma, J. (Year). "The resource outlook to 2050: By how much do land, water and crop yields need to increase by 2050", in: Expert meeting on how to feed the world in), 2426. Brzostek, A., Sajduda, A., Śliwiński, T., Augustynowicz-Kopeć, E., Jaworski, A., Zwolska, Z., and Dziadek, J. (2004). Molecular characterisation of streptomycin-resistant Mycobacterium tuberculosis strains isolated in Poland. The International Journal of Tuberculosis and Lung Disease 8, 1032-1035. Chopyk, J., Allard, S., Nasko, D.J., Bui, A., Mongodin, E.F., and Sapkota, A.R. (2018). Agricultural freshwater pond supports diverse and dynamic bacterial and viral populations. Frontiers in Microbiology 9, 792. Chopyk, J., Chattopadhyay, S., Kulkarni, P., Claye, E., Babik, K.R., Reid, M.C., Smyth, E.M., Hittle, L.E., Paulson, J.N., and Cruz-Cano, R. (2017). Mentholation affects the cigarette microbiota by selecting for bacteria resistant to harsh environmental conditions and selecting against potential bacterial pathogens. Microbiome 5, 22. Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., Mcpherson, A., Szcześniak, M.W., Gaffney, D.J., Elo, L.L., and Zhang, X. (2016). A survey of best practices for RNA-seq data analysis. Genome biology 17, 1. Consortium, U. (2009). The universal protein resource (UniProt) in 2010. Nucleic acids research 38, D142-D148. Control, C.F.D., and Prevention (2014). "Outpatient antibiotic prescriptions—United States".).

35

635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678

Crush, J., Briggs, L., Sprosen, J., and Nichols, S. (2008). Effect of irrigation with lake water containing microcystins on microcystin content and growth of ryegrass, clover, rape, and lettuce. Environmental Toxicology 23, 246-252. D’costa, V.M., King, C.E., Kalan, L., Morar, M., Sung, W.W., Schwarz, C., Froese, D., Zazula, G., Calmels, F., and Debruyne, R. (2011). Antibiotic resistance is ancient. Nature 477, 457. De Smet, K.A., Kempsell, K.E., Gallagher, A., Duncan, K., and Young, D.B. (1999). Alteration of a single amino acid residue reverses fosfomycin resistance of recombinant MurA from Mycobacterium tuberculosis. Microbiology 145, 3177-3184. Delafont, V., Brouke, A., Bouchon, D., Moulin, L., and Héchard, Y. (2013). Microbiome of freeliving amoebae isolated from drinking water. Water research 47, 6958-6965. Dieter, C.A., Maupin, M.A., Caldwell, R.R., Harris, M.A., Ivahnenko, T.I., Lovelace, J.K., Barber, N.L., and Linsey, K.S. (2018). "Estimated use of water in the United States in 2015". US Geological Survey). Drury, B., Rosi-Marshall, E., and Kelly, J.J. (2013). Wastewater treatment effluent reduces the abundance and diversity of benthic bacterial communities in urban and suburban rivers. Applied and environmental microbiology, AEM. 03527-03512. Enault, F., Briet, A., Bouteille, L., Roux, S., Sullivan, M.B., and Petit, M.-A. (2017). Phages rarely encode antibiotic resistance genes: a cautionary tale for virome analyses. The ISME journal 11, 237. Fahrenfeld, N.L., Ma, Y., O'brien, M., and Pruden, A. (2013). Reclaimed water as a reservoir of antibiotic resistance genes: distribution system and irrigation implications. Frontiers in microbiology 4, 130. Ferreira, C., Soares, A.R., Lamosa, P., Santos, M.A., and Da Costa, M.S. (2016). Comparison of the compatible solute pool of two slightly halophilic planctomycetes species, Gimesia maris and Rubinisphaera brasiliensis. Extremophiles 20, 811-820. Frost, L.S., Leplae, R., Summers, A.O., and Toussaint, A. (2005). Mobile genetic elements: the agents of open source evolution. Nature Reviews Microbiology 3, 722. Gogleva, A.A., Gelfand, M.S., and Artamonova, I.I. (2014). Comparative analysis of CRISPR cassettes from the human gut metagenomic contigs. BMC genomics 15, 202. Gołębiewski, M., Całkiewicz, J., Creer, S., and Piwosz, K. (2017). Tideless estuaries in brackish seas as possible freshwater‐marine transition zones for bacteria: the case study of the Vistula river estuary. Environmental microbiology reports 9, 129-143. Guo, J., Li, J., Chen, H., Bond, P.L., and Yuan, Z. (2017). Metagenomic analysis reveals wastewater treatment plants as hotspots of antibiotic resistance genes and mobile genetic elements. Water research 123, 468-478. Hagström, Å., Pommier, T., Rohwer, F., Simu, K., Stolte, W., Svensson, D., and Zweifel, U.L. (2002). Use of 16S ribosomal DNA for delineation of marine bacterioplankton species. Applied and environmental microbiology 68, 3628-3633. Heidelberg, J.F., Nelson, W.C., Schoenfeld, T., and Bhaya, D. (2009). Germ warfare in a microbial mat community: CRISPRs provide insights into the co-evolution of host and viral genomes. PLoS One 4, e4169. Horvath, P., Romero, D.A., Coûté-Monvoisin, A.-C., Richards, M., Deveau, H., Moineau, S., Boyaval, P., Fremaux, C., and Barrangou, R. (2008). Diversity, activity, and evolution

36

679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724

of CRISPR loci in Streptococcus thermophilus. Journal of bacteriology 190, 14011412. Humphries, J., Ashe, A.M., Smiley, J., and Johnston, C. (2005). Microbial community structure and trichloroethylene degradation in groundwater. Canadian journal of microbiology 51, 433-439. Huntley, R.P., Sawford, T., Mutowo-Meullenet, P., Shypitsyna, A., Bonilla, C., Martin, M.J., and O'donovan, C. (2014). The GOA database: gene ontology annotation updates for 2015. Nucleic acids research 43, D1057-D1063. Iwane, T., Urase, T., and Yamamoto, K. (2001). Possible impact of treated wastewater discharge on incidence of antibiotic resistant bacteria in river water. Water Science and Technology 43, 91-99. Jia, B., Raphenya, A.R., Alcock, B., Waglechner, N., Guo, P., Tsang, K.K., Lago, B.A., Dave, B.M., Pereira, S., and Sharma, A.N. (2016). CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic acids research, gkw1004. Johnson, D.B., and Hallberg, K.B. (2003). The microbiology of acidic mine waters. Research in microbiology 154, 466-473. Karkman, A., Do, T.T., Walsh, F., and Virta, M.P. (2017). Antibiotic-resistance genes in waste water. Trends in microbiology. Katoh, K., Misawa, K., Kuma, K.I., and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research 30, 3059-3066. Kolde, R., and Kolde, M.R. (2018). Package ‘pheatmap’. Korves, T., and Colosimo, M.E. (2009). Controlled vocabularies for microbial virulence factors. Trends in microbiology 17, 279-285. Kubanov, A., Vorobyev, D., Chestkov, A., Leinsoo, A., Shaskolskiy, B., Dementieva, E., Solomka, V., Plakhova, X., Gryadunov, D., and Deryabin, D. (2016). Molecular epidemiology of drug-resistant Neisseria gonorrhoeae in Russia (Current Status, 2015). BMC infectious diseases 16, 389. Kulkarni, P., Olson, N.D., Paulson, J.N., Pop, M., Maddox, C., Claye, E., Goldstein, R.E.R., Sharma, M., Gibbs, S.G., and Mongodin, E.F. (2018). Conventional wastewater treatment and reuse site practices modify bacterial community structure but do not eliminate some opportunistic pathogens in reclaimed water. Science of The Total Environment 639, 1126-1137. Kunin, V., He, S., Warnecke, F., Peterson, S.B., Martin, H.G., Haynes, M., Ivanova, N., Blackall, L.L., Breitbart, M., and Rohwer, F. (2008). A bacterial metapopulation adapts locally to phage predation despite global dispersal. Genome Research 18, 293-297. Kuno, S., Yoshida, T., Kaneko, T., and Sako, Y. (2012). Intricate interactions between the bloom-forming cyanobacterium Microcystis aeruginosa and foreign genetic elements, revealed by diversified clustered regularly interspaced short palindromic repeat (CRISPR) signatures. Applied and environmental microbiology 78, 5353-5360. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359. Lee, A.S., Lim, I.H., Tang, L.L., Telenti, A., and Wong, S.Y. (1999). Contribution of kasA Analysis to Detection of Isoniazid-Resistant Mycobacterium tuberculosisin Singapore. Antimicrobial agents and chemotherapy 43, 2087-2089. 37

725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768

Lee, J., Lee, C., Hugunin, K., Maute, C., and Dysko, R. (2010). Bacteria from drinking water supply and their fate in gastrointestinal tracts of germ-free mice: a phylogenetic comparison study. Water research 44, 5050-5058. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078-2079. Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659. Lin, Y., Li, D., Zeng, S., and He, M. (2016). Changes of microbial composition during wastewater reclamation and distribution systems revealed by high-throughput sequencing analyses. Frontiers of Environmental Science & Engineering 10, 539-547. Lipson, D.A., Monson, R.K., Schmidt, S.K., and Weintraub, M.N. (2009). The trade-off between growth rate and yield in microbial communities and the consequences for under-snow soil respiration in a high elevation coniferous forest. Biogeochemistry 95, 23-35. Lopatina, A., Medvedeva, S., Shmakov, S., Logacheva, M.D., Krylenkov, V., and Severinov, K. (2016). Metagenomic analysis of bacterial communities of Antarctic surface snow. Frontiers in microbiology 7, 398. Luczkiewicz, A., Kotlarska, E., Artichowicz, W., Tarasewicz, K., and Fudala-Ksiazek, S. (2015). Antimicrobial resistance of Pseudomonas spp. isolated from wastewater and wastewater-impacted marine coastal zone. Environmental Science and Pollution Research 22, 19823-19834. Magoč, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957-2963. Mark Ibekwe, A., Leddy, M.B., Bold, R.M., and Graves, A.K. (2012). Bacterial community composition in low-flowing river water with different sources of pollutants. FEMS microbiology ecology 79, 155-166. Mcardell, C.S., Molnar, E., Suter, M.J.-F., and Giger, W. (2003). Occurrence and fate of macrolide antibiotics in wastewater treatment plants and in the Glatt Valley Watershed, Switzerland. Environmental Science & Technology 37, 5479-5486. Mcarthur, A.G., Waglechner, N., Nizam, F., Yan, A., Azad, M.A., Baylay, A.J., Bhullar, K., Canova, M.J., De Pascale, G., and Ejim, L. (2013). The comprehensive antibiotic resistance database. Antimicrobial agents and chemotherapy, AAC. 00419-00413. Miller, J.D., Schoonover, J.E., Williard, K.W., and Hwang, C.R. (2011). Whole catchment land cover effects on water quality in the lower Kaskaskia River watershed. Water, Air, & Soil Pollution 221, 337. Munir, M., Wong, K., and Xagoraraki, I. (2011). Release of antibiotic resistant bacteria and genes in the effluent and biosolids of five wastewater utilities in Michigan. Water research 45, 681-693. Nakayama, T., Hoa, T.T.T., Harada, K., Warisaya, M., Asayama, M., Hinenoya, A., Lee, J.W., Phu, T.M., Ueda, S., and Sumimura, Y. (2017). Water metagenomic analysis reveals low bacterial diversity and the presence of antimicrobial residues and resistance genes in a river containing wastewater from backyard aquacultures in the Mekong Delta, Vietnam. Environmental pollution 222, 294-306.

38

769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812

Nasko, D.J., Ferrell, B.D., Moore, R.M., Bhavsar, J.D., Polson, S.W., and Wommack, K.E. (2019). CRISPR spacers indicate preferential matching of specific virioplankton genes. mBio 10, e02651-02618. Newton, R.J., Jones, S.E., Eiler, A., Mcmahon, K.D., and Bertilsson, S. (2011). A guide to the natural history of freshwater lake bacteria. Microbiology and Molecular Biology Reviews 75, 14-49. Noguchi, H., Park, J., and Takagi, T. (2006). MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic acids research 34, 5623-5630. Nurk, S., Meleshko, D., Korobeynikov, A., and Pevzner, P. (2016). metaSPAdes: a new versatile de novo metagenomics assembler. arXiv preprint arXiv:1604.03071. Paulander, W., Maisnier-Patin, S., and Andersson, D.I. (2009). The fitness cost of streptomycin resistance depends on rpsL mutation, carbon source and RpoS (σS). Genetics 183, 539-546. Piwosz, K., Salcher, M.M., Zeder, M., Ameryk, A., and Pernthaler, J. (2013). Seasonal dynamics and activity of typical freshwater bacteria in brackish waters of the Gulf of Gdańsk. Limnology and Oceanography 58, 817-826. Posman, K.M., Derito, C.M., and Madsen, E.L. (2017). Benzene degradation by a Variovorax species within a coal tar-contaminated groundwater microbial community. Applied and environmental microbiology 83, e02658-02616. Renwick, W.H., Sleezer, R.O., Buddemeier, R.W., and Smith, S.V. (Year). "Small artificial ponds in the United States: impacts on sedimentation and carbon budget", in: Proceedings of the Eighth Federal Interagency Sedimentation Conference), 738-744. Rho, M., Wu, Y.-W., Tang, H., Doak, T.G., and Ye, Y. (2012). Diverse CRISPRs evolving in human microbiomes. PLoS genetics 8, e1002441. Sabri, N., Schmitt, H., Van Der Zaan, B., Gerritsen, H., Zuidema, T., Rijnaarts, H., and Langenhoff, A. (2018). Prevalence of antibiotics and antibiotic resistance genes in a wastewater effluent-receiving river in the Netherlands. Journal of Environmental Chemical Engineering. Sanyika, T.W., Stafford, W., and Cowan, D.A. (2012). The soil and plant determinants of community structures of the dominant actinobacteria in Marion Island terrestrial habitats, Sub-Antarctica. Polar biology 35, 1129-1141. Shanks, O.C., Newton, R.J., Kelty, C.A., Huse, S.M., Sogin, M.L., and Mclellan, S.L. (2013). Comparison of microbial community structure in untreated wastewater from different geographic locales. Applied and environmental microbiology, AEM. 0344803412. Shariat, N., and Dudley, E.G. (2014). CRISPRs: molecular signatures used for pathogen subtyping. Appl. Environ. Microbiol. 80, 430-439. Snyder, J.C., Bateson, M.M., Lavin, M., and Young, M.J. (2010). Use of cellular CRISPR (clusters of regularly interspaced short palindromic repeats) spacer-based microarrays for detection of viruses in environmental samples. Applied and environmental microbiology 76, 7251-7258. Staley, C., Gould, T.J., Wang, P., Phillips, J., Cotner, J.B., and Sadowsky, M.J. (2014). Bacterial community structure is indicative of chemical inputs in the Upper Mississippi River. Frontiers in microbiology 5, 524.

39

813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850

Stott, M.B., Crowe, M.A., Mountain, B.W., Smirnova, A.V., Hou, S., Alam, M., and Dunfield, P.F. (2008). Isolation of novel bacteria, including a candidate division, from geothermal soils in New Zealand. Environmental Microbiology 10, 2030-2041. Sulochana, S., Narayanan, S., Paramasivan, C., Suganthi, C., and Narayanan, P. (2007). Analysis of fluoroquinolone resistance in clinical isolates of Mycobacterium tuberculosis from India. Journal of chemotherapy 19, 166-171. Suslow, T., Oria, M., Beuchat, L., Garrett, E., Parish, M., Harris, L., Farber, J., and Busta, F. (2003). Production practices as risk factors in microbial food safety of fresh and fresh‐cut produce. Comprehensive Reviews in Food Science and Food Safety 2, 3877. Szczepanowski, R., Linke, B., Krahn, I., Gartemann, K.-H., Guetzkow, T., Eichler, W., Pühler, A., and Schlueter, A. (2009). Detection of 140 clinically relevant antibiotic-resistance genes in the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to selected antibiotics. Microbiology 155, 2306-2319. Szekeres, E., Baricz, A., Chiriac, C.M., Farkas, A., Opris, O., Soran, M.-L., Andrei, A.-S., Rudi, K., Balcázar, J.L., and Dragos, N. (2017). Abundance of antibiotics, antibiotic resistance genes and bacterial community composition in wastewater effluents from different Romanian hospitals. Environmental Pollution 225, 304-315. Tonk, L., Bosch, K., Visser, P.M., and Huisman, J. (2007). Salt tolerance of the harmful cyanobacterium Microcystis aeruginosa. Aquatic Microbial Ecology 46, 117-123. Van Goethem, M.W., Pierneef, R., Bezuidt, O.K., Van De Peer, Y., Cowan, D.A., and Makhalanyane, T.P. (2018). A reservoir of ‘historical’antibiotic resistance genes in remote pristine Antarctic soils. Microbiome 6, 40. Vaz-Moreira, I., Nunes, O.C., and Manaia, C.M. (2014). Bacterial diversity and antibiotic resistance in water habitats: searching the links with the human microbiome. FEMS microbiology reviews 38, 761-778. Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer Science & Business Media. Zankari, E., Hasman, H., Cosentino, S., Vestergaard, M., Rasmussen, S., Lund, O., Aarestrup, F.M., and Larsen, M.V. (2012). Identification of acquired antimicrobial resistance genes. Journal of antimicrobial chemotherapy 67, 2640-2644. Zhang, T., and Li, B. (2011). Occurrence, transformation, and fate of antibiotics in municipal wastewater treatment plants. Critical reviews in environmental science and technology 41, 951-998. Zuurmond, A.-M., Olsthoorn-Tieleman, L.N., De Graaf, J.M., Parmeggiani, A., and Kraal, B. (1999). Mutant EF-tu species reveal novel features of the enacyloxin IIa inhibition mechanism on the ribosome1. Journal of molecular biology 294, 627-637.

40

Table 1: Description of study sites. Water Type

Site ID

Catchment land use

MA10

Agricultural

MA11

Agricultural

MA04

Marshland/Fore sted

Pond

River MA08

MA03

Creek

MA07

Reclaimed

Marsh grasses on both sides (~25-50m wide) then pine woods. Wooded, Agronomic cropland adjacent to the creek (~30-50 m) Flood plain grasses and woodland (hardwoods)

MA01

Wooded pines with grass lanes

MA02

Agronomic cropland (Corn and Soybeans)

MA06

Native grass

Description Freshwater pond with a maximum depth of ~3.35 meters and a surface area of ~ 0.26 ha Freshwater pond with a maximum depth of ~3 meters and a surface area of ~0.40 ha Tidal brackish river flowing into the Choptank River. At sampling site, width of ~ 76 meters and depth of ~ 0.3-0.6 meters. Tidal brackish river flowing into the Chesapeake Bay. At sampling site, width of ~15 meters and depth of ~2-3 meters. Within 1-1.5 miles downstream from broiler concentrated animal feeding operations (CAFOs). Non-tidal freshwater creek tributary of the Nanticoke River. At sampling site width of ~3 meters and a depth of ~1 meter. Within 1 mile downstream from wastewater treatment discharge facility. Non-tidal freshwater creek tributary of the Nanticoke River. At sampling site width of ~10 meters and a depth of ~1 meter. Within 2.5 miles downstream from CAFO poultry houses. Influent is treated through activated sludge processing (Sequential Batch Reactor), filtration, UV light, chlorination, and then stored in an openair lagoon before land application. Influent is treated through activated sludge processing (Sequential Batch Reactor), filtration, UV light, chlorination, and then stored in an openair lagoon before land application. Influent is treated through grinding, activated sludge processing, secondary clarification, and then stored in an open-air lagoon. It is chlorinated prior to land application.

Highlights (limited to 85 characters each, with spaces) •

Metagenomes from irrigation water sources were analyzed across time and among different sites



Proteobacteria, especially Betaproteobacteria, dominated the majority of sites.



Alphaproteobacteria were abundant at specific times in the brackish river samples.



Genes associated with pathogenesis were detected at all sites.



ARG and CRISPR abundance/diversity varied based on specific site and sampling date.

Declaration of interests X The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. ☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: