Water Research 44 (2010) 4252–4260
Available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/watres
Pyrosequencing of the 16S rRNA gene to reveal bacterial pathogen diversity in biosolids
Kyle Bibby, Emily Viau, Jordan Peccia*
Department of Chemical Engineering, Environmental Engineering Program, Yale University, New Haven, CT 06520, USA
Article info
Article history: Received 17 March 2010; Received in revised form 14 May 2010; Accepted 25 May 2010; Available online 9 June 2010
Keywords: Pathogen; Anaerobic digestion; Sludge; Compost; 16S
Abstract
Given the potential for a variety of bacterial pathogens to occur in variably stabilized sewage sludge (biosolids), an understanding of pathogen diversity and abundance is necessary for accurate assessment of infective risk when these products are land applied. 16S rDNA was PCR amplified from genomic DNA extracted from municipal wastewater residuals (mesophilic- and thermophilic-phased anaerobic digestion (MAD and TPAD), composting (COM)), and agricultural soil (SOIL), and these amplicons were sequenced using massively parallel pyrosequencing technology. Resulting libraries contained an average of 30,893 16S rDNA sequences per sample with an average length of 392 bases. FASTUNIFRAC-based comparisons of population phylogenetic distance demonstrated similarities between the populations of different treatment plants performing the same stabilization method (e.g. different MAD samples), and population differences among samples from different biosolids stabilization methods (COM, MAD, and TPAD). Based on a 0.03 Jukes-Cantor distance to 80 potential bacterial pathogens, all samples contained pathogens and enrichment ranged from 0.02% to 0.1% of sequences. Most (61%) species identified were opportunistic pathogens of the genera Clostridium and Mycobacterium. As risk sciences continue to evolve to address scenarios that include multiple pathogen exposure, the analysis described here can be used to determine the diversity of pathogens in an environmental sample. This work provides guidance for prioritizing subsequent culturable and quantitative analysis, and for the first time, ensuring that potentially significant pathogens are not left out of risk estimations. © 2010 Elsevier Ltd. All rights reserved.
1. Introduction
More than 7 million dry tons of sewage sludge are produced annually in the U.S., Bastian (1997). Globally, this waste stream continues to increase as urban populations increase, municipal wastewater treatment facilities move toward biological nutrient removal, and urban areas of developing nations build sewer systems and centralized treatment works. In the U.S., sewage sludges that have been stabilized by digestion or composting are termed “biosolids” if there is a resulting beneficial use. Greater than 60% of stabilized sewage sludges are reused through
application to agricultural land, Bastian (1997), Spicer (2002). While the agricultural benefits of land applying biosolids are well documented, Tenenbaum (1997), the potential pathogen content of biosolids and associated health complaints from residents living near biosolids land application sites have resulted in widespread public health concerns and community opposition to this practice, NRC (2002). The pathogen content of biosolids is likely diverse. In large municipalities, centralized wastewater treatment facilities commonly serve over 1 million residents. Although enteric pathogens are the traditional focus of biosolids management
* Corresponding author. Tel.: +1 203 432 4385; fax: +1 203 432 4387. E-mail address: [email protected] (J. Peccia). 0043-1354/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.watres.2010.05.039
practice, all viral and bacterial pathogens can be excreted in urine and feces, Sinclair et al. (2008). Potential exposure routes to these pathogens during land application include ingestion, inhalation, and dermal contact, NRC (2002). Surveys of pathogens in biosolids have thus far only included a limited suite of known infectious agents and indicators, Rusin et al. (2003), Viau and Peccia (2009b). Given the potential for such diverse pathogen content, past approaches that have targeted only a limited number of organisms may underestimate the pathogen diversity and total pathogen content, thus limiting the ability to fully understand potential infectious risk from human exposure during land application, Brooks et al. (2005), Eisenberg et al. (2008), Low et al. (2007). To properly define the risks posed by biosolids land application, a more complete understanding of pathogen abundance and diversity is required. Previous analyses of the dominant biosolids aerosol exposure pathway have estimated a maximum inhalation dose of respirable biosolids material to be 1.4 µg at 500 m, Low et al. (2007). At a bulk concentration of 5 × 10^10 total cells per dry gram of biosolids, Paez-Rubio et al. (2007), this dose corresponds to inhalation of 7 × 10^4 biosolids-derived cells during a land application event. Methods to examine microbial community diversity in such an environmental sample, primarily construction of phylogenetic clone libraries, are typically limited to fewer than 10^3 sequence identifications per sample. This does not reach the sampling depth required to identify pathogens, which are less common and typically account for far less than 1% of an environmental microbial population. Recently developed massively parallel sequencing technologies can provide a large number of short read sequences, Margulies et al. (2005), and significantly improve researchers' ability to investigate community members that are not highly enriched, Sogin et al. (2006).
These studies have been performed with sequence reads of 250 bp and less, McKenna et al. (2008), Sanapareddy et al. (2009). However, technology for improved read lengths that average up to 400 bp is now available, allowing for a more definitive phylogenetic-based classification of individual reads. We hypothesize that massively parallel sequencing technology will allow for sequencing deeply enough into a biosolids population such that pathogen diversity can be explored. To test this hypothesis, 454 FLX Titanium series technology was used to produce large 16S rRNA encoding gene libraries (average 30,893 sequences per sample with average 390 base read length). Samples included mesophilic anaerobically digested municipal wastewater sludge (MAD), thermophilic temperature-phased anaerobically digested municipal sludges (TPAD), and composted municipal sludges (COM), as well as an unamended agricultural soil for comparison. The bacterial pathogen content and diversity of each sample were determined, and microbial population structures from different treatments were compared.
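The exposure estimate quoted in this Introduction can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the inhaled mass as 1.4 µg, which is the unit that reproduces the stated 7 × 10^4 cells, and the variable names are ours rather than the authors'.

```python
# Worked check of the aerosol exposure estimate quoted in the Introduction.
# Assumption: the inhaled mass is 1.4 micrograms, the unit that is
# consistent with the stated dose of about 7 x 10^4 cells.

inhaled_mass_g = 1.4e-6      # respirable biosolids inhaled at 500 m, in grams
cells_per_dry_gram = 5e10    # bulk concentration of total cells in biosolids

# Dose in cells = inhaled dry mass x cells per unit dry mass.
inhaled_cells = inhaled_mass_g * cells_per_dry_gram

# Clone libraries are described as limited to fewer than 10^3 sequences.
clone_library_depth = 1e3
depth_shortfall = inhaled_cells / clone_library_depth

print(f"inhaled cells per event: {inhaled_cells:.1e}")
print(f"dose relative to a clone library: {depth_shortfall:.0f}x")
```

Under these assumptions the inhaled dose exceeds the depth of a typical clone library by well over an order of magnitude, which is the gap that motivates deeper sequencing.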
2. Materials and methods
2.1. Sample collection
Biosolids were collected from three anonymous U.S. wastewater treatment facilities. For all treatment processes, influent wastes were from municipal wastewater treatment facilities, and all samples obtained from the processes were final products. Sampling occurred in November 2008. At two separate facilities (Texas and California, USA), biosolids representing a Class B (Class B biosolids are expected to contain pathogens, USEPA (1999)) quality product were sampled after mesophilic anaerobic digestion (MAD, 2 samples). These two facilities then composted the MAD stabilized biosolids via agitated windrow composting and curing to create a Class A (defined as pathogen free by U.S. EPA, USEPA (1999)) quality product (COM, 2 samples). Biosolids were also collected from one facility (Indiana, USA) performing temperature-phased anaerobic digestion to generate a Class A biosolids product (TPAD, 1 sample). For comparison with municipal biosolids, soil (SOIL, 1 sample) was collected from an agricultural site in Connecticut, U.S.A. with no previous record of sludge, compost, or manure application. Table 1 provides detailed descriptions and stabilization operational parameters for biosolids and soils used in this study. For each of the six treatments considered, five 100 g grab samples were shipped overnight to the laboratory on ice, recombined upon receipt, processed within 24 h from sampling and stored at −80 °C for further processing. All samples were collected in accordance with U.S. EPA method 1680 sampling procedures, USEPA (2006).
2.2. Nucleic acid extraction and PCR amplification of bacterial 16S rRNA gene sequences
Total DNA was extracted from each sample using the MoBio PowerSoil DNA kit (MoBio Laboratories, Carlsbad, CA) following a modified protocol for increased DNA yields described elsewhere, Viau and Peccia (2009b). DNA was extracted from three aliquots of each biosolids and soil sample and then pooled. Four independent PCR reactions were then run for each pooled sample. Bacterial 16S rRNA genes were amplified using the BSF8 broad-range forward primer 5′-NNNNNNNNNNNNTCAGAGTTTGATCCTGGCTCAG-3′ and the USR515 universal reverse primer 5′-NNNNNNNNNNNNCACCGCGGCKGCTGGCAC-3′. A unique 12-base barcode was used to designate the PCR amplicons from each of the six biosolids or soil samples. Sample PCR mixtures were prepared in 50 µl volumes and included 1× High Fidelity PCR buffer (Invitrogen, Carlsbad, CA), 0.2 mM deoxyribonucleoside triphosphates, 0.4 mM each of forward and reverse primers, 1.5 mM MgCl2, 0.4 mg/ml bovine serum albumin, 2.5 U Platinum Taq DNA Polymerase High Fidelity (Invitrogen) and 100–200 ng DNA template. Reactions were run on a gradient thermal cycler (Techne, Burlington, N.J.) under the following cycling conditions: 5 min initial denaturation at 95 °C followed by 20 cycles of denaturing at 95 °C for 30 s, annealing at 56 °C for 30 s, extension at 72 °C for 90 s, and a final extension at 72 °C for 7 min.
2.3. PCR amplicon purification and pyrosequencing
Each PCR product was visualized on a 1.2% agarose gel to confirm 507 bp DNA amplicons. The four PCR reactions from each sample were then combined and primers, PCR reagents, and salts were removed using a QIAquick PCR purification kit (Qiagen Inc., Valencia, CA). Purified DNA amplicon
concentrations were determined by a Nanodrop ND-1000 UV–Vis spectrophotometer (Nanodrop Technologies, Wilmington, DE) and 1.25 µg DNA from each of the six samples was combined in a master pool of 10 µg DNA. DNA was sent to the Yale Center for Genome Analysis for shotgun pyrosequencing on a 454 GS-FLX sequencer (Roche Diagnostics Corporation, Indianapolis, IN) utilizing the Titanium Sequencing Kit (Roche) to generate 400-bp sequence reads. Sequences were subjected to quality control at the machine based on run-produced Phred scores. Keypass, Dots, and Mixed filters were used to assess the quality of the whole read while the quality of the ends of the reads was checked by signal intensity and primer filters. Failure of a sequence to pass any of these controls resulted in rejection of the entire read, Roche (2009).
Table 1 – Operational parameters for municipal biosolids. For all treatment processes, influent wastes were from municipal wastewater treatment facilities and all samples obtained from the processes were final products. COM biosolids are the results of composting MAD stabilized biosolids of the same identification number.
Treatment | Operational parameters | Dewatering scheme | Solids content | Location
COM1 (municipal) | Agitated windrow; amended with sawdust, woodchips, or green waste | None | 56% | Texas, USA
COM2 (municipal) | Agitated windrow; amended with sawdust, woodchips, or green waste | None | 76% | California, USA
MAD1 (municipal) | 35–37 °C for < 20 d | Belt filter press | 17% | Texas, USA
MAD2 (municipal) | 35–37 °C for < 20 d | Belt filter press | 15% | California, USA
SOIL | No previous sludge or manure amendments | None | 92% | Connecticut, USA
TPAD (municipal) | 55 °C for 15–30 d, 37 °C for 15 d | Belt filter press | 15% | Indiana, USA
2.4. Phylogenetic and pathogen identification
Raw sequence data were sorted and trimmed using the Pipeline Initial Process on the Ribosomal Database Project II, Cole et al. (2009). Sequences were sorted based on sample-specific tags, and primer and tag sequences were trimmed from sorted sequences. Sequences were excluded if they could not be sorted by a tag, were shorter than 50 bases, or contained more than 1 undefined base. To identify pathogens, sorted and trimmed sequences were first phylogenetically classified using the RDP naïve Bayesian rRNA Classifier tool version 2.0, Wang et al. (2007), on the RDPII pyrosequencing website. An 80% confidence threshold was set. These classifications were examined for the presence of sequences within the pathogenic genera listed in Table 2. This list was populated with pathogens based on the U.S. EPA list of potential pathogens in biosolids, USEPA (2001), and an extensive literature review to identify the breadth of bacterial pathogens that may exist in biosolids. Sequences in these genera were separated for further alignment and analysis. ClustalX, Larkin et al. (2007), was used to align sequences within each pathogen genus with multiple sequences encompassing the 8–515 region of the 16S rRNA gene of representative pathogenic species extracted from NCBI Genbank. For final classification, these alignments were used as inputs for the DNADist program in PHYLIP, Felsenstein (2009). The distance matrix output from DNADist was examined for sample sequences that were within 0.03 Jukes-Cantor distance of a known pathogen sequence. Such sequences were classified as potential pathogens.
Phylogenetic trees were created by aligning and bootstrapping with ClustalX. Trees were viewed, edited, and published with MEGA4, Tamura et al. (2007).
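The classification rule in Section 2.4 rests on the Jukes-Cantor distance, d = −(3/4) ln(1 − (4/3) p), where p is the fraction of differing sites between two aligned sequences. The following is a minimal sketch of that rule, not a reproduction of DNADist: the function names are hypothetical, the treatment of gaps and undefined bases (skipping those columns) is an assumption, and whether the cutoff is inclusive is also an assumption.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Assumption: alignment columns holding a gap ('-') or an undefined
    base ('N') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance, d = -(3/4) ln(1 - (4/3) p)."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def is_potential_pathogen(read: str, reference: str, threshold: float = 0.03) -> bool:
    """True if the read lies within the distance threshold of the reference."""
    return jukes_cantor(p_distance(read, reference)) <= threshold


if __name__ == "__main__":
    reference = "ACGT" * 25                                # toy 100-base reference
    read = "C" + reference[1:50] + "T" + reference[51:]    # two substitutions
    print(jukes_cantor(p_distance(read, reference)))
    print(is_potential_pathogen(read, reference))
```

At small distances the correction barely changes the raw mismatch fraction, so on a 400-base read the 0.03 cutoff corresponds to roughly a dozen substitutions.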
2.5. Population analysis
FASTUNIFRAC, Hamady et al. (2009), was utilized to produce Principal Coordinate Analyses (PCoA) comparing all 6 samples. Due to limitations in the number of sequences that FASTUNIFRAC may process, approximately 25% of the sequences from each sample (4300 COM1, 1947 COM2, 11,528 MAD1, 9796 MAD2, 11,322 SOIL, and 9370 TPAD sequences) were randomly selected for analysis. To facilitate UNIFRAC analysis, a BLAST, Altschul et al. (1990), library was constructed using the GreenGenes, DeSantis et al. (2006), core set from May 2009. A hit table for the sample sequences was created by BLAST comparing the sequences against the core set library with a threshold of 1e-30. The hit table was trimmed for use in FASTUNIFRAC using Pycogent, Knight et al. (2007), and the Pycogent trimmed hit table was used as the input for the FASTUNIFRAC website (http://bmf2.colorado.edu/fastunifrac). The resulting PCoA axes were exported and used to produce a graph summarizing the analysis.
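The two data-reduction steps described above, random selection of roughly a quarter of each sample's reads and trimming of the BLAST hit table at the 1e-30 threshold, can be sketched as follows. This is an illustration only: the function names, the tuple layout of the hit table, and the choice to keep one best hit per read are assumptions, and the code does not reproduce the Pycogent routines the authors used.

```python
import random


def subsample(read_ids, fraction=0.25, seed=0):
    """Randomly select about `fraction` of the reads without replacement."""
    rng = random.Random(seed)          # fixed seed so the selection is repeatable
    pool = list(read_ids)
    k = round(len(pool) * fraction)
    return rng.sample(pool, k)


def best_hits(hits, max_evalue=1e-30):
    """Map each read to its lowest e-value reference hit.

    `hits` is an iterable of (read_id, reference_id, evalue) tuples, a
    hypothetical layout chosen for this sketch. Hits with an e-value above
    `max_evalue` are discarded, so some reads may end up with no reference.
    """
    best = {}
    for read_id, reference_id, evalue in hits:
        if evalue > max_evalue:
            continue
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (reference_id, evalue)
    return {read_id: ref for read_id, (ref, _) in best.items()}


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(100)]
    chosen = subsample(reads)
    print(len(chosen), "of", len(reads), "reads selected")

    table = [
        ("read1", "refA", 1e-45),
        ("read1", "refB", 1e-60),   # better hit for read1
        ("read2", "refC", 1e-10),   # fails the threshold
    ]
    print(best_hits(table))
```

Sampling without replacement keeps every selected read unique, and the fixed seed is a convenience for reproducing a given subsample rather than something stated in the text.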
2.6. Nucleotide sequence accession numbers
Tag information and the unprocessed DNA sequences obtained in this study have been deposited in the GenBank Short Read Archive (SRA) under accession number SRA009753.
3. Results and discussion
3.1. Sequencing results
A total of 238,718 raw sequences were generated. After trimming, sorting, and quality control, 185,358 or 78% of the sequences were used in our analysis. A summary of trimmed sequence information is included in Table 3 and a characteristic histogram of trimmed read lengths is shown in Figure 1. Based on previous studies that indicate short reads can be used to adequately classify microbial communities, Liu et al. (2007), a minimum read length of 50 bases was set to include all potentially useful sequences. The average length of trimmed and sorted sequences was 362 bases and increased to 392
bases including the primer and classification tags that were subsequently trimmed. Massively parallel studies are forced to decide between sequencing depth and number of samples (i.e. number of sequences per sample). As the goal of this study is to identify pathogens, including those that are not traditionally monitored, the sequencing depth necessary to identify these pathogens was not known prior to the study. In an effort to best characterize these pathogens, 6 representative samples were selected. With the goal of directing future monitoring and risk assessment efforts, the samples selected were those to be land applied, i.e. final products. Recent bioinformatic analyses have suggested that the error rate in pyrosequencing combined with the large number of sequences generated provides the potential for artificially high estimates of diversity, Reeder and Knight (2009). Here, we observed that 33%–45% of the OTUs were accounted for by single sequences. Although error analysis methods exist for earlier generations of pyrosequencing technologies, Quince et al. (2008), this analysis is not yet able to analyze sequences generated by GS-FLX Titanium chemistry. Given this error potential and that diversity measurements are not a central theme of this work, population diversity is not reported.
Table 2 – Genus and species of bacterial pathogens considered in this study. Groups in underline represent U.S. EPA bacteria of concern in biosolids, USEPA (2001).
Genus | Pathogenic species
Aeromonas spp. | enteropelogenes, caviae, hydrophila, sobria, veronii
Bacillus | anthracis
Bordetella spp. | bronchiseptica, parapertussis, pertussis
Brevibacterium | linens
Brucella | melitensis
Burkholderia spp. | cepacia, pseudomallei
Campylobacter spp. | jejuni, coli
Chlamydia | trachomatis
Chlamydophila spp. | pneumoniae, psittaci
Clostridium spp. | argentinense, baratii, botulinum, perfringens, tetani
Corynebacterium | diphtheriae
Enterobacter spp. | aerogenes, cloacae, faecium
Enterococcus spp. | faecalis, faecium
Escherichia | coli O157:H7 (enterohemorrhagic)
Francisella | tularensis
Haemophilus | influenzae
Helicobacter | pylori
Klebsiella spp. | pneumoniae, oxytoca
Legionella spp. | anisa, bozemanii, longbeachae, pneumophila
Leptospira | interrogans
Listeria | monocytogenes
Moraxella | catarrhalis
Mycobacterium spp. | avium, bovis, chelonae, fortuitum, kansasii, leprae, phlei, scrofulaceum, tuberculosis
Mycoplasma | pneumoniae
Neisseria | meningitidis
Nocardia spp. | asteroides, brasiliensis
Plesiomonas | shigelloides
Pseudomonas | aeruginosa
Salmonella spp. | enteritidis, paratyphi, typhi
Serratia | marcescens
Shigella spp. | boydii, dysenteriae, flexneri, sonnei
Staphylococcus spp. | aureus, epidermidis, haemolyticus, lugdunensis, saprophyticus, schleiferi
Stenotrophomonas | maltophilia
Streptococcus spp. | bovis, enterica, milleri, pneumoniae, pyogenes
Vibrio | cholerae
Yersinia spp. | enterocolitica, pestis
Table 3 – Pyrosequencing trimmed read length statistics. Read lengths presented are for sequences in which the primer and tag (~30 bp) have been trimmed.
Treatment | Number | Average read length | Standard deviation | Median read length
COM1 | 16474 | 363.88 | 129.85 | 396
COM2 | 7314 | 360.91 | 124.58 | 395
MAD1 | 44108 | 366.71 | 128.96 | 398
MAD2 | 37670 | 361.95 | 127.28 | 393
SOIL | 43670 | 360.94 | 126.29 | 392
TPAD | 36122 | 360.19 | 126.37 | 389
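The sorting and trimming rules behind these read counts (assignment by sample tag, removal of tag and primer, a 50-base minimum, and at most one undefined base) can be sketched as below. This is not the RDP pipeline: the barcode table is hypothetical, exact primer matching is an assumption, and applying the length filter after trimming is also an assumption.

```python
BARCODE_LENGTH = 12
# Forward primer from Section 2.2 with the 12-base barcode positions removed.
FORWARD_PRIMER = "TCAGAGTTTGATCCTGGCTCAG"
MIN_LENGTH = 50       # reads shorter than this are excluded
MAX_UNDEFINED = 1     # reads with more undefined bases (N) are excluded


def sort_and_trim(read, barcode_to_sample):
    """Return (sample, trimmed_read), or None if the read is rejected."""
    read = read.upper()
    sample = barcode_to_sample.get(read[:BARCODE_LENGTH])
    if sample is None:
        return None                      # cannot be sorted by a tag
    body = read[BARCODE_LENGTH:]
    if not body.startswith(FORWARD_PRIMER):
        return None                      # assumption: primer must match exactly
    trimmed = body[len(FORWARD_PRIMER):]
    if len(trimmed) < MIN_LENGTH:
        return None                      # assumption: length checked after trimming
    if trimmed.count("N") > MAX_UNDEFINED:
        return None
    return sample, trimmed


if __name__ == "__main__":
    # Hypothetical barcode assignment, for illustration only.
    barcodes = {"ACGTACGTACGT": "MAD1"}
    read = "ACGTACGTACGT" + FORWARD_PRIMER + "ACGT" * 20
    print(sort_and_trim(read, barcodes))
```

A production pipeline would normally tolerate a small number of primer mismatches and handle reads sequenced from the reverse primer; both are left out here to keep the filter logic visible.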
3.2. Population comparisons
Broad taxonomic analysis revealed that biosolids samples are dominated (greater than 90% of sequences in waste treatment samples) by the Bacteroidetes, Chloroflexi, Firmicutes, Proteobacteria, and Actinobacteria phyla (Fig. 2). A distinctive feature of MAD biosolids has been the previously reported high content (33% by FISH and 40% by clone libraries) of an uncultured environmental clone that grouped near sequences in the Chloroflexi phylum, Baertsch et al. (2007), Chouari et al. (2005). Due to this high enrichment compared to fecal indicators (~0.001% enrichment of fecal indicators and Clostridia) and the absence of these sequences in soils, these organisms have been suggested and used as potential biosolids source tracking organisms, Baertsch et al. (2007), Chouari et al. (2005). Based on a 0.03 Jukes-Cantor distance, these sequences were also described in this study and were the dominant OTU in MAD, accounting for 16% of MAD sequences (average), 2% of TPAD
Fig. 1 – Histogram of trimmed read lengths from the MAD1 sample. The apparent bimodal distribution and left tail are characteristic of all samples. Inset contains a histogram of sequences from all six samples that were identified as members of the 80 pathogenic species considered.
Fig. 2 – Phylogenetic classifications of samples at phylum level and using candidate divisions. Sequences that were not classified to the phylum level were excluded.
sequences, 1.9% of COM sequences (average), and only 0.4% of SOIL sequences. Beyond suggesting a potentially important role for these undescribed organisms in anaerobic digestion, these libraries confirm the high enrichment of Chloroflexi spp. in two new MAD digesters, and support the utility of Chloroflexi spp. source tracking of Class B MAD wastes. To compare each population, a quantitative analysis of sample similarity based on phylogenetic distance was conducted using the principal coordinate analysis (PCoA) function of FASTUNIFRAC. The first two axes of this analysis are shown in Fig. 3. Although derived from different wastewater treatment facilities, similarities exist among MAD samples and among COM samples. The similarity among treatments for MAD and the variability in COM are also mirrored in previous biosolids surveys of fecal indicator content that demonstrate a greater variability in plant to plant indicator concentrations
for COM, Viau and Peccia (2009a,b). These differences are a property of stabilization practice. The standards of practice for operating anaerobic digesters are typically very similar due to common goals of volatile suspended solids destruction, biogas production, and control of pH and volatile fatty acid concentration. In this study, these standards appear to have resulted in similar population structures in different, successfully operated digesters. In COM processes, where the stabilization goals are less certain, limited standardization of aeration techniques, filler material, and residence times results in a more variable product between treatment facilities. While comparisons with additional, different samples will be necessary to confirm these population trends with MAD and COM, the results presented here do support the trends observed in pathogen indicator data that suggest uniformity in populations among MAD biosolids and a less uniform population outcome among different COM biosolids. Furthermore, the demonstrated impact of sewage sludge stabilization on ecology suggests that the microbial population is driven by the process. Thus, reductions in influent pathogens will occur as long as stabilization conditions are unfavorable to the pathogen. The SOIL sample was distinct from all other waste samples and operation of a digester at thermophilic conditions (MAD vs. TPAD) resulted in low similarities between populations.
Fig. 3 – FASTUNIFRAC output of first two axes from PCoA analysis from all treatments.
3.3. Pathogens
The 36 pathogenic genera and 80 species considered in this analysis are shown in Table 2. Retrieved sequences were compared with 16S rRNA gene sequences of these known pathogens through a distance matrix constructed by the Jukes-Cantor algorithm. The 16S rDNA method is restricted to bacterial pathogens and does not include helminths, human viruses, and protozoan pathogens. A Jukes-Cantor distance of 0.03 has been proposed as a measure of species-level
similarity, Schloss and Handelsman (2006). The sequences that were within a 0.03 Jukes-Cantor distance of known pathogen sequences were counted as potential pathogens and are presented in Table 4. Of the 36 pathogen genera considered (Table 2), 14 included sequences that were counted as potential pathogens. Mycobacteria and Clostridia were the most common, numerically comprising 61% of positive pathogen identifications and were the only pathogen genera present in all six samples. For Mycobacteria, the most common species were the opportunistic pathogens Mycobacterium fortuitum, Mycobacterium phlei, and Mycobacterium chelonae (68/74). Mycobacterium avium was also found in SOIL (4) as well as MAD1 and MAD2 (1 each) and the only Mycobacterium tuberculosis sequence was retrieved from SOIL. The most common Clostridia spp. pathogen was Clostridium perfringens. Within each Table 2 genus, the majority of retrieved sequences did not align with known pathogens and among samples, the percent of total sequences within a pathogen genus that were pathogens ranged from 0.7 to 56%. The enrichment of pathogens in all sequences within a sample was 0.08% for MAD (average), 0.05% for COM (average), 0.022% for TPAD, and 0.10% for SOIL. The distribution of the sequence lengths used for pathogen identifications was approximately the same as the distribution observed in the total sequence pool (Fig. 1 inset). Short reads of the 16S rRNA encoding gene region are now commonly used for characterization of bacterial communities. Using massively parallel sequencing techniques, phylogenetic libraries have been constructed for the human, Andersson et al. (2008) and macaque gut, McKenna et al. (2008), soil, Roesch et al. (2007), food processing slurries, Humblot and Guyot (2009) and activated sludge, Sanapareddy et al. (2009). The large sequence yield has dramatically increased the capability of researchers to explore rare sequences within a phylogenetic assemblage, Sogin et al. (2006).
This new ability, coupled with longer reads that allow for a more definitive phylogenetic classification suggests the potential for sequencing deeply into an environmental assemblage to
identify the diverse pathogenic strains, Dowd et al. (2008), Luna et al. (2007). Successful efforts in this area have the potential to change the current paradigm of pathogen detection in environmental and medical samples by including the full diversity of different pathogens, rather than limiting searches to individual organisms that are suspected to be present. With respect to expected human exposure and risk, the sampling coverage here was deep. More than 80,000 16S rDNA sequences were retrieved from the two MAD samples, which is greater than the exposure (70,000 cells inhaled) predicted at 500 m setback from a biosolids land application site. Pathogen content was found in all samples and for biosolids, values were 0.08% for MAD (average), 0.05% for COM (average), and 0.022% for TPAD. Pathogen enrichment in soil was not significantly different from that of MAD biosolids, although the organisms were different. MAD biosolids and SOIL all contained opportunistic pathogens that are common in the environment and soils (Clostridia spp., Mycobacteria spp., Nocardia spp.) while biosolids also contained opportunistic species that are common human microflora (Staphylococcus spp., Streptococcus spp., Enterococcus spp.). The results in Table 4 do not demonstrate reduction between Class B MAD and Class A COM biosolids, and suggest that the opportunistic pathogen load in biosolids is similar to or less than that of soils. However, similar to clone libraries, these results are only semi-quantitative due to varying copy numbers of the 16S rRNA genes in different bacterial species, the required PCR amplification step, and the novel nature of Titanium reagent pyrosequencing. Furthermore, sequencing did not reach deeply enough to identify primary pathogens. Indeed, the differences in selected pathogen content between Class A and Class B biosolids have been documented in both culture-based, Pourcher et al. (2007) and quantitative PCR-based surveys, Viau and Peccia (2009b).
The primary pathogens of concern that have previously been identified for biosolids are listed in bold in Table 2, USEPA (2001). While sequences belonging to primary pathogens including Burkholderia cepacia, Shigella spp. and M. tuberculosis were found in
Table 4 – Number of sequences within 0.03 Jukes-Cantor distance of known pathogens for each treatment. Species considered for each genus are described in Table 2; only genera with classified pathogen sequences at the genus level are included. The number of sequences originally classified in the genus is listed in parentheses. In the TOTAL row, the value after the slash is the number of sequences in that sample.
Pathogen | COM1 | COM2 | MAD1 | MAD2 | SOIL | TPAD
Bordetella | 0(3) | 4(29) | 0(0) | 0(0) | 0(0) | 4(35)
Brucella | 0(0) | 1(6) | 0(0) | 0(0) | 0(0) | 0(0)
Burkholderia | 0(0) | 0(0) | 1(2) | 0(1) | 2(143) | 0(0)
Clostridium | 0(2) | 0(7) | 6(106) | 1(145) | 2(54) | 0(13)
Enterobacter | 0(0) | 1(1) | 0(1) | 1(1) | 0(0) | 0(0)
Enterococcus | 0(0) | 0(0) | 1(2) | 0(1) | 0(1) | 0(2)
Mycobacterium | 4(182) | 2(116) | 16(386) | 14(200) | 35(320) | 3(12)
Nocardia | 0(3) | 1(7) | 0(0) | 0(1) | 5(9) | 0(0)
Pseudomonas | 0(0) | 0(0) | 0(2) | 0(2) | 0(25) | 1(24)
Serratia | 0(0) | 0(0) | 0(0) | 0(0) | 1(1) | 0(0)
Shigella | 0(0) | 0(0) | 0(0) | 2(5) | 3(6) | 0(1)
Staphylococcus | 0(0) | 0(0) | 1(1) | 0(0) | 0(0) | 0(0)
Stenotrophomonas | 1(3) | 0(0) | 0(3) | 0(7) | 0(1) | 0(2)
Streptococcus | 0(0) | 0(0) | 18(51) | 4(13) | 0(0) | 0(0)
TOTAL | 5/16474 | 9/7314 | 43/44108 | 22/37670 | 48/43670 | 8/36122
this study, their values were low (1 in most cases) and more sequencing depth is required to either exclude them or more confidently describe the presence of these primary pathogens in biosolids. Previous estimates of risk from Salmonella spp. during biosolids land application suggest that a viable pathogen level of ~10^3 cells/dry gram may lead to a 1 in 10,000 risk, Brooks et al. (2005). A 10^3 value, when compared to the 10^10 total cells per dry gram in biosolids, requires ~10^7 sequences to identify a pathogenic sequence at this enrichment. This depth is now close to attainable, given that order of magnitude increases in sequencing technology have just been introduced, Metzker (2010). Sample pretreatment to remove common sequences, such as DGGE or the use of multiple genus level PCR primers to focus on pathogens of concern, may also be required to fully investigate these rare sequences and describe diversity. Moreover, recent studies that tracked the culturable and total concentrations of pathogens in MAD, TPAD, and COM suggest that culturable and qPCR-based values are similar and that PCR-based detection may be a reasonable indicator of pathogen load in a biosolids sample, Viau and Peccia (2009a). Finally, while building libraries to understand pathogen diversity will most certainly provide new and important insights into pathogen loads in the environment, the question of identifying a species at 0.03 Jukes-Cantor distance takes on added importance. Although the 400 base length is a significant improvement over 100 or 250 bases, it still may not allow definitive, phylogenetic placement of pathogens. Also, the 0.03 Jukes-Cantor distance (although well accepted as an operational definition) may not define a true pathogen. In many cases, for example Clostridia spp., species grouped with
known pathogens but were greater than 0.03 Jukes-Cantor distance (Fig. 4). Without additional information, it cannot be determined if these similar sequences are nonpathogenic strains or if they were strains or variants that may be associated with disease but not yet identified. In absence of sequences describing all pathogens associated with infectious disease, environmental sample sequencing efforts that are based on universal primers do not yet have the resolving power to identify the significance of sequences closely related to known pathogens but outside of accepted species-level distance. This power will improve as clinical microbiology continues to populate databases with sequences of known or suspected etiological agents.
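The depth argument in this section can be made concrete by comparing the observed pathogen fractions (Table 4 totals) with the fraction implied by 10^3 pathogen cells among 10^10 total cells per dry gram. The sketch below is an illustration under a simplifying assumption that the text itself cautions against: reads are treated as independent draws in proportion to cell abundance, ignoring 16S rRNA gene copy number and PCR bias. The function names are hypothetical.

```python
import math

# Pathogen-classified reads and total trimmed reads per sample (Table 4 totals).
TABLE4_TOTALS = {
    "COM1": (5, 16474), "COM2": (9, 7314), "MAD1": (43, 44108),
    "MAD2": (22, 37670), "SOIL": (48, 43670), "TPAD": (8, 36122),
}


def enrichment_percent(hits, total):
    """Share of reads in a sample classified as potential pathogens, in percent."""
    return 100.0 * hits / total


def detection_probability(fraction, n_reads):
    """P(at least one target read) = 1 - (1 - f)^N, assuming independent reads."""
    return -math.expm1(n_reads * math.log1p(-fraction))


def reads_for_confidence(fraction, confidence=0.95):
    """Smallest read count whose detection probability reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-fraction))


if __name__ == "__main__":
    for sample, (hits, total) in TABLE4_TOTALS.items():
        print(f"{sample}: {enrichment_percent(hits, total):.3f}%")

    target_fraction = 1e3 / 1e10            # 10^3 pathogen cells per 10^10 cells
    print(f"reads for one expected hit: {1.0 / target_fraction:.0e}")
    print(f"P(detect) at 1e7 reads: {detection_probability(target_fraction, 1e7):.2f}")
    print(f"reads for 95% detection: {reads_for_confidence(target_fraction):.1e}")
```

Under this model, 10^7 reads give one expected target read but only about a 63% chance of observing at least one, so a detection goal stated with high confidence would need roughly three times that depth.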
4. Conclusions
Efforts to determine pathogen concentration and exposure in environmental air, water, and wastewater samples are biased by the requirement to select a single or limited group of potential pathogens for analysis. Massively parallel sequencing technology coupled with continually increasing read lengths and sequence quantities can potentially remove these biases by sequencing deeply enough into populations to describe the true diversity of pathogens and include both established and emerging agents. Application of pyrosequencing technology here to biosolids and soil identified 135 bacterial pathogens, demonstrated methods for defining sequences of pathogenic species, and highlighted the importance of determining the sequencing depth required to describe appropriate risk levels. However, this approach has certain limitations, including the need for greater sequencing depth to reach a level where primary pathogens are abundant and the potential uncertainty in definitive pathogen placement using the 16S rRNA gene. Future DNA sequencing-enabled surveys of pathogen presence in environmental samples should build upon existing pyrosequencing studies to determine proper sequencing depth, sequencing targets, and bioinformatics techniques. As regulatory agencies consider performing national surveys of biosolids pathogens to develop risk-based land application guidelines, the judicious application of pyrosequencing should provide important guidance for prioritizing the culture-based and quantitative analysis of selected pathogens and assure the public that significant species are not left out of these risk assessments.
Acknowledgements
This work was supported by the National Science Foundation grant BES0348455. KJB is partially supported by a fellowship from the Environmental Research and Education Foundation.

Fig. 4 - Phylogenetic tree of MAD1 sequences that were classified as Clostridia spp. Sequences within 0.03 Jukes-Cantor distance of known pathogens are bolded. Tree rooted by Escherichia coli (GQ273522.1). Bootstrap values calculated by 1000 repetitions. Values >50% indicated by open circles, values >70% indicated by closed circles.
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. Journal of Molecular Biology 215, 403-410.
Andersson, A.F., Lindberg, M., Jakobsson, H., Backhead, F., Nyren, P., Engstrand, L., 2008. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE 3 (7), e2836.
Baertsch, C., Paez-Rubio, T., Viau, E., Peccia, J., 2007. Source tracking aerosols released from land-applied class B biosolids during high-wind events. Applied and Environmental Microbiology 73, 4522-4531.
Bastian, R., 1997. The biosolids (sludge) treatment, beneficial use, and disposal situation in the USA. European Water Pollution Control Journal 7, 62-72.
Brooks, J.P., Tanner, B.D., Gerba, C., Haas, C., Pepper, I., 2005. Estimation of bioaerosol risk of infection to residents adjacent to a land applied biosolids site using an empirically derived transport model. Journal of Applied Microbiology 98 (2), 397-405.
Chouari, R., Le Paslier, D., Daegelen, P., Weissenbach, J., Sghir, A., 2005. Novel predominant archaeal and bacterial groups revealed by molecular analysis of an anaerobic sludge digestor. Applied and Environmental Microbiology 7, 1104-1115.
Cole, J.R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R.J., Kulam-Syed-Mohideen, A.S., McGarrell, D.M., Marsh, T., Garrity, G.M., Tiedje, J.M., 2009. The ribosomal database project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research 37, D141-D145.
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L., 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology 72 (7), 5069-5072.
Dowd, S.E., Sun, Y., Secor, P.R., Rhoads, D.D., Wolcott, B.M., James, G.A., Wolcott, R.D., 2008. Survey of bacterial diversity in chronic wounds using pyrosequencing, DGGE, and full ribosome shotgun sequencing. BMC Microbiology 8, 43.
Eisenberg, J.N.S., Moore, K., Soller, J.A., Eisenberg, D.M., Colford Jr., J.M., 2008. Microbial risk assessment framework for exposure to amended sludge projects. Environmental Health Perspectives 116 (6), 727-733.
Felsenstein, J., 2009. PHYLIP (Phylogeny Inference Package) Version 3.6. Distributed by the Author. Department of Genome Sciences, University of Washington, Seattle.
Hamady, M., Lozupone, C., Knight, R., 2009. Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. ISME Journal 4 (1), 17-27.
Humblot, C., Guyot, J.-P., 2009. Pyrosequencing of tagged 16S rRNA gene amplicons for rapid deciphering of the microbiomes of fermented foods such as Pearl Millet slurries. Applied and Environmental Microbiology 75 (13), 4354-4361.
Knight, R., Maxwell, P., Birmingham, A., Carnes, J., Caporaso, J.G., Easton, B., Eaton, M., Hamady, M., Lindsay, H., Liu, Z., Lozupone, C., McDonald, D., Robeson, M., Sammut, R., Smit, S., Wakefield, M., Widmann, J., Wikman, S., Wilson, S., Ying, H., Huttley, G., 2007. PyCogent: a toolkit for making sense from sequence. Genome Biology 8 (8), R171.
Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., Higgins, D.G., 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23 (21), 2947-2948.
Liu, Z., Lozupone, C., Hamady, M., Bushman, F.D., Knight, R., 2007. Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Research 35 (18), e120.
Low, S.Y., Paez-Rubio, T., Baertsch, C., Kucharski, M., Peccia, J., 2007. Off-site exposure to respirable aerosols produced during the disk-incorporation of class B biosolids. Journal of Environmental Engineering 133 (10), 987-994.
Luna, R.A., Fasciano, L.R., Jones, S.C., Boyanton Jr., B.L., Ton, T.T., Versalovic, J., 2007. DNA pyrosequencing-based bacterial pathogen identification in a pediatric hospital setting. Journal of Clinical Microbiology 45 (9), 2985-2992.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z.T., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P.G., Begley, R.F., Rothberg, J.M., 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437 (7057), 376-380.
McKenna, P., Hoffmann, C., Minkah, N., Aye, P.P., Lackner, A., Liu, Z., Lozupone, C.A., Hamady, M., Knight, R., Bushman, F.D., 2008. The macaque gut microbiome in health, lentiviral infection, and chronic enterocolitis. PLoS Pathogens 4 (2), e20.
Metzker, M.L., 2010. Sequencing technologies - the next generation. Nature Reviews Genetics 11 (1), 31-46.
NRC, 2002. Committee on Toxins and Pathogens in Biosolids Applied to Land: Advancing Standards and Practices. National Research Council, Washington DC.
Paez-Rubio, T., Ramarui, A., Sommer, J., Xin, H., Anderson, J., Peccia, J., 2007. Emission rates and characterization of aerosols produced during the spreading of dewatered class B biosolids. Environmental Science & Technology 41 (10), 3537-3544.
Pourcher, A.M., Francoise, P.B., Virginie, F., Agnieszka, G., Vasilica, S., Gerard, M., 2007. Survival of faecal indicators and enteroviruses in soil after land-spreading of municipal sewage sludge. Applied Soil Ecology 35 (3), 473-479.
Quince, C., Curtis, T.P., Sloan, W.T., 2008. The rational exploration of microbial diversity. ISME Journal 2 (10), 997-1006.
Reeder, J., Knight, R., 2009. The ‘rare biosphere’: a reality check. Nature Methods 6 (9), 636-637.
Roche, 2009. Genome Sequencer FLX System Software Manual, version 2.3.
Roesch, L.F.W., Fulthorpe, R.R., Riva, A., Casella, G., Hadwin, A.K.M., Kent, A.D., Daroub, S.H., Camargo, F.A.O., Farmerie, W.G., Triplett, E.W., 2007. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME Journal 1 (4), 283-290.
Rusin, P.A., Maxwell, S.L., Brooks, J.P., Gerba, C.P., Pepper, I.L., 2003. Evidence for the absence of Staphylococcus aureus in land applied biosolids. Environmental Science and Technology 37, 4027-4030.
Sanapareddy, N., Hamp, T.J., Gonzalez, L.C., Hilger, H.A., Fodor, A.A., Clinton, S.M., 2009. Molecular diversity of a North Carolina wastewater treatment plant as revealed by pyrosequencing. Applied and Environmental Microbiology 75 (6), 1688-1696.
Schloss, P.D., Handelsman, J., 2006. Toward a census of bacteria in soil. PLoS Computational Biology 2 (7), e92.
Sinclair, R., Boone, S.A., Greenberg, D., Keim, P., Gerba, C.P., 2008. Persistence of category A select agents in the environment. Applied and Environmental Microbiology 74 (3), 555-563.
Sogin, M.L., Morrison, H.G., Huber, J.A., Welch, D.M., Huse, S.M., Neal, P.R., Arrieta, J.M., Herndl, G.J., 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences 103 (32), 12115-12120.
Spicer, S., 2002. Fertilizer, manures, or biosolids. Water Environment and Technology 14, 32-33.
Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24 (8), 1596-1599.
Tenenbaum, D., 1997. The beauty of biosolids. Environmental Health 105 (1).
USEPA, 1999. Environmental regulations and technology: control of pathogens and vector attraction in sewage sludge. Office of Research and Development, US Environmental Protection Agency, Washington DC.
USEPA, 2001. Workshop on Emerging Infectious Disease Agents and Associated with Animal Manures, Biosolids and Other Similar By-products. USEPA National Risk Management Research Laboratory, Cincinnati, OH.
USEPA, 2006. Method 1680: Fecal Coliforms in Sewage Sludge (Biosolids) by Multiple-Tube Fermentation using Lauryl Tryptose Broth (LTB) and EC Medium.
Viau, E., Peccia, J., 2009a. Evaluation of the enterococci indicator in biosolids using culture-based and quantitative PCR assays. Water Research 43 (19), 4878.
Viau, E., Peccia, J., 2009b. A survey of wastewater indicators and human pathogen genomes in biosolids produced by class A and class B stabilization treatments. Applied and Environmental Microbiology 75, 164-174.
Wang, Q., Garrity, G.M., Tiedje, J.M., Cole, J.R., 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73 (16), 5261-5267.