Support vector regression model of wastewater bioreactor performance using microbial community diversity indices: Effect of stress and bioaugmentation

Support vector regression model of wastewater bioreactor performance using microbial community diversity indices: Effect of stress and bioaugmentation

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6 Available online at www.sciencedirect.com ScienceDirect journal homepage: www.elsevier.com/lo...

1MB Sizes 1 Downloads 27 Views

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

Available online at www.sciencedirect.com

ScienceDirect journal homepage: www.elsevier.com/locate/watres

Support vector regression model of wastewater bioreactor performance using microbial community diversity indices: Effect of stress and bioaugmentation Hari Seshan a,b, Manish K. Goyal c, Michael W. Falk d, Stefan Wuertz a,b,e,* a

Singapore Centre on Environmental Life Sciences Engineering (SCELSE), School of Biological Sciences SBS-B1N-27, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore b Department of Civil and Environmental Engineering, 2001 Ghausi Hall, One Shields Avenue, University of California, Davis, CA 95616, USA c Department of Civil Engineering, Indian Institute of Technology, Guwahati, 781039, India d HDR Engineering, Inc., 2365 Iron Point Road, Suite 300, Folsom, CA 95630-8709, USA e School of Civil and Environmental Engineering, 50 Nanyang Ave, Nanyang Technological University, Singapore 639798, Singapore

article info

abstract

Article history:

The relationship between microbial community structure and function has been examined

Received 25 May 2013

in detail in natural and engineered environments, but little work has been done on using

Received in revised form

microbial community information to predict function. We processed microbial community

28 October 2013

and operational data from controlled experiments with bench-scale bioreactor systems to

Accepted 7 January 2014

predict reactor process performance. Four membrane-operated sequencing batch reactors

Available online 21 January 2014

treating synthetic wastewater were operated in two experiments to test the effects of (i) the toxic compound 3-chloroaniline (3-CA) and (ii) bioaugmentation targeting 3-CA degrada-

Keywords:

tion, on the sludge microbial community in the reactors. In the first experiment, two re-

TRFLP

actors were treated with 3-CA and two reactors were operated as controls without 3-CA

Microbial community modeling

input. In the second experiment, all four reactors were additionally bioaugmented with a

Vector regression model

Pseudomonas putida strain carrying a plasmid with a portion of the pathway for 3-CA

Microbial community diversity

degradation. Molecular data were generated from terminal restriction fragment length

Bioaugmentation

polymorphism (T-RFLP) analysis targeting the 16S rRNA and amoA genes from the sludge

Chloroanaline

community. The electropherograms resulting from these T-RFs were used to calculate diversity indices e community richness, dynamics and evenness e for the domain Bacteria as well as for ammonia-oxidizing bacteria in each reactor over time. These diversity indices were then used to train and test a support vector regression (SVR) model to predict reactor performance based on input microbial community indices and operational data. Considering the diversity indices over time and across replicate reactors as discrete values, it was found that, although bioaugmentation with a bacterial strain harboring a subset of genes involved in the degradation of 3-CA did not bring about 3-CA degradation, it significantly

* Corresponding author. Singapore Centre on Environmental Life Sciences Engineering (SCELSE), School of Biological Sciences SBS-B1N27, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore E-mail address: [email protected] (S. Wuertz). 0043-1354/$ e see front matter ª 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.watres.2014.01.015

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

283

affected the community as measured through all three diversity indices in both the general bacterial community and the ammonia-oxidizer community (a ¼ 0.5). The impact of bioaugmentation was also seen qualitatively in the variation of community richness and evenness over time in each reactor, with overall community richness falling in the case of bioaugmented reactors subjected to 3-CA and community evenness remaining lower and more stable in the bioaugmented reactors as opposed to the unbioaugmented reactors. Using diversity indices, 3-CA input, bioaugmentation and time as input variables, the SVR model successfully predicted reactor performance in terms of the removal of broad-range contaminants like COD, ammonia and nitrate as well as specific contaminants like 3-CA. This work was the first to demonstrate that (i) bioaugmentation, even when unsuccessful, can produce a change in community structure and (ii) microbial community information can be used to reliably predict process performance. However, T-RFLP may not result in the most accurate representation of the microbial community itself, and a much more powerful prediction tool can potentially be developed using more sophisticated molecular methods. ª 2014 Elsevier Ltd. All rights reserved.

1.

Introduction

Wastewater treatment (WWT) plants are increasingly tasked with addressing removal of specific contaminants like xenobiotics as a result of ever more stringent discharge regulations, and because these contaminants may disrupt regular process performance. For example, the state of Oregon in the United States recently revised water quality standards for toxic pollutants in surface waters (Oregon Department of Environmental Quality, 2011). This action will require WWT plants discharging into surface waters in the state to remove a wider range of contaminants to lower levels. However, current practice in WWT system design is more suited to the removal of broad category constituents like biochemical oxygen demand (BOD), which represents general organic substrates. The approach assumes that the microbial community involved in the biological treatment process has sufficient functional redundancy and diversity so that knowledge of the microbial community structure and its specific metabolic capabilities is unnecessary. This redundancy has been shown for broadrange processes like BOD removal and nitrification (Hien et al., 2011; Reid et al., 2008; Siripong and Rittmann, 2007; Wang et al., 2012). For example, ammonia oxidation, crucial to the nitrification process common in most wastewater treatment systems around the world, was functionally stable in reactors where the ammonia oxidizing community was dynamic (Siripong and Rittmann, 2007; Wang et al., 2012). This has also been shown in the case of BOD removal, where reactors with high community diversity performed similarly (Reid et al., 2008; Valentin-Vargas et al., 2012; Wang et al., 2011). However, functional redundancy may not be apparent in the case of specific processes in WWT, like the response to perturbations (Kaewpipat and Grady, 2002) and the mineralization of xenobiotics (Falk and Wuertz, 2010). Engineers would therefore benefit from a more detailed and specific knowledge of the role of microbial communities in biological wastewater treatment. An overall understanding of reactor dynamics must originate in understanding three components: (a) operational, environmental and design conditions of the reactor,

(b) the microbial community in the reactor, and (c) reactor treatment performance in response to (a) and (b) and to their possible fluctuations. While parameters that fall under (a) are often easily controlled or monitored, (b) is not as straightforward to control or monitor, requiring a reliable method to describe microbial communities. Several cultivation-independent methods exist to qualitatively describe microbial communities, of which terminal restriction fragment length polymorphism (T-RFLP) has historically been an effective method in terms of base pair resolution and lower labor intensity (Osborn et al., 2000; Schutte et al., 2008). A simple yet effective way to quantitatively describe microbial communities based on T-RFLP data is through the use of diversity indices. For example, the three diversity indices, richness (R), community dynamics (Dy) and functional organization (Fo), defined here as evenness, can be used to describe a microbial community using data obtained from any established fingerprinting or sequencing technique (Marzorati et al., 2008). Once a microbial community has been adequately described, its relationship to function can be investigated. In this case, diversity indices may be used to relate the microbial community in bioreactors to process performance. Traditional statistical analyses in wastewater systems are gradually being replaced by artificial intelligence soft computing approaches such as artificial neural networks (ANN), fuzzy logic and adaptive neuro fuzzyefuzzy interference system (ANFIS) and can be used to identify complex relationships between community structure and function. Such models are useful in modeling biological systems because they are able to account for nonlinear relationships that are not yet completely understood in theory (Chen et al., 2003; Huang et al., 2010). They have been used extensively to model fullscale WWT systems (Aguado et al., 2009; Civelekoglu et al., 2009; Hamed et al., 2004; Hanbay et al., 2008), and it has been demonstrated that these techniques are generally more satisfactory than those given by traditional regression equations (Civelekoglu et al., 2009; Hamed et al., 2004; Karthikeyan et al., 2005). Recently, Singh and colleagues used support vector regression (SVR) to predict the removal efficiency of

284

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

settling basins and the results were found to be better than the neural network approach (Singh et al., 2008). SVR has been used for prediction of scour depth in water resources engineering and the results were compared with neural network based models (Goyal and Ojha, 2011). The SVR-based model more accurately predicted the scour downstream than did the ANN model. The majority of approaches in the field of WWT have been applied (a) to full-scale or pilot-scale systems, (b) without taking information about the microbial community into consideration and (c) during unstressed periods where only the removal of broad-range contaminants was examined. Machine learning models like the ones described above have not been used to predict reactor performance in controlled bench-scale reactor systems where the response of the microbial community to the input of stress can be investigated using the relevant experimental controls. The compound 3-chloroaniline (3-CA), a xenobiotic compound that is a byproduct in the production of azo dyes, herbicides and pesticides, served as a stressor in our experiments. 3-CA is poorly biodegradable and disrupts regular processes like COD removal and nitrification (Boon et al., 2002; Falk and Wuertz, 2010; Krol et al., 2012). Bioaugmentation, the addition of a specific bacterial strain to a bioreactor to enhance degradation of this compound, has been addressed before (Bathe, 2004; Bathe et al., 2009; Boon et al., 2002), but the impact of such a stress and bioaugmentation on the microbial community, as well as the subsequent impact on process performance and its predictability, have not been studied before. In this study, the structure and dynamics of the microbial community in controlled bench-scale reactor experiments were examined by means of diversity indices extracted from T-RFLP data and used to predict process performance. The objectives were to (i) use diversity indices as a simple tool to quantitatively describe the dynamics of microbial communities in controlled bioreactor experiments with stress; (ii) relate the process performance of these reactors to the microbial community structure and experimental parameters using a Support Vector Regression (SVR) model; and (iii) test the ability of the SVR model to predict the effect of stress in the form of the input of 3-chloroaniline (3-CA) and of bioaugmentation aimed at degrading this compound. We wanted to test the following hypotheses: (a) the addition of 3-CA would cause a significant shift in the microbial community structure as represented by diversity indices and (b) bioaugmentation, if successful, would help buffer such a community shift. With respect to performance prediction, we hypothesized that (c) the community structure as seen through diversity indices and experimental conditions could reliably predict reactor performance in the treatment of both broad-range as well as specific contaminants.

2.

Materials and methods

2.1.

Experimental set-up

The data used to develop the models were generated from two bench-scale reactor experiments, the details of which have

been described earlier (Falk et al., 2009; Falk and Wuertz, 2010). The data generated from these experiments were re-analyzed with the objectives stated above. In the first, described here as the unbioaugmented experiment, stress was imposed on an acclimated sludge microbial community in the form of a model xenobiotic compound, 3-chloroaniline (3-CA) (Falk and Wuertz, 2010). Two reactors were used as control reactors, with no 3-CA input (total COD input of 400 mg/L), and two were used as treatment reactors, with 3-CA input mixed in with the feed at a level representing 25% of the total COD load (67 mg/L 3-CA, or 100 mg/L 3-CA as COD out of a total COD input of 400 mg/L; the remainder of the COD components were scaled back to 75% of their concentrations in the control reactors). The reactors were run as membrane-operated sequencing batch reactors with 2 h anoxic conditions (including feeding), 7 h aeration, and 3 h membrane filtration (with aeration). Aeration was administered at 0.2 L/min of air. pH was maintained at 7.5 using a buffered feed. The input of 3CA was shown to disrupt the process performance of an otherwise stable reactor community. In the second experiment, designated the bioaugmented experiment, the control biomass from the unbioaugmented experiment was pooled and redistributed among the four reactors and bioaugmented with Pseudomonas putida strain UWC3 (pWDL7::rfp), which carries the upper pathway genes on its plasmid for partial degradation of 3-CA via conversion to the intermediate 4-chlorocatechol (Krol et al., 2012). This host strain was added to all four reactors at the beginning of the experiment to a final mixed liquor concentration of 1.2  107 cells/mL, representing about 1% of the total active biomass in the reactors. The biomass was subjected to the same 3-CA loading as in the unbioaugmented experiment. All other operational parameters were the same as in the unbioaugmented experiment. Again, two control and two treatment reactors were used, in terms of 3-CA input. The host strain and the plasmid were both maintained stably at low levels in reactors as evidenced by qPCR and epifluorescence microscopy; however, the expression of a full degradation pathway was not evident and 3-CA removal did not occur (Falk et al., 2013). Each of the above experiments was started using thoroughly cleaned reactors and sterilized tubing, and sludge that had not previously been exposed to 3-CA.

2.2.

Determination and analysis of diversity indices

Mixed liquor samples from each reactor in each of the above experiments were collected periodically and processed as described in earlier work (Falk et al., 2009; Falk and Wuertz, 2010). Extracted DNA was subjected to PCR using 8f/926r primers for the 16S rRNA gene (Liu et al., 1997) and 1f/2r primers for amoA (Rotthauwe et al., 1997). For each sample, an electropherogram was generated from CfoI enzyme digestion for the 16S rRNA gene amplicon and separately from the amoA gene amplicon. This electropherogram was processed to generate a T-RFLP profile (Abdo et al., 2006). The final T-RFLP profile that was used to calculate diversity indices was produced by implementing a signal to noise ratio of 3 and a resolving power of 3 bp between consecutive peaks. The final spectrum was interpreted to indicate, roughly, a composite of fragments of unique lengths ideally representing distinct taxa,

285

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

and the fluorescence peak intensities roughly corresponding to their relative abundances (Schutte et al., 2008). Using the data from both experiments, diversity indices were determined as described earlier (Marzorati et al., 2008), but adapted for T-RFLP data rather than for DGGE data using the following procedure: (i) Richness (R): The number of detectible true peaks in each sample, (ii) Dynamics (Dy): an adaptation of a “moving window” analysis comparing samples in consecutive time steps using Pearson’s productemoment correlation coefficient to indicate similarity (Marzorati et al., 2008). The complement of this value gives the difference between samples, as given in Equation (1):   Pn  i¼1 Xi  X Yi  Y q ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi Dy ¼ 1  qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2 Pn  2ffi Pn  i¼1 Xi  X i¼1 Yi  Y

(1)

where X ¼ fluorescence intensities from first sample, Y ¼ fluorescence intensities from second sample, n ¼ number of T-RFs detected in each sample. (iii) Evenness (Fo): This was calculated by generating a rarefaction curve for cumulative relative abundance of each detected OTU against the corresponding cumulative number of detected OTU. To be consistent across samples, the cumulative relative abundance at which 20% of the detected OTUs were present was taken as the evenness value for each sample (Marzorati et al., 2008).

were measured according to Standard Methods 4500-NH3 D and 4110 C respectively.

2.4.

Support vector regression

Mathematically, support vector machines (SVMs) are a range of classification and regression algorithms that have been formulated from the principles of statistical learning theory (Vapnik, 1995). The formulation embodies the structural risk minimization (SRM) principle, as opposed to the empirical risk minimization (ERM) approach commonly employed within statistical learning methods (Gunn, 1998). In support vector regression fðxi ; yi Þgli¼1 is considered as a training set where each xi 3Rn represents the input space of the sample and has a corresponding scalar measure output value yi 3R for i ¼ 1,., l, where l denotes the size of the training data. Here the goal is to find a function that predicts the response in the best possible way (Han et al., 2007). The form of this function is: f ðxÞ ¼ ðw$FðxÞÞ þ b

(2)

n

where w3R and b3R denote the weight vector and the bias, respectively, while F denotes a non-linear transformation from Rn to a high dimensional space. In SVR, the aim is to find the value of w and b such that values of x can be determined by minimizing the regression risk, Rreg (Wu et al., 2003): Rreg ðf Þ ¼ C

[ X   1 G f ðxi Þ  yi þ kwk2 2 i¼0

(3)

where Gð$Þ is a cost function, C is a constant and vector w can be written in terms of data points as: w¼

[ X 

 ai  ai Fðxi Þ

(4)

i¼1

The diversity indices thus generated were used to test the differences between experiments, treatments and reactors. The means of R, Dy and Fo were compared in the cases of reactors with and without 3-CA in bioaugmented and unbioaugmented experiments, and for reactors within treatments using two-way ANOVA models. Finally, the changes in diversity indices over the course of each experiment were also examined to evaluate how the MBR community profiles changed over time.

2.3.

Other predictor and response variables

Variables used in the model other than the diversity indices described above were mixed liquor total suspended solids (TSS or MLSS), presence of stress (1 for input of 3-CA and 0 for no 3-CA), presence of bioaugmentation (1 for the bioaugmented experiment and 0 for the unbioaugmented experiment) and time (reactor operation period) as predictor variables and COD and 3-CA removal as response variables. COD measurements in influent and effluent were made in accordance with Standard Methods 5220 D (APHAeAWWAeWEF, 1998) and TSS in mixed liquor was measured per Standard Methods 5240 D. 3-CA in influent and effluent was measured using reverse-phase HPLC as described before (Falk and Wuertz, 2010). Also modeled as response variables were ammonia and nitrate in the effluent. These

By substituting Eq. (4) into Eq. (2), Eq (2) may be rewritten as: f ðxÞ ¼

[ X  i¼1

[ X    ai  ai ðFðxi Þ$FðxÞÞ þ b ¼ ai  ai kðxi ; xÞ þ b

(5)

i¼1

In Eq. (3) the dot product can be replaced with function k(xi,x), known as the kernel function. The flexibility of the SVR is provided by the use of kernel functions since kernel functions enable the dot product to be performed in highdimensional feature space using low dimensional space data input without knowing the transformation F. A linear solution in the higher dimensional feature space corresponds to a nonlinear solution in the original, lower dimensional input space. There are methods available that use non-linear kernels in their approach towards regression problems while applying SVR (Maity et al., 2010). All kernel functions must satisfy Mercer’s condition that corresponds to the inner product of some feature space (Goyal and Ojha, 2011). There exist several choices of kernel function K including linear, polynomial and the most commonly used Gaussian radial basis function: o n (6) kðxi ; xÞ ¼ exp  gjx  xi j2 The purpose of the SVR is to find a function having at most ε deviation from the actual target vectors for all given training data. The ε-insensitive loss function is the most widely used cost function. The function is in the form:

286

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

     jf ðxÞ  yj  ε; forjf ðxÞ  yj  ε G f x y ¼ 0 otherwise

(7)

Syj)xi ¼

Dyj=y

j

(9)

Dxi=x

i

The regression risk in Eq. (3) and theε-insensitive loss function (7) can be minimized by solving the quadratic optimization problem in Eq. (8): [ [       X  1X ai ðyi εÞ  ai yi þ ε ai  ai aj  aj k xi ; xj  2 i;j¼1 i¼1

(8)

subject to the following constraint: [ X

ai  ai ¼ 0; ai ; ai ˛½0; C

i¼1

ai ; ai

where the Lagrange multipliers represent solutions to the above quadratic problem that act as forces pushing predictions towards output value yi. Only the non-zero values of the Lagrange multipliers in Eq. (8) are useful in forecasting the regression line and are known as support vectors (SVs) (Goyal and Ojha, 2011). The constant, C, introduced in Eq. (3) determines the penalties to the estimation errors. A large C assigns higher penalties to errors so that the regression is trained to minimize errors, which have a lower generalization, while a small C assigns lower penalties to errors; this allows the minimization of the margin with errors, improving the generalization ability of the model. If C is infinitely large, the SVR algorithm would not allow the occurrence of any error, and this would result in a complex model, whereas when C is zero, the result will contain larger errors and the model will be less complex. For further details, readers are referred to other literature (Gandhi et al., 2007; Goyal and Ojha, 2011; Gunn, 1998; Wu et al., 2003).

2.5.

Sensitivity analysis

To quantify the relative importance of each diversity index with respect to the prediction of reactor performance, a sensitivity analysis was performed on each SVR model according to earlier work (Almeida, 2002). This was done by evaluating the normalized ratio of change in a predicted output value to the incremental change in a given input value, as shown in Eq. (9):

Where Syj)xi is the sensitivity of the jth SVR output to the ith input. Since the relationships between diversity indices and performance parameters are nonlinear, a cumulative value of individual sensitivities was calculated for a given input and output variable and normalized to the sum of cumulative sensitivities of all three diversity indices for a given performance parameter.

3.

Results

3.1.

Diversity indices

3.1.1.

Differences between treatments

The diversity indices developed from the T-RFLP results of molecular community data included richness, dynamics and evenness, as specified earlier (Marzorati et al., 2008). Two sets of diversity indices were generated per sample; one for the general bacterial community using 16S rRNA gene T-RFs and another set for the ammonia-oxidizing community using amoA gene T-RFs. In terms of the general bacterial community, the values of richness (R) ranged from 9 to 30, the dynamics index (Dy) ranged from 0.001 to 0.384, and the evenness index (Fo) ranged from 0.6 to 0.9 over all the data points (Table 1). Since only two replicate reactors were used per treatment for both experiments, statistical analyses within time points were not deemed useful. Therefore, we computed differences on the whole set of samples, taking each time point as an independent measurement. For both the general bacterial community and the ammonia-oxidizing community, the diversity indices generated were compared using a three-way fixed effects ANOVA model. In each model, each diversity index was tested for the effects of individual reactor, presence of 3-CA, and bioaugmentation. For the general bacterial community, examined using 16S rRNA gene-derived T-RFs, richness (R) was affected significantly by bioaugmentation (p ¼ 8.89  105) but not by 3-CA (p ¼ 0.306). Community evenness (Fo) was significantly affected by both 3-CA (p ¼ 8.98  105) and bioaugmentation

Table 1 e Richness (R), dynamics (Dy) and evenness (Fo) for T-RFs generated from the 16S rRNA gene or amoA gene. The numbers represent the mean and one standard deviation of the mean. Bioaugmentationa 16S rRNA gene T-RFs 0 0 1 1 amoA T-RFs 0 0 1 1 a b

3-CAb

n

0 1 0 1

26 26 30 30

20.1  21.2  18.0  18.2 

2.8 3.4 2.1 4.4

0.05 0.04 0.06 0.09

   

0.03 0.05 0.05 0.07

0.81 0.82 0.81 0.77

   

0.23 0.06 0.06 0.07

0 1 0 1

30 30 22 22

19.5  19.4  17.5  13.7 

6.2 5.6 3.6 3.0

0.09 0.10 0.33 0.33

   

0.04 0.04 0.01 0.01

0.63 0.61 0.98 0.9

   

0.09 0.13 0.01 0.01

0 ¼ Unbioaugmented experiment; 1 ¼ bioaugmented experiment. 0 ¼ Control reactors (no 3-CA input); 1 ¼ treatment reactors (3-CA input).

R

Dy

Fo

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

287

Fig. 1 e Variation of diversity indices from 16S rRNA gene T-RFs over time in all four reactors in unbioaugmented and bioaugmented experiments. Richness (R) is shown over the unbioaugmented experiment (A) and the bioaugmented experiment (B), Dynamics (Dy) over the unbioaugmented experiment (C) and the bioaugmented experiment (D), and Evenness (Fo) over the unbioaugmented experiment (E) and the bioaugmented experiment (F). In all plots, dashed lines with open symbols represent control reactors (operated without 3-CA addition) and solid lines with closed symbols represent treatment reactors (treated with 3-CA at 25% of the total COD feed). Plots A, C and E represent the unbioaugmented experiment, whereas plots B, D and F represent the bioaugmented experiment in which a partial degradation pathway for 3-CA was provided to each reactor.

(p ¼ 1.59  106). Community dynamics (Dy), on the other hand, was only affected by bioaugmentation (p ¼ 0.0013). The interaction effects between bioaugmentation and 3-CA were also significant for Dy (p ¼ 0.0386). In the case of the ammonia-oxidizer community, similar three-way ANOVA models were run for diversity indices generated from amoA T-RFs. In this case, R for the ammoniaoxidizer community ranged from 9 to 35; Dy from 0.04 to 0.36; and Fo from 0.13 to 0.99 over all the data points (Table 1). Testing the means within each treatment, experiment and reactor revealed that R was primarily affected by bioaugmentation (p ¼ 0.0002). Bioaugmentation was also found to have a significant effect on the Dy and Fo indices (in both

cases p ¼ 2  1016). Interaction effects between 3-CA and bioaugmentation were not significant in these cases.

3.1.2.

Temporal changes

The temporal changes in diversity indices were also investigated for both the 16S rRNA gene T-RFs (Fig. 1) and amoA T-RFs (Fig. 2) within each experiment. In terms of the 16S rRNA gene, the richness index in the unbioaugmented experiment stayed constant overall in both control and treatment reactors over time (Fig. 1A). On the contrary, in the bioaugmented experiment, the control reactors displayed little change while treatment reactors showed an overall fall in richness (Fig. 1B). Bioaugmentation in conjunction with 3-CA input appeared to

288

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

Fig. 2 e Variation of diversity indices from amoA T-RFs over time in all four reactors in unbioaugmented and bioaugmented experiments. Richness (R) is shown over the unbioaugmented experiment (A) and the bioaugmented experiment (B), Dynamics (Dy) over the unbioaugmented experiment (C), and the bioaugmented experiment (D) and Evenness (Fo) over the unbioaugmented experiment (E) and the bioaugmented experiment (F). In all plots, dashed lines with open symbols represent control reactors (operated without 3-CA addition) and solid lines with closed symbols represent treatment reactors (treated with 3-CA at 25% of the total COD feed). Plots A, C and E represent the unbioaugmented experiment and plots B, D and F represent the bioaugmented experiment in which a partial degradation pathway for 3-CA was provided to each reactor.

show the greatest temporal change in community richness. It must be noted, however, that since sampling was less frequent toward the end of the experiment, any fluctuations that occurred during this period may not have been observed. The dynamics index (Dy) from the 16S rRNA gene did not show overall trends in either experiment or treatment type. In the unbioaugmented experiment, Dy remained relatively constant throughout, with the exception of a peak in one of the treatment reactors around day 20 (Fig. 1C). In the bioaugmented experiment, Dy was slightly higher than in the unbioaugmented experiment. Two clear peaks were observed;

one in treatment reactors alone and one in all four reactors, near days 20 and 30 of reactor operation, respectively (Fig. 1D). Community dynamics in general appeared to fluctuate more toward the end of the bioaugmented experiment. There was no difference between control and treatment reactors in either experiment (Fig. 1C and D), except for the above mentioned presence of a sudden spike. The evenness index (Fo) from the 16S rRNA gene showed little overall change throughout the course of both experiments. Fo values were, overall, higher without bioaugmentation, implying that the community profiles may

289

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

have been slightly more even in the bioaugmented experiment (Fig. 1E and F). Toward the end of both experiments, the control reactors had higher Fo values (lower evenness) than the treatment reactors (Fig. 1E and F). In the bioaugmented experiment, a clear divergence between control and treatment reactors was observed at the end of the experiment (Fig. 1F). Similar investigations were made for diversity indices derived from the amoA gene T-RFs. R was not affected by 3-CA input (Fig. 2A). However, the temporal changes in R over the course of the bioaugmented experiment (Fig. 2B) reveal that control (no 3-CA) reactors showed consistently higher R values than did 3-CA treated reactors from Day 19 onwards. There was little change in Dy for amoA over time during the unbioaugmented experiment, revealing little difference between control and treatment reactors (Fig. 2C); however, both treatment reactors exhibited an oscillatory pattern with a wavelength of about 10 days during the first half of the experiment until about Day 33, following which the wavelength shifted to a period of about 20 days. During the second half, the control reactors began to exhibit the same pattern in phase with the treatment reactors (Fig. 2C). In contrast, amoA Dy during the bioaugmentation experiment was different, showing little variation over time among the four reactors, although Dy was consistently higher than in the unbioaugmented experiment by about three times (Fig. 2D). The variation of Fo for amoA over time was similar to that of Dy, in that clear differences were seen between the two experiments. Without bioaugmentation, Fo fell sharply toward the end of the experiment in all four reactors. In the bioaugmented experiment, Fo was consistently high for all four reactors, indicating a highly uneven ammonia-oxidizer community, most likely dominated by very few phyla. As in the case of Dy, the reactors in the bioaugmented experiment exhibited much higher Fo values than in the unbioaugmented experiment.

3.2.

SVR Modeling

In the SVR model, the input variables were the presence of 3-CA (designated “Treatment”), the presence of bioaugmentation, reactor, time, mixed liquor total suspended solids (TSS), community richness (R), community dynamics (Dy) and community evenness (Fo). The last three parameters (the diversity indices) were derived separately for the 16S rRNA gene representing the general bacterial community and amoA representing the ammonia oxidizer community. The diversity indices derived from 16S rRNA data were used as inputs in models where the effluent COD and 3-CA concentrations were used, separately, as

output variables; the indices from amoA data were used in models where effluent ammonia and nitrate concentrations were applied separately as output variables (Table 2). Since all other environmental and operational parameters were kept constant across all four reactors in both experiments, they were not included as inputs in SVR modeling. The application of SVR involves the optimization of the regularization cost parameter (C), type of kernel, and the kernel-specific parameter g(Eqs. (3) and (7)). Determining appropriate values of these parameters is often achieved by trial and error. The appropriate values of these parameters were found after a number of trials of several models used in this study. The range for g used in trials ranged from 0.1 to 1000. The SVM with a radial basis kernel function was selected to construct the SVM model. The radial basis kernel function (g ¼ 549) and cost parameter ¼ 1 were found to provide the best results. To arrive at a suitable choice of these parameters, the correlation coefficient (CC) and root mean square error (RMSE) were compared and a combination of parameters providing the smallest value of RMSE and the highest value of correlation coefficient was selected for the final results. For each model, a training data set was used to establish the SVM models, and the remaining data were used to evaluate the performance of (test) the models. In the models we developed, about two-thirds of the data set was used to train the model and the remaining third was used to test it. The training and testing data points were selected randomly from all the data collected in both experiments. Model M4 performed best in training while model M2 performed worst in both training and validation datasets (Table 2). CC and RMSE were 0.98 and 10.79, respectively, in training, while the same were 0.97 and 13.42, respectively, for validation for COD removal (Table 2). For Ammonia, CC and RMSE were 0.92 and 13.12, respectively, for training and 0.89 and 14.28, respectively, for the validation data set (Table 2). Based on RMSE values, Model M3 for nitrate removal performed better compared to the other models. For Model M3, CC and RMSE were 0.95 and 6.41, respectively, in training while the same were 0.94 and 8.41, respectively for validation. For model M4, CC and RMSE were 0.94 and 9.32, respectively, for the validation data set (Table 2). In terms of predicting COD removal, the model M1 was able to clearly differentiate between control and treatment reactors, where three clusters of data were formed along the ideal prediction line (Fig. 3A and B), more notably in the case of the test data set (Fig. 3B). Based on the measured effluent COD, these clusters can be inferred to represent control reactors in both experiments. Looser clusters were also present,

Table 2 e List of models analyzed. Parameters used in the models are indicated as [ f(). Models M1 and M4 used diversity indices (R, Dy and Fo) from the 16S rRNA gene T-RFLP dataset and models M2 and M3 used diversity indices (R, Dy and Fo) from the amoA T-RFLP dataset. Model

M1 M2 M3 M4

Parameter COD ¼ f(Bioaugmentation,Treatment,Reactor,Day,TSS,R,Dy,Fo) Amm ¼ f(Bioaugmentation,Treatment,Reactor,Day,TSS,R,Dy,Fo) Nitrate ¼ f(Bioaugmentation,Treatment,Reactor,Day,TSS,R,Dy,Fo) 3-CA ¼ f(Bioaugmentation,Treatment,Reactor,Day,TSS,R,Dy,Fo)

Correlation coefficient

RMSE

Training

Validation

Training

Validation

0.98 0.92 0.95 0.99

0.97 0.89 0.94 0.94

10.79 13.12 6.41 3.65

13.42 14.28 8.41 9.32

290

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

Fig. 3 e Support vector regression (SVR) model results showing predicted concentrations of response variables plotted against corresponding measured concentrations in four models used to predict effluent COD (mg/L) (A, B), effluent ammonia (mg/L NH3eN) (C, D), effluent nitrate (mg/L NOL 3 eN) (E, F) and effluent 3-chloroaniline (mg/L) (G, H) concentrations. The x-axis of each plot includes the measured values of each parameter and the y-axis the corresponding values predicted by the model. Plots A, C, E and G on the left depict the performance of the SVR model with the data set used for training and plots B, D, F and H on the right show the performance of the same model with the data set used for validation. The dashed line through each plot shows the ideal “y [ x” line along which the points would lie for a model that perfectly fit the data set. Also included on each plot are a linear fit for each set of measured versus predicted points and the coefficient of determination of each fit. Finally, clusters are shown, representing samples from control reactors (dotted lines) and 3-CA treated reactors (dashed lines).

representing treatment reactors in the bioaugmented experiment and treatment reactors in the unbioaugmented experiment, moving from left to right along the x-axis. This implies that the model was able to accurately group reactors based on their respective treatments. Similar but looser clustering was seen in the case of effluent ammonia prediction (Fig. 3C and D); however, predictions in

the test case (Fig. 3D) were not as clearly segregated between control and treatment reactors as with COD. Two loose clusters are seen in Fig. 3D e treatment reactors forming a set in the middle of the plot and control reactors toward the bottom left. Effluent nitrate predictions were much more reliable (Fig. 3E and F), with control and treatment reactors clustering as expected based on effluent nitrate concentrations.

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

291

Fig. 4 e Relative sensitivities of predicted SVR output parameters COD (A), ammonia (B), nitrate (C) and 3-CA (D) to the diversity index inputs richness, evenness and dynamics.

SVR model results to predict 3-CA in effluent are shown (Fig. 3G and H). In this case, data representing the control reactors were not included in the model because there was no 3CA input into those reactors. 3-CA predictions are adequate; however, since the compound was not processed by the microbial community to our knowledge, it may not be possible for microbial community information to quantitatively predict its levels of removal.

3.3.

Sensitivity analysis on SVR predictions

The sensitivity of SVR outputs to the diversity index inputs was tested to investigate the relative importance of the three diversity indices, richness, dynamics and evenness, on process performance (Fig. 4). COD predictions were found to be most affected by changes in richness, accounting for 90.1% of total sensitivity in COD prediction while evenness and dynamics played negligible roles in COD sensitivity (Fig. 4A). This trend was similar for 3-CA predictions (Fig. 4D), where richness accounted for the bulk of variation in COD prediction (88.6%). However, ammonia and nitrate predictions (Fig. 4B and C respectively) were both more sensitive to dynamics and evenness indices while richness played a negligible role in these nitrification models.

4.

Discussion

The three major hypotheses addressed in this study were, first, that the addition of 3-CA would cause a significant shift

in microbial community structure as represented by diversity indices; second, that bioaugmentation would help buffer such a community shift; and third, that the community structure and experimental conditions could be used together to reliably predict reactor performance in the treatment of broadrange and specific contaminants.

4.1.

Diversity indices to compare bioreactor communities

Our first hypothesis regarding the effect of 3-CA on community structure was not found to hold true statistically. Although community richness as measured using both the 16S rRNA gene and amoA and community evenness as measured using the 16S rRNA gene diverged between control and treatment reactors towards the end of both experiments, the difference was not statistically significant (results not shown). Richness fell with the addition of 3-CA in the bioaugmented experiment (both genes), but did not show a change with 3-CA presence in the unbioaugmented experiment. The presence of 3-CA apparently resulted in a more even community in both bioaugmented and unbioaugmented reactors (16S rRNA gene). The fall in richness with 3-CA addition may have been the result of a less diverse and less specialized community developing from the input of a toxic, recalcitrant compound. Such an effect on the sludge community has also been found in the case of systems where degradation of the toxin was obtained (Lozada et al., 2006; Zhang et al., 2012). The apparent rise in evenness in the general bacterial community with 3-CA input is not intuitive, and this is the first study to our knowledge to make this finding.

292

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

However, it must be noted that the differences in evenness at the end of the experiment were not statistically significant. Our second hypothesis was that bioaugmentation targeted at responding to stress input would buffer any shifts in community structure produced by the stressor (Boon et al., 2002). In the bioaugmented experiment, we added to all reactors P. putida UWC3 (pWDL7::rfp), a strain containing a partial degradation pathway for 3-CA (Krol et al., 2012) on the plasmid, and two of the four reactors were additionally subjected to 3-CA input. If community buffering were reflected in diversity indices, control and treatment reactors would have closer community structures in the bioaugmented experiment than in the unbioaugmented experiment. This was clearly not the case in any of our reactors. Indeed, in the bioaugmented experiment, community richness from 16S rRNA actually fell more sharply over time in the 3-CA-treated reactors as opposed to the control reactors, while the differences were negligible in the unbioaugmented experiment. Community function in terms of COD and ammonia removal was also not buffered by bioaugmentation. The nitrifier community as observed from T-RFLP of the amoA gene was more affected by bioaugmentation than was the general bacterial community as evidenced by the changes in dynamics and evenness indices. This implies that the ammonia-oxidizer community grew genotypically less even and fluctuated more in relative abundance of detected T-RFs in the bioaugmented reactors. The input of 3-CA was previously shown to affect nitrification (Falk and Wuertz, 2010), and toxicant input is known to destabilize the balance between ammonia-oxidizers and nitrite-oxidizers, hampering the nitrification process (Graham et al., 2007), but the present study shows that 3-CA input alone did not have any effect on nitrifier community diversity. The fact that differences in community structure between bioaugmented and unbioaugmented reactors were also observed from the 16S rRNA gene, leads us to conclude that bioaugmentation was the major factor in bringing about these community differences. Several researchers examining the impact of bioaugmentation on community structure have reported differences in community structure between bioaugmented and control communities (Bai et al., 2012; Podmirseg et al., 2010; RodriguezeRodriguez et al., 2012; Roh and Chu, 2011; Truu et al., 2007). However, other work has found that unsuccessful bioaugmentation did not impact microbial community structure (Westerholm et al., 2012). To our knowledge, this is the first study suggesting that although bioaugmentation did not produce the targeted result in terms of performance, it may nevertheless have impacted community structure as observed through diversity indices. The partial degradation pathway for 3-CA that was the basis of bioaugmentation (both the plasmid containing these genes and the host strain carrying this plasmid) was maintained in control as well as treatment reactors, as seen by quantitative PCR and epifluorescence microscopy (Falk et al., 2013). The addition and maintenance of these bioaugmented genes may have been the reason for this shift in community structure and dynamics. We considered that observed differences may have arisen due to the experimental design e the unbioaugmented and unbioaugmented experiments were run at different times, and systemic effects could have had an impact on the results.

However, this is unlikely, since the operational and environmental conditions the reactors were exposed to in each experiment were identical, with the exception of bioaugmentation, and the same acclimated sludge was used in both experiments. We also considered that bioaugmentation differences could have arisen due to the mere presence of the bioaugmenting strain as an additional food/carbon source. Assuming that a cell is represented by the formula C12H87O23N12P, we calculated that the additional COD exerted by the bioaugmented strain’s cells would be a little over 1 mg/L as COD, which is unlikely to have had a drastic impact on community structure. To conclude, none of these alternate explanations could account for the observed effect of bioaugmention on community diversity.

4.2.

SVR Model to predict bioreactor performance

In terms of our third hypothesis, we demonstrated that microbial community indices can be used to predict broad-range reactor performance in terms of COD removal and nitrification, as well as specific contaminant removal in terms of 3-CA removal. The association between diversity indices and COD removal was very strong, as shown by our SVR models. Nitrification was also predicted well in our models, albeit not as strongly as COD. Indeed, 3-CA predictions using the 16S rRNA gene indices were also good, given that only half as much data could be used to train the 3-CA model because only half the reactors were subjected to 3-CA input. Furthermore, since 3-CA was not processed by the microbial community, reliable predictions of 3-CA removal from community indices may not be possible to make. As in the case of most machine learning models, the model parameters are highly dependent on the nature of the dataset. The developed SVR model has the following characters (Thissen et al., 2003): (1) a global optimal solution exists that will be found, i.e., use of a quadratic optimization problem, which provides a global minimum in comparison to the presence of a local minimum; (2) the result is a general solution avoiding overtraining; and (3) non-linear solutions can be calculated efficiently because of the usage of inner products. The model parameter gamma is inversely related to sigma, which is a degree for spread around a mean in statistics: the higher the value of gamma, the lower the value of sigma, thus the less spread or the more nonlinear the behavior of the kernel. The values of these training parameters C and gamma are determined. To arrive at a suitable choice of these parameters, the correlation coefficient (CC) and root mean square error (RMSE) were compared and a combination of parameters providing the smallest value of RMSE and the highest value of correlation coefficient was selected for the final results. Then, the performance of the constructed model is estimated on the training data. Finally, the constructed model is validated by comparing these predictions with the real observations. The aim of this study was to investigate the feasibility of such a model in the controlled experimental setting provided here, using microbial community parameters to predict broad and specific community functions, and this was achieved successfully. An eventual goal is to build a tool to train real-time from existing data by monitoring conditions at a wastewater treatment plant, receiving data on microbial

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

community parameters and providing predictions of effluent quality in cases of system perturbations like the pulse input of a xenobiotic compound. Such real-time models have been attempted in full-scale systems with varying degrees of success (Galinha et al., 2013; Huang et al., 2009), but seldom using community information. Partly, this is because of the bottleneck presented by molecular analysis and feeding these data to a model. Our work is the first step toward showing that microbial community data can be used to provide good predictions of process performance. As technologies in investigating microbial communities advance and become more time- and cost-effective, this bottleneck will become easier to address, and microbial community information will be a critical input in predicting system performance. Other work in relating community structure to function has found strong associations of community structure with the removal of broad-range contaminants like COD (Gao et al., 2012), nitrogen (Martinez-Hernandez et al., 2011; Sun et al., 2012) and phosphorus (Carvalho et al., 2007; Slater et al., 2010). In terms of specific contaminant removal, it has been found that the degradation of chlorinated aromatics in groundwater correlated with the abundance of chlorocatechol-2,3-dioxygenase as evidenced by T-RFLP performed on the PCR product of this gene (Tancsics et al., 2010). Genes for specific metabolisms relating to chlorinated aromatics are beginning to be characterized (Junca and Pieper, 2004; Kasuga et al., 2007; Krol et al., 2012; Kunze et al., 2009). The investigation of such genes rather than the 16S rRNA gene to predict specific contaminant removal is now more viable and may produce more reliable results. Similarly, other functional parameters can also be closely correlated to the genotypes responsible for their performance. Our final goal in this paper was to go beyond forming associations between microbial community and influent or effluent, and attempt to make predictions for effluent quality from our controlled experiments. Most previous research with regards to predicting biological wastewater treatment process performance using complex models has focused on using physical parameters as predictors in full-scale systems (Civelekoglu et al., 2009; Hamed et al., 2004; Hanbay et al., 2008; Perendeci et al., 2009). However, these process parameters are primarily influenced by microbial mediation and would therefore be best predicted by examining the microbial community. Werner and colleagues correlated microbial community parameters derived from 454 pyrosequencing to COD removal and methanogenesis as performance parameters in full-scale anaerobic systems, but did not use the data to predict performance in their reactors (Werner et al., 2011). Another avenue of research in predicting community function is to examine the microbial community beyond DNA, or community composition. Transcriptomics (Yu and Zhang, 2012) and metabolomics (Bouju et al., 2011), which represent community activity rather than composition, and are emerging as major co-players with DNA in the field of microbial community analysis, may be more useful in predicting process performance in wastewater treatment.

4.3.

Sensitivity analysis on SVR outputs

Analyzing the sensitivity of each process performance parameter (effectively, each SVR model) to each diversity

293

index provides an insight into the “black box” of microbial community structure’s influence on function (Wolf et al., 2005). It also sheds light on whether all three diversity indices contribute toward predicting process performance. Here, we observed that COD and 3-CA levels in effluent, as predicted by our models, were several times more sensitive to the richness index than to dynamics and evenness. On the other hand, the predictions of nitrogen species were more sensitive to dynamics and evenness indices while richness contributed negligibly. The fact that community richness plays an important role in predicting COD removal is expected, but this may not hold true in other sludges with higher richness values. Marzorati et al. (2008) estimated range-weighted richness at a value of 55  11 in domestic sludge, at which levels functional redundancy may overtake community richness in deciding COD removal levels. Since relatively few genera are known to play active roles in nitrification in sludge (Curtis and Sloan, 2004; Wittebolle et al., 2008), it is not surprising that richness was less prominent in predicting ammonia and nitrate levels. It is possible that the nitrifier community profile (as evidenced by dynamics and evenness indices) is more influential on the nitrification process than the overall diversity of nitrifiers. The most direct implication of our sensitivity analysis is that all three diversity indices may currently be needed to provide adequate predictive models.

4.4.

Outlook

On average only 23% of obtained T-RFs were matched to a bacterial clone library produced from the initial inoculum (Falk et al., 2009). Since there was no immigration into the system, this finding indicates that the majority of obtained TRF spectra lie outside the sequenced library. T-RFLP analysis does not identify the full range of microbial diversity in a community and consequently may not describe its dynamics adequately. Recent work has used high-throughput DNA sequencing on the Illumina platform to characterize microbial communities in full-scale wastewater treatment plants (Albertsen et al., 2012; Werner et al., 2011) as well as lab-scale bioreactors (Ye et al., 2012). These methods may add value to predictive modeling approaches. In addition, “omics” approaches make it possible to accurately characterize not only the community, but also gene expression and real-time metabolic activity in the reactor. Insights into the expression of specific metabolic genes could significantly enhance our ability to predict community response to specific contaminants. Future work on the development of predictive models in activated sludge systems should therefore involve sequencing and “omics” approaches.

5.

Conclusions

We used microbial community fingerprinting data from the operation of four bench-scale bioreactors to investigate microbial communities in controlled experiments and predict reactor performance in terms of broad-range and specific contaminants in controlled experiments. We demonstrated that:

294

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

 Bioaugmentation with a partial bioadegradation pathway for 3-CA produced major changes in community structure, even though bioaugmentation did not produce degradation of the target contaminant.  Reactor performance in terms of effluent COD, nitrification and 3-CA could be predicted reliably from microbial community information using a support vector regression model in controlled experiments.

Acknowledgments This work was supported by EPA STAR and UC TSR&TP Fellowships, and by the Singapore Ministry of Education and National Research Foundation through a RCE award to the Singapore Centre on Environmental Life Sciences Engineering (SCELSE).

references

Abdo, Z., Schuette, U.M.E., Bent, S.J., Williams, C.J., Forney, L.J., Joyce, P., 2006. Statistical methods for characterizing diversity of microbial communities by analysis of terminal restriction fragment length polymorphisms of 16S rRNA genes. Environ. Microbiol. 8, 929e938. Aguado, D., Ribes, J., Montoya, T., Ferrer, J., Seco, A., 2009. A methodology for sequencing batch reactor identification with artificial neural networks: a case study. Comput. Chem. Eng. 33, 465e472. Albertsen, M., Hansen, L.B.S., Saunders, A.M., Nielsen, P.H., Nielsen, K.L., 2012. A metagenome of a full-scale microbial community carrying out enhanced biological phosphorus removal. ISME J. 6, 1094e1106. Almeida, J.S., 2002. Predictive non-linear modeling of complex data by artificial neural networks. Curr. Opin. Biotechnol. 13, 72e76. APHAeAWWAeWEF, 1998. Standard Methods for the Examination of Water and Wastewater. Bai, Y.H., Sun, Q.H., Sun, R.H., Wen, D.H., Tang, X.Y., 2012. Comparison of denitrifier communities in the biofilms of bioaugmented and non-augmented zeolite-biological aerated filters. Environ. Technol. 33, 1993e1998. Bathe, S., 2004. Conjugal transfer of plasmid pNB2 to activated sludge bacteria leads to 3-chloroaniline degradation in enrichment cultures. Lett. Appl. Microbiol. 38, 527e531. Bathe, S., Schwarzenbeck, N., Hausner, M., 2009. Bioaugmentation of activated sludge towards 3-chloroaniline removal with a mixed bacterial population carrying a degradative plasmid. Bioresour. Technol. 100, 2902e2909. Boon, N., De Gelder, L., Lievens, H., Siciliano, S.D., Top, E.M., Verstraete, W., 2002. Bioaugmenting bioreactors for the continuous removal of 3-chloroaniline by a slow release approach. Environ. Sci. Technol. 36, 4698e4704. Bouju, H., Hommes, G., Wintgens, T., Malpei, F., Corvini, P.F.X., 2011. The fate of C-14-radiolabelled diclofenac and 4 ’-hydroxydiclofenac in membrane bioreactor treatment of wastewater. Water Sci. Technol. 63, 2878e2885. Carvalho, G., Lemos, P.C., Oehmen, A., Reis, M.A.M., 2007. Denitrifying phosphorus removal: linking the process performance with the microbial community structure. Water Res. 41, 4383e4396. Chen, J.C., Chang, N.B., Shieh, W.K., 2003. Assessing wastewater reclamation potential by neural network model. Eng. Appl. Artif. Intell. 16, 149e157.

Civelekoglu, G., Yigit, N.O., Diamadopoulos, E., Kitis, M., 2009. Modelling of COD removal in a biological wastewater treatment plant using adaptive neuro-fuzzy inference system and artificial neural network. Water Sci. Technol. 60, 1475e1487. Curtis, T.P., Sloan, W.T., 2004. Prokaryotic diversity and its limits: microbial community structure in nature and implications for microbial ecology. Curr. Opin. Microbiol. 7, 221e226. Falk, M.W., Song, K.G., Matiasek, M.G., Wuertz, S., 2009. Microbial community dynamics in replicate membrane bioreactors e natural reproducible fluctuations. Water Res. 43, 842e852. Falk, M.W., Seshan, H., Dosoretz, C., Wuertz, S., 2013. Partial bioaugmentation to remove 3-chloroaniline slows bacterial species turnover rate in bioreactors. Water Res. 47, 7109e7119. Falk, M.W., Wuertz, S., 2010. Effects of the toxin 3-chloroaniline at low concentrations on microbial community dynamics and membrane bioreactor performance. Water Res. 44, 5109e5115. Galinha, C.F., Guglielmi, G., Carvalho, G., Portugal, C.A.M., Crespo, J.G., Reis, M.A.M., 2013. Development of a hybrid model strategy for monitoring membrane bioreactors. J. Biotechnol. 164, 386e395. Gandhi, A.B., Joshi, J.B., Jayaranlan, V.K., Kulkarni, B.D., 2007. Development of support vector regression (SVR)-based correlation for prediction of overall gas hold-up in bubble column reactors for various gas-liquid systems. Chem. Eng. Sci. 62, 7078e7089. Gao, X.Y., Xu, Y., Liu, Y., Liu, Z.P., 2012. Bacterial diversity, community structure and function associated with biofilm development in a biological aerated filter in a recirculating marine aquaculture system. Mar. Biodivers. 42, 1e11. Goyal, M.K., Ojha, C.S.P., 2011. Estimation of scour downstream of a ski-jump bucket using support vector and M5 model tree. Water Resour. Manag. 25, 2177e2195. Graham, D.W., Knapp, C.W., Van Vleck, E.S., Bloor, K., Lane, T.B., Graham, C.E., 2007. Experimental demonstration of chaotic instability in biological nitrification. ISME J. 1, 385e393. Gunn, S.R., 1998. Support Vector Machines for Classification and Regression. Speech Image and Intelligent Systems Research Group, Southampton. Technical report. http://users.ecs.soton. ac.uk/srg/publications/pdf/SVM.pdf. Hamed, M.M., Khalafallah, M.G., Hassanien, E.A., 2004. Prediction of wastewater treatment plant performance using artificial neural networks. Environ. Model. Softw. 19, 919e928. Han, D., Chan, L., Zhu, N., 2007. Flood forecasting using support vector machines. J. Hydroinform. 9, 267e276. Hanbay, D., Turkoglu, I., Demir, Y., 2008. Prediction of wastewater treatment plant performance based on wavelet packet decomposition and neural networks. Expert Syst. Appl. 34, 1038e1043. Hien, T.T.N., Le, V.Q., Hansen, A.A., Nielsen, J.L., Nielsen, P.H., 2011. High diversity and abundance of putative polyphosphate-accumulating Tetrasphaera-related bacteria in activated sludge systems. FEMS Microbiol. Ecol. 76, 256e267. Huang, M.Z., Wan, J.Q., Ma, Y.W., Li, W.J., Sun, X.F., Wan, Y., 2010. A fast predicting neural fuzzy model for on-line estimation of nutrient dynamics in an anoxic/oxic process. Bioresour. Technol. 101, 1642e1651. Huang, M.Z., Wan, J.Q., Ma, Y.W., Wang, Y., Li, W.J., Sun, X.F., 2009. Control rules of aeration in a submerged biofilm wastewater treatment process using fuzzy neural networks. Expert Syst. Appl. 36, 10428e10437. Junca, H., Pieper, D.H., 2004. Functional gene diversity analysis in BTEX contaminated soils by means of PCR-SSCP DNA fingerprinting: comparative diversity assessment against bacterial isolates and PCR-DNA clone libraries. Environ. Microbiol. 6, 95e110. Kaewpipat, K., Grady, C.P.L., 2002. Microbial population dynamics in laboratory-scale activated sludge reactors. Water Sci. Technol. 46, 19e27.

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

Karthikeyan, B., Gopal, S., Vimala, M., 2005. Conception of complex probabilistic neural network system for classification of partial discharge patterns using multifarious inputs. Expert Syst. Appl. 29, 953e963. Kasuga, I., Nakajima, F., Furumai, H., 2007. Diversity of catechol 2,3-dioxygenase genes of bacteria responding to dissolved organic matter derived from different sources in a eutrophic lake. FEMS Microbiol. Ecol. 61, 449e458. Krol, J.E., Penrod, J.T., McCaslin, H., Rogers, L.M., Yano, H., Stancik, A.D., Dejonghe, W., Brown, C.J., Parales, R.E., Wuertz, S., Top, E.M., 2012. Role of IncP-1 beta plasmids pWDL7::rfp and pNB8c in chloroaniline catabolism as determined by genomic and functional analyses. Appl. Environ. Microbiol. 78, 828e838. Kunze, M., Zerlin, K.F., Retzlaff, A., Pohl, J.O., Schmidt, E., Janssen, D.B., Vilchez-Vargas, R., Pieper, D.H., Reineke, W., 2009. Degradation of chloroaromatics by Pseudomonas putida GJ31: assembled route for chlorobenzene degradation encoded by clusters on plasmid pKW1 and the chromosome. Microbiology-Sgm 155, 4069e4083. Liu, W.T., Marsh, T.L., Cheng, H., Forney, L.J., 1997. Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Appl. Environ. Microbiol. 63, 4516e4522. Lozada, M., Figuerola, E.L.M., Itria, R.F., Erijman, L., 2006. Replicability of dominant bacterial populations after longterm surfactant-enrichment in lab-scale activated sludge. Environ. Microbiol. 8, 625e638. Maity, R., Bhagwat, P.P., Bhatnagar, A., 2010. Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrol. Process. 24, 917e923. Martinez-Hernandez, S., Texier, A.C., Cuervo-Lopez, F.D., Gomez, J., 2011. 2-Chlorophenol consumption and its effect on the nitrifying sludge. J. Hazard. Mater. 185, 1592e1595. Marzorati, M., Wittebolle, L., Boon, N., Daffonchio, D., Verstraete, W., 2008. How to get more out of molecular fingerprints: practical tools for microbial ecology. Environ. Microbiol. 10, 1571e1581. Oregon Department of Environmental Quality, O.-D, 2011. Table 40: Human Health Water Quality Criteria for Toxic Pollutants. http://www.deq.state.or.us/wq/rules/div041/table40.pdf. Osborn, A.M., Moore, E.R.B., Timmis, K.N., 2000. An evaluation of terminal-restriction fragment length polymorphism (T-RFLP) analysis for the study of microbial community structure and dynamics. Environ. Microbiol. 2, 39e50. Perendeci, A., Arslan, S., Tanyolac, A., Celebi, S.S., 2009. Effects of phase vector and history extension on prediction power of adaptive-network based fuzzy inference system (ANFIS) model for a real scale anaerobic wastewater treatment plant operating under unsteady state. Bioresour. Technol. 100, 4579e4587. Podmirseg, S.M., Schoen, M.A., Murthy, S., Insam, H., Wett, B., 2010. Quantitative and qualitative effects of bioaugmentation on ammonia oxidisers at a two-step WWTP. Water Sci. Technol. 61, 1003e1009. Reid, N.M., Bowers, T.H., Lloyd-Jones, G., 2008. Bacterial community composition of a wastewater treatment system reliant on N-2 fixation. Appl. Microbiol. Biotechnol. 79, 285e292. Rodriguez-Rodriguez, C.E., Jelic, A., Pereira, M.A., Sousa, D.Z., Petrovic, M., Alves, M.M., Barcelo, D., Caminal, G., Vicent, T., 2012. Bioaugmentation of sewage sludge with Trametes versicolor in solid-phase biopiles produces degradation of pharmaceuticals and affects microbial communities. Environ. Sci. Technol. 46, 12012e12020. Roh, H., Chu, K.H., 2011. Effects of solids retention time on the performance of bioreactors bioaugmented with a 17 beta-

295

estradiol-utilizing bacterium, Sphingomonas strain KC8. Chemosphere 84, 227e233. Rotthauwe, J.H., Witzel, K.P., Liesack, W., 1997. The ammonia monooxygenase structural gene amoA as a functional marker: molecular fine-scale analysis of natural ammonia-oxidizing populations. Appl. Environ. Microbiol. 63, 4704e4712. Schutte, U.M.E., Abdo, Z., Bent, S.J., Shyu, C., Williams, C.J., Pierson, J.D., Forney, L.J., 2008. Advances in the use of terminal restriction fragment length polymorphism (T-RFLP) analysis of 16S rRNA genes to characterize microbial communities. Appl. Microbiol. Biotechnol. 80, 365e380. Singh, K.K., Pal, M., Ojha, C.S.P., Singh, V.P., 2008. Estimation of removal efficiency for settling basins using neural networks and support vector machines. J. Hydrol. Eng. 13, 146e155. Siripong, S., Rittmann, B.E., 2007. Diversity study of nitrifying bacteria in full-scale municipal wastewater treatment plants. Water Res. 41, 1110e1120. Slater, F.R., Johnson, C.R., Blackall, L.L., Beiko, R.G., Bond, P.L., 2010. Monitoring associations between clade-level variation, overall community structure and ecosystem function in enhanced biological phosphorus removal (EBPR) systems using terminal-restriction fragment length polymorphism (TRFLP). Water Res. 44, 4908e4923. Sun, G.Z., Zhu, Y.F., Saeed, T., Zhang, G.X., Lu, X.G., 2012. Nitrogen removal and microbial community profiles in six wetland columns receiving high ammonia load. Chem. Eng. J. 203, 326e332. Tancsics, A., Szabo, I., Baka, E., Szoboszlay, S., Kukolya, J., Kriszt, B., Marialigeti, K., 2010. Investigation of catechol 2,3dioxygenase and 16S rRNA gene diversity in hypoxic, petroleum hydrocarbon contaminated groundwater. Syst. Appl. Microbiol. 33, 398e406. Thissen, U., van Brakel, R., de Weijer, A.P., Melssen, W.J., Buydens, L.M.C., 2003. Using support vector regression for time series prediction. Chemom. Intell. Lab. Syst 69, 35e49. Truu, J., Heinaru, E., Vedler, E., Juhanson, J., Viirmae, M., Heinaru, A., 2007. Formation of microbial communities in oil shale chemical industry solid wastes during phytoremediation and bioaugmentation. In: Heipieper, H.J. (Ed.), Bioremediation of Soils Contaminated with Aromatic Compounds, pp. 57e66. Valentin-Vargas, A., Toro-Labrador, G., Massol-Deya, A.A., 2012. Bacterial community dynamics in full-scale activated sludge bioreactors: operational and ecological factors driving community assembly and performance. PLoS One 7 (8), e42524. Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, Berlin. Wang, X.H., Wen, X.H., Xia, Y., Hu, M., Zhao, F., Ding, K., 2012. Ammonia oxidizing bacteria community dynamics in a pilot-scale wastewater treatment plant. PLoS One 7 (4), e36272. Wang, X.H., Wen, X.H., Yan, H.J., Ding, K., Zhao, F., Hu, M., 2011. Bacterial community dynamics in a functionally stable pilotscale wastewater treatment plant. Bioresour. Technol. 102, 2352e2357. Werner, J.J., Knights, D., Garcia, M.L., Scalfone, N.B., Smith, S., Yarasheski, K., Cummings, T.A., Beers, A.R., Knight, R., Angenent, L.T., 2011. Bacterial community structures are unique and resilient in full-scale bioenergy systems. Proc. Natl. Acad. Sci. U. S. A. 108, 4158e4163. Westerholm, M., Leven, L., Schnurer, A., 2012. Bioaugmentation of syntrophic acetate-oxidizing culture in biogas reactors exposed to increasing levels of ammonia. Appl. Environ. Microbiol. 78, 7619e7625. Wittebolle, L., Vervaeren, H., Verstraete, W., Boon, N., 2008. Quantifying community dynamics of nitrifiers in functionally stable reactors. Appl. Environ. Microbiol. 74, 286e293.

296

w a t e r r e s e a r c h 5 3 ( 2 0 1 4 ) 2 8 2 e2 9 6

Wolf, G., Almeida, J.S., Reis, M.A.M., Crespo, J.G., 2005. Modelling of the extractive membrane bioreactor process based on natural fluorescence fingerprints and process operation history. Water Sci. Technol. 51, 51e58. Wu, C.H., Wei, C.C., Su, D.C., Chang, M.H., Ho, J.M., 2003. Travel time prediction with support vector regression. Proc. IEEE Intelligent Transp. Conf., 1438e1442. Ye, L., Zhang, T., Wang, T.T., Fang, Z.W., 2012. Microbial structures, functions, and metabolic pathways in wastewater

treatment bioreactors revealed using high-throughput sequencing. Environ. Sci. Technol. 46, 13244e13252. Yu, K., Zhang, T., 2012. Metagenomic and metatranscriptomic analysis of microbial community structure and gene expression of activated sludge. PLoS One 7 (5), e38183. Zhang, Y.M., Pu, X.J., Fang, M.M., Zhu, J., Chen, L.J., Rittmann, B.E., 2012. 2,4,6-trichlorophenol (TCP) photobiodegradation and its effect on community structure. Biodegradation 23, 575e583.