Rapid in situ measurements of algal cell concentrations using an artificial neural network and single-excitation fluorescence spectrometry


Algal Research 45 (2020) 101739


Algal Research journal homepage: www.elsevier.com/locate/algal

Review article




Jing-Yan Liuᵃ, Li-Hua Zengᵃ,⁎, Zhen-Hui Renᵃ,⁎, Tie-Min Duᵇ, Xuan Liuᵃ

ᵃ College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071001, China
ᵇ College of Life Sciences, Hebei Agricultural University, Baoding 071001, China

ARTICLE INFO

Keywords: Algal cell concentration; Nonlinear relation; Fluorescence spectra; Artificial neural network

ABSTRACT

Accurate measurement of algal cell concentration is very important in microalgae culturing and ecological monitoring. To achieve automatic, in situ measurement of microalgal cell concentration at reduced cost, a detection method combining single-excitation fluorescence spectroscopy with an artificial neural network (ANN) was developed to monitor Chlamydomonas reinhardtii cell concentrations in the range of 2 × 10⁵ to 6.4 × 10⁶ cells mL−1. A 470 nm light-emitting diode (LED) was used as the light source to excite samples with different concentrations of Chlamydomonas reinhardtii. The measured fluorescence emission spectra were used as the input, and the algal cell concentration was the output. Because the relationship between the input and the output is nonlinear, a back propagation neural network model optimized by a genetic algorithm (GA-BP) was established to predict the cell concentration. The model was then validated using samples from different growth batches. In addition, the GA-BP model was compared with an existing algal cell concentration detection method, the back propagation artificial neural network (BP), and the GA-BP model was found to be more accurate. Moreover, the equipment used for this method is simple, portable, and easy to install. The combination of single-excitation fluorescence spectrometry and an artificial neural network provides a feasible and cost-effective tool for monitoring algal cell concentration.

⁎ Corresponding authors. E-mail addresses: [email protected] (L.-H. Zeng), [email protected] (Z.-H. Ren).

https://doi.org/10.1016/j.algal.2019.101739
Received 11 May 2019; Received in revised form 20 November 2019; Accepted 20 November 2019
2211-9264/ © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Accurate measurement of microalgal biomass concentration is very important in both microalgae culturing and ecological monitoring [1–3]. However, because microalgal cells are small and very numerous, accurately quantifying biomass is difficult and time consuming, and it remains a practical problem in both research and production [4]. Commonly used laboratory methods for measuring microalgal cell concentration include dry weight measurement, cell counting, turbidity measurement, and chlorophyll-a determination. These methods require sample collection and are not suitable for continuous monitoring. Moreover, the time lag between sample collection and measurement makes it impossible to monitor short-term changes in algal biomass.

In view of these problems, advances in spectroscopy in recent years have provided other ways to measure microalgal cell concentration. The methods that have been used include spectrophotometry, three-dimensional fluorescence spectroscopy, and two-dimensional fluorescence spectroscopy, which provide rich spectral information and nondestructive detection of microalgal biomass concentrations [1,2,5]. However, the equipment is relatively large and expensive and requires complex data preprocessing, so it cannot easily be used for in situ measurements. Fluorescence spectra obtained under single-wavelength excitation have relatively simple spectral characteristics, which reduces the influence of external factors on the measurement results, greatly shortens spectral processing and calculation time, and reduces costs [6].

Previous studies have confirmed that the fluorescence spectrum of algae under single-wavelength excitation is linearly related to algal biomass concentration. However, these studies covered only a narrow range of algal concentrations, and the prediction equations they give apply only to low algal concentrations [7–9]. Gregor and Marsalek [10] observed that the in situ fluorescence technique is suitable for water bodies with lower levels of chlorophyll-a, but that the chlorophyll-a concentration can be underestimated when the phytoplankton concentration is high. Similar results were reported by Becker [11], who found that the total fluorescence emitted by colonial cyanobacteria, such as Microcystis, was less than the sum of the fluorescence of the individual cells. This nonlinear relationship between the emitted fluorescence signal and the fluorophore concentration may be caused by the inner-filter effect (IFE), which decreases the fluorescence intensity at high concentrations; the principle of the IFE is described in detail by Pacheco and Bruzzone [12]. Wang et al. reviewed methods for correcting the IFE in order to obtain a reliable relationship between the fluorescence spectrum and the fluorophore concentration [13].

An artificial neural network (ANN) is a general mathematical tool that can be used to correct for the IFE [14]. Cancilla et al. attribute the popularity of ANNs to their ability to discover and explain highly complex, nonlinear relationships between data sets and to express them in a simple, manageable mathematical model [14]. In recent years, ANNs have been used to predict the dynamic growth of microalgae at any nutrient concentration [15]; to predict the next-day concentration of Spirulina from the biomass concentration, the nitrate concentration, the pH of the culture medium, the dissolved oxygen concentration, the light intensity, and the days of culture [15,16]; and, combined with absorption spectra, to distinguish between single-species and mixed algal cultures [17]. These studies demonstrate the feasibility of applying ANNs to microalgae detection. Compared with multivariate calibration models, neural network models can give better predictions when the samples contain an unknown, nonlinear relationship. Therefore, ANNs have in recent years become an alternative to traditional regression models.
The main purpose of this study is to determine whether a single fluorescence emission spectrum combined with a back propagation artificial neural network (BP) model can predict algal cell densities. The fluorescence emission spectrum obtained under 470 nm excitation, without any additional data, was used as the input of the ANN model. A genetic algorithm (GA) was used to optimize the neural network, and the model was applied to three groups of algae grown under different conditions.

2. Materials and methods

2.1. Algae cultivation and sample preparation

C. reinhardtii was sourced from the Germplasm Bank of Wild Species of the Chinese Academy of Sciences. The original data set comprised 1568 spectra obtained from 120 Tris-Acetate-Phosphate (TAP) medium optimization experiments. In these experiments, the optimal culture medium was sought by varying the concentration of each component across multiple experiments. The composition of the TAP medium and the concentration range of each component in 1 L of medium are shown in Table 1. The data set contains four batches of 30 experiments each, and a different medium was used in each experiment. To enhance the predictive ability of the model, one batch of experiments used extreme medium formulations (i.e., formulations with very high or zero content of some components). To measure algal cell density across a wide range, samples were collected throughout the 15-day growth period, once a day from the first day (T0) to the last (T14), covering low cell densities in the initial stage to high cell densities in the stationary stage. The cell concentration of each culture was determined by counting algal cells under an optical microscope, as described in detail in Section 2.2. The four batches were cultured at room temperature (18–25 °C) under a 12 h:12 h light-dark cycle. BG-11 medium was used as the control medium in all experiments. Cells in their exponential growth phase were used for inoculation. Cultures were illuminated from the top by six 20 W fluorescent lamps arranged side by side. All experiments were carried out in triplicate, and algal growth was measured by cell counting once a day. Another group of samples, collected and analyzed from four batches of algae, served as an external validation data set. These validation samples were cultured under different conditions (temperature, pH, and illumination intensity) to test the usefulness of the model. The measurement of the spectra and of the cell concentration is described in detail in the following sections.

2.2. Algal cell concentration acquisition

Lugol solution (0.025 mL) was added to each algal sample (1 mL), the cells in each sample were counted three times with a hemocytometer under a microscope, and the average cell density was calculated. The number of single cells or cell clusters was recorded in five medium-sized squares: upper right, lower right, lower left, upper left, and middle. These counts are denoted a1, a2, a3, a4, and a5, respectively [18]. The cell concentration M is calculated as follows:

$$M\ (\text{cells mL}^{-1}) = \frac{a_1 + a_2 + a_3 + a_4 + a_5}{80} \times 400 \times 10^{4} \qquad (1)$$

Here the five medium squares contain 80 of the 400 small squares in the counting grid, and the factor 10⁴ converts the 0.1 mm³ volume over the grid to 1 mL. When the cell density was too high to count, the sample was diluted before counting, and the cell density of the original sample was obtained by multiplying by the dilution factor. C. reinhardtii cells appear spherical under the microscope, and dividing cells were counted if their daughter cells were clearly visible.

2.3. Spectral acquisition

A spectrum acquisition system was built in the laboratory, as shown in Fig. 1. A miniature spectrometer (USB6500pro, Jingyi Optics, China) was used to measure the fluorescence spectra, and the accompanying SpectraPro software (Jingyi Optics, China) was used on a computer to analyze the measured spectra. The spectrometer has a maximum measurement range of 185–1100 nm, a pixel spacing of 0.27 nm, and a resolution of 1.48 μm. A 470 nm LED was used as the excitation light source to irradiate the prepared algal suspension. An immersion-type “Y” optical fiber reflection probe, connected to both the spectrometer and the light source, was used in the experiment. The fiber core diameter is 600 μm. The immersion probe consists of seven fibers tightly bundled together in a stainless steel sleeve (3.0 × 0.25 in.) fitted with a 30° optical window. Six of the fibers deliver the excitation light, and the central fiber is the readout fiber. This setup can measure both reflection and fluorescence spectra; the immersion acquisition mode was selected. The fiber probe is fixed on a height-adjustable bracket, so its immersion depth in the algal suspension can be adjusted to reduce the influence of sample inhomogeneity. During the experiment, fixing the fiber ensured that the probe was immersed to the same depth in all samples.

Table 1
Composition of the medium and the concentration range of each component in 1 L of medium.

Component          Concentration (M)
NH4Cl              0–7.0 × 10⁻³
MgSO4·7H2O         0–8.3 × 10⁻⁴
CaCl2·2H2O         0–4.5 × 10⁻⁴
K2HPO4             0–1.65 × 10⁻³
KH2PO4             0–1.05 × 10⁻³
EDTA-Na2           0–2.5 × 10⁻⁵
(NH4)6Mo7O24       0–2.85 × 10⁻⁵
Na2SeO3            0–1.0 × 10⁻⁷
ZnSO4-EDTA         0–2.5 × 10⁻⁶
MnCl2-EDTA         0–6.0 × 10⁻⁶
FeCl3-EDTA         0–2.0 × 10⁻⁵
CuCl2-EDTA         0–2.0 × 10⁻⁶

(Figure: 470 nm LED, optical fiber probe, socket, algae solution, spectrometer, computer.)

Fig. 1. Diagram of the experimental setup for immersion spectral acquisition. The “Y” type optical fiber probe is immersed in the algal solution and connected to both the spectrometer and the light source. A 470 nm LED was used as the excitation light source to irradiate the algal solution. The spectral data obtained in the experiment are read by the SpectraPro software on the computer.

The algal samples were shaken before spectral acquisition, and the sample volume was 60 mL. To avoid the influence of elastic scattering by water and reflection from the glass container on the fluorescence spectrum, the spectral acquisition range was set to 660–760 nm with a sampling interval of 1 nm. The measurement time for each sample group was 15 s. To ensure measurement accuracy, the sample container and the fiber probe were cleaned between successive sample groups, and spectral calibration was performed using distilled water. The acquisition parameters were as follows: integration time, 15 ms; number of averaged scans, 50; and smoothing, 15.
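The 660–760 nm window sampled at 1 nm yields exactly 101 intensity values per spectrum, which matches the 101 input nodes used by the model. The text does not describe how SpectraPro maps the 0.27 nm detector pixels onto the 1 nm grid or how the 50 scans are combined, so the sketch below assumes plain pixel-wise averaging followed by linear interpolation; the function names are hypothetical and smoothing is not modeled.

```python
from bisect import bisect_left

def wavelength_grid(start_nm=660, stop_nm=760, step_nm=1):
    """Target grid: 660-760 nm inclusive at 1 nm, i.e. 101 wavelengths."""
    n = int(round((stop_nm - start_nm) / step_nm)) + 1
    return [start_nm + i * step_nm for i in range(n)]

def average_scans(scans):
    """Pixel-wise mean of repeated scans (the acquisition settings average 50)."""
    return [sum(values) / len(scans) for values in zip(*scans)]

def resample(wavelengths, intensities, grid):
    """Linearly interpolate a raw spectrum (ascending wavelengths) onto grid."""
    out = []
    for g in grid:
        if not wavelengths[0] <= g <= wavelengths[-1]:
            raise ValueError(f"{g} nm is outside the measured range")
        k = bisect_left(wavelengths, g)
        if wavelengths[k] == g:              # grid point falls exactly on a pixel
            out.append(intensities[k])
            continue
        x0, x1 = wavelengths[k - 1], wavelengths[k]
        y0, y1 = intensities[k - 1], intensities[k]
        out.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return out

# Usage with synthetic data (hypothetical pixel layout and intensities):
raw_wl = [650 + 0.27 * i for i in range(450)]                 # 650 to ~771 nm
scans = [[2.0 * w + 5.0 + d for w in raw_wl] for d in (-1.0, 1.0)]
features = resample(raw_wl, average_scans(scans), wavelength_grid())
assert len(features) == 101                                   # one value per input node
```

Stacking one such 101-value vector per sample gives the input matrix used for training.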

2.4. Establishment of back propagation neural network based on genetic algorithms

2.4.1. The back propagation neural network model optimized by genetic algorithms

Genetic algorithms (GA) are randomized search methods based on the evolutionary laws of biology (survival of the fittest and the genetic mechanism). They were first proposed by Professor J. Holland of the United States in 1962 [19]. A GA is a highly parallel, stochastic, and adaptive optimization algorithm based on “survival of the fittest” [20]. The algorithm applies reproduction, crossover, and mutation to a population of “chromosomes” that encode candidate solutions of the problem; the population evolves from generation to generation and eventually converges to the fittest group, yielding an optimal or satisfactory solution [21]. The back propagation (BP) neural network is a commonly used ANN [22]. It was proposed by Rumelhart and McClelland in 1986 [19] and is a multilayer feedforward neural network trained by the error back propagation algorithm [22]. A BP neural network consists of an input layer, a hidden layer, and an output layer, with the hidden layer connecting the input and output layers. Each layer is composed of nodes, and the nodes are linked by connections called weights. The weights between the input layer and the hidden layer are wij, and the weights between the hidden layer and the output layer are wjk, as shown in Fig. 2. Each node applies a specific output function called the transfer function. There are three kinds of transfer functions: the log-sigmoid function (logsig) (Eq. (2)), the tangent sigmoid function (tansig) (Eq. (3)), and the pure linear function (purelin) (Eq. (4)) [14]:

$$y = \mathrm{logsig}(x) = \frac{1}{1 + \exp(-x)} \qquad (2)$$

$$y = \mathrm{tansig}(x) = \frac{2}{1 + \exp(-2x)} - 1 \qquad (3)$$

$$y = \mathrm{purelin}(x) = x \qquad (4)$$

(Figure: input layer P1, P2, …, Pn; hidden-layer nodes with summation Σ and transfer function fh; output node with Σ and fo giving T; the error E is back-propagated to adjust Wij and Wjk.)

Fig. 2. Structural chart of the back propagation (BP) neural network. P represents the fluorescence intensity at each wavelength, wij represents the connection weight between the input layer and the hidden layer, wjk represents the connection weight between the hidden layer and the output layer, T represents the output value of algal cell concentration, and E represents the error. The connections between nodes are weighted by the wij and wjk coefficients. The output algal cell concentration is compared with the true value, and the network weights wij and wjk are adjusted by back-propagating the error to obtain the optimal network. The hidden layer is connected to the input layer by the tansig function (Eq. (3)). The output layer is connected to the hidden layer by the purelin function (Eq. (4)).
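The three transfer functions in Eqs. (2)–(4) can be written directly. This is a minimal Python sketch, not the MATLAB toolbox implementation the authors used; the sign-based branching is an added assumption whose only purpose is to keep `math.exp` from overflowing for large negative inputs. Eq. (3) is algebraically identical to the hyperbolic tangent, which is why the hidden layer in Eq. (5) is written with tanh.

```python
import math

def logsig(x):
    """Eq. (2), log-sigmoid: output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)                  # same value, rearranged to avoid overflow
    return z / (1.0 + z)

def tansig(x):
    """Eq. (3), tangent sigmoid: output in (-1, 1); equal to math.tanh(x)."""
    if x < 0:
        return -tansig(-x)           # odd function; keeps exp() argument <= 0
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def purelin(x):
    """Eq. (4), pure linear: identity, used for the output node."""
    return x

for v in (-2.0, 0.0, 2.0):
    print(v, logsig(v), tansig(v), math.tanh(v), purelin(v))
```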

In this study, the tansig function was chosen for the hidden-layer nodes and the purelin function for the output-layer node. The nodes in each layer process their input values and pass the results to the next layer. The principle of the BP neural network is shown in Fig. 2. When the model is run, each hidden node sums the weighted input signals and its bias, and the sum is passed through the hidden-layer transfer function to give Xj. The hidden-node outputs are then weighted by the connection weights wjk between the hidden and output nodes, summed with the output bias, and sent to the output node (Eqs. (5) and (6)) [25]:

$$X_j = \tanh\left(\sum_{i=1}^{n} x_i w_{ij} + b_j^{h}\right) \qquad (5)$$

$$f(y) = \sum_{j=1}^{m} X_j w_{jk} + b_k^{o} \qquad (6)$$

where i, j, and k index the nodes of the input, hidden, and output layers, respectively; n and m are the numbers of input and hidden nodes; b_j^h and b_k^o are the bias terms of the hidden and output nodes; and x_i is the input signal. Eq. (5) writes the hidden-layer transfer function as tanh, which is mathematically identical to tansig (Eq. (3)). The error between the target and predicted values at the output layer is calculated and propagated back through the network, and the weights are adjusted accordingly until the error reaches the intended target.

Because the BP neural network is trained by gradient descent, it is very sensitive to the initial weights and thresholds of the network and is prone to becoming trapped in local minima during training. Therefore, a BP neural network optimized by a genetic algorithm (GA-BP neural network) was adopted. The role of the GA is to find the optimal initial weights and thresholds of the network. Training begins with the GA, which adjusts the initial weights and thresholds through selection, crossover, and mutation. The best initial weights and thresholds obtained are then assigned to the BP neural network, and the BP algorithm begins training from them and approximates the optimal solution [21]. The GA therefore greatly improves the prediction accuracy of the network.
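The GA-BP procedure can be sketched in a self-contained way, under several assumptions that are not from the paper: a toy 3-4-1 network instead of 101-9-1, synthetic data, truncation selection, one-point crossover, Gaussian mutation, plain batch gradient descent for the BP stage, and hypothetical settings for population size, mutation rate, and learning rate. Only the overall structure follows the text (the GA searches the initial weights and thresholds, then BP trains from the best individual), and the forward pass implements Eqs. (5) and (6). The authors worked in MATLAB 2016a; this is not their code.

```python
import math
import random

def unpack(params, n_in, n_hid):
    """Split a flat parameter list into w_ij, b_h, w_jk and b_o."""
    a = n_in * n_hid
    return params[:a], params[a:a + n_hid], params[a + n_hid:a + 2 * n_hid], params[-1]

def forward(params, x, n_in, n_hid):
    """Eq. (5): tanh hidden layer; Eq. (6): linear (purelin) output."""
    w_ij, b_h, w_jk, b_o = unpack(params, n_in, n_hid)
    hidden = [math.tanh(sum(x[i] * w_ij[i * n_hid + j] for i in range(n_in)) + b_h[j])
              for j in range(n_hid)]
    return hidden, sum(h * w for h, w in zip(hidden, w_jk)) + b_o

def mse(params, data, n_in, n_hid):
    """Training error, used as GA fitness (lower is fitter) and as the BP loss."""
    return sum((forward(params, x, n_in, n_hid)[1] - t) ** 2 for x, t in data) / len(data)

def ga_initial_weights(data, n_in, n_hid, pop_size=30, generations=40, p_mut=0.1, seed=0):
    """GA search for initial weights and thresholds (hypothetical settings)."""
    rng = random.Random(seed)
    n_par = n_in * n_hid + 2 * n_hid + 1
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_par)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda p: mse(p, data, n_in, n_hid))
        history.append(mse(pop[0], data, n_in, n_hid))
        parents = pop[: pop_size // 2]             # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_par)           # one-point crossover
            child = [g + rng.gauss(0.0, 0.3) if rng.random() < p_mut else g  # mutation
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda p: mse(p, data, n_in, n_hid)), history

def bp_train(params, data, n_in, n_hid, lr=0.01, epochs=200):
    """Batch gradient descent on the MSE, starting from the GA result."""
    p = list(params)
    off_bh, off_wjk = n_in * n_hid, n_in * n_hid + n_hid
    for _ in range(epochs):
        grad = [0.0] * len(p)
        for x, t in data:
            hidden, y = forward(p, x, n_in, n_hid)
            dy = 2.0 * (y - t) / len(data)              # dMSE/dy
            grad[-1] += dy                              # output bias b_o
            for j, h in enumerate(hidden):
                grad[off_wjk + j] += dy * h             # w_jk
                dz = dy * p[off_wjk + j] * (1.0 - h * h)  # back through tanh
                grad[off_bh + j] += dz                  # hidden bias b_j
                for i in range(n_in):
                    grad[i * n_hid + j] += dz * x[i]    # w_ij
        p = [w - lr * g for w, g in zip(p, grad)]
    return p

# Synthetic, already-normalized data (not from the paper):
rng = random.Random(1)
data = []
for _ in range(40):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    data.append((x, 0.6 * x[0] - 0.4 * x[1] * x[2]))
best, history = ga_initial_weights(data, n_in=3, n_hid=4)
trained = bp_train(best, data, n_in=3, n_hid=4)
print(f"GA start MSE {mse(best, data, 3, 4):.4f} -> after BP {mse(trained, data, 3, 4):.4f}")
```

Because the fitter half of each generation is carried over unchanged, the best training error never increases from one generation to the next; BP then refines that starting point by gradient descent.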

2.4.2. Formulation of back propagation neural network model based on genetic algorithm optimization

The input of the GA-BP model is a 1568 × 101 matrix in which each row contains the fluorescence intensities at 101 wavelengths (660–760 nm at 1 nm intervals), and the output is a 1568 × 1 column vector of algal cell concentrations. The number of input nodes was therefore set to 101. The output was the microalgal cell concentration corresponding to the input spectrum, so the output layer had one node. Determining the number of hidden layers and hidden nodes is usually a trial-and-error process. According to Tarassenko, an empirical rule for selecting the number of hidden nodes is that the number of samples in the training set should exceed the number of synaptic weights [23]. To prevent over-fitting of the training data, Rogers and Dowla suggest limiting the number of hidden nodes by the following criterion [24]:

$$N_H = \frac{N_{TR}}{N_I + 1} \qquad (7)$$

In Eq. (7), N_H is the number of nodes in the hidden layer, N_TR (1568) is the number of training samples, and N_I (101) is the number of nodes in the input layer, which gives an upper limit of 1568/102 ≈ 15.4 hidden nodes. Because the aim of this study was a general neural network model, networks with fewer hidden nodes were preferred, since they usually generalize better and are less prone to over-fitting [25]. After repeated trials, a single hidden layer of 9 nodes was selected, giving the GA-BP neural network a 101-9-1 topology, as shown in Fig. 3.

2.4.3. Input variables and data processing

A total of 1568 original spectra were used as the experimental data set. The early stopping training method (STA) was adopted to prevent over-fitting. The spectral data set was randomly divided into three subsets: a training set (80%), a validation set (10%), and a test set (10%). The training set was used for network training. The validation set was used to decide when to stop training, so as to prevent over-fitting. The test set, which was never used during training, was used to evaluate the generalization ability of the trained model [27]. Because the input and output variables have large magnitudes, they were normalized to the range [−1, 1] using the following equations [26]:

$$I_N = 2\,\frac{I - I_{min}}{I_{max} - I_{min}} - 1 \qquad (8)$$

$$y_N = 2\,\frac{y - y_{min}}{y_{max} - y_{min}} - 1 \qquad (9)$$

In the above equations, I is the input variable, I_N is the normalized value of the input, I_min is the minimum value of I, I_max is the maximum value of I, y is the target variable, y_min is the minimum value of y, y_max is the maximum value of y, and y_N is the normalized value of the target.

2.4.4. Model performance assessment

Prediction performance was evaluated by the coefficient of determination (R2), the mean absolute error (MAE), and the mean absolute relative error (MARE), calculated using Eqs. (10), (11), and (12), respectively:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{Ti} - y_{pi})^2}{\sum_{i=1}^{n} (y_{Ti} - y_a)^2} \qquad (10)$$

where n is the number of samples, y_Ti is the true value of the sample, y_pi is the predicted value of the sample, and y_a is the average of the true values. The coefficient of determination describes the correlation between two variables x and y and can also be used to evaluate the prediction accuracy of a model. Its value generally lies between 0 and 1. An R2 between 0.50 and 0.65 can only distinguish between high and low concentrations; an R2 between 0.66 and 0.81 indicates approximate quantitative prediction; an R2 between 0.82 and 0.90 indicates good prediction; and a calibration model with an R2 above 0.91 is considered excellent [22].

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left| y_{xi} - \hat{y}_{xi} \right| \qquad (11)$$

$$MARE = \frac{1}{n}\sum_{i=1}^{n} \frac{\left| y_{xi} - \hat{y}_{xi} \right|}{y_{xi}} \qquad (12)$$

where n is the number of samples used for forecasting, y_xi is the reference value of the samples used for forecasting, and ŷ_xi is the forecasted result. Lower MAE and MARE indicate better model predictions [27]. All GA-BP calculations were performed using MATLAB 2016a.
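The preprocessing and evaluation steps of Sections 2.4.2–2.4.4 reduce to a few small functions. This is a hedged sketch rather than the authors' MATLAB code: whether each wavelength was scaled separately or the whole spectrum at once is not stated (the scaler below works on whatever list it is given), counting biases as weights in the Tarassenko check is an assumption, and the rounding used for the 80/10/10 split may differ from the original implementation. All function names are hypothetical.

```python
import random

def max_hidden_nodes(n_train, n_in):
    """Eq. (7), Rogers and Dowla: N_H = N_TR / (N_I + 1)."""
    return n_train / (n_in + 1)

def n_parameters(n_in, n_hid, n_out=1):
    """Weights plus biases of an n_in-n_hid-n_out network (for Tarassenko's rule)."""
    return n_in * n_hid + n_hid + n_hid * n_out + n_out

def minmax_scaler(values):
    """Eqs. (8)-(9): return functions mapping to [-1, 1] and back."""
    lo, hi = min(values), max(values)
    span = hi - lo
    def scale(v):
        return 2.0 * (v - lo) / span - 1.0
    def unscale(s):
        return (s + 1.0) * span / 2.0 + lo
    return scale, unscale

def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Random train/validation/test split; the test set takes the remainder."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr = round(fractions[0] * n)
    n_va = round(fractions[1] * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def r2(y_true, y_pred):
    """Eq. (10): coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Eq. (11): mean absolute error, in the units of y (e.g. cells/mL)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mare(y_true, y_pred):
    """Eq. (12): mean absolute relative error; y_true must be non-zero."""
    return sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)

train, val, test = split_indices(1568)
print(max_hidden_nodes(1568, 101), n_parameters(101, 9), len(train))
```

With N_TR = 1568 and N_I = 101, Eq. (7) gives about 15.4, so the chosen 9 hidden nodes are within the limit; under the stated assumption, the 101-9-1 network has 928 weights and biases, fewer than the 1254 training spectra in an 80% split.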

Fig. 3. Topological structure of the back propagation (BP) neural network based on genetic algorithms with a single hidden layer. The model contains 101 input nodes, 9 hidden-layer nodes, and one output node.

3. Results and discussion

3.1. Fluorescence spectrum analysis

The microalgal samples were excited at 470 nm, and the emitted fluorescence contains important information about the culture [28]. The samples were scanned with the fluorescence spectrometer over 660–800 nm, and the resulting fluorescence emission spectra are shown in Fig. 4. The spectra of all samples were similar in shape, but the positions of the fluorescence peaks differed slightly. There is a pronounced fluorescence peak near 685 nm and a minor peak near 740 nm, which were attributed to chlorophyll and to the algal biomass, respectively [8]. Neville and Gower (1977) and Gower et al. (1980) observed a strong peak at 685 nm in the spectra of natural waters [29,30], and this peak has been attributed to the fluorescence of phytoplankton pigments [31,32]. In 2010, Zhao Dongzhi et al. studied the fluorescence peak characteristics of eight algae and found that as the phytoplankton concentration increased, the fluorescence peak became stronger and its position gradually moved toward longer wavelengths, a phenomenon they called ‘redshift’. They also found that the redshift rate of the fluorescence peak varies among species and is zero for some algae [9,34]. In this study, we observed a shift of the fluorescence peak position (Fig. 4), although we did not study the redshift rate: the central wavelength of the peak ranged from 687 nm to 689 nm, a shift of 2 nm. Fig. 4 also shows that the sample with the lowest concentration (2 × 10⁵ cells mL−1) produced the lowest fluorescence intensity (5145.5). The highest fluorescence intensity (18,056.7) was produced at a cell concentration of 4.8 × 10⁶ cells mL−1, whereas the highest cell concentration (6.4 × 10⁶ cells mL−1) corresponded to a fluorescence intensity of only 13,618. Statistics from a large number of experiments showed that the fluorescence intensity began to decrease when the cell concentration exceeded 480,000 cells mL−1. This value is lower than the concentration (1.51 × 10⁸ cells mL−1) reported by Jiafei for Chlorella [33]. According to Zhao, the location and intensity of fluorescence peaks are closely related to the concentration, species, and water conditions of phytoplankton [35].

Fig. 4. Fluorescence emission spectra of algal samples with cell concentrations between 2 × 10⁵ and 6.4 × 10⁶ cells mL−1. Because of the large number of samples, only five typical spectra are shown. Data 1 (“Δ” line) is the spectrum of the lowest algal concentration; data 2 (“+” line) is the spectrum of algae at the initial stage of logarithmic growth; data 3 (“.” line) is the spectrum of algae infected with bacteria; data 4 (“×” line) is the spectrum with the highest fluorescence intensity; and data 5 (“*” line) is the spectrum of the highest algal concentration measured.

This nonlinear relationship leads to underestimation of the microalgal concentration in model predictions. Chang et al. similarly found that a high concentration of chlorophyll-a interferes with the fluorescence determination of phycocyanin, resulting in underestimation of cyanobacterial cell concentrations [36]. As the concentration of the solution increases, the distance between molecules decreases, and the fluorescence emitted by the molecules may be absorbed by surrounding substances rather than reaching the spectrometer, which decreases the measured fluorescence intensity [14,37]. This may explain the decrease in fluorescence intensity at high algal concentrations. Therefore, a linear model cannot accurately estimate a wide range of concentrations from fluorescence measurements without being overfit [38]. Although only a small fraction of the data is affected, ignoring these data would greatly reduce the accuracy of the model. Therefore, to account for these results, a nonlinear prediction model was established in this study using GA-BP.
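The argument above can be illustrated with a deliberately simple toy model in which emitted intensity grows linearly with concentration but is attenuated exponentially by self-absorption, F(c) = k·c·exp(−αc). This functional form and its constants are assumptions for illustration only, not a model fitted or used by the authors; α is chosen so that the maximum falls at 4.8 × 10⁶ cells mL−1, where Fig. 4 shows the brightest spectrum. The sketch shows two consequences: a calibration line fitted at low concentration underestimates high concentrations, and two different concentrations can produce the same intensity, so intensity alone cannot identify the concentration.

```python
import math

C_PEAK = 4.8e6          # cells/mL; toy choice matching the brightest spectrum in Fig. 4
ALPHA = 1.0 / C_PEAK    # hypothetical self-absorption constant (puts the maximum at C_PEAK)
K = 1.0                 # arbitrary intensity scale (hypothetical)

def toy_fluorescence(c):
    """Toy model: linear emission attenuated exponentially at high concentration."""
    return K * c * math.exp(-ALPHA * c)

def linear_estimate(intensity, slope):
    """Concentration reported by a calibration line F = slope * c."""
    return intensity / slope

def same_intensity_below_peak(c_high, tol=1.0):
    """Bisection on [0, C_PEAK], where the toy curve increases, for an equally
    bright lower concentration (requires c_high >= C_PEAK)."""
    target = toy_fluorescence(c_high)
    lo, hi = 0.0, C_PEAK
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if toy_fluorescence(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

slope = toy_fluorescence(2e5) / 2e5          # line calibrated at the lowest concentration
for c in (2e5, 1.6e6, 4.8e6, 6.4e6):
    print(f"true {c:.2e} cells/mL -> linear estimate "
          f"{linear_estimate(toy_fluorescence(c), slope):.2e}")
print(f"{same_intensity_below_peak(6.4e6):.2e} cells/mL gives the same toy intensity as 6.4e6")
```

In this toy setting the ambiguity is resolved only by information beyond a single intensity value, which is consistent with the paper's choice of using the whole 101-point spectrum as the model input.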

3.2. Performance of GA-BP model The results of the training model are shown in Fig. 5. The mean square error (MSE) and the number of iterations for the training run, the test run and the validation run. The train curve in the figure is the training error obtained by the training data input network; the validation curve is the verification error obtained after verifying the sample input network; and the test curve is the prediction error obtained from the prediction results. As shown in Fig. 5, the algorithm requires 56 iterations to achieve a satisfactory convergence. At 56 iterations, the minimum MSE value is 0.00099846. After training, the network model does not have a local optimum. The training network model converges faster, and the training error, the verification error and the prediction error have better convergences. Fig. 6 compares the training run, the verification run, the test run, and the combination (training and test) run. Y represents the model prediction result, and T represents the target result. R2 measures the relationship between the actual microalgal cell concentration and the predicted microalgal cell concentration, and its value can be between 0 and 1. Fig. 6 shows that the correlation coefficients (R2) between the training results and target data is 0.9983, between validation results and target data is 0.99802, between prediction results and target data is 0.99862, and of the whole training network model is 0.99831. The results show that the model correlates well with the actual results. The measured and predicted values are uniformly distributed around the parity line (Y = T), which confirms the ability of the model to predict 5

Algal Research 45 (2020) 101739

J.-Y. Liu, et al.

Fig. 5. Network training results of stochastic gradient descent. At 56 iterations, the optimal mean square error (MSE) value is 0.00099846. The “test” line is related to the test operation. The “train” line is used for training. The “validation” line is used for the early stop of the training process, and its function is to prevent over fitting of the model.

Fig. 6. A parity plot comparison of the predicted and the measured dimensionless algal cell concentrations for the training runs, the test runs, the validation runs and all the runs combined. (a) Training dataset, R2 = 0.9983 (b) validation dataset, R2 = 0.99802 (c) test dataset, R2 = 0.99862 and (d) combination all data, R2 = 0.99831. 6

Algal Research 45 (2020) 101739

J.-Y. Liu, et al.

Fig. 7. The Back Propagation neural network model based on Genetic Algorithm optimization (GA-BP) and Back Propagation neural network (BP) model predicted the cell concentration of algae. Both models were predicted using full sample data from algae. (a) prediction results of the GA-BP model, and (b) prediction results of the BP model. The prediction errors of the two models are shown in Fig. 8, and the prediction performances are shown in Table 2.

3.3. Comparison of GA-BP with BP neural network

To evaluate the performance of the GA-BP model, the modeling results of the GA-BP model and the BP neural network model were compared on the same group of samples. Fig. 7 compares the predicted and observed values for both models, and the performance of the calibration models is summarized in Table 2. Table 2 shows that, compared with the traditional BP neural network, the GA-optimized BP neural network (GA-BP) gives markedly better predictions. This is because the genetic algorithm assigns optimal initial weights and thresholds to the BP neural network through selection, crossover and mutation operations, thereby reducing its prediction error [21]. When the microalgal cell concentration is predicted from the same spectral data, the GA-BP model achieves better fit indicators than the BP neural network model (a higher R² and a lower MAE and MARE): for the GA-BP model, R² is 0.9958, the MAE is 1.4174 × 10⁴ cells mL⁻¹ and the MARE is 0.14359 × 10⁻⁶; for the BP neural network model, R² is 0.9818, the MAE is 12.48 × 10⁴ cells mL⁻¹ and the MARE is 9.8426 × 10⁻⁶. Although both models show some prediction error, the output error of the GA-BP model is smaller than that of the BP neural network model, so the GA-BP model is considered to perform better.

The relationship between the predicted and measured values of the established models is shown in Fig. 7. The predicted values of the GA-BP model lie closer to the measured values than those of the BP neural network model, which is consistent with the higher R² of the GA-BP model (Table 2). Overall, with the same spectral data and pretreatment, the GA-BP model fits the data better than the BP neural network model, all of its indicators are better, and its prediction performance is greatly improved.

Fig. 8 shows the residual analysis of the predicted and measured values for the GA-BP and BP neural network models. Fig. 8a shows that the residuals are approximately normally distributed, indicating that neither model has a large systematic bias. The scatter plot in Fig. 8b shows that the residuals are randomly distributed around their mean. Nevertheless, the BP model still has a lower R² and a higher MAE and MARE than the GA-BP model, indicating that the GA-BP model gives the best prediction of microalgal cell concentration and predicts new data best.

Table 2
Evaluation of the construction and prediction performance of the Back Propagation neural network optimized by a Genetic Algorithm (GA-BP) and the Back Propagation neural network (BP) on all 1568 algae samples.

Modeling algorithm    R²       MAE (× 10⁴ cells mL⁻¹)   MARE (× 10⁻⁶)
GA-BP                 0.9958   1.4174                   0.14359
BP neural network     0.9818   12.480                   9.8426

R²: coefficient of determination. MAE: mean absolute error. MARE: mean absolute relative error.

3.4. Use of additional data to verify the GA-BP model

To verify the validity of the model, it was applied to monitoring the algal cell concentration at different growth stages in an actual aquaculture process. In this section, three groups of algae samples grown under different culture conditions were used to test the GA-BP model.
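Table 2 and the validation groups are scored with R², MAE and MARE, and Fig. 9 plots signed relative errors against a 0 line. The paper does not give its formulas, so the Python sketch below assumes the conventional definitions, with the relative error taken as (predicted − measured)/measured so that points above 0 mean overestimation, as in the discussion of Fig. 9. The function names and the five sample values are hypothetical illustrations, not the paper's data; if the authors normalized MARE differently, magnitudes computed this way will not match Table 2.

```python
import numpy as np

# Conventional metric definitions (assumed; the paper does not state its formulas).
# Function names are hypothetical.

def r2(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(measured, float), np.asarray(predicted, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(measured, predicted):
    """Mean absolute error, in the units of the data (cells mL^-1)."""
    y, y_hat = np.asarray(measured, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(y_hat - y)))

def relative_errors(measured, predicted):
    """Signed relative errors; values above 0 mean the model overestimates."""
    y, y_hat = np.asarray(measured, float), np.asarray(predicted, float)
    return (y_hat - y) / y

def mare(measured, predicted):
    """Mean absolute relative error (dimensionless)."""
    return float(np.mean(np.abs(relative_errors(measured, predicted))))

def overestimated_fraction(measured, predicted):
    """Share of samples lying above the 0 line of a relative-error plot."""
    return float(np.mean(relative_errors(measured, predicted) > 0))

# Illustrative values only (cells mL^-1), not data from the paper.
measured = np.array([2.0e5, 8.0e5, 1.6e6, 3.2e6, 6.4e6])
predicted = np.array([2.1e5, 8.4e5, 1.5e6, 3.3e6, 6.5e6])
print(r2(measured, predicted), mae(measured, predicted),
      mare(measured, predicted), overestimated_fraction(measured, predicted))
```

With these hypothetical values, four of the five relative errors are positive, so most points would sit above the 0 line, the pattern described for the first two validation groups.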

The first group consisted of algae grown under normal conditions, the second group of algae cultured under nitrogen deficiency, and the third group of algae contaminated by bacteria. For the first group, the visual comparison in Fig. 9a again confirms the effectiveness of the GA-BP model for monitoring algal cell concentration; the MAE and MARE of the predictions were 8.96 × 10⁵ cells mL⁻¹ and 0.0178, respectively (Fig. 9a). For the second group, monitoring with the GA-BP model under nitrogen-deficient culture also gave good validation results, with an MAE of 6.585 × 10⁵ cells mL⁻¹ and a MARE of 0.039 (Fig. 9b). For the third group, the MAE and MARE were 6.47 × 10⁶ cells mL⁻¹ and 0.1977, respectively (Fig. 9c).

As Fig. 9 shows, when the trained model is used to monitor algal cell concentration, the prediction error is related to the order of the samples in the sequence. Most of the relative errors of the first two groups lie above the 0 line (Fig. 9a, b), indicating that the algal concentrations of these two groups were overestimated. Compared with the first two groups, the predictions for the bacterially contaminated algae were even worse, and most of their relative errors lay below the 0 line (Fig. 9c), indicating that the algal concentration was underestimated. One possible cause is that the algae grow slowly after bacterial contamination, and the death of a large number of algal cells reduces light transmission, which leads to an underestimation of the algal cell concentration. Another possible reason is a deviation between the algal concentration obtained by counting cells under the microscope and the actual algal concentration, which may arise from sampling or other human error. Therefore, in practice the model established in this paper can be used to make a rough estimate of the algal cell concentration; an accurate prediction should also take other prediction methods into account.

Fig. 8. Residual plots for the Back Propagation neural network model optimized by a Genetic Algorithm (GA-BP) and the Back Propagation neural network (BP) model: (a) residual histograms of the GA-BP and BP models; (b) residual scatter plots of the GA-BP and BP models.

Fig. 9. Performance test of the Back Propagation neural network model optimized by a Genetic Algorithm (GA-BP). (a1), (b1), (c1) Measured ("*" line) and predicted ("o" line) microalgal cell concentrations in the first, second and third groups of samples, respectively. (a2), (b2), (c2) Prediction error maps.

4. Conclusion

With the rapid development of micro-monitoring technology, costs have fallen and usability has improved, and ecology is changing from a scientific field with limited data to one with abundant data [39]. Traditional algae concentration monitoring methods are vulnerable to errors and uncertainties, including differences in sampling techniques, instrument failures and observer bias [39,40]. In most cases, the algal cell concentration in artificial culture is much higher than in the natural environment, and higher algal concentrations lead to greater uncertainty. It was found that as the algal concentration increased, the fluorescence intensity of the algae became non-linearly related to the algal cell concentration, and the fluorescence peak shifted toward longer wavelengths. To monitor the concentration of microalgal cells quickly and accurately, a GA-BP model was constructed and used to process the fluorescence spectral data of microalgae. Within the concentration range of 2 × 10⁵ to 6.4 × 10⁶ cells mL⁻¹, rapid, label-free, rough estimation and monitoring of the microalgal cell concentration were achieved. These results also show that in situ fluorescence spectrometry can effectively identify and respond to changes in the algal cell concentration, so this method can make monitoring of algae in the water environment easier. Fluorescence spectra combined with an ANN may be a useful basis for designing algal cell concentration sensors, and can also be used to estimate the concentration of other fluorescence-emitting compounds for process monitoring, environmental control and many other fields. Although this study was carried out in a laboratory environment without considering environmental factors such as turbidity, temperature and other fluorescent substances, the prediction deviation in the experiments was still large. Therefore, the next challenge is to improve the stability of the system for practical applications and to further reduce the prediction error.

Declaration of competing interest

The authors declare no conflict of interest. No conflicts, informed consent, or human or animal rights issues are applicable.

Acknowledgments

The authors thank American Journal Experts (AJE) for providing English language editing of this paper. We are grateful for financial support from the Top-notch talent plan program of higher education in Hebei (BJ2017036).

Author contributions

Jing-Yan Liu designed and carried out the experiments together with Li-Hua Zeng; Jing-Yan Liu, Li-Hua Zeng, Zhen-Hui Ren, Tie-Min Du and Xuan Liu acquired and analyzed the data; Jing-Yan Liu and Li-Hua Zeng designed the experiments and composed the manuscript. All authors read and approved the final version of the manuscript.

References

[1] F. Liang, Q. Ya, W. Du, X. Wen, Y. Geng, Y. Li, The relationships between optical density, cell number, and biomass of four microalgae, Acta Ecol. Sin. 21 (2014) 6156–6163, https://doi.org/10.5846/stxb201301310207.
[2] R. Martínez-Guijarro, M. Pachés, J. Ferrer, A. Seco, Model performance of partial least squares in utilizing the visible spectroscopy data for estimation of algal biomass in a photobioreactor, Environ. Technol. Innov. 10 (2018) 122–131, https://doi.org/10.1016/j.eti.2018.01.005.
[3] M.-J. Griffiths, C. Garcin, R.-P. van Hille, S.-T.-L. Harrison, Interference by pigment in the estimation of microalgal biomass concentration by optical density, J. Microbiol. Methods 85 (2011) 119–123, https://doi.org/10.1016/j.mimet.2011.02.005.
[4] H. Wang, R. Zhu, J. Zhang, L. Ni, H. Shen, P. Xie, A novel and convenient method for early warning of algal cell density by chlorophyll fluorescence parameters and its application in a highland lake, Front. Plant Sci. 9 (2018) 869, https://doi.org/10.3389/fpls.2018.00869.
[5] B.T. Nguyen, B.E. Rittmann, Low-cost optical sensor to automatically monitor and control biomass concentration in microalgal cultivation, Algal Res. 32 (2018) 101–106, https://doi.org/10.1016/j.algal.2018.03.013.
[6] X.R. Jin, Research on On-line Recognition and Concentration Measurement of Dominant Species of Algae Based on Fluorescent Ratios, 101, Zhejiang University, 2018, http://kns.cnki.net/KCMS/detail/detail.aspx?FileName=1018186751.nh&DbName=CMFD2018.
[7] M. Beutler, K.H. Wiltshire, B. Meyer, A fluorometric method for the differentiation of algal populations in vivo and in situ, Photosynth. Res. 72 (2002) 39–53, https://doi.org/10.1023/a:1016026607048.
[8] W.L. Vos, M. Doneze, H. Bueteveld, On the Reflectance Spectrum of Algae in Water: The Nature of the Peak at 700 nm and its Shift With Varying Concentration, Technical Report, Delft, Netherlands (1986), pp. 86–122.
[9] A. Gitelson, The peak near 700 nm on radiance spectra of algae and water; relationships of its magnitude and position with chlorophyll concentration, Int. J. Remote Sens. 13 (1992) 3367–3373, https://doi.org/10.1080/01431169208904125.
[10] J. Gregor, B. Marsalek, Freshwater phytoplankton quantification by chlorophyll a: a comparative study of in vitro, in vivo and in situ methods, Water Res. 38 (2004) 517–522, https://doi.org/10.1016/j.watres.2003.10.033.
[11] S. Becker, H.C.P. Matthijs, E. van Donk, Biotic factors in induced defence revisited: cell aggregate formation in the toxic cyanobacterium Microcystis aeruginosa PCC 7806 is triggered by spent Daphnia medium and disrupted cells, Hydrobiologia 644 (2010) 159–168, https://doi.org/10.1007/s10750-011-0767-4.
[12] M.E. Pacheco, L. Bruzzone, Synchronous fluorescence spectrometry: conformational investigation or inner filter effect, J. Lumin. 137 (2013) 138–142, https://doi.org/10.1016/j.jlumin.2012.12.056.
[13] T. Wang, L. Zeng, D. Li, A review on the methods for correcting the fluorescence inner-filter effect of fluorescence spectrum, Appl. Spectrosc. Rev. 52 (2017) 883–908, https://doi.org/10.1080/05704928.2017.1345758.
[14] J.C. Cancilla, Artificial neural networks applied to fluorescence studies for accurate determination of N-butylpyridinium chloride concentration in aqueous solution, Sensors Actuators B Chem. 198 (2014) 173–179, https://doi.org/10.1016/j.snb.2014.02.097.
[15] F. García-Camacho, Artificial neural network modeling for predicting the growth of the microalga Karlodinium veneficum, Algal Res. 14 (2016) 58–64, https://doi.org/10.1016/j.algal.2016.01.002.
[16] J. Sharon Mano Pappu, G. Karthik Vijayakumar, V. Ramamurthy, Artificial neural network model for predicting production of Spirulina platensis in outdoor culture, Bioresour. Technol. 130 (2013) 224–230, https://doi.org/10.1016/j.biortech.2012.12.082.
[17] B.M. Franco, L.M. Navas, C. Gómez, C. Sepúlveda, F.G. Acién, Monoalgal and mixed algal cultures discrimination by using an artificial neural network, Algal Res. 38 (2019) 101419, https://doi.org/10.1016/j.algal.2019.101419.
[18] R.R. Guillard, Counting slides, Phytoplankton Manual-Monographs on Oceanographic Methodology, UNESCO, Paris, France, 1978.
[19] F.Y. Jiang, Y.L. Zhao, S. Dong, Z.B. Dong, BP neural network based on genetic algorithm to predict the impact damage of submarine pipeline, Trans. Oceanol. Limnol. 03 (2019) 52–59, https://doi.org/10.13984/j.cnki.cn37-1141.2019.03.007.
[20] S. Wang, N. Zhang, L. Wu, Y. Wang, Wind speed forecasting based on the hybrid ensemble empirical mode decomposition and GA-BP neural network method, Renew. Energy 94 (2016) 629–636, https://doi.org/10.1016/j.renene.2016.03.103.
[21] S. Wang, J. Wang, F.K. Shang, Y.T. Wang, A GA-BP method of detecting carbamate pesticide mixture based on three-dimensional fluorescence spectroscopy, Spectrochim. Acta A Mol. Biomol. Spectrosc. 224 (2020) 11739, https://doi.org/10.1016/j.saa.2019.117396.
[22] F. Lu, Z. Chen, W.Q. Liu, Modeling chlorophyll-a concentrations using an artificial neural network for precisely eco-restoring lake basin, Ecol. Eng. 95 (2016) 422–429, https://doi.org/10.1016/j.ecoleng.2016.06.072.
[23] L. Tarassenko, A Guide to Neural Computing Applications, Arnold Publishers, London, 1998.
[24] H.R. Maier, G.C. Dandy, Neural network based modelling of environmental variables: a systematic approach, MComM 33 (2001) 669–682, https://doi.org/10.4018/978-1-930708-26-6.ch015.
[25] F. Lu, Z. Chen, W. Liu, H. Shao, Modeling chlorophyll-a concentrations using an artificial neural network for precisely eco-restoring lake basin, Ecol. Eng. 95 (2016) 422–429, https://doi.org/10.1016/j.ecoleng.2016.06.072.
[26] F. García-Camacho, L. López-Rosales, A. Sánchez-Mirón, Artificial neural network modeling for predicting the growth of the microalga Karlodinium veneficum, Algal Res. 14 (2016) 58–64, https://doi.org/10.1016/j.algal.2016.01.002.
[27] X. Xiao, J. He, H. Huang, T.R. Miller, G. Christakos, E.S. Reichwaldt, A novel single-parameter approach for forecasting algal blooms, Water Res. 108 (2017) 222–231, https://doi.org/10.1016/j.watres.2016.10.076.
[28] I. Havlik, P. Lindner, T. Scheperk, F. Reardon, On-line monitoring of large cultivations of microalgae and cyanobacteria, Trends Biotechnol. 31 (2013) 406–414, https://doi.org/10.1016/j.tibtech.2013.04.005.
[29] R.A. Neville, J.F.R. Gower, Passive remote sensing of phytoplankton via chlorophyll-α fluorescence, J. Geophys. Res. 82 (1977) 3487–3493, https://doi.org/10.1029/JC082i024p03487.
[30] J.F.R. Gower, Observations of in situ fluorescence of chlorophyll-a in Saanich inlet, Bound.-Layer Meteorol. 18 (1980) 235–245, https://doi.org/10.1007/BF00122022.
[31] A.A. Gitelson, Nature of the peak near 700 nm on the radiance spectra and its application for remote estimation of phytoplankton pigments in inland waters, Proc. SPIE Int. Soc. Opt. Eng. 1971 (1993) 170–179, https://doi.org/10.1117/12.150992.
[32] B.Y. Tao, Z.H. Mao, D.L. Pan, Y.Z. Shen, Influence of bio-optical parameter variability on the reflectance peak position in the red band of algal bloom waters, Eco. Inform. 16 (2013) 17–24, https://doi.org/10.1016/j.ecoinf.2013.04.005.
[33] F. Jia, M. Kacira, K. Ogden, Multi-wavelength based optical density sensor for autonomous monitoring of microalgae, Sensors-Basel 15 (2015) 22234–22248, https://doi.org/10.3390/s150922234.
[34] D.Z. Zhao, X.X.Y.G. Liu, J.H. Yang, L. Wang, The relation of chlorophyll-a concentration with the reflectance peak near 700 nm in algae-dominated waters and sensitivity of fluorescence algorithms for detecting algal bloom, Int. J. Remote Sens. 31 (2010) 39–48, https://doi.org/10.1080/01431160902882512.
[35] D.Z. Zhao, F.S. Zhang, F. Du, Interpretation of sun-induced fluorescence peak of chlorophyll a on reflectance spectrum of algal waters, J. Remote Sens. 9 (2005) 265–270, https://doi.org/10.3321/j.issn:1007-4619.2005.03.007.
[36] D. Chang, P. Hobson, M. Burch, T. Lin, Measurement of cyanobacteria using in-vivo fluoroscopy-effect of cyanobacterial species, pigments, and colonies, Water Res. 46 (2012) 5037–5048, https://doi.org/10.1016/j.watres.2012.06.050.
[37] X. Chen, Z. Yuan, F. Gao, W. Gao, Method of accurate correction on inner filter effects in fluorescence quenching analysis, Acta Phot. Sin. 44 (2015) 160–165, https://doi.org/10.3788/gzxb20154410.1017001.
[38] M. Tarai, A.K. Mishra, Inner filter effect and the onset of concentration dependent red shift of synchronous fluorescence spectra, Anal. Chim. Acta 940 (2016) 113–119, https://doi.org/10.1016/j.aca.2016.08.041.
[39] X. Xiao, H. Sogge, K. Lagesen, A. Tooming-Klunderud, K.S. Jakobsen, T. Rohrlack, Use of high throughput sequencing and light microscopy show contrasting results in a study of phytoplankton occurrence in a freshwater environment, PLoS One 9 (2014) e106510, https://doi.org/10.1371/journal.pone.0106510.
[40] D. Straile, M.C. Jochimsen, R. Kummerlin, The use of long-term monitoring data for studies of planktonic diversity: a cautionary tale from two Swiss lakes, Freshw. Biol. 58 (2013) 1292–1301, https://doi.org/10.1111/fwb.12118.