Predicting concentrations of a mixture in bioreactor for on-line monitoring using Raman spectroscopy


8th IFAC Symposium on Advanced Control of Chemical Processes, The International Federation of Automatic Control, Singapore, July 10-13, 2012


Se-Kyu Oh, Sung Jin Yoo, Jong Min Lee*

School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 151-744, Korea (e-mail: [email protected])

Abstract: On-line monitoring of biological processes is essential for maximizing productivity because the reactions are very slow and analyzing the composition of a bioproduct requires much effort. Raman spectroscopy is suitable for on-line monitoring of bioprocesses when it is combined with proper analysis algorithms. This work proposes a novel soft-sensor framework based on Raman spectra. It first removes the background effect using the Rolling-Circle Filter (RCF) and then reduces noise using the Savitzky-Golay smoothing filter. Partial Least Squares (PLS) regression is then used to reduce the dimension of the spectra and to predict the concentrations from latent variables. Applying the proposed methods improved the prediction performance by 30%, which shows that Raman spectra can be used with PLS to analyze concentrations.

Keywords: Raman spectroscopy; Savitzky-Golay smoothing filter; Rolling-Circle Filter; Partial Least Squares; On-line monitoring

1. INTRODUCTION

In a bioprocess, analyzing the composition of the product is not simple. Cell disruption, extraction, freeze-drying and centrifugation are required to determine the composition of the bioproduct, and mass spectroscopy or high-performance liquid chromatography (HPLC) is used to determine its characteristics. These procedures take a long time (30 to 60 min) and require much effort. If the results obtained through them are not satisfactory, process variables should be manipulated to improve product quality. However, this procedure is poorly suited to feedback control because there is a significant time delay between sampling and completion of the analysis. Furthermore, since biological process models are less robust than traditional chemical process models, on-line monitoring is even more important for bioprocesses.
Raman spectroscopy has attracted much attention for composition analysis in on-line monitoring because no sample preparation is needed and samples are not destroyed. An important advantage of Raman spectroscopy over other spectroscopic methods is that the water interference peak is insignificant [Demirbas, 2010]. Thus, Raman spectroscopy is suitable for bioprocess monitoring. However, Raman spectra obtained without sample preparation may not be uniform because of sediments and floating matter, and sample uniformity is important for acquiring an accurate spectrum.

A microalgal bioreactor system is an important example of a bioprocess. Microalgae are considered one of the most promising feedstocks for biofuels because of their lipid production potential [Mata et al., 2010]. There are several papers on predicting the lipid concentration in algae using Raman spectroscopy [Huang et al., 2010, Samek et al., 2010]. However, these studies performed only off-line spectroscopic and biological analysis. In this study, we present a comprehensive framework for predicting concentrations using Raman spectroscopy, together with a comparison of on-line prediction performance. We also show how differences in uniformity influence prediction performance. The Raman spectra are analyzed by multivariate analysis rather than by biological and spectroscopic analysis. A mixture consisting of water, soybean oil, glucose and glycine was used to test applicability to a heterogeneous solution such as a microalgal culture. Water, soybean oil, glucose and glycine were added to represent an aqueous solution, non-uniformity, uniformity and the fluorescence effect, respectively. The concentrations of the mixture were determined by a mixture design of experiment.

* Corresponding author. Tel: +82-2-880-1878; Fax: +82-2-888-1604; Email: [email protected]

978-3-902823-05-2/12/$20.00 © 2012 IFAC. DOI: 10.3182/20120710-4-SG-2026.00099

2. MATERIALS AND METHODS

2.1 Sample Preparation

The reagents used in this experiment were D-(+)-glucose (ACS reagent grade, Sigma-Aldrich Co.), ultra-pure water, soybean oil (CJ) and glycine (99.0% pure, Sigma-Aldrich Co.). Glucose, which is used as a carbon source in bioprocesses, was added to test the prediction performance for a uniformly mixed material. Soybean oil was used to examine the prediction performance for an extremely non-uniform material (i.e. one that undergoes phase separation). Because glycine has an intense fluorescence effect that produces a background in the Raman spectrum, it was added to introduce a background effect into the spectra of this experiment. The mixture is a water-based solution because most reactants in a bioreactor form an aqueous solution.

Sample concentrations were determined by a mixture design of experiment (DOE). An extreme vertex design with two replicates of the whole design was used to prepare 34 samples. The upper and lower bounds of the concentrations are shown in Table 1. The DOE was implemented using MINITAB 16.

Table 1. Concentration ranges of the mixture components in this experiment (g/mL)

              min.     max.
water         0.7029   0.7920
soybean oil   0.1344   0.2100
glucose       0        0.0385
glycine       0        0.0174

2.2 Raman Spectroscopy

Raman spectroscopy is a simple and fast spectroscopic technique for analyzing molecular characteristics by examining the vibrational and rotational modes of molecules. A Raman spectrometer separates the inelastically scattered (Raman scattered) light from the Rayleigh scattered laser light. The technique is suitable for biological samples because of its low sensitivity to aqueous solution, and it can be used for quantitative analysis because the intensity of a characteristic peak increases linearly with concentration. Because Raman signals can be transmitted through optical fiber more effectively than those of Fourier transform infrared (FT-IR), visible or near-IR spectroscopy, Raman spectroscopy is applicable to on-line monitoring in harsh environments [Ye et al., 2009, Huang et al., 2010]. In our experiment, Raman spectra were obtained with an Ocean Optics QE65000 scientific-grade spectrometer at an excitation wavelength of 532 nm. Each spectrum was an average of 4 accumulations with an integration time of 5 s. The selected spectral range was 453 to 4629 cm−1.

2.3 Partial Least Squares (PLS)

PLS regression is a technique that predicts dependent variables (e.g. concentrations) from independent variables (e.g. Raman spectra).
When the set of independent variables is very large, PLS regression is a particularly suitable method for predicting dependent variables [Otto, 2007]. The PLS latent variable directions of the independent matrix (X) are calculated so as to maximize the covariance between X and the dependent vector (y). Both X and y are decomposed into smaller matrices:

$$X = T P^{T} + E \tag{1}$$

$$y = T q^{T} + f \tag{2}$$

where T is the score matrix, whose columns are orthogonal, P and q are the loadings of X and y, respectively, and E and f are the residuals. The dependent variable y is predicted with a regression vector b:

$$y = X b \tag{3}$$

The regression vector b is calculated using a weight matrix W:

$$b = W (W^{T} S_{XX} W)^{-1} W^{T} S_{XY} \tag{4}$$

$$S_{XX} = X^{T} X \tag{5}$$

$$S_{XY} = X^{T} y \tag{6}$$

$$W = [\, X_{0}^{T} y_{0} \mid X_{1}^{T} y_{1} \mid \cdots \mid X_{k}^{T} y_{k} \,] \tag{7}$$

where S_XX and S_XY are covariances, and k is the number of latent variables [Lopes et al., 2004].

Although Multiple Linear Regression (MLR) and Principal Component Regression (PCR) are frequently used for multivariate analysis, MLR suffers from collinearity and singularity problems: the inverse of $X^{T}X$ in the least squares solution $b = (X^{T}X)^{-1}X^{T}y$ may not exist. In the case of PCR, there is no guarantee that the principal components that contain most of the information in the X variables also explain the y variables well. PLS solves these problems by using the inner relationships between the scores of X and y. Therefore, in most cases PLS is used to find a relationship between X and y [Geladi and Kowalski, 1986].

In this paper, the PLS model was developed using The Unscrambler® X, which uses the non-linear iterative partial least squares (NIPALS) algorithm. Prediction performance is evaluated by 10-fold cross validation, which simulates a test set when there are not enough samples to form an independent test set.

2.4 Rolling-Circle Filter (RCF)

A Raman spectrum usually has a background caused by the fluorescence effect. The background should be removed because it does not contain any chemical information. The Rolling-Circle Filter is an easy-to-apply and intuitive filter for subtracting the background [Paraschuk et al., 2006]. The method uses the geometrical difference between the characteristic peaks and the background. First, a radius R is chosen. The circle should be significantly larger than the Raman linewidths and smaller than the radius of curvature of the background. Then the circle rolls beneath the spectrum as shown in Fig. 1. Only the parts that the circle cannot enter are left [Mikhailyuk and Razzhivin, 2003]. The key advantages of the RCF are that it has a single parameter (the radius) and that it places no restriction on the shape of the background.

We first normalize the spectrum before applying the RCF, so that the intensity range becomes equal to the Raman shift range:

$$X_{\mathrm{norm}}(i) = \frac{X(i) - \min[X]}{\max[X] - \min[X]} \cdot N \tag{8}$$

Fig. 1. Geometrical representation of the Rolling-Circle Filter algorithm.
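The rolling-circle idea of Section 2.4 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' MATLAB implementation: the function names (`normalize`, `rolling_circle_baseline`, `rcf`) are hypothetical, the radius is assumed to be an integer number of spectral points, and the circle centres are allowed to run one radius past each end of the spectrum so that the ends are treated like the interior. For every centre position the circle is pushed up until it touches the spectrum from below, and the background is taken as the upper envelope of all such circles.

```python
import math


def normalize(spectrum):
    """Eq. (8): rescale intensities to the range [0, N], N = number of points."""
    n = len(spectrum)
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0] * n
    return [(v - lo) / (hi - lo) * n for v in spectrum]


def rolling_circle_baseline(y, radius):
    """Upper envelope of a circle of the given radius rolled beneath y."""
    n = len(y)
    # arc[d] = height of the circle above its centre at horizontal offset d
    arc = [math.sqrt(radius * radius - d * d) for d in range(radius + 1)]
    # Highest centre height at which the circle stays below the spectrum.
    # Assumption: centres may lie up to one radius outside the data range.
    height = {}
    for c in range(-radius, n + radius):
        lo, hi = max(0, c - radius), min(n - 1, c + radius)
        height[c] = min(y[j] - arc[abs(j - c)] for j in range(lo, hi + 1))
    # Background at point i = highest circle surface over all centres near i.
    baseline = []
    for i in range(n):
        baseline.append(max(height[c] + arc[abs(i - c)]
                            for c in range(i - radius, i + radius + 1)))
    return baseline


def rcf(spectrum, radius):
    """Normalize with Eq. (8), then subtract the rolling-circle background."""
    y = normalize(spectrum)
    baseline = rolling_circle_baseline(y, radius)
    return [a - b for a, b in zip(y, baseline)]


# Toy spectrum: sloped background plus one narrow peak at index 100.
raw = [0.5 * i + 50.0 * math.exp(-((i - 100) / 3.0) ** 2) for i in range(200)]
corrected = rcf(raw, 30)
```

With a radius much larger than the peak width, the sloped background in the toy spectrum is removed while the peak survives; a radius comparable to the peak width would let the circle enter the peak and erase it.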

In Eq. (8), X is the spectrum intensity, N is the number of spectral points, and i = 1, 2, 3, ..., N [Paraschuk et al., 2006]. MATLAB 2011b was used to implement the RCF.

Fig. 2. (A) Simulated characteristic peaks produced by five Gaussian functions. (B) Simulated spectrum with the background effect and noise with a standard deviation of 4. (C) Spectra filtered by the RCF with R=1500, 500 and 100. (D) Spectrum filtered by the RCF with R=100 and the 15-point quadratic Savitzky-Golay smoothing filter. Axes: intensity (a.u.) versus x (×10^3).

Table 2. Parameters of the Gaussian functions for the simulated spectrum.

               A     x0    ∆x
peak 1         60    100   10
peak 2         130   300   5
peak 3         20    450   5
peak 4         100   600   10
peak 5         80    900   40
background 1   300   600   700
background 2   300   900   500

Fig. 3. (A) Spectra of the samples consisting of the same concentrations. (B) Spectra filtered by the RCF (R=400) for background subtraction. Axes: intensity (a.u.) versus Raman shift (cm−1).

2.5 Savitzky-Golay smoothing filter

Noise, as well as the background, must be removed, because noise is a severe problem in spectral analysis. The Savitzky-Golay smoothing filter is the best-known noise reduction filter. It fits a polynomial to each successive curve segment and replaces the original values with more regular variations. The parameters to choose are the length of the smoothing segment (the moving window) and the order of the polynomial. It is a very useful method for removing spectral noise effectively while preserving chemical information [Savitzky and Golay, 1964]. This filter was applied using The Unscrambler® X.

3. RESULTS

3.1 Simulated data

A simulated spectrum was generated to assess the feasibility of the proposed methods, using Gaussian functions $F(x) = A \cdot e^{-\{(x - x_{0})/\Delta x\}^{2}}$. The characteristic peaks produced by five Gaussian functions (Fig. 2(A)) were added to a Raman background produced by two Gaussian functions, as shown in Fig. 2(B). The parameters of the Gaussian functions are shown in Table 2. Fig. 2(A) shows the original spectrum, whereas a measured Raman spectrum contains the background and random noise, as in Fig. 2(B). The task is to recover the original spectrum (Fig. 2(A)) from the spectrum with background and noise (Fig. 2(B)). Fig. 2(C) shows the results of applying the RCF with R=1500, 500 and 100. The radius of curvature of the background is about 200 and the radii of the characteristic peaks are about 10 to 80. Therefore, the spectra filtered by the RCF with R=80 to 200 may be similar to each other. In this case, the spectrum filtered by the RCF with R=100 was chosen as the background-free spectrum. Fig. 2(D) is the spectrum after noise removal by the 15-point quadratic Savitzky-Golay smoothing filter. A comparison between Fig. 2(A) and Fig. 2(D) shows that the proposed methods can recover the original characteristic peaks from a Raman spectrum that contains background and noise.

Fig. 4. (A) Spectra of the samples consisting of the same concentrations. (B) Spectra filtered by the RCF (R=400) for background subtraction. (C) Two spectra of water and soybean oil. Axes: intensity (a.u.) versus Raman shift (cm−1).

Table 3. Prediction performance of the PLS model of the spectra processed by the RCF.

Radius    Validation RMSE   R-square
R=1500    0.0038            0.9742
R=1000    0.0040            0.9722
R=500     0.0040            0.9733
R=400     0.0035            0.9806
R=300     0.0039            0.9755
R=200     0.0042            0.9715
R=100     0.0045            0.9637

Table 4. Prediction performance of the PLS model of the spectra processed by the RCF and the Savitzky-Golay smoothing filter.

Window      Validation RMSE   R-square
5-points    0.0034            0.9822
7-points    0.0032            0.9823
9-points    0.0032            0.9820
11-points   0.0032            0.9816
13-points   0.0033            0.9817
15-points   0.0030            0.9849
17-points   0.0033            0.9809
19-points   0.0031            0.9839

Fig. 5. (A) The spectra of the 34 samples without any preprocessing. (B) The results of the PLS regression for the prediction of the glucose concentration using the raw spectra (predicted versus actual glucose, g/mL). Plot annotation: calibration RMSE 0.0024, R-square 0.9899; validation RMSE 0.0043, R-square 0.9677.

Fig. 6. (A) The spectra of the 34 samples filtered by the RCF with R=400. (B) The results of the PLS regression for the prediction of the glucose concentration using the spectra filtered by the RCF with R=400 (predicted versus actual glucose, g/mL). Plot annotation: calibration RMSE 0.0005, R-square 0.9995; validation RMSE 0.0035, R-square 0.9806.
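As an illustration of the simulated data of Section 3.1, the following sketch builds a spectrum from the Gaussian parameters of Table 2 and adds random noise with a standard deviation of 4. It is a sketch under stated assumptions rather than the authors' code: the x grid (integers 0 to 1000, read off the ×10^3 axis of Fig. 2), the random seed and the names `gaussian` and `simulate_spectrum` are all hypothetical choices made for this example.

```python
import math
import random

# (A, x0, dx) triples taken from Table 2
PEAKS = [(60, 100, 10), (130, 300, 5), (20, 450, 5), (100, 600, 10), (80, 900, 40)]
BACKGROUND = [(300, 600, 700), (300, 900, 500)]


def gaussian(x, a, x0, dx):
    """F(x) = A * exp(-((x - x0) / dx)^2), as used in Section 3.1."""
    return a * math.exp(-((x - x0) / dx) ** 2)


def simulate_spectrum(n_points=1001, noise_sd=4.0, seed=0):
    """Return x grid, clean peaks, background, and the noisy total spectrum.

    Assumptions: integer x grid 0..n_points-1 and a fixed seed for
    reproducibility; neither is specified in the paper.
    """
    rng = random.Random(seed)
    xs = list(range(n_points))
    clean = [sum(gaussian(x, *p) for p in PEAKS) for x in xs]
    background = [sum(gaussian(x, *b) for b in BACKGROUND) for x in xs]
    noisy = [c + b + rng.gauss(0.0, noise_sd)
             for c, b in zip(clean, background)]
    return xs, clean, background, noisy


xs, clean, background, noisy = simulate_spectrum()
```

Here `clean` plays the role of Fig. 2(A) and `noisy` that of Fig. 2(B); a background-removal and smoothing step would then be judged by how closely it recovers `clean` from `noisy`.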

3.2 Experimental data

Fig. 3(A) shows the spectra of two samples of the same concentration. Even though the two samples have the same composition, their spectra have different intensities. Fig. 3(B) is obtained when the RCF with R=400 is applied to the spectra in Fig. 3(A). Even when the spectra of samples with the same concentrations are taken in the same environment, the Raman intensity differs because of the fluorescence effect. Thus, background subtraction is an essential step before predicting the concentration.

Fig. 4(A) shows the same situation. In this case, however, the two spectra have different spectral shapes even after the RCF is applied. This result arises from phase separation between soybean oil and water.

First, PLS was used to predict the concentration of glucose, which is uniformly mixed in the solution. To test the performance of the filters, PLS was applied to the raw spectra, and the result, with calibration and validation statistics, is shown in Fig. 5.
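To make the PLS step concrete, the sketch below implements a textbook NIPALS PLS1 regression for a single response (for example the glucose concentration) together with the RMSE, R-square and a k-fold cross validation. It is a minimal sketch and not the Unscrambler implementation used in the paper: prediction is done by deflating the new sample component by component instead of forming the regression vector of Eq. (4), the folds are assigned by interleaving sample indices, and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on mean-centred data; returns means and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        # weight vector: direction of maximum covariance between E and f
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                       # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        # deflate X and y before extracting the next latent variable
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


def rmse(actual, predicted):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual))


def r_square(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - b) ** 2 for a, b in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cross_val_predict(X, y, n_components, k=10):
    """k-fold cross validation; sample i is assigned to fold i % k (assumption)."""
    n = len(X)
    preds = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        if not test:
            continue
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_components)
        for i in test:
            preds[i] = pls1_predict(model, X[i])
    return preds
```

A validation RMSE in the sense used in this section would then be `rmse(y, cross_val_predict(X, y, n_components, k=10))`, with each row of `X` holding one preprocessed spectrum.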

For the validation data of the raw-spectra model (Fig. 5), the root mean square error (RMSE) and the R-square are 0.0043 and 0.9677, respectively. An RMSE closer to 0 and an R-square closer to 1 indicate a better fit. For PLS, the RMSE evaluates the prediction performance of the model more reliably than the R-square, because the R-square tends to approach 1 when many regression coefficients exist, regardless of prediction performance.

PLS regression was applied to the spectra processed by the RCF with various radii in order to determine the optimal radius of the RCF. The results are shown in Table 3. The PLS model of the spectra filtered by the RCF with R=400 has the best prediction performance in this step, as shown in Fig. 6.

When the radius is larger or smaller than 400, the prediction performance decreases. Fig. 7, which shows how the spectra change with the radius of the RCF, explains this decrease. The characteristic peaks disappear when the radius is smaller than the bandwidth of the characteristic peaks. If selecting the radius is difficult, it is better to choose a large radius than a small one because most Raman spectra have a gently sloped background. The radius of the RCF is expressed in points, not in Raman shift. In this experiment, a Raman spectrum has 941 points in total, so a circle with R=400 is relatively large.

Fig. 7. Comparison of the filtered Raman spectra for different radii of the RCF (R=100, 400 and 1500). Axes: intensity (a.u.) versus Raman shift (cm−1).

A Savitzky-Golay smoothing filter with a quadratic polynomial, the order that is generally used, was then applied to the spectra filtered by the RCF with R=400, which gave the best predictive performance in this experiment.

Fig. 8. (A) The spectra of the 34 samples filtered by the RCF with R=400 and the 15-point quadratic Savitzky-Golay smoothing filter. (B) The results of the PLS regression for the prediction of the glucose concentration using the spectra filtered by the RCF with R=400 and the 15-point quadratic Savitzky-Golay smoothing filter (predicted versus actual glucose, g/mL). Plot annotation: calibration RMSE 0.0019, R-square 0.9976; validation RMSE 0.0030, R-square 0.9849.
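The quadratic Savitzky-Golay smoothing referred to above can be sketched with the closed-form convolution weights for a quadratic fit over a symmetric window of 2m+1 points. This is a sketch only, not the Unscrambler routine used in the paper: the function names are hypothetical, and leaving the first and last m points unsmoothed is an assumption made to keep the example short.

```python
def savgol_quadratic_weights(window):
    """Smoothing weights of a quadratic least-squares fit over an odd window.

    For a window of 2m+1 points the weight at offset j (-m..m) is
    3 * (3m^2 + 3m - 1 - 5j^2) / ((2m - 1)(2m + 1)(2m + 3)).
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    m = window // 2
    denom = (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * j * j) / denom
            for j in range(-m, m + 1)]


def savgol_smooth(y, window=15):
    """Apply the smoothing weights; edge points are left unchanged (assumption)."""
    weights = savgol_quadratic_weights(window)
    m = window // 2
    out = list(y)
    for i in range(m, len(y) - m):
        out[i] = sum(w * y[i - m + k] for k, w in enumerate(weights))
    return out
```

For the 15-point window the centre weight is 167/1105 and the outermost weights are -78/1105, which matches the commonly tabulated coefficients. Because the fit is quadratic, a signal that is itself a quadratic passes through unchanged, while narrow peaks are flattened more strongly as the window grows.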

Fig. 9. The results of the PLS regression for the prediction of the soybean oil concentration using the spectra filtered by the RCF with R=400 and the 15-point quadratic Savitzky-Golay smoothing filter (predicted versus actual soybean oil, g/mL). Plot annotation: calibration RMSE 0.0099, R-square 0.9970; validation RMSE 0.0322, R-square 0.9690.

Moving windows of 5 to 19 points were used. The results of this step are shown in Table 4. The 15-point quadratic Savitzky-Golay smoothing filter has the best predictive performance, as shown in Fig. 8. Generally, a 15-point quadratic filter is the default in commercial programs (e.g. PLS Toolbox, Eigenvector Research Incorporated). For spectral noise reduction, a small moving window is preferable because Raman spectral peaks are narrower than those of other spectroscopic techniques; characteristic peaks could vanish when a large moving window is selected. Comparing the PLS regression results for the raw spectra and the filtered spectra, the validation RMSE improved by 30%, from 0.0043 to 0.0030.

Based on the above result, PLS was applied to predict the soybean oil concentration using the spectra processed by the RCF with R=400 and the 15-point quadratic Savitzky-Golay smoothing filter. In Fig. 9, the R-square is 0.9690, but the soybean oil concentration is not predicted at all: the validation RMSE of 0.0322 is large relative to the soybean oil concentration range of 0.1344 to 0.2100 g/mL. As mentioned above, the R-square is not a reliable measure for assessing the accuracy of a PLS regression model. Although an extremely insoluble material exists in the mixture, the other material was predicted accurately.

4. CONCLUSION

In this paper, we have demonstrated that it is possible to predict the glucose concentration using PLS and Raman spectra. Prediction performance was improved by 30% by using proper pre-processing techniques. We chose the Savitzky-Golay smoothing filter for reducing noise and the Rolling-Circle Filter for removing the fluorescence effect. Uniformity of the materials in solution is important for predicting the concentration precisely. We found that the concentration of a uniformly mixed material can be estimated even though an insoluble material exists in the mixture.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the financial support from the Institute of Chemical Processes of Seoul National University.

REFERENCES

A. Demirbas. Use of algae as biofuel sources. Energy Conversion and Management, 51(12):2738–2749, 2010.
P. Geladi and B. R. Kowalski. Partial least-squares regression: a tutorial. Analytica Chimica Acta, 185:1–17, 1986.
Y. Y. Huang, C. M. Beal, W. W. Cai, R. S. Ruoff, and E. M. Terentjev. Micro-Raman spectroscopy of algae: composition analysis and fluorescence background behavior. Biotechnol Bioeng, 105(5):889–98, 2010.
J. A. Lopes, P. F. Costa, T. P. Alves, and J. C. Menezes. Chemometrics in bioprocess engineering: process analytical technology (PAT) applications. Chemometrics and Intelligent Laboratory Systems, 74(2):269–275, 2004.
T. M. Mata, A. A. Martins, and N. S. Caetano. Microalgae for biodiesel production and other applications: A review. Renewable and Sustainable Energy Reviews, 14(1):217–232, 2010.
I. K. Mikhailyuk and A. P. Razzhivin. Background subtraction in experimental data arrays illustrated by the example of Raman spectra and fluorescent gel electrophoresis patterns. Instruments and Experimental Techniques, 46(6):765–769, 2003.
M. Otto. Chemometrics: statistics and computer application in analytical chemistry. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2007.
O. D. Paraschuk, N. N. Brandt, O. O. Brovko, and A. Y. Chikishev. Optimization of the rolling-circle filter for Raman background subtraction. Applied Spectroscopy, 60(3):288–293, 2006.
O. Samek, A. Jonáš, Z. Pilát, P. Zemánek, L. Nedbal, J. Tříska, P. Kotas, and M. Trtílek. Raman microspectroscopy of individual algal cells: Sensing unsaturation of storage lipids in vivo. Sensors, 10(9):8635–8651, 2010.
A. Savitzky and M. J. E. Golay. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8):1627–1639, 1964.
Q. Ye, Q. Xu, Y. Yu, R. Qu, and Z. Fang. Rapid and quantitative detection of ethanol proportion in ethanol-gasoline mixtures by Raman spectroscopy. Optics Communications, 282(18):3785–3788, 2009.