Vibrational Spectroscopy 107 (2020) 103023
Rapid determination of geniposide in the extraction and concentration processes of lanqin oral solution by near-infrared spectroscopy coupled with chemometric algorithms
Yong Chen^a, Ming Chen^a, Siyu Zhang^a, Hui Ma^a, Jun Wang^b, Hongwei Lu^c, Yongjiang Wu^a,*
^a Institute of Modern Chinese Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, PR China
^b Suzhou ZeDaXingBang Pharmaceutical Co., Ltd., Suzhou 215000, PR China
^c Yangtze River Pharmaceutical Group, Taizhou 225300, PR China
ARTICLE INFO
Keywords: Near-infrared spectroscopy; Synergy interval partial least squares (siPLS); Lanqin oral solution; Geniposide; Universal supervision
ABSTRACT
Process monitoring is essential for the pharmaceutical industry to control the quality uniformity of final products. In this study, rapid quantitative methods for geniposide in Lanqin oral solution (LOS) during the extraction and concentration processes were developed using near-infrared spectroscopy (NIRs). The feasibility of using one method to monitor both processes universally was also investigated. Interval partial least squares (iPLS), synergy interval partial least squares (siPLS) and competitive adaptive reweighted sampling (CARS) were used to select characteristic variables. Single-process models for extraction and for concentration, and a multi-process model based on the combination of the two processes, were constructed and discussed. Overall, the siPLS models performed better than the others. For the extraction, concentration and combined processes, the Rc values were 0.9486, 0.9746 and 0.9980, and the RSEP values were 10.62 %, 5.41 % and 6.17 %, respectively. Compared with the process-specific models, the multi-process model also showed high accuracy and stability. In addition, Pearson correlation analysis and the paired t-test were applied to verify the predictive performance of the models. The overall results indicate that NIRs combined with the siPLS algorithm is a robust method for predicting geniposide in the extraction and concentration of LOS. Furthermore, the multi-process model is feasible for the universal supervision of geniposide, which is convenient for industrial production.
1. Introduction
Lanqin oral solution (LOS) is a widely used preparation with the functions of clearing heat, removing toxins, benefiting deglutition and reducing swelling [1]. Geniposide is one of the most important components of LOS and has various biological activities, including anti-inflammatory, antioxidant and antidepressant effects [2]. At present, the quality control of LOS focuses mainly on raw materials and final products, while monitoring of the production process is usually neglected. Extraction, the initial process in the production of LOS, is crucial because its efficiency determines how completely the effective ingredients are dissolved. Concentration is also an essential unit in the manufacture of LOS; its purpose is to remove excess solvent from the intermediates of the preceding process [3]. However, under empirical quality control, the extraction process is usually terminated after a stipulated extraction time, and the concentration process is terminated when a specified density is reached. As a result, variations in ingredient contents caused by temperature, pressure and other factors cannot be monitored in time [4].

To enhance process efficiency and ensure the quality stability of final products, process analytical technology (PAT) should be introduced into the production of LOS. The PAT guidance [5], issued in 2004, provides theoretical instruction for monitoring critical quality attributes (CQAs) without delay during manufacturing. Near-infrared spectroscopy (NIRs) meets the requirements of PAT owing to its fast, non-destructive and efficient characteristics [6], and it has become a powerful process analysis tool in the pharmaceutical industry as well as in the production of traditional Chinese medicine (TCM) [7–10]. However, NIR spectra contain a large number of variables, including characteristic variables related to specific indicators and interference variables arising from the environment and noise, which makes it difficult to establish a stable and accurate calibration model directly. The high dimensionality, redundancy and collinearity of such data [11] require that NIRs be applied in conjunction with chemometric methods for quantitative analysis. Variable selection plays a significant role in retaining the variables closely related to the indicators of interest and removing irrelevant interference [12]. It not only reduces the dimension of the data set but can also improve the precision and stability of models. In this work, interval partial least squares (iPLS), synergy interval partial least squares (siPLS) and competitive adaptive reweighted sampling (CARS) were investigated to improve the predictive performance of calibration models. Before variable selection, outlier removal, sample set division and spectral pretreatment were carried out with different chemometric methods.

Many studies have achieved excellent results in process monitoring with NIRs [13–16]. However, most of them are limited to qualitative or quantitative analysis of a single process, which makes model construction and maintenance time-consuming and laborious. The concept of continuous manufacturing, proposed to improve quality control and reduce production costs [17–19], has been vigorously promoted in the pharmaceutical industry in recent years. It increases process flexibility and facilitates continuous monitoring of CQAs. In line with this trend, multi-process or even whole-process universal supervision is likely to be increasingly required, so models that can be applied across multiple processes are better suited to the quality control of continuous manufacturing.

The objective of this study is to realize at-line determination of geniposide in the extraction and concentration processes of LOS by NIRs. Process-specific models were constructed for the two processes. In addition, a multi-process model based on the spectral integration of the two processes was built and compared with the specific models. To our knowledge, studies that merge multiple processes in this way are rare. The main steps of this study were as follows: (1) outliers were eliminated by Monte Carlo cross-validation (MCCV), and sample sets were divided by the sample set partitioning based on joint x–y distance (SPXY) method; (2) raw spectra were preprocessed, and characteristic variables were selected by the iPLS, siPLS and CARS methods; (3) quantitative models for the extraction process, the concentration process and the combination of the two processes were constructed and compared; (4) the consistency of, and the difference between, reference and predicted values in the prediction sets were analyzed by Pearson correlation analysis and the paired t-test.

* Corresponding author. E-mail address: [email protected] (Y. Wu).
https://doi.org/10.1016/j.vibspec.2020.103023
Received 23 September 2019; Received in revised form 2 January 2020; Accepted 13 January 2020; Available online 16 January 2020
0924-2031/© 2020 Elsevier B.V. All rights reserved.

2. Materials and methods
2.1. Materials and reagents
Geniposide was purchased from Chengdu Must Bio-Technology Co., Ltd. (Sichuan, China). Chromatographic-grade methanol, acetonitrile and phosphoric acid were obtained from Merck (Darmstadt, Germany). Anhydrous sodium dihydrogen phosphate was of analytical grade, and water was purified with a Millipore water purification system (Massachusetts, USA).

2.2. Sample collection
The extraction and concentration processes of LOS were sampled on site at Yangtze River Pharmaceutical Group (Taizhou, China). For the extraction process, the herbs of LOS and water were mixed in a fixed proportion and boiled in a 7 m3 extractor. Extraction was repeated three times: 2 h for the first extraction and 1 h each for the second and third. To improve efficiency, the extract was circulated through a pipe from the bottom of the extractor to the top every half hour (except during the third extraction). Owing to the constraints of practical production, samples were taken at the starting point, the circulation points and the end point of each extraction, and, for some batches, at the point where the three extracts were mixed, to obtain a wide concentration range. In total, 124 samples were collected from 14 batches.
The combined extract of the three extractions was concentrated in double-effect concentrators with a capacity of 2000 L per tank. Because of the large volume of extract in on-site production, it was fed into the concentrators in several portions, controlled by liquid level sensors. Once all the extract had been added, samples were collected every 10 min. After the liquid in the second-effect tank was merged into the first-effect tank, samples were collected at 4 min intervals until the end of concentration. In total, 135 samples were obtained from 8 batches. Because the extraction and concentration samples were collected separately at different times, the numbers of batches sampled for the two processes differ.

2.3. NIR spectra acquisition
The NIR spectra of all samples were collected with a Matrix-F spectrometer (Bruker Optik GmbH, Germany) equipped with a transmission probe. Spectra were acquired at room temperature (28 °C) with air as the reference over the range 12 000–4 000 cm−1, at a resolution of 8 cm−1 with 32 scans. To ensure spectral accuracy, each sample was scanned three times and the averaged spectrum was used for data analysis.

2.4. Determination of reference values
Extraction samples were diluted with methanol at a ratio of 1:10 and concentration samples at 1:50. The diluted samples were filtered through a 0.45 μm membrane and analyzed on an Agilent 1200 HPLC system with a Luna® C18 column (250 × 4.6 mm, 5 μm). The mobile phase consisted of 0.017 mol/L disodium hydrogen phosphate solution (adjusted to pH 3 with phosphoric acid, A) and acetonitrile (B), with gradient elution: 0–3 min, 2 % B; 3–13 min, 2 %–20 % B; 13–35 min, 20 %–45 % B. The flow rate was 1.0 mL/min and the injection volume was 10 μL. The column temperature was 30 °C and the detection wavelength was 238 nm. The HPLC reference method was validated for linearity, precision, repeatability, stability and recovery, which showed that it could measure the content of geniposide accurately.

2.5. Data analysis
2.5.1. Variable selection methods
Variable selection can effectively screen characteristic variables related to specified indicators. iPLS is a wavelength interval selection method proposed by Norgaard et al. [20]. It divides the full spectrum into several equal subintervals and develops a partial least squares regression (PLSR) model on each subinterval. The root mean square error of cross-validation (RMSECV) is used as the criterion, and the subinterval with the lowest RMSECV is chosen. The siPLS algorithm builds on iPLS [21] and selects the optimal combination of two, three or four subintervals to construct a PLSR model. As in iPLS, the best combination of intervals is the one with the minimum RMSECV.
CARS screens the best subset of key wavenumbers by mimicking the "survival of the fittest" principle of Darwin's theory of evolution [22]. In CARS, the importance of each wavenumber is evaluated by the absolute value of its PLS regression coefficient. The procedure is iterative and competitive: N subsets are obtained by Monte Carlo sampling, and in each run a fixed ratio of the samples (e.g. 75 %) is used to build a calibration model. Then, based on the regression coefficients, an exponentially decreasing function (EDF), which enforces wavenumber removal, and adaptive reweighted sampling (ARS), which selects wavenumbers competitively, are used to select the key variables; these two steps act as a fast selection stage and a refined selection stage. Finally, the subset with the smallest RMSECV is chosen as the optimal one.

2.5.2. Regression model
As a classical multivariate linear regression method, PLSR has been applied to the quantification of many substances with satisfactory results [23–25]. An appropriate number of latent variables (LVs) can markedly reduce the dimensionality of multivariate data, which is conducive to establishing a stable PLSR model [26]. In this study, leave-one-out cross-validation was used to determine the number of LVs as the one giving the minimum RMSECV.

Fig. 1. Averaged NIR spectra of samples from the extraction process (green line) and the concentration process (blue line).
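The two-stage CARS selection described in Sections 2.5.1 and 3.3.3 can be sketched with NumPy. The exponentially decreasing function keeps a fraction r_i = a·exp(−k·i) of the p variables, with r_1 = 1 and r_N = 2/p; this is our reading of the original CARS formulation [22]. The 100 runs and the 75 % sampling ratio follow the text. The 5-fold RMSECV, the fixed number of LVs, the seed and the function names are assumptions.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """Regression coefficients of a NIPALS PLS1 model on mean-centred data."""
    Xr, yr = X - X.mean(axis=0), y - y.mean()
    n_lv = min(n_lv, Xr.shape[1], Xr.shape[0] - 1)
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        p, q = Xr.T @ t / (t @ t), (yr @ t) / (t @ t)
        Xr, yr = Xr - np.outer(t, p), yr - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q))


def kfold_rmsecv(X, y, n_lv, folds=5):
    """k-fold RMSECV; the shuffle is fixed so all subsets share the same folds."""
    idx = np.random.default_rng(0).permutation(len(y))
    press = 0.0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        b = pls1_coef(X[train], y[train], n_lv)
        pred = (X[test] - X[train].mean(axis=0)) @ b + y[train].mean()
        press += np.sum((y[test] - pred) ** 2)
    return float(np.sqrt(press / len(y)))


def cars(X, y, n_runs=100, ratio=0.75, n_lv=5, seed=0):
    """Return (selected variable indices, RMSECV) of the best CARS subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    a = (p / 2) ** (1 / (n_runs - 1))        # EDF constants: r_1 = 1, r_N = 2/p
    k = np.log(p / 2) / (n_runs - 1)
    kept = np.arange(p)
    best = (np.inf, kept)
    for i in range(1, n_runs + 1):
        # Monte Carlo sampling of a fixed fraction of the samples
        rows = rng.choice(n, int(ratio * n), replace=False)
        weights = np.zeros(p)
        weights[kept] = np.abs(pls1_coef(X[np.ix_(rows, kept)], y[rows], n_lv))
        weights /= weights.sum()
        n_keep = max(2, int(round(a * np.exp(-k * i) * p)))
        # EDF (fast, forced selection): keep the n_keep largest |coefficients|
        top = np.argsort(weights)[::-1][:n_keep]
        # ARS (refined, competitive selection): weighted sampling with replacement
        prob = weights[top] / weights[top].sum()
        kept = np.unique(rng.choice(top, n_keep, replace=True, p=prob))
        err = kfold_rmsecv(X[:, kept], y, n_lv)
        if err < best[0]:
            best = (err, kept)
    return best[1], best[0]
```

As a usage example, `cars(X, y)` returns the indices of the subset with the smallest RMSECV over the 100 runs. The number of indices it returns corresponds to the variable counts reported in Section 3.3.3.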
2.5.3. Statistical analysis methods
In this study, Pearson correlation analysis and the paired t-test were used to evaluate the predictive performance of the calibration models. Both are commonly used statistical methods for assessing the consistency of, and the difference between, two sets of data: the former is judged by the Pearson correlation coefficient (PCC) and the latter by the p-value. Generally, a PCC greater than 0.7 indicates a strong correlation between two sets of data [27]. Likewise, if the p-value is greater than the accepted significance level (e.g. α = 0.05) for the given degrees of freedom, there is no significant difference between the paired data; otherwise, the difference is significant [28].
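The authors used SPSS for these two checks. As an illustration, the PCC and a two-sided paired t-test can also be computed in plain Python. The p-value is obtained from the classical finite series for the Student's t distribution with an integer number of degrees of freedom. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t for an integer number of degrees of freedom."""
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    if df % 2 == 1:
        # odd df: P(|T|<t) = (2/pi)(theta + sin*cos*(1 + 2/3 c2 + 2*4/(3*5) c2^2 + ...))
        series = 0.0
        if df > 1:
            series, term = 1.0, 1.0
            for k in range(1, (df - 3) // 2 + 1):
                term *= (2 * k) / (2 * k + 1) * c2
                series += term
        inside = (2 / math.pi) * (theta + math.sin(theta) * math.cos(theta) * series)
    else:
        # even df: P(|T|<t) = sin * (1 + 1/2 c2 + 1*3/(2*4) c2^2 + ...)
        series, term = 1.0, 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c2
            series += term
        inside = math.sin(theta) * series
    return max(0.0, 1.0 - inside)


def paired_t_test(reference, predicted):
    """Paired t statistic and two-sided p-value for reference vs predicted values."""
    d = [a - b for a, b in zip(reference, predicted)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))   # undefined if all differences are equal
    return t, t_two_sided_p(t, n - 1)
```

As a usage example, call `pearson_r(ref, pred)` and `paired_t_test(ref, pred)` on the reference and predicted values of a prediction set. A PCC above 0.7 and p > 0.05 then match the acceptance criteria stated above.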
2.6. Evaluation of model performance
Model performance was evaluated using the correlation coefficients of calibration (Rc) and prediction (Rp), the root mean square errors of cross-validation (RMSECV), calibration (RMSEC) and prediction (RMSEP), and the relative standard errors of calibration (RSEC) and prediction (RSEP). RMSECV was used to determine the number of LVs. Rc, RMSEC and RSEC were used to assess the calibration models, and the remaining indices reflected predictive performance on the prediction sets. Generally, the closer Rc and Rp are to 1, the better the model. RMSEC and RMSEP should be small and close to each other. The smaller RSEC and RSEP are, the more accurate and robust the model is [29], which is conducive to practical application. The relevant equations are as follows:

$$R_{C/P} = \sqrt{1 - \frac{\sum_{i=1}^{n}\left(y_{i,\mathrm{actual}} - y_{i,\mathrm{predicted}}\right)^2}{\sum_{i=1}^{n}\left(y_{i,\mathrm{actual}} - \bar{y}_{\mathrm{actual}}\right)^2}} \qquad (1)$$

$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_{i,\mathrm{actual}} - y_{i,\mathrm{predicted}}\right)^2}{n}} \qquad (2)$$

$$\mathrm{RMSEC/P} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_{i,\mathrm{actual}} - y_{i,\mathrm{predicted}}\right)^2}{n}} \qquad (3)$$

$$\mathrm{RSEC/P} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_{i,\mathrm{actual}} - y_{i,\mathrm{predicted}}\right)^2}{\sum_{i=1}^{n}y_{i,\mathrm{actual}}^2}} \qquad (4)$$

In these equations, $y_{i,\mathrm{actual}}$ and $y_{i,\mathrm{predicted}}$ are the reference and predicted values of the ith sample in the calibration or prediction set, respectively. $\bar{y}_{\mathrm{actual}}$ is the mean of all reference values in that set, and n is the number of samples in the set. For RMSECV, $y_{i,\mathrm{predicted}}$ is the cross-validated prediction of sample i.

2.7. Software
Spectra were acquired with OPUS software (v.7.0, Bruker). All algorithms were implemented in MATLAB (v.2017a, MathWorks, Natick, USA). SPSS 19.0 and ORIGIN 8.0 were used for statistical analysis and graphical representation, respectively.

3. Results and discussion
3.1. NIR spectral analysis and outlier identification
NIR spectra arise mainly from the overtones and combinations of the fundamental vibrations of hydrogen-containing groups (X–H) [30]. Geniposide is a major iridoid glycoside [31], so overtone and combination absorptions of C–H, O–H and C=O groups are expected in the near-infrared region. Fig. 1 shows the averaged spectra of the extracted solution (green line) and the concentrated solution (blue line). The overall spectral trends of the two processes were similar. However, the higher concentration and density of the concentration-process samples caused some light scattering, which shifted the baseline of the spectra in the 12000−8000 cm−1 range. The strong absorptions at 5550−4550 cm−1 and 7500−6650 cm−1 are generally regarded as "water regions" [32], where absorption saturation occurred. Considerable noise was also observed around 5000 cm−1 and 4000 cm−1. The intense absorption of the aqueous solution and the overlap of bands masked other bands, which made the analysis more difficult. Chemometric methods were therefore needed to identify characteristic absorptions, eliminate the effects of noise and carry out quantitative analysis.
Abnormal samples caused by instrument errors, environmental changes or sample anomalies would degrade the prediction precision of the models. In this research, MCCV was used to eliminate outliers whose mean prediction residual deviated significantly from that of the bulk of the samples [33]. In each Monte Carlo run, the samples were randomly partitioned into calibration and prediction sets at a ratio of 75:25 to build a PLSR model, and this procedure was repeated 2500 times. The mean and variance of the prediction residuals of each sample were then calculated. Finally, based on the distribution of these values, two sets of limits were applied. In the extraction data, samples with a mean greater than 1.4 and a variance greater than 0.6 were removed. In the concentration data, samples with a mean greater than 6 and a variance greater than 1.7 were removed. As a result, 5 and 3 abnormal samples were removed from the extraction and concentration samples, respectively, leaving 119 and 132 samples.

3.2. Sample sets division and spectral pretreatment
After the removal of outliers, the samples were divided into calibration and prediction sets by the SPXY method at a ratio of 7:3. The calibration set was used to build the calibration model, and the prediction set was used to validate its predictive performance independently. The SPXY method takes both the x and y variables into account when calculating the distance between samples [34]. The calibration set therefore covers the multi-dimensional space of the sample data, which effectively improves its representativeness. The results of the division for the three data sets are shown in Table 1. For each data set, the concentration range of the prediction set lies within that of the calibration set, which indicates that the samples were partitioned properly.

Table 1
The division results of calibration set and prediction set for three data sets.
Data sets | Sample sets | S.N.^a | Range (mg/mL) | Mean (mg/mL) | S.D.^b (mg/mL)
Extraction | Calibration set | 83 | 1.01–3.29 | 2.01 | 0.48
Extraction | Prediction set | 36 | 1.04–2.79 | 1.79 | 0.50
Concentration | Calibration set | 92 | 11.07–28.38 | 20.18 | 4.35
Concentration | Prediction set | 40 | 13.88–27.85 | 19.46 | 3.25
Combination | Calibration set | 175 | 1.01–28.38 | 11.56 | 9.61
Combination | Prediction set | 76 | 1.04–27.85 | 11.08 | 9.22
^a Sample numbers. ^b Standard deviation.

To attenuate instrumental noise and other effects, the raw spectra were preprocessed with several pretreatment methods: autoscaling, normalization, standard normal variate (SNV) transformation, multiplicative scatter correction (MSC), Savitzky-Golay (SG) smoothing, and SG smoothing with first derivative. After comparison, the raw spectra (no pretreatment), SG smoothing and autoscaling were chosen for the extraction, concentration and combination models, respectively. The results are presented in the Supplementary material (Table S1). The combined spectra processed by autoscaling are shown in Fig. 2. Autoscaling gives all wavenumbers the same weight by subtracting the mean spectrum and then dividing by the standard deviation spectrum. As Fig. 2 shows, autoscaling worked as a data-enhancement step: it effectively enhanced characteristic spectral information, which benefits the selection of feature variables.
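Equations (1), (3) and (4) in Section 2.6 translate directly into code. The sketch below is plain Python. The function name and the expression of RSE as a percentage (to match the RSEC/RSEP columns of Table 2) are assumptions. Equation (1) has a negative value under the square root when a model predicts worse than the mean; this sketch clips that case to zero, which is also an assumption.

```python
import math


def model_metrics(y_actual, y_predicted):
    """R (Eq. 1), RMSE (Eq. 3) and RSE in percent (Eq. 4) for one data set."""
    n = len(y_actual)
    y_bar = sum(y_actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted))
    sst = sum((a - y_bar) ** 2 for a in y_actual)
    ss_y = sum(a ** 2 for a in y_actual)
    r = math.sqrt(max(0.0, 1 - sse / sst))   # clipped at 0 (assumption)
    rmse = math.sqrt(sse / n)
    rse_percent = 100 * math.sqrt(sse / ss_y)
    return r, rmse, rse_percent
```

Because the sum of y² equals n(ȳ² + s²), with s the population standard deviation, RSE is simply RMSE divided by √(ȳ² + s²). For the extraction calibration set in Tables 1 and 2, 0.2487/√(2.01² + 0.48²) ≈ 0.120, consistent with the reported RSEC of 12.02 %.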
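The MCCV outlier screen in Section 3.1 can be sketched with NumPy. The 75:25 split and the 2500 runs follow the text. The text does not give the following details, so they are assumptions: the NIPALS PLS1 helper, the number of LVs, testing the absolute value of the mean residual, and the function names.

```python
import numpy as np


def pls1_predictor(X, y, n_lv):
    """Fit a NIPALS PLS1 model and return a predict(X_new) function."""
    xm, ym = X.mean(axis=0), y.mean()
    Xr, yr = X - xm, y - ym
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        p, q = Xr.T @ t / (t @ t), (yr @ t) / (t @ t)
        Xr, yr = Xr - np.outer(t, p), yr - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return lambda X_new: (X_new - xm) @ b + ym


def mccv_residuals(X, y, n_lv=3, n_runs=2500, cal_ratio=0.75, seed=0):
    """Mean and variance of each sample's prediction residual over MCCV runs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_cal = int(round(cal_ratio * n))
    res = [[] for _ in range(n)]
    for _ in range(n_runs):
        perm = rng.permutation(n)                 # random 75:25 partition
        cal, val = perm[:n_cal], perm[n_cal:]
        predict = pls1_predictor(X[cal], y[cal], n_lv)
        for i, r in zip(val, y[val] - predict(X[val])):
            res[i].append(r)
    mean = np.array([np.mean(r) if r else np.nan for r in res])
    var = np.array([np.var(r) if r else np.nan for r in res])
    return mean, var


def flag_outliers(mean, var, mean_limit, var_limit):
    """Indices whose |mean residual| and residual variance both exceed the limits."""
    return np.flatnonzero((np.abs(mean) > mean_limit) & (var > var_limit))
```

For example, with the extraction-process limits from Section 3.1 the call would be `flag_outliers(mean, var, 1.4, 0.6)`.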
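The SPXY split and the autoscaling step described in Section 3.2 can be sketched with NumPy. Here SPXY is implemented as Kennard–Stone selection on the sum of the x-distance and the y-distance, each divided by its maximum; this is our reading of [34]. Computing the scaling statistics on the calibration set only is an assumption, as are the function names.

```python
import numpy as np


def spxy_split(X, y, n_cal):
    """SPXY: Kennard-Stone selection on the joint, max-normalised x-y distance."""
    sq = np.sum(X ** 2, axis=1)
    # Euclidean x-distances via the Gram matrix (avoids an n x n x p array)
    dx = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    dy = np.abs(y[:, None] - y[None, :])
    d = dx / dx.max() + dy / dy.max()
    first, second = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first), int(second)]          # the two most distant samples
    remaining = [i for i in range(len(y)) if i not in selected]
    while len(selected) < n_cal:
        # add the sample whose nearest selected neighbour is farthest away
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)


def autoscale(X_cal, X_other):
    """Mean-centre each wavenumber and divide by its standard deviation."""
    mu = X_cal.mean(axis=0)
    sd = X_cal.std(axis=0, ddof=1)
    return (X_cal - mu) / sd, (X_other - mu) / sd
```

For the extraction data set, `n_cal` would be 83 of the 119 samples, which is about 7:3, as in Table 1.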
Fig. 2. Combined spectra processed by autoscaling.

3.3. Characteristic variable selection
3.3.1. iPLS
To construct the iPLS models, the full spectra were first divided into 4, 8, 12, 16, 20, 24 and 28 intervals. For the combination data set, RMSECV was lowest with 8 subintervals. Subinterval numbers from 7 to 17 were then screened in sequence. Taking the RMSECV, R and RMSE values into consideration, the best model used 10 subintervals and the third interval, i.e., the wavenumber range 6401.6−5601.1 cm−1, shown in grey in region B of Fig. 3.

Fig. 3. The results of preferred intervals by iPLS method (region A: 4696−4348 cm−1; region B: 6401−5601 cm−1).

Following the same procedure as for the combination data set, the selected iPLS model for the extraction process was built on the range 4696−4348 cm−1, marked as region A in Fig. 3. The model for the concentration process used the same wavenumber range as the combination data set, 6401−5601 cm−1. The intervals selected by iPLS contained no noise regions. The combination band of C–H stretching and C–H bending in methyl and methylene groups lies in the 4650−4300 cm−1 range [35,36]. The 6200−5450 cm−1 region corresponds to the first overtones of C–H stretching vibrations [13,37], and the absorption near 4545 cm−1 is related to the combination of C–H stretching and C=O stretching [38].

3.3.2. siPLS
siPLS models are built on combinations of multiple intervals, so they can screen related variables more comprehensively. The full spectra of the combination data set were divided into 5, 10, 15, 20, 25, 30 and 35 subintervals, and RMSECV was smallest with 20 subintervals. The number of subintervals was then examined in detail from 15 to 25. The best model used 20 subintervals and the combination of the 5th, 6th, 9th and 10th intervals. This corresponds to the ranges 6402−5604 cm−1 and 8006.9−7208 cm−1, marked as regions B and C in Fig. 4. Similarly, the best siPLS model for the extraction data set combined the four synergy intervals [2, 6, 7, 11] out of 25 subintervals. The selected ranges were 4640−4320 cm−1, 6241−5601 cm−1 and 7522−7202 cm−1. For the concentration process, the best siPLS model combined the ranges 5868−5601 cm−1 and 6401−6135 cm−1. As with iPLS, the intervals selected by siPLS excluded noise. They mainly covered the C–H first combination band (4640−4320 cm−1) and the C–H stretching first overtone (6241−5601 cm−1) of −CH2, −CH3 and −CH=CH− groups [39]. In addition, the second combination band of C–H in hydrocarbons lies at about 7691−7263 cm−1. These results show that the two interval selection methods, iPLS and siPLS, selected generally consistent wavenumber ranges. Most importantly, both eliminated noise and retained the characteristic variables, which benefits model optimization.

Fig. 4. The results of preferred intervals by siPLS method (region A: 4640−4320 cm−1; region B: 6402−5604 cm−1; region C: 8006.9−7208 cm−1).

3.3.3. CARS
The CARS method was applied to search for the optimal variable subset, with the number of Monte Carlo sampling runs set to 100. For the combination data set, the Supplementary material (Fig. S1) shows how three quantities change as the number of sampling runs increases: the number of sampled variables (plot A), the RMSECV (plot B) and the regression coefficient of each variable (plot C). In Fig. S1(A), the number of sampled variables first drops sharply and then levels off. This is because irrelevant wavenumbers are quickly eliminated in the fast-selection stage, whereas fewer variables are removed in the refined-selection stage. As unrelated variables are removed, RMSECV first decreases and then increases once some key variables are also eliminated, as shown in Fig. S1(B). The asterisk line in Fig. S1(C) marks the best variable subset, which has the smallest RMSECV. Finally, 82, 710 and 38 variables were selected for the extraction, concentration and combination data sets, respectively. Fig. 5 illustrates the variable selection results for the combination data set. The variables screened by CARS were mainly located near 7000 cm−1 and 5400 cm−1 and in the 4500−4100 cm−1 range. These reflect the first overtone and combination bands of O–H and the first combination band of C–H, respectively. However, CARS selects individual wavenumber points and may be insensitive to noise. As a result, the selected variables included noise (around 4100 cm−1), which may adversely affect model optimization.

3.4. Multivariate regression model
The results of the iPLS, siPLS and CARS-PLS models optimized for the extraction, concentration and combination data sets are summarized in Table 2. Compared with the Full-PLS models, variable selection generally increased Rc and reduced RMSECV, RMSEC and RSEC. This shows that variable selection is vital for enhancing model precision and robustness. For all three data sets, the CARS-PLS models were generally less effective than the iPLS and siPLS models. This is probably because CARS identified the characteristic variables insufficiently and its selected variables included noise. Overall, the siPLS models in Table 2 were better than the others at predicting geniposide content for all three data sets. For example, compared with Full-PLS, the siPLS model for extraction increased Rc from 0.8551 to 0.9486 and decreased RSEC from 12.02 % to 7.34 %. RMSEC and RMSEP were also reduced, by about 39 % and 49 %, respectively.
The models were also validated on the prediction sets. For extraction, concentration and combination, the siPLS models gave Rp values of 0.9173, 0.9431 and 0.9953 and RSEP values of 10.62 %, 5.41 % and 6.17 %, respectively. These results were generally better than those of the other models; for the combination data set, CARS-PLS gave the same Rp and a slightly lower RSEP. The results show that the optimized siPLS models can rapidly determine geniposide during the production process. Fig. 6 shows the correlation between the reference and predicted values of geniposide for the siPLS models of the three data sets. In every case, the calibration and prediction samples are evenly distributed around the regression line. For the combination data set, the siPLS model had Rc and Rp values of 0.9980 and 0.9953, respectively, which indicates a high regression correlation. The relative standard errors of both its calibration and prediction sets were also small. These results indicate that the multi-process model is precise and reliable enough to meet the demands of multi-process monitoring in continuous manufacturing. Through spectral fusion, the multi-process model contains more comprehensive information than either single-process model, which makes it more robust. Its wider concentration range also contributes to higher stability and broader applicability. Replacing the specific models with the multi-process model reduces the workload of model construction and avoids the complexity of switching between models in application. These characteristics make the multi-process model more suitable for universal supervision in practical applications.

3.5. The results of statistical analysis methods
The predictive performance of the siPLS models was also evaluated by the PCCs and p-values between the reference and predicted values in the prediction sets, as shown in Table 3. The PCCs of projects a, b and c are all greater than 0.9, and the p-values are all above 0.05. This indicates a strong correlation and no significant difference between the two groups of data. Therefore, the siPLS models optimized for the extraction, concentration and combination data sets can predict geniposide content accurately. These results further confirm the precision and robustness of the multi-process model.
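The interval numbers and wavenumber ranges reported in Section 3.3 can be cross-checked with a simple calculation. This assumes that the 12 000–4 000 cm−1 spectrum is split into equal-width intervals numbered from the low-wavenumber end. That numbering is an inference and is not stated in the text. The function name is hypothetical.

```python
def interval_ranges(n_intervals, low=4000.0, high=12000.0):
    """Hypothetical mapping: 1-based interval number -> (start, end) in cm-1."""
    width = (high - low) / n_intervals
    return {k: (low + (k - 1) * width, low + k * width) for k in range(1, n_intervals + 1)}


# Interval choices reported for iPLS (combination, 10 intervals) and
# siPLS (combination, 20 intervals; extraction, 25 intervals).
for n, picked in [(10, [3]), (20, [5, 6, 9, 10]), (25, [2, 6, 7, 11])]:
    ranges = interval_ranges(n)
    print(n, [ranges[k] for k in picked])
```

Under this assumption, intervals 5, 6, 9 and 10 of 20 cover 5600–6400 and 7200–8000 cm−1. Intervals 2, 6, 7 and 11 of 25 cover 4320–4640, 5600–6240 and 7200–7520 cm−1. Both sets agree with the ranges reported for the combination and extraction siPLS models.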
4. Conclusions
In this study, rapid determination of geniposide in the production process of LOS was achieved by NIRs coupled with chemometrics. Quantitative models for extraction, concentration and the combination of the two processes were established and discussed. Models based on NIRs combined with siPLS performed better than the others and quantified geniposide accurately, as confirmed by the performance indices and the statistical tests. Moreover, the multi-process model performed satisfactorily in the determination of geniposide. It not only improves the applicability and robustness of the models but also reduces the difficulty of model maintenance, which facilitates production applications. Multi-process or whole-process models would benefit universal monitoring and could play an important role in promoting continuous manufacturing in the pharmaceutical industry.
CRediT authorship contribution statement
Yong Chen: Conceptualization, Supervision. Ming Chen: Methodology, Investigation, Writing - original draft. Siyu Zhang: Software. Hui Ma: Data curation. Jun Wang: Project administration.
Fig. 5. The optimal subset selected by CARS for the combination data set.
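Table 2
Results of different PLS models for three data sets. RMSE values are in mg/mL. Rc, RMSEC and RSEC refer to the calibration set; Rp, RMSEP and RSEP refer to the prediction set.
Data sets | Methods | Variables | LVs | RMSECV | Rc | RMSEC | RSEC (%) | Rp | RMSEP | RSEP (%)
Extraction | Full-PLS | 2074 | 5 | 0.4225 | 0.8551 | 0.2487 | 12.02 | 0.6262 | 0.3868 | 20.80
Extraction | iPLS | 91 | 6 | 0.3056 | 0.8748 | 0.2324 | 11.23 | 0.8740 | 0.2411 | 12.96
Extraction | siPLS | 332 | 11 | 0.2520 | 0.9486 | 0.1518 | 7.34 | 0.9173 | 0.1976 | 10.62
Extraction | CARS | 82 | 16 | 0.2798 | 0.9122 | 0.1966 | 9.50 | 0.8708 | 0.2439 | 13.11
Concentration | Full-PLS | 2074 | 4 | 1.4555 | 0.9578 | 1.2443 | 6.03 | 0.9107 | 1.3247 | 6.72
Concentration | iPLS | 208 | 8 | 1.2974 | 0.9759 | 0.9455 | 4.58 | 0.9371 | 1.1195 | 5.68
Concentration | siPLS | 138 | 7 | 1.2430 | 0.9746 | 0.9701 | 4.70 | 0.9431 | 1.0668 | 5.41
Concentration | CARS | 710 | 4 | 1.4757 | 0.9596 | 1.2183 | 5.90 | 0.9076 | 1.3469 | 6.83
Combination | Full-PLS | 2074 | 5 | 1.1537 | 0.9940 | 1.0490 | 6.99 | 0.9934 | 1.0515 | 7.31
Combination | iPLS | 208 | 11 | 0.9973 | 0.9971 | 0.7249 | 4.83 | 0.9947 | 0.9441 | 6.57
Combination | siPLS | 416 | 14 | 0.9836 | 0.9980 | 0.6110 | 4.07 | 0.9953 | 0.8868 | 6.17
Combination | CARS | 38 | 12 | 1.0221 | 0.9954 | 0.9175 | 6.11 | 0.9953 | 0.8836 | 6.14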
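The relative changes quoted in Section 3.4 can be recomputed from the extraction rows of Table 2. The snippet below is a plain arithmetic check, and the function name is hypothetical.

```python
def relative_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


# Extraction data set: Full-PLS vs siPLS (values taken from Table 2)
full_pls = {"RMSEC": 0.2487, "RSEC": 12.02, "RMSEP": 0.3868, "RSEP": 20.80}
si_pls = {"RMSEC": 0.1518, "RSEC": 7.34, "RMSEP": 0.1976, "RSEP": 10.62}
for key in full_pls:
    print(f"{key}: {relative_reduction(full_pls[key], si_pls[key]):.1f} %")
```

The output is about 39.0 % for RMSEC, 38.9 % for RSEC and 48.9 % for both RMSEP and RSEP. The reduction of roughly 49 % therefore applies to the prediction-set errors, while the calibration errors fall by about 39 %.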
Fig. 6. Correlation between reference values and predicted values of geniposide in siPLS models (A. extraction samples; B. concentration samples; C. combination samples).
Hongwei Lu: Resources. Yongjiang Wu: Writing - review & editing, Supervision.
Table 3. Results of the Pearson correlation analysis and paired t-test for the different projects.

| Project | Number of samples | PCC | p-value (α = 0.05) |
|---|---|---|---|
| a | 36 | 0.927 | 0.185 |
| b | 40 | 0.949 | 0.865 |
| c | 76 | 0.995 | 0.311 |
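In Table 3 all Pearson correlation coefficients (PCCs) are at least 0.927, and all paired t-test p-values (0.185, 0.865 and 0.311) exceed α = 0.05, so no significant difference between reference and predicted geniposide values was detected. A minimal sketch of both tests with SciPy (`scipy.stats.pearsonr` and `scipy.stats.ttest_rel`) follows; the function name and the input values are hypothetical illustrations, not data from this study.

```python
import numpy as np
from scipy import stats


def agreement_tests(reference, predicted, alpha=0.05):
    """Pearson correlation and paired t-test between reference and predicted values."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pcc, _ = stats.pearsonr(reference, predicted)           # linear agreement
    t_stat, p_value = stats.ttest_rel(reference, predicted)  # paired differences
    return {
        "n": int(reference.size),
        "PCC": float(pcc),
        "t": float(t_stat),
        "p": float(p_value),
        # p > alpha: no significant systematic difference at this level
        "no_significant_difference": bool(p_value > alpha),
    }


result = agreement_tests([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0])
```

The two tests are complementary: a high PCC shows that predictions track the reference values, while the paired t-test checks that there is no systematic offset between them, which a correlation coefficient alone cannot detect.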
Declaration of Competing Interest
The authors declare that no conflict of interest exists in this work, and the manuscript has been approved by all authors for publication.
Acknowledgments
This work was supported by the National Major Scientific and Technological Special Project for “Significant New Drugs Development” (2018ZX09201010). The authors also appreciate the cooperation and support of Yangtze River Pharmaceutical Group.
Appendix A. Supplementary data
Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.vibspec.2020.103023.