Infrared Physics and Technology 98 (2019) 297–304
Infrared Physics & Technology journal homepage: www.elsevier.com/locate/infrared
Regular article
Nondestructive measurement of soluble solids content in apple using near infrared hyperspectral imaging coupled with wavelength selection algorithm
Dongyan Zhang a, Yunfei Xu a,b, Wenqian Huang b,c,d, Xi Tian b,c,d, Yu Xia b, Lu Xu a, Shuxiang Fan b,c,d,⁎
a National Engineering Research Center for Agro-Ecological Big Data Analysis & Application, Anhui University, Hefei 230601, China
b Beijing Research Center of Intelligent Equipment for Agriculture, Beijing 100097, China
c National Research Center of Intelligent Equipment for Agriculture, Beijing 100097, China
d Key Laboratory of Agri-informatics, Ministry of Agriculture, Beijing 100097, China
ARTICLE INFO
Keywords: Near infrared hyperspectral imaging; Apple; Soluble solids content; Variable selection
ABSTRACT
Hyperspectral imaging is a promising technique for nondestructive sensing of multiple quality attributes of apple fruit. This research evaluated and compared different mathematical models for extracting effective wavelengths for the measurement of apple soluble solids content (SSC) based on near infrared (NIR) hyperspectral imaging over the spectral region of 1000–2500 nm. A total of 160 samples were prepared for the calibration (n = 120) and prediction (n = 40) sets. Competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA), random frog (RF), and the combined CARS-SPA and CARS-RF algorithms were used to extract effective wavelengths from the hyperspectral images of apples. Based on the selected effective wavelengths, different models were built and compared for predicting the SSC of apple using partial least squares (PLS) and least squares support vector regression (LS-SVR). Among all the models, those based on the ten effective wavelengths selected by CARS-SPA achieved the best results, with Rp and RMSEP of 0.907 and 0.479 °Brix for PLS and 0.917 and 0.453 °Brix for LS-SVR, respectively. The overall results indicated that CARS-SPA can be used for selecting effective wavelengths from hyperspectral data, and that both PLS and LS-SVR can be applied to develop calibration models to predict apple SSC. Furthermore, the wavelengths selected by the CARS-SPA algorithm have great potential for online detection of apple SSC.
1. Introduction
Apple is a widely consumed fruit rich in sugar, vitamins, anthocyanins, minerals, and other nutrients. Soluble solids content (SSC) is an important indicator of the taste and quality of apple fruit [1]. The fresh fruit market is becoming increasingly demanding with regard to product quality. Today, one of the main aims of the fruit industry is to provide the consumer with products meeting high internal quality standards, rather than fruit that looks mouthwatering but tastes insipid [2]. SSC is one of the most crucial internal quality attributes of fruit and can provide valuable information for commercial decision-making [3]. Standard methods for SSC measurement are mostly destructive, inefficient, and time-consuming, and can only be applied to small groups of samples [4]. Hence, nondestructive sensing of fruit SSC is of great value. Significant advances have been made in near-infrared (NIR) spectroscopy for quality assessment of fresh fruit, especially for flavor-related quality attributes such as SSC, hardness, maturity, and moisture content [5]. However, NIR spectroscopy is a single-point measurement, and the acquired spectral information is so limited that it cannot represent a large area of a fruit [6]. Hyperspectral imaging (HSI) integrates image and spectral information and is an important development trend in nondestructive testing [7]. HSI systems usually cover one of two spectral ranges, namely the visible and near-infrared range (Vis-NIR, 400–1000 nm) and the near-infrared range (NIR, 1000–2500 nm) [2]. HSI has been widely used in the quality inspection of agricultural products and has developed rapidly in recent years [8]. In particular, a number of studies have been performed to measure the SSC of apples.
⁎ Corresponding author at: Beijing Research Center of Intelligent Equipment for Agriculture, Beijing 100097, China. E-mail address: [email protected] (S. Fan).
https://doi.org/10.1016/j.infrared.2019.03.026 Received 17 December 2018; Received in revised form 6 March 2019; Accepted 22 March 2019 Available online 23 March 2019 1350-4495/ © 2019 Elsevier B.V. All rights reserved.
D. Zhang, et al.
Peng et al. [5] developed an optimal model for predicting the firmness and SSC of ‘Golden Delicious’ apples over the spectral region between 450 nm and 1000 nm. Mendoza et al. [9] improved firmness and SSC prediction for ‘Golden Delicious’ (GD), ‘Jonagold’ (JG), and ‘Delicious’ (RD) apples by integrating critical spectral and image features extracted from hyperspectral scattering images over the wavelength region of 500–1000 nm. However, most previous studies on SSC prediction using HSI focused on the Vis-NIR region; few researchers have applied NIR HSI because of the lack of suitable NIR spectral imaging devices [10]. Li et al. [11] studied the firmness, SSC, and color components of plums based on hyperspectral imaging in the 600–975 nm and 865–1610 nm regions. Li et al. [12] reported the relationships between the SSC and pH of cherry fruit at different maturity stages using NIR hyperspectral imaging in the 874–1734 nm region. Since no study has yet addressed apple SSC prediction based on NIR HSI, it is necessary to investigate the potential of NIR HSI for apple SSC detection.
Acquired hyperspectral images are typically high-dimensional and contain a large amount of redundant information, which impairs the predictive performance and practical application of the spectral model [14]. Thus, it is important to reduce the dimensionality of the data and to find the wavelengths most sensitive to the SSC of apple, so that the requirements of real-time acquisition and processing of hyperspectral images can be met. Fan et al. [13] used the successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), and the combination of CARS and SPA to select effective wavelength variables; the results indicated that CARS-SPA was a fast and promising wavelength selection method for the determination of the SSC and firmness of pear. Li et al. [2] combined Monte Carlo-uninformative variable elimination (MC-UVE) and SPA to select effective variables for pear SSC prediction, and the calibration model built on the 18 effective variables selected by MC-UVE-SPA achieved the best performance. These studies suggest that variable selection methods can be combined to obtain better prediction results than when they are used alone.
In the quantitative prediction of fruit SSC, linear regression models such as partial least squares (PLS) and multiple linear regression (MLR) are often applied in spectral analysis. However, some recent studies have shown that nonlinear models have a clear advantage over linear models in the quantitative analysis of SSC in certain fruits. Zhan et al. [15] reported that a least squares support vector regression (LS-SVR) model was more suitable than a PLS model for SSC prediction of Korla fragrant pear, suggesting that there is a nonlinear relationship between fruit spectral data and internal quality. Compared with traditional NIR spectroscopy, HSI is more susceptible to non-monochromatic light, stray light, temperature, and other factors during spectral acquisition, which results in nonlinearity between the hyperspectral image data and the properties to be tested. Therefore, it is meaningful to establish a nonlinear model for quantitative prediction of SSC in fruit using hyperspectral data.
As apple is a spheroidal fruit, its shape causes the spectral energy to decrease gradually from the central region to the edge of the image. Thus, the spatial distribution of SSC cannot be obtained unless this phenomenon is addressed. Although some methods have been proposed to correct the uneven distribution of lightness on the surface of spheroidal fruits for apple bruise detection [16] and onion SSC prediction [17], there has been no effective way to obtain the internal quality distribution map using HSI.
Thus, only sliced apples have been used for mapping the SSC distribution [18]. The visualization of internal quality by HSI has so far only been applied to objects with relatively flat surfaces, such as beef [19], lamb [20], and oilseed rape leaves [21]. In this study, the equatorial region of the apple was therefore used as the region of interest (ROI), as a step that must precede visualizing the SSC distribution over the whole surface facing the camera.
The specific objectives of this study were to: (1) test whether NIR hyperspectral imaging can predict the SSC of apple well; (2) identify the effective variables that are most informative for the assessment of SSC; and (3) establish and compare PLS and LS-SVR multivariate calibration models based on the whole-range spectra and the identified wavelengths.
2. Material and methods
2.1. Sample preparation
The apples were purchased from a fruit wholesale market in Beijing. In order to cover a wide range of SSC and thereby ensure the stability of the model, a total of 160 samples without damage or scars on the surface were selected. After cleaning, the samples were kept in a laboratory environment at (20 ± 1) °C for 24 h to avoid the effect of temperature on the measurement results [22]. Before data collection, each sample was marked at the equator for the measurement of the spectra and the reference SSC values. A total of 120 samples were selected randomly as the calibration set, and the remaining 40 samples were used as the prediction set.
2.2. Hyperspectral imaging system
A prototype HSI system developed in the agricultural product intelligent detection laboratory of the China National Engineering Research Center for Information Technology in Agriculture (NERCITA) was used to acquire NIR hyperspectral reflectance images of the samples. The system consists of an imaging spectrograph (ImSpector N25E, Spectral Imaging Ltd, Oulu, Finland), two 150 W halogen area light sources, a near-infrared camera with a 320 × 256 pixel array (Xeva2.5-320, Xenics Ltd, Belgium), a precision mobile platform controlled by stepper motors (EZHR17EN, AllMotion, Inc., USA), and a computer (Dell E6520, Intel Core i7-2620M @ 2.70 GHz, 4 GB RAM). The spectral range is 930–2548 nm and the spectral resolution is 8 nm. As the signal-to-noise ratio was rather low at the beginning and end of the waveband, only the range of 1000–2500 nm (with 238 variables) was selected for further analysis. The image acquisition parameters controlled by the software include motor speed, exposure time, and object distance.
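The band trimming to 1000–2500 nm and the random 120/40 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the evenly spaced band centres, the placeholder spectra, and the function names `trim_bands` and `random_split` are assumptions made only for the example.

```python
import numpy as np

def trim_bands(spectra, wavelengths, lo=1000.0, hi=2500.0):
    """Keep only the columns whose wavelength lies in [lo, hi] nm."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return spectra[:, keep], wavelengths[keep]

def random_split(n_samples, n_cal, seed=0):
    """Randomly assign sample indices to calibration and prediction sets."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return np.sort(order[:n_cal]), np.sort(order[n_cal:])

# Hypothetical band centres, evenly spaced over the stated 930-2548 nm range.
wavelengths = np.linspace(930.0, 2548.0, 256)
# Placeholder data standing in for one mean spectrum per apple.
spectra = np.random.default_rng(1).random((160, wavelengths.size))

trimmed, kept_wl = trim_bands(spectra, wavelengths)
cal_idx, pred_idx = random_split(n_samples=160, n_cal=120)
```

With real data, `wavelengths` would come from the spectrograph calibration file, so the number of retained bands would follow from the instrument rather than from the assumed spacing used here.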
Based on the system configuration, the equatorial region of each apple faced the camera, which was fixed with its lens 340 mm from the sample surface. Each apple was placed on the sample table and scanned line by line at a constant speed of 43 mm/s with an exposure time of 2 ms to create a hyperspectral image. The acquired hyperspectral images contain both the spectral information of each pixel and the image information at each wavelength. Because of non-uniform illumination and pixel-to-pixel variations in detector sensitivity, the original hyperspectral images need to be corrected [23]. Under the same system parameters as for sample image acquisition, the white reference image (R_white) was captured from a standard whiteboard with a reflectance of approximately 99%, and the dark reference image (R_dark) was then obtained with the lamps turned off and the lens completely covered by its cap [24]. The corrected images were calculated according to the formula:
$$R = \frac{R_{original} - R_{dark}}{R_{white} - R_{dark}} \qquad (1)$$
where R_original represents the original hyperspectral image and R represents the calibrated image. The calibrated image was then used for further data processing and analysis. An ROI of 100 × 100 pixels was manually extracted from the hyperspectral image of the equatorial region of each sample. Although each pixel in the ROI has its own spectrum, it is practically impossible to measure reference SSC values for every pixel within the ROI using the standard method. Therefore, the spectrum of each pixel within the ROI was first extracted, and the spectra of all pixels were then averaged at each wavelength to represent the ROI. Following this procedure, a total of 160 mean reflectance spectra were obtained.
2.3. Reference measurements
In order to assess the influence of internal fruit quality on the acquired spectra, the SSC was determined using a digital refractometer (PAL-1, ATAGO, Tokyo, Japan). After image acquisition, a piece of flesh (10 mm) with peel was cut from the marked part, manually pressed, and filtered through gauze. The apple juice was then dripped onto the prism of the refractometer, and the measured SSC value was read and recorded (Table 1). The SSC ranges of the calibration set and the prediction set were 10.1–16.6 °Brix and 10.2–15.4 °Brix, respectively. The SSC range of the calibration set covers that of the prediction set, which helps in developing a good model.
Table 1 Statistics of soluble solids content (SSC, °Brix) measured values.
| Sample set | No. of samples | Min | Max | Average |
|---|---|---|---|---|
| Total | 160 | 10.1 | 16.6 | 13.25 |
| Calibration set | 120 | 10.1 | 16.6 | 13.35 |
| Prediction set | 40 | 10.2 | 15.4 | 12.8 |
2.4. Spectral preprocessing
During hyperspectral data acquisition, disturbances such as external stray light, sample background, electronic noise, and instrument performance can cause baseline translation and rotation as well as spectral scattering [25]. In order to establish a stable mathematical model, the original spectra need to be preprocessed to eliminate or weaken the influence of non-target factors and improve the signal-to-noise ratio. Therefore, five spectral preprocessing methods, namely multiplicative scatter correction (MSC), standard normal variate (SNV) [26], Savitzky-Golay (S-G, 21-point) smoothing, the first derivative, and the second derivative [27], were applied to the raw spectral data. These preprocessing treatments were performed in The Unscrambler v9.7 (CAMO PROCESS AS, Oslo, Norway).
2.5. Effective variables selection
Because hyperspectral data have a high spectral resolution, a huge data volume, and strong correlation between adjacent variables, the full-range spectra often contain noise from environmental and instrumental sources, which leads to complex calibration models with poor predictive ability. In addition, complex calibration models developed with the full spectrum restrict the application of hyperspectral imaging for on-line purposes. To deal with these problems, multivariate methods are often used to reduce large data matrices by extracting a few key wavelengths for rapid and accurate quantitative or qualitative analysis of fruit quality [28].
CARS is a variable selection method that can effectively reduce the influence of collinear variables on the model while removing uninformative variables [29]. In the CARS variable selection process, the absolute values of the regression coefficients of a PLS model are used as an index of the importance of each variable. CARS sequentially selects N subsets of variables from N Monte Carlo sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio of samples is first randomly chosen to build a calibration model. Next, the exponentially decreasing function (EDF) and adaptive reweighted sampling (ARS) are used to select the key variables based on the regression coefficients. Finally, the most critical variables for the prediction target are selected according to the lowest root mean square error of cross-validation (RMSECV) among the subsets. In this paper, 10-fold cross-validation was used to evaluate the PLS subset models, and the number of Monte Carlo sampling runs was set to 50. The subset with the lowest RMSECV was taken as the optimal variable subset [15].
SPA is an effective method for the selection of characteristic variables in a spectrum. It uses vector projection analysis to find the group of variables with minimum redundant information, which minimizes the collinearity between variables and reduces the number of modeling variables. It can effectively reduce the complexity of model fitting and speed up the fitting operation [30]. In this study, SPA was applied both directly to the full spectra and to the feature variables selected by CARS. In SPA, each candidate variable subset is used to build an MLR model, the root mean square error of each model is calculated, and the subset with the lowest RMSEP is taken as the optimal subset. The default parameters were used for the SPA method.
The RF algorithm was first proposed for gene expression analysis of diseases [31]. It works in a similar way to reversible jump Markov chain Monte Carlo (RJMCMC): it computes a selection probability for each variable by simulating a Markov chain with a stationary distribution in the model space. The more important a variable is to the model, the greater its probability of being selected. Therefore, the selection probabilities of all variables can be sorted, and the variables with the highest probabilities are selected as the characteristic wavelengths. As the RF algorithm is based on the Monte Carlo idea, the variables selected in each run differ slightly. In order to reduce the impact of random factors, it is necessary to run the algorithm multiple times and aggregate the results [32]. In this study, the RF algorithm was run 100 times, and the average of the 100 runs was used for the final characteristic wavelength selection. All the variable selection procedures mentioned above were carried out in MATLAB 2012 (The MathWorks, Natick, USA).
2.6. Model building and evaluation
PLS is a multivariate statistical method that is insensitive to collinear variables and tolerant of large numbers of variables, and it has been frequently used for spectral analysis. It combines the advantages of principal component analysis and MLR and projects the spectral data onto a set of orthogonal factors named latent variables (LVs), whose number is chosen at the minimum value of the RMSECV [33]. The LVs explain the variance and reduce the dimensionality of the original spectra. In the development of the PLS models, the optimal number of LVs was determined by 10-fold cross-validation of the calibration samples, at the point where the RMSECV reached its minimum. Afterward, the calibration model was tested by predicting the SSC of the prediction set, which had not been used in the calibration. PLS regression reveals a linear relationship between the spectral variables (X) and the sample properties (Y) [34], and the resulting model can be expressed as:
$$Y = Xb + e \qquad (2)$$
where b and e represent the regression coefficients and prediction errors, respectively.
LS-SVR is derived from the support vector machine (SVM) and can handle both linear and nonlinear multiple regression problems. By using equality constraints, LS-SVR replaces the more complex quadratic optimization problem of a traditional SVM with a set of linear equations, thereby improving the computational efficiency and the prediction ability of the model [35]. The choice of the kernel function and its parameters directly determines the performance of the LS-SVR model. The radial basis function (RBF) kernel can reduce the computational complexity of the training process and was therefore used as the kernel function of LS-SVR in the present study. The regression error weight (γ) and the kernel function parameter (σ²) were determined by simplex optimization combined with 10-fold cross-validation. The LS-SVR model can be expressed as:
$$y(x) = \sum_{k=1}^{n} a_k K(x, x_k) + b \qquad (3)$$
where K(x, x_k) is the kernel function, x_k is the input vector, a_k is the Lagrange multiplier (also known as the support value), and b is the bias term.
The performance of the established PLS and LS-SVR models was evaluated by calculating the correlation coefficients of calibration (Rc), cross-validation (Rcv), and prediction (Rp), and the corresponding root mean square errors of calibration (RMSEC), cross-validation (RMSECV), and prediction (RMSEP). Generally, a satisfactory model should have high values of Rc, Rcv, and Rp and low values of RMSEC, RMSECV, and RMSEP, as well as small differences between them. Ideally, R should be close to one and RMSE close to zero [36]. The overall scheme of this study is shown in Fig. 1.
Fig. 1. Hyperspectral image analysis structure (spatial axes x and y, and spectral axis λ).
3. Results and discussion
3.1. Spectral analysis
NIR spectra are sensitive to the concentrations of organic materials through the response of the molecular bonds C–H, O–H, and N–H [37]. Fig. 2a shows the spectral curves of all samples in the effective region. The spectra have obvious absorption peaks at 1200, 1450, and 1960 nm. The absorption peak around 1200 nm was associated with the first overtone of the O–H band in water. The peak at about 1450 nm was attributed to the combination of the second overtone of C–H stretching and the first overtone of O–H stretching. The absorption peak at about 1960 nm was related to the O–H stretching and bending vibration [38]. Thus, it is possible to use the NIR method to determine the SSC of apple. All the spectral curves have similar trends but differ in reflectance intensity, which indicates that different fruits contain basically the same internal substances but differ in the content of individual compounds. The difference in spectral reflectance intensity provides the basis for establishing the regression model. After comparison, the second derivative was found to be better than the other preprocessing methods for SSC prediction. Fig. 2b shows the spectra after second derivative preprocessing in the wavelength range of 1000–2500 nm. Therefore, the second-derivative spectra were used for the subsequent variable selection and modeling analysis.
3.2. PLS and LS-SVR models based on full wavelengths
Fig. 2. The original reflectance spectra of all apples in the wavelength range of 1000–2500 nm (a) and the spectra after second derivative preprocessing (b).
The PLS and LS-SVR calibration models were established from the full-range spectra preprocessed by the second derivative and the corresponding SSC values. During calibration, the quantitative relationship between the reflectance spectra of the samples and their SSC values was established using the samples from the calibration set. The prediction set was then analyzed as a new test set to estimate the actual predictive capability of the established models. Table 2 shows the results of calibration, cross-validation, and prediction of the different PLS models. Selecting the ideal number of LVs is a critical step in building a robust PLS model. Too few LVs lead to poor prediction ability, while too many result in overfitting. The lowest value of RMSECV is commonly taken as the indicator of the optimal number of LVs [20]. As shown in Table 2, for the full-spectrum-PLS model the RMSECV was lowest when 15 LVs were used, with Rc and Rp of 0.976 and 0.913 and RMSEC and RMSEP of 0.268 and 0.482 °Brix, respectively.
Table 2 The prediction results of SSC of apple by different PLS models.
| Variable selection methods | No. of variables | LVs | Rcv | RMSECV | Calibration set: Rc | Calibration set: RMSEC | Prediction set: Rp | Prediction set: RMSEP |
|---|---|---|---|---|---|---|---|---|
| Full-spectrum-PLS | 238 | 15 | 0.925 | 0.474 | 0.976 | 0.268 | 0.913 | 0.482 |
| CARS-PLS | 25 | 8 | 0.951 | 0.382 | 0.960 | 0.344 | 0.913 | 0.469 |
| SPA-PLS | 17 | 6 | 0.916 | 0.496 | 0.931 | 0.452 | 0.878 | 0.545 |
| RF-PLS | 9 | 6 | 0.908 | 0.516 | 0.922 | 0.478 | 0.863 | 0.573 |
| CARS-SPA-PLS | 10 | 7 | 0.928 | 0.459 | 0.939 | 0.423 | 0.907 | 0.479 |
| CARS-RF-PLS | 10 | 7 | 0.924 | 0.472 | 0.936 | 0.435 | 0.888 | 0.523 |
Bold font implies that the model was emphasized and finally selected.
To obtain an LS-SVR model, two additional parameters, γ and σ², need to be determined first. These two parameters are very important because they determine the learning ability, predictive ability, and generalization ability of LS-SVR [39]. γ is the regularization parameter, which controls the trade-off between minimizing the training error and smoothing the estimated function; σ² is the parameter of the kernel function. Here, leave-one-out cross-validation was performed to determine the two optimal parameters, and the result is shown in Fig. 3. The first step of the grid search was a crude search with a large step size, and the second step was a refined search with a small step size [40]. The grid points ‘‘.’’ and ‘‘×’’ represent the search range and step size of the first and second grid searches, respectively, and the curves represent the error contours. Table 3 presents the performance of the LS-SVR models in calibration and prediction. The full-spectrum LS-SVR model gave results comparable to those of the PLS model, with Rc, RMSEC, Rp, and RMSEP of 0.968, 0.310 °Brix, 0.912, and 0.466 °Brix, respectively.
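The LS-SVR fit and the coarse-then-fine parameter search described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' MATLAB implementation: the kernel form exp(-||x - z||²/σ²), the grid ranges, the use of 10-fold rather than leave-one-out cross-validation, and the synthetic data are all choices made for the example. The fit solves the standard LS-SVR linear system, whose solution gives the multipliers a_k and the bias b of Eq. (3).

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma2) between the rows of A and B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

def lssvr_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y] for a and b."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]  # multipliers a_k, bias b

def lssvr_predict(X_train, a, b, sigma2, X_new):
    """Evaluate y(x) = sum_k a_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ a + b

def cv_rmse(X, y, gamma, sigma2, n_folds=10, seed=0):
    """Root mean square error of k-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        a, b = lssvr_fit(X[train], y[train], gamma, sigma2)
        pred = lssvr_predict(X[train], a, b, sigma2, X[test])
        sq_err += np.sum((pred - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))

def two_step_grid_search(X, y, coarse=np.logspace(-2, 4, 7)):
    """Crude search on a wide log grid, then a finer grid around the best point."""
    best = min((cv_rmse(X, y, g, s), g, s) for g in coarse for s in coarse)
    fine_g = best[1] * np.logspace(-0.5, 0.5, 5)
    fine_s = best[2] * np.logspace(-0.5, 0.5, 5)
    return min((cv_rmse(X, y, g, s), g, s) for g in fine_g for s in fine_s)

# Synthetic, smooth nonlinear data standing in for spectra and SSC values.
rng = np.random.default_rng(0)
X = rng.random((60, 5))
y = np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.standard_normal(60)
rmsecv, gamma, sigma2 = two_step_grid_search(X, y)
```

Because the fine grid contains the best coarse point, the second step can only match or improve the cross-validation error found in the first step.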
In fact, according to the model evaluation criteria, both the PLS and the LS-SVR models showed excellent performance. Furthermore, the accuracy and reliability of the obtained prediction models demonstrate that the hyperspectral imaging system over the full spectral range (1000–2500 nm) has the potential to predict the SSC of apples.
Table 3 The prediction results of SSC of apple by different LS-SVR models.
| Variable selection methods | No. of variables | Calibration set: Rc | Calibration set: RMSEC | Prediction set: Rp | Prediction set: RMSEP |
|---|---|---|---|---|---|
| Full-spectrum-LS-SVR | 238 | 0.968 | 0.310 | 0.912 | 0.466 |
| CARS-LS-SVR | 25 | 0.966 | 0.320 | 0.912 | 0.470 |
| SPA-LS-SVR | 17 | 0.941 | 0.418 | 0.874 | 0.552 |
| RF-LS-SVR | 9 | 0.940 | 0.422 | 0.870 | 0.560 |
| CARS-SPA-LS-SVR | 10 | 0.959 | 0.352 | 0.917 | 0.453 |
| CARS-RF-LS-SVR | 10 | 0.942 | 0.414 | 0.883 | 0.532 |
Bold font implies that the model was emphasized and finally selected.
3.3. PLS and LS-SVR models based on selected wavelengths
The results above demonstrate that the hyperspectral imaging system is suitable for predicting the SSC of apples over the full wavelength range. However, transferring these results to online detection is limited by the complexity of hyperspectral data processing and by computer hardware. It is therefore important to select characteristic wavelengths that can replace the full spectrum in the computation while giving similar precision. To this end, CARS, SPA, RF, and the combined CARS-SPA and CARS-RF methods were used to select the characteristic variables that carry the most valuable information about SSC.
3.3.1. Effective variable selection by CARS
The purpose of the CARS method is to eliminate useless variables and improve the performance of the model. The results of screening the full-spectrum variables with the CARS algorithm are shown in Fig. 4. Fig. 4a shows that under the action of the exponentially decreasing function (EDF), the number of variables falls quickly at first and then slowly, which corresponds to the rough selection and refined selection stages of CARS and greatly improves the efficiency of variable selection. Fig. 4b shows that as the number of sampling runs increases, the RMSECV value of the PLS model gradually decreases, indicating that a large amount of irrelevant or noisy information in the full spectrum was removed. Beyond the 24th sampling run (marked with an open blue square), the RMSECV value increases, which indicates that some important variables related to apple SSC prediction were excluded. The PLS model built on the subset of variables obtained in the 24th sampling run therefore has the minimum RMSECV (Fig. 4b).
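The fast-then-slow reduction in Fig. 4a follows from the shape of the EDF. As a minimal sketch, assuming the form r_i = a·exp(−k·i) used in the original description of CARS, with a and k chosen so that all variables are kept in the first run and two remain in the last (the paper does not state these constants, so they are an assumption here), the number of variables retained in each run can be computed as:

```python
import numpy as np

def edf_retained_counts(n_variables, n_runs):
    """Variables kept after each sampling run under r_i = a * exp(-k * i)."""
    k = np.log(n_variables / 2.0) / (n_runs - 1)
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    runs = np.arange(1, n_runs + 1)
    ratio = a * np.exp(-k * runs)  # equals 1 at run 1 and 2 / n_variables at the last run
    return np.round(n_variables * ratio).astype(int)

# 238 variables and 50 Monte Carlo sampling runs, as stated in Section 2.5.
counts = edf_retained_counts(n_variables=238, n_runs=50)
```

In the full algorithm the EDF only sets how many variables survive each run; which ones survive is decided by the PLS regression coefficients and the adaptive reweighted sampling step, which are not reproduced in this sketch.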
As a result of the CARS calculation, 25 variables were selected as the important variables for determining SSC.
Fig. 3. The optimization of the two parameters, γ and σ².
Fig. 4. The changing trend of the number of sampled variables (a) and RMSECV values (b) with the increase of sampling runs by CARS.
3.3.2. Effective variable selection by SPA
Fig. 5a shows the RMSEP plotted against the number of variables selected by SPA. The RMSEP shows a downward trend as the number of selected variables increases, with some fluctuations, and the trend is no longer obvious for n > 19. When 17 variables were selected (marked with an open blue square), the RMSEP reached its optimal value. Fig. 5b shows the 17 selected variables (marked with solid blue circles). All the wavelengths selected by SPA were concentrated in the spectral range of 1000–1300 nm.
3.3.3. Effective variable selection by RF
The selection probability of every wavelength obtained by RF is shown in Fig. 6; the larger the selection probability, the more important the corresponding wavelength. All wavelengths were ranked in descending order of importance on the basis of their selection probability. In order to determine the optimal number of wavelengths, PLS models were developed using the ranked wavelengths in turn. After the RMSECV value of each PLS model had been calculated, the top nine wavelengths above the cutoff threshold were chosen as the characteristic wavelengths for further analysis.
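The ranking step described for RF can be sketched as follows: wavelengths are sorted by selection probability, PLS models are built on the top k wavelengths in turn, and the k with the lowest RMSECV is kept. This is an illustrative NumPy sketch, not the authors' procedure. The selection probabilities and data are synthetic, the PLS1 routine is a plain NIPALS implementation, the number of LVs is fixed at min(6, k) instead of being tuned, and the minimum-RMSECV rule stands in for the cutoff threshold mentioned in the text.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 regression by NIPALS; returns regression coefficients and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = yr @ t / tt                 # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def rmsecv(X, y, max_lv=6, n_folds=10, seed=0):
    """10-fold cross-validation RMSE of a PLS1 model with min(max_lv, n_columns) LVs."""
    n_lv = min(max_lv, X.shape[1])
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        coef, b0 = pls1_fit(X[train], y[train], n_lv)
        sq_err += np.sum((X[test] @ coef + b0 - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))

def pick_top_wavelengths(X, y, selection_prob, max_k=20):
    """Rank variables by selection probability and keep the top k with the lowest RMSECV."""
    order = np.argsort(selection_prob)[::-1]
    errors = [rmsecv(X[:, order[:k]], y) for k in range(1, max_k + 1)]
    best_k = int(np.argmin(errors)) + 1
    return order[:best_k], errors

# Synthetic example: only variables 3, 7 and 12 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + X[:, 12] + 0.05 * rng.standard_normal(80)
prob = 0.3 * rng.random(30)             # hypothetical RF selection probabilities
prob[[3, 7, 12]] = [0.9, 0.8, 0.7]
selected, errors = pick_top_wavelengths(X, y, prob, max_k=10)
```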
Fig. 5. RMSEP plot (a) and the selected variables for SSC (b) by SPA.
Fig. 6. Effective variable selection by RF.
3.3.4. PLS and LS-SVR models based on effective variables selected by CARS, SPA, RF
Simplified PLS and LS-SVR models were established using the selected variables. The results of the PLS and LS-SVR models are shown in Table 2 and Table 3, respectively. Among the three individual methods, CARS gave the best performance for both the PLS and the LS-SVR models. Compared with the full-spectrum-PLS model, the CARS-PLS model based on 25 variables improved the SSC prediction, with the RMSEP decreasing by 2.7%, and the CARS-LS-SVR model had almost the same predictive power as the full-spectrum-LS-SVR model. Although SPA reduced the number of variables to 17, it performed worse than CARS in both the PLS and the LS-SVR models. SPA was applied to the full spectrum, which contains some uninformative variables [41]; this could explain why the variables selected by SPA were not effective for SSC prediction. It may therefore be useful to eliminate the uninformative variables with CARS before using SPA. In addition, although the RF algorithm was less effective, it extracted the smallest number of feature variables and was therefore also retained for further discussion in this study.
3.3.5. PLS and LS-SVR models based on effective variables selected by CARS-SPA, CARS-RF
Although the models based on the effective wavelengths selected by CARS obtained the best results, the spectral variables must be further reduced for future online application. Therefore, two combined algorithms based on CARS, namely CARS-SPA and CARS-RF, were proposed to select the most effective variables for the determination of the SSC of apple. CARS was first used to eliminate the uninformative variables in the full-spectrum data, and SPA and RF were then applied separately to determine the effective variables. Table 2 shows that the CARS-SPA-PLS model gave a better prediction result (Rc = 0.939, RMSEC = 0.423, Rp = 0.907, RMSEP = 0.479) than the SPA-PLS model (Rc = 0.931, RMSEC = 0.452, Rp = 0.878, RMSEP = 0.545), while the number of selected variables decreased from 17 to 10, which improved the speed and efficiency of model fitting. Similarly, as seen from Table 3, when the ten variables selected by CARS-SPA were used as the inputs of the LS-SVR model, a better prediction result (Rc = 0.959, RMSEC = 0.352, Rp = 0.917, RMSEP = 0.453) was obtained than with the SPA-LS-SVR model (Rc = 0.941, RMSEC = 0.418, Rp = 0.874, RMSEP = 0.552). In addition, the difference between the calibration and prediction results became smaller. In summary, the CARS-SPA algorithm had certain advantages over SPA in effective variable selection. However, although CARS-RF also selected ten variables, the CARS-RF-PLS model (Rp = 0.888, RMSEP = 0.523) performed clearly worse on the prediction set than the CARS-SPA-PLS model (Rp = 0.907, RMSEP = 0.479).
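The CARS-SPA sequence described above, in which SPA operates only on the variables that survive CARS, can be sketched as follows. This is a minimal illustration of the SPA projection chain under stated assumptions: columns are mean-centred, the starting column and the number of variables are fixed by hand, and `cars_idx` is a hypothetical set of CARS-retained indices. The full SPA also tries every starting column and chooses the subset size by the error of MLR models, which is omitted here.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Successive projections: repeatedly pick the column least collinear with those chosen."""
    Xp = (X - X.mean(axis=0)).astype(float)   # work on mean-centred columns
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last selected column.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                # never re-select a column
        selected.append(int(np.argmax(norms)))
    return selected

# Synthetic spectra in which column 9 is an exact multiple of column 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
X[:, 9] = 3.0 * X[:, 5]

cars_idx = np.array([5, 9, 14])               # hypothetical variables retained by CARS
chain = spa_chain(X[:, cars_idx], start=0, n_select=2)
cars_spa_idx = cars_idx[chain]                # map back to original column indices
```

In this toy case SPA skips column 9 because, after projection, it carries no information beyond column 5, which is the collinearity-reducing behaviour the combined method relies on.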
A similar difference also existed between CARS-SPA-LS-SVR and CARS-RF-LS-SVR. These results implied that the CARS-RF algorithm was not optimal for variable selection, possibly due to the removal of some important variables. Compared with the CARS-PLS model based on 25 variables and the full-spectrum-PLS model based on the 238 variables of the original spectra, the CARS-SPA-PLS model based on ten variables was slightly inferior. However, the performance of CARS-SPA was still acceptable, as the number of variables was smaller than that selected by CARS alone and far smaller than the number of full-spectrum variables. On the other hand, the CARS-SPA-LS-SVR model showed a higher Rp and a lower RMSEP than the full-spectrum-LS-SVR model and the models based on the other variable selection algorithms. This would greatly help to simplify the prediction model and to apply HSI in online quality grading of apple. Fig. 7 shows scatter plots of the measured and predicted SSC values for CARS-SPA-PLS and CARS-SPA-LS-SVR; the solid line is the regression line corresponding to the ideal correlation between the measured and predicted values. It can be seen that the predicted values were close to the corresponding measured values for both the calibration and prediction sets. The ten effective wavelengths selected by CARS-SPA for the simplified model, namely 1060.932, 1091.999, 1123.098, 1260.304, 1492.423, 1549.143, 1605.964, 1853.360, 1955.405 and 2276.359 nm, were almost evenly distributed between 1000 and 2500 nm (Fig. 8). Among them, 1605.964 and 1853.360 nm were also used as feature wavelengths in noninvasive measurements of SSC of ‘Fuji’ apple [42]. The prediction results obtained in this study (Rp = 0.917, RMSEP = 0.453 °Brix) were better than those obtained by Peng et al. [5] (Rp = 0.883, RMSEP = 0.730 °Brix) and Mendoza et al. [9] (Rp = 0.880, RMSEP = 0.670 °Brix for GD apples; Rp = 0.780, RMSEP = 0.720 °Brix for JG apples; Rp = 0.680, RMSEP = 0.900 °Brix for RD apples), who used a hyperspectral scattering technique over the spectral region of 500–1000 nm. The predictive statistics from this study were also promising relative to those reported by Fan et al. [43] (Rp = 0.919, RMSEP = 0.675 °Brix) using Vis-NIR hyperspectral reflectance imaging technology, with a similar Rp and a lower RMSEP. The overall results indicated that CARS-SPA has the potential to select effective wavelengths from NIR hyperspectral data; the performance of the two simplified models using the optimal wavelengths was comparable to that of the original models using the full wavelengths. Based on the ten wavelengths selected by CARS-SPA, it is feasible to develop a multi-spectral system that captures images at the selected wavelengths, thus reducing the total volume of data and improving the processing speed and efficiency, which could meet the requirements of real-time detection of apple SSC. However, the model proposed in this paper is based on hyperspectral images of the equatorial positions. Since the apple is an uneven sphere and the whole sample is unevenly illuminated, visualizing the distribution of SSC on the surface is still a challenging task.

Fig. 7. The scatter plots of measured and predicted SSC based on PLS (a) and LS-SVR (b) models with the variables selected by CARS-SPA.

Fig. 8. Variables selected by CARS-SPA algorithm.
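The data reduction implied by a multi-spectral system at the ten CARS-SPA wavelengths can be illustrated with a minimal sketch. It is not the authors' implementation: the evenly spaced 238-band axis, the synthetic cube, and the helper names `nearest_band_indices` and `extract_bands` are assumptions made for illustration, and a real sensor would supply its own calibrated band centers.

```python
import numpy as np

# Ten effective wavelengths (nm) reported for CARS-SPA in the text.
CARS_SPA_WAVELENGTHS_NM = np.array([
    1060.932, 1091.999, 1123.098, 1260.304, 1492.423,
    1549.143, 1605.964, 1853.360, 1955.405, 2276.359,
])


def nearest_band_indices(band_centers_nm, targets_nm):
    """Return, for each target wavelength, the index of the closest band center."""
    band_centers_nm = np.asarray(band_centers_nm, dtype=float)
    targets_nm = np.asarray(targets_nm, dtype=float)
    # Absolute differences as an (n_targets, n_bands) matrix; argmin over bands.
    diffs = np.abs(targets_nm[:, None] - band_centers_nm[None, :])
    return diffs.argmin(axis=1)


def extract_bands(cube, band_idx):
    """Keep only the chosen bands of a (rows, cols, bands) hypercube."""
    return np.asarray(cube)[:, :, band_idx]


# Hypothetical sensor axis: 238 evenly spaced bands over 1000-2500 nm (assumption).
band_centers = np.linspace(1000.0, 2500.0, 238)

# Synthetic reflectance data standing in for a region of interest (assumption).
rng = np.random.default_rng(0)
cube = rng.random((64, 64, band_centers.size))

idx = nearest_band_indices(band_centers, CARS_SPA_WAVELENGTHS_NM)
reduced = extract_bands(cube, idx)

print(reduced.shape)                                  # (64, 64, 10)
print(f"data kept: {reduced.size / cube.size:.1%}")   # data kept: 4.2%
```

Under these assumptions, keeping ten of 238 bands retains roughly 4% of the spectral data per pixel, which is the kind of reduction the paragraph above points to for real-time use.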
4. Conclusions

A push broom hyperspectral imaging system in the spectral range of 1000–2500 nm was used to acquire reflectance images of apple in the region of interest at the equator. The PLS and LS-SVR models were established using the full wavelengths and the wavelengths selected by different variable selection methods. The prediction results showed that the models developed with the 10 feature variables selected by CARS-SPA had the best results. The Rc and RMSEC of the PLS model were 0.939 and 0.423 °Brix, respectively, and the Rp and RMSEP were 0.907 and 0.479 °Brix. Meanwhile, the Rc, RMSEC, Rp and RMSEP of the LS-SVR model were 0.959, 0.352 °Brix, 0.917 and 0.453 °Brix, respectively. The results showed that the simplified PLS and LS-SVR models (the CARS-SPA-PLS and CARS-SPA-LS-SVR models) reduced the number of modeling variables, optimized the model and improved the accuracy of the model. This study could be very helpful for real-time monitoring of apple quality using spectral imaging techniques. In future work, more samples with different species and growing sites will be collected to improve the universality of the model. The visualization of SSC distribution using NIR HSI technology will also be investigated.

Conflict of interest

The authors declared that there is no conflict of interest.

Acknowledgements

The authors gratefully acknowledge the financial support provided by National Natural Science Foundation of China (No. 31671927 and No. 41771463) and the Young Scientist Fund of Beijing Academy of Agriculture and Forestry Sciences (No. QNJJ201818).

References

[1] R. Lu, Multispectral imaging for predicting firmness and soluble solids content of apple fruit, Postharvest Biol. Technol. 31 (2004) 147–157.
[2] M.-H. Hu, Q.-L. Dong, B.-L. Liu, U.L. Opara, Prediction of mechanical properties of blueberry using hyperspectral interactance imaging, Postharvest Biol. Technol. 115 (2016) 122–131.
[3] P. Paz, M.-T. Sánchez, D. Pérez-Marín, J.E. Guerrero, A. Garrido-Varo, Instantaneous quantitative and qualitative assessment of pear quality using near infrared spectroscopy, Comput. Electron. Agric. 69 (2009) 24–32.
[4] F. Liu, Y. He, L. Wang, Comparison of calibrations for the determination of soluble solids content and pH of rice vinegars using visible and short-wave near infrared spectroscopy, Anal. Chim. Acta 610 (2008) 196–204.
[5] Y. Peng, R. Lu, Analysis of spatially resolved hyperspectral scattering images for assessing apple fruit firmness and soluble solids content, Postharvest Biol. Technol. 48 (2008) 52–62.
[6] T. Brosnan, D. Sun, Inspection and grading of agricultural and food products by computer vision systems—a review, Comput. Electron. Agric. 36 (2002) 193–213.
[7] G.A. Leiva-Valenzuela, R. Lu, J.M. Aguilera, Prediction of firmness and soluble solids content of blueberries using hyperspectral reflectance imaging, J. Food Eng. 115 (2013) 91–98.
[8] X. Wei, F. Liu, Z. Qiu, Y. Shao, Y. He, Ripeness classification of astringent persimmon using hyperspectral imaging technique, Food Bioprocess Technol. 7 (2014) 1371–1380.
[9] F. Mendoza, R. Lu, D. Ariana, H. Cen, B. Bailey, Integrated spectral and image analysis of hyperspectral scattering data for prediction of apple fruit firmness and soluble solids content, Postharvest Biol. Technol. 62 (2011) 149–160.
[10] W.-H. Lee, M.S. Kim, H. Lee, S.R. Delwiche, H. Bae, D.-Y. Kim, B.-K. Cho, Hyperspectral near-infrared imaging for the detection of physical damages of pear, J. Food Eng. 130 (2014) 1–7.
[11] B. Li, M. Cobo-Medina, J. Lecourt, N.B. Harrison, R.J. Harrison, J.V. Cross, Application of hyperspectral imaging for nondestructive measurement of plum quality attributes, Postharvest Biol. Technol. 141 (2018) 8–15.
[12] X. Li, Y. Wei, J. Xu, X. Feng, F. Wu, R. Zhou, J. Jin, K. Xu, X. Yu, Y. He, SSC and pH for sweet assessment and maturity classification of harvested cherry fruit based on NIR hyperspectral imaging technology, Postharvest Biol. Technol. 143 (2018) 112–118.
[13] S. Fan, W. Huang, Z. Guo, B. Zhang, C. Zhao, Prediction of soluble solids content and firmness of pears using hyperspectral reflectance imaging, Food Anal. Methods 8 (2015) 1936–1946.
[14] B. Zhang, B. Gu, G. Tian, J. Zhou, J. Huang, Y. Xiong, Challenges and solutions of optical-based nondestructive quality inspection for robotic fruit and vegetable grading systems: a technical review, Trends Food Sci. Technol. (2018).
[15] B.S. Zhan, J.H. Ni, J. Li, Hyperspectral technology combined with CARS algorithm to quantitatively determine the SSC in Korla fragrant pear, Spectrosc. Spect. Anal. 34 (2014) 2752–2757.
[16] B. Zhang, W. Huang, L. Gong, J. Li, C. Zhao, C. Liu, D. Huang, Computer vision detection of defective apples using automatic lightness correction and weighted RVM classifier, J. Food Eng. 146 (2015) 143–151.
[17] H. Wang, C. Li, M. Wang, Quantitative determination of onion internal quality using reflectance, interactance, and transmittance modes of hyperspectral imaging, Trans. ASABE 56 (2013) 1623–1635.
[18] C. Mo, M.S. Kim, G. Kim, J. Lim, S.R. Delwiche, K. Chao, H. Lee, B.-K. Cho, Spatial assessment of soluble solid contents on apple slices using hyperspectral imaging, Biosyst. Eng. 159 (2017) 10–21.
[19] G. ElMasry, D. Sun, P. Allen, Non-destructive determination of water-holding capacity in fresh beef by using NIR hyperspectral imaging, Food Res. Int. 44 (2011) 2624–2633.
[20] M. Kamruzzaman, G. ElMasry, D.-W. Sun, P. Allen, Prediction of some quality attributes of lamb meat using near-infrared hyperspectral imaging and multivariate analysis, Anal. Chim. Acta 714 (2012) 57–67.
[21] X. Zhang, F. Liu, Y. He, X. Gong, Detecting macronutrients content and distribution in oilseed rape leaves based on hyperspectral imaging, Biosyst. Eng. 115 (2013) 56–65.
[22] S. Fan, W. Huang, J. Li, C. Zhao, B. Zhang, Characteristic wavelengths selection of soluble solids content of pear based on NIR spectral and LS-SVM, Spectrosc. Spect. Anal. 34 (2014) 2089–2093.
[23] A. Gowen, C. O'Donnell, P. Cullen, G. Downey, J. Frias, Hyperspectral imaging–an emerging process analytical tool for food quality and safety control, Trends Food Sci. Technol. 18 (2007) 590–598.
[24] K. Yu, Y. Zhao, X. Li, Y. Shao, F. Zhu, Y. He, Identification of crack features in fresh jujube using Vis/NIR hyperspectral imaging combined with image processing, Comput. Electron. Agric. 103 (2014) 1–10.
[25] B. Zhang, D. Dai, J. Huang, J. Zhou, Q. Gui, F. Dai, Influence of physical and biological variability and solution methods in fruit and vegetable quality nondestructive inspection by using imaging and near-infrared spectroscopy techniques: a review, Crit. Rev. Food Sci. Nutr. 58 (2018) 2099–2118.
[26] H. Wang, J. Peng, C. Xie, Y. Bao, Y. He, Fruit quality evaluation using spectroscopy technology: a review, Sensors 15 (2015) 11889–11927.
[27] A. Savitzky, M.J. Golay, Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. 36 (1964) 1627–1639.
[28] J. Burger, A. Gowen, Data handling in hyperspectral image analysis, Chemom. Intell. Lab. Syst. 108 (2011) 13–22.
[29] H. Li, Y. Liang, Q. Xu, D. Cao, Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration, Anal. Chim. Acta 648 (2009) 77–84.
[30] D. Liu, D. Sun, X.-A. Zeng, Recent advances in wavelength selection techniques for hyperspectral image processing in the food industry, Food Bioprocess Technol. 7 (2014) 307–323.
[31] H. Li, Q. Xu, Y. Liang, Random frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification, Anal. Chim. Acta 740 (2012) 20–26.
[32] J. Zheng, Z. Zhu, Z. Shanmin, Chestnut browning detected with near infrared spectroscopy and a random frog algorithm, J. Zhejiang A & F Univ. 33 (2016) 322–329.
[33] D. Wu, D. Sun, Potential of time series-hyperspectral imaging (TS-HSI) for noninvasive determination of microbial spoilage of salmon flesh, Talanta 111 (2013) 39–46.
[34] Y. Feng, D. Sun, Near-infrared hyperspectral imaging in tandem with partial least squares regression and genetic algorithm for non-destructive determination and visualization of Pseudomonas loads in chicken fillets, Talanta 109 (2013) 74–83.
[35] L. Xuemei, L. Jianshe, Measurement of soil properties using visible and short wave-near infrared spectroscopy and multivariate calibration, Measurement 46 (2013) 3808–3814.
[36] J. Cheng, D. Sun, Rapid and non-invasive detection of fish microbial spoilage by visible and near infrared hyperspectral imaging and multivariate analysis, LWT-Food Sci. Technol. 62 (2015) 1060–1068.
[37] P.K. Ghosh, D.S. Jayas, Use of spectroscopic data for automation in food processing industry, Sens. Instrum. Food Qual. Saf. 3 (2009) 3–11.
[38] P.J. Curran, Remote sensing of foliar chemistry, Remote Sens. Environ. 30 (1989) 271–278.
[39] J.A. Suykens, T. Van Gestel, J. De Brabanter, Least Squares Support Vector Machines, World Scientific, 2002.
[40] J. Li, W. Huang, C. Zhao, B. Zhang, A comparative study for the quantitative determination of soluble solids content, pH and firmness of pears by Vis/NIR spectroscopy, J. Food Eng. 116 (2013) 324–332.
[41] J. Li, W. Huang, L. Chen, S. Fan, B. Zhang, Z. Guo, C. Zhao, Variable selection in visible and near-infrared spectral analysis for noninvasive determination of soluble solids content of ‘Ya’ pear, Food Anal. Methods 7 (2014) 1891–1902.
[42] Z. Xiaobo, Z. Jiewen, H. Xingyi, L. Yanxiao, Use of FT-NIR spectrometry in noninvasive measurements of soluble solid contents (SSC) of ‘Fuji’ apple based on different PLS models, Chemom. Intell. Lab. Syst. 87 (2007) 43–51.
[43] S. Fan, B. Zhang, J. Li, C. Liu, W. Huang, X. Tian, Prediction of soluble solids content of apple using the combination of spectra and textural features of hyperspectral reflectance imaging data, Postharvest Biol. Technol. 121 (2016) 51–61.