LWT - Food Science and Technology 72 (2016) 63-70
Contents lists available at ScienceDirect
LWT - Food Science and Technology journal homepage: www.elsevier.com/locate/lwt
Development of NIRS models for rapid quantification of protein content in sweetpotato [Ipomoea batatas (L.) LAM.]
Sonia Inês Messo Naidoo b, c, Sunette M. Laurie b, Lembe Samukelo Magwaza a, *, Mark D. Laing c, Hussein Shimelis c
a Department of Crop Science, School of Agricultural, Earth and Environmental Sciences, University of KwaZulu-Natal, Private Bag X01, Scottsville, 3209 Pietermaritzburg, South Africa
b Agricultural Research Council-Vegetable and Ornamental Plants (ARC-VOP), Private Bag X 293, Pretoria 0001, South Africa
c African Centre for Crop Improvement (ACCI), School of Agricultural, Earth and Environmental Sciences, University of KwaZulu-Natal, Private Bag X01, Pietermaritzburg 3209, South Africa
Article info
Article history: Received 30 December 2015; Received in revised form 4 March 2016; Accepted 16 April 2016; Available online 19 April 2016
Keywords: Partial least squares; Non-destructive; Calibration; Validation
Abstract
Near-infrared spectroscopy (NIRS) is an alternative analytical method that can be used to quantify protein content in sweetpotato. It is cheaper and more efficient than other methods. This study was conducted to develop NIRS-based models for quantifying the protein content of sweetpotato to support selection or wide-area production of recommended varieties. A pool of 104 sweetpotato varieties was sampled, and the roots were scanned using an NIR spectrometer. Calibration models were developed by subjecting spectral and reference datasets to partial least squares regression, and several pre-processing methods were investigated. Models that yielded the highest coefficient of determination (R2) and residual predictive deviation (RPD) and the lowest root mean square errors of calibration (RMSEC) and prediction (RMSEP) were selected. Optimal model performance was obtained with second-derivative pre-processing, which gave R2v, RMSEP and RPDv values of 0.98, 0.29 and 4.0, respectively. The regression analysis indicated that the informative NIR bands for quantifying the protein content of sweetpotato lie between 1600 and 2200 nm. The results demonstrate that NIRS can predict the protein content of sweetpotato rapidly and accurately. Therefore, the NIRS model developed in this study may help to quantify the protein composition of sweetpotato for rapid, high-throughput and accurate screening of germplasm in breeding programs. © 2016 Elsevier Ltd. All rights reserved.
1. Introduction
Protein-energy malnutrition, driven by economic inequalities within societies (Zere & MacIntyre, 2003), is a growing concern in developing countries (De Onis & Blössner, 2003). Inadequate protein-energy intake has serious health implications, such as kwashiorkor and/or marasmus. Over the past 40 years, agricultural efforts to mitigate hunger have focused on producing high-energy foods, especially cereals (Saltzman et al., 2013). Because of the high levels of hidden hunger (micronutrient deficiencies) in developing countries, the current challenge remains to produce high-quality and relatively inexpensive foods that are rich in vital micronutrients, meet daily requirements and sustain a healthy diet (Bouis, Hotz, Mcclafferty, & Pfeiffer, 2013; Saltzman et al., 2013). Biofortification of crops through plant breeding is a cheap and sustainable approach to delivering micronutrients to resource-poor households through improved staple foods (Bouis et al., 2013).
Sweetpotato [Ipomoea batatas (L.) Lam.] is one of the major staple crops contributing to food security in developing countries, especially on the African continent (FAO, 2014). Interest in growing sweetpotato has increased greatly over the past three decades because sweetpotato roots and leaves are nutritionally rich (carbohydrates, fibre, pro-vitamin A, vitamin C, riboflavin, thiamine and niacin, as well as high levels of edible energy) (Lebot, 2009; Woolfe, 1992). The crop is particularly favoured by resource-poor and/or small-scale farmers for its ability to grow in marginal conditions with limited agricultural inputs (Lebot, 2009; Woolfe, 1992).
* Corresponding author. E-mail addresses: [email protected], [email protected] (L.S. Magwaza).
http://dx.doi.org/10.1016/j.lwt.2016.04.032
0023-6438/© 2016 Elsevier Ltd. All rights reserved.
L.S. Magwaza et al. / LWT - Food Science and Technology 72 (2016) 63-70
Despite its high energy content, sweetpotato generally contains considerably less protein (1.6%) than most high-energy cereals, such as rice (Oryza sativa L.) (7.5%), maize (Zea mays L.) (9.5%), millet (Pennisetum glaucum L.) (10.5%) and sorghum (Sorghum bicolor (L.) Moench) (10.5%) (Bowkamp, 1985). Woolfe (1992) reported sweetpotato genotypes with protein contents ranging from 0.6 to 9.14% of dry matter. Tumwegamire, Kapinga, et al. (2011) and Tumwegamire, Rubaihayo, et al. (2011) later confirmed the existence of genetic variability in sweetpotato protein content. Their assessment of 90 accessions showed protein contents ranging from 3.8 to 9.5% of dry matter, indicating considerable scope for improving the low protein content of the commonly grown, consumer-accepted cream- and orange-fleshed sweetpotato cultivars. Given this high variability among genotypes, protein content should be one of the parameters considered when selecting sweetpotato clones for breeding or for wide-area production of superior genotypes. A comprehensive collection of 506 sweetpotato accessions was assembled by the Agricultural Research Council-Vegetable and Ornamental Plants (ARC-VOP) of South Africa. These genetic resources have not been characterized for protein content. The major drawback in such an endeavour is the high cost of protein determination using time-consuming conventional methods such as Dumas combustion (AOAC, 1980). In breeding programs where value addition is a major priority, a fast and accurate technique for quantifying the chemical constituents of samples is needed. Unless breeders are supported with rapid, high-throughput techniques for selecting elite breeding material, the potential for breeding sweetpotato for high nutritional quality will remain unexploited.
Near-infrared spectroscopy (NIRS) has become the most widely used alternative to conventional analytical methods because of its rapidity, simplicity, accuracy, cost-effectiveness and potential for routine analysis and quantification of nutrients in food products. NIRS has been shown to predict protein content accurately in different food products, including sweetpotato (Tumwegamire, Kapinga, et al., 2011; Tumwegamire, Rubaihayo, et al., 2011) and rice (Bagchi, Sharma, & Chattopadhyay, 2016). The technology has previously been used successfully to screen large numbers of sweetpotato samples for total protein content (Tumwegamire et al., 2011; Diaz, Veal, & Chinn, 2014) and for other nutrients, including starch (Katayama, Komaki, & Tamiya, 1996; Lu, Huang, & Zhang, 2006a, 2006b; Tumwegamire, Kapinga, et al., 2011; Tumwegamire, Rubaihayo, et al., 2011), sugar (Katayama et al., 1996), and sucrose, β-carotene, iron, zinc and magnesium (Diaz et al., 2014; Tumwegamire, Kapinga, et al., 2011; Tumwegamire, Rubaihayo, et al., 2011; Zum Felde et al., 2007, 2009). NIRS has also been used to analyse and discriminate pure and adulterated powdered sweetpotato samples (Ding, Ni, & Kokot, 2015). Although NIRS has been used to screen for macronutrients in root and tuber crops (Haase, 2006; Mehrübeoglu & Cote, 1997; Young, MacKerron, & Davies, 1997), including sweetpotato (Lebot, Champagne, Malapa, & Shiley, 2009; 2011; Lu et al., 2006a, 2006b; Tumwegamire, Kapinga, et al., 2011; Tumwegamire, Rubaihayo, et al., 2011), very few studies have reported calibration models for predicting and quantifying protein content in sweetpotato (Diaz et al., 2014). Several studies have shown that the accuracy of NIRS prediction models depends strongly on crop growing conditions, such as geographical and seasonal variability, which may result in differences in the biochemical composition of samples (Magwaza et al., 2014; Magwaza et al., 2014a).
The analytical performance of NIRS models has been shown to be highly dependent on genotype (Lebot et al., 2009). Such variation must be accounted for in Vis/NIRS calibration because chemical composition alters the optical characteristics and spectral band assignments of the sample, with a concomitant effect on model predictions (Golic, Walsh, & Lawson, 2003; Guthrie, Reid, & Walsh, 2005; McGlone, Jordan, & Martinsen, 2002). Consequently, calibration models developed for one population may not be useful for another, even within a species. Proper calibration and validation against reference data generated by a laboratory method is therefore the primary requirement for establishing reliable prediction models. This study was conducted to develop and validate NIRS calibration models for fast and accurate prediction of the protein content of sweetpotato germplasm collections.
2. Materials and methods
2.1. Plant material
The study used 104 genetically diverse sweetpotato genotypes collected and maintained by the Agricultural Research Council-Vegetable and Ornamental Plants (ARC-VOP) of South Africa (Table 1). The collections are maintained in vivo under greenhouse conditions and in vitro. The in vitro plantlets were acclimatized in the glasshouse and later planted in pots for vine multiplication. Stem cuttings (with at least three nodes or 30 cm long) from all accessions were established at ARC-VOP in a field trial consisting of two replicates (five plants per plot).
2.2. Sampling and sample preparation
At harvest maturity (five months after planting), two roots per replicate were sampled. The roots were peeled and cut longitudinally into four parts. Two opposite quarters were grated and freeze-dried. The freeze-dried samples were manually milled using a mortar and pestle, and the milled samples were stored in a dry, dark place until protein analysis. Powdered sweetpotato samples (5 g) of each genotype were prepared, kept in a temperature-insulated container and sent to the University of KwaZulu-Natal (UKZN) Plant Pathology laboratory, where NIRS analyses were conducted.
Samples arrived at UKZN within 24 h and were stored for 24 h at room temperature (20 °C) prior to NIR analysis.
2.3. NIRS spectral acquisition
NIR spectral data of powdered sweetpotato samples were acquired using the method described by Sabatier, Moon, Mhora, Rutherford, and Laing (2013), with slight modification. Reflectance NIR spectra were acquired using a laboratory bench-top monochromator NIRSystems Model XDS spectrometer (Foss NIRSystems, Inc., Silver Spring, MD, USA) equipped with a quartz halogen lamp and a PbS detector. Before scanning, and every 30 min thereafter, the spectrometer was calibrated by scanning a 100% white reference tile. Spectra were acquired with a circular sample cup with a quartz window (38 mm in diameter and 10 mm thick). Each sample was carefully placed in the sample cup inside an enclosed box specifically designed to prevent light leakage. The NIR system was operated with Vision software (Vision TM, version 3.5.0.0, Tidestone Technologies Inc., KS, USA). Reflectance spectra were recorded at 2 nm intervals over the 400 to 2500 nm wavelength range. Each spectrum consisted of 32 scans, which were automatically averaged and saved as absorbance [log (1/R)]. The integration time was less than 500 ms per spectrum. Each sample was scanned three times and the three spectra were averaged.
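As a minimal sketch of the spectral bookkeeping described above, the Python snippet below builds the 400 to 2500 nm axis at 2 nm steps (1051 points), converts reflectance to apparent absorbance log10(1/R) and averages the three replicate scans of one sample. The function names are hypothetical, and the sketch assumes that replicates are averaged in absorbance units; it illustrates the arithmetic only and is not the Vision software's internal procedure.

```python
import numpy as np

# Wavelength axis from the text: 400-2500 nm sampled every 2 nm (1051 points).
WAVELENGTHS_NM = np.arange(400, 2501, 2)


def reflectance_to_absorbance(reflectance):
    """Convert reflectance R (0 < R <= 1) to apparent absorbance log10(1/R)."""
    r = np.asarray(reflectance, dtype=float)
    if np.any(r <= 0):
        raise ValueError("reflectance values must be strictly positive")
    return np.log10(1.0 / r)


def average_replicates(replicate_spectra):
    """Average replicate scans of one sample.

    `replicate_spectra` has shape (n_scans, n_wavelengths); in the text n_scans = 3.
    """
    spectra = np.asarray(replicate_spectra, dtype=float)
    if spectra.ndim != 2 or spectra.shape[1] != WAVELENGTHS_NM.size:
        raise ValueError(f"expected shape (n_scans, {WAVELENGTHS_NM.size})")
    return spectra.mean(axis=0)


# Example with three hypothetical (randomly generated) reflectance scans of one sample.
rng = np.random.default_rng(0)
scans_r = rng.uniform(0.2, 0.8, size=(3, WAVELENGTHS_NM.size))
mean_spectrum = average_replicates(reflectance_to_absorbance(scans_r))
```

A reflectance of 0.1 corresponds to an absorbance of exactly 1.0, which is a convenient sanity check when importing exported spectra.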
Table 1. List of sweetpotato varieties used for NIRS prediction and destructive reference analysis for protein content characterization.
Columns: Sample number; Genotype name; Genotype code; Trial; Locality; Origin.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
199062.1 199062.1 199062.1 199062.1 199062.1 1999-9-4 1999-9-4 2004-9-2 2004-9-2 2004-9-2 2007-7-2 2008-12-4 2008-12-4 2008-12-5 2008-12-5 2008-1-4 2008-3-1 2008-8-5 2008-8-5 2010-1-1 2010-15-2 2010-15-2 2010-3-1 2010-5-4 2010-6-4 2010-6-5 2010-7-2 2010-9-1 2011-10-2 2011-10-2 2011-14-2 2011-14-2 2011-14-4 2011-14-4 2011-14-5 2011-14-5 2011-16-1 2011-16-1 2011-21-1 2011-21-1 2011-25-2 2011-25-2 2011-28-1 2011-28-1 2011-29-1 2011-29-1 2011-33-1 2011-42-1 2011-42-1 2011-7-2 2011-7-2 2011-8-1 2011-8-1 2012-10-1 2012-1-1 2012-13-1 2012-14-1 2012-15-1 2012-15-5 2012-16-1 2012-17-1 2012-18-2 2012-29-1 2012-29-4 2012-30-1 2012-32-1 2012-33-2 2012-34-1 2012-40-1 2012-43-1 2012-8-2 2012-8-4 2012-8-5
1 1 1 2 2 3 3 4 4 4 5 6 6 7 7 8 9 10 10 11 12 12 13 14 15 16 17 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 29 29 30 30 31 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
PYT IYT AYT IYT AYT IYT AYT IE PYT PYT IYT IYT AYT AYT AYT IYT AYT AYT AYT IYT IYT AYT IYT IYT IYT IYT IYT IYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT PYT IE IE IE IE IE IE IE IE IE IE IE IE IE IE IE IE IE IE IE IE
OSCA OSCA OSCA OSCA OSCA OSCA RDP RDP OSCA OSCA OSCA OSCA RDP RDP RDP RDP RDP OSCA OSCA RDP RDP OSCA RDP RDP RDP RDP OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA OSCA RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP RDP
CIPCIPCIPARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC ARC
Ndou Ndou Khano (6) Khano (6)
South America South America South America Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line Line
Table 1 (continued).
Columns: Sample number; Genotype name; Genotype code; Trial; Locality; Origin.
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
2012-9-1 2012-9-2 Beauregard Blesbok Blesbok Blesbok Blesbok Blesbok Bonita Bonita Bophelo Bophelo Bophelo Bosbok Bosbok IIAM-10 IIAM-3 IIAM-5 Impilo Impilo Impilo Khano 199062.1 Letlhabula Monate Monate Mvuvhelo Ndou Ndou Purple sunset Resisto Resisto
52 53 54 55 55 55 55 55 56 56 57 57 57 58 58 59 60 61 62 62 62 63 64 65 66 67 68 68 69 70 70
IE IE IYT IE AYT AYT PYT PYT AYT AYT IYT AYT IE AYT AYT AYT AYT AYT IE IYT AYT AYT IE IE AYT AYT IE AYT IYT IE AYT
RDP RDP OSCA RDP OSCA RDP OSCA OSCA OSCA RDP RDP OSCA RDP OSCA RDP OSCA OSCA RDP RDP OSCA OSCA RDP RDP RDP RDP RDP RDP OSCA OSCA RDP RDP
ARC Line ARC Line USA-North America SA cultivar SA cultivar SA cultivar SA cultivar SA cultivar USA-North America USA-North America SA cultivar SA cultivar SA cultivar SA cultivar SA cultivar Mozambique-Africa Mozambique-Africa Mozambique-Africa SA cultivar SA cultivar SA cultivar ARC Line SA cultivar SA cultivar SA cultivar SA cultivar SA cultivar Mozambique-Africa SA cultivar North America North America
AYT, Advanced yield trial; IYT, Intermediate yield trial; PYT, Preliminary yield trial; IE, Initial evaluation; SA, South Africa; RDP, Roodeplaat Pretoria (25°40′51.33″ S; 28°17′10.12″ E); OSCA, Owen Sithole Empangeni (28°44′7.84″ S; 31°53′47.77″ E).
2.4. Destructive (reference) measurements
Protein content was determined according to AOAC (1980) methods. Total nitrogen (N) was analysed by Dumas combustion. Portions (120 mg) of sweetpotato root powder from 120 samples obtained from various ARC-VOP breeding trials (initial evaluation, intermediate and advanced yield trials) were weighed into tin foil and analysed with a Leco analyser (Leco Corporation, USA), using 100 mg of EDTA as a standard. Crude protein content was calculated as N × 6.25.
2.5. Chemometric analysis
The reflectance spectra in Vision format (Vision TM, version 3.5.0.0, Tidestone Technologies Inc., KS, USA) were converted to MSD format compatible with The Unscrambler® chemometric software (Version 10.3, Camo Process, SA., Norway). The three individual spectra from each sweetpotato sample were averaged prior to calibration and validation; the results reported herein are therefore based on averaged spectra. The averaged spectral data were first subjected to principal component analysis (PCA) to compare spectral characteristics, determine effective wavelengths, evaluate the normality of residuals from normal probability plots and detect spectral outliers (Magwaza et al., 2014b). After PCA exploration, spectral and destructive protein data were subjected to partial least squares (PLS) regression. Several spectral pre-treatment methods, including multiplicative scatter correction (MSC), Savitzky-Golay (SG) smoothing, and first and second derivatives (2nd-order polynomial), were tested individually. These were applied prior to regression to smooth the spectral data, correct for light scatter and reduce the effect of variations in light path length. The spectral pre-processing that provided the lowest root mean square error of prediction (RMSEP) was selected and
used to develop the calibration model for predicting protein content. The final model for predicting the protein content of sweetpotato samples was based on 25-point SG smoothing (2nd-order polynomial) followed by a 25-point SG second-derivative transformation (2nd-order polynomial). Outliers were detected from sample residual variance and leverage during PCA data exploration and PLS regression (Shiroma & Rodriguez-Saona, 2009; Magwaza et al., 2014a, b). Samples with large residuals and/or high leverage, located far from the zero line of the residual variance plot, were identified and recorded as potential outliers. The more stringent Hotelling T² ellipse and statistic were then used to confirm outliers. Only two outliers were identified and removed, reducing the number of samples for analysis to 102. The optimal number of latent variables (LVs) was taken as the minimum number of LVs corresponding to the first lowest value of residual X-variance in the plot of residual X-variance against increasing number of LVs (Davey et al., 2009). During PLS model calibration, spectral and reference data were subjected to test set validation, in which samples were randomly assigned to three subsets: 40% for calibration, 30% for validation and 30% for prediction. Although the samples in the calibration, validation and prediction sets were randomly selected, the validation and prediction sets were examined to ensure that they fell within the range of the calibration set. PLS results were compared on the basis of the regression statistics of the models: the root mean square error of calibration (RMSEC, Eq. (1)), the root mean square error of validation or prediction (RMSEP, Eq. (2)), the coefficient of determination for predicted versus measured protein content (R2, Eq. (3)), which represents the proportion of explained variance of the response variable in the calibration (R2c) or validation dataset (R2v), and the residual predictive deviation (RPD, Eq. (4)) (Bobelyn et al., 2010;
Camps & Christen, 2009; Liu, Sun, & Ouyang, 2010; Lu et al., 2006a, 2006b; Sun, Zhang, & Liu, 2009). The model reported herein was selected for its higher R2 and RPD values and its lower number of LVs, RMSEC and RMSEP. Other statistics characterizing the selected model were a low average difference between predicted and measured values (Bias, Eq. (5)) and a small difference between RMSEC and RMSEP.
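The pre-processing chain of the final model (25-point, second-order SG smoothing followed by a 25-point, second-order SG second derivative) and MSC, one of the alternatives tested, can be sketched in NumPy as follows. This is an illustrative re-implementation under stated assumptions, not The Unscrambler's code: the helper names are hypothetical, the filter is applied only where the full 25-point window fits (so 12 points are trimmed from each end per pass), the derivative is rescaled to per nm² assuming the 2 nm step, and MSC uses the mean spectrum as reference unless another is supplied.

```python
import math

import numpy as np


def savgol_weights(window, polyorder, deriv=0):
    """Savitzky-Golay weights that estimate the `deriv`-th derivative at the window centre."""
    if window % 2 == 0 or window <= polyorder or deriv > polyorder:
        raise ValueError("window must be odd, larger than polyorder, and deriv <= polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares design matrix with columns 1, x, x^2, ..., x^polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the fitted coefficient of x^deriv;
    # multiplying by deriv! turns it into the derivative value at x = 0.
    return np.linalg.pinv(design)[deriv] * math.factorial(deriv)


def savgol(spectra, window=25, polyorder=2, deriv=0):
    """Filter every row; 'valid' mode keeps only points where the whole window fits."""
    weights = savgol_weights(window, polyorder, deriv)
    rows = np.atleast_2d(np.asarray(spectra, dtype=float))
    return np.array([np.correlate(row, weights, mode="valid") for row in rows])


def smooth_then_second_derivative(spectra, step_nm=2.0):
    """25-point SG smoothing, then 25-point SG second derivative (2nd-order polynomials)."""
    smoothed = savgol(spectra, window=25, polyorder=2, deriv=0)
    # Divide by step^2 so the derivative is per nm^2 rather than per data point.
    return savgol(smoothed, window=25, polyorder=2, deriv=2) / step_nm**2


def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean spectrum (or a supplied reference)."""
    rows = np.atleast_2d(np.asarray(spectra, dtype=float))
    ref = rows.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(rows)
    for i, spectrum in enumerate(rows):
        # Fit spectrum ~ intercept + slope * ref, then remove the additive and multiplicative effects.
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected
```

Applied to 1051-point spectra, the two 25-point passes leave 1003 points (448 to 2452 nm). An implementation closer to commercial software would more likely handle the edges by polynomial fitting, as scipy.signal.savgol_filter does with mode="interp".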
$$\mathrm{RMSEC} = \sqrt{\frac{\sum (y_{cal} - y_{act})^{2}}{n}} \tag{1}$$
$$\mathrm{RMSEP} = \sqrt{\frac{\sum (y_{pred} - y_{act})^{2}}{n}} \tag{2}$$
$$R^{2} = 1 - \frac{\sum (y_{cal} - y_{act})^{2}}{\sum (y_{cal} - y_{mean})^{2}} \tag{3}$$
$$\mathrm{RPD} = \frac{SD}{\mathrm{RMSEP}} \tag{4}$$
$$\mathrm{Bias} = \frac{1}{n} \sum (y_{pred} - y_{act}) \tag{5}$$
where n is the number of spectra, y_act the actual value, y_mean the mean value, y_cal the calculated value, y_pred the predicted value of the attribute and SD the standard deviation of the reference values.
Calibration model stability was tested by interchanging the calibration and validation sets during model development and checking that the resulting differences in regression statistics were small (Alvarez-Guerra et al., 2010).
3. Results and discussion
3.1. Description of spectra
Fig. 1 shows a typical reflectance spectrum of the measured sweetpotato samples, covering 400 to 2500 nm. The spectral line represents the average of the spectra acquired from all tested samples. Because the NIR region is dominated by overtone and combination bands of fundamental vibrations, which are very broad and highly overlapping, it is difficult to distinguish them
Fig. 1. A typical average visible to near-infrared (400-2500 nm) spectrum illustrating the peaks of dried sweetpotato storage root samples of 91 genotypes.
visually (Cozzolino, 2015; ElMasry, Sun, & Allen, 2013). This is even more difficult with biological materials such as sweetpotato, which are characterised by complex hydrogen-bonding interactions between sugars, fatty acids and proteins. Nonetheless, the average spectrum in Fig. 1 shows all the important absorption peaks related to sweetpotato composition previously reported by Katayama et al. (1996), Lu et al. (2006a, 2006b) and Ding et al. (2015). In general, the spectra were dominated by eight groups of peaks at 1200, 1465, 1586, 1730, 1950, 2110, 2314 and 2350 nm. According to Williams and Norris (2001) and Lebot et al. (2009), these peaks arise from overlapping absorptions corresponding mainly to overtones and combinations of vibrational modes involving C-H, O-H and N-H bonds, associated with carbohydrates, fatty acids and proteins, respectively. Protein content is associated with N-H overtones, whose absorption band varies from 1670 to 1870 nm. Similar spectra for dried sweetpotato were reported by Ding et al. (2015).
3.2. Description of destructive data for Vis/NIRS calibrations
The distribution statistics for the reference datasets used for calibration, validation and prediction are presented in Table 2. During data exploration, the reference protein measurements in the calibration, validation and prediction datasets were observed to be normally distributed around their means. The protein content of the different genotypes ranged from 2.53 to 6.93%. The mean reference protein contents of the calibration, validation and prediction sets were 4.26, 4.22 and 4.18%, respectively, with corresponding coefficients of variation of 28.39, 27.46 and 28.53%.
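The random 40/30/30 assignment and the range check described in Section 2.5, together with the summary statistics reported in Table 2, can be sketched as follows. The function names are hypothetical. Forcing the samples with the minimum and maximum reference values into the calibration set is an assumed way of guaranteeing that the validation and prediction sets stay within the calibration range; the paper only states that the subsets were examined.

```python
import numpy as np


def split_40_30_30(y, seed=0):
    """Randomly assign samples to calibration (40%), validation (~30%) and prediction (~30%).

    Assumption (not stated in the paper): the samples holding the overall minimum and
    maximum reference values are forced into calibration, so the other two subsets
    cannot fall outside the calibration range.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    extremes = sorted({int(np.argmin(y)), int(np.argmax(y))})
    rng = np.random.default_rng(seed)
    rest = np.array([i for i in rng.permutation(n) if i not in extremes], dtype=int)
    n_cal = int(0.4 * n)
    n_val = (n - n_cal) // 2
    k = n_cal - len(extremes)  # random calibration samples still needed
    cal = np.concatenate([np.array(extremes, dtype=int), rest[:k]])
    val = rest[k:k + n_val]
    pred = rest[k + n_val:]
    return cal, val, pred


def within_calibration_range(y, cal_idx, other_idx):
    """True if every reference value in `other_idx` lies inside the calibration range."""
    y = np.asarray(y, dtype=float)
    return bool(y[other_idx].min() >= y[cal_idx].min() and y[other_idx].max() <= y[cal_idx].max())


def describe(values):
    """Mean, sample SD, range and CV% of a subset of reference values."""
    v = np.asarray(values, dtype=float)
    mean, sd = v.mean(), v.std(ddof=1)
    return {"mean": mean, "sd": sd, "min": v.min(), "max": v.max(), "cv_percent": 100.0 * sd / mean}
```

With n = 102 samples, this yields subsets of 40, 31 and 31, matching Table 2. As a worked check on that table, the calibration CV is 1.21 / 4.26 × 100 ≈ 28.4%, which agrees with the reported 28.39% to within the rounding of the tabulated mean and SD.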
These protein values are well within the range of 1-10% previously reported for different sweetpotato cultivars (Bovell-Benjamin, 2007; Ravindran, Ravindran, Sivakanesan, & Rajaguru, 1995; Shekhar, Mishra, Buragohain, Chakraborty, & Chakraborty, 2015). The standard deviation, minimum-to-maximum range and CV% statistics show that all datasets covered a large range, which is helpful for developing calibration models (Clément, Dorais, & Vernon, 2008). The large variation in reference protein content observed in this study resulted from the diversity of the sweetpotato breeding clones used (Table 1). This variation supported the interpretation and prediction accuracy of the NIR calibration models, which depend strongly on the precision of the reference data and on sufficient variation in both the calibration and validation datasets (Lu et al., 2006a, 2006b).
3.3. Calibration and validation of PLS models
PLS multivariate data analysis was applied to the spectral and reference data to develop prediction models. The best and most stable calibration model for predicting protein content was obtained with eight latent variables, determined as the minimum number corresponding to the first lowest value of the residual y-variance in the plot of residual y-variance against number of latent variables (Fig. 2). The residual variance did not change after the eighth latent variable, indicating that additional latent variables would explain little further variance and would have resulted in an overfitted model. The number of LVs (8) was relatively low, considering the general recommendation of one latent variable for every 10 samples in a model (Lammertyn, Peirs, De Baerdemaeker, & Nicolaï, 2000). A characteristic regression coefficient curve obtained during calibration is shown in Fig. 3. The smoothness of this curve indicates that only significant latent variables were included in the calibration model and that noise was not modelled. Spectral variables contributing significantly
Table 2. Mean, standard deviation (SD), range and coefficient of variation (CV%) of reference protein content (%) of sweetpotato samples in the calibration (n = 40), validation (n = 31) and prediction (n = 31) subsets.
Data set | Mean ± SD | Range | CV%
Calibration | 4.26 ± 1.21 | 2.53-6.93 | 28.39
Validation | 4.22 ± 1.16 | 2.56-6.93 | 27.46
Prediction | 4.18 ± 1.19 | 2.68-6.87 | 28.53
Fig. 2. Residual y-variance as a function of the number of latent variables (or factors) in the calibration model for predicting protein content of sweetpotato storage root samples in the NIR spectral range of 1600-2400 nm.
to the protein prediction model were identified from its regression coefficients. Spectral peaks with relatively high absolute regression coefficients indicate variables that contributed significantly to the model, while those with values close to zero were unimportant. Prominent peaks with high absolute regression coefficients were observed between 1600 and 2200 nm, corresponding to second overtones of N-H bonds, C-H stretching, C-O stretch combinations, and C-O stretch or N-H stretching vibrations, all related to protein molecules (Shenk, Workman, & Westerhaus, 2008; Williams, 2007). The wavelength bands selected for the protein calibration model were similar to those identified as important for protein quantification by Diaz et al. (2014). Table 3 shows summary statistics of the PLS models for predicting
Fig. 3. Regression coefficient curve of the protein model for sweetpotato storage root samples with eight latent variables over the near-infrared spectral range of 1600-2400 nm.
protein content in the storage roots of sweetpotato genotypes. The calibration model was accurate, with a high coefficient of determination (R2c = 0.98) and a low RMSEC (0.26%). The validation model had an R2v of 0.96 and an RMSEP of 0.29%, confirming a good fit. The similar RMSEC and RMSEP values (0.26 and 0.29%, respectively) indicate robust fitting. The model was also characterized by a low absolute bias (0.03), which indicates stability and robustness and suggests that it may be relatively insensitive to external factors such as growing environment and season. The predictive performance of the model was good, with RPD values of 4 or above, which Saeys, Mouazen, and Ramon (2005) and Davey et al. (2009) consider excellent for prediction models of biological samples. Deviations of individual samples in the prediction exercise are visualized in a scatter plot of measured versus predicted values (Fig. 4). The accuracy and precision of NIR were satisfactory for rapid quantification of protein content in sweetpotato samples. Therefore, NIR analysis combined with the model developed in the current study is sufficiently accurate for routine screening of large numbers of sweetpotato genotypes in breeding programs. Given its rapid and reliable analytical capability, NIRS can be used for many applications in food and crop analysis. With its very low operating costs, the technique can be used in breeding programs to assess the genetic diversity of protein content among germplasm collections in developing countries, including sub-Saharan Africa. These data will complement the morphological data collected on the genotypes under field conditions.
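The statistics used to compare the models (RMSEC/RMSEP, R2, RPD and Bias in Eqs. (1)-(5)) can be computed with a few NumPy functions, sketched below. The function names are hypothetical; SD is assumed to be the sample standard deviation (ddof = 1) of the reference values, and R2 here uses the conventional definition 1 - SS_res/SS_tot, with the mean of the reference values in SS_tot.

```python
import numpy as np


def _as_arrays(predicted, actual):
    p, a = np.asarray(predicted, dtype=float), np.asarray(actual, dtype=float)
    if p.shape != a.shape:
        raise ValueError("predicted and actual must have the same shape")
    return p, a


def rmse(predicted, actual):
    """RMSEC (calibration) or RMSEP (validation/prediction): sqrt(sum((y_hat - y)^2) / n)."""
    p, a = _as_arrays(predicted, actual)
    return float(np.sqrt(np.mean((p - a) ** 2)))


def r_squared(predicted, actual):
    """Conventional coefficient of determination, 1 - SS_res / SS_tot."""
    p, a = _as_arrays(predicted, actual)
    ss_res = np.sum((p - a) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpd(predicted, actual):
    """Residual predictive deviation: SD of the reference values / RMSE of the same set."""
    p, a = _as_arrays(predicted, actual)
    return float(np.std(a, ddof=1) / rmse(p, a))


def bias(predicted, actual):
    """Mean difference between predicted and measured values."""
    p, a = _as_arrays(predicted, actual)
    return float(np.mean(p - a))


def summarize(predicted, actual):
    """All four statistics for one data set (calibration or validation)."""
    return {
        "R2": r_squared(predicted, actual),
        "RMSE": rmse(predicted, actual),
        "RPD": rpd(predicted, actual),
        "Bias": bias(predicted, actual),
    }
```

As a consistency check on the reported values, RPDv × RMSEP = 4.00 × 0.29 = 1.16, equal to the SD of the validation set in Table 2, and RPDc × RMSEC = 4.65 × 0.26 ≈ 1.21, equal to the SD of the calibration set.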
The NIRS-derived protein data and the morphological data are both valuable for selecting genetically complementary and unrelated parental sweetpotato clones for designed crosses and effective breeding for high protein content, yield and related agronomic traits. Non-destructive measurement of different quality parameters using NIRS-based techniques has been used to determine the protein content and product profiles of many horticultural and agronomic crops. Existing sweetpotato breeding investigations focus on assessing quality attributes, but very few have integrated
Fig. 4. Scatter plots of NIR-predicted against measured protein content of sweetpotato storage roots, with reference values measured by the Dumas combustion method.
Table 3. Performance statistics of the partial least squares calibration and prediction models for quantifying protein content of sweetpotato storage roots. The model was developed using second-derivative (second-order polynomial) spectral pre-processing.
Variable | LV | Pre-pr | Calibration model: R2c | RMSEC | RPDc | Validation model: R2v | RMSEP | RPDv | Slope | Bias
Protein content (%) | 8 | 2nd Der | 0.98 | 0.26 | 4.65 | 0.96 | 0.29 | 4.00 | 0.98 | 0.003
LV, latent variables; Pre-pr, pre-processing method; R2c, coefficient of determination for calibration; RMSEC, root mean square error of calibration; RPDc, residual predictive deviation for calibration; R2v, coefficient of determination for validation; RMSEP, root mean square error of prediction; RPDv, residual predictive deviation for validation; 2nd Der, second derivative.
quantitative assessment of these variables in one program because of several constraints, including limited financial means. Most traits of interest in sweetpotato, such as protein content and composition, dry matter content, carotenoids, starch and sugars, are based on organic molecules containing C-H, O-H, C-O, N-H, S-H and C-C bonds, which absorb at different wavelength bands. Hence, the NIRS technique can be used for integrated quantification of these traits. NIRS can, therefore, be a useful tool in plant breeding programs aimed at combining several quality traits that satisfy the requirements of both industry and the market. Since NIRS has been used for non-destructive prediction of different chemical properties of intact crop products, it would also be interesting in future to apply the technique to predict the chemical composition, including protein content and profile, of storage roots of diverse sweetpotato samples. Although this study demonstrated the power of NIR to screen sweetpotato genotypes for protein content, some chemical properties of food are measured more accurately with NIRS than others. For instance, NIRS determines total protein content with considerably greater accuracy than amino acid composition. This lower success could be attributed to the relatively low concentration of amino acids in sweetpotato; hence, calibrations for this attribute are likely to rely on secondary correlations with related attributes. In addition, the accuracy of NIR measurement of amino acids in food samples may be affected by the close correlation between some amino acids and total protein content (Baianu et al., 2004). The limitation of NIRS in determining protein quality and amino acid composition is not unique to sweetpotato.
It has been reported for soybeans by Fontaine, Horr, and Schirmer (2001), who deduced that NIR spectroscopy measures amino acid concentration indirectly, by deriving it from the total amount of nitrogen-containing molecules. At this phase of breeding, it is not feasible to consider such detailed analysis owing to the large number of genotypes being screened. However, at advanced stages of evaluation (Laurie, van den Berg, Tjale, Mulandawa, & Mtileni, 2009), amino acid composition will need to be assessed. This will also be critical for making varietal recommendations that address deficiencies in human dietary intake.

4. Conclusion

The interrogation of NIRS spectra together with reference values obtained by destructive analysis, using PLS multivariate analysis, allowed calibration equations to be established for predicting the protein content of sweetpotato samples. This work demonstrated NIRS and associated chemometric analysis to be useful techniques for quantifying the protein content of sweetpotato storage roots. Since protein
content is a highly desirable trait for growers, breeders, processors and consumers alike, rapid, high-throughput analytical methods such as NIRS are important for benchmarking diverse crop varieties. Therefore, the NIRS model developed in this study may help to quantify the protein content of sweetpotato for rapid screening of germplasm in breeding programs, with high throughput and reasonable accuracy. However, there is still a need for continued development of models for predicting different quality parameters in different populations, owing to the inherent variability between species, genotypes and growing regions. New calibration models are required for different populations and genotypes grown in a given agro-ecology. Models for integrated prediction of other biochemical constituents, such as carotenoids, sugars, starch and dry matter content, should also be investigated.

Acknowledgements

This work is supported by the Professional Development Program of the Agricultural Research Council (ARC) of South Africa through ARC-VOP, the National Research Foundation (NRF) of South Africa, and the University of KwaZulu-Natal. The authors are grateful to Richard Burgdorf and Khulekani Mkhonza for technical research support.

References

AOAC. (1980). Official methods of analysis (13th ed.). Washington, DC: Association of Official Analytical Chemists.
Bagchi, T. B., Sharma, S., & Chattopadhyay, K. (2016). Development of NIRS models to predict protein and amylose content of brown rice and proximate compositions of rice bran. Food Chemistry, 191, 21–27.
Baianu, I. C., You, T., Costescu, D. M., Lozano, P. R., Prisecaru, V., & Nelson, R. L. (2004). High-resolution nuclear magnetic resonance and near-infrared determination of soybean oil, protein, and amino acid residues in soybean seeds. In D. L. Luthria (Ed.), Oil extraction and analysis: Critical issues and Comparative studies (pp. 193–240). Champaign, IL: AOCS Press.
Bobelyn, E., Serban, A., Nicu, M., Lammertyn, J., Nicolaï, B. M., & Saeys, W. (2010). Postharvest quality of apple predicted by NIR-spectroscopy: study of the effect of biological variability on spectra and model performance. Postharvest Biology and Technology, 55, 133–143.
Bouis, H. E., Hotz, C., McClafferty, J. V., & Pfeiffer, W. H. (2013). Biofortification: a new tool to reduce micronutrient malnutrition. Food and Nutrition Bulletin, 32, S31–S40.
Bovell-Benjamin, A. C. (2007). Sweet potato: a review of its past, present, and future role in human nutrition. Advances in Food and Nutrition Research, 52, 1–59.
Camps, C., & Christen, D. (2009). Non-destructive assessment of apricot fruit quality by portable visible-near infrared spectroscopy. LWT - Food Science and Technology, 42, 1125–1131.
Clément, A., Dorais, M., & Vernon, M. (2008). Nondestructive measurement of fresh tomato lycopene content and other physicochemical characteristics using visible-NIR spectroscopy. Journal of Agricultural and Food Chemistry, 56, 9813–9818.
Cozzolino, D. (2015). Foodomics and infrared spectroscopy: from compounds to
functionality. Current Opinion in Food Science, 4, 39–43.
Davey, M. W., Saeys, W., Hof, E., Ramon, H., Swennen, R. L., & Keulemans, J. (2009). Application of visible and near-infrared reflectance spectroscopy (Vis/NIRS) to determine carotenoid contents in banana (Musa spp.) fruit pulp. Journal of Agricultural and Food Chemistry, 57, 1742–1751.
De Onis, M., & Blössner, M. (2003). The World Health Organization global database on child growth and malnutrition: methodology and applications. International Journal of Epidemiology, 32, 518–526.
Diaz, J. T., Veal, M. W., & Chinn, M. S. (2014). Development of NIRS models to predict composition of enzymatically processed sweetpotato. Industrial Crops and Products, 59, 119–124.
Ding, X., Ni, Y., & Kokot, S. (2015). NIR spectroscopy and chemometrics for the discrimination of pure, powdered, purple sweet potatoes and their samples adulterated with the white sweet potato flour. Chemometrics and Intelligent Laboratory Systems, 144, 17–23.
ElMasry, G., Sun, D.-W., & Allen, P. (2013). Chemical-free assessment and mapping of major constituents in beef using hyperspectral imaging. Journal of Food Engineering, 117, 235–246.
FAO. (2014). www.FAOSTAT.FAO.org [Online]. Accessed on 15.03.2015.
Fontaine, J., Horr, J., & Schirmer, B. (2001). Near-infrared reflectance spectroscopy enables the fast and accurate prediction of the essential amino acid contents in soy, rapeseed meal, sunflower meal, peas, fishmeal, meat meal products, and poultry meal. Journal of Agricultural and Food Chemistry, 49, 57–66.
Golic, M., Walsh, K. B., & Lawson, P. (2003). Short-wavelength near-infrared spectra of sucrose, glucose, and fructose with respect to sugar concentration and temperature. Applied Spectroscopy, 57, 139–145.
Guthrie, J. A., Reid, D. J., & Walsh, K. B. (2005). Assessment of internal quality attributes of mandarin fruit. 2. NIR calibration model robustness. Australian Journal of Experimental Agriculture, 56, 417–426.
Haase, N. U. (2006).
Rapid estimation of potato tuber quality by near infrared spectroscopy. Starch, 58, 268–273.
Katayama, K., Komaki, K., & Tamiya, S. (1996). Prediction of starch, moisture, and sugar in sweetpotato by near infrared transmittance. HortScience, 31, 1003–1006.
Lammertyn, J., Peirs, J., De Baerdemaeker, J., & Nicolaï, B. M. (2000). Light penetration properties of NIR radiation in fruit with respect to non-destructive quality assessment. Postharvest Biology and Technology, 18, 121–132.
Laurie, S. M., van den Berg, A. A., Tjale, S. S., Mulandawa, N. S., & Mtileni, M. M. (2009). Initiation and first results of a biofortification program for sweetpotato in South Africa. Journal of Crop Improvement, 23, 235–251.
Lebot, V. (2009). Tropical root and tuber crops: Cassava, sweet potato, yams and aroids (p. 413). Oxfordshire, UK: CABI Publishing.
Lebot, V., Champagne, A., Malapa, R., & Shiley, D. (2009). NIR determination of major constituents in tropical root and tuber crop flours. Journal of Agricultural and Food Chemistry, 57, 10539–10547.
Lebot, V., Ndiaye, A., & Malapa, R. (2011). Phenotypic characterization of sweet potato [Ipomoea batatas (L.) Lam.] genotypes in relation to prediction of chemical quality constituents by NIRS equations. Plant Breeding, 130, 457–463.
Liu, Y., Sun, X., & Ouyang, A. (2010). Non-destructive measurements of soluble solid content of navel orange fruit by visible-NIR spectrometric technique with PLSR and PCA-BPNN. LWT - Food Science and Technology, 43, 602–607.
Lu, G., Huang, H., & Zhang, D. (2006a). Application of near-infrared spectroscopy to predict sweetpotato starch thermal properties and noodle quality. Journal of Zhejiang University Science B, 7, 475–481.
Lu, G., Huang, H., & Zhang, D. (2006b). Prediction of sweetpotato starch physicochemical quality and pasting properties using near-infrared reflectance spectroscopy. Food Chemistry, 94, 632–639.
Magwaza, L. S., Landahl, S., Cronje, P. J. R., Nieuwoudt, H. H., Mouazen, A. M., Nicolaï, B. M., et al.
(2014). The use of Vis/NIRS and chemometric analysis to predict fruit defects and postharvest behaviour of ‘Nules Clementine’ mandarin fruit. Food Chemistry, 163, 267–274.
Magwaza, L. S., Opara, U. L., Cronje, P. J. R., Landahl, S., Nieuwoudt, H. H., Mouazen, A. M., et al. (2014a). Assessment of rind quality of ‘Nules Clementine’ mandarin fruit during postharvest storage: 2. Robust Vis/NIRS PLS models for prediction of physico-chemical attributes. Scientia Horticulturae, 165, 421–432.
Magwaza, L. S., Opara, U. L., Cronje, P. J. R., Landahl, S., Nieuwoudt, H. H., Mouazen, A. M., et al. (2014b). Assessment of rind quality of ‘Nules Clementine’ mandarin fruit during postharvest storage: 1. Vis/NIRS PCA models and
relationship with canopy position. Scientia Horticulturae, 165, 410–420.
McGlone, V. A., Jordan, R. B., & Martinsen, P. J. (2002). Vis-NIR estimation at harvest of pre- and post-storage quality indices for ‘Royal Gala’ apple. Postharvest Biology and Technology, 25, 135–144.
Mehrübeoglu, M., & Coté, G. L. P. (1997). Determination of total reducing sugars in potato samples using near-infrared spectroscopy. Cereal Foods World, 42, 409–413.
Ravindran, V., Ravindran, G., Sivakanesan, R., & Rajaguru, S. B. (1995). Biochemical and nutritional assessment of tubers from 16 cultivars of sweet potato (Ipomoea batatas (L.) Lam). Journal of Agricultural and Food Chemistry, 43, 2646–2651.
Sabatier, D., Moon, C., Mhora, T., Rutherford, R., & Laing, M. (2013). Near-infrared reflectance (NIR) spectroscopy as a high-throughput screening tool for pest and disease resistance in a sugarcane breeding programme. In Proc. 86th Ann. Congr. South Afr. Sugar Technol. Assn., Durban, South Africa (pp. 101–106).
Saeys, W., Mouazen, A. M., & Ramon, H. (2005). Potential onsite and online analysis of pig manure using visible and near infrared reflectance spectroscopy. Biosystems Engineering, 91, 393–402.
Saltzman, A., Birol, E., Bouis, H. E., Boy, E., De Moura, F. F., et al. (2013). Biofortification: progress toward a more nourishing future. Global Food Security, 2, 9–17.
Shekhar, S., Mishra, D., Buragohain, A. K., Chakraborty, S., & Chakraborty, N. (2015). Comparative analysis of phytochemicals and nutrient availability in two contrasting cultivars of sweet potato (Ipomoea batatas L.). Food Chemistry, 173, 957–965.
Shenk, J., Workman, J., & Westerhaus, M. (2008). Application of NIR spectroscopy to agricultural products. In D. Burns, & E. Ciurczak (Eds.), Handbook of near-infrared analysis, 35 (3rd ed., pp. 347–386). Boca Raton, Florida: CRC Press.
Shiroma, C., & Rodriguez-Saona, S. (2009). Application of NIR and MIR spectroscopy in quality control of potato chips.
Journal of Food Composition and Analysis, 22, 596–605.
Sun, X., Zhang, H., & Liu, Y. (2009). Nondestructive assessment of quality of ‘Nanfeng’ mandarin fruit by a portable near infrared spectroscopy. International Journal of Agricultural and Biological Engineering, 2, 65–71.
Tumwegamire, S., Kapinga, R., Rubaihayo, P. R., LaBonte, D. R., Grüneberg, W. J., Burgos, G., et al. (2011). Evaluation of dry matter, protein, starch, sucrose, β-carotene, iron, zinc, calcium, and magnesium in East African sweetpotato [Ipomoea batatas (L.) Lam.] germplasm. HortScience, 46, 348–357.
Tumwegamire, S., Rubaihayo, P. R., LaBonte, D. R., Grüneberg, W. J., Kapinga, R., Mwanga, R. O. M., et al. (2011). Genetic diversity in white- and orange-fleshed sweetpotato farmer varieties from East Africa evaluated by simple sequence repeat (SSR) markers. Crop Science, 1132–1142.
Williams, P. (2007). Grains and seeds. In Y. Ozaki, W. McClure, & A. Christy (Eds.), Near-infrared spectroscopy in food science and technology (pp. 165–217). Hoboken, New Jersey: Wiley and Sons Inc.
Williams, P., & Norris, K. H. (2001). Variables affecting near infrared spectroscopic analysis. In P. Williams, & K. H. Norris (Eds.), Near infrared technology in the agriculture and food industries (2nd ed., pp. 171–185). St Paul, MN: The American Association of Cereal Chemists.
Woolfe, J. A. (1992). Sweetpotato: An untapped food resource. Cambridge, UK: Cambridge University Press.
Young, M. W., MacKerron, D. K. L., & Davies, H. V. (1997). Calibration of near-infrared reflectance spectroscopy to estimate nitrogen concentration in potato tissues. Potato Research, 40, 215–220.
Zere, E., & MacIntyre, D. (2003). Inequities in under-five child malnutrition in South Africa. International Journal for Equity in Health, 2, 7.
Zum Felde, T., Burgos, G., Espinoza, J., Eyzaguirre, R., Porras, E., & Grüneberg, W. (2007).
Analysis of carotenoid, iron, zinc, and calcium content of potato (Solanum phureja) and sweetpotato (Ipomoea batatas) using near infrared reflectance spectroscopy (NIRS). In Proceedings of the 13th international conference on near infrared spectroscopy (13th ICNIRS), 15–21 June 2007, Umea, Sweden and Vasa, Finland. Special issue of the J. Near Infrared Spectroscopy. http://www.impublications.com/nir/page/nir-2007. Last accessed 17.12.15.
Zum Felde, T., Burgos, G., Espinoza, J., Eyzaguirre, R., Porras, E., & Grüneberg, W. (2009). Screening for β-carotene, iron, zinc, starch, individual sugars and protein in sweetpotato germplasm by near infrared reflectance spectroscopy (NIRS). In 15th Triennial symposium of the International society for tropical root crops, Lima, Peru, April 2010.