Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 114 (2013) 183–189

Contents lists available at SciVerse ScienceDirect

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy journal homepage: www.elsevier.com/locate/saa

Ernest Teye a,b,*, Xingyi Huang a, Huang Dai a, Quansheng Chen a

a School of Food and Biological Engineering, Jiangsu University, Xuefu Road 301, Zhenjiang 212013, Jiangsu, PR China
b School of Agriculture, Department of Agricultural Engineering, University of Cape Coast, Cape Coast, Ghana

Highlights

• MC and MSC preprocessing enhanced the spectra of fermented cocoa beans.
• Regional differentiation of fermented cocoa beans by NIR spectroscopy.
• The optimal discrimination model was achieved with a support vector machine.
• NIR spectroscopy and a non-linear multivariate method were used to identify cocoa beans.

Article info

Article history:
Received 25 January 2013
Received in revised form 24 April 2013
Accepted 19 May 2013
Available online 29 May 2013

Keywords:
Ghana cocoa beans
Near infrared spectroscopy
Support vector machine

Abstract

A quick, accurate and reliable technique for discriminating cocoa beans according to geographical origin is essential for quality control and traceability management. This study presents the application of near infrared (NIR) spectroscopy and multivariate classification to the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data, and this revealed visible cluster trends. The performance of four multivariate classification methods was compared: linear discriminant analysis (LDA), K-nearest neighbors (KNN), back propagation artificial neural network (BPANN) and support vector machine (SVM). The models were optimized by cross-validation. The SVM model was superior to all the other methods, with a discrimination rate of 100% in both the training and prediction sets after mean centering (MC) preprocessing. BPANN had a discrimination rate of 99.23% for the training set and 96.88% for the prediction set, while the LDA model had 96.15% and 90.63% for the training and prediction sets, respectively. The KNN model had 75.01% for the training set and 72.31% for the prediction set. The non-linear classification methods were superior to the linear ones. Overall, the results showed that NIR spectroscopy coupled with an SVM model could be used successfully to discriminate cocoa beans according to their geographical origin for effective quality assurance.

© 2013 Elsevier B.V. All rights reserved.

* Corresponding author at: School of Food and Biological Engineering, Jiangsu University, Xuefu Road 301, Zhenjiang 212013, Jiangsu, PR China. Tel.: +86 51188792368/15240270079; fax: +86 51188797308.
http://dx.doi.org/10.1016/j.saa.2013.05.063

Introduction

Cocoa bean (Theobroma cacao L.) has, since its discovery, been used in many tasty dishes and is consumed by the majority of the world's population. Globally, the major cocoa producing countries in descending order are Côte d'Ivoire, Ghana, Indonesia, Brazil, Nigeria, Cameroon and Ecuador, where cocoa provides both economic and socio-economic benefits for farmers and for the countries at large. World production of cocoa beans has increased drastically, with nearly 70% coming from West Africa, where Côte d'Ivoire and Ghana together contribute about 60%. Ghana cocoa beans are internationally recognized for their high quality; they have become a reference standard against which other cocoa beans are measured and hence sell at a comparatively higher price [1,2]. Cocoa is Ghana's largest export commodity; more than two million metric tons have been exported over the past 2 years. The cocoa bean is cultivated by farmers in seven cocoa growing regions: Ashanti, Brong Ahafo, Central, Eastern, Volta, Western north and Western south. As with other agricultural produce, the quality attributes of cocoa beans differ from one geographical origin to another because of disparities in production conditions such as climate, soil, harvesting, fermentation and drying [3,4]. Differences in cocoa bean quality according to geographical origin are often recognized and appreciated by the food industry and consumers, and they are normally an important factor in determining price [2]. Recent studies have shown that Ghana and Nigeria cocoa beans received a higher response for strong chocolate flavor than others [5]. Other researchers have also investigated the influence of geographical origin on the aroma and flavor of cocoa beans and observed a similar trend. Othman and co-workers further observed differences in the epicatechin content, antioxidant capacity and phenolic content of cocoa beans from different countries [1]. For finished cocoa products, Cambrai and co-workers observed differences between chocolate samples according to the geographical origin of their cocoa [6].
Clearly, wide differences exist between cocoa beans from different origins, so a rapid analytical tool for discriminating cocoa beans, whether within or between countries, is very important. However, little has been done on the effect of geographical origin for cocoa beans grown in different parts of Ghana. Moreover, it is not easy to discriminate cocoa beans by geographical origin using sensory evaluation. The well-known and accepted methods normally employed, such as gas chromatography–mass spectrometry, high performance liquid chromatography, colorimetry and inductively coupled plasma mass spectrometry [7], are precise, but they are time consuming and tedious, and they require chemicals that are sometimes harmful to the environment. Therefore, a quick and comparatively accurate method is required to differentiate cocoa beans from different geographical origins for quality assurance and control and for monitoring post-harvest activities. Fourier transform near infrared (FT-NIR) spectroscopy is an advanced analytical technique that has been used for qualitative and quantitative analyses in several industries, e.g. the pharmaceutical, petrochemical, textile, agricultural and food processing industries [4,8]. It has several merits over traditional chemical analysis: it is a physical, rapid, accurate, reliable, non-invasive and semi- or non-destructive method, it is environmentally friendly (because no chemicals are involved), and it requires minimal or no sample preparation [4,9,10]. NIR spectroscopy has been used successfully to quantify the fat, nitrogen and moisture content of cocoa powder [10]. Kaffka and co-workers attempted to use the NIR technique to determine the fat, protein and carbohydrate content of cocoa [11], and the prediction of proanthocyanidins in cocoa has also been studied [12].
The potential of the NIR technique has also been exploited in the analysis of coffee in several ways, such as the prediction of caffeine content and roasting color [13], prediction of espresso properties from roasted coffee [14], coffee varietal identification [15], detection of barley added to coffee [16], and quantitative descriptive sensory analysis of Arabica coffee beverages [8]. In addition, Aculey and co-workers used spectroscopic and chromatographic methods to analyze cocoa beans from different cocoa growing regions in Ghana [17]. These studies indicate that FT-NIR spectroscopy could be used to classify raw cocoa beans according to their geographical origin. However, to date there are no studies on the use of NIR spectroscopy for discriminating cocoa beans according to the cocoa growing regions of Ghana and, more importantly, no discussion of which multivariate classification methods to use. The large amount of information in the scanned spectra requires mathematical models to extract the maximum interpretable information from the multivariate data set. Consequently, linear and non-linear supervised pattern recognition methods have become important tools for the analysis of NIR spectra in a wide range of fields, and selecting the most efficient method is very important. In this study, four supervised pattern recognition methods, two linear (linear discriminant analysis and K-nearest neighbors) and two non-linear (artificial neural network and support vector machine), were combined with FT-NIR spectroscopy to discriminate cocoa beans from different cocoa growing regions in Ghana.

Materials and methods

Sample preparation

Cocoa bean samples ready for export were collected from the seven cocoa growing regions in Ghana, as well as from the warehouse of the quality control division of the Ghana Cocoa Board, where each sample was accurately labeled. The regions were Ashanti, Brong-Ahafo, Central, Eastern, Volta, Western north and Western south. Because of the heterogeneity of the beans, each sample was ground separately for 15 s in a small multi-purpose grinder (QE-100, Zhejiang YiLi Tool Co., Ltd., China). The powder from each sample was sieved through a 500 μm mesh before further analysis.
Spectra acquisition

The spectrum of each sample was collected in reflectance mode using an Antaris II near infrared spectrophotometer (Thermo Electron Company, USA) with an integrating sphere. A 10 g sample was placed in a standard sample cup and scanned three times, rotating the cup by 120° between scans. All experiments were conducted at an ambient temperature of 25 ± 1 °C with steady humidity. Each spectrum was the average of 32 scans over the range 10,000–4000 cm⁻¹, recorded at 3.856 cm⁻¹ intervals, which gives 1557 variables.

Software

All calculations and algorithms were carried out in Matlab version 7.14 (Mathworks Inc., USA) under Windows 7 Ultimate. The Antaris II system (Thermo Electron Company, USA) was used for spectra acquisition.

Theory

Spectra preprocessing methods

The raw spectra of the cocoa bean samples (Fig. 1) were preprocessed before further analysis because NIR spectra contain background information and noise in addition to useful information about sample properties. In this study, four spectral preprocessing methods were compared: detrend correction (DC), mean centering (MC), multiplicative scatter correction (MSC) and second derivative (2-Der). These pretreatments were implemented in Matlab version 7.14. DC is normally applied to spectra of densely packed solids such as powders to remove baseline shift and curvilinearity: the log(1/R) values of NIR spectra, with R being the reflectance, often show an increasing trend between 1100 and 2500 nm, and to correct for this effect the baseline is fitted by a second-degree polynomial and subtracted from the spectrum [18]. MC is performed by calculating the average spectrum of the data set and subtracting it from each spectrum. MSC corrects for light scattering caused by differences in particle size; it removes additive and multiplicative effects in the spectra [19]. The 2-Der pretreatment separates overlapping peaks and eliminates baseline drift while enhancing small spectral differences. Its drawback is that it also amplifies noise; to overcome this, the spectra are smoothed with the Savitzky–Golay algorithm, a moving-window polynomial smoothing filter. Each of these pretreatments was applied with every multivariate model to find the one giving the best discrimination rate.

Fig. 1. Raw spectra of cocoa beans from different regions.

Multivariate classification algorithms

Classification algorithms are supervised pattern recognition methods used to develop discrimination models, and choosing an appropriate one is very important for further analysis. In this work, four classification algorithms were investigated: linear discriminant analysis (LDA), K-nearest neighbors (KNN), back propagation artificial neural network (BP-ANN) and support vector machine (SVM).

LDA

Linear discriminant analysis is the most frequently used (linear and parametric) supervised pattern recognition technique. LDA finds the linear combination of features that can be used as a linear classifier [20]. It is based on linear discriminant functions that maximize the ratio of between-class variance to within-class variance [20]. According to Berrueta and co-workers [22], LDA assumes that the classes follow a multivariate normal distribution and are linearly separable. Here, LDA was applied to principal component scores, so the number of principal components (PCs) is crucial to the performance of the LDA discrimination model.

KNN

K-nearest neighbors is a simple but powerful classification method that has been used successfully in a wide range of fields such as medicine, the food industry, and face and signature recognition [21]. It is a linear and non-parametric supervised recognition tool in which the distances between an unknown object and each object of the training set are determined, and the unknown is assigned to the class most represented among its K nearest neighbors [22]. The parameter K has a great impact on the identification rate of the KNN model, so K was optimized by computing the prediction performance for several values, preferably small odd values (e.g. 3 or 5). In this study, PCs were used as the input data of the KNN model, so its efficiency depended on both the number of PCs and K; these two factors were optimized simultaneously. Before that, the data were preprocessed to avoid the effect of the different scales of the variables.
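The KNN procedure above, with K and the number of PCs optimized together, can be sketched in plain Python. This is a minimal illustration, not the authors' Matlab code: the Euclidean distance, majority voting with a nearest-neighbour tie-break, leave-one-out cross-validation for the grid search, and the names `knn_predict`, `loo_rate` and `tune` are all assumptions, and the toy scores are made-up numbers rather than data from the study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length PC-score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_scores, train_labels, query, k=3):
    """Assign `query` to the class most represented among its k nearest neighbours.

    Ties in the vote go to the tied class whose member lies closest to the query.
    """
    ranked = sorted(zip((euclidean(s, query) for s in train_scores), train_labels))
    neighbours = ranked[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # neighbours are sorted by distance, so the first tied label met is the nearest one
    for _, label in neighbours:
        if votes[label] == best:
            return label


def loo_rate(scores, labels, k, n_pcs):
    """Leave-one-out discrimination rate (%) using only the first n_pcs scores."""
    truncated = [s[:n_pcs] for s in scores]
    correct = 0
    for i in range(len(truncated)):
        rest = truncated[:i] + truncated[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        correct += knn_predict(rest, rest_labels, truncated[i], k) == labels[i]
    return 100.0 * correct / len(labels)


def tune(scores, labels, ks=(1, 3, 5), pcs=(1, 2)):
    """Grid search over K and the number of PCs; returns (rate, k, n_pcs) of the best pair."""
    return max((loo_rate(scores, labels, k, p), k, p) for k in ks for p in pcs)


# Toy PC scores for two regions (made-up numbers, not data from the study)
train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
labels = ["Ashanti", "Ashanti", "Volta", "Volta"]
print(knn_predict(train, labels, [0.2, 0.1], k=3))  # Ashanti
print(tune(train, labels, ks=(1,), pcs=(1, 2)))
```

With real data, `scores` would hold the PC scores of the 130 training spectra, and the grid would span the K values and PC counts explored in the study.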

BP-ANN

An artificial neural network is a non-linear, non-parametric technique used to solve regression, classification and discrimination problems. ANN modeling was originally designed to mimic the biological nervous system, which is capable of self-learning from examples [23]. Neural networks are made up of a set of densely interconnected adaptive processing elements called nodes. They can perform parallel computations and, in theory, can incorporate any degree of non-linearity, and ANNs may perform better than linear methods in certain instances. On the other hand, their results are very difficult to visualize, which makes interpretation cumbersome [24]. Commonly used ANNs include counter-propagation, radial basis function, Kohonen, probabilistic and back propagation networks; in this study, the back propagation neural network was adopted. BP-ANN is based on an algorithm that corrects the weights in each layer in proportion to the error propagated back from the following layer [25]. Data are fed forward through the connected neurons without feedback, while the error computed at the output is propagated backward from the output layer to the hidden layer and finally to the input layer. Most applications of ANN to spectral data use PCs as input variables, and the efficiency of the ANN model is improved by additional preprocessing [26]; this was also the case in this work.

SVM

SVM is a non-linear supervised learning method developed by Vapnik and co-workers for two-group classification problems [27]. It finds the optimal boundary between two groups in a vector space, independently of the probability distribution of the vectors in the training set.
When a linear boundary in the low-dimensional input space cannot separate the two classes, SVM transforms the data from the input space into a higher-dimensional feature space in which a separating hyperplane can be found. If the classes are linearly separable, SVM finds the optimal hyperplane that separates the two classes of the training set, and this hyperplane is then used to classify unknown samples. If the classes are separated by a non-linear boundary, a kernel function is used to map the non-separable data implicitly into a higher-dimensional space in which the classes become linearly separable. For more details, readers can refer to [22]. Three kernel functions are commonly used in SVM models: the sigmoid, polynomial and Gaussian kernels. In this study, the Gaussian kernel function was selected because it is the simplest and fastest to compute.
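To make the kernel idea concrete, here is a minimal Python sketch of the Gaussian (RBF) kernel and of the decision function that a trained binary kernel SVM uses to label a new sample. The support vectors, dual coefficients `alphas` and bias below are hypothetical placeholders: obtaining them requires solving the SVM training problem (for example with an SMO-type solver), which is not shown. Extending the binary rule to the seven regions (e.g. by one-versus-one voting) is likewise an assumption; the text does not say which multi-class scheme was used.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2); gamma = 1 / (2 * sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, alphas, labels, bias, gamma=0.5):
    """Decision value f(x) = sum_i alpha_i * y_i * K(s_i, x) + b, with y_i in {+1, -1}.

    The kernel replaces an explicit mapping into the high-dimensional feature
    space: only kernel values between x and the support vectors are needed.
    """
    return sum(a * y * rbf_kernel(s, x, gamma)
               for s, a, y in zip(support_vectors, alphas, labels)) + bias


def svm_classify(x, support_vectors, alphas, labels, bias, gamma=0.5):
    """Return +1 or -1 depending on which side of the separating surface x falls."""
    value = svm_decision(x, support_vectors, alphas, labels, bias, gamma)
    return 1 if value >= 0 else -1


# Hypothetical trained model: one support vector per class, equal weights, zero bias
svs = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
ys = [1, -1]
print(svm_classify([0.1, 0.0], svs, alphas, ys, bias=0.0))  # 1
print(svm_classify([1.9, 2.1], svs, alphas, ys, bias=0.0))  # -1
```

The kernel width `gamma` plays the same role as the penalty and kernel parameters that are normally tuned by cross-validation.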


Results and discussion

Spectra examination

Fig. 1 presents the raw spectra of the cocoa bean samples. They show water absorption bands around 5312 cm⁻¹ and 7202 cm⁻¹, corresponding to the O–H stretching + O–H deformation combination band and the first overtone of O–H stretching, respectively. These regions were excluded from the analysis, together with the noisy regions 10,000–9000 cm⁻¹ and 5000–4000 cm⁻¹. The remaining part of the spectrum that could provide useful information therefore lies in the range 9000–5000 cm⁻¹. This range contains absorptions of carbonyl groups, C–H stretching and deformation, S–H, N–H, –CH2 and –CH3, corresponding to phytochemicals such as polyphenols, proteins, alkaloids, volatile and non-volatile acids and other compounds. In this research, the range 5500–6500 cm⁻¹ was selected because it contains relevant features of the organic constituents and is free of the water absorption bands, which made it more convenient to use. Fig. 2 shows the selected range (5500–6500 cm⁻¹) after second derivative (2-Der) preprocessing; the major peaks carry useful features for classification. The selected spectra were then further processed, giving the visible groupings seen in Fig. 3.
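The band selection and 2-Der step can be sketched in plain Python as below, under stated assumptions. The wavenumber axis is rebuilt hypothetically as 1557 equally spaced points starting at 4000 cm⁻¹ in 3.856 cm⁻¹ steps (the instrument's exact axis is not given), and the second derivative uses the standard 5-point, quadratic Savitzky–Golay weights (2, −1, −2, −1, 2)/7, since the paper does not report its window size or polynomial order. The synthetic quadratic spectrum is only a check that the filter returns the known second derivative.

```python
def wavenumber_axis(start=4000.0, step=3.856, n=1557):
    """Hypothetical ascending wavenumber axis (cm^-1) with the reported 1557 points."""
    return [start + i * step for i in range(n)]


def select_band(axis, spectrum, low=5500.0, high=6500.0):
    """Keep only the points whose wavenumber lies in [low, high]."""
    keep = [i for i, wn in enumerate(axis) if low <= wn <= high]
    return [axis[i] for i in keep], [spectrum[i] for i in keep]


# 5-point quadratic Savitzky-Golay second-derivative weights (to be divided by 7 * h^2)
SG_D2_WEIGHTS = (2, -1, -2, -1, 2)


def sg_second_derivative(spectrum, spacing=1.0):
    """Savitzky-Golay smoothed second derivative.

    A quadratic is least-squares fitted in each 5-point window and its second
    derivative at the window centre is returned; the two points at each end,
    which have no full window, are dropped.
    """
    half = len(SG_D2_WEIGHTS) // 2
    out = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        acc = sum(w * v for w, v in zip(SG_D2_WEIGHTS, window))
        out.append(acc / (7.0 * spacing ** 2))
    return out


axis = wavenumber_axis()
spectrum = [0.001 * (wn - 6000.0) ** 2 for wn in axis]  # synthetic quadratic, not real data
band_axis, band = select_band(axis, spectrum)
d2 = sg_second_derivative(band, spacing=3.856)
print(len(band), round(d2[0], 6))  # 259 0.002
```

A wider window or a higher polynomial order smooths more strongly; those choices would need to match whatever settings were used in Matlab.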

Fig. 3. Mean spectra of cocoa beans from the seven regions.

Principal component analysis (PCA)

Principal component analysis is an unsupervised pattern recognition method used to visualize data trends in a low-dimensional space. It reduces the dimension of the data matrix by compressing the information into a few orthogonal variables called principal components (PCs) [28]. To observe cluster trends, a score plot of the first three principal components (PC1, PC2 and PC3) is commonly used; this retains the important information and discards less useful variation, so that similar samples cluster close to each other. The resulting plot can be used to assess differences within and between clusters. PCA is not a classification tool, but it can reveal spectral trends related to the seven cocoa growing regions. In this experiment, all the NIR spectra of cocoa beans from the seven regions were used for PCA, and a score plot of PC1, PC2 and PC3 was obtained. Fig. 4 shows the outcome of the PCA and reveals separations between the geographical locations. All the samples clustered well in the space of the three PCs, where PC1, PC2 and PC3 explain 87.63%, 7.34% and 2.43% of the variance, respectively, giving a cumulative contribution of 97.4% for the 194 samples used in the study. PC1 captures the direction of maximum variance, while noise and measurement error are mainly left to later components [28]. These three PCs reflect chemical compositional information in the NIR region; cocoa beans differ considerably in chemical properties according to their geographical origin, pre-harvest activities and post-harvest practices.

Fig. 4. Score plot of the first three principal components (PCs) for all samples.

After the initial cluster analysis, the samples were used as input data for multivariate classification. They were divided into a training set of 130 samples, used to build the models, and a prediction set of 64 samples, used to test the reliability and stability of the models. To avoid bias in the division of the subsets, for every five samples three spectra were randomly selected for the training set, while the remaining samples were used as the test set. This procedure was carried out in Matlab.

Performance of models

Every multivariate classification model has its strengths and weaknesses. In this work, each of the methods used showed its own potential for classifying cocoa beans from different geographical locations.

Fig. 2. Spectra of cocoa beans after second derivative (2-Der) preprocessing.

Linear discriminant analysis (LDA) model

Fig. 5 shows the performance of the LDA model. The optimal number of PCs was chosen according to the best discrimination rate obtained by cross-validation. The best discrimination rate was 96.15% for the training set and 90.63% for the prediction set, at an optimal number of PCs = 8. Table 1 shows the influence of the preprocessing methods: mean centering gave the best LDA performance in both the training and prediction sets.

Fig. 5. Cross-validation discrimination rates of LDA models at different numbers of PCs.

K-nearest neighbors (KNN) model

Fig. 6 shows a 3D plot of the performance of the KNN model. KNN, as a linear and non-parametric method, achieved its optimal performance with PCs = 9 and K = 2. The best preprocessing method was multiplicative scatter correction (MSC). From Table 2, the optimal discrimination rate of the KNN model was 86.15% for the training set and 84.38% for the prediction set.

Back propagation artificial neural network (BP-ANN) model

A non-linear approach was also considered. Fig. 7 shows the performance of the BP-ANN model for the classification problem after cross-validation. The parameters considered were the number of neurons in the hidden layer and the number of PCs, together with their influence on the discrimination rate. Table 3 shows that mean centering (MC) was superior to the other preprocessing methods. The optimal performance of the BP-ANN model was 99.23% in the training set and 96.88% in the prediction set. This model performed better than the two linear methods, and its parameters had a clear influence on its performance.

Support vector machine (SVM) model

Fig. 8 shows the discrimination rate of the SVM model after cross-validation, which was done to ensure the stability of the model. Table 4 shows the effect of the preprocessing methods. The SVM model, a non-linear classification method, had a discrimination rate of 100% for both the training and prediction sets. From Table 4, MC and MSC gave promising discrimination rates between 96% and 100% for both the training and prediction sets. Generally, the non-linear models performed better than the linear ones.

Table 1
The influence of preprocessing on the LDA model.

Preprocessing method   Discrimination rate (%)
                       Training set   Prediction set
MC                     96.15          90.63
MSC                    90.77          90.63
Detrend                86.92          65.63
2-Der                  93.08          78.13
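The rates in Table 1 and the tables that follow are identification (discrimination) rates: the percentage of correctly classified samples N1 out of the N2 samples in the training set (130) or the prediction set (64). A small Python helper makes the arithmetic explicit. The function names are illustrative, and `implied_correct` merely back-calculates the whole number of correctly classified samples that a reported percentage corresponds to.

```python
def identification_rate(n_correct, n_total):
    """Identification rate IR (%) = N1 / N2 * 100."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    return 100.0 * n_correct / n_total


def implied_correct(rate_percent, n_total):
    """Nearest whole number of correctly classified samples behind a reported rate."""
    return round(rate_percent * n_total / 100.0)


# LDA with MC (Table 1): 96.15 % of 130 training and 90.63 % of 64 prediction samples
print(implied_correct(96.15, 130), implied_correct(90.63, 64))  # 125 58
print(f"{identification_rate(125, 130):.2f}")  # 96.15
```

For example, the 100% rate of the SVM model means that all 64 prediction samples were assigned to their correct region.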

Fig. 6. Cross-validation discrimination rates of the KNN model at different numbers of PCs and values of parameter K.

Table 2
The influence of preprocessing on the KNN model.

Preprocessing method   Discrimination rate (%)
                       Training set   Prediction set
MC                     54.62          48.44
MSC                    87.69          85.93
Detrend                50.00          46.88
2-Der                  72.31          75.00
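Tables 1 and 2 show that the choice of pretreatment changes the discrimination rate considerably. As a hedged illustration of what the MC, MSC and detrend (DC) steps described under Theory do to a set of spectra, here is a dependency-free Python sketch. It follows the textbook definitions (MSC regresses each spectrum on the mean spectrum; detrend subtracts a least-squares quadratic baseline) and is not the authors' Matlab code; all function names and the toy spectra are illustrative.

```python
def column_means(spectra):
    """Mean spectrum (average over samples at each wavelength)."""
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def mean_center(spectra):
    """MC: subtract the mean spectrum of the data set from every spectrum."""
    mean = column_means(spectra)
    return [[v - m for v, m in zip(s, mean)] for s in spectra]


def msc(spectra, reference=None):
    """MSC: fit s = a + b * ref by least squares, then return (s - a) / b."""
    ref = reference if reference is not None else column_means(spectra)
    r_mean = sum(ref) / len(ref)
    sxx = sum((r - r_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / len(s)
        slope = sum((r - r_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * r_mean  # additive (offset) effect
        corrected.append([(v - intercept) / slope for v in s])  # slope = multiplicative effect
    return corrected


def _solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def detrend(spectrum):
    """DC: subtract a least-squares second-degree polynomial baseline."""
    n = len(spectrum)
    t = [i - (n - 1) / 2.0 for i in range(n)]  # centred index improves conditioning
    basis = [[1.0, v, v * v] for v in t]
    ata = [[sum(p[i] * p[j] for p in basis) for j in range(3)] for i in range(3)]
    aty = [sum(p[i] * y for p, y in zip(basis, spectrum)) for i in range(3)]
    c0, c1, c2 = _solve3(ata, aty)
    return [y - (c0 + c1 * v + c2 * v * v) for v, y in zip(t, spectrum)]


base = [1.0, 2.0, 4.0, 3.0, 2.5]
spectra = [base, [2.0 * v + 1.0 for v in base]]  # second spectrum: offset + scaling
print(mean_center(spectra)[0])
print(msc(spectra)[1])  # both MSC-corrected spectra equal the mean spectrum
```

Because the second toy spectrum differs from the first only by an offset and a scale factor, MSC maps both onto the same reference, which is exactly the kind of scatter effect the method is meant to remove.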

Table 5 compares the identification rates of LDA, KNN, BPANN and SVM. The identification rate (%) is an important measure of model performance and was calculated as

$$\mathrm{IR} = \frac{N_1}{N_2} \times 100 \qquad (1)$$

where IR is the identification rate (%), N1 is the number of correctly identified samples in the training or prediction set, and N2 is the total number of samples in that set.

Fig. 7. Training and validation performance of the BP-ANN model.

Table 3
The influence of preprocessing on the BPANN model.

Preprocessing method   Discrimination rate (%)
                       Training set   Prediction set
MC                     99.23          96.88
MSC                    96.20          95.31
Detrend                90.77          87.63
2-Der                  96.15          85.94

Fig. 8. Cross-validation discrimination rates of the SVM model at different numbers of PCs.

Table 4
The influence of preprocessing on the SVM model.

Preprocessing method   Discrimination rate (%)
                       Training set   Prediction set
MC                     100            100
MSC                    96.92          96.88
Detrend                100            95.75
2-Der                  97.69          92.19

Table 5
The overall performance of the multivariate classification methods.

Model   Total cocoa bean samples        Identification rate (%)
        Training set   Prediction set   Training set   Prediction set
LDA     130            64               96.15          90.63
KNN     130            64               72.31          75.00
BPANN   130            64               99.23          96.88
SVM     130            64               100            100

General discussion

When NIR radiation is applied to the cocoa bean samples, the incident radiation is reflected, transmitted and absorbed depending on their chemical composition. This produces a spectrum with multiple broad bands and few distinct peaks, as seen in Fig. 1. These bands are made up of overtones and combinations of fundamental vibrations of the organic constituents of the material. Although the spectra look very similar to the eye, they contain both useful and non-useful information, hence the need for advanced multivariate methods to extract the useful information from each spectrum. Moreover, to eliminate the influence of the water absorption bands and noise, the spectral range had to be selected carefully to avoid the water bands; in this work, the range 5500–6500 cm⁻¹ was selected. As Fig. 2 shows, this range contains chemical information that can be used to discriminate the cocoa beans. The mean absorbance was then further processed and revealed seven different groupings (Fig. 3) corresponding to the seven cocoa growing regions, presumably because each group has unique chemical and physical properties. PCA, as an unsupervised pattern recognition tool, also gave visible cluster trends (Fig. 4). The groupings could be explained by differences in chemical properties resulting from differences in geographical origin, pre-harvest activities and post-harvest practices. The first three PCs accounted for 97.4% of the total variance in the raw data; however, partial overlapping of the clusters was observed. Fig. 4 shows that the Western region, with its two locations (north and south), had the most overlap, followed by Eastern and Ashanti. This implies that the differences between them are not very pronounced, and a map shows that these locations are close to each other. Central, Volta and Brong-Ahafo clustered differently, with a few overlapping data points. These differences may result from their geographical locations and partly from pre-harvest and post-harvest activities. Overall, the PCA results were satisfactory and also indicated broadly similar cocoa bean quality across Ghana. However, PCA is not designed for discrimination: it reduces dimensionality while preserving as much of the variance as possible, without using class information. Hence, supervised pattern recognition methods, which are known to be strong at discrimination, were employed. The four supervised techniques were cross-validated to ensure their stability (Figs. 5–8). The results showed that the two non-linear models (BPANN and SVM) were superior to the two linear methods (LDA and KNN). This could be explained by the stronger self-learning and self-adjustment capability of non-linear models; the complex organoleptic and nutritional properties of the cocoa beans used in this study could explain why the linear classification algorithms could not provide an optimal solution. Of the two non-linear techniques, the SVM model had the best discrimination performance. This may be because SVM embodies the structural risk minimization principle, which minimizes an upper bound on the expected risk, whereas BPANN can have difficulty generalizing and may overfit the data [4,29].

Conclusion

The following conclusions can be drawn from this research. Near infrared spectroscopy coupled with multivariate classification methods has proved to be a powerful tool for discriminating cocoa bean samples according to their geographical origin. The best preprocessing method was MC, followed by MSC.
Of the four classification methods used to develop discrimination models, two linear (LDA and KNN) and two non-linear (BPANN and SVM), the two non-linear models performed better than the linear ones, and the SVM model was superior to all the others. SVM can therefore be regarded as an excellent multivariate method for building identification models for the accurate discrimination of fermented cocoa beans. In general, NIR spectroscopy combined with SVM can be exploited to discriminate cocoa beans from different geographical locations for quality assurance, quality control and monitoring, as an alternative to the more tedious and time-consuming wet chemistry methods normally used.

Acknowledgements

The authors are grateful to Jiangsu University for its financial support. We also thank the Quality Control Company Ltd. of the Ghana Cocoa Board and the cocoa farmers for their reliable assistance. Stimulating discussions with colleagues and proofreading by Mrs. Winifred Teye are gratefully acknowledged.

References

[1] A. Othman, A. Ismail, N. Abdul Ghani, I. Adenan, Food Chemistry 100 (2007) 1523–1530.
[2] S. Jinap, P. Dimick, R. Hollender, Food Control 6 (1995) 105–110.
[3] G.A. Reineccius, D.A. Andersen, T.E. Kavanagh, P.G. Keeney, Journal of Agricultural and Food Chemistry 20 (1972) 199–202.
[4] Q. Chen, J. Zhao, H. Lin, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 72 (2009) 845–850.
[5] A. Caligiani, M. Cirlini, G. Palla, R. Ravaglia, M. Arlorio, Chirality 19 (2007) 329–334.
[6] A. Cambrai, C. Marcic, S. Morville, P. Sae Houer, F. Bindler, E. Marchioni, Journal of Agricultural and Food Chemistry 58 (2010) 1478–1483.
[7] D.M.A.M. Luykx, S.M. van Ruth, Food Chemistry 107 (2008) 897–911.
[8] J.S. Ribeiro, M.M.C. Ferreira, T.J.G. Salva, Talanta 83 (2011) 1352–1358.
[9] I. Esteban-Diez, J.M. González-Sáiz, C. Pizarro, Analytica Chimica Acta 514 (2004) 57–67.

[10] A. Veselá, A.S. Barros, A. Synytsya, I. Delgadillo, J. Čopíková, M.A. Coimbra, Analytica Chimica Acta 601 (2007) 77–86.
[11] K. Kaffka, K. Norris, F. Kulcsar, I. Draskovits, Acta Alimentaria 11 (1982) 271–288.
[12] E. Whitacre, J. Oliver, R. van den Broek, P. van Engelen, B. Kremers, B. van der Horst, M. Stewart, A. Jansen-Beuvink, Journal of Food Science 68 (2003) 2618.
[13] C. Pizarro, I. Esteban-Diez, A.J. Nistal, J.M. González-Sáiz, Analytica Chimica Acta 509 (2004) 217–227.
[14] I. Esteban-Diez, J.M. González-Sáiz, C. Pizarro, Analytica Chimica Acta 525 (2004) 171–182.
[15] G. Downey, R. Briandet, R.H. Wilson, E.K. Kemsley, Journal of Agricultural and Food Chemistry 45 (1997) 4357–4361.
[16] H. Ebrahimi-Najafabadi, R. Leardi, P. Oliveri, M. Chiara Casolino, M. Jalali-Heravi, S. Lanteri, Talanta 99 (2012) 175–179.
[17] P.C. Aculey, P. Snitkjaer, M. Owusu, M. Bassompiere, J. Takrama, L. Nørgaard, M.A. Petersen, D.S. Nielsen, Journal of Food Science 75 (2010) S300–S307.
[18] J. Luypaert, S. Heuerding, Y.V. Heyden, D. Massart, Journal of Pharmaceutical and Biomedical Analysis 36 (2004) 495–503.
[19] M. Dhanoa, S. Lister, R. Sanderson, R. Barnes, Journal of Near Infrared Spectroscopy 2 (1994) 43–47.
[20] Q. Chen, J. Cai, X. Wan, J. Zhao, LWT – Food Science and Technology 44 (2011) 2053–2058.
[21] M. O'Farrell, E. Lewis, C. Flanagan, W. Lyons, N. Jackman, Sensors and Actuators B: Chemical 111–112 (2005) 354–362.
[22] L.A. Berrueta, R.M. Alonso-Salces, K. Héberger, Journal of Chromatography A 1158 (2007) 196–214.
[23] I.V. Kovalenko, G.R. Rippke, C.R. Hurburgh, Journal of the American Oil Chemists' Society 83 (2006) 421–427.
[24] B.M. Nicolaï, K. Beullens, E. Bobelyn, A. Peirs, W. Saeys, K.I. Theron, J. Lammertyn, Postharvest Biology and Technology 46 (2007) 99–118.
[25] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 2008.
[26] D. Pérez-Marín, A. Garrido-Varo, J.E. Guerrero, Talanta 72 (2007) 28–42.
[27] C. Cortes, V. Vapnik, Machine Learning 20 (1995) 273–297.
[28] A.S. Luna, A.P. da Silva, J.S.A. Pinho, J. Ferré, R. Boqué, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 100 (2013) 115–119.
[29] H. Lin, Q. Chen, J. Zhao, P. Zhou, Journal of Pharmaceutical and Biomedical Analysis 50 (2009) 803–808.
