Raman spectroscopy based analysis of milk using random forest classification


Accepted Manuscript. Title: Raman Spectroscopy based Analysis of Milk using Random Forest Classification. Authors: Arslan Amjad, Rahat Ullah, Saranjam Khan, Muhammad Bilal, Asifullah Khan

PII: S0924-2031(18)30159-0
DOI: https://doi.org/10.1016/j.vibspec.2018.09.003
Reference: VIBSPE 2855

To appear in: VIBSPE

Received date: 4-5-2018
Revised date: 3-9-2018
Accepted date: 6-9-2018

Please cite this article as: Amjad A, Ullah R, Khan S, Bilal M, Khan A, Raman Spectroscopy based Analysis of Milk using Random Forest Classification, Vibrational Spectroscopy (2018), https://doi.org/10.1016/j.vibspec.2018.09.003 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Arslan Amjad¹, Rahat Ullah²,*, Saranjam Khan², Muhammad Bilal², Asifullah Khan¹,*

¹ Pattern Recognition Lab, DCIS, Pakistan Institutes of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad 45650, Pakistan.

² Agri. & Biophotonics Division, National Institute of Lasers and Optronics (NILOP), Lehtrar road, Islamabad, Pakistan.

Abstract


The present study proposes a classification system based on the Raman spectra of milk samples. Such a system could help nutritionists recommend healthy food to infants for their proper growth. Raman spectroscopy has previously been used to probe the molecular structure of milk. In the current study, Raman spectral data of milk samples from different species are used for multi-class classification by combining a dimensionality reduction technique with a random forest (RF) classifier. The quantitative and experimental analysis is based on locally collected milk samples from cow, buffalo, goat and human. The classification exploits variations in the intensities of the Raman peaks, which reflect different concentrations of milk components such as proteins, milk fats and lactose. Principal component analysis (PCA) is used for dimensionality reduction in combination with RF to highlight the variations that differentiate the Raman spectra of milk from different species. The proposed technique shows sufficient potential for differentiating milk samples of different species, achieving an average accuracy of about 93.7%, precision of about 94%, specificity of about 97% and sensitivity of about 93%.


Keywords: Raman spectroscopy; Milk; Principal component analysis; Random forest classifier.


* Corresponding Authors Email: [email protected] & [email protected]

1. Introduction

Milk is considered an essential part of a balanced diet and an important source of dietary energy, fats and proteins for infants as well as adults. It is a complex food whose quality is mainly determined by components such as proteins, minerals, vitamins, carbohydrates and fats. Proteins play several important roles in the body, such as fighting diseases, building muscle and regenerating cells [1]. Apart from these main components, milk also contains minerals such as calcium, potassium, magnesium and phosphorus. Calcium and phosphorus are found in milk in approximately the same ratio as in bone [1]. Milk also contains the fat-soluble vitamins A, D, E and K and water-soluble vitamins such as riboflavin (vitamin B2), which help to promote healthy skin and eyes. In terms of world production, about 85% of milk comes from cows, 11% from buffaloes, 2.3% from goats, 1.4% from sheep and 0.2% from camels [2]. The composition of milk corresponds to the requirements of the offspring, which differ from species to species [3].


Differences in milk composition among species have been an area of interest for many years. Studies have shown that the protein content of milk varies from species to species and can be used to identify the species; for example, cow milk contains more protein than human milk, while human milk contains more carbohydrate than cow milk. Similarly, the fatty acid composition of milk differs among species [4,5]. Furthermore, recent studies have shown that milk composition also varies within a species owing to genetics (at the breed and individual level as well as at the species level), physiology (male/female infant, lactation stage, milking interval), nutrition (feed energy value and composition) and environment (location, season). Over the past two decades, interest in milk composition has grown with the development of newer analytical techniques that yield reliable results [6].


Many techniques are used to analyze milk composition, including infrared (IR) spectroscopy, high performance liquid chromatography (HPLC), nuclear magnetic resonance (NMR), electrophoresis and liquid chromatography mass spectrometry (LC-MS). Most of these techniques offer high accuracy and sensitivity but are expensive [7]. Spectroscopic techniques include infrared, UV-visible, luminescence and Raman spectroscopy. Raman spectroscopy is a vibrational spectroscopic technique that is distinguished from other analytical techniques (e.g. HPLC, PCR) by its low cost, ease of use, speed, accuracy, and the extensive information it provides on both the structure and the chemical composition of biomolecules within a sample.


This article proposes the use of Raman spectroscopy to classify milk samples of different species based on their Raman signatures. The technique combines principal component analysis (PCA), used for dimensionality reduction, with a random forest (RF) classifier to highlight the variations that differentiate the Raman spectra of milk from different species. In developed and especially in developing countries, cow, goat and buffalo milk are used as substitutes for breast milk. Differentiating milk by species may help nutritionists suggest suitable milk for an infant, i.e. promote gender-specific healthy foods, which are necessary for the physical development of the child. Such research also opens the way for researchers to analyze and compare milk composition among species using other established techniques. To the best of the authors’ knowledge, the results presented in this article are the first of their kind and have not been reported previously.

2. Data Collection and Analysis

2.1 Sample Collection and Preparation

In total, 602 milk samples were used in this study: 152 of cow (Bos taurus) milk, 210 of human (Homo sapiens) milk, 120 of buffalo (Bubalus bubalis) milk and 120 of goat (Capra aegagrus hircus) milk. All samples were collected during lactation from local farm houses and hospitals in the surrounding areas of Islamabad, Pakistan. Except for human milk, about 20 ml of fresh milk was acquired from each subject during lactation. Human milk samples were collected in capped glass tubes after written consent had been obtained from the feeding mothers. Prior to sample collection, written approval was obtained from the ethical committee of the Pakistan Atomic Energy Commission General Hospital. Uniformity of the samples was ensured by collecting them from the same locality and the same age group. Ethical standards were strictly followed throughout the study. Milk samples were transported to the laboratory in a standard carrier box with ice packs. All samples were stored in a freezer at -16 °C without any processing until further use.

2.2 Instrumental Setup

A 20 μl drop of each milk sample was placed on a glass slide and kept at room temperature for about 30 minutes before recording Raman spectra with a Raman spectrometer (μRamboss, DONGWOO OPTRON, South Korea). The instrument has a spectral resolution of 4 cm⁻¹. A 532 nm laser beam was used for excitation, with a power of 40 mW at the sample surface. A microscope objective with 100X magnification and a numerical aperture of 0.7 was used both to focus the laser and to collect the back-scattered (Raman) light. The exposure time for each Raman spectrum was set to 10 seconds. Five spectra were recorded from each sample at different spot positions to represent the whole sample. An integrated camera in the Raman system was used to capture images of the sample surface; post-excitation images were compared with pre-excitation images to check for any thermal damage caused by the excitation. The spectral range from 600 to 1800 cm⁻¹ was selected for analysis [8].

2.3 Pre-processing and Data Analysis

In the Raman spectra of biological samples, in addition to instrumental noise, a strong fluorescence background arises from natural fluorophores and weakens the Raman signal. Fluorescence is much stronger (10⁶ to 10⁸ times) than Raman scattering and therefore suppresses the Raman signal [9]. All spectra were therefore preprocessed for fluorescence background adjustment, baseline correction and noise removal. A figure of the raw data without any baseline correction or smoothing is provided as a supporting figure (Figure 6S). To represent the whole sample, the five spectra recorded at different locations of each sample were averaged. For smoothing, the ‘loess’ method (local regression using weighted linear least squares and a 2nd order polynomial) with a span of 5 was applied [10]. Preprocessing was performed with a script developed in the MATLAB environment. For background adjustment, the built-in function ‘msbackadj’ was used with the window size set to 50 and the step size set to 40.
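The preprocessing chain described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB script. The baseline step only mimics the idea behind msbackadj: it takes a low quantile of the intensities in windows of 50 points shifted by 40 points and interpolates between window centres (linearly, for simplicity). The smoothing step swaps loess for an unweighted local quadratic fit over a 5-point window (Savitzky–Golay style). The order of the steps (average, then baseline, then smooth) is also assumed, and all function names are hypothetical.

```python
import numpy as np


def average_replicates(replicates):
    """Average the replicate spectra (one per row) recorded from one sample."""
    return np.asarray(replicates, dtype=float).mean(axis=0)


def estimate_baseline(y, window=50, step=40, quantile=0.10):
    """Windowed low-quantile baseline, linearly interpolated to every point.

    Simplified stand-in for MATLAB's msbackadj, not a re-implementation of it.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    starts = list(range(0, max(n - window, 0) + 1, step))
    if starts[-1] + window < n:
        starts.append(n - window)  # make sure the last points are covered
    centers, levels = [], []
    for s in starts:
        segment = y[s:s + window]
        centers.append(s + (segment.size - 1) / 2.0)
        levels.append(np.quantile(segment, quantile))
    return np.interp(np.arange(n), centers, levels)


def smooth_local_quadratic(y, span=5, degree=2):
    """Unweighted local polynomial fit around each point (substitute for loess)."""
    y = np.asarray(y, dtype=float)
    half = span // 2
    out = np.empty_like(y)
    for i in range(y.size):
        lo, hi = max(0, i - half), min(y.size, i + half + 1)
        offsets = np.arange(lo, hi) - i
        coeffs = np.polyfit(offsets, y[lo:hi], min(degree, hi - lo - 1))
        out[i] = coeffs[-1]  # value of the fitted polynomial at offset 0
    return out


def preprocess(replicates, window=50, step=40, span=5):
    """Average the five spot spectra, subtract the baseline, then smooth."""
    mean_spectrum = average_replicates(replicates)
    corrected = mean_spectrum - estimate_baseline(mean_spectrum, window, step)
    return smooth_local_quadratic(corrected, span)
```

For one milk sample, the five spot spectra would be stacked row-wise and passed to preprocess. A windowed low quantile is used for the baseline because narrow Raman peaks occupy only a small part of each window, so the lower intensities mostly follow the slowly varying fluorescence background.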

2.4 Imbalanced Dataset Learning

Imbalanced data have a non-uniform class distribution, which affects classification performance: classification algorithms struggle with accuracy when the classes are unequally represented. Presenting imbalanced data to a classifier affects its generalization, resulting in lower performance on the testing data than on the training data [11]. In the current study, stratified cross-validation is used for parameter selection. Stratified cross-validation is a variation of k-fold cross-validation in which the folds are generated so that each preserves the percentage of samples of each class found in the original data set. The data were stratified with code implemented in MATLAB (2016).
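The authors' stratification code was written in MATLAB and is not given. As a hedged illustration only, a stratified k-fold split can be sketched in Python with NumPy: each class's indices are shuffled and dealt into k nearly equal chunks, so every validation fold keeps roughly the class proportions of the full data set. The function name and seed handling are hypothetical.

```python
import numpy as np


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_indices, validation_indices) pairs.

    Each class's indices are shuffled and dealt into k nearly equal chunks,
    so every validation fold keeps roughly the class proportions of `labels`.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(labels.size, dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        for fold, chunk in enumerate(np.array_split(idx, k)):
            fold_of[chunk] = fold
    return [(np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f))
            for f in range(k)]
```

With the class sizes of Section 2.1 (152 cow, 210 human, 120 buffalo, 120 goat) and k = 5, every validation fold receives 30 or 31 cow, 42 human, 24 buffalo and 24 goat indices, so the smaller classes are never under-represented in any fold.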

2.5 Dimensionality Reduction Techniques

When the dimensionality of the data exceeds the number of samples, the bias introduced by regularization towards the training data can become so large that the model performs poorly [12]; in other words, the learner is affected by redundant and irrelevant attributes. To avoid this problem, a dimensionality reduction (DR) technique is used. DR projects high-dimensional data onto a lower-dimensional space while conveying similar information more concisely [13]. Reducing the dimensionality of the data also makes it possible to plot the data and observe patterns more clearly. Furthermore, it can improve the performance of the machine learning algorithm by reducing feature collinearity. A 3-D visualization of the data from different angles is given in Figure 1.

Figure 1: 3D visualization of training dataset classification from two different angles.


Principal component analysis (PCA) is the most frequently used technique for dimensionality reduction. In PCA, the variables are projected onto a new coordinate system called the principal component (PC) system, in such a way that the variance of the data in the low-dimensional space is maximized. The first principal component (PC1) captures the largest possible variance of the original data, and each successive component captures the largest remaining variance. The second principal component (PC2) must be orthogonal to PC1, and so on for the remaining principal components (PC3, PC4, etc.). In other words, PC2 and PC3 capture variance in the data that is not explained by PC1. In many cases, the first few principal components explain more than 95% of the variation in the data, so the higher PCs can be ignored without losing much information [14].
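A minimal NumPy sketch of PCA as described above, assuming the spectra are arranged as a samples-by-wavenumbers matrix: the data are mean-centred and decomposed with a singular value decomposition, whose right singular vectors are the orthonormal PC directions ordered by decreasing variance. This is an illustration, not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """PCA of a samples-by-wavenumbers matrix via SVD of the mean-centred data.

    Returns the scores (projections onto the first n_components PCs), the PC
    loading vectors, the explained-variance ratio of every PC, and the mean.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained_ratio = s ** 2 / np.sum(s ** 2)
    components = vt[:n_components]
    scores = Xc @ components.T
    return scores, components, explained_ratio, mean
```

New spectra would be projected with (X_new - mean) @ components.T, reusing the training mean and loadings so that test spectra do not influence the PCA basis.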

3. Training of the Proposed Classification Technique

The flowchart of the proposed methodology is shown in Figure 2. First, Raman spectra were acquired from the samples, and pre-processing methods were applied to the collected data. PCA was then used for dimensionality reduction, and RF was used to classify the species with the transformed feature space as the input vector.

Figure 2: Flowchart of the proposed methodology.


3.1 Data Partitioning

The spectral data were partitioned into training and test sets containing 60% and 40% of the samples, respectively. Optimal parameters of the classification model (e.g. number of trees, depth and number of random features) were selected by k-fold cross-validation. Stratified 5-fold cross-validation was applied to the training data: the training data are divided into 5 disjoint subsets, and in each fold 4 subsets (80% of the training data) are used to train the model while the remaining subset (20% of the training data) is used to validate it. The parameters giving the lowest classification error on the validation data were chosen. After parameter selection, the model was trained with these parameters on the whole training set (60% of the data). The resulting model was then used for prediction on the 40% of unseen test data, which was kept separate from the training data, to evaluate its performance. The data partitioning is shown in Figure 3.
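The partitioning and parameter-selection protocol can be sketched in Python with scikit-learn, used here as a stand-in for the authors' MATLAB code: a stratified 60/40 split, a grid search with stratified 5-fold cross-validation on the training portion only, refitting of the best settings on the whole training set, and a single evaluation on the held-out test set. The eight PCs follow Section 3.2; the candidate grid values are hypothetical, since the paper does not list the values it searched.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline


def select_and_evaluate(X, y, seed=0):
    """Stratified 60/40 split, 5-fold CV grid search on the 60%, test on the 40%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.6, stratify=y, random_state=seed)

    # PCA sits inside the pipeline so it is refitted on each training fold.
    pipe = Pipeline([
        ("pca", PCA(n_components=8)),
        ("rf", RandomForestClassifier(random_state=seed)),
    ])
    # Hypothetical candidate values (number of trees, depth, random features).
    grid = {
        "rf__n_estimators": [50, 100],
        "rf__max_depth": [None, 10],
        "rf__max_features": ["sqrt", 0.5],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy", refit=True)
    search.fit(X_train, y_train)  # refit=True retrains the best model on all of X_train

    test_accuracy = search.score(X_test, y_test)  # untouched 40% hold-out
    return search.best_params_, test_accuracy
```

Placing PCA inside the pipeline keeps validation and test spectra out of the PCA fit; the paper does not state whether PCA was fitted before or after splitting, so this is a design choice of the sketch.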

Figure 3: Partitioning of the data set into training, testing and validation sets.

3.2 Model Training and Classification

Since most of the variance in the data set is explained by the first few PCs, only the components capturing most of the variance are retained. In this study, the first eight PCs were used, which together cover 90% of the variance of the data, as shown in Figure 4.
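The component-selection rule can be written as a small helper that returns the smallest number of leading PCs whose cumulative explained variance reaches a threshold. This is a hypothetical sketch: the paper reports that eight PCs reach 90% of the variance (Figure 4) but does not give the per-component values, so the helper would be applied to the explained-variance ratios returned by a PCA fit.

```python
import numpy as np


def n_components_for(explained_ratio, threshold=0.90):
    """Smallest number of leading PCs whose cumulative explained variance
    reaches the threshold (capped at the number of PCs available)."""
    cumulative = np.cumsum(explained_ratio)
    # searchsorted returns the first index where cumulative >= threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))
```

Usage: n_components_for(explained_ratio, 0.90), where explained_ratio holds the variance fractions of the PCs in decreasing order.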

Figure 4: Percentage variance of first ten PCs.

RF is an ensemble learning algorithm that generates multiple classification trees (weak classifiers) and makes a final decision by aggregating their decisions. It can be used for multi-class classification problems. The RF training algorithm applies a technique called bootstrap aggregating, or bagging: each tree is trained on a bootstrap sample of the original training data, and the splits in each tree are determined from a randomly selected subset of the input variables. Bagging many classification trees gives better classification performance because it reduces the variance of the classification model without affecting the bias [15]. Each tree in the forest predicts an output, and the final prediction of the classifier is based on majority voting. The number of trees plays an important role in regulating the bias and variance of the model. Aggregating the predictions of many trees reduces the variance of the model compared with a single tree, and a higher number of trees further reduces the variance and increases the confidence (average accuracy) of the model without affecting the bias. An RF model with a small number of trees has high statistical variance; RF achieves a lower test error solely by variance reduction. However, a large number of trees increases the computational time, so an optimal number of trees must be selected that maintains performance without unnecessarily increasing the computational complexity of the model. In this study, the out-of-bag error (Figure 5) and cross-validation were therefore used to select the optimal number of trees, which was found to be n = 100, keeping the classification model computationally efficient without affecting its accuracy (i.e. maintaining low statistical variance). The out-of-bag classification error (OOBE) is the mean prediction error over the training samples, where each training sample x is predicted using only those trees whose bootstrap samples did not contain x [16].
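The tree-count selection based on the OOB error can be sketched with scikit-learn's RandomForestClassifier (bootstrap=True, oob_score=True), again as a stand-in for the authors' implementation. The candidate tree counts and the function name are hypothetical; in practice the returned errors would be plotted against the tree counts, as in Figure 5, and the smallest forest after which the curve flattens would be chosen.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def oob_error_curve(X, y, tree_counts=(25, 50, 100, 200), seed=0):
    """Out-of-bag classification error for forests of increasing size.

    Each sample's OOB prediction uses only the trees whose bootstrap sample
    did not contain it; the error is 1 - OOB accuracy.
    """
    errors = []
    for n_trees in tree_counts:
        rf = RandomForestClassifier(n_estimators=n_trees, bootstrap=True,
                                    oob_score=True, random_state=seed)
        rf.fit(X, y)
        errors.append(1.0 - rf.oob_score_)
    return np.array(errors)
```

Because each bootstrap sample leaves out roughly a third of the training samples, the OOB error gives a validation-like estimate without setting aside extra data, which is why it is convenient for choosing the number of trees.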

Figure 5: Trend of out-of-bag classification error (OOBE) against generated number of trees.

4. Results and Discussion

4.1 Raman Spectral Analysis

Figure 6 shows the mean vector normalized Raman spectra of milk samples from the four species. All samples show Raman peaks/bands at 880, 920, 1060, 1120, 1150, 1265, 1290, 1435, 1510, 1640 and 1725 cm⁻¹. The Raman peak positions (Raman shifts) are largely the same in all types of samples, indicating similar biomolecular structures, whereas variations in their intensities reflect differences in the content and concentration of these biomolecules. These Raman signatures correspond to fats (lipids), proteins and other constituents that are abundant in milk [16]. The band at 880 cm⁻¹ arises from the δ(ring) mode of tryptophan. Similarly, the peaks at 920 and 1060 cm⁻¹ are due to ν(C–C) stretching vibrations in lipids [17]. The peak at 1120 cm⁻¹ corresponds to ν(C–C) stretching modes of saturated fatty acids, more precisely myristic and palmitic acids, which are abundant in milk fats [18]. Moreover, the bands at 1435 and 1725 cm⁻¹ arise mostly from the CH2 scissoring deformation and the C=O vibrational modes of lipids, respectively [19–21].


The carotenoid β-carotene gives rise to Raman peaks at 1150 and 1510 cm⁻¹ in every sample except buffalo milk [22–24]; in buffalo, β-carotene is converted internally into vitamin A. Similarly, the Raman peaks at 1265 and 1290 cm⁻¹ are associated with C–H bending in R–HC=CH–R and the C–H twisting mode of the –CH2 group, respectively [21,25]. The Raman peak at 1640 cm⁻¹ is more prominent in human milk than in milk from the other species; it arises from the cis double bond stretching ν(C=C) of RHC=CHR [21,22,26]. Human milk contains less protein and fat than the milk of the other species but more lactose (carbohydrate); its energy content is nevertheless the lowest of the four species listed in Table 1. Moreover, goat milk has a protein content (including β-lactoglobulin) similar to that of cow milk. However, the fat globules in goat milk are smaller than those in cow milk, which makes goat milk more easily digestible [8,21,27].

Figure 6: Mean vector normalized Raman spectra of milk samples of four species.


Owing to the high concentration of other components in goat milk, such as oligosaccharides, sialic acid, free amino acids and cysteine, goat milk shows the highest Raman spectral intensity among the species in this study. A comparison of the Raman spectra of human and buffalo milk shows that buffalo milk has the highest fat and total solids content, whereas human milk has the highest carbohydrate (lactose) content, as shown in Table 1.

4.2 PCA & RF based Analysis of Milk using Raman Spectrum

From a classification point of view, the variations in the intensity of the Raman peaks are very important, as they correspond to differences in the concentration of nutrients in milk samples of different species. Peaks with large intensity variations act as strong features, serving as fingerprints for the classification system. A series of trials was performed to verify the correctness of the proposed system. The average classification accuracies for the training and testing phases are 94.30% and 93.97%, respectively, as shown in Table 2. These results were obtained by running the training and testing phases of the classification model five times, and the result of each run is also reported.


The measures used in this work are true positives, true negatives, false positives, false negatives, accuracy, sensitivity, specificity and overall model accuracy. For a given class, true positives (TP) are samples of that class that are classified correctly, and true negatives (TN) are samples of other classes that are correctly rejected. Similarly, samples incorrectly assigned to the class are false positives (FP), while samples that belong to the class but are assigned to another class are false negatives (FN). Accuracy is the ratio of the sum of true positives and true negatives to the total number of samples, (TP + TN)/(TP + TN + FP + FN). Sensitivity is the fraction of positive samples that are correctly classified, TP/(TP + FN). Similarly, specificity is the fraction of negative samples that are correctly classified, TN/(TN + FP).
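The definitions above translate into a few lines of NumPy. The sketch assumes a confusion matrix with actual classes as rows and predicted classes as columns; the function names are hypothetical. Applied to the TP/FP/FN/TN counts of Table 3, it reproduces the Sensitivity and Specificity rows. The per-class values in the row labelled Accuracy appear to coincide with TP/(TP + FP), i.e. precision, and the reported Average Accuracy of 93.63% equals the total of true positives divided by the number of test samples (338/361).

```python
import numpy as np


def counts_from_confusion(cm):
    """One-vs-rest TP, FP, FN, TN per class.

    Convention assumed here: rows = actual class, columns = predicted class.
    """
    cm = np.asarray(cm)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but actually another
    fn = cm.sum(axis=1) - tp  # actually the class, but predicted as another
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn


def class_metrics(tp, fp, fn, tn):
    """Per-class accuracy, sensitivity, specificity and precision."""
    tp, fp, fn, tn = (np.asarray(a, dtype=float) for a in (tp, fp, fn, tn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Usage with the per-class counts of Table 3 (order: cow, human, buffalo, goat).
tp, fp, fn, tn = [86, 124, 63, 65], [11, 0, 10, 2], [5, 2, 9, 7], [259, 235, 279, 287]
metrics = class_metrics(tp, fp, fn, tn)
# metrics["sensitivity"] rounds to 0.95, 0.98, 0.88, 0.90
overall_accuracy = sum(tp) / (tp[0] + fp[0] + fn[0] + tn[0])  # 338 / 361, about 0.9363
```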


The experimental results on the test data for each class (confusion matrix) are shown in Tables 3 and 4. The best classification is achieved for human milk, as is evident from the confusion matrix (Table 4). The sensitivity (1) and specificity (0.99) values suggest that the composition of human milk is distinct from that of the other species (cow, buffalo and goat). These findings agree with experimental results [2] showing that the proportions of milk components in human milk (Table 1) differ from those of the other species. The misclassification rates (Tables 3 and 4) as well as the ratios of milk components (Table 1) suggest that cow, buffalo and goat milk have overlapping features. The overall average accuracy of the model was found to be 93.63%, which shows that the proposed system classifies the species well.

5. Conclusion


This study shows that Raman spectroscopy in combination with PCA and RF can differentiate between milk samples of different species. Such automatic classification of milk samples is very important from a nutritional point of view, especially for infants. A series of experiments was conducted to verify the correctness of the proposed system. The average accuracy of the proposed model was found to be 93.97%, which is promising. In particular, RF showed the highest accuracy for human and goat samples. Based on these findings, the proposed technique can be used to differentiate between milk samples of different species, which can help nutritionists suggest an appropriate milk source for a child, i.e. promote gender-specific healthy food necessary for the physical development of the child. The proposed system could be extended by adding more species, such as camel and sheep, and more data samples to the data set. Raman spectroscopy together with chemometric methods may also help differentiate between species based on other dairy products (e.g. yogurt and butter).

Acknowledgement: We are thankful to Ms. Fatima Batool and Mr. Muhammad Irfan, scientific assistants, Agri. & Biophotonics Division at National Institute of Lasers and Optronics (NILOP), for their assistance in the presented research work.


References

[1] Park Y W and Haenlein G F W (eds.), Handbook of Milk of Non-Bovine Mammals, Blackwell Publishing Professional, Ames, Iowa, USA, 2008.

[2] Gantner V, Mijić P, Baban M, Škrtić Z and Turalija A, Mljekarstvo 65(2015) 223–31.

[3] Jenness R and Sloan R E, Dairy Science 32(1970) 599–612.

[4] Ullah R, Khan S, Ali H, Bilal M and Saleem M, PLoS One 12(2017) 1–10.

[5] Ullah R, Khan S, Ali H, Bilal M, Saleem M, Ahmed M and Mehmood A, J. Raman Spectrosc. 48(2017) 692–696.

[6] Iyengar G V, Elemental Composition of Human and Animal Milk: A Review (a report prepared under the auspices of the IAEA in collaboration with WHO), Vienna, 1982.

[7] Reid L M, O’Donnell C P and Downey G, Trends Food Sci. Technol. 17(2006) 344–53.

[8] Ali H, Nawaz H, Saleem M, Nurjis F and Ahmed M, J. Raman Spectrosc. 47(2016) 706–11.

[9] Shreve A P, Cherepy N J and Mathies R, Appl. Spectrosc. 46(1992) 707–11.

[10] Ullah R, Khan S, Ali H, Bilal M, Saleem M, Mahmood A and Ahmed M, J. Raman Spectrosc. 48(2017) 692–6.

[11] Khalilia M, Chakraborty S and Popescu M, BMC Med. Inform. Decis. Mak. 11(2011) 51.

[12] Makihara Y, Muramatsu D, Iwama H, Ngo T T, Yagi Y and Hossain M A, IJCB IEEE/IAPR International Joint Conference on Biometrics (IEEE) (2014) 1–8.

[13] Burges C J C, Found. Trends Mach. Learn. 2(2009) 275–364.

[14] Abdi H and Williams L J, Wiley Interdiscip. Rev. Comput. Stat. 2(2010) 433–59.

[15] Manly B F J, Multivariate Statistical Methods: A Primer, Chapman and Hall, London–New York, 1986.

[16] Banfield R E, Hall L O, Bowyer K W and Kegelmeyer W P, IEEE Trans. Pattern Anal. Mach. Intell. 29(2007) 173–80.

[17] Liu Z, Davis C, Cai W, He L, Chen X and Dai H, Proc. Natl. Acad. Sci. USA 105(2008) 1410–1415.

[18] Muehlhoff E, Bennett A and McMahon D, Milk and Dairy Products in Human Nutrition, 2013.

[19] Stone N, Kendall C, Smith J, Crow P and Barr H, Faraday Discuss. 126(2004) 141–157.

[20] Huang N Y, Short M, Zhao J H, Wang H Q, Lui H, Korbelik M and Zeng H S, Opt. Express 19(2011) 22892–22909.

[21] El-Abassy R M, Eravuchira P J, Donfack P, von der Kammer B and Materny A, Vib. Spectrosc. 56(2011) 3–8.

[22] Raynal-Ljutovac K, Lagriffoul G, Paccard P, Guillet I and Chilliard Y, Small Rumin. Res. 79(2008) 57–72.

[23] Weng Y-M, Weng R-H, Tzeng C-Y and Chen W, Appl. Spectrosc. 57(2003) 413–8.

[24] Rodrigues Júnior P H, de Sá Oliveira K, Almeida C E R de, De Oliveira L F C, Stephani R, Pinto M da S, Carvalho A F de and Perrone Í T, Food Chem. 196(2016) 584–8.

[25] Manoharan R, Baraga J J, Feld M S and Rava R P, J. Photochem. Photobiol. B 16(1992) 211–33.

[26] McGoverin C M, Clark A S S, Holroyd S E and Gordon K C, Anal. Chim. Acta 673(2010) 26–32.

[27] Almeida M R, Oliveira K de S, Stephani R and de Oliveira L F C, J. Raman Spectrosc. 42(2011) 1548–52.

Table 1: Variation in milk macronutrients among various mammals [2].

Milk      Fat (g/100g)   Protein (g/100g)   Lactose (g/100g)   Ash (g/100g)   Energy (kJ/100g)
Cow       3.3–6.4        3.0–4.0            4.4–5.6            0.7–0.8        270–280
Human     2.1–4.0        0.9–1.9            6.3–7.0            0.2–0.3        207–209
Buffalo   5.3–15.0       2.7–4.7            3.2–4.9            0.8–0.9        420–480
Goat      3.0–7.2        3.0–5.2            3.2–4.5            0.7–0.9        280–290

Table 2: Average percentage accuracies of five experimental runs on test data. Phase Run-1 Run-2 Run-3 Run-4 Run-5 Average Accuracy 93.87

Testing

92.80

95.05

92.80

95.01

94.78

94.30

93.91

94.18

93.35

95.01

93.97


Training

Table 3: Classification results (on test data) with RF for four classes.

                        Actual classes
Performance measure     Cow     Human   Buffalo   Goat
True Positive           86      124     63        65
False Positive          11      0       10        2
False Negative          5       2       9         7
True Negative           259     235     279       287
Accuracy                0.89    1.00    0.86      0.97
Sensitivity             0.95    0.98    0.88      0.90
Specificity             0.96    1.00    0.97      0.99
Average Accuracy (%)    93.63

Table 4: Misclassification results on test data with RF for four classes.

                     Actual classes
Predicted classes    Cow    Human   Buffalo   Goat
Cow                  85     0       2         7
Human                0      126     2         0
Buffalo              4      0       64        2
Goat                 2      0       4         63

Figure 6S: Mean vector normalized Raman spectra of milk samples of four species.