On farm automatic sheep breed classification using deep learning

Computers and Electronics in Agriculture 167 (2019) 105055 Contents lists available at ScienceDirect Computers and Electronics in Agriculture journa...


Computers and Electronics in Agriculture 167 (2019) 105055

Contents lists available at ScienceDirect

Computers and Electronics in Agriculture journal homepage: www.elsevier.com/locate/compag

Sanabel Abu Jwade a,⁎, Andrew Guzzomi b, Ajmal Mian a

a Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley, WA, Australia
b Department of Mechanical Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley, WA, Australia

ARTICLE INFO

Keywords: Agricultural automation; Computer vision; Deep learning; Convolutional neural networks; Sheep breed identification; Image classification

ABSTRACT

Automatic identification of breeds of sheep can be valuable to the sheep industry. Sheep producers need to identify different breeds of sheep to estimate the commercial value of their flock. In many situations, however, farmers find it challenging to identify the breeds of sheep without a great deal of experience. DNA testing is an alternative method for breed identification. However, it is not practical for real-time assessment of large quantities of sheep in a production environment. Hence, autonomous methods that can efficiently and accurately replicate the identification ability of a sheep breed expert, while operating in a farm environment, are beneficial to the industry. Our original contributions in this field include: setting up a prototype computer vision system in a sheep farm; building a database comprising 1642 sheep images of four breeds, captured on a farm and each labelled with its breed by an expert; and training a sheep breed classifier using machine learning and computer vision to achieve an average accuracy of 95.8% with a standard deviation of 1.7. This classifier could assist sheep farmers to accurately and efficiently differentiate between breeds and allow more accurate estimation of meat yield and cost management.

1. Introduction

Profit for sheep producers depends directly on the commercial value of the flock and the cost of growing the sheep. The commercial value of a sheep in Australia depends mainly on its meat weight, also known as carcass weight (MLA, 2017). A study (Rowe and Atkins, 2006) showed that only 80% of the total flock contribute to the farm’s productivity and profitability. Therefore, optimising the remaining 20% of the flock could significantly improve profits. Farmers typically do not have the means to pre-emptively estimate the productivity of their flock, as carcass weight is only determined after the sheep are released and slaughtered at the abattoir. Farmers currently use live weight, obtained through automatic drafting of their flocks, to estimate when to release sheep. In a drafting session, sheep are weighed and segregated into groups based on their live weight. However, studies (Kirton et al., 1984; Hopkins, 1991) have shown that many other factors affect meat yield. For example, live weight includes gut fill, wool weight, and bone frame. Gut fill was found to have a significant impact on meat yield prediction (Kirton et al., 1984). Weights of sheep dropped 2 kg between being weighed off pasture and again after an overnight fast. Because of these differences, Kirton et al. (1984) provided two separate meat yield prediction models for sheep with different gut fills. A general estimation of the gut fill of the flock can be made by estimating the time from their last feed (MLA, 2017).

Another factor that influences meat yield estimation is wool weight (Kirton et al., 1984; MLA, 2017). A number of studies have defined prediction models for wool growth with fairly high accuracy (Hong et al., 2000; Finlayson et al., 1995). For example, Hong et al. (2000) modelled the 12-month pattern of wool growth rate as a function of live weight, breed, age, and rump status. The model predicted a rate of wool growth that ranged between 6 and 16 g/day, with an average root mean square error of 2.86 g/day.

Moreover, different breeds have different meat yields. Identifying the optimum timing to release each breed would be beneficial. Kirton et al. (1984) performed a study on 2207 sheep to investigate factors that affect meat yield. They compared the meat production of shorn sheep of similar live weights sired by twelve different breeds. They found that the meat production was similar for sheep of the same breed but different from one breed to another. For example, Merino cross sheep were found to have a meat yield 0.7 kg less than Southdown cross sheep of the same live weight. This small variation of meat yield between individual sheep can translate into significant losses for enterprises comprising thousands of sheep. The three factors discussed (gut fill, wool weight and breed) can provide farmers with valuable insights about the productivity of their flock. Gut fill and

⁎ Corresponding author.
E-mail addresses: [email protected], [email protected] (S. Abu Jwade), [email protected] (A. Guzzomi), [email protected] (A. Mian).
https://doi.org/10.1016/j.compag.2019.105055
Received 16 June 2019; Received in revised form 10 October 2019; Accepted 13 October 2019
0168-1699/© 2019 Elsevier B.V. All rights reserved.


Fig. 1. Conventional drafter layout with three camera positions: (1) at the exit of the weighing station, (2) at the weighing station and (3) at the entrance of the weighing station.

Fig. 2. The four drafted sheep breeds: (a) Merino; (b) Suffolk; (c) White Suffolk; (d) Poll Dorset; and (e) a sample of the variability in the image dataset due to sheep movement, shadow movement and variable light conditions.

wool weight prediction models already exist in the literature. However, no models for automatic sheep breed identification currently exist. Therefore, this paper focuses specifically on automatic sheep breed identification during the drafting process.

Different breeds of sheep are in many cases phenotypically diverse. Several studies (Carneiro et al., 2010; Asamoah Boaheng et al., 2016; Searle et al., 1989) have used body measurements to identify sheep breeds. Carneiro et al. (2010) successfully identified eleven breeds using only three body measurements, namely: shoulder height, head width and length. Similarly, Asamoah Boaheng et al. (2016) used a mathematical model to classify three African sheep breeds using six body measurements and achieved 86.2% classification accuracy. Body appearance was also used by Searle et al. (1989), where it was found that, at any given live weight, one of the breeds had longer legs and smaller shoulders than the other.

As different breeds have subtle differences in appearance, computer vision (CV) and machine learning (ML) based approaches could offer benefits to the industry. One of the most powerful ML techniques is deep learning. Deep learning uses a cascade of many layers of processing units for feature extraction and transformation. The convolutional neural network (CNN) is one of the most popular algorithms for deep learning and is commonly applied to analyse and classify visual imagery (Zeiler and Fergus, 2013). A CNN employs deep neural network architectures to automatically learn discriminating features from an input image without the need for feature engineering (Hinton et al., 2006). This independence from prior knowledge and human effort in feature design is a major advantage. With its complex architecture, a CNN has the capacity to learn from very large training sets, on the order of tens of thousands of objects or more (Krizhevsky et al., 2017). This capacity allows it to learn objects in realistic settings, including objects which exhibit considerable variability in appearance (Krizhevsky et al., 2017). Because of CNNs, deep learning based image classification now performs better than human vision in many tasks (Devikar, 2018).

A large and growing body of literature has investigated the use of CV to classify and manage livestock based on body condition, cleanliness, lameness, and pain level (Spoliansky et al., 2016; RMIT, 2017; Van Hertem et al., 2014; Lu et al., 2017). However, very few of these applications were designed for sheep farms because of the challenges associated with: 1) segmenting individual sheep from a flock with uniform colour, 2) predicting a naturally deformable body shape and 3) extracting body features in the presence of wool (Sarwar et al., 2018; Kassler, 2001; Burke et al., 2004). CV classification applications in the agricultural field typically use classifiers such as support vector machines (SVM) (Lu et al., 2017), CNNs (Kumar et al., 2018; RMIT, 2017) or a clustered polynomial regression model (Spoliansky et al., 2016). CNNs achieved good accuracy levels in cleanliness (RMIT, 2017) and cattle classification (Kumar et al., 2018), at 80% and 76% respectively. Although the polynomial regression model (Spoliansky et al., 2016) achieved higher accuracy (91%), it is only effective when the output has a continuous range and there is a small number of discriminating features between classes (Armstrong, 2012). SVMs are traditional ML techniques that analyse data for classification and regression. They are based on the idea of finding a hyperplane that best segregates the data points of two different classes (Manning


Fig. 3. The overall pipeline of the proposed methodology. Individual frames are extracted from videos, labelled and preprocessed before being used to train and test deep learning models.

et al., 2009; Nasiriany et al., 2018). Although there is a large amount of literature on identifying different breeds of animals such as dogs, cats and birds (Parkhi et al., 2012; Liu et al., 2012; Devikar, 2018; Atanbori et al., 2016), few studies were found on the automatic classification of sheep breeds. A CNN achieved a very high accuracy of 96% in classifying a large group of dog breeds, but this was only possible with the use of a considerable amount of data (100,000) (Devikar, 2018). Unlike other classifiers, CNNs can learn from much larger datasets.

Training a deep CNN from scratch requires a very large labelled dataset of images (on the order of tens of thousands or more) (Krizhevsky et al., 2017). However, this number of images is not always available in practice. Therefore, it is common to use a limited amount of training data to re-train an existing model in a process called transfer learning (Oquab et al., 2014). In transfer learning, a CNN model that is pre-trained using a very large dataset of images is reused or fine-tuned for a relevant classification task (Yosinski et al., 2014; Oquab et al., 2014).


Fig. 4. Image sets for training a sheep breed classiﬁer: (a) full sheep images with variations; and (b) sheep face images with reduced variations.

weights) (Simonyan and Zisserman, 2014):

• thirteen convolutional layers with a very small receptive field of size 3 × 3,
• five max-pooling layers of size 2 × 2 to carry out spatial pooling,
• rectification non-linearity (ReLU) layers following all hidden layers, and
• three fully-connected layers, with the final layer being the soft-max layer.
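The figure of 138 million parameters can be checked with a short calculation over the layer list above. The following Python sketch is illustrative only: the per-layer channel widths and the 224 × 224 × 3 input are assumptions taken from the standard VGG-16 configuration published by Simonyan and Zisserman, not from this paper, and the function name is hypothetical.

```python
# Channel widths of the 13 convolutional layers; "M" marks a 2 x 2 max-pooling layer.
# ASSUMPTION: widths follow the standard VGG-16 configuration (not stated in this paper).
VGG16_CONV_PLAN = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                   512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]  # three fully connected layers of the ImageNet model


def count_vgg16_parameters(input_size=224, in_channels=3, kernel=3):
    """Count weights and biases of all learnable layers."""
    total = 0
    channels = in_channels
    size = input_size
    for item in VGG16_CONV_PLAN:
        if item == "M":
            size //= 2  # each pooling layer halves the spatial size
        else:
            # kernel x kernel x in x out weights, plus one bias per output channel
            total += kernel * kernel * channels * item + item
            channels = item
    features = channels * size * size  # flattened input to the first FC layer
    for width in FC_SIZES:
        total += features * width + width
        features = width
    return total


print(count_vgg16_parameters())  # 138357544, i.e. roughly 138 million
```

Most of the parameters sit in the first fully connected layer, which is one reason why fine-tuning only the last layers still adjusts a large share of the weights.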

Since there appears to be no automatic sheep breed classiﬁcation system present in the reviewed literature, this paper aims to diﬀerentiate sheep breeds using computer vision and machine learning on farm.

Fig. 5. Preprocessing sheep face images. Images were aligned using two points (right eye and septum). Image alignment was veriﬁed as follows: one colour channel was extracted from three diﬀerent images and then superimposed to create an RGB image. The face in the resulting image reﬂected a good alignment of the three images.

2. Materials and methods

The approach taken in this project was to obtain images of sheep breeds from an operational farm to build a prototype CV system. These images were pre-processed in order to enable training and testing of CNN models.

The VGG-16 model (Simonyan and Zisserman, 2014) is a 16-layer CNN developed by the Visual Geometry Group at the University of Oxford. The model was trained on 1.2 million images belonging to 1000 categories from the ImageNet dataset (Deng et al., 2009). Because of the model’s ability to capture features from images distributed over large and diverse classes, VGG-16 has been shown to be very useful for solving cross-domain image recognition and classification problems (Simonyan and Zisserman, 2014). For example, applying transfer learning on VGG-16 has been explored in bird species classification (Molchanov et al., 2016), pollen-bearing bee recognition (Rodriguez et al., 2018), semantic segmentation (Long et al., 2015), pavement distress detection (Gopalakrishnan et al., 2017) and cell nuclei classification (Bayramoglu et al., 2016). The model improves on prior-art architectures by increasing the depth and using very small convolution filters (Simonyan and Zisserman, 2014). While there are much more sophisticated models which outperform VGG-16 in the ImageNet challenge (Russakovsky et al., 2015), the model is remarkable for its simplicity. As a comparison, the authors of the GoogLeNet model emphasised that their model places notable consideration on memory and power usage because of its complexity (Szegedy et al., 2015). Similarly, Microsoft’s ResNet model, which won the 2015 ImageNet challenge, has 152 layers (He et al., 2016). The architecture of the VGG-16 model comprises 138 million parameters and consists of 41 layers (16 of which have learnable

2.1. Data acquisition and preprocessing

A conventional drafting unit is depicted in Fig. 1. Such a system is installed at Kingston Rest, a commercial sheep-producing farm in the South-West of Western Australia. The farm was visited in autumn, on 2 May 2018, from 10:00 am to 1:00 pm. The weather was partly cloudy with light intermittent showers and an average temperature of 15.6 °C (Bunbury, 2018). A method to obtain thousands of pictures in a farm environment during the drafting process was required to train an ML model. A GoPro Hero5 camera (model: GPCHDHX-502) was used to collect the images of different sheep breeds. This camera model was chosen because it is waterproof, rugged and able to cope with vibration caused by the drafting machines. The camera was set to record at 24 fps with 1920 × 1080 resolution. Different camera positions were tested, as shown in Fig. 1, and evaluated based on the captured videos. A good quality video was defined as one that provided an unobstructed view of the face and the front of the sheep and captured one sheep at a time to avoid the segmentation problem. A total of 160 sheep from 4 different breeds were drafted. These


Fig. 6. VGG-16 architecture. The last six layers of the VGG-16 were fine-tuned in this study.

Table 1
A list of training hyper parameters used in the study.

Parameter name | Description | Selected value
Optimisation algorithm | Updates the network parameters to minimise the loss function by taking small steps in the direction of the negative gradient of the loss | Stochastic Gradient Descent with Momentum (sgdm)
Maximum number of epochs | Specifies the maximum number of full passes through the entire dataset | 10
Iterations per epoch | Specifies the number of parameter updates per epoch | 129
Mini batch size | Specifies the number of observations in each iteration | 10
Initial learning rate | Specifies the global learning rate | 1 × 10⁻⁴
Weights learning rate factor | Specifies the factor by which the global learning rate gets multiplied to determine the learning rate for the weights in fully connected layers | 10
Biases learning rate factor | Specifies the factor by which the global learning rate gets multiplied to determine the learning rate for the biases in fully connected layers | 10
Validation frequency | Specifies the frequency of validation during training | every 3 iterations
Execution environment | Specifies the execution environment of the training | CPU
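To illustrate how the global learning rate and the learning rate factors in Table 1 interact, the following Python sketch shows one common form of the stochastic gradient descent with momentum update. It is a minimal sketch, not the toolbox's implementation: the momentum value of 0.9 is an assumption (the paper does not report it), and the function name is hypothetical.

```python
def sgdm_step(weights, grads, velocity, base_lr=1e-4, lr_factor=1.0, momentum=0.9):
    """One SGD-with-momentum update for a single layer.

    base_lr   : global learning rate (1e-4 in Table 1)
    lr_factor : per-layer multiplier (10 for the fully connected layers,
                0 for a frozen layer)
    momentum  : ASSUMED value; not reported in the paper
    """
    lr = base_lr * lr_factor
    # The velocity accumulates past gradients; the step follows the negative gradient.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# A frozen layer (factor 0) keeps its weights; a fully connected layer
# (factor 10) takes a step ten times larger than the global rate implies.
frozen, _ = sgdm_step([1.0], [2.0], [0.0], lr_factor=0.0)
tuned, _ = sgdm_step([1.0], [2.0], [0.0], lr_factor=10.0)
```

With a learning rate factor of zero the update leaves the weights untouched, which is how layers can be frozen without removing them from the network.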

Table 2
A summary of four conducted experiments.

Experiment objective | Approach | Controlled variable
1. Analysing the effect of varying the number of fine-tuned layers | Two models were trained using the same training parameters such that only the last 3 fully connected layers were fine-tuned for one of the models and the last 6 layers were fine-tuned for the other. Both models were trained for 5 epochs using the same dataset such that 80% of the data was used for training and 20% was used for testing. | Number of fine-tuned layers
2. Evaluating the improvement that the fine-tuned model brings over the baseline VGG-16 model | Pre-trained VGG-16 was used as a deep feature generator where the output of the intermediate layer (fc8) was used to train one-versus-all SVM. The performance of this model was compared to the fine-tuned model. | Transfer learning approach
3. Finding the ideal number of epochs for training | Two models were trained using the same training parameters; however, one having 5 epochs and the other having 10 epochs. The same dataset was used for training the two models. | Number of epochs
4. Analysing the effect of noise and variation in the training dataset | Using the training parameters described in Table 1, a model was trained once using noisy images (full sheep images) and once using less noisy images (sheep face images). Training was done for five folds. | Type of images used for training

were Merino (70), Suffolk (63), White-face Suffolk (16) and Poll Dorset (11), as identified by the grazier. An example of each breed is shown in Fig. 2. The sample of sheep included both ram lambs and ewe lambs at the typical slaughter age of 6 to 8 months, with wool lengths varying between 0 and 200 mm. Although there was a varying number of sheep within each breed, an equal number of frames was taken for each breed in order to have a balanced dataset. Balancing the dataset was important to prevent the model from being biased towards the breed with the maximum number of frames. Four hundred frames for each breed were used. This number was based on the maximum number of unique frames that could be extracted for the breed with the smallest sample (Poll Dorset). The frames were selected


Fig. 7. Pictures of sheep from three camera positions: (a) inside the weighing station; (b) exit of the weighing station; and, (c) entrance of the weighing station. Best results were found at the entrance of the weighing station.

to be unique to ensure the accuracy and reliability of the results. For the classification model to be able to deal with pictures taken in real-life scenarios, it has to be trained on images with a range of variation. The collected dataset successfully captured a range of variations including: varying posture of a moving sheep, varying brightness due to the changing weather, moving shadows and interference in an image. Fig. 2e demonstrates some of the variations in the collected data. The overall pipeline of the methodology, after recording the videos of the sheep being drafted, is presented in Fig. 3:

1) Individual frames were extracted from the captured videos and filtered so that frames with no sheep or with only parts of a sheep were removed.
2) To ensure that the images were sufficiently different, videos were sampled by taking every 4th frame.
3) Frames were labelled by breed. A total of 1642 images were obtained.
4) Two pre-processing methods were applied to each frame:
   i) The frame was cropped to obtain a uniform aspect ratio of images showing the entire sheep.
   ii) Geometric transformation was applied on a zoomed-in image of the face of the sheep to align it with a predefined template.
5) The image dataset was split into a training set (80%) and a testing set (20%).
6) Data augmentation was applied on the training set.
7) Deep learning models were trained and tested using the two sets.
8) Training and testing were repeated in a fivefold cross validation approach.

The sheep face image set was planned to have much less variation to simplify the problem of comparing images of different breeds. The preprocessing for this data aimed to: a) reduce irrelevant features by cropping the face only, b) eliminate posture variation by aligning all face images to one direction and c) rescale face images to have a uniform scale. Fig. 4b shows some sample images from this dataset. The preprocessing steps for this dataset were as follows:

1) Two control points were selected in each image. The control points were the sheep’s right eye and the sheep’s septum separating the nostrils, as shown in Fig. 5. In cases where the right eye was not visible, the left eye was selected and the image was flipped horizontally.
2) A geometric transformation matrix was inferred using the control points and two predefined template points. This matrix was a non-reflective similarity transformation.
3) The transformation matrix was applied on the image.
   a) Translation, rotation, and scaling were applied on the image to align it with the template image.
   b) The right eye and septum were visible at roughly the same position in all images.
4) Images were resized to 224 × 224 pixels to match the input size of the CNN model used.
5) Images used for training were augmented with translations to increase the training dataset.
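The two-point alignment used for the sheep face images (right eye and septum mapped onto two template points) can be sketched as follows. A non-reflective similarity transformation in the plane can be written as z' = a·z + b with complex numbers, where a encodes rotation and scale and b the translation; two point pairs determine a and b exactly. This is an illustrative Python sketch rather than the authors' Matlab code, and the pixel coordinates in the usage lines are hypothetical.

```python
def similarity_from_two_points(src, dst):
    """Return (a, b) such that z' = a*z + b maps src[0]->dst[0] and src[1]->dst[1].

    Points are (x, y) tuples treated as complex numbers x + iy. The map
    combines rotation, uniform scaling and translation, with no reflection.
    """
    p1, p2 = complex(*src[0]), complex(*src[1])
    q1, q2 = complex(*dst[0]), complex(*dst[1])
    if p1 == p2:
        raise ValueError("control points must be distinct")
    a = (q2 - q1) / (p2 - p1)  # rotation and scale
    b = q1 - a * p1            # translation
    return a, b


def apply_similarity(transform, point):
    """Map one (x, y) point through the transformation."""
    a, b = transform
    z = a * complex(*point) + b
    return (z.real, z.imag)


# HYPOTHETICAL coordinates: eye and septum in a source frame, and template
# positions placed on a diagonal of a 224 x 224 image.
source_points = [(300.0, 200.0), (300.0, 400.0)]
template_points = [(80.0, 80.0), (144.0, 144.0)]
transform = similarity_from_two_points(source_points, template_points)
scale = abs(transform[0])  # uniform scale factor applied to the image
```

In practice the same transformation would be applied to every pixel of the image (typically through the inverse map with interpolation), so that the eye and septum land at the template positions in all faces.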

To verify the quality of the sheep face image alignment, a template image was selected such that the right eye and the nostrils were aligned diagonally. Three diﬀerent sheep images were selected and aligned to the template image. Then one colour channel (red, green or blue) was extracted from each respectively and superimposed to create a single RGB image. The quality of the face seen in the resulting image reﬂected how well the three faces were aligned. Fig. 5 demonstrates the veriﬁcation steps.
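The channel superposition check described in this paragraph can be expressed in a few lines. The sketch below is a simplified illustration in plain Python, assuming each image is a list of rows of (R, G, B) tuples of equal size; the function name is hypothetical.

```python
def superimpose_channels(img_red, img_green, img_blue):
    """Build one RGB image from the red channel of the first image, the green
    channel of the second and the blue channel of the third.

    Each image is a list of rows of (r, g, b) tuples; all three images must
    have the same height and width.
    """
    height, width = len(img_red), len(img_red[0])
    for img in (img_green, img_blue):
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("images must have identical dimensions")
    return [
        [(img_red[y][x][0], img_green[y][x][1], img_blue[y][x][2])
         for x in range(width)]
        for y in range(height)
    ]
```

If the three faces are well aligned, edges such as the eyes and nostrils coincide across channels and the composite shows a single sharp face; misalignment appears as coloured fringes.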

Data preprocessing is a necessary step as it prepares data for the training stage. The eﬃciency of the breed classiﬁer depends highly on the quality of its input data. A total of 1642 sheep images were obtained and manually labelled by breed. Finally, four-hundred images of each breed were used to build the dataset for this study. Each image was preprocessed in two diﬀerent ways to build two types of data sets: 1) full sheep dataset with noise and multiple types of variation; and, 2) sheep face dataset with minimal noise and variation. Diﬀerent preprocessing steps were needed to prepare images for each of the two datasets. Matlab scripts were written to automate some parts of the preprocessing tasks such as image labelling, cropping, alignment, resizing and augmentation. The full sheep image set was designed to retain the diﬀerent types of variation within it and hence minimal preprocessing was applied. The preprocessing steps for this set included cropping the images to obtain a uniform aspect ratio (1080 × 1080) and resizing each image to 224 × 224 pixels to match the input size of the CNN model used. Images used for training were augmented with translations along the X and Y direction to produce transformed images in a way similar to what Krizhevsky et al. (2017) describe. This step increases the size of the training data which helps in building a model that is translation invariant and prevents model over-ﬁtting. Fig. 4a shows sample images from this dataset.
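The cropping and translation steps for the full sheep images can be sketched as below. This is a minimal illustration under stated assumptions: the paper reports a 1080 × 1080 crop from 1920 × 1080 frames but does not say where the crop window was placed, so a centred crop is assumed here, and the zero fill for pixels shifted in from outside the image is likewise an assumption. The image is represented as a list of rows for simplicity.

```python
def centre_square_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centred square.

    ASSUMPTION: the crop is centred; the paper only gives the crop size.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def translate(image, dx, dy, fill=0):
    """Shift an image by dx pixels along X and dy pixels along Y.

    `image` is a list of rows; pixels shifted in from outside are set to `fill`.
    """
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = x - dx, y - dy  # source pixel that lands at (x, y)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


print(centre_square_crop_box(1920, 1080))  # (420, 0, 1500, 1080)
```

Each training image can then be paired with a few shifted copies, which enlarges the training set without changing the breed label.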

2.2. Implementation of transfer learning and fine-tuning

Transfer learning was applied on the VGG-16 model to classify four breeds of sheep. A study conducted by Yosinski et al. (2014) found that modern deep neural networks learn general features in the early layers and more specific features, which depend greatly on the data, in the last layers. For this reason, only the last six layers of VGG-16 were fine-tuned in this study to build a sheep breed classifier. The six fine-tuned layers are shown in Fig. 6. To customise the classification part of the network, the last three layers of the network (fc_8, prob and output) were replaced by three new fully connected layers. The new final fully connected layer had 4 classes to match the number of breeds in the sheep image dataset. Learning rates for each group of layers were assigned as follows:

• The first 10 convolutional layers were assigned zero to freeze their weights and maintain the general feature detectors (e.g. edge or colour blob detectors).
• A small initial learning rate (1 × 10⁻⁴) was used to tune the weights of the middle 3 convolutional layers to prevent distorting the original weights too quickly or too much.
• The learning rate factor of the three fully connected layers was set to 10 to allow the network to learn faster in these layers.

To train the network on the sheep image dataset, back propagation was run on the network for several epochs. Every epoch was a full pass through the entire dataset. In every epoch, the model picked every image in the dataset, extracted its features and made predictions in the final layer. These predictions were then compared to the actual label to update the tunable weights through the back-propagation process. The network was trained using MatConvNet toolbox (Matconvnet: cnns for matlab, 2018) on a machine with 2.9 GHz Intel core i7 processor. The detailed training parameters are listed in Table 1. The toolbox allows

Fig. 8. Training progress of the VGG-16 with fine tuning: (a) three layers; the accuracy curve fluctuates and shows random behaviour and (b) six layers; the accuracy curve converges at 95%.


Fig. 9. Accuracy results from ﬁvefold cross validation of pre-trained and ﬁne-tuned VGG-16. Fine-tuned VGG-16 achieved higher average accuracy with lower standard deviation: (a) achieved accuracy in each fold; (b) confusion matrices of ﬁne-tuned VGG-16; (c) confusion matrices of pre-trained VGG-16; and (d) achieved accuracy using two types of images.

eﬀect of various training parameters on the performance of the end model. The performance of the ﬁne-tuned model was compared with the general VGG-16 architecture without ﬁne-tuning. Table 2 gives a summary of the conducted experiments and their set-up.

the visualisation of various metrics during training such as: a) training accuracy on each individual mini-batch, b) validation accuracy on the entire validation set, c) training loss on each mini-batch and d) validation loss on the validation set. The loss is deﬁned as a summation of the errors made for each image in training or validation sets. The progress and performance of the trained network can be assessed using the accuracy and loss curves. For example, the problem of over-ﬁtting, where a model memorises the training data and is not able to generalise to new data, can be detected when the training accuracy is signiﬁcantly higher than the validation accuracy. Similarly, the problem of underﬁtting, where a model performs poorly on the training data, can be discovered when the training accuracy is very low. Several experiments were performed in this study to analyse the

2.3. Evaluation metrics

The quality of a classification model is usually assessed by computing the number of correct predictions over all predictions made (Sokolova and Lapalme, 2009). Therefore, the performance of the trained ML models was evaluated based on the average accuracy and standard deviation. The performance was measured using fivefold cross validation, which gives a statistical estimation of how well


Fig. 10. Examples of classiﬁed sheep images with predicted labels (P) and actual labels (A): (a) classiﬁcation of full sheep images and (b) classiﬁcation of cropped sheep face images.

loss and training accuracy and loss. The plots are shown in Fig. 8. Looking at the accuracy curves of the model with six fine-tuned layers, it can be seen that both training and validation curves increase gradually and converge on a final high percentage. Similarly, the loss curves of the same model show continuous improvement after each iteration of optimisation. In contrast, the model with only three fine-tuned layers shows unstable behaviour with fluctuating accuracy and loss curves. This can be attributed to the model describing random noise instead of the underlying relationship, which is a form of overfitting. Therefore, a model with six fine-tuned layers was considered for further analysis. To identify the best method for training a sheep breed classifier, two transfer learning approaches were evaluated: one fine-tuning six layers of VGG-16 and the other using the pre-trained VGG-16 with an SVM on top of it. It took twenty minutes to train an SVM using features from pre-trained VGG-16 and twelve hours to fine-tune VGG-16 on the same machine. Fine-tuned VGG-16 achieved a high average accuracy of 94%, which is 15% higher than that achieved by the pre-trained VGG-16. It also showed a lower standard deviation than the second model (1.9 versus 3.9), which reflects a consistency in its performance. The performance of the two trained models is depicted in Fig. 9a. A full list of confusion matrices for all five folds of cross validation is shown in Fig. 9b and c. Two types of images were compared to build the sheep breed classifier. The first one was full sheep images with noise and multiple types of variation and the second was cropped sheep face images with

the model would perform on a real farm with new images of sheep. Fivefold cross validation was implemented as follows: data was partitioned into ﬁve equal partitions: four were used for training and one for testing. The training was repeated ﬁve times with each of the ﬁve partitions used exactly once as a testing set. Both average accuracy and standard deviation were used to evaluate the performance. The accuracy of the classiﬁcation system in each fold is depicted using a standard confusion matrix. In addition to average accuracy, the response time of the end model was measured to assess the practicality of the technology for the use of sheep producers.
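The fivefold procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab code: the shuffling seed, the function names and the use of the sample standard deviation are assumptions, since the paper does not state how the partitions were drawn or which standard deviation estimator was used.

```python
import random
import statistics


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) so that every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # ASSUMED: random partitioning
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def summarise(fold_accuracies):
    """Return (mean, sample standard deviation) of the per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


# With 1600 images (400 per breed), each fold trains on 1280 images and tests on 320.
for train, test in five_fold_splits(1600):
    assert len(train) == 1280 and len(test) == 320
```

The accuracies passed to `summarise` would be the per-fold test accuracies of the trained model; no values are shown here because the per-fold figures are only reported graphically in the paper.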

3. Results and discussion

Examples of captured frames from each of the three camera positions are shown in Fig. 7. The best position to mount a camera on farm was found to be at the entrance of the weighing station (i.e. Fig. 7c) of the drafting unit. This is because sheep tended to lower their faces once inside the weighing station, and multiple captures of a single sheep were readily attainable as sheep spent a longer time in the view of the camera than in other positions. The effect of fine-tuning different numbers of layers of VGG-16 was investigated to analyse its impact on the performance of the end sheep breed classifier. A model with six fine-tuned layers showed a more stable behaviour with higher classification accuracy than a model with three fine-tuned layers. The progress of the two models was plotted during the training process using four metrics: validation accuracy and


Fig. 11. Training progress through ten epochs and example classiﬁcation results: (a) accuracy improves and reaches its maximum value in the seventh epoch; (b) average accuracy improves after 5 epochs and standard deviation remains low; and (c) examples of correctly classiﬁed images with conﬁdence levels.

minimal noise and variation. The fine-tuned classification model was found to perform well using either type of image, which reflects its robustness. While full sheep images contained similar facial features to the face images, they also included other parts of the body along with irrelevant noise. The CNN model was able to deal with the variations of moving animals, moving shadows and other forms of noise, which shows the effectiveness of using augmented training datasets. This also shows the power of CNN models in learning key features and filtering out noise. The accurate predictions of the noisy full sheep images can also be attributed to the model’s ability to learn distinguishing features

from the rest of the body in addition to the facial ones. This is similar to what farmers typically do as they look for physical cues on both the face and the body of the sheep. Examples of these cues include the colour of the head, ears, hooves and legs and the presence of neck folds, horns and wool on the legs (sheep characteristics, 2019). The average accuracy results achieved using each of the two image types are displayed in Fig. 9d. Examples of classiﬁcation results from each dataset are shown in Fig. 10. Based on these ﬁndings, the following conclusions can be made:


improvement of 2% after the first five epochs and reached a maximum average accuracy of 95.8% at the seventh epoch. Fig. 11a shows the training progress over 10 epochs. Fig. 11b shows the average accuracy of both models using five-fold cross-validation. Example results of breed classification are shown in Fig. 11c; the model gives correct classifications with relatively high confidence.

Fig. 12 shows the average classification accuracy of the fine-tuned model for each breed across five experiments. Accuracy was highest for the Suffolk breed (98%) and lowest for the Merino breed (92%). The Suffolk result can be attributed to the black spots on the face, a distinctive feature that makes the breed easier to identify. The lower accuracy for Merino could be attributed to the fact that the breed has four basic strains that are diverse in character (Australian merino, 2016). To improve the performance of the classifier, the variation within each class needs to be minimised; hence, having separate classes for the different strains of Merino is recommended.

The misclassified images were analysed to better understand the model's performance. It was found that 76% of the misclassified images involved the Merino breed, as either the actual or the predicted breed. Table 3 lists all misclassifications of the fine-tuned model from three validation folds and Fig. 13 shows a sample of misclassified frames. This high occurrence of Merino in misclassified images can again be attributed to the variability within the Merino class, which reinforces the recommendation to give different Merino strains separate classification labels. Misclassification could also result from crossbred sheep whose parents are of two different breeds. Objective methods for determining the breed of a sheep are needed to identify such animals and to better understand and improve the performance of the model. It is also possible that some of the misclassifications are due to the distance of the sheep from the camera or its posture. This type of misclassification can be rectified by capturing and processing multiple frames of the same sheep as it walks closer to the camera and using the most common prediction across all the frames.

The response time of the sheep breed classifier was measured to study its practicality for use on a farm during the drafting process. The average time the model took to classify a single image was found to be 0.7 s on a machine with a 2.9 GHz processor. In a drafting process, the speed of drafting is bound by the speed of weighing a sheep and the speed of sheep walking through the drafting system. In the drafting videos captured in this study, a single sheep was in the field of view of the camera for seventeen seconds on average, which gives the model sufficient time to identify its breed using multiple images for even higher accuracy. Therefore, the model is fast enough to be

Fig. 12. Average classification accuracy per breed.
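Per-breed accuracy of the kind plotted in Fig. 12 can be derived from a confusion matrix. The sketch below assumes that "accuracy per breed" means the fraction of images of a given actual breed that were predicted correctly (per-class recall); the paper does not state the exact definition, and the counts used here are hypothetical, not the study's results.

```python
def per_class_accuracy(confusion, labels):
    """Return {label: correct / total} for each actual class.

    confusion[i][j] is the number of images whose actual class is
    labels[i] and whose predicted class is labels[j].
    """
    result = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        # Guard against a class with no test images in this fold.
        result[label] = confusion[i][i] / row_total if row_total else float("nan")
    return result


# Hypothetical counts for one fold (rows: actual breed, columns: predicted breed).
breeds = ["Merino", "Poll Dorset", "Suffolk", "White Suffolk"]
example_confusion = [
    [45, 2, 1, 2],
    [1, 48, 0, 1],
    [0, 0, 49, 1],
    [2, 1, 1, 46],
]
example_accuracy = per_class_accuracy(example_confusion, breeds)
```

Averaging these per-breed values over the five folds would give one bar per breed, as in the figure.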

• As long as the underlying classifier is well trained, classifying images with varying noise and variation can be done with similar levels of accuracy.
• Minimal image preprocessing (resizing and augmentation) is required to train an on farm sheep breed classifier. This is because the CNN model proved to be invariant to viewpoint, sheep posture, size or illumination when trained using the full image dataset. This makes sheep breed classification very robust and practical for on farm use.
• A picture of the sheep can be taken at any point when the sheep is in the race. This is because the CNN model was able to handle full body images as well as zoomed-in face images.
• The face of the sheep has key information that differentiates between the different breeds. To identify breeds during a drafting session, cameras should be mounted such that the face of the sheep can be captured clearly.
• Drafting is performed when sheep approach the slaughter age. As sheep in this study were at the typical slaughter age, it may be argued that breed classification using face imaging alone is suitable during drafting sessions.
• As the model showed that breed classification during drafting can be done through the face image alone, a picture of the sheep's face on its own could provide key features for future image-based sheep classifications.

The effect of modifying the maximum number of training epochs was analysed to identify the best training parameters. It took twelve hours to fine-tune VGG-16 for five epochs and twenty-four hours to train it for ten epochs on the same machine. Training for ten epochs showed a slight

Table 3. A list of misclassifications with details of classes and the model's confidence levels.

Predicted class | Actual class | Softmax output
Poll Dorset | Merino | 56.9%
White Suffolk | Merino | 81.6%
Merino | Poll Dorset | 73.8%
Merino | Suffolk | 56.5%
Poll Dorset | White Suffolk | 70.5%
Merino | White Suffolk | 97.7%
Poll Dorset | Merino | 84.7%
Poll Dorset | Merino | 49.7%
Merino | White Suffolk | 96.9%
White Suffolk | Merino | 91.8%
White Suffolk | Merino | 50.4%
Suffolk | Merino | 88.8%
White Suffolk | Merino | 83%
White Suffolk | Merino | 99.7%
Suffolk | Poll Dorset | 54.9%
White Suffolk | Poll Dorset | 96.2%
Merino | Suffolk | 97.4%
Poll Dorset | White Suffolk | 71.8%
White Suffolk | Merino | 95.4%
Poll Dorset | Merino | 89.2%
Merino | Suffolk | 98.7%
Suffolk | White Suffolk | 94.1%
Poll Dorset | White Suffolk | 92.4%
Merino | White Suffolk | 71.4%
White Suffolk | Merino | 99.6%
Merino | Suffolk | 96.2%
White Suffolk | Merino | 82.9%
Suffolk | Merino | 99.9%
Poll Dorset | Merino | 73.1%
Suffolk | Merino | 49.3%
Poll Dorset | Merino | 51.2%
White Suffolk | Merino | 78%
Suffolk | Poll Dorset | 99.8%
Merino | Poll Dorset | 99.2%
Suffolk | White Suffolk | 97.3%
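The share of misclassifications that involve a given breed, as either the predicted or the actual label, is a simple count over (predicted, actual) pairs. The sketch below shows the calculation on a handful of hypothetical pairs; the function name and the sample data are assumptions for illustration and do not reproduce the figure reported in the text.

```python
def share_involving(pairs, breed):
    """Fraction of (predicted, actual) pairs in which breed appears on either side."""
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, actual in pairs if breed in (predicted, actual))
    return hits / len(pairs)


# Hypothetical misclassified frames as (predicted, actual) labels.
example_pairs = [
    ("Poll Dorset", "Merino"),
    ("Merino", "Suffolk"),
    ("White Suffolk", "Merino"),
    ("Suffolk", "Poll Dorset"),
    ("Merino", "White Suffolk"),
]
merino_share = share_involving(example_pairs, "Merino")
```

Running the same count for every breed shows which class dominates the errors and is therefore the best candidate for splitting into finer labels.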


Fig. 13. Examples of misclassified sheep images with predicted labels (P) and actual labels (A).

integrated into a real time drafting system.
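The response-time argument and the multi-frame strategy mentioned in the discussion can be combined in a short sketch. It assumes frames are classified one after another at the reported 0.7 s per image while the sheep is in view for about 17 s, and that the final label is the most common per-frame prediction. The function names are hypothetical, and resolving ties in favour of the label seen first is an assumption, not something the paper specifies.

```python
from collections import Counter


def frames_available(seconds_in_view, seconds_per_image):
    """Number of whole frames that can be classified sequentially while in view."""
    return int(seconds_in_view // seconds_per_image)


def majority_vote(predictions):
    """Return the most common predicted label across frames.

    Ties go to the label that appeared first, because Counter.most_common
    keeps first-encountered order among equal counts.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Seventeen seconds in view at 0.7 s per image allows 24 sequential frames.
budget = frames_available(17, 0.7)
final_label = majority_vote(["Merino", "Poll Dorset", "Merino", "Merino"])
```

A vote over a few frames captured at different distances is enough to smooth over a single poorly posed frame without approaching the time limit.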

a discriminant analysis approach. SpringerPlus 5 (1), 1–12.
Atanbori, J., Duan, W., Murray, J., Appiah, K., Dickinson, P., 2016. Automatic classification of flying bird species using computer vision techniques. Pattern Recogn. Lett. 81 (C), 53–62.
Australian merino, 2016. Australian Association of Stud Merino Breeders, accessed: 2018-04-14. [Online]. Available: http://merinos.com.au/genetics/merino-history/australian-merino.
Bayramoglu, N., Heikkilä, J., 2016. Transfer learning for cell nuclei classification in histopathology images. In: Hua, G., Jégou, H. (Eds.), Computer Vision – ECCV 2016 Workshops. Springer International Publishing, Cham, pp. 532–539.
Bunbury, 2018. WA: daily weather observations 2018. Bom.gov.au [Online]. Available: http://www.bom.gov.au/climate/dwo/IDCJDW6017.latest.shtml.
Burke, J., Nuthall, P., McKinnon, A., 2004. An analysis of the feasibility of using image processing to estimate the live weight of sheep.
Carneiro, H., Louvandini, H., Paiva, S., Macedo, F., Mernies, B., McManus, C., 2010. Morphological characterization of sheep breeds in Brazil, Uruguay and Colombia. Small Ruminant Res. 94 (1), 58–65.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L., June 2009. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
Devikar, P., 2018. Transfer learning for image classification of various dog breeds. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 5 (12), 2707–2715 [Online]. Available: http://ijarcet.org/wp-content/uploads/IJARCET-VOL-5-ISSUE-12-2707-2715.pdf.
Finlayson, J., Cacho, O., Bywater, A., 1995. A simulation model of grazing sheep: Animal growth and intake. Agric. Syst. 48 (1), 1–25.
Gopalakrishnan, K., Khaitan, S.K., Choudhary, A., Agrawal, A., 2017. Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construct Build Mater. 157, 322–330 [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0950061817319335.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hinton, G.E., Osindero, S., Teh, Y.-W., 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18 (7), 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527, PMID: 16764513.
Hong, F., Tan, J., Mccall, D., 2000. Application of neural network and time series techniques in wool growth modeling. Trans. ASAE 43 (1), 139–144.
Hopkins, D., 1991. Estimating carcass weight from liveweight in lambs. Small Ruminant Res. 6 (4), 323–328.
Kassler, M., 2001. Automatic counting of sheep. Meat & Livestock Australia Ltd [Online]. Available: https://www.mla.com.au/download/finalreports?itemId=772.
Kirton, A.H., Carter, A.H., Clarke, J.N., Duganzich, D.M., 1984. Dressing percentages of lambs. New Zealand Soc. Animal Prod. 44, 231–233.
Krizhevsky, A., Sutskever, I., Hinton, G., 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60 (6), 84–90.
Kumar, S., Pandey, A., Sai Ram Satwik, K., Kumar, S., Singh, S.K., Singh, A.K., Mohan, A., 2018. Deep learning framework for recognition of cattle using muzzle point image pattern. Measurement 116, 1–17.
Liu, J., Kanazawa, A., Jacobs, D., Belhumeur, P., 2012. Dog breed classification using part localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (Eds.), Computer Vision – ECCV 2012. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 172–185.
Long, J., Shelhamer, E., Darrell, T., June 2015. Fully convolutional networks for semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Lu, Y., Mahmoud, M., Robinson, P., 2017. Estimating sheep pain level using facial action unit detection. In: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), pp. 394–399.
Manning, C.D., Raghavan, P., Schütze, H., 2009. Introduction to Information Retrieval, 3rd ed. Cambridge University Press.
Matconvnet: cnns for matlab, Vlfeat.org [Online]. Available: http://www.vlfeat.org/

4. Conclusion

This research studied the practicality of automating sheep breed identification using computer vision and machine learning techniques during a drafting session. The study analysed several training parameters and experimented with two types of training images: full sheep images, which included noise and considerable variation, and cropped facial images with minimal noise and variation. The performance of the trained classifiers was evaluated on average accuracy and standard deviation using five-fold cross-validation. We concluded that fine-tuning the last six layers of VGG-16 for 10 epochs achieved the highest classification accuracy of 95.8% with a standard deviation of 1.7. We also observed that feeding the classifier with datasets including variations in viewpoint, sheep posture or illumination had no impact on the final performance of the classifier, which supports its robustness. The model was found to be practical and effective for on farm use as it takes 0.7 s on average to process a single image. The classifier developed in this study could assist sheep meat producers as a tool to accurately and efficiently differentiate between breeds without the need for an expert and allow more accurate estimation of meat yield and cost management. Future work can focus on expanding the existing datasets by adding images of more breeds, taken at different times of the year and of sheep at different ages. It may also be possible to integrate the techniques developed here with the detection and counting techniques developed by others (Sarwar et al., 2018) to identify breeds of sheep in an open area where more than one sheep is captured at once. Moreover, to overcome the uncertainty associated with subjective labels, objective methods need to be used to verify the ground truth labels.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Acknowledgment

Our special thanks go to Alan Garstone, the Kingston Rest sheep producer, who gave us the opportunity to visit his farm and helped identify the sheep breeds.

References

Armstrong, J.S., 2012. Illusions in regression analysis. Int. J. Forecast. 28 (3), 689–694.
Asamoah Boaheng, M., Sam, E., 2016. Morphological characterization of breeds of sheep:


Sarwar, F., Griffin, A., Periasamy, P., Portas, K., Law, J., 2018. Detecting and counting sheep with a convolutional neural network. pp. 1–6.
Searle, T.W., Graham, N.M., Donnelly, J.B., Margan, D.E., 1989. Breed and sex differences in skeletal dimensions of sheep in the first year of life. J. Agric. Sci. 113 (3), 349–354.
sheep characteristics, breeds and facts, 2019. Encyclopedia Britannica [Online]. Available: https://www.britannica.com/animal/sheep.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition.
Sokolova, M., Lapalme, G., 2009. A systematic analysis of performance measures for classification tasks. Informat. Process. Manage. 45 (4), 427–437 [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0306457309000259.
Spoliansky, R., Edan, Y., Parmet, Y., Halachmi, I., 2016. Development of automatic body condition scoring using a low-cost 3-dimensional Kinect camera. J. Dairy Sci. 99 (9), 7714–7725.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
Van Hertem, T., Viazzi, S., Steensels, M., Maltz, E., Antler, A., Alchanatis, V., Schlageter-Tello, A.A., Lokhorst, K., Romanini, E.C., Bahr, C., Berckmans, D., Halachmi, I., 2014. Automatic lameness detection based on consecutive 3D-video recordings. Biosyst. Eng. 119 (C), 108–116.
Yosinski, J., Clune, J., Bengio, Y., Lipson, H., 2014. How transferable are features in deep neural networks? In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (Eds.), Advances in Neural Information Processing Systems 27. Curran Associates Inc, pp. 3320–3328. [Online]. Available: http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf.
Zeiler, M.D., Fergus, R., 2013. Visualizing and understanding convolutional networks.

matconvnet/.
MLA, 2017. Market information services – sheep assessment manual, accessed: 2018-04-14. [Online]. Available: https://www.mla.com.au/globalassets/mla-corporate/prices-markets/documents/minlrs-information-brochures-etc/mla-sheep-assessmentmanual-jan-2017.pdf.
Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J., 2016. Pruning convolutional neural networks for resource efficient transfer learning. CoRR, abs/1611.06440.
Nasiriany, S., Thomas, G., Wang, W., Yang, A., 2018. A Comprehensive Guide to Machine Learning. University of California [Online]. Available: http://snasiriany.me/files/mlbook.pdf.
Oquab, M., Bottou, L., Laptev, I., Sivic, J., 2014. Learning and transferring mid-level image representations using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition. Proceedings, pp. 1717–1724. [Online]. Available: http://search.proquest.com/docview/1677905608/.
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V., 2012. Cats and dogs. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3498–3505.
RMIT, 2017. Automated visual inspection and preparation of live animals for meat processing.
Rodriguez, I.F., Megret, R., Acuna, E., Agosto-Rivera, J.L., Giray, T., March 2018. Recognition of pollen-bearing bees from video using convolutional neural network. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 314–322.
Rowe, J., Atkins, K., 2006. Precision sheep production pipedream or reality? In: Australian Society of Animal Production 26th Biennial Conference, No. 33.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L., 2015. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision (IJCV) 115 (3), 211–252.
