Computers and Electronics in Agriculture 167 (2019) 105055

Contents lists available at ScienceDirect
Computers and Electronics in Agriculture
journal homepage: www.elsevier.com/locate/compag

On farm automatic sheep breed classification using deep learning

Sanabel Abu Jwade a,*, Andrew Guzzomi b, Ajmal Mian a

a Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley, WA, Australia
b Department of Mechanical Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley, WA, Australia

ARTICLE INFO

Keywords: Agricultural automation; Computer vision; Deep learning; Convolutional neural networks; Sheep breed identification; Image classification

ABSTRACT
Automatic identification of sheep breeds can be valuable to the sheep industry. Sheep producers need to identify the different breeds in their flock to estimate its commercial value, yet without a great deal of experience farmers often find breed identification challenging. DNA testing is an alternative method of breed identification; however, it is not practical for real-time assessment of large numbers of sheep in a production environment. Hence, autonomous methods that can efficiently and accurately replicate the identification ability of a sheep breed expert, while operating in a farm environment, would benefit the industry. Our original contributions in this field include: setting up a prototype computer vision system on a sheep farm; building a database comprising 1642 sheep images of four breeds, captured on farm and each labelled with its breed by an expert; and training a sheep breed classifier using machine learning and computer vision that achieves an average accuracy of 95.8% with a standard deviation of 1.7. This classifier could assist sheep farmers to accurately and efficiently differentiate between breeds and allow more accurate estimation of meat yield and cost management.
1. Introduction

Profit for sheep producers is determined by the relationship between the commercial value of the flock and the cost of growing the sheep. The commercial value of a sheep in Australia depends mainly on its meat weight, also known as carcass weight (MLA, 2017). A study (Rowe and Atkins, 2006) showed that only 80% of the total flock contributes to the farm's productivity and profitability; optimising the remaining 20% of the flock could therefore significantly improve profits. Farmers typically do not have the means to pre-emptively estimate the productivity of their flock, as carcass weight is only determined after the sheep are released and slaughtered at the abattoir.

Farmers currently use live weight, obtained through automatic drafting of their flocks, to estimate when to release sheep. In a drafting session, sheep are weighed and segregated into groups based on their live weight. However, studies (Kirton et al., 1984; Hopkins, 1991) have shown that many other factors affect meat yield, since live weight includes gut fill, wool weight and bone frame. Gut fill was found to have a significant impact on meat yield prediction (Kirton et al., 1984): sheep weights dropped 2 kg between being weighed off pasture and being weighed again after an overnight fast. Because of these differences, Kirton et al. (1984) provided two separate meat yield prediction models for sheep with different gut fills. A general estimation of the gut fill of the flock can be made from the time since their last feed (MLA, 2017).

Another factor that influences meat yield estimation is wool weight (Kirton et al., 1984; MLA, 2017). A number of studies have defined prediction models for wool growth with fairly high accuracy (Hong et al., 2000; Finlayson et al., 1995). For example, Hong et al. (2000) modelled the 12-month pattern of wool growth rate as a function of live weight, breed, age and rump status. The model predicted wool growth rates ranging between 6 and 16 g/day with an average root mean square error of 2.86 g/day.

Moreover, different breeds have different meat yields, so identifying the optimum time to release each breed would be beneficial. Kirton et al. (1984) performed a study on 2207 sheep to investigate the factors that affect meat yield. They compared the meat production of shorn sheep of similar live weights sired by twelve different breeds and found that meat production was similar for sheep of the same breed but differed from one breed to another. For example, Merino cross sheep were found to yield 0.7 kg less meat than Southdown cross sheep of the same live weight. This small variation in meat yield between individual sheep can translate into significant losses for enterprises comprising thousands of sheep.

The three factors discussed, gut fill, wool weight and breed, can provide farmers with valuable insights about the productivity of their flock. Gut fill and wool weight prediction models already exist in the literature; however, no models currently exist for automatic sheep breed identification. Therefore, this paper focuses specifically on automatic sheep breed identification during the drafting process.
* Corresponding author.
E-mail addresses: [email protected], [email protected] (S. Abu Jwade), [email protected] (A. Guzzomi), [email protected] (A. Mian).
https://doi.org/10.1016/j.compag.2019.105055
Received 16 June 2019; Received in revised form 10 October 2019; Accepted 13 October 2019
0168-1699/ © 2019 Elsevier B.V. All rights reserved.
Fig. 1. Conventional drafter layout with three camera positions: (1) at the exit of the weighing station, (2) at the weighing station and (3) at the entrance of the weighing station.
Fig. 2. The four drafted sheep breeds: (a) Merino; (b) Suffolk; (c) White Suffolk; (d) Poll Dorset; and (e) a sample of the variability in the image dataset due to sheep movement, shadow movement and variable light conditions.
Different breeds of sheep are in many cases phenotypically diverse. Several studies (Carneiro et al., 2010; Asamoah Boaheng et al., 2016; Searle et al., 1989) have used body measurements to identify sheep breeds. Carneiro et al. (2010) successfully identified eleven breeds using only three body measurements, namely shoulder height, head width and head length. Similarly, Asamoah Boaheng et al. (2016) used a mathematical model to classify three African sheep breeds from six body measurements and achieved 86.2% classification accuracy. Body appearance was also used by Searle et al. (1989), who found that, at any given live weight, one breed had longer legs and smaller shoulders than another. As different breeds have subtle differences in appearance, computer vision (CV) and machine learning (ML) based approaches could offer benefits to the industry.

One of the most powerful ML techniques is deep learning, which uses a cascade of many layers of processing units for feature extraction and transformation. The convolutional neural network (CNN) is one of the most popular deep learning algorithms and is commonly applied to analyse and classify visual imagery (Zeiler and Fergus, 2013). A CNN employs a deep neural network architecture to automatically learn discriminating features from an input image without the need for feature engineering (Hinton et al., 2006). This independence from prior knowledge and human effort in feature design is a major advantage. With its complex architecture, a CNN has the capacity to learn from very large training sets, of a magnitude exceeding tens of thousands of objects (Krizhevsky et al., 2017). This capacity allows it to learn objects in realistic settings, including objects that exhibit considerable variability in appearance (Krizhevsky et al., 2017). Because of CNNs, deep learning based image classification now performs better than human vision in many tasks (Devikar, 2018).

A large and growing body of literature has investigated the use of CV to classify and manage livestock based on body condition, cleanliness, lameness and pain level (Spoliansky et al., 2016; RMIT, 2017; Van Hertem et al., 2014; Lu et al., 2017). However, very few of these applications were designed for sheep farms, because of the challenges associated with: 1) segmenting individual sheep from a flock of uniform colour, 2) predicting a naturally deformable body shape and 3) extracting body features in the presence of wool (Sarwar et al., 2018; Kassler, 2001; Burke et al., 2004). CV classification applications in agriculture typically use classifiers such as support vector machines (SVM) (Lu et al., 2017), CNNs (Kumar et al., 2018; RMIT, 2017) or a clustered polynomial regression model (Spoliansky et al., 2016). CNNs achieved good accuracy in cleanliness assessment (RMIT, 2017) and cattle classification (Kumar et al., 2018), at 80% and 76% respectively. Although the polynomial regression model (Spoliansky et al., 2016) achieved higher accuracy (91%), it is only effective when the output has a continuous range and there is a small number of discriminating features between classes (Armstrong, 2012). SVMs are traditional ML techniques that analyse data for classification and regression; they are based on the idea of finding a hyperplane that best segregates the data points of two different classes (Manning et al., 2009; Nasiriany et al., 2018).
Fig. 3. The overall pipeline of the proposed methodology. Individual frames are extracted from videos, labelled and preprocessed before being used to train and test deep learning models.
Although there is a large amount of literature on identifying different breeds of animals such as dogs, cats and birds (Parkhi et al., 2012; Liu et al., 2012; Devikar, 2018; Atanbori et al., 2016), few studies were found on automatic classification of sheep breeds. A CNN achieved a very high accuracy of 96% in classifying a large group of dog breeds, but this was only possible with a considerable amount of data (100,000 images) (Devikar, 2018). Unlike other classifiers, CNNs can learn from much larger datasets.

Training a deep CNN from scratch requires a very large labelled dataset of images, in the order of tens of thousands or more (Krizhevsky et al., 2017). However, this number of images is not always available in practice. Therefore, it is common to use a limited amount of training data to re-train an existing model in a process called transfer learning (Oquab et al., 2014). In transfer learning, a CNN model that was pre-trained on a very large dataset of images is reused or fine-tuned for a related classification task (Yosinski et al., 2014; Oquab et al., 2014).
Fig. 4. Image sets for training a sheep breed classifier: (a) full sheep images with variations; and (b) sheep face images with reduced variations.
The VGG-16 model (Simonyan and Zisserman, 2014) is a 16-layer CNN developed by the Visual Geometry Group at the University of Oxford. The model was trained on 1.2 million images belonging to 1000 categories from the ImageNet dataset (Deng et al., 2009). Because of its ability to capture features from images distributed over large and diverse classes, VGG-16 has been shown to be very useful for solving cross-domain image recognition and classification problems (Simonyan and Zisserman, 2014). For example, transfer learning on VGG-16 has been explored for bird species classification (Molchanov et al., 2016), pollen-bearing bee recognition (Rodriguez et al., 2018), semantic segmentation (Long et al., 2015), pavement distress detection (Gopalakrishnan et al., 2017) and cell nuclei classification (Bayramoglu et al., 2016). The model improves on prior-art architectures by increasing the depth and using very small convolution filters (Simonyan and Zisserman, 2014). While much more sophisticated models outperform VGG-16 in the ImageNet challenge (Russakovsky et al., 2015), the model is remarkable for its simplicity. As a comparison, the authors of the GoogLeNet model emphasised that their model requires notable consideration of memory and power usage because of its complexity (Szegedy et al., 2015), and Microsoft's ResNet model, the winner of the 2015 ImageNet challenge, has 152 layers (He et al., 2016). The architecture of the VGG-16 model comprises 138 million parameters and consists of 41 layers, 16 of which have learnable weights (Simonyan and Zisserman, 2014):

• thirteen convolutional layers with a very small receptive field of size 3 × 3;
• five max-pooling layers of size 2 × 2 to carry out spatial pooling;
• rectification non-linearity (ReLU) layers following all hidden layers; and
• three fully connected layers, with the final layer being the soft-max layer.
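As a concrete illustration, the pretrained model can be loaded and its layer stack inspected in a few lines. This is a minimal sketch against MATLAB's Deep Learning Toolbox and its VGG-16 support package; the study itself used MatConvNet, so these calls are illustrative equivalents rather than the authors' code.

```matlab
% Minimal sketch (assumes the Deep Learning Toolbox Model for VGG-16
% support package is installed; the study used MatConvNet).
net = vgg16;                          % ImageNet-pretrained VGG-16
disp(net.Layers)                      % 41 layers; 13 conv + 3 fully connected carry weights
inputSize = net.Layers(1).InputSize   % [224 224 3], the size input images are resized to
```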
Since no automatic sheep breed classification system was found in the reviewed literature, this paper aims to differentiate sheep breeds on farm using computer vision and machine learning.
2. Materials and methods

The approach taken in this project was to obtain images of sheep breeds from an operational farm to build a prototype CV system. These images were pre-processed to enable the training and testing of CNN models.

Fig. 5. Preprocessing sheep face images. Images were aligned using two points (right eye and septum). Image alignment was verified as follows: one colour channel was extracted from each of three different images and the channels were superimposed to create an RGB image. The face in the resulting image reflected a good alignment of the three images.

2.1. Data acquisition and preprocessing

A conventional drafting unit is depicted in Fig. 1. Such a system is installed at Kingston Rest, a commercial sheep producing farm in the South-West of Western Australia. The farm was visited in autumn, on May 2nd, 2018, from 10:00 am to 1:00 pm. The weather was partly cloudy with light intermittent showers and an average temperature of 15.6 °C (Bunbury, 2018). A method of obtaining thousands of pictures in a farm environment during the drafting process was required to train a ML model. A GoPro Hero5 camera (model: GPCHDHX-502) was used to collect the images of the different sheep breeds. This camera model was chosen because it is waterproof, rugged and able to cope with the vibration caused by the drafting machines. The camera was set to record at 24 fps with 1920 × 1080 resolution. Different camera positions were tested, as shown in Fig. 1, and evaluated based on the captured videos. A good quality video was defined as one that provided an unobstructed view of the face and front of the sheep and captured one sheep at a time, to avoid the segmentation problem. A total of 160 sheep from 4 different breeds were drafted.
Fig. 6. VGG-16 architecture. The last six layers of the VGG-16 were fine-tuned in this study.

Table 1. A list of training hyperparameters used in the study.
Parameter name | Description | Selected value
Optimisation algorithm | Updates the network parameters to minimise the loss function by taking small steps in the direction of the negative gradient of the loss | Stochastic Gradient Descent with Momentum (sgdm)
Maximum number of epochs | Specifies the maximum number of full passes through the entire dataset | 10
Iterations per epoch | Specifies the number of parameter updates per epoch | 129
Mini batch size | Specifies the number of observations in each iteration | 10
Initial learning rate | Specifies the global learning rate | 1 × 10−4
Weights learning rate factor | Specifies the factor by which the global learning rate is multiplied to determine the learning rate for the weights in the fully connected layers | 10
Biases learning rate factor | Specifies the factor by which the global learning rate is multiplied to determine the learning rate for the biases in the fully connected layers | 10
Validation frequency | Specifies the frequency of validation during training | every 3 iterations
Execution environment | Specifies the execution environment of the training | CPU
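Expressed against MATLAB's trainingOptions, the hyperparameters of Table 1 would look roughly as follows. This is a sketch only: the study trained with MatConvNet, and the two learning-rate factors in Table 1 are per-layer properties set on the new fully connected layers (Section 2.2) rather than global options.

```matlab
% Sketch of Table 1 as Deep Learning Toolbox trainingOptions (illustrative;
% the study used MatConvNet).
opts = trainingOptions('sgdm', ...        % stochastic gradient descent with momentum
    'MaxEpochs', 10, ...                  % full passes through the dataset
    'MiniBatchSize', 10, ...              % observations per iteration
    'InitialLearnRate', 1e-4, ...         % global learning rate
    'ValidationFrequency', 3, ...         % validate every 3 iterations
    'ExecutionEnvironment', 'cpu');       % training runs on the CPU
```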
Table 2. A summary of the four conducted experiments.
Experiment objective | Approach | Controlled variable
1. Analysing the effect of varying the number of fine-tuned layers | Two models were trained using the same training parameters, such that only the last 3 fully connected layers were fine-tuned for one model and the last 6 layers for the other. Both models were trained for 5 epochs using the same dataset, with 80% of the data used for training and 20% for testing. | Number of fine-tuned layers
2. Evaluating the improvement that the fine-tuned model brings over the baseline VGG-16 model | The pre-trained VGG-16 was used as a deep feature generator, where the output of the intermediate layer (fc8) was used to train a one-versus-all SVM. The performance of this model was compared to the fine-tuned model. | Transfer learning approach
3. Finding the ideal number of epochs for training | Two models were trained using the same training parameters, one for 5 epochs and the other for 10 epochs. The same dataset was used for training both models. | Number of epochs
4. Analysing the effect of noise and variation in the training dataset | Using the training parameters described in Table 1, a model was trained once using noisy images (full sheep images) and once using less noisy images (sheep face images). Training was done for five folds. | Type of images used for training
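The baseline of Experiment 2 can be sketched as follows: the pre-trained network is used only as a feature generator, and a one-versus-all SVM is trained on the fc8 activations. The calls are Deep Learning Toolbox equivalents of the MatConvNet set-up, and the datastore and label names (dsTrain, dsTest, yTrain, yTest) are assumptions.

```matlab
% Sketch of the Experiment 2 baseline: VGG-16 features + one-vs-all SVM.
% dsTrain/dsTest are assumed 224 x 224 image datastores with labels
% yTrain/yTest (illustrative names, not the authors' code).
featTrain = activations(net, dsTrain, 'fc8', 'OutputAs', 'rows');
featTest  = activations(net, dsTest,  'fc8', 'OutputAs', 'rows');
svmModel  = fitcecoc(featTrain, yTrain, 'Coding', 'onevsall');  % one-vs-all SVMs
svmAcc    = mean(predict(svmModel, featTest) == yTest);         % test accuracy
```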
The drafted sheep, as identified by the grazier, comprised Merino (70), Suffolk (63), White Suffolk (16) and Poll Dorset (11). An example of each breed is shown in Fig. 2. The sample included both ram lambs and ewe lambs at the typical slaughter age of 6 to 8 months, with wool lengths varying between 0 and 200 mm. Although the number of sheep varied between breeds, an equal number of frames was taken for each breed in order to have a balanced dataset. Balancing the dataset was important to prevent the model being biased towards the breed with the most frames. Four hundred frames per breed were used; this number was based on the maximum number of unique frames that could be extracted for the breed with the smallest sample (Poll Dorset). The frames were selected to be unique to ensure the accuracy and reliability of the results.
Fig. 7. Pictures of sheep from three camera positions: (a) inside the weighing station; (b) exit of the weighing station; and, (c) entrance of the weighing station. Best results were found at the entrance of the weighing station.
For the classification model to be able to deal with pictures taken in real-life scenarios, it has to be trained on images with a range of variation. The collected dataset successfully captured such variation, including: varying posture of a moving sheep, varying brightness due to the changing weather, moving shadows and interference in an image. Fig. 2e demonstrates some of the variations in the collected data. The overall pipeline of the methodology, after recording the videos of the sheep being drafted, is presented in Fig. 3 (a sketch of steps 1 and 2 follows this list):

1) Individual frames were extracted from the captured videos and filtered, so that frames with no sheep or with only parts of a sheep were removed.
2) To ensure that the images were sufficiently different, the videos were sampled by taking every 4th frame.
3) Frames were labelled by breed. A total of 1642 images were obtained.
4) Two pre-processing methods were applied to each frame: i) the frame was cropped to obtain a uniform aspect ratio showing the entire sheep; ii) a geometric transformation was applied to a zoomed-in image of the sheep's face to align it with a predefined template.
5) The image dataset was split into a training set (80%) and a testing set (20%).
6) Data augmentation was applied to the training set.
7) Deep learning models were trained and tested using the two sets.
8) Training and testing were repeated in a fivefold cross validation approach.
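A minimal sketch of steps 1 and 2 is given below; the video file name is hypothetical, and the filtering of frames without a (whole) sheep is omitted.

```matlab
% Sketch of pipeline steps 1-2: extract every 4th frame from a drafting
% video (file names hypothetical; frame filtering omitted).
v = VideoReader('draft_session.mp4');
k = 0; n = 0;
while hasFrame(v)
    frame = readFrame(v);
    k = k + 1;
    if mod(k, 4) == 0                                  % keep every 4th frame
        n = n + 1;
        imwrite(frame, sprintf('frame_%04d.png', n));  % later labelled by breed
    end
end
```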
Data preprocessing is a necessary step as it prepares the data for the training stage, and the efficiency of the breed classifier depends highly on the quality of its input data. A total of 1642 sheep images were obtained and manually labelled by breed; four hundred images of each breed were then used to build the dataset for this study. Each image was preprocessed in two different ways to build two types of dataset: 1) a full sheep dataset with noise and multiple types of variation; and 2) a sheep face dataset with minimal noise and variation. Different preprocessing steps were needed to prepare the images for each of the two datasets. Matlab scripts were written to automate parts of the preprocessing, such as image labelling, cropping, alignment, resizing and augmentation.

The full sheep image set was designed to retain the different types of variation within it and hence minimal preprocessing was applied: the images were cropped to a uniform aspect ratio (1080 × 1080) and resized to 224 × 224 pixels to match the input size of the CNN model used. Images used for training were augmented with translations along the X and Y directions, in a way similar to that described by Krizhevsky et al. (2017). This step increases the size of the training data, which helps in building a translation-invariant model and prevents over-fitting. Fig. 4a shows sample images from this dataset.

The sheep face image set was planned to have much less variation, to simplify the problem of comparing images of different breeds. The preprocessing for this dataset aimed to: a) reduce irrelevant features by cropping the face only; b) eliminate posture variation by aligning all face images to one direction; and c) rescale the face images to a uniform scale. Fig. 4b shows sample images from this dataset. The preprocessing steps for this dataset were as follows (an alignment sketch follows the list):

1) Two control points were selected in each image: the sheep's right eye and the septum separating the nostrils, as shown in Fig. 5. In cases where the right eye was not visible, the left eye was selected and the image was flipped horizontally.
2) A geometric transformation matrix was inferred from the control points and two predefined template points. This matrix was a nonreflective similarity transformation.
3) The transformation matrix was applied to the image: a) translation, rotation and scaling were applied to align it with the template image; b) the right eye and septum thereby appeared at roughly the same position in all images.
4) Images were resized to 224 × 224 pixels to match the input size of the CNN model used.
5) Images used for training were augmented with translations to increase the training dataset.
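Steps 1 to 4 amount to a two-point nonreflective similarity registration, sketched below with MATLAB's fitgeotrans/imwarp; all coordinates and file names are hypothetical.

```matlab
% Sketch of the face alignment (steps 1-4); coordinates are hypothetical.
I      = imread('face_crop.png');              % hypothetical cropped face image
moving = [312 208; 355 340];                   % [right eye; septum] located in I
fixed  = [ 70  70; 150 150];                   % template points on the diagonal
tform  = fitgeotrans(moving, fixed, 'nonreflectivesimilarity');
aligned = imwarp(I, tform, 'OutputView', imref2d([224 224]));
```

Here the output view also fixes the result to the 224 × 224 network input size, folding step 4 into the warp.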
To verify the quality of the sheep face image alignment, a template image was selected such that the right eye and the nostrils were aligned diagonally. Three different sheep images were selected and aligned to the template image. Then one colour channel (red, green or blue) was extracted from each respectively, and the channels were superimposed to create a single RGB image. The quality of the face seen in the resulting image reflected how well the three faces were aligned. Fig. 5 demonstrates the verification steps, and a short sketch is given below.
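The superposition check itself reduces to a channel concatenation, assuming three aligned face images (file names hypothetical):

```matlab
% Alignment check of Fig. 5: take R, G and B from three different aligned
% images; a crisp superimposed face indicates good alignment.
im1 = imread('aligned_1.png');
im2 = imread('aligned_2.png');
im3 = imread('aligned_3.png');
check = cat(3, im1(:,:,1), im2(:,:,2), im3(:,:,3));
imshow(check)
```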
2.2. Implementation of transfer learning and fine-tuning

Transfer learning was applied to the VGG-16 model to classify the four breeds of sheep. Yosinski et al. (2014) found that modern deep neural networks learn general features in their early layers and features that depend greatly on the data in their last layers. For this reason, only the last six layers of VGG-16 were fine-tuned in this study to build the sheep breed classifier. The six fine-tuned layers are shown in Fig. 6. To customise the classification part of the network, the last three layers (fc_8, prob and output) were replaced by three new fully connected layers, with the new final fully connected layer having 4 classes to match the number of breeds in the sheep image dataset. Learning rates for each group of layers were assigned as follows (a sketch of these modifications follows the list):

• The first 10 convolutional layers were assigned a learning rate of zero to freeze their weights and maintain the general feature detectors (e.g. edge or colour blob detectors).
• A small initial learning rate (1 × 10−4) was used to tune the weights of the middle 3 convolutional layers, to prevent distorting the original weights too quickly or too much.
• The learning rate factor of the three fully connected layers was set to 10 to allow the network to learn faster in these layers.
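A sketch of this layer surgery and freezing is shown below, in Deep Learning Toolbox terms. The study used MatConvNet, and the indices 39 to 41 follow the toolbox's 41-layer VGG-16, so they are assumptions rather than the authors' implementation.

```matlab
% Replace the classification head and freeze the early convolutional layers.
layers = net.Layers;                              % pretrained stack from earlier sketch
layers(39) = fullyConnectedLayer(4, ...           % 4 breeds instead of 1000 classes
    'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
layers(40) = softmaxLayer;                        % fresh softmax ('prob')
layers(41) = classificationLayer;                 % fresh output layer

nConv = 0;
for i = 1:numel(layers)
    if isa(layers(i), 'nnet.cnn.layer.Convolution2DLayer')
        nConv = nConv + 1;
        if nConv <= 10                            % first 10 conv layers: frozen
            layers(i).WeightLearnRateFactor = 0;
            layers(i).BiasLearnRateFactor   = 0;
        end
    end
end
```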
Fig. 8. Training progress of the VGG-16 with fine tuning: (a) three layers; the accuracy curve fluctuates and shows random behaviour and (b) six layers; the accuracy curve converges at 95%.
To train the network on the sheep image dataset, back-propagation was run for several epochs, each epoch being a full pass through the entire dataset. In every epoch, the model took every image in the dataset, extracted its features and made predictions in the final layer. These predictions were then compared to the actual labels to update the tunable weights through the back-propagation process. The network was trained using the MatConvNet toolbox (Matconvnet: cnns for matlab, 2018) on a machine with a 2.9 GHz Intel Core i7 processor. The detailed training parameters are listed in Table 1, and a datastore-style sketch of the set-up is shown below.
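Tying the earlier sketches together, a toolbox-style training call could look like the following; the folder layout (one sub-folder per breed) and the translation ranges are assumptions.

```matlab
% Sketch of the training set-up (hypothetical folder layout).
imds = imageDatastore('sheep_images', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[imdsTrain, imdsTest] = splitEachLabel(imds, 0.8, 'randomized');  % 80/20 split
augmenter = imageDataAugmenter( ...               % X/Y translation augmentation
    'RandXTranslation', [-15 15], 'RandYTranslation', [-15 15]);
dsTrain = augmentedImageDatastore([224 224 3], imdsTrain, ...
    'DataAugmentation', augmenter);
trainedNet = trainNetwork(dsTrain, layers, opts); % layers/opts from earlier sketches
```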
Fig. 9. Accuracy results from fivefold cross validation of pre-trained and fine-tuned VGG-16. Fine-tuned VGG-16 achieved higher average accuracy with lower standard deviation: (a) achieved accuracy in each fold; (b) confusion matrices of fine-tuned VGG-16; (c) confusion matrices of pre-trained VGG-16; and (d) achieved accuracy using two types of images.
The toolbox allows the visualisation of various metrics during training, such as: a) training accuracy on each individual mini-batch; b) validation accuracy on the entire validation set; c) training loss on each mini-batch; and d) validation loss on the validation set. The loss is defined as a summation of the errors made for each image in the training or validation set. The progress and performance of the trained network can be assessed using the accuracy and loss curves. For example, over-fitting, where a model memorises the training data and is unable to generalise to new data, can be detected when the training accuracy is significantly higher than the validation accuracy. Similarly, under-fitting, where a model performs poorly even on the training data, can be detected when the training accuracy is very low.

Several experiments were performed in this study to analyse the effect of various training parameters on the performance of the end model. The performance of the fine-tuned model was also compared with the general VGG-16 architecture without fine-tuning. Table 2 gives a summary of the conducted experiments and their set-up.
2.3. Evaluation metrics

The quality of a classification model is usually assessed by computing the number of correctly predicted classes over all predictions made (Sokolova and Lapalme, 2009). Therefore, the performance of the trained ML models was evaluated using the average accuracy and standard deviation, measured with fivefold cross validation, which gives a statistical estimation of how well the model would perform on a real farm with new images of sheep.
Fig. 10. Examples of classified sheep images with predicted labels (P) and actual labels (A): (a) classification of full sheep images and (b) classification of cropped sheep face images.
Fivefold cross validation was implemented as follows (see the sketch below): the data was partitioned into five equal parts, four of which were used for training and one for testing, and training was repeated five times with each of the five parts used exactly once as the testing set. Both the average accuracy and the standard deviation were used to evaluate performance, and the accuracy of the classification system in each fold is depicted using a standard confusion matrix. In addition to average accuracy, the response time of the end model was measured to assess the practicality of the technology for sheep producers.
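A sketch of the fivefold procedure, reusing the names from the earlier sketches (cvpartition is a Statistics and Machine Learning Toolbox function; the datastore handling via subset is an assumption):

```matlab
% Fivefold cross validation sketch: mean accuracy and standard deviation.
cv  = cvpartition(imds.Labels, 'KFold', 5);
acc = zeros(5, 1);
for k = 1:5
    dsTrainK  = augmentedImageDatastore([224 224 3], ...
        subset(imds, find(training(cv, k))), 'DataAugmentation', augmenter);
    imdsTestK = subset(imds, find(test(cv, k)));
    netK      = trainNetwork(dsTrainK, layers, opts);
    preds     = classify(netK, augmentedImageDatastore([224 224 3], imdsTestK));
    acc(k)    = mean(preds == imdsTestK.Labels);    % accuracy of this fold
end
fprintf('Average accuracy %.1f%% (std %.1f)\n', 100*mean(acc), 100*std(acc));
```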
3. Results and discussion

Examples of captured frames from each of the three camera positions are shown in Fig. 7. The best position to mount a camera on farm was found to be at the entrance of the weighing station of the drafting unit (Fig. 7c), because sheep tended to lower their faces once inside the weighing station, and because multiple captures of a single sheep were readily attainable, as sheep spent longer in the view of this camera than in the other positions.

The effect of fine-tuning different numbers of layers of VGG-16 was investigated to analyse its impact on the performance of the end sheep breed classifier. A model with six fine-tuned layers showed more stable behaviour and higher classification accuracy than a model with three fine-tuned layers. The progress of the two models was plotted during training using four metrics: validation accuracy and loss, and training accuracy and loss. The plots are shown in Fig. 8. Looking at the accuracy curves of the model with six fine-tuned layers, both the training and validation curves increase gradually and converge on a final high percentage. Similarly, the loss curves of the same model show continuous improvement after each iteration of optimisation. In contrast, the model with only three fine-tuned layers shows unstable behaviour with fluctuating accuracy and loss curves. This can be attributed to the model describing random noise instead of the underlying relationship, which is a form of over-fitting. Therefore, the model with six fine-tuned layers was considered for further analysis.

To identify the best method for training a sheep breed classifier, two transfer learning approaches were evaluated: fine-tuning six layers of VGG-16, and using the pre-trained VGG-16 with an SVM on top of it. It took twenty minutes to train the SVM using features from the pre-trained VGG-16 and twelve hours to fine-tune VGG-16 on the same machine. The fine-tuned VGG-16 achieved a high average accuracy of 94%, which is 15% higher than that achieved by the pre-trained VGG-16, and showed a lower standard deviation (1.9 versus 3.9), which reflects consistency in its performance. The performance of the two trained models is depicted in Fig. 9a, and a full list of confusion matrices for all five folds of cross validation is shown in Fig. 9b and c.

Two types of images were compared for building the sheep breed classifier: full sheep images with noise and multiple types of variation, and cropped sheep face images with minimal noise and variation.
Fig. 11. Training progress through ten epochs and example classification results: (a) accuracy improves and reaches its maximum value in the seventh epoch; (b) average accuracy improves after 5 epochs and standard deviation remains low; and (c) examples of correctly classified images with confidence levels.
The fine-tuned classification model performed well using either type of image, which reflects its robustness. While the full sheep images contained facial features similar to those in the face images, they also included other parts of the body along with irrelevant noise. The CNN model was able to deal with the variations of moving animals, moving shadows and other forms of noise, which shows the effectiveness of augmented training datasets and the power of CNN models in learning key features while filtering out noise. The accurate predictions on the noisy full sheep images can also be attributed to the model's ability to learn distinguishing features from the rest of the body in addition to the facial ones. This is similar to what farmers typically do, as they look for physical cues on both the face and the body of the sheep; examples of these cues include the colour of the head, ears, hooves and legs, and the presence of neck folds, horns and wool on the legs (sheep characteristics, 2019). The average accuracy achieved with each of the two image types is displayed in Fig. 9d, and examples of classification results from each dataset are shown in Fig. 10. Based on these findings, the following conclusions can be made:
• As long as the underlying classifier is well trained, images with varying noise and variation can be classified with similar levels of accuracy.
• Minimal image preprocessing (resizing and augmentation) is required to train an on-farm sheep breed classifier. The CNN model proved to be invariant to viewpoint, sheep posture, size and illumination when trained using the full image dataset, which makes sheep breed classification robust and practical for on-farm use.
• A picture of the sheep can be taken at any point while the sheep is in the race, because the CNN model was able to handle full body images as well as zoomed-in face images.
• The face of the sheep carries key information that differentiates between the breeds. To identify breeds during a drafting session, cameras should be mounted such that the face of the sheep can be captured clearly.
• Drafting is performed when sheep approach the slaughter age. As the sheep in this study were at the typical slaughter age, it may be argued that breed classification using face imaging alone is suitable during drafting sessions.
• As the model showed that breed classification during drafting can be done from the face image alone, a picture of the sheep's face on its own could provide key features for future image-based sheep classification.

Fig. 12. Average classification accuracy per breed.

The effect of modifying the maximum number of training epochs was analysed to identify the best training parameters. It took twelve hours to fine-tune VGG-16 for five epochs and twenty-four hours to train it for ten epochs on the same machine. Training for ten epochs showed a slight improvement of 2% after the first five epochs and reached a maximum average accuracy of 95.8% at the seventh epoch. Fig. 11a shows the training progress over 10 epochs and Fig. 11b shows the average accuracy of both models using fivefold cross validation. Example breed classification results can be seen in Fig. 11c; the model gives correct classifications with relatively high confidence.

Fig. 12 shows the average classification accuracy of the fine-tuned model for each breed across the five experiments. The model's average accuracy was highest for the Suffolk (98%) and lowest for the Merino (92%). The Suffolk result can be attributed to the breed's unique black facial markings, which make it easier to identify. The Merino result could be attributed to the fact that the breed has four basic strains that are diverse in character (Australian merino, 2016). To improve the performance of the classifier, the variation within one class needs to be minimised; hence, having separate classes for the different strains of Merino is recommended.

The misclassified images were analysed to better understand the model's performance. It was found that 76% of the misclassified images involved the Merino breed, with Merino being either the actual or the predicted breed. Table 3 lists all misclassifications of the fine-tuned model from three folds of validation, and Fig. 13 shows a sample of misclassified frames. This high occurrence of Merino in the misclassified images can again be attributed to the variability within the Merino class; giving different Merino strains separate classification labels should improve the model's performance. Misclassification could also result from crossbred sheep whose parents are of two different breeds; objective methods for determining breed are needed to identify such sheep and to better understand and improve the model. It is also possible that some of the misclassifications are due to the distance of the sheep from the camera or its posture. This type of misclassification can be rectified by capturing and processing multiple frames of the same sheep as it walks towards the camera and using the most common prediction over all the frames (a sketch of this majority vote follows Table 3).

Table 3. A list of misclassifications with details of classes and the model's confidence levels.
Predicted class | Actual class | Softmax output
Poll Dorset | Merino | 56.9%
White Suffolk | Merino | 81.6%
Merino | Poll Dorset | 73.8%
Merino | Suffolk | 56.5%
Poll Dorset | White Suffolk | 70.5%
Merino | White Suffolk | 97.7%
Poll Dorset | Merino | 84.7%
Poll Dorset | Merino | 49.7%
Merino | White Suffolk | 96.9%
White Suffolk | Merino | 91.8%
White Suffolk | Merino | 50.4%
Suffolk | Merino | 88.8%
White Suffolk | Merino | 83%
White Suffolk | Merino | 99.7%
Suffolk | Poll Dorset | 54.9%
White Suffolk | Poll Dorset | 96.2%
Merino | Suffolk | 97.4%
Poll Dorset | White Suffolk | 71.8%
White Suffolk | Merino | 95.4%
Poll Dorset | Merino | 89.2%
Merino | Suffolk | 98.7%
Suffolk | White Suffolk | 94.1%
Poll Dorset | White Suffolk | 92.4%
Merino | White Suffolk | 71.4%
White Suffolk | Merino | 99.6%
Merino | Suffolk | 96.2%
White Suffolk | Merino | 82.9%
Suffolk | Merino | 99.9%
Poll Dorset | Merino | 73.1%
Suffolk | Merino | 49.3%
Poll Dorset | Merino | 51.2%
White Suffolk | Merino | 78%
Suffolk | Poll Dorset | 99.8%
Merino | Poll Dorset | 99.2%
Suffolk | White Suffolk | 97.3%
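The multi-frame remedy suggested above reduces to a majority vote over per-frame predictions, for example as follows (sheepFrames being a hypothetical set of frames of one sheep):

```matlab
% Majority vote over several frames of the same sheep (sketch).
preds = classify(trainedNet, augmentedImageDatastore([224 224 3], sheepFrames));
breed = mode(preds);   % the most common predicted breed wins
```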
Fig. 13. Examples of misclassified sheep images with predicted labels (P) and actual labels (A).
The response time of the sheep breed classifier was measured to study its practicality for use on a farm during the drafting process. The average time the model took to classify a single image was 0.7 s on a machine with a 2.9 GHz processor. In a drafting process, the speed of drafting is bound by the speed of weighing a sheep and the speed of the sheep walking through the drafting system. In the drafting videos captured in this study, a single sheep was in the field of view of the camera for seventeen seconds on average, which gives the model sufficient time to identify its breed using multiple images for even higher accuracy. Therefore, the model is fast enough to be integrated into a real-time drafting system.
4. Conclusion

This research studied the practicality of automating sheep breed identification using computer vision and machine learning techniques during a drafting session. The study analysed several training parameters and experimented with two types of training images: full sheep images, which included noise and substantial variation, and cropped facial images with minimal noise and variation. The performance of the trained classifiers was evaluated by their average accuracy and standard deviation using fivefold cross validation. We concluded that fine-tuning the last six layers of VGG-16 for 10 epochs achieved the highest classification accuracy of 95.8% with a standard deviation of 1.7. We also observed that feeding the classifier datasets with variations in viewpoint, sheep posture or illumination had no impact on its final performance, which demonstrates its robustness. The model proved practical and effective for on-farm use, taking 0.7 s on average to process a single image. The classifier developed in this study could assist sheep meat producers as a tool to accurately and efficiently differentiate between breeds without the need for an expert, and allow more accurate estimation of meat yield and cost management.

Future work could focus on expanding the existing datasets by adding images of more breeds at different times of the year and of sheep at different ages. It may also be possible to integrate the techniques developed here with the detection and counting techniques developed by others (Sarwar et al., 2018) to identify breeds of sheep in an open area where more than one sheep is captured at once. Moreover, to overcome the uncertainty associated with subjective labels, objective methods need to be used to verify the ground truth labels.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Acknowledgment

Our special thanks go to Alan Garstone, the Kingston Rest sheep producer, who provided us the opportunity to visit his farm and helped identify the sheep breeds.

References

Armstrong, J.S., 2012. Illusions in regression analysis. Int. J. Forecast. 28 (3), 689–694.
Asamoah Boaheng, M., Sam, E., 2016. Morphological characterization of breeds of sheep: a discriminant analysis approach. SpringerPlus 5 (1), 1–12.
Atanbori, J., Duan, W., Murray, J., Appiah, K., Dickinson, P., 2016. Automatic classification of flying bird species using computer vision techniques. Pattern Recogn. Lett. 81 (C), 53–62.
Australian merino, 2016. Australian Association of Stud Merino Breeders. Accessed: 2018-04-14. [Online]. Available: http://merinos.com.au/genetics/merino-history/australian-merino.
Bayramoglu, N., Heikkilä, J., 2016. Transfer learning for cell nuclei classification in histopathology images. In: Hua, G., Jégou, H. (Eds.), Computer Vision – ECCV 2016 Workshops. Springer International Publishing, Cham, pp. 532–539.
Bunbury, 2018. WA: daily weather observations 2018. Bom.gov.au. [Online]. Available: http://www.bom.gov.au/climate/dwo/IDCJDW6017.latest.shtml.
Burke, J., Nuthall, P., McKinnon, A., 2004. An analysis of the feasibility of using image processing to estimate the live weight of sheep.
Carneiro, H., Louvandini, H., Paiva, S., Macedo, F., Mernies, B., McManus, C., 2010. Morphological characterization of sheep breeds in Brazil, Uruguay and Colombia. Small Ruminant Res. 94 (1), 58–65.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L., 2009. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
Devikar, P., 2018. Transfer learning for image classification of various dog breeds. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 5 (12), 2707–2715. [Online]. Available: http://ijarcet.org/wp-content/uploads/IJARCET-VOL-5-ISSUE-12-2707-2715.pdf.
Finlayson, J., Cacho, O., Bywater, A., 1995. A simulation model of grazing sheep: animal growth and intake. Agric. Syst. 48 (1), 1–25.
Gopalakrishnan, K., Khaitan, S.K., Choudhary, A., Agrawal, A., 2017. Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construct. Build. Mater. 157, 322–330. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0950061817319335.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hinton, G.E., Osindero, S., Teh, Y.-W., 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18 (7), 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527.
Hong, F., Tan, J., McCall, D., 2000. Application of neural network and time series techniques in wool growth modeling. Trans. ASAE 43 (1), 139–144.
Hopkins, D., 1991. Estimating carcass weight from liveweight in lambs. Small Ruminant Res. 6 (4), 323–328.
Kassler, M., 2001. Automatic counting of sheep. Meat & Livestock Australia Ltd. [Online]. Available: https://www.mla.com.au/download/finalreports?itemId=772.
Kirton, A.H., Carter, A.H., Clarke, J.N., Duganzich, D.M., 1984. Dressing percentages of lambs. New Zealand Soc. Animal Prod. 44, 231–233.
Krizhevsky, A., Sutskever, I., Hinton, G., 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60 (6), 84–90.
Kumar, S., Pandey, A., Sai Ram Satwik, K., Kumar, S., Singh, S.K., Singh, A.K., Mohan, A., 2018. Deep learning framework for recognition of cattle using muzzle point image pattern. Measurement 116, 1–17.
Liu, J., Kanazawa, A., Jacobs, D., Belhumeur, P., 2012. Dog breed classification using part localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (Eds.), Computer Vision – ECCV 2012. Springer, Berlin Heidelberg, pp. 172–185.
Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Lu, Y., Mahmoud, M., Robinson, P., 2017. Estimating sheep pain level using facial action unit detection. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 394–399.
Manning, C.D., Raghavan, P., Schütze, H., 2009. Introduction to Information Retrieval, 3rd ed. Cambridge University Press.
Matconvnet: cnns for matlab, 2018. Vlfeat.org. [Online]. Available: http://www.vlfeat.org/matconvnet/.
MLA, 2017. Market information services – sheep assessment manual. Accessed: 2018-04-14. [Online]. Available: https://www.mla.com.au/globalassets/mla-corporate/prices-markets/documents/minlrs-information-brochures-etc/mla-sheep-assessment-manual-jan-2017.pdf.
Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J., 2016. Pruning convolutional neural networks for resource efficient transfer learning. CoRR, abs/1611.06440.
Nasiriany, S., Thomas, G., Wang, W., Yang, A., 2018. A Comprehensive Guide to Machine Learning. University of California. [Online]. Available: http://snasiriany.me/files/ml-book.pdf.
Oquab, M., Bottou, L., Laptev, I., Sivic, J., 2014. Learning and transferring mid-level image representations using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724. [Online]. Available: http://search.proquest.com/docview/1677905608/.
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V., 2012. Cats and dogs. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3498–3505.
RMIT, 2017. Automated visual inspection and preparation of live animals for meat processing.
Rodriguez, I.F., Megret, R., Acuna, E., Agosto-Rivera, J.L., Giray, T., 2018. Recognition of pollen-bearing bees from video using convolutional neural network. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 314–322.
Rowe, J., Atkins, K., 2006. Precision sheep production – pipedream or reality? In: Australian Society of Animal Production 26th Biennial Conference, No. 33.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L., 2015. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision (IJCV) 115 (3), 211–252.
Sarwar, F., Griffin, A., Periasamy, P., Portas, K., Law, J., 2018. Detecting and counting sheep with a convolutional neural network, pp. 1–6.
Searle, T.W., Graham, N.M., Donnelly, J.B., Margan, D.E., 1989. Breed and sex differences in skeletal dimensions of sheep in the first year of life. J. Agric. Sci. 113 (3), 349–354.
Sheep characteristics, breeds and facts, 2019. Encyclopedia Britannica. [Online]. Available: https://www.britannica.com/animal/sheep.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition.
Sokolova, M., Lapalme, G., 2009. A systematic analysis of performance measures for classification tasks. Informat. Process. Manage. 45 (4), 427–437. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0306457309000259.
Spoliansky, R., Edan, Y., Parmet, Y., Halachmi, I., 2016. Development of automatic body condition scoring using a low-cost 3-dimensional Kinect camera. J. Dairy Sci. 99 (9), 7714–7725.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
Van Hertem, T., Viazzi, S., Steensels, M., Maltz, E., Antler, A., Alchanatis, V., Schlageter-Tello, A.A., Lokhorst, K., Romanini, E.C., Bahr, C., Berckmans, D., Halachmi, I., 2014. Automatic lameness detection based on consecutive 3D-video recordings. Biosyst. Eng. 119 (C), 108–116.
Yosinski, J., Clune, J., Bengio, Y., Lipson, H., 2014. How transferable are features in deep neural networks? In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (Eds.), Advances in Neural Information Processing Systems 27. Curran Associates Inc., pp. 3320–3328. [Online]. Available: http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf.
Zeiler, M.D., Fergus, R., 2013. Visualizing and understanding convolutional networks.