Image-based species identification of wild bees using convolutional neural networks


Keanu Buschbacher, Dirk Ahrens, Marianne Espeland, Volker Steinhage

PII: S1574-9541(19)30328-0
DOI: https://doi.org/10.1016/j.ecoinf.2019.101017
Reference: ECOINF 101017
To appear in: Ecological Informatics
Received date: 17 May 2019
Revised date: 24 September 2019
Accepted date: 25 September 2019

Please cite this article as: K. Buschbacher, D. Ahrens, M. Espeland, et al., Image-based species identification of wild bees using convolutional neural networks, Ecological Informatics(2019), https://doi.org/10.1016/j.ecoinf.2019.101017

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.


Image-Based Species Identification of Wild Bees Using Convolutional Neural Networks

Keanu Buschbacher (a), Dirk Ahrens (b), Marianne Espeland (b), Volker Steinhage (a,*)

(a) Department of Computer Science IV, University of Bonn, Endenicher Allee 19A, D-53115 Bonn, Germany
(b) Zoological Research Museum Alexander Koenig, Adenauerallee 160, D-53113 Bonn, Germany
(*) Corresponding author: [email protected]

Abstract

Monitoring insect populations is vital for estimating the health of ecosystems. Recently, insect population decline has been highlighted both in the scientific world and the media. Investigating such decline requires monitoring which includes adequate sampling and correctly identifying the sampled taxa. This task requires extensive manpower and is time consuming and hard, even for experts, if the process is not automated. Here we propose DeepABIS, based on the concepts of the successful Automated Bee Identification System (ABIS), which allowed mobile field investigations including species identification of live bees in the field. DeepABIS features three important advancements. First, DeepABIS reduces the effort of training the system significantly by employing automated feature generation using deep convolutional neural networks (CNNs). Second, DeepABIS enables participatory sensing scenarios employing mobile smartphones and a cloud-based platform for data collection and communication. Third, DeepABIS is adaptable and transferable to other taxa beyond Hymenoptera, i.e., butterflies, flies, etc. Current results show identification results with an average top-1 accuracy of 93.95% and a top-5 accuracy of 99.61% on the data material of the ABIS project. Adapting DeepABIS to a butterfly dataset comprising morphologically difficult to separate populations of the same butterfly species yields identification results with an average top-1 accuracy of 96.72% and a top-5 accuracy of 99.99%.

1. Introduction

Species classification is a common task in computer vision which traditionally requires extensive domain knowledge of the problem at hand. Recent advances in machine learning based image recognition have moved attention to data driven approaches, which require much less effort on developing problem-specific techniques. They instead focus on learning distributions of features present in the underlying data which discriminate species, or more generally taxa, from one another. Convolutional Neural Networks (CNNs) have proven to be particularly successful at classifying and detecting objects in images compared to traditional descriptor based strategies, and have even outperformed humans in terms of classification accuracy (He et al., 2015b). They have been successfully applied to species identification tasks for plants (Barre et al., 2017), wild animals in camera-trap images (Norouzzadeh et al., 2018) and birds (Martinsson, 2017).

In the specific field of entomology (i.e., the scientific study of insects) and particularly in melittology (i.e., the scientific study of bees), taxonomic identification can be extraordinarily difficult for human experts. Images of insects can vary heavily within their respective species while looking similar to images depicting different species. As a consequence, trained taxonomists are lacking, which presents an additional problem that warrants automated solutions. The investigation of concerns regarding a potential decline of bee populations, for example claims made about a Colony Collapse Disorder (CCD), requires intensive monitoring of their populations which includes collecting and identifying specimens at farms and in the wild alike. Wing venation has long been an important characteristic to distinguish between genera and species within the Hymenoptera. Using pattern recognition techniques, automatic processes have been developed that apply geometric morphometrics to digital images of insects.

1.1. Related work

The employment of computer aided identification systems in zoological systematics and biodiversity monitoring has been around for decades. For insects in particular, the difficult taxonomy and the lack of experts greatly hamper studies on conservation and ecology. This problem was first emphasized at the UN Conference on Environment, Rio 1992, leading to a directive to intensify efforts to develop automated identification systems for insects, especially for pollinating insects. Fostered by ongoing global climate change and a dramatically growing decline in insect populations, the number of approaches to automated identification and monitoring systems for insects has grown considerably. Recently, a comprehensive survey on image-based approaches to insect classification was published by Martineau et al. (2017). Without repeating their survey, we update it by introducing a slightly different categorization from the perspective of application settings.

The application categories are: (1) digitization and identification of samples in scientific collections and museums; (2) field-based identification of specimens; (3) collection and identification by participatory sensing and web portals.

(1) For the digitization and identification of samples in scientific collections and museums, DAISY is one of the most popular examples. DAISY stands for Digital Automated Identification SYstem (Weeks et al., 1999; O'Neill, 2000; Gaston and O'Neill, 2004) and is designed for the identification of different taxa of insects using particularly the wing shapes and venations. The basic algorithm of DAISY is based on recognition via eigen-images. User interaction is required for image capture and processing, because specimens must be aligned in the images. Reed (2010) reports that DAISY is backed and under evaluation by the Natural History Museum (NHM) in London. Species identification is reported with accuracies up to 95%.

Russell et al. (2008) propose with SPIDA (Species IDentification Automated) a specifically designed identification system for Australian spiders. For identification, SPIDA uses images of the spiders' external genitalia. Therefore, this approach has the drawback that all specimens have to be manipulated manually. Also, image capture and preprocessing require user interaction. Accuracies of species identification are reported in the range of 90–96%.

To foster high throughput in the digitization and identification of samples in scientific collections and museums, Hudson et al. (2015) propose INSELECT as an automated approach to whole-drawer scanning by localizing individual specimens in the images of the scanned drawers. The output of INSELECT shows the cropped specimen images as well as extracted metadata generated by barcode reading, label transcription and metadata capture. Similar approaches are proposed by Holovachov et al. (2014) and Mantle et al. (2012).

(2) For the field-based identification of specimens, the Automated Bee Identification System (ABIS) offers a mobile and fully automated approach to the species identification of Hymenoptera (Steinhage et al., 1997; Roth et al., 1999; Steinhage et al., 2007). ABIS provides a fully automated extraction of morphological features from wing images followed by linear and nonlinear kernel discriminant analysis (Roth and Steinhage, 2000) for species identification, achieving recognition rates up to 99.3%. No user interaction is required for image alignment since a thorough model-based analysis of the wing features is used for fully automated image alignment. Live bees captured in the wild can be cooled down using a standard icebox, making them immobile for a time sufficient for image capture and species identification without causing the bees long-term harm.

ABIS is also used for identification at the next taxonomic levels, i.e., subspecies and races, which are far more similar and therefore more difficult to separate (Drauschke et al., 2007; Francoy et al., 2008, 2009). ABIS correctly identified 98.05% of the specimens from all subspecies. Considering just Africanized bee specimens, among 5280 identifications, only four Africanized bees were incorrectly identified.

Species identification of live specimens is also reported by Mayo and Watson (2007), where live moths are likewise refrigerated while being imaged. They first used DAISY for species identification, but DAISY required that each moth's least worn wing region be highlighted manually for each image (Watson et al., 2004). Using the WEKA toolkit (Frank et al., 2005) they achieve a greater level of accuracy of 85%.

In the field, high throughput is also in demand, and especially for monitoring populations automated 24/7 sensing is needed using camera traps, light traps, pheromone traps, etc. A lot of this insect trapping results in so-called insect soup imagery, i.e., images showing dozens or hundreds of partially overlapping insects. Therefore, the important first step is to identify the individual specimens before performing species identification (Mele, 2013; Yao et al., 2013; Ding and Taylor, 2016). After the localization and detection of individual specimens, the species identification can be performed by appropriate identification approaches like DAISY, ABIS, etc.

(3) Speeding up the collection and integration of observed and identified data is a crucial step towards the monitoring and conservation of biodiversity. The aforementioned SPIDA system (Russell et al., 2008) is one of the first examples of employing a web portal and participatory sensing: users can submit their own images of spiders for classification, although some expertise and equipment is required to obtain optimal images. This opportunity yields a win-win scenario by providing a useful service to interested novices and experts in the diversity of spiders on the one hand and additional training data for the SPIDA system on the other hand.

There are plenty of insect and animal ID websites available. A comprehensive overview including web links is given by EntomologyToday (2019). One of the most challenging approaches with respect to crowd-sourced species identification is the iNaturalist species identification website and dataset (iNaturalist, 2019). It is reported that in April 2017, iNaturalist had around 5,000,000 'verifiable' observations representing around 100,000 distinct species. The dataset features visually similar species as well as a large class imbalance, i.e., many images for very prominent animal groups like birds and butterflies and few images for others. Additionally, the images are captured in a wide variety of situations with different camera types of varying image quality. Therefore, results are reported to show only 67% top-1 classification accuracy, illustrating the difficulty of the dataset (Horn et al., 2018; iNaturalist, 2019).

1.2. Feature engineering vs. user interaction

The high recognition rates of the Automated Bee Identification System (ABIS) of up to 99.3% in species identification and up to 98.05% in subspecies identification, as well as the fully automated image alignment, are due to a thorough model-based analysis of the wing features. For each genus, an explicit graph model of the genus-specific wing venation is interactively trained from all training images of the respective genus. These genus-specific models depict "maps" of the venation and cell shapes of each bee genus, respectively, to control an automated contour-based extraction of all veins, cells, and vein junctions. Finally, using these extracted morphological features, genera and species of bees can be identified with LDA, NKDA and SVM classifiers.

This way, ABIS, like many of the aforementioned approaches to insect identification, relies on the analysis of hand-crafted morphological features. Due to the employment of such explicitly modeled morphological features of certain orders, families or genera, such identification approaches are more or less tailored to the respective groups of insects. Therefore, adaptability and transferability to other insect groups was hitherto impossible or required severe algorithmic changes in the feature detection part.

Traditional approaches like DAISY that are designed for the identification of different taxa of insects pay the price of requiring user interaction, because there is no strong taxon-specific model that automatically controls image alignment and the detection of relevant morphological image regions (Watson et al., 2004).

1.3. Contributions

The aforementioned Automated Bee Identification System (ABIS) was developed in our lab. Therefore, the strengths of ABIS, i.e., mobility, full automation, applicability in the field and in scientific collections, and identification of live specimens, form one starting point of our new approach to visual species identification. On the other hand, we aim to overcome the limitation to the order Hymenoptera by offering species identification of different taxa of insects like moths, butterflies, bugs, etc. aside from Hymenoptera. Additionally, we aim to extend the new approach from being just a mobile system for species identification into a web-based framework offering the opportunity for participatory sensing and a cloud-based collection and integration of observed insects. Summing up, our new approach should contribute to all three aforementioned application categories (cf. section 1.1): (1) digitization and identification of samples in scientific collections and museums; (2) field-based identification of specimens; (3) collection and identification by participatory sensing and web portals.

From the technical point of view, our approach makes three contributions:

(1) To avoid both the burden of explicit feature engineering on the one hand and the need for interactive image alignment and manual marking of relevant image regions on the other hand, the new approach employs deep convolutional neural networks (CNNs) for species identification. Due to the deep learning framework, training of taxa is possible simply by efficient end-to-end learning on training data for the taxa. No programming of new code or adaptation of given code is necessary. In end-to-end learning, training is just based on pairs of input images and the corresponding labeling (for example, wing images of bees with the corresponding labeling like Bombus terrestris).

In reminiscence of our well-known ABIS approach and due to the deep learning approach, we call our new approach DeepABIS.

(2) To enable participatory sensing and the collaborative gathering of data and knowledge, DeepABIS is developed as a mobile Android application that is easily installable, accessible and usable in outdoor and indoor environments. The app contains the full neural network and thus works offline, which is possible due to the low memory and processing requirements of the MobileNet architecture employed in DeepABIS. The collaborative framework is completed by a cloud-based web application offering a central data collection and communication platform.

(3) The adaptability and transferability of DeepABIS to other insect taxa is now simply given by the end-to-end learning of DeepABIS. Since training of DeepABIS is just based on pairs of input images and the corresponding labeling, the adaptation of DeepABIS to different applications in the field of taxonomic identification can be done by simply training DeepABIS on new appropriate data sets providing just pairs of input images and the corresponding labeling.


2. Data and Methods

This section presents the data sets and the methods used to implement the three contributions of DeepABIS, i.e., automated feature generation, suitability for participatory sensing, and adaptability and transferability. Subsection 2.1 describes two data sets. The first data set comprises the original data set that was processed by the classic ABIS approach (cf. section 1.2) to prove the suitability of the new DeepABIS approach for species identification of wild bees. The second data set shows images of different populations of the same species of butterflies to demonstrate the adaptability and transferability of DeepABIS.

2.1. Datasets

The first data set originates from the classic ABIS approach (cf. section 1.2) and comprises wing images. They were collected by entomologists of the former Institute of Agricultural Zoology and Bee Biology at the University of Bonn and include species from Germany, Brazil, the United States and China. A total of 9942 images, 166 genera and 881 species are present in the dataset. Each image depicts a bee's forewing, showing the venation shape and wing structure. Sample images are shown in fig. 1: wings of a honey bee, a bumble bee, a mason bee and a leafcutter bee, respectively.

Although the dataset is fairly large in terms of total images and number of classes, for multi-class classification, regression and deep learning in particular, the number of samples per class is an important factor for successful estimation. Unfortunately, the number of samples per class turned out to be relatively sparse: over all 9942 images, the sample-to-species ratio was 11.28, with only 71.4% of species having more than one and 42.5% having more than two samples. Classes containing too few samples cannot be learned effectively with deep learning, and classes containing only one sample cannot be evaluated at all. Thus, as a baseline dataset for training and evaluation, only genera having more than 100 images, and from those only species having more than 20 images, are kept. The resulting dataset contains 7595 images belonging to 8 genera and 124 different species. Remaining class imbalances and low representations are compensated with class weighting and data augmentation (see section 2.2.1).

Figure 1: Examples of the first data set: wing images of wild bees.
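
The genus- and species-level thresholds described above amount to a simple filtering step. The following sketch is purely illustrative: it assumes a hypothetical CSV index with image, genus and species columns, which is not part of the original ABIS data release.

```python
import pandas as pd

# Hypothetical index of the ABIS wing images; file name and columns are assumptions.
df = pd.read_csv("abis_index.csv")  # columns: image, genus, species

# Keep only genera with more than 100 images ...
genus_counts = df["genus"].value_counts()
df = df[df["genus"].isin(genus_counts[genus_counts > 100].index)]

# ... and, within those, only species with more than 20 images.
species_counts = df["species"].value_counts()
df = df[df["species"].isin(species_counts[species_counts > 20].index)]

print(len(df), df["genus"].nunique(), df["species"].nunique())
# The paper reports 7595 images, 8 genera and 124 species after this step.
```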

The Lepidoptera dataset is the second data set used in this work and is provided by the Zoological Research Museum Alexander Koenig. It consists of five classes, each one representing pinned specimens of a different population of the same species of butterfly, Parnassius apollo (Papilionidae): the Altmühl valley (Bavaria, Germany), the Mosel river (Rhineland-Palatinate, Germany), the Madonie mountains (Sicily, Italy), the Sierra Nevada (Spain) and the Tian Shan mountain range (Kazakhstan). Sample images are shown in fig. 2. The data set contains 300 images with exactly 60 images per population (30 males and 30 females).

Figure 2: Female Lepidoptera specimens, each one representing a different population.

2.2. Model Architecture

DeepABIS is planned to be a portable system that can be used by researchers out in the field. Smartphones have seen their computation and memory capacity grow in recent years. Conventional convolutional networks, however, still have too many parameters to be feasible options for real-time scenarios. Even for standard applications like species identification where there are no time constraints, long inference times present impractical and unnecessary obstacles for end users. Model files can reach sizes of several hundred megabytes which, while easily stored nowadays, should still be avoided for mobile applications where download speed plays a role. For this reason, a different approach needs to be considered. The following section first describes MobileNet as a solution for this problem and then lays out the architecture adopted for DeepABIS.

Howard et al. (2017) propose MobileNets as computation and memory efficient convolutional neural networks for mobile vision applications. They design convolutional layers that take up less space and require fewer calculations than conventional ones by applying depth-wise separable convolutions. Convolution operations are split into depth-wise and point-wise convolutions: with depth-wise convolutions, a single filter per input channel is applied, while point-wise convolutions create a linear combination of the depth-wise output by applying 1 × 1 convolutions. This way, convolution is factorized into a filtering and a combination layer, compared to filtering and producing the output feature maps at once. Compared to a standard convolutional MobileNet without depth-wise separable convolutions, MobileNet performs 1.1% less accurately on ImageNet with 4 million instead of 29 million parameters. This would be a reasonable tradeoff for applications like DeepABIS where mobile usability for end users is a non-negligible factor.

Furthermore, the authors improve MobileNet with an updated version called MobileNetV2 (Sandler et al., 2018). This version introduces bottleneck residual blocks which have two more layers: one expansion layer before the depth-wise convolutional layer and one projection layer after it. The expansion layer expands the number of input channels by an expansion factor t, while the projection layer projects data with a high number of channels into a vector with fewer dimensions. These two layers share a residual connection to help gradients flow to earlier layers.
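
The block structure described above (expansion, depth-wise convolution, projection, residual connection) can be made concrete with a small Keras sketch. This is a minimal illustration following Sandler et al. (2018), not the exact MobileNetV2 implementation used for DeepABIS; the layer choices such as ReLU6 and batch normalization follow the published architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters, stride=1, expansion=6):
    """Inverted residual block: expand -> depth-wise conv -> project."""
    in_channels = x.shape[-1]
    # Expansion: 1x1 convolution widens the channel dimension by factor t.
    y = layers.Conv2D(expansion * in_channels, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # Depth-wise 3x3 convolution: one filter per input channel.
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # Projection: linear 1x1 convolution back to a narrow representation.
    y = layers.Conv2D(filters, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Residual connection only when input and output shapes match.
    if stride == 1 and in_channels == filters:
        y = layers.Add()([x, y])
    return y

# Example: one stride-2 bottleneck stage as it appears in table 1.
inputs = tf.keras.Input(shape=(56, 56, 24))
outputs = bottleneck_block(inputs, filters=32, stride=2)
block = tf.keras.Model(inputs, outputs)
```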

The architecture adopted for this work is the standard MobileNetV2 architecture as visualized in table 1. MobileNet with an almost unchanged net structure was, in addition to its computational and storage-wise efficiency, partly chosen for this approach because preserving the original structure makes it possible to harness the advantages of transfer learning.

Table 1: The MobileNetV2 model architecture adopted for DeepABIS.

Input      | Layer         | Expansion | Filters | Stride
224² × 3   | Conv 3 × 3    | -         | 32      | 2
112² × 32  | Bottleneck    | 1         | 16      | 1
112² × 16  | Bottleneck    | 6         | 24      | 2
56² × 16   | Bottleneck    | 6         | 24      | 1
56² × 24   | Bottleneck    | 6         | 32      | 2
28² × 24   | Bottleneck    | 6         | 32      | 1
28² × 24   | Bottleneck    | 6         | 32      | 1
28² × 32   | Bottleneck    | 6         | 64      | 2
14² × 32   | Bottleneck    | 6         | 64      | 1
14² × 32   | Bottleneck    | 6         | 64      | 1
14² × 32   | Bottleneck    | 6         | 64      | 1
14² × 64   | Bottleneck    | 6         | 96      | 1
14² × 64   | Bottleneck    | 6         | 96      | 1
14² × 64   | Bottleneck    | 6         | 96      | 1
14² × 96   | Bottleneck    | 6         | 160     | 2
7² × 96    | Bottleneck    | 6         | 160     | 1
7² × 96    | Bottleneck    | 6         | 160     | 1
7² × 160   | Bottleneck    | 6         | 320     | 1
7² × 320   | Conv 1 × 1    | -         | 1280    | 1
7² × 1280  | AvgPool 7 × 7 | -         | 1280    | -
1² × 1280  | Conv 1 × 1    | -         | 124     | -

Pre-trained weights can enable faster convergence and can lead to ultimately better results (see section 3.1.2). For this reason, the input is a 224 × 224 × 3 tensor that matches the input of the publicly accessible MobileNet model trained on the ILSVRC dataset (Russakovsky et al., 2015). The last layer was replaced with a fully-connected layer of size 124 in accordance with the number of species to be classified. The softmax function serves as the activation of the classifier layer. Note that the spatial dimensions of the layer inputs only change after layers with stride s > 1 and that no max pooling layers are used for dimensionality reduction; Springenberg et al. (2014) show that they do not always improve model performance and can be replaced by convolutional layers with increased stride. For the given model architecture, the number of parameters is quantified in table 2. The model takes up approximately 20 MB of memory.

Table 2: The parameter count of the MobileNet model.

              | # Parameters
Trainable     | 2,382,140
Non-trainable | 34,112
Total         | 2,416,252
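
The adaptation just described — reuse an ImageNet-pre-trained MobileNetV2 body and replace the classifier with a 124-way softmax layer — can be sketched in Keras as follows. This is a minimal sketch assuming the standard keras.applications MobileNetV2 weights; it is not the authors' exact training code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SPECIES = 124  # species kept in the filtered ABIS dataset

# MobileNetV2 body with ImageNet weights, without its 1000-class head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")

# New classifier: global pooling followed by a 124-way softmax layer.
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(NUM_SPECIES, activation="softmax")(x)
model = models.Model(base.input, outputs)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.TopKCategoricalAccuracy(k=5)])
```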

2.2.1. Data augmentation and class weights

The ABIS dataset used in this work contains 7595 images, but it is expanded for training using data augmentation. The following transformations are applied: Rotate, Zoom, Shear, FlipUD and Pepper. Data augmentation effectively doubles the number of training samples, as every sample is augmented once.
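
The listed transformations can be approximated with Keras' ImageDataGenerator plus a custom pepper-noise function, as sketched below. The numeric parameter ranges, the noise fraction and the directory layout are assumptions; the paper only names the transformations.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def add_pepper(img):
    """Set a small random fraction of pixels to black (pepper noise)."""
    img = img.copy()
    mask = np.random.rand(*img.shape[:2]) < 0.02  # assumed noise fraction
    img[mask] = 0.0
    return img

# Rotation, zoom, shear, vertical flip and pepper noise.
augmenter = ImageDataGenerator(
    rotation_range=15,
    zoom_range=0.1,
    shear_range=10,
    vertical_flip=True,
    preprocessing_function=add_pepper,
    rescale=1.0 / 255)

train_flow = augmenter.flow_from_directory(
    "abis_train/",  # hypothetical directory with one folder per species
    target_size=(224, 224), batch_size=32, class_mode="categorical")
```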

During initial evaluation of the dataset, it became obvious that the image samples are not distributed evenly among the classes. While the mean number of images per class of 61.25 is moderately sufficient, the standard deviation amounts to 185.4, which indicates a class imbalance. Training with this imbalance could lead to lower accuracy on lower represented classes because they experience fewer weight updates than classes that have high representations in the population. This might not affect total accuracy: low performance on those classes is counterbalanced by higher performance on highly represented classes, which outweigh them in numbers. Nevertheless, the goal of DeepABIS is to provide good estimations on all species instead of just a few.

One solution to this problem is duplicating samples to compensate for low representation using data augmentation. Similarly, one could simply ignore or remove images of classes that outweigh the others. This, however, could diminish the learning of crucial features necessary to distinguish classes from one another, because it needlessly removes a fairly high amount of data, reducing regularization and in turn generalization capabilities. Instead, class weighting is chosen as a solution. During training, instances of classes are weighted differently depending on their prevalence in the population. For distribution $D$ and class $i$, the weight $w_i$ of class $i$ is given as:

$$w_i = \frac{|D|}{|\{I_j \in D \mid \mathrm{class}(I_j) = i\}|} \tag{1}$$

The solver then aims to minimize the loss function

$$\mathcal{L}(\hat{y}, y) = -\sum_i w_i\, \hat{y}_i \log(y_i) \tag{2}$$

which is equivalent to maximizing the log-likelihood for all classes evenly by using a weighted log-likelihood (King and Zeng, 2001). By using class weighting, per-class accuracy could be improved, as higher gradients are propagated to weights that are important for minimizing the error of marginally represented classes. Lower gradients in the case of highly represented classes are in turn compensated by default through the higher total number of gradient updates (compared to classes with greater weights).
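
Equation (1) and the weighted loss can be realized directly through Keras' class_weight mechanism, which scales each sample's cross-entropy contribution by the weight of its class. The sketch below reuses the hypothetical model and train_flow objects from the previous sketches and is an illustration of the weighting scheme, not the authors' code.

```python
import numpy as np

def compute_class_weights(labels):
    """w_i = |D| / |{samples of class i}|, as in equation (1)."""
    labels = np.asarray(labels)
    total = len(labels)
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): total / n for c, n in zip(classes, counts)}

# train_flow.classes holds one integer class index per training image.
class_weight = compute_class_weights(train_flow.classes)

# Keras multiplies each sample's loss by the weight of its true class,
# which yields the weighted cross-entropy of equation (2).
model.fit(train_flow, epochs=60, class_weight=class_weight)
```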

2.2.2. MobileABIS and CloudABIS

Figure 3: Screenshots of a MobileABIS inference (left) and a CloudABIS inference (right).

For researchers and laymen without data science capabilities, the core classifier of DeepABIS should still be made available in the form of a bundled application. For now, a mobile application developed for the Android operating system is presented. It allows offline identification of bee species by taking a picture of a bee's wing, or alternatively choosing a picture from storage (e.g., to use pictures taken by a connected microscope). As previously shown, the MobileNet architecture is a suitable model for this task: its weight files are lightweight, its computation requirements are low and it is the best performing model in the evaluation presented in this work. Moreover, additional optimizations are possible for use in the Android environment with the TensorFlow Lite framework (Abadi et al., 2017), which utilizes the underlying Android Neural Networks API to provide hardware acceleration. To make a model file loadable for mobile and embedded devices, its weights must first be "frozen", i.e., the training layers must be removed. Secondly, the model is optimized by pre-computing weights for convolutions used during batch normalization, removing nodes in the computational graph that are never reached during inference, and fusing common operations into unified versions. The final transformed graph is then converted into the compressed TFLite format which can be read by the TensorFlow Lite interpreter used in the application. A screenshot of the developed application can be viewed in fig. 3. Inference results show the class with the highest output confidence as the predicted species, with the following four classes ranked below.
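
With current TensorFlow releases, the freezing and graph optimization steps described above are largely handled by the TFLite converter itself. The following sketch shows the general conversion and a top-5 ranking at inference time; it is a simplified example of the workflow, not the exact tool chain used for MobileABIS, and the file name and placeholder image are assumptions.

```python
import numpy as np
import tensorflow as tf

# Convert the trained Keras model to the compressed TFLite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # graph-level optimizations
with open("deepabis.tflite", "wb") as f:
    f.write(converter.convert())

# Run inference with the TFLite interpreter and rank the top-5 species.
interpreter = tf.lite.Interpreter(model_path="deepabis.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

image = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder wing image
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])[0]

top5 = np.argsort(probs)[::-1][:5]
for rank, class_id in enumerate(top5, start=1):
    print(rank, class_id, float(probs[class_id]))
```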

Inference should be possible not only on mobile devices, but on conventional home computers as well. For this reason, a web application mirroring the features of the mobile application is proposed, called CloudABIS. Users can drag and drop their images of bee wings onto the page and view the inference result presented as a ranking of the top-5 class predictions, again including each class probability (see fig. 3). This application is developed using the Laravel PHP framework (Otwell, 2011), which communicates with an inference server built in Python providing the TensorFlow backend. The project is made available on deepabis.github.io. On this page, we outline instructions on how to set up and operate the web and mobile applications of DeepABIS and also specify the respective software and hardware requirements.

3. Results

Training and evaluation of the presented model architectures is done with the popular deep learning framework TensorFlow (Abadi et al., 2015), which works by declaratively building computation graphs and supports training on GPUs. Network models are designed and constructed with the Python wrapper library Keras (Chollet et al., 2015), which uses TensorFlow as its backend. Identification is performed on the subset of the ABIS dataset described in section 2.1, which was split into a training and test set using a ratio of 0.1, the latter comprising 760 samples.
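
A split with a 0.1 test fraction can be obtained with scikit-learn as sketched below. Whether the original split was stratified by species is not stated in the paper, so the stratification here is an assumption, as are the variable names.

```python
from sklearn.model_selection import train_test_split

# image_paths and labels: the 7595 filtered ABIS samples (assumed to exist).
train_paths, test_paths, y_train, y_test = train_test_split(
    image_paths, labels,
    test_size=0.1,          # 10% held out, roughly 760 test samples
    stratify=labels,        # keep per-species proportions (assumption)
    random_state=42)
```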

3.1. Evaluation of identification performance

The first part of the evaluation should prove that the adapted MobileNetV2 architecture of DeepABIS is the right choice for achieving the best accuracies in species identification. Therefore, we compare our MobileNetV2 architecture of DeepABIS with two alternative architectures for DeepABIS. Additionally, the effects of class weighting and pre-training on the performance are evaluated.

As the first baseline method, the Branch Convolutional Neural Network (B-CNN) (Zhu and Bain, 2017) is selected because its architecture reflects in a natural way a stepwise species identification by determining first the genus of a given sample and then its species. Unlike a conventional CNN, which has just one series of layers, a B-CNN has layers branching off the stem of the network, leading to respective output layers. The network is trained using a weighted loss function that combines the losses of each branch. During the training procedure, at the end of each epoch, the loss weights are adapted in order to switch from weighting coarse errors more heavily at the beginning to weighting fine errors more heavily near the end. This is called the "Branch Training Strategy". This way, the network can be trained to first predict features that are considered coarse and more abstract, before concentrating on features which help in making finer predictions. For our approach, there is one coarse and one fine prediction branch: the coarse branch is trained to predict genera only, the fine branch to predict species of Apoidea. The losses of the coarse branch $k = 1$ and the fine branch $k = 2$ are weighted with $A = (A_1, A_2) \in \mathbb{R}^2$. In the beginning, the weight distribution is defined as (0.99, 0.01) and changes to (0.9, 0.1) after eight, (0.3, 0.7) after 18 and (0, 1) after 28 epochs.

The second baseline method is Inception-ResNet (Szegedy et al., 2016), which is a hybrid of the contributions from GoogLeNet (Szegedy et al., 2014) and ResNet (He et al., 2015a), namely Inception modules and residual connections. It achieved a top-5 error rate of 3.08% on the validation set of ILSVRC 2012 and can be seen as a high performance classification approach. Like MobileNet, versions that were pre-trained on ImageNet datasets are available.
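
The branch training strategy boils down to a loss-weight schedule updated at epoch boundaries. In Keras this can be sketched with backend variables and a callback that overwrites the (coarse, fine) weights according to the schedule quoted above; the model, data and output names are illustrative assumptions, not the B-CNN reference implementation.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

# Loss weights A = (A1, A2) for the coarse (genus) and fine (species) branch.
coarse_w = K.variable(0.99, dtype="float32")
fine_w = K.variable(0.01, dtype="float32")

SCHEDULE = {8: (0.9, 0.1), 18: (0.3, 0.7), 28: (0.0, 1.0)}

class BranchTrainingStrategy(tf.keras.callbacks.Callback):
    """Shift the loss from coarse to fine predictions as training proceeds."""
    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) in SCHEDULE:
            a1, a2 = SCHEDULE[epoch + 1]
            K.set_value(coarse_w, a1)
            K.set_value(fine_w, a2)

# b_cnn is assumed to be a two-output model with outputs "genus" and "species".
b_cnn.compile(optimizer="adam",
              loss={"genus": "categorical_crossentropy",
                    "species": "categorical_crossentropy"},
              loss_weights={"genus": coarse_w, "species": fine_w})
b_cnn.fit(train_data, epochs=60, callbacks=[BranchTrainingStrategy()])
```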

Each architecture was tested with and without the class weights option. For MobileNet and Inception-ResNet, where transfer learning is possible, instances were tested with random and with pre-trained weights. Performance is compared using the validation metrics top-1 and top-5 accuracy, precision, recall and F1-score. Each model's validation accuracies during training are visualized in fig. 5. The respective evaluations are presented below.

3.1.1. Training strategies

An excerpt of the training plots is given in fig. 4. In the course of the B-CNN loss, the effects of the training strategy can be seen. The spikes in fig. 4a occur each time the loss weights are changed. As the spikes occur, the composition of the total loss switches more and more from a mostly coarse to a completely fine-grained prediction loss. Consequently, the B-CNN loss converges much later than the MobileNet loss, which can be observed in the loss curve in fig. 4b. The B-CNN approach performs best using a class-weighted loss, with a top-1 accuracy of 85.92%.

Figure 4: Comparison of loss curves produced by training baseline and MobileNet models.

3.1.2. Identification results

Compared to B-CNN, MobileNet and Inception-ResNet with randomly initialized weights do not perform particularly better in terms of accuracy: B-CNN stays behind by 2% at most (cf. fig. 5). This could indicate that the choice of architecture itself does not matter much on the ABIS dataset, as each tested model predicts more than 85% of the validation set correctly. Inception-ResNet performs best with a maximum top-1 accuracy of 88%, followed by MobileNet (86.71%) and B-CNN (85.92%).

Using transfer learning with pre-trained weights dramatically improves accuracy. Providing MobileNet with weights learned for the ImageNet dataset causes accuracy to reach 93.95%, while Inception-ResNet achieves 93.16%. Both pre-trained models converge after roughly 20 epochs, much earlier than their randomly initialized counterparts, which converge after 40 epochs. Judging from the fewer spikes present for pre-trained models, training is also more stable with pre-trained weights. The fastest training was observed with Inception-ResNet, which has the steepest learning curve and achieves an accuracy of 84% already at the fifth epoch.

Although Inception-ResNet has a deeper network structure than MobileNet and performed better on the ImageNet dataset, it performs worse than MobileNet on the task of bee identification. One reason could be that MobileNet has fewer parameters than Inception-ResNet (2,416,252 vs. 54,526,748), so it takes more data and more epochs to train Inception-ResNet in order to get high results. For large datasets like ImageNet, more parameters might generally pay off, while a dataset like ABIS, which was trained on for 60 epochs only, yields higher results on lightweight architectures.

Figure 5: Top-1 accuracies during training. All models in this plot were trained with class weights. MobileNetV2 PR and Inception-ResNet PR are pre-trained variants.

3.1.3. Class weighting

Measuring the effects of class weighting requires employing metrics that take each class representation into account. Precision, recall and F1-score metrics are therefore calculated on a species-to-species basis and then averaged. The results are shown in table 3.

Table 3: Results of the different architectures on the ABIS dataset. Values are given in percent. Precision, Recall and F1 values are averages of the respective per-class values. Top-1 and Top-5 are overall accuracies.

Model                  | Top-1 | Top-5 | Precision | Recall | F1
B-CNN                  | 85.26 | 97.50 | 80.85     | 77.84  | 77.31
MobileNet              | 87.37 | 98.29 | 83.19     | 81.53  | 80.56
Inception-ResNet       | 88.03 | 98.95 | 84.33     | 81.41  | 80.42
MobileNet-PR           | 92.89 | 99.21 | 91.51     | 88.81  | 88.36
Inception-ResNet-PR    | 93.03 | 98.68 | 90.65     | 89.62  | 89.10
B-CNN-CW               | 85.92 | 97.89 | 81.55     | 78.36  | 77.94
MobileNet-CW           | 86.71 | 98.68 | 82.36     | 80.17  | 79.48
Inception-ResNet-CW    | 88.03 | 97.76 | 83.26     | 80.78  | 79.67
MobileNet-PR-CW        | 93.95 | 99.61 | 92.14     | 90.86  | 90.12
Inception-ResNet-PR-CW | 93.16 | 99.74 | 90.94     | 89.84  | 89.24
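
The per-class averaging used in table 3 corresponds to macro-averaged precision, recall and F1 together with overall top-1 and top-5 accuracy. A hedged sketch of how such values can be computed with scikit-learn is given below; y_true, y_prob and the label encoding are assumptions.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, top_k_accuracy_score

# y_true: integer species labels of the test set; y_prob: model softmax outputs.
y_pred = np.argmax(y_prob, axis=1)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

top1 = np.mean(y_pred == y_true)
top5 = top_k_accuracy_score(y_true, y_prob, k=5, labels=np.arange(y_prob.shape[1]))

print(f"Top-1 {top1:.4f}  Top-5 {top5:.4f}  "
      f"P {precision:.4f}  R {recall:.4f}  F1 {f1:.4f}")
```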

Weighting each class during training according to its representation has a significant effect on detecting true positives (recall) and ruling out false positives (precision) for each class; however, only the baseline model and the models that were pre-trained show an improvement. The pre-trained MobileNet model benefits the most from using class weights, gaining 1.06% top-1 and 0.4% top-5 overall accuracy. Most importantly, the averaged precision, recall and F1-score improved by up to 2%. Its confusion matrix on the ABIS validation dataset (cf. fig. 6) nearly resembles a diagonal line, confirming the high performance on a per-class basis.

Figure 6: Confusion matrix for the pre-trained MobileNet model. Dark squares indicate high correlation between predicted and actual labels.

3.2. Evaluation of MobileABIS and CloudABIS

The performance of the mobile and web variants of DeepABIS is evaluated by measuring the average duration of the species identification of one bee specimen. The identification includes both image preprocessing and the network forward pass. The tests are run on the test set outlined in section 3.

For the mobile application, we run the evaluation on a OnePlus 5T (2017) with Android 9.0 and 8 GB RAM. The web application runs on a virtual server with Ubuntu 14.04, 6 GB RAM and 4 vCores. For CloudABIS, no hardware acceleration is utilized, in contrast to MobileABIS, which uses the Android Neural Networks API and an optimized TensorFlow Lite model format.

The mean inference speed of the mobile application is found to be 301 milliseconds. An inference of the web application takes approximately 404 milliseconds on average. This difference likely arises from GPU utilization on the smartphone, whereas CloudABIS uses CPU cores. However, both results indicate generally acceptable timeframes for non real-time applications.

3.3. Demonstration of adaptability and transferability

To demonstrate the adaptability and transferability of the DeepABIS approach, the system is simply fine-tuned using the second data set, i.e., the Lepidoptera data set, instead of the first data set, i.e., the Apoidea data set. Past approaches have successfully applied machine learning methods to images of butterflies and moths: Xie et al. (2018) propose to use Faster R-CNN to detect and localize living butterfly species in the wild, and Mayo and Watson (2007) use Support Vector Machines to identify live moths, showing that their visible wing features can indeed be of use for machine learning algorithms to determine their species.

Applying the newly fine-tuned DeepABIS to the Lepidoptera dataset, a top-1 accuracy of 96.72% was achieved with a pre-trained MobileNet architecture. This value is the average over the accuracies of five random splits in 5-fold cross validation. Identification of insects other than bees is therefore possible, and easily achievable, with the use of convolutional neural networks and requires no domain-specific preprocessing techniques, provided the datasets depict features which are sufficient to identify the corresponding taxa.
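
Such a 5-fold evaluation on the 300 Parnassius apollo images can be sketched as follows. The fine-tuning routine build_and_finetune is a hypothetical helper standing in for the training procedure of section 2.2, not code from the paper, and the arrays are assumed to exist.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# images: array of 300 preprocessed butterfly images; labels: population index 0..4.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accuracies = []

for train_idx, test_idx in kfold.split(images, labels):
    model = build_and_finetune(images[train_idx], labels[train_idx])  # hypothetical
    preds = np.argmax(model.predict(images[test_idx]), axis=1)
    accuracies.append(np.mean(preds == labels[test_idx]))

print("mean top-1 accuracy:", np.mean(accuracies))
```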

4. Conclusions

With DeepABIS, a flexible, end-to-end deep learning approach has been proposed that meets three challenges. First, in contrast to highly engineered and customized automated approaches to species identification like the ABIS system, the new DeepABIS approach includes no complex handcrafted feature engineering, no critical-to-function components and no semi-automated steps. Instead, one has to present DeepABIS just an appropriate set of training data to initialize a fully automated end-to-end learning of appropriate features and the final species identification, i.e., the training data consists just of labeled raw data, such as images, and nothing else. Based on such training, DeepABIS has achieved a top-1 accuracy of 93.95% and a top-5 accuracy of 99.61% on the dataset of the ABIS project. Second, DeepABIS has been embedded within two infrastructures facilitating participatory sensing scenarios employing mobile smartphones and a cloud-based platform for data collection and communication in comfortable timeframes: the mean inference speed of the mobile application has been found to be 301 milliseconds, and an inference of the web application takes approximately 404 milliseconds on average. Third, DeepABIS is adaptable and transferable to other taxa beyond Hymenoptera, i.e., butterflies, flies, etc. As a proof of concept, DeepABIS has been adapted to a butterfly dataset comprising morphologically difficult to separate populations of the same butterfly species, yielding identification results with an average top-1 accuracy of 96.72% and a top-5 accuracy of 99.99%.

Future work will include the following topics:

• Squeeze-and-Excitation modules (Hu et al., 2018) will be explored to further improve classification accuracy.

• Incorporating additional information about species, like their geographic distributions, may help rule out wrong predictions, provided this information is known at inference time.

• Predicting additional categories that are not exclusive to the taxon hierarchy, like sex, would also enhance the classifier's utility.

• For species identification tasks where the number of different existing species is often unknown, it is beneficial to detect outliers instead of outputting the most similar species the classifier knows of. Therefore, novelty or anomaly detection is an important next step for the proposed identification system.

• Finally, MobileABIS and CloudABIS have currently been evaluated only with respect to technical capabilities. Therefore, the employment of both modules within surveillance campaigns of the Zoological Research Museum Alexander Koenig is planned.

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. URL https://www.tensorflow.org/

Abadi, M., Agarwal, A., et al., 2017. TensorFlow Lite. https://www.tensorflow.org/lite/

Barre, P., Stoever, B. C., Mueller, K. F., Steinhage, V., 2017. LeafNet: A computer vision system for automatic plant species identification. Ecological Informatics 40, 50–56. URL http://www.sciencedirect.com/science/article/pii/S1574954116302515

Chollet, F., et al., 2015. Keras. https://keras.io

Ding, W., Taylor, G., 2016. Automatic moth detection from trap images for pest management. Computers and Electronics in Agriculture 123, 17–28.

Drauschke, M., Steinhage, V., Vega, A., Müller, S., Francoy, T., Wittmann, D., 2007. Reliable biometrical analysis in biodiversity information systems. pp. 27–36.

EntomologyToday, 2019. Entomology Today. Website, https://entomologytoday.org; retrieved Sept. 18, 2019.

Francoy, T., Wittmann, D., Drauschke, M., Müller, S., Steinhage, V., Bezerra-Laure, M., De Jong, D., Gonçalves, L., 2008. Identification of Africanized honey bees through wing morphometrics: Two fast and efficient procedures. Apidologie 39 (5).

Francoy, T., Wittmann, D., Steinhage, V., Drauschke, M., Müller, S., Cunha, D., Nascimento, A., Figueiredo, V. L., Simoes, Z., De Jong, D., Arias, M., Gonçalves, L., 2009. Morphometric and genetic changes in a population of Apis mellifera after 34 years of Africanization. Genetics and Molecular Research: GMR 8, 709–717.

Frank, E., Hall, M., Holmes, G., Kirkby, R., Pfahringer, B., Witten, I. H., Trigg, L., 2005. Weka. In: Maimon, O., Rokach, L. (Eds.), Data Mining and Knowledge Handbook. Springer, pp. 1305–1314.

Gaston, K. J., O'Neill, M. A., 2004. Automated species identification: why not? Philos Trans R Soc Lond B Biol Sci 359 (1444), 655–667.

He, K., Zhang, X., Ren, S., Sun, J., 2015a. Deep residual learning for image recognition. CoRR abs/1512.03385. URL http://arxiv.org/abs/1512.03385

He, K., Zhang, X., Ren, S., Sun, J., 2015b. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR abs/1502.01852. URL http://arxiv.org/abs/1502.01852

Holovachov, O., Zatushevsky, A., Shydlovsky, I., 2014. Whole-drawer imaging of entomological collections: Benefits, limitations and alternative applications. Journal of Conservation and Museum Studies 12 (1), 1–13.

Horn, G., Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S., 2018. The iNaturalist species classification and detection dataset. pp. 8769–8778.

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H., 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861. URL http://arxiv.org/abs/1704.04861

Hu, J., Shen, L., Sun, G., 2018. Squeeze-and-excitation networks.

Hudson, L. N., Blagoderov, V., Heaton, A., Holtzhausen, P., Livermore, L., Price, B. W., van der Walt, S., Smith, V. S., 2015. Inselect: Automating the digitization of natural history collections. PLOS ONE 10 (11), 1–15.

iNaturalist, 2019. iNaturalist. Website, https://www.inaturalist.org/; retrieved Sept. 18, 2019.

King, G., Zeng, L., 2001. Logistic regression in rare events data. Political Analysis 9 (2), 137–163.

Mantle, B., La Salle, J., Fisher, N., 2012. Whole-drawer imaging for digital management and curation of a large entomological collection. ZooKeys 209, 147–163.

Martineau, M., Conte, D., Raveaux, R., Arnault, I., Munier, D., Venturini, G., 2017. A survey on image-based insect classification. Pattern Recognition 65, 273–284.

Martinsson, J., 2017. Bird species identification using convolutional neural networks. Master's thesis, 68.

Mayo, M., Watson, A., 2007. Automatic species identification of live moths. Knowledge-Based Systems 20, 195–202.

Mele, K., 2013. Insect soup challenge: Segmentation, counting, and simple classification. pp. 168–171.

Norouzzadeh, M. S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M. S., Packer, C., Clune, J., 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences 115 (25), E5716–E5725. URL http://www.pnas.org/content/115/25/E5716

O'Neill, 2000. DAISY: a practical tool for semi-automated species identification. In: Automated Taxon Identification in Systematics: Theory, Approaches, and Applications. CRC Press/Taylor & Francis Group, Boca Raton, Florida, pp. 101–114.

Otwell, T., 2011. Laravel. https://laravel.com

Reed, S., 2010. Pushing daisy. Science 328 (5986), 1628–1629.

Roth, V., Steinhage, V., 2000. Nonlinear discriminant analysis using kernel functions. Advances in Neural Information Processing Systems 12.

Roth, V., Steinhage, V., Schröder, S., Cremers, A., Wittmann, D., 1999. Pattern recognition combining de-noising and linear discriminant analysis within a real world application. pp. 251–258.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., Fei-Fei, L., 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115 (3), 211–252.

Russell, K., Do, M., Huff, J., Platnick, N., 2008. Introducing SPIDA-web: Wavelets, neural networks and internet accessibility in an image-based automated identification system. In: MacLeod, N. (Ed.), Automated Taxon Identification in Systematics: Theory, Approaches and Applications. CRC Press, pp. 131–152.

Sandler, M., Howard, A. G., Zhu, M., Zhmoginov, A., Chen, L., 2018. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. CoRR abs/1801.04381. URL http://arxiv.org/abs/1801.04381

Springenberg, J. T., Dosovitskiy, A., Brox, T., Riedmiller, M. A., 2014. Striving for simplicity: The all convolutional net. CoRR abs/1412.6806. URL http://arxiv.org/abs/1412.6806

Steinhage, V., Kastenholz, B., Schröder, S., Drescher, W., 1997. A hierarchical approach to classify solitary bees based on image analysis. In: German Workshop on Pattern Recognition 1997, pp. 419–426.

Steinhage, V., Schroeder, S., Lampe, K., Cremers, A., 2007. Automated extraction and analysis of morphological features for species identification, 115–129.

Szegedy, C., Ioffe, S., Vanhoucke, V., 2016. Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR abs/1602.07261. URL http://arxiv.org/abs/1602.07261

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2014. Going deeper with convolutions. CoRR abs/1409.4842. URL http://arxiv.org/abs/1409.4842

Watson, A. T., O'Neill, M. A., Kitching, I. J., 2004. Automated identification of live moths (Macrolepidoptera) using Digital Automated Identification SYstem (DAISY). Systematics and Biodiversity 1 (3), 287–300.

Weeks, P. J. D., O'Neill, M. A., Gaston, K. J., Gauld, I. D., 1999. Automating insect identification: exploring the limitations of a prototype system. Journal of Applied Entomology 123 (1), 1–8.

Xie, J., Hou, Q., Shi, Y., Peng, L., Jing, L., Zhuang, F., Zhang, J., Tang, X., Xu, S., 2018. The automatic identification of butterfly species. CoRR abs/1803.06626. URL http://arxiv.org/abs/1803.06626

Yao, Q., Liu, Q., Dietterich, T., Todorovic, S., Lin, J., Diao, G., Yang, B., Tang, J., 2013. Segmentation of touching insects based on optical flow and NCuts. Biosystems Engineering 114, 67–77.

Zhu, X., Bain, M., 2017. B-CNN: Branch convolutional neural network for hierarchical classification. CoRR abs/1709.09890. URL http://arxiv.org/abs/1709.09890

Highlights:

• Reducing training efforts by automated feature generation using deep convolutional networks
• Enabling participatory sensing using mobile smart phones and a cloud-based platform
• Transferable to other taxa beyond Hymenoptera using end-to-end learning