
Research Paper

Generating artificial images of plant seedlings using generative adversarial networks

Simon L. Madsen*, Mads Dyrmann, Rasmus N. Jørgensen, Henrik Karstoft

Department of Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark

article info

Article history: Received 28 January 2019; Received in revised form 6 August 2019; Accepted 2 September 2019

Keywords: Image synthesis; Plant seedlings; GAN; Low inter-class and high intra-class variance; Plant classification

abstract

Plant seedlings are part of a domain with low inter-class and relatively high intra-class variance with respect to visual appearance. This paper presents an approach for generating artificial image samples of plant seedlings using generative adversarial networks (GAN) to alleviate the lack of training data for deep learning systems in this domain. We show that it is possible to use a GAN to produce samples that are visually distinct across nine different plant species while maintaining a high amount of variance within each species. The generated samples resemble the intended species with an average recognition accuracy of 58.9 ± 9.2%, evaluated using a state-of-the-art classification model. The observed errors are related to samples representing species that are relatively anonymous at the dicotyledonous growth stage and to the model's inability to reproduce small shape details. The artificial plant samples are also used for pretraining a classification model, which is fine-tuned using real data. The pretrained model achieves 62.0 ± 5.3% accuracy on classifying real plant seedlings prior to any finetuning, thus providing a strong basis for further training. However, finetuning the pretrained models shows no performance increase compared to models trained without pretraining, as both approaches are capable of achieving near perfect classification on the dataset applied in this work.

© 2019 IAgrE. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Deep learning (DL) applications often require large datasets for training in order to achieve high performance on visual recognition tasks, such as detection, classification, etc. (Halevy, Norvig, & Pereira, 2009; Sun, Shrivastava, Singh, & Gupta, 2017). However, data for training these algorithms can be difficult to come by, especially for domain-specific applications. An example of such a domain is plant detection, segmentation and classification (Tsaftaris, Minervini, & Scharr, 2016). Plant seedlings often have a similar appearance across several species at the early development stages, but this might change drastically as they develop their true leaves. This is therefore a domain with relatively low inter-class diversity and relatively high intra-class diversity with respect to visual appearance (see examples in Figs. 1 and 2) (Dyrmann, Midtiby, & Jørgensen, 2016). Thus, it is very important to have a strong dataset for training automated detection and classification systems for e.g. weed control applications, where early weed detection and identification is essential to achieve the most effective treatment.

* Corresponding author. E-mail addresses: [email protected] (S.L. Madsen), [email protected] (M. Dyrmann), [email protected] (R.N. Jørgensen), [email protected] (H. Karstoft). https://doi.org/10.1016/j.biosystemseng.2019.09.005 1537-5110/© 2019 IAgrE. Published by Elsevier Ltd. All rights reserved.


Nomenclature

Symbols
D          Discriminator network
Ds, Dc     Output signals from the discriminator corresponding to the source distribution estimate and the class estimate respectively
G          Generator network
z          Random noise vector
y, ỹ       Class label, real and artificial
x, x̃       Data sample (RGB image), real and artificial
p_r        Distribution of real data samples
p_g        Distribution of artificial data samples
p_x̂        Random sample distribution generated from interpolation between real and artificial samples
λ_GP       Gradient penalty coefficient
N_C        Number of classes
L_S        Data source dependent loss
L_C        Classification loss
w_d, w_g   Classification loss weighting terms for discriminator and generator respectively
λ          Interpolation coefficient

Abbreviations
GAN        Generative adversarial networks
DL         Deep learning
sPSD       Segmented Plant Seedlings Dataset
ACGAN      Auxiliary classifier GAN
WGAN       Wasserstein GAN
WGAN-GP    Wasserstein GAN with gradient penalty
ReLU       Rectified linear units
EPPO       The European and Mediterranean Plant Protection Organization
tconv      Transpose convolutional layer
conv       Convolutional layer
fc         Fully connected layer

Although it is relatively easy to collect data in the plant domain, it is often not feasible to increase the amount of training data, as it requires additional annotation to become useful for DL training purposes. This annotation process is often time-consuming and requires a high level of expertise to avoid misclassification or other errors (Dyrmann et al., 2016; Minervini, Giuffrida, Perata, & Tsaftaris, 2017). Alternatively, generative models can be used to generate artificial data that mimic the statistical properties of real data. Generative models have previously been applied for improving plant classification (Søgaard, 2005), leaf counting (Valerio Giuffrida, Scharr, & Tsaftaris, 2017; Zhu, Aoun, Krijn, Vanschoren, & Campus, 2018b) or leaf segmentation (Ubbens, Cieslak, Prusinkiewicz, & Stavness, 2018; Ward, Moghadam, & Hudson, 2018), showing promising results. However, the models applied in these studies were either limited to a single species or required a considerable amount of empirical measurements to build.

Generative adversarial networks (GAN) constitute a promising new approach to generative modelling and were first introduced by Goodfellow et al. (2014). GANs are trained using end-to-end learning and thus do not require empirical measurements to model the data. Additionally, GANs support multi-class modelling through unsupervised and/or supervised conditioning schemes (Chen et al., 2016; Odena, Olah, & Shlens, 2017). GAN models are useful for data augmentation purposes, as they learn a general representation of the data distribution and do not simply memorise the data on which they are trained (Goodfellow et al., 2014; Odena et al., 2017; Radford, Metz, & Chintala, 2015). Examples of fields where GANs have been used for data augmentation with high success include liver lesion classification (Frid-Adar, Klang, Amitai, Goldberger, & Greenspan, 2018), emotion classification (Zhu, Liu, Li, Wan, & Qin, 2018a), person re-identification (Zheng, Zheng, & Yang, 2017) and single-plant leaf counting (Valerio Giuffrida et al., 2017; Zhu et al., 2018b). GAN models have primarily been applied in single-class domains or domains with high inter-class diversity, as they tend to suffer from poor convergence or mode collapse in multi-class domains with low inter-class diversity (Arjovsky, Chintala, & Bottou, 2017; Goodfellow, 2016). However, due to new and improved objective formulations, training GANs has become more robust with regard to these issues (Arjovsky et al., 2017; Goodfellow, 2016; Gulrajani, Ahmed, Arjovsky, Dumoulin, & Courville, 2017).

In this paper, a GAN model is applied to generate artificial images of plant seedlings, with the purpose of augmenting the available training data for plant classification models. We show that it is possible to combine results from previous research to train a single GAN model capable of producing distinguishable artificial samples of plant seedlings from multiple species in a challenging domain with low inter-class diversity. Additionally, the quality and value of the artificial plant seedling samples are validated through a class discriminability test and a transfer learning test.

2. Related work

Since the introduction of GANs, several improvements have been proposed to increase the resolution and quality of the generated samples. The improvements include new network designs for the generator and discriminator networks (Denton, Chintala, & Fergus, 2015; Radford et al., 2015), modifications of the GAN configuration (Chen et al., 2016; Karras, Aila, Laine, & Lehtinen, 2017; Odena et al., 2017; Salimans et al., 2016; Zhang et al., 2017a, 2017b) and reformulations of the GAN objective function (Arjovsky et al., 2017; Goodfellow, 2016; Gulrajani et al., 2017; Salimans, Zhang, Radford, & Metaxas, 2018).

By including conditioning in a GAN configuration, it is possible to control the content of the artificial samples. Conditioning can be applied unsupervised, supervised or as a mixture of the two, depending on the task. The InfoGAN model by Chen et al. (2016) presents an unsupervised conditioning scheme: without using labelled data, the InfoGAN model is capable of disentangling information in the data, such as class clusters and dominating modalities. Similarly, the ACGAN model by Odena et al. (2017) applied a supervised conditioning approach to generate discriminable 128 × 128 image samples of all 1000 ImageNet classes (Russakovsky et al., 2015).

Fig. 1 – Real data samples from the segmented plant seedlings dataset (sPSD) (Giselsson, Jørgensen, Jensen, Dyrmann, & Midtiby, 2017). Each column shows a different species. The backgrounds are changed from black to white to increase the contrast in the images. Note that the plant species are identified using the European and Mediterranean Plant Protection Organization (EPPO) coding scheme throughout the paper (EPPO, 2019).

Fig. 2 – Single instance of MATIN tracked for the first two weeks of growth (after Dyrmann & Christiansen, 2014).

The Wasserstein GAN model (WGAN) presented an alternative formulation of the adversarial objective function (Arjovsky et al., 2017). Instead of using the log-likelihood, the loss is formulated as the distance between the estimated probability distributions of the real and artificial data, also known as the earth mover's distance (Rubner, Tomasi, & Guibas, 2000). This formulation improves the convergence properties of GAN models and improves the quality of the generated samples. Gulrajani et al. (2017) propose further improvements of the Wasserstein objective function by introducing a regularisation term in the loss formulation.

Generative models of plant seedlings are, to our knowledge, not widely studied. Søgaard (2005) modelled 19 different weed species using active shape models that could potentially be used to warp plants to have different appearances. Mündermann, Erasmus, Lane, Coen, and Prusinkiewicz (2005) modelled the development of Arabidopsis plants in 3D, from seedling to mature plant, using a Lindenmayer system. The Lindenmayer system describes a plant as a developing assembly of individual modules, each characterised by parameters such as length, width and age, as well as parameters characterising shape (Mündermann et al., 2005). Ubbens et al. (2018) extended this work by showing that synthetic samples generated using a Lindenmayer system can be used to improve performance on leaf counting. By augmenting the training data with artificial samples they achieve a reduction of approximately 27% in the mean absolute count error. Ward et al. (2018) used an approach inspired by domain randomisation to produce artificial images of Arabidopsis plants. The approach first builds a 3D model of a single leaf from real data and iteratively adds randomly modified instances of this leaf together to produce synthetic models of full plants. These synthetic samples are then used to improve performance on the task of leaf counting; the paper reports a 3.4% increase in segmented best dice (Scharr et al., 2016) when artificial samples are used to augment the training set. ARIGAN, developed by Valerio Giuffrida et al. (2017), applied a GAN to create artificial samples of Arabidopsis plants. The model is conditioned on the leaf count, which enables it to generate images of plants at different growth stages. Valerio Giuffrida et al. (2017) also showed that artificial images can be used to augment the training data and thereby reduce the absolute difference in counting error by 5.4% on leaf counting. Zhu et al. (2018b) also used a conditional GAN setup to create artificial images of Arabidopsis plants, with a focus on improving leaf counting. The approach in Zhu et al. (2018b) used a leaf segmentation mask as conditioning, which enabled the model to produce samples with high textural detail, since the outlining shape was given and did not need to be modelled. The model is also capable of producing a realistic background in the artificial images. Zhu et al. (2018b) reported a 16.7% reduction in the absolute difference in counting when artificial images were used to augment the training data.

The above-mentioned GAN approaches are similar to what is presented in this work. However, in this work the conditioning is related to the plant species rather than the leaf count, since our task is to produce artificial samples that improve plant classification rather than leaf counting. We also attempt to cover multiple species using a single model instead of focussing on a single species.

3. Method

The generative process for GANs can be formulated as a game between two players: a discriminator network, D, and a generator network, G. The objective of D is to distinguish between samples from the real and artificial data distributions, thus acting as a classifier between the two distributions. The objective of G is to produce artificial samples that look so realistic that D cannot distinguish them from real samples. The GAN model in this work is a modified combination of two previous GAN models, WGAN-GP (Gulrajani et al., 2017) and ACGAN (Odena et al., 2017). Thus, we designate this configuration WacGAN (Wasserstein auxiliary classifier generative adversarial network). WacGAN applies a supervised conditioning scheme that enables the model to produce visually distinct samples for multiple classes, while still maintaining a relatively high amount of variability in the produced samples within each class. The configuration of WacGAN is visualised in Fig. 3.

3.1. Objective function

The objective function is divided into two parts: a source loss and a class loss. The source loss, L_S, depends on the discriminator's ability to distinguish between the real and artificial data distributions. In initial experiments, L_S was implemented


as the MinMax objective described in the original GAN formulation (Goodfellow et al., 2014). However, using this formulation the model had poor convergence properties and suffered from mode collapse. Instead, L_S was implemented as a Wasserstein loss with gradient penalty, as described by Gulrajani et al. (2017):

$$L_S = \mathbb{E}_{x \sim p_r}\left[Ds(x)\right] - \mathbb{E}_{\tilde{x} \sim p_g}\left[Ds(\tilde{x})\right] + \lambda_{GP}\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} Ds(\hat{x}) \rVert_2 - 1\right)^2\right] \qquad (1)$$

where p_r is the real data distribution and p_g is the model distribution implicitly defined by x̃ = G(z, ỹ), z ∼ p(z) (Gulrajani et al., 2017). p_x̂ is a random sample distribution generated from random interpolation between the real and artificial samples (Gulrajani et al., 2017).

Similarly, the class loss, L_C, describes the discriminator's ability to distinguish between each class in the data. This loss also depends on the generator's ability to generate samples that properly represent each class. The class loss is implemented as the cross-entropy loss between the expected output, y, and the actual output of the discriminator's classification branch, Dc(x) (Bishop, 2006; Odena et al., 2017):

$$L_C = \mathbb{E}_{x \sim p_r}\left[-\sum_{i \in N_C} y_i \log Dc_i(x)\right] + \mathbb{E}_{\tilde{x} \sim p_g}\left[-\sum_{i \in N_C} \tilde{y}_i \log Dc_i(\tilde{x})\right] \qquad (2)$$

where N_C is the number of classes.

The discriminator is trained to minimise -L_S + w_d L_C, whereas the generator is trained to minimise L_S + w_g L_C. w_d and w_g are weight terms used to regulate the importance of the class loss relative to the source loss. Large class weights increase the capability of the model to produce distinguishable samples for each class but reduce the diversity within each class. The values of w_d and w_g are determined empirically through experiments. The combination of the Wasserstein source loss and the weighted class loss is intended to ensure that the model can generate samples with relatively high intra-class diversity while still generating discriminable class samples, despite the low inter-class diversity in the training data.

Fig. 3 – WacGAN model configuration. z and y are the model inputs corresponding to the noise vector and class vector respectively. The signals Ds and Dc are the discriminator outputs corresponding to the distribution estimation (true or artificial samples) and the class estimation respectively.
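To make the two objectives concrete, the following is a minimal PyTorch-style sketch of how the Wasserstein source loss with gradient penalty and the weighted class loss could be combined, using standard WGAN-GP sign conventions; the function names, the assumption that the discriminator returns a pair (Ds score, Dc logits), and the use of integer class labels are illustrative choices rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gradient_penalty(D, x_real, x_fake):
    """Penalty (||grad Ds(x_hat)||_2 - 1)^2 on random interpolates between real and fake samples."""
    eps = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (eps * x_real + (1.0 - eps) * x_fake).requires_grad_(True)
    ds_hat, _ = D(x_hat)
    grads = torch.autograd.grad(ds_hat.sum(), x_hat, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def discriminator_loss(D, G, x_real, y_real, z, y_fake, lambda_gp=10.0, w_d=7.5):
    """Source loss (Eq. 1, standard WGAN-GP convention) plus w_d times the class loss (Eq. 2)."""
    x_fake = G(z, y_fake).detach()                       # do not backprop into G here
    ds_real, dc_real = D(x_real)
    ds_fake, dc_fake = D(x_fake)
    source = ds_fake.mean() - ds_real.mean() + lambda_gp * gradient_penalty(D, x_real, x_fake)
    klass = F.cross_entropy(dc_real, y_real) + F.cross_entropy(dc_fake, y_fake)
    return source + w_d * klass

def generator_loss(D, G, z, y_fake, w_g=7.5):
    """The generator maximises the critic score on fakes while matching the intended class."""
    ds_fake, dc_fake = D(G(z, y_fake))
    return -ds_fake.mean() + w_g * F.cross_entropy(dc_fake, y_fake)
```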

3.2. Network designs

The network designs for WacGAN follow the same general design described by Odena et al. (2017). The generator consists of one fully connected layer (fc) followed by five transposed convolutional layers (tconv). The network uses the rectified linear unit (ReLU) activation function in all layers except the last, where Tanh is used to ensure the generated samples are within the range [-1, 1]. The input for the generator consists of a one-hot encoded conditioning on the plant species concatenated with a vector drawn from a uniform noise distribution, p(z) ~ U(-1, 1). The length of the noise vector is found empirically. The discriminator consists of six convolutional layers (conv) followed by two parallel fc layers: one for classifying the source and one for classifying the class. The network uses LeakyReLU in all the conv layers and uses a softmax activation for the class branch. The discriminator network does not use batch normalisation in any layer, since this does not comply with the Wasserstein source loss (Gulrajani et al., 2017). The full implementation is summarised in Appendix A.
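As an illustration of the design described above, the sketch below builds a generator with the layer dimensions listed in Table A.1 (Appendix A); the use of PyTorch and the exact module layout are assumptions and not the authors' released code. The discriminator of Table A.2 can be assembled analogously from strided convolutions, LeakyReLU activations and two parallel fully connected heads.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """WacGAN-style generator sketch: fc1 followed by five valid-padded transpose convolutions."""
    def __init__(self, noise_dim=128, n_classes=9):
        super().__init__()
        self.fc = nn.Linear(noise_dim + n_classes, 768)            # fc1: (128 + 9) -> 768, no activation

        def block(c_in, c_out, k):
            # Valid-padded transpose convolution with stride 2, batch norm and ReLU.
            return nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, kernel_size=k, stride=2),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )

        self.net = nn.Sequential(
            block(768, 384, 5),                                    # 1x1   -> 5x5
            block(384, 256, 5),                                    # 5x5   -> 13x13
            block(256, 192, 5),                                    # 13x13 -> 29x29
            block(192, 64, 5),                                     # 29x29 -> 61x61
            nn.ConvTranspose2d(64, 3, kernel_size=8, stride=2),    # 61x61 -> 128x128
            nn.Tanh(),                                             # outputs in [-1, 1]
        )

    def forward(self, z, y_onehot):
        h = self.fc(torch.cat([z, y_onehot], dim=1))
        return self.net(h.view(-1, 768, 1, 1))
```

With valid padding and stride 2, the transpose convolutions grow the spatial size as (in - 1) * 2 + kernel, which reproduces the 1, 5, 13, 29, 61, 128 progression of Table A.1.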

3.3. Evaluation

The quality of the generated artificial samples is evaluated qualitatively and quantitatively. Qualitative evaluation is performed through visual inspection of the artificial samples. Through this inspection we provide a subjective evaluation and discussion of the sample quality. The qualitative analysis is performed because, to our knowledge, no good quantitative measure exists for assessing the realism of artificial plant seedling images. The quantitative evaluation of the artificial WacGAN samples consists of a class discriminability test and a transfer learning test. The class discriminability test is used to assess how well the artificial samples represent the intended species by using an external classification model to evaluate the samples. A similar approach is commonly used in GAN research (Gulrajani et al., 2017; Odena et al., 2017; Salimans et al., 2016; Zhang et al., 2017a, 2017b), where the Inception score (Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, 2016) is a popular metric for assessing the discriminability of artificial samples trained on ImageNet (Russakovsky et al., 2015). InceptionNet (Szegedy et al., 2016) is not suited for this application, since the classes in this study differ from the classes of ImageNet. Instead, another external classifier based on ResNet-101 is applied (He, Zhang, Ren, & Sun, 2016). Class discriminability is reported as a recognition accuracy, which indicates how often the artificial samples are classified as the intended species by the external classifier. It should be noted that this test only provides a metric for how well that specific classifier recognises the samples. However, the results still provide an indication of whether the GAN model is capable of producing realistic class information.
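A sketch of how such a recognition-accuracy test could be computed is shown below; it assumes a generator conditioned on one-hot class codes, noise drawn from U(-1, 1), and an external classifier that returns class logits. Names and interfaces are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognition_accuracy(classifier, generator, n_classes=9, n_per_class=10_000,
                         batch=64, noise_dim=128):
    """Fraction of artificial samples classified as the species they were conditioned on."""
    correct, total = 0, 0
    for c in range(n_classes):
        remaining = n_per_class
        while remaining > 0:
            b = min(batch, remaining)
            z = torch.empty(b, noise_dim).uniform_(-1.0, 1.0)        # z ~ U(-1, 1)
            y = torch.full((b,), c, dtype=torch.long)                # intended species
            x_fake = generator(z, F.one_hot(y, n_classes).float())
            pred = classifier(x_fake).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += b
            remaining -= b
    return correct / total
```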


The transfer-learning test is used to evaluate whether the artificial WacGAN samples can be used in conjunction with real data samples to achieve better classification of plant seedlings. In the transfer-learning setup, an external ResNet-101 classifier (He et al., 2016) is pretrained on artificial samples from the WacGAN model and finetuned on real data samples. This method should show an increase in convergence rate and potentially also an increase in classification accuracy when testing on a holdout subset of the real data (Yosinski, Clune, Bengio, & Lipson, 2014). To avoid data leakage in the quantitative evaluations, a leave-p-out cross-validation scheme is applied (Celisse & Robin, 2008). The real data is divided into four non-overlapping, equally sized, and evenly class-distributed parts. With a p of 50% and four data parts, six unique data splits of training and test data are obtained, as sketched below. In the class discriminability evaluation, the WacGAN model is trained on the test data and the external ResNet-101 classifier is trained on the training data. Thereby the WacGAN model can be used to produce an additional test set of artificial samples that are unrelated to the data used to train the external ResNet-101 classifier. In the transfer learning setup, the WacGAN model and the external ResNet-101 classifier are both trained on the training data and tested on the test data.
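The split scheme can be illustrated with a small sketch: with four disjoint, class-balanced parts and p = 50%, every choice of two parts for testing (the remaining two for training) yields one of the six splits. The function name and representation of the parts are illustrative.

```python
from itertools import combinations

def leave_p_out_splits(parts):
    """parts: list of four disjoint index lists; yields the six (train, test) splits."""
    assert len(parts) == 4
    for test_pair in combinations(range(4), 2):                     # C(4, 2) = 6 unique splits
        test = [i for p in test_pair for i in parts[p]]
        train = [i for k, part in enumerate(parts) if k not in test_pair for i in part]
        yield train, test
```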

4. Results

The segmented Plant Seedlings Dataset (sPSD) (Giselsson et al., 2017) was used for all experiments in this work. The sPSD consists of segmented RGB images of plant seedlings cultivated in a greenhouse setting. The images in the dataset have different spatial resolutions, but due to the network design, all input images must be 128 × 128 pixels. To keep a low resizing factor, all images with a spatial resolution larger than 400 pixels in either dimension were excluded from the dataset (400/128 = 3.125). Additionally, all grass species were excluded, since these were not properly segmented in the dataset. Thus, nine plant species remained in the dataset, see Table 1. The images were pre-processed by zero-padding to a resolution of 566 × 566 pixels, randomly rotating around the image centre, cropping to a resolution of 400 × 400 pixels and resizing using bilinear interpolation to the desired resolution of 128 × 128 pixels. In the following experiments the class weights, w_d and w_g, were both set to 7.5. This value was determined empirically and provided a good trade-off between intra- and inter-class diversity. The length of the noise vector, z, was set to 128. Shorter lengths produced samples with poor diversity (both inter- and intra-class) and longer lengths did not appear to improve the quality of the artificial samples. A full summary of the hyperparameters used can be found in Appendix A. For each data split, the WacGAN model was used to generate 10,000 artificial samples for each class. These artificial samples acted as an additional test set in the class discriminability evaluation, and as additional training data in the transfer learning setup.
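A sketch of the pre-processing pipeline described above is given below, assuming PIL and a centre crop (the paper does not state how the 400 × 400 crop is positioned); it is an illustration under those assumptions rather than the authors' exact code.

```python
import random
from PIL import Image, ImageOps

def preprocess(img: Image.Image) -> Image.Image:
    """Zero-pad to 566x566, rotate randomly about the centre, crop 400x400, resize to 128x128."""
    # Zero-pad symmetrically to 566x566 pixels (inputs are at most 400 pixels per dimension).
    pad_w, pad_h = 566 - img.width, 566 - img.height
    img = ImageOps.expand(img, border=(pad_w // 2, pad_h // 2,
                                       pad_w - pad_w // 2, pad_h - pad_h // 2), fill=0)
    # Random rotation around the image centre, filling exposed corners with black.
    img = img.rotate(random.uniform(0.0, 360.0), resample=Image.BILINEAR, fillcolor=0)
    # Centre crop to 400x400 (assumed), then bilinear resize to the 128x128 network input.
    left, top = (img.width - 400) // 2, (img.height - 400) // 2
    img = img.crop((left, top, left + 400, top + 400))
    return img.resize((128, 128), Image.BILINEAR)
```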

Table 1 – Overview of plant species and sample count after removing grass species and images with a resolution larger than 400 pixels in either dimension from the segmented plant seedlings dataset.

English name                 Latin name                      EPPO code   N samples
Charlock                     Sinapis arvensis                SINAR       297
Cleavers                     Galium aparine                  GALAP       270
Common Chickweed             Stellaria media                 STEME       591
Fat Hen                      Chenopodium album               CHEAL       447
Maize                        Zea mays                        ZEAMX       149
Scentless Mayweed            Matricaria perforata            MATIN       498
Shepherd's Purse             Capsella bursa-pastoris         CAPBP       225
Small-flowered Cranesbill    Geranium pusillum               GERPU       429
Sugar Beets                  Beta vulgaris var. altissima    BEAVA       191

4.1. Visual inspection of artificial samples

Figure 4 shows some examples of artificially generated plant seedlings, where each column represents a different species and each row is generated using a unique noise vector. From visual inspection of the artificial samples it seems that the network has learned that plants are composed of a number of leaves arranged around a plant centre. There was no stem in most of the artificial samples, but this is to be expected since the stem was also missing in many of the training examples (see Fig. 1). When examined closely, the leaves of the generated samples consist mainly of smooth green surfaces with limited or no textural patterns. This makes it hard to, for example, distinguish the border between overlapping leaves. On closer inspection, it was possible to observe image artefacts in some of the generated images. Specifically, two artefacts were apparent: pixel errors along the borders of the artificial plant samples and a "grid effect" that causes pixel errors in a systematic pattern in the texture of the samples. Figure 5 shows an example with severe image artefacts.

Figure 4 also shows that the model is capable of producing visually distinct samples across the classes, even though the noise vector is fixed and only the conditioning changes. This behaviour can be credited to the relatively high class loss weighting. It is an important result that we were able to force the model to produce distinctive samples by simply weighting the class loss higher relative to the source loss. Whether the samples actually correspond to the real data classes is explored in the following section.

The WacGAN model was also analysed by making interpolations between pairs of noise vectors drawn randomly from the noise space, as inspired by previous GAN studies (Chen et al., 2016; Goodfellow et al., 2014; Odena et al., 2017; Radford et al., 2015). The interpolation between two random noise vectors, z_1 and z_2, is defined by:

$$z_{\text{interpolated}} = \lambda z_2 + (1 - \lambda) z_1, \quad \lambda \in [0, 1] \qquad (3)$$
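Eq. (3) amounts to moving along a straight line in the noise space; a small sketch of how such interpolated noise vectors could be produced is given below (variable and function names are illustrative).

```python
import numpy as np

def interpolate_noise(z1, z2, n_steps=8):
    """Return n_steps noise vectors along the line from z1 (lambda = 0) to z2 (lambda = 1)."""
    lams = np.linspace(0.0, 1.0, n_steps)
    return [lam * z2 + (1.0 - lam) * z1 for lam in lams]
```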


Fig. 4 – Artificial samples of plant seedlings. Each column represents a different species and each row is generated from a unique random noise vector. The background is changed from black to white to increase the contrast in the images.


Fig. 5 – Example of severe image artefacts. The intended species of the sample is CAPBP. The image is scaled to 150% size to better visualise the issues.

By sampling λ uniformly over the range it is possible to observe the transition from z_1 to z_2. The interpolated noise vectors are passed through the generator, G, along with class codes for each species. Figure 6a shows one such interpolation between two randomly drawn noise vectors. Each column in Fig. 6a is generated using the same interpolation vector; only the latent class coding changes for each row. This example indicates that it is possible to control the size of the artificial plant seedlings by transitioning in a specific direction in the noise space used for producing the samples. It is also interesting to observe that this interpolation has a similar effect independent of the different species, with few exceptions. This is supported by Fig. 6b, which shows the area of each sample in Fig. 6a as a function of the interpolation parameter, λ. This is an indication that the noise space is only responsible for introducing intra-class variance and that the latent class code provides control of the species. Note that the rate of change is not linear when analysed in this way, nor does the size change relate directly to plant development, since unrealistic growth patterns are present. Observations from other interpolations have indicated changes in the orientations of the plants, number of leaves, etc.

4.2. Class discriminability of artificial samples

The results of the class discriminability test are summarised in Table 2 and in Fig. 7. The artificial WacGAN samples achieved an average recognition accuracy of 58.9 ± 9.2%, which is well above the 11.1% that random guessing would yield. The results show that the WacGAN model is able to reproduce species-specific characteristics that are recognised by the external classifier. The recognition accuracy was highest for the species STEME and GERPU, with average accuracies of 83.3% and 79.7% respectively. It was lowest for CAPBP and BEAVA, with average accuracies of 43.6% and 45.2% respectively. The distributions of the classifications are summarised in a confusion matrix (Fig. 7). The confusion matrix shows that the WacGAN model seems to have overfitted slightly towards STEME, since CHEAL, ZEAMX, MATIN and CAPBP tended to get misclassified as that species. Additionally, BEAVA also tended to get misclassified as CHEAL. These misclassifications are discussed in the following section.

Fig. 6 – (a) Interpolation between two random points in the noise space used for producing the artificial plant seedling samples. The same interpolation vector is used in each column; only the latent class coding changes. The background is changed from black to white to increase the contrast in the images. (b) The pixel area of each sample given in Fig. 6a as a function of λ.

Table 2 – The accuracy of recognising the intended species of the artificial samples from WacGAN, when analysed using an external ResNet-101 classifier trained solely on real data samples from the sPSD. The recognition accuracies are reported as the mean and standard deviation over the six splits in the cross-validation scheme.

            Test set: real data             Test set: artificial data
Species     N samples    Accuracy           N samples    Accuracy
SINAR       149          94.3 ± 1.1%        10,000       63.3 ± 19.1%
GALAP       135          93.5 ± 2.3%        10,000       57.1 ± 19.1%
STEME       296          96.3 ± 0.7%        10,000       83.3 ± 7.8%
CHEAL       224          95.3 ± 1.7%        10,000       56.0 ± 14.7%
ZEAMX       75           90.5 ± 3.1%        10,000       49.2 ± 10.0%
MATIN       249          94.3 ± 1.6%        10,000       53.0 ± 8.5%
CAPBP       113          84.0 ± 3.1%        10,000       43.6 ± 16.0%
GERPU       215          95.8 ± 1.5%        10,000       79.7 ± 9.9%
BEAVA       96           90.6 ± 4.8%        10,000       45.2 ± 23.5%
Total       1549         92.7 ± 1.0%        90,000       58.9 ± 9.2%

4.3. Transfer learning using artificial samples

The results of the transfer learning test are visualised in Fig. 8, which shows the test accuracy as a function of training time (given in epochs) for ResNet-101 classification models with and without pretraining on the artificial WacGAN samples. From Fig. 8 it can be seen that the pretrained model converged significantly faster than the model without pretraining. The pretrained model was capable of achieving 62.0 ± 5.3% accuracy prior to any finetuning and approximately 90% accuracy after only 75 training epochs on real data samples. For comparison, the model without pretraining starts at 15.3 ± 3.0% accuracy, which is close to random guessing, and reached approximately 90% accuracy after 175 training epochs. These observations show that the artificial plant samples hold a significant amount of information that can be used to distinguish real plant species and thereby provide a better basis for training a classifier than training from scratch. However, given enough training time both model configurations achieved comparable performance: 92.4 ± 0.9% (with pretraining) and 92.7 ± 1.0% (without pretraining).

Fig. 7 – Average confusion matrix when evaluating the artificial samples from the WacGAN model using the external classifier trained solely on real samples.

Fig. 8 – Average classification accuracy as a function of training/finetuning iterations using real data samples from the sPSD. The blue and orange curves show the convergence of the ResNet-101 classification models with and without pretraining on WacGAN samples respectively. The pretraining is not shown in the figure.

5. Discussion

The results in the preceding sections show that the artificial samples from the WacGAN model resemble sensible and plausible representations of plant seedlings. It is possible to distinguish between the different species in the artificial samples, while these still maintain a relatively high visual variance within each species.

Although the WacGAN samples look sensible, the samples are not perfect. The artificial plant seedling samples are lacking in terms of textural and shape details compared to real samples, as shown in Figs. 1 and 4. Additionally, image artefacts can also be observed, as mentioned earlier. Both the lack of detail and the image artefacts are probably related to the network architecture of the generator. The generator was implemented with relatively large kernels (8 × 8) in the output layer. These kernels cover a large area, which might make the network unable to model "hard" edges and transitions. This would explain why the generator was incapable of reproducing small details in the samples and why pixel errors were observed along the border of the generated objects. Additionally, the generator uses a stride of two for up-sampling in all layers. This stride could account for the second image artefact, the "grid effect", as this repeats itself every second pixel in the affected samples. Similar effects can be observed in the samples provided in the original ACGAN paper (Odena et al., 2017); however, the effects are not as apparent, probably because ACGAN does not operate on segmented images. Neither of these hypotheses was tested further in this work.

According to the results from the class discriminability tests, the WacGAN samples resembled the intended species to a relatively high degree (58.9 ± 9.2% recognition accuracy). However, as mentioned in the results, the WacGAN model might have overfitted slightly towards a few species. This issue is probably an effect of the incapability of the WacGAN model to reproduce small structural patterns, which significantly reduces, or eliminates, the visual identity of the different species. This is especially true for samples of small plants, since these appear as small green blobs, which might easily be misidentified. Furthermore, STEME and CHEAL are both classes with a relatively high number of samples compared to the other classes. Because of this overrepresentation, it is logical that WacGAN might have overfitted during training. The overfitting issue could have been avoided, or reduced, by balancing the training data according to the number of samples, so all classes were equally represented. Additionally, the artificial samples also tend to get misclassified as species that are relatively anonymous at the dicotyledonous growth stage or have a highly similar visual appearance, so even minor changes in appearance will affect the external classifier's ability to recognise them. This is especially apparent for samples of CHEAL and BEAVA and, to a lesser degree, MATIN and CAPBP (see Fig. 7).

The transfer learning tests show that a classification model pretrained on WacGAN samples can provide a reasonable classification of real data samples prior to any finetuning (62.0 ± 5.3% accuracy). This indicates that the WacGAN samples do indeed provide a significant amount of species-specific information. However, the pretrained models did not show a performance increase after finetuning on real samples. Both classification setups (with and without pretraining on WacGAN samples) achieved near perfect classification on the sPSD test sets. This could indicate that the method reached the upper boundary for the ResNet-101 classifier architecture. Future research in this field should consider using other datasets, which should ideally include more species and larger intra-class variance to increase the challenge.

To improve the results in future work, the quality of the artificial samples needs to be increased further. The generative model needs to become better at modelling textural and shape details, and it would also be helpful to increase the resolution of the generated samples. To achieve this, other GAN configurations might be needed, with larger network models and more conditioning, such as StackGAN (Zhang et al., 2017a, 2017b). Another interesting extension would be to explore the possibility of modulating dominating characteristics of the samples, such as plant size, orientation, number of leaves, etc., through an additional set of latent variables. This could possibly be achieved by using unsupervised conditioning, as in Chen et al. (2016).

6. Conclusion

In this paper, we have shown that it is possible to model multiple species of plant seedlings using a single GAN model. The devised model, WacGAN, is capable of generating samples that resemble real data and are visually distinct across the different species, even though there is low inter-class and relatively high intra-class variance in this domain. The samples produced using the WacGAN model achieve 58.9 ± 9.2% recognition accuracy using a state-of-the-art classification model trained for the domain. The artificial samples were applied in a transfer learning setup, where they were used for pretraining of a classification model, which was fine-tuned using real data. The pretrained classification models achieved 62.0 ± 5.3% accuracy on classifying real plant seedlings prior to any finetuning, which showed that the WacGAN samples held a significant amount of domain information. However, further finetuning of the pretrained models using real data showed no increase in performance compared to models trained solely on real data samples, since both achieve near perfect classification on the specific dataset used in this study. Improvements are still required with respect to textural and shape details before the artificial plant samples can achieve a realism comparable with real samples. These improvements are left to be explored in future work in the field.

Acknowledgement

This work is funded by Innovation Foundation Denmark as part of the RoboWeedMaPS project [J. nr. 6150-00027B].

Conflicts of interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Implementation details

Appendix A.1. Network designs

Tables A.1 and A.2 provide implementation details on the generator and discriminator network architectures respectively. The generator described in Table A.1 is implemented as a transpose convolutional neural network that maps a one-dimensional vector to a 128 × 128 RGB image. The tconv layers all apply a valid padding scheme. The discriminator described in Table A.2 is implemented as a convolutional neural network that takes 128 × 128 RGB images as input and outputs two vectors: an estimate of the source distribution (fc-s) and an estimate of the class (fc-c). The conv layers alternate between using same and valid padding in every second layer.

Table A.1 – Generator network architecture details.

Generator G(z, y)
Layer     Kernel size    Stride    Output shape               Activation function   Batch norm
input     –              –         batch × (128 + 9)          –                     –
fc1       [137 × 768]    –         batch × 768                None                  No
reshape   –              –         batch × 1 × 1 × 768        –                     –
tconv1    [5 × 5]        [2 × 2]   batch × 5 × 5 × 384        ReLU                  Yes
tconv2    [5 × 5]        [2 × 2]   batch × 13 × 13 × 256      ReLU                  Yes
tconv3    [5 × 5]        [2 × 2]   batch × 29 × 29 × 192      ReLU                  Yes
tconv4    [5 × 5]        [2 × 2]   batch × 61 × 61 × 64       ReLU                  Yes
tconv5    [8 × 8]        [2 × 2]   batch × 128 × 128 × 3      Tanh                  No

Table A.2 – Discriminator network architecture details.

Discriminator D(x)
Layer     Kernel size    Stride    Output shape               Activation function   Batch norm   Dropout
input     –              –         batch × 128 × 128 × 3      –                     –            –
conv1     [3 × 3]        [2 × 2]   batch × 64 × 64 × 16       Leaky ReLU            No           0.5
conv2     [3 × 3]        [1 × 1]   batch × 62 × 62 × 32       Leaky ReLU            No           0.5
conv3     [3 × 3]        [2 × 2]   batch × 31 × 31 × 64       Leaky ReLU            No           0.5
conv4     [3 × 3]        [1 × 1]   batch × 29 × 29 × 128      Leaky ReLU            No           0.5
conv5     [3 × 3]        [2 × 2]   batch × 15 × 15 × 256      Leaky ReLU            No           0.5
conv6     [3 × 3]        [1 × 1]   batch × 13 × 13 × 512      Leaky ReLU            No           0.5
reshape   –              –         batch × 86528              –                     –            –
fc-s      [86528 × 1]    –         batch × 1                  None                  No           0.0
fc-c      [86528 × 9]    –         batch × 9                  Softmax               No           0.0


Appendix A.2. Hyperparameters

Table A.3 provides a summary of the hyperparameters and other training settings used in the implementation of the GAN model.

Table A.3 – Hyperparameters and training settings.

Parameter                                   Value
Unstructured noise dimension                128
Batch size                                  64
Class scaling (w_d, w_g)                    7.5
Training ratio (D : G)                      5 : 1
Gradient penalty coefficient (λ_GP)         10
Leaky ReLU slope                            0.2
Optimiser (Discriminator)                   Adam (lr = 0.0002, β1 = 0.5, β2 = 0.9)
Optimiser (Generator)                       Adam (lr = 0.0010, β1 = 0.5, β2 = 0.9)
Weight, bias initialisation                 Xavier, constant (0)
Training iterations                         9000 epochs
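As a rough illustration of how the settings in Table A.3 fit together, the sketch below alternates five discriminator updates per generator update with the listed Adam settings; the data loader and the loss functions (for example, those sketched in Section 3.1) are assumed to exist, and the structure is not taken from the authors' code.

```python
import torch

def train_wacgan(D, G, loader, d_loss_fn, g_loss_fn, epochs=9000, n_critic=5,
                 noise_dim=128, n_classes=9, device="cuda"):
    """Training schedule sketch: D updated every step, G once per n_critic steps (5:1 ratio)."""
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.9))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3, betas=(0.5, 0.9))
    for epoch in range(epochs):
        for step, (x_real, y_real) in enumerate(loader):
            x_real, y_real = x_real.to(device), y_real.to(device)
            z = torch.empty(x_real.size(0), noise_dim, device=device).uniform_(-1, 1)
            y_fake = torch.randint(0, n_classes, (x_real.size(0),), device=device)
            # Discriminator update (every step).
            opt_d.zero_grad()
            d_loss_fn(D, G, x_real, y_real, z, y_fake).backward()
            opt_d.step()
            # Generator update (once per n_critic discriminator updates).
            if step % n_critic == 0:
                opt_g.zero_grad()
                g_loss_fn(D, G, z, y_fake).backward()
                opt_g.step()
```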

References

Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875.
Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Celisse, A., & Robin, S. (2008). Nonparametric density estimation by exact leave-p-out cross-validation. Computational Statistics and Data Analysis, 52(5), 2350–2368.
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in neural information processing systems (pp. 2172–2180).
Denton, E. L., Chintala, S., & Fergus, R. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in neural information processing systems (pp. 1486–1494).
Dyrmann, M., & Christiansen, P. (2014). Automated classification of seedlings using computer vision: Pattern recognition of seedlings combining features of plants and leaves for improved discrimination.
Dyrmann, M., Midtiby, H. S., & Jørgensen, R. N. (2016). Evaluation of intra variability between annotators of weed species in color images. In CIGR-AgEng conference, 26–29 June 2016, Aarhus, Denmark. Abstracts and full papers (pp. 1–6).
EPPO. (2019). EPPO Global Database (available online). https://gd.eppo.int.
Frid-Adar, M., Klang, E., Amitai, M., Goldberger, J., & Greenspan, H. (2018). Synthetic data augmentation using GAN for improved liver lesion classification. In 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018) (pp. 289–293). https://doi.org/10.1109/ISBI.2018.8363576.
Giselsson, T. M., Jørgensen, R. N., Jensen, P. K., Dyrmann, M., & Midtiby, H. S. (2017). A public image database for benchmark of plant seedling classification algorithms. arXiv preprint arXiv:1711.05458.
Goodfellow, I. (2016). NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In


Advances in neural information processing systems (pp. 2672–2680).
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of Wasserstein GANs. In Advances in neural information processing systems (pp. 5769–5779).
Halevy, A., Norvig, P., & Pereira, F. (2009). The unreasonable effectiveness of data. IEEE Intelligent Systems. https://doi.org/10.1109/MIS.2009.36.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
Minervini, M., Giuffrida, M. V., Perata, P., & Tsaftaris, S. A. (2017). Phenotiki: An open software and hardware platform for affordable and easy image-based phenotyping of rosette-shaped plants. The Plant Journal, 90(1), 204–216.
Mündermann, L., Erasmus, Y., Lane, B., Coen, E., & Prusinkiewicz, P. (2005). Quantitative modeling of Arabidopsis development. Plant Physiology, 139(2), 960–968.
Odena, A., Olah, C., & Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 2642–2651).
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99–121.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252. https://doi.org/10.1007/s11263-015-0816-y.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. In Advances in neural information processing systems (pp. 2234–2242).
Salimans, T., Zhang, H., Radford, A., & Metaxas, D. (2018). Improving GANs using optimal transport. arXiv preprint arXiv:1803.05573.


Scharr, H., Minervini, M., French, A. P., Klukas, C., Kramer, D. M., Liu, X., et al. (2016). Leaf segmentation in plant phenotyping: A collation study. Machine Vision and Applications, 27(4), 585–606.
Søgaard, H. T. (2005). Weed classification by active shape models. Biosystems Engineering. https://doi.org/10.1016/j.biosystemseng.2005.04.011.
Sun, C., Shrivastava, A., Singh, S., & Gupta, A. (2017). Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE international conference on computer vision. https://doi.org/10.1109/ICCV.2017.97.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818–2826).
Tsaftaris, S. A., Minervini, M., & Scharr, H. (2016). Machine learning for plant phenotyping needs image processing. Trends in Plant Science, 21(12), 989–991.
Ubbens, J., Cieslak, M., Prusinkiewicz, P., & Stavness, I. (2018). The use of plant models in deep learning: An application to leaf counting in rosette plants. Plant Methods, 14(1), 6.
Valerio Giuffrida, M., Scharr, H., & Tsaftaris, S. A. (2017). ARIGAN: Synthetic Arabidopsis plants using generative adversarial network. In Proceedings of the IEEE international conference on computer vision (pp. 2064–2071).
Ward, D., Moghadam, P., & Hudson, N. (2018). Deep leaf segmentation using synthetic data. arXiv preprint arXiv:1807.10931.


Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 27, pp. 3320–3328). Curran Associates, Inc.
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., et al. (2017a). StackGAN++: Realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1710.10916.
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., et al. (2017b). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 5907–5915).
Zheng, Z., Zheng, L., & Yang, Y. (2017). Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In Proceedings of the IEEE international conference on computer vision (pp. 3754–3762).
Zhu, Y., Aoun, M., Krijn, M., Vanschoren, J., & Campus, H. T. (2018b). Data augmentation using conditional generative adversarial networks for leaf counting in Arabidopsis plants. In Computer Vision Problems in Plant Phenotyping (CVPPP 2018).
Zhu, X., Liu, Y., Li, J., Wan, T., & Qin, Z. (2018a). Emotion classification with data augmentation using generative adversarial networks. In Pacific-Asia conference on knowledge discovery and data mining (pp. 349–360).