Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 144 (2018) 133–139

www.elsevier.com/locate/procedia

INNS Conference on Big Data and Deep Learning 2018
Training dataset reduction on generative adversarial network

Fajar Ulin Nuha, Afiahayati*
Department of Computer Science and Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia
Abstract

In recent years, the generative adversarial network (GAN), a generative model built from neural networks, has become an interesting field in machine learning. However, no study has investigated the effect of reducing the training dataset for a GAN model, even though it is known that collecting images for a training dataset requires a lot of human labor. In this research, a series of experiments with various dataset sizes was conducted to find out how small a dataset a GAN needs in order to work. It is shown that a reduction to around fifty thousand images gives a better result than the full dataset. Additionally, a new evaluation method for quantifying the performance of a GAN is proposed, which may be considered later as another evaluation method for the GAN framework.

© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the INNS Conference on Big Data and Deep Learning 2018.

Keywords: generative model; adversarial training; deep learning; convolutional nets; dataset size
1. Introduction

Generating images from a dataset is one of the interesting fields in artificial intelligence. A model capable of generating data from a data distribution is called a generative model, and there are many reasons why generative models should be studied. One reason is that a generative model can perform semi-supervised learning well: by incorporating the model into the training data, the generative model can predict inputs that are missing. On the other hand, training and sampling from generative models is a good test of our ability to manipulate and represent high-dimensional probability distributions, which are an important object of study in applied mathematics and engineering.
* Corresponding author. Tel.: +62-274-546194; fax: +62-274-546194.
E-mail address: [email protected]

1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the INNS Conference on Big Data and Deep Learning 2018.
10.1016/j.procs.2018.10.513
According to the book by Shaker et al. [1], another reason to adopt image generation technology is that it reduces the human labor needed to create content in industries such as games. Creating images manually is expensive and slow, and an industry often needs this work repeatedly, resulting in a high cost in money and time. Many of the costly employees in this kind of work are designers and artists rather than programmers.

In 2014, the Generative Adversarial Network (GAN) framework was invented by Goodfellow et al. [2]. The framework estimates a generative model via an adversarial process in which two neural networks compete against each other. These networks are called the Discriminator and the Generator: the Generator is analogous to a counterfeiter trying to produce fake money, while the Discriminator is analogous to the police, trying to detect whether the money is fake (produced by the Generator) or real (drawn from the training data).

Deep neural networks are usually trained with a large amount of data, yet dataset collection in machine learning is hard and requires a lot of human labor [3]. For example, collecting the popular ImageNet dataset required more than a year of human labor on Amazon Mechanical Turk. Most GAN papers use very large training datasets; the original GAN paper, for example, used the CIFAR-10 dataset, which consists of 60,000 images, and the MNIST database of handwritten digits, whose training set has 60,000 examples. This is a problem, since collecting a dataset is not an easy task. In other words, the smaller the training dataset a GAN requires, the more advantageous the GAN is for real-world applications.

Although much research has been done on GANs, it mostly concerns improving GAN performance and applications; experiments investigating the training dataset size for a GAN have not been done yet. In this research, using a newly proposed evaluation method, we experimented with various dataset sizes to find out how small a dataset a GAN needs in order to work.

2. Model Design

2.1. Base Architecture

This research used the Deep Convolutional Generative Adversarial Network (DCGAN) model proposed by Radford et al. [4]. It is an improved version of the original GAN that can handle higher-resolution images and has become one of the standards for most GAN-based architectures. The work adopts the Convolutional Neural Network [5] in its architecture because it works well for images, as in the work of Rajagede et al. [6]. The architecture has several key features: convolutional layers for the discriminator, transposed convolutional layers for the generator, no pooling or un-pooling, and batch normalization on most layers to stabilize the learning process [7]. The input of the discriminator is an image from the training dataset; the input of the generator is a 100-dimensional vector generated randomly from a uniform distribution [8]. The output of the network is a generated image. The framework was implemented with the TensorFlow [9] deep learning framework, based on the code by Kim [10].
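For concreteness, the following is a minimal sketch of a DCGAN-style generator in the spirit of Radford et al. [4], written with the TensorFlow Keras API rather than the original code by Kim [10]; the filter counts and the 64x64 output resolution are illustrative assumptions, not the exact configuration used in this research.

import tensorflow as tf
from tensorflow.keras import layers

def build_generator(z_dim=100):
    # Project and reshape the 100-D noise vector to a 4x4x512 tensor,
    # then upsample with strided transposed convolutions to 64x64x3.
    return tf.keras.Sequential([
        layers.Dense(4 * 4 * 512, input_shape=(z_dim,)),
        layers.Reshape((4, 4, 512)),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(256, 5, strides=2, padding="same"),  # 8x8
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(128, 5, strides=2, padding="same"),  # 16x16
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same"),   # 32x32
        layers.BatchNormalization(),
        layers.ReLU(),
        # Last layer uses tanh and no batch norm, so pixels lie in [-1, 1].
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),  # 64x64
    ])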
2.2. Dataset

Two datasets were used to test the performance of each experiment in this research. The first is the CelebFaces Attributes (CelebA) dataset [11], a large-scale face attributes dataset with more than 200,000 celebrity images (202,599 face images in total). It has large pose variations, background clutter, large diversities, large quantities, and rich annotations, and it is usually used for face detection, face attribute recognition, and landmark localization. The second dataset is the bridge subset of the LSUN dataset [12], which contains over 700,000 images. Each entry is a JPEG image; the dataset authors resized all images so that the smaller dimension is 256 and compressed them as JPEG with quality 75. To reduce the datasets, the scheme in Table 1 was used when designing the experiments for this research.
Table 1. Training Data Partition List

Dataset       Quantity   Image Total
celebA        100%       202,599
celebA        25%        50,650
celebA        1%         2,025
LSUN Bridge   35%        289,595
LSUN Bridge   8%         72,399
LSUN Bridge   0.3%       2,896
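For illustration only, the partitions in Table 1 could be drawn as random subsets of the image file lists, as sketched below; the paper does not specify the selection mechanism, so the uniform random sampling and the `paths` variable are assumptions.

import random

def subsample(paths, fraction, seed=0):
    # Draw a reproducible random subset containing `fraction` of the files.
    rng = random.Random(seed)
    k = round(len(paths) * fraction)
    return rng.sample(paths, k)

# e.g. the 25% celebA partition of Table 1:
# celeba_25 = subsample(celeba_paths, 0.25)  # about 50,650 of 202,599 images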
2.3. Evaluation

A performance investigation of each experiment requires an evaluation technique to assess how well a model performs. However, no method to evaluate a GAN quantitatively is established yet, and this is an interesting research topic in the field [13]. Thus, we propose a quantitative evaluation technique based on a survey of human eyes. The assessment is calculated using the fallout given by Eq. (1), where a false positive (FP) is a generated image that a respondent judges to be real and a true negative (TN) is a generated image correctly judged to be fake:

Fallout = FP / (FP + TN)    (1)
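As a minimal sketch of this metric, the fallout of one evaluation form can be computed as follows; the encoding of the responses as a list of booleans is an assumption made for illustration.

# Fallout of Eq. (1). Each element of `responses` corresponds to one
# generated (fake) image shown to a respondent: True means the respondent
# judged it as real (a false positive), False means it was correctly
# judged as fake (a true negative).
def fallout(responses):
    fp = sum(responses)           # fake images judged real
    tn = len(responses) - fp      # fake images judged fake
    return fp / (fp + tn)

# Example: 3 of 8 generated images judged real gives a fallout of 37.5%.
print(fallout([True] * 3 + [False] * 5))  # 0.375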
The formula is taken from the confusion matrix [14], a standard measurement in machine learning tasks. The fallout formula measures how often the model can fool human eyes. At the beginning, the respondents were not told that any of the images in the form were fake. The whole evaluation process needs a trained model: after each model had been trained, a constant uniformly distributed random matrix was fed into the generator network to produce 15 images, and the generated images were combined into one form that was evaluated by the respondents in person. Figure 1 gives a high-level overview of the evaluation method.

3. Experiments and Results

Using the dataset partitions described before, we also experimented with other parameters, hoping to get a better result for our investigation. The model was modified as follows. First, the convolutional network is large and computationally expensive, so we tried several feature-map sizes to make the network simpler and lighter. Second, we reduced the training batch size, since 64 is relatively large when the training dataset is small. From these considerations, the experiment rundown in Table 2 was implemented, together with an alias for each experiment.
Fig. 1. The evaluation method used in this research. The method has not been used in previous GAN research and might be considered later as another evaluation method for GAN.

Table 2. Experiment Rundown
Alias   Dataset       Dataset Size   Epoch   First Dimension   Batch Size
C1      celebA        2,025          15      64                64
C2      celebA        2,025          45      32                32
C3      celebA        50,650         15      64                64
C4      celebA        50,650         30      48                48
C5      celebA        202,599        15      64                64
L1      LSUN Bridge   2,896          15      64                64
L2      LSUN Bridge   2,896          45      32                32
L3      LSUN Bridge   72,399         15      64                64
L4      LSUN Bridge   72,399         30      48                48
L5      LSUN Bridge   289,595        15      64                64
The generated images are the output of the generator network after the training phase. The input was a uniformly distributed random matrix with values between -1 and 1 and dimensions of batch size by 100; a small sketch of this sampling step is given below. The best result for the CelebA dataset is shown in Figure 2.
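This sketch assumes the batch size of 64 from Table 2; the call to the trained generator is hypothetical.

import numpy as np

# Latent input described above: uniform random values in [-1, 1],
# shaped (batch_size, 100).
batch_size, z_dim = 64, 100
z = np.random.uniform(-1.0, 1.0, size=(batch_size, z_dim)).astype(np.float32)
# images = trained_generator(z)  # hypothetical call to the trained generator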
Fig. 2. The result of experiment C3, the best result, showing random faces generated by the machine

The best result for the CelebA dataset was generated in experiment C3. Figure 3 compares the experiments' results for one face image in more detail. The experiments were conducted using the same latent variable as the input.

Fig. 3. A comparison between each of the face experiments using the same latent variable as the input

Fig. 4. A comparison between each of the bridge experiments using the same latent variable as the input

The comparison of the experiments' detailed results for one bridge image from the LSUN bridge dataset is shown in Figure 4, while the overall best results for the LSUN bridge dataset are depicted in Figure 5.
Fig. 5. The result of experiment L3, the best result, showing random bridges generated by the machine
4. Survey Evaluation Result

The survey was conducted with five people of various ages and genders. The data were collected by asking the respondents one by one, and we tried to make the survey as transparent as possible. The flow of the survey can be seen in Figure 1.

Table 3. The result of survey evaluation; columns I-V are the respondents

Experiment   I         II        III       IV        V         Average
C1           43.75%    37.50%    43.75%    0.00%     50.00%    35.00%
C2           43.75%    43.75%    50.00%    18.75%    25.00%    36.25%
C3           62.50%    68.75%    68.75%    56.25%    50.00%    61.25%
C4           68.75%    56.25%    87.50%    56.25%    31.25%    60.00%
C5           62.50%    68.75%    50.00%    56.25%    50.00%    57.50%
L1           0.00%     0.00%     0.00%     0.00%     0.00%     0.00%
L2           12.50%    62.50%    62.50%    0.00%     6.25%     28.75%
L3           43.75%    68.75%    87.50%    43.75%    37.50%    56.25%
L4           37.50%    31.25%    50.00%    43.75%    37.50%    40.00%
L5           18.75%    56.25%    43.75%    43.75%    31.25%    38.75%
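The per-dataset averages discussed below can be checked directly from the Average column of Table 3:

# Average fallout per dataset, from the "Average" column of Table 3.
celeba_avg = [35.00, 36.25, 61.25, 60.00, 57.50]   # C1..C5
bridge_avg = [0.00, 28.75, 56.25, 40.00, 38.75]    # L1..L5
print(sum(celeba_avg) / len(celeba_avg))  # 50.0  -> 50% for celebA
print(sum(bridge_avg) / len(bridge_avg))  # 32.75 -> 32.75% for LSUN Bridge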
Based on the above results, experiments with the low-level dataset gained a very low percentage and experiments with the high-level dataset a good enough percentage, but for both datasets the winner was the experiment with the mid-level dataset. Additionally, averaging the results per dataset gives an average fallout of 50% for the celebA experiments and 32.75% for the bridge experiments; celebA therefore got a better result than LSUN Bridge.

5. Conclusion and Future Works

Reducing the dataset size to the low level (around 2,000 images) results in very poor images, while reduction to the mid level (around 50,000 images) produced results that are competitive with, if not better than, those of the high-level (around 200,000 images) dataset. It was also seen that a smaller feature map and batch size sometimes complement a smaller dataset (compare L1 with L2 and C1 with C2). The evaluation method proposed in this research might be considered later as another GAN evaluation method, since it gives a quantified value for the performance of each experiment. Further evaluation of GAN architectures using this method may be interesting future work.

References
[1] N. Shaker, J. Togelius, and M. J. Nelson, Procedural Content Generation in Games. Springer, 2016.
[2] I. Goodfellow et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[3] D. Rolnick, A. Veit, S. Belongie, and N. Shavit, "Deep learning is robust to massive label noise," arXiv preprint arXiv:1705.10694, 2017.
[4] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[6] R. A. Rajagede, C. K. Dewa, et al., "Recognizing Arabic letter utterance using convolutional neural network," in 2017 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2017, pp. 181–186.
[7] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[8] L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences. Courier Corporation, 2012.
[9] M. Abadi et al., "TensorFlow: A system for large-scale machine learning," in OSDI, 2016, vol. 16, pp. 265–283.
[10] T. Kim, "DCGAN in Tensorflow."
[11] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3730–3738.
[12] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, "LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop," arXiv preprint arXiv:1506.03365, 2015.
[13] I. Goodfellow, "NIPS 2016 tutorial: Generative adversarial networks," arXiv preprint arXiv:1701.00160, 2016.
[14] D. M. Powers, "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation," 2011.