Accepted Manuscript

Progressive Generative Adversarial Networks with Reliable Sample Identification

Gang Wei, Minnan Luo, Huan Liu, Donghui Zhang, Qinghua Zheng

PII: S0167-8655(19)30006-6
DOI: https://doi.org/10.1016/j.patrec.2019.01.007
Reference: PATREC 7412

To appear in: Pattern Recognition Letters

Received date: 2 May 2018
Revised date: 22 November 2018
Accepted date: 8 January 2019

Please cite this article as: Gang Wei, Minnan Luo, Huan Liu, Donghui Zhang, Qinghua Zheng, Progressive Generative Adversarial Networks with Reliable Sample Identification, Pattern Recognition Letters (2019), doi: https://doi.org/10.1016/j.patrec.2019.01.007
Highlights

• We propose a principled model to alleviate the unstable training problem of GANs.
• We consider the quality of each sample by identifying reliable samples according to their loss.
• We conduct extensive experiments over challenging datasets to validate the effectiveness of our proposed models.
• The model is flexible enough to be extended to varying frameworks of GANs.
Pattern Recognition Letters
journal homepage: www.elsevier.com

Progressive Generative Adversarial Networks with Reliable Sample Identification

Gang Wei (a), Minnan Luo (a,b,*), Huan Liu (a), Donghui Zhang (a), Qinghua Zheng (a,c)

(a) School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
(b) Ministry of Education Key Lab For Intelligent Networks and Network Security, Xi'an, China
(c) National Engineering Lab for Big Data Analytics, Xi'an Jiaotong University, Xi'an, China

* Corresponding author: Minnan Luo (e-mail: [email protected])
ABSTRACT

Generative Adversarial Networks (GANs) are deep neural network architectures comprising two networks, a discriminator and a generator, which contest with each other in a zero-sum game. Although the original GANs and their variations have achieved impressive success in the past years, some challenges still remain, especially an unstable training process that leads to gradient vanishing or saturation. We can show by inspection that reliable samples with smaller errors are beneficial for achieving a better generator, while unreliable ones might disturb the training procedure. Enlightened by this observation, in this paper we introduce an indicator for each sample to indicate its reliability. Based on this, we exploit a new objective function to learn the generator/discriminator and infer the indicator for each sample simultaneously. In this way, the unreliable samples that might drive training to the opposite side are discarded in the training stage. Meanwhile, as the training errors become smaller, more and more samples are included in the set of reliable samples, until no more reliable ones are produced. It is noteworthy that the proposed method is adapted to both the original GANs and their variations. Experiments on the CIFAR-10, STL-10 and LSUN datasets demonstrate the state-of-the-art performance of the proposed framework with respect to GANs and their variations.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction
Generative adversarial networks (GANs) (Goodfellow et al., 2014) are a deep learning framework in which two models, namely a generative model and a discriminative model, are trained simultaneously. Specifically, the discriminative model aims to judge whether a sample comes from real data or fake data, while the generative model aims to capture the distribution of some target data (e.g., the distribution of high-resolution images) and puzzle the discriminative model. This procedure can be understood as a team of counterfeiters (the generative model) trying to produce fake currency to cheat the police (the discriminative model). Competing in this game, both the generative and discriminative models try to improve their performance until the counterfeits obtained by the generator are indistinguishable to the discriminator. In the past years, GANs have achieved great success in various applications such as object detection/recognition, 3D object reconstruction, image generation (Zhu et al., 2016), transformation, inpainting (Pathak et al., 2016) and so on.

In contrast to traditional generative models, GANs not only provide a more flexible framework with respect to different tasks but also alleviate the heavy computational burden suffered by Markov chain learning mechanisms (Goodfellow et al., 2014). However, there are still some practical and essential issues.
One of the significant challenges comes from the instability of the training process, which might result in poor performance of GANs (Arjovsky and Bottou, 2017). Based on such considerations, a considerable body of research has sprung up to mitigate this trend by reconstructing the objective functions of GANs (Mao et al., 2017) or by optimizing the network architecture. For example, Radford et al. (Radford et al., 2016) introduced a denoising auto-encoder and leveraged the "denoised" feature to estimate the energy gradient of the training data. Instead of using the extant Kullback-Leibler divergence, Wasserstein GAN (WGAN) (Arjovsky et al., 2017) prevented gradient vanishing by using a new distance named Earth-Mover. Denton et al. (Denton et al., 2015) improved the framework of the Laplacian pyramid with a cascade of convolutional networks (LAPGAN) to generate high-quality samples of natural images. Although these works have achieved decent performance, they still suffer from the "collapse" pathology (Arjovsky and Bottou, 2017) caused by an unstable training stage, because all fake samples obtained by the generator are treated equally during this process.

In the framework of GANs, fake samples produced by the generator are used to update its parameters. However, these numerous samples of different quality might have different impacts on training the generator. To illustrate this point more clearly, we take the task of image generation as an example. It is evident that high-quality fake samples, such as generated sharp and realistic images which are close to the real data distribution, are beneficial for training the generator, i.e., they provide a proper gradient descent direction and thus help reach a better minimum. On the contrary, samples of low quality, such as fuzzy or confusing generated images which are far away from the real data distribution, are adverse to this procedure because they are misleading and deviate the gradient descent direction. In other words, for the training process of GANs, the former generated fake samples are reliable while the latter generated samples are unreliable. Therefore, it is necessary to distinguish between reliable and unreliable samples. We should select reliable images but discard ambiguous ones when training the generator of GANs. See Figure 1 for a better understanding. In this case, the distance between fake images and real images represents the similarity between these two distributions. Fake samples in the left-hand column have a small distance (i.e., lower than λ) to the real data, therefore they should be selected as reliable samples during the training process. Conversely, the fake samples in the right-hand column should be discarded as their distribution is far away (i.e., the distance is greater than λ) from the real data.

Fig. 1: Example of selecting samples. The samples whose distance between "fake" and "real" images is no greater than λ are selected for the generator; otherwise the samples are regarded as unreliable and discarded.

Enlightened by the analysis above, we propose progressive generative adversarial networks (PGANs) by identifying reliable samples in the training procedure of GANs. The framework is shown in Figure 2. We propose a new objective function to learn the generator/discriminator and infer a binary indicator for each sample simultaneously. The indicator refers to whether this sample is reliable or not. To be specific, we discard the unreliable samples which might lead training to the opposite side, and only take into account the training loss from the selected reliable samples. It is noteworthy that the strategy used in this paper can be extended to the original GANs and their variations. Additionally, we introduce a negative ℓ1 regularization term which adaptively selects reliable samples in the training stage of GANs. Considering that the proposed objective function is non-smooth and non-convex, an effective alternating optimization algorithm is exploited to solve the proposed challenging problem.
The contributions of this paper are summarized as follows:

1. We propose a principled model to alleviate the unstable training problem of GANs. Besides, our model is flexible enough to be extended to varying frameworks of GANs, such as the deep convolutional generative adversarial network (DCGAN) (Radford et al., 2016) and WGAN (Arjovsky et al., 2017).

2. Instead of using all samples in a batch equally in the training stage of GANs, we consider the quality of each sample and develop progressive generative adversarial networks by identifying reliable samples according to their loss.

3. We conduct extensive experiments over the CIFAR-10 and STL-10 datasets to validate the effectiveness of our proposed models on the image generation task. Besides, we also conduct experiments on the challenging LSUN bedroom, church and bridge datasets to show our model's ability to generate complex images.

The remainder of this paper is organized as follows. In Section 2, we briefly review the related work on GANs and their applications. In Section 3, we first introduce an indicator variable for each sample and then propose a novel progressive GANs model by identifying the reliability of each sample in the training stage. An efficient alternating algorithm is exploited in Section 4 to solve the proposed challenging optimization problem. Section 5 presents extensive experiments over several benchmark datasets. Finally, conclusions are given in Section 6.
2. Related Works
With generative adversarial networks (Goodfellow et al., 2014) proposed in 2014, GANs have been receiving increasing attention in recent years. Compared with traditional generative networks, one of the most important advantages of GANs is that only backpropagation is needed, which leads to much simpler computation. As one of the most successful generative models, a rich body of work has been devoted to studying GANs and their variations for practical applications, especially image generation.

Arjovsky and Bottou (Arjovsky and Bottou, 2017) state that the training stage would be unstable and the performance of GANs would be miserable if the discriminator D is trained to be over-sufficient. The most direct way to deal with this problem is to constrain GANs; for instance, the conditional generative adversarial network (CGAN) (Mirza and Osindero, 2014) constrains the random noise by adding label information to both the generator and the discriminator. To improve GANs' inability to deal with high-resolution images, LAPGAN generates an image in several steps, each time producing only a part of the image. The advantage of this method is that it only models the residual error between fake samples and real samples, which makes the model easy to train. DCGAN proposes a specific network framework for GANs and uses Batch Normalization in both the generator and the discriminator, which makes the training process stable. WGAN (Arjovsky et al., 2017) and the improved training of WGAN (WGAN-gp) (Gulrajani et al., 2017) improve GANs by proposing a new measure, the Wasserstein-1 (also called Earth-Mover) distance, to calculate the distance between the model distribution and the real distribution.

In addition, GANs can also be used for other tasks. In the field of object detection, GANs can detect smaller objects in low-resolution and noisy representations (Li et al., 2017b) compared with traditional methods (Chang et al., 2017; Luo et al., 2018a). Bousmalis et al. (Bousmalis et al., 2016) use GANs to transfer images of a source domain into the target domain. In the task of feature selection (Luo et al., 2018b), GANs can learn disentangled representations (Chen et al., 2016). Similarly, GANs can also perform the face sketch synthesis task (Zhang et al., 2018a; Wang et al., 2017; Huang et al., 2018). In the image retrieval field, compared to traditional methods (Zhu et al., 2015), GANs outperform existing hashing methods (Song, 2017). As for image-to-image translation, GANs can learn the mapping from an input image to an output image (Isola et al., 2017; Li et al., 2017a). Beyond generating 2D images, GANs can generate 3D objects (Wu et al., 2016; Gadelha et al., 2016). Yeh et al. (Yeh et al., 2016) use GANs to restore images according to context and perception. Although GANs and their variations have achieved great success in a broad range of applications, the problem of the unstable training process is still unsolved and demands a more robust framework.
Fig. 2: The proposed framework: training GANs by selecting reliable samples (self-paced progress from easy to hard). A reliable sample with a smaller error is beneficial for achieving a better generator, while an unreliable one of poor quality might deviate the gradient descent direction in the training stage.
3. The Proposed Methodology

In this section, we briefly review the traditional GANs first and then exploit novel progressive GANs by identifying the reliable samples in the training stage. We have an unlabeled image set X = {x_1, x_2, · · · , x_n}, where x_i ∈ R^d is the feature vector of the i-th image (i = 1, 2, · · · , n). Let z = {z_1, z_2, · · · , z_m} be a set of random noise, where z_i ∈ R^{d_v} refers to a d_v-dimensional vector of random noise. Our task is to learn a generative model which is capable of transforming z into a fake image G(z), i.e., x̂ = G(z). For simplicity, the notations used in this paper are listed in Table 1.

Table 1: List of notations.

Notation | Description
n | number of real samples
m | number of fake samples
d_v | dimension of random noise
D | the parameter set of the discriminator
G | the parameter set of the generator
λ | threshold parameter to discard unreliable samples
x̂ | fake sample generated by the generator
X | the set of real samples
z | random noise
v | selection indicator vector
α, β_1, β_2 | hyper-parameters in Adam

To this end, the traditional GANs aim to learn a generative network G which transforms random noise z into x̂; at the same time, a discriminative network D is designed to distinguish the real image from the fake one. Note that both the generator and the discriminator are deep neural networks. Training GANs amounts to finding a Nash equilibrium in a two-player non-cooperative game (Salimans et al., 2016). Taking DCGAN as an example, the game between the generator G and the discriminator D is formally formulated as the following optimization problem:

min_G max_D  E_{x∼P_r}[log(D(x))] + E_{x̂∼P_g}[log(1 − D(x̂))].   (1)

Unfortunately, finding a Nash equilibrium is a very difficult problem. Besides, the training process of GANs is sometimes unstable, resulting in gradient vanishing or saturation.
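To make the objective in (1) concrete, the following is a minimal PyTorch sketch (not the authors' released code) of the standard minimax losses, assuming `D` maps an image batch to probabilities in (0, 1) and `G` maps noise to images; the network definitions themselves are left as placeholders.

```python
import torch

def gan_losses(D, G, x_real, z):
    """Standard GAN losses from Eq. (1): D maximizes, G minimizes.

    Assumes D(x) returns per-sample probabilities in (0, 1) and G(z) returns images.
    """
    eps = 1e-8                              # numerical stability for log
    x_fake = G(z)

    # Discriminator objective: log D(x) + log(1 - D(G(z))) (to be maximized).
    d_obj = torch.log(D(x_real) + eps).mean() \
          + torch.log(1.0 - D(x_fake.detach()) + eps).mean()

    # Generator objective: log(1 - D(G(z))) (to be minimized).
    g_obj = torch.log(1.0 - D(x_fake) + eps).mean()
    return -d_obj, g_obj                    # losses to *minimize* for D and G
```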
Considering the significance of selecting reliable samples for better training, in this section we introduce progressive processing into the training stage of GANs, to produce more realistic "fake" images to cheat the discriminator. Formally, we denote v_j ∈ {0, 1} as the selection indicator of sample x̂_j. If v_j equals 1, the sample x̂_j is selected as a reliable one for training; otherwise, the sample is discarded. For a better representation, let v = {v_1, · · · , v_m} ∈ {0, 1}^m be the selection indicator vector over all noise samples. We formulate the idea of progressive GANs with reliable sample identification as the following optimization problem:

min_{G,v} max_D  (1/n) Σ_{i=1}^{n} log(D(x_i)) + (1/m) Σ_{j=1}^{m} v_j log(1 − D(G(z_j))) + λ h(v),   (2)

where D and G collect the parameters of the deep neural networks of the discriminator D and the generator G, respectively, and λ is a non-negative parameter. In this work, the regularization term is formulated as a negative ℓ1-norm, i.e., h(v) = −‖v‖_1. In this sense, the parameter λ serves as a threshold to discard the unreliable samples in the iterative training procedure, i.e., their indicators are set to 0. Moreover, as the training errors become smaller during the training process, more indicators v_j (j = 1, 2, · · · , m) turn to 1. In other words, more and more samples are included in the set of reliable samples, until all samples are selected.

It is noteworthy that we formulate the idea of progressive GANs with respect to the original GANs in (2). Indeed, our model can be intuitively extended to the variations of GANs with different loss functions. In Table 2, we summarize the objective functions of some impressive variations of GANs, including DCGAN, WGAN and WGAN-GP. Note that DCGAN shares the same objective function with the original GANs. Regarding WGAN and WGAN-GP, the loss function of the discriminator measures the Wasserstein distance between the real samples and the fake ones. In Section 5, we conduct extensive experiments to demonstrate the superiority of the proposed progressive GANs over the corresponding GANs in Table 2.

Table 2: Different formulations of the loss function of GANs and their variations.

Name | Loss function
DCGAN | E[log D(x)] + E[log(1 − D(G(z)))]
WGAN | E[f_w(x)] − E[f_w(g_θ(z))], w ∈ W
WGAN-GP | E[D(x̂)] − E[D(x)] + λ E[(‖∇D(x̂)‖_2 − 1)^2]
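As a rough illustration of the three objectives in Table 2, the sketch below writes the corresponding discriminator/critic losses in PyTorch. The function names and the gradient-penalty weight `gp_weight` are illustrative assumptions, the WGAN Lipschitz constraint (w ∈ W) is only hinted at in a comment, and image batches are assumed to be in NCHW layout.

```python
import torch

def dcgan_d_loss(D, x_real, x_fake, eps=1e-8):
    # Maximize E[log D(x)] + E[log(1 - D(G(z)))] -> minimize the negative.
    return -(torch.log(D(x_real) + eps).mean()
             + torch.log(1.0 - D(x_fake) + eps).mean())

def wgan_d_loss(D, x_real, x_fake):
    # Maximize E[f_w(x)] - E[f_w(g_theta(z))]; the constraint w in W is usually
    # enforced by clipping the critic weights after each update (not shown).
    return -(D(x_real).mean() - D(x_fake).mean())

def wgan_gp_d_loss(D, x_real, x_fake, gp_weight=10.0):
    # E[D(x_hat)] - E[D(x)] + lambda * E[(||grad D(x_tilde)||_2 - 1)^2],
    # with x_tilde sampled on lines between real and fake points (NCHW batches).
    d_loss = D(x_fake).mean() - D(x_real).mean()
    alpha = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_interp = (alpha * x_real + (1 - alpha) * x_fake).requires_grad_(True)
    grads = torch.autograd.grad(D(x_interp).sum(), x_interp, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return d_loss + gp_weight * ((grad_norm - 1.0) ** 2).mean()
```

For each variant, the generator loss mirrors the corresponding fake-sample term (e.g., minimizing −E[D(G(z))] for the Wasserstein variants).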
4. Optimization Strategy

In this section, we exploit an alternating algorithm to solve the proposed optimization problem (2). For a better representation, we take the framework of DCGAN as an example to illustrate the training procedure of our proposed progressive GANs with reliable sample identification.

In the framework of the alternating optimization algorithm, we first update the indicator vector v ∈ {0, 1}^m with fixed generator G and discriminator D, i.e., we solve the following optimization problem:

min_{v∈{0,1}^m}  Σ_{j=1}^{m} v_j log(1 − D(G(z_j))) + λ h(v).   (3)

It is evident that the minimum of optimization problem (3) is attained at

v_j = 1 if log(1 − D(G(z_j))) < λ, and v_j = 0 otherwise,   (4)

for j = 1, 2, · · · , m. As a result, updating the indicator vector amounts to selecting the reliable samples in the iterative process of training the generator.
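The closed-form update in (4) is a simple per-sample threshold. A minimal PyTorch sketch of this step, assuming `D(G(z))` returns per-sample probabilities, is shown below.

```python
import torch

def update_indicators(D, G, z, lam, eps=1e-8):
    """Eq. (4): v_j = 1 if log(1 - D(G(z_j))) < lambda, else 0."""
    with torch.no_grad():
        per_sample_loss = torch.log(1.0 - D(G(z)) + eps)   # one value per sample
    return (per_sample_loss < lam).float()                  # indicator vector v
```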
Then, with the selected set of reliable samples, denoted as S, and the fixed generator G, we update the discriminator D by solving the following optimization problem:

max_D  Σ_{x_i∈X, z_j∈S} [ (1/n) log(D(x_i)) + (1/|S|) v_j log(1 − D(G(z_j))) ].   (5)

We denote the gradient of the objective function (5) with respect to the variable D by ∇_D Σ_{x_i∈X, z_j∈S} [ (1/n) log(D(x_i)) + (1/|S|) v_j log(1 − D(G(z_j))) ]; the optimal D can therefore be approached in each epoch by the Adam optimization algorithm:

D ← Adam( ∇_D Σ_{x_i∈X, z_j∈S} [ (1/n) log(D(x_i)) + (1/|S|) v_j log(1 − D(G(z_j))) ], D, α, β_1, β_2 ).   (6)

Similarly, we update the parameters G with fixed variable D and the selected set of reliable samples S by solving the following optimization problem:

min_G  Σ_{z_j∈S} log(1 − D(G(z_j))).   (7)

With the gradient of the objective function (7), represented as ∇_G (1/|S|) Σ_{z_j∈S} log(1 − D(G(z_j))), the optimal generator G can be updated by the Adam optimization algorithm:

G ← Adam( ∇_G (1/|S|) Σ_{z_j∈S} log(1 − D(G(z_j))), G, α, β_1, β_2 ).   (8)

For a better understanding, we summarize the key steps of the proposed alternating algorithm in Algorithm 1, where Adam refers to an extension of the stochastic gradient descent algorithm which is effective in this parameter space.

Algorithm 1: The alternating algorithm for optimization problem (2).
Input: Feature set X = {x_i ∈ R^d : i = 1, 2, · · · , n}; trade-off parameter λ ≥ 0; Adam hyper-parameters α, β_1, β_2; the number of steps to apply to the discriminator n_critic (here n_critic = 1).
Output: The discriminator D and the generator G.
Initialize: Random noise z ∈ R^{d_v}, initial discriminator D, generator G and indicator vector v.
1: for the number of training iterations do
2:   for j = 1 to m do
3:     if log(1 − D(G(z_j))) < λ then v_j = 1
4:     else v_j = 0
5:     end if
6:   end for
7:   for n_critic steps do
8:     g_D ← ∇_D Σ_{x_i∈X, z_j∈S} [ (1/n) log(D(x_i)) + (1/|S|) v_j log(1 − D(G(z_j))) ]
9:     D ← Adam(g_D, D, α, β_1, β_2)
10:  end for
11:  g_G ← ∇_G (1/m) Σ_{j=1}^{m} v_j log(1 − D(G(z_j)))
12:  G ← Adam(g_G, G, α, β_1, β_2)
13: end for
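For concreteness, the following PyTorch sketch mirrors the structure of Algorithm 1 under several assumptions not specified in the paper: the network definitions, the noise sampler, the learning rate (2e-4) and the data loader (assumed to yield batches of unlabeled real images) are all placeholders, so this is an illustration rather than the authors' implementation.

```python
import itertools
import torch

def train_pgan(D, G, real_loader, noise_dim, lam, steps,
               lr=2e-4, betas=(0.5, 0.9), n_critic=1, eps=1e-8):
    """Hedged sketch of Algorithm 1: DCGAN-style losses with reliable-sample selection.

    Assumes D(x) returns per-sample probabilities in (0, 1), G(z) returns images,
    and real_loader yields batches of unlabeled real images.
    """
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=betas)
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=betas)
    real_batches = itertools.cycle(real_loader)

    for _ in range(steps):
        x_real = next(real_batches)
        m = x_real.size(0)
        z = torch.randn(m, noise_dim)

        # (1) Indicator update, Eq. (4): select reliable fake samples.
        with torch.no_grad():
            v = (torch.log(1.0 - D(G(z)) + eps) < lam).float()
        s = max(float(v.sum()), 1.0)          # |S|, guarded against an empty set

        # (2) Discriminator update on real data and selected fakes, Eqs. (5)-(6).
        for _ in range(n_critic):
            x_fake = G(z).detach()
            d_obj = torch.log(D(x_real) + eps).mean() \
                  + (v * torch.log(1.0 - D(x_fake) + eps)).sum() / s
            opt_d.zero_grad()
            (-d_obj).backward()               # gradient ascent on the objective
            opt_d.step()

        # (3) Generator update on the selected fakes only, Eqs. (7)-(8).
        g_loss = (v * torch.log(1.0 - D(G(z)) + eps)).sum() / s
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
    return D, G
```

The guard on |S| keeps the updates well-defined when no fake sample passes the threshold, a corner case the paper does not discuss.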
5. Experiments

To demonstrate the effectiveness and superiority of the proposed model, in this section we conduct extensive experiments over three benchmark datasets.

5.1. Datasets

We use three benchmark datasets: CIFAR-10, STL-10 and LSUN. Specifically, CIFAR-10 (Krizhevsky and Hinton, 2009) is a well-studied dataset consisting of 60,000 color images of size 32 × 32. There are 50,000 training images and 10,000 test images in 10 classes with 6,000 images per category, including airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. STL-10 (Coates et al., 2011) is an image recognition dataset for developing unsupervised feature learning algorithms. It has the same classes as CIFAR-10, with 500 labeled training images and 800 test images per class, so each category has fewer labeled training examples than CIFAR-10. Note that STL-10 provides a very large set of unlabeled examples to learn image models with unsupervised training. LSUN (Yu et al., 2015) contains 10 scene categories and 20 object categories of labeled images of size 128 × 128. In the experiments on LSUN, we only use the bedroom, church and bridge subsets, consisting of almost 4M training images and 900 validation images.

5.2. Experimental Settings

To demonstrate the generalization of our method, for CIFAR-10 and STL-10 we not only conduct image generation but also calculate the Inception Score with the proposed models. However, we only generate images for the LSUN dataset, since this dataset is too large and the computational cost of the Inception Score would be unbearable. Although some papers have succeeded with image synthesis models operating at 128 × 128 (Salimans et al., 2016) and 256 × 256 (Zhao et al., 2016) resolution, we test our model at relatively low resolutions, i.e., 96 × 96 for the LSUN datasets, 64 × 64 for STL-10 and 32 × 32 for CIFAR-10, both for computational ease and because we believe that the problem of unconditional modeling of diverse image collections is not well solved even at low resolutions. In our experiments, three GAN architectures are used: DCGAN, WGAN and WGAN-gp. Our generator and discriminator architectures are both deep neural networks. Batch normalization (Ioffe and Szegedy, 2015) is used in the generator and discriminator of DCGAN and WGAN, but WGAN-gp only employs this layer in the discriminator. The number of selected reliable samples ranges over [2^3, 2^7]. For simplicity, all networks are trained with the Adam optimizer (Kingma and Ba, 2015) with parameters set as β_1 = 0.5 and β_2 = 0.9. We report the best results for each algorithm.
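The settings stated above can be collected into a small configuration sketch. The dictionary below restates only the reported choices (resolutions, Adam β_1/β_2, n_critic = 1, reliable-set size between 2^3 and 2^7); the learning rate and batch size are illustrative assumptions, as they are not specified in the text.

```python
# Experimental settings from Section 5.2; lr and batch_size are assumptions.
EXPERIMENT_CONFIG = {
    "resolutions": {"CIFAR-10": 32, "STL-10": 64, "LSUN": 96},
    "architectures": ["DCGAN", "WGAN", "WGAN-gp"],
    "adam": {"beta1": 0.5, "beta2": 0.9, "lr": 2e-4},      # lr is assumed
    "n_critic": 1,
    "reliable_samples_range": (2 ** 3, 2 ** 7),             # 8 to 128 selected fakes
    "batch_size": 64,                                       # assumed
}
```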
Fig. 3: Samples generated by Progressive GANs (Progressive DCGAN, Progressive WGAN and Progressive WGAN-gp) on the CIFAR-10 dataset.

Fig. 4: Samples generated by Progressive GANs (Progressive DCGAN, Progressive WGAN and Progressive WGAN-gp) on the STL-10 dataset.

Fig. 5: Samples generated by Progressive WGAN-gp on LSUN bedroom.

Fig. 6: Samples generated by Progressive WGAN-gp on LSUN church.

Fig. 7: Samples generated by Progressive WGAN-gp on LSUN bridge.
Fig. 8: Training curves and samples at different training stages and for different architectures.

Table 3: Comparison of Inception Score with variations of GANs on the CIFAR-10 dataset.

Model | Original | Ours
DCGAN | 6.50 | 6.79
WGAN | 5.65 | 6.02
WGAN-gp (DCGAN architecture) | 6.46 | 6.51

Table 4: Comparison of Inception Score with variations of GANs on the STL-10 dataset.

Model | Original | Ours
DCGAN | 6.83 | 7.17
WGAN | 5.88 | 5.99
WGAN-gp (DCGAN architecture) | 6.64 | 7.03

5.3. Experimental Results Analysis

In order to demonstrate the effectiveness of our proposed robust progressive generative adversarial networks with unlabeled reliable images, we use the Inception Score to evaluate 100 randomly generated images on CIFAR-10 and STL-10 (a minimal sketch of this computation is given at the end of this subsection). A higher Inception Score, as well as visual inspection, suggests that the procedure captures class-specific features of the training data in a manner superior to the original adversarial objective alone. The experimental results regarding the mean value of the Inception Score over CIFAR-10 and STL-10 are reported in Table 3 and Table 4, respectively. We observe from the experimental results that:

1. The original GAN models perform differently in Inception Score on both datasets: DCGAN performs best, followed by WGAN-gp, and WGAN performs worst. Specifically, WGAN-gp performs better than WGAN since WGAN-gp uses a gradient penalty term to make the model more stable.
2. The Inception Score of each progressive GAN model is higher than that of its original counterpart on both CIFAR-10 and STL-10, which indicates that the mechanism of selecting reliable samples in the training stage is effective for improving performance. Meanwhile, this phenomenon also shows the flexibility of the proposed framework with respect to different GAN architectures.

3. After using the proposed method, the ranking of the different models' performance is the same as for the original models on the different datasets; the properties of the different models stay unchanged, which indicates the robustness of our proposed models.

Because of space limitations, we only display some generated CIFAR-10 samples in Figure 3, STL-10 samples in Figure 4, LSUN bedrooms in Figure 5, churches in Figure 6 and bridges in Figure 7 to show the ability of the generator. CIFAR-10 and STL-10 have low-resolution but complicated images, therefore the images generated by the generator are less realistic. The LSUN datasets have high-resolution images, so the generated images are sharper and more recognizable to humans. All images generated by our model are comparable to those of the original models.
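As referenced above, the Inception Score can be computed from the class-probability outputs of a pretrained Inception network on the generated images. The sketch below assumes such an (N × C) probability matrix `probs` is already available and only implements the score itself (the exponential of the mean KL divergence between the conditional and marginal label distributions); it is not the authors' evaluation code.

```python
import torch

def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, C) matrix of softmax class probabilities.

    probs[i] is p(y | x_i) predicted by a pretrained Inception network for the
    i-th generated image; the score is exp(E_x[ KL(p(y|x) || p(y)) ]).
    """
    p_y = probs.mean(dim=0, keepdim=True)                    # marginal p(y)
    kl = (probs * (torch.log(probs + eps) - torch.log(p_y + eps))).sum(dim=1)
    return torch.exp(kl.mean()).item()

# Example with random probabilities standing in for real Inception outputs:
# fake_probs = torch.softmax(torch.randn(100, 1000), dim=1)
# print(inception_score(fake_probs))
```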
5.4. Training Quality and Convergence

Figure 8 shows the convergence curves of our proposed method over two datasets. To be precise, (a), (b) and (c) show the quality of images generated on CIFAR-10, while (d), (e) and (f) display the quality of images generated on STL-10. (a) and (d) are related to the Progressive DCGAN architecture, (b) and (e) to the Progressive WGAN architecture, and (c) and (f) to the Progressive WGAN-gp architecture. We can see a clear correlation between the Inception Score and sample quality on both CIFAR-10 and STL-10: the higher the Inception Score, the better the samples. As for convergence, all training architectures converge at the last stage of the training process. To be specific, Progressive DCGAN converges faster than the other architectures, followed by Progressive WGAN, and Progressive WGAN-gp is the slowest.

5.5. Sensitivity Analysis

There is only one hyper-parameter λ in the proposed methodology, which controls the amount of selected reliable samples. In this section, we set up some experiments to analyze the influence of this hyper-parameter on the quality of generated images over the CIFAR-10 and STL-10 datasets. To analyze the hyper-parameter λ, we plot sensitivity curves for the regularization parameter λ in Figure 9, where λ is tuned in {10^−2, 5 × 10^−2, 10^−1, 5 × 10^−1, 10^0, 5 × 10^0, 10^1}. The results indicate that our methodology, evaluated by the Inception Score on three GAN architectures, shows an analogous variation trend with the increase of λ. To be specific, the Inception Score improves gradually with increasing values of λ until the maximum is achieved; after that, the value begins to decrease as λ keeps increasing. A proper value of λ achieves the best performance by introducing reliable samples into the image generation task. According to the above, the performance of Progressive GANs fluctuates with varying values of the parameter λ. Overall, when λ is tuned in the interval [10^−1, 10], the performance is satisfactory and relatively stable.

Fig. 9: Sensitivity analysis of parameter λ in terms of Inception Score over two datasets: (a) CIFAR-10, (b) STL-10.

6. Conclusion and Future Work

Motivated by the significance of the reliability of samples used for training GANs, we propose new progressive generative adversarial networks that identify reliable samples. By introducing a binary indicator for each sample used in training, our idea is executed by learning the generator/discriminator and inferring the indicator for each sample simultaneously. It is notable that the proposed framework of progressive GANs is flexible and can be intuitively extended to the original GANs and their variations with varying objective functions. Extensive experimental results over several benchmark datasets, including CIFAR-10, STL-10 and three classes of the LSUN dataset (bedroom, church and bridge), demonstrate the state-of-the-art performance of the proposed framework with respect to the original GANs and their variations. In the framework of PGANs, there is always a limitation in that the hyper-parameter λ should be determined in advance. To combat this issue, we will attempt to use an adaptive method by adding a penalty term to the loss function and integrating the determination of λ into the training process. Additionally, object tracking is an important part of the computer vision field, widely used in motion analysis, video surveillance, human–computer interfaces, etc. (Henriques et al., 2015; Zhang et al., 2018b). At present, limited data samples still constrain the performance of object tracking. One interesting direction for our future work is to produce more reliable video samples with our method in order to benefit the performance of object tracking.
7. Acknowledgments

This work is supported by the National Natural Science Foundation of China (61502377), the National Key Research and Development Program of China (2018YFB1004500), the Innovative Research Group of the National Natural Science Foundation of China (61721002), the Innovation Research Team of the Ministry of Education (IRT 17R86), and the Project of China Knowledge Center for Engineering Science and Technology.

References

Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. In ICLR, 2017.
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In ICML, 2017.
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, and Dilip Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. arXiv preprint arXiv:1612.05424, 2016.
Xiaojun Chang, Zhigang Ma, Yi Yang, Zhiqiang Zeng, and Alexander G. Hauptmann. Bi-level semantic representation analysis for multimedia event detection. IEEE Transactions on Cybernetics, 47(5):1180–1197, 2017.
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, pages 2172–2180, 2016.
Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, pages 215–223, 2011.
Emily L. Denton, Soumith Chintala, Rob Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, 2015.
Matheus Gadelha, Subhransu Maji, and Rui Wang. 3D shape induction from 2D views of multiple objects. arXiv preprint arXiv:1612.05872, 2016.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. arXiv preprint arXiv:1704.00028, 2017.
João F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.
Bin Huang, Weihai Chen, Xingming Wu, Chun-Liang Lin, and Ponnuthurai Nagaratnam Suganthan. High-quality face image generated with conditional boundary equilibrium generative adversarial networks. Pattern Recognition Letters, 2018.
Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint, 2017.
Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
Ce Li, Xinyu Zhao, Zhaoxiang Zhang, and Shaoyi Du. Generative adversarial dehaze mapping nets. Pattern Recognition Letters, 2017.
Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, and Shuicheng Yan. Perceptual generative adversarial networks for small object detection. In IEEE CVPR, 2017.
Minnan Luo, Xiaojun Chang, Liqiang Nie, Yi Yang, Alexander G. Hauptmann, and Qinghua Zheng. An adaptive semisupervised feature analysis for video semantic recognition. IEEE Transactions on Cybernetics, 48(2):648–660, 2018.
Minnan Luo, Feiping Nie, Xiaojun Chang, Yi Yang, Alexander G. Hauptmann, and Qinghua Zheng. Adaptive unsupervised feature selection with structure regularization. IEEE Transactions on Neural Networks and Learning Systems, 29(4):944–956, 2018.
Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In ICCV, 2017.
Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NIPS, 2016.
Jingkuan Song. Binary generative adversarial networks for image retrieval. 2017.
Nannan Wang, Wenjin Zha, Jie Li, and Xinbo Gao. Back projection: an effective postprocessing method for GAN-based face sketch synthesis. Pattern Recognition Letters, 2017.
Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NIPS, pages 82–90, 2016.
Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, and Minh N. Do. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539, 2016.
Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
Lingling Zhang, Jun Liu, Minnan Luo, Xiaojun Chang, and Qinghua Zheng. Deep semisupervised zero-shot learning with maximum mean discrepancy. Neural Computation, 30(5):1426–1447, 2018.
Shunli Zhang, Wei Lu, Weiwei Xing, and Li Zhang. Learning scale-adaptive tight correlation filter for object tracking. IEEE Transactions on Cybernetics, 2018.
Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.
Lei Zhu, Jialie Shen, and Liang Xie. Topic hypergraph hashing for mobile image retrieval. In Proceedings of the 23rd ACM International Conference on Multimedia, pages 843–846. ACM, 2015.
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. Generative visual manipulation on the natural image manifold. In ECCV, 2016.