Progressive generative adversarial networks with reliable sample identification


Accepted Manuscript

Gang Wei, Minnan Luo, Huan Liu, Donghui Zhang, Qinghua Zheng

PII: S0167-8655(19)30006-6
DOI:
Reference: PATREC 7412

To appear in: Pattern Recognition Letters

Received date: 2 May 2018
Revised date: 22 November 2018
Accepted date: 8 January 2019

Please cite this article as: Gang Wei, Minnan Luo, Huan Liu, Donghui Zhang, Qinghua Zheng, Progressive Generative Adversarial Networks with Reliable Sample Identification, Pattern Recognition Letters (2019), doi:

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

• We propose a principled model to alleviate the instability of training GANs.

• We consider the quality of each sample by identifying reliable samples according to their loss.

• We conduct extensive experiments over challenging datasets to validate the effectiveness of our proposed model.

• The model is flexible enough to be extended to varying frameworks of GANs.




Gang Wei (a), Minnan Luo (a,b,∗∗), Huan Liu (a), Donghui Zhang (a), Qinghua Zheng (a,c)

(a) School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
(b) Ministry of Education Key Lab For Intelligent Networks and Network Security, Xi’an, China
(c) National Engineering Lab for Big Data Analytics, Xi’an Jiaotong University, Xi’an, China


∗∗ Corresponding author. E-mail: [email protected] (Minnan Luo)

Abstract

Generative Adversarial Networks (GANs) are deep neural network architectures comprising two neural networks, namely a discriminator and a generator, which contest with each other in a zero-sum game. In the past years, although the original GANs and their variations have achieved impressive success, some challenges still remain, especially the unstable training process, which leads to gradient vanishing or saturation. We observe that reliable samples with smaller errors are beneficial for achieving a better generator, while unreliable ones might disturb the training procedure. Inspired by this observation, we introduce an indicator for each sample to indicate its reliability. Based on this, we exploit a new objective function to learn the generator/discriminator and infer the indicator for each sample simultaneously. In this way, the unreliable samples that might mislead training are discarded in the training stage. Meanwhile, as the training errors become smaller, more and more samples are included in the reliable set of samples, until no more reliable ones are produced. It is noteworthy that the proposed method is applicable to both the original GANs and their variations. Experiments on the CIFAR-10, STL-10 and LSUN datasets demonstrate the state-of-the-art performance of the proposed framework with respect to GANs and their variations.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Generative adversarial networks (GANs) (Goodfellow et al., 2014) are a deep learning framework in which two models, namely a generative model and a discriminative model, are trained simultaneously. Specifically, the discriminative model aims to judge whether a sample is from real data or fake data, while the generative model aims to capture the distribution of some target data (e.g., the distribution of high-resolution images) and confuse the discriminative model. This procedure can be understood as a team of counterfeiters (generative model) trying to produce fake currency to cheat the police (discriminative model). Competing in this game, both the generative and discriminative models try to improve their performance until the counterfeits obtained by the generator are indistinguishable to the discriminator. In the past years, GANs have achieved great success in various applications such as object detection/recognition, 3D object reconstruction, image generation (Zhu et al., 2016), transformation, inpainting (Pathak et al., 2016) and so on.

In contrast to traditional generative models, GANs not only provide a more flexible framework with respect to different tasks but also alleviate the heavy computational burden suffered by the Markov chain learning mechanism (Goodfellow et al., 2014). However, there are still some practical and essential issues. One of the significant challenges comes from the instability of the training process, which might result in poor performance of GANs (Arjovsky and Bottou, 2017). Based on such consideration, a considerable body of research has sprung up to mitigate this problem by reconstructing the objective functions of GANs (Mao et al., 2017) or optimizing the network architecture. For example, Radford et al. (Radford et al., 2016) introduced a [...] denoising auto-encoder and leverage the “denoised” feature to estimate the energy gradient of training data. Instead of using the extant Kullback-Leibler divergence, Wasserstein GAN (WGAN) (Arjovsky et al., 2017) prevented gradient vanishing by using a new distance named Earth-Mover. Denton et al. (Denton et al., 2015) improved the framework of the Laplacian pyramid with a cascade of convolutional networks (LAPGAN) to generate high-quality samples of natural images. Although these works have achieved decent performance, they still suffer from the “collapse” pathology (Arjovsky and Bottou, 2017) caused by the unstable training stage, because all fake samples obtained by the generator are treated equally during this process.

In the framework of GANs, fake samples produced by the generator are used to update its parameters. However, these numerous samples with different qualities might have different impacts on training the generator. To illustrate this point more clearly, we take the task of image generation as an example. It is evident that high-quality fake samples, such as sharp and realistic generated images which are close to the real data distribution, are beneficial for training the generator, i.e., for obtaining a proper gradient descent direction and furthermore achieving a better minimum. On the contrary, samples with low quality, such as fuzzy or confusing generated images which are far away from the real data distribution, are adverse for this procedure because they are misleading and deviate the gradient descent direction. In other words, for the training process of GANs, the former generated fake samples are reliable while the latter generated samples are unreliable. Therefore, it is necessary to distinguish between the reliable and unreliable samples. We should select reliable images but discard ambiguous ones when training the generator of GANs. See Figure 1 for a better understanding. In this case, the distance between fake images and real images represents the similarity between these two distributions. Fake samples in the left-hand column have a small distance (i.e., lower than λ) to the real data, therefore they should be selected as reliable samples during the training process. Conversely, the fake samples in the right-hand column should be discarded as their distribution is far away (i.e., the distance is greater than λ) from the real data.

Fig. 1: Example of selecting samples. The samples whose distance between “fake” and “real” images is no greater than λ are selected for the generator; otherwise the samples are regarded as unreliable and discarded. [Figure labels: "Training GAN by selecting reliable samples"; real samples; fake samples with distance < λ (reliable); fake samples with distance > λ.]

Inspired by the analysis above, we propose progressive generative adversarial networks (PGANs) by identifying reliable samples in the training procedure of GANs. The framework is shown in Figure 2. We propose a new objective function to learn the generator/discriminator and infer a binary indicator for each sample simultaneously. The indicator refers to whether this sample is reliable or not. To be specific, we discard the unreliable samples which might mislead training, and only take into account the training loss from the selected reliable samples. It is noteworthy that the strategy used in this paper can be extended to the original GANs and their variations. Additionally, we introduce a negative ℓ1 regularization term which adaptively selects reliable samples in the training stage of GANs. Considering that the proposed objective function is non-smooth and non-convex, an effective alternating optimization algorithm is exploited to solve the proposed challenging problem. The contributions of this paper are summarized as follows:

1. We propose a principled model to alleviate the instability of training GANs. Besides, our model is flexible enough to be extended to varying frameworks of GANs, such as the deep convolutional generative adversarial network (DCGAN) (Radford et al., 2016) and WGAN (Arjovsky et al., 2017).

2. Instead of using all samples in a batch equally in the training stage of GANs, we consider the quality of each sample and develop progressive generative adversarial networks by identifying reliable samples according to their loss.

3. We conduct extensive experiments over the datasets CIFAR-10 and STL-10 to validate the effectiveness of our proposed models on the image generation task. Besides, we also conduct experiments on the challenging datasets LSUN bedroom, church and bridge to show our model’s ability to generate complex images.

The remainder of this paper is organized as follows. In Section 2, we briefly review the related work on GANs and their applications. In Section 3, we first introduce an indicator variable for each sample and then propose a novel progressive GANs model by identifying the reliability of each sample in the training stage. An efficient alternating algorithm is exploited in Section 4 to solve the proposed challenging optimization problem. Section 5 presents extensive experiments over several benchmark datasets. Finally, conclusions are given in Section 6.

2. Related Works

Since generative adversarial networks (Goodfellow et al., 2014) were proposed in 2014, they have been receiving increasing attention. Compared with traditional generative networks, one of the most important advantages of GANs is that only backpropagation is needed, which leads to much simpler computation. As one of the most successful generative models, GANs and their variations have been studied in a rich body of work for practical applications, especially image generation.

Arjovsky and Bottou (Arjovsky and Bottou, 2017) state that the training stage would be unstable and the performance of GANs would be poor if the discriminator D is over-trained. In order to deal with this problem, the most direct way is to constrain GANs; for instance, the conditional generative adversarial network (CGAN) (Mirza and Osindero, 2014) constrains the random noise by adding label information in both the generator and the discriminator. To improve GANs’ ability to deal with high-resolution images, LAPGAN generates an image in several steps, each time producing only a part of the image. The advantage of this method is that it only models the residual error between fake samples and real samples, which makes the model easy to train. DCGAN proposes a specific network framework for GANs and uses batch normalization in both the generator and the discriminator, which makes the training process stable. WGAN (Arjovsky et al., 2017) and improved training of WGAN (WGAN-gp) (Gulrajani et al., 2017) improve GANs by using a new measure, named the Wasserstein-1 (also called Earth-Mover) distance, to calculate the distance between the model distribution and the real distribution.

In addition, GANs can also be used for other tasks. In the field of object detection, GANs can detect smaller objects in low-resolution and noisy representations (Li et al., 2017b) compared with traditional methods (Chang et al., 2017; Luo et al., 2018a). Bousmalis et al. (Bousmalis et al., 2016) use GANs to transfer images of the source domain into the target domain. In the task of feature selection (Luo et al., 2018b), GANs can learn disentangled representations (Chen et al., 2016). Similarly, GANs can also perform face sketch synthesis (Zhang et al., 2018a; Wang et al., 2017; Huang et al., 2018). In the image retrieval field, compared to traditional methods (Zhu et al., 2015), GANs outperform existing hashing methods (Song, 2017). As for image-to-image translation, GANs can learn the mapping from the input image to the output image (Isola et al., 2017; Li et al., 2017a). Besides generating 2D images, GANs can generate 3D objects (Wu et al., 2016; Gadelha et al., 2016). Yeh et al. (Yeh et al., 2016) use GANs to restore images according to context and perception. Although GANs and their variations have achieved great success in a broad range of applications, the problem of the unstable training process is still unsolved and calls for a more robust framework.

Fig. 2: The reliable sample with smaller error is beneficial to achieve a better generator, while the unreliable one with poor quality might deviate the gradient descent direction in the training stage. [Figure labels: d_v-dim random noise; generated fake samples; real samples; Discriminator D; update D; self-paced progress from easy to hard.]


3. The Proposed Methodology

In this section, we first briefly review the traditional GANs and then develop novel progressive GANs by identifying the reliable samples in the training stage. We have an unlabeled image set $X = \{x_1, x_2, \cdots, x_n\}$, where $x_i \in \mathbb{R}^d$ is the feature vector of the $i$-th image ($i = 1, 2, \cdots, n$). Let $z = \{z_1, z_2, \cdots, z_m\}$ be a set of random noise, where $z_i \in \mathbb{R}^{d_v}$ refers to a $d_v$-dimensional vector of random noise. Our task is to learn a generative model which is capable of transforming $z$ into a fake image $G(z)$, i.e., $\hat{x} = G(z)$. For simplicity, a list of notations used in this paper is shown in Table 1.

Table 1: List of notations.

Notation                      Description
$n$                           number of real samples
$m$                           number of fake samples
$d_v$                         dimension of random noise
$D$                           the parameter set of the discriminator
$G$                           the parameter set of the generator
$\lambda$                     threshold parameter to discard unreliable samples
$\hat{x}$                     fake sample generated by the generator
$X$                           the set of real samples
$z$                           random noise
$v$                           selection indicator vector
$\alpha, \beta_1, \beta_2$    hyper-parameters in Adam

To this end, the traditional GANs aim to learn the generative network $G$ which transforms random noise $z$ into $\hat{x}$; at the same time, a discriminative network $D$ is designed to distinguish the real image from the fake one. Note that both the generator and the discriminator are deep neural networks. Training GANs amounts to finding a Nash equilibrium in a two-player non-cooperative game (Salimans et al., 2016). Taking DCGAN as an example, the game between the generator $G$ and the discriminator $D$ is formulated as the following optimization problem:

$$\min_{G} \max_{D} \; \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{\hat{x} \sim P_g}[\log(1 - D(\hat{x}))]. \tag{1}$$

Unfortunately, finding a Nash equilibrium is a very difficult problem. Besides, the training process of GANs is sometimes unstable, resulting in gradient vanishing or saturation.

Considering the significance of selecting reliable samples for better training, in this section we introduce progressive processing into the training stage of GANs, to produce more realistic “fake” images to cheat the discriminator. Formally, we denote $v_i \in \{0, 1\}$ as the selection indicator of sample $\hat{x}_i$. If $v_i$ equals 1, the sample $\hat{x}_i$ is selected as a reliable one for training; otherwise, the sample is discarded. For a better representation, let $v = \{v_1, \cdots, v_m\} \in \{0, 1\}^m$ be the selection indicator vector over all noise samples. We formulate the idea of progressive GANs with reliable sample identification as the following optimization problem:

$$\min_{G, v} \max_{D} \; \frac{1}{n} \sum_{i=1}^{n} \log D(x_i) + \frac{1}{m} \sum_{j=1}^{m} v_j \log(1 - D(G(z_j))) + \lambda h(v), \tag{2}$$

where $D$ and $G$ also denote the parameters of the deep neural networks of the discriminator and the generator, respectively, and $\lambda$ is a non-negative parameter. In this work, the regularization term is formulated as a negative ℓ1-norm, i.e., $h(v) = -\|v\|_1$. In this sense, the parameter $\lambda$ serves as a threshold to discard the unreliable samples in the iterative training procedure, i.e., the indicator of an unreliable sample is set to 0. Moreover, as the training errors become smaller during the training process, more indicators $v_j$ ($j = 1, 2, \cdots, m$) turn to 1. In other words, more and more samples are included in the reliable set of samples, until all samples are selected.

Table 2: Different formulations of the loss function of GANs and their variations.

Model      Loss function
DCGAN      $\mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]$
WGAN       $\mathbb{E}[f_w(x)] - \mathbb{E}[f_w(g_\theta(z))], \; w \in W$
WGAN-GP    $\mathbb{E}[D(\hat{x})] - \mathbb{E}[D(x)] + \lambda \mathbb{E}[(\|\nabla D(\hat{x})\|_2 - 1)^2]$

Algorithm 1 The alternating algorithm for optimization problem (2).

Input: Feature set $X = \{x_i \in \mathbb{R}^d : i = 1, 2, \cdots, n\}$; trade-off parameter $\lambda \geq 0$; Adam hyperparameters $\alpha, \beta_1, \beta_2$; the number of steps to apply to the discriminator $n_{critic}$, here $n_{critic} = 1$.
Output: The discriminator $D$ and generator $G$.
Initialize: Random noise $z \in \mathbb{R}^{d_v}$, initial discriminator $D$, generator $G$ and $v$.

for number of training iterations do
    for $j = 1$ to $m$ do
        if $\log(1 - D(G(z_j))) < \lambda$ then
            $v_j = 1$
        else
            $v_j = 0$
        end if
    end for
    for $n_{critic}$ steps do
        $g_D \leftarrow \nabla_D \sum_{x_i \in X;\, z_j \in S} \left[ \frac{1}{n} \log D(x_i) + \frac{1}{|S|} v_j \log(1 - D(G(z_j))) \right]$
        $D \leftarrow \mathrm{Adam}(g_D, D, \alpha, \beta_1, \beta_2)$
    end for
    $g_G \leftarrow \nabla_G \frac{1}{m} \sum_{j=1}^{m} v_j \log(1 - D(G(z_j)))$
    $G \leftarrow \mathrm{Adam}(g_G, G, \alpha, \beta_1, \beta_2)$
end for
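The quantities computed in one pass of Algorithm 1 can be illustrated with a minimal NumPy sketch. It covers only the indicator update and the masked objectives; the gradient and Adam steps are omitted. The function names are hypothetical, and the per-sample loss is passed in as a plain array standing in for log(1 − D(G(z_j))) or another loss from Table 2. The numbers in the usage lines are arbitrary illustrative values, not outputs of a trained model.

```python
import numpy as np


def update_indicators(fake_losses, lam):
    """Indicator step: v_j = 1 if the per-sample loss is below lam, else 0."""
    fake_losses = np.asarray(fake_losses, dtype=float)
    return (fake_losses < lam).astype(float)


def discriminator_objective(real_log_probs, fake_losses, v):
    """(1/n) * sum_i log D(x_i) + (1/|S|) * sum_j v_j * loss_j."""
    real_log_probs = np.asarray(real_log_probs, dtype=float)
    fake_losses = np.asarray(fake_losses, dtype=float)
    num_selected = v.sum()
    # Guard against an empty reliable set S (assumption: contribute 0 then).
    fake_term = (v * fake_losses).sum() / num_selected if num_selected > 0 else 0.0
    return float(real_log_probs.mean() + fake_term)


def generator_objective(fake_losses, v):
    """(1/m) * sum_j v_j * loss_j, the masked generator loss."""
    fake_losses = np.asarray(fake_losses, dtype=float)
    return float((v * fake_losses).sum() / fake_losses.size)


# Hypothetical per-sample values, chosen only to show the thresholding.
fake_losses = np.array([0.25, 1.5, 0.75, 2.0])
real_log_probs = np.log(np.array([0.9, 0.8]))

v = update_indicators(fake_losses, lam=1.0)
print(v)                                    # [1. 0. 1. 0.]
print(generator_objective(fake_losses, v))  # 0.25
print(discriminator_objective(real_log_probs, fake_losses, v))
```

Raising `lam` admits more samples into the reliable set, which mirrors the progressive behaviour described above: with `lam=10.0` all four illustrative samples are selected.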

It is noteworthy that we formulate the idea of progressive GANs with respect to the original GANs in (2). Indeed, our model can be intuitively extended to the variations of GANs with different loss functions. In Table 2, we summarize the objective functions of some impressive variations of GANs, including DCGAN, WGAN and WGAN-GP. Note that DCGAN shares the same objective function with the original GANs. Regarding WGAN and WGAN-GP, the loss function of the discriminator measures the Wasserstein distance between the real samples and the fake ones. In Section 5, we conduct extensive experiments to demonstrate the superiority of the proposed progressive GANs over the corresponding GANs in Table 2.

4. Optimization Strategy

In this section, we exploit an alternating algorithm to solve the proposed optimization problem (2). For a better representation, we take the framework of DCGAN as an example to illustrate the training procedure of our proposed progressive GANs with reliable sample identification.

In the framework of the alternating optimization algorithm, we first update the indicator vector $v \in \{0, 1\}^m$ with fixed generator $G$ and discriminator $D$, i.e., by solving the following optimization problem:

$$\min_{v \in \{0,1\}^m} \; \sum_{j=1}^{m} v_j \log(1 - D(G(z_j))) + \lambda h(v). \tag{3}$$

Since $h(v) = -\|v\|_1 = -\sum_{j} v_j$ for binary $v$, the objective decouples into $\sum_{j} v_j \left[\log(1 - D(G(z_j))) - \lambda\right]$ and can be minimized for each $v_j$ separately. It is evident that the minimum of optimization problem (3) is attained at

$$v_j = \begin{cases} 1, & \log(1 - D(G(z_j))) < \lambda, \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

for $j = 1, 2, \cdots, m$. As a result, the procedure of updating the indicator vector amounts to selecting the reliable samples in the iterative process of training the generator. Then, with the selected set of reliable samples, denoted as $S$, and the fixed generator $G$, we update the discriminator $D$ by solving the following optimization problem:

$$\max_{D} \sum_{x_i \in X;\, z_j \in S} \left[ \frac{1}{n} \log D(x_i) + \frac{1}{|S|} v_j \log(1 - D(G(z_j))) \right]. \tag{5}$$

We denote the gradient of the objective function (5) with respect to the variable $D$ by $\nabla_D \sum_{x_i \in X;\, z_j \in S} \left[ \frac{1}{n} \log D(x_i) + \frac{1}{|S|} v_j \log(1 - D(G(z_j))) \right]$; therefore the optimal $D$ can be approached in each epoch by the Adam optimization algorithm:

$$D \leftarrow \mathrm{Adam}\left( \nabla_D \sum_{x_i \in X;\, z_j \in S} \left[ \frac{1}{n} \log D(x_i) + \frac{1}{|S|} v_j \log(1 - D(G(z_j))) \right], \alpha, \beta_1, \beta_2 \right). \tag{6}$$

Similarly, we update the parameter $G$ with the fixed variable $D$ and the selected reliable sample set $S$ by solving the following optimization problem:

$$\min_{G} \; \frac{1}{|S|} \sum_{z_j \in S} \log[1 - D(G(z_j))]. \tag{7}$$

With the gradient of the objective function (7), represented as $\nabla_G \frac{1}{|S|} \sum_{z_j \in S} \log(1 - D(G(z_j)))$, the generator $G$ can be updated by the Adam optimization algorithm:

$$G \leftarrow \mathrm{Adam}\left( \nabla_G \frac{1}{|S|} \sum_{z_j \in S} \log(1 - D(G(z_j))), \alpha, \beta_1, \beta_2 \right). \tag{8}$$

For a better understanding, we summarize the key steps of the proposed alternating algorithm in Algorithm 1, where Adam refers to an extension of the stochastic gradient descent algorithm, which is effective in this parameter space.

5. Experiments

To demonstrate the effectiveness and superiority of the proposed model, in this section we conduct extensive experiments over three benchmark datasets.

5.1. Datasets

We use three benchmark datasets: CIFAR-10, STL-10 and LSUN. Specifically, CIFAR-10 (Krizhevsky and Hinton, 2009) is a well-studied dataset consisting of 60,000 color images with the size of 32 × 32. There are 50,000 training images and 10,000 test images in 10 classes with 6,000 images per category, including airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. STL-10 (Coates et al., 2011) is an image recognition dataset for developing unsupervised feature learning algorithms. It has classes similar to those of CIFAR-10, with 500 labeled training images and 800 test images per class, so each category has fewer labeled training examples than CIFAR-10. Note that STL-10 has a very large set of unlabeled examples for learning image models with unsupervised training. LSUN (Yu et al., 2015) contains 10 scene categories and 20 object categories of labeled images with the size of 128 × 128. In the experiments on LSUN, we only use the bedroom, church and bridge datasets, consisting of almost 4M training images and 900 validation images.

5.2. Experimental Settings

To demonstrate the generalization of our method, for CIFAR-10 and STL-10 we not only conduct image generation but also calculate the Inception Score with the proposed models. However, we only generate images from the LSUN dataset, since this dataset is too large and the computational cost of the Inception Score is prohibitive. Although some papers have succeeded with image synthesis models operating at 128 × 128 (Salimans et al., 2016) and 256 × 256 (Zhao et al., 2016) resolution, we test our model at relatively low resolutions, i.e., 96 × 96 for the LSUN dataset, 64 × 64 for the STL-10 dataset and 32 × 32 for the CIFAR-10 dataset, both for computational ease and because we believe that the problem of unconditional modeling of diverse image collections is not well solved even at low resolutions. In our experiments, three GAN architectures are used: DCGAN, WGAN and WGAN-gp. Our generator and discriminator architectures are both deep neural networks. Batch normalization (Ioffe and Szegedy, 2015) is used in the generator and discriminator of DCGAN and WGAN, but WGAN-gp only employs this layer in the discriminator. The number of selected reliable samples ranges over $[2^3, 2^7]$. For simplicity, all networks are trained with the Adam optimizer (Kingma and Ba, 2015) with the parameters set as $\beta_1 = 0.5$ and $\beta_2 = 0.9$. We report the best results for each algorithm.
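The Inception Score used for evaluation in this section can be sketched as follows. This is a minimal sketch assuming the commonly used definition IS = exp(E_x[KL(p(y|x) || p(y))]) introduced by Salimans et al. (2016); the function name and the toy probability matrices are hypothetical. In practice the class probabilities p(y|x) come from a pretrained Inception network applied to the generated images, which is not included here.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, C) matrix of class probabilities p(y|x)."""
    probs = np.asarray(probs, dtype=float)
    # Marginal class distribution p(y), averaged over the N samples.
    marginal = probs.mean(axis=0, keepdims=True)
    # Per-sample KL divergence KL(p(y|x) || p(y)); eps avoids log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


# Confident and diverse predictions: the score approaches the number of classes.
confident = np.eye(4)
# Identical, uninformative predictions: the score is at its minimum of 1.
uniform = np.full((4, 4), 0.25)

print(inception_score(confident))
print(inception_score(uniform))
```

The score rewards samples that are individually classified with confidence while covering many classes overall, which is why a higher value is read as better sample quality in the comparisons that follow.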


Fig. 3: Samples generated by Progressive GANs on the CIFAR-10 dataset. [Panels: Progressive DCGAN, Progressive WGAN, Progressive WGAN-gp.]

Fig. 4: Samples generated by Progressive GANs on the STL-10 dataset. [Panels: Progressive DCGAN, Progressive WGAN, Progressive WGAN-gp.]

Fig. 5: Samples generated from Progressive WGAN-gp on LSUN bedroom.

Fig. 6: Samples generated from Progressive WGAN-gp on LSUN church.

Fig. 7: Samples generated from Progressive WGAN-gp on LSUN bridge.

Fig. 8: Training curves and samples at different stages and different architectures of training.

Table 3: Comparison of Inception Score with variations of GANs on the dataset of CIFAR-10. [Table body not available; surviving row label: WGAN-gp (DCGAN architecture).]

Table 4: Comparison of Inception Score with variations of GANs on the dataset of STL-10. [Table body not available; surviving row label: WGAN-gp (DCGAN architecture).]


5.3. Experimental Results Analysis

In order to demonstrate the effectiveness of our proposed robust progressive generative adversarial networks with unlabeled reliable images, we use the Inception Score to evaluate 100 randomly generated images on CIFAR-10 and STL-10. A higher Inception Score, as well as visual inspection, suggests that the procedure captures class-specific features of the training data in a manner superior to the original adversarial objective alone. The experimental results regarding the mean value of the Inception Score over CIFAR-10 and STL-10 are given in Table 3 and Table 4, respectively. We observe from the experimental results that:

1. The original GAN models perform differently in Inception Score on both datasets, where DCGAN performs best, followed by WGAN-gp, and WGAN performs worst. Specifically, WGAN-gp performs better than WGAN since WGAN-gp uses a gradient penalty term to make the model more stable.

2. The Inception Score of each progressive GAN model is higher than that of its original counterpart on both CIFAR-10 and STL-10, which indicates that the mechanism of selecting reliable samples in the training stage is effective for improving the performance. Meanwhile, this phenomenon also shows the flexibility of the proposed framework concerning different GAN architectures.

3. After using the proposed method, the rank of the different models’ performance is the same as that of the original models on different datasets; the property of the different models stays unchanged, which indicates the robustness of our proposed models.

Because of space limitations, we only display some generated CIFAR-10 samples in Figure 3, STL-10 samples in Figure 4, LSUN bedrooms in Figure 5, churches in Figure 6 and bridges in Figure 7 to show the ability of the generator. CIFAR-10 and STL-10 have low-resolution but complicated images, therefore the images generated by the generator are less realistic. LSUN datasets have high-resolution images, so the generated images are sharper and more recognizable to humans. All images generated by our model are comparable to those of the original models.

5.4. Training quality and convergence

Figure 8 shows the convergence curves of our proposed method over two datasets. To be precise, (a), (b) and (c) show the quality of generated images on CIFAR-10, and (d), (e) and (f) show the quality of generated images on STL-10. (a) and (d) are related to the Progressive DCGAN architecture, (b) and (e) to the Progressive WGAN architecture, and (c) and (f) to the Progressive WGAN-gp architecture. We can see a clear correlation between the Inception Score and sample quality on both CIFAR-10 and STL-10: the higher the Inception Score is, the better the samples are. As for convergence, all training architectures converge at the last stage of the training process. To be specific, Progressive DCGAN converges faster than the other architectures, followed by Progressive WGAN, and Progressive WGAN-gp is the slowest.

5.5. Sensitivity Analysis

There is only one hyper-parameter λ in the proposed methodology, which controls the number of selected reliable samples. In this section, we set up experiments to analyze the influence of this hyper-parameter on the quality of generated images over the CIFAR-10 and STL-10 datasets. To analyze the hyper-parameter λ, we plot sensitivity curves for the regularization parameter λ in Figure 9, where λ is tuned in $\{10^{-2}, 5 \times 10^{-2}, 10^{-1}, 5 \times 10^{-1}, 10^{0}, 5 \times 10^{0}, 10^{1}\}$. The results indicate that our methodology, evaluated by Inception Score, shows an analogous trend on the three GAN architectures as λ increases. To be specific, the Inception Score improves gradually with increasing λ until the maximum is achieved; after that, the value begins to decrease as λ keeps increasing. A proper value of λ could achieve the best performance by introducing reliable samples into the image generation task. Accordingly, the performance of Progressive GANs fluctuates with varying values of the parameter λ. Overall, when λ is tuned in the interval $[10^{-1}, 10]$, the performance is satisfactory and relatively stable.

Fig. 9: Sensitivity analysis on parameter λ in terms of Inception Score over two datasets. [Panels: (a) CIFAR-10, (b) STL-10.]

6. Conclusion and Future Work

Motivated by the significance of the reliability of samples used for training GANs, we propose new progressive generative adversarial networks by identifying the reliable samples. By introducing a binary indicator for each training sample, our idea is executed by learning the generator/discriminator and inferring the indicator for each sample simultaneously. It is notable that the proposed framework of progressive GANs is flexible and can be intuitively extended to the original GANs and their variations with varying objective functions. Extensive experimental results over benchmark datasets including CIFAR-10, STL-10 and three classes of the LSUN dataset (bedroom, church and bridge) demonstrate the state-of-the-art performance of the proposed framework with respect to the original GANs and their variations. In the framework of PGAN, there is always a limitation on the hyper-parameter λ, which should be determined in advance. To combat this issue, we attempt to use an adaptive method by adding a penalty term to the loss function and integrating the determination of λ into the training stage. Additionally, object tracking is an important part of the computer vision field, which is widely used in motion analysis, video surveillance, human-computer interfaces, etc. (Henriques et al., 2015; Zhang et al., 2018b). At present, limited data samples still constrain the performance of object tracking. One interesting direction for our future work is to produce more reliable video samples with our methods in order to benefit the performance of object tracking.

7. Acknowledgments

This work is supported by the National Natural Science Foundation of China (61502377), the National Key Research and Development Program of China (2018YFB1004500), the Innovative Research Group of the National Natural Science Foundation of China (61721002), the Innovation Research Team of the Ministry of Education (IRT 17R86), the Project of China Knowledge Center for Engineering Science and Technology [...]

References

Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. In ICLR, 2017.

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In ICML, 2017.

Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, and Dilip Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. arXiv preprint arXiv:1612.05424, 2016.

Xiaojun Chang, Zhigang Ma, Yi Yang, Zhiqiang Zeng, and Alexander G Hauptmann. Bi-level semantic representation analysis for multimedia event detection. IEEE Transactions on Cybernetics, 47(5):1180–1197, 2017.

Emily L Denton, Soumith Chintala, Rob Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, 2015.

Matheus Gadelha, Subhransu Maji, and Rui Wang. 3D shape induction from 2D views of multiple objects. arXiv preprint arXiv:1612.05872, 2016.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. arXiv preprint arXiv:1704.00028, 2017.

João F Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.

Bin Huang, Weihai Chen, Xingming Wu, Chun-Liang Lin, and Ponnuthurai Nagaratnam Suganthan. High-quality face image generated with conditional boundary equilibrium generative adversarial networks. Pattern Recognition Letters, 2018.

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint, 2017.

Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.

Ce Li, Xinyu Zhao, Zhaoxiang Zhang, and Shaoyi Du. Generative adversarial dehaze mapping nets. Pattern Recognition Letters, 2017a.

Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, and Shuicheng Yan. Perceptual generative adversarial networks for small object detection. In IEEE CVPR, 2017b.

Minnan Luo, Xiaojun Chang, Liqiang Nie, Yi Yang, Alexander G Hauptmann, and Qinghua Zheng. An adaptive semisupervised feature analysis for video semantic recognition. IEEE Transactions on Cybernetics, 48(2):648–660, 2018a.

Minnan Luo, Feiping Nie, Xiaojun Chang, Yi Yang, Alexander G Hauptmann,

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter

and Qinghua Zheng. Adaptive unsupervised feature selection with structure

Abbeel. Infogan: Interpretable representation learning by information max-

regularization. IEEE transactions on neural networks and learning systems,

imizing generative adversarial nets. In Advances in neural information processing systems, pages 2172–2180, 2016. Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, pages 215–223, 2011.

29(4):944–956, 2018. Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In ICCV, 2017.

ACCEPTED MANUSCRIPT 12 Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014. Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016. Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In NIPS, 2016. Jingkuan Song. Binary generative adversarial networks for image retrieval. 2017. fective postprocessing method for gan-based face sketch synthesis. Pattern Recognition Letters, 2017. Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generativeadversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016. Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, and Minh N Do. Semantic image inpainting with perceptual and contextual


losses. arXiv preprint arXiv:1607.07539, 2016. Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using

deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.

Lingling Zhang, Jun Liu, Minnan Luo, Xiaojun Chang, and Qinghua Zheng. Deep semisupervised zero-shot learning with maximum mean discrepancy. Neural computation, 30(5):1426–1447, 2018.


Shunli Zhang, Wei Lu, Weiwei Xing, and Li Zhang. Learning scale-adaptive

tight correlation filter for object tracking. IEEE transactions on cybernetics, 2018.

Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-based generative ad-


versarial network. arXiv preprint arXiv:1609.03126, 2016.

Lei Zhu, Jialie Shen, and Liang Xie. Topic hypergraph hashing for mobile image retrieval. In Proceedings of the 23rd ACM international conference on Multimedia, pages 843–846. ACM, 2015.


Jun-Yan Zhu, Philipp Kr¨ahenb¨uhl, Eli Shechtman, and Alexei A Efros. Gener-


ative visual manipulation on the natural image manifold. In ECCV, 2016.



Nannan Wang, Wenjin Zha, Jie Li, and Xinbo Gao. Back projection: an ef-