Feature preserving GAN and multi-scale feature enhancement for domain adaption person Re-identification

Feature preserving GAN and multi-scale feature enhancement for domain adaption person Re-identification

Feature Preserving GAN and multi-scale Feature Enhancement for Domain Adaption Person Re-identification Communicated by Dr Li Sheng Accepted Manusc...

2MB Sizes 0 Downloads 81 Views

Feature Preserving GAN and multi-scale Feature Enhancement for Domain Adaption Person Re-identification

Communicated by Dr

Li Sheng

Accepted Manuscript

Feature Preserving GAN and multi-scale Feature Enhancement for Domain Adaption Person Re-identification Xiuping Liu, Hongchen Tan, Xin Tong, Junjie Cao, Jun Zhou PII: DOI: Reference:

S0925-2312(19)31068-9 https://doi.org/10.1016/j.neucom.2019.07.063 NEUCOM 21124

To appear in:

Neurocomputing

Received date: Revised date: Accepted date:

4 January 2019 18 June 2019 20 July 2019

Please cite this article as: Xiuping Liu, Hongchen Tan, Xin Tong, Junjie Cao, Jun Zhou, Feature Preserving GAN and multi-scale Feature Enhancement for Domain Adaption Person Re-identification, Neurocomputing (2019), doi: https://doi.org/10.1016/j.neucom.2019.07.063

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Feature Preserving GAN and multi-scale Feature Enhancement for Domain Adaption Person Re-identification

a School

CR IP T

Xiuping Liua , Hongchen Tana,∗, Xin Tongb , Junjie Caoa , Jun Zhoua

of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China Graphics Group, Microsoft Research Asia, Beijing 100080, China

b Internet

AN US

Abstract

The performance of Person Re-identification (Re-ID) model depends much on its training dataset, and drops significantly when the detector is applied to a new scene due to the large variations between the source training dataset and the target scene. In this paper, we proposed multi-scale Feature Enhancement(MFE) Re-ID model and Feature Preserving Generative Adversarial Network (FPGAN)

M

for cross-domain person Re-ID task. Here, MFE Re-ID model provides a strong baseline model for cross-domain person Re-ID task, and FPGAN bridges the

ED

domain gap to improve the performance of Re-ID on target scene. In the MFE Re-ID model, person semantic feaure maps, extracted from backbone of segmentation model, enhance person body region’s multi-scale feature responce.

PT

This operation could capture multi-scale robust discriminative visual factors related to person. In FPGAN, we translate the labeled images from source to target domain in an unsupervised manner, and learn a transfer function

CE

to preserve the person perceptual information of source images and ensure the transferred person images show similar styles with the target dataset. Extensive

AC

experiments demonstrate that combining FPGAN and MFE Re-ID model could achieve state-of-the-art results in cross-domain Re-ID task on DukeMTMC-reID ✩ Fully

documented templates are available in the elsarticle package on CTAN. author Email addresses: [email protected] (Xiuping Liu), [email protected] (Hongchen Tan), [email protected] (Xin Tong), [email protected] (Junjie Cao), [email protected] (Jun Zhou) ∗ Corresponding

Preprint submitted to Journal of LATEX Templates

July 24, 2019

ACCEPTED MANUSCRIPT

and Market-1501 datasets. Besides, MFE Re-ID model could achieve state-ofthe-art results in supervised Re-ID task. All source codes and models will be released for comparative study.

GAN, multi-scale Feature Enhancement.

1. Introduction

CR IP T

Keywords: Domain Adaptation, Person Re-identification, Feature Preserving

Person Re-identification (Re-ID) aims to match person images captured

AN US

from two non-overlapping cameras. Due to the importance in automated video surveillance and forensics, person Re-identification has drawn much attention 5

in the computer vision and machine learning communities [1, 2, 3, 4]. It is also a challenge computer vision task because the visual appearance of a person often undergoes intensive changes in illumination, background, camera viewangle and human pose. The art implicitly addresses this problem by learning

M

identity-discriminative but robust visual appearance characteristics or factors. Most existing Re-ID employ deep neural networks (DNNs) to learn robust

10

discriminative features such as optimising pairwise matching distance metrics

ED

[5, 6, 7] or deep learning methods [8, 9, 10]. For matching, the features are typically extracted from the very top feature layer of a trained model. It is

15

PT

widely acknowledged [11, 12, 13] that, when progressing from the bottom to the top layers, the visual concepts captured by the feature maps tend to be more abstract. As shown in Figure 1, shallow feature maps contain more de-

CE

tail information, middle-level feature maps contain more structure contextual information, and deep-level feature maps contain more abstract semantic infor-

AC

mation. And automatically learning the multi-scale robust discriminative visual

20

factors, plays an important role on Re-ID task [10]. In order to capture discriminative factors of multiple levels, [14, 15, 16] fo-

cused on learning semantic visual information with additional person attributes (gender, object carrying, clothing colour/texture, etc). However, annotating attributes is costly and error-prone in some degree. Then, HP-net [17], HA-CNN

2

person image

Figure 1:

shallow feature

middle-level feature

CR IP T

ACCEPTED MANUSCRIPT

deep-level feature

Shallow feature, middle-level feature and deep-level feature maps of person ex-

tracted from deep convolution model. As we can see that the shallow feature maps contain

AN US

details and texture feature, middle level feature maps contain more structured feature and deep feature maps contain more semantic information.

25

[9] and CMDL-Dis [18] automatically learn the attention model to capture local and global feature or multi-scale feature. Different from these outstanding approaches [17], [9] and [18] in attention model, we only pay attention on per-

M

son body part, and capture multi-scale discriminative person body visual factors including detail information, structure contextual information, and semantic in30

formation. These operations push the model capture rich robust discriminative

ED

feature. Based on ID-discriminative Embedding (IDE) [19] Re-ID model, firstly we extract multi-scale feature maps from IDE backbone network to automatically learn the discriminative visual cues that are insensitive to scene condition

35

PT

changes. Secondly, we introduce person semantic feature extracted from backbone of segmentation model, to directly enhance the response of person body

CE

region on these multi-scale feature maps. These operations could reduce the interference of background from different scenes in some degree. As shown in Figure 5, person semantic feature maps producted by backbone of segmentation

AC

network, have stronger response on person part than other region.

40

With the rapid development of image segmentation approaches including

FCN [20], Mask R-CNN [21], DeepLab V2 [22]. It has been proved that the body segmentation is robust to illumination, cloth colors, and this is useful for identifying a person [23]. DeepLab V2 model has good performance on seg-

3

ACCEPTED MANUSCRIPT

mentation task recently. Therefore, we use backbone of DeepLab V2 model 45

as the basic network of our Re-ID model. Based on backbone of DeepLab V2 model, combining multi-scale feature and person semantic feature strategies, we

CR IP T

proposed a multi-scale Feaure Enhancement (MFE) Re-ID model, as shown in Figure 4. In the MFE Re-ID model, firstly the multi-scale feature maps are extracted from the ”ResNet-101 Original scale” in DeepLab V2’s backbone net50

work. Secondly, person semantic feature map, extracted from backbone network of DeepLab V2 model, enhances the feature response of person part on these

multi-scale feature. Thirdly, this operation products enhanced multi-scale dis-

AN US

criminative feature maps. Because feature from the whole image contains the

global structure feature from scene, finally, discriminative person visual cues 55

are composed of feature of the whole image and above enhanced multi-scale discriminative feature maps. Our MFE Re-ID model could effectively improve the performance of IDE Re-ID model on person Re-ID task.

Although recently many methods [24, 25, 26, 27, 28, 29, 30] have good perfor-

60

M

mance on person Re-ID task, most existing Re-ID studies follow the supervised learning paradigm. The performance of these studies drops significantly in prac-

ED

tical Re-ID deployments due to the large variations between the source training dataset and the target scene. In general, source dataset and target dataset have different drastically in visual appearance due the different illumination

65

PT

conditions, and to the camera configurations and viewing angles. As shown in Figure 2, DukeMTMC [31] images and Market-1501 [32] images have drasti-

CE

cally different in background and illumination conditions. Besides, this manual annotation task can be very time consuming and expensive when considering the huge number of images from target scene. This significantly limits their

AC

scalability and usability in real-world large scale deployments with the need for

70

performing Re-ID across many scene. A common strategy for this problem is unsupervised domain adaptation,

which is more challenging than normal supervised person Re-ID task. Typically, unsupervised domain adaptation is a way of handling dataset bias [33] and also used to minimize the visual gap between labeled dataset and unlabeled 4

CR IP T

ACCEPTED MANUSCRIPT

Duke Images

Market Images

Figure 2: Sample images of (left:) DukeMTMC-reID dataset, (right:) Market-1501 dataset.

real images. While many unsupervised domain adaptation methods have been

AN US

75

developed [34, 35, 36], they typically offer weaker Re-ID performances when compared to the supervised methods. One main reason is that without labelled data across scenes, unsupervised methods lack the necessary knowledge of target scene due to different view angles, background and illumination. With a 80

varies of novel deep architectures, there are a few of studies pay more atten-

M

tion on cross-domain unsupervised person Re-identification, such as unlabeled samples generated by GAN (Generative Adversarial Network)[37], PTGAN [38],

ED

CamStyle[39], DATS[40] and SPGAN [41]. These approaches based on CycleGAN learn image-image translation models to minimize the gap between source 85

domain and target domain.

PT

In SPGAN [41], they introduce contrastive loss to CycleGAN [42] to preserve the person ID information, and the distance of two pair images in contrastive

CE

loss is caculated by feature vector of fully connection layer. As shown in Figure 3, deep-level two-dimensional convolution feature contains rich perceptual

90

information including structure semantic information and high frequency fea-

AC

ture [43, 42], which plays an important role on identifying person and is very limited in fully-connected feature. PTGAN [38] and DATS [40] introduce a person identity loss to CycleGAN [42], which is computed by first acquiring the foreground mask of a raw person image. However, the bad person mask are

95

partly resulted by low image resolution or similar foreground and background, which will lead to bad transferred person images. 5

ACCEPTED MANUSCRIPT

Convolution Feature Fully Connected Feature

Figure 3:

Fully Connected Layer Feature

AN US

Deep-level Convolution Feature

CR IP T

……

(upper:)A general convolution network structure, (lower left:) two-dimensional

convolution feature, and (lower right:) we spread a portion of the fully connected layer vector from left to right, from top to bottom. As shown in Figure, two-dimensional convolution feature contains more semantic information and structured context information than fully

M

connected layer feature.

In this paper, we aim to extend the CycleGAN [42] framework for image style transfer. We are interested in generating a perceptually high-quality image that

100

ED

contain rich person structure semantic information and style information of target scene. To perceptually high-quality images, we introduce perceptual loss terms for the generator in CycleGAN [42], corresponding to feature activation.

PT

Thus our approach is to regularize the original minimax optimization for CycleGAN [42] with perceptual loss terms. Introducing the perceptual loss into

CE

GAN, has been applied in many computer vision tasks, including Text-to-Image

105

Synthesis [44], Image Super-Resolution [43], and Image Transformation [45]. In this paper, we hope to learn a transfer function to preserve the high-level per-

AC

ceptual information of source images and ensure the transferred person images show similar styles with the target dataset. Therefore, we proposed Feature Preserving Generative Adversarial Network(FPGAN) by introducing the per-

110

ceptual loss [43] to CycleGAN. Perceptual loss enforce preserving perceptual similarity between the real and the generated images.

6

ACCEPTED MANUSCRIPT

In this paper, MFE Re-ID model proposed is designed to improve the performancd of IDE model in supervised Re-ID task, and further to provides the strong baseline Re-ID model for domain adaption Re-ID task. Combining MFE Re-ID model and FPGAN has good performance in domain adaption Re-ID

CR IP T

115

task. The contributions of this paper can be summarized as follows:

(1) We propose multi-scale Feature Enhancement(MFE) Re-ID model, effectively improves the performance of IDE model in supervised Re-ID task.

(2) We introduce FPGAN to improve the unsupervised cross-domain person Re-ID by preserving the underlying high-level perceptual information of person during image-image translation.

AN US

120

(3) In cross-domain and supervised Re-ID task, extensive experiments on two large scale datasets, Market-1501 [32] and DukeMTMC-reID [31], show that our framework could achieve the state-of-the-art results. 2. Related Work

M

125

In this section, we briefly review two-type works that are related to our

ED

approach: (1) Cross-Domain person Re-ID, (2) Visual Attention Mechanism. 2.1. Cross-Domain person Re-ID Hand-craft features [46, 47, 48, 49, 32] can be directly employed for unsupervised Re-ID in target dataset. But these feature design methods do not fully

PT

130

exploit rich information from data distribution. Some methods are based on

CE

saliency statistics [35, 50]. In [51], K-means clustering is used for learning an unsupervised asymmetric metric. Peng et al. Recently, some transfer learning algorithms [52, 53] are proposed to leverage the Re-ID models pre-trained in source datasets to improve the performance on target dataset. Peng et al.

AC

135

[52] propose a multi-task dictionary learning model to transfer a view-invariant representation from a labeled source dataset to an unlabeled target dataset. Geng et al. [53] transfer representations learned from large image classification datasets to Re-ID datasets using a deep neural network which combines classifi-

140

cation loss with verification loss. Besides, domain adaption and image-to-image 7

ACCEPTED MANUSCRIPT

translation approaches have been applied to Re-ID task increasingly. Deng et al. [41] combine CycleGAN [42] with similarity constraint for domain adaptation which improve performance in cross-dataset setting. Zhong et al. [54]

145

CR IP T

introduce camera style transfer approach to address image style variation across multiple views and learn a camera-invariant descriptor subspace. PTGAN [38] and DATS [40] introduce a person identity loss to CycleGAN [42], which is

computed by first acquiring the foreground mask of a raw person image. As described in Section 1, deep-level two-dimensional convolution feature contains

rich perceptual information including structure semantic information and high frequency feature [43, 42], which plays an important role on identifying per-

AN US

150

son and is very limited in fully-connected feature and pixel-wise image. Thus, different from [41], PTGAN [38] and DATS [40], we introduce the perceptual loss into CycleGAN. And during the transferring process, our FPGAN preserve person perceptual semantic information and capture more style information of target scene.

M

155

2.2. Visual Attention Mechanism

ED

In many computer vision field, such as person search [55], person Re-ID [56, 57], object tracking [58, 59] and Image Captioning [60, 61], visual attention mechanism been studied. It is efficient and effective via implementing a spatial attention map across each location of the features. In Re-ID task, HP-net [17],

PT

160

HA-CNN [9] and CMDL-Dis [18] automatically learn the attention model to capture local and global feature or multi-scale feature. Different from these

CE

outstanding approaches, we introduce person semantic feature extracted from backbone of segmentation model, to directly enhance the response of person body region on these multi-scale feature maps. And our MFE Re-ID model

AC

165

could capture multi-scale discriminative person body visual factors including detail information, structure contextual information, and semantic information. These operations push the model capture rich robust discriminative feature, and further reduce the interference of background from different scenes.

8

AN US

CR IP T

ACCEPTED MANUSCRIPT

Figure 4: Multi-scale Feature Enhancement (MFE) Re-ID model in backbone of DeepLab V2 model. The basic network of MFE Re-ID is backbone of DeepLab V2 model (dashed line box). MFE Re-ID model cotains Multi-scale Feature Representation and Person Semantic Feature Enhancement. Details about MFE Re-ID model can be refer to text description in

170

M

subsection 3.1.

3. THE PROPOSED APPROACH

ED

In this paper, firstly we describe the details of multi-scale Feature Enhancement(MFE) Re-ID model, which provides a strong baseline model for crossdomain adaptation Re-ID. Secondly, based on MFE Re-ID model, we cast per-

175

PT

son Re-identification as an unsupervised domain adaptation problem, to find an effectively unsupervised strategy for performing person Re-identification on the

CE

target scene.

3.1. Multi-scale Feature Enhancement Re-ID model

AC

Same as ID-discriminative Embedding (IDE) Re-ID model, our MFE Re-

ID model also regards Re-ID training as an image classification task. Firstly,

180

we simply describe the IDE Re-ID model. Secondly, we structure multi-scale Feaure Enhancement(MFE) Re-ID model to improve the performance of IDE model in the supervised Re-ID task. Besides, MFE Re-ID model provides the strong baseline model for cross-domain Re-ID task in next subsection. 9

ACCEPTED MANUSCRIPT

In the IDE Re-ID model [19], which uses ResNet-50 as backbone and follow 185

the training strategy in [19] for fine-tuning on the ImageNet [62] pre-trained model. Using the Softmax loss, IDE regards Re-ID training as an image clas-

CR IP T

sification task. Besides, based on the IDE Re-ID model [19], the IDE strong baseline model (S-baseline) [63] introduce some effective training tricks and de-

sign a new neck structure to improve the performance of traditional IDE model 190

in the Re-ID task. For more details about S-baseline, please refer to [63]. In or-

der to improve our proposed method has better performance on supervised and unsupervised Re-ID task, based on more strong IDE baseline model(S-baseline)

AN US

[63], we design our multi-scale Feaure Enhancement(MFE) Re-ID model.

We structure multi-scale Feaure Enhancement(MFE) Re-ID model, as shown 195

in Figure 4. The MFE Re-ID model could effectively improve the performance of IDE model in supervised Re-ID task. In MFE Re-ID model, firstly we use the backbone (three-scale ResNet-101 network, feature fusion part and upsampling part) of DeepLab V2 as our MFE model’s basic network. Secondly, multi-

200

M

scale semantic feature maps are extracted from MFE model’s basic network, to capture multi-scale latent discriminative feature about person. Thirdly, person

ED

semantic feature is extracted from MFE model’s basic network to enhance the feature response of person part on multi-scale discriminative feature, which could reduce the interference of background. More details about DeepLab V2

205

PT

could be found in [22]. Next, we specifically describe the second stage and third stage respectively, and provide the algorithm step of multi-scale feature

CE

enhancement.

Multi-scale Feature Representation. As shown in bottom part of Fig-

ure 4, we extract the shallow, middle-level and deep-level information of per-

AC

son from three-scale feature maps of ResNet101-Original Scale branch in MFE

210

model. As shown in Figure 1, shallow feature maps contain abundant color or texture feature, middle level feature maps contain more structured feature and deep feature maps contain more semantic information. In this paper, we specifically extract feature map res3d3, res4b22 and res5c in ResNet101-Original Scale branch as three-scale feature maps. By extracting the three-scale feature maps, 10

ACCEPTED MANUSCRIPT

215

our Re-ID model could capture robust multi-level discriminative visual factors to varaint environment. Person Semantic Feature Enhancement.

In addition to capturing

CR IP T

multi-level discriminative person information, enhancing the feature response of person body region also plays an important role on Re-ID task. As we know 220

that object segmentation is pixel-level classification. Feature map in object seg-

mentation model contains rich semantic feature information. Because DeepLab

V2 has good performance on object segmentation. Therefore, we use back-

bone network (dashed line box in Figure 4) of DeepLab V2 model as MFE

225

AN US

Re-ID model’s basic network. Backbone network (dashed line box in Figure 4) of DeepLab V2 model contains three scale ResNet-101 network, feature fusion part and upsampling part.

In order to acquire robust multi-scale discriminative person feature, firstly we extracted person semantic feature from MFE Re-ID model’s basic network (backbone network of DeepLab V2 model). As shown in Figure 5, the person semantic feature maps could make the person region more saliency than other

M

230

region, which makes person’s discriminative feature more robust to complex

ED

scene background. Secondly, applying feature fusion operation on the person semantic feature map with the three-scale Feature maps respectively, and output three scale enhanced feature maps. Thirdly, fusing the these three-scale enhanced feature maps, and then it generates the semantic Enhanced Feature

PT

235

map. Besides, feature from the whole image contains the global structure fea-

CE

ture from scene, which also plays an important role on Re-ID task in some degree. In our MFE Re-ID model, the final discriminative feature are composed of the whole image’s feature and semantic Enhanced Feature.

AC

240

Algorithm of multi-scale Feature Enhancement. In this part, we describe the algorithm of multi-scale Feature Enhancement.

The global structure is shown in Figure 4, and visual result of multi-scale Feature Enhancement operation is shown in Figure 5. Step 1: In Figure 4, extract three scale Feature maps (shallow, middle-level

245

and deep-level feature maps). Then apply bilinear interpolation on three scale 11

ACCEPTED MANUSCRIPT

Shallow Feature

Fusion

Fusion

CR IP T

+

+

Middle-level Feature

Semantic Enhanced Feature

Deep-level Feature

Person Semantic Feature

AN US

Multi-scale Enhanced Feature Maps

Figure 5: Multi-scale Feature Enhancement. Firstly, extract three scale Feature maps, including Shallow, Middel-level and Deep-level Feature maps. Secondly, extract person semantic feature map. Thirdly, using the person semantic feature map to enhance these three-scale Feature maps, and generates Multi-scale Enhanced Feature Maps. Finally, fusing the Multi-scale enhanced feature maps generates Semantic Enhanced Feature.

M

feature maps to adjust the size of these feature maps.

Step 2: Extract person semantic feature map from MFE Re-ID model’s basic network (dashed line box in Figure 4).

250

ED

Step 3: Using the person semantic feature map from step 2, to enhances the aboved three scale Feature maps in step 1, and outputs three scale enhanced

PT

feature maps in Figure 5.

Step 4: Fusing the these three scale enhanced feature maps from step 3, and then it generates Semantic Enhanced Feature map (Semantic En-

CE

hanced Feature in Figure 5). Apply pooling operation on Semantic Enhanced

255

Feature map, and then it generates single vector, namely vector a.

AC

Step 5: Extract global feature vector, namely vector b in Figure 4. Step 6: Concating vector a and vector b products a L2-normalized feature

vector c.

260

In this time, using the final vector c to train the classifier or implement

person Re-ID task.

12

CR IP T

ACCEPTED MANUSCRIPT

Figure 6: Pipeline of the our cross-domain Re-ID framework based on unpaired Image-toImage Translation. First, we translate the labeled images from a source domain to a target

AN US

domain by Feature Preserving GAN (FPGAN). Second, we train our multi-scale Feature

Enhancement (MFE) Re-ID model with the translated images in supervised learning methods.

3.2. Architecture of Domain Adaption person Re-ID

In this section, firstly we describe the pipeline of unsupervised domain adaption person Re-ID task. secondly, more details of our Feature Preserving Gen-

M

erative Adversarial Network (FPGAN) are described.

In this paper, we propose multi-scale Feature Enhancement(MFE) Re-ID

265

ED

network and Feature Preserving Generative Adversarial Network(FPGAN). MFE Re-ID model could provide strong baseline model for cross-domain Re-ID task. FPGAN performs image-to-image translation and creates a dataset on the tar-

270

PT

get domain in an unsupervised manner. The dataset inherits the labels from the source domain and thus can be used in supervised learning in the target

CE

domain. The overview of the proposed method is shown in Figure 6. We formulate cross-domain Re-ID task as the following two-steps: Firstly, labeled

images from the source domain are transferred to the target domain based on

AC

FPGAN, so that the transferred image has a similar style with the target. Sec-

275

ondly, the style-transferred images and their associated labels are used to train MFE Re-ID model in supervised learning.

13

ACCEPTED MANUSCRIPT

DT

Ds G

Target Dataset Image T

F

CR IP T

Source Dataset Image S

Figure 7: CycleGAN consists of two mapping functions: G: S −→T and F : T −→S , and

associated adversarial discriminators DT and DS . DT encourages G to translate S into

AN US

outputs indistinguishable from domain T , and vice versa for D S and F .

3.3. Feature Preserving GAN: Approach Details

In this section, we will show more details about FPGAN which translates the annotated dataset S from the source domain to target domain T in an 280

unsupervised manner. Applying this network, we can create a labeled training

3.3.1. CycleGAN Revisit

M

dataset G ( S ) on the target domain.

ED

As shown in Figure 7, CycleGAN introduces two generator-discriminator pairs, {G, DT } and {F, DS }. In the CycleGAN, the generator G maps a sample from source domain S to target domain T and the generator F maps a sample

PT

from target domain T to source domain S . In addition, CycleGAN include two adversarial discriminators D T and D S . For generator G, and its associated

CE

discriminator D T , the adversarial loss is

AC

LTadv (G, DT , px , py ) = IEy∼py [(DT (y) − 1)2 ] + IEx∼px [DT (G(x))2 ]

(1)

where px and py denote the image distributions in the source and target

dataset, respectively. For generator F , and its associated discriminator D S ,

the adversarial loss is LSadv (F, DS , px , py ) = IEx∼px [(DS (x) − 1)2 ] + IEy∼py [DS (F (y))2 ]

14

(2)

ACCEPTED MANUSCRIPT

Target Dataset Image IT

True Discriminator Network DT

Generator Network G

Target Dataset Style Transfer Image G(Is)

Source Dataset Image Is

Fake

CR IP T

Source Dataset Image Is





Φ( G( Is ))





Φ ( Is )

Network-Φ(·)

AN US

Figure 8: G Branch of the FPGAN, in which SiaNet(Network-Φ(·)) preserves the person

peceptual information during the style transfer, namely translated image and its counter part in the source dataset have the same peceptual feature.

Both cycle consistency losses can be expressed as

(3)

M

Lcyc (G, F ) = IEx∼px [kF (G(x)) − 1k1 ] + IEy∼py [kG(F (y)) − yk1 ] Lide loss function could be expressed as

ED

Lide (G, F ) = IEx∼px [kF (x) − 1k1 ] + IEy∼py [kG(y) − yk1 ]

(4)

As mentioned in [42], Lide loss function could preserve the color composition 285

PT

under image style translation between source dataset and target dataset. For more details about CycleGAN, please refer to [42].

CE

3.3.2. FPGAN

Applied in person Re-ID, preserving person feature and capturing image

styles of target scene are essential functions to generate improved samples for

AC

cross-domain person Re-ID task [41, 40, 38, 39]. In SPGAN [41], they introduce

290

contrastive loss to preserve the person ID information, and the distance of two pair images in contrastive loss is caculated by feature vector of fully connection layer. However, as shown in Figure 3, deep-level two-dimensional convolution feature map contains rich perceptual information including structure semantic

15

ACCEPTED MANUSCRIPT

information and high frequency feature [43, 42], which is very limited in fully295

connected feature. PTGAN [38] and DATS [40] introduce a person identity loss to CycleGAN, which is computed by first acquiring the foreground mask of a

CR IP T

raw person image. However, the bad person mask are partly resulted by low image resolution or similar foreground and background, which will lead to bad transferred person images.

Thus, we hope to learn a transfer function to preserve the high-level per-

ceptual information (person body part feature) of source images and ensure the transferred person images show similar styles with the target dataset. In order

AN US

to achieve the goal, inspired by [44, 43, 45], we proposed Feature Preserving Generative Adversarial Network(FPGAN) by introducing the perceptual loss [43] to CycleGAN. As shown in Figure 8, we show the G branch of FPGAN.

In the G branch of FPGAN, the SiaNet consists of two Network-Φ(·). We extract two-dimensional convolution feature maps as person perceptual feature from Network-Φ(·). And the distance in perceptual loss is calculated by two-

M

dimensional convolution feature maps. We design a Feature preserving loss to train SiaNet based on Perceptual loss [43]:

ED

LP er (x, y) = kΦ(G(x)) − Φ(x)k2F + kΦ(F (y)) − Φ(y)k2F

(5)

where x belongs to source dataset and y belongs to target dataset, Φ(x) indi-

300

PT

cates person perceptual feature (deep-level two-dimensional convolution feature) of person image x. Perceptual loss enforce perceptual similarity between the real

CE

and the generated images.

AC

Overall objective function in FPGAN can be written as

305

LF P = LTadv + LSadv + λ1 Lcyc + λ2 Lide + λ3 LP er

(6)

where λt , t∈ { 1, 3, 5 } controls the relative importance of four objectives.

The first three losses belong to the CycleGAN formulation [42], and the perceptual loss induced by SiaNet imposes a new constraint on the system. In the training phase, FPGAN is divided into three components including the generators, discriminators and a SiaNet. These three components are learned 16

ACCEPTED MANUSCRIPT

alternately. When the parameters of the generators and discriminators are fixed, 310

the parameters of the SiaNet is updated. We train the FPGAN until the con-

CR IP T

vergence or the maximum iterations.

4. Experiment 4.1. Datasets

We select two large-scale Re-ID datasets for experiment, i.e., Market-1501 315

[32] and DukeMTMC-reID [31]. Market-1501 has 12,936 training and 19,732

AN US

testing images with 1,501 identities in total from 6 cameras. We follow the

standard training and evaluation protocols in [26] where 751 identities are used for training and the remaining 750 for testing in a single query setting. DukeMTMC-reID is also a large-scale Re-ID dataset from 8 cameras. There 320

are 16,522 training images of 702 identities, 2,228 query images and 17,661 gallery images of the other 702 identities. Sample images of the two datasets

M

are shown in Figure 2. We use Rank-1 accuracy and mean Average Precision (mAP) for evaluation on these two big Re-ID datasets. In the unsu-

325

ED

pervised cross-domain adaption experiments, there are two source-target settings. 1. DukeMTMC-reID→Market-1501: Duke images which are translated to Market style, Target Domain is Market-1501 and Source Domain is

PT

DukeMTMC-reID; 2. Market-1501→DukeMTMC-reID: Market images translated to Duke style, Target Domain is DukeMTMC-reID and Source Do-

CE

main is Market-1501. 330

4.2. Implementation Details MFE Re-ID model training and testing. Firstly, the DeepLab V2

AC

model is pretrained on PASCAL VOC 2012 dataset to implement person segmentation task. Secondly, based on backbone network(three scale ResNet-101 network, feature fusion part and upsampling part) of DeepLab V2 model, MFE

335

Re-ID model is designed. We use mini-batch SGD to train CNN models on GTX 1080 Ti GPU. Training parameters such as batch size, maximum number

17

ACCEPTED MANUSCRIPT

Table 1: Results of the ablation study in supervised Re-ID task.

Method

Base Network

IDE [19]

DukeMTMC-reID Market-1501 mAP

Rank-1 mAP

ResNet-50

66.7%

46.3%

75.6% 51.9%

IDE+ (S-baseline) [63]

ResNet-50

86.9%

76.3%

94.1% 85.2%

IDE*

ResNet-101

87.4%

77.1%

94.4% 86.0%

IDE*+MS

ResNet-101

88.3%

77.7%

94.9% 86.7%

IDE*+PF

ResNet-101

88.0%

77.5%

95.2% 87.0%

DeepLabV2’s Backbone 88.7%

77.9%

95.8% 87.6%

AN US

MFE

CR IP T

Rank-1

epochs, momentum and gamma are set to 16, 50, 0.9 and 0.1, respectively. The initial learning rate is set to 0.001 and then divided by 10 at step 20k, 40k, 60k and 80k. At inference stage, we rank the gallery people according to their 340

feature distances to the target person.

M

FPGAN training and testing. We train FPGAN using the training datasets of Market-1501 and DukeMTMC-reID on Tensorflow. Note that, training FPGAN procedure belongs to unsupervised learning due to not using use

345

ED

any person ID label. In experiments, empirically set λ1 = 9, λ2 = 4 and λ3 = 2 in Eq. 6. With an initial learning rate 0.0002, and model stop training after 8

PT

epochs. During the testing procedure, we employ the Generator G for Market1501−→DukeMTMC-reID translation and the Generative F for DukeMTMCreID−→Market-1501 translation. The translated images are only used to train

CE

the MFE Re-ID model. FPGAN is composed of an Siamese network (SiaNet)

350

and a CycleGAN. For CycleGAN, we adopt the architecture described in [42]. For SiaNet, it contains 3 convolutional layers and 3 max pooling layers, config-

AC

ured as below. (1) Conv. 4 × 4, stride = 2; (2) Max pooling 2 × 2, stride = 2; (3) Conv. 4 × 4, stride = 2; (4) Max pooling 2 × 2, stride = 2; (5) Conv. 4 × 4, stride = 2; (6) Max pool 2 × 2, stride = 2.

18

ACCEPTED MANUSCRIPT

355

4.3. Ablation Study in MFE Re-ID model In this subsection, we mainly discuss the effectiveness of each strategy in MFE Re-ID model. Firstly, based on the traditional IDE-baseline [19], we in-

CR IP T

troduce recently more strong baseline [63] as our MFE’s basic model. Secondly, we demonstrate the effectiveness of each strategy by a series of ablation study.

IDE model for Re-ID task. In this part, firstly we introduce more

360

strong baseline [63] as our MFE’s basic model. Based on the traditional IDEbaseline (backbone network is ResNet-50) [19], the IDE strong baseline model

(S-baseline) [63] introduce some effective training tricks and design a new neck

365

AN US

structure to improve the performance of traditional IDE model in the Re-ID task. For more details about S-baseline, please refer to [63]. Table 1 shows that IDE+ (S-baseline) could effectively improve the performance of traditional

IDE model in Re-ID task. As shown in Table 1, IDE+ could achieve 86.9% and 76.3% in Rank-1 accuracy and mAP on DukeMTMC-reID respectively. And it could achieve 94.1% and 85.2% in Rank-1 accuracy and mAP on Market-1501 respectively.

M

370

Secondly, we use ResNet-101 as basic network instead of ResNet-50 in IDE+

ED

model. Table 1 shows the effectiveness of ResNet-101 backbone architecture in supervised Re-ID task. On DukeMTMC-reID, using the ResNet-101 as backbone network, IDE* leads to +0.5% and +0.8% improvement over IDE+ in Rank-1 accuracy and mAP, respectively. On Market-1501, the gains are +0.3%

PT

375

and +0.8%. Based on the IDE*, in next three part, we discuss the effectiveness

CE

of multi-scale feature strategy and person semantic enhancement strategy. The effectiveness of the multi-scale feature strategy. In order to prove

the effectiveness of multi-scale feature strategy, based on IDE* model we extract three-scale feature (feature map res3d3, res4b22 and res5c of ResNet101-Original

AC

380

Scale branch in MFE model) instead of deep-level feature to implement Re-ID task. Table 1 shows the effectiveness of multi-scale feature strategy(IDE*+MS). On DukeMTMC-reID, based on the original ResNet-101, IDE*+MS leads to +0.9% and +0.6% improvement in Rank-1 accuracy and mAP, respectively. On

385

Market-1501, the gains are +0.5% and +0.7%. 19

ACCEPTED MANUSCRIPT

The effectiveness of the person semantic enhancement. In this part, we discuss the effectiveness of person semantic enhancement strategy. As shown in Figure 5, person region in person semantic feature extracted from backbone

390

CR IP T

of DeepLav V2 model has stronger feature response than other region. In order to prove the effectiveness of the person semantic feature, firstly we extract person semantic feature from backbone of DeepLav V2 model. Secondly, we only

fuse the person semantic feature with deep-level feature to generate enhanced deep-level feature map. Note that multi-scale feature are not used here, and the deep-level feature is usually used to implement Re-ID task. Finally, using

the enhanced deep-level feature map instead of original deep-level feature map

AN US

395

implements person Re-ID task. Table 1 shows the effectiveness of person semantic feature(IDE*+PF). On DukeMTMC-reID, compared with IDE* based on ResNet-101, IDE*+PF leads to +0.6% and +0.4% improvement in Rank-1 accuracy and mAP, respectively. On Market-1501, the gains are +0.8% and 400

+1.0%.

M

The effectiveness of the MFE. In this part, we discuss the effectiveness of MFE Re-ID model. We introduce both person semantic enhancement

ED

strategy and multi-scale feature strategy into IDE* model, namely MFE Re-ID model. As shown in Figure 5, using person semantic feature to fuse shallow, 405

middle-level and deep-level feature respectively. As we can see that the feature

PT

response in person region is stronger than that of other region in Multi-scale Enhanced Feature maps. Table 1 shows the effectiveness of MFE Re-ID model

CE

in supervised Re-ID task. On DukeMTMC-reID, MFE Re-ID model leads to +1.3% and +0.8% improvement over IDE* in Rank-1 accuracy and mAP, re-

410

spectively. On Market-1501, the gains are +1.4% and +1.6%. Above all, it is

AC

proved that MFE Re-ID model could effectively the performance of IDE model in supervised and cross-domain person Re-ID task. 4.4. Ablation Study in FPGAN Image Quality Evaluation. As described in Section 1, we are interested

415

in generating a perceptually high-quality image that contain rich person struc20

CR IP T

ACCEPTED MANUSCRIPT

Market images to Duke style

PT

ED

M

Market images

Duke images to Market style

AN US

Duke images

Figure 9:

Sample images of (upper left:) DukeMTMC-reID dataset, (lower left:) Market-

CE

1501 dataset, (upper right:) Duke images which are translated to Market style, and (lower right:) Market images translated to Duke style. We use FPGAN for capturing the image style

AC

of target scene and preseving the person perceptual feature.

21

ACCEPTED MANUSCRIPT

Table 2: Image Quality Evaluation. We calculate the Fr´ echet inception distance (FID) between target images and style transferred images. DukeMTMC-reID→Market-1501: Duke images which are translated to Market style, and Market-1501→DukeMTMC-reID: Market

Method

CR IP T

images translated to Duke style.

DukeMTMC-reID→Market-1501 Market-1501→DukeMTMC-reID

SPGAN [41]

27.194

FPGAN

24.131

27.736 21.150

ture semantic information and style information of target scene. Image quality

AN US

plays an important role on cross-domain Re-ID task. In cross-domain Re-ID

task, style transferred images need to capture more image-style information about target scene except for preserving person perceptual feature. Thus, we 420

evaluate the image quality about SPGAN [41] and our FPGAN. Here, we use the Fr´echet inception distance (FID) [64] score as the quantitative evaluation metrics. And we calculate the FID between target images and style transferred

M

images. Lower FID values mean closer distances between target images and style transferred images. Lower FID values mean that style transferred images 425

capture more image-style information about target scene, which makes Re-ID

ED

model become more robust to target scene. As shown in Table 2, in two big ReID datasets(DukeMTMC-reID and Market-1501), the FID score of FPGAN is

PT

lower than that of SPGAN. Besides, examples of translated images by FPGAN are shown in Figure 9. As we can see that transferred images could capture 430

more style information of target scene including background and illumination,

CE

which is more suitable for cross-domain Re-ID task. Compared with various methods based on GAN. In this part, we

AC

compare FPGAN with various methods based on GAN in domain adaption Re-ID task, such as PTGAN, CycleGAN, and SPGAN.

435

As shown in Table 3 and Table 4, compared with PTGAN, CycleGAN, and

SPGAN, our FPGAN has better peformance in cross-domain Re-ID task on DukeMTMC-reID and Market-1501. Firstly, similar to PTGAN, CycleGAN, and SPGAN, we also set traditional IDE model as our Re-ID model. Table 3

22

ACCEPTED MANUSCRIPT

Table 3: Results of the ablation study of FPGAN and various methods based on GAN in cross-domain Re-ID task on Market-1501 dataset. ”Direct Transfer” means directly applying the source-trained model on the target domain. ”#” means the results directly from the

CR IP T

corresponding reference.

DukeMTMC-reID→Market-1501

Method

Re-ID model

Supervised Re-ID task

IDE

75.6%

Direct Transfer

IDE

43.2%

CycleGAN

IDE

48.1%

PTGAN# [38]

IDE(GoogLeNet)

38.6%

SPGAN [41]

IDE

50.5%

21.4%

FPGAN

IDE

51.8%

23.2%

FPGAN

IDE (ResNet-101)

52.8%

23.7%

FPGAN+LMP

IDE (ResNet-101)

59.1%

27.1%

FPGAN+LMP+MFE

MFE

64.4%

35.2%

mAP

51.9% 17.3% 20.7% -

M

AN US

Rank-1

and Table 4 show the effectiveness of FPGAN in domain adaption Re-ID task. On DukeMTMC-reID, FPGAN leads to +1.6% and +1.1% improvement over

ED

440

SPGAN in Rank-1 accuracy and mAP, respectively. On Market-1501, the gains are +1.3% and +1.8%. Secondly, we use ResNet-101 as the base network in IDE

PT

model, namely IDE (ResNet-101). Compared with IDE model, IDE (ResNet101) leads to +2.5% and +1.2% improvement in Rank-1 accuracy and mAP 445

on DukeMTMC-reID, respectively. On Market-1501, the gains are +1.0% and

CE

+0.5%. And then, we apply Local Max Pooling (LMP)[41, 39] in our approach during testing phase. With LMP, our approach (FPGAN+LMP) gains further

AC

improvement. Specifically, the Rank-1 and mAP of FPGAN+LMP is higher than FPGAN (IDE(ResNet-101)) by 6.3% and 3.4% respectively, when tested on

450

Market-1501. The Rank-1 and mAP of FPGAN+LMP is higher than FPGAN by 6.4% and 4.2% on DukeMTMC-reID, respectively. Finally, we proposed MFE Re-ID model to improve the performance of IDE model in person Re-ID task, and provide a strong baseline model for cross-domain Re-ID task. As shown 23

ACCEPTED MANUSCRIPT

Table 4:

Results of the ablation study of FPGAN and various methods based on GAN

in cross-domain Re-ID task on DukeMTMC-reID dataset. ”Direct Transfer” means directly applying the source-trained model on the target domain. ”#” means the results directly from

CR IP T

the corresponding reference.

Market-1501→DukeMTMC-reID

Method

Re-ID model

Supervised Re-ID task

IDE

66.7%

Direct Transfer(IDE)

IDE

27.6%

CycleGAN

IDE

36.2%

PTGAN# [38]

IDE(GoogLeNet)

27.4%

SPGAN [41]

IDE

37.6%

20.3%

FPGAN

IDE

39.2%

21.4%

FPGAN

IDE (ResNet-101)

41.7%

22.6%

FPGAN+LMP

IDE (ResNet-101)

48.1%

26.8%

FPGAN+LMP+MFE

MFE

52.1%

30.7%

mAP

46.3% 13.3% 19.3% -

M

AN US

Rank-1

in Table 3 and Table 4, FPGAN+LMP+MFE is higher than FPGAN+LMP by 5.3% and 8.1% respectively, when tested on Market-1501. The Rank-1 and

ED

455

mAP of FPGAN+LMP+MFE is higher than FPGAN+LMP by 4.0% and 3.9%

PT

on DukeMTMC-reID, respectively. 4.5. Comparison with State-of-the-art Methods in unsupervised Re-ID We compare the proposed method with recently state-of-the-art unsupervised learning methods on Market-1501 and DukeMTMC-reID in Table 5 and

CE

460

Table 6, respectively.

AC

We first compare our results with two hand-crafted features, i.e., Bag-of-

Words (BoW) [32] and local maximal occurrence (LOMO) [49]. Those two hand-crafted features are directly applied on test dataset without any training

465

process, their inferiority can be clearly observed. Secondly, we compare our method with two unsupervised methods including CAMEL [51], UMDL [52], and SSDAL [24]. These unsupervised methods exploit

24

ACCEPTED MANUSCRIPT

Table 5: Performance comparison with state-of-the-art unsupervised approaches on Market1501. CMC Rank-1 and mAP accuracies are reported. The scores of our proposed methods are shown in bold.

mAP

Bow [32]

35.8%

14.8%

LOMO [49]

27.2%

8.0%

SSDAL [24]

39.4%

-

CAMEL [51]

54.5%

-

SPGAN [41]

50.5%

21.4%

SPGAN+LMP [41]

57.7%

26.9%

PTGAN [38] TJ-AIDL [14] TF-Fusion [65] DATS[40] FPGAN

2015 ICCV

2015 CVPR 2016 CVPR 2017 ICCV

2018 CVPR 2018 CVPR

38.6%

-

2018 CVPR

58.2%

26.5%

2018 CVPR

58.2%

-

2018 CVPR

65.7%

-

2018 ECCV

52.8%

23.7%

Proposed

59.1%

27.1%

Proposed

64.4%

35.2%

Proposed

M

FPGAN+LMP

Reference

CR IP T

Rank-1

AN US

Method

ED

FPGAN+LMP+MFE

the unlabeled data on target domain for training Re-ID model and achieve higher results than hand-crafted methods. Finally, we compare our method with recently proposed state-of-the-art do-

PT

470

main adaptation methods, including the SSDAL [24], CAMEL [51], TJ-AIDL

CE

[14], SPGAN [41],TF-Fusion [65], PTGAN [38], and DATS[40]. On Market1501, our method achieves Rank-1 accuracy = 64.4% and mAP = 35.2% in Rank-1 accuracy and mAP, respectively. on DukeMTMC-reID, our method achieves Rank-1 accuracy = 52.1% and mAP = 30.7% in Rank-1 accuracy and

AC

475

mAP, respectively. Our method obtains competitive results compared with the state-of-the-art approaches.

25

ACCEPTED MANUSCRIPT

Table 6:

Performance comparison with state-of-the-art unsupervised approaches on

DukeMTMC-reID. CMC Rank-1 and mAP accuracies are reported. The scores of our proposed methods are shown in bold.

mAP

Bow [32]

17.1%

8.3%

LOMO [49]

12.3%

4.8%

UMDL [52]

18.5%

7.3%

PTGAN [38]

27.4%

-

SPGAN [41]

37.6%

20.3%

SPGAN+LMP [41]

46.4%

26.2%

TJ-AIDL [14] FPGAN FPGAN+LMP FPGAN+LMP+MFE

Reference

CR IP T

Rank-1

2015 ICCV

2015 CVPR 2016 CVPR 2018 CVPR 2018 CVPR 2018 CVPR

AN US

Method

44.3%

23.0%

2018 CVPR

41.7%

22.6%

Proposed

48.1%

26.8%

Proposed

52.1%

30.7%

Proposed

M

4.6. Comparison with State-of-the-art Methods in supervised Re-ID In unsupervised Re-ID task, our method has achieved state-of-the-art result. Besides, our approach is also very competitive with a series of state-of-the-

ED

480

art supervised techniques. In supervised Re-ID task, we train MFE Re-ID model in training set of two big datasets, and implement person Re-ID task

PT

on testing set respectively. We also compare the proposed method with the state-of the-art supervised learning methods on Market-1501 and DukeMTMCreID in Table 7 and Table 8, respectively. Experimental results show that our

CE

485

method achieves state-of-the-art results through multi-scale feaure enhancement strategy enhancing the feature response of the human body part. Specifically,

AC

we achieve rank-1 accuracy = 95.8% for Market-1501, and rank-1 accuracy = 88.7% for DukeMTMC-reID. And we achieve mAP = 87.6% for Market-1501,

490

and mAP= 77.9% for DukeMTMC-reID. All experimental results show that our MFE Re-ID model also achieves state-of-the-art results in supervised Re-ID task.

26

ACCEPTED MANUSCRIPT

Table 7: Performance comparison with state-of-the-art supervised techniques on Market-1501. CMC Rank-1 and mAP accuracies are reported. The scores of our proposed methods are shown in bold.

Rank-1

mAP

Point-to-Set [27]

70.7%

44.3%

CCAFA [7]

71.8%

45.5%

Consistent-Aware [28]

73.8%

47.1%

Spindle [29]

76.9%

-

HydraPlus-Net [17]

76.9%

-

Re-ranking[30]

77.1%

63.6%

GAN [66]

78.1%

56.2%

2017 ICCV

MSCAN [67]

80.3%

57.5%

2017 CVPR

DLPAR [68]

81%

63.4%

2017 ICCV

Scalable [69]

82.2%

68.8%

2017 CVPR

SVDNet [70]

82.3%

62.1%

2017 ICCV

MCAM[57]

83.8%

74.3%

2018 CVPR

88.1%

68.7%

2019 TIP

92.5%

81.3%

2018 CVPR

PCB [73]

93.8%

81.6%

2018 ECCV

Mancs[74]

93.1%

82.3%

2018 ECCV

S-Baseline[63]

94.1%

85.2%

2019 CVPR

95.8%

87.6%

Proposed

PT

2017 TPAMI 2017 CVPR 2017 CVPR 2017 ICCV

2017 CVPR

CE

MFE

2017 CVPR

AN US

ED

SPReID[72]

M

Camstyle[71]

Reference

CR IP T

Method

5. Conclusion This paper focuses on cross-domain person Re-ID model. In order to improve

the performance of the Re-ID model on target dataset, Firstly, we proposed

AC 495

MFE Re-ID model to provide a strong baseline Re-ID model for cross-domain person Re-ID task. In MFE model, multi-scale feaure enhancement strategy could enhance the multi-scale person feature response, which could reduce the interference of background and capture robust multi-scale discriminative visual

27

ACCEPTED MANUSCRIPT

Table 8: Performance comparison with state-of-the-art supervised techniques on DukeMTMCreID. CMC Rank-1 and mAP accuracies are reported. The scores of our proposed methods are shown in bold.

mAP

GAN [66]

67.7%

47.1%

OIM [75]

68.1%

-

APR [76]

70.7%

51.9%

TriNet [77]

72.4%

53.5%

SVDNet [70]

76.7%

56.8%

DPFL [78]

79.2%

60.6%

JLML [79]

2017 ICCV

2017 CVPR 2017 arXiv 2017 arXiv

2017 ICCV 2017 ICCV

73.3%

56.4%

2017 IJCAI

71.6%

51.5%

2018 TCSVT

75.3%

53.5%

2019 TIP

78.3%

57.6%

2018 CVPR

80.5%

63.8%

2018 CVPR

83.3%

69.2%

2018 ECCV

84.9%

71.8%

2018 ECCV

84.4%

71.0%

2018 CVPR

S-Baseline[63]

86.9%

76.3%

2019 CVPR

MFE

88.7%

77.9%

Proposed

PAN [80] Camstyle[71] IDE+CamStyle+RE [39]

PCB [73] Mancs[74]

PT

ED

SPReID[72]

M

HA-CNN [9]

500

Reference

CR IP T

Rank-1

AN US

Method

factors. MFE Re-ID model plays an important role in cross-domain and super-

CE

vised Re-ID task. Based on MFE Re-ID model, secondly we proposed FPGAN. During the image-image translation, FPGAN could effectively preserve the highlevel perceptual information of source images and ensure the transferred person

AC

images show similar styles with the target dataset. Experiment shows that FP-

505

GAN could better qualify the generated images for domain adaptation. On two source-target settings, FPGAN has good performance on cross-domain person Re-ID task. Overall, FPGAN and MFE Re-ID model achieve good performance in cross-domain and supervised person Re-ID task.

28

ACCEPTED MANUSCRIPT

6. Acknowledgements This work is supported by National Natural Science Foundation of China

510

(U1811463). We thank the anonymous reviewers for the insightful and con-

CR IP T

structive comments. We thank all authors finish this research and complete the writing of the paper. No conflict of interest: Xiuping Liu, Hongchen Tan, Xin Tong, Junjie Cao and Jun Zhou declare that they have no conflict of interest.

Conflict Of Interest Have no conflict of interest

References References

AN US

515

[1] L. Wei, Z. Rui, X. Tong, X. G. Wang, Deepreid: Deep filter pairing neural

M


Xiuping Liu is a Professor in the School of Mathematical Sciences at Dalian University of Technology, P.R. China. She received her Ph.D. degree in computational mathematics from Dalian University of Technology. Her research interests include shape modeling and analysis.


Hongchen Tan is a Ph.D. candidate in the School of Mathematical Sciences at Dalian University of Technology. His research interests include object detection, person re-identification, and cross-modal retrieval.


Xin Tong is a principal researcher in the Internet Graphics Group of Microsoft Research Asia. His research interests include appearance modeling and rendering, texture synthesis, and image-based modeling and rendering. Specifically, his research concentrates on studying the underlying principles of material-light interaction and light transport, and on developing efficient methods for appearance modeling and rendering. He is also interested in performance capture and facial animation.

Junjie Cao is a lecturer in the School of Mathematical Sciences at Dalian University of Technology, P.R. China. He received his Ph.D. degree in computational mathematics from Dalian University of Technology. His research interests include shape modeling, image processing, and machine learning.


Jun Zhou is a Ph.D. candidate in the School of Mathematical Sciences at Dalian University of Technology, P.R. China. He received his B.S. degree in Information and Computing Science from Dalian University of Technology. His research interests include computer graphics, image processing, and machine learning.
