Person re-identification post-rank optimization via hypergraph-based learning


Accepted Manuscript

Person re-identification post-rank optimization via hypergraph-based learning
Saeed-Ur Rehman, Zonghai Chen, Mudassar Raza, Peng Wang, Qibin Zhang

PII: S0925-2312(18)30127-9
DOI: 10.1016/j.neucom.2018.01.086
Reference: NEUCOM 19288
To appear in: Neurocomputing
Received date: 20 December 2016
Revised date: 30 January 2018
Accepted date: 31 January 2018

Please cite this article as: Saeed-Ur Rehman, Zonghai Chen, Mudassar Raza, Peng Wang, Qibin Zhang, Person re-identification post-rank optimization via hypergraph-based learning, Neurocomputing (2018), doi: 10.1016/j.neucom.2018.01.086

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

 A hypergraph is exploited to describe the higher-order relationships between person re-identification images. To the best of our knowledge, no existing work explores the features of a hypergraph for image re-ranking in person re-identification.
 A new refinement algorithm is presented for rank list classification and filtering. This process is based on relative score estimation and rank categorization.
 A hypergraph is constructed via discriminative information from different visual features and complex visual representations; weight learning is performed using soft assignment and each hyperedge is given a unique weight.
 Promising results are achieved and compared with available state-of-the-art approaches.




Person re-identification post-rank optimization via hypergraph-based learning

Saeed-Ur-Rehman, *Zonghai Chen, Mudassar Raza, Peng Wang, Qibin Zhang

Department of Automation, University of Science and Technology of China (USTC), Hefei, Anhui, P.R. China

*Corresponding Author: Zonghai Chen (email: [email protected])

Abstract: In computer vision, person re-identification has recently received significant attention from researchers and is becoming an emerging research domain with various challenges. Specifically, re-ranking or post-rank optimization is a significant challenge. Existing re-identification methods perform well in certain particular scenarios, but their performance at rank-1 remains a major concern. Such methods cannot model the complex and higher-order relationships among the images. To address these issues, we present a hypergraph-based learning scheme that not only improves the rank-1 accuracy but also models the complex and higher-order relationships among the images. After obtaining the rank list using a baseline method, we apply a new refinement algorithm to it to classify the ranks accordingly. Furthermore, to discover the relationships among samples, we utilize hypergraphs for re-rank learning. A soft assignment technique is used to perform weight learning of the hyperedges. The proposed method achieves better ranking performance; consequently, the re-identification is improved. An extensive experimental analysis on challenging and publicly available datasets reveals that the proposed re-ranking scheme performs better than the existing methods.

Keywords: Hypergraph-based learning, post-rank optimization, person re-identification, rank classification

1. Introduction


In the recent era, vision systems are used in public places, such as airports, subway stations and shopping malls, for regular security monitoring. The surveillance cameras generate visual data that are passed to security agencies, law enforcement units, and various other military and civil divisions for investigation and forensic purposes. Likewise, an analyst can easily monitor the activities and behaviors of individuals or groups of people at a specific time. Additionally, this visual information facilitates officers in surveillance, event prediction and the generation of in-time alerts in various situations. Person re-identification is an important application of this technology, where images of people are matched across multiple cameras. Recently, person re-identification has appeared as an emerging field for researchers [1-3]. In such a re-identification system, a query image, also called a probe, is compared with database images, mainly known as a gallery set. It is very important that we match and present the images to users for prompt decisions. In this case, the accuracy or recognition rate is very important.

In general, person re-identification is difficult to automate due to various challenges, such as variations in lighting, background, and viewpoint, that can degrade the performance of a re-identification system. Recently, a range of techniques [4-8] has been presented in the literature to improve the accuracy of person re-identification systems. The exact matching and retrieval of a person's image from a large database is an intricate process in a person re-identification system, where the goal is to produce a rank list that contains the target images after matching with the gallery set. Existing methods primarily focus on two aspects: 1) generating robust feature representations or feature descriptors [9] and 2) learning an effective distance metric [10]. In most approaches, the matching score is computed between the query and every gallery image via the extracted features, and then a rank list is generated. Such pairwise similarity is unable to explore the complex and high-order relationships between the sample images; therefore, it leads to suboptimal matching results, especially at rank-1. Although several efforts have been made to improve accuracy and performance, there is still room for further improvement [2]. Fig. 1 illustrates a general person re-identification system including a post-ranking module.

To boost the ranking results and to enhance the rank-1 accuracy of a re-identification system, several re-ranking systems, also called post-rank optimization approaches, have been investigated. In our study, both aforementioned terms are used interchangeably. Soft biometrics [11], post-rank optimization [12], bidirectional re-ranking [13] and saliency re-ranking [14] have tried to improve the rank list performance. Although various add-on techniques [11, 13, 15] are applied with the baseline methods [4, 5, 14] and different algorithms have been proposed for image re-ranking, existing results show that the rank-1 accuracy is still not guaranteed and the resultant techniques are disjoint from the initial matching results [2, 3]. Hence, extracting features from different feature spaces and exploring higher-order relationships between the samples can be a solution. Therefore, the focus of this study is post-rank optimization, or re-ranking, which is currently least addressed in the literature [2].

Fig. 1. A typical person re-identification system with a post-ranking module

In this work, we present a hypergraph-based learning method that solves the re-ranking problem for a person re-identification system. Hypergraphs have demonstrated outstanding performance in clustering, image retrieval and person re-identification [16-18]. Unlike a conventional graph, where an edge connects only two vertices and considers only the pairwise relationship, a hypergraph explores the higher-order relationships and constructs a structure in which more than two vertices are connected by one edge. This structure also helps in the visualization and consideration of complex data [19].

The main contributions of this study are as follows:

1. A hypergraph is exploited to describe the higher-order relationships among person re-identification images. To the best of our knowledge, no existing work for image re-ranking in person re-identification explores the features of a hypergraph.

2. A refinement algorithm is presented for the classification and filtering of initial rank lists. The process is based on relative score estimation and rank categorization.

3. A hypergraph is constructed via discriminative information from different visual features, and hyperedge weight learning is performed using a soft assignment mechanism.

The rest of the paper is organized as follows: Section 2 reviews the related work. Section 3 introduces the motivation and preliminaries of hypergraph-based learning. A detailed overview and the other components of the proposed re-ranking system are given in Section 4. Section 5 explains the experimental configuration and the comparison with state-of-the-art methods, while Section 6 presents the paper's conclusions.

2. Related Work

Most existing approaches that focus on improving the matching percentage for person re-identification are categorized as either feature representation methods [9] or metric learning methods [20]. Feature representation methods are based on local and global features, including color and texture. Color features are invariant to viewpoint and pose variations and are broadly applied in vision systems [21-23]. These use histogram-oriented color descriptors [23], scale-invariant feature transform (SIFT) based color descriptors [24], and Schmid filters [25]. Haar-like features [26], Gabor wavelets [27] and local binary patterns (LBP) [28] are texture feature descriptors. Additionally, gradient-based descriptors useful in a re-identification framework include SIFT [29], speeded-up robust features (SURF) [30], the histogram of oriented gradients (HOG) [31] and the pyramid histogram of oriented gradients (PHOG) [32]. Recently, the local maximal occurrence (LOMO) [3] and Hexagonal-SIFT [33] descriptors have been proposed for vision systems and have achieved robust results.

Metric learning methods use the extracted features and calculate an optimal distance metric between images. This approach can enhance performance in classification, clustering and retrieval tasks [10] and is also applied in person re-identification by maximizing the distance between mismatched image pairs and minimizing the distance between matched image pairs [34]. Weinberger and Saul [20] present a large margin nearest neighbor (LMNN) method to find a reliable distance computation. Davis et al. [4] introduce an information-theoretic metric learning (ITML) method for computing a Mahalanobis distance. Zheng et al. [35] provide a probabilistic relative distance comparison (PRDC) learning model, which boosts the likelihood that a true match pair has a smaller distance than a false match pair. Köstinger et al. [5] propose a large-scale metric learning from equivalence constraints (KISSME) strategy to learn a Mahalanobis metric from similar and dissimilar pairs. Later, a regularized smoothing KISSME [36] and a classification-error-based KISSME [37] method were put forward to boost the performance of the KISS metric learning method. Recently, a metric learning method named cross-view quadratic discriminant analysis (XQDA) [3] was proposed that is employed in combination with the local maximal occurrence representation. Although these methods attain better results, the rank-1 accuracy needs further improvement. Recently, various methods [38-41] that utilize deep learning have been proposed and produce robust results.

Fig. 2. Block diagram of the proposed system

Re-ranking for person re-identification is an emerging field that has been addressed by few researchers [2].

One approach is related to the selection and usage of different attributes [11] that the authors found useful in images. These attributes, including gender, backpack, short hair, jeans and "carrying" (having any object in the hands, e.g., jackets/coats/uppers, pages or handbags), are used to re-rank the results. Moreover, the authors exploit the idea of sliding windows, in which the initial results are split into non-overlapping sections called windows. Specifically, in every window, three images are placed from the initial ranking results. Further, re-ranking is performed using the aforementioned attributes. At last, these windows are merged together to form a final window, also referred to as the final re-rank list. In this process, the windows move or slide together to produce the final re-rank results. However, the time required for the creation and merging of these windows makes it a slow process.

In another work, the rank list is optimized [12] and the final results are refined with user intervention. This intervention makes it a difficult and laborious process. To obtain better ranking results, a bi-directional re-ranking method is proposed where the gallery subjects are also used as probes [13]. In [14], a supplementary re-ranking step that takes advantage of salient patch matching is added to increase accuracy. The selected features used in this method were able to improve the selection after re-ranking only slightly. Some researchers [15, 42] used content and context information for rank optimization, which assumes that the correct match is likely found in the first ranks; consequently, the positions at which the true match actually lies are not treated explicitly.

In contrast to the abovementioned methods, our approach does not require a human in the loop and considers all the positions where the true match is present, even at the top positions in the list. Moreover, to increase the ranking accuracy, we utilize hypergraph-based learning, which has been successfully applied in various computer vision applications. The proposed method not only enhances the accuracy but also retrieves, from the gallery, relevant images that are similar to the probe, whereas the images retrieved by another method [20] are quite dissimilar from the probe images.

In a recent work, Bedagkar-Gala and Shah [2] present a survey on recent approaches associated with person re-identification, in which they note the requirement of re-ranking for person re-identification. In light of the literature review and to the best of our knowledge, no re-ranking technique for person re-identification exists that utilizes hypergraphs to address the post-rank accuracy of re-identification methods. The proposed technique not only handles this issue but also incorporates the discriminative information from different visual features into a hypergraph.

3. Motivation for Using Hypergraphs

Existing methods construct a graph on the given data by representing sample images as vertices and similarities as edges [43, 44]. These methods, used in image search, were demonstrated to be effective in image re-ranking [45]. However, such graphs only consider the pairwise relationships between samples and ignore higher-order and complex relationships. This issue can be efficiently addressed by a generalization of graphs known as hypergraphs [16, 17]. In addition, hypergraphs can easily represent complex relational objects in many real-world problems. In these methods, a hyperedge connects multiple vertices and therefore explores the higher-order relationships between data samples. Initially, hypergraph learning was widely disseminated for tasks such as clustering, classification, and embedding [17]. Following that step, several computer vision applications, such as image retrieval [16], object recognition [19], and person re-identification [18], also make the most of hypergraph-based learning. For 3-D object retrieval [46], multiple hypergraphs are learned and fused together for superior performance. For hyperspectral image classification, a bi-layer hypergraph-based learning approach was presented [47] in which the first layer creates a simple graph using pairwise relationships, while the second layer constructs the hypergraph. Fig. 3 illustrates the difference between a simple graph and a hypergraph.

Fig. 3. Differentiating between a simple graph and a hypergraph: (a) a simple graph considering just pairwise relations among various vertices; (b) a hypergraph exploring higher-order relationships between multiple vertices by a single hyperedge drawn by connecting the nearest vertices

3.1 Basic notations used in hypergraph

In a traditional graph, the vertex set $V$ represents the data samples, while the edge set $E$ is used for the pairwise dependencies between the vertices. Such a graph is represented as $G = (V, E)$ and it may be directed or undirected. As there is only one edge between each pair of nodes, these graphs can capture only pairwise relations. In contrast, in a hypergraph one hyperedge can connect more than two vertices simultaneously. This allows the discovery of pairwise as well as higher-order and multi-way relations.

A hypergraph $G_H = (V, E, w)$ is formed using a vertex set $V = \{v_1, v_2, \ldots, v_{|V|}\}$, a set of hyperedges $E = \{e_1, e_2, \ldots, e_{|E|}\}$, and $w$ as the weights of the hyperedges. A hyperedge $e$ is incident with a vertex $v$ provided that $v \in e$, and the degree $d(v)$, defined as the sum of the weights of all hyperedges incident with the vertex $v$, is

$$d(v) = \sum_{e \in E,\, v \in e} w(e), \qquad (1)$$

where $w(e)$ is a positive number associated with each hyperedge. Later, these degree values are used to formulate the diagonal matrix $D_v$. The degree of a hyperedge $e$ is calculated as $\delta(e) = \sum_{v \in V} h(v, e)$. In our case, each hyperedge is built from a fixed number of vertices (a centroid and its nearest neighbours); therefore, every column of the incidence matrix has the same number of nonzero elements, with $h(v, e) \neq 0$ only if $v \in e$. Let $|E|$ and $|V|$ express the cardinalities of the edge and vertex sets in the hypergraph, respectively. It follows that a hypergraph may also be described by an incidence matrix $H$ of order $|V| \times |E|$, such that every entry in $H$ is

$$h(v, e) = \begin{cases} 1, & \text{if } v \in e, \\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$

In a hypergraph, the adjacency matrix $A$ of $G_H$ is denoted as $A = H W H^{T} - D_v$. Here, $W$ stands for the diagonal matrix holding the hyperedge weights, $H^{T}$ is the transpose of $H$, and $D_v$ is computed using Eq. (1), whereas the vertices are the sample images. The weight learning for the individual hyperedges is explained in Section 4.3. Generally, a hypergraph-based regularized framework [17] for optimization can be represented as

$$f^{*} = \arg\min_{f}\, \{\, \Omega(f) + \lambda R_{\mathrm{emp}}(f) \,\}, \qquad (3)$$

where $f$ is the similarity score vector of the gallery images later used by the re-ranking function, $\Omega(f)$ denotes the regularization term defined on the hypergraph, $\lambda$ represents the regularization parameter (balance factor), and $R_{\mathrm{emp}}(f)$ is the empirical loss term that measures the difference between $f$ and the initial label vector. The regularization term can be defined as a generalization of the natural random walk on the hypergraph,

$$\Omega(f) = \frac{1}{2} \sum_{e \in E} \sum_{u, v \in V} \frac{w(e)\, h(u, e)\, h(v, e)}{\delta(e)} \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^{2}, \qquad (4)$$

where $f(u)$ and $f(v)$ are the similarity scores, whereas $d(u)$ and $d(v)$ are the degrees of the vertices $u$ and $v$, respectively, in the hypergraph $G_H$.
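For concreteness, the quantities defined in Eqs. (1) and (2) can be computed directly from a list of hyperedges. The following Python/NumPy sketch is illustrative only (it is not part of the original paper, and the function and variable names are ours); it builds the binary incidence matrix $H$, the degree matrices $D_v$ and $D_e$, and the adjacency matrix $A = H W H^{T} - D_v$.

```python
import numpy as np

def hypergraph_matrices(n_vertices, hyperedges, weights):
    """Build H, W, D_v, D_e and A for a hypergraph.

    hyperedges : list of vertex-index lists, one list per hyperedge
    weights    : 1-D array of positive hyperedge weights w(e)
    """
    n_edges = len(hyperedges)
    H = np.zeros((n_vertices, n_edges))      # binary incidence matrix, Eq. (2)
    for j, edge in enumerate(hyperedges):
        H[edge, j] = 1.0                     # h(v, e) = 1 iff vertex v belongs to hyperedge e

    W = np.diag(weights)                     # diagonal matrix of hyperedge weights
    d_v = H @ weights                        # vertex degrees, Eq. (1): sum of w(e) over incident hyperedges
    delta_e = H.sum(axis=0)                  # hyperedge degrees: number of incident vertices
    D_v, D_e = np.diag(d_v), np.diag(delta_e)

    A = H @ W @ H.T - D_v                    # hypergraph adjacency matrix
    return H, W, D_v, D_e, A

# Toy usage: 5 vertices and two hyperedges of three vertices each
# (a centroid plus its two nearest neighbours, as used later in Section 4.3).
H, W, D_v, D_e, A = hypergraph_matrices(5, [[0, 1, 2], [2, 3, 4]], np.array([0.8, 0.6]))
```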

4. System Overview

Fig. 2 demonstrates the block diagram of the proposed system, which has three major parts. In the first part, the initial rank list is produced by extending the method proposed in [3]. In the second part, a newly introduced refinement technique is applied to the initial lists and generates result lists that are more refined than the initial ones. In the third part, hypergraph-based learning is utilized for re-ranking and the final results are obtained. The details of the proposed system are as follows:

4.1 Rank list refinement

In person re-identification systems, the different matching algorithms [4, 6, 14, 20] return to the user the gallery images matching a query image. These images are returned in a form called a rank list, representing various images at different positions. The results produced by these strategies show that the true match of the query does not necessarily lie at the first position. Consequently, it is very laborious for the user to manually scan the list for the true match, especially when there is more than one query image. Thus, a post-rank refinement procedure is required. An existing method [12] involves the user in the refinement process; still, it is a tedious and time-consuming process. A few other methods use content and context information for refinement [15, 42].

Alternatively, we introduce a two-fold refinement process in which the actual ranking positions are determined by using the calculations of a baseline method. Further classification is then performed for the various positions. We call this classification post-rank refinement for categorization. This process helps us to improve the initial re-identification results and the overall accuracy of the system.

In the proposed refinement process, for a given probe image $p_i$, an initial rank list $R = \{L_1, L_2, \ldots, L_N\}$ is obtained by exploiting the method proposed in [3], which is based on an initial score vector computation for ranking the gallery images $\{g_j\}_{j=1}^{M}$. Here, $R$ contains sub-lists for the $N$ test images matched against the gallery, and each $L_i$ shows the positions of all the retrieved gallery images relevant to the individual probe. The basic assumption in our case is that, over multiple probes, if $p_i$ does not have its true match at the first position among the top returned candidates, then the refinement algorithm is employed for re-ordering. The purpose is to detect and filter out, from the corresponding correlated matches, those rank lists where the true match already lies at the first position. This approach facilitates focusing on the remaining retrieved rank lists and on re-ordering the positions in the initial retrievals.

In the refinement phase, the output of the baseline method is used. In the baseline model, if $\{P\}$ represents the probe images from Camera A to be matched with the gallery images $\{G\}$ from Camera B, we have to utilize the same probe and gallery sets for training our model to make it consistent with the baseline output. Otherwise, the results may change from the initial rank lists and consequently make our model difficult to comprehend and evaluate. Therefore, we denote by $\{P\}$ and $\{G\}$ the sets that constitute the training data for our algorithm. For example, for the VIPeR dataset, both $\{P\}$ and $\{G\}$ are set to 316. In this instance, it is assumed that the particular dataset is split according to the pre-defined protocols [3, 14, 33]. Moreover, the values for the splits can be altered according to the experimental requirements.


pre-defined protocols [3, 14, 33]. Moreover, the values for the splits can be altered according to the experimental requirements.

taken from { }

+

AN US

Algorithm 1 Input: Initial rank list Probe list taken from { } Gallery Images * + Output: and all rank lists Step 1: Take , and * + Step 2: Analyze the initial rank list * + Step 3: Get matching rank of each probe in Step 4: Apply refinement as For i=1 to N ǀ check for relative position of probe in each * ∑ ǀ do ǀ Store all list in the ranking database ǀ Store all other rank lists to End

CR IP T

TABLE 1 The rank list refinement algorithm

The proposed algorithm works in two steps. In the first step, the actual position of the probe images against the

M

retrieved gallery images is computed based on the calculation of a relevance score. This step employs assigning each probe image a score related to its position in the initial ranking list. For this purpose, the probe image +

are exploited. In particular, this approach involves the labeling of the respective retrieved images

ED

*

and

with a ranking score according to their position in the retrieved list. Additionally, the true location of the probe is

PT

also determined in its corresponding retrieved rank list. After calculating this score, the particular labeled probe and its relative initial rank list from

are further utilized in the second step. This estimation and assignment is

CE

helpful for processing and reducing the list size for re-rank learning that ultimately minimizes the computation time of re-ordering. In the second step, all such lists where the true match lies at first position are excluded from

,

AC

and further stored in the ranking database. All other remaining lists with their respective probe images are considered for re-ordering. Consequently, a reduced rank list



is produced. The proposed

algorithm is summarized in Table 1. The refinement process is carried out before hypergraph learning; hence, it works in offline mode. As a result, this procedure has less of an effect on the overall computational complexity of the system. Another advantage of this

11

ACCEPTED MANUSCRIPT

strategy is that it operates on the baseline technique's mass pairwise similarity score matrix to determine the actual location of the probe.
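As a minimal illustration of this refinement step (a sketch under our reading of Algorithm 1, not the authors' code; the function and variable names are hypothetical), the initial rank lists of the labeled training split can be divided into a ranking database of lists whose true match is already at rank-1 and a reduced set that is passed on to hypergraph-based re-ranking.

```python
def refine_rank_lists(initial_lists, true_match_of):
    """Split the initial rank lists as in Algorithm 1 (illustrative sketch).

    initial_lists : dict mapping probe id -> list of gallery ids ordered by baseline score
    true_match_of : dict mapping probe id -> ground-truth gallery id (labeled training split)
    Returns (ranking_database, reduced_lists).
    """
    ranking_database, reduced_lists = {}, {}
    for probe, ranked_gallery in initial_lists.items():
        # relative position (1-based rank) of the true match in the initial list
        rank = ranked_gallery.index(true_match_of[probe]) + 1
        if rank == 1:
            # already correct at rank-1: store in the ranking database, exclude from re-ranking
            ranking_database[probe] = ranked_gallery
        else:
            # candidate for hypergraph-based re-ordering (the reduced list R')
            reduced_lists[probe] = ranked_gallery
    return ranking_database, reduced_lists
```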

4.2 Hypergraph learning for re-ranking

The objective of re-ranking is to produce a score list for a new ranking by using learning on hypergraphs. In the testing phase, the probe is to be matched with the gallery $\{g_j\}_{j=1}^{M}$ to produce the results. In the hypergraph construction, a hyperedge is formulated using a center vertex and the associated $k$ nearest vertices, e.g., as in Fig. 3. The transductive learning framework is exploited for hypergraph learning. The aim is to assign a high similarity score to those images that are associated with highly weighted hyperedges. In addition, similar gallery images should rank towards the top of the resultant list. Therefore, Eq. (3) is written as the optimization of the function $Q(f)$ such that

$$f^{*} = \arg\min_{f} Q(f) = \arg\min_{f} \{\, \Omega(f) + \lambda R_{\mathrm{emp}}(f) \,\}. \qquad (5)$$

More precisely, the term $\Omega(f)$ is calculated using Eq. (4), whereby the vertices within the same highly weighted hyperedges obtain more similar similarity scores, while $R_{\mathrm{emp}}(f)$ is defined as

$$R_{\mathrm{emp}}(f) = \| f - y \|^{2} = \sum_{v \in V} \big( f(v) - y(v) \big)^{2}, \qquad (6)$$

where $y$ is a vector having binary labels, and $y(v) = 1$ if $v$ is set as the probe vertex and $y(v) = 0$ otherwise. Hence, the value of the first element is set to 1 in the label vector and the remaining values are set to 0. In this case, the term $\Omega(f)$ in Eq. (5) is calculated by deriving Eq. (4) as

$$\Omega(f) = \frac{1}{2} \sum_{e \in E} \sum_{u, v \in V} \frac{w(e)\, h(u, e)\, h(v, e)}{\delta(e)} \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^{2} = \sum_{e \in E} \sum_{u, v \in V} \frac{w(e)\, h(u, e)\, h(v, e)}{\delta(e)} \left( \frac{f^{2}(u)}{d(u)} - \frac{f(u)\, f(v)}{\sqrt{d(u)\, d(v)}} \right). \qquad (7)$$

As, from Eq. (7), $\sum_{v \in V} h(v, e)/\delta(e) = 1$ and $\sum_{e \in E} w(e)\, h(u, e) = d(u)$, the solution can be obtained as

$$\Omega(f) = f^{T} \Delta f, \qquad (8)$$

where $\Delta = I - \Theta$ is a positive semi-definite matrix, the hypergraph Laplacian, $I$ represents the identity matrix, and $\Theta$ can be calculated as

$$\Theta = D_v^{-1/2}\, H\, W\, D_e^{-1}\, H^{T}\, D_v^{-1/2}, \qquad (9)$$

where $D_v$, $W$ and $D_e$ are the diagonal matrices of the vertex degrees, hyperedge weights and hyperedge degrees, respectively. By substituting Eq. (6) and Eq. (8) into Eq. (5), we get

$$Q(f) = f^{T} \Delta f + \lambda\, \| f - y \|^{2}. \qquad (10)$$

By differentiating Eq. (10) with respect to $f$ and setting the derivative to zero, we have

$$f^{*} = \left( I + \frac{\Delta}{\lambda} \right)^{-1} y. \qquad (11)$$

For large values of $|V|$, the computation of this matrix inverse is not feasible. Instead, the score vector $f$ can be calculated efficiently through the iterated computation $f^{(t+1)} = \frac{1}{1+\lambda}\, \Theta f^{(t)} + \frac{\lambda}{1+\lambda}\, y$. In this equation, the iteration number is denoted by $t$, and this iterated procedure is guaranteed to converge [48]. The hypergraph-based learning algorithm is presented in Table 2.

TABLE 2
Hypergraph learning for post-rank optimization

Algorithm 2
Input: Probe image $p$, gallery images $\{g_j\}_{j=1}^{M}$
Output: Final rank list
Hypergraph learning:
1. Compute the similarity matrix $S$ from the pairwise distances between the images.
2. Construct a hyperedge for each vertex; calculate the vertex degree and hyperedge degree matrices $D_v$ and $D_e$.
3. Calculate the weight of the individual vertices for each hyperedge by searching the k-nearest neighbors.
4. Using soft assignment, compute the incidence matrix $H$ using Eq. (12).
5. Compute the hypergraph Laplacian using $\Delta = I - \Theta$.
6. Calculate the relevance score vector $f^{*}$ using Eq. (11), rank all the vertices and produce the results.
7. Compare with and merge the results already stored in the ranking database.
8. Produce the final results.

After getting the relevance score vector $f^{*}$ from the learning against the probe and gallery, a re-rank list is generated by sorting the scores. The next task is to compare it with the previously saved results in the ranking database, the purpose being to incorporate the pre-stored rank-1 results into the newly generated re-rank lists and to avoid repetitive results in the cumulative final lists.
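A compact sketch of the scoring step in Algorithm 2 is given below (an illustration rather than the authors' implementation; the helper names are ours). Given an incidence matrix $H$, the hyperedge weights and the probe label vector $y$, it forms $\Theta$ as in Eq. (9), propagates the relevance scores with the iterative update that solves Eq. (11), and sorts the gallery vertices to obtain the re-rank list.

```python
import numpy as np

def hypergraph_rerank(H, w, y, lam=0.1, n_iter=50):
    """Relevance-score propagation on a hypergraph (sketch of Algorithm 2, steps 5-6).

    H   : (n_vertices, n_edges) incidence matrix (binary or soft assignment)
    w   : (n_edges,) hyperedge weights
    y   : (n_vertices,) label vector, 1 for the probe vertex and 0 elsewhere
    lam : balance factor lambda of Eq. (10); 0.1 is the value used in the experiments
    """
    d_v = H @ w                                   # vertex degrees, Eq. (1)
    d_e = H.sum(axis=0)                           # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    Theta = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / d_e) @ H.T @ Dv_inv_sqrt  # Eq. (9)

    alpha = 1.0 / (1.0 + lam)                     # iterative solution of Eq. (11)
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f = alpha * (Theta @ f) + (1.0 - alpha) * y
    return np.argsort(-f), f                      # gallery vertices sorted by relevance score
```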

4.3 Weight learning of hyperedges

The hypergraph Laplacian and the transductive learning framework [49] can be leveraged to gain robust matching results. Moreover, we can take advantage of different discriminative features to form the hypergraph. In this study, we assign different weights to different hyperedges such that an edge gains a higher weight if it is more discriminative, and vice versa. The incidence matrix may be calculated using Eq. (2). However, this traditional binary structure treats every edge equally; the relative distances within the hyperedges are ignored, which degrades the performance of the hypergraph. Therefore, instead of using binary values, in our case the incidence matrix is constructed using softly assigned values, as used in [16]. Therefore,

$$h(v, e) = \begin{cases} s(v, v_c), & \text{if } v \in e, \\ 0, & \text{otherwise}, \end{cases} \qquad (12)$$

where $s(v, v_c)$ denotes the similarity between two vertices, which can be calculated as $s(v, v_c) = \exp\!\big(-d(v, v_c)^{2} / \bar{d}^{\,2}\big)$, where $d(v, v_c)$ is the distance calculated between the hyperedge center $v_c$ and the vertex $v$ using the Mahalanobis [10] distance, which is more effective than using the Euclidean distance, and $\bar{d}$ is the average distance of all the images in a hyperedge. Therefore, the weights are assigned as

$$w(e) = \sum_{v \in e} s(v, v_c). \qquad (13)$$

Fig. 4. Comparison between the incidence matrices. (a) The incidence matrix $H_b$ calculated based on the binary assignment using the equation $h(v, e) = 1$ if $v \in e$ and $0$ otherwise. This matrix contains only two values and ignores the other values between 0 and 1. (b) The incidence matrix $H_s$ computed by the soft assignment method $h(v, e) = s(v, v_c)$ if $v \in e$ and $0$ otherwise. In this technique, the degree of membership of a vertex to a hyperedge $e$ is shown by a real value between 0 and 1. This soft assignment technique represents more discriminative relationships between hyperedges

Assuming a similarity function exists between the images, hyperedges are constructed by taking each image as a centroid vertex together with its corresponding k-nearest-neighbor images. In our case, the value of k is set to 2. In a particular hypergraph, an edge can connect multiple vertices. The ultimate goal is to find the best match, which is only possible by selecting the closest vertices in that hyperedge. Moreover, the weight contribution of an edge is calculated between the two nearest vertices based on the similarity score $s(v, v_c)$, and the calculation of this score requires only two images. Therefore, the motive for selecting the two nearest neighbors, i.e., k = 2, is the calculation of this pairwise distance. Furthermore, this mechanism also reduces the complexity of the hypergraph and makes it more uniform. This is also illustrated in Fig. 3, where an edge contains exactly three vertices. Moreover, the formation of a hyperedge and the selection of the nearest neighbors are problem-specific. In Fig. 4, a comparison is given between the incidence matrix $H_b$ of a hypergraph using binary values and the incidence matrix $H_s$ of a hypergraph using the soft assignment technique. The soft assignment technique represents more discriminative relationships between hyperedges, whereas in the binary-structure hypergraph the strength of the relations between the hyperedges is ignored by using only two values, i.e., 0 and 1.
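The soft-assignment construction of Eqs. (12) and (13) can be sketched as follows (illustrative code under our reading of this section, not the paper's implementation; the distance matrix is assumed to come from the baseline Mahalanobis metric and the helper names are hypothetical). Each image becomes the centroid of a hyperedge together with its k = 2 nearest neighbours, the incidence values decay with the distance to the centroid, and each hyperedge weight is the sum of its soft incidence values.

```python
import numpy as np

def build_soft_hypergraph(dist, k=2):
    """Soft-assignment hypergraph construction, Eqs. (12)-(13) (illustrative sketch).

    dist : (n, n) matrix of pairwise distances between images (e.g. Mahalanobis distances)
    k    : number of nearest neighbours joined to each centroid vertex (k = 2 in the paper)
    """
    n = dist.shape[0]
    H = np.zeros((n, n))                              # one hyperedge per centroid vertex
    w = np.zeros(n)                                   # hyperedge weights, Eq. (13)
    for c in range(n):                                # c is the centroid of hyperedge e_c
        neighbours = np.argsort(dist[c])[1:k + 1]     # k nearest vertices, excluding c itself
        members = np.concatenate(([c], neighbours))
        d_bar = dist[c, neighbours].mean()            # average distance of the images in the hyperedge
        s = np.exp(-(dist[c, members] ** 2) / (d_bar ** 2 + 1e-12))  # soft similarity s(v, v_c)
        H[members, c] = s                             # Eq. (12): h(v, e_c) = s(v, v_c) if v in e_c
        w[c] = s.sum()                                # Eq. (13): w(e_c) = sum of soft incidences
    return H, w
```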

5. Experiments and Results

This section presents the experimental results and a detailed analysis of the proposed method. First, the datasets, feature extraction and evaluation protocols are given. Second, comparisons with contemporary approaches, including direct matching and post-rank optimization, are provided. The conducted experiments address the following questions:

1. How can the proposed work's performance be evaluated on the publicly available datasets?

2. Are the joint effects of the baseline and the proposed technique valuable?

3. Does the proposed re-ranking technique perform better than other recent strategies, including both direct ranking and re-ranking methods?

5.1 Datasets

VIPeR [50] is a publicly available dataset having 632 pairs of person images. It is captured from two non-overlapping cameras, and each subject appears in each camera view. It is widely used and contains challenges such as illumination, occlusion, and pose variation. Therefore, it is ideal for assessing the performance of person re-identification algorithms. Different samples from both datasets are shown in Fig. 5(a) and Fig. 5(b).

The CUHK02 [51] dataset is also taken using two non-overlapping cameras, which capture frontal and back views of the subjects. It contains 1871 challenging images having viewpoint, illumination, and occlusion variations.

The motives for selecting these datasets are that (a) the pairwise samples are suitable for surveillance and exhibit various re-identification and real-world challenges and (b) the captured images are from non-overlapping cameras. Moreover, the datasets are widely used and publicly accessible for assessing re-identification approaches.

Fig. 5. Sample images from (a) the VIPeR and (b) the CUHK02 datasets. In each column, the images shown of the same person are taken from the two non-overlapping cameras of the respective dataset

5.2 Feature extraction and evaluation protocol

The feature extraction process comprises the following steps. First, all person images are rescaled to 128*48 pixels. The block size is set to 8*16 for each image division, and each block overlaps with its neighboring blocks by half of its size, i.e., 4*16. Second, the quantized mean HSV and Lab color values are utilized, whereas for the texture features 8-bit LBP values are taken from each block. Ultimately, the resultant feature vector is the concatenation of all the features extracted from each block. Moreover, the selection of this block size in our experiment is a common practice adopted by many methods. Fig. 6 demonstrates the image scale and its respective division into blocks.

In our trials, we exercise a protocol similar to that given in [3, 14, 33]. In particular, both datasets are arbitrarily divided equally into two groups: one group is used for training while the second is used for testing. To obtain a fair comparison of results, we run the tests ten times and present the comparison in the form of CMC curves of the average matching rate at various ranks. For the VIPeR dataset, the value of p is selected as 316, while for the CUHK02 dataset p is chosen as 485. The parameters used in the experiment, such as the balance factor lambda in Eq. (10), are adjusted to 0.1 and verified using cross-validation. Furthermore, the matrices involved in the computation can be efficiently stored and used via a compressed sparse matrix representation. Moreover, the size of the gallery is not too large; therefore, the inverse calculation in Eq. (11) is also feasible.

Fig. 6. Feature extraction mechanism for an image taken from the VIPeR dataset: (a) a sample image of size 128*48; (b) the division into blocks of size 8*16; (c) an individual block of size 8*16, cropped and zoomed

The experiments are conducted on a PC having 6 GB of RAM and an Intel Core i7 processor, while the implementation is done on the MATLAB 2014b platform.
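A minimal sketch of the block-based feature extraction described above is given below. It is illustrative Python rather than the MATLAB implementation used in the paper; the scikit-image colour-conversion and LBP routines are stand-ins, and the exact overlap scheme (half-block steps along the vertical axis) and the use of an LBP histogram per block are assumptions on our part.

```python
import numpy as np
from skimage.color import rgb2hsv, rgb2lab, rgb2gray
from skimage.feature import local_binary_pattern
from skimage.transform import resize

def extract_block_features(rgb_image, size=(128, 48), block=(8, 16), step=(4, 16)):
    """Block-wise colour + texture descriptor (illustrative sketch of Section 5.2)."""
    img = resize(rgb_image, size, anti_aliasing=True)
    hsv, lab = rgb2hsv(img), rgb2lab(img)
    lbp = local_binary_pattern(rgb2gray(img), 8, 1)   # 8-neighbour LBP codes in [0, 255]

    feats = []
    for r in range(0, size[0] - block[0] + 1, step[0]):        # half-block overlap vertically
        for c in range(0, size[1] - block[1] + 1, step[1]):    # non-overlapping columns
            sl = (slice(r, r + block[0]), slice(c, c + block[1]))
            hist, _ = np.histogram(lbp[sl], bins=256, range=(0, 256), density=True)
            feats.append(np.concatenate([hsv[sl].mean(axis=(0, 1)),   # mean HSV of the block
                                         lab[sl].mean(axis=(0, 1)),   # mean Lab of the block
                                         hist]))                      # 8-bit LBP histogram
    return np.concatenate(feats)
```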

5.3 Evaluation with state-of-the-art post-ranking approaches

This section provides comparisons with modern techniques that have used re-ranking and shows the outcomes of the presented technique along with four different baseline methods.

In Table 3, the computed results on the VIPeR dataset are given in terms of the recognition rate percentage at different ranks for those methods that have employed re-ranking as an additional process or have utilized a post-ranking process for optimization, i.e., RDs + Saliency Re-ranking [18], KISSME+SB [11], POP [12], Rank Optimization [52], Bidirectional re-ranking [15], and KCCA+DCIA [42]. This also illustrates the advantages of the proposed method with respect to other state-of-the-art methods. As these methods are evaluated on the VIPeR dataset, we also provide a comparison for the same dataset. It is to be noticed that methods such as KISSME+SB and RDs + Saliency Re-ranking include the additional re-ranking step in their main or proposed baseline method and give only a small priority to the post-ranking step; therefore, these post-ranking methods have low performance at rank-1, namely 19.3% and 33.29%, respectively. Methods such as POP [12], Rank Optimization [52], Bidirectional re-ranking [15], and DCIA [42] are dedicated post-ranking methods designed exclusively for re-ranking. Therefore, these methods have better performance at rank-1: 59.09% for POP, 34.97% for Rank Optimization and 63.92% for DCIA, respectively.

TABLE 3
Top-ranked matching rate (%) comparison of the proposed method with the state-of-the-art post-ranking methods on the VIPeR dataset @ p=316. Best results are highlighted in boldface font.

Method                       Rank 1   Rank 10   Rank 20   Rank 50
RDs + Saliency Re-ranking    33.29    78.35     88.48     97.53
KISSME+SB                    19.30    63.30     76.60     90.60
POP                          59.05    63.10     70.01     --
Rank Optimization            34.97    72.03     80.21     --
Bidirectional re-ranking     22.21    67.11     89.32     95.40
KCCA+DCIA                    63.92    78.21     87.11     99.05
XQDA + Proposed              64.75    80.12     86.51     100

Although bi-directional re-ranking is designed for post-ranking, it still has only a 22.2% recognition rate at rank-1. One reason for this may be the selection of an inappropriate baseline method for the evaluation.

The results in boldface show that XQDA + Proposed outperforms all the state-of-the-art techniques, achieving 64.75% at rank-1. One reason is the selection of an appropriate baseline method at the time of the experiments. The second and most important reason for the results is that the proposed method employs a novel refinement algorithm and a dedicated hypergraph-based learning framework, both of which have shown excellent performance in re-ranking in our experiments. More interestingly, DCIA, SB and bi-directional re-ranking used KISSME [5] as the baseline model to produce their results. Using KISSME in our proposed method, we achieve an improvement of almost 4% over DCIA, 17% over SB and 15% over bi-directional re-ranking; hence, an average improvement of 12% over these three methods can be seen. Thus, the proposed post-ranking optimization method is more effective than the existing strategies when the same baseline method is chosen for re-ranking.

To analyze the proposed method's performance with different metric-learning methods, the results regarding the recognition rate at rank-1 are presented in Fig. 7 and Fig. 8. The figures show that the various metric learning algorithms, when used without the proposed re-ranking method, have reduced recognition performance; however, by using the proposed hypergraph-based re-ranking method, better results are achieved.
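For reference, the rank-k matching rates that make up the CMC comparisons in Tables 3-5 and Figs. 7-10 can be computed from the final rank lists as in the following sketch (a standard CMC computation in illustrative Python; it is not taken from the authors' code).

```python
import numpy as np

def cmc_curve(rank_lists, true_match_of, max_rank=50):
    """Cumulative Matching Characteristic: fraction of probes whose true match
    appears within the top-k positions, for k = 1..max_rank."""
    hits = np.zeros(max_rank)
    for probe, ranked_gallery in rank_lists.items():
        rank = ranked_gallery.index(true_match_of[probe]) + 1   # 1-based rank of the true match
        if rank <= max_rank:
            hits[rank - 1:] += 1                                # a hit at rank r counts for every k >= r
    return 100.0 * hits / len(rank_lists)                       # matching rate (%) at ranks 1..max_rank

# e.g. cmc = cmc_curve(final_lists, ground_truth); cmc[0] is the rank-1 rate, cmc[9] the rank-10 rate
```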


In Fig. 7, we illustrate the performance of the proposed method with different baseline methods. Specifically, for the VIPeR dataset, the Euclidean and Mahalanobis distance methods have recognition rates of 8% and 18% at rank-1 before applying the proposed model. After applying it, they achieve recognition rates of 19% and 30% at rank-1, respectively, showing improvements of 11% and 12%. Similarly, KISSME improves from 20% to 36%, gaining an increase of almost 16% at rank-1. The best results are shown by XQDA + Proposed: alone, LOMO+XQDA provides an average recognition rate of almost 42%, whereas LOMO+XQDA with the proposed method obtains a recognition rate of 64.54%, improving the results by more than 22.57%. Although the results of all base models are enhanced, XQDA as a baseline model achieves a more significant improvement than the other methods.

Similarly, in Fig. 8, the proposed technique demonstrates a remarkable performance improvement over various baseline methods on the CUHK02 dataset. More precisely, it improves upon the baseline results from 24% to 40% for KISSME, and from 14% to 29% and from 19% to 34% for the Euclidean and Mahalanobis distance models, respectively. The best results are shown by improving XQDA's performance from 52% to 66%. All of these results are at rank-1. For CUHK02, a larger performance improvement can be noticed. One reason for this improvement is that the baseline methods have shown good results; moreover, this dataset is less challenging than the VIPeR dataset.

Concerning the evaluation of post-ranking methods on the CUHK02 dataset, the proposed method is compared against DCIA [42], RD [14], and Rank Optimization [52]. In particular, our method improves the baseline method [3] by almost 14%, and the post-ranking DCIA technique by 4%, on this dataset. Moreover, the proposed method improves the RD [14] results from 31.10% to 66%, an increase of almost 35%, and improves the Rank Optimization [52] results from 36% to 66%, increasing the accuracy by 30%. The results are reported in Fig. 9. Therefore, the results demonstrate that the given approach performs significantly better than the existing methods.


Fig. 7. Recognition rate (%) bars of the baseline methods on the VIPeR dataset. Gray bars show the results before, and maroon bars show the results after, applying the proposed method with the baseline


Fig. 8. Recognition rate (%) bars of the baseline methods on the CUHK02 dataset. Gray bars show the results without, and maroon bars show the results after, applying the proposed method with the baseline


Fig. 9. CMC curves comparing the performance of the proposed method with DCIA [42], RD [14], and Rank Optimization [52] on the CUHK02 dataset

5.4 Evaluation with state-of-the-art ranking approaches

To ensure a reasonable comparison, we evaluate the presented technique against the latest direct matching or ranking approaches that use the same datasets and similar evaluation protocols to those used in the original experiments. The results of these techniques are taken from the respective papers. Table 4 and Table 5 exhibit the recognition rate percentages of numerous methods on both datasets, while Fig. 10 provides CMC curves for each method on both datasets. Specifically, LOMO+XQDA [3] has shown remarkable performance on the VIPeR dataset, as in Table 4, while RD [14] has shown good results on the CUHK02 dataset, as in Table 5. The selection of the baseline method is very important, and it plays a vital role in the post-ranking framework.

TABLE 4
Recognition rate (%) of various state-of-the-art person re-identification or baseline methods on the VIPeR dataset @ p=316. LOMO-XQDA [3] shows the best performance and its results are highlighted in boldface font.

Method       Rank 1   Rank 10   Rank 20   Rank 50
LOMO-XQDA    40.00    80.51     92.08     95.21
SalMatch     30.16    43.45     58.48     78.53
PRDC         15.66    53.66     70.09     90
KISSME       20.34    62.45     77.62     92.32
SDALF        13.01    53.22     71.05     90.41
L1-norm      34.23    57.35     73.47     88.76

TABLE 5
Recognition rate (%) of different contemporary person re-identification or baseline methods on the CUHK02 dataset @ p=485. RD [14] shows the best performance and its results are highlighted in boldface font.

Method   Rank 1   Rank 10   Rank 20   Rank 50
RD       31.10    68.55     79.17     90.38
LMNN     13.45    42.25     54.11     73.29
ITML     15.98    45.60     59.81     76.61
SDALF    9.90     30.33     41.03     55.99
ROCCA    29.77    66.03     76.78     88.47
eSDC     20.01    40.21     50.55     70.21


Fig. 10. CMC curves for the performance comparison of the proposed method with XQDA [3], RD [14], KCCA [53], LMNN [20], ITML [4], and KISSME [5] on VIPeR in Fig. 10(a), and with SalMatch [54], ITML [4], L1-norm [8], Mid-filter [55], eSDC [51], CCA [56], and KCCA [53] on CUHK02 in Fig. 10(b)

Fig. 10(a) shows the performance of the proposed method on the VIPeR dataset by comparing it with existing methods, namely, XQDA [3], RD [14], KCCA [53], ITML [4], KISSME [5], and LMNN [20]. The presented method outperforms the latest ranking techniques, especially from rank 1 to rank 10. In particular, we achieved 64.5% at rank-1 and 84% at rank-5, while no other method attains such a robust recognition rate at these ranks. Moreover, an increased performance gap can be seen at the higher ranks as well. This finding is observed because our refinement algorithm efficiently handles the initial rank results of the baseline methods, and hypergraph-based learning further enhances the output.

In Fig. 10(b), comparisons are given with SalMatch [57], ITML [4], L1-norm [8], Mid-filter [55], eSDC [51], CCA [56], and KCCA [53] on the CUHK02 dataset. KCCA performs well on this dataset, having a recognition rate of 38% at rank-1, 65% at rank-5 and 72% at rank-10. Specifically, XQDA + Proposed achieves a rank-1 recognition rate of 66%, 84% at rank-5 and 94% at rank-10, gaining an average increase of 22% in performance at these ranks and a 14% increase at rank-1. In particular, it is obvious through the results and comparisons that the presented method gains an increase in the correct recognition rate at all ranks relative to the existing state-of-the-art methods.

From the above analysis, the importance of the proposed method is obvious from two aspects: (a) by using the proposed refinement technique, we can better classify the ranks by utilizing the correlation information from the initial rank list, and (b) applying hypergraph-based learning for post-rank optimization is more effective for discovering the relationships between images. Moreover, the presented framework is more robust than other post-rank optimization methods. Fig. 11 shows some retrieval examples before and after the re-ranking is applied.

Fig. 11. Retrieval examples of comparative results before and after re-ranking on the CUHK02 dataset. Probes are in the left column, highlighted with green boxes, and the top 10 rank results are shown on the right. The white arrows indicate results before, and the solid arrows results after, the proposed re-ranking is applied. The true matches are highlighted by the red boxes

6. Conclusions

A re-ranking framework has been presented for person re-identification that takes advantage of a proposed refinement algorithm and hypergraph-based learning. The refinement for the classification of the different ranks was performed using the correlated information gained by exploiting the baseline model, followed by exploring the higher-order relationships among the images using hypergraphs. Extensive experiments and evaluations on public datasets revealed that the described re-ranking scheme is more robust and outperforms a wide range of other techniques by improving the results at different ranks. At rank-1, we accomplished an average 22.57% improvement in accuracy on the VIPeR dataset and an average 14% improvement on the CUHK02 dataset. The empirical investigation on different baseline methods also revealed the effectiveness of the presented method. In this paper, we focused on re-ranking of individual images in pairwise cameras. Future work might comprise generalizing the proposed framework to multi-camera and real-time scenarios, for example, as in [33]. Furthermore, for large datasets such as CUHK03 [58] and Market-1501 [59], the time complexity issue can also be addressed in the future.

Acknowledgement

The authors would like to thank the National Natural Science Foundation of P.R. China (61375079) and The Chinese Academy of Sciences-The World Academy of Sciences (CAS-TWAS) President's Fellowship.

References


[1] H. Yang, L. Shao, F. Zheng, L. Wang, Z. Song, Recent advances and trends in visual tracking: A review, Neurocomputing, 74 (2011) 3823-3831. [2] A. Bedagkar-Gala, S.K. Shah, A survey of approaches and trends in person re-identification, Image and Vision Computing, 32 (2014) 270-286. [3] S. Liao, Y. Hu, X. Zhu, S.Z. Li, Person re-identification by local maximal occurrence representation and metric learning, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 21972206. [4] J.V. Davis, B. Kulis, P. Jain, S. Sra, I.S. Dhillon, Information-theoretic metric learning, In ACM Conference on Machine Learning (2007), pp. 209-216. [5] M. Köstinger, M. Hirzer, P. Wohlhart, P.M. Roth, H. Bischof, Large scale metric learning from equivalence constraints, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), pp. 2288-2295. [6] B. Prosser, W.S. Zheng, S. Gong, T. Xiang, Person Re-Identification by Support Vector Ranking, In British Machine Vision Conference (BMVC 2010), pp. 21.01-21.11. [7] X. Wu, A.G. Hauptmann, C.-W. Ngo, Measuring novelty and redundancy with multiple modalities in cross-lingual broadcast news, Computer vision and image understanding, 110 (2008) 418-431. [8] W.-S. Zheng, S. Gong, T. Xiang, Reidentification by relative distance comparison, IEEE Trans. Pattern Anal. Mach. Intell., 35 (2013) 653-668. [9] C. Liu, S. Gong, C.C. Loy, X. Lin, Person re-identification: What features are important?, In European Conference on Computer Vision (ECCV 2012), pp. 391-401. [10] L. Yang, R. Jin, Distance metric learning: A comprehensive survey, Michigan State Universiy, 2 (2006). [11] L. An, X. Chen, M. Kafai, S. Yang, B. Bhanu, Improving person re-identification by soft biometrics based reranking, In IEEE Conference on Distributed Smart Cameras (ICDSC 2013), pp. 1-6. [12] C. Liu, C. Loy, S. Gong, G. Wang, POP: Person re-identification post-rank optimisation, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 441-448. [13] Q. Leng, R. Hu, C. Liang, Y. Wang, J. Chen, Bidirectional ranking for person re-identification, In IEEE Conference on Multimedia and Expo (ICME 2013), pp. 1-6. [14] L. An, M. Kafai, S. Yang, B. Bhanu, Person Reidentification With Reference Descriptor, IEEE Trans. Circuits Syst. Video Technol., 26 (2016) 776-787. [15] Q. Leng, R. Hu, C. Liang, Y. Wang, J. Chen, Person re-identification with content and context reranking, Multimedia Tools and Applications, 74 (2015) 6989-7014. [16] Y. Huang, Q. Liu, S. Zhang, D.N. Metaxas, Image retrieval via probabilistic hypergraph ranking, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010 ), pp. 3376-3383. 24


[17] D. Zhou, J. Huang, B. Schölkopf, Learning with hypergraphs: Clustering, classification, and embedding, Advances in Neural Information Processing Systems, (2006), pp. 1601-1608. [18] L. An, X. Chen, S. Yang, Person re-identification via hypergraph-based matching, Neurocomputing, (2015). [19] M. Hofmann, D. Wolf, G. Rigoll, Hypergraphs for joint multi-view reconstruction and multi-object tracking, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3650-3657. [20] K.Q. Weinberger, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, The Journ. Mach. Learning Res., 10 (2009) 207-244. [21] M. Guo, Y. Zhao, C. Zhang, Z. Chen, Fast object detection based on selective visual attention, Neurocomputing, 144 (2014) 184-197. [22] H. Bao, M. Lin, Z. Chen, Robust visual tracking based on hierarchical appearance model, Neurocomputing, 221 (2017) 108-122. [23] G. Doretto, T. Sebastian, P. Tu, J. Rittscher, Appearance-based person reidentification in camera networks: problem overview and current approaches, Journ. Ambient Intell. Humanized Comp., 2 (2011) 127-151. [24] A.E. Abdel-Hakim, A.A. Farag, CSIFT: A SIFT descriptor with color invariant characteristics, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2006), pp. 1978-1983. [25] K. Mikolajczyk, C. Schmid, An affine invariant interest point detector, In European Conference on Computer Vision (ECCV 2002), pp. 128-142. [26] R. Lienhart, J. Maydt, An extended set of haar-like features for rapid object detection, In IEEE Conference on Image Processing (2002), pp. I-900-I-903 vol. 901. [27] S. Arivazhagan, L. Ganesan, S.P. Priyal, Texture classification using Gabor wavelets based rotation invariant features, Pattern recognition letters, 27 (2006) 1976-1982. [28] Y. Zhang, S. Li, Gabor-LBP based region covariance descriptor for person re-identification, In IEEE Conference on Image and Graphics (ICIG 2011), pp. 368-371. [29] D.G. Lowe, Object recognition from local scale-invariant features, In IEEE Conference on Computer Vision (1999), pp. 1150-1157. [30] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF), Computer vision and image understanding, 110 (2008) 346-359. [31] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), pp. 886-893. [32] A. Dhall, A. Asthana, R. Goecke, T. Gedeon, Emotion recognition using PHOG and LPQ features, In IEEE Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011), pp. 878-883. [33] J.H. Shah, M. Lin, Z. Chen, Multi-camera handoff for person re-identification, Neurocomputing, 191 (2016) 238-248. [34] T. Zhou, M. Qi, J. Jiang, X. Wang, S. Hao, Y. Jin, Person Re-identification based on nonlinear ranking with difference vectors, Information Sciences, 279 (2014) 604-614. [35] W.-S. Zheng, S. Gong, T. Xiang, Person re-identification by probabilistic relative distance comparison, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), pp. 649-656. [36] D. Tao, L. Jin, Y. Wang, Y. Yuan, X. Li, Person re-identification by regularized smoothing kiss metric learning, IEEE Trans. Circuits Syst. Video Technol., 23 (2013) 1675-1685. [37] D. Tao, L. Jin, Y. Wang, X. Li, Person reidentification by minimum classification error-based KISS metric learning, IEEE Trans. Cybern., 45 (2015) 242-252. [38] L. Ren, J. Lu, J. Feng, J. 
Zhou, Multi-modal uniform deep learning for RGB-D person re-identification, Pattern Recognition, 72 (2017) 446-457. [39] J. Lin, L. Ren, J. Lu, J. Feng, J. Zhou, Consistent-aware deep learning for person re-identification in a camera network, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017).



[40] M. Raza, Z. Chen, S.-U. Rehman, P. Wang, P. Bao, Appearance based pedestrians’ head pose and body orientation estimation using deep learning, Neurocomputing, 272 (2018) 647-659. [41] M. Raza, Z. Chen, S.U. Rehman, P. Wang, J.-k. Wang, Framework for estimating distance and dimension attributes of pedestrians in real-time environments using monocular camera, Neurocomputing, 275 (2018) 533-545. [42] J. Garcia, N. Martinel, C. Micheloni, A. Gardel, Person re-identification ranking optimisation by discriminant context information analysis, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 1305-1313. [43] D. Conte, P. Foggia, C. Sansone, M. Vento, Thirty years of graph matching in pattern recognition, Int. Journ. Patt. Recog. Artifi. Intellig., 18 (2004) 265-298. [44] G. Kurillo, Z. Li, R. Bajcsy, Wide-area external multi-camera calibration using vision graphs and virtual calibration object, In ACM/IEEE Conference on Distributed Smart Cameras (ICDSC 2008), pp. 1-9. [45] M. Wang, H. Li, D. Tao, K. Lu, X. Wu, Multimodal graph-based reranking for web image search, IEEE Trans. Image Process., 21 (2012) 4649-4661. [46] Y. Gao, M. Wang, D. Tao, R. Ji, Q. Dai, 3-d object retrieval and recognition with hypergraph analysis, IEEE Trans. Image Process., 21 (2012) 4290-4303. [47] Y. Gao, R. Ji, P. Cui, Q. Dai, G. Hua, Hyperspectral image classification through bilayer graph-based learning, IEEE Trans. Image Process., 23 (2014) 2769-2778. [48] L. Zhu, J. Shen, H. Jin, R. Zheng, L. Xie, Content-based visual landmark search via multimodal hypergraph learning, IEEE Trans. Cybern., 45 (2015) 2756-2769. [49] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, Advances in neural information processing systems, 16 (2004) 321-328. [50] D. Gray, H. Tao, Viewpoint invariant pedestrian recognition with an ensemble of localized features, In European Conference on Computer Vision (ECCV 2008), pp. 262-275. [51] R. Zhao, W. Ouyang, X. Wang, Unsupervised salience learning for person re-identification, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3586-3593. [52] M. Ye, J. Chen, Q. Leng, C. Liang, Z. Wang, K. Sun, Coupled-view based ranking optimization for person re-identification, In International Conference on MultiMedia Modeling (2015), pp. 105-117. [53] G. Lisanti, I. Masi, A. Del Bimbo, Matching people across camera views using kernel canonical correlation analysis, In ACM Conference on Distributed Smart Cameras (2014), pp. 10. [54] R. Zhao, W. Oyang, X. Wang, Person re-identification by saliency learning, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2017) 356-370. [55] R. Zhao, W. Ouyang, X. Wang, Learning mid-level filters for person re-identification, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 144-151. [56] D.R. Hardoon, S. Szedmak, J. Shawe-Taylor, Canonical correlation analysis: An overview with application to learning methods, Neural computation, 16 (2004) 2639-2664. [57] R. Zhao, W. Ouyang, X. Wang, Person re-identification by saliency learning, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2016) 356 - 370. [58] W. Li, R. Zhao, T. Xiao, X. Wang, Deepreid: Deep filter pairing neural network for person reidentification, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 152159. [59] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, Q. 
Tian, Scalable person re-identification: A benchmark, In IEEE International Conference on Computer Vision (ICCV 2015), pp. 1116-1124.


Author Biographies


Saeed Ur Rehman received his MS degree from Mohammad Ali Jinnah University, Islamabad, Pakistan. He is currently a PhD scholar under the CAS-TWAS Fellowship at the University of Science and Technology of China (USTC), Hefei, Anhui, P.R. China. His research interests include computer vision, deep learning, and machine learning and its applications.


Zonghai Chen was born in Anhui, China, in December 1963. He obtained his Bachelor's degree from the Department of Management and Systems Science of the University of Science and Technology of China (USTC) in 1988. He has been a Professor at the Department of Automation, USTC, since 1998. Prof. Chen is a recipient of special allowances from the State Council of P.R. China and a member of the Robotics Technical Committee of the International Federation of Automatic Control (IFAC). Prof. Chen's main research areas cover the modeling and control of complex systems, control system engineering and intelligent information processing, and energy management technologies for electric vehicles and smart microgrids.

Mudassar Raza is a PhD scholar at the University of Science and Technology of China (USTC) under the CAS-TWAS fellowship. He has more than seven years of experience teaching undergraduate classes at COMSATS Institute of Information Technology, Pakistan. His interests include deep learning, pattern recognition, and parallel and distributed computing.

Peng Wang received his BS degree from the University of Science and Technology of China (USTC) in 2010. He continued as a PhD candidate at USTC from 2010 and got his PhD degree in 2015. He currently works as a postdoctor in the Department of Automation. His current research is focused on uncertain information processing in mobile robot navigation, interval analysis, and deep/reinforcement learning.

Qibin Zhang received his BS degree from the University of Science and Technology of China (USTC) in 2012. He is now a PhD candidate at the Knowledge Representation and Intelligent Information Technology Laboratory in the Department of Automation, USTC. His research interests include mobile robot localization, SLAM and knowledge representation.
