Image retrieval framework based on texton uniform descriptor and modified manifold ranking


Jun Wu(a), Lin Feng(a,b,*), Shenglan Liu(c), Muxin Sun(a)

(a) School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, Liaoning, China 116024
(b) School of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China 116024
(c) School of Control Science and Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China 116024

Abstract

Image representation and ranking are crucial parts of image retrieval. These two steps are constructed independently in most retrieval models, yet the compatibility between descriptors and ranking algorithms plays an important role. Inspired by human visual perception and manifold learning, we propose a novel image retrieval framework in this paper. We first propose an image representation called the texton uniform descriptor, and then illustrate its preservation of the intrinsic manifold structure by visualizing the distribution of image representations on the two-dimensional manifold. This characteristic provides the foundation for subsequent manifold-based ranking. To further improve efficiency in image retrieval, we propose modified manifold ranking (MMR), which selects a small set of images as landmarks to propagate adjacent similarity among images iteratively. Extensive experiments on four public datasets demonstrate that our framework performs better than other state-of-the-art methods in image retrieval.

Keywords: image retrieval, texton uniform descriptor, modified manifold ranking, landmark

* Corresponding author. Email address: [email protected] (Lin Feng)


1. Introduction

In recent years, image retrieval [1] has become one of the most popular topics in computer vision and pattern recognition with the increasing amount of complex image data on the Internet and mobile terminals. In particular, content-based image retrieval systems (CBIRs) play a significant part in image retrieval because image contents can be conveniently defined and described through the connection between human visual perception and pixel structures. As the key steps in these systems, image representation [2, 3] and similarity-based ranking [4] have a great effect on the efficiency and effectiveness of CBIRs.

Many research works on representing images have been introduced in past decades. Low-level feature representations over pixels have shown superior performance in image retrieval. Many global descriptors aim at describing the spatial structure distribution and contrast among pixels within the whole image. Julesz [5] proposed the texton theory, which claims that an image can be seen as an arrangement of regular patterns. Local Binary Patterns (LBP) [6] and the Histogram of Oriented Gradients (HOG) [7] reflect this idea and describe the global distribution of texture and gradient orientation, respectively. Liu et al. introduced the color difference histogram (CDH) [8] and the micro-structure descriptor (MSD) [9] to characterize salient visual features for image retrieval. In addition, patch-based image representations usually describe specific regions where discriminative information (e.g., a line, edge, or object) is salient. Bag-of-words (BoW) models characterize the distribution of visual words in images, and local feature descriptors [10, 11] tend to be utilized to detect and construct these visual words.

High-level semantic perception of visual images is also an important but challenging research issue in image retrieval. The "semantic gap" between high-level human visual perception and the low-level image pixels captured by a computer is difficult to close in real-world CBIR systems. Even so, there are still some related works in pattern recognition and image retrieval.


Deep learning attempts to mimic the human brain and processes information in a deep architecture [12]. Deep convolutional neural networks (CNN) [13] are constructed to learn discriminative image feature representations, which can be expressed by the fully-connected layers in the middle of the CNN architecture. Motivated by the visual cortex, Serre et al. [14] introduced the HMAX model, which mimics the tuning properties of neurons in visual cortical areas through the iteration between simple S units and complex C units. Even though these models have been successfully applied to object recognition and image retrieval, biologically-motivated features still need further research: the time complexity of these descriptors is often very high, and the large number of parameters in these complex models must be selected and determined carefully. Furthermore, the lack of sufficient qualified labeled training images may also limit the performance of these descriptors in real-world applications.

Besides image representation, ranking methods also play a significant part in CBIR. Fu et al. [44] adopted a huge number of online images to further improve image search quality. The most commonly used methods are pair-wise distance measurements based on metric learning [15, 16]. Hashing [20, 21] is also a widely-studied solution for similarity ranking, since hash codes are compact low-dimensional representations with less computational and storage cost, but searching nearest neighbors approximately may lead to a loss of accuracy. In addition, graph-based ranking methods are widely used in image retrieval, e.g., PageRank [17], query-specific rank fusion [18], and Locally Adaptive Retrieval [43]. Manifold ranking [4, 19], introduced on the basis of manifold learning over image representations, obtains good results in ranking similar images. However, traditional manifold ranking has two limitations: first, its computational time complexity is prohibitive on large-scale datasets because the inverse matrix computation is very time-consuming; second, the weight matrix is difficult to update online when out-of-sample images are retrieved.

Image representation and ranking are important but often independent parts of image retrieval. In most cases, an appropriate ranking strategy helps to further improve retrieval performance; for example, the L1 distance has shown better results than the Euclidean distance in many CBIR systems. But how to select or construct the most appropriate ranking method for a given descriptor is still a difficult problem, and few research works discuss the compatibility between image descriptors and ranking strategies. Motivated by the connection between manifold learning and human perception of images [22, 23], we propose a novel image retrieval framework based on manifold perception. First, a new image descriptor called the texton uniform descriptor (TUD) is presented, and its distribution on the manifold is analyzed in comparison with other descriptors. Considering the manifold distribution characteristic of our image representation, we then introduce a fast manifold-based ranking method for image retrieval. The essential idea of this CBIR system is that the compatibility between our descriptor and the ranking method is reflected in the good distribution of the image representation on the manifold. Extensive experiments on image datasets are designed to validate this idea. The main contributions of this paper are summarized as follows:

(1) A novel image descriptor is proposed to describe the visual content of images by considering the uniform distribution of pixels in both local patterns and holistic regions.

(2) By observing the distribution of the image representation on the manifold, the compatibility between our descriptor and manifold ranking is analyzed, and a manifold-based ranking method is thus adopted to improve image retrieval performance.

(3) In order to improve the efficiency and effectiveness of ranking, we present a fast modified manifold ranking (MMR) method which selects a small set of images as landmarks to propagate adjacent similarity among images iteratively.

The remainder of this paper is organized as follows: the motivation and overall framework are presented in Section 2; Section 3 and Section 4 introduce our image descriptor and modified manifold ranking method, respectively; Section 5 shows the experimental results on four image datasets; the conclusion is given in Section 6.

2. The motivation and overall image retrieval framework

Many excellent image representations have been presented and applied to content-based image retrieval [6, 9]. However, these works have some limitations: 1) the discriminative intrinsic structure among image representations is seldom considered; 2) it is difficult to select a compatible ranking algorithm for image retrieval. Some research works learn compact image representations based on the graph structure of the original local features [39, 40], but they differ from the main work in this paper because they attempt to learn the manifold structure of local features. In this paper, the connection between manifold learning and human perception of images is used to analyze intrinsic image representations and manifold-based ranking. The good manifold structure of our image representation meets the basic manifold assumption of the manifold ranking algorithm, which demonstrates the compatibility between image representation and manifold ranking in our image retrieval framework.

Inspired by human perception of images, the texton uniform descriptor (TUD) is proposed to extract intrinsic image features, which will be validated through 2D manifold visualization. We define and detect visual uniform regions, where nearby pixels have similar color or edge orientation properties. These two image properties are chosen because the human visual system is very sensitive to them. Visual uniform regions reflect the approximate continuity of a certain property and thus contain the main content or objects of the image; such regions are more likely to be perceived by the human visual system. To describe these specific regions, we propose the angle difference feature and the uniform texton feature to represent contrast and spatial structure information, respectively.

An original image embedded in a high-dimensional image space is intrinsically low-dimensional [22]. Manifolds are fundamental to perception, and thus an image representation inspired by perception can be validated through its distribution on the manifold. Besides, the manifold characteristic of an image representation reflects the intrinsic perception of image change, and also meets the basic assumption of manifold ranking in CBIR. Manifold ranking differs from traditional pair-wise similarity metrics in that it learns similarity scores through weight propagation among nearest neighbors. However, its high time complexity limits its performance in large-scale image retrieval: the key bottlenecks are the construction of the large-scale weight matrix and the subsequent inverse matrix computation. In this paper, we focus on modifying the inverse matrix computation and propose a fast modified manifold ranking algorithm, since the construction of the weight matrix is performed off-line. The core idea of this algorithm is to compute similarity scores based on a small set of landmarks selected from the dataset, and then update the landmarks by ranking these scores iteratively. In this way, the inverse matrix computation always operates on the small set of selected landmarks.

Figure 1: The overall image retrieval framework in this paper

The overall image retrieval framework of this paper is illustrated in Fig. 1. The main contributions of this framework are a novel image descriptor and a fast modified manifold ranking method, which are introduced in detail in Section 3 and Section 4, respectively. Besides, the manifold distributions of our descriptor and other image representations are validated and analyzed in the experiments. The compatibility between our descriptor and modified manifold ranking is also demonstrated by the image retrieval results.

3. Texton uniform descriptor (TUD)

Image descriptors can be seen as low-dimensional representations of the original images in the feature space that contain discriminative information. In this section, we propose a novel image representation named the texton uniform descriptor (TUD), based on human perception of specific uniform regions. According to Gestalt psychology [45], people visually tend to perceive similar elements as a unity, so regions whose neighboring pixels share similar low-level visual properties are more likely to be grouped together in a visual image. First, visual uniform regions are defined and detected based on the similarity among neighboring pixels. The connected uniform regions contain the main body of the objects in the image and are thus discriminative for representing the global image. Second, in order to describe these discriminative regions, we propose the angle difference feature and the uniform texton feature to represent contrast and spatial structure information respectively, and then combine these two relatively independent but mutually complementary features. The manifold distribution characteristic of our descriptor is validated in the experiments, which demonstrates that a good manifold distribution in the feature space implies compatibility between the feature representation and manifold ranking.

3.1. Visual uniform regions

An image is an organized whole for transmitting visual information, and can be seen as a connection of visual uniform regions; the borderlines of these regions denote the edges of the objects in the image. There are many low-level properties related to visual similarity perception, so there may be many kinds of uniform regions, one per property. In this paper, color and edge orientation are selected to detect the visual uniform regions because the human visual system is very sensitive to them.

Color is the direct expression of a natural image and thus plays an important part among visual attributes. The HSV color space is used to determine the color attribute of images because it is close to the visual perception of human eyes, and the H, S and V components are uniformly quantized into 8, 3 and 3 bins to reduce the computation in feature representation. As for edge orientation, we adopt the method proposed in [24] to obtain the gradient value and orientation directly from the color image, and the edge orientation is then uniformly quantized into 12 bins to simplify the edge similarity. A sketch of this quantization follows.
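As a concrete illustration, a minimal sketch of this quantization is given below. The input ranges (H in [0, 360) degrees, S and V in [0, 1]) and the combined 72-value color code are assumptions of the sketch, not specifications from the paper.

```python
import numpy as np

def quantize_hsv(h, s, v):
    """Uniformly quantize HSV arrays into 8/3/3 bins and combine them into a
    single color code in {0, ..., 71}. Assumes h in [0, 360), s and v in [0, 1]."""
    hq = np.minimum((h / 360.0 * 8).astype(int), 7)   # 8 hue bins
    sq = np.minimum((s * 3).astype(int), 2)           # 3 saturation bins
    vq = np.minimum((v * 3).astype(int), 2)           # 3 value bins
    return hq * 9 + sq * 3 + vq                       # combined color code

def quantize_orientation(theta):
    """Uniformly quantize edge orientation (degrees in [0, 360)) into 12 bins."""
    return np.minimum((theta / 360.0 * 12).astype(int), 11)
```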

Visual uniform regions are defined as neighborhood patterns in which neighboring pixels have similar color or edge orientation. Considering that each pixel has 8 nearest neighbors (boundary pixels are not considered), every pixel in the image is compared with its neighbors to find the consistent pixels in small patterns. Since there are two attributes of interest, color and edge orientation, each individual attribute is utilized to detect its own uniform regions; in other words, color uniform regions and edge uniform regions are detected and described in the same way, respectively.

Taking the detection of the color uniform region as an example, each center pixel is compared with its neighbors and preserved if they have an identical quantized color value; otherwise, the center pixel is removed. Finally, the color-based visual uniform region is composed of the remaining pixels. As shown in Fig. 2, similar pixels are joined into separate regions where pixels change only slightly. Pixels whose attributes change greatly have no similar neighbors and can be seen as isolated points, which may be affected by noise. The edge uniform region is detected in the same way using the edge orientation attribute. Next, we present the feature representation over the color and edge uniform regions, where the discriminative information is retained. A sketch of the detection step follows.
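A minimal sketch of the detection step is given below, operating on a quantized attribute map (color codes or orientation bins). Reading the description above, we assume a center pixel is preserved when at least one of its 8 neighbors carries the identical quantized value.

```python
import numpy as np

def uniform_region_mask(attr):
    """Return a boolean mask of the visual uniform region on the quantized
    attribute map `attr`. A pixel is kept when at least one of its 8 nearest
    neighbours has the identical quantized value; isolated pixels are removed.
    Border pixels are not considered, as in the paper."""
    h, w = attr.shape
    keep = np.zeros((h, w), dtype=bool)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for dy, dx in offsets:
        same = attr[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] == attr[1:-1, 1:-1]
        keep[1:-1, 1:-1] |= same
    return keep
```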


Figure 2: The detection process of color uniform region

3.2. Feature representation

An image representation is a discriminatively compact feature set in which the similarity of images of the same category is retained or even enhanced while the dissimilarity between different categories is enlarged. Contrast and spatial structure are relatively independent and complementary properties: spatial structure analyzes the distribution relationships among neighboring pixels, while contrast characterizes the magnitude of similarity or dissimilarity. These two orthogonal properties in an image representation provide a certain robustness to transformations. In this section, the angle difference feature is presented to measure the contrast based on the angle change among neighboring pixels in the color space, and the uniform texton feature describes the spatial structure using different uniform patterns.

Suppose that an image f(x, y) corresponds to its attribute maps $T_k(x, y)$ $(k = 1, 2)$, where $T_1(x, y)$ denotes the color uniform regions map and $T_2(x, y)$ denotes the edge uniform regions map. The attribute maps reflect the local uniform regions and can be used to measure the uniform patterns, while the contrast information in these patterns is still described in the original HSV color space.

a) Uniform texton feature

The description of local patterns in images not only reflects the basic idea of texton theory, but also embodies the visual uniformity by which humans perceive local contours or lines of objects. Local binary patterns (LBP) have shown discriminative performance in analyzing the texture of gray images: the binary representation of local regular patterns is computationally simple and robust to illumination transformations. Furthermore, visual uniformity mainly concerns the description of similar neighboring pixels and their continual uniformity; conversely, a radical change in the neighborhood reflects the end of this uniform continuation. Based on this analysis, we define the image texton as binary patterns, and focus on describing the continual uniformity of these patterns within the visual uniform regions.

In this paper, local patterns are detected and described in a 3 × 3 block; in other words, only the 8 nearest neighbors around the center pixel are considered when defining the pattern types. The reason why only 8 neighbors are compared with the center pixel is that these neighbors have the most direct connection with the center pixel, and a larger neighborhood brings more computational burden in image representation. For each center pixel $g_c(x, y)$ in the attribute maps, the neighborhood pattern can be represented as:

$$Tex \approx t\left( s(g_c, g_0),\, s(g_c, g_1),\, \cdots,\, s(g_c, g_7) \right) \qquad (1)$$

where $g_i$ $(i = 0, 1, \cdots, 7)$ denotes the nearest neighbors of $g_c(x, y)$. The uniform relationship among pixels is expressed as:

$$s(g_c, g_i) = \begin{cases} 1, & T_k(g_c) = T_k(g_i) \\ 0, & T_k(g_c) \neq T_k(g_i) \end{cases} \qquad (2)$$

Based on this formula, the texton is detected from the basic low-level attributes between the center pixel and its neighbors, and is expressed as a binary string. This binary string contains four different transition types between consecutive bits: 00, 01, 10 and 11, where 11 indicates continual uniformity around the center pixel, 01 or 10 denotes the radical change at the end of a uniform continuation, and 00 may correspond to irregular change among pixels. These parts construct the overall binary string pattern, but they carry different meanings and discriminative information. To describe the binary patterns while distinguishing their different parts, we define the texton value as follows:

$$P(g_c) = \sum_{i=0}^{6} u\big( s(g_c, g_i),\, s(g_c, g_{i+1}) \big) + u\big( s(g_c, g_0),\, s(g_c, g_7) \big) \qquad (3)$$

$$u(x, y) = \begin{cases} 0, & x = y = 0 \\ 1, & x = 0,\; y = 1 \\ 1, & x = 1,\; y = 0 \\ a, & x = y = 1 \end{cases} \qquad (4)$$

where $a \geq 1$ denotes that continual uniformity plays the most important part in the description of the texton around the center pixel ($a = 1.5$ in our experiments, as listed in Table 2). To describe the distribution of these textons in images, the local texton uniform feature is proposed based on the uniform probability:

$$h^{l}_{tex}(w_0) = \frac{\sum P(g_c)\, \chi_{g_c}}{8a \cdot \sum \chi_{g_c}} \qquad (5)$$

$$\chi_{g_c} = \begin{cases} 1, & T_k(g_c) = w_0 \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

where $w_0 \in T_k$. This feature mainly focuses on the local distribution of textons. Thus, the feature fusion strategy proposed in [33] is used to characterize the distribution of textons around the center pixels in both local and global regions, where the global distribution of these center pixels can be expressed by the normalized histogram:

$$h^{g}_{tex}(w_0) = \frac{\sum \chi_{g_c}}{sum} \qquad (7)$$

where $sum$ denotes the number of pixels in the visual uniform region. Utilizing the fusion strategy in [33], the uniform texton feature is expressed as:

$$h_{tex}(w_0) = h^{l}_{tex}(w_0) \cdot \big( h^{g}_{tex}(w_0) + 1 \big) = \frac{\sum P(g_c)\chi_{g_c}}{8a \cdot \sum \chi_{g_c}} \cdot \left( \frac{\sum \chi_{g_c}}{sum} + 1 \right) = \frac{\sum P(g_c)\chi_{g_c}}{8a \cdot sum} + \frac{\sum P(g_c)\chi_{g_c}}{8a \cdot \sum \chi_{g_c}} \qquad (8)$$

The uniform texton feature thus fuses the local texton uniform feature and the global normalized histogram; in other words, it describes the distribution of uniform binary strings in both the local neighborhood and the holistic regions. This fusion is more discriminative than the single local texton uniform feature, and meanwhile the dimensionality of the image feature is not enlarged, so there is no extra computational burden in image retrieval. A sketch of this computation follows.
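The sketch below implements Eqs. (3)-(8) under our reading of the definitions: the eight neighbors are visited in circular order, and only pixels inside the uniform-region mask contribute. The circular neighbor ordering and the guards against empty bins are assumptions of the sketch.

```python
import numpy as np

# circular order of the 8 neighbours around the centre pixel (an assumption)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def texton_value(s_bits, a=1.5):
    """Eqs. (3)-(4): score the circular binary string s_bits (8 bits); each
    1->1 pair contributes a, each 0->1 or 1->0 pair contributes 1."""
    total = 0.0
    for i in range(8):
        x, y = s_bits[i], s_bits[(i + 1) % 8]
        if x == y == 1:
            total += a
        elif x != y:
            total += 1.0
        # x == y == 0 contributes nothing
    return total

def uniform_texton_feature(attr, keep, n_bins, a=1.5):
    """Eqs. (5)-(8): fuse the local texton-uniform probability with the
    global normalized histogram over the uniform-region mask `keep`."""
    h, w = attr.shape
    p_sum = np.zeros(n_bins)     # sum of P(g_c) per attribute value w0
    chi_sum = np.zeros(n_bins)   # number of uniform-region pixels per w0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not keep[y, x]:
                continue
            s_bits = [int(attr[y + dy, x + dx] == attr[y, x])
                      for dy, dx in OFFSETS]
            w0 = attr[y, x]
            p_sum[w0] += texton_value(s_bits, a)
            chi_sum[w0] += 1
    total = max(chi_sum.sum(), 1)                        # `sum` in Eq. (7)
    h_local = p_sum / (8 * a * np.maximum(chi_sum, 1))   # Eq. (5)
    h_global = chi_sum / total                           # Eq. (7)
    return h_local * (h_global + 1)                      # Eq. (8)
```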

b) Angle difference feature

The uniform texton feature characterizes the spatial structure of the uniform texton patterns. Besides that, the visual perceptual difference among pixels also stimulates the human visual system when discerning natural images. The Euclidean distance in the L∗a∗b∗ color space has been utilized to measure the perceptually uniform difference between colors and edge orientations [9]. However, the Euclidean distance may overemphasize the description of color intensity differences, which makes the measurement sensitive to illumination transformations in images; for example, color coordinates in the same direction tend to have a less discriminative difference than those in different directions of the color space. So in this paper, we propose an angle-based measurement to evaluate the visual perceptual difference.

The angle-based difference among pixels is evaluated in a Cartesian coordinate system. The HSV color space is defined in cylindrical coordinates, so it is first transformed into Cartesian coordinates $H'S'V'$ with $H' = S \cdot \cos(H)$, $S' = S \cdot \sin(H)$ and $V' = V$. The difference is again analyzed for each center pixel in a 3 × 3 block. The difference between the center pixel and its neighbors is expressed as:

$$\cos(g_c, g_i) = \frac{\langle f(g_c), f(g_i) \rangle}{\| f(g_c) \| \cdot \| f(g_i) \|} \qquad (9)$$

$$\beta(g_c, g_i) = \arccos\left( \frac{\langle f(g_c), f(g_i) \rangle}{\| f(g_c) \| \cdot \| f(g_i) \|} \right) \qquad (10)$$

where $f(g_c) = [H'_c, S'_c, V'_c]$ and $f(g_i) = [H'_i, S'_i, V'_i]$ denote the coordinates of $g_c$ and $g_i$, respectively, and $\beta(g_c, g_i) \in [0, \pi]$ denotes the angle difference between the pixels in the coordinate system. The angle difference distribution can then be described, in both local patterns and holistic regions, for this contrast feature. The local angle difference can be expressed as:

$$h^{l}_{dif}(w_0) = \frac{\sum \sum_i \beta(g_c, g_i) \cdot s(g_c, g_i) \cdot \chi_{g_c}}{\sum \sum_i \beta(g_c, g_i) \cdot \chi_{g_c}} \qquad (11)$$

The term "local" means that this feature captures the angle difference distribution within local patterns and lacks a global description of these differences. The complementary feature, the global angle difference histogram, is therefore introduced:

$$h^{g}_{dif}(w_0) = \frac{\sum \sum_i \beta(g_c, g_i) \cdot \chi_{g_c}}{sum_{dif}} \qquad (12)$$

where $sum_{dif}$ denotes the total angle difference over the visual uniform regions. Finally, these two features are fused based on the strategy in [33], and the angle difference feature can be represented as:

$$h_{dif}(w_0) = h^{l}_{dif}(w_0) \cdot \big( h^{g}_{dif}(w_0) + 1 \big) \qquad (13)$$

The angle-based difference describes the contrast information in the uniform regions and is relatively independent of the uniform texton feature. A sketch of Eqs. (9)-(10) follows.
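A minimal sketch of Eqs. (9)-(10), assuming the hue H is given in radians; degenerate zero vectors are mapped to a zero angle, which is a choice of the sketch.

```python
import numpy as np

def angle_difference(hsv_c, hsv_i):
    """Eqs. (9)-(10): angle between two pixels after mapping cylindrical HSV
    to Cartesian coordinates H' = S cos H, S' = S sin H, V' = V."""
    def to_cartesian(hsv):
        h, s, v = hsv
        return np.array([s * np.cos(h), s * np.sin(h), v])
    fc, fi = to_cartesian(hsv_c), to_cartesian(hsv_i)
    denom = np.linalg.norm(fc) * np.linalg.norm(fi)
    if denom == 0.0:
        return 0.0                        # degenerate case: no defined angle
    cos_b = np.clip(np.dot(fc, fi) / denom, -1.0, 1.0)
    return float(np.arccos(cos_b))        # beta in [0, pi]
```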

c) Image representation

Color and edge orientation have each been used to detect visual uniform regions. The two low-level attribute maps have similar forms and can thus be used to extract image features in the same way as discussed above; in other words, $h_{tex}$ and $h_{dif}$ are extracted from the two maps according to their different attribute values. Since $h_{tex}$ and $h_{dif}$ represent orthogonal but complementary properties of visual uniform regions, the image representation for each attribute map is the unweighted concatenation of $h_{tex}$ and $h_{dif}$:

$$h^{(k)} = [\, h_{tex} \;\; h_{dif} \,] \qquad (14)$$

where $h^{(k)}$ $(k = 1, 2)$ denotes the image representation corresponding to attribute map $T_k(x, y)$. Though both color and edge orientation are important attributes for image representation, they have different discriminative power for different images, which can be expressed by added weights. Therefore, the proposed texton uniform descriptor (TUD) is expressed as:

$$h = [\, \eta \cdot h^{(1)} \;\; (1 - \eta) \cdot h^{(2)} \,] \qquad (15)$$

As a global visual feature, TUD is designed to characterize the contrast and spatial structure among pixels in uniform regions; a sketch of the final assembly in Eqs. (14)-(15) is given below. The manifold distribution of the TUD representation in the feature space is illustrated and analyzed in the experiments by comparison with other descriptors, and this good manifold structure provides superior compatibility with manifold-based ranking methods. In the next section, traditional manifold ranking is analyzed and then modified to improve its effectiveness for image retrieval.
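A minimal sketch of the final assembly of Eqs. (14)-(15), assuming the four sub-histograms have already been computed for the color map (k = 1) and the edge-orientation map (k = 2); η = 0.1 follows the Corel settings in Table 2.

```python
import numpy as np

def tud_descriptor(h1_tex, h1_dif, h2_tex, h2_dif, eta=0.1):
    """Concatenate the texton and angle-difference features per attribute map
    (Eq. (14)) and weight the two maps with eta and 1 - eta (Eq. (15))."""
    h1 = np.concatenate([h1_tex, h1_dif])   # color uniform regions
    h2 = np.concatenate([h2_tex, h2_dif])   # edge uniform regions
    return np.concatenate([eta * h1, (1 - eta) * h2])
```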

4. Modified manifold ranking

Ranking all the images in the database according to their similarity or distance to the query is the last and a significant step in CBIR. Similarity measurements based on pair-wise metric learning have been widely used in image retrieval. This framework, however, usually evaluates only the similarity between the query and the database images, and seldom considers the similarity relationships among the database images themselves. Manifold-based ranking [19] was proposed to learn the nearest-neighbor relationships among the images through a manifold learning algorithm.

4.1. Manifold ranking

The essential aim of manifold ranking [19] is to learn the underlying manifold by analyzing the local neighbor relationships in the Euclidean space and then propagating similarity weights through these relationships. This algorithm can fully utilize the relationships among all the database images.

Given image representations $H = [h_1, h_2, \cdots, h_N] \in \mathbb{R}^{D \times N}$ containing $N$ distinct images, where the dimension of each representation is $D$, the adjacency relationships among these images can be encoded in the weight graph $W$. The element $W_{ij}$ denotes the similarity between images $h_i$ and $h_j$, defined by the heat kernel:

$$W_{ij} = \exp\left[ -d^2(h_i, h_j) / 2\sigma^2 \right] \qquad (16)$$

if $h_j$ belongs to $h_i$'s neighbors, and $W_{ij} = 0$ otherwise, where $d(h_i, h_j)$ denotes a distance metric between $h_i$ and $h_j$. In order to ensure the convergence of the algorithm, the weight graph $W$ is symmetrically normalized as $S = D^{-1/2} W D^{-1/2}$, where $D$ is the diagonal matrix whose $(i, i)$-element equals the sum of the $i$th row of $W$. For every query image $h_q$, the initial score $y_0 = [y_1, y_2, \cdots, y_n]$ is given, where $y_i = 1$ if $h_i$ is the query and $y_i = 0$ otherwise. The cost function of manifold ranking with regard to $f$ is defined as:

$$O(f) = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \left\| \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right\|^2 + \mu \left\| f - y_0 \right\|^2 \qquad (17)$$

where $f$ denotes the final ranking score and $\mu > 0$ is the regularization parameter. Minimizing this function gives the optimal solution:

$$f^* = (I_n - \alpha S)^{-1} y_0 \qquad (18)$$

where $\alpha = \frac{1}{1+\mu} \in [0, 1)$. Manifold ranking also has an iterative solution. Let $f(t)$ denote the score after $t$ iterations; the score at iteration $t + 1$ is:

$$f(t+1) = \alpha S f(t) + (1 - \alpha) y_0 \qquad (19)$$

In this algorithm, the positive score of the query is iteratively propagated to nearby images via the adjacency matrix until convergence; the larger the scores, the higher the relevance to the query. But although manifold ranking can fully utilize the relationships among the images, its high time complexity limits its applicability to large-scale datasets. A sketch of Eqs. (16)-(19) follows.
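A minimal sketch of Eqs. (16)-(19) with the closed-form solution; the L1 metric and the k = 50, α = 0.9 values follow the experimental settings, while σ = 1 is an assumption of the sketch.

```python
import numpy as np

def manifold_ranking(H, query_idx, k=50, sigma=1.0, alpha=0.9):
    """Classical manifold ranking. H: D x N matrix of image representations.
    Builds the heat-kernel kNN graph (Eq. (16)), normalizes it as
    S = D^{-1/2} W D^{-1/2}, and solves f* = (I - alpha*S)^{-1} y0 (Eq. (18))."""
    n = H.shape[1]
    d = np.abs(H[:, :, None] - H[:, None, :]).sum(axis=0)  # pairwise L1 distances
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d[i])[1:k + 1]                     # k nearest neighbours
        W[i, nn] = np.exp(-d[i, nn] ** 2 / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                                 # keep the graph symmetric
    Dinv = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = Dinv[:, None] * W * Dinv[None, :]
    y0 = np.zeros(n)
    y0[query_idx] = 1.0
    f = np.linalg.solve(np.eye(n) - alpha * S, y0)         # Eq. (18)
    return np.argsort(-f)                                  # most relevant first
```

The O(N^3) solve on the full graph is exactly the bottleneck addressed in the next subsection.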


4.2. Modified manifold ranking

Original manifold ranking has a prohibitive computational time complexity because the construction of the adjacency matrix and the inverse matrix computation are very time-consuming. To solve this problem, we propose a fast modified manifold ranking algorithm for image retrieval. Our algorithm aims at reducing the time complexity of the second step of manifold ranking, because the construction of the adjacency matrix $W$ is offline and has relatively little effect on the online similarity measurement. More importantly, this algorithm can further improve the image retrieval performance greatly compared with traditional manifold ranking [19].

In the process of propagating similarity weights over the adjacency matrix, we choose a subset of images $\tilde{H} = [\tilde{h}_1, \tilde{h}_2, \cdots, \tilde{h}_P] \in \mathbb{R}^{D \times P}$ from the images pre-ranked by L1 distance as the initial landmarks. We choose these images because they are the most similar samples to the query in the pre-ranking, which indicates that they have a high probability of being the final retrieval results. The score of the landmarks corresponds to $\tilde{y}_0 = [\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_P]$, but this score is a meaningless zero vector if the query $h_q$ does not belong to the landmarks. So we add another fixed $(P+1)$th landmark $h_q$, namely $\tilde{H} = [\tilde{h}_1, \tilde{h}_2, \cdots, \tilde{h}_P, h_q] \in \mathbb{R}^{D \times (P+1)}$, which corresponds to the initial score $\tilde{y}_0 = [\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_P, 1]$. In this way, the positive score of the query can be iteratively propagated to nearby landmark images via the adjacency matrix, the same as in traditional manifold ranking.

After selecting the landmarks from the database, the adjacency matrix $\tilde{S}$ can also be selected from the original adjacency graph $S$ instead of computing a new subgraph every iteration. The ranking score of these landmarks with regard to the query can then be determined with the original manifold ranking method:

$$\tilde{f}^* = (I - \alpha \tilde{S})^{-1} \tilde{y}_0 \qquad (20)$$

The time complexity of this step is $O(P^3)$; when the number of selected landmarks is small, it greatly speeds up the score propagation. The landmark scores are then embedded into the original data score $y_t$, except for the last landmark $h_q$. Utilizing the global adjacency matrix over the whole database, the score $y$ is propagated to nearby images by iterating only once:

$$y_{t+1} = \alpha S y_t + (1 - \alpha) y_0 \qquad (21)$$

After that, one global propagation is finished and the score $y$ is updated. This score depends on the initial landmarks, so the landmarks are next updated by choosing the most similar images as the new landmarks according to their scores. With each update, more similar images are selected as propagation sources, so higher-scored images have higher relevance to the query, until the landmarks stop changing appreciably or a fixed number of iterations is reached.

The simplified process of modified manifold ranking (MMR) is summarized in Algorithm 4.1. The time complexity of this algorithm in weight propagation is $O(t \cdot (N \log N + P^3))$, where $t$ is the number of iterations. It can therefore speed up the ranking significantly when the database is very large, since the complexity is $O(t \cdot (N \log N + P^3))$ instead of the $O(N^3)$ of traditional manifold ranking. More importantly, this algorithm achieves competitive ranking performance in image retrieval compared with traditional manifold ranking or common distance metrics, as demonstrated in the experiments.

Algorithm 4.1: Modified manifold ranking (MMR)
Input: image representation database H = [h1, h2, ..., hn], query image hq
Output: the top-r relevant images for the query
1. Initialize the score y0 for the query
2. Construct the adjacency matrix W based on the L1 distance among points
3. Choose the top-P points of the L1 pre-ranking as the initial landmarks, with initial score ỹ0
4. Add the query to the landmarks with initial score 1
5. Repeat
     Find the ranking score of the landmarks using traditional manifold ranking
     Embed the landmark scores into the overall score yt
     Propagate yt only once to update the overall score yt+1
     Sort the score yt+1 and choose the top-P images as the new landmarks
   Until the landmarks stop changing or a certain number of iterations T is reached
6. Sort the final overall score and output the top-r relevant samples
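A minimal sketch of Algorithm 4.1 is given below. It assumes the query is one of the database images, so that its row of the precomputed normalized adjacency S already exists; handling a true out-of-sample query would require extending S with the query's kernel row.

```python
import numpy as np

def mmr(S, d1, q, P=50, alpha=0.9, T=10, r=20):
    """Modified manifold ranking. S: precomputed normalized adjacency over the
    N database images (built offline); d1: L1 distances from query q to all
    images. Landmark scores are solved on a small (P+1)-sized system
    (Eq. (20)), embedded into the global score, and propagated once (Eq. (21))."""
    n = S.shape[0]
    y0 = np.zeros(n); y0[q] = 1.0
    y = y0.copy()
    order = np.argsort(d1)
    landmarks = order[order != q][:P]          # initial landmarks: L1 pre-ranking
    for _ in range(T):
        idx = np.append(landmarks, q)          # query as the fixed (P+1)th landmark
        S_sub = S[np.ix_(idx, idx)]
        y0_sub = np.zeros(P + 1); y0_sub[-1] = 1.0
        f_sub = np.linalg.solve(np.eye(P + 1) - alpha * S_sub, y0_sub)  # Eq. (20)
        y[landmarks] = f_sub[:-1]              # embed landmark scores
        y = alpha * S.dot(y) + (1 - alpha) * y0            # Eq. (21), one pass
        order = np.argsort(-y)
        new_landmarks = order[order != q][:P]  # top-P scored images
        if np.array_equal(np.sort(new_landmarks), np.sort(landmarks)):
            break                              # landmarks stopped changing
        landmarks = new_landmarks
    return np.argsort(-y)[:r]                  # top-r relevant images
```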

5. Experiments

In this section, we validate the effectiveness of the texton uniform descriptor (TUD) and modified manifold ranking (MMR) on several image datasets. Other image descriptors are compared with our proposed descriptor, and the traditional L1 distance and original manifold ranking are also considered to validate our method. Besides, we quote some image retrieval results from recently published papers. The detailed experimental results and discussion are presented in the rest of this section.

5.1. Datasets

We evaluate the proposed method on four public datasets: Corel-1K, Corel-10K, Coil-100 and UKbench. The first two datasets involve category-level relevant images, while Coil-100 and UKbench involve object-level relevant images, which are transformed versions of the query. We utilize different evaluation indicators to test the performance of the proposed method on these datasets: average precision and recall are used as the evaluation measurements on Corel-1K, Corel-10K and Coil-100, and the N-S score (maximum 4) is used to measure retrieval performance on UKbench. A sketch of both measurements follows.
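The two evaluation measurements can be sketched as follows; the relevant-set sizes (e.g. 100 per category on Corel-1K, 4 views per object on UKbench) follow the dataset descriptions above.

```python
def precision_recall_at_r(ranked_labels, query_label, r, n_relevant):
    """Precision and recall for the top-r returned images; n_relevant is the
    number of relevant images in the database (e.g. 100 on Corel-1K)."""
    hits = sum(1 for lab in ranked_labels[:r] if lab == query_label)
    return hits / r, hits / n_relevant

def ns_score(ranked_ids, relevant_ids):
    """N-S score on UKbench: the number of relevant images among the top 4
    results; each object has exactly 4 views, so the maximum is 4."""
    return sum(1 for i in ranked_ids[:4] if i in relevant_ids)
```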

5.2. Manifold distribution evaluation

The fundamental assumption of manifold-based ranking methods is that the data are distributed well on the underlying manifold. For color images, the manifold distribution of the image representations in the feature space connects the image descriptors with the subsequent manifold-based ranking. Image representation should not only extract discriminative information but also transform high-dimensional images into a low-dimensional feature space while retaining the manifold characteristics. Moreover, visual perception is executed and stored in the manner of a manifold [22]. Considering images that vary in rotation, as shown in Fig. 3, the intrinsic transformation is only one-dimensional: it represents a continuous curve in the image space. Therefore, the intrinsic dimension of images for visual perception tends to be relatively low even though the original space may be high-dimensional. If an image representation can capture the intrinsic transformations in natural images, it is possible to represent the discriminative information in the images with a much lower-dimensional feature, and meanwhile to provide the foundation for manifold-based ranking in the subsequent retrieval step.

Figure 3: A set of rotated images in Coil-100 dataset

To evaluate and compare the manifold distributions of TUD and other descriptors, we select a set of images (shown in Fig. 3) from the Coil-100 dataset in which identical objects are rotated with respect to a fixed axis, so the continuous curve is one of the most important intrinsic transformations in these images. As one of the classical nonlinear manifold learning algorithms, Locally Linear Embedding (LLE) is utilized to visualize the two-dimensional manifold structure of the image representations. The micro-structure descriptor (MSD), color difference histogram (CDH) and local binary patterns (LBP) are chosen for comparison: MSD describes the distribution of micro-structures in underlying color regions; CDH represents the perceptually uniform color difference between colors and edge orientations based on Euclidean distance; and LBP, which represents texture structure with binary strings, has achieved superior performance in many vision tasks. The manifold visualizations of these descriptors are shown in Fig. 4. CDH and TUD are able to reveal the curve change in the example images, with TUD giving a slightly better curve distribution than CDH, while LBP and MSD fail to capture the curve change in the two-dimensional visualization. As discussed above, the intrinsic change in this example lies in the continuous curve, so TUD reflects the intrinsic manifold structure and is compatible with manifold-based ranking, compared with other global image representations. A sketch of the visualization step follows.

(b) MSD

(c) CDH

(d) TUD

Figure 4: The manifold visualized distribution of different methods

5.3. Experimental results To validate the effectiveness of our methods in category-level relevant images, 455

we compared the proposed TUD and modified manifold ranking with other common retrieval models in Corel-1K and Corel-10K datasets. And Coil-100 and UKbench datasets are used to test our methods for object-level relevant 20

images. In this paper, the number of landmarks in modified manifold ranking is chosen as the number of returned images in image retrieval, and the maximum 460

iterations are set to 10 in the experiments. The parameters η and α are discussed in 1 in Corel-1K datasets. It can be seen that η = 0.1 and α = 0.9 obtain the best retrieval results in this dataset. And similarly, these parameters are determined in other datasets. Other parameter selections in the datasets are listed in Table 2 in detail. It can be seen that the related parameters in category-level and object-

465

level images are significantly different. Category-level images are distinguished by scenes while object-level images are tested for the robustness of changing objects. With these settings, the performance of our methods and others are evaluated in the four image datasets. Table 1: The average accuracy with diferent parameters η and α in the Corel-1K dataset (%)

α

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0.05

77.72

79.04

79.88

80.49

80.87

81.20

81.55

81.74

81.85

0.10

79.34

80.49

81.21

81.74

82.06

82.50

82.74

82.89

83.24

0.15

79.17

80.09

80.64

80.94

81.32

81.72

81.91

82.06

81.89

0.20

78.51

79.03

79.40

79.70

79.92

80.07

80.08

79.99

79.97

0.25

77.40

77.90

78.04

78.24

78.53

78.58

78.29

77.51

76.36

0.30

76.45

76.70

76.78

76.98

77.14

76.86

75.42

73.11

68.94

0.35

75.56

75.59

75.76

75.81

75.85

74.21

70.60

64.27

58.33

0.40

74.75

74.87

74.91

74.88

74.73

70.26

61.98

52.86

43.86

0.45

74.09

74.06

73.98

73.73

73.57

64.01

52.02

39.47

28.69

0.50

73.53

73.37

73.11

72.75

72.56

57.18

38.94

25.24

18.81

η

In the Corel-1K dataset, every image is considered as a query with one hundred relevant images. Three common global features are compared with TUD: MSD, CDH and LBP. Besides, the common L1 distance measurement is introduced to test the validity of the modified ranking method.

Table 2: The parameter settings in the experiments

Parameters   a     η     K     α
Corel-1K     1.5   0.1   50    0.9
Corel-10K    1.5   0.1   50    0.9
Coil-100     1.5   0.5   50    0.5
UKbench      1.5   0.5   8     0.5

The L1 distance is chosen because distance measurements over image representations are still widely used in image retrieval, and the L1 distance has great computational efficiency and effectiveness in most cases. The original manifold ranking (MR) is also included for comparison.

The retrieval results on the Corel-1K dataset are listed in Table 3 and Fig. 5, where the average precision and recall for the top-r returned images are used to evaluate retrieval. As shown in Fig. 5, compared with MSD, CDH and LBP, whose precisions are 69.07%, 63.37% and 61.35% respectively with 20 images returned, TUD performs better (77.73%) when using the traditional L1 distance to measure the similarity among images. Manifold ranking takes the similarity relationships among nearest neighbors into consideration and utilizes the adjacency graph to propagate similarity scores. TUD, MSD and CDH with manifold ranking obtain 1.56%, 1.83% and 0.66% higher precision than with the L1 distance, respectively; CDH is not fully compatible with manifold ranking because it is not distributed well on the manifold. Furthermore, when modified manifold ranking (MMR) is associated with TUD, MSD and CDH, the retrieval precision and recall improve further: the precision of TUD with MMR reaches 83.24%, which is still higher than MSD and LBP. These results demonstrate that TUD and MMR have superior performance for image retrieval on the Corel-1K dataset. MSD focuses on the local spatial distribution of certain patterns and lacks the contrast information among pixels; CDH, on the contrary, uses the Euclidean distance to measure the perceptually uniform color difference, which emphasizes only the contrast; and LBP is constructed to describe texture structure without considering basic color information, which limits its performance in natural image retrieval.

Figure 5: The comparison of different descriptors and ranking methods on the Corel-1K dataset (precision vs. recall)

The results of recently published methods for each category of the Corel-1K dataset are listed in Table 3 with 20 images returned. The results of TUD MMR in every category are competitive compared with recent retrieval models; in the African, building, bus, dinosaur and food categories, our method has the best results. Due to complex backgrounds, it is difficult to clearly specify the main objects in these images. As a global image representation, TUD aims to mine the holistically discriminative information shared among relevant images, whereas particular objects are only parts of the image content, which is essentially different from local image features. The average precision of TUD, 77.73%, is higher than that of most models when the commonly used L1 distance is considered. With modified manifold ranking, the average precision further improves to 83.24%, which is much higher than the others, as shown in Table 3.


Table 3: The comparison of different image retrieval methods on the Corel-1K dataset (%)

Method              African  Beach  Building  Bus    Dinosaur  Elephant  Flower  Horse   Mountains  Food   Average
Yu et al. [25]      84.90    35.60  61.60     81.80  100.00    59.10     93.10   92.80   40.40      68.20  71.70
Lin et al. [26]     68.30    54.00  56.20     88.80  99.30     65.80     89.10   80.30   52.20      73.30  72.70
Irtaza et al. [27]  65.00    60.00  62.00     85.00  93.00     65.00     94.00   77.00   73.00      81.00  75.50
ElAlami [28]        72.60    59.30  58.70     89.10  99.30     70.20     92.80   85.60   56.20      77.20  76.10
Rao et al. [29]     75.15    57.65  74.70     94.30  98.95     56.55     95.45   86.65   45.90      82.00  76.73
Guo et al. [30]     84.70    45.40  67.80     85.30  99.30     71.10     93.30   95.80   49.80      80.80  77.33
Walia et al. [31]   51.00    90.00  58.00     78.00  100.00    84.00     100.00  100.00  84.00      38.00  78.30
GMM [32]            72.50    65.20  70.60     89.20  100.00    70.50     94.80   91.80   72.25      78.80  80.57
denseSIFT MR        58.55    54.45  55.00     58.80  93.85     95.90     70.40   76.65   81.65      39.65  68.49
TUD L1              76.45    40.35  78.20     90.60  99.10     70.50     92.60   90.60   51.15      87.75  77.73
TUD MR              83.45    35.35  78.80     94.60  99.90     61.90     98.85   97.30   49.90      92.80  79.29
TUD MMR             84.10    53.85  80.65     93.55  99.05     72.05     96.60   94.25   65.65      92.60  83.24

Figure 6: The comparison of different descriptors and ranking methods on the Corel-10K dataset (precision vs. recall)


The Corel-10K dataset is the extended version of Corel-1K and contains more natural scenes and categories, such as various animals, buildings, plants, and people. Fig. 6 compares the retrieval results of the different descriptors and ranking methods on this dataset. TUD obtains higher average precision and recall than the other three common descriptors under all the ranking methods, demonstrating that TUD is more robust and discriminative in retrieving various category-level relevant images. Manifold ranking improves the retrieval results of TUD and MSD, but this ranking method is not appropriate for LBP, because a worse distribution on the manifold may aggravate the propagation of irrelevant adjacency weights; accordingly, with modified manifold ranking the average precision of LBP decreases further. TUD and MSD, however, are compatible with modified manifold ranking and thus obtain better results (61.97% and 53.63% with 10 images returned). In most cases shown in Fig. 6, the superior retrieval model tends to preserve higher precision as the number of returned images grows. Besides, some recently published results on the Corel-10K dataset with 12 images returned are listed in Table 4; the retrieval results of our methods remain competitive on this dataset.

Different from the Corel-1K and Corel-10K datasets, Coil-100 contains near-duplicate objects that are rotated with respect to a fixed axis, so the intrinsic transformation among the images is apparent and can be expressed by low-dimensional representations. As discussed above, our descriptor can reflect the rotation change among similar images. Table 5 lists the retrieval results of the different methods. TUD with MMR reaches the best precision (98.05%) and recall (27.23%) when returning 20 images, which are significantly higher than the other methods in Table 5.

Table 4: The comparison of different image retrieval methods on the Corel-10K dataset

Methods         Precision (%)   Recall (%)
Ri-HOG [34]     52.13           6.25
SSH [35]        54.88           6.58
GMM [32]        47.25           5.67
CDFH [41]       45.07           5.41
denseSIFT MR    41.06           4.96
TUD L1          54.00           6.48
TUD MR          56.10           6.73
TUD MMR         58.99           9.94

In some cases the global image representation may differ among images of the same category, e.g., objects and their opposite views after a 180-degree rotation; modified manifold ranking still has superior performance in such cases, because the similarity weights propagate within the neighborhood based on the adjacency matrix. With iterative propagation, the query can find the similar images even when they are greatly transformed.

To show the effectiveness of our descriptor and ranking method for near-duplicate images or objects, we also conduct extensive experiments on the widely used UKbench dataset. Due to the severe illumination and pose variations in this dataset, global features may have difficulty capturing these variations. The N-S scores of our methods and some state-of-the-art retrieval models are listed in Table 6. The N-S score of TUD reaches 3.33, which is higher than MSD (N-S = 3.23) with the L1 distance. Manifold ranking improves MSD and TUD to 3.24 (+0.01) and 3.47 (+0.14), respectively; the reason why manifold ranking does not improve them considerably is that each category has only 4 images, so the adjacency weight matrix provides limited relationship measurement among neighbors. When TUD and MSD are associated with the modified manifold ranking method, the results are 3.43 and 3.30 respectively, which are still comparable at much lower complexity. Some other methods are introduced to validate the effectiveness of TUD MMR, as listed in Table 6.

Table 5: The comparison of different image retrieval methods on the Coil-100 dataset

Methods     Precision (%)   Recall (%)
LBP L1      74.30           20.64
CDH L1      92.48           25.69
MSD L1      96.25           26.74
TUD L1      97.68           27.13
LBP MR      79.20           22.00
MSD MR      97.78           27.16
TUD MR      98.92           27.48
LBP MMR     76.33           21.20
MSD MMR     97.56           27.10
TUD MMR     98.65           27.40

The BoW model with local SIFT features reaches only N-S = 2.88, because SIFT considers only the gradient information of the gray image and disregards the color attribute of most natural images. The N-S scores in [30, 36-38] are also slightly lower than that of our TUD MMR model.

6. Conclusion

Image representation and ranking are essential parts of most content-based image retrieval systems, but the two steps are usually independent of each other. Inspired by the connection between the human vision system and manifold perception of visual images, we propose a novel image representation named the texton uniform descriptor (TUD), and then analyze its distribution structure on the manifold. The manifold learning analysis proves that the intrinsic structure is preserved by our descriptor when representing image contents. On this basis, a manifold-based ranking method is adopted to propagate similarity with the query to the neighborhood. This method is compatible with our descriptor because the intrinsic neighborhood structure is preserved, which facilitates propagating the adjacency relationships among the images. Moreover, we propose fast modified manifold ranking (MMR) to further improve the efficiency and effectiveness of image retrieval.

Table 6: The comparison of different image retrieval methods on the UKbench dataset

Models            N-S score
Lin et al. [36]   3.29
Liu et al. [37]   3.43
Guo et al. [30]   3.42
BoW-SIFT [38]     2.88
EI-MST [42]       3.36
MSD L1            3.23
MSD MR            3.24
MSD MMR           3.35
TUD L1            3.33
TUD MR            3.47
TUD MMR           3.43

This algorithm selects a small set of images as landmarks for propagating the adjacency relationships and updates these landmarks by choosing similar images; it thus avoids the expensive computation of propagating the adjacency relationships over all the images at every step, as in traditional manifold ranking. Extensive experiments on four public datasets demonstrate that our descriptor and modified manifold ranking have superior performance in both category-level and object-level image retrieval over common descriptors and traditional manifold ranking. Our methods are also better than state-of-the-art methods, especially on category-level relevant datasets.

Acknowledgments

This work was supported by the National Natural Science Foundation of P.R. China (61370200, 61210009) and the China Postdoctoral Science Foundation (ZX20150629).


References

[1] Jiang S, Wu Y, Fu Y. Deep bi-directional cross-triplet embedding for cross-domain clothing retrieval. In: Proceedings of the 2016 ACM on Multimedia Conference. ACM, 2016: 52-56.

[2] Xia Y, Wan S, Yue L. Local spatial binary pattern: A new feature descriptor for content-based image retrieval. In: Fifth International Conference on Graphic and Image Processing. International Society for Optics and Photonics, 2014: 90691K.

[3] Trzcinski T, Christoudias M, Fua P, et al. Boosting binary keypoint descriptors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013: 2874-2881.

[4] Xu B, Bu J, Chen C, et al. Efficient manifold ranking for image retrieval. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2011: 525-534.

[5] Julesz B. Textons, the elements of texture perception, and their interactions. Nature, 1981, 290(5802): 91-97.

[6] Ojala T, Pietikäinen M, Mäenpää T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987.

[7] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005). IEEE, 2005, 1: 886-893.

[8] Liu G H, Yang J Y. Content-based image retrieval using color difference histogram. Pattern Recognition, 2013, 46(1): 188-198.

[9] Liu G H, Li Z Y, Zhang L, et al. Image retrieval based on micro-structure descriptor. Pattern Recognition, 2011, 44(9): 2123-2133.

[10] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.

[11] Bay H, Ess A, Tuytelaars T, et al. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 2008, 110(3): 346-359.

[12] Wan J, Wang D, Hoi S C H, et al. Deep learning for content-based image retrieval: A comprehensive study. In: Proceedings of the ACM International Conference on Multimedia. ACM, 2014: 157-166.

[13] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. 2012: 1097-1105.

[14] Serre T, Wolf L, Bileschi S, et al. Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(3): 411-426.

[15] Androutsos D, Plataniotis K N, Venetsanopoulos A N. Distance measures for color image retrieval. In: Proceedings of the 1998 International Conference on Image Processing (ICIP 98). IEEE, 1998, 2: 770-774.

[16] Kokare M, Chatterji B N, Biswas P K. Comparison of similarity metrics for texture image retrieval. In: TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region. IEEE, 2003, 2: 571-575.

[17] Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the Seventh World Wide Web Conference. 1998.

[18] Zhang S, Yang M, Cour T, et al. Query specific fusion for image retrieval. In: Computer Vision - ECCV 2012. Springer Berlin Heidelberg, 2012: 660-673.

[19] He J, Li M, Zhang H J, et al. Manifold-ranking based image retrieval. In: Proceedings of the 12th Annual ACM International Conference on Multimedia. ACM, 2004: 9-16.

[20] Gong Y, Lazebnik S, Gordo A, et al. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2916-2929.

[21] Weiss Y, Torralba A, Fergus R. Spectral hashing. In: Advances in Neural Information Processing Systems. 2009: 1753-1760.

[22] Seung H S, Lee D D. The manifold ways of perception. Science, 2000, 290(5500): 2268-2269.

[23] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507.

[24] Di Zenzo S. A note on the gradient of a multi-image. Computer Vision, Graphics, and Image Processing, 1986, 33(1): 116-125.

[25] Yu F X, Luo H, Lu Z M. Colour image retrieval using pattern co-occurrence matrices based on BTC and VQ. Electronics Letters, 2011, 47(2): 100-101.

[26] Lin C H, Chen R T, Chan Y K. A smart content-based image retrieval system based on color and texture feature. Image and Vision Computing, 2009, 27(6): 658-665.

[27] Irtaza A, Jaffar M A, Aleisa E, et al. Embedding neural networks for semantic association in content based image retrieval. Multimedia Tools and Applications, 2014, 72(2): 1911-1931.

[28] ElAlami M E. A new matching strategy for content based image retrieval system. Applied Soft Computing, 2014, 14: 407-418.

[29] Rao L K, Rao D V. Local quantized extrema patterns for content-based natural and texture image retrieval. Human-centric Computing and Information Sciences, 2015, 5(1): 1-24.

[30] Guo J M, Prasetyo H, Su H S. Image indexing using the color and bit pattern feature fusion. Journal of Visual Communication and Image Representation, 2013, 24(8): 1360-1379.

[31] Walia E, Pal A. Fusion framework for effective color image retrieval. Journal of Visual Communication and Image Representation, 2014, 25(6): 1335-1348.

[32] Zeng S, Huang R, Wang H, et al. Image retrieval using spatiograms of colors quantized by Gaussian mixture models. Neurocomputing, 2016, 171: 673-684.

[33] Feng L, Wu J, Liu S, et al. Global correlation descriptor: A novel image representation for image retrieval. Journal of Visual Communication and Image Representation, 2015, 33: 104-114.

[34] Chen J, Nakashika T, Takiguchi T, et al. Content-based image retrieval using rotation-invariant histograms of oriented gradients. In: Proceedings of the 5th ACM International Conference on Multimedia Retrieval. ACM, 2015: 443-446.

[35] Liu G H, Yang J Y, Li Z Y. Content-based image retrieval using computational visual attention model. Pattern Recognition, 2015, 48(8): 2554-2566.

[36] Lin Z, Brandt J. A local bag-of-features model for large-scale object retrieval. In: Computer Vision - ECCV 2010. Springer Berlin Heidelberg, 2010: 294-308.

[37] Liu Z, Wang S, Tian Q. Fine-residual VLAD for image retrieval. Neurocomputing, 2016, 173: 1183-1191.

[38] Jégou H, Douze M, Schmid C. Improving bag-of-features for large scale image search. International Journal of Computer Vision, 2010, 87(3): 316-336.

[39] Cai H, Mikolajczyk K, Matas J. Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2): 338-352.

[40] Deng Y, Li Y, Qian Y, et al. Visual words assignment via information-theoretic manifold embedding. IEEE Transactions on Cybernetics, 2014, 44(10): 1924-1937.

[41] Liu G H. Content-based image retrieval based on Cauchy density function histogram. In: 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE, 2016: 506-510.

[42] Liu Y, Huang L, Wang S, et al. Efficient segmentation for region-based image retrieval using edge integrated minimum spanning tree. In: ICPR 2016.

[43] Fu Y, Li Z, Huang T S, et al. Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval. Computer Vision and Image Understanding, 2008, 110(3): 390-402.

[44] Xiao J, Fu Y, Lu Y, et al. Refining image retrieval using one-class classification. In: IEEE International Conference on Multimedia and Expo (ICME 2009). IEEE, 2009: 314-317.

[45] Koffka K. Principles of Gestalt Psychology. Routledge, 2013.


Highlights

- A novel image descriptor is proposed to describe the visual content in images.
- The compatibility between our descriptor and manifold ranking is analyzed.
- A fast modified manifold ranking (MMR) is presented to improve the retrieval efficiency.
- The extensive experiments demonstrate the effectiveness of our framework.

Author Biography

Jun Wu received the BS degree from the School of Mathematical Sciences, Dalian University of Technology, China, in 2014. He is currently working toward the MS degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include pattern recognition and machine learning.

Lin Feng received the BS degree in electronic technology, the MS degree in power engineering, and the PhD degree in mechanical design and theory from Dalian University of Technology, China, in 1992, 1995, and 2004, respectively. He is currently a professor and doctoral supervisor in the School of Innovation Experiment, Dalian University of Technology, China. His research interests include intelligent image processing, robotics, data mining, and embedded systems.

Shenglan Liu received the MS degree from the College of Computer and Information Technology, Liaoning Normal University, China, in 2011. He is currently working toward the PhD degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include pattern recognition, image retrieval and machine learning.

Muxin Sun received the BS degree from Dalian University of Technology, China, in 2014. He is currently working toward the MS degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include pattern recognition and machine learning.