Hierarchical support vector machine based structural classification with fused hierarchies

Shuo Zhao, Yahong Han, Quan Zou, Qinghua Hu

School of Computer Science and Technology, Tianjin University, Tianjin 300072, PR China
Tianjin Key Lab of Cognitive Computing and Applications, Tianjin 300072, PR China

Corresponding author: Qinghua Hu, School of Computer Science and Technology, Tianjin University ([email protected]).
Article history: Received 30 July 2015; received in revised form 30 May 2016; accepted 31 May 2016.

Abstract
In this paper, we consider the problem of hierarchical image classification with multiple semantic views of object categories. A novel method is proposed for computing an image-semantic measure by determining weights for the semantic similarity among the concepts of each view. After obtaining the new image-semantic measure, we construct a semantic hierarchy with the existing method TRUST-ME. For the hierarchical classification, we translate the classification task with the learned taxonomy into a structured support vector machine (SVM) learning framework. We demonstrate our method on VOC2010 and a subset of the Animals with Attributes dataset, and show that the structured SVM using the weighted semantic hierarchy provides better accuracy.
Keywords: Hierarchical classification; Taxonomies; Structural learning; Hierarchy construction
1. Introduction

In computer vision, the recognition of object categories in images is one of the most difficult challenges because of the semantic gap between low-level visual features and high-level semantics [1,2]. A good automatic image annotation system should bridge this gap and output a high-level semantic interpretation. However, most machine learning approaches [3] simply learn a mapping function from the visual features to the semantic classes. Such methods can only depict the visual contents of images without understanding their semantics, and labeled examples alone seem to be insufficient for image annotation. It is therefore time to incorporate extra semantic information into classification models.

Object recognition is a human cognitive process that occurs in a semantic space, in which some classes are related: similar, grouped, or co-occurring. An excellent recognition system should consider the relationships among different classes. In the real world, many objects are organized into a class hierarchy tree or a directed acyclic graph (DAG). Moreover, humans can recognize semantic contents accurately and efficiently with the help of taxonomical hierarchies, and it has been shown that semantic hierarchies are very useful for narrowing the semantic gap [4]. A taxonomical hierarchy is a tree structure that groups
related classes together. One well-known taxonomy is WordNet. Hierarchical trees reflect the recognition process of humans and the way humans distinguish one class from another; thus, these trees can help learning models select more meaningful low-level features. Many studies have therefore used taxonomies to incorporate semantic information into classification tasks [5–10], some for the sake of accurate classification [11–14] and others for fast classification [15–18]. Hierarchies can also improve 3D classification performance [19,20].

Although a taxonomical hierarchy can assist classification in this way, two issues may complicate recognition. First, the relatedness described by the tree structure may not agree with the visual relatedness. For example, "human" and "whale" are semantically close because both are mammals, but their visual features differ greatly; a "shark" is semantically more distant from a "whale" than a "human" is, yet sharks and whales have similar visual features. Second, the visual features are so rich that a single relatedness measure cannot reflect their complex meaning. Some previous studies considered only one semantic hierarchy [15,16,21,22]. In reality, however, objects have different degrees of relatedness under different views (e.g., conceptual similarity based on WordNet or visual similarity based on visual features); the same issues arise in tracking problems [23,24]. Thus, a good semantic hierarchy should incorporate these different degrees of relatedness to provide a better understanding of image semantics.
Considering these issues, we present a semantic hierarchy construction approach that fuses multiple similarity or relatedness measures. First, for every view, we compute the relatedness between categories. Then, we determine a set of weights for the relatedness values and sum the weighted values to obtain a new semantic relatedness measure between categories that better reflects their meaningful relationships. After obtaining the semantic relatedness measure, we construct the hierarchy using the TRUST-ME method [25], a set of heuristic rules that links the concepts according to the learned semantic relatedness measure. Once a hierarchy is available, the task becomes a hierarchical classification problem with a pre-determined taxonomy, which we translate into a structured support vector machine (SVM) learning framework. The taxonomic loss function in this structured SVM is defined using the newly learned semantic relatedness measure.

Our main contributions are combining multiple semantic relatedness measures to construct an appropriate hierarchy, translating the hierarchical classification problem into a structured SVM learning framework, and defining a new taxonomic loss function. We demonstrate our method on PASCAL VOC2010 and a subset of the Animals with Attributes dataset, and report the results of a thorough experiment in which the structured SVM framework using the learned weighted semantic hierarchy provided accuracy improvements.

The rest of the paper is organized as follows. The next section reviews related work on hierarchy construction and hierarchical classification. Section 3 introduces the proposed learning framework and the structured SVM formulation. The results of an experimental study are given in Section 4. Finally, our conclusions are given in Section 5.
2. Related work
2.1. Building image hierarchies

Several methods have been proposed to construct semantic hierarchies for image classification [12,15,16]. Some approaches [4,6,21] are based on WordNet [26], which contains cognitive words and their superior-subordinate relations; these approaches build hierarchies by extracting the relevant subgraph of concepts from WordNet. Other approaches use visual feature information [15,16]. Griffin and Perona [15] computed a confusion matrix instead of the visual relatedness, and then used clustering algorithms to construct the hierarchy. A nonparametric Bayesian model was proposed to organize concepts into a tree hierarchy in [16]. These visual hierarchies provide the visual relatedness of concepts, but they cannot interpret higher-level image semantics. Therefore, both conceptual and visual information should be considered to construct a meaningful hierarchy. Still other approaches have been based on multiple relatedness measures [12,25,27]. Li et al. [27] proposed an approach that automatically constructs a "semantivisual" hierarchy using visual features and tag information. Fan et al. [12] constructed a hierarchy based on visual similarity and conceptual similarity computed from WordNet. Bannour and Hudelot [25] used a "semantico-visual relatedness of concepts" measure, which combines three types of similarity information (visual, conceptual, and contextual), to construct a faithful hierarchy. These methods, however, simply sum the similarity matrices using human-defined weights, which may not be suitable for the image classification task.
2.2. Using hierarchies for object recognition

Many real-world classification problems are essentially hierarchical. Such problems are scattered across different application domains, including image recognition [12,13], document classification [28], and protein prediction [5,29,30]. Although these applications come from different domains, they all exploit a hierarchical structure to promote classification. Several studies have utilized a hierarchy for efficient image classification [15,21,31]. These methods adopt a sequential greedy algorithm that starts from the root node, chooses the most probable child at each node, and rules out the other, unlikely children until reaching a leaf node. In addition to improving efficiency, other studies have exploited hierarchies to improve recognition accuracy [4,28,12] by incorporating the inter-class relatedness of the hierarchy into the classification model, under the assumption that classes close in the hierarchy should share similar visual features. One idea is to penalize misclassifications according to the semantic distance in the tree hierarchy [4] or to define a tree-induced loss [28]. Fan et al. [12] proposed an approach that simultaneously trains the classifiers of nodes sharing the same parent, addressing the problem of inter-class visual similarity. These methods exploit only the parent–child relatedness in the tree, without taking the entire tree structure into account. More closely related to our work are techniques that exploit a structured learning framework [32]. The structured hierarchical classification approach considers the entire tree structure, maximizes the margin between correct and incorrect paths during training, and treats paths as units for prediction. Many applications take advantage of the structured learning framework for hierarchical classification, such as document categorization with taxonomies [28] and seafloor imagery taxonomic categorization [33].
3. Hierarchical classification with multiple semantic relatedness measures

We cast the problem of exploiting multiple semantic views as the problem of constructing a meaningful measure, and then transform the hierarchical classification into a structured output prediction problem. The proposed framework for constructing hierarchies and for hierarchical classification is illustrated in Fig. 1. First, we describe the method for computing the meaningful measure, which determines a set of weights for the different relatedness values. Then, we describe the hierarchical classification model, which uses the structured SVM framework. Finally, we propose an appropriate loss function for the structured SVM based on the new measure. We assume that we are given a dataset $D = \{(x_1, y_1), \ldots, (x_N, y_N)\} \subseteq X \times Y$, where $x_i \in \mathbb{R}^d$ denotes the $i$th instance and $y_i \in \{1, 2, \ldots, C\}$ is its class label, with $C$ the number of classes.
3.1. Learning semantic hierarchy

For every view, we can compute a similarity matrix $K \in \mathbb{R}^{C \times C}$, where $K$ is symmetric and $K(y_i, y_j)$ denotes the similarity or relatedness between classes $y_i$ and $y_j$. Suppose there are $M$ views; we then obtain $\{K_t\}_{t=1}^{M}$. We first normalize $\{K_t\}_{t=1}^{M}$ into the same interval using min–max normalization. Now, we need to determine the weights $w \in \mathbb{R}^{M}$. We formulate our objective
Fig. 1. Flowchart of our framework. First, we determine a set of weights to combine multiple similarity relatedness values and obtain a meaningful similarity measure. We use this new similarity measure to construct a hierarchy. Then we translate the classification task with a hierarchy into a structured SVM framework.
Fig. 2. The semantic hierarchy constructed on the Pascal VOC2010 dataset. Leaf nodes are the original classes. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
function as follows:

$$\min_{w} \left\lVert I - \sum_{t=1}^{M} w_t K_t \right\rVert_F^2 \quad \text{s.t.} \quad \sum_{t=1}^{M} w_t = 1 \qquad (1)$$
where $I$ is the identity matrix. This objective can handle inconsistencies between different relatedness measures. If two concepts are similar (or dissimilar) in both the visual and conceptual relatedness measures, then after summing the weighted matrices the two concepts remain similar (or dissimilar), because the weights sum to one. However, if the visual and conceptual relatedness of two concepts are inconsistent, the objective function gives a small weight to a large similarity. For example, the visual similarity between a whale and a human is much smaller than their conceptual similarity, $K_{\mathrm{visual}}(\text{whale}, \text{human}) \ll K_{\mathrm{conceptual}}(\text{whale}, \text{human})$; we will obtain a small weight for $K_{\mathrm{conceptual}}$, which balances the inconsistency between the visual and conceptual relatedness measures. The learning problem can be solved using an alternating iterative optimization algorithm. After obtaining the weights, we compute the new similarity measure as the weighted sum $\sum_{t=1}^{M} w_t K_t$. To obtain the tree structure, we use the TRUST-ME method [25], a set of heuristic rules that links the concepts according to the newly learned similarity measure. The hierarchy constructed on the Pascal VOC2010 dataset is depicted in Fig. 2, and the hierarchy constructed on the Animals with Attributes dataset is depicted in Fig. 3.
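The details of the alternating optimization are not given above. Purely as an illustrative sketch (not the authors' implementation; all names below are ours), note that (1) is a constrained least-squares problem that also admits a closed-form solution via Lagrange multipliers:

```python
import numpy as np

def minmax_normalize(K):
    """Scale a similarity matrix into [0, 1] (min-max normalization)."""
    return (K - K.min()) / (K.max() - K.min())

def fuse_similarities(Ks):
    """Sketch of Eq. (1): min_w ||I - sum_t w_t K_t||_F^2 s.t. sum_t w_t = 1.

    Ks: list of M symmetric C x C similarity matrices, one per view.
    Returns the view weights w and the fused similarity sum_t w_t K_t.
    Assumes the Gram matrix A of the views is invertible.
    """
    Ks = [minmax_normalize(K) for K in Ks]
    M = len(Ks)
    # A[u, v] = <K_u, K_v>_F  and  b[t] = <I, K_t>_F = tr(K_t)
    A = np.array([[np.sum(Ks[u] * Ks[v]) for v in range(M)] for u in range(M)])
    b = np.array([np.trace(K) for K in Ks])
    # Stationarity 2 A w - 2 b + lam * 1 = 0 combined with 1^T w = 1
    Ainv = np.linalg.inv(A)
    ones = np.ones(M)
    lam = 2.0 * (ones @ Ainv @ b - 1.0) / (ones @ Ainv @ ones)
    w = Ainv @ (b - 0.5 * lam * ones)
    return w, sum(wt * Kt for wt, Kt in zip(w, Ks))
```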
3.2. Structural learning with taxonomies

Given a hierarchy, for every leaf node $y$, the set of nodes on the path from the root to node $y$ is denoted $\pi(y)$. We represent the path $\pi(y)$ by a binary vector $\lambda(y) \in \mathbb{R}^s$, where $s$ is the total number of nodes and the $i$th element is given by

$$\lambda_i(y) = \begin{cases} 1 & \text{if } i \in \pi(y) \\ 0 & \text{otherwise} \end{cases}$$

For instance, in the hierarchy in Fig. 3, the path $\pi(5{:}\,\text{leopard}) = \{5, 12, 15, 16\}$, and the class leopard is represented by $\lambda(\text{leopard}) = (0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,1)$. A hierarchical classification task can be transformed into a structured SVM framework in which a discriminant function

$$f(x_i, y) = \langle w, \phi(x_i, y) \rangle \qquad (2)$$
needs to be learned. The structured learning problem involves finding the hyperplane $w$ according to the joint feature representation $\phi(x_i, y)$, whose form depends on the structured problem. For a taxonomic structure, $\phi(x_i, y)$ is defined as

$$\phi(x_i, y) = \lambda(y) \otimes x_i = \begin{pmatrix} \lambda_1(y)\, x_i \\ \lambda_2(y)\, x_i \\ \vdots \\ \lambda_s(y)\, x_i \end{pmatrix} \qquad (3)$$

where $\otimes$ is the tensor product and $\lambda(y)$ is the binary vector that encodes the path of class $y$ in the hierarchy tree. Our goal is to find a multiclass classifier that maps the feature $x_i$ to the structured space $Y$. We follow the structured learning framework formulated by Tsochantaridis and Joachims [32].
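As a minimal sketch of Eqs. (2) and (3) (function names are ours, not from the paper), the joint feature map and the resulting argmax decoding can be written as:

```python
import numpy as np

def joint_feature(x, path_nodes, s):
    """Eq. (3): phi(x, y) = lambda(y) (tensor) x for a tree with s nodes.

    path_nodes: zero-based indices of the nodes on the root-to-leaf path
    pi(y). The result has length s * len(x), stacking blocks lambda_i(y) * x.
    """
    lam = np.zeros(s)
    lam[list(path_nodes)] = 1.0   # binary path vector lambda(y)
    return np.kron(lam, x)

def predict(w, x, paths, s):
    """Eq. (2): return the leaf label maximizing f(x, y) = <w, phi(x, y)>."""
    return max(paths, key=lambda y: w @ joint_feature(x, paths[y], s))

# Example from Fig. 3: pi(5: leopard) = {5, 12, 15, 16} (zero-based below)
x = np.random.randn(32)
paths = {"leopard": [4, 11, 14, 15]}   # one entry per leaf class
w = np.random.randn(16 * 32)
print(predict(w, x, paths, s=16))
```

Learning $w$ itself is done with the margin-rescaling formulation given next.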
Fig. 3. The semantic hierarchy constructed on the Animals with Attributes dataset. Leaf nodes are the original classes. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
Given $N$ training examples $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, we formulate the learning problem with margin rescaling as

$$\arg\min_{w,\xi} \; \frac{1}{2} w^T w + \frac{C}{N}\sum_{i=1}^{N} \xi_i \qquad (4)$$

$$\text{s.t.} \quad \forall i: \; \xi_i \geq 0, \qquad \forall i,\, \forall \hat{y} \neq y_i: \; \langle w, \phi(x_i, y_i) - \phi(x_i, \hat{y}) \rangle \geq \Delta(y_i, \hat{y}) - \xi_i$$
where $C > 0$ is a constant that controls the tradeoff between training-error minimization and margin maximization, and $\Delta(y_i, \hat{y})$ is the loss function, which we explain in the next section. A constraint is added in problem (4) for every training image, and each constraint corresponds to a slack variable $\xi_i$, which upper-bounds the error $\Delta(y_i, \hat{y})$; violating a margin constraint with a high $\Delta(y_i, \hat{y})$ therefore incurs a more severe penalty. We solve the optimization problem using the cutting-plane algorithm in the SVMstruct software package [34].

3.3. Taxonomic loss functions

Different loss functions can be defined in the structured learning framework. In this work, we evaluate several loss functions and introduce a meaningful loss function based on the similarity measure learned in Section 3.1. Given the hierarchy, a loss function can be defined between the ground-truth label $y$ and the prediction $\hat{y}$. We consider four loss functions.

(1) The standard 0/1 loss:

$$\Delta_{0/1}(\hat{y}, y) = \begin{cases} 1 & \text{if } \hat{y} \neq y \\ 0 & \text{otherwise} \end{cases}$$

The $\Delta_{0/1}$ loss ignores the tree hierarchy structure and treats all classes as equally unrelated.

(2) The hierarchy-based loss [13], $\Delta_h(\hat{y}, y) = \sum_i |\lambda_i(\hat{y}) - \lambda_i(y)|$, which counts the number of non-shared nodes on the paths of the true label $y$ and the prediction $\hat{y}$.

(3) The weighted hierarchical difference loss [33],

$$\Delta_{\mathrm{WHD}}(\hat{y}, y) = \sum_i |\Psi(\lambda_i(\hat{y})) - \Psi(\lambda_i(y))|$$

which penalizes an error that occurs higher up the hierarchy more severely. The weighting function $\Psi$ divides each element $i$ of the binary vector $\lambda(y)$ by its level. For example, in the hierarchy of Fig. 3, for class 5: leopard we have $\Psi(\lambda(\text{leopard})) = (0,0,0,0,1/4,0,0,0,0,0,0,1/3,0,0,1/2,1)$.

(4) Our proposed semantic loss function:

$$\Delta_{\mathrm{semantic}}(\hat{y}, y) = 1 - \sum_{t=1}^{M} w_t K_t(\hat{y}, y) \qquad (5)$$

This loss is defined on the learned similarity measure and describes the meaningful distance between the true label $y$ and the prediction $\hat{y}$. We compare the results of the different loss functions in the experiments.
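A minimal sketch of the four losses (our names; it assumes the binary path vectors of Section 3.2, per-node levels counted from the root, and a fused similarity matrix indexed by class labels as produced in the Section 3.1 sketch):

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """Delta_0/1: 1 for any misclassification, ignoring the hierarchy."""
    return float(y_true != y_pred)

def hierarchy_loss(lam_true, lam_pred):
    """Delta_h [13]: number of non-shared nodes on the two paths."""
    return float(np.sum(np.abs(lam_pred - lam_true)))

def whd_loss(lam_true, lam_pred, level):
    """Delta_WHD [33]: Psi divides node i by level[i] (root = 1), so
    disagreements near the root are penalized more than deep ones."""
    return float(np.sum(np.abs(lam_pred / level - lam_true / level)))

def semantic_loss(y_true, y_pred, K_fused):
    """Delta_semantic, Eq. (5): one minus the fused label similarity."""
    return 1.0 - K_fused[y_true, y_pred]
```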
4. Experiments

In this section, we evaluate the performance of our approach on two image datasets and compare the results with several baseline algorithms.

4.1. Datasets

We performed experiments on two image datasets: Pascal VOC2010 and Animals with Attributes (AWA). The Pascal VOC2010 dataset contains 20 classes and a total of 11,321 images. For AWA, we used the ten-class subset of [14], with a total of 6180 images. We refer to these as VOC-2010 and AWA-10, respectively. We compared our approach with four other methods: (1) random forest (RF); (2) flat structured multiclass SVM (flat-SVM), which uses the joint feature representation but ignores the hierarchy structure; (3) linear-SVM; and (4) semantic kernel forests (SKF) [14], which leverage multiple hierarchies for hierarchical recognition. On the Pascal VOC2010 dataset, we also compared our hierarchy with the hierarchy constructed in [25].

4.2. Experimental setup

For the Pascal VOC2010 dataset, we used a bag-of-visual-words (BOV) model, constructed by computing dense scale-invariant feature transform (SIFT) descriptors, generating a codebook, and encoding the SIFT features. We computed the SIFT descriptors using the VLFeat package [35]. The dimensionality of the SIFT descriptors was reduced to d = 32 using PCA, which was found to improve performance and decrease the computational complexity of the Fisher vector representation. From the collection of SIFT descriptors of the training images, we performed Gaussian mixture model (GMM) clustering on a random subset of patches to generate a visual codebook; considering the computational complexity, we set the codebook size to k = 32. After obtaining the codebook, we used the Fisher encoding method [36] to encode the SIFT features of an image, following the implementation in [37]. Every image was then represented by a BOV vector with 2·k·d dimensions. For AWA-10, we used the deep convolutional activation features (DeCAF) supplied with the dataset. For comparison with SKF [14], we reduced the dimensionality to 100 using principal component analysis (PCA) to speed up training the tree of metrics (ToM) in [14].
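For concreteness, here is a sketch of the VOC-2010 feature pipeline described above, using scikit-learn in place of the exact tools of [35–37]; the descriptor extraction itself is stubbed out with random data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Placeholder for dense SIFT descriptors extracted with VLFeat [35]
descriptors = np.random.rand(20000, 128)

# Reduce the SIFT descriptors to d = 32 dimensions with PCA
pca = PCA(n_components=32)
reduced = pca.fit_transform(descriptors)

# Fit a GMM codebook of size k = 32 on a random subset of patches
subset = reduced[np.random.choice(len(reduced), 5000, replace=False)]
gmm = GaussianMixture(n_components=32, covariance_type="diag").fit(subset)

# Each image is then Fisher-encoded against this GMM [36,37], giving a
# BOV vector of 2 * k * d dimensions (gradients w.r.t. means and variances).
```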
To obtain a meaningful similarity measure, we need to compute the similarity relatedness values for the multiple views. For VOC-2010, we used three similarity relatedness measures, following [25]: the visual similarity, which represents the distance between concepts in a visual feature space; the conceptual similarity, which is based on the distance to the nearest common ancestor of two concepts in WordNet; and the co-occurrence probability between each pair of concepts. For AWA-10, we used four similarity relatedness measures: the visual and conceptual similarities defined as for VOC-2010, plus an appearance similarity and a habitat similarity. For the appearance and habitat similarities, we followed the methods outlined in [14], which compute them from the Euclidean distances between the real-valued attribute vectors of the training images supplied with the dataset.
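The exact estimators follow [25] and [14]. Purely as an illustration of the contextual view, the co-occurrence of two concepts could be estimated from multi-label annotations along these lines (a hypothetical sketch, not the estimator of [25]):

```python
import numpy as np

def cooccurrence_similarity(image_labels, C):
    """Fraction of images in which each pair of concepts co-occurs.

    image_labels: list of per-image sets of class indices; C: class count.
    The diagonal holds the plain frequency of each single concept.
    """
    counts = np.zeros((C, C))
    for labels in image_labels:
        for a in labels:
            for b in labels:
                counts[a, b] += 1.0
    return counts / len(image_labels)
```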
4.3. Experimental results

4.3.1. Results in VOC-2010
To evaluate our approach, we used the training set of VOC-2010 to train the model and the validation set for testing. Classification performance was evaluated using the average precision (AP) score, a standard evaluation criterion supplied by the PASCAL challenge; the AP computes the area under the precision/recall curve, and a higher value represents better performance. We report the mean average precision (mAP), the mean of the AP scores over the 20 classes.

We first compare our hierarchical classification (SSVM+Semantic) with the flat classification (flat-SVM), which uses the structured learning framework but ignores the hierarchy structure. Fig. 4 compares the per-class performance of the two. Our hierarchical approach classifies most classes better than the flat one, with large gains on the classes with lower average precision, such as "chair" and "sofa". This comparison shows that the hierarchy we developed enhances recognition accuracy.

We also compare our approach (SSVM+Semantic) with three other methods: RF, flat-SVM, and linear-SVM. For efficiency reasons, we did not compare with SKF [14] on this dataset. The comparisons are listed in Table 1 and shown in Fig. 5. As expected, structured learning with the learned hierarchy outperforms the other methods; in particular, it improves on the flat-SVM by 2.43%, which means that the hierarchy contributes to the classification.

To evaluate our learned hierarchy, we compared it with several other hierarchies constructed on
the Pascal VOC2010 dataset; the results are shown in Table 1, where the structured SVM approach is used with each hierarchy. Compared with the hierarchy constructed in [25], our learned hierarchy showed an improvement of 3.6%, demonstrating that it can efficiently describe the true relatedness of the different classes. We also compared the learned hierarchy with the original hierarchies, each constructed from a single relatedness measure. On the VOC2010 dataset, there are three original hierarchies: the visual hierarchy (SSVM+Visual), the conceptual hierarchy (SSVM+Conceptual), and the contextual hierarchy (SSVM+Contextual). The fused hierarchy performed better than every original hierarchy, showing that a single relatedness measure is insufficient to reflect the complex visual meaning and that the learned hierarchy can depict the complex relatedness of the classes more accurately.
Table 1
Performance on VOC-2010.

Method            mAP
Random Forest     0.3823
Flat-SVM          0.4244
Linear-SVM        0.4277
SSVM+Visual       0.4104
SSVM+Conceptual   0.4089
SSVM+Contextual   0.4166
SSVM+Hichem       0.4127
SSVM+Semantic     0.4487
Fig. 5. Performance of different methods on VOC2010.
Fig. 4. Average precision of flat and hierarchical classification on VOC2010.
4.3.2. Results in AWA-10
For AWA-10, we split the images into 100/100/100 images per class for training/validation/testing, and generated five such random splits. We evaluated performance using the recognition accuracy and the standard error for a 95% confidence interval. We evaluated our approach (SSVM+Semantic) on AWA-10 and compared it with four other methods: RF, flat-SVM, linear-SVM, and SKF [14]. Table 2 and Fig. 6 report the multiclass classification accuracy over the ten classes of AWA-10. Structured learning with the learned hierarchy outperformed all the other methods. Our method is comparable with SKF even though we use only a linear kernel, whereas SKF uses multiple kernel learning and nonlinear kernels. Compared with the flat SVM, the hierarchical method improves the classification accuracy only slightly; the ten classes are all animals, so their relatedness is relatively simple and the hierarchy yields only a small gain. To evaluate our learned hierarchy, we compared it with the original hierarchies, each constructed from a single relatedness measure. On this dataset, there are four original hierarchies: the visual hierarchy (SSVM+Visual), the conceptual hierarchy (SSVM+Concept), the appearance hierarchy (SSVM+Appearance), and the habitat hierarchy (SSVM+Habitat). The comparisons are listed in Table 2. The learned hierarchy performed better than every original hierarchy, which shows that the hierarchy learned from multiple views contains more meaningful information than any single original view.

Table 2
Classification accuracy on AWA-10, across five training/testing splits.

Method            Accuracy
Random Forest     74.44 ± 0.10
Flat-SVM          75.70 ± 1.21
Linear-SVM        75.82 ± 1.31
SKF [14]          75.96 ± 1.81
SSVM+Visual       76.72 ± 0.81
SSVM+Concept      75.86 ± 1.52
SSVM+Appearance   76.56 ± 1.44
SSVM+Habitat      75.86 ± 1.01
SSVM+Semantic     77.11 ± 1.25
Table 3
Evaluation of performance with various loss functions.

Dataset    Δ0/1           Δh             ΔWHD           Δsemantic
AWA-10     76.56 ± 0.92   76.17 ± 1.38   75.78 ± 1.48   77.11 ± 1.25
VOC2010    0.4398         0.4091         0.3939         0.4487
4.3.3. Evaluation of loss functions
To examine the effect of our semantic loss function, we compared the performance obtained with different loss functions on VOC2010 and AWA-10. Table 3 lists the results for the standard 0/1 loss, the hierarchy-based loss [13], the weighted hierarchical difference loss [33], and our semantic loss. Our semantic loss function performed the best, which illustrates that the learned similarity measure is more faithful and precisely describes the relationship between concepts.

4.4. Experimental discussion
Fig. 6. Performance of different methods on AWA10.
The experiments above showed that our proposed method is compatible with other encoding methods, including the Fisher vector. Furthermore, the results in Tables 1 and 2 demonstrated that the structured SVM with the newly learned semantic hierarchy performed better than the original hierarchies, which means that a proper hierarchy improves classification accuracy. The improvement is due to the weight optimization and the ensemble of the basic hierarchies. Deep learning has been developed and widely applied in image classification [38]; abundant training samples allow deep learning to outperform traditional methods. Mid-level image representations [39] have also been proposed for image classification. We did not compare with these methods because our method suits small training sets and low-dimensional features, whereas deep learning and mid-level image representations work well with high-dimensional features and big data.
5. Conclusions
This paper considered the problem of hierarchical image classification with multiple semantic views. We proposed a new method that fuses multiple relatedness measures of concepts to construct a hierarchy, and translated the hierarchical classification task into a structured SVM learning framework. Experimental results showed that our learned similarity measure is more faithful, and that structured learning with the learned hierarchy improves object recognition accuracy. In future work, we plan to explore more meaningful similarity measures and new approaches for hierarchical classification. We also expect to combine the latest feature reduction methods [40,41] with our method.
Acknowledgments This work is supported by the Natural Science Foundation of China (Nos. 61370010, 61222210, 61472276 and 61432011).
References

[1] A.W. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years, IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (2000) 1349–1380.
[2] W. Shen, B. Wang, Y. Wang, X. Bai, L.J. Latecki, Face identification using reference-based features with message passing model, Neurocomputing 99 (2013) 339–346.
[3] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, Locality-constrained linear coding for image classification, in: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, San Francisco, 2010, pp. 3360–3367.
[4] J. Deng, A.C. Berg, K. Li, L. Fei-Fei, What does classifying more than 10,000 image categories tell us? in: Computer Vision–ECCV 2010, Springer, Crete, Greece, 2010, pp. 71–84.
[5] C. Lin, Y. Zou, J. Qin, X. Liu, Y. Jiang, C. Ke, Q. Zou, Hierarchical classification of protein folds using a novel ensemble classifier, PLoS One 8 (2) (2013) e56499.
[6] R. Fergus, H. Bernal, Y. Weiss, A. Torralba, Semantic label sharing for learning with many categories, in: Computer Vision–ECCV 2010, Springer, Crete, Greece, 2010, pp. 762–775.
[7] R. Ji, X. Xie, H. Yao, W.-Y. Ma, Vocabulary hierarchy optimization for effective and transferable retrieval, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, IEEE, Miami, 2009, pp. 1161–1168.
[8] R. Ji, H. Yao, X. Sun, B. Zhong, W. Gao, Towards semantic embedding in visual vocabulary, in: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, San Francisco, 2010, pp. 918–925.
[9] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, K. Chen, Survey of MapReduce frame operation in bioinformatics, Brief. Bioinform. 15 (4) (2014) 637–647.
[10] X. Wang, Z.-G. Hou, A. Zou, M. Tan, L. Cheng, A behavior controller based on spiking neural networks for mobile robots, Neurocomputing 71 (4) (2008) 655–666.
[11] C. Lin, W. Chen, C. Qiu, Y. Wu, S. Krishnan, Q. Zou, LibD3C: ensemble classifiers with a clustering and dynamic selection strategy, Neurocomputing 123 (2014) 424–435.
[12] J. Fan, Y. Gao, H. Luo, Integrating concept ontology and multitask learning to achieve more effective classifier training for multilevel image annotation, IEEE Trans. Image Process. 17 (3) (2008) 407–426.
[13] A. Binder, K.-R. Müller, M. Kawanabe, On taxonomies for multi-class image categorization, Int. J. Comput. Vis. 99 (3) (2012) 281–301.
[14] S.J. Hwang, K. Grauman, F. Sha, Semantic kernel forests from multiple taxonomies, in: Advances in Neural Information Processing Systems, 2012, pp. 1718–1726.
[15] G. Griffin, P. Perona, Learning and using taxonomies for fast visual categorization, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, IEEE, Anchorage, Alaska, 2008, pp. 1–8.
[16] E. Bart, I. Porteous, P. Perona, M. Welling, Unsupervised learning of visual taxonomies, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, IEEE, Anchorage, Alaska, 2008, pp. 1–8.
[17] S. Bengio, J. Weston, D. Grangier, Label embedding trees for large multi-class tasks, in: Advances in Neural Information Processing Systems, 2010, pp. 163–171.
[18] L. Cheng, Z.-G. Hou, M. Tan, A mean square consensus protocol for linear multi-agent systems with communication noises and fixed topologies, IEEE Trans. Autom. Control 59 (1) (2014) 261–267.
[19] X. Bai, S. Bai, Z. Zhu, L.J. Latecki, 3D shape matching via two layer coding, IEEE Trans. Pattern Anal. Mach. Intell. 37 (12) (2015) 2361–2373.
[20] B. Zhong, Y. Shen, Y. Chen, W. Xie, Z. Cui, H. Zhang, D. Chen, T. Wang, X. Liu, S. Peng, et al., Online learning 3D context for robust visual tracking, Neurocomputing 151 (2015) 710–718.
[21] M. Marszałek, C. Schmid, Semantic hierarchies for visual object recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR'07, IEEE, Minneapolis, 2007, pp. 1–7.
[22] L. Wei, M. Liao, Y. Gao, R. Ji, Z. He, Q. Zou, Improved and promising identification of human microRNAs by incorporating a high-quality negative set, IEEE/ACM Trans. Comput. Biol. Bioinform. 11 (1) (2014) 192–201.
[23] B. Zhong, Y. Chen, Y. Shen, Y. Chen, Z. Cui, R. Ji, X. Yuan, D. Chen, W. Chen, Robust tracking via patch-based appearance model and local background estimation, Neurocomputing 123 (2014) 344–353.
[24] B. Zhong, X. Yuan, R. Ji, Y. Yan, Z. Cui, X. Hong, Y. Chen, T. Wang, D. Chen, J. Yu, Structured partial least squares for simultaneous object tracking and segmentation, Neurocomputing 133 (2014) 317–327.
[25] H. Bannour, C. Hudelot, Building semantic hierarchies faithful to image semantics, Adv. Multimed. Model. 7131 (2012) 4–15.
[26] G. Miller, C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, 1998.
[27] L.-J. Li, C. Wang, Y. Lim, D.M. Blei, L. Fei-Fei, Building and using a semantivisual image hierarchy, in: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, San Francisco, 2010, pp. 3336–3343.
[28] L. Cai, T. Hofmann, Hierarchical document categorization with support vector machines, in: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, ACM, Bremen, Germany, 2004, pp. 78–87.
[29] L. Song, D. Li, X. Zeng, Y. Wu, L. Guo, Q. Zou, nDNA-prot: identification of DNA-binding proteins based on unbalanced classification, BMC Bioinform. 15 (1) (2014) 298.
[30] Q. Zou, X. Li, Y. Jiang, Y. Zhao, G. Wang, BinMemPredict: a web server and software for predicting membrane protein types, Curr. Proteom. 10 (1) (2013) 2–9.
[31] M. Marszałek, C. Schmid, Constructing category hierarchies for visual recognition, in: Computer Vision–ECCV 2008, Springer, Marseille, France, 2008, pp. 479–491.
[32] I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun, Large margin methods for structured and interdependent output variables, J. Mach. Learn. Res. 6 (2005) 1453–1484.
[33] N. Nourani-Vatani, R. López-Sastre, S. Williams, Structured output prediction with hierarchical loss functions for seafloor imagery taxonomic categorization, in: Pattern Recognition and Image Analysis, Springer International Publishing, 2015, pp. 173–183.
[34] T. Joachims, Support Vector Machine for Complex Outputs, 〈http://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html〉, 2008.
[35] A. Vedaldi, B. Fulkerson, VLFeat: an open and portable library of computer vision algorithms, in: Proceedings of the International Conference on Multimedia, ACM, Florence, Italy, 2010, pp. 1469–1472.
[36] F. Perronnin, J. Sánchez, T. Mensink, Improving the Fisher kernel for large-scale image classification, in: Computer Vision–ECCV 2010, Springer, Crete, Greece, 2010, pp. 143–156.
[37] K. Chatfield, V.S. Lempitsky, A. Vedaldi, A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, in: BMVC, vol. 2, 2011, p. 8.
[38] M. Cimpoi, S. Maji, I. Kokkinos, A. Vedaldi, Deep filter banks for texture recognition, description, and segmentation, Int. J. Comput. Vis. 118 (1) (2016) 65–94.
[39] X. Wang, B. Wang, X. Bai, W. Liu, Z. Tu, Max-margin multiple-instance dictionary learning, in: International Conference on Machine Learning, 2013, pp. 846–854.
[40] Q. Zou, J. Zeng, L. Cao, R. Ji, A novel features ranking metric with application to scalable visual and bioinformatics data classification, Neurocomputing 173 (2016) 346–354.
[41] X. Bai, C. Yao, W. Liu, Strokelets: a learned multi-scale mid-level representation for scene text recognition, IEEE Trans. Image Process. 25 (6) (2016) 2789–2802.
Shuo Zhao is currently a master's student in the School of Computer Science and Technology at Tianjin University. His research interests include computer vision and machine learning.
Yahong Han received the Ph.D. degree from Zhejiang University, Hangzhou, China. He is currently an Associate Professor with the School of Computer Science and Technology, Tianjin University, Tianjin, China. His current research interests include multimedia analysis, retrieval, and machine learning.
Quan Zou received his Ph.D. from Harbin Institute of Technology, PR China, in 2009. From 2009 to 2015, he was an Assistant and then Associate Professor at Xiamen University, PR China. He is currently a Professor of Computer Science at Tianjin University and a member of the ACM and IEEE. His research is in the areas of bioinformatics, machine learning, and parallel computing.
Qinghua Hu received his B.E., M.E., and Ph.D. degrees from Harbin Institute of Technology, Harbin, China, in 1999, 2002, and 2008, respectively. He worked at Harbin Institute of Technology as an Assistant and then Associate Professor from 2006 to 2011, and as a Postdoctoral Fellow with the Hong Kong Polytechnic University. He is now a full Professor with Tianjin University. His research interests focus on intelligent modeling, data mining, and knowledge discovery for classification and regression. He was the PC co-chair of RSCTC 2010, CRSSC 2012, and ICMLC 2014, and serves as a referee for a great number of journals and conferences. He has published more than 100 journal and conference papers in the areas of pattern recognition, machine learning, and data mining.