Incremental and Dynamic Graph Construction with Application to Image Classification
Highlights

• The proposed incremental method can insert nodes into a previously constructed graph
• The performance is similar to that of a graph constructed from scratch
• The incremental scenario can be exploited by any graph construction method
• The computational complexity is far less than that of the batch graph
Alireza Bosaghzadeh 1,∗
1 Computer Engineering Faculty, Shahid Rajaee Teacher Training University, IRAN
∗ Corresponding author. EMAIL: [email protected], TEL: 0098 2122970060

Fadi Dornaika 2,3
2 University of the Basque Country, Spain
3 IKERBASQUE, Basque Foundation for Science, Spain
EMAIL: [email protected], TEL: 0034 943018034
Abstract

In this paper, we propose a dynamic graph construction technique that inserts new samples into a previously constructed graph, which reduces the computational time and complexity of classic batch construction schemes. The basic assumption of the proposed method is that adding one sample to a graph affects only its close nodes; hence, it is not necessary to update the whole graph but only the weights of the close nodes. The proposed method has two steps: insertion and updating. In the insertion phase, the similarity between the new sample and the available data is calculated. The similarity vector is retrieved either from a distance function or a coding scheme. Then, in the update phase, by evaluating the similarity vector of the new sample, the close nodes that will be affected are identified, and the graph weights of these close (similar) nodes are updated. By adopting this scenario, at each insertion of a node (or a set of nodes), only the weights of very few nodes have to be updated. It is worth mentioning that since the proposed method does not invoke the labels of the samples, it can be adopted by any unsupervised, semi-supervised, or supervised technique. A set of extensive experiments on the task of classification with different image datasets shows that, in various post-graph learning tasks (i.e., label propagation and manifold learning), the graph constructed by the proposed method, even after hundreds of insertions and updates, has a performance similar to (and in some cases better than) that of the graph constructed from scratch.
Keywords: Incremental graph construction, Locality-constrained Linear Coding, Graph-based label propagation, Graph-based linear manifold learning, Semi-supervised learning, Face recognition
1 Introduction
With the extensive use of digital cameras, a lot of images and videos are produced that require huge processing power to analyze. One of the main challenges is to use an efficient method that can relate recently produced data to previously available data. This arises in different scenarios; as an example, the images and videos that are archived on a personal PC or uploaded to a website like Facebook should be linked together to find their similarities, in contexts like similar people in the images or similar locations where the photos were taken. The most general case is when we have new information and we want to relate it to the available knowledge. It should be taken into account that this fusion of information can also affect the relations among the available data.

The affinity graph, also known as the similarity matrix, is a tool that can be used to represent the relations between data. It is a powerful tool with a wide range of applications in clustering, classification, and manifold learning (Luxburg, 2007; Z. Wang, Song, & Zhang, 2009; West, 2001). Hence, the first challenge becomes how to efficiently add new information and nodes into a previously constructed graph. Moreover, the insertion of new information and nodes into a graph can change the relations among the available nodes. So the second challenge is how to update the relations among the already available nodes.

Due to its efficiency, the KNN graph is a popular technique that is widely used for graph construction, despite the fact that its performance is lower compared to other graph construction techniques (Dornaika, Bosaghzadeh, Salmane, & Ruichek, 2014b; Dornaika & Bosaghzadeh, 2015). Despite this popularity, the construction of the neighborhood graph suffers from parameter setting (i.e., neighborhood size and weighting parameters), especially for datasets with many samples (Gan, 2012; Connor & Kumar, 2010; Liu, He, & Chang, 2010). To solve these problems, the authors in (Uno, Sugiyama, & Tsuda, 2009; Chen, Fang, & Saad, 2009; Zhang, Huang, Geng, & Liu, 2013) estimated the neighborhood graph in a two-step scenario. First, they calculate subgraphs constructed from overlapping and non-overlapping sets of data. Then, they fuse the subgraphs together to approximate the neighborhood graph.

In (Liu et al., 2010), the authors proposed an algorithm based on the use of some samples called anchor points. In the first phase, they cluster the data by adopting the K-means algorithm
and the centers of the clusters are called anchor points. Then, they calculate the similarity between the data and the anchor points. If the number of anchor points is sufficiently large, this algorithm can approximate the exact neighborhood graph. A similar idea for fast spectral clustering was proposed in (W. Zhu, Nie, & Li, 2017). First, they construct an anchor-based similarity graph with a Balanced K-means based Hierarchical K-means algorithm, and then they perform spectral analysis on the graph. While Anchor Graph Regularization (AGR) was proposed to deal with large amounts of data, when a dataset becomes much larger, even AGR faces a big graph, which increases the computational cost. To solve this issue, a Hierarchical Anchor Graph Regularization (HAGR) approach was proposed that explores multiple layers of anchors with a pyramid-style structure (M. Wang, Fu, Hao, Liu, & Wu, 2017). The labels are inferred in a hierarchical manner from the coarsest anchor layer to the finest one.

The main objective of the above-mentioned methods is efficient graph construction. This would be a good solution if we were dealing with large datasets and the whole dataset were available from the beginning. However, if samples arrive in incoming sequences, one needs to reconstruct a new graph each time new samples are received. The main drawback of these methods is that they cannot update an existing graph, which makes them computationally expensive, especially when we have a stream of data.

In the work described in (Hacid & Yoshida, 2007), the authors proposed an algorithm that performs a local update of a Relative Neighborhood Graph (RNG) following the insertion of a new node, which is used for incremental RNG construction. The algorithm proceeds as follows: first, two nodes are randomly selected and an edge is created between them; then, in an iterative process, new nodes are inserted into the graph and the graph is locally updated. After the insertion of a new node, a hypersphere is defined around the added node, and the edges of the nodes inside the hypersphere are updated with the classic brute-force algorithm. Rayar et al. (Rayar, Barrat, Bouali, & Venturini, 2015) proposed a method similar to (Hacid & Yoshida, 2007), but it leverages an edge-based neighborhood local update strategy to yield an approximate graph. Following the insertion of a new image, they update only a relevant sub-graph of the existing graph. Hence, it outperforms the work of (Hacid & Yoshida, 2007) regarding computation time while yielding a refined approximation.

In this paper, an incremental graph construction method is proposed which starts from a
small graph, inserts new nodes into it, looks for the neighbors of the new samples, and updates the affected parts of the graph. Instead of constructing the graph from scratch, the proposed method only updates the nodes whose weights and edges might be affected by the insertion of new samples, which saves computational time, especially for graph construction methods based on data self-representativeness. This strategy eliminates the limitation of transductive learning methods, which require that all labeled and unlabeled samples be available for graph construction. Furthermore, by updating the edges of the nodes that are close to the inserted sample, possible false edges that might have been created due to the lack of data are eliminated.

In our previous paper (Dornaika, Dahbi, Bosaghzadeh, & Ruichek, 2017), we focused on the case of a single insertion and update for the multiple-observations case, where there are different instances all belonging to one class. However, our current work differs from our previous work in several aspects. First, while in the previous work there was only one insertion and update of the graph, in this work we address the most general case, in which the graph is built incrementally in the sense that it undergoes hundreds of expansions (including insertions and updates). Second, the main task in (Dornaika et al., 2017) was recognizing multiple observations over a graph, while in this paper we focus on graph-based semi-supervised label propagation and manifold learning. Third, the experiments in this paper are conducted using two graph construction methods to show that the proposed incremental construction algorithm is a general procedure that can be exploited by any graph construction technique, even though in this paper we focus only on two graph construction algorithms: the KNN graph and the TPLLC graph. Moreover, the conducted experiments show that our incremental graph is as informative as the batch graph even when the incremental one is built after hundreds of insertions and updates.

It should be noted that this work is also different from (Hacid & Yoshida, 2007) in several aspects. The goal of our method is affinity graph construction, which has an edge weighting phase, while the objective in (Hacid & Yoshida, 2007) is edge detection for a specific type of graph, namely the Relative Neighborhood Graph (RNG). Moreover, we conduct extensive experiments on a variety of post-graph learning tasks to evaluate the proposed technique in different scenarios. Furthermore, we adopt a non-symmetric coding function to find the similarity between the
nodes and detect the edges, whereas in (Hacid & Yoshida, 2007) a symmetric distance function is used. With a non-symmetric coding function, for two samples $x_i$ and $x_j$, the similarity between node $x_i$ and node $x_j$ is not necessarily equal to the similarity between node $x_j$ and node $x_i$; in other words, $sim(x_i, x_j)$ is not necessarily equal to $sim(x_j, x_i)$, where $sim$ is a non-symmetric similarity function. This property produces a non-symmetric matrix. Recently, it has been shown that methods that construct graphs through coding techniques (i.e., using data self-representation, which produces non-symmetric edge weights) can better capture the similarity between samples than classic symmetric functions like the Gaussian kernel or the cosine function (J. Wang et al., 2010). Consequently, the graphs constructed by these coding schemes can obtain better performance in post-processing tasks. It is worth noticing that some widely used graph construction techniques, like the KNN method, do not necessarily construct a symmetric similarity graph even though they use a symmetric similarity function (e.g., Gaussian weights): if $x_i$ is among the K nearest neighbours of $x_j$, $x_j$ is not necessarily among the K nearest neighbours of $x_i$. Hence, obtaining a symmetric graph, or even adopting a symmetric similarity function, is not a requirement in graph construction; what matters is that the graph correctly captures the similarity between the nodes. Finally, we stress the fact that for any post-processing task that requires a symmetric graph, one can obtain a symmetric graph $W_{sym}$ from a non-symmetric one $W_{nsym}$ by adopting Eq. (1):

$$W_{sym} = \frac{1}{2}\left(W_{nsym} + W_{nsym}^T\right) \qquad (1)$$
The rest of this paper is organized as follows: Section 2 describes the TPLLC (Two-Phase Locality-constrained Linear Coding) algorithm along with graph-based label propagation and manifold learning. Section 3 explains the motivation for incremental graph construction and our proposed algorithm. Section 4 presents the experimental results of the proposed method in different scenarios. Finally, we conclude the paper in Section 5.
2 Background
2.1 Representation ability
To analyze massive data and to perform online learning, apart from low computational complexity, good representation ability is another desired characteristic of graphs. The edge weights play a crucial role in the representation ability of a graph and affect the performance of the post-graph learning task. The Gaussian kernel is a widely used scheme for edge weighting (Belkin & Niyogi, 2003; He, Yan, Hu, Niyogi, & Zhang, 2005; X. Zhu, Ghahramani, & Lafferty, 2003). Recent graph construction methods attempt to retrieve the graph affinity matrix by exploiting data self-representativeness, such as the LLE graph (Roweis & Saul, 2000), the Non-Negative LLE graph, and sparse coding-based graphs (Wright, Yang, Ganesh, Sastry, & Ma, 2009; Dornaika & Bosaghzadeh, 2015).

To improve the sparsity and data representativeness of graphs, we recently proposed the Two-Phase Locality-constrained Linear Coding (TPLLC) scheme, which contains a pruning and coefficient reweighting phase (Dornaika & Bosaghzadeh, 2015; Dornaika, Bosaghzadeh, Salmane, & Ruichek, 2014a; Dornaika et al., 2014b). We have conducted a variety of experiments to evaluate the graph constructed by this coding method, and the obtained results show that the graph obtained by TPLLC coding has better representation ability. Furthermore, for post-graph learning tasks, it outperforms other graph construction techniques. In this paper, we use the TPLLC coding method as a case study for incremental graph building. Thus, TPLLC coding is exploited to represent new samples with respect to the database. We stress the fact that, although in this paper we use the TPLLC coding technique, the proposed incremental graph construction strategy is generic in the sense that it can be used with any other graph construction algorithm, like those presented in (Wright et al., 2009; Roweis & Saul, 2000; Belkin & Niyogi, 2003; He, Yan, et al., 2005; X. Zhu et al., 2003).
2.2 Review of Two-Phase Locality-constrained Linear Coding (Dornaika & Bosaghzadeh, 2015)
Recently, (J. Wang et al., 2010) proposed Locality-constrained Linear Coding (LLC) for dictionary learning and coefficient estimation. In LLC, the goal is to reconstruct a sample $\mathbf{y}$ based on a predefined dictionary $\mathbf{X}$ while minimizing the weighted $\ell_2$ norm of the reconstruction coefficients. In mathematical terms, it can be written as

$$\min_{\mathbf{b}} \; \|\mathbf{y} - \mathbf{X}\mathbf{b}\|_2^2 + \sigma \sum_{j=1}^{N} p_j^2 b_j^2, \quad \text{s.t. } \mathbf{1}^T \mathbf{b} = 1 \qquad (2)$$
where $\sigma$ is the balance parameter between the reconstruction error and the coefficient sparsity, and $p_j$ is a weight coefficient that controls how small or large each reconstruction coefficient should be; it can be defined by any distance function that relates $\mathbf{y}$ and $\mathbf{x}_j$. Eq. (2) has a closed-form solution:
$$\mathbf{b} = (\mathbf{X}^T\mathbf{X} + \sigma \mathbf{P}^T\mathbf{P})^{-1} \mathbf{X}^T \mathbf{y} \qquad (3)$$
where $\mathbf{P}$ is a diagonal matrix with $P_{jj} = p_j$, and the vector $\mathbf{b}$ is the reconstruction coefficient vector.

The goal of TPLLC is to use LLC coding to obtain sparse reconstruction coefficients and use them to construct a sparse affinity matrix that best describes the similarity between the samples. It has two phases: in the first phase, the weights of the coefficients are calculated and sample pruning is carried out; in the second phase, the sparse coefficients are obtained.

In the first phase, the coefficient vector $\mathbf{b}$ is estimated by the linear reconstruction of sample $\mathbf{y}$ from the database $\mathbf{X}$ via Eq. (3). The weight for each sample $\mathbf{x}_j$ is set by the Gaussian function $p_j = 1 - \exp(-\|\mathbf{y} - \mathbf{x}_j\|^2/t)$, where $t$ is the Gaussian width, heuristically set to the average distance between all database samples. For sample pruning, the coefficient vector $\mathbf{b}$ is considered a similarity measurement. Since we are interested in the samples with high similarity, we remove the samples with low reconstruction coefficients (low similarity). The similarity threshold is the average of the coefficient vector:

$$TH(\mathbf{y}) = \frac{1}{N}\sum_{j=1}^{N} |b_j| \qquad (4)$$
Hence, samples whose reconstruction contribution is lower than the average are removed. In the second phase, we again use Eq. (3) to estimate the similarity between the sample and the database. However, this time the database contains only the samples with high reconstruction contributions. By invoking Eq. (3), a new vector of coefficients is estimated. For the database samples that were not selected in the first phase, the edge weight is set to zero.
To get the graph matrix, the TPLLC coding is invoked for every data sample. For more details of the TPLLC coding method, we refer the interested readers to (Dornaika & Bosaghzadeh, 2015).
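To make the two-phase procedure concrete, the following Python sketch computes one row of the TPLLC affinity matrix. This is our illustration, not the authors' code: the function names are hypothetical, σ and t are passed as explicit parameters (the paper sets t heuristically to the average pairwise distance of the database), and the closed form of Eq. (3) is applied directly.

import numpy as np

def llc_code(y, X, sigma, t):
    # LLC coefficients of sample y over the dictionary X (one sample per column),
    # using the locality weights p_j = 1 - exp(-||y - x_j||^2 / t) of Sec. 2.2
    p = 1.0 - np.exp(-np.sum((X - y[:, None]) ** 2, axis=0) / t)
    # Closed-form solution of Eq. (3): b = (X^T X + sigma P^T P)^{-1} X^T y
    return np.linalg.solve(X.T @ X + sigma * np.diag(p ** 2), X.T @ y)

def tpllc_row(y, X, sigma, t):
    # Phase 1: code y over the full dictionary, then prune the samples whose
    # coefficient magnitude falls below the threshold TH(y) of Eq. (4)
    b = llc_code(y, X, sigma, t)
    keep = np.abs(b) > np.mean(np.abs(b))
    # Phase 2: re-code y over the pruned dictionary; pruned samples get weight 0
    w = np.zeros(X.shape[1])
    w[keep] = llc_code(y, X[:, keep], sigma, t)
    return w

Building the batch graph amounts to calling tpllc_row once per sample, with all other samples as the dictionary; this is exactly the cost that the incremental scheme of Section 3 avoids repeating from scratch.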
2.3 Graph-based label propagation
Semi-Supervised Learning (SSL) algorithms try to learn from a few labeled samples along with many unlabeled samples. Label propagation belongs to the family of graph-based SSL methods that try to derive the labels of the unlabeled data given the labeled data and an affinity graph ($W$) which captures the relations between the labeled and unlabeled samples. Let $y_i$ be the posterior probability of sample $x_i$ belonging to each of $C$ different classes, namely $y_i(c) = p(c|x_i)$, $c = 1, 2, \ldots, C$. For a labeled sample $x_i$, $y_i(c) = 1$ if $x_i$ belongs to the $c$-th class and $y_i(c) = 0$ otherwise. The goal is to estimate the posterior probability $y_u$ of an unlabeled sample $x_u$ belonging to the different classes. Well-known label propagation algorithms (which can also be viewed as classifiers) (de Sousa, Rezende, & Batista, 2013) include: Gaussian Random Fields (GRF) (X. Zhu et al., 2003), Local and Global Consistency (LGC) (Zhou, Bousquet, Lal, Weston, & Schölkopf, 2004), Laplacian Regularized Least Squares (LapRLS) (Belkin, Niyogi, & Sindhwani, 2006), Laplacian SVM (LapSVM), Robust Multi-class Graph Transduction (RMGT) (Liu & Chang, 2009), Flexible Manifold Embedding (FME) (Nie, Xu, Tsang, & Zhang, 2010), and Deformed Graph Laplacian (DGL) (Gong et al., 2015). In this paper, we adopt the GRF, LGC, and DGL methods for semi-supervised label propagation. These algorithms are used to evaluate the performance of the batch graphs and the incremental graphs.
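For reference, here is a minimal sketch of the GRF harmonic solution, assuming a symmetric affinity matrix (obtained, e.g., via Eq. (1)); the function name and interface are our assumptions:

import numpy as np

def grf_propagate(W, Y_l, labeled):
    # GRF (X. Zhu et al., 2003): harmonic solution f_u = -L_uu^{-1} L_ul Y_l
    # W: (n, n) symmetric affinity matrix; Y_l: (n_l, C) one-hot labels;
    # labeled: (n,) boolean mask marking the labeled nodes
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian L = D - W
    u = ~labeled
    F_u = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, labeled)] @ Y_l)
    return F_u.argmax(axis=1)                      # class index per unlabeled node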
2.4 Graph-based manifold learning
Manifold learning techniques try to project samples into another space so as to satisfy a desired criterion. Some techniques, like PCA and LDA, are global in the sense that they see the data globally. Other techniques are based on local criteria; graph-based manifold learning methods are examples of such cases that use the affinity graph to treat the data samples based on the relations between them. Different linear (He & Niyogi, 2004; He, Yan, et al., 2005; Xu, Zhong, Yang, & Zhang, 2010; He, Cai, Yan, & Zhang, 2005) and nonlinear (Roweis & Saul, 2000; Belkin & Niyogi, 2003) techniques, each with a specific objective function, have been proposed. Laplacian Eigenmaps (LE) (Belkin & Niyogi, 2003), Locally Linear Embedding (LLE) (Roweis & Saul, 2000), and Locality Preserving Projections (LPP) (He & Niyogi, 2004) are examples of such techniques, where the first two are nonlinear and the last one (LPP) is linear. The main advantage of linear techniques over nonlinear ones is that the former provide an explicit mapping function that allows the projection of out-of-sample data. In this paper, for evaluation purposes, we use the linear technique LPP, which provides a projection matrix based on a data graph.
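For reference, a minimal LPP sketch follows (our simplification; in practice the data are first reduced with PCA, as done in Section 4, which also keeps the matrix X D X^T well-conditioned):

import numpy as np
from scipy.linalg import eigh

def lpp(X, W, dim):
    # Locality Preserving Projections (He & Niyogi, 2004): solve the generalized
    # eigenproblem X L X^T a = lambda X D X^T a and keep the eigenvectors with
    # the smallest eigenvalues. X: (D, n) data matrix, W: (n, n) affinity graph.
    W = 0.5 * (W + W.T)                              # symmetrize, cf. Eq. (1)
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(X @ (D - W) @ X.T, X @ D @ X.T)   # eigenvalues ascending
    return vecs[:, :dim]                             # projection matrix A

A new sample x is then projected as A.T @ x, which is the explicit out-of-sample mapping mentioned above.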
3 Proposed incremental graph construction

3.1 Problem statement
Let the available data matrix be $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$, where each data sample $x_i$ is described by a $D$-dimensional feature vector; let the affinity matrix $W \in \mathbb{R}^{N \times N}$ be a square matrix where $w_{ij}$ represents the similarity between the two nodes $x_i$ and $x_j$; and let $X' = [x'_1, x'_2, \ldots, x'_M] \in \mathbb{R}^{D \times M}$ be $M$ recently received samples. The goal is to construct the affinity matrix $\widetilde{W} \in \mathbb{R}^{(N+M) \times (N+M)}$, which captures the similarity between the union of the available samples and the recently received ones. It should be noted that in our scenario the arrival of new samples happens frequently. Thus, the goal is to incrementally construct an affinity matrix associated with all already available data samples.
3.2 Motivation and basic idea
Adding new nodes into a graph has various uses. A newly uploaded video on YouTube that should be tagged and related to other videos, or a new member on LinkedIn who should be linked to other members, are two examples in which new nodes must be added into a previously constructed graph to represent the relations or similarities associated with the union of data. The simplest solution is to join all data together and construct a new graph from scratch. In the sequel, the graph constructed by this scenario is referred to as the "batch graph". Although this is the simplest solution, it has a huge computational complexity, especially when we have a stream of data that arrives very often.
The second solution is to construct the graph incrementally: there is a seed graph, and new samples are added to it as they arrive. This scenario has a lower computational complexity since it avoids the construction of a graph from scratch each time new samples are received. In this paper, we propose an incremental graph construction with two phases, insertion and update: in the former, new nodes are added to the seed graph; in the latter, the structure of the seed graph is reevaluated and some edges may be added to or removed from it. We assume that each newly received sample will interact with only a few nodes, namely its closest neighbors. This interaction means that (i) the new sample will be linked to the most similar samples, and (ii) due to the introduction of the new sample, the links of the closest samples (in the previously available graph) should undergo some revision and update.

The basic idea of the proposed method is demonstrated in Fig. 1, where there are 8 available nodes ($P_1$ to $P_8$) and one new node ($P_0$). The lines between the nodes represent the edges, and the line thickness shows the similarity between a pair of nodes. The goal is to construct a graph that shows the similarity between all nodes. At first, $P_0$ is inserted into the graph and the similarity between $P_0$ and the available nodes is calculated; as we see in Fig. 1(b), it has similarity with three nodes (namely, $P_1$, $P_2$, and $P_3$). In the second part, the edges and weights of the close nodes are reevaluated for possible changes. The new weights of these nodes are plotted in Fig. 1(c). As we observe, $P_1$ has lost its edge with $P_6$ because it found a more similar node (i.e., $P_0$). Moreover, due to the insertion of $P_0$, the weights of its close nodes are modified (either strengthened or weakened). This shows that the insertion of a new node can change the weights of the nodes similar to it in the graph. Based on these observations, we conclude that the insertion of a new node into a graph does not affect the whole graph; it only affects the edges of the nodes that are close to the new node. Consequently, after inserting a node into the graph, it is not necessary to evaluate all nodes again, only the nearby ones.
3.3 Proposed incremental graph construction
In this section, we describe our proposed incremental graph construction method, which enables us to add new nodes into a previously constructed graph. The proposed technique avoids constructing a graph from scratch each time we receive new samples. Consequently, it has a lower computational complexity compared to the batch graph construction method.

Figure 1: Basic idea of our proposed method. (a) 9 nodes, where $P_1$ to $P_8$ are initial nodes (violet dots) connected together and $P_0$ (red dot) is the new node. (b) The new node is connected to its similar (close) samples, namely $P_1$ to $P_3$. (c) The edges and edge weights of the three similar nodes are reevaluated: some edges are removed and some edge weights have changed.

Consider that we have an initial affinity matrix $W$ which represents the similarity between the nodes of the initial data matrix $X$. We receive a new set of data, $X'$, and the goal is to construct a larger affinity graph $\widetilde{W}$ which captures the relations over the union of the data (i.e., $\widetilde{X} = X \cup X'$). We can split the new affinity matrix as below:

$$\widetilde{W} = \begin{bmatrix} \widetilde{W}_{dd} & \widetilde{W}_{dn} \\ \widetilde{W}_{nd} & \widetilde{W}_{nn} \end{bmatrix} \qquad (5)$$
where $\widetilde{W}_{dd}$ is the initial seed graph, which represents the similarity between the initial nodes; $\widetilde{W}_{dn}$ shows the similarity between the initial nodes and the new ones; and $\widetilde{W}_{nd}$ and $\widetilde{W}_{nn}$ show the similarity of the new nodes with respect to the initial nodes and the new nodes, respectively.
In the insertion phase, we calculate the similarity between each new sample and the whole data matrix, which gives us a $1 \times (N+M)$ vector. Although the similarity can be obtained using any similarity function, we study two types: TPLLC coding (Dornaika & Bosaghzadeh, 2015) and the Gaussian function. We put the obtained similarity vector directly into the corresponding row of the $[\widetilde{W}_{nd} \; \widetilde{W}_{nn}]$ submatrix, which represents the similarity between each new sample and the whole data matrix.

In the update phase, we revise the weights of the nodes that are close to each added sample for possible changes. At first, we identify the nodes that are close to the new sample by selecting the nodes that have a high similarity value with the new sample. To select the close nodes, one can use a fixed number or an adaptive one. In the fixed case, a predefined number of the samples with the highest similarity values is selected; in the adaptive case, a threshold value computed by any statistical function can be adopted. Using either the fixed or the adaptive threshold, we select a subset ($X_s$) which contains $Q$ samples whose weights should be reevaluated for possible changes. In the reevaluation, weights may decrease or increase, and edges may be created or eliminated. We calculate the similarity of each sample of the selected subset with respect to the whole data, which forms a $1 \times (N+M)$ vector. The calculated similarity vector goes directly into the corresponding row of the $[\widetilde{W}_{dd} \; \widetilde{W}_{dn}]$ submatrix. The remaining samples in the initial matrix correspond to nodes that are far from the new samples, so their weights remain intact: we copy their weights from the seed graph $W$ directly into the $\widetilde{W}_{dd}$ subgraph and set the corresponding rows in the $\widetilde{W}_{dn}$ subgraph to zero. This is justified by the fact that these nodes are far away from (dissimilar to) the recent samples; consequently, there is no interaction between them and the similarity is zero.

Algorithm 1 summarizes the proposed incremental graph construction method. When data arrive in chunks, Algorithm 1 is recursively invoked for every chunk. Note that although the estimated affinity matrix $\widetilde{W}$ is asymmetric, for any post-processing task which requires a symmetric graph, one can always adopt Eq. (1) to obtain a symmetric graph.

Fig. 2 shows a visual illustration of the incremental graph construction using a seed graph with 6 nodes and 9 other nodes added in three stages. As we can see, by adding new nodes to the graph, new edges are created either for the new nodes or for their similar nodes. Moreover, it should be emphasized that by adding new nodes to the graph, it is possible that some of the existing nodes lose some of their edges.
3.4 Analyzing the computational complexity
It is evident that the computational complexity of the proposed method depends on the computational complexity of the adopted coding scheme. Apart from that, it also depends on three factors: $N$, the number of available samples; $r$, the number of new samples; and $Q$, the number of affected nodes. Consider that the computational complexity of the adopted coding scheme is $O(N^p)$.
Data: Initial data matrix $X$, its affinity matrix $W$, and new sample matrix $X'$ of $M$ samples ($M \geq 1$)
Result: New affinity matrix $\widetilde{W}$
for $i = 1, \ldots, M$ do
    // New sample coding phase
    Code $x'_i$ with respect to $\widetilde{X} = X \cup X'$ using the TPLLC algorithm;
    Copy the obtained coefficients to the corresponding row of $[\widetilde{W}_{nd} \; \widetilde{W}_{nn}]$;
    // Updating phase
    According to the obtained coefficients, select the set of samples $S$ whose edges should be updated;
    for $j = 1, \ldots, Q = |S|$ do
        Code $x_{s_j}$ with respect to $\widetilde{X}$;
        Copy the obtained coefficients to the corresponding row of $[\widetilde{W}_{dd} \; \widetilde{W}_{dn}]$;
    end
    for each non-selected sample of $X$ do
        Copy the corresponding row from $W$ to $\widetilde{W}_{dd}$;
        Set the corresponding row in $\widetilde{W}_{dn}$ to zero;
    end
end

Algorithm 1: Proposed incremental graph construction.
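A compact Python rendering of Algorithm 1 might look as follows. This is a sketch under our assumptions: code_row is any row coder (e.g., the hypothetical tpllc_row of Section 2.2), each sample is coded over all other samples (itself excluded) as its dictionary, the Q closest nodes are chosen with the fixed scheme, and the update set is collected over the whole chunk so that duplicated nodes are re-coded only once (cf. the remark at the end of Section 3.4).

import numpy as np

def code_excluding(code_row, X_all, idx):
    # Similarity row of sample idx over all other columns of X_all; the sample
    # itself is left out of the dictionary and its own weight is set to zero
    mask = np.ones(X_all.shape[1], dtype=bool)
    mask[idx] = False
    row = np.zeros(X_all.shape[1])
    row[mask] = code_row(X_all[:, idx], X_all[:, mask])
    return row

def insert_chunk(W, X, X_new, code_row, Q=5):
    # One pass of Algorithm 1: insert the M columns of X_new into the graph W
    # built over X, then update the Q closest existing nodes per inserted sample
    N, M = X.shape[1], X_new.shape[1]
    X_all = np.hstack([X, X_new])
    W_big = np.zeros((N + M, N + M))
    W_big[:N, :N] = W                  # far nodes keep their seed-graph rows,
    to_update = set()                  # and their entries in W_dn stay zero
    for i in range(M):                 # insertion phase: fill [W_nd  W_nn]
        row = code_excluding(code_row, X_all, N + i)
        W_big[N + i, :] = row
        to_update.update(np.argsort(row[:N])[-Q:].tolist())
    for j in to_update:                # update phase: re-code the close nodes
        W_big[j, :] = code_excluding(code_row, X_all, j)
    return W_big, X_all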
Figure 2: Illustration of the incremental graph construction. (a) The seed graph with 6 nodes. (b) The new graph when three nodes are added to the seed graph. (c) The previous graph after adding three more nodes. (d) The final graph when the last chunk of data is inserted. As we observe, while most of each graph is preserved after inserting new nodes, some of the edges of the previous graph may be pruned due to the new samples.
Hence, for the batch graph, since we have $N + r$ nodes that should be coded, the computational complexity is $O((N+r) \cdot (N+r-1)^p)$. The proposed method has two phases: insertion, with a computational cost of $O(r \cdot (N+r-1)^p)$, and updating, with a computational cost of $O(r \cdot Q \cdot (N+r-1)^p)$. Consequently, the total computational cost of the incremental graph is $O((r \cdot Q + r) \cdot (N+r-1)^p)$.

Comparing the computational complexity of the batch and incremental modes, we can see that if the total number of nodes to be updated (i.e., $r \cdot Q$) is smaller than the number of available samples, the computational complexity of the proposed incremental method is less than that of the batch graph. Obviously, the largest possible number of nodes to update equals the number of available nodes; if that happens, the computational complexities of both methods are equal. However, in practice (as we will see in the experiments), with a proper choice of $Q$, not only does this not happen, but the computational complexity of the incremental graph is orders of magnitude less than that of the batch graph. Moreover, the efficiency gain depends strongly on the ratio $N/(r \cdot Q)$. Hence, as the number of samples $N$ increases, the efficiency gain increases too. It is worth noting that in the incremental graph construction, $r$ and $Q$ can be fixed, whereas $N$ increases as new samples are added.

As a numerical example, consider the computational complexity of the TPLLC graph, which is $O(N \cdot (N-1)^3)$. Let $N = 2000$, $r = 5$, and $Q = 5$ (our experiments show that this is a good choice). For the batch graph, the computational cost is $O((N+r)(N+r-1)^3) = O(1.6136 \times 10^{13})$, whereas for the incremental graph it is $O((rQ+r)(N+r-1)^3) = O(2.4144 \times 10^{11})$, an efficiency gain of about 67. A similar calculation gives an efficiency gain of 33.5 for $r = 10$. It should be mentioned that when a chunk of new nodes is inserted into the graph, there might be duplicated nodes in the update phase; updating these duplicated nodes only once increases the efficiency even more.
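Since the per-coding factor $(N+r-1)^p$ cancels in the ratio, the gain reduces to $(N+r)/(rQ+r)$; a quick check of the numbers above in plain Python:

N, r, Q = 2000, 5, 5
batch = (N + r) * (N + r - 1) ** 3             # recode all N + r nodes
incremental = (r * Q + r) * (N + r - 1) ** 3   # r insertions + r * Q updates
print(batch, incremental, batch / incremental)
# -> about 1.6136e+13, 2.4144e+11, and a gain of ~66.8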
4 Experiments
A graph cannot be used to evaluate the algorithm that constructs it. To assess a constructed graph, one needs to quantify the performance of the post-graph learning tasks that use it. In this paper, we use different semi-supervised post-graph learning tasks in different scenarios to evaluate the proposed incremental graph construction method. It is worth mentioning that the main goal of the following experiments is to demonstrate that the graph constructed by the proposed incremental technique has a performance in learning tasks similar to that of the batch graph.

We have conducted several types of experiments to evaluate the proposed method. First, we briefly describe the image databases used in the evaluations. Then, in the first experiment, we compare the performance of the incremental graph and the batch graph in semi-supervised scenarios using Gaussian Random Fields (GRF) and Local and Global Consistency (LGC) on three face databases. In the second experiment, we study the effect of varying the number of updated nodes (fixed selection scenario) on the accuracy and the computational cost of the incremental graph. In the third and fourth experiments, the USPS and LFW databases are used to evaluate the incremental graph on large datasets. In the fifth experiment, we compare the incremental graphs obtained by TPLLC coding and by the KNN method. In the sixth experiment, incremental graph construction is evaluated using the recently proposed Deformed Graph Laplacian label propagation method. In the seventh experiment, we use Locality Preserving Projections (He & Niyogi, 2004) to evaluate the proposed incremental graph with a graph-based manifold learning method. The last experiment compares the elapsed time of the batch and incremental graph construction techniques.
4.1 Datasets
We use one handwritten digit dataset (USPS) and five face datasets, namely LFW, Extended Yale, PF01, PIE, and FERET. In the following, we briefly describe each dataset.

1. USPS: The US Postal (USPS) handwritten digit dataset is derived from a project on recognizing handwritten digits on envelopes (Hull, 1994). Each digit image was resized to 16×16 pixels. The database has ten classes, each corresponding to a digit from zero to nine. The database has 11000 images in total, with 1100 images per class. For the evaluations, we randomly select 300 samples from each class; hence, in total, 3000 samples are used. Fig. 3a shows some samples of this database.

2. Labeled Faces in the Wild (LFW): This is a large-scale database of face photographs designed for unconstrained face verification, with variations of pose, illumination, expression, misalignment, occlusion, etc. The dataset contains more than 13000 images of faces collected from the web. We used a subset of the aligned LFW-a (Taigman, Wolf, & Hassner, 2009), which consists of 141 subjects with no fewer than 11 and no more than 200 samples per subject. This subset contains 3408 images. In Figure 3b, we show some samples of this database.

3. Extended Yale (http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html): We use a subset containing 1774 face images of 28 individuals. The images are cropped and contain illumination and facial expression variations. The image size is 192×168 pixels with 8-bit grayscale. The images are rescaled to 32×32 pixels in our experiments. Some samples are shown in Fig. 3c.

4. PF01 (http://nova.postech.ac.kr/special/imdb/imdb.html): It contains the true-color face images of 103 people, 53 men and 50 women, with 17 images (1 normal face, 4 illumination variations, 8 pose variations, 4 expression variations) per person. There are three kinds of systematic variations in the database: illumination, pose, and expression. The images are rescaled to 32×32 pixels in our experiments. Fig. 3d illustrates some samples of this database.

5. PIE (http://www.ri.cmu.edu/projects/project_418.html): We use a reduced dataset containing 1926 face images of 68 individuals. The images contain pose, illumination, and facial expression variations. The images are resized to 32×32 pixels with 8-bit grayscale. A small set of images is shown in Fig. 3e.

6. FERET (http://www.itl.nist.gov/iad/humanid/feret/): This corpus contains more than 1000 subjects and consists of 14051 images of human faces, with views ranging from frontal to left and right profiles. The proposed method is evaluated on a subset of the FERET database, which includes 1400 images of 200 distinct subjects, each having seven images. The subset involves variations in facial expression, illumination, and pose. In our experiments, the facial portion of each original image is cropped automatically based on the location of the eyes and resized to 32×32 pixels. Some samples of this subset are shown in Fig. 3f.
Figure 3: Some samples of (a) the USPS handwritten digit dataset, (b) the LFW dataset, (c) the Extended Yale dataset, (d) the PF01 dataset, (e) the PIE dataset, and (f) the FERET dataset.
4.2 Batch graph vs. incremental graph
The goal of this experiment is to verify whether the graph obtained incrementally can yield performance similar to the batch graph. In this experiment, we use Gaussian Random Fields, a semi-supervised learning technique, as the post-graph learning method, and three face datasets, namely Extended Yale, PF01, and PIE. We assume that the labeled data are available from the start and the unlabeled samples arrive in sequences. For each dataset, we randomly select l samples per class as labeled and the rest of the data as unlabeled data to be added incrementally. Thus, the size of the initial (seed) graph is l × C, where C denotes the number of classes. To remove redundancy and reduce the feature size, we reduce the dimensionality of each dataset using the projection matrix obtained by applying PCA on the labeled data such that 99% of the eigenvalue energy is preserved.

For batch graph construction, we construct the graph using all labeled and unlabeled samples. For incremental graph construction, we construct a seed graph using only the labeled samples. Then, chunks of unlabeled samples are added to the existing graph recursively, with 10 images per chunk, until all unlabeled samples are inserted into the graph. GRF label propagation is then carried out over the graph to estimate the labels of the unlabeled data. The percentage of correct labeling over the unlabeled samples is used to evaluate the performance of the two graph construction schemes. For both the batch graph and the incremental graph, the above process is repeated 10 times with different random sets of labeled and unlabeled samples. In Table 1, we report the correct labeling accuracy on the three above datasets for 10 different splits.

Table 1: The accuracy of correct labeling (%) with 10 different random combinations of labeled and unlabeled sets, using the batch graph and the incremental graph on three different datasets. The number of labeled samples is 15 for Extended Yale, 9 for PF01, and 15 for PIE.
Dataset    Graph    1      2      3      4      5      6      7      8      9      10     Mean
Ext.Yale   Batch    94.46  94.83  94.01  97.86  92.47  96.23  96.09  94.98  91.36  95.64  94.79
           Incre.   94.16  94.61  94.09  97.78  92.39  96.38  96.09  94.83  91.28  95.35  94.70
PF01       Batch    87.38  85.51  92.17  82.01  91.24  83.53  82.36  65.19  88.67  91.59  84.96
           Incre.   87.03  85.16  91.82  82.59  91.35  83.53  81.78  65.07  88.9   91.82  84.90
PIE        Batch    87.86  86.53  81.35  92.27  85.65  86.42  91.06  93.60  88.30  86.98  88.00
           Incre.   87.20  86.20  81.13  91.83  85.65  86.09  91.62  93.49  87.42  86.42  87.70
To get an overall view of the accuracy of the batch graph and the incremental graph, we use two different numbers of labeled samples (l), and for each one, 10 random sets of labeled and unlabeled samples are adopted. In Table 2, we report the average accuracy over the 10 random splits.

Table 2: Average correct labeling (%) over 10 random splits of labeled and unlabeled sets on the Ext. Yale, PF01, and PIE databases. For each database, two different numbers of labeled samples are used.

Dataset              Ext.Yale        PF01            PIE
l                    15     20       9      12       15     20
Batch graph          94.79  96.00    84.96  86.93    88.00  89.48
Incremental graph    94.70  96.00    84.90  86.80    87.70  89.10
As we can see from the results in Tables 1 and 2, the incremental graph shows performance similar to the batch graph when using GRF label propagation.
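The protocol of this experiment can be summarized by the following sketch, which reuses the hypothetical code_excluding, insert_chunk, and grf_propagate helpers from the previous sections (the PCA reduction and the 10 random repetitions are omitted for brevity):

import numpy as np

def evaluate_incremental(X_l, Y_l, X_u, y_u, code_row, chunk=10):
    # Seed graph over the labeled samples only
    n_l = X_l.shape[1]
    W = np.vstack([code_excluding(code_row, X_l, j) for j in range(n_l)])
    X = X_l
    for s in range(0, X_u.shape[1], chunk):        # insert the unlabeled data
        W, X = insert_chunk(W, X, X_u[:, s:s + chunk], code_row)
    W = 0.5 * (W + W.T)                            # symmetrize, Eq. (1)
    labeled = np.arange(X.shape[1]) < n_l
    pred = grf_propagate(W, Y_l, labeled)          # GRF over the final graph
    return np.mean(pred == y_u)                    # correct labeling rate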
4.3 Studying interaction schemes: varying the number of updated nodes
As explained in the section on the proposed method, after appending the incoming chunk to the graph, we select the samples that are similar to the incoming chunk and update their edges and weights. Adaptive and fixed selection schemes are two possible options. In the previous experiment, we used the adaptive scheme: for each incoming sample, we use the average of the obtained edge weights as a threshold, and samples with weights higher than the threshold are selected as similar samples for the updating phase. In this experiment, we compare the adaptive selection with the fixed selection, where for each incoming datum a predefined number of nodes is selected for the updating process. This selection is based on the sorted list of the estimated coefficients (descending order of similarity); both options are sketched after this section's results.

The experimental setup is the same as in the previous experiment: l samples are selected as labeled, and the dimensionality of the data is reduced with PCA. The labeled data are used to construct the seed graph; then we incrementally add the unlabeled samples chunk by chunk to the graph until all unlabeled samples are inserted. In both scenarios (adaptive selection and fixed selection), the coefficients obtained by TPLLC are used as similarity values. In this experiment, we use the PF01 and PIE databases with 9 and 15 samples as labeled, respectively, and the rest as unlabeled samples. For adaptive selection, the average of the coefficients is used as the threshold value. For fixed selection, we try different numbers of updated nodes, going from 1 to 50; for each one, 10 different combinations of labeled and unlabeled samples are used, and the average accuracy is calculated.

Fig. 4 illustrates the obtained accuracy as a function of the number of closest nodes for the PIE database when 15 samples are labeled. In Table 3, we report the time elapsed to insert the last chunk into the graph, along with the corresponding accuracy, for the PF01 and PIE databases. All tests are conducted using MATLAB running on a PC equipped with an Intel Core i7 CPU at 2.93 GHz and 8 GB of RAM.

As we can see, in general, reducing the number of nodes to be updated for each chunk of data decreases the accuracy. We expected such behavior, since by reducing the number of updated nodes, some of the nodes that are close to the newly added samples, and thus should have their edges updated, remain intact. This prevents the graph from being completely updated and matched to the new samples. On the other hand, reducing the number of modified nodes helps the algorithm construct the graph faster, since less computation is needed for each chunk of samples. In the case of the PF01 dataset, with 5 nodes to be updated per sample, the whole updating process associated with 10 new images takes 9 seconds. This elapsed time corresponds to the TPLLC coding of 10 + 10 × 5 = 60 images with a dictionary containing 1751 images.

Considering the trade-off between accuracy and computational time, we can see that reducing the number of updated nodes to 5 causes no significant drop in accuracy (less than 1%) while drastically reducing the total computational cost, making the algorithm about 20 times faster on both the PF01 and PIE databases. Hence, in the rest of the experiments in this paper, we select only 5 samples per datum to be updated after inserting it into the graph.
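In terms of the insert_chunk sketch of Section 3.3, the two selection schemes differ only in how the update set is chosen; a hypothetical rendering of both options:

import numpy as np

def select_nodes(row_old, scheme="fixed", Q=5):
    # row_old: the new sample's similarity row over the existing nodes
    if scheme == "fixed":
        return np.argsort(row_old)[-Q:]            # Q most similar nodes
    # adaptive: nodes whose weight exceeds the mean coefficient magnitude
    return np.where(row_old > np.abs(row_old).mean())[0]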
4.4 Deformed Graph Laplacian
In this experiment, we evaluate the incremental and batch graphs with the recently proposed DGL label propagation method. We use four face datasets, namely Extended Yale, PF01, PIE, and FERET, with 10, 6, 10, and 3 labeled samples, respectively. The rest of the data are used as unlabeled samples. For each dataset, 10 random combinations of labeled and unlabeled samples are used for graph construction. In the DGL label propagation method, we set γ and β to 1. The average accuracy over the 10 random combinations is reported in Table 4. As we can see, except for the PF01 database, the accuracy obtained using the incremental graph is slightly better than that obtained from the batch graph.
Figure 4: Recognition rate (%) on the PIE database as a function of the number of updated nodes for the fixed selection scheme. The results are obtained using 15 labeled samples and the rest as unlabeled. The recognition rate is averaged over 10 different random splits.
Table 3: Varying the number of updated nodes after adding one chunk to the graph. The first column shows the number of most similar nodes whose edges and weights are updated. The Acc. columns show the average accuracy over 10 different splits. The Chunk insertion columns report the average total time (in seconds) to insert the last chunk (adding the chunk of 10 images to the graph and updating the close nodes).

Num. updated    PF01                          PIE
nodes           Acc. (%)  Chunk ins. (sec)    Acc. (%)  Chunk ins. (sec)
Adaptive        84.90     262                 87.70     415
50              84.88     83                  87.69     110
30              84.86     55                  87.75     71
20              84.91     35                  87.74     52
10              84.61     16                  87.97     23
5               84.16     9                   87.81     12
2               82.42     5                   86.52     6
1               81.60     3                   86.04     4
0               82.34     2                   86.47     2
Table 4: Correct labeling (%) of label propagation via Deformed Graph Laplacian (DGL) using 4 face datasets. For the Extended Yale, PF01, PIE, and FERET databases, we use 10, 6, 10, and 3 labeled samples, respectively.

Graph\Dataset        Ext.Yale  PF01   PIE    FERET
Batch graph          93.11     77.96  81.51  74.61
Incremental graph    93.28     77.17  81.80  75.71
4.5 Face recognition in the wild
To evaluate the proposed method on large datasets, we use the LFW-a dataset. The images in this dataset are more challenging since they were collected from the internet and captured in completely uncontrolled situations. Hence, the recognition rate on this database is lower than the accuracies obtained on the other datasets. Another significant difference is the variation in the number of samples per class, which ranges from 11 to 200. We select 10 samples from each class as labeled and set the rest of the data as unlabeled. The dimensionality of the labeled and unlabeled data is reduced by applying PCA on the labeled data. The batch graph is constructed using the union of the labeled and unlabeled data. For the incremental graph, after constructing the seed graph using the labeled samples, the unlabeled data are added to the seed graph in chunks of 10 images. GRF label propagation is used to determine the labels of the unlabeled samples. We repeat the above process for 10 different random splits of labeled and unlabeled samples. The average recognition rate over the unlabeled samples is reported in Table 5. As expected, the accuracy is lower than the results obtained on the other face datasets. We emphasize that the performance of face recognition on LFW-a is much lower than that obtained for the face verification problem on the same dataset. The results obtained with the incremental method are slightly better than those of the batch graph.

Table 5: Comparing the accuracy (%) of the batch graph and the incremental graph on the LFW-a database.

Number of labeled samples    10
Batch graph                  38.93
Incremental graph            39.02

4.6 Digit dataset
In this section, we evaluate the proposed incremental graph on the USPS handwritten digit dataset. The performance of the proposed incremental graph and the batch graph is evaluated with the GRF label propagation method. For the evaluation, l samples per class are selected as labeled and the rest of the data as unlabeled. We reduce the dimensionality of the labeled and unlabeled data by PCA. For the batch graph, the whole reduced dataset is used at once to construct the graph. For the incremental graph, after constructing a seed graph using the labeled samples, the unlabeled data are inserted in chunks of 10 images. Different numbers of labeled samples l, going from 10 to 50, are adopted, similar to (Cheng, Yang, Yan, Fu, & Huang, 2010). For each l, 10 different combinations (splits) of labeled and unlabeled samples have been tested. Table 6 shows the accuracy of each split separately, along with the average accuracy for each number of labeled samples l. As we can see, the proposed method was able to obtain even better results than the batch graph.

Table 6: Accuracy (%) of correct labeling of unlabeled data for the USPS handwritten digit database. Five different numbers of labeled samples are adopted, and for each one, 10 different combinations of labeled and unlabeled samples have been evaluated.

 l   Graph   1     2     3     4     5     6     7     8     9     10    Average
10   Batch   67.4  57.1  71.4  68.7  54.3  68.6  70.3  67.6  59.1  70.7  65.53
     Inc.    71.5  69.4  71.4  64.5  64.7  71.2  66.9  70.2  58.9  68.1  67.69
20   Batch   77.6  78.8  78.9  80.0  79.2  80.5  78.3  78.8  76.9  81.5  79.06
     Inc.    83.2  82.3  82.9  81.7  83.0  84.2  80.9  84.7  77.4  79.8  82.02
30   Batch   83.6  83.3  82.4  83.2  84.3  85.5  83.8  84.1  83.3  85.4  83.89
     Inc.    87.4  86.6  86.7  85.3  87.0  88.4  86.1  87.3  85.5  87.1  86.76
40   Batch   85.8  86.7  86.0  85.3  87.3  87.3  87.0  87.4  86.9  87.3  86.72
     Inc.    89.0  89.9  89.1  88.1  89.8  89.7  89.7  90.6  89.2  89.5  89.47
50   Batch   88.0  88.9  87.8  86.2  89.2  88.6  87.9  88.6  89.6  88.5  88.32
     Inc.    90.8  91.0  90.7  89.2  90.7  91.0  90.3  91.0  90.7  90.4  90.58
4.7 Incremental TPLLC vs. other incremental schemes
This section describes two groups of experiments. In the first, we evaluate and compare the proposed incremental graph construction using two different coding schemes; in the second, we compare the proposed method with the algorithms proposed by Hacid (Hacid & Yoshida, 2007) and Rayar (Rayar et al., 2015). To the best of our knowledge, these algorithms were the only ones that addressed incremental graphs for image data.

In the first group of experiments, we apply the same proposed incremental algorithm for graph construction but with the KNN graph construction method instead of the TPLLC method. In other words, the coefficients of the affinity matrix are derived using the K nearest neighbor samples with a Gaussian weighting mechanism. The goal is to compare the performance of these two graph construction methods. For the KNN method, we use the optimum parameters from (Dornaika & Bosaghzadeh, 2015). The number of samples to be updated after the insertion of each sample is set to 5. We use five face datasets, namely Extended Yale, PF01, PIE, FERET, and LFW, with 10, 6, 10, 3, and 10 labeled samples per database, respectively. For each dataset, 10 different random combinations of labeled and unlabeled samples are adopted. GRF label propagation is used to retrieve the labels of the unlabeled samples. Table 7 reports the average accuracy of correct labeling over the random splits. These results generalize the superiority of the batch TPLLC graphs over the batch KNN graphs (as shown in (Dornaika & Bosaghzadeh, 2015)) to the incremental variants of both types of coding. It should be mentioned that although only two edge weighting schemes are adopted in this paper, any similarity coding method (LLC, ℓ1 coding, LLE) can exploit the proposed strategy. Hence, our algorithm is not limited to any specific similarity metric.

Table 7: Accuracy (%) of the GRF label propagation method using incremental graph construction with TPLLC and KNN coding.
Coding method\Dataset    Ext.Yale  PF01   PIE    FERET  LFW
Incremental KNN          80.26     48.22  47.08  42.63  16.20
Incremental TPLLC        94.26     76.37  83.20  76.40  39.02
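For reference, the KNN-based affinity row used in this comparison can be sketched as a simple row coder, a hypothetical drop-in replacement for the TPLLC coder in the incremental procedure (K and the Gaussian width t are assumed parameters; the experiments use the optimum values from (Dornaika & Bosaghzadeh, 2015)):

import numpy as np

def knn_row(y, X, K=5, t=1.0):
    # Gaussian weights exp(-||y - x_j||^2 / t) on the K nearest neighbors of y
    # among the columns of X; all other entries are zero
    d2 = np.sum((X - y[:, None]) ** 2, axis=0)
    row = np.zeros(X.shape[1])
    nn = np.argsort(d2)[:K]
    row[nn] = np.exp(-d2[nn] / t)
    return row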
In the second group of experiments, we compare the proposed algorithm with those of Hacid (Hacid & Yoshida, 2007) and Rayar (Rayar et al., 2015). These two algorithms propose incremental updates of a relative neighborhood graph (RNG). We select 30, 10, 14, and 4 samples per class, which makes 840, 1070, 952, and 800 labeled images, and leave the remaining 934, 749, 974, and 600 images as unlabeled for the Extended Yale, PF01, PIE, and FERET databases, respectively. The labeled samples are used to construct the seed graph, and the unlabeled samples are considered the new samples to be added to the seed graph. Since both the Hacid and Rayar algorithms insert one node at a time, for a fair comparison, in each iteration of the proposed method we also add only one sample. After inserting all of the new samples into the seed graph, we adopt the LGC method for label propagation to estimate the labels of the unlabeled samples. We repeat this process for 10 combinations of labeled/unlabeled samples and report the average correct label estimation in Table 8. As we can see, the proposed incremental graph construction algorithm achieves better accuracy than the incrementally constructed graphs of Hacid (Hacid & Yoshida, 2007) and Rayar (Rayar et al., 2015). It should be noted that although all of the algorithms in Table 8 exploit locality and adopt an update strategy to incrementally construct a graph, the proposed method is based on the TPLLC coding scheme, while the Hacid and Rayar algorithms are based on a simple distance metric function.
Table 8: Accuracy (%) of the LGC label propagation method adopting the Hacid, Rayar, and proposed incremental graph construction techniques.

Inc. method\Dataset              Ext.Yale  PF01   PIE    FERET
Hacid (Hacid & Yoshida, 2007)    69.69     54.91  37.98  59.22
Rayar (Rayar et al., 2015)       70.01     54.74  38.24  60.28
Prop. method (TPLLC coding)      94.40     88.42  78.10  84.08

4.8 Manifold learning
In this experiment, we compare the batch graph and the proposed incremental graph in a manifold learning algorithm. The semi-supervised manifold learning scenario is implemented as follows. We divide each dataset into three parts: labeled, unlabeled, and test. The goal is to compare the performance of the graphs constructed on the labeled and unlabeled data when the post-graph learning task is linear Locality Preserving Projections (LPP) (He & Niyogi, 2004). Four face datasets, namely Extended Yale, PF01, PIE, and FERET, are used in this experiment. For the batch graph, we use the labeled and unlabeled data for graph construction. For the incremental graph, a seed graph is constructed using only the labeled samples; then, in chunks of 10 images, we insert the unlabeled data into the seed graph until all unlabeled data are added to the graph. We learn the linear transform associated with LPP using the constructed graphs along with the labeled and unlabeled data. We recall that LPP does not consider label information in its framework. Then, we project the labeled and test data into the obtained space. The test data are then classified using the nearest neighbor classifier over different LPP embedding dimensions (a sketch of this evaluation step is given below). This process is repeated for 10 different combinations of labeled, unlabeled, and test samples, and the average over the 10 combinations is calculated and reported.

Fig. 5 illustrates the average recognition rate over different LPP dimensions for the Extended Yale, PF01, PIE, and FERET datasets. The maximum dimension for each database depends on the dimension of the samples in that dataset. As we can see, the incremental graph has performance similar to the batch graph over all dimensions. This shows that the graph constructed by the proposed incremental method performs similarly to the batch graph in a manifold learning scenario.
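For completeness, the projection-and-classification step referenced above can be sketched as follows (a hypothetical rendering that reuses the lpp helper of Section 2.4):

import numpy as np

def lpp_evaluate(W, X_l, y_l, X_u, X_test, y_test, dim):
    # Learn LPP on labeled + unlabeled data and the graph W, then classify the
    # test samples by nearest neighbor in the embedded space
    A = lpp(np.hstack([X_l, X_u]), W, dim)      # projection matrix (D, dim)
    Z_l, Z_t = A.T @ X_l, A.T @ X_test          # embedded labeled / test data
    d2 = ((Z_t.T[:, None, :] - Z_l.T[None, :, :]) ** 2).sum(axis=2)
    pred = y_l[np.argmin(d2, axis=1)]           # label of nearest labeled sample
    return np.mean(pred == y_test)              # recognition rate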
Figure 5: Recognition rate using different embedding dimensions of LPP, adopting the batch and incremental graphs. (a) Extended Yale, (b) PF01, (c) PIE, and (d) FERET datasets.
As the experiments shown so far demonstrate, the incremental graph performs similarly to the batch graph: despite more than one hundred updates and expansions, the incremental graph remains as informative as the batch graph.
4.9 Computational complexity
As the last experiment, we compare the elapsed time needed to construct the batch graph and the incremental graph. For the batch graph, this time corresponds to the graph construction over the whole set of data samples; for the proposed method, it corresponds to the insertion of the last chunk of images. Table 9 reports the obtained CPU times, measured with non-optimized MATLAB code on a PC equipped with an Intel Core i7 CPU at 2.93 GHz and 8 GB of RAM. As we can see, the elapsed time for graph construction using the proposed method is far lower than that of the batch method (a sketch of the per-insertion update that explains this gap is given below).

Table 9: Time (in seconds) elapsed to construct the batch and incremental graphs on different databases.

Graph construction \ Dataset   Ext. Yale   PF01   PIE   FERET   USPS   LFW
Batch graph                       364       468    578    205    2390   4530
Incremental graph                   8         9     12      8      43     76
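The gap in Table 9 follows directly from the structure of the insertion procedure: each insertion touches the new node and only its most similar existing nodes, instead of recomputing all pairwise weights. The sketch below illustrates this cost asymmetry using a Gaussian-kernel similarity as a stand-in for the TPLLC coding actually used in the paper; the function name, k, and sigma are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def insert_node(W, X, x_new, k=5, sigma=1.0):
    """Insert one sample into an existing graph, updating only nearby nodes.

    W     : (n, n) current weight matrix.
    X     : (n, d) samples already in the graph.
    x_new : (d,) new sample.
    Returns the enlarged (n+1, n+1) weight matrix and (n+1, d) data matrix.
    """
    # Phase 1 (insertion): similarity of the new sample to all existing nodes.
    dist2 = ((X - x_new) ** 2).sum(axis=1)
    s = np.exp(-dist2 / (2.0 * sigma ** 2))

    n = W.shape[0]
    W_out = np.zeros((n + 1, n + 1))
    W_out[:n, :n] = W
    W_out[n, :n] = s          # edges of the new node
    W_out[:n, n] = s

    # Phase 2 (update): recompute weights only for the k most similar nodes,
    # since only they are assumed to be affected by the insertion.
    X_out = np.vstack([X, x_new])
    for i in np.argsort(s)[-k:]:
        d2 = ((X_out - X_out[i]) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2.0 * sigma ** 2))
        w[i] = 0.0            # keep the diagonal at zero
        W_out[i, :] = w
        W_out[:, i] = w

    return W_out, X_out
```

For a kernel graph of this form, each insertion costs O(nd) for the similarity vector plus O(knd) for the local updates, whereas rebuilding the batch graph requires O(n^2 d), which is consistent with the large speed-ups reported in Table 9.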
5 Conclusion
In this paper, we proposed a new incremental graph construction technique which dynamically adds new nodes into a graph without the need to construct it from scratch. In the first phase, we insert the new samples into the graph; their edges and weights are computed using Two-Phase Locality-constrained Linear Coding (TPLLC). In the second phase, we examine the existing samples and select those which are highly similar to the newly added sample, as their edges and weights are prone to change. Although we adopt TPLLC and the Gaussian kernel for similarity calculation, the proposed method is not limited to these two methods; any other coding technique, such as sparse ℓ1 coding or LLE coding, can exploit the proposed framework. The experimental results show that, even though only a few nodes are updated in the updating phase, the incremental graph performs similarly to the batch graph, and in some cases even slightly better results can be obtained. The first reason for this outperformance is that the incremental graph is sparser than the batch graph, and since sparsity is generally a desirable characteristic of a graph, it allows the incremental graph to outperform the batch one. The second reason is that our main assumption holds for real-world problems: inserting a sample into the graph affects only the edges and weights related to its nearby samples. Hence, by updating the connections of nearby samples, we obtain a graph similar to the one built from scratch and, consequently, similar performance. According to the experimental results, even after hundreds of insertions and updates, the obtained graph performs similarly to the graph constructed from scratch. Moreover, since the proposed method constructs the graph incrementally, it has lower computational complexity and builds the graph faster than batch techniques. Future work may investigate strategies for incremental graph construction under the constraint of keeping the graph size manageable; to this end, we will try to prevent growth in the graph size by removing nodes carrying low information.
Declaration of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment

This work was supported by the Spanish Ministerio de Ciencia, Innovación y Universidades, Programa Estatal de I+D+i Orientada a los Retos de la Sociedad, RTI2018-101045-B-C21.
References

Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373–1396.

Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399–2434.

Chen, J., Fang, H.-r., & Saad, Y. (2009). Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection. Journal of Machine Learning Research, 10, 1989–2012.

Cheng, B., Yang, J., Yan, S., Fu, Y., & Huang, T. (2010). Learning with ℓ1-graph for image analysis. IEEE Transactions on Image Processing, 19(4), 858–866.

Connor, M., & Kumar, P. (2010). Fast construction of k-nearest neighbor graphs for point clouds. IEEE Transactions on Visualization and Computer Graphics, 16(4), 599–608.

de Sousa, C., Rezende, S., & Batista, G. (2013). Influence of graph construction on semi-supervised learning. In Machine Learning and Knowledge Discovery in Databases (Vol. 8190, pp. 160–175). Springer Berlin Heidelberg.

Dornaika, F., & Bosaghzadeh, A. (2015). Adaptive graph construction using data self-representativeness for pattern classification. Information Sciences, 325, 118–139.

Dornaika, F., Bosaghzadeh, A., Salmane, H., & Ruichek, Y. (2014a). A graph construction method using LBP self-representativeness for outdoor object categorization. Engineering Applications of Artificial Intelligence, 36, 294–302.

Dornaika, F., Bosaghzadeh, A., Salmane, H., & Ruichek, Y. (2014b). Locality constrained encoding graph construction and application to outdoor object classification. In 22nd International Conference on Pattern Recognition (ICPR) (pp. 2483–2488).

Dornaika, F., Dahbi, R., Bosaghzadeh, A., & Ruichek, Y. (2017). Efficient dynamic graph construction for inductive semi-supervised learning. Neural Networks, 94, 192–203.

Gan, R. (2012). Scalable k-NN graph construction for visual descriptors. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1106–1113). IEEE Computer Society.

Gong, C., Liu, T., Tao, D., Fu, K., Tu, E., & Yang, J. (2015). Deformed graph Laplacian for semisupervised learning. IEEE Transactions on Neural Networks and Learning Systems, 26(10), 2261–2274.

Hacid, H., & Yoshida, T. (2007). Incremental neighborhood graphs construction for multidimensional databases indexing. In Advances in Artificial Intelligence (Vol. 4509, pp. 405–416). Springer Berlin Heidelberg.

He, X., Cai, D., Yan, S., & Zhang, H. (2005). Neighborhood preserving embedding. In IEEE International Conference on Computer Vision.

He, X., & Niyogi, P. (2004). Locality preserving projections. In Advances in Neural Information Processing Systems 16 (pp. 153–160). MIT Press.

He, X., Yan, S., Hu, Y., Niyogi, P., & Zhang, H. (2005). Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 328–340.

Hull, J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5), 550–554.

Liu, W., & Chang, S. (2009). Robust multi-class transductive learning with graphs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 381–388).

Liu, W., He, J., & Chang, S. (2010). Large graph construction for scalable semi-supervised learning. In Proceedings of the 27th International Conference on Machine Learning (ICML) (pp. 679–686). Omnipress.

Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.

Nie, F., Xu, D., Tsang, I., & Zhang, C. (2010). Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction. IEEE Transactions on Image Processing, 19(7), 1921–1932.

Rayar, F., Barrat, S., Bouali, F., & Venturini, G. (2015). An approximate proximity graph incremental construction for large image collections indexing. In Foundations of Intelligent Systems: 22nd International Symposium, ISMIS 2015 (pp. 59–68). Springer International Publishing.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.

Taigman, Y., Wolf, L., & Hassner, T. (2009). Multiple one-shots for utilizing class label information. In The British Machine Vision Conference (BMVC).

Uno, T., Sugiyama, M., & Tsuda, K. (2009). Efficient construction of neighborhood graphs by the multiple sorting method. CoRR, abs/0904.3151.

Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3360–3367).

Wang, M., Fu, W., Hao, S., Liu, H., & Wu, X. (2017). Learning on big graph: Label inference and regularization with anchor hierarchy. IEEE Transactions on Knowledge and Data Engineering, 29(5), 1101–1114.

Wang, Z., Song, Y., & Zhang, C. (2009). Knowledge transfer on hybrid graph. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (pp. 1291–1296). Morgan Kaufmann Publishers Inc.

West, D. B. (2001). Introduction to Graph Theory (2nd ed.). Prentice Hall.

Wright, J., Yang, A., Ganesh, A., Sastry, S., & Ma, Y. (2009). Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 210–227.

Xu, Y., Zhong, A., Yang, J., & Zhang, D. (2010). LPP solution schemes for use with face recognition. Pattern Recognition, 43(12), 4165–4176.

Zhang, Y.-M., Huang, K., Geng, G., & Liu, C.-L. (2013). Fast kNN graph construction with locality sensitive hashing. In Machine Learning and Knowledge Discovery in Databases (Vol. 8189, pp. 660–674). Springer.

Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2004). Learning with local and global consistency. In Advances in Neural Information Processing Systems 16 (pp. 321–328). MIT Press.

Zhu, W., Nie, F., & Li, X. (2017). Fast spectral clustering with efficient large graph construction. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2492–2496).

Zhu, X., Ghahramani, Z., & Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In International Conference on Machine Learning (pp. 912–919).
AUTHORSHIP STATEMENT
Manuscript title: Incremental and Dynamic Graph Construction with Application to Image Classification

All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. Furthermore, each author certifies that this material or similar material has not been and will not be submitted to or published in any other publication before its appearance in Expert Systems With Applications.

Authorship contributions

Category 1
Conception and design of study: Fadi Dornaika, Alireza Bosaghzadeh; acquisition of data: Alireza Bosaghzadeh, Fadi Dornaika; analysis and/or interpretation of data: Alireza Bosaghzadeh, Fadi Dornaika;
Category 2
Drafting the manuscript: Alireza Bosaghzadeh, Fadi Dornaika; revising the manuscript critically for important intellectual content: Alireza Bosaghzadeh, Fadi Dornaika;
Category 3
Approval of the version of the manuscript to be published: Alireza Bosaghzadeh, Fadi Dornaika;
Acknowledgements
All persons who have made substantial contributions to the work reported in the manuscript (e.g., technical help, writing and editing assistance, general support), but who do not meet