Optik 126 (2015) 4004–4013
Variational approximate inferential probability generative model for ship recognition using remote sensing data

Weiya Guo a,b,∗, Xuezhi Xia b, Xiaofei Wang b
a College of Information Technology, Harbin Engineering University, Harbin, China
b Wuhan Digital Engineering Research Institute, Wuhan, China
Article history: Received 6 July 2014; Accepted 27 July 2015

Keywords: Ship recognition; Hough transformation; ε-local neighborhood information; Manifold similarity; Variational methods
Abstract

Aiming at detecting sea targets reliably and in a timely manner, a discriminative ship recognition method for optical remote sensing data, based on a variational-method probability generative model, is presented. First, an improved Hough transformation is utilized to pretreat the target candidate region; it reduces the amount of computation by filtering the edge points, and our experiments indicate that the targets (ships) can be detected quickly and accurately. Second, based on rough set theory, the common discernibility degree is used to compute the significance weight of each candidate feature and to select valid recognition features automatically. Finally, for each node, its neighbor nodes are sorted by their manifold similarity to the node. Using the classes of the nodes selected from the top of the sorted neighbor list, a dynamic probability generative model is built to recognize ships in data from optical remote sensing systems. Experimental results on real data show that the proposed approach achieves better classification rates at a higher speed than the k-nearest neighbor (KNN), support vector machine (SVM) and traditional hierarchical discriminant regression (HDR) methods. © 2015 Elsevier GmbH. All rights reserved.
1. Introduction

Target detection and recognition from remote sensing images play a critical role in various applications of pattern recognition, such as fishery management, vessel traffic services, and maritime activities. In particular, as naval strength keeps developing at high speed around the world, ship detection and recognition have become increasingly important for effective and efficient ship monitoring to form marine combat intelligence. In recent years, with the rapid development of earth observation technology, satellite remote sensing has entered an unprecedented new stage, and sea reconnaissance and target surveillance are provided with abundant data sources by a number of imaging satellites with high spatial resolution and short revisit cycles. For example, the French SPOT-5 satellite's full-color images reach a resolution of 2.5 m; the American QuickBird full-color images have a resolution of 0.6 m and a revisit cycle of 1–3.5 days; and the most advanced American military reconnaissance satellites reach a resolution of 0.05 m. Moreover, with the promotion and implementation of the
∗ Corresponding author at: Harbin Engineering University, College of Information Technology, Harbin, China. Tel.: +86 13031666128.
E-mail address: [email protected] (W. Guo).
http://dx.doi.org/10.1016/j.ijleo.2015.07.178
0030-4026/© 2015 Elsevier GmbH. All rights reserved.
earth satellite imaging systems, there will in the future be earth observation satellites with better performance, higher resolution and shorter revisit cycles, and the available satellite remote sensing image data will keep growing explosively. In the face of such massive remote sensing image data, traditional approaches such as manual visual interpretation cannot satisfy the efficient information requirements of modern society, owing to their low efficiency, high cost, long information acquisition cycle and other defects. How to quickly and accurately extract and recognize ship targets automatically from massive remote sensing data has become an urgent problem.

Recently, different kinds of features have been proposed for optical remote sensing target recognition, and they appear promising for recognition performance. However, these methods mainly have two shortcomings: (1) they generally depend on experience to select feature vectors, and the selected vectors are high-dimensional and redundant, making it difficult to improve recognition accuracy and speed; (2) their classifiers need a global search, whose time complexity is high and cannot satisfy the requirements of real-time processing.

To overcome the first shortcoming, genetic algorithms were introduced for feature selection in the 1990s [1,2], with good results. Zheng [3] proposed an improved feature selection method based on a genetic algorithm, and Shang et al. [4] employed the NSGA-II algorithm to optimize the
classification learning framework (MSCC) [5]. Besides, based on rough set theory [6], Du et al. [7] used the idea of attribute reduction to compute the significance weight of each candidate feature and select valid recognition features. Moreover, an improved Zernike moment invariant was used to recognize large warships in aerial remote sensing images [8], and an efficient and accurate face detection method using feature selection was proposed in [9]. Later, a novel linear function combining pixel and region characteristics was employed by Yang et al. [10] to select ship candidates, which optimized the detection performance. Although these methods have improved recognition accuracy, most of them still suffer from high time complexity.

To overcome the second shortcoming, image recognition efficiency has been improved gradually by various existing methods. A task-oriented facial behavior recognition method was presented in [11]; despite its high recognition rate, it cost a lot of computation time. Support vector machine (SVM) technology was applied to target recognition in QuickBird satellite data [12] and SAR images [13], and SVM has also been used in face recognition [14–16]. However, since the SVM method is derived from statistical learning theory and prefers two-class problems, it performs poorly in multi-class classification tasks. An improved k-nearest neighbor algorithm for supervised remote sensing image classification was presented in [17], and a likelihood function based on the Bayesian criterion was designed to recognize ships in satellite images [18]. These methods were successfully applied to ship target recognition. However, they were built on traditional algorithms, needed a global traversal in the classification process, and lacked the ability of online learning; namely, they cannot effectively use the distribution of the training samples.

Akakın and Sankur [19] proposed a method for robust classification of face and head gestures in video; although the recognition rate is as high as 98%, it costs a lot of computation time. In addition, hierarchical discriminant regression (HDR) [20] has been investigated, which unifies classification and regression into a single regression problem; its efficiency is high, its time complexity is O(d log n), and it is suitable for high-dimensional image data processing. Based on the HDR method, Weng and Hwang [21] proposed an incremental discriminant regression technique, which obtained significantly better classification accuracy, but its computation time was slightly larger than that of the traditional HDR algorithm. Besides, for ship detection, a new classification approach using shape and texture was introduced in [22], which attained a good classification rate, and another method [23] provided robust enhancement and detection of mostly line structures in 2-D gray-scale images. Although these methods have relatively high detection or recognition accuracy, their time consumption makes them unsuitable for real-time ship detection or recognition. Further, An et al. classified airplane candidates by using a circle frequency filter [24]; the detection rate is higher than 96%, but the whole time cost is around 1 min. In recent years, the application of probability generative models in classifiers has become popular; for example, the approach in [25] classified ship candidates by using their class probability distributions rather than the directly extracted features, which was effective in distinguishing between ships and non-ships. For ship recognition, both a higher recognition rate and a lower time consumption are preferred.
In view of the above-mentioned facts, this paper is concerned with recognition efficiency. Taking optical remote sensing image data with a resolution of more than 4 m as the research object, and based on the class propagation distribution, a general ship target recognition method using a probability generative model is proposed, which makes the necessary improvements in image segmentation, feature selection and classifier design.
Fig. 1. The proposed method flow diagram.
Fig. 1 illustrates the flow diagram of the proposed method. The rest of the paper is organized as follows. Section 2 briefly introduces an improved Hough transformation and uses it to preprocess the target candidate region. Section 3 sketches our feature extraction and how attribute reduction based on rough set theory is applied to feature selection. Section 4 is devoted to target classification with a dynamic probability generative model based on the classes of neighbor nodes. Section 5 gives experiments and comparative results in detail, and Section 6 summarizes our contributions and sketches future work.

2. Pretreatment of the target candidate region

The Hough transformation [26] has good detection adaptability due to its high positioning accuracy and strong ability to resist noise and clutter interference. However, its shortcomings, such as a large amount of computation and high storage requirements, have limited its application. To solve this problem we propose an improved Hough transformation. Its principle is as follows:

(1) Randomly select three edge points that are not on the same line from an image's edge point set V. From the three points we can obtain a circle with center (a, b) and radius r, so we have

2xa + 2yb + d = x^2 + y^2, \quad d = r^2 - a^2 - b^2    (1)
(2) Now, assume the selected edge points are v_i = (x_i, y_i), i = 1, 2, 3; the obtained center (a_{123}, b_{123}) and radius r_{123} can be expressed as follows:

a_{123} = \frac{\begin{vmatrix} x_2^2 + y_2^2 - (x_1^2 + y_1^2) & 2(y_2 - y_1) \\ x_3^2 + y_3^2 - (x_1^2 + y_1^2) & 2(y_3 - y_1) \end{vmatrix}}{4\left((x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)\right)}    (2)

b_{123} = \frac{\begin{vmatrix} 2(x_2 - x_1) & x_2^2 + y_2^2 - (x_1^2 + y_1^2) \\ 2(x_3 - x_1) & x_3^2 + y_3^2 - (x_1^2 + y_1^2) \end{vmatrix}}{4\left((x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)\right)}    (3)

r_{123} = \sqrt{(x_i - a_{123})^2 + (y_i - b_{123})^2}, \quad i = 1, 2, 3    (4)
In addition, take v_4 = (x_4, y_4); if it satisfies Eq. (5), then v_4 is on the circle, where δ is the threshold value:

d_{4 \to 123} = \left| \sqrt{(x_4 - a_{123})^2 + (y_4 - b_{123})^2} - r_{123} \right| < \delta    (5)
(3) Based on the above, for the candidate circle c_{ijk} determined by v_i, v_j, v_k, take all the points v_l of the set V; if a point satisfies d_{l→ijk} < δ, it lies on the candidate circle c_{ijk}, and a counter is incremented until all the edge points have been examined. If the accumulated count is greater than a specified threshold α, the candidate circle is determined to be a true circle; the next candidate circle is then detected in accordance with the above steps.

In fact, the improved algorithm reduces the amount of computation by filtering the edge points; its physical meaning can be explained by Fig. 2. For a circle with center (a, b) and radius r, any of its points must lie within its circumscribed square, so from Fig. 2 we obtain inequalities (6):

x_l > a_{ijk} + r_{ijk} + t,  x_l < a_{ijk} - r_{ijk} - t,  y_l > b_{ijk} + r_{ijk} + t,  y_l < b_{ijk} - r_{ijk} - t    (6)

where t is a variable parameter. For any point v_l of the set V, as long as it satisfies one of the inequalities (6), d_{l→ijk} need not be calculated, and the next edge point is examined directly.

Fig. 2. Diagram of improved Circular Hough detection algorithm.

The experimental environment is as follows: the tool is Matlab 7.0 R, and each original image has a resolution of 400 × 300; the images are derived from QuickBird satellite images. Fig. 3 strengthens these results: using the improved circular Hough transformation, the targets (ships) can be detected quickly and accurately.

3. Ship target feature extraction and selection

Feature selection, as well as extraction, is a challenging problem in areas such as pattern recognition, machine learning and data mining. In general, discriminative features are helpful for ship recognition, so in this section we intend to find more effective features and consider a consistency measure based on rough set theory (also called attribute reduction), which aims to retain the discriminatory power of the original features.

Fig. 4. Area code examples.

3.1. Feature extraction

For optical remote sensing images, there are some differences in size, shape, texture, and so on. We extract features from the following items:
(1) Size features: length F_1, width F_2, length-width ratio F_3, and perimeter F_4.

(2) Texture features: image entropy F_5 and smoothness F_6. Within the 8 × 8 neighborhood window, the image entropy is computed as

ME(x, y) = -\sum_{x=1}^{8} \sum_{y=1}^{8} p_{x,y} \log p_{x,y}    (7)

p_{x,y} = \frac{f(x, y)}{\sum_{x=1}^{8} \sum_{y=1}^{8} f(x, y)}    (8)
where f(x, y) is the gray value of a point (x, y) in the neighborhood window, f(x, y) > 0, and p_{x,y} is the probability distribution of f(x, y).

Fig. 3. Original images and detection results.

(3) Shape features: bow shape F_7, stern shape F_8, radius deviation F_9, spindle length F_10, and compactness F_11. F_11 is given by Eq. (9); compared to roundness, compactness fully considers the impact of boundary changes on the average radius of the object, which better describes the complexity of target shapes:

L = \frac{\pi \bar{r}^2}{A} = \frac{\pi \left( \sum_{(i,j) \in \Omega} \sqrt{(i - \bar{i})^2 + (j - \bar{j})^2} \right)^2}{P^2 A}    (9)

where \bar{r} = \frac{1}{P} \sum_{(i,j) \in \Omega} \sqrt{(i - \bar{i})^2 + (j - \bar{j})^2} is the equivalent radius, P is the perimeter, A is the area, (\bar{i}, \bar{j}) represents the target center, and Ω expresses the set of target boundary points.

(4) Moment invariant features: the 7 Hu moment invariants (F_12∼F_18), the first 20 Zernike moment invariants of order less than 8 (F_19∼F_38) and 8 wavelet moment invariants (F_39∼F_46).

(5) Area ratio features: different from the area code of the whole target [27], this paper only extracts the ship target area ratios of the bow F_47 and stern F_48, as shown in Fig. 4. For Fig. 4, obtain the circumscribed rectangle of the target slice by rotating along the spindle to the horizontal direction, then divide the rectangle into N equal parts and take the target area of the ith part as S_i; each
part is encoded according to Eq. (10), and the ARC is defined by Eq. (11):

C_i = \left\lfloor \frac{S_i}{\max_{1 \le i \le N} S_i} \times 10 \right\rfloor    (10)

ARC = [C_1, C_2, \ldots, C_N]    (11)

Considering that for the majority of ship targets the bow and stern parts each occupy about 1/3 ∼ 1/8 of the whole ship, the number of divisions is usually 3 ≤ N ≤ 8; in this paper, specifically, N = 6. For the example in Fig. 4 the ARC code is C = [4, 8, 10, 10, 9, 7], and we take only C_1 and C_6, namely F_47 = C_1 and F_48 = C_6. Summing up, the feature vector F = [F_1, F_2, ..., F_48] is obtained from all the extracted features.
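As a concrete illustration, the image entropy of Eqs. (7) and (8) and the area ratio code of Eqs. (10) and (11) can be sketched as below. This is a minimal re-implementation, not the authors' code; in particular, the divisor in Eq. (10) is read here as max_i S_i, which is the interpretation consistent with the example code C = [4, 8, 10, 10, 9, 7].

```python
import math

def window_entropy(window):
    """Image entropy of an 8x8 gray-level window, Eqs. (7)-(8):
    p_xy = f(x, y) / sum f, then ME = -sum_x sum_y p_xy * log(p_xy)."""
    total = float(sum(sum(row) for row in window))
    ent = 0.0
    for row in window:
        for f in row:
            if f > 0:                      # the paper assumes f(x, y) > 0
                p = f / total
                ent -= p * math.log(p)
    return ent

def area_ratio_code(areas):
    """Eq. (10): C_i = floor(S_i / max_i S_i * 10) for the N strip areas S_i;
    Eq. (11) collects the codes into the ARC vector."""
    s_max = max(areas)
    return [math.floor(s / s_max * 10) for s in areas]

def bow_stern_features(areas):
    """F47 and F48: the first and last ARC components (bow and stern)."""
    code = area_ratio_code(areas)
    return code[0], code[-1]
```

With the strip areas proportional to the paper's example, `area_ratio_code` reproduces C = [4, 8, 10, 10, 9, 7] and `bow_stern_features` yields (F47, F48) = (4, 7).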
3.2. Feature selection

Insufficient prior knowledge often brings blindness to feature extraction. In this paper the relevance among the extracted features is strong, and the resulting redundancy easily reduces recognition efficiency. In fact, ship target recognition is a multi-feature combination classification problem; different feature combinations contribute differently to recognition, and our purpose is to find the most effective feature combinations. Referring to the related algorithm [28] and based on rough set theory [6], the common discernibility degree [29] is utilized to compute the significance weight of each candidate feature and to select valid recognition features automatically.

3.2.1. Common discernibility degree computing

Formally, in rough set theory, an information system can be expressed as S = (U, A, V, f), where U is a universe, U = {x_1, x_2, ..., x_{|U|}}, and A is the set of all attributes. Each attribute a_j ∈ A has a value set V_j, and V = ∪_{a_j ∈ A} V_j. The information function f : U × A → V satisfies the condition ∀a_j ∈ A, f(x, a_j) ∈ V_j. If A = C ∪ D and C ∩ D = ∅, then S = (U, C, D) is called a decision table, where C is the condition attribute set and D is the decision attribute set. If ∃a_j ∈ A such that V_j includes "∅", then S is called an incomplete information system.

Under the above assumptions, we first define the discernibility relation of an attribute set P ⊆ A:

DIS(P) = \{(x, y) \in U \times U \mid \exists a \in P,\ f(x, a) \ne f(y, a),\ f(x, a) \ne \varnothing,\ f(y, a) \ne \varnothing\}    (12)

The largest set of objects distinguishable from x by P can be written as

D_P(x) = \{y \in U \mid (x, y) \in DIS(P)\}    (13)

The discernibility degree of the attribute set P is then

|DIS(P)| = \sum_{i=1}^{|U|} |D_P(x_i)|    (14)

|DIS(P)| in Eq. (14) measures the discernibility ability of the attribute set P and equals the number of ordered pairs in DIS(P). Further, we define the common discernibility relation between attribute sets P and Q:

DIS(Q; P) = DIS(Q) \cap DIS(P)    (15)

Its corresponding common discernibility degree is described as

|DIS(Q; P)| = |DIS(Q) \cap DIS(P)|    (16)

It can be seen from Eqs. (15) and (16) that DIS(Q; P) is an ordered set of pairs of objects in the domain U that both P and Q can distinguish. Accordingly, |DIS(Q; P)|, the number of elements of DIS(Q; P), is used to measure the common discernibility ability of P and Q.

3.2.2. Feature selection algorithm steps

As mentioned in Section 3.2.1, the stages realized in this phase can be summarized as below:

(1) Discretize the extracted ship features and construct the feature decision table (columns C_1, ..., C_n and D):

S(U, C, D) = \begin{bmatrix} F_{11} & \cdots & F_{1n} & d_1 \\ \vdots & \ddots & \vdots & \vdots \\ F_{m1} & \cdots & F_{mn} & d_m \end{bmatrix}    (17)

In Eq. (17), U_i = (F_{i1}, F_{i2}, ..., F_{in}) represents the ith feature vector and U = {U_1, U_2, ..., U_m} indicates the target set; C^i = (F_{1i}, F_{2i}, ..., F_{mi})^T means the ith candidate feature and C = {C^1, C^2, ..., C^n} indicates the set of all candidate features; D = {d_1, d_2, ..., d_m} is the ship decision (category) set; F_{ij} denotes the jth feature of the ith training sample; and m, n represent the number of training samples and the feature dimension, respectively.

(2) Define the feature sets Q and T, set Q = ∅ and T = C, and calculate the common discernibility degree |DIS(D; C)| between the attribute sets C and D according to Eq. (16).

(3) Define the feature-set importance function by Eq. (19), find the most important feature C_k of the set C − Q according to Eq. (20), and update Q and T:

Q = Q \cup \{C_k\}, \quad T = T - \{C_k\}    (18)

SGF(C_k, Q, D) = |DIS(D; Q \cup \{C_k\})| - |DIS(D; Q)|    (19)

SGF(C_k, Q, D) = \max_{C_i \in T} SGF(C_i, Q, D)    (20)

(4) If |DIS(D; Q)| = |DIS(D; C)|, go to step (5); otherwise, go to step (3).

(5) Take the final Q = {C^1, C^2, ..., C^l} as the feature selection result and build the reduced training sample feature set (rows x_1, ..., x_m; columns C^1, ..., C^l and D):

X = \begin{bmatrix} F_{11} & \cdots & F_{1l} & d_1 \\ \vdots & \ddots & \vdots & \vdots \\ F_{m1} & \cdots & F_{ml} & d_m \end{bmatrix}    (21)

where x_i is the ith target's reduced feature vector and l is the number of reduced feature dimensions.
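The selection loop of steps (1)-(5) can be sketched as follows. This is a simplified illustration under our own assumptions: the decision table is stored as a list of dicts with a decision key 'D', missing values are None, and the set-based pair counting stands in for the paper's implementation.

```python
def discernibility_pairs(table, attrs):
    """DIS(P), Eq. (12): ordered pairs of samples that differ on some
    attribute in P; None plays the role of the missing value '∅'."""
    n = len(table)
    pairs = set()
    for i in range(n):
        for j in range(n):
            for a in attrs:
                vi, vj = table[i][a], table[j][a]
                if vi is not None and vj is not None and vi != vj:
                    pairs.add((i, j))
                    break
    return pairs

def common_degree(table, p_attrs, q_attrs):
    """|DIS(Q; P)| = |DIS(Q) ∩ DIS(P)|, Eqs. (15)-(16)."""
    return len(discernibility_pairs(table, p_attrs) &
               discernibility_pairs(table, q_attrs))

def select_features(cond, table):
    """Greedy selection, steps (2)-(4) of Section 3.2.2: grow Q with the
    feature maximizing SGF(C_k, Q, D) of Eq. (19) until
    |DIS(D; Q)| = |DIS(D; C)|."""
    target = common_degree(table, cond, ['D'])
    q = []
    while common_degree(table, q, ['D']) < target:
        best = max((c for c in cond if c not in q),
                   key=lambda c: common_degree(table, q + [c], ['D']))
        q.append(best)
    return q
```

For example, on a two-sample table where feature 'a' separates the decisions and 'b' is constant, `select_features(['a', 'b'], table)` keeps only 'a'.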
To sum up, the application and performance of the feature extraction and selection method are discussed in Section 5.2.

4. Target classification and recognition

In this paper, the classifier is designed on the basis of a dynamic probability generative model. The main idea of the proposed method is to build a new generative model on an undirected graph, in which the edges of the graph are observed variables and the classes of the unknown-class nodes are latent variables. The values of the latent variables can be calculated by fitting the generative model of the graph.
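To make the setup concrete, a minimal sketch of such a graph follows (the data and names here are hypothetical, purely for illustration): the adjacency matrix S holds the observed edges, known-class nodes form the set V_L, and unknown-class nodes V_U carry the latent class variables to be inferred.

```python
# Hypothetical 4-node example: observed edges in an adjacency matrix S,
# known classes for the nodes in V_L, latent classes for the nodes in V_U.
S = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
known_classes = {0: 0, 1: 0}   # V_L: node index -> class index in L
unknown_nodes = [2, 3]         # V_U: classes to be inferred

def neighbors(s, i):
    """Nodes j with S_ij = 1, i.e. the observed edges of node i."""
    return [j for j, e in enumerate(s[i]) if e == 1]
```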
The method does not require a training process; it directly uses an iterative formula to calculate the classes of the unknown nodes. In this section, we first propose the concept of the class propagation distribution and use it to express the probability of a node's neighbors belonging to each class. Then, based on the class propagation distribution, the dynamic probability generative model is presented. Finally, by solving the model, the classes of the nodes whose classes are unknown are obtained.

4.1. Dynamic probability generative model based on the classes of neighbor nodes
4.1.1. Class propagation distribution

In networks with low homophily [30], there exist many connected nodes whose classes differ from each other, so judging whether two nodes belong to the same class only by whether an edge exists between them is not sufficient. Therefore, this paper considers two nodes' classes and their neighbor nodes' classes at the same time. Assume there are two nodes V_i and V_j, and V_j belongs to the class L_c; the more of V_j's neighbor nodes belong to class L_c, the greater the probability that an edge exists between V_i and V_j. This differs from the class propagation distribution proposed in [31], which only considered the neighbor nodes' classes and placed no limit on the number of neighbor nodes. It is necessary to discuss a node's own class contribution for improving classification accuracy and, in order to reduce the computing time, also to discuss how many neighbor nodes should participate in the computation. First, the corresponding definitions are listed as follows:

Definition 1 (ε-neighborhood distance). In an undirected graph G = (V, E) containing two nodes V_i and V_j, define V_i's ε-neighborhood distance as

D_\varepsilon(V_i, V_j) = \begin{cases} \Phi_1^{dist(V_i, V_j)} - 1, & dist(V_i, V_j) < \varepsilon \\ \Phi_2^{dist(V_i, V_j)} - 1, & dist(V_i, V_j) \ge \varepsilon \end{cases}    (22)

where dist(V_i, V_j) is the Euclidean distance between V_i and V_j, Φ_1 and Φ_2 are adjustment density factors, and ε is an adjustable radius. If the distance between two nodes is less than ε, we set Φ_1 < 1; on the contrary, if the distance is beyond ε, we set Φ_2 > 1. Note that subtracting the constant 1 satisfies the condition that if dist(V_i, V_j) = 0, then D_ε(V_i, V_j) = 0.

However, when the ε-neighborhood distances between two pairs of samples are equal, using the ε-neighborhood distance alone to measure neighbor nodes may result in wrong classification. Therefore, we further give the following definition:

Definition 2 (Manifold similarity). By the manifold assumption [32], nodes close to each other on the manifold most likely belong to the same class. In a weighted undirected graph G = (V, E), let P(i, j) represent the set of all paths connecting nodes V_i and V_j, p any such path, and p_k the kth node on the path. Along the shortest path, the manifold similarity is measured as

W(i, j) = \min_{p \in P(i, j)} \sum_{k=1}^{|p|-1} D_\varepsilon(p_k, p_{k+1})    (23)

Manifold similarity is calculated along the shortest path on the manifold, which makes nodes in the same high-density region connect through shorter edges and nodes in different density regions connect through longer edges. The main procedure for constructing the manifold similarity is summarized as follows.

First, pairwise node constraints are divided into two kinds: (V_i, V_j) ∈ Link, which means the two nodes V_i and V_j belong to the same class, and (V_i, V_j) ∉ Link, which means they do not belong to the same class. Next, the following adjustments are made to the labeled nodes:

(1) For an existing prior information constraint (V_i, V_j) ∈ Link, adjust the similarity between the two corresponding nodes:

(V_i, V_j) \in Link \Rightarrow w(i, j) = 0,\ w(j, i) = 0    (24)

In Eq. (24), w(i, j) is an element of the similarity matrix W_{n×n}. A new restriction is obtained by extending the known ∈ Link constraint, namely, if there exists a node V_k that satisfies the (V_i, V_k) ∈ Link restriction, then

(V_i, V_j) \in Link \wedge (V_i, V_k) \in Link \Rightarrow (V_j, V_k) \in Link \Rightarrow w(j, k) = 0,\ w(k, j) = 0    (25)

(2) For an existing prior information constraint (V_i, V_j) ∉ Link, adjust the similarity between the two corresponding nodes:

(V_i, V_j) \notin Link \Rightarrow w(i, j) = \infty,\ w(j, i) = \infty    (26)

A new restriction is obtained by extending the known ∉ Link constraint, namely, if there exists a node V_k that satisfies the (V_i, V_k) ∈ Link restriction, then

(V_i, V_j) \notin Link \wedge (V_i, V_k) \in Link \Rightarrow (V_j, V_k) \notin Link \Rightarrow w(j, k) = \infty,\ w(k, j) = \infty    (27)

Fig. 5. Flow chart of computing similarity matrix.

Fig. 5 outlines the flow diagram of the procedure for computing the similarity matrix W_{n×n}. The influences of the Definition 1 parameters ε, Φ_1 and Φ_2 on the classification results are shown in detail in Section 5.3. The neighbor nodes of V_i are then quickly sorted according to their manifold similarity, and the top M_i nodes of the sorted neighbor list are selected (the value of M_i is about 40–60% of all the nodes).

Definition 3 (Class propagation distribution). Suppose there exist K classes and the node V_i has M_i neighbor nodes, of which M_{ic} belong to class L_c. The ratio of the node V_i itself and its neighbor nodes belonging to the class L_c is defined as

\alpha \frac{1}{K} + \beta \frac{M_{ic}}{M_i}, \quad 0 < \alpha < 1, \quad \beta = 1 - \alpha    (28)

For convenience, Eq. (28) can be written as

\pi_{ic} = \alpha \frac{1}{K} + \beta \frac{M_{ic}}{M_i}    (29)
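Definitions 1 and 2 can be sketched as follows. This is an illustrative re-implementation, not the authors' code; the default parameter values ε = 5, Φ1 = 0.5 and Φ2 = 2.5 are those reported in Section 5.3, and an exhaustive simple-path search stands in for the paper's shortest-path computation (adequate only for small example graphs).

```python
import math

def eps_distance(d, eps=5.0, phi1=0.5, phi2=2.5):
    """Eq. (22): Phi1**d - 1 if d < eps, else Phi2**d - 1.
    Subtracting 1 guarantees D_eps = 0 when d = 0; Phi1 < 1 shrinks edges
    inside the radius while Phi2 > 1 stretches edges outside it."""
    return phi1 ** d - 1.0 if d < eps else phi2 ** d - 1.0

def manifold_similarity(dist, i, j, eps=5.0, phi1=0.5, phi2=2.5):
    """Eq. (23): minimum summed eps-distance over simple paths from V_i
    to V_j, with dist a matrix of pairwise Euclidean distances."""
    n = len(dist)
    best = math.inf

    def dfs(node, cost, visited):
        nonlocal best
        if node == j:
            best = min(best, cost)
            return
        for nxt in range(n):
            if nxt not in visited:
                step = eps_distance(dist[node][nxt], eps, phi1, phi2)
                dfs(nxt, cost + step, visited | {nxt})

    dfs(i, 0.0, {i})
    return best
```

In a three-node example with short hops 0-1 and 1-2 (distance 1 each) and a long direct edge 0-2 (distance 6), the path through the high-density region wins: the two short edges contribute 0.5¹ - 1 = -0.5 each, well below the direct edge's 2.5⁶ - 1.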
Table 1
Symbols interpretation.

S: The adjacency matrix of the undirected graph G
S_ij: The element in the ith row and jth column of the matrix S
V_L: The set of nodes whose classes are known
V_U: The set of nodes whose classes are unknown
V_i: The ith node in V
L: The set of classes composed of the classes of the nodes in V_L
L_c: The cth class in L
y_i: The index indicating the position of the class of V_i in L
π_i: The class propagation distribution of V_i
Π: The matrix composed of the class propagation distributions of all nodes
So the vector (π_{i1}, ..., π_{ic}, ..., π_{iK}) is called the class propagation distribution of V_i, usually denoted as π_i. From the above definitions, although a node's class propagation distribution is determined by both its own class and its neighbor nodes' classes, from Section 5.3 we can reasonably infer that the better classification effect is obtained when α is close to 0 and β is close to 1, so Eq. (29) is changed into

\pi_{ic} = \frac{M_{ic}}{M_i}    (30)
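Definition 3 and Eqs. (28)-(30) can be sketched as below (a minimal illustration with our own function name; with the default α = 0 it reduces to the neighbor-only form of Eq. (30)):

```python
def class_propagation(neighbor_classes, num_classes, alpha=0.0):
    """Eqs. (28)-(29): pi_ic = alpha * 1/K + (1 - alpha) * M_ic / M_i,
    where M_ic counts the neighbors of V_i whose class is L_c and M_i is
    the number of neighbors. alpha = 0 gives Eq. (30)."""
    m_i = len(neighbor_classes)
    beta = 1.0 - alpha
    return [alpha / num_classes + beta * neighbor_classes.count(c) / m_i
            for c in range(num_classes)]
```

For a node with four neighbors of classes [0, 0, 1, 2] and K = 3 classes, the distribution is [0.5, 0.25, 0.25].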
Indeed, since a node's own class is not involved in Eq. (30), only the influence of its neighbor nodes' classes on the classification results is considered in the subsequent discussion. Intuitively, in this paper the probability of an edge existing between V_i and V_j is described by the class propagation distribution of V_i and the class of V_j, namely the component of the class propagation distribution of V_i at the class of V_j.

4.1.2. Probability generative model based on class propagation distribution

Before showing the usage of our model, we give the related symbols and their interpretations. We use an undirected graph G = (V, E) to express the class nodes; here E is the set of edges and V is the set of nodes. Table 1 shows the interpretation of each symbol. Take y_i as the class index of node V_i: if the class index of V_i is y_i, then the class of V_i is denoted by L_{y_i}. If there is an edge between nodes V_i and V_j, then S_ij = 1; otherwise S_ij = 0.

Assume that in an undirected graph there are N nodes belonging to K classes; the edges are created as follows. First, generate the class propagation distribution of each node from a Dirichlet prior with parameter δ. Then, for a node whose class is unknown, draw an integer from the uniform distribution between 0 and K − 1 and take it as the class index of the node, so that a class is assigned to each node. Finally, the edge between V_i and V_j is generated from a Bernoulli distribution whose parameter is the component π_{i y_j} of the class propagation distribution of V_i at class L_{y_j}. Using y to express the set of class indexes of the nodes whose classes are unknown, the model's joint probability distribution of observed and latent variables is

p(E, y \mid \Pi) = p(y)\, p(E \mid y, \Pi)    (31)
Gibbs sampling can be applied to fit such a model. Gibbs sampling is a fast and efficient Markov chain Monte Carlo sampling method that is commonly used in probability generative models [33]; it adopts an iterative formula to obtain the values of the latent variables by sampling from their posterior distribution.
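For background, a generic Gibbs sweep over discrete latent class variables looks like the following sketch. It is illustrative only: the full-conditional function is a placeholder supplied by the caller, and it is not the paper's eventual solver, which is variational.

```python
import random

def gibbs_sweep(latent, conditional, num_classes, rng=random):
    """One Gibbs sweep: resample each latent class index from its full
    conditional given the current values of all the other variables.
    conditional(i, c, latent) returns an unnormalized weight for class c."""
    for i in range(len(latent)):
        weights = [conditional(i, c, latent) for c in range(num_classes)]
        total = sum(weights)
        r, acc = rng.random() * total, 0.0
        for c, w in enumerate(weights):   # draw from the categorical
            acc += w
            if r <= acc:
                latent[i] = c
                break
    return latent
```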
One could consider the Gibbs sampling method to solve the probability generative model. In fact, this method has good versatility, but its efficiency is low, since in each iteration it must wait until every Markov chain reaches a steady state. So in this paper we adopt the variational method [34] to solve the proposed model. In the calculus-of-variations setting, for the visible variable E of each training sample, we use the approximate posterior distribution q(E | y_ū, Π) to replace the latent variable distribution p(E | y_ū, Π), where y_ū represents the class indexes of the nodes except V_u. Therefore, we use the naive mean field [35] and approximate the true posterior distribution with a completely factorized distribution: q(y, Π) = \prod_{j=1}^{K} q(y_j), where q(y_j = 1) = μ_j. The lower bound on the log-likelihood function of the training samples satisfies

\ln p(E, \Pi) \ge \sum_{i=1, i \ne u}^{N} \sum_{j=1}^{K} e_i w_{ij} \mu_j - \ln Z + \sum_{j=1}^{K} \left[ \mu_j \ln \mu_j + (1 - \mu_j) \ln(1 - \mu_j) \right]    (32)

where Z = \sum_{i=1, i \ne u}^{N} \sum_{j=1}^{K} e^{-E(e_i, y_i, \Pi)} is a normalization term and the energy function is defined as

E(e_i, y_i, \Pi) = -\sum_{i=1, i \ne u}^{N} \sum_{j=1}^{K} e_i w_{ij} y_j - \sum_{i=1}^{N} \theta_i e_i - \sum_{i=1}^{N} \lambda_i y_i    (33)

Fixing Π, the variational parameter μ is learned by maximizing Eq. (32), so the mean-field fixed-point equation is

\mu_j \leftarrow \sigma\left( \sum_{i=1}^{N} w_{ij} e_i \right)    (34)

where σ(x) = 1/(1 + e^{−x}) is the logistic function. Further, given the variational parameter μ, the parameter Π is updated as

\pi_{ic} = \frac{C_{ic} + \delta}{\sum_{k=1}^{K} (C_{ik} + \delta)}    (35)
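Eqs. (34) and (35) can be sketched as follows (a minimal illustration, not the authors' implementation; the smoothing value passed for δ in the usage example below is our own choice):

```python
import math

def logistic(x):
    """sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def mean_field_step(e, w):
    """Eq. (34): mu_j <- sigma(sum_i w_ij * e_i) for each of the K latent
    class indices j, given visible values e and an N x K weight matrix w."""
    k = len(w[0])
    return [logistic(sum(e[i] * w[i][j] for i in range(len(e))))
            for j in range(k)]

def update_pi(counts, delta=0.1):
    """Eq. (35): pi_ic = (C_ic + delta) / sum_k (C_ik + delta), where
    counts[c] = C_ic is the number of V_i's neighbors of class L_c and
    delta is the Dirichlet prior parameter."""
    denom = sum(c + delta for c in counts)
    return [(c + delta) / denom for c in counts]
```

For example, with neighbor counts [2, 0] and δ = 1.0, `update_pi` smooths the empirical proportions to [0.75, 0.25].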
In Eq. (35), the estimate π̂_ic equals the proportion of the nodes whose class is L_c among all neighbor nodes of V_i, and it can be used as the value of π_ic in calculation; the derivation procedure is available in Appendix A.

In this section, classification uses a direct iteration method, abandoning the training process and thereby reducing training time. Sampling according to Eq. (32), after a certain number of iterations we check whether the algorithm satisfies the proposed termination condition ‖Π^(k+1) − Π^(k)‖ < δ; if it does, the algorithm terminates, otherwise it continues. Note that at each iteration we record the class of each unknown node, so for a node, in terms of Definition 3, the class assigned the highest number of times is taken as its final class.

5. Experimental results and analysis

5.1. Experiments settings

In order to test the efficiency of the proposed method, experimental data have been randomly selected from Google Maps aerial images. First, the images are preprocessed as described in Section 2, and target slices are then obtained by the GPAC (Graph Partitioning Active Contours) algorithm [36], as shown in Fig. 6. There are six vessel classes in the experimental data, each with 20 slices, for a total of 120 images.
Fig. 7. Significance weight of candidate features.
Fig. 8. Results of different ε.
5.3. Influences of parameter ˛, ˇ and burn-in period on classification results In this experiment, we use Micro-F1 to evaluate the classification result. Micro-F1 is a real between 0 and 1, and the greater numerical value is, the better classification performance is. According to the method [37], calculate the value of Micro-F1, as shown in Eq. (38). Where tic denotes the actual class of node Vi , yic represents the gained class by classification methods. If the actual class of Vi is Lc , then tic = 1, otherwise tic = 0. If the gained class is Lc , then yic = 1, otherwise yic = 0.
Fig. 6. Example of experiment data.
5.2. Recognition feature extraction and selection

As explained in Sections 3.1 and 3.2, we obtain the 48-dimensional feature vector. Then take the ship training sample set as the domain U, its corresponding feature vectors as the condition attribute set C, and the corresponding classes as the decision attribute set D. By Eq. (17), construct the decision table as below:

$$S = (U, C, D) = \begin{bmatrix} F_1^1 & \cdots & F_1^{48} & d_1 \\ \vdots & \ddots & \vdots & \vdots \\ F_{60}^1 & \cdots & F_{60}^{48} & d_{60} \end{bmatrix} \tag{36}$$

where $F_i^j$ represents the $j$th feature value of the $i$th training sample. As was pointed out in Section 3.2, we get the reduction result $Q = \{C_1, C_3, C_5, C_{43}, C_{48}\}$, where $C_i = (F_1^i, F_2^i, \ldots, F_{60}^i)^{T}$ is the feature vector composed of the $i$th feature of all the samples. With respect to Eq. (19), compute the significance weight of each individual feature, $W_{C_k} = \mathrm{SGF}(C_k, Q, D)$; for convenience of processing, it can be normalized as

$$W'_{C_k} = \frac{W_{C_k}}{\sum_{C_i \in Q} W_{C_i}} \tag{37}$$

The selected valid features are $Z = \{F_3, F_4, F_5, F_{11}, F_{18}, F_{29}, F_{41}, F_{44}, F_{46}, F_{47}, F_{48}\}$, namely the ratio of length to width, perimeter, entropy, compactness, the 7th Hu moment invariant, the 11th Zernike moment invariant, the three most influential wavelet moment invariants $F_{41} = W_{010}$, $F_{44} = W_{101}$, $F_{46} = W_{111}$, and the first and last ARC. Fig. 7 shows the significance weights of the valid candidate features. It can be concluded that the common discernibility degree, based on rough set theory, computes the significance weight of each candidate feature well and selects valid recognition features automatically.
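The normalization of Eq. (37) and the subsequent selection step can be illustrated with a short sketch. The raw significance weights below are invented purely for illustration (the actual values come from Eq. (19) and are shown in Fig. 7), and the selection threshold is a hypothetical choice rather than the paper's.

```python
def normalize_weights(raw_weights):
    """Normalize raw significance weights as in Eq. (37):
    W'_k = W_k / sum of W_i over all features in the reduct Q."""
    total = sum(raw_weights.values())
    return {k: w / total for k, w in raw_weights.items()}

def select_features(weights, threshold):
    """Keep the features whose normalized weight reaches the threshold."""
    return sorted(k for k, w in weights.items() if w >= threshold)

# hypothetical raw SGF values, for illustration only
raw = {"F3": 0.30, "F4": 0.25, "F5": 0.20, "F11": 0.15, "F18": 0.10}
norm = normalize_weights(raw)
selected = select_features(norm, threshold=0.12)
```

Because the weights are divided by their common sum, the normalized weights form a distribution over the reduct, which makes a single threshold meaningful across data sets of different scales.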
$$\text{Micro-F1} = \frac{2 \sum_{i,c} (y_{ic} \times t_{ic})}{\sum_{i,c} (y_{ic} + t_{ic})} \tag{38}$$
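Eq. (38) can be computed directly from the two 0/1 indicator matrices; a minimal sketch (the function name is ours):

```python
import numpy as np

def micro_f1(t, y):
    """Micro-F1 of Eq. (38).

    t, y : (N, K) 0/1 indicator matrices; t[i, c] = 1 iff the actual class
    of node V_i is L_c, and y[i, c] = 1 iff the predicted class is L_c.
    """
    t = np.asarray(t)
    y = np.asarray(y)
    return 2.0 * (y * t).sum() / (y + t).sum()

# a perfect prediction gives Micro-F1 = 1
t = np.eye(3, dtype=int)   # three nodes, classes 0, 1, 2
assert micro_f1(t, t) == 1.0
```

With one-hot rows, the numerator counts correct predictions twice and the denominator is 2N, so Micro-F1 reduces to plain accuracy; the formula generalizes unchanged to multi-label indicators.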
To test the influence of the various parameters on the classification results, we select 30, 40, 50, 60, 70, 80 and 90 samples from all the samples, respectively. Fig. 8 shows the classification results for different ε. As is visible in Fig. 8, whether ε becomes too large or too small, the classification performance decreases significantly. When ε is around 5 the best performance is achieved, so in the experiments we take ε = 5. As mentioned in Definition 1, φ1 is a density factor that aims at decreasing the distance between two nodes distributed in a high-density region. Fig. 9 shows the classification results with different φ1. As φ1 varies, the classification result varies too; notice that the best performance is obtained when φ1 is around 0.5, so in this paper we take φ1 = 0.5. Similarly, φ2 is also a density factor, which aims at increasing the distance between two nodes distributed in a low-density region. As φ2 varies, the classification result varies too; notice that when φ2 is around 2.5 the best performance is obtained, so here we take φ2 = 2.5 (Fig. 10). In Definition 3, α is a weight coefficient. Referring to Fig. 11, if the value of α is too large, a node's own class has greater influence
Fig. 9. Results of different φ1.
Table 2
Experimental data of algorithms (I).

Method                 Performance index              Group 1   Group 2   Group 3   Group 4
KNN                    Average recognition rate (%)   66.3      67.9      70.6      76.9
                       Average time consumption (s)   0.490     0.461     0.422     0.362
SVM                    Average recognition rate (%)   61.5      71.7      73.3      77.1
                       Average time consumption (s)   0.074     0.075     0.078     0.075
HDR                    Average recognition rate (%)   62.8      79.1      85.0      87.2
                       Average time consumption (s)   0.049     0.047     0.045     0.044
The proposed method    Average recognition rate (%)   69.7      81.1      87.2      89.2
                       Average time consumption (s)   0.042     0.040     0.037     0.035
5.4. Recognition performance comparisons
Fig. 10. Results of different φ2.
Fig. 11. Results of different α.
on the classification results; if the value is too small, the classes of the neighbor nodes have greater influence on the classification results. However, it is easy to see that the smaller the value of α is, the more suitable the classification results appear, so we can reasonably deduce α → 0. In this paper, we take α = 0. β is a parameter of the prior distribution; before the actual data have been observed, it represents the estimator of the data distribution. Although it can play a smoothing role, if the value of β is too large, the prior distribution will weaken the influence of the actual data on the final results, which makes the classification performance decrease significantly.
As can be seen from Fig. 12, if β changes in the vicinity of 1/K, it has little impact on the classification performance (Fig. 13). δ is likewise a parameter of the prior distribution; before the true data have been observed, it represents the estimator of the data distribution. Although it can play a smoothing role, when δ is too large, the prior distribution will weaken the influence of the true data on the final results, and the classification performance decreases significantly. However, when δ changes in the vicinity of 1/K, it has little impact on the classification performance.
Fig. 12. Results of different β.
As was pointed out in Section 3.2.1, based on rough set theory, we use the common discernibility degree to compute the significance weight of each candidate feature and select valid recognition features automatically, and we take the valid recognition features as sample sets to run the recognition tests. We use averages obtained from many experiments to verify the recognition performance of the proposed algorithm (average recognition rate, average time consumption) and compare it with the results obtained by the k-nearest neighbor (KNN) [38], support vector machine (SVM) and hierarchical discriminant regression (HDR) methods. In this paper, a total of 8 groups of experiments have been done. In each experiment, we randomly select 30, 40, 50, 60, 70, 80, 90 and 100 training samples from the six ship classes. Each group is repeated 10 times, and the average over the 10 runs is taken as the final recognition rate. The 8 sets of experiments are denoted groups 1, 2, 3, 4, 5, 6, 7 and 8, respectively. Test data are shown in Tables 2 and 3. From the comparisons in Tables 2 and 3 it is clear that, under the same experimental conditions, the correct detection rate of the proposed algorithm is generally slightly higher than the HDR classifier and several percentage points higher than the other algorithms. Its time consumption is lower than the HDR classifier in each group and up to about 90% lower than the other methods. From the above, it can easily be seen that the proposed method has advantages over the traditional methods KNN, SVM and HDR. The KNN method takes the classes of the nearest samples in Euclidean distance as its output; it cannot make use of the training sample distribution, which causes its lower detection performance. The SVM method is based on statistical learning theory and is best suited to two-class recognition; when expanded to multi-class recognition, its performance declines.
HDR is a tree classifier, which unifies the classification and regression problems by reducing classification to regression, and has the ability of incremental learning, so as the number of samples increases it reaches a higher detection rate, but it costs more time. The proposed method is efficient because of the direct iteration method without a training process, and because not all of the neighbor nodes participate in the computation. Nonetheless, the proposed method seems to give a reasonable indication, and can be used to complement other results. It's helpful
Fig. 13. Results of different δ.
Table 3
Experimental data of algorithms (II).

Method                 Performance index              Group 5   Group 6   Group 7   Group 8
KNN                    Average recognition rate (%)   79.0      80.6      83.8      85.8
                       Average time consumption (s)   0.298     0.262     0.240     0.214
SVM                    Average recognition rate (%)   79.3      81.1      84.2      87.1
                       Average time consumption (s)   0.070     0.063     0.055     0.044
HDR                    Average recognition rate (%)   87.8      88.9      90.1      92.3
                       Average time consumption (s)   0.040     0.038     0.036     0.030
The proposed method    Average recognition rate (%)   89.3      89.5      89.7      91.2
                       Average time consumption (s)   0.038     0.036     0.033     0.031
for classification; in addition, only a portion of all the neighbor nodes participates in the computation, which lowers the computing time.
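As a concrete illustration of this neighbor pruning, the sketch below sorts a node's neighbors by manifold similarity and keeps only the top m of them for the class computation. The similarity values and node names are hypothetical, and the manifold similarity itself (Definition 1) is assumed to be precomputed.

```python
def top_neighbors(similarities, m):
    """Sort a node's neighbors by manifold similarity (descending)
    and keep only the top m for the class computation.

    similarities : dict mapping neighbor id -> manifold similarity
    """
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [node for node, _ in ranked[:m]]

# hypothetical manifold similarities of node V_i's neighbors
sims = {"V2": 0.91, "V7": 0.40, "V3": 0.77, "V9": 0.12}
assert top_neighbors(sims, 2) == ["V2", "V3"]
```

Dropping low-similarity neighbors both shrinks the per-node computation and filters out weakly related nodes that would otherwise dilute the class estimate.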
6. Conclusions

This paper has proposed a discriminative approach for ship detection using optical remote sensing data based on a manifold similarity probability generative model. The proposed approach can be considered a reliable and timely detection method for battlefield targets at sea. First, an improved Hough transformation is introduced to preprocess the target candidate region, which reduces the amount of computation by filtering the edge points; our experiments show that the targets (ships) can be detected quickly and accurately. Second, an attribute reduction algorithm based on the common discernibility degree is adopted to select recognition features automatically: on the basis of the extracted invariant features and rough set theory, the common discernibility degree is used to compute the significance weight of each candidate feature and select valid recognition features automatically. It is also worthwhile to mention that the rough set approach ensures a relatively high reduction rate while simultaneously reducing time complexity. Finally, an efficient classifier with a dynamic probability generative model is applied to classify the ship targets. The classifier builds a new generative model on an undirected graph, in which the edges of the graph are observed variables and the classes of the nodes whose classes are unknown are latent variables; the values of the latent variables are calculated by fitting the generative model to the graph. Note that the neighbor nodes of each sample node are sorted by their manifold similarity; using the classes of the nodes selected from the top of the sorted neighbor list, a dynamic probability generative model is built to classify ships in data from the optical remote sensing system.
In feature extraction, apart from the commonly used size, texture, shape and moment invariant features, an area ratio code has been introduced. Our experiments further prove the new feature to be a powerful one for target classification. In feature selection, based on rough set theory, the common discernibility degree is utilized to compute the significance weight of each candidate feature and select valid recognition features automatically. Experimental results in this paper also show that the attribute reduction algorithm based on the common discernibility degree ensures a relatively high reduction rate while reaching fairly low time complexity in an incomplete information system; it can thus serve as a practical method for feature selection and is helpful for ship recognition.
In classification, the proposed method has been shown to be more discriminative than the KNN, SVM and HDR methods. Moreover, analysis and comparison of the experimental results show that the classification strategy based on the manifold similarity probability generative model helps achieve good classification performance and turns out to be well suited to real-time target detection applications. Although our approach has shown promising results overall, several issues remain that require further improvement or refinement to enhance its performance. First, the proposed method is heuristic; it consists of several steps containing many thresholds, some of which should be further refined. A semi-supervised hierarchical strategy may be a better solution. Second, false candidates, mainly comprising ports and sea clutter, still exist, especially at relatively low resolution; more effective features are needed to distinguish them. Finally, the time consumption of the classifier is still not ideal in multi-class tasks. In future work, we hope to pursue more effective feature extraction and more detailed feature selection, more efficient classification, the use of multispectral features, and a test of the proposed approach on a larger set of remote sensing images over a wide resolution range.
Acknowledgments

The authors would like to acknowledge Wuhan Digital Engineering Research Institute for providing materials and equipment for this study. They would also like to thank the anonymous reviewers and the associate editors for their constructive comments and suggestions. Furthermore, they thank Dr. Wang from America for useful advice on the related algorithm.
Appendix A. Procedure for the estimator of the cth component θ_ic of parameter θ_i

Posterior probability distribution of parameter $\theta_i$:

$$p(\theta_i \mid E, \delta) = \frac{p(\theta_i, E \mid \delta)}{p(E \mid \delta)} \propto p(\theta_i, E \mid \delta) = p(\theta_i \mid \delta)\, p(E \mid \theta_i) = p(\theta_i \mid \delta) \prod_{j=1}^{N} \theta_{i y_j}^{W_{ij}} = p(\theta_i \mid \delta) \prod_{k=1}^{K} \theta_{ik}^{C_{ik}} \tag{A.1}$$

Because

$$p(\theta_i \mid \delta) = \mathrm{Dirichlet}(\theta_i \mid \delta) = \frac{\Gamma(K\delta)}{\Gamma(\delta)^{K}} \prod_{k=1}^{K} \theta_{ik}^{\delta - 1} \tag{A.2}$$

we clearly have

$$p(\theta_i \mid \delta) \prod_{k=1}^{K} \theta_{ik}^{C_{ik}} \propto \mathrm{Dirichlet}(\theta_i \mid \delta') \tag{A.3}$$

where $\delta' = (\delta + C_{i1}, \ldots, \delta + C_{ic}, \ldots, \delta + C_{iK})$ and $C_{ic}$ denotes the number of edges between $V_i$ and the nodes whose class is $L_c$. Accordingly,

$$p(\theta_i \mid E_i, \delta) \propto \mathrm{Dirichlet}(\theta_i \mid \delta') \tag{A.4}$$

The value of parameter $\theta_i$ can be estimated by the expectation of its posterior distribution. In fact, the estimator of the $c$th component $\theta_{ic}$ of $\theta_i$ is

$$\hat{\theta}_{ic} = \frac{C_{ic} + \delta}{\sum_{k=1}^{K} (C_{ik} + \delta)} \tag{A.5}$$
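The closed-form estimator of Eq. (A.5) amounts to a smoothed normalization of the per-class edge counts; a minimal sketch (names are ours, not the paper's):

```python
import numpy as np

def theta_hat(C_i, delta):
    """Posterior-mean estimate of theta_i (Eq. (A.5)):
    theta_ic = (C_ic + delta) / sum_k (C_ik + delta),
    where C_i[c] counts the edges between V_i and nodes of class L_c."""
    C_i = np.asarray(C_i, dtype=float)
    return (C_i + delta) / (C_i + delta).sum()

# node with 3 edges to class 0, 1 edge to class 1, none to class 2
est = theta_hat([3, 1, 0], delta=0.5)
```

The additive delta plays the usual Dirichlet-smoothing role: no class probability is ever exactly zero, even for classes with no connecting edges.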
References

[1] W. Siedlecki, J. Sklansky, A note on genetic algorithms for large-scale feature selection, Pattern Recognit. Lett. 10 (11) (1989) 335–347.
[2] W. Siedlecki, J. Sklansky, On automatic feature selection, Int. J. Pattern Recognit. Artif. Intell. 2 (2) (1998) 197–200.
[3] Y.M. Zheng, Improvement of Feature Selection Method Based on Genetic Algorithm, Chongqing University, Chongqing, 2009.
[4] Rong-hua Shang, Chao-xu Hu, Li-cheng Jiao, Jing Bai, Research of multiobjective optimization algorithms' application in multi-class classification, Acta Electron. Sin. 40 (11) (2012) 2265–2266.
[5] W.L. Cai, S.C. Chen, D.Q. Zhang, A multi-objective simultaneous learning framework for clustering and classification, IEEE Trans. Neural Networks 21 (2) (2010) 185–200.
[6] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, London, 1991.
[7] Chun Du, Jixiang Sun, Zhiyong Li, et al., Method for ship recognition using optical remote sensing data, J. Image Graphics 17 (4) (2012) 591.
[8] J. Lan, L. Wan, Automatic ship target classification based on aerial images, in: Proceedings of SPIE, vol. 7156, SPIE, Bellingham, WA, 2009, pp. 1–10.
[9] H. Pan, Y.P. Zhu, L.Z. Xia, Efficient and accurate face detection using heterogeneous feature descriptors and feature selection, Comput. Vision Image Understanding 117 (1) (2013) 12–28.
[10] Guang Yang, Bo Li, Shufan Ji, Feng Gao, Qizhi Xu, Ship detection from optical satellite images based on sea surface analysis, IEEE Geosci. Remote Sens. Lett. 11 (3) (2014) 641–645.
[11] H.S. Gu, Y.M. Zhang, Q. Ji, Task oriented facial behavior recognition with selective sensing, Comput. Vision Image Understanding 100 (3) (2005) 385–415.
[12] Z. Zhang, A Study on Harbor Target Recognition in High Resolution Optical Remote Sensing Image, University of Science and Technology of China, Hefei, 2009.
[13] G.C. Anagnostopoulos, SVM-based target recognition from synthetic aperture radar images using target region outline descriptors, Nonlinear Anal. 71 (12) (2009) e2934–e2939.
[14] Wei Jin, Jian-qi Zhang, Xiang Zhang, Face recognition method based on support vector machine and particle swarm optimization, Expert Syst. Appl. 38 (2011) 4390–4391.
[15] Wen Ying, An improved discriminative common vectors and support vector machine based face recognition approach, Expert Syst. Appl. 39 (2012) 4629–4630.
[16] Min Tang, Feng Chen, Facial expression recognition and its application based on curvelet transform and PSO-SVM, Optik 124 (2013) 5403–5404.
[17] Luis Samaniego, Andras Bardossy, Karsten Schulz, Supervised classification of remotely sensed imagery using a modified k-NN technique, IEEE Trans. Geosci. Remote Sens. 46 (7) (2008) 2112–2125.
[18] J. Antelo, G. Ambrosio, J. Gonzalez, et al., Ship detection and recognition in high resolution satellite images, in: IEEE International Geoscience and Remote Sensing Symposium, vol. 4, IEEE Computer Society, Washington, DC, USA, 2009, pp. 514–517.
[19] Hatice Çınar Akakın, Bülent Sankur, Robust classification of face and head gestures in video, Image Vision Comput. 29 (2011) 470–480.
[20] Wey Shiuan Hwang, Juyang Weng, Hierarchical discriminant regression, IEEE Trans. Pattern Anal. Mach. Intell. 22 (11) (2000) 1277–1293.
[21] J.Y. Weng, W. Hwang, Incremental hierarchical discriminant regression, IEEE Trans. Neural Networks 18 (2) (2007) 397–415.
[22] M. Uma Selvi, S. Suresh Kumar, A novel approach for ship recognition using shape and texture, Int. J. Adv. Inf. Technol. (IJAIT) 1 (5) (2011) 23–29.
[23] Costas Panagiotakis, Ilias Grinias, Georgios Tziritas, Natural image segmentation based on tree equipartition, Bayesian flooding and region merging, IEEE Trans. Image Process. 20 (8) (2011) 2276–2287.
[24] Zhenyu An, Zhenwei Shi, Xichao Teng, Xinran Yu, Wei Tang, An automated airplane detection system for large panchromatic image with high spatial resolution, Optik 40 (10) (2013) 3448–3450.
[25] Changren Zhu, Hui Zhou, Runsheng Wang, Jun Guo, A novel hierarchical method of ship detection from spaceborne optical image based on shape and texture features, IEEE Trans. Geosci. Remote Sens. 48 (9) (2010) 3446–3456.
[26] R.O. Duda, P.E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Commun. ACM 15 (1) (1972) 11–15.
[27] Chen Wenting, Ji Kefeng, Xing Xiangwei, Ship recognition in high resolution SAR imagery based on feature selection, Proc. Int. Conf. Comput. Vis. Remote Sens. (2012) 302–303.
[28] Chun Du, Jixiang Sun, Zhiyong Li, Shuhua Teng, Method for ship recognition using optical remote sensing data, J. Image Graphics 17 (4) (2012) 591–592.
[29] S.H. Teng, D.C. Zan, J.X. Sun, Z.G. Tan, Attribute reduction algorithm based on common discernibility degree, Pattern Recognit. Artif. Intell. 23 (5) (2010) 630–638.
[30] M. McPherson, L. Smith-Lovin, J.M. Cook, Birds of a feather: homophily in social networks, Annu. Rev. Sociol. 27 (1) (2001) 415–444.
[31] Zhenwen Wang, Weidong Xiao, Wentang Tan, Classification in networked data based on the probability generative model, J. Comput. Res. Dev. 50 (12) (2013) 2645–2646.
[32] S. Theodoridis, K. Koutroumbas, Pattern Recognition, third ed., Publishing House of Electronics, Beijing, 2010.
[33] C.Y. Zhang, J.L. Sun, Y.Q. Ding, Topic mining for microblog based on MB-LDA model, J. Comput. Res. Dev. 48 (10) (2011) 1795–1802.
[34] M.I. Jordan, Z. Ghahramani, T. Jaakkola, L.K. Saul, An introduction to variational methods for graphical models, Mach. Learn. 37 (2) (1999) 183–233.
[35] M. Opper, D. Saad, Advanced Mean Field Methods: Theory and Practice, MIT Press, Cambridge, MA, 2001.
[36] B. Sumengen, B.S. Manjunath, Graph partitioning active contours (GPAC) for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 28 (4) (2006) 509–521.
[37] L. Tang, H. Liu, Leveraging social media networks for classification, Data Min. Knowl. Discovery 23 (3) (2011) 447–478.
[38] J.X. Sun, Modern Pattern Recognition, second ed., Higher Education Press, Beijing, 2008, pp. 252–259.