Optik 126 (2015) 4004–4013
Variational approximate inferential probability generative model for ship recognition using remote sensing data

Weiya Guo a,b,∗, Xuezhi Xia b, Xiaofei Wang b
a College of Information Technology, Harbin Engineering University, Harbin, China
b Wuhan Digital Engineering Research Institute, Wuhan, China
Article history: Received 6 July 2014; Accepted 27 July 2015

Keywords: Ship recognition; Hough transformation; ε-local neighborhood information; Manifold similarity; Variational methods
Abstract

Aiming at detecting sea targets reliably and in a timely manner, a discriminative ship recognition method for optical remote sensing data, based on a variational-method probability generative model, is presented. First, an improved Hough transformation is utilized to pretreat the target candidate region; it reduces the amount of computation by filtering the edge points, and our experiments indicate that the targets (ships) can be detected quickly and accurately. Second, based on rough set theory, the common discernibility degree is used to compute the significance weight of each candidate feature and to select valid recognition features automatically. Finally, for each node, its neighbor nodes are sorted by their manifold similarity to the node. Using the classes of the nodes selected from the top of the sorted neighbor list, a dynamic probability generative model is built to recognize ships in data from optical remote sensing systems. Experimental results on real data show that the proposed approach achieves better classification rates at a higher speed than the k-nearest neighbor (KNN), support vector machine (SVM) and traditional hierarchical discriminant regression (HDR) methods. © 2015 Elsevier GmbH. All rights reserved.
1. Introduction

Target detection and recognition from remote sensing images play a critical role in various applications of pattern recognition, such as fishery management, vessel traffic services, and maritime activities. In particular, as naval strength keeps developing at high speed around the world, ship detection and recognition have become increasingly important for effective and efficient ship monitoring to form marine combat intelligence. In recent years, with the rapid development of earth observation technology, satellite remote sensing has entered an unprecedented new stage, and sea reconnaissance and target surveillance are provided with abundant data sources by a number of imaging satellites with high spatial resolution and short revisit cycles. For example, the French SPOT-5 satellite's full-color images reach a resolution of 2.5 m; the American QuickBird full-color images have a resolution of 0.6 m and a revisit cycle of 1–3.5 days; and the most advanced American military reconnaissance satellites reach a resolution of 0.05 m. Moreover, with the promotion and implementation of the
∗ Corresponding author at: Harbin Engineering University, College of Information Technology, Harbin, China. Tel.: +86 13031666128.
E-mail address: [email protected] (W. Guo).
http://dx.doi.org/10.1016/j.ijleo.2015.07.178
0030-4026/© 2015 Elsevier GmbH. All rights reserved.
earth satellite imaging systems, there will in the future be earth observation satellites with better performance, higher resolution and shorter revisit cycles, and the available satellite remote sensing image data will keep growing explosively. In the face of such massive remote sensing image data, traditional approaches such as manual visual interpretation cannot satisfy the efficient information requirements of modern society, owing to their low efficiency, high cost, long information acquisition cycle and other defects. How to quickly and accurately extract and recognize ship targets automatically from massive remote sensing data has become an urgent problem.

Recently, different kinds of features have been proposed for optical remote sensing target recognition, and they appear promising for recognition performance. However, these methods mainly have two shortcomings: (1) they generally depend on experience to select feature vectors, and the selected vectors are high-dimensional and redundant, making it difficult to improve recognition accuracy and speed; (2) their classifiers need a global search, whose time complexity is high and cannot satisfy the requirements of real-time processing.

To overcome the first shortcoming, genetic algorithms were introduced for feature selection in the 1990s [1,2], with good results. Zheng [3] proposed an improved feature selection method based on a genetic algorithm, and Shang et al. [4] employed the NSGA-II algorithm to optimize the
classification learning framework (MSCC) [5]. Besides, based on rough set theory [6], Du et al. [7] used the idea of attribute reduction to compute the significance weight of each candidate feature and select valid recognition features. Moreover, an improved Zernike moment invariant was used to recognize large warships in aerial remote sensing images [8], and an efficient and accurate face detection method using feature selection was proposed in [9]. Later, a novel linear function combining pixel and region characteristics was employed by Yang et al. [10] to select ship candidates, which optimized the detection performance. Although these methods have improved recognition accuracy, most of them still suffer from high time complexity.

To overcome the second shortcoming, image recognition efficiency has been improved gradually by various existing methods. A task-oriented facial behavior recognition method was presented in [11]; despite its high recognition rate, it cost a lot of computation time. Support vector machine (SVM) technology was applied to target recognition in QuickBird satellite data [12] and SAR images [13], and SVM has also been used in face recognition [14–16]. However, since the SVM method is derived from statistical learning theory and prefers two-class problems, it performs poorly in multi-class classification tasks. An improved k-nearest neighbor algorithm for supervised remote sensing image classification was presented in [17], and a likelihood function based on the Bayesian criterion was designed to recognize ships in satellite images [18]. These methods were successfully applied to ship target recognition. However, they were built on traditional algorithms, needed a global traversal in the classification process, and lacked the ability of online learning; namely, they cannot effectively use the distribution of the training samples.

Akakın and Sankur [19] proposed a method for robust classification of face and head gestures in video; although the recognition rate is as high as 98%, it costs a lot of computation time. In addition, hierarchical discriminant regression (HDR) [20] has been investigated, which unifies classification and regression into a single regression problem; its efficiency is high, its time complexity is O(d log n), and it is suitable for high-dimensional image data processing. Based on the HDR method, Weng and Hwang [21] proposed an incremental discriminant regression technique, which obtained significantly better classification accuracy, but its computation time was slightly larger than that of the traditional HDR algorithm. Besides, for ship detection, a new classification approach using shape and texture was introduced in [22], which attained a good classification rate, and another method [23] provided robust enhancement and detection of mostly line structures in 2-D gray-scale images. Although these methods have relatively high detection or recognition accuracy, their time consumption makes them unsuitable for real-time ship detection or recognition. Further, An et al. classified airplane candidates by using a circle frequency filter [24]; the detection rate is higher than 96%, but the whole time cost is around 1 min. In recent years, the application of probability generative models in classifiers has become popular; for example, the approach in [25] classified ship candidates by using their class probability distributions rather than the directly extracted features, which was effective in distinguishing between ships and non-ships. For ship recognition, both a higher recognition rate and a lower time consumption are preferred.
In view of the above-mentioned facts, this paper is concerned with recognition efficiency. Taking optical remote sensing image data with a resolution of more than 4 m as the research object, and based on the class propagation distribution, a general ship target recognition method using a probability generative model is proposed, which makes the necessary improvements in image segmentation, feature selection and classifier design.
Fig. 1. The proposed method flow diagram.
Fig. 1 illustrates the flow diagram of the proposed method. The rest of the paper is organized as follows. Section 2 briefly introduces an improved Hough transformation and uses it to preprocess the target candidate region. Section 3 sketches our feature extraction and how attribute reduction based on rough set theory is applied to feature selection. Section 4 is devoted to target classification with a dynamic probability generative model based on the classes of neighbor nodes. Section 5 gives experiments and comparative results in detail, and Section 6 summarizes our contributions and sketches future work.

2. Pretreatment of the target candidate region

The Hough transformation [26] has good detection adaptability due to its high positioning accuracy and strong ability to resist noise and clutter interference. However, its shortcomings, such as a large amount of computation and high storage requirements, have limited its application. To solve this problem we propose an improved Hough transformation. Its principle is as follows:

(1) Randomly select three edge points that are not on the same line from an image's edge point set V. From the three points we can obtain a circle with center (a, b) and radius r, so we have

2xa + 2yb + d = x^2 + y^2, \quad d = r^2 - a^2 - b^2    (1)
(2) Now, assume the selected edge points are v_i = (x_i, y_i), i = 1, 2, 3; the obtained center (a_{123}, b_{123}) and radius r_{123} can be expressed as follows:

a_{123} = \frac{\begin{vmatrix} x_2^2 + y_2^2 - (x_1^2 + y_1^2) & 2(y_2 - y_1) \\ x_3^2 + y_3^2 - (x_1^2 + y_1^2) & 2(y_3 - y_1) \end{vmatrix}}{4\left((x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)\right)}    (2)

b_{123} = \frac{\begin{vmatrix} 2(x_2 - x_1) & x_2^2 + y_2^2 - (x_1^2 + y_1^2) \\ 2(x_3 - x_1) & x_3^2 + y_3^2 - (x_1^2 + y_1^2) \end{vmatrix}}{4\left((x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)\right)}    (3)

r_{123} = \sqrt{(x_i - a_{123})^2 + (y_i - b_{123})^2}, \quad i = 1, 2, 3    (4)
In addition, take v_4 = (x_4, y_4); if it satisfies Eq. (5), then v_4 is on the circle, where δ is the threshold value:

d_{4 \to 123} = \left| \sqrt{(x_4 - a_{123})^2 + (y_4 - b_{123})^2} - r_{123} \right| < \delta    (5)
(3) Based on the above, for the candidate circle c_{ijk} determined by v_i, v_j, v_k, take all the points v_l of the set V; if a point satisfies d_{l→ijk} < δ, it lies on the candidate circle c_{ijk}, and a counter is incremented until all the edge points have been examined. If the accumulated count is greater than a specified threshold α, the candidate circle is determined to be a true circle; the next candidate circle is then detected in accordance with the above steps.

In fact, the improved algorithm reduces the amount of computation by filtering the edge points; its physical meaning can be explained by Fig. 2. For a circle with center (a, b) and radius r, any of its points must lie within its circumscribed square, so from Fig. 2 we obtain inequalities (6):

x_l > a_{ijk} + r_{ijk} + t,  x_l < a_{ijk} - r_{ijk} - t,  y_l > b_{ijk} + r_{ijk} + t,  y_l < b_{ijk} - r_{ijk} - t    (6)

where t is a variable parameter. For any point v_l of the set V, as long as it satisfies one of the inequalities (6), d_{l→ijk} need not be calculated, and the next edge point is examined directly.

Fig. 2. Diagram of improved Circular Hough detection algorithm.

The experimental environment is as follows: the tool is Matlab 7.0 R, and each original image has a resolution of 400 × 300; the images are derived from QuickBird satellite images. Fig. 3 strengthens these results: using the improved circular Hough transformation, the targets (ships) can be detected quickly and accurately.

3. Ship target feature extraction and selection

Feature selection, as well as extraction, is a challenging problem in areas such as pattern recognition, machine learning and data mining. In general, discriminative features are helpful for ship recognition, so in this section we intend to find more effective features and consider a consistency measure based on rough set theory (also called attribute reduction), which aims to retain the discriminatory power of the original features.

Fig. 4. Area code examples.

3.1. Feature extraction

For optical remote sensing images, there are some differences in size, shape, texture, and so on. We extract features from the following items:
(1) Size features: length F_1, width F_2, length-width ratio F_3, and perimeter F_4.

(2) Texture features: image entropy F_5 and smoothness F_6. Within the 8 × 8 neighborhood window, the image entropy is computed as

ME(x, y) = -\sum_{x=1}^{8} \sum_{y=1}^{8} p_{x,y} \log p_{x,y}    (7)

p_{x,y} = \frac{f(x, y)}{\sum_{x=1}^{8} \sum_{y=1}^{8} f(x, y)}    (8)
where f(x, y) is the gray value of a point (x, y) in the neighborhood window, f(x, y) > 0, and p_{x,y} is the probability distribution of f(x, y).

Fig. 3. Original images and detection results.

(3) Shape features: bow shape F_7, stern shape F_8, radius deviation F_9, spindle length F_10, and compactness F_11. F_11 is given by Eq. (9); compared to roundness, compactness fully considers the impact of boundary changes on the average radius of the object, which better describes the complexity of target shapes:

L = \frac{\pi \bar{r}^2}{A} = \frac{\pi \left( \sum_{(i,j) \in \Omega} \sqrt{(i - \bar{i})^2 + (j - \bar{j})^2} \right)^2}{P^2 A}    (9)

where \bar{r} = \frac{1}{P} \sum_{(i,j) \in \Omega} \sqrt{(i - \bar{i})^2 + (j - \bar{j})^2} is the equivalent radius, P is the perimeter, A is the area, (\bar{i}, \bar{j}) represents the target center, and Ω expresses the set of target boundary points.

(4) Moment invariant features: the 7 Hu moment invariants (F_12∼F_18), the first 20 Zernike moment invariants of order less than 8 (F_19∼F_38) and 8 wavelet moment invariants (F_39∼F_46).

(5) Area ratio features: different from the area code of the whole target [27], this paper only extracts the ship target area ratios of the bow F_47 and stern F_48, as shown in Fig. 4. For Fig. 4, obtain the circumscribed rectangle of the target slice by rotating along the spindle to the horizontal direction, then divide the rectangle into N equal parts and take the target area of the ith part as S_i; each
part is encoded according to Eq. (10), and the ARC is defined by Eq. (11):

C_i = \left\lfloor \frac{S_i}{\max_{1 \le i \le N} S_i} \times 10 \right\rfloor    (10)

ARC = [C_1, C_2, \ldots, C_N]    (11)

Considering that for the majority of ship targets the bow and stern parts each occupy about 1/3 ∼ 1/8 of the whole ship, the number of divisions is usually 3 ≤ N ≤ 8; in this paper, specifically, N = 6. For the example in Fig. 4 the ARC code is C = [4, 8, 10, 10, 9, 7], and we take only C_1 and C_6, namely F_47 = C_1 and F_48 = C_6. Summing up, the feature vector F = [F_1, F_2, ..., F_48] is obtained from all the extracted features.
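As a concrete illustration, the image entropy of Eqs. (7) and (8) and the area ratio code of Eqs. (10) and (11) can be sketched as below. This is a minimal re-implementation, not the authors' code; in particular, the divisor in Eq. (10) is read here as max_i S_i, which is the interpretation consistent with the example code C = [4, 8, 10, 10, 9, 7].

```python
import math

def window_entropy(window):
    """Image entropy of an 8x8 gray-level window, Eqs. (7)-(8):
    p_xy = f(x, y) / sum f, then ME = -sum_x sum_y p_xy * log(p_xy)."""
    total = float(sum(sum(row) for row in window))
    ent = 0.0
    for row in window:
        for f in row:
            if f > 0:                      # the paper assumes f(x, y) > 0
                p = f / total
                ent -= p * math.log(p)
    return ent

def area_ratio_code(areas):
    """Eq. (10): C_i = floor(S_i / max_i S_i * 10) for the N strip areas S_i;
    Eq. (11) collects the codes into the ARC vector."""
    s_max = max(areas)
    return [math.floor(s / s_max * 10) for s in areas]

def bow_stern_features(areas):
    """F47 and F48: the first and last ARC components (bow and stern)."""
    code = area_ratio_code(areas)
    return code[0], code[-1]
```

With the strip areas proportional to the paper's example, `area_ratio_code` reproduces C = [4, 8, 10, 10, 9, 7] and `bow_stern_features` yields (F47, F48) = (4, 7).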
3.2. Feature selection

Insufficient prior knowledge often brings blindness to feature extraction. In this paper the relevance among the extracted features is strong, and the resulting redundancy easily reduces recognition efficiency. In fact, ship target recognition is a multi-feature combination classification problem; different feature combinations contribute differently to recognition, and our purpose is to find the most effective feature combinations. Referring to the related algorithm [28] and based on rough set theory [6], the common discernibility degree [29] is utilized to compute the significance weight of each candidate feature and to select valid recognition features automatically.

3.2.1. Common discernibility degree computing

Formally, in rough set theory, an information system can be expressed as S = (U, A, V, f), where U is a universe, U = {x_1, x_2, ..., x_{|U|}}, and A is the set of all attributes. Each attribute a_j ∈ A has a value set V_j, and V = ∪_{a_j ∈ A} V_j. The information function f : U × A → V satisfies the condition ∀a_j ∈ A, f(x, a_j) ∈ V_j. If A = C ∪ D and C ∩ D = ∅, then S = (U, C, D) is called a decision table, where C is the condition attribute set and D is the decision attribute set. If ∃a_j ∈ A such that V_j includes "∅", then S is called an incomplete information system.

Under the above assumptions, we first define the discernibility relation of an attribute set P ⊆ A:

DIS(P) = \{(x, y) \in U \times U \mid \exists a \in P,\ f(x, a) \ne f(y, a),\ f(x, a) \ne \varnothing,\ f(y, a) \ne \varnothing\}    (12)

The largest set of objects distinguishable from x by P can be written as

D_P(x) = \{y \in U \mid (x, y) \in DIS(P)\}    (13)

The discernibility degree of the attribute set P is then

|DIS(P)| = \sum_{i=1}^{|U|} |D_P(x_i)|    (14)

|DIS(P)| in Eq. (14) measures the discernibility ability of the attribute set P and equals the number of ordered pairs in DIS(P). Further, we define the common discernibility relation between attribute sets P and Q:

DIS(Q; P) = DIS(Q) \cap DIS(P)    (15)

Its corresponding common discernibility degree is described as

|DIS(Q; P)| = |DIS(Q) \cap DIS(P)|    (16)

It can be seen from Eqs. (15) and (16) that DIS(Q; P) is an ordered set of pairs of objects in the domain U that both P and Q can distinguish. Accordingly, |DIS(Q; P)|, the number of elements of DIS(Q; P), is used to measure the common discernibility ability of P and Q.

3.2.2. Feature selection algorithm steps

As mentioned in Section 3.2.1, the stages realized in this phase can be summarized as below:

(1) Discretize the extracted ship features and construct the feature decision table (columns C_1, ..., C_n and D):

S(U, C, D) = \begin{bmatrix} F_{11} & \cdots & F_{1n} & d_1 \\ \vdots & \ddots & \vdots & \vdots \\ F_{m1} & \cdots & F_{mn} & d_m \end{bmatrix}    (17)

In Eq. (17), U_i = (F_{i1}, F_{i2}, ..., F_{in}) represents the ith feature vector and U = {U_1, U_2, ..., U_m} indicates the target set; C^i = (F_{1i}, F_{2i}, ..., F_{mi})^T means the ith candidate feature and C = {C^1, C^2, ..., C^n} indicates the set of all candidate features; D = {d_1, d_2, ..., d_m} is the ship decision (category) set; F_{ij} denotes the jth feature of the ith training sample; and m, n represent the number of training samples and the feature dimension, respectively.

(2) Define the feature sets Q and T, set Q = ∅ and T = C, and calculate the common discernibility degree |DIS(D; C)| between the attribute sets C and D according to Eq. (16).

(3) Define the feature-set importance function by Eq. (19), find the most important feature C_k of the set C − Q according to Eq. (20), and update Q and T:

Q = Q \cup \{C_k\}, \quad T = T - \{C_k\}    (18)

SGF(C_k, Q, D) = |DIS(D; Q \cup \{C_k\})| - |DIS(D; Q)|    (19)

SGF(C_k, Q, D) = \max_{C_i \in T} SGF(C_i, Q, D)    (20)

(4) If |DIS(D; Q)| = |DIS(D; C)|, go to step (5); otherwise, go to step (3).

(5) Take the final Q = {C^1, C^2, ..., C^l} as the feature selection result and build the reduced training sample feature set (rows x_1, ..., x_m; columns C^1, ..., C^l and D):

X = \begin{bmatrix} F_{11} & \cdots & F_{1l} & d_1 \\ \vdots & \ddots & \vdots & \vdots \\ F_{m1} & \cdots & F_{ml} & d_m \end{bmatrix}    (21)

where x_i is the ith target's reduced feature vector and l is the number of reduced feature dimensions.
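The selection loop of steps (1)-(5) can be sketched as follows. This is a simplified illustration under our own assumptions: the decision table is stored as a list of dicts with a decision key 'D', missing values are None, and the set-based pair counting stands in for the paper's implementation.

```python
def discernibility_pairs(table, attrs):
    """DIS(P), Eq. (12): ordered pairs of samples that differ on some
    attribute in P; None plays the role of the missing value '∅'."""
    n = len(table)
    pairs = set()
    for i in range(n):
        for j in range(n):
            for a in attrs:
                vi, vj = table[i][a], table[j][a]
                if vi is not None and vj is not None and vi != vj:
                    pairs.add((i, j))
                    break
    return pairs

def common_degree(table, p_attrs, q_attrs):
    """|DIS(Q; P)| = |DIS(Q) ∩ DIS(P)|, Eqs. (15)-(16)."""
    return len(discernibility_pairs(table, p_attrs) &
               discernibility_pairs(table, q_attrs))

def select_features(cond, table):
    """Greedy selection, steps (2)-(4) of Section 3.2.2: grow Q with the
    feature maximizing SGF(C_k, Q, D) of Eq. (19) until
    |DIS(D; Q)| = |DIS(D; C)|."""
    target = common_degree(table, cond, ['D'])
    q = []
    while common_degree(table, q, ['D']) < target:
        best = max((c for c in cond if c not in q),
                   key=lambda c: common_degree(table, q + [c], ['D']))
        q.append(best)
    return q
```

For example, on a two-sample table where feature 'a' separates the decisions and 'b' is constant, `select_features(['a', 'b'], table)` keeps only 'a'.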
To sum up, the application and performance of the feature extraction and selection method are discussed in Section 5.2.

4. Target classification and recognition

In this paper, the classifier is designed on the basis of a dynamic probability generative model. The main idea of the proposed method is to build a new generative model on an undirected graph, in which the edges of the graph are observed variables and the classes of the unknown-class nodes are latent variables. The values of the latent variables can be calculated by fitting the generative model of the graph.
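To make the setup concrete, a minimal sketch of such a graph follows (the data and names here are hypothetical, purely for illustration): the adjacency matrix S holds the observed edges, known-class nodes form the set V_L, and unknown-class nodes V_U carry the latent class variables to be inferred.

```python
# Hypothetical 4-node example: observed edges in an adjacency matrix S,
# known classes for the nodes in V_L, latent classes for the nodes in V_U.
S = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
known_classes = {0: 0, 1: 0}   # V_L: node index -> class index in L
unknown_nodes = [2, 3]         # V_U: classes to be inferred

def neighbors(s, i):
    """Nodes j with S_ij = 1, i.e. the observed edges of node i."""
    return [j for j, e in enumerate(s[i]) if e == 1]
```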
The method does not require a training process; it directly uses an iterative formula to calculate the classes of the unknown nodes. In this section, we first propose the concept of the class propagation distribution and use it to express the probability of a node's neighbors belonging to each class. Then, based on the class propagation distribution, the dynamic probability generative model is presented. Finally, by solving the model, the classes of the nodes whose classes are unknown are obtained.

4.1. Dynamic probability generative model based on the classes of neighbor nodes
4.1.1. Class propagation distribution

In networks with low homophily [30], there exist many connected nodes whose classes differ from each other, so judging whether two nodes belong to the same class only by whether an edge exists between them is not sufficient. Therefore, this paper considers two nodes' classes and their neighbor nodes' classes at the same time. Assume there are two nodes V_i and V_j, and V_j belongs to the class L_c; the more of V_j's neighbor nodes belong to class L_c, the greater the probability that an edge exists between V_i and V_j. This differs from the class propagation distribution proposed in [31], which only considered the neighbor nodes' classes and placed no limit on the number of neighbor nodes. It is necessary to discuss a node's own class contribution for improving classification accuracy and, in order to reduce the computing time, also to discuss how many neighbor nodes should participate in the computation. First, the corresponding definitions are listed as follows:

Definition 1 (ε-neighborhood distance). In an undirected graph G = (V, E) containing two nodes V_i and V_j, define V_i's ε-neighborhood distance as

D_\varepsilon(V_i, V_j) = \begin{cases} \Phi_1^{dist(V_i, V_j)} - 1, & dist(V_i, V_j) < \varepsilon \\ \Phi_2^{dist(V_i, V_j)} - 1, & dist(V_i, V_j) \ge \varepsilon \end{cases}    (22)

where dist(V_i, V_j) is the Euclidean distance between V_i and V_j, Φ_1 and Φ_2 are adjustment density factors, and ε is an adjustable radius. If the distance between two nodes is less than ε, we set Φ_1 < 1; on the contrary, if the distance is beyond ε, we set Φ_2 > 1. Note that subtracting the constant 1 satisfies the condition that if dist(V_i, V_j) = 0, then D_ε(V_i, V_j) = 0.

However, when the ε-neighborhood distances between two pairs of samples are equal, using the ε-neighborhood distance alone to measure neighbor nodes may result in wrong classification. Therefore, we further give the following definition:

Definition 2 (Manifold similarity). By the manifold assumption [32], nodes close to each other on the manifold most likely belong to the same class. In a weighted undirected graph G = (V, E), let P(i, j) represent the set of all paths connecting nodes V_i and V_j, p any such path, and p_k the kth node on the path. Along the shortest path, the manifold similarity is measured as

W(i, j) = \min_{p \in P(i, j)} \sum_{k=1}^{|p|-1} D_\varepsilon(p_k, p_{k+1})    (23)

Manifold similarity is calculated along the shortest path on the manifold, which makes nodes in the same high-density region connect through shorter edges and nodes in different density regions connect through longer edges. The main procedure for constructing the manifold similarity is summarized as follows.

First, pairwise node constraints are divided into two kinds: (V_i, V_j) ∈ Link, which means the two nodes V_i and V_j belong to the same class, and (V_i, V_j) ∉ Link, which means they do not belong to the same class. Next, the following adjustments are made to the labeled nodes:

(1) For an existing prior information constraint (V_i, V_j) ∈ Link, adjust the similarity between the two corresponding nodes:

(V_i, V_j) \in Link \Rightarrow w(i, j) = 0,\ w(j, i) = 0    (24)

In Eq. (24), w(i, j) is an element of the similarity matrix W_{n×n}. A new restriction is obtained by extending the known ∈ Link constraint, namely, if there exists a node V_k that satisfies the (V_i, V_k) ∈ Link restriction, then

(V_i, V_j) \in Link \wedge (V_i, V_k) \in Link \Rightarrow (V_j, V_k) \in Link \Rightarrow w(j, k) = 0,\ w(k, j) = 0    (25)

(2) For an existing prior information constraint (V_i, V_j) ∉ Link, adjust the similarity between the two corresponding nodes:

(V_i, V_j) \notin Link \Rightarrow w(i, j) = \infty,\ w(j, i) = \infty    (26)

A new restriction is obtained by extending the known ∉ Link constraint, namely, if there exists a node V_k that satisfies the (V_i, V_k) ∈ Link restriction, then

(V_i, V_j) \notin Link \wedge (V_i, V_k) \in Link \Rightarrow (V_j, V_k) \notin Link \Rightarrow w(j, k) = \infty,\ w(k, j) = \infty    (27)

Fig. 5. Flow chart of computing similarity matrix.

Fig. 5 outlines the flow diagram of the procedure for computing the similarity matrix W_{n×n}. The influences of the Definition 1 parameters ε, Φ_1 and Φ_2 on the classification results are shown in detail in Section 5.3. The neighbor nodes of V_i are then quickly sorted according to their manifold similarity, and the top M_i nodes of the sorted neighbor list are selected (the value of M_i is about 40–60% of all the nodes).

Definition 3 (Class propagation distribution). Suppose there exist K classes and the node V_i has M_i neighbor nodes, of which M_{ic} belong to class L_c. The ratio of the node V_i itself and its neighbor nodes belonging to the class L_c is defined as

\alpha \frac{1}{K} + \beta \frac{M_{ic}}{M_i}, \quad 0 < \alpha < 1, \quad \beta = 1 - \alpha    (28)

For convenience, Eq. (28) can be written as

\pi_{ic} = \alpha \frac{1}{K} + \beta \frac{M_{ic}}{M_i}    (29)
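Definitions 1 and 2 can be sketched as follows. This is an illustrative re-implementation, not the authors' code; the default parameter values ε = 5, Φ1 = 0.5 and Φ2 = 2.5 are those reported in Section 5.3, and an exhaustive simple-path search stands in for the paper's shortest-path computation (adequate only for small example graphs).

```python
import math

def eps_distance(d, eps=5.0, phi1=0.5, phi2=2.5):
    """Eq. (22): Phi1**d - 1 if d < eps, else Phi2**d - 1.
    Subtracting 1 guarantees D_eps = 0 when d = 0; Phi1 < 1 shrinks edges
    inside the radius while Phi2 > 1 stretches edges outside it."""
    return phi1 ** d - 1.0 if d < eps else phi2 ** d - 1.0

def manifold_similarity(dist, i, j, eps=5.0, phi1=0.5, phi2=2.5):
    """Eq. (23): minimum summed eps-distance over simple paths from V_i
    to V_j, with dist a matrix of pairwise Euclidean distances."""
    n = len(dist)
    best = math.inf

    def dfs(node, cost, visited):
        nonlocal best
        if node == j:
            best = min(best, cost)
            return
        for nxt in range(n):
            if nxt not in visited:
                step = eps_distance(dist[node][nxt], eps, phi1, phi2)
                dfs(nxt, cost + step, visited | {nxt})

    dfs(i, 0.0, {i})
    return best
```

In a three-node example with short hops 0-1 and 1-2 (distance 1 each) and a long direct edge 0-2 (distance 6), the path through the high-density region wins: the two short edges contribute 0.5¹ - 1 = -0.5 each, well below the direct edge's 2.5⁶ - 1.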
Table 1
Symbols interpretation.

S: The adjacency matrix of the undirected graph G
S_ij: The element in the ith row and jth column of the matrix S
V_L: The set of nodes whose classes are known
V_U: The set of nodes whose classes are unknown
V_i: The ith node in V
L: The set of classes composed of the classes of the nodes in V_L
L_c: The cth class in L
y_i: The index indicating the position of the class of V_i in L
π_i: The class propagation distribution of V_i
Π: The matrix composed of the class propagation distributions of all nodes
So the vector (π_{i1}, ..., π_{ic}, ..., π_{iK}) is called the class propagation distribution of V_i, usually denoted as π_i. From the above definitions, although a node's class propagation distribution is determined by both its own class and its neighbor nodes' classes, from Section 5.3 we can reasonably infer that the better classification effect is obtained when α is close to 0 and β is close to 1, so Eq. (29) is changed into

\pi_{ic} = \frac{M_{ic}}{M_i}    (30)
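Definition 3 and Eqs. (28)-(30) can be sketched as below (a minimal illustration with our own function name; with the default α = 0 it reduces to the neighbor-only form of Eq. (30)):

```python
def class_propagation(neighbor_classes, num_classes, alpha=0.0):
    """Eqs. (28)-(29): pi_ic = alpha * 1/K + (1 - alpha) * M_ic / M_i,
    where M_ic counts the neighbors of V_i whose class is L_c and M_i is
    the number of neighbors. alpha = 0 gives Eq. (30)."""
    m_i = len(neighbor_classes)
    beta = 1.0 - alpha
    return [alpha / num_classes + beta * neighbor_classes.count(c) / m_i
            for c in range(num_classes)]
```

For a node with four neighbors of classes [0, 0, 1, 2] and K = 3 classes, the distribution is [0.5, 0.25, 0.25].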
Indeed, since a node's own class is not involved in Eq. (30), only the influence of its neighbor nodes' classes on the classification results is considered in the subsequent discussion. Intuitively, in this paper the probability of an edge existing between V_i and V_j is described by the class propagation distribution of V_i and the class of V_j, namely the component of the class propagation distribution of V_i at the class of V_j.

4.1.2. Probability generative model based on class propagation distribution

Before showing the usage of our model, we give the related symbols and their interpretations. We use an undirected graph G = (V, E) to express the class nodes; here E is the set of edges and V is the set of nodes. Table 1 shows the interpretation of each symbol. Take y_i as the class index of node V_i: if the class index of V_i is y_i, then the class of V_i is denoted by L_{y_i}. If there is an edge between nodes V_i and V_j, then S_ij = 1; otherwise S_ij = 0.

Assume that in an undirected graph there are N nodes belonging to K classes; the edges are created as follows. First, generate the class propagation distribution of each node from a Dirichlet prior with parameter δ. Then, for a node whose class is unknown, draw an integer from the uniform distribution between 0 and K − 1 and take it as the class index of the node, so that a class is assigned to each node. Finally, the edge between V_i and V_j is generated from a Bernoulli distribution whose parameter is the component π_{i y_j} of the class propagation distribution of V_i at class L_{y_j}. Using y to express the set of class indexes of the nodes whose classes are unknown, the model's joint probability distribution of observed and latent variables is

p(E, y \mid \Pi) = p(y)\, p(E \mid y, \Pi)    (31)
Gibbs sampling can be applied to fit such a model. Gibbs sampling is a fast and efficient Markov chain Monte Carlo sampling method that is commonly used in probability generative models [33]; it adopts an iterative formula to obtain the values of the latent variables by sampling from their posterior distribution.
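For background, a generic Gibbs sweep over discrete latent class variables looks like the following sketch. It is illustrative only: the full-conditional function is a placeholder supplied by the caller, and it is not the paper's eventual solver, which is variational.

```python
import random

def gibbs_sweep(latent, conditional, num_classes, rng=random):
    """One Gibbs sweep: resample each latent class index from its full
    conditional given the current values of all the other variables.
    conditional(i, c, latent) returns an unnormalized weight for class c."""
    for i in range(len(latent)):
        weights = [conditional(i, c, latent) for c in range(num_classes)]
        total = sum(weights)
        r, acc = rng.random() * total, 0.0
        for c, w in enumerate(weights):   # draw from the categorical
            acc += w
            if r <= acc:
                latent[i] = c
                break
    return latent
```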
One could consider the Gibbs sampling method to solve the probability generative model. In fact, this method has good versatility, but its efficiency is low, since in each iteration it must wait until every Markov chain reaches a steady state. So in this paper we adopt the variational method [34] to solve the proposed model. In the calculus-of-variations setting, for the visible variable E of each training sample, we use the approximate posterior distribution q(E | y_ū, Π) to replace the latent variable distribution p(E | y_ū, Π), where y_ū represents the class indexes of the nodes except V_u. Therefore, we use the naive mean field [35] and approximate the true posterior distribution with a completely factorized distribution: q(y, Π) = \prod_{j=1}^{K} q(y_j), where q(y_j = 1) = μ_j. The lower bound on the log-likelihood function of the training samples satisfies

\ln p(E, \Pi) \ge \sum_{i=1, i \ne u}^{N} \sum_{j=1}^{K} e_i w_{ij} \mu_j - \ln Z + \sum_{j=1}^{K} \left[ \mu_j \ln \mu_j + (1 - \mu_j) \ln(1 - \mu_j) \right]    (32)

where Z = \sum_{i=1, i \ne u}^{N} \sum_{j=1}^{K} e^{-E(e_i, y_i, \Pi)} is a normalization term and the energy function is defined as

E(e_i, y_i, \Pi) = -\sum_{i=1, i \ne u}^{N} \sum_{j=1}^{K} e_i w_{ij} y_j - \sum_{i=1}^{N} \theta_i e_i - \sum_{i=1}^{N} \lambda_i y_i    (33)

Fixing Π, the variational parameter μ is learned by maximizing Eq. (32), so the mean-field fixed-point equation is

\mu_j \leftarrow \sigma\left( \sum_{i=1}^{N} w_{ij} e_i \right)    (34)

where σ(x) = 1/(1 + e^{−x}) is the logistic function. Further, given the variational parameter μ, the parameter Π is updated as

\pi_{ic} = \frac{C_{ic} + \delta}{\sum_{k=1}^{K} (C_{ik} + \delta)}    (35)
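Eqs. (34) and (35) can be sketched as follows (a minimal illustration, not the authors' implementation; the smoothing value passed for δ in the usage example below is our own choice):

```python
import math

def logistic(x):
    """sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def mean_field_step(e, w):
    """Eq. (34): mu_j <- sigma(sum_i w_ij * e_i) for each of the K latent
    class indices j, given visible values e and an N x K weight matrix w."""
    k = len(w[0])
    return [logistic(sum(e[i] * w[i][j] for i in range(len(e))))
            for j in range(k)]

def update_pi(counts, delta=0.1):
    """Eq. (35): pi_ic = (C_ic + delta) / sum_k (C_ik + delta), where
    counts[c] = C_ic is the number of V_i's neighbors of class L_c and
    delta is the Dirichlet prior parameter."""
    denom = sum(c + delta for c in counts)
    return [(c + delta) / denom for c in counts]
```

For example, with neighbor counts [2, 0] and δ = 1.0, `update_pi` smooths the empirical proportions to [0.75, 0.25].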
In Eq. (35), the estimate π̂_ic equals the proportion of the nodes whose class is L_c among all neighbor nodes of V_i, and it can be used as the value of π_ic in calculation; the derivation procedure is available in Appendix A.

In this section, classification uses a direct iteration method, abandoning the training process and thereby reducing training time. Sampling according to Eq. (32), after a certain number of iterations we check whether the algorithm satisfies the proposed termination condition ‖Π^(k+1) − Π^(k)‖ < δ; if it does, the algorithm terminates, otherwise it continues. Note that at each iteration we record the class of each unknown node, so for a node, in terms of Definition 3, the class assigned the highest number of times is taken as its final class.

5. Experimental results and analysis

5.1. Experiments settings

In order to test the efficiency of the proposed method, experimental data have been randomly selected from Google Maps aerial images. First, the images are preprocessed as described in Section 2, and target slices are then obtained by the GPAC (Graph Partitioning Active Contours) algorithm [36], as shown in Fig. 6. There are six vessel classes in the experimental data, each with 20 slices, for a total of 120 images.
Fig. 7. Significance weight of candidate features.
Fig. 8. Results of different ε.
5.3. Influences of parameter ˛, ˇ and burn-in period on classification results In this experiment, we use Micro-F1 to evaluate the classification result. Micro-F1 is a real between 0 and 1, and the greater numerical value is, the better classification performance is. According to the method [37], calculate the value of Micro-F1, as shown in Eq. (38). Where tic denotes the actual class of node Vi , yic represents the gained class by classification methods. If the actual class of Vi is Lc , then tic = 1, otherwise tic = 0. If the gained class is Lc , then yic = 1, otherwise yic = 0.
Fig. 6. Example of experiment data.
5.2. Recognition feature extraction and selection

As explained in Sections 3.1 and 3.2, we obtain the 48-dimensional feature vector. Then take the ship training sample set as the domain U, its corresponding feature vectors as the condition attribute set C, and the corresponding classes as the decision attribute set D. By Eq. (17), construct the decision table as below:

$$S = (U, C, D) = \begin{bmatrix} F_1^1 & \cdots & F_1^{48} & d_1 \\ \vdots & \ddots & \vdots & \vdots \\ F_{60}^1 & \cdots & F_{60}^{48} & d_{60} \end{bmatrix} \tag{36}$$

where $F_i^j$ represents the $j$th feature value of the $i$th training sample. As was pointed out in Section 3.2, we get the reduction result $Q = \{C_1, C_3, C_5, C_{43}, C_{48}\}$, where $C_i = (F_1^i, F_2^i, \ldots, F_{60}^i)^{T}$ is the feature vector composed of the $i$th feature of all the samples. With respect to Eq. (19), compute the significance weight of each individual feature, $W_{C_k} = \mathrm{SGF}(C_k, Q, D)$; for convenience of processing, it can be normalized as

$$W'_{C_k} = \frac{W_{C_k}}{\sum_{C_i \in Q} W_{C_i}} \tag{37}$$

The selected valid features are $Z = \{F_3, F_4, F_5, F_{11}, F_{18}, F_{29}, F_{41}, F_{44}, F_{46}, F_{47}, F_{48}\}$, namely the ratio of length to width, perimeter, entropy, compactness, the 7th Hu moment invariant, the 11th Zernike moment invariant, the three most influential wavelet moment invariants $F_{41} = W_{010}$, $F_{44} = W_{101}$, $F_{46} = W_{111}$, and the first and last ARC. Fig. 7 shows the significance weights of the valid candidate features. It can be concluded that the common discernibility degree, based on rough set theory, computes the significance weight of each candidate feature well and selects valid recognition features automatically.
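The normalization of Eq. (37) and the subsequent selection step can be illustrated with a short sketch. The raw significance weights below are invented purely for illustration (the actual values come from Eq. (19) and are shown in Fig. 7), and the selection threshold is a hypothetical choice rather than the paper's.

```python
def normalize_weights(raw_weights):
    """Normalize raw significance weights as in Eq. (37):
    W'_k = W_k / sum of W_i over all features in the reduct Q."""
    total = sum(raw_weights.values())
    return {k: w / total for k, w in raw_weights.items()}

def select_features(weights, threshold):
    """Keep the features whose normalized weight reaches the threshold."""
    return sorted(k for k, w in weights.items() if w >= threshold)

# hypothetical raw SGF values, for illustration only
raw = {"F3": 0.30, "F4": 0.25, "F5": 0.20, "F11": 0.15, "F18": 0.10}
norm = normalize_weights(raw)
selected = select_features(norm, threshold=0.12)
```

Because the weights are divided by their common sum, the normalized weights form a distribution over the reduct, which makes a single threshold meaningful across data sets of different scales.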
$$\text{Micro-F1} = \frac{2 \sum_{i,c} (y_{ic} \times t_{ic})}{\sum_{i,c} (y_{ic} + t_{ic})} \tag{38}$$
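Eq. (38) can be computed directly from the two 0/1 indicator matrices; a minimal sketch (the function name is ours):

```python
import numpy as np

def micro_f1(t, y):
    """Micro-F1 of Eq. (38).

    t, y : (N, K) 0/1 indicator matrices; t[i, c] = 1 iff the actual class
    of node V_i is L_c, and y[i, c] = 1 iff the predicted class is L_c.
    """
    t = np.asarray(t)
    y = np.asarray(y)
    return 2.0 * (y * t).sum() / (y + t).sum()

# a perfect prediction gives Micro-F1 = 1
t = np.eye(3, dtype=int)   # three nodes, classes 0, 1, 2
assert micro_f1(t, t) == 1.0
```

With one-hot rows, the numerator counts correct predictions twice and the denominator is 2N, so Micro-F1 reduces to plain accuracy; the formula generalizes unchanged to multi-label indicators.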
To test the influence of the various parameters on the classification results, we select 30, 40, 50, 60, 70, 80 and 90 samples from all the samples, respectively. Fig. 8 shows the classification results for different ε. As is visible in Fig. 8, whether ε becomes too large or too small, the classification performance decreases significantly. When ε is around 5 the best performance is achieved, so in the experiments we take ε = 5. As mentioned in Definition 1, φ1 is a density factor that aims at decreasing the distance between two nodes distributed in a high-density region. Fig. 9 shows the classification results with different φ1. As φ1 varies, the classification result varies too; notice that the best performance is obtained when φ1 is around 0.5, so in this paper we take φ1 = 0.5. Similarly, φ2 is also a density factor, which aims at increasing the distance between two nodes distributed in a low-density region. As φ2 varies, the classification result varies too; notice that when φ2 is around 2.5 the best performance is obtained, so here we take φ2 = 2.5 (Fig. 10). In Definition 3, α is a weight coefficient. Referring to Fig. 11, if the value of α is too large, a node's own class has greater influence
Fig. 9. Results of different φ1.
Table 2
Experimental data of algorithms (I).

Method                 Performance index              Group 1   Group 2   Group 3   Group 4
KNN                    Average recognition rate (%)   66.3      67.9      70.6      76.9
                       Average time consumption (s)   0.490     0.461     0.422     0.362
SVM                    Average recognition rate (%)   61.5      71.7      73.3      77.1
                       Average time consumption (s)   0.074     0.075     0.078     0.075
HDR                    Average recognition rate (%)   62.8      79.1      85.0      87.2
                       Average time consumption (s)   0.049     0.047     0.045     0.044
The proposed method    Average recognition rate (%)   69.7      81.1      87.2      89.2
                       Average time consumption (s)   0.042     0.040     0.037     0.035
5.4. Recognition performance comparisons
Fig. 10. Results of different φ2.
Fig. 11. Results of different α.
on the classification results; if the value is too small, the classes of the neighbor nodes have greater influence on the classification results. However, it is easy to see that the smaller the value of α is, the more suitable the classification results appear, so we can reasonably deduce α → 0. In this paper, we take α = 0. β is a parameter of the prior distribution; before the actual data have been observed, it represents the estimator of the data distribution. Although it can play a smoothing role, if the value of β is too large, the prior distribution will weaken the influence of the actual data on the final results, which makes the classification performance decrease significantly.
As can be seen from Fig. 12, if β changes in the vicinity of 1/K, it has little impact on the classification performance (Fig. 13). δ is likewise a parameter of the prior distribution; before the true data have been observed, it represents the estimator of the data distribution. Although it can play a smoothing role, when δ is too large, the prior distribution will weaken the influence of the true data on the final results, and the classification performance decreases significantly. However, when δ changes in the vicinity of 1/K, it has little impact on the classification performance.
Fig. 12. Results of different β.
As was pointed out in Section 3.2.1, based on rough set theory, we use the common discernibility degree to compute the significance weight of each candidate feature and select valid recognition features automatically, and we take the valid recognition features as sample sets to run the recognition tests. We use averages obtained from many experiments to verify the recognition performance of the proposed algorithm (average recognition rate, average time consumption) and compare it with the results obtained by the k-nearest neighbor (KNN) [38], support vector machine (SVM) and hierarchical discriminant regression (HDR) methods. In this paper, a total of 8 groups of experiments have been done. In each experiment, we randomly select 30, 40, 50, 60, 70, 80, 90 and 100 training samples from the six ship classes. Each group is repeated 10 times, and the average over the 10 runs is taken as the final recognition rate. The 8 sets of experiments are denoted groups 1, 2, 3, 4, 5, 6, 7 and 8, respectively. Test data are shown in Tables 2 and 3. From the comparisons in Tables 2 and 3 it is clear that, under the same experimental conditions, the correct detection rate of the proposed algorithm is generally slightly higher than the HDR classifier and several percentage points higher than the other algorithms. Its time consumption is lower than the HDR classifier in each group and up to about 90% lower than the other methods. From the above, it can easily be seen that the proposed method has advantages over the traditional methods KNN, SVM and HDR. The KNN method takes the classes of the nearest samples in Euclidean distance as its output; it cannot make use of the training sample distribution, which causes its lower detection performance. The SVM method is based on statistical learning theory and is best suited to two-class recognition; when expanded to multi-class recognition, its performance declines.
HDR is a tree classifier, which unifies the classification and regression problems by reducing classification to regression, and has the ability of incremental learning, so as the number of samples increases it reaches a higher detection rate, but it costs more time. The proposed method is efficient because of the direct iteration method without a training process, and because not all of the neighbor nodes participate in the computation. Nonetheless, the proposed method seems to give a reasonable indication, and can be used to complement other results. It's helpful
Fig. 13. Results of different δ.
Table 3
Experimental data of algorithms (II).

Method                 Performance index              Group 5   Group 6   Group 7   Group 8
KNN                    Average recognition rate (%)   79.0      80.6      83.8      85.8
                       Average time consumption (s)   0.298     0.262     0.240     0.214
SVM                    Average recognition rate (%)   79.3      81.1      84.2      87.1
                       Average time consumption (s)   0.070     0.063     0.055     0.044
HDR                    Average recognition rate (%)   87.8      88.9      90.1      92.3
                       Average time consumption (s)   0.040     0.038     0.036     0.030
The proposed method    Average recognition rate (%)   89.3      89.5      89.7      91.2
                       Average time consumption (s)   0.038     0.036     0.033     0.031
for classification; in addition, only a portion of all the neighbor nodes participates in the computation, which lowers the computing time.
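As a concrete illustration of this neighbor pruning, the sketch below sorts a node's neighbors by manifold similarity and keeps only the top m of them for the class computation. The similarity values and node names are hypothetical, and the manifold similarity itself (Definition 1) is assumed to be precomputed.

```python
def top_neighbors(similarities, m):
    """Sort a node's neighbors by manifold similarity (descending)
    and keep only the top m for the class computation.

    similarities : dict mapping neighbor id -> manifold similarity
    """
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [node for node, _ in ranked[:m]]

# hypothetical manifold similarities of node V_i's neighbors
sims = {"V2": 0.91, "V7": 0.40, "V3": 0.77, "V9": 0.12}
assert top_neighbors(sims, 2) == ["V2", "V3"]
```

Dropping low-similarity neighbors both shrinks the per-node computation and filters out weakly related nodes that would otherwise dilute the class estimate.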
6. Conclusions

This paper has proposed a discriminative approach for ship detection using optical remote sensing data based on a manifold similarity probability generative model. The proposed approach can be considered a reliable and timely detection method for battlefield targets at sea. First, an improved Hough transformation is introduced to preprocess the target candidate region, which reduces the amount of computation by filtering the edge points; our experiments show that the targets (ships) can be detected quickly and accurately. Second, an attribute reduction algorithm based on the common discernibility degree is adopted to select recognition features automatically: on the basis of the extracted invariant features and rough set theory, the common discernibility degree is used to compute the significance weight of each candidate feature and select valid recognition features automatically. It is also worthwhile to mention that the rough set approach ensures a relatively high reduction rate while simultaneously reducing time complexity. Finally, an efficient classifier with a dynamic probability generative model is applied to classify the ship targets. The classifier builds a new generative model on an undirected graph, in which the edges of the graph are observed variables and the classes of the nodes whose classes are unknown are latent variables; the values of the latent variables are calculated by fitting the generative model to the graph. Note that the neighbor nodes of each sample node are sorted by their manifold similarity; using the classes of the nodes selected from the top of the sorted neighbor list, a dynamic probability generative model is built to classify ships in data from the optical remote sensing system.
In feature extraction, apart from the commonly used size, texture, shape and moment invariant features, an area ratio code has been introduced. Our experiments further prove the new feature to be a powerful one for target classification. In feature selection, based on rough set theory, the common discernibility degree is utilized to compute the significance weight of each candidate feature and select valid recognition features automatically. Experimental results in this paper also show that the attribute reduction algorithm based on the common discernibility degree ensures a relatively high reduction rate while reaching fairly low time complexity in an incomplete information system; it can thus serve as a practical method for feature selection and is helpful for ship recognition.
In classification, the proposed method has been shown to be more discriminative than the KNN, SVM and HDR methods. Moreover, analysis and comparison of the experimental results show that the classification strategy based on the manifold similarity probability generative model helps achieve good classification performance and turns out to be well suited to real-time target detection applications. Although our approach has shown promising results overall, several issues remain that require further improvement or refinement to enhance its performance. First, the proposed method is heuristic; it consists of several steps containing many thresholds, some of which should be further refined. A semi-supervised hierarchical strategy may be a better solution. Second, false candidates, mainly comprising ports and sea clutter, still exist, especially at relatively low resolution; more effective features are needed to distinguish them. Finally, the time consumption of the classifier is still not ideal in multi-class tasks. In future work, we hope to pursue more effective feature extraction and more detailed feature selection, more efficient classification, the use of multispectral features, and a test of the proposed approach on a larger set of remote sensing images over a wide resolution range.
Acknowledgments

The authors would like to acknowledge Wuhan Digital Engineering Research Institute for providing materials and equipment for this study. They would also like to thank the anonymous reviewers and the associate editors for their constructive comments and suggestions. Furthermore, they thank Dr. Wang from America for useful advice on the related algorithm.
Appendix A. Procedure for the estimator of the cth component θ_ic of parameter θ_i

Posterior probability distribution of parameter $\theta_i$:

$$p(\theta_i \mid E, \delta) = \frac{p(\theta_i, E \mid \delta)}{p(E \mid \delta)} \propto p(\theta_i, E \mid \delta) = p(\theta_i \mid \delta)\, p(E \mid \theta_i) = p(\theta_i \mid \delta) \prod_{j=1}^{N} \theta_{i y_j}^{W_{ij}} = p(\theta_i \mid \delta) \prod_{k=1}^{K} \theta_{ik}^{C_{ik}} \tag{A.1}$$

Because

$$p(\theta_i \mid \delta) = \mathrm{Dirichlet}(\theta_i \mid \delta) = \frac{\Gamma(K\delta)}{\Gamma(\delta)^{K}} \prod_{k=1}^{K} \theta_{ik}^{\delta - 1} \tag{A.2}$$

we clearly have

$$p(\theta_i \mid \delta) \prod_{k=1}^{K} \theta_{ik}^{C_{ik}} \propto \mathrm{Dirichlet}(\theta_i \mid \delta') \tag{A.3}$$

where $\delta' = (\delta + C_{i1}, \ldots, \delta + C_{ic}, \ldots, \delta + C_{iK})$ and $C_{ic}$ denotes the number of edges between $V_i$ and the nodes whose class is $L_c$. Accordingly,

$$p(\theta_i \mid E_i, \delta) \propto \mathrm{Dirichlet}(\theta_i \mid \delta') \tag{A.4}$$

The value of parameter $\theta_i$ can be estimated by the expectation of its posterior distribution. In fact, the estimator of the $c$th component $\theta_{ic}$ of $\theta_i$ is

$$\hat{\theta}_{ic} = \frac{C_{ic} + \delta}{\sum_{k=1}^{K} (C_{ik} + \delta)} \tag{A.5}$$
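The closed-form estimator of Eq. (A.5) amounts to a smoothed normalization of the per-class edge counts; a minimal sketch (names are ours, not the paper's):

```python
import numpy as np

def theta_hat(C_i, delta):
    """Posterior-mean estimate of theta_i (Eq. (A.5)):
    theta_ic = (C_ic + delta) / sum_k (C_ik + delta),
    where C_i[c] counts the edges between V_i and nodes of class L_c."""
    C_i = np.asarray(C_i, dtype=float)
    return (C_i + delta) / (C_i + delta).sum()

# node with 3 edges to class 0, 1 edge to class 1, none to class 2
est = theta_hat([3, 1, 0], delta=0.5)
```

The additive delta plays the usual Dirichlet-smoothing role: no class probability is ever exactly zero, even for classes with no connecting edges.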
References

[1] W. Siedlecki, J. Sklansky, A note on genetic algorithms for large-scale feature selection, Pattern Recognit. Lett. 10 (11) (1989) 335–347.
[2] W. Siedlecki, J. Sklansky, On automatic feature selection, Int. J. Pattern Recognit. Artif. Intell. 2 (2) (1998) 197–200.
[3] Y.M. Zheng, Improvement of Feature Selection Method Based on Genetic Algorithm, Chongqing University, Chongqing, 2009.
[4] Rong-hua Shang, Chao-xu Hu, Li-cheng Jiao, Jing Bai, Research of multiobjective optimization algorithms' application in multi-class classification, Acta Electron. Sin. 40 (11) (2012) 2265–2266.
[5] W.L. Cai, S.C. Chen, D.Q. Zhang, A multi-objective simultaneous learning framework for clustering and classification, IEEE Trans. Neural Networks 21 (2) (2010) 185–200.
[6] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, London, 1991.
[7] Chun Du, Jixiang Sun, Zhiyong Li, et al., Method for ship recognition using optical remote sensing data, J. Image Graphics 17 (4) (2012) 591.
[8] J. Lan, L. Wan, Automatic ship target classification based on aerial images, in: Proceedings of SPIE, vol. 7156, SPIE, Bellingham, WA, 2009, pp. 1–10.
[9] H. Pan, Y.P. Zhu, L.Z. Xia, Efficient and accurate face detection using heterogeneous feature descriptors and feature selection, Comput. Vision Image Understanding 117 (1) (2013) 12–28.
[10] Guang Yang, Bo Li, Shufan Ji, Feng Gao, Qizhi Xu, Ship detection from optical satellite images based on sea surface analysis, IEEE Geosci. Remote Sens. Lett. 11 (3) (2014) 641–645.
[11] H.S. Gu, Y.M. Zhang, Q. Ji, Task oriented facial behavior recognition with selective sensing, Comput. Vision Image Understanding 100 (3) (2005) 385–415.
[12] Z. Zhang, A Study on Harbor Target Recognition in High Resolution Optical Remote Sensing Image, University of Science and Technology of China, Hefei, 2009.
[13] G.C. Anagnostopoulos, SVM-based target recognition from synthetic aperture radar images using target region outline descriptors, Nonlinear Anal. 71 (12) (2009) e2934–e2939.
[14] Wei Jin, Jian-qi Zhang, Xiang Zhang, Face recognition method based on support vector machine and particle swarm optimization, Expert Syst. Appl. 38 (2011) 4390–4391.
[15] Wen Ying, An improved discriminative common vectors and support vector machine based face recognition approach, Expert Syst. Appl. 39 (2012) 4629–4630.
[16] Min Tang, Feng Chen, Facial expression recognition and its application based on curvelet transform and PSO-SVM, Optik 124 (2013) 5403–5404.
[17] Luis Samaniego, Andras Bardossy, Karsten Schulz, Supervised classification of remotely sensed imagery using a modified k-NN technique, IEEE Trans. Geosci. Remote Sens. 46 (7) (2008) 2112–2125.
[18] J. Antelo, G. Ambrosio, J. Gonzalez, et al., Ship detection and recognition in high resolution satellite images, in: IEEE International Geoscience and Remote Sensing Symposium, vol. 4, IEEE Computer Society, Washington, DC, USA, 2009, pp. 514–517.
[19] Hatice Çınar Akakın, Bülent Sankur, Robust classification of face and head gestures in video, Image Vision Comput. 29 (2011) 470–480.
[20] Wey Shiuan Hwang, Juyang Weng, Hierarchical discriminant regression, IEEE Trans. Pattern Anal. Mach. Intell. 22 (11) (2000) 1277–1293.
[21] J.Y. Weng, W. Hwang, Incremental hierarchical discriminant regression, IEEE Trans. Neural Networks 18 (2) (2007) 397–415.
[22] M. Uma Selvi, S. Suresh Kumar, A novel approach for ship recognition using shape and texture, Int. J. Adv. Inf. Technol. (IJAIT) 1 (5) (2011) 23–29.
[23] Costas Panagiotakis, Ilias Grinias, Georgios Tziritas, Natural image segmentation based on tree equipartition, Bayesian flooding and region merging, IEEE Trans. Image Process. 20 (8) (2011) 2276–2287.
[24] Zhenyu An, Zhenwei Shi, Xichao Teng, Xinran Yu, Wei Tang, An automated airplane detection system for large panchromatic image with high spatial resolution, Optik 40 (10) (2013) 3448–3450.
[25] Changren Zhu, Hui Zhou, Runsheng Wang, Jun Guo, A novel hierarchical method of ship detection from spaceborne optical image based on shape and texture features, IEEE Trans. Geosci. Remote Sens. 48 (9) (2010) 3446–3456.
[26] R.O. Duda, P.E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Commun. ACM 15 (1) (1972) 11–15.
[27] Chen Wenting, Ji Kefeng, Xing Xiangwei, Ship recognition in high resolution SAR imagery based on feature selection, Proc. Int. Conf. Comput. Vis. Remote Sens. (2012) 302–303.
[28] Chun Du, Jixiang Sun, Zhiyong Li, Shuhua Teng, Method for ship recognition using optical remote sensing data, J. Image Graphics 17 (4) (2012) 591–592.
[29] S.H. Teng, D.C. Zan, J.X. Sun, Z.G. Tan, Attribute reduction algorithm based on common discernibility degree, Pattern Recognit. Artif. Intell. 23 (5) (2010) 630–638.
[30] M. McPherson, L. Smith-Lovin, J.M. Cook, Birds of a feather: homophily in social networks, Annu. Rev. Sociol. 27 (1) (2001) 415–444.
[31] Zhenwen Wang, Weidong Xiao, Wentang Tan, Classification in networked data based on the probability generative model, J. Comput. Res. Dev. 50 (12) (2013) 2645–2646.
[32] S. Theodoridis, K. Koutroumbas, Pattern Recognition, third ed., Publishing House of Electronics, Beijing, 2010.
[33] C.Y. Zhang, J.L. Sun, Y.Q. Ding, Topic mining for microblog based on MB-LDA model, J. Comput. Res. Dev. 48 (10) (2011) 1795–1802.
[34] M.I. Jordan, Z. Ghahramani, T. Jaakkola, L.K. Saul, An introduction to variational methods for graphical models, Mach. Learn. 37 (2) (1999) 183–233.
[35] M. Opper, D. Saad, Advanced Mean Field Methods: Theory and Practice, MIT Press, Cambridge, MA, 2001.
[36] B. Sumengen, B.S. Manjunath, Graph partitioning active contours (GPAC) for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 28 (4) (2006) 509–521.
[37] L. Tang, H. Liu, Leveraging social media networks for classification, Data Min. Knowl. Discovery 23 (3) (2011) 447–478.
[38] J.X. Sun, Modern Pattern Recognition, second ed., Higher Education Press, Beijing, 2008, pp. 252–259.