Applied Soft Computing 15 (2014) 1–20
Contents lists available at ScienceDirect
Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc
Semi-supervised change detection using modified self-organizing feature map neural network
Susmita Ghosh (a), Moumita Roy (a), Ashish Ghosh (b,∗)
(a) Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
(b) Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India
Article info
Article history: Received 11 May 2012; received in revised form 6 August 2013; accepted 24 September 2013; available online 18 October 2013.
Keywords: Semi-supervised learning; Change detection; Fuzzy set; Self-organizing feature map
Abstract
In the present article, semi-supervised learning is integrated with an unsupervised context-sensitive change detection technique based on the modified self-organizing feature map (MSOFM) network. In the proposed methodology, the MSOFM network is first trained using only a few labeled patterns. Fuzzy set theory is then used to determine the membership values of each unlabeled pattern in both classes. The soft class label of each unlabeled pattern is estimated from the membership values of its K nearest neighbors. The network is then trained iteratively on the unlabeled patterns together with the few labeled ones. A heuristic method is suggested for selecting which unlabeled patterns are used in this training. Experiments are conducted on three multi-temporal and multi-spectral data sets to check the effectiveness of the proposed methodology. Its performance is compared with that of two unsupervised techniques, a supervised technique and two semi-supervised techniques. The results are statistically validated using a paired t-test. The proposed method produced promising results. © 2013 Elsevier B.V. All rights reserved.
∗ Corresponding author. Tel.: +91 33 2575 3110/3100; fax: +91 33 2578 3357.
E-mail addresses: [email protected] (S. Ghosh), [email protected] (M. Roy), [email protected] (A. Ghosh).
1568-4946/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.asoc.2013.09.010

1. Introduction
Change detection is the process of identifying temporal effects in multi-temporal images [1–3]. It is used to find changes in land cover over time by analyzing remotely sensed images of a geographical area captured at different time instants. Such changes can be caused by natural hazards (e.g., disasters, earthquakes), urban growth or deforestation. Change detection is one of the most challenging tasks in pattern recognition and machine learning [4]. Its applications include land-use change analysis [5,6], monitoring of urban growth [7,8] and burned-area identification [9]. It can be viewed as an image segmentation problem in which two groups of pixels are formed, one for the changed class and the other for the unchanged class.
Change detection techniques can be broadly classified into two categories: supervised [10–12] and unsupervised [13–21]. Supervised techniques have certain advantages. They can explicitly recognize the kinds of changes that occurred, and they are robust to differing atmospheric and light conditions on the acquisition dates. Various supervised methodologies exist in the literature, e.g., the post-classification method [1,11,22], the direct multi-date classification method [1] and kernel-based methods [12]. Despite these advantages, supervised methods are of limited applicability in change detection. They require a sufficient amount of ground-truth information, which is expensive, hard and tedious to collect. In contrast, the unsupervised approach [13–20] needs no additional information such as ground truth. Owing to the scarcity of labeled patterns, unsupervised techniques are often the only practical option for change detection.
Unsupervised change detection generally follows three consecutive steps: image preprocessing, image comparison and image analysis [1]. The input of the process is a set of images of the same geographical area captured at different time instants. In the preprocessing step, these images are made compatible by operations such as radiometric and geometric correction, co-registration and noise reduction [1]. After preprocessing, the images are compared pixel by pixel to generate a difference image (DI), which is used for change detection. Methods for generating the DI include univariate image differencing, change vector analysis (CVA) and image ratioing [1]. In the present work, the CVA technique [1] is used to create the DI.
Unsupervised change detection methods can be of two types: context-insensitive [1,15] and context-sensitive [13,14,16–19]. Histogram thresholding [1,15] is the simplest unsupervised context-insensitive method. Its main disadvantage is that it ignores the spatial correlation between neighboring pixels in the decision process. To overcome this difficulty, context-sensitive methods using Markov random fields (MRF) [16,17] have been developed. These techniques, however, also suffer from
certain difficulties, such as the need to select a proper model for the statistical distribution of the changed-class and unchanged-class pixels. Change detection methodologies based on neural networks, on the other hand, are free from such limitations. Neural networks have already been employed for change detection with both supervised and unsupervised learning [13,14,18,20].
In change detection, experts can sometimes easily collect the class information of a few labeled patterns. If these labeled patterns are few, the information they carry may not be sufficient for developing supervised methods. However, if an unsupervised approach is adopted, this knowledge, though limited, remains completely unused. In such circumstances, a semi-supervised approach [23–25] can be adopted instead of an unsupervised or a supervised one. Semi-supervised learning uses a small number of labeled patterns together with abundant unlabeled ones. It integrates the merits of both supervised and unsupervised strategies to make full use of the collected patterns [23,24]. Semi-supervision has been used successfully to improve clustering and classification [26–34] when insufficient labeled data are available. Semi-supervised learning with neural networks has been explored in various domains [35–38]. A multilayer perceptron (MLP) has been used for change detection [39] in a semi-supervised classification framework. There is, however, no such application of a neural network that uses a semi-supervised clustering approach for change detection. This motivated the present study, which uses neural networks to improve the performance of the change detection process.
In one of our earlier works, the self-organizing feature map (SOFM) network [40,41] was modified, yielding the modified self-organizing feature map (MSOFM) [42], and used for unsupervised context-sensitive change detection [14]. In the proposed methodology, semi-supervised learning is incorporated into this MSOFM framework [14], with a network architecture similar to the one used in [14].
The network consists of two layers, input and output. The input layer has one neuron for each feature of the input pattern. The output layer is two-dimensional, and its (i, j)th neuron represents the (i, j)th pixel position in the difference image (DI). Because a few labeled patterns are available, some output neurons correspond to these labeled patterns (labeled neurons). The others correspond to unlabeled patterns (unlabeled neurons). Each output neuron has a weighted connection to every input neuron.
In the present work, the connection weights are initialized differently for labeled and unlabeled neurons. The weight vectors of unlabeled neurons are initialized randomly in [0, 1]. Those of labeled neurons are initialized with the normalized feature values of the corresponding labeled patterns, to introduce the effect of supervision. A mapping function (Eq. (2)) is used to normalize the feature values of the input patterns to [0, 1]. At the outset, the network is trained with the labeled patterns only. The unlabeled patterns are then passed through the trained network, and their membership values in the changed and unchanged classes are calculated with respect to a pre-fixed threshold value.
Suppose the similarity measure between an unlabeled pattern and the weight vector of the corresponding output neuron exceeds this threshold. Then that pattern's membership value in the changed class is higher than its value in the unchanged class, and vice versa. A method for computing these membership values is also suggested. In [14], correlation-based and energy-based techniques were used to select suitable thresholds, and the proposed methodology uses the same threshold selection process. Thereafter, the soft class label (or target value) of each
of the unlabeled patterns is updated using the membership values of its K nearest neighbors [39]. After each training step, the unlabeled patterns that are more likely to belong to the changed class are selected. The MSOFM network is then retrained using the labeled patterns together with these selected unlabeled patterns. Training of the network and updating of the soft class labels of the unlabeled patterns thus alternate iteratively. This continues until a given convergence criterion is satisfied or the number of training steps exceeds a certain value.
To assess the effectiveness of the proposed method, experiments are carried out on three multi-temporal and multi-spectral data sets. They cover the Mexico area, the Island of Sardinia and the southern part of the Peloponnesian Peninsula, Greece. The results are compared with those of five techniques:
- the existing unsupervised method based on MSOFM [14];
- a robust fuzzy clustering technique [43];
- a supervised method based on MLP [41];
- a semi-supervised technique based on MLP [39];
- the constrained k-means algorithm [44], a semi-supervised clustering algorithm.
The rest of the article is organized as follows. Section 2 describes the proposed semi-supervised change detection technique. Section 3 describes the data sets used in the investigation. Section 4 discusses implementation details and experimental results, and Section 5 draws conclusions. The performance measures used in the investigation are briefly explained in Appendix A.
2. Proposed methodology for semi-supervised change detection
In some of our earlier works, we developed different change detection techniques [13,14,19,43] in an unsupervised framework. In [13,14], context-sensitive change detection techniques were proposed using unsupervised neural networks, namely a Hopfield-type neural network and a modified self-organizing feature map neural network.
Fuzzy clustering techniques, namely fuzzy c-means and Gustafson–Kessel clustering, were used for unsupervised change detection in [19]. These fuzzy-clustering-based techniques were further improved in [43] by incorporating local information. We also developed a semi-supervised change detection technique [39] by modifying the learning of a supervised neural network (a multilayer perceptron). The modification lets it exploit abundant unlabeled patterns along with a few labeled patterns during learning. As already mentioned, no work has been carried out in this direction using an unsupervised neural network when a few labeled patterns are available. In the present work, the modified self-organizing feature map neural network is integrated with semi-supervised learning for better change detection. The proposed technique is described in detail in the following subsections.
2.1. Generation of input pattern
The difference image $D = \{l_{mn} : 1 \le m \le p,\ 1 \le n \le q\}$ is produced by the CVA technique [1] from two images Y1 and Y2 with $\gamma$ spectral bands each. Both images are co-registered and radiometrically corrected, are of size p × q, and cover the same geographical area at different times T1 and T2. The gray value of D at spatial position (m, n), denoted $l_{mn}$, is calculated as
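$$l_{mn} = (\mathrm{int})\sqrt{\sum_{\alpha=1}^{\gamma}\left(l_{mn}^{\alpha}(Y_1) - l_{mn}^{\alpha}(Y_2)\right)^{2}}, \qquad (1)$$
where $l_{mn}^{\alpha}(Y_1)$ and $l_{mn}^{\alpha}(Y_2)$ are the gray values of the pixels at spatial position (m, n) in the $\alpha$th band of the images Y1 and Y2, respectively, and $\gamma$ is the number of spectral bands.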
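The sketch below illustrates Eq. (1) together with the input-pattern construction described in the rest of this subsection: nine features per pixel and the min–max normalization of Eq. (2). It is a minimal NumPy version, not the authors' code, and assumes each image is stored as an array of shape (p, q, γ). The function names and the edge padding used for border pixels are illustrative assumptions.

```python
import numpy as np


def cva_difference_image(y1: np.ndarray, y2: np.ndarray) -> np.ndarray:
    """Eq. (1): truncated magnitude of the spectral change vector of each pixel.

    y1, y2: co-registered images of shape (p, q, gamma), one slice per band.
    """
    diff = y1.astype(np.float64) - y2.astype(np.float64)
    # Square root of the sum of squared band differences, then cast to int.
    return np.sqrt((diff ** 2).sum(axis=2)).astype(np.int64)


def input_patterns(di: np.ndarray) -> np.ndarray:
    """Nine-feature patterns (pixel + 8 neighbors) normalized with Eq. (2).

    Returns shape (p, q, 9); feature index 4 is the pixel itself.
    Assumption: border pixels reuse the nearest gray values (edge padding),
    since the text does not say how image borders are handled.
    """
    c_min, c_max = float(di.min()), float(di.max())
    span = (c_max - c_min) or 1.0  # guard against a constant DI
    padded = np.pad(di.astype(np.float64), 1, mode="edge")
    p, q = di.shape
    # Stack the 3x3 second-order neighborhood of every pixel on a new axis.
    feats = np.stack(
        [padded[r:r + p, c:c + q] for r in range(3) for c in range(3)], axis=2
    )
    return (feats - c_min) / span


# Usage on random 3-band images (illustrative data only).
rng = np.random.default_rng(0)
y1 = rng.integers(0, 256, size=(4, 5, 3))
y2 = rng.integers(0, 256, size=(4, 5, 3))
di = cva_difference_image(y1, y2)
x = input_patterns(di)
print(di.shape, x.shape)  # (4, 5) (4, 5, 9)
```

Padding with edge values keeps every pattern nine-dimensional without introducing gray values outside the range of the DI. The normalized features therefore stay within [0, 1].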
From the difference image D, the input pattern for a pixel position is built from the gray value of that pixel together with those of its neighbors. This exploits spatial contextual information. The present methodology uses a second-order neighborhood system, so each input pattern has nine features: the gray value of the pixel itself and the eight gray values of its neighbors. The y-dimensional input pattern of the (m, n)th pixel position of the DI is denoted by $X_{mn} = [x_{mn,1}, x_{mn,2}, \ldots, x_{mn,y}]$. A mapping algorithm normalizes the feature values of the input pattern to [0, 1]. The ith feature value (i = 1, 2, ..., y) of $X_{mn}$ is normalized as
$$x_{mn,i} = \frac{x_{mn,i} - c_{\min}}{c_{\max} - c_{\min}}, \qquad (2)$$
where $c_{\max}$ and $c_{\min}$ are, respectively, the maximum and the minimum gray values of the DI.
2.2. Modified self-organizing feature map (MSOFM) [42]
The self-organizing feature map network (SOFM) [40,41] uses the concept of competitive learning. It has two layers, input and output, and the output layer is two-dimensional (see Fig. 1). Each output neuron has a weighted connection to every input neuron. The y-dimensional weight vector between the (m, n)th output neuron and all the input neurons is $W_{mn} = [w_{mn,1}, w_{mn,2}, \ldots, w_{mn,y}]$. The output neurons compete among themselves. The SOFM is trained iteratively and gradually forms a topological map of the input patterns.
Its learning has three steps: competition, cooperation and weight updating (learning). In the competition step, the similarity measure between a given input pattern $X_{mn}$ and the weight vector of every output neuron is computed. The (i, j)th output neuron, which has the maximum similarity, is selected as the winner. Let $h_{kl,ij}(itr)$ be the topological neighborhood function between the winning neuron (i, j) and its topological neighbor (k, l) at iteration itr. This function shrinks after each iteration and can take any form, such as Gaussian or rectangular. The weight vector of the (k, l)th output neuron is updated as
$$W_{kl}(itr+1) = W_{kl}(itr) + h_{kl,ij}(itr)\,\eta(itr)\,\bigl(X_{mn} - W_{kl}(itr)\bigr), \qquad (3)$$
where $\eta(itr)$ is the learning rate at the itr-th iteration, which decreases as itr increases. The weight vectors of the winning neuron and its neighbors thus gradually move towards the input pattern under consideration.
Fig. 1. Kohonen’s model of self-organizing feature map with two-dimensional output layer.
As mentioned earlier, the modified SOFM (MSOFM) network [42] was used for unsupervised context-sensitive change detection [14], and the present work uses a similar MSOFM architecture. As in the SOFM, the output layer of the MSOFM network [14,42] is two-dimensional, with one representative neuron for each pixel position of the DI. The input layer has as many neurons as the input pattern has features, and each output neuron is connected to all the input neurons.
In [14], the input pattern of every pixel position of the DI is passed through the MSOFM network. The similarity measure between the input pattern $X_{mn}$ and the weight vector $W_{mn}$ of the (m, n)th output neuron is then computed. If the similarity exceeds a pre-fixed threshold, that output neuron is the winner, and the weights of this neuron and its neighbors are updated using Eq. (3). In the SOFM network, the same input pattern is applied to all output neurons to select the winner. In the MSOFM network, by contrast, different inputs are given to different output neurons, and the winner is selected by comparison with a pre-defined threshold. Correlation-based and energy-based methods for selecting suitable thresholds were also suggested [14].
2.3. Labeled pattern collection and weight initialization
Semi-supervised learning of the MSOFM network requires a small number of labeled patterns, which can be collected in many ways. In the present technique, for experimental purposes, labeled patterns are picked from the ground truth in equal percentages for both classes. After the labeled patterns are collected, the weights of labeled and unlabeled neurons are initialized differently. If the class label of the (m, n)th pixel of the DI is known, the weight vector $W_{mn}$ of the (m, n)th output neuron is initialized with the normalized feature values of the corresponding labeled pattern. Otherwise, it is initialized randomly in [0, 1].
2.4. Learning by a small amount of labeled patterns
During training, the input patterns $X_{mn}$ are passed to the MSOFM network consecutively. Each time, the dot product d(m, n) between $X_{mn}$ and $W_{mn}$ is calculated as
$$d(m,n) = X_{mn}\cdot W_{mn} = \sum_{k=1}^{y} x_{mn,k}\, w_{mn,k}. \qquad (4)$$
At the beginning of the training phase, the connection weights of the network are updated as follows, using the labeled patterns only. Suppose the class label of the (m, n)th pixel position is known. Then the weight vector $W_{ij}$ of every unlabeled neuron in the neighborhood of the (m, n)th output neuron (defined by $h_{ij,mn}(\cdot)$) is updated using Eq. (3). This gradually shifts $W_{ij}$ towards the given input pattern. The rationale is that input patterns of the same class have similar feature values, and the neighboring pixels of a given input pattern are likely to belong to the same class as that pattern. In the proposed method, weight updating with labeled patterns thus draws their neighboring unlabeled pixels towards the respective classes, if they originally belong to the same
class. The weight vector of a labeled neuron, on the other hand, is initialized with the feature values of its own pattern and is never updated during learning. Otherwise, it might drift during learning towards the class to which the labeled pattern does not belong. Learning with labeled patterns continues iteratively until convergence. To check convergence, the output O of the MSOFM network at each iteration itr is calculated as
$$O = \sum_{(m,n):\, d(m,n)\ge\theta} d(m,n), \qquad (5)$$
where $\theta$ is a pre-defined threshold value that lies within [0, 1]. The network converges for any value of $\theta$ (a proof is given in [42]). The present work uses the same threshold selection techniques as the existing unsupervised change detection method [14], namely the correlation maximization criterion and the energy-based criterion. Weight updating continues until the outputs O of two consecutive iterations differ by less than $\delta$, a small positive quantity. The components of the weight vector $W_{mn}$ are normalized as follows, so that the dot product d(m, n) lies in [0, 1]:
$$w_{mn,k} = \frac{w_{mn,k}}{\sum_{k=1}^{y} w_{mn,k}}. \qquad (6)$$
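The labeled-pattern phase of Sections 2.3 and 2.4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. The neighborhood function is taken to be rectangular, the learning rate and neighborhood radius are assumed to decay linearly per epoch, and the names `init_weights`, `train_msofm`, `eta0`, `radius0` and `max_epochs` are hypothetical. Labeled neurons stay frozen. Unlabeled neurons around each source neuron move towards its pattern by Eq. (3) and are renormalized by Eq. (6). Training stops when the output O of Eq. (5) changes by less than $\delta$.

```python
import numpy as np


def normalize_weights(w: np.ndarray) -> np.ndarray:
    """Eq. (6): divide every weight vector by the sum of its components."""
    return w / np.maximum(w.sum(axis=-1, keepdims=True), 1e-12)


def init_weights(x: np.ndarray, labeled: np.ndarray, rng) -> np.ndarray:
    """Section 2.3: labeled neurons copy their pattern, others are random."""
    w = rng.random(x.shape)
    w[labeled] = x[labeled]
    return normalize_weights(w)


def network_output(x: np.ndarray, w: np.ndarray, theta: float) -> float:
    """Eq. (5): sum of the dot products d(m, n) of Eq. (4) that reach theta."""
    d = np.einsum("mnk,mnk->mn", x, w)
    return float(d[d >= theta].sum())


def train_msofm(x, w, sources, frozen, theta, eta0=0.5, radius0=2,
                delta=1e-4, max_epochs=50):
    """Sketch of the MSOFM updates of Sections 2.2 and 2.4 (assumed details).

    sources: neurons whose own pattern drives an update (the labeled ones,
             plus the selected unlabeled ones in Step 7 of Table 1).
    frozen:  labeled neurons, whose weights are never changed.
    Assumed: rectangular h (1 inside a (2r+1)x(2r+1) window, 0 outside) and
    linear decay of eta and r after every epoch.
    """
    p, q, _ = x.shape
    prev_o = None
    for epoch in range(max_epochs):
        eta = eta0 * (1 - epoch / max_epochs)
        r = round(radius0 * (1 - epoch / max_epochs))
        for m, n in zip(*np.nonzero(sources)):
            rows = slice(max(0, m - r), min(p, m + r + 1))
            cols = slice(max(0, n - r), min(q, n + r + 1))
            block = w[rows, cols]       # view into w
            upd = ~frozen[rows, cols]   # only non-frozen neurons move
            # Eq. (3) with h = 1: W_kl <- W_kl + eta * (X_mn - W_kl)
            block[upd] += eta * (x[m, n] - block[upd])
            block[upd] = normalize_weights(block[upd])
        o = network_output(x, w, theta)
        if prev_o is not None and abs(o - prev_o) < delta:
            break  # O changed by less than delta between epochs
        prev_o = o
    return w


# Usage: two labeled pixels on a toy 6x6 grid of random 9-feature patterns.
rng = np.random.default_rng(1)
x = rng.random((6, 6, 9))
labeled = np.zeros((6, 6), dtype=bool)
labeled[1, 1] = labeled[4, 4] = True
w = train_msofm(x, init_weights(x, labeled, rng), labeled, labeled, theta=0.1)
```

The same function could in principle be reused for Step 7 of Table 1 by passing the selected unlabeled pixels, together with the labeled ones, as `sources`. Whether the original method also applies the threshold test of Section 2.2 in this phase is not spelled out here, so the sketch omits it.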
After each epoch, the learning rate $\eta(itr)$ and the size of the topological neighborhood $h(itr)$ are decreased.
2.5. Computation of soft class label of the unlabeled patterns
After each training step, the unlabeled patterns are presented to the network, and their soft class labels are calculated using fuzzy set theory. Consider two fuzzy sets, one for the changed class and one for the unchanged class. The membership values of each unlabeled pattern in both classes can then be determined. For each (i, j)th unlabeled pattern, d(i, j) is computed by Eq. (4). If $d(i,j) \ge \theta$, the (i, j)th pattern is more likely to belong to the changed class than to the unchanged one. Otherwise, it is more likely to belong to the unchanged class. Let $\mu(i,j) = [\mu_1(i,j), \mu_2(i,j)]$ be the membership vector of the (i, j)th unlabeled pattern. Here $\mu_1(i,j)$ and $\mu_2(i,j)$ are its membership values in the unchanged and the changed class, respectively. These values are calculated as
$$[\mu_1(i,j), \mu_2(i,j)] = \begin{cases} [\min(d(i,j), 1-d(i,j)),\ \max(d(i,j), 1-d(i,j))], & \text{if } d(i,j) \ge \theta,\\ [\max(d(i,j), 1-d(i,j)),\ \min(d(i,j), 1-d(i,j))], & \text{otherwise}. \end{cases} \qquad (7)$$
Next, the target value (or soft class label) of each unlabeled pattern is updated with the K-nearest-neighbor technique, as in [39]. For each unlabeled pattern, its K nearest neighbors are determined. To reduce the search time, the search is restricted to the patterns lying within a window around the unlabeled pattern rather than covering all patterns. Let M be the set of K nearest neighbors of the (i, j)th unlabeled pattern. The target value $t(i,j) = [t_1(i,j), t_2(i,j)]$ of the (i, j)th unlabeled pattern is then estimated as
$$t(i,j) = \left[\frac{1}{K}\sum_{(s,l):\, X_{sl}\in M}\mu_1(s,l),\ \frac{1}{K}\sum_{(s,l):\, X_{sl}\in M}\mu_2(s,l)\right]. \qquad (8)$$
Note that for the labeled patterns, both t(i, j) and $\mu(i,j)$ are either [1, 0] or [0, 1].
2.6. Iterative learning process
Initially, learning considers only the labeled patterns, and the soft class labels of the unlabeled patterns are obtained by Eqs. (7) and (8). Next, the unlabeled patterns whose estimated target value for the changed class exceeds that for the unchanged class are selected. The MSOFM network is then trained again with the labeled patterns and these selected unlabeled patterns until convergence. Training of the network and re-estimation of the soft class labels by Eqs. (7) and (8) alternate iteratively until the network stabilizes. Stability of the network (for a DI of size p × q) is checked after each training step by computing the sum of squared errors E:
$$E = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{2}\bigl(\mu_k(i,j) - t_k(i,j)\bigr)^{2}. \qquad (9)$$
Learning continues until the errors of two consecutive training steps differ by less than $\epsilon$, a small positive quantity, or the number of training steps exceeds a certain limit. After convergence, hard class labels are assigned to the unlabeled patterns according to their target values. The algorithmic representation of the proposed methodology is given in Table 1.
3. Description of data sets
To evaluate the effectiveness of the proposed methodology, experiments are carried out on three multi-temporal remotely sensed images. They cover geographical areas in Mexico, the Island of Sardinia (Italy) and Greece.
3.1. Data set related to Mexico area [13,14,39]
This data set consists of two multi-spectral images acquired over an area of Mexico on 18 April 2000 and 20 May 2002. Both were captured by the Enhanced Thematic Mapper Plus (ETM+) sensor of the Landsat-7 satellite. A section of 512 × 512 pixels was selected from the available Landsat scene as the test site. Between the two acquisition dates, a fire destroyed a large portion of the vegetation in this region. Initial trials were performed to determine the most effective spectral bands for detecting the burnt area in this data set. These trials showed band 4 to be the most effective for locating the burnt area. Fig. 2(a) and (b) show the band 4 images of April 2000 and May 2002, respectively. Only the difference image created from spectral band 4 by the CVA technique (Fig. 2(c)) is used for further analysis. The proposed approach was evaluated against a reference map (Fig. 2(d)) containing 25,599 changed and 236,545 unchanged pixels.
3.2. Data set related to Sardinia Island, Italy [13,14,39]
Two multi-spectral images were acquired by the Thematic Mapper (TM) sensor of the Landsat-5 satellite in September 1995 and July 1996. The 412 × 300 pixel test site includes Lake Mulargia on the Island of Sardinia (Italy). The water level of the lake rose between the two acquisition dates (see the lower central part of the image). Fig. 3(a) and (b) show, respectively, the 1995 and 1996 images of band 4. We applied the CVA technique to spectral
bands 1, 2, 4 and 5 of the two multi-temporal images to generate the difference image (Fig. 3(c)). Preliminary experiments showed that these channels contain useful information on the changes in the water body. In the reference map (Fig. 3(d)), 7480 changed and 116,120 unchanged pixels were identified.

Table 1 Algorithmic representation of the proposed work.
| Step | Description |
|---|---|
| Step 1 | Pick a few labeled patterns from the reference map. |
| Step 2 | Initialize the connection weights of the MSOFM network. For the output neuron corresponding to each labeled pattern, initialize the weights with the feature values of that pattern. For the output neuron corresponding to each unlabeled pattern, initialize the weights randomly in [0, 1]. |
| Step 3 | Update the weight vectors of the output neurons corresponding to the unlabeled patterns, using labeled patterns only. |
| Step 4 | Pass the unlabeled patterns through the network. Calculate their membership values ($\mu$) from the similarity measure ($d$) and the pre-fixed threshold ($\theta$). If $d \ge \theta$: $\mu$ in the changed class $= \max[d, 1-d]$ and $\mu$ in the unchanged class $= \min[d, 1-d]$. Otherwise: $\mu$ in the changed class $= \min[d, 1-d]$ and $\mu$ in the unchanged class $= \max[d, 1-d]$. |
| Step 5 | Assign the target value of each unlabeled pattern using the membership values of its K nearest neighbors. |
| Step 6 | For the next training step, select the unlabeled patterns whose estimated target value for the changed class is greater than that for the unchanged class. |
| Step 7 | Update the weight vectors of the output neurons corresponding to the unlabeled patterns, using the labeled patterns and the selected unlabeled patterns. |
| Step 8 | Repeat Steps 4–7 until convergence, then go to Step 9. |
| Step 9 | Assign a hard class label to each unlabeled pattern. |
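Steps 4–6 of Table 1 can be sketched in NumPy as below. They cover the fuzzy memberships of Eq. (7), the K-nearest-neighbor targets of Eq. (8) and the selection of unlabeled patterns for retraining, and the sketch adds the error of Eq. (9). The input `d` is assumed to be the map of dot products from Eq. (4). Several details are assumptions made for illustration: the Euclidean distance between input patterns, the exclusion of a pixel from its own neighbor set, and the default values of `k` and `half_window`.

```python
import numpy as np


def memberships(d, theta, labeled, labels):
    """Eq. (7): mu[..., 0] = unchanged membership, mu[..., 1] = changed.

    Labeled pixels get crisp memberships [1, 0] or [0, 1] (labels: 0/1).
    """
    hi, lo = np.maximum(d, 1 - d), np.minimum(d, 1 - d)
    changed = d >= theta
    mu = np.stack([np.where(changed, lo, hi), np.where(changed, hi, lo)], axis=-1)
    mu[labeled] = np.eye(2)[labels[labeled]]
    return mu


def knn_targets(x, mu, labeled, k=4, half_window=2):
    """Eq. (8): mean membership of the K nearest neighbors inside a window.

    Assumptions: Euclidean distance between input patterns, and the pixel
    itself is excluded from its own neighbor set.
    """
    p, q, _ = x.shape
    t = mu.copy()  # labeled targets stay crisp
    for i in range(p):
        for j in range(q):
            if labeled[i, j]:
                continue
            cand = [(s, l)
                    for s in range(max(0, i - half_window), min(p, i + half_window + 1))
                    for l in range(max(0, j - half_window), min(q, j + half_window + 1))
                    if (s, l) != (i, j)]
            dist = [np.linalg.norm(x[s, l] - x[i, j]) for s, l in cand]
            nearest = [cand[idx] for idx in np.argsort(dist)[:k]]
            t[i, j] = np.mean([mu[s, l] for s, l in nearest], axis=0)
    return t


def select_for_training(t, labeled):
    """Step 6 of Table 1: unlabeled pixels whose changed target is larger."""
    return ~labeled & (t[..., 1] > t[..., 0])


def sum_squared_error(mu, t):
    """Eq. (9): sum over pixels and both classes of (mu_k - t_k)^2."""
    return float(((mu - t) ** 2).sum())
```

In a complete loop, these functions would alternate with retraining of the network on the labeled and selected pixels. The loop would stop once the error of Eq. (9) changes by less than the tolerance between consecutive training steps. Hard labels could then be taken, for instance, as the class with the larger target value.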
3.3. Data set related to the Peloponnesian Peninsula, Greece
This data set is composed of two images of the southern part of the Peloponnesian Peninsula, Greece, acquired in April 1998 and September 1998. Both were captured by a passive multi-spectral scanner on a satellite, namely the Wide Field Sensor (WiFS) on board the IRS-P3 satellite. A section of 492 × 492 pixels was selected from the available WiFS scene as the test site. Fig. 4(a) and (b) show the respective images for the NIR (near-infrared) band. Between the two acquisition dates, several wildfires destroyed a large portion of the vegetation in this area. Fig. 4(c) and (d) show, respectively, the corresponding difference image and the reference map. Both were obtained by the same process as for the previous data sets. The reference map contains 5197 changed and 236,867 unchanged pixels.
Fig. 2. Images of Mexico area. (a) Band 4 image acquired in April 2000, (b) band 4 image acquired in May 2002, (c) corresponding difference image generated by CVA technique, and (d) a reference map of the changed area.
Fig. 3. Images of Sardinia Island, Italy. (a) Band 4 image acquired in September 1995, (b) band 4 image acquired in July 1996, (c) difference image generated by CVA technique using bands 1, 2, 4, and 5, and (d) a reference map of the changed area.
Fig. 4. Images of the Peloponnesian Peninsula, Greece. (a) NIR band of the IRS-P3 WiFS image acquired in April 1998, (b) NIR band of the IRS-P3 WiFS image acquired in September 1998, (c) corresponding difference image generated by CVA technique, and (d) a reference map of the changed area.
Table 2 Results obtained by the unsupervised change detection technique using MSOFM on Mexico data set.
| Threshold | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE |
|---|---|---|---|---|---|---|---|---|
| 0.216 (optimal) | 1366 / 1366 | 1618 / 1634 | 2984 / 3000 | 2991.7 (4.360046) | 0.9677 (0.000044) | 0.9677 (0.000044) | 0.9355 (0.000089) | 0.0114 (0.000017) |
| 0.183 (energy based) | 556 / 556 | 3004 / 3024 | 3560 / 3580 | 3570.9 (6.42573) | 0.9635 (0.00006) | 0.9629 (0.000063) | 0.9258 (0.000125) | 0.01362 (0.000025) |
| 0.232 (correlation based) | 1987 / 1989 | 1210 / 1227 | 3197 / 3216 | 3208.5 (6.086871) | 0.9648 (0.000064) | 0.9648 (0.000063) | 0.9296 (0.000127) | 0.0122 (0.000023) |
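The summary columns in these tables can be related to the error counts MA and FA. The paper's own definitions are given in Appendix A, which is not reproduced here, so the sketch below assumes the standard ones:
- OE = MA + FA;
- PE = OE divided by the number of pixels;
- Cohen's kappa;
- macro F1 as the mean of the two per-class F1 scores.

The function name `change_detection_measures` is hypothetical.

```python
def change_detection_measures(ma, fa, n_changed, n_unchanged):
    """Summary measures from missed alarms (MA) and false alarms (FA).

    Assumed (standard) definitions: OE = MA + FA, PE = OE / N, Cohen's kappa,
    and macro F1 as the mean of the changed- and unchanged-class F1 scores.
    """
    n = n_changed + n_unchanged
    tp = n_changed - ma    # changed pixels detected as changed
    tn = n_unchanged - fa  # unchanged pixels kept as unchanged
    oe = ma + fa
    p_o = (tp + tn) / n    # observed agreement, equal to 1 - PE
    pred_changed, pred_unchanged = tp + fa, tn + ma
    p_e = (pred_changed * n_changed + pred_unchanged * n_unchanged) / n ** 2
    f1_changed = 2 * tp / (2 * tp + fa + ma)
    f1_unchanged = 2 * tn / (2 * tn + ma + fa)
    return {
        "OE": oe,
        "PE": oe / n,
        "kappa": (p_o - p_e) / (1 - p_e),
        "macro_F1": (f1_changed + f1_unchanged) / 2,
    }


# Min row at the optimal threshold in Table 2, Mexico reference map (Section 3.1).
m = change_detection_measures(ma=1366, fa=1618, n_changed=25599, n_unchanged=236545)
print({k: round(v, 4) for k, v in m.items()})
# {'OE': 2984, 'PE': 0.0114, 'kappa': 0.9357, 'macro_F1': 0.9678}
```

These single-run values are close to the run averages reported in Table 2 (Kappa 0.9355, PE 0.0114, Macro F1 0.9677). That agreement is consistent with these definitions but does not prove them. Micro F1 is omitted because its definition is not stated in this section.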
Table 3 Results obtained by a supervised change detection technique using MLP on Mexico data set.
| Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE |
|---|---|---|---|---|---|---|---|---|
| 0.1% | 1345 / 2268 | 1429 / 1139 | 2774 / 3407 | 3086.4 (186.277857) | 0.9663 (0.002266) | 0.9662 (0.00234) | 0.9275 (0.004676) | 0.0117 (0.000711) |
| 0.5% | 1269 / 875 | 1406 / 2192 | 2675 / 3067 | 2834.3 (95.736148) | 0.9695 (0.000899) | 0.9694 (0.000933) | 0.915 (0.001731) | 0.0108 (0.000365) |
| 1% | 1208 / 913 | 1420 / 2034 | 2628 / 2947 | 2727.1 (87.813951) | 0.9703 (0.000838) | 0.9703 (0.000864) | 0.8946 (0.001477) | 0.0104 (0.000335) |
Table 4 Results obtained by constrained k-means algorithm on Mexico data set.
| Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE |
|---|---|---|---|---|---|---|---|---|
| 0.1% | 3099 / 3106 | 664 / 665 | 3763 / 3771 | 3767.5 (2.539685) | 0.9581 (0.000029) | 0.9574 (0.000031) | 0.9148 (0.000061) | 0.0143 (0.00001) |
| 0.5% | 3077 / 3081 | 661 / 664 | 3738 / 3745 | 3743.6 (3.104835) | 0.9584 (0.000035) | 0.9576 (0.000036) | 0.9154 (0.000072) | 0.0142 (0.000012) |
| 1% | 3063 / 3073 | 657 / 662 | 3720 / 3735 | 3727 (3.464102) | 0.9585 (0.000039) | 0.9578 (0.00004) | 0.9157 (0.000081) | 0.0142 (0.000013) |
Table 5 Results obtained by using robust fuzzy c-means and semi-supervised MLP on Mexico data set.
| Technique | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE |
|---|---|---|---|---|---|---|---|---|
| RFCM | 1795 / 1795 | 1068 / 1068 | 2863 / 2863 | 2863 (0) | 0.9687 (0) | 0.9686 (0) | 0.9372 (0) | 0.0109 (0) |
| Semi-supervised MLP | 2660 / 2771 | 706 / 661 | 3366 / 3432 | 3388.1 (22.38504) | 0.9625 (0.00026) | 0.9620 (0.000283) | 0.9240 (0.000564) | 0.0129 (0.000085) |
Table 6 Results obtained by the proposed semi-supervised technique on Mexico data set.
| Threshold | Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE |
|---|---|---|---|---|---|---|---|---|---|
| 0.216 (optimal) | 0.1% | 1512 / 1516 | 1218 / 1239 | 2730 / 2755 | 2741.4 (6.666333) | 0.9701 (0.000071) | 0.9701 (0.000071) | 0.9403 (0.000141) | 0.0104 (0.000025) |
| 0.216 (optimal) | 0.5% | 1470 / 1502 | 1235 / 1232 | 2705 / 2734 | 2723.7 (8.331266) | 0.9702 (0.000096) | 0.9702 (0.000096) | 0.9404 (0.000193) | 0.0104 (0.000032) |
| 0.216 (optimal) | 1% | 1478 / 1483 | 1201 / 1237 | 2679 / 2720 | 2700.8 (12.432216) | 0.9703 (0.000135) | 0.9703 (0.000135) | 0.9406 (0.00027) | 0.010407 (0.000048) |
| 0.183 (energy based) | 0.1% | 697 / 710 | 2174 / 2195 | 2871 / 2896 | 2884.1 (6.284107) | 0.9697 (0.000062) | 0.9695 (0.000064) | 0.939 (0.000127) | 0.011 (0.000024) |
| 0.183 (energy based) | 0.5% | 683 / 692 | 2175 / 2189 | 2858 / 2881 | 2868.2 (8.459314) | 0.9698 (0.000089) | 0.9695 (0.000089) | 0.9391 (0.000178) | 0.0109 (0.000032) |
| 0.183 (energy based) | 1% | 688 / 687 | 2142 / 2197 | 2830 / 2884 | 2854.4 (13.821722) | 0.9698 (0.000137) | 0.9695 (0.000141) | 0.9391 (0.000281) | 0.0109 (0.000053) |
| 0.232 (correlation based) | 0.1% | 2143 / 2148 | 920 / 945 | 3063 / 3093 | 3080.7 (9.327915) | 0.966 (0.000102) | 0.9659 (0.000102) | 0.9318 (0.000203) | 0.0117 (0.000036) |
| 0.232 (correlation based) | 0.5% | 2098 / 2137 | 931 / 946 | 3029 / 3083 | 3054.1 (16.585837) | 0.9662 (0.000186) | 0.966 (0.000188) | 0.9321 (0.000375) | 0.0117 (0.000064) |
| 0.232 (correlation based) | 1% | 2070 / 2111 | 929 / 940 | 2999 / 3051 | 3024.6 (14.779716) | 0.9664 (0.000166) | 0.9662 (0.000167) | 0.9324 (0.000335) | 0.0116 (0.000057) |
S. Ghosh et al. / Applied Soft Computing 15 (2014) 1–20
Table 7 Results obtained by the unsupervised change detection technique using MSOFM on Sardinia data set.
Threshold | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.368 (optimal) | 1070 / 1076 | 574 / 579 | 1644 / 1655 | 1649.2 (3.37046) | 0.9397 (0.000126) | 0.9394 (0.000127) | 0.8789 (0.000254) | 0.0133 (0.000027)
0.356 (energy based) | 915 / 914 | 766 / 776 | 1681 / 1690 | 1685.7 (2.45153) | 0.9395 (0.000081) | 0.9394 (0.00008) | 0.8789 (0.00016) | 0.0136 (0.00002)
0.337 (correlation based) | 704 / 702 | 1181 / 1196 | 1885 / 1898 | 1892.3 (3.606938) | 0.9349 (0.000104) | 0.9346 (0.000108) | 0.8693 (0.000216) | 0.0153 (0.000029)
Table 8 Results obtained by a supervised change detection technique using MLP on Sardinia data set.
Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.1% | 1161 / 1231 | 504 / 1591 | 1665 / 2822 | 1969.4 (376.058559) | 0.9277 (0.012885) | 0.9268 (0.01306) | 0.8472 (0.025651) | 0.0159 (0.003043)
0.5% | 1057 / 1672 | 659 / 641 | 1716 / 2313 | 2035 (177.15925) | 0.9258 (0.006511) | 0.9249 (0.006851) | 0.8198 (0.013029) | 0.0164 (0.001433)
1% | 1050 / 1534 | 578 / 409 | 1628 / 1943 | 1815.3 (88.293884) | 0.9327 (0.003542) | 0.9321 (0.003847) | 0.8057 (0.007149) | 0.0146 (0.000714)
Table 9 Results obtained by constrained k-means algorithm on Sardinia data set.
Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.1% | 637 / 637 | 1876 / 1881 | 2513 / 2518 | 2514.8 (1.4) | 0.9184 (0.000039) | 0.9169 (0.000041) | 0.8339 (0.000082) | 0.0203 (0.000011)
0.5% | 635 / 634 | 1858 / 1876 | 2493 / 2510 | 2501.8 (5.87875) | 0.9188 (0.000146) | 0.9173 (0.000161) | 0.8347 (0.000321) | 0.0202 (0.000047)
1% | 631 / 628 | 1839 / 1866 | 2470 / 2494 | 2480.4 (6.696268) | 0.9194 (0.000168) | 0.9179 (0.000185) | 0.8359 (0.000369) | 0.02 (0.000054)
Table 10 Results obtained by using robust fuzzy c-means and semi-supervised MLP on Sardinia data set.
Techniques used | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
RFCM | 606 / 606 | 1576 / 1576 | 2182 / 2182 | 2182 (0) | 0.9278 (0) | 0.9268 (0) | 0.8536 (0) | 0.0176 (0)
Semi-supervised MLP | 1369 / 1450 | 279 / 246 | 1648 / 1696 | 1669.2 (11.496086) | 0.9378 (0.000468) | 0.9360 (0.000559) | 0.8721 (0.001115) | 0.0135 (0.000093)
Table 11 Results obtained by the proposed semi-supervised technique on Sardinia data set.
Threshold | Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.368 (optimal) | 0.1% | 1172 / 1225 | 371 / 355 | 1543 / 1580 | 1555.5 (10.052363) | 0.9424 (0.000404) | 0.9416 (0.000442) | 0.8832 (0.000883) | 0.0125 (0.000081)
0.368 (optimal) | 0.5% | 1155 / 1166 | 359 / 399 | 1514 / 1565 | 1540.2 (12.237647) | 0.9428 (0.000456) | 0.9419 (0.000455) | 0.8839 (0.00091) | 0.0125 (0.000099)
0.368 (optimal) | 1% | 1143 / 1183 | 355 / 357 | 1498 / 1540 | 1516 (16.321765) | 0.9434 (0.000624) | 0.9426 (0.000634) | 0.8852 (0.001268) | 0.0123 (0.000133)
0.356 (energy based) | 0.1% | 999 / 1002 | 496 / 513 | 1495 / 1515 | 1502.5 (7.003571) | 0.945 (0.000247) | 0.9447 (0.000242) | 0.8894 (0.000483) | 0.0121 (0.000056)
0.356 (energy based) | 0.5% | 976 / 991 | 493 / 511 | 1469 / 1502 | 1485.6 (12.682271) | 0.9454 (0.000473) | 0.9451 (0.000476) | 0.89 (0.000952) | 0.012 (0.000103)
0.356 (energy based) | 1% | 967 / 989 | 473 / 490 | 1440 / 1479 | 1459.9 (12.332477) | 0.9461 (0.000464) | 0.9458 (0.000469) | 0.8916 (0.000937) | 0.0119 (0.000101)
0.337 (correlation based) | 0.1% | 811 / 825 | 826 / 847 | 1637 / 1672 | 1656.2 (9.537295) | 0.9411 (0.000334) | 0.9411 (0.000334) | 0.8822 (0.000669) | 0.0134 (0.000077)
0.337 (correlation based) | 0.5% | 785 / 805 | 841 / 850 | 1626 / 1655 | 1635.7 (7.253275) | 0.9416 (0.000253) | 0.9416 (0.000253) | 0.8832 (0.000506) | 0.0133 (0.000059)
0.337 (correlation based) | 1% | 794 / 818 | 787 / 835 | 1581 / 1653 | 1613.7 (18.649665) | 0.942 (0.000653) | 0.942 (0.000653) | 0.8841 (0.001305) | 0.01318 (0.000153)
Table 12 Results obtained by the unsupervised change detection technique using MSOFM on Greece data set.
Threshold | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.425 (optimal) | 2202 / 2203 | 728 / 741 | 2930 / 2944 | 2935.4 (4.103657) | 0.8381 (0.000197) | 0.8324 (0.000158) | 0.665 (0.000317) | 0.0121 (0.000017)
0.397 (energy based) | 1709 / 1709 | 1564 / 1582 | 3273 / 3291 | 3285.2 (5.095096) | 0.8364 (0.000177) | 0.8364 (0.000174) | 0.6729 (0.000349) | 0.01357 (0.000021)
0.438 (correlation based) | 2510 / 2510 | 528 / 541 | 3038 / 3051 | 3045.3 (4.450843) | 0.8274 (0.000243) | 0.8159 (0.000173) | 0.6322 (0.000348) | 0.0125 (0.000018)
Table 13 Results obtained by a supervised change detection technique using MLP on Greece data set.
Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.1% | 2167 / 2292 | 990 / 3380 | 3127 / 5672 | 3924.2 (816.001201) | 0.8033 (0.027801) | 0.7989 (0.027682) | 0.5929 (0.052822) | 0.0162 (0.003371)
0.5% | 2570 / 1564 | 464 / 2201 | 3034 / 3765 | 3386.3 (241.102903) | 0.8238 (0.004244) | 0.8203 (0.006501) | 0.6103 (0.011776) | 0.0139 (0.000996)
1% | 1948 / 1575 | 973 / 2024 | 2921 / 3599 | 3071 (196.981217) | 0.8344 (0.008858) | 0.8291 (0.015223) | 0.6002 (0.021243) | 0.0126 (0.000814)
Fig. 5. Change detection maps obtained for Mexico data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) a reference map for the changed area, (f) using semi-supervised MLP, and (g) using robust fuzzy c-means algorithm.
Fig. 6. Error maps obtained for Mexico data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) using semi-supervised MLP, and (f) using robust fuzzy c-means algorithm.
Table 14 Results obtained by constrained k-means algorithm on Greece data set.
Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.1% | 37 / 37 | 67,295 / 67,339 | 67,332 / 67,376 | 67,358.3 (13.842326) | 0.6583 (0.000014) | 0.4835 (0.000032) | 0.0966 (0.000025) | 0.2782 (0.000057)
0.5% | 37 / 37 | 66,798 / 66,885 | 66,835 / 66,922 | 66,860.4 (28.228355) | 0.6588 (0.000032) | 0.4847 (0.000065) | 0.0975 (0.000053) | 0.2762 (0.000117)
1% | 39 / 39 | 65,871 / 65,939 | 65,910 / 65,978 | 65,948.1 (23.257042) | 0.6596 (0.000028) | 0.4867 (0.000054) | 0.0992 (0.000044) | 0.2724 (0.000096)
4. Experimental results and discussion
As mentioned in Section 1, to investigate the effectiveness of the proposed semi-supervised technique, experiments are conducted on three different multi-temporal and multi-spectral data sets, and the results obtained using the proposed methodology are compared with those of the unsupervised approach using MSOFM [14], a robust fuzzy c-means clustering technique (RFCM) [43], a supervised method using MLP, a semi-supervised change detection technique using MLP [39], and the constrained k-means algorithm [44] (a semi-supervised clustering algorithm). To implement the proposed algorithm, during training of the MSOFM network the learning rate in iteration itr is computed as 1/(1 + itr), which ensures that its value lies in the range (0, 1]. The topological neighborhood h(itr) was taken to be a rectangular window of initial size 11 × 11; after each epoch the window size was gradually reduced until it reached 3 × 3, and thereafter it was kept constant until convergence. For each unlabeled pattern, to search for its K nearest
Table 15 Results obtained by using robust fuzzy c-means and semi-supervised MLP on Greece data set.
Techniques used | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
RFCM | 37 / 37 | 66,963 / 66,963 | 67,000 / 67,000 | 67,000 (0) | 0.6587 (0) | 0.4844 (0) | 0.0973 (0) | 0.276786 (0)
Semi-supervised MLP | 39 / 33 | 68,110 / 74,571 | 68,149 / 74,604 | 70,572.5 (1798.824352) | 0.6552 (0.001671) | 0.4763 (0.004005) | 0.0911 (0.002918) | 0.2915 (0.007431)
Table 16 Results obtained by the proposed semi-supervised technique on Greece data set.
Threshold | Training patterns | MA (Min / Max) | FA (Min / Max) | OE (Min / Max) | Avg. OE | Avg. Micro F1 | Avg. Macro F1 | Avg. Kappa | Avg. PE
0.425 (optimal) | 0.1% | 2373 / 2376 | 405 / 425 | 2778 / 2801 | 2791.8 (6.939741) | 0.8429 (0.000415) | 0.8315 (0.000412) | 0.6633 (0.000823) | 0.0115 (0.000029)
0.425 (optimal) | 0.5% | 2344 / 2376 | 420 / 417 | 2744 / 2793 | 2769.6 (14.914423) | 0.8437 (0.000948) | 0.8322 (0.001051) | 0.6648 (0.002096) | 0.0114 (0.000062)
0.425 (optimal) | 1% | 2310 / 2347 | 399 / 422 | 2709 / 2769 | 2744.7 (15.152888) | 0.8444 (0.00095) | 0.833 (0.001005) | 0.6663 (0.002007) | 0.0114 (0.000063)
0.397 (energy based) | 0.1% | 1763 / 1781 | 1143 / 1161 | 2906 / 2942 | 2915.7 (9.571311) | 0.8484 (0.000478) | 0.8475 (0.000474) | 0.6951 (0.000948) | 0.012 (0.00004)
0.397 (energy based) | 0.5% | 1750 / 1753 | 1098 / 1159 | 2848 / 2912 | 2886.3 (18.177184) | 0.8492 (0.000848) | 0.8483 (0.000826) | 0.6966 (0.001652) | 0.0119 (0.000075)
0.397 (energy based) | 1% | 1751 / 1766 | 1093 / 1116 | 2844 / 2882 | 2865.3 (12.99269) | 0.8492 (0.000683) | 0.8482 (0.000704) | 0.6966 (0.001408) | 0.0119 (0.000054)
0.438 (correlation based) | 0.1% | 2721 / 2730 | 282 / 295 | 3003 / 3025 | 3013.3 (8.124654) | 0.8269 (0.000539) | 0.8072 (0.000559) | 0.615 (0.001114) | 0.0124 (0.000034)
0.438 (correlation based) | 0.5% | 2687 / 2704 | 269 / 312 | 2956 / 3016 | 2985.6 (19.458674) | 0.828 (0.001293) | 0.8084 (0.001325) | 0.6174 (0.002642) | 0.0123 (0.000081)
0.438 (correlation based) | 1% | 2639 / 2673 | 270 / 297 | 2909 / 2970 | 2948.9 (19.94718) | 0.8295 (0.001333) | 0.8103 (0.001404) | 0.6212 (0.002798) | 0.0123 (0.000083)
Fig. 7. Change detection maps obtained for Sardinia data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) a reference map for the changed area, (f) using semi-supervised MLP, and (g) using robust fuzzy c-means algorithm.
neighbors, we experimented with different window sizes and values of K; finally, the window size was set to 51 × 51 and K was fixed at 8. The same threshold calculation methods (optimal, correlation based and energy based) as those employed in [14] are adopted in our experiments. The initial weights of the network connections (for both MSOFM and MLP) and the training (labeled) patterns differ from one simulation to another. For experimentation, three different percentages of training patterns (0.1%, 0.5% and 1%) are considered, and 10 simulations are conducted. At the beginning of the training phase, the labeled patterns are obtained from the reference map, and a target value is assigned to each labeled pattern depending on its class label. The target value of each training pattern is kept fixed to its class label while testing. To assess the effectiveness of the proposed methodology, the following performance measuring indices are considered: the number of missed alarms
Fig. 8. Error maps obtained for Sardinia data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) using semi-supervised MLP, and (f) using robust fuzzy c-means algorithm.
(MA), the number of false alarms (FA), the overall error (OE), the micro averaged F1 measure (MicroF1), the macro averaged F1 measure (MacroF1), the Kappa measure (Kappa) and the error probability (PE). For all indices other than missed alarms and false alarms, the average (Avg.) and the standard deviation (given in brackets in the tables) over the 10 simulations are used for comparative analysis. The best (denoted by ‘Min’ in the tables) and the worst (denoted by ‘Max’ in the tables) results over all the simulations are also provided for MA, FA and OE. For the Mexico data set, the results obtained using unsupervised MSOFM, the supervised MLP, the constrained k-means algorithm, the robust fuzzy clustering algorithm together with semi-supervised MLP, and the proposed semi-supervised technique are given in Tables 2, 3, 4, 5 and 6, respectively; the corresponding results for the Sardinia and Greece data sets are given in Tables 7–11 and Tables 12–16, respectively. From Tables 2 and 6, it is noticed that for the Mexico data set the proposed semi-supervised method (for all the percentages of training patterns) outperforms the corresponding unsupervised version in most cases, except for missed alarms. It
has also been observed that, for the proposed strategy, the average values of almost all the measuring indices are significantly better than those of the corresponding unsupervised method, but the standard deviations are slightly higher. This might be because different training patterns are used in different simulations of the semi-supervised technique. By comparing the standard deviation values in Tables 2, 3 and 6, it has been found that, for all the performance measures, the MLP based supervised technique produces much higher values than the unsupervised and semi-supervised approaches. This may be due to the unavailability of a sufficient number of training samples for a supervised method, a situation typical of many real-life scenarios. From Tables 3 and 6, it is also seen that the maximum overall error (worst case) over 10 simulations of the proposed approach is lower than that of the supervised method for the optimal and energy based thresholds, whereas the minimum overall error (best case) of the proposed strategy is not better in most cases for the Mexico data set. This may be because a supervised framework with good representative labeled patterns (covering the underlying pattern distribution properly) can obtain better results than the corresponding semi-supervised approach. But such good training patterns may not be
Fig. 9. Change detection maps obtained for Greece data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) a reference map for the changed area, (f) using semi-supervised MLP, and (g) using robust fuzzy c-means algorithm.
available in most cases, especially when labeled patterns are scarce. This is also reflected in the higher standard deviation values of the supervised method compared with the unsupervised and semi-supervised ones. It has also been found that, for the proposed method, out of a total of 18 cases (considering different threshold values with different percentages of training patterns, for both maximum and minimum overall error), missed alarms and false alarms are fewer in 8 cases and 12 cases, respectively. For the Kappa measure, the results obtained using the semi-supervised method are better than those of the corresponding supervised method in all cases. It has also been observed that, out of a total of 9 cases (considering different threshold values with different percentages of training patterns), the average values of overall error, micro averaged F1 measure, macro averaged F1 measure and error probability are higher in 5, 4, 4 and 5 cases, respectively. From Tables 4 and 6, it has been observed that the proposed technique is better than the constrained k-means algorithm in terms of all the measuring indices except false alarms. By analyzing the results in Tables 5 and 6, it has been found that the proposed technique using the optimal threshold (for all three percentages of training patterns) outperforms the robust fuzzy clustering (RFCM) technique in all
the cases except false alarms. From Tables 5 and 6, it is also seen that the proposed approach with all three threshold values is significantly better than semi-supervised MLP in most cases. A comparative analysis of the results of the unsupervised (Table 7) and the proposed semi-supervised (Table 11) techniques for the Sardinia data set reveals findings similar to those for the Mexico data set. By analyzing the results in Tables 8 and 11, it is observed that the proposed methodology is always better than the supervised approach in terms of maximum, minimum and average overall error, micro averaged F1 measure, macro averaged F1 measure, Kappa measure, error probability and standard deviation. For missed alarms and false alarms, out of 18 cases (considering different threshold values along with different percentages of training patterns, in terms of maximum and minimum overall error), the proposed strategy is better in 15 cases and 12 cases, respectively. From Tables 9 and 11, it is noticed that the proposed technique is better than the constrained k-means algorithm in terms of all the measuring indices except missed alarms. By comparing the results in Tables 10 and 11, it has been found that the proposed approach with all threshold values is significantly better than the robust fuzzy c-means
Fig. 10. Error maps obtained for Greece data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), and (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern).
clustering in most cases, except for missed alarms and standard deviation. It has also been noticed that the proposed semi-supervised approach is better suited for change detection than the semi-supervised approach using MLP in terms of almost all the measuring indices. From Tables 12 and 16, it is seen that for the Greece data set the proposed methodology provides better results than the corresponding unsupervised version in terms of false alarms, overall error and error probability. It has also been observed that the semi-supervised approach is not able to improve the performance in terms of missed alarms. These results corroborate our findings for the other two data sets. It is also noticed that the micro averaged F1 measure of the proposed method is better than that of the corresponding unsupervised methodology for the optimal and the energy based threshold values. For the correlation based threshold, the proposed method gives a better micro averaged F1 measure than the unsupervised method with 0.5% and 1% training patterns (but not with 0.1% training patterns). Among the three threshold selection techniques used in the present article, for the Kappa measure and the macro averaged F1 measure the proposed technique produces better results than the unsupervised version only with the energy based threshold. By comparing the results of the supervised and the semi-supervised approaches for the Greece data set (Tables 13 and 16), it has been found that the proposed method is always better than the supervised method in terms of overall error, Kappa measure and error probability. It has also been noticed that, for the proposed method, out of a total of 18 cases (considering different threshold values with different percentages of training patterns, for both maximum and minimum overall error), missed alarms and false alarms are fewer in 5 cases and 15 cases, respectively. Out
of a total of 9 cases (considering different threshold values with different percentages of training patterns), the average values of the micro averaged F1 measure and the macro averaged F1 measure are higher in 8 cases and 7 cases, respectively. From Tables 14 and 16, it is noticed that the proposed technique is better than the constrained k-means algorithm in terms of all the measuring indices except missed alarms. From Tables 15 and 16, it has been found that the results obtained using the proposed approach with all three thresholding techniques are significantly better than those obtained using the robust fuzzy clustering approach and the semi-supervised MLP in almost all cases. For the Greece data set, the performance of the semi-supervised MLP [39] and of the robust fuzzy c-means algorithm [43] is noticeably worse. From this observation, it can be concluded that the semi-supervised MLP and RFCM are not robust across all the data sets used for experimentation, whereas the proposed semi-supervised approach performed well on all of them. From Tables 6, 11 and 16 (the results of the semi-supervised approach under different conditions), it is also observed that the performance indices mostly improve as the percentage of training patterns increases. The robustness of the proposed methodology (as evident from the standard deviation) is slightly worse than that of the unsupervised approach but far better than that of the supervised technique. To sum up, considering the results obtained for 10 different performance measures with different data sets, different thresholding techniques and different percentages of training patterns (wherever applicable), the proposed semi-supervised methodology has an edge over the unsupervised as well as the supervised ones when only a small number of labeled patterns is available.
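The MSOFM training settings stated at the start of this section (a learning rate of 1/(1 + itr) at iteration itr, and a rectangular neighborhood window that shrinks from 11 × 11 to 3 × 3 over the epochs) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and because the text only says the window is reduced "gradually" after each epoch, the decrement of 2 per epoch (which keeps the side length odd) is an assumption.

```python
def learning_rate(itr: int) -> float:
    """Learning rate at iteration itr (itr = 0, 1, 2, ...); always in (0, 1]."""
    if itr < 0:
        raise ValueError("itr must be non-negative")
    return 1.0 / (1.0 + itr)


def window_sizes(n_epochs: int, start: int = 11, end: int = 3, step: int = 2) -> list[int]:
    """Side length of the square neighborhood window used in each epoch.

    ASSUMPTION: shrink by `step` after every epoch until `end` is reached,
    then keep the size constant (the paper does not state the decrement).
    """
    sizes = []
    size = start
    for _ in range(n_epochs):
        sizes.append(size)
        size = max(end, size - step)
    return sizes


def in_window(winner: tuple[int, int], unit: tuple[int, int], size: int) -> bool:
    """Rectangular neighborhood: True if `unit` lies inside the size x size
    window centered on the winning unit (grid coordinates)."""
    half = size // 2
    return abs(unit[0] - winner[0]) <= half and abs(unit[1] - winner[1]) <= half


if __name__ == "__main__":
    print([round(learning_rate(t), 3) for t in range(4)])  # [1.0, 0.5, 0.333, 0.25]
    print(window_sizes(6))                                # [11, 9, 7, 5, 3, 3]
```

The rectangular window gives every unit inside it the same update strength; a Gaussian neighborhood would be a common alternative, but the text specifies a rectangular one.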
Fig. 11. Graphs of the error-rate for Mexico data set obtained using: (a) unsupervised MSOFM (optimal threshold), (b) fuzzy c-means algorithm incorporating local information (i.e. RFCM), (c) supervised MLP (with 0.5% training pattern), (d) Semi-supervised MLP, (e) constrained k-means algorithm (with 0.5% training pattern), and (f) the proposed semi-supervised technique MSOFM (with optimal threshold and 0.5% training pattern).
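The paired t-test behind Table 17 compares two methods simulation by simulation. The sketch below uses only the Python standard library and is an illustration, not the authors' code; the function names are hypothetical. It computes the t statistic of the per-simulation differences and, assuming a two-sided test with 10 paired simulations (9 degrees of freedom), compares it with the tabulated 5% critical value of about 2.262. Obtaining a p-score as reported in Table 17 additionally requires the t-distribution CDF, which SciPy provides through `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 9 degrees of freedom
# (10 paired simulations), taken from standard t tables.
T_CRIT_5PCT_DF9 = 2.262


def paired_t(a: list[float], b: list[float]) -> float:
    """t statistic of a paired t-test on matched scores a[k], b[k]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significant_at_5pct(a: list[float], b: list[float]) -> bool:
    """Two-sided test at the 5% level; the critical value assumes exactly 10 pairs."""
    if len(a) != 10:
        raise ValueError("critical value above assumes 10 paired simulations")
    return abs(paired_t(a, b)) > T_CRIT_5PCT_DF9
```

In the setting of Table 17, `a` and `b` would hold the Kappa values of the proposed method and of a competing method over the 10 simulations (optimal threshold, 0.5% training patterns).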
To test the statistical significance of the results (in terms of the Kappa measure), a paired t-test [45] has been performed between the proposed semi-supervised approach and each of the other unsupervised, supervised and semi-supervised methods at the 5% level of significance; the results of the t-test in terms of p-score are reported in Table 17. As a typical illustration, we have considered the results (over 10 simulations) obtained using the optimal threshold and 0.5% training patterns. Statistically significant results in terms of the p-score of the paired t-test (at the 5% level of significance) are marked in bold in Table 17. The results show a significant improvement of the proposed method over the other methods in all cases. For visual illustration, the change detection maps corresponding to the minimum overall error (obtained over 10 simulations) are displayed for the unsupervised method, the supervised method, the constrained k-means algorithm and the
proposed semi-supervised method. The change detection maps obtained using these approaches for the Mexico, Sardinia and Greece data sets are shown in Figs. 5, 7 and 9, respectively. The error maps (highlighting the difference between the change detection map and the reference map) are displayed in Figs. 6, 8 and 10. It has been observed that the change detection maps obtained using the proposed method are more accurate (closer to the reference map) in all cases. The maps clearly show that the erroneous classification of unchanged areas as changed ones (i.e., false alarms) is significantly reduced by the proposed methodology. However, it fails to detect some of the small and scattered changed areas (i.e., missed alarms), where the pixels lie on the boundary or the area is surrounded by a vast unchanged region. This is due to the neighborhood effect. Graphs of the error rate obtained using the different unsupervised, supervised and semi-supervised techniques on the three data sets are
Fig. 12. Graphs of the error-rate for Sardinia data set obtained using: (a) unsupervised MSOFM (optimal threshold), (b) fuzzy c-means algorithm incorporating local information (i.e. RFCM), (c) supervised MLP (with 0.5% training pattern), (d) Semi-supervised MLP, (e) constrained k-means algorithm (with 0.5% training pattern), and (f) the proposed semi-supervised technique MSOFM (with optimal threshold and 0.5% training pattern).
displayed in Figs. 11–13. As a typical illustration, we have considered the results obtained using the optimal threshold and 0.5% training patterns. The graphs show how the overall error changes with increasing epochs (or training steps) in the particular simulation in which the minimum overall error is obtained. For all the data sets, after initial fluctuations, the overall error either becomes steady or decreases with increasing epochs (or training steps) in the
case of most of the techniques, except constrained k-means.
Table 17 Results of paired t-test performed with the proposed semi-supervised technique versus the other unsupervised, supervised and semi-supervised methods in terms of p-score.
Data set used | Proposed vs. unsupervised MSOFM | Proposed vs. RFCM | Proposed vs. MLP | Proposed vs. semi-MLP | Proposed vs. constrained k-means
Mexico | 1.1221 × 10^−23 | 1.4471 × 10^−21 | 0.0262 | 7.5308 × 10^−25 | 1.9845 × 10^−36
Sardinia | 2.9941 × 10^−13 | 2.2788 × 10^−26 | 1.5046 × 10^−11 | 9.7339 × 10^−16 | 1.2720 × 10^−29
Greece | 0.0176 | 1.7321 × 10^−42 | 2.0630 × 10^−5 | 2.0507 × 10^−38 | 1.7321 × 10^−42
Fig. 13. Graphs of the error-rate for Greece data set obtained using: (a) unsupervised MSOFM (optimal threshold), (b) fuzzy c-means algorithm incorporating local information (i.e. RFCM), (c) supervised MLP (with 0.5% training patterns), (d) semi-supervised MLP, (e) constrained k-means algorithm (with 0.5% training patterns), and (f) the proposed semi-supervised MSOFM technique (with optimal threshold and 0.5% training patterns).
5. Conclusion
In this paper, an attempt has been made to improve the performance of change detection in remotely sensed images under a scarcity of labeled patterns by exploiting the self-organizing capacity of Kohonen's neural network integrated with semi-supervision. Here, semi-supervised learning is employed by taking a few labeled patterns into consideration. Iterative learning of the MSOFM network is carried out using both the labeled patterns and the selected unlabeled patterns. A heuristic technique is also suggested for selecting the unlabeled patterns. Experiments are carried out on three multi-temporal and multi-spectral data sets to confirm the effectiveness of the proposed technique. From the results, it has been found that the proposed semi-supervised approach is better suited for change detection than unsupervised and supervised methods when only a small number of labeled patterns is available. Like other semi-supervised methods, the technique has the drawback of requiring more computational time.
Acknowledgments
The authors would like to thank the reviewers for their thorough and constructive comments, which helped to enhance the quality of the article. The authors are also grateful to the Department of Science and Technology (DST), Government of India and the University of Trento, Italy, the sponsors of the ITPAR program, and to Prof. L. Bruzzone for providing the data. Moumita Roy is grateful to the Council of Scientific & Industrial Research (CSIR), India for providing her a Senior Research Fellowship [No. 09/096(0684)2k11-EMR-I].
Appendix A. Performance measures
Detailed descriptions of the performance measuring indices used for evaluation are given below.
A.1. Missed alarms, false alarms and overall error
The number of missed alarms is calculated by comparing the obtained change detection map with the reference map. It is equal to the number of pixels wrongly predicted to be in the unchanged category, i.e., changed pixels identified as unchanged ones. The number of false alarms is the reverse situation: it is the number of unchanged pixels classified as changed ones, i.e., the number of pixels wrongly predicted to be in the changed category. The number of overall error is the total error of the change detection process and is computed as the sum of the number of missed alarms and the number of false alarms. Our objective is to minimize this value.
Fig. 14. Mathematical representation of confusion matrix.
A.2. Macro averaged F1 measure
The macro averaged F1 measure [46], also called macro F1, is computed by averaging the F1 scores of the individual categories. The F1 score of each category is computed from its precision and recall. The precision of category i, denoted $p_i$, is
$$p_i = \frac{\text{patterns correctly classified into category } i}{\text{patterns classified into category } i}, \tag{A.1}$$
and the recall of category i, $r_i$, is defined as
$$r_i = \frac{\text{patterns correctly classified into category } i}{\text{patterns that truly belong to category } i}. \tag{A.2}$$
Then the F1 score of category i, $(F_1)_i$, is computed as the harmonic mean of precision and recall, i.e.,
$$(F_1)_i = \frac{2 \times p_i \times r_i}{p_i + r_i}. \tag{A.3}$$
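Eqs. (A.1)–(A.3) can be evaluated directly from a C × C confusion matrix. The sketch below uses a hypothetical helper (not from the paper) and follows the row/column convention stated for the Kappa measure in Section A.4: `m[i][j]` counts samples assigned to category i in the classification map and to category j in the reference map, so row sums give the patterns classified into i and column sums the patterns truly in i. Returning 0 for an empty row or column is an assumed convention.

```python
def per_category_scores(m: list[list[int]]) -> list[tuple[float, float, float]]:
    """Return (p_i, r_i, (F1)_i) for every category i of a C x C confusion matrix.

    m[i][j] = number of samples labeled i in the classification map and
    j in the reference map (rows: classifier, columns: reference).
    """
    c = len(m)
    scores = []
    for i in range(c):
        classified_as_i = sum(m[i])                  # row sum
        truly_i = sum(m[k][i] for k in range(c))     # column sum
        p = m[i][i] / classified_as_i if classified_as_i else 0.0  # Eq. (A.1)
        r = m[i][i] / truly_i if truly_i else 0.0                  # Eq. (A.2)
        f1 = 2 * p * r / (p + r) if p + r else 0.0                 # Eq. (A.3)
        scores.append((p, r, f1))
    return scores


# Two categories (0 = changed, 1 = unchanged); illustrative counts only.
scores = per_category_scores([[90, 10], [5, 95]])
```

For the first category this gives p = 0.9, r = 90/95 ≈ 0.947 and F1 = 180/195 ≈ 0.923.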
The F1 measure gives equal importance to precision and recall. The macro F1 is then calculated as the mean of the per-category F1 scores,
$$\text{Macro averaged } F_1 = \frac{1}{C}\sum_{i=1}^{C} (F_1)_i, \tag{A.4}$$
where C represents the number of categories (classes). Macro averaged F1 gives equal weight to each category, and its value lies between 0 and 1; a value close to 1 denotes better classification.
A.3. Micro averaged F1 measure
The micro averaged F1 measure [46] is calculated using a global contingency table, whose cell values are obtained by summing the corresponding cell values of the per-category contingency tables. The micro averaged F1 measure gives equal weight to each sample and is defined as
$$\text{Micro averaged } F_1 = \frac{2 \times \left(\frac{1}{C}\sum_{i=1}^{C} p_i\right) \times \left(\frac{1}{C}\sum_{i=1}^{C} r_i\right)}{\left(\frac{1}{C}\sum_{i=1}^{C} p_i\right) + \left(\frac{1}{C}\sum_{i=1}^{C} r_i\right)}, \tag{A.5}$$
where $C$ is the number of categories. The micro averaged F1 measure is also called micro F1. Like macro F1, its value lies between 0 and 1, and the closer it is to 1, the better the classification.

A.4. Kappa measure

The Kappa measure [47] is calculated from the confusion matrix. The confusion matrix (Fig. 14) is a $C \times C$ matrix in which $M$ samples are classified into $C$ categories. The $M$ samples are distributed over $C^2$ cells: each sample is assigned to one of the $C$ categories in the classification map (usually the rows) and, independently, to one of the same categories in the reference map (usually the columns). Let $m_{ij}$ denote the number of samples classified into category $i$ ($i = 1, 2, \ldots, C$) in the classification map and into category $j$ ($j = 1, 2, \ldots, C$) in the reference map. Let $m_{i+} = \sum_{j=1}^{C} m_{ij}$ be the number of samples classified into category $i$ in the classification map, and let $m_{+j} = \sum_{i=1}^{C} m_{ij}$ be the number of samples classified into category $j$ in the reference map.
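In this notation the precision and recall of category $i$ are $p_i = m_{ii}/m_{i+}$ and $r_i = m_{ii}/m_{+i}$, so the marginal totals just defined are enough to evaluate the averaged F1 measures (A.4) and (A.5). The sketch below assumes a list-of-lists confusion matrix with classification-map categories as rows and reference-map categories as columns; the helper names are hypothetical, and (A.5) is implemented exactly as written above, i.e. as the harmonic mean of the category-averaged precision and recall.

```python
from typing import List, Tuple


def marginals(cm: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Row totals m_{i+} (classification map) and column totals m_{+j} (reference map)."""
    C = len(cm)
    row = [sum(cm[i]) for i in range(C)]
    col = [sum(cm[i][j] for i in range(C)) for j in range(C)]
    return row, col


def averaged_f1(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (macro F1 as in (A.4), micro F1 as written in (A.5))."""
    C = len(cm)
    row, col = marginals(cm)
    # Assumption: a category with an empty row/column contributes 0.0
    p = [cm[i][i] / row[i] if row[i] else 0.0 for i in range(C)]  # p_i = m_ii / m_{i+}
    r = [cm[i][i] / col[i] if col[i] else 0.0 for i in range(C)]  # r_i = m_ii / m_{+i}
    f1 = [2 * pi * ri / (pi + ri) if pi + ri else 0.0 for pi, ri in zip(p, r)]

    macro = sum(f1) / C                       # mean of per-category F1, (A.4)
    p_bar, r_bar = sum(p) / C, sum(r) / C     # category-averaged precision and recall
    micro = 2 * p_bar * r_bar / (p_bar + r_bar) if p_bar + r_bar else 0.0  # (A.5)
    return macro, micro


# Two classes (changed, unchanged); rows: classification map, columns: reference.
cm = [[40, 10], [5, 45]]
print(marginals(cm))    # ([50, 50], [45, 55])
print(averaged_f1(cm))  # macro about 0.850, micro about 0.852
```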
Let $P_{ij}$ denote the proportion of samples in the $(i, j)$th cell, corresponding to $m_{ij}$ in the confusion matrix; in other words, $P_{ij} = m_{ij}/M$. Let $P_{i+}$ and $P_{+j}$ be defined as:
$$P_{i+} = \sum_{j=1}^{C} P_{ij} \quad \text{and} \quad P_{+j} = \sum_{i=1}^{C} P_{ij}.$$
Then, the actual agreement $P_o$ between the reference judgment and the classifier judgment is computed as $P_o = \sum_{i=1}^{C} P_{ii}$, and the chance agreement $P_c$ between them is calculated as:
$$P_c = \sum_{i=1}^{C} P_{i+} P_{+i}.$$
The Kappa measure is defined as $\hat{k} = (P_o - P_c)/(1 - P_c)$. For computational purposes, it can be rewritten in terms of the counts as:
$$\hat{k} = \frac{M \sum_{i=1}^{C} m_{ii} - \sum_{i=1}^{C} m_{i+} m_{+i}}{M^2 - \sum_{i=1}^{C} m_{i+} m_{+i}}. \qquad \text{(A.6)}$$
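The equivalence of the definition $(P_o - P_c)/(1 - P_c)$ and the count-based form (A.6) follows by substituting $P_o = \sum_i m_{ii}/M$ and $P_c = \sum_i m_{i+} m_{+i}/M^2$ and multiplying numerator and denominator by $M^2$. The sketch below, assuming the same row/column convention as above and a hypothetical function name, evaluates both forms so they can be compared; it does not guard against the degenerate case $P_c = 1$.

```python
from typing import List, Tuple


def kappa_two_ways(cm: List[List[int]]) -> Tuple[float, float]:
    """Kappa from a C x C confusion matrix (rows: classification map,
    columns: reference map), via the definition and via (A.6)."""
    C = len(cm)
    M = sum(sum(row) for row in cm)                                 # total samples M
    row_tot = [sum(cm[i]) for i in range(C)]                        # m_{i+}
    col_tot = [sum(cm[i][j] for i in range(C)) for j in range(C)]   # m_{+j}
    diag = sum(cm[i][i] for i in range(C))                          # sum_i m_ii
    chance = sum(row_tot[i] * col_tot[i] for i in range(C))         # sum_i m_{i+} m_{+i}

    # Definition with proportions: Po = sum_i P_ii, Pc = sum_i P_{i+} P_{+i}
    p_o = diag / M
    p_c = chance / M ** 2
    k_def = (p_o - p_c) / (1 - p_c)

    # Computational form (A.6), working directly with integer counts
    k_a6 = (M * diag - chance) / (M ** 2 - chance)
    return k_def, k_a6


k_def, k_a6 = kappa_two_ways([[40, 10], [5, 45]])
print(k_def, k_a6)  # both approximately 0.7
```

Working with integer counts as in (A.6) defers all division to a single final step, which is presumably why the paper offers it "for computational purposes".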
A.5. Error probability

The error probability ($P_E$) is computed from the probability of missed alarms ($P_M$), the probability of false alarms ($P_F$), the a-priori probability of the changed pixels ($P_O$) and the a-priori probability of the unchanged pixels ($P_L$). Since $P_M$ is conditioned on the changed class and $P_F$ on the unchanged class (see the definitions below), $P_E$ is calculated as
$$P_E = P_O \times P_M + P_L \times P_F. \qquad \text{(A.7)}$$
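Using the definitions of $P_F$, $P_M$, $P_O$ and $P_L$ given in the following paragraph, $P_O P_M$ equals the number of missed alarms divided by the total number of pixels, and $P_L P_F$ equals the number of false alarms divided by the same total, so $P_E$ is simply the overall fraction of misclassified pixels. The sketch below assumes a flattened binary encoding (1 = changed, 0 = unchanged) for both the change map and the reference map; that encoding and the function name are illustrative assumptions.

```python
from typing import Sequence


def error_probability(pred: Sequence[int], ref: Sequence[int]) -> float:
    """P_E for a binary change map (assumed encoding: 1 = changed, 0 = unchanged).

    pred -- classifier output per pixel; ref -- reference map per pixel.
    """
    N = len(ref)
    n_changed = sum(1 for r in ref if r == 1)
    n_unchanged = N - n_changed
    missed = sum(1 for p, r in zip(pred, ref) if r == 1 and p == 0)        # missed alarms
    false_alarms = sum(1 for p, r in zip(pred, ref) if r == 0 and p == 1)  # false alarms

    P_M = missed / n_changed if n_changed else 0.0          # missed alarms / changed pixels
    P_F = false_alarms / n_unchanged if n_unchanged else 0.0  # false alarms / unchanged pixels
    P_O = n_changed / N                                      # a-priori prob. of changed pixels
    P_L = n_unchanged / N                                    # a-priori prob. of unchanged pixels
    # Each conditional error rate is weighted by the prior of its own class
    return P_O * P_M + P_L * P_F


ref = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(error_probability(pred, ref))  # about 0.3: (1 missed + 2 false alarms) / 10 pixels
```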
$P_F$ is defined as the ratio between the number of false alarms and the total number of unchanged pixels; $P_M$ is the ratio between the number of missed alarms and the total number of changed pixels. The probability $P_O$ can be estimated as the ratio between the number of changed pixels in the reference map and the total number of image pixels, whereas $P_L$ is calculated similarly as the ratio between the number of unchanged pixels in the reference map and the total number of image pixels. The lower the error probability ($P_E$), the better the classification.

References

[1] A. Singh, Digital change detection techniques using remotely-sensed data, International Journal of Remote Sensing 10 (6) (1989) 989–1003.
[2] M.J. Canty, Image Analysis, Classification and Change Detection in Remote Sensing, CRC Press/Taylor & Francis, Boca Raton, 2006.
[3] R.J. Radke, S. Andra, O. Al-Kofahi, B. Roysam, Image change detection algorithms: a systematic survey, IEEE Transactions on Image Processing 14 (3) (2005) 294–307.
[4] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, USA, 2006.
[5] Q. Zhang, J. Wang, X. Peng, P. Gong, P. Shi, Urban built-up land change detection with road density and spectral information from multi-temporal Landsat TM data, International Journal of Remote Sensing 23 (15) (2002) 3057–3078.
[6] R. Manonmani, G.M.D. Suganya, Remote sensing and GIS application in change detection study in urban zone using multi temporal satellite, International Journal of Geomatics and Geosciences 1 (1) (2010) 60–65.
[7] K.R. Merril, L. Jiajun, A comparison of four algorithms for change detection in an urban environment, Remote Sensing of Environment 63 (2) (1998) 95–100.
[8] M.M. Yagoub, Monitoring of urban growth of a desert city through remote sensing: Al-Ain, UAE, between 1976 and 2000, International Journal of Remote Sensing 25 (6) (2004) 1063–1076.
[9] L. Bruzzone, D.F. Prieto, An adaptive parcel-based technique for unsupervised change detection, International Journal of Remote Sensing 21 (4) (2000) 817–822.
[10] F. Yuan, K.E. Sawaya, B.C. Loeffelholz, M.E. Bauer, Land cover classification and change analysis of Twin cities (Minnesota) Metropolitan Area by multitemporal Landsat remote sensing, Remote Sensing of Environment 98 (2005) 317–328.
[11] G.M. Foody, Monitoring the magnitude of land-cover change around the southern limits of the Sahara, Photogrammetric Engineering and Remote Sensing 67 (2001) 841–847.
[12] G. Camps-Valls, L. Gómez-Chova, J. Muñoz-Marí, J.L. Rojo-Álvarez, M. Martínez-Ramón, Kernel-based framework for multitemporal and multisource remote sensing data classification and change detection, IEEE Transactions on Geoscience & Remote Sensing 46 (6) (2008) 1822–1835.
[13] S. Ghosh, L. Bruzzone, S. Patra, F. Bovolo, A. Ghosh, A context-sensitive technique for unsupervised change detection based on Hopfield-type neural networks, IEEE Transactions on Geoscience & Remote Sensing 45 (3) (2007) 778–789.
[14] S. Ghosh, S. Patra, A. Ghosh, An unsupervised context-sensitive change detection technique based on modified self-organizing feature map neural network, International Journal of Approximate Reasoning 50 (1) (2009) 37–50.
[15] F. Melgani, G. Moser, S.B. Serpico, Unsupervised change detection methods for remote sensing images, Optical Engineering 41 (12) (2002) 3288–3297.
[16] D. Liu, K. Song, J.R.G. Townshend, P. Gong, Using local transition probability models in Markov random fields for forest change detection, Remote Sensing of Environment 112 (5) (2008) 2222–2231.
[17] T. Kasetkasem, P.K. Varshney, An image change detection algorithm based on Markov random field models, IEEE Transactions on Geoscience & Remote Sensing 40 (8) (2002) 1815–1823.
[18] X. Liu, Urban change detection based on an artificial neural network, International Journal of Remote Sensing 23 (12) (2002) 2513–2518.
[19] A. Ghosh, N.S. Mishra, S. Ghosh, Fuzzy clustering algorithms for unsupervised change detection in remote sensing images, Information Sciences 181 (4) (2011) 699–715.
[20] G. Pajares, A Hopfield neural network for image change detection, IEEE Transactions on Neural Networks 17 (5) (2006) 1250–1264.
[21] Y. Bazi, F. Melgani, L. Bruzzone, G. Vernazza, A genetic expectation-maximization method for unsupervised change detection in multitemporal SAR imagery, International Journal of Remote Sensing 30 (24) (2009) 6591–6610.
[22] J.R. Jensen, Introductory Digital Image Processing: A Remote Sensing Perspective, Prentice Hall, New Jersey, 2005.
[23] X. Zhu, Semi-supervised Learning Literature Survey, Computer Sciences TR1530, University of Wisconsin, Madison, 2008.
[24] O. Chapelle, B. Schölkopf, A. Zien, Semi-supervised Learning, MIT Press, Cambridge, 2006.
[25] T. Lange, M.H.C. Law, A.K. Jain, J.M. Buhmann, Learning with constrained and unlabelled data, in: Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 731–738.
[26] S. Basu, M. Bilenko, R.J. Mooney, Comparing and unifying search-based and similarity-based approaches to semi-supervised clustering, in: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, DC, 2003, pp. 42–49.
[27] K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained K-means clustering with background knowledge, in: Proceedings 18th International Conference on Machine Learning, Williamstown, MA, USA, 2001, pp. 577–584.
[28] D.-Y. Yeung, H. Chang, A kernel approach for semisupervised metric learning, IEEE Transactions on Neural Networks 18 (1) (2007) 141–149.
[29] C. Hou, F. Nie, F. Wang, C. Zhang, Y. Wu, Semisupervised learning using negative labels, IEEE Transactions on Neural Networks 22 (3) (2011) 420–432.
[30] H. Chen, L. Li, J. Peng, Error bounds of multi-graph regularized semi-supervised classification, Information Sciences 179 (12) (2009) 1960–1969.
[31] C.F. Gao, X.J. Wu, A new semi-supervised clustering algorithm with pairwise constraints by competitive agglomeration, Applied Soft Computing 11 (8) (2011) 5281–5291.
[32] C.-C. Chang, H.-K. Pao, Y.-J. Lee, An RSVM based two-teachers-one-student semi-supervised learning algorithm, Neural Networks 25 (2012) 57–69.
[33] K. Chen, S. Wang, Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (1) (2011) 129–143.
[34] N. Kumar, K. Kummamuru, Semisupervised clustering with metric learning using relative comparisons, IEEE Transactions on Knowledge and Data Engineering 20 (4) (2008) 496–503.
[35] A. Verikas, A. Gelzinis, K. Malmqvist, Using unlabelled data to train a multilayer perceptron, Neural Processing Letters 14 (3) (2001) 179–201.
[36] Y. Kamiya, T. Ishii, S. Furao, O. Hasegawa, An online semi-supervised clustering algorithm based on a self-organizing incremental neural network, in: Proceedings International Joint Conference on Neural Networks, Orlando, FL, USA, 2007.
[37] F. Ratle, G. Camps-Valls, J. Weston, Semisupervised neural networks for efficient hyperspectral image classification, IEEE Transactions on Geoscience & Remote Sensing 48 (5) (2010) 2271–2282.
[38] X. Zenglin, I. King, M.-T. Lyu, J. Rong, Discriminative semi-supervised feature selection via manifold regularization, IEEE Transactions on Neural Networks 21 (7) (2010) 1033–1047.
[39] S. Patra, S. Ghosh, A. Ghosh, Change detection of remote sensing images with semi-supervised multilayer perceptron, Fundamenta Informaticae 84 (2008) 429–442.
[40] T. Kohonen, Self-Organizing Maps, 2nd edition, Springer, Berlin, 1997.
[41] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall of India, New Delhi, 2007.
[42] A. Ghosh, S.K. Pal, Neural network, self-organization and object extraction, Pattern Recognition Letters 13 (5) (1992) 387–397.
[43] N.S. Mishra, S. Ghosh, A. Ghosh, Fuzzy clustering algorithms incorporating local information for change detection in remotely sensed images, Applied Soft Computing 12 (8) (2012) 2683–2692.
[44] S. Basu, A. Banerjee, R. Mooney, Semi-supervised clustering by seeding, in: Proceedings 19th International Conference on Machine Learning (ICML-2002), Sydney, Australia, 2002, pp. 19–26.
[45] E. Kreyszig, Introductory Mathematical Statistics: Principles and Methods, John Wiley & Sons, New York, 1970.
[46] A. Halder, A. Ghosh, S. Ghosh, Aggregation pheromone density based pattern classification, Fundamenta Informaticae 92 (4) (2009) 345–362.
[47] R.G. Congalton, K. Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, 2nd edition, CRC Press/Taylor & Francis Group, Boca Raton/London/New York, 2009.