Computers & Geosciences 131 (2019) 132–143
Computers and Geosciences journal homepage: www.elsevier.com/locate/cageo
Research paper
Remote sensing image classification based on semi-supervised adaptive interval type-2 fuzzy c-means algorithm
Jindong Xu *, Guozheng Feng, Tianyu Zhao, Xiao Sun, Meng Zhu
School of Computer and Control Engineering, YanTai University, Yantai, 264005, China
Keywords: Fuzzy c-means algorithm; Remote sensing image classification; Semi-supervised; Type-2 fuzzy set

Abstract
Because of the uncertainty in remote sensing images and the ill-posedness of the problem, it is difficult for traditional unsupervised classification algorithms to create an accurate classification model. In contrast, pattern recognition methods based on fuzzy set theory, such as fuzzy c-means clustering, can manage the fuzziness of data effectively. Of these methods, the type-2 fuzzy c-means algorithm is better able to control uncertainty. Furthermore, semi-supervised training can use prior knowledge to deal with ill-posedness, and hence is more suitable. Therefore, we propose a novel classification method based on the semi-supervised adaptive interval type-2 fuzzy c-means algorithm (SS-AIT2FCM). First, by integrating the semi-supervised approach, an evolutional fuzzy weight index m is proposed that improves the robustness and well-posedness of the model used in the clustering algorithm. This makes the algorithm suitable for remote sensing images with severe spectral aliasing, large coverage areas, and abundant features. In addition, soft constraint supervision is performed using a small number of labeled samples, which optimizes the iterative process of the algorithm and determines the optimal set of features for the data. This further reduces the ill-posedness of the model itself. The experimental data consist of three study areas: SPOT5 imagery from Big Hengqin Island, Guangdong, China, and the Summer Palace, Beijing, China, as well as TM imagery from Hengqin Island. Compared with several state-of-the-art fuzzy classification algorithms, our algorithm improves classification accuracy by more than 5% overall and obtains clearer boundaries in remote sensing images with serious mixed pixels. Moreover, it is able to suppress the phenomenon of isomorphic spectra.
1. Introduction
Remote sensing is an indispensable global-scale observation tool in earth science, environmental science, resource science, and global change research. It is the basic technical support required for implementing sustainable development strategies (Xu et al., 2016), and remote sensing image classification is the fundamental basis of analysis and applications in the field of remote sensing (Du et al., 2016). The ultimate goal of remote sensing image classification is to divide the remote sensing image into multiple regions (pixel groups). However, the inherent uncertainty of remote sensing image data and its ubiquitous phenomenon of the “same object with different spectra” (He et al., 2016a,b; Xu et al., 2015) often leads to the ill-posedness of the model used in the algorithm. It further influences the accuracy of the classification results and restricts the development and application of remote sensing technology. As the spatial resolution of remote sensing images increases, the diversity of classifications and the complexity of noise also greatly increase (Zhang, 2018). Traditional single-approach remote sensing image classification techniques often cannot adapt to the processing needs of various remote sensing data (Gong and Zhong, 2016). Therefore, it is important to study new classification algorithms that are more universal and adaptable to the current problems.
Classification algorithms can be divided into two categories according to whether there is labeled information in the unclassified data (i.e., supervised and unsupervised algorithms; Sinh and Long, 2018). A traditional unsupervised classification algorithm ignores any labeled information in the data samples, often obtaining a data distribution that is unsuitable for characterizing remote sensing images. For supervised clustering algorithms, it is hard to obtain sufficient labeled information because the data of remote sensing images has a high number of dimensions (Zhang, 2016). Classification methods based on the semi-supervised approach utilize a small number of labeled samples and a large number of unlabeled samples, so they greatly reduce the need of the classifier for labeled samples, and thus become an effective solution
(Tong et al., 2016; We et al., 2017; Zhang, 2016). The semi-supervised classification of remote sensing images is based on prior knowledge and introduces the classification training process for labeled classes (Du et al., 2016). Because of the ill-posedness of the remote sensing image classification problem, statistical patterns cannot completely express the data distribution of remote sensing images, and the classification results using such patterns cannot be estimated or are inaccurate (Persello and Bruzzone, 2014). Introducing the method of semi-supervised classification can effectively address the ill-posedness of this problem (Crawford, 2013).
Fuzzy set mathematical theory is an effective way to express fuzziness and uncertainty (Zadeh, 1965). Type-1 fuzzy c-means (T1FCM) algorithms have been widely studied in the field of remote sensing image classification because they can handle the uncertainty of remote sensing data to some extent (Choubin et al., 2017; Liu, 2016; Yu et al., 2014), but they are not ideal for the classification of remote sensing image data with density differences and large uncertainty (Hwang and Rhee, 2007). In contrast to type-1 fuzzy sets, type-2 fuzzy sets describe the uncertainty of data by constructing an uncertain membership function (Memon, 2018). Thus, type-2 fuzzy sets are better able to describe multiple types of fuzzy or uncertain information and are more suitable for dealing with the multiple uncertainties in remote sensing image classification (Huo et al., 2017). However, it is hard to use them for remote sensing image classification because of their high computing complexity. Interval-based type-2 fuzzy classification can describe higher-order uncertainty well while effectively reducing the computational complexity of the ordinary type-2 fuzzy sets (Guo and Huo, 2017; Yu et al., 2014).
In addition, deep learning has proven to be a successful breakthrough in the field of remote sensing image classification from the perspective of learning and connectionism (Palafox et al., 2018; Wang et al., 2018), but this paper aims to solve the classification problem from the statistical characteristics of the data and its own uncertainty. Therefore, to preserve the well-posedness and limit the computational complexity of the classification algorithm while taking advantage of the semi-supervised learning approach, this paper proposes a remote sensing image classification method based on semi-supervised adaptive interval type-2 FCM (SS-AIT2FCM). The results show that SS-AIT2FCM substantially improves the classification of remote sensing images.

* Corresponding author. E-mail address: [email protected] (J. Xu).
https://doi.org/10.1016/j.cageo.2019.06.005
Received 19 December 2018; Received in revised form 5 June 2019; Accepted 10 June 2019; Available online 17 June 2019
0098-3004/© 2019 Elsevier Ltd. All rights reserved.
2. Related work
Zadeh first proposed the concept of a fuzzy set in 1965, and Ruspini proposed the concept of fuzzy clustering analysis combined with fuzzy theory in 1969. In 1974, Dunn proposed a fuzzy c-means (FCM) clustering algorithm based on the determinate fuzzy weight index (m = 2), which is a dynamic clustering algorithm that uses gradient descent to find the optimal solution of a function (He et al., 2016a,b). Since then, fuzzy clustering has been widely applied and developed in the field of pattern recognition and image processing (Awad et al., 2009; Awad and Nasri, 2009; Stutz and Runkler, 2002). In 1975, Zadeh further extended the membership function of type-1 fuzzy sets by proposing type-2 fuzzy set theory, which is more able to express fuzziness (Zadeh, 1975). Type-2 fuzzy theory has received much attention and been widely developed (Liang and Mendel, 2000; Wu and Mendel, 2007). Mendel et al. combined the interval set concept with fuzzy logic, proposing the interval type-2 fuzzy set (Mendel, 2007), which forces the membership of the second-order membership function to a value of 1. This reduces the computational complexity of type-2 fuzzy sets, improving the real-time processing ability of type-2 fuzzy algorithms, and facilitating the extensive application of type-2 fuzzy set theory. At present, there are many algorithms based on interval type-2 fuzzy sets. In 2014, Yu et al. used interval type-2 FCM (IT2FCM) for remote sensing image classification (Yu et al., 2014). This method uses two fuzzy weight indices to construct the membership interval and obtains accurate classification results. In 2015, Assas (2015) compared the various hard structure models of membership function intervals on remote sensing data. The experiment results proved that interval type-2 fuzzy modeling is more accurate and robust than type-1 fuzzy modeling in remote sensing image classification. In 2016, He et al. proposed the adaptive IT2FCM (AIT2FCM) algorithm, which models the interval membership using a fuzzy distance metric and adaptively reduces type to 1 by class variance (He et al., 2016a,b). However, research based on type-2 fuzzy set clustering in remote sensing images is still in the initial stage.
At present, semi-supervised clustering algorithms use a small number of labeled samples to assist unsupervised learning. These labeled samples guide iterative search, are used to enhance search accuracy, decrease the chance of the algorithm becoming trapped in local extremes, and accelerate the algorithm's convergence and calculation speeds (Wang et al., 2017). Pedrycz et al. introduced labeled samples into the FCM algorithm, employing supervised search in the objective function optimization. They further proposed the semi-supervised FCM (SS-FCM) algorithm, including the physical definition of the objective function in the SS-FCM algorithm, the calculation of the equilibrium coefficient, and the alternating optimization process of fuzzy covariance (Bouchachia and Pedrycz, 2006; Pedrycz, 1985; Pedrycz and Waletzky, 1997a, 1997b). There are many studies on semi-supervised fuzzy c-means algorithms for remote sensing image classification (Pei et al., 2014; Shao et al., 2016; Zhu et al., 2014). Long proposed a semi-supervised IT2FCM (SIIT2-FCM) algorithm based on two different fuzzy weight index models. This method is used for classification and change detection in remote sensing images (Long et al., 2015). However, the different index modeling methods are too subjective for selecting a fuzzy index, which makes them prone to the problem of large categories swallowing small categories.
In summary, IT2FCM clustering adopts a more advanced mechanism than the traditional T1FCM clustering method for measuring classification distance. Moreover, there is much room for improvement in algorithm efficiency, accuracy, and adaptability. The semi-supervised classification method makes the model in the classification algorithm well posed and able to handle remote sensing data because it employs some prior information. Therefore, we present our classification method by integrating a semi-supervised approach with IT2FCM clustering.

3. Proposed algorithm: SS-AIT2FCM
3.1. Fundamentals of SS-AIT2FCM
Based on IT2FCM (Hwang and Rhee, 2007), we propose SS-AIT2FCM. The IT2FCM algorithm consists of three steps: fuzzy construction (constructing a fuzzy partition matrix), clustering center updating (determining cluster centers by iterative optimization), and defuzzification (classifying the pattern set according to the principle of maximum membership).
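The three IT2FCM steps (fuzzy construction, clustering center updating, defuzzification) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the two fuzzifiers `m1` and `m2`, all function names, and the toy data are hypothetical, and the center update replaces the type-reduction algorithm with a plain average of the centers computed from the lower and upper memberships.

```python
import math

def fcm_memberships(points, centers, m):
    """Type-1 FCM memberships u[j][i] of point i in cluster j for fuzzifier m."""
    u = [[0.0] * len(points) for _ in centers]
    for i, x in enumerate(points):
        # Clamp distances so a point lying exactly on a center cannot divide by zero.
        d = [max(math.dist(x, v), 1e-12) for v in centers]
        for j in range(len(centers)):
            u[j][i] = 1.0 / sum((d[j] / dk) ** (2.0 / (m - 1.0)) for dk in d)
    return u

def weighted_center(points, weights):
    """Weighted mean of the points, one coordinate at a time."""
    total = sum(weights)
    dims = len(points[0])
    return tuple(sum(w * x[k] for w, x in zip(weights, points)) / total
                 for k in range(dims))

def it2fcm_iteration(points, centers, m1=2.0, m2=5.0):
    # Step 1 (fuzzy construction): build a membership interval from two
    # fuzzifiers (assumed values) by taking the element-wise min and max.
    u1 = fcm_memberships(points, centers, m1)
    u2 = fcm_memberships(points, centers, m2)
    lower = [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    upper = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    # Step 2 (center updating): simplified stand-in for type reduction.
    # One center comes from the lower bound, one from the upper bound,
    # and the two are averaged (mean method).
    new_centers = []
    for j in range(len(centers)):
        v_a = weighted_center(points, [w ** m1 for w in lower[j]])
        v_b = weighted_center(points, [w ** m1 for w in upper[j]])
        new_centers.append(tuple((a + b) / 2.0 for a, b in zip(v_a, v_b)))
    return lower, upper, new_centers

def defuzzify(lower, upper):
    # Step 3 (defuzzification): average the bounds, then assign each point
    # to the cluster with the maximum membership.
    labels = []
    for i in range(len(lower[0])):
        mean_u = [(lower[j][i] + upper[j][i]) / 2.0 for j in range(len(lower))]
        labels.append(max(range(len(mean_u)), key=mean_u.__getitem__))
    return labels

# Toy data: two well-separated groups in a two-dimensional feature space.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(5):
    lower, upper, centers = it2fcm_iteration(points, centers)
labels = defuzzify(lower, upper)
```

On this toy data the first three points fall into one cluster and the last three into the other. Real imagery would need the per-pixel feature vectors in place of `points` and a convergence test in place of the fixed five iterations.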
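3.1.1. Fuzzy construction
According to the definition of interval type-2 fuzzy sets, for n samples, an interval type-2 fuzzy set of p feature sample spaces $X = \{x_1, x_2, \ldots, x_n\}$ ($x_i = \{x_{i1}, x_{i2}, \ldots, x_{ip}\}$) can be represented by n intervals. Therefore, the definition of an interval type-2 fuzzy partition matrix is as follows:

$$U = [\underline{U}, \overline{U}] \tag{1}$$

where U is an X-interval fuzzy partition matrix, and the fuzzy membership matrices of the two boundaries $\underline{U}$ and $\overline{U}$ are described as follows:

$$\underline{U} = \begin{bmatrix} \underline{u}_{11} & \underline{u}_{12} & \cdots & \underline{u}_{1n} \\ \underline{u}_{21} & \underline{u}_{22} & \cdots & \underline{u}_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \underline{u}_{c1} & \underline{u}_{c2} & \cdots & \underline{u}_{cn} \end{bmatrix} \quad \text{and} \quad \overline{U} = \begin{bmatrix} \overline{u}_{11} & \overline{u}_{12} & \cdots & \overline{u}_{1n} \\ \overline{u}_{21} & \overline{u}_{22} & \cdots & \overline{u}_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \overline{u}_{c1} & \overline{u}_{c2} & \cdots & \overline{u}_{cn} \end{bmatrix} \tag{2}$$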
where $u_{ik} = [\underline{u}_{ik}, \overline{u}_{ik}]$ defines the membership degree of sample $x_i$ to the clustering centers $V = \{v_1, \ldots, v_k\}$, and U defines the interval type-2 fuzzy set, which can be obtained from two fuzzy partition functions $F_1$ and $F_2$ as follows:

$$\underline{U} = F_1(X, V) \quad \text{and} \quad \overline{U} = F_2(X, V) \tag{3}$$

3.1.2. Clustering center updating
After determining $U = [\underline{U}, \overline{U}]$, we construct c interval type-2 fuzzy sets of sample space X. For each interval type-2 fuzzy set, in each feature dimension, the center of the interval type-2 fuzzy set found by the type-reduction algorithm is as follows:

$$V = [V^L, V^R] \tag{4}$$

where $V^L$ and $V^R$ are defined as follows:

$$V^L = \begin{bmatrix} \underline{v}_{11} & \underline{v}_{12} & \cdots & \underline{v}_{1p} \\ \underline{v}_{21} & \underline{v}_{22} & \cdots & \underline{v}_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \underline{v}_{c1} & \underline{v}_{c2} & \cdots & \underline{v}_{cp} \end{bmatrix} \quad \text{and} \quad V^R = \begin{bmatrix} \overline{v}_{11} & \overline{v}_{12} & \cdots & \overline{v}_{1p} \\ \overline{v}_{21} & \overline{v}_{22} & \cdots & \overline{v}_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \overline{v}_{c1} & \overline{v}_{c2} & \cdots & \overline{v}_{cp} \end{bmatrix} \tag{5}$$

where the kth clustering center $v_k$ is a type-1 interval fuzzy set:

$$v_k = [v_{kl}, v_{kr}] \tag{6}$$

Various methods can be used for defuzzification. The mean method is given below:

$$v_k = \frac{v_{kl} + v_{kr}}{2} \tag{7}$$

Using the new center, the final center can be obtained by iterating through the above steps.

3.1.3. Defuzzification
The final clustering center V and the fuzzy partition matrix $U = [\underline{U}, \overline{U}]$ are obtained through iteration. The crisp partitioning matrix is obtained by the closest approach to the central principle, as expressed by the following formula:

$$u_{ik} = \frac{\underline{u}_{ik} + \overline{u}_{ik}}{2} \tag{8}$$

Finally, the partitioning result is obtained using the maximum-value defuzzifier; that is, the class of each data point is determined by its maximum membership value as follows:

$$x_i \in C_k \quad \text{if} \quad u_{ik} = \max_{1 \le k \le c} \{u_{ik}\} \tag{9}$$

where $C_k$ is the partition class for center $v_k$.

3.2. Selecting fuzzy weight index m based on evolution theory
In contrast to the T1FCM algorithm, the core idea of the IT2FCM algorithm is the interval structure of membership matrix U, which gives the algorithm stronger fuzzy expressiveness. The calculation of membership matrix U depends largely on fuzzy weight index m. Larger m values lead to more blurred classification results, so the selection of m is crucial.
The objective function of the FCM algorithm is $J_m = \sum_{k=1}^{n} \sum_{j=1}^{c} (u_{jk})^m d_{jk}^2$. Further, if the relationship among the membership function, clustering model, and fuzzy weight index m is ignored, the function can be derived as follows:

$$\frac{\partial J_m}{\partial m} = \sum_{k=1}^{n} \sum_{j=1}^{c} (u_{jk})^m \cdot \log(u_{jk}) \cdot d_{jk}^2 = \sum_{k=1}^{n} \sum_{j=1}^{c} \left[ u_{jk} \log u_{jk} \right] u_{jk}^{m-1} d_{jk}^2 < 0 \tag{10}$$

where c is the number of categories and $d_{jk}$ represents the distance between data point $x_k$ and cluster center $v_j$. Here, it can be seen that $J_m$ will monotonically decrease as m increases, which shows that m can effectively determine the $J_m$ corresponding to the classification result of FCM. However, some theories of FCM based on fuzzy set theory still need to be further improved (Pal and Bezdek, 1995). The best way to choose fuzzy weight index m still lacks theoretical guidance. Although there are some studies in this field, there is no uniform standard (Pal and Bezdek, 1995; Yu, 2003). The fuzzy algorithm based on interval type-2 fuzzy sets rarely includes the selection criterion of fuzzy weight index m. Currently, selection indicators are still specified by the user. However, it is clear that the selection of the fuzzy weight index m depends on the data itself (Yu, 2003). Because of the uncertainty of remote sensing image data, it is necessary to select the m value that can best express the ambiguity of the data. From this point of view, the value of m should not be too large or too small. Therefore, in this study, the best selection interval [1.5, 2.5] proposed by Pal and Bezdek (1995) is expanded to [1.1, 2.9] for classifying remote sensing images with serious spectral aliasing, large coverage area, and abundant features. Hence, an adaptive fuzzy weight index selection method based on the data itself is proposed. Index $\beta_t \in [0, 1]$ is defined as the clustering validity index of fuzzy weight index $m_t$, and is calculated as follows:

$$\beta_t = \sum_{j=1}^{c} \sum_{k=1}^{\|X_j^M\|} \frac{u_{kj}}{c \cdot \|X_j^L\|}, \quad X^M \subseteq X^L \tag{11}$$

where $X^L = \{X_1^L, X_2^L, \ldots, X_c^L\}$ represents the set of labeled samples, $X^M = \{X_1^M, X_2^M, \ldots, X_c^M\}$ represents the intersection of the classification result and the labels when the weighting index is $m_t$, $\|X_j^M\|$ and $\|X_j^L\|$ respectively represent the number of samples of the jth class in $X^M$ and $X^L$, $\|X^L\|$ indicates the number of samples in the labeled sample set, and $u_{kj}$ represents the membership degree of the kth sample point in $X_j^M$ belonging to the jth cluster. Larger values of β indicate a better clustering result. The formula for calculating the fuzzy weight index is as follows:

$$m_t = m_{t-1} + N[0, \alpha(1 - \beta_t)] \tag{12}$$

where $m_{t-1}$ and $m_t$ are the fuzzy weight indices of the (t − 1)th and tth iterations, respectively, and the values are within the interval [1.1, 2.9]. Constant α is a positive constant specified by the user. In addition, random variable $N[0, \alpha(1 - \beta_t)]$ is the evolution step size of the fuzzy weight index, which follows a normal distribution with mean 0 and variance $\alpha(1 - \beta_t)$. It can be seen that the larger β is, the smaller the evolution step size of m becomes. When β is close to 1, m also converges to the optimal fuzzy weight index. To guide the search for a more reliable fuzzy weight index m, this method compares labeled samples with unsupervised outputs using validity index β as a measure. The fuzzy weight index m selection algorithm based on evolution (Szilagyi, 2009) is performed as follows:

Step 1: Initialize the fuzzy weight index $m_0 \in [1.1, 2.9]$, positive constant α and validity index $\beta_0 = 0$, initial cluster center $V_0$, maximum iteration number T, threshold ε, number of clusters c, and iteration number t = 1;
Step 2: Iterate fuzzy unsupervised classification on the labeled sample set $X^L$, and obtain the intersection $X^M$ of the classification result and the labels;
Step 3: Calculate the validity index $\beta_t$ according to Eq. (11) and update $m_t$ according to Eq. (12);
Step 4: If $\|\beta_t - \beta_{t-1}\| \le \varepsilon$ or $t \ge T$ or $m_t \notin [1.1, 2.9]$, terminate the iterations; otherwise, set t = t + 1, change cluster center to $V_0$, and go to Step 2.

3.3. SS-AIT2FCM algorithm
Combined with a semi-supervised approach, the objective function of the SS-AIT2FCM algorithm is designed as follows:

$$J_t = (1 - \lambda) \sum_{j=1}^{c} \sum_{p=1}^{\|X^U\|} \left(u_{pj}^u\right)^m d_{pj}^2 + \lambda \sum_{j=1}^{c} \sum_{i=1}^{\|X^L\|} \left(u_{ij}^u\right)^m d_{ij}^2 \tag{13}$$

where λ is the ratio of labeled sample points in the data sample, $X^U$ is the unlabeled sample set, $X^L$ is the labeled sample set, $\|X^U\|$ and $\|X^L\|$ respectively are the number of samples in these sets, $u_{ij}^u$ represents the membership degree to which data point $x_i^u$ belongs to the jth cluster, $d_{ij}$ represents the distance between data point $x_i$ and cluster center $v_j$, and c is the number of clusters.
To improve classification accuracy and reduce unnecessary algorithm calculation, this study introduces the concept of semi-supervised learning to FCM in two respects: the initial $V_0$ values and the center calculation formulas. This approach can reduce the number of iterations of the algorithm and make up for the increase in algorithm runtime caused by the increase in clustering accuracy.
If the cluster centers are initialized randomly, which means that the membership degree matrix is randomly initialized, this will not only increase the number of unnecessary iterations, but also cause unstable clustering results. In contrast, we propose to initialize centers $V_0$ according to expert knowledge (the labeled sample set $X^L$), and then initialize the membership degree. The initial center formula is as follows:
$$V_0 = \frac{\sum_{i=1}^{\|X^L\|} u_i^l x_i^l}{\sum_{i=1}^{\|X^L\|} u_i^l} \tag{14}$$

where $x_i^l$ is the ith point in labeled sample set $X^L$ and $u_i^l$ is the membership matrix of the labeled sample point $x_i^l$. If $x_i^l$ belongs to the jth cluster, $u_{ij}^l = 1$; otherwise $u_{ij}^l = 0$.
The calculation of membership matrix U is as follows:

$$u_{ij} = \left[ \sum_{p=1}^{c} \left( \frac{d_{ij}}{d_{ip}} \right)^{2/(m-1)} \right]^{-1} \tag{15}$$

where $u_{ij}$ represents the membership degree of data point $x_i$ belonging to the jth cluster.
To fully utilize the guiding role of labeled sample points in the algorithm, accelerate the speed of convergence, and increase the accuracy of the clustering results, the following formula for calculating the cluster center $v_j$ is employed:

$$v_j = \frac{\sum_{p=1}^{\|X^U\|} u_{pj}^u x_p^u}{\sum_{p=1}^{\|X^U\|} u_{pj}^u} + \lambda \left( \frac{\sum_{i=1}^{\|X^L\|} u_{ij}^l x_i^l}{\sum_{i=1}^{\|X^L\|} u_{ij}^l} - \frac{\sum_{p=1}^{\|X^U\|} u_{pj}^u x_p^u}{\sum_{p=1}^{\|X^U\|} u_{pj}^u} \right) = (1 - \lambda) \frac{\sum_{p=1}^{\|X^U\|} u_{pj}^u x_p^u}{\sum_{p=1}^{\|X^U\|} u_{pj}^u} + \lambda \frac{\sum_{i=1}^{\|X^L\|} u_{ij}^l x_i^l}{\sum_{i=1}^{\|X^L\|} u_{ij}^l} \tag{16}$$

Ratio λ is defined as follows:

$$\lambda = \frac{\|X^L\|}{\|X\|} = \frac{\|X^L\|}{\|X^L\| + \|X^U\|}, \quad \lambda \in [0, 1] \tag{17}$$

where X is the dataset, and $\|X\|$ is the total number of points in the dataset. Here, if λ = 1, all samples are labeled samples, and the algorithm does not need to iterate. In contrast, if λ = 0, this algorithm becomes an unsupervised clustering algorithm. Larger values of λ give the labeled samples a stronger guiding ability in the algorithm.
The structure of the T2FCM membership interval can be thought of as a measure of maximum difference, to improve its ability to express uncertainty. The membership interval of an unlabeled sample is defined as follows:

$$U^U = \left[ \underline{u}_{ij}^u, \overline{u}_{ij}^u \right]$$

$$\overline{u}_{ij}^u = \left[ \sum_{k=1}^{c} \left( \frac{L_{ij}}{L_{ik}} \right)^{2/(m-1)} \right]^{-1} \tag{18}$$

$$\underline{u}_{ij}^u = \left[ \sum_{k=1}^{c} \left( \frac{S_{ij}}{S_{ik}} \right)^{2/(m-1)} \right]^{-1} \tag{19}$$

where $\overline{u}_{ij}^u$ and $\underline{u}_{ij}^u$ are the upper and lower bounds of membership $u_{ij}^u$, $L_{ij} = \mathrm{mean}(d_{x_{ij}})$ is the average value over all dimensions, $S_{ij} = \max(d_{x_{ij}})$ is the maximum value over all dimensions, and $d_{x_{ij}}$ is the dimensional distance between unlabeled sample $x_i^u$ and cluster center $v_j$.
The IT2FCM algorithm needs to normalize the membership degree of the interval. In the method proposed in this paper, adaptive factor γ is introduced to dynamically adjust the membership width (Li, 1999). Adaptive factor γ is used as follows:

$$U^U = \overline{U}^U - \gamma \left( \overline{U}^U - \underline{U}^U \right) \tag{20}$$

where $\gamma = \{\gamma_1, \gamma_2, \cdots, \gamma_c\}$ is defined for each cluster. Hence, when data are partitioned into different clusters, adaptive factor γ will become larger in [0, 1] if the mean-squared error of the cluster becomes larger. Moreover, the membership descending value is extended immediately. As the number of iterations increases, γ tends to become stable as shown in Fig. 1. The positive correlation curve of the adaptive factor γ calculation is defined as follows:

$$\gamma = 1 - 0.97 \exp\left(-5 e^2\right) \tag{21}$$

where $e = \{e_1, e_2, \cdots, e_j, \cdots, e_c\}$ is the mean-squared error of the cluster, and is defined as follows:

$$e_j = \frac{\sum_{i \in C_j} \left(u_{ij}^u\right)^m \delta\left(x_i^u, v_j\right)}{\|C_j\|}, \quad j = 1, 2, \cdots, c \tag{22}$$

Here, $\delta(x_i^u, v_j)$ represents the deviation between the unlabeled sample point $x_i^u$ and the center $v_j$ in current cluster $C_j$. Moreover, $\|C_j\|$ is the number of sample points in cluster $C_j$.

Fig. 1. Type-reduction example of an adaptive factor in a cluster. Adaptive factor γ oscillates at the beginning. Then, the mean-squared error in each cluster tends to reach a minimum value as the number of iterations increases. Finally, γ reaches a stable value.
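The membership interval of Eqs. (18) and (19) and the adaptive type reduction of Eqs. (20) to (22) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' code: all function names are hypothetical, the deviation δ is taken to be the squared Euclidean distance, and the two candidate memberships are ordered with min and max so that the lower bound never exceeds the upper bound.

```python
import math

EPS = 1e-12  # guards against division by zero when a sample sits on a center

def dimension_distances(x, v):
    """Per-dimension distances between sample x and center v."""
    return [abs(a - b) for a, b in zip(x, v)]

def membership(dist_to_all, j, m):
    """FCM-style membership in cluster j from distances to every center."""
    return 1.0 / sum((dist_to_all[j] / dk) ** (2.0 / (m - 1.0)) for dk in dist_to_all)

def interval_membership(x, centers, m):
    """Return one (lower, upper) membership pair per cluster for sample x."""
    # L_ij: mean of the per-dimension distances; S_ij: their maximum.
    mean_d = [max(sum(dimension_distances(x, v)) / len(x), EPS) for v in centers]
    max_d = [max(max(dimension_distances(x, v)), EPS) for v in centers]
    bounds = []
    for j in range(len(centers)):
        a = membership(mean_d, j, m)  # candidate from the mean distance
        b = membership(max_d, j, m)   # candidate from the max distance
        # Assumption: order the candidates so that lower <= upper always holds.
        bounds.append((min(a, b), max(a, b)))
    return bounds

def cluster_error(members, center, m):
    """Weighted mean deviation of a cluster; members is a list of (x, u) pairs.

    Assumption: the deviation delta is the squared Euclidean distance.
    """
    if not members:
        return 0.0
    return sum((u ** m) * math.dist(x, center) ** 2 for x, u in members) / len(members)

def adaptive_factor(e):
    """Adaptive factor as a function of the cluster error (Eq. 21)."""
    return 1.0 - 0.97 * math.exp(-5.0 * e * e)

def type_reduce(lower, upper, gamma):
    """Move from the upper bound toward the lower bound by gamma (Eq. 20)."""
    return upper - gamma * (upper - lower)

# Usage on a toy two-dimensional sample with two hypothetical centers.
centers = [(0.0, 0.0), (1.0, 1.0)]
bounds = interval_membership((0.1, 0.2), centers, m=2.0)
error = cluster_error([((0.0, 0.0), 1.0), ((2.0, 0.0), 1.0)], (1.0, 0.0), m=2.0)
gamma = adaptive_factor(error)
reduced = type_reduce(bounds[0][0], bounds[0][1], gamma)
```

With a cluster error of zero the factor is 0.03, so the reduced membership stays close to the upper bound; as the error grows the factor approaches 1 and the reduced membership moves toward the lower bound. The sketch assumes features scaled to a small range, since large raw values would saturate the exponential.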
J. Xu et al.
Computers and Geosciences 131 (2019) 132–143
Big Hengqin Island SPOT5 image was acquired on December 3, 2007. Fig. 3(a) (428 � 696 pixels) shows an RGB color composite map of 1, 2, and 3 bands. It covers the area east to Macao Island, west to Modaomen Estuary, south to Sandie Waterfalls, and north to Baosheng Road. As shown in Table 1, the surface is covered by water, bare land, grassland, construction sites, woodland, agricultural land, and other surface types. Summer Palace SPOT5 image was acquired on November 16, 2010. Fig. 4(a) (591 � 736 pixels) shows an RGB color composite map of 1, 2, and 3 bands. It covers the area east to Century City, west to Beijing Botanical Garden, south to Xingshikou Road, and north to the Summer Palace. As shown in Table 2, the land is covered by water, bare land, grassland, construction sites, and woodland. The two groups of experi mental data have similar features, such as a large coverage areas, many shadows, and strong fuzziness, which make the classification more difficult. These features can be used to evaluate the accuracy and adaptability of the algorithm to complex remote sensing image classi fication problems. Hengqin Island TM image was acquired on November 15, 1999. Fig. 5(a) (452 � 795 pixels) shows an RGB color composite map of 4, 3, and 2 bands. It covers the entire Hengqin Island area and surrounding waters. As shown in Table 3, the land is covered by vege tation, farms, building land, clear water, turbid water, and wetland. This experimental data has more uncertain characteristics than the others because the same pixel can represent many types of ground objects and there are serious spectral overlaps and large density differences of the categories. Moreover, these data can effectively evaluate the feasibility and effectiveness of the SS-AIT2FCM method. This experiment aims to evaluate the correctness and practicality of the proposed algorithm for the classification of more complex remote sensing image data. 
Based on the analysis of the above experimental image characteristics, we set λ ¼ 0.01 for SS-AIT2FCM in experiments 4.1 and 4.2 and λ ¼ 0.05 for SS-AIT2FCM in experiment 4.3. Hence, no filtering, post-processing or other operations were performed in these experiments. In addition, to ensure the comparability of the final experimental results, all experiments were implemented with MATLAB R2016a and the parameters in each experiment were uniform. The effectiveness of the classification results is compared using visual interpretation and several objective indicators.
Fig. 2. Flow chart of the SS-AIT2FCM algorithm.
number of sample points in cluster Cj . The SS-AIT2FCM algorithm is shown in Fig. 2, and the specific implementation steps are as follows: Step 1: Certain expert knowledge of each class is obtained in dataset X, and the samples are labeled. Create labeled sample set XL and unlabeled sample set XU . Determine the number of clusters c, maximum number of iterations T, and threshold ε, initializing the number of iterations as t ¼ 1 and objective function J0 ¼ 0; Step 2: Using the fuzzy weight index m selection algorithm in Section 3.2, determine the optimal m; Step 3: Use Eq. (17) to initialize the ratio λ of the labeled samples. Use Eq. (14) to initialize center V 0 . Use Eq. (15) to initialize the membership matrix UU of unlabeled sample points; Step 4: Use Eqs. (18) and (19) to calculate the membership degree U
4.1. Experimental results for Big Hengqin Island SPOT5 multi-spectral data As shown in Fig. 3, the result of FCM (Fig. 3(b)) includes a serious misclassification of forest land and grassland with a similar spectrum, and is be easily affected by the noise point. In addition, it is not robust enough to describe the heterogeneous region and is vulnerable to the influence of spectral aliasing. For example, the raised oysters, which have a spectrum similar to that of woodland, are wrongly divided into the woodland category in region 2. Moreover, the single membership function-based FCM algorithm has insufficient fuzzy expressiveness for remote sensing images. The result of IT2FCM (Fig. 3(c)) fixes the misclassification in region 2, but the misclassification of forest land and grassland remains. Moreover, in the construction land partitioning, the “phagocytosis” problem occurs in region 3. This also indicates that the IT2FCM algorithm based on two fuzzy exponent models has better fuzzy expressiveness, but it is easy to find the phenomenon in which lowdensity categories are swallowed by high-density ones. This is because the algorithm, which uses different fuzzy exponents to construct the membership space, has the problem of the fuzzy difference between different classes when the type is reduced. The result of AIT2FCM (Fig. 3 (d)) shows that woodland and grassland are more accurately divided and the boundary between categories is clearer. It can be observed that the membership interval model based on fuzzy distance is suitable for this remote sensing image. However, vegetable fields are wrongly divided into construction site areas, and there is some noise in the overall classification results. Although SIIT2-FCM (Fig. 3(e)) benefits from semi-supervised information for recognizing the woodland and
U
interval ½U ; U �; Step 5: Using membership matrix UU , obtain the mean-squared error of each cluster in Eqs. (21) and (22), update adaptive factor γ, and update descending membership matrix UU ; Step 6: Update center V according to Eq. (16); Step 7: Update objective function Jt According to Eq. (13); Step 8: If jjJt Jt 1 j � ε or t � T, stop; else, SET t ¼ t þ 1 and go to Step 4. 4. Experiments and discussion To test the validity and adaptability of the algorithm, we selected two multi-spectral datasets from the SPOT5 satellite (10 m spatial res olution) and one multi-spectral dataset from the Landsat TM satellite (30 m spatial resolution). The datasets cover Big Hengqin Island, Guangdong, China, and the Summer Palace, Beijing, China, as well as Hengqin Island (including Big Hengqin and Small Hengqin Islands), Guangdong, China, respectively. They have a wide coverage and fea tures that are rich in content. Moreover, there are obvious uncertainties, ambiguities, and sources of interference. 136
J. Xu et al.
Computers and Geosciences 131 (2019) 132–143
Fig. 3. Experiment results for Big Hengqin Island (SPOT5 data) based on different methods. (a) Original SPOT5 false color composite images (bands 1, 2, and 3); (b) FCM algorithm; (c) IT2FCM algorithm with two fuzzy indices (m1 ¼ 2 and m2 ¼ 10); (d) AIT2FCM algorithm; (e) SIIT2-FCM (m1 ¼ 2 and m2 ¼ 10) algorithm; (f) Evolutionary AIT2FCM algorithm (E-AIT2FCM) based on evolutionary fuzzy weight index m; (g) SSAIT2FCM algorithm (with a labeled sample ratio of 1%); (h) Ground truth and legend. The three regional images in (a–g) were selected for their obvious differences in clas sification results. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
137
J. Xu et al.
Computers and Geosciences 131 (2019) 132–143
Table 1 Composition of the categories in Big Hengqin Island (SPOT5 data). Experimental data Big Hengqin Island SPOT5
Land cover
Describe
water bare land grass construction site woodland
rivers, reservoir, raise oysters, wetland vegetable fields, garden plot, feldweg arable land, lawn, weed golf course runway, building site, rock mountain forest, artificial forest
construction land, it has the same “phagocytosis” problem as IT2FCM (Fig. 3(c)), because they have the same method of constructing the membership space. Compared with the classification results of AIT2FCM (Fig. 3(d)), those of E-AIT2FCM (Fig. 3(f)) are less affected by noise and are more accurate. For example, the error rate for bare land is reduced in region 1. This is related to the fact that E-AIT2FCM selects the optimal m value. An appropriate value for m can effectively control the noise. Only SS-AIT2FCM (Fig. 3(g)) did not wrongly classify bare land as construction site, and it obtained more compact classes and clearer boundaries. This implies the importance of prior knowledge in the rough classification of remote sensing images.

Table 2. Composition of the categories in Summer Palace (SPOT5 data).
Land cover | Description
water | river, Kunming reservoir
bare land | main trunk road, residential district
grass | grassland, greenbelt
construction site | large buildings, airports
woodland | mountain forest, landscape forest
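The remark above that E-AIT2FCM selects an appropriate m value can be illustrated with a simple hill climb over m. The sketch below is only an illustration under stated assumptions: it uses plain type-1 FCM rather than the interval type-2 algorithm, and it replaces the validity index proposed in the paper, which is not reproduced here, with a hypothetical stand-in (the share of labeled samples whose strongest cluster matches their class). All function names, the step size, and the synthetic data are assumptions.

```python
import numpy as np

def memberships(X, centers, m):
    """Standard FCM memberships, shape (n_samples, n_clusters)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against zero distance
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def run_fcm(X, init_centers, m, n_iter=30):
    """Plain FCM iterations from given initial centers; returns final memberships."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        w = memberships(X, centers, m) ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return memberships(X, centers, m)

def labeled_agreement(u, labeled_idx, labeled_classes):
    """Hypothetical validity index: agreement between clusters and labeled samples."""
    return float(np.mean(u[labeled_idx].argmax(axis=1) == labeled_classes))

def climb_m(X, init_centers, labeled_idx, labeled_classes,
            m0=2.0, step=0.5, m_min=1.5, m_max=10.0):
    """Move m to a neighbouring value only while the index strictly improves."""
    def score(m):
        return labeled_agreement(run_fcm(X, init_centers, m), labeled_idx, labeled_classes)
    best_m, best_score = m0, score(m0)
    while True:
        neighbours = [m for m in (best_m - step, best_m + step) if m_min <= m <= m_max]
        if not neighbours:
            return best_m, best_score
        cand_score, cand_m = max((score(m), m) for m in neighbours)
        if cand_score <= best_score:
            return best_m, best_score
        best_m, best_score = cand_m, cand_score

# Synthetic two-class data with two labeled samples per class (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
labeled_idx = np.array([0, 1, 50, 51])
labeled_classes = np.array([0, 0, 1, 1])
init_centers = np.array([X[[0, 1]].mean(axis=0), X[[50, 51]].mean(axis=0)])
best_m, best_score = climb_m(X, init_centers, labeled_idx, labeled_classes)
print(best_m, best_score)
```

On well-separated data such as this, every m gives full agreement and the climb stops at the starting value; on images with strong spectral aliasing the index would be expected to vary with m, which is the situation the search is meant for.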
4.2. Experimental results for Beijing Summer Palace SPOT5 multispectral data

As shown in Fig. 4, the result of FCM (Fig. 4(b)) shows that the division of roads and waterways is indistinct, water and woodland are badly misclassified, and grassland and woodland are poorly distinguished. The shadow area is classified as water by mistake because the spectra of shadows and water are similar. For example, the mountain shadow is classified as water in region 1. The FCM algorithm is not suitable for remote sensing images with serious spectral aliasing, large coverage area, and abundant features. The result of IT2FCM (Fig. 4(c)) shows that this algorithm is more suitable for remote sensing images with high-order fuzziness. In contrast to FCM (Fig. 4(b)), the water and woodland are not misclassified in region 2 and the division between woodland and grass is more accurate. However, neither the division of roads and waterways nor the partial misclassification of mountain shadows as water has been solved. SIIT2-FCM (Fig. 4(e)) recognizes woodland better than IT2FCM, because the woodland in region 1 is more compact, but the same misclassification problem remains. The result of AIT2FCM (Fig. 4(d)) shows that road is misclassified as water, and many serious misclassifications with unclear boundaries between water, woodland, and grass occur. For example, the mountains are mistakenly divided into water and grass in region 1, and woodland is mistakenly divided into water and bare land in region 2. Spectral aliasing is a serious problem for this algorithm, demonstrating its inability to adapt. In contrast, the classification result of E-AIT2FCM (Fig. 4(f)) is better because roads and waterways are clearly visible, each class is compact and complete, and the algorithm is clearly more adaptable than AIT2FCM. Moreover, only a part of the grassland area is incorrectly classified as woodland and shows salt-and-pepper noise, such as in region 3. SS-AIT2FCM (Fig. 4(g)) achieves the best classification result, with fewer errors, better anti-noise capability, a higher degree of category aggregation, and more obvious boundaries, especially in the division of woodland and grassland. Across all the classification results, every algorithm misclassified house shadow as water. This is because the algorithms only consider the single-pixel fuzzy method and do not take spatial neighborhood information into consideration. Hence, this is a direction in which the algorithm will be improved in the future.

To compare the results of these algorithms using two types of multispectral data, we measured reference classification images (Figs. 4(h) and 5(h)) with the land-use map and historical data of previous years. The confusion matrices and the accuracy (acc) of each class from all algorithms in the two experiments are provided in Tables 4 and 5, and the overall accuracy (OA), kappa coefficient (KC), and CPU time are calculated to verify the classification performance and arithmetic speed of the algorithms (Awad et al., 2009), as shown in Tables 6 and 7. The OA and KC of SS-AIT2FCM are the highest of all algorithms and the acc of the different classes is generally improved. Moreover, the classification results are consistent with the results of visual interpretation, and the algorithm execution time (CPU time) is the lowest of all type-2 fuzzy comparison algorithms. This shows that the addition of semi-supervised classification can reduce the algorithm time while meeting the requirements of complex classification problems. Furthermore, although the execution time of E-AIT2FCM in the two sets of experiments is slightly higher than that of AIT2FCM, its overall classification accuracy and kappa coefficient are higher than those of AIT2FCM, and significantly so on the second set of data. This further indicates that the E-AIT2FCM algorithm is more accurate and more adaptable to remote sensing images with serious spectral aliasing, large coverage areas, and rich features.

Fig. 4. Experiment result for Summer Palace (SPOT5 data) based on different methods. (a) Original SPOT5 false color composite images (bands 1, 2, and 3); (b) FCM algorithm; (c) IT2FCM algorithm with two fuzzy indices (m1 = 1.5 and m2 = 4.5); (d) AIT2FCM algorithm; (e) SIIT2-FCM algorithm (m1 = 1.5 and m2 = 4.5); (f) Evolutionary AIT2FCM algorithm (E-AIT2FCM) based on the evolutionary fuzzy weight index m; (g) SS-AIT2FCM algorithm (with a labeled sample ratio of 1%); (h) Ground truth and legend. Again, the three regional images in (a–g) were selected for their obvious differences in classification results. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Table 3. Composition of the categories in Hengqin Island (TM data).
Land cover | Description
vegetation | grassland, forest land
aquiculture area | oyster farming, flooded rice fields
building land | housing, roads, airport
clear water | clear water
turbid water | turbid water
wetland | wetland

Fig. 5. Experiment result for Hengqin Island TM data based on different methods. (a) Original TM false color composite images (bands 4, 3, and 2); (b) FCM algorithm; (c) IT2FCM algorithm with two fuzzy indices (m1 = 2 and m2 = 5); (d) AIT2FCM algorithm; (e) SIIT2-FCM algorithm (m1 = 2 and m2 = 5); (f) Evolutionary AIT2FCM algorithm (E-AIT2FCM) based on the evolutionary fuzzy weight index m; (g) SS-AIT2FCM algorithm (with a labeled sample ratio of 5%); (h) Ground truth. Again, the three regional images were selected for their obvious differences in classification results. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
4.3. Experimental results for Hengqin Island TM multispectral data

As shown in Fig. 5, region 1 includes flooded rice fields and a fraction of vegetation categories. The flooded rice fields and vegetation have similar spectra and are connected in practice, which makes them susceptible to the influence of neighborhood pixels. Spectral aliasing therefore occurs in the interface areas, making it difficult to
Table 4. Confusion matrix of the results from the classification algorithms (Big Hengqin Island SPOT5 data). Columns give the ground truth class: Water (Wa), Bare land (Ba), Grass (Gr), Construction sites (Co), Woodland (Wo).

FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 5585 | 111 | 2 | 12 | 2
Ba | 38 | 2764 | 3006 | 212 | 1364
Gr | 0 | 80 | 3899 | 0 | 0
Co | 4 | 437 | 110 | 3674 | 1647
Wo | 764 | 401 | 266 | 21 | 3031
acc | 87.39% | 72.87% | 53.54% | 93.75% | 54.34%

IT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 6104 | 220 | 0 | 31 | 52
Ba | 17 | 1364 | 1903 | 54 | 2
Gr | 0 | 60 | 4830 | 0 | 1425
Co | 26 | 1908 | 13 | 3829 | 0
Wo | 244 | 241 | 537 | 5 | 1552
acc | 95.51% | 35.96% | 66.32% | 97.70% | 51.20%

AIT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 6294 | 163 | 4 | 16 | 15
Ba | 44 | 1688 | 505 | 304 | 0
Gr | 19 | 882 | 5716 | 6 | 81
Co | 0 | 50 | 7 | 3593 | 0
Wo | 34 | 10 | 1051 | 0 | 2935
acc | 98.48% | 70.87% | 78.48% | 91.68% | 96.83%

SIIT2-FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 6264 | 157 | 4 | 15 | 95
Ba | 37 | 847 | 1523 | 82 | 23
Gr | 1 | 1025 | 5535 | 5 | 61
Co | 21 | 1733 | 79 | 3817 | 0
Wo | 68 | 31 | 142 | 0 | 2852
acc | 98.01% | 22.33% | 76.00% | 97.40% | 94.09%

E-AIT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 6328 | 89 | 11 | 23 | 12
Ba | 41 | 2847 | 245 | 373 | 0
Gr | 15 | 643 | 5929 | 13 | 52
Co | 0 | 204 | 3 | 3510 | 0
Wo | 7 | 10 | 1095 | 0 | 2967
acc | 99.01% | 75.06% | 81.41% | 89.56% | 97.89%

SS-AIT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 6391 | 85 | 0 | 26 | 0
Ba | 0 | 3498 | 11 | 274 | 0
Gr | 0 | 204 | 6109 | 0 | 22
Co | 0 | 2 | 0 | 3614 | 0
Wo | 0 | 4 | 1163 | 5 | 3009
acc | 100% | 92.22% | 83.88% | 92.22% | 99.27%
Table 5. Confusion matrix of the results from the classification algorithms (Summer Palace SPOT5 data). Columns give the ground truth class: Water (Wa), Bare land (Ba), Grass (Gr), Construction sites (Co), Woodland (Wo).

FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 7756 | 1189 | 208 | 1161 | 2961
Ba | 4 | 12066 | 487 | 1594 | 7
Gr | 0 | 3948 | 5933 | 382 | 106
Co | 0 | 0 | 65 | 14800 | 0
Wo | 0 | 58 | 10917 | 0 | 1550
acc | 99.95% | 69.90% | 33.69% | 82.51% | 33.52%

IT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 7760 | 1303 | 52 | 1165 | 652
Ba | 0 | 14703 | 1078 | 3374 | 13
Gr | 0 | 638 | 10217 | 272 | 129
Co | 0 | 0 | 31 | 13121 | 0
Wo | 0 | 617 | 6232 | 5 | 3830
acc | 100% | 85.18% | 58.02% | 73.15% | 82.83%

AIT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 7760 | 1459 | 337 | 1167 | 2933
Ba | 0 | 12764 | 544 | 851 | 8
Gr | 0 | 43 | 9521 | 0 | 1587
Co | 0 | 0 | 69 | 15473 | 0
Wo | 0 | 2995 | 7139 | 446 | 96
acc | 100% | 73.95% | 54.07% | 86.26% | 2.08%

SIIT2-FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 7760 | 3104 | 320 | 1191 | 514
Ba | 0 | 12209 | 641 | 983 | 3
Gr | 0 | 1925 | 10795 | 153 | 100
Co | 0 | 5 | 72 | 15610 | 0
Wo | 0 | 18 | 5782 | 0 | 4007
acc | 100% | 70.73% | 61.30% | 87.03% | 86.66%

E-AIT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 7760 | 65 | 21 | 1149 | 95
Ba | 0 | 15943 | 1302 | 2345 | 11
Gr | 0 | 301 | 12447 | 260 | 226
Co | 0 | 0 | 50 | 14176 | 0
Wo | 0 | 952 | 3790 | 7 | 4292
acc | 100% | 92.36% | 70.68% | 79.03% | 92.82%

SS-AIT2FCM
Class | Wa | Ba | Gr | Co | Wo
Wa | 7760 | 0 | 12 | 1142 | 12
Ba | 0 | 16904 | 1365 | 143 | 18
Gr | 0 | 355 | 14903 | 0 | 389
Co | 0 | 0 | 74 | 16650 | 0
Wo | 0 | 2 | 1256 | 2 | 4205
acc | 100% | 97.93% | 84.63% | 92.82% | 90.94%
woodland fringe as wetland, AIT2FCM (Fig. 5(d)) misclassifies some of the farms in the border areas as wetland, and E-AIT2FCM (Fig. 5(f)) blurs the boundaries of the mountain vegetation. SIIT2-FCM (Fig. 5(e)) and SS-AIT2FCM (Fig. 5(g)) not only correctly identify the mountain vegetation and farm areas but also divide the boundaries more clearly. In region 3, there are clear water and farm areas with similar spectra. All algorithms misclassify part of the clear water as aquiculture area, but the error rate of SS-AIT2FCM (Fig. 5(g)) on clear water is lower than that of the other algorithms. These results verify that the semi-supervised approach of SS-AIT2FCM can effectively guide the classification, clarify the category boundaries, and retain more detailed information.

To compare the experimental results of these algorithms for TM multispectral data, we measured a reference classification image (Fig. 5(h)) with the land-use map and historical data of previous years. The confusion matrix and the acc of each class for all algorithms are provided in Table 8, while OA, KC, and CPU time were calculated and are shown in Table 9 to verify the classification performance and arithmetic speed of the algorithms (Awad et al., 2009). AIT2FCM and E-AIT2FCM have similar OA and KC. The OA and KC of SS-AIT2FCM are both the highest and the acc of the different classes is generally improved, while the CPU time of SS-AIT2FCM is the lowest of all type-2 comparison algorithms.
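The conclusions of the paper state that SS-AIT2FCM uses the labeled samples to initialize the cluster centers. A minimal sketch of that single step is given below, assuming the centers are simply the per-class means of the labeled pixels. The function name, the use of -1 to mark unlabeled pixels, and the toy data are hypothetical, and the soft-constraint supervision applied during the iterations is not covered.

```python
import numpy as np

def init_centers_from_labels(X, labels, n_classes):
    """Initial centers as per-class means of the labeled pixels.

    X: array of shape (n_pixels, n_bands).
    labels: class index per pixel, with -1 marking unlabeled pixels (assumed convention).
    """
    centers = np.empty((n_classes, X.shape[1]), dtype=float)
    for k in range(n_classes):
        members = X[labels == k]
        if len(members) == 0:
            raise ValueError(f"class {k} has no labeled samples")
        centers[k] = members.mean(axis=0)
    return centers

# Toy example: five two-band pixels, four of them labeled.
X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 12.0], [5.0, 5.0]])
labels = np.array([0, 0, 1, 1, -1])
centers = init_centers_from_labels(X, labels, n_classes=2)
print(centers)
```

Starting from centers that already sit near the labeled classes is one plausible reason for the lower CPU time reported for SS-AIT2FCM, since fewer iterations may be needed to converge.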
Table 6. Objective indicator comparison results of the classification algorithms (Big Hengqin Island SPOT5 data).
Classification algorithm | OA | KC | CPU-time (s)
FCM | 71.95% | 0.6457 | 75.926
IT2FCM (m1 = 2, m2 = 10) | 72.40% | 0.6470 | 175.010
AIT2FCM | 82.84% | 0.7833 | 113.587
SIIT2-FCM (m1 = 2, m2 = 10) | 79.10% | 0.7322 | 109.042
E-AIT2FCM | 88.39% | 0.8515 | 120.342
SS-AIT2FCM (λ = 0.01) | 92.64% | 0.9062 | 88.955
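The OA and KC columns can be recomputed from a confusion matrix. The sketch below uses the SS-AIT2FCM counts for the Big Hengqin Island SPOT5 data from Table 4 and assumes that rows are the assigned classes and columns are the ground truth classes, an orientation that is consistent with the reported per-class acc values. The function names are illustrative.

```python
import numpy as np

# SS-AIT2FCM counts from Table 4 (rows: assigned class, columns: ground truth class,
# in the order Wa, Ba, Gr, Co, Wo). The orientation is an assumption.
cm = np.array([
    [6391,   85,    0,   26,    0],
    [   0, 3498,   11,  274,    0],
    [   0,  204, 6109,    0,   22],
    [   0,    2,    0, 3614,    0],
    [   0,    4, 1163,    5, 3009],
])

def overall_accuracy(cm):
    """Share of correctly classified pixels (diagonal over total)."""
    return np.trace(cm) / cm.sum()

def kappa_coefficient(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = cm.sum()
    po = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (po - pe) / (1.0 - pe)

def class_accuracy(cm):
    """Per-class acc: diagonal divided by the ground truth (column) totals."""
    return np.diag(cm) / cm.sum(axis=0)

oa = overall_accuracy(cm)
kc = kappa_coefficient(cm)
acc = class_accuracy(cm)
print(f"OA = {oa:.4f}, KC = {kc:.4f}")
print("acc =", np.round(acc, 4))
```

With these counts the sketch gives an OA of about 0.9264 and a KC of about 0.9062, in agreement with the SS-AIT2FCM row of Table 6.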
Table 7. Objective indicator comparison results of the classification algorithms (Summer Palace SPOT5 data).
Classification algorithm | OA | KC | CPU-time (s)
FCM | 64.59% | 0.5567 | 116.548
IT2FCM (m1 = 1.5, m2 = 4.5) | 76.13% | 0.6972 | 281.098
AIT2FCM | 69.97% | 0.6218 | 206.530
SIIT2-FCM (m1 = 1.5, m2 = 4.5) | 77.28% | 0.7123 | 156.371
E-AIT2FCM | 83.78% | 0.7918 | 210.663
SS-AIT2FCM (λ = 0.01) | 92.68% | 0.9047 | 123.220
distinguish the two categories. FCM (Fig. 5(b)) and IT2FCM (Fig. 5(c)) mistakenly identify flooded rice fields and vegetation as wetland. Although AIT2FCM (Fig. 5(d)) has partially separated the flooded rice fields and vegetation, the intersections of the two categories are classified as wetland by mistake. SIIT2-FCM (Fig. 5(e)) and E-AIT2FCM (Fig. 5(f)) classify some flooded rice fields connected with vegetation as vegetation by mistake. In contrast, SS-AIT2FCM (Fig. 5(g)) divides the two categories better. Region 2 is the interface area between mountain vegetation and farms. Influenced by sensitive changes in the mountain vegetation, both FCM (Fig. 5(b)) and IT2FCM (Fig. 5(c)) misclassify

Table 9. Objective indicator comparison results of the classification algorithms (TM data in Hengqin Island).
Classification algorithm | OA | KC | CPU-time (s)
FCM | 67.30% | 0.5996 | 105.368
IT2FCM (m1 = 2, m2 = 5) | 68.65% | 0.6118 | 378.242
AIT2FCM | 75.99% | 0.7091 | 319.378
SIIT2-FCM (m1 = 2, m2 = 5) | 81.77% | 0.7762 | 354.764
E-AIT2FCM | 76.20% | 0.7115 | 337.226
SS-AIT2FCM (λ = 0.05) | 85.48% | 0.8212 | 179.554
Table 8. Confusion matrix of the results from the classification algorithms (TM data in Hengqin Island). Columns give the ground truth class: Vegetation (Ve), Aquiculture area (Aq), Building land (Bu), Clear water (Cl), Turbid water (Tu), Wetland (We).

FCM
Class | Ve | Aq | Bu | Cl | Tu | We
Ve | 2140 | 66 | 7 | 0 | 0 | 0
Aq | 102 | 4983 | 16 | 1038 | 0 | 8
Bu | 165 | 327 | 4230 | 55 | 0 | 341
Cl | 0 | 208 | 48 | 4756 | 387 | 117
Tu | 11 | 147 | 363 | 59 | 1816 | 1817
We | 1211 | 1915 | 305 | 0 | 0 | 10
acc | 58.97% | 65.17% | 85.13% | 80.50% | 82.43% | 0.44%

IT2FCM
Class | Ve | Aq | Bu | Cl | Tu | We
Ve | 2656 | 209 | 15 | 0 | 0 | 0
Aq | 47 | 5517 | 24 | 1647 | 0 | 16
Bu | 139 | 320 | 4419 | 70 | 0 | 1229
Cl | 0 | 24 | 36 | 4010 | 525 | 1
Tu | 8 | 174 | 121 | 181 | 1678 | 1034
We | 779 | 1402 | 354 | 0 | 0 | 13
acc | 73.19% | 72.16% | 88.93% | 67.87% | 76.17% | 0.57%

AIT2FCM
Class | Ve | Aq | Bu | Cl | Tu | We
Ve | 3409 | 977 | 227 | 0 | 0 | 0
Aq | 13 | 4518 | 14 | 571 | 0 | 3
Bu | 132 | 78 | 3123 | 21 | 0 | 9
Cl | 0 | 145 | 31 | 5157 | 300 | 61
Tu | 4 | 0 | 129 | 49 | 1903 | 80
We | 71 | 1928 | 1445 | 110 | 0 | 2140
acc | 93.94% | 59.09% | 62.85% | 87.29% | 86.38% | 93.33%

SIIT2-FCM
Class | Ve | Aq | Bu | Cl | Tu | We
Ve | 3283 | 1551 | 233 | 0 | 0 | 2
Aq | 166 | 5242 | 5 | 414 | 0 | 4
Bu | 154 | 238 | 3835 | 10 | 0 | 33
Cl | 1 | 166 | 12 | 5371 | 300 | 86
Tu | 0 | 3 | 62 | 26 | 1898 | 7
We | 25 | 446 | 822 | 87 | 5 | 2161
acc | 90.47% | 68.56% | 77.18% | 90.91% | 86.16% | 94.24%

E-AIT2FCM
Class | Ve | Aq | Bu | Cl | Tu | We
Ve | 3426 | 2090 | 250 | 9 | 0 | 2
Aq | 14 | 4428 | 11 | 444 | 0 | 3
Bu | 125 | 69 | 3042 | 18 | 0 | 8
Cl | 0 | 151 | 27 | 5269 | 287 | 28
Tu | 0 | 1 | 115 | 49 | 1916 | 26
We | 64 | 907 | 1524 | 119 | 0 | 2226
acc | 94.41% | 57.91% | 61.22% | 89.18% | 86.97% | 97.08%

SS-AIT2FCM
Class | Ve | Aq | Bu | Cl | Tu | We
Ve | 3433 | 728 | 256 | 0 | 0 | 0
Aq | 19 | 6196 | 46 | 73 | 0 | 49
Bu | 148 | 93 | 3421 | 25 | 0 | 23
Cl | 0 | 204 | 35 | 5698 | 98 | 157
Tu | 5 | 17 | 111 | 50 | 2105 | 139
We | 24 | 408 | 1100 | 62 | 0 | 1925
acc | 94.60% | 81.04% | 68.85% | 96.45% | 95.55% | 83.95%
5. Conclusions

In this paper, we proposed a new remote sensing image classification method based on a semi-supervised adaptive interval type-2 fuzzy c-means algorithm, called the SS-AIT2FCM algorithm. The main contributions of this paper are as follows.

i. To guide the search for a more reliable fuzzy weight index m, SS-AIT2FCM compares labeled samples with unsupervised outputs and uses the proposed validity index as a measure. The process of calculating m is an iterative climbing process. By designing a validity index to evaluate the classification results of different m values, the most effective one is obtained.

ii. SS-AIT2FCM is the first method to introduce the semi-supervised concept with fuzzy distance metrics to build IT2FCM. It combines the labeled samples to initialize the centers and to assist the center calculation process. In addition, the adaptive type-descending method takes full advantage of the labeled samples.

The experimental results show that the IT2FCM algorithm performs better than the T1FCM algorithm on remote sensing image data with low spatial resolution and large coverage areas containing mixed pixels and rich features. The E-AIT2FCM algorithm is more suitable than the AIT2FCM algorithm for this application. Moreover, the accuracy of the SS-AIT2FCM algorithm with prior knowledge is significantly higher than that of the unsupervised algorithms. It can also effectively reduce the computation time and make up for the increase in complexity of the type-2 fuzzy set method. This further validates the effectiveness of the semi-supervised approach in the algorithm proposed in this paper. The introduction of expert prior knowledge also reduced the ill-posedness of the model used in the algorithm for remote sensing image data and improved the final classification result. However, the algorithm requires the labeled samples to be selected accurately, and the accuracy of the classification still has much room for improvement. The matching of different remote sensing image data to fuzzy algorithms and the selection of labeled samples are research directions that need further study.

Computer code availability

We have uploaded the computer code of all the algorithms in the submission system.

Contributor roles

Xu: Methodology and conceptualization; Feng: Investigation and writing original draft; Zhao: Investigation and software; Sun: Data curation; Zhu: Software.

Acknowledgements

This research is funded by Natural Science Foundation of Shandong (ZR2019MF060, ZR2017MF008, ZR201702220179, ZR201709210160), a Project of the Shandong Province Higher Educational Science and Technology Key Program (J18KZ016), the Yantai Science and Technology Plan (2018YT06000271), and Natural Science Foundation of China (61801414, 61802330, 61802331).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.cageo.2019.06.005.

References

Assas, O., 2015. Images segmentation based on interval type-2 Fuzzy C-Means. In: Sai Intelligent Systems Conference. IEEE, pp. 773–781.
Awad, M., Chehdi, K., Nasri, A., 2009. Multi-component image segmentation using a hybrid dynamic genetic algorithm and fuzzy c-means. IET Image Process. 3 (2), 52–62.
Awad, M., Nasri, A., 2009. Satellite image segmentation using self-organizing maps and fuzzy c-means. In: Signal Processing and Information Technology (ISSPIT), 2009 IEEE International Symposium on. IEEE, pp. 398–402.
Bouchachia, A., Pedrycz, W., 2006. Data clustering with partial supervision. Data Min. Knowl. Discov. 2 (1), 47–78.
Choubin, B., Solaimani, K., Habibnejad, R., Malekian, A., 2017. Watershed classification by remote sensing indices: a fuzzy c-means clustering approach. J. Mt. Sci. 14 (10), 2053–2063.
Crawford, M., Tuia, D., Yang, H., 2013. Active learning: any value for classification of remotely sensed data. Proc. IEEE 101 (3), 593–608.
Du, P., Xia, J., Xue, Z., Tan, K., Su, H., Bao, R., 2016. Review of hyperspectral remote sensing image classification. J. Rem. Sens. 20 (2), 236–256.
Gong, J., Zhong, Y., 2016. Survey of intelligent optical remote sensing image processing. J. Rem. Sens. 20 (5), 733–747.
Guo, J., Huo, H., 2017. An enhanced it2fcm* algorithm integrating spectral indices and spatial information for multi-spectral remote sensing image clustering. Rem. Sens. 9 (9), 960.
He, H., Hu, D., Yu, X., 2016. Land cover classification based on adaptive interval type-2 fuzzy clustering. Chin. J. Geophys. 59 (6), 1983–1993.
He, H., Liang, T., Hu, D., Yu, X., 2016. Remote sensing clustering analysis based on object-based interval modeling. Comput. Geosci. 94, 131–139.
Huo, H., Guo, J., Li, Z., Jiang, X., 2017. Remote sensing of spatiotemporal changes in wetland geomorphology based on type 2 fuzzy sets: a case study of beidahuang wetland from 1975 to 2015. Rem. Sens. 9 (7), 683.
Hwang, C., Rhee, C., 2007. Uncertain fuzzy clustering: interval type-2 fuzzy approach to c-means. IEEE Trans. Fuzzy Syst. 15 (1), 107–120.
Li, H., 1999. Variable domain adaptive fuzzy controller. Sci. China Technol. Sci. 1, 32–42.
Liang, Q., Mendel, J., 2000. Interval type-2 fuzzy logic systems: theory and design. IEEE Trans. Fuzzy Syst. 8 (5), 534–550.
Liu, L., 2016. A new fuzzy clustering method with neighborhood distance constraint for volcanic ash cloud. IEEE Access 4 (99), 7005–7013.
Long, N., Mai, D., Pedrycz, W., 2015. Semi-supervising Interval Type-2 Fuzzy C-Means clustering with spatial information for multi-spectral satellite image classification and change detection. Comput. Geosci. 83, 1–16.
Memon, K., 2018. A histogram approach for determining fuzzy values of interval type-2 fuzzy c-means. Expert Syst. Appl. 91, 27–35.
Mendel, J., 2007. Type-2 fuzzy sets and systems: an overview. IEEE Comput. Intell. Mag. 2 (1), 20–29.
Pal, N., Bezdek, J., 1995. On cluster validity for the fuzzy c-mean model. IEEE Trans. Fuzzy Syst. 3 (3), 370–379.
Palafox, L., Hamilton, C., Scheidt, S., Alvarez, A., 2017. Automated detection of geological landforms on mars using convolutional neural networks. Comput. Geosci. 101, 48–56.
Pedrycz, W., 1985. Algorithms of fuzzy clustering with partial supervision. Pattern Recogn. 3, 13–20.
Pedrycz, W., Waletzky, J., 1997. Fuzzy clustering with partial supervision. IEEE Trans. Syst. Man Cybern. -Syst. 27 (5), 787–795.
Pedrycz, W., Waletzky, J., 1997. Neural-network front ends in unsupervised learning. IEEE Trans. Neural Netw. Learn. Syst. 8 (2), 390–401.
Pei, T., Sobolevsky, S., Ratti, C., Shaw, S., Li, T., Zhou, C., 2014. A new insight into land use classification based on aggregated mobile phone data. Int. J. Geogr. Inf. Sci. 28 (9), 1988–2007.
Persello, C., Bruzzone, L., 2014. Active and semisupervised learning for the classification of remote sensing image. IEEE Trans. Geosci. Remote Sens. 52 (11), 6937–6956.
Shao, P., Shi, W., He, P., Hao, M., Zhang, X., 2016. Novel approach to unsupervised change detection based on a robust semi-supervised FCM clustering algorithm. Rem. Sens. 8 (3), 264.
Sinh, D., Long, T., 2018. Multiple kernel approach to semi-supervised fuzzy clustering algorithm for land-cover classification. Eng. Appl. Artif. Intell. 68, 205–213.
Stutz, C., Runkler, T., 2002. Classification and prediction of road traffic using application-specific fuzzy clustering. IEEE Trans. Fuzzy Syst. 10 (3), 297–308.
Szilagyi, L., Iclanzan, D., Szilagyi, S., Dumitrescu, D., Hirsbrunner, B., 2009. A generalized c-means clustering model using optimized via evolutionary computation. In: Fuzzy Systems (FUZZY), International Conference on. IEEE, vol. 2, pp. 451–455.
Tong, Q., Zhang, B., Zhang, L., 2016. Current progress of hyperspectral remote sensing in China. J. Rem. Sens. 20 (5), 689–707.
Wang, J., Zhang, C., et al., 2017. Semi-supervised classification algorithm of hyperspectral image based on DL1 graph and KNN superposition graph. Sci Sin Inform 47 (12), 1662–1673.
Wang, Z., Zhang, L., Zhang, L., Li, R., Zheng, Y., Zhu, Z., 2018. A deep neural network with spatial pooling (dnnsp) for 3-d point cloud classification. IEEE Trans. Geosci. Remote Sens. 56 (8), 1–11.
We, J., Jiang, Z., et al., 2017. Hyperspectral remote sensing image classification based on semi-supervised conditional random field. J. Rem. Sens. 21 (04), 588–603.
Wu, D., Mendel, J., 2007. Aggregation using the linguistic weighted average and interval type-2 fuzzy sets. IEEE Trans. Fuzzy Syst. 15 (6), 1145–1161.
Xu, G., Liu, Q., Chen, L., Liu, L., 2016. Remote sensing for China’s sustainable development: opportunities and challenges. J. Rem. Sens. 20 (5), 679–688.
Xu, J., Yu, X., Pei, W., Hu, D., Zhang, L., 2015. A remote sensing image fusion method based on feedback sparse component analysis. Comput. Geosci. 85 (PB), 115–123.
Yu, J., 2003. On fuzzy index of fuzzy c-means algorithm. Chin. J. Comput. 26 (8), 968–973.
Yu, X., Zhou, W., He, H., 2014. A method of remote sensing image auto classification based on interval type-2 fuzzy c-means. In: Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on. IEEE, pp. 223–228.
Zadeh, L., 1965. Fuzzy Set. Informat. Control 8, 338–353.
Zadeh, L., 1975. The concept of a linguistic variable and its application to approximate reasoning-I. Inf. Sci. 8 (1), 199–249.
Zhang, B., 2016. Advancement of hyperspectral image processing and information extraction. J. Rem. Sens. 20 (5), 1062–1090.
Zhang, B., 2018. A survey of developments on optical remote sensing information technology and applications. J. Nanjing Univ. Informat. Sci. Technol. (Nat. Sci. Ed.) 10 (1), 1–5.
Zhu, C., Yang, S., Zhao, Q., Cui, S., Wen, N., 2014. Robust semi-supervised kernel-FCM algorithm incorporating local spatial information for remote sensing image classification. J. Indian Soc. Remote Sens. 42 (1), 35–49.