Int J Appl Earth Obs Geoinformation 82 (2019) 101898
Cloud detection algorithm using SVM with SWIR2 and tasseled cap applied to Landsat 8
Pratik P. Joshi (a,⁎), Randolph H. Wynne (b), Valerie A. Thomas (b)
(a) Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
(b) Department of Forest Resources and Environmental Conservation, Virginia Tech, Blacksburg, VA 24060, USA
ARTICLE INFO

Keywords: Cloud detection; SWIR; Tasseled cap; SVM; Landsat 8; Feature mask

ABSTRACT
Landsat satellite images are subject to cloud cover effects resulting in erroneous analysis and observations of ground features. In this work, we present a novel algorithm (STmask) combining tasseled cap band 4 (TC4) with short wave infrared spectral band 2, SWIR2 (2.107–2.294 μm), for generating cloud, water, shadow, snow and vegetation masks. A support vector machine (SVM) with a non-linear kernel is trained on a feature space of TC4 versus SWIR2 for generating feature masks. To develop a generic and unbiased algorithm, the SVM is trained using reference data comprised of 12891 pixels from Landsat 8 scenes from ten spatially and temporally diverse biomes including deciduous forest, rainforest, great plain, savanna, desert, ocean, freshwater, taiga, tundra, and icesheet. 960000 test pixels spanning 96 scenes across 8 biomes from the USGS Landsat cloud cover assessment data set are used for accuracy assessment of STmask as well as to compare its performance with the operational Landsat algorithm, C function of mask (CFmask). Using McNemar's statistic, STmask is shown to maximize both the precision and sensitivity of the classification of all features compared to CFmask. It addresses the challenges of CFmask through statistically significant improvement in the precision of cloud detection over snow/ice, barren, water, urban, and shrubland biomes. Aggregated over all biomes, the average improvement in cloud detection over CFmask is observed to be ∼3.8% using the F-measure. The classification of non-cloud features exhibits promising improvements and mostly comparable performance to CFmask. Overall classification performance is promising, and thus STmask is a novel, biome-independent, parsimonious, and computationally efficient alternative (and/or a cloud screening addition) to the operational CFmask algorithm. The work is timely and is targeted as an innovative processing solution for the land surface remote sensing research community.
⁎ Corresponding author.
E-mail address: [email protected] (P.P. Joshi).
https://doi.org/10.1016/j.jag.2019.101898
Received 31 December 2018; Received in revised form 9 June 2019; Accepted 11 June 2019
0303-2434/ © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Landsat data has historically been the most valuable resource for the study of dynamics of land cover and land use. The International Satellite Cloud Climatology Project-Flux Data (ISCCP-FD) estimates the global annual mean cloud cover to be approximately 65% (Zhang et al., 2004). Cloud brightness and shadows have adverse effects on data analysis. These include inaccurate classification and false change detection of land cover and land use, erroneous estimates of NDVI and other vegetation indices, inaccuracies in atmospheric correction models, etc. Cloud detection and screening is even more vital with the increasingly common paradigm of using Landsat time series stacks (LTSS) for mapping land cover and land use change (Brooks et al., 2012, 2014). Hence, the starting stage for most multitemporal studies is cloud cover detection and filtering (Irish et al., 2006). Thick cold clouds having strong reflectance values and thin transparent clouds having mixed reflectance and temperature values are the two broad categories of clouds (Gao et al., 2002). With the use of multiple spectral filters and the predominant use of the thermal infrared band for each Landsat scene, the Automated Cloud Cover Assessment (ACCA) system (Irish et al., 2006) has historically been used for cloud filtering in Landsat images. By using a two-stage process, involving four tests in the first stage and a thermal test in the second stage, along with ancillary surface temperature data, the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm generates an internal cloud mask (Vermote and Saleous, 2007). For low- and mid-latitude Landsat imagery, LEDAPS is extensively used for performing atmospheric correction and cloud detection. One of the most popular globally efficient algorithms used to generate feature masks for Landsat images is Fmask (function of mask), which uses a combination of a scene-based threshold and a probability mask (Zhu and Woodcock, 2012). The (multitemporal) exponentially-weighted moving average change detection (EWMA-CD) algorithm detects remnant clouds using X-bar statistical control charts (Brooks et al., 2014). Building on the demonstrated prior potential of multitemporal cloud detection and screening, Zhu and Woodcock (2014) developed a multi-temporal variant of Fmask called Tmask, which utilized an outlier detection technique applied to a time series trajectory of land-cover features in Landsat images to improve the sensitivity of feature detection. At present, the operational algorithm called C function of mask (CFmask) is used for generating feature masks in the Landsat surface climate data record provided by the U.S. Geological Survey (USGS). The Fmask algorithm (Zhu and Woodcock, 2012) was originally designed to generate cloud, shadow and snow masks for Landsat 4-7 and was later expanded to Landsat 8 as CFmask by utilizing the cirrus band (Zhu et al., 2012). The CFmask algorithm operates on a single scene and computes a cloud mask using a scene-based threshold combined with a probabilistic mask. A cloud shadow mask is computed using object matching and flood-fill transformation. The CFmask algorithm is known to perform well at higher latitudes but has been prone to overestimation of cloud cover over snow/ice and water biomes and bright targets, and underestimation of clouds over desert/hot biomes, primarily because it uses a fixed scene-based threshold for all pixels. There is thus still room for improvement even in the current operational standard. The open source code and readily available pre-screened data products also make CFmask a good reference for evaluating newer feature detection algorithms for Landsat.

The short wave infrared reflectance band (SWIR) has been utilized for cloud detection by several studies. Li et al. (2012) developed a gradient-based fusion method for cloud detection which used SWIR1 and the visible bands in Landsat 7 to dehaze visible images along with spatial enhancement. The potential of detecting clouds using the SWIR wavelengths measured by the OLI sensor was also presented by Gao et al. (1993). Due to strong water absorption in the SWIR, the two-way water vapor path lengths for high-altitude clouds are small, which results in higher SWIR reflectances for clouds compared to snow/ice features, which are known to exhibit weaker SWIR reflectances (Gao et al., 1993). This in turn often enables the separation of clouds from snow/ice. In addition, cloud shadows have low reflectance in the SWIR compared to clouds, resulting in an added advantage of SWIR to differentiate clouds and cloud shadows (Zhu et al., 2015). Sun et al. (2017) have used SWIR reflectances from Landsat 8 OLI and MODIS to develop an automated cloud-detection algorithm.

The tasseled cap transformation (TC) is a linear transformation which converts six spectral bands (blue, green, red, NIR, SWIR1 (1.55–1.75 μm), and SWIR2 (2.107–2.294 μm)) of Landsat 8 to six composite bands which efficiently represent the underlying physical features. The transformed components in each of the six composite bands are orthogonal to each other. The first three composite bands define the brightness (B), greenness (G), and wetness (W), respectively, and have been widely used for vegetation, soil and land cover mapping. TC4 (Crist, 1985) has a major contribution from haze (Richter, 1996) and has demonstrated efficacy for de-hazing Landsat Thematic Mapper images (Lavreau, 1991). These properties make the tasseled cap transformation a viable precursor to feature detection algorithms. Zhang et al. (2002) proposed the haze-optimised transform (HOT), which is a two-band transformation based on TC using only the blue and red bands. HOT in combination with dark object subtraction has been used as an operational technique for relative atmospheric correction (Zhang et al., 2002; Moro and Halounova, 2007). However, HOT requires a pre-condition of high correlation between the blue and red bands, which in some (admittedly relatively rare) instances is not met. In addition, HOT is shown to provide false responses to bare rock/soil, water bodies, snow/ice and certain man-made targets. Although a few recent studies (Liu et al., 2011; He et al., 2010) have tried to improve HOT to reduce the spurious response to non-haze features, HOT still needs more research before it can be used as a robust technique for haze removal across space and time. Kong et al. (2011) and Xiao et al. (2017) compared haze-removal algorithms and note the usefulness of both TC4 and HOT along with their variants. TC4's use of both the VNIR and SWIR bands, detailed below, has potential to improve separability between water (snow, clouds and ice) and non-water features. Baig et al. (2014) have derived the tasseled cap transformation coefficients specific to Landsat 8. TC4 is given as
TC4 = −0.8239(Blue) + 0.0849(Green) + 0.4396(Red) − 0.058(NIR) + 0.2013(SWIR1) − 0.2773(SWIR2)    (1)
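Equation (1) is a simple per-pixel linear combination of TOA reflectances. The following is an illustrative Python sketch (the paper's own implementation is in R); the example band values are hypothetical, chosen only to show the sign behavior discussed below.

```python
def tc4(blue, green, red, nir, swir1, swir2):
    """Tasseled cap band 4 for Landsat 8 TOA reflectances, Eq. (1) (coefficients from Baig et al., 2014)."""
    return (-0.8239 * blue + 0.0849 * green + 0.4396 * red
            - 0.058 * nir + 0.2013 * swir1 - 0.2773 * swir2)

# A bright cloud-like pixel (high reflectance in all bands) yields a large negative TC4.
cloud_like = tc4(0.6, 0.6, 0.6, 0.6, 0.4, 0.3)
# A vegetation-like pixel (low visible reflectance, high NIR) yields a small (but still negative) TC4.
veg_like = tc4(0.04, 0.06, 0.04, 0.45, 0.15, 0.08)
```

The large negative TC4 for bright targets versus the small negative TC4 for vegetation is exactly the separability the algorithm exploits.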
As we can see, the blue and red spectral bands (used in the HOT) carry the largest weights, followed by the SWIR1 and SWIR2 bands. Clouds and snow have high reflectances across the optical portion of the spectrum, yielding large negative values for TC4. Vegetation is relatively unreflective except in the NIR, leading to small (but still negative) TC4 values. TC4 on the whole has a hitherto largely untapped potential to distinguish clouds, snow and water from vegetation and clear sky features. A support vector machine (SVM) is a computationally efficient algorithm which finds optimal decision boundaries (hyperplanes) using minimal training data, namely the support vectors at the edge of the class distributions (Mountrakis et al., 2011). It operates without any data assumptions, avoids overfitting, and often minimizes misclassification. As a result, many studies have successfully used SVM for classification applications (Zhu and Blumberg, 2002; Melgani and Bruzzone, 2004; Liu and Liu, 2010; Liu et al., 2014; Bahari et al., 2014). SVMs, however, do not account very well for the redundancy and high dimensionality in remote sensing data as spectral resolution increases [9]. As a result, to derive the best performance from SVMs for higher spectral resolution image data, many studies (Melgani and Bruzzone, 2004; Liu et al., 2014; Bahari et al., 2014) combined feature extraction and reduction techniques such as principal component analysis (PCA), the tasseled cap transformation (TCT), and independent component analysis (ICA) with SVMs and reported higher classification accuracies relative to the use of individual spectral bands. Considering its superior classification accuracy, computational efficiency, suitability for smaller training data sets, and ease of implementation, SVM presents strong potential for use in multispectral classification problems, even as spectral resolution increases, when combined with feature reduction techniques.
As discussed above, TC4 is a valuable feature transformation to separate clouds, snow and water from clear sky features, and presents us with an important feature reduction basis suitable for training an efficient SVM. As clouds and snow have high reflectances in the visible bands (blue, green and red) and the NIR, these bands are not suitable to distinguish clouds from snow, which are grouped together by the TC4 feature (Zhu and Woodcock, 2012). However, combining TC4 and SWIR has the potential to distinguish clouds from snow (using SWIR) and to distinguish clouds, snow and water from ground and clear sky features (using TC4). The pairing of TC4 with the SWIR2 band is preferred over the pairing of TC4 with the SWIR1 band, as suggested by (1) the larger weight magnitude of 0.2773 for the SWIR2 band in TC4 compared to 0.2013 for the SWIR1 band, and (2) the established equivalence in cloud and snow separation using either the SWIR1 or SWIR2 band. The work presented in this paper utilizes the combined potential of TC4 and SWIR2 in an SVM framework to develop a classification algorithm for screening clouds, snow, water, and ground (vegetation/land cover) in Landsat 8 images. The rest of the paper is organized as follows. Section 2 presents the data and methods, Section 3 describes the results and discussion, and Section 4 outlines the summary and future work.
Fig. 1. Locations of the Landsat 8 biomes on the world map. The red stars represent the ten different biomes used in the training of STmask. From left to right in longitude, the locations represent: Illinois Great Plain (path/row: 23/32), Michigan Freshwater (path/row: 22/31), Virginia Deciduous Forest (path/row: 17/34), Amazon Rainforest (path/row: 232/62), Greenland Tundra (path/row: 9/13), Greenland Icesheet (path/row: 33/1), Zambia Savanna (path/row: 174/69), Sahara Desert (path/row: 178/43), Siberia Taiga (path/row: 150/19), and Bay of Bengal Ocean (path/row: 141/48).
2. Methods
2.1. Data

Landsat 8 scenes are structured in the USGS database based on the path/row of the WRS-2 reference coordinates. The training data set used in this work comprises 32 scenes spanning ten different biomes of the world. Fig. 1 shows the location of these ten biomes (referenced to path/row coordinates) on the world map. The biome locations in Fig. 1 correspond to Virginia Deciduous Forest (path/row: 17/34), Amazon Rainforest (path/row: 232/62), Illinois Great Plain (path/row: 23/32), Zambia Savanna (path/row: 174/69), Sahara Desert (path/row: 178/43), Bay of Bengal Ocean (path/row: 141/48), Michigan Freshwater (path/row: 22/31), Greenland Tundra (path/row: 9/13), Siberia Taiga (path/row: 150/19), and Greenland Icesheet (path/row: 33/1). These biomes were selected to include all possible extremes of vegetation, weather and seasons (from desert to ice sheets). This diversity in training data satisfies an important property of having unbiased data for training discriminative machine learning algorithms such as SVM. The 32 images used in this study are summarized in Table 2. As listed in Table 2, 23 images from the Virginia scene cover all seasons over an entire year, whereas the remaining 9 scenes, one from each of the 9 other biomes, add spatial diversity of cloud cover, vegetation and weather. The reference data set of 12891 pixels is created by randomly sampling pixels from each scene and manually classifying the pixels for all 32 images. The class labels used for the reference data classification are cloud (C), ground (G), snow (S), shadow (D), and water (W). Among these classes, it must be noted that the ground class represents vegetation, land cover and clear pixels, whereas the shadow class represents pixels with shadows from land features and clouds (dark, almost black, in appearance in the composite color image).

The training data set consists of 8211 pixels from all multi-temporal Virginia scenes combined, plus 520 pixels from each of the remaining 9 scenes from 9 different biomes. Cumulatively this sums to 12891 pixels for algorithm training. For testing and accuracy assessment we use the Landsat 8 Cloud Cover Assessment Validation Data (https://landsat.usgs.gov/landsat-8-cloud-cover-assessment-validation-data), which consists of a total of 96 Landsat 8 OLI/TIRS Level-1T scenes, with 12 scenes from each of 8 biomes (Barren, Forest, Grass/Crops, Shrubland, Snow/Ice, Urban, Water and Wetlands). We generate 10000 pixels uniformly at random from each of the 96 scenes, resulting in 960000 total pixels with 120000 pixels for each of the 8 biomes. For these 960000 pixels, a manual ground truth mask provided by USGS serves as our reference data set for testing.
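Drawing 10000 test pixels per scene uniformly at random, as described above, can be sketched as follows. This is an illustrative Python sketch, not the authors' code; the scene dimensions used here are an assumption, not the actual Landsat tile size.

```python
import random

def sample_pixels(n_rows, n_cols, k, seed=42):
    """Sample k distinct pixel (row, col) positions uniformly at random from a scene grid."""
    rng = random.Random(seed)
    # Sample without replacement over the flattened grid, then unflatten.
    flat = rng.sample(range(n_rows * n_cols), k)
    return [(i // n_cols, i % n_cols) for i in flat]

# Hypothetical scene size; one call per scene yields the per-scene test sample.
pixels = sample_pixels(n_rows=7000, n_cols=7000, k=10000)
```

Sampling without replacement guarantees 10000 distinct pixel locations per scene.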
2.2. Machine learning technique: Multi-class SVM

SVM is a binary classifier in the family of supervised learning models (Cortes and Vapnik, 1995). A multi-class SVM is commonly solved with either a "one vs. rest" strategy or a "pairwise" strategy. In the "one vs. rest" case, for an n-dimensional input x_i (i = 1, 2, ..., N) with labels y_i associated with M classes, we construct a set of M binary classifiers f_1, f_2, ..., f_M, where each classifier is trained to differentiate one class from the rest with maximum margin. The decision function g_j(x) for the jth class is given as:
g_j(x) = w_j · x + b_j    (2)
where w_j is the n-dimensional weight vector and b_j is a scalar. g_j(x) returns a signed real value which can be interpreted as the distance from the separating hyperplane to the input vector x. The optimal hyperplane for the jth class is given by the condition:
g_j(x) = 0, where j = 1, 2, ..., M    (3)
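The one-vs.-rest decision rule built from Eqs. (2)–(3) can be made concrete with a small numerical sketch in Python (the paper's implementation is in R). The weight vectors below are hypothetical toy values in the 2-D (TC4, SWIR2) feature space, not a trained model.

```python
def g(x, w, b):
    """Decision value g_j(x) = w_j . x + b_j (Eq. 2); its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical one-vs.-rest classifiers over (TC4, SWIR2):
classifiers = {
    "cloud":  ([-4.0, 3.0], -0.5),   # fires on large negative TC4 and high SWIR2
    "snow":   ([-4.0, -3.0], -0.7),  # large negative TC4 but low SWIR2
    "ground": ([4.0, 0.0], 0.2),     # small-magnitude TC4
}

x = (-0.3, 0.25)  # a cloud-like pixel
scores = {c: g(x, w, b) for c, (w, b) in classifiers.items()}
label = max(scores, key=scores.get)  # assign the class with the largest g_j(x)
```

The final `max` step is the arg max assignment discussed below for inputs where more than one g_j(x) is positive.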
A standard Lagrangian optimization technique is generally used to find the optimal hyperplane, which allows us to express the optimal hyperplane solution as a linear combination (using scalars α_i) of the training vectors (y_i, x_i) as (Cortes and Vapnik, 1995):

g_j(x) = Σ_{i=1}^{N} α_i^j y_i (x_i · x) + b_j, where j = 1, 2, ..., M    (4)
Here, we note that only the training vectors x_i with α_i^j > 0 provide an effective contribution to this sum. These input training vectors x_i (generally a subset of the training data) are called the support vectors (as they
"support" the position of the separating hyperplane). If the training data is linearly separable, then an input vector x is classified into class j if it satisfies g_j(x) > 0. If the input vector x satisfies g_j(x) > 0 for more than one class j, it is assigned to the class whose distance g_j(x) is the largest, that is, the class for which:

arg max_{j = 1, 2, ..., M} g_j(x)    (5)

If the training data is not linearly separable, it is converted to a linearly separable feature space through a non-linear transform function called a kernel. Polynomial and radial-basis functions are the most commonly used non-linear kernels. The same steps shown in equations (2)–(5) are then applied in the transformed feature space (which is now linearly separable), which generally results in optimal hyperplanes that are non-linear in the original training feature space. Another way of solving the multi-class SVM problem is by using "pairwise" classification. Here, for M classes, we have M(M − 1)/2 classifiers (i.e., hyperplanes). It uses the same approach given by equations (2)–(4) to find the optimal hyperplanes g_j(x) between each pair of classes (i.e., j = 1, 2, ..., M(M − 1)/2). Any given input vector x is classified by applying each of the pairwise classifiers and counting the number of assignments to a particular class. The class with the maximum number of assignment counts is considered the class label for x. The pairwise classification (also called "one against one") approach is shown to reduce unclassified data, resulting in slightly improved accuracy (Pal, 2005).

Table 2
List of biome-specific scenes used for the training of STmask.

Biome                                           Landsat 8 Scene ID
Virginia Deciduous Forest (Path/Row: 17/34)     LC08_L1TP_017034_20130622_20170309_01_T1
                                                LC08_L1TP_017034_20130724_20170309_01_T1
                                                LC08_L1TP_017034_20130809_20170309_01_T1
                                                LC08_L1TP_017034_20130825_20170309_01_T1
                                                LC08_L1TP_017034_20130910_20170309_01_T1
                                                LC08_L1TP_017034_20130926_20170308_01_T1
                                                LC08_L1GT_017034_20131012_20170308_01_T2
                                                LC08_L1TP_017034_20131028_20170308_01_T1
                                                LC08_L1TP_017034_20131113_20170307_01_T1
                                                LC08_L1TP_017034_20131129_20170307_01_T1
                                                LC08_L1TP_017034_20131215_20170307_01_T1
                                                LC08_L1TP_017034_20131231_20170307_01_T1
                                                LC08_L1TP_017034_20140116_20170307_01_T1
                                                LC08_L1TP_017034_20140201_20170307_01_T1
                                                LC08_L1TP_017034_20140217_20170307_01_T2
                                                LC08_L1TP_017034_20140305_20170306_01_T1
                                                LC08_L1TP_017034_20140321_20180130_01_T1
                                                LC08_L1TP_017034_20140406_20170307_01_T1
                                                LC08_L1GT_017034_20140422_20170306_01_T2
                                                LC08_L1TP_017034_20140508_20170307_01_T1
                                                LC08_L1TP_017034_20140524_20170306_01_T1
                                                LC08_L1TP_017034_20140609_20170305_01_T1
                                                LC08_L1TP_017034_20140625_20170304_01_T1
Amazon Rainforest (Path/Row: 232/62)            LC08_L1GT_232062_20151105_20170402_01_T2
Illinois Great Plain (Path/Row: 23/32)          LC08_L1GT_023032_20161030_20170219_01_T2
Zambia Savanna (Path/Row: 174/69)               LC08_L1TP_174069_20160203_20170330_01_T1
Sahara Desert (Path/Row: 178/43)                LC08_L1TP_178043_20150111_20170415_01_T1
Bay of Bengal Ocean (Path/Row: 141/48)          LC08_L1TP_141048_20130915_20170502_01_T1
Michigan Freshwater (Path/Row: 22/31)           LC08_L1TP_022031_20160905_20170222_01_T1
Greenland Tundra (Path/Row: 9/13)               LC08_L1TP_009013_20150722_20170406_01_T1
Siberia Taiga (Path/Row: 150/19)                LC08_L1TP_150019_20160330_20170327_01_T2
Greenland Icesheet (Path/Row: 33/1)             LC08_L1TP_033001_20150916_20170404_01_T1

2.3. Feature selection

To benchmark the SVM feature selection, we use the rminer library in R, which evaluates the relative importance of the Landsat 8 multispectral bands and TC4 on the classification accuracy of the SVM. The sensitivity method queries the fitted SVM model using a range of values for a given training feature, and obtains a set of sensitivity responses. The importance of that feature is then quantified using the average absolute deviation (AAD) from the median value of the SVM responses, as described in detail by Cortez and Embrechts (2013). We use the AAD measure instead of measures such as variance or gradient because it is robust to the effect of outliers present in the Landsat 8 band reflectances. The feature set consists of 10 spectral bands and TC4 with random samples from the training dataset. As these 11 features exhibit correlated variation, we allow all features to vary simultaneously during the evaluation of individual feature importance. The relative importance of each feature is then calculated as a fraction of the summed importance measures of all features. Such a sensitivity test (with feature interactions) is implemented using the global data-based sensitivity analysis (GSA) method of the importance() function in the rminer library. Table 1 presents the relative feature importance (in %) obtained from this data-based sensitivity test. It is clear that TC4 and SWIR2 have by far the highest importance for our classification problem, which further supports the physical reasons discussed in Section 1 for selecting TC4 and SWIR2 to train the SVM model.
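The pairwise ("one against one") voting scheme described in Section 2.2 reduces to counting wins across the M(M − 1)/2 binary classifiers. The following Python sketch is schematic: the decision rule standing in for the trained pairwise SVMs is a hypothetical toy function.

```python
from itertools import combinations

def pairwise_classify(x, classes, decide):
    """decide(a, b, x) returns the winner (a or b) of the binary classifier for pair (a, b).
    The class collecting the most votes over all M(M-1)/2 pairs is the label for x."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    return max(votes, key=votes.get)

# Hypothetical pairwise rule in the (TC4, SWIR2) plane, standing in for trained SVMs:
def toy_decide(a, b, x):
    tc4, swir2 = x
    score = {"cloud": -tc4 + swir2, "snow": -tc4 - swir2, "ground": tc4}
    return a if score[a] >= score[b] else b

label = pairwise_classify((-0.3, 0.25), ["cloud", "snow", "ground"], toy_decide)
```

For the five STmask classes (M = 5) this voting runs over 10 pairwise hyperplanes.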
2.4. Algorithm training

In this work, we used Top of Atmosphere (TOA) scenes obtained from the USGS ESPA database. TC4 was then calculated for all training pixels in each of the 32 scenes using equation (1). The aggregate training pixels, comprising the TOA reflectance of SWIR2 (band 7) and TC4 for all 32 scenes, were then combined and plotted in the feature space of TC4 and SWIR2. Fig. 2a represents this aggregate feature space. Every pixel is color coded with respect to the reference data class label (cyan for cloud, light pink for snow, white for ground, dark pink for water, and light cyan for shadows). From Fig. 2, we clearly observe that the cloud, ground, snow, shadow and water features form well-separated groupings (clusters). This is consistent with the knowledge used for deciding the best cues for the feature space, and as a result our classification task is simplified (reducing complexity). The next step is to decide the separation boundaries for mapping the pixels to the respective class labels. To achieve this, we use the popular classification technique of multi-class SVM. In this work we use the "pairwise" classification approach (as described in Section 2.2) through the SVM implementation in version 1.6-8 of the e1071 library in R. We train the multi-class SVM on the aggregate training data set shown in Fig. 2a.

Table 1
Feature sensitivity of SVM.

Feature    Band name    % Relative importance
Band 1     Coastal      10.11
Band 2     Blue         6.19
Band 3     Green        6.56
Band 4     Red          7.10
Band 5     NIR          7.69
Band 6     SWIR1        10.81
Band 7     SWIR2        17.88
Band 9     Cirrus       7.41
Band 10    TIRS1        0.52
Band 11    TIRS2        3.42
TC4                     22.31
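The AAD-based sensitivity measure of Cortez and Embrechts (2013) used in Section 2.3 can be illustrated with a toy model in Python. Note the caveat: the paper uses rminer's GSA variant, which lets all features vary simultaneously; the sketch below is the simpler one-feature-at-a-time variant, with a hypothetical model, shown only for intuition.

```python
def aad_importance(model, baseline, feature_idx, values):
    """Average absolute deviation of model responses from their median,
    as one feature sweeps over `values` while the rest stay at `baseline`."""
    responses = []
    for v in values:
        x = list(baseline)
        x[feature_idx] = v
        responses.append(model(x))
    responses.sort()
    n = len(responses)
    med = responses[n // 2] if n % 2 else 0.5 * (responses[n // 2 - 1] + responses[n // 2])
    return sum(abs(r - med) for r in responses) / n

# Toy model heavily driven by feature 0, barely by feature 1 (hypothetical).
model = lambda x: 5.0 * x[0] + 0.1 * x[1]
sweep = [0.0, 0.25, 0.5, 0.75, 1.0]
imp0 = aad_importance(model, [0.5, 0.5], 0, sweep)
imp1 = aad_importance(model, [0.5, 0.5], 1, sweep)
# Relative importance as a fraction of the summed measures (cf. Table 1's % column):
rel0 = imp0 / (imp0 + imp1)
```

The dominant feature collects nearly all of the relative importance, mirroring how TC4 and SWIR2 dominate Table 1.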
Fig. 2. TC4 vs SWIR2 feature space representing training dataset (panel a) and the SVM generated feature-classes (panel b). The reference data pixels as well as the respective SVM classes are color coded as cloud (cyan), ground/clear (white), water (dark pink), snow (light pink) and shadow (light cyan).
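Section 2.4 trains the SVM with a radial-basis kernel, which implicitly maps the 2-D (TC4, SWIR2) space to one where the classes are linearly separable. The kernel function itself is compact; a minimal Python sketch (the gamma value is an arbitrary assumption, not the tuned parameter from this paper):

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """k(u, v) = exp(-gamma * ||u - v||^2); replaces the dot product (x_i . x) in Eq. (4)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

same = rbf_kernel((0.2, -0.1), (0.2, -0.1))  # identical points give similarity 1.0
far = rbf_kernel((0.2, -0.1), (5.0, 5.0))    # distant points give similarity near 0.0
```

Because similarity decays with distance, the resulting decision boundaries are non-linear in the original feature space, as required here.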
matrix), specificity, and kappa (Cohen, 1960; Foody, 2004). In addition, the F-measure, which combines sensitivity and precision into a single class-specific accuracy, was calculated with equal weighting for both precision and sensitivity (β = 1) (Sokolova et al., 2006; Brooks et al., 2017). Comparison of the kappa coefficients for STmask and CFmask provides an estimate of the magnitude of difference between the two algorithms and has often been the basis for technique comparisons (Foody, 2004). However, as we use the same reference test data set for calculating the kappa coefficient of STmask and CFmask, the requirement of independent samples underlying the kappa statistic is not satisfied (Cohen, 1960; Foody, 2004). As a result, we cannot use the comparison of kappa statistics to evaluate the statistical significance of difference in the algorithm accuracy. For such related samples in our case, McNemar's test provides a robust non-parametric test for assessing the statistical significance of differences (Bradley, 1968). Hence, instead of comparing kappa statistics we calculate the McNemar's test statistics for each biome to quantify the statistical significance of the observed performance differences in STmask and CFmask.
As the decision boundaries between classes in our work are non-linear in nature, we train the SVM model with a radial-basis kernel function and tune the SVM (using the built-in R optimizer) to generate a hyperplane with maximum separation of classes without over-fitting (Cortes and Vapnik, 1995). The resulting feature space segmentation achieved by the SVM is represented in Fig. 2b with the cloud class as C (cyan), ground class as G (white), shadow class as D (light cyan), snow class as S (light pink), and water class as W (dark pink). The resulting support vectors and data classes are denoted in Fig. 2a by crosses and circles, respectively, and color coded for each class. We thus get a trained algorithm for classification with best possible match with the reference training data as seen from the overlap of reference data pixels in Fig. 2a and class labels in Fig. 2b. As the feature mask is based on SWIR and TC4 feature space, we name it as STmask. A small source of uncertainty in the final model comes from any possible errors in manual creation of the training data set. However, the use of an extensive training data set of all classes serves to minimize this effect. 2.5. Algorithm implementation
3. Results and discussion The complete algorithm is described by the flowchart shown in Fig. 3. We start with the TOA processed scenes obtained from USGS ESPA database for a given scene location and time. The six TOA bands (blue, green, red, NIR, SWIR1 and SWIR2) are combined into a composite image. TC4 is computed on this composite image using equation (1), resulting in another raster layer. The trained SVM algorithm (STmask) described in Section 2.2 is then applied to the composite image comprised of SWIR2 and TC4 rasters to classify each pixel into one of five classes. The resulting classification produces a feature mask for the entire Landsat 8 image and uses the feature specific color coding (brown for ground/clear pixels, grey for clouds, black for shadows, cyan for snow and blue for water) for representation of the classified features. This is the final product of STmask and serves as the feature detection filter on any given Landsat 8 image. As CFmask is the operational algorithm for feature mask generation in Landsat data product, the performance of STmask is evaluated in comparison to CFmask. As a result, accuracy assessment is performed for both STmask and CFmask using test pixels for each of the five classes across all 96 scenes. It is important to note here that the main focus of STmask is to improve cloud detection performance compared to the rest of the features (ground, cloud, snow and water). Hence, McNemar's statistic is calculated to quantify the differences in performance relative to the CFmask algorithm for both overall feature detection and cloud detection.
2.6. Accuracy assessment

Accuracy assessment was conducted using standard methods resulting in measures of sensitivity (producer's accuracy), precision (user's accuracy), overall accuracy (diagonal accuracy in the contingency table), specificity, F-measure, and the kappa coefficient (Table 3).

Fig. 3. Flowchart representing training, implementation and validation of the STmask algorithm.

The STmask algorithm was implemented on one scene from each biome in the test data set to generate a classified image with feature masks. For each scene, the corresponding CFmask classified image was obtained from the pixel quality assessment data product at ESPA. To detect differences in the classified features, a difference mask image was generated by subtracting the feature masks produced by the CFmask and STmask algorithms. As the focus of STmask is on improved cloud detection, the color coding of the difference mask denotes pixels with cloud overestimation by CFmask in red, cloud underestimation by CFmask in orange, and all other feature differences clubbed together in green. Fig. 4 presents the feature mask for every biome-specific scene generated by STmask and CFmask along with their difference masks and the original Landsat 8 TOA images. From Fig. 4, we qualitatively observe that STmask accurately captures most of the detailed features in the original Landsat 8 images. As seen from the red features in the difference masks of all biomes except barren and snow/ice, CFmask overestimates cloud cover at the expense of precision, whereas STmask is more precise in the detection of cloud cover. In the barren and snow/ice biomes, the orange features in the difference mask indicate cloud features missed by CFmask but detected by STmask, as evident from visual comparison with the original Landsat image. CFmask is known to underestimate clouds when the underlying surface is hot and bright, as for sand dunes in a desert (barren) biome (Foga et al., 2017). As a result, the improved sensitivity of STmask to cloud cover in the barren biome is a major advantage. CFmask was also found by Foga et al. (2017) to have high omission errors, resulting in underestimation of cloud cover (low sensitivity), when the underlying features are bright and very cold, such as snow/ice cover, because of the similarity of the visible band signatures of snow/ice and clouds. Such conditions exist in scenes from the snow/ice biome, where CFmask clearly underestimates cloud cover over the icesheet whereas STmask provides higher sensitivity as well as increased precision in detecting clouds (denoted by the red and orange features in the difference mask). This points to another notable improvement of STmask over CFmask. CFmask is also prone to overestimating clouds over water (Foga et al., 2017). Such overestimation is observed in the water biome scenes, where STmask delivers more precise detection of clouds and increased sensitivity to water, thus addressing another important challenge faced by CFmask. For scenes with cloud cover over vegetation (forests, grass/crops, shrublands, wetlands and urban), STmask again provides improved precision in cloud detection over CFmask (evident from the red features in the difference masks of the respective scenes). Although less pronounced than in the barren, water, and snow/ice biomes, the improvement in these cases is notable, and it addresses in part the challenge of cloud overestimation by CFmask over bright urban targets described by Foga et al. (2017). The extensive set of training pixels developed for this study is likely an important driver of this improved cloud detection precision. Although STmask has slightly lower sensitivity to clouds than CFmask, it jointly maximizes both precision and sensitivity of cloud detection, leading to a higher overall accuracy in the detection of cloud cover. The detection of snow/ice, water, and ground/clear pixels with STmask is accurate and comparable to CFmask. However, the major differences (indicated by the green features) in the difference masks comprise shadow features detected by CFmask but missed by STmask. The object detection approach of CFmask for shadows is certainly more effective. Fewer training pixels for the shadow class, along with ambiguity errors in the manual labels within the reference data table, are the most likely reasons for the average performance in shadow detection. As seen in Fig. 2, shadows and water form the smallest sets of training pixels, which likely limits the improvement in their detection. Although the focus of STmask has been on improved cloud detection, an extended training data set for shadow and water pixels could improve the classification of these features; this is a direction for future work. Table 3 presents the accuracy assessment of both STmask and CFmask using 120000 test pixels from the respective biome-specific scenes. As the ground truth mask in the USGS Landsat 8 validation data set only labels cloud, clear, and cloud shadow pixels, we include the accuracy assessment for each of these three features for all 8 biomes. As the STmask classification considers both thin and thick clouds together as clouds, we combine the "thin cloud" and "cloud" data in the truth mask and CFmask for consistent evaluation of cloud feature accuracy.
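The classification step described here — an SVM with a non-linear (RBF) kernel on the two-dimensional TC4 versus SWIR2 feature space — can be sketched as follows. This is a minimal illustration using scikit-learn's `SVC` and synthetic feature values; it is not the paper's trained model, and the class locations in feature space are assumptions for demonstration only.

```python
# Illustrative sketch: SVM with an RBF kernel on a 2-D (SWIR2, TC4) feature
# space. Feature values and class labels are synthetic stand-ins, not the
# paper's 12891 training pixels.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic (SWIR2, TC4) samples; the cluster locations are assumed purely
# for illustration of a separable two-class problem.
cloud = rng.normal(loc=[0.25, 0.30], scale=0.03, size=(50, 2))
clear = rng.normal(loc=[0.08, 0.05], scale=0.03, size=(50, 2))
X = np.vstack([cloud, clear])
y = np.array([1] * 50 + [0] * 50)  # 1 = cloud, 0 = clear

# Non-linear kernel SVM, as in the STmask approach
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Classify two new pixels: one cloud-like, one clear-like
print(clf.predict([[0.26, 0.31], [0.07, 0.04]]))  # expected: [1 0]
```

In practice the trained decision boundary would be applied per pixel to the TC4 and SWIR2 layers of each scene to produce the feature mask.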
Fig. 4. Comparison of feature masks generated by STmask and CFmask for all biomes in the test data set. The feature masks are color coded as cloud (grey), ground (brown), water (blue), snow (cyan) and shadow (black). The color coding of the difference mask represents the pixels with cloud overestimation by CFmask in red, cloud underestimation by CFmask in orange, matching in both CFmask and STmask as white, and all other feature differences clubbed together in green.
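The difference-mask construction described above (disagreement pixels colored by whether CFmask over- or underestimates cloud) can be sketched with NumPy as follows. The integer class codes and the output encoding are hypothetical assumptions for illustration; they are not the paper's actual mask encoding.

```python
import numpy as np

CLOUD = 4  # assumed class code for "cloud"; illustrative only


def difference_mask(cfmask, stmask, cloud=CLOUD):
    """Color-code disagreements between two feature masks.

    Output codes (illustrative): 0 = agreement (white),
    1 = cloud overestimation by CFmask (red),
    2 = cloud underestimation by CFmask (orange),
    3 = all other feature differences (green).
    """
    diff = np.zeros(cfmask.shape, dtype=np.uint8)
    disagree = cfmask != stmask
    # CFmask labels cloud where STmask does not -> overestimation
    over = disagree & (cfmask == cloud)
    # STmask labels cloud where CFmask does not -> underestimation
    under = disagree & (stmask == cloud)
    other = disagree & ~over & ~under
    diff[over], diff[under], diff[other] = 1, 2, 3
    return diff


cf = np.array([[4, 4], [1, 2]])
st = np.array([[4, 1], [4, 2]])
print(difference_mask(cf, st))  # -> [[0 1] [2 0]]
```

A color lookup table mapping codes 0–3 to white/red/orange/green would then render the figure.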
Table 3
Accuracy assessment of STmask and CFmask. All values are percentages except kappa; NA indicates not available.

                            STmask                          CFmask
Biome / Scene  Parameter    Ground   Cloud   Shadow         Ground   Cloud   Shadow

Barren         Precision    79.1     74.3    NA             81.2     78      NA
               Sensitivity  38       93.6    NA             65.3     82.2    NA
               Specificity  91.5     52.8    NA             87.3     80.5    NA
               Overall      67       69.9    NA             77.2     83.8    NA
               F-measure    51.3     84.9    NA             72.3     80.1    NA
               Kappa        0.31     0.62    NA             0.53     0.57    NA

Forest         Precision    81.6     85.4    32.3           86       80.5    25.5
               Sensitivity  63.2     83.8    57.1           60.1     91.7    38.2
               Specificity  91.3     84.7    95.7           94.1     76.2    96.1
               Overall      80.7     84.2    94.3           81.2     84.2    94.1
               F-measure    71.2     84.6    41.3           70.7     85.7    30.5
               Kappa        0.57     0.68    0.38           0.56     0.68    0.28

Grass/Crops    Precision    71.8     66.1    14.3           75.9     64.1    20
               Sensitivity  88.6     82.3    6.2            83.1     92.1    33.3
               Specificity  74.6     76.7    99.4           80.8     71.4    98.0
               Overall      80.5     78.7    97.9           81.8     78.8    96.9
               F-measure    79.3     73.3    8.7            79.3     75.5    24.9
               Kappa        0.61     0.65    0.09           0.63     0.67    0.24

Shrubland      Precision    82.4     70.7    3.6            73.5     67.9    6.8
               Sensitivity  76.4     76.2    12.5           75.6     70.8    37.5
               Specificity  85       76.7    97.3           74.8     75.4    95.9
               Overall      80.9     76.5    96.7           75.2     73.5    95.4
               F-measure    79.3     73.3    5.6            74.5     69.3    11.5
               Kappa        0.61     0.72    0.03           0.50     0.65    0.08

Snow           Precision    NA       72      NA             NA       71.4    NA
               Sensitivity  NA       74.4    NA             NA       49      NA
               Specificity  NA       33.4    NA             NA       55      NA
               Overall      NA       45.5    NA             NA       53.2    NA
               F-measure    NA       73.8    NA             NA       58.1    NA
               Kappa        NA       0.73    NA             NA       0.61    NA

Urban          Precision    75       78.1    NA             81.7     70.2    NA
               Sensitivity  28.9     88.9    NA             39.6     93.5    NA
               Specificity  90.5     70.4    NA             91.3     71.8    NA
               Overall      59.8     78.1    NA             65.6     80.8    NA
               F-measure    41.7     82.2    NA             53.3     80.2    NA
               Kappa        0.19     0.64    NA             0.31     0.62    NA

Water          Precision    78.8     77.6    2.2            66.7     71.9    6.7
               Sensitivity  11.2     85.8    33.3           12.1     76.6    33.3
               Specificity  97.4     82.4    91.3           94.8     78.8    97.2
               Overall      57.5     83.8    90.9           56.4     78      96.8
               F-measure    19.6     81.5    4.1            20.5     74.1    11.1
               Kappa        0.09     0.77    0.02           0.07     0.71    0.08

Wetland        Precision    76.3     77.5    NA             79       73.7    NA
               Sensitivity  55.3     88.4    NA             53.8     94.5    NA
               Specificity  87.9     78.5    NA             89.9     72      NA
               Overall      74.5     83      NA             75       82.2    NA
               F-measure    64.1     82.6    NA             64       82.8    NA
               Kappa        0.46     0.66    NA             0.45     0.67    NA
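The per-class measures reported in Table 3 and the McNemar statistics in Table 4 follow standard definitions, and can be computed from a 2 × 2 contingency table and the discordant pair counts, respectively. A minimal self-contained sketch (synthetic counts, not the paper's test pixels; the McNemar form here omits the continuity correction, which matches χ² = Z²):

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard accuracy measures for one class from a 2x2 contingency table."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)       # user's accuracy
    sensitivity = tp / (tp + fn)     # producer's accuracy
    specificity = tn / (tn + fp)
    overall = (tp + tn) / n          # diagonal accuracy
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond that expected by chance
    pe = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (overall - pe) / (1 - pe)
    return precision, sensitivity, specificity, overall, f_measure, kappa


def mcnemar(b, c):
    """McNemar's test from the two discordant counts:
    b = pixels correct under algorithm 1 only,
    c = pixels correct under algorithm 2 only.
    Significance at the 95% level requires chi2 > 3.841 (|Z| > 1.96)."""
    chi2 = (b - c) ** 2 / (b + c)
    z = (b - c) / math.sqrt(b + c)
    return chi2, z


print(binary_metrics(tp=80, fp=10, fn=20, tn=90))
print(mcnemar(b=60, c=30))  # -> (10.0, 3.162...)
```

For example, `binary_metrics(80, 10, 20, 90)` gives overall accuracy 0.85 and kappa 0.70, and `mcnemar(60, 30)` gives χ² = 10.0 (Z ≈ 3.16), a significant difference at the 95% level.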
In line with our observations from Fig. 4, across all biomes STmask provides higher precision and generally improved overall accuracy of cloud detection compared to CFmask (comparable for the forest and grass/crops biomes). This is also evident from the typically higher F-measure values, which imply that STmask jointly maximizes precision and sensitivity, whereas CFmask appears to maximize sensitivity at the cost of a slight loss of precision. (For some change detection and image classification algorithms this is more feature than flaw, since undetected clouds can be worse than overdetected clouds.) For water biomes with large cloud cover, STmask provides higher sensitivity to water features than CFmask. This increased sensitivity is achieved through higher precision in the detection of cloud cover in such biomes compared with CFmask. Furthermore, in the barren and snow/ice biomes, STmask exhibits higher sensitivity to cloud cover and accounts for the clouds missed by CFmask. As seen in Table 3, the kappa coefficients for both STmask and CFmask are ≥ 0.6 for most biomes, indicating moderate to very good agreement between each algorithm's classification and the reference data (Cohen, 1960; Congalton and Mead, 1983).

McNemar test results are presented in Table 4. The McNemar test statistics of z-score (Z), chi-square (χ2), and probability (p) at the 95% confidence level are presented for overall feature detection and for cloud detection alone. Statistically significant change is indicated by p < 0.05 and χ2 > 3.841 (equivalent to Z > 1.96). From Table 4, the overall feature detection of STmask differs from that of CFmask at the 95% significance level. The improvement in cloud detection has the highest statistical significance for the snow/ice, barren, and water biomes, with moderate improvement for the shrubland and urban biomes and a more modest improvement for the wetland biome. The cloud detection performance of STmask is comparable to that of CFmask over the forest and grass/crops biomes, with no notable advantage at the 95% significance level. These performance differences are also reflected in the feature masks in Fig. 4 and the accuracy assessment in Table 3. Fig. 5 highlights the overall improvement in cloud detection performance of STmask compared to CFmask, as indicated by the increase in the average F-measure calculated as the mean of the scene-specific F-measures. Aggregated over all biomes, the average improvement in cloud detection over CFmask is ∼3.8%, which is modest but noteworthy considering that it addresses some of the challenges of the operational CFmask algorithm. The performance improvement in the overall classification achieved by the SVM presents a classic example of the suitability of SVMs for image classification when combined with feature reduction techniques such as the tasseled cap transformation. The generic nature and efficiency of the STmask algorithm also benefit from the fact that the SVM does not require assumptions about the training data and operates efficiently independent of the size of training samples (Liu et al., 2014; Bahari et al., 2014). Based on all results in Figs. 4 and 5, coupled with the performance evaluations in Tables 3 and 4, we observe that STmask achieves promising improvements in feature classification, both qualitatively and quantitatively, given its simple but novel approach. The reference STmask model in R is accessible at https://doi.org/10.6084/m9.figshare.8852231.v1 as an additional reference for the presented approach.

Table 4
McNemar's test.

Scene          Z (Overall)   χ2 (Overall)   p (Overall)   Z (Cloud)   χ2 (Cloud)   p (Cloud)
Barren         81.8          6704.6         0.0           98.1        9616.7       0.0
Forest         24.4          596.8          8e-132        2.7         7.3          0.005
Grass/Crops    17.9          323.3          2.7e-72       1.7         2.8          0.09
Shrubland      24.4          593.9          3.4e-131      30.8        953.6        2.2e-209
Snow           80.4          6469.5         0.0           45.9        2107.2       0.0
Urban          75.2          5659.7         0.0           25.8        669.1        1.6e-147
Water          34.3          1174.6         1.9e-257      52.4        2745.5       0.0
Wetland        26.1          684.3          7.7e-151      8.8         78.4         8.4e-19

Fig. 5. Aggregate F-measure for cloud detection calculated as an average of biome-specific F-measures summarized in Table 3. The dark grey and light grey bars represent the average F-measure for STmask and CFmask, respectively.

4. Summary and conclusions

A novel algorithm, STmask, is developed to automatically detect land cover features and clouds in Landsat 8 images using an SVM on the 2D feature space of SWIR2 and TC4. Based on a multi-biome accuracy assessment using Landsat 8 images, STmask provides improved precision and overall accuracy of cloud detection compared with the existing operational algorithm for Landsat (CFmask). Statistically significant improvements in overall feature detection as well as cloud detection are observed in the snow/ice, barren, water, urban, and shrubland biomes, addressing some of the known challenges faced by CFmask. As a standalone algorithm, STmask has promising scope for biome-independent applicability in generating feature masks from multispectral images of any sensor with a SWIR2 band comparable to Landsat 8 and from which TC4 can be calculated. As such, it is not specific to Landsat 8; Sentinel 2 MSI data, for example, could be used after preprocessing to address the spatial resolution mismatch between the VNIR and SWIR bands. Alternatively, owing to its simple approach, STmask could be used to augment cloud detection within the framework of CFmask. Future work involving a more extensive training data set could further improve the detection of shadow and water features in STmask and offer scope for large-scale implementation.

Acknowledgments

This work was supported in part by USGS Grant G12PC00073, "Making Multitemporal Work"; in part by the Virginia Agricultural Experiment Station and the McIntire-Stennis Program of NIFA, USDA (Project Number 1007054, "Detecting and Forecasting the Consequences of Subtle and Gross Disturbance on Forest Carbon Cycling"); and in part by the Remote Sensing IGEP program at Virginia Tech. The authors would like to thank Paige Williams and Lilly Potts from the Environmental Informatics program at Virginia Tech for their contribution in preparing the training data sets used in this effort; Dr. Evan Brooks from the Virginia Tech Department of Forest Resources and Environmental Conservation for useful discussions; Dr. Joseph Baker and Dr. J. Michael Ruohoniemi from the Department of Electrical and Computer Engineering at Virginia Tech for their support and advisory role; and USGS for making the Landsat archive freely available.
References

Bahari, N.I.S., Ahmad, A., Aboobaider, B.M., 2014. Application of support vector machine for classification of multispectral data. IOP Conference Series: Earth and Environmental Science 20, 012038. http://stacks.iop.org/1755-1315/20/i=1/a=012038.
Baig, M.H.A., Zhang, L., Shuai, T., Tong, Q., 2014. Derivation of a tasselled cap transformation based on Landsat 8 at-satellite reflectance. Remote Sens. Lett. 5, 423–431. https://doi.org/10.1080/2150704X.2014.915434.
Bradley, J., 1968. Distribution-Free Statistical Tests. Prentice-Hall, Englewood Cliffs, New Jersey, pp. 388.
Brooks, E.B., Thomas, V.A., Wynne, R.H., Coulston, J.W., 2012. Fitting the multitemporal curve: a Fourier series approach to the missing data problem in remote sensing analysis. IEEE Trans. Geosci. Remote Sens. 50, 3340–3353. https://doi.org/10.1109/TGRS.2012.2183137.
Brooks, E.B., Wynne, R.H., Thomas, V.A., Blinn, C.E., Coulston, J.W., 2014. On-the-fly massively multitemporal change detection using statistical quality control charts and Landsat data. IEEE Trans. Geosci. Remote Sens. 52, 3316–3332. https://doi.org/10.1109/TGRS.2013.2272545.
Brooks, E.B., Yang, Z., Thomas, V.A., Wynne, R.H., 2017. Edyn: dynamic signaling of changes to forests using exponentially weighted moving average charts. Forests 8.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37–46. https://doi.org/10.1177/001316446002000104.
Congalton, R., Mead, R.A., 1983. A quantitative method to test for consistency and correctness in photo-interpretation. Photogramm. Eng. Remote Sens. 49, 69–74.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20, 273–297. https://doi.org/10.1007/BF00994018.
Cortez, P., Embrechts, M.J., 2013. Using sensitivity analysis and visualization techniques to open black box data mining models. Inform. Sci. 225, 1–17. https://doi.org/10.1016/j.ins.2012.10.039.
Crist, E.P., 1985. A TM tasseled cap equivalent transformation for reflectance factor data. Remote Sens. Environ. 17, 301–306. https://doi.org/10.1016/0034-4257(85)90102-6.
Foga, S., Scaramuzza, P.L., Guo, S., Zhu, Z., Dilley, R.D., Beckmann, T., Schmidt, G.L., Dwyer, J.L., Hughes, M.J., Laue, B., 2017. Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sens. Environ. 194, 379–390. https://doi.org/10.1016/j.rse.2017.03.026.
Foody, G.M., 2004. Thematic map comparison. Photogramm. Eng. Remote Sens. 70.
Gao, B.C., Goetz, A.F.H., Wiscombe, W.J., 1993. Cirrus cloud detection from airborne imaging spectrometer data using the 1.38 μm water vapor band. Geophys. Res. Lett. 20, 301–304. https://doi.org/10.1029/93GL00106.
Gao, B.C., Yang, P., Han, W., Li, R.R., Wiscombe, W.J., 2002. An algorithm using visible and 1.38-μm channels to retrieve cirrus cloud reflectances from aircraft and satellite data. IEEE Trans. Geosci. Remote Sens. 40, 1659–1668. https://doi.org/10.1109/TGRS.2002.802454.
He, X.Y., Hu, J.B., Chen, W., Li, X.Y., 2010. Haze removal based on advanced haze-optimized transformation (AHOT) for multispectral imagery. Int. J. Remote Sens. 31, 5331–5348. https://doi.org/10.1080/01431160903369600.
Irish, R.R., Barker, J.L., Goward, S.N., Arvidson, T., 2006. Characterization of the Landsat-7 ETM+ automated cloud-cover assessment (ACCA) algorithm. Photogramm. Eng. Remote Sens. 72, 1179.
Kong, X., Qian, Y., Zhang, A., 2011. Haze and cloud cover recognition and removal for serial Landsat images.
Lavreau, J., 1991. De-hazing Landsat thematic mapper images. Photogramm. Eng. Remote Sens. 57, 1297–1302. https://ci.nii.ac.jp/naid/80006117919/en/.
Li, H., Zhang, L., Shen, H., Li, P., 2012. A variational gradient-based fusion method for visible and SWIR imagery. Photogramm. Eng. Remote Sens. 78, 947–958.
Liu, C., Hu, J., Lin, Y., Wu, S., Huang, W., 2011. Haze detection, perfection and removal for high spatial resolution satellite imagery. Int. J. Remote Sens. 32, 8685–8697. https://doi.org/10.1080/01431161.2010.547884.
Liu, Q., Guo, Y., Liu, G., Zhao, J., 2014. Classification of Landsat 8 OLI image using support vector machine with tasseled cap transformation. In: 2014 10th International Conference on Natural Computation (ICNC), pp. 665–669. https://doi.org/10.1109/ICNC.2014.6975915.
Liu, Q., Liu, G., 2010. Combining tasseled cap transformation with support vector machine to classify Landsat TM imagery data. In: 2010 Sixth International Conference on Natural Computation, pp. 3570–3572. https://doi.org/10.1109/ICNC.2010.5582727.
Melgani, F., Bruzzone, L., 2004. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 42, 1778–1790. https://doi.org/10.1109/TGRS.2004.831865.
Moro, G.D., Halounova, L., 2007. Haze removal for high resolution satellite data: a case study. Int. J. Remote Sens. 28, 2187–2205. https://doi.org/10.1080/01431160600928559.
Mountrakis, G., Im, J., Ogole, C., 2011. Support vector machines in remote sensing: a review. ISPRS J. Photogramm. Remote Sens. 66, 247–259. https://doi.org/10.1016/j.isprsjprs.2010.11.001.
Pal, M., 2005. Multiclass approaches for support vector machine based land cover classification.
Richter, R., 1996. Atmospheric correction of satellite data with haze removal including a haze/clear transition region. Comput. Geosci. 22, 675–681. https://doi.org/10.1016/0098-3004(96)00010-6.
Sokolova, M., Japkowicz, N., Szpakowicz, S., 2006. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Sattar, A., Kang, B.H. (Eds.), AI 2006: Advances in Artificial Intelligence. Springer, Berlin, Heidelberg, pp. 1015–1021.
Sun, L., Mi, X., Wei, J., Wang, J., Tian, X., Yu, H., Gan, P., 2017. A cloud detection algorithm-generating method for remote sensing data at visible to short-wave infrared wavelengths. ISPRS J. Photogramm. Remote Sens. 124, 70–88. https://doi.org/10.1016/j.isprsjprs.2016.12.005.
Vermote, E., Saleous, N., 2007. LEDAPS Surface Reflectance Product Description.
Xiao, Y., Ouyang, Z., Zhang, Z., Xian, C., 2017. A comparison of haze removal algorithms and their impacts on classification accuracy for Landsat imagery. BCG 23, 55–71. http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1982-21702017000100055&nrm=iso.
Zhang, Y., Guindon, B., Cihlar, J., 2002. An image transform to characterize and compensate for spatial variations in thin cloud contamination of Landsat images. Remote Sens. Environ. 82, 173–187. https://doi.org/10.1016/S0034-4257(02)00034-2.
Zhang, Y., Rossow, W.B., Lacis, A.A., Oinas, V., Mishchenko, M.I., 2004. Calculation of radiative fluxes from the surface to top of atmosphere based on ISCCP and other global data sets: refinements of the radiative transfer model and the input data. J. Geophys. Res. Atmos. 109. https://doi.org/10.1029/2003JD004457.
Zhu, G., Blumberg, D.G., 2002. Classification using ASTER data and SVM algorithms: the case study of Beer Sheva, Israel. Remote Sens. Environ. 80, 233–240. https://doi.org/10.1016/S0034-4257(01)00305-4.
Zhu, Z., Wang, S., Woodcock, C.E., 2015. Improvement and expansion of the Fmask algorithm: cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sens. Environ. 159, 269–277. https://doi.org/10.1016/j.rse.2014.12.014.
Zhu, Z., Woodcock, C.E., 2012. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 118, 83–94. https://doi.org/10.1016/j.rse.2011.10.028.
Zhu, Z., Woodcock, C.E., 2014. Automated cloud, cloud shadow, and snow detection in multitemporal Landsat data: an algorithm designed specifically for monitoring land cover change. Remote Sens. Environ. 152, 217–234. https://doi.org/10.1016/j.rse.2014.06.012.
Zhu, Z., Woodcock, C.E., Olofsson, P., 2012. Continuous monitoring of forest disturbance using all available Landsat imagery. Remote Sens. Environ. 122, 75–91. https://doi.org/10.1016/j.rse.2011.10.030.