Remote Sensing of Environment 182 (2016) 39–48
Mapping Solanum mauritianum plant invasions using WorldView-2 imagery and unsupervised random forests

Kabir Peerbhay ⁎, Onisimo Mutanga, Romano Lottering, Riyad Ismail

University of KwaZulu-Natal, School of Agricultural, Earth and Environmental Sciences, Discipline of Geography, P/Bag X01, Scottsville, 3209, Pietermaritzburg, South Africa
Article info

Article history: Received 17 July 2015; received in revised form 21 April 2016; accepted 30 April 2016; available online xxxx

Keywords: WorldView-2; Plant invasions; Unsupervised random forest; Proximity analysis; Eigenvector analysis
Abstract

The accurate detection and mapping of plant invasions is important for an effective weed management strategy in forest plantations. In this study, the utility of WorldView-2 was investigated to automatically map the occurrence of Solanum mauritianum (bugweed) found as an anomaly in forest margins, open areas and riparian zones. The unsupervised methodology developed proved to be an effective and accurate framework for detecting and mapping the invasive alien plant (IAP). Using the random forest (RF) proximity matrix, similarity measures between pixels were transformed into scores (Eigen weights) for each pixel using eigenvector analysis. Neighbourhood windows with minimum variance revealed the most important information from localized surrounding pixels to detect potential anomalous pixels. Bugweed occurrence in forest margins, open areas and riparian zones was mapped at accuracies of 91.33%, 85.08% and 67.90%, respectively. This research has demonstrated the capability of an automated unsupervised RF approach for mapping IAPs using new generation multispectral remotely sensed data. Crown Copyright © 2016 Published by Elsevier Inc. All rights reserved.
1. Introduction

Plant invaders are causing extensive economic and ecosystem damage in their introduced environments. They effectively capitalize on available natural resources, survive and rapidly reproduce unaided across landscapes (van Wilgen, Richardson, Le Maitre, Marais & Magadlela, 2001). Inherent competitive qualities provide these plant invaders with the capability to suppress and displace surrounding vegetation, while extensive densities of these weeds may replace canopy or sub-canopy layers of ecosystems (Olckers, 2011). Plant invaders can also increase their competitive ability by modifying their environment to induce ecological change. This includes the indirect transformation of certain ecosystem structures and functions such as goods and services, vegetation health and nutrient dynamics (Ustin, DiPietro, Olmstead, Underwood & Scheer, 2002). Irrespective of the constant eradication and management efforts to control weeds, their abundance and adverse impacts on our valuable resources are increasing (Müllerová, Pergl & Pyšek, 2013). Once weeds are established, removing or slowing down infestations can prove nearly impossible. Precise automated detection methods are therefore a prerequisite for weed management approaches to be cost effective while preserving the ecological integrity of our ecosystems. While traditional methods of detecting weed cover involved ground-based periodic surveys, the application of remote sensing to
⁎ Corresponding author. E-mail address: [email protected] (K. Peerbhay).
http://dx.doi.org/10.1016/j.rse.2016.04.025 0034-4257/Crown Copyright © 2016 Published by Elsevier Inc. All rights reserved.
identify and map plant invasions has become increasingly popular (Lawrence, Wood & Sheley, 2006; Andrew & Ustin, 2008; Atkinson, Ismail & Robertson, 2014). Very high spatial resolution aerial photography, although limited in extent, provided researchers with an adequate source of information, conditional on the visual recognition and the correct phenological timing of the target weed species (Anderson, Everitt, Escobar, Spencer, & Andrascik, 1996; Everitt, Escobar, Alaniz, Davis & Richerson, 1996; Müllerová, Pyšek, Jarošík & Pergl, 2005). However, studies utilising multispectral remote sensing have also been limited, since researchers have expressed concerns about the lack of spatial and spectral detail for the accurate detection of weeds (Carson, Lass & Callihan, 1995; Fuller, 2005). Multispectral analysis is potentially valuable for observing large weed stands that are distinct and not obscured by the backdrop of surrounding plant species (Lass et al., 2005; Huang & Asner, 2009). The recent advancement of space sensor technologies allows researchers to investigate the potential of the new generation of multispectral sensors (e.g., WorldView-2, RapidEye and GeoEye). With greater spatial resolutions and the strategic placement of wavebands across the electromagnetic spectrum, the detection and mapping of weeds could be improved (Everitt, Yang & Deloach, 2005). For instance, Wang, Silván-Cárdenas, Yang & Frazier (2013) integrated multi-temporal and multi-resolution imagery to differentiate invasive saltcedar from riparian vegetation. While the very high spatial resolution QuickBird imagery (0.61 m; 450 nm–900 nm) proved successful (82%), the hyperspectral AISA data (1 m; 430 nm–1000 nm) performed better (87.7%) when mapping the invasive weed. Support vector machines (SVM) produced the best classification results among the several supervised classification methods adopted.
K. Peerbhay et al. / Remote Sensing of Environment 182 (2016) 39–48
The application of hyperspectral remote sensing provides a more detailed spectral description for the accurate detection of invader plants (Ustin et al., 2002; Williams & Hunt, 2002; Glenn et al., 2005; Mundt et al., 2005; Lawrence et al., 2006; Asner, Jones, Martin, Knapp & Hughes, 2008; Andrew & Ustin, 2008; Atkinson et al., 2014). Although the technology can exploit the full electromagnetic spectrum to distinguish plant invaders and their compositions from the surrounding vegetation, its use is often hampered by numerous difficulties associated with data extraction and analysis (Peerbhay, Mutanga & Ismail, 2014a). The huge amount of spectral information can represent an oversampled dataset, resulting in high data dimensionality and redundant wavebands that may be irrelevant to detecting and mapping the species of interest (Peerbhay, Mutanga & Ismail, 2013). This study therefore investigates whether a few high spatial resolution multispectral wavebands placed across strategic portions of the spectrum (visible and near-infrared range) could serve as a viable alternative to hyperspectral imaging for weed detection and mapping. Detecting weed species with conventional remote sensing techniques is a challenging task. The high correlation among variables (multicollinearity), coupled with image noise and background effects, seems to limit the detection capability of multispectral techniques (van Coillie, Verbeke & De Wulf, 2007; Peerbhay, Mutanga & Ismail, 2014b). In addition, the subtle variation between invader plants and surrounding vegetation reduces the statistical separability between them, which makes it difficult to select a suitable algorithm. Most algorithms follow either a supervised or an unsupervised learning approach.
Supervised classification methods are commonly utilised in the remote sensing domain and the learning results are determined by a prescribed set of decisions that assign class membership to pixels based on user defined labels (Tarca, Carey, Chen, Romero & Drăghici, 2007). However, labelled classes may not actually match the information in reality since the predefined labels are based on field information (usually captured using a GPS) and hence subject to error (Tarca et al., 2007). Additionally, the classification of image data using supervised approaches is also highly dependent on the decisions of the analyst which could be subject to bias and potential human error. For instance, information captured during the training phase may not be truly representative of conditions encountered in an image. In contrast, information not captured and recognized in the training phase could also compromise the final classification results (Hegarat-Mascle, Bloch & Vidal-Madjar, 1997, Tarca et al., 2007). In this regard, unsupervised learning methods are suggested to overcome these concerns (Breiman & Cutler, 2003) by uncovering the natural groupings (i.e. clusters) within an image which is based on the inherent spectral properties present in the dataset. Unsupervised learning methods require no prior information to label classes and require fewer human decisions (Carson et al., 1995; Hegarat-Mascle et al., 1997). This minimizes potential errors within training samples and is suggested to be less time consuming. In general, unsupervised methods explore the data to discover similarities between pixels, which are then assigned class membership (Tarca et al., 2007). Similarities are usually based on a measure of distance between pixels and can be computed using a proximity matrix. In this study a novel way of calculating the nonlinear distances between pixels using the random forest (RF) algorithm is presented. 
RF is a multivariate ensemble method that combines multiple decision trees and uses the entire forest as a complex composite classifier. Individual trees are grown using different bootstrap samples while randomly selecting a subset of explanatory variables to determine the best split at each node in a tree. The final classification of a given sample is decided by the majority rule of individual tree votes (Breiman, 2001; Ismail & Mutanga, 2011). Many remote sensing studies have successfully investigated the utility of RF for supervised classification applications (Pal, 2005; Lawrence et al., 2006; Sesnie et al., 2010; Dye, Mutanga & Ismail, 2011, Ismail & Mutanga, 2011, Adam, Mutanga, Rugege & Ismail, 2012) with limited
focus on implementing RF within an unsupervised context (Peerbhay, Mutanga & Ismail, 2015). Nonetheless, researchers have shown the potential of using an unsupervised RF methodology in computer science (Zhang & Zulkernine, 2006; Zhang, Zulkernine & Haque, 2008), biology (Tarca et al., 2007) and human genetics (Shi & Horvath, 2006). It is within this context that this study evaluates an unsupervised classification approach based on the RF proximity matrix and WorldView-2 multispectral image data to automatically detect and map anomalous bugweed, found within various forest vegetation cover classes in a commercial plantation.

2. Methods and materials

2.1. Study species

Solanum mauritianum (bugweed) is an evergreen, noxious, bunched shrub that is native to the tropical and sub-tropical regions of South America (Olckers, 2011). Growing between 2 m and 10 m in height and with a lifespan of up to thirty years, bugweed is a major constituent of agricultural land, forestry plantations, water courses and disturbed environments (Copeland & Wharton, 2006; Olckers & Borea, 2009). A single plant is capable of self-pollination throughout the year and could produce around 250 berries, of which 98% are viable (van den Bosch et al., 2004). The weed is aggressive: it effectively captures natural resources and outcompetes other plants. The weed's resilience towards mitigation efforts has earned it a classification as a category one 'transformer' invader. Category one plants have the ability to colonize, dominate and replace vegetative layers of ecosystems, and land users are therefore legally bound not to establish, maintain or propagate the plant (Olckers, 2011). Through processes of nutrient sequestration and the production of toxic allelopathic chemicals (Huang & Asner, 2009), this invader suppresses the growth of other plants and alters the ecosystem.
Due to the widespread distribution of bugweed in South Africa, preventing its establishment in natural and semi-natural ecosystems is of national importance (DAFF, 2009). However, the lack of accurate spatial information, coupled with the unavailability of immediate detection techniques, delays effective management control efforts (Lass et al., 2005; Gray, Shaw & Bruce, 2009).

2.2. Study area

The study was conducted during February 2010 in the Sappi Hodgsons forest plantation (Fig. 1) in KwaZulu-Natal, South Africa (29° 13′18″ S; 30° 23′13″ E). The mean annual temperature is 15.9 °C and annual rainfall ranges between 730 mm and 1280 mm. Occupying an area of 6391 ha at an altitude range of 1030 m to 1590 m, the commercial forest plantation is extensively characterised by Eucalyptus, Pinus and Acacia species (Mucina & Rutherford, 2006). Natural vegetation in the study area is predominantly Ngongoni veld and Southern tall grassveld with sparse ground cover in the understory. The undulating landscape consists mainly of red and yellow apedal type subsoils and humic topsoils which are sometimes covered by short bunched grasslands (SFR, 1993). The weed predominantly occupies riparian zones, forest margins and open areas and occurs in low to high densities with frequent noticeable monospecific thickets (Copeland & Wharton, 2006; Olckers & Borea, 2009).

2.3. Image acquisition and processing

WorldView-2 is a new generation push broom imager designed for the acquisition of high resolution visible and near-infrared multispectral imagery. The sensor captures reflectance in eight wavebands across the 427 nm to 908 nm spectral range. Cloud-free WorldView-2 image data were acquired during summer on 17 February 2010 at 11:30 am. The sensor has a spatial resolution of 2 m and a swath width of 16.4 km at nadir (DigitalGlobe, 2010). The image used in this study was converted from
Fig. 1. The location of the study area and the WorldView-2 image (R: band 5, G: band 3, B: band 2) used in the unsupervised detection of anomalous bugweed.
radiance to surface reflectance using the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) (Adler-Golden et al., 1998) in ENVI 4.8 processing software (ENVI, 2010). Using a digital elevation model (DEM) developed from 5 m contours, the image was orthorectified and referenced to the Universal Transverse Mercator projection (36S) using the WGS-84 World Geodetic datum.
2.4. Field verification

A total of 90 bugweed observations were recorded using a high resolution (10 cm) colour aerial photograph. The aerial photograph was collected over the study area using a fixed wing light aircraft at a mean airborne GPS altitude of 2728 m. Using 20 m × 20 m sample plots, field verification was conducted to confirm the status and occurrence of the selected observations using a differentially corrected Trimble GeoXT handheld GPS receiver with an accuracy of < 2 m. Bugweed occurred in habitats that are highly susceptible to this plant invader, including riparian zones, open areas and forest margins. Subsequently, 30 bugweed observations were recorded in each of the representative forest vegetation cover classes, where a larger number of bugweed pixels were present in forest margins (n = 30; sum: 434; mean: 14%) compared to open (n = 30; sum: 339; mean: 11%) and riparian areas (n = 30; sum: 387; mean: 13%). The WorldView-2 image was then prioritized into subsets consisting of the three forest vegetation cover classes in which bugweed was present. Each subset was then used as an input dataset in the R Project for Statistical Computing for unsupervised learning utilising random forests (Development Core Team, 2012).

2.5. Statistical analysis

2.5.1. General overview of the unsupervised methodology

The proposed methodology investigates the capability of WorldView-2 multispectral image data to automatically detect and map the occurrence of bugweed in forest margins, open areas and riparian zones. The spectral contrast of the weed when present in the respective forest vegetation cover classes would facilitate the detection of bugweed pixels as anomalous from the other surrounding vegetation. Fig. 2 describes the automated unsupervised approach to detect and define these anomalies (i.e. bugweed) in each of the forest vegetation cover classes. The process entails calculating RF proximities for each pixel (i.e. proximity matrix) to extract similarity patterns from the various forest vegetation cover image subsets (n = 90). Eigenvector analysis then assigns Eigen weights to each pixel, calculated within various neighbourhood window sizes. Optimal neighbourhood window sizes are set by achieving the minimum variance, as they expose only the most valuable information from localized surrounding pixels. Finally, Anselin local Moran's I is applied for spatial clustering and anomaly detection (Peerbhay et al., 2015).

2.5.2. Random forest unsupervised learning

The unsupervised RF constructs a second artificial dataset based on the distribution of the original unlabelled data (Shi & Horvath, 2006). Pixels contained in the original data are labelled as one class while pixels in the artificially constructed dataset are labelled as another class (Wu, Lee, Wang & Abadir, 2007). The ensemble then differentiates between the two datasets, and the results can be obtained using the proximity matrix. The section below details the utility of the RF proximity matrix for unsupervised learning.

2.5.3. Random forest proximity matrix

The proximity matrix measures the similarity between each pixel and every other pixel in a dataset and is considered to be one of the important by-products of the RF algorithm (Xiao & Segal, 2009). Similar pixels should be in the same terminal node of a tree more often than dissimilar pixels. The similarity between two pixels is therefore measured by counting the number of trees in which they end up in the same terminal node. For instance, if pixels k and n are contained in the same terminal node of a tree, then their proximity is increased by 1; the total is then divided by the number of trees in the ensemble. Thus the proximity values between similar pixels would be far greater, whereas dissimilar pixels (potential anomalies) will record low proximity values (Liaw & Wiener, 2002). Based on the similarity
measure between pixels, the proximity matrix builds patterns within a dataset which provides an unsupervised automated methodology of detecting anomalous pixels and allows for the spatial clustering of the unlabelled classes.

2.5.4. Decomposition of the proximity matrix using eigenvector analysis

Identifying and detecting anomalies requires the transformation of information from the complex RF proximity matrix into one single proximity value for each pixel (Liu, Ting & Zhou, 2008). Since the proximity matrix captures information where every pixel in the dataset is compared to every other pixel, the decomposition of the matrix using eigenvector analysis (Triantaphyllou & Mann, 1990) provides a simple methodology that produces individual Eigen weights for each pixel in the respective image subsets. The computation of the Eigen values is defined below:

$$[A]\,W = n\,W \tag{1}$$
where [A] represents the proximity matrix (N × N) within a defined neighbourhood size and the rank [A] = 1, W is the eigenvector of matrix A and n is the respective eigenvalue. Since the maximum eigenvalue corresponds closely to n and all other eigenvalues to zero, the eigenvector that corresponds to the maximum eigenvalue is calculated. In order to obtain Eigen weights for each sample the values from the calculated eigenvector had to be normalized so that the sum of its elements was equal to one (Triantaphyllou & Mann, 1990) and is defined as follows:
$$w = \frac{\tilde{w}}{\sum_{i=1}^{n} \tilde{w}_i} \tag{2}$$

where $\tilde{w}$ represents the calculated eigenvector (W in Eq. (1)) and $\tilde{w}_i$ is its $i$-th element, so that the elements of $w$ sum to one. Subsequently, each pixel was assigned the newly derived Eigen weight that best explained its respective proximities. The Eigen weights were calculated on the proximity matrix within a moving window of a given size. These were used to determine the probability of a pixel being an anomaly based on its local surrounding pixels within a particular neighbourhood (Zhang, Jordan & Higgins, 2007).

2.5.5. Neighbourhood statistics

Neighbourhood windows calculate values for a pixel location by taking into account the data within the local neighbourhood of a given spatial location (Zhang et al., 2007). Because it uses local rather than global statistics, neighbourhood analysis can reveal more information that could aid in detecting anomalous pixels within the dataset (Zhang et al., 2007). Using a fixed neighbourhood window (e.g. 3 m × 3 m), eigenvectors are calculated on the proximity matrix to develop Eigen weights for each pixel in the image subsets. The window is then moved forward, excluding the pixels in its first column and including the pixels in the next column it encounters. As the window scans across the image it assigns a value (i.e. Eigen weight) to the center pixel based on the value of surrounding pixels within the neighbourhood. Since neighbourhood window sizes are variable and range from small to large, multiple sizes were calculated (Zhao, Wang & Jia, 2007) to determine the window that captures the most valuable information to accurately detect the occurrence of anomalous pixels in each vegetation cover class (i.e. forest margins, open and riparian areas). Various window sizes were computed (e.g. 3 m × 3 m, 5 m × 5 m, 7 m × 7 m, 9 m × 9 m, 11 m × 11 m, 13 m × 13 m, 15 m × 15 m, 17 m × 17 m, 19 m × 19 m, 21 m × 21 m) and assessed by using a minimal variance approach.
Window sizes were tested until the addition of larger windows yielded no further improvement.
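The proximity and Eigen weight steps of Sections 2.5.3 and 2.5.4 can be illustrated with a minimal Python sketch. It is not the implementation used in the study (which was carried out in R). It assumes that the terminal-node identifier of every pixel in every tree is already available; the table `leaf_ids`, the function names and the use of power iteration to approximate the eigenvector of the maximum eigenvalue are all assumptions made for illustration.

```python
def proximity_matrix(leaf_ids):
    """Build an RF-style proximity matrix.

    leaf_ids[t][i] is the (hypothetical) terminal-node id of pixel i in tree t.
    Two pixels gain 1 for every tree in which they share a terminal node;
    the count is then divided by the number of trees.
    """
    n_trees = len(leaf_ids)
    n_pixels = len(leaf_ids[0])
    prox = [[0.0] * n_pixels for _ in range(n_pixels)]
    for tree in leaf_ids:
        for i in range(n_pixels):
            for j in range(n_pixels):
                if tree[i] == tree[j]:
                    prox[i][j] += 1.0
    return [[value / n_trees for value in row] for row in prox]


def eigen_weights(matrix, iterations=200):
    """Approximate the principal eigenvector by power iteration.

    The vector is rescaled at every step so that its elements sum to one,
    which is the normalisation described by Eq. (2).
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [value / total for value in nxt]
    return w


# Hypothetical terminal-node ids for 4 pixels in 4 trees. Pixel 3 rarely
# shares a node with the other pixels, so it behaves like an anomaly.
leaf_ids = [
    [1, 1, 1, 2],
    [5, 5, 5, 7],
    [3, 3, 4, 4],
    [2, 2, 2, 9],
]
prox = proximity_matrix(leaf_ids)
weights = eigen_weights(prox)
```

In this toy case the weakly connected pixel receives the smallest Eigen weight. With a fitted forest, the leaf table could for example be taken from scikit-learn's `RandomForestClassifier.apply`, although that route is only a suggestion and was not part of the study.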
Fig. 2. The proposed framework for the unsupervised detection of anomalous bugweed located within commercial forestry using WorldView-2 multispectral data.
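The first stage of the framework, building the artificial contrast dataset for unsupervised RF (Section 2.5.2), might look like the sketch below. The text does not state how the artificial pixels are drawn; the sketch assumes each band is sampled independently from its observed values, which is one common construction for unsupervised RF. The function names and reflectance values are hypothetical.

```python
import random


def make_synthetic(pixels, rng):
    """Draw one synthetic pixel per observed pixel.

    Assumption: each band is sampled independently from the values observed
    in that band, which keeps the marginal distributions but breaks the
    between-band structure.
    """
    n_bands = len(pixels[0])
    columns = [[p[b] for p in pixels] for b in range(n_bands)]
    return [[rng.choice(columns[b]) for b in range(n_bands)] for _ in pixels]


def build_two_class_dataset(pixels, seed=0):
    """Label observed pixels as class 1 and synthetic pixels as class 2."""
    rng = random.Random(seed)
    synthetic = make_synthetic(pixels, rng)
    features = [list(p) for p in pixels] + synthetic
    labels = [1] * len(pixels) + [2] * len(synthetic)
    return features, labels


# Hypothetical reflectance values for 5 pixels in 3 bands.
pixels = [
    [0.05, 0.30, 0.45],
    [0.06, 0.31, 0.47],
    [0.04, 0.29, 0.44],
    [0.12, 0.22, 0.60],
    [0.05, 0.30, 0.46],
]
features, labels = build_two_class_dataset(pixels)
```

A forest trained to separate the two classes yields proximities for the observed pixels; only the class 1 rows are of interest afterwards.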
2.5.6. Minimal variance

Minimal variance (Marceau, Gratton, Fournier & Fortin, 1994) was used to indicate the most suitable window sizes with the least variance between pixels in each of the vegetation cover classes considered in this study (forest margins, open areas, riparian zones). Using the Eigen weights calculated for the WorldView-2 image subsets (n = 90), the variance of each window size was computed. The neighbourhood window that displays the least variance for each forest vegetation cover
class was selected to calculate Eigen weights for the respective image subsets and used in the cluster and anomaly detection analysis using Anselin local Moran's I.

2.5.7. Clustering and anomaly detection using Anselin local Moran's I

Anselin local Moran's I uses the spatial association between pixels to calculate a local Moran's I value, a z score, a p-value and a final label indicating the cluster type to which a pixel is assigned (Anselin, 1995). The Anselin local Moran's I statistic of spatial clustering was used to cluster the Eigen weights derived for pixels in each vegetation cover type and is defined as (Anselin, 1995):

$$I_i = \frac{z_i - \bar{z}}{\sigma^2} \sum_{j=1,\, j \neq i}^{n} w_{ij}\,(z_j - \bar{z}) \tag{3}$$
where $z_i$ is the Eigen weight of pixel $z$ at location $i$; $\sigma^2$ is the variance of pixel $z$ within the optimal neighbourhood; $z_j$ is the value of other pixels at other locations; $\bar{z}$ is the average value of pixel $z$ over the $n$ samples; and $w_{ij}$ is the weight, defined as the inverse of the distance $d_{ij}$ between locations $i$ and $j$. Finally, based on the Eigen weight of every pixel, a positive local Moran's I value would indicate that a pixel has neighbouring pixels with similar Eigen weights and is part of the same spatial cluster. Conversely, a negative local Moran's I value would indicate that a pixel has neighbouring pixels with dissimilar attributes and would be spatially clustered as an anomaly (Anselin, 1995).

2.5.8. Accuracy assessments

This study evaluated the performance of the unsupervised methodology using the Receiver Operating Characteristic (ROC) curve (Bamber, 1975). The ROC curve provides a visual measure of model performance and is plotted by showing the relationship between the detection rate (DR) and the false positive rate (FPR). The former is calculated by dividing the number of correctly classified anomalous pixels by the observed number of anomalous pixels (i.e. bugweed) in the image subset, while the latter is calculated by dividing the number of non-anomaly pixels (i.e. non-bugweed) that are misclassified as anomalous pixels by the total number of non-anomaly pixels in the image subset. The detection and false positive rates are given by Eqs. (4) and (5), respectively (Spitalnic, 2004):

$$DR = \frac{A}{A + B} \tag{4}$$

where A is the number of anomalous bugweed pixels that were correctly classified as bugweed and B is the number of anomalous pixels that were incorrectly classified.

$$FPR = \frac{C}{C + D} \tag{5}$$

where C is the number of non-anomaly pixels that were incorrectly classified as anomalous bugweed and D is the number of non-anomaly pixels that were classified correctly. The overall classification accuracies (OA) were determined by the following equation:

$$OA = \frac{A + D}{A + B + C + D} \tag{6}$$

3. Results

3.1. Random forest proximity analysis

The measure of similarity for each pixel is shown in Fig. 3 for a selected WorldView-2 image subset (20 m × 20 m). The subset consists of 100 pixels and the proximity between each pixel and every other pixel in the matrix is shown using a colour ramp, with blue indicating total dissimilarity and red indicating perfect similarity (value of 1). The proximity matrix was calculated for all image subsets (n = 90) within the three forest vegetation covers. Eigenvectors were then applied to decompose the matrix and produce Eigen weights for each pixel.

Fig. 3. Graphical representation of the random forest proximity matrix for a selected WorldView-2 image subset consisting of bugweed occurring in an open area. The similarities between pixels are illustrated by different colours in the matrix, with the diagonal line representing perfect similarity (value of 1). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.2. Determining optimal neighbourhood window size for each forest vegetation class

The utility of various neighbourhood window sizes to accurately detect the probability of anomalous pixels in each forest vegetation cover class is shown in Fig. 4. A decrease in the variance from smaller moving windows (i.e. 3 m × 3 m) to larger window sizes (i.e. 21 m × 21 m) is evident across the various vegetation classes. More specifically, an 11 m × 11 m window produced the lowest variance (2.8%) in riparian zones. In open areas, a 13 m × 13 m window produced the lowest variance of 1.9%, while a 17 m × 17 m neighbourhood window produced the lowest variance of 0.8% in forest margins. Window sizes larger than 21 m × 21 m showed no improvement in variance.

Fig. 4. Determining the variance of different neighbourhood window sizes to detect the occurrence of anomalous pixels located in forest margins, riparian areas and open areas. The optimal neighbourhood windows with the greatest potential are shown by the respective arrows for each vegetation cover class considered in this study.
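The minimal-variance rule of Section 2.5.6 amounts to computing the variance of the Eigen weights obtained with each candidate window and keeping the window with the smallest value. A minimal sketch follows. The weights are invented for illustration and are not the study's data, and the use of the population variance (`statistics.pvariance`) is an assumption, since the text does not say which variance estimator was applied.

```python
from statistics import pvariance


def select_window(weights_by_size):
    """Return the window size whose Eigen weights have the least variance.

    weights_by_size maps a window size (in metres) to the list of Eigen
    weights computed with that window for one vegetation cover class.
    """
    variances = {size: pvariance(w) for size, w in weights_by_size.items()}
    best = min(variances, key=variances.get)
    return best, variances


# Hypothetical Eigen weights for three candidate window sizes.
weights_by_size = {
    3: [0.10, 0.40, 0.20, 0.30],
    11: [0.22, 0.28, 0.24, 0.26],
    21: [0.20, 0.30, 0.23, 0.27],
}
best, variances = select_window(weights_by_size)
```

Running the selection separately for each vegetation cover class would give one optimal window per class, mirroring the per-class minima reported here.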
3.3. Clustering and anomaly detection using Anselin local Moran's I

The Eigen weights for each pixel, based on the proximity measures and calculated using the optimal neighbourhood window size, allowed for the clustering and detection of anomalous pixels using Anselin local Moran's I. Fig. 5 shows examples of the clustered anomalous pixels in relation to the observed bugweed pixels and the Eigen weights.
Fig. 5. Clustering and anomaly detection based on Eigen weights derived using eigenvectors and the optimal neighbourhood window sizes for: a) Open areas (13 m × 13 m), b) Forest margins (17 m × 17 m) and c) Riparian areas (11 m × 11 m). Selected sample plots are displayed with the actual occurrence of bugweed indicated in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
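The local Moran's I of Eq. (3) can be sketched in a few lines of Python. The example below uses raw inverse-distance weights, as described for Eq. (3), and computes only the statistic itself; the z scores, p-values and cluster labels mentioned in Section 2.5.7 are omitted. The transect of Eigen weights, the coordinates and the function name are hypothetical.

```python
from math import hypot


def local_morans_i(values, coords):
    """Local Moran's I for every location, following the form of Eq. (3).

    values: one Eigen weight per pixel.
    coords: (x, y) position of each pixel; the spatial weight between two
    pixels is the inverse of their Euclidean distance.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    result = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            if j == i:
                continue
            dist = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            acc += (values[j] - mean) / dist
        result.append((values[i] - mean) / variance * acc)
    return result


# Hypothetical transect of 8 pixels spaced 1 unit apart; pixel 3 stands out.
values = [0.1, 0.1, 0.1, 0.5, 0.1, 0.1, 0.1, 0.1]
coords = [(float(x), 0.0) for x in range(8)]
local_i = local_morans_i(values, coords)
```

In this toy transect the outlying pixel receives a negative value (an anomaly among dissimilar neighbours), while a pixel far from it, surrounded by similar values, receives a positive one.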
3.4. Clustering and anomaly detection using Anselin local Moran's I: Overall accuracy assessments using the optimal window size

Fig. 6 plots the ROC curves showing the performance of the unsupervised method for mapping anomalies (i.e. bugweed) using Anselin local Moran's I. The relationship between the DR and FPR is compared for detecting bugweed in: a) forest margins; b) open areas and c) riparian areas. Results indicate that the utilisation of Anselin local Moran's I for the clustering of anomalous pixels has the potential to map the occurrence of bugweed within the three selected forest vegetation cover types. More specifically, bugweed occurring in riparian areas was detected with a detection accuracy (DR) of 77.86% and a FPR of 9.01%. In open areas, bugweed occurrences were identified with a DR of 89.88% and a FPR of 9.58%. The best DR of 95% was achieved when detecting bugweed occurring in forest margins, with a FPR of 7.05%. Equally important, when the FPR is reduced to 1.65%, the DR for mapping bugweed in forest margins is 80%. When assessing the overall classification accuracies, bugweed was mapped with an accuracy of 91.33% in forest margins, 85.08% in open areas and 67.90% in riparian zones. For comparison purposes, the Reed–Xiaoli (RX) algorithm (Reed & Yu, 1990), which is considered to be the benchmark anomaly detection method for remotely sensed information, was implemented. RX accuracies (DR) for detecting bugweed pixels in forest margins, open areas and riparian zones were between 91.20% and 73%, with FPRs between 8.75% and 9.58%. Additionally, the technique produced overall classification accuracies of 86.70%, 78.92% and 63.5%, respectively.

4. Discussion

This study shows that WorldView-2 has the capability of accurately detecting and automatically mapping S. mauritianum in commercial forests using an unsupervised random forest approach.
The unsupervised approach was based on decomposing the RF proximity matrix into single Eigen weights for each pixel using eigenvector analysis. This process allowed for the automatic clustering and detection of anomaly pixels using Anselin local Moran's I. Determining the optimal neighbourhood window sizes produced Eigen weights that accurately mapped the occurrence of bugweed in forest margins, open areas and riparian zones.

4.1. Random forest proximity matrix analysis

The results in this study confirm that the information in the RF proximity matrix is a valuable and accurate measure for identifying similarities that are able to define each unlabelled pixel in a dataset (Liaw & Wiener, 2002; Auret & Aldrich, 2010). Many remote sensing applications (Lawrence et al., 2006; Ismail, Mutanga & Kumar, 2010; Dye et al., 2011; Adam et al., 2012) utilising RF have focused on variable importance, the Gini index and the out-of-bag (OOB) error as valuable RF by-products; however, the potential exists to exploit the information-rich proximity matrix, which has become popular in other research domains. For example, Zhang and Zulkernine (2006) used the RF proximity matrix to detect anomalies in network traffic data and obtained a relatively high DR (95%) when the FPR was low (1%). In a subsequent study, similar DRs and FPRs were obtained for detecting intrusions in network security systems using RF proximities (Zhang et al., 2008). Shi and Horvath (2006) found that the RF proximities are useful for detecting biological tumour sample clusters with a misclassification error rate of 9.4%. Finally, Gray, Aljabar, Heckemann, Hammers, and Rueckert (2013) classified Alzheimer's disease from healthy samples based on neuroimaging features and biological data. Using RF similarities, they produced a classification accuracy of 89%. Results from this study indicate that calculating the proximity matrix and Eigen weights at different window sizes using minimal variance is effective for providing localized information to best detect potential anomalies within the various vegetation classes. There was a decrease in the variance from smaller moving window sizes to larger window sizes. Larger moving windows used the pixels from a greater spatial extent to assign an Eigen weight to the center pixel compared to smaller window sizes, which are limited in extent. As a result, smaller moving windows may not have been able to gather enough sampling values to assign to the center pixel as accurately as larger ones (Zhao et al., 2007).
Applying different neighbourhood windows in an unsupervised framework, in conjunction with the RF proximity analysis, has established a unique basis for detecting and mapping plant invaders in commercial forests. In remote sensing more broadly, moving windows have become increasingly popular for other applications. These include estimating the road edge effect on adjacent forest vegetation (Lottering & Mutanga, 2012), estimating boreal forest structure (Wunderle, Franklin & Guo, 2007) and predicting forest age (Dye, Mutanga & Ismail, 2012). While the application of unsupervised approaches is recommended when utilising high resolution datasets (Wu et al., 2007), detecting anomalies such as the occurrence of bugweed in commercial forest areas can be computationally demanding and complex. For instance, a dataset consisting of N pixels results in an N × N proximity matrix (Zhang & Zulkernine, 2006). The number of pairwise proximities therefore grows quadratically with N, producing a huge dataset that demands substantial computer memory and processing time. Sub-setting the WorldView-2 image into 20 m × 20 m sample plots reduced this computational load while still allowing anomalous pixels to be detected accurately.

4.2. Implications for mapping bugweed using the proposed unsupervised method
Fig. 6. ROC curves for the mapping of bugweed located within the three selected vegetation cover classes considered in this study.
The results of this study show the potential of detecting and mapping IAPs with a reasonable level of accuracy in a complex commercial forest environment. The eight strategically placed WorldView-2 visible (427 nm, 478 nm, 546 nm, 608 nm, 659 nm, 724 nm) and near-infrared (831 nm and 908 nm) wavebands proved capable of detecting S. mauritianum in each of the vegetation cover classes considered in this study. This finding is consistent with the findings of Laba et al. (2008) and Masocha and Skidmore (2011), who also recognised the significance of the visible and near-infrared regions in the mapping of IAP species using multispectral data. Nonetheless, the proposed method should be investigated over broader spatial extents and using other satellite platforms that may exploit different spectral regions for IAP species mapping. When assessing the detection capabilities of the multispectral wavebands to identify the anomalous bugweed pixels, satisfactory detection rates were achieved with low FPRs. Bugweed, however, was
best detected in forest margins with the highest DR (95%) and lowest FPR (6.25%) and was mapped with an overall accuracy of 91.33%. Since the overall classification accuracies and DRs were lower for bugweed in open and riparian areas than in forest margins, the methodology may have been affected by the environmental conditions in which the weed grew. For instance, when mapping the occurrence of bugweed in forest margins, the image subsets consisted of relatively homogeneous pixels representing mainly the forest canopy and bugweed in between. However, in open areas the image subsets were subjected to the reflectance from underlying background material such as bare soil and other photosynthetic and non-photosynthetic material. Similarly, riparian bugweed grew against a backdrop of underlying riparian vegetation and was sometimes surrounded by dampened soils. Such variation in conditions within each of the vegetation cover types may have accounted for the differences in detection capabilities and classification results for mapping bugweed in this study.

Although the detection of bugweed occurrence in riparian areas proved the most difficult, this automated unsupervised classification framework produced results comparable to those of previous studies relating to the mapping of IAPs using multispectral datasets (Fuller, 2005; Laba et al., 2008; Masocha & Skidmore, 2011). Fuller (2005), for example, mapped the dominant cover of the invasive tree Melaleuca quinquenervia among other woody plants, with an accuracy of 85.66%. The study used IKONOS imagery (4 m) and a back-propagated neural network (NN) classifier. Laba et al. (2008) estimated the presence of Lythrum salicaria, Phragmites australis and Trapa natans in a diverse wetland environment using high resolution Quickbird imagery (2.4 m). Utilising a maximum likelihood classifier, they obtained accuracies greater than 65%.
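The detection rate (DR), false positive rate (FPR) and overall accuracy reported for the three cover classes are all derived from the same four confusion-matrix counts. The following minimal sketch assumes the usual definitions (DR as the proportion of true bugweed pixels flagged, FPR as the proportion of non-bugweed pixels wrongly flagged). The counts used are hypothetical and are not taken from this study; the function names are likewise illustrative.

```python
def detection_rate(tp, fn):
    """Proportion of actual bugweed pixels that were flagged as anomalies."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Proportion of non-bugweed pixels that were wrongly flagged."""
    return fp / (fp + tn)


def overall_accuracy(tp, fn, fp, tn):
    """Proportion of all reference pixels that were labelled correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical reference counts (illustrative only, not the study's data).
TP, FN, FP, TN = 40, 10, 5, 45

dr = detection_rate(TP, FN)
fpr = false_positive_rate(FP, TN)
oa = overall_accuracy(TP, FN, FP, TN)
print(f"DR = {dr:.2%}, FPR = {fpr:.2%}, overall accuracy = {oa:.2%}")
```

Because DR depends only on the bugweed reference pixels and FPR only on the non-bugweed ones, a cover class can show a high DR alongside a lower overall accuracy when background pixels are frequently flagged, which is one way to read the differences between forest margins, open areas and riparian zones.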
Recently, in a supervised hyperspectral application, Atkinson et al. (2014) mapped bugweed with AISA Eagle (2.4 m; 272 bands) and obtained a classification accuracy of 93% using support vector machines (SVM).

The methods developed in this study could be valuable for decisions relating to the ongoing eradication and weed management initiatives undertaken in commercial forests (Peerbhay, Mutanga & Ismail, 2016). Forest margins in particular are highly susceptible to weed invasion since they represent pathways which introduce alien species into forest interiors (Pauchard & Alaback, 2004). Similarly, riparian zones are positioned in low lying areas which are adjacent to rivers and serve as direct conduits for the dispersal of seedlings (Anderson et al., 1996; Le Maitre et al., 2002). Finally, open areas in the plantation represent prime invasion sinks which are fuelled by dispersal mechanisms, poor weed control measures and dormant seeds buried within the soil bed (Martins & Engel, 2007; Jordaan & Downs, 2012). While these areas provide initial pathways for invasion, they also promote further spread within the plantation and should therefore be monitored and included in a weed management regime for effective results (Gray, Shaw, Gerard & Bruce, 2008). By automating the detection of weed occurrences in areas that are highly susceptible, remote sensing techniques may add great value to the effectiveness of eradication measures in plantation forestry.

5. Conclusion

This study has demonstrated that the WorldView-2 multispectral sensor has the potential to map bugweed occurrences in forest margins, open areas and riparian areas with overall classification accuracies of 91.33%, 85.08% and 67.90%, respectively. The proximity matrix provided valuable information that was successfully decomposed into Eigen weights for each pixel using eigenvectors.
The use of different neighbourhood window sizes proved effective in detecting and mapping anomalous pixels in each of the vegetation cover classes using Anselin local Moran's I. Based on the DRs and classification accuracies obtained, this study has presented an alternative unsupervised approach for multispectral analysis while accurately mapping the
occurrence of IAPs using new generation multispectral remotely sensed data.
Acknowledgements

This study was carried out with financial support from the Applied Centre for Climate and Earth Systems Science (ACCESS) under the Land Use and Land Cover Change theme. We would like to thank Sappi forests-SA for allowing us the opportunity to carry out our study under excellent conditions.
References

Adam, E., Mutanga, O., Rugege, D., & Ismail, R. (2012). Discriminating the papyrus vegetation (Cyperus papyrus L.) and its co-existent species using random forest and hyperspectral data resampled to Hymap. International Journal of Remote Sensing, 33(2), 552–569. Adler-Golden, S., Berk, A., Bernstein, L. S., Richtsmeier, S., Acharya, P. K., Matthew, M. W., ... Chetwynd, J. H. (1998). FLAASH, a MODTRAN4 atmospheric correction package for hyperspectral data retrievals and simulations. Proc. 7th ann. JPL airborne earth science workshop (pp. 9–14). Anderson, G., Everitt, J., Escobar, D., Spencer, N., & Andrascik, R. (1996). Mapping leafy spurge (Euphorbia esula) infestations using aerial photography and geographic information systems. Geocarto International, 11(1), 81–89. Andrew, M. E., & Ustin, S. L. (2008). The role of environmental context in mapping invasive plants with hyperspectral image data. Remote Sensing of Environment, 112(12), 4301–4317. Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, 27(2), 93–115. Asner, G. P., Jones, M. O., Martin, R. E., Knapp, D. E., & Hughes, R. F. (2008). Remote sensing of native and invasive species in Hawaiian forests. Remote Sensing of Environment, 112(5), 1912–1926. Atkinson, J. T., Ismail, R., & Robertson, M. (2014). Mapping bugweed (Solanum mauritianum) infestations in Pinus patula plantations using hyperspectral imagery and support vector machines. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(1), 17–28. Auret, L., & Aldrich, C. (2010). Change point detection in time series data with random forests. Control Engineering Practice, 18(8), 990–1002. Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387–415. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. Breiman, L., & Cutler, A. (2003). 
Random forests manual v4.0. Technical report, UC Berkel, 2003. ftp://ftp.stat.berkeley.edu/pub/users/breiman/Using random forests v4.0.pdf. Carson, H. W., Lass, L. W., & Callihan, R. H. (1995). Detection of yellow hawkweed (Hieracium pratense) with high resolution multispectral digital imagery. Weed Technology, 477–483. Copeland, R. S., & Wharton, R. A. (2006). Year round production of pest Ceratitis species (Diptera: Tephritidae) in fruit of the invasive species Solanum mauritianum in Kenya. Annals of the Entomological Society of America, 99(3), 530–535. DAFF (2009). Report on commercial timber resources and primary roundwood processing South Africa. Pretoria: Department of Agriculture Forestry and Fisheries. R Development Core Team (2012). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing (http://www.R-project.org). DigitalGlobe (2010). The benefits of the 8 spectral bands of WorldView-2. (USA). Dye, M., Mutanga, O., & Ismail, R. (2011). Examining the utility of random forest and AISA Eagle hyperspectral image data to predict Pinus patula age in Kwazulu-Natal, South Africa. Geocarto International, 26(4), 275–289. Dye, M., Mutanga, O., & Ismail, R. (2012). Combining spectral and textural remote sensing variables using random forests: Predicting the age of Pinus patula forests in Kwazulu-Natal, South Africa. Journal of Spatial Science, 57(2), 193–211. ENVI (2010). Environment for visualizing images: Version 4.7. USA: Exelis Visual Information Solutions, ITT Industries. Everitt, J. H., Escobar, D. E., Alaniz, M. A., Davis, M. R., & Richerson, J. V. (1996). Using spatial information technologies to map Chinese Tamarisk (Tamarix chinensis) infestations. Weed Science, 44(1), 194–201. Everitt, J. H., Yang, C., & Deloach, C. (2005). Remote sensing of giant reed with QuickBird satellite imagery. Journal of Aquatic Plant Management, 43, 81–85. Fuller, D. O. (2005). 
Remote detection of invasive melaleuca trees (Melaleuca quinquenervia) in South Florida with multispectral ikonos imagery. International Journal of Remote Sensing, 26(5), 1057–1063. Glenn, N. F., Mundt, J. T., Weber, K. T., Prather, T. S., Lass, L. W., & Pettingill, J. (2005). Hyperspectral data processing for repeat detection of small infestations of leafy spurge. Remote Sensing of Environment, 95(3), 399–412. Gray, C. J., Shaw, D. R., Gerard, P. D., & Bruce, L. M. (2008). Utility of multispectral imagery for soybean and weed species differentiation. Weed Technology, 22(4), 713–718. Gray, C. J., Shaw, D. R., & Bruce, L. M. (2009). Utility of hyperspectral reflectance for differentiating soybean (Glycine max) and six weed species. Weed Technology, 23(1), 108–119.
Gray, K. R., Aljabar, P., Heckemann, R. A., Hammers, A., & Rueckert, D. (2013). Random forest-based similarity measures for multi-modal classification of Alzheimer's disease. NeuroImage, 65, 167–175. Hegarat-Mascle, L., Bloch, I., & Vidal-Madjar, D. (1997). Application of Dempster-Shafer evidence theory to unsupervised classification in multisource remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 35(4), 1018–1031. Huang, C. Y., & Asner, G. (2009). Applications of remote sensing to alien invasive plant studies. Sensors, 9(6), 4869–4889. Ismail, R., & Mutanga, O. (2011). Discriminating the early stages of Sirex noctilio infestation using classification tree ensembles and shortwave infrared bands. International Journal of Remote Sensing, 32(15), 4249–4266. Ismail, R., Mutanga, O., & Kumar, L. (2010). Modeling the potential distribution of pine forests susceptible to Sirex noctilio infestations in Mpumalanga, South Africa. Transactions in GIS, 14(5), 709–726. Jordaan, L. A., & Downs, C. T. (2012). Comparison of germination rates and fruit traits of indigenous Solanum giganteum and invasive Solanum mauritianum in South Africa. South African Journal of Botany, 80, 13–20. Laba, M., Downs, R., Smith, S., Welsh, S., Neider, C., White, S., ... Baveye, P. (2008). Mapping invasive wetland plants in the Hudson river national estuarine research reserve using QuickBird satellite imagery. Remote Sensing of Environment, 112(1), 286–300. Lass, L. W., Prather, T. S., Glenn, N. F., Weber, K. T., Mundt, J. T., & Pettingill, J. (2005). A review of remote sensing of invasive weeds and example of the early detection of spotted knapweed (Centaurea maculosa) and babysbreath (Gypsophila paniculata) with a hyperspectral sensor. Weed Science, 53(2), 242–251. Lawrence, R. L., Wood, S. D., & Sheley, R. L. (2006). Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (randomForest). Remote Sensing of Environment, 100(3), 356–362. Le Maitre, D. 
C., van Wilgen, B. W., Gelderblom, C. M., Bailey, C., Chapman, R. A., & Nel, J. A. (2002). Invasive alien trees and water resources in South Africa: Case studies of the costs and benefits of management. Forest Ecology and Management, 160(1–3), 143–159. Liaw, A., & Wiener, M. (2002). Classification and regression by randomforest. R news, 2(3), 18–22. Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008). Isolation forest. IEEE international conference on data mining (pp. 413–422). Lottering, R., & Mutanga, O. (2012). Estimating the road edge effect on adjacent Eucalyptus grandis forests in Kwazulu-Natal, South Africa, using texture measures and an artificial neural network. Journal of Spatial Science, 57(2), 153–173. Marceau, D. J., Gratton, D. J., Fournier, R. A., & Fortin, J. -P. (1994). Remote sensing and the measurement of geographical entities in a forested environment. 2. The optimal spatial resolution. Remote Sensing of Environment, 49(2), 105–117. Martins, A. M., & Engel, V. L. (2007). Soil seed banks in tropical forest fragments with different disturbance histories in southeastern Brazil. Ecological Engineering, 31(3), 165–174. Masocha, M., & Skidmore, A. K. (2011). Integrating conventional classifiers with a GIS expert system to increase the accuracy of invasive species mapping. International Journal of Applied Earth Observation and Geoinformation, 13(3), 487–494. Mucina, L., & Rutherford, M. C. (2006). The vegetation of south africa, lesotho and swaziland. Pretoria: South African National Biodiversity Institute. Mullerova, J., Pyšek, P., JaroŠik, V., & Pergl, J. A. N. (2005). Aerial photographs as a tool for assessing the regional dynamics of the invasive plant species Heracleum mantegazzianum. Journal of Applied Ecology, 42(6), 1042–1053. Müllerová, J., Pergl, J., & Pyšek, P. (2013). 
Remote sensing as a tool for monitoring plant invasions: Testing the effects of data resolution and image classification approach on the detection of a model plant species Heracleum mantegazzianum (giant hogweed). International Journal of Applied Earth Observation and Geoinformation, 25, 55–65. Mundt, J. T., Glenn, N. F., Weber, K. T., Prather, T. S., Lass, L. W., & Pettingill, J. (2005). Discrimination of hoary cress and determination of its detection limits via hyperspectral image processing and accuracy assessment techniques. Remote Sensing of Environment, 96(3–4), 509–517. Olckers, T. (2011). Biological control of Solanum mauritianum scop. (solanaceae) in South Africa: Will perseverance pay off? African Entomology, 19(2), 416–426. Olckers, T., & Borea, C. (2009). Assessing the risks of releasing a sap-sucking lace bug, gargaphia decoris, against the invasive tree Solanum mauritianum in new zealand. BioControl, 54(1), 143–154. Pal, M. (2005). Random forest classifier for remote sensing classification. International Journal of Remote Sensing, 26(1), 217–222. Pauchard, A., & Alaback, P. B. (2004). Influence of elevation, land use, and landscape context on patterns of alien plant invasions along roadsides in protected areas of southcentral Chile. Conservation Biology, 18(1), 238–248. Peerbhay, K. Y., Mutanga, O., & Ismail, R. (2013). Commercial tree species discrimination using airborne AISA eagle hyperspectral imagery and partial least squares discriminant analysis (PLS-DA) in kwazulu–natal, South Africa. ISPRS Journal of Photogrammetry and Remote Sensing, 79, 19–28.
Peerbhay, K. Y., Mutanga, O., & Ismail, R. (2014a). Does simultaneous variable selection and dimension reduction improve the classification of pinus forest species? Journal of Applied Remote Sensing, 8(1). http://dx.doi.org/10.1117/1.JRS.8.085194 (085194085194). Peerbhay, K. Y., Mutanga, O., & Ismail, R. (2014b). Investigating the capability of few strategically placed WorldView-2 multispectral bands to discriminate forest species in Kwazulu-Natal, South Africa. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(1), 307–316. Peerbhay, K., Mutanga, O., & Ismail, R. (2015). Random forests unsupervised classification: The detection and mapping of Solanum mauritianum infestations in plantation forestry using hyperspectral data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. http://dx.doi.org/10.1109/JSTARS.2015.2396577. Peerbhay, K., Mutanga, O., & Ismail, R. (2016). The identification and remote detection of alien invasive plants in commercial forests: An overview. South African Journal of Geoinformatics, 5(1), 49–67. Reed, I. S., & Yu, X. (1990). Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Transactions on Acoustics, Speech and Signal Processing, 38, 1760–1770. Sesnie, S. E., Finegan, B., Gessler, P. E., Thessler, S., Bendana, Z. R., & Smith, A. M. (2010). The multispectral separability of costa rican rainforest types with support vector machines and random forest decision trees. International Journal of Remote Sensing, 31(11), 2885–2909. SFR (1993). Forest land types of the natal region: Howick, KwaZulu-Natal. South Africa: Sappi Forest Research. Shi, T., & Horvath, S. (2006). Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1), 118–138. Spitalnic, S. (2004). Test properties 2: likelihood ratios, bayes' formula, and receiver operating characteristic curves. 
Hospital Physician, 40(10), 53–58. Tarca, A. L., Carey, V. J., Chen, X. -W., Romero, R., & Drăghici, S. (2007). Machine learning and its applications to biology. PLoS Computational Biology, 3(6), 116. Triantaphyllou, E., & Mann, S. H. (1990). An evaluation of the eigenvalue approach for determining the membership values in fuzzy sets. Fuzzy Sets and Systems, 35(3), 295–301. Ustin, S. L., DiPietro, D., Olmstead, K., Underwood, E., & Scheer, G. J. (2002). Hyperspectral remote sensing for invasive species detection and mapping. IEEE Geoscience and Remote Sensing Symposium, 3, 1658–1660. van Coillie, F., Verbeke, L. P. C., & De Wulf, R. R. (2007). Feature selection by genetic algorithms in object-based classification of ikonos imagery for forest mapping in flanders, Belgium. Remote Sensing of Environment, 110(4), 476–487. van den Bosch, E., Ward, B., & Clarkson, B. (2004). Woolly nightshade (solanum mauritianum) and its allelopathic effects on New Zealand native Hebe stricta seed germination. New Zealand Plant Protection, 57, 98. van Wilgen, B. W., Richardson, D. M., Le Maitre, D. C., Marais, C., & Magadlela, D. (2001). The economic consequences of alien plant invasions: Examples of impacts and approaches to sustainable management in South Africa. Environment, Development and Sustainability, 3(2), 145–168. Wang, L., Silván-Cárdenas, J. L., Yang, J., & Frazier, A. E. (2013). Invasive saltcedar (Tamarisk spp.) distribution mapping using multiresolution remote sensing imagery. The Professional Geographer, 65(1), 1–15. Williams, A. P., & Hunt, R. (2002). Estimation of leafy spurge cover from hyperspectral imagery using mixture tuned matched filtering. Remote Sensing of Environment, 82(2), 446–456. Wu, S. H., Lee, B. N., Wang, L., & Abadir, M. S. (2007). Statistical analysis and optimization of parametric delay test. IEEE International Test Conference, ITC, 2007, 1–10. Wunderle, A., Franklin, S., & Guo, X. (2007). 
Regenerating boreal forest structure estimation using spot-5 pan-sharpened imagery. International Journal of Remote Sensing, 28(19), 4351–4364. Xiao, Y., & Segal, M. R. (2009). Identification of yeast transcriptional regulation networks using multivariate random forests. PLoS Computational Biology, 5(6), 1000414. Zhang, J., & Zulkernine, M. (2006). Anomaly based network intrusion detection with unsupervised outlier detection. IEEE international conference on communications (pp. 2388–2393). Zhang, C., Jordan, C., & Higgins, A. (2007). Using neighbourhood statistics and gis to quantify and visualize spatial variation in geochemical variables: An example using ni concentrations in the topsoils of Northern Ireland. Geoderma, 137(3), 466–476. Zhang, J., Zulkernine, M., & Haque, A. (2008). Random-forests-based network intrusion detection systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(5), 649–659. Zhao, C., Wang, F., & Jia, M. (2007). Dissimilarity analysis based batch process monitoring using moving windows. AICHE Journal, 53(5), 1267–1277.