Environmental Modelling & Software 35 (2012) 192–193
Environmental Modelling & Software journal homepage: www.elsevier.com/locate/envsoft
Software, data and modelling news
imageRF – A user-oriented implementation for remote sensing image analysis with Random Forests

Björn Waske a,*, Sebastian van der Linden b, Carsten Oldenburg a, Benjamin Jakimow b, Andreas Rabe b, Patrick Hostert b

a Institute of Geodesy and Geoinformation, Faculty of Agriculture, University of Bonn, Nussallee 15, 53115 Bonn, Germany
b Geomatics Lab, Geography Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
Article info

Article history: Received 3 November 2011; Accepted 27 January 2012; Available online 3 March 2012

Keywords: Random forests; RF; Remote sensing; Classification; Regression; Land cover; Image analysis

Abstract

An IDL implementation for the classification and regression analysis of remote sensing images with Random Forests is introduced. The tool, called imageRF, is platform and license independent and uses generic image file formats. It works well with default parameterization, yet all relevant parameters can be defined in intuitive GUIs. This makes it a user-friendly image processing tool, which is implemented as an add-on in the free EnMAP-Box and may be used in the commercial IDL/ENVI software. © 2012 Elsevier Ltd. All rights reserved.
Software availability

Name of software: imageRF
Concept: B. Waske, C. Oldenburg, A. Rabe, B. Jakimow, S. van der Linden
Programming: C. Oldenburg, B. Jakimow, A. Rabe
Availability: www.imagerf.net
Year first available: 2011
Software required: EnMAP-Box; IDL/ENVI

1. Introduction

Remotely sensed land cover maps provide important information for decision-making and environmental monitoring systems (e.g., Rao et al., 2007). Map accuracy is driven by, among other factors, the input data and the classification method (e.g., Chan and Paelinckx, 2008; Waske and van der Linden, 2008). The use of multitemporal and multisensor data can further improve the accuracy of maps (e.g., Waske and van der Linden, 2008), and imaging spectroscopy data are of interest when mapping complex environments (e.g., Chan and Paelinckx, 2008). The potential of common and widely available methods is often limited in this context. More sophisticated approaches from the field of machine learning have been shown to perform more accurately (e.g., Gislason et al., 2006; Waske and Braun, 2009). Ensemble classifiers have emerged as one of the most promising approaches for the more complex Earth observation data that have recently become available. Original implementations of such tools, however, are not designed for remote sensing data and require complex workarounds, which are rarely shared within the community. Regular remote sensing software packages, on the other hand, usually do not offer the most recent developments. To overcome the limited availability and complicated handling of one of the most successful approaches, Random Forests™ (RF™),1 we present a freely available, platform-independent and user-oriented tool for remote sensing image classification and regression, called imageRF.

* Corresponding author. Tel.: +49 228 73 60190; fax: +49 228 73 2712. E-mail address: [email protected] (B. Waske).
1 Random Forests and RF are trademarks of L. Breiman and A. Cutler.
1364-8152/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.envsoft.2012.01.014

2. Background

RF are decision tree (DT) ensembles for the classification or regression of categorical and continuous data (Breiman, 2001).
Each DT in the ensemble is individually trained with a bootstrapped sample of the training data, and at each split node a randomly selected feature subset is used. The individual outputs are combined by a simple majority vote. Theoretically, the outputs can be interpreted as probabilities (Peters et al., 2009) and thus used as a measure of the reliability of the result. Since a FORTRAN code version became freely available,2 RF have been widely used for ecological applications (e.g., Cutler et al., 2007; Pino-Mejias et al., 2010), and over the past years they have emerged in the context of remote sensing image classification (e.g., Gislason et al., 2006; Chan and Paelinckx, 2008; Waske and Braun, 2009) and regression analysis (Walton, 2008). RF perform well with small training sample sets and often outperform other methods in terms of accuracy and computation time. RF offer a cross-validation-like accuracy measure through the out-of-bag (OOB) error estimate and give insight into variable importance by assessing the accuracy loss when feature values are randomly permuted (Breiman, 2001).

2 http://www.stat.berkeley.edu/~breiman/RandomForests/

3. imageRF

imageRF is an IDL-based tool for RF classification and regression analysis of remote sensing imagery. It can be fully integrated into the commercially available IDL/ENVI environment. It may also be run as an add-on to the EnMAP-Box, a freely available, license-free and platform-independent processing environment for remote sensing imagery (Fig. 1). imageRF includes:

- Generic image files with an ENVI-type text header for the image data as well as for the continuous or categorical reference data and model outputs.
- Two-step image analysis consisting of separate model parameterization and application; trained models are saved and may be applied several times, e.g., for transfer to other data sets.
- Number of trees, number of randomly selected features and split criterion can be defined in intuitive GUIs (Fig. 1). Reliable default parameters are provided.
- OOB accuracy and variable importance.

Fig. 1. Advanced parameterization menu for RF classification, using imageRF in the EnMAP-Box.

4. Conclusion

With imageRF, a user-friendly tool for RF classification and regression becomes freely available to the (remote sensing) image analysis community. Through its common file formats, flexible parameterization and reliable default parameters, it addresses both experienced users and RF novices. The application of state-of-the-art methodology is thereby advanced, and the increased requirements for effective analysis of recent and upcoming Earth observation data may be more easily fulfilled.
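For readers new to RF, the ensemble procedure summarized in the Background section (bootstrapped training samples, a random feature subset per split, majority voting, vote fractions as a reliability measure, and the OOB error estimate) can be illustrated with a minimal Python sketch. This is not the IDL code of imageRF. It makes several simplifying assumptions: each tree is reduced to a single split (a "stump"), so drawing the feature subset once per tree is the same as drawing it at the only split node; splits are scored by a plain misclassification count rather than a criterion such as Gini or entropy; and all function names, the toy data and the use of `pickle` to mimic the separate training and application steps are hypothetical.

```python
import pickle
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """Fit a one-split tree using only the candidate features in feature_ids."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        stump = train_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        forest.append((stump, set(idx)))                  # remember in-bag samples
    return forest


def vote_fractions(forest, row):
    """Share of trees voting for each label (usable as a reliability measure)."""
    votes = Counter(predict_stump(stump, row) for stump, _ in forest)
    return {lab: count / len(forest) for lab, count in votes.items()}


def predict(forest, row):
    """Simple majority vote over all trees."""
    fractions = vote_fractions(forest, row)
    return max(fractions, key=fractions.get)


def oob_error(forest, X, y):
    """Error rate when each sample is judged only by trees that did not see it."""
    wrong = counted = 0
    for i, (row, lab) in enumerate(zip(X, y)):
        votes = Counter(predict_stump(stump, row)
                        for stump, in_bag in forest if i not in in_bag)
        if votes:
            counted += 1
            wrong += votes.most_common(1)[0][0] != lab
    return wrong / counted if counted else float("nan")


# Toy two-band "pixels" with hypothetical class labels.
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.9, 0.1], [1.0, 0.2], [1.1, 0.0]]
y = ["water", "water", "water", "forest", "forest", "forest"]

forest = train_forest(X, y)        # step 1: parameterize (train) the model
saved = pickle.dumps(forest)       # keep the trained model for later reuse
restored = pickle.loads(saved)     # step 2: apply the stored model to new data
print(predict(restored, [0.05, 1.2]), vote_fractions(restored, [0.05, 1.2]))
print("OOB error:", oob_error(forest, X, y))
```

The number of trees and the number of randomly selected features appear here as the `n_trees` and `n_features` arguments, mirroring the kind of parameters a user can set; a full implementation would grow deep trees and would also report permutation-based variable importance, which this sketch omits.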
Acknowledgments

The development of imageRF was partly funded through the German Aerospace Centre (DLR), FKZ 50EE0949.

References

Breiman, L., 2001. Random Forests. Machine Learning 45 (1), 5–32.
Chan, J.C.W., Paelinckx, D., 2008. Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sensing of Environment 112 (6), 2999–3011.
Cutler, D.R., Edwards, T.C., Beard, K.H., Cutler, A., Hess, K.T., 2007. Random forests for classification in ecology. Ecology 88 (11), 2783–2792.
Gislason, P.O., Benediktsson, J.A., Sveinsson, J.R., 2006. Random Forests for land cover classification. Pattern Recognition Letters 27 (4), 294–300.
Peters, J., Verhoest, N.E.C., Samson, R., Van Meirvenne, M., Cockx, L., De Baets, B., 2009. Uncertainty propagation in vegetation distribution models based on ensemble classifiers. Ecological Modelling 220 (6), 791–804.
Pino-Mejias, R., Cubiles-de-la-Vega, M.D., Anaya-Romero, M., Pascual-Acosta, A., Jordan-Lopez, A., Bellinfante-Crocci, N., 2010. Predicting the potential habitat of oaks with data mining models and the R system. Environmental Modelling & Software 25 (7), 826–836.
Rao, M., Fan, G., Thomas, J., Cherian, G., Chudiwale, V., Awawdeh, M., 2007. A web-based GIS decision support system for managing and planning USDA’s Conservation Reserve Program (CRP). Environmental Modelling & Software 22 (9), 1270–1280.
Walton, J.T., 2008. Subpixel urban land cover estimation: comparing cubist, Random Forests, and support vector regression. Photogrammetric Engineering and Remote Sensing 74 (10), 1213–1222.
Waske, B., van der Linden, S., 2008. Classifying multilevel imagery from SAR and optical sensors by decision fusion. IEEE Transactions on Geoscience and Remote Sensing 46 (5), 1457–1466.
Waske, B., Braun, M., 2009. Classifier ensembles for land cover mapping using multitemporal SAR imagery. ISPRS Journal of Photogrammetry and Remote Sensing 64 (5), 450–457.