Applied Geography 67 (2016) 94–108
Applied Geography journal homepage: www.elsevier.com/locate/apgeog
Spatial and temporal dimensions of land use change in cross border region of Luxembourg. Development of a hybrid approach integrating GIS, cellular automata and decision learning tree models
Reine Maria Basse a, *, Omar Charif b, Katalin Bodis c
a Department of Urban Development and Mobility, Luxembourg Institute of Socio-Economic Research (LISER), 11 Porte des Sciences, L-4366 Esch-sur-Alzette, Luxembourg
b Department of Geomatics Engineering, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
c European Commission, Joint Research Centre, Institute for Energy and Transport, Via Enrico Fermi 2749, TP450, I-21027, Ispra, Italy
Article info
Article history: Received 1 July 2015; received in revised form 29 November 2015; accepted 3 December 2015.
Abstract
This paper presents a geographical and computational modelling approach to explore the nonlinear relationship between land use types and geospatial driving factors. It focuses on the dynamism of land use characteristics in a cross-border region. The developed model fully integrates Cellular Automata (CA), a Geographic Information System (GIS) and a Decision Learning Tree (DLT) model, the latter being used to define the CA transition rules. Existing literature considers CA as one of the most relevant tools for modelling spatial changes over time, particularly when complex systems such as land use are involved. The literature also highlights that, when CA is combined with other tools, the results give a better spatial picture of land use dynamics. Our results reveal how land use is structured around both the transportation system and the border, and how measuring accessibility from different angles using a GIS platform permits analysis of the temporal and spatial discontinuity of land use itself, thereby identifying the discontinuity components of land use patterns determined by land use boundaries. © 2015 Elsevier Ltd. All rights reserved.
Keywords: Cellular automata; Decision learning tree; GIS; Land use; Boundaries; Discontinuity; Accessibility
1. Introduction
In geography and in environmental science in general, modelling has become essential for studying the spatial and temporal dynamics of land use, as it reveals future changes, development potential, and planning issues and challenges (Geist & Lambin, 2001). Land use is a synthetic indicator that translates not only human dispersal but also the distribution of natural elements and urban activities. Moreover, land use is a very complex structure, and this complexity results from multiple interactions between its categories and environmental constraints (Lambin, Geist, & Lepers, 2003). To explore these interactions and their implications for the structure and evolution of land use patterns, researchers have devoted much effort to modelling land use change. Methods from various research fields have been used when developing land use dynamic models, including methods from artificial intelligence, GIS,
* Corresponding author. E-mail addresses: [email protected] (R.M. Basse), [email protected] (O. Charif), [email protected] (K. Bodis).
http://dx.doi.org/10.1016/j.apgeog.2015.12.001
0143-6228/© 2015 Elsevier Ltd. All rights reserved.
geography and statistics. In the toolset of artificial intelligence, CA is the most frequently used model for studying land use change. It has been shown that when CA are applied to the modelling of a cross-border region, determining targets and analysing land use dynamics become more complex as a result of the particularity of the cross-border system (Basse, Omrani, Charif, Gerber, & Bodis, 2014). To take this complexity into account, the methodological approach proposed in this paper combines different methods and tools: CA, GIS and DLT. These tools (GIS) and methods (CA and DLT) are “consistent and mature technologies” capable of modelling land use dynamics. The proposed model appears suited to the following challenging questions: How can land use dynamics in a cross-border system be formulated using spatial models? What are the main drivers of land use changes? What can we learn from the combined modelling exercises? The findings will help show why these combined models are an asset for (1) a better understanding of the complexity of land use dynamics in a cross-border system and (2) building a comprehensive CA-based land use model. Coupling CA, GIS, and DLT can provide the essential background information to explore the structure of land use and its
potential evolution.
“Simple rules for modelling and simulating complex phenomena” is doubtless the simplest and most “naive” definition we can attribute to CA, while remaining faithful to the precursors: Alan Turing (1950), John Von Neumann (1951) and Stanislaw Ulam (1952). CA has sparked interest in many domains such as computer science, physics, biology, chemistry, ecology, economy, mathematics and geography. The use of CA in geography is relatively recent and has essentially been applied in urban and land use modelling (Batty & Xie, 1994; Benenson & Torrens, 2004; Couclelis, 1985; Phipps, 1989; Tobler, 1970, 1979; White & Engelen, 1997 and others). The development of GIS contributed to popularising the use of CA in geography. Indeed, when associated with GIS, CA is one of the most relevant computational methods used to model and simulate the intrinsic complexity of, for instance, the land use system (Batty, Xie, & Sun, 1999; Clarke & Gaydos, 1998; White & Engelen, 2000).
Two types of CA can be distinguished. (A) The standard or simple CA is based on the original universe of Conway's Game of Life, which was developed to simulate the emergence and self-organisation/self-replication of living organisms (Gardner, 1970). For this type of CA, the characteristics are as follows:
- The environment is a two-dimensional orthogonal grid (lattice) of square cells.
- Cell states have two possibilities (alive or dead).
- Time is incremental (iteration).
- Simple conditional rules define cells' state transitions.
- The neighbourhood is fixed (eight neighbours).
- Cells are autonomous.
The standard example is Conway's Game of Life.
(B) The advanced CA is used to model and simulate more complex phenomena involving humans, other living organisms and the environment. This kind of CA is also known in geography and environmental sciences as a CA-based model.
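As an illustration of the standard (type A) CA described above, the following is a minimal Python sketch of one Game of Life iteration. It is not part of the model developed in this paper; the function name `life_step`, the bounded (non-wrapping) grid and the "blinker" test pattern are assumptions made for the example.

```python
from typing import List

Grid = List[List[int]]  # 1 = alive, 0 = dead


def life_step(grid: Grid) -> Grid:
    """Apply one synchronous Game of Life iteration on a bounded grid."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the eight Moore neighbours.
            alive = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        alive += grid[rr][cc]
            # Conway's rules: survival with 2 or 3 live neighbours,
            # birth with exactly 3 live neighbours.
            if grid[r][c] == 1 and alive in (2, 3):
                new_grid[r][c] = 1
            elif grid[r][c] == 0 and alive == 3:
                new_grid[r][c] = 1
    return new_grid


# A "blinker": three live cells in a row that oscillate with period 2.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

The same ingredients (lattice, discrete time, neighbourhood, local rule) carry over to the advanced CA, where the two-state rule is replaced by transitions between land use classes.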
For the advanced CA, the characteristics are as follows:
- The environment/space is a two-dimensional orthogonal grid (lattice) of square cells.
- Cells are pixels representing real-world geographical units.
- States are multiple (e.g. in a CA-based land use model, the CA states represent the different land use categories).
- Time is given in discrete steps representing a real-life time unit such as years.
- Rules determine local interactions between geographical units and dictate their evolution (change, stationarity and/or growth).
- Different models have different rule sets and neighbourhoods.
The CA-based model developed in this paper consists of (1) a regular discrete lattice of cells in raster GIS format representing the study area; (2) discrete time steps controlling the evolution of cells; (3) a finite set of states (land use classes) to characterise cells; (4) a Moore neighbourhood identifying the cells that affect the evolution of the central cell; (5) a set of transition rules determining cells' future states using the states of the cell and its neighbours. In this paper, the transition rules reflect land use dynamics over the 1990 to 2000 period. Therefore, the CA model was calibrated on the historical period from 1990 to 2000. The CA transition rules were implemented using the DLT. More precisely, we used the CART (Classification and Regression Tree) (Breiman, 1993) implementation of decision trees with a GINI impurity index (Raileanu & Stoffel, 2004) for selecting the best split criterion. The model was
validated by comparing unseen parts (i.e. areas that were not used for the calibration of the model) of the actual (i.e. observed) Corine Land Cover dataset of the year 2000 with the simulated ones. Using the calibrated and validated CA model, the land use map up to 2006 (the Business As Usual scenario) was generated, analysed and discussed. The land use maps for 2000 and 2006 reflect how the CA model re-interpreted and reproduced local land use change processes in 2000 and 2006.
2. The study area
Cross-border regions are complex spatial systems strongly characterised by neighbouring nations. These zones are “crossroads” between heterogeneous human societies and spaces where various socio-economic, socio-cultural and socio-demographic characteristics interact with and influence each other. The next section presents a specific cross-border region, the Luxembourg cross-border region, which has already been investigated (Charif, 2013; Decoville, Durand, Sohn, & Walther, 2013; Schiebel, Omrani, & Gerber, 2015; Sohn, Reitel, & Walther, 2009).
2.1. The morphological characteristics of the study area
The Grand Duchy of Luxembourg is a landlocked country surrounded by France, Belgium and Germany (Fig. 1). Although Luxembourg is among the smallest countries in Europe, it is one of the most dynamic and attractive regions in terms of economy and quality of life. Indeed, Luxembourg, which has emerged in Europe as an attractive metropolitan area, is experiencing an increase in residential and daily mobility (Gerber, 2012). Today, the number of commuters from neighbouring countries shows that Luxembourg is a significant economic force in the region. It plays a leading role in European politics, it is a founding member of the European Union and its capital is the headquarters of several EU institutions.
Researchers from different fields will certainly agree that carrying out cross-border land use change research (in particular on a national scale) is a challenging and complex exercise. On the one hand, this is because of the particularity of cross-border areas, which are generally governed by several interactions and relationships between fundamental drivers such as spatial, policy, social and economic characteristics, specific to each country in the study area (Brunet & Jailly, 2005; Paasi, 2005). On the other hand, cross-border research is complex because it is difficult to collect relevant data, set up harmonised land use datasets, and identify land use drivers that are common to all studied countries. Based on the digital elevation model, we can describe the physical morphology of the study area. The west–east profiles in Fig. 2 represent the fragmentation of the terrain and show the relief of the study area. The municipalities located in the northern section of the study area are physically more constrained than the municipalities in the southern section. In the real world, especially with regard to potentially suitable areas for construction, terrain conditions (e.g., slope gradient in Fig. 3) rarely hamper construction plans outright; they only drive prices up. This factor could thus be a new aspect to investigate in future studies. Groundwater also deserves further analysis. Indeed, groundwater levels and conditions are crucial for agricultural and industrial activities, as well as for building residential areas or industrial units.
2.2. Measuring the spatial and temporal analysis of urban and industrial land use within variable accessibilities
This section presents different types of accessibility maps and their calculation. The objective here is to highlight the existing correlation between land use distribution and location and the
Fig. 1. Geographical location and administrative division of the study area. The shaded relief background derived from a digital elevation model indicates the terrain conditions.
structure of accessibilities by using different variables (e.g. distance to state border in kilometres, time in minutes needed to access specific networks). Fig. 4 shows the travel time to reach urban and industrial zones from the highway. The closest urban areas or industrial units (cells representing them) are those that can be reached within the shortest time from the transport network. However, cell density (number of adjacent cells) varies depending on the level of accessibility. For example, the urban land use class is highly present within two minutes from highway exits. Indeed, this phenomenon is illustrated in Fig. 6c where the peak of more than 10,000 cells is reached. It is also clear that the further away from the highway, the higher the intensity of urban cell expansion. As regards the industrial class, the peak of 3500 cells is reached within two minutes from the highway (Fig. 6a). This can be explained by the supposed priority of industrial and business activities aiming to attract the maximum number of clients and to minimise transport costs. Indeed, within approximately 30 min from the highway, the industrial class vanishes from the urban landscape (Fig. 4). On the contrary, the urban class, which exhibits a more complex
behaviour, continues to be localised in areas further than 30 min from the highway. This is because people may choose not to live next to busy roads because of air and noise pollution and therefore seek locations at a comfortable distance and providing quiet and healthy living conditions and good connectivity. This optimisation push/pull effect partly explains the observed scattered phenomenon and the difficulty in calibrating these types of land occupation within a modelling context using cellular automata-based models (White & Engelen, 2000). Figs. 5 and 6 show the discontinuity in land use distribution. These discontinuities are more consistent in Fig. 5, which shows the physical distance from the state border indicating accessibility of urban and industrial zones. Indeed, as Fig. 6b and d show, land use movements are jerky. From 2 km from the border, urban and industrial cells appear to compete in order to be located within the same areas; they are therefore often adjacent and interlocked (Fig. 5). This spatial competition for the best location explains why cells belonging to the urban and industrial classes show similar jerking movements up until 15 km (see Fig. 6b and d).
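The cell-count profiles discussed above (number of urban or industrial cells per accessibility band, as in Fig. 6) can be sketched in Python as follows. This is only an illustration: the function `count_cells_by_band`, the 2 km band width and the sample cells are hypothetical and are not taken from the study data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_cells_by_band(
    cells: Iterable[Tuple[str, float]], band_width: float
) -> Dict[str, Counter]:
    """Count cells per land use class within consecutive accessibility bands.

    Each cell is a (land use class, accessibility value) pair; the value can
    be a distance in km or a travel time in minutes.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for land_use, value in cells:
        # Lower bound of the band that contains this value.
        band_start = int(value // band_width) * band_width
        counts[land_use][band_start] += 1
    return counts


# Hypothetical cells: (land use class, distance to the border in km).
sample_cells = [
    ("urban", 0.4), ("urban", 1.7), ("urban", 2.2), ("urban", 2.9),
    ("industrial", 2.1), ("industrial", 2.5), ("industrial", 7.8),
    ("urban", 14.6),
]
profile = count_cells_by_band(sample_cells, band_width=2.0)
```

Plotting such counts against the band start for each class gives a profile in which peaks and abrupt drops correspond to the discontinuities described in the text.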
Fig. 2. West–East relief profiles of the study area.
Undertaking comparative analysis with variable accessibilities in relation to land use type, location and distribution allows a better understanding of the behaviour of land use patterns in the analysed cross-border system. The identified spatial discontinuities lead to the emergence of what we refer to here as land use boundaries or cellular barriers. In relation to the real world, these boundaries can be caused by the presence of a transport network (e.g., highways, highway access points, railway stations), the presence of a hydraulic network, geographic exposure (south-east, south-west, etc.), steep slope gradient and other anthropic activities (such as the establishment of recreational green spaces around an urban area) or simply vicinity to an international border (see Fig. 6b and d).
2.3. Analysing the land use dynamics of the study area between 1990 and 2006
In addition to transport networks, we have selected five land use classes (urban areas, industrial units, agriculture areas, forest and water bodies) based on the Corine Land Cover datasets for 1990, 2000 and 2006 (EEA, 2000). Fig. 7 shows how the land use system is organised around transport networks. Table 1 summarises the evolution of land use in different periods: between 1990 and 2000, between 2000 and 2006 and between 1990 and 2006. Land use evolution is reported for these different periods in order to show the differing degree of land use dynamics over 10 years, 6 years and 16 years. Table 1 also reveals the growth tendency of urban areas, with a growth of 3.59% between 1990 and 2000 and 3.10% between 2000 and 2006. A growth of 6.80% is registered for the full period from 1990 to 2006. The most important growth is recorded in industrial units, with 12.30% between 1990 and 2000, 7.09% between 2000 and 2006, and 20.26% for the entire period of 16 years. At first sight, we are confronted with a
territory that is primarily dominated by two classes: the agricultural class and the forest class, which represent respectively 52.74% and 39.44% of the total area of interest in 2006 (Table 1 and Fig. 7). If we look at their evolution, we notice a loss of momentum, particularly for the agricultural class, with −0.68% between 1990 and 2000, −0.62% between 2000 and 2006 and −1.29% between 1990 and 2006. The decrease of the agriculture class undoubtedly benefits the artificial classes, that is, urban and industrial units. Contrary to the agricultural category, forest areas appear less affected by the growth of artificial classes: this class decreased slowly between 1990 and 2000 (−0.01%) but increased between 2000 and 2006 (+0.17%), so that over the studied 16 years forest area grew by +0.16%. The increase of the water bodies class between 1990 and 2000 can be explained by the construction of an artificial lake between the two dates or by errors in land cover image classification. Table 1 shows that this study deals with a highly stable area.
3. Model development
3.1. The conceptual modelling framework
The conceptual modelling framework (Fig. 8) describes how this model was built, in particular the process, its components and their interactions. It details the developed methodological approach integrating CA, GIS, and DLT. Basically, Fig. 8 answers the questions of how we built the model and what the main ingredients used to develop it were. The steps of data collection, data harmonisation, pre-processing, and preparation of the model input dataset were all completed within a GIS environment (ArcGIS), whereas model calibration, validation and performance assessment were developed using
Fig. 3. Terrain conditions represented by slope gradient of the study area.
MATLAB. The conceptual modelling framework also indicates the driving factors, e.g., distance to the Luxembourg borderline and travel time to main cities such as Luxembourg city, Esch-sur-Alzette and Differdange.
3.2. Model rule-based DLT: how did we formalise the decision learning tool?
3.2.1. What is DLT in concrete terms?
Suppose we want to predict the output $Y = \{Y_1, \dots, Y_m\}$ (dependent variable) using explanatory variables $\{X_1, X_2, \dots, X_n\}$. A classification tree (Breiman, 1993) is a simple data mining method used to define the relationship between $Y$ and the explanatory variables $X_i$, $i = 1, \dots, n$. This method is based on the simple idea of asking a series of well-crafted questions about the explanatory variables $X_i$, $i = 1, \dots, n$, to finally predict the value of $Y$. Classification trees are particularly useful for data with many features where the interaction between the dependent and the explanatory variables is complicated and non-linear. They are usually used when linear or
polynomial regressions are inapplicable. Using the values of the explanatory variables, this method consists of sub-dividing the decision space into areas $A_1, \dots, A_n$ in which prediction is straightforward. Each area is associated with one of the prediction values. A decision tree is a hierarchical structure composed of three types of nodes:
- Root node: represents the highest level of this hierarchy. It represents the data to predict, with the first orientation question defining its path through the decision tree.
- Internal node: represents the data with further orientation questions and thus reduces its decision/search space.
- Leaf node: is the lowest level of the decision tree structure, in which the prediction is made.
3.2.2. Example of a decision tree
The following section shows an example of a decision tree for classifying non-overlapping sets of coordinates drawn from three different 2D Gaussian functions (350 pairs of coordinates). The full dataset
Fig. 4. Accessibility expressed by travel time in minutes between highway exits and industrial zones or urban districts.
was split into two parts: 245 records for training and 105 for testing. A sample of this data is presented in Table 2. Fig. 9 shows the decision tree that resulted from processing this data and how the decision areas $A_1, \dots, A_6$ were constructed using root and internal node questions. In this paper, we used the CART implementation of decision trees with a GINI impurity index to select the best split criterion. The CART implementation uses a series of binary splits, each over one variable, to define the decision tree. It evaluates all places where splits are possible on all variables and selects the one that minimises the GINI index given by the following equation:
$$\mathrm{GINI}(n_t) = 1 - \sum_{i=1,\dots,m} \left[ p(i \mid n_t) \right]^2 \qquad (1)$$
where $p(i \mid n_t)$ is the proportion of records belonging to class $i$ at node $n_t$. The GINI indexes of the children are then combined in a weighted sum to obtain the index of the parent node (root and internal nodes) using the following equation:
$$\mathrm{GINI}(n_t) = \sum_{i=1}^{2} \frac{n_i}{n}\, \mathrm{GINI}(i) \qquad (2)$$
where $n_i$ and $n$ are the total numbers of records belonging to the $i$th child and to the parent node $n_t$, respectively.
3.2.3. Why combine CA and a decision learning tree model in land use change modelling?
CA is often combined with other methods capable of predicting land use evolution. Two types of predictive models have been used: (1) data models such as logistic regression and (2) machine-learning-based models such as DLT (Pal & Mather, 2003; Ballestores & Qiu, 2012). Indeed, DLTs are among the first machine learning methods known for their ability (a) to classify spatial data; (b) to extract patterns from and mine data; (c) to predict; and consequently (d) to take reasonable and comprehensive decisions (Breiman, Friedman, Olshen, & Stone, 1984; Goodman & Smyth,
Fig. 5. Overview of urban and industrial locations and their geometric distances from the country border line.
1988; Moore et al., 1991; Wu, Silvan-Cardenas, & Wang, 2007; Speybroeck et al., 2004; Li & Claramunt, 2006). DLT, like other machine learning methods (e.g. artificial neural networks, support vector machines), builds a non-linear relationship between land use categories and the land use change driving factors identified through the land use modelling exercise (Razi & Athappilly, 2005). Compared with other machine learning methods, the decision tree has one major advantage: it is not a “black-box system”. On the contrary, it can be defined as a “white-box system” or “transparent-box system” in the sense that it allows a comprehensive modelling of the evolution processes of the studied system. Indeed, the transition processes generated by the DLT during the modelling phase can be easily read and interpreted by modellers and decision makers (Breiman et al., 1984; Li & Yeh, 2004). In land use change prediction, the possibility of interpreting the processes that influence land use change is an important aspect and remains one of the main challenges (Briassoulis, 1999; Lambin, Rounsevell, & Geist, 2000; Guisan & Zimmermann, 2000; Houet, Verburg, & Loveland, 2010; Verburg, van Berkel, van Doorn, van Eupen, & van den Heiligenberg, 2010).
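To make the “white-box” property concrete, the following Python sketch selects a single binary split by minimising the GINI index of Eqs. (1) and (2) and prints the resulting rule in readable form. It is a simplified illustration under stated assumptions, not the CART implementation used in this study: the helper names (`gini`, `split_gini`, `best_threshold`), the single driving factor and the eight sample cells are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Eq. (1): one minus the sum of squared class proportions at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left: Sequence[str], right: Sequence[str]) -> float:
    """Eq. (2): GINI of the two children, weighted by their share of records."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values: List[float], labels: List[str]) -> Tuple[float, float]:
    """Try midpoints between sorted distinct values; return (threshold, GINI)."""
    distinct = sorted(set(values))
    best = (float("nan"), float("inf"))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = split_gini(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical cells: distance to the nearest highway access point (km)
# and whether the cell became urban between two dates.
distance_km = [0.5, 1.0, 1.5, 2.0, 6.0, 7.0, 8.0, 9.0]
outcome = ["urban", "urban", "urban", "urban",
           "unchanged", "unchanged", "unchanged", "unchanged"]

threshold, score = best_threshold(distance_km, outcome)
rule = f"if distance_km <= {threshold}: urban else: unchanged"
print(rule)
```

A full tree repeats this search recursively on each child and over every driving factor; the point of the sketch is that each resulting node is a plain condition that a modeller or decision maker can read.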
In the developed model, the decision tree is used to define the CA transition rules, so it is the core of the model. The DLT model deduces the transition rules directly from data (characteristics of cells). It extracts land use patterns and defines a set of “if else” conditions over the driving factors that lead to the prediction of land use dynamics. Deducing the transition rules directly from data instead of manually calibrating the model has many advantages, including the following:
1) It simplifies the development of the model and reduces the time and effort it requires.
2) It calibrates the model automatically and thus allows the modeller to focus on identifying adequate driving factors and on conceptualising the land use change model.
3) It extracts the CA transition rules through a learning process that uses training datasets and is evaluated on unseen datasets. Thus, when these two datasets are carefully sampled, the model is capable of generalising and of predicting land use changes well.
In other words, compared with manual calibration, DLT may be
Fig. 6. (a, b, c, d). Spatial discontinuities of land use distribution in 2000.
Fig. 7. Land use maps (left 1990; right 2006).
Table 1 Land use evolution between 1990, 2000 and 2006.
| Land use categories (LUC) | LUC_1990 in total cells [hectare] | LUC_1990 in % of total area of interest | LUC_2000 in total cells | LUC_2000 in % of total area of interest | LUC_2006 in total cells | LUC_2006 in % of total area of interest | 1990–2000 evolution in % | 2000–2006 evolution in % | 1990–2006 evolution in % |
|---|---|---|---|---|---|---|---|---|---|
| Urban areas | 70,152 | 5.3 | 72,672 | 5.5 | 74,924 | 5.67 | 3.59 | 3.10 | 6.80 |
| Industrial units | 16,081 | 1.22 | 18,059 | 1.37 | 19,339 | 1.46 | 12.30 | 7.09 | 20.26 |
| Agricultural areas | 706,640 | 53.43 | 701,867 | 53.07 | 697,510 | 52.74 | −0.68 | −0.62 | −1.29 |
| Forest | 520,731 | 39.38 | 520,677 | 39.37 | 521,543 | 39.44 | −0.01 | 0.17 | 0.16 |
| Water bodies | 8838 | 0.67 | 9167 | 0.69 | 9126 | 0.69 | 3.72 | −0.45 | 3.26 |
| Total cells | 1,322,442 | 100 | 1,322,442 | 100 | 1,322,442 | 100 | 0.00 | 0.00 | 0.00 |
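The evolution percentages in Table 1 can be recomputed from the cell counts. The short Python sketch below does so, assuming that each percentage is the relative change with respect to the start of the period; the helper name `evolution_pct` is introduced only for this illustration.

```python
# Cell counts per class for 1990, 2000 and 2006, as listed in Table 1.
cells = {
    "Urban areas": (70152, 72672, 74924),
    "Industrial units": (16081, 18059, 19339),
    "Agricultural areas": (706640, 701867, 697510),
    "Forest": (520731, 520677, 521543),
    "Water bodies": (8838, 9167, 9126),
}


def evolution_pct(start: int, end: int) -> float:
    """Relative change between two dates, in percent of the starting count."""
    return (end - start) / start * 100


# For each class: (1990-2000, 2000-2006, 1990-2006), rounded to two decimals.
evolution = {
    name: (
        round(evolution_pct(c1990, c2000), 2),
        round(evolution_pct(c2000, c2006), 2),
        round(evolution_pct(c1990, c2006), 2),
    )
    for name, (c1990, c2000, c2006) in cells.items()
}

# The five classes partition the same lattice at every date.
totals = [sum(counts[i] for counts in cells.values()) for i in range(3)]
```

Under this assumption the recomputed values agree with the table, with agricultural areas decreasing in all three periods and forest decreasing slightly only in the first.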
Fig. 8. The conceptual modelling framework.
Table 2 Sample of datasets.
| X | Y | Gaussian |
|---|---|---|
| 10.78922 | 6.067396 | 1 |
| 8.628982 | 3.657541 | 2 |
| 6.238848 | 1.774954 | 2 |
| 0.467834 | 7.455956 | 3 |
| 8.472096 | 7.425656 | 1 |
| 6.649388 | 3.249413 | 2 |
| 3.829801 | 5.957257 | 3 |
| 7.088098 | 1.424874 | 2 |
| 9.424672 | 7.728822 | 1 |
Fig. 9. The decision learning tree structure.
presented as a very capable solution to the problem of overfitting.
3.3. Description of input data
Spatial information related to land use was based on the European Corine Land Cover/Land Use dataset for the years 1990, 2000 and 2006, version 16, released in April 2012. Corine data classifies land cover types into 44 classes using the methodology described in the “Corine land cover technical guide – Addendum 2000” (EEA, 2000). Surface/terrain characteristics (e.g., elevation, gradient, profiles) were described in the digital elevation model (DEM) obtained by the Shuttle Radar Topography Mission (SRTM). The raw elevation data was further processed by Vogt et al. (2007). The spatial resolution of the cell-based input data was 100 m. The vector-based input data (e.g., administrative areas and borders, transport network and facilities) were available from different sources: LISER (Luxembourg Institute of Socio-Economic Research) and OSM (OpenStreetMap). Because the data sources used different original reference systems, a harmonised reference system was chosen for the analysis (Lambert Azimuthal Equal Area projection with the ETRS 1989 datum). This system is also in line with European standards (Annoni, Luzet, Gubler, & Ihde, 2001). In addition to the previously described settings of CA and DLT, the model inputs are as follows (Table 3):
(1) Land use in 1990 and 2006 with five classes (urban, industrial, agriculture, forest and water) (Corine Land Cover, EEA, 2000).
(2) The neighbourhood used is an adapted version of the Moore neighbourhood, characterised by a 10 × 10 window in which cells are symmetrically arranged (composed of 99 cells that surround the studied central cell in the two-dimensional square lattice).
(3) The transport networks and their elements (e.g., bus stations, railway stations, highway exits, secondary roads, derived distance from objects).
(4) State borders of Luxembourg and the calculated geometric distance on both sides of the border line.
(5) The Euclidean distance to the centre of three main cities (Luxembourg, Esch-sur-Alzette and Differdange).
(6) The gradient map in percentage derived from the DEM [SRTM-Digital Elevation Model] (Farr et al., 2007).
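The neighbourhood inputs in item (2) amount to counting, for each cell, how many surrounding cells belong to each land use class. The Python sketch below illustrates the idea with a square window of a given radius around the cell. This is an assumption made for simplicity: it does not reproduce the exact 10 × 10 adapted window of the model, and the function name `neighbour_counts` and the small sample lattice are hypothetical.

```python
from collections import Counter
from typing import List


def neighbour_counts(grid: List[List[int]], row: int, col: int, radius: int) -> Counter:
    """Count land use classes in the square window around (row, col).

    The central cell itself is excluded and cells outside the lattice are
    ignored, so edge cells simply have fewer neighbours.
    """
    counts: Counter = Counter()
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if (r, c) == (row, col):
                continue
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                counts[grid[r][c]] += 1
    return counts


# Hypothetical lattice using the class codes of Table 3:
# 1 urban, 2 industry, 3 agriculture, 4 forest, 5 water.
land_use = [
    [3, 3, 4, 4],
    [3, 1, 1, 4],
    [3, 1, 2, 4],
    [3, 3, 3, 5],
]
features = neighbour_counts(land_use, row=1, col=1, radius=1)
corner = neighbour_counts(land_use, row=0, col=0, radius=1)
```

Counts of this kind, together with the distance and slope variables, would form the feature vector from which the decision tree learns the transition rules.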
3.4. Model calibration and validation
This section presents the calibration and validation process and addresses the extent to which the model was able to replicate real, observed changes in the land use of the study area. To calibrate the model, we divided the database into two parts, a learning part and a testing part, at two levels. The database was first divided at the level of cells which changed their state between 1990 and 2000 (= 7802), of which 70% (= 5462) were used for learning and 30% (= 2340) for testing/validation. Second, it was divided at the level of cells which remained unchanged between 1990 and 2000 (= 454,015), of which 30% (= 136,205) were used for learning and the remaining 70% for testing/validation. This approach is important as it made it possible to identify land use change patterns even when the “system” appeared to be “stable” in time, and it allowed the model to deal with the issue of unbalanced data (i.e. the number of observed unchanged cells far exceeds the number of changed ones) (Charif, 2013; Charif and Basse, 2015). Table 4 shows the model performance at two levels; first, when predicting the
cells which change their state (changed-set-results) with a prediction score of 82.44%. It is important to first concentrate on these types of cells in order to predict realistically the number of cells which actually change their state within the entire system and to be able to detect and isolate the land use type that is more conducive/sensitive to change (Pontius, Huffaker, & Denman, 2004a; 2004b; White, 2006). By focussing on these changes, it becomes possible to quantify the change potential in the overall land use system. Model validation then proceeds at the level of cells which do not change their state, the “unchanged-set-results”. These cells were successfully predicted with a rate of 99.35%. The overall success rate considering both changed and unchanged cells is therefore 99.23%. The high success rate shows that the model was capable of learning the land use change patterns despite dealing with the complexity of the cross-border spatial system. Table 5 reinforces our previous remarks. In this table, the diagonal elements of the matrix represent the number of correctly predicted pixels of each of the four land use classes and the off-diagonal elements represent the number of wrongly predicted cells, i.e. prediction errors. Based on the confusion matrix (Table 5), the model can be considered well constructed, able to use diverse empirical datasets and ready to predict land use change until 2006. To evaluate model performance in 2006, we predicted the land use map of 2006. The confusion matrix shows that the model managed to accurately predict the land use of 2006 (Table 6). Regarding the overall model performance (Tables 4, 5 and 6), we believe that the decision learning tree algorithm enhances the performance of the CA model when predicting the land use maps of 2000 and 2006, by (a) highlighting the influence of neighbourhood in the land use class interactions, (b) revealing the spatial distribution of
Table 4 Three steps for the validation of results.
| Result set | In percentage from the full dataset (%) |
|---|---|
| Changed-Set-Results | 82.44 |
| Unchanged-Set-Results | 99.35 |
| Full-Set-Results | 99.23 |
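The split sizes reported in Section 3.4 and the full-set rate in Table 4 can be cross-checked with a few lines of Python. The sketch assumes that the full-set rate is the average of the two test-set rates weighted by the number of test cells; the text does not state this formula, so it is an interpretation, and the variable names are illustrative.

```python
# Cell counts reported in Section 3.4.
changed_total = 7802
changed_learning = 5462                                   # 70% of changed cells
changed_testing = changed_total - changed_learning        # remaining 30%

unchanged_total = 454015
unchanged_learning = 136205                               # 30% of unchanged cells
unchanged_testing = unchanged_total - unchanged_learning  # remaining 70%

# Success rates reported in Table 4.
changed_rate = 0.8244
unchanged_rate = 0.9935

# Assumed formula: test-set rates weighted by the number of test cells.
correct = changed_rate * changed_testing + unchanged_rate * unchanged_testing
overall_rate = correct / (changed_testing + unchanged_testing)
print(f"{overall_rate * 100:.2f}%")
```

Because unchanged cells make up the vast majority of the test set, the full-set rate stays close to the unchanged-set rate, which is why the changed-set rate is reported separately.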
Table 3 Model inputs.
Variables: Description of the variables
Land use states: 1 Urban, 2 Industry, 3 Agriculture, 4 Forest, 5 Water, 6 Transport
Adapted Moore-neighbourhood:
- Urban-neighbours in the 10 × 10 Moore neighbourhood
- Industrial-neighbours in the 10 × 10 Moore neighbourhood
- Agriculture-neighbours in the 10 × 10 Moore neighbourhood
- Forest-neighbours in the 10 × 10 Moore neighbourhood
- Water-neighbours in the 10 × 10 Moore neighbourhood
Border/Frontier: Distance to border/frontier in kilometres
Slope gradient: Slope value of cell (%)
Distance / Travel time: Distance to main cities
- Distance to Luxembourg city in min
- Distance to Differdange city in min
- Distance to Esch-sur-Alzette city in min
Transport-neighbours: Amount of transport cells in the neighbourhood of the studied cell
- Distance to the closest bus station (metres)
- Distance to the closest train station (metres)
- Distance from cell to the nearest highway access point (km)
- Number of bus stations located 2 km away from cell
- Number of train stations located 2 km away from cell
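The neighbour counts listed in Table 3 can be sketched as follows. This is an illustration only: the function `moore_counts`, the clipping at the grid edge, the exclusion of the centre cell and the toy grid are assumptions, and the exact window definition used in the model is not spelled out here, so the sketch is parameterised by a radius.

```python
LAND_USE = {1: "Urban", 2: "Industry", 3: "Agriculture",
            4: "Forest", 5: "Water", 6: "Transport"}


def moore_counts(grid, row, col, radius):
    """Count cells of each land use class around (row, col).

    The window is the (2 * radius + 1) x (2 * radius + 1) square centred
    on the cell, clipped at the grid edge; the centre cell is excluded.
    """
    counts = {code: 0 for code in LAND_USE}
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            if (r, c) != (row, col):
                counts[grid[r][c]] += 1
    return counts


# Toy 4 x 4 raster (invented for illustration)
grid = [
    [1, 1, 3, 3],
    [1, 2, 3, 4],
    [3, 3, 4, 4],
    [3, 6, 4, 5],
]

centre = moore_counts(grid, 1, 1, radius=1)
corner = moore_counts(grid, 0, 0, radius=1)
for code, n in centre.items():
    print(f"{LAND_USE[code]:12s} {n}")
```

A cell at the corner of the raster sees a truncated window (three neighbours instead of eight at radius 1), which is one way of picturing the edge problem discussed in Section 3.5.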
R.M. Basse et al. / Applied Geography 67 (2016) 94e108
Table 5 The confusion matrix table for the year 2000.
Observed situation 2000    Simulated situation 2000
                           Urban      Industry   Agriculture   Forest
Urban                      69,142     27         5507          68
Industry                   21         15,348     1629          823
Agriculture                141        298        684,229       904
Forest                     10         103        113           513,226
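As a worked check on Table 5, the overall agreement is the sum of the diagonal divided by the sum of all cells. The sketch below copies the counts in the order printed; reading the rows as observed classes and the columns as simulated classes is an assumption about the table layout, and the helper names are hypothetical. The overall figure does not depend on that orientation, but the per-row shares do.

```python
CLASSES = ["Urban", "Industry", "Agriculture", "Forest"]

# Counts copied from Table 5, row by row as printed
TABLE_5 = [
    [69142, 27, 5507, 68],
    [21, 15348, 1629, 823],
    [141, 298, 684229, 904],
    [10, 103, 113, 513226],
]


def overall_accuracy(matrix):
    """Share of cells on the diagonal (correctly predicted cells)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def row_agreement(matrix):
    """Diagonal count divided by the row total, for each row."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


acc = overall_accuracy(TABLE_5)
per_row = row_agreement(TABLE_5)

print(f"overall agreement: {100 * acc:.2f}%")
for name, share in zip(CLASSES, per_row):
    print(f"{name:12s} {100 * share:.2f}%")
```

On these counts the overall agreement comes out at about 99.25%, in line with the full-set rate reported in Table 4. The row shares suggest that the two large classes (agriculture and forest) are reproduced almost perfectly, while the urban and industrial rows carry most of the relative error.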
the land use pattern, and (c) showing the key drivers that influence land use transitions within the hierarchical structure of the DLT. With the DLT as the rule-based model, knowledge about the transition from one land use class to another is formalised to a certain degree.

3.5. Neighbourhood sensitivity: a different way to explore model performance

In order to measure the sensitivity of the CA to the Moore neighbourhood, we trained the decision tree with a neighbourhood radius varying from 1 to 10 and recorded its performance in predicting land use change on the test dataset. To avoid randomness bias, we repeated this procedure 100 times for each neighbourhood setting. Fig. 10 shows that performance increased with the radius and reached its maximum for a neighbourhood radius equal to 10. Neighbourhood sensitivity was tested in a more restricted area, corresponding to the Luxembourg national territory. Reducing the size of the study area made it possible to better measure the probability of cell changes within the varying neighbourhood. Working in an area smaller than the initial study area also had the major advantage of letting us set aside the edge (“bordure”) of the initial study area. The edge problem can also be addressed by increasing the size of the study area. We chose to reduce it in order to control the quality of the harmonised data and to keep model calibration and model validation closely matched.

4. Discussion

The exercise presented here involved modelling the land use of a cross-border region by taking into consideration variables such as the transport network, state borders, physical determinants (slope gradient in percentage) and neighbourhood effects. The results show that the complexity of land use in a border context cannot be captured using simple statistical tools alone; these tools must be supplemented with GIS and other more comprehensive methods, such as the model presented in this paper.
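The neighbourhood sensitivity test of Section 3.5 (radii 1 to 10, 100 repetitions per setting) amounts to a repeat-and-average loop. The sketch below shows only that protocol. The function `sensitivity_sweep` is hypothetical, and `placeholder_score` is a stand-in for "train the decision tree with this radius, test it, return the accuracy"; its numbers are invented so that the loop can run and are not results of the study.

```python
import random
from statistics import mean, stdev


def sensitivity_sweep(radii, repeats, score_once, seed=0):
    """Average a score over `repeats` runs for each neighbourhood radius.

    score_once(radius, rng) is supplied by the caller; it should train and
    test one model for the given radius and return an accuracy in [0, 1].
    """
    rng = random.Random(seed)
    summary = {}
    for radius in radii:
        scores = [score_once(radius, rng) for _ in range(repeats)]
        summary[radius] = (mean(scores), stdev(scores))
    return summary


def placeholder_score(radius, rng):
    # Invented stand-in for one training/testing run (NOT study results):
    # a mild upward trend with the radius plus run-to-run noise.
    return min(1.0, 0.90 + 0.008 * radius + rng.uniform(-0.005, 0.005))


summary = sensitivity_sweep(range(1, 11), repeats=100,
                            score_once=placeholder_score)
best_radius = max(summary, key=lambda r: summary[r][0])

for radius, (avg, spread) in summary.items():
    print(f"radius {radius:2d}: mean {avg:.4f}, sd {spread:.4f}")
print("best radius:", best_radius)
```

Reporting the spread alongside the mean makes it easier to judge whether the difference between two neighbouring radii is larger than the run-to-run variation that the repetitions are meant to average out.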
Integrating DLT and CA made it possible to include all the structural variables of cross-border land use systems and to provide a realistic, robust and complex land use model. Indeed, analysing the model's results together with the accessibility maps suggests that land use behaves in several different ways in relation to transport facilities, particularly in residential zones (urban class), which are either attracted or repelled by transport infrastructure. Spatial competition between urban and industrial features occupying the same location was
Fig. 10. Model performance: sensitivity to neighbourhood radius in cell.
obvious. The observed rivalry explains why cells belonging to the urban and industrial classes are interlocked and naturally adjacent in some regions. In cases where there was repulsion, the urban class distanced itself from the transport infrastructure, leading to sprawl and dispersion at urban cell level (urban sprawl phenomenon). By contrast, the industrial land use categories often sought to be located in immediate proximity to transport facilities in order to obtain the best economic performance (e.g., price of properties, rental fees, transport costs, attracting clients, vicinity of human resources). Focussing on the state borders, we observed a physical barrier effect. This is logical if we consider the variables used to construct the model. We were therefore able to show that the majority of urban and industrial zones were located close to the border. Moreover, the further one moved away from the border, the less concentrated the urban and industrial cells were. The only exception was the “sillon lorrain” (extreme south of the study area), which forms a dense urban belt with significant industrial zones. The cells which make up the industrial category were primarily concentrated within and just outside Luxembourg's southern borders. This phenomenon can be attributed to the economic history of the region, which has specialised in heavy industry (specifically steel) over the decades. Indeed, cities such as Florange and
Table 6 The confusion matrix table for 2006.
Observed situation 2006    Simulated situation 2006
                           Urban      Industrial   Agriculture   Forest
Urban                      68,783     159          2397          204
Industrial                 291        15,180       865           546
Agriculture                2701       1211         681,384       2025
Forest                     111        674          3128          512,045
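The off-diagonal cells of Table 6 show where the 2006 prediction goes wrong. The sketch below ranks them by size. It reads the table with rows as observed classes and columns as simulated classes, which is an assumption about the table layout; the helper `ranked_confusions` is hypothetical.

```python
CLASSES = ["Urban", "Industrial", "Agriculture", "Forest"]

# Table 6, read with rows = observed 2006 and columns = simulated 2006
# (an assumption about the layout of the printed table).
TABLE_6 = [
    [68783, 159, 2397, 204],
    [291, 15180, 865, 546],
    [2701, 1211, 681384, 2025],
    [111, 674, 3128, 512045],
]


def ranked_confusions(matrix, labels):
    """Return (count, row class, column class) for every off-diagonal cell,
    largest count first."""
    pairs = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j
    ]
    return sorted(pairs, reverse=True)


errors = ranked_confusions(TABLE_6, CLASSES)
total = sum(sum(row) for row in TABLE_6)
wrong = sum(count for count, _, _ in errors)

print(f"misallocated cells: {wrong} of {total} ({100 * wrong / total:.2f}%)")
for count, observed, simulated in errors[:3]:
    print(f"{observed} -> {simulated}: {count}")
```

On these counts about 1.11% of cells are misallocated in 2006, and agriculture is involved in each of the three largest confusions (with forest and with urban), whichever way the table is oriented.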
Fig. 11. Top left (observed/mapped land use based on CLC2000) top right (predicted/modelled land use for the year 2000). Bottom left (observed/mapped land use based on the CLC2006) bottom right (predicted/modelled land use 2006).
Differdange still bear traces of this industrial history. However, we must remain cautious at this stage of the study. Indeed, only socioeconomic and demographic data can support the existence of a border effect by showing, for example, how the concentration of industrial activities in Luxembourg's southern border area influenced employment organisation in the southern part of the study area. To a certain extent, this influence led to the specialisation of the Lorrain corridor's workforce. Finally, the land use maps (simulated situations in 2000 and 2006, Fig. 11) and the difference maps between the observed and simulated land use maps for 2000 and 2006 (Fig. 12)
indicate that there is no significant land use change in the study area. The unchanged-set-results (99.35%) and the full-set-results (99.23%) obtained during the validation process also confirm this observation. The overall modelling results reveal that the area is still dynamic, but that changes are slow and do not affect the structure of the land use system. Regarding model performance, we are confident that the presented methodology can be replicated and that the model can be adapted efficiently to other areas of interest.
Fig. 12. Areas where differences occur between observed and predicted features for the years 2000 and 2006.
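A difference map of the kind shown in Fig. 12 can be obtained by comparing the observed and simulated rasters cell by cell. The following is a minimal sketch, assuming two aligned grids of class codes; `difference_map` and the toy grids are hypothetical and are not the study data.

```python
def difference_map(observed, simulated):
    """Return a grid holding 1 where the two maps disagree, 0 elsewhere."""
    return [
        [int(o != s) for o, s in zip(obs_row, sim_row)]
        for obs_row, sim_row in zip(observed, simulated)
    ]


# Toy 3 x 3 rasters (invented): 1 Urban, 2 Industry, 3 Agriculture, 4 Forest
observed = [
    [1, 1, 3],
    [3, 3, 4],
    [3, 4, 4],
]
simulated = [
    [1, 3, 3],
    [3, 3, 4],
    [3, 4, 2],
]

diff = difference_map(observed, simulated)
n_diff = sum(sum(row) for row in diff)
share = n_diff / (len(diff) * len(diff[0]))

for row in diff:
    print("".join("#" if cell else "." for cell in row))
print(f"{n_diff} differing cells ({100 * share:.1f}%)")
```

Unlike a confusion matrix, the difference map keeps the location of each error, so spatial clusters of disagreement (for example along a border or a transport axis) remain visible.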
5. Conclusion

The results of the study demonstrate that land use dynamics in a cross-border region can be modelled adequately by combining GIS, CA and DLT methods, particularly when spatial and explicit variables are used. Indeed, these variables play a crucial role in land use development and spatial interconnections. The results also indicate that the model is able to predict correctly the changes in land use characteristics, as shown by the changed-set-results validation step. These results support the hypothesis that a CART implementation of a decision tree with the Gini index is able to test all probabilities in cell state transitions. It is also capable of learning the transitions between land use categories during the simulation period by selecting the best split criterion in a satisfactory way. This paper also examines neighbourhood sensitivity, which is clearly beneficial in the model validation process and helps to ensure good model performance. The overall modelling results show that the area analysed is still dynamic but the changes have slowed
down in the last decade. Future research will use the same method but will go further by integrating the different socio-economic conditions and planning issues across the cross-border region of Luxembourg.

Acknowledgements

The authors would like to acknowledge the SMART-BOUNDARY project (Automates cellulaires pour la simulation de la croissance urbaine et des mobilités transfrontalières), co-funded by CNRS (Centre National de la Recherche Scientifique), France and FNR (Fonds National de la Recherche), Luxembourg. This material is also based upon work partially supported by an AFR (Aide à la Formation Recherche) Grant No. PHD-09-077, funded by the FNR. The authors also acknowledge the following contribution: «Modelling land use dynamics in Luxembourg cross border region: The use of cellular automata and decision tree learning model», presented by Omar Charif and Reine Maria Basse at the GeoComputation 2015 conference, May 20-23, 2015, Dallas, USA.
References

Annoni, A., Luzet, C., Gubler, E., & Ihde, J. (Eds.). (2001). Map projections for Europe, European Commission, Directorate-General Joint Research Centre (p. 131). Ispra, Italy: Institute for Environment and Sustainability. EUR 20120 EN.
Ballestores, F., Jr., & Qiu, Z. (2012). An integrated parcel-based land use change model using cellular automata and decision tree. Proceedings of the International Academy of Ecology and Environmental Sciences, 2(2), 53-69.
Basse, R. M., Omrani, H., Charif, O., Gerber, P., & Bodis, K. (2014). Land use changes modelling using advanced methods: cellular automata and artificial neural networks. The spatial and explicit representation of land cover dynamics at the cross-border region scale. Applied Geography, 53, 160-171.
Batty, M., & Xie, Y. (1994). From cells to cities. Environment and Planning B: Planning and Design, 21(7), 31-48.
Batty, M., Xie, Y., & Sun, Z. L. (1999). Modeling urban dynamics through GIS-based cellular automata. Computers, Environment and Urban Systems, 23(3), 205-233.
Benenson, I., & Torrens, P. M. (2004). Geosimulation: object-based modeling of urban phenomena. Computers, Environment and Urban Systems, 28(1-2), 1-8.
Breiman, L. (1993). Classification and regression trees. CRC Press.
Breiman, L., Friedman, J. H., Olshen, R., & Stone, C. J. (1984). Classification and regression trees. Wadsworth statistics/probability series. Belmont, CA, USA: Wadsworth Advanced Books and Software.
Briassoulis, H. (1999). Analysis of land use change: Theoretical and modeling approaches. Morgantown, West Virginia, USA: Regional Research Institute, West Virginia University.
Brunet-Jailly, E. (2005). Theorizing borders: an interdisciplinary perspective. Geopolitics, 10, 633-649.
Charif, O. (2013). Modelling and simulating individual's mobility: Case study of Luxembourg and its greater region. PhD diss. Université de Technologie de Compiègne.
Charif, O., & Basse, R. M. (2015). Modelling land use dynamics in Luxembourg cross border region: the use of cellular automata and decision tree learning model. In Geocomputing conference, Dallas, Texas, USA, 20-23 May.
Clarke, K. C., & Gaydos, L. J. (1998). Loose-coupling a cellular automata model and GIS: long-term urban growth prediction for San Francisco and Washington/Baltimore. International Journal of Geographic Information Science, 12, 699-714.
Couclelis, H. (1985). Cellular worlds: a framework for modeling micro-macro dynamics. Environment and Planning A, 20, 99-109.
Decoville, A., Durand, F., Sohn, C., & Walther, O. (2013). Comparing cross-border metropolitan integration in Europe: towards a functional typology. Journal of Borderlands Studies, 28(2), 221-237.
EEA. (2000). CORINE land cover technical guide - Addendum 2000. Technical report No 40 (p. 105). Copenhagen: European Environment Agency. URL http://reports.eea.europa.eu/tech40add/en.
Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., et al. (2007). The shuttle radar topography mission. Reviews of Geophysics, 45, RG2004. http://dx.doi.org/10.1029/2005RG000183. Available at: the website of Consultative Group for International Agriculture Research (CGIAR), CGIAR Consortium for Spatial Information (CGIAR-CSI) http://srtm.csi.cgiar.org/.
Gardner, M. (1970). The fantastic combination of John Conway's new solitaire game life. Scientific American, 223, 120-123.
Geist, H. J., & Lambin, E. F. (2001). What drives tropical deforestation? A meta-analysis of proximate and underlying causes of deforestation based on sub-national case study evidence. Louvain-la-Neuve, France: LUCC International Project Office, University of Louvain-la-Neuve.
Gerber, P. (2012). Advancement in conceptualizing cross-border daily mobility: the Benelux context in the European Union. European Journal of Transport and Infrastructure Research, 12(2), 178-197.
Goodman, R. M., & Smyth, P. (1988). Decision tree design from a communication theory standpoint. IEEE Transactions on Information Theory, 34(5), 979-994.
Guisan, A., & Zimmermann, N. E. (2000). Predictive habitat distribution models in ecology. Ecological Modelling, 135(2-3), 147-186.
Houet, T., Verburg, P. H., & Loveland, T. (2010). Monitoring and modelling landscape dynamics. Landscape Ecology, 25(2), 163-167.
Lambin, E. F., Geist, H. J., & Lepers, E. (2003). Dynamics of land use and land-cover change in tropical regions. Annual Review of Environmental Resources, 28, 205-241.
Lambin, E. F., Rounsevell, M. D. A., & Geist, H. J. (2000). Are agricultural land-use models able to predict changes in land-use intensity? Agriculture Ecosystems and the Environment, 82, 1-3.
Li, X., & Claramunt, C. A. (2006). Spatial entropy-based decision tree for classification of geographical information. Transactions in GIS, 10(3), 451-467.
Li, X., & Yeh, A. G. (2004). Data mining of cellular automata's transition rules. International Journal of Geographical Information Science, 18(8), 723-744.
Moore, D. M., Lees, B. G., & Davey, S. M. (1991). A new method for predicting vegetation distributions using decision tree analysis in a geographic information system. Environmental Management, 15(1), 59-71.
Paasi, A. (2005). Generations and the ‘development’ of border studies. Geopolitics, 10, 663-671.
Pal, M., & Mather, P. M. (2003). An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sensing of Environment, 86(4), 554-565.
Phipps, M. (1989). Dynamical behavior of cellular automata under the constraint of neighborhood coherence. Geographical Analysis, 21, 197-204.
Pontius, R. G., Jr., Huffaker, D., & Denman, K. (2004a). Useful techniques of validation for spatially explicit land-change models. Ecological Modelling, 179(4), 445-461.
Pontius, R. G., Jr., Shusas, E., & McEachern, M. (2004b). Detecting important categorical land changes while accounting for persistence. Agriculture, Ecosystems & Environment, 101(2-3), 251-268.
Raileanu, L. E., & Stoffel, K. (2004). Theoretical comparison between the gini index and information gain criteria. Annals of Mathematics and Artificial Intelligence, 41(1), 77-93.
Razi, M. A., & Athappilly, K. (2005). A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models. Expert Systems with Applications, 29(1), 65-74.
Schiebel, J., Omrani, H., & Gerber, P. (2015). Border effects on the travel mode choice of resident and cross-border workers in Luxembourg. European Journal of Transport and Infrastructure Research, 15(4), 570-596.
Sohn, C., Reitel, B., & Walther, O. (2009). Cross-border metropolitan integration in Europe: The case of Luxembourg, Basel and Geneva. Environment & Planning C, 27(5), 922-939.
Speybroeck, N., Berkvens, D., Mfoukou-Ntsakala, A., Aerts, M., Hens, N., Van Huylenbroeck, G., et al. (2004). Classification trees versus multinomial models in the analysis of urban farming systems in central Africa. Agricultural Systems, 80(2), 133-149.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2), 234-240.
Tobler, W. R. (1979). Cellular geography. In S. Gale, & G. Ollson (Eds.), Philosophy in geography (pp. 279-386). Dordrecht: Reidel.
Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.
Ulam, S. (1952). Random processes and transformations. In Proceedings of the International Congress of Mathematicians (Cambridge, Massachusetts, August 30-September 6, 1950) (vol. 2). Rhode Island: American Mathematical Society, 264-275.
Verburg, P. H., van Berkel, D. B., van Doorn, A., van Eupen, M., & van den Heiligenberg, H. (2010). Trajectories of land use change in Europe: a model-based exploration of rural futures. Landscape Ecology, 25(2), 217-232.
Vogt, J. V., Soille, P., de Jager, A., Rimaviciute, E., Mehl, W., Foisneau, S., et al. (2007a). A Pan-European river and Catchment database, European Commission, Directorate-General Joint Research Centre. JRC Reference Reports, EUR 22920 EN (p. 119). Ispra, Italy: Institute for Environment and Sustainability. URL http://desert.jrc.ec.europa.eu/action/php/index.php?action=view&id=23.
Von Neumann, J. (1951). The general and logical theory of automata. In L. A. Jeffress (Ed.), Cerebral Mechanism in Behavior-the Hixon Symposium, 1948 (pp. 1-41). Pasadena, CA, New York: Wiley.
White, R. (2006). Pattern based map comparisons. Journal of Geographical Systems, 8(2), 145-164.
White, R., & Engelen, G. (1997). The use of constrained cellular automata for high-resolution modelling of urban land use dynamics. Environment and Planning B, 24(3), 323-343.
White, R., & Engelen, G. (2000). High-resolution integrated modelling of the spatial dynamics of urban and regional systems. Computers, Environment and Urban Systems, 24, 383-400.
Wu, S., Silvan-Cardenas, J., & Wang, L. (2007). Per-field urban land use classification based on tax parcel boundaries. International Journal of Remote Sensing, 28(12), 2777-2801.