Journal of Environmental Management 90 (2009) 236–250 www.elsevier.com/locate/jenvman
Predicting land cover using GIS, Bayesian and evolutionary algorithm methods
M.J. Aitkenhead a,*, I.H. Aalders b
a Department of Plant and Soil Science, University of Aberdeen, St. Machar Drive, Aberdeen AB24 3UU, Scotland, UK
b The Macaulay Institute, Craigiebuckler, Aberdeen AB15 8QH, Scotland, UK
Received 12 April 2006; received in revised form 16 July 2007; accepted 14 September 2007
Available online 20 February 2008
Abstract Modelling land cover change from existing land cover maps is a vital requirement for anyone wishing to understand how the landscape may change in the future. In order to test any land cover change model, existing data must be used. However, it is often not known which data should be applied to the problem, or whether relationships exist within and between complex datasets. Here we have developed and tested a model that applies evolutionary processes to Bayesian networks. The model was developed and tested on a dataset containing land cover information and environmental data, in order to show that decisions about which datasets should be used could be made automatically. Bayesian networks are amenable to evolutionary methods as they can be easily described using a binary string to which crossover and mutation operations can be applied. The method, developed to allow comparison with standard Bayesian network development software, proved capable of carrying out a rapid and effective search of the space of possible networks in order to find an optimal or near-optimal solution for the selection of datasets that have causal links with one another. Land cover mapping in the North-East of Scotland was compared with that produced by a commercial Bayesian software package. The evolutionary method was shown to provide greater flexibility in adapting to the available evidence and knowledge and in developing effective and accurate network structures, at the cost of requiring additional computer programming skills. The dataset used to develop the models included GIS-based data taken from the Land Cover of Scotland 1988 (LCS88), Land Capability for Forestry (LCF), Land Capability for Agriculture (LCA), the soil map of Scotland and additional climatic variables. © 2007 Elsevier Ltd. All rights reserved. Keywords: Bayesian statistics; Bayesian networks; Land Cover of Scotland 1988 (LCS88); Evolutionary algorithms; Land cover; Land cover change; GIS
* Corresponding author. Tel.: +44 1224 498200x2239; fax: +44 1224 498206. E-mail addresses: [email protected], m.aitkenhead@abdn.ac.uk (M.J. Aitkenhead).
0301-4797/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.jenvman.2007.09.010

1. Introduction
The demand for accurate, end-user relevant and up-to-date information on the Earth's land cover and land cover dynamics is increasing. Skelsey et al. (2004) describe how the increasing supply of raw data from satellites and aerial photography is not being matched by our ability to automate the analysis of this information and render it in useful, targeted formats. In addition, the potential of other information sources that may allow land cover mapping to be carried out, such as climatic or topographic information, is not being fully realised. Here we present a method of implementing different spatial datasets within a Bayesian system that can automatically integrate these disparate datasets and determine which of them are most applicable for producing a specific land cover map. Many approaches to land cover mapping exist within the realm of remote sensing image interpretation, but the majority of these are 'method-oriented', i.e. mapping based on one specific methodology, rather than 'results-oriented', i.e. mapping which applies a combination of analyses and data (such as neural networks, fuzzy k-means, textural measurements, etc.) to solving the problem at hand, with the method selection and application determined by the limits of what each of those methods can achieve. A disadvantage of using method-oriented
approaches is that, quite simply, no single method is, or ever will be, perfect in its ability to provide land cover mapping capabilities. Using a results-oriented approach instead would allow several different methods to be applied, with the problem then becoming the search for a meaningful integration of these different methods. While Bayesian networks provide a flexible framework for modelling with limited or incomplete information, the design of Bayesian networks is largely based on a logical understanding of the relationship structure of the variables (Welp et al., 2006), where the variables are linked to reflect either causal chains, common cause or common effect (Korb and Nicholson, 2004). Here we present a results-based approach that identifies an optimal design for a Bayesian network using evolutionary methods, avoiding bias or logical error in the assumptions about relationships between any two particular factors within the system. The goal of this work is to show that evolutionary methods can provide a useful tool for creating an objective Bayesian network structure, by applying this optimisation to a real-life situation in which several data sources are used to map the land cover. We do not mean to state that other data-mining methodologies, such as decision trees (Brown De Colstoun and Walthall, 2006; Pal, 2006; McCarty et al., 2007) or neural networks (Kuplich, 2006; Cots-Folch et al., 2007) are less effective for producing land cover maps. Instead, our intention is to demonstrate the utility of linking evolutionary approaches with Bayesian systems in providing a classification system capable of flexibility in data type usage, and to show that a Bayesian system can be evolved to produce optimal or near-optimal network designs based on the datasets available. 1.1. 
The problems of mapping and modelling land cover Recent years have seen the development of land cover and land cover dynamics models which attempt to include not just environmental conditions but also management practices, economic conditions and other human-specific considerations such as social factors (Lowe et al., 1992; Erasmus et al., 2002; Kloprogge and van der Sluijs, 2002; Mertens et al., 2002). For these models to be successfully implemented, a wide range of variables must be applied, both subjective and objective, descriptive and numerical. These variables must be used in such a way that the relationships between them can be described and implemented effectively. This requirement has created a demand for modelling methods that are flexible enough to be applied across any dataset that is available. The distinction here between mapping and modelling is blurred because while mapping is usually carried out using information directly representative of the land cover (i.e. imagery), we wish to avoid discounting the application of datasets that do not have direct relationships to land cover but from which inferences of land cover can be made using modelling methods. Geographical Information Systems (GIS) provide a method of combining different layers of data that are relevant to the same spatial area. They allow not only visualisation and intuitive comprehension of different datasets but also statistical
analysis of the interactions taking place. GISs are rapidly becoming a vital part of any successful environmental modelling approach, allowing disparate data source materials to be brought together, for example satellite imagery and questionnaire results (Müller and Zeller, 2002), or digital maps and model predictions (Münier et al., 2001). Anselin (2000) emphasised the importance of the relationship between GIS and spatial statistics, stressing that the correct choice of statistical analysis method coupled with GIS-based information can provide a powerful tool for modelling land cover dynamics. Aspinall et al. (1993) described specific approaches required to ensure that GIS methods could be effectively applied to land cover change modelling and land use planning, identifying flexibility of information processing as a particularly important aspect. Comber et al. (2004) provided a comparison of several knowledge-based techniques of determining land cover change, including Bayesian statistics, in which the ability to predict a variable within a dataset is improved by drawing on historical relationships with additional variables within that or other datasets. Rigorous Bayesian methods were shown to be most useful where the information was expressed numerically and in situations where prior evidence provided a complete probability distribution (see Section 1.3). The goal of this work is to demonstrate the effectiveness of an adaptation of Bayesian networks that can be applied to land cover mapping/modelling and that can be implemented using many different types of data. This approach, if successful, could be used both to map existing land cover and to determine the trajectories of land cover change. The method uses an evolutionary algorithm to optimise a Bayesian network for the extraction of information from several noisy datasets containing a mixture of quantitative and qualitative information.
As is common and to be expected in land cover mapping projects, the datasets available do not contain sufficient information to produce a highly accurate model. Even so, meaningful relationships can be extracted from the data. The use of evolutionary methodologies in tandem with Bayesian networks allows us to develop a 'results-oriented' approach that relies less on one specific data type and more on using the available data to produce a better predictive model. In addition to the model development, a comparison is made with an existing Bayesian network software package, Netica™ (Norsys, http://www.norsys.com), in which the customised evolutionary method and the off-the-shelf product are evaluated in terms of speed, flexibility and accuracy.

1.2. Land Cover of Scotland 1988 (LCS88)
Any method that is used to predict land cover must have some validation dataset available to demonstrate its utility, which in this case implies either an existing land cover map or a collection of sampled data. This validation data must be sufficiently sophisticated and widely applied that a demonstration of the ability to automate land cover mapping in the future using the same legend would be useful to a range of end-users.
The Land Cover of Scotland 1988 (LCS88) (MLURI, 1993) survey was instigated in 1987 by The Scottish Office. The survey was carried out in three phases: 1. Medium scale air photography was acquired for the whole of Scotland; 2. Photographic interpretation was carried out, to obtain a 1:250,000 digital map of land cover information that could be used in conjunction with Geographic Information Systems (GIS); 3. Ground-truth assessment was carried out to allow the accuracy of the aerial photography interpretation to be assessed. The overall accuracy of the final land cover map was found to be approximately 98%, with this accuracy being consistent across Scotland and between land cover classes. Substantial proportions of phases 2 and 3 were carried out by the Macaulay Land Use Research Institute (MLURI), now The Macaulay Institute. The data have been widely used by the principal sponsoring body, the contributing bodies of Scottish Natural Heritage (the Nature Conservancy Council for Scotland) and the Forestry Commission, plus local government, academia and the private sector, as a key data source that has led to improved policy formulation and development. While the initial LCS88 survey was regarded as successful, it required many man-years of effort, particularly in the aerial photography interpretation stage, and experienced problems with interpreter consistency across Scotland (Aspinall et al., 1993). The time and expense required to carry out this task are important factors influencing decisions of whether or not to repeat the survey. Therefore, any methods that can provide a degree of automation to the LCS updating process, particularly those that provide flexibility in their information handling, will be useful. 
A successful method must rely not on a single data source which may become unavailable over time or which may be replaced by something better, but must be capable of accepting whatever information is available at the time and extracting the maximum amount of information from this. Following the development of a land cover map using any manual or automated method an assessment of the map accuracy will need to be carried out. However, the time taken to do this is considerably less than the time taken to produce the map as the number of points examined is far fewer than the map contains. Bayesian methods are seen as one potential solution due to their ability to use prior knowledge to determine land cover transition probabilities. This prior knowledge often already exists in one form or another, reducing the need for expensive and time-consuming data acquisition. In addition, Bayesian methods are well suited for the processing of environmental data due to their ability to handle a noisy mixture of objective and subjective data. 1.3. Bayesian methods of image classification Many problems come to light when one attempts to classify land cover from remote sensing imagery. Discrimination of
classes can prove difficult for many reasons, particularly when within-class variability is high, as is often the case (McIver and Friedl, 2002). Use of prior probabilities, in which additional knowledge about the situation is added to boost the separability of classes, has been shown to work in cases where environmental transitions are unclear (Marcot et al., 2001; Borsuk et al., 2002; Rouget et al., 2003). Other areas in which Bayesian-class methods have proven useful include Decision Support Tools (DSTs) (e.g. Stassopoulou et al., 1998; Palomo et al., 2002) and optimisation of complex system models (Berger and Insua, 1998; Cowles, 2003), along with their application within classical probabilistic modelling methodologies. As discussed by Comber et al. (2004), Bayesian methods work best when a complete prior probability distribution is available (i.e. all possible permutations of the variables have been described in terms of their outcomes). In situations such as this where variables take a finite number of discrete values, it is also necessary to know the probability of each value occurring. However, even in cases where the available dataset does not include all possible permutations of the studied variables, it is possible to develop an adaptation of Bayesian methods to produce relationship networks from the data available. This is made possible by the ability of Bayesian methods to handle objective (i.e. measured numerical or categorical values) and subjective (categorical terms based on descriptions) information, and their ability to provide meaningful results even when prior knowledge is incomplete. For a large proportion of datasets, particularly those with many variables, the number of permutations that are possible means that a complete probability distribution is effectively impossible to achieve. The principles behind Bayesian inference are mathematically relatively simple and yet often prove remarkably difficult to grasp. 
This core concept, true for most statistical inference, states that the more one knows about the relationships within a situation, the more one is also able to predict what one does not know about that situation. In its simplest form, Bayes' theorem states that if X and Y are two statements that can be true or false, with the probability of X being P(X), and the probability of Y being P(Y), then

$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)} \tag{1}$$

where P(X|Y) is the probability of X given Y (the posterior probability), P(Y|X) is the probability of Y given X (the likelihood, which is supplied here as prior knowledge), and P(X) is the prior probability of X. The posterior probability of X given Y is useful because it allows us to predict unknown situations from prior experience. The following example is adapted from Fotheringham et al. (2000).
Statement X: the land cover type at a certain location in a region is heather moorland.
Statement Y: the slope at this location is classed as flat.
P(X): a total of 20% of the region is covered by heather moorland (P(X) = 0.2).
P(Y): a total of 10% of the region is classed as flat (P(Y) = 0.1).
P(Y|X), known in advance: the proportion of heather moorland in the region that is flat is 30% (P(Y|X) = 0.3). This information could be based on either expert knowledge of the situation or quantitative sources.
Posterior probability P(X|Y), the probability that the land is heather moorland given that it is flat: P(X|Y) = 0.3 × 0.2/0.1 = 0.6, or 60%.
This example shows how a statement may be investigated given prior knowledge. Berger (2000) gives a good account of how Bayesian methods are being increasingly used throughout the scientific community, with researchers in the biological sciences rapidly adopting and adapting the central core of concepts to their own ends. In a Bayesian network the above example results in a network with two linked propositional nodes, heather moorland and flat terrain. The knowledge about P(Y) is used to create the conditional probability table for flat terrain (true = 10 and false = 90). The conditional probability table of heather moorland, which arises from the link from flat terrain to heather moorland, has two rows and two columns. Given the prior knowledge that 30% of heather moorland is flat, the table is filled in so that 70% of flat land is not heather moorland and 70% of heather moorland is not flat. Each row needs to total 100%, hence the entry for false flat land and false heather moorland is 30% (Table 1). With the conditional probabilities of this very simple Bayesian network filled in from our available knowledge, the network is compiled. The result of this very simple network (Fig. 1) shows that the probability of heather moorland is 66%, slightly higher than the 0.6 calculated earlier. This difference can be attributed to the fact that the probabilities are calculated based on forward and backward propagation. In real situations, the exact probabilities used to develop this simple network would not be known, but would rather be probability estimates.
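The arithmetic of the heather moorland example can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the second calculation assumes one particular reading of Table 1 (that its rows give the probability of heather moorland for flat and non-flat land respectively), under which summing over both terrain states gives the 66% reported for the compiled network.

```python
def posterior(p_y_given_x: float, p_x: float, p_y: float) -> float:
    """Bayes' theorem, Eq. (1): P(X|Y) = P(Y|X) * P(X) / P(Y)."""
    if p_y <= 0:
        raise ValueError("P(Y) must be positive")
    return p_y_given_x * p_x / p_y


def marginal(p_parent_true: float, p_child_given_true: float,
             p_child_given_false: float) -> float:
    """P(child), summed over both states of a single parent node."""
    return (p_parent_true * p_child_given_true
            + (1.0 - p_parent_true) * p_child_given_false)


# Values from the text: P(Y|X) = 0.3, P(X) = 0.2, P(Y) = 0.1
p_moorland_given_flat = posterior(0.3, 0.2, 0.1)   # about 0.6

# Assumption: Table 1 read as P(moorland | flat) = 0.30 and
# P(moorland | not flat) = 0.70, with P(flat) = 0.1
p_moorland = marginal(0.1, 0.30, 0.70)             # about 0.66
```

Under that reading, the 60% and 66% figures answer different questions: the first is the probability of heather moorland given flat terrain, the second the probability of heather moorland when the terrain is not yet observed.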
Unless the sampling count is insufficient, however, the accuracy of the developed system should be relatively unaffected by the use of such estimates.

Table 1 Conditional probability table for example Bayesian network
Flat | Heather moorland: True | Heather moorland: False
True | 30 | 70
False | 70 | 30

Fig. 1. Probabilities for example Bayesian network giving likelihood of heather moorland based solely on terrain information.

1.4. Evolutionary algorithms
In this work, which deals with land cover change prediction based on several variables, we implemented Bayesian methods in concert with evolutionary algorithms. For systems that contain multiple variables (e.g. land cover, soil type) and a resulting high number of possible permutations of variable values, an exhaustive comparison of every permutation would require an impractical amount of time or computing resources, and may be impossible given the scale of the system being studied. Great improvements in the time to optimisation of the Bayesian network can be gained from 'evolving' the system from its starting conditions, using mutation, crossover and natural selection. Mutation is the alteration, usually at random, of attributes of the model. The number of alterations that are made can be based on how close the model is to some target design, or how large model variations need to be to adequately allow exploration of the fitness landscape of the model. Crossover is the process of selecting certain portions of parent model A and certain other portions of parent model B and adding them together to produce a new model C. The selection of which components of the model come from which parent is made at random, and each component of the child model must come from exactly one parent. Natural selection as applied in this context is the implementation of a selection process that increases the probability of breeding or continuation of a model that is 'fitter' at the cost of others that are less fit. In an analysis of mutation and crossover, Spears (1993) argued that both processes are important in situations where evolutionary pressures are brought to bear on modelled systems. Sette et al. (1998), Aitkenhead et al. (2003) and Arenas et al. (2006) describe methods of evolving complex neural networks towards an optimal solution for a specific problem using multiple variables, while Harvey et al. (1997) and Gomez et al. (2006), amongst others, give an overview of methods used to evolve robot control systems. Many other examples of the use of evolutionary methods to optimise complex system models exist (e.g. Huo and Ma, 2001; Robin et al., 2005), with the implementations varying from extremely simple (single bit mutation) to extremely complex (biologically plausible simulations including mutation, crossover, gene expression, etc.).
As long as (a) a classification framework or methodology can be defined in some mathematical way, either as a string of ones and zeroes or some other set of values, and (b) some measure of the 'fitness' or effectiveness of the system at achieving some goal can be made, that system can be subjected to evolutionary pressures. In the case of a Bayesian network, the description is relatively simple: for N variables there exist N(N − 1) possible relational connections between variables, with a total of 2^{N(N − 1)} permutations to consider. The network can therefore be considered as a string of N(N − 1) binary values and as such is ideal for being subjected to mutation and crossover processes. Other requirements for implementing this concept include an algorithm containing an implementation of the evolutionary paradigm, with consideration made of the number of children from each surviving parent, the bit mutation rate and the frequency of crossover.

2. Methods
Several steps were required to achieve an evolved Bayesian network, including data extraction, network implementation
and evolutionary algorithm design. These steps are described in the following sections, with explanations of the datasets that were used in the land cover mapping model development. A description of the comparison of the developed model with a commercial Bayesian package (Netica™), in terms of ease of network development and comprehension of output, is given in Section 3. This is done in order to demonstrate that while software packages are currently available that can be applied to this problem, there are still issues of flexibility and data manipulation that remain to be resolved.
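As an illustration of the binary-string description of a network given in Section 1.4, the following Python sketch encodes the possible directed links between nine variables as a string of N(N − 1) bits and applies mutation and crossover to it. It is a minimal sketch rather than the Visual Basic implementation used in this work: the function names, the mutation rate of 0.02 and the fixed random seed are all assumptions made for the example, and crossover is shown in its uniform form, in which each bit of the child is taken from exactly one parent chosen at random.

```python
import random
from itertools import permutations


def edge_slots(n_vars):
    """All ordered (source, target) pairs: N(N - 1) possible directed links."""
    return list(permutations(range(n_vars), 2))


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each bit is copied from exactly one parent, chosen at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


rng = random.Random(0)            # fixed seed (assumption) so the sketch is repeatable
slots = edge_slots(9)             # nine variables give 9 * 8 = 72 possible links
parent_a = [rng.randint(0, 1) for _ in slots]
parent_b = [rng.randint(0, 1) for _ in slots]

# Hypothetical mutation rate of 0.02 per bit
child = mutate(crossover(parent_a, parent_b, rng), rate=0.02, rng=rng)
```

A bit set to 1 at position i means that the link `slots[i]` is present in the candidate network. A fitness measure and a selection step, which are not shown here, would complete one evolutionary generation.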
2.1. Map data extraction

Spatial data for nine variables were used, all available for North-East Scotland in the form of 50 m raster data. Where necessary, the data were reclassified from the original classes on the basis of 'functionality', to produce categorisations that were judged useful for the purposes of demonstrating land use and land cover change without loss of relevant information. Summary classes were developed wherever possible in a manner that retained the existing relationships within the reclassification, and between summary classes. The datasets used included:

The Land Cover of Scotland 1988 (LCS88) dataset, which is a 1:250,000 digital map created from the interpretation and generalisation of 1:24,000 black and white aerial photographs (MLURI, 1993). This 1:250,000 map is available as a vector and raster map at different resolutions. The original 34 classes were reclassified into 6 classes which represent land covers closely associated with different land uses in Scotland (see Table 2). These classes are easily identifiable within Scotland's landscape and, although they are obviously not appropriate for everyone's use (particularly as we have put several different classes into 'other'), they do allow us to distinguish between the land cover types that are dominant in Scotland.

Table 2 Summary classes of LCS
Value | Summary class | Original class | Original class designation
1 | Arable | 1 | Arable
2 | Improved grass | 2 | Improved grassland
2 | Improved grass | 30 | Improved grassland/good rough grassland
3 | Rough grazing | 3 | Good rough grassland
3 | Rough grazing | 4 | Poor rough grassland
3 | Rough grazing | 26 | Poor rough grassland/heather moorland
3 | Rough grazing | 27 | Good rough grassland/heather moorland
3 | Rough grazing | 29 | Good rough grassland/poor rough grassland
3 | Rough grazing | 31 | Good rough grassland/bracken
3 | Rough grazing | 32 | Poor rough grassland/peat
4 | Forest | 10 | Felled woodland
4 | Forest | 11 | Recent planting
4 | Forest | 12 | Coniferous plantation
4 | Forest | 13 | Semi-natural coniferous
4 | Forest | 14 | Mixed woodland
4 | Forest | 15 | Broadleafed woodland
5 | Development | 22 | Rural development
5 | Development | 23 | Urban
6 | Other | 5 | Bracken
6 | Other | 6 | Heather moorland
6 | Other | 7 | Peatland
6 | Other | 8 | Montane
6 | Other | 9 | Rocks and cliffs
6 | Other | 16 | Scrub
6 | Other | 17 | Freshwater
6 | Other | 18 | Marshland
6 | Other | 19 | Salt marshland
6 | Other | 20 | Dunes
6 | Other | 21 | Tidal waters
6 | Other | 24 | Missing/obscured
6 | Other | 25 | Heather moorland/peat
6 | Other | 28 | Peat/montane
6 | Other | 33 | Heather moorland/montane
6 | Other | 34 | Other mosaics

Land Capability for Forestry (LCF; Bibby et al., 1988), a 1:250,000-scale map that is available as a vector map for Scotland and was converted into raster format for this paper. The original classification was generalised from 9 to 4 classes, to represent areas where forestry production is viable, those areas where forestry production is limited, and those areas which are completely unsuitable for forestry either due to natural circumstances or urbanisation (see Table 3).

Land Capability for Agriculture (LCA; Bibby et al., 1991), a 1:250,000-scale map which is available for North-East Scotland in raster format. The original classification for the LCA was generalised from 15 to 5 classes, to represent areas where arable agriculture is viable, areas where arable production is limited, areas where only livestock production is possible, and areas which are unsuitable for agriculture, with a distinction between natural circumstances and urbanisation (see Table 4).

Additional climatic variables that exist as GIS layers (1:625,000) developed and published by the Macaulay Institute (Birse and Dry, 1970; Birse and Robertson, 1970; Bibby et al., 1988). The variables include: accumulated temperature (derived from the average annual accumulated temperature above 5.6 °C at 61 meteorological stations); potential water deficit (derived from the difference between calculated potential evapotranspiration and measured rainfall at 200 meteorological stations); exposure (derived from average wind speeds in metres per second at 72 meteorological stations in the UK); and accumulated frost (derived from average annual accumulated temperature below 0 °C at 61 meteorological stations) (see Table 5).

The Soil Survey for Scotland 1:250,000 map, produced and published in 1984 by the Macaulay Institute (formerly the Macaulay Institute for Soil Research) and based on survey work carried out by the Institute. The data, available in vector and raster format, are classified at a number of different hierarchical levels (soil association, soil series
and major soil subgroups). The soil classes are derived from major soil subgroups in the study area (see Table 6).

An inventory of forestry in the Grampian region by the Macaulay Institute (personal communication).

Fig. 2 shows data for two of the variables used.

Table 3 Summary classes of LCF
Value | Summary class | Original class | Original class designation
1 | Good capability | 1 | Land with excellent capability for the growth and management of tree crops
1 | Good capability | 2 | Land with very good capability for the growth and management of tree crops
1 | Good capability | 3 | Land with good capability for the growth and management of tree crops
2 | Some capability | 4 | Land with moderate capability for the growth and management of tree crops
2 | Some capability | 5 | Land with limited capability for the growth and management of tree crops
3 | Unsuitable | 6 | Land with very limited capability for the growth and management of tree crops
3 | Unsuitable | 18 | No land
4 | Urban | 20 | Urban

Table 4 Summary classes of LCA
Value | Summary class | Original class | Original class designation
1 | Very capable | 1 | Land capable of producing a very wide range of crops
1 | Very capable | 2 | Land capable of producing a wide range of crops
2 | Capable | 3a | Land capable of producing a moderate range of crops (division 1)
2 | Capable | 3b | Land capable of producing a moderate range of crops (division 2)
2 | Capable | 4a | Land capable of producing a narrow range of crops (division 1)
2 | Capable | 4b | Land capable of producing a narrow range of crops (division 2)
3 | Moderately capable | 5a | Land capable of use as improved grassland and rough grazing (division 1)
3 | Moderately capable | 5b | Land capable of use as improved grassland and rough grazing (division 2)
3 | Moderately capable | 5c | Land capable of use as improved grassland and rough grazing (division 3)
3 | Moderately capable | 6a | Land capable of use only as rough grazing (division 1)
3 | Moderately capable | 6b | Land capable of use only as rough grazing (division 2)
3 | Moderately capable | 6c | Land capable of use only as rough grazing (division 3)
4 | Unsuitable | 7 | Land of very limited agricultural value
4 | Unsuitable | Water | No land
5 | Urban | Urban | Urban

Table 5 Classes of climatic data
Accumulated temperature (in day degrees Centigrade)
Value | Class | Range
1 | Extremely cold | 0–275
2 | Very cold | 275–550
3 | Cold | 550–825
4 | Cool | 825–1100
5 | Fairly warm | 1100–1375
6 | Warm | >1375
Potential water deficit (in millimetres)
Value | Class | Range
1 | Dry | 0
2 | Rather dry | 0–25
3 | Moist | 25–50
4 | Rather wet | 50–75
5 | Wet | >75
Exposure (in m/s)
Value | Class | Range
1 | Sheltered | <2.6
2 | Moderately exposed | 2.6–4.4
3 | Exposed | 4.4–6.2
4 | Very exposed | >6.2
Accumulated frost (in day degrees Centigrade)
Value | Class | Range
1 | Extremely mild winters | >470
2 | Fairly mild winters | 230–470
3 | Moderate winters | 110–230
4 | Rather severe winters | 50–110
5 | Very severe winters | 20–50
6 | Extremely severe winters | <20

The raster data that represented these individual variables were combined into one map layer. In the process of combining the raster data
an attribute table was created which linked each individual 50 m cell to the actual combination of variable values found in that cell. This attribute table was exported to a tab-delimited text file, which was modified for use in the later modelling process.

Table 6 Classes of major soil series groups
Value | Major soil subgroup
1 | 113, Brown forest soils
2 | 118, Humus iron podzols
3 | 88, Peaty podzols
4 | 120, Non-calcareous gleys
5 | 90, Peaty gleys
6 | 85 and 86, Peat (basin, valley and blanket)
7 | 44, 46, 71, 83, (Sub-) alpine and (peaty) rankers
8 | 91, Brown magnesium soils
9 | 111, Magnesium gleys
10 | 10, 57, 101, 102, 128, Alluvial and coastal soils

Fig. 2. Soil and land cover distribution for North-East Scotland.

The first stage towards development of the Bayesian network was to adjust the dataset to make it meaningful to the network. Bayesian networks benefit from a limited number of values for each variable, so it is an advantage to keep the number of classes to a minimum. The reduction of the number of classes is done on the basis that the new classes are functional and represent critical values in relation to land cover change. In order to carry out the reclassification, missing data elements were removed and data values were categorised. This categorisation took two possible forms. In the first, variables given in the form of discrete, but not necessarily sequential, values were mapped so that their values were sequential (for example the values 0, 1, 4, 18, 876 would become 1, 2, 3, 4, 5). In the second case, variables composed of analogue values were split into specific range classes. The conversion values of the nine variables are given in the 'value' column in Tables 2–6.

2.2. Bayesian network design
Following the dataset adjustments given above, a Bayesian network structure was created using both the standard Bayesian network development software (Netica™) and the evolutionary method. With the standard software, our current understanding informed the creation of the network structure, based on logical relationships between the variables. For the evolutionary method the Bayesian network was initialised and evolved (see Section 2.3 for further details). Finally the resulting structures were compared (see Results, Section 3). All programming for the evolutionary method was carried out using Microsoft Visual Basic 6.0. During the evolutionary phase it was possible to apply certain restrictions to the structure of the network, and so reduce the search time required to find an optimal solution. Certain variables, specifically Forestry (i.e. a high density of trees) and land cover, were defined as 'outputs' of the network and were therefore required to have at least one connection to them and no connections from them, while all other variables (land capability for agriculture, land capability for forestry, temperature, potential water deficit, shelter, winter duration and soil classes) were defined as 'inputs', to each of which there could be no connections and from each of which there had to be at least one connection. Candidate networks that did not conform to these restrictions could be easily detected and eliminated, using a simple subroutine, to avoid wasting search time with meaningless network designs.
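The structural restrictions just described lend themselves to a very short check. The Python sketch below is one possible form of such a subroutine and is not the Visual Basic routine used in this work: the variable names and the representation of a network as a set of (source, target) pairs are assumptions made for illustration.

```python
# Hypothetical variable names for the seven inputs and two outputs
INPUTS = ["lca", "lcf", "temperature", "water_deficit",
          "shelter", "winter_duration", "soil"]
OUTPUTS = ["forestry", "land_cover"]


def is_valid(edges):
    """Check a candidate network given as a set of (source, target) pairs.

    Inputs need at least one outgoing link and no incoming links;
    outputs need at least one incoming link and no outgoing links.
    """
    sources = {source for source, _ in edges}
    targets = {target for _, target in edges}
    inputs_ok = all(v in sources and v not in targets for v in INPUTS)
    outputs_ok = all(v in targets and v not in sources for v in OUTPUTS)
    return inputs_ok and outputs_ok


# Example candidate: every input linked to land cover, soil also linked to forestry
candidate = {(v, "land_cover") for v in INPUTS} | {("soil", "forestry")}
print(is_valid(candidate))
```

Read literally, the two restrictions together leave only links that run from an input to an output, so a check of this kind rejects most randomly generated bit strings before any training is attempted.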
Implementation of the network occurred at the start of each evolutionary generation, which is explained further in Section 2.3. Fig. 3 gives an example of a randomly generated Bayesian network containing the variables used in this work. This is just an example of a network, and is not intended to represent the network architectures that were actually used. Training of the Bayesian network can be considered as a process by which each cell of the combined map/data layer was given to the network as a piece of evidence from an
Fig. 3. A random Bayesian network prior to evolutionary optimisation.
‘experiment’ about the relationship between the variables involved. While this evidence represents ‘observations’ of actual relationships in the study area, the Bayesian network is created for all possible combinations of the variables. Training was carried out by generating a Conditional Probability Table (CPT) for each output variable X in the network, i.e. each variable on the receiving end of an identified relationship or ‘connection’ between two variables. This table was composed of every possible permutation of the variables providing input to X, with each permutation having both an entry for each of the possible values of X (a count during training, converted to a probability afterwards) and the total number of occurrences of that permutation. For example, if X(1, 2) (i.e. X can have values of 1 or 2) was connected to A(1, 2, 3) (A can have values of 1, 2 or 3), B(1, 2) and C(1, 2, 3), then the table would contain 3 × 2 × 3 = 18 rows and 4 columns, as shown in Table 7. The first column gives the input variable permutations, while the next two columns give the number of occurrences of that permutation that result in each of the possible values of X. The final column gives the total number of occurrences of each permutation. For the real dataset used, as the controlling algorithm (written in Visual Basic, although almost any other programming language could accomplish the same implementation) looped through the training dataset, the relevant row was adjusted for every permutation encountered. The relevant row was determined using the number of possible values of each of the input variables. An example using the above variable descriptions, as shown in Table 7, is A = 3, B = 1, C = 2, X = 2. The row to be adjusted in the CPT for X is therefore row 14 (since (3 − 1) × 6 + (1 − 1) × 3 + 2 = 14), with the count in the X = 2 column incremented by one and the value in the Sum column also incremented by one, as demonstrated in Table 7.
When the end of the training set was reached, each of the counts in the X = 1 and X = 2 columns was divided by the value in the Sum column (for each row in turn), to give a prior knowledge probability of that result occurring.

Table 7 An example of a Conditional Probability Table (CPT)
Permutation              X = 1   X = 2        Sum
A = 1, B = 1, C = 1      2       6            8
A = 1, B = 1, C = 2      0       2            2
A = 1, B = 1, C = 3      4       8            12
A = 1, B = 2, C = 1      3       2            5
A = 1, B = 2, C = 2      1       4            5
A = 1, B = 2, C = 3      7       5            12
A = 2, B = 1, C = 1      5       8            13
A = 2, B = 1, C = 2      9       5            14
A = 2, B = 1, C = 3      0       9            9
A = 2, B = 2, C = 1      3       5            8
A = 2, B = 2, C = 2      6       1            7
A = 2, B = 2, C = 3      1       5            6
A = 3, B = 1, C = 1      2       7            9
A = 3, B = 1, C = 2 *    7       2 + 1 = 3    9 + 1 = 10
A = 3, B = 1, C = 3      7       5            12
A = 3, B = 2, C = 1      4       3            7
A = 3, B = 2, C = 2      9       7            16
A = 3, B = 2, C = 3      4       5            9
* The highlighted row (marked here with an asterisk) is discussed in the main text for a demonstration of how the CPT operates.
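The row indexing and counting scheme illustrated by Table 7 can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' Visual Basic code: the function names are hypothetical, variable values are assumed to be the sequential integers 1..k produced by the preprocessing step, and counts are held in a dictionary keyed by row number so that only permutations that actually occur take up space.

```python
def cpt_row(values, cardinalities):
    """1-based CPT row for a permutation, with the first variable varying slowest.

    values:        e.g. (3, 1, 2) for A = 3, B = 1, C = 2
    cardinalities: number of possible values per input, e.g. (3, 2, 3)
    """
    row = 0
    for value, size in zip(values, cardinalities):
        row = row * size + (value - 1)
    return row + 1


def train_cpt(records, cardinalities, n_outcomes):
    """Count outcomes per permutation.

    records: iterable of (input_values, x) with x in 1..n_outcomes.
    Returns {row: [count for X = 1, count for X = 2, ...]}.
    """
    counts = {}
    for inputs, x in records:
        row = cpt_row(inputs, cardinalities)
        counts.setdefault(row, [0] * n_outcomes)[x - 1] += 1
    return counts


def to_probabilities(counts):
    """Divide each count by its row total, as done at the end of training."""
    return {row: [c / sum(cs) for c in cs] for row, cs in counts.items()}
```

With cardinalities (3, 2, 3), `cpt_row((3, 1, 2), (3, 2, 3))` gives row 14, the row discussed in the text.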
Fitness measurement took place using a similar arrangement, but using a dataset that was not used to train the network. The training and testing datasets were chosen at random from a larger dataset containing information from 1000 points selected at random within North-East Scotland, with statistical analysis of the datasets resulting from the split showing that even for land cover classes that occurred less frequently, the spatial distribution within both training and testing datasets (each containing 500 data points) was representative of the original whole. For each set of test data the relevant permutation was examined and a prediction made from the outcome with the highest value for that row. A comparison of predicted and actual values for each variable in the test dataset was then used to determine the accuracy of the trained network. This procedure completed the multi-step algorithm by which the Bayesian network was evolved:
1. Develop the network structure (described in Section 2.3).
2. Use the network structure to identify the Conditional Probability Table structure.
3. Use the training data to fill in the CPT values.
4. Use the test data to evaluate the fitness of the Bayesian network and its CPT.

In cases where there were a large number of variables, each of which had several possible values, the number of permutations and therefore the size of the relevant table could be excessively large. A solution to this was found by only adding to the CPT those permutations that occurred in the training dataset, and having a look-up table that mapped from the actual permutation value to the location in the CPT of the permutation’s information. This eliminated the need for extremely large Conditional Probability Tables with a high proportion of empty rows, such as will occur in situations where the system being examined does not in practice have a high number of different ‘situations’ or variable outcomes.

During testing of a trained Bayesian network, if a variable permutation is found that did not occur within the training dataset, one of two methods can be applied to provide a response. The first is for the system to simply admit a lack of knowledge and to say ‘I do not know’, the implications of which are that further training may be required before the system can be relied upon. The second solution is to apply the existing permutation which is closest to the given test data, using some measure. Here we applied the second of these two methods, with a measure of the squared Euclidean distance between the missing permutation P and the existing permutation Q given by

$$d^2 = (p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2 \qquad (2)$$

where $p_1$, $p_2$, etc. were the variable values of P, $q_1$, $q_2$, etc. were the variable values of Q and n was the number of variables in each permutation. This method could also be applied to situations where a specific test value has only partial data, i.e. if one or more of the variables were unknown.

There is a potential problem here in that the measuring of Euclidean distance between different categories of an attribute may not have great meaning, or that the meaning may change related to how the categorisation is carried out in the first place. As mentioned in Section 2.1, generalisation of the factors used in this study was carried out in a manner that resulted in meaningful classification changes from one category to another. Where this was not as easy, for example with the Land Cover of Scotland, efforts were still made to have a logical progression from one class to another. It is true that different reclassifications would result in different end structures due to the varying Euclidean distances, but this issue has been minimised wherever possible. Many methods exist in the literature for dealing with qualitative variables in this kind of situation, and this is not seen as an obstacle to the development and application of this methodology.

2.3. Evolving the network
As has already been discussed, the structure of a Bayesian network lends itself readily to evolutionary methods, being easily described as a binary string of 72 digits (for the 9 variables considered here). This corresponds to the number of possible connections between variables (9 variables possibly connected to each of the 8 others), and can be considered as the ‘genome’ of the network. Each bit in the 72-digit string corresponded to one of the connections, with the position of that connection’s bit on the string being assigned as follows: (1–2; 1–3; …; 1–8; 1–9; 2–1; 2–3; …; 9–7; 9–8), where each number corresponds to one of the variables involved and each pair corresponds to one bit in the network’s genome.

In order to implement evolutionary pressures, an initial population of 12 random network structures was created. In this initial population every possible connection between variables had a 0.1 probability of existing, with the input/output restrictions given above implemented after connection randomisation. The networks were trained and tested as described in the previous subsection, and the four fittest selected. Each of these four was then bred with the other three to produce a total of 12 children, and the process continued.

The breeding process involved elements of both crossover and mutation processes. For the crossover procedure, one of the parents was initially selected at random and copied directly into the child’s network genome bit by bit. After each bit was copied a random number in the range [0,1] was generated, and if this value was less than 0.1 then copying into the child’s network architecture continued, but from the other parent. This process of alternately copying from the two parent networks continued until the end of the network architecture was reached.
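The genome layout and the crossover procedure described above can be sketched as follows. This is a minimal Python illustration rather than the authors' implementation: the function names, the 0-based bit positions and the list-of-integers genome are assumptions, while the pair ordering (1–2, 1–3, …, 9–8) and the 0.1 switching probability come from the text.

```python
import random

N_VARS = 9  # nine variables, hence 9 * 8 = 72 possible directed connections


def bit_index(i, j, n=N_VARS):
    """0-based genome position of connection i -> j (variables numbered from 1, i != j).

    Follows the ordering 1-2, 1-3, ..., 1-9, 2-1, 2-3, ..., 9-8.
    """
    offset = j - 1 if j < i else j - 2  # skip the missing self-connection i -> i
    return (i - 1) * (n - 1) + offset


def crossover(parent_a, parent_b, switch_prob=0.1, rng=random):
    """Copy bits from one parent, switching to the other with probability switch_prob
    after each copied bit."""
    parents = (parent_a, parent_b)
    current = rng.randrange(2)  # starting parent chosen at random
    child = []
    for position in range(len(parent_a)):
        child.append(parents[current][position])
        if rng.random() < switch_prob:
            current = 1 - current
    return child
```

With a switching probability of 0.1 the copied segments are ten bits long on average, so a 72-bit child typically inherits several contiguous blocks from each parent.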
For the mutation procedure, which followed the crossover process, the first step performed was a count of the ratio of active connections (value = 1) to inactive connections (value = 0) in the child’s network. This ratio r was then used to determine the probability of mutations from active to inactive connections (or vice versa) in such a way that the number of active connections did not simply trend towards half the number of potential connections, as would be the case if bits were switched completely randomly, but that r remained constant and only changed due to an evolutionary pressure to do so. For example, if r equalled 4 (the number of active connections was four times the number of inactive connections) then the probability of a specific active connection being switched to an inactive one was one-quarter of that of a specific inactive connection being switched to an active connection. For situations where the number of inactive connections in the child network was zero (i.e. all connections were active), the probability of an active connection being switched was equal to 0.05. Fig. 4 shows the two aspects of the evolutionary process being carried out.

In cases where the 12 offspring in each generation did not provide four candidates whose individual fitness (measured as the prediction accuracy of the network, i.e. the proportion of the number of input values that resulted in the correct output) was above the mean fitness of their parents, those that did not achieve this threshold of fitness were replaced by the fittest (or the fittest two, or three, or four, if necessary) of the four parents. This was done in order to prevent the best candidates
Fig. 4. Crossover and mutation of parent network architecture strings to produce a child Bayesian network. In each parent, cells corresponding to the presence or absence of connections that are surrounded by a box are selected for the child network, with the corresponding cells in the other parent discarded. Following crossover, mutation of randomly selected sites is carried out.
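The ratio-preserving mutation step described in the text can be sketched as below. The paper fixes only the relative rates (an active bit flips 1/r times as often as an inactive one) and the value 0.05 for the all-active case; using 0.05 as a general base rate, capping both probabilities at that base rate, and treating the all-inactive case symmetrically are assumptions made here for illustration, as are the function names.

```python
import random


def mutation_probabilities(genome, p_base=0.05):
    """Return (p_off, p_on): flip probabilities for active and inactive bits.

    Chosen so that p_off / p_on = 1 / r, where r = active / inactive, which keeps
    the expected number of active connections unchanged by mutation alone.
    """
    n_active = sum(genome)
    n_inactive = len(genome) - n_active
    if n_inactive == 0:          # all connections active: value stated in the paper
        return p_base, 0.0
    if n_active == 0:            # assumption: mirror image of the case above
        return 0.0, p_base
    r = n_active / n_inactive
    p_off = p_base * min(1.0, 1.0 / r)
    p_on = p_base * min(1.0, r)
    return p_off, p_on


def mutate(genome, p_base=0.05, rng=random):
    """Flip each bit independently with its state-dependent probability."""
    p_off, p_on = mutation_probabilities(genome, p_base)
    return [1 - bit if rng.random() < (p_off if bit else p_on) else bit
            for bit in genome]
```

For a genome with r = 4 this gives an active bit one-quarter of the flip probability of an inactive bit, matching the worked example in the text.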
being selected from a mediocre generation and thus effectively taking a step backward in the evolutionary process.

3. Results

The utility of the described evolutionary method for optimising Bayesian networks can be measured in three ways: flexibility of the method for network optimisation, predictive accuracy of the optimised system in terms of land cover mapping, and ease of use. Each of these three characteristics of the method is described here, both through direct measurement of the evolutionary system’s accuracy, and through comparison with a commercial Bayesian network development package, the Netica™ application (Norsys, http://www.norsys.com).

3.1. Flexibility

Combining evolutionary methods with Bayesian statistics provides flexibility of land cover prediction in two ways: first, through the ability of the system to adapt to systems where different factors are relevant and have strong influence over the land cover; secondly, through the nature of the Bayesian methodology which can be applied to any land cover classification system and to any set of environmental variables. The same can be said of other statistical approaches, but we feel that Bayesian networks are ideally suited to the situation of land cover and land use modelling where so many factors and data types must be considered. However, the system developed was less flexible than Netica™ in terms of data input, as the program was written with specific input dataset requirements. This means that the user has to pre-process the information in a manner that is not required for the commercial package, and which relates not only to flexibility but also to ease of use. This situation could be easily resolved for future use of the method, and is not one that we consider an obstacle to its application.

In the graphical Bayesian network software Netica™, the variables used for the model are represented as nodes and these nodes (studied variables) are linked to create the model’s structure.
The graphic interface makes both the creation and the modification of models simple. For the purposes of this paper, the Netica™ model used a similar structure to the evolutionary system, which was a single layered model with all input variables linked directly to the output variables (Land Cover and Forestry, see Fig. 5). Once the model structure was created, the model was trained and subsequently tested. A training dataset and a testing dataset were randomly selected using ArcGIS from the original data used for the evolution of the network, with these datasets being used for both systems under comparison. The evolutionary system was able to evolve a network from the initially random configuration within a matter of minutes, with the fitness (accuracy) of the evolving network increasing at first rapidly, then more slowly over a period of 1000 generations towards the final network (see Fig. 6). The reason for having a random network was to demonstrate the evolutionary method’s ability to fit the network to the data without using
Fig. 5. A Bayesian network developed using Netica™ software.
prior knowledge of the system. With the commercial software, adjustments to the network design had to be carried out manually, with a trial and error method being necessary in order to find the same optimal model design as the evolutionary method. For models with a large number of variables this trial and error method would be effectively useless, although for simpler models the search for an optimal model design by hand is relatively easy, particularly if the user has some knowledge of the system. For any specific model design, Netica™ was able to develop the Conditional Probability Table of the training dataset more rapidly than the evolutionary method, taking a minimal amount of time in comparison. The speed of Netica™ in this case was due to the programming languages used and issues of efficiency in software design.

3.2. Accuracy
Fig. 6. Changing fitness of an evolved network over multiple generations, and the network obtained at the end of the evolutionary process.
One of the most effective methods of representing the accuracy of a predictive system where multiple answers can be given is through a confusion matrix. This is a representation that relates the number of actual occurrences of each type within the studied system to the number of predictions of each type, and shows the reader which types tend to be ‘confused’ with one another. The results from the comparison of land cover types are presented in a confusion matrix (see Table 8). These results were identical for both methods used, as the network developed using the evolutionary method could not be improved using Netica™, and so Table 8 is effectively the confusion matrix for both methods. This result shows that the internal calculations used by the evolved Bayesian network are identical to those of Netica™, giving us confidence that the system is working correctly. As mentioned above, the ease with which these results can be obtained using the evolutionary method in comparison to Netica™ or any other Bayesian statistical software not using evolutionary methods will vary with network size. For small networks (approximately 2–5 connections), trial and error may find an optimal approach more rapidly, but once the
Table 8 Confusion matrix of variables obtained using optimised Bayesian network

                               Predicted land use
Actual land use                Arable   Improved grass   Rough grazing   Forest   Development   Other   Total actual land use (%)
Arable                         29.29    4.36             0.10            2.06     0.36          0.51    36.68
Improved grass                 8.81     1.57             0.36            1.61     0.12          0.74    13.20
Rough grazing                  1.41     1.35             0.22            1.28     0.04          2.56    6.86
Forest                         4.70     2.98             0.35            4.12     0.05          4.19    16.37
Development                    1.04     0.29             0.02            0.08     0.94          0.35    2.72
Other                          1.06     0.75             1.31            4.06     0.03          16.96   24.17
Total predicted land use (%)   46.31    11.29            2.36            13.19    1.53          25.31   100.00
network size gets much larger than this the evolutionary approach is going to be faster than trying to find an optimal network design manually. With this particular study, we were able to exhaustively explore every permutation of network architecture and evaluate the accuracy of each. Of the 16,384 permutations available (of which 2187 satisfied the design criteria given in Section 2.2), the evolutionary method found the most accurate in 830 steps. The results show that 29% of the Grampian Region which includes almost 80% of the actual arable land cover is correctly predicted as arable, while 17% of the region, equal to 70% of other land cover, is also correctly predicted. However, the remaining classes, in particular improved grass and rough grazing are poorly predicted. This means that for these remaining classes the predicted land covers do not reflect what is actually there. For example, about 65% of actual improved grass is predicted as arable, representing about 9% of the study area. However, the comparison between arable and improved pasture is one that could also be explained by error in the land cover dataset (MLURI, 1993; Aspinall et al., 1993). Additional reasons for errors could be due to the coarse predictive values provided by the input variables, where division between classes is not always perfect, or due to the low spatial resolution of the datasets involved leading to imperfect mapping accuracy. In total, the error rate for land cover classes for the optimised Bayesian network is 46.90% (Table 8). An example of the evolutionary method’s utility is with forestry, where the connections that have been evolved give an error rate of 14.28%, as opposed to a rate of 16.85% for full connectivity within the network. This is a minor improvement, but still shows that evolutionary methods can be used to improve Bayesian networks from initially random or completely connected topologies (i.e. every input variable has a connection to every output variable). 
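The percentages quoted above can be recomputed directly from Table 8. The sketch below is illustrative only: it assumes that rows are actual classes and columns are predicted classes (the orientation consistent with the figures quoted in the text), that each entry is a percentage of the study area, and the function names are hypothetical.

```python
CLASSES = ["Arable", "Improved grass", "Rough grazing", "Forest", "Development", "Other"]

# Rows: actual land use; columns: predicted land use (values from Table 8, in %).
TABLE_8 = [
    [29.29, 4.36, 0.10, 2.06, 0.36, 0.51],
    [8.81, 1.57, 0.36, 1.61, 0.12, 0.74],
    [1.41, 1.35, 0.22, 1.28, 0.04, 2.56],
    [4.70, 2.98, 0.35, 4.12, 0.05, 4.19],
    [1.04, 0.29, 0.02, 0.08, 0.94, 0.35],
    [1.06, 0.75, 1.31, 4.06, 0.03, 16.96],
]


def overall_error(matrix):
    """Percentage of all cases that fall off the diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * (total - correct) / total


def share_predicted_as(matrix, actual, predicted):
    """Fraction of an actual class (row index) assigned to a predicted class (column index)."""
    return matrix[actual][predicted] / sum(matrix[actual])


if __name__ == "__main__":
    print(f"overall error: {overall_error(TABLE_8):.1f}%")
    for i, name in enumerate(CLASSES):
        print(f"{name}: {100 * share_predicted_as(TABLE_8, i, i):.0f}% correctly predicted")
```

Run on the Table 8 values, this recovers an overall error of about 46.9%, roughly 80% of actual arable land predicted as arable, and about 70% of the ‘other’ class predicted correctly. Small differences from the published totals arise from rounding in the table entries.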
Comparison of the evolved Bayesian network and that developed using customised, off-the-shelf software, either in terms of overall accuracy or through examination of the confusion matrices, shows that each model has greater success in predicting the land uses which have higher abundance and greater spatial coherence, i.e. that occur in larger patches, than those which have lower abundance and lower spatial coherence. This seems to support the notion that more evidence makes more reliable predictions, but it also suggests that the
structure of the current model is not particularly suitable for the prediction of improved grass and rough grazing. We consider this to be due to the importance of socioeconomic factors in driving these land cover types in addition to the influence of biophysical factors. Another consideration when dealing with the method accuracy is that for these comparisons, each category has been given equal weighting within the possible values, when in actual fact some of them occur much more often than others. If a weighting is given equal to the proportion of occurrence for each category, then we found that the predictive errors are lowered further due to the predictive accuracy of more common classes being greater than for less common ones. It is possible to adjust frequentist methods to avoid this situation when dealing with better-represented land cover types, although we have not done this here. The modelling methods that have been applied here have been used as a method of extracting the most information from a noisy, error-laden series of datasets of different spatial resolution, from different sources. It is suggested that it would be difficult for the Bayesian and possibly any other method (although comparison has not been made here with other methods) to extract additional meaningful or accurate information from the available datasets due to their nature, and that as a result the land cover mapping exercise is limited here both by the datasets and the land cover classification system used.

3.3. Ease of use

Interpretation of the results of a trained Bayesian network using the Netica™ software is easy, with an intuitive point-and-click graphical interface, the use of which is becoming standard in modern statistical software. Development of confusion matrices and prediction of land cover class types can be carried out in seconds by an untrained user who is familiar with current PC operating systems.
In comparison, the evolutionary method, which was developed using Microsoft Visual Basic, is not designed with a graphical interface yet and would be of little use to researchers wishing to apply it to their own data without prior training. Output is in the form of tab-delimited text files that can then be opened in a standard spreadsheet software package, and input data has to be adjusted before it can be used by the system. In terms of ease of use, a previously untrained user would require an estimated
1–2 h of training with the system, with additional experience in using programming languages and spreadsheet programs in order to be able to alter the system to their specific requirements or fully apply the results of a trained, evolved Bayesian network. Having said this, the fact that the two systems (off-the-shelf software and the evolutionary approach) are appropriate for different situations means that any ‘ease of use’ comparison must also consider the applicability of each system for specific circumstances. Netica™ is more appropriate for smaller Bayesian networks or for larger ones where the structure is already known, while the evolutionary approach is more appropriate where the system being modelled is more complex, containing more variables and relationships and the structure describing these relationships is unknown. For these two types of situation, each system described is also easier to use and apply. In the more complex situation, the evolutionary approach can provide a Bayesian network structure that is more appropriate to the situation, and can provide a simpler network structure that is easier to understand.

4. Discussion

Two main conclusions can be drawn from the work carried out here and the results obtained. The first is that while there are causal relationships between commonly available datasets and land cover, any approach that is successful in extracting these relationships must be flexible enough to handle multiple data types and sources. The factors that were found to be important to land cover and forestry using the Bayesian approach included all those that were made available, and while some of these have obvious causal relationships with the output variables (e.g. Land capability for forestry and forestry; accumulated temperature and land cover), others are not so obvious (e.g. Land capability for agriculture and forestry).
The fact that the inclusion of these less obvious relationships was still found to be necessary in improving the predictive ability of the system is important for future land cover mapping projects. The second conclusion is that whereas commercial software packages that implement novel modelling methods are useful, they can lack the flexibility required to optimise those modelling methods when the system being modelled contains a large number of variables and uncertain relationships. An advantage of Netica™’s graphical interface is that it is relatively easy to interpret the results of a trained Bayesian network and to create graphs or tables showing accuracy and confusion matrix information. The lack of a need for specialist training also means that the commercial software is more accessible to researchers who do not have a computer programming or mathematical background, while the optimised design of Netica™ also means that it performs the relevant calculations and provides answers more rapidly as well as more intuitively. The ability of the evolutionary method to find an optimal or near-optimal solution (in terms of prediction accuracy) is an important attribute of the method described here. With larger Bayesian networks this would be even more useful as the
likelihood of a researcher finding an optimal network design through trial and error would decrease as the size of the network increases. Also important is the flexibility of the method in its ability to deal with different numbers of input attributes, and the ability to specify which attributes are of interest and therefore must have predictions made about them, and which ones are not. Through the evolutionary method, important relationships between particular pairs of factors are rapidly discovered, with pairs of factors that have little influence on one another also identified through the lack of connectivity. This is based on the assumption that if the system identifies that a pair of factors are connected causally (as in a Bayesian network) then a relationship between them must exist. A possible flaw with the system, and one that can arguably be attributed to all Bayesian network methods, is that both the number of factors and the number of possible values of each factor have to be kept to a minimum lest the size of the conditional probability table becomes difficult to handle. This can be partially dealt with by using only those permutations of factors that occur in the training dataset, although this solution causes a lack of flexibility in dealing with previously unencountered permutations. Another potential problem is that in order to carry out the evolutionary processes on a Bayesian network, the data were manipulated using a custom-built piece of software. This meant that the data had to be extracted from GIS data and transformed into a format more easily read using the programming language used here. It would have been more efficient to carry out the work within a GIS package, however the degree of sophistication and model manipulation required made this impossible. The evolved network itself illustrates the bias that can be created in an empirical model when there is overwhelming evidence for causal relationships between variables. 
This bias can ignore less frequently occurring relationships that may nevertheless be important to the model. In the study area most smaller land uses, but in particular improved grass and rough grazing, are land cover types that are fragmented and located between larger areas of other land cover types, and in addition, may not be particularly driven by biophysical factors as much as by socioeconomic factors, which are not included in this particular modelling structure. In addition, the accuracy of land cover maps covering fragmented landscapes is likely to be lower than for landscapes containing large consistent areas of land cover. This is due to the increased proportion of map pixels that contain boundaries between land cover classes, and that therefore are effectively mixed but that are given a single classification. Future work would need to address this issue, with considerations of the land management issues relating to each land cover type within its physical and economic environment. In addition, issues relating to the use of datasets at different scales would have to be investigated, in order to address concerns of sliver effects or minimum mapping units. While it can be fairly claimed that a prediction accuracy of approximately 50% is not particularly good, the counterarguments are that (a) the datasets used to create this model
were relatively restricted and of varying accuracy, and (b) there are doubtless other factors influencing land cover in the North-East of Scotland, for example socioeconomic or physical characteristics that have not been included in the study. What the evolutionary system has been able to do is identify the relationships that do exist within the data, and optimise our ability to extract meaningful information from it. It is considered highly likely that with the inclusion of further information, such as remote sensing data for example, the accuracy of the maps produced would improve greatly. In relation to land cover and land cover change, we are most interested in those relationships that will cause a large diversion from the general trend, as these relationships are the ones that are most difficult to adapt to or model. The evolutionary method described here allows us to investigate the relationships between variables, and to determine which variables have the strongest effects on one another (i.e. whether a connection between two variables needs to be included in the network). The space of all possible Bayesian networks, even for the relatively simple examples examined here, is sufficiently large to prohibit an exhaustive search. With larger networks, evolutionary methods may be the only way of optimising models and investigating the complex relationships that exist. The approach used here could also be applied to the prediction of future land cover through the identification of specific trajectories of change, and as such could provide a useful tool for understanding the effects of climate change on the landscape. 5. Conclusions We have shown here that benefits can be obtained by applying evolutionary pressures to the design of a Bayesian system, both in terms of flexibility and accuracy, when compared to a commercial Bayesian system. 
Identification of the datasets that relate to the variables of interest and which influence them, and determination of the maximum accuracy possible using the data available, are both important aspects of land cover mapping with disparate datasets and uncertainty of relationships between variables. The principles of evolutionary development can also be applied to other expert knowledge or data-mining approaches (e.g. neural networks, fuzzy classification, decision trees) (Peña-Reyes and Sipper, 2000). Investigation of other commercial software packages that use different AI-based methodologies for data analysis may show that improvements in utility, accuracy and flexibility can be achieved by integrating evolutionary computation with these methodologies also.

Acknowledgements

The authors would like to thank Dr. Mark Brewer at BioSS (Biomathematics and Statistics in Scotland), and Dr. David Miller (The Macaulay Institute) for advice given during this work.
References Aitkenhead, M.J., Dalgetty, I.A., Mullins, C.E., McDonald, A.J.S., Strachan, N.J.C., 2003. Weed and crop discrimination using image analysis and artificial intelligence methods. Computers and Electronics in Agriculture 39, 157e171. Anselin, L., 2000. Computing environments for spatial data analysis. Journal of Geographical Systems 2, 201e220. Arenas, M.G., Castillo, P.A., Romero, G., Rateb, F., Merelo, J.J., 2006. Coevolving multilayer perceptrons along training sets. Advances in Soft Computing, 503e513. Aspinall, R.J., Miller, D.R., Birnie, R.V., 1993. Geographical information systems for rural land use planning. Applied Geography 13, 54e66. Berger, J.O., 2000. Bayesian analysis: a look at today and thoughts of tomorrow. Journal of the American Statistical Association 95, 1269e1276. Berger, J.O., Insua, D.R., 1998. Recent developments in Bayesian inference with applications in hydrology. In: Parent, E., et al. (Eds.), Statistical and Bayesian Methods in Hydrological Sciences. UNESCO Press, Paris, pp. 43e62. Bibby, J.S., Heslop, R.E.F., Harnup, R., 1988. Land Capability for Forestry in Britain. The Macaulay Land Use Research Institute, Aberdeen. Bibby, J.S., Douglas, H.A., Thomasson, A.J., Robertson, J.S., 1991. Land Capability Classification for Agriculture. The Macaulay Land Use Research Institute, Aberdeen. Birse, E.L., Dry, F.T., 1970. Assessment of Climatic Conditions in Scotland 1. The Macaulay Institute for Soil Research. Birse, E.L., Robertson, L., 1970. Assessment of Climatic Conditions in Scotland 2. The Macaulay Institute for Soil Research. Borsuk, M.E., Stow, C.A., Reckhow, K.H., (2002). Integrative environmental prediction using Bayesian networks: a synthesis of models describing estuarine eutrophication. In: Rizzoli, A.E., Jakeman, A.J. (Eds.), Integrated Assessment & Decision Support, Proceedings of the first biennial meeting of the International Environmental Modelling and Software Society, vol. 2, pp. 102e107. 
Brown De Colstoun, E.C., Walthall, C.L., 2006. Improving global scale land cover classifications with multi-directional POLDER data and a decision tree classifier. Remote Sensing of Environment 100 (4), 474–485.
Comber, A.J., Law, A.N.R., Lishman, J.R., 2004. A comparison of Bayes', Dempster-Shafer and Endorsement theories for managing knowledge uncertainty in the context of land cover monitoring. Computers, Environment and Urban Systems 28 (4), 311–327.
Cots-Folch, R., Aitkenhead, M.J., Martínez-Casasnovas, J.A., 2007. Mapping land cover from detailed aerial photography data using textural and neural network analysis. International Journal of Remote Sensing 28 (7), 1625–1642.
Cowles, M.K., 2003. Efficient model-fitting and model-comparison for high-dimensional Bayesian geostatistical models. Journal of Statistical Planning and Inference 112, 221–239.
Erasmus, L., van Jaarsveld, A.S., Bommel, P.O., 2002. A spatially explicit modelling approach to socioeconomic development in South Africa. In: Rizzoli, A.E., Jakeman, A.J. (Eds.), Integrated Assessment & Decision Support, Proceedings of the first biennial meeting of the International Environmental Modelling and Software Society, vol. 3, pp. 91–96.
Fotheringham, A.S., Brunsdon, C., Charlton, M., 2000. Quantitative Geography. Sage Publications, London, Thousand Oaks, New Delhi.
Gomez, F., Schmidhuber, J., Miikkulainen, R., 2006. Efficient non-linear control through neuroevolution. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4212 LNAI, 654–662.
Harvey, I., Husbands, P., Cliff, D., Thompson, A., Jakobi, N., 1997. Evolutionary robotics: the Sussex approach. Robotics and Autonomous Systems 20, 205–224.
Huo, Q., Ma, B., 2001. Online adaptive learning of continuous-density hidden Markov models based on multiple-stream prior evolution and posterior pooling. IEEE Transactions on Speech and Audio Processing 9 (4), 388–398.
Kloprogge, P., van der Sluijs, J.P., 2002. Choice processes in modelling for policy support. In: Rizzoli, A.E., Jakeman, A.J. (Eds.), Integrated Assessment & Decision Support, Proceedings of the first biennial meeting of the International Environmental Modelling and Software Society, vol. 1, pp. 96–101.
Korb, K.B., Nicholson, A.E., 2004. Bayesian Artificial Intelligence. Chapman & Hall/CRC, London.
Kuplich, T.M., 2006. Classifying regenerating forest stages in Amazonia using remotely sensed images and a neural network. Forest Ecology and Management 234 (1–3), 1–9.
Lowe, P., Ward, N., Munton, R.J.C., 1992. Social analysis of land use change: the role of the farmer. In: Whitby, M.C. (Ed.), Land Use Change: the Causes and Consequences. ITE Symposium No. 27. HMSO, London.
Macaulay Institute for Soil Research, 1984. The Soil Survey for Scotland 1:250,000. University Press, Aberdeen.
MLURI, 1993. The Land Cover of Scotland. Final Report. MLURI, Aberdeen.
Marcot, B.G., Holthausen, R.S., Raphael, M.G., Rowland, M.M., Wisdom, M.J., 2001. Using Bayesian belief networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement. Forest Ecology and Management 153, 29–42.
McCarty, J.L., Justice, C.O., Korontzi, S., 2007. Agricultural burning in the Southeastern United States detected by MODIS. Remote Sensing of Environment 108 (2), 151–162.
McIver, D.K., Friedl, M.A., 2002. Using prior probabilities in decision-tree classification of remotely-sensed data. Remote Sensing of Environment 81, 253–261.
Mertens, B., Poccard-Chapuis, R., Piketty, M.-G., Lacques, A.-E., Venturieri, A., 2002. Crossing spatial analyses and livestock economics to understand deforestation processes in the Brazilian Amazon: the case of São Félix do Xingú in South Para. Agricultural Economics 27, 269–294.
Müller, D., Zeller, M., 2002. Land use dynamics in the central highlands of Vietnam: a spatial model combining village survey data with satellite imagery interpretation. Agricultural Economics 27, 333–354.
Münier, B., Nygaard, B., Ejrnæs, R., Bruun, H.G., 2001. A biotope landscape model for prediction of semi-natural vegetation in Denmark. Ecological Modelling 139, 221–233.
Pal, M., 2006. M5 model tree for land cover classification. International Journal of Remote Sensing 27 (4), 825–831.
Palomo, J., Insua, D.R., Salewicz, K.A., 2002. Reservoir management decision support. In: Rizzoli, A.E., Jakeman, A.J. (Eds.), Integrated Assessment & Decision Support, Proceedings of the first biennial meeting of the International Environmental Modelling and Software Society, vol. 3, pp. 229–234.
Peña-Reyes, C.A., Sipper, M., 2000. Evolutionary computation in medicine: an overview. Artificial Intelligence in Medicine 19 (1), 1–23.
Robin, A., Mascle-Le Hégarat, S., Moisan, L., 2005. A multiscale multitemporal land cover classification method using a Bayesian approach. Proceedings of SPIE. The International Society for Optical Engineering. 5982.
Rouget, M., Richardson, D.M., Cowling, R.M., Lloyd, J.W., Lombard, A.T., 2003. Current patterns of habitat transformation and future threats to biodiversity in terrestrial ecosystems of the Cape Floristic Region, South Africa. Biological Conservation 112, 63–85.
Sette, S., Boullart, L., Van Langenhove, L., 1998. Using genetic algorithms to design a control strategy of an industrial process. Control Engineering Practice 6, 523–527.
Skelsey, C., Law, A.N.R., Winter, M., Lishman, J.R., 2004. Automating the analysis of remotely-sensed data. Photogrammetric Engineering & Remote Sensing 70 (3), 341–350.
Spears, W.M., 1993. Crossover or mutation? In: Whitley, L.D. (Ed.), Foundations of Genetic Algorithms 2. Morgan Kaufmann.
Stassopoulou, A., Petrou, M., Kittler, J., 1998. Application of a Bayesian network in a GIS based decision-making system. International Journal of Geographical Information Science 12 (1), 23–45.
Welp, M., de la Vega-Leinert, A., Stoll-Kleemann, S., Jaeger, C.C., 2006. Science-based stakeholder dialogues: theories and tools. Global Environmental Change 16, 170–181.