1.22
Spatial Data Uncertainty
Linna Li, Hyowon Ban, and Suzanne P Wechsler, California State University, Long Beach, CA, United States
Bo Xu, California State University, San Bernardino, CA, United States
© 2018 Elsevier Inc. All rights reserved.
1.22.1 Introduction
1.22.2 Scale and Conceptualization of Uncertainty
1.22.2.1 Data Quality: Cartographic Scale and Map Accuracy
1.22.2.1.1 Error as a component of data quality
1.22.2.1.2 Map accuracy as a measure of error
1.22.2.2 MAUP and Ecological Fallacy: The Case of Process and Temporal Scales
1.22.2.3 Spatial Extent: Continuous Surfaces and Raster Scale
1.22.3 Uncertainty Analysis Methods
1.22.3.1 Modeling Uncertainty in Vector data
1.22.3.1.1 Positional uncertainty
1.22.3.1.2 Analytical modeling of positional uncertainty
1.22.3.1.3 Modeling positional uncertainty of point features
1.22.3.1.4 Modeling positional uncertainty of linear features
1.22.3.1.5 Modeling positional uncertainty of polygons
1.22.3.2 Attribute Uncertainty
1.22.3.2.1 Error matrix for attribute uncertainty
1.22.3.3 Modeling Topological Relations of Spatial Objects With Vague Boundaries
1.22.3.4 Analytical Methods and Monte Carlo Simulation
1.22.3.4.1 Uncertainty simulation in continuous fields
1.22.3.4.2 Uncertainty in raster data: The case of the digital elevation model
1.22.3.4.3 DEM uncertainty propagation to derived parameters
1.22.4 Uncertainty Propagation in Spatial Analysis
1.22.4.1 Modeling Uncertainty Propagation in Computational Models
1.22.4.2 Modeling Uncertainty Propagation in Simple Spatial Operations
1.22.5 Semantic Uncertainty in Spatial Concepts
1.22.5.1 Uncertainty, Fuzzy-Sets, and Ontologies
1.22.5.1.1 Uncertainty
1.22.5.1.2 Fuzzy-set approach
1.22.5.1.3 Ontologies
1.22.5.2 Applications
1.22.5.2.1 The 1990s
1.22.5.2.2 The 2000s
1.22.5.2.3 The 2010s
1.22.6 Uncertainty Visualization
1.22.7 Uncertainty in Crowd-Sourced Spatial Data
1.22.7.1 Evaluation of Uncertainty in Crowd-Sourced Spatial Data
1.22.7.2 Uncertainty in Platial Data
1.22.7.3 Uncertainty Equals Low Quality?
1.22.8 Future Directions
References
1.22.1
Introduction
Spatial data uncertainty is an important subfield of geographic information science (GIScience). Uncertainty is used as an umbrella term to encompass data quality, accuracy, error, vagueness, fuzziness, and imprecision. Each of these terms refers to imperfections in spatial datasets. Given that spatial data are representations of reality, it is impossible to have a perfect representation of the world without any loss of information. Therefore, uncertainty is inevitable in all geographic datasets and analyses. Spatial data uncertainty means that we are still uncertain about the geographic world even when we have a high-quality geographic database. From reality to representation, uncertainty is introduced and propagated at every step of the spatial analysis process, from conceptualization and generalization to measurement and analysis.
Spatial data quality is defined based on the assumption that there is a geographic truth against which a dataset can be compared: the closer a spatial dataset is to the truth, the higher its quality. The term “error” refers to how far a measurement is from truth. Error is therefore a term closely related to quality: high quality is associated with small errors, while low quality is associated with large errors. A data quality report may take the form of metadata in which data providers describe known uncertainties, such as measurement errors. Accuracy provides a measurement of error. An accurate dataset is one that is close to the represented phenomenon, with only small errors. Vagueness, fuzziness, and imprecision are generally used to describe uncertainties associated with geographic concepts, classes, and values. All of these must be addressed appropriately to responsibly measure, model, and communicate uncertainty in spatial datasets and analyses.
Over the past three decades, the geospatial community has addressed the issue of spatial data uncertainty in varied and meaningful ways. In 1988 the US National Center for Geographic Information and Analysis (NCGIA) hosted a conference dedicated to the “Accuracy of Spatial Databases”, which marked the beginning of a growing interest in and concern for this topic. The conference resulted in a seminal work devoted to the topic (Goodchild and Gopal, 1989). Since 1994, 12 International Symposiums on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences have been held biennially. Nine International Symposiums on Spatial Data Quality have been organized, most recently in 2015. The 2017 AAG conference included uncertainty as one of three core themes. Research related to spatial uncertainty continues to expand (Fig. 1). This article reviews many of the approaches the geospatial community has taken.
Spatial data uncertainty has been described and addressed from different perspectives. Zhang and Goodchild (2002) organize and discuss uncertainty-related research in continuous fields and discrete objects, respectively. Uncertainty may be studied as positional uncertainty or attribute uncertainty. Spatial uncertainty is also intertwined with other related issues, such as scale. This article is organized as follows. Section “Scale and Conceptualization of Uncertainty” discusses scale and conceptualization of uncertainty. Section “Uncertainty Analysis Methods” focuses on methods used to evaluate and analyze uncertainty in both discrete objects and continuous fields. Section “Uncertainty Propagation in Spatial Analysis” discusses methods to measure uncertainty propagation in computational models and simple spatial operations. Section “Semantic Uncertainty in Spatial Concepts” describes semantic uncertainty in spatial concepts, and section “Uncertainty Visualization” presents visualization of spatial data uncertainty. Section “Uncertainty in Crowd-Sourced Spatial Data” discusses new dimensions of uncertainty in the era of big data. Finally, the article ends with conclusions and future research directions in section “Future Directions”.
[Figure 1: chart of yearly publications (axis range 0 to 800) and citations (axis range 0 to 14,000) for 2000 to 2016. Series: spatial data uncertainty, spatial data accuracy, and spatial data quality, each plotted as citations and as publications.]
Fig. 1 Yearly publications and citations from 2000 to 2016 indexed by Scopus using the following search terms: “spatial data quality”, “spatial data uncertainty”, and “spatial data accuracy”. The Boolean search was limited using AND NOT to prevent overlap. The search was limited to environmental sciences and social sciences. The total number of publications including all terms was 10,521.
1.22.2
Scale and Conceptualization of Uncertainty
Scale is arguably a fundamental characteristic of geographic data. All spatial data are scale and context dependent. These scales impact how we interpret and make meaning of resulting spatial analyses. Understanding components of scale is essential to understanding uncertainty associated with our results, which in turn shape our interpretations and decisions. The word “scale” itself has many meanings, both outside and within the geographic context (Quattrochi and Goodchild, 1997; Ruddell and Wentz, 2009). The conceptualization of scale and its varied manifestations on spatial data and associated analyses have been addressed extensively in the literature (see for example Blöschl, 1996; Goodchild, 2001, 2011; Goodchild and Proctor, 1997; Goodchild and Quattrochi, 1997; Lam and Quattrochi, 1992; Quattrochi and Goodchild, 1997). In the geospatial context, the term scale manifests in the following areas: cartographic scale, process scale, and spatial extent. These categories of scale are interrelated, and each affects spatial data accuracy, error, and associated uncertainty (Goodchild, 2001, 2011; Goodchild and Proctor, 1997). Cartographic scale refers specifically to the representative fraction that relates a map feature with associated ground distance. This reference scale dictates the level of detail represented by map features (Goodchild, 2011; Goodchild and Proctor, 1997). Process scale refers to the spatial representation of a natural phenomenon such as soil erosion in a landslide hazard. The scales of natural processes are often unknown because they occur at a range of spatial and temporal scales. For example, topography results from many different processes operating over a range of spatial and temporal scales. Herein lies one of the perpetually confounding issues of spatial data analyses and resulting uncertainty. Matching the appropriate spatial data scale with the process we are trying to understand is an ongoing creative exploration.
Physical and spatial processes manifest at different and inconsistent space–time scales, while the mathematical relationships (spatial data and algorithms) we use to describe them are generally scale dependent. Spatial extent refers to the geographic boundaries of a study area and influences the level of detail that can be represented, as well as the amount of storage and processing power required for spatial analysis. Spatial extent influences the scale at which a geographic phenomenon can be observed. This observational scale in turn imposes a scale on our understanding of natural or spatial processes. Uncertainty can be separated into three classes: (1) error, (2) vagueness, and (3) ambiguity (Fisher, 1999). We explore these varied expressions of geographic scale to frame an understanding of spatial uncertainty. Uncertainty due to errors is associated with Boolean representation of cartographic features. Uncertainty due to vagueness and ambiguity results from process, temporal scale, and spatial extent. In the following section, components of geospatial scale provide a framework for exploring the varied expression of uncertainty in spatial data.
1.22.2.1
Data Quality: Cartographic Scale and Map Accuracy
Geospatial analyses are performed to derive understanding about reality. Such understanding derived from spatial data analyses is inextricably linked to the “quality” of the underlying geospatial datasets. The term “data quality” has been used to refer to fitness for use, data accuracy, and by association, error and related uncertainty (Chrisman, 1991; Brus, 2013). Spatial data quality has been a core focus of geospatial practice since the arrival of geographic information systems in the early 1980s (Goodchild and Gopal, 1989; Veregin, 1999; Devillers et al., 2010; Li et al., 2012). Mechanisms for describing data quality have been used as a measure of accuracy, and in turn a mechanism to quantify what we do not know, or uncertainty. Data quality implies characteristics of both error and accuracy.
1.22.2.1.1
Error as a component of data quality
Accepted descriptions and measures of data quality are required to promote effective methods for analysis and display of uncertainty (Buttenfield, 1993). For the purposes of this article, we define these related terms as follows. Error is defined as a departure of a measurement, or in the case of spatial data, the representation of a feature on a map, from its true value. The nature and extent of error in spatial datasets are often unknowable and result in uncertainty. Efforts to represent and manage error are rooted in concepts of data quality. Error can be classified as mistakes or blunders, systematic, or random. Blunders arise from errors in the data collection process and are generally removed once identified. Systematic errors result from bias in the data collection process. Random errors remain in the data once blunders and systematic errors are addressed (USGS, 1998; Wechsler and Kroll, 2006). Quantifying map accuracy provides a measure of the magnitude of error and has been used as a measure of data quality (Li et al., 2012). Positional (horizontal) and vertical accuracy of spatial data are inextricably linked to the scale of the data from which they were derived.
1.22.2.1.2
Map accuracy as a measure of error
Before the advent of computer technology, maps were drawn by hand. Uncertainty occurs at different levels of generalization and is linked to the scale of representation (MacEachren et al., 2005). Accuracy of these cartographic representations is linked to the scale at which they were drawn. The National Map Accuracy Standards released in 1947 provided a measure of data quality for paper maps. Per these requirements, no more than 10% of sampled points could be “off” by more than 0.08 cm on maps at scales larger than 1:20,000, or by more than 0.05 cm on maps at scales of 1:20,000 or smaller. Effectively, for 1:24,000 scale maps, 0.05 cm on the map corresponds to about 12 m on the ground, so up to 10% of tested map features could have a horizontal positional error exceeding 12 m. These standards served as the measure of map accuracy for over 50 years. Because digital maps were generated from paper maps, this measure of accuracy persisted into the digital cartographic
age. Data quality, as measured by map accuracy, became the standard mechanism by which to gauge map accuracy and associated uncertainty (Buttenfield, 2000). In 1989, standards for measuring horizontal and vertical accuracy were developed. The root mean square error (RMSE) was adopted as the accuracy statistic by which data quality is measured (ASPRS, 1989). The RMSE is the square root of the average squared discrepancies and is expressed as:
\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left( y_i - y_{ti} \right)^2}{N}} \tag{1} \]
where $y_i$ refers to the $i$th map value, $y_{ti}$ refers to the $i$th known or measured sample point, and $N$ is the number of sample points. The RMSE is reported in the measurement units of the data. It has become the de facto mechanism for quantifying and stating accuracy in derived spatial products. Geospatial metadata standards, first released in 1994, require that all spatial data be accompanied by a statement of data quality as defined by accuracy assessments (FGDC, 1998). This remains the only requirement of geospatial practitioners to document and represent data quality. However, the RMSE has its drawbacks. The RMSE statistic has no spatial dimension. Although it provides information about the overall accuracy of a dataset, errors in spatial data vary spatially (Wood and Fisher, 1993). The RMSE is based on the assumption that errors are random and normally distributed, which may not always be the case (Zandbergen, 2008), and because the differences are squared, disproportionate weight is given to the values that deviate most from the “truth”. Additionally, sample points used to calculate the RMSE for a particular data product may not be appropriately spatially distributed or of sufficient number to adequately represent inherent data quality. Despite these shortcomings, the RMSE persists as the standard accuracy statistic that is used to quantify the quality of geospatial data.
The challenge remains to provide users with a mechanism for integrating that information to avoid data misuse and make better decisions (Devillers et al., 2010).
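The two accuracy measures discussed in this section can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the function names `rmse` and `ground_tolerance_m` and the sample elevations are hypothetical. The first function follows Eq. (1); the second converts a tolerance measured on a paper map into a ground distance using the scale denominator.

```python
import math


def rmse(map_values, reference_values):
    """Root mean square error between map values and reference values (Eq. 1)."""
    if len(map_values) != len(reference_values) or not map_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - yt) ** 2 for y, yt in zip(map_values, reference_values)]
    return math.sqrt(sum(squared) / len(squared))


def ground_tolerance_m(map_tolerance_cm, scale_denominator):
    """Convert a tolerance measured on the map (cm) to a ground distance (m)."""
    return map_tolerance_cm / 100.0 * scale_denominator


# Hypothetical DEM elevations (m) and surveyed check-point elevations (m).
dem = [102.0, 98.5, 110.2, 95.0]
survey = [101.0, 99.5, 109.2, 96.0]
print(rmse(dem, survey))  # every discrepancy is 1 m in magnitude

# A single large discrepancy dominates the statistic because differences are squared.
print(rmse([0.0, 0.0, 0.0, 4.0], [0.0, 0.0, 0.0, 0.0]))

# 0.05 cm on a 1:24,000 map expressed as a ground distance in meters.
print(ground_tolerance_m(0.05, 24000))
```

Note that `rmse` returns a single number for the whole set of check points, which is exactly why it carries no information about where in the dataset the errors occur.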
1.22.2.2
MAUP and Ecological Fallacy: The Case of Process and Temporal Scales
Making inferences about phenomena observed at one scale based on data observed and presented at coarser or finer scales results in potential misrepresentation and misinterpretation of results. Researchers grappled with this issue even before the digital age. In 1934, the Journal of the American Statistical Association published a series of articles dealing with sampling issues associated with census data and discrepancies between sampling space and time (Gehlke and Biehl, 1934; Neprash, 1934; Stephan, 1934). The disconnect between the scale of spatial data and the processes they attempt to explain is a well-established geographic concept and conundrum. The expression of this is referred to as the Modifiable Areal Unit Problem (MAUP) and as the “ecological fallacy” (Openshaw, 1984a, 1984b; Sui, 2009). These related concepts represent the issues associated with the potential disconnect between scales of processes and the scales of spatial data used to represent them. To understand spatial phenomena, we construct boundaries within which data are aggregated and analyzed. This very act of imposing a boundary, be it a census block, census tract, parcel, zip code, or grid cell, affects our understanding of the process or pattern we are trying to discern and the results of spatial analyses. MAUP and ecological fallacies result from integrating data derived using different measurement scales (Duckham et al., 2001). The disconnect arises through spatial aggregation: the measurement of variables through observation, their representation through spatial data, and the subsequent relationships between these variables are all affected by the scale at which the measurements take place. Analyzing the same spatial phenomenon using different scales of areal units (i.e., “modifiable areal units” of analysis) can produce differing analytical results.
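The scale effect of MAUP can be illustrated with a small synthetic experiment. The sketch below is only an illustration under stated assumptions: the 32 by 32 grid, the shared linear trend, the unit-variance noise, and the function names are all hypothetical choices. Two variables share a weak spatial trend and carry independent noise; the same data are then averaged into square zones of different sizes and the correlation between the two variables is recomputed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def aggregate(grid, block):
    """Average a square grid over non-overlapping block x block zones."""
    n = len(grid)
    zones = []
    for r0 in range(0, n, block):
        for c0 in range(0, n, block):
            cells = [grid[r][c]
                     for r in range(r0, r0 + block)
                     for c in range(c0, c0 + block)]
            zones.append(sum(cells) / len(cells))
    return zones


def scale_effect(n=32, blocks=(1, 4, 8), seed=0):
    """Correlation between two noisy variables at several zone sizes."""
    rng = random.Random(seed)
    # Shared spatial trend (hypothetical): increases toward one corner of the grid.
    trend = [[(r + c) / n for c in range(n)] for r in range(n)]
    x = [[trend[r][c] + rng.gauss(0, 1) for c in range(n)] for r in range(n)]
    y = [[trend[r][c] + rng.gauss(0, 1) for c in range(n)] for r in range(n)]
    return {b: pearson(aggregate(x, b), aggregate(y, b)) for b in blocks}


results = scale_effect()
for block, r in results.items():
    print(f"zone size {block:>2} x {block:<2} correlation = {r:.2f}")
```

In this setup the cell-level correlation is weak because the noise swamps the shared trend, while averaging within larger zones cancels much of the noise and strengthens the apparent relationship. The underlying data never change; only the areal units do.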
An ecological fallacy occurs when conclusions are drawn about individuals based on aggregate data (e.g., you live in a rich tract, so you must be rich). These concepts are well documented in the academic literature related to spatial analyses (see for example Dark and Bram, 2007; Fotheringham and Wong, 1991; Jelinski and Wu, 1996; Openshaw, 1984b, 1998; Stouffer, 1934; Wong, 2009). An understanding and recognition of MAUP, how it manifests, and how to best minimize it, is essential to accommodating uncertainty in results of spatial analyses. Recommendations for approaching MAUP have focused on finding the best possible scales at which the process being studied operates (Kwan, 2012a). However, even if we had perfect knowledge of process scale, limitations with how we frame spatial scales still exist. Offshoots of this currently intractable spatial concept have recently emerged. The Modifiable Temporal Unit Problem (MTUP) (Cheng and Adepeju, 2014; Çöltekin et al., 2011) expands the issue of mismatching spatial scales in analyses to the temporal domain. The MTUP acknowledges the importance of selecting the appropriate temporal resolution through which to analyze a geographic process. Spatial data are static, yet the phenomena they represent are not. Digital map features represent a certain snapshot of the subject based on a specified scale and time. There is a persistent gap in the space–time continuum represented by geospatial data and associated analyses. Historically, GISs were not adept at managing the temporal dimension. Recently software enhancements facilitate the integration of time steps in spatial analyses, notably the release of ESRI’s space–time pattern mining functionality (Esri, 2016). However, temporal scales are complex. Limiting temporal representation to collections of time stamps, while a step forward, still does not resolve the issue (Claramunt and Theriault, 1996). 
Kwan (2012a,b) suggests that while MAUP addresses the disconnect between boundaries and inferences made based on the scale of representation, it does not address the underlying geographic context, that is, how individuals within these boundaries behave based
on experience due to geography and time. She states that how we conceptualize a spatial process influences the output of an associated spatial analysis. This Uncertain Geographic Context Problem (UGCoP) is an important extension of MAUP, acknowledging that what matters is not just the scale that is imposed on an analysis but how the behaviors and experience within that scale are matched, or mismatched. Tobler (1970) acknowledged yet chose not to incorporate this complexity. Kwan, however, extends the complexity of underlying behaviors to the spatial incongruity. By embedding the term “uncertainty” in the acronym, Kwan more directly connects the “unknowns” inherent in spatial data analyses to the scales our data impose (Kwan, 2012a,b). The Modifiable Conceptual Unit Problem (MCUP) (Miller, 2016) suggests that analyses are sensitive not only to spatial scales but also to how spatial processes are conceptualized. This was demonstrated using pollen models where the underlying conceptualization, as represented by models of the dispersal process, manifests at more than one spatial dimension (Miller, 2016). The difficulty in identifying a demarcation between discrete and continuous boundaries based on scales of analysis has been recognized since ancient times. In the Old Testament, the demarcation between righteous and wicked is questioned (Genesis 18:23). The “sorites paradox” from the 4th century BCE articulates the complexity associated with scales of analysis (Couclelis, 2003; Fisher, 2000) by asking at what point a heap of sand is still a heap. This philosophical paradox exemplifies the vagueness that pervades geospatial information (Fisher, 2000). Boundaries are inherently fuzzy. Fuzzy set theory recognizes that boundaries are not discrete and that there is a continuum along boundaries in which borders encapsulate a bit of both phenomena (Fisher, 1996). The fuzziness resulting from where boundaries are placed constitutes uncertainty.
Fuzzy set theory and mathematical fuzzy logic have been integrated into geographic information systems and used as a mechanism for defining uncertainty in spatial boundaries (see section “Semantic Uncertainty in Spatial Concepts”). Although considerable academic work addresses the ecological fallacy, MAUP, and their related expressions, the resulting issues of vagueness and ambiguity continue to contribute uncertainty to analytic results.
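The idea that a boundary admits degrees of membership rather than a crisp yes or no can be sketched with a simple membership function for the sorites “heap”. This is a minimal illustration only: the linear ramp and the two thresholds (`not_heap_below`, `heap_above`) are hypothetical choices, not values taken from the literature cited above.

```python
def heap_membership(grains, not_heap_below=10, heap_above=1000):
    """Degree (0 to 1) to which a pile of `grains` belongs to the fuzzy set 'heap'.

    The thresholds are hypothetical; membership rises linearly between them.
    """
    if grains <= not_heap_below:
        return 0.0
    if grains >= heap_above:
        return 1.0
    return (grains - not_heap_below) / (heap_above - not_heap_below)


for grains in (5, 10, 505, 1000, 5000):
    print(grains, round(heap_membership(grains), 2))
```

A crisp (Boolean) classification would force a single cutoff somewhere between the two thresholds; the fuzzy version instead records how strongly each pile belongs to the class, which is the property exploited when fuzzy sets are used to represent vague spatial boundaries.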
1.22.2.3
Spatial Extent: Continuous Surfaces and Raster Scale
Perhaps the disciplines that have addressed the problems of ecological fallacy related to geospatial data most directly have been ecology, natural resources, and remote sensing. Considerable research in these fields grapples with the particular issue of scale and scaling as it relates to the ability to use spatial data to link spatial patterns with natural processes (Blöschl, 1996; Hunsaker et al., 2013; Lowell and Jaton, 2000; Mowrer and Congalton, 2003; Quattrochi and Goodchild, 1997; Sui, 2009; Wu et al., 2006). Landscape processes do not always operate on the scales represented in geospatial data, yet the geospatial data we use in a GIS to assess these systems impose a fixed scale within which we attempt to understand them. Especially in disciplines related to ecology and natural resources, spatial data analyses revolve around use of the raster data structure to represent continuous surfaces. The issue of spatial extent is exemplified by the grid cell structure and the scale it imposes on spatial analyses. Placement of discrete boundaries impacts analyses and contributes uncertainty associated with derived results. This effect is considerable when using the raster data structure. Although rasters represent continuous surfaces, the grid cell structure itself imposes a discrete boundary and associated scale of representation. In the raster data structure, the spatial support or resolution of spatial datasets is predefined, determined by mechanisms of the satellite (in the case of remotely sensed imagery) or grid cell resolution (in the case of digital elevation models (DEMs)), without consideration of the natural processes that are evaluated using these data (Dark and Bram, 2007). Continuous surfaces represent spatial features that are not discrete and are commonly represented in a GIS using uniform grids.
This raster grid cell resolution imposes a measurement scale on the nature of geospatial analyses and, by association, a scale on the process (e.g., hydrologic, ecologic) these data and associated analyses represent. The concept of resolution is closely related to scale and refers to the smallest distinguishable component of an object (Lam and Quattrochi, 1992; Tobler, 1988). The grid cell is also referred to as the spatial support, a concept in geostatistics referring to the area over which a variable is measured or predicted (Dungan, 2002). Spatial resolution is related to the sampling interval. The Nyquist sampling theorem implies that the sampling interval must be no larger than half the size of the smallest feature to be detected. The sensitivity of model input parameters and model predictions to spatial support has been documented in numerous geospatial analyses and remains an important factor in our understanding, assessment, and quantification of uncertainty in spatial data and related modeling applications (Wechsler, 2007). Practitioners often do not have control of the grid cell resolution of a dataset (e.g., products provided from satellite remote sensing or government-produced DEMs). Subgrid variability, that is, variability at scales finer than those captured by the grid cell area, cannot be resolved or captured using a typical raster grid cell structure. This is changing as new technologies, such as data collection from UAV platforms, place the decision for selecting an appropriate support in the hands of practitioners. As technologies advance, new spatial datasets are continually being developed. In recent years, the commercial availability of low-cost hardware and embedded computer systems has led to an explosion of lightweight aerial platforms frequently referred to as unpiloted aerial vehicles (UAVs) or “drones”. UAVs are becoming a powerful, cost-effective platform for collection of remotely sensed images.
Advances in computer vision software have enabled the construction of 3D Digital Surface Models (DSMs) from acquired imagery using Structure from Motion (SfM). SfM uses complex computer algorithms to find matching points in overlapping 2D images, enabling reconstruction of surface features (Fonstad et al., 2013; Westoby et al., 2012). UAV-derived imagery and surfaces are cost effective, accessible, and facilitate data collection at spatial and temporal scales previously inaccessible. As such, they are becoming widely used data sources in a wide range of disciplines and applications including geomorphological mapping (Gallik and Bolesova, 2016; Hugenholtz et al., 2013), vegetation mapping (Cruzan et al., 2016), and
coastal monitoring (Goncalves and Henriques, 2015). Point clouds obtained from SfM-derived surfaces are used to generate digital surface models (DSMs). Data quality is addressed using RMSE to quantify the accuracy of UAV-derived surfaces and vertical accuracies in the centimeter range are commonplace (Harwin and Lucieer, 2012; Neitzel and Klonowski, 2011; Reshetyuk and Martensson, 2016; Verhoeven et al., 2012). However, statements of accuracy and data quality are no substitute for estimates of uncertainty and resulting decisions for fitness-of-use. Data quality and accuracy assessment have become mainstream practice. The challenge remains to bridge the gap between representation of data quality and mechanisms for quantifying and communicating uncertainty.
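The loss of subgrid variability with coarser grid cells, discussed earlier in this section, can be illustrated on a one-dimensional profile. The sketch below rests on assumptions chosen purely for illustration: a hypothetical terrain profile sampled every 1 m that undulates with an 8 m wavelength, and block averaging as a simple stand-in for a change of spatial support. By the Nyquist criterion, a feature with an 8 m wavelength calls for a sampling interval of 4 m or less.

```python
import math


def block_mean(profile, cell):
    """Aggregate a 1 m profile to a coarser cell size (in m) by averaging."""
    return [sum(profile[i:i + cell]) / cell
            for i in range(0, len(profile) - cell + 1, cell)]


# Hypothetical terrain profile: 64 samples at 1 m spacing, 8 m wavelength.
WAVELENGTH_M = 8
profile = [math.sin(2 * math.pi * x / WAVELENGTH_M) for x in range(64)]

# Relief (max minus min) that survives at each cell size.
relief = {}
for cell in (1, 2, 4, 8):
    coarse = block_mean(profile, cell)
    relief[cell] = max(coarse) - min(coarse)
    print(f"cell size {cell} m: relief = {relief[cell]:.3f}")
```

The relief captured by the grid shrinks as the cell size grows, and at a cell size equal to the wavelength each cell averages a full undulation, so the feature disappears from the data entirely.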
1.22.3
Uncertainty Analysis Methods
In the previous section, map scale was used as a framework for exploring uncertainties in spatial data due to cartographic and process scales. Data quality, error, and accuracy were described. This section describes methods that have been used to address uncertainty in both vector and raster data structures. There are two basic data models used in GISs to represent real-world features. The vector data model describes features on the Earth’s surface as discrete and definite objects, such as buildings, parks, and cities. The field data or raster data model describes the Earth’s features as continuous phenomena distributed across the space (e.g., elevation, temperature, or rainfall). In the vector data model, the positions of each object are expressed with pairs of x and y coordinates. The attributes of the object are stored in a relational table and linked with the location of the spatial object. In the raster data structure, space is subdivided into grid cells. Each grid cell contains a number indicating the attribute at the location. Since the vector data model and raster data model represent the world differently, the methods of modeling uncertainty in the two models also differ. The first part of this section explores how uncertainty is addressed in discrete vector datasets. This is followed by a discussion of approaches to addressing uncertainty in continuous surfaces, using research related to the DEM as a representative case study.
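The difference between the two data models can be made concrete with a minimal sketch. The class names, fields, and sample values below are hypothetical and are not drawn from any particular GIS package: a vector feature stores coordinate pairs linked to an attribute record, whereas a raster stores one value per grid cell and locates a cell from its origin and cell size.

```python
from dataclasses import dataclass, field


@dataclass
class VectorFeature:
    """Discrete object: geometry as (x, y) pairs plus a linked attribute record."""
    feature_id: int
    geometry_type: str          # "point", "polyline", or "polygon"
    coordinates: list           # list of (x, y) tuples
    attributes: dict = field(default_factory=dict)


@dataclass
class Raster:
    """Continuous field: a grid of cell values with an origin and a cell size."""
    origin_x: float             # x of the upper-left corner
    origin_y: float             # y of the upper-left corner
    cell_size: float
    values: list                # rows of cell values, top row first

    def value_at(self, x, y):
        """Return the value of the cell containing location (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise IndexError("location falls outside the raster extent")
        return self.values[row][col]


# A park as a closed polygon (first vertex repeated at the end).
park = VectorFeature(1, "polygon",
                     [(0, 0), (40, 0), (40, 30), (0, 30), (0, 0)],
                     {"name": "Park", "type": "recreation"})

# Elevation (m) as a 3-row by 4-column grid of 10 m cells.
elevation = Raster(0.0, 30.0, 10.0,
                   [[12.0, 13.5, 15.0, 16.0],
                    [11.0, 12.5, 14.0, 15.5],
                    [10.0, 11.5, 13.0, 14.5]])

print(park.attributes["name"], len(park.coordinates))
print(elevation.value_at(25.0, 12.0))
```

In the vector case, positional uncertainty attaches to the stored coordinates and attribute uncertainty to the linked record; in the raster case, every location inside a cell returns the same value, so the cell size itself bounds what the data can express.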
1.22.3.1
Modeling Uncertainty in Vector data
Uncertainty in vector data is contributed by errors due to lineage, positional accuracy, attribute accuracy, logical inconsistency, incompleteness, semantic uncertainty, and temporal uncertainty (Guptill and Morrison, 1995; Lo and Yeung, 2002). In this section, we explore the cases of positional and attribute uncertainty in the vector data model.
1.22.3.1.1
Positional uncertainty
In the vector data model, positional uncertainty results from our lack of knowledge about the difference between coordinate values of an object in the model and the “true” locations of that object in the real world (Drummond, 1995). Spatial objects in the vector model are represented as points, lines, and polygons. The position of a point is stored as a pair of x and y coordinates. A polyline is a sequence of connected points with associated x and y coordinates. A polygon is a closed polyline. Positional errors that result in uncertainty may be contributed by factors that include, but are not limited to, limitations due to map scale and cartographic generalization, limitations of current technology, digitizing errors, raster to vector conversion, lack of precision in measurement methods, and source map distortion.
1.22.3.1.2
Analytical modeling of positional uncertainty
The basic geometric type of vector data is the point. Analytical modeling of positional uncertainty starts with point uncertainty, which serves as the building block for discrete uncertainty modeling. Analytical models to assess uncertainty make the assumption that characteristics of spatial data uncertainty are known, and apply error propagation techniques to perform uncertainty analyses (Hong and Vonderohe, 2014). Positional uncertainty contains both systematic and random errors. Systematic errors are reproducible and tend to be consistent in magnitude and/or direction, such as errors introduced during map projection or generated due to limitations of measuring instruments. Systematic errors, once identified, can be eliminated with correction procedures. In contrast, random errors vary in magnitude and direction and can be analyzed statistically. For example, repeated measurements of a coordinate using a GPS unit contain random errors. In this section, random errors are assumed to follow a normal distribution. With this assumption, uncertainty associated with these errors can be expressed using descriptive statistics, such as the variance and standard deviation.
1.22.3.1.3
Modeling positional uncertainty of point features
The positional uncertainty of a point has been extensively investigated in geodesy and land surveying (Mikhail and Gracie, 1981). The error ellipse is widely used to model point positional uncertainty. The error ellipse represents the zone of uncertainty surrounding a point. The semi-major axis, semi-minor axis, and orientation of the error ellipse are calculated from the x-directional variance $\sigma_x$, the y-directional variance $\sigma_y$, and the covariance of x and y coordinates of the point q (Fig. 2) (Hoover, 1984; Alesheikh et al., 1999). Therefore, the uncertainty of a point feature can be represented by the variance of the point. The center of the ellipse is the most likely “true” location of the point. However, statistically speaking, the true location can be anywhere inside
Spatial Data Uncertainty
319
Y
X
Fig. 2
The error ellipse of a point.
the ellipse. Shi (2009) provides an approach to calculate the probability that the true location is inside the error ellipse, represented by the volume of the two-dimensional curved surface on top of the error ellipse.
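The link between the variance and covariance terms and the ellipse parameters can be illustrated with a minimal sketch. It uses the standard textbook eigenvalue formulas for a 2 × 2 covariance matrix and the closed-form probability for a bivariate normal distribution; it is not necessarily the formulation used by Shi (2009), and the function names and the scale factor k are assumptions made for illustration.

```python
import math


def error_ellipse(var_x, var_y, cov_xy):
    """Return (semi_major, semi_minor, orientation) of the standard error ellipse.

    The axes are the square roots of the eigenvalues of the covariance matrix
    [[var_x, cov_xy], [cov_xy, var_y]]; the orientation is the angle of the
    semi-major axis in radians, counterclockwise from the x axis.
    """
    mean = (var_x + var_y) / 2.0
    half_diff = math.hypot((var_x - var_y) / 2.0, cov_xy)
    lam_max = mean + half_diff
    lam_min = max(mean - half_diff, 0.0)  # guard against tiny negative rounding
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return math.sqrt(lam_max), math.sqrt(lam_min), theta


def prob_inside(k=1.0):
    """Probability that a bivariate normal point lies inside the ellipse scaled by k."""
    return 1.0 - math.exp(-k * k / 2.0)


# Hypothetical point with variances 4 and 1 (squared map units) and no covariance
semi_major, semi_minor, theta = error_ellipse(4.0, 1.0, 0.0)
print(semi_major, semi_minor, theta, prob_inside(1.0))
```

Under the normality assumption, the standard ellipse (k = 1) contains the true location with a probability of only about 0.39, which is why scaled ellipses are often reported for higher confidence levels.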
1.22.3.1.4
Modeling positional uncertainty of linear features
Different methods have been developed to model positional uncertainty of straight lines and curved lines.
1.22.3.1.4.1 Modeling positional uncertainty of straight lines
Three models have been used to model positional uncertainty of straight lines: (1) the epsilon band model, (2) the error band model, and (3) the G-band model. The most popular line uncertainty model is the epsilon (ε) band model, which has been explored and interpreted by many researchers (Perkal, 1966; Chrisman, 1989; Blakemore, 1984; Edwards and Lowell, 1996; Leung and Yan, 1998). This model is based on the idea that a line is surrounded by a fixed-width buffer zone of uncertainty on each side of the line. This buffer is referred to as an epsilon band (Fig. 3). There are two general approaches to determine the boundaries of the epsilon band. The deterministic approach supposes that the true line lies within the epsilon band with no model of error distribution involved, which is not the case in reality. The pseudo-probabilistic approach proposes that the width of the epsilon band is a function of various variables that contribute to error, such as digitizing error, generalization error, and scale. The result is a rectangular, bell-shaped, or bimodal distribution that delineates the epsilon band along the “true” line (Chrisman, 1982; Blakemore, 1984; Dunn et al., 1990; Alai, 1993; Alesheikh et al., 1999). Regardless of the approach, epsilon band models assume the same positional uncertainty for every point on the line, that is, a fixed band width along the line. In addition, calculation of the band width in the pseudo-probabilistic approach does not involve a random variable, so the pseudo-probabilistic approach is not a stochastic process model (Goodchild, 1991).
The error band model relies on the placement of a boundary or band around the location of a line. Unlike the epsilon band model, the band width in the error band model varies along the line.
Dutton (1992) first proposed the error band model assuming that the endpoints of a straight line are random variables with a circular normal distribution. By simulating the error band using the Monte Carlo method, Dutton pointed out that the error band is narrower in the middle and wider at the endpoints of the line. Shi (1994) applied probability analysis to describe the error distribution of the line. In his method, a joint probability function of all the points on the line is computed first, from which the probability of the true location of an individual point inside the corresponding error ellipse is obtained. As the number of individual points on the line tends to infinity, the final probability distribution for the line within a particular region is formed by integrating the error curved surfaces of the individual points on the line (Fig. 4). This error band model is derived under the assumption that the positional uncertainties of the two endpoints are independent, and therefore the resulting error band is narrowest in the middle of the line and widest at the endpoints. This assumption may not coincide with reality, as uncertainties in the two endpoints may be correlated.
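The narrowing of the error band toward the middle of the line follows directly from linear error propagation. The sketch below is a simplified illustration, not the full derivation of Dutton (1992) or Shi (1994): it assumes that a point on the line is the linear interpolation P(t) = (1 − t)·P1 + t·P2 of two endpoints with independent circular normal errors, and all function names are hypothetical.

```python
import math


def point_sigma_on_line(t, sigma1, sigma2):
    """Per-coordinate standard deviation of P(t) = (1 - t) * P1 + t * P2.

    Assumes independent endpoints with circular normal errors of standard
    deviation sigma1 and sigma2, so the variances add with squared weights.
    """
    return math.sqrt((1.0 - t) ** 2 * sigma1 ** 2 + t ** 2 * sigma2 ** 2)


def narrowest_t(sigma1, sigma2):
    """Position t in [0, 1] where the band is narrowest (derivative set to zero)."""
    return sigma1 ** 2 / (sigma1 ** 2 + sigma2 ** 2)


def band_profile(sigma1, sigma2, n=11):
    """Sample the band half-width at n evenly spaced positions along the line."""
    return [point_sigma_on_line(i / (n - 1), sigma1, sigma2) for i in range(n)]


# Hypothetical endpoints with equal uncertainty (1 map unit each)
print(narrowest_t(1.0, 1.0), band_profile(1.0, 1.0, n=5))
```

With equal endpoint uncertainties the minimum falls at the midpoint, where the standard deviation drops to about 71% of its endpoint value; unequal uncertainties shift the minimum toward the more precise endpoint.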
Fig. 3 Epsilon band model. Adapted from Alesheikh, A. A., Blais, J. A. R., Chapman, M. A. and Karimi, H. (1999). Rigorous geospatial data uncertainty models for GISs. In Spatial accuracy assessment: Land information uncertainty in natural resources. Lowell, K. and Jaton, A. (eds.) Ann Arbor Press: Chelsea, Michigan. Fig. 24.1b, p. 196.
Fig. 4 Error band model. Adapted from Alesheikh, A. A., Blais, J. A. R., Chapman, M. A. and Karimi, H. (1999). Rigorous geospatial data uncertainty models for GISs. In Spatial accuracy assessment: Land information uncertainty in natural resources. Lowell, K. and Jaton, A. (eds.) Ann Arbor Press: Chelsea, Michigan. Fig. 24.1c, p. 196.
Shi and Liu (2000) presented a more generic G-band error model to compensate for the drawbacks of the error band model. The G-band error model allows the uncertainties in the locations of the two endpoints to be correlated, and the error band model is a special case of the G-band error model. With this relaxed assumption, the shape and size of the error band may vary according to the statistical characteristics of the line. For instance, the location of the minimum error can be anywhere along the line depending on the variances of the endpoints and other characteristics.
Other models for line uncertainty include the buffer model developed by Goodchild and Hunter (1997), the map perturbation model by Kiiveri (1997), the confidence region model by Shi (1994), the positional uncertainty model of line segments by Alesheikh and Li (1996) and Alesheikh et al. (1999), the locational-based model by Leung and Yan (1998), the entropy error band (H-band) model by Fan and Guo (2001), the covariance-based a-error band by Leung et al. (2004a), and the statistical simulation model by Tong et al. (2013). These methods are variations or expansions of the epsilon band model or the error band model, based on either non-probabilistic or probabilistic approaches.
1.22.3.1.4.2 Modeling positional uncertainty of curved lines
Linear features in the vector data model also include curves, which can be represented by a series of straight-line segments or by a true curve defined by a mathematical function (e.g., a circular curve or spline curve). Modeling uncertainty in a curve approximated by a series of straight-line segments combines the approaches used to model uncertainty of straight-line segments. In this section, we discuss uncertainty modeling of true curves defined by mathematical functions.
Alesheikh (1998) extended the error band model to curves: error ellipses for arbitrary points along the curve are defined first, and the region encompassing these error ellipses forms the confidence region of the curve. The true curve is thus located inside the region at a predefined confidence level. Shi et al. (2000) suggested two models, the εσ error band model and the εm error band model, to measure the uncertainty of a curve. The positional error of an arbitrary point Pi on the curve (the variance of that point) is derived first. The standard deviation of the point in the direction normal to the curve, that is, perpendicular to the tangent line of the curve (εσ), is then computed from the positional error of the arbitrary point. The locus of εσ along the curve defines the εσ error band and represents the positional uncertainty of the curve. However, εσ may not be the maximum distance between the error ellipse of the arbitrary point and the curve. Therefore, εm, the maximum distance, measured perpendicular to the curve, from the curve to the error ellipse of the arbitrary point, is calculated, and the locus of εm along the curve is the εm error band (Fig. 5). Tong and Shi (2010) further developed these two models for circular curves. Two case studies were conducted to compute positional uncertainty of digital shorelines and digitized road curves.
Fig. 5 εσ error band model and εm error band model. Adapted from Tong, X. and Shi, W. (2010). Measuring positional error of circular curve features in Geographic information systems (GIS), Computers & Geosciences, 36: 861–870. Fig. 3, p. 864.
1.22.3.1.5
Modeling positional uncertainty of polygons
Uncertainty indicators for polygons can be used to estimate uncertainty in the area, perimeter, and center of gravity (centroid) of a polygon. The most widely applied error indicator is for the area of polygons. Because polygons are composed of vertices and lines, it is natural to estimate polygon uncertainty based on point uncertainty and line uncertainty. Chrisman and Yandell (1988) proposed a simple statistical model to compute the variance of a polygon area using the variances of its vertices, under the assumption that the uncertainties of the vertices are independent and identically distributed. A similar statistical model developed by Ghilani (2000) used two simplified, less rigorous techniques and produced the same outcome. Zhang and Kirby (2000) suggested a conditional simulation approach to incorporate spatial correlation between vertices into modeling of polygon uncertainty. Liu and Tong (2005) employed two approaches to compute the standard deviation of a polygon area. One approach is based on the variance of the polygon’s vertices and the other is based on the area of the standard error band of its line segments. A case study shows that the uncertainty of a polygon is caused by the positional uncertainty of its vertices and boundary lines, and that there is no significant difference between the two approaches (Kiiveri, 1997; Griffith, 1989; Prisley et al., 1989; Leung et al., 2004c).
Hunter and Goodchild (1996) integrated both vector and raster approaches to address uncertainty in positional data. They enhanced the existing grid cell model and extended it to vector data. Two separate, normally distributed random error grids in the x and y directions are created with a mean and standard deviation equal to the estimate of positional error in the original polygon data set. The error grids are then overlaid with the polygon to create a new but equally probable version of the original polygon by applying the x and y positional shifts in the error grids to the vertices of the original polygon. This process can be repeated a number of times to assess uncertainty in the final products. Hunter et al. (2000) applied this model to a group of six polygons. Perturbing the set of polygons 20 times yields 20 realizations, from which the mean polygon areas and their standard deviations are calculated to show the area uncertainty of the six polygons.
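The two families of approaches described above, analytical propagation from vertex variances and repeated perturbation of the vertices, can be compared in a minimal sketch. It is a simplification under stated assumptions: every vertex receives an independent normal shift with the same standard deviation in x and y (no error grids and no spatial correlation, unlike Hunter and Goodchild, 1996), and the analytical value is a generic first-order propagation through the shoelace area formula rather than the exact model of any cited author. All names are hypothetical.

```python
import math
import random
import statistics


def shoelace_area(vertices):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def first_order_area_sd(vertices, sigma):
    """First-order standard deviation of the area for independent vertex errors."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x_prev, y_prev = vertices[i - 1]
        x_next, y_next = vertices[(i + 1) % n]
        # Squared partial derivatives of the area with respect to x_i and y_i (times 4)
        total += (x_next - x_prev) ** 2 + (y_next - y_prev) ** 2
    return sigma * math.sqrt(total) / 2.0


def simulate_area(vertices, sigma, n_realizations=20, seed=0):
    """Perturb every vertex, recompute the area, and summarize the realizations."""
    rng = random.Random(seed)
    areas = []
    for _ in range(n_realizations):
        perturbed = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                     for x, y in vertices]
        areas.append(shoelace_area(perturbed))
    return statistics.mean(areas), statistics.stdev(areas)


# Hypothetical 10 x 10 square with a vertex standard deviation of 0.1
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(first_order_area_sd(square, 0.1), simulate_area(square, 0.1, n_realizations=200))
```

For this square the first-order standard deviation of the area is about 1.41 square units, and the simulated value should approach it as the number of realizations grows.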
1.22.3.2
Attribute Uncertainty
Attribute error in the vector data model refers to the discrepancy between the descriptive values of an object in the model and the “true” values of that object in the real world (Goodchild, 1995). Different models are used to assess attribute uncertainty based on whether the attribute values are categorical or continuous.
1.22.3.2.1
Error matrix for attribute uncertainty
Uncertainty in categorical data is assessed using an error matrix, also called a confusion matrix. This matrix has been widely adopted to compute classification accuracy in remote sensing, but it applies equally well to categorical attributes in vector models. Another method is to convert the uncertainty of categorical attributes to a probabilistic model and evaluate uncertainty using sensitivity analysis.
An error matrix is a square array of numbers that cross-tabulates the number of sample spatial data units assigned to a particular category against their “true” category. The “true” category can be acquired either from field checks or from source data with a higher degree of accuracy (Lo and Yeung, 2002; Shi, 2009). Conventionally, rows list attribute categories in the vector spatial database, and columns represent “true” categories in the reference data. The value at the intersection of row i and column j indicates the number of sample spatial data units assigned to category i in the vector spatial database that actually belong to category j in the reference “true” data. From the error matrix, the overall accuracy is calculated as the ratio of the total number of correctly assigned spatial data units to the total number of sample spatial data units. The producer's and user's accuracies are also derived to evaluate the accuracy of each individual category, both in the vector spatial database and in the reference data. These three indices are sensitive to the structure of the error matrix (e.g., when one category of spatial data units dominates the sample) and do not consider chance agreement (Stehman and Czaplewski, 1998; Lo and Yeung, 2002). The kappa coefficient is widely used because it takes chance agreement into account (Cohen, 1960; Rosenfield and Fitzpatrick-Lins, 1986). Two additional error indices are the error of commission and the error of omission. An error of commission is the incorrect inclusion of spatial data units in a category to which they do not belong. The error of omission is the percentage of spatial data units that are missing from their “true” category.
Sampled data are used as the input to error matrices; therefore, the sampling scheme and sample size influence the assessment results. Because the collection of samples is time consuming and requires a great amount of effort and expense, the sample size has to be kept to a minimum. However, it should be large enough for the assessment to be statistically valid (Congalton, 1988; Fukunaga and Hayes, 1989). Various sampling schemes have been designed in classical statistics and spatial statistics, including simple random sampling, systematic sampling, stratified random sampling, and stratified systematic unaligned sampling (Shi, 2009). The selection of a sampling scheme depends on the situation and purpose of an application.
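The indices derived from an error matrix can be computed in a few lines. The sketch below follows the row and column convention described above (rows are database categories, columns are reference categories) and uses the common remote sensing convention that the commission error is one minus the user's accuracy and the omission error is one minus the producer's accuracy. The sample matrix and the function name are hypothetical.

```python
def accuracy_measures(matrix):
    """Overall, user's, and producer's accuracy plus kappa from a square error matrix.

    matrix[i][j] is the number of sample units assigned to category i in the
    database whose reference ("true") category is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    overall = sum(diagonal) / n
    users = [diagonal[i] / row_totals[i] for i in range(k)]      # per database category
    producers = [diagonal[j] / col_totals[j] for j in range(k)]  # per reference category
    # Agreement expected by chance, from the marginal totals
    chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (overall - chance) / (1.0 - chance)
    return {
        "overall": overall,
        "users": users,
        "producers": producers,
        "commission": [1.0 - u for u in users],
        "omission": [1.0 - p for p in producers],
        "kappa": kappa,
    }


# Hypothetical two-category sample of 100 spatial data units
sample = [[35, 5],
          [10, 50]]
print(accuracy_measures(sample))
```

For this sample the overall accuracy is 0.85, while kappa is about 0.69, which illustrates how correcting for chance agreement lowers the reported agreement.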
1.22.3.3
Modeling Topological Relations of Spatial Objects With Vague Boundaries
Vector data models describe the world with discrete and definable points, lines, and polygons. Real-world objects are not that simple. For example, the boundaries between mountains and plains are not clear and sharp. This type of uncertainty, unlike positional uncertainty, is inherent in the nature of the object. Over the past two decades, the development of topological relation models for spatial objects with vague boundaries has gained increasing attention (Chen, 1996; Zhan, 1998).
Clementini and Felice (1996) suggested an algebraic model for topological relations of spatial objects with broad boundaries by extending the 9-intersection model for crisp spatial objects (Egenhofer and Herring, 1991). In their model, a spatial object has an inner boundary and an outer boundary representing the indeterminacy or uncertainty of the object. The closed region between the inner and outer boundaries is the broad boundary. A spatial object can thus be described with three parts: the interior, the broad boundary, and the exterior. Forty-four topological relations are defined according to the geometric conditions of the three parts for two spatial objects. Cohn and Gotts (1996) expressed a vague object with two concentric subregions. The inner subregion is called the “yolk”, the outer subregion the “white”, and the two together make up the “egg”. They extended the region connection calculus theory (Randell et al., 1992) to the “egg-yolk” representation of vague objects and derived 46 topological relations between spatial objects with vague boundaries. Shi and Liu (2004) used fuzzy set theory to represent a spatial object with vague boundaries. Each fuzzy object has a fuzzy membership function. Quasi coincidence and quasi difference were applied to partition the fuzzy sets via the sum of their membership functions and the difference in their membership functions, respectively. The sum and difference values were then used to interpret topological relations between the fuzzy objects. They found that infinitely many topological relations exist between two fuzzy sets, which can be approximated by a sequence of matrices.
1.22.3.4
Analytical Methods and Monte Carlo Simulation
Analytical methods and Monte Carlo simulation methods are two approaches used to represent and quantify uncertainties in both the vector and raster data models. Analytical methods develop functional relationships that link the characteristics of uncertainties in input variables to those in output variables. However, they rely on mathematical and/or statistical concepts and formulae that are analytically and computationally complex (Alesheikh, 1998). In addition, most analytical methods assume independence of input uncertainties and a linear relationship between input and output uncertainties, which is rarely the case in reality (Hong and Vonderohe, 2014). New approaches have been developed to solve these problems. Among them, the Taylor series method, which approximates nonlinear relationships by a truncated series, is the most widely adopted (Heuvelink, 1998; Helton and Davis, 2003; Leung et al., 2004a; Zhang, 2006; Anderson et al., 2012; Xue et al., 2015). Although the Taylor series method simplifies error propagation modeling, it also introduces approximation error into the analytical model.
While various methods for simulating surfaces are available (Deutsch and Journel, 1992), the most common technique for representing uncertainty in continuous fields is Monte Carlo simulation. In Monte Carlo simulation, uncertainty is addressed by generating realizations of the surface from a set of random samples drawn from the probability distribution of the input data (Hong and Vonderohe, 2014). Monte Carlo simulation has been extensively applied to uncertainty analysis for spatial objects, spatial operations, and computational models (Heuvelink, 1998). In many studies, analytical models and simulation methods have been applied together for comparison and cross-validation (Leung et al., 2004b; Shi et al., 2004; Zhang et al., 2006; Sae-Jung et al., 2008; Cheung and Shi, 2000). In fact, almost all analytical models can be simulated with Monte Carlo methods.
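The Taylor series approach and the approximation error it introduces can be shown with a small worked case. The sketch below is a generic first-order (truncated Taylor series) propagation for independent inputs, using numerical partial derivatives; it does not reproduce the formulation of any cited author, and the rectangle example, the function names, and the step size h are assumptions.

```python
def first_order_variance(f, means, variances, h=1e-6):
    """First-order Taylor estimate of Var[f(X)] for independent inputs.

    The function is linearized around the input means, so the output variance
    is the sum of squared partial derivatives times the input variances.
    """
    variance = 0.0
    for i, var_i in enumerate(variances):
        up = list(means)
        down = list(means)
        up[i] += h
        down[i] -= h
        # Central-difference approximation of the partial derivative at the means
        partial = (f(*up) - f(*down)) / (2.0 * h)
        variance += partial ** 2 * var_i
    return variance


def rectangle_area(width, height):
    return width * height


# Hypothetical rectangle of 10 by 5 units, each side measured with variance 0.01
approx = first_order_variance(rectangle_area, [10.0, 5.0], [0.01, 0.01])
# Exact variance of a product of two independent variables, for comparison
exact = 10.0 ** 2 * 0.01 + 5.0 ** 2 * 0.01 + 0.01 * 0.01
print(approx, exact)
```

The first-order estimate is approximately 1.25, whereas the exact variance is 1.2501; the small difference is the truncation error of the linear approximation, which grows as the input variances grow or as the function becomes more strongly nonlinear.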
1.22.3.4.1
Uncertainty simulation in continuous fields
The most common technique for representing uncertainty in continuous fields is Monte Carlo simulation. Simulation methods regard any map as only one of an infinite number of equiprobable realizations within which the true map exists. Realizations of equiprobable error surfaces are integrated in a Monte Carlo simulation to quantify the distribution of the magnitude and spatial dependence of a map’s uncertainty (Ehlschlaeger, 1998). The stochastic approach to error modeling requires a number of maps or realizations upon which selected statistics are performed. Uncertainty is computed by evaluating the statistics associated with the range of outputs. The general steps of a Monte Carlo simulation are as follows (Alesheikh, 1998):
1. Determine the probability density function of the errors in input data;
2. Obtain a set of N random variables drawn from the probability density function;
3. Perform the spatial operations or computational models with the set of random variables to get N output realizations;
4. Calculate the summary statistics from the N realizations.
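The four steps listed above can be sketched as a small generic routine. The example is hypothetical: the "spatial operation" is the length of a line segment, the assumed error model is an independent normal error with the same standard deviation on every coordinate, and the function names are illustrative only.

```python
import math
import random
import statistics


def make_sampler(coords, sigma):
    """Step 1: assumed error model, independent normal errors with sd sigma."""
    def sample(rng):
        return [c + rng.gauss(0.0, sigma) for c in coords]
    return sample


def segment_length(x1, y1, x2, y2):
    """The spatial operation whose output uncertainty is of interest."""
    return math.hypot(x2 - x1, y2 - y1)


def monte_carlo(model, sample_inputs, n, seed=0):
    rng = random.Random(seed)
    # Steps 2 and 3: draw N sets of inputs and run the model on each
    outputs = [model(*sample_inputs(rng)) for _ in range(n)]
    # Step 4: summary statistics of the N output realizations
    return statistics.mean(outputs), statistics.stdev(outputs)


# Hypothetical segment from (0, 0) to (3, 4) with a coordinate sd of 0.1
mean_len, sd_len = monte_carlo(segment_length,
                               make_sampler([0.0, 0.0, 3.0, 4.0], 0.1),
                               n=2000)
print(mean_len, sd_len)
```

The same routine applies to any operation that can be evaluated on one realization of the inputs; only the sampler (the error model) and the model function change.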
The underlying assumption of these representations of continuous surfaces is that it is impossible to know the actual variance and spatial variability of error. This variability is therefore approximated using random fields. Monte Carlo methods for simulating uncertainty differ in the methods used to generate the random fields that represent the spatial structure of error. Numerous approaches to simulating error have been proposed. The concepts of spatial autocorrelation and cross correlation have been used to derive measures of uncertainty and to include the spatial autocorrelation of errors in the generation of random fields (Ehlschlaeger, 1998; Ehlschlaeger and Shortridge, 1996; Fisher, 1991a,b; Goodchild, 2000; Griffith and Chun, 2016; Hengl et al., 2010; Hunter and Goodchild, 1997; Wechsler and Kroll, 2006). Fisher (1991a,b) used the Moran’s I statistic to measure the autocorrelation of a normally distributed random field. Hunter and Goodchild (1997) used a spatially autoregressive random field as a disturbance term to propagate errors. Ehlschlaeger and Shortridge (1996) and Ehlschlaeger (1998) developed a model that creates random fields with a Gaussian distribution that matches the mean and standard deviation parameters derived from a higher accuracy source. Wechsler and Kroll (2006) integrated four approaches for generating random fields to quantify DEM error and its propagation to derived parameters.
Monte Carlo simulation methods are best suited to modeling the probability of different outcomes in a process that cannot be easily predicted because of the presence of random variables. Monte Carlo methods can represent a wide range of variations of the input data, are easy to implement, and can deal with nonlinearity and interdependency of input data. However, generating a large number of realizations of output data is computationally intensive and time consuming (Crosetto and Tarantola, 2001). In addition, Monte Carlo simulation is considered an experimental method, which means that it does not provide the theoretical relationship between the input and output uncertainties (Xue et al., 2015).
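One simple way to obtain a spatially autocorrelated error field is to smooth white noise and rescale it to a target standard deviation. The sketch below is only an illustration of that idea and of measuring the result with Moran's I; it is not the method of Fisher, Hunter and Goodchild, Ehlschlaeger, or Wechsler and Kroll. The moving-average filter, the wrap-around grid edges, the rook neighborhood, and all names are assumptions.

```python
import random
import statistics


def white_noise(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def smooth(field, radius=1):
    """Moving-average filter with wrap-around edges; induces autocorrelation."""
    rows, cols = len(field), len(field[0])
    size = (2 * radius + 1) ** 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    total += field[(r + dr) % rows][(c + dc) % cols]
            out[r][c] = total / size
    return out


def rescale(field, target_sd):
    """Shift to zero mean and scale to the target standard deviation."""
    values = [v for row in field for v in row]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [[(v - mean) / sd * target_sd for v in row] for row in field]


def morans_i(field):
    """Moran's I with rook (4-neighbor) weights on a wrap-around grid."""
    rows, cols = len(field), len(field[0])
    values = [v for row in field for v in row]
    mean = statistics.mean(values)
    cross = 0.0
    for r in range(rows):
        for c in range(cols):
            d = field[r][c] - mean
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cross += d * (field[(r + dr) % rows][(c + dc) % cols] - mean)
    # Each cell has 4 neighbors, so N divided by the sum of weights equals 1/4
    return cross / (4.0 * sum((v - mean) ** 2 for v in values))


def error_field(rows, cols, rmse, radius=1, seed=0):
    rng = random.Random(seed)
    return rescale(smooth(white_noise(rows, cols, rng), radius), rmse)


# Hypothetical 30 x 30 error surface with a target standard deviation of 2.0
field = error_field(30, 30, rmse=2.0, seed=1)
print(morans_i(field))
```

Unsmoothed white noise yields a Moran's I near zero, whereas the smoothed field is strongly positively autocorrelated; adding such a field to a map produces one equiprobable realization for a Monte Carlo run.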
1.22.3.4.2
Uncertainty in raster data: The case of the digital elevation model
Approaches to estimate uncertainty vary based on the spatial data structure. Here we discuss approaches to address uncertainty in the raster data structure. Elevation data are among the most common data used for GIS evaluation of natural systems. Digital elevation models (DEMs) provide the basis for characterization of landform and are used extensively in environmental applications such as geomorphology, environmental modeling, and hydrology. Here we address the special case of DEM error and resulting uncertainty, which has served as the source of research related to uncertainty in continuous surfaces for over three decades.
DEM quality is a function of not only the production method but also the scale at which the DEM is produced. DEM data quality and associated errors are linked to DEM production methods (Daniel and Tennant, 2001; Nelson et al., 2009; Wechsler, 2007; Wilson, 2012). Errors in DEMs arise from mistakes in data entry, measurement, and temporal and spatial variation (Heuvelink, 1999). While the exact nature and extent of errors within a DEM are unknown, attempts to represent the spatial structure of error can be used to address uncertainty.
In 1934, Neprash noted “... It is frequently assumed that if traits or conditions are closely associated with one another in their geographic distribution, they are functionally, if not causally, related.” (Neprash, 1934, p. 167). This early recognition of geographic spatial dependence, made famous by Waldo Tobler’s first law of geography (Tobler, 1970), is referred to as spatial autocorrelation. If geographic features are spatially autocorrelated, so too are associated errors (Congalton, 1991).
It has been established that errors are a fact of spatial data. Errors are therefore propagated throughout spatial analyses. In the case of raster surfaces, much attention has been given to the propagation of errors from DEMs to the variety of derived parameters, including those frequently used in hydrologic analyses, such as slope, upslope contributing area, and the topographic index (Wechsler and Kroll, 2006, 2007). Many of these approaches use the root mean square error (RMSE) as a springboard for generating equally viable continuous surfaces that are integrated into Monte Carlo simulations. Statistics derived from the variability in outcomes based on simulations of equiprobable inputs have been used as a baseline for representing uncertainty in numerous applications.
1.22.3.4.3
DEM uncertainty propagation to derived parameters
Primary terrain attributes, such as surface slope and aspect, are computed directly from DEM data. From slope, flow direction and flow accumulation are calculated. Routing flow in a GIS requires smoothing of the DEM surface prior to calculation of hydrologic parameters, which further modifies the surface (Wechsler, 2007; Wu et al., 2008). Various algorithms exist for calculating these derived terrain parameters, each producing different results (Carter, 1992; Horn, 1981; Tarboton, 1997; Zevenbergen and Thorne, 1987). In the case of raster surfaces, much attention has been given to the propagation of DEM errors to the variety of derived parameters frequently used in hydrologic analyses, such as slope, upslope contributing area, and the topographic index (Hunter and Goodchild, 1997; Wechsler, 2007; Wechsler and Kroll, 2006). One issue that remains unaddressed is that, although a variety of methods exist for deriving terrain parameters, software packages do not offer users a choice among them. For example, Esri’s ArcGIS provides only one option for calculating slope and flow direction (Horn, 1981; Jenson and Domingue, 1988). Assessing and communicating the variability in results that arises from the choice of algorithm is not currently embedded in mainstream practice.
The literature is replete with viable methods and approaches to address uncertainty. They focus on the issues of spatial extent or grid cell resolution, methods to propagate error and simulate its impact on the elevation and derived surfaces, and the propagation of this error to model results. However, the research community has not reached a consensus on how to approach uncertainty and integrate these methods into accessible components of GIS software. This has impeded the integration of the approaches described in this section into conventional practice.
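The propagation of DEM error to a derived parameter can be illustrated for slope. The sketch below computes slope from a 3 × 3 elevation window with the third-order finite difference weights commonly attributed to Horn (1981) and then perturbs the window repeatedly. It is a deliberately simplified illustration: the added noise is spatially uncorrelated, which the discussion above suggests is unrealistic, and the window values, cell size, RMSE, and function names are hypothetical.

```python
import math
import random
import statistics


def horn_slope_deg(window, cellsize):
    """Slope in degrees at the center of a 3 x 3 elevation window.

    The window rows are ordered north to south: [[a, b, c], [d, e, f], [g, h, i]].
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cellsize)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cellsize)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def simulate_slope(window, cellsize, rmse, n=500, seed=0):
    """Monte Carlo mean and standard deviation of slope under elevation error.

    Assumes independent normal errors with standard deviation rmse in every cell.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(n):
        noisy = [[z + rng.gauss(0.0, rmse) for z in row] for row in window]
        slopes.append(horn_slope_deg(noisy, cellsize))
    return statistics.mean(slopes), statistics.stdev(slopes)


# Hypothetical planar surface rising 5 m per 10 m cell toward the east
plane = [[0.0, 5.0, 10.0],
         [0.0, 5.0, 10.0],
         [0.0, 5.0, 10.0]]
print(horn_slope_deg(plane, 10.0), simulate_slope(plane, 10.0, rmse=1.0))
```

On the error-free plane the slope is about 26.6 degrees; with a 1 m elevation RMSE the simulated slopes spread around that value by roughly two degrees, which gives a sense of how elevation error translates into uncertainty in a derived parameter.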
1.22.4
Uncertainty Propagation in Spatial Analysis
DEM-derived terrain parameters are frequently used as inputs to distributed parameter models that represent landscape processes. Vector data are often used as input to spatial operations and/or computational models such as hydrologic models, habitat models, network models, and soil erosion models. Uncertainty in the input vector data affects the accuracy of the final output, because spatial operations and/or computational models are functions of the input data. The spatial operations and/or computational models may themselves contain uncertainty. As a result, uncertainties are propagated through spatial operations and/or computational models to the final output.
Accommodating the propagation of uncertainty through modeling requires considering not only model inputs, but also model output uncertainty. While uncertainty analysis approaches facilitate exploration of uncertainty contributed by model parameters, sensitivity analyses are used to measure uncertainty related to model output. These two methods have been used to assess uncertainties of computational models in various areas including transportation, ecology, hydrology, urban planning, hazard susceptibility mapping, land suitability evaluation, and environmental planning (Rinner and Heppleston, 2006; Rae et al., 2007; Chen et al., 2009, 2010; Plata-Rocha et al., 2012; Feizizadeh and Blaschke, 2014; Hong and Vonderohe, 2014; Ligmann-Zielinska and Jankowski, 2014).
Sensitivity analysis explores how the uncertainty of model output is apportioned to different sources of variation in the input data (Saltelli et al., 2000). It studies the relationship between the uncertainty of the input and the uncertainty of the output of a model, determining the contribution of each individual input uncertainty to the output uncertainty. Methods for sensitivity analysis include factor screening, differential analysis, Monte Carlo simulation, and response surface analysis. Techniques to quantify the importance of input uncertainty consist of linear regression analysis, correlation analysis, measurement of importance, and variance-based techniques (Bonin, 2006). Among them, variance-based techniques, also called ANOVA-like methods, have gained more attention in recent years. They are based on Monte Carlo simulation under the assumption that the input factors are independent. For each input factor, the variance-based techniques calculate a sensitivity index representing the fractional contribution of that factor to the variance of the model output. One advantage of variance-based techniques is that, unlike regression analysis or correlation analysis, they work for nonlinear and nonadditive models (Crosetto and Tarantola, 2001). Lodwick et al. (1990) performed a sensitivity analysis of attribute values related to polygonal mapping units and proposed two algorithms for determining the confidence level of the output using five indices, one of which is attribute uncertainty. Bonin (2000, 2002) transformed the uncertainty of categorical attributes into a parametric probabilistic model and used it in a sensitivity analysis to evaluate the impact of attribute uncertainties on the results of travel time computation.
Uncertainty analyses estimate the overall uncertainty of model output as a result of uncertainties associated with the model input and the model itself (Saltelli et al., 2000). Unlike sensitivity analysis, which only considers uncertainty in model input, uncertainty analysis takes into account both the uncertainty of the model input and the uncertainty of the model itself. The uncertainty of the model itself is typically represented by parameters that are used to tune the modeling hypotheses (Bonin, 2006).
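A variance-based sensitivity index can be estimated with a simple Monte Carlo scheme. The sketch below uses a basic "pick and freeze" estimator of the first-order index: the model is run on one sample and on a second sample in which only the factor of interest is kept, and the covariance of the two outputs is divided by the output variance. The estimator choice, the standard normal independent inputs, the toy model, and the function names are all assumptions made for illustration, not the procedure of any cited study.

```python
import random
import statistics


def first_order_indices(model, k, n=5000, seed=0):
    """Estimate the first-order sensitivity index of each of k independent inputs.

    model takes a list of k input values and returns one output value.
    Inputs are assumed to be independent standard normal variables.
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    b = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    y_a = [model(row) for row in a]
    mean_a = statistics.mean(y_a)
    var_a = statistics.pvariance(y_a)

    indices = []
    for i in range(k):
        # Keep factor i from sample A and take every other factor from sample B
        y_c = [model(rb[:i] + [ra[i]] + rb[i + 1:]) for ra, rb in zip(a, b)]
        mean_c = statistics.mean(y_c)
        cov = sum((u - mean_a) * (v - mean_c) for u, v in zip(y_a, y_c)) / n
        indices.append(cov / var_a)
    return indices


def toy_model(x):
    # Hypothetical additive model: the first input carries four times the variance
    return 2.0 * x[0] + x[1]


print(first_order_indices(toy_model, k=2))
```

For this additive toy model the theoretical indices are 0.8 and 0.2, and the estimates should fall close to those values; for nonadditive models the first-order indices sum to less than one, the remainder being due to interactions among inputs.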
1.22.4.1
Modeling Uncertainty Propagation in Computational Models
In most computational models, spatial data input includes both raster data and vector data. For example, in a hydrological model, the input data may consist of DEM data, river network data, and land-use/land cover map data. Computational models are simplified mathematical representations of the real world. Uncertainties are inevitably introduced into these models. Input data are subject to various errors contributed by measurement, scale, and sampling issues. Errors in a particular model may be contributed through parameter selection, assumptions, and model structure. The associated uncertainties will propagate and accumulate through the model process and reside in the model output. Evaluation of uncertainty propagation through a computational model helps users make more effective and responsible decisions based on the model.
1.22.4.2
Modeling Uncertainty Propagation in Simple Spatial Operations
Positional uncertainty in buffer analyses is attributed to the positional uncertainty of the original points, lines and polygons, and the buffer width. As a result, the position of the measured buffer is not identical to the position of the “true” buffer. Modeling uncertainty in buffer analysis is mainly investigated for the raster data model, and limited studies are conducted for vector data. Zhang et al. (1998) derived an uncertainty model for buffer analysis based on the epsilon band and error band models. One limitation of this model is that it assumes positional uncertainties of the vertices are identically and independently distributed (Shi, 2009). Shi et al. (2003) proposed a more generic model that not only circumvents the above limitation, but also considers both positional uncertainty and buffer width uncertainty. Three indices are computed to assess uncertainty in buffer analysis: error of commission, error of omission, and normalized discrepant area between the “true” and measured location of the buffer according to the probability density function of the measured vertices and the measured buffer value. Overlay analysis in vector data models includes point-in-polygon overlay, line-in-polygon overlay, and polygon-on-polygon overlay. Polygon-on-polygon overlay involves vital components of point-in-polygon overlay and line-in-polygon overlay. Due to methodological complexities, analytical uncertainty propagation models for overlay analysis in vector data have seldom been explored. Only a few studies are found in the literature (Prisley et al., 1989; Caspary and Scheduring, 1992; Kraus and Kager, 1994; Leung et al., 2004b). Shi et al. (2004) describe an approach where correlation is permitted in uncertainties of x- and ydirections at each vertex and of all the vertices in x-direction (or y-direction). 
In this more generic and comprehensive model, the following uncertainty indices are calculated based on the variance–covariance matrix of all original polygon vertices: the variance–covariance matrix of the vertices of the generated polygons; the variances of measurements of the generated polygons (i.e., the variance of the perimeter, the variance of the area, and the variance–covariance matrix of the center of gravity); and the maximum and minimum error intervals for the vertices of the generated polygons (both for individual vertices and for all the vertices in the x- and y-directions).
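To illustrate how a vertex variance–covariance matrix can yield the variance of a polygon measurement such as area, the sketch below applies generic first-order (linearized) error propagation to the shoelace area formula. It is not the formulation of Shi et al. (2004); the function names, the unit-square example, and the 0.1-unit standard deviation are assumptions made for illustration, and the linearization is only an approximation because area is not a linear function of the coordinates.

```python
def polygon_area(vertices):
    """Signed shoelace area; positive for counter-clockwise vertex order."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x_i, y_i = vertices[i]
        x_next, y_next = vertices[(i + 1) % n]
        total += x_i * y_next - x_next * y_i
    return 0.5 * total


def area_gradient(vertices):
    """Partial derivatives of the area, ordered [dA/dx0, dA/dy0, dA/dx1, ...]."""
    n = len(vertices)
    grad = []
    for i in range(n):
        x_prev, y_prev = vertices[i - 1]
        x_next, y_next = vertices[(i + 1) % n]
        grad.append(0.5 * (y_next - y_prev))  # dA/dx_i
        grad.append(0.5 * (x_prev - x_next))  # dA/dy_i
    return grad


def area_variance(vertices, covariance):
    """First-order variance of the area: gradient * covariance * gradient^T."""
    grad = area_gradient(vertices)
    m = len(grad)
    return sum(grad[r] * covariance[r][c] * grad[c]
               for r in range(m) for c in range(m))


# Assumed example: unit square, 0.1-unit standard deviation per coordinate.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sigma = 0.1
n = 2 * len(square)

# Case 1: all coordinate errors independent (diagonal covariance matrix).
independent = [[sigma ** 2 if r == c else 0.0 for c in range(n)]
               for r in range(n)]

# Case 2: x errors of all vertices perfectly correlated (a rigid shift in x),
# with no error in y. Even indices hold the x coordinates.
rigid_shift = [[sigma ** 2 if (r % 2 == 0 and c % 2 == 0) else 0.0
                for c in range(n)] for r in range(n)]

var_independent = area_variance(square, independent)
var_rigid = area_variance(square, rigid_shift)
print(polygon_area(square), var_independent, var_rigid)
```

The second case shows why permitting correlation among vertices matters: a perfectly correlated shift moves the polygon without changing its area, so the propagated area variance is essentially zero, whereas treating the same errors as independent gives a clearly nonzero variance.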
1.22.5
Semantic Uncertainty in Spatial Concepts
The previous sections deal with uncertainty related to positional and attribute accuracy. Another major type of uncertainty in spatial data is referred to as semantic uncertainty. For example, the concept of exurbanization, or urban sprawl, has more than 18 definitions in the literature (Berube et al., 2006), and each definition may generate different spatial boundaries of exurbanization for the same area (Ban and Ahlqvist, 2009). Therefore, there is semantic uncertainty in the concept of exurbanization. This section introduces general concepts in semantic uncertainty including issues of uncertainty, ontology, fuzzy-set approaches, and semantic uncertainty in a variety of applications.
1.22.5.1
Uncertainty, Fuzzy-Sets, and Ontologies
As stated, error, vagueness, and ambiguity contribute to uncertainty. Errors occur when classes of objects and individuals are clearly defined but poorly measured. Vagueness occurs when there is no unique distinction between objects and classes. Ambiguity occurs when two or more definitions exist for a concept (Fisher, 1999). Examples of errors such as positional accuracy were addressed in the previous sections. The MAUP and the ecological fallacy give rise to vagueness. An example of vagueness is the concept of a hill, since it is difficult to say what elevation a hill should have. An example of ambiguity is the exurbanization concept, since an area can be classified as either exurban or non-exurban depending on the definition used. This section reviews research focused on uncertainty, the fuzzy-set approach, and ontologies that are related to semantic uncertainty of spatial data.
1.22.5.1.1
Uncertainty
Formal representations, modeling, and databases of semantic uncertainty have been studied in uncertainty research. Gahegan (1999) explored semantic content of spatial data transformation and interoperability using formal notations that can be useful for multiple disciplines including geoscience, hydrology, geology, and geography. Zhang and Goodchild (2002) discussed theoretical and practical aspects of spatial data and uncertainty with emphasis on the description and modeling of uncertainty. Plewe (2002) argued that uncertainty in spatio-temporal data from historical geography can be dealt with by using an uncertain temporal entity model. Ahlqvist et al. (2005) pointed out that uncertainty exists in both semantic and spatial definitions of geographic objects and argued that the uncertainty research of spatial objects should include the existence or semantics of the object itself, the location, and boundary of the object. Morris (2008) described how to deal with uncertainty in spatial databases by using fuzzy sets for its query and representation. Some of these studies concentrated on issues of user’s cognition and criticism of the uncertainty research. Couclelis (2003) argued the necessity of shifting focus of uncertainty research in geography from information to knowledge by differentiating cognitive information system for humans and digital information system for GIS. Foody (2003) described problems of the uncertainty research in GIScience community due to misunderstanding of spatial concepts. Brown (2004) argued for interdisciplinary approaches to addressing uncertainty and recommended that human and physical geographers should collaborate to develop relevant methodologies. Kuipers (2000) discussed the usefulness of spatial semantic hierarchy to provide robust knowledge of local geometry for robotic agents’ movements.
1.22.5.1.2
Fuzzy-set approach
A fuzzy set is a class of objects with a continuum of degrees of membership of a concept or phenomenon (Zadeh, 1965). It can address uncertainty associated with vague concepts by allowing partial membership to a set (Zadeh, 1965; Fisher, 2000). When a spatial concept is characterized by vagueness (e.g., exurbanization), it should be represented using fuzzy boundaries rather than crisp discrete boundaries. Fuzzy boundaries of a spatial concept are created by applying a fuzzy-set approach to the spatial data. The fuzzy-set approach utilizes a membership function that assigns each spatial object a membership value (e.g., of being exurbanized) ranging between zero (no membership, or not exurbanized at all) and one (full membership, or entirely exurbanized). In addition, a membership value of 0.5 is given at the breakpoint of the definition, where a location could be either exurban or non-exurban. Fig. 6 presents how a fuzzy-set membership function can be generated based on one of the existing exurban definitions. According to Daniels (1999), exurban areas are defined as “10–50 miles away from a city of at least 50,000 people”. The definition consists of two attributes, distance and population. In this section, we focus on the numerical expression of the distance attribute, “10–50 miles away from a city”. An example fuzzy-set membership function for the attribute can be developed following the logic of the fuzzy-set approach above. According to the numerical expression of the attribute, the distance values of “10 miles” and “50 miles” can serve as the breakpoints of a fuzzy-set membership function that decides whether a location is exurban or not. Therefore, we assign a membership value of 0.5 to areas that correspond to the breakpoints. To keep the fuzzy-set membership function of the attribute simple, a linear formula is used in this example. We assign full membership (1) to distance values between 20 and 40 miles, and no membership (0) to distance values of 0 miles and of 60 miles or longer. Fig. 6 shows a visual representation of the fuzzy-set membership function.
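[Fig. 6 plot: membership value (vertical axis, 0 to 1, with 0.5 marked) against distance in miles (horizontal axis, ticks at 0, 10, 20, 40, 50, and 60).]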
Fig. 6 Example of a fuzzy-set membership function of an exurbanization concept of Daniels (1999) focusing on the distance attribute. Redrawn from Ban, H., and Ahlqvist, O. (2009). Representing and negotiating uncertain geospatial concepts–Where are the exurban areas? Computers, Environment and Urban Systems. 33(4), 233–246, Table 1.
Following Ban and Ahlqvist (2009), a set of membership functions for distance in Fig. 6 can be developed as follows, where X is the distance in miles and mf is the membership value:
\[
mf(X) =
\begin{cases}
0.05 \cdot X, & X < 20 \\
1, & 20 \le X < 40 \\
3 - 0.05 \cdot X, & 40 \le X < 60 \\
0, & X \ge 60
\end{cases}
\tag{2}
\]
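The piecewise definition in Eq. (2) can be sketched directly in code. The function name and the sample distances below are illustrative assumptions; the breakpoints and coefficients are those of Eq. (2).

```python
def exurban_distance_membership(distance_miles):
    """Fuzzy membership of 'exurban' by distance from a city, after Eq. (2)."""
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    if distance_miles < 20:
        return 0.05 * distance_miles        # rises linearly from 0 to 1
    if distance_miles < 40:
        return 1.0                          # full membership
    if distance_miles < 60:
        return 3.0 - 0.05 * distance_miles  # falls linearly from 1 to 0
    return 0.0                              # no membership


# Sample distances (miles), including the 10- and 50-mile breakpoints.
for d in (0, 10, 20, 30, 40, 50, 60, 75):
    print(d, round(exurban_distance_membership(d), 2))
```

Applying such a function to every cell of a distance raster would give a continuous membership surface of the kind the text describes, instead of a crisp exurban/non-exurban boundary.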
When Eq. (2) is applied to empirical GIS data, the results can be visualized in maps. Fig. 7 depicts the uncertain spatial boundaries of one of the exurban definitions of Daniels (1999), visualized using Eq. (2) and data on distance from metropolitan statistical areas (MSA) in four counties in Ohio, USA. The continuous, fuzzy grayscale values in Fig. 7 represent heterogeneous degrees of exurbanization in the study areas that crisp, Boolean-style boundaries may miss; darker grays represent higher degrees of exurbanization. There are other exurban definitions in Daniels (1999) that are not introduced here, and each of them can have its own fuzzy-set membership functions.
Fig. 7 Visualization of the uncertain boundaries of exurbanization based on the fuzzy-set membership function and empirical GIS data (distance data source: US Census Bureau, 2000). Depicts one of the existing definitions of exurbanization concept regarding distance, “10–50 miles away from a major urban center” (Daniels, 1999).
If a concept consists of multiple fuzzy sets, they can be combined by operations such as inclusion, union, intersection, complement, relation, and convexity (Zadeh, 1965). The development of the concept of a fuzzy set and associated equations is well documented (Klir and Yuan, 1995; Zimmermann, 1996; Ragin, 2000; Robinson, 2003). Ban and Ahlqvist (2009) showed how multiple fuzzy-set membership functions can be combined to generate negotiated uncertain boundaries of exurban areas in maps.
The fuzzy-set approach has been used in many areas of multidisciplinary research. For example, it has been used to address the conceptual development of uncertainty research such as multivalued logic (Fisher, 2000) and rough-fuzzy sets (Ahlqvist, 2005b), uncertain phenomena in human geography and the social sciences with qualitative and quantitative data (Openshaw, 1998; Ragin, 2000), uncertainty issues in behavioral and natural environment studies such as land cover mapping and DEM data (Fisher, 1996; Fisher and Wood, 1998; Fisher and Tate, 2006), uncertain geometry and vector boundary representation (Wang and Hall, 1996; Guesgen and Albrecht, 2000), vagueness in GIS data, object-oriented databases, and spatio-temporal data (Cross and Firat, 2000; Dragicevic and Marceau, 2000), similarity in statistics (Hagen-Zanker et al., 2005), and the development of GIS curricula on uncertainty and fuzzy classification (Wilson and Burrough, 1999). Fuzzy-set tools have been integrated into GIS software packages and are therefore accessible to practitioners outside the research community.
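Three of the operations attributed to Zadeh (1965) above have simple standard forms: union takes the maximum of two membership values, intersection takes the minimum, and complement subtracts the value from one. The sketch below applies them cell by cell to two membership lists. The population membership values are hypothetical, and this is not presented as the specific combination method of Ban and Ahlqvist (2009).

```python
def fuzzy_union(a, b):
    """Standard (Zadeh) union: the larger of the two membership values."""
    return max(a, b)


def fuzzy_intersection(a, b):
    """Standard (Zadeh) intersection: the smaller of the two membership values."""
    return min(a, b)


def fuzzy_complement(a):
    """Standard (Zadeh) complement: one minus the membership value."""
    return 1.0 - a


# Memberships for four locations. The distance values follow the shape of a
# distance-based exurban membership; the population values are hypothetical.
distance_membership = [0.0, 0.5, 1.0, 0.5]
population_membership = [0.8, 0.8, 0.3, 1.0]

# Exurban by distance AND by population.
both = [fuzzy_intersection(d, p)
        for d, p in zip(distance_membership, population_membership)]
# Exurban by distance OR by population.
either = [fuzzy_union(d, p)
          for d, p in zip(distance_membership, population_membership)]
# Degree of being NOT exurban by distance.
not_exurban = [fuzzy_complement(d) for d in distance_membership]

print(both, either, not_exurban)
```

Minimum and maximum are only one possible choice; other operator families exist, and which one suits a given concept is a modeling decision.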
1.22.5.1.3
Ontologies
Ontologies are theories from philosophy that describe a certain view of the world, for example a spatial term, by its composition, structure, properties, relations, classes, boundaries, functions, and processes (Mark et al., 1999; Fonseca et al., 2002; Couclelis, 2010). Ontologies deal with the nature of being and stem from metaphysics (Ahlqvist et al., 2005). Recently, ontologies have been highlighted in spatial uncertainty research by using computing technologies such as artificial intelligence, machine learning, and the semantic web (Sen, 2008; Kuhn, 2009; Couclelis, 2010). Goodchild (2004) argues that expanded research emphasizing ontologies in geographic processes is necessary. Ontology-related research has focused on the relationship between human cognition and ontologies including: (1) the connection between producers and consumers of an ontology for exchanges of ideas (Frank, 1997), (2) formulating and testing ontologies embodied in human cognition related to geographic categories (Smith and Mark, 1998), (3) the effect of individual differences in cognitive categorization of geographic objects (Mark et al., 1999), (4) an ontology of spatial relations of image interpretation using fuzzy representations (Hudelot et al., 2008), and (5) geographic information ontologies reflecting user intentionality and object of discourse (Couclelis, 2010). Ontology research has developed along with the integration of systems in computing environments. Philosophical, cognitive, and formal theories of semantic uncertainty have been described along with formal tools and conceptual structures for implementation and representation of uncertainty (Kavouras and Kokla, 2007). An ontology-driven GIS architecture that integrates geographic information based on semantic values has been recommended (Fonseca et al., 2002). 
Research has focused on the implementation of spatial ontologies in computer systems, such as the cooperation between critical GIS and computational approaches (Duckham and Sharp, 2005), comparison of the philosophical and computer-science meanings of ontology (Schuurman, 2006), a formal model of uncertain geographic information based on multivalued logic (Duckham et al., 2001), and an approach to deal with vagueness in geographical concepts by using logical and semantic analysis (Bennett, 2001). In addition, some studies have dealt with the temporal aspects of ontology in geography that had been neglected until recently (Couclelis, 1999) and with methods to analyze the effect of uncertainty in spatio-temporal interactions of moving objects by using space–time prisms and accessibility (Neutens et al., 2007). Research on spatial ontologies has expanded beyond the spatial domain. For example, Goodchild et al. (2007) introduced the concept of the geo-atom, which can deal with uncertainty in the measurement of objects in order to represent both discrete and continuous conceptualizations of the world. Frank (2003) proposed a multitier ontology in spatio-temporal GIS databases consisting of physical reality, observable reality, object world, social reality, and cognitive agents to integrate different philosophical viewpoints ranging from realist and positivist views to postmodern views of the world. Another group of studies has sought to represent ontologies formally so that they can be used computationally to deal with uncertainty. Sen (2008) demonstrated road-network ontologies within a framework for probabilistic geospatial ontologies by using machine-based mapping that verified maps created by human beings. Kuhn (2009) proposed an ontology of observation and measurement to formalize and ground the semantics of spatial phenomena observed and represented on the semantic sensor web. Buccella et al. (2011) proposed a system to integrate geographic data by formalizing information as normalized ontologies consisting of structural, syntactic, and semantic aspects to assist users in finding more suitable correspondences.
1.22.5.2
Applications
In this section, applied spatial uncertainty research is separated by timeframe to demonstrate the depth and breadth of research over the decades. Research in this area continues to expand.
1.22.5.2.1
The 1990s
In the 1990s, research focused on introducing fuzzy-set approaches and the usefulness of uncertainty in geographic research. For example, Fisher and Pathirana (1990) explored the use of a fuzzy classifier to determine land cover components of individual pixels in remote sensing data and demonstrated that the fuzzy classifier could be useful to extract information about individual pixels. Fisher (1992) revealed uncertainty in viewshed analysis and provided an alternative approach by utilizing an error simulation algorithm and fuzzy set to produce fuzzy viewsheds. Hays (1993) examined the uncertain geographic boundary of a certain region that could be represented by a few terms and argued that the fuzzy-set theory could contribute to multiple disciplines by illustrating uncertain geographic boundaries. In addition, Davis and Keller (1997b) proposed a model to combine the techniques of fuzzy logic and Monte Carlo simulation to deal with thematic classification uncertainty and variance in continuously distributed data. Comparisons between the fuzzy-set approach and Boolean approaches were explored. Burrough et al. (1992) suggested using fuzzy classification to determine land suitability and demonstrated that the fuzzy approach provided much better classification of continuous variation than the Boolean approach. Sui (1992) and Davidson et al. (1994) compared results of land evaluation from both Boolean and fuzzy-set approaches and argued the fuzzy-set approach provided more gradual results than the Boolean approach.
De Gruijter et al. (1997) demonstrated that soil distribution modeling should be based on a fuzzy-set approach to capture soil processes and land-use effects that may be missed by traditional soil maps with a higher level of aggregation and classification. Steinhardt (1998) combined traditional assessment methods and a fuzzy-set approach for assessment of large areal units of landscape and demonstrated that the fuzzy sets and fuzzy logic provided a better representation than the traditional assessment method. Lastly, Hunter and Goodchild (1997) applied a spatially autoregressive error model to DEM to demonstrate effects of uncertainty in DEM for analyses of slope and aspect.
1.22.5.2.2
The 2000s
Since 2000, research in uncertainty has continued to expand. Some studies have expanded the use of fuzzy-set approaches in spatial uncertainty research. For example, Ahlqvist et al. (2003) and Ahlqvist (2005a) demonstrated how rough fuzzy sets can be useful to deal with uncertainty in classification, especially vagueness and indiscernibility in land cover categories. Dixon (2005) combined a fuzzy rule-based model with GIS, GPS, and remote sensing data to generate groundwater sensitivity maps. Malczewski (2006) applied the concept of fuzzy linguistic quantifiers to GIS-based land suitability analysis by using land-use data. In addition, Ban and Ahlqvist (2009) demonstrated how an uncertain urban concept such as exurbanization can be measured by using a fuzzy-set approach. Uncertainty in spatial data has been examined in applications using several analytical methodologies. Liu and Phinn (2003) presented an application of cellular automata modeling to represent multiple states of urban development using a GIS and a fuzzy-set approach. Ladner et al. (2003) introduced an approach that applies association rules for fuzzy spatial data to assess correlations among values in data. Comber et al. (2006) argued for the importance of assessing data uncertainty by linking data quality reporting, metadata, and fitness-for-use assessment to better deal with spatial uncertainty. Ge et al. (2009) visualized uncertainty in land cover data by using methods including maximum likelihood classification, a fuzzy c-means clustering algorithm, and a parallel coordinate plot. They also demonstrated that a fuzzy-set approach provided better results than a probability approach. Additional research has focused on assessment of uncertainty from the viewpoint of critical GIS. Comber et al. (2004) demonstrated the inconsistency among expressions of expert opinions for relations of land cover ontologies and suggested the combination of different expert opinions.
Duckham and Sharp (2005) suggested combining critical thinking in GIS, such as societal issues in using technology, with computational approaches to deal with uncertainty in geographic information. In addition, Duckham et al. (2006) developed a qualitative reasoning system to describe and assess the consistency of uncertain geographic data to support integration of heterogeneous geographic datasets. Much research has dealt with the development of methodologies, frameworks, and tools. For example, Jiang and Eastman (2000) reviewed existing multicriteria evaluation (MCE) approaches in GIS, including Boolean and Weighted Linear Combination, and proposed the use of fuzzy-set membership for a more specific instance of the MCE. Pappenberger et al. (2007) introduced a method to estimate uncertainty of inundation extent by using a fuzzy evaluation method and remote sensing data. Zhang and Foody (2001) compared the fuzzy c-means clustering algorithm and the artificial neural network approach to classify remote sensing data. Research has also provided techniques for modeling spatial uncertainty. Wechsler and Kroll (2006) described a Monte Carlo methodology for evaluating the effects of uncertainty in DEM data on elevation values and topographic parameters. Peterson et al. (2006) explored uncertainty in the geographic extent of Marburg virus by using ecologic niche modeling. Heuvelink et al. (2007) developed a statistical framework and software tool for simulating and representing uncertainty in variables of environmental phenomena. Bone et al. (2005) developed a model that applies fuzzy-set theory to remote sensing and GIS data in order to produce susceptibility maps that show insect infestations in forest landscapes. Uncertainty in classification of spatial data has also received attention.
These studies include methods for representing vagueness in taxonomic class definitions in land-use data combined with a formal representation of fuzzy sets (Ahlqvist, 2004; Ahlqvist and Gahegan, 2005), and for evaluating similarity metrics of semantic land cover change data (Ahlqvist, 2008). In addition to uncertainty in land-use data classification, uncertainty in concepts of broad urban areas has been studied. For instance, some works introduced how the uncertainty approach can represent an inherent ordering of categories such as urban, suburban, exurban, and rural areas in the numerical measurement domain (Ahlqvist and Ban, 2007) and demonstrated how an uncertain exurban concept can be measured and represented by using a fuzzy-set approach, geovisualization, and a virtual environment to enable users to negotiate the spatial boundaries of an uncertain concept (Ban and Ahlqvist, 2009). Another area deals with user expertise in the use of uncertain geographic information for risk assessment in the domain of floodplain mapping (Roth, 2009).
1.22.5.2.3
The 2010s
In the 2010s, applications of spatial data uncertainty research have focused on the development of models and data. For instance, Voudouris (2010) proposed an object and field data model combined with uncertainty and semantics by using the Unified Modeling Language (UML) class diagram. Tate (2013) evaluated the reliability of a hierarchical social vulnerability index by assessing and visualizing its uncertainty using Monte Carlo-based analysis. In addition, a few works paid attention to the perceptions of users and of a broad audience. For example, Grira et al. (2010) argued that the participation of end-users in the management of spatial data uncertainty in a Volunteered Geographic Information (VGI) context contributes to improving the perceived spatial data quality. Goodchild et al. (2012) argued that uncertainty concepts from geographic information and social science approaches should be incorporated into the Digital Earth to improve communication between scientists and the public regarding uncertainty in information technology (IT) and spatial data.
Other recent works have developed frameworks and tools for spatial uncertainty. Janowicz et al. (2011) and Bastin et al. (2013) introduced new frameworks to deal with uncertainty in geographic information implemented in web-based user interfaces. Bordogna et al. (2012) developed a geographic information retrieval model and a software tool to represent uncertainty in indexing text in documents with geographic locations.
1.22.6
Uncertainty Visualization
Research on uncertainty visualization is relatively new in the timeline of research on spatial data uncertainty and has developed with the advancement of IT, especially since the 1990s. Currently, various visualization techniques from IT, such as static, animated, two-dimensional, three-dimensional, higher-dimensional, and interactive web-based visualization, are used to represent spatial data uncertainty more effectively. This section focuses on different techniques to visualize data uncertainty, including both positional uncertainty and semantic uncertainty. Much of the research dealing with visualization of positional uncertainty has focused on the development of methodologies. For example, Davis and Keller (1997a) introduced modeling and visualization of multiple types of uncertainty, such as uncertainty in classification and data accuracy, with an example of slope stability modeling. Lucieer and Goovaerts (2006) developed a geostatistical method to generate a spatial distribution of risk measurements to investigate how uncertainty associated with risk is visualized as uncertain locations of spatial clusters and outliers. Monmonier (2006) focused on uncertainty generated by the processes of data preparation, modeling, and classification, and argued that cartographers should not be neglected as the technology of uncertainty visualization evolves. Xiao et al. (2007) developed a method to evaluate the robustness of choropleth map classification when uncertainty is present in the data. Tucci and Giordano (2011) developed a method for detecting positional inaccuracy and uncertainty to measure deceptive changes in urban areas using historical maps. Pfaffelmoser et al. (2011) developed a visualization methodology for representing positional and geometrical variability of isosurfaces in uncertain 3D scalar fields with user interaction.
Kwan (2012a) introduced the uncertain geographic context problem, which arises from the way contextual units or neighborhoods are geographically delineated. The research community has developed software tools to visualize semantic uncertainty. For example, Bastin et al. (2002) developed a toolkit consisting of interactive and linked views to enable visualization of data uncertainty, allowing users to consider error and uncertainty as integral elements of image data to be viewed and explored. Lucieer and Kraak (2004) developed a tool to visualize fuzzy classification of remotely sensed imagery with the use of exploratory and interactive visualization techniques. The tool consists of dynamically linked views including an image display, a parallel coordinate plot, a 3D feature space plot, and a classified map of uncertainty. Other studies on visualization of uncertainty in data types have developed a model of fuzzy spatial data types consisting of fuzzy points, lines, and regions with fuzzy spatial algebra (Schneider, 1999), and a multivalue data type that consists of multiple instances of the same object to visualize uncertainty of spatial multivalue data (Love et al., 2005). Efforts to assess the accuracy of data and methodology have provided some guidelines for uncertainty visualization research. For the assessment of spatial data accuracy, Goodchild and Hunter (1997) proposed a technique for evaluating the positional accuracy of digitized linear features based on a comparison with higher accuracy data using statistics and visualization. Woodcock and Gopal (2000) demonstrated that methods for accuracy assessment using fuzzy sets contribute to finding the magnitude of errors and assessing ambiguity in map classification. Themes related to users’ cognition, perception, and behavior in response to visualization have been explored.
For instance, Deitrick and Edsall (2006) demonstrated the importance of the way uncertainty is expressed, since uncertainty visualization could affect decision-making. Some authors argued for the importance of user interpretation and perceptual issues in understanding the graphical expression of uncertainty (Drucker, 2011; Brodlie et al., 2012). Usability is another theme that has been examined for uncertainty visualization. Some studies argued that uncertainty in geographic data and classification should be visualized to aid users’ understanding (Deitrick and Edsall, 2008; Slingsby et al., 2011). In terms of user evaluation, Hope and Hunter (2007) demonstrated that general end users usually do not have an intuitive understanding of uncertainty represented in GIS outputs during decision-making, regardless of their experience with spatial data. Devillers et al. (2010) also noted critically that uncertainty visualization has remained confined to the academic literature and has not reached the general end-users of spatial data. Several works investigate how cartographic elements should be used in uncertainty visualization. For example, some demonstrated computer graphics and visualization techniques such as three-dimensional display, shape, glyph, magnitude, volume, hue, and interactivity to help users access and understand uncertainty in the data (Wittenbrink et al., 1996; Pang, 2001). Another research topic in this area is evaluating users’ perceptions of the effectiveness of particular visualization methods such as shape, opacity, blinking, three-dimensional display, hue, and saturation (Drecki, 2002). In addition, a method of choropleth mapping was developed to represent uncertainty of socioeconomic data by using hierarchical tessellations of data uncertainty and a quadtree data structure (Kardos et al., 2005). Bostrom et al.
(2008) argued that cartographic design features should be used in risk communications as maps to better represent uncertainty information and to influence risk perception and behavior. Visual variables and cartographic techniques such as whitening of hues, orientation, grain, arrangement, shape, fuzziness, transparency, and iconicity have been studied for cognitive testing of uncertainty visualization (Kubícek and Sasinka, 2011; MacEachren et al., 2012; Kaye et al., 2012). With recent technology development, web-based visualization has been used in semantic uncertainty research. Examples include web-based visualization and data exploration tools for capturing uncertain geography concepts and data such as “high crime areas”
(Evans and Waters, 2007), simulation of a snow avalanche event (Kunz et al., 2011), and statistical processing for extensive psychological user evaluation (Kubícek and Sasinka, 2011). Lastly, three-dimensional visualization and augmented reality have been recently applied. Su et al. (2013) developed uncertainty-aware geospatial visualization using 3D-augmented reality techniques to monitor the proximity between invisible utilities and digging implements to deal with utility strikes for improvement of urban excavation safety. Delmelle et al. (2014) evaluated the influence of positional and temporal inaccuracies in the 3D mapping of potential outbreaks of dengue fever data.
1.22.7
Uncertainty in Crowd-Sourced Spatial Data
We have entered an era of big data: large volumes of various datasets are being created every moment. Videos are made and shared on YouTube. Photos are uploaded and pinned on Instagram, Flickr, and Pinterest. Blogs are written and commented on about almost any topic, including science, politics, fashion, and travel. Tweets are posted and mentioned on anything from the presidential election and social rebellion to local news and daily gossip. Spatial data such as Global Positioning System (GPS) trajectories and places of interest are produced in collaborative mapping projects and social networking platforms, with or without users’ awareness. It is indeed an age of voluminous data produced at an unprecedented pace in all aspects of human activities. These “Big Data” and associated analytics impact our everyday lives and shape our decisions. We are able to track product prices and purchase when the price reaches a historical low. The travel website Kayak relies on historical price changes to forecast price trends and offer purchasing advice to consumers. Navigation systems such as Waze incorporate traffic data contributed by on-road drivers via their cellphones to provide route suggestions in real time. Big spatial data are distinctive in that they can be analyzed on the basis of spatial location. These data provide great potential to study physical and human phenomena. Earth observation systems such as the Landsat program continuously acquire satellite imagery of the Earth, while widespread sensor networks constantly monitor environmental conditions including temperature, pressure, and sound. Furthermore, empowered by advanced geospatial technologies and ubiquitous computing systems, billions of human sensors (Goodchild, 2007) are now capable of creating large amounts of geotagged data on features ranging from bikeways to parking lots. In many cases we can now find more than one data source for a specified geographic feature.
These crowd-sourced spatial datasets have unique characteristics: First, these datasets are assertive instead of authoritative. There is no “gold standard” or master geographic dataset any more. Anyone with access to geospatially enabled technologies can generate geographic information effortlessly and different representations of the same geographic world can coexist simultaneously. Second, a much wider range of geographic phenomena can be mapped by citizen volunteers, and ephemeral geographic things and events may be recorded and visualized in real time. When mapping was expensive and production cycles were long, government agencies and their experts tended to map things that are stable, which ensures the accuracy and validity of maps for a long period of time. However, identification of an object’s location on the Earth’s surface and creation of one’s travel trajectories may be as easy as pressing a few buttons on a smartphone nowadays. When we are inundated with such volumes of easy-to-get information, evaluation of data sources and identifying factors that contribute to uncertainty is of particular importance.
1.22.7.1
Evaluation of Uncertainty in Crowd-Sourced Spatial Data
Uncertainty in spatial data may be (1) measured during the data creation process or (2) evaluated after datasets are produced. Goodchild and Li (2012) proposed three mechanisms (the crowd-sourcing, social, and geographic approaches) to ensure spatial data quality in the data acquisition and compilation process. Other researchers have evaluated data quality by comparing a crowd-sourced dataset with a reference authoritative data source (e.g., Haklay, 2010; Cipeluch et al., 2010). Together, these two directions of research yield four major methods for evaluating uncertainty in crowd-sourced spatial data. The first is the crowd-sourcing approach, which is based on Linus’s Law: “given enough eyeballs, all bugs are shallow” (Raymond, 1999). In its original context, the likelihood of a bug being found and corrected increases as the number of programmers reviewing a piece of code increases. When applied to data uncertainty, it suggests that higher accuracy of spatial data is associated with a larger number of reviewers or editors. This approach has been confirmed by a study on data quality of OpenStreetMap (Haklay, 2010): Positional accuracy of features increases as the number of editors increases until a threshold is reached. This approach can also be used to update datasets such as street networks. When an offroad route is traveled by an increasing number of drivers, a new road segment may be added to bring the dataset up to date. This approach works best for geographic features that are prominent or draw attention from many viewers and editors, but it may not be as effective for geographic areas that are sparsely populated or attract little interest. It is also problematic for spatial data that involve disagreements (e.g., over feature types), as demonstrated by the “tag wars” in OpenStreetMap (Mooney, 2011).
The second method is called the social approach, which relies on social trust and a hierarchy of gate-keepers. A reputation system is established based on the number and quality of contributions a person makes. Data producers at different levels in the hierarchy have different privileges in terms of editing, deletion, blocking other users, and resolving disputes. This type of mechanism has been implemented in many crowd-sourcing projects, including Wikipedia and OpenStreetMap (OSM). The third approach uses geographic theories and principles that govern our world to assess the likelihood of a piece of geographic information being true. While the first two approaches evaluate data uncertainty indirectly by looking at either
the number of editors or the credibility of a contributor, this approach evaluates spatial data uncertainty directly using geographic knowledge. A geographic fact under evaluation should be consistent with our geographic knowledge of that particular feature type and the surrounding areas. Uncertainty in the form of logical inconsistency can be easily identified using such a knowledge base. For instance, a photo of a charming house geotagged to an ocean is probably misplaced. This is a promising approach for ensuring data quality; however, implementation of thousands of geographic principles and rules remains a challenge. In addition to the three approaches proposed by Goodchild and Li (2012), another common method to evaluate uncertainty in crowd-sourced spatial data is a comparison between a dataset and a reference authoritative source. This method has been widely used in quality analysis of OSM data in different parts of the world. For example, Haklay (2010) assessed the positional accuracy and completeness of geographic features in London and England by comparing OSM with Ordnance Survey datasets. Girres and Touya (2010) analyzed geometric, attribute, semantic, and temporal accuracy, logical consistency, completeness, lineage, and usage of OSM data in France by comparing them with the BD TOPO® Large Scale Referential (RGE) from the French National Mapping Agency (Institut Géographique National, IGN). Fan et al. (2014) evaluated building footprint data in terms of completeness, semantic accuracy, positional accuracy, and shape accuracy for the city of Munich in Germany by comparing OSM with ATKIS (German Authority Topographic–Cartographic Information System).
Although this is a widely adopted method, we need to be aware of its limitations: It may quickly become inadequate when it comes to temporal accuracy, as authoritative data sources usually have a long production cycle and may easily be outperformed by a crowd-sourced spatial database created by millions of human sensors with advanced geospatial technologies. Evaluation of completeness using this method may also be inappropriate for the same reason. As demonstrated by a study of conflating bikeways from authoritative and crowd-sourced data (Li and Valdovinos, 2017), authoritative datasets are no longer the gold standard and can actually be complemented and improved using a crowd-sourced spatial dataset.
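The comparison approach described above can be illustrated with a minimal sketch. It is not the procedure used in any of the cited studies. It assumes that features in the two datasets have already been matched, that coordinates are in a projected system measured in metres, and that completeness can be summarized as a simple length ratio. The function names and the sample coordinates are hypothetical.

```python
import math


def positional_rmse(pairs):
    """Root-mean-square distance between matched (crowd, reference) points."""
    squared = [
        (cx - rx) ** 2 + (cy - ry) ** 2
        for (cx, cy), (rx, ry) in pairs
    ]
    return math.sqrt(sum(squared) / len(squared))


def completeness(crowd_length, reference_length):
    """Ratio of mapped length in the crowd-sourced data to the reference."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return crowd_length / reference_length


# Hypothetical matched road junctions: (crowd-sourced point, reference point)
matched = [
    ((100.0, 200.0), (103.0, 204.0)),  # 5 m apart
    ((500.0, 650.0), (500.0, 644.0)),  # 6 m apart
    ((920.0, 310.0), (912.0, 310.0)),  # 8 m apart
]

rmse = positional_rmse(matched)
ratio = completeness(crowd_length=8.4, reference_length=10.0)
print(f"RMSE: {rmse:.2f} m, completeness ratio: {ratio:.2f}")
```

A completeness ratio above 1 would mean that the crowd-sourced dataset contains more mapped length than the reference, which is one way the limitation noted above can surface: the reference is then not a reliable upper bound.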
1.22.7.2
Uncertainty in Platial Data
A new dimension that is becoming increasingly important is uncertainty regarding places in crowd-sourced data. Place is a core concept in everyday life, in the discipline of geography, and across the social sciences. It has been extensively studied in human geography from experiential, humanistic, and phenomenological perspectives (e.g., Tuan, 1977; Relph, 1976; Harvey, 1993; Hubbard and Kitchin, 2010). However, place has generally been ignored in GIScience because its vagueness and ambiguity do not meet the standard of rigor expected in scientific studies. Many places are geographic objects with indeterminate boundaries, or with boundaries that change through time, and they are difficult to define as the exact polygonal extents traditionally required by GIS (Burrough and Frank, 1996). Space, on the other hand, has been studied comprehensively in GIScience through such concepts as location, distance, direction, geometry, and topology. GISs have focused on representing space using points, lines, and areas in predefined coordinate systems, within which measurements are made. So far, research on spatial data accuracy and uncertainty has focused on geographic features represented as coordinate pairs in space. However, the wide adoption of digital technologies and the crowd-sourcing of geographic information have generated voluminous data centered on place, which have great potential to benefit society. It is time to take place more seriously in spite of its imprecise or absent spatial component. Just as location is essential in spatial representation, placename is critical in representing places. It is the key to linking information about the same place from different sources, just as location is the key to linking properties through spatial joins. An accurate location, or even a highly inaccurate one, is no longer essential in representing place. A placename like Lower Manhattan may be more pertinent to an average citizen than the officially defined boundary of Manhattan.
The development of digital technologies makes it incredibly easy to record people’s lives, so placenames that were once present only in verbal conversations are now mentioned and discussed on the web, on Twitter and Flickr, in travel blogs, etc. This is a form of informal geographic information. Consider the sentence “In downtown Long Beach, the temperature was 59 °F on 5 December 2016.” It involves the place “downtown Long Beach”, although this place is poorly defined in terms of spatial location and has no coordinates in a reference system. A message like this is meaningful and useful despite its uncertain spatial location. Place is sometimes not a well-defined geographic entity, but a vague object with changing context. Due to its ubiquitous presence in human discourse, many efforts have been made to represent uncertainty associated with places, especially those mentioned and tagged in crowd-sourced datasets. For example, Montello et al. (2003) asked participants to draw the location of downtown Santa Barbara in order to extract spatial information for the vague place “downtown”. Jones et al. (2008) tried to delineate the spatial extent of vague places by searching for and calculating the frequency of co-occurrence between examined places and more precisely defined places. There is also increasing interest in generating spatial footprints for imprecise regions using geotagged photos, particularly photos in Flickr. Grothe and Schaab (2009) used two statistical methods (kernel density estimation and support vector machines) to generate boundaries of named places. Keßler et al. (2009) implemented a clustering and filtering algorithm to assign spatial locations to placenames and to compare those locations with the locations obtained from traditional gazetteers. Li and Goodchild (2012) extracted geotagged Flickr photos and constructed different places in a hierarchy in France. Fig. 8 shows a density surface of Flickr photos tagged with the placename “Paris” (in red) along with density surfaces for many of the constituent features of a tourist’s Paris. The map bears little relationship to the official, spatial definition of Paris, but clearly demonstrates the hierarchical nature of a tourist’s conception of Paris. More recently, Chen and Shaw (2016) developed a modified kernel density estimation method to delineate vague spatial extents of places extracted from Flickr photos.
Fig. 8 Density surfaces created from Flickr postings for Paris (red) and several major places within a tourist’s Paris (Background map: OpenStreetMap) (Li and Goodchild, 2012).
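The idea of deriving a vague footprint from geotagged photos can be sketched with a plain Gaussian kernel density estimate. This is a simplified illustration, not the method of any of the studies cited above (it omits, for example, the weighting used in modified approaches). The bandwidth, cell size, threshold fraction, function names, and sample points are all assumptions chosen for the example.

```python
import math


def gaussian_kde(points, x, y, bandwidth):
    """Density estimate at (x, y) using an isotropic 2D Gaussian kernel."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


def vague_extent(points, bandwidth, cell, fraction):
    """Return grid cells whose density is at least `fraction` of the peak.

    The result maps each retained cell (x, y) to its density divided by
    the peak density, which can be read as a degree of membership in the
    place rather than a crisp inside/outside decision.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs) - bandwidth, min(ys) - bandwidth
    nx = int((max(xs) + bandwidth - x0) / cell) + 1
    ny = int((max(ys) + bandwidth - y0) / cell) + 1

    surface = {}
    for i in range(nx):
        for j in range(ny):
            cx, cy = x0 + i * cell, y0 + j * cell
            surface[(cx, cy)] = gaussian_kde(points, cx, cy, bandwidth)

    peak = max(surface.values())
    return {c: d / peak for c, d in surface.items() if d >= fraction * peak}


# Hypothetical photo locations tagged with one placename (arbitrary units):
# five photos cluster near the origin, one is an isolated outlier.
photos = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5), (-0.4, -0.2),
          (10.0, 10.0)]

extent = vague_extent(photos, bandwidth=1.0, cell=0.5, fraction=0.5)
print(f"{len(extent)} cells retained")
```

With these sample points the isolated photo falls below the threshold, so the footprint is drawn around the cluster only. The choice of bandwidth and threshold strongly affects the resulting extent, which is itself a source of uncertainty.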
1.22.7.3
Uncertainty Equals Low Quality?
Understandably, people prefer certainty to uncertainty. Over the past few decades, the GIScience community has made considerable effort to fight spatial data uncertainty by reducing errors, removing fuzziness, and increasing quality. We judge a spatial data source by how certain or accurate it is, and have developed numerous methods and techniques to assess uncertainty. As discussed in the previous sections, positional uncertainty is measured quantitatively, for example as the difference between a represented location and its true location in reality. A high-quality spatial database is expected to have minimal uncertainty in every aspect. Geometric accuracy has always been an important measure of map quality. However, in the era of big geospatial data, a large portion of geographic information is created about places. Placenames may be all that is present to link different properties provided by various sources, and a locational or geometric component may not be available at all. Should we treat these data sources as having little or no value because of the high uncertainty associated with locations? Or should we extract valuable knowledge from them in spite of uncertainty? Perfectly accurate data are not always necessary, or even desirable. The London Tube map (Fig. 9) is a classic example of schematic mapping that represents stations on transit lines in their relative positions. Designed by Harry Beck in 1933, it was the first attempt to represent the complex transit railway system in London using a schematic diagram. Distance in central London is expanded while distance in the periphery is shrunk. Lines are straightened and directions are adjusted to horizontals, verticals, and diagonals. Although geometry is intentionally distorted, the Tube map serves the major purpose of a transit map, which is to convey route information to passengers.
Assisted by the map, travelers find their origins and destinations, where to get on a train, where to make connections, and where to get off. Its success as a navigation aid is demonstrated by its wide adoption in many cities all over the world (Ovenden, 2007). This type of map significantly reduces cognitive load in wayfinding in a complex subway system or other public transit systems. Schematic mapping is also used in railroad maps and electric distribution systems where geographic accuracy between features such as junctions is not relevant. In all these cases, geometry is simplified for the sake of clear representation and reduction of cognitive load in comprehending essential information. Nonessential geometric details between subway or railroad stations are not required for performing the task. In summary, the Tube map has very low accuracy and high uncertainty in terms of positions of train stations, but it is a very suitable map format for conveying the topological relationships between stations and is actually more convenient to use than a planimetrically correct map with accurate positions. Another type of map that does not preserve geometry is the cartogram. Cartograms distort location and resize geographic features, using the length of lines or the area of polygons to emphasize some geographic variable, such as population or travel time. Area cartograms, also known as value-by-area maps, have proven an effective way to visualize spatial distributions by substituting another quantitative attribute for land area (Tobler, 2004). For example, cartograms were created to represent China’s population and wealth distribution (Li and Clarke, 2012). They were also used to visualize different aspects of social life in Great Britain (Dorling, 1995) and are included in newspaper stories and technical reports
Fig. 9 Official tube map: London underground. Source: http://content.tfl.gov.uk/standard-tube-map.pdf
due to their simple interpretation. A second type of cartogram is the distance cartogram, in which map distance corresponds not to distance on the Earth’s surface but to another variable, such as travel time or travel cost. Distance is distorted so as to be proportional to the magnitude of the variable of interest. In both types of cartogram, location, length, size, and shape are distorted according to the value of an investigated variable, while topology is maintained, manifested as retained adjacency between polygons and connectivity between nodes. In geographic visualization, absolute geometric accuracy is often not the ultimate goal, although high accuracy may be required for identifying a geographic feature on the Earth by its shape, or for measuring distance and area. In many other tasks, precise location is not required. In both schematic maps and cartograms, we deliberately distort geometry to achieve some other purpose, including removal of unnecessary detail to provide a focused presentation. This is particularly applicable to the world of crowd-sourced geographic data. Large amounts of valuable information are embedded in blogs, websites, photo tags, and tweets, often with no accurate locational information. Instead of rejecting these sources outright due to their inherent uncertainty, we may choose to accept uncertainty as an intrinsic characteristic of platial data and make good use of these data sources. A place may be represented simply as a point whose absolute location is not critical. Instead, it is the relative locations of places and the connectivity and relations between them that matter.
1.22.8
Future Directions
Uncertainty, as an inevitable component of any spatial database, has a profound impact on analyses and decisions relying on these data. GIScientists have devoted much effort to quantifying, assessing, and visualizing data uncertainty by developing mathematical formulae and simulation methods to characterize positional and attribute uncertainty in both vector and raster data models. Even though uncertainty in spatial data has been extensively studied, ranging from positional accuracy to attribute accuracy, from analytical approaches to Monte Carlo simulation approaches, from uncertainty of spatial objects to uncertainty propagation in spatial operations and computational models, there is still much work to be done in terms of modeling and conveying uncertainty. Current models mainly deal with positional accuracy, whereas attribute uncertainty has received little attention in the literature. This might be attributed to the difficulty of modeling categorical data, which constitute a major part of attribute data. However, attribute data are an integral component of spatial data; the lack of uncertainty modeling for attribute data will eventually impact
uncertainty assessment as a whole. Therefore, more research needs to be done on models for attribute uncertainty, as well as models for integrating positional uncertainty and attribute uncertainty. Uncertainty propagation in spatial operations such as buffer and overlay analysis has largely been investigated for raster data rather than vector data, as these operations are more straightforward in raster data. In spite of this, many studies use vector data to perform spatial analysis. In some cases, such as network analysis, vector data are the best model. Without appropriate models for uncertainty assessment, the results cannot be fully trusted. Thus, it is important to develop more models for uncertainty propagation in spatial operations using vector data. The need to integrate uncertainty models for vector and raster data is a natural result of the integration of vector and raster data models. For example, in a zonal statistics analysis that calculates the average temperature for each county in a state, the county data are vector data and the temperature data are raster data, so the analysis combines both models. Monte Carlo simulation might be a solution for assessing uncertainty propagation during the zonal statistics analysis, but it is also important to develop an integrated analytical method in order to understand the underlying relationships. Fuzzy sets and ontologies have been used to address semantic uncertainty, while cartographic techniques have been adopted to visualize uncertainty for better communication with a wider audience. The growing volume of crowd-sourced spatial data has added several new dimensions to data uncertainty research. New methods must be developed to evaluate positional and attribute uncertainty for datasets created without quality control procedures.
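The zonal statistics example can be made concrete with a small Monte Carlo sketch. It is illustrative only and rests on strong assumptions that are not taken from the text: the county boundaries are treated as exact and already rasterized into a zone grid, only the raster values carry error, and that error is independent, zero-mean Gaussian noise with a known standard deviation `sigma` (no spatial autocorrelation). All names and values are hypothetical.

```python
import random
import statistics


def zonal_mean(raster, zones, zone_id):
    """Mean of the raster cells that fall inside the given zone."""
    values = [
        raster[r][c]
        for r in range(len(raster))
        for c in range(len(raster[0]))
        if zones[r][c] == zone_id
    ]
    return sum(values) / len(values)


def monte_carlo_zonal_mean(raster, zones, zone_id, sigma, runs, seed=0):
    """Propagate per-cell Gaussian error to the zonal mean by simulation.

    Returns the mean and standard deviation of the simulated zonal means.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(runs):
        perturbed = [[v + rng.gauss(0.0, sigma) for v in row] for row in raster]
        results.append(zonal_mean(perturbed, zones, zone_id))
    return statistics.mean(results), statistics.stdev(results)


# Hypothetical temperature raster (degrees) and rasterized county zones
temperature = [
    [20.0, 21.0, 25.0, 26.0],
    [20.5, 21.5, 25.5, 26.5],
]
counties = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]

mean_1, sd_1 = monte_carlo_zonal_mean(temperature, counties, zone_id=1,
                                      sigma=1.0, runs=2000)
print(f"county 1: mean {mean_1:.2f}, standard deviation {sd_1:.2f}")
```

Under the independence assumption, the analytical result is available as a check: the standard deviation of a mean of n cells is sigma divided by the square root of n, here 1.0 / 2 = 0.5 for the four cells of county 1. Once cell errors are correlated or zone boundaries are themselves uncertain, this closed form no longer applies, which is where simulation or an integrated analytical model becomes necessary.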
Given that a large portion of crowd-sourced geographic data is centered on places without accurate positional information, uncertainty of platial data is an inadequately studied area. Regarding semantic uncertainty, the existing literature focuses on fuzzy-set approaches to measure uncertainty and on ontologies to deal with the formal description of uncertainty in the computing environment. Issues of semantic uncertainty are present in a wide spectrum of applications, including urban, environmental, technical, and social ones. Some recent studies utilize advanced methodologies from information technology and computer science, such as web-based systems and Digital Earth. With further development of technologies such as virtual environments, the study of semantic uncertainty might contribute to wider areas of knowledge by addressing uncertainty problems in data from both real space and virtual space. Current studies of uncertainty visualization utilize elements of cartography such as hue, saturation, transparency, shape, and orientation to represent uncertainty of spatial concepts and data. Some recent studies utilize advanced methodologies such as web-based 3D visualization and augmented reality. The study of uncertainty visualization could benefit from future development of IT by expanding the types of uncertainty visualization available. As an alternative to authoritative spatial data, crowd-sourced geographic information is promising in providing free and timely data for various applications such as basic spatial infrastructure data, routing and navigation data, places of interest, and data for tourism and emergency response. However, uncertainty associated with this new data source needs to be investigated and understood more thoroughly. On the one hand, new methods need to be developed to evaluate data quality as demonstrated in this article.
Four approaches have been proposed or applied, namely the crowd-sourcing approach, the social approach, the geographic approach, and the comparison approach. These methods have not been completely implemented as an automatic process in software packages. On the other hand, we may not need to be obsessed with highly accurate spatial data in every aspect. New methods and techniques should be developed to take advantage of large volumes of uncertain data that are centered on places, represented by placenames and relationships between places. Many approaches to represent spatial data uncertainty have been proposed and summarized in this article. However, currently many of these approaches are not accessible to everyday geospatial practitioners. For general users, it is impossible to apply those models in real-world applications. This may be due to a number of factors.
• The scientific community has not come to a consensus regarding approaches to be taken.
• Software vendors have not integrated approaches to quantify and visualize uncertainty.
• Academic communities are not consistently training students in methods to address, quantify, and visualize uncertainty, thus there is not a demand for software vendors to provide this service.
• The community of practitioners may still be reticent to accept the fuzziness inherent in all spatial data and perhaps view it as a threat to the perceived validity of geospatial results.
The time has come for uncertainty to be integrated into common geospatial practice. To do so requires an understanding of the complexities of uncertainty and a consensus within the scholarly and professional communities regarding how it should be approached.
References
Ahlqvist, O., 2004. A parameterized representation of uncertain conceptual spaces. Transactions in GIS 8 (4), 493–514. Ahlqvist, O., 2005a. Using uncertain conceptual spaces to translate between land cover categories. International Journal of Geographical Information Science 19 (7), 831–857. Ahlqvist, O., 2005b. Transformation of geographic information using crisp, fuzzy and rough semantics. In: Fisher, P., Unwin, D. (Eds.), Re-presenting GIS. John Wiley & Sons, London, p. 99. Ahlqvist, O., 2008. Extending post-classification change detection using semantic similarity metrics to overcome class heterogeneity: A study of 1992 and 2001 US National Land Cover Database changes. Remote Sensing of Environment 112 (3), 1226–1241. Ahlqvist, O., Ban, H., 2007. Categorical measurement semantics: A new second space for geography. Geography Compass 1 (3), 536–555. Ahlqvist, O., Gahegan, M., 2005. Probing the relationship between classification error and class similarity. Photogrammetric Engineering & Remote Sensing 71 (12), 1365–1373.
Ahlqvist, O., Keukelaar, J., Oukbir, K., 2003. Rough and fuzzy geographical data integration. International Journal of Geographical Information Science 17 (3), 223–234. Ahlqvist, O., Bibby, P., Duckham, M., Fisher, P., Harvey, F., Schuurman, N., 2005. Not just objects: Reconstructing objects. In: Fisher, P., Unwin, D. (Eds.), Re-presenting GIS. John Wiley & Sons, London, pp. 17–25. Alai, J. (1993). Spatial uncertainty in a GIS. Master of Science, The University of Calgary, Calgary. Alesheikh, A. A. (1998). Modeling and managing uncertainty in object-based geospatial information systems. Doctor of Philosophy, The University of Calgary, Alberta. Alesheikh, A. A. and Li, R. (1996). Rigorous uncertainty models of line and polygon objects in GIS. Paper presented at the GIS/LIS International Conference ’96, Denver. Alesheikh, A.A., Blais, J.A.R., Chapman, M.A., Karimi, H., 1999. Rigorous geospatial data uncertainty models for GISs. In: Jaton, Annick, Lowell, Kim (Eds.), Spatial accuracy assessment: Land information uncertainty in natural resources. Ann Arbor Press, Chelsea, pp. 195–202. Anderson, T.V., Mattson, C.A., Larson, B.J., Fullwood, D.T., 2012. Efficient propagation of error through system models for functions common in engineering. Journal of Mechanical Design 134 (1), 1–6. ASPRS, 1989. ASPRS accuracy standards for large scale maps. Photogrammetric Engineering and Remote Sensing 56, 1068–1070. Ban, H., Ahlqvist, O., 2009. Representing and negotiating uncertain geospatial concepts–Where are the exurban areas? Computers, Environment and Urban Systems 33 (4), 233–246. Bastin, L., Fisher, P.F., Wood, J., 2002. Visualizing uncertainty in multi-spectral remotely sensed imagery. Computers & Geosciences 28 (3), 337–350. Bastin, L., Cornford, D., Jones, R., Heuvelink, G.B., Pebesma, E., Stasch, C., Nativi, S., Mazzetti, P., Williams, M., 2013. Managing uncertainty in integrated environmental modelling: The UncertWeb framework. 
Environmental Modelling & Software 39, 116–134. Bennett, B., 2001. What is a forest? On the vagueness of certain geographic concepts. Topoi 20 (2), 189–201. Berube, A., Singer, A., Wilson, J.H., Frey, W.H., 2006. Finding exurbia: America’s fast-growing communities at the metropolitan fringe. In: The Brookings institution: Living cities census series, pp. 1–48. October. Blakemore, M., 1984. Generalization and error in spatial databases. Cartographica 21 (2), 131–139. Blöschl, G. (1996). Scale and scaling in hydrology. Technical University, Institut für Hydraulik, Gewässerkunde, und Wasserwirtschaft. Bone, C., Dragicevic, S., Roberts, A., 2005. Integrating high resolution remote sensing, GIS and fuzzy set theory for identifying susceptibility areas of forest insect infestations. International Journal of Remote Sensing 26 (21), 4809–4828. Bonin, O. (2000). New advances in error simulation in vector geographical databases. Paper presented at the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, University of Amsterdam, The Netherlands. Bonin, O., 2002. Large deviation theorems for weighted sums applied to a geographical problem. Journal of Applied Probability 39 (2), 251–260. Bonin, O. (2006). Sensitivity analysis and uncertainty analysis for vector geographical applications. Paper presented at the 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Lisbon. Bordogna, G., Ghisalberti, G., Psaila, G., 2012. Geographic information retrieval: Modeling uncertainty of user’s context. Fuzzy Sets and Systems 196, 105–124. Bostrom, A., Anselin, L., Farris, J., 2008. Visualizing seismic risk and uncertainty. Annals of the New York Academy of Sciences 1128 (1), 29–40. Brodlie, K., Osorio, R.A., Lopes, A., 2012. A review of uncertainty in data visualization. In: Expanding the frontiers of visual analytics and visualization. Springer, London, pp. 81–109. 
Brown, J.D., 2004. Knowledge, uncertainty and physical geography: Towards the development of methodologies for questioning belief. Transactions of the Institute of British Geographers 29 (3), 367–381. Brus, J. (2013). Uncertainty vs. spatial data quality visualisations: A case study on ecotones. 13th SGEM GeoConference on Informatics, Geoinformatics And Remote Sensing, (International Multidisciplinary Scientific GeoConference SGEM2013), vol. 1, pp. 1017–1024. Albena, Bulgaria. Buccella, A., Cechich, A., Gendarmi, D., Lanubile, F., Semeraro, G., Colagrossi, A., 2011. Building a global normalized ontology for integrating geographic data sources. Computers & Geosciences 37 (7), 893–916. Burrough, P.A., Frank, A., 1996. Geographic objects with indeterminate boundaries, 2. CRC Press, London. Burrough, P.A., MacMillan, R.A., Deursen, W.V., 1992. Fuzzy classification methods for determining land suitability from soil profile observations and topography. Journal of Soil Science 43 (2), 193–210. Buttenfield, B.P., 1993. Representing data quality. Cartographica: The International Journal for Geographic Information and Geovisualization 30 (2–3), 1–7. Buttenfield BP (2000) Chapter 6: Mapping Ecological Uncertainty. In: Hunsaker CT, Goodchild MF, Friedl MA, and Case TJ (eds.) 2001 Spatial Uncertainty in Ecology, pp. 116–132. New York: Springer-Verlag. Carter, J., 1992. The effect of data precision on the calculation of slope and aspect using gridded DEMs. Cartographica: The International Journal for Geographic Information and Geovisualization 29 (1), 22–34. Caspary, W. and Scheuring, R. (1992). Error-bands as measures of geometrical accuracy. Paper presented at the Third European Conference on GIS (EGIS’92), Munich. Chen, X. (1996). Spatial relations between uncertain sets. Paper presented at the International Archives of Photogrammetry and Remote Sensing, Vienna. Chen, J., & Shaw, S. L. (2016). 
Representing the spatial extent of places based on Flickr photos with a representativeness-weighted Kernel density estimation. In: International Conference on Geographic Information Science, pp. 30–144. Chen, Y., Yu, J., Shahbaz, K. and Xevi, E. (2009). A GIS-based sensitivity analysis of multi-criteria weights. Paper presented at the 18th World IMAC Congress and MODSIM09 International Congress on Modelling and Simulation, Cairns. Chen, Y., Yu, J., Khan, S., 2010. Spatial sensitivity analysis of multi-criteria weights in GIS-based land suitability evaluation. Environmental Modelling & Software 25 (12), 1582–1591. Cheng, T., Adepeju, M., 2014. Modifiable temporal unit problem (MTUP) and its effect on space-time cluster detection. PLoS One 9 (6), e100465. Cheung, C. K. and Shi, W. (2000). A simulation approach to analyze error in buffer spatial analysis. Paper presented at the International archives of Photogrammetry and Remote Sensing, Amsterdam. Chrisman, N. R. (1982). A theory of cartographic error and its measurement in digital bases. Paper presented at the Fifth International Symposium on Computer-Assisted Cartography and International Society for Photogrammetry and Remote Sensing (Auto-Carto 5): Environmental Assessment and Resource Management, Crystal City. Chrisman, N. R. (1989). Error in categorical maps: Testing versus simulation. Paper presented at the Ninth International Symposium on Computer-Assisted Cartography (Auto-Carto 9), Baltimore. Chrisman, N.R., 1991. The error component in spatial data. Geographical Information Systems 1, 165–174. Chrisman, N.R., Yandell, B.S., 1988. Effects of point error on area calculations: A statistical model. Surveying and Mapping 48, 241–246. Cipeluch, B., Jacob, R., Winstanley, A. and Mooney, P. (2010). Comparison of the accuracy of OpenStreetMap for Ireland with Google Maps and Bing Maps. 
In: Proceedings of the Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences 20–23rd July 2010. University of Leicester, pp. 37–40. Claramunt, C., Theriault, M., 1996. Toward semantics for modelling spatio-temporal processes within GIS. Advances in GIs Research I, 27–43. Clementini, E., Felice, D.P., 1996. An algebraic model for spatial objects with indeterminate boundaries. In: Burrough, P.A., Frank, A.U. (Eds.), Geographic objects with indeterminate boundaries. Taylor & Francis, London, pp. 155–169. Cohen, J., 1960. A coefficient of agreement of nominal scales. Educational and Psychological Measurement 20 (1), 37–46. Cohn, A.G., Gotts, N.M., 1996. The “Egg-Yolk” representation of regions with indeterminate boundaries. In: Burrough, P.A., Frank, A.U. (Eds.), Geographic objects with indeterminate boundaries. Taylor & Francis, London, pp. 171–187. Çöltekin, A., De Sabbata, S., Willi, C., Vontobel, I., Pfister, S., Kuhn, M., & Lacayo, M. (2011). Modifiable temporal unit problem. Paper presented at the ISPRS/ICA workshop “Persistent Problems in Geographic Visualization” (ICC2011), Paris.
Comber, A.J., Fisher, P.F., Wadsworth, R., 2004. Integrating land-cover data with different ontologies: Identifying change from inconsistency. International Journal of Geographical Information Science 18 (7), 691–708. Comber, A.J., Fisher, P.F., Harvey, F., Gahegan, M., Wadsworth, R., 2006. Using metadata to link uncertainty and data quality assessments. In: Progress in spatial data handling. Springer, Berlin and Heidelberg, pp. 279–292. Congalton, R.G., 1988. A comparison of sampling schemes used in generating error matrices for assessing the accuracy of maps generated from remotely sensed data. Photogrammetric Engineering & Remote Sensing 54 (5), 593–600. Congalton, R.G., 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment 37 (1), 35–46. Couclelis, H., 1999. Space, time, geography. Geographical Information Systems 1, 29–38. Couclelis, H., 2003. The certainty of uncertainty: GIS and the limits of geographic knowledge. Transactions in GIS 7 (2), 165–175. Couclelis, H., 2010. Ontologies of geographic information. International Journal of Geographical Information Science 24 (12), 1785–1809. Crosetto, M., Tarantola, S., 2001. Uncertainty and sensitivity analysis: Tools for GIS-based model implementation. International Journal of Geographical Information Science 15 (5), 415–437. Cross, V., Firat, A., 2000. Fuzzy objects for geographical information systems. Fuzzy Sets and Systems 113 (1), 19–36. Cruzan, M.B., Weinstein, B.G., Grasty, M.R., Kohrn, B.F., Hendrickson, E.C., Arredondo, T.M., Thompson, P.G., 2016. Small unmanned aerial vehicles (micro-UAVs, drones) in plant ecology. Applications in Plant Sciences 4 (9). http://dx.doi.org/10.3732/apps.160004. Daniel, C., Tennant, K., 2001. DEM quality assessment. In: Digital elevation model technologies and applications: The DEM users manual. ASPRS, Bethesda, pp. 395–440. Daniels, T., 1999. When city and country collide. Island Press, Washington, DC. 
Dark, S.J., Bram, D., 2007. The modifiable areal unit problem (MAUP) in physical geography. Progress in Physical Geography 31 (5), 471–479. http://dx.doi.org/10.1177/0309133307083294. Davidson, D.A., Theocharopoulos, S.P., Bloksma, R.J., 1994. A land evaluation project in Greece using GIS and based on Boolean and fuzzy set methodologies. International Journal of Geographical Information Systems 8 (4), 369–384. Davis, T.J., Keller, C.P., 1997a. Modelling and visualizing multiple spatial uncertainties. Computers & Geosciences 23 (4), 397–408. Davis, T.J., Keller, C.P., 1997b. Modelling uncertainty in natural resource analysis using fuzzy sets and Monte Carlo simulation: Slope stability prediction. International Journal of Geographical Information Science 11 (5), 409–434. De Gruijter, J.J., Walvoort, D.J.J., Van Gaans, P.F.M., 1997. Continuous soil maps: A fuzzy set approach to bridge the gap between aggregation levels of process and distribution models. Geoderma 77 (2), 169–195. Deitrick, S., Edsall, R., 2006. The influence of uncertainty visualization on decision making: An empirical evaluation. In: Progress in spatial data handling. Springer, Heidelberg, pp. 719–738. Deitrick, S., Edsall, R., 2008. Making uncertainty usable: Approaches for visualizing uncertainty information. In: Geographic visualization: Concepts, tools and applications. John Wiley & Sons, Hoboken, pp. 277–291. Delmelle, E., Dony, C., Casas, I., Jia, M., Tang, W., 2014. Visualizing the impact of space-time uncertainties on dengue fever patterns. International Journal of Geographical Information Science 28 (5), 1107–1127. Deutsch, C.V., Journel, A.G., 1992. GSLIB: Geostatistical software library and user’s guide. Oxford University Press, Oxford. Devillers, R., Stein, A., Bédard, Y., Chrisman, N., Fisher, P., Shi, W., 2010. Thirty years of research on spatial data quality: Achievements, failures, and opportunities. Transactions in GIS 14 (4), 387–400. http://dx.doi.org/10.1111/j.1467-9671.2010.01212.x. 
Dixon, B., 2005. Groundwater vulnerability mapping: A GIS and fuzzy rule based integrated tool. Applied Geography 25 (4), 327–347. Dorling, D., 1995. A new social atlas of Britain. John Wiley and Sons, London. Dragicevic, S., Marceau, D.J., 2000. A fuzzy set approach for modelling time in GIS. International Journal of Geographical Information Science 14 (3), 225–245. Drecki, I., 2002. Visualisation of uncertainty in geographical data. In: Spatial data quality. Taylor & Francis, London, pp. 140–159. Drucker, J., 2011. Humanities approaches to graphical display. Digital Humanities Quarterly 5 (1), 1–21. Drummond, J., 1995. Positional accuracy. In: Guptill, S.C., Morrison, J.L. (Eds.), Elements of spatial data quality. Elsevier Science Ltd, Oxford, pp. 31–38. Duckham, M., Sharp, J., 2005. Uncertainty and geographic information: Computational and critical convergence. In: Fisher, P., Unwin, D. (Eds.), Re-presenting GIS. John Wiley & Sons, London, pp. 113–124. Duckham, M., Mason, K., Stell, J., Worboys, M., 2001. A formal approach to imperfection in geographic information. Computers, Environment and Urban Systems 25 (1), 89–103. Duckham, M., Lingham, J., Mason, K., Worboys, M., 2006. Qualitative reasoning about consistency in geographic information. Information Sciences 176 (6), 601–627. Dungan, J.L., 2002. Toward a comprehensive view of uncertainty in remote sensing analysis. In: Uncertainty in remote sensing and GIS, 3. Wiley, Hoboken, pp. 25–35. Dunn, R., Harrison, A.R., White, J.C., 1990. Positional accuracy and measurement error in digital databases of land use: An empirical study. International Journal of Geographical Information Systems 4 (4), 385–398. Dutton, G. (1992). Handling positional uncertainty in spatial databases. Paper presented at the 5th International Symposium on Spatial Data Handling, Charleston. Edwards, G., Lowell, K.E., 1996. Modeling uncertainty in photointerpreted boundaries. Photogrammetric Engineering and Remote Sensing 62 (4), 377–391. 
Egenhofer, M.J., Herring, J.R., 1991. Categorizing binary topological relationships between regions, lines, and points in geographic databases. Department of Surveying Engineering, University of Maine, Orono. Ehlschlaeger CR (1998) The stochastic simulation approach: Tools for representing spatial application uncertainty. Unpublished doctoral thesis, University of California, Santa Barbara. Ehlschlaeger CR and Shortridge A (1996) Modeling elevation uncertainty in geographical analyses. In: Proceedings of the International Symposium on Spatial Data Handling, p. 9B. Esri (2017) An overview of the space time pattern mining Toolbox. http://desktop.arcgis.com/en/arcmap/10.3/tools/space-time-pattern-mining-toolbox/an-overview-of-the-spacetime-pattern-mining-toolbox.htm (Accessed 4 November 2017). Evans, A.J., Waters, T., 2007. Mapping vernacular geography: Web-based GIS tools for capturing fuzzy or vague entities. International Journal of Technology, Policy and Management 7 (2), 134–150. Fan, A., Guo, D., 2001. The uncertainty band model of error entropy. Acta Geodaetica et Cartographica Sinica 30, 48–53. Fan, H., Zipf, A., Fu, Q., Neis, P., 2014. Quality assessment for building footprints data on OpenStreetMap. International Journal of Geographical Information Science 28 (4), 700–719. Feizizadeh, B., Blaschke, T., 2014. An uncertainty and sensitivity analysis approach for GIS-based multicriteria landslide susceptibility mapping. International Journal of Geographical Information Science 28 (3), 610–638. FGDC (1998). Federal Geographic Data Committee. Geospatial Positioning Accuracy Standards Part 3: National Standard for Spatial Data Accuracy. Subcommittee for Base Cartographic Data, Federal Geographic Data Committee, FGDC-STD-007.3-1998. FGDC.gov. Fisher, P.F., 1991a. First experiments in viewshed uncertainty: The accuracy of the viewshed area. Photogrammetric Engineering and Remote Sensing 57 (10), 1321–1327. Fisher, P.F., 1991b. 
Modelling soil map-unit inclusions by Monte Carlo simulation. International Journal of Geographical Information Systems 5 (2), 193–208. Fisher, P.F., 1992. First experiments in viewshed uncertainty: Simulating fuzzy viewsheds. Photogrammetric Engineering and Remote Sensing 58 (3), 345–352. Fisher, P., 1996. Boolean and fuzzy regions. In: Masser I and Salge F (eds.) Geographic objects with indeterminate boundaries, GISDATA2. Taylor and Francis. ISBN-10: 0748403876. Fisher, P.F., 1999. Models of uncertainty in spatial data. Geographical Information Systems 1, 191–205.
Fisher, P.F., 2000. Sorites paradox and vague geographies. Fuzzy Sets and Systems 113 (1), 7–18. Fisher, P.F., Pathirana, S., 1990. The evaluation of fuzzy membership of land cover classes in the suburban zone. Remote Sensing of Environment 34 (2), 121–132. Fisher, P.F., Tate, N.J., 2006. Causes and consequences of error in digital elevation models. Progress in Physical Geography 30 (4), 467–489. Fisher, P., Wood, J., 1998. What is a mountain? Or the Englishman who went up a Boolean geographical concept but realised it was fuzzy. Geography 83 (3), 247–256. Fonseca, F.T., Egenhofer, M.J., Agouris, P., Câmara, G., 2002. Using ontologies for integrated geographic information systems. Transactions in GIS 6 (3), 231–257. Fonstad, M.A., Dietrich, J.T., Courville, B.C., Jensen, J.L., Carbonneau, P.E., 2013. Topographic structure from motion: A new development in photogrammetric measurement. Earth Surface Processes and Landforms 38 (4), 421–430. http://dx.doi.org/10.1002/esp.3366. Foody, G.M., 2003. Uncertainty, knowledge discovery and data mining in GIS. Progress in Physical Geography 27 (1), 113–121. Fotheringham, A.S., Wong, D.W., 1991. The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A 23 (7), 1025–1044. Frank, A.U., 1997. Spatial ontology: A geographical information point of view. In: Spatial and temporal reasoning. Springer, Dordrecht, pp. 135–153. Frank, A.U., 2003. Ontology for spatio-temporal databases. In: Spatio-temporal databases. Springer, Berlin and Heidelberg, pp. 9–77. Fukunaga, K., Hayes, R.R., 1989. Effects of sample size in classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (8), 873–885. Gahegan MN (1999) Characterizing the semantic content of geographic data, models, and systems. Interoperating Geographic Information Systems, pp. 71–83. US: Springer. Gallik, J., Bolesova, L., 2016. sUAS and their application in observing geomorphological processes. 
Solid Earth 7 (4), 1033–1042. http://dx.doi.org/10.5194/se-7-1033-2016. Ge, Y., Li, S., Lakhan, V.C., Lucieer, A., 2009. Exploring uncertainty in remotely sensed data with parallel coordinate plots. International Journal of Applied Earth Observation and Geoinformation 11 (6), 413–422. Gehlke, C.E., Biehl, K., 1934. Certain effects of grouping upon the size of the correlation coefficient in census tract material. Journal of the American Statistical Association 29 (185A), 169–170. Ghilani, C.D., 2000. Demystifying area uncertainty: More or less. Surveying and Land Information Systems 60 (3), 177–182. Girres, J.F., Touya, G., 2010. Quality assessment of the French OpenStreetMap dataset. Transactions in GIS 14 (4), 435–459. Goncalves, J.A., Henriques, R., 2015. UAV photogrammetry for topographic monitoring of coastal areas. ISPRS Journal of Photogrammetry and Remote Sensing 104, 101–111. http://dx.doi.org/10.1016/j.isprsjprs.2015.02.009. Goodchild, M. F. (1991). Symposium on spatial database accuracy. Paper presented at the Symposium on Spatial Database Accuracy, Melbourne. Goodchild, M.F., 1995. Attribute accuracy. In: Guptill, S.C., Morrison, J.L. (Eds.), Elements of spatial data quality. Elsevier Science Ltd, Oxford. Goodchild, M.F., 2000. Introduction: special issue on ‘uncertainty in geographic information systems’. Fuzzy sets and systems 113 (1), 3–5. Goodchild, M.F., 2001. Metrics of scale in remote sensing and GIS. International Journal of Applied Earth Observation and Geoinformation 3 (2), 114–120. Goodchild, M.F., 2004. GIScience, geography, form, and process. Annals of the Association of American Geographers 94 (4), 709–714. Goodchild, M.F., 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69 (4), 211–221. Goodchild, M.F., 2011. Scale in GIS: An overview. Geomorphology 130 (1), 5–9. Goodchild, M.F., Gopal, S. (Eds.), 1989. The accuracy of spatial databases. CRC Press, Boca Raton. Goodchild, M.F., Hunter, G.J., 1997. 
A simple positional accuracy measure for linear features. International Journal of Geographical Information Science 11 (3), 299–306. http://dx.doi.org/10.1080/136588197242419. Goodchild, M.F., Li, L., 2012. Assuring the quality of volunteered geographic information. Spatial Statistics 1, 110–120. Goodchild, M.F., Proctor, J., 1997. Scale in a digital geographic world. Geographical and Environmental Modelling 1, 5–24. Goodchild, M.F., Quattrochi, D.A., 1997. Scale, multiscaling, remote sensing, and GIS. CRC Press, ISBN 9781566701044. Goodchild, M.F., Yuan, M., Cova, T.J., 2007. Towards a general theory of geographic representation in GIS. International Journal of Geographical Information Science 21 (3), 239–260. Goodchild, M.F., Guo, H., Annoni, A., Bian, L., de Bie, K., Campbell, F., Craglia, M., Ehlers, M., van Genderen, J., Jackson, D., Lewis, A.J., Pesaresi, M., Remetey-Fülöpp, G., Simpson, R., Skidmore, A., Wang, C., Woodgate, P., 2012. Next-generation digital earth. Proceedings of the National Academy of Sciences 109 (28), 11088–11094. Griffith, D.A., 1989. Distance calculations and errors in geographic databases. In: Goodchild, M.F., Gopal, S. (Eds.), Accuracy of spatial databases. Taylor & Francis, London, pp. 81–90. Griffith, D., Chun, Y., 2016. Spatial autocorrelation and uncertainty associated with remotely-sensed data. Remote Sensing 8 (7), 535. Grira, J., Bédard, Y., Roche, S., 2010. Spatial data uncertainty in the VGI world: Going from consumer to producer. Geomatica 64 (1), 61–72. Grothe, C., Schaab, J., 2009. Automated footprint generation from geotags with kernel density estimation and support vector machines. Spatial Cognition and Computation 9 (3), 195–211. Guesgen, H.W., Albrecht, J., 2000. Imprecise reasoning in geographic information systems. Fuzzy Sets and Systems 113 (1), 121–131. Guptill, S.C., Morrison, J.L., 1995. Elements of spatial data quality. Elsevier Science, Oxford. Hagen-Zanker, A., Straatman, B., Uljee, I., 2005. 
Further developments of a fuzzy set map comparison approach. International Journal of Geographical Information Science 19 (7), 769–785. Haklay, M., 2010. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design 37 (4), 682–703. Harvey, D., 1993. From space to place and back again: Reflections on the condition of postmodernity. In: Bird, J., Curtis, B., Putnam, T., Tickner, L. (Eds.), Mapping the futures. Routledge, London, pp. 3–29. Harwin, S., Lucieer, A., 2012. Assessing the accuracy of georeferenced point clouds produced via multi-view stereopsis from unmanned aerial vehicle (UAV) imagery. Remote Sensing 4 (12), 1573–1599. http://dx.doi.org/10.3390/rs4061573. Hays, T.E., 1993. “The new guinea highlands”: Region, culture area, or fuzzy set? [and comments and reply]. Current Anthropology 34 (2), 141–164. Helton, J.C., Davis, F.J., 2003. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering & System Safety 81 (1), 23–69. Hengl, T., Heuvelink, G., Loon, E., 2010. On the uncertainty of stream networks derived from elevation data: The error propagation approach. Hydrology and Earth System Sciences 14 (7), 1153–1165. Heuvelink, G.B., 1998. Error propagation in Environmental modelling with GIS. CRC Press, Boca Raton. Heuvelink, G.B., 1999. Propagation of error in spatial modelling with GIS. Geographical Information Systems 1, 207–217. Heuvelink, G.B., Brown, J.D., van Loon, E.E., 2007. A probabilistic framework for representing and simulating uncertain environmental variables. International Journal of Geographical Information Science 21 (5), 497–513. Hong, S., Vonderohe, A.P., 2014. Uncertainty and sensitivity assessments of GPS and GIS integrated applications for transportation. Sensors 14 (2), 2683–2702. Hoover, W. E. (1984). Algorithms for confidence circles and ellipses (NOS 107 C&GS 3). 
https://www.ngs.noaa.gov/PUBS_LIB/AlgorithmsForConfidenceCirclesAndEllipses_TR_NOS107_CGS3.pdf (Accessed 10 April 2017). Hope, S., Hunter, G.J., 2007. Testing the effects of positional uncertainty on spatial decision-making. International Journal of Geographical Information Science 21 (6), 645–665. Horn, B.K., 1981. Hill shading and the reflectance map. Proceedings of the IEEE 69 (1), 14–47. Hubbard, P., Kitchin, R. (Eds.), 2010. Key thinkers on space and place. Sage, Thousand Oaks, CA. Hudelot, C., Atif, J., Bloch, I., 2008. Fuzzy spatial relation ontology for image interpretation. Fuzzy Sets and Systems 159 (15), 1929–1951. Hugenholtz, C.H., Whitehead, K., Brown, O.W., Barchyn, T.E., Moorman, B.J., LeClair, A., Hamilton, T., 2013. Geomorphological mapping with a small unmanned aircraft system (sUAS): Feature detection and accuracy assessment of a photogrammetrically-derived digital terrain model. Geomorphology 194, 16–24. http://dx.doi.org/10.1016/j.geomorph.2013.03.023.
Hunsaker CT, Goodchild MF, Friedl MA and Case TJ (eds.) (2013) Spatial uncertainty in ecology: implications for remote sensing and GIS applications. Springer Science & Business Media. Hunter, G.J., Goodchild, M.F., 1996. A new model for handling vector data uncertainty in GIS. Journal of Urban and Regional Systems Association 8 (1), 51–57. Hunter, G.J., Goodchild, M.F., 1997. Modeling the uncertainty of slope and aspect estimates derived from spatial databases. Geographical Analysis 29 (1), 35–49. Hunter, G.J., Qiu, J., Goodchild, M.F., 2000. Application of a new model of vector data uncertainty. In: Jaton, A., Lowell, K. (Eds.), Spatial accuracy assessment: Land information uncertainty in natural resources. Ann Arbor Press, Michigan, pp. 203–208. Janowicz, K., Raubal, M., Kuhn, W., 2011. The semantics of similarity in geographic information retrieval. Journal of Spatial Information Science 2011 (2), 29–57. Jelinski, D.E., Wu, J., 1996. The modifiable areal unit problem and implications for landscape ecology. Landscape Ecology 11 (3), 129–140. Jenson, S.K., Domingue, J.O., 1988. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Engineering and Remote Sensing 54 (11), 1593–1600. Jiang, H., Eastman, J.R., 2000. Application of fuzzy measures in multi-criteria evaluation in GIS. International Journal of Geographical Information Science 14 (2), 173–184. Jones, C.B., Purves, R.S., Clough, P.D., Joho, H., 2008. Modelling vague places with knowledge from the web. International Journal of Geographical Information Science 22 (10), 1045–1065. Kardos, J., Benwell, G., Moore, A., 2005. The visualisation of uncertainty for spatially referenced census data using hierarchical tessellations. Transactions in GIS 9 (1), 19–34. Kavouras, M., Kokla, M., 2007. Theories of geographic concepts: Ontological approaches to semantic integration. CRC Press, Boca Raton. Kaye, N.R., Hartley, A., Hemming, D., 2012. 
Mapping the climate: Guidance on appropriate techniques to map climate variables and their uncertainty. Geoscientific Model Development 5 (1), 245–256. Keßler, C., Mau, P., Heuer, J.T., Bartoschek, T., 2009. Bottom-up gazetteers: Learning from the implicit semantics of geotags. In: GeoS ’09: Proceedings of the Third International Conference on GeoSpatial Semantics. Springer, Berlin, pp. 83–102. Kiiveri, H.T., 1997. Assessing, representing and transmitting positional uncertainty in maps. International Journal of Geographical Information Science 11 (1), 33–52. Klir, G., Yuan, B., 1995. Fuzzy sets and fuzzy logic, 4. Prentice Hall, New Jersey. Kraus, K., Kager, H., 1994. Accuracy of derived data in a geographic information system. Computers, Environment and Urban Systems 18 (2), 87–94. Kubícek, P., Sasinka, C., 2011. Thematic uncertainty visualization usability–comparison of basic methods. Annals of GIS 17 (4), 253–263. Kuhn, W., 2009. A functional ontology of observation and measurement. In: International Conference on GeoSpatial Semantics. Springer, Berlin and Heidelberg, pp. 26–43. Kuipers, B., 2000. The spatial semantic hierarchy. Artificial Intelligence 119 (1), 191–233. Kunz, M., Grêt-Regamey, A., Hurni, L., 2011. Visualization of uncertainty in natural hazards assessments using an interactive cartographic information system. Natural Hazards 59 (3), 1735–1751. Kwan, M.-P., 2012a. The uncertain geographic context problem. Annals of the Association of American Geographers 102 (5), 958–968. Kwan, M.-P. (2012b). Uncertain geographic context problem: Implications for environmental health research. Paper presented at the 142nd APHA Annual Meeting and Exposition, 15–19 November 2014. New Orleans: LA. Ladner, R., Petry, F.E., Cobb, M.A., 2003. Fuzzy set approaches to spatial data mining of association rules. Transactions in GIS 7 (1), 123–138. Lam, N.S.N., Quattrochi, D.A., 1992. On the issues of scale, resolution, and fractal analysis in the mapping sciences. 
The Professional Geographer 44 (1), 88–98. Leung, Y., Yan, J., 1998. A locational error model for spatial features. International Journal of Geographical Information Science 12 (6), 607–620. Leung, Y., Ma, J., Goodchild, M.F., 2004a. A general framework for error analysis in measurement-based GIS part 1: The basic measurement-error model and related concepts. Journal of Geographical Systems 6, 381–402. Leung, Y., Ma, J., Goodchild, M.F., 2004b. A general framework for error analysis in measurement-based GIS part 3: Error analysis in intersections and overlays. Journal of Geographical Systems 6, 325–354. Leung, Y., Ma, J., Goodchild, M.F., 2004c. A general framework for error analysis in measurement-based GIS part 4: Error analysis in length and area measurements. Journal of Geographical Systems 6, 403–428. Li, L., Clarke, K.C., 2012. Cartograms showing China’s population and wealth distribution. Journal of Maps 8 (3), 320–323. Li L and Goodchild MF (2012) Constructing places from spatial footprints. In: Proceedings of the 1st ACM SIGSPATIAL international workshop on crowdsourced and volunteered geographic information, 6 November 2012, pp. 15–21, Redondo Beach, ACM. Li L and Valdovinos J (2017) Optimized conflation of authoritative and crowd-sourced geographic data: Creating an integrated bike map. In: Information Fusion and Geographic Information Systems (IF&GIS’2017). Switzerland: Springer International Publishing. Li, D., Zhang, J., Wu, H., 2012. Spatial data quality and beyond. International Journal of Geographical Information Science 26 (12), 2277–2290. Ligmann-Zielinska, A., Jankowski, P., 2014. Spatially-explicit integrated uncertainty and sensitivity analysis of criteria weights in multicriteria land suitability evaluation. Environmental Modelling & Software 57, 235–247. Liu, Y., Phinn, S.R., 2003. Modelling urban development with cellular automata incorporating fuzzy-set approaches. Computers, Environment and Urban Systems 27 (6), 637–658. 
Liu, C., Tong, X., 2005. Relationship of uncertainty between polygon segment and line segment for spatial data in GIS. Geo-spatial Information Science 8 (3), 183–188. Lo, C.P., Yeung, A.K.W., 2002. Concepts and techniques of geographic information systems. Prentice-Hall Inc, Upper Saddle River. Lodwick, W.A., Monson, W., Svoboda, L., 1990. Attribute error and sensitivity analysis of map operations in geographic information systems: Suitability analysis. International Journal of Geographical Information Systems 4 (4), 413–428. Love, A.L., Pang, A., Kao, D.L., 2005. Visualizing spatial multivalue data. IEEE Computer Graphics and Applications 25 (3), 69–79. Lowell, K., Jaton, A., 2000. Spatial accuracy assessment: Land information uncertainty in natural resources. CRC Press, Boca Raton. Lucieer, A., Kraak, M.J., 2004. Interactive and visual fuzzy classification of remotely sensed imagery for exploration of uncertainty. International Journal of Geographical Information Science 18 (5), 491–512. MacEachren, A.M., Robinson, A., Hopper, S., Gardner, S., Murray, R., Gahegan, M., Hetzler, E., 2005. Visualizing geospatial information uncertainty: What we know and what we need to know. Cartography and Geographic Information Science 32 (3), 139–160. http://dx.doi.org/10.1559/1523040054738936. MacEachren, A.M., Roth, R.E., O’Brien, J., Li, B., Swingley, D., Gahegan, M., 2012. Visual semiotics & uncertainty visualization: An empirical study. IEEE Transactions on Visualization and Computer Graphics 18 (12), 2496–2505. Malczewski, J., 2006. Ordered weighted averaging with fuzzy quantifiers: GIS-based multicriteria evaluation for land-use suitability analysis. International Journal of Applied Earth Observation and Geoinformation 8 (4), 270–277. Mark, D.M., Smith, B., Tversky, B., 1999. Ontology and geographic objects: An empirical study of cognitive categorization. In: International Conference on Spatial Information Theory. Springer, Berlin and Heidelberg, pp. 283–298. 
Mikhail, E.M., Gracie, G., 1981. Analysis and adjustment of survey measurements. Van Nostrand Reinhold Co, New York. Miller, M.D., 2016. The modifiable conceptual unit problem demonstrated using pollen and seed dispersal. Global Ecology and Conservation 6, 93–104. Monmonier, M., 2006. Cartography: Uncertainty, interventions, and dynamic display. Progress in Human Geography 30 (3), 373. Montello, D.R., Goodchild, M.F., Gottsegen, J., Fohl, P., 2003. Where’s downtown? Behavioral methods for determining referents of vague spatial queries. Spatial Cognition and Computation 3 (2–3), 185–204. Mooney, P. (2011). The evolution and spatial volatility of VGI in OpenStreetMap. In: Hengstberger Symposium Towards Digital Earth: 3D Spatial Data Infrastructures, pp. 7–8. Heidelberg. Morris, A. (2008). Uncertainty in spatial databases. In: Wilson, J. P. & Fotheringham, A. S. (eds.) The handbook of geographic information science, pp. 80–93. Oxford, UK: John Wiley & Sons.
Mowrer, H.T., Congalton, R.G., 2003. Quantifying spatial uncertainty in natural resources: Theory and applications for GIS and remote sensing. CRC Press, Boca Raton. Neitzel, F., Klonowski, J., 2011. Mobile 3D mapping with a low-cost UAV system. International Archives of the Photogrammetry, Remote Sensing and Spatial Science 38, 1–6. Nelson, A., Reuter, H., Gessler, P., 2009. DEM production methods and sources. Developments in Soil Science 33, 65–85. Neprash, J.A., 1934. Some problems in the correlation of spatially distributed variables. Journal of the American Statistical Association 29 (185A), 167–168. Neutens, T., Witlox, F., Van de Weghe, N., De Maeyer, P., 2007. Human interaction spaces under uncertainty. Transportation Research Record: Journal of the Transportation Research Board 2021, 28–35. Openshaw, S., 1984a. Ecological fallacies and the analysis of areal census data. Environment and Planning A 16 (1), 17–31. Openshaw, S., 1984b. The modifiable areal unit problem. CATMOG – Concepts and Techniques in Modern Geography. Geo Books: Norwich, England. Openshaw, S., 1998. Towards a more computationally minded scientific human geography. Environment and Planning A 30 (2), 317–332. Ovenden, M., 2007. Transit maps of the world. Penguin Books, London. Pang, A., 2001. Visualizing uncertainty in geo-spatial data. In: Proceedings of the Workshop on the Intersections between Geospatial Information and Information Technology. National Research Council, Arlington, pp. 1–14. Pappenberger, F., Frodsham, K., Beven, K., Romanowicz, R., Matgen, P., 2007. Fuzzy set approach to calibrating distributed flood inundation models using remote sensing observations. Hydrology and Earth System Sciences Discussions 11 (2), 739–752. Perkal, J. (1966). On the length of empirical curves. Paper presented at the Michigan Inter-University Community of Mathematical Geography, Ann Arbor. Peterson, A.T., Lash, R.R., Carroll, D.S., Johnson, K.M., 2006. 
Geographic potential for outbreaks of Marburg hemorrhagic fever. The American Journal of Tropical Medicine and Hygiene 75 (1), 9–15. Pfaffelmoser T, Reitinger M, and Westermann R (2011) Visualizing the positional and geometrical variability of isosurfaces in uncertain scalar fields. In: Computer Graphics Forum, vol. 30, no. 3, pp. 951–960. Oxford, Blackwell Publishing. Plata-Rocha, W., Gómez-Delgado, M., Bosque-Sendra, J., 2012. Proposal for the introduction of the spatial perspective in the application of global sensitivity analysis. Journal of Geographic Information System 4 (6), 503–513. Plewe, B., 2002. The nature of uncertainty in historical geographic information. Transactions in GIS 6 (4), 431–456. Prisley, S.P., Gregoire, T.G., Smith, J.L., 1989. The mean and variance of area estimates computed in an arc-node geographical information systems. Photogrammetric Engineering and Remote Sensing 55, 1601–1612. Quattrochi, D.A., Goodchild, M.F., 1997. Scale in remote sensing and GIS. CRC Press, Boca Raton. Rae, C., Rothley, K., Dragicevic, S., 2007. Implications of error and uncertainty for an environmental planning scenario: A sensitivity analysis of GIS-based variables in a reserve design exercise. Landscape and Urban Planning 79 (3–4), 210–217. Ragin, C.C., 2000. Fuzzy-set social science. University of Chicago Press, Chicago. Randell, D. A., Cui, Z., & Cohn, A. G. (1992). A spatial logic based on regions and connection. Paper presented at the 3rd International Conference on Knowledge Representation and Reasoning, Cambridge, MA. Raymond, E., 1999. The cathedral and the bazaar. Knowledge, Technology & Policy 12 (3), 23–49. Relph, E., 1976. Place and placelessness. Pion, London. Reshetyuk, Y., Martensson, S.G., 2016. Generation of highly accurate digital elevation models with unmanned aerial vehicles. Photogrammetric Record 31 (154), 143–165. http://dx.doi.org/10.1111/phor.12143. Rinner, C., Heppleston, A., 2006. 
The Spatial Dimensions of Multi-Criteria Evaluation – Case Study of a Home Buyer’s Spatial Decision Support System. In: Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (Eds.), Geographic Information Science, pp. 338–352. Proceedings of 4th International Conference, GIScience 2006, Münster, Germany, 20-23 September 2006. Robinson, V.B., 2003. A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Transactions in GIS 7 (1), 3–30. Rosenfield, G.H., Fitzpatrick-Lins, K., 1986. A coefficient of agreement as a measure of thematic classification accuracy. Photogrammetric Engineering and Remote Sensing 52 (2), 223–227. Roth, R.E., 2009. The impact of user expertise on geographic risk assessment under uncertain conditions. Cartography and Geographic Information Science 36 (1), 29–43. Ruddell, D., Wentz, E.A., 2009. Multi-tasking: Scale in geography. Geography Compass 3 (2), 681–697. Sae-Jung, J., Chen, X. and Phuong, D. (2008). Error propagation modeling in GIS polygon overlay. Paper presented at the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Beijing. Saltelli, A., Chan, K., Scott, M. (Eds.), 2000. Sensitivity analysis. John Wiley & Sons, New York. Schneider, M., 1999. Uncertainty management for spatial data in databases: Fuzzy spatial data types. In: International Symposium on Spatial Databases. Springer, Berlin and Heidelberg, pp. 330–351. Schuurman, N., 2006. Formalization matters: Critical GIS and ontology research. Annals of the Association of American Geographers 96 (4), 726–739. Sen, S., 2008. Framework for probabilistic geospatial ontologies. International Journal of Geographical Information Science 22 (7), 825–846. Shi, W. (1994). Modeling positional and thematic error in integration of GIS and remote sensing. Enschede: ITC Publication. Shi W (2009) Principles of modeling uncertainties in spatial data and spatial analyses. 
Boca Raton, FL: CRC Press, Taylor & Francis Group. Shi, W., Liu, W., 2000. A stochastic process-based model for the positional error of line segments in GIS. International Journal of Geographical Information Science 14 (1), 51–66. Shi, W., Liu, K., 2004. Modeling fuzzy topological relations between uncertain objects in GIS. Photogrammetric Engineering and Remote Sensing 70 (8), 921–929. Shi, W., Tong, X., Liu, D., 2000. An approach for modeling error of generic curve features in GIS. Acta Geodaetica et Cartographica Sinica 29, 52–58. Shi, W., Cheung, C., Zhu, C., 2003. Modeling error propagation of buffer spatial analysis in vector-based GIS. International Journal of Geographical Information Science 17 (3), 251–271. Shi, W., Cheung, C., Tong, X., 2004. Modeling error propagation in vector-based overlay spatial analysis. ISPRS Journal of Photogrammetry & Remote Sensing 59, 47–59. Slingsby, A., Dykes, J., Wood, J., 2011. Exploring uncertainty in geodemographics with interactive graphics. IEEE Transactions on Visualization and Computer Graphics 17 (12), 2545–2554. Smith, B., Mark, D., 1998. Ontology with human subjects testing. American Journal of Economics and Sociology 58 (2), 245–312. Stehman, S.V., Czaplewski, R.L., 1998. Design and analysis for thematic map accuracy assessment: Fundamental principles. Remote Sensing of Environment 64 (3), 331–344. Steinhardt, U., 1998. Applying the fuzzy set theory for medium and small scale landscape assessment. Landscape and Urban Planning 41 (3), 203–208. Stephan, F.F., 1934. Sampling errors and interpretations of social data ordered in time and space. Journal of the American Statistical Association 29 (185A), 165–166. Stouffer, S.A., 1934. Problems in the application of correlation to sociology. Journal of the American Statistical Association 29 (185A), 52–58. Su, X., Talmaki, S., Cai, H., Kamat, V.R., 2013. Uncertainty-aware visualization and proximity monitoring in urban excavation: A geospatial augmented reality approach. 
Visualization in Engineering 1 (1), 1. Sui, D., 1992. A fuzzy GIS modeling approach for urban land evaluation. Computers, Environment and Urban Systems 16 (2), 101–115. Sui, D. (2009). Ecological fallacy. In: Kitchin, R. & Thrift, N. (eds.) International encyclopedia of human geography, pp. 291–293. Elsevier. https://www.elsevier.com/books/ international-encyclopedia-of-human-geography/kitchin/978-0-08-044911-1. Tarboton, D.G., 1997. A new method for the determination of flow directions and upslope areas in grid digital elevation models. Water Resources Research 33 (2), 309–319. Tate, E., 2013. Uncertainty analysis for a social vulnerability index. Annals of the Association of American Geographers 103 (3), 526–543. Tobler, W., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46 (Suppl 1), 234–240. Tobler W (1988) Resolution, Resampling, and All That. In: Mounsey H and Tomlinson R (eds.) Building Data Bases for Global Science, pp. 129–137. London, UK: Taylor and Francis.
340
Spatial Data Uncertainty
Tobler, W., 2004. Thirty five years of computer cartograms. Annals of the Association of American Geographers 94 (1), 58–73. Tong, X., Shi, W., 2010. Measuring positional error of circular curve features in Geographic Information Systems (GIS). Computers & Geosciences 36 (7), 861–870. Tong, X., Sun, T., Fan, J., Goodchild, M.F., Shi, W., 2013. A statistical simulation model for positional error of line features in Geographic Information Systems (GIS). International Journal of Applied Earth Observation and Geoinformation 21, 136–148. Tuan, Y.F., 1977. Space and place: The perspective of experience. University of Minnesota Press, Minneapolis. Tucci, M., Giordano, A., 2011. Positional accuracy, positional uncertainty, and feature change detection in historical maps: Results of an experiment. Computers, Environment and Urban Systems 35 (6), 452–463. US Census Bureau. (2000). Census 2000 summary file 3. American fact finder. http://www.factfinder.census.gov/home/en/sf3.html (Accessed 9 October 2007). USGS, 1998. National mapping program technical instructions, part 2-standards for digital elevation models. USGS, Washington, DC. Veregin H (1999) Data quality parameters. Geographical Information Systems 1: 177–189. Verhoeven, G., Taelman, D., Vermeulen, F., 2012. Computer vision-based orthophoto mapping of complex archaeological sites: The ancient quarry of Pitaranha (Portugal–Spain). Archaeometry 54 (6), 1114–1129. http://dx.doi.org/10.1111/j.1475-4754.2012.00667.x. Voudouris, V., 2010. Towards a unifying formalisation of geographic representation: The object–field model with uncertainty and semantics. International Journal of Geographical Information Science 24 (12), 1811–1828. Wang, F., Hall, G.B., 1996. Fuzzy representation of geographical boundaries in GIS. International Journal of Geographical Information Systems 10 (5), 573–590. Wechsler, S., 2007. Uncertainties associated with digital elevation models for hydrologic applications: A review. 
Hydrology and Earth System Sciences 11 (4), 1481–1500. Wechsler, S.P., Kroll, C.N., 2006. Quantifying DEM uncertainty and its effect on topographic parameters. Photogrammetric Engineering & Remote Sensing 72 (9), 1081–1090. Westoby, M.J., Brasington, J., Glasser, N.F., Hambrey, M.J., Reynolds, J.M., 2012. ‘Structure-from-motion’ photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology 179, 300–314. http://dx.doi.org/10.1016/j.geomorph.2012.08.021. Wilson, J.P., 2012. Digital terrain modeling. Geomorphology 137 (1), 107–121. Wilson JP and Burrough PA (1999) Dynamic modeling, geostatistics, and fuzzy classification: New sneakers for a new geography? 736–746. Wittenbrink, C.M., Pang, A.T., Lodha, S.K., 1996. Glyphs for visualizing uncertainty in vector fields. IEEE Transactions on Visualization and Computer Graphics 2 (3), 266–279. Wong, D., 2009. The modifiable areal unit problem (MAUP). Sage, London. Wood, J.D., Fisher, P.F., 1993. Assessing interpolation accuracy in elevation models. IEEE Computer Graphics and Applications 13 (2), 48–56. Woodcock, C.E., Gopal, S., 2000. Fuzzy set theory and thematic maps: Accuracy assessment and area estimation. International Journal of Geographical Information Science 14 (2), 153–172. Wu, J., Jones, B., Li, H., Loucks, O.L., 2006. Scaling and uncertainty analysis in ecology. Methods and applications. Springer. ISBN-10: 1402046626. Wu, S., Li, J., Huang, G., 2008. A study on DEM-derived primary topographic attributes for hydrologic applications: Sensitivity to elevation data resolution. Applied Geography 28 (3), 210–223. Xiao, N., Calder, C.A., Armstrong, M.P., 2007. Assessing the effect of attribute uncertainty on the robustness of choropleth map classification. International Journal of Geographical Information Science 21 (2), 121–144. Xue, J., Leung, Y., Ma, J., 2015. High-order Taylor series expansion methods for error propagation in geographic information systems. 
Journal of Geographical Systems 17 (2), 187–206. Zadeh, L.A., 1965. Fuzzy sets. Information and Control 8 (3), 338–353. Zandbergen, P.A., 2008. Positional accuracy of spatial data: Non-normal distributions and a critique of the national standard for spatial data accuracy. Transactions in GIS 12 (1), 103–130. Zevenbergen, L.W., Thorne, C.R., 1987. Quantitative analysis of land surface topography. Earth Surface Processes and Landforms 12 (1), 47–56. Zhan, F.B., 1998. Approximate analysis of binary topological relations between geographic regions with indeterminate boundaries. Soft Computing 2 (2), 28–34. Zhang, J., 2006. The calculating formulae and experimental methods in error propagation analysis. IEEE Transactions on Reliability 55 (2), 169–181. Zhang, J., Foody, G.M., 2001. Fully-fuzzy supervised classification of sub-urban land cover from remotely sensed imagery: Statistical and artificial neural network approaches. International Journal of Remote Sensing 22 (4), 615–628. Zhang, J., Goodchild, M.F., 2002. Uncertainty in geographical information. CRC Press, Boca Raton. Zhang, J., Kirby, R.P., 2000. A geostatistical approach to modeling positional error in vector data. Transactions in GIS 4 (2), 145–159. Zhang, B., Zhu, L., Zhu, G., 1998. The uncertainty propagation model of vector data on buffer operation in GIS. ACTA Geodaetica et Cartographic Sinica 27, 259–266. Zhang, L., Deng, M., Chen, X., 2006. A new approach to simulate positional error of line segment in GIS. Geo-spatial Information Science 9 (2), 142–146. Zimmermann, H. J. (1996). Fuzzy control. In: Fuzzy set theorydAnd its applications, pp. 203–240. Berlin, Germany: Springer.