Comput., Environ. and Urban Systems, Vol. 19, No. 1, pp. 23-36, 1995. Copyright © 1995 Elsevier Science Ltd. Printed in the USA. All rights reserved. 0198-9715/95 $9.50 + .00
0198-9715(94)00032-S
DATA-QUALITY ENHANCEMENT TECHNIQUES IN LAYER-BASED GEOGRAPHIC INFORMATION SYSTEMS
Howard Veregin, Department of Geography, Kent State University, Kent, OH
David P. Lanter, Department of Geography, University of California, Santa Barbara, CA
ABSTRACT. This study deals with the general issue of data quality in geographic information systems (GIS). The specific focus is the propagation of source-data errors through GIS data-transformation functions. The objective of the study is to describe quality enhancement (QE) tools that can be used to improve the quality of derived data products. These tools allow users to explore the error characteristics of their databases, devise optimal strategies for improving the accuracy of derived data, and enhance the reliability of information used for decision making.
INTRODUCTION

The need for data-quality assessment techniques and quality-assurance procedures in geographic information systems (GIS) no longer requires much by way of formal introduction or justification. In recent years the issue has received a great deal of attention from the GIS community. There has been a significant increase in the number of journal and conference articles on GIS data quality, and several books have recently been published on the topic (Goodchild & Gopal, 1989; Hunter, 1991). The importance of data quality is also reflected in the recent adoption of the Spatial Data Transfer Standard (SDTS), which includes a data-quality component, as a Federal Information Processing Standard (FIPS) to serve all segments of the U.S. federal geographical information processing community. Nor is this type of activity restricted to the United States, as evinced by the work on data quality being conducted by the International Cartographic Association, the efforts to promote European data standards (such as DIGEST [Digital Geographic Information Exchange Standard]), and the appearance of non-English texts on GIS data quality (Giordano & Veregin, 1994).

Requests for reprints should be sent to Dr. H. Veregin, Department of Geography, Kent State University, Kent, OH 44242-0001; e-mail: [email protected].

Despite these advances in knowledge and awareness, it is probably safe to say that most GIS-based modeling projects do not explicitly account for the presence of errors in source data and the repercussions that these errors may have for the validity of derived data products. While project managers may suspect that derived data are less than perfect, data-quality assessment procedures are not normally applied because the tools needed to perform these assessments are simply not found in commercial GIS software packages.

A complicating factor is the dynamic nature of the GIS environment, which fosters the merging and manipulation of data from different sources. The application of GIS data-transformation functions can induce significant changes in source data-quality characteristics. Data quality cannot be modeled as a static attribute that is simply inherited by derived data products. Rather, quality assessment for derived data requires attention to error propagation, that is, the ways in which different GIS data-transformation functions affect data-quality characteristics. The development of quality-assessment procedures for digital geographic data must be coupled with advances in the understanding of error-propagation mechanisms in order to assess the reliability of data products derived through GIS-based transformations.

Veregin (1989a) suggests that the assessment of data quality in GIS involves the following hierarchy of interdependent components:

• error source identification,
• error detection and measurement,
• error propagation modeling, and
• error management and reduction.
Each level in the hierarchy depends on the success of the procedures performed at the preceding level. Most of the research on data quality in GIS focuses on the first two levels (see Veregin, 1989b). More recently, there has been work on the third level (error propagation), and several attempts have been made to develop semiautomated systems for propagating error through sequences of GIS functions (Carver, 1991; Heuvelink, Burrough, & Stein, 1989). Our own research in this area has focused on the design and implementation of an automated system for modeling the propagation of thematic error through several commonly employed data-transformation functions (Lanter & Veregin, 1990, 1992).

While much remains to be learned about error-propagation mechanisms, the knowledge gained in these studies can be used to begin exploring the fourth level of the hierarchy, that is, error management and reduction. This is the focus of the present study. Our objective is to demonstrate some simple concepts and methods that can be used to improve the quality of data products derived in the course of GIS-based analysis. We refer to this capability as quality enhancement (QE) to emphasize its close relationship to quality assurance (QA) and quality control (QC). QE is potentially the most powerful application of information about data quality and error propagation in GIS, as it allows users to enhance the reliability of the information on which their decisions are based. Given sufficiently sophisticated QE tools, users would be able to examine different strategies for improving quality and select those that deliver the greatest return in data quality per unit investment.
ERROR PROPAGATION IN LAYER-BASED GIS
The purpose of this section is to highlight the main features of layer-based error propagation as a background for the discussion of QE techniques.
Layer-Based Data Flows

As the phrase itself suggests, layer-based GIS is built on the concept of the layer as the basic means of data organization and representation. The layer is based on the geographic-data cube model, whereby any geographical observation can be located in a space defined by its spatial (or locational), temporal, and thematic (or attribute) values. Space, time, and theme define geographical data. (Note that this implies a fundamental distinction between "spatial" data and "geographical" data, the former being a component of the latter.) Layers are spatially registered thematic or temporal overlays, showing different themes for the same locations within the same time interval, or showing the same theme for the same locations over a series of different time intervals (as in change detection).

Data-transformation functions in GIS may modify an entire layer or selected features in the layer, depending on the type of function applied. Giordano and Veregin (1994) define three main classes of functions:

• Input functions make data usable by a GIS. These functions operate on nondigital data (e.g., paper maps) and convert these data to layers that can then be manipulated by other GIS functions.
• Manipulation functions operate exclusively on layers, in the sense that layers serve as both inputs and outputs. Thus, these functions create data that can be manipulated by other functions of this class. Many common GIS analytical tools fall into this category (e.g., overlay operations, reselection, buffering).
• Output functions take layers as input and produce, as output, data that cannot be further manipulated by the system. Plotting, reformatting, and exporting are examples of such functions.
In this study, we focus on the class of manipulation functions. Our interest in these functions reflects our view of the importance of the analytical component of GIS, one of the main factors that distinguish GIS from other information systems. As noted above, layers serve as both the inputs and outputs of manipulation functions. Thus, the data transformations associated with these functions can be represented as a data flow linking input and output layers. This is the familiar cartographic model or map algebra representation of GIS-based analysis (Tomlin & Berry, 1979).

For the purposes of error propagation, it is useful to identify two slightly different forms of data flow. The first involves functions that take one input layer and produce one output layer. As an example, one might selectively extract from a land-use layer those polygons classified as ranches (Fig. 1a). The second form of data flow involves merging two input layers to produce a single output layer. This is the realm of the familiar overlay operations. As an example, two input layers, the first depicting areas of cattle grazing and the second depicting areas of oak woodland, might be intersected to show areas that have both of these characteristics (Fig. 1b).

FIGURE 1. Graphical Representation of Data Flows Between (a) One Input Layer and One Output Layer, and (b) Two Input Layers and One Output Layer.

It is common for entire sequences of such transformations to be performed in GIS-based modeling and analysis. The goal is to derive and make explicit a particular set of geographical relationships that are implicit in the source data (Lanter, 1991). The resulting data-flow model links source layers to the final derived layer, which contains an explicit representation of the selected geographical relationships. An example of such a data-flow model is shown in Figure 2. The objective here is to identify areas where oak-tree regeneration is at risk from cattle-grazing activity. This is achieved as follows:

• Areas classified as ranches are extracted from a land-use layer.
• Areas with permits for cattle grazing are superimposed on the ranch layer using an inclusive logical operator (i.e., a logical union) to identify areas that either are classified as ranches or have a grazing permit.
• Areas of oak woodland are extracted from a vegetation layer.
• Areas of oak woodland are superimposed on the grazing layer using an exclusive logical operator (i.e., a logical intersection) to identify areas where oak trees and cattle are found together.
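As a minimal sketch, assuming each layer is a small co-registered raster held as nested Python lists (the cell values are invented for illustration and are not data from this study), the four steps can be written as three manipulation functions chained into the data-flow model of Figure 2. The function names mirror the text; the grid representation is a hypothetical stand-in for real GIS layers.

```python
# Hypothetical 3 x 3 rasters standing in for co-registered layers.
LAND_USE = [
    ["ranch", "urban", "ranch"],
    ["crop", "ranch", "urban"],
    ["crop", "crop", "ranch"],
]
PERMITS = [  # True where a grazing permit exists
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
VEGETATION = [
    ["oak", "oak", "pine"],
    ["grass", "oak", "oak"],
    ["oak", "pine", "grass"],
]


def reselect(layer, classes):
    """One input, one output: True where the cell's class is in `classes`."""
    return [[cell in classes for cell in row] for row in layer]


def union(a, b):
    """Inclusive overlay: True where either input layer is True."""
    return [[x or y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def intersect(a, b):
    """Overlay that keeps only cells where both input layers are True."""
    return [[x and y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


ranches = reselect(LAND_USE, {"ranch"})
grazing = union(ranches, PERMITS)
oakwoods = reselect(VEGETATION, {"oak"})
at_risk = intersect(grazing, oakwoods)

for row in at_risk:
    print(" ".join("X" if cell else "." for cell in row))
```

Because every function takes layers and returns a layer, each output can feed the next function, which is what makes the chain expressible as a data flow.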
GIS data-transformation functions create links between input and output layers. These links are lineage relationships describing parent-child associations between layers. Parent-links ask, "Who are my children?" while child-links ask, "Who are my parents?" These two relationships allow for simple discrimination of input and output layers for any function: the input layer has a parent-link, while the output layer has a child-link (Fig. 3). In a data-flow model (e.g., Fig. 2), three types of layers can be identified based on these lineage relationships:

• Source layers (a type of input layer) have parent-links but no child-links.
• Final layers (a type of output layer) have child-links but no parent-links.
• Intermediate layers (both input and output layers) have both child- and parent-links (Fig. 4).
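The lineage bookkeeping can be sketched with a plain dictionary, assuming each derived layer records the function that produced it and its input layers. This structure is a hypothetical illustration, not the lineage meta-database described by Lanter (1991). Parent- and child-links are built from that record using the text's convention, and the three layer types follow directly from the rules above.

```python
# Hypothetical lineage record for Figure 2: derived layer -> (function, inputs).
DATA_FLOW = {
    "Ranches": ("reselect", ["LandUse"]),
    "Grazing": ("union", ["Ranches", "Permits"]),
    "Oakwoods": ("reselect", ["Vegetation"]),
    "AtRisk": ("intersect", ["Grazing", "Oakwoods"]),
}


def lineage_links(flow):
    """Build link tables using the text's convention:
    an input layer holds a parent-link to each output derived from it
    ("Who are my children?"); an output layer holds a child-link to each
    of its inputs ("Who are my parents?")."""
    parent_links, child_links = {}, {}
    for output, (_func, inputs) in flow.items():
        for layer in inputs:
            parent_links.setdefault(layer, set()).add(output)
            child_links.setdefault(output, set()).add(layer)
    return parent_links, child_links


def classify_layers(flow):
    """Label every layer as 'source', 'intermediate', or 'final'."""
    parent_links, child_links = lineage_links(flow)
    kinds = {}
    for layer in parent_links.keys() | child_links.keys():
        has_parent = layer in parent_links
        has_child = layer in child_links
        if has_parent and not has_child:
            kinds[layer] = "source"
        elif has_child and not has_parent:
            kinds[layer] = "final"
        else:
            kinds[layer] = "intermediate"
    return kinds


for layer, kind in sorted(classify_layers(DATA_FLOW).items()):
    print(f"{layer}: {kind}")
```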
FIGURE 2. Graphic Representation of Data-Flow Model.
FIGURE 3. Child- and Parent-Links Between Input and Output Layers.
It is also possible to represent data flows in functional notation, a capability that turns out to be especially useful in error propagation. The two flows shown in Figure 1 are:

$$\mathit{Ranches} = \mathrm{reselect}(\mathit{LandUse})$$
$$\mathit{AtRisk} = \mathrm{intersect}(\mathit{Grazing}, \mathit{Oakwoods})$$

The entire data-flow model shown in Figure 2 can also be represented in functional form:

$$\mathit{AtRisk} = \mathrm{intersect}\big(\mathrm{union}(\mathrm{reselect}(\mathit{LandUse}), \mathit{Permits}), \mathrm{reselect}(\mathit{Vegetation})\big)$$

An important characteristic of this type of functional representation is that the final derived layer (AtRisk) can be described in terms of the source layers alone (LandUse, Permits, and Vegetation).
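Because each derived layer records its inputs, the functional form can be generated mechanically: each derived layer is replaced by the function that produced it, recursively, until only source layers remain. The sketch below assumes the same hypothetical dictionary layout for the Figure 2 lineage (layer name mapped to function name and input layers); it reproduces the expression given above.

```python
# Hypothetical lineage record for Figure 2: derived layer -> (function, inputs).
DATA_FLOW = {
    "Ranches": ("reselect", ["LandUse"]),
    "Grazing": ("union", ["Ranches", "Permits"]),
    "Oakwoods": ("reselect", ["Vegetation"]),
    "AtRisk": ("intersect", ["Grazing", "Oakwoods"]),
}


def to_functional(layer, flow):
    """Expand `layer` into nested function calls over source layers only."""
    if layer not in flow:  # never produced by a function: a source layer
        return layer
    func, inputs = flow[layer]
    args = ",".join(to_functional(name, flow) for name in inputs)
    return f"{func}({args})"


print(to_functional("AtRisk", DATA_FLOW))
# intersect(union(reselect(LandUse),Permits),reselect(Vegetation))
```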
Layer-Based Error Propagation

Error propagation in layer-based GIS depends on the ability to identify links between input and output layers. In order to propagate error through a GIS function, it must be assumed that the input layer has been attributed with an index of error. An error-propagation function is used to modify the index for the input layer, and the modified index is then passed to the output layer via the parent-link (Fig. 5).

FIGURE 4. Child- and Parent-Links for Data-Flow Model.

As noted above, one characteristic of the functional representation of a data-flow model is that the final derived layer can be described solely in terms of source layers. This implies that, in order to propagate error through a data-flow model, only the source layers need to be attributed with error indices. These indices are transformed and propagated via parent-links to intermediate layers, where they are transformed and propagated again until the final derived layer is reached.

Error-propagation functions themselves must be tailored to specific data-transformation functions and particular error indices. The way in which error propagation is modeled may also depend on ancillary information describing such factors as the spatial distribution and co-occurrence of error on different layers.

Error indices describe the magnitude of a selected error component in a layer. These indices may be scalar quantities, matrices, mathematical functions, or even co-registered layers. One of the simplest error indices to propagate is PCC (proportion correctly classified), which measures thematic error for categorical data. This index is derived from a classification error matrix showing a cross-tabulation of the actual and estimated thematic classes for a sample of locations. Element $c_{ij}$ of the matrix is the number of sample locations assigned to class $i$ that actually belong to class $j$. PCC is then defined as the trace of this matrix (i.e., the sum of all $c_{ij}$ where $i = j$) divided by the number of sample locations. If the sample has been obtained randomly, PCC may be interpreted as the probability that a location selected at random on the layer is correctly classified.
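PCC follows directly from the error matrix. The sketch below assumes the matrix is stored as a list of rows, with element [i][j] holding the number of sample locations assigned to class i that actually belong to class j, as defined above; the 3-class matrix of 100 sample locations is invented for illustration.

```python
def pcc(error_matrix):
    """Proportion correctly classified: trace / total number of samples.

    error_matrix[i][j] = number of sample locations assigned to class i
    that actually belong to class j.
    """
    n = len(error_matrix)
    if any(len(row) != n for row in error_matrix):
        raise ValueError("error matrix must be square")
    total = sum(sum(row) for row in error_matrix)
    correct = sum(error_matrix[i][i] for i in range(n))
    return correct / total


# Hypothetical matrix for 100 randomly sampled locations.
sample = [
    [40, 3, 2],
    [5, 25, 5],
    [0, 5, 15],
]
print(pcc(sample))  # 0.8
```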
The flow of information about error through a data-flow model is referred to as an error-propagation model. The structure of the error-propagation model matches that of the data-flow model, except that layers are replaced by error indices and GIS functions are replaced by error-propagation functions. The error-propagation model for the data-flow model described above is:

$$\mathrm{PCC}_{\mathrm{AtRisk}} = \mathrm{epf\_intersect\_pcc}\big(\mathrm{epf\_union\_pcc}(\mathrm{epf\_reselect\_pcc}(\mathrm{PCC}_{\mathrm{LandUse}}), \mathrm{PCC}_{\mathrm{Permits}}), \mathrm{epf\_reselect\_pcc}(\mathrm{PCC}_{\mathrm{Vegetation}})\big)$$

According to this model, the PCC of the final derived layer, AtRisk, is a function of the PCC values of the three source layers. It is not necessary to know the PCC of any intermediate layer. The error-propagation model uses a set of error-propagation functions (epfs) that modify the PCC index appropriately for each GIS function used in the data flow. Each error-propagation function is specific to a given error index (in this case, PCC) and a given data-transformation function (e.g., reselect, union, intersect). In fact, a variety of error-propagation functions might be applied to each combination of error index and data-transformation function, depending on the assumptions about error-propagation mechanisms and the spatial distribution and co-occurrence of error.

FIGURE 5. Propagation of Transformed Error Index via Parent-Link.

Much of our earlier work in error propagation has focused on the development of techniques to track source-layer errors as data-transformation functions are applied. This process is based on the identification of lineage relationships established between derived layers and the sources from which they are derived. As layers are passed through GIS transformation functions, a graphical representation is constructed showing source layers, derived layers, and the transformation functions that connect them. The error indices associated with each layer are also displayed, showing the modification of index values (Fig. 6).
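An error-propagation model can be evaluated by nesting error-propagation functions in the same shape as the data-flow model. The sketch below uses the PCC functions for reselect, union, and intersect that this paper gives in its section on function inversion (after Lanter & Veregin, 1992). The source PCCs and class counts are those of the worked example in the section on quality enhancement (LandUse 0.75 with k = 5, Permits 0.9, Vegetation 0.7 with k = 4); the number of classes collapsed by each reselect, r = 1, is an assumption not stated in the text. Under that assumption the model returns 0.8415, consistent with the PCC of 0.84 shown for AtRisk in Figure 6.

```python
def reselect_k(k, r):
    """K for reselect under equal errors among classes (Lanter & Veregin, 1992)."""
    return (r * (r - 1) + (k - r) * (k - r - 1)) / (k * (k - 1))


def epf_reselect_pcc(pcc_a, k, r):
    return pcc_a + reselect_k(k, r) * (1 - pcc_a)


def epf_union_pcc(pcc_a, pcc_b):
    return 1 - (1 - pcc_a) * (1 - pcc_b)


def epf_intersect_pcc(pcc_a, pcc_b):
    # Assumes errors are uncorrelated between the two layers.
    return pcc_a * pcc_b


def pcc_at_risk(pcc_land_use, pcc_permits, pcc_vegetation,
                k_land_use=5, k_vegetation=4, r=1):
    """Error-propagation model with the same nesting as the data-flow model.
    r = 1 (one class reselected from each layer) is an assumption."""
    return epf_intersect_pcc(
        epf_union_pcc(epf_reselect_pcc(pcc_land_use, k_land_use, r), pcc_permits),
        epf_reselect_pcc(pcc_vegetation, k_vegetation, r),
    )


print(round(pcc_at_risk(0.75, 0.9, 0.7), 4))  # 0.8415
```

Only the source-layer PCCs enter the call; the intermediate values (0.9 for Ranches, 0.99 for Grazing, 0.85 for Oakwoods under these assumptions) are computed along the way rather than supplied.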
FIGURE 6. Screen Dump From Automated Error-Propagation System.

INVERSE ERROR PROPAGATION

Error-propagation modeling allows users to assess the fitness-for-use of a particular derived data product for a particular decision-making purpose. For example, users may have an 85% thematic accuracy target that would result in the rejection of a derived layer with a PCC less than 0.85. As the accuracy of derived layers can differ significantly from the accuracy of the data sources from which they are derived, error propagation is a necessary component of such assessments of fitness-for-use. However, one limitation of error-propagation modeling is that it is not proactive. Assume that error propagation indicates that a derived layer does not meet the target accuracy level for a particular decision-making purpose. In this situation, the derived layer cannot be used to support decision making, and the user has no option but to reject the layer as unfit. What the user needs to know at this point is how to make the modifications necessary to obtain a derived layer that reaches the target accuracy level. The user needs answers to some specific questions. Which data sources are especially problematic? Which processing steps are causing error to be inflated? If the accuracy of the source layers were increased, would there be a positive effect on the derived layer, and if so, how large would the effect be and how much would it cost?

In light of these considerations, our goal is to design a system for the analysis of error propagation in layer-based GIS that can be used to:

• assess the relative importance of data sources in terms of their impact on the accuracy of derived data,
• explore scenarios for improving the accuracy of derived data based on enhancements in source-data accuracy, and
• prioritize strategies for improving derived-data accuracy based on the cost of enhancements in source data.
Our approach is based on inversion of the error-propagation model. Model inversion allows source-layer accuracy to be expressed in terms of the accuracy of the final derived layer. In this way, it is possible to compute the level of accuracy that each source layer would need to attain for the final derived layer to achieve a given target accuracy level. One can also use this approach to determine the increase in derived-layer accuracy that would result from a unit increase in the accuracy of any source layer. This capability makes it possible to prioritize source layers in terms of their potential for improving the accuracy of derived data. Model inversion has two components. The first is the specification of an inverse error-propagation function for each GIS data-transformation function. The second is the specification of the inverse error-propagation model itself.
Function Inversion

Inversion of a particular error-propagation function is a straightforward task if the function is a simple algebraic expression that maps error in the input layer to error in the output layer. This condition is met in the examples that follow. The error-propagation functions for these examples are drawn from Lanter and Veregin (1992). We begin with a generic data flow in which an output layer, $B$, is derived from an input layer, $A$, using some GIS function, $f$, that is:

$$B = f(A)$$

The inverse form is written as:

$$A = f^{-1}(B)$$

In this inverse form, the input layer is defined as a function of the output layer. Next, given an error-propagation function for some error index, $e$, the propagation of error from layer $A$ to layer $B$ is defined as:

$$e_B = \mathrm{epf}_{f,e}(e_A)$$

where $e_A$ is the value of the error index for input layer $A$, $e_B$ is the value of the (propagated) index for output layer $B$, and $\mathrm{epf}_{f,e}$ is the error-propagation function for the GIS function $f$ and the error index $e$. The inverse form is written as:

$$e_A = \mathrm{epf}_{f,e}^{-1}(e_B)$$

Now consider the application of these simple principles to the data-transformation functions used in the data-flow model described above.
Reselect

The reselect function transforms one input layer to produce an output layer, that is, $B = \mathrm{reselect}(A)$. In this case, the error-propagation function for the PCC index is defined as:

$$\mathrm{PCC}_B = \mathrm{epf\_reselect\_pcc}(\mathrm{PCC}_A)$$

Under the assumption of equal errors among classes (see Lanter & Veregin, 1992), epf_reselect_pcc states that:

$$\mathrm{PCC}_B = \mathrm{PCC}_A + K(1 - \mathrm{PCC}_A), \qquad K = \frac{r(r-1) + (k-r)(k-r-1)}{k(k-1)}$$

where $k$ is the number of classes in the input layer and $r$ is the number of classes collapsed to form the reselected class in the output layer. The inverse error-propagation function is defined simply as:

$$\mathrm{PCC}_A = \mathrm{epf\_reselect\_pcc}^{-1}(\mathrm{PCC}_B)$$

where $\mathrm{epf\_reselect\_pcc}^{-1}$ states that:

$$\mathrm{PCC}_A = \frac{\mathrm{PCC}_B - K}{1 - K}$$
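A sketch of the reselect function pair follows. One way to read $K$ under the equal-errors assumption: a misclassified location has one of $k(k-1)$ equally likely (assigned, actual) class pairs, and the error disappears after reselection when both classes are among the $r$ collapsed classes or both are among the remaining $k - r$, which gives $r(r-1) + (k-r)(k-r-1)$ such pairs. The `_inv` suffix for the inverse and the argument checks are illustrative additions, not names from the paper.

```python
def reselect_k(k, r):
    """Proportion of misclassifications removed by collapsing r of k classes
    (equal errors among classes assumed)."""
    if not (k >= 2 and 1 <= r < k):
        raise ValueError("expected k >= 2 and 1 <= r < k")
    return (r * (r - 1) + (k - r) * (k - r - 1)) / (k * (k - 1))


def epf_reselect_pcc(pcc_a, k, r):
    """Forward: PCC_B = PCC_A + K(1 - PCC_A)."""
    K = reselect_k(k, r)
    return pcc_a + K * (1 - pcc_a)


def epf_reselect_pcc_inv(pcc_b, k, r):
    """Inverse: PCC_A = (PCC_B - K) / (1 - K); K < 1 whenever 1 <= r < k."""
    K = reselect_k(k, r)
    return (pcc_b - K) / (1 - K)


pcc_b = epf_reselect_pcc(0.75, k=5, r=1)      # K = 0.6, so PCC_B = 0.9
pcc_a = epf_reselect_pcc_inv(pcc_b, k=5, r=1)  # recovers the input PCC
print(round(pcc_b, 4), round(pcc_a, 4))        # 0.9 0.75
```

The guard $1 \le r < k$ keeps $K$ strictly below 1, since $k(k-1)$ exceeds the numerator by $2r(k-r)$, so the inverse never divides by zero.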
Intersect

A more complicated case exists for the intersect function, as this function merges two input layers, that is:

$$C = \mathrm{intersect}(A, B)$$

In this case, the inverse form is defined as $A = \mathrm{intersect}^{-1}(C, B)$. Note, however, that there is also a second inverse form: $B = \mathrm{intersect}^{-1}(C, A)$. The PCC of derived layer $C$ is expressed as:

$$\mathrm{PCC}_C = \mathrm{epf\_intersect\_pcc}(\mathrm{PCC}_A, \mathrm{PCC}_B)$$

The inverse error-propagation function (for the first inverse form described above) is defined as:

$$\mathrm{PCC}_A = \mathrm{epf\_intersect\_pcc}^{-1}(\mathrm{PCC}_C, \mathrm{PCC}_B)$$

Assuming that errors are uncorrelated between the layers (see Lanter & Veregin, 1992), epf_intersect_pcc states:

$$\mathrm{PCC}_C = \mathrm{PCC}_A \times \mathrm{PCC}_B$$

Thus, $\mathrm{epf\_intersect\_pcc}^{-1}$ is written as:

$$\mathrm{PCC}_A = \mathrm{PCC}_C / \mathrm{PCC}_B$$
This function can be used to compute the PCC that input layer A must attain in order to reach a target PCC for output layer C. However, as layer B is also an input layer, the PCC of this layer must be fixed at some value.
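The intersect pair is equally short. Because $\mathrm{PCC}_C$ can never exceed $\mathrm{PCC}_B$ under this function, a required $\mathrm{PCC}_A$ above 1 signals that the target cannot be reached by improving layer $A$ alone while $B$ stays fixed. The PCC values in the sketch are hypothetical.

```python
def epf_intersect_pcc(pcc_a, pcc_b):
    """Forward: PCC_C = PCC_A x PCC_B (errors assumed uncorrelated)."""
    return pcc_a * pcc_b


def epf_intersect_pcc_inv(pcc_c, pcc_b):
    """Inverse (first form): PCC that A must attain for target PCC_C,
    with PCC_B held fixed. A result above 1 means the target is unreachable."""
    if not 0 < pcc_b <= 1:
        raise ValueError("PCC_B must lie in (0, 1]")
    return pcc_c / pcc_b


for fixed_b in (0.95, 0.85):
    need_a = epf_intersect_pcc_inv(0.9, fixed_b)
    status = "feasible" if need_a <= 1 else "unreachable via A alone"
    print(f"PCC_B={fixed_b}: PCC_A needed={need_a:.3f} ({status})")
```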
Union

One can likewise define the inverse error-propagation function for the union function. Given:

$$C = \mathrm{union}(A, B), \qquad \mathrm{PCC}_C = \mathrm{epf\_union\_pcc}(\mathrm{PCC}_A, \mathrm{PCC}_B)$$

where epf_union_pcc states that:

$$\mathrm{PCC}_C = 1 - (1 - \mathrm{PCC}_A)(1 - \mathrm{PCC}_B)$$

then the inverse data flow is $A = \mathrm{union}^{-1}(C, B)$, and the inverse error-propagation function is:

$$\mathrm{PCC}_A = \mathrm{epf\_union\_pcc}^{-1}(\mathrm{PCC}_C, \mathrm{PCC}_B)$$

where $\mathrm{epf\_union\_pcc}^{-1}$ states that:

$$\mathrm{PCC}_A = 1 - \frac{1 - \mathrm{PCC}_C}{1 - \mathrm{PCC}_B}$$

Again, a second inverse form exists, as in the case of the intersect function.
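A sketch of the union pair follows; it also shows the inflation effect of union, since two layers with a PCC of 0.8 combine to 0.96. Because union never yields a PCC below that of either input, a negative required $\mathrm{PCC}_A$ means that layer $B$ alone already meets the target. The values are hypothetical.

```python
def epf_union_pcc(pcc_a, pcc_b):
    """Forward: PCC_C = 1 - (1 - PCC_A)(1 - PCC_B)."""
    return 1 - (1 - pcc_a) * (1 - pcc_b)


def epf_union_pcc_inv(pcc_c, pcc_b):
    """Inverse (first form): PCC_A = 1 - (1 - PCC_C) / (1 - PCC_B).
    A negative result means layer B alone already meets the target."""
    if pcc_b >= 1:
        raise ValueError("PCC_B = 1 fixes PCC_C at 1 regardless of PCC_A")
    return 1 - (1 - pcc_c) / (1 - pcc_b)


print(round(epf_union_pcc(0.8, 0.8), 4))       # 0.96: union inflates PCC
print(round(epf_union_pcc_inv(0.99, 0.9), 4))  # 0.9
```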
Model Inversion

In order to use these inverse error-propagation functions, they must be concatenated to create an inverse error-propagation model. The form of this inverse model must in turn match an inverse data-flow model, obtained by selecting one of the source layers as a target layer. The inverse data-flow model defines the target layer as dependent on the final derived layer and the remaining source layers. Specifying the inverse data-flow model involves tracing parent-links from the target layer to the final derived layer, and child-links from the final derived layer to the remaining source layers. There is a unique inverse data-flow model for each target source layer. Writing each inverse in the form (output, other input) used above, the inverse data-flow models for the three source layers in Figure 2 are:

$$\mathit{LandUse} = \mathrm{reselect}^{-1}\big(\mathrm{union}^{-1}(\mathrm{intersect}^{-1}(\mathit{AtRisk}, \mathrm{reselect}(\mathit{Vegetation})), \mathit{Permits})\big)$$
$$\mathit{Permits} = \mathrm{union}^{-1}\big(\mathrm{intersect}^{-1}(\mathit{AtRisk}, \mathrm{reselect}(\mathit{Vegetation})), \mathrm{reselect}(\mathit{LandUse})\big)$$
$$\mathit{Vegetation} = \mathrm{reselect}^{-1}\big(\mathrm{intersect}^{-1}(\mathit{AtRisk}, \mathrm{union}(\mathrm{reselect}(\mathit{LandUse}), \mathit{Permits}))\big)$$

Assume that one wants to compute how accurate source layer LandUse would need to be for derived layer AtRisk to achieve a given target accuracy level. In this case, the accuracies of source layers Permits and Vegetation are held constant. The inverse error-propagation model for LandUse then mirrors the inverse data-flow model given above:

$$\mathrm{PCC}_{\mathrm{LandUse}} = \mathrm{epf\_reselect\_pcc}^{-1}\Big(\mathrm{epf\_union\_pcc}^{-1}\big(\mathrm{epf\_intersect\_pcc}^{-1}(\mathrm{PCC}_{\mathrm{AtRisk}}, \mathrm{epf\_reselect\_pcc}(\mathrm{PCC}_{\mathrm{Vegetation}})), \mathrm{PCC}_{\mathrm{Permits}}\big)\Big)$$

Analogous inverse error-propagation models can also be produced for the other source layers.
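Concatenating the inverse functions gives the inverse error-propagation model for LandUse. The sketch evaluates it from the inside out: the forward reselect epf for Vegetation, then the inverse intersect, inverse union, and inverse reselect. The class counts (k = 5 for LandUse, k = 4 for Vegetation) are taken from the worked example in the next section, and r = 1 is an assumption. Feeding a forward result back in recovers the original LandUse PCC; a result above 1 indicates a target that cannot be reached by improving LandUse alone.

```python
def reselect_k(k, r):
    return (r * (r - 1) + (k - r) * (k - r - 1)) / (k * (k - 1))


def epf_reselect_pcc(pcc_a, k, r):
    K = reselect_k(k, r)
    return pcc_a + K * (1 - pcc_a)


def epf_reselect_pcc_inv(pcc_b, k, r):
    K = reselect_k(k, r)
    return (pcc_b - K) / (1 - K)


def epf_intersect_pcc_inv(pcc_c, pcc_b):
    return pcc_c / pcc_b


def epf_union_pcc_inv(pcc_c, pcc_b):
    return 1 - (1 - pcc_c) / (1 - pcc_b)


def required_pcc_land_use(target_at_risk, pcc_permits, pcc_vegetation,
                          k_land_use=5, k_vegetation=4, r=1):
    """Inverse error-propagation model for LandUse (Permits, Vegetation fixed).
    r = 1 is an assumed reselect setting; a result > 1 means infeasible."""
    return epf_reselect_pcc_inv(
        epf_union_pcc_inv(
            epf_intersect_pcc_inv(
                target_at_risk,
                epf_reselect_pcc(pcc_vegetation, k_vegetation, r),
            ),
            pcc_permits,
        ),
        k_land_use,
        r,
    )


# Round trip: 0.8415 is the forward PCC of AtRisk for LandUse = 0.75 under these assumptions.
print(round(required_pcc_land_use(0.8415, 0.9, 0.7), 4))  # 0.75
```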
INVERSE ERROR PROPAGATION AS A TOOL FOR QUALITY ENHANCEMENT
There are several ways in which inverse error propagation can be used as a tool for QE. The most straightforward of these involves computing the accuracy that a given source layer must attain in order to achieve a target level of accuracy in the final derived layer, holding the accuracy of the remaining source layers constant. Consider an example in which the PCC of LandUse is 0.75 (k = 5), the PCC of Permits is 0.9 (k = 2), and the PCC of Vegetation is 0.7 (k = 4). Using the error-propagation functions given above, the computed PCC of AtRisk is 0.84 (see Fig. 6). Assume a target PCC for AtRisk of 0.94 (an increase of 0.1). Application of the inverse error-propagation model described above indicates that in this case LandUse must have a PCC of 0.87 (an increase of 0.12), assuming that the PCCs of the remaining source layers (Permits and Vegetation) do not change. The same approach can be used to define the required accuracy of Permits (holding LandUse and Vegetation constant) and of Vegetation (holding LandUse and Permits constant).

A more sophisticated application of inverse error propagation involves computing the effects of changes in the accuracy of each source layer. Figure 7 shows the relationship between the change in the PCC of each source layer (horizontal axis) and the resulting PCC of AtRisk (vertical axis). The line for each source layer spans its maximum possible range along the horizontal axis. For example, Vegetation has a PCC of 0.7, so its line ranges from -0.7 to 0.3 (i.e., from a PCC of 0.0 to a PCC of 1.0). The lines for all three source layers intersect at the originally computed PCC for AtRisk (0.84). The layer with the steepest slope (Vegetation) yields the highest rate of increase in the PCC of AtRisk. Assuming that the cost of improving accuracy is identical for all three source layers, the accuracy of AtRisk can therefore be improved most economically by increasing the accuracy of the Vegetation layer. The graph also shows that the maximum PCC achievable for AtRisk (when only one source layer is changed) is 0.99, which is attained by increasing the PCC of Vegetation to 1.0.

FIGURE 7. Relationship Between Change in PCC of Source Layers and Change in PCC of AtRisk. [Horizontal axis: Δ PCC in input layer.]

In practice, the cost of improving accuracy in source layers may vary with the nature of the data contained in the source, the availability and cost of data, the costs of performing ground-based accuracy assessments, and other factors. In this situation, various approaches might be pursued to determine the optimal strategy for quality enhancement. One strategy is to improve the accuracy of the source that yields the target accuracy at the lowest cost. An alternative is to improve the accuracy of the source that yields the greatest increase in derived-layer accuracy per unit of cost. The latter strategy can be represented as a graph in which the vertical axis represents the change in accuracy of the derived layer divided by the cost of a unit change in accuracy of each source layer.
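The one-at-a-time analysis behind Figure 7 can be sketched by varying one source PCC while holding the others at their example values and measuring the rate of change in the PCC of AtRisk. Under the PCC functions given in this paper, and again assuming r = 1 for both reselects, the PCC of AtRisk is linear in each source taken alone, so a finite difference gives the slope of that source's line up to floating-point rounding. With these assumptions the sketch ranks Vegetation steepest and gives 0.99 when Vegetation reaches 1.0, in line with the text.

```python
def reselect_k(k, r):
    return (r * (r - 1) + (k - r) * (k - r - 1)) / (k * (k - 1))


def pcc_at_risk(land_use, permits, vegetation, k_land_use=5, k_vegetation=4, r=1):
    """Forward model for Figure 2 (r = 1 assumed for both reselects)."""
    ranches = land_use + reselect_k(k_land_use, r) * (1 - land_use)
    grazing = 1 - (1 - ranches) * (1 - permits)
    oakwoods = vegetation + reselect_k(k_vegetation, r) * (1 - vegetation)
    return grazing * oakwoods


BASE = {"land_use": 0.75, "permits": 0.9, "vegetation": 0.7}


def at_risk_with(layer, new_pcc):
    """PCC of AtRisk when one source PCC is changed and the others are held."""
    return pcc_at_risk(**{**BASE, layer: new_pcc})


def slope(layer, step=0.01):
    """Change in PCC of AtRisk per unit change in one source PCC."""
    base = BASE[layer]
    return (at_risk_with(layer, base + step) - at_risk_with(layer, base)) / step


for layer in BASE:
    print(f"{layer}: slope={slope(layer):.3f}")
print(f"PCC of AtRisk with vegetation = 1.0: {at_risk_with('vegetation', 1.0):.2f}")
```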
An example is shown in Figure 8, in which the relative costs of improving the accuracy of the three sources LandUse, Permits, and Vegetation are expressed by the ratio 0.1:0.5:1 (i.e., the cost of a unit improvement in the accuracy of Vegetation is 10 times that for LandUse and 2 times that for Permits). As the figure shows, LandUse yields the greatest return in derived-layer accuracy per unit investment.

FIGURE 8. Relationship Between Change in PCC of Source Layers and Change in PCC of AtRisk, Weighted by the Cost of Improving Source Accuracy. [Horizontal axis: Δ PCC in input layer.]

It is also possible to identify the solution space of source-layer accuracy combinations that yield the target accuracy level for the derived layer. Given three source layers, the solution space can be represented as a surface whose x, y, and z coordinates are the source-layer PCC values. Any point on the surface gives a combination of PCC values for the source layers that yields the target PCC for the derived layer. Figure 9 shows such a surface for a given target accuracy level. Note that Vegetation is the most critical layer for ensuring accuracy in AtRisk. Either LandUse or Permits (but not both) can be relatively inaccurate without significantly affecting derived-layer accuracy, because these two source layers are passed through the union function, which tends to inflate accuracy (Veregin, 1989a).

FIGURE 9. Solution Space for LandUse, Permits, and Vegetation for a Target PCC of 0.9 for AtRisk.
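The solution surface of Figure 9 can be sampled by solving for one coordinate. The sketch below, assuming the PCC functions given in this paper, the class counts of the worked example, and r = 1 for both reselects, returns the Vegetation PCC needed to reach the target PCC of 0.9 for AtRisk given LandUse and Permits PCCs. Grid cells where the required value exceeds 1 lie outside the solution space; the grid values themselves are arbitrary.

```python
def reselect_k(k, r):
    return (r * (r - 1) + (k - r) * (k - r - 1)) / (k * (k - 1))


def required_vegetation(target, land_use, permits, k_land_use=5, k_vegetation=4, r=1):
    """Vegetation PCC on the solution surface for given LandUse and Permits PCCs."""
    ranches = land_use + reselect_k(k_land_use, r) * (1 - land_use)
    grazing = 1 - (1 - ranches) * (1 - permits)
    oakwoods = target / grazing          # inverse intersect
    K = reselect_k(k_vegetation, r)
    return (oakwoods - K) / (1 - K)      # inverse reselect


GRID = [0.3, 0.5, 0.7, 0.9, 1.0]
print("LandUse \\ Permits " + " ".join(f"{p:>5}" for p in GRID))
for lu in GRID:
    cells = []
    for pe in GRID:
        z = required_vegetation(0.9, lu, pe)
        cells.append(f"{z:5.2f}" if z <= 1 else "    -")
    print(f"{lu:>17} " + " ".join(cells))
```

Under these assumptions the grid shows the pattern described above: Vegetation must reach at least 0.8 everywhere, a low PCC in LandUse or in Permits alone is tolerated, and cells where both are low fall outside the solution space.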
CONCLUSION

Inverse error propagation allows users to examine the implications of variations in source-layer accuracies. This in turn facilitates the exploration of error characteristics and the identification of optimal strategies for quality-enhancement procedures that improve the accuracy of derived data. Inverse error propagation explicitly accounts for the dynamic nature of data-quality assessment in a GIS environment. It recognizes that users have different data-quality objectives, and that the "ideal" level of quality varies between users and from application to application. The capabilities described in this study are designed to allow users to reach their data-quality objectives in the most cost-effective manner and to enhance the usefulness of data-quality information in informing decision making.
REFERENCES

Carver, S. (1991). Adding error handling functionality to the GIS toolkit. Proceedings EGIS '91, 187-196.
Giordano, A., & Veregin, H. (1994). Il controllo di qualità nei sistemi informativi territoriali [Quality control in geographic information systems]. Venice, Italy: Il Cardo.
Goodchild, M. F., & Gopal, S. (Eds.). (1989). Accuracy of spatial databases. Basingstoke, UK: Taylor & Francis.
Heuvelink, G. B. M., Burrough, P. A., & Stein, A. (1989). Propagation of errors in spatial modelling with GIS. International Journal of Geographical Information Systems, 3, 303-322.
Hunter, G. J. (Ed.). (1991). Proceedings, Symposium on Spatial Database Accuracy. Melbourne, Australia: Department of Surveying and Land Information, University of Melbourne.
Lanter, D. (1991). Design of a lineage-based meta-data base for GIS. Cartography and Geographic Information Systems, 18, 255-261.
Lanter, D., & Veregin, H. (1990). A lineage meta-database program for propagating error in geographic information systems. GIS/LIS '90 Proceedings, 144-153.
Lanter, D., & Veregin, H. (1992). A research paradigm for propagating error in layer-based GIS. Photogrammetric Engineering and Remote Sensing, 58, 526-533.
Tomlin, C. D., & Berry, J. K. (1979). A mathematical structure for cartographic modeling in environmental analysis. Proceedings, American Congress on Surveying and Mapping, 269-283.
Veregin, H. (1989a). Error modeling for the map overlay operation. In M. F. Goodchild & S. Gopal (Eds.), Accuracy of spatial databases (pp. 3-18). Basingstoke, UK: Taylor & Francis.
Veregin, H. (1989b). A taxonomy of error in spatial databases (Technical Paper No. 89-12). Santa Barbara, CA: National Center for Geographic Information and Analysis.