Computers & Geosciences, Vol. 21, No. 10, pp. 1163-1176, 1995
Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved
0098-3004/95 $9.50 + 0.00
0098-3004(94)00047-X
MERCURY⊕: AN EVIDENTIAL REASONING IMAGE CLASSIFIER
DEREK R. PEDDLE
Earth-Observations Laboratory, Institute for Space and Terrestrial Science, Department of Geography, University of Waterloo, Waterloo, Ont., N2L 3G1, Canada (e-mail: [email protected])
(Received 11 April 1994; accepted 1 December 1994)
Abstract--MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. The design and implementation of this software package is described for improving the classification and analysis of multisource digital image data necessary for addressing advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets which may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error which cannot be handled using conventional Bayesian approaches. The software uses a nonparametric, supervised approach to classification, and provides a more objective and flexible interface to the evidential reasoning framework using a frequency-based method for computing support values from training data. The MERCURY⊕ software package has been implemented efficiently in the C programming language, with extensive use made of dynamic memory allocation procedures and compound linked list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under the Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating systems. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.

Key Words: Remote sensing, Dempster-Shafer theory, Multisource data, Yukon, Permafrost.
INTRODUCTION

Recent advances in airborne and satellite remote-sensing systems, together with the need to analyze increasingly complex environmental phenomena at regional and global scales, have placed new demands upon remote-sensing image-processing, analysis, and classification procedures (Argialas and Harlow, 1990; Campbell and Cromp, 1990). For example, image data now are available at higher spatial and spectral resolutions and may possess numerous information bands and diverse statistical properties (e.g. hyperspectral imagery). Moreover, when considering applications of greater complexity, image-processing procedures (e.g. computing texture, moment distributions, higher order derivatives), as well as the integration of multisource ancillary data (e.g. digital elevation models, climatological data, thematic data from a geographic information system), may be used with remote-sensing imagery to provide the additional information necessary for the analysis (Franklin, 1989). However, these modern, higher dimensional, and multisource data sets cannot always be handled using conventional image-classification procedures (Srinivasan and Richards, 1990; Peddle, 1993) which are available typically in commercial image-analysis
systems and statistical-analysis packages (e.g. Bayesian maximum likelihood classifiers, linear discriminant analysis). Some of the problems and limitations encountered when using these traditional approaches are as follows.

(1) Multisource data sets may contain information at different scales of measurement (or levels of data). In increasing order of information precision, these are nominal (or thematic data), ordinal (ranked data), interval, and ratio-level data. Conventional classifiers usually can handle only ratio-level data (e.g. remote-sensing imagery). Other types of data, such as directional information, also are not appropriate for input to many conventional classification algorithms.

(2) Most classification algorithms do not have an explicit mechanism to handle data inconsistencies, errors, or uncertainty. This is particularly important with disparate, multisource data sets which may possess differing levels of information quality, or instances of missing or undefined data.

(3) Many conventional classifiers are based on a parametric statistical model and therefore require input data to conform to a Gaussian (normal) distribution; however, higher resolution spectral imagery and many ancillary data types violate this assumption and are inappropriate for parametric algorithms.

(4) Traditional classifiers were not designed for
processing higher dimensional data sets; for example, maximum likelihood classifiers usually are not sufficiently robust to handle more than 7-10 input variables at a time.

In this paper, the implementation of a new, multisource data classification system, MERCURY⊕, is presented for overcoming these problems associated with traditional Bayesian classification algorithms. The MERCURY⊕ software package is based on the Dempster-Shafer (D-S) approach to evidential reasoning (Dempster, 1967; Shafer, 1976) which has been used in a variety of earth resources and geoscience applications, such as geological mapping (Moon, 1990, 1993), water resources (Caselton and Luo, 1992; Peddle and Franklin, 1993), forestry update mapping (Goldberg and others, 1985), remote-sensing land cover classification (Lee, Richards, and Swain, 1987; Srinivasan and Richards, 1990; Peddle, 1993), and marine pattern detection (Rey, Tunaley, and Sibbald, 1993). In addition to using the advantages of the evidential reasoning approach to overcome problems associated with conventional classification algorithms, the MERCURY⊕ software also provides a more objective procedure for computing evidence from multisource data as an improved interface to the D-S framework, compared with previous ad hoc methods (as reviewed in Peddle, 1995). The MERCURY⊕ software is presented here in the context of multisource remote-sensing image classification; however, it could be used for any classification or modeling application involving digital image or attribute table data. In the next section, evidential reasoning theory is reviewed briefly and contrasted with some of the fundamental tenets of the traditional Bayesian approach to statistical inference; this is followed by a detailed account of how the MERCURY⊕ classification system has been implemented in software. Prior to concluding the paper, the application of the MERCURY⊕ software is illustrated for land cover and permafrost classification in a sub-Arctic alpine setting, with the land cover results compared with two conventional Bayesian classification algorithms.
EVIDENTIAL REASONING

Dempster-Shafer theory of evidence
The mathematical theory of evidence was proposed first by Shafer (1976) as an extension and refinement of Dempster's Rule of Combination (Dempster, 1967). The theory provides a general and heuristic basis for integrating distinct bodies of information from independent sources. Although the theory is general in nature and could be applied to any problem which requires the pooling of information to determine the best answer from a set of choices, it will be cast here in the spatial realm and presented in terms of the nomenclature and principles of remote-sensing classification.
For a given pixel (set of data values from different sources for a discrete image unit), the task of a classification is to assign the pixel to one member within a set of classes. In the theory of evidence, the set of all possible classes constitutes the explicit framework within which discrimination is to occur, and is referred to as the frame of discernment (denoted by Θ). Evidence can be associated with any combination of set members, including those sets which contain multiple classes. Although most remote-sensing classifications are concerned only with individual classes, or singleton sets, the theory of evidence also lends itself well to considering hierarchical class structures (e.g. Wilkinson and Megier, 1990; Srinivasan and Richards, 1990). The evidence in support of a given class labeling is referred to as the mass committed to that class (usually a real number between zero and one, inclusive). For a given data source, the set of all masses over the frame of discernment is a belief function, which in this research also is termed an evidential vector. In addition to evidential support, the theory considers a measure of plausibility, or the amount of evidence which fails to refute a proposition. The plausibility represents the upper probability or maximum evidence in favor of a proposition (Goldberg and others, 1985) and is calculated as one minus the support for all other propositions (Shafer, 1976). In the context of a remote-sensing classification, plausibility for class Cn would be computed as 1 - S(¬Cn), where S represents evidential support. The "true" likelihood of a proposition lies somewhere between its support and plausibility measures (Lee, Richards, and Swain, 1987), which is termed the evidential interval of a mass (Garvey, 1987). Use of the evidential interval allows both the support in favor of a class labeling and the associated level of uncertainty to be included in a decision rule.

Bayesian versus evidential theory

Although the Dempster-Shafer theory is a generalization of the Bayesian theory of statistical inference (Shafer, 1976), it has some fundamental and important differences which overcome several problems associated with traditional probability theory. For example, Bayesian likelihood values usually are represented by a single-point probability value which tends to overstate the precision of the knowledge it represents (Garvey, Lowrance, and Fischler, 1981). A simple illustration of this is if no information is available regarding two possibilities that are exclusive and exhaustive. In the Bayesian framework, a probability of 0.5 usually is assigned arbitrarily to each possibility (Garvey, Lowrance, and Fischler, 1981); however, this does not portray adequately the lack of knowledge for the propositions under consideration. In contrast to this, the Dempster-Shafer representation has a built-in mechanism to handle ignorance of this nature through its specification of evidential intervals and uncertainty values. Furthermore, when working
with heuristic information, a probability of 0.40 for one proposition does not necessarily indicate that its negation should be assumed to have a probability of 0.60, as in Bayesian theory (Srinivasan and Richards, 1990). This important distinction between lack of belief and disbelief is handled in the D-S approach using uncertainty measures which permit evidence to be assigned to the entire set of classes being considered (the frame of discernment Θ). This provides a more accurate representation of the situation when some or all of the available information is insufficient to decide among any of the propositions. Within a hierarchical framework, this ability to assign belief to sets of propositions also allows the system to suspend judgment. As a result of these capabilities, the D-S approach can provide a more realistic representation of the knowledge available for decision making by avoiding some of the oversimplifications inherent to traditional Bayesian probability theory. These advantages are important particularly when concerned with disparate information which typically contains different levels of precision, importance, quality, or relevance to the problem at hand (e.g. classification of multisource remote-sensing imagery and ancillary data).

Knowledge specification

One of the difficulties in applying the mathematical theory of evidence to a given application is that the evidential framework does not include a specification of how measures of evidential support and plausibility are created or derived for input to the procedure (this also affects the computation of uncertainty measures). In some instances, this is not a problem if the available information can be considered appropriate evidence for direct input to the theory of evidence. However, this may not be the situation and an interface to evidential reasoning must be created to derive the necessary support and plausibility values. Obviously, the ability of such an interface to translate adequately the available information into evidence is critical to realizing the advantages offered by evidential reasoning and will affect the degree of success obtained. Remote-sensing image data do not provide direct measures of evidence for input to the mathematical theory of evidence and therefore a separate process to derive these measures is required. To date, however, the methods devised for determining evidence from remote-sensing imagery and multisource data have been subjective, limited, and data or application specific (see the review in Peddle, 1995). As a result, the full power of evidential reasoning for image analysis and classification has not yet been realized. To overcome this significant problem, a more objective, comprehensive approach has been designed as a central aspect of the MERCURY⊕ software. This design is presented in detail in Peddle (1995), and is outlined briefly here as a necessary precursor
to understanding its implementation in software (discussed in "MERCURY⊕ software design"). In the MERCURY⊕ software package, measures of evidence are computed from training data within a supervised classification framework. This type of classification involves the image analyst determining a set of classes and identifying representative areas within an image for each class ("training the classifier"). In the MERCURY⊕ classifier, the frequency of occurrence of individual pixel values constitutes the basis to form the evidential vectors for all classes. The underlying premises to this method are that training data contain evidence for a set of classes, and that the frequency of occurrence of a given value in the training set represents the magnitudes of support for those classes. For quantitative data (i.e. at interval or ratio scales of measure), a bin structure can be placed over a frequency distribution to extend training data knowledge for a greater dynamic range of the data source and reduce any bias associated with individual frequency counts. This bin transformation approach permits evidence to be gathered consistently and objectively from multisource data of differing type, format, and scale of measurement (e.g. nominal-level thematic GIS data, ordinal-level forest information, directional-aspect data, ratio-level remote-sensing imagery). The user also has the option to specify different weighting factors for individual sources, if there is sufficient a priori information about the relative importance of each data source for the classification. For each source, different weights also may be assigned for each class. This would be important, for example, in a high-relief environment where topographic slope and aspect information may be known to be critical for alpine classes (e.g. from field work, stereo aerial photography, maps, or reports), but less important for other classes located predominantly in flat, low-lying areas such as river valleys. In the MERCURY⊕ software, evidence is stored and accessed efficiently in a knowledge look-up table (K-LUT). This K-LUT is a central component of the MERCURY⊕ software; its data structure design and implementation is discussed in full later in this paper.

Orthogonal summation

Once a set of evidential vectors has been assembled for a pixel in a multisource data set, the task remains to combine the evidence from all sources into a manageable, one-dimensional format containing one measure of support per class. The decomposition of source-specific evidential vectors into one resultant mass function is achieved by orthogonal summation using Dempster's Rule of Combination (Dempster, 1967). This powerful rule can be applied to any number of sources, each of which contains evidence for a set of labels (which may differ by source, but must all be subsets from the same frame of discernment). The orthogonal summation (⊕) of evidence from two sources works by sequentially multiplying
the evidence for a given class from one source by the evidence for each class from the next source. Each product of evidence for a given class labeling then is normalized by the sum of the products obtained through multiplication of evidence from nonintersecting classes. The general form of the equation for computing the orthogonal sum of source 1 (with mass m1 over a set of labels A) and source 2 (with mass m2 over a set of labels B) to determine the mass m' assigned to a labeling proposition C is computed using the equation:

    m'(C) = K⁻¹ Σ[Ai ∩ Bj = C] m1(Ai)·m2(Bj)     (1)

where K is defined as:

    K = 1 - Σ[Ai ∩ Bj = ∅] m1(Ai)·m2(Bj).     (2)

The normalizing constant K⁻¹ corrects for any mass that was committed to the empty set (∅), and is also the extent of conflict between the two sources (Shafer, 1976). Orthogonal summation of additional sources is achieved by repeated application of these formulae.

In this research, evidence from each source is compiled over the same set of (singleton) class labels in the frame of discernment. As a result, for a set of singleton class labels A, the orthogonal sum equation to determine the new mass m' assigned to the nth labeling proposition of A can be rewritten as:

    m'(An) = K⁻¹ Σ[Ai ∩ Aj = An] m1(Ai)·m2(Aj)     (3)

    K = 1 - Σ[Ai ∩ Aj = ∅] m1(Ai)·m2(Aj).     (4)

Table 1 provides a conceptual general form of orthogonal summation for two sources over a set of singleton class labels A. The evidential vector for source 1 is represented in the top row of the table by the mass function m1, whereas the evidential vector from source 2 appears in the left column as m2. For example, following Equations (3) and (4), the combined evidence in support of class 1 [Σm(A1) in Table 1] is calculated as the sum of the following three entries in the table: (i) the product of evidence from each source [m1(A1)·m2(A1)]; (ii) the evidence from source 1 multiplied by the evidence from source 2 assigned to the frame of discernment [m1(A1)·m2(Θ)]; and (iii) the evidence from source 2 multiplied by the evidence from source 1 assigned to the frame of discernment [m2(A1)·m1(Θ)]. The latter two products are included in the computation because both terms (from the source and the frame) possess evidence for label A1. The product sum then is normalized by the total amount of evidence not committed to the empty set (∅). This is determined as one minus the sum of evidence from products of nonintersecting classes [e.g. m1(A2)·m2(A1), shown as one of the ∅ entries in Table 1]. Essentially, the combined evidence for a given label An is the quotient of the evidence from all sources for class An by the evidence from all sources for all classes. The uncertainty measure for a source is the mass [1 - Σm(A)] not ascribed to any class, and is assigned necessarily to Θ (after Garvey, Lowrance, and Fischler, 1981).

Table 2 presents the orthogonal summation of evidence from an example multisource data set (Peddle, 1995) to illustrate the decomposition of conflicting evidence into a decision. Each of the three sources favors a different class (bold entries in Table 2), and as a result, there is no clear or a priori consensus regarding a class label preference.
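The repeated pairwise combination described by Equations (3) and (4) is straightforward to express in code. The following C fragment is not taken from the MERCURY⊕ source; it is a minimal sketch of Dempster's Rule for singleton classes, with all names (for example, orthogonal_sum, EvVector, and NCLASS) invented for this illustration. Each evidential vector holds one mass per class plus an uncertainty mass assigned to the frame of discernment Θ.

    #include <stdio.h>

    #define NCLASS 3   /* number of singleton classes (illustrative) */

    /* An evidential vector: one mass per class plus the mass assigned
       to the frame of discernment (uncertainty). Masses sum to 1.     */
    typedef struct {
        double m[NCLASS];   /* support for each singleton class */
        double theta;       /* mass committed to the frame, Theta */
    } EvVector;

    /* Combine two evidential vectors with Dempster's Rule (Eqs. 3 and 4).
       For singleton classes, class i receives the products
       a.m[i]*b.m[i], a.m[i]*b.theta, and a.theta*b.m[i]; products of
       differing classes go to the empty set and are removed by the
       normalizing constant 1/K.                                        */
    static EvVector orthogonal_sum(EvVector a, EvVector b)
    {
        EvVector out;
        double K = a.theta * b.theta;   /* Theta x Theta stays with Theta */
        int i;

        for (i = 0; i < NCLASS; i++) {
            out.m[i] = a.m[i] * b.m[i]
                     + a.m[i] * b.theta
                     + a.theta * b.m[i];
            K += out.m[i];              /* total mass not in the empty set */
        }
        out.theta = a.theta * b.theta;

        for (i = 0; i < NCLASS; i++)    /* normalize by K (Eq. 4) */
            out.m[i] /= K;
        out.theta /= K;
        return out;
    }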
Table 1. General form of orthogonal summation for singleton classes for a frame of discernment Θ with a set of labels {A1, ..., Ai}

                                Source 1
                 m1(A1)          m1(A2)          ...  m1(Ai)          m1(Θ)
Source 2
  m2(A1)    m1(A1)·m2(A1)   ∅               ...  ∅               m1(Θ)·m2(A1)
  m2(A2)    ∅               m1(A2)·m2(A2)   ...  ∅               m1(Θ)·m2(A2)
  ...
  m2(Ai)    ∅               ∅               ...  m1(Ai)·m2(Ai)   m1(Θ)·m2(Ai)
  m2(Θ)     m1(A1)·m2(Θ)    m1(A2)·m2(Θ)    ...  m1(Ai)·m2(Θ)    m1(Θ)·m2(Θ)

1 ⊕ 2:      Σm(A1)          Σm(A2)          ...  Σm(Ai)          Σm(Θ)
× K⁻¹:      K⁻¹Σm(A1)       K⁻¹Σm(A2)       ...  K⁻¹Σm(Ai)       where K = 1 - Σm(∅)
Table 2. Example orthogonal summation of evidence from three sources. The entry marked * denotes, for each source, the class with greatest support

Source 1 ⊕ Source 2:

                            Source 1
                 C1 (.13)    C2 (.22)    C3 (.35)*   Θ (.30)
Source 2
  C1 (.26)*      .0338       .0572       .0910       .0780
  C2 (.085)      .0111       .0187       .0298       .0255
  C3 (.17)       .0221       .0374       .0595       .0510
  Θ  (.485)      .0631       .1067       .1698       .1455

  Σm:            .1749       .1509       .2803       .1455      K = .7515
  × K⁻¹ (1 ⊕ 2): .2327       .2007       .3729       .1937

(1 ⊕ 2) ⊕ Source 3:

                            Source 1 ⊕ Source 2
                 C1 (.2327)  C2 (.2007)  C3 (.3729)  Θ (.1937)
Source 3
  C1 (.12)       .0279       .0241       .0447       .0232
  C2 (.13)*      .0303       .0261       .0485       .0252
  C3 (0)         0           0           0           0
  Θ  (.75)       .1745       .1505       .2797       .1453

  Σm:            .2256       .2018       .2797       .1453      K = .8524
  × K⁻¹ ((1 ⊕ 2) ⊕ 3):      .2647       .2367       .3281
  + Plausibility:           .4352       .4072       .4986
  = Support + Plausibility: .6999       .6439       .8267      → Label: Class 3
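As a usage illustration, the orthogonal_sum sketch given earlier can be driven with the Table 2 masses; the combined supports agree with the bottom rows of the table to within the rounding of the intermediate values. Again, this is an illustrative fragment rather than the published MERCURY⊕ code.

    int main(void)
    {
        /* Evidential vectors from Table 2 (three classes plus Theta). */
        EvVector s1 = { { .13, .22, .35 }, .30 };
        EvVector s2 = { { .26, .085, .17 }, .485 };
        EvVector s3 = { { .12, .13, 0.0 }, .75 };

        EvVector combined = orthogonal_sum(orthogonal_sum(s1, s2), s3);

        /* Prints approximately 0.2647 0.2368 0.3281, matching Table 2
           apart from rounding of the intermediate masses.             */
        printf("%.4f %.4f %.4f\n",
               combined.m[0], combined.m[1], combined.m[2]);
        return 0;
    }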
As shown in Table 2, the orthogonal sum of sources 1 and 2 is taken first, and the result (1 ⊕ 2) is used in orthogonal summation with source 3. By the commutativity of multiplication, the orthogonal summation from different sources can proceed in any order; so, for example, the same results would be obtained from (3 ⊕ 1) ⊕ 2. Sequential orthogonal summation ceases when each source has been processed; the result is one evidential vector of class supports which represents a consolidation of all the information from the various sources over the frame of discernment in an appropriate form for the application of a classification decision rule.

Decision rule
After the evidence from each information source has been combined by repeated application of the orthogonal sum rule, a decision rule is applied to the mass function to classify the pixel into one of the classes within the frame of discernment. The decision rule may be based on maximum support (e.g. Lee, Richards, and Swain, 1987; Wilkinson and Megier, 1990), where the class with the highest support is assigned as the pixel label. Plausibility also has been adopted in classification decision rules. For example, Kim and Swain (1989) used maximum plausibility as a decision rule, and Srinivasan and Richards (1990) used plausibility as a tie-breaking criterion within a maximum support decision rule applied to hierarchical class structures. In the MERCURY⊕ software, measures of both support and plausibility are included in the decision rule. Class allocation is based on the sum of support and plausibility for each class, with the pixel labeled as the class with the greatest sum. The plausibility measure is incorporated directly into the main decision rule of the MERCURY⊕ program so that evidence for the entire evidential interval can be considered. Furthermore, the degree of plausibility can be important when considering singleton sets or relatively small subsets of a frame of discernment because the evidence may warrant differing plausibilities for those sets which have equal support values, or which otherwise may not receive any support (Shafer, 1976). The fact that singleton set class structures are used in remote-sensing image classification provided extra rationale to incorporate plausibility explicitly in the decision rule. The required computation for plausibility is minimal, and allows the classifier to account for the range of uncertainty that may exist for the set of labels. It is noteworthy, however, that for a singleton set class structure with one uncertainty measure associated with each evidential vector, the MERCURY⊕ decision rule of maximum support and plausibility can be simplified to one of maximum support. In the Table 2 example in which three sources of data present conflicting evidence for classification, support is computed for each class label through repeated orthogonal summation for all sources, after which the plausibility for each class is derived. Evidential support and plausibility are added, and the maximum sum decision rule assigns the pixel to class 3.
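A compact way to express this decision rule, again as an illustrative sketch rather than the MERCURY⊕ code (classify_pixel and related names are invented here), is to compute the plausibility of each singleton class as one minus the support for all other classes and then take the class maximizing support plus plausibility:

    /* Label a pixel from a combined evidential vector (see the earlier
       EvVector sketch): plausibility of class i is 1 minus the support
       committed to every other class, and the label is the class with
       the greatest support + plausibility.                              */
    static int classify_pixel(EvVector e)
    {
        int i, j, best = 0;
        double best_sum = -1.0;

        for (i = 0; i < NCLASS; i++) {
            double plaus = 1.0;
            for (j = 0; j < NCLASS; j++)
                if (j != i)
                    plaus -= e.m[j];          /* 1 - S(all other classes) */
            if (e.m[i] + plaus > best_sum) {
                best_sum = e.m[i] + plaus;
                best = i;
            }
        }
        return best;                          /* 0-based class index */
    }

For the combined vector of Table 2 this returns the third class (index 2), since .3281 + .4986 exceeds the corresponding sums for classes 1 and 2.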
MERCURY⊕ SOFTWARE DESIGN

Computing environments

The MERCURY⊕ software package (Multisource Evidential Reasoning Classification of K-LUT Probabilities by orthogonal summation {⊕}) has been implemented efficiently in the C programming language (Kernighan and Ritchie, 1978) and runs under the following operating systems: Unix (BSD, RISC/AIX, SunOS), Ultrix (DECstations), VAX/VMS, MS-DOS, and Apple Macintosh. The same version of ANSI Standard code is used in each computing environment, with portability among operating systems ensured through the use of C language preprocessor directives embedded in the program for system-specific functions (usually input/output routines; although this feature could be used to specify hardware-specific features, such as word sizes). Conditional compilation of source code is used to specify the operating system as a numeric command-line argument associated with a defined constant (OS
in the MERCURY⊕ program). For example, in the MERCURY⊕ software Unix is identified as operating system number 3, and commands specific to this operating system are enabled by compiling the program using the -D (define) switch as follows:

    % cc -DOS=3 mercury.c
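The corresponding guard in the source would test the defined constant with the preprocessor. The fragment below is a hypothetical illustration of the mechanism only (the actual MERCURY⊕ system-specific code is not reproduced in the paper); the numbering of Unix as operating system 3 follows the example above, while the other numbers and definitions are invented.

    /* Hypothetical sketch of OS-specific conditional compilation.
       OS is supplied on the compile line, e.g.  % cc -DOS=3 mercury.c  */
    #if OS == 3                      /* Unix                              */
    #include <unistd.h>
    #define PATH_SEPARATOR '/'
    #elif OS == 4                    /* VAX/VMS (illustrative number)     */
    #define PATH_SEPARATOR ']'
    #else                            /* MS-DOS, Macintosh, etc.           */
    #define PATH_SEPARATOR '\\'
    #endif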
Other operating systems use slightly different command-line syntax for conditional compilation, whereas in the Apple Macintosh environment, this is achieved through the menu-driven interface. The MERCURY⊕ software supports a variety of different input data types and formats within these computing environments. For example, supervised training information and classification input data can be processed in either image or attribute table formats. For image data, the MERCURY⊕ software uses a standard and generic single-channel eight bit binary format with no header or trailer information. A conversion routine has been implemented to handle commercial image-analysis system (IAS) data formats, such as the EASI/PACE system (PCI, 1994). Attribute table files consist of columns of ASCII format data from each variable in the data set and are used typically with statistical software packages such as Excel (1992) and SPSS (1986). For the specification of supervised training data, a set of knowledge acquisition modules are used to interface with several commercial image-analysis systems. This takes advantage of standard and existing graphics routines for the delineation of training areas on a video display unit, which are available typically in an IAS. The IAS training data files are reformatted for use in the MERCURY⊕ environment through the external modules. As a result of the portability of the MERCURY⊕ source code, as well as its use of a simple and generic input format and its support of a variety of data types from commercial software systems through specialized functions and external interface modules, the MERCURY⊕ system can be used in a wide variety of computing environments and is compatible with many commercial image-analysis systems and statistical software packages.

Organization
The MERCURY⊕ software package is divided logically into five modules. The three main programs are: (i) USER.C, a user interface; (ii) KLUT.C, which builds the knowledge look-up table; and (iii) MERCURY.C, the evidential classifier. The files GLOBALS.H and FUNCTIONS.H contain variable declarations and a library of program functions accessed by different modules, respectively. Figure 1 shows the organization of the MERCURY⊕ software modules and the flow of analysis for a classification task.

Figure 1. Organization of MERCURY⊕ software.

The software is used by running the programs USER.C, KLUT.C, and MERCURY.C in sequence. All necessary classification parameters are created through the user interface and stored in a data parameter file on disk (DATA.PAR) for input to both the KLUT.C and MERCURY.C programs. The parameter file enables user selections to be archived for future use and reference, and also allows classification to be executed either interactively or in batch mode. In the latter situation, several classifications can be run concurrently while not impeding interactive work. The program KLUT.C reads the parameter file to obtain information necessary for training data input (e.g. number of classes and sources, filenames, formats, etc.). It then builds the knowledge look-up table (K-LUT) by compiling the original frequency distributions of all training data, and writes it to the disk file TA.KLT in ASCII format for subsequent input to the classification program. The raw K-LUT is saved to disk prior to applying any bin transformations or weighting factors to the evidence. This is done so that the frequency distributions from training data need only be computed once, from which any number of classifications could be performed (e.g. testing different bin sizes and source weights to refine a classification). In this way, training data are retained in a compact format in one location for future reference. The MERCURY.C program is the main component of the software. It first reads the data parameter file and the raw training data frequency distributions from the K-LUT disk file, and performs all specified knowledge domain processing (bin transformations, source, and class weighting) to prepare the final K-LUT of evidence. Each pixel to be classified is considered in sequence. An evidential vector over all classes in the frame of discernment is compiled for each data source by accessing the appropriate entry in the knowledge look-up table. The set of evidential vectors is aggregated into one resultant mass function by repeated application of the orthogonal sum rule using the methodology and equations presented earlier. In the final step, the decision rule is applied to the mass function to produce a class labeling for the pixel; this result is written to the output file in the format specified by the user. Once the classification is complete, it can be input to a post-classification accuracy assessment program (KAPPA.C) to provide summary statistics such as percent accuracy and Kappa coefficients of agreement.
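As an aside on the knowledge-domain processing step mentioned above, the bin transformation can be pictured as pooling raw training frequencies into wider bins and normalizing by the class sample size. The routine below is a hypothetical sketch of that idea; the name bin_support and the centred-bin choice are assumptions made for illustration, and the transformation actually used by MERCURY⊕ is detailed in Peddle (1995).

    /* Convert a raw frequency histogram for one class and one source into
       a support value for an observed pixel value: frequencies are pooled
       over a bin of width 'bin_size' centred on the value and divided by
       the class training sample size.                                     */
    static double bin_support(const long freq[], int nvalues,
                              int value, int bin_size, long sample_size)
    {
        long pooled = 0;
        int  lo = value - bin_size / 2;
        int  hi = value + bin_size / 2;
        int  v;

        if (lo < 0)         lo = 0;
        if (hi >= nvalues)  hi = nvalues - 1;

        for (v = lo; v <= hi; v++)        /* pool frequencies over the bin */
            pooled += freq[v];

        return (sample_size > 0) ? (double)pooled / (double)sample_size : 0.0;
    }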
Design criteria
In the method created to overcome limitations imposed by maintaining training data information using statistical models, these data are required instead to be stored explicitly as transformed frequency histograms. Therefore, an efficient method of implementing this storage requirement is essential when the potential volume of information is considered (for example, for n sources and m classes, a total of n x m distributions must be maintained in computer memory). Additional consideration must be given to the fact that the MERCURY⊕ classifier is intended for processing multisource, disparate data for which the number of input sources, their properties, and the desired number of classes is likely to differ widely by application and within a given data set. As a result, there should not be restrictions on image size, the number of variables, or the number of classes, and for each data source, there should not be a limit to the range of digital numbers. To meet these requirements, a knowledge look-up table was designed and implemented in software to facilitate efficient storage and retrieval of computed evidence in memory, and to provide the necessary flexibility and functionality to the user. This functionality was possible using a combination of several advanced data structures which could be constructed within the C programming language, and which allowed a reasonable balance to be reached among memory requirements, storage space, and computational efficiency. The background, design, and implementation of a compound data structure for the knowledge look-up table are described next.

Data structure

A data structure specifies how information is organized and stored internally in computer memory. In general, there are five fundamental operations that may be performed on a data structure: (i) insertion; (ii) deletion; (iii) retrieval; (iv) update; and (v) enumeration (Standish, 1980). In the construction of the K-LUT from training data in the MERCURY⊕ program, evidence is inserted continually into the table and frequency values updated; during classification, evidence is retrieved for each input pixel
value under consideration. Therefore, the K-LUT should be designed to maximize the efficiency of insertion, update, and retrieval operations. The information in the K-LUT includes evidence for a range of values for a set of classes from a number of sources, and therefore could be abstracted as a three-dimensional storage requirement. Perhaps the most apparent and simple data structure to use for this situation would be a common three-dimensional array matrix, with a given piece of evidence indexed by source number, class number, and pixel value. Insertion, update, and retrieval would occur quickly by random access, and programming would be relatively straightforward. Unfortunately, however, arrays almost always have fixed bounds that must be known a priori, and worse, they require contiguous memory for their storage (Kochan, 1983). The former restricts their application in the general situation (i.e. when the number of required array elements is unknown or may differ), whereas the latter can create impractical memory requirements, particularly when many array elements are needed or if internal memory is fragmented. Even if memory space is not an issue (e.g. large mainframe computers), it would be restrictive to limit classification to a predefined maximum image size and data range for a maximum number of sources and classes. Computing resources and allocated memory space invariably would be either insufficient or wasted, depending on the given number of sources, classes, and data range programmed. In the former situation, source code modification and recompilation would be required. One alternative to fixed-bounded arrays is the use of list data structures which do not require contiguous memory for their storage. The C programming language offers a powerful feature for list-based access of memory stores through the use of pointers to achieve memory indirection. A pointer data type contains the physical address in memory of the data item being stored, thereby permitting the item to be accessed indirectly through a memory pointer address operator. The implication of this concept is that related data items may be scattered independently throughout memory, linked together by memory indirection, and accessed as a list of entities. Because memory need not be contiguous to maintain a pointer-based data structure, the full amount of system memory is available to the program regardless of the internal configuration of memory and its state of fragmentation. A second advantage to list-based data structures is that dynamic memory allocation is possible. The C programming language provides a number of memory allocation and reclamation functions which can be called during program execution. This facilitates the efficient use of memory only as it is required, while allowing memory space to be returned to the system when it is no longer needed. The implications of this are, firstly, that only the exact amount of required memory is allocated for the task at any given time, and
secondly, additional memory can become available for reuse during program execution. This functionality can be critical in applications which have significant memory requirements, particularly when these requirements are not constant throughout program execution.

Knowledge look-up table

Figure 2 shows the MERCURY⊕ knowledge look-up table data structure which has been designed to store evidence derived from training data for each class and each source. The K-LUT is comprised of a series of pointer-based linked list structures. The top level (list S in Fig. 2) contains information pertaining to the input data sources. Each node in the linked list is organized as a C-language structure type variable which contains pertinent information about each data source (e.g. the data source file name, the user-specified bin size, missing and undefined values for the data source, and a series of flags which specify the scale of measurement of the data and how missing and undefined data are to be treated during classification). Each linked-list node also contains a pointer to the next source node structure, with the final (nth) node linked back to the first node at the head of the list, thereby creating a circularly linked list data structure at this level. This was implemented to enable efficient and continuous access to the K-LUT during the process of orthogonal summation in which evidence from each source is combined sequentially and repeatedly until all sources have been processed. Because the mathematical law of commutativity holds for the Dempster-Shafer Orthogonal Sum Rule (Shafer, 1976), the order in which evidence is combined from different sources does not matter. As a result, random access to each node is not required. Associated with each source node in the top-level linked list is a second linked list containing training data information for each of the m classes under consideration (Fig. 2). Nodes at this second level are organized as a structure variable, with information fields pertaining to the training data sample size for the given class, and flags signaling the specified configuration for the bin transformation procedure. Similarly to the top-level source nodes, these second-level class nodes are maintained as a circularly linked list. This was developed to reduce the time required to build the K-LUT, because, in most situations, it will not be necessary to traverse the entire list to reach the appropriate class node to be updated.
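To make the arrangement in Figure 2 concrete, the following declarations sketch one way the two levels of circularly linked lists and the per-class hash tables might be expressed in C. The type and field names (SourceNode, ClassNode, KeyNode, and so on) are invented for this illustration and are not the MERCURY⊕ source code.

    #define HASH_SIZE 11                 /* prime table size M (see Eq. 5) */

    /* One sorted chain node per distinct pixel value (search key). */
    typedef struct KeyNode {
        int    key;                      /* pixel value                     */
        double freq;                     /* frequency count / support       */
        struct KeyNode *next;            /* kept in ascending order of key  */
    } KeyNode;

    /* Second-level node: one per class, held in a circularly linked list. */
    typedef struct ClassNode {
        long     sample_size;            /* training pixels for this class  */
        int      bin_flags;              /* bin transformation settings     */
        KeyNode *hash[HASH_SIZE];        /* chained hash table of evidence  */
        struct ClassNode *next;
    } ClassNode;

    /* Top-level node: one per data source, also circularly linked. */
    typedef struct SourceNode {
        char   filename[80];             /* data source file name           */
        int    bin_size;                 /* user-specified bin size         */
        int    missing_value;            /* code for missing/undefined data */
        int    scale_flags;              /* scale of measurement, etc.      */
        ClassNode *classes;              /* head of the class list          */
        struct SourceNode *next;
    } SourceNode;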
Figure 2. Data structures used in the MERCURY⊕ Knowledge Look-Up Table (K-LUT). Evidence from training data for n sources and m classes is stored and accessed efficiently using two levels of circularly linked lists and hash tables consisting of small arrays and one-way linked lists maintained in ascending order of pixel-value search keys.

Knowledge retrieval
As mentioned in the previous section, the first-level source lists and second-level class lists are referenced in sequential order when building the K-LUT from training data, and when retrieving information from it for classification. These two levels of circularly linked lists serve to locate the area in memory where evidence resides for a given source and class of interest. Once these two lists have been traversed to the appropriate pool of evidence, the task becomes one of needing rapid, random access to the quantity of evidential support for a given pixel value under consideration (from a given source, and with respect to a given class). One solution would be to use a one-to-one mapping to an array data structure, where each possible pixel value combination would form the indices for direct access within a matrix. This approach has been used successfully in table look-up algorithms by Bolstad and Lillesand (1991), Ahearn and Wee (1991), and Scarpace and others (1992) for improving and optimizing the computational efficiency of maximum likelihood classifications by removing the redundancy of classifying frequently occurring pixel combinations. These implementations worked well for a limited number of satellite image bands (three to six) for which the input data domain was fixed and known (all bands comprised 8-bit data, ranging from 0 to 255). However, multisource data sets may contain larger numbers of nonredundant features, each of which may possess an unknown or variable
absolute numeric range of digital values (e.g. 8- or 16-bit image values, or nominal data with only several different values). Therefore, a simple one-to-one mapping to an array would be impractical because the required number of array dimensions and the appropriate array sizes would be both variable and data dependent, and would likely exceed (or waste) available contiguous memory. To overcome these problems, a hash table (Knuth, 1973) was implemented based on a compound array/list data structure (Fig. 2). A hash table allows quick, random access to evidence by storing data referenced by search keys (in this situation, pixel values) at memory addresses computed as a function of the search key itself. This type of memory organization permits search keys with an unlimited numeric range to be stored within a known amount of storage. Two issues in computational searching must be resolved to achieve an efficient hash table implementation (Standish, 1980): (i) locate a hashing function, h(K), which minimizes the number of collisions (or different values mapped to the same memory location) for a distribution of possible input keys K and a target address space (or table size) M; and (ii) devise a collision resolution policy for locating a given key amongst those with which it has collided. In the MERCURY⊕ software, the modulus division method was used as a hashing function, with a table size M = 11. This method was selected based on recommendations in Knuth (1973) and Standish (1980) and following the results of a study by Lum, Yuen, and Dodd (1971) in which the division method with a prime divisor was shown to perform best among eight different hashing functions tested. Evidence for a given input pixel K is hashed to an address in memory computed using the hashing function:

    h(K) = K MOD M.     (5)

A relatively small table size was selected to minimize the amount of contiguous memory allocated for each of the n sources and m classes of Figure 2 (a total of n x m hash tables are required). A prime number was selected for M because this is more likely to produce a more even distribution of addresses over the hash table (Standish, 1980). A collision resolution by chaining policy (Kruse, 1987) was implemented to ensure that hash-table overflow will not occur (unless the system itself runs out of memory), because new memory is allocated dynamically and only as needed when a collision occurs. Each slot of the hash table contains a pointer to a linked list, which is maintained in ascending order to expedite insertion and retrieval from the K-LUT. When inserting into the K-LUT, the hashing function is first invoked, and the new value is inserted either as a new node at the appropriate place along the sorted list (which may be at the head of the list if no collisions have occurred), or, if the key value exists already in the list, its frequency of occurrence (f in Fig. 2) is incremented. During the histogram bin transformation process, the entire data
structure is traversed with existing nodes updated or new nodes inserted as required. When retrieving evidence from the K-LUT, the pixel value is hashed to a slot and the sorted list is traversed until the search key which contains the evidence is located. If the key is not located in the sorted list, then there is no evidence for it from the training data, and zero support is assigned for that class.
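The insertion and retrieval logic just described might look like the following sketch in C, reusing the hypothetical KeyNode chains from the earlier declarations; as before, this is an illustration of the technique (division hashing with chaining into sorted lists), not the MERCURY⊕ source. For example, a pixel value of 137 hashes to slot 137 mod 11 = 5.

    #include <stdlib.h>

    /* Add one training observation of pixel value 'key' to a class's
       hash table, or increment its frequency if it is already present.
       Chains are kept in ascending key order.                           */
    static void klut_insert(KeyNode *table[], int key)
    {
        int slot = key % HASH_SIZE;               /* h(K) = K MOD M        */
        KeyNode **p = &table[slot];

        while (*p != NULL && (*p)->key < key)     /* walk the sorted chain */
            p = &(*p)->next;

        if (*p != NULL && (*p)->key == key) {
            (*p)->freq += 1.0;                    /* existing key: update  */
        } else {
            KeyNode *n = malloc(sizeof *n);       /* new key: insert node  */
            n->key  = key;
            n->freq = 1.0;
            n->next = *p;
            *p = n;
        }
    }

    /* Retrieve the evidence for a pixel value; zero support is returned
       when the key was never observed in the training data.             */
    static double klut_retrieve(KeyNode *table[], int key)
    {
        KeyNode *n = table[key % HASH_SIZE];
        while (n != NULL && n->key < key)
            n = n->next;
        return (n != NULL && n->key == key) ? n->freq : 0.0;
    }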
EXAMPLE APPLICATIONS

Land-cover classification

Two classification tasks were designed to test the ability of the MERCURY⊕ software for classifying land cover and permafrost from multisource digital data sets [see Peddle (1993), and Peddle and Franklin (1993), for a full account of these experiments]. The study was conducted in a 10 x 10 km area of mountainous terrain in the southwest Yukon Territory, Canada. The digital data sources for this study include a multispectral SPOT satellite image acquired 21 July 1990, and a dense grid 22 m digital elevation model (DEM). Measures of spectral image texture (angular second moment, inverse difference moment, entropy) were processed from each SPOT image band using a spatial cooccurrence algorithm (Franklin and Peddle, 1987), and geomorphometric measures of slope, aspect, incidence, relief, cross-slope convexity, and down-slope convexity were extracted from the coregistered DEM using the GEDEMON software package (Peddle and Franklin, 1990) to provide the additional information necessary to increase land-cover classification accuracy in complex, high-relief areas. Observations of land cover from field work and aerial photointerpretation were compiled for 1693 pixel sites. These pixels were identified in the registered image data sets and divided randomly into a mutually exclusive set of 1238 training and 455 test pixels. The training data were input to the KLUT.C program to build the knowledge look-up table. The entire image then was classified by the MERCURY.C program with classification accuracy determined with respect to the 455 independent test pixels. Table 3 provides a summary of results obtained from an extensive comparison of 36 land-cover classifications using maximum likelihood (ML), linear discriminant analysis (LDA), and evidential reasoning (Peddle, 1993). In that study, it was determined that ML accuracy was highest if a small number of variables was used, and that as additional variables were introduced, accuracy levels declined to a low of 32% when all 12 available multisource variables were used. This was attributed to the sensitivity of ML to increased data dimensionality and violations of its statistical assumptions (several of the texture and geomorphometric variables were not distributed normally). The opposite trend was determined with LDA and ER: accuracy increased as new information was made available to
Table 3. Comparison of land-cover classification using Maximum Likelihood, Linear Discriminant Analysis, and the MERCURY⊕ Evidential Reasoning algorithm for classifying different numbers of input variables from the multisource image data set. Results reported in percent accuracy and using the Kappa Coefficient of Agreement (K), with the highest result for each variable set marked *

Number of    Maximum            Linear Discriminant    MERCURY⊕
variables    Likelihood         Analysis               Evidential Reasoning
             %       (K)        %       (K)            %       (K)
    3        75*    (0.72)      59     (0.53)          61     (0.56)
    6        78     (0.75)      75     (0.72)          85*    (0.83)
    9        55     (0.51)      78     (0.76)          84*    (0.82)
   12        32     (0.29)      84     (0.81)          91*    (0.90)
the classifiers, with LDA reaching a maximum of 84%. However, neither ML nor LDA were able to process the directional aspect variable as a result of their reliance on measures of central tendency. The MERCURY⊕ software was the only classifier able to process all available variables. The highest overall accuracy obtained from all 36 tests was 91% using the MERCURY⊕ classifier. The bin sizes used for each input variable (the only user-defined parameter required) for these evidential classifications ranged from 17 to 25. The process of bin size selection is discussed in full in Peddle (1995), where it was determined that the MERCURY⊕ classifier is not sensitive to changes in bin size, as long as sufficiently large bin sizes are selected for each variable (classification accuracies became constant using bin sizes of 15 and greater for this data set). The resulting land-cover classification map from the MERCURY⊕ classifier is shown in Figure 3 in plan perspective.
Permafrost classification
The occurrence of permafrost (perennially frozen ground) is a complex environmental condition which is difficult to detect and monitor throughout large areas because of its dynamic subsurface and temporal nature. The depth of the permafrost active layer (ground above the permafrost table which undergoes annual freezing and thawing) is an important, physically based surrogate for understanding soil thermal dynamics, predicting alpine slope hazard processes (e.g. debris flows, landslides), and as an input to models of atmospheric [...] from a topographic map to determine equivalent latitude, whereas the land-cover variable was obtained from the optimal evidential classification of land cover discussed in the previous section and shown in Figure 3. Each variable in this multisource data set exists at a different scale of measurement: the land-cover labels are nominal data, aspect is directional information, and equivalent latitude is ratio-level data; the ML and LDA classifiers are not suited to handling nominal or directional data and therefore only the MERCURY⊕ algorithm could be used to classify permafrost. Measurements of permafrost active layer depth were obtained at 507 sites in the field using a soil probe. This sample was divided randomly into 344 training sites and 163 test sites for input to the MERCURY⊕ software. The active layer depth measurements were aggregated into three classes at 25 cm intervals (i.e. < 25 cm; 25-50 cm; > 50 cm), with a fourth class representing absence of permafrost. Bin sizes of 11 and 5 were used for the aspect and equivalent latitude variables, respectively, whereas for the land-cover variable, a bin size transformation was not used because this is inappropriate for nominal-scale data. A classification accuracy of 79% was obtained for 163 test pixels, which was considered to be a good result given the complexity of the application. This level of accuracy also compared favorably with a separate classification of ground data which had an accuracy of 83% (Peddle and Franklin, 1993). The final map of permafrost active layer depth obtained from the remotely sensed variables is shown in Figure 4 in three-dimensional perspective view draped over the digital elevation model.
Figure 3. MERCURY⊕ land-cover classification in plan perspective (north to the top) from 12 multisource variables derived from SPOT imagery and digital elevation model. Classification accuracy 91% (kappa coefficient: 0.90). Classes: Water; White Spruce Forest; Woodland; Upland Shrub; Alpine Shrub; Alpine Tundra; Alpine Barrens; Organic Terrain; Exposed Slopes.

Figure 4. MERCURY⊕ evidential classification of permafrost active layer depth shown in three-dimensional perspective view draped over digital elevation model; view direction is northwest. Classes: light blue: < 25 cm active layer depth; medium blue: 25-50 cm; dark blue: > 50 cm; light red color indicates absence of permafrost.
SUMMARY AND CONCLUSION

The MERCURY⊕ software system has been presented for multisource data classification of digital image data using a new interface to Dempster-Shafer evidential reasoning as an alternative to traditional classification by Bayesian probability theory. The approach overcomes a number of theoretical limitations and conceptual problems with conventional classification algorithms, as follows:

- it can process an unlimited number of variables from higher dimensional data sets (e.g. hyperspectral imagery from airborne and future satellite platforms);
- it is a nonparametric classifier and therefore can handle data which violate the Gaussian assumption of parametric classifiers;
- it has an explicit mechanism for handling information uncertainty, which is important when concerned with disparate, multisource information or databases with inherent error or missing data;
- it can process nominal, ordinal, interval, and ratio-level data, as well as other data types such as directional information;
- in situations of missing or incomplete information, the evidential reasoning approach to classification provides a more accurate and precise representation of the available knowledge compared with the arbitrary assignment of probability used in Bayesian theory.
The MERCURY⊕ software system has been implemented efficiently in the C programming language and has been designed to maximize flexibility, functionality, and ease of use in a variety of computing environments. For example, the MERCURY⊕ software package was shown to be:

- portable to the Ultrix, Unix, VAX/VMS, MS-DOS, and Apple Macintosh operating systems;
- implemented using C-language preprocessor directives and conditional compilation to permit the same version of source code to be used for all operating systems supported;
- compatible with data formats used by a number of commercial image-analysis systems (IAS) and statistical software packages through the use of a simple and generic data format as well as specialized knowledge interface modules;
- compatible with IAS graphics routines for interactive training data delineation;
- capable of processing both image and attribute data in a supervised classification framework;
- efficient in memory allocation and resource usage through linked list and hash table data structures and memory reclamation functions which operate during program execution;
- not limited to a fixed number of data sources, image dimensions, or number of classes, as well as being capable of handling any dynamic range of input data;
- suitable for interactive or batch-mode processing;
- straightforward to use as a result of a complete, functional user interface.
The MERCURY⊕ software package also provides an improved, more objective method for computing evidence for input to the Dempster-Shafer framework, compared with existing, ad hoc implementations. In an example application, the MERCURY⊕ software achieved significantly higher levels of classification accuracy compared with traditional maximum likelihood (ML) and linear discriminant analyses (LDA) of nine land-cover classes in a complex, high-relief environment. It also was used to address successfully the intricate task of permafrost active layer depth classification using a multisource data set which could not be handled by the ML and LDA classification algorithms. From these results, and the theoretical and practical advantages of this new implementation, we conclude that this approach to classification should become increasingly attractive for multisource data analysis, particularly as sophisticated remote-sensing satellite systems continue to evolve (e.g. the NASA Earth Observing System, ERS radar systems, RADARSAT) and new techniques are required to extract their rich information content for complex applications. Additionally, beyond the remote-sensing examples presented here, the generality and portability of the MERCURY⊕ software package could be suitable to a wide variety of current and future environmental, earth resources, and geoscience applications. Readers interested in obtaining the MERCURY⊕ classifier, or other software developed in this research, are welcome to contact the author.

Acknowledgments--This research was supported by the Natural Sciences and Engineering Research Council of Canada, the Eco-Research Tri-Council Secretariat, the Alberta Heritage Scholarship Fund, Forestry Canada, and the Institute for Space and Terrestrial Science. Dr Steven Franklin and Dr Nigel Waters, University of Calgary, Dr Peng Gong, University of California at Berkeley, and the anonymous reviewers provided helpful comments on this research. I thank Dr Ellsworth LeDrew and Greg McDermid for their assistance in producing Figures 3 and 4.
REFERENCES

Ahearn, S. C., and Wee, C., 1991, Data space volumes and classification optimization of SPOT and Landsat TM data: Photogrammetric Engineering & Remote Sensing, v. 57, no. 1, p. 61-65.
Argialas, D. P., and Harlow, C. A., 1990, Computational image interpretation models--an overview and a perspective: Photogrammetric Engineering & Remote Sensing, v. 56, no. 6, p. 871-886.
Bolstad, P. V., and Lillesand, T. M., 1991, Rapid maximum likelihood classification: Photogrammetric Engineering & Remote Sensing, v. 57, no. 1, p. 67-74.
Campbell, W. J., and Cromp, R. F., 1990, Evolution of an intelligent information fusion system: Photogrammetric Engineering & Remote Sensing, v. 56, no. 6, p. 867-870.
Caselton, W. F., and Luo, W., 1992, Decision making with imprecise probabilities--Dempster-Shafer theory and application: Water Resources Research, v. 28, no. 12, p. 3071-3083.
Dempster, A. P., 1967, Upper and lower probabilities induced by a multivalued mapping: Annals of Mathematical Statistics, v. 38, p. 325-339.
Excel, 1992, Excel user's guide: Microsoft Corporation, Redmond, Washington, 640 p.
Franklin, S. E., 1989, Ancillary data input to satellite remote sensing of complex terrain phenomena: Computers & Geosciences, v. 15, no. 5, p. 799-808.
Franklin, S. E., and Peddle, D. R., 1987, Texture analysis of digital image data using spatial cooccurrence: Computers & Geosciences, v. 13, no. 3, p. 293-311.
Garvey, T. D., 1987, Evidential reasoning for geographic evaluation for helicopter route planning: IEEE Trans. Geoscience and Remote Sensing, v. 25, no. 3, p. 294-304.
Garvey, T. D., Lowrance, J. D., and Fischler, M. A., 1981, An inference technique for integrating knowledge from disparate sources: Proc. Seventh Intern. Conf. Artificial Intelligence, Vancouver, Canada, p. 319-325.
Goldberg, M., Goodenough, D. G., Alvo, M., and Karam, G., 1985, A hierarchical expert system for updating forestry maps with Landsat data: Proc. IEEE, v. 73, no. 6, p. 1054-1063.
Kernighan, B. W., and Ritchie, D. M., 1978, The C programming language: Prentice-Hall Software Series, Bell Laboratories, Prentice-Hall, Englewood Cliffs, New Jersey, 228 p.
Kim, H., and Swain, P. H., 1989, Multisource data analysis in remote sensing and geographic information systems based on Shafer's theory of evidence: Proc. Intern. Geoscience and Remote Sensing Symposium/Twelfth Canadian Symp. Remote Sensing, Vancouver, Canada, v. 2, p. 829-832.
Knuth, D. E., 1973, The art of computer programming, v. 3--sorting and searching: Addison-Wesley, Reading, Massachusetts, 722 p.
Kochan, S. G., 1983, Programming in C: Hayden, Hasbrouck Heights, New Jersey, 373 p.
Kruse, R. L., 1987, Data structures and program design: Prentice-Hall, Englewood Cliffs, New Jersey, 586 p.
Lee, T., Richards, J. A., and Swain, P. H., 1987, Probabilistic and evidential approaches for multi-source data analysis: IEEE Trans. Geoscience and Remote Sensing, v. 25, no. 3, p. 283-292.
Lum, V. Y., Yuen, P. S., and Dodd, M., 1971, Key-to-address transform techniques--a fundamental performance study on large existing formatted files: Comm. Assoc. Computing Machinery, v. 14, no. 4, p. 228-239.
Moon, W. M., 1990, Integration of geophysical and geological data using evidential belief function: IEEE Trans. Geoscience and Remote Sensing, v. 28, no. 4, p. 711-720.
Moon, W. M., 1993, On mathematical representation and integration of multiple spatial geoscience data sets: Can. Jour. Remote Sensing, v. 19, no. 1, p. 63-67.
PCI, 1994, Easi/Pace image analysis system user's manuals: Perceptron Computing Inc., Richmond Hill, Ontario, Canada, variously paged.
Peddle, D. R., 1993, An empirical comparison of evidential reasoning, linear discriminant analysis and maximum likelihood algorithms for alpine land cover classification: Can. Jour. Remote Sensing, v. 19, no. 1, p. 31-44.
Peddle, D. R., 1995, Knowledge formulation for supervised evidential classification: Photogrammetric Engineering & Remote Sensing, v. 61, no. 4, p. 409-417.
Peddle, D. R., and Franklin, S. E., 1990, GEDEMON: a FORTRAN-77 program for restoration and derivative processing of digital image data: Computers & Geosciences, v. 16, no. 5, p. 669-696.
Peddle, D. R., and Franklin, S. E., 1993, Classification of permafrost active layer depth from remotely sensed and topographic evidence: Remote Sensing of Environment, v. 44, no. 1, p. 67-80.
Rey, M., Tunaley, K. E., and Sibbald, T., 1993, Use of the Dempster-Shafer algorithm for the detection of SAR ship wakes: IEEE Trans. Geoscience and Remote Sensing, v. 31, no. 5, p. 1114-1118.
Scarpace, F., Selvan, A., Weiler, P., and Seidl, L., 1992, Maximum likelihood classification using a table look-up technique: Proc. ASPRS Conference, Washington, D.C., v. 4, p. 171-174.
Shafer, G., 1976, A mathematical theory of evidence: Princeton Univ. Press, Princeton, 297 p.
SPSS Inc., 1986, SPSS user's guide (2nd ed.): Chicago, Illinois, 988 p.
Srinivasan, A., and Richards, J. A., 1990, Knowledge-based techniques for multi-source classification: Intern. Jour. Remote Sensing, v. 11, no. 3, p. 505-525.
Standish, T. A., 1980, Data structure techniques: Addison-Wesley, Menlo Park, California, 447 p.
Wilkinson, G. G., and Megier, J., 1990, Evidential reasoning in a pixel classification hierarchy--a potential method for integrating image classifiers and expert system rules based on geographic context: Intern. Jour. Remote Sensing, v. 11, no. 10, p. 1963-1968.