An investigation on some sequential algorithms for terrain classification

Y. Huang and P. Zamperoni
Institut für Nachrichtentechnik, Technische Universität Braunschweig, Schleinitzstraße 23, D-3300 Braunschweig, Germany

Pattern Recognition Letters 14 (1993) 523-529, North-Holland, June 1993

Correspondence to: Piero Zamperoni, Institut für Nachrichtentechnik, Technische Universität Braunschweig, Schleinitzstraße 23, D-3300 Braunschweig, Germany. Email: zam@ifn.ing.tu-bs.de

Received 27 February 1992

Abstract

Huang, Y. and P. Zamperoni, An investigation on some sequential algorithms for terrain classification, Pattern Recognition Letters 14 (1993) 523-529.

This paper describes three extensions of the sequential probability ratio test (SPRT) algorithm (Fu (1968)), originally developed as a two-class classifier, to the multi-class classification problem. These sequential methods have been applied to the crop classification in remote sensing images taken from agricultural areas. The performances of the considered methods are compared and the classification scores are given.

Keywords: Sequential probability ratio test, multi-class classification, remote sensing images.

1. Introduction

In pattern recognition, sequential decision procedures have been used besides other statistical classification techniques. The use of a sequential decision process for pattern classification may be advantageous 'if the cost of taking feature measurements is to be considered or if the features extracted from input patterns are sequential in nature' (Fu (1968), p. 14); here we consider especially the case that the cost of taking a feature measurement is high. A sequential probability ratio test (SPRT) method, suggested years ago (Wald (1947)), can be used if there are two pattern classes to be recognized. According to this method, the probability ratio λ_n, considered after the nth sequential measurement of the feature vector X̄ has been taken, is defined as follows:

$$\lambda_n = \frac{p_n(\bar{X}/\omega_1)}{p_n(\bar{X}/\omega_2)} \qquad (1)$$

where p_n(X̄/ω_i), i = 1, 2, is the conditional probability density function of X̄ for the pattern class ω_i.

Note that the SPRT approach does not specify in which one of the following ways the 'nth feature vector measurement' is realized:
• the object of the measurements is always the same physical sample, and n is the number of the extracted features, i.e., every measurement considers one additional feature;
• the dimensionality k of the feature space spanned by X̄ is constant, and for every measurement a new physical sample is considered.

In our case the input data are taken from already segmented remote sensing views (Huang (1991)). The segmentation process resulted in a subdivision of the images into homogeneous regions.


In the classification stage, which constitutes the object of this work, samples are extracted from each region, with the aim of attributing them to one out of a set of known terrain classes. Under these circumstances the second alternative has been chosen within the scope of this work. In fact, it is difficult to extract a great number of independent features (as many as necessary for taking a decision) from the same neighbourhood of an image, but it is relatively easy to extract the same feature vector from a great number of points spread inside a region.

The value of λ_n is then compared with two stopping boundaries A and B. The decision is for X̄ ∈ ω_1 if λ_n ≥ A, and for X̄ ∈ ω_2 if λ_n ≤ B; if B < λ_n < A, a further measurement is taken. The stopping boundaries are given by

$$A = \frac{1 - e_{21}}{e_{12}}, \qquad B = \frac{e_{21}}{1 - e_{12}} \qquad (2)$$

with e_ij the probability of taking the decision X̄ ∈ ω_i when X̄ ∈ ω_j is true, and i, j ∈ {1, 2}.
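For illustration, the two-class test of equations (1) and (2) can be written as the following minimal sketch (not the authors' implementation; the Gaussian class models and the parameter values are assumptions made only for this example):

```python
import math
import random

def sprt_two_class(samples, pdf1, pdf2, e12, e21):
    """Wald's two-class SPRT as in equations (1)-(2); a sketch, not the authors' code.

    samples : iterable of (scalar) feature measurements, assumed independent
    pdf1    : callable, p(x | omega_1)
    pdf2    : callable, p(x | omega_2)
    e12/e21 : tolerated probabilities of deciding omega_i when omega_j is true
    """
    A = (1.0 - e21) / e12            # upper stopping boundary, equation (2)
    B = e21 / (1.0 - e12)            # lower stopping boundary, equation (2)
    lam, n = 1.0, 0                  # running probability ratio lambda_n
    for n, x in enumerate(samples, start=1):
        lam *= pdf1(x) / pdf2(x)     # independent measurements: the ratio is a product
        if lam >= A:
            return 1, n              # decide X in omega_1 after n measurements
        if lam <= B:
            return 2, n              # decide X in omega_2
    return 0, n                      # truncated without a decision

# toy usage with two assumed Gaussian class models
gauss = lambda mu, s: (lambda x: math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi)))
data = (random.gauss(100.0, 10.0) for _ in range(1000))   # measurements drawn from class 1
print(sprt_two_class(data, gauss(100.0, 10.0), gauss(120.0, 10.0), e12=0.05, e21=0.05))
```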

The application of sequential decision procedures to pattern classification was proposed in Fu (1968), where a generalized sequential probability ratio test (GSPRT) was also suggested for the case of more than two pattern classes. The definition of the GSPRT and its decision rules will be given in the next section. This paper presents two new SPRT-based approaches to the multi-class classification and compares their performances with those of the GSPRT for the concrete application of classifying remote sensing data taken from aerial views of agricultural areas, which may also include anomalous kinds of terrain, as for instance soil erosion or mud accumulation. In addition to natural patterns, synthetic patterns generated by means of random processes have also been used as input data, in order to compare the performances of the investigated approaches.

The paper is organized as follows: Section 2 describes the various classifiers; Sections 3 and 4 report on experimental results obtained with the synthetic and with the natural patterns, respectively; Section 5 is dedicated to some conclusions.


2. Approaches to sequential multi-class classification

2.1. Generalized sequential probability ratio test

With m classes (m > 2), the generalized probability ratio λ_n(X̄/ω_i) for the class ω_i after the nth sample is defined in Fu (1968) by:

$$\lambda_n(\bar{X}/\omega_i) = \frac{p_n(\bar{X}/\omega_i)}{\left[\prod_{j=1}^{m} p_n(\bar{X}/\omega_j)\right]^{1/m}} \qquad (3)$$

for i = 1, 2, ..., m. The stopping boundaries A(ω_i) of the classes i = 1, ..., m for the GSPRT are defined analogously to those of equation (2) of the two-classes case:

$$A(\omega_i) = \frac{1 - e_{ii}}{\left[\prod_{j=1}^{m} (1 - e_{ij})\right]^{1/m}} \qquad (4)$$

for i = 1, 2, ..., m, where e_ij is the desired probability of taking the wrong decision X̄ ∈ ω_i when X̄ ∈ ω_j is true (in particular, for i = j it is the desired probability of a correct decision for the class i). If

$$\lambda_n(\bar{X}/\omega_i) \leq A(\omega_i), \qquad i = 1, 2, \ldots, m \qquad (5)$$

is true, the class ω_i is not further considered as a candidate class for the actual data sample. After rejecting the class ω_i, the total number of candidate classes is reduced by one, and a new GSPRT is performed upon the remaining m-1 classes. In this way the classes are rejected one after the other, until only one class is left, which is then accepted as the recognized one.

In Fu (1968) a modified stopping boundary A_n(ω_i), n = 1, 2, ..., N (6), is also proposed, depending upon the stage n of the test, with the aim of obtaining a faster convergence speed, and therefore of reducing the classification time; here N represents the maximum allowed number of samples before the test is forcibly truncated.
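The rejection scheme can be summarized by the following sketch (only an illustration under assumed density models; the boundaries follow equations (3)-(5), and the handling of truncation is a simplification):

```python
import numpy as np

def gsprt(sample_stream, pdfs, e, max_samples=100):
    """Sketch of the GSPRT rejection scheme of Section 2.1 (not the authors' code).

    sample_stream : iterable of feature vectors (or scalars)
    pdfs          : list of m callables, pdfs[i](x) ~ p(x | omega_i)
    e             : m x m array of error probabilities e_ij; the diagonal e[i, i] is the
                    desired probability of a correct decision, as in equation (4)
    """
    m = len(pdfs)
    candidates = list(range(m))
    log_p = np.zeros(m)                       # accumulated log p_n(X | omega_i)
    for n, x in enumerate(sample_stream, start=1):
        for i in candidates:
            log_p[i] += np.log(pdfs[i](x))
        idx = np.array(candidates)
        # generalized ratio (3) in log form, taken over the surviving candidates
        log_lam = log_p[idx] - log_p[idx].mean()
        # stopping boundaries (4), likewise restricted to the surviving candidates
        log_A = np.array([np.log(1.0 - e[i, i]) - np.log(1.0 - e[i, idx]).mean()
                          for i in candidates])
        # rejection rule (5): discard every class whose ratio fell below its boundary
        candidates = [i for i, ll, la in zip(candidates, log_lam, log_A) if ll > la]
        if len(candidates) <= 1 or n >= max_samples:
            break
    if len(candidates) == 1:
        return candidates[0]
    # truncated or all classes rejected: fall back to the largest accumulated likelihood
    return int(np.argmax(log_p))
```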


2.2. Matrix-SPRT: a matrix decision rule

This classifier has been developed on the basis of the two-classes SPRT. The classification is obtained by evaluating a matrix whose elements are the results of single SPRTs, performed on each pair of classes. The matrix is defined to have zero diagonal elements:

$$\begin{pmatrix} 0 & a_{12} & \cdots & a_{1m} \\ a_{21} & 0 & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & 0 \end{pmatrix} \qquad (7)$$

The matrix elements assume the following values:
• a_ij > 0, if the SPRT performed upon the classes ω_i and ω_j decides for ω_j;
• a_ij < 0, if the decision of the SPRT is ω_i.
In the present work two kinds of approaches have been taken for determining the values of the elements a_ij:
1. the elements of the matrix are constants c, so that a_ij ∈ {-c, 0, c};
2. the elements of the matrix are the signed numbers of SPRT iterations necessary for taking a decision between ω_i and ω_j, i.e., a_ij ∈ {-n_ij, 0, +n_ij}.
For both matrices, in the ideal case all the non-diagonal elements of the column corresponding to the recognized class should be positive; for the second type of matrices they should also be as small as possible. Decision criteria based upon the distribution of the values of the matrix elements have been derived in the learning phase.
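A compact sketch of this matrix decision rule is given below (the pairwise_sprt helper is hypothetical and stands for a two-class SPRT returning the winning class and the number of iterations it used; the decision criteria actually derived in the learning phase are replaced here by a simple count of positive column entries):

```python
import numpy as np

def matrix_sprt(m, pairwise_sprt, signed_iterations=True):
    """Sketch of the matrix-SPRT rule of Section 2.2 (hedged; pairwise_sprt is hypothetical).

    pairwise_sprt(i, j) is assumed to run a two-class SPRT between classes i and j
    and to return (winner, n_iter).
    """
    A = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            winner, n_iter = pairwise_sprt(i, j)
            value = n_iter if signed_iterations else 1.0      # second or first approach (c = 1)
            # a_ij > 0 if the pairwise test decides for omega_j, a_ij < 0 if it decides for omega_i
            A[i, j] = value if winner == j else -value
            A[j, i] = -A[i, j]
    # ideal case: every off-diagonal entry of the winning column is positive; as a simple
    # surrogate criterion, pick the column with the most positive entries
    positives = (A > 0).sum(axis=0)
    return int(np.argmax(positives)), A
```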

2.3. Tree-SPRT: a tree decision rule

In this approach, successive SPRTs are performed from the bottom to the top of an oriented tree structure, whose terminal nodes are constituted by the terrain classes considered in the decision process. Within the scope of this investigation, the two tree structures represented in Figure 1 have been realized and tested with m = 7 classes. The value of the tolerable error rate is scaled over the successive SPRT stages, in the sense that it grows from one stage to the next. The class left after the last decision has been considered as the recognized class; a sketch of this stage-wise procedure is given below, after Figure 1.

In the practical realization, particular attention has been devoted to the investigation of those aspects of the decision strategy which can sensibly influence the classification performance. The most important ones among them are:
• The choice between the tree structures (a) and (b) of Figure 1.
• The choice between a 'running sequence' and a 'non-running sequence' of data samples. In the first case, after every SPRT (or after every class rejection in the GSPRT) the next test is started with new data samples, i.e., with new image points belonging to the segmented region under consideration, and features are extracted from these points. In the second case the next test uses the same image samples as the previous test, in the same sequence.
• The choice of the class to be associated with each single terminal node in the tree structures of Figure 1. In order to reduce the classification time, the two classes to be submitted to a SPRT have been chosen so as to favour pairings between the classes with the highest separability.

Figure 1. Tree structures (a) and (b) used for the terrain classification with the tree-SPRT method into 7 classes. Terminal nodes are represented by the classes ω_1, ..., ω_7; internal nodes are SPRTs between two concurring classes.
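The stage-wise procedure can be sketched as follows (a balanced tournament corresponding to structure (a) of Figure 1 is assumed; the pairwise_sprt helper is hypothetical, and the growth of the tolerable error rate over the stages is reduced to a constant multiplicative factor):

```python
def tree_sprt(leaf_pairs, pairwise_sprt, e0=0.01, scale=2.0):
    """Sketch of the tree-SPRT of Section 2.3; pairwise_sprt(i, j, e) is hypothetical and
    returns the winner of a two-class SPRT run with tolerated error rate e."""
    # leaf_pairs: initial pairings of the classes, chosen for high separability,
    # e.g. [(0, 1), (2, 3), (4, 5), (6,)] -- a class without a partner advances directly
    stage = [c for pair in leaf_pairs for c in pair]
    groups, e = leaf_pairs, e0
    while len(stage) > 1:
        winners = []
        for g in groups:
            winners.append(g[0] if len(g) == 1 else pairwise_sprt(g[0], g[1], e))
        # re-pair the winners for the next stage; the tolerated error rate grows per stage
        groups = [tuple(winners[k:k + 2]) for k in range(0, len(winners), 2)]
        stage, e = winners, e * scale
    return stage[0]
```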

Figure 2. Overall block diagram of the terrain classification system, showing the three investigated classifier types.

The block diagram of the investigated classification system, featuring three different classifier types, is outlined in Figure 2. The same system has been used both for synthetic and for remote sensing images.

3. Classification of synthetic images

3.1. Synthetic images and selected features

The synthetic test images have been obtained from generators of independent random noise with normal or equal (uniform) distribution. The parameters controlling the pattern generation are the average grey value and the grey value variance. These quantities represent also the components of the two-dimensional feature vector used in the classification process. Local grey value average and variance for an image sample point are determined inside a small window (typically 5 × 5 points) centered upon the considered sample point.

The learning phase consists of the estimation of the conditional probability density distribution for all ω_i's, by determining the corresponding two-dimensional histograms in the feature space defined above. The estimation has been improved by means of an iterative procedure for smoothing the sparsely-settled regions of the feature space, described elsewhere (Lindemeier (1990)).
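For concreteness, the feature extraction and the histogram-based density estimation can be sketched as follows (window size, histogram bin counts and value ranges are assumptions made for the example; the smoothing procedure of Lindemeier (1990) is not reproduced):

```python
import numpy as np

def local_mean_variance(image, row, col, w=5):
    """Local grey value average and variance in a w x w window centered on (row, col);
    a sketch of the two-dimensional feature vector used for the synthetic images."""
    h = w // 2
    patch = image[max(row - h, 0):row + h + 1, max(col - h, 0):col + h + 1].astype(float)
    return patch.mean(), patch.var()

def feature_histogram(image, sample_points, bins=32, value_range=((0, 256), (0, 1024))):
    """Two-dimensional histogram of (mean, variance) features over learning samples,
    used as an estimate of p(X | omega_i); bin counts and ranges are assumptions."""
    feats = np.array([local_mean_variance(image, r, c) for r, c in sample_points])
    hist, _, _ = np.histogram2d(feats[:, 0], feats[:, 1], bins=bins, range=value_range)
    return hist / hist.sum()                    # normalize to a probability estimate
```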

3.2. Experiments and results

Since the matrix- and the tree-classifier used for the terrain classification are also based upon single SPRTs, first the behaviour of a pure SPRT classifier with the synthetic data has been investigated. An upper limit has been assumed for the number of data samples. The following properties have been observed:
1. Convergence: if the number of samples is large enough (about 10^6 in our case), the classification process converges almost always before the truncation threshold is attained.
2. Dependence of the measured classification error rate R upon the class separability, characterized by the Bayes risk R_B: this dependence is shown in the plot of Figure 3 for a given value of e_ij. The general tendency is that R ≤ e_ij for R_B ≤ e_ij, and R = R_B for R_B ≥ e_ij.
3. Behaviour in the presence of classes not considered in the learning phase: the experiments show that a decision is taken by the classifier also in these cases.

The multi-class classifiers described in Section 2 have also been tested with synthetic images. In order to compare the performances of the various classifiers, the error rates have been measured for the same patterns. The results of the experiments can be summarized in the following points:
1. For not too small values of the admitted error rate e (in our case for e ≥ 0.05), the probability of obtaining R ≤ e is high.

Figure 3. Plot representing the dependence of the classification error rate R upon the Bayes risk R_B, for e = e_ij = e_ji = 0.1.

2. Of all the investigated approaches, the matrix-SPRT delivers the worst classification results. Nevertheless, the matrix-SPRT was the best in recognizing unknown classes; under this aspect the worst performances have been those of the GSPRT.
3. The GSPRT needs the lowest computation time, while the tree-SPRT with non-running sample sequence needs the lowest number of samples.
4. Although the measured average error rates of the tree-SPRT and of the GSPRT are near to each other and both lower than the admitted error rate, the probability of obtaining absolutely very low error rates is higher with the GSPRT than with the tree-SPRT. This behaviour is illustrated by the diagram in Figure 4, where the distribution of the classification error R relative to the three classifier types is plotted for an admitted error rate e = 0.05. This diagram shows also that the matrix-SPRT has the highest probability of exceeding the threshold e.

Figure 4. Distribution of the classification error R for a tolerated error rate of e = 0.05, relative to the three considered classifier types.

4. Terrain classification in remote sensing images

4.1. Terrain types and feature selection

The natural terrain patterns used as input data have been extracted from already segmented remote sensing images of agricultural areas. These areas are predominantly cultivated with seven kinds of crops, which represent also the classes ω_1, ..., ω_7 considered for classification: ω_1 is potato, ω_2 is rye, ω_3 is spring barley, ω_4 is forest, ω_5 is wheat, ω_6 is meadow, and ω_7 is sugar beet; for these classes a learning stage has been performed. Besides these seven classes, an eighth class has been considered, which corresponds to all the unknown terrain types. A classification is correct if a pattern not belonging to the seven known classes is classified as a member of the eighth class.

The classes ω_1 to ω_7 can also be grouped into the two geographically relevant categories of grain crops (ω_2, ω_3, and ω_5) and of non-grain crops (ω_1, ω_4, ω_6, and ω_7). A classification into these two categories is of some interest in relation with geographical information systems.

In the classification process a five-dimensional feature space has been used. This subset of five features is selected from an extended set of statistical features, measured in windows of different sizes on the image, comprising the local average grey value and textural features defined by means of averaged differences between neighbouring pixels in various geometrical relationships (Huang (1991)). The five actually used features have been selected after the learning phase, so as to minimize the Bayes risk.
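As an illustration of this kind of textural feature, the following sketch computes averaged absolute grey value differences for a few assumed pixel displacements (the displacement set, the window handling and the final feature subset used by the authors are not reproduced here):

```python
import numpy as np

def mean_abs_difference(window, dr, dc):
    """Average absolute grey value difference between neighbouring pixels in a fixed
    geometrical relationship (displacement of dr rows and dc columns); a sketch of the
    kind of textural feature described in Section 4.1, not the authors' exact definition."""
    w = window.astype(float)
    h, v = w.shape
    a = w[max(0, -dr):h - max(0, dr), max(0, -dc):v - max(0, dc)]
    b = w[max(0, dr):h - max(0, -dr), max(0, dc):v - max(0, -dc)]
    return np.abs(a - b).mean()

def texture_features(window):
    """Five-dimensional feature vector: local mean plus averaged differences for a few
    assumed displacements (the actual subset was selected to minimize the Bayes risk)."""
    displacements = [(0, 1), (1, 0), (1, 1), (1, -1)]     # assumed geometrical relationships
    return [window.mean()] + [mean_abs_difference(window, dr, dc) for dr, dc in displacements]
```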

4.2. Experiments and results


The performances of the considered classifiers, in relation with the variation of the various parameters, can be summarized as follows:
1. Influence of the size of the window used in defining the features: Generally, the optimality of the selected feature subset depends upon the window size. The feature sets actually used in the experiments have been optimized for a window size of 7 × 7 to 9 × 9 pixels. This size represents a compromise between conflicting requirements: texture stationarity, image resolution, and computational complexity.
2. Total number of data samples needed for the classification: The experiments showed that the matrix-SPRT approach needs a sensibly larger number of samples than the other two methods. For comparable classification scores, the tree-SPRT with a non-running sample sequence needs the lowest number of samples. Figure 5 gives the mean sample size needed by the three classifier types for optimally chosen feature sets, as a function of the tolerated error rate e.
3. Influence of the tolerated error rate e: Figure 5 shows also that the required sample size decreases with increasing values of e. However, the rate of this decrease is slow, and raising the error acceptance threshold does not result in a rewarding cut of the sample size. As for the influence of e upon the classification scores, this dependence varies from one feature set to the other. The general tendency is that of a monotone variation (either falling or growing) of the classification score with growing values of e. However, this variation could be observed only for values of e > 0.1.

The assessment of the unsupervised classification results obtained in this investigation has been possible thanks to ground truth available from man-made mappings of the cultivations existing on the terrain. Anomalous zones occurring in the test areas, as for instance soil erosion or mud accumulation, have not been evaluated, in order to restrict the assessment only to the seven crop and vegetation classes. This has been possible because such anomalous zones can be detected also by means of a shape analysis (irregular, concave or elongated shapes). Table 1 gives an overview of the unsupervised classification scores for each class, using the same feature set and an acceptance error threshold of e = 0.2.

Table 1
Unsupervised classification scores (%). Rows: true class; columns: assigned class (class 8 = unknown terrain types).

GSPRT
Class    1      2      3      4      5      6      7      8
1        80.0   0.0    20.0   0.0    0.0    0.0    0.0    0.0
2        0.0    88.9   11.1   0.0    0.0    0.0    0.0    0.0
3        9.1    18.2   45.5   9.1    0.0    18.2   0.0    0.0
4        0.0    0.0    0.0    100.0  0.0    0.0    0.0    0.0
5        0.0    0.0    12.5   12.5   50.0   25.0   0.0    0.0
6        0.0    0.0    0.0    0.0    0.0    80.0   20.0   0.0
7        10.0   0.0    10.0   0.0    20.0   0.0    60.0   0.0

Tree-SPRT
Class    1      2      3      4      5      6      7      8
1        60.0   0.0    40.0   0.0    0.0    0.0    0.0    0.0
2        0.0    66.7   33.3   0.0    0.0    0.0    0.0    0.0
3        9.1    18.2   54.4   9.1    0.0    9.1    0.0    0.0
4        0.0    0.0    0.0    100.0  0.0    0.0    0.0    0.0
5        12.5   12.5   12.5   0.0    62.5   0.0    0.0    0.0
6        40.0   20.0   0.0    0.0    0.0    40.0   0.0    0.0
7        10.0   0.0    10.0   0.0    20.0   10.0   50.0   0.0

Matrix-SPRT
Class    1      2      3      4      5      6      7      8
1        40.0   0.0    20.0   0.0    0.0    20.0   0.0    20.0
2        0.0    33.3   33.3   11.1   22.2   0.0    0.0    0.0
3        18.2   0.0    27.3   9.1    0.0    36.4   0.0    9.1
4        0.0    0.0    50.0   50.0   0.0    0.0    0.0    0.0
5        0.0    0.0    0.0    0.0    12.5   62.5   0.0    25.0
6        0.0    0.0    0.0    0.0    0.0    60.0   20.0   20.0
7        0.0    0.0    9.1    0.0    27.3   9.1    45.5   9.1

Figure 5. Mean sample size needed by the three classifier types, for optimally chosen feature sets, as a function of the tolerated error rate e (●: matrix-SPRT, ○: GSPRT, ·: tree-SPRT).


Table 2
Overall classification scores (%) and average number of samples needed for the classification.

                              GSPRT         Tree-SPRT     Matrix-SPRT
Supervised (7 classes)        100           100           85.7
Unsupervised (7 classes)      72            61.9          38.4
Unsupervised (2 categories)   75.4 / 87.5   86.7 / 77.5   42.9 / 61.1
Average number of samples     4.3           1.3           32.8

The size of the data sample has been limited to 100. The image data used for obtaining these results originate from 129 already segmented regions of a set of 23 remote sensing views of a larger test area. The overall mean scores of a supervised, as well as of an unsupervised classification, relative to the seven crop and vegetation classes under the same conditions as above, are summarized in Table 2 (from Huang and Zamperoni (1991)), together with the scores of the classification into the two categories grain/non-grain. This table gives also the average number of samples needed for the classification. The scores of the unsupervised classification by means of the tree-SPRT and of the matrix-SPRT can be raised to 63.1% and to 44%, respectively, by taking e = 0.05. The scores obtained with the matrix-SPRT classifier can also be improved by selecting another feature space, but generally the convergence becomes very slow.

5. Conclusions

Three types of sequential classifiers have been presented and their performances have been experimentally investigated both with synthetic and with natural image data. The results show that the GSPRT classifier (known from previous works) and the matrix- and tree-SPRT classifiers (developed within the scope of this work) can be useful for the multi-class classification of very difficult material, as for instance the remote sensing aerial views of agricultural areas considered in this work. Satisfactory classification scores can generally be attained with the GSPRT and with the tree-SPRT classifiers; the performances of the matrix-SPRT classifier have been found to be inferior to those of the first two. The GSPRT classifier features the shortest computation time, while the tree-SPRT with non-running sample sequence needs the smallest sample size. The latter type of classifier is also preferable if only a small amount of samples is available, as for instance with small cultivated fields.

References

Fu, K.S. (1968). Sequential Methods in Pattern Recognition and Machine Learning. Academic Press, New York.
Huang, Y. (1991). Segmentierung von landwirtschaftlichen Luftbildern nach strukturellen und statistischen Texturmodellen. Fortschritt-Berichte VDI, Reihe 10, Nr. 184. VDI-Verlag, Düsseldorf.
Huang, Y. and P. Zamperoni (1991). Terrain classification by sequential algorithms. In: B. Radig, Ed., Mustererkennung 1991. Springer, Berlin, 239-243.
Lindemeier, U. (1990). Untersuchung und Erprobung eines sequentiellen Mustererkennungsverfahrens. Master Thesis, Institut für Nachrichtentechnik, Braunschweig University, June 1990.
Wald, A. (1947). Sequential Analysis. Wiley, New York.
