Photogrammetria, 39 (1984) 135--142
Elsevier Science Publishers B.V., Amsterdam - - Printed in The Netherlands
COMPARING IMAGES OF IMAGE SEQUENCES USING SEGMENTS
B. BARGEL, A. EBERT and D. ERNST
Forschungsinstitut für Informationsverarbeitung und Mustererkennung FIM (FGAN), Breslauer Strasse 48, D-7500 Karlsruhe 1 (F.R. Germany)

(Received March 28, 1984)
ABSTRACT

Bargel, B., Ebert, A. and Ernst, D., 1984. Comparing images of image sequences using segments. Photogrammetria, 39: 135--142.

A new method for the comparison of images (frames) of image sequences is proposed. Instead of image functions (e.g. correlation), symbolic image descriptions referring to segmentation results for single frames are compared. This paper describes the chosen methods for image segmentation, which refer to an iterative binarization (level slicing) and to a region growing (local grey level evaluation) algorithm. The generated segments, characterized by features and attributes, represent a symbolic description and are the basis for the comparison of successive frames. For the matching of segments the hypotheses no changes, pseudo changes, and real changes are tested. The applied algorithms refer to the similarity of shape features, to spatial relationships and to contour points. First results are given for the image segmentation and the image comparison of test scenes.
INTRODUCTION
Object detection and classification in image sequences is an important task with many applications (e.g. industrial robots, tracking of moving objects, homing systems, updating of maps). To reduce the time for data interpretation or to replace the human operator in dangerous situations, automatic systems for image sequence processing are necessary. In the past, systems have been built [1] where object detection was based on a simple comparison of the image functions. Items which cannot be handled by those systems but should be included in the analysis of image sequences are:
(1) Tracking of more than one object at the same time.
(2) Detection of new objects entering the field of view.
(3) Retrieval of objects after temporary occlusion.
(4) Prediction of possible occlusions.
To include these items, it is necessary to apply methods of symbolic image description to the frames of the image sequence and to compare successive frames using these descriptions.
SEGMENTATION

For special imagery (IR data) an image analysis system was developed at FIM [2], where the object extraction is based on a binarization algorithm. This segmentation algorithm relies on the fact that the interesting objects have higher intensity values than the environment. In this case it is possible to separate the objects from the background by means of a binarization of the image using an appropriate threshold. Because of the great variety of object intensities, such a threshold has to be computed automatically for each object. There are well-known methods for finding such object-adaptive thresholds if the contrast between the objects to be separated is fairly high and if some a priori information about the objects is available. As such information could not be taken into account, and objects with smaller contrast against their environment should be considered too, the following procedure was chosen. Beginning with the highest grey level of the unprocessed image as threshold, a starting area and the shape features of this area are generated. The starting area is enlarged by decreasing the threshold until a minimum distance of the shape features between two successive areas is exceeded (significant change). The latter area is accepted as a segment describing an object or a part of an object. It is stored in a memory together with its shape feature vector, its binarization threshold, and its grey level characteristic as a symbolic description. By this procedure all segments are generated sequentially for the single frame, which finally is completely described by a set of feature vectors. This procedure of level slicing, which gives good results for the segmentation of IR imagery, was tested for TV scenes, too, where the objects of interest are not characterized by higher or lower intensity values than the environment. For these data two new aspects must be considered.
First, the extracted segments with homogeneous grey level may describe parts of the objects of interest as well as parts of foreground or background objects. Second, there are segments which are adjacent to segments with higher as well as lower intensity. To get good results for the TV imagery, too, the grey level slicing is modified as follows:
(1) For the binarization, not a grey level threshold but a grey level interval is used, which at first is defined by the grey level of a "starting point".
(2) Starting points are the local maxima of the distances to the nearest contour in the grey level image. The contours are computed by gradient analysis and contour search algorithms [3]. These maxima mark the interior of image areas with a homogeneous grey level distribution. The grey levels of these points are therefore suited for determining the starting interval of the level-slicing procedure.
(3) The interval is alternately extended to lower and higher grey levels. The binarization algorithm stops if the extension of the interval in each of both directions results in a significant change of the extracted areas.
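The interval-based slicing of steps (1) to (3) can be sketched as follows. This is an illustrative reconstruction in Python, not the authors' implementation: the starting point is assumed to be given (in the paper it is a local maximum of the distance to the nearest contour), the grey level range is assumed to be 0--255, and the "significant change" test on shape features is simplified here to a relative jump in the size of the extracted area.

```python
from collections import deque

GREY_MIN, GREY_MAX = 0, 255   # assumed grey level range

def segment_at(image, seed, lo, hi):
    """4-connected component around `seed` whose grey levels lie in [lo, hi]."""
    rows, cols = len(image), len(image[0])
    if not lo <= image[seed[0]][seed[1]] <= hi:
        return set()
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region and lo <= image[nr][nc] <= hi):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

def modified_level_slice(image, start, max_growth=2.0):
    """Extend the grey level interval around the starting point alternately
    downwards and upwards; a direction is closed when extending it would
    change the extracted area significantly (here: grow it by more than
    `max_growth` times), or when the grey level range is exhausted."""
    g = image[start[0]][start[1]]
    lo = hi = g
    region = segment_at(image, start, lo, hi)
    open_dirs = {"down", "up"}
    while open_dirs:
        for side in sorted(open_dirs):
            if (side == "down" and lo == GREY_MIN) or (side == "up" and hi == GREY_MAX):
                open_dirs.discard(side)
                continue
            t_lo, t_hi = (lo - 1, hi) if side == "down" else (lo, hi + 1)
            candidate = segment_at(image, start, t_lo, t_hi)
            if len(candidate) > max_growth * len(region):
                open_dirs.discard(side)   # significant change: stop this side
            else:
                lo, hi, region = t_lo, t_hi, candidate
    return region, (lo, hi)
```

On a small test image with a bright patch on a dark background, the interval grows until the downward extension would suddenly flood into the background, at which point that direction is closed and the patch is returned as one segment.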
Results of this modified grey level slicing, applied to test imagery as given in Fig. 1, are shown in Fig. 2, where segments are displayed by different pseudo-colours. These examples demonstrate that with this algorithm parts of interesting fore- as well as background objects are properly segmented. This is true especially for smaller areas with a homogeneous grey level distribution. Larger and textured areas are not as reliably described. In the larger areas a slight grey level change (grey level drift) exists which should not influence the segmentation by generating an unnecessarily large number of segments describing uninteresting parts of objects. As the level slicing, with its evaluation of changes in the binarization results for different intervals, is very sensitive to such a grey level drift, an additional algorithm is added to the segmentation process. This algorithm refers to methods of region growing with a local evaluation of the grey level distribution. That is, only those new elements are merged with the current region which have a small neighbourhood inside and outside the region with a similar grey level characteristic. The results of segmentation with this algorithm included are given in Fig. 3, which shows that now the larger areas are properly selected, too.

Fig. 1. Test data for the segmentation and matching of frames of an image sequence (locomotive engine, garage, house, etc.).

Fig. 2. Segmentation of the test data using the modified level slicing algorithm (original in colour).
Fig. 3. Improvement of the segmentation by combination of region growing and level slicing (original in colour).
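The region growing with local grey level evaluation can be sketched as below. This is again an illustrative reconstruction: the window size and tolerance are assumed parameters, and the homogeneity test (comparing the local mean around a border pixel with the local mean around the region pixel it touches) is a plausible reading of the description, not the exact criterion used in the paper.

```python
from collections import deque

def local_mean(image, r, c, radius=1):
    """Mean grey level in a (2*radius+1)^2 window clipped to the image."""
    rows, cols = len(image), len(image[0])
    vals = [image[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    return sum(vals) / len(vals)

def grow_region(image, seed, tol=2.0):
    """Region growing with a *local* homogeneity test: a border pixel is
    merged if its local mean differs little from the local mean of the
    region pixel it touches, so a slow grey level drift across a large
    area does not stop the growth."""
    rows, cols = len(image), len(image[0])
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        here = local_mean(image, r, c)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(local_mean(image, nr, nc) - here) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region
```

Because each merge decision compares neighbouring local means rather than the seed's grey level, a drifting but homogeneous area is collected into one region, while a sharp transition to another object blocks the growth.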
SYMBOLIC IMAGE DESCRIPTION

After the application of the "region growing" (for larger areas) and the "modified level slicing", with small gaps of inhomogeneous grey levels closed by postprocessing, the single frames are completely segmented. These segmentation results are the basis for the symbolic description of each frame, which contains in detail the following items:
spectral characteristics (mean value and variance of the grey level distribution)
textural features (second-order grey level statistics [4])
shape features (size, axes of inertia, etc. [2])
position (x-, y-coordinates of the center of gravity)
spatial relationships (distances between centers of gravity, adjacencies)
similarity to other segments in the single frame (Canberra distances [5] between the corresponding shape features).
After matching the frames and relating segments, further items are computed:
motion (segments of stationary objects, of objects with slow motion, etc.)
stability (the change of features for corresponding segments).

MATCHING OF IMAGES

The first step of image sequence processing using a symbolic description is the matching of successive frames. For this, the correspondence between the segments of two frames must be found. To solve the correspondence problem it is assumed that both frames show nearly the same scene at slightly different times. That means the time interval is short enough to ensure that in most parts of the frame the grey level information remains unchanged. For these parts, it is expected that segments were generated which can be matched according to the similarity of their features. In other parts, however, significant changes can occur which result in a different segmentation. Examples of such changes are the merging or occlusion of objects. Different segmentation results may also be caused by grey level fluctuations in textured, low-contrast parts of the images. Therefore the methods to compute the correspondence between the segments of the same object in both frames have to consider the following situations (Fig. 4):
Fig. 4. Schematic representation of the matching situations: (A) no changes; (B) pseudo changes; (C) real changes; (D) pseudo and real changes.
(A) No changes: The segments representing an object or a part of an object are nearly identical in both frames.
(B) Pseudo changes: The segments representing an object or a part of an object in the frames to be compared have a completely different shape due to the sensitivity of the segmentation algorithm to unimportant slight grey level changes. In both frames, however, these segments can be connected to nearly identical "supersegments".
(C) Real changes: The segments representing an object or a part of an object are different in the frames to be compared due to merging or occlusion effects.
(D) Pseudo and real changes: The segments as well as possible "supersegments" representing an object or a part of an object are different in the frames to be compared.

The computational complexity of the methods for matching increases strongly from situation A to situation D. It is therefore preferable to introduce a hierarchical matching procedure where at first the hypothesis is tested that all segments have unchanged partners in the other frame. To do this, the feature vector of each segment is compared to the feature vectors of the segments in the other frame. To reduce the number of comparisons and the number of ambiguities due to similar feature vectors, the matching algorithm can be restricted to parts of the frames. The size and location of these parts depend on the maximum velocity of the moving objects and on the maximum sensor movement allowed. For unambiguous combinations with a sufficient similarity (Canberra distance below a user-specified threshold) between their feature vectors, the segments concerned are considered to correspond. If there are some segments in a frame with nearly the same features and with a sufficient similarity to more than one segment in the other frame, the matching on the basis of single segments becomes ambiguous. Therefore it is necessary to regard the similar segments of each frame as a group. The matching of the single segments is done in such a way that their arrangements within the groups remain unchanged. For the segments which could not be matched using shape features, the occurrence of pseudo changes is anticipated and tested. To do this, segments which are adjacent and have a similar texture are combined (merged into "supersegments"). These "supersegments" are then compared in the same manner as the segments in the first step. The still unmatched segments refer to real object changes in both frames.
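The first matching stage (the "no changes" hypothesis) can be sketched as follows. The sketch is illustrative: the feature vectors, segment ids and the threshold value are hypothetical, the search-window restriction and the group-wise disambiguation of the paper are omitted, and only unambiguous one-to-one pairs are accepted, leaving everything else for the later pseudo-change and real-change stages.

```python
def canberra(u, v):
    """Canberra distance between two feature vectors [5]; zero-zero
    components are skipped to avoid division by zero."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)

def match_segments(frame1, frame2, threshold=0.1):
    """Test the "no changes" hypothesis: pair segments of two frames whose
    Canberra distance is below `threshold`. `frame1`/`frame2` map segment
    ids to shape feature vectors (e.g. size, axes of inertia)."""
    candidates = {}
    for s1, f1 in frame1.items():
        close = [s2 for s2, f2 in frame2.items() if canberra(f1, f2) < threshold]
        if len(close) == 1:                     # unambiguous from frame 1's side
            candidates.setdefault(close[0], []).append(s1)
    # keep only one-to-one correspondences; ambiguous or unmatched segments
    # are handled by the pseudo-change / real-change stages
    return {s1s[0]: s2 for s2, s1s in candidates.items() if len(s1s) == 1}
```

For two frames where one segment keeps nearly the same features while another changes strongly (e.g. by merging), only the stable segment is matched in this stage.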
For these segments the comparison using shape features is not possible because, especially in the case of merging, the segment area may change rapidly from frame to frame. Nevertheless, there are parts of the segment area which remain unchanged and which can be taken as the deciding criterion for the matching of the segments. In order to refer to parts of the segments, the contour information is used, which is extracted from the segment area by means of a polygonal approximation [6]. Thus, the problem of matching areas is transformed into the matching (of parts) of polygons. For the matching of polygons two different algorithms [7], [6] are applied. The first is rather simple and time-efficient; it is only able to match polygons which are unchanged and have enough vertices. The second method is more complex, but it is able to recognize similarities even if the polygons are shorter and changed by affine transformations. The matched contour line defines that part of the segment which is unchanged. To test the final hypothesis that there are real as well as pseudo changes
the possible "supersegments" are generated from the not yet matched segments. The contours of these "supersegments" of the two frames are compared using the contour matching methods mentioned above. The results of these comparisons define the unchanged parts of the corresponding "supersegments". Segments with no correspondence in the other frame after the application of the matching algorithms for situations A to D are considered to describe new or missing objects or significantly changed objects.

CONCLUSION
The results of the matching methods, applied to the segmented frames as displayed in Fig. 3, are given in Fig. 5. In this representation the corresponding segments or "supersegments" are coded with the same pseudo-colours. Segments which are not matched are shown in black. These results demonstrate that segments, "supersegments" included, could be matched by testing the hypotheses A to D. In this simple test scene no new objects appear. The small black parts in the frames, therefore, result from distortions or unimportant segmentation errors. They could be suppressed, recomputed by postprocessing, or taken as new objects. For the test of postprocessing methods and for the matching of frames where new objects appear or objects disappear, further scenes are necessary. Such scenes, taken from TV- or IR-imagery, are under investigation at the moment. In future, the matching of successive frames will be used to generate a history for the image sequence. With this history it will be possible to detect and track objects of interest as well as to discover or to anticipate impacts on the object representation like occlusion by foreground objects.
Fig. 5. Results of the matching algorithm. The correspondent segments are displayed in the same pseudo-colours in both frames (original in colour).
REFERENCES

1. Bers, K.-H., Bohner, M. and Fritsche, P., 1983. Image sequence analysis for target tracking. In: T.S. Huang (Editor), Image Sequence Processing and Dynamic Scene Analysis. Springer.
2. Bohner, M., 1980. Target tracking and target detection in TV and FLIR imagery. AGARD Conf. Proc. No. 290.
3. Kazmierczak, H., 1980. Erfassung und maschinelle Verarbeitung von Bildern. Springer-Verlag.
4. Haralick, R.M., 1979. Statistical and structural approaches to texture. Proc. IEEE, 67: 786--804.
5. Mardia, K.V., Kent, J.T. and Bibby, J.M., 1979. Multivariate Analysis. Academic Press, New York, N.Y.
6. Davis, L., 1979. Shape matching using relaxation techniques. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-1(1).
7. Ranade, S. and Rosenfeld, A., 1979. Point pattern matching by relaxation. Pattern Recognition, 12: 269--275.