COMPUTER VISION, GRAPHICS, AND IMAGE PROCESSING 39, 131-143 (1987)
Fast Edge-Only Matching Techniques for Robot Pattern Recognition

TERRY CAELLI AND SHYAMALA NAGENDRAN

Alberta Centre for Machine Intelligence and Robotics, and Department of Psychology, The University of Alberta, Edmonton, Alberta T6G 2E9, Canada

Received September 17, 1985; accepted July 22, 1986

A technique for detecting arbitrary two-dimensional signals by cross-correlating edge-only information is developed from fast pipeline pixel processor logical and convolution operations. The procedures are incorporated into a robot visual system capable of locating, moving toward, and pointing to a target signal, independent of its orientation and size. © 1987 Academic Press, Inc.
1. INTRODUCTION
Many claims have been made as to the importance of edge information in coding images [1], pattern recognition [2], and fast computational vision in general. Indeed, convergent results from human psychophysics [3], vertebrate physiology [4], and computational edge processing [5, 6] point to the "optimal" edge extractor as a bandpass Gaussian filter or pseudo-gamma function bandpass filter [5], approximated by the ∇²G(σ) operator [3, 6]: a Laplacian following a low-pass Gaussian filter of bandwidth σ picture cycles. Though it is also clear that more precise information about image edges can be obtained by considering the intersections of edges resulting from classes of bandpass filters (∇²G(σ), σ ∈ {σ₁, ..., σₙ}), in the present studies we have restricted our simulations to one such filter.

Rather than focus upon edge extraction, in this paper we are concerned with edge-only pattern recognition, or pattern matching, as restricted to the two-dimensional environment shown in Fig. 1. It is well known that, with signals s(x, y) embedded in nonstationary backgrounds (as in scenes g(x, y)), the direct cross-correlation of s with g is suboptimal as a detection procedure [7, 8]. Suggestions as to what types of filtering may be employed on s and g to improve performance have been made, including an ideal prefilter by Kumar and Carroll [7], a least squares ideal filter by Liu and Caelli [8], and the possibility that edge-only matching may well be efficient [2].

Edge-based matching techniques are particularly useful in the context of pipeline pixel processors, where the execution time for convolution operations is critically dependent on kernel size and the frame memories are restricted to 8- or 16-bit pixel sizes. Second, such techniques are useful, as will be seen, with affine-invariant matching problems: where the signal to be detected differs in size and orientation from its a priori specified version.

Figure 1 shows our problem domain: a signal is given to the robot, and the task is to find the signal embedded on the wall of the open field, starting from the center, then move toward and point to it. A concomitant aim was to explore how efficient the matching process is--when based on edge-extraction procedures and pipeline pixel processor technology.
FIG. 1. (a, b) Robot and restricted visual environment used in simulations. Various image montages were placed about the walls, and the robot's task was to detect where a specified signal was, then move toward and point to it, based on visual information.
2. EQUIPMENT
A common minicomputer-based (PDP 11/23) image processing system (Image Technology Incorporated: ITI), consisting of three 512 × 512 8-bit frame memories with frame grabber and pipeline pixel processor, was used. This system was interfaced with an RCA video camera and RB robot (RB Robots, Colorado), as shown in Fig. 1. Images could be displayed on an Electrohome color monitor. Image digitization occurred at raster frequency: 30 Hz, or 33 msec per image. The pipeline pixel processor (Arithmetic Logic Unit: ALU-512) maps frame memories into frame memories with respect to the logical operations (OR, exclusive OR, AND, 2's complement) and the usual addition and subtraction operations. Each pass takes 33 msec and operates on the full image, except when pixel protects are active. Information passes through a 16-bit register, and the device also has an 8 × 8-bit multiplier--which enables convolution operations on 512 × 512 images with (n × m)-sized kernels, each taking n × m × 33 msec per convolution. The RB robot was controlled by interrupt-driven subroutine calls capable of moving it in any direction and operating the three-joint arm according to the processed image information.
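Since no implementation listing accompanies the original system description, the following NumPy sketch (our illustration, not the ITI firmware) mimics how such a pipeline evaluates an (n × m)-kernel convolution as one full-frame shift-multiply-accumulate pass per kernel coefficient, which is why execution time scales as n × m × 33 msec:

    import numpy as np

    PASS_TIME = 0.033   # seconds per full-frame pass (30 Hz raster rate)

    def pipeline_convolve(image, kernel):
        """Convolve by one shifted multiply-accumulate pass per kernel tap,
        as a pipeline pixel processor would; returns (result, time estimate)."""
        n, m = kernel.shape
        acc = np.zeros_like(image, dtype=np.float64)
        for i in range(n):
            for j in range(m):
                shifted = np.roll(np.roll(image, n // 2 - i, axis=0),
                                  m // 2 - j, axis=1)
                acc += kernel[i, j] * shifted   # one 33-msec pass
        return acc, n * m * PASS_TIME

    frame = np.random.randint(0, 256, (512, 512)).astype(np.float64)
    out, t = pipeline_convolve(frame, np.full((3, 3), 1.0 / 9.0))
    print(f"estimated pipeline time: {t:.3f} s")   # 9 passes -> 0.297 s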
3. EDGE-EXTRACTION, INVARIANCE CODES, AND MATCHING TECHNIQUES
As implied above, the ∇²G(σ) operator (on an image I) is an adequate representative of the bandpass filtering used in recent edge-extraction techniques. Defined by

\nabla^2 G(\sigma)(I) = \frac{\partial^2}{\partial x^2}\{G(\sigma: x, y) * I(x, y)\} + \frac{\partial^2}{\partial y^2}\{G(\sigma: x, y) * I(x, y)\},    (1)

where * denotes convolution and G(σ: x, y) is a low-pass filter with point-spread function

G(\sigma: x, y) = e^{-\sigma^2(x^2 + y^2)},    (2)
this filter is simple to approximate in a pipeline pixel processor by two convolution kernels. First, the Gaussian low-pass filter (2) can be approximated by the recursive use of an averaging kernel of the form

A = \frac{1}{4}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.    (3)

After m recursions it produces bivariate binomial coefficients of the form

G_m \equiv B(m; x, y) = C_x^m \cdot C_y^m, \qquad x = 0, \ldots, m; \; y = 0, \ldots, m.    (4)
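As an illustration (ours, not the original PDP-11 code), the following sketch applies the 2 × 2 averaging kernel of (3) recursively and confirms that m recursions reproduce the bivariate binomial weights of (4):

    import numpy as np

    def recursive_average(img, m):
        """Apply the 2x2 averaging kernel (3) m times; each pass averages a
        pixel with its right, lower, and lower-right neighbours."""
        for _ in range(m):
            img = 0.25 * (img + np.roll(img, -1, 0) + np.roll(img, -1, 1)
                          + np.roll(np.roll(img, -1, 0), -1, 1))
        return img

    # An impulse smoothed m times carries the binomial mask B(m; x, y)/4^m.
    m = 4
    impulse = np.zeros((16, 16))
    impulse[8, 8] = 1.0
    mask = recursive_average(impulse, m) * 4 ** m
    print(np.rint(mask[8 - m:9, 8 - m:9]).astype(int))
    # the outer product of [1 4 6 4 1] with itself, i.e. C(m,x) * C(m,y)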
This takes m × 132 msec to complete, and in the simulations reported here m was set at 10 (a total of 33 × 4 × 10 msec = 1.32 sec). The ∇², or Laplacian, kernel is defined (in finite difference form [2]) by

\nabla^2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix},    (5)

taking 9 × 33 msec = 297 msec to complete. Though various options are available for coding the image edges as (x, y) coordinates from the resultant ∇²G images, we have used the following technique, since it involves only two passes through the pipeline pixel processor. Since the output of the ∇²G(σ) operator can be positive or negative, 127 was added to the output, followed by an .AND. (∧) with 128, giving the complete form

\{E(x, y)\} = \{\text{all } z(x, y) > 0 \text{ from } G_2[\nabla^2 G_{10}(I) + 127] \wedge 128\}.    (6)
Here, also, a second smoothing operation (G₂) was employed to suppress isolated nonzero points resulting from the ∇²G₁₀ operation. Some illustrations of these processes are shown in Fig. 2; the full process takes a total time of 1.75 sec. In our application the edge set of points (6) was (spatially) uniformly sampled down to fewer than 256 points, as shown in Figs. 2 and 3.
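Putting the stages together, a minimal NumPy sketch of the edge-coding process (6), including the uniform sampling down to 256 points, might read as follows; this is our reconstruction under the assumption of 8-bit frames, not the original pipeline code:

    import numpy as np

    def recursive_average(img, m):   # the 2x2 smoother sketched after (4)
        for _ in range(m):
            img = 0.25 * (img + np.roll(img, -1, 0) + np.roll(img, -1, 1)
                          + np.roll(np.roll(img, -1, 0), -1, 1))
        return img

    def laplacian(img):
        """3x3 finite-difference Laplacian kernel of (5)."""
        return (np.roll(img, 1, 0) + np.roll(img, -1, 0)
                + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)

    def edge_code(image, m=10, max_points=256):
        """Edge coding per (6): smooth (G_10), Laplacian, offset by 127,
        resmooth (G_2), keep the 128s bit, then uniformly sample points."""
        z = laplacian(recursive_average(image.astype(np.float64), m))
        z = np.clip(z + 127.0, 0.0, 255.0)       # 8-bit frame offset
        z = recursive_average(z, 2)              # G_2: suppress isolated points
        edges = (z.astype(np.uint8) & 128) > 0   # the .AND. 128 sign test
        pts = np.argwhere(edges)                 # (x, y) edge coordinates
        stride = max(1, int(np.ceil(len(pts) / max_points)))
        return pts[::stride]                     # uniform spatial subsample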
FIG. 2. (a) Input images (I) and (b) edge-only versions created by G₂(∇²G₁₀(I) + 127) ∧ 128.
FIG. 3. Edge-only matching (cross-correlation) between sampled signals (a) (see Fig. 2b) and images (b). Luminance corresponds to the likelihood of signal presence.
Once this edge code has been established for a given signal, cross-correlation of the signal with any arbitrary scene can be reduced to a number of passes through the pipeline processor equal to the number of signal points. That is, the cross-correlation function

C_{sg}(x, y) = \int\!\!\int_{-\infty}^{\infty} s(\alpha, \beta)\, g(x + \alpha, y + \beta)\, d\alpha\, d\beta    (7)

reduces to

C_{sg}(x, y) = \sum_{n = x_l}^{x_h} \sum_{m = y_l}^{y_h} s(n, m)\, g(x + n, y + m),    (8)

with (x_l, y_l, x_h, y_h) corresponding to the subarray containing the signal. For both s and g being binary-valued (edge-only) images, (8) reduces to

C_{sg}(x, y) = \sum_{(n, m) \in \{E(x, y)\}} s(n, m)\, g(x + n, y + m).    (9)

This correlation function can be executed by adding the image g to itself, shifted by the locus of points ((x, y) values) depicting the signal, or

C_{sg}(x, y) = \sum_{(n, m) \in \{E(x, y)\}} g(x + n, y + m).    (10)
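A minimal sketch of (10), again ours rather than the original pipeline code: the binary scene edge map is shifted by each sampled signal edge coordinate, and the shifted frames are accumulated, one pass per signal point.

    import numpy as np

    def edge_only_correlation(scene_edges, signal_points):
        """Accumulate the binary scene edge map shifted by each signal edge
        coordinate (eq. (10)); peaks mark likely signal locations."""
        acc = np.zeros(scene_edges.shape, dtype=np.int32)
        for n, m in signal_points:   # one pipeline pass per signal point
            acc += np.roll(np.roll(scene_edges, -n, axis=0), -m, axis=1)
        return acc

    # Toy check: plant one instance of a 4-point signal in an empty scene.
    signal_points = [(0, 0), (0, 5), (5, 0), (5, 5)]
    scene_edges = np.zeros((512, 512), dtype=np.int32)
    for n, m in signal_points:
        scene_edges[100 + n, 200 + m] = 1
    C = edge_only_correlation(scene_edges, signal_points)
    print(np.unravel_index(np.argmax(C), C.shape))   # -> (100, 200)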
That is, this form of edge-only cross-correlation accumulates evidence for the signal's presence in the image g by the degree to which the image and signal edge points coincide. Equation (10) is readily executed via a series of passes through the pipeline pixel processor in which the results are accumulated--as illustrated in Fig. 3b. Though edge-only matching techniques have been suggested before as useful for detecting signals in nonstationary backgrounds, one aim of the following simulations was to determine just how efficient such a technique is when a variety of different signals and environments are used. To this stage, however, the pipeline pixel processor based edge matching technique is upper bounded in time by 255 × 0.033 + 1.75 sec, or 10.165 sec.

The problem with this matching technique, however, is that it is not invariant to size and orientation changes of the signal in the image. This is a particularly relevant problem in the case of robot vision studied here, since movements of the robot toward the field wall introduce size and possibly small angular changes in the signal (Fig. 1). Techniques for matching which overcome rotations of signals have recently been developed by Hsu et al. [9] using circular harmonic functions, whereby a Fourier decomposition of the polar-transformed signal and its rotated version is enacted along the orientation parameter. Matching occurs since both images have identical power spectra, with the phase giving the orientation difference between them. The problems associated with these techniques are (1) determining the initial estimate of the signal and image centers to enact the polar transforms, and (2) computing the Fourier transform.

An alternate approach to the problem is simply to transform edge-only images into polar coordinates and enact matching, using pixel processor operations as above, between frame memories--whose coordinates are polar. Again, this process has the same problem of estimating the image center for the polar transform.
FIG. 4. (a) Input cartesian image; (b) edge-only version. (c) Resultant sampled edge-only signal (s). (d) The cross-correlation between s and g, where luminance depicts the likelihood of signal presence. (e) Log-polar image (g) with respect to the peaks of (d); (f) polar version of the signal. (g) Cross-correlation between (e) and (f). In (e)-(g) the horizontal axis corresponds to polar angle (−180° to 180°) while the vertical axis corresponds to log-polar distance (see text). The peak in (g) (horizontal center) corresponds to the best match between the log-polar signal and image. In this case no rotations were introduced and hence the peak is at the horizontal center.
FIG. 5. (a) Input cartesian images (g); (b) edge-only versions. (c) Resultant sampled edge-only signal (s). (d) The cross-correlation between s and g, where luminance depicts the likelihood of signal presence. (e) Log-polar image (g) with respect to the peaks of (d); (f) polar version of the signal. (g) Cross-correlation between (e) and (f). In (e)-(g) the horizontal axis corresponds to polar angle (−180° to 180°) while the vertical axis corresponds to log-polar distance (see text). The peaks in (g) (horizontal center) correspond to the best match between the log-polar signals and images. In these cases no rotations were introduced and hence the peak is at the horizontal center.
For both signal and image, the best a priori initial estimate of the center is the peak of what may be termed the cartesian cross-correlation function, as shown in Fig. 3b. Since this peak is based upon edge-only images, it is more likely to be sharply defined than the gradients about the peak of the more usual cross-correlation functions (see Liu and Caelli [8] for examples). The log-polar (conformal) mapping is

r' = 77.8 \log_e(r + 1) \qquad \text{and} \qquad \theta = \tan^{-1}\left(\frac{y - y_0}{x - x_0}\right),    (11)

where

r = \sqrt{(x - x_0)^2 + (y - y_0)^2}    (12)
FIG. 6. (a) Input image. (b) Cartesian (A) and log-polar (B) sampled edge-only signal for matching. The polar transform is centered on the "+" in (b).
for (x₀, y₀) corresponding to the peaks of the signal autocorrelation and image cross-correlation images (see Fig. 3b). Figures 4 and 5 show examples of signals and images transformed by the above procedure. Here, the peak of the log-polar cross-correlation function determines the particular signal size and orientation detected--with respect to the center chosen by the cartesian cross-correlation procedure. In these illustrations only −π to +π (radian) sections of the image frame memory are shown. However, to enact the full log-polar cross-correlation back into the standard cartesian coordinate system we have duplicated the image over a full 4π range, so that the equivalent (log-polar) "cylindrical" frame memory can be represented. This procedure overcomes the problem of wraparound that occurs with large rotations of the signal.
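The following sketch shows one way to realize the mapping (11)-(12) with the 4π angular duplication described above; the inverse-mapping formulation and the sampling grid are our assumptions:

    import numpy as np

    def log_polar(edge_img, x0, y0, n_r=512, n_theta=512):
        """Resample an edge image into (log r, theta) coordinates about
        (x0, y0), duplicating the angular axis over 4*pi (eqs. (11)-(12))."""
        rows = np.arange(n_r)
        cols = np.arange(n_theta)
        r = np.exp(rows / 77.8) - 1.0          # invert r' = 77.8 log_e(r + 1)
        theta = -2.0 * np.pi + cols * (4.0 * np.pi / n_theta)  # 4*pi range
        x = np.rint(x0 + r[:, None] * np.cos(theta[None, :])).astype(int)
        y = np.rint(y0 + r[:, None] * np.sin(theta[None, :])).astype(int)
        inside = ((0 <= x) & (x < edge_img.shape[0])
                  & (0 <= y) & (y < edge_img.shape[1]))
        out = np.zeros((n_r, n_theta), dtype=edge_img.dtype)
        out[inside] = edge_img[x[inside], y[inside]]
        return out

Matching then proceeds exactly as in (10), now between polar frame memories: a vertical offset of the peak indexes a size change and a horizontal offset a rotation.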
FIG. 6--Continued. (c) Image at different size and orientation. (d) Log-polar edge-only image.
FIG. 6--Continued. (e) Log-polar match. The peak (in the circled area) corresponds to the size/orientation at which the best match occurred.
To illustrate these ideas more clearly, Fig. 6 shows the full duplicated log-polar image and the log-polar matching process when the signal in the image differs in orientation and size from the target signal. Again, the peak strength in the log-polar domain still indexes the degree of match, while its position gives the orientation and size characteristics. It should also be noted that differences in peak strength between the polar and cartesian cross-correlation functions are mainly due to the limited resolution of the log-polar transform employed.

4. IMPLEMENTATIONS AND SIMULATIONS

Though the basic computational vision procedures are defined and illustrated above, a number of problems related to implementing these ideas in a pseudo-real-time robot visual system must be resolved--particularly the analysis of error, the setting of thresholds, and adaptation procedures. Even in the restricted environment used in these simulations, ambient luminance introduces photon fluctuations such that redigitizing exactly the same image does not necessarily result in identical pixel values, although they are close. Particularly variable are the luminance projections around the field walls about which the robot moves. For these reasons a signal matching threshold was set by digitizing the signal, redigitizing the same area, and determining the edge-only cross-correlation function between the two captures. The peak value could then be used to estimate the signal match threshold--which in these simulations was set at 60% of the number of signal points. Due to the rounding errors associated with the log-polar transform, the polar signal matching threshold was typically set at half that of the cartesian.
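Schematically, and reusing the edge_code and edge_only_correlation helpers sketched earlier (the 512 × 512 frame size and the callable digitize_region are our assumptions):

    import numpy as np

    def set_thresholds(digitize_region, fraction=0.60):
        """Estimate the match criteria from a digitize/redigitize self-match
        of the signal region: the cartesian criterion is 60% of the sampled
        signal point count, and the polar criterion is half the cartesian
        (allowing for log-polar rounding errors)."""
        pts = edge_code(digitize_region())               # first capture
        edge_map = np.zeros((512, 512), dtype=np.int32)  # second capture
        for x, y in edge_code(digitize_region()):
            edge_map[x, y] = 1
        self_peak = edge_only_correlation(edge_map, pts).max()
        cartesian = fraction * len(pts)   # 60% of signal points
        polar = cartesian / 2.0           # half the cartesian criterion
        return cartesian, polar, self_peak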
FIG. 7. Robot system flowchart for the visual locating of signals. (Legend: R: rotate at centre (φ, incremented by Δφ); D: digitize; E: edge extraction; M_C: cartesian match; M_P: polar match; C_C: cartesian criterion; C_P: polar criterion; L: log-polar transform; R_S: rotate signal.)
Although these thresholds seem remarkably low, the checking procedure resulted in no false alarms in our restricted simulations--though more detailed signal detection analysis is being pursued.

The full simulation procedure, after the signal was converted into a set of {x, y} edge coordinates ({E(x, y)}, (6)), is illustrated in Fig. 7. As indicated in Fig. 1, the robot commenced in the middle of the field with the goal of searching the field walls for the presence of the specified signal. Once sufficient evidence had been attained for its presence in a specified region, it moved toward the wall, rematched using size and orientation (log-polar) matching if a simple rescaling of the signal size was not adequate, and, if the match was still positive, pointed to the signal. The pointing process was accomplished by a simple adaptive procedure based on matching an edge-only hand code with the digitized hand. The peak of the cross-correlation function localizes the hand with respect to the signal, and simple (nondistorting to the hand image) elbow and shoulder motions were used to converge on the signal (x, y) coordinates via a modified steepest descent procedure (Fig. 7).
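The pointing loop can be sketched as follows; this is only a schematic of the adaptive procedure (the step sizes, tolerance, and the callables locate_hand and move_joints are our assumptions, with the joint-image coupling reduced to one step per axis):

    import numpy as np

    def point_to_signal(locate_hand, move_joints, target_xy,
                        step=4, tol=3, max_iter=100):
        """Modified steepest descent: nudge elbow/shoulder so the hand's
        edge-only cross-correlation peak converges on the signal location."""
        for _ in range(max_iter):
            hx, hy = locate_hand()                  # peak of hand/image match
            ex, ey = target_xy[0] - hx, target_xy[1] - hy
            if abs(ex) <= tol and abs(ey) <= tol:   # close enough: pointing
                return True
            # step each joint along the sign of its remaining image error
            move_joints(step * np.sign(ex), step * np.sign(ey))
        return False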
FIG. 8. Two typical visual environments used to evaluate the pattern recognition technique, involving alphabetic characters (bottom) and complex scenes (top).
Figure 8 shows two different visual environments used to test the above procedures. In these situations recognition-under-transformation was enacted only if the polar transformation threshold was set lower than the cartesian threshold. In the case of alphanumeric characters, we simulated detection with small rotations of all elements. Only in the case of the character "X" did we obtain false alarms with the criteria specified above. In all other alphabetic and scene cases perfect matches--that is, the solution as shown in Fig. 1b--were obtained. With complex patterns such as two-dimensional scenes or objects, the false-alarm probability decreases as the complexity of the pattern increases. That is, the error function of this process depends directly on the number of edge points and their spatial distribution. The point is that this form of cross-correlation reduces the background "noise" problem which occurs with direct cross-correlation techniques (7).

5. CONCLUSIONS
In these simulations we have demonstrated how pipeline pixel processor technology may be effectively employed to enact edge-only matching of arbitrary signals to images. Though at this stage the choice of thresholds is arbitrary, our results using the log-polar mapping procedure for matching under rotation and size change, along with the cartesian procedure, resulted in successful matches for alphabetic characters and relatively complex images.
This polar matching procedure collapses when no evidence of a match occurs--that is, when no discernible peak occurs in the cross-correlation function. In other attempts to solve this problem [9] an adaptive procedure is used to establish the center. However, this is time consuming in that the procedure involves changing between polar and cartesian coordinates. The second limitation of the above procedure is the simple use of signal sampling to keep the number of signal pan and scroll values (coordinates) below 255. This does not necessarily retain the signal features which are most likely to remain invariant to light fluctuations, image distortions, etc. These problems are currently under investigation, and the matching procedure and criteria are being analyzed from a signal detection perspective.

ACKNOWLEDGMENTS
This project was funded by Grant A2568 from the Natural Sciences and Engineering Research Council of Canada. We would like to thank Mr. Merle Blachut for the development of the interrupt-driven interface between the robot and the PDP 11/23 computer.

REFERENCES

1. D. Marr, Vision, Freeman, San Francisco, 1982.
2. A. Rosenfeld and A. Kak, Digital Picture Processing, Vol. 1, Academic Press, New York, 1982.
3. R. Watt and M. Morgan, The recognition and representation of edge blur: Evidence for spatial primitives in human vision, Vision Res. 23, 1983, 1465-1477.
4. D. Pollen, B. Andrews, and S. Feldon, Spatial frequency selectivity of periodic complex cells in the visual cortex of the cat, Vision Res. 18, 1978, 665-682.
5. D. Marr and E. Hildreth, Theory of edge detection, Proc. R. Soc. London Ser. B 207, 1980, 187-217.
6. Y. Leclerc and S. Zucker, The Local Structure of Image Discontinuities in One Dimension, Technical Report TR-83-19R, Department of Electrical Engineering, McGill University, 1983.
7. V. Kumar and C. Carroll, Loss of optimality in cross correlators, J. Opt. Soc. Am. A 1(4), 1984, 392-397.
8. Z.-Q. Liu and T. Caelli, An adaptive prefilter matching technique for detecting signals in non-white noise, Appl. Opt. 25(10), 1986, 1622-1626.
9. Y. Hsu, H. Arsenault, and G. April, Rotation-invariant digital pattern recognition using circular harmonic expansion, Appl. Opt. 21(22), 1982, 4012-4015.