Copyright © IFAC Intelligent Autonomous Vehicles, Espoo, Finland, 1995
VEHICLE DETECTION AND RECOGNITION IN GREYSCALE IMAGERY
N.D. MATTHEWS*, P.E. AN**, D. CHARNLEY* and C.J. HARRIS*
*University of Southampton, Department of Electronics and Computer Science, Hampshire, UK - ndm@ecs.soton.ac.uk, dc@ecs.soton.ac.uk, cjh@[illegible].soton.ac.uk
**Florida Atlantic University, Department of Ocean Engineering, Florida, USA - ean@oe.fau.edu
Abstract. This paper details a novel two-stage vehicle detection and recognition algorithm which combines an image processing region of interest (ROI) designator with a secondary recognition process, cued by the designator and implemented using principal component analysis (PCA) as input to a Multi-Layered Perceptron (MLP) classifier. Both the image processing system and the MLP classifier have been designed for real-time implementation and data-fusion with other information sources.
Key Words. Vehicles; tracking applications; image recognition; classifiers; cruise control
1. Introduction
The introduction of new technologies to road vehicles, such as intelligent cruise control or collision avoidance systems, necessitates a high degree of robustness and reliability. Whilst accurate range estimates may be recovered using conventional sensors, e.g. millimetric radar, these typically suffer from both low bearing resolution and potential ambiguities, for example through false alarms. This paper details a novel two-stage vehicle detection and recognition algorithm which combines an image processing region of interest (ROI) designator with a secondary recognition process, cued by the designator and implemented using principal component analysis (PCA) as input to a Multi-Layered Perceptron (MLP) classifier. The combination of an initial detection phase followed by a recognition process has greatly simplified the classifier design. In turn, the classifier performance has allowed some of the image processing assumptions to be relaxed, whilst maintaining a high signal-to-noise ratio (SNR). Both the image processing system and the MLP classifier have been designed for real-time implementation and data-fusion with other sensors, e.g. range/range-rate radar.
2. Detection
2.1. Vehicle cues
Road vehicles comprise a large number of horizontal structures, especially when viewed approximately from the rear (fig. 1). Whilst there are other potential sources of horizontal structure, it is generally possible to mask areas within the image where these may occur and where it is physically impossible for vehicles to be present, e.g. above the horizon. Practical experience suggests, as noted by (Kuehnle, 1991; Young et al., 1992), that groupings of horizontal edges are a significant cue for regions of interest (ROI's) which potentially contain vehicles.
2.2. Vehicle horizontal location
A directionally filtered gradient technique is employed (below the approximate horizon), by convolving the input greyscale image $I$ with two 3 x 3 masks, so as to generate horizontal and vertical edge responses, $I_h$ and $I_v$, at each pixel location. The absolute magnitudes of $I_h$ and $I_v$ are compared so that only pixels with a significant horizontal response are retained. In practice man-made objects, e.g. vehicles, exhibit a stronger edge response than "natural" objects. Hence, an appropriate technique is to consider the total horizontal edge response over some ROI, rather than using an edge threshold. However, it is extremely difficult to define an appropriate ROI, thus an assumption is made that vehicles do not overlap image columns significantly, e.g. (fig. 1); this assumption is often implicit in many symmetry-based techniques (Kuehnle, 1991). Applying the horizontal overlap assumption, each image column may be considered as a potential ROI to sum over. Firstly, the horizontal edge response in each image column is summed and smoothed with a triangular filter. Potential vehicle locations are then derived from locally maximal peaks which are extracted from the smoothed column responses in decreasing order, with an additional constraint to discard maxima which are too close to a previously extracted, i.e. higher, peak. Although this technique has a number of
Consider a road template at the bottom of the original image, whose pixel grey levels are assumed to belong to a Gaussian distribution, together with some extraneous pixels belonging to e.g. road markings. Following (Chu and Aggarwal, 1992), the mean $\mu$ of the underlying Gaussian distribution is equated to the mode of the grey level patch, and the standard deviation $\sigma$ of the underlying normal distribution is given by (1)
$$\sigma = \frac{|x - \mu|}{\sqrt{\ln 4}} \qquad (1)$$
where $x$ is the location with 50% of the modal response (a Gaussian falls to half its peak value at $|x - \mu| = \sigma\sqrt{\ln 4}$). Pixels in the original grey level image in the range $[0, \mu - 3\sigma)$ are considered as due to a potential under-vehicle shadow. Areas corresponding to under-vehicle shadows are located by considering each potential vehicle location in turn. Pixels within a given candidate image strip are considered row by row, starting at the bottom of the image, i.e. closest to the camera. If more pixels on any given row within a given candidate vehicle's image strip are considered to be shadow than non-shadow, then the row is interpreted as part of an under-vehicle shadow and the vertical location noted as the bottom of the candidate vehicle (fig. 3). The candidate vehicle location is
Fig. 1. Detected candidate vehicle columns
limitations, e.g. the triangular filter support width and the proximity constraint have implications for the potential size of the recovered ROI, it is extremely simple to implement and is invariant to camera pitching, e.g. when changing gear.
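The column-based localisation of section 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Sobel-style 3 x 3 masks, the triangular filter half-width and the minimum peak separation are all assumed values (the paper does not give them), the horizon mask is omitted, and every function name is hypothetical.

```python
# Assumed Sobel-style masks; the paper only states "two 3 x 3 masks".
H_MASK = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges
V_MASK = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges

def convolve3x3(img, mask):
    """Apply a 3 x 3 mask to the image interior; border pixels stay at zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(mask[i][j] * img[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out

def horizontal_edge_image(img):
    """Keep |I_h| only where it exceeds |I_v|; all other pixels are zeroed."""
    ih = convolve3x3(img, H_MASK)
    iv = convolve3x3(img, V_MASK)
    return [[abs(a) if abs(a) > abs(b) else 0.0 for a, b in zip(row_h, row_v)]
            for row_h, row_v in zip(ih, iv)]

def column_response(edge_img, half_width=2):
    """Sum each column, then smooth with a triangular filter (assumed width)."""
    sums = [sum(col) for col in zip(*edge_img)]
    offsets = range(-half_width, half_width + 1)
    weights = [half_width + 1 - abs(d) for d in offsets]
    total = float(sum(weights))
    smoothed = []
    for c in range(len(sums)):
        acc = 0.0
        for d, wgt in zip(offsets, weights):
            if 0 <= c + d < len(sums):
                acc += wgt * sums[c + d]
        smoothed.append(acc / total)
    return smoothed

def extract_peaks(resp, min_sep=3):
    """Take local maxima in decreasing order of height, discarding any maximum
    closer than min_sep columns to an already accepted (higher) peak."""
    n = len(resp)
    maxima = [c for c in range(n)
              if resp[c] > 0
              and (c == 0 or resp[c] >= resp[c - 1])
              and (c == n - 1 or resp[c] >= resp[c + 1])]
    maxima.sort(key=lambda c: -resp[c])
    peaks = []
    for c in maxima:
        if all(abs(c - p) >= min_sep for p in peaks):
            peaks.append(c)
    return peaks
```

Each returned column index would then be treated as one candidate vehicle location; the proximity constraint (`min_sep` here) is what limits how closely two candidates can sit.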
2.3. Vehicle width
Vehicle width is determined from an edge following technique applied to the horizontal edge response image. Edges are linked left and right from the candidate vehicle horizontal position on each row until the horizontal edge response drops below a given threshold. The vehicle width is determined by the leftmost and rightmost linked edge response pixels (fig. 2). Although this technique
Fig. 3. Detected under-vehicle shadows
discarded if no location can be obtained to satisfy the under-vehicle shadow criterion.
2.4.2. Vehicle height. Obviously, to extract an ROI for subsequent classification an estimate is required for the location of the top of the vehicle. Since there are no consistent cues associated with a vehicle roof, a heuristic is applied that the rears of cars are approximately square, subject to digitisation. Hence, the vehicle height is equated to its width and thus the top of the vehicle may be estimated from the vehicle width and the location of its under-vehicle shadow (fig. 4).
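The under-vehicle shadow test and the square-ROI heuristic can be illustrated as follows. This is a hedged sketch rather than the published code: it assumes 8-bit grey levels, searches for the 50% point on the darker side of the mode (the paper does not say which side is used), and all function names are hypothetical.

```python
import math

def road_statistics(patch_pixels, levels=256):
    """Estimate (mu, sigma) of the road grey level from a template patch.
    mu is the histogram mode; sigma follows equation (1)."""
    hist = [0] * levels
    for v in patch_pixels:
        hist[v] += 1
    mu = max(range(levels), key=lambda g: hist[g])
    half = hist[mu] / 2.0
    # Assumption: walk towards darker grey levels until the count has
    # fallen to 50% of the modal count (or below).
    x = mu
    while x > 0 and hist[x] > half:
        x -= 1
    sigma = abs(x - mu) / math.sqrt(math.log(4.0))
    return mu, sigma

def find_shadow_row(img, col_lo, col_hi, mu, sigma):
    """Scan the strip [col_lo, col_hi) from the bottom row upwards and return
    the first row with more shadow than non-shadow pixels, else None."""
    threshold = mu - 3.0 * sigma
    for r in range(len(img) - 1, -1, -1):
        strip = img[r][col_lo:col_hi]
        n_shadow = sum(1 for v in strip if v < threshold)
        if n_shadow > len(strip) - n_shadow:
            return r
    return None

def square_roi(bottom_row, col_lo, col_hi):
    """Height is set equal to width, so the top row follows from the shadow row.
    Returns (top, bottom, col_lo, col_hi) with the top clamped to the image."""
    width = col_hi - col_lo
    top = max(0, bottom_row - width + 1)
    return top, bottom_row, col_lo, col_hi
```

A candidate for which `find_shadow_row` returns `None` would be discarded, mirroring the shadow criterion described above.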
Fig. 2. Detected candidate vehicle widths
will overestimate the vehicle width if a large horizontal feature crosses the candidate vehicle location, in practice it tends to underestimate the true horizontal vehicle extent since only horizontal edges are considered, whereas a vehicle boundary is usually at a vertical edge.
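The edge-linking step of section 2.3 can be sketched as below. This is an illustrative reading of the description, with a hypothetical function name and an assumed threshold; it links along each row from the candidate column and keeps the overall leftmost and rightmost linked pixels.

```python
def link_width(edge_img, col, threshold):
    """Link horizontal edge pixels left and right of `col` on every row until
    the response drops below `threshold`. Returns (leftmost, rightmost) column
    over all rows, or None if no row has an edge at the candidate column."""
    left, right = None, None
    for row in edge_img:
        if row[col] < threshold:
            continue  # nothing to link from on this row
        lo = col
        while lo - 1 >= 0 and row[lo - 1] >= threshold:
            lo -= 1
        hi = col
        while hi + 1 < len(row) and row[hi + 1] >= threshold:
            hi += 1
        left = lo if left is None else min(left, lo)
        right = hi if right is None else max(right, hi)
    return None if left is None else (left, right)
```

Because linking stops at the first gap in the horizontal response, the sketch shows why the recovered extent tends to fall short of the true vehicle boundary, which usually lies on a vertical edge.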
2.4. Vehicle vertical location
2.4.1. Under-vehicle shadow. An important vehicular cue in daylight imagery is the under-vehicle shadow.
2.4.3. Horizontal edge clustering. Whilst the vehicle vertical location derived from the under-vehicle shadow is generally temporally stable, it may fail when road characteristics differ greatly from those in the template patch, e.g. due to shadows cast by overtaking vehicles. Therefore a second vehicle vertical location algorithm has been developed, based on horizontal edge clustering. A vehicle-sized patch may be defined by applying the same aspect ratio heuristic used to derive the vehicle height (section 2.4.2). Within a given candidate vehicle strip (fig. 2) the vehicle vertical location is given by the position of the vehicle-sized patch which maximises the summed horizontal edge response (fig. 4). Although this technique generally
Fig. 6. ROI's for MLP classification - 2
due to redundancy of pattern components. Feature extraction is an essential procedure to reduce the problem dimensionality whilst retaining the required SNR. In this combined detection/recognition system a high degree of data reduction has already been performed by the initial image processing stage. With this information, the ROI considered by a classifier may be kept to a reasonable size, greatly improving the SNR and enabling accurate pattern classification.
Fig. 4. Detected candidate vehicle ROI's: (left) under-vehicle shadow, (right) maximal edge response
finds a region close to a vehicle, it is not as temporally stable as the under-vehicle shadow. Since each image strip will by definition generate some maximum vertical patch position, there is an explicit requirement for the car/road classifier.
2.5. Classifier input
The candidate ROI's are dilated slightly to ensure that any vehicle is completely extracted. The ROI is then scaled to a constant size to maintain scale invariance for the classifier. The scaled ROI size is 20 x 20 pixels to reduce the dimensionality of the classification problem; any further reduction is likely to "oversmooth" car edges and other useful features. ROI's extracted using the under-vehicle shadow (section 2.4.1) are shown in fig. 5 and ROI's extracted using horizontal edge clustering (section 2.4.3) are shown in fig. 6.
3.2. Principal component analysis (PCA)
Suppose there is a set of patterns $\{\mathbf{x}_1, \ldots, \mathbf{x}_M\}$, where $\mathbf{x}_i$ is the $i$th pattern stored in the set $S$, and is defined as $[x_0, x_1, \ldots, x_L]$ in an $L$-dimensional pattern space. To apply principal component analysis (PCA), an autocorrelation matrix, $R$, is formed by summing the outer products of the $M$ patterns
$$R = \sum_{i=1}^{M} \mathbf{x}_i \mathbf{x}_i^T \qquad (2)$$
A set of $L$ eigenvectors $\{\mathbf{q}_1, \ldots, \mathbf{q}_L\}$ associated with $R$ now forms the basis of the set $S$. The set of eigenvalues $\{\lambda_1, \ldots, \lambda_L\}$ defines the relative correlation significance over the basis of $R$. Any pattern $\mathbf{x}_i$ may be represented as a linear combination of the eigenvectors by projecting $\mathbf{x}_i$ onto the space spanned by the basis of $R$ with principal components $a_l$
$$\mathbf{x}_i = \sum_{l=1}^{L} a_l \mathbf{q}_l \qquad (3)$$
The set of principal components, $\{a_1, \ldots, a_L\}$, now forms an orthogonal representation for the pattern $\mathbf{x}_i$.
Fig. 5. ROI's for MLP classification
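Equations (2) and (3) can be checked numerically with a small pure-Python sketch. The paper does not say how the eigenvectors were computed; the cyclic Jacobi rotation method used here is a swapped-in textbook technique chosen only because it needs no external library, and all function names are hypothetical.

```python
import math

def autocorrelation(patterns):
    """Equation (2): R = sum_i x_i x_i^T for a list of equal-length patterns."""
    dim = len(patterns[0])
    R = [[0.0] * dim for _ in range(dim)]
    for x in patterns:
        for a in range(dim):
            for b in range(dim):
                R[a][b] += x[a] * x[b]
    return R

def jacobi_eigh(matrix, sweeps=50, tol=1e-22):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    eigenvectors[k] is the k-th unit eigenvector as a list."""
    n = len(matrix)
    A = [row[:] for row in matrix]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle that zeroes A[p][q]; |theta| <= pi/4.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):   # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: -A[k][k])
    eigenvalues = [A[k][k] for k in order]
    eigenvectors = [[V[r][k] for r in range(n)] for k in order]
    return eigenvalues, eigenvectors

def principal_components(x, eigenvectors):
    """Project a pattern onto the eigenvectors: a_l = q_l . x."""
    return [sum(qk * xk for qk, xk in zip(q, x)) for q in eigenvectors]

def reconstruct(components, eigenvectors):
    """Equation (3): x = sum_l a_l q_l over the supplied eigenvectors."""
    dim = len(eigenvectors[0])
    return [sum(a * q[r] for a, q in zip(components, eigenvectors))
            for r in range(dim)]
```

Passing all eigenvectors to `reconstruct` recovers the pattern exactly (up to rounding), which is the content of equation (3); passing only the leading ones gives a lower-dimensional approximation.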
3. Recognition
3.1. Dimensionality reduction/feature extraction
Pattern classification tasks for many image applications are generally considered difficult problems because of the associated high dimensional pattern space and poor signal-to-noise ratio (SNR)
To effectively apply PCA, only a set of eigenvectors with dominant eigenvalues should be retained; this not only reduces the problem dimensionality but can also potentially improve the SNR (by rejecting the noise components). The size of the eigenvector set is generally problem dependent, and can be considerably smaller than $L$ if the pattern components are highly correlated. Thus, any pattern $\mathbf{x}_i$ can be approximated by $\hat{\mathbf{x}}_i$
$$\hat{\mathbf{x}}_i = \sum_{l=1}^{N} a_l \mathbf{q}_l \qquad (4)$$
3.2.1. Local PCA. Note that the set of eigenvectors can be treated as an encoding device such that each of the eigenvectors contributes to an orthogonal pattern variation. When these eigenvectors are derived based on a single pattern, their encoding capability is restricted to the specificity of the pattern, and the generalisation abilities to other patterns are likely to be poor despite sharing similar local characteristics.
An important observation on many road patterns is that their statistical characteristics tend to be stationary, i.e. a set of localised eigenvectors defined in a smaller dimension can be used to reconstruct the entire pattern by forming a union of local patches (Sanger, 1989). This local PCA is generally more robust when reconstructing a pattern which is not in $S$; this has been verified in this application, where $M = 90$, $L = 25$, $N = 5$. This is an important property for car detection where real-time classification at (near) frame rate is required. An example of such localised eigenvectors or eigenmasks (5 x 5 pixels), corresponding to the ten largest eigenvalues derived from ninety road/car images, is shown in fig. 7.
Fig. 7. Eigenmasks corresponding to the ten largest eigenvalues (in descending, raster order).
3.3. Pattern classification
Given a set of properly extracted features, criteria for selecting proper classifiers are commonly based on Bayesian Maximum A Posteriori (MAP) or Maximum Likelihood (ML) principles so that the average error probability is minimised. A nonlinear interpolative model, the Multi-Layered Perceptron (MLP), is frequently used to approximate the posterior class distribution because of its nonlinear modelling capability (Ruck et al., 1990). Although the MLP is generally inappropriate for on-line control applications because of its slow adaptation process and non-unique solution minimum, it generalises efficiently in a high dimensional space by means of basis functions of global extent, which is essential for most classification applications. The MLP is chosen as the classifier for this study due to the high dimensionality of the input space.
3.3.1. Multi-layered perceptron (MLP). The multi-layered perceptron utilises nonlinear squashing functions to partition the pattern space with global hyperplanes. The orientation and intercept of these hyperplanes are adjusted using a nonlinear gradient descent (or backpropagation) optimisation technique.
The architecture of an MLP network can be regarded as $n_0 \to n_1 \to n_2$, where $n_0$ is the network input dimension, $n_1$ is the number of nodes in the hidden layer and $n_2$ is the number of components in the output layer. The input-output relationship of the squashing node in the hidden layer is defined as
$$f_i = \sigma_i\left(\sum_{j=1}^{n_0} w_{i,j} x_j + \mu_i\right), \qquad (5)$$
where $w_{i,j}$ is the hidden node parameter which connects the $i$th node in the hidden layer and the $j$th component of the input vector ($x_j$), $\mu_i$ is the bias associated with the $i$th hidden node, $\sigma_i$ is its squashing function and $f_i$ denotes the $i$th hidden node output. Using this relationship, each hidden node partitions the input space with its hyperplane, along which its activation function has a constant output value; the activation function used in this car/road classification is the sigmoid function. By forming inputs with the hidden node outputs, the $i$th network output can now be computed as
$$y_i = \sum_{j=1}^{n_1} v_{i,j} f_j, \qquad 1 \le i \le n_2 \qquad (6)$$
Because of its flexible structure, the MLP has been shown to have universal approximation capability (Hornik, 1991).
3.4. Training
Car/road identification is performed by the MLP using features extracted by local PCA from a given scaled ROI (fig. 5 and 6). To avoid the common problem of overfitting/parameter drift due to modelling error and measurement noise, the pattern classifier is only trained off-line, and hence its rate of convergence is less critical (this off-line condition obviously requires the training patterns to be truly representative).
3.4.1. Parameter adaptation - backpropagation. Backpropagation is a nonlinear version of the gradient descent algorithm where, at the steady state solution, the inputs and residual errors are statistically uncorrelated. The cost function for parameter optimisation is commonly chosen to be the sum of squared errors based on $L$ training samples: $\frac{1}{2}\sum_{p=1}^{L} \epsilon(\mathbf{x})^2$, where $\epsilon(\mathbf{x}) = y^*(\mathbf{x}) - y(\mathbf{x})$ is the residual error, and $y^*(\mathbf{x})$ is the desired network output. By considering the negative directions of their true gradients, the parameter changes $\delta v$ and $\delta w$ can be expressed as
$$\delta v_i = \frac{\beta}{L} \sum_{p=1}^{L} \epsilon(\mathbf{x}) f_i(p) \qquad (7)$$
$$\delta w_{i,j} = \frac{\beta}{L} \sum_{p=1}^{L} \epsilon(\mathbf{x}) v_i f_i'(p) x_j \qquad (8)$$
where $i$ is the hidden node number. By correlating with the input vector, $w$ iteratively forms the projection basis such that its internal representation is optimal in the mean-squared sense.
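The forward pass and the batch parameter changes can be put together in a short Python sketch for a single-output network with sigmoid hidden nodes and a linear output sum. This is an illustration under assumptions, not the authors' code: the scaling factor is treated as a step size named `beta`, the bias update is an added assumption (the paper gives updates for $v$ and $w$ only), and the function names and toy data are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W, mu, v):
    """Hidden outputs f_i = sigmoid(sum_j W[i][j] x_j + mu[i]) and the
    network output y = sum_i v[i] f_i for a single-output network."""
    f = [sigmoid(sum(wij * xj for wij, xj in zip(W[i], x)) + mu[i])
         for i in range(len(W))]
    y = sum(vi * fi for vi, fi in zip(v, f))
    return f, y

def cost(samples, W, mu, v):
    """Half the sum of squared residual errors over the training samples."""
    return 0.5 * sum((target - forward(x, W, mu, v)[1]) ** 2
                     for x, target in samples)

def batch_update(samples, W, mu, v, beta):
    """One batch step: accumulate the changes over all samples, then apply."""
    n_samples = len(samples)
    dv = [0.0] * len(v)
    dW = [[0.0] * len(W[0]) for _ in W]
    dmu = [0.0] * len(mu)
    for x, target in samples:
        f, y = forward(x, W, mu, v)
        eps = target - y                       # residual error
        for i in range(len(v)):
            dv[i] += eps * f[i]
            # Sigmoid derivative is f (1 - f).
            g = eps * v[i] * f[i] * (1.0 - f[i])
            dmu[i] += g                        # assumed bias update, not in the paper
            for j in range(len(x)):
                dW[i][j] += g * x[j]
    scale = beta / n_samples
    new_v = [vi + scale * d for vi, d in zip(v, dv)]
    new_W = [[w + scale * d for w, d in zip(row_w, row_d)]
             for row_w, row_d in zip(W, dW)]
    new_mu = [m + scale * d for m, d in zip(mu, dmu)]
    return new_W, new_mu, new_v
```

Repeated calls to `batch_update` on a fixed training set correspond to the training cycles counted in the performance plots; the sketch returns new parameter lists rather than modifying its inputs so that earlier checkpoints can be kept.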
Fig. 8. MLP classification performance using 1 hidden layer and (top) 1 hidden node, (middle) 3 hidden nodes and (bottom) 5 hidden nodes. +: training set; o: validation set; x: testing set.
3.4.2. Classification performance. The scaled 20 x 20 pixel ROI is partitioned into 4 x 4 subpatterns, each of size 5 x 5 pixels ($L = 25$). Local PCA is performed using the eigenmasks (fig. 7) on the individual subpatterns, and the principal components associated with the five most dominant eigenvalues are computed ($N = 5$) for each of the subpatterns. The set of principal components forms a transformed pattern of 80 dimensions (16 subpatterns x 5 components). These transformed patterns are then used to adapt the MLP network via backpropagation to estimate the car/road decision boundary.
480 ROI's containing either a car or a road segment were extracted by hand from grey-level imagery; these were partitioned into three non-overlapping subsets: training set (200 patterns), validation set (80 patterns), and testing set (200 patterns). The patterns in the training set were used to update the parameters in batch mode, whereas the patterns in the validation set indicated when overfitting started to occur (Sjoberg and Ljung, 1992). The patterns in the testing set were used to estimate the true error probability.
The classification performance of the MLP as a function of the number of training cycles is shown in fig. 8 under three different internal representations. From these results, it can be seen that the MLP with only 1 hidden node performs as well as others of greater complexity. This is surprising, as it suggests that the car and road patterns can be separated with a 90% confidence level using a single linear hyperplane. Additional nonlinearity in the hyperplane does not appear to improve the average error probability. This might be explained by the use of the local PCA, such that the features have a better SNR given the set of eigenmasks (fig. 7), and also by the distribution of the existing pattern set.
By comparing the performance among different pattern distributions, an optimal set of network parameters can be chosen at the point where the classification performance based on the validation set begins to deteriorate (10,000 cycles for 1 hidden node; 9,000 cycles for both 3 and 5 hidden nodes). The misclassification of car and road patterns was also evaluated to ensure that the classifier was not biased toward any particular class (table 1). These results indicate that the bias is relatively insignificant (or uniform) for an MLP classifier of 1, 3 or 5 hidden nodes and that classification performance is acceptable, i.e. within ≈ 10% error, provided that the image characteristics are similar to those in the training patterns.
Table 1. MLP classification of training data.
          1 node          3 nodes         5 nodes
          Car    Road     Car    Road     Car    Road
Car       97     3        92     8        93     7
Road      1      99       1      99       1      99
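The two bookkeeping steps described in section 3.4.2, choosing the training cycle at which validation performance is best and tabulating per-class results, can be sketched as follows. This is a hypothetical helper pair, not taken from the paper: it assumes the validation error has been recorded at a series of checkpoints, and that class labels are simple strings.

```python
def select_stopping_cycle(val_errors):
    """Return the index of the checkpoint with the lowest validation error,
    i.e. the last point before validation performance starts to deteriorate.
    On ties the earliest checkpoint is returned."""
    return min(range(len(val_errors)), key=lambda k: val_errors[k])

def confusion_percentages(true_labels, predicted_labels, classes=("car", "road")):
    """Per-class confusion table in percent: table[t][p] is the percentage of
    patterns of true class t that were assigned to class p."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    table = {}
    for t in classes:
        total = sum(counts[t].values())
        table[t] = {p: (100.0 * counts[t][p] / total if total else 0.0)
                    for p in classes}
    return table
```

Comparing the two rows of such a table is one way to check that a classifier is not biased toward either class.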
4. Integration and results
The integrated system generates a number of candidate ROI's (fig. 5 and 6), potentially two per vertical strip, i.e. one due to the recovered under-vehicle shadow position (section 2.4.1) and one due to the maximum horizontal edge response location (section 2.4.3), and invokes the MLP classifier on all candidate ROI's. If more than one potential vehicle is detected in a given vertical strip then the ROI with the highest classifier confidence is accepted (fig. 11).
For the example image the MLP classifications are shown in fig. 9 and fig. 10. A background and lorry tail-gate ROI are categorised as "car" since they are significantly "not-road", due to their high degree of internal structure, although the classifier has a low (27.07%) confidence in the classification of the background region.
Fig. 9. ROI's classified as road (76.2%, 100% and 100%)
Fig. 10. ROI's classified as car (27.07%, 40.58%, 88.78%, 93.72%, 94.89%, 95.40% and 96.63%)
Fig. 11. Detected and classified vehicle targets.
The integrated vehicle detection and recognition system has been successfully applied to a large number of image sequences from a variety of sources featuring many different vehicles. The clutter rejection capability of the classifier improves the SNR of the combined output significantly, e.g. for the 32 frame example sequence, of 76 spurious background ROI's detected by the image processing, only 17 are mis-classified, whereas 127 of the 128 vehicles are correctly identified (the lorry is mis-classified in a single frame).
5. Conclusions
The hybrid detection and recognition system has proved to be remarkably successful. The use of the image processing ROI detection enables multiple potential vehicles in a single image to be classified and rejects most extraneous background information from the classification process. The use of ROI's has also allowed the classifier to be invariant to both scale and ROI position within the input image. The success of the classification system has allowed some of the image processing constraints to be relaxed and has enabled an extra degree of robustness to be incorporated into the combined system. It is hoped to further improve the system by incorporating a tracker to perform temporal filtering.
Acknowledgements
This work was conducted at the University of Southampton. The authors would like to thank Lucas Automotive, Ford-Jaguar and Pilkington for their support of this PROMETHEUS project.
6. REFERENCES
Chu, C. and J.K. Aggarwal (1992). Image sensing using multiple sensing modalities. IEEE Trans. Pattern Analysis and Machine Intelligence 14(10), 840-846.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 251-257.
Kuehnle, A. (1991). Symmetry-based recognition of vehicle rears. Pattern Recognition Letters 12, 249-258.
Ruck, D.W., S.K. Rogers, M. Kabrisky, M.E. Oxley and B.W. Suter (1990). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Trans. Neural Networks 1(4), 296-298.
Sanger, T.D. (1989). Optimal unsupervised learning in a single-layered linear feedforward neural network. Neural Networks 2, 459-473.
Sjoberg, J. and L. Ljung (1992). Overtraining, regularization, and searching for minimum in neural networks. In: Proc. of the IFAC Symposium on Adaptive Systems in Control and Signal Processing. IFAC. pp. 669-674.
Young, E., R. Tribe and R. Conlong (1992). Obstacle detection for collision avoidance. In: Third PROMETHEUS collision avoidance workshop.