Copyright © IFAC Intelligent Autonomous Vehicles, Espoo, Finland, 1995
ROAD DIRECTION DETECTION BASED ON GABOR FILTERS AND NEURAL NETWORKS

M. SURAKKA* and J. HEIKKONEN**,1

* Mechanical Engineering Laboratory, Machine Intelligence Division, Namiki 1-2, Tsukuba-shi, Ibaraki-ken, 305 JAPAN
** European Commission, Joint Research Centre, Institute for Systems Engineering and Informatics, TP 361, I-21020 Ispra (VA), ITALY

Abstract. This paper proposes a road direction detection system for outdoor autonomous vehicles that must navigate on roads according to visual information. The system consists of three major stages. At the first stage a self-similar family of Gabor kernels is used to extract low-level image features from a road image. Next, unsupervised learning principles with Self-Organizing Maps are used to cluster the Gabor coefficients into higher level image features suitable for road direction detection. At the last stage a multilayer perceptron network trained with a backpropagation algorithm carries out the road direction estimation according to the Gabor coefficient clusters. Several road images taken by a camera mounted on a passenger car are used to test the performance of the system.

Key Words. Road direction detection; autonomous vehicles; navigation systems; computer vision; neural networks
1. INTRODUCTION

Autonomous vehicles can be used in several tasks. However, a totally autonomous vehicle, such as an autonomous passenger car, still exists only in our dreams. The technical obstacles themselves are largely solved: recent developments in many technologies, such as VLSI, computer hardware, parallel and distributed processing, microcontrollers, and sensory devices including cameras, have made it possible to build feasible systems for a variety of needs. However, before autonomous systems become part of our daily life, e.g., in underwater and outer space exploration and manipulation, or diagnostics and repair in industry, many problems of perceptual autonomous systems have to be solved. Autonomous systems need sophisticated and efficient computing techniques for interaction with the real world. An intelligent, autonomous system often perceives environmental information from multiple sources; cameras, ultrasonic, infrared, and tactile sensors are commonly used. Among these, vision is the most challenging one. One of the main goals of a robot vision system is to gather visual information in front of the vehicle and to provide a robust description of the scene. This description may include object location information, i.e., different kinds of objects are recognized and their locations are determined, or the description may give information on road boundaries for a road following task (Nitzan, 1987).
This work addresses one of the most important and basic tasks of autonomous vehicles, namely the task of detecting the direction of the road or allowed driving area in front of the vehicle. Many vehicles, like fork lifts and payloaders, are often required to operate on roads or in road-like surroundings. A computer vision based road detection system used in an outdoor autonomous vehicle must be robust and fast. When operating in road environments, the speed of the vehicle, the type of the road, and the weather conditions set their own requirements for the road direction detection system. For instance, at a speed of 50 km/h a vehicle moves about 14 m/s, which sets requirements for the hardware used as well as for the efficiency of the chosen algorithm. In addition, due to the variety of different road types, it is impossible to store in computer memory all possible sensory situations corresponding to certain road directions, because it would lead to an enormous amount of data to be stored and processed. Such a system would not be capable of operating in real time. The weather conditions also affect road direction detection. Especially sunlight can cause difficult problems because of shadows and reflections on the road.
Traditional approaches to road direction detection are based on sufficiently detailed and accurate models of the expected roads. A road model typically includes a generic description of the road at an expected location relative to the vehicle (Thorpe, 1991). However, it is not always possible to define a road model that anticipates a variety of environments under a variety of weather conditions, or the construction of such a model can be costly. This is especially true in complex and dynamic real world environments. Therefore, there has been growing interest in designing autonomous vehicles that can learn to detect the direction of the road directly from their own observations.

1 On leave from Lappeenranta University of Technology, Department of Information Technology, P.O. Box 20, FIN-53850 Lappeenranta, Finland.
[Figure 1 shows the block diagram of the proposed system: the image is passed through a Gabor feature extraction stage (feature extraction layer), the resulting Gabor features are clustered by Self-Organizing Maps (unsupervised layer), and the feature clusters are fed to an MLP network (supervised layer) that outputs the road direction.]

Fig. 1. Block diagram of the proposed system for road direction detection.
Artificial neural networks (ANNs) present a computational paradigm for constructing a vision-based road direction detection system for an autonomous vehicle. Much of the allure of ANNs lies in their learning and generalization abilities, and in their fault and noise tolerance, i.e., their robustness. In the literature there are many examples where ANNs have been used for road direction detection. For instance, the road detection system of ALVINN (Pomerleau, 1989) is based on a feedforward neural network, and a road direction detection system based on the Hough transform and dynamical neural network techniques was described in (Bimbo, 1992).
This paper proposes a road direction detection system for outdoor autonomous vehicles. The system is based on Gabor filters and artificial neural networks, and it employs self-organizing feature selection based on Self-Organizing Maps (Kohonen, 1982; Kohonen, 1989). Combined with a neural network classifier, the system can learn to detect road directions directly from observed road images. This paper describes the proposed system and reports experimental results on real world road images taken by a camera mounted on a passenger car.
2. ROAD DIRECTION DETECTION SYSTEM

Fig. 1 shows the overall system architecture of the proposed road direction detection system. The system consists of three major layers: the feature extraction layer is based on self-similar Gabor kernels (Daugman, 1988) of different orientations and widths; the unsupervised layer uses Self-Organizing Maps to cluster the Gabor coefficients into higher level image features suitable for road direction detection; and the supervised layer, based on a multilayer perceptron (MLP) network (Rumelhart, 1986) trained with a backpropagation algorithm, carries out the road direction detection according to the Gabor coefficient clusters.
2.1. Gabor Feature Extraction
At the first stage of the road direction detection system the field of view of the camera is processed for visual feature extraction. There are several criteria for environment features to be useful for autonomous vehicles. Features must be easily computed, robust, insensitive to various distortions and variations in the images, and they should be meaningful for the given application. For the purposes described in this paper, Gabor filters (Daugman, 1988) are selected as the basic road feature detectors due to their good performance in many computer vision applications (see e.g. (Lampinen, 1992; Heikkonen, 1994)). Gabor filters are orientation and frequency sensitive band pass filters which have been shown to possess good localization properties both in the spatial and the frequency domain (Daugman, 1988), making them suitable for extracting orientation dependent frequency contents, i.e., edge-like features. It should again be noted that no attempt to construct a symbolic description of the viewed scene is made; the system relies only on the information of the Gabor features extracted from the scene.
For Gabor feature extraction, the view of the camera, i.e., the image g(x, y) of N_x x N_y pixels, is divided into an equally spaced grid of T_x x T_y nodes. Let the coordinates of the grid node (i, j), i = 1, ..., T_x, j = 1, ..., T_y, be given by (x_i, y_j) in the coordinate system of the image g. Then the discrete Gabor features G_ij in grid node (i, j) are defined by the convolution $G_{ij}(\theta, f) = \sum_{(x, y) \in g} g(x, y)\,\psi(x_i - x, y_j - y, \theta, f)$, where g(x, y) is the image gray level of pixel (x, y) and ψ(x, y, θ, f) is the self-similar family of 2-D Gabor filters. The self-similarity of the filters means that all the filters are dilations, translations, and rotations of each other.
The Gabor filters are spatial sinusoids localized by a Gaussian window, and they are defined in the spatial domain with the impulse response (centered at the origin) given by (Daugman, 1988):

$$\psi(x, y, \theta, f) = \exp\big(i(f_x x + f_y y)\big)\,\exp\!\left(-\frac{(f_x^2 + f_y^2)(x^2 + y^2)}{2\sigma^2}\right), \qquad (1)$$
where $f_x = f \cos\theta$, $f_y = f \sin\theta$, and x and y are pixel coordinates. The parameter f determines the central frequency of the pass band in orientation θ: $\theta = \pi k / K$, $k \in \{0, \ldots, K-1\}$, and the parameter σ determines the width of the Gaussian envelope along the x- and y-directions.
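For concreteness, the following is a minimal NumPy sketch of one member of this self-similar kernel family, built directly from Eq. (1). The kernel window size is an implementation choice that the paper does not specify.

```python
import numpy as np

def gabor_kernel(theta, f, sigma, half_size=15):
    """Complex 2-D Gabor kernel psi(x, y, theta, f) of Eq. (1), centered at the origin.

    theta : orientation of the pass band (radians)
    f     : central frequency (radians per pixel)
    sigma : width of the Gaussian envelope
    """
    fx, fy = f * np.cos(theta), f * np.sin(theta)
    # Pixel coordinate grids x, y over [-half_size, half_size].
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    carrier = np.exp(1j * (fx * x + fy * y))                         # complex sinusoid
    envelope = np.exp(-(fx**2 + fy**2) * (x**2 + y**2) / (2.0 * sigma**2))
    return carrier * envelope
```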
In the feature extraction several Gabor filters with different orientations and widths were used. In support of the research reported in (Lampinen, 1992; Heikkonen, 1994) we used σ = π, eight values of θ (K = 8), and frequencies f ∈ {π/2, π/4, π/8}. Thus for each grid node (i, j), i = 1, ..., T_x, j = 1, ..., T_y, there are 24 features G_ij with varying θ and f.
The complex Gabor filters capture the whole frequency spectrum, both amplitude and phase. Although the phase information evidently contains crucial information about the locations of edges and other details in the image, in this application the phase information is rejected in further processing by computing only the amplitudes of the convolution results G_ij at each frequency level separately. There are two main reasons for this. First, the amplitude of the outputs of the Gabor filters changes smoothly over the image, which supports our grid node feature extraction technique. Second, the amplitude of the Gabor transformation has been found to be a robust, distortion tolerant feature space for many pattern recognition applications (see e.g. (Lampinen, 1991; Lampinen, 1992)).
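A sketch of the grid-based amplitude feature extraction described above, reusing the gabor_kernel helper from the previous listing. It evaluates the convolution only at the T_x x T_y grid nodes and keeps the amplitudes, giving 24 coefficients per node for the parameters of the paper (eight orientations at three frequencies). The border handling and the exact placement of the grid nodes are assumptions.

```python
import numpy as np

def gabor_features(image, grid_x=10, grid_y=20, K=8,
                   freqs=(np.pi / 2, np.pi / 4, np.pi / 8), sigma=np.pi):
    """Amplitudes |G_ij(theta, f)| at an equally spaced grid of nodes.

    Returns an array of shape (grid_x, grid_y, len(freqs) * K),
    i.e., 24 features per node for the parameters used in the paper.
    """
    img = image.astype(float)
    h, w = img.shape
    # Equally spaced grid node coordinates, kept away from the image borders.
    xs = np.linspace(0, h - 1, grid_x + 2)[1:-1].round().astype(int)
    ys = np.linspace(0, w - 1, grid_y + 2)[1:-1].round().astype(int)

    kernels = [gabor_kernel(np.pi * k / K, f, sigma)
               for f in freqs for k in range(K)]
    r = kernels[0].shape[0] // 2                      # kernel half-size
    padded = np.pad(img, r, mode='edge')              # border handling: an assumption

    feats = np.zeros((grid_x, grid_y, len(kernels)))
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            patch = padded[xi:xi + 2 * r + 1, yj:yj + 2 * r + 1]
            for n, psi in enumerate(kernels):
                # G_ij = sum_{x,y} g(x, y) psi(x_i - x, y_j - y); amplitude only.
                feats[i, j, n] = np.abs(np.sum(patch * np.conj(psi)))
    return feats
```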
2.2. Neural Network Layers of the System
The acquired Gabor features G_ij, i = 1, ..., T_x, j = 1, ..., T_y, produce a feature space for a road image. Thus the problem of detecting the road direction can be described as a function learning problem: the road detection system must learn to map different Gabor features to appropriate road directions. This is also a classification problem. However, the dimensionality of the obtained Gabor feature space is too large to be used efficiently to determine which features correspond to certain classes, i.e., in this case to road directions. Although theoretically the MLP network used in the classification stage of the system can produce any mapping from the input space to the output space, a huge dimensional input space would require a network with a huge number of parameters to learn complex class regions. Consequently, very large amounts of training data and time would be required in the network training process. This would be impractical in many applications, including this one.
To reduce the required number of learning samples, the road direction detection system uses an unsupervised layer based on Self-Organizing Maps (Kohonen, 1982; Kohonen, 1989) to reduce the dimensionality of the input data. Thus the final supervised classifier can have a much smaller number of free parameters to be adjusted according to the input feature space, and a smaller number of preclassified training samples is required during the training of the system.
The Self-Organizing Map (SOM) (Kohonen, 1982; Kohonen, 1989) algorithm is quite similar to any vector quantization algorithm. In vector quantization the problem is to represent the N_I-dimensional input space V_I by a fixed number of vectors called codewords. Each codeword corresponds to and represents the set of points in the input space that are closer to that codeword than to any other codeword. The SOM is a method for determining the codewords for a given input space; the codewords are represented by the weight vectors w_n of the units n in the map V_M. However, the SOM algorithm does not only try to find the most representative codewords for the input space V_I of interest by forming a mapping from V_I into a typically lower N_M-dimensional space V_M of the map; it also tries to preserve the topological relationships between the inputs. This means that input vectors which are close to each other in V_I tend to be represented by units close to each other in the map space V_M. The closeness between the units in the map is described by defining the map as a 1-, 2-, or multidimensional discrete lattice of units.
The training algorithm of the SOM is based on iterative random sampling where one sample, the input vector $x \in V_I$, is selected randomly and compared to the weight vectors of the units in the map. The best matching unit b for a given input pattern x is selected using a criterion such as $\|w_b - x\| = \min_{n \in V_M} \|w_n - x\|$, where $w_n$ is the weight vector of neuron n (initially all weight vectors are set randomly to their initial positions in the input space). The best matching unit and the neurons in its topological neighborhood $N_e(b)$ are updated towards the given input pattern x, typically according to $w_r^{new} = w_r^{old} + \Psi_{rb}\,(x - w_r^{old})$, where $r \in N_e(b)$ and $\Psi_{rb}$, $0 \le \Psi_{rb} \le 1$, is a prespecified adjustable gain function of the distance d(b, r) between the units r and b along the lattice. The function $\Psi_{rb}$ usually has its maximum at r = b and its value decreases as the distance d increases. During training, $N_e(b)$ and $\Psi_{rb}$ are updated so that the neighborhood of each neuron and the gain function slowly decrease as the iteration counter increases. After a number of input vector presentations the mapping has formed.
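The update rule above can be condensed into a short sketch for a one-dimensional lattice of the kind used later in the system. The Gaussian gain function and the linear decay schedules are common choices and are assumptions here, not details taken from the paper.

```python
import numpy as np

def train_som(data, n_units=128, n_iter=20000, lr0=0.5, radius0=32.0, seed=0):
    """1-D SOM trained by iterative random sampling (BMU search + neighborhood update)."""
    rng = np.random.default_rng(seed)
    # Initial weight vectors: random positions taken from the input space.
    w = data[rng.integers(0, len(data), n_units)].astype(float)
    pos = np.arange(n_units)                            # lattice coordinates of the units

    for t in range(n_iter):
        x = data[rng.integers(0, len(data))]
        b = np.argmin(np.linalg.norm(w - x, axis=1))    # best matching unit
        frac = 1.0 - t / n_iter                         # slowly shrinking schedules
        lr, radius = lr0 * frac, max(radius0 * frac, 0.5)
        # Gaussian gain Psi_rb of the lattice distance d(b, r); widest early in training.
        gain = lr * np.exp(-((pos - b) ** 2) / (2.0 * radius ** 2))
        w += gain[:, None] * (x - w)
    return w

def bmu(w, x):
    """Index of the best matching unit of map w for input x."""
    return int(np.argmin(np.linalg.norm(w - x, axis=1)))
```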
The mathematical properties of the SOM algorithm have been considered by several authors (see e.g. (Kohonen, 1989; Kohonen, 1991; Ritter, 1988)). In brief, it has been shown that after training the weight vectors of a map with no "topological defects" specify the centers of clusters covering the input space, and that the point density function of these centers tends to approximate closely the probability density function of the input space V_I.
There are several versions of the basic SOM model. Some of these try to speed up the learning by speeding up the search for the best matching unit (Lampinen, 1990), or by introducing hierarchical representations of the self-organization (Koikkalainen, 1994). This system uses the Tree Structured SOM (TS-SOM) (Koikkalainen, 1994), which is computationally faster and easier to use than the original SOM.
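The listing below sketches only the general idea behind a tree-structured best-matching-unit search: each level of a hierarchy of maps restricts the search to the children (and their immediate neighbors) of the unit chosen at the previous level. It is a simplified illustration under our own assumptions, not the exact TS-SOM algorithm of Koikkalainen (1994).

```python
import numpy as np

def tree_bmu(layers, x, branching=2):
    """Hierarchical BMU search over a list of 1-D SOM layers.

    layers[l] is the weight matrix of level l, with layers[l+1] having
    `branching` times as many units; unit u at level l is treated as the
    parent of units branching*u ... branching*u + branching - 1 (assumption).
    """
    best = 0
    for level, w in enumerate(layers):
        if level == 0:
            candidates = np.arange(len(w))                 # search the full top level
        else:
            children = np.arange(branching * best, branching * best + branching)
            # Also consider the neighbors of the children to reduce search errors.
            candidates = np.clip(np.unique(np.concatenate(
                [children - 1, children, children + 1])), 0, len(w) - 1)
        dists = np.linalg.norm(w[candidates] - x, axis=1)
        best = int(candidates[np.argmin(dists)])
    return best
```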
In the unsupervised layer of the road direction detection system the acquired 24 element vectors G_ij are first presented to three Self-Organizing Maps denoted by V_f^k, k = 1, 2, 3, one for each frequency level f_k ∈ {π/2, π/4, π/8}. The purpose of the maps V_f^k is to encode the Gabor features G_ij(θ, f_k) into a more compact form. This image feature coding stage produces codewords C_f^k(i, j) for the Gabor vectors G_ij(θ, f_k) according to the best matching units in the map V_f^k: C_f^k(i, j) = bmu(V_f^k, G_ij(θ, f_k)), where bmu(V_f^k, G_ij(θ, f_k)) returns the index of the best matching unit b_ij^k in the map V_f^k for the input G_ij(θ, f_k), i = 1, ..., T_x, j = 1, ..., T_y. In the encoding some information will be lost, but the SOM can preserve the topology of the input data, meaning that small distortions in the inputs introduce only small changes in the encoding. Hence similarity relations in the input data are preserved.
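In code, this encoding step is a best-matching-unit lookup in each frequency-level map. The sketch assumes the bmu helper defined earlier and feature arrays shaped as in the extraction listing (grid_x, grid_y, 3 x 8 amplitudes); splitting the 24-element vectors by frequency level follows our reading of the text.

```python
import numpy as np

def encode_frequency_level(features, som_weights, level, K=8):
    """Codewords C_f^k(i, j): BMU indices of the level-k SOM for every grid node.

    features    : array (grid_x, grid_y, 3 * K) of Gabor amplitudes
    som_weights : weight matrix of the SOM V_f^k for this frequency level
    level       : frequency level k in {0, 1, 2}
    """
    gx, gy, _ = features.shape
    codes = np.zeros((gx, gy), dtype=int)
    for i in range(gx):
        for j in range(gy):
            # The 8 orientation amplitudes of this node at frequency level k.
            vec = features[i, j, level * K:(level + 1) * K]
            codes[i, j] = bmu(som_weights, vec)
    return codes
```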
At the next stage the codewords C_f^k(i, j) are clustered into "high-level" image feature information which can be used directly by an MLP network to determine the direction of the road. For that purpose, a hierarchical SOM clustering scheme based on the multi-layer SOM (MSOM) is employed for its known clustering properties (Luttrell, 1989; Lampinen, 1992): it minimizes the total quantization error produced by the SOM layers and it produces clusters whose form is dynamically matched to the distribution of the input samples. The basic one layer SOM divides the input space into regions by placing a fixed number of reference vectors (or codewords) in it. In the multilayer SOM, the outputs, i.e., the indexes of the best-matching units of one or more SOMs, are fed into another SOM as inputs. The MSOM is able to form arbitrarily complex clusters, somewhat in analogy with MLP networks (Lampinen, 1992).
In this multilayer SOM clustering approach the codewords C_f^k(i, j) at the same i-level, which corresponds to the same grid node row in the image, are mapped through SOMs denoted by V_r^k, k = 1, 2, 3. Thus, for each level i, codewords C_r^k(i) are obtained: C_r^k(i) = bmu(V_r^k, {C_f^k(i, 1), ..., C_f^k(i, T_y)}), k = 1, 2, 3. Finally, the codewords C_r^k(i) can be used in the supervised layer of the road direction detection system. The supervised layer, i.e., the classification stage of the system, is based on an MLP network trained via the backpropagation algorithm (Rumelhart, 1986). The next section gives a more detailed description of the MLP network used.
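A sketch of the row-level clustering stage under the same assumptions: for every grid row i, the T_y first-level codeword indices of one frequency level are fed to the row-level map V_r^k, and the resulting indices are concatenated. With T_x = 10 rows and three frequency levels this yields 30 values, matching the 30 input neurons of the MLP described in the next section; feeding the integer indices directly to the MLP is our interpretation, not something the paper states explicitly.

```python
import numpy as np

def mlp_input_vector(level_codes, row_soms):
    """Build the supervised-layer input from the first-level codewords.

    level_codes : list of three (grid_x, grid_y) integer arrays C_f^k(i, j)
    row_soms    : list of three row-level SOM weight matrices V_r^k
    Returns a vector of length 3 * grid_x (30 for the 10 x 20 grid).
    """
    inputs = []
    for codes, w in zip(level_codes, row_soms):
        for i in range(codes.shape[0]):
            # Row i's T_y codeword indices form the input to the row-level SOM.
            inputs.append(bmu(w, codes[i].astype(float)))
    return np.array(inputs, dtype=float)
```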
3. EXPERIMENTAL RESULTS

This section shows experimental results with real world road images. The image data for teaching and testing the proposed system was collected using a video camera while driving on various roads. The camera was set on the instrument panel of the car without any special mounting rack. Ideal weather conditions were not sought, i.e., direct sunlight or shadows on the road were not avoided, as the aim was to test the viability and the robustness of the system in order to use it in an autonomous passenger car which will be developed in the near future.
The images were grabbed from videotape using an ordinary image processing board. The size of the grabbed road images was 241 x 256 pixels. A total of 168 road images with 256 gray levels were taken. By mirroring the taken images, 336 images were obtained for the experiments. Fig. 2 shows six typical road images used in the testing of the system. The camera setup unfortunately constrains the useful image area. It can be noticed (see e.g. Fig. 2) that the hood of the car always appears at the bottom of the image and that the road data is always contained in the lower half only. Therefore, the first 120 and the last 21 rows were discarded. In addition, the first and last 8 columns were left out, leading to 100 x 240 images for the feature extraction stage.
In order to train and test the usefulness of the road direction detection system, the direction of the road in each image was defined by using a protractor to measure the angle between the normal of the road center line and the tangent of the road about 20 m ahead of the car. The angle was measured within 5 degrees of accuracy in such a way that if the road turns left the direction angle is negative; otherwise it is positive. It should be noticed that the measured angle can only be an approximation of the real road direction. Furthermore, the measured angles belonging to the interval [-80°, 80°) were normalized into the interval [-0.8, 0.8).
In the experiment the Gabor image features G_ij(θ, f_k), i = 1, ..., T_x, j = 1, ..., T_y, k = 1, 2, 3, were extracted on an equally spaced grid of 10 x 20 nodes (i.e., T_x = 10 and T_y = 20). In support of the research reported in (Lampinen, 1992; Heikkonen, 1994), the SOMs V_f^k and V_r^k in the unsupervised layer of the system had 128 units in a one-dimensional discrete lattice. In the supervised layer a 3-layer MLP network with 30 input neurons, one hidden layer with 20 neurons, and an output layer of one neuron was used. The MLP network had a linear output layer and tan-sigmoid input and hidden layers, and it was trained with a fast backpropagation algorithm with momentum and an adaptive learning rate (Vogl, 1988).
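A simplified stand-in for the network just described: a 30-20-1 MLP with a tanh hidden layer and a linear output, trained by backpropagation with momentum. The adaptive learning rate of Vogl et al. (1988) used in the paper is replaced here by a fixed rate for brevity, and the weight initialization is an assumption.

```python
import numpy as np

class RoadDirectionMLP:
    """30-20-1 MLP: tanh hidden layer, linear output; trained with momentum backprop."""

    def __init__(self, n_in=30, n_hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (1, n_hidden))
        self.b2 = np.zeros(1)
        self.vel = [np.zeros_like(p) for p in (self.W1, self.b1, self.W2, self.b2)]

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return float(self.W2 @ h + self.b2), h

    def train_step(self, x, target, lr=0.01, momentum=0.9):
        """One backpropagation step on a single (input, normalized angle) pair."""
        y, h = self.forward(x)
        err = y - target                                  # squared-error gradient at output
        dW2, db2 = err * h[None, :], np.array([err])
        dh = err * self.W2.flatten() * (1.0 - h ** 2)     # backprop through tanh
        dW1, db1 = np.outer(dh, x), dh
        params = (self.W1, self.b1, self.W2, self.b2)
        for p, g, v in zip(params, (dW1, db1, dW2, db2), self.vel):
            v *= momentum
            v -= lr * g
            p += v
        return 0.5 * err ** 2

# Targets: road directions in degrees from [-80, 80) scaled to [-0.8, 0.8),
# i.e., target = angle_deg / 100.0, as described in the paper.
```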
Table 1. Number of road images (in the training and test sets) where the system has correctly predicted road directions within the given prediction accuracy.

Prediction accuracy          5°     10°    15°    20°
Training set (236 images)    211    226    230    231
Test set (100 images)         57     81     86     91
Fig. 2. Six road images from a set of 336 grabbed images used in the testing of the road direction detection system.
Fig. 3. Four road images of the test set where the error in the predicted road direction was more than 20 degrees.
The road images were divided randomly into training and test sets. The training set contained 236 images, leaving 100 road images for the test set. First the SOMs V_f^k and V_r^k were trained with the Gabor features extracted from the training images. After that the MLP network was trained. In order to obtain the best generalization, the training of the MLP was stopped and the best-generalizing weights were saved when the road direction prediction results on the test images were most accurate. The accuracy of the prediction can easily be determined by summing the errors between the defined road directions and the predicted ones. When the summed error is at its minimum, the best-generalizing weights are obtained.
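The weight-selection rule described above can be written as a simple loop that tracks the summed prediction error on the held-out images and keeps the best weights seen so far, reusing the RoadDirectionMLP sketch from the previous listing. The epoch count and the exact error measure are assumptions.

```python
import copy
import numpy as np

def train_with_best_weights(net, train_set, test_set, n_epochs=200):
    """train_set / test_set: lists of (input_vector, normalized_direction) pairs."""
    best_error, best_params = np.inf, None
    for epoch in range(n_epochs):
        for x, target in train_set:
            net.train_step(x, target)
        # Summed error between defined and predicted road directions on the test images.
        test_error = sum(abs(net.forward(x)[0] - t) for x, t in test_set)
        if test_error < best_error:
            best_error = test_error
            best_params = copy.deepcopy((net.W1, net.b1, net.W2, net.b2))
    net.W1, net.b1, net.W2, net.b2 = best_params   # restore the best-generalizing weights
    return net, best_error
```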
Table 1 shows the prediction results obtained both with the training images and the test images as a function of the prediction accuracy from 5 degrees up to 20 degrees. The results illustrate that after training the system can estimate the direction of the road within 10 degrees accuracy. However, there are some road images where the error in the predicted road direction was more than 20 degrees. A training set that is too small is clearly one reason for this; by increasing the size of the training set the prediction results should improve. Of course, more experiments should be carried out in order to validate the proposed road direction detection system but, nevertheless, the results are encouraging considering real autonomous vehicles. Fig. 3 shows three images in the test set where the prediction error was more than 20 degrees.
4. CONCLUSION

This paper proposed a system for learning and recognition of road directions from camera images. The basic components of the proposed road direction detection system are the feature extraction layer based on Gabor filters, the unsupervised layer based on Self-Organizing Maps, and the supervised layer based on a multilayer perceptron network. The proposed treatment of road direction detection learning for an autonomous vehicle addresses several crucial issues. First, it is beneficial to learn road directions in different weather conditions directly from concrete observations without any need for deriving symbolic road descriptions from the images. In general it would be tedious, time consuming, and even impossible to write explicit road direction detection programs for autonomous vehicles that must work on various road types. It is much easier if the system can construct the knowledge for the road direction detection task directly from measurements perceived through the system's own sensors, as in our case. This also reduces the cost of maintenance and reprogramming. Moreover, after training, while operating on-line, the proposed system is computationally quite moderate, making it applicable to autonomous vehicles with modest computing resources.

The performance of the system was tested with real world road images taken by a camera mounted on a passenger car. The results given in this paper are promising even when only about two hundred road images are used to train the system. Of course, it is obvious that the actual viability of the proposed road direction detection system cannot be guaranteed without experiments on a real autonomous vehicle. Therefore, in the near future the system will be tested with a passenger car (Toyota Crown -71) equipped with actuators controlling the steering, the brakes and the rpm of the engine while driving on a road. The computer responsible for the navigation of the car is a 486/100 MHz laptop equipped with an I/O board and an image processing board. The maximum speed of the experimental vehicle will be about 50 km/h; the top speed of the car under computer control is set directly by the performance of the computer used. In this case the goal was to develop an autonomous navigation system which is easy to install in any kind of vehicle. Keeping the hardware costs at a reasonable level was also one of the main themes. In the future the plan is to train the system for driving on gravel paved roads as well as operating in winter conditions.
5. REFERENCES

Bimbo, Del A., L. Landi and S. Santini (1992). Dynamic Neural Estimation for Autonomous Vehicles Driving. Proc. 11th IAPR International Conference on Pattern Recognition, Vol. II, 350-354.
Daugman, J. (1988). Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression. IEEE Trans. on ASSP, 36(7), 1169-1179.
Heikkonen, J. (1994). Subsymbolic Representations, Self-Organizing Maps, and Object Motion Learning. Dr.Tech. thesis, Lappeenranta University of Technology, Finland.
Kohonen, T. (1982). Self-organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, 43, 59-69.
Kohonen, T. (1989). Self-Organization and Associative Memory. Springer-Verlag, Berlin, Heidelberg.
Kohonen, T. (1991). Self-Organizing Maps: Optimization Approaches. Artificial Neural Networks (T. Kohonen, K. Mäkisara, O. Simula and J. Kangas (Eds.)), Vol. 2, 981-990.
Koikkalainen, P. (1994). Progress with the Tree-Structured Self-Organizing Map. Proc. 11th European Conference on Artificial Intelligence, Amsterdam, Netherlands, 211-215.
Lampinen, J. and E. Oja (1990). Fast Computation of Kohonen Self-Organization. Neurocomputing: Algorithms, Architectures and Applications (F. Fogelman Soulié and J. Hérault (Eds.)), NATO ASI Series F: Computer and Systems Sciences, Springer-Verlag, 65-74.
Lampinen, J. (1991). Feature Extractor Giving Distortion Invariant Hierarchical Feature Space. Applications of Artificial Neural Networks II (S.K. Rogers (Ed.)), Proc. SPIE 1469, 832-842.
Lampinen, J. (1992). Neural Pattern Recognition: Distortion Tolerance by Self-Organizing Maps. Dr.Tech. thesis, Lappeenranta University of Technology, Lappeenranta, Finland.
Luttrell, S.P. (1989). Self-organization: a Derivation from First Principles of a Class of Learning Algorithms. Proc. International Joint Conference on Neural Networks, Vol. 2, 459-498.
Nitzan, D., R. Bolles, J. Kremers and P. Mulgaonkar (1987). 3-D Vision for Robot Applications. Machine Intelligence and Knowledge Engineering for Robotic Applications (K.C. Wong and A. Pugh (Eds.)), NATO ASI Series F: Computer and Systems Sciences, Springer-Verlag, 21-89.
Pomerleau, D.A. (1989). ALVINN: An Autonomous Land Vehicle in a Neural Network. Advances in Neural Information Processing Systems (D.S. Touretzky (Ed.)), San Mateo, CA: Morgan Kaufmann, 305-313.
Ritter, H. and K. Schulten (1988). Convergence Properties of Kohonen's Topology Preserving Maps: Fluctuations, Stability, and Dimension Selection. Biological Cybernetics, 60, 59-71.
Rumelhart, D.E., G.E. Hinton and R.J. Williams (1986). Learning Internal Representations by Error Propagation. Parallel Distributed Processing (D.E. Rumelhart, J.L. McClelland (Eds.)), The MIT Press, Cambridge, MA, 318-362.
Thorpe, C.E. (1991). Outdoor Visual Navigation for Autonomous Robots. Robotics and Autonomous Systems, 7, 85-98.
Vogl, T.P., J.K. Mangis, A.K. Rigler, W.T. Zink and D.L. Alkon (1988). Accelerating the Convergence of the Backpropagation Method. Biological Cybernetics, 59, 257-263.