Copyright © IFAC 12th Triennial World Congress, Sydney, Australia, 1993
AUTONOMOUS VISION-BASED NAVIGATION IN BUILDINGS
R. Frezza*, F. Vedana*, G. Pled* and P. Perona*,**
*Università di Padova, Italy
**California Institute of Technology, USA
Abstract- In this paper, the problem of vision-based autonomous navigation inside buildings is addressed. In our simplified scenario the vehicle navigates along an irregular corridor whose shape is not known a priori but is characterized by boundaries having a simple piecewise-linear profile. The navigation problem is approached as the control problem of tracking the 'mopboard' of one of the walls of the corridor while keeping a preset distance from it. The state of the system consists of the state of the vehicle augmented by a description of the environment. The environment is sensed using early vision techniques and is modelled as a filtered random walk, as proposed by Dickmanns in his work on road-following. Dickmanns' road model is augmented with a jump process to handle environment discontinuities. We demonstrate our scheme and compare it to Dickmanns' in extensive simulations.
Keywords: navigation, autonomous navigation, autonomous vehicles, vision, computer vision
1
INTRODUCTION
The problem of visually-guided autonomous navigation in man-made environments is studied. By 'autonomous' it is meant that the human operator enters the picture only for 'high level' mission task definition. The vehicle (mobile robot) navigates in proximity of man-made structures. These structures are not known a priori, but are characterized by borders or boundaries of a simple geometric shape having some kind of regularity or predictability. Typical examples are unmanned automatic vehicle steering and guidance on unknown roads, navigation of unmanned submarines along underwater structures (pumping wells or cables), the motion of a space vehicle along orbiting space structures, etc. In applications of this kind the vehicle must typically follow a preassigned visible line, a border or a privileged boundary of some kind, perhaps coasting at a preassigned speed while keeping a desired distance and a fixed relative heading with respect to the line. We stress that the borderline to be tracked, although of a generically 'familiar' shape, is a priori unknown. To accomplish its mission, i.e. accurate and safe navigation, the mobile robot must therefore be able to estimate its environment continuously on-line, so as to recognize and predict as accurately as possible the geometry of the border.
Vision is a particularly appropriate sensor to use in navigation applications. Images are captured via TV cameras mounted on the vehicle, either rigidly attached or attached to a mobile platform to achieve active gaze control. Relative vehicle-environment position measurements are easily obtained on the focal (image) plane of a TV camera by just reading off the digital coordinates of the pixels composing the part of the image of interest. The use of early vision techniques that perform on-line feature recognition (say, borderline edge detection or detection of special points on the borderline) is standard [Dickmanns, 1988b]. The use of Dynamic Vision, as suggested by Dickmanns in his fundamental work [Dickmanns et al., 1989, Dickmanns, 1988b, Dickmanns, 1988a, Zapp, 1988], is especially attractive as it does not require processing of the whole image frame; it requires only feature-following on the image plane, using a bank of Kalman filters in parallel (which update the predicted positions of interesting features), and local processing of data in small windows of the image plane.
In this paper we shall consider a rather idealized situation arising from a very particular kind of environment, namely hallways in the interior of buildings. We shall assume that the vehicle moves on a plane (the floor) and that the line to be followed is just the intersection of the walls with the floor, which we shall call the mopboard hereafter. Discontinuities of the mopboard may occur due to branches of the corridors, secondary entrances, protrusions, etc. This type of environment may be considered a generalization of the road environment of Dickmanns in that we allow for these discontinuities. We demonstrate that the model of the environment used by Dickmanns is inappropriate for navigation in this type of environment and propose using on-line estimation of discontinuity location to supplement the filter's estimates.
1.1
Vehicle-environment geometry
To describe the geometry of the system we shall find it convenient to work with several coordinate frames:
vehicle frame $\{x_v, y_v, z_v\}$ - This has its origin at the projection of the center of gravity G of the vehicle on the floor; the axis $y_v$ is parallel to the floor and coincides with the normal advancement direction (surge), $x_v$ is also parallel to the floor and normally facing the wall, $z_v$ is vertical.
environment frame $\{x_a, y_a, z_a\}$ - This is the 'projection' of the vehicle frame onto the mopboard: it has its origin on the mopboard at the point where the mopboard intersects the line containing $x_v$. The axis $x_a$ lies on the floor orthogonal to the mopboard, $y_a$ is tangent to the mopboard and $z_a$ is vertical, facing upwards.
inertial fixed frame $\{x_f, y_f, z_f\}$ - This is constant in time and coincides with the environment frame at time t = 0. After departure the environment frame moves, following the vehicle while travelling along the mopboard.
There is also a fourth coordinate frame $\{\xi, \eta\}$ on the image plane of the TV camera.
The visual measurements of the mopboard mimic Dickmanns' scheme and that of one of the mobile vehicles at the I.R.S.T. in Trento [Poggio et al., 1992]. The observation process is simplified by assuming that only one wall is observed and that only one feature point is considered as a significant description of the mopboard at any given instant t. The selected feature is the intersection of the mopboard with the vertical plane $y_v = L$ crossing the $y_v$ axis perpendicularly at a predefined distance L from the focus of the camera (L is a design parameter representing the "look-ahead distance" of the system). If the optical axis of the camera is aligned with $y_v$, the intersection of the plane $y_v = L$ with the floor projects (we are assuming perspective projection) onto the image scanline $\eta = -hf/L$, where h is the camera height over the floor plane and f is the focal length of the TV camera (e.g. f = 35 mm). This approach is quite suitable for on-line calculations, as the location $x_v$ of the mopboard at distance L may be found very simply and cheaply by a search along $\eta = \mathrm{const}$ on the image plane. The coordinate $\xi$ measured on the image plane is related to the world coordinate again by perspective projection: $x_v = \xi L / f$.
The vehicle is of the synchro-drive type, with three wheels that steer and turn simultaneously and whose instantaneous direction is also the current velocity vector direction (no slippage). The steering angle with respect to the body principal axis $y_v$ (which is also the camera optical axis) is supposed to be small enough that longitudinal and lateral motions can be approximately decoupled. The longitudinal speed is held approximately constant by a longitudinal controller at a nominal value v, which is computed from a priori specifications on maximal tolerable centripetal acceleration, stopping time to avoid collision with the wall, etc. In the simulations v has been chosen equal to 0.5 m/s. The mass of the cart is m = 150 kg and the moment of inertia is J = 40 kg m².
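The look-ahead measurement geometry described above can be sketched in a few lines of Python. This is only an illustration under the stated perspective-projection assumption: the class and function names are hypothetical, and the camera height and look-ahead distance used in the example are made-up values (only f = 35 mm appears in the text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    f: float  # focal length [m]; 0.035 corresponds to the 35 mm lens in the text
    h: float  # camera height above the floor [m]; hypothetical value below


def scanline_eta(cam: Camera, look_ahead: float) -> float:
    """Image row eta = -h f / L onto which floor points at distance L project."""
    return -cam.h * cam.f / look_ahead


def image_to_lateral(cam: Camera, xi: float, look_ahead: float) -> float:
    """Lateral position x_v = xi L / f of the mopboard at distance L."""
    return xi * look_ahead / cam.f


def lateral_to_image(cam: Camera, x_v: float, look_ahead: float) -> float:
    """Inverse mapping: image coordinate xi = f x_v / L."""
    return cam.f * x_v / look_ahead


# Example with assumed values h = 1.0 m and L = 2.0 m (not from the paper).
cam = Camera(f=0.035, h=1.0)
eta = scanline_eta(cam, 2.0)
xi = lateral_to_image(cam, 1.0, 2.0)
print(eta, xi, image_to_lateral(cam, xi, 2.0))
```

Because the scanline is fixed once h, f and L are chosen, the search for the mopboard reduces to a one-dimensional scan along a single image row.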
2
MODELING AND ESTIMATION OF THE ENVIRONMENT
In this section a simple dynamical model of the environment will be introduced and its use in estimation of the environment will be discussed. The mopboard is modelled as a continuous curve and therefore will not adequately describe the jumps occurring at wall recesses or corresponding to sudden protrusions. The jumps could in principle be dealt with by modeling the mopboard line as a jump process and the observation just as a discontinuity counting process, and then using appropriate jump-process estimation techniques [Segall et al., 1975b, Segall et al., 1975a, Bremaud, 1981]. This approach, although seemingly the methodologically correct one, is complicated by the simultaneous presence of continuous observation noise (pixel location, various camera inaccuracies, vehicle chattering, etc.), which makes the observation process of a mixed counting-continuous type. For this reason the simple jump-process estimation algorithms used for counting observations are of no immediate use. In fact, accurate detection of the jump points in the observation signal is one of the major problems for accurate estimation. The detection of jumps in the observations will be dealt with separately by means of an ad hoc technique inspired by a classical early vision edge detection method [Canny, 1986]. Between jumps the problem will be treated as an ordinary Kalman filtering problem of continuous type. The interplay of the two techniques has turned out to yield quite satisfactory results.
2.1
2nd order model
The model is inspired by the "autobahn" model of Dickmanns [Dickmanns, 1988a]. The mopboard line is essentially modeled as a curve with a time-varying random curvature. We choose to model the curvature of the wall, as seen by an observer travelling on board the vehicle, as an integrated white noise with a suitable variance (proportional to the square of the longitudinal speed v). In environment-fixed coordinates a point on the mopboard at a vertical distance $y_a$ from the origin will then deflect from the vertical $y_a$-axis by the random amount
$$x_a = c\,y_a^2 \tag{1}$$
where $c(t)$ is the curvature described by
$$\dot{c} = \nu(t) \tag{2}$$
$\nu(t)$ being a white noise process of suitable variance. The wall's profile is described locally as a family of parabolas. Of course the mopboard line is very far from a parabola but, locally, far from jump points, the model is acceptable. With the above description of the environment, the lateral dynamics of the vehicle can be written in terms of four state variables:
$\omega$ : the (inertial) angular velocity of the vehicle;
$\psi$ : the heading relative to the environment, i.e. the angle formed by $y_a$ and $y_v$;
$x$ : the distance between the origins of the body and environment frames;
$c$ : the curvature of the mopboard.
In the body reference frame the lateral dynamics is then given by
$$\begin{cases} \dot{\omega}(t) = T/J \\ \dot{\psi}(t) = \omega + \omega_a \\ \dot{x}(t) = -v\tan\psi + \omega x\tan\psi \\ \dot{c}(t) = \nu(t) \end{cases} \tag{3}$$
where T is the control torque and v is treated as a known (but possibly time-varying) parameter. Observe that the rate of change of the heading variable $\psi$ is actually the sum of the vehicle's angular speed and of the turning speed $\omega_a$ of the environment frame, $\omega_a = [\ldots]$ (the closed-form expression is illegible in this copy; under the small-angle assumptions below it reduces to $-2cv$). The measurement equation is obtained from simple geometric computations, with a slight approximation:
$$\xi = f\left(-\tan\psi + [\ldots]\right) \tag{4}$$
where the two terms marked $[\ldots]$ are illegible in this copy; they reduce to $x/L$ and $cL$ in the linearized form (6).
A linearized model is derived from (3), (4) assuming small angles and small angular velocities, viz.:
1. $x\tan\psi \simeq 0$;
2. [illegible in this copy];
3. even with focal length f = 35 mm (a wide-angle lens) the angle of observation of the feature point is $\le 30°$; the approximations $\cos\psi \simeq 1$ and $\sin\psi \simeq \psi$ may be considered acceptable.
The linearized state equations (3) become:
$$\begin{cases} \dot{\omega}(t) = T/J \\ \dot{\psi}(t) = \omega - 2cv \\ \dot{x}(t) = -v\psi \\ \dot{c}(t) = \nu(t) \end{cases} \tag{5}$$
while (4) becomes:
$$\xi = f\left(-\psi + \frac{x}{L} + cL\right) \tag{6}$$
If we treat the longitudinal speed v as a fixed parameter, this is a time-varying linear system and linear systems analysis can be applied. The linearized model (5), (6) has been used for implementing a Kalman filter to do on-line mopboard estimation. The sampling period $\Delta t = 4 \cdot 10^{-2}$ s was chosen to correspond to the frame frequency of the TV camera. Even in simple open-loop simulations, say with the vehicle traveling at constant longitudinal speed along a straight line, the extreme sensitivity of the filter and the need for very accurate tuning are apparent. Tuning the filter to get the best possible estimate for both the continuous segments of the signal and the jumps results in a very poor overall performance. Large spikes are observed in the innovation process in correspondence to discontinuities. In order to follow the x variable, the estimate of $\psi$ (which should be constant and equal to zero everywhere) is considerably worsened.
3
JUMP DETECTION
The main limitation of the mopboard model discussed in the previous section is that it describes inherently continuous signals, while the wall profile can obviously be highly discontinuous. It is quite evident that explicit modelling of the discontinuities can substantially improve the characteristics of the filter. Attempts to bring in discontinuous modeling of the jumps have led to complicated models and rather non-robust algorithms with divergence problems. In this paper we propose instead an on-line jump detection algorithm operating directly on the observed signal (i.e. without using any a priori modeling of the dynamics of the jumps). From extensive experimentation it has actually been found that a simple variant of a very common edge-detection technique used in early vision, namely the so-called Canny filter, serves the purpose quite well.
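To make the limitation concrete, the following is a minimal sketch of a discrete-time Kalman filter built on the linearized model (5), (6), read here as $\dot\omega = T/J$, $\dot\psi = \omega - 2cv$, $\dot x = -v\psi$, $\dot c = \nu$ and $\xi = f(-\psi + x/L + cL)$. It is not the authors' implementation: the forward-Euler discretization, the state ordering, the look-ahead distance L and all noise levels are assumptions made for illustration. The vehicle runs along a straight wall whose offset steps by 0.5 m; with noise-free measurements the innovation is zero before the step and equals f·step/L at the step.

```python
# Assumed state ordering: s = [omega, psi, x, c]. Pure Python, no dependencies.
DT = 0.04        # sampling period [s] (from the text)
V = 0.5          # longitudinal speed [m/s] (from the text)
J = 40.0         # moment of inertia [kg m^2] (from the text)
FOCAL = 0.035    # focal length [m] (from the text)
L = 2.0          # look-ahead distance [m] (hypothetical value)
R = 1e-8         # measurement noise variance (hypothetical value)
Q = [0.0, 0.0, 0.0, 1e-6]  # per-step process noise on diag(P) (hypothetical)

# F = I + DT * A, with A read off the linearized equations (5).
F = [
    [1.0, 0.0, 0.0, 0.0],           # omega' = omega + DT * T / J
    [DT, 1.0, 0.0, -2.0 * V * DT],  # psi'   = psi + DT * (omega - 2 c v)
    [0.0, -V * DT, 1.0, 0.0],       # x'     = x - DT * v * psi
    [0.0, 0.0, 0.0, 1.0],           # c'     = c (driven by noise only)
]
G = [DT / J, 0.0, 0.0, 0.0]              # torque input vector
H = [0.0, -FOCAL, FOCAL / L, FOCAL * L]  # measurement row, from (6)


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kalman_step(s, P, z, torque=0.0):
    """One predict/update cycle; returns (state, covariance, innovation)."""
    n = len(s)
    # Predict the state and covariance one sample ahead.
    s = [sum(F[i][j] * s[j] for j in range(n)) + G[i] * torque for i in range(n)]
    P = matmul(matmul(F, P), [list(col) for col in zip(*F)])
    for i in range(n):
        P[i][i] += Q[i]
    # Scalar measurement update.
    innovation = z - sum(H[j] * s[j] for j in range(n))
    PH = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    S = sum(H[i] * PH[i] for i in range(n)) + R
    K = [ph / S for ph in PH]
    s = [s[i] + K[i] * innovation for i in range(n)]
    P = [[P[i][j] - K[i] * PH[j] for j in range(n)] for i in range(n)]
    return s, P, innovation


def run_straight_wall_with_step(n_steps=200, k_jump=100, x0=1.0, step=0.5):
    """Innovation sequence for a straight wall whose offset jumps at k_jump."""
    s = [0.0, 0.0, x0, 0.0]
    P = [[1e-4 if i == j else 0.0 for j in range(4)] for i in range(4)]
    innovations = []
    for k in range(n_steps):
        x_true = x0 + (step if k >= k_jump else 0.0)
        z = H[2] * x_true  # noise-free xi with psi = c = 0
        s, P, e = kalman_step(s, P, z)
        innovations.append(e)
    return innovations


innovations = run_straight_wall_with_step()
print(innovations[99], innovations[100])
```

The continuous model has no way to anticipate the step, so the whole discontinuity appears in the innovation; this is the signal on which the detector of this section operates.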
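A moving-window detector of the kind used in this section can be sketched as follows: the innovation is summed over the last W samples (a rectangular kernel) and the sum is compared with a threshold. The function name, the hold-off after a detection (which stands in for the filter correction and covariance reset) and the numerical values are assumptions made for illustration, not the authors' tuning.

```python
def detect_jumps(innovations, width, threshold, gain=1.0):
    """Rectangular-kernel detector acting on a sampled innovation sequence.

    Returns a list of (sample_index, estimated_jump_size) pairs, where the
    size estimate is the windowed sum scaled by a tuning factor `gain`.
    """
    detections = []
    next_allowed = 0  # hypothetical hold-off, mimicking the filter reset
    for k in range(len(innovations)):
        if k < next_allowed:
            continue
        # Integrate the innovation over the moving window [k - width + 1, k].
        window_sum = sum(innovations[max(0, k - width + 1):k + 1])
        if abs(window_sum) > threshold:
            detections.append((k, gain * window_sum))
            next_allowed = k + width
    return detections


# Synthetic innovation: small alternating ripple plus one spike at sample 50.
# The spike size matches f * step / L for f = 0.035 m, step = 0.5 m, L = 2 m
# (L and the step are assumed values); gain = L / f maps it back to metres.
signal = [1e-4 * (-1) ** k for k in range(100)]
signal[50] += 0.00875
print(detect_jumps(signal, width=4, threshold=0.005, gain=2.0 / 0.035))
```

A wider window averages out more noise but delays the detection, which is the trade-off on W discussed below.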
Our Canny-like edge detector used for jump estimation is a simple correlation-threshold device which operates on the innovation process of the Kalman filter through a finite-memory correlation mask (discussed below), followed by a suitable threshold to detect the spikes in the innovation process that are, with highest probability, due to discontinuities in the mopboard signal. To get an optimal design of the correlation filter, one should in principle choose a Canny kernel adapted to the particular edge to be detected [Perona et al., 1990]. In our case the kernel has been chosen equal to a rectangular pulse of suitable width W. The (discretized) innovation process is thus integrated along a moving window [t - W, t] in time and the integrated process is then continuously compared with a suitable threshold. The width W must actually be chosen not too large, so as to avoid unacceptably long delays in the edge recognition process. The output of the detector is clearly proportional to the size of the jump in the wall. Through a suitable proportionality factor (which is calculated in the tuning phase of the detector) an estimate of the jump size is also obtained. This is used to correct on-line the Kalman filter estimate at jump times. At jump occurrence times the filter covariance is also re-initialized so as to get a faster transient. As for the threshold level, in the presence of additive Gaussian noise one should choose a value of the order of the inverse amplitude signal-to-noise ratio. Exact merit figures (probability of false detection, probability of missing a discontinuity, SNR) are calculated in [Canny, 1986, Perona et al., 1990]. This is just a rule of thumb to set start-up values in running simulations. The fine tuning of the threshold value is then done experimentally. As seen from the simulations, the filter performance is greatly improved by the introduction of the jump detection algorithm.
4
CONCLUSIONS
Our work is part of the effort to approach vision-based vehicle navigation with modern control and estimation techniques. We have presented a simple scheme for vehicle navigation inside buildings. Our scheme may be seen as an extension of previous work on navigation by Dickmanns and others; while they study smooth road-following, we study the case in which substantial discontinuities are present in the environment. We have shown experimentally that in this case continuous world models lead to poor estimation and tracking performance. We have proposed detecting world discontinuities with a Canny-like edge detector whose output is used to update the state of the Kalman filter. An implementation of this idea performs excellently in our simulations. We believe that we are able to simulate realistically the estimation and control aspects of indoor navigation; the vision techniques on which our scheme would rely are simple and robust and should not represent an obstacle in a real-world implementation. Work is in progress to validate our scheme by demonstrations of indoor navigation of a real vehicle.
5
ACKNOWLEDGEMENTS
This research will be partially sponsored by the Italian Space Agency (ASI) grant n. CS-ASI-92-298. We are grateful to Tomaso Poggio for useful discussions and for providing useful references. Discussions with P. Bellutta and T. Coianiz of IRST were also useful for understanding many practical aspects of autonomous navigation of mobile vehicles.
6
REFERENCES
P. Bremaud (1981). Point Processes and Queues. Martingale Dynamics. Springer-Verlag.
J. Canny (1986). A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell., 8, 679-698.
E. D. Dickmanns and Th. Christians (1989). Relative 3d-state estimation for autonomous visual guidance of road vehicles. In Intelligent autonomous system 2 (IAS-2), Amsterdam, pp. 11-14.
E. D. Dickmanns and V. Graefe (1988). Applications of dynamic monocular machine vision. Machine Vision and Applications, 1, 241-261.
E. D. Dickmanns and V. Graefe (1988). Dynamic monocular machine vision. Machine Vision and Applications, 1, 223-240.
B. Horn (1986). Robot vision. MIT Press.
P. Perona and J. Malik (1990). Detecting and localizing edges composed of steps, peaks and roofs. In Proc. 3rd Int. Conf. Computer Vision, pp. 52-57. IEEE Computer Society, Osaka.
T. Poggio and L. Stringa (1992). A project for an intelligent system - vision and learning. Intl. J. of Quantum Chem., 42(4), 727-739.
A. Segall, M. Davis, and T. Kailath (1975). Nonlinear filtering with counting observations. IEEE Trans. Inform. Theory, IT-21(2), 143-149.
A. Segall and T. Kailath (1975). The modeling of randomly modulated jump processes. IEEE Trans. Inform. Theory, IT-21(2), 135-142.
A. Zapp (1988). Automatische Strassenfahrzeugfuehrung durch Rechnersehen. PhD thesis, Universitaet der Bundeswehr Muenchen, Fakultaet fuer Luft- und Raumfahrttechnik.