Copyright © IFAC Transportation Systems, Tianjin, PRC, 1994
PERCEPTION-INTERPRETATION INTERFACING FOR THE PROLAB2 ROAD VEHICLE

D. HUTBER, S. MOISAN, C. SHEKHAR and M. THONNAT

INRIA Sophia-Antipolis, 2004 route des Lucioles, 06902 Sophia-Antipolis Cedex, FRANCE.
Abstract. This paper describes an interface between a low-level perception system and a high-level interpretation system. This is done in the context of an intelligent co-pilot, which is designed to help the driver of a road vehicle drive more safely. The interface consists of two parts. First, a multi-sensor data fusion module processes the results of the perception modules and passes a coherent environment map to the interpretation system. Second, a program supervision module translates high-level requests from the interpretation system to low-level perception commands. An example in a realistic traffic situation is presented.

Key Words. Image Processing, Artificial Intelligence, Multiprocessing systems, Self-adapting systems, Obstacle avoidance, Automobile
1. INTRODUCTION
Within the framework of the European traffic safety project Prometheus, the PROART group in France aims at designing a demonstrator vehicle named PROLAB2. This vehicle is equipped with an onboard guidance system (or co-pilot) to help the driver in negotiating road obstacles in different realistic traffic situations. A road obstacle may be another car, a bicycle, a pedestrian, a truck, etc. The co-pilot (Hassoun et al., 1993) is an intelligent real-time system which processes the information flowing into it from a variety of sensors mounted on the vehicle (5 CCD cameras, 1 linear stereo camera pair and 1 telemetry-camera combination, in addition to a number of proprioceptive sensors), uses this information to interpret the traffic situation in the vicinity of the vehicle, and communicates its analysis to the driver by various means.
Fig. 1: The architecture of the co-pilot (the perception system operates at the low level, the interpretation system at the high level).
The co-pilot consists mainly of the following components (Fig. 1):
1. The sensors and the programs for processing the sensor data (the perception system).
2. The interpretation of the traffic situation (the interpretation system) (Lefort, Abou-Khaled and Ramamonjisoa, 1993).
3. The interface between the above two systems (this interface should not be confused with the interface between the co-pilot and the driver).

In this paper, we will focus on item 3. This interface consists of two modules: the multi-sensor data fusion and the program supervision modules.

2. THE NEED FOR AN INTERFACE

The need for an interface between low-level perception and high-level interpretation systems is encountered in a variety of applications. The concepts, processing and expertise involved at the two levels are different, and a direct linkage between them is often very difficult. In Prolab2, the low level is concerned with sensors, programs, parameters, data, etc., while the high level deals with zones, obstacles, situations, scenarios, etc.
In this context, we can make the following observations: (1) It is not possible to directly feed the results of low-level processing to the high level, since the two levels deal with different concepts. There is thus a need for the processed low-level data to be translated into a form which can be directly used by the interpretation module. (2) The information that is needed by the high level, often expressed in symbolic terms, has to be converted into specific low-level processing commands. In Prolab2, (1) is done using a multi-sensor fusion module, and (2) using a program supervision module.
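To make the gap between the two representations concrete, the following minimal sketch (not taken from the paper; all names, fields and values are hypothetical) shows one way the symbolic requests flowing downwards and the environment-map entries flowing upwards might be represented.

```python
# Hypothetical sketch of the two directions of the interface; the names and
# fields are illustrative, not the actual Prolab2 data structures.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Request:                         # high level -> low level (symbolic)
    kind: str                          # "DETECTION", "FOCUS_ON" or "OBSERVATION"
    zone: Optional[int] = None         # zone of interest (1..6), if applicable
    obstacle_id: Optional[int] = None  # obstacle to focus on, if applicable
    priority: int = 0                  # a higher priority may interrupt a lower one

@dataclass
class ObstacleEstimate:                # low level -> high level (environment map entry)
    obstacle_id: int
    position: Tuple[float, float]      # in the vehicle-centred frame (m)
    velocity: Tuple[float, float]      # (m/s)
    precision: float                   # spatial uncertainty, e.g. std. dev. in m
    validity: float                    # probability that the obstacle is real

EnvironmentMap = List[ObstacleEstimate]

# Example: the interpretation level asks for detection in zone 2 and later
# receives a list of ObstacleEstimate records (the environment map) in return.
req = Request(kind="DETECTION", zone=2, priority=1)
```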
3. IMPORTANT PROLAB2 CONCEPTS

In this section, we introduce some of the concepts used in the Prolab2 system.

Zones of interest: The space around the vehicle is divided into 6 zones, which have varying significance depending on the current situation of the vehicle.

Logical sensors: A logical sensor is composed of one or more physical sensors and of image processing programs. The role of a logical sensor is to provide a precise type of information from a given zone of interest.

Requests: Requests are sent from the interpretation to the program supervision, asking for a specific type of information to be made available. Requests can be to focus on an obstacle (tracking), to detect obstacles in a particular zone, etc. Requests are expressed at the level of zones and obstacles, and not at the level of sensors, parameters, data flow, etc. Requests can have different priorities, and a high-priority request can interrupt one with a lower priority.

Events: As the test vehicle moves through the traffic, there are various important sensed occurrences of different types, such as lane changes, the appearance or disappearance of an obstacle, the detection of a traffic beacon, etc. These are called events.

Scenarios: A scenario is a sequence of events corresponding to a typical driving situation, such as an urban road with traffic lights or pedestrians, overtaking on a highway or a freeway, etc.

Context: In order to process a request, it is not sufficient to know just the input data and the programs that can process the data as desired. Additional information, such as the current scenario, the time of day, the weather, the current speed of the vehicle, etc., may be needed in the selection of the appropriate programs and their parameters. Broadly speaking, the context can be defined to be the sum of all such information available at the current time. However, only a small subset of this information may be relevant to the processing of requests.

4. MULTI-SENSOR DATA FUSION

The "raw" outputs from the perception modules are all expressed in their own local coordinate frames, and are often too noisy and unreliable to be used directly by the interpretation module. The function of the data fusion module is to convert all the spatial information arriving from the perception modules to the same common coordinate frame, and to reduce the uncertainty associated with it by combining it together. The first step is achieved by calibrating all the sensors on the vehicle with respect to a central coordinate frame fixed to it. Classical ways of doing the second step are to use the constraint of temporal coherence (Stephens and Harris, 1988) to achieve temporal fusion, and the constraint of redundancy/complementarity (Houzelle, 1992) to achieve multi-sensor fusion. The Prolab2 system uses both of these constraints, integrating measurements from several sensors over time in an asynchronous approach, to produce a dynamic environment map. The output from the fusion module is an environment map, in terms of a list of perceived, real-world obstacles together with their position and velocity, and the precision and validity of this information. In order to manipulate the incoming data, which is for each of the sensors a list of regions in 2D or 3D corresponding to possible 'obstacles' together with a time-stamp, the module has models available of the different sensor types and the expected precision (spatial uncertainty) and the expected validity (expressed in probabilistic terms) of the data from them. The configuration of sensors used on the vehicle means that there are currently three main types of sensor having different models. The three types of sensor have different characteristics in terms of the dimensionality of the geometric information they provide, and of its precision. The three types are:
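The first of these steps (expressing all measurements in a common vehicle-centred frame) amounts to applying, for each sensor, the rigid transformation obtained by calibration. A minimal sketch, assuming 2D ground-plane coordinates and purely illustrative calibration values:

```python
import numpy as np

# Assumed: each sensor's calibration gives a rotation and translation that map
# points from the sensor frame into the central frame fixed to the vehicle.
# Values and axis conventions below are purely illustrative.
def sensor_to_vehicle(points_sensor, yaw_rad, t_xy):
    """Map Nx2 ground-plane points from a sensor frame to the vehicle frame."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    R = np.array([[c, -s],
                  [s,  c]])
    return points_sensor @ R.T + np.asarray(t_xy)

# Example: a detection 12 m ahead of a side camera that is mounted 0.8 m to the
# left of the vehicle origin and rotated 30 degrees outwards.
detection = np.array([[12.0, 0.0]])
in_vehicle_frame = sensor_to_vehicle(detection, np.deg2rad(30.0), (0.0, 0.8))
```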
Sensor Type                2D/3D   Lateral Precision   Longitudinal Precision
Time-of-flight telemetry   3D      Low                 High
Linear stereo              3D      High                High
CCD camera                 2D      High                Low
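One way to read this table, purely as an illustration: each sensor type can be given a direction-dependent measurement covariance and a prior validity that the later filtering stage consumes. The numerical values below are assumptions, not figures from the Prolab2 system.

```python
import numpy as np

# Illustrative sensor models only: the qualitative precisions in the table are
# turned into per-sensor standard deviations (lateral, longitudinal) and a
# prior validity (probability that a reported region is a real obstacle).
SENSOR_MODELS = {
    #                  (lateral std m, longitudinal std m), validity
    "telemetry":      {"sigma": (2.0, 0.2), "validity": 0.95},  # 3D, poor lateral
    "linear_stereo":  {"sigma": (0.3, 0.5), "validity": 0.90},  # 3D, range-dependent
    "ccd_camera":     {"sigma": (0.3, 3.0), "validity": 0.80},  # 2D, poor longitudinal
}

def measurement_covariance(sensor_type, rng=None):
    """Diagonal covariance in the vehicle frame; stereo degrades with range."""
    lat, lon = SENSOR_MODELS[sensor_type]["sigma"]
    if sensor_type == "linear_stereo" and rng is not None:
        lon *= 1.0 + 0.05 * rng          # assumed growth of stereo range error
    return np.diag([lat ** 2, lon ** 2])
```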
The longitudinal precision of the stereo cameras is of course dependent on the range. (Other information, such as the indicator state for motorised vehicles, can also be associated with the obstacles on request by the Interpretation module, by activating specialist perception modules.)

The working of the fusion module is based around the Extended Kalman Filter (Bar Shalom and Fortmann, 1988), which assumes constant linear dynamics for each obstacle but uses a time-varying observation model. The asynchronous aspects of the data are accommodated using update equations for the filter that depend on the elapsed time since the last update. The fusion module works in two stages, the first being a set of filters for each sensor, each filter corresponding to an obstacle observed by that sensor. Thus in Figure 2, Sensor 1 observes 3 obstacles, Sensor 2 observes 2 obstacles, etc. The second stage links together the filters (at most one from each sensor) that are observing the same physical object, yielding an estimate of its position, etc., with an uncertainty in general less than at the first stage. Thus the first stage performs the temporal aspects of fusion, and the second stage the multi-sensor aspects. For more details see Hutber, Vieville and Giraudon (1994).

Fig. 2: Two-Stage Fusion Method (first stage: one Kalman filter per obstacle and per sensor; second stage: position etc. estimations, one per physical obstacle; output: environment map as a list of obstacles).

For any given obstacle, the precision and validity of the input (and hence the output) information clearly depend on many factors, including the frequency of input information relating to the obstacle, and on which sensor type it comes from. Furthermore, as described, the fusion module is passive in the sense that it cannot direct attention to an obstacle or area of interest that could make the environment map more reliable, and it does not have any 'self-awareness' knowledge with which it could request further information. However, this knowledge does exist in the Program Supervision and Interpretation modules, and the example in Section 7 shows how this information is exploited in the context of a road scenario.
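As a rough illustration of the first-stage filtering and the second-stage combination described above, the sketch below uses a plain linear constant-velocity Kalman filter with an asynchronous (variable elapsed-time) update, and an information-weighted combination of the linked filters. The actual Prolab2 filters, observation models and linking logic are more elaborate (Hutber, Vieville and Giraudon, 1994), and all numerical values here are assumptions.

```python
import numpy as np

class ObstacleFilter:
    """First-stage filter: constant-velocity Kalman filter for one obstacle
    seen by one sensor, updated asynchronously (dt varies per measurement)."""

    def __init__(self, xy, R, t):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])    # state: x, y, vx, vy
        self.P = np.diag([R[0, 0], R[1, 1], 25.0, 25.0])
        self.t = t                                     # time of last update
        self.q = 1.0                                   # assumed process noise level

    def update(self, xy, R, t):
        dt = t - self.t
        F = np.eye(4); F[0, 2] = F[1, 3] = dt          # constant-velocity dynamics
        Q = self.q * dt * np.eye(4)                    # simplistic process noise
        H = np.array([[1., 0., 0., 0.],                # only position is observed
                      [0., 1., 0., 0.]])
        # predict to the measurement time, then correct with the new observation
        x, P = F @ self.x, F @ self.P @ F.T + Q
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        self.x = x + K @ (np.asarray(xy) - H @ x)
        self.P = (np.eye(4) - K @ H) @ P
        self.t = t

def second_stage(linked_filters):
    """Second stage: combine the filters (at most one per sensor) tracking the
    same physical obstacle into one estimate by information-weighted averaging."""
    info = sum(np.linalg.inv(f.P) for f in linked_filters)
    P = np.linalg.inv(info)
    x = P @ sum(np.linalg.inv(f.P) @ f.x for f in linked_filters)
    return x, P
```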
5. SUPERVISION OF PROGRAMS

As mentioned earlier, the perception and interpretation modules operate at different levels, and cannot communicate directly. The results of the perception module are converted by the data fusion module into a form (an environment map) which is suitable for interpretation. Depending on the traffic situation, the interpretation module may require more detailed information about one or more zones in the environment map, or may be interested in tracking a particular obstacle, in which case it emits a request. This request is expressed in a high-level symbolic language (in terms of zones, obstacles, etc.), which then has to be translated into a series of low-level commands to the perception module. This is the task of the program supervision module.

Program supervision in a perception context (VIDIMUS, 1991; Clement and Thonnat, 1993) is far more complex than a simple mapping from requests to elementary operations. The reason for this is that in order to provide the information requested by the interpretation module, a number of programs have to be selected, initialized, scheduled, linked, executed and monitored, all in real-time and under time and resource constraints. Further, all these tasks must be done in a completely autonomous fashion. This means that the program supervision module has to encapsulate all the problem-solving knowledge relating to the perception module (which includes knowledge of which processing sequences are appropriate for a given perception goal, how the parameters of the programs can be initialized, how the performance of the programs can be evaluated, etc.). This knowledge may be roughly divided into planning knowledge, which pertains to the choice of programs and their composition into processing sequences, and execution control knowledge, which is concerned with the setting of parameters, the evaluation of program results, and the failure-handling strategies to be employed if the results are not satisfactory.

In Prolab2, the basic unit of planning is the logical sensor (Henderson and Faugeras, 1988), which is essentially the combination of one or more physical sensors with a sequence of abstract processing operations. This kind of "skeletal" planning is more appropriate for Prolab2 than classical AI planning, which maps the states of the world into an ordered list of actions. In Prolab2, the states of the world are too complex to be listed exhaustively, and the "actions" (perception tasks) are complex tasks in themselves, whose behaviour cannot be predicted in advance. The skeletal plan represented by the logical sensor is concretized at run time using the contextual information available. The knowledge required for this is expressed in the form of choice rules attached to each logical sensor. The basic execution control strategy for programs is "initialize, run, evaluate results, and if satisfied go to the next step; if not, modify parameters and re-try". Execution control knowledge is expressed in terms of rules for parameter initialization, performance evaluation, and parameter adjustment. For reasons of efficiency, and due to the real-time nature of the application, in Prolab2 the stages "evaluate results" and "re-try" are kept to a minimum, being used for occasional monitoring rather than continuous result optimisation. For more details see Shekhar, Gaudin, Moisan and Thonnat (1994).
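The execution control strategy quoted above can be pictured as a small loop; the sketch below is only schematic, with hypothetical hooks standing in for the rule bases, and is not the Prolab2 supervision engine itself.

```python
# Minimal sketch of the "initialize, run, evaluate, adjust, re-try" control
# strategy. The callables passed in (run, evaluate, adjust) stand for the
# rule bases described above; names and the retry limit are illustrative,
# and the low retry limit reflects the real-time constraint mentioned above.
def execute_step(program, context, run, evaluate, adjust, max_retries=1):
    params = program.init_params(context)       # parameter initialization rules
    for attempt in range(1 + max_retries):
        result = run(program, params)           # execute the perception program
        if evaluate(result, context):           # performance evaluation rules
            return result                       # satisfied: go to the next step
        params = adjust(params, result)         # parameter adjustment rules
    return result                               # give up re-trying, keep last result
```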
6. THE PERCEPTION/INTERPRETATION LOOP

As was detailed in Section 3, the Interpretation module decides which obstacles are 'important' in the current context and may send a request such as "focus on obstacle N" to the Program Supervision module. In this sense, 'important' could mean that, given its current position and speed, the obstacle N could pose a safety threat to our vehicle.

This request is interpreted by the Program Supervision module using its knowledge of 'how to make obstacle information more reliable', given the constraints that all other obstacles in the map must be periodically updated, and that a background level of surveillance must be maintained in order to detect new obstacles. The result is thus a re-allocation of resources to focus on the nominated obstacle, which is implemented by three types of decisions:
1. Activation/deactivation of logical sensors: switch on other logical sensors which have the obstacle in their field of view, and possibly switch off logical sensors that do not.
2. Windowing of image data: restrict the processing of data from a sensor to a defined window containing the obstacle.
3. Adjustment of parameters: adjust the parameters of the processing modules looking at the obstacle to maximise the useful information extracted in the neighbourhood of the obstacle (possibly to the detriment of the useful information relating to other obstacles).

As a result of this change of priorities, the Fusion module will now receive more information relating to obstacle N (either from different sensors, at a higher frequency, or of better quality due to the favourable change in the parameter settings), and due to the nature of the filtering process employed, this in turn leads to higher precision and/or validity values associated with the obstacle data. This is then available to the Interpretation module, which (in an asynchronous manner) can make one of two decisions. The first is to continue the initiated focus, since the obstacle is still a threat; however, we now know more precisely the nature and magnitude of that threat. The second is to terminate the focus, since the precision and validity values are now sufficiently high that we are confident that the obstacle poses no safety threat, and we can release resources for other perception tasks. In the next section, we present an example of how this loop mechanism works in a realistic traffic situation.
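A schematic sketch of how such a focus request could map onto the three types of decisions above; the logical-sensor methods used here are hypothetical and stand for the choice rules described in Section 5.

```python
# Illustrative sketch only: turning a "focus on obstacle N" request into the
# three types of decisions listed above. The helper methods on the logical
# sensors are hypothetical, not the actual Prolab2 supervision API.
def handle_focus_request(obstacle, logical_sensors, background_sensors):
    measures = []
    for ls in logical_sensors:
        if ls.can_see(obstacle):
            # 1. activation of logical sensors that have the obstacle in view
            measures.append(("ACTIVATE", ls.name))
            # 2. windowing: restrict processing to a window around the obstacle
            measures.append(("WINDOW", ls.name, ls.window_around(obstacle)))
            # 3. parameter adjustment: favour information near the obstacle
            measures.append(("SET_PARAMS", ls.name, ls.params_for(obstacle)))
        elif ls.name not in background_sensors:
            # possibly switch off sensors that cannot see the obstacle, while a
            # background level of surveillance is kept on the remaining sensors
            measures.append(("DEACTIVATE", ls.name))
    return measures
```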
7. EXAMPLE

The example shown in Fig. 3 is of our car turning left at a crossroads, approaching on a road that has a left filter lane. In the figure, time proceeds down the page, and what is happening in the various parts of the system loop is shown in each row of the figure.

The column marked Scenario shows the actual situation from a plan view. The Prolab2 car is shown in black, other cars are in grey. The next column shows the sensed Events that are processed by the Interpretation system, and which lead to the Requests made to the Program Supervision module. In turn, the program supervision module converts these Requests into various Measures of the three types specified earlier, and these Measures are commands sent to the Perception modules.

As a result of these Measures, the Environment Map, which is the partial information about the environment available to the Interpretation system, changes to give more information where it is needed. In the figure, the neighbourhood of our vehicle is divided into six zones. An unshaded zone has active logical sensors working in Detection mode; a shaded zone has no sensors active. In some of the Environment Map diagrams there are shaded ellipses. These represent detected obstacles, the size of the ellipse representing the precision of the data (the larger the ellipse, the less confident we are about its position). It should be noted that the co-ordinate frame of the Map (which is a plan view) is such that up on the paper represents directly in front of our vehicle, so that as we turn, even obstacles stationary relative to the ground appear to move around us.

The example scenario shows many of the important elements of the communication between the high and low levels in the Prolab2 system. In the Events column, some of the events generated, for example 'Obstacle Detected' and 'Obstacle in front moves away', are determined by examination of the Environment Map, whilst others ('Active Beacon', 'Indicator', 'We start to turn' and 'We complete our turn') are generated from the beacon receiver and the proprioceptive sensors. In the Requests column, the three types of request, Detection, Focus-On and Observation, are all illustrated together with their negations. The Measures are of the three types discussed in Section 6.
Fig. 3: Example (columns: Scenario, Events, Requests, Measures, Environment Map; time steps t=0 to t=t7 of the left-turn manoeuvre, showing Detection, Focus-On and Observation requests and their negations, together with the corresponding Measures: activation/deactivation of the front, left, right, rear, telemetry and linear-stereo logical sensors, windowing/dewindowing of the front camera, and adjustment of the 'expected angle of lines' and contrast parameters).
The Activation of sensors and Windowing on an appropriate obstacle are self-explanatory. The Adjustment of Parameters example is derived from the fact that some of the modules for vehicle detection and localisation in the Prolab2 system use the cue of sets of closely-spaced, parallel, horizontal and/or vertical lines in the image. This works well for cars seen from the front or rear, as is the case on the open road, but in the case at time t4, obstacle #1 is seen at an oblique angle and hence the cue is changed to that of oblique lines. This expectation is communicated by the Program Supervision module to the perception modules by changing the 'expected angle of lines' parameter. A similar case exists for the contrast adjustment at time t6, in response to the request for the observation of obstacle #2's indicator state. The local contrast in the neighbourhood of obstacle #2 is increased by adjusting the image capture parameters to increase the range of grey levels over a window containing the obstacle. This increases the chances of detecting the state of the indicator.
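For illustration only: in Prolab2 the contrast adjustment at t6 is made through the image capture parameters, but its effect is similar to stretching the grey levels over the window containing the obstacle, as in the assumed post-capture sketch below.

```python
import numpy as np

# Illustrative only: stretch the grey levels inside the window containing the
# obstacle so that the local contrast (e.g. around the indicator lamp) covers
# the full 0-255 range. This post-capture version merely mimics the effect of
# the capture-parameter adjustment described in the text.
def stretch_window(image, x0, y0, x1, y1):
    win = image[y0:y1, x0:x1].astype(np.float32)
    lo, hi = win.min(), win.max()
    if hi > lo:
        image[y0:y1, x0:x1] = ((win - lo) * 255.0 / (hi - lo)).astype(image.dtype)
    return image
```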
Finally, notice that at time t6 the event Potential Danger! has been generated, and the warning sign displayed next to the environment map. This event is generated by modules within the Interpretation module that analyse the trajectories of the obstacles in the neighbourhood of the Prolab2 vehicle and decide, based on a variety of measures, whether any obstacles pose a danger. The warning icons, both visual and audible, together with an indication of the direction of the perceived danger, constitute the interface of the co-pilot system to the driver.

8. DISCUSSION AND CONCLUSIONS

This paper shows the use of a perception-interpretation loop in a complex real-time application. The two functional components of this interface, an algorithmic multi-sensor data fusion module and a knowledge-based program supervision module, are described. These two modules are defined in a way that is generally usable for many applications, involving for the one a reduction of uncertainty in the perceived environment, and for the other the planning and monitoring of perceptual tasks.

In this application, the Fusion module, together with the trajectory analysis modules, handles all the uncertainties in the environment, presenting to the Interpretation module only information that is certain enough to be acted upon. There is clearly a tradeoff between false alarms and missed targets, and with such a 'driver's assistant' system, the risk of having an accident from a missed target has to be weighed against the annoyance to the driver of too many false alarms. Experiments are in progress to find a compromise position.

The notion of a request emitted by an Interpretation module to the Program Supervision one is very general in vision, independent of the target application. The language used to communicate between these modules is fairly universal, and corresponds to basic "vision tasks". The types of requests we have selected can be used in many other applications.

9. REFERENCES

Bar Shalom, Y. and Fortmann, T. E. (1988). Tracking and Data Association, Academic Press, Boston.
Clement, V. and Thonnat, M. (1993). A knowledge-based approach to the integration of image processing procedures, CVGIP: Image Understanding 57(2): 166-184.
Hassoun, M. et al. (1993). Towards safe driving in traffic situations by using an electronic co-pilot, IEEE Symposium on Intelligent Vehicles '93, Tokyo, Japan.
Henderson, T. and Faugeras, O. (1988). High-level multisensor integration, Sensor Fusion: Spatial Reasoning and Scene Interpretation, Vol. 1003, SPIE, pp. 307-314.
Houzelle, S. (1992). Extraction Automatique d'Objets Cartographiques par Fusion d'Informations Extraites d'Images Satellites, PhD thesis, Ecole Nationale Superieure des Telecommunications.
Hutber, D., Vieville, T. and Giraudon, G. (1994). Data fusion for reliable detection and tracking of multiple obstacles in a road environment - an asynchronous approach, accepted for the SPRANN IMACS Conference.
Lefort, N., Abou-Khaled, O. and Ramamonjisoa, D. (1993). A co-pilot architecture based on a multi-expert system and a real-time environment, IEEE Int'l Conference on Systems, Man and Cybernetics, Le Touquet, France.
Shekhar, C., Gaudin, V., Moisan, S. and Thonnat, M. (1994). Real-time supervision of perception programs for Prolab2, accepted for the SPRANN IMACS Conference.
Stephens, M. and Harris, C. (1988). 3D wire-frame integration from image sequences, Fourth Alvey Vision Conference, pp. 159-166.
VIDIMUS (1991). Vidimus Esprit project annual report, Technical report, British Aerospace, Sowerby Research Centre, Bristol, England.