A FRAMEWORK FOR VEHICLE PLATOONING BASED ON MONOCULAR VISION

Eric Royer, Maxime Lhuillier, Michel Dhome, François Marmoiton

LASMEA UMR6602 CNRS, Université Blaise Pascal, 24 Avenue des Landais, 63177 Aubière, France

Abstract: In this paper, we present a framework for computing the relative pose of several vehicles in a platooning configuration. The localization of each vehicle is done separately with monocular vision. A wireless communication link is used so that the vehicles can share their positions and compute the distance between them. This approach does not assume that the following vehicles can see the leader. The main difficulty is the synchronization of the localization information; a motion model of the vehicles is used to solve this problem. Experimental data show how vision can be used as a localization sensor for vehicle platooning.

Keywords: autonomous vehicles, computer vision, position estimation

1. INTRODUCTION

Platooning means coupling two or more vehicles without a physical link to form a train. It is an interesting concept for efficient transportation systems. In particular, in urban environments, several vehicles in a platooning configuration take less road space than independent vehicles. Several sensors have been used successfully for platooning in the past, for example a radar in (Kuroda et al., 1998) or an RTK GPS sensor in (Bom et al., 2005). GPS is often used for vehicle localization, but it has some limitations: a low cost GPS receiver is not accurate enough for this application, while an RTK GPS is suitable but very expensive, and the satellite signals can be masked by tall buildings in urban canyons. In this paper, we focus on monocular vision because a camera is a very cheap sensor with low power consumption. Moreover, using a single camera simplifies the hardware compared to a stereo system.

In order to control the vehicles, the vision algorithm needs to provide the relative pose of the vehicles. There are two main ways to do that; we explain them in the context of two vehicles, but they can be generalized to several vehicles. In the first configuration, the second vehicle has an on-board camera, and visual tracking of the first vehicle is used to compute the relative pose of the vehicles. This method has been used successfully by Benhimane and Malis (Benhimane and Malis, 2005). In the approach we develop in this paper, each vehicle is equipped with a camera and computes its position with reference to the environment. A wireless communication link is used so that the first vehicle can send its position to the second; the second vehicle then computes the relative pose of the vehicles. This method is close to what can be done when the vehicles obtain their positions from an RTK GPS. Our method has the drawback that it needs a way to communicate between the vehicles, but it also overcomes some limitations of the method relying on visual tracking. The main difficulty in the visual tracking approach is that the first vehicle must be seen by the second one. This may not be the case in tight turns, even with a wide angle camera. In (Benhimane and Malis, 2005), the problem is solved by using a pan-tilt camera to keep the first vehicle in the field of view of the second one, but this is not enough to overcome an occlusion. In particular, if more than two vehicles are used, our method makes it possible to compute the relative pose of vehicles number 1 and n without error accumulation, which would be detrimental to the accuracy of the longitudinal control of the vehicles. We plan to use the control law designed by Bom et al. (Bom et al., 2005), which was designed specifically for the control of several vehicles without error accumulation.

We have already developed a vision based algorithm to compute the pose of a vehicle with reference to the environment (Royer et al., 2005b). It has been used for the lateral control of a single vehicle. The topic of this paper is to show how this algorithm can be adapted to the longitudinal control of the vehicles in a platooning configuration. The main difficulty in this configuration is that each vehicle has its own camera to compute its pose. Since the image acquisitions are not done at the same time, it is not possible to directly compute the distance between the vehicles. In this paper, we propose a framework to deal with this time synchronization. We also take the computation and communication delays into account, since they must be considered to compute an accurate longitudinal position of the vehicles. In the framework presented in this paper, a motion model and a covariance are associated with each pose so that the vehicle position can be computed at a given time after the image acquisition. Our approach has some similarities with the one described by Tae Soo No et al. (No et al., 2001).

In section 2, we give a summary of the vision algorithm used to compute the localization of a vehicle with reference to the environment. In section 3, we present the motion model we use to predict the position of the vehicle and how we use this model to compute the distance between the vehicles. Finally, in section 4, we give some experimental results.

2. LOCALIZATION WITH MONOCULAR VISION

This section is a summary of the localization system published in (Royer et al., 2005b). It was used with good results for the lateral control of a single vehicle in (Royer et al., 2005a). The localization algorithm relies on two steps, as shown in figure 1. First, the vehicle is driven manually along a trajectory and a monocular video sequence is recorded with the on-board camera.

Fig. 1. Localization with monocular vision

From this sequence, a 3D reconstruction of the environment and the trajectory is computed. Because we use only one camera, this is a structure from motion problem, well known in the computer vision community. The reconstruction is computed offline with a method relying on bundle adjustment. The second step is the real time localization process, which provides the complete pose (6 degrees of freedom) of the camera from a single image. Interest points are detected in the current image. These features are matched with the features stored in memory as part of the 3D reconstruction. From the correspondences between 2D points in the current frame and 3D points in memory, the pose of the camera is computed. The complete pose (6 degrees of freedom) is always used for the camera. From the camera pose, we compute the pose of the vehicle on the ground plane (with only 3 degrees of freedom) and we use it to compute the lateral and angular deviations from the reference trajectory. These two parameters are used for the lateral control of the vehicle.

Every step in the reconstruction as well as in the localization relies on image matching. Interest points are detected in each image with the Harris corner detector (Harris and Stephens, 1988). Matching is done by computing a Zero Normalized Cross Correlation score between pairs of interest points and keeping the best scores. To speed up matching and to reduce the number of outliers, a Region Of Interest (ROI) is used: for every point in image 1, we try to match it only with the points in image 2 that are inside the ROI.
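To make the matching step concrete, the sketch below computes a ZNCC score between two patches and restricts the candidate matches in image 2 to a square region of interest. This is a simplified illustration with hypothetical helper names and parameters, not the implementation used in our system.

    import numpy as np

    def zncc(patch_a, patch_b):
        """Zero-mean Normalized Cross-Correlation score between two equally sized patches."""
        a = patch_a.astype(np.float64).ravel()
        b = patch_b.astype(np.float64).ravel()
        a -= a.mean()
        b -= b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom > 0.0 else 0.0

    def match_in_roi(point, image1, candidates, image2, roi_center, roi_half_size,
                     patch_half=5, min_score=0.8):
        """Match one interest point of image 1 against the interest points of image 2
        that fall inside a square region of interest (ROI)."""
        x0, y0 = point
        ref = image1[y0 - patch_half:y0 + patch_half + 1,
                     x0 - patch_half:x0 + patch_half + 1]
        best_score, best_point = min_score, None
        cx, cy = roi_center
        for (x, y) in candidates:
            if abs(x - cx) > roi_half_size or abs(y - cy) > roi_half_size:
                continue                      # candidate lies outside the ROI
            cand = image2[y - patch_half:y + patch_half + 1,
                          x - patch_half:x + patch_half + 1]
            if cand.shape != ref.shape:
                continue                      # too close to the image border
            score = zncc(ref, cand)
            if score > best_score:
                best_score, best_point = score, (x, y)
        return best_point, best_score

Restricting the search to the ROI both speeds up the correlation step and rejects matches that are geometrically implausible.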

3. POSE, COVARIANCE AND MOTION MODEL COMPUTATION

The pose of each vehicle is computed after each image acquisition (15 times per second). We get (R_{n,i}, T_{n,i}), the pose of vehicle number n at time t_{n,i} (the time at which image i was acquired by camera n). The pose of each vehicle is computed with reference to the map built in the learning step, so the poses of all the vehicles are given in the same coordinate system. Since there is no synchronization between the cameras of the different vehicles, t_{n,i} does not match t_{m,j}. Therefore, it is not possible to directly compute the distance between two vehicles, because we do not know their positions at the same time. To overcome this difficulty, every time we compute the pose of a vehicle (at time t_{n,i}), we also compute the parameters of the motion of the vehicle, so that it becomes possible to compute the pose of the vehicle at any time t ≥ t_{n,i}. This motion model was developed mainly for multi-vehicle configurations, but it also brings some improvements for the localization of a single vehicle.
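As an illustration, the per-frame localization record produced by each vehicle could look like the following sketch (the field names and structure are ours, not part of the system described here). Since each vehicle fills such records at its own acquisition times, two vehicles almost never share a timestamp, which is exactly why the motion model below is needed.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class LocalizationState:
        """Per-frame localization of vehicle n from image i (hypothetical structure)."""
        t: float                # acquisition time t_{n,i}, on a clock shared by the vehicles
        R: np.ndarray           # 3x3 rotation matrix R_{n,i} (orientation in the map frame)
        T: np.ndarray           # position T_{n,i} = (X, Y, Z) in the map frame
        V: np.ndarray           # estimated linear velocity (3-vector)
        omega: float            # estimated angular velocity around the vertical axis (rad/s)
        covariance: np.ndarray  # 6x6 covariance associated with the pose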

3.1 Motion model

The motion of the vehicle is described by its linear and angular velocity. We consider only the angular velocity around the vertical axis, because the other rotations of the vehicle are caused by unpredictable irregularities of the road. The state vector of the vehicle at time t_{n,i} is made of four elements:

• T_{n,i} = (X_{n,i}, Y_{n,i}, Z_{n,i}), the position of the vehicle
• R_{n,i}, a rotation matrix which gives its orientation
• \vec{V}_{n,i}, the linear velocity
• ω_{n,i}, the angular velocity around the vertical axis

R_{n,i} and T_{n,i} are given directly by the vision algorithm. \vec{V}_{n,i} and ω_{n,i} are computed from the position history of the vehicle. This model is not exact for a car-like vehicle because it does not explicitly take the steering angle into account. We chose it because computing the steering angle from a pair of images is difficult to do reliably.

3.1.1. Linear and angular velocity computation

We compute separately \|\vec{V}_{n,i}\| and a unit vector \vec{v} in the direction of \vec{V}_{n,i}. Past experiments have shown that the computation of the vehicle orientation is more accurate than that of its position, so we estimate \vec{v} from the direction of the vehicle, which is computed directly from R_{n,i} and the known rigid transformation between the camera and the vehicle. Then we compute:

\|\vec{V}_{n,i}\| = \frac{(T_{n,i} - T_{n,i-k}) \cdot \vec{v}}{t_{n,i} - t_{n,i-k}}    (1)

with k = 4. The angular velocity is computed with a similar method. The axis of rotation is fixed: we assume that the rotation is around the vertical axis \vec{Y}. We compute the angle θ between the axis of the vehicle at times t_{n,i} and t_{n,i-k}. Then we have:

ω_{n,i} = \frac{θ}{t_{n,i} - t_{n,i-k}}    (2)
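A minimal sketch of equations (1) and (2) follows. The helper names are ours, and the choice of which column of R_{n,i} carries the heading direction, as well as the use of Y as the vertical axis, are assumptions made for the illustration.

    import numpy as np

    def linear_speed(T_i, T_ik, v_dir, t_i, t_ik):
        """Equation (1): displacement over the last k frames projected onto the unit
        heading direction v_dir (derived from the orientation), divided by the
        elapsed time."""
        return float(np.dot(T_i - T_ik, v_dir) / (t_i - t_ik))

    def angular_speed(R_i, R_ik, t_i, t_ik):
        """Equation (2): signed angle between the vehicle headings at t_{n,i-k} and
        t_{n,i} around the vertical axis, divided by the elapsed time."""
        # Heading of the vehicle in the map frame; taking the third column of R is
        # an arbitrary convention for this sketch.
        h_i, h_ik = R_i[:, 2].copy(), R_ik[:, 2].copy()
        # Keep only the horizontal components (the vertical axis is assumed to be Y).
        h_i[1], h_ik[1] = 0.0, 0.0
        h_i /= np.linalg.norm(h_i)
        h_ik /= np.linalg.norm(h_ik)
        cos_t = np.clip(np.dot(h_ik, h_i), -1.0, 1.0)
        sin_t = np.cross(h_ik, h_i)[1]   # the Y component carries the sign of the rotation
        theta = np.arctan2(sin_t, cos_t)
        return theta / (t_i - t_ik)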

3.1.2. Pose prediction with the motion model

Once the motion model has been computed at time t_{n,i}, it is possible to use it to compute the position and orientation of the vehicle at any time t ≥ t_{n,i}. The predicted position T is:

T = T_{n,i} + \Delta t \, \vec{V}_{n,i}    (3)

with \Delta t = t - t_{n,i}. The predicted orientation R is given by:

R = R_{θ, \vec{Y}} \, R_{n,i}    (4)

with θ = \Delta t \, ω_{n,i} and R_{θ, \vec{Y}} the rotation matrix describing the rotation of angle θ around axis \vec{Y}.

3.2 Motion model utilization for a single vehicle

For a single vehicle, the motion model is used in two ways. First, processing image i (acquired at time t_{n,i}) takes a time \Delta t (about 65 ms), so the vehicle has moved during the processing time. The motion during this interval is computed so that the position used in the control law reflects the real position of the vehicle at t_{n,i} + \Delta t and not the position at time t_{n,i}. The second way to use the motion model is to make the vision algorithm faster. This is illustrated in figure 2. When we know the pose of the camera for t ≤ t_{n,i}, we can compute in advance the position of the camera at time t_{n,i+1}. With this information, it is possible to project the 3D points of the map into image i+1, so that we know approximately where the interest points will be in the current frame. Additionally, the confidence in the motion model and the covariance associated with the pose of the camera at time t_{n,i} can be used to predict the covariance associated with the pose at time t_{n,i+1}. This allows us to define the center and the size of the regions of interest in the current frame, so that matching interest points between the 3D map and image i+1 can be done faster and with fewer outliers. The use of these regions of interest gives a 15% speedup in the matching process compared to fixed size regions.

Fig. 2. Motion model used for a single vehicle: the pose prediction allows a speedup in image matching.
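The prediction of equations (3) and (4), and its use for guessing where the map points will appear in the next image, can be sketched as follows. The helper names, the world-to-camera convention and the use of Y as the vertical axis are assumptions made for this illustration.

    import numpy as np

    def rotation_about_y(theta):
        """Rotation matrix of angle theta around the vertical axis Y."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])

    def predict_pose(T_ni, R_ni, V_ni, omega_ni, t_ni, t):
        """Equations (3) and (4): constant-velocity extrapolation of the pose
        estimated at t_{n,i} to a later time t >= t_{n,i}."""
        dt = t - t_ni
        T_pred = T_ni + dt * V_ni                        # equation (3)
        R_pred = rotation_about_y(dt * omega_ni) @ R_ni  # equation (4)
        return T_pred, R_pred

    def predict_image_points(points_3d, R_pred, T_pred, K):
        """Project known 3D map points with the predicted camera pose to guess where
        they will appear in image i+1; the predicted pixels give the ROI centers.
        Assumes R_pred maps camera to world coordinates, T_pred is the camera
        position in the world, and K is the 3x3 intrinsic matrix."""
        pts_cam = (points_3d - T_pred) @ R_pred   # world -> camera (R^T (X - T) per row)
        pix = pts_cam @ K.T
        return pix[:, :2] / pix[:, 2:3]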

3.3 Motion model utilization for several vehicles

In a platooning configuration with several vehicles, the motion model is used to make it possible to compute the position of a given vehicle at any time. This is necessary to compute the distance between the vehicles accurately. The complete process of computing the inter-vehicle distance is illustrated in figure 3 in the case of two vehicles. We assume that each vehicle has a clock and that the clocks are synchronized at the beginning of the experiment. Vehicle 1 is the leader. When vehicle 1 acquires an image at time t_{1,i}, it computes its localization (complete pose with six degrees of freedom, associated covariance and motion model). Then, using the WiFi communication, it sends this localization along with the date of the image acquisition t_{1,i}. Vehicle 2 receives this information at time t_{1,i} + \Delta t, where \Delta t is a delay which includes the image processing time and the communication time. Once vehicle 2 receives the localization of vehicle 1, it computes \Delta t based on its own clock. Then it computes the current position of vehicle 1 (at time t_{1,i} + \Delta t). Simultaneously, vehicle 2 processes its own image (acquired at time t_{2,j}) to get its own localization. When it receives the localization of the leader, it uses its own motion model to compute its position at time t_{1,i} + \Delta t. After these two steps, vehicle 2 knows the positions of both vehicles at the same time (t_{1,i} + \Delta t), so it can compute the distance between vehicle 1 and vehicle 2 with the method described in paragraph 3.4. This distance is needed to control the speed of vehicle 2.
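A minimal sketch of this synchronization step is given below, assuming each localization is carried in a simple dictionary with a timestamp 't', a position 'T' and a velocity 'V' (a format of our own, with the orientation omitted for brevity), and that both clocks have already been synchronized.

    import time
    import numpy as np

    def predict_position(T, V, t_state, t_query):
        """Constant-velocity position prediction (equation (3)); T and V are 3-vectors."""
        return np.asarray(T) + (t_query - t_state) * np.asarray(V)

    def on_leader_message(msg, follower_state, clock=time.monotonic):
        """Handle a localization message from vehicle 1 on board vehicle 2.
        `msg` and `follower_state` are dicts with keys 't', 'T' and 'V'; `clock`
        stands in for the shared clock synchronized at start-up."""
        t_now = clock()                                  # t_{1,i} + delta_t in the paper's notation
        # Bring both vehicles to the same instant with their motion models.
        leader_pos = predict_position(msg['T'], msg['V'], msg['t'], t_now)
        follower_pos = predict_position(follower_state['T'], follower_state['V'],
                                        follower_state['t'], t_now)
        # These two positions at the common time t_now feed the curvilinear distance
        # computation of section 3.4, which controls the speed of vehicle 2.
        return t_now, leader_pos, follower_pos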

Fig. 3. Localization of each vehicle, WiFi communication, and use of the motion model to compute the distance between the vehicles

3.4 Computation of the distance between the vehicles

Once the positions of the two vehicles are known at the same time, it is possible to compute the distance between them. Instead of the Euclidean distance between the vehicles, we compute a curvilinear distance along the reference trajectory. This distance is much better adapted to platooning, especially in curves. The method used to compute the distance is illustrated in figure 4 in the case of two vehicles. For each vehicle (A_n), we find the closest position P_n on the reference trajectory. At this point, we compute the tangent \vec{T}_n and the normal \vec{N}_n to the trajectory. (P_n, \vec{T}_n, \vec{N}_n) forms a local reference frame in which A_n has coordinates (α_n, β_n). The curvilinear distance between the vehicles is given by:

d = d_c(P_1 P_2) + α_1 - α_2    (5)

where d_c(P_1 P_2) is the length of the path between P_1 and P_2. The reference trajectory is defined by the position of the vehicle for each frame of the learning video sequence (recorded at 15 frames per second), so the values of α_n are usually very small. β_n is used for the lateral control of vehicle n.

Fig. 4. Curvilinear distance between two vehicles

Fig. 5. The 3D map computed with vision in the learning step. The map is seen from the top; the black squares show the trajectory of the vehicle in the learning step. The landmarks (whose positions were computed in 3D) are shown as blue dots.
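The curvilinear distance of equation (5) can be sketched as follows, assuming the reference trajectory is available as an ordered array of 2D key positions on the ground plane. The helper names and the polyline approximation of d_c are ours, and the sign convention for which vehicle is ahead is omitted.

    import numpy as np

    def closest_index(position, trajectory):
        """Index of the reference pose closest to a vehicle position (2D, ground plane)."""
        return int(np.argmin(np.linalg.norm(trajectory - position, axis=1)))

    def curvilinear_distance(pos1, pos2, trajectory):
        """Equation (5): d = d_c(P1, P2) + alpha_1 - alpha_2, where d_c is the path
        length along the reference trajectory between the closest reference poses
        P1 and P2, and alpha_n is the abscissa of vehicle n along the local tangent."""
        i1, i2 = closest_index(pos1, trajectory), closest_index(pos2, trajectory)
        lo, hi = sorted((i1, i2))
        # Approximate d_c by the length of the polyline between the two reference poses.
        segments = np.diff(trajectory[lo:hi + 1], axis=0)
        d_c = float(np.sum(np.linalg.norm(segments, axis=1)))

        def alpha(pos, i):
            # Signed abscissa of the vehicle along the tangent at reference pose i.
            nxt, prv = min(i + 1, len(trajectory) - 1), max(i - 1, 0)
            tangent = trajectory[nxt] - trajectory[prv]
            tangent = tangent / np.linalg.norm(tangent)
            return float(np.dot(pos - trajectory[i], tangent))

        return d_c + alpha(pos1, i1) - alpha(pos2, i2)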

4. EXPERIMENTAL RESULTS

We did the following experiment. We drove one vehicle manually in order to record a reference video sequence. A 3D map was computed from this video sequence with the structure from motion algorithm. The 3D map is shown in figure 5 as seen from the top. The trajectory is approximately 100 meters long. Then two vehicles were driven manually on the same path to simulate a platooning configuration. The speed of the vehicles was approximately 1 meter per second. Each vehicle had a camera which recorded a video sequence. We then computed the distance between the vehicles from the images by using the framework described in this paper. We also computed the distance from the GPS data recorded at the same time; the GPS data was recorded only to be used as ground truth.

The distance between the vehicles computed with the vision algorithm and with the GPS data is shown in figure 6. This distance is not constant because the vehicles were driven manually. We used RTK GPS receivers, but the GPS accuracy for this experiment was rather low because some buildings were masking the satellites. We plan to repeat this experiment to obtain better ground truth data. For this experiment, the distance computed with vision is less noisy than the one computed with the GPS. This is a common situation, because vision works best when there are features (buildings, trees, etc.) close to the camera. On the other hand, these kinds of objects can block the satellite signals, and RTK GPS is usually more accurate in places free from obstacles. For this reason, we think that vision and GPS could be used as complementary sensors for the localization of the vehicles.

Fig. 6. Distance between two vehicles computed either with vision or with the GPS

Another way to check that the relative position of the vehicles is computed correctly is to look at the images. Except in tight turns, the first vehicle is visible in the image of the second one. Since vehicle 2 knows its own position and the position of the vehicle in front of it, it can draw the projection of vehicle 1 on its own image. Figure 7 shows the reprojection of the first vehicle in the second camera for a few frames of the video. This is not the result of a tracking algorithm. The reprojection closely matches the actual position of the vehicle in the image. This is very encouraging because it could allow us to detect the position of several key points on the first vehicle with vision.
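The reprojection check amounts to a standard pinhole projection of the predicted position of vehicle 1 with the current pose of camera 2. A minimal sketch is given below; the pose convention and the intrinsic matrix K are assumptions made for the illustration.

    import numpy as np

    def project_point(X_world, R_cam, T_cam, K):
        """Pinhole projection of a 3D point into the image of camera 2. Assumes R_cam
        maps camera coordinates to world coordinates, T_cam is the camera position
        in the world frame, and K is the 3x3 intrinsic matrix."""
        X_cam = R_cam.T @ (np.asarray(X_world) - T_cam)   # world -> camera
        if X_cam[2] <= 0.0:
            return None                                   # behind the camera, not visible
        u = K @ X_cam
        return u[:2] / u[2]

    # Example: reproject the predicted position of vehicle 1 (or the corners of a
    # box around it) into the current image of vehicle 2:
    #   pixel = project_point(leader_position, R2, T2, K)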

Fig. 7. Reprojection of the first vehicle in the image of the second camera. The reprojection is drawn in blue. The yellow squares are the interest points used to localize vehicle 2. These points could be used to refine the relative position of the two vehicles.

5. CONCLUSION

We have presented a method to compute the relative pose between several vehicles in a platooning configuration. This approach does not assume that the following vehicles can see the leader, because each vehicle localizes itself with reference to the environment. By using a motion model of the vehicle, we can take into account the movement of the vehicles during the communication and computation delays. We plan to use a better motion model which explicitly takes the steering angle into account. The results presented in this paper show that it is possible to compute the distance between the vehicles. We have not used this information for the longitudinal control yet, but we plan to do so in the near future. Finally, this framework gives us the possibility to reproject the front vehicles in the images acquired by the following vehicles. This can be very useful information for additional vision work. It can be used to refine the relative position of the vehicles, but it could also be used to detect obstacles between the vehicles. For example, if vehicle 2 knows where vehicle 1 should appear in its image but no vehicle is visible at this position, then we could deduce the presence of an obstacle between the vehicles.

REFERENCES

Benhimane, S. and E. Malis (2005). Vision-based control for car platooning using homography decomposition. In: IEEE International Conference on Robotics and Automation. pp. 2173–2178.

Bom, J., B. Thuilot, F. Marmoiton and P. Martinet (2005). A global strategy for urban vehicles platooning relying on nonlinear decoupling laws. In: International Conference on Intelligent Robots and Systems. pp. 1995–2000.

Harris, C. and M. Stephens (1988). A combined corner and edge detector. In: Alvey Vision Conference. pp. 147–151.

Kuroda, H., S. Kuragaki, T. Minowa and K. Nakamura (1998). An adaptive cruise control system using a millimeter wave radar. In: IEEE International Conference on Intelligent Vehicles. pp. 168–172.

No, Tae Soo, Chong Kil-To and Roh Do-Hwan (2001). A Lyapunov function approach to longitudinal control of vehicles in a platoon. IEEE Transactions on Vehicular Technology 50(1), 116–124.

Royer, E., J. Bom, M. Dhome, B. Thuilot, M. Lhuillier and F. Marmoiton (2005a). Outdoor autonomous navigation using monocular vision. In: International Conference on Intelligent Robots and Systems. pp. 3395–3400.

Royer, E., M. Lhuillier, M. Dhome and T. Chateau (2005b). Localization in urban environments: monocular vision compared to a differential GPS sensor. In: International Conference on Computer Vision and Pattern Recognition, CVPR.