Transportation Research Part C 63 (2016) 1–22
Virtual 3D city model as a priori information source for vehicle localization system

Maya Dawood a,b,*, Cindy Cappelle c, Maan E. El Najjar a, Mohamad Khalil b, Bachar El Hassan b, Denis Pomorski a, Jing Peng a

a Lille 1 University, LAGIS, UMR CNRS 8164, Lille, France
b Lebanese University, Doctoral School for Sciences and Technology, AZM Center for Research in Biotechnology, Tripoli, Lebanon
c Université de Technologie de Belfort-Montbéliard, Laboratoire Systèmes et Transports, Belfort, France
Article history: Received 13 August 2011 Received in revised form 4 December 2015 Accepted 4 December 2015
Keywords: Localization; Data fusion; Image processing; Intelligent vehicle; GPS; 3D-GIS
Abstract

This paper aims at demonstrating the usefulness of integrating virtual 3D models in vehicle localization systems. Usually, vehicle localization algorithms are based on multi-sensor data fusion. Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS), are used to provide measurements of the geographic location. Nevertheless, GNSS solutions suffer from signal attenuation and masking, multipath phenomena and lack of visibility, especially in urban areas. This leads to degradation or even total loss of the positioning information, and hence to unsatisfactory performance. Dead-reckoning and inertial sensors are then often added to back up GPS in case of inaccurate or unavailable measurements, or when high-frequency location estimation is required. However, dead-reckoning localization may drift in the long term due to error accumulation. To back up GPS and compensate for the drift of dead-reckoning sensor based localization, two approaches integrating a virtual 3D model are proposed, in which the scene generated from the virtual 3D model is registered with respect to the scene perceived by an on-board sensor. From the real/virtual scene matching, the transformation (rotation and translation) between the real sensor and the virtual sensor (whose position and orientation are known) can be computed. These two approaches lead to the determination of the pose of the real sensor embedded on the vehicle. In the first approach, the considered perception sensor is a camera; in the second approach, it is a laser scanner. The first approach is based on image matching between the virtual image extracted from the 3D city model and the real image acquired by the camera. Its two major parts are: 1. detection and matching of feature points in the real and virtual images (three feature detectors are compared: the Harris corner detector, SIFT and SURF); 2. pose computation using the POSIT algorithm. The second approach is based on an on-board horizontal laser scanner that provides a set of distances between the sensor and the environment. This set of distances is matched with depth information (virtual laser scan data) provided by the virtual 3D city model. The pose estimation provided by these two approaches can be integrated in a data fusion formalism. In this paper, the result of the first approach is integrated in an IMM-UKF data fusion formalism. Experimental results obtained using real data illustrate the feasibility and the performance of the proposed approaches.
© 2015 Elsevier Ltd. All rights reserved.
* Corresponding author at: Beirut, Lebanon. Tel.: +961 3593709. E-mail address: [email protected] (M. Dawood).
http://dx.doi.org/10.1016/j.trc.2015.12.003
1. Introduction

The geo-localization system is an essential component of intelligent transportation systems (ITS). Nowadays, outdoor positioning systems often rely on GPS because of its affordability and convenience (Abbott and Powell, 1999). However, GPS suffers from satellite masking occurring especially in urban environments, under bridges, in tunnels or in forests. To provide continuous, accurate, and high-integrity position data, GPS-based localization systems should then incorporate additional sensors (such as proprioceptive sensors or environment perception sensors) and/or databases (for example a 2D digital map) (El Badaoui El Najjar and Bonnifait, 2005). Nevertheless, using only incremental encoders placed on the rear wheels and gyroscopes is not sufficient in the case of long GPS outages, because the dead-reckoning (DR) localization is subject to drift due to error accumulation (Se and Lowe, 2002; Dawood and Cappelle, 2011).

A variety of approaches has been studied to compensate for this problem. Map-matching algorithms are one example: they produce the vehicle's position using a road map and an input generated from GPS or from GPS integrated with DR sensors. The map-matching process is based on finding the road on which the vehicle is traveling, and then projecting the current position point onto that road (Bernstein and Kornhauser, 1988; Renault and LeMeur, 2005). Quddus and Ochieng (2007) provide a good survey of existing map-matching algorithms and highlight the potential impacts of the forthcoming European Geostationary Navigation Overlay Service (EGNOS) on their performance. In addition, the two major approaches to vision-based localization are SLAM (Simultaneous Localization And Mapping) and visual odometry. The goal of SLAM is to localize a robot in the environment while mapping it at the same time (Davison, 2003). A real-time hierarchical outdoor SLAM based on stereovision and GPS fusion is proposed in Schleicher and Bergasa (2009). Marais and Meyrie (2014) deal with the combination of video and GNSS for localization: their approach consists in determining where satellites are located in a fisheye image, and then excluding those located in non-sky regions when calculating the position. Visual odometry for vehicle geo-localization has been an object of great interest during the last decades. An approach for SLAM based on the extended Kalman filter is proposed in Dissanayake and Newman (2001). In Scaramuzza and Seigwart (2008), an appearance-guided monocular omni-directional visual odometry for outdoor ground vehicles is proposed.

In this paper, new approaches to back up GPS limitations and the drift of dead-reckoning sensor based localization are proposed. The originality of these approaches is the integration of a virtual 3D model of the environment in the absolute localization process. The principle is to compare the scene perceived by an on-board sensor with the scene generated thanks to the virtual 3D model. In the first case, the considered exteroceptive sensor is a video camera. In the second case, it is a laser scanner (Peng and Najjar, 2009). The first approach is based on image matching (Se and Lowe, 2001, 2002; Dawood and Cappelle, 2011). Feature points are detected in the real image acquired by the on-board video camera and in the virtual image generated by the virtual 3D model. Then the two sets of detected points are matched.
Three methods are often used for feature detection and matching: the Harris corner detector (Peng and Chen, 2009; Harris and Stephens, 1988), SIFT (Scale Invariant Feature Transform) (Se and Lowe, 2001; Peng and Chen, 2009; Lowe, 2004) and SURF (Speeded Up Robust Features) (Bay and Tuytelaars, 2006; Bay and Ess, 2008). The Harris corner extraction algorithm is one of the most effective and widely used algorithms for feature detection. However, Harris corners are not scale invariant. For this reason, we also consider the SIFT algorithm, whose idea is to represent image parts by histograms of gradient location and orientation. Although SIFT is currently the most used method for scale/rotation invariant feature detection, it is too computationally expensive to be used in real time or in large-scale evolution environments. The results of SURF are comparable to those of SIFT, with the additional advantage that SURF is computationally cheaper. Finally, to estimate the vehicle position and orientation, the POSIT algorithm is used, as in David and Dementhon (2004) and DeMenthon and Davis (1995). It avoids the intrinsic non-linearity of the geometrical constraints that arises when the imaging process is modeled as a perspective transformation.

Considering the second, laser scanner based approach (Peng and El Badaoui El Najjar, 2009), the geo-pose is computed by scan matching between the real scan data provided by the real laser scanner and the virtual scan data obtained using the virtual 3D model. The ICP (Iterative Closest Point) algorithm (Zhang, 1994) is used to determine the transformation R and T between the two scans. The pose can then be computed with the kinematic model of a unicycle (Gutmann and Schlegel, 1996).

Each of the pose estimation approaches has advantages and drawbacks. The laser scanner based approach is more efficient in terms of computation time than the vision based approach. Moreover, high-resolution textured images are not needed, contrary to the vision based method (Lothe and Bourgeois, 2009). In addition, the laser scanner based approach is not affected by the time of day (day or night), unlike the vision approach. Nevertheless, the information provided by a laser scanner can be insufficient in certain configurations of the environment; the vision based approach is then useful as images provide more information.

The pose estimated using the virtual 3D model can be integrated in a multi-sensor data fusion framework (with GPS, dead-reckoning sensors, etc.). Much research uses multi-sensor fusion for better vehicle tracking. Salameh and Challita (2013) propose two different applications in the area of V2V communications. The first application presents a method for better car tracking using GPS information shared through V2V communication and a vision system. The second application presents a new simulation framework for prototyping the whole process by combining embedded data, vision, and V2V simulation. The Extended Kalman Filter (Grewal and Weill, 2011; Welch and Bishop, 2002), the Unscented Kalman Filter (Julier and Uhlmann, 1997; Wan and Van der Merwe, 2001; Julier and Uhlmann, 2004), and Particle Filtering (Gustafsson and Bergman, 2001) are commonly used techniques to fuse data. In this paper, the integration of the 3D model/camera based approach is illustrated using the Unscented Kalman Filter with Interacting Multiple Model (Bar-Shalom and Kirubarajan, 2002) (IMM-UKF) formalism.

The rest of this paper is organized as follows. In Section 2, the virtual 3D city model is described. Section 3 then details the developed geo-localization methods; two approaches are presented: the vision based approach and the laser scanner based approach. In Section 4, an integration of the 3D model/vision based pose estimation with GPS and dead-reckoning sensors using the IMM-UKF filter is proposed. In Section 5, experimental results obtained with real data are presented, before we conclude in Section 6.
2. Virtual 3D city model

2.1. Virtual 3D model

Virtual 3D models aim at modeling the real world as accurately as possible. Three-dimensional representation and visualization of city environments are employed in an increasing number of applications in the areas of urban planning, city marketing, tourism, facility management and emergency response. Virtual 3D city models are thus undergoing a rapid expansion (Kim and Kim, 2005; Takase and Sho, 2003) and are more and more available. Such models are generated using several data sources such as laser profiler data, 2D digital maps and textured images. The 3D models used in this work are the maps of Stanislas square (Fig. 1A and B) and of the LORIA laboratory (Fig. 1C and D) in Nancy, France. According to the constructor, the topological and geometrical accuracy is 1 m. To navigate in real time in the 3D model, a three-dimensional geographical information system (3D GIS) adapted to our applications in robotics and intelligent vehicles was developed.
2.2. Developed GIS

A GIS is a computer system capable of managing the 3D city model database. The developed 3D GIS has several inputs and outputs. The 3D GIS should also be calibrated in order to be used in a given application. In our case, the virtual camera needs to be calibrated in order to obtain the virtual image that corresponds to the real one. Then, using the calibrated GIS, we can compute the 3D coordinates of the points that are projected onto the virtual image plane.
2.2.1. GIS inputs and outputs

The three inputs of the developed 3D-GIS are:
1. the database of the model;
2. the extrinsic parameters (position and orientation) of the virtual camera with respect to the local frame attached to the 3D model;
3. the intrinsic parameter, i.e. the horizontal Field of View (FOV) of the virtual camera.

The five output functionalities are:
1. the virtual image that represents the view of the scene seen by the virtual camera;
2. the bitmap image of the segments that are visible from the position and orientation of the camera;
3. the depth image;
4. a text file that contains the depth information in meters for each pixel;
5. an XML file which contains information about the segments of the database.
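To make these inputs and outputs concrete, the following minimal Python sketch shows one possible shape of such a 3D-GIS interface; all names (Gis3D, VirtualView, render) are hypothetical illustrations that only mirror the inputs and outputs listed above, not the actual implementation.

from dataclasses import dataclass
import numpy as np

@dataclass
class VirtualCameraPose:
    # Input 2: extrinsic parameters in the local frame attached to the 3D model.
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float

@dataclass
class VirtualView:
    image: np.ndarray           # output 1: virtual image seen by the virtual camera
    segment_bitmap: np.ndarray  # output 2: bitmap of the visible segments
    depth_image: np.ndarray     # output 3: depth image
    depth_file: str             # output 4: per-pixel depth in meters (text file)
    segments_xml: str           # output 5: XML description of the segments

class Gis3D:
    def __init__(self, model_database: str, horizontal_fov_deg: float):
        self.database = model_database  # input 1: database of the model
        self.fov = horizontal_fov_deg   # input 3: horizontal FOV of the virtual camera

    def render(self, pose: VirtualCameraPose) -> VirtualView:
        # Would render the five outputs for the given virtual camera pose.
        raise NotImplementedError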
2.2.2. Calibration of the virtual laser scanner and virtual camera

An important aspect of these approaches is the calibration of the virtual camera and of the virtual laser scanner with respect to the real camera and the real laser scanner respectively, in order to obtain a similar view. The intrinsic parameter of the virtual camera that has to be determined is the FOV (Field Of View) parameter (see Fig. 2). The FOV is determined using the intrinsic parameters of the real camera with the following relation:
FOV = 2 arctan( l / (2f) )    (1)
where f is the focal length of the real camera and l the length of the image in pixels. The extrinsic parameters of the virtual camera are its position and orientation in the 3D model frame. So, the position and orientation in the WGS84 system should be converted to the 3D model frame.
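For illustration, Eq. (1) translates into a one-line Python helper; the numeric intrinsics below are illustrative assumptions, not the calibration values of the real camera.

import math

def horizontal_fov(f_px: float, l_px: float) -> float:
    # Eq. (1): FOV = 2 arctan(l / (2 f)), with f and l both in pixels.
    return 2.0 * math.atan(l_px / (2.0 * f_px))

# Illustrative values only:
print(math.degrees(horizontal_fov(f_px=700.0, l_px=780.0)))  # ~58.3 degrees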
Fig. 1. Screen shots of the 3D model of the Stanislas square and LORIA laboratory.
Fig. 2. FOV parameter of the virtual camera.
3. Virtual 3D city model based geo-localization

In order to help GPS and dead-reckoning based vehicle localization, we propose in this paper to integrate into the localization process an a priori information source: a virtual 3D city model. The concept of the proposed geo-localization approach is based on the registration between:
– the perception of the scene acquired by on-board sensors,
– and the perception of the scene simulated using the virtual 3D model and a rough estimation of the vehicle pose.
Based on this principle, two variants are proposed depending on the considered perception sensor. In the first approach, a monocular camera is used, whereas in the second approach a laser range finder is considered.

3.1. Vision approach

3.1.1. Principle of the vision approach

The general principle of the proposed geo-localization method is described in Fig. 3. The proposed approach is based on 2D-3D matching that is used to update the gyroscope and wheel odometry based pose prediction. Three feature detection and matching methods (the Harris corner detector, the Scale Invariant Feature Transform SIFT and Speeded Up Robust Features SURF) are used for pose estimation by 2D-3D matching between two images: the real image captured by an on-board
Fig. 3. General principle for localization method.
camera, and the virtual image that is produced by the 3D GIS using the 3D model database and an initial estimation of the pose. This initial estimation of the pose is predicted using the on-board GPS fixed on the top of the vehicle and the heading provided by the gyroscope. This virtual image will then be in the neighborhood of the real image. After the matching step, we have two sets of matched 2D points from the real and virtual images. Next, the RANSAC (RANdom SAmple Consensus) method (Fischler and Bolles, 1981) is applied in order to eliminate outliers. This method is popular in regression problems with samples contaminated by outliers. Then, we have to compute the 3D coordinates of the points that project onto the detected feature points in the virtual image plane. With these 2D/3D matches, the real camera pose can be computed using an iterative method called POSIT.

3.1.2. Feature extraction and matching

We compared and evaluated three different methods: the Harris corner detector, SIFT, and SURF.

The first method is the Harris corner detector. The Harris algorithm computes the gradient at each pixel; if the absolute gradient values in both directions are large, the pixel is judged to be a corner (Fig. 4A1). The two sets of Harris points then have to be matched (Fig. 4A2). To do so, for each interest point detected in the real image, the Zero-mean Normalized Cross-Correlation (ZNCC) is computed with all interest points detected in a search region of the virtual image.

The second method is the SIFT algorithm. SIFT extracts distinctive invariant features from images, which are then used to perform reliable matching between different images using a nearest-neighbor algorithm. The major steps in the computation of SIFT are:
1. scale-space extrema detection and keypoint localization (Fig. 4B1), based on a difference-of-Gaussians (DoG) function to identify potential interest points;
2. orientation assignment to each keypoint location, based on local image gradient directions;
3. keypoint descriptor computation, which measures the local image gradients at the selected scale in the region around each keypoint;
4. SIFT matching (Fig. 4B2), which uses the nearest-neighbor algorithm to match keypoints between images.

The third method, SURF (Speeded Up Robust Features), is a more recently published local feature method. The use of integral images and a basic Hessian-matrix approximation greatly reduces the computational complexity. The SURF steps consist of:
1. interest point detection (Fig. 4C1), based on a Hessian matrix that approximates second-order Gaussian derivatives with box filters using integral images;
2. orientation assignment, determined by constructing a circular region around the detected interest point, the dominant orientation describing the orientation of the interest point;
Fig. 4. Keypoint detection and matching using the three methods: Harris, SIFT and SURF.
3. interest point descriptor construction, by extracting square windows around the interest points and computing the Haar wavelet responses in the horizontal and vertical directions;
4. matching (Fig. 4C2), in which two features are compared through their interest point descriptors: the Euclidean distance between two descriptors is computed, and minimum distances are taken as matched points.

Next, the RANSAC method is applied in order to eliminate outliers. This algorithm is a popular way to fit a model in the presence of noisy data. RANSAC simply iterates two steps: generating a hypothesis from random samples and verifying it against the data. The mathematical model used in our case is the 2D homography, which defines the relation between the real and the virtual images (a code sketch of this detection/matching/RANSAC step is given just below).
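The following Python sketch shows one way to implement this chain with OpenCV; SIFT is used here, the image paths and the RANSAC reprojection threshold are illustrative assumptions, and Harris + ZNCC or SURF could be substituted as discussed above.

import cv2
import numpy as np

real = cv2.imread("real_frame.png", cv2.IMREAD_GRAYSCALE)       # illustrative paths
virtual = cv2.imread("virtual_frame.png", cv2.IMREAD_GRAYSCALE)

# Detect and describe feature points in both images (SIFT shown here).
sift = cv2.SIFT_create()
kp_r, des_r = sift.detectAndCompute(real, None)
kp_v, des_v = sift.detectAndCompute(virtual, None)

# Nearest-neighbour matching with Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_r, des_v, k=2)
        if m.distance < 0.75 * n.distance]
pts_r = np.float32([kp_r[m.queryIdx].pt for m in good])
pts_v = np.float32([kp_v[m.trainIdx].pt for m in good])

# RANSAC with a 2D homography as the model relating the two views; the
# inlier mask discards outlier correspondences (3 px threshold assumed).
H, mask = cv2.findHomography(pts_r, pts_v, cv2.RANSAC, 3.0)
inliers_r, inliers_v = pts_r[mask.ravel() == 1], pts_v[mask.ravel() == 1]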
3.1.3. Computation of 3D coordinates

After the matching step, we have a set of 2D feature points from the real image matched with a set of 2D feature points from the virtual image. We now have to compute the 3D coordinates of the feature points of the virtual image, using the depth provided by the 3D-GIS. This depth information is the distance in meters between the virtual camera and the 3D points. Using the depth information and the position of the pixels in the virtual image, we can compute the 3D coordinates in the camera coordinate system thanks to the pinhole camera inverse projection model. Let P denote a 3D point of the scene (Fig. 5). The coordinates of P in the camera referential system are denoted (P^V_x, P^V_y, P^V_z); the 3D point P is projected to the pixel denoted p^V. The feature points p^V are detected using Harris, SURF or SIFT in the virtual image and matched with the real image features. Let (u^V, v^V) be the coordinates of pixel p^V in the virtual image plane. Then the coordinates of P (the 3D point corresponding to pixel p^V) can be computed by:

P^V_z = d f / m
P^V_x = P^V_z (u^V − u_0) / f
P^V_y = P^V_z (v^V − v_0) / f    (2)
Fig. 5. Computation of the 3D coordinates of virtual feature points.
where (Fig. 5):
– (u_0, v_0): the coordinates of the virtual image center,
– f: the focal length of the virtual camera,
– d: the distance in meters between the virtual camera V and the 3D point P,
– m: the distance in pixels between the virtual camera V and the pixel p^V.
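Under these definitions, Eq. (2) translates directly into code. The sketch below assumes that m, the camera-to-pixel distance, includes the focal distance, consistently with Fig. 5.

import math

def backproject(u, v, d, u0, v0, f):
    # Eq. (2): 3D coordinates of the scene point in the virtual camera frame.
    # d is the camera-to-point distance read from the 3D-GIS depth file; m is
    # the camera-to-pixel distance, assumed here to include the focal length.
    m = math.sqrt(f**2 + (u - u0)**2 + (v - v0)**2)
    pz = d * f / m
    return pz * (u - u0) / f, pz * (v - v0) / f, pz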
3.1.4. Pose computation using POSIT

The aim of this part is to determine the rotation and translation between the real and virtual cameras using the POSIT algorithm. As the pose of the virtual camera V is known (the a priori pose), we will then be able to deduce the real camera pose. To do so, we have a set of m correspondences, i.e. m 2D feature points p^R_i (i = 1, ..., m) from the real image matched with m 2D feature points p^V_i (i = 1, ..., m) from the virtual image. Let P_i (i = 1, ..., m) denote the m corresponding 3D points. P^R_i (i = 1, ..., m) are the 3D coordinates of P_i in the real camera frame; P^V_i (i = 1, ..., m) are the 3D coordinates of P_i in the virtual camera frame (Fig. 6). Rot = [R_x; R_y; R_z] and T = [T_x, T_y, T_z]^T are the searched rotation matrix (3 × 3) and translation vector (3 × 1) between the real and virtual cameras. P^R_i and P^V_i are then related by:
P^R_i = Rot P^V_i + T    (3)
Explicitly:

P^R_ix = R_x · P^V_i + T_x
P^R_iy = R_y · P^V_i + T_y
P^R_iz = R_z · P^V_i + T_z    (4)
p^R_i = (u^R_i, v^R_i)^T (resp. p^V_i = (u^V_i, v^V_i)^T) is then the perspective projection of P_i in the real (resp. virtual) image:
u^R_i = (f / P^R_iz) P^R_ix,    v^R_i = (f / P^R_iz) P^R_iy    (5)
The scaled orthographic projection (SOP) of P_i in the real image is denoted p'^R_i = (u'^R_i, v'^R_i)^T. With SOP, all 3D points are assumed to have the same distance T_z in the z direction, so:
u'^R_i = (f / T_z) P^R_ix,    v'^R_i = (f / T_z) P^R_iy    (6)
Let s = f / T_z, then:
u'^R_i = s (R_x · P^V_i + T_x) = [ (P^V_i)^T  1 ] [ s R_x ; s T_x ]    (7)
Fig. 6. Perspective projection and scaled orthogonal projection of P.
Similarly:

v'^R_i = [ (P^V_i)^T  1 ] [ s R_y ; s T_y ]    (8)
These two equations form a linear system with sR_x, sT_x, sR_y, sT_y as unknowns. Effectively, there exists a relation between the SOP and perspective projections:

u'^R_i = (1 + ε_i) u^R_i,    v'^R_i = (1 + ε_i) v^R_i    (9)
where

ε_i = (1 / T_z) R_z · P^V_i    (10)
Each point gives two equations, which means that if four or more points are given, the problem becomes over-constrained and can be solved with a standard least-squares minimization technique. Since R_x and R_y are unit vectors, they can be determined by:
R_x = sR_x / ||sR_x||,    R_y = sR_y / ||sR_y||    (11)
and s = ||sR_x|| = ||sR_y||. sT_x and sT_y are solved from the linear system. Therefore, solving for s allows T_x and T_y to be found, and R_z = R_x × R_y. Thus, the full transformation is recovered: knowing the SOP points allows the pose to be recovered. However, we only have the perspective projections instead, hence the following iterative scheme. The outline of the algorithm is:
1. Initialize ε_i to 0 for i = 1, 2, ..., m.
2. Determine the values of sR_x, sR_y, sT_x, sT_y.
3. Calculate s by normalizing sR_x, to solve for T_x, T_y and T_z.
4. Calculate R_z = R_x × R_y.
5. Update ε_i from Eq. (10).
6. Repeat from step 2 until convergence.
7. Determine the pose of the real camera.
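The following Python sketch implements this outline under the formulation of Eqs. (7)–(11); it is a minimal version that does not re-orthonormalize the rotation matrix, which a production implementation would do.

import numpy as np

def posit(points_3d, points_2d, f_px, n_iter=30, tol=1e-6):
    # points_3d: (m, 3) coordinates P_i^V in the virtual-camera frame (m >= 4).
    # points_2d: (m, 2) pixel coordinates in the real image, already centred
    # on the principal point (u - u0, v - v0).
    P = np.asarray(points_3d, float)
    uv = np.asarray(points_2d, float)
    A = np.hstack([P, np.ones((len(P), 1))])   # rows [ (P_i^V)^T  1 ]
    A_pinv = np.linalg.pinv(A)
    eps = np.zeros(len(P))
    for _ in range(n_iter):
        # SOP coordinates from Eq. (9): u' = (1 + eps) u, v' = (1 + eps) v.
        a = A_pinv @ ((1.0 + eps) * uv[:, 0])  # [s Rx, s Tx]
        b = A_pinv @ ((1.0 + eps) * uv[:, 1])  # [s Ry, s Ty]
        s = 0.5 * (np.linalg.norm(a[:3]) + np.linalg.norm(b[:3]))
        Rx = a[:3] / np.linalg.norm(a[:3])     # Eq. (11)
        Ry = b[:3] / np.linalg.norm(b[:3])
        Rz = np.cross(Rx, Ry)
        Tz = f_px / s
        eps_new = (P @ Rz) / Tz                # Eq. (10)
        if np.max(np.abs(eps_new - eps)) < tol:
            eps = eps_new
            break
        eps = eps_new
    R = np.vstack([Rx, Ry, Rz])                # rows Rx, Ry, Rz (not re-orthonormalized)
    T = np.array([a[3] / s, b[3] / s, Tz])
    return R, T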
3.2. Laser scanner based approach

3.2.1. Approach developed for absolute localization using laser scanner data

After studying the contribution of the 3D model to absolute localization using vision, and after testing the developed approaches with real data, we wanted to study the use of the range finder data provided by the 3D model for geo-localization, for two major reasons:
– reduction of the excessive computation time due to image processing;
– avoiding the use of 3D models with high-resolution textures (large amounts of data when working on large-scale maps).
The use of the range finder allows us to circumvent these problems. The developed method is based on the data provided by the 3D-GIS, in addition to the virtual range finder data that can be built from the depth file (Z-buffer) or from ray tracing (see Fig. 7).
In this paper, we take into consideration two assumptions to calculate the pose. In the first, we assume that the vehicle is running on a planar surface. In the second, we extend this method to an evolution on a non-flat surface (described in Section 3.2.3). Returning to the planar case, we suppose that there is no pitch and no roll. The elevation of the laser scanner together with a predicted pose permits the extraction of a depth image from the 3D city model managed by the 3D-GIS and the depth file attached to it. In the same horizontal plane, we can consider that the distances provided by the laser scanner (laser scan data) are comparable to the distances observed by the virtual camera (virtual laser scan data) provided by the depth file. At this step, the task is to match a pair of scans: the virtual laser scan data with the real laser scan data. For this task, we use the ICP (Iterative Closest Point) algorithm (Besl and McKay, 1992) (see Figs. 8 and 9).

3.2.2. Laser scanner data

A laser scanner is a sensor used to measure distances. This kind of sensor is used for navigation in outdoor and indoor applications, map building and obstacle detection due to its good accuracy. The laser scanner used is a SICK LMS 291, which provides the distances between the sensor and the objects of the environment in a horizontal plane. The sensor scans the environment horizontally from left to right, with a 180° field of view discretized into 0.5° steps (Jensfelt and Christensen, 2005). The coordinates (X_L(i), Y_L(i)) of the i-th point of the laser scan can be computed as follows:
X_L(i) = D(i) cos(angle(i))
Y_L(i) = D(i) sin(angle(i))    (12)
with:
– D(i): distance between the laser scanner and the i-th point of the laser scan data,
– angle(i): angle of the i-th laser scan point relative to the X-axis.

A calibration step is done to adjust the FOV, so as to make the profiles provided by the real laser scanner comparable to the virtual scans provided by the 3D-GIS.

3.2.3. Improved method for an evolution on a non-flat surface

For the evolution of the vehicle on a non-flat surface, the above method works poorly because we do not have an estimate of the roll, pitch, and elevation of the laser range finder. For these reasons, a search phase is carried out to find the virtual laser pitch, roll and elevation that best correspond to the real ones, based on a criterion of similarity of the laser distances between the two scans. This step is possible because the 3D model contains a digital terrain model. In order to determine the virtual laser scan data that corresponds to the real laser scan data, we create a set of virtual laser scans by fixing the pose of the virtual camera with the prediction (x, y, θ) and by discretizing an interval around each of the three degrees of freedom (φ, ψ, z), representing respectively the pitch angle, the roll angle and the elevation. This way of searching for the virtual laser scan data assumes that the position at time t + 1 is reasonably close to the position at time t; this assumption is made plausible by the high frequency of data acquisition and the relatively low computation time of the proposed method. The various possible combinations of the generated intervals are considered as inputs of the 3D-GIS. The output of the 3D-GIS is then a set of virtual laser scans which contains all possible attitudes of the vehicle at each elevation (Fig. 12). At this step, an ICP algorithm is used in order to select the most likely virtual laser scan, i.e. the one that corresponds to the real scan (Fig. 12). Fig. 10 shows a real case with a chosen interval ([−0.25, +0.25] m) around the estimate of z at time (t − 1). This interval is discretized with a sampling step of 0.05 m. The blue lines represent the different profiles generated by this interval. Subsequently, the pitch angle is calculated according to the elevation (Fig. 11):
pitch angle = arctan((H − h) / d)    (13)
with:
– H: the real laser elevation on top of the vehicle after calibration (Fig. 11),
– h: the height of the virtual laser scan in the 3D model obtained after comparison with the real laser scan data,
– d: the distance in meters given by the laser scanner.

The yellow¹ line in Fig. 12(a) represents the profile of the virtual scan data with the chosen pitch angle. Then an interval [−20°, +20°] around the estimate of the roll angle at the previous instant t − 1 is considered; this interval is sampled with a step of 2°. The profiles of the virtual laser scans are shown as yellow lines in Fig. 12(b). The red line in Fig. 12(b) shows the selected virtual laser scan that will be used in the pose estimation method at instant t. A code sketch of this search procedure is given below.
¹ For interpretation of color in Figs. 12, 18, 19, 22, and 23, the reader is referred to the web version of this article.
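The following Python sketch implements Eq. (12) and one plausible reading of the three-stage elevation/pitch/roll search described above; gis.virtual_scan and icp_residual are assumed helpers (the latter could reuse the ICP sketch given further below), and taking the representative distance d as the mean laser range is an assumption, not the authors' stated choice.

import numpy as np

def scan_to_xy(dists, angles):
    # Eq. (12): convert polar laser returns to Cartesian points in the scan plane.
    return np.column_stack([dists * np.cos(angles), dists * np.sin(angles)])

def select_virtual_scan(gis, x, y, heading, z_prev, roll_prev, H,
                        dists, angles, icp_residual):
    real_scan = scan_to_xy(dists, angles)
    # 1) elevation: discretize [z - 0.25, z + 0.25] m with a 0.05 m step and
    #    keep the candidate whose virtual scan best matches the real one.
    zs = np.arange(z_prev - 0.25, z_prev + 0.25 + 1e-9, 0.05)
    h = min(zs, key=lambda z: icp_residual(
        real_scan, gis.virtual_scan(x, y, z, heading, 0.0, 0.0)))
    # 2) pitch from the retained elevation, Eq. (13); mean range used for d.
    pitch = np.arctan((H - h) / np.mean(dists))
    # 3) roll: discretize [-20, +20] degrees around the previous roll with a
    #    2 degree step and keep the best ICP match.
    rolls = roll_prev + np.deg2rad(np.arange(-20.0, 20.0 + 1e-9, 2.0))
    roll = min(rolls, key=lambda r: icp_residual(
        real_scan, gis.virtual_scan(x, y, h, heading, pitch, r)))
    return gis.virtual_scan(x, y, h, heading, pitch, roll), (h, pitch, roll)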
Fig. 7. Output of the 3D GIS.
Fig. 8. Block diagram of the location method with a laser range finder.
Fig. 9. (a) Virtual image of Stanislas square with reprojection of laser scanner data. (b) Real laser scanner data. (c) Virtual laser scanner data. (d) Real/virtual laser scans alignment with the ICP algorithm.
Fig. 10. Laser scanner measurement for a vehicle pose.
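Since both the attitude search above and the pose estimation below rely on ICP, a minimal point-to-point 2D ICP sketch is given here. It uses brute-force nearest neighbours and a closed-form SVD alignment; a practical implementation would use a k-d tree for the correspondence search.

import numpy as np

def icp_2d(src, dst, n_iter=50, tol=1e-8):
    # Minimal point-to-point 2D ICP (Besl and McKay, 1992). Returns the
    # rotation R (2x2) and translation T (2,) aligning src onto dst, plus
    # the final mean residual.
    R, T = np.eye(2), np.zeros(2)
    err = prev_err = np.inf
    for _ in range(n_iter):
        moved = src @ R.T + T
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        nn = dst[d2.argmin(axis=1)]          # closest dst point per src point
        err = np.sqrt(d2.min(axis=1)).mean()
        if prev_err - err < tol:
            break
        prev_err = err
        # Closed-form rigid alignment src -> nn (SVD of the cross-covariance
        # of the centred point sets).
        mu_s, mu_n = src.mean(axis=0), nn.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (nn - mu_n))
        R = (U @ Vt).T
        if np.linalg.det(R) < 0:             # guard against reflections
            Vt[-1] *= -1
            R = (U @ Vt).T
        T = mu_n - R @ mu_s
    return R, T, err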
3.2.3.1. Pose estimation. The ICP algorithm gives the transformation (R, T) that corresponds to the most likely alignment between the two sets of scans. Using the 2 × 2 rotation matrix R = [R1 R2; R3 R4] and the translation vector T = [T1; T2], we can calculate the elementary rotation ∂ω and the elementary translation ∂T. We consider a car-like robot with the kinematics of a unicycle. The
Fig. 11. Steps for estimating the pitch and roll with the ICP algorithm.
Fig. 12. Scans with sampling of the pitch (a) and roll (b).
generalized coordinate vector (x, y) ∈ R² is the coordinate vector of the centre of the rear axle of the car-like robot model, and θ is the vehicle heading w.r.t. the horizontal axis of the world frame, which is the East axis in the Lambert I² plane coordinate frame used in this work.
² The Lambert projection is one of the standard cartographic projections, often used for local maps of France. In this work, an adequate algorithm is used for the geographical transformations converting the WGS84 coordinate system (GPS geographical system) into Lambert I, which covers the Nancy area.
Fig. 13. Unicycle car-like robot.
Using the unicycle kinematic model, the pose can be computed by integrating the elementary rotation and the elementary translation (Fig. 13). So the absolute estimate of the pose (x_k, y_k, θ_k) at instant k is computed in terms of the a priori pose (x_{k−1}, y_{k−1}, θ_{k−1}) provided at instant k − 1:
R = [ R1  R2 ; R3  R4 ],    T = [ T1 ; T2 ]

∂ω = arccos(R1)   if arcsin(R2) > 0
∂ω = −arccos(R1)  if arcsin(R2) < 0
∂T = sqrt(T1^2 + T2^2)    (14)

x_k = x_{k−1} + ∂T cos(θ_{k−1})
y_k = y_{k−1} + ∂T sin(θ_{k−1})
θ_k = θ_{k−1} + ∂ω    (15)
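Eqs. (14) and (15) translate directly into code; the sketch below assumes the ICP output R, T exactly as defined above (for example from the icp_2d sketch).

import numpy as np

def pose_from_icp(R, T, x_prev, y_prev, theta_prev):
    # Eq. (14): elementary rotation, sign chosen from arcsin(R2).
    d_omega = np.arccos(np.clip(R[0, 0], -1.0, 1.0))
    if np.arcsin(np.clip(R[0, 1], -1.0, 1.0)) < 0:
        d_omega = -d_omega
    d_T = np.hypot(T[0], T[1])               # elementary translation
    # Eq. (15): unicycle integration of the elementary motion.
    x = x_prev + d_T * np.cos(theta_prev)
    y = y_prev + d_T * np.sin(theta_prev)
    theta = theta_prev + d_omega
    return x, y, theta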
4. Data fusion using IMM-UKF

To fuse the information coming from the sensors, different approaches can be found in the literature. Many of them rely on the implementation of a Kalman filter. The Kalman Filter (KF) is the optimal minimum mean square error estimator for linear state-space models (Wang, 1990). However, in our case the dynamic system is not linear, so the traditional KF cannot be applied to estimate the state of such a system. The most straightforward extension of the KF is the Extended Kalman Filter (EKF) (Rezaei and Sengupta, 2005; Boucher and Lahrech, 2004). In the EKF, the transformation is approximated using a first-order Taylor series linearization. But when the system is strongly nonlinear, the performance of this filter may lead to instability problems. Other filtering methods can be found in the literature, such as the Unscented Kalman Filter (UKF) and particle-based solutions (Rezaei and Sengupta, 2005; St-Pierre and Gingras, 2004; Cui and Hong, 2005). The UKF is an alternative method for calculating the statistics of a random variable which undergoes a nonlinear transformation. In addition, the Interacting Multiple Model method based on the UKF (IMM-UKF) has the advantage of taking into account the possibility that the system's model might change. It has been widely applied in many fields. For example, in Gustafsson and Gunnarsson (2001) the IMM method is applied to the problem of object tracking with a video system in a car. The authors in Weiss and Kaempchen (2004) develop an IMM algorithm to detect lane change maneuvers based on laser, radar, and vision data. The authors in Sayage and Moran (2007) propose waveform selection for maneuvering targets within the IMM framework. So, to integrate the pose estimation of the two developed approaches (based on GPS, dead-reckoning sensors and the virtual 3D model of the environment), a data fusion formalism based on IMM-UKF is proposed. The IMM-UKF filter is applied to the vision approach: GPS measurements are fused with proprioceptive data and the pose determined by 2D-3D matching using the Interacting Multiple Model-Unscented Kalman Filter (IMM-UKF), as indicated in Fig. 14.

4.1. Vehicle evolution model

The evolution model provides the a priori estimation of the vehicle pose from the incremental encoder and gyroscope measurements. This model is non-linear and can be expressed by:
x_k = x_{k−1} + d_{k−1} cos(θ_{k−1} + ω_{k−1}/2)
y_k = y_{k−1} + d_{k−1} sin(θ_{k−1} + ω_{k−1}/2)
θ_k = θ_{k−1} + ω_{k−1}    (16)
Fig. 14. Data fusion using IMM-UKF filter.
where:
– (x_k, y_k, θ_k) is the state vector at time k, composed of the position and orientation of the vehicle,
– (d_k, ω_k) is the vector of measured inputs, composed of the elementary distance covered by the rear wheels (d_k, given by the odometer) and the elementary rotation (ω_k, provided by the gyroscope).

4.2. Unscented Kalman Filter

Due to the model non-linearity, the Unscented Kalman Filter (UKF) is used. The UKF makes use of the Unscented Transformation (UT) (Julier and Uhlmann, 2004; Welch and Bishop, 2002) as an approximation method for propagating the mean and covariance of a random variable through a nonlinear transformation. The UKF is based on two steps: the prediction of the state vector and its covariance matrix, and the estimation of the state vector and its covariance matrix by updating the prediction with the current measurement.

4.3. IMM filter

The IMM filter is a popular method for estimating systems whose model changes according to a finite-state, discrete-time Markov chain. A Markov transition matrix is used to specify the probability that the target is in one of the modes of operation. The model probabilities are updated at each new measurement, and the resulting weighting factors are used in calculating the state. It consists of three steps:

(1) Calculation of the mixing probabilities and the initial conditions: the initial value of the state mean and covariance for each filter can be defined by:
n X
lijk j X ik1
ð17Þ
i¼1
ej ¼ P k1
n X
h
ih
j j lijk j Pik1 þ X ik1 ~xk1 X ik1 ~xk1
iT
ð18Þ
i¼1
where
1 cj
lijjk ¼ pij lik1
ð19Þ
P is mixing probabilities of model M i in the time step k 1 and c ¼ ni¼1 pij lik¼1 a normalization factor. (2) Model filtering using UKF and calculate the model probability update: Apply the UKF on the previous calculated state mean and covariance in order to obtain xkj and P kj . Then, we compute the Normalized Innovation Squared (NIS) of the measurement for each mode i at time k as i ^ k
Λ^i_k = N(v^i_k; 0, S^i_k) = exp( −(1/2) (v^i_k)^T (S^i_k)^{−1} v^i_k ) / sqrt(|2π S^i_k|)    (20)

where v^i_k is the innovation of the measurement and S^i_k its covariance for model M^i in the UKF update step.
The probabilities of each model M^i at time step k are calculated as:

μ^i_k = (1/c) Λ^i_k c̄_i    (21)
where c = Σ_{i=1..n} Λ^i_k c̄_i is a normalizing factor.

(3) Estimation and covariance combination: the output estimate and covariance are computed according to the following equations:
x_k = Σ_{i=1..n} μ^i_k x^i_k    (22)

P_k = Σ_{i=1..n} μ^i_k [ P^i_k + (x^i_k − x_k)(x^i_k − x_k)^T ]    (23)
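The three steps above can be summarized in the following Python sketch of one IMM cycle. ukf_update is a placeholder for the model-matched UKF of Section 4.2 and is assumed to return the posterior mean, the posterior covariance and the likelihood of Eq. (20).

import numpy as np

def imm_step(x, P, mu, p_trans, ukf_update):
    # x: list of n state means; P: list of n covariances; mu: (n,) model
    # probabilities; p_trans: (n, n) Markov transition matrix.
    n = len(x)
    # (1) Mixing, Eqs. (17)-(19).
    c_bar = mu @ p_trans                              # normalizers c_bar_j
    mu_mix = (p_trans * mu[:, None]) / c_bar[None, :] # mu_{i|j}
    x0 = [sum(mu_mix[i, j] * x[i] for i in range(n)) for j in range(n)]
    P0 = []
    for j in range(n):
        Pj = np.zeros_like(P[0])
        for i in range(n):
            dx = x[i] - x0[j]
            Pj += mu_mix[i, j] * (P[i] + np.outer(dx, dx))
        P0.append(Pj)
    # (2) Model-matched filtering and probability update, Eqs. (20)-(21).
    out = [ukf_update(x0[j], P0[j], j) for j in range(n)]
    x_post = [o[0] for o in out]
    P_post = [o[1] for o in out]
    lik = np.array([o[2] for o in out])               # Lambda_j from Eq. (20)
    mu_new = lik * c_bar
    mu_new /= mu_new.sum()
    # (3) Combination, Eqs. (22)-(23).
    x_comb = sum(mu_new[j] * x_post[j] for j in range(n))
    P_comb = np.zeros_like(P[0])
    for j in range(n):
        dx = x_post[j] - x_comb
        P_comb += mu_new[j] * (P_post[j] + np.outer(dx, dx))
    return x_post, P_post, mu_new, x_comb, P_comb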
4.4. Modeling

In our case, the system is composed of two models (Fig. 15). One corresponds to the GPS measurement and the other corresponds to the 2D/3D matching based pose estimation. The evolution and the measurement models are defined as:
Fig. 15. System architecture of localization system using IMM-UKF.
Fig. 16. Experimental vehicle.
Fig. 17. Geo-localization results and cumulative error on x and y using SURF, SIFT and Harris.
X^i_{k/k−1} = f(X^i_{k−1/k−1}, U_k, γ_k) + α^i_k    (24)

Z_k = H X^i_{k/k−1} = [ 1 0 0 ; 0 1 0 ] X^i_{k/k−1}    (25)
where:
– X^i_{k/k−1} = (x^i_{k/k−1}, y^i_{k/k−1}, θ^i_{k/k−1}) is the predicted state using the GPS measurement or the 2D/3D matching based pose estimation,
– U_k = (d_k, ω_k) is the vector of the elementary distance covered by the rear wheels and the elementary rotation,
– γ_k represents the measurement error of the inputs,
– α_k is the process noise,
– γ_k and α_k are assumed to be uncorrelated, zero-mean noises.
Table 1
Comparison between Harris, SIFT, and SURF methods.

Detection and matching method | Mean computational time (ms)/frame | Absolute position error (m) | Absolute orientation error (°)
Harris corner detector        | 1180.5                             | 4.86                        | 5.53
SIFT                          | 3412.7                             | 2.33                        | 2.21
SURF                          | 1950.8                             | 1.99                        | 1.96
Fig. 18. IMM-UKF filter estimate of position using Harris, SIFT and SURF methods.
For the two measurements, we use the same basic odometric model. However, the two models differ in the tuning of the model noise. In the case of the GPS measurement, we chose low values for the variance–covariance matrix of the α_k noise, in order to give high credibility to the GPS when it is available and low credibility to the odometric model. In the 2D/3D matching based pose estimation, we give high values to the variance–covariance matrix of the α_k noise, in order to give low credibility to the 2D/3D matching based pose estimation, since after a certain time the position may diverge using this measurement.

Using the IMM-UKF described above, the final estimate of the state and covariance of the Gaussian density at a particular time step is determined as follows: the initial conditions for each model-matched filter are calculated by mixing the state estimates produced by the filters at the previous time step, under the assumption that this particular model is the right model at the current time step; Unscented Kalman Filtering is then applied for each model; finally, a weighted combination of the updated state estimates produced by the filters yields the final estimate of the state and covariance of the Gaussian density at that time step. The weights are chosen according to the probabilities of the models, which are computed in the filtering step of the algorithm (see Fig. 15).

The probability transition matrix of the two models is p = [0.98 0.02; 0.02 0.98], that is to say, each model has a high probability of persisting from one time step to the next, and the prior model probability is set to μ0 = [0.9 0.1]. The innovation is calculated by v^i_k = Z^i_k − z^i_k, where z^i_k is the GPS measurement model or the pose determined by the 2D-3D matching measurement model. Then, the Normalized Innovation Squared (NIS) is used to calculate the probability of each model at time step k. If the probability of the GPS measurement model is low, then the GPS measurement is approximately rejected and the pose is estimated by fusing the odometer and gyroscope data with the pose determined by 2D-3D matching. If the probability of the 2D-3D matching measurement model is low, then the 2D-3D matching measurement is approximately rejected and the pose is estimated by fusing the odometer and gyroscope data with the GPS measurement.

5. Real experimentation

Experimentations have been carried out with the CyCab vehicle, a Robosoft product (http://www.robosoft.fr/), with embedded sensors: an RTK GPS, odometers and a gyroscope, a SICK LMS 291 laser scanner, and an AVT Marlin camera. The processor speed of the PC used is 2.27 GHz. All the results are computed using Matlab.

5.1. Illustration of the localization method with a real camera and 3D model

In order to test and compare our methods and algorithms, a sensor data acquisition platform has been developed. This platform embeds a camera (AVT Marlin F-046C connected to the FireWire IEEE 1394 SVGA bus) at 30 frames per second, an RTK GPS (Thales Sagitta 02 connected on the RS232 port) at 1 Hz, and odometers and a gyroscope at 10 Hz. The orientation reference is a KVH magnetic gyroscope integrated in the experimental vehicle, whose accuracy is about 0.2°. Each sensor thus has its own sampling frequency. The sensor measurements (camera, odometer and gyroscope) are synchronized by taking the GPS RTK acquisition time as reference; after synchronization, all sensors are sampled at one hertz.

The first experiment has been carried out in Nancy downtown. As described previously, we can determine a vehicle pose by matching the 3D city model with captured 2D images (the resolution of the images is 580 × 780). In Fig. 17, the results of the position computation are displayed. The blue squares represent the true trajectory of the vehicle, i.e. the data of
Fig. 19. View from the top of the test carried out with Matlab.
Fig. 20. Pitch and roll variation.
a centimetric RTK GPS. The yellow, red, and green stars represent the positions determined using SURF, Harris corner detector and SIFT respectively (see Fig. 17). Table 1 provides a comparison between the three proposed methods for the computation of a position.

After pose determination using the three methods (Harris, SIFT and SURF), the results of the IMM-UKF that fuses the pose determined by 2D-3D matching with the odometer and gyroscope measurements when the GPS is unavailable are displayed in Fig. 18: the green color represents the 3D model/vision based position estimation using Harris, SIFT or SURF feature points, the black color represents the true trajectory, the red color represents the IMM-UKF trajectory when GPS is available, and the blue color represents the IMM-UKF trajectory when GPS is unavailable. The Root Mean Square Errors (RMSE) of the position estimation using Harris, SIFT and SURF are 4.10, 1.96 and 1.91 m respectively. The processing time of the IMM-UKF is about 0.27 s/frame using a PC with a 2.27 GHz processor.

During all of our tests, SURF and SIFT performed better than Harris. Unlike Harris, SURF and SIFT are invariant to rotation, scale change, image noise and illumination. However, the computational cost and the complexity of the algorithm are higher for SIFT than for SURF and Harris. If the approach runs in online mode, the speed advantage of Harris over SURF and SIFT can be important; if it works in offline mode, the accuracy and robustness advantages of SIFT and SURF over Harris can be important. So, SURF is of particular interest in our case because feature
Fig. 21. Result of matching using ICP algorithm.
Fig. 22. The result and the correspondence.
Fig. 23. Error on x-axis (top) and y-axis (bottom).
points can be calculated quickly and can be recognized and matched with feature points from other images in an efficient way.

5.2. Illustration of the localization method with laser rangefinder and 3D model

The second experiment has been carried out around the LORIA Lab. The red parts of Fig. 19 represent the parts chosen as GPS masks in order to validate and test the proposed algorithm. The coordinates in blue represent the centre of the 3D city
model. For one pose in the GPS trajectory, before matching the real laser scan data from the laser scanner with the virtual laser scan data obtained using the 3D-GIS, a filtering step is needed. Fig. 20 represents the variation in pitch and roll according to the position of the vehicle along its path. Fig. 21 shows the result of the matching with the ICP algorithm: the real laser scan data is represented by red points, the virtual laser scan data in blue, and the green points represent the scan data aligned after scan matching using the ICP algorithm to get the absolute estimate of the pose. The pose computed from the transformation based on the matching of this pair of scans with the ICP algorithm is: x = 9.205, corresponding to x_GPS = 9.65; y = 0.99, corresponding to y_GPS = 0.70; pitch angle = 2.74; and roll = 5. For the next points in the GPS trajectory, we use the pose estimate at instant (t − 1) and the current gyroscope angle as the predicted pose, to extract the virtual image and the depth file for estimating the pose of the vehicle. To show the performance and the quality of the geo-pose computation by the proposed approach, we chose two zones in the test trajectory, as shown in Fig. 19, where we masked the GPS measurements. Results and errors are shown in Figs. 22 and 23; the green line links the GPS position (*) and the estimated position (Δ) at each instant. The maximum error of the geo-pose computation is less than 1 m on both the x-axis and the y-axis. The mean error is about 0.6 m.
6. Conclusion

The major contribution of the presented approach is the integration of virtual 3D city models in a multi-sensor data fusion formalism for vehicle geo-localization in urban areas. This paper proposes two methods to take advantage of such a virtual 3D city model. The benefits of virtual 3D city models have already been illustrated for detecting, tracking and geo-localizing obstacles using a GPS, a monovision system and a 3D model.

The first approach is built by matching the 3D model of the environment with the image acquired by an embedded camera. This matching is based on feature point matching; three well-known features were compared: SURF, SIFT and the Harris corner detector. Using these matches, the position of the vehicle is determined with the POSIT algorithm. The preliminary results prove the validity of the method and indicate that SURF gives accurate results while being computationally less expensive.

The principle of the second approach is to compare real data acquired by an on-board laser scanner with the virtual data generated by the virtual 3D model. The advantage of the laser scanner based method is that it is more efficient in terms of computation time. It is noticed that the result of the proposed method can be affected by the pitch and roll angles if the vehicle is running on a non-flat surface. In order to avoid this problem, a method was developed to estimate the roll and pitch, intended to be integrated in the pose estimation.

The proposed methods have been tested and validated with real data. The obtained experimental results are promising and illustrate the benefits of a 3D city model for geo-localization in urban areas. This paper also proposes a method for geo-localizing a vehicle in urban areas by fusing data from GPS, the pose determined from 2D-3D matching and the odometric model using an IMM-UKF filter.

References

Abbott, E., Powell, D., 1999. Land-vehicle navigation using GPS. Proc. IEEE 87 (1).
Bar-Shalom, Y., Kirubarajan, T., Li, X.R., 2002. Estimation with Application to Tracking and Navigation. John Wiley & Sons, Inc., New York, NY, USA.
Bay, H., Tuytelaars, T., Gool, L., 2006. SURF: speeded up robust features. In: Proceedings of the Ninth European Conference on Computer Vision.
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L., 2008. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110 (3), 346–359.
Bernstein, D., Kornhauser, A., 1988. Map matching for personal navigation assistants. Transp. Res. Board.
Besl, P.J., McKay, N.D., 1992. Method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14 (2), 239–256.
Boucher, C., Lahrech, A., Noyer, J.C., 2004. Non-linear filtering for land vehicle navigation with GPS outage. In: Proc. IEEE Int. Conf. Syst., Man, Cybern., The Hague, Netherlands.
Cui, A.N., Hong, L., Layne, J.R., 2005. A comparison of nonlinear filtering approaches with an application to ground target tracking. Sig. Process. 85 (8), 1469–1492.
David, P., Dementhon, D., Duraiswami, R., Samet, H., 2004. SoftPOSIT: simultaneous pose and correspondence determination. Int. J. Comput. Vis. 59, 259–284.
Davison, A.J., 2003. Real-time simultaneous localisation and mapping with a single camera. In: Proceedings of the Ninth IEEE International Conference on Computer Vision ICCV '03, p. 1403.
Dawood, M., Cappelle, C., El Badaoui El Najjar, M., Khalil, M., Pomorski, D., 2011. Vehicle geo-localization based on IMM-UKF data fusion using a GPS receiver, a video camera and a 3D city model. In: IEEE Intelligent Vehicles Symposium IV.
DeMenthon, D., Davis, L.S., 1995. Model-based object pose in 25 lines of code. Int. J. Comput. Vis. 15, 123–141.
Dissanayake, M.W.M.G., Newman, P., Clark, S., Durrant-Whyte, H., Csorba, M., 2001. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. Robot. Automat. 17, 229–241.
El Badaoui El Najjar, M., Bonnifait, Ph., 2005. Towards an estimate of confidence in a road-matched location. In: IEEE International Conference on Robotics and Automation.
Fischler, M.A., Bolles, R.C., 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 381–395.
Grewal, M.S., Weill, L.R., Andrews, A.P., 2011. Global Positioning Systems, Inertial Navigation and Integration. John Wiley and Sons, ISBN 0-471-35032-X.
Gustafsson, F., Gunnarsson, F., Bergman, N., Forssell, U., Jansson, J., Karlsson, R., Nordlund, P., 2001. Particle filters for positioning, navigation, and tracking. IEEE Trans. Signal Process. 50.
Gutmann, J.S., Schlegel, C., 1996. AMOS: comparison of scan matching approaches for self-localization in indoor environments. In: 1st Euromicro Workshop on Advanced Mobile Robots (EUROBOT), p. 61.
Harris, C., Stephens, M., 1988. A combined corner and edge detector. In: Proceedings of 4th Alvey Vision Conference, pp. 147–151.
Jensfelt, P., Christensen, H., 2005. Laser based position acquisition and tracking in an indoor environment. In: Proc. of International Symposium on Robotics and Automation.
Julier, S., Uhlmann, J., 1997. A new extension of the Kalman filter to nonlinear systems. In: International Symposium on Aerospace/Defense Sensing, Simulation and Controls.
Julier, S.J., Uhlmann, J.K., 2004. Unscented filtering and nonlinear estimation. Proc. IEEE 92 (3), 401–422.
Kim, S., Kim, K., Lee, S., Lee, J., 2005. Video Navigation System using the Geographic Hypermedia, High Resolution Earth Imaging for Geospatial Information. ISPRS Hannover Workshop.
Lothe, P., Bourgeois, S., Dekeyser, F., Royer, E., Dhome, M., 2009. Towards geographical referencing of monocular SLAM reconstruction using 3D city models: application to real-time accurate vision-based localization. In: IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.
Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2 (60), 91–110.
Marais, J., Meyrie, C., Flancquart, A., 2014. Toward accurate localization in guided transport: combining GNSS data and imaging information. Transport. Res. Part C: Emer. Technol. 43.
Peng, K., Chen, X., Zhou, D., Liu, Y., 2009. 3D reconstruction based on SIFT and Harris feature points. In: Proceedings of the IEEE International Conference on Robotics and Biomimetics, Guilin, China.
Peng, J., El Badaoui El Najjar, M., Pomorski, D., Cappelle, C., Charpillet, F., 2009. In: Proc. of IEEE France 9th International Conference on Intelligent Transport Systems (ITS-T'09), Lille, France, pp. 477–480.
Peng, J., El Najjar, M.E., Cappelle, C., Pomorski, D., Charpillet, F., Deeb, A., 2009. A novel geo-localisation method using GPS, 3D-GIS and laser scanner for intelligent vehicle navigation in urban areas. In: ICAR International Conference on Advanced Robotics, Munich, Germany.
Quddus, M., Ochieng, M., Noland, R., 2007. Current map-matching algorithms for transport applications: state-of-the-art and future research directions. Transport. Res. Part C: Emer. Technol. 15 (5).
Renault, S., LeMeur, A., Meizel, D., 2005. GPS/GIS localization for management of vision referenced navigation in urban environments. In: 8th International IEEE Conference on Intelligent Transportation Systems.
Rezaei, S., Sengupta, R., 2005. Kalman filter based integration of DGPS and vehicle sensors for localization. In: Proc. IEEE Int. Conf. Mechatronics Autom., Niagara Falls, ON, Canada.
Salameh, N., Challita, G., Ramaswamy, S., 2013. Collaborative positioning and embedded multi-sensors fusion cooperation in advanced driver assistance system. Transport. Res. Part C: Emer. Technol. 29.
Sayage, C., Moran, B., 2007. Waveform selection for maneuvering targets within an IMM framework. IEEE Trans. Aerosp. Electron. Syst. 43 (3).
Scaramuzza, D., Seigwart, R., 2008. Appearance-guided monocular omnidirectional visual odometry for outdoor ground vehicles. IEEE Trans. Rob., 1–12.
Schleicher, L., Bergasa, L.M., Ocana, M., Barea, R., Lopez, M., 2009. Real time hierarchical outdoor SLAM based on stereovision and GPS fusion. IEEE Trans. Intell. Transport. Syst. 10 (3).
Se, S., Lowe, D., Little, J., 2001. Vision-based mobile robot localization and mapping using scale-invariant features. Robot. Automat. 2, 2051–2058.
Se, S., Lowe, D., Little, J., 2002. Mobile robot localization and mapping with uncertainty using scale invariant visual landmarks. Int. J. Robot. Res. 21, 735–758.
St-Pierre, M., Gingras, D., 2004. Comparison between the unscented Kalman filter and the extended Kalman filter for the position estimation module of an integrated navigation information system. In: Proc. IEEE Intelligent Vehicles Symposium, Parma, Italy.
Takase, Y., Sho, N., Sone, A., Shimiya, K., 2003. Generation of digital city model. J. Visual. Soc. Jpn. 23.
Wan, E., Van der Merwe, R., 2001. The unscented Kalman filter for nonlinear estimation. In: IEEE Symposium on Adaptive Systems for Signal Processing, Communication and Control.
Wang, C.M., 1990. Location estimation and uncertainty analysis for mobile robots. Auton. Robot Veh., 90–95.
Weiss, K., Kaempchen, N., Kirchner, A., 2004. Multiple-model tracking for the detection of lane change maneuvers. In: Proc. IEEE Intelligent Vehicles Symposium, Parma, Italy.
Welch, G., Bishop, G., 2002. An Introduction to the Kalman Filter. University of North Carolina, Chapel Hill (TR 95-041).
Zhang, Z., 1994. Iterative point matching for registration of free-form curves and surfaces. Int. J. Comput. Vis. 13 (2), 119–152.