ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 128–142
RBA: Reduced Bundle Adjustment for oblique aerial photogrammetry

Yanbiao Sun a, Huabo Sun b, Lei Yan c,* , Shiyue Fan d, Rui Chen c

a Center of Remote Sensing, School of Architecture, Tianjin University, Tianjin 300072, China
b China Academy of Civil Aviation Science and Technology, Beijing 100028, China
c Beijing Key Lab of Spatial Information Integration and Its Applications, Peking University, Beijing 100871, China
d National Marine Data and Information Service, Tianjin 300072, China

* Corresponding author.
Article history: Received 23 January 2016; Received in revised form 21 September 2016; Accepted 26 September 2016; Available online 12 October 2016.

Keywords: Bundle Adjustment; Oblique images; Reduced normal equation; RMSE; Parametrization
Abstract

This study proposes an efficient Bundle Adjustment (BA) model for oblique aerial photogrammetry that reduces the number of unknown parameters and the dimension of the non-linear optimization problem. Instead of serving as independent exterior orientations, the oblique camera poses are parameterized with the nadir camera poses and the constant relative poses between the oblique and nadir cameras. New observation functions are created with the image points as a function of the nadir pose and the relative pose parameters. With these observation functions, the BA problem is defined as finding the optimal unknown parameters by minimizing the total difference between estimated and observed image points. A Gauss-Newton optimization method is utilized to solve this least-squares problem with a reduced normal equation, which plays a critical role in the convergence and efficiency of BA. Compared with traditional BA methods, both the number of unknown parameters and the dimension of the normal equation decrease, so the approach dramatically reduces the computational complexity and memory cost, especially for large-scale scenarios with many oblique images. Four synthetic datasets and a real dataset were used to check the validity and accuracy of the proposed method. The accuracy of the proposed method is very close to that of the traditional BA method, while its efficiency is significantly better. For very large-scale scenarios, the proposed method can still overcome the memory limitation and orientate all images captured by an oblique aerial multi-camera system.

© 2016 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
1. Introduction

Oblique images have become increasingly important because the facades and footprints of buildings can be viewed by oblique cameras, whereas regular aerial photogrammetry is insufficient for this task. Therefore, oblique cameras can close the gap between classical aerial and terrestrial acquisitions. Over the past few decades, oblique image technology has been developed and widely used in many applications thanks to improved airborne digital multi-camera systems (Rupnik et al., 2015), including Leica RCD30, Pictometry, IGI, BlomOblique, UltraCam Osprey, etc. Oblique images can be used to generate dense point clouds, visualize 3D cities, and analyze damage when disasters or emergency events occur (Turlapaty et al., 2012; Dubois and Lepage, 2014). To overcome the limitations of low-efficiency mapping and
missing facade information in traditional photogrammetry, Gerke and Kerle (2011) retrieved rich geometric and radiometric information of the roofs and facades of buildings using airborne oblique, multi-perspective Pictometry data to accurately and rapidly assess building damage. Vetrivel et al. (2015) analyzed building damage assessment with point clouds extracted from oblique images. Digital elevation models (DEMs) have been created from oblique and non-metric imagery and have served as auxiliary data to monitor changes in bed topography in the Canadian Rockies (Chandler et al., 2002). Cavegn et al. (2014) investigated the potential of dense matching approaches for generating dense 3D point clouds from oblique aerial images and then evaluated the quality of the point clouds by comparing them with terrestrial laser scanning data. Rau et al. (2015) generated 3D point clouds based on the multi-view dense matching technique to classify roofs, facades, roads, trees and grass in oblique images. Yalcin and Selcuk (2015) modeled a 3D representation of a Turkish city using oblique aerial images, which were textured onto the building exteriors. Frommholz et al. (2015) proposed a method to automatically reconstruct city
buildings with nadir images as well as oblique images from which facade elements can be retrieved.

To generate accurate point clouds or 3D models from oblique aerial images, recovering the best image orientation is required. When very precise GPS/IMU signals are available, the orientation information can be obtained from these devices. For example, Fritsch et al. (2012) integrated an AEROcontrol GNSS/IMU system into a multi-camera system and then oriented all images, involving nadir and oblique cameras. Mostafa and Schwarz (2000) retrieved the exterior orientations of multiple cameras from the recorded trajectory dataset collected by a medium-class INS and two low-cost GPS devices. However, the orientation data often cannot satisfy the accuracy requirement when low-cost GPS/IMU devices are used or when GPS/IMU devices are unavailable. Fortunately, the Bundle Adjustment (BA) model can be used to resume camera poses by minimizing the total difference between estimated and observed image points, which can be extracted by feature extraction and matching technology (Gülch, 2009; Sun et al., 2014; Hu et al., 2015).

A simple way to orientate oblique images is to treat the oblique and nadir cameras as independent; the unknown parameters to be optimized then involve the features, the nadir poses and the oblique poses. This process is very similar to regular aerial photogrammetry, where only nadir images are captured. In this study, we call this method the traditional BA method for oblique aerial photogrammetry. For example, Nex et al. (2013) determined the orientations of oblique images using the APERO tool, which is a traditional method that optimizes all exterior orientations of the cameras. Recently, additional constraints have been imposed on the BA model. Gerke and Nyaruhuma (2009) proposed a BA model with constraints consisting of vertical and horizontal line features of building facades to determine the orientations of FLI-MAP 400 images from multi-camera devices. Wiedemann and More (2012) proposed an extended mathematical model to achieve the orientations of oblique images for an Aerial Oblique System (AOS), which consists of a forward-looking, a backward-looking and a straight-down Rollei AIC 39 MPix camera. In that extended model, the oblique poses were expressed with the nadir poses and the constant relative poses between the nadir and oblique cameras, and all nadir poses and the relative poses were then jointly optimized. To facilitate the linearization, it was assumed that the system remained in a horizontal position, and an eccentricity was used to express the relative poses between the nadir and oblique cameras. However, the eccentricity does not express the relative poses between the nadir and oblique cameras very accurately, producing an inevitable systematic error and reducing the accuracy of the camera orientation. Rupnik et al. (2013) mainly presented a new connectivity strategy to automatically orientate oblique poses in a large block, in which the images were acquired by a multi-camera system using Canon EOS 1D Mark III cameras with four oblique cameras tilted 45° w.r.t. the nadir. In addition, that work outlined a possible approach to multi-camera system orientation, i.e., defining the collinearity equations for the oblique cameras as a function of the nadir camera poses and the respective displacements between the nadir and oblique cameras.
However, this possible approach was not implemented in their experiments, and the oblique images were orientated with the Apero tool (Pierrot-Deseilligny and Clery, 2011), which utilizes a traditional BA method. In this study, we introduce this approach in detail and systematically analyze the accuracy and efficiency of the new BA method using synthetic and real datasets. In the proposed Reduced Bundle Adjustment (RBA) model, the nadir poses are expressed with a strategy similar to common aerial photogrammetry, while the oblique poses are parameterized by combining the nadir poses with the relative poses between the nadir and oblique cameras. The observation functions
are created with the image points as a function of the 3D points, the nadir poses and the relative pose parameters. Then, the oblique aerial images are aligned, making it convenient to determine the connectivity and geometric topology relationships between the nadir and oblique images. Next, a Gauss-Newton optimization method is used to solve the non-linear least-squares problem, constructing a reduced normal equation. As the later experiments show, the reduced normal equation makes it possible to orientate the oblique and nadir cameras in a large-scale scenario.

The remainder of this study is organized as follows. Section 2 describes the observation functions for oblique images in detail, whose poses are parameterized with the nadir poses and the relative poses between the nadir and oblique cameras. Section 3 introduces a strategy of image alignment to easily determine the connectivity between nadir and oblique images; meanwhile, a reduced normal equation is constructed when a Gauss-Newton optimization method is utilized to provide a solution. In Section 4, four synthetic datasets with an increasing number of oblique images are tested to discuss the accuracy and efficiency of the proposed BA method compared with the traditional BA method; additionally, a real dataset is used to analyze the performance of the proposed method. Section 5 presents the conclusions and possible further studies.

2. Observation functions for oblique images

2.1. Camera parametrization

In the traditional BA model, the oblique cameras are treated as independent, similar to the nadir cameras, i.e., they have individual rotation matrices and translation vectors like the nadir camera poses. In practice, however, the oblique cameras are tightly fixed and remain still w.r.t. the nadir cameras. Thus, at any exposure station, the relative poses between the nadir camera and its surrounding oblique cameras are nearly identical. Therefore, we parameterize the oblique poses with the nadir pose and the constant relative poses. In this study, we mainly discuss the camera orientation for the five-camera system, which consists of one nadir camera and four oblique cameras. Such an oblique multi-camera system is illustrated in Fig. 1, where the red circle refers to the nadir camera (C_ND), while the green, purple, yellow and blue triangles represent the east, west, south and north oblique cameras (C_E, C_W, C_S, C_N), respectively.

Fig. 1. The structure of an oblique five-camera system containing one nadir camera and four oblique cameras. The circle refers to the nadir camera, while the east, west, south and north oblique cameras are expressed using four triangles.

Suppose that n nadir images are taken by the multi-camera system and 4n oblique images are simultaneously obtained in an oblique aerial scenario. Similar to the traditional BA model, the nadir poses are parameterized with individual Euler angles and translation vectors, C_i = {R_i, T_i} (i = 1, 2, ..., n), where R_i is the rotation matrix computed from the Euler angles. Assume that the relative pose between the east camera and the nadir one is represented by (r_E, t_E). Similarly, the relative poses of the west, south and north cameras w.r.t. the nadir camera are expressed by (r_W, t_W), (r_S, t_S) and (r_N, t_N), respectively. Then, the oblique poses C_E^i, C_W^i, C_S^i and C_N^i can be represented with Eq. (1):

$$
\begin{cases}
C_E^i = \left\{\, r_E R_i,\; R_i^T t_E + T_i \,\right\} \\
C_W^i = \left\{\, r_W R_i,\; R_i^T t_W + T_i \,\right\} \\
C_S^i = \left\{\, r_S R_i,\; R_i^T t_S + T_i \,\right\} \\
C_N^i = \left\{\, r_N R_i,\; R_i^T t_N + T_i \,\right\}
\end{cases} \tag{1}
$$

where i in C_E^i, C_W^i, C_S^i and C_N^i ranges from 1 to n. In Eq. (1), the three angles of the exterior orientation can be retrieved from the left part of the braces, while the three components of the translation vector of the exterior orientation are given by the right part. Note that the relative poses remain unchanged and are constant at all exposure stations.
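To make the parametrization of Eq. (1) concrete, the following minimal Python sketch (not taken from the authors' implementation; the function and variable names are assumptions) composes the exterior orientation of an oblique camera from a nadir pose and a constant relative pose.

```python
import numpy as np

def compose_oblique_pose(R_i, T_i, r_rel, t_rel):
    """Compose an oblique camera pose from the nadir pose (R_i, T_i) and the
    constant relative pose (r_rel, t_rel), following Eq. (1):
        R_obl = r_rel * R_i
        T_obl = R_i^T * t_rel + T_i
    R_i and r_rel are 3x3 rotation matrices; T_i and t_rel are 3-vectors."""
    R_obl = r_rel @ R_i
    T_obl = R_i.T @ t_rel + T_i
    return R_obl, T_obl
```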
2.2. Observation functions

For the 3D point features, the Euclidean XYZ parameterization is utilized. According to the pin-hole camera model, a 3D terrain point projects into the cameras to form multiple 2D image points through the observation functions. Similar to the traditional BA model, the observation function of the ith nadir camera can be expressed by Eqs. (2) and (3):

$$
\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x/w \\ y/w \end{bmatrix} \tag{2}
$$

$$
\begin{bmatrix} x \\ y \\ w \end{bmatrix} = K R_i \left( X - T_i \right) \tag{3}
$$

where (u, v) is an image point and K is a 3 × 3 calibration matrix. For the oblique cameras surrounding the ith nadir camera, Eq. (1) is substituted into Eq. (3), and the observation functions can be formulated as listed in Eqs. (4)–(7), together with Eq. (2):

$$
\begin{bmatrix} x \\ y \\ w \end{bmatrix} = K r_E R_i \left( X - R_i^T t_E - T_i \right) \tag{4}
$$

$$
\begin{bmatrix} x \\ y \\ w \end{bmatrix} = K r_W R_i \left( X - R_i^T t_W - T_i \right) \tag{5}
$$

$$
\begin{bmatrix} x \\ y \\ w \end{bmatrix} = K r_S R_i \left( X - R_i^T t_S - T_i \right) \tag{6}
$$

$$
\begin{bmatrix} x \\ y \\ w \end{bmatrix} = K r_N R_i \left( X - R_i^T t_N - T_i \right) \tag{7}
$$
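For illustration, a minimal sketch of the oblique observation function of Eqs. (2) and (4)–(7) is given below, assuming numpy arrays for K, the rotations and the vectors; it is a toy re-implementation under these assumptions, not the authors' code.

```python
import numpy as np

def project_oblique(K, R_i, T_i, r_rel, t_rel, X):
    """Project a 3D point X into an oblique camera whose pose is expressed
    through the nadir pose (R_i, T_i) and a constant relative pose
    (r_rel, t_rel), following Eqs. (2) and (4)-(7):
        [x, y, w]^T = K * r_rel * R_i * (X - R_i^T * t_rel - T_i)
        (u, v) = (x / w, y / w)"""
    p = K @ r_rel @ R_i @ (X - R_i.T @ t_rel - T_i)
    return p[:2] / p[2]
```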
3. Bundle Adjustment with reduced normal equation

3.1. Image alignment

Image alignment is a prerequisite of the BA model, and we index the images as follows. Suppose that aerial images are taken by an oblique five-camera system at n exposure stations in a scenario. First, all nadir images are indexed from 1 to n. Then, all east images are indexed from n + 1 to 2n. Similarly, the west, south and north images are indexed with (2n + 1, 3n), (3n + 1, 4n) and (4n + 1, 5n), respectively. Thus, for the ith nadir image, it is easy to find its surrounding oblique images, whose indexes are n + i, 2n + i, 3n + i and 4n + i, respectively. At the same time, for the ith image, Eq. (8) distinguishes which type of camera it belongs to:

$$
k = \left\lfloor \frac{i - 1}{n} \right\rfloor, \qquad
k = \begin{cases}
0, & i \in C_{ND} \\
1, & i \in C_E \\
2, & i \in C_W \\
3, & i \in C_S \\
4, & i \in C_N
\end{cases} \tag{8}
$$

where ⌊a⌋ denotes the largest integer smaller than or equal to a.
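A short sketch of the index bookkeeping of Eq. (8), assuming 1-based image indices as in the text:

```python
def camera_type(i, n):
    """Classify image index i (1-based) for a block with n exposure
    stations, following Eq. (8): k = floor((i - 1) / n)."""
    k = (i - 1) // n
    return ["nadir", "east", "west", "south", "north"][k]
```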
3.2. Bundle Adjustment using Gauss–Newton method

For a scenario with n nadir images, 4n oblique images and m features, the size of the unknown parameters in the traditional BA model is 30n + 3m, which can be expressed with P = {C_i, X_j} (i = 1, 2, ..., 5n; j = 1, 2, ..., m). For the proposed RBA model, however, the size of the unknown parameters is 6(n + 4) + 3m, denoted with P = {C_i, X_j, r_E, t_E, r_W, t_W, r_S, t_S, r_N, t_N} (i = 1, 2, ..., n; j = 1, 2, ..., m). Here, r represents the three Euler angles computed from its rotation matrix. The problem of RBA can be defined as the least-squares problem in Eq. (9), estimating the best unknown parameters by minimizing the total re-projection residual of the image points:

$$
\|e\|^2 = \left[ z - f(P) \right]^T \left[ z - f(P) \right] \tag{9}
$$

where z refers to the observed image points, and f(·) represents the observation functions formulated in Eqs. (2)–(7). A Gauss–Newton optimization method can be applied to solve the least-squares problem in Eq. (9). When a good initialization is provided, the first derivatives w.r.t. all unknown parameters can be calculated and expressed with J = {J_{C_i}, J_{X_j}, J_{r_E}, J_{t_E}, J_{r_W}, J_{t_W}, J_{r_S}, J_{t_S}, J_{r_N}, J_{t_N}}. Then, a linear equation based on the Gauss–Newton method can be written as Eq. (10), seeking the optimal incremental values for each iteration:

$$
J^T J \, \Delta = J^T \left( z - f(P) \right) \tag{10}
$$

Eq. (10) can also be expressed with block matrices and block vectors as in Eq. (11):

$$
\begin{bmatrix}
U_{CC} & U_{CS} & W_{CF} \\
U_{CS}^T & U_{SS} & W_{SF} \\
W_{CF}^T & W_{SF}^T & V_{FF}
\end{bmatrix}
\begin{bmatrix}
\Delta_C \\ \Delta_S \\ \Delta_F
\end{bmatrix}
=
\begin{bmatrix}
e_C \\ e_S \\ e_F
\end{bmatrix} \tag{11}
$$

where Δ_C, Δ_S and Δ_F refer to the incremental values of the nadir poses, the relative poses of the oblique cameras and the 3D points, respectively. Because the number of observed image points is much larger than the number of cameras, the Schur complement (Ouellette, 1981) is commonly utilized to first calculate the incremental values of the cameras using Eq. (12) and then to obtain those of the 3D points using Eq. (13):

$$
\begin{bmatrix}
U_{CC} - W_{CF} V_{FF}^{-1} W_{CF}^T & U_{CS} - W_{CF} V_{FF}^{-1} W_{SF}^T \\
U_{CS}^T - W_{SF} V_{FF}^{-1} W_{CF}^T & U_{SS} - W_{SF} V_{FF}^{-1} W_{SF}^T
\end{bmatrix}
\begin{bmatrix}
\Delta_C \\ \Delta_S
\end{bmatrix}
=
\begin{bmatrix}
e_C - W_{CF} V_{FF}^{-1} e_F \\
e_S - W_{SF} V_{FF}^{-1} e_F
\end{bmatrix} \tag{12}
$$

where the leftmost square matrix in Eq. (12) is called the reduced normal equation for the oblique scenario. It greatly affects the convergence and the elapsed time of the non-linear optimization problem. From Eq. (12), the dimension of the normal equation is reduced from 30n × 30n to (6n + 24) × (6n + 24), making it faster to solve for the best incremental values in each iteration. After the incremental values of the nadir poses and the relative poses are known, they can be substituted into Eq. (13) to calculate the incremental values of the features:

$$
\Delta_F = V_{FF}^{-1} \left( e_F - W_{CF}^T \Delta_C - W_{SF}^T \Delta_S \right) \tag{13}
$$
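The following dense-matrix sketch illustrates Eqs. (12) and (13). It is illustrative only: an actual implementation (e.g., sSBA together with SuiteSparse, as used later in the experiments) exploits the sparsity and the block-diagonal structure of V_FF rather than forming explicit inverses.

```python
import numpy as np

def solve_reduced_system(U_CC, U_CS, U_SS, W_CF, W_SF, V_FF, e_C, e_S, e_F):
    """Eliminate the 3D points via the Schur complement, solve the reduced
    camera system (Eq. 12) for the nadir-pose and relative-pose increments,
    then back-substitute for the point increments (Eq. 13)."""
    V_inv = np.linalg.inv(V_FF)           # dense inverse, for illustration only
    # Reduced normal equation, Eq. (12)
    A = np.block([
        [U_CC - W_CF @ V_inv @ W_CF.T, U_CS - W_CF @ V_inv @ W_SF.T],
        [U_CS.T - W_SF @ V_inv @ W_CF.T, U_SS - W_SF @ V_inv @ W_SF.T],
    ])
    b = np.concatenate([e_C - W_CF @ V_inv @ e_F,
                        e_S - W_SF @ V_inv @ e_F])
    d = np.linalg.solve(A, b)
    d_C, d_S = d[:len(e_C)], d[len(e_C):]
    # Back-substitution for the 3D points, Eq. (13)
    d_F = V_inv @ (e_F - W_CF.T @ d_C - W_SF.T @ d_S)
    return d_C, d_S, d_F
```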
Fig. 2. The oblique five-camera system designed for the generation of synthetic datasets.
4. Experiments and discussion

In this section, we conducted experiments using synthetic and real datasets to check the validity and accuracy of the proposed method. First, we generated four synthetic datasets in which the number of oblique images continually increases; the accuracy and efficiency of the proposed method were analyzed and compared with the traditional BA model. Then, we also tested the algorithm using a real dataset, whose camera poses and 3D terrain were resumed and reconstructed. Because the efficiency of the BA solution strongly depends on the sparseness and the number of non-zeros of the normal equation (Triggs et al., 2000), the sparseness of the normal equation is drawn for all datasets and the number of non-zero elements is counted for analyzing the efficiency of the BA model.

4.1. Synthetic datasets

An oblique five-camera system (Fig. 2) was designed and used to generate four synthetic datasets (Table 1). In Fig. 2, the nadir, east, west, south and north oblique images are depicted with red, green, brown, purple and blue tiles, respectively. All cameras were regarded as distortion-free. The angles between the optical axis of the nadir image and those of the four oblique images were set to 30°. Meanwhile, the distances between the center of the nadir camera and the four oblique cameras were each 0.25 m. The principal point was located at the (4500, 3366) pixel, while the focal length was set to 8833.3 pixels. With the oblique multi-camera system imaging at a height of 5000 m, the distribution of the cameras in the planar direction is depicted in Fig. 3, and the number of observed images is recorded in the second column of Table 1. Some 3D points were evenly distributed over the whole scenario, as listed in the third column. The 3D points were projected into all cameras, forming the observed image projections shown in the fourth column. Then, Gaussian noise with σ = 0.3 pixel was added to the image points, which served as the measurements of Eq. (9). Given an initialization for the BA model, the convergence and elapsed time of the proposed method (RBA) and the traditional BA method (TBA) are shown in Table 2. In this study, sSBA, a well-known publicly available software package, was used as TBA, while RBA was also implemented based on sSBA (Konolige and Garage, 2010). For a fair comparison, the iterative solution using the Gauss-Newton optimization method was also applied to the TBA method.
Table 1
The overall parameters of the four synthetic datasets.

Data   Images   Points   Projections
Syn1   300      3401     47,971
Syn2   1540     12,168   264,179
Syn3   3740     25,616   654,240
Syn4   6020     38,720   1,061,313
Meanwhile, the stopping criteria used in RBA and TBA are the same. The stopping criteria (Zhao et al., 2015) involve the following: (1) the update of the state vector is less than 10^-12; (2) the change of the value of the objective function is less than 10^-12; (3) the gradient of the objective function is less than 10^-12; and (4) the maximum number of iterations is 200.

In Table 2, the third, fourth and fifth columns refer to the initial root mean square error (RMSE), the final RMSE and the number of iterations, respectively. The sixth column is the elapsed time, recorded with the calculations implemented on a computer with a 2.1 GHz CPU and 8 GB RAM. The last column gives the number of non-zero elements of the normal equation, illustrated in Figs. 4–7, where a blue point represents a 6 × 6 block matrix. The sparseness of the normal equation greatly affects the efficiency of the BA model.

For the final RMSE, there is only a slight difference between the values resulting from the proposed method and the traditional one. For the runtime, the proposed method is more efficient than the traditional one, and the gain grows as more images are processed. For the Syn3 dataset, the elapsed time was shortened to 16.7 s, compared with 57.8 s for the traditional method. When the number of images was further increased in the Syn4 dataset, the number of non-zeros of the normal equation of the traditional method was approximately five times larger than that of the proposed method. Due to the limitation of memory, the traditional method failed to solve the Syn4 dataset when calculating the incremental values for one iteration. In this study, the SuiteSparse tool (Davis et al., 2014) was used to solve the linear equation with the sparse and reduced normal equation system.

For the above synthetic datasets, the proposed method required less time to converge to an approximately equal RMSE compared with the traditional method. Meanwhile, the proposed method is more likely to provide a solution for a large-scale scenario with a large number of oblique images, because the dimension and the number of non-zeros of the normal equation are significantly reduced.
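As an illustration, the stopping test can be written compactly as below; the variable names are assumptions, and the thresholds are those listed above.

```python
def converged(dx_norm, df, grad_norm, iteration, eps=1e-12, max_iter=200):
    """Stopping criteria used for both RBA and TBA (Zhao et al., 2015):
    (1) update of the state vector below eps, (2) change of the objective
    below eps, (3) gradient of the objective below eps, or (4) the maximum
    number of iterations reached."""
    return (dx_norm < eps or abs(df) < eps or grad_norm < eps
            or iteration >= max_iter)
```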
Fig. 3. The distribution of the cameras of the synthetic datasets in the plane direction: (a) Syn1 dataset; (b) Syn2 dataset; (c) Syn3 dataset; (d) Syn4 dataset.
Table 2
The convergence and elapsed time of the proposed method (RBA) and the traditional BA method (TBA) using the synthetic datasets.

Data   Method   Initial RMSE   Final RMSE   Iterations   Elapsed time (s)   Non-zeros
Syn1   TBA      180.5562       0.1144       4            0.97               883,368
       RBA      180.5562       0.1154       3            0.52               132,552
Syn2   TBA      196.8831       0.1172       4            12.9               6,330,312
       RBA      196.8831       0.1181       3            3.7                1,139,976
Syn3   TBA      202.4382       0.1179       4            57.8               16,601,544
       RBA      202.4382       0.1188       3            16.7               3,068,424
Syn4   TBA      206.432        N/A          N/A          N/A                27,396,000
       RBA      206.432        0.1190       3            28.82              5,094,936
4.2. Real dataset

In this section, we tested the proposed algorithm using the real dataset "Zurich", which comes from the ISPRS/EuroSDR project "Benchmark on High Density Aerial Image Matching" (Cramer, 2007; Haala, 2014). In the Zurich dataset, a Leica RCD30 Oblique Penta camera was used and a total of 135 images were published. For the nadir images, the overlaps of the along-track and across-track images reach up to 70% and 50%, respectively. The four oblique cameras were mounted at a tilt angle of 35° w.r.t. the nadir camera. Among the 135 publicly available images, there are 27 nadir images distributed along three tracks, but some oblique images captured at the same exposure stations as the nadir images are missing. Thus, in this experiment, we removed the unrelated oblique images and selected a total of 96 images for the test, in which the numbers of nadir, east, west, south and north oblique images were 27, 18, 15, 18 and 18, respectively. Then, feature extraction and matching for the oblique images was used to extract 4708 correspondences and 36,989 image points (Table 3).
Fig. 4. The sparseness of the normal equation of the Syn1 dataset for the proposed method (RBA) and the traditional method (TBA): (a) RBA; (b) TBA.
Fig. 5. The sparseness of the normal equation of the Syn2 dataset for the proposed method (RBA) and the traditional method (TBA): (a) RBA; (b) TBA.
The RMSE curves of each iteration for the two methods are depicted in Fig. 8, where the blue and red lines represent the curves of the proposed method and the traditional method, respectively. Starting from an initial RMSE of 487.0076 pixels, the traditional method required 4 iterations to converge to 0.3144 pixels, while the proposed one converged to its final RMSE within 4 iterations as well. Note that the same stopping criteria were used in the TBA method and the proposed RBA method for a fair comparison. The reason why the final RMSE of the proposed method was slightly poorer than that of the traditional method is that the extracted image points and relative poses were contaminated by noise. The elapsed time is recorded in the seventh row of Table 3, which shows that the proposed method was slightly faster than the traditional method. The reason is that the non-zero elements of the normal equation of the proposed method were far fewer
Fig. 6. The sparseness of the normal equation of the Syn3 dataset for the proposed method (RBA) and the traditional method (TBA): (a) RBA; (b) TBA.
Fig. 7. The sparseness of the normal equation of the Syn4 dataset for the proposed method (RBA) and the traditional method (TBA): (a) RBA; (b) TBA.
than those of the traditional method. Additionally, the sparseness of the normal equation of the two methods is shown in Fig. 9.

Next, we discuss the residuals of the camera pose parameters, the relative pose parameters and the sparse 3D points retrieved by the two methods. First, the variations of the camera pose parameters are shown in Fig. 10, in which Fig. 10(a) and (b) refer to the variations of the three rotational angles and the three translation vectors, respectively. Here, the points 1–27, 28–45, 46–60, 61–78 and 79–96 on the X-axis represent the nadir, east, west, south and north cameras, respectively. Meanwhile, we calculated the maximum value, mean value and standard deviation of the yaw, pitch and roll angles as well as the three translation vectors in Table 4, in which "Max", "Mean" and "Std" represent the maximum value, mean value and standard deviation, respectively.
Table 3
The convergence and elapsed time of the proposed method (RBA) and the traditional method (TBA) using the real dataset.

Parameters         TBA        RBA
3D points          4708       4708
Projections        36,989     36,989
Initial RMSE       487.0076   487.0076
Final RMSE         0.3144     1.2045
Iterations         4          4
Elapsed time (s)   0.28       0.2
Non-zeros          225,504    34,596
The average variations of the rotational angles and the translation vectors were approximately 0.016° and 0.2 m, respectively. It is clear that the camera positions resumed by the two methods approximately coincide.

Second, the variations of the relative pose parameters of the oblique cameras w.r.t. the nadir camera retrieved by RBA and TBA are illustrated in Fig. 11, where the variations of the yaw, pitch and roll angles are depicted in Fig. 11(a), (c) and (e), and those of the X-, Y- and Z-direction translation vectors are shown in Fig. 11(b), (d) and (f). Here, the relative pose parameters of the east, west, south and north cameras w.r.t. the nadir cameras are drawn in blue, pink, gray and yellow, respectively. Meanwhile, the mean value and standard deviation of the six types of parameters are shown in the upper-left corner of each panel. The detailed statistics of the residuals of the relative pose parameters of the oblique cameras are listed in Table 5. It is shown that the average variations of the relative rotational angles and the relative translation vectors were approximately 0.025° and 0.23 m, respectively.

Third, 3D sparse points were triangulated by the two methods and are illustrated in Fig. 12, in which the red circles and blue crosses represent the 3D points obtained by RBA and TBA, respectively. Meanwhile, we drew the residual distribution of the 3D points in Fig. 13, in which the Z-value of a point refers to the distance between the two 3D points triangulated by the two methods. The statistics of the residuals in Fig. 13 are shown in Table 6. The average residual between the two sets of 3D points triangulated by the two methods was 1.34 m.

In summary, the accuracy obtained by RBA is similar to that of TBA. For the tested Zurich dataset, the average variations of the camera positions and the 3D points obtained by the two methods were approximately 0.2 m and 1.34 m, respectively.

Fig. 8. RMSE curves of each iteration for the Zurich dataset.

Fig. 9. The sparseness of the normal equation of the Zurich dataset for the proposed method (RBA) and the traditional method (TBA): (a) RBA; (b) TBA.

4.3. Discussion

First, we focus on analyzing the accuracy affected by the noise added to the relative poses of the oblique cameras w.r.t. the nadir cameras, including position (X, Y and Z) and orientation (pitch, yaw and roll angles). Then, the residuals of the relative pose parameters retrieved by the TBA and RBA methods are compared. Finally, we discuss the computational complexity and memory cost required by the two methods. Here, we used the first synthetic dataset, "Syn1", for the analysis, which involves 3401 points, 60 nadir cameras and 240 oblique cameras.
Fig. 10. The variations of the camera poses retrieved by RBA and TBA using the Zurich dataset: (a) orientation; (b) position.
Table 4
The statistics of the variations of the camera pose parameters estimated by the TBA and RBA methods using the Zurich dataset.

Parameters        Max      Mean     Std
Yaw (°)           0.0143   0.0056   0.0040
Pitch (°)         0.0527   0.0225   0.0166
Roll (°)          0.0544   0.0222   0.0159
X (m)             0.6457   0.3598   0.2029
Y (m)             0.2675   0.0922   0.0747
Z (m)             0.3690   0.1568   0.1107
Angles (°)        0.0544   0.0168   0.0156
Translation (m)   0.6457   0.2029   0.1753
4.3.1. Analysis of the accuracy of BA influenced by the noise of the relative pose parameters of oblique cameras

If no noise is added to the relative pose parameters, the rigidity constraint holds exactly; as more noise is added, the constraint becomes weaker. In this section, we analyze the accuracy of BA influenced by the noise added to the relative pose parameters, which is closely related to this constraint. The accuracy influenced by the noise of the relative pose parameters of the oblique cameras is analyzed in Tables 7 and 8, in which Gaussian noise with σ = 0.3 pixel was added to the image points serving as the measurements of the non-linear least-squares problem. We assume that the relative positions of the four oblique cameras are not rigidly constant, and we added Gaussian noise with the standard deviations listed in the first column of Table 7. With the initializations given in the second column, the final RMSEs of the traditional and proposed methods are reported in the third and fourth columns, respectively.

First, the final RMSE of RBA was slightly poorer than that of TBA when no noise was added to the relative positions. The noise in the image points was the main reason for the poorer accuracy: to minimize the difference between the re-projections and the noisy image points, TBA optimizes all of the unknown parameters without any constraint, whereas RBA, which guarantees that the retrieved relative positions of the oblique cameras at different exposure stations are constant, must accept a larger RMSE because of the equality constraint in Eq. (1). Second, the accuracy of RBA degraded as more noise was added to the relative positions, but the final RMSE of RBA remained very close to that of TBA when the added noise ranged from σ = 0.01 m to σ = 0.2 m. Finally, when the amount of noise was very large, the accuracy of RBA was very poor because the rigidity assumption of Eq. (1) was no longer valid.

The analysis of the accuracy influenced by the noise added to the relative orientations is presented in Table 8. Similarly, the final RMSE of RBA was slightly poorer than that of TBA when no noise was added to the relative rotational angles of the oblique cameras at different exposure stations. Additionally, the accuracy of RBA was close to that of TBA when the added noise was σ = 0.002° and σ = 0.01°. As the noise became greater, the accuracy of RBA became poorer and poorer.

In summary, the accuracy of the proposed method is influenced by the noise of the relative pose parameters of the oblique cameras. When the noise is not very large, the RMSE of RBA is very close to that of TBA. When the added noise is large and the constraint thus becomes weak, a poor accuracy is obtained by RBA.
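The exact perturbation procedure is not detailed in the text; a plausible sketch for adding Gaussian noise to a relative pose per exposure station, under the assumption of a Z-Y-X Euler convention (the convention and the function names are assumptions), is:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def perturb_relative_pose(r_rel, t_rel, sigma_deg=0.0, sigma_m=0.0, rng=None):
    """Perturb a relative pose per exposure station, as in Tables 7 and 8:
    Gaussian noise with std sigma_deg on the relative Euler angles and
    std sigma_m on the relative translation."""
    rng = np.random.default_rng() if rng is None else rng
    angles = Rotation.from_matrix(r_rel).as_euler("zyx", degrees=True)
    angles = angles + rng.normal(0.0, sigma_deg, 3)
    r_noisy = Rotation.from_euler("zyx", angles, degrees=True).as_matrix()
    t_noisy = t_rel + rng.normal(0.0, sigma_m, 3)
    return r_noisy, t_noisy
```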
4.3.2. Analysis of the residuals of the retrieved relative pose parameters using two methods

In this section, we discuss the residuals of the relative pose parameters obtained by the two methods under different constraints. Note that the residuals refer to the differences between the true and estimated relative pose parameters at the different exposure stations. Here, we analyze the performance of the two methods in detail for the following three cases:

Case I: The difference of the RMSE between the two methods is very small (rigidity constraint).
Case II: The difference of the RMSE between the two methods is small (semi-constraint).
Case III: The difference of the RMSE between the two methods is large (weak constraint).

For the first case, there is a complete rigidity constraint and no noise was added to the relative poses (i.e., σ = 0 in Table 7 and σ = 0 in Table 8). The convergent RMSEs of TBA and RBA were 0.1148 and 0.1157, respectively. For the second case, a semi-constraint is used, and we analyzed the performance using the data with Gaussian noise of σ = 0.2 m (i.e., σ = 0.2 in Table 7); the convergent RMSEs of TBA and RBA were 0.1145 and 0.1428, respectively. Finally, the data with σ = 0.1° in Table 8 were used for the analysis of the third case, where the convergent RMSEs of TBA and RBA were 0.1147 and 1.3520, respectively.

For the first case, when no noise was added to the oblique cameras, the residuals of the retrieved relative pose parameters are shown in Fig. 14, where the blue and yellow points represent the residuals of TBA and RBA, respectively. Fig. 14(a), (c) and (e) show the residuals of the yaw, pitch and roll angles, respectively, while Fig. 14(b), (d) and (f) show the residuals of the X-, Y- and Z-direction translation vectors, respectively. In addition, in Fig. 14, the points 1–60, 61–120, 121–180 and 181–240 on the X-axis represent the residuals of the parameters of the east, west,
Fig. 11. The variations of the relative pose parameters of the oblique cameras retrieved by RBA and TBA using the Zurich dataset: (a) yaw angle; (b) X-direction; (c) pitch angle; (d) Y-direction; (e) roll angle; (f) Z-direction.
Table 5
The statistics of the residuals of the relative pose parameters of the oblique cameras estimated by the TBA and RBA methods using the real dataset.

Parameters        Max      Mean     Std
Yaw (°)           0.0067   0.0031   0.0021
Pitch (°)         0.0848   0.0330   0.0258
Roll (°)          0.0794   0.0399   0.0233
X (m)             0.6826   0.5073   0.0766
Y (m)             0.2767   0.1291   0.0836
Z (m)             0.1015   0.0624   0.0274
Angles (°)        0.0848   0.0253   0.0256
Translation (m)   0.6826   0.2306   0.2029
south and north oblique cameras, respectively. It is clear that the residuals of all parameters of RBA were smaller than those of TBA. In addition, the statistics of the residuals in Fig. 14 are shown in Table 9, in which "Max", "Mean" and "Std" represent the maximum value, mean value and standard deviation, respectively. The residuals of RBA were clearly smaller, that is, the estimated relative pose parameters were closer to the true parameters.

For the second case, when Gaussian noise of σ = 0.2 m was added to the relative positions of the oblique cameras, the residuals of the retrieved relative pose parameters are shown in Fig. 15. The residuals of all parameters of RBA are again smaller than those of TBA as a whole.
Fig. 12. 3D sparse points triangulated by two methods using the Zurich dataset.
Fig. 13. The residual of 3D points triangulated by two methods using the Zurich dataset.
Table 6
The statistics of the residuals of the 3D points triangulated by TBA and RBA using the Zurich dataset.

Parameters        Max      Mean     Std
X (m)             3.7174   0.6826   0.2952
Y (m)             2.3891   0.2491   0.3229
Z (m)             4.9074   1.0331   0.7195
Translation (m)   6.3556   1.3468   0.7011
Similarly, the statistics of the residuals in Fig. 15 are shown in Table 10. We clearly find that the estimated relative pose parameters are again closer to the true values.
For the third case, when Gaussian noise of σ = 0.1° was added to the relative rotational angles of the side cameras, the residuals of the retrieved relative pose parameters are shown in Fig. 16. The residuals of all parameters of RBA are larger than those of TBA as a whole. The statistics of the residuals in Fig. 16 are shown in Table 11. It is shown that the parameters retrieved by RBA are further away from the true values compared with TBA.

In summary, the constraint that the difference of the relative camera poses at different exposure stations is zero cannot be guaranteed in practical cases due to the inevitable mechanical dithering of the
Table 7
The accuracy affected by the noise of the relative positions of the oblique cameras w.r.t. the nadir cameras.

Noise (m)   Initial RMSE   Final RMSE (TBA)   Final RMSE (RBA)
σ = 0       211.3859       0.1148             0.1157
σ = 0.01    211.4072       0.1149             0.1159
σ = 0.02    211.4308       0.1151             0.1163
σ = 0.05    211.4048       0.1146             0.1172
σ = 0.1     211.1563       0.1144             0.1218
σ = 0.2     211.2463       0.1145             0.1428
σ = 0.5     211.0995       0.1147             0.2309
σ = 1       211.0995       0.1148             0.4290
Table 8
The accuracy affected by the noise of the relative orientations of the oblique cameras w.r.t. the nadir cameras.

Noise (°)   Initial RMSE   Final RMSE (TBA)   Final RMSE (RBA)
σ = 0       211.3859       0.1148             0.1157
σ = 0.002   211.3977       0.1150             0.1189
σ = 0.01    211.4521       0.1149             0.1726
σ = 0.02    211.4734       0.1148             0.2787
σ = 0.03    211.1492       0.1145             0.5136
σ = 0.1     211.2108       0.1147             1.3520
σ = 0.2     211.4048       0.1147             2.7044
cameras. When the relative camera poses of the side cameras satisfy the rigidity constraint, the convergent RMSE arising from RBA is slightly poorer because the extracted image points are contaminated by noise. When the constraint becomes weaker, the resultant RMSE is poorer, and the retrieved camera parameters of the proposed method are also poorer.
4.3.3. Computational complexity

TBA and RBA involve three main steps:

- Cost function computation (CostFun): calculating the re-projection error.
- Hessian matrix construction (H_con): constructing the normal equation.
- Hessian matrix solution (H_sol): solving the linear system using Cholesky decomposition.

We list the runtime of the three steps per iteration in detail in Table 12. In addition, the memory costs of the two methods are listed in the fifth and ninth rows. The memory cost of the BA algorithm mainly depends on the dimension of the Hessian matrix (Konolige and Garage, 2010). Assuming that the number of nadir cameras is n, the dimensions of the Hessian matrix in TBA and RBA are 30n × 30n and (6n + 24) × (6n + 24), respectively. From Table 12, it is clear that the memory cost dramatically increased with the increasing number of nadir
Fig. 14. The residuals of the relative pose parameters retrieved by the TBA and RBA methods when no noise was added to the oblique cameras: (a), (c) and (e) show the residuals of the yaw, pitch and roll angles, while (b), (d) and (f) show the residuals of the X-, Y- and Z-direction translation vectors.
Table 9
The statistics of the residuals of the relative pose parameters of the oblique cameras estimated by the TBA and RBA methods when no noise was added to the oblique cameras.

                  TBA                                    RBA
Parameters        Max      Mean         Std              Max          Mean         Std
Angles (°)        0.0037   5.6858e-04   5.5208e-04       5.1566e-04   1.7189e-04   1.4618e-04
Translation (m)   0.3036   0.0575       0.0451           0.0524       0.0306       0.0146
Fig. 15. The residuals of the relative pose parameters retrieved by the TBA and RBA methods when the Gaussian noise added to the relative positions of the oblique cameras was set as σ = 0.2 m: (a), (c) and (e) show the residuals of the yaw, pitch and roll angles, while (b), (d) and (f) show the residuals of the X-, Y- and Z-direction translation vectors.
Table 10
The statistics of the residuals of the relative camera parameters of the oblique cameras estimated by the TBA and RBA methods when the Gaussian noise added to the relative positions of the oblique cameras was set as σ = 0.2 m.

                  TBA                                    RBA
Parameters        Max        Mean       Std              Max        Mean       Std
Angles (°)        2.86e-03   6.12e-04   5.88e-04         5.73e-04   1.77e-04   1.51e-04
Translation (m)   0.2728     0.0607     0.0473           0.1620     0.0574     0.0383
images. For the real dataset, only a little memory was required for constructing the Hessian matrix because the number of nadir images was very small (27). For the largest synthetic dataset, TBA required approximately five times more memory than RBA. Thus, due to the limitation of the allocated memory, TBA ran out of memory when dealing with the "Syn4" dataset and could not provide a solution to the large linear system using Cholesky decomposition.

Table 12 also shows that the Hessian matrix construction was the most time-consuming part of BA. Thus, the computational complexity of the BA algorithm depends on forming the Hessian matrix. For the BA algorithm, the complexity of H_con is N² log N, quadratic in the number of cameras N, as stated by Konolige and Garage (2010). Assuming that the number of nadir cameras is n, the computational complexities of TBA and RBA are 25n² log(5n) and n² log n, respectively. Thus, the efficiency of RBA will be better than that of TBA when the number of nadir images is large. For the Zurich dataset, the number of nadir images is the smallest, so the improvement in efficiency is not apparent. It can be deduced that the efficiency gain should be larger when a large-scale dataset is available.
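To make the comparison concrete, a small sketch (illustrative only; the formulas are those quoted above, and the Syn4 figures are derived under the assumption that the 6020 images correspond to 1204 nadir stations of the five-camera system) evaluates the dimensions and the quoted asymptotic construction costs for a given number of nadir stations n.

```python
import math

def hessian_dims(n):
    """Camera-system dimensions (Section 4.3.3): TBA optimizes all 5n cameras
    independently (6 parameters each), while RBA keeps 6n nadir parameters
    plus 6 x 4 relative-pose parameters."""
    return 30 * n, 6 * n + 24

def hcon_complexity(n):
    """Asymptotic cost of Hessian construction quoted in the text:
    25 n^2 log(5n) for TBA versus n^2 log(n) for RBA."""
    return 25 * n**2 * math.log(5 * n), n**2 * math.log(n)

# For the Syn4 block (6020 images, i.e. n = 1204 nadir stations), the
# reduced system is 7248 x 7248 instead of 36120 x 36120.
```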
Fig. 16. The residuals of the relative pose parameters retrieved by the TBA and RBA methods when the Gaussian noise added to the relative rotational angles of the oblique cameras was set as σ = 0.1°: (a), (c) and (e) show the residuals of the yaw, pitch and roll angles, while (b), (d) and (f) show the residuals of the X-, Y- and Z-direction translation vectors.
Table 11
The statistics of the residuals of the relative camera parameters of the oblique cameras estimated by the TBA and RBA methods when the Gaussian noise added to the relative rotational angles of the oblique cameras was set as σ = 0.1°.

                  TBA                                     RBA
Parameters        Max      Mean         Std               Max      Mean     Std
Angles (°)        0.0040   5.7885e-04   5.3757e-04        0.0172   0.0072   0.0042
Translation (m)   0.3023   0.0575       0.0452            0.3046   0.1195   0.0861
Table 12
The runtime of the three steps per iteration (s) and the memory cost of the two methods.

Method   Step/Memory   Real data   Syn1    Syn2     Syn3     Syn4
TBA      CostFun       0.04        0.01    0.06     0.15     0.24
         H_con         0.04        0.17    2.23     11.82    19.77
         H_sol         0.01        0.01    0.01     0.02     N/A
         Memory        15 MB       35 MB   300 MB   900 MB   1.6 GB
RBA      CostFun       0.06        0.07    0.2      1.4      2.3
         H_con         0.03        0.06    0.47     2.58     4.61
         H_sol         0.01        0.01    0.01     0.02     0.03
         Memory        11 MB       15 MB   81 MB    210 MB   350 MB
In summary, the advantage of the proposed method is its lower computational complexity and memory cost, yielding better efficiency when dealing with large-scale oblique aerial images. However, it is undeniable that a slightly poorer accuracy will be obtained, since the relative poses underlying the constraint of Eq. (1) cannot be perfectly constant in practical cases.

5. Conclusion

In oblique aerial photogrammetry, the oblique cameras are firmly fixed, and the relative poses between the oblique and nadir cameras change very little across the exposure positions. In the traditional BA method, the oblique and nadir cameras are treated as having independent exterior orientations. In this study, the four oblique cameras were parameterized with the nadir poses and four groups of constant relative poses, reducing the dimension of the normal equation. The unknown parameters of the proposed BA method only involve the feature points, the nadir poses and the relative poses, so the reduction in the total number of unknown parameters becomes increasingly significant when dealing with larger-scale oblique image blocks. Although a slightly poorer RMSE is obtained, the proposed method requires a lower computational complexity and memory cost. The new method makes it possible to provide a solution for a large-scale scenario with a large number of images. When four synthetic datasets and a real dataset were tested, the efficiency was improved using the proposed method, while the accuracies of the proposed method were very close to those of the traditional method. In this study, we analyzed and compared the efficiencies using numerous synthetic oblique images. In the future, we will check the efficiency of the proposed method when large-scale real oblique image blocks are available.

Acknowledgments

The authors would like to thank the ISPRS/EuroSDR project for providing the oblique imagery. This work is supported by the National Natural Science Foundation of China (Grant Nos. 41571432, 41371492 and U1533102) and the Beijing Natural Science Foundation (Grant No. Z151100000915068).

References

Cavegn, S., Haala, N., Nebiker, S., Rothermel, M., Tutzauer, P., 2014. Benchmarking high density image matching for oblique airborne imagery. Int. Arch. Photogram., Rem. Sens. Spat. Inf. Sci. 3, 45–52 (Zurich).
Chandler, J., Ashmore, P., Paola, C., Gooch, M., Varkaris, F., 2002. Monitoring river-channel change using terrestrial oblique digital imagery and automated digital photogrammetry. Ann. Assoc. Am. Geograph. 92 (4), 631–644.
Cramer, M., 2007. The EuroSDR performance test for digital aerial camera systems. In: Photogrammetric Week, vol. 7, pp. 89–106.
Davis, T.A., Duff, I., Amestoy, P., Gilbert, J., Larimore, S., Natarajan, E.P., Rajamanickam, S., 2014. SuiteSparse: A Suite of Sparse Matrix Packages.
Dubois, D., Lepage, R., 2014. Fast and efficient evaluation of building damage from very high resolution optical satellite images. IEEE J. Select. Top. Appl. Earth Observ. Rem. Sens. 7 (10), 4167–4176.
Fritsch, D., Kremer, J., Grimm, A., 2012. Towards all-in-one photogrammetry. GIM Int., 18–23.
Frommholz, D., Linkiewicz, M., Meissner, H., Dahlke, D., Poznanska, A., 2015. Extracting semantically annotated 3D building models with textures from oblique aerial imagery. Int. Arch. Photogram., Rem. Sens. Spat. Inf. Sci. 1, 53–58.
Gerke, M., Kerle, N., 2011. Automatic structural seismic damage assessment with airborne oblique Pictometry imagery. Photogram. Eng. Rem. Sens. 77 (9), 885–898.
Gerke, M., Nyaruhuma, A., 2009. Incorporating scene constraints into the triangulation of airborne oblique images. Int. Arch. Photogram., Rem. Sens. Spat. Inf. Sci. 38 (1), 4–7.
Gülch, E., 2009. Advanced matching techniques for high precision surface and terrain models. In: Photogrammetric Week 9, pp. 303–315.
Haala, N., 2014. Dense Image Matching Final Report, vol. 64. EuroSDR Official Publication, pp. 115–145.
Hu, H., Zhu, Q., Du, Z., Zhang, Y., Ding, Y., 2015. Reliable spatial relationship constrained feature point matching of oblique aerial images. Photogram. Eng. Rem. Sens. 81 (1), 49–58.
Konolige, K., Garage, W., 2010. Sparse bundle adjustment. In: British Machine Vision Conference (BMVC), pp. 1–11.
Mostafa, M.M., Schwarz, K.P., 2000. A multi-sensor system for airborne image capture and georeferencing. Photogram. Eng. Rem. Sens. 66 (12), 1417–1424.
Nex, F., Rupnik, E., Remondino, F., 2013. Building footprints extraction from oblique imagery. Int. Arch. Photogram., Rem. Sens. Spat. Inf. Sci. 2, 61–66.
Ouellette, D.V., 1981. Schur complements and statistics. Linear Algebra Appl. 36, 187–295.
Pierrot-Deseilligny, M., Clery, I., 2011. APERO - Open source bundle adjustment software for automatic calibration and orientation of a set of images. Int. Arch. Photogram., Rem. Sens. Spat. Inf. Sci. 38 (5/W16).
Rau, J.Y., Jhan, J.P., Hsu, Y.C., 2015. Analysis of oblique aerial images for land cover and point cloud classification in an urban environment. IEEE Trans. Geosci. Rem. Sens. 53 (3), 1304–1319.
Rupnik, E., Nex, F., Remondino, F., 2013. Automatic orientation of large blocks of oblique images. Int. Arch. Photogram., Rem. Sens. Spat. Inf. Sci. 1 (1), 299–304.
Rupnik, E., Nex, F., Toschi, I., Remondino, F., 2015. Aerial multi-camera systems: accuracy and block triangulation issues. ISPRS J. Photogram. Rem. Sens. 101, 233–246.
Sun, Y., Zhao, L., Huang, S., Yan, L., Dissanayake, G., 2014. L2-SIFT: SIFT feature extraction and matching for large images in large-scale aerial photogrammetry. ISPRS J. Photogram. Rem. Sens. 91, 1–16.
Triggs, B., McLauchlan, P.F., Hartley, R.I., Fitzgibbon, A.W., 2000. Bundle adjustment – a modern synthesis. In: Vision Algorithms: Theory and Practice. Springer, Berlin, Heidelberg, pp. 298–372.
Turlapaty, A., Gokaraju, B., Du, Q., Younan, N.H., Aanstoos, J.V., 2012. A hybrid approach for building extraction from space borne multi-angular optical imagery. IEEE J. Select. Top. Appl. Earth Observ. Rem. Sens. 5 (1), 89–100.
Vetrivel, A., Gerke, M., Kerle, N., Vosselman, G., 2015. Identification of damage in buildings based on gaps in 3D point clouds from very high resolution oblique airborne images. ISPRS J. Photogram. Rem. Sens. 105, 61–78.
Wiedemann, A., More, J., 2012. Orientation strategies for aerial oblique images. Int. Arch. Photogram., Rem. Sens. Spat. Inf. Sci. 39, 185–189.
Yalcin, G., Selcuk, O., 2015. 3D city modelling with oblique photogrammetry method. Proc. Technol. 19, 424–431.
Zhao, L., Huang, S., Sun, Y., Yan, L., Dissanayake, G., 2015. ParallaxBA: bundle adjustment using parallax angle feature parametrization. Int. J. Robot. Res. (IJRR) 34 (4-5), 493–516.