Comparative evaluation of iterative and non-iterative methods to ground coordinate determination from single aerial images

Comparative evaluation of iterative and non-iterative methods to ground coordinate determination from single aerial images

ARTICLE IN PRESS Computers & Geosciences 30 (2004) 267–279 Comparative evaluation of iterative and non-iterative methods to ground coordinate determ...

2MB Sizes 0 Downloads 28 Views

ARTICLE IN PRESS

Computers & Geosciences 30 (2004) 267–279

Comparative evaluation of iterative and non-iterative methods to ground coordinate determination from single aerial images$ Yongwei Sheng* Department of Geography, University of California, Los Angeles (UCLA), 1255 Bunche Hall, Los Angeles, CA 90095-1524, USA Received 30 August 2002; received in revised form 10 November 2003; accepted 30 November 2003

Abstract The single-ray backprojection problem refers to the process of determining ground coordinates of pixels in a single aerial image with the support of a digital surface model or a digital elevation model. Several methods have been employed to solve this problem. The iterative photogrammetric (IP) method, based on the inverse collinearity equations, is widely used in photogrammetry. The ray-tracing (RT) method, which is popular in computer graphics, computes the coordinates by intersecting the view ray with the surface. A third one is an iterative ray-tracing (IRT) method, which finds the intersection point by extending the view ray towards the surface by a certain step once a time until it hits the surface. Since the methods become diversified, there is a need to compare and evaluate them. This paper analyzes the principles of these three methods, tests them using a variety of data sets, and provides a comprehensive comparison on their strategies, parameter selection, divergence, occlusion-compliance, precision, robustness, and efficiency. The major difference of these methods is in the strategy of computing the intersection between the view ray and the surface, and this leads to their varied performance. It is found that the IP method is the most computationally efficient and can produce precise coordinates for simple surfaces, but it may surfer from the divergence and occlusioninduced problems for complicated ones. The rigorous RT method is precise, occlusion-compliant and parameter-free, but it is computationally intensive. The IRT method is intermediate in terms of efficiency. If the initial step is small enough, it can adequately address the occlusion-induced problem and produce satisfactory coordinates for complicated surfaces. This comparison provides a guide to method selection for the single-ray backprojection problem. r 2004 Elsevier Ltd. All rights reserved. Keywords: Digital mono-plotting; Single-ray backprojection; Iterative photogrammetric method; Ray-tracing method; Iterative ray-tracing method

1. Introduction It is not a straightforward process to determine the ground coordinates of a pixel in aerial images. Since an aerial image is a projection of the 3-D world on a 2-D image plane, the 3-D information of the scene is not $ Code available from server at http://www.iamg.org/ CGEditor/index.htm. *Tel.: +1-310-825-9422; fax: +1-310-206-5976. E-mail address: [email protected] (Y. Sheng).

fully conveyed to the 2-D image. Therefore, a single image is not sufficient to determine the 3-D ground coordinates. The 3-D ground coordinates can be obtained from a stereo pair of photos. The determination of a point’s position in the object space from two or more images is known as the process of space intersection in photogrammetry (Wolf and Dewitt, 2000). However, it is often necessary to determine the 3-D ground coordinates of a point appearing only in one image. A digital surface model (DSM) or a digital elevation model (DEM), a 3-D description of the

0098-3004/$ - see front matter r 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.cageo.2003.11.003

ARTICLE IN PRESS 268

Y. Sheng / Computers & Geosciences 30 (2004) 267–279

surface, is usually needed to complement the lost 3-D information in the determination process. A pixel’s ground coordinates are the coordinates of the intersection point between the terrain surface and the view ray, starting from the camera focal point and targeting to this pixel. The problem becomes to intersect the view ray with the surface depicted by the DSM, which is also known as the single-ray backprojection problem (Mikhail et al., 2001). In general, there are two methods to solving the backprojection problem: the iterative photogrammetric (IP) method and the ray-tracing (RT) method. The IP method is widely used in photogrammetry (Masry and McLaren, 1979; Radwan and Makarovic, 1980; Chen and Lee, 1993; Tudor and Sugarbaker, 1993; Doytsher and Hall, 1995; Li, 2002), while the RT method is . popular in computer graphics (Sheng, 2000; Borner et al., 2001). A third method was introduced by O’Neill and Dowman (1988) and reinforced by Mikhail et al. (2001), which is an iterative ray-tracing (IRT) method finding the intersection point by extending the ray towards the surface by a certain step once a time. In addition, Jauregui et al. (2002) recently developed a ‘‘pseudophoto’’ method to the problem. In this method, DSM grids are first projected onto an ideally vertical photo (i.e., the pseudophoto). This method projects the point in the image, whose ground coordinates to be determined, onto the pseudophoto, iteratively determines the DSM grid it falls in, and interpolates its ground coordinates from the four corner points of the grid. This method may not work for complicated terrain, and it is not included in this paper. Each method has its own strength and weakness. There however is a lack of efforts in evaluating and comparing them. This paper intends to provide such a comprehensive comparison on the first three methods.

expressed as 0 1 0 1 x  x0 X  Xs B C B C @ y  y0 A ¼ l  R  @ Y  Ys A; f Z  Zs

ð1Þ

where l is a scaling factor; R is the rotation matrix, which depends on camera attitude (i.e., f; o; k). Eliminating the scaling factor, we obtain the wellknown collinearity equations: r11 ðX  Xs Þ þ r12 ðY  Ys Þ þ r13 ðZ  Zs Þ ; r31 ðX  Xs Þ þ r32 ðY  Ys Þ þ r33 ðZ  Zs Þ r21 ðX  Xs Þ þ r22 ðY  Ys Þ þ r23 ðZ  Zs Þ ; y ¼ y0  f  r31 ðX  Xs Þ þ r32 ðY  Ys Þ þ r33 ðZ  Zs Þ

x ¼ x0  f 

ð2Þ where r11 ; r12 ; y; and r33 are the elements of the rotation matrix R: These equations transform the 3-D ground coordinates to the 2-D image coordinates, and indicate that the image coordinates of a point with known (X ; Y ; Z) in the object space can be solely determined. The inverse form of Eq. (1) is 1 1 0 0 X  Xs x  x0 C C B B ð3Þ @ Y  Ys A ¼ 1l  R1  @ y  y0 A: Z  Zs

f

Eliminating the scaling factor from Eq. (3), we arrive at the inverse collinearity equations: r11 ðx  x0 Þ þ r21 ðy  y0 Þ þ r31 ðf Þ ; r13 ðx  x0 Þ þ r23 ðy  y0 Þ þ r33 ðf Þ r12 ðx  x0 Þ þ r22 ðy  y0 Þ þ r32 ðf Þ : Y ¼ Ys þ ðZ  Zs Þ  r13 ðx  x0 Þ þ r23 ðy  y0 Þ þ r33 ðf Þ ð4Þ X ¼ Xs þ ðZ  Zs Þ 

Because the two equations are insufficient for three unknowns (i.e., X ; Y and Z), it is impossible to determine the ground coordinates (X ; Y ; Z) for a point at (x; y) on the image plane using Eqs. (4).

2. Relationship between photo coordinates and ground coordinates 3. Methods to the single-ray backprojection problem Before analyzing the methods to the backprojection problem, it is necessary to discuss the relationship between photo coordinates and ground coordinates. Photo and ground coordinate systems are two distinct coordinate systems, primarily in that the former one is 2-D while the later is 3-D, but their relationship can be established by using the perspective projection geometry. Suppose the camera location (Xs ; Ys ; Zs ), attitude (f; o; k) and properties (i.e., focal length f ; and the offsets x0 and y0 of the principal point) are known, a point at (X ; Y ; Z) in the ground coordinate system is projected to a point at (x; y) in the camera image plane, and the relationship between (x; y) and (X ; Y ; Z) is

This section briefs the principle of the following three methods to the single-ray backprojection problem: the IP method, the RT method, and the IRT method. 3.1. IP method The IP method is based on the inverse collinearity equations. An important implication of Eqs. (4) is that X and Y can be determined using these equations if the elevation Z is available. A DSM (or DEM) provides an elevation array indexed by column (i.e., the X coordinate) and row (i.e., the Y coordinate) in the object space.

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

Thus the elevation of a cell is not available from the DSM until its column and row in the DSM array is known. An iterative procedure is therefore required. Fig. 1 illustrates how the IP method works. The iterative method plays on the profile intersected between the terrain surface and the vertical view plane defined by the view ray and the vertical line through the camera focal point. The profile inclination angle a around the intersection point and the view angle y are crucial to this method. The general procedures of this method to determine the intersection point for pixel P are as follows (Masry and McLaren, 1979; Chen and Lee, 1993; Tudor and Sugarbaker, 1993; Doytsher and Hall, 1995): (1) Using the given initial elevation Z0 (usually the median or the mean value of the DSM), determine the X and Y coordinates of the initial projected point A1 using Eqs. (4). (2) From A1 vertically trace to point B1 on the surface, and interpolate its Z coordinate Z1 from the DSM. (3) Update the elevation from Z ¼ Z0 to Z ¼ Z1 ; and repeat steps (1) and (2) to determine subsequent projected points A2 ; A3 ; A4 ; y;An until convergence. (4) If converged, output the coordinates of An as the ground coordinates of pixel P: Each iteration produces a projected point on the view ray. As the iteration continuing, A1 ; A2 ; A3 ; A4 ; y;An form a sequence of projected points. If the last two projected points are close enough (i.e., the distance

Camera S P

A2 B2



B1 A4

B3 A3

θ A1

Z0

Fig. 1. Iterative photogrammetric method. IP method solves intersection point iteratively along profile intersected between surface and vertical view plane defined by view ray and vertical line through camera focal point. Each iteration produces a projected point Ai (i ¼ 1; 2; y; n), and sequence of A1 ; A2 ; A3 ; A4 ; y;An approaches intersection point gradually if iteration is convergent. Profile inclination angle a around intersection point and view angle y are crucial to this method.

269

smaller than a pre-defined threshold DT), then the iteration is considered convergent and the coordinates of the last projected point are output as the ground coordinates of pixel P: 3.2. RT method RT is a popular image rendering technique in computer graphics to create 2-D images of the 3-D world (Glassner, 1994). The principle of RT is quite simple. Rays transfer among three major objects: a light source, a target and the camera. The backward RT technique is a popular and efficient RT strategy, which starts a ray from the camera, traces the ray to the objects, and then to the light source. The single-ray backprojection problem turns out to be the first portion of the backward RT process. For a pixel in the image, a ray R is defined by its origin (i.e., the camera focal point) R0 ¼ ½X s ; Y s ; Zs T and its normalized direction vector Rd ¼ ½Xd ; Yd ; Zd T : 0 1 0 1 0 1 Xs X ðtÞ Xd B C B C B C RðtÞ ¼ @ Y ðtÞ A ¼ R0 þ t  Rd ¼ @ Ys A þ t  @ Yd A; Zs Zd ZðtÞ ð5Þ where t is the distance between a point RðtÞ on the ray and the origin R0 : The core of a RT algorithm is the set of ray intersection routines (Haines, 1989). For an object surface described by equation f ðX ; Y ; ZÞ ¼ 0; an intersection occurs at point RðtÞ if and only if t is a root of equation f ðX ðtÞ; Y ðtÞ; ZðtÞÞ ¼ 0: In computer graphics, objects are usually modeled as geometric primitives. Plane, sphere, the quadratic family, and their derivatives are commonly used. Intersection of these surfaces with a ray can be solved analytically. It is impossible to represent the terrain surface with a general geometric primitive, but the terrain surface can be modeled seamlessly using a set of triangles, which is known as the triangular irregular network (TIN) model. The RT method can be directly applied if the terrain surface is readily available in a TIN model. However, surface data are usually provided in DSM grids. Gridbased DSM can be converted to TIN simply by splitting a cell into two triangles diagonally crossing the maximum gradient. In the RT method, the critical tasks are to find the triangle seen by pixel P; and to compute the coordinates of the intersection point between the triangle and the ray. It is computationally far too . expensive to screen from all triangles. Borner et al. (2001) proposed a faster algorithm by limiting the screening to relevant triangles. As illustrated in Fig. 2, this algorithm first finds the starting and ending nodes (S and E) of the view ray where intersects with the two bounding planes defined by the maximum (Zmax ) and

ARTICLE IN PRESS 270

Y. Sheng / Computers & Geosciences 30 (2004) 267–279

minimum (Zmin ) elevations of the DSM, and generates a list of DSM cells (shaded cells in the figure) touching the line segment defined by the footprints (i.e., A and B) of the two nodes. We further screen the list by using the 0 0 maximum (Zmax ) and minimum (Zmin ) heights of the 0 0 cells in the list. Normally, the range of Zmax and Zmin is much smaller than that of Zmax and Zmin ; thus this step significantly reduces the size of the list and limits the search to the hatched cells between C and D in Fig. 2. Each cell consists of two triangles, and the triangles of the cells in the list form a triangle list. Only the triangles in this list need to be checked from C to D for intersection with the view ray through the following procedures: (1) Construct the equation AX þ BY þ CZ þ 1 ¼ 0 of the plane, on which the triangle resides; (2) Calculate the distance between the camera and the intersection point A  Xs þ B  Ys þ C  Zs þ 1 t¼ ; A  Xd þ B  Yd þ C  Zd (3) Calculate coordinates of the intersection point RðtÞ using Eq. (5) with the above t; (4) Check if RðtÞ lies within the triangle. If it does, then it is the intersection point, and output its coordinates as the ground coordinates of pixel

P; otherwise, repeat steps (1)–(4) to check next triangle in the list until the intersection point is found. 3.3. IRT method The RT method is quite computationally intensive since the procedure of intersecting the ray and a triangle needs to be repeated many times. To avoid the direct intersection calculation between the ray and the surface, O’Neill and Dowman (1988) proposed an IRT method to determine the intersection by extending the ray towards the surface by a step once a time until it hits the surface. The elevation of the extended point on the ray is then compared to the elevation interpolated from the DSM to determine if the ray has intersected the surface (Fig. 3). If the point elevation becomes less than the surface elevation, then the ray is considered crossing the surface, and the intersection point is between the last two tracing points. There are two crucial parameters involved in this method, the starting point and the step size. Mikhail et al. (2001) suggested using the maximum elevation (Zmax ) in the DSM to determine the starting point (i.e., the intersection point between the ray and the horizontal plane of Z ¼ Zmax ). The step size is critical to the iteration speed and the output coordinate accuracy. If the step is too large, the method is fast but with a poor accuracy. On the contrary, if the step is small, the method becomes slow but the output coordinates are

Camera S P Ray Starting point Step size

Z max

Ray elevation ∆Z 0

Intersection point

Fig. 2. Ray-tracing method. RT method represents surface using TIN, and determines ground coordinates by mathematically intersecting view ray and triangles. To speed up procedure, algorithm limits intersection computation to only relevant triangles. Algorithm first calculates starting and ending nodes (S and E) of view ray where intersects with two bounding planes defined by maximum (Zmax ) and minimum (Zmin ) heights of DSM, finds a list of DSM cells (shaded cells in the figure) touching line segment defined by footprints (i.e., A and B) of 0 two nodes, and further screens list by using maximum (Zmax ) 0 and minimum (Zmin ) heights of cells in list. As a result, intersection computation is limited only to triangles of hatched cells between C and D.

Surface elevation θ

Fig. 3. Iterative ray-tracing method. IRT method starts a ray from a high point, and extends ray down towards surface by a step once a time until it hits surface. Elevation of extended point on ray is then compared to elevation interpolated from the DSM to determine if ray has intersected surface. If point elevation becomes less than surface elevation, then ray is considered intersecting with surface, and intersection point is between last two extended points. Procedure is then repeated using a smaller step until step size is smaller than a pre-defined threshold.

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

reliable. Using multi-level steps can gain both speed and accuracy. The method first uses a relatively large step to roughly locate the intersection point. Once the intersection point has been found, the 3-D position of the intersection point can be refined around the found point using a smaller step size. In its implementation in this paper, we refined the step size to 1/10 of the previous one. The iterative process stops when the step becomes smaller than a pre-defined threshold DT:

(A)

4. Experiments The study site is a closed uneven-aged redwood stand located at the campus (122.38 W, 37.62 N) of the University of California, Berkeley. It is about 166 m long and 122 m wide containing approximately 60 redwood trees, several oak trees, and a couple of buildings. The difference in topographic elevation at this site is less than 12 m, while the redwoods are up to 45 m tall. The DEM (Fig. 4A) of the background terrain

(B)

20

(C)

271

20

33 (m)

70 (m)

(D)

Fig. 4. Background DEM, scene DSM, and aerial images of study area. (A) Smooth DEM describing background terrain surface. (B) Rough DSM describing scene surface. (C) Left image of stereo photo pair. (D) Right image.

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

272 Table 1 Image properties of stereo pair Image

Image size

Resolution (m)

Station location (m)

Attitude: f; o; k (deg)

Nadir view Off-nadir view

669 449 669 449

0.24 0.24

(4370.7, 4002.3, 402.9) (4576.4, 4003.5, 404.7)

(0.5015,0.0028, 0.3228) (0.5015, 0.0028, 0.3228)

surface and tree surface models of individual trees were developed from aerial photos using a 3-D tree interpreter (Gong et al., 2002). The minimum, mean, median and maximum values of the DEM are 21.34, 24.80, 24.63 and 32.83 m, respectively. Buildings were interactively extracted and their elevations were calculated to generate a building surface model. The building surface model, the tree surface models and the background DEM were composed to produce the surface model (DSM) of the study area (Fig. 4B). The minimum, mean, median and maximum values of the DSM are 21.34, 30.42, 25.73 and 67.74 m, respectively. The DSM and the background DEM cover the same area (The ranges of X and Y ground coordinates are 4307.3–4428.8 m and 3951.8–4118.1 m, respectively.) with the same resolution (0.24 m) and the same dimensions (694 507 cells). To assess the methods’ performance under various situations, the ground coordinates are calculated for a stereo photo pair, consisting of a nadir view image (Fig. 4C) and an off-nadir image (Fig. 4D). The photo properties of the two images are listed in Table 1. Occlusions are severe in these images, especially in the off-nadir one with trees leaning towards the left. When occlusions are present, some algorithms are not able to correctly compute the coordinates for visible areas near the occlusions. This is called the occlusion-induced problem in this paper. Both the smooth background DEM and the complicated scene DSM are used in the experiments. These data sets are ideal for this comparison study, especially for evaluating these methods’ ability to address the occlusion-induced problem. Both the IP and IRT methods involve parameter selection. In the experiments, we used the median value of the surface models as the initial height for the IP method, and 1 m as the initial step size for the IRT method. The iteration threshold DT was defined as 0.024 m (i.e., 1/10 of pixel size) for both methods, which was the accuracy intended to achieve. The results produced by these three methods are shown in Figs. 5 and 6 for the background surface and the scene surface, respectively. To save figure space, only the Z coordinate maps are presented. In both figures, the upper and lower rows show the maps for the nadir image and the offnadir image, respectively; the maps produced by the IP, IRT and RT methods are shown in the left, middle and right columns. In the case of the background DEM (Fig. 5), the Z coordinate maps generated by theses three

methods look almost identical, and very similar to the background DEM (Fig. 4A). When the surface becomes complicated (i.e., the one represented by the DSM), the coordinate maps produced by these methods show significant difference. The map (Fig. 6A) produced by the IP method for the nadir image is different from those (Figs. 6B and C) produced by the IRT and RT methods. The IP method fails to the divergence problem around the buildings and trees, leaving black areas. The difference becomes more significant in the case of the off-nadir image. The trees lean towards the left in Figs. 6E and F due to the perspective geometry and the low view angle, but few of them show up in Fig. 6D because the IP method diverges or fails to the occlusion-induced problem for most trees. One can see from both Figs. 5 and 6 that the maps produced by the IRT and the RT methods are similar in any cases. A more detailed comparison on these results will be carried out in the comparison section.

5. Comparison This section compares and evaluates these methods in a great detail on their intersection strategies, parameter selection, divergence, occlusion-compliance, precision, robustness, and efficiency. 5.1. Intersection strategies Since the IP method is based on the photogrammetric collinearity equations, it seems to be quite different from the two RT methods. In fact, the IP method can also be expressed in the form of RT. From Eqs. (3) and (5), the direction vector of the ray can be calculated from the pixel coordinates ðx; yÞ in the image: 0 1 1 0 1 Xd X  Xs x  x0 B C B C B C 1  R1  @ y  y0 A @ Yd A ¼ 1t  @ Y  Ys A ¼ ðltÞ Zd Z  Zs f 0 1 r11 ðx  x0 Þ þ r21 ðy  y0 Þ þ r31 ðf Þ B C 1 ¼ ðltÞ  @ r12 ðx  x0 Þ þ r22 ðy  y0 Þ þ r32 ðf Þ A: 0

r13 ðx  x0 Þ þ r23 ðy  y0 Þ þ r33 ðf Þ

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

(A)

(B)

(C)

(D)

(E)

(F)

20

273

33 (m)

Fig. 5. Z coordinate maps for background DEM. (A), (B) and (C) are maps produced for nadir image by IP, IRT and RT methods, respectively. (D), (E) and (F) are maps produced for off-nadir image by these methods, respectively.

profile. The RT method mathematically solves the intersection between the ray and the seen triangle. The IRT method finds the intersection by prolonging the ray until it hits the surface. The difference in their intersection strategies leads to the discrepancy in their performance.

The vector is normalized as 1 1 0 0 r11 ðx  x0 Þ þ r21 ðy  y0 Þ þ r31 ðf Þ Xd C C B B @ Yd A ¼ @ r12 ðx  x0 Þ þ r22 ðy  y0 Þ þ r32 ðf Þ A=s; r13 ðx  x0 Þ þ r23 ðy  y0 Þ þ r33 ðf Þ Zd where s is the normalization factor, and s¼

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðr11 ðx  x0 Þ þ r21 ðy  y0 Þ  r31  f Þ2 þ ðr12 ðx  x0 Þ þ r22 ðy  y0 Þ  r32  f Þ2 þ ðr13 ðx  x0 Þ þ r23 ðy  y0 Þ  r33  f Þ2 :

Thus, the RT form of the IP method is 0 1 0 1 X ðtÞ Xs B C B C @ Y ðtÞ A ¼ @ Ys A ZðtÞ Zs 0 1 r11 ðx  x0 Þ þ r21 ðy  y0 Þ þ r31 ðf Þ B C þt  @ r12 ðx  x0 Þ þ r22 ðy  y0 Þ þ r32 ðf Þ A=s: r13 ðx  x0 Þ þ r23 ðy  y0 Þ þ r33 ðf Þ ð6Þ In fact, the major difference of these three methods is in how they find the intersection point. The IP method iterates the intersection point along the surface

5.2. Parameter selection The RT method is easy to use without having to specify any parameters. Both the two iterative methods however involve parameter selection, and need to specify an iteration threshold DT as the accuracy control. The IP method needs an initial elevation estimate, which is critical to its convergence speed and may cause vulnerabilities to the occlusion-induced problem. It is not easy to select an adequate value for all pixels, though the median value of DSM elevations may be a good candidate for most pixels.

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

274

(A)

(B)

(C)

(D)

(E)

(F)

20

70 (m)

Fig. 6. Z coordinate maps for scene DSM. (A), (B) and (C) are maps produced for nadir image by IP, IRT and RT methods, respectively. (D), (E) and (F) are maps produced for off-nadir image by these methods, respectively.

The IRT method needs to select the initial step size, which is also difficult to estimate. When the surface is complicated and viewed from the side, occlusions may occur in the aerial image. If the step size is selected too big, the first intersection point might be missed and the produced coordinates might be degraded by the occlusion-induced problem. If the step is small enough, the occlusion-induced problem will be lessened considerably, but the computational expense will rocket up. Using the case of off-nadir image for the DSM, Fig. 7 illustrates the influence of parameter selection on the performance of the IRT method. When using a large initial step (i.e., DS ¼ 5 m) and a DT of 0.024 m, the IRT method surfers from the occlusion-induced problem and produces fuzzy edges around treetops in the output Z coordinate map (Fig. 7A). The method is computationally efficient with such a large initial step, and most pixels need about 25 iterations to find their intersection points (Fig. 7E). Reducing DS to 1 m, the occlusioninduced problem is lessened considerably (Fig. 7B) at an expense of 27 more iterations on average (Fig. 7F). When reducing DT to 0.00024 m, the IRT method needs 63 iterations on average (see Figs. 7C and G). When setting DS to 0.1 m and DT to 0.00024 m for accurate

coordinates, the Z coordinate map (Fig. 7D) is not improved much, but the number of iterations jumps to 418 on average (Fig. 7H). 5.3. Divergence Divergence is a major weakness of the IP method. This method may be divergent when the surface is complicated. The IP method converges everywhere for the background terrain surface (see Figs. 5A and D), but not for the complicated scene surface. Many divergent areas (in black) are found around trees in the Z coordinate map (Fig. 6A) of the nadir image, and the divergent areas expend in the map (Fig. 6D) of the offnadir image viewed at a lower angle. The divergence problem does not apply to the RT method since it is not an iterative method. Because it is certain for the extended ray to hit the surface, the IRT method does not surfer from the problem either. 5.4. Occlusion Occlusions often occur in off-nadir images of highly relieved areas (Sheng et al., 2003), and may create

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

275

70

(A)

(B)

(C)

(D)

20 (m)

600

200

60

(E)

(F)

(G)

(H)

5

Fig. 7. Influence of parameter selection on IRT method. Upper row shows Z coordinate maps. (A) Initial step size DS ¼ 5 m, DT ¼ 0:024 m. (B) DS ¼ 1 m, DT ¼ 0:024 m. (C) DS ¼ 1 m, DT ¼ 0:00024 m. (D) DS ¼ 0:1 m, DT ¼ 0:00024 m. Lower row shows iteration number maps, (E), (F), (G) and (H) corresponding to (A), (B), (C) and (D), respectively. Z map (A) produced using large DS is degraded by occlusion-induced problem, especially around treetops, whereas small DS significantly increases computational expenses (H).

problems to the produced coordinates. The RT method searches for the closest intersecting triangle, and is therefore immune to the occlusion-induced problem. The two iterative methods, however, are not occlusioncompliant. The occlusion-induced problem in the IP method can be lessened if the initial elevation can be adequately estimated, but this is not always possible. Since the initial elevation estimate in this paper uses the median value of DSM, which is close to the elevation of the background surface, the IP method tends to produce the coordinates for a ground point even if it is occluded by higher objects such as buildings and trees. Very few coordinates were produced from tree points in Fig. 6D due to the vulnerability of the method to divergence and the occlusion-induced problem. The IRT method surfers from the occlusion-induced problem to a less degree compared to the IP method. When the step size is too large to capture the first intersection point, the IRT method becomes vulnerable to the occlusion-induced problem. In the case of the offnadir viewed DSM, the number of occluded pixels drops from 3174 to 350 when the initial step size is reduced from 5 to 1 m. If the step is selected small enough, the

occlusion-induced problem can be well handled but at a high computational expense (Fig. 7H). 5.5. Accuracy assessment Since it is difficult to obtain the true coordinates, there is no direct way to assess the accuracy of each method. A cross-validation was carried out as an alternative. The idea is that a method is likely the most reliable method if it in any case always produces similar results to other methods. The mean Euclidean distance of the output coordinates between two of the three methods was computed (see Table 2). For the two cases of the background DEM, the coordinates produced by the IP and RT methods are the closest with the mean distance of 0.000098 and 0.00018 m, respectively. For the cases of the DSM, the IRT and RT methods produce the most similar coordinates with the mean distance of 0.0459 and 0.0505 m, respectively. The RT method produces the most similar results to either the IP or the IRT method in any cases. This suggests that the RT method produce the most reliable coordinates. This can also be observed from the principle of the RT method. If the TIN model

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

276

Table 2 Cross-validation of these three methods DEM

Photo

IP vs. IRT a

IP vs. RT b

IRT vs. RT

Background DEM

Nadir Off-nadir

0.004988 0.005011

0.000098 0.000180b

0.004998 0.004995

DSM

Nadir Off-nadir

0.792063 5.682584

0.784091 5.669951

0.045920b 0.050458b

Euclidean distances of output coordinates between two of three methods were computed. This table shows mean distance (m) for each case. Figures in this table suggest that RT method is most accurate. a Unit: meters b Indicates minimum distance in each case.

0.024

0.012

(A)

(B)

(C)

(D)

0 (m)

(E)

(F)

(G)

(H)

Fig. 8. Error maps of IP and IRT methods. Upper and lower rows show error maps of IP and IRT method, respectively. (A) and (E) nadir images using DEM. (B) and (F) off-nadir images using DEM. (C) and (G) nadir images using DSM. (D) and (H) off-nadir images using DSM. IP method is more precise for smooth surface than IRT method, but it becomes problematic for a complex surface.

adequately represents the surface, this method can mathematically calculate precise coordinates. The RT method is rigorous and accurate, and the output coordinates can be very close to the true values. Assuming that the RT method produces true coordinate values, the error map of the IP and IRT methods can be modeled by taking Euclidean distance between their output coordinates and those of the RT

method. Fig. 8 shows such error maps of the IP and IRT methods in all cases. In the case of the nadir image using the DEM, the error map (Fig. 8A) produced by the IP method is smaller than 0.0025 m everywhere. The error map (Fig. 8B) for the off-nadir image using the DEM becomes brighter, indicating the error increases, but the maximum error is still small (i.e., 0.0050 m). For the cases of the DSM, the error maps (Figs. 8C and D) show

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279

277

Table 3 Computational expenses of these methods DEM

Photo

IP

IRT

RT

Flops

Iterations

Flops

Iterations

Flops

Intersections

Background DEM

Nadir Off-nadir

36.74 39.40

3.05 3.27

189.55 202.29

19.47 20.90

252.64 363.78

2.67 3.45

DSM

Nadir Off-nadir

67.18 88.78

5.52 7.27

432.81 456.32

48.36 51.65

780.47 3664.68

10.19 53.09

Table 3 shows total number of float operations (unit in million flops) recorded by MATLABs. It also lists for each case mean iteration number for IP and IRT methods or averaged number of intersections for RT method.

high errors on tree crowns and around occluded areas (in white). The error of the IRT method should be smaller than the specified iteration threshold DT if the occlusioninduced problem is adequately addressed. The maximum error in the two Z coordinate maps (Figs. 8E and F) generated for the nadir and off-nadir images using the DEM is 0.0117 and 0.0120 m, respectively, all smaller than DT (i.e., 0.0240 m). In both cases, the errors are much larger than those of the IP method. The two error maps (Figs. 8G and H) using the DSM show high errors exceeding DT at building and tree edges. This can be explained by the fact that the RT method models the surface in a different way. The IP and IRT methods work on the grid-based surface model and use the bilinear method to interpolate height at a certain location, while the RT method uses the TIN model to represent the surface. The height interpolated from the grid-based surface model should be very close that derived from the TIN model in smooth areas, but it could be significantly different around sharp edges. In summary, the IP method can produce more precise coordinates for smooth surface than the IRT method, but the IRT method can better handle complicated surface.

5.6. Efficiency The speed of the IP method depends on the profile inclination angle a; the view angle y; the iteration threshold DT; and the offset DZ0 of the initial elevation estimate from the true height. If the iterative method converges, it requires a number of iterations. Each iteration costs about 30 float operations (flops) in interpolating elevation Z from DSM, computing X and Y coordinates from Z; and converting ground coordinates to the row and column numbers in the DSM. For the IRT method, the iteration speed depends on the view angle y; the iteration threshold DT; the initial

step size DS and the offset DZ0 of the elevation from the maximum elevation. Each iteration costs about 25 flops in tracing next ray point, converting ground coordinates to the row and column numbers in the DSM, and interpolating elevation from DSM. The RT method is the most computationally expensive. Its speed depends on is the view angle y; the DSM cell size, and the offset DZ0 of the elevation from the maximum elevation in the screened cell list. The expense is mainly spent on ray-triangle intersection computation, which costs about 210 flops. These three methods were implemented in MATLABs version 5.3. Table 3 shows the total number of float operations (unit in million flops) recorded by MATLAB. It also lists for each case the averaged number of iterations for the IP and IRT methods or the averaged number of intersection calculations for the RT method. The IP method turns out always to be the most efficient in every case. In general, the IRT and the RT methods cost about five times and ten times more than the IP method, respectively. The IP and IRT method do not significantly increase computational expenses from nadir to off-nadir view, from simple to complicated surface. On the contrary, the computational cost of the RT method significantly increases, and the case of the off-nadir image using the DSM needs about 3.665 billion flops, costing seven and 40 times more than the IRT and the IP methods, respectively.

5.7. Robustness The RT method is robust and can tackle all situations, nadir or off-nadir views, simple or complicated surfaces. The IRT method may not handle the occlusion-induced problem well unless an adequately small initial step is used. The IP method is the most vulnerable in these three methods. When the surface becomes complicated, it may be divergent and may fail to the occlusioninduced problem.

ARTICLE IN PRESS 278

Y. Sheng / Computers & Geosciences 30 (2004) 267–279

6. Discussions Since these three methods can be expressed in a form of RT as shown in Section 5.1, they all do not rely on the photogrammetric collinearity equations, and are not limited to the aerial photo’s perspective geometry. They can be applied to images generated by other imaging mechanisms as long as the view ray can be defined. These should include aerial scanning systems and satellite systems. . As Borner et al. (2001) did for the RT method, a prediction algorithm can help to speed up these methods by taking advantage of the high correlation between the ground coordinates of the neighboring pixels. Neighborhood information was not used in this comparison. The reasons are twofold. First, the prediction algorithm can work only for smooth surfaces; second, the feature points to be digitized in the mono-plotting process are usually not close to each other, thus such a prediction algorithm is not applicable. The RT method has potential to improve its efficiency using multi-resolution TIN models. The RT method in this paper uses a regular TIN network to represent the surface by simply splitting a grid cell into two triangles, resulting in a large number of triangles. A procedure of merging triangles in flat areas and representing the surface using multi-resolution TIN models (Cignoni et al., 1997) will potentially reduce the number of triangles and speed up the RT method.

7. Conclusions This paper provides a comprehensive comparison of three methods to ground coordinate determination from single images and a surface model. They are the iterative photogrammetric method, the ray-tracing method, and the iterative ray-tracing method. These methods employ different strategies in determining intersection between the ray and the surface, and this is responsible for their varied performance. The RT method is rigorous, precise and robust, and it can properly address the occlusioninduced problem. It works under any circumstances no matter how complicated the surface is; however, it is quite computationally intensive. The IP method is the most efficient, about seven times faster than the RT method for simple surface, and 40 times faster for complicated surface. This method can produce quite precise coordinates for simple surface, but it becomes problematic when the surface is complicated. It may diverge and fail to the occlusion-induced problem. The IRT method has its efficiency in between. It does not surfer from the divergence problem though it is an iterative method, and can deal with all surfaces, simple or complicated. Initial step size is a critical parameter to this method. An adequately small initial step can

considerably prevent this method from the occlusioninduced problem, but it significantly increases the computational expenses. In summary, the IP method is appealing due to its efficiency and high precision when the surface is smooth and less complicated. The IRT and the RT methods are candidates for complicated surfaces. The IRT method is a good choice for both effectiveness and efficiency, especially for the case of complicated surfaces and offnadir images when the RT method is not computationally affordable. With the increasingly availability of computing resources, the RT method will attract more attention due to its high precision and robustness.

References . Borner, A., Wiest, L., Keller, P., Reulke, R., Richter, R., Schaepman, M., Schl.apfer, D., 2001. SENSOR: a tool for the simulation of hyperspectral remote sensing systems. ISPRS Journal of Photogrammetry and Remote Sensing 55 (5–6), 299–312. Chen, L.C., Lee, L.H., 1993. Rigorous generation of digital orthophotos from SPOT images. Photogrammetric Engineering and Remote Sensing 59 (3), 655–661. Cignoni, P., Puppo, E., Scopigno, R., 1997. Representation and visualization of terrain surfaces at variable resolution. Visual Computer 13 (5), 199–217. Doytsher, Y., Hall, J., 1995. Fortran programs for co-ordinate resection using an oblique photograph and high-resolution DEM. Computers & Geosciences 21 (7), 895–905. Glassner, A.S., 1994. 3D Computer Graphics: A User’s Guide for Artists and Designers, 2nd Edition. Design Books, New York, 213pp. Gong, P., Sheng, Y., Biging, G.S., 2002. 3D Model-based tree measurement from high resolution aerial imagery. Photogrammetric Engineering and Remote Sensing 68 (11), 1203–1212. Haines, E., 1989. Essential ray tracing algorithms. In: Glassner, A.S. (Ed.), An Introduction to Ray Tracing. Academic Press, New York, pp. 33–77. ! L., 2002. A procedure for Jauregui, M., Vilchez, J., Chacon, map updating using digital mono-plotting. Computers & Geosciences 28 (4), 513–523. Li, Z., 2002. Orthoimage generation and measurement from single images. In: Chen, Y.Q., Lee, Y.C. (Eds.), Geographical Data Acquisition. Springer, New York, pp. 173–192. Masry, S.E., McLaren, R.A., 1979. Digital map revision. Photogrammetric Engineering and Remote Sensing 45 (2), 193–200. Mikhail, E.M., Bethel, J.S., McGlone, J.C., 2001. Introduction to Modern Photogrammetry. Wiley, New York, NY, 479pp. O’Neill, M.A., Dowman, I.J., 1988. The generation of epipolar synthetic stereo mates for SPOT images using a DEM. International Archives of Photogrammetry and Remote Sensing 27 (B3), 587–598. Radwan, M.M., Makarovic, B., 1980. Digital mono-plotting system—improvements and tests. ITC Journal (3), 511–534.

ARTICLE IN PRESS Y. Sheng / Computers & Geosciences 30 (2004) 267–279 Sheng, Y., 2000. Model-based conifer crown surface reconstruction from multi-ocular high-resolution aerial imagery. Ph.D. Dissertation University of California at Berkeley, Berkeley, CA, 168pp. Sheng, Y., Gong, P., Biging, G.S., 2003. True Orthoimage Production for Forested Areas from Large-Scale Aerial Photographs. Photogrammetric Engineering and Remote Sensing 69 (3), 259–266.

279

Tudor, G.S., Sugarbaker, L.J, 1993. GIS orthographic digitizing of aerial photographs by terrain modelling. Photogrammetric Engineering and Remote Sensing 59 (4), 499–503. Wolf, P., Dewitt, B.A, 2000. Elements of Photogrammetry with Application in GIS, 3rd Edition. McGraw Hill, New York, 608pp.