COMPUTER GRAPHICS AND IMAGE PROCESSING 15, 201-223 (1981)
Detection of Roads and Linear Structures in Low-Resolution Aerial Imagery Using a Multisource Knowledge Integration Technique*
M. A. FISCHLER, J. M. TENENBAUM, AND H. C. WOLF
Artificial Intelligence Center, Computer Science and Technology Division, SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025
Received October 26, 1979; revised February 4, 1980; accepted March 25, 1980
This paper describes a computer-based approach to the problem of detecting and precisely delineating roads, and similar “line-like” structures, appearing in low-resolution aerial imagery. The approach is based on a new paradigm for combining local information from multiple, and possibly incommensurate, sources, including various line and edge detection operators, map knowledge about the likely path of roads through an image, and generic knowledge about roads (e.g., connectivity, curvature, and width constraints). The final interpretation of the scene is achieved by using either a graph search or dynamic programming technique to optimize a global figure of merit. Implementation details and experimental results are included.
1. INTRODUCTION
Given the problem of producing an overlay showing the clearly visible roads in an aerial image, a person would normally be expected to accomplish this task with little difficulty, even though he may be completely unfamiliar with the terrain depicted in the image. Our purpose in this paper is to clarify the nature of this task and some of its generalizations. In particular, we wish to specify the requirements and mechanisms for a machine to be capable of near-human performance in finding roads and other semantically meaningful linear structures in aerial imagery.
A. Performance Criteria
Our goal is to produce a list of connected points for each segment of road which is tracked in the input image. Each such track is a delineation of the actual road and should have the following properties:
(1) No point on a track should be located outside of the road boundaries when the road is clearly visible.
(2) The track should be smooth where the road is straight or smoothly curving (within the constraints of a digital raster representation).
(3) If parts of the road are occluded, those portions of the continuous track overlaying the occluded segments should be labeled as such.
(4) In areas where the road is partially occluded, the track should follow the actual center of the road (as opposed to the center of the visible portion).
If the road is composed of adjacent but separated lanes, then each lane will be considered a separate road for our purposes.
*The work reported herein was supported by the Defense Advanced Research Projects Agency under Contracts DAAG29-76-C-0057 and MDA903-79-C-0588; and by the U.S. Army Engineer Topographic Laboratory under Contract DAAK70-78-C-0114.
0146-664X/81/030201-24$02.00/0 Copyright © 1981 by Academic Press, Inc. All rights of reproduction in any form reserved.
B. Contextual Settings for Road Tracking
A “road” is a functionally defined entity whose appearance in an image depends largely on its width and how much internal road detail is visible; i.e., appearance depends largely on image resolution (see Fig. 1). Additional factors having a major effect on visually locating roads in imagery include the visible extent of the road, its contrast with the adjacent terrain, the presence of nearby linear structures, and any prior knowledge about the actual shape of the road and its location in the image.
FIG. 1. Road scenes depicted at a spectrum of resolutions.
We have found that the following contextual settings require significantly different approaches to the road tracking problem:
(1) High vs low resolution (low resolution is defined as roads having an image width of three or fewer pixels).
(2) Clear vs occluded viewing (clear viewing is defined as a situation in which no more than approximately 30% of the road being tracked is occluded by clouds, intervening objects, etc.).
(3) High vs low density of linear detail (nominally, this distinction corresponds to urban vs rural scenes).
In this paper we will mainly be concerned with tracking roads in clear imagery of rural scenes at low resolution. A robust technique for tracking roads in high-resolution imagery was previously reported (Quam [14]). We note that in the case of high-resolution imagery, once the road has been “acquired” and we are able to track features internal to the road boundaries, the surrounding detail is of minor importance (except as it introduces shadows and occlusions); thus, the distinction between urban and rural scenes is important mainly at low resolution.
Where the roads are heavily occluded, road matching rather than road tracking is the appropriate technique; here one needs to have prior knowledge of the geometry of the road networks being searched for.
Prior knowledge about the (approximate) location and/or direction of the roads in the imagery is important if a specific road (as opposed to all roads) is to be tracked; some method of indicating which road we are interested in is necessary, and this is typically done by delimiting a search area in the input image. Finally, prior knowledge about terrain type and/or scene elevations can be used to help distinguish low-resolution roads from other linear features by invoking cultural and economic constraints which are known to affect road construction.
In the following section, we argue that there is no currently available single coherent model suitable for reliable detection of local road presence. It is thus essential that some means for integrating information from multiple (incommensurate) image operators and knowledge sources be devised.
We present a general paradigm for this multisource integration task, and describe its specific application to the problem of detecting linear structures.
2. LOW-RESOLUTION ROAD TRACKING
At low resolution, roads are often indistinguishable from other linear features appearing in the image (including artifacts, such as scratches). Thus, the low-resolution road tracking problem largely reduces to the general problem of line (as opposed to edge) following. Nevertheless, there are still some weak semantics that can be invoked to specifically tailor a system for road tracking, trading some generality for significant increases in performance.
A. The Basic Paradigm
The basic paradigm we employ is to first evaluate all local evidence for the presence of a road at every location in the search area (a low numeric value indicates a high likelihood that the given image point lies on a road), and then find a single track which, while satisfying imposed constraints (such as continuity), minimizes the sum of the local evaluation scores (costs) associated with every point along the track.
While the basic optimization paradigm is not new (e.g., Fischler [5], Montanari [12], Martelli [11], Barrow and Tenenbaum [1], and Rubin [18]), it is incomplete in that it does not provide mechanisms for reconciling incommensurate sources of information. This capability is crucial in problems such as road tracking in which no single coherent model is adequate for reliable detection. In this paper we introduce the following new and relatively simple mechanisms for combining local evidence and constraints in the context of an optimization paradigm for detecting linear structures:
- Partitioning image operators into two classes, based on their error characteristics: Type I operators that will almost never incorrectly classify artifacts as instances of the structure they are searching for, but may often miss correct instances; versus Type II operators that accurately measure relevant parameters of all true instances but may falsely classify and incorrectly parameterize noninstances.
- Differential use of these two classes of operators: operators with good classification ability provide a framework which gets filled in by information supplied by operators providing precise local feature characterization.
- Transformation of individual operator scores to a scale in which differential numerical values are responsive to externally supplied knowledge about the particular scene being processed (e.g., knowledge about road shape or occlusions).
B. Detecting Local Road Presence: Road Operators and Models
At low resolution, roads are line-like structures of essentially constant width, which, in general, are locally constant in intensity in the along-track direction and show significant contrast with the adjacent terrain (generally, they are either uniformly lighter or darker). A specific interpretation of this low-resolution road model is embodied in the Duda Road Operator (DRO)¹ described in the Appendix. In Fig. 2 we show some examples of the scores produced by this operator on a variety of road scenes. It is apparent that the DRO does a good job most of the time but has some significant weaknesses; it is sensitive to (a) road orientation (in directions other than the four principal directions explicitly covered by the masks described in Fig. A1), (b) raster quantization effects (e.g., where a straight line segment “jogs” in crossing a quantization boundary), (c) sharp changes in road direction, and (d) certain contrast problems with the adjacent terrain.
¹ Suggested by R. O. Duda of SRI International; other similar operators specialized for line detection are described in Rosenfeld and Thurston [17] and in VanderBrug [19, 20].
FIG. 2. Duda road operator applied to a number of scenes.
At this point one might wonder if a special road operator is really required; why not simply use a generic edge detector (e.g., Sobel [in Duda and Hart [4]], Roberts [15], or Hueckel [9, 10])? Even more to the point, we notice that it is possible to interpret the effect of employing an operator on an image as resulting in the suppression of all detail other than that associated with the entity to be detected; therefore, a high-pass filter might act as a perfectly good road operator. Finally, roads will generally be lighter or darker than the immediately adjacent terrain; why not simply use the actual intensity values (contrast enhanced and possibly inverted, depending on the relative brightness between the road and adjacent terrain)?
FIG. 3. Different road operators applied to the same scene. (a) Original image, (b) Duda road operator, (c) Roberts’ cross gradient, (d) Sobel-type gradient.
In Fig. 3 we show a comparison of these different techniques applied to the same road scene; in Fig. 4 the scores are thresholded to make explicit the locations in the image which are assigned the highest road presence likelihoods by the different techniques. Note that in contrast to the DRO, the other operators whose performance is depicted in Figs. 3 and 4 make few errors of omission, but tend to make classification errors that are spatially coherent. In the approach we have developed, a key attribute characterizing the utility of a “local” image feature detector (i.e., “operator”) is the percentage and coherence of its mistakes when it is almost certain it has found instances of the feature it is designed to detect. Even though the Duda road operator makes mistakes of omission, its performance in not making coherent false-alarm type errors is quite good.
FIG. 4. Different road operators applied to the same scene. (Operator scores are thresholded to highlight the locations assigned the best scores.) (a) Original image, (b) Duda road operator, (c) Roberts’ cross gradient, (d) Sobel-type gradient, (e) Hueckel line operator, (f) intensity.
C. Combining Incommensurate Sources of Knowledge: An Elaboration of the Basic Optimization Paradigm
We will now specify a general approach for combining the results deduced by the application of a set of (road) operators, and for introducing prior knowledge and constraints to influence the answer produced by the optimization algorithm.
We partition our inventory of operators into two categories: Type I operators, each of which can be adjusted to make very few coherent errors in detecting instances of the relevant feature when the feature is not present (possibly at the cost of making a large number of omission errors); and Type II operators, each of which can be adjusted to reliably give a quantitative indication of the presence of the feature when it is actually under examination (but these operators might be very unreliable in their assertions when examining something other than the desired feature). Our basic approach is to strongly bias (or even constrain) the desired answer to fit the coherent pattern produced by a superposition of evidence provided by all the Type I operators and to fill in the details locally, using that particular Type II operator which seems to be most certain that it has found the desired feature. (A more comprehensive discussion of methods for combining multisource evidence is given in Fischler and Garvey [6].)
A problem that immediately arises is how to combine the results of several Type I and Type II operators. By considering the output of Type I operators to be valid binary decisions, we have made them commensurate and can logically combine their outputs. In the context of tracking roads (or other linear features), we scan each of our Type I operators over some specified region of interest and create a binary overlay mask containing the logical union of the locations at which one or more of these operators has detected the presence of a road with high likelihood. An example of such a mask, called a “perfect road score” (PRS) mask, is compared in Fig. 5 with the road image from which it was derived.
FIG. 5. A scene and its perfect road score mask.
The problem of combining the results produced by a set of Type II operators has no acceptable solution when the values they return are not probabilities or other commensurate quantities. However, Type I and Type II operator scores can be combined, since a positive Type I output can always be set to the maximum value (zero cost) on the likelihood scale of any Type II operator. Our approach is thus to modify the cost array (CA) produced by each Type II operator so that zero cost is incurred at those locations marked in the PRS mask as places at which there is a very high likelihood of local road presence. The optimization algorithm is separately applied to each CA, and a best path through each such array is independently computed. Since there is no way to directly compare the relative quality of these alternative solutions, we employ the following heuristic: the average cost per pixel along the track in each Type II array is calculated. This average cost is normalized by determining its ranking in a histogram of costs obtained by the same operator in a region surrounding the road track. The track with the highest normalized ranking is selected as the primary road track through the given region.
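The combination steps just described can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`prs_mask`, `embed_prs`, `rank_of_track`), the nested-list array layout, and the exact definition of the "normalized ranking" (here, the fraction of surrounding-region costs that are no better than the track's average cost) are assumptions made for illustration.

```python
from bisect import bisect_left

def prs_mask(type1_costs, thresholds):
    """Logical union of the locations where any Type I operator reports
    a cost below its own threshold (hypothetical helper)."""
    rows, cols = len(type1_costs[0]), len(type1_costs[0][0])
    return [[any(op[r][c] < t for op, t in zip(type1_costs, thresholds))
             for c in range(cols)]
            for r in range(rows)]

def embed_prs(cost_array, mask, prs_cost=0.0):
    """Copy a Type II cost array, replacing the cost at every PRS
    location with prs_cost (zero cost, as in the text above)."""
    return [[prs_cost if marked else value
             for value, marked in zip(cost_row, mask_row)]
            for cost_row, mask_row in zip(cost_array, mask)]

def rank_of_track(track_costs, region_costs):
    """Assumed normalization: fraction of region costs that are greater
    than or equal to the track's average cost (higher is better)."""
    mean = sum(track_costs) / len(track_costs)
    ordered = sorted(region_costs)
    return 1.0 - bisect_left(ordered, mean) / len(ordered)

# Two hypothetical Type I operators on a 2x2 region, both thresholded at 3.
type1 = [[[5, 1], [9, 9]],
         [[9, 9], [9, 2]]]
mask = prs_mask(type1, [3, 3])
type2 = embed_prs([[7, 7], [7, 7]], mask)
```

Under this assumed ranking, the candidate track whose average cost sits lowest relative to its own operator's surrounding costs receives the highest rank, which is one way to compare tracks from operators whose raw scores are not commensurate.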
FIG. 6. Examples of how transforming Type II image operator scores (X) allows us to adjust the trade-off between road smoothness and placing the road track at its locally most probable location. (a) X’ = X^1 + 1, (b) X’ = X^2 + 1, (c) X’ = X^2 + 1, (d) X’ = X^2 + 2000.
1. Introducing A Priori Knowledge
In addition to embedding Type I information into the Type II cost arrays as a framework around which the road track will be constructed, constraints and preferences on shape, and on the balance between local and global information, can also be inserted by appropriate modification of the values in the Type II cost arrays. Adding a constant bias “b” to each cost value tends to smooth and straighten the road track (somewhat like pulling the path taut); this effect occurs because as the bias increases, the length of the track becomes relatively more important in comparison to the local quality as defined by the individual costs returned by an operator. Similarly, raising each cost to a power “a” introduces a very strong inhibition against going through a point having a low likelihood of being on a road (but can result in detouring around shadows, vehicles, etc.).
The above cost transformations provide a convenient means for introducing a priori knowledge into the optimization procedure. For instance, if we are tracking a ragged coastline, we would opt for placing the path through those locations having the best edge scores, as opposed to trying to smooth the result; here we would use a zero bias. However, if the linear feature was known to be fairly straight, a high bias would be appropriate. A bias could also be used where there is evidence that occlusion exists (e.g., due to clouds or to intervening objects), or where there is no significant contrast between the road and the adjacent terrain, to reduce the preference for one path over another. Raising costs to a power would be helpful in tracking a road through a region where other strong linear structures were known to exist, since jumps between features would be inhibited. Figure 6 shows some examples of tracking a road with modified costs obtained using the transformation cost’ = cost^a + b.
D. The Optimization Algorithm: Finding a Best Path Through an Array of Local Likelihoods
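Because the optimization minimizes a sum of per-pixel costs, the effect of the transformation cost’ = cost^a + b from the preceding subsection can be checked directly on path sums. The following Python sketch is illustrative only; the function names and the toy cost sequences are hypothetical and do not come from the paper.

```python
def transform(cost, a=1.0, b=0.0):
    """Apply cost' = cost**a + b to a single cost value."""
    return cost ** a + b

def path_cost(costs, a=1.0, b=0.0):
    """Sum of transformed costs along a candidate track."""
    return sum(transform(c, a, b) for c in costs)

# Hypothetical candidates between the same two end points.
winding = [1, 1, 1, 1, 1, 1]   # six pixels that follow the best local scores
direct = [3, 4]                # two pixels with poorer local scores

# With no bias the winding track is cheaper (6 vs 7); a bias of 2 per
# pixel makes track length dominate and the direct track wins (18 vs 11).
unbiased = (path_cost(winding), path_cost(direct))
biased = (path_cost(winding, b=2), path_cost(direct, b=2))

# Raising costs to a power penalizes a single unlikely pixel strongly.
through_bad_pixel = [1, 7, 1]
around_it = [2, 2, 2, 2, 2]
linear = (path_cost(through_bad_pixel), path_cost(around_it))
squared = (path_cost(through_bad_pixel, a=2), path_cost(around_it, a=2))
```

In this toy case the bias flips the preference toward the shorter track, and squaring the costs flips the preference toward the detour, matching the qualitative behavior described above.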
The result of applying Type I and II operators to an image, and combining the resulting scores via the methods described in the preceding section, is an array of positive costs inversely related to the local likelihood of the presence of a road at the corresponding image location. We now consider the problem of finding a path through this array such that the sum of the costs along this path is minimized. We will initially assume that specific starting and terminating points are given and later show how this constraint can easily be removed.
The connectivity relationships between adjacent pixels in an image array define a graph: the image pixels correspond to vertices, each of which is connected by a directed weighted arc to its eight immediately adjacent array neighbors. If the weight associated with each arc² represents the incremental cost of a road track passing through the vertex (pixel) to which it points, then the problem of finding an optimal road track is equivalent to finding a minimum cost path through this graph.
² Since the incremental cost incurred by adjoining a specific pixel to a given track is simply the cost attached to that pixel and is independent of any characteristics of the other path members, all arcs entering a given vertex will generally have equal weights. However, arcs from diagonally located neighbors may be scaled by a factor of sqrt(2) to reflect the additional physical distance to these neighbors.
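The graph structure described above can be made concrete with a short Python sketch. The function name `arcs_into`, the 0-based (row, column) indexing, and the `scale_diagonals` flag are assumptions for illustration; the weight of every arc is the cost of the pixel it points to, optionally scaled by sqrt(2) for diagonal neighbors.

```python
import math

def arcs_into(cost, r, c, scale_diagonals=False):
    """Yield (source_pixel, weight) for each arc entering pixel (r, c).

    The weight is the cost attached to (r, c) itself, so all arcs
    entering a vertex are equal unless diagonal scaling is enabled.
    """
    rows, cols = len(cost), len(cost[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                weight = cost[r][c]
                if scale_diagonals and dr != 0 and dc != 0:
                    weight *= math.sqrt(2)
                yield (rr, cc), weight

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
center_arcs = list(arcs_into(grid, 1, 1))   # eight arcs, each of weight 5
corner_arcs = dict(arcs_into(grid, 0, 0, scale_diagonals=True))
```

Border pixels simply have fewer incoming arcs, so no special padding of the array is needed in this sketch.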
There is a large body of literature describing various algorithmic solutions to the minimum path problem (e.g., see the excellent survey by Dreyfus [3]), and thus our main concern here is algorithmic efficiency under the specific connectivity constraints specified above. Two alternatives were considered: the A* algorithm presented in Duda and Hart [4] (a generalization of the Moore [13] and Dijkstra [2] algorithms) and used by Martelli [11] for edge tracking; and a variation of an algorithm originally described by Ford [7] (and related to the distance transform described by Rosenfeld [16]), which we will call the F* algorithm.
1. The A* Algorithm
The A* algorithm iteratively constructs a minimal cost path from some “start” node s to a “goal” node g by extending the best partial path available at each step of the iteration (i.e., a best first search). The vertex u selected for “expansion” at each step is chosen according to an evaluation function f(u), which is an estimate of the cost of a minimal cost path from s to g constrained to go through u. The evaluation function f(u) can be expressed as

    f(u) = f1(u) + f2(u),

where f1(u) is the lowest cost path from s to u found so far, and f2(u) is an estimate of the cost of a minimal cost path from u to g (f1 is initialized to infinity for all vertices other than s). For optimality, f2 must be a lower bound on the true cost. It can be set equal to zero if no better estimate is available; when f2 is identically zero for all vertices, A* and the Dijkstra algorithm are also identical.
If f2 is identically zero (the Dijkstra algorithm) or satisfies the “consistency” condition, which requires that the difference between the estimated costs to the goal vertex g from any pair of vertices u1 and u2 must be a lower bound on the true cost of an optimal path from u1 to u2, then it can be shown that when A* expands a vertex u, an optimal path from s to u has been found. Thus, since at each iteration (assuming we employ an f2 which satisfies the consistency condition) A* determines the true minimal path length from s to some new vertex u, it will converge in fewer than N iterations (where N is the number of vertices in the graph being searched).
After each iteration, we examine those vertices still not having final values for f1 and select that vertex u’ with the smallest current value of f as the next vertex to be expanded: the value of f1 at u’ cannot be further reduced and is thus labeled as final; each of the eight neighbors of u’ is examined to see if a reduced value of f1 at these vertices is possible due to a path from s to the neighbor through u’. The algorithm terminates when u’ = g, and an optimal path can be found by starting at g and iteratively moving to the neighboring vertex with the lowest value³ of f1 until vertex s is reached.
An average of N/2 comparisons are needed to find u’ after each iteration if we examine all the vertices which still do not have final values.
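The search just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: the name `a_star_grid` is hypothetical, pixels are 0-based (row, column) tuples, the start pixel's own cost is counted in f1, a binary heap from the standard `heapq` module selects the vertex with the smallest f, and the path is recovered from stored predecessor pointers instead of by comparing neighboring f1 values. With the default f2 of zero the procedure reduces to the Dijkstra algorithm.

```python
import heapq

def a_star_grid(cost, start, goal, f2=lambda v: 0.0):
    """Minimum-cost 8-connected path on a grid of positive costs.

    f2 must be a consistent lower bound on the remaining cost;
    the default (zero) gives the Dijkstra algorithm.
    """
    rows, cols = len(cost), len(cost[0])
    f1 = {start: cost[start[0]][start[1]]}   # best known cost from start
    parent = {}
    final = set()                            # vertices already expanded
    heap = [(f1[start] + f2(start), start)]
    while heap:
        _, u = heapq.heappop(heap)
        if u in final:
            continue                         # stale heap entry
        final.add(u)
        if u == goal:
            break
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                v = (u[0] + dr, u[1] + dc)
                if v == u or v in final:
                    continue
                if not (0 <= v[0] < rows and 0 <= v[1] < cols):
                    continue
                candidate = f1[u] + cost[v[0]][v[1]]
                if candidate < f1.get(v, float("inf")):
                    f1[v] = candidate
                    parent[v] = u
                    heapq.heappush(heap, (candidate + f2(v), v))
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return f1[goal], path[::-1]

# Hypothetical cost array: a high-cost wall forces the track to double back.
grid = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
total, track = a_star_grid(grid, (0, 0), (0, 2))
```

Rather than repositioning entries when a vertex is updated, this sketch pushes a new heap entry and discards stale ones when they are popped, a common simplification.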
If we are willing to maintain an ordered list of the values of f for those vertices which have already been updated at least once, but still do not have final values, then the average number of comparisons is on the order of log(N): u’ is removed from the top of the list, and each of its eight neighbors must either be inserted or repositioned in the list if they undergo an update of their f value. Thus, A* can require on the order of N^2 operations (additions and comparisons) for the simpler version of the algorithm; and it can require on the order of N log N operations for a worst case situation using the ordered list approach.
³ As long as all the costs are greater than zero, ties can be resolved arbitrarily; if zero costs are allowed, then “looping” can result.
When the optimum path is clearly distinguished from any of the alternatives, A* is very efficient and can have a computational requirement proportional to the number of vertices along the optimal path. However, for real imagery and currently available operators, the number of alternatives which must be examined grows very quickly, making A* undesirable from a computational standpoint.
2. The F* Algorithm
The F* algorithm requires the designation of a start vertex s, a goal vertex g, and a cost array C. The first iteration of the algorithm involves a top-to-bottom, row-by-row updating of a path array P (which conceptually overlays C), in which all values were initialized to infinity except the start vertex, which has an initial value equal to its cost. The elements of the ith row of P are subject to two adjustments per iteration. First, all the elements of the ith row are iteratively adjusted from left to right according to the rule

    P(i,j) = min[ P(i-1,j-1) + c(i,j);
                  P(i-1,j)   + c(i,j);
                  P(i-1,j+1) + c(i,j);
                  P(i,j-1)   + c(i,j);
                  P(i,j) ].                          (1)

Next, all elements of the ith row of P are iteratively adjusted from right to left according to the rule

    P(i,j) = min[ P(i,j+1) + c(i,j);  P(i,j) ].      (2)
All additional iterations (passes), when required, alternate between a bottom-to-top pass (with the row indexing reversed so that the bottom row corresponds to i = 1), followed by a top-to-bottom pass (with normal row indexing) using updating rules (1) and (2) given above. It can easily be shown that if the row indices for the elements along the optimal path increase monotonically, only a single pass through P is needed to assure convergence. In general, finding an optimal path from s to g will require a number of passes equal to the number of row index “reversals” along such a path; if the needed number of passes is not known in advance, then the algorithm can terminate when in a complete pass there is no element of P that had a change to a new path cost less than the current path cost assigned to g. As with A*, the optimal path can be found by a backtrace, starting at g and iteratively moving to the neighbor with the smallest value of P until vertex s is reached.
For each iteration, the number of operations (comparisons and additions) required for F* is of order 5N, and since the number of direction reversals for roads and other linear structures generally tends to be small along the nominal direction of travel, F* is computationally more attractive than A* for tracking such structures. (In an interactive setting, the monotonicity condition can be insured by proper partitioning of the image, and thus only a single iteration is required per partition.) For the general case, F* insures that in each iteration at least one element of P achieves its final value, and thus F* can require on the order of N^2 operations.
We note that for both A* and F* we can find the optimal path from a set S of start vertices, rather than a single vertex, by connecting a pseudo-start vertex s to each member of S by a zero cost path; we can specify a set of goal vertices G, rather than just a single vertex g, by an identical mechanism.
An example showing the operation of the F* algorithm is presented below:
[Cost Array C, start vertex s (1,2), goal vertex g (5,6).]
[Path Cost Array P after the first iteration (top-to-bottom pass).]
[Path Cost Array P after the second iteration (bottom-to-top pass); elements with updated costs are marked by a following asterisk.]
[Path Cost Array P after the third iteration (top-to-bottom pass).]
[Path Cost Array P after the fourth iteration (bottom-to-top pass); elements on the minimum cost path are preceded by a minus sign.]
The algorithm terminated after the fourth pass because the only element changed (2,7) had a higher path cost (12) than the current path cost of g (10).
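The pass structure of F* can also be followed in code. The Python sketch below is an illustrative reading of updating rules (1) and (2) and of the termination test stated above, not the authors' program: the name `f_star`, the 0-based (row, column) indexing, and the small cost array are assumptions, and the array is a hypothetical one rather than the array of the paper's example.

```python
INF = float("inf")

def f_star(C, start, goal):
    """Alternating row sweeps over path array P; returns (cost, path, passes)."""
    rows, cols = len(C), len(C[0])
    P = [[INF] * cols for _ in range(rows)]
    P[start[0]][start[1]] = C[start[0]][start[1]]
    passes, downward = 0, True
    while True:
        passes += 1
        useful = False   # did any element drop below the current P(goal)?
        order = range(rows) if downward else range(rows - 1, -1, -1)
        for i in order:
            prev = i - 1 if downward else i + 1   # row processed just before
            for j in range(cols):                 # rule (1), left to right
                best = P[i][j]
                if 0 <= prev < rows:
                    for jj in (j - 1, j, j + 1):
                        if 0 <= jj < cols:
                            best = min(best, P[prev][jj] + C[i][j])
                if j > 0:
                    best = min(best, P[i][j - 1] + C[i][j])
                if best < P[i][j]:
                    useful = useful or best < P[goal[0]][goal[1]]
                    P[i][j] = best
            for j in range(cols - 2, -1, -1):     # rule (2), right to left
                best = P[i][j + 1] + C[i][j]
                if best < P[i][j]:
                    useful = useful or best < P[goal[0]][goal[1]]
                    P[i][j] = best
        if not useful:
            break
        downward = not downward
    # Backtrace: repeatedly move to the neighbor with the smallest P.
    path = [goal]
    while path[-1] != start:
        r, c = path[-1]
        nbrs = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols]
        path.append(min(nbrs, key=lambda v: P[v[0]][v[1]]))
    return P[goal[0]][goal[1]], path[::-1], passes

# Hypothetical cost array in which the best track runs down and back up.
C = [[1, 9, 1],
     [1, 9, 1],
     [1, 1, 1]]
cost_g, track, n_passes = f_star(C, (0, 0), (0, 2))
```

On this array the first (top-to-bottom) pass reaches the goal only through the expensive column, the second (bottom-to-top) pass finds the cheaper track that doubles back, and a third pass changes nothing, so the sweep stops.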
E. A Low-Resolution Road Tracking Algorithm
The ideas presented in preceding sections have been synthesized into an algorithm for precisely tracking the major linear structure visible in a delimited region of an image. This algorithm, known as LRRT, takes as input an image array, a search region defined by a binary mask, and constraints on starting and ending coordinates of the track (defined by two specific regions through which the road is constrained to pass, such as the sides of the search window or a pair of boxes). Algorithm LRRT operates as follows:
(1) A selected set of Type I operators is scanned over the region designated by the search mask; and the costs produced by each such operator are histogrammed and thresholded at some preset level (generally operator dependent), so that the number of points below this threshold will not exceed the number of road points estimated to be present in the search window. Selecting 5% of the points in the search window is a typical upper limit for the Duda Road Operator. A PRS mask is generated as the union of those locations at which each Type I operator returns a cost lower than its associated threshold.
(2) A selected set of Type II operators is scanned over the region designated by the search mask. The scores for each operator are self-normalized and stored in individual cost arrays. (Normalization is accomplished by separately histogramming the response of each operator and converting the raw values to percentile ranks on a scale of 1-100.)
(3) The costs in each Type II array are modified using the transform cost’ = cost^a + b, described earlier, to incorporate a priori knowledge and constraints on road smoothness, visibility, and presence of interfering structures.
(4) Costs in Type II arrays at locations designated in the PRS mask are set to a very small positive value. (True zero values could lead to arbitrary wandering, or even cycling, through regions of zero cost.)
(5) Each Type II cost array is considered to be a graph with each pixel connected to each of its eight neighbors. A minimum cost path is found in each such array between the starting and terminating delimiters, using algorithm F*. The average cost per pixel along the track in each Type II array is computed and self-normalized by ranking it in a histogram of costs for that operator compiled over the original search region. The track with the highest rank is chosen as the preferred track.
F. Extensions of Algorithm LRRT to Delineation of Complete Road Networks
Algorithm LRRT can be extended to sequentially delineate all road segments contained in a specified search region of an image. This delineation is obtained by making multiple passes through the Type II cost arrays with algorithm F*. After each pass, a ribbon centered on the detected road track is marked as a forbidden area to allow the next most prominent road segment to be detected. Marking is accomplished by adding a large cost to every pixel located within some number (currently three) of pixels of any point on the original track; a distance propagation algorithm described in Rosenfeld [16] is used for this purpose. This approach will delineate all roads connecting the original start and stop delimiters and can be used, for example, to delineate all paths between opposite sides of a rectangular search region; the process can then be applied to find all paths connecting the alternate opposing sides of the search rectangle.
Since algorithm F* computes the cost from a given starting delimiter to all locations in the search region, road segments branching out from a primary path can often be found by backtracing from alternate stopping points. In particular, a complete road network can often be found by backtracing from local cost minima along the periphery of the original search region.
3. IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS
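As one concrete implementation detail, the ribbon-marking step used when LRRT is extended to road networks (Section 2.F) can be sketched in Python. This is a simplified stand-in, not the distance propagation algorithm of Rosenfeld [16]: the name `forbid_ribbon`, the brute-force search, the chessboard (Chebyshev) distance, and the penalty value are all assumptions for illustration.

```python
def forbid_ribbon(cost, track, half_width=3, penalty=1e6):
    """Return a copy of a cost array with a large penalty added to every
    pixel within half_width pixels (chessboard distance, an assumption)
    of any point on the detected track."""
    rows, cols = len(cost), len(cost[0])
    marked = [row[:] for row in cost]
    for r in range(rows):
        for c in range(cols):
            if any(max(abs(r - tr), abs(c - tc)) <= half_width
                   for tr, tc in track):
                marked[r][c] += penalty
    return marked

# Hypothetical 3x7 cost array with a one-pixel track at its center.
cost = [[1] * 7 for _ in range(3)]
marked = forbid_ribbon(cost, [(1, 3)], half_width=1, penalty=100)
```

Re-running the path search on the marked array then steers the next track away from the ribbon; the brute-force loop costs on the order of (pixels × track length) operations, which a distance transform avoids.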
Algorithm LRRT has been embedded in a complete system for delineating roads in aerial imagery. System operation is partitioned into three phases: initialization, tracking, and smoothing.
The objective of the initialization phase is to select the search region within which LRRT will be applied and to constrain the starting and ending points. The search region can be automatically specified from a map data base in approximate registration with the image. Alternatively, the search region can be specified interactively on a graphic display.⁴ In either case, the desired search region can be specified explicitly or derived from a sketch of the road path. (A sketch is expanded into a search region using the distance transform referenced in Rosenfeld [16].) In an approach we are currently developing (to be described in detail in a later paper), it should be possible to obtain a fairly good sketch of the major road segments depicted in an arbitrarily large image without any manual intervention or data base knowledge. The approach is first to obtain a PRS mask for the entire image, and then extract clusters of marked points from this mask which lie within some maximum distance of their nearest neighbor. Each such cluster is used to generate a minimum spanning tree, and the major branches of the tree are taken to be the approximate road tracks (i.e., sketches). An example of this process is presented in Fig. 7.
Algorithm LRRT is used in the tracking phase to obtain a precise delineation for each road segment derived in the initialization phase. The resulting path may bridge small occlusions and have gaps in regions of significant occlusion. The smoothing phase explicitly marks those portions of a road track that were inferred from continuity, rather than direct visibility, and links separated segments of the same road network.
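The sketch-extraction idea outlined above (cluster the PRS points, build a minimum spanning tree per cluster, keep its major branch) can be illustrated with the following Python sketch. The paper defers the details to a later publication, so everything here is an assumption: the function names, the use of Euclidean distance, Prim's algorithm for the spanning tree, and taking the "major branch" to be the longest path through the tree.

```python
import math
from collections import defaultdict

def clusters(points, max_gap):
    """Group points so that each lies within max_gap of another group member."""
    unvisited, groups = set(points), []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            p = frontier.pop()
            near = [q for q in unvisited if math.dist(p, q) <= max_gap]
            unvisited.difference_update(near)
            group.extend(near)
            frontier.extend(near)
        groups.append(group)
    return groups

def minimum_spanning_tree(points):
    """Prim's algorithm; returns {point: [(neighbor, length), ...]}."""
    tree = defaultdict(list)
    best = {p: (math.dist(points[0], p), points[0]) for p in points[1:]}
    while best:
        p = min(best, key=lambda q: best[q][0])
        d, parent = best.pop(p)
        tree[parent].append((p, d))
        tree[p].append((parent, d))
        for q in best:
            dq = math.dist(p, q)
            if dq < best[q][0]:
                best[q] = (dq, p)
    return tree

def farthest(tree, root):
    """Return (length, path) from root to the tree vertex farthest from it."""
    far, stack = (0.0, [root]), [(root, None, 0.0, [root])]
    while stack:
        v, parent, d, path = stack.pop()
        if d > far[0]:
            far = (d, path)
        for w, length in tree[v]:
            if w != parent:
                stack.append((w, v, d + length, path + [w]))
    return far

def longest_path(tree, any_vertex):
    """Longest path in a tree: two farthest-vertex searches."""
    _, path = farthest(tree, any_vertex)
    return farthest(tree, path[-1])

# Hypothetical PRS points: a short vertical streak plus one isolated point.
points = [(0, 0), (0, 1), (0, 2), (0, 3), (10, 10)]
groups = sorted(clusters(points, max_gap=1.5), key=len)
tree = minimum_spanning_tree(groups[-1])
length, sketch = longest_path(tree, groups[-1][0])
```

The two-search trick works because, in a tree with positive edge lengths, the vertex farthest from any starting vertex is always an end point of a longest path.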
⁴ These two methods were used to obtain the experimental results discussed later and illustrated in Fig. 8; widths of typical search regions were 20-30 times road width, and lengths were often equal to the full image dimension.

While we have addressed the problems associated with each of the above phases for automatically delineating the low-resolution roads and linear structures in an image, most of our current experimental work has been concerned with obtaining a high-performance solution to the problem of precise delineation required in phase two. We have implemented two versions of algorithm LRRT: an INTERLISP/SAIL version for developmental work and a FORTRAN version for more extensive experimentation and evaluation. Both versions run on the SRI PDP-10; the FORTRAN version is also compatible with a CDC 6400 system at the U.S.
Army Engineer Topographic Lab (ETL) at Ft. Belvoir. The FORTRAN version has a minimum core requirement of 20,000 60-bit words and will track a road segment 128 pixels long in 15 sec of CPU time; the corresponding numbers for the INTERLISP version are 90,000 36-bit words of core and 60 sec of CPU time.

The FORTRAN version of the LRRT makes some additional assumptions about the roads to be tracked: it assumes that they are generally lighter or darker than the surrounding terrain and that they do not “double back” on themselves in the designated search areas. It uses the Fc algorithm, a single Type II operator (based on histogram normalized image intensity), and two Type I operators (the Duda Road Operator and an image intensity operator which thresholds image intensity and also checks whether the width of the above-threshold region at a potential road point is sufficiently narrow).⁵ This program has been tested on approximately 50 road segments found in aerial images of seven different geographic locations with no failures, when the assumptions are satisfied and the roads are clearly visible (some examples are shown in Fig. 8).

⁵ Additional details of the particular variants of the Duda and intensity operators used in the FORTRAN LRRT are described in the Appendix.

FIG. 7. Automatic road segment extraction. Resulting sketch delimits search areas for precise delineation using algorithm LRRT. (a) Intensity image of road scene. (b) Perfect road score mask (PRS) of image (derived from thresholded intensity, Duda road operator scores, and Rosenfeld-Thurston nonlinear operator scores described in [17-19]). (c) Largest cluster of points (Cluster 1) extracted from PRS (each point is within 10 pixels of some other cluster point). (Cluster is truncated at bottom edge due to program storage limitations.) (d) Minimum spanning tree for Cluster 1. (e) Maximum path through minimum spanning tree for Cluster 1. (f) Minimum spanning trees for all clusters of PRS. (g) Maximum paths through minimum spanning trees for all clusters. (h) Maximum paths with length greater than 60 points for all clusters.

FIG. 8. Examples of road delineation produced by the low-resolution road tracker.

4. CONCLUDING COMMENTS

In this paper we have addressed the problem of precise delineation of the roads and linear features appearing in aerial photographs, using an approach based on global optimization of locally evaluated evidence. Since there does not appear to exist a single coherent model suitable for reliable detection of local road presence, it was essential that some means for integrating information from multiple (incommensurate) image operators and knowledge sources be devised; the conventional optimization paradigm does not provide any formal machinery for achieving this task.

Two key points characterize the basis of our approach:

(1) Rather than projecting all image operators on a single linear scale and attempting to use them in the same qualitative manner, we have identified the distinctly different nature and potential use of operators which have strong object detection capabilities, as opposed to those which are useful for object analysis once identification and/or location is known. (Depending on the specific context, a particular operator might switch from one role to the other.) We have provided a simple and uniform mechanism for integrating the information provided by the two classes of operators for the specific task of tracking linear structures, and we believe that the same general approach is applicable in a wider range of problem settings.

(2) We have recognized that the score returned by an image operator usually has little absolute meaning, and yet a monotonic transformation of this score can lead to a significantly different final result in tracking linear structures. We have capitalized on this property by introducing a monotonic transform which provides a simple and uniform mechanism for adjusting the scores to reflect a priori information and semantic constraints.

Our plans for future work include the continuing development of fully automated techniques for road tracking, extension of these techniques to detection and classification of different types of linear structures (e.g., rivers, railroads, runways, etc.), and development of techniques for tracking linear structures in three dimensions using stereo image pairs.
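Point (2) can be illustrated with a small numerical sketch in Python. The two candidate paths, their scores, and the function names are invented for illustration and are not taken from the experiments; scores are treated as costs (low is good), and the figure of merit of a path is assumed here to be the sum of its per-pixel costs.

```python
def path_cost(scores, transform=lambda s: s):
    """Sum of per-pixel costs after applying a score transform."""
    return sum(transform(s) for s in scores)


def best_path(paths, transform=lambda s: s):
    """Name of the candidate path with the lowest total transformed cost."""
    return min(paths, key=lambda name: path_cost(paths[name], transform))


# Hypothetical candidates between the same two end points.
paths = {
    "direct": [6, 6],           # short route over two mediocre pixels
    "detour": [3, 3, 3, 3, 3],  # longer route over uniformly good pixels
}

# Raw scores: direct costs 12, detour costs 15, so the direct route wins.
print(best_path(paths))
# Squaring is monotonic on positive scores, yet it reverses the choice:
# direct now costs 72 and detour costs 45.
print(best_path(paths, transform=lambda s: s * s))
```

The ordering of individual pixel scores is unchanged by the transform; only the trade-off between path length and per-pixel quality shifts, which is what makes such a transform usable as a control for injecting prior knowledge.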
The scientific content of this work lies in discovering effective models for representing and detecting the linear structures of interest and developing paradigms for integrating information from the wide variety of knowledge sources available to the human observer whose performance we are attempting to equal or surpass. Applications of our work include road monitoring for planning and intelligence purposes, delineation of roads and linear features for automated cartography, and detection of roads and linear features as landmarks for autonomous navigation.

APPENDIX
Implementation Details of the Duda and Intensity Operators

1. The Duda Road Operator (DRO)
The essential details of the DRO are shown in Fig. A1. The masks at the top of the figure measure uniformity of intensity along a potential road track (a1, a2, a3) and contrast of this potential track with the adjacent terrain (b1, b2, b3; c1, c2, c3). Note that terrain intensity is sampled at a one-pixel offset from the assumed track to allow for slight variations in road width. Scoring function G(u) is designed to ignore slight variations in intensity along the road and to penalize all significant
variations equally; similarly, scoring function F(u) rewards all large contrasts equally. The unspecified parameters in Fig. A1 were assigned the following experimentally optimized values for an intensity range of 1-127: M = 1.5; ei = 5; e = 15; 82 = 20; e = 0.1.
FIG. A1. Description of Duda road operator. [Figure not reproduced; recoverable labels: "Mask to detect right diagonal road segments" and a plot of F(u) with level M.]
When used as a Type I operator, all the scores produced by the DRO over some region of interest in the image are histogrammed, and the best N percent of the scores are interpreted as producing positive Type I responses. N is determined either by the nominal upper bound of 1-2% for images where road content is initially unknown or by an a priori estimate of the number of road pixels in the segment of the image to be analyzed. For example, when tracking a road whose average width is two pixels and assuming that this road is known to lie somewhere in a band 40 pixels wide, we would set N at 3-4%.

2. The Intensity Operator
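Both appendix operators turn raw scores into Type I responses in the same way, by keeping a best fraction of the scores in the region of interest; the rule just described for the DRO can be sketched as follows. The sketch is an assumption-laden illustration rather than the FORTRAN implementation: the function names are hypothetical, sorting is used in place of an explicit histogram, and whether "best" means lowest or highest is left as a parameter because it depends on the operator. For the worked example in the text, a two-pixel road in a 40-pixel band occupies 2/40 = 5% of the band, which bounds the useful value of N from above.

```python
def road_pixel_percent(road_width, band_width):
    """Percentage of a search band occupied by a road of the given width."""
    return 100.0 * road_width / band_width


def type1_responses(scores, percent, lower_is_better=True):
    """Flag the best `percent` of scores as positive Type I responses.

    Ties at the cutoff are all kept, so slightly more than `percent`
    of the points may be flagged.
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=not lower_is_better)
    keep = max(1, int(len(ranked) * percent / 100.0))
    cutoff = ranked[keep - 1]
    if lower_is_better:
        return [s <= cutoff for s in scores]
    return [s >= cutoff for s in scores]


# Invented data: 100 distinct scores, lowest taken as best.
scores = list(range(1, 101))
flags = type1_responses(scores, percent=road_pixel_percent(2, 40))
print(sum(flags), "of", len(scores), "points flagged")
```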
The intensity operator looks for a narrow region of high brightness at a potential road point (or low brightness for dark roads). For dark roads, the score returned by the operator is just the intensity value scaled to a range of 1-127. For light roads, intensities are subtracted from 128 after scaling, so that, in both cases, low values (costs) correspond to likely road points. If at a given point the “width” condition described below is not satisfied, then the score at that point is degraded by adding a constant (25 in our current implementation) to it. Since the width test is applied directionally, the operator itself is directional.

When used as a Type I operator, the scaled intensity values in the entire search region are histogrammed. To produce a positive Type I response, intensity at a potential road point must satisfy the additional condition of being among the lowest K% of histogram values. K is determined in a manner analogous to N (used in the DRO) based on the expected number of road points which can be detected by the operator; typical values range from 5-10%.

The width condition at a point is evaluated by looking at the brightness scores along a line normal to an assumed local road direction at that point. We require that (for a maximum road width of three pixels) in an interval of 15 pixels centered on the point, there should not exist, in a contiguous sequence of five points, more than three points with intensity scores in the lowest (best) K%.

ACKNOWLEDGMENTS
The authors wish to acknowledge the important contributions made by Harry G. Barrow, Richard O. Duda, and Thomas D. Garvey in the formative stages of the work described in this paper.

REFERENCES

1. H. G. Barrow and J. M. Tenenbaum, The Representation and Use of Knowledge in Vision, AI Technical Note 108, SRI International, Menlo Park, California, July 1975.
2. E. W. Dijkstra, A note on two problems in connection with graphs, Numer. Math. 1, 1959, 269-271.
3. S. E. Dreyfus, An appraisal of some shortest-path algorithms, Operations Res. 17, No. 3, May-June 1969, 395-412.
4. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
5. M. A. Fischler and R. A. Elschlager, The representation and matching of pictorial structures, IEEE Trans. Computers C-22, No. 1, 1973, 67-92.
6. M. A. Fischler and T. D. Garvey, A computer-based approach to perceptual reasoning and multisensor integration, in preparation.
7. L. R. Ford, Jr., Network Flow Theory, The Rand Corporation, P-923 (August 1956).
8. P. Hart, N. Nilsson, and B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybern. SSC-4, No. 2, July 1968, 100-107.
9. M. Hueckel, An operator which locates edges in digitized pictures, J. Assoc. Comput. Mach. 18, No. 1, January 1971, 113-125.
10. M. Hueckel, A local visual operator which recognizes edges and lines, J. Assoc. Comput. Mach. 20, No. 4, October 1973, 634-647.
11. A. Martelli, An application of heuristic search methods to edge and curve detection, Comm. ACM 19, No. 2, February 1976.
12. U. Montanari, On optimal detection of lines in noisy pictures, Comm. ACM 14, No. 5, May 1971.
13. E. Moore, The shortest path through a maze, Proc. Intern. Symp. Theory Switching, Part II, The Annals of the Computation Laboratory of Harvard University 30, Harvard University Press, Cambridge, Mass., 1959.
14. L. Quam, Road tracking and anomaly detection, in Proc. Image Understanding Workshop, pp. 51-55 (May 1978).
15. L. G. Roberts, Machine perception of three-dimensional solids, in Optical and Electro-Optical Information Processing (Tippett et al., Eds.), pp. 159-197, MIT Press, Cambridge, Mass., 1965.
16. A. Rosenfeld and J. L. Pfaltz, Sequential operations in digital picture processing, J. Assoc. Comput. Mach. 13, No. 4, 1966, 471-494.
17. A. Rosenfeld and M. Thurston, Edge and curve detection for visual scene analysis, IEEE Trans. Computers C-20, No. 5, 1971, 562-569.
18. S. Rubin, The ARGOS Image Understanding System, Ph.D. thesis, Dept. of Computer Science, Carnegie-Mellon University, Pittsburgh, Penn., 1978.
19. G. J. Vanderbrug, Semilinear line detectors, Computer Graphics and Image Processing 4, No. 3, September 1975, 287-293.
20. G. J. Vanderbrug, Line detection in satellite imagery, IEEE Trans. Geosci. Electron. GE-14, No. 1, 1976, 37-44.