COMPUTER GRAPHICS AND IMAGE PROCESSING (1972) 1, (386-393)
Some Experiments in Scene Analysis and Scene Regeneration Using COMPAX

V. S. N. REDDY AND R. NARASIMHAN

Tata Institute of Fundamental Research, Homi Bhabha Road, Bombay 5, India
It was demonstrated in an earlier paper that picture processing languages like COMPAX and PAX II offer a powerful framework for the specification and generation of scenes with simple semantic content. The converse problem of analyzing such scenes is considered in this paper. It is shown that the framework provided by COMPAX and PAX II continues to be of great value in picture parsing and identification of individual objects in a scene, as well as in the regeneration of the scene after individual objects are removed from it. The scenes considered are isometric projections of configurations of rectangular blocks.

1. INTRODUCTION
1.1. In an earlier paper we described some experiments in the computer generation of 2D representations of 3D scenes with ‘simple semantic content’. Simplicity of semantic content refers to the constraint that the objects, as well as their configurations, in the scene have a simple structure; that is, only certain delimited types of attributes and relations enter into their specification. We argued there that scenes, simple in the above sense, should be capable of being computer generated without the computer being called upon to perform involved computations of the type associated with complex problem-solving or theorem-proving programs. It was shown that picture processing languages like COMPAX and PAX II are especially well-suited for this task. The power of these languages lies in the ease with which certain kinds of attribute and relationship computations can be carried out; for example, relations such as: in front of, behind, above, below, inside, outside, overlapping, separate, etc.; and attributes such as: line-like, blob-like, curve-like, and so forth. We suggested that it would be appropriate to refer to such computations as perceptual level computations and argued that scenes with simple semantic content should, by and large, require only perceptual level computations for their generation and analysis.

1.2. Our earlier paper was restricted to the generation of isometric projections of scenes composed of rectangular parallelepipeds (referred to as “blocks” henceforth). The blocks are assumed to stand on the floor or on top of other blocks. No block is allowed to lean on another block for support. In other words, it is assumed that the undersurfaces of the blocks are always horizontal. It is also assumed that all the blocks are oriented, in plan, the same way. That is, the sides of all the blocks are parallel. The principal computational problem associated with the generation of such scenes is the hidden-line removal problem. Figure 8 shows typical computer-generated scenes.

In this paper we shall consider the converse problem of analyzing a given such scene into its individual blocks. Specifically, we shall be interested in reconstructing the scene after a block that has been recognized is removed. It is clear that this reconstruction problem does not always have a unique solution, since the true dimensions of blocks partially hidden earlier may not be computable uniquely. However, syntactic constraints always determine what are admissible reconstructions. Our interest is in reconstructing scenes that always satisfy these admissibility constraints.

It should also be reasonably clear that the analysis and reconstruction problem is bound to be intrinsically much more difficult than the generative one. It may not always be possible to tackle it completely at the perceptual level. Problem-solving techniques (i.e., cognitive-level computations) may often have to be invoked. Nevertheless, it is our thesis that for simple scenes, the computational power inherent in COMPAX-like (PAX-like) languages should prove adequate; and for more complex scenes, the analyzed outputs generated by such languages should prove to be of very great value for higher level processing. The analysis procedures described in the rest of this paper should lend support to these assertions. Familiarity with COMPAX or PAX II is assumed on the part of the reader; otherwise, see the Appendix in [4] or [3].

2. SCENE ANALYSIS

2.1. Extraction of Basic Information from the Scene
Consider a typical scene as in Fig. 1. The scene contains three kinds of distinctive features, namely, straight lines of three different slopes (referred to as Vertical, L-slant, and R-slant lines), vertices of individual blocks, and points shared by the edges of two distinct blocks. Metric properties of individual blocks are readily computed from their edge lengths; vertices provide location information of the blocks in the scene; the shared points, in their turn, serve to delimit hidden lines and coincident edges in the scene. These aspects are clearly brought out in Fig. 2, which is the labeled version of the scene shown in Fig. 1. In the rest of this subsection we shall describe the manner in which this labeled information is separated out in distinct COMPAX planes for later use.

Vertical lines are labeled by first marking points, three or more in chain length, in the north-south direction and extending these labels along these directions to include initially unmarked points. The vertical lines so labeled are separated out in a separate plane named ‘VLINES’. Analogously, L- and R-slant lines are separated into two other planes, ‘LLINES’ and ‘RLINES’, respectively.

Vertices are defined as points where two or more of the labeled V-, L-, or R-lines meet. They are readily identified and assigned distinct labels according to their type using ‘AND’ and ‘BOOFUN’ operations. As shown in
Fig. 3, each physical vertex of a block in the isometric projection is identified by a conventionally assigned tag V1, V2, . . . , V7. Syntactically, the vertices of a block are classified into three types: L-type (V2, V3, V7); Arrow-type (V1, V4, V6); and Y-type (V5). These classifications conform to those used by others (see, e.g., [1], which is based on the earlier work of Guzman [2]). Vertices with tags V1, . . . , V7 are separated into seven different planes named: VERTEX 1, . . . , VERTEX 7.

Finally, nonvertex intersection points are identified as points at which an edge ends interior to another edge. These can be differentiated from the vertices because each of them has a pair of complementary neighbors on either side. Three types of points of intersection are identified by the tags VL, VR, and LR as shown in Fig. 4. These, again, are separated into three distinct planes: VLINRS, VRINRS, and LRINRS.

An intersection point could, in fact, denote a simple vertex (of a single block) or arise out of the coincidence of vertices of more than one block. Such occurrences are illustrated in Fig. 5. These intersection points are termed Junction Points and separated out into a distinct plane called ‘JUNCPNTS’. Junction points indicate the presence of shared lines in the scene as shown in Fig. 5.
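The line-labeling step and the plane ‘AND’ described in this subsection can be illustrated with a short Python sketch. It is not COMPAX code and does not reproduce the COMPAX operations themselves: planes are modeled here as Python sets of (row, column) points, chains are found by a direct scan instead of by mark-and-extend, and the step (1, 1) used for a slant line is only a hypothetical stand-in for whatever raster direction the actual system uses. The function name `label_lines` and the sample picture are likewise assumptions made for illustration.

```python
def label_lines(picture, step, min_chain=3):
    """Return the set of points lying on chains of at least `min_chain`
    set points along the direction `step` = (d_row, d_col)."""
    dr, dc = step
    rows, cols = len(picture), len(picture[0])

    def on(r, c):
        return 0 <= r < rows and 0 <= c < cols and picture[r][c] == 1

    labeled = set()
    for r in range(rows):
        for c in range(cols):
            # Begin only at the first point of a chain in this direction.
            if on(r, c) and not on(r - dr, c - dc):
                chain = []
                rr, cc = r, c
                while on(rr, cc):
                    chain.append((rr, cc))
                    rr, cc = rr + dr, cc + dc
                if len(chain) >= min_chain:
                    labeled.update(chain)
    return labeled


# A tiny hypothetical picture: one vertical stroke meeting one slanted stroke.
picture = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]

vlines = label_lines(picture, (1, 0))   # north-south chains
rlines = label_lines(picture, (1, 1))   # assumed raster direction for a slant
meeting = vlines & rlines               # 'AND' of two planes: points where lines meet
print(sorted(meeting))                  # [(2, 1)]
```

Points found by the final intersection are only candidates; separating them into the seven vertex types and the nonvertex intersection types requires the neighborhood tests described in the text.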
FIG. 3. (Left) Block vertex convention. FIG. 4. (Right) Types of points of intersection of edges.
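The definition of a nonvertex intersection point (an edge ending in the interior of another edge) can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the COMPAX procedure: each labeled line is assumed to be available as an ordered list of its points together with a direction tag ‘V’, ‘L’, or ‘R’, and the function name `intersection_points` is hypothetical.

```python
def intersection_points(lines):
    """Map each nonvertex intersection point to its tag 'VL', 'VR', or 'LR'.

    `lines` is a list of (kind, points) pairs, where kind is 'V', 'L', or 'R'
    and points is the ordered list of raster points on that line.
    """
    found = {}
    for kind_a, pts_a in lines:
        ends = (pts_a[0], pts_a[-1])
        for kind_b, pts_b in lines:
            if kind_a == kind_b:
                continue
            # Interior points have a neighbor on either side along the line.
            interior = set(pts_b[1:-1])
            for p in ends:
                if p in interior:
                    pair = sorted((kind_a, kind_b), key="VLR".index)
                    found[p] = "".join(pair)
    return found


# Hypothetical data: a vertical edge ends in the interior of an R-slant edge,
# while an L-slant edge merely meets the end of the R-slant edge.
lines = [
    ("R", [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]),
    ("V", [(2, 2), (3, 2), (4, 2)]),
    ("L", [(4, 4), (5, 3)]),
]
print(intersection_points(lines))   # {(2, 2): 'VR'}
```

The end-to-end meeting at (4, 4) is not reported, since there neither edge continues on both sides of the point.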
FIG. 5. Junction points.

The block retrieval scheme can be functionally split up into four distinct phases. In phase one, all vertices of a given type associated with blocks that can be retrieved are separated out. The blocks are retrieved one by one in a systematic fashion. In phase two, hidden lines, if any, due to the retrieved block are regenerated and added to the part of the scene yet to be analyzed. Phase three mainly deals with common lines shared by the retrieved block and some other block in the remaining part of the scene. Such lines (earlier erased when the former block was removed) are identified through the junction points and restored to the left-over scene. During the final phase, the vertex, intersection point, and junction point
planes are updated by deleting from them the points corresponding to the retrieved block. In the rest of this subsection we shall discuss in some detail the first three phases.

Phase 1: Block retrieval. It is simplest to try to recognize a block by starting with its vertex identified by the tag ‘V1’. All such vertices are available in the plane VERTEX 1. We now select a subset S of points in VERTEX 1 satisfying the following condition: each such vertex is connected to a point in VERTEX 2 and a point in VERTEX 3 by an L-slant and an R-slant line, respectively. The blocks associated with points in S are deemed to be retrievable.

Next, we consider the points in S one by one and determine the particular L- and R-slant lines passing through each. This is done essentially by reconstructing these lines a step at a time till the end points (namely, points in VERTEX 2 and VERTEX 3) are reached. During this process, a counter, initialized to zero at the start, is incremented by one after each step of line construction. Thus the lengths of the edges V1-V2 and V1-V3 are obtained for the block belonging to each point in the subset S. At the end of these two computations we have available for each retrievable block its base edges (Vertex 2-Vertex 1-Vertex 3) as shown in Fig. 6a. The dimensions of these edges are also known. What remains to be determined for each of these blocks is its height. For, then, the complete block can be generated and subtracted from the input picture. This is what we have been referring to as the retrieval process.

To find out the vertical dimension of the block, the vertical edges from the plane VLINES are added to the base as shown in Fig. 6b and a search is made to determine whether a point in VERTEX 5 lies on the vertical line through V1 or, failing this, whether a point in VERTEX 4 lies on the vertical line through V2, or, finally, whether a point in VERTEX 6 lies on the vertical line through V3.
A successful outcome of any one of these searches establishes the height of the block under consideration and the block retrieval process is complete. In case all the three searches fail, the entire top surface of the block under question is hidden by one at a higher level and its true height cannot be determined. The program then selects the maximum of the three vertical edges as representing the height of the block and terminates the retrieval process.
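The two measurements of Phase 1, stepping along a line with a counter and then searching the three vertical edges for a top vertex, can be sketched as follows. This is a minimal Python illustration, not the COMPAX program: planes are modeled as sets of (row, column) points, "up" is assumed to be the direction of decreasing row, the step (-1, -1) for the L-slant edge is an assumed raster direction, and the names `walk` and `block_height` are hypothetical.

```python
UP = (-1, 0)  # assumption: row indices decrease toward the top of the picture


def walk(start, step, line_points, stop_points):
    """Step along a labeled line from `start`, counting steps.

    Returns (steps_taken, reached), where `reached` is True if a point of
    `stop_points` was met and False if the line ran out first.
    """
    r, c = start
    steps = 0  # counter initialized to zero, incremented after each step
    while True:
        nxt = (r + step[0], c + step[1])
        if nxt not in line_points:
            return steps, False
        r, c = nxt
        steps += 1
        if nxt in stop_points:
            return steps, True


def block_height(v1, v2, v3, vlines, vertex5, vertex4, vertex6):
    """Try the three searches in the order given in the text; if all fail,
    fall back to the longest visible vertical edge.

    Returns (height, exact), where `exact` is False for the fallback case.
    """
    results = [
        walk(v1, UP, vlines, vertex5),
        walk(v2, UP, vlines, vertex4),
        walk(v3, UP, vlines, vertex6),
    ]
    for length, reached in results:
        if reached:
            return length, True
    return max(length for length, _ in results), False


# Hypothetical block: base vertices and the visible parts of its vertical edges.
v1, v2, v3 = (5, 2), (3, 0), (3, 4)
llines = {(5, 2), (4, 1), (3, 0)}
vlines = {(5, 2), (4, 2), (3, 2), (2, 2), (3, 0), (2, 0), (3, 4), (2, 4)}

print(walk(v1, (-1, -1), llines, {v2}))                        # (2, True)
print(block_height(v1, v2, v3, vlines, {(2, 2)}, set(), set()))  # (3, True)
print(block_height(v1, v2, v3, vlines, set(), set(), set()))     # (3, False)
```

Returning the `exact` flag alongside the height is a choice made for this sketch; it records whether the height was established by a top vertex or is only the maximum of the visible vertical edges.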
FIG. 7. Types of hidden lines.

This process of block retrieval is repeated for each point in the subset S earlier computed. We have so far discussed the block retrieval procedure starting with V1 vertices. In the case of blocks whose V1 vertices are hidden, it may be possible to reconstruct the blocks starting with vertices V4 or V6 or V5; or, even, V2, V3 or V7 (except that, in this last case of L-type vertices, only two alternatives exist for determining the third dimension of the block). All these possibilities are systematically tried through analogous procedures. At the end of each block retrieval process, the hidden line and shared line determination procedures are executed and these lines are regenerated and added to the remaining scene. We shall discuss this process now.

Phase 2: Hidden-line determination. In the present implementation, the analysis program assumes tacitly that a given block hides not more than one block, if at all. It is easily verified that more complicated cases involving more than one hidden block can be handled analogously although, to be sure, much more complex test procedures would have to be incorporated in the program. Consider the three kinds of hidden lines that arise in the situations shown in Fig. 7. The restoration of these lines is based on the use of points of intersection previously separated out and stored in planes VLINRS, VRINRS, and LRINRS (see phase 1). The subsets of points in these planes that lie on the edges of the currently retrieved block are separated out in a distinct plane. It is evident that these points identify the intersections of the edges of the hidden block with those of the currently retrieved block. The types of intersection points involved and their number determine the nature of the hidden lines as we shall see presently. Care is taken at this stage to abort the attempt to generate the hidden lines in case these belong to the retrieved block itself. As soon as
the types of hidden lines are determined, the program computes the coordinates of the hidden vertices, if any. The hidden lines are next generated and added to the scene yet to be analyzed.

To get some idea of the logic used in the hidden line classification, consider Fig. 7 again. Let B be the currently retrieved block. If the number of intersection points of the VR-type is three or more, then the hiding involved is of the kind illustrated in Fig. 7c. If there are two VR-type intersections, it is checked whether a VL-type point is also present. If yes, the situation is as shown in Fig. 7b. Otherwise, the situation of Fig. 7a is assumed to be present. It is clear that analogous considerations (interchanging VR and VL) would enable one to handle the case where block B hides block A from its left.

Phase 3: Detection of shared lines. The varieties of common edges the program is designed to detect are illustrated in Fig. 5. The scheme begins with the separation of junction points lying on the currently retrieved block. The type of the junction point identifies the directional sense of the shared edge, and defines one end of the edge. To determine the other end of this edge, a search is made for the appropriate intersection point on the shared line. The line segment delimited by these two end points is now added to the hidden block in the remaining part of the scene. This procedure is repeated with other junction points, if any, lying on the retrieved block. Shared points, if any, are also restored at this stage.
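The hidden-line classification rule of Phase 2 (the counts of VR-type and VL-type intersection points on the retrieved block) reduces to a short decision procedure. The Python sketch below is illustrative only; the function name `classify_hiding` and the representation of intersection points as a list of tag strings are assumptions, and the `None` result for fewer than two VR-type points is a choice made here, since the text does not say what the program does in that case.

```python
def classify_hiding(tags):
    """Classify the hiding situation from the tags ('VL', 'VR', 'LR') of the
    intersection points lying on the edges of the retrieved block."""
    vr_count = tags.count("VR")
    if vr_count >= 3:
        return "Fig. 7c"
    if vr_count == 2:
        # A VL-type point alongside two VR-type points selects case (b).
        return "Fig. 7b" if "VL" in tags else "Fig. 7a"
    return None  # case not covered by the rule as stated in the text


print(classify_hiding(["VR", "VR", "VR"]))   # Fig. 7c
print(classify_hiding(["VR", "VL", "VR"]))   # Fig. 7b
print(classify_hiding(["VR", "VR", "LR"]))   # Fig. 7a
```

The mirror-image case mentioned in the text would follow by interchanging the roles of "VR" and "VL" in the same function.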
In phase one we saw that there exist seven different starting points for the retrieval of a block. For each block so retrieved, the procedures for hidden line reconstruction and shared line restoration are executed as discussed in detail so far. When all blocks retrievable through any one of the seven starting procedures have been removed, the reconstructed scene is recycled through the analysis program as a fresh input. All lines are labeled afresh and the vertices, intersection points and junction points identified and separated out as discussed earlier. The reason for doing this should be obvious. With the restoration of lines earlier hidden and edges earlier shared, there is an increased likelihood that the newly parsed picture would lend itself to easier recognition and retrieval of the remaining blocks. Figure 8 illustrates two typical complex scenes analyzed by the current program.

FIG. 8. Typical scenes analyzed by the computer.

3. CONCLUDING REMARKS

Analysis of visually given scenes composed of blocks of various sizes and shapes has been of central concern to all the robot projects currently under study. The standard procedure in all these studies is to compute a line diagram of the visually given scene and analyze it on the basis of the vertex classifications and related edge information. Our principal concern in this paper has been to show that COMPAX-like (PAX-like) languages offer a very powerful picture processing framework within which these kinds of perceptual-level analyses could be efficiently carried out. We have tried to establish the credibility of this thesis by implementing a program that is capable
of analyzing scenes built out of blocks under several simplifying assumptions. As we pointed out earlier, scenes with more complex semantic content would naturally require elaboration of our current program in terms of its problem-solving capability and search procedures. But our current implementation demonstrates convincingly, in our opinion, that the picture processing framework provided by COMPAX (PAX) should continue to be of great potential value in coping with these more complex visual situations.

ACKNOWLEDGMENTS
We would like to acknowledge the assistance of Mr. V. S. Patil and Mr. C. T. Devassy in the preparation of the paper for publication. One of us (R. N.) would also like to thank the Jawaharlal Nehru Memorial Fund for support during this study.

REFERENCES

1. G. FALK, Interpretation of line data as a three-dimensional scene, Artificial Intelligence 3, 1972, 101-144.
2. A. GUZMAN, Decomposition of a visual scene into three-dimensional bodies, Proceedings of AFIPS Fall Joint Computer Conference, 33, Part I, 1968, pp. 291-304.
3. E. G. JOHNSTON, The PAX II picture processing system, in Picture Processing and Psychopictorics (B. S. Lipkin and A. Rosenfeld, Eds.), Academic Press, New York, 1970, pp. 427-512.
4. R. NARASIMHAN, Syntax-directed interpretation of classes of pictures, Comm. ACM 9, 1966, 166.
5. R. NARASIMHAN AND V. S. N. REDDY, Some experiments in scene generation using COMPAX, in Graphic Languages (F. Nake and A. Rosenfeld, Eds.), Amsterdam: North-Holland, 1972, pp. 111-120.