Solid Modeling in Robotic Vision *

Pietro Morasso
Department of Communication, Computer and Systems Science, University of Genoa, Via Opera Pia 11a, I-16145 Genoa, Italy

Robotic vision has usually been considered a computational task which processes images in order to obtain three-dimensional models of objects, and is therefore image driven. However, this is not the only possible approach. A different approach is presented which is driven by solid modeling, where a common three-dimensional representation coordinates the activities of the early processing modules.

Keywords: Robotics, Robotic vision, Solid modelling, Occluding contours, Object memory.

P. Morasso obtained a degree in Electrical Engineering at the University of Genoa in 1968. In 1970 he joined the Computer Science Department of the University of Genoa, where he is currently Professor of Anthropomorphic Robotics. During the 1970-1972 period he was a post-doctoral fellow at the Psychology Department of MIT, Cambridge, Mass. His scientific interests are in the area of neuroscience and computational modeling of human skills.

* This paper was supported by the Program on Robotics of the Ministry of Education and by the Esprit Project P419.

North-Holland Computers in Industry 7 (1986) 227-232

1. Introduction

Machine vision is a multifaceted task which does not allow a straightforward formulation and cannot be faced with a single computational approach. It is not surprising that different computational approaches exist, according to where the emphasis is put.

The iconic approach, which is historically the oldest one, is a development of the pattern recognition area: the emphasis is on images and on different kinds of iconic matching techniques. The approach developed particularly by Marr and his coworkers [6] tries to go beyond images by making explicit the characteristics of surfaces which can be deduced from the analysis of the grey level distribution over the image: a family of techniques has been generated accordingly (shape-from-shading, shape-from-stereo, shape-from-motion, etc.). These techniques have common characteristics: they are local, in the sense that they tend to provide information about the differential geometry of the surface, which requires integrating complex differential equations in order to capture the global shape of the objects, which is then represented in terms of their boundary. Furthermore, these techniques are egocentric, i.e. they assume the observer as the system of reference, failing to provide a framework for integrating different viewpoints.

A local, superficial, but immediately three-dimensional kind of information is provided by active tactile exploration, which may consist of using the robot arm itself as a three- or six-dimensional digitizer. A global boundary description is provided by another non-optic method: the laser scanner. This is a very precise measuring concept, which however lacks the kind of flexibility which is displayed by the human visual system and which is likely to characterize the visual skill that robots will be required to have.

A global, volumetric representation is generated by the use of multiple occluding contours [7,8].
Since each occluding contour constrains the object within a conic surface, multiple views determine an intersection volume which contains the object.
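This cone-intersection idea can be sketched on a discrete voxel grid (an illustrative sketch, not the algorithm of [7,8]; `carve_visual_hull` and `project` are hypothetical names): the volumetric memory starts "full" and each silhouette removes the voxels that project outside it.

```python
import numpy as np

def carve_visual_hull(grid_shape, silhouettes, project):
    """Intersect the viewing cones of several silhouettes on a voxel grid.

    silhouettes: list of 2D boolean images (True inside the occluding contour).
    project: function (view_index, x, y, z) -> (row, col) pixel coordinates.
    """
    hull = np.ones(grid_shape, dtype=bool)  # volumetric memory starts "full"
    xs, ys, zs = np.indices(grid_shape)
    for k, sil in enumerate(silhouettes):
        r, c = project(k, xs, ys, zs)
        inside = (r >= 0) & (r < sil.shape[0]) & (c >= 0) & (c < sil.shape[1])
        keep = np.zeros(grid_shape, dtype=bool)
        keep[inside] = sil[r[inside], c[inside]]
        hull &= keep  # each view can only remove volume, never add it
    return hull
```

Because each view can only intersect away volume, every new observation shrinks (or preserves) the current estimate, which is why the process is cumulative.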


Industrial Robotics in Discrete Manufacturing

The multiple views can be obtained either by means of multiple cameras which look simultaneously at the same object or by means of one camera which "navigates" around the object (in the same "ecological" way in which humans look at objects, never statically but with a continuous active process of visuo-motor exploration [2]). On the other hand, it must be remembered that the shape-from-multiple-contours approach can only give convex approximations of the shape: it is indeed blind to concavities.

The conclusion which can be drawn from the comparative analysis of the different approaches to machine vision is that none of them, on its own, is general enough to be self-sufficient. What is necessary, in our opinion, is the integration of the different sources of spatial and visual information, and we suggest that such integration entails a precise hierarchy among the different approaches, in order to compensate for the limitations of one with the strong points of another. This point of view is illustrated in the following sections by means of some preliminary examples.

2. Hierarchy of Visuo-Spatial Computational Layers

Among the people who work in the machine vision area, it is implicitly assumed that the computational process, which eventually ends up with a three-dimensional model, consists of some sort of surface formation procedure, performed in different steps: a flat image patch is first isolated, using linear or regional segmentation techniques, and then the patch is shaped and rounded, taking into account shading, stereo disparity, etc. (this is what Marr called the transformation from a primal sketch to a 2.5D sketch of the scene depicted by the original image).

On the contrary, it is our opinion that the computational processes underlying machine vision should be guided by 3D volumetric primitives. In qualitative terms, we think that machine vision is more akin to sculpting than to painting and/or chasing: the marble block, which is the original solid structure from which the sculptor initiates his solid formation performance, can be represented by a holistic memory (i.e. a three-dimensional volumetric memory), which is set initially to a "full" state and is then "chopped", "carved", and "smoothed" in order to produce the final volumetric shape. In computational terms,


the holistic memory can be implemented by a digital cube, whose unit cell is the voxel; however, if we wish to save some megabytes (while waiting for cheap enough memories), it is possible to use cell decomposition techniques, such as run-length coding or octrees. The different computational stages interact with such a representation, and they should be ordered, as a sculptor's actions are, according to a rough-to-fine sequence: rough actions must come first and finer actions are appropriate only later.

Occluding contours are naturally suited for a preliminary rough volumetric sketch. Each occluding contour, taken from a different viewpoint, allows us to "chop off" volume elements by means of "longitudinal strokes", and the process is cumulative, improving the degree of approximation with each new observation, exactly as humans do when they actively move their eyes, their head, and their body while examining an object. However, acting alone, this process can be rather slow in providing an acceptable approximation of the solid, and so its action can be integrated with concurrent volume chopping processes which may act by "transversal strokes": this is the case, for example, with some rough stereo vision algorithms [9], but cognitive factors may also help, by using expected knowledge about the objects in the scene.

The appropriateness of occluding contours for building 3D visual interpretations, in the absence of any other 3D cue, is supported by the experiments on motion perception by Johansson [5], who reported the compelling illusion of 3D motion which is generated by displaying moving flat shapes on a uniform background. The basic importance of 3D primitives is also supported by the experiments by Shepard, concerning mental spatial rotations.
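As a sketch of one such cell decomposition technique (function names are hypothetical; the octree variant is analogous), a run-length code stores each filled vertical column of the digital cube as a short list of (start, length) runs:

```python
import numpy as np

def run_length_encode(column):
    """Encode a 1D boolean voxel column as (start, length) runs of filled cells."""
    runs, start = [], None
    for i, filled in enumerate(column):
        if filled and start is None:
            start = i                       # a run of filled voxels begins
        elif not filled and start is not None:
            runs.append((start, i - start))  # the run ends just before i
            start = None
    if start is not None:
        runs.append((start, len(column) - start))
    return runs

def encode_grid(grid):
    """Run-length code every non-empty vertical column of a 3D boolean grid."""
    return {(x, y): run_length_encode(grid[x, y, :])
            for x in range(grid.shape[0]) for y in range(grid.shape[1])
            if grid[x, y, :].any()}
```

Only non-empty columns are stored, so a mostly empty cube costs little memory compared with the full voxel array.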
If we now wish to integrate the rough volumetric sketch with fine local boundary procedures, we must consider that such a sketch is polyhedral and discontinuous, and so it must first be smoothed and its boundary must be made available as a continuous surface. At that stage, the local methods can come into action and may "carve" smooth concavities into the initial convex representation, by applying additional shaping constraints to the surface. Tactile exploration can participate in the process at each stage, either contributing discontinuous volumetric strokes or smoothly deforming the volume boundary by providing additional constraints. The coherence/registration among the different computational layers is assured by the fact that all of them refer to the same system of reference, associated with the 3D memory.

This view of computer vision should be contrasted with the traditional one, expressed explicitly, for example, by Barrow and Tenenbaum [1] when they stress the importance of a hierarchy of computational layers in the image domain rather than in the space domain. The image centered point of view leads inevitably to bottom-up algorithms which are data-driven. The solid centered point of view is top-down and bottom-up at the same time: it is model-driven, because the solid representation is the coordinating hypothesis of all the computational layers, but it is also driven by the visual, kinesthetic, and cognitive data which contribute to refine the solid model. A solid representation of objects, in addition, is much more appropriate for cognitive modeling and processing than a planar (even if relational) representation, such as that used by Guzman for polyhedra [4], which has ignited so much labour among AI scientists.

In the following two sections, we outline some aspects of the theory discussed above and the corresponding algorithms.

3. Building the Volumetric Sketch

In order to build the volumetric sketch of an object from multiple occluding contours, it is necessary to map each pixel of an occluding contour into a ray in space. This requires knowledge of the


spatial location of two points of the ray: one is naturally the location of the observer, with regard to the global system of reference, and the other can conveniently be localized as a point in the background, if the object is viewed against a calibrated background, which consists of rectangular grids over orthogonal planes. The calibration technique and the algorithms for tracing ray-from-pixel and pixel-from-ray are described in more detail in [8]. Here we summarize only the main steps of the process.

Fig. 1 shows the calibration grids from different points of view; Fig. 2 shows the views of calibration objects of known geometry which allow us to estimate the spatial location of each observer; Fig. 3 shows the occluding contours of two objects; Fig. 4 shows the initial prismatic approximation of the object, which may be computed from the limit rectangles in each image.

Algorithms have been implemented which scan the prismatic approximation by vertical columns, firing rays at all the observers and recording when all of them hit the binary images inside the occluding contours. This implementation, which follows a space-to-image approach, is conceptually very simple and is very well suited to a highly parallel implementation in which ray firing and binary image testing are the basic parallelized primitives. The space-to-image approach is naturally useful when the multiple images are simultaneously available. When the multiple images are acquired in a sequence, if we wish to proceed incrementally (i.e. without having to wait for the complete sequence to be available) by guiding the visual exploration

Fig. 1. Calibration grids from different points of view; for each of them a table stores the image coordinates of the visible grid elements.



Fig. 2. Calibration objects of known geometry, which allow us to estimate the observer location for each view.

with the intermediate results, then it is better to proceed according to a dual image-to-space approach. For each image, it is first necessary to decompose the area outside the occluding contour into polygonal patches. Each polygonal patch corresponds, in spatial terms, to a pyramid whose vertex is placed in the observer location. What we must do is to chop off the volume elements of the holistic memory which are in common with each pyramid. This requires a pyramid filling algorithm, which is the basic primitive suitable to be parallelized, in this case.
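The image-to-space alternative can be sketched as follows (an illustrative sketch, not the implementation of [8]; `chop_pyramid` and its helpers are hypothetical names, and the polygonal patch is assumed to lie on a background plane z = const): every voxel whose central projection from the observer falls inside a patch outside the occluding contour lies inside the corresponding pyramid and is emptied.

```python
import numpy as np

def project_to_background(apex, point, z_plane):
    """Central projection of a point from the apex onto the plane z = z_plane."""
    apex, point = np.asarray(apex, float), np.asarray(point, float)
    t = (z_plane - apex[2]) / (point[2] - apex[2])
    return apex + t * (point - apex)

def point_in_polygon(p, polygon):
    """Even-odd ray-casting test for a point against a planar polygon."""
    x, y, inside = p[0], p[1], False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def chop_pyramid(grid, grid_origin, apex, patch, z_plane):
    """Empty every voxel of the holistic memory whose central projection from
    the observer (the pyramid apex) falls inside the polygonal patch, i.e. the
    voxels inside the pyramid subtended by a region outside the contour."""
    for idx in np.argwhere(grid):
        centre = grid_origin + idx + 0.5            # voxel centre in space
        if centre[2] == apex[2]:
            continue                                # degenerate viewing ray
        q = project_to_background(apex, centre, z_plane)
        if point_in_polygon(q, patch):
            grid[tuple(idx)] = False
    return grid
```

A point-in-pyramid test per voxel is the dual of firing a ray per voxel; a production version would rasterize each pyramid instead, which is the fill primitive the text proposes to parallelize.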

Fig. 3. Occluding contours of two objects.

Fig. 4. Prismatic approximation.

4. Smoothing and Carving the Volumetric Sketch

In order to perform fine local modifications of the volumetric sketch, it is necessary to obtain a continuous smooth representation of its boundary (not necessarily all of it). In particular, we can use a parametric representation of the form

r = r(u, v)    (1)

where r is a 3D smooth vector function of two independent variables u and v. If the volumetric sketch is stored in a rectangular holistic memory, it is natural to consider it as a tomography, where the cutting planes may be orthogonal to any one of the coordinate axes. Once an axis has been chosen, the boundary can be obtained by means of contour tracing on each cutting plane, resulting in the following collection of contours:

{ (c_ij, i = 1 : i(j)), j = ja : jb }    (2)

where each contour c can be represented by a chain code or by an array of Cartesian coordinates, "j" is the index which identifies the cutting plane, and "i" identifies a contour on a cutting plane (there might be more than one of them, corresponding to different sub-shapes of the object). Fig. 5 shows an example. However, since we are mostly concerned with local modifications, we may as well drop the latter index, taking into account only one sequence of contours at a time. Each contour is a closed curve and can therefore be smoothly represented by a continuous periodic function f( ), expressed for example by means of a Fourier expansion. Such a representation consists, in general, of a set of base functions (sinusoids in the Fourier case) and of a set of parameters p:

{ f_j = f(u, p_j), j = ja : jb }

where u is the running variable along a contour. In order to obtain a continuous representation of the boundary, such as (1), we can choose the second independent variable v as the variable which identifies the continuous motion of the cutting planes (j identifies the discrete motion, according to the spatial sampling size of the data). Furthermore, we can interpolate the parameters p along such a variable by means of sufficiently smooth interpolation functions, such as splines. We end up with the following function

f = f(u, p(v))    (3)

which generates a family of planar contours. Such a function, together with v (which is linearly related to the third dimension), expresses the smooth representation of the boundary of the volumetric sketch.

Fig. 5. Volumetric reconstruction.

"Carving" the volumetric sketch means editing the shape representation (3) according to additional local constraints, which may derive from very different sources of information (e.g. visual and kinesthetic) and from very different early processing stages (e.g. optic flow, stereo disparity, shading). Each of these sources of perceptual evidence is localized in space, i.e. it is relative only to a limited number of "slices" of the volumetric


representation (this identifies a limited range of v) and only to a limited part of the slices (this identifies a limited range of u). Therefore, we end up with a shape editing problem where we must modify (3) in order to satisfy some constraint value of f and, possibly, of its derivatives, while keeping the modifications induced by these constraints restricted to a given region of the u, v domain.
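The smoothing-and-carving machinery of this section can be sketched numerically (an illustrative sketch under simplifying assumptions: truncation of a Fourier series stands in for generic smoothing, a simple C1 window stands in for the spline-based editing, and all function names are hypothetical):

```python
import numpy as np

def fourier_parameters(contour, n_harmonics):
    """Parameter set p of one slice contour, given as complex samples x + iy:
    the lowest-frequency coefficients of the periodic function f(u)."""
    z = np.asarray(contour, dtype=complex)
    c = np.fft.fft(z) / len(z)
    p = np.zeros_like(c)
    p[:n_harmonics + 1] = c[:n_harmonics + 1]   # mean + positive harmonics
    p[-n_harmonics:] = c[-n_harmonics:]         # negative harmonics
    return p

def smooth_contour(p):
    """Evaluate the smooth closed contour f(u) from its parameters."""
    return np.fft.ifft(p * len(p))

def window(t, centre, width):
    """C1 window: 1 at the centre, falling to 0 where |t - centre| >= width."""
    s = np.clip(np.abs(t - centre) / width, 0.0, 1.0)
    return (1.0 - s) ** 2 * (1.0 + 2.0 * s)

def carve(radius, us, vs, u0, v0, wu, wv, depth):
    """Dent a sampled boundary radius r(u, v) by depth, restricted to the
    window |u - u0| < wu, |v - v0| < wv; the rest of the surface is untouched."""
    w = window(us, u0, wu)[:, None] * window(vs, v0, wv)[None, :]
    return radius - depth * w
```

The window leaves the surface identical outside its support, so constraints localized in disjoint regions of the (u, v) domain do not interfere with one another.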

References

[1] Barrow, H.G., Tenenbaum, J.M.: Computational vision. IEEE Proceed. 69 (1981) 572-595.


[2] Gibson, J.J.: The ecological approach to visual perception. Houghton Mifflin Co., Boston, 1979.
[3] Goldwasser, S.M.: A generalized object display processor architecture. IEEE SIGARCH Newsletter 12 (1984) 38-47.
[4] Guzman, A.: Decomposition of a visual scene into three-dimensional bodies. AFIPS Joint Fall Conf. 33 (1968) 291-304.
[5] Johansson, G.: Visual motion perception. Scientific American 232 (1975) 76-88.
[6] Marr, D.: Vision. W.H. Freeman Publ., San Francisco, 1982.
[7] Martin, W.N., Aggarwal, J.K.: Volumetric description of objects from multiple perspective views. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-5 (1983) 150-158.
[8] Massone, L., Morasso, P., Zaccaria, R.: Shape from occluding contours. SPIE Symp. Intelligent Robots & Computer Vision, Cambridge, Mass., Nov. 4-8 (1984).
[9] Nishihara, H.K.: PRISM: a practical real time imaging stereo matcher. Proceed. 3rd ROVISEC Symp., 121-130, Cambridge, Mass., Nov. 6-10 (1983).