Copyright © IFAC Robot Control (SYROCO '85), Barcelona, Spain, 1985

VISUAL FEEDBACK FOR ROBOTS: PROBLEMS AND PROGRESS

U. L. Haass

CEC, Information Technologies and Telecommunications Task Force, Rue de la Loi, Brussels

Abstract. This paper is concerned with possible applications of computer vision for feedback control of industrial robots. Efforts in research to overcome the limitations of currently available binary vision systems are presented, but it is pointed out that new developments, especially for dynamic feedback control, are still far from becoming industrial products, and euphoric expectations should be approached with caution. Three important directions of relevant research are dealt with in more detail in order to demonstrate the range of problems: shape information from a single image, from stereo, and from image sequences.

Keywords. Computer Vision, Robotics, Visual Feedback, Analysis of Moving Images.

INTRODUCTION

So far, the history of industrial robots has run almost parallel to the development of computer vision systems. But although the idea of making robots "intelligent" by giving them vision capabilities was always sufficiently plausible to spur enormous efforts in research, the degree of industrial implementation is far less than the amount of technical publications might suggest. As Richard P. Paul (1985) recalls, the first robots equipped with external sensors used touch feedback as the dynamic scene information input device. This, however, was soon dropped because of its slow, groping nature in those days, in favor of vision, although computer vision was still in its embryonic state. Early demonstrations of computers with a TV camera as input device, mostly at MIT (Roberts, 1965) and Stanford University (Agin and Duda, 1975), had triggered a flood of activities in research laboratories around the world. Computer vision, of course, has become practical in many other applications: TV coding, medicine, biology, meteorology, geophysics, etc. Although mutual effects must not be neglected in order to understand the history of ideas, this paper will focus on computer vision in the industrial robot environment.

Today, the following robotic tasks have emerged as being suitable for visual input:

1. identifying parts
2. inspection of parts
3. determining the point where to grasp a part
4. determining the point where to place a part
5. path control for collision avoidance, grasping moving parts
6. external supervision of the work area
7. navigation of mobile robots

Although the first two tasks are not confined to industrial robots (Chin and Harlow, 1982), the connection of vision and robotics has currently gained such prestige that almost any type of industrial vision system is advertised under the glamorous label of robotics (Ferretti, 1984). That these claims of 'robotics' are usually not being refuted might be attributed to the fact that sophisticated visual feedback systems capable of the more genuine tasks 3 to 7 have not left the laboratories yet, due to the enormous amount of technical problems still to be solved.

Almost every survey among the industrial clientele of vision systems shows that expectations had risen with the early demonstrations of vision systems and had been pushed by a multitude of euphoric publications, conferences and industrial fairs, but they have not been met to this day (Batchelor, 1982). This has led to a considerable lack of confidence in new products on the market, which only with the recent development of new-generation gray-scale systems seems to have passed its low point.

BINARY VISION SYSTEMS FOR ROBOT CONTROL

From the academic point of view, vision systems for industrial applications should benefit from a number of circumstances. As opposed to systems for outdoor applications, they offer the possibility to arrange scene and illumination, they usually deal with parts displaying man-made geometries, and a large amount of a-priori knowledge is available. Hence it was felt that even binary vision systems, three years ago still the only type of vision system on the market, would sell. These systems allow various tasks of picture measurement in almost real time (TV rate) with a relatively modest investment in terms of storage and processing power. If the scene was sufficiently well arranged and the objects not too complex, the process of binarization would not discard the information needed to analyse the image correctly.

In spite of the euphoric demonstrations and announcements, only very few of these systems could be implemented successfully in production processes. Some of the reasons are:

1. Limited technical capabilities: binary vision systems cannot analyse complex objects in a complex scene, the binarization is fooled by dirt on the background or on the objects as well as by reflective surfaces, objects must not overlap, etc.

2. Lack of comfort and flexibility: most of the currently available vision systems need complicated tuning with lengthy programming efforts in order to adapt the system to a different task. Very often the producer of the vision system is not familiar with the problems of the customer, the software is cryptic, and the customer does not have sufficient background to understand its function.

Whereas the second category of problems might be attributed to the lack of industrial maturity of the systems or to the infant market situation, the technical limitations of binary vision systems are insurmountable.

However, some applications of binary vision systems for robot control should be mentioned that, in spite of the above-mentioned limitations, have shown the potential usefulness of vision systems for robot control and thus have participated in the growing swell of expectations:

1. Picking from a moving conveyor belt

One of the early, spectacular exercises of vision systems was to guide a robot in order to pick an object from a moving conveyor belt. Although the performance of this task suggests a closed-loop type of robot control, most of the demonstrations still employ some form of open-loop control. Very often a fixed camera is directed perpendicularly from above at the conveyor belt, with the vision system identifying the relative position of a single, non-overlapping object on the belt, and the time of passage (Holland, Rossol and Ward, 1979; Foith and co-workers, 1980). Assuming no slippage in the belt and with the belt speed given or measured, the position of the workpiece as a function of time can be calculated, such that the robot is guided to an appropriate pick position.
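The open-loop scheme described above amounts to a one-line kinematic prediction. A minimal sketch, with all names and numbers hypothetical (constant belt speed, no slippage):

```python
def predict_pick_position(x_detect, belt_speed, t_detect, t_pick):
    """Predict where a workpiece detected at x_detect (metres along the
    belt) at time t_detect will be at the planned pick time t_pick,
    assuming no slippage and a constant belt speed (m/s)."""
    return x_detect + belt_speed * (t_pick - t_detect)

# A part seen at 0.2 m while the belt runs at 0.1 m/s has moved a
# further 0.15 m after 1.5 s:
pos = predict_pick_position(0.2, 0.1, t_detect=0.0, t_pick=1.5)
```

In practice the pick time would also absorb image-processing latency and robot travel time, which is why the prediction, rather than the raw measurement, is handed to the robot.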

2. Tracking

Closed-loop control mechanisms can be implemented with binary vision systems if the object is not too complex and moves on a homogeneous background. Visual tracking systems had been developed first to track missiles (Schalkoff and McVey, 1982); here the requirement for an object of simple geometry on a homogeneous background is fulfilled. Kalman trackers were also employed to predict the position of a moving part in order to grab it (Niemi, Malinen and Koskinen, 1977). Although critics have often regarded this problem as too academic for a real-life factory pick-and-place work station, this was mainly due to the simplistic scene required to successfully employ binary vision systems.

3. Seam welding

This application has created considerable interest because of its potential for automation. Due to the often limited complexity of the scene around the seam, visual guidance might be feasible even with binary vision systems. In most of the implementations, the seam is traced by extracting its position and curvature at TV frame rate (Jones and Starke, 1984). Another interesting approach is applicable for arc welding (Niepold, 1982). Here a picture showing the melting pool and the tip of the electrode is analysed. Since the welding arc itself would impede any useful TV imaging, pictures are taken asynchronously during the frequent arc breakdowns when the protruding electrode touches the work pieces. The (binary) image of melting pool and electrode tip does not only determine the location of the electrode relative to the seam in the x, y and z axes, but also allows the control of current, voltage, and even the oscillating motion of the electrode holder, as required for some types of seams.

Image analysis can be considered as the inverse process of the picture-taking act, i.e., the recovery of the three-dimensional geometry, information about surface features etc., from two-dimensional pictures. Since the imaging process results in a dramatic reduction of information, the inverse process of image analysis is generally an ill-posed problem and therefore requires the introduction of constraints, or clues, which help to recover as much information of the original scene as possible. Even the best-understood problems of vision, therefore, could hardly be solved by binary vision systems. Current research in image analysis might be divided into three frontiers: image analysis of a single gray-tone or color image, analysis of stereo pairs of images, and the analysis of moving images.

RECOVERY OF SCENE INFORMATION FROM A SINGLE IMAGE

Work at this frontier dates back to the beginning of computer vision, where the scene was restricted to a "blocks world" (Roberts, 1965; Huffman, 1971), and the task consisted of finding logical clues for the interpretation of edges, corners, faces and volumes. The problem of edge detection, i.e. low-level image analysis, was usually given less consideration. Very often the discussion started off from the existence of ideal drawings of block-type wire-frame models. Nevertheless, the analysis of block-world pictures has provided us with valuable insights into the problems of scene description, and influenced the emerging disciplines usually subsumed under the term "image understanding". Interestingly enough, the technique of homogeneous transformation between object space and image plane (Roberts, 1965) has become standard also in robotics for relating various systems of manipulator or world coordinates to each other.

Today it has been recognized that edge detection is not a trivial task. A three-dimensional body edge generally is hypothesized at the location of a discrete gray-value transition in the image. This conclusion, however, leads to false alarms at the borders of shadows, reflections, or surface properties. Another important problem is the scale of the gray-value transition: this depends on the resolution, the size of the object in the image plane, and the roundness of the actual edge. One of the edge operators that has received widespread attention is the zero-crossing filter (Hildreth, 1983). This band-pass filter is a combination of Gaussian low-pass and Laplacian high-pass, hence mapping gray-level slopes to zero-crossings (the Laplacian can be considered as a second derivative). The frequency response has the shape of a Mexican hat. The importance of this filter lies not only in the simple implementation and the flexibility in tuning its frequency response, but in the discovery that the human visual system possesses similar edge operators (Marr and Hildreth, 1980), which can be explained as the difference of two Gaussian low-pass filters with different cut-off frequencies. The difference of Gaussians (DOG), however, is only another approach to deriving the Mexican hat operator. Due to the problems mentioned above, edge detection requires a subsequent stage in order to discard wrong edges, connect broken edges, and possibly compare the results from various operator sizes.

This requires the introduction of (higher-level) models for scene objects, which essentially adds new constraints to the image analysis problem. An elegant method for this task is the analysis-by-synthesis method (Tropf, 1980). Here, edges or connected lines or pieces of curvature are successively combined or discarded in order to synthesize a representation of the object, while comparing the synthesis with available models. The process is stopped when a certain difference between image model and reference model has been achieved. This method even allows the recognition of work pieces that overlap (Haettich, 1982). Further clues about the three-dimensional scene can be derived from assumptions about the symmetry of edges (Kanade, 1983).

As we all know, edges are not the only clues to interpret the three-dimensional structure of a depicted scene. Edge detectors derive information only at the transitions of gray-values. Another powerful tool is the utilization of photometric laws. Each single gray-value of the image is a result of the geometric constellation of illumination source and the surface normal of the object, as well as properties of the light source, the surface reflectivity and the imaging process. The inverse process, also called shape from shading, is an operation that maps the gray-value of each pixel to the surface normal (Ikeuchi and Horn, 1981). Since for a single pixel the surface normal cannot be inferred uniquely from its gray-value, new constraints have to be added. The introduction of two powerful tools has made an iterative solution to this problem possible: one is the gradient space (Mackworth, 1973), and the other one is the reflectance map (Horn, 1977).

The gradient space is a mapping of three-dimensional surface normal vectors into a two-dimensional plane spanned by the two parameters associated with the surface normal. Under orthographic projection, tilted surface planes thus map into points of the two-dimensional gradient space. The reflectance map is obtained in the gradient space by tilting an illuminated plane surface in all possible directions (normal vectors) relative to the viewer. For each normal vector, the relative brightness of reflection is recorded. Usually then, lines of constant brightness are connected. In the case of purely Lambertian reflection, with the surface being illuminated from the same direction as the viewer, these lines of constant (relative) brightness are concentric circles.
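For the Lambertian case the reflectance map can be written in closed form in gradient-space coordinates (p, q). The sketch below (sample gradients chosen arbitrarily) shows that with the source at the viewer (ps = qs = 0) the brightness depends only on p^2 + q^2, so the iso-brightness lines are indeed concentric circles:

```python
import math

def lambertian_reflectance(p, q, ps=0.0, qs=0.0):
    """Relative brightness of a Lambertian surface patch with gradient
    (p, q), illuminated from direction (ps, qs) in gradient space."""
    cos_incidence = (1.0 + p * ps + q * qs) / (
        math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + ps * ps + qs * qs))
    return max(cos_incidence, 0.0)     # self-shadowed patches are dark

# With the light at the viewer, all gradients on the circle
# p^2 + q^2 = 1 have the same brightness:
r1 = lambertian_reflectance(1.0, 0.0)
r2 = lambertian_reflectance(0.0, 1.0)
r3 = lambertian_reflectance(math.sqrt(0.5), math.sqrt(0.5))
```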

Horn (1977) has developed a method to recover the normal of an illuminated surface through the "characteristic strip expansion", which is an iterative propagation of the image-irradiance equation performed alternatingly in the reflectance map (which has to be given for the right surface and illumination) and the gray-level image itself. To begin this walk, the surface normal of a particular image point must be known. From then on, the iterative procedure produces a unique solution for the surface normal at all image points reached during the walk.

This method is not only problematic because of the requirement of a given reflectance map and a known initial point, but also due to the detrimental influence of image noise on the accumulation of errors. Several improvements have therefore been suggested: to start the walk at various points simultaneously and interpolate the surface in between by introducing smoothness constraints, to enter additional constraints like additional pictures with different illumination geometries (Ikeuchi and Horn, 1981), or to make use of texture (Witkin, 1981). The availability of high-resolution color cameras allows even further photometric clues (Jarvis, 1982). In any case, the surface orientation map, like any chart of gray-value edges, is still a low-level representation of the image, hardly allowing a direct identification of the objects.

In order to obtain connected surface elements that can be used to test models, e.g., with a relational structure, the surface orientation map needs to be properly interpolated and segmented (Henriksen, 1984). This identification process might become rather difficult again, because some features of the image representation might not show details required by the reference models. Efficient search strategies are therefore as important as robust low-level operators.

Although the approaches mentioned in this section might all be theoretically convincing, the problems of real-world applications are still a heavy burden. Successful shape recognition from a single gray-value image has been achieved so far only under laboratory conditions. For industrial needs, methods of structured illumination, especially with a single light ray, have been introduced, mostly due to the fact that the deviation of the reflected light ray can often be measured with binary vision systems (Myers, 1980). For complex tasks, not a single ray but a mesh of rays (Hall and co-workers, 1982) might need to illuminate the object. Although the reconstruction then does not have to cope with those problems inherent to the gray-value analysis of the Horn method, others are still manifold. Since these methods make use of a camera as a means to record the deviated reflection of structured illumination, they may be considered as a kind of active sensing. Most of the methods that employ structured illumination, however, are still difficult to implement in an industrial environment (e.g., they require the absence of other light sources, and are not always sufficiently flexible), such that the use of laser systems becomes more and more attractive.

Laser systems can be employed for the generation of depth maps in at least three different ways: as a source of structured illumination, where the reflecting point is located in the image of a calibrated camera and the depth calculated by triangulation (Levi, 1983); the time-of-flight measurement of depth (Jarvis, 1983); or holographic methods (Slater and Blake, 1984). Although these methods are often not considered as belonging to the field of computer vision, the resulting depth map can usually be interpolated and segmented with the same methods as those obtained from "shape-from-shading" methods. If laser and camera are combined, they could represent a powerful tool for the analysis of geometric structures. It will be interesting to see if passive or active forms of range finding will become more popular for industrial purposes. Laser systems, like computer vision methods, need considerable time for the calculation of a depth map, but because they suffer from less ambiguity than vision systems, their implementation requires fewer constraints. Passive vision systems, however, might prove to be more flexible.
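The triangulation variant reduces to plane geometry: with a known baseline between laser and camera and the two measured ray angles, the depth of the illuminated spot follows directly. A sketch with illustrative numbers (the geometry is the textbook one, not taken from the cited systems):

```python
import math

def triangulate_depth(baseline, alpha, beta):
    """Depth (perpendicular distance from the baseline) of a laser spot.
    alpha is the laser beam angle, beta the camera ray angle, both
    measured from the baseline joining laser and camera (radians)."""
    ta, tb = math.tan(alpha), math.tan(beta)
    return baseline * ta * tb / (ta + tb)

# Symmetric 45-degree rays over a 1 m baseline meet 0.5 m away:
z = triangulate_depth(1.0, math.radians(45), math.radians(45))
```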

SCENE INFORMATION FROM STEREO

The juxtaposition of a second camera seems to be the natural way of retrieving depth information from a scene. This problem has been looked at not only by computer vision people, but to a great extent by physiologists who are trying to find out how the human stereoscopic vision system works. Undisputed is the fact that the second camera, or eye, allows some sort of triangulation, provided it is performed with the same physical point in the depicted scene. The first problem in stereo vision therefore is the establishment of correspondences of image point pairs depicting the same point in the scene. This, however, leads to two further questions: what are good candidates for feature points, and how should the search be guided in order to find the matching partner in the other image? Good candidates for features are those that cannot be confused easily, e.g., corners, small regions and points. The fact that those are often sparse simplifies the search for the corresponding partner, but also generates only a depth map with sparse entries. A subsequent interpolation would require the introduction of surface smoothness constraints. For example, the image of a plane surface in space might have four corners. Only if we introduce a surface smoothness constraint can all depth values for points in between be calculated.

The discovery that the human stereoscopic vision system works even with random dots, in the absence of edges and corners, has forced researchers to revise the theory of stereoscopic vision (Julesz, 1971). It was concluded that stereoscopic vision requires some sort of local grouping or "co-operative process" around each image point (Marr and Poggio, 1976). Later, surface consistency constraints were also introduced (Grimson, 1981). However, it is still not fully clear what type of primitive features the human visual system utilizes, nor how the human visual process is capable of resolving ambiguities so well. Since eye movement might be rather complicated to copy in industrial vision systems, one can instead make use of the homogeneous resolution within the visual field of cameras. A fixed geometry of the two image planes allows the search for correspondences along so-called epipolar lines which, in the case of identical cameras with their optical axes in the same direction, are even parallel in both images (Fryer, 1982). The two-dimensional search for the correspondence then becomes a one-dimensional problem, and aside from a "forbidden" zone resulting from disparities in the parallactic occlusion of scene elements, the order of object features along the epipolar line is identical in both images.
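For the parallel-axis geometry just described, triangulation collapses to Z = f B / d along the epipolar line. A minimal sketch (focal length, baseline and pixel coordinates hypothetical):

```python
def depth_from_disparity(focal_px, baseline, x_left, x_right):
    """Depth of a matched point pair for identical parallel cameras:
    Z = f * B / d, with disparity d = x_left - x_right (pixels),
    focal length in pixels and baseline in metres."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline / disparity

# f = 800 px, baseline 0.12 m: a small disparity means a distant point,
# so a fixed matching error hurts far more at long range.
z_far = depth_from_disparity(800.0, 0.12, 420.0, 412.0)   # 8 px disparity
z_near = depth_from_disparity(800.0, 0.12, 420.0, 372.0)  # 48 px disparity
```

The inverse proportionality is exactly why a short baseline demands sub-pixel matching precision, as noted below.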

From a technical point of view, the precision of depth measurements with stereo cameras is of great interest. If the stereoscopic baseline is short, the matching process has to reach sub-pixel precision; if the base is wide, the "forbidden zone" gets larger, the corresponding primitives become less similar, and even photometric properties might lead to disparate gray-level transition patterns at the same physical object point. Other error sources are the limited resolution of the camera, focusing problems of the lenses, and geometric errors of the imaging system, aggravated by mechanical imprecision and vibration. The method of searching for matches along epipolar lines therefore might be too restrictive in an industrial environment, requiring higher-level features for the matching process (Grimson, 1985). Whereas precision in the cm range might be tolerable for obstacle recognition and navigation purposes, assembly tasks certainly require precision in the mm range. Measured against these requirements, there is still a great deal of research work to be done before stereo can be implemented in industrial vision systems.

One way to go might be to combine stereo vision with photometric and other depth clues, possibly even with laser range finders. The integration of various sensors might potentially result in systems that are superior to each single-sensor device, but research on this topic has started only recently. The discrepancy between the advancement of theoretical requirement studies and the implementation of "intelligent" robots with visual, real-time, on-line feedback control is even more dramatic for the analysis of moving images, as the following section might show.

ANALYSIS OF MOVING IMAGES

This problem encompasses all situations of dynamic imagery, i.e., dynamic scenes taken by a stationary camera (e.g., for surveillance purposes), as well as the analysis of the three-dimensional world from a mobile camera (e.g., for navigational purposes). The analysis of images as they are produced by a TV camera with, e.g., 25 frames per second, is required if a closed-loop control system in the strict sense is required. This, however, does not make the research efforts on the analysis of still images, as presented in the previous chapters, obsolete. On the contrary, the analysis of a dynamic environment as seen by a camera requires the utilization of all clues available.

There are, in general terms, three approaches to the analysis of moving images, sometimes, however, leading to similar results: these are the feature match, the optical flow, and the numerical signal analysis method. In the latter, the stream of images is considered as a four-dimensional signal with moving parts "filtered" out of the scene (Bees, 1982). Problems with this approach are usually due to the ill fitting of the underlying model; for example, if image sequences are subjected to a harmonic analysis, this approach requires unrealistic assumptions about the nature of the images and would not make use of relevant a-priori knowledge. A filter with fewer implementation problems is the pixel-by-pixel difference operator for consecutive frames (Yachida, Ichinose and Tsuji, 1983; Wenstoep, 1983). This operation is clearly a linear temporal filter, and the (temporal) response (i.e. energy as a function of object image displacement) depends on the chosen delay factor. Unfortunately, the filter approach generally destroys the pictorial features of moving scene elements.
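The pixel-by-pixel difference operator can be sketched as a thresholded temporal difference on toy frames (the threshold value is hypothetical):

```python
def change_mask(prev, curr, threshold=10):
    """Binary change mask between two gray-value frames (nested lists):
    1 where the absolute temporal difference exceeds the threshold."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

frame_a = [[50, 50, 50, 50],
           [50, 50, 50, 50]]
frame_b = [[50, 50, 200, 200],   # a bright object has entered on the right
           [50, 50, 200, 200]]
mask = change_mask(frame_a, frame_b)   # 1s mark the changed region
```

As the text notes, the mask marks where something moved but discards what it looked like, which is the weakness of the pure filter approach.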

The feature matching method is similar to the one used in stereo vision. Here, however, correspondences have to be established on moving parts of the scene, and a reduction to an epipolar line search as in stereo is, in general, not possible. Again, if there are too many features in the image, the actual matching (relaxation) problem might become a considerable task; if there are too few, erroneous correspondences could have a detrimental influence, and a full retrieval of the relevant information might not be possible. It is therefore important to select features that display a high degree of uniqueness (Kories and Zimmermann, 1984). These might be straight lines, corners, or image points with certain contrast characteristics. The matching process requires the introduction of constraints in order to resolve ambiguities and identify unmatchable features. If the process is carried over to subsequent images, a whole chain of feature matches might evolve that could not only support the constraints in the relaxation process, but also might help in identifying the three-dimensional structure of the moving object (Ullman, 1983).

Optical flow is defined as the dynamic flow of brightness in the visual field. In technical systems, this flow is recorded on a time-sampled sequence of images. The optical flow approach might be considered as the lower-end extreme of the feature matching method: here, every single pixel is the carrier of a feature, namely its gray-level (or color). It should be pointed out that the optical flow is given in terms of brightness values, not object points of the scene. Hence, photometric problems and shadows have to be taken into account when the inversion from optical flow to object motion is to be undertaken. The problem of optical flow derivation can be approached from the Taylor-series expansion of gray-level displacement (Limb and Murphy, 1975), or from solving the flow equation directly (Horn and Schunck, 1981). In any case, a purely local solution of this problem is impossible. For example, the flow within homogeneous gray-level areas is locally undetermined, and even at gray-value edges, only the vector component perpendicular to the edge can be determined. Hence the solution requires the introduction of constraints and a strategy of relaxation from local to global estimates. One of the frequently used constraints is smoothness, assuming there are no discontinuities in the resulting vector field of brightness flow. But since these do exist across the borders of moving objects, especially where they occlude or disclose background, the smoothness constraint has to be subjected to further restrictions. Since object edges are generally assumed at gray-value edges, the extraction of these can play a central part in restricting the smoothness constraints (Cornelius and Kanade, 1983).
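Locally, the flow equation of Horn and Schunck is the brightness-constancy constraint Ix u + Iy v + It = 0: one equation for two unknowns, so only the flow component along the gray-value gradient is determined (the aperture problem described above). A sketch with hypothetical derivative values:

```python
import math

def normal_flow(ix, iy, it):
    """Flow component along the gray-value gradient, the only part
    determined locally by the flow equation ix*u + iy*v + it = 0.
    Returns None in homogeneous areas, where no local estimate exists."""
    grad = math.hypot(ix, iy)
    if grad == 0.0:
        return None              # homogeneous area: flow undetermined
    return -it / grad            # magnitude along (ix, iy) / grad

v_edge = normal_flow(2.0, 0.0, -4.0)   # vertical edge: 2 px/frame rightward
v_flat = normal_flow(0.0, 0.0, 0.0)    # no local estimate possible
```

Recovering the tangential component is exactly what the smoothness constraint and the global relaxation are for.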

This strategy can be generalized by implementing the smoothness constraint directly in the derivatives of the gray-value (Nagel, 1984). Since, generally speaking, the flow equation including the constraints cannot be solved analytically, numerical solutions are required, with initial values derived from those points where the flow equation might offer a unique solution (certain gray-value corners, for instance). Besides the above-mentioned problems of inferring object motion from optic flow, there are still various other questions that require intensive research: suitable operators to estimate the local gray-value derivatives (operator size, prior low-pass filtering), the search for efficient, stable and fast converging iteration schemes, and, most important, the conditioning of the smoothness constraint.

All current implementations require off-line image processing with considerable computing power, and occasional manual interaction. This is all the more true for applications where vision systems are to guide a mobile robot through an obstacle course (Moravec, 1983). Together with the gamut of unsolved problems, from determining the 3-D structure of the scene or the motion parameters of object trajectories from the optical flow, to segmenting and deriving symbolic descriptions for the retrieved information, extensive additional work is needed to develop machines which meet industrial standards.

Although the above-mentioned methods of analysing moving images are clearly a result of profound theoretical research, other, more empiric approaches have been undertaken in the past to solve a particular problem with currently available technology. This requires that strong constraints for image interpretation be supported by particular circumstances. For example, a camera mounted on the ceiling, directed down onto the work area of a robot for surveillance purposes (Haass, 1982), can utilize the assumption that all objects then move basically in a two-dimensional plane, hence not requiring any depth measurement. Once an image could be segmented into "change" and "no change" areas, a binary vision system might suffice in order to register appropriate features of the "change" areas and establish a record describing their motion. The problem of distinguishing between illumination changes and changes due to moving objects can be attacked by photometric arguments: changing the illumination or casting a shadow on stationary image elements can be modelled as a multiplicative process on the gray-value, with a constant factor within the affected area. Hence, if the division of subsequent image gray-values leads to a constant factor within this area, the hypothesis of change due to illumination could be accepted. The interpretation of objects then is performed solely on the basis of the dynamic characteristics of the areas: the maximum velocity of a human being, the entrance zone of a work area, or the marking of such areas. Such a system might be sufficient for the surveillance of the work area of a robot (Haass, 1984).
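The division test for illumination changes can be sketched directly; the area values and the tolerance are hypothetical:

```python
def illumination_change(prev_vals, curr_vals, tolerance=0.05):
    """Test whether a changed area can be explained by an illumination
    change: the ratio curr/prev should be (nearly) constant over the
    area. Inputs are the gray-values of the area in two frames."""
    ratios = [c / p for p, c in zip(prev_vals, curr_vals) if p > 0]
    if not ratios:
        return False
    mean = sum(ratios) / len(ratios)
    return all(abs(r - mean) <= tolerance * mean for r in ratios)

shadow = illumination_change([100, 120, 140], [50, 60, 70])    # ratio 0.5 everywhere
moved = illumination_change([100, 120, 140], [30, 200, 140])   # ratios scatter
```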

The record of location coordinates of moving objects can be directly fed into a Kalman filter. Thus, if the object's motion can be modelled as a dynamical process (and this is usually true at least during a short time interval), and if the frame rate of the vision system is sufficiently high, predictions can be derived about the object's location in the next image frame. Predictions could greatly ease many tasks in moving image analysis; for example, they could result in a substantial reduction of computation in the relaxation process of feature correspondences or optical flow vectors. The derivatives of the trajectories, such as velocity and acceleration, could be directly implemented in a closed-loop robot control system. Here lies a considerable backlog demand for research.
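A constant-velocity state model makes the prediction step concrete. The sketch below is a one-dimensional Kalman predict/update cycle with hypothetical noise parameters, not the filter of any system cited here:

```python
def kalman_1d(measurements, dt=0.04, q=1e-3, r=0.25):
    """Track position and velocity from noisy position fixes (one axis)
    and return the position predicted for the frame after the last one.
    dt ~ frame time (s); q, r are process/measurement variances."""
    x, v = measurements[0], 0.0          # state: position, velocity
    p = [[1.0, 0.0], [0.0, 1.0]]         # state covariance
    for z in measurements[1:]:
        # predict with the constant-velocity model x' = x + v*dt
        x, v = x + v * dt, v
        p = [[p[0][0] + dt * (p[1][0] + p[0][1]) + dt * dt * p[1][1] + q,
              p[0][1] + dt * p[1][1]],
             [p[1][0] + dt * p[1][1], p[1][1] + q]]
        # update with the new position measurement z
        k0 = p[0][0] / (p[0][0] + r)
        k1 = p[1][0] / (p[0][0] + r)
        x, v = x + k0 * (z - x), v + k1 * (z - x)
        p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
             [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
    return x + v * dt                    # location predicted for next frame

# An object moving roughly 1 px/frame: the filter predicts it keeps going.
track = [0.0, 1.1, 1.9, 3.05, 4.0, 4.95, 6.1, 7.0, 8.0, 9.05]
next_pos = kalman_1d(track, dt=1.0)
```

The predicted position is what would bound the correspondence search in the next frame; the velocity estimate is the trajectory derivative mentioned above.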

HARDWARE

Currently, computer vision's most restrictive bottleneck is hardware. Due to the enormous information content of a single image, notwithstanding a sequence of images being produced at TV rate, classical von Neumann processors need to be replaced or supported by parallel architectures. This is particularly true for the job of low-level processing because of the inherent parallelity of the algorithms. About two dozen designs of processors for that level exist today, from pipeline structures (cytocomputer) to full-array architectures (CLIP) (Computer, 1983). So far, little attention has been given to the similarity of homogeneous transform operations for vision and robotic manipulator control. Here, a unified design for special purpose hardware could be investigated.
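The shared primitive is the 4x4 homogeneous transform: chaining manipulator frames is the same matrix product as chaining camera frames. A minimal sketch (frame names and offsets hypothetical):

```python
import math

def rot_z(theta, tx=0.0, ty=0.0, tz=0.0):
    """Homogeneous 4x4 transform: rotation about z plus a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices (composition of transforms)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(t, point):
    """Transform a 3-D point given in homogeneous coordinates."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(t[i][k] * vec[k] for k in range(4)) for i in range(3))

# Chaining world->base and base->tool is one matrix product, exactly as
# a camera model chains world->camera:
world_T_base = rot_z(0.0, tx=1.0)            # base 1 m along world x
base_T_tool = rot_z(math.pi / 2, tx=0.5)     # tool offset, turned 90 deg
world_T_tool = matmul(world_T_base, base_T_tool)
p = apply(world_T_tool, (1.0, 0.0, 0.0))     # a point 1 m along tool x
```

The identical arithmetic on both sides is what suggests a unified special-purpose matrix unit.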

more true for applications,

vision and r obotic manipulator

fast

converging

important,

the

iterati on

schemes,

conditioning

of

and,

the

most

smoothness

constraint .

All current implementations require off-line image processing with considerable computing power, and occasional manual interaction. This is all the more true for applications where vision systems are to guide a mobile robot through an obstacle course (Moravec, 1983). Together with the gamut of unsolved problems, from determining the 3-D structure of the scene or the motion parameters of object trajectories from the optical flow, to segmenting and deriving symbolic descriptions for the retrieved information, extensive additional work is needed to develop machines which meet industrial standards.

Although the above-mentioned methods of analysing moving images are clearly a result of profound theoretical research, other, more empiric approaches have been undertaken in the past to solve a particular problem with currently available technology. This requires that strong constraints for image interpretation be supported by particular circumstances.

For example, a camera mounted on the ceiling, directed down onto the work area of a robot for surveillance purposes (Haass, 1982), can utilize the assumption that all objects then move basically in a two-dimensional plane, hence not requiring any depth measurement.
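Under this planar-motion assumption, the image-to-workspace mapping collapses to a fixed two-dimensional transform. A minimal sketch follows; the scale and origin are hypothetical calibration values, not taken from the cited system.

```python
def pixel_to_floor(u, v, scale=0.005, origin=(0.0, 0.0)):
    """Map image coordinates (pixels) to floor coordinates (metres)
    for an overhead camera viewing a single plane: no depth
    measurement is needed, only a calibrated scale and offset."""
    x = origin[0] + scale * u
    y = origin[1] + scale * v
    return x, y
```

With such a mapping, positions and velocities of moving areas can be reported directly in workspace coordinates.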

Once an image could be segmented into "change" and "no change" areas, a binary vision system might suffice in order to register appropriate features of the "change" areas and establish a record describing their motion. The problem of distinguishing between illumination changes and changes due to moving objects can be attacked by photometric arguments: changing the illumination or casting a shadow on stationary image elements can be modelled as a multiplicative process on the gray-value, by a constant factor within the affected area. Hence, if the division of subsequent image gray-values leads to a constant factor within this area, the hypothesis of change due to illumination can be accepted. The interpretation of objects is then performed solely on the basis of the dynamic characteristics of the areas indicating "change due to motion", by including other constraints, like minimum and maximum values of these characteristics.
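The photometric test sketched above can be written in a few lines; the tolerance value and the use of the relative spread of the ratio image as the constancy criterion are assumptions for illustration.

```python
import numpy as np

def illumination_change(frame1, frame2, mask, tol=0.05):
    """Photometric test from the text: within the changed area (mask),
    an illumination change or shadow multiplies gray-values by a
    roughly constant factor.  If the frame ratio is near-constant
    there, accept the 'change due to illumination' hypothesis."""
    eps = 1e-6                           # avoid division by zero
    ratio = (frame2.astype(float) + eps) / (frame1.astype(float) + eps)
    r = ratio[mask]                      # ratios inside the changed area
    # Constancy test: relative spread of the ratio below a tolerance.
    return bool(r.std() / r.mean() < tol)
```

A shadow falling on a textured but stationary surface passes this test, whereas an object moving into the area replaces the underlying gray-values and produces a widely scattered ratio.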

HARDWARE

Currently, Computer Vision's most restrictive bottleneck is hardware. Due to the enormous information content of a single image, notwithstanding a sequence of images being produced at TV rate, classical von-Neumann processors need to be replaced or supported by parallel architectures. This is particularly true for the job of low-level processing, because of the inherent parallelism of the algorithms. About two dozen designs of processors for that level exist today, from pipeline structures (Cytocomputer) to full-array architectures (CLIP) (Computer, 1983). So far, little attention has been given to the similarity of homogeneous transform operations for vision and for robotic manipulator control. Here, a unified design for special-purpose hardware could be investigated.

Whereas front-end processing is well under research, and some kind of Single- or Multiple-Instruction, Multiple-Data (SIMD, MIMD) architecture will eventually be implemented in each computer vision system (only optical computing devices will create new technological leaps in this field again), the terra incognita really begins with the interface to the symbolic level. Various fronts for research, however, have been opened, e.g., investigating representation, acquisition, storage and processing of knowledge, inference machines, etc. (Arlabosse and co-workers, 1985; Nitzan, 1985). Although some of these topics include well-known problems like syntactic pattern recognition, model design and search strategies, it seems that new formalisms emerge that allow more rigorous approaches to research.
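The similarity between vision and manipulator computations noted above can be made concrete: both reduce to chains of 4x4 homogeneous transforms, so one matrix-multiply engine could in principle serve both tasks. A minimal sketch follows; the frame names and numeric offsets are hypothetical.

```python
import numpy as np

def homogeneous(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation matrix
    and a 3-vector translation -- the same operation appears in both
    camera models and manipulator kinematic chains."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

# A kinematic chain and a camera pose are both products of such
# matrices (hypothetical frames and offsets, in metres).
base_to_wrist = homogeneous(np.eye(3), [0.3, 0.0, 0.5])
wrist_to_cam = homogeneous(np.eye(3), [0.0, 0.0, 0.1])
base_to_cam = base_to_wrist @ wrist_to_cam

point_cam = np.array([0.0, 0.0, 0.2, 1.0])   # point in camera frame
point_base = base_to_cam @ point_cam         # same point in base frame
```

The shared structure is what suggests a unified special-purpose hardware design for both domains.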

CONCLUSION

Because of the multitude of unsolved problems, computer vision for feed-back in robot control is still far from industrial ubiquity, although vision systems for inspection have entered the arena and are likely to experience increasing deployment rates with growing sophistication. The tasks usually referred to as visual on-line feedback include the analysis of the three-dimensional geometry of the scene, as well as of moving objects, from a fixed or moving platform, and are therefore much more challenging.

Nevertheless, research laboratories all over the world have started working on the problems that need to be overcome, and are very often supported by national or, as in the European Community, supranational research programs (Rathmill and co-workers, 1985). Initial results have been demonstrated so far, but usually require severe restrictions of the scene, heavy computational power, and off-line processing. Although almost all technology forecasts foresee the emergence of vision-controlled robots in the near future, insight into the problems might help preserve a certain scepticism about the time-scale of industrial implementation. Demand-pull exists (rationalization, flexibility requirements, product quality improvements, demand for the production of small lots), and so does technology push (sinking prices for electronic parts, higher integration, improved software tools, improved sensors). This, however, is not only true for vision systems: other sensors will emerge which will be competitive or superior in tasks that previously had been almost exclusively assigned to vision. One can expect that in the evolution of industrial knowledge-based robots, vision will play a role only in combination with other sensors, where the system structure is determined from what is needed and what each sensor, alone or in combination with others, can achieve.

Disclaimer

The views represented in this paper reflect solely the opinion of the author.

Literature

Agin, G.J., and R.O. Duda (1975). SRI vision research for advanced industrial automation. 2nd USA-Japan Computer Conf., Tokyo, pp. 113-117.
Arlabosse, F., S. Celi, M. Gallanti, G. Guida, L. Horowitz, and A. Stefanini (1985). Literature survey on qualitative modeling of physical systems, temporal reasoning, and distributed KBS architectures. Technical Report TR-1 for ESPRIT project No. 256, CEC, Brussels.
Batchelor, B. (1982). Enthusiasts debate illumination and optics at U.S. workshop. Sensor Review, July 1982, pp. 157-159.
Boes, U. (1982). Velocity-matched linear filtering in time-varying imagery. Proc. 6th ICPR, pp. 1136-1139.
Chin, R.T., and C.A. Harlow (1982). Automated visual inspection: a survey. IEEE PAMI-4, No. 6, pp. 557-573.
COMPUTER (1983). Vol. 16, No. 1, special issue on computer architectures for image processing. IEEE.
Cornelius, N., and T. Kanade (1983). Adapting optical flow to measure object motion in reflectance and x-ray image sequences. Proc. ACM Siggraph/Sigart Workshop on Motion: Representation and Perception, Toronto, pp. 50-58.
Ferretti, M. (1984). Les yeux des robots. Sciences et Techniques, No. 1, pp. 34-47.
Foith, J.P., C. Eisenbarth, E. Enderle, H. Geisselmann, H. Ringshauser, and G. Zimmermann (1980). Visual sensor for workpiece recognition on moving conveyor belt (in German). In: Wege zu sehr fortgeschrittenen Handhabungssystemen, Springer-Verlag, Berlin, pp. 135-154.
Fryer, R.J. (1982). ARCHIE - an experimental 3-D vision system. 4th ROVISEC, London.
Grimson, W.E.L. (1981). A computational theory of visual surface interpolation. MIT AI Lab Memo No. 613.
Grimson, W.E.L. (1985). Computational experiments with a feature based stereo algorithm. IEEE PAMI-7, No. 1, pp. 17-34.
Haass, U.L., H.B. Kuntze, and W. Schill (1982). A surveillance system for obstacle recognition and collision avoidance control in robot environment. 2nd ROVISEC, pp. 357-366.
Haass, U.L. (1984). Tracking of dynamic changes in TV image sequences for motion interpretation (in German). Reports of the Fraunhofer-Society, F.R. Germany, No. 2, pp. 12-16.
Haettich, W. (1982). Recognition of overlapping workpieces by model directed construction of workpiece contours. Digital Systems for Automation, Crane, Russak Company, Vol. 1, No. 2-3, pp. 223-239.
Hall, E.L., J.B.K. Tio, Ch.A. McPherson, and F.A. Sadjadi (1982). Measuring curved surfaces for robot vision. IEEE Computer, Vol. 15, No. 12, pp. 42-54.
Henriksen, K., P. Johansen, and S. Olsen (1984). Open problems. Report from a discussion at the Copenhagen Workshop on Computer Vision, Jan. 1984, DIKU, Copenhagen.
Hildreth, E.C. (1983). The detection of intensity changes by computer and biological vision systems. CVGIP, Vol. 22, No. 1, pp. 1-27.
Holland, S.W., L. Rossol, and M.R. Ward (1979). CONSIGHT-I: a vision-controlled robot system for transferring parts from belt conveyors. In: Computer Vision and Sensor-Based Robots, G.G. Dodd and L. Rossol, Eds., Plenum, New York, pp. 81-97.
Horn, B.K.P. (1977). Understanding image intensities. Artificial Intelligence, Vol. 8, No. 2, pp. 201-231.
Horn, B.K.P., and B.G. Schunck (1981). Determining optical flow. Artificial Intelligence, Vol. 17, No. 1-3, pp. 185-203.
Huffman, D.A. (1971). Impossible objects as nonsense sentences. Machine Intelligence, B. Meltzer and D. Michie, Eds., Vol. 6, pp. 295-323.
Ikeuchi, K., and B.K.P. Horn (1981). Numerical shape from shading and occluding boundaries. Artificial Intelligence, Vol. 17, No. 1-3, pp. 141-184.
Jarvis, R.A. (1982). Expedient 3-D robot color vision. 2nd ROVISEC, Stuttgart, pp. 327-338.
Jarvis, R.A. (1983). A laser time-of-flight scanner for robotic vision. IEEE PAMI-5, No. 5, pp. 505-512.
Jones, S.B., and G. Starke (1984). Applications, advantages and approaches for image analysis in the control of arc welding. ESPRIT '84: Status Report on Ongoing Work, J. Roukens and J.F. Renuart, Eds., North-Holland, Amsterdam, pp. 425-455.
Julesz, B. (1971). Foundations of Cyclopean Perception. The University of Chicago Press, Chicago.
Kanade, T. (1983). Geometrical aspects of interpreting images as a three-dimensional scene. IEEE Proc., Vol. 71, No. 7, pp. 789-802.
Kories, R., and G. Zimmermann (1984). Motion detection in image sequences: an evaluation of feature detectors. Proc. 7th ICPR, pp. 778-780.
Levi, P. (1983). Laser range finding: industrial robots learn spatial vision (in German). ELEKTRONIK, No. 12, pp. 93-98.
Limb, J.O., and J.A. Murphy (1975). Estimating the velocity of moving images in television signals. CGIP, Vol. 4, pp. 311-327.
Mackworth, A.K. (1973). Interpreting pictures of polyhedral scenes. Artificial Intelligence, Vol. 4, pp. 121-137.
Marr, D., and E.C. Hildreth (1980). Theory of edge detection. Proc. R. Soc. London, Ser. B, Vol. 207, pp. 187-217.
Marr, D., and T. Poggio (1976). Cooperative computation of stereo disparity. Science, Vol. 194, pp. 283-287.
Moravec, H.P. (1983). The Stanford Cart and the CMU Rover. IEEE Proc., Vol. 71, No. 7, pp. 872-884.
Myers, W. (1980). Industry begins to use visual pattern recognition. IEEE Computer, Vol. 13, No. 5, pp. 21-30.
Nagel, H.-H. (1984). Recent advances in image sequence analysis. Proc. 1er Colloque Image - Traitement, Synthese, Technologie et Applications, Biarritz, pp. 545-558.
Niemi, A., P. Malinen, and K. Koskinen (1977). Digitally implemented sensing and control functions for a standard industrial robot. 7th Int. Symp. Ind. Robots, pp. 487-495.
Niepold, R. (1982). Optical sensor system controls arc welding process. 2nd ROVISEC, pp. 201-212.
Nitzan, D. (1985). Development of intelligent robots: achievements and issues. IEEE J. Robotics and Automation, Vol. 1, No. 1, pp. 3-13.
Paul, R.P. (1985). The early stages of robots. IEEE Control Systems Magazine, Vol. 5, No. 1, pp. 27-31.
Rathmill, K., P. MacConaill, S. O'Leary, and J. Browne (Eds.) (1985). Robot Technology and Applications - Proc. 1st Robotics Europe Conference. Springer-Verlag, Berlin.
Roberts, L.G. (1965). Machine perception of three-dimensional solids. In: Optical and Electro-Optical Information Processing, J.T. Tippett et al., Eds., MIT Press, Cambridge, pp. 159-197.
Schalkoff, R.J., and E.S. McVey (1982). A model and tracking algorithm for a class of video targets. IEEE PAMI-4, No. 1, pp. 2-10.
Slater, M., and R.J. Blake (1984). Holographic 3-D image preprocessor for welding control feedback. In: Processing and Display of Three-Dimensional Data, J.J. Pearson, Ed., SPIE Vol. 507.
Tropf, H. (1980). Analysis-by-synthesis search for semantic segmentation, applied to workpiece recognition. Proc. 5th ICPR, Miami, pp. 241-244.
Ullman, S. (1983). Computational studies in the interpretation of structure and motion: summary and extension. MIT AI Lab, Memo 706.
Wenstoep, Oe.S. (1983). Motion detection from image information. 3rd Scandin. Conf. Image Analysis, pp. 381-386.
Witkin, A.P. (1981). Recovering surface shape and orientation from texture. Artificial Intelligence, Vol. 17, pp. 17-45.
Yachida, M., T. Ichinose, and S. Tsuji (1983). Model-guided monitoring of building environment by a mobile robot. 8th IJCAI, pp. 1125-1127.