Copyright © IFAC Robot Control (SYROCO '85), Barcelona, Spain, 1985

VISUAL FEEDBACK FOR ROBOTS: PROBLEMS AND PROGRESS

U. L. Haass

CEC, Information Technologies and Telecommunications Task Force, Rue de la Loi, Brussels

Abstract. This paper is concerned with possible applications of computer vision for feedback control of industrial robots. Efforts in research to overcome the limitations of currently available binary vision systems are presented, but it is pointed out that new developments, especially for dynamic feedback control, are still far from becoming industrial products, and euphoric expectations should be approached with caution. Three important directions of relevant research are dealt with in more detail, in order to demonstrate the range of problems: shape information from a single image, from stereo, and dynamic scene information from image sequences.
Keywords: Computer Vision, Robotics, Visual Feedback, Scene Analysis, Moving Images

INTRODUCTION

So far, the history of industrial robots has been almost parallel to the development of computer vision systems. But although the idea of making robots "intelligent" by giving them vision capabilities was always sufficiently plausible to spur enormous efforts in research, the degree of industrial implementation is far less than the amount of technical publications might suggest. As Richard P. Paul (1985) recalls, the first robots equipped with external sensors used touch feedback as an input device. This, however, was soon dropped because of its slow, groping nature in those days, in favor of vision, although computer vision was still in its embryonic state. Early demonstrations of computers with a TV camera as input device, mostly at MIT (Roberts, 1965) and Stanford University (Agin and Duda, 1975), had triggered a flood of activities in research laboratories around the world. Computer vision, of course, has become practical in many other applications: TV coding, medicine, biology, meteorology, geophysics, etc. Although mutual effects must not be neglected in order to understand the history of ideas, this paper will focus on computer vision in the industrial robot environment.

Almost every survey among the industrial clientele of vision systems shows that expectations had risen with the early demonstrations of vision systems and had been pushed by a multitude of euphoric publications, conferences and industrial fairs, but they have not been met to this day (Batchelor, 1982). This has led to a considerable lack of confidence in new products on the market, which only with the recent development of new-generation gray-scale systems seems to have passed its low point.

Today, the following robotic tasks have emerged as being suitable for visual input:

1. identifying parts
2. inspection of parts
3. determining the point where to grasp a part
4. determining the point where to place a part
5. path control for collision avoidance and grasping moving parts
6. external supervision of the work area
7. navigation of mobile robots

Although the first two tasks are not confined to industrial robots (Chin and Harlow, 1982), the connection of vision and robotics has currently gained such prestige that almost any type of industrial vision system is advertised under the glamorous label of robotics (Ferretti, 1984). That these claims of "robotics" are usually not refuted might be attributed to the fact that sophisticated visual feedback systems capable of the more genuine tasks 3 to 7 have not left the laboratories yet, due to the enormous amount of technical problems still to be solved.

BINARY VISION SYSTEMS FOR ROBOT CONTROL

From the academic point of view, vision systems for industrial applications should benefit from a number of circumstances. As opposed to systems for outdoor applications, they offer the possibility to arrange scene and illumination, they usually deal with parts displaying man-made geometries, and a large amount of a-priori knowledge is available. Hence it was felt that even binary vision systems, three years ago still the only type of vision system on the market, would sell. These systems allow various tasks of picture measurement in almost real-time (TV rate) with a relatively modest investment in terms of storage and processing power. If the scene is sufficiently well arranged and the objects are not too complex, the process of binarization will not discard the information needed to analyse the image correctly.

In spite of the euphoric demonstrations and announcements, only very few of these systems could be implemented successfully in production processes. Some of the reasons are:

1. Limited technical capabilities: Binary vision systems cannot analyse complex objects in a complex scene; the binarization is fooled by dirt on the background or on the objects as well as by reflective surfaces; objects must not overlap; etc.

2. Lack of comfort and flexibility: Most of the currently available vision systems need complicated tuning with lengthy programming efforts in order to adapt the system to a different task. Very often the producer of the vision system is not familiar with the problems of the customer, the software is cryptic, and the customer does not have sufficient background to understand its function.

Whereas the second category of problems might be attributed to the lack of industrial maturity of the systems or to the infant market situation, the technical limitations of binary vision systems are insurmountable.
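The picture measurement such binary systems perform can be sketched in a few lines. The following is a minimal illustration, not taken from the paper: a gray-value image is thresholded into a silhouette, and the area and centroid of a single, non-overlapping part are read off as features. All names and values are hypothetical.

```python
import numpy as np

def binarize(image, threshold):
    """Threshold a gray-value image into a binary silhouette (1 = object)."""
    return (image > threshold).astype(np.uint8)

def blob_properties(binary):
    """Area and centroid of the single, non-overlapping object in the image."""
    ys, xs = np.nonzero(binary)
    if len(xs) == 0:
        return 0, None
    area = len(xs)
    centroid = (xs.mean(), ys.mean())  # (column, row) in pixel coordinates
    return area, centroid

# A toy 5x5 "image" with a bright 2x2 part on a dark background.
img = np.zeros((5, 5))
img[1:3, 2:4] = 200.0
area, centroid = blob_properties(binarize(img, 100))
```

The centroid is exactly the kind of sparse measurement that such a system can deliver at near TV rate, and it fails in exactly the ways listed above: dirt, reflections, or overlapping parts corrupt the silhouette before any analysis begins.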
However, some applications of binary vision systems for robot control should be mentioned that, in spite of the above-mentioned limitations, have shown the potential usefulness of vision systems for robot control and thus have contributed to the growing swell of expectations:

1. Picking from a moving conveyor belt. One of the early, spectacular exercises of vision systems was to guide a robot to pick an object from a moving conveyor belt. Although the performance of this task suggests a closed-loop type of robot control, most of the demonstrations still employ some form of open-loop control. Very often a fixed camera is directed perpendicularly from above at the conveyor belt, with the vision system identifying the relative position of a single, non-overlapping object on the belt, and the time of passage (Holland, Rossol and Ward, 1979; Foith and co-workers, 1980). Assuming no slippage in the belt, and with the belt speed given or measured, the position of the workpiece as a function of time can be calculated, such that the robot is guided to an appropriate pick position.

2. Tracking. Closed-loop control mechanisms can be implemented with binary vision systems if the object is not too complex and moves on a homogeneous background. Visual tracking systems had been developed first to track missiles (Schalkoff and McVey, 1982); here the requirement of an object of simple geometry on a homogeneous background is fulfilled. Kalman trackers were also employed to predict the position of a moving part in order to grab it (Niemi, Malinen, and Koskinen, 1977). Although critics have often regarded this problem as too academic for a real-life factory pick-and-place work station, this was mainly due to the simplistic scene required to successfully employ binary vision systems.

3. Seam welding. This application has created considerable interest because of its potential for automation. Due to the often limited complexity of the scene around the seam, visual guidance might be feasible even with binary vision systems. In most of the implementations, the seam is traced by extracting its position and curvature at TV frame rate (Jones and Starke, 1984). Another interesting approach is applicable to arc welding (Niepold, 1982). Here a picture showing the melting pool and the tip of the electrode is analysed. Since the welding arc itself would impede any useful TV imaging, pictures are taken asynchronously during the frequent arc breakdowns when the protruding electrode touches the work pieces. The (binary) image of melting pool and electrode tip does not only determine the location of the electrode relative to the seam in the x, y, and z axes, but also allows the control of current, voltage, and even the oscillating motion of the electrode holder, as required for some types of seams.

Image analysis can be considered as the inverse process of the picture-taking act, i.e., the recovery of the three-dimensional geometry, of information about surface features, etc., from two-dimensional pictures. Since the imaging process results in a dramatic reduction of information, the inverse process of image analysis is generally an ill-posed problem and therefore requires the introduction of constraints, or clues, which help to recover as much information of the original scene as possible. Even the best-understood problems of vision, therefore, could hardly be solved by binary vision systems. Current research in image analysis might be divided into three frontiers: the analysis of a single gray-tone or color image, the analysis of stereo pairs of images, and the analysis of moving images.

RECOVERY OF SCENE INFORMATION FROM A SINGLE IMAGE

Work at this frontier dates back to the beginning of computer vision, where the scene was restricted to a "blocks world" (Roberts, 1965; Huffman, 1971), and the task consisted of finding logical clues for the interpretation of edges, corners, faces and volumes. The problem of edge detection, i.e., low-level image analysis, was usually given less consideration; very often the discussion started off from the existence of ideal drawings of block-type wire-frame models. Nevertheless, the analysis of block-world pictures has provided us with valuable insights into the problems of scene description, and has influenced the emerging disciplines usually subsumed under the term image "understanding". Interestingly enough, the technique of homogeneous transformation between object space and image plane (Roberts, 1965) has become standard also in robotics for relating various systems of manipulator or world coordinates to each other.
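The homogeneous-transformation technique just mentioned can be illustrated with a small sketch. The pinhole projection, the function names and all numbers are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def homogeneous_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def project(point_world, world_to_camera, focal_length):
    """Map a 3-D world point to image-plane coordinates (pinhole model)."""
    p = np.append(point_world, 1.0)      # homogeneous coordinates
    x, y, z, _ = world_to_camera @ p     # point expressed in camera coordinates
    return focal_length * x / z, focal_length * y / z

# Camera frame shifted 10 units along z, no rotation (illustrative values).
T = homogeneous_transform(np.eye(3), np.array([0.0, 0.0, 10.0]))
u, v = project(np.array([2.0, 1.0, 0.0]), T, focal_length=1.0)
```

The same 4x4 matrices chain by plain multiplication, which is why the formalism serves equally well for camera models and for manipulator or world coordinate frames.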
Today it has been recognized that edge detection is not a trivial task. A three-dimensional body edge generally is hypothesized at the location of a discrete gray-value transition in the image. This conclusion, however, leads to false alarms at shadows, reflections, or borders of surface properties. Another important problem is the scale of a gray-value transition: this depends on the resolution, the size of the object in the image plane, and the roundness of the actual edge. One of the edge operators that has received wide-spread attention is the zero-crossing filter (Hildreth, 1983). This band-pass filter is a combination of a Gaussian low-pass and a Laplacian high-pass, hence mapping gray-level slopes to zero-crossings (the Laplacian can be considered as a second derivative); the frequency response has the shape of a Mexican hat. The importance of this filter lies not only in its simple implementation and the flexibility in tuning its frequency response, but in the discovery that the human visual system possesses similar edge operators (Marr and Hildreth, 1980), which can be explained as the difference of two Gaussian low-pass filters with different cut-off frequencies. The difference of Gaussians (DOG), however, is only another way of deriving the Mexican hat operator. Due to the problems mentioned above, edge detection requires a subsequent stage in order to discard wrong edges, connect broken edges, and possibly compare the results from various operator sizes.
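A rough one-dimensional sketch of the difference-of-Gaussians version of this filter (all parameter values are illustrative assumptions): applied to an ideal step edge, the band-pass response changes sign exactly at the edge, which is the zero-crossing the operator looks for.

```python
import numpy as np

def gaussian_kernel(sigma, radius=6):
    """Normalized, truncated 1-D Gaussian low-pass kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def dog_response(signal, sigma_narrow=1.0, sigma_wide=2.0):
    """Difference of two Gaussian low-passes with different cut-off
    frequencies: a band-pass approximating the 'Mexican hat'."""
    narrow = np.convolve(signal, gaussian_kernel(sigma_narrow), mode="same")
    wide = np.convolve(signal, gaussian_kernel(sigma_wide), mode="same")
    return narrow - wide

def zero_crossings(response):
    """Indices i where the response changes sign between i and i+1.
    The small tolerance suppresses spurious crossings from rounding noise."""
    return np.nonzero(response[:-1] * response[1:] < -1e-9)[0]

# An ideal step edge in a 1-D gray-value profile: 0 up to index 19, 100 from 20 on.
profile = np.concatenate([np.zeros(20), np.full(20, 100.0)])
edges = zero_crossings(dog_response(profile))   # crossing between indices 19 and 20
```

Tuning sigma trades localization against noise suppression, which is exactly the operator-size question raised above.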
This post-processing requires the introduction of (higher-level) models for scene objects, which essentially adds new constraints to the image analysis problem. An elegant method for this task is the Analysis-by-Synthesis method (Tropf, 1980). Here, edges, connected lines or pieces of curvature are successively combined or discarded in order to synthesize a representation of the object, while the synthesis is compared with available models. The process is stopped when a certain difference between image model and reference model has been achieved. This method allows even the recognition of work pieces that overlap (Haettich, 1982). Further clues about the three-dimensional scene can be derived from assumptions about the symmetry of edges (Kanade, 1983).

As we all know, edges are not the only clues to the three-dimensional structure of a depicted scene; edge detectors derive information only at the transitions of gray-values. Another powerful tool is the utilization of photometric laws. Each single gray-value of the image is a result of the geometric constellation of illumination source and surface normal of the object, as well as of the properties of the light source, the surface reflectivity and the imaging process. The inverse operation, also called shape from shading, maps the gray-value of each pixel to a surface normal (Ikeuchi and Horn, 1981). Since for a single pixel the surface normal cannot be inferred uniquely from its gray-value, new constraints have to be added. The introduction of two powerful tools has made an iterative solution to this problem possible: one is the gradient space (Mackworth, 1973), and the other is the reflectance map (Horn, 1977).
The gradient space is a two-dimensional plane, spanned by two surface parameters, into which three-dimensional surface normal vectors are mapped, each surface being associated with its normal. Under orthographic projection, tilted surface planes thus map into points of the two-dimensional gradient space. The reflectance map is obtained in the gradient space by tilting an illuminated plane surface in all possible directions (normal vectors) relative to the viewer. For each normal vector, the relative brightness of reflection is recorded; usually, lines of constant brightness are then connected. In the case of purely Lambertian reflection, with the surface being illuminated from the same direction as the viewer, these lines of constant (relative) brightness are concentric circles.

Horn (1977) has developed a method to recover the normal of an illuminated surface through "characteristic strip expansion", an iterative propagation of the image-irradiance equation performed alternatingly in the reflectance map (which has to be given for the right surface and illumination) and in the gray-level image itself. To begin this walk, the surface normal of a particular image point must be known. From then on, the iterative procedure produces a unique solution for the surface normal at all image points reached during the walk.
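For the Lambertian case described above, the reflectance map can be written down directly. This is a sketch, not the paper's formulation; the clipping of self-shadowed orientations is an added assumption:

```python
import numpy as np

def lambertian_reflectance(p, q, ps, qs):
    """Relative brightness of a Lambertian surface patch with gradient (p, q),
    illuminated from the direction given by gradient-space point (ps, qs)."""
    num = 1.0 + p * ps + q * qs
    den = np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + ps**2 + qs**2)
    return np.maximum(num, 0.0) / den   # self-shadowed patches reflect nothing

# Light from the viewing direction (ps = qs = 0): brightness depends only on
# p^2 + q^2, so the iso-brightness contours are concentric circles.
r_center = lambertian_reflectance(0.0, 0.0, 0.0, 0.0)
r_a = lambertian_reflectance(1.0, 0.0, 0.0, 0.0)
r_b = lambertian_reflectance(0.0, 1.0, 0.0, 0.0)
```

Evaluating this map over a grid of (p, q) values is all that is needed to supply the "given reflectance map" that the strip-expansion walk consults at every step.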
Horn's method is not only problematic because of the requirement of a given reflectance map and a known initial point, but also due to the detrimental influence of image noise on the accumulation of errors. Several improvements have therefore been suggested: to start the walk at various points simultaneously and interpolate the surface in between by introducing smoothness constraints; to enter additional constraints, like additional pictures with different illumination geometries (Ikeuchi and Horn, 1981); or to make use of texture (Witkin, 1981). The availability of high-resolution color cameras allows even further photometric clues (Jarvis, 1982). In any case, the surface orientation map, like any chart of gray-value edges, is still a low-level representation of the image, hardly allowing a direct identification of the objects.

In order to obtain connected surface elements that can be used to test models, e.g., with a relational structure, the surface orientation map needs to be properly interpolated and segmented (Henriksen, 1984). This identification process might become rather difficult again, because some features of the image representation might not show details required by the reference models. Efficient search strategies are therefore as important as robust low-level operators. Although the approaches mentioned in this section might all be theoretically convincing, the problems of real-world applications are still a heavy burden: successful shape recognition from a single gray-value image has been achieved so far only under laboratory conditions. For industrial needs, methods of structured illumination, especially with a single light ray, have been introduced, mostly because the deviation of the reflected light ray can often be measured with binary vision systems (Myers, 1980). For complex tasks, not a single ray but a mesh of rays might need to illuminate the object (Hall and co-workers, 1982). Although the reconstruction then does not have to cope with the problems inherent in the gray-value analysis of the Horn method, other problems are still manifold. Since these methods use a camera to record the deviated reflection of structured illumination, they may be considered a kind of active sensing. Most methods that employ structured illumination, however, are still difficult to implement in an industrial environment (e.g., they require the absence of other light sources and are not always sufficiently flexible), such that the use of laser systems becomes more and more attractive.

Laser systems can be employed for the generation of depth maps in at least three different ways: as a source of structured illumination, where the reflecting point is located in the image of a calibrated camera and depth is calculated by triangulation (Levi, 1983); by time-of-flight measurement of depth (Jarvis, 1983); or by holographic methods (Slater and Blake, 1984). Although these methods are often not considered as belonging to the field of computer vision, the resulting depth map can usually be interpolated and segmented with the same methods as maps obtained from shape-from-shading. If laser and camera are combined, they could represent a powerful tool for the analysis of geometric structures. It will be interesting to see whether passive or active forms of range finding will become more popular for industrial purposes. Laser systems, like computer vision methods, need considerable time for the calculation of a depth map, but because they suffer from less ambiguity than vision systems, their implementation requires fewer constraints. Passive vision systems, however, might prove to be more flexible.

RECOVERY OF SCENE INFORMATION FROM STEREO

The juxtaposition of a second camera seems to be the natural way of retrieving depth information from a scene. This problem has been looked at not only by computer vision people, but to a great extent by physiologists who are trying to find out how the human stereoscopic vision system works. Undisputed is the fact that the second camera, or eye, allows some sort of triangulation, provided it is performed with the same physical point in the depicted scene. The first problem in stereo vision, therefore, is the establishment of correspondences of image point pairs depicting the same point in the scene. This, however, leads to two further questions: what are good candidates for feature points, and how should the search be guided in order to find the matching partner in the other image? Good candidates for features are those that cannot be confused easily, e.g., corners, small regions and points. The fact that those are often sparse simplifies the search for the corresponding partner, but also generates only a depth map with sparse entries. A subsequent interpolation would require the introduction of
surface smoothness constraints. For example, the image of a plane surface in space might have four corners; only if we introduce a surface smoothness constraint can all depth values for points in between be calculated.

The discovery that the human stereoscopic vision system works even with random dots, in the absence of edges and corners, has forced researchers to revise the theory of stereoscopic vision (Julesz, 1971). It was concluded that stereoscopic vision requires some sort of local grouping or "co-operative process" around each image point (Marr and Poggio, 1976). Later, surface consistency constraints were also introduced (Grimson, 1981). However, it is still not fully clear what type of primitive features the human visual system utilizes, nor how the human visual process is capable of resolving ambiguities so well.

Since eye movement might be rather complicated to copy in industrial vision systems, one can make use of the homogeneous resolution within the visual field of cameras. A fixed geometry of the two image planes allows the search for correspondences along so-called epipolar lines which, in the case of identical cameras with their optical axes in the same direction, are even parallel in both images (Fryer, 1982). The two-dimensional search for a correspondence then becomes a one-dimensional problem, and aside from a "forbidden" zone resulting from disparities in the parallactic occlusion of scene elements, the order of object features along the epipolar line is identical in both images.
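Once a correspondence along such epipolar lines has been established on rectified images, the triangulation itself is elementary. The numbers and the function name below are illustrative assumptions:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulated depth for a matched point pair on rectified images
    (parallel optical axes): Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("a matched point pair must have positive disparity")
    return focal_length_px * baseline_m / disparity_px

# Illustrative numbers: 800 px focal length, 0.12 m baseline, 8 px disparity.
z = depth_from_disparity(8.0, 800.0, 0.12)   # 12.0 m
```

Because depth varies with the reciprocal of disparity, a fixed matching error of one pixel costs ever more depth accuracy as the object recedes, which is the precision problem taken up next.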
From a technical point of view, the precision of depth measurements with stereo cameras is of great interest. If the stereoscopic base line is short, the matching process has to reach sub-pixel precision; if the base is wide, the "forbidden zone" gets larger, the corresponding primitives become less similar, and even photometric properties might lead to disparate gray-level transition patterns at the same physical object point. Other error sources are the limited resolution of the camera, focusing problems of the lenses, and geometric errors of the imaging system, aggravated by mechanical imprecision and vibration. The method of searching for matches along epipolar lines therefore might be too restrictive in an industrial environment, requiring higher-level features for the matching process (Grimson, 1985). Whereas precision in the cm range might be tolerable for obstacle recognition and navigation purposes, assembly tasks certainly require precision in the mm range. Measured against these requirements, there is still a great deal of research work to be done before stereo can be implemented in industrial vision systems.

One way to go might be to combine stereo vision with photometric and other depth clues, possibly even with laser range finders. The integration of various sensors might potentially result in systems that are superior to each single-sensor device, but research on this topic has started only recently. The discrepancy between theoretical requirement studies and the implementation of "intelligent" robots with visual, real-time, on-line feedback control is even more dramatic for the analysis of moving images, as the following section might show.

ANALYSIS OF MOVING IMAGES

This problem encompasses all situations of dynamic imagery, i.e., dynamic scenes taken by a stationary camera (e.g., for surveillance purposes), as well as the analysis of the three-dimensional world from a mobile camera (e.g., for navigational purposes). The analysis of images as they are produced by a TV camera with, e.g., 25 frames per second is required if a closed-loop control system in the strict sense is intended. This, however, does not make the research efforts on the analysis of still images, as presented in the previous chapters, obsolete. On the contrary, the analysis of a dynamic environment as seen by a camera requires the utilization of all clues available.

There are, in general terms, three approaches to the analysis of moving images, sometimes, however, leading to similar results: the feature match, the optical flow, and the numerical signal analysis method. In the latter, the stream of images is considered as a four-dimensional signal with moving parts "filtered" out of the scene (Bees, 1982). Problems with this approach are usually due to the ill fitting of the underlying model; for example, if image sequences are subjected to a harmonic analysis, this approach requires unrealistic assumptions about the nature of the images and would not make use of relevant a-priori knowledge. A filter with fewer implementation problems is the pixel-by-pixel difference operator for consecutive frames (Yachida, Ichinose and Tsuji, 1983; Wenstoep, 1983). This operation is clearly a linear temporal filter, and the (temporal) response (i.e., energy as a function of object image displacement) depends on the chosen delay factor. Unfortunately, the filter approach generally destroys the pictorial features of moving scene elements.

The feature matching method is similar to the one used in stereo vision. Here, however, correspondences have to be established on moving parts of the scene, and a reduction to an epipolar line search as in stereo is, in general, not possible. Again, if there are too many features in the image, resolving ambiguities might become a considerable task; if there are too few, erroneous correspondences could have a detrimental influence, and a full retrieval of the relevant information might not be possible. It is therefore important to select features that display a high degree of uniqueness (Kories and Zimmermann, 1984). These might be straight lines, corners, or image points with certain contrast characteristics.

The actual matching (relaxation) problem requires the introduction of constraints in order to resolve ambiguities and identify unmatchable features. If the process is carried over to subsequent images, a whole chain of feature matches might evolve that could not only support the constraints in the relaxation process, but also help in identifying the three-dimensional structure of the moving object (Ullman, 1983).

Optical flow is defined as the dynamic flow of brightness in the visual field; in technical systems, this flow is recorded on a time-sampled sequence of images. The optical flow approach might be considered as the lower-end extreme of the feature matching method: here, every single pixel is the carrier of a feature, namely its gray-level (or color). It should be pointed out that the optical flow is given in terms of brightness values, not of object points of the scene. Hence, photometric problems and shadows have to be taken into account when the inversion from optical flow to object motion is to be undertaken. The problem of optical flow derivation can be approached from the Taylor-series expansion of the gray-level displacement (Limb and Murphy, 1975), or by solving the flow equation directly (Horn and Schunck, 1981). In any case, a purely local solution of this problem is impossible. For example, the flow within homogeneous gray-level areas is locally undetermined, and even at gray-value edges, only the vector component perpendicular to the edge can be determined. Hence the solution requires the introduction of constraints and a strategy of relaxation from local to global estimates. One of the frequently used constraints is smoothness, assuming there are no discontinuities in the resulting vector field of brightness flow. But since such discontinuities do exist across the borders of moving objects, especially where they occlude or disclose background, the smoothness constraint has to be subjected to further restrictions. Since object edges are generally assumed at gray-value edges, the extraction of these can play a central part in restricting the smoothness constraints (Cornelius and Kanade, 1983).
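A compact sketch in the spirit of the Horn and Schunck (1981) scheme discussed above: derivative estimates, then alternating neighbourhood averaging (the smoothness constraint) and brightness-constancy corrections. The discretization choices and parameter values are illustrative assumptions, not the original formulation.

```python
import numpy as np

def average_neighbors(a):
    """Mean of the four axial neighbours, with values replicated at the border."""
    p = np.pad(a, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(frame1, frame2, alpha=1.0, iterations=100):
    """Optical flow from two frames: brightness constancy Ix*u + Iy*v + It = 0,
    regularized by a global smoothness constraint on the flow field (u, v)."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    Ix = np.gradient(f1, axis=1)     # spatial gray-value derivatives
    Iy = np.gradient(f1, axis=0)
    It = f2 - f1                     # temporal derivative between the frames
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(iterations):
        u_bar = average_neighbors(u)  # smoothness enters via neighbourhood means
        v_bar = average_neighbors(v)
        t = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v

# A smooth blob displaced one pixel to the right between the two frames.
y, x = np.mgrid[0:32, 0:32]
frame_a = np.exp(-((x - 15.0)**2 + (y - 15.0)**2) / 20.0)
frame_b = np.exp(-((x - 16.0)**2 + (y - 15.0)**2) / 20.0)
u, v = horn_schunck(frame_a, frame_b)
```

The choice of alpha is exactly the conditioning question raised in the text: a large value smooths over motion boundaries, a small one lets noise in the derivatives dominate.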
maximum velocity of a
numerical
solutions
are
required,
with
being, area,
o r by marking such
a
system
might be sufficient for the surveillance work area of a robot (Haass, 1984).
of
the
The record of location coordinates of moving objects can be directly fed into a Kalman filter. Thus, if the object 's motion can be modelled as a dynamical process (and this is usually true at least during a short time interval) , and if the frame rate of the vision system is sufficiently high, predictions could be derived about the object ' s location in the next image frame. Predictions could g r eatly ease many tasks i n moving image analysis, for example they could result in a substantial reduction of computation in the relaxation process
This strategy can be generalized by implementing the smoothness constraint directly in the derivatives of the gray-value (Nagel , 1984) . Since , generally speaking, the flow equation including the constraints can not be solved analytically ,
human
the entrance zone of a work
of
feature
correspondences or
optical flow vectors. The derivatives of the tra jectories such as velocity and acceleration could be directly implemented in a closed-loop robot control system. Here lies a considerable back -log demand for research.
initial
values derived from those points where the flow equation might offer a unique solution (certa i n gray - value corn ers, for instance) . Besides the above mentioned problems of inferring object motion from optic flow , there are still various other questions that require intensive research: suitable operators to estimate the loca l gray - value derivatives (operator size, prior low-pass filtering), the search for efficient , stable and
HARDWARE Currently
Computer
bottleneck
is
information
Vision's
hardware. content
most
Due of
to
restrictive
the
a
enormous
single
image ,
notwithstanding a sequence of images bei ng produced in TV rate, cl assi ca l von - Neumann
processors
need
All current implementations require off-line image processing with considerable computing power, and occasional manual interaction. This is all the
to be replaced or supported by paralle l architectures. This is particularly true for the job of l ow-level processing because of the inherent parallelity of the algorithms. About two dozen designs of processors for that level exist today, from pipeline structures (cytocomputer) to full - array architectures (CLIP) (Computer , 1983). So far, little attention has been given to the similarity of homogeneous transform operations for
more true for applications,
vision and r obotic manipulator
fast
converging
important,
the
iterati on
schemes,
conditioning
of
and,
the
most
smoothness
constraint .
where
vision
systems
are to guide a mobile robot through an obstacle course (Moravec, 1983). Together with the gamut of unsolved problems from determining the 3-D-structure of the scene or the motion parameters of object trajectories from the optical flow , to segmenting and deriving symbolic descriptions for the retrieved information , extensive additional work is needed to develop machines which meet industrial standard. Although the above - mentioned methods of analysing moving images are clearly a result of profou nd theoreti c al
research,
other ,
more
empiric
approaches have been undertaken in the past to solve a particular problem with currently available technology. This requires that strong constraints for image interpretati on have to be supported by particular circumstances.
For
examp le,
a camera
mounted on the ceiling di r ected down on to the work area of a robot for surveillance purposes (Haass , 19 82) can utilize the assumption, that all object s then move basically in a two- dimensiona l plane , hence not requi ring any depth measurement .
contro l .
unified design for special purpose be investigated.
Here,
hardware
a
could
Whereas front-end processing is well under research, and some kind of single- or multiple-instruction, multiple-data (SIMD, MIMD) architecture will eventually be implemented in every computer vision system (only optical computing devices will create new technological leaps in this field), the terra incognita really begins at the interface to the symbolic level. Various fronts for research have been opened, however, e.g. investigating the representation, acquisition, storage and processing of knowledge, inference machines, etc. (Arlabosse and co-workers, 1985; Nitzan, 1985). Although some of these topics include well-known problems like syntactic pattern recognition, model design and search strategies, it seems that new formalisms are emerging that allow more rigorous approaches to research.
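The inherent parallelism of low-level processing noted above can be illustrated with a small sketch (an assumed example, not from the paper): a 3x3 mean filter written as shifted-array sums. Every output pixel depends only on a fixed local neighbourhood, which is exactly the structure that SIMD and full-array processors such as CLIP exploit.

```python
import numpy as np

def box3x3(img):
    """3x3 mean filter as nine shifted-array additions.

    Each output pixel is computed independently from a fixed
    neighbourhood, so all pixels could be updated simultaneously
    on an array processor; here NumPy's vectorization stands in
    for that hardware parallelism.
    """
    img = img.astype(float)
    padded = np.pad(img, 1, mode="edge")   # replicate border pixels
    acc = np.zeros_like(img)
    h, w = img.shape
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0
```

The same shifted-sum formulation covers most local operators (gradients, erosion, dilation), which is why a single low-level architecture can serve so many early-vision tasks.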
Once an image has been segmented into "change" and "no change" areas, a binary vision system might suffice to register appropriate features of the "change" areas and to establish a record describing their motion. The problem of distinguishing between illumination changes and changes due to moving objects can be attacked by photometric arguments: changing the illumination or casting a shadow on stationary image elements can be modelled as a multiplicative process on the gray value, with a constant factor within the affected area. Hence, if the division of subsequent image gray values yields a constant factor within this area, the hypothesis of change due to illumination can be accepted. The interpretation of objects is then performed solely on the basis of the dynamic characteristics of the areas indicating "change due to motion", by including other constraints like minimum and maximum velocities.
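The photometric argument can be sketched as follows (an illustrative reconstruction, not the original implementation; the thresholds are hypothetical): divide corresponding gray values of two frames inside a changed region and test whether the ratio is approximately constant, which supports the illumination hypothesis rather than object motion.

```python
import numpy as np

def classify_change(prev, curr, diff_thresh=10.0, ratio_tol=0.05):
    """Label the changed region of a frame pair.

    Returns (change_mask, label) where label is "illumination" if the
    gray-value ratio inside the changed area is nearly constant
    (multiplicative illumination model), "motion" otherwise, and None
    if nothing changed.  Thresholds are hypothetical.
    """
    prev = prev.astype(float)
    curr = curr.astype(float)
    changed = np.abs(curr - prev) > diff_thresh       # binary change mask
    if not changed.any():
        return changed, None
    # Ratio of subsequent gray values inside the changed area: a shadow
    # or lamp scales stationary surfaces by one constant factor.
    ratio = curr[changed] / np.maximum(prev[changed], 1.0)
    if ratio.std() < ratio_tol * ratio.mean():
        return changed, "illumination"
    return changed, "motion"
```

Note that a uniformly bright object on a uniform background can also produce a near-constant ratio, so in practice this test would be combined with the dynamic constraints mentioned in the text.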
CONCLUSION

Because of the multitude of unsolved problems, computer vision for feed-back in robot control is still far from industrial ubiquity, although vision systems for inspection have entered the arena and are likely to experience increasing deployment rates with growing sophistication. The tasks usually referred to as visual on-line feedback include the analysis of the three-dimensional geometry of the scene as well as of moving objects, from a fixed or moving platform, and are therefore much more challenging.
Nevertheless, research laboratories all over the world have started working on the problems that need to be overcome, very often supported by national or, as in the European Community, supranational research programs (Rathmill and co-workers, 1985). Initial results have been demonstrated so far, but they usually require severe restrictions on the scene, heavy computational power, and off-line processing. Although almost all technology forecasts foresee
the emergence of vision-controlled robots in the near future, insight into the problems might help to preserve a certain scepticism about the time-scale of industrial implementation. Demand-pull exists (rationalization, flexibility requirements, product quality improvements, demand for the production of small lots), and so does technology push (sinking prices for electronic parts, higher integration, improved software tools, improved sensors). This, however, is not only true for vision systems: other sensors will emerge which will be competitive or superior in tasks that previously had been almost exclusively assigned to vision. One can expect that in the evolution of industrial knowledge-based robots, vision will play a role only in combination with other sensors, where the system structure is determined by what is needed and by what each sensor, alone or in combination with others, can achieve.
Disclaimer

The views represented in this paper reflect solely the opinion of the author.
Literature

Agin, G.J., and R.O. Duda (1975). SRI vision research for advanced industrial automation. 2nd USA-Japan Computer Conf., Tokyo, pp. 113-117.
Arlabosse, F., S. Celi, M. Gallanti, G. Guida, L. Horowitz, and A. Stefanini (1985). Literature survey on qualitative modeling of physical systems, temporal reasoning, and distributed KBS architectures. Technical Report TR-1 for ESPRIT Project No. 256, CEC, Brussels.
Batchelor, B. (1982). Enthusiasts debate illumination and optics at U.S. workshop. Sensor Review, July 1982, pp. 157-159.
Boes, U. (1982). Velocity-matched linear filtering in time-varying imagery. Proc. 6th ICPR, pp. 1136-1139.
Chin, R.T., and C.A. Harlow (1982). Automated visual inspection: a survey. IEEE Trans. PAMI-4, No. 6, pp. 557-573.
Computer (1983). Vol. 16, No. 1, special issue on computer architectures for image processing. IEEE.
Cornelius, N., and T. Kanade (1983). Adapting optical flow to measure object motion in reflectance and x-ray image sequences. Proc. ACM Siggraph/Sigart Workshop on Motion: Representation and Perception, Toronto, pp. 50-58.
Ferretti, M. (1984). Les yeux des robots. Sciences et Techniques, No. 1, pp. 34-47.
Foith, J.P., C. Eisenbarth, E. Enderle, H. Geisselmann, H. Ringshauser, and G. Zimmermann (1980). Visual sensor for workpiece recognition on a moving conveyor belt (in German). In: Wege zu sehr fortgeschrittenen Handhabungssystemen, Springer-Verlag, Berlin, pp. 135-154.
Fryer, R.J. (1982). ARCHIE: an experimental 3-D vision system. 4th ROVISEC, London.
Grimson, W.E.L. (1981). A computational theory of visual surface interpolation. MIT AI Lab Memo No. 613.
Grimson, W.E.L. (1985). Computational experiments with a feature based stereo algorithm. IEEE Trans. PAMI-7, No. 1, pp. 17-34.
Haass, U.L., H.B. Kuntze, and W. Schill (1982). A surveillance system for obstacle recognition and collision avoidance control in robot environments. 2nd ROVISEC, pp. 357-366.
Haass, U.L. (1984). Tracking of dynamic changes in TV image sequences for motion interpretation (in German). Reports of the Fraunhofer-Society, F.R. Germany, No. 2, pp. 12-16.
Haettich, W. (1982). Recognition of overlapping work pieces by model directed construction of work piece contours. In: Digital Systems for Industrial Automation, Crane, Russak & Co., Vol. 1, No. 2-3, pp. 223-239.
Hall, E.L., J.B.K. Tio, Ch.A. McPherson, and F.A. Sadjadi (1982). Measuring curved surfaces for robot vision. IEEE Computer, Vol. 15, No. 12, pp. 42-54.
Henriksen, K., P. Johansen, and S. Olsen (1984). Open problems: report from a discussion at the Copenhagen Workshop on Computer Vision, Jan. 1984. DIKU, Copenhagen.
Hildreth, E.C. (1983). The detection of intensity changes by computer and biological vision systems. CVGIP, Vol. 22, No. 1, pp. 1-27.
Holland, S.W., L. Rossol, and M.R. Ward (1979). CONSIGHT-I: a vision-controlled robot system for transferring parts from belt conveyors. In: Computer Vision and Sensor-Based Robots, G.G. Dodd and L. Rossol, Eds., Plenum, New York, pp. 81-97.
Horn, B.K.P. (1977). Understanding image intensities. Artificial Intelligence, Vol. 8, No. 2, pp. 201-231.
Horn, B.K.P., and B.G. Schunck (1981). Determining optical flow. Artificial Intelligence, Vol. 17, No. 1-3, pp. 185-203.
Huffman, D.A. (1971). Impossible objects as nonsense sentences. In: Machine Intelligence 6, B. Meltzer and D. Michie, Eds., pp. 295-323.
Ikeuchi, K., and B.K.P. Horn (1981). Numerical shape from shading and occluding boundaries. Artificial Intelligence, Vol. 17, No. 1-3, pp. 141-184.
Jarvis, R.A. (1982). Expedient 3-D robot color vision. 2nd ROVISEC, Stuttgart, pp. 327-338.
Jarvis, R.A. (1983). A laser time-of-flight scanner for robotic vision. IEEE Trans. PAMI-5, No. 5, pp. 505-512.
Jones, S.B., and G. Starke (1984). Applications, advantages and approaches for image analysis in the control of arc welding. In: ESPRIT '84: Status Report on Ongoing Work, J. Roukens and J.F. Renuart, Eds., North-Holland, Amsterdam, pp. 425-455.
Julesz, B. (1971). Foundations of Cyclopean Perception. The University of Chicago Press, Chicago.
Kanade, T. (1983). Geometrical aspects of interpreting images as a three-dimensional scene. Proc. IEEE, Vol. 71, No. 7, pp. 789-802.
Kories, R., and G. Zimmermann (1984). Motion detection in image sequences: an evaluation of feature detectors. Proc. 7th ICPR, pp. 778-780.
Levi, P. (1983). Laser range finding: industrial robots learn spatial vision (in German). ELEKTRONIK, No. 12, pp. 93-98.
Limb, J.O., and J.A. Murphy (1975). Estimating the velocity of moving images in television signals. CGIP, Vol. 4, pp. 311-327.
Mackworth, A.K. (1973). Interpreting pictures of polyhedral scenes. Artificial Intelligence, Vol. 4, pp. 121-137.
Marr, D., and E.C. Hildreth (1980). Theory of edge detection. Proc. R. Soc. London, Ser. B, Vol. 207, pp. 187-217.
Marr, D., and T. Poggio (1976). Cooperative computation of stereo disparity. Science, Vol. 194, pp. 283-287.
Moravec, H.P. (1983). The Stanford Cart and the CMU Rover. Proc. IEEE, Vol. 71, No. 7, pp. 872-884.
Myers, W. (1980). Industry begins to use visual pattern recognition. IEEE Computer, Vol. 13, No. 5, pp. 21-30.
Nagel, H.-H. (1984). Recent advances in image sequence analysis. Proc. 1er Colloque Image: Traitement, Synthèse, Technologie et Applications, Biarritz, pp. 545-558.
Niemi, A., P. Malinen, and K. Koskinen (1977). Digitally implemented sensing and control functions for a standard industrial robot. 7th Int. Symp. on Industrial Robots, pp. 487-495.
Niepold, R. (1982). Optical sensor system controls arc welding process. 2nd ROVISEC, pp. 201-212.
Nitzan, D. (1985). Development of intelligent robots: achievements and issues. IEEE J. Robotics and Automation, Vol. 1, No. 1, pp. 3-13.
Paul, R.P. (1985). The early stages of robots. IEEE Control Systems Magazine, Vol. 5, No. 1, pp. 27-31.
Rathmill, K., P. MacConaill, S. O'Leary, and J. Browne (Eds.) (1985). Robot Technology and Applications: Proc. 1st Robotics Europe Conference. Springer-Verlag, Berlin.
Roberts, L.G. (1965). Machine perception of three-dimensional solids. In: Electro-Optical Information Processing, J.T. Tippett et al., Eds., MIT Press, Cambridge, pp. 159-197.
Schalkoff, R.J., and E.S. McVey (1982). A model and tracking algorithm for a class of video targets. IEEE Trans. PAMI-4, No. 1, pp. 2-10.
Slater, M., and R.J. Blake (1984). Holographic 3-D image preprocessor for welding control feedback. In: Processing and Display of Three-Dimensional Data, J.J. Pearson, Ed., SPIE Vol. 507.
Tropf, H. (1980). Analysis-by-synthesis search for semantic segmentation, applied to workpiece recognition. Proc. 5th ICPR, Miami, pp. 241-244.
Ullman, S. (1983). Computational studies in the interpretation of structure and motion: summary and extension. MIT AI Lab Memo 706.
Wenstoep, O.S. (1983). Motion detection from image information. 3rd Scandinavian Conf. on Image Analysis, pp. 381-386.
Witkin, A.P. (1981). Recovering surface shape and orientation from texture. Artificial Intelligence, Vol. 17, pp. 17-45.
Yachida, M., T. Ichinose, and S. Tsuji (1983). Model-guided monitoring of building environment by a mobile robot. Proc. 8th IJCAI, pp. 1125-1127.