Three-dimensional surface measurement by microcomputer

J R T Lewis and T Sopwith*

IBM UK Scientific Centre, Aquitaine House, 2-5 St. Clements Street, Winchester, Hants SO23 9HE, UK
*Lung Function Unit, Brompton Hospital, Fulham Road, London SW3 6HP, UK
A simple method for measuring the surface shape of an object using structured lighting is described. Whilst many such methods have been proposed in the past, most have suffered from the need for sophisticated image analysis and pattern recognition processes to extract data from the resulting images. The method presented here requires only the simplest image processing functions and a relatively small amount of computation. It is suitable for implementation on a microcomputer. Although originally developed for a medical application, the method has wide applicability and might be suitable, for example, for industrial quality control.

Keywords: surface measurement, structured light, microcomputers
Optical triangulation methods are particularly suitable for the measurement of the surface shape of objects when physical contact is undesirable. Usually these techniques rely on the projection of a pattern of light and dark stripes or grids onto the object in question, and the recording of one or more images of it (see for example References 1-3). These techniques are often referred to as 'active illumination' or 'structured lighting' methods. The projection of patterns onto the object is important for two reasons. First, the objects to be measured, certainly in the medical field, are often rather featureless. The projected patterns add features to the images and allow measurements to be made. Secondly, the patterns reduce the complexity of the stereoscopic matching problem. In any stereoscopic method the correspondence problem must be solved. In techniques where a stereo pair of images is recorded, the correspondence problem consists of discovering pairs of points, one in each image, resulting from the same object feature. In techniques where one of the cameras is replaced by a projector, as in structured light methods, the
correspondence problem is that of knowing which part of the projected pattern gave rise to a particular image feature. The correspondence problem for images taken under ambient lighting conditions has been the subject of much research in the field of computer vision. A number of algorithms have been developed to solve it (see for example References 4-7). Each one requires a considerable amount of computer power since calculations must be performed iteratively for every pixel in each of the two images being processed. In structured light stereo systems the correspondence problem needs to be solved for the features in the projected pattern rather than for every pixel. This implies a considerable reduction in the amount of data to be processed. Even so, there is usually still a need for some form of image analysis or pattern recognition process to locate the projected features in the image. The problem is that the shape of the feature in the image depends on the surface onto which it is projected. Any technique for recognizing the features must be able to cope with these distortions. The difficulty of this problem has meant that stripe- or grid-based methods, used on complex objects, have usually required human observers to perform the tasks of interpreting the images and digitizing the data before the 3D coordinates can be calculated1-3. This is clearly a disadvantage in any situation where speed is essential or where many observations are required, such as routine measurements of significant numbers of human patients. Automatic image interpretation is obviously desirable. Essentially there are two possible ways to overcome the problem. The first is to develop automatic methods of image analysis for the existing structured light techniques. This involves the classic computer vision problems of image segmentation and feature extraction. Neither is trivial and neither has a truly general solution yet. It is becoming clear that the solution to this type of problem will require some combination of sophisticated image processing and reasoning. Each of these requires considerable computer power. Although computer vision and image understanding are topics of active research, the work is not yet at a stage where an application could be built routinely. In addition, analysis at
this level of complexity requires computer resources beyond the capability of the current generation of small, standalone computers. This was an important concern for the original medical application of the technique. The alternative approach is to avoid the complexity altogether and to look for projection patterns which give rise to simpler images which do not require sophisticated analysis. The system described here does that, bypassing many of the traditional problems of structured lighting but generating some new ones of its own. Initially it is being applied to the study of volume changes in the human chest during various types of respiratory manoeuvre. In this respect it is a natural successor to the optical contouring technique1.

IMPLICATIONS OF USING SPOTS
The idea of using a projection pattern of spots, to overcome the image analysis problem in optical contouring1, was originally suggested by Denison8. Clearly, bright spots are easy to find in an image by simply applying a threshold. All pixels in the image brighter than the threshold are considered to belong to spots. To minimize the amount and complexity of the data the best approach is to record only the position of each spot. Since a spot will normally cover a group of pixels in the image, some concept of the position of the spot is required so that it can be calculated from the positions of the pixels of which it is composed. This will be discussed further below. As only the position of the spots is recorded they are all considered identical. This poses some difficulties for any technique for solving the correspondence problem. Consider a system with one projector and one camera. The correspondence problem in this situation is to match each spot in the image with the particular spot from the projector which caused it. If it could be guaranteed that all of the projected spots, or at least some known subset of them, would be visible in the image formed by the camera, disambiguation (the process of solving the correspondence problem) would be relatively straightforward. Once one correct match had been found, the others would follow using standard geometric constraints and sequence restrictions9,12. The constraints will be described in more detail later. For now it is only necessary to realize that no such guarantee about the visibility of a known set of projected points can be given. Occlusion of one part of the object by another part can easily prevent certain projected spots from being imaged. The number and position of the affected spots will depend on the shape of the object. Clearly any system based on the projection of spots must be able to recognize and cater for this situation. This difficulty will be referred to as the 'missing spot' problem. Initially, some form of spot encoding might appear an attractive proposition for reducing the ambiguity of matching. Unfortunately this immediately leads back to the problem of the way in which the image of the spot is affected by the surface onto which it is projected. Shape encoding would have exactly the same type of problem already discussed for stripes and grids. Colour encoding could be used for objects which themselves were neutral in colour. Strongly coloured objects would cause problems, of course. In addition, with current technology there are practical limitations and difficulties associated with the recording and storage of colour images.
Figure 1. A structured light system with three optical components instead of the more conventional two: a camera is placed symmetrically on each side of a projector

Fortunately, there is another approach which obviates the need for spot encoding. It involves using two cameras and a projector, as shown in Figure 1. The arrangement introduces additional geometric constraints which limit the possible pairings between the spots in the two camera images. This allows the correspondence problem to be solved, as will be shown. The whole essence of the technique is to use simple constraints to obtain the solution.
STEREOSCOPIC CONSTRAINTS
There are a number of factors which constrain the possible pairings of points between the two images. Some of these are rigid geometric or photometric constraints, others are rules from observation about how the world behaves generally. These constraints have been derived for use in stereoscopic matching algorithms under ambient lighting conditions (see, for example, References 9 and 10). A number of these constraints are considered here in relation to objects illuminated by an array of bright spots from a projector and observed with a pair of cameras as depicted in Figure 1. The 'contrast sign' constraint requires that stereo matching must take place between points in the images which are either both bright or both dark. There is no implication that both must necessarily have exactly the same brightness level, simply that a dark pixel cannot match a bright one. This not altogether surprising statement is an intuitive result and is also borne out by experiments on the human visual system (see, for example, Reference 15, Figure 5.5-2). The implication for illumination by spots is clear and essentially trivial. In trying to find a match for a given bright spot in one image, it is only necessary to consider bright spots in the other image. The 'uniqueness' constraint states that, almost everywhere, a point in one image can match at most one point in the other image. The exceptions are those rare occasions when malicious alignment of object points
for one image occurs. More important, for illumination by spots, is the case where a point in one image is not matched at all by any point in the other image. This is a natural consequence of the cameras having different fields of view and can also be caused by occlusion, as has been mentioned. It causes difficulty in using certain other constraints, notably that of sequence. However, the uniqueness constraint does imply that the problem will occur infrequently. Inability to match a point should therefore have little effect on the final computed surface, so long as it is not allowed to interfere with the correct matching of other pairs of spots. Matter is smooth almost everywhere. There are discontinuities in depth as a scene is viewed but these occupy much less of the image area than do smooth regions10. In general then, points which are close to one another in the images are likely to have similar disparities. In addition to this rather general rule there does appear to be a practical limit to the rate of change of disparity which can be used for stereoscopic matching in humans13 and in machine vision systems7. In the case of illumination by spots a certain degree of smoothness is enforced by the spacing of the spot matrix itself. Effectively the surface under investigation is being sampled with a spatial frequency defined by the spot spacing on the object. Although not uniform, this is normally of the same order of magnitude as the spot spacing when projected onto a plane surface at the same distance as the object. Surface features smaller than this spacing will not be recorded reliably. For example, although a depth discontinuity will be recorded because the disparities of points either side of it will be different, the shape of the discontinuity cannot be measured. Some indication of its shape can be gained from interpolation between the measured surface points but this will usually lead to a smooth curved profile whatever the actual form of the discontinuity. With this restriction in mind, the technique is clearly best suited to those situations in which a smooth object is being measured, hence its suitability for medical applications. Since the smoothness is to some degree imposed, use of this type of constraint for ruling out certain matches based on the disparity values of neighbouring points seems a reasonable way to proceed. The disadvantage with this type of approach is the need to examine the neighbourhood of a potential match before assigning a particular pairing. Such techniques require a significant amount of computer resource. The solution needs to be iterated until consistent pairings are achieved in each neighbourhood. Pairings need to be re-examined on each iteration as the neighbourhood disparities change. Given the constraints imposed by the type of computer available for the medical application, the smoothness criterion is best used as a means of trying to disambiguate a hopefully small number of points which remain ambiguous after application of the other constraints.
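By way of illustration only (this is not part of the original system), the following Python sketch shows one way such a neighbourhood smoothness test might be applied to an ambiguous candidate match; the data structure, search radius and disparity threshold are all assumptions.

    # Illustrative sketch (not from the original system): test whether a candidate
    # match's disparity agrees with the disparities of already-matched spots nearby.
    # 'matched' maps spot positions (x, y) to disparities; the radius and threshold
    # are arbitrary assumptions.

    def neighbourhood_consistent(spot, candidate_disparity, matched,
                                 radius=2.0, max_difference=0.5):
        sx, sy = spot
        nearby = [d for (x, y), d in matched.items()
                  if abs(x - sx) <= radius and abs(y - sy) <= radius]
        if not nearby:
            return True                     # no neighbours yet: accept provisionally
        mean_disparity = sum(nearby) / len(nearby)
        return abs(candidate_disparity - mean_disparity) <= max_difference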
EPIPOLAR LINES

Of fundamental importance in many stereoscopic systems, epipolar lines provide extremely useful geometric constraints. Consider again Figure 1, which shows a system in which the two cameras are placed symmetrically on either side of a projector. In this arrangement the stereoscopic geometry of the cameras provides the
following constraint. For a given point in one image, the corresponding point in the other image must lie on a line whose equation can be derived from the geometry of the system. These lines are called 'epipolar' lines. Their benefit derives from the fact that they reduce a two-dimensional search for possible matching points to a one-dimensional problem. The lines are not parallel, however, and a search for image points lying on them would in general be time consuming. There is, however, a convenient transformation11 which results in a particularly simple way of implementing epipolar line constraints. This transformation is considered in more detail below.

Figure 2. Stereoscopic geometry for the left-hand camera of a pair

The geometry relating to the left-hand camera of the stereo pair is shown in Figure 2. The origin of the system of coordinate axes is at O. P, with coordinates (x, y, z), represents a point on the object. C is the left camera of the stereo pair. Lines GH and IJ lie in a plane perpendicular to the optic axis OC of the camera. This plane is called the camera recording plane. The coordinates (a, b) of point R in the camera recording plane are related by translation and scaling factors to the pixel coordinates of the image of point P in camera C. These factors can be determined by including fiducial marks in the apparatus at known positions. Calculation of the coordinates of R in the recording plane is the first step in the analysis. Point R, at (a, b) on the recording plane, actually lies at the point (a cos θ, b, a sin θ) in true (x, y, z) space, where θ is the angle between OC and the positive z axis. CP represents the path of the reflected light ray from the object to the camera. Points C, P and R are collinear and the line CR intersects the plane z = 0 at point B whose coordinates are (x_1, y_1, 0). The plane z = 0 is known as the 'base plane'. Since C has coordinates (-D_x, 0, D_z), it is clear that the equation of line CB is given by
$$\frac{x + D_x}{x_1 + D_x} = \frac{y}{y_1} = \frac{D_z - z}{D_z} \qquad (1)$$

Using the fact that point R also lies on this line, it can be shown that

$$x_1 = \frac{-a(D_x \sin\theta + D_z \cos\theta)}{a \sin\theta - D_z}, \qquad y_1 = \frac{-b D_z}{a \sin\theta - D_z} \qquad (2)$$
Similar expressions can be developed to calculate the base plane coordinates (x_2, y_2) corresponding to the real point (x, y, z) seen from the right-hand camera situated at (D_x, 0, D_z):

$$x_2 = \frac{a(D_x \sin\theta + D_z \cos\theta)}{a \sin\theta + D_z}, \qquad y_2 = \frac{b D_z}{a \sin\theta + D_z} \qquad (3)$$
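As an illustration, a minimal Python sketch of Equations (2) and (3) is given below; the function and parameter names are assumptions, and the conversion from pixel coordinates to recording-plane coordinates (a, b) is presumed to have been done already.

    import math

    # Sketch of Equations (2) and (3): project a recording-plane coordinate (a, b)
    # onto the base plane z = 0. D_x and D_z place the cameras at (+/-D_x, 0, D_z);
    # theta is the angle between each optic axis and the positive z axis.

    def base_plane_left(a, b, D_x, D_z, theta):
        s, c = math.sin(theta), math.cos(theta)
        denom = a * s - D_z
        return -a * (D_x * s + D_z * c) / denom, -b * D_z / denom   # (x1, y1)

    def base_plane_right(a, b, D_x, D_z, theta):
        s, c = math.sin(theta), math.cos(theta)
        denom = a * s + D_z
        return a * (D_x * s + D_z * c) / denom, b * D_z / denom     # (x2, y2)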
As we have seen, the left-hand camera situated at (-D_x, 0, D_z) views the point (x, y, z) as lying on the base plane at position (x_1, y_1, 0) and the right-hand camera situated at (D_x, 0, D_z) views the point as lying on the base plane at position (x_2, y_2, 0). The base plane points, the camera points and the object point must all lie in a plane. The condition for this to be true is

$$\begin{vmatrix} x_1 + D_x & y_1 & -D_z \\ x_2 + D_x & y_2 & -D_z \\ 2D_x & 0 & 0 \end{vmatrix} = 0 \qquad (4)$$

which reduces to

$$2 D_x D_z (y_1 - y_2) = 0 \qquad (5)$$
This is the result expected intuitively. The coordinates of the base plane projections of image points corresponding to the same object point will have the same y coordinate. In other words, the epipolar lines in the base plane are horizontal. The search for potential matches is now simple. Given a point in one image, the potential matches in the other image are just those points with the same base plane projection y coordinate. The task of finding potential matches by the epipolar line constraint is considerably simplified by the transformation to base plane coordinates. There is one interesting practical consequence which becomes apparent from this form of the epipolar line constraint. Potential matches are recognized by similarity of the y coordinate of their base plane projection. It is obviously advantageous to reduce the number of potential matches which occur by chance due to different projector rays having the same base plane y coordinates. This can be achieved by using a projected array of spots in which horizontal rows are avoided. Currently the prototype system employs a rectangular array of spots which is rotated around the optic axis of the projector by approximately 25°. Without this rotation it is common to have 20-30 potential matches for each spot. This represents the typical number of spots on a row. With the rotated spot matrix a more typical figure is 3-5 potential matches
and usually a small number of spots have unambiguous matches. The lower number of potential matches greatly reduces the amount of searching and computation which must be performed during disambiguation. A further constraint which accrues from epipolar lines is that of ordering. In general, the sequence of image points along an epipolar line in one image is matched by the sequence of points on the corresponding epipolar line in the other image. This constraint applies to many real-world imaging situations9,12 and initially looks an attractive prospect for the spot matching problem. Once one match is found it should be a simple matter to assign all other matches on the same epipolar line just from their sequence. Unfortunately this approach is too simple as it ignores the problem of missing spots. A spot missing from one image would cause incorrect assignment of all spots after it in sequence on the epipolar lines. Obviously this situation would have to be detected and corrected. In fact the technique used to overcome the problem of missing spots itself introduces a new geometric constraint suitable for restricting the possible matches between image points. This constraint is based on the geometry of the projected array of spots and is independent of sequence. Sequence constraints are not currently employed in the disambiguation algorithm.
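A minimal Python sketch of the candidate list construction described earlier in this section follows; the tolerance on the y difference, which allows for measurement error, is an assumption.

    # Sketch of building candidate match lists with the base plane form of the
    # epipolar constraint: a spot in one image may match a spot in the other image
    # only if their base plane projections have (nearly) the same y coordinate.

    def candidate_matches(left_spots, right_spots, y_tolerance=0.5):
        """left_spots and right_spots are lists of base plane projections (x, y)."""
        candidates = {}
        for i, (_, yl) in enumerate(left_spots):
            candidates[i] = [j for j, (_, yr) in enumerate(right_spots)
                             if abs(yl - yr) <= y_tolerance]
        return candidates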
PROJECTION GEOMETRY CONSTRAINT
It is the addition of the projector in the system of Figure 1 which allows the missing spot problem to be overcome. The projector provides an additional geometric constraint. Not only must a surface point lie on rays to the two cameras but also on a ray from the projector. One way to utilize this constraint is as a test of reasonableness on the 3D coordinates produced by each potential match. For such coordinates to be correct they must lie on one of the projector rays. Furthermore, only one surface point can be illuminated by each projector ray. In the case of competition for a projector ray by more than one potential match, a 'goodness of fit' criterion can be applied. If both competing matches have very similar goodness of fit, disambiguation of the point may be deferred or even abandoned. The use of the projection geometry constraint has been crucial to the success of the disambiguation algorithm used in the prototype system.
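A minimal sketch, in Python, of how this reasonableness test might be expressed: the distance from a candidate 3D point to each projector ray (treated as a line from the projector's point source through its base plane end point) is computed, and the smallest distance serves as the goodness of fit. Names and the tolerance are illustrative assumptions.

    import math

    # Sketch of the projection geometry test: a reconstructed point is accepted only
    # if it lies within 'tolerance' of some projector ray. Each ray runs from the
    # projector (treated as a point source) through its base plane end point.

    def distance_to_ray(point, projector, ray_end):
        p = [point[i] - projector[i] for i in range(3)]
        d = [ray_end[i] - projector[i] for i in range(3)]
        t = sum(pi * di for pi, di in zip(p, d)) / sum(di * di for di in d)
        residual = [pi - t * di for pi, di in zip(p, d)]
        return math.sqrt(sum(r * r for r in residual))

    def best_ray(point, projector, ray_ends, tolerance):
        """Index and distance of the closest ray, or None if no ray is close enough."""
        distances = [distance_to_ray(point, projector, r) for r in ray_ends]
        i = min(range(len(distances)), key=distances.__getitem__)
        return (i, distances[i]) if distances[i] <= tolerance else None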
CALCULATION OF 3D COORDINATES
Once the pairings of spots are available, computation of the real 3D surface coordinate from which the image points arose is a simple matter of triangulation. The equations involved are derived from substitution of the base plane form of the epipolar line constraint into the equations of the rays from the object to each camera. As we have seen, the constraint is formulated as y_1 = y_2 and the equations involved are Equation (1) and the corresponding formula for the right-hand camera of the pair. Substitution of the constraint and combination of the equations for both cameras yields
$$x = \frac{D_x(x_1 + x_2)}{2D_x + x_1 - x_2}, \qquad y = \frac{y_1(x + D_x)}{x_1 + D_x}, \qquad z = D_z\left(1 - \frac{y}{y_1}\right) \qquad (6)$$
Since all of the values on the right-hand side of the first expression in Equation (6) are known, it can be evaluated. The resulting value of x can be used in the second expression for y which can also be evaluated. Finally this value can be used to evaluate z.
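A minimal sketch of this triangulation step, with the three expressions of Equation (6) evaluated in the order just described; the names are assumptions and y_1 is taken to be non-zero.

    # Sketch of Equation (6): recover (x, y, z) from the matched base plane
    # projections (x1, y1) and (x2, y2), where y1 = y2 and y1 is non-zero, for
    # cameras at (+/-D_x, 0, D_z).

    def triangulate(x1, y1, x2, D_x, D_z):
        x = D_x * (x1 + x2) / (2.0 * D_x + x1 - x2)   # first expression: known values only
        y = y1 * (x + D_x) / (x1 + D_x)               # second expression: uses x
        z = D_z * (1.0 - y / y1)                      # third expression: uses y
        return x, y, z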
DISAMBIGUATION ALGORITHM
Prior to disambiguation a certain amount of preprocessing is necessary. Essentially this is to convert the camera image of the object illuminated with spots into a list of spot coordinates. The practicalities of doing this are discussed below in relation to the prototype system. From this list of spot coordinates in the camera, the base plane coordinates are calculated using Equations (2) and (3). It is with these values that disambiguation is carried out. The algorithm makes use of the epipolar line and projection geometry constraints. Essentially the solution can be thought of as taking place in two stages. First, a list is built for each point in one image of all the points in the other image which might match it. This list is built after application of the epipolar line constraint using the simple formulation already described, i.e. the y base plane coordinates of potential matches must be the same. Each of these potential matches is examined to find the ones which give rise to a 3D surface point close to a projector ray. Close in this sense means within some allowable error distance based on the dimensions of the apparatus and the desired level of precision in the final surface coordinates. If such a match is found it is considered to be the correct one and the other potential matches involving these image points are marked as invalid. If later, however, another potential match is found which requires the same projector ray and its goodness of fit criterion is better than the previously assigned match, the projector ray is reassigned to the new match and the previous match is 'reambiguated'. That is, it is marked as once again being ambiguous. In addition, the other potential matches involving the image points concerned are reinstated as being available for matching. The process is iterated until no further change occurs. Typically only two or three iterations are required. Currently, post processing using the smoothness constraint to disambiguate remaining, unmatched spots has not been used. So far the number of spots not capable of disambiguation has been found to be very small.
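The following Python sketch gives a much simplified version of the iteration just described, under several assumptions: each surviving potential match is taken to carry the index of its nearest projector ray and the corresponding distance (its goodness of fit), and the bookkeeping which invalidates and reinstates other matches sharing an image spot is omitted.

    # Much simplified sketch of the disambiguation loop. Each entry of 'matches' is a
    # potential pairing that has already passed the epipolar test; 'ray' is the index
    # of its nearest projector ray and 'distance' the goodness of fit. The handling
    # of other matches that share an image spot with an accepted match is omitted.

    def disambiguate(matches, tolerance, max_iterations=10):
        ray_owner = {}                           # projector ray index -> accepted match
        for _ in range(max_iterations):
            changed = False
            for m in matches:
                if m['distance'] > tolerance:
                    continue                     # not close enough to any projector ray
                current = ray_owner.get(m['ray'])
                if current is m:
                    continue
                # claim the ray if it is free, or displace ('reambiguate') a worse match
                if current is None or m['distance'] < current['distance']:
                    ray_owner[m['ray']] = m
                    changed = True
            if not changed:
                break                            # typically after two or three passes
        return list(ray_owner.values())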
DETERMINING PROJECTION GEOMETRY
As we have seen, disambiguation requires information about the geometry of the projected array of spots. Essentially, the equation of each ray of light from the projector is required. Since the projector is approximated
as a point source, one point on each ray is already known. One further point on each ray is required to define its path uniquely. A particularly convenient point on each ray is the one where it intersects the base plane. By the definition of the geometry of the system, points in the base plane have a z coordinate of 0. Hence only their x and y coordinates need to be established. Clearly it is necessary to project the array of spots onto a physical base plane. It is feasible to measure each spot's x and y coordinates manually but a rather more elegant method is possible. Since the spots are already in the base plane their 3D coordinates and their base plane projected coordinates (Equations (2) and (3)) are the same. This can be seen easily from examination of the expressions for the 3D coordinates (Equation (6)). In the expression for z, substitution of the base plane restriction z = 0 obviously yields y = y_1 as the only physically reasonable solution. In a similar way substitution of y = y_1 into the expression for y leads to the result x = x_1. Finally, substitution of this result into the expression for x yields

$$x = \frac{D_x(x + x_2)}{2D_x + x - x_2} \qquad (7)$$

Rearranging this gives

$$(x - x_2)(x + D_x) = 0 \qquad (8)$$
Clearly the result x = x_2 is the interesting one. In summary then, the coordinates recorded by the cameras and transformed by the base plane projection formulae (Equations (2) and (3)) give the x and y components of the 3D coordinates of the projector ray end points directly. It is possible to determine the end points using only one camera but it is desirable to obtain the values from both cameras of a stereo pair and to average them. This procedure is also useful for checking that the camera alignment is correct. Any inaccuracies in the system geometry will manifest themselves as differences in the coordinate values of a given ray as determined from the two cameras. In general, the base plane is proving a particularly useful concept both theoretically and practically. In addition to simplifying implementation of the epipolar line constraint and providing a means to determine the projection geometry, many of the problems of alignment of the optical system can be overcome by use of suitable markers in the base plane.
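A minimal sketch of this calibration step, assuming the spots projected onto the physical base plane have already been extracted from both cameras, transformed by Equations (2) and (3) and matched up; the names and the disagreement threshold are illustrative.

    # Sketch of the calibration step: with the spot array projected onto the physical
    # base plane, the base plane projections recorded by the two cameras give the
    # (x, y) end points of the projector rays directly (z = 0). Averaging the two
    # estimates also exposes alignment errors.

    def projector_ray_endpoints(left_points, right_points, warn_threshold=1.0):
        """left_points and right_points are matched lists of base plane (x, y) pairs."""
        endpoints = []
        for (xl, yl), (xr, yr) in zip(left_points, right_points):
            if abs(xl - xr) > warn_threshold or abs(yl - yr) > warn_threshold:
                print('warning: cameras disagree about ray end point',
                      (xl, yl), (xr, yr))
            endpoints.append(((xl + xr) / 2.0, (yl + yr) / 2.0, 0.0))
        return endpoints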
PROTOTYPE SYSTEM
A prototype system was constructed to test the practicality of the method. The optical system is kept in the Brompton Hospital, and data is recorded as 35 mm photographic negatives before being sent to the IBM UK Scientific Centre in Winchester for digitization and processing. The object being studied is supported in a framework built from the simplest laboratory equipment, mainly retort stands. The framework also includes fiducial marks in the form of LEDs at known positions in space. These allow scaling of the data to be performed as well as providing alignment marks during digitization of the photographic negatives. The images are obtained from a pair of Olympus 35 mm SLR cameras mounted on
standard photographic tripods. The spot matrix is formed on the object using a Kodak Carousel slide projector and 'Superslide' sized transparencies made especially for the experiment. Digitization of the negatives is carried out using a vidicon TV camera and a framestore. The prototype software for the calculations has been written to run in the IAX image processing environment14. The features of this environment mean that prototypes can be built and modified rapidly, greatly simplifying development. The IAX system runs on a time-sharing IBM 4341 mainframe computer under the VM operating system. Images and results are displayed on a Ramtek 9400 series display system. The first task after digitizing the images is to extract the coordinates of each spot. This is achieved by taking a 'centre of gravity' approach. Each spot is located and a list is built of the coordinates of the pixels of which it is composed. These pixels are found by examining the neighbours of each bright pixel. Pixels are considered to be bright if their grey level exceeds a specified threshold. Once the list is complete the coordinates of the component pixels are averaged to give the spot coordinate. Currently all bright pixels are given the same weight, i.e. they all contribute equally to the spot coordinate. It is conceivable that a weighting scheme could be used in which the brightest pixels in the spot were considered more important than the rest. Such a refinement is probably unnecessary when the projected spots are small and bright. However, if they are large and rather diffuse it could become important. Diffuse spots do tend to occur as a result of using a standard 35 mm projector as the light source. Projectors are designed to deliver as much light as possible and to form an image in a single plane. To this end they have a restricted depth of field. When spots are projected onto real curved surfaces many of them will be out of the plane of best focus. This leads to spreading of each spot and a loss of brightness. The out of focus effect is approximately circularly symmetric and so probably has little effect on the final computed spot centre coordinate. It is intended to study the effect of different weighting schemes on the spot centre coordinate calculation and on disambiguation. The spot centre coordinates are stored as a simple array. From this point onward all computations involving the image are complete and the image data is no longer required. The next step is to convert the spot coordinates into their base plane projections. This involves applying Equations (2) and (3) and scaling the coordinates from pixel values to real dimensions. The scaling factors are derived from the fiducial marks in the apparatus (LEDs). These appear in each image and their coordinates are established in essentially the same way as those of the projected spots. With the data in this form the disambiguation algorithm is applied, as already described. After disambiguation and subsequent application of Equation (6) a list of the 3D coordinates of the object's surface is available.
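For illustration, a Python sketch of the unweighted 'centre of gravity' extraction described above; the original ran in the IAX environment, so numpy, the connectivity rule and the threshold value here are assumptions.

    import numpy as np

    # Sketch of spot extraction: threshold the image, group bright pixels into
    # connected regions by flood fill, and average each region's pixel coordinates
    # (all pixels weighted equally, as in the prototype).

    def extract_spot_centres(image, threshold):
        bright = image > threshold
        visited = np.zeros_like(bright, dtype=bool)
        centres = []
        rows, cols = bright.shape
        for r in range(rows):
            for c in range(cols):
                if bright[r, c] and not visited[r, c]:
                    stack, pixels = [(r, c)], []
                    visited[r, c] = True
                    while stack:                          # flood fill one spot
                        y, x = stack.pop()
                        pixels.append((y, x))
                        for dy in (-1, 0, 1):
                            for dx in (-1, 0, 1):
                                ny, nx = y + dy, x + dx
                                if (0 <= ny < rows and 0 <= nx < cols
                                        and bright[ny, nx] and not visited[ny, nx]):
                                    visited[ny, nx] = True
                                    stack.append((ny, nx))
                    ys, xs = zip(*pixels)
                    centres.append((sum(xs) / len(xs), sum(ys) / len(ys)))  # (x, y)
        return centres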
RESULTS

Figure 3 shows the reconstructed surface of a cylindrical object. This is an early result from the prototype system.
Figure 3. Reconstructed surface from a cylindrical object

The filled black circles represent 3D coordinates reconstructed from the stereoscopic pair of camera views as already described. Interpolation between the coordinates is shown by the straight line segments in the figure. There are a number of places in the figure where the straight lines cross and where there is no coordinate. These are where the disambiguation algorithm was unable to reconstruct a 3D coordinate. The lines do not meet at a point in this situation, which is only to be expected as they are a linear approximation to the curved surface. Notice how the rows of coordinates are not horizontal. This is caused by the twisting of the array of projected spots used to reduce the number of potential matches. It is clear from Figure 3 that the cylinder was upright but not quite vertical. This was indeed the case for the object being studied. It was in fact a large metal tin covered in graph paper to present a matt surface and remove direct, specular reflections. The conclusion which can be drawn from this early result is that, despite the rather crude apparatus from which it was obtained, the technique has produced a representation of the surface. Obviously the cylinder is about the simplest possible case to consider. More recent work has illustrated that the technique can cope with more complex shapes in the form of a tailor's mannequin.
ERROR AND PRECISION
An analysis of the errors which accumulate in the system is under way. Clearly many factors affect the final level
of precision. It appears from a very crude analysis that the error in position of each reconstructed point should be of the same order of magnitude as the level of error in positioning the cameras and projector. More work needs to be done to see how error levels are affected by, for example, the angular separation between the cameras. This may lead to novel arrangements of the optical components. It is worth pointing out that there are two quite separate error mechanisms at work in this system. By far the most significant is error introduced by choosing an incorrect pair of spots during disambiguation. This represents a catastrophic breakdown of the system as the resulting 3D coordinate will usually bear no resemblance to the correct value. Obviously this situation must not be allowed to occur. It is always safer to be conservative when accepting disambiguated data. For example, if two potential matches are competing for the same projector ray with about the same level of confidence there is no really safe way to choose the correct match. Indeed, it is possible that both are incorrect. It is safer to use neither. A post processing technique, perhaps one based on the smoothness criterion as discussed, can be used to try to choose between points like this. In the prototype system this problem has not occurred to any significant level and in general it should be rare. For uncommon events like this the use of even computationally expensive algorithms is a practical possibility since the number of points to be processed is low. The second error mechanism is the usual one of uncertainty in the measurements made. In this system these include uncertainties in the distances and angles between the geometric centre of the system and the optical components as well as uncertainties in the location of the spots in the images. In the prototype system geometric precision is hard to achieve. The newer system, discussed below, will be better in this respect.
FUTURE EXTENSIONS
The prototype system has shown that the basic technique is viable despite the relatively simple computation involved. Work is currently in hand to produce a version of the system which uses video cameras directly for input and which runs on an IBM PC. This version will also allow more precise positioning of the optical components and of the object under examination. This will enable a detailed analysis of the errors actually encountered rather than those predicted. However, like the prototype, this new version will be only a two-camera system. With the current geometry, the system covers about 1/6 of the surface of an approximately cylindrical body. This leads to the first major extension to the system which will be required. The medical application, for which the system was originally developed, requires simultaneous observation of all or half of the patient's trunk. Arrangements with six cameras and six projectors or four cameras and three projectors respectively could fulfil these requirements. Clearly, though, a significant hardware investment would be required for each system. One of the main reasons for studying the errors which accumulate in the system is to try to find geometries with as few optical components as possible which still meet the desired level of precision.
Figure 4. An arrangement for recording the whole surface of a roughly cylindrical object
Multiple projector systems introduce new problems of their own. The main problem is that of the additional spots. A six-camera system can be thought of as simply six two-camera systems. Each camera can participate with a projector and camera to its right or to its left. In Figure 4 cameras C1 and C2 form a system with projector P1. However, C2 can also form a second system with P2 and C3. The problem is that when analysing one of these two-camera subsystems there are image points in the cameras which did not originate from the projector between them. The problems will occur when these spots by chance apparently lie close to rays from the projector being processed. Under these circumstances the possibility for incorrect spot matching is increased. Currently the only obvious way to minimize this possibility is to design the projection patterns carefully so as to minimize the overlap. Some other potential solutions are being considered. For example, by polarizing the light from adjacent projectors at different angles, and by using a polarizing filter on the camera, spots from the two projectors should record at different intensities and will be distinguishable. Obviously the nature of the reflectance of the surface of the object will dictate how effective this approach will be. Some types of surface reduce or eliminate the polarization of incident radiation on reflection. In some applications simultaneous observation of the entire object may not be required. Data for the whole surface can be acquired by rotating the object and making six individual observations using just two cameras and a single projector. The problems of a multiple projector arrangement are avoided in this way. The system being built around the PC has this capability to allow investigation of the practical difficulties associated with this type of observation. Ultimately it is hoped that the system will be able to be used to study the dynamics of the human respiratory system during breathing. This will require sequences of images to be recorded in real time. Even with the
video camera system only single images from each camera can be analysed at present. Each image consists of approximately 250 kbyte of data. Just storing this data to disc takes several seconds. Further images cannot be acquired until storage is complete. Clearly, to increase the speed of the process some reduction of the quantity of data is essential. It should be possible to make substantial savings by recording data only in the region of the bright spots. A thresholding and run length encoding scheme, based on direct analysis of the video signals from the cameras, is under consideration. It should reduce the amount of data to a few kbytes per frame, making feasible the storage of a significant number of frames in the main memory of the PC. After recording the data the entire sequence could be dumped to disc ready for analysis. This sort of scheme should make it possible to record data at many frames per second, a capability which would allow the desired study of the dynamics of the human chest.
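A minimal sketch of such a thresholding and run length encoding scheme, applied here to an already digitized frame rather than to the raw video signal; the record layout (row, start column, run length, mean grey level) is an assumption.

    # Sketch of the data reduction scheme: keep only runs of pixels above the
    # threshold. Each run is stored as (row, start column, length, mean grey level),
    # so a frame shrinks from hundreds of kbytes to a few kbytes.

    def run_length_encode(frame, threshold):
        runs = []
        for row, line in enumerate(frame):
            start, values = None, []
            for col, value in enumerate(list(line) + [0]):   # sentinel 0 closes a trailing run
                if value > threshold:
                    if start is None:
                        start = col
                    values.append(value)
                elif start is not None:
                    runs.append((row, start, len(values), sum(values) / len(values)))
                    start, values = None, []
        return runs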
OTHER APPLICATIONS
The system described is intended for use in the study of human breathing. As has been mentioned, the same approach could be used for many applications where the surface coordinates of an object must be recovered. Indeed for many applications the system could be even simpler than that described. For example, the use of specially devised spot arrays and projection angles specific to a given object could mean that in some applications the missing spot problem does not occur. This kind of approach would be suitable for quality control where small variations in an otherwise well defined shape are being monitored. Under these conditions only one camera is required and sequence criteria can be used in the disambiguation. Once one spot has been disambiguated the others follow simply from their position in the array. The base plane technique can still be employed to determine the projector ray geometry since it requires only one camera, as shown above. In other applications it may not be necessary to measure coordinates across the whole surface but just in selected areas. Again the same technique can be used but with projection arrays which deliver spots only to the regions of interest. It seems likely that many applications can be based on the simple stereoscopic structured light system which has been described.
SUMMARY

A system has been described for measuring the surface coordinates of objects. It is best suited for relatively smooth surfaces. It has advantages over other systems in that all the data can be collected simultaneously and that the analysis is simple. It is suitable for implementation on microcomputers.
ACKNOWLEDGEMENTS

The authors are indebted to Professor D M Denison of the Lung Function Unit of the Brompton Hospital for initially raising the idea of using projection patterns composed of spots to simplify the image processing task. They also wish to acknowledge the help of Dr A R Gourlay of the IBM UK Scientific Centre for the derivation of the form of the epipolar line constraint which has simplified implementation of the technique.
REFERENCES

1 Gourlay, A R, Kaye, G, Denison, D M, Peacock, A J and Morgan, M D L 'Analysis of an optical technique for lung function studies' Computers in Biology and Medicine Vol 14 (1984) pp 47-58
2 Tio, T B K, McPherson, C A and Hall, E L 'Curved surface measurement for robot vision' IEEE Comput. Soc. Conf. Pattern Recognition and Image Proc. (1982) pp 370-378
3 Frobin, W and Hierholzer, E 'Rasterstereography: a photogrammetric method for measurement of body surfaces' Photogrammetric Eng. Remote Sensing Vol 47 No 12 (1981) pp 1717-1724
4 Marr, D and Poggio, T 'Co-operative computation of stereo disparity' Science Vol 194 (1976) pp 283-287
5 Marr, D and Poggio, T 'A theory of human stereopsis' Proc. Roy. Soc. Lond. Series B Vol 204 (1979) pp 301-328
6 Mayhew, J E W and Frisby, J P 'The computation of binocular edges' Perception Vol 9 (1980) pp 69-86
7 Pollard, S, Mayhew, J E W and Frisby, J P 'PMF: a stereo correspondence algorithm using a disparity gradient limit' Perception (in press)
8 Denison, D M Private communication
9 Yuille, A L and Poggio, T 'A generalized ordering constraint for stereo correspondence' AI Memo 777, MIT, MA, USA (1984)
10 Marr, D Vision Freeman and Co., San Francisco, CA, USA (1982)
11 Gourlay, A R Private communication
12 Baker, H H and Binford, T O 'Depth from edge and intensity based stereo' Proc. 7th Int. Joint Conf. AI (1981) pp 631-636
13 Burt, P and Julesz, B 'A disparity gradient limit for binocular fusion' Science Vol 209 No 9 (1980) pp 615-617
14 Jackson, P H The IAX image processing system: reference manual Report number 125, IBM UK Scientific Centre (1985)
15 Julesz, B The foundations of Cyclopean perception The University of Chicago Press, Chicago, IL, USA (1971)