Image and Vision Computing 15 (1997) 317-329
Measuring body points on automobile drivers using multiple cameras

George Stockman a, Jin-Long Chen a, Yuntao Cui a, Herbert Reynolds b

a Computer Science Department, Michigan State University, East Lansing, MI 48824, USA
b Biomechanics Department, Michigan State University, East Lansing, MI 48824, USA
Received 6 October 1995; accepted 27 June 1996
Abstract

Methods are given for measuring 3D points on human drivers of automobiles. Points are natural body features marked by special targets placed on the body. The measurements are needed for improved comfort and accommodation in automotive seat design. The measurement methods required hardware instrumentation in the automobile and development of algorithms for off-line processing of the acquired data. This paper describes the use of multiple cameras to measure 3D locations within the driver's workspace. Results obtained show that measurement error in X, Y and Z coordinates is expected to be less than 1.0 mm on the lab bench and 2.0 mm in the car, and that combined 3D error is expected to be less than 2.0 mm and 3.0 mm, respectively.

Keywords: Measurement; Measurement error; 3D locations; Multiple cameras
1. Introduction

The specific problem treated in this paper is the measurement of the location of body points of a driver in an automobile. The overall problem is the design of better seating for proper vehicle operation and comfortable support. Research toward these goals requires knowledge of postural geometry during operation of the vehicle. To obtain geometric measurements, four cameras are placed inside an automobile such that each important 3D feature point is visible from at least two cameras, enabling use of stereo computations. Feature points are special points on appliances, targets attached to the body and vehicle, or natural points such as the center of an eye. Measurements of the 3D points are used to compute the positions and orientations (poses) of the head, pelvis, arms and legs, and lumbar curvature. The multiple images are taken at times deemed important by the experimental protocol, or as triggered by the driver's seat adjustment. All stereo images are taken and stored digitally in a fraction of a second at a time when the driver posture is stable. The stereo images are used in an off-line procedure which computes the 3D location of feature points. Several other measurements are taken within the car which have a bearing upon assessment of driver comfort and seat design. These include electromyographs (EMGs), which measure muscle activity, and
an array of pressure measurements which give the distribution of weight on the seat and the location of feature points in the pelvis. The position of the seat, steering wheel and lumbar support are measured with transducers. A cut-down car (called a ‘seat buck’) in the laboratory contains all the above instruments plus contact sensors to measure the position of the spine. While these other measurements are critical for the project goals, they are not discussed in this paper. The rest of the paper is concerned primarily with the multiple camera measurement system and the problems in its implementation; use of the measurements in analyzing driver posture and comfort is not treated. The requirements on our measurement system and its resulting design are sketched in Section 2. Section 3 gives the mathematical methods used for the computation of 3D coordinates from multiple images; the methods presented are applicable to a wide range of 3D measurement problems, and hence are discussed in a general manner. Experiments using the measurement system, both in the lab and in the car, are reported in Section 4.
2. System requirements and design

We needed a system that would serve the study of ergonomics in a very general manner. In addition to
318
G. Stockmun
et ui./Image
and Vision Computing
the need for measuring feature point position, we needed to take full video of our subjects so that comfort and behavior could be assessed. Because of these requirements we had to develop our own approach rather than use existing equipment and techniques. Our approach was to install sensors and adequate computer power and storage on board the vehicle so that signals and images could be digitized and stored for off-line analysis. Our major requirements are summarized below.
2.1. Requirements

• Stereo images are needed at sparse instants (up to 10) of the drive, so that critical body points could be computed with at least 2 mm accuracy in x, y and z coordinates.
• The system must yield coordinates relative to a standard vehicle coordinate system defined via the driver seat mounting.
• Full view of all drivers from head to toe is needed for completeness of observation. Full view of the driver is needed regardless of seat position in the vehicle package envelope; that is, small females and larger males should be equally visible with the same camera setup.
• The equipment should be unobtrusive and not interfere with the normal operation of the vehicle.

2.2. Hardware

Mid-sized 1995 vehicles were instrumented for use in normal driving conditions. For acquiring the body measurements, an Intel 486 PC processor was installed in the trunk. Four CCD cameras with 4.8 mm auto-iris lenses and IR filters were connected to a Matrox framegrabber capable of digitizing four images within 0.13 seconds from the four cameras. After considerable experimentation, we finally arrived at satisfactory locations so that the cameras would not be obtrusive yet would collectively view the critical feature points. The final locations were as follows: (1) at the dome light fixture; (2) on the A pillar (front, right) above the dashboard; (3) on the A pillar (front, right) below the dashboard; and (4) on the B pillar (middle, right) at the middle of the window opening. The four cameras are rigidly attached to the windshield or vehicle structure such that no vibration effects are detectable in the images. Experimentation with illumination was also required. Camera 1 had 60 IR LEDs and cameras 2, 3 and 4 had 100 IR LEDs to add illumination to the interior without annoying the driver. The notch filters on the camera lenses passed only the IR wavelength. The combined illumination from sunlight and the LEDs gave an image containing good details of both the driver and interior of the vehicle. While most body measurement experiments required storage of only 4 x 4 digital images, a full video analog tape was made from one of the camera outputs. Computer display and control was available in the back seat of the vehicle for use by the occupant-technician present during the test drives. Whenever a set of images was collected, the images were reviewed while in the Matrox frame grabber buffer by the technician before being written to the hard disk for permanent storage.

2.3. Procedures

One hundred and two paid subjects were carefully selected and informed about the project's goals and procedures according to procedures approved by the Michigan State University Committee on Research Involving Human Subjects. Within the lab, the subjects were measured inside and outside of the car using both the camera measurement system and other instruments. These anthropometric measurements are needed for modeling the spinal posture of the individual person's body in the driving positions. Of the 102 subjects, 40 completed three different drives; the first to become accustomed to the particular vehicle, the second to test upright and reclined seat back positions on spinal posture, and the third to study driver behavior and posture under freely chosen circumstances (except that the route was fixed). Subjects were asked to provide information about their comfort during the experiment via verbal and written questionnaires at various points of the drive. At various times in the study, body measurements were made using the multiple camera measurement system. Images were taken when required by the protocol or when triggered by the driver adjusting the seat. Circular targets 2 cm in diameter and highly reflective in the IR band were affixed to certain body points of the subjects. Images from two calibrated onboard cameras are shown in Fig. 1: extracted features are outlined ellipses. Sets of images from a lab session or a drive were analyzed off-line using an interactive program called 3DAQ. The output was a set of 3D coordinates for each feature point at each time that images were taken. Using the external body feature points we estimate the 3D position of the skeletal geometry of the vehicle operator. The position of the operator's skeleton defines the structure of posture that others [1,2] have identified as a primary source of discomfort and fatigue in the operator. The targets described in this paper were, therefore, placed on the skin surface over skeletal landmarks that define the skeletal linkage system. The definition of the linkage system in the arms and legs is straightforward, unlike the definition of the linkage system in the torso. In addition, the comfort of the operator is most closely associated with comfort in the back and buttocks. Thus, our most difficult task was to use the multiple camera system to measure the posture of the torso.
Fig. 1. Automatically detected circular fiducials from (a) camera 2 (pillar A above dash) and (b) camera 3 (pillar A below dash).
Since the targets seen by the multiple cameras are on the skin surface, our task was to identify landmarks on the torso that were easily visible to the cameras and were physically close to the skeletal landmark whose position we wanted to know. On the torso, the skeletal landmarks closest to the skin surface are, in general, obscured by the seat. Since at least three landmarks in the pelvis are needed to define the position of the pelvis in the seat, we used pressure mats to digitize the right and left ischial tuberosities in the multiple camera image space (see Fig. 2). The cameras were used to measure the location
of targets on the sternum and pelvis (i.e. over the anterior superior iliac spine). By combining the multiple camera measurements with the pressure measurements (Fig. 2) we were able to calculate the position of the pelvis in the seated operator and the curvature of the spine. More details on these procedures and on the use of measurements to relate posture and comfort are beyond the scope of this paper: interested readers can contact Herbert Reynolds at the above address. Section 3 discusses the methods used to compute the 3D location of feature points visible in the images.
Fig. 2. Surface landmarks on sternum and pelvis area relative to pressure mat in seat cushion.

2.4. Related background
Multiple camera stereo has been the most commonly studied passive method for obtaining 3D scene data; Refs [3-7] are representative samples of a large literature. The recent work by Jain et al. [7] contains an excellent review of various solutions for the external and interior camera orientation problems and for stereo system calibration. A commonly used method for calibration is described by Tsai [8], who reported the possibility of achieving accuracy of 1 part in 4000 using off-the-shelf cameras and lenses. The most difficult task in automatic depth reconstruction from stereo images is finding corresponding points in the images. We avoided this problem by having an interactive user identify target points with an option to automatically refine them. Computational speed is an issue for autonomous robots, but not for our project, since we allow off-line processing of several minutes for each image set and do not search for corresponding points. Finding corresponding points automatically is possible, and will be considered in the future when more development time is available.
Fig. 3. Common perpendicular to two skew lines; the two image points shown are images of the same scene point in two cameras.
In previous work, Reynolds et al. [9,10] computed internal locations of various skeletal points by using stereo x-rays of cadavers with tungsten-carbide spheres inserted as feature points. Veress et al. [11] and others [12] have shown that the stereophotogrammetric RMS accuracy is 0.1 to 0.4 mm using well-defined targets, such as the tungsten carbide spheres. In studies of living subjects who are not part of a medical investigation, implanting tungsten carbide spheres is impractical. Yet, the same problem exists - where is the skeleton in the seated human operator whose activities are being recorded with stereo images? One of the objectives of the current study is, therefore, to obtain estimates of skeletal geometry from external non-invasive methods: the above accuracy represents a lower bound on what we can achieve. Measurement approaches using active illumination such as LIDAR [5], which are very popular in industry, are unusable because the lighting would annoy or harm the driver. Moreover, they would not be capable of producing video, nor would they be likely to improve measurement accuracy on the fiducial points. Following the early research of Johansson, some systems use active points of illumination attached to the body [13] to simplify the identification of such points in the 2D images and to significantly compress the output. We decided early on that we wanted to be able to study full images of the driver. However, we do use tight-fitting
body suits with specially reflective fiducial points (see Fig. 1). Research has appeared very recently which has future potential for our problem [14-16]. Some success has been achieved in fitting rich articulated and deformable models to a sequence of images so that both structure and motion can be estimated. To use such an approach, more precise laboratory measurements could be used to parameterize a model of the human body, which could then be used to fit images acquired within the vehicle. It is possible such an approach would be more effective than our current approach, which is characterized by the small number of special feature points.
3. 3D measurement system
The purpose of the 3D measurement system is to produce 3D locations [x_i, y_i, z_i] for critical body points P_i at certain instants during the drive. Coordinates of the P_i are in the global frame of the car, as defined by the driver seat mounts, so that both body structure and pose in the car can be computed from these same measurements. We adopted the general stereo approach [4,6,7], where 3D coordinates are computed by observing the same feature point P_i in two or more 2D images from cameras that are calibrated to the 3D workspace. Our cameras are not specially arranged relative to each other as in many
stereo systems; instead, the cameras are constrained by where they may be placed in the car in order to be unobtrusive to the driver, and at the same time have an effective field of view. The 3D coordinates of a scene point are obtained as follows (see Fig. 3). Once a camera is calibrated to the workspace, the 3 x 4 camera matrix obtained algebraically represents the perspective imaging transformation. Via the matrix, each image point determines a ray projecting into 3D space. When the same 3D feature point is observed by two cameras in two image points, the intersection of the two projecting rays represents the 3D scene point. Theoretically, these two projecting rays should intersect with each other, but in practice the algebraic rays do not intersect due to various errors. One significant source of error is image quantization. A second significant source of error is due to the differences between the algebraic pinhole model for the camera and the behavior of the real lens. A reasonable location for the assumed intersection of skewed, but almost intersecting, lines can be defined as the midpoint of the shortest line segment connecting the two rays ([3, Ch. 10.6] or [6, Ch. 14.6]). Throughout the rest of this paper, we refer to this computation as intersecting the two rays. To combat the sources of error, we utilize multiple pairs of images to obtain multiple estimates for each scene point. If a 3D feature point is seen by three cameras, three estimates are available for its location; in the case where the point is seen by four cameras, six estimates are available. Multiple estimates (outliers first deleted) of the 3D point are weighted in inverse proportion to their predicted error to obtain a single estimate. Details are given below.
3.1. Camera calibration

The perspective imaging transformation can be represented by a 3 x 4 camera matrix C_{3x4}. A 3D point P_i = [x_i, y_i, z_i]^T is projected to image point [u_i, v_i]^T. The following matrix equation uses homogeneous coordinates:

s [u_i, v_i, 1]^T = C_{3x4} [x_i, y_i, z_i, 1]^T

u_i is computed by taking the dot product of the first row of C and P_i and dividing by the perspective factor s, which is the dot product of the third row of C and P_i; v_i is computed in a similar manner using the second row of C.
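As a rough illustration only (this is not the authors' 3DAQ code, and the function and variable names are hypothetical), the following Python sketch assembles the standard linear system for the eleven unknown entries of C (with the last element fixed to 1) from known 3D-2D correspondences, solves it by linear least squares, and applies the projection equation above.

```python
import numpy as np

def calibrate_camera(world_pts, image_pts):
    """Estimate the 3x4 camera matrix C (last element fixed to 1) by least squares.

    world_pts: (N, 3) array of [x, y, z] workspace coordinates, N >= 6
               (20-30 well-spread points are used in practice).
    image_pts: (N, 2) array of corresponding [u, v] image coordinates.
    """
    A, b = [], []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        # u * (c31 x + c32 y + c33 z + 1) = c11 x + c12 y + c13 z + c14
        A.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z]); b.append(u)
        # v * (c31 x + c32 y + c33 z + 1) = c21 x + c22 y + c23 z + c24
        A.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z]); b.append(v)
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(p, 1.0).reshape(3, 4)

def project(C, pt3d):
    """Project a 3D point through C; returns (u, v) after the perspective divide."""
    uh, vh, s = C @ np.append(np.asarray(pt3d, float), 1.0)
    return uh / s, vh / s
```

The residual between project(C, P_i) and the observed image point is the quantity reported to the user after a calibration, as described below.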
Derivation of the camera matrix from a well-known calibration procedure is described by Hall et al. [17] and Jain et al. [7]. The eleven unknown parameters of C can, in theory, be determined by knowing the 3D workspace
coordinates [x_i, y_i, z_i]^T and 2D image coordinates [u_i, v_i]^T for six feature points. In practice, about 20-30 observations spread across the field of view are used to overdetermine a solution and counter the effects of measurement error. The linear least squares solution for the parameters of C minimizes the sum of the squared errors (residuals) between the 2D observations and the 3D points projected by C into the image plane. An excellent survey of methods for determining the parameters of camera orientation is given by Jain et al. [7]: the method which we used is described in Section 12.10.2. This method does not enforce the constraints of an orthonormal rotation matrix; however, a more general linear transformation can be modeled which allows certain assumptions about camera geometry to be relaxed. Because we were buying twelve sets of cameras and lenses and had to economize on the equipment, we decided to use the more general camera parameterization. Parameters of exterior and interior camera orientation are combined in the resulting matrix elements, and are not computed in the procedure. Part of the cost of assuming fewer constraints is the need for more calibration points. Non-linear lens distortion must be handled separately, and is discussed below.

A jig is needed to do rapid calibration to the workspace: it is a physical object with precise and easily recognizable feature points [x_i, y_i, z_i]^T. In lab work, we usually align the jig with the workspace coordinate system so that the premeasured 3D coordinates are immediately known and available in a file. We built a special jig for the car which rests in the driver's seat mounting with the seat removed. The jig is an aluminum frame which fills the work volume and has about 60 spherical beads attached to it. Each bead is about 10 mm in diameter, and is covered with retro-reflective tape to improve contrast within the car: this material reflects near infrared radiation very well. A Metrecom coordinate measuring device (by FARO Technologies) is used to locate all of the jig feature points relative to the car coordinate system. The measurement error using this device is less than 0.1 mm within our work volume; however, skill is required to measure the sphere centers within this error. All 3D feature points P_i obtained using the multiple camera measurement system are in car coordinates in millimeter units.

3.1.1. Using program 3DAQ for calibration

A computer program, called 3DAQ for '3D acquisition', was developed to handle both the major tasks of camera calibration and the stereo computation using the camera matrices. Considerable attention was paid to the tasks of data management and user interaction in choosing and computing feature points. 3DAQ operates in either calibration mode or calculate mode, depending on which of the two tasks is being done. In calibration mode, 3DAQ displays the calibration images to the
user and displays the locations [x_i, y_i, z_i] of known 3D calibration points on the jig. Fig. 4 shows 3DAQ in 3D calculation mode. The image displayed is from the right Pillar B camera. For the moment, the reader should imagine that instead of the driver and seat, the calibration jig is present in the image. The 3D names and locations of the jig points would be displayed at the top of the 3DAQ window in the same way that the body points appear in Fig. 4. The user then locates the jig feature points in the image using the mouse. When six or more points are selected, a camera calibration matrix is computed as described above. Fig. 4 was taken after calibration was done, so the calibration matrix for that view appears in the window at the lower right. Computing camera matrix
C in this manner takes only a second or two on a SUN Sparc2. Each association ([x_i, y_i, z_i]^T, [u_i, v_i]^T) is displayed, along with the residual from the camera model C. If the user detects a mistake or inaccuracy by inspecting the residuals, it can be corrected immediately by reselecting or disabling any of the image points, resulting in a new C and new set of residuals. Because 3DAQ is shown in stereo calculation mode in Fig. 4, the 3D residuals from the stereo computations are shown for each feature point, and not the 2D residuals which would be shown during camera calibration mode. The cameras need to be calibrated whenever they are moved relative to the workspace in the car. This time-consuming process requires removing the seat from its
mounting, installing the jig, and independently measuring the bead locations. The camera matrices can be reused indefinitely (for many shots of many drivers) provided that the camera positions and orientations remain fixed. When a camera remains fixed, so do the image locations of fiducial points attached to the background. When changes are detected in the locations of such points, then the camera calibration procedure is repeated.

3.1.2. Using program 3DAQ for 3D calculation

After each camera view is calibrated, 3DAQ is used in calculate mode to locate various feature points in 3D using the mathematical methods discussed in the previous section. The feature points are targets on the driver and fiducials on the car. The windows, or buttons, represent feature point objects which exist in the real world independently of the program representation. Feature points on the body are specially highlighted targets of circular retro-reflective tape on a dark background attached to the driver's body suit or exposed skin. 3DAQ has an option to search for an elliptical bright region and use its center as an image point. Note that four of the target windows at the top of Fig. 4 show no 2D coordinates because they are not visible in the image currently displayed. However, two of these four targets (suprasternale and sternum2) do have 3D coordinates which have been computed from other images where they are visible. The 3DAQ user is free to change from one image to another in arbitrary sequence to perform her analysis of the 3D scene. 3DAQ manages the use of up to four cameras; the user can readily switch among them. Whenever two or more cameras are calibrated, the user can then use the camera models to compute the 3D locations of any identifiable 3D feature points that have not been used for calibration. In calculate mode, the 3D residuals shown are computed as follows. For points whose ground truth coordinates are known, the residuals are the difference between the ground truth coordinates and those computed via stereo. For other points, the residuals are an estimate of the standard deviation of the computed estimate of the coordinates. Computation of this estimate is described below in the section discussing error. 3DAQ is written in C using XWindows.

Fig. 4. Program 3DAQ user interface in 3D calculation mode.

3.2. Correcting lens distortion
The quality of our lenses was limited by cost considerations, so we used software and another calibration procedure to reduce distortion, primarily barrel distortion, in the images before using 3DAQ to calibrate the cameras. Tsai [8] and Jain et al. [7] describe techniques for correcting radial distortion. Basically, each image point is corrected by moving it along a radial from the image center an amount proportional to the square of its distance from the image center. We used the method
described by Goshtasby [18], which can remove even non-symmetrical distortions by providing parameters for local regions of the image. Goshtasby's method accounts for both radial distortion and also the tangential distortion which results from decentering the lens elements. Also, the use of Bezier patches has advantages over least squares fitting of global warping parameters where local distortions affect the global parameters. Each camera is used to acquire an image of a checkerboard of precise black and white squares. Corners of the squares are automatically located in the image by an edge detection procedure that searches outward from the center square of the image. As in Ref. [18], two lookup tables are produced which are used to smoothly map distorted image coordinates into corrected image coordinates. The tables are then used to warp all of the images taken by that particular camera/lens. The warping parameters remain valid as long as there is no change in the camera lens/body configuration. Distortion can be detected by studying the residuals using 3DAQ; however, if the residuals indicate distortion in the data, the only recourse is to invalidate the data. One of the duties of the occupant-technician is to monitor fiducials in the images and check that they are in expected fixed image positions. Procedures are available for calibrating both image warping parameters and perspective model parameters in a single step: see, for example, the work of Tsai and Weng et al. [8,19]. We chose not to use such procedures because we would not have enough feature points available inside the car to determine the warping parameters for the four cameras, but we may have to reconsider this decision in the future if we need more accuracy in the 3D points.

3.3. Stereo computation

Once the camera calibration matrices are obtained, they can be used to compute the world coordinates of feature points P_i. Since each 2D image point projects a ray into 3D space, two projecting rays L_1 and L_2, derived from image points corresponding to the same object feature point, determine an intersection which yields the 3D world coordinates of that object point (see Fig. 3). While L_1 and L_2 theoretically intersect at the 3D point P_i, the algebraic parameters actually represent skew lines due to approximation errors in the fitted camera matrices C_1 and C_2 and errors in location of the image point coordinates. An approximation to the object point lies midway between the two projecting rays at their point of closest approach, i.e. the midpoint of their common perpendicular. This midpoint can be computed in closed form using the method given by Duda and Hart [3, Sect. 10.6]. If the rays are not close enough in approach, the computation can be discarded. If the accuracy of the two rays is different, the estimate can be moved closer to the more accurate ray, as shown in the
section below discussing error. An alternative computation for P_i would be to use the least squares solution to the overdetermined system of four equations and three unknowns. However, this least squares solution minimizes algebraic distance rather than true geometric distance, and thus gives less information about 3D error. The design of the system considered several sources of error. Analysis of error is in the next section.
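Before turning to the error analysis, a minimal Python sketch of the ray "intersection" just described may help. It assumes each ray is given by a point on the ray (e.g. the camera centre) and a direction obtained by back-projecting the image point; the names used here are illustrative rather than taken from 3DAQ.

```python
import numpy as np

def intersect_rays(p1, d1, p2, d2):
    """Midpoint of the shortest segment joining two (possibly skew) 3D rays.

    p1, p2: points on ray 1 and ray 2 (e.g. camera centres).
    d1, d2: direction vectors of the rays.
    Returns (midpoint, gap) where gap is the length of the common perpendicular;
    a large gap signals an unreliable correspondence that should be discarded.
    """
    p1, d1, p2, d2 = (np.asarray(v, dtype=float) for v in (p1, d1, p2, d2))
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b                # approaches 0 for nearly parallel rays
    t1 = (b * e - c * d) / denom         # parameter of closest point on ray 1
    t2 = (a * e - b * d) / denom         # parameter of closest point on ray 2
    q1, q2 = p1 + t1 * d1, p2 + t2 * d2  # closest points on each ray
    return 0.5 * (q1 + q2), np.linalg.norm(q1 - q2)
```

The returned gap is the quantity used to reject unreliable pairs, and the midpoint can be shifted toward the more accurate ray by the weighting described in Section 3.4.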
3.4. Error in 3D point location

The following error sources may affect our measurements. Some effects may be detected and avoided by discarding unreliable information:

• Each physical camera lens is only approximated by the camera matrix. We compensate for much of the non-linear distortion by warping the image as needed to create the expected image of a grid; however, other distortions may still be present. For example, it is known that lens distortion varies with change of temperature and focusing [20]. Often, the 2D residuals resulting from camera calibration are of the order of 0.5 pixels even for calibration points.
• Locations of feature points in the image are subject to quantization errors and other effects in defining them. The auto-iris lens can blur the image features. Features selected with the user's mouse must have integral coordinates. Subpixel accuracy can be achieved for the circular fiducials, but there is error when the target is not planar in 3D. Also, the user or automatic procedure sometimes selects the wrong feature point. The 3DAQ program user can reselect a feature point using feedback from 3D calculations.
• The driver is subject to slight movements or vibrations during the period when the frames are grabbed. If the effect is severe, the data set is cancelled either onboard the automobile or during off-line processing. Possible vibration of the cameras was an important concern; however, after 400 drive sets, this possible error source was judged insignificant.
• A camera moved from its calibrated position is detected by the image locations of fixed fiducials placed in the car. If movement occurs, the camera is recalibrated and any data taken while it was out of calibration is discarded. (It is possible, in theory, to recompute a new camera calibration matrix from three or more fiducials fixed to the car background, but experience shows that many more than three points should be used to reduce error.)

3.4.1. Experimental computation of coordinate error

For each 3D point measured, we want an estimate of the error to be used in subsequent modeling programs. We use two methods to estimate the overall error in the computed 3D feature points. First, we use an independent means of measuring 3D locations with much greater precision than the multiple camera system, and we compare the two measurements (a simple sketch of this comparison follows at the end of this subsection). This is done by performing the stereo computation for certain test points on the jig which have not been used for camera calibration. Jig points are measured with an accuracy of 0.1 mm or better by the coordinate measuring machine. Variance of computed coordinates can be estimated from the residuals. Results of such tests are given in Section 4.
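As a small illustration of this comparison (a sketch under our own assumptions, not the project's analysis code; the function name and array layout are hypothetical), the test-point check amounts to differencing the stereo-computed coordinates against the coordinate measuring machine values:

```python
import numpy as np

def test_point_errors(stereo_pts, cmm_pts):
    """Compare stereo-computed test points against CMM ground truth.

    stereo_pts, cmm_pts: (N, 3) arrays of [x, y, z] in car coordinates (mm).
    Returns per-coordinate residuals and the total (Euclidean) 3D error per point.
    """
    residuals = np.asarray(stereo_pts, float) - np.asarray(cmm_pts, float)
    total_3d_error = np.linalg.norm(residuals, axis=1)
    return residuals, total_3d_error
```

Averages and maxima of the per-coordinate residuals give summaries such as Table 3, and a histogram of the total 3D errors over the in-car test points gives the distribution reported later in Table 5.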
3.4.2. Mathematical analysis of coordinate error

Here we use mathematical analysis of the camera and ray configurations to estimate error in the coordinates of the 3D points. Not only do we obtain an estimate of our 3D measurement accuracy, but we also see how to best use or improve our instrument. Fig. 5 shows much of the geometry for the current discussion. A 3D point C observed as image point X_j by camera C_j is expected to lie along the ray X_j C. However, due to error Δθ_j in the location of observation X_j, the true 3D location of C lies on an error cone defined by Δθ_j, as shown in Fig. 5. If the observations can lie anywhere within Δθ_j of X_j, then the 3D point can lie anywhere inside the intersection of the two cones. A probability distribution of actual image location given observation X_j induces a probability distribution in 3D space, and multiple observations of C allow the concept of a maximum likelihood estimate of C. However, we will not pursue a formal probabilistic approach here, but rather one that propagates a nominal imaging error through our computational formulas. Assuming that the angular error in the observation X_j is bounded by Δθ_j, the point C can lie anywhere within the intersection of the two error cones. The figure shows two rays that actually intersect in a plane in 3D, which we know might not actually be the case, as shown in Fig. 3. Also, to enlarge the error volume under study, the angles Δθ are drawn much larger than true scale. The Δθ_i are less than 10 minutes of arc in our application; therefore, our analysis will assume that the three rays projecting from each camera focal point are approximately parallel in any small volume, and thus quadrilateral ABCD is a parallelogram. Also, the standoff distances r_i are defined as the distance from camera C_i to C, but because the error volume is small relative to this distance, r_i is a good approximation for the distance from C_i to E as well for the purpose of error analysis.

3D spread of image error

The spread of error in the 3D cone at distance r_i for camera C_i is |ΔE| = r_i Δθ_i. Assuming a pixel size of 0.025 mm in the image plane, an image point error of 0.5 pixels and a focal length of 4.8 mm, then |ΔE| = 2 mm at r_i = 800 mm and approximately 1 mm at r_i = 400 mm. 0.5 pixels is the worst case quantization error for a feature selected accurately with the mouse, but the error could be larger due to modeling error.
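As a quick check of the figures just quoted (under the stated pixel size, focal length and 0.5 pixel error, and not part of the original text):

\[
\Delta\theta \approx \frac{0.5 \times 0.025\,\mathrm{mm}}{4.8\,\mathrm{mm}} \approx 2.6\times10^{-3}\,\mathrm{rad},\qquad
|\Delta E| = r\,\Delta\theta \approx 2.1\,\mathrm{mm}\ (r = 800\,\mathrm{mm}),\quad \approx 1.0\,\mathrm{mm}\ (r = 400\,\mathrm{mm}).
\]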
Residuals in camera calibration are of the order of 0.5 pixels in size.

Fig. 5. Sketch of error cones for nearly intersecting rays (not to scale).
Subpixel accuracy for image points

Haralick and Shapiro [6], in Chapter 20 of Volume II, show how to reduce quantization error by using the centroid of a circular disk as the feature point. The standard deviation of the centroid is σ_c = 0.3 Δc / √N, where N is the number of rows of pixels which cover the disk and Δc is the row quantization. If the pixel is not square, then the standard deviation is computed similarly for rows and columns. For a disc of diameter N ≥ 10, the standard deviation will be only about one tenth of a pixel. Using an image point error bound of two standard deviations reduces the 3D spread of error by the ratio of 0.5 to 0.2 relative to the previous analysis. This effect only holds for a planar circular feature parallel to the image plane or for our spherical calibration beads which project a circle in any direction. An iterative ellipse fitting routine has been developed [21] to find the parameters of the best ellipse fitting boundary points of an elliptical region. The 3DAQ user has this option in defining image feature points. The method produces results comparable to linear least squares when 30 or more boundary points are available from a nearly fully visible ellipse, but produces much better results under significant occlusion. Thus, if use of an extended target is justified, image quantization errors are of the order of a tenth of a pixel and
are less than our observed modeling error. (It is not justified when the circular target is distorted significantly out of a plane by the body part to which it is affixed.) Experimental results below show 3D errors similar to the examples in the previous paragraph, which assumed a 0.5 pixel image error, and therefore support this analysis.

3D error estimation

Estimation of 3D point error is complicated by the different ways in which the multiple measurements can be combined. Estimates are more robust if we discard outliers before linearly weighting the remaining estimates. Outliers can be caused by user mislocation of a feature point, by movement of a camera or feature point on the body, or by blurring of the image due to significant change of the iris. Previously, we described a method to compute an estimate for 3D point C as the midpoint of the shortest line segment connecting the two rays. From the above analysis, it makes sense to discard the stereo computation if the midpoint computed is of distance greater than 2 r_j Δθ_j from ray C_j X_j for either ray. If the midpoint is within the required distances, then it also makes sense to weight the result according to the bound on error; for example, if one camera is 400 mm from the point while the other is at 800 mm and all other parameters are equal, then the weights should be 2 to 1. Let Δ_i and Δ_j be the 3D error bounds projected from the two cameras to the intersection: then we define the
intersection coordinates C = Q_12 as follows:

Q_12 = ((1/Δ_i) Q_i + (1/Δ_j) Q_j) / (1/Δ_i + 1/Δ_j)     (1)

This gives the midpoint of Q_i Q_j when the error bounds are the same. Refer to Fig. 3 also. It is well known that the depth accuracy of stereo depends upon the angle α of approach of the two rays. Intuitively, accuracy is best at α = π/2, when quadrilateral ABCD is a rectangle, and degrades as α decreases; the computation becoming worthless near α = 0. A simple trigonometric derivation yields the length x of the diagonal of the parallelogram ABCD:

x = ((r_1 Δθ_1 + r_2 Δθ_2) cos(α/2)) / sin α     (2)

With α = π/2 and both camera error models the same, we have x = √2 r Δθ, as expected. x is eight times as large at α = 10° as it is at 90°, but only twice as large at 45°. While accuracy is best at 90°, not many points will be visible in both images; thus, many systems have been configured with their optical axes making an angle of 45°. Placement of our cameras was strongly constrained by the workspace in the car and the requirement that the cameras be unobtrusive to the driver. Also, the actual angle α varies not only with the pair of cameras, but also with the particular feature point since the focal lengths are short and the fields of view are wide. In combining several stereo estimates from multiple pairs of cameras, we therefore weight each acceptable estimate from stereo pair i, j according to its error bound ((r_i Δθ_i + r_j Δθ_j) cos(α/2)) / sin α.
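A small Python sketch of this weighting scheme (illustrative only; the helper name and the way the error bounds are supplied are assumptions, with each bound computed as in Eq. (2) for the camera pair that produced the estimate):

```python
import numpy as np

def combine_estimates(estimates, error_bounds):
    """Inverse-error weighted combination of stereo estimates, as in Eq. (1).

    estimates:    sequence of 3D estimates Q_k of the same scene point,
                  one per accepted camera pair (outliers already discarded).
    error_bounds: matching positive bounds Delta_k, e.g.
                  (r_i*dtheta_i + r_j*dtheta_j) * cos(alpha/2) / sin(alpha).
    """
    Q = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(error_bounds, dtype=float)   # weight ~ 1/Delta
    return (w[:, None] * Q).sum(axis=0) / w.sum()

# Example: two estimates whose error bounds differ by a factor of two are
# combined with weights 2:1, matching the 400 mm / 800 mm example above.
```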
4. Experimental procedures
We first performed experiments indoors on a lab bench. These experiments allowed us to initially develop and test our procedures, to produce a first version of software, and to obtain optimistic estimates of the error in measured coordinates. As we progressed in transporting our hardware to the automobile, we had to make several adjustments. These included the use of smaller focal length lenses and inclusion of an image warping step, reworking the camera calibration jigs, and completely redesigning the interactive software to process the images and perform the multiple camera stereo computations. After several iterations of improvement of the onboard system, we repeated our evaluation of the measurement system. Results are given below.

4.1. Experiments on the lab bench
Several scenes were created within the laboratory using calibration jigs and rulers so that the 3D coordinates of many points could easily be known. Some of these points were used for calibration and the rest were
used to test the 3D sensing accuracy. The five-step process is as follows:

(1) Scene creation and measurement of 3D points. The first test scenes were constructed using jigs and rulers. The markers on the rulers and the corners of objects were selected as feature points. The 3D world coordinates of these feature points were measured and recorded. Some of these feature points would be used for computing the camera calibration matrix, while others would be used as test points for verifying the accuracy of the derived 3D world coordinates.
(2) The first test images were acquired using a CoHu camera with an 8.5 mm focal length lens and no IR filter. Standoff of the camera from the work volume was 900 mm to 1000 mm. The work volume of measurements was roughly 500 mm x 500 mm x 250 mm. Image capture and digitization was done via a Sun 4/330 SPARCstation integrated with a color imaging system. The 2D image coordinates for calibration and test points were extracted manually using an X Window software tool which was a predecessor of 3DAQ.
(3) Using the 3D world coordinates of the calibration points obtained in process (1), and the image coordinates extracted in process (2), the camera calibration matrix was derived for each of three views of the scene. No warping step was used.
(4) We manually located feature points from the scene in the three images in order to get reliable correspondences. Software was developed to make this labor-intensive step as fast and as pleasant as possible. (In fact, the early software was not so user-friendly; user complaints led to the current 3DAQ interface which is much more user-friendly.)
(5) With camera calibration matrices from any two views and a number of point-to-point correspondences obtained in process (4), 3D feature point locations were computed using the technique of finding the midpoint of the common perpendicular to two projecting rays.

Abbreviated results of camera calibration for Experiment I, View III are given in Table 1. The residuals of the model fit average about one half pixel in the image plane, even though the image was not corrected for barrel distortion. Results of some stereo computations using the three views follow directly in Table 2 and are summarized in Table 3. While there are a few errors as bad as 5 mm, most are less than 1 mm and the average is below 2 mm for all three coordinates. The average relative error is about 1 part in 700.
4.2. Measurements made in the car
Table 1
Camera calibration sample output (Experiment I, View III). (XW, YW, ZW): real world coordinate; (XI, YI): image coordinate; (XII, YII): fitted image coordinate; residuals in pixels.

  Input data                                     Fit data          Residuals (in pixels)
  XW      YW      ZW       XI      YI           XII     YII       XI-XII   YI-YII
  76.2    152.0   -1.0     305.0   91.0         305.0   91.0      0.0      0.0
  127.0   152.0   -1.0     310.0   115.0        310.1   115.6     -0.1     -0.6
  177.8   152.0   -1.0     316.0   142.0        315.3   141.0     0.7      1.0
  228.6   152.0   -1.0     321.0   167.0        320.8   167.3     0.2      -0.3
  0.0     431.8   19.0     100.0   62.0         100.2   62.0      -0.2     -0.0
  0.0     482.6   19.0     63.0    62.0         62.7    62.1      0.3      -0.1
  278.5   64.0    0.0      389.0   191.0        388.5   191.7     0.5      -0.7
  278.5   0.0     -77.0    457.0   172.0        456.9   171.2     0.1      0.8

Computed calibration matrix C:

   0.00845079   -0.70418924    0.09206268   396.35992432
   0.42900348   -0.01052378    0.36976537    55.73084641
  -0.00028072   -0.00017955    0.00083967     1.00000000

Table 2
Sample measurement errors from Experiment I (in mm), for view pairs I & II, I & III, II & III and the best approximation. Note: rows marked with * represent the test points.

Table 3
Results obtained from Experiment I

                                          x coord   y coord   z coord
  view 1 and 2 (average error in mm)      0.9       0.9       1.5
  view 2 and 3 (average error in mm)      0.9       0.6       1.6
  view 1 and 3 (average error in mm)      0.7       0.5       0.8
  view 1 and 2 (maximum error in mm)      4.0       3.7       7.0
  view 2 and 3 (maximum error in mm)      3.5       2.5       5.2
  view 1 and 3 (maximum error in mm)      2.8       2.6       2.5
  best approximation (average error)      0.7       0.5       0.9

Table 4
3D human body feature points visible in example image 2 (units are mm)

  Point name   Image 2D          Reconstructed 3D coordinates (X, Y, Z)
  Eye          (408.0, 337.0)    (223.4, 76.1, 761.8)
  Neck         (314.0, 367.0)    (60.2, 88.6, 711.3)
  Shoulder     (217.0, 338.0)    (15.9, -42.7, 618.9)
  Elbow        (147.0, 217.0)    (19.5, -55.2, 377.8)
  Wrist        (235.0, 105.0)    (247.0, -16.3, 307.9)
  Knee         -                 (459.6, -43.1, 182.7)
  Ankle        (192.0, 165.0)    (688.0, -6.6, -37.0)
  R_ASIS       (258.0, 160.0)    (96.1, 26.8, 306.8)
  ball1        (299.0, 129.0)    (161.3, 183.7, 297.1)
  ball2        -                 (270.7, 172.5, 314.3)

Table 5
Sample distribution of total 3D error from within the car

  Size of error (mm)    0.5-1.0  1.0-1.5  1.5-2.0  2.0-2.5  2.5-3.0  3.0-3.5  3.5-4.0  4.0-4.5  4.5-5.0  5.0-5.5
  number of points      7        11       11       13       9        6        5        3        3        1
  cumulative percent    10       26       42       61       74       83       90       94       99       100

The measurement experiment described above was repeated using the instrumentation in the car. However, the cameras had 4.8 mm auto-iris lenses and the IR notch filters as described in Section 2. The four cameras were set up and the distortion mapping was computed for each before placing the camera in the car. All images would be warped using the appropriate look-up tables from Goshtasby's method before use in any further analysis [18]. The special jig was placed in the driver seat mounting and the centers of all calibration beads were located using a coordinate measuring device. Each of the four cameras was calibrated from images of the jig given to the 3DAQ program. Then, using the stereo calculation mode of 3DAQ, 3D coordinates of test jig points (not used for calibration) were computed. For each camera, roughly 15 points were used for the
calibration matrix. All feature points visible to two or more cameras were located in 3D using the stereo computations. Table 4 shows the coordinates of 3D body points visible in the image from camera 2. The 3D errors, computed as Euclidean distance between the 3DAQ computations and those of the coordinate measuring device, are described in Table 5. The table lists the number of points having an error within given ranges: a total of 69 test points were used. For 42% of the points the total error was less than 2.0 mm, and thus, the error in each coordinate must also be less than 2.0 mm. In 83% of the cases the total error is less than 3.5 mm; if individual coordinate errors are balanced, then all coordinate errors are no worse than 2.0 mm. At least 17% of the points have some coordinate error larger than the desired 2.0 mm limit.
5. Concluding discussion

We have described a system of hardware and algorithms for measuring 3D points on a human driver of an automobile during operation of a motor vehicle. Measurements made using a system prototype in the laboratory yielded an expected accuracy of better than 1.0 mm in x, y and z coordinates and a combined error of less than 2.0 mm, without removing the barrel distortion of the lens. Many additional problems had to be faced when installing such a system in a car, however. First, we had to use small focal length lenses due to the small workspace and the relatively large depth-of-field needed. We could not freely place cameras in order to optimize viewpoint or stereo accuracy because cameras could not interfere with the driver's vision. The illumination within a car varies a great deal with weather conditions, time of day, and heading. Therefore, we used auto-iris lenses to adapt to the illumination. We used LEDs emitting 800 nm wavelength light to specially illuminate the scene only at times when images were taken, and specially reflective targets were affixed to the driver's body. In the more difficult real environment, measurement accuracy degraded slightly. In a test of 69 points, only 18 were measured with a 3D error of 3.0 mm or more, and only one of those errors was about 5.0 mm. The accuracy on average is of the order of 0.2% of the field of view, i.e. 1 part per 500. These results, on average, satisfy the original requirements set for the project; however, for some points the error is slightly larger
than our design goal. One limiting factor in making 3D measurements is the difficulty of accurately identifying image points. With the large field of view and depth of work volume, it is difficult to obtain crisp images of all targets. Moreover, targets affixed to the body are distorted from their ideal geometry. Improvement of accuracy also depends upon improving the image warping and calibration procedures to derive more accurate camera models: in the warping step we need to investigate use of centroids rather than intersections and tighter control on temperature and lighting. The error from the current system is about five times that of other systems using surgically implanted spheres [11]. In conclusion, using our non-invasive procedures, the multiple camera system, in conjunction with a high definition pressure mapping of the interface between the seat and occupant, can be used to define the position of the skeleton of a vehicle operator in the working environment. As a result, quantitative measures of human posture are possible in real working environments such that the location of joint centers in the human linkage system can be estimated for the computation of joint loads. Thus far, 448 sets of drive data and 102 sets from the seat buck have been collected. Analysis of use of the data for the study of the posture and comfort of drivers is in progress and will be the subject of forthcoming publications.
Acknowledgement

The authors acknowledge the support of Delphi Interior and Lighting Systems (formerly Inland Fisher Guide) Division of General Motors Corporation. In addition, we wish to thank Michael Moore for his excellent work in redesigning and reprogramming the graphical user interface for the 3DAQ program during Spring 1993. This work was supported by research sponsored by Delphi Interior and Lighting Systems, General Motors Corporation.
References

[1] M.A. Adams and W.C. Hutton, The effect of posture on the lumbar spine, J. Bone and Joint Surgery, 67-B(4) (1985) 625-629.
[2] T.M. Hosea, S. Simon, J. Dlatizky, M. Wong and C.-C. Hsieh, Myoelectric analysis of the paraspinal musculature in relation to automobile driving, Spine, 11(9) (1986) 928-936.
[3] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley, New York, 1973.
[4] S.T. Barnard and M.A. Fischler, Computational stereo, Computing Surveys, 14 (December 1982).
[5] R.A. Jarvis, A perspective on range finding techniques for computer vision, IEEE Trans. PAMI, 5(2) (1983) 122-139.
[6] R. Haralick and L. Shapiro, Computer and Robot Vision, Addison-Wesley, Reading, MA, 1993.
[7] R. Jain, R. Kasturi and B. Schunck, Machine Vision, McGraw-Hill, New York, 1995.
[8] R. Tsai, A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf cameras and lenses, IEEE Trans. Robotics and Automation, 3 (August 1987).
[9] H.M. Reynolds, R. Halgren and J. Marcus, Systems anthropometry: Development of a stereoradiographic measurement system, J. Biomechanics, 2(4) (1982) 229-233.
[10] H.M. Reynolds, Erect, neutral and slump sitting postures: a study of torso linkage system from shoulder to hip joint, Technical Report AL/CL-TR-1994-0151, USAF, Air Force Material Command, Wright-Patterson Air Force Base, Dayton, Ohio, 1994.
[11] S.A. Veress, F.G. Lippert III and T. Takamoto, An analytic approach to x-ray photogrammetry, Photogrammetric Eng. and Remote Sensing, 43(12) (1977) 1503-1520.
[12] G. Selvick, P. Alberius and A. Aronson, The effect of posture on the lumbar spine, Acta Rad. Diag., 24(4) (1994) 343-352.
[13] G. Johansson, Perception of motion and changing form, Scandinavian J. Psychology, 5 (1964) 181-208.
[14] A. Pentland, Automatic extraction of deformable part models, Int. J. Computer Vision, 4 (1990) 107-126.
[15] I. Kakadiaris, D. Metaxas and R. Bajcsy, Active part-decomposition, shape and motion estimation of articulated objects: A physics-based approach, IEEE Comp. Vision and Pattern Rec., Seattle, WA, June 21-23, 1994.
[16] K. Rohr, Towards model-based recognition of human movements in image sequences, CVGIP: IU, 59(1) (1994) 94-115.
[17] E.L. Hall, J.K.B. Tio, C.A. McPherson and F.A. Sadjadi, Measuring curved surfaces for robot vision, Computer, 15(12) (December 1982) 42-54.
[18] A. Goshtasby, Correction of image deformation from lens distortion using Bezier patches, Computer Vision, Graphics and Image Processing, 47 (1989) 385-394.
[19] P. Cohen, J. Weng and N. Rebibo, Motion and structure estimation from stereo image sequences, IEEE Trans. Robotics and Automation, 8 (June 1992) 362-382.
[20] J. Fryer and D. Brown, Lens distortion for close-range photogrammetry, Photogramm. Eng. and Remote Sensing, 52(1) (1986) 51-58.
[21] Y.T. Cui, J. Weng and H. Reynolds, Optimal parameter estimation of ellipses, Proc. 8th Int. Conf. on Image Analysis and Processing, San Remo, Italy, September 1995.