A camera model for natural scene processing

Pattern Recognition, Pergamon Press 1977. Vol. 9, pp. 131-136. Printed in Great Britain

A CAMERA MODEL FOR NATURAL SCENE PROCESSING*

E. A. PARRISH, JR. and A. K. GOKSEL†

School of Engineering and Applied Science, University of Virginia, Charlottesville, VA, U.S.A.

(Received 3 September 1976 and in revised form 28 February 1977)

Abstract--A camera model for perspective transformations used in processing three-dimensional natural scenes is presented. An efficient method to determine the required camera offset parameters and a relatively fast calibration procedure are given.

Camera model    Scene analysis    Perspective transformations

INTRODUCTION

A great deal of research has been devoted to what might be called two-dimensional pattern recognition problems, in which relationships between objects in an image scene and the position of the sensor (e.g. a TV camera) are ignored. Such problems include character recognition, fingerprint identification and face recognition, among others. In research concerned with the use of robots equipped with video sensors, however, the third dimension becomes equally important. In two-dimensional problems, where required, the location of an object in the real world can be specified in terms of the picture array coordinates--for example, by expressing the row and column number of the beginning of the digital image of the object. Similarly, the relative positions of objects can be given in terms of the picture array coordinates. When dealing with three-dimensional problems, however, it becomes necessary to relate the picture array coordinates to the three-dimensional position of the object in the real world. The method used to transform the three-dimensional object coordinates to two-dimensional image coordinates, and vice versa, is called a perspective transformation. Such transformations require a camera model and allow decisions to be made concerning the real-world location of objects based upon information obtained from a two-dimensional digital picture array. Camera models have been used for some time by photogrammetrists interested in constructing a map from a number of separate images of the region of interest.(1) The use of perspective transformations for scene analysis was suggested by Roberts(2) and later applied by Sobel(3) to the Stanford Hand-Eye Project.

This paper presents a model, similar to that of Sobel,(3) which was used in a system for processing natural scenes.(4) In particular, these scenes consisted of apple trees, and the objects of interest were apples. The perspective transformation was used to determine the trajectory required of the mechanical arm in order to acquire an apple whose image position had been determined (using scene analysis and pattern recognition techniques) in terms of its coordinates in a binary picture array. The hardware system consisted of a Hewlett-Packard 2100A minicomputer, a standard 525-line black-and-white television camera mounted on a computer-controlled pan-and-tilt base, and a rudimentary three-degrees-of-freedom mechanical arm, also under computer control. Although photographs of actual apple tree scenes were used during the design phase, an artificial apple tree was used in laboratory demonstrations for obvious practical considerations. In operation, the television camera was used to scan an area until a portion of the tree was in view. After processing the acquired data, the arm was driven until each apple located in a given scene was touched by a tactile sensor mounted on the end of the arm. In this way, the apples were "picked." Obviously, the calibration of the camera model is crucial to the success of such a system. An efficient method to determine the required camera model offset parameters and a relatively fast calibration procedure will be presented after the model is developed in the following section.

* This research was supported by the National Science Foundation under Grant ENGR 74-16376.
† Formerly with the School of Engineering and Applied Science, University of Virginia, Charlottesville, Virginia. He is now with M.B. Associates, San Ramon, California.

THE CAMERA MODEL

The purpose of the camera model is to facilitate the derivation of a perspective transformation that maps the world coordinates of the object location to the image coordinates, together with its inverse, which maps the image coordinates back to the object coordinates.(5) In this model the camera is represented by a pinhole lens, and distortions in the images due to lens aberrations, the TV camera, and other hardware are neglected. The camera model is illustrated in Fig. 1, and the top and profile views of the camera are depicted in Fig. 2. The image plane is placed in front of the lens to circumvent the image inversion in the picture-taking process.(5)

[Fig. 1. The camera model.]

[Fig. 2. Parameters of the model.]

The direct perspective transformation in homogeneous coordinates(2,5) can be expressed as

u_p = S u,    (1)

where u_p = (x_p, y_p, z_p, 1)^t is the image coordinate vector, u = (x, y, z, 1)^t is the object coordinate vector, S is a 4 x 4 transformation matrix, and t denotes transpose. The transformation matrix S is formed by a product of individual matrices as(5)

S = P G R T.    (2)

The picture matrix

P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1/f & 0 & 1 \end{bmatrix}    (3)

transforms an object point (in image-plane coordinates) to its image point, and involves only the lens focal length f. The matrix G is a linear transformation given by

G = \begin{bmatrix} 1 & 0 & 0 & -d_1 \\ 0 & 1 & 0 & -(d_2 + f) \\ 0 & 0 & 1 & -d_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}    (4)

which moves the image plane to the gimbal center. The camera-center-to-gimbal-center offset is represented by the vector (d_1, d_2, d_3)^t. The rotation operator(3)

R = R_S R_T R_P    (5)

is a linear transformation which aligns the image plane with the world reference frame. The swing matrix,

R_S = \begin{bmatrix} \cos\psi & 0 & \sin\psi & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\psi & 0 & \cos\psi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},    (6)

swings the image plane through an angle ψ. The swing angle ψ is the angle between the camera horizontal (in the lens plane) and the x-y plane. The tilt matrix,

R_T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & \sin\phi & 0 \\ 0 & -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},    (7)

tilts the image plane through an angle φ; and the pan matrix,

R_P = \begin{bmatrix} \cos\theta & \sin\theta & 0 & 0 \\ -\sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},    (8)

pans the image plane through an angle θ. Finally, the translation matrix,

T = \begin{bmatrix} 1 & 0 & 0 & -x_0 \\ 0 & 1 & 0 & -y_0 \\ 0 & 0 & 1 & -z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix},    (9)

moves the gimbal center to the world reference frame.

In the direct transformation (object point to image point), the second component of the image coordinate vector, y_p, does not have any physical meaning; but a square matrix S allows the determination of the inverse transformation by a matrix inversion. The image coordinate pair (x_p, z_p) is related to its corresponding picture-array coordinate pair (h, v) by the expressions(4)

x_p = (h - h_0)/K_x,    (10)

z_p = (v_0 - v)/K_z,    (11)

where h_0 and v_0 denote the location of the image plane origin in the picture array, and K_x and K_z denote the image scale factors (bits/in.) in the x and z directions. Hence, the image location (h, v) can be expressed as

h = (A/D) + h_0,    (12)

v = (B/D) + v_0,    (13)

where

A = f K_x [(x - x_0)(\cos\psi \cos\theta + \sin\psi \sin\phi \sin\theta) + (y - y_0)(\cos\psi \sin\theta - \sin\psi \sin\phi \cos\theta) + (z - z_0) \sin\psi \cos\phi - d_1],    (14)

D = -(x - x_0) \cos\phi \sin\theta + (y - y_0) \cos\phi \cos\theta + (z - z_0) \sin\phi - d_2,    (15)

and

B = -f K_z [(x - x_0)(-\sin\psi \cos\theta + \cos\psi \sin\phi \sin\theta) + (y - y_0)(-\sin\psi \sin\theta - \cos\psi \sin\phi \cos\theta) + (z - z_0) \cos\psi \cos\phi - d_3].    (16)

The inverse perspective transformation (image point to object point) can be stated as

u = S^{-1} u_p,    (17)

where

S^{-1} = T^{-1} R^{-1} G^{-1} P^{-1}.    (18)

A more convenient form for the inverse transformation is the set of equations

x = a_x + k b_x,
y = a_y + k b_y,    (19)
z = a_z + k b_z,

which describe the ray from the camera lens center to the object location. The coefficients in the above set of equations can be expressed (for the measured swing angle ψ = 0; cf. Table 1) as

a_x = x_0 + d_1 \cos\theta - d_2 \cos\phi \sin\theta + d_3 \sin\phi \sin\theta,    (20)

b_x = \frac{h - h_0}{f K_x} \cos\theta - \cos\phi \sin\theta + \frac{v_0 - v}{f K_z} \sin\phi \sin\theta,    (21)

a_y = y_0 + d_1 \sin\theta + d_2 \cos\phi \cos\theta - d_3 \sin\phi \cos\theta,    (22)

b_y = \frac{h - h_0}{f K_x} \sin\theta + \cos\phi \cos\theta - \frac{v_0 - v}{f K_z} \sin\phi \cos\theta,    (23)

a_z = z_0 + d_2 \sin\phi + d_3 \cos\phi,    (24)

b_z = \sin\phi + \frac{v_0 - v}{f K_z} \cos\phi.    (25)

The free variable k results from the fact that each image point is mapped onto a straight line passing through the object point and the camera lens center. A depth cue is required to determine the exact location of the object point along this line. In the sequel the products f K_x and f K_z will be considered as single parameters and denoted by L_x and L_z, respectively.

CAMERA CALIBRATION

The perspective transformations contain certain offset parameters, and a knowledge of these parameters is required in order to utilize the transformations. The thirteen offset parameters are h_0, v_0, L_x, L_z, d_1, d_2, d_3, θ, φ, ψ, x_0, y_0, and z_0. They can be measured directly, or they can be computed from a set of observations.(5) The latter method is based on computing the parameter values that minimize the error between the measured and calculated image coordinates over a set of sample points. Here, the error function is defined as

E = \sum_{i=1}^{n} (\Delta h_i^2 + \Delta v_i^2),    (26)

where \Delta h_i = h_i - h_{ci} and \Delta v_i = v_i - v_{ci}. The parameter values that minimize the error function are determined by the solution of the set of equations

\frac{\partial E}{\partial x_j} = 0, \quad j = 1, 2, \ldots, 13,    (27)
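For readers who wish to experiment with the model, the direct transformation (1)-(16) and the inverse ray form (19) can be sketched numerically. The following NumPy sketch is ours, not part of the original system (which ran on an HP 2100A); all function names are assumptions, and a known object height z serves as the depth cue that fixes k.

```python
import numpy as np

def rot_swing(psi):   # R_S (6): rotation about the camera (y) axis
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1.0]])

def rot_tilt(phi):    # R_T (7): rotation about the x axis
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1.0]])

def rot_pan(theta):   # R_P (8): rotation about the z axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0, 0], [-s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1.0]])

def camera_matrix(f, d, psi, phi, theta, origin):
    """Assemble S = P G R_S R_T R_P T from equations (2)-(9)."""
    d1, d2, d3 = d
    x0, y0, z0 = origin
    # picture matrix (3): perspective divide by the camera-axis (y) coordinate
    P = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1 / f, 0, 1.0]])
    # G (4): moves the image plane to the gimbal center
    G = np.array([[1, 0, 0, -d1], [0, 1, 0, -(d2 + f)],
                  [0, 0, 1, -d3], [0, 0, 0, 1.0]])
    # T (9): moves the gimbal center to the world reference frame
    T = np.array([[1, 0, 0, -x0], [0, 1, 0, -y0],
                  [0, 0, 1, -z0], [0, 0, 0, 1.0]])
    return P @ G @ rot_swing(psi) @ rot_tilt(phi) @ rot_pan(theta) @ T

def project(S, obj, h0, v0, Kx, Kz):
    """World point -> picture-array coordinates (h, v), per (1) and (10)-(13)."""
    up = S @ np.append(obj, 1.0)
    xp, zp = up[0] / up[3], up[2] / up[3]   # homogeneous divide; y_p is unused
    return h0 + Kx * xp, v0 - Kz * zp

def back_project(h, v, f, d, psi, phi, theta, origin, h0, v0, Kx, Kz, z):
    """Inverse ray form (19): the object lies on a + k b; a known height z
    is the depth cue that fixes k."""
    R3 = (rot_swing(psi) @ rot_tilt(phi) @ rot_pan(theta))[:3, :3]
    a = np.asarray(origin) + R3.T @ np.asarray(d)   # camera lens center
    b = R3.T @ np.array([(h - h0) / (f * Kx), 1.0, (v0 - v) / (f * Kz)])
    k = (z - a[2]) / b[2]
    return a[0] + k * b[0], a[1] + k * b[1], z
```

Projecting a point and then back-projecting its (h, v) with the same z recovers the original object coordinates, which is a convenient self-check of the model.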

or

\sum_{i=1}^{n} \left( \Delta h_i \frac{\partial \Delta h_i}{\partial x_j} + \Delta v_i \frac{\partial \Delta v_i}{\partial x_j} \right) = 0, \quad j = 1, 2, \ldots, 13,    (28)
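The numerical minimization used for the full parameter set is a damped nonlinear least-squares routine (Marquardt's algorithm, Ref. 6). A minimal Gauss-Newton iteration with a finite-difference Jacobian, sketched below, conveys the idea; this is our own illustrative stand-in and omits the damping that makes Marquardt's method robust far from the minimum.

```python
import numpy as np

def gauss_newton(residuals, x0, iters=25, eps=1e-6):
    """Minimize sum(residuals(x)**2) by undamped Gauss-Newton steps.

    residuals -- function mapping a parameter vector to a residual vector
    x0        -- initial parameter estimates
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = np.asarray(residuals(x), dtype=float)
        J = np.empty((r.size, x.size))
        for j in range(x.size):          # forward-difference Jacobian column
            xj = x.copy()
            xj[j] += eps
            J[:, j] = (np.asarray(residuals(xj)) - r) / eps
        # solve the linearized normal equations in the least-squares sense
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + step
    return x
```

For the calibration problem the residual vector would stack the Δh_i and Δv_i of (26) over the sample points, as functions of the offset parameters being estimated.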

where the x_j are the offset parameters. It is not practical to attempt an analytical solution because of the complexity of the error function and the large number of parameters. Instead, the error function is minimized numerically using a least-squares nonlinear parameter-estimation algorithm.(6) The parameters in a similar model, discussed in Ref. (3), were determined by direct search methods. However, the anticipated numerical complications in these methods suggest determining a group of the parameters by analytical minimization of the error function while the remaining ones are measured. The performance of the parameters can then be checked on a set of image points and corresponding object points. If the resulting errors are too large, a least-squares estimation method can be used to determine numerically a larger set of parameters; the parameter values determined by the former method can then be used as initial estimates in the latter to speed up convergence. The measured parameter values are tabulated in Table 1.

Table 1. Measured offset parameters

θ      45 deg.
φ      -24 deg.
ψ      0
x_0    45.28 in.
y_0    3.29 in.
z_0    8.0 in.
d_1    0
d_2    5.2 in.
d_3    7.65 in.

The coordinates of the image plane origin, (h_0, v_0), are determined from (12) and (13), together with a set of physical observations, in the following manner. From (12) and (13), h = h_0 if A = 0, or

(x - x_0) \cos\theta + (y - y_0) \sin\theta = 0,    (29)

and v = v_0 if B = 0, or

(x - x_0) \sin\phi \sin\theta - (y - y_0) \sin\phi \cos\theta + (z - z_0) \cos\phi - d_3 = 0.    (30)

Substituting the parameter values of Table 1 into (29) and (30) yields

x = 48.77 - y    (31)

for h = h_0, and

x = y - 7.22    (32)

for v = v_0. Equations (31) and (32) describe the loci of object points whose corresponding image points lie on the lines through h_0 and v_0, respectively. A set of data was collected by placing a black sphere at several points on these loci and computing the corresponding image centroids. The sample means for h_0 and v_0 yielded the values h_0 = 82 bits and v_0 = 54 bits.

The error function is thus

E = \sum_{i=1}^{n} \left[ h_i - \left( h_0 + L_x \frac{A_i - d_1}{D_i} \right) \right]^2 + \left[ v_i - \left( v_0 - L_z \frac{B_i - d_3}{D_i} \right) \right]^2,    (33)

where L_x, L_z, d_1, and d_3 are the parameters to be computed and h_i and v_i are the image coordinates determined by the picture-processing routine. The terms A_i, B_i, and D_i are expressed as

A_i = (x_i - x_0)(\cos\psi \cos\theta + \sin\psi \sin\phi \sin\theta) + (y_i - y_0)(\cos\psi \sin\theta - \sin\psi \sin\phi \cos\theta) + (z_i - z_0) \sin\psi \cos\phi,    (34)

B_i = (x_i - x_0)(-\sin\psi \cos\theta + \cos\psi \sin\phi \sin\theta) + (y_i - y_0)(-\sin\psi \sin\theta - \cos\psi \sin\phi \cos\theta) + (z_i - z_0) \cos\psi \cos\phi,    (35)

and

D_i = -(x_i - x_0) \cos\phi \sin\theta + (y_i - y_0) \cos\phi \cos\theta + (z_i - z_0) \sin\phi - d_2,    (36)

where (x_i, y_i, z_i) is the object location measured at the ith observation. The solution of the set of equations

\partial E/\partial L_x = 0, \quad \partial E/\partial L_z = 0, \quad \partial E/\partial d_1 = 0, \quad \partial E/\partial d_3 = 0,    (37)

yields

d_1 = f_1/f_2,    (38)

d_3 = f_3/f_4,    (39)

L_x = \left( \sum_{i=1}^{n} \frac{h_i - h_0}{D_i} \right) \Big/ \left( \sum_{i=1}^{n} \frac{A_i}{D_i^2} - d_1 \sum_{i=1}^{n} \frac{1}{D_i^2} \right),    (40)

and

L_z = -\left( \sum_{i=1}^{n} \frac{v_i - v_0}{D_i} \right) \Big/ \left( \sum_{i=1}^{n} \frac{B_i}{D_i^2} - d_3 \sum_{i=1}^{n} \frac{1}{D_i^2} \right),    (41)

where

f_1 = \sum_{i=1}^{n} \frac{(h_i - h_0) A_i}{D_i} \sum_{i=1}^{n} \frac{A_i}{D_i^2} - \sum_{i=1}^{n} \frac{h_i - h_0}{D_i} \sum_{i=1}^{n} \frac{A_i^2}{D_i^2},    (42)

f_2 = \sum_{i=1}^{n} \frac{(h_i - h_0) A_i}{D_i} \sum_{i=1}^{n} \frac{1}{D_i^2} - \sum_{i=1}^{n} \frac{h_i - h_0}{D_i} \sum_{i=1}^{n} \frac{A_i}{D_i^2},    (43)

f_3 = \sum_{i=1}^{n} \frac{(v_i - v_0) B_i}{D_i} \sum_{i=1}^{n} \frac{B_i}{D_i^2} - \sum_{i=1}^{n} \frac{v_i - v_0}{D_i} \sum_{i=1}^{n} \frac{B_i^2}{D_i^2},    (44)

and

f_4 = \sum_{i=1}^{n} \frac{(v_i - v_0) B_i}{D_i} \sum_{i=1}^{n} \frac{1}{D_i^2} - \sum_{i=1}^{n} \frac{v_i - v_0}{D_i} \sum_{i=1}^{n} \frac{B_i}{D_i^2}.    (45)

Table 3. Camera model errors

          Experiment No. 1    Experiment No. 2
ε̄             0.023               0.022
Var(ε)        0.00015             0.00018
CV(ε)         0.541               0.597
δ̄             0.010               0.009
Var(δ)        0.00003             0.00004
CV(δ)         0.537               0.689
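Equations (38)-(45) amount to two independent two-parameter linear least-squares problems that can be solved in closed form. The sketch below is ours (the names are assumptions, not the authors'): calling it with (h_i - h_0, A_i, D_i) gives L_x and d_1, and calling it with (v_0 - v_i, B_i, D_i) gives L_z and d_3.

```python
import numpy as np

def solve_gain_offset(num, A, D):
    """Closed-form minimizer of sum_i [num_i - L (A_i - d)/D_i]^2 over (L, d).

    num_i is (h_i - h0) for the (L_x, d_1) pair, or (v0 - v_i) for (L_z, d_3).
    The sums below correspond to the f-terms of (38)-(45).
    """
    num, A, D = map(np.asarray, (num, A, D))
    S1 = np.sum(num * A / D)      # e.g. sum (h_i - h0) A_i / D_i
    S2 = np.sum(num / D)          # e.g. sum (h_i - h0) / D_i
    S3 = np.sum(A**2 / D**2)
    S4 = np.sum(A / D**2)
    S5 = np.sum(1.0 / D**2)
    # eliminating L from the two normal equations gives d as a ratio, cf. (38)
    d = (S1 * S4 - S2 * S3) / (S1 * S5 - S2 * S4)
    L = S2 / (S4 - d * S5)        # cf. (40)
    return L, d
```

As the text notes, the denominator of d may vanish for very small samples, so a sample size on the order of ten or more is advisable.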

The four unspecified parameters were computed utilizing data composed of twenty object points and their known corresponding image points. A black sphere was placed at twenty different locations on a quadruled table while the location of its centroid was measured and recorded. The image coordinates of the sphere's centroid were determined by the image-processing routine and constitute the true values of h_i and v_i. The computed parameters are tabulated in Table 2.

Table 2. Computed model parameters

L_x         353.79
L_z         344.95
d_1 (in.)   0.26
d_3 (in.)   7.69

The computed parameters were tested in two experiments, each with a sample size of twenty. The relative image-position and relative object-position errors were calculated. The relative image-position error is

\epsilon_i = \left| \frac{h_i - h_{ci}}{h_i} \right| + \left| \frac{v_i - v_{ci}}{v_i} \right|,    (46)

where (h_i, v_i) denote the measured image centroid coordinates determined by the image-processing routine, and (h_{ci}, v_{ci}) denote the corresponding coordinates calculated from the object coordinates using the transformations in (12) and (13). The relative object-position error is given by

\delta_i = \left| \frac{x_i - x_{ci}}{x_i} \right| + \left| \frac{y_i - y_{ci}}{y_i} \right|,    (47)

where (x_i, y_i) are the measured x-y object coordinates of the sphere centroid, and (x_{ci}, y_{ci}) are the x-y object coordinates calculated using the inverse transformation in (19) and the image centroid coordinates (h_i, v_i). The z-coordinate of the object centroid was kept constant throughout the experiments and was used to determine the variable k in (19). The results of the two experiments, i.e. the sample means of the errors, ε̄ and δ̄, their variances Var(ε) and Var(δ), and the coefficients of variation CV(ε) and CV(δ), are tabulated in Table 3. The magnitudes of the errors are small, especially when errors in the measurements are considered.

It should be noted that the four computed parameter values are not the exact values of these parameters; they are adjusted to compensate for the errors in the other parameter values. This feature of the method was tested by deliberately introducing a gross error of 10% into the measured parameters. The four parameters L_x, L_z, d_1, and d_3 were then computed. The parameter values and the resulting position errors are tabulated in Table 4. The results demonstrate the effectiveness of the method in compensating for the errors in the model.

Table 4. Results of error-compensation experiment

L_x         332.88
L_z         366.48
d_1         1.10 in.
d_3         5.69 in.
ε̄           0.060
Var(ε)      0.00086
CV(ε)       0.489
δ̄           0.025
Var(δ)      0.00012
CV(δ)       0.438

Two numerical idiosyncrasies of the method should be noted. First, the numerator and the denominator of the parameters d_1 and d_3 may vanish when the sample size is small, causing these parameters to be undefined. This characteristic stems from the fact that the error function has the shape of a parabolic trough with an almost level bottom; in fact, the minimum point is independent of d_1 and d_3 when a single sample is considered, resulting in a trough with a level bottom. The fluctuations in the relative minimum grow in amplitude as the sample size grows. Empirically, a sample size of ten was found to be safe, with a small uncertainty factor. Second, the method adjusts the parameters to reduce the errors equally in all samples, similar to a line-fitting routine. Hence, a "bad" sample would mislead the minimization process. A bad sample can be the result of an error in the picture-taking process, such as insufficient lighting, an improper focus setting, etc. Bad samples should therefore be eliminated before the minimization process is actually applied; they can be detected by the unusual deviation between their corresponding individual errors and the mean error. The results of the experiments displayed in Tables 3 and 4 demonstrate the effectiveness of the method.
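The estimate-score-reject cycle just described can be sketched as a small loop. Everything below is our own schematic: the helper names and the two-standard-deviation outlier rule are assumptions, standing in for the paper's informal "unusual deviation from the mean error" test.

```python
import numpy as np

def calibrate(samples, solve, sample_errors, max_bad=4):
    """Schematic calibration loop: estimate parameters, flag bad samples by
    their deviation from the mean error, and re-estimate without them.

    samples       -- observation set (object points with image centroids)
    solve         -- hypothetical: samples -> parameter estimates
    sample_errors -- hypothetical: (params, samples) -> per-sample errors
    max_bad       -- reject the whole data set beyond this many bad samples
    """
    params = solve(samples)
    e = np.asarray(sample_errors(params, samples), dtype=float)
    bad = np.abs(e - e.mean()) > 2.0 * e.std()   # illustrative threshold
    if bad.sum() > max_bad:
        raise ValueError("too many bad samples; a new data set is required")
    if bad.any():
        samples = [s for s, flag in zip(samples, bad) if not flag]
        params = solve(samples)                  # recompute without them
    return params
```

With `solve` set to the closed-form estimator and `sample_errors` to the relative position errors of (46)-(47), this mirrors the procedure used in the system.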


The additional effort to refine the parameter values by numerically determining a larger set of parameters is clearly unnecessary.

THE CALIBRATION PROCESS

In the experimental setup most camera parameters were kept unchanged. However, some parameter values change when the camera is removed for use in another project and later remounted. Hence, a calibration process is needed to determine the new parameter values. Recalibration of the parameters also compensates for temperature- and time-dependent variations in the system.

The calibration process involves the following steps. A black sphere is placed at twenty predetermined object locations on the quadruled table while the image centroid coordinates are found by the picture-processing routine. The object coordinates are prestored in the calibration program and do not need to be entered during calibration. The error-minimization method described above is used to compute the parameters L_x, L_z, d_1, and d_3. The position errors incurred by these parameter values are then computed and listed to verify the calibration process. If any bad samples are detected, these samples are removed from the data and the parameters are recomputed. If more than four out of twenty samples are bad, the program requests a new data set. Using this procedure, the system can be calibrated within fifteen minutes.

CONCLUSIONS

This paper has presented a camera model and the associated perspective transformations used in a three-dimensional natural scene processing system. A method was described which simplifies the usual numerical approach for determining the required offset parameters, and an efficient calibration procedure was presented.

REFERENCES

1. Manual of Photogrammetry, 3rd Edn, 2 vols. American Society of Photogrammetry, Falls Church, VA (1966).
2. L. G. Roberts, Machine perception of three-dimensional solids, Optical and Electro-Optical Information Processing, pp. 159-197. MIT Press, Cambridge (1965).
3. I. E. Sobel, Camera models and machine perception, Ph.D. Dissertation, Stanford University (1970).
4. A. K. Goksel, Digital picture processing of natural scenes and manipulator control for automated fruit harvesting, Ph.D. Dissertation, University of Virginia (1975).
5. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Ch. 10. Wiley, New York (1973).
6. D. W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, J. Soc. Indust. Appl. Math. 11(2), June (1963).

About the Author--EDWARD A. PARRISH, JR., (S'66-M'68) was born in Newport News, Virginia, on 7 January 1937. He received the B.E.E., M.E.E., and Sc.D. (Electrical Engineering) degrees from the University of Virginia, Charlottesville, Virginia, in 1964, 1966, and 1968, respectively. From 1961 to 1964 he was a Senior Computer Programmer and a Group Leader with Amerad Corporation, Charlottesville, Virginia. From 1964 to 1966 he was a Research Associate with the Research Laboratories for the Engineering Sciences, University of Virginia, where he worked on sampled-data modeling of SCR power systems. From 1966 to 1968 he held a NASA Traineeship and engaged in research in the pattern recognition area. He joined the Electrical Engineering Department of the University of Virginia in 1968, where he is currently Professor of Electrical Engineering. He is Director of the Pattern Analysis and Computer Systems Laboratory and has research interests in the areas of pattern recognition and picture processing, digital systems, industrial automation, and microprocessor applications. Dr. Parrish is a member of Eta Kappa Nu, Sigma Xi, Tau Beta Pi, Secretary and Member of the Board of Governors of the IEEE Computer Society, the Machine Intelligence and Pattern Analysis Technical Committee of the IEEE Computer Society, the Mini/Micro Technical Committee of the IEEE Computer Society, the American Association for the Advancement of Science, the Pattern Recognition Society, and the American Association of University Professors.

About the Author--KEMAL GOKSEL received the B.S. degree in Electrical Engineering from Robert College in Istanbul, Turkey, in 1969 and the M.S. and Ph.D. degrees in electrical engineering from the University of Virginia, Charlottesville, Virginia, in 1971 and 1975, respectively. From 1970 to 1975 he was with the Research Laboratories for the Engineering Sciences, University of Virginia, first as a research engineer and then as a graduate research assistant. He is now a project engineer with M. B. Associates, San Ramon, California. His research interests include minicomputer and microcomputer systems, pattern recognition, and computer-controlled manipulators. Dr. Goksel is a member of Eta Kappa Nu.