Forensic Science International 163 (2006) 10–17
2D/3D image (facial) comparison using camera matching

Mirelle I.M. Goos, Ivo B. Alberink *, Arnout C.C. Ruifrok

Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands

Received 1 February 2005; received in revised form 1 November 2005; accepted 1 November 2005. Available online 6 December 2005.

* Corresponding author. Tel.: +31 70 888 64 04; fax: +31 70 888 65 59. E-mail address: [email protected] (I.B. Alberink).
Abstract

A problem in forensic facial comparison of images of perpetrators and suspects is that distances between fixed anatomical points in the face, which form a good starting point for objective, anthropometric comparison, vary strongly according to the position and orientation of the camera. In the case of a cooperating suspect, a 3D image may be taken using e.g. a laser scanning device. By projecting the 3D image onto a 2D image with the suspect's head in the same pose as that of the perpetrator, using the same focal length and pixel aspect ratio, numerical comparison of (ratios of) distances between fixed points becomes feasible. An experiment was performed in which, starting from two 3D scans and one 2D image of two colleagues, male and female, and using seven fixed anatomical locations in the face, comparisons were made for the matching and non-matching case. Using this method, the non-matching pair could not be distinguished from the matching pair of faces. Facial expression and resolution of the images were more or less optimal, and the results of the study are not encouraging for the use of anthropometric arguments in the identification process. More research needs to be done, though, on larger sets of facial comparisons.

© 2005 Elsevier Ireland Ltd. All rights reserved.

Keywords: Facial image identification; Anthropometric comparison; 2D/3D image comparison; Laser scanning
1. Introduction

In forensic practice, there is a frequent demand for comparison of images of perpetrators and suspects. These comparisons are usually performed on a morphological level, by comparing local and global anatomical characteristics of the faces in the images. A problem with this approach to identification is that the expert opinion is usually of a subjective nature, with considerable variation between observers, cf. [1–4], and unknown performance rates, both of individual experts and in general.

A possible solution to this problem is the anthropometric comparison of (ratios of) distances between fixed anatomical points in the face, such as the corners of the eyes, the tip of the nose or the lowest point of the ear lobe. These would form an excellent starting point for objective, numerical comparison. A problem is that they vary strongly according to the position and tilt of the camera taking the image with respect to the head. Further disturbing factors in this process include:
- unknown focal length of the camera taking the image,
- unknown pixel aspect ratio (height/width ratio of a pixel),
- low image resolution,
- different facial expressions, and
- lens distortion.
In the case of a cooperating suspect, a new photograph can be taken using the so-called "three-point method", described in ref. [5]. In this method, an overlay containing the locations of three fixed anatomical points is projected on top of live camera recordings and used to position the suspect so that the position and pose of the face correspond with those of the questioned image. The camera used to position the suspect is required to
be identical to the one recording the questioned image, and the camera position and settings are checked using the representation of fixed objects in the image. The result is two images, taken with the same camera settings, in which the persons are looking in about the same direction relative to the camera, and which may, for example, be superimposed. However, the procedure is time consuming, requires intensive cooperation, and no estimates of its accuracy can be given.

A good way around this difficulty would be the following. Since in the above situation the suspect is usually known, his or her 3D image may be taken using, e.g., a laser scanner. If this 3D image is subsequently projected onto a 2D image with the suspect's head in the same pose as that of the perpetrator, using the same focal length and pixel aspect ratio, numerical comparison of (ratios of) distances between fixed anatomical points becomes feasible. The right projection of the 3D image is found by fitting mutually corresponding fixed anatomical points in both images onto each other. On this basis, the camera position, orientation, focal length and pixel aspect ratio that put the suspect in the same pose as the perpetrator in the 2D image can be determined, and the 3D image is projected onto a comparable 2D one. If more than four fixed locations in the face are known, the remaining points may be used for comparing mutual distances and ratios between them. In the case of a matching suspect and perpetrator, the differences are expected to be significantly smaller than for non-matching ones.

To investigate the latter, an experiment was conducted in which we used two 3D scans and one 2D image of two colleagues. Using seven fixed anatomical locations in the face, comparisons were conducted for the matching and non-matching case. To counter possible operator dependency of the annotation of the locations, the fixed anatomical points were annotated by four operators.

First, a description will be given of the methods at the basis of our camera matching procedure and of our implementation of these. After this, the results of comparisons between matching and non-matching pairs of subjects are described and conclusions are drawn.
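To make the comparison of mutual distances and ratios mentioned above concrete, the following sketch (a minimal Python illustration added for this edition, not part of the original study; the coordinates are those of operator 1 from Table 1) computes all pairwise distances between annotated landmarks and the ratios between those distances.

```python
# Minimal sketch (not from the paper): pairwise distances and distance
# ratios between annotated facial landmarks, given as (column, row) pixels.
from itertools import combinations
import math

def pairwise_distances(landmarks):
    """Return {(name_a, name_b): distance} for every unordered pair of landmarks."""
    return {
        (a, b): math.dist(landmarks[a], landmarks[b])
        for a, b in combinations(sorted(landmarks), 2)
    }

def distance_ratios(distances):
    """Return the ratio between every pair of distances."""
    return {
        (p, q): distances[p] / distances[q]
        for p, q in combinations(sorted(distances), 2)
    }

# Landmark coordinates of operator 1, taken from Table 1 (column, row).
landmarks = {
    "right_ectocanthion": (465, 814),
    "left_ectocanthion": (760, 788),
    "pronasale": (550, 947),
    "stomion": (610, 1070),
    "gnathion": (656, 1217),
    "intertragic_notch": (949, 878),
    "sellion": (578, 794),
}

d = pairwise_distances(landmarks)
r = distance_ratios(d)
print(d[("left_ectocanthion", "right_ectocanthion")])  # inter-ocular distance in pixels
```

Because absolute pixel distances depend on the imaging geometry, it is the ratios, or distances compared only after the camera matching described below, that would be used in an actual comparison.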
2. Description of the experiment

2.1. General description

In order to make a 2D projection from a 3D laser scan model in which the face is viewed from the same position as the face on a questioned image, a number of parameters of the camera used to take the questioned image need to be determined. The camera parameters of importance are:

- X, Y, Z: position of the camera,
- α, β, γ: orientation (tilt, pan and roll) of the camera,
- f: focal length of the camera lens,
- a: pixel aspect ratio.

Since these are usually not known, they need to be estimated. To estimate them, we developed a matching algorithm that finds the camera position, orientation, focal length and aspect ratio for which a number (either four or seven) of fixed anatomical landmarks optimally coincide. The landmark locations, chosen on the basis of criteria such as visibility in both the photo and the 3D model, even distribution over the face, minimal influence of differences in the face, and being well defined, were the following:

1. right ectocanthion,
2. left ectocanthion,
3. pronasale,
4. stomion,
5. gnathion,
6. intertragic notch, and
7. sellion

(see Fig. 1). The points are determined manually, for the 3D scans by one single operator and for the 2D image by four separate operators. Camera position, orientation, focal length and aspect ratio are determined solely on the basis of these points, which will be referred to as match points. The resulting coordinates are given in Tables 1 and 2.
Fig. 1. The 2D image used as the questioned image, plus the annotated landmarks.
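To make the role of the eight camera parameters listed above concrete, the sketch below (a minimal Python/NumPy illustration written for this edition, not the implementation used in the study; the rotation convention, pixel size and principal point are our own assumptions) maps 3D landmark coordinates to 2D image coordinates by central projection.

```python
# Minimal central-projection sketch (illustrative, not the original code):
# map 3D landmark coordinates to 2D (column, row) pixel coordinates for a
# camera with position (X, Y, Z), orientation (alpha, beta, gamma),
# focal length f and pixel aspect ratio a.
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Combined rotation for tilt (alpha), pan (beta) and roll (gamma), in radians."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])   # tilt
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])   # pan
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])   # roll
    return Rz @ Ry @ Rx

def project_points(points_3d, cam_pos, angles, f, aspect, pixel_size, principal_point):
    """Central projection of Nx3 object points (mm) to Nx2 pixel coordinates."""
    R = rotation_matrix(*angles)
    # Express the object points in the camera coordinate frame.
    p_cam = (np.asarray(points_3d, float) - np.asarray(cam_pos, float)) @ R.T
    # Perspective division: image position is proportional to f / depth.
    x = f * p_cam[:, 0] / p_cam[:, 2]
    y = f * p_cam[:, 1] / p_cam[:, 2]
    # Convert from millimetres on the sensor to pixel column/row coordinates.
    col = principal_point[0] + x / pixel_size
    row = principal_point[1] + aspect * y / pixel_size
    return np.column_stack([col, row])
```

The rotation order and the sensor geometry (pixel size, principal point) are not specified in the paper; any consistent convention can be used, as long as the same one is applied during fitting.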
Table 1
Image coordinates assigned to the seven match points by the four different operators (C indicates column, R row coordinates, in pixels)

Operator | Ectocanthion (right eye) C/R | Pronasale (nose tip) C/R | Gnathion (chin) C/R | Ectocanthion (left eye) C/R | Intertragic notch (ear) C/R | Sellion (between eyes) C/R | Stomion (mouth) C/R
1 | 465/814 | 550/947 | 656/1217 | 760/788 | 949/878 | 578/794 | 610/1070
2 | 469/811 | 549/953 | 650/1223 | 767/791 | 950/877 | 577/804 | 609/1069
3 | 459/813 | 543/948 | 645/1212 | 753/792 | 951/870 | 576/798 | 603/1071
4 | 466/816 | 552/947 | 644/1212 | 758/792 | 952/878 | 574/794 | 603/1070
Table 2
3D coordinates of the match points as assigned on the laser scan models (by one operator)

Match point | 3D model of Fig. 3A (matching case) X, Y, Z | 3D model of Fig. 3B (non-matching case) X, Y, Z
Right eye | 159.9, 243.2, 156.6 | 164.1, 237.5, 157.7
Nose tip | 103.0, 259.2, 123.8 | 105.7, 259.8, 125.3
Chin | 114.1, 217.8, 51.2 | 123.9, 232.6, 43.7
Left eye | 81.2, 199.9, 161.1 | 76.7, 199.2, 152.5
Ear | 88.4, 125.7, 133.8 | 77.7, 114.3, 113.1
Between eyes | 109.6, 241.0, 164.7 | 112.8, 240.7, 163.0
Mouth | 109.1, 235.7, 87.3 | 114.4, 245.5, 84.8
The formulas used as the basis of our matching algorithm rest on the principle of central projection, which states that for every point of the 3D object, the line from that point to its projection on the photo passes through the projection center, see Fig. 2. The projection center defines the position of the camera. Knowing the coordinates of corresponding points in the photo (2D coordinates) and on the 3D object (3D coordinates), the eight camera parameters described above can be estimated. In this investigation, this has been done using both four and seven match points.

The formulas used to estimate the parameters on the basis of the match points are the collinearity equations described in ref. [6]. Because the known parameters in these formulas are not linearly related to the unknown parameters, the solution is approximated using Taylor's method. This is implemented by means of an iterative process in which the sum of the squared differences in row and column coordinates, as given in Formula (1), is minimized. In this formula, n is the number of match points, Δr the difference in row coordinates and Δc the difference in column coordinates:

\min(e^2) = \min\left( \sum_{i=1}^{n} (\Delta r^2 + \Delta c^2)_i \right)    (1)

The mean absolute error, calculated as in Formula (2),

\mathrm{mean}|e| = \sqrt{e^2 / (2n)}    (2)

is thereby minimized as well. The iterative process as described does not necessarily converge and depends on a suitable starting point. Including focal length and pixel aspect ratio in the estimation caused the process to diverge so often that it was decided to fix these at their true values, 70 mm and 1. (Notice that in this way they do not further obstruct the comparison process.) Hence, only the camera position and orientation were estimated and used to project the 3D model onto a 2D image with the same orientation of the head as in the questioned image (Fig. 5).
Fig. 2. Principle of central projection: each line from a point in the 3D model to the corresponding point in the image crosses the projection center.
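The iterative fit can be sketched as follows (again an illustration added for this edition, not the authors' code; it reuses the hypothetical project_points helper from the sketch after Fig. 1 and relies on SciPy's generic least-squares solver with numerical derivatives rather than an explicit Taylor linearization of the collinearity equations, but it minimizes the same quantity as Formula (1) and reports the error of Formula (2)). As in the study, focal length and pixel aspect ratio are held fixed and only camera position and orientation are estimated; the pixel size and principal point below are assumptions derived from the stated image dimensions.

```python
# Illustrative pose fit (not the original implementation): estimate camera
# position (X, Y, Z) and orientation (alpha, beta, gamma, in radians) by
# minimizing the sum of squared row/column differences between observed and
# projected match points (Formula (1)), with f and the aspect ratio fixed.
import numpy as np
from scipy.optimize import least_squares

def residuals(params, points_3d, points_2d, f, aspect, pixel_size, principal_point):
    cam_pos, angles = params[:3], params[3:6]
    projected = project_points(points_3d, cam_pos, angles, f, aspect,
                               pixel_size, principal_point)  # see earlier sketch
    return (projected - points_2d).ravel()  # per-point column/row differences

def fit_camera_pose(points_3d, points_2d, start, f=70.0, aspect=1.0,
                    pixel_size=0.03, principal_point=(600.0, 902.5)):
    """Least-squares estimate of camera position and orientation from match points.

    `start` is the initial guess; as noted in the text, the iteration depends
    on a suitable starting point and may diverge otherwise.
    """
    result = least_squares(
        residuals, start,
        args=(np.asarray(points_3d, float), np.asarray(points_2d, float),
              f, aspect, pixel_size, principal_point),
    )
    e2 = np.sum(result.fun ** 2)                           # Formula (1)
    mean_abs_error = np.sqrt(e2 / (2 * len(points_3d)))    # Formula (2)
    return result.x, mean_abs_error
```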
Fig. 3. The two 3D facial laser scans used, plus annotated landmarks. The questioned image (see Fig. 1) and the laser scan (A) originate from the same person. The dots indicate the match points used for matches based on four points. The dots and crosses in combination indicate the match points used for matches based on seven points.
The analysis of the results is performed on the basis of the following two measures:

1. In the case of seven match points: the remaining error e between the locations of the match points in the 2D image and those in the image derived from the 3D model.
2. In the case of four match points: the remaining error e between the locations of the three corresponding points not used as match points.

2.2. Technical data and equipment

The photo in Fig. 1, used as the questioned image, has a width of 1200 pixels and a height of 1805 pixels. The height of the face is about 600 pixels. The pixel aspect ratio is 1; the photo was taken with a focal length of 70 mm and a projection size of 36 mm × 24 mm.

Fig. 3 shows the 3D laser scans made of two different persons, one of whom is the person shown in the photo of Fig. 1. From these facial scans, 2D projections are made in which the 3D model is viewed from about the same angle as the face in the photo of Fig. 1, so that the locations of corresponding anatomical points can be compared. Since there is a matching pair (the 2D projection of the scan model and the photo originate from the same person) and a non-matching pair (the 2D projection of the scan model and the photo do not originate from the same person), the location differences of anatomical points in the images can be compared for both cases.

The 3D scanner used for the experiment is the Minolta VI 910. The resulting facial scans comprise more than
172,000 points. The accuracy of the scanner is about 0.2 mm in the x- and y-directions and about 0.1 mm in the z-direction for fine scans, taken with a focal length of 25 mm. The z-coordinate axis points in the direction of the laser beam. For more information about the scanner used, see [7].
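For a given set of annotated 2D landmarks and the corresponding points in the projected image, the two analysis measures described above reduce to simple pixel-error statistics. A minimal sketch (our own illustration; the quantities are intended to correspond to the mean and maximum errors reported in Tables 3–6 and the per-point errors of Fig. 6):

```python
# Illustrative computation of the analysis measures: absolute pixel errors
# between annotated 2D landmarks and the corresponding points in the 2D
# projection of the 3D model (column/row coordinates as in Table 1).
import numpy as np

def absolute_errors(annotated_2d, projected_2d):
    """Per-landmark Euclidean error in pixels, plus mean |e| and max |e|."""
    diffs = np.asarray(annotated_2d, float) - np.asarray(projected_2d, float)
    per_point = np.linalg.norm(diffs, axis=1)    # absolute error per landmark
    e2 = np.sum(diffs ** 2)                      # Formula (1)
    mean_abs = np.sqrt(e2 / diffs.size)          # Formula (2): sqrt(e^2 / 2n)
    return per_point, mean_abs, per_point.max()
```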
3. Results

3.1. General

As mentioned, the 3D coordinates of the match points were annotated by a single operator, so for each of the two 3D facial models one set of 3D coordinates of the match points is used. The 2D coordinates of the match points were determined by four operators, so there are four sets of 2D coordinates of the match points. This means that each of the two sets of 3D coordinates can be combined with each of the four sets of 2D coordinates to estimate the camera position and orientation, so that in total eight different solutions for the camera position and orientation can be derived, four of which derive from a matching pair and four from a non-matching one.

Camera positions and orientations have been estimated using both seven and four match points. In the case of four match points, only the white dots, as indicated in Figs. 3 and 4, have been used. Of the seven corresponding points, these four seem to describe the face best in height, depth and width and therefore to be best suited for the camera matching algorithm.

3.2. Results using seven match points

The 2D coordinates of the match points for each operator are matched with their corresponding 3D coordinates.
The resulting camera positions and orientations and the remaining errors are given in Tables 3 and 4. Table 3 shows the results when the photo and the 3D model are both of the same person, a matching pair. Table 4 presents the results for the non-matching pair.

The resulting camera positions and orientations were used to make 2D projections from the 3D models. In Fig. 5B and C, the resulting 2D projections of both 3D models are shown for operator 1. In these projections, the face is viewed from about the same angle as the face in Fig. 5A, the questioned image. The differences in location of the corresponding points depicted in Fig. 5A and B, and in Fig. 5A and C, have been analyzed.

Tables 3 and 4 show that the mean absolute error can be larger for a matching pair than for a non-matching pair: the mean absolute error for operator 2 on the matching pair (Table 3) is larger than that for operator 4 on the non-matching pair (Table 4).

3.3. Results using four match points

For each operator, the 2D coordinates of the four selected match points are matched with their corresponding 3D coordinates. The resulting camera positions and orientations and the remaining errors are given in Tables 5 and 6. Table 5 shows the results from the algorithm in which the photo and the 3D model were both of the same person, a matching pair. Table 6 presents the results from the algorithm for the non-matching pair. The mean absolute error is, for all operators, smaller for the matching pair than for the non-matching pair.

If only four corresponding points are used as match points, the other three corresponding points can be used for analysis. One would expect that for these points the remaining absolute differences in photo coordinates are smaller for a matching pair than for a non-matching pair. For each operator, these absolute errors are presented in Fig. 6. The boxes give the errors resulting from the matching pair; the dots show the errors resulting from the non-matching pair. Fig. 6 shows that for 10 out of 12 errors, the absolute error is larger for the matching pair than for the non-matching pair.
Fig. 4. Outline of the 2D/3D matching method used.
3.4. Discussion

The expectation beforehand was that the remaining absolute errors between corresponding points would be significantly smaller for matching than for non-matching pairs of images. In the experiments using four match points, this was the case for the match points themselves: the resulting mean absolute error of the match points was consistently smaller for the matching pair, which indicates that this error might be usable for differentiating between matching and non-matching pairs, see Tables 5 and 6. However, for the other three corresponding points, in 10 out of 12 cases the resulting absolute errors are larger for the matching than for the non-matching pair, see Fig. 6.
Table 3
Derived camera position and orientation for the matching pair, for each investigator

Investigator | X | Y | Z | α | β | γ | mean|e| | max|e|
1 | 372.3 | 613.6 | 156.8 | 86.5 | 308.2 | 179.9 | 6.0 | 13.1
2 | 371.5 | 609.5 | 181.1 | 83.0 | 308.1 | 183.4 | 8.1 | 17.7
3 | 381.4 | 601.2 | 184.2 | 82.2 | 307.1 | 183.6 | 6.3 | 11.8
4 | 382.9 | 606.7 | 167.2 | 84.9 | 307.2 | 182.0 | 6.1 | 11.5

Calculations based on the image coordinates and 3D coordinates of seven match points. The camera position (X, Y, Z) is given in millimetres, the angles (α, β, γ) in degrees, the error (e) in pixels.
Table 4
Derived camera position and orientation for the non-matching pair, for each investigator

Investigator | X | Y | Z | α | β | γ | mean|e| | max|e|
1 | 338.0 | 703.9 | 185.6 | 83.4 | 316.2 | 173.2 | 10.2 | 17.1
2 | 339.8 | 696.7 | 208.1 | 80.7 | 315.9 | 175.9 | 11.0 | 19.0
3 | 353.8 | 688.4 | 210.0 | 80.2 | 314.7 | 175.9 | 9.0 | 15.0
4 | 352.0 | 696.3 | 195.2 | 82.1 | 315.0 | 175.0 | 7.5 | 12.5

Calculations based on the image coordinates and 3D coordinates of seven match points. The camera position (X, Y, Z) is given in millimetres, the angles (α, β, γ) in degrees, the error (e) in pixels.
This means that the position in the images of the three corresponding anatomical points not used for matching is, in this case, not usable as a basis for numerical comparison to differentiate between matching and non-matching pairs.

The experiments using seven match points show that the remaining mean absolute error of a matching pair can be larger than that of a non-matching pair: the remaining mean absolute error of operator 4 in Table 4 (non-matching pair) is smaller than the remaining error of operator 2 in Table 3 (matching pair). Again, this reflects poorly on the use of the positions of match points in the images to differentiate matching and non-matching pairs.

The particular choice of four match points results in a fit of the contour of the face, whereas the three remaining points characterize its inner region, so it might be argued that these are not expected to fit very well. While this is generally true, the fact remains that in the case of matching images one expects the faces to fit better than in the case of non-matching faces.
Table 5
Derived camera position and orientation for the matching pair, for each investigator

Investigator | X | Y | Z | α | β | γ | mean|e| | max|e|
1 | 376.7 | 593.7 | 157.5 | 86.2 | 306.6 | 180.2 | 5.5 | 8.7
2 | 364.5 | 598.4 | 174.2 | 83.8 | 307.7 | 183.0 | 6.8 | 11.8
3 | 392.2 | 566.0 | 191.3 | 80.4 | 304.2 | 185.1 | 3.6 | 6.3
4 | 396.0 | 573.2 | 170.7 | 83.9 | 304.2 | 182.9 | 3.6 | 6.2

Calculations based on the image coordinates and 3D coordinates of four match points. The camera position (X, Y, Z) is given in millimetres, the angles (α, β, γ) in degrees, the error (e) in pixels.
Table 6
Derived camera position and orientation for the non-matching pair, for each investigator

Investigator | X | Y | Z | α | β | γ | mean|e| | max|e|
1 | 303.2 | 730.4 | 166.7 | 85.8 | 319.8 | 171.4 | 10.5 | 16.4
2 | 291.9 | 728.7 | 188.8 | 83.4 | 320.5 | 174.0 | 11.0 | 16.3
3 | 337.5 | 701.6 | 196.1 | 82.1 | 316.3 | 174.2 | 8.9 | 14.5
4 | 336.0 | 711.4 | 180.1 | 84.1 | 316.7 | 173.4 | 8.4 | 12.4

Calculations based on the image coordinates and 3D coordinates of four match points. The camera position (X, Y, Z) is given in millimetres, the angles (α, β, γ) in degrees, the error (e) in pixels.
Fig. 5. 2D projections of the 3D laser scans using calculated camera position and orientation, for researcher 1. The 3D scans are viewed from about the same point as the face in part (A). Numerical results for the matching (A and C) and non-matching (A and B) case are given in Tables 3 and 4.
Fig. 6. The case of four match points: remaining absolute errors for the three remaining points, per researcher. Boxes indicate the errors that result from a matching image and 3D model; dots indicate errors that result from a non-matching image and 3D model. Point numbers 1 to 3 correspond to the pronasale, sellion and stomion (see Fig. 3).
Moreover, different combinations of the four match points were examined, showing similar results. The current combination was reported because it seems to give the best global fit of the faces, but the result does not depend on this choice.
4. Conclusions

The main question of the study may be expressed as: "Is it possible to conclude whether or not two faces belong to the same person on the basis of the distances between corresponding anatomical points in the resulting 2D projection of a 3D scan model and a questioned image?"
To answer this question, a method was developed to semi-automatically determine the camera position and orientation needed to project the 3D scan model onto a 2D image so that a number (either four or seven) of landmarks coincide optimally. The algorithm did not converge well when the focal length of the camera lens and the pixel aspect ratio were also left free, so we concentrated on determining the position and orientation of the camera. Visual inspection of the results confirms that in this case the projected images indeed depict the face in the same position and orientation as in the questioned image.

Yoshino et al. [8–10] performed similar experiments connecting 3D facial models to 2D images for facial comparison. On the basis of a manual (computer-assisted) superimposition of the 3D models on the images, the resulting positions of anatomical landmarks are compared. Their conclusion is that the anthropometric approach used is helpful in the identification process. The result of the present study does not support that conclusion. Looking at the remaining distances between landmarks for a matching and a non-matching pair of faces, these do not appear smaller in the matching case. As such, this example makes a poor case for anthropometric comparison of images.

The possible disturbing factors mentioned in the introduction cannot explain this poor result. Facial expressions were similar, focal length and aspect ratio were set to their true values, the 3D scans and the 2D image were taken with high-quality equipment, and different operators annotated the 2D image in order to study the stability of the results across operators. One might argue that one operator obtains better results than another because of greater experience, but in fact their results do not differ by more than might be expected from natural variation. It is informative to look at an illustration of the problem in Fig. 5B and C: in these two projected images it can be seen clearly that the two faces of our colleagues, which are not very similar, have quite similar locations for all landmarks (except the pronasale).

We note that the above is only a small study, and further research on larger sets of faces is necessary to obtain a clear picture of the possibilities for anthropometric comparisons.
Acknowledgements

The authors would like to thank Jurrien Bijhold for coming up with the idea for the study, and Derk Vrijdag for the anatomical annotation of the suspect image. Moreover, we thank the referees, whose useful comments improved the paper.
References

[1] P. Vanezis, D. Lu, J. Cockburn, A. Gonzales, G. McCombe, O. Trujillo, M. Vanezis, Morphological classification of facial features in adult Caucasian males based on an assessment of 50 subjects, J. Forensic Sci. 41 (5) (1996) 786–791.
[2] H. Borrman, J. Wasen, M. Taister, Variation in the observation and classification of different morphological facial characteristics, in: Abstracts of the 9th Scientific Meeting of the International Association for Craniofacial Identification, Washington, DC, 2000, Forensic Sci. Commun. 2 (www.fbi.gov).
[3] I.B. Alberink, A.C.C. Ruifrok, Inter-operator test for the clicking of polylines in earprints, J. Forensic Ident., submitted for publication.
[4] I.B. Alberink, A.C.C. Ruifrok, H. Kieckhoefer, Inter-operator test for anatomical annotation of earprints, J. Forensic Sci., submitted for publication.
[5] H. van den Heuvel, The positioning of persons or skulls for photo comparison using three point analysis and one-shot-3D photographs, in: K. Higgins (Ed.), Proceedings of SPIE, vol. 3576, SPIE, Bellingham, 1998, pp. 203–215.
[6] W. Gruen, E.P. Baltsavias, Geometrically constrained multiphoto matching, Photogramm. Eng. Remote Sens. 54 (5) (1998) 633–641.
[7] A. Ruifrok, M. Goos, B. Hoogeboom, D. Vrijdag, J. Bijhold, Facial image comparison using a 3D laser scanning system, Proc. SPIE 5108 (2003).
[8] M. Yoshino, H. Matsuda, S. Kubota, K. Imaizumi, S. Miyasaka, Computer-assisted facial image identification system using a 3-D physiognomic range finder, Forensic Sci. Int. 109 (2000) 225–237.
[9] M. Yoshino, Conventional and novel methods for facial-image identification, Forensic Sci. Rev. 16 (2) (2004).
[10] M. Yoshino, M. Taniguchi, K. Imaizumi, S. Miyasaka, T. Tanijiri, H. Yano, C. David, L. Thomas, J.G. Clement, A new retrieval system for a database of 3D facial images, Forensic Sci. Int. 148 (2005) 113–120.