Re-illuminating single images using Albedo estimation

Philip L. Worthington
Department of Computation, UMIST, Manchester M60 1QD, UK

Received 23 September 2003; accepted 5 November 2004

Pattern Recognition 38 (2005) 1261–1274
www.elsevier.com/locate/patcog

Abstract

Predicting the appearance of a scene under novel lighting conditions is of growing interest at the convergence of vision, graphics and virtual reality. In this paper, we develop a method for appearance prediction from a single image using the apparatus of shape from shading (SFS). We re-visit the reflectance estimation process first proposed by Blake (Graphics Image Process. 32 (1985) 314), and develop a novel approach to parameter selection within the Blake method based on the quality of images which can be produced by re-illuminating the recovered needle-map. Combining Blake's method with recent advances in SFS is demonstrated to yield significant improvements in the appearance prediction of real images under varying lighting conditions.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Shape from shading; Albedo estimation; Re-illumination; Appearance; Prediction

1. Introduction

Predicting the appearance of an object or scene under novel viewing conditions has become a major research goal at the interface between vision and graphics. A wide range of techniques have been proposed in both literatures, usually based on sets of images, geometric models or range images (e.g. Refs. [1–10]). In contrast, the traditional goal of shape from shading (SFS) has been shape estimation from a single image. Progress has been slow due to the under-constrained nature of the problem; however, recent research has suggested alternative applications, including generalizing appearance for face recognition [11,12] and view synthesis [13,14]. This paper seeks to extend this work by exploring the prediction of appearance under varying illumination from a single image. Possible applications of such techniques

∗ Tel.: +44 0 161 200 3301; fax: +44 0 161 200 3324.

E-mail address: [email protected] (P.L. Worthington).

include predicting appearance from a police mugshot for face recognition purposes, to reduce storage and matching demands in an object recognition context, or to model background objects in VR and graphics applications without the expense of building complete 3D models.

1.1. Motivation

Most existing work on SFS focuses on the recovery of accurate surface shape or normal estimates. Re-illumination, when considered at all, is treated as a consequence of having recovered plausible shape information, rather than as a useful goal in its own right. However, SFS solutions which appear accurate when illuminated under the original lighting typically suffer from severe artefacts when re-illuminated under novel conditions. These artefacts strongly disrupt the resulting images, leading to a loss of subjective photo-realism with a change of just a few degrees in lighting direction. By photo-realism, we mean that most observers would agree that the image could be a photograph of a real object or scene. Clearly, this is a very subjective

0031-3203/$30.00 © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2004.11.015



Fig. 1. The results of applying the Bichsel and Pentland (left pair of columns) and Worthington and Hancock (right) SFS algorithms to a face image. The top row shows the original image (middle) and the images produced by re-illuminating the output of each algorithm using the original light source direction. Subsequent rows show the results of successively more extreme lighting. We consider lighting from the right and from below in each pair of columns, at angles of 15◦ , 30◦ and 45◦ to the original lighting. Clearly, the Bichsel and Pentland approach does not ensure data closeness, but the oversmoothing of the image data does improve the sense of photorealism in the re-illuminated images. The Worthington and Hancock method ensures data closeness and captures the surface shape fairly well, but introduces artefacts which destroy the sense of photorealism.

assessment criterion which is strongly observer-, object- and context-dependent. Nonetheless, it can be readily understood as an important goal for graphics techniques, and is also likely to be desirable in applications such as predicting facial appearance under different lighting conditions for face recognition. Fig. 1 illustrates the poor re-illumination performance of some existing SFS techniques: the well-known Bichsel and Pentland algorithm [15] and the Worthington and Hancock method [16]. The images produced using several other well-known SFS algorithms [17–19] have also been examined, but produce significantly poorer re-illuminated images than these two examples. Throughout this paper, we assume the original image to be illuminated by a light source coincident with the viewing direction, since this simplifies the mathematics and is often a reasonable approximation in the case of flash photography of faces and objects. When re-illuminated under the original lighting conditions, the Bichsel and Pentland surface yields a reasonable approximation to the original image, although flattening of the recovered surface results in some brightening of the image. However, structural artefacts are introduced and it is clear from, for example, the behaviour of the nose when re-illuminated from above or below, that the surface does not truly capture the face shape. In this example, we only provide the height of the central point of the image. If initialized with the height of the tip of the nose, the performance would be improved, but this would require high-level image understanding to automate. Meanwhile, the Worthington and Hancock approach [16] uses the image irradiance equation as a hard constraint on needle-map recovery, so the original image can always be perfectly recovered. However, re-illumination under novel lighting introduces severe artefacts (Fig. 1). These are widely distributed, and manifest themselves as unexpectedly bright or dark regions. The main obstacles to creating photo-realistic images using existing SFS methods appear, from Fig. 1, to result from either oversmoothing the surface, or from local errors in the recovered needle-map or surface. In the former case, as illustrated by the Bichsel and Pentland results, methods


walk away from the image data due to model dominance. In the latter, strict adherence to the intensity data forces the choice of normal directions which, when re-illuminated by light sources significantly different from the original, result in artefacts that disrupt photo-realism.

1.2. Sources of errors

We concentrate on the Worthington and Hancock SFS framework reported in Ref. [16], since this approach ensures data closeness and is a flexible platform for developing novel SFS constraints. Within the framework, the Lambertian image irradiance equation is treated as a hard constraint. This ensures that the original image can be perfectly recovered from the needle-map. Under the constant albedo assumption adopted in most SFS research, the hard constraint reduces the SFS problem to selecting the direction of the normal on the cone using some form of neighbourhood consistency constraint. In other words, we fix the polar angle, θ, and aim to find an azimuthal angle, φ, using an additional neighbourhood consistency constraint. We initialize the azimuthal angle using the negative gradient direction. This is chosen based on the assumption that bright regions are closer to the viewer. There is some evidence that the human visual system makes a similar assumption [20]. This gives us

θ_{i,j} = cos⁻¹ E_{i,j},   φ_{i,j} = tan⁻¹( (−∂E/∂y)_{i,j} / (−∂E/∂x)_{i,j} ),    (1)

at every pixel location (i, j) in the image E. We then use some form of consistency process to update φ^{k+1}_{i,j} on the
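The initialization of Eq. (1) can be sketched directly in code. The following is a minimal sketch, assuming a normalized image and central-difference gradients (the paper later uses Ando masks instead); the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def init_needle_map(E):
    """Initialize polar and azimuthal angles from a normalized image E in [0, 1].

    The polar angle theta follows from the hard Lambertian constraint
    E = cos(theta) when the light source lies along the viewing direction;
    the azimuthal angle phi is set to the negative image-gradient direction,
    reflecting the assumption that bright regions are closer to the viewer.
    """
    # Hard irradiance constraint: E = n . s = cos(theta) with s = viewer direction.
    theta = np.arccos(np.clip(E, 0.0, 1.0))
    # Image gradients via central differences (an assumption; the paper uses
    # 5x5 Ando masks in Section 5).
    dEdy, dEdx = np.gradient(E)
    # Azimuth of the negative gradient direction, as in Eq. (1).
    phi = np.arctan2(-dEdy, -dEdx)
    return theta, phi
```

Each pixel's normal is then n = (sin θ cos φ, sin θ sin φ, cos θ), and re-rendering with the original frontal light source reproduces E exactly.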

basis of φ^k_{i,j} and its neighbours. Constraints such as needle-map smoothness and curvature consistency were applied in the original development of the hard constraint framework [16], whilst in recent papers we have considered minimization of image smoothness under re-illumination [21], directional averaging and multiscale gradient consistency [22]. It is worth noting that commonly applied integrability constraints (e.g. Ref. [23]) do not work well in conjunction with the hard constraint framework. If we assume that whichever consistency process we apply produces reasonable estimates of the azimuthal angles, we are left with errors in the estimation of the polar angles as the main source of re-illumination artefacts. This is not surprising, since clearly the constant-albedo Lambertian model is a grossly inaccurate approximation in most cases. Regions which appear dark due to low albedo or shadows are treated the same as regions which are dark due to possessing extreme slant relative to the light source direction. In terms of dealing with re-illumination artefacts within the hard constraint framework, we consider the distinction between shadow and albedo changes to be relatively unimportant. What matters is mitigating the disruptive effects of low-intensity regions that are not due to strong surface slant. Another way to view this is to relax the hard constraint where there is no justification for a large polar angle.


A large polar angle leads to a dark patch in the original image appearing extremely bright under some re-illumination conditions. This will always be incorrect for a low albedo region, and often incorrect for a shadow region. Clearly, if a shadow in the original image is due to self-shadowing, there may be a range of light sources which might illuminate it. However, given only a single image, we have no information on which to base a surface normal estimate, so it appears better to incorrectly assume that the region has a low albedo rather than allow an unexpectedly bright patch to appear and disrupt the photo-realism.

1.3. Albedo estimation

Clearly, from the above discussion we need some way to distinguish between dark patches due to strong surface slants, and those due to reflectance and shadowing effects. For conciseness, we describe this process as albedo estimation, although similar processes have been described variously as lightness estimation [24,1] and intrinsic image recovery [25,26]. In fact, Blake [1] defines lightness as the "psychophysical correlate of surface reflectance," whilst albedo is just one aspect of the physical surface reflectance, but for the purposes of this paper we ignore these subtleties. Albedo estimation from a single image is an under-constrained problem. However, reasonable results have been achieved using various methods, essentially based on the assumption that reflectance changes lead to much stronger image gradients than shading effects [24,1,27,22]. In this paper, we revisit Blake's early approach to estimating albedo [1] and show that its performance in conjunction with the hard constraint framework provides a means to predict appearance from a single image, with significantly better results than existing SFS methods. Moreover, we demonstrate that a simple measure of re-illuminated image smoothness can provide a basis for automatically selecting the value of the threshold parameter in the Blake method.
The remainder of this paper is structured as follows. Section 2 briefly surveys the literature concerning SFS, view synthesis techniques and albedo estimation from a single image. Section 3 reviews the Blake method [1] for estimating albedo from a single image, whilst Section 4 describes the use of re-illuminated image smoothness to select the Blake threshold parameter. Section 5 combines the albedo estimation process with a method for updating the azimuthal angle to produce a complete SFS algorithm. Finally, we present extensive experimental results and a discussion of avenues for future research.

2. Background

A wide range of techniques have been proposed for predicting the appearance of an object or scene under novel viewing conditions in recent years. In the vision literature these include multiple-viewpoint techniques such as



image-based rendering [10,9] and structure from motion [6–8], and fixed-viewpoint, multiple-lighting techniques such as photometric stereo and illumination cones [28,29]. Structure from motion techniques in particular have progressed rapidly over the last decade, especially in their application to planar objects such as the built environment [6–8]. Recent work has begun to address applying such methods to smooth objects which lack the strong surface features typically used to find correspondence between views [30]. Meanwhile, in the graphics literature, several approaches have been developed for realistically re-rendering a scene, based upon using geometric models together with one or more images to estimate the BRDF of the scene [2–5]. Smooth objects are natural candidates for the application of shape from shading and related techniques such as photometric stereo. However, despite the attention received by SFS since Horn first formulated the problem [31], progress has been slow. The image irradiance equation provides insufficient constraint to unambiguously recover shape, so a great deal of effort has been applied to developing smoothness constraints to augment the image irradiance information [17,32]. A problem with this approach is that the smoothness term often leads to model dominance, with the consequence that the recovered shape often lacks detail, and it is not possible to accurately reconstruct the original image. A survey of several popular SFS techniques [32] illustrates these problems. Several recent papers have reported the use of shading information in a view synthesis context. Georghiades et al. [28,29] have developed illumination cone techniques which implicitly recover reflectance information from a small number of images captured under controlled lighting.
Chantler and Dong [33] have applied photometric stereo to images of 3-D textures, and use a combination of needle-map and 3-D reconstruction to predict the appearance of the texture under novel lighting conditions. In contrast to these multiview techniques, Vetter and Poggio [34,35] use a single face image to generate novel views under variable pose, although they utilize prior knowledge of the class of face images in the generation of virtual views via warping. Similarly, Atick, Griffin and Redlich [36] use a large set of range images of the human head, in conjunction with SFS, for face recognition. Zhao and Chellappa [11,12] have used SFS to synthesize views of human faces from single images in a face recognition context. Their method iteratively estimates a piecewise constant albedo map from surface normal estimates, but depends on finding a good initial partition of the image into different albedo regions. Segmentation is a notoriously difficult problem and typically requires careful parameter selection, and they also augment the process with a 3-D model of a prototype head to improve the accuracy of the images produced, thus limiting the general applicability of the method. Other researchers have attempted to separate reflectance and shading effects in general images. Horn [24] and Blake [1] presented early approaches based on Retinex theory

[37]. Both methods assume that the image is approximately Mondrian in form—composed of patches of uniform reflectance—and essentially threshold the image gradients. The reflectance estimates are then recovered from the remaining gradients. Blake and Brelstaff [38] showed that valuable results can be obtained by applying this method to real images, whilst Forsyth and Ponce [39] have recently highlighted a simpler formulation of Blake’s approach and Kimmel et al. [40] reported a variational approach to retinex calculations. In a similar vein, Barrow and Tenenbaum [25] proposed the separation of an image into pure reflectance and shading intrinsic images. The recovery of intrinsic images is under-constrained, since infinite combinations of illuminated surface and albedo map could create a particular image. Weiss addresses this problem by using sequences of images [26], while Tappen et al. [27] attempt to learn the difference between patterns due to shading and albedo changes from examples and subsequently recover intrinsic images from single, unseen images. Their approach may be viewed as similar to Blake’s method, but applying machine learning to find an optimal threshold parameter, and followed by a consistency process. Although the results are impressive, the system appears to require re-training to deal with different light source directions. In a recent paper [22], we proposed a heuristic approach to relax the hard constraint in regions where there was little confidence that a large polar angle is justified. This method is again based on image gradient strength, and uses a fuzzy inference-based consistency process to update initial estimates.

3. Albedo estimation using Blake's method

The first stage of our scheme for predicting appearance under novel illumination requires a method for estimating albedo. We retain the Lambertian assumption for simplicity, and seek estimates for the albedo, ρ_{i,j}, at each pixel position (i, j), such that E_{i,j} = ρ_{i,j} n_{i,j}·s. Clearly, if E is normalized to the range [0, 1], ρ_{i,j} is constrained to lie in the range [E_{i,j}, 1]. Blake's method [1] is based on the assumption that albedo changes produce stronger gradients than shading effects. This assumption does not hold for smoothly varying albedos, or rapid changes due to shadow boundaries. Fortunately, the former occur relatively rarely in nature, whilst we noted in Section 1.1 that misclassification of shadow regions as low albedo patches can be viewed as desirable from the point of view of re-illumination. We adopt the simplified formulation of Blake's approach described in Ref. [39] and begin by considering the image intensity at each pixel as the product of the albedo with an intrinsic shading image, E′:

E_{i,j} = ρ_{i,j} E′_{i,j}.    (2)


Fig. 2. Top row: Estimated albedo maps generated using Blake’s lightness estimation method and using thresholds that reject, from left to right, 30%, 50%, 70% and 90% of the gradients in the image. Middle row: intrinsic shading images. Bottom row: results of re-illuminating needle-maps generated using Blake’s method to relax the hard irradiance constraint, and 30 iterations of our fuzzy inference-based approach to estimating azimuthal angles. The re-illumination light source is 45◦ to the right of the viewing direction. Although the differences are quite subtle, it is clear that artefacts reduce in prominence as we move left to right, but so does the sense of face structure and depth.

It is convenient to work with the log image:

e_{i,j} = log E_{i,j} = log ρ_{i,j} + log E′_{i,j} = r_{i,j} + e′_{i,j}.    (3)

Taking derivatives of the log image, we expect to obtain a derivative map dominated by sharp peaks where the albedo changes, with only a small contribution from the smoothly varying intrinsic shading image. If we threshold to remove the latter contribution, we should be left with only the albedo changes. In Blake's original paper [1], albedo estimates are then recovered by integrating the thresholded derivative map via an iterative approach. However, as Forsyth and Ponce [39] note, the method can be re-cast as a minimization problem in which we seek the log albedo map, r, minimizing

F1 = | ∂r/∂x − T_τ(∂e/∂x) |² + | ∂r/∂y − T_τ(∂e/∂y) |²,    (4)

where T_τ(x) is a threshold function with cutoff τ. This leads to a constrained minimization problem with each log albedo value, r_{i,j}, restricted to the range [e_{i,j}, 0]. With our estimate of r_{i,j} to hand, we can simply take the exponential to recover the albedo estimate, ρ_{i,j}, at each pixel. Blake's process works surprisingly well in practice, even on real images. With good selection of the threshold parameter, τ, it successfully discards most shading gradients and retains mainly reflectance edges. Of course, the two cases will inevitably be confused in places due to unexpectedly
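The simplified Blake scheme of Eq. (4) can be sketched as a thresholded-gradient Poisson problem. In the sketch below, the solver (projected Jacobi iterations), the iteration count and the border handling are our assumptions, not the paper's implementation; only the thresholding, the least-squares objective and the [log E, 0] clamp come from the text.

```python
import numpy as np

def blake_albedo(E, tau, n_iter=3000):
    """Sketch of the simplified Blake lightness scheme of Eq. (4).

    Gradients of the log image with magnitude below tau are zeroed
    (attributed to shading); the log-albedo map r is then the least-squares
    integral of the surviving gradients, obtained here by Jacobi iterations
    on the associated Poisson equation, with r clamped to [log E, 0] so that
    the albedo stays in [E, 1].
    """
    e = np.log(np.clip(E, 1e-6, 1.0))           # log image
    ey, ex = np.gradient(e)                      # log-image gradients
    # Threshold function T_tau: keep only strong (reflectance) gradients.
    gx = np.where(np.abs(ex) > tau, ex, 0.0)
    gy = np.where(np.abs(ey) > tau, ey, 0.0)
    # Divergence of the thresholded field = Poisson right-hand side.
    div = np.gradient(gx, axis=1) + np.gradient(gy, axis=0)
    r = np.zeros_like(e)
    for _ in range(n_iter):
        # Jacobi update for laplacian(r) = div, with replicated borders.
        rp = np.pad(r, 1, mode="edge")
        r = 0.25 * (rp[:-2, 1:-1] + rp[2:, 1:-1]
                    + rp[1:-1, :-2] + rp[1:-1, 2:] - div)
        r = np.clip(r, e, 0.0)                   # rho in [E, 1] <=> r in [log E, 0]
    return np.exp(r)                             # albedo estimate rho
```

On a Mondrian-like image (flat patches of differing brightness) this recovers a roughly piecewise-constant albedo map whose patch ratios match the intensity ratios, with the intrinsic shading image obtained as E / rho.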

large shading gradients or small reflectance changes. In general, however, plausible intrinsic images can be obtained. Fig. 2 illustrates the intrinsic images produced from a real-world image using different values of τ.

4. Threshold selection using re-illumination

Given that Blake's method is capable of generating good albedo estimates, there remains the outstanding problem of how to select the gradient threshold, τ. Clearly, selecting too high a threshold will result in reflectance edges being discarded when neighbouring regions have similar albedos, resulting in an insufficiently detailed albedo map. Conversely, too low a threshold will result in many gradients due to shading being classified as reflectance edges, so details which should be included in the intrinsic shading image will be transferred to the albedo map, resulting in a near-uniform needle-map corresponding to a flattened surface. Fig. 2 illustrates these arguments. Unfortunately, through experimentation, it becomes clear that there is no threshold value that is applicable to all images. Of course, since we aim to find a piece-wise constant albedo map, we could use some measure of constancy as the basis for determining a threshold. This appears to be the approach adopted by Zhao and Chellappa [11,12]. Unfortunately, improving constancy is not a well-defined goal. Any functional based on it appears certain to improve



Fig. 3. Plots of re-illuminated image smoothness versus gradient threshold for the selection of images used in this paper. From top left to bottom right, we have three face images collected by the author (the first two plots correspond to the same image at different scales), followed by the famous test images Lenna and Peppers, and two images from the COIL database, Toy Duck and Lucky Cat. For most of the images we see that the plots have a distinctive structure with fairly clear minima, which allow us to select a threshold which is optimal from the point of view of generating smooth re-illuminated images. The COIL images appear to behave differently because the main sources of strong gradients are the boundary with the background and with low albedo regions. It is not possible to set a global threshold which distinguishes between these cases, so if a high threshold is set problems with distinguishing between object and dark background occur.

monotonically as the gradient threshold is increased, to the limiting case where all gradients are discarded and the minimization yields a uniform albedo map. Adopting constancy therefore simply shifts our search from a threshold on the gradients to a threshold on the constancy measure, or on some form of segmentation process. Instead, we suggest a novel solution motivated by our overall goal of producing photo-realistic re-illuminated images using SFS. In the absence of a functional to describe photo-realism, we use a simple image smoothness measure as a crude approximation:

F2 = Σ_{s∈S} Σ_{(i,j)∈E(s)} Σ_{(I,J)∈N} (E_{i,j}(s) − E_{I,J}(s))²,    (5)
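The smoothness functional F2 of Eq. (5) can be sketched as follows. The 4-connected neighbourhood and the function name are our assumptions; the paper specifies only a local neighbourhood N and a coarse sampling S of light directions.

```python
import numpy as np

def reillumination_smoothness(normals, albedo, light_dirs):
    """Sketch of the smoothness functional F2 of Eq. (5).

    For each sampled light direction s, the needle-map is re-illuminated
    under the Lambertian model E(s) = albedo * max(0, n . s), and squared
    intensity differences to the 4-connected neighbours are accumulated
    over all pixels and all sampled directions.
    """
    total = 0.0
    for s in light_dirs:
        s = np.asarray(s, dtype=float)
        s = s / np.linalg.norm(s)
        # Lambertian re-illumination; normals has shape (H, W, 3).
        E = albedo * np.maximum(normals @ s, 0.0)
        # Squared differences to right and lower neighbours (each pair once).
        total += np.sum((E[:, 1:] - E[:, :-1]) ** 2)
        total += np.sum((E[1:, :] - E[:-1, :]) ** 2)
    return total
```

A perfectly flat, uniform-albedo needle-map gives F2 = 0 under any lighting; local inconsistencies in the normals show up as intensity discontinuities and increase F2.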

where we sum over a coarse sampling of possible light source directions, S, and compare the re-illuminated intensities over local neighbourhoods, N, of the current pixel, (i, j). E(s) denotes the image generated by re-illuminating the needle-map using a point light source in direction s. The use of a smoothness measure appears reasonable if we reconsider the performance of existing SFS schemes in Fig. 1. Producing the smoothest possible images under re-illumination should improve photo-realism. Fig. 3 illustrates the usefulness of F2 as a basis for threshold selection. It shows plots of F2 versus the proportion of gradients allowed past the threshold for each of the real images considered in this paper. With the exception of the COIL images [41], which feature extensive dark background


regions that confuse the threshold selection, the plots appear generally well-behaved, and display distinct minima which can be used to automatically set threshold values. Most importantly, F2 is, empirically, not trivially minimized by the extremal cases of ρ_{i,j} = E_{i,j} or ρ_{i,j} = 1 for all (i, j). There are simple explanations for why this might be the case. In the first instance, we have θ_{i,j} = 0 for all (i, j), so the re-illuminated images are identical to the original apart from a scaling of all intensities by cos θ_s, where θ_s is the polar angle of the light source direction. However, this will result in a high value of F2 relative to other choices of albedo map, since there is no possibility of any part of the image being a black, featureless shadow unless it was in shadow in the original. Note that, since we do not recover the surface from the needle-map and perform ray tracing, our method does not model instances of self-shadowing. As we move away from the extreme case of ρ_{i,j} = E_{i,j}, the polar angles will increase. When the needle-map is re-illuminated, any normals at angles greater than π/2 to the light source will be black. Assuming that our method for estimating azimuthal angles results in a locally consistent needle-map, shadow regions will occur, and since these regions are featureless, they do not contribute to F2 except at their boundaries, so F2 should reduce. However, at some point our choice of threshold will start discarding genuine reflectance edges, leading to a loss of distinction between regions which should have different albedo values, and hence to normals with excessively large polar angles which will typically produce strong artefacts under some re-illumination conditions. In the extreme case, we will return to the hard-constraint results illustrated in Fig. 1, which appear unlikely to correspond to low values of F2. It is possible that a more sophisticated measure of photo-realism than image smoothness can be designed.
Moreover, the above argument is ad hoc in nature, whereas it may be possible to identify some form of maximum likelihood model to relate re-illuminated image quality to gradient threshold. However, these issues are left for future research. For the time being we note that, empirically, the smoothness of re-illuminated images varies with the gradient threshold and, for many images, provides a basis for selecting a threshold which is in some sense optimal.
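As a concrete sketch of the Section 4 selection procedure, candidate cutoffs can be taken as percentiles of the log-image gradient magnitudes (matching the paper's "proportion of gradients rejected" parameterization) and scored by F2. The helper below is hypothetical glue code: `f2_for_threshold` stands for whatever pipeline maps a cutoff to a re-illuminated smoothness score, and is not a function from the paper.

```python
import numpy as np

def select_threshold(E, percentiles, f2_for_threshold):
    """Percentile sweep for the Blake cutoff tau (Section 4 idea, sketched).

    Candidate cutoffs are percentiles of the log-image gradient magnitudes;
    f2_for_threshold(tau) is any callable returning the re-illuminated
    smoothness F2 for that cutoff. Returns the minimizing cutoff and its
    percentile.
    """
    e = np.log(np.clip(E, 1e-6, 1.0))
    gy, gx = np.gradient(e)
    mags = np.hypot(gx, gy).ravel()
    taus = [np.percentile(mags, p) for p in percentiles]
    scores = [f2_for_threshold(t) for t in taus]
    best = int(np.argmin(scores))
    return taus[best], percentiles[best]
```

In the paper's experiments the sweep is coarse (Fig. 3 samples the 0–100% rejection range), which keeps the cost to one albedo estimate and one batch of re-illuminations per candidate.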

5. Estimating azimuthal angles

A suitable method for estimating the azimuthal angle of each normal, φ_{i,j}, is required. For simplicity, we assume that φ_{i,j} is independent of the polar angles, so it does not matter whether we find φ_{i,j} first and then estimate the albedo, or estimate φ_{i,j} directly from the intrinsic shading image. In practice, since the albedo estimation process is not perfect, finding φ_{i,j} from the intrinsic shading image tends to introduce a small degree of additional noise. We initialize each φ_{i,j} so that the normal points in the negative gradient direction. This can be justified by considering the case of a Lambertian sphere. Assuming convexity


as a further implicit constraint, the negative gradients will radiate away from the brightest point on the sphere, producing a near-ideal needle-map (apart from numerical effects due to the use of finite difference approximations) in this simple case. There are a variety of gradient estimation methods which can be applied, ranging from simple Roberts or Sobel operators to advanced multiscale approaches such as Ref. [42]. Here, we adopt the recent mask-based method due to Ando [43], which appears to provide significant advantages over Sobel masks. Given an initial needle-map, we seek an update procedure for adjusting φ_{i,j} to improve local consistency and hence produce smoother, more realistic re-illuminated images. In the SFS literature, popular examples include smoothness (e.g. Ref. [17]) and integrability (e.g. Ref. [23]). Such constraints can be used in conjunction with the hard constraint framework, but integrability in particular produces poor results. Other constraints include curvature and gradient consistency [16]. Here, we adopt a fuzzy inference approach. Having estimated the gradients using 5 × 5 Ando masks [43], we create a membership function for the orientation in the form of a von Mises distribution centred on the negative gradient direction [44]. The initial membership function is given by

p(φ | θ_g, k) = (1 / (2π I_0(k))) exp(k cos(φ − θ_g)),    (6)

where I_0(k) = (1/2π) ∫_0^{2π} exp(k cos θ) dθ is the zeroth-order modified Bessel function of the first kind, and θ_g is the angle associated with the initial gradient estimate, g. To reflect the heuristic that we have greater confidence in the initial estimate if the gradient magnitude is large, the parameter k is chosen to be proportional to |g|. For strong gradients this produces a membership function with a narrower peak which is less likely to be influenced by neighbouring distributions. We subsequently encourage orientation consistency by multiplying neighbouring membership functions together until the needle-map settles to a steady state. If neighbouring gradients point in similar directions, the peaks will reinforce and produce an averaging effect amongst the neighbours, whilst neighbouring gradients in significantly different directions have little effect on one another. Fifty iterations were found to be more than sufficient for the azimuthal angles to settle to a steady state for all the images tested. The approach offers a simple adaptive smoothing solution which appears to work well in practice.
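The multiply-and-settle process can be sketched with discretized membership functions. The bin count, the proportionality constant for k, and the 4-connected neighbourhood are our assumptions; the paper specifies only the von Mises initialization of Eq. (6) and the multiplication of neighbouring memberships.

```python
import numpy as np

def fuzzy_azimuth_update(phi0, grad_mag, n_iter=50, n_bins=64, k_scale=4.0):
    """Sketch of the von Mises fuzzy-inference consistency process (Eq. (6)).

    Each pixel holds a discretized membership function over azimuth,
    initialized as a von Mises density centred on the negative-gradient
    direction phi0 with concentration k proportional to the gradient
    magnitude. Neighbouring memberships are multiplied (in log space, for
    numerical stability) and renormalized; the final azimuth estimate is
    the circular mean of each membership function.
    """
    angles = np.linspace(-np.pi, np.pi, n_bins, endpoint=False)
    k = k_scale * grad_mag
    # Initial von Mises membership; the 1/(2 pi I_0(k)) normalizer cancels
    # under the per-pixel renormalization below.
    m = np.exp(k[..., None] * np.cos(angles - phi0[..., None]))
    m /= m.sum(axis=-1, keepdims=True)
    for _ in range(n_iter):
        lm = np.log(m + 1e-12)
        p = np.pad(lm, ((1, 1), (1, 1), (0, 0)), mode="edge")
        # Product with the four neighbours' memberships.
        lm = lm + p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        lm -= lm.max(axis=-1, keepdims=True)
        m = np.exp(lm)
        m /= m.sum(axis=-1, keepdims=True)
    # Circular mean of each final membership function.
    return np.arctan2((m * np.sin(angles)).sum(-1), (m * np.cos(angles)).sum(-1))
```

Repeated multiplication sharpens mutually consistent memberships towards a common peak, while a pixel whose neighbours disagree strongly with it contributes little mass near their peaks and is effectively overruled.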

6. Experiments

Fig. 4 provides a quantitative assessment of re-illumination results compared against ground truth for a simple synthetic image featuring albedo changes. We use the normalized correlation between the ground truth and

Fig. 4. A simple synthetic image featuring variable albedo patches. Top row, left to right: original image illuminated from the viewing direction, ground truth image illuminated with light source direction (−1, 0, 1), re-illuminated Bichsel and Pentland output. Second row: re-illuminated outputs from the Worthington and Hancock algorithm (left), the heuristic albedo estimation method reported in Ref. [22] (middle) and using Blake albedo estimation (right). Third row: plots of (1 − normalized correlation) comparing ground truth against re-illuminated images produced using Bichsel and Pentland (left) and Worthington and Hancock (right). Bottom row: correlation plots for the heuristic albedo estimation method reported in Ref. [22] (left) and Blake albedo estimation using a threshold of 85% and fuzzy estimation of the azimuthal angles (right). The plots cover lighting directions in the range 0 ≤ θ_s ≤ π/2, −π ≤ φ_s ≤ π.

re-illuminated images, given by

$$C(E(\mathbf{s}), E^t(\mathbf{s})) = 1 - \frac{\sum_{(i,j)\in E} (E_{i,j}(\mathbf{s}) - \bar{E}(\mathbf{s}))(E^t_{i,j}(\mathbf{s}) - \bar{E}^t(\mathbf{s}))}{\sqrt{\sum_{(i,j)\in E} (E_{i,j}(\mathbf{s}) - \bar{E}(\mathbf{s}))^2 \sum_{(i,j)\in E} (E^t_{i,j}(\mathbf{s}) - \bar{E}^t(\mathbf{s}))^2}}, \qquad (7)$$

where $E(\mathbf{s})$ is the image resulting from re-illumination by a point light source in direction $\mathbf{s}$, $E^t(\mathbf{s})$ is the corresponding

synthetic ground truth image, and $\bar{E}$ and $\bar{E}^t$ denote the mean image intensities. Fig. 4 includes sample re-illuminated images produced using the Bichsel and Pentland and Worthington and Hancock algorithms, as well as the heuristic albedo estimation approach proposed in Ref. [22]. Although the image produced using the method described in this paper contains some artefacts, particularly at the borders of the low albedo regions, it is significantly more realistic than the solution produced by the Bichsel and Pentland algorithm. The


normalized correlation plots bear this out. Over the range of lighting directions covering a hemisphere centred on the viewing direction, the maximum value of $C(E, E^t)$ for the Bichsel and Pentland algorithm is 0.6833, and the mean is 0.2898. The Worthington and Hancock algorithm using curvature consistency but no albedo estimation produces a maximum of 0.3060 and a mean of 0.0728. This is a very good result given the use of the hard constraint, and it is clear from Fig. 4 that the algorithm performs extremely well on this simple test image. In contrast, the method reported in Ref. [22] using albedo estimation with fuzzy inference gives a maximum value of 0.3063 and a mean of 0.0839, but tends to produce much more photo-realistic results than the Worthington and Hancock algorithm on real images. Using Blake’s method together with fuzzy inference on the azimuthal angles provides a significant improvement in the maximum error compared to the other schemes, yielding a maximum correlation error of 0.2128 and a mean of 0.0780. This is produced using a threshold that rejects 85% of gradients, which minimizes the smoothness measure over a coarse sampling of re-illuminated images. However, the threshold that performs best when measured against ground truth rejects 75% of gradients, giving a maximum of 0.1870 and a mean of 0.0759. It is perhaps surprising that the mean error is worse than for the Worthington and Hancock method, although this may be attributable to the problems, mentioned previously in the context of COIL images, caused by the boundary with the dark background.
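The error measure of Eq. (7), together with the Lambertian re-illumination it is applied to, can be sketched in a few lines of NumPy. The function names and the needle-map representation (an H × W × 3 array of unit normals plus an albedo map) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def reilluminate(normals, albedo, s):
    # Lambertian re-illumination of a needle-map: E = albedo * max(0, n.s),
    # with `normals` an H x W x 3 array of unit surface normals and `s`
    # a unit light-source direction vector.
    shading = np.clip(np.tensordot(normals, s, axes=([2], [0])), 0.0, None)
    return albedo * shading

def correlation_error(E, Et):
    # Eq. (7): one minus the normalized correlation between the
    # re-illuminated image E and the ground truth Et (0 = perfect match).
    dE, dEt = E - E.mean(), Et - Et.mean()
    return 1.0 - (dE * dEt).sum() / np.sqrt((dE ** 2).sum() * (dEt ** 2).sum())
```

Because the measure is mean-subtracted and scale-normalized, it is insensitive to global brightness and contrast, which is why it suits comparisons across different light-source directions.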


6.1. Real images

As noted in Section 1.1, our fundamental goal is to produce photo-realistic images. Figs. 5–8 illustrate that the method presented here represents a significant step in that direction. Compared to the results of the Bichsel and Pentland and Worthington and Hancock methods presented in Fig. 1, the re-illuminated images are significantly smoother and more realistic, and show a marked reduction in the extent and severity of the artefacts introduced. In Figs. 5 and 6, very realistic images are generated close to the original lighting direction. Under more extreme lighting conditions, the sense of photo-realism begins to fail due to artefacts. Some of these may be attributable to the fact that we make no attempt to capture self-shadowing behaviour, which would require surface recovery and ray tracing. However, inaccuracies in the estimation of the albedo and the azimuthal angles remain the main sources of error. Strong artefacts appear where neighbouring patches of normals adopt opposing azimuthal directions, resulting in unexpected gradients. The overall effect makes the faces appear almost faceted, with discontinuities introduced where we expect smooth behaviour. The method also struggles, to some degree, with strongly textured regions such as hair, due to the gradients confusing the albedo estimation process. Nonetheless, the results are encouraging and represent a significant improvement in the photo-realism achieved by existing SFS techniques.

The results on Lenna and Peppers (Fig. 7), and on images from the COIL database [41] (Fig. 8), are intended to illustrate the generality of the technique. Lenna contains a face region which produces fairly good results under re-illumination, but we note that the detailed background is also well modelled by our approach. The Peppers image proves more problematic due to the highly non-Lambertian nature of the surfaces, although the long vertical pepper to the left of the centre is captured fairly well, and the behaviour under small lighting changes is reasonably plausible. The two COIL images in Fig. 8 might be expected, at the outset, to yield good results, given the smooth nature of the objects, the limited extent of the albedo changes, and the carefully controlled lighting conditions used to capture them. There remain some problems with the selection of azimuthal angles: there is a tendency to introduce concave regions around the borders of low albedo regions, because the gradients point out of those regions. For this reason it would be useful to calculate the gradient directions from the intrinsic shading images, but the albedo estimates are not sufficiently accurate to allow this without introducing noise into the azimuthal angle estimates. Our fuzzy inference process fails to rotate the initial normals by the 180° required in such cases to produce a more globally pleasing result. Nonetheless, the re-illuminated images appear extremely realistic even for fairly extreme re-illumination conditions.

6.2. Efficiency

Overall, the SFS process is extremely computationally expensive, since we need to re-estimate the albedo map by minimizing Eq. (4) for a number of sample threshold values during the parameter search. At present, we have only implemented the method in Matlab, and it takes several hours to perform the parameter search on images of the order of 128 × 128 pixels on a 1 GHz Pentium III. Clearly, a C implementation would be significantly faster, but is still likely to take tens of minutes. Of course, we could skip the parameter search and simply use a fixed threshold for all images; our experiments suggest that a threshold rejecting 70% of the gradients works reasonably well in practice, although parameter searching produces significantly improved results. It is also likely that optimizations can be identified. However, the technique is clearly not suitable for real-time application in its current form, but goes some way towards demonstrating that, in principle, generation of photo-realistic images from a single view is possible using the apparatus of SFS together with Blake’s albedo estimation method.
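The parameter search described above can be sketched as follows. The roughness measure and the `render(t, s)` interface are assumptions standing in for the full albedo-estimation and re-illumination pipeline, which is the expensive part in practice:

```python
import numpy as np

def roughness(img):
    # Mean gradient magnitude of a re-illuminated image: smoother
    # renderings score lower. The exact smoothness measure is an
    # assumption; the paper only requires some such measure.
    gy, gx = np.gradient(img)
    return np.mean(np.hypot(gx, gy))

def select_threshold(render, thresholds, light_dirs):
    # Coarse search over candidate gradient-rejection thresholds.
    # `render(t, s)` is assumed to run albedo estimation with
    # threshold t and re-illuminate from light direction s.
    scores = {t: np.mean([roughness(render(t, s)) for s in light_dirs])
              for t in thresholds}
    return min(scores, key=scores.get)
```

Since each call to `render` re-runs albedo estimation, the cost scales with the number of candidate thresholds times the number of sample light directions, which is why a coarse sampling is used.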

7. Conclusions and further work

In this paper, we have presented research which brings together Blake’s albedo estimation algorithm [1] with


Fig. 5. Re-illuminated faces using Blake’s method. The right-hand pair of columns use an image 50% of the size of the left-hand pair. The left column of each pair corresponds to light sources from 45◦ to the left of the viewing direction to 45◦ to the right in 15◦ increments, whilst the right-hand column shows results for the same angles above and below.


Fig. 6. Two further face images re-illuminated after Blake lightness estimation followed by fuzzy inference-based azimuthal angle estimation. Performance near the viewing direction is very good, whilst the method begins to break down for more extreme lighting.

recent advances in SFS [16,22]. The result is a technique capable of analysing a single grey-scale image and creating novel, realistic images under varying illumination conditions. Moreover, we have proposed the use of a smoothness measure, calculated from the re-illuminated images, as a basis for identifying an appropriate threshold value for the Blake method. This allows the development of a fully automatic, generally applicable technique for appearance

prediction. The technique has been demonstrated to produce good results on a range of real images and represents a significant improvement over existing SFS schemes. The technique requires further development to improve its performance and applicability. A particular issue is to further address the problem of strong gradients arising from neighbouring patches of surface normals having opposing azimuthal angles. These are especially disruptive to the sense


Fig. 7. Re-illumination results for the famous Lenna and Peppers test images. These are complex images and the original lighting conditions are unknown (we assume a light source in the viewing direction for simplicity, though a light source estimation technique could be applied), but the results on Lenna in particular are realistic and capture the effects of changing light source direction well.

of realism of re-illuminated face images, and suggest development of constraints which penalize instances where strong gradients occur in a re-illuminated image but not in the original. Our current approach to choosing the threshold value for the Blake method using re-illuminated image smoothness appears to work well in practice, but is an ad hoc method. In future work, we hope to re-formulate the approach as a maximum likelihood problem, either on the

value of the threshold, or on the direct estimation of albedo. The method at present is too computationally intensive for practical application, so further work is also required to seek optimizations and short-cuts which will allow results to be generated closer to real time. Finally, when the above issues have been addressed, the method requires evaluation in real applications. These are likely to include face and object recognition, and visual quality assessment experiments.


Fig. 8. Re-illumination results on two images from the COIL database. Both contain objects which are fairly amenable to standard SFS analysis, but also include albedo changes. Despite the problems, noted previously, of finding a suitable threshold in terms of behaviour at the boundaries with the uniform background, the method produces excellent results on both images, strongly capturing the object structure and producing realistic images under re-illumination.

References

[1] A. Blake, Boundary conditions for lightness computation in Mondrian world, Comput. Vision Graphics Image Process. 32 (1985) 314–327.
[2] K. Nishino, Z. Zhang, K. Ikeuchi, Determining reflectance parameters and illumination distribution from a sparse set of images for view-dependent image synthesis, Proc. Int. Conf. Comput. Vision II (2001) 391–439.
[3] Y. Li, S. Lin, S.B. Kang, H. Lu, H.-Y. Shum, Single-image reflectance estimation for relighting by iterative soft grouping, Proc. Pacific Conf. Comput. Graphics Appl. (2002) 483–486.
[4] Y. Yu, P. Debevec, J. Malik, T. Hawkins, Inverse global illumination: recovering reflectance models of real scenes from photographs, Proc. ACM SIGGRAPH (1999) 215–224.
[5] S. Boivin, A. Gagalowicz, Image-based rendering of diffuse, specular and glossy surfaces from a single image, Proc. ACM SIGGRAPH (2001) 107–116.
[6] P. Eisert, E. Steinbach, B. Girod, Automatic reconstruction of stationary 3-D objects from multiple uncalibrated camera views, IEEE Trans. Circuits Syst. Video Technol. 10 (2) (2000) 261.
[7] T. Sato, M. Kanbara, N. Yokoya, H. Takemure, Dense 3D reconstruction of an outdoor scene by hundreds-baseline stereo using a hand-held video camera, Int. J. Comput. Vision 47 (1–3) (2002) 119–129.
[8] T. Werner, A. Zisserman, New techniques for automated architectural reconstruction from photographs, Proc. European Conf. Comput. Vision II (2002) 541.
[9] M. Levoy, P. Hanrahan, Light field rendering, ACM Comput. Graphics Proc. SIGGRAPH (1996) 31–42.
[10] S. Laveau, O. Faugeras, 3D scene representation as a collection of images and fundamental matrices, Technical Report 2205, INRIA Sophia-Antipolis, 1994.
[11] W. Zhao, R. Chellappa, SFS based view synthesis for robust face recognition, Proc. 4th IEEE Conf. Automatic Face Gesture Recogn. (2000) 285–292.
[12] W. Zhao, R. Chellappa, Illumination-insensitive face recognition using symmetric shape-from-shading, Proc. IEEE Conf. Comput. Vision Pattern Recogn. I (2000) 286–293.
[13] P.L. Worthington, E.R. Hancock, Coarse view synthesis using shape-from-shading, Pattern Recogn. 36 (2) (2003) 439–449.
[14] P.L. Worthington, Novel view synthesis using needle-map correspondence, Br. Machine Vision Conf. II (2002) 718–727.
[15] M. Bichsel, A.P. Pentland, A simple algorithm for shape from shading, Proc. IEEE Conf. Comput. Vision Pattern Recogn. (1992) 459–465.
[16] P.L. Worthington, E.R. Hancock, New constraints on data-closeness and curvature consistency for shape-from-shading, IEEE Trans. Pattern Anal. Machine Intell. 21 (1999) 1250–1267.
[17] B.K.P. Horn, M.J. Brooks, The variational approach to shape from shading, Comput. Vision Graphics Image Process. 33 (2) (1986) 174–208.
[18] P.S. Tsai, M. Shah, Shape from shading using linear approximation, Image Vision Comput. 12 (8) (1994) 487–498.
[19] Q. Zheng, R. Chellappa, Estimation of illuminant direction, albedo, and shape from shading, IEEE Trans. Pattern Anal. Machine Intell. 13 (7) (1991) 680–702.
[20] C.G. Christou, J.J. Koenderink, Light source dependence in shape from shading, Vision Res. 37 (11) (1997) 1441–1449.
[21] P.L. Worthington, Re-illumination-driven shape from shading, Comput. Vision Image Understand., to appear.
[22] P.L. Worthington, Predicting appearance from a single image with albedo estimation, Image Vision Comput., 2003, submitted.
[23] R.T. Frankot, R. Chellappa, A method for enforcing integrability in shape from shading algorithms, IEEE Trans. Pattern Anal. Machine Intell. 10 (4) (1988) 439–451.
[24] B.K.P. Horn, Determining lightness from an image, Comput. Graphics Image Process. 3 (1974) 277–299.
[25] H.G. Barrow, J.M. Tenenbaum, Recovering intrinsic scene characteristics from images, in: Hanson, Riseman (Eds.), Computer Vision Systems, Academic Press, New York, 1978.
[26] Y. Weiss, Deriving intrinsic images from image sequences, Proc. Int. Conf. Comput. Vision (2001) 68–75.
[27] M.F. Tappen, W.T. Freeman, E.H. Adelson, Recovering intrinsic images from a single image, Adv. Neural Inform. Process. Syst. 15 (2003).
[28] A.S. Georghiades, D.J. Kriegman, P.N. Belhumeur, Illumination cones for recognition under variable illumination: faces, Proc. IEEE Conf. Comput. Vision Pattern Recogn. (1998) 52–58.
[29] A.S. Georghiades, D.J. Kriegman, P.N. Belhumeur, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Machine Intell. 23 (6) (2001) 643–660.
[30] M. Weber, A. Blake, R. Cipolla, Towards a complete dense geometric and photometric reconstruction under varying pose and illumination, Br. Machine Vision Conf. I (2002) 83–92.
[31] B.K.P. Horn, Obtaining shape from shading information, in: P.H. Winston (Ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975, pp. 115–155.
[32] R. Zhang, P.S. Tsai, J.E. Cryer, M. Shah, Shape from shading: a survey, IEEE Trans. Pattern Anal. Machine Intell. 21 (8) (1999) 690–706.
[33] J. Dong, M.J. Chantler, Capture and synthesis of 3D surface texture, Proc. Int. Workshop Texture Anal. Synthesis (2002) 41–45.
[34] T. Vetter, T. Poggio, Linear object classes and image synthesis from a single example image, IEEE Trans. Pattern Anal. Machine Intell. 19 (7) (1997) 733–742.
[35] T. Vetter, Synthesis of novel views from a single face image, Int. J. Comput. Vision 28 (2) (1998) 103–116.
[36] J. Atick, P. Griffin, N. Redlich, Statistical approach to shape from shading: reconstruction of three-dimensional face surfaces from single two-dimensional images, Neural Comput. 8 (1996) 1321–1340.
[37] E.H. Land, J.J. McCann, Lightness and retinex theory, J. Opt. Soc. Am. 61 (1971) 1–11.
[38] G. Brelstaff, A. Blake, Computing lightness, Pattern Recogn. Lett. 5 (1987) 129–138.
[39] D.A. Forsyth, J. Ponce, Computer Vision: A Modern Approach, Prentice-Hall, Englewood Cliffs, NJ, 2003.
[40] R. Kimmel, M. Elad, D. Shaked, R. Keshet, I. Sobel, A variational framework for retinex, Proc. SPIE Electron. Imaging 4672 (2002).
[41] S.A. Nene, S.K. Nayar, H. Murase, Columbia object image library (COIL-20), Technical Report CUCS-005-96, 1996.
[42] X.-G. Feng, P. Milanfar, Multiscale principal components analysis for image local orientation estimation, Proc. Asilomar Conf. Signals Syst. Comput. (2002) 478–482.
[43] S. Ando, Consistent gradient operators, IEEE Trans. Pattern Anal. Machine Intell. 22 (3) (2000) 252–265.
[44] K.V. Mardia, P.E. Jupp, Directional Statistics, Wiley, New York, 2000.

About the Author–Philip L. WORTHINGTON received his MA degree in Engineering and Computing Science in 1996 from the University of Oxford, and his DPhil in Computer Vision under the supervision of Professor Edwin Hancock at the University of York in 2000. After working as an engineer and consultant, he was appointed to a lectureship in the Department of Computation at UMIST, UK in 2001. He is currently an Imaging Informatics Specialist at AstraZeneca, UK, and pursuing a Master of Enterprise degree at the University of Manchester with a view to developing computer vision technology ventures.