
Computers & Graphics 29 (2005) 403–414 www.elsevier.com/locate/cag

Technical section

Fast natural image matting in perceptual color space

Shengyou Lin, Jiaoying Shi

State Key Laboratory of CAD and CG, Zhejiang University, PR China

Corresponding author. Tel.: +86 571 87951045; fax: +86 571 87951780. E-mail addresses: [email protected] (S. Lin), [email protected] (J. Shi).

Abstract

This paper proposes a fast natural image matting method in perceptual color space. Natural image matting is usually composed of three steps: region segmenting, color estimating and alpha estimating. Our matting approach uses a practical model to estimate the foreground and background colors of a given pixel in unknown regions; it avoids complex computation and decreases computational cost significantly. A new alpha estimating method in perceptual color space is introduced to compute the alpha value correctly and effectively: we separate the chroma and intensity information of a color and emphasize the more significant one. Our method works well on different perceptual color spaces, extracts foreground objects much faster, and produces modestly better mattes than the Bayesian approach.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Color; Intensity; Statistical; Blue screen matting; Difference matting; Natural image matting

1. Introduction

The matting problem [1–5] is to extract a foreground element of arbitrary shape from a background image by estimating a color and opacity for each pixel of the foreground element. The opacity value at each pixel is referred to as its alpha, and the alpha image is called a matte. Compositing is the inverse operation of matting: in the compositing process, the foreground element is placed over a new background, using the matte to remove those parts of the new background that the foreground element obscures. Matting and compositing are often used in film and video production, for instance to place the image of an actor into another environment or to create special effects.

Matting techniques can be classified mainly as blue screen matting and natural image matting, depending on the background of the given image. Generally speaking, the problem of extracting foreground and alpha from a constant-color background is called blue screen matting; if the background is arbitrary, the process is called natural image matting. Though matting techniques are popular in film and video production, pulling a matte from a natural image remains a difficult problem, especially when the natural image contains complex shapes such as fur and hair.

The matting problem is difficult because it is essentially underconstrained. For a foreground element over a single background image, there are infinitely many possible interpretations of foreground color versus opacity. The challenge is to find the most reasonable of these numerous interpretations. Natural image matting attempts to pull a matte from an arbitrary background using three basic steps: segmenting the image into three regions (foreground, background and unknown); estimating the background and foreground color components of each pixel in the unknown region; and estimating the alpha value of that pixel.



This paper proposes an effective alpha estimating method for natural image matting that separates the chroma and intensity information of a color in perceptual color space and treats them differently.

The remainder of the paper is organized as follows. First, we review previous matting techniques (Section 2). Then our approach is introduced in detail (Section 3). Next, we show our results through several matting examples (Section 4). Finally, we summarize our work and describe some future research directions (Section 5).

2. Previous work

2.1. Problem overview

Porter and Duff [6] put forward the concept of the alpha channel and summarized the compositing equation as

$C = \alpha F + (1 - \alpha)B$,  (1)

where C, F and B are the composite, foreground, and background colors, and α is the alpha component. Eq. (1) holds in each of the RGB channels. For blue screen matting, C and B are known, so we have three equations and four unknowns. For natural image matting, we have three equations but seven unknowns. To pull a matte accurately, additional constraints must be added.

Blue screen matting has been used in the film and video industry for decades, and many blue screen matting methods [7–10] have been patented since 1964. Smith and Blinn summarized the blue screen problem nicely in their paper [1]. Difference matting, a matting technique similar to blue screen matting, was proposed by Qian and Sezan [2]. This method requires pre-recording a background image of the scene without any foreground object present. The alpha value is determined by taking the difference between the background image and the input image and comparing this difference to a threshold. One limitation of both blue screen matting and difference matting is the reliance on a controlled background image.

Natural image matting is a more general technique that extracts foreground elements and alpha from an arbitrary background. Typical natural image matting techniques include Knockout [11,12], Ruzon and Tomasi [3], Hillman et al. [4] and Chuang et al. [5]. These methods all begin by segmenting the image into three regions: foreground, background and unknown. Neither the Ruzon and Tomasi method nor the Hillman method is robust enough, and both are very time consuming. The Knockout method is simple and fast, but it may fail in certain cases. Chuang's approach obtains better results than the previous methods, but also spends too much time on processing.
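To make Eq. (1) concrete, here is a minimal per-pixel compositing sketch in Python/NumPy; this is our illustration, not code from the paper, and the function name and array conventions are our own.

```python
import numpy as np

def composite(F, B, alpha):
    """Compositing by Eq. (1): C = alpha * F + (1 - alpha) * B.

    F, B  : (H, W, 3) float foreground / background images in [0, 1]
    alpha : (H, W) matte in [0, 1]
    """
    a = alpha[..., None]          # broadcast the matte over the RGB channels
    return a * F + (1.0 - a) * B
```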

Here we briefly describe the Knockout method and the Bayesian approach proposed by Chuang. A good summary of natural image matting can be found in [5].

2.2. Knockout method

In the Knockout method, for a given pixel p in the unknown region, let C denote its RGB color; its foreground color F and background color B′ are then calculated as weighted means over the colors of pixels on the contour segments close to pixel p. Color C is then projected onto a plane that is perpendicular to line segment FB′ and passes through point B′. The projected point is the optimized point B. The three α components in the RGB channels are estimated by projecting the color onto the RGB axes, and the final α is taken as a weighted mean over these three alpha components. The Knockout method therefore uses a simple color estimating model and an alpha estimating method that decomposes α into three components along the axes of RGB space.

2.3. Bayesian method

Chuang's method takes a statistical approach to estimating F, B and α. Unlike other matting techniques, this approach performs the matting process in a scanning order that marches inward from the known foreground and background regions. In addition, it utilizes previously computed pixels while computing the temporary background and foreground color components $\bar{F}$ and $\bar{B}$. The color components $\bar{F}$ and $\bar{B}$ are calculated as the centers of oriented Gaussian distributions. Further, this approach formulates the matting problem in a well-defined Bayesian framework and uses maximum a posteriori (MAP) estimation to obtain the optimized colors F and B simultaneously. Finally, the alpha value is computed by projecting the color onto the line segment FB in color space. In fact, this projecting method is basically the same as those used by Ruzon [3] and Hillman [4]. Chuang's color estimating model is more complex than Knockout's, and its projecting method of alpha estimating obtains the maximal occurring probability of color C because the projected point is the nearest point on line segment FB to point C in RGB color space.
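The plane projection in Knockout's color model (Section 2.2) can be written down directly. The sketch below is our reading of that geometric step, with hypothetical names and a degenerate-case guard of our own; it is not code from the Knockout patents.

```python
import numpy as np

def refine_background(C, F, B0):
    """Project observed color C onto the plane that is perpendicular to
    segment F-B0 and passes through B0; the projected point is taken as
    the optimized background color B (Section 2.2)."""
    C, F, B0 = (np.asarray(x, float) for x in (C, F, B0))
    n = F - B0                        # plane normal: direction of F-B0
    nn = float(n @ n)
    if nn < 1e-12:                    # F and B0 coincide; nothing to refine
        return B0
    return C - (float((C - B0) @ n) / nn) * n
```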

3. Our approach

As with previous matting techniques, our approach begins by manually segmenting the input image into three regions: foreground, background and unknown. Next, we build a practical model to estimate the background and foreground color components of a given pixel in the unknown region. Finally, the alpha value is computed using a new estimating method based on perceptual color space.


Our approach solves the problem of natural image matting much faster and marginally better than Chuang's approach.

3.1. Region segmenting by user

As shown in Fig. 1(a), a foreground contour is drawn manually around the foreground object's edge using a mouse and cursor, and a background contour is drawn similarly. The foreground contour must lie entirely within the foreground region, and the background contour entirely within the background region. At the same time, the unknown region should be kept as small as possible so that the matte can be pulled out better and faster. In Fig. 1(a), the red curve is the foreground contour and the blue curve is the background contour.
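As an illustration of this segmentation step, the following sketch rasterizes the two hand-drawn contours into a trimap. It assumes OpenCV is available; the 0/128/255 encoding and the function name are our own choices, not specified by the paper.

```python
import numpy as np
import cv2

def make_trimap(height, width, fg_contour, bg_contour):
    """Rasterize the hand-drawn contours of Section 3.1 into a trimap:
    0 = background, 128 = unknown, 255 = foreground.
    fg_contour, bg_contour: (N, 2) int32 polygon vertices."""
    trimap = np.zeros((height, width), np.uint8)   # outside bg contour: background
    cv2.fillPoly(trimap, [bg_contour], 128)        # inside bg contour: unknown
    cv2.fillPoly(trimap, [fg_contour], 255)        # inside fg contour: foreground
    return trimap
```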

3.2. Color estimating with a practical model

Given a point p in the unknown region and a distance parameter $\gamma_1$ ($1.0 \le \gamma_1 \le 10.0$), the foreground and background color components of point p can be estimated approximately. First, we find the point f1 nearest to point p on the foreground contour (Fig. 1(b)) and let $l_F$ denote this shortest distance. Next, we draw a circle F with radius $\gamma_1 l_F$ and center f1. The area covered by circle F includes pixels in both known and unknown regions. (In Fig. 1(b), $\gamma_1 = 1.0$.) Finally, the foreground color F of point p is calculated approximately as a weighted mean over the colors of all known foreground pixels in circle F. We use a spatial Gaussian fall-off $w_{F_i}$ with $\sigma = l_F$ as the weight function to emphasize the contribution of nearby pixels. For a foreground pixel i in the circle area, the weight $w_{F_i}$ is given by

$w_{F_i} = \frac{1}{\sqrt{2\pi}}\, e^{-4 z_{F_i}^2 / l_F^2}$.  (2)

Fig. 1. Region segmenting and background and foreground color estimating. (a) shows the hand-drawn contours and the three regions; the blue contour is the background contour, and the red contour is the foreground contour. (b) shows how the background color and foreground color are computed.

In Eq. (2), $z_{F_i}$ denotes the distance between point f1 and the foreground pixel i in circle F in 2D image space. The foreground color component F is given approximately as the weighted mean color $\bar{F}$ over all foreground pixels in circle F:

$F \approx \bar{F} = \frac{1}{W_F} \sum_{i \in N} w_{F_i} F_i$,  (3)

where $W_F = \sum_{i \in N} w_{F_i}$ and $F_i$ is the color of foreground pixel i within circle F, which contains N foreground pixels and some unknown pixels. The same procedure is followed to estimate the background color B.

For foregrounds and backgrounds with low color variation, this simple model is effective enough; a model aimed at higher-resolution images is left for future work. The distance parameter $\gamma_1$ is generally set to 1.0 for smooth images. When $\gamma_1$ is set to another value $\theta$, the computational cost is nearly $\theta^2$ times that of the case $\gamma_1 = 1.0$, because the number of known pixels in the circle increases by a factor of $\theta^2$, while the quality of the matting result changes little. For complex images, we can set $\gamma_1$ to a larger value to reduce noise in the matting result, because more pixels are then involved in computing the foreground and background color components.

Chuang utilizes more sample pixels when the unknown pixel is closer to the contour; our method uses more sample pixels when the unknown pixel is far from the contour; and the Knockout method uses only pixels on the nearby contour instead of all pixels within the nearby known region. In Fig. 8, we use the same alpha estimating method (described in the next section) with these different color estimating methods.
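A minimal sketch of this color model, assuming the contour points, known foreground pixels and their colors are given as coordinate arrays; the names and the empty-circle handling are our additions.

```python
import numpy as np

def estimate_color(p, contour_pts, known_pts, known_colors, gamma1=1.0):
    """Estimate the foreground (or, symmetrically, background) color of an
    unknown pixel p following Section 3.2 and Eqs. (2)-(3).

    contour_pts  : (M, 2) points on the foreground contour
    known_pts    : (K, 2) coordinates of known foreground pixels
    known_colors : (K, 3) their RGB colors
    """
    p = np.asarray(p, float)
    # nearest contour point f1 and the shortest distance l_F
    dists = np.linalg.norm(contour_pts - p, axis=1)
    f1, l_F = contour_pts[np.argmin(dists)], float(dists.min())
    # known pixels inside the circle of radius gamma1 * l_F centered at f1
    z = np.linalg.norm(known_pts - f1, axis=1)
    inside = z <= gamma1 * l_F
    if not inside.any():
        return None                  # no known samples; caller may enlarge gamma1
    # Gaussian fall-off of Eq. (2); the 1/sqrt(2*pi) factor cancels in Eq. (3)
    w = np.exp(-4.0 * z[inside] ** 2 / l_F ** 2)
    return (w[:, None] * known_colors[inside]).sum(axis=0) / w.sum()
```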


Fig. 2. Instances in which Chuang's alpha estimating method may not produce correct results. (a) The small black rectangle is enlarged on the right, where P1 and P2 indicate two pixels for which Chuang's projection gives incorrect results. (c) and (d), respectively, illustrate the projection of P1 and P2 in RGB color space using Chuang's method, in which C denotes the observed color, F the foreground color and B the background color. Perceptually, P1 should be classified as strand, but is projected to background. This kind of projection depends only on the shape of △CFB. (b) compares the results for this black rectangle using Chuang's method (left pair) and our method (right pair), with each result pair consisting of an alpha matte and a composite.

When the image is smooth, these differences in color estimating do not lead to obviously different result quality. In Chuang's method, previously computed pixels are also utilized to estimate the initial colors. This can improve precision if the previously computed colors are precise enough, but gives worse results otherwise. Our method appears similar to the Knockout method to some extent, but we compute foreground and background colors from sample pixels within the neighborhood region, rather than from pixels only on nearby contours.

3.3. Alpha estimating in perceptual color space

First, we discuss the disadvantages of previous alpha estimating methods in natural image matting. Second, we describe in detail our alpha estimating method in a simple perceptual color space. Last, we extend our alpha estimating method to other perceptual color spaces.

3.3.1. Previous alpha estimating methods in RGB color space

The typical natural image matting techniques include Knockout [11,12], Ruzon [3], Hillman [4] and Chuang [5]. Their alpha estimating steps are all done in RGB color space. Among them, Ruzon, Hillman and Chuang appear to use the same projection algorithm to estimate alpha values, while Knockout uses a different decomposition algorithm.

For an observed color C, once its foreground color component F and background color component B are computed, Knockout decomposes the final α into three components $\alpha_r$, $\alpha_g$, $\alpha_b$ along the three axes of RGB space, using the relation

$f(\alpha) = \dfrac{f(C) - f(B)}{f(F) - f(B)}$,  (4)

where f(·) projects an RGB color (or the final alpha) onto one of the r, g, b axes. Each alpha component is computed by projecting onto its respective axis. The final α is taken as a weighted sum over the three projections, where the weights are proportional to the denominator in Eq. (4) for each axis. This method decomposes α into three components along the axes of RGB space.

Other natural image matting techniques, including Chuang, Ruzon and Tomasi, and Hillman, use a different alpha estimating method: projecting color C onto line segment FB in RGB space. Their alpha computing equation is

$\alpha = \dfrac{(C - B)\cdot(F - B)}{\|F - B\|^2}$.  (5)

So alpha is the ratio of $|BC|$ to $|BF|$. When point C lies off segment FB, Eq. (5) projects C onto a point C′ on line BF, and the alpha value is $|BC'|$ divided by $|BF|$.
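For comparison, both RGB-space estimators can be sketched in a few lines. The clamping of α to [0, 1] and the zero-denominator guards below are our additions.

```python
import numpy as np

def alpha_knockout(C, F, B):
    """Knockout-style estimate (Eq. 4): one alpha per RGB axis, combined
    with weights proportional to the per-axis denominator f(F) - f(B)."""
    C, F, B = (np.asarray(x, float) for x in (C, F, B))
    denom = F - B
    weights = np.abs(denom)
    if weights.sum() < 1e-9:
        return 0.0
    # axes where F and B coincide get (near) zero weight anyway
    alphas = (C - B) / np.where(weights > 1e-9, denom, 1.0)
    return float(np.clip((weights * alphas).sum() / weights.sum(), 0.0, 1.0))

def alpha_projection(C, F, B):
    """Projection estimate (Eq. 5), as used by Ruzon-Tomasi, Hillman and Chuang."""
    C, F, B = (np.asarray(x, float) for x in (C, F, B))
    fb = F - B
    nn = float(fb @ fb)
    if nn < 1e-12:
        return 0.0
    return float(np.clip(float((C - B) @ fb) / nn, 0.0, 1.0))
```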


Clearly this projection depends only on the shape of △CFB. In other words, if the shape is determined then alpha is also determined, wherever the triangle sits in RGB space. This seems questionable, and in practice we have found that this approach does not always produce correct results. In Fig. 2(a), C(160, 147, 144) is the color of pixel P1(470, 281) on a hair strand; F(121, 90, 77) is the weighted mean over the colors on the strands; and B(116, 120, 145) represents the background. Viewed with human eyes, C intuitively approaches foreground F. As a result, we perceptually sort P1 into "strands" rather than "background" according to the dominant chroma information, although P1's luminance tends to be closer to the background's.

Knockout's alpha estimating method gives most weight to the alpha component whose respective denominator in Eq. (4) is largest, so in Fig. 2(c) the final alpha would be closest to $\alpha_b$. As expected, the final computed α is 0.058, which implies that color C is close to B. But color C is obviously close to F, because C and F are both colors of hair strands. Another example where this method fails, near pixel P2, is given in Fig. 2(d). This alpha estimating method cannot produce perceptually correct results, as shown in the left part of Fig. 2(b).

If we use Chuang's alpha estimating method to compute alpha, as shown in Fig. 2(c), chroma's influence is undermined by luminance, and color C is projected to color C1(116, 123, 151), which is closer to background B than to foreground F. The dominant color component of C is originally red, but after projection it is changed to blue. That is to say, chroma, the perceptually preponderant information, is undesirably influenced by intensity. This can lead to errors, as shown in the middle part of Fig. 2(b). Therefore we need a strategy so that when either the chroma or the intensity information is perceptually dominant, we can minimize the influence of the other.

Fig. 3. Our perceptual color space illustration. (a) Decomposed intensity and chroma are expressed through the length and direction of the vectors in RGB color space. We compute alpha via the projected triangle △C′F′B′ and the lengths of the color vectors, so our alpha estimating relies not only on △CFB's shape but also on its position in RGB color space. In (b), d denotes the distance between F′ and B′ on the unit plane.

3.3.2. Alpha estimating in a simple perceptual color space

We propose to estimate the alpha value in a perceptual color space consisting of two chroma dimensions and one intensity dimension. Qian and Sezan [2] also separate the intensity and chroma information of a color and treat them differently in difference matting: the significant information, either chroma or intensity, is emphasized, and the insignificant information is ignored. Since RGB color tightly intertwines intensity and chroma, we decompose RGB color in a perceptual color space to emphasize the more important of the two. This is exactly the strategy we need. In addition, alpha depends not only on △CFB's shape but also on its position in RGB color space, and this must be taken into account when designing an alpha estimating method. Our method, described in the following paragraphs, attends to both the shape and the position of △CFB, and also separates chroma and intensity. This is the underlying reason why our method outperforms previous alpha estimating methods.

For convenience of computation and description, we adopt a modified perceptual space representation. We assume the color coordinates of RGB color space are normalized to lie in the interval [0, 1].


Fig. 4. Results of the Syringe and Feather_edge examples in Lab, HSV, and YUV color spaces. The original images are shown in Figs. 5 and 6, respectively. In these two examples, the results in HSV and YUV are better than those in Lab when examined closely.

Given a color $c = [R_c, G_c, B_c]$, let $c' = [r_c, g_c, b_c]$ denote the chroma of color c, and let $L_c = 0.114R_c + 0.588G_c + 0.298B_c$ denote its intensity. In RGB color space, chroma c′ also represents a color, whose RGB components are given as

$r_c = R_c/(R_c + G_c + B_c)$, $g_c = G_c/(R_c + G_c + B_c)$, $b_c = B_c/(R_c + G_c + B_c)$.

Obviously $r_c + g_c + b_c = 1$, i.e. the chroma color c′ of any color c lies on the unit plane [13] (△rgb in Fig. 3(a)). The converted color space is composed of $L_c$ and $(r_c, g_c, b_c)$, and the latter has only two free dimensions because c′ lies on the unit plane. Hence this color space is effectively $(L_c, r_c, g_c)$, whose first dimension represents intensity, i.e. the length of the RGB color vector, and whose other two dimensions indicate chroma, i.e. the direction of the RGB color vector. We call it Lrg color space, a modified perceptual color space.

In Fig. 3, it is clear that the shape and position of △CFB jointly determine the intensities $L_C$, $L_F$, $L_B$ and the projected triangle △C′F′B′ on the unit plane. To analyze the individual effects of chroma and intensity on alpha computation, we decompose alpha into two components, chroma alpha $\alpha_{CH}$ and intensity alpha $\alpha_{IN}$:

$\alpha_{CH} = \dfrac{(C' - B') \cdot (F' - B')}{\|F' - B'\|^2}$  (6)

and

$\alpha_{IN} = \dfrac{L_C - L_B}{L_F - L_B}$,  (7)

where α, $\alpha_{CH}$ and $\alpha_{IN}$ are all assumed to lie in the interval [0, 1]. Let $r = \min(L_F, L_B)/\max(L_F, L_B)$ ($r \in (0, 1]$). When r approaches 0, the intensity difference between F and B is so large that intensity prevails over chroma, so we give more weight to $\alpha_{IN}$; when r approaches 1, chroma prevails over intensity, so we give more weight to $\alpha_{CH}$. Besides r, the distance $d = |F'B'|$ ($d \in [0, \sqrt{2}]$) on the unit plane (Fig. 3(b)) is also important in computing the weights; values of d across its range have an effect similar to those of r. By choosing $r^3$ and $d^3$ in practice to stress these changes and emphasize the dominant information, we derive the weights of $\alpha_{CH}$ and $\alpha_{IN}$ as

$W_{IN} = \dfrac{u}{d^3} + \dfrac{v}{r^3}$  (8)

and

$W_{CH} = s d^3 + t r^3$,  (9)

where u, v, s, t are adjustable constants. When s and t are too large, noise appears throughout the matting result; when u and v are too large, the matting result is completely wrong in some local regions. In Lrg color space, we empirically set $u = \frac{1}{8000}$, $v = t = 3.0$ and $s = 8000$.


Fig. 5. Comparisons on the Syringe example. Inset 1 combines close-ups of the original, the alpha matte and the composite image of the hair strands on the head portrait's right side. Inset 2 shows another close-up of the alpha matte of the hair strands on the head's left side. Knockout loses some hair strands in inset 1. Ruzon and Tomasi shows a diagonal color discontinuity in inset 1 and broken hair strands in inset 2. Bayesian exhibits quite good results in both insets, but examined carefully, dirty areas appear in inset 1 and a slight color discontinuity at the bottom of inset 2. Our method shows none of the above artifacts.


Fig. 6. Comparison on the Feather_edge example. Insets 1 and 2 show close-ups of the alpha image of the hair around the chin and on the head, respectively. Bayesian loses part of a hair strand in the top ellipse of inset 1 and produces some noise in the bottom ellipse of inset 1.

The final alpha is computed as a weighted mean over $\alpha_{CH}$ and $\alpha_{IN}$:

$\alpha = \dfrac{W_{CH}\,\alpha_{CH} + W_{IN}\,\alpha_{IN}}{W_{CH} + W_{IN}}$.  (10)
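Putting Eqs. (6)–(10) together, a compact sketch of the whole Lrg-space alpha estimator might look as follows. The epsilon guards against degenerate F and B are our additions; the constants are the ones quoted above.

```python
import numpy as np

# Constants of Eqs. (8)-(9) for Lrg color space, as set in the text
U, V, S, T = 1.0 / 8000, 3.0, 8000.0, 3.0
EPS = 1e-9   # our guard against degenerate denominators

def rgb_to_lrg(c):
    """Split a normalized RGB color into intensity L and the chroma point
    (r, g, b) on the unit plane (Section 3.3.2)."""
    R, G, B = (float(x) for x in c)
    L = 0.114 * R + 0.588 * G + 0.298 * B    # intensity as defined in the paper
    s = max(R + G + B, EPS)
    return L, np.array([R, G, B]) / s

def alpha_lrg(C, F, B):
    """Alpha via Eqs. (6)-(10) in Lrg color space."""
    L_C, Cp = rgb_to_lrg(C)
    L_F, Fp = rgb_to_lrg(F)
    L_B, Bp = rgb_to_lrg(B)
    # chroma alpha, Eq. (6): projection on the unit plane
    fb = Fp - Bp
    d = max(float(np.linalg.norm(fb)), EPS)   # d = |F'B'| as in Fig. 3(b)
    a_ch = float((Cp - Bp) @ fb) / d ** 2
    # intensity alpha, Eq. (7)
    a_in = (L_C - L_B) / (L_F - L_B) if abs(L_F - L_B) > EPS else 0.0
    a_ch, a_in = np.clip([a_ch, a_in], 0.0, 1.0)
    # weights, Eqs. (8)-(9): r compares the intensities of F and B
    r = max(min(L_F, L_B), EPS) / max(max(L_F, L_B), EPS)
    w_in = U / d ** 3 + V / r ** 3
    w_ch = S * d ** 3 + T * r ** 3
    # final alpha, Eq. (10)
    return (w_ch * a_ch + w_in * a_in) / (w_ch + w_in)
```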

When C lies exactly on FB, Chuang's alpha estimating method is accurate while ours deviates slightly; but the chance that C lies exactly on FB is quite small, and the deviation is too subtle to be detected by human eyes.

3.3.3. Alpha estimating in other perceptual color spaces

A perceptual color space has two chroma dimensions and one intensity dimension, and many such spaces exist in color science. We sort them into three main classes according to their color coordinates: (1) Lab, Luv, lαβ [14], etc., which are all weakly correlated perceptual color spaces obtained by complex non-linear transformations of RGB color space. (2) YUV, YIQ, YES, YCrCb, etc., whose coordinates are linear combinations of the corresponding RGB color coordinates (a minimal conversion sketch follows the list below). The two chroma coordinates of the color spaces in classes (1) and (2) are roughly equivalent in both the mathematical and the physical sense.

(3) HSV, HLS, etc., which are based on hand-made palette coloring processes. They are also non-linear transformations of RGB color space, but their chroma coordinates differ from those of classes (1) and (2), so they can be considered a kind of polar-coordinate representation (Fig. 4).
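As an example of class (2), one common RGB-to-YUV linear transform (BT.601 coefficients; a standard definition, not taken from the paper) supplies the intensity Y and the two chroma coordinates on which the alpha estimation of Section 3.3.2 can run:

```python
import numpy as np

# BT.601 RGB -> YUV: the first row gives intensity, the last two give chroma
RGB_TO_YUV = np.array([
    [ 0.299,  0.587,  0.114],   # Y
    [-0.147, -0.289,  0.436],   # U
    [ 0.615, -0.515, -0.100],   # V
])

def rgb_to_yuv(c):
    """Linear class-(2) conversion; chroma alpha and the distance d are then
    computed in the (U, V) plane, i.e. the plane where intensity equals 0."""
    return RGB_TO_YUV @ np.asarray(c, float)
```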

Lrg color space would be sorted into class (1) by the above criteria, but its transformation is not that complex. In the following section, we use the results in Lrg color space for comparisons with other natural image matting techniques.

Our natural image matting algorithm has been tested on many examples in all of the above perceptual color spaces. Here we only show results in Lab, YUV and HSV color space (Fig. 4). Their alpha estimating processes are similar to that described in Section 3.3.2: in Lrg color space the chroma alpha and the parameter d are calculated on the unit plane mentioned above, while in the other spaces they are computed more directly on the plane where the intensity L equals 0. We find that the results obtained in YUV and HSV color space are almost the same, and both have good matting quality comparable to the results in Lrg color space.


Fig. 7. Other results of our matting technique. (The results of Knockout, Ruzon and Tomasi and Bayesian are all from Yung-Yu Chuang's homepage: http://www.cs.washington.edu/homes/cyy/. Syringe and Feather_edge images are obtained from Corel Knockout's tutorial. Copyright © 2001 Corel. All rights reserved. Gandalf and Galadriel images are from the movie "The Lord of the Rings". The water image is courtesy of Philip Greenspun, http://philip.greespun.com. The tiger image is from http://www.mypcera.com.)

Although Lab color space is weakly correlated and closest to human perception, the results in it are worse than those in the other two spaces under the weight computation described in the previous section. Because the transformation from RGB to Lab color space is rather different and complex, seeking a more appropriate weight computation for Lab color space is also left as future work. In our experience, the color spaces in class (2) are a good choice for alpha estimating because of their simple linear transformation and good matting quality.

4. Results and comparisons


We first compare the quality of our results with that of several natural image matting techniques. Then we compare the speed of our algorithm with that of Chuang's.


4.1. Quality

In Fig. 5, we use the Syringe example to compare our results with those of other natural image matting techniques. Knockout loses some hair strands and generates a little distortion, as shown in inset 1. Ruzon and Tomasi exhibits broken strands in inset 2 and a diagonal color discontinuity in inset 1. Chuang's result appears quite good, but examined very carefully, a slight color discontinuity is detected in the ellipse in inset 2. Our approach has none of the above artifacts.

In Fig. 6, we repeat the comparison between the Bayesian approach and our method on the Feather_edge example. When scrutinized, Chuang's result loses part of a hair strand, as shown in the upper ellipse of inset 1, and shows some dirty areas in the lower ellipse of inset 1. In inset 2, Chuang's result again exhibits color discontinuity. Our approach has none of these artifacts.

The Syringe and Feather_edge images are two representative examples that are typically difficult for most natural image matting techniques because of the complex edges of their foreground elements. In these two examples, our method extracts the foreground objects marginally better than Chuang's method. Fig. 7 shows other results from our algorithm. Almost all hair strands are recovered in Gandalf and Galadriel. Even in the Tiger and Waterfall images with clearer backgrounds, our method extracts the tiger nearly faultlessly and removes the rocks, which are hard to extract with other techniques.

4.2. Computational time

Our advantage over the Bayesian approach in computational time is even more evident than that in quality. Our method is much faster than Chuang's approach due to the difference in color estimating. In brief, Chuang's color estimating method is composed of four steps:
(1) Partitioning colors into several clusters. Generally, the higher the color variation, the more clusters are produced and the more compute time is needed.
(2) For each cluster, calculating the weighted mean color. Given a cluster of N pixels, this step needs 3N additions and 3N multiplications.
(3) Calculating each cluster's weighted covariance matrix, which needs 15N additions and 12N multiplications.
(4) Solving a 6×6 linear system to obtain an approximate solution for F and B. The cost of this step is negligible compared to that of steps 1–3.

Table 1. Processing time of our method on different examples under different environments (s)

Example        Celeron 400, 320M RAM    P3 600, 512M RAM    P4 1.8G, 256M RAM
Syringe              16.484                  10.485               5.680
Feather_edge         25.696                  16.474               8.844
Galadriel             4.266                   2.754               1.406
Gandalf               5.618                   3.635               1.891
Waterfall             9.303                   5.988               3.281
Tiger                 5.658                   3.666               1.922

In our method, the entire color estimating process is similar to step 2. Usually step 1 alone takes 5–30 times longer than step 2, and step 3 costs about 4 times as much as step 2. Therefore, our method is at least 5 + 1 + 4 = 10 times faster than Chuang's procedure. Chuang uses about 2 min on the Syringe image on a PIII 1.0G CPU with 512M RAM, a configuration more powerful than those used for columns 1 and 2 of Table 1. The timings of our method are shown in Table 1. We can also choose Knockout's color estimating method to further decrease the computing time remarkably without obvious loss of quality (as shown in Fig. 8): when the image is smooth, the most important step is the alpha estimating, rather than the color estimating.

5. Conclusions and future work

In this paper, we have presented a fast and effective alpha estimating method for natural image matting in perceptual color space. Our approach differs from previous matting methods in that we decompose color information into chroma and intensity, then estimate the alpha value as a weighted mean over chroma alpha and intensity alpha components. Because of this decomposition, significant components are emphasized and insignificant components are ignored. Our primary contribution is a new alpha estimating method in perceptual color space, and this method turns out to be satisfying and fast.

Of the three steps of conventional natural image matting, most previous approaches focus on the second, building various complex models to estimate foreground and background color, while giving less attention to alpha estimating. This may neutralize the achievements of their modelling effort.


Fig. 8. Comparison of results by using different color estimating methods and the same alpha estimating method described in this paper. When the image is smooth, these differences in color estimating will not lead to obviously different matting results.

Besides, in step 2, when computing the foreground and background colors in the regions near a given unknown pixel, increasing model complexity does not bring a comparable improvement in the accuracy of the resulting foreground and background colors. We concentrate on the third step, alpha estimating, and obtain a result that is much better than Knockout and Ruzon and Tomasi and modestly better than Chuang's approach. Furthermore, our method is much faster than existing approaches of comparable quality.

In the future, we hope to explore a number of research directions. In the analysis of $W_{IN}$ and $W_{CH}$, we hope to explore factors beyond r and d. When the background has higher color variation, we hope to find a more robust color estimating model to replace the current weighted-mean method. We also plan to apply our algorithm in applications such as [15,16].

Acknowledgments

Many thanks to Professor Judy Brown for her help in proofreading. This work has been supported partially by the National Science Foundation of China (Project no. 60033010), the National Innovative Group Foundation of China (Project no. 60021201) and the National Key Fundamental Research and Development Program (Project no. 2002CB312105).

References

[1] Smith AR, Blinn JF. Blue screen matting. In: Proceedings of SIGGRAPH 1996; August 1996. p. 259–68.


[2] Qian RJ, Sezan MI. Video background replacement without a blue screen. In: Proceedings of ICIP 1999; October 1999. p. 143–6. [3] Ruzon M, Tomasi C. Alpha estimation in natural images. In: Proceedings of CVPR 2000; June 2000. p. 18–25. [4] Hillman P, Hannah J, Renshaw D. Alpha channel estimation in high resolution images and image sequences. In: Proceedings of CVPR 2001; 2001. p. 1063–8. [5] Chuang YY, Curless B, Salesin D, Szeliski R. A Bayesian approach to digital matting. In: Proceedings of CVPR 2001; 2001. p. 264–71. [6] Porter T, Duff T. Compositing digital images. In: Proceedings of SIGGRAPH 1984; July 1984. p. 253–9. [7] Vlahos P. Electronic composite photography. US Patent 3,595,987. 27 July 1971. [8] Vlahos P. Comprehensive electronic compositing system. US Patent 4,100,569. 11 July 1978. [9] Mishima Y. Soft edge chroma-key generation based upon hex-octahedral color space. US Patent 5,355,174. 1993.

[10] Dadourian A. Method and apparatus for compositing video images. US Patent 5,343,252. 30 August 1994. [11] Berman A, Dadourian A, Vlahos P. Method for removing from an image the background surrounding a selected object. US Patent 6,134,346. 2000. [12] Berman A, Vlahos P, Dadourian A. Comprehensive method for removing from an image the background surrounding a selected object. US Patent 6,134,345. 2000. [13] Rogers D. Procedural elements for computer graphics, 2nd ed. New York: McGraw-Hill; 1998. [14] Ruderman DL, Cronin TW, Chiao CC. Statistics of cone responses to natural images: implications for visual coding. Journal of the Optical Society of America 1998;15(8):2036–45. [15] Chuang YY, Agarwala A, Curless B, Salesin DH, Szeliski R. Video matting of complex scenes. In: Proceedings of SIGGRAPH 2002; 2002. p. 243–8. [16] Chuang YY, Goldman DB, Curless B, Salesin DH, Szeliski R. Shadow matting and compositing. In: Proceedings of SIGGRAPH 2003; vol. 22(3). July 2003. p. 494–500.