Pattern Recognition 46 (2013) 1183–1194

Automatic spectral video matting

Wu-Chih Hu*, Jung-Fu Hsu

Department of Computer Science and Information Engineering, National Penghu University of Science and Technology, #300, Liu-Ho Road, Makung, Penghu 880, Taiwan


Abstract

Article history: Received 22 February 2012; received in revised form 23 July 2012; accepted 14 October 2012; available online 23 October 2012.

This paper proposes automatic spectral video matting based on adaptive component detection and component-matching-based spectral matting. In the proposed method, adaptive component detection is used to automatically generate reliable components of a given image according to its complexity. Spectral matting based on the hue difference of components is then used to obtain an accurate alpha matte of the first frame without user intervention. Finally, component-matching-based spectral matting is used in subsequent frames to obtain automatic video matting. With the proposed method, reliable components of a given image are obtained, and accurate alpha mattes and efficient video matting are obtained automatically. Experimental results show that the proposed method outperforms state-of-the-art video matting methods based on spectral matting.

Keywords: Video matting; Image matting; Spectral matting; Component matching; Hue difference

1. Introduction

Video matting is the process of removing the background from a video sequence to obtain the foreground objects along with opacity estimates (the alpha matte). Typically, the alpha matte is computed using a blue (or green) background, a method called the blue screen technique: the alpha matte of the foreground is extracted by removing the blue (or green) pixels. However, the blue screen technique is not suitable in all situations, since it requires a calibrated studio setup with special equipment.

Video matting is an extension of image matting, which was first mathematically established by Porter and Duff [1]. Image matting considers the problem of estimating the opacity of each pixel in a given image. The color of a given pixel is assumed to be a convex combination of the corresponding foreground and background colors with an associated opacity (alpha) value. For an RGB three-channel color image, there are thus 3 equations and 7 unknowns at each pixel: the known information is the three-dimensional color vector, and the unknowns are the alpha value and the three-dimensional color vectors of the foreground and background. Consequently, image matting is a highly under-constrained problem.
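To make this counting explicit, the per-pixel compositing model (formalized later as Eq. (4)) can be written as

$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i, \qquad I_i, F_i, B_i \in \mathbb{R}^3,\ \alpha_i \in [0,1]$$

Each pixel thus contributes three scalar equations (one per color channel) but seven scalar unknowns: $\alpha_i$, the three components of $F_i$, and the three components of $B_i$.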


* Corresponding author. Tel.: +886 6 9264115. E-mail addresses: [email protected] (W.-C. Hu), [email protected] (J.-F. Hsu). http://dx.doi.org/10.1016/j.patcog.2012.10.012

To solve this highly under-constrained problem, most existing image matting methods require the user to provide additional constraints, such as a trimap or scribbles (brush strokes). Generally, image matting methods can be roughly classified into three types [2]: sampling-based methods, propagation-based methods, and matting with extra information. Sampling-based methods estimate the foreground and background colors of an unknown pixel by examining nearby pixels that the user has specified as belonging to either the foreground or the background; these color samples are then used to directly estimate the alpha value. Propagation-based methods assume that foreground and background colors are locally smooth; the foreground and background colors are systematically eliminated using an optimization process to obtain the alpha matte. Matting with extra information provides additional information or constraints to the matting algorithm, such as flash and no-flash image pairs. In contrast to matting with extra information, sampling-based and propagation-based methods are suitable for video matting.

Closed-form matting [3] does not require exact estimates of the foreground and background; scribbles are sufficient for extracting the alpha matte. Therefore, automatic video matting based on closed-form matting has been proposed [4–6]. Jain et al. [4] proposed an automatic scribbling approach based on the motion of the foreground object and closed-form matting to achieve automatic video matting. Gupta et al. [5] proposed an efficient video matting technique for determining the probability maps proposed by Jain et al. [4]. Hu et al. [6] proposed an automatic video matting method based on video object segmentation and closed-form matting. However, these automatic video matting methods [4–6] cannot deal with the problem of background regions inside foreground objects and are only suitable for video sequences with a still background.


Spectral matting [7] is a state-of-the-art image matting method. However, the accuracy of the automatically obtained alpha matte is low without user intervention; using spectral matting to obtain an automatic and accurate alpha matte is therefore a challenging problem. Modified spectral matting methods have been proposed to increase the accuracy of the obtained alpha matte [8,9]. Wang and Li [8] proposed spectral matting based on the color information of matting components. First, the intensity/saturation classification [10] is used by analyzing the features of the HSV color space. Next, a color distinguishing scheme [11] is used to divide the HSV color space into unequal regions. Then, the measurement of color histograms [12] is used to measure the color similarity between matting components and separate them into foreground and background groups. Finally, the complete alpha matte is obtained using the matting components of the foreground group. However, if the colors of the foreground and background are similar, or the distance between the color histograms of components is large, the alpha matte will be incorrect.

Hu et al. [9] proposed a modified spectral matting method that uses palette-based component classification, based on the assumption that the angle between the foreground palette and the background palette is π in the hue space. Using this classification, components are classified into foreground, background, and unknown regions, and the matting components of the foreground and unknown regions are combined to form the complete alpha matte based on minimizing the matte cost. However, an incorrect alpha matte is obtained when the angle between the foreground and background is not π in the hue space. Video matting based on these modified spectral matting methods may require the foreground components to be manually set in every frame, which is a time-consuming and troublesome process for the user.

Eisemann et al. [13] proposed a semi-automatic video matting method based on spectral matting and optical flow computation. The matting components of the foreground and background in the first frame are given by the user to obtain the alpha matte. The alpha mattes in subsequent frames are obtained using a warp field transform and a procedure that minimizes an error function; the matting components of the foreground and background in subsequent frames are manually set at intervals of 12–15 frames. However, the computational cost of optical flow is high, and optical flow computation can generate accumulation errors.

The present study proposes an automatic spectral video matting method based on adaptive component detection and component-matching-based spectral matting. The adaptive component detection is used to automatically generate reliable components of a given image according to its complexity. The mean shift algorithm [14,15] is used to obtain the number of clusters of a given image; the obtained number of clusters is used in the spectral segmentation with the k-means algorithm [16] to obtain the distinct components of the given image. Next, spectral matting based on the hue difference of components is used to obtain an accurate alpha matte of the first frame without user intervention.
Component classification based on the hue difference of components is used to obtain foreground, background, and unknown components. The corresponding matting components of the foreground, background, and unknown components are obtained via a linear transformation of the smallest eigenvectors of the matting Laplacian matrix [3]. The matting components of the foreground and unknown components are combined to form the complete alpha matte based on minimizing the matte cost.

Finally, the component-matching-based spectral matting is used in subsequent frames to obtain automatic video matting. The rest of this paper is organized as follows. The adaptive component detection is presented in Section 2. Spectral matting based on the hue difference of components is described in Section 3. Video matting using the component-matching-based spectral matting is proposed in Section 4. Section 5 presents experimental examples and their evaluations. Finally, the conclusion is given in Section 6.

2. Adaptive component detection

In spectral matting, the distinct components are obtained using spectral segmentation with the k-means algorithm [16] based on the eigenvectors of the matting Laplacian matrix [3]. The number of components is given by the user, and determining this number is a troublesome process. Therefore, adaptive component detection is critical for automatic spectral video matting. For a fair comparison when evaluating image matting performance, the same spectral segmentation with the k-means algorithm [16] is used in the proposed adaptive component detection and in the spectral matting based on the hue difference of components.

In the proposed automatic spectral video matting method, adaptive component detection is based on the mean shift [14,15] and spectral segmentation with the k-means algorithm. The mean shift is first used to obtain the number of clusters, and spectral segmentation with the k-means algorithm using the obtained number of clusters is then used to obtain the distinct components of the given image.

The mean shift algorithm is a robust feature-space analysis method [14]. It can greatly reduce the number of image entities to consider; it has a good discontinuity-preserving filtering characteristic, so salient features of the overall image are retained; and it is an unsupervised clustering method in which the number of data clusters is unknown a priori. Tao et al. [15] use the mean shift algorithm for color image segmentation. The mean shift algorithm is briefly described as follows.

A special class of radially symmetric kernels satisfying $K(x) = c_{k,d}\, k(\|x\|^2)$ is used, where $c_{k,d} > 0$ is chosen such that $\int_0^\infty K(x)\,dx = \int_0^\infty c_{k,d}\, k(\|x\|^2)\,dx = 1$. The function $k(x)$, called the profile of the kernel, is a monotonically decreasing function defined only for $x \ge 0$. Given the function $g(x) = -k'(x)$ for a profile, the kernel $G(x)$ is defined as $G(x) = c_{k,d}\, g(\|x\|^2)$. For n data points $x_i$, $i = 1, \ldots, n$, in the d-dimensional space $R^d$, the mean shift is defined as

$$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x \qquad (1)$$

where x is the center of the kernel (window) and h is a bandwidth parameter. The mean shift method is guaranteed to converge to a nearby point where the estimate has zero gradient [14]. The center position of the kernel G can be updated iteratively using

$$y_{j+1} = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}, \qquad j = 1, 2, \ldots \qquad (2)$$

where $y_1$ is the initial center position of the kernel. Once the number of clusters of a given image is obtained, it is applied to the spectral segmentation with the k-means algorithm based on the eigenvectors of the matting Laplacian matrix to estimate the distinct components of the given image.
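For illustration, the following is a minimal Python sketch of the mode-seeking iteration of Eq. (2), assuming a Gaussian profile and a simple bandwidth-based rule for merging converged modes; the function name and the merging heuristic are illustrative and not the exact procedure of [14,15].

```python
import numpy as np

def mean_shift_cluster_count(X, h=0.1, tol=1e-5, max_iter=100):
    """Estimate the number of clusters of feature points X (n x d) by running
    the mean shift iteration of Eq. (2) from every point and counting the
    distinct convergence points (modes)."""
    modes = []
    for y in X.astype(float):
        for _ in range(max_iter):
            # weights g(||(y - x_i)/h||^2) with a Gaussian profile g(x) = exp(-x/2)
            w = np.exp(-np.sum(((y - X) / h) ** 2, axis=1) / 2.0)
            y_next = (w[:, None] * X).sum(axis=0) / w.sum()   # Eq. (2)
            if np.linalg.norm(y_next - y) < tol:              # zero-gradient point
                break
            y = y_next
        modes.append(y_next)
    centers = []                 # merge modes closer than the bandwidth h
    for m in modes:
        if not any(np.linalg.norm(m - c) < h for c in centers):
            centers.append(m)
    return len(centers)
```

The returned count would then be passed as k to the spectral segmentation with the k-means algorithm.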

The matting Laplacian matrix is defined as a sum of matrices $L = \sum_q A_q$, each of which contains the affinities among pixels inside a local window $w_q$:

$$A_q(i,j) = \begin{cases} \delta_{ij} - \dfrac{1}{|w_q|}\left(1 + (I_i - \mu_q)^T \left(\Sigma_q + \dfrac{\varepsilon}{|w_q|}\, I_{3\times 3}\right)^{-1} (I_j - \mu_q)\right), & (i,j) \in w_q \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $\delta_{ij}$ is the Kronecker delta; $\mu_q$ is the 3×1 mean color vector in the window $w_q$ around pixel q; $\Sigma_q$ is the 3×3 covariance matrix in the window; $|w_q|$ is the number of pixels in the window; $I_{3\times 3}$ is the 3×3 identity matrix; $I_i$ and $I_j$ are 3×1 color vectors in the window $w_q$; and $\varepsilon$ is a small positive constant.
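As a sketch of Eq. (3), the affinity block of a single window can be computed as follows (illustrative function name; the full matrix L scatter-adds these blocks over all local windows into a sparse N×N matrix):

```python
import numpy as np

def window_affinity(I_win, eps=1e-5):
    """Affinity block A_q of Eq. (3) for one local window.
    I_win: (m, 3) array holding the m pixel colors inside window w_q."""
    m = I_win.shape[0]
    mu = I_win.mean(axis=0)                          # mean color vector mu_q
    sigma = np.cov(I_win, rowvar=False, bias=True)   # covariance matrix Sigma_q
    inv = np.linalg.inv(sigma + (eps / m) * np.eye(3))
    D = I_win - mu                                   # rows are I_i - mu_q
    # delta_ij - (1/|w_q|) * (1 + (I_i - mu_q)^T inv (I_j - mu_q)) for all i, j
    return np.eye(m) - (1.0 + D @ inv @ D.T) / m
```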

Fig. 1 shows the results obtained using adaptive component detection for images with various levels of complexity, where the left column shows the original images and the right column shows the results obtained using adaptive component detection.

Fig. 1. Results obtained using adaptive component detection for images of (a) an eagle, (b) a tiger and (c) a butterfly.

3. Spectral matting based on the hue difference of components

In this paper, spectral matting based on the hue difference of components is proposed to automatically obtain a high-accuracy alpha matte. The proposed method overcomes the drawbacks of spectral matting in general [7], spectral matting based on the color information of matting components [8], and spectral matting based on palette-based component classification [9]. A flow diagram of the proposed method is shown in Fig. 2.

Fig. 2. Flow diagram of the proposed spectral matting based on the hue difference of components.

In image matting, it is typically assumed that each pixel $I_i$ in an input image is a linear combination of a foreground color $F_i$ and a background color $B_i$:

$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i \qquad (4)$$

where $\alpha_i$ is the pixel's foreground opacity. In spectral matting, the compositing equation is generalized by assuming that each pixel is a convex combination of K image layers $F^1, \ldots, F^K$ [7]:

$$I_i = \sum_{k=1}^{K} \alpha_i^k F_i^k \qquad (5)$$

where the $\alpha_i^k$ are the matting components of the image, which specify the fractional contribution of each layer to the final color observed at each pixel and must satisfy

$$\sum_{k=1}^{K} \alpha_i^k = 1, \qquad \alpha_i^k \in [0,1] \qquad (6)$$

Suppose that the input image consists of K distinct components $C_1, \ldots, C_K$ such that $C_i \cap C_j = \emptyset$ for $i \ne j$. The eigenvectors of the N×N matting Laplacian matrix L are computed as $E = [e_1, \ldots, e_M]$, making E an N×M matrix (N is the total number of pixels). The distinct components are obtained using adaptive component detection. The average hue of each distinct component is then calculated in the HSV color space. If all hue angles between components are smaller than π/5, then a single foreground component and a single background component are found; otherwise, multiple foreground and background components are found. The threshold π/5 was set from experience by increasing the value in steps of π/20 from 0 to π; this threshold is suitable and reliable for component classification.
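The hue-angle test that selects between the two cases can be sketched as follows (a minimal illustration assuming hue angles wrap around at 2π; the names are not from the paper):

```python
import math

def hue_diff(h1, h2):
    """Smallest angular difference between two hue angles (radians)."""
    d = abs(h1 - h2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def single_fg_bg_case(component_hues, thresh=math.pi / 5):
    """True when all pairwise hue angles between the average component hues
    are smaller than pi/5, i.e. a single foreground component and a single
    background component are assumed."""
    return all(hue_diff(a, b) < thresh
               for i, a in enumerate(component_hues)
               for b in component_hues[i + 1:])
```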


For the first case, the single background and foreground components are respectively obtained using

$$C_B = \arg\max_{i \in k} \{C(i) \cap I_{boundary}\} \qquad (7)$$

$$C_F = \arg\max_{i \in k} \{|C_B^H - C^H(i)|\} \qquad (8)$$

where $C_B^H$ is the hue angle of the background component obtained by Eq. (7); C(i) is the ith component; $I_{boundary}$ is an image containing the boundary pixels (with a width of one pixel) of the given image; $C^H(i)$ is the hue angle of the ith component; and $C_B$ and $C_F$ are the background component and foreground component, respectively.
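A minimal sketch of Eqs. (7) and (8) follows (the data layout, one boolean mask and one average hue per component, is illustrative):

```python
import numpy as np

def pick_bg_fg(masks, boundary_mask, hues):
    """Eq. (7): the background component C_B overlaps the one-pixel-wide
    image boundary the most. Eq. (8): the foreground C_F is the component
    whose average hue differs most from the background hue."""
    bg = max(range(len(masks)),
             key=lambda i: np.logical_and(masks[i], boundary_mask).sum())
    fg = max(range(len(masks)), key=lambda i: abs(hues[bg] - hues[i]))
    return bg, fg
```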

Fig. 3 shows the obtained result of image matting. Fig. 3(a) shows an image of a flower; Fig. 3(b) shows the result of component detection; and Fig. 3(c) shows the result of component classification, where the selected background component is green, the unknown components are blue, and the selected foreground component is the remaining region.

For the case of multiple foreground and background components, adjacent components are merged using the following algorithm.

Algorithm of component merging (C(i) and C(j) are adjacent components):
1: If |C^H(i) - C^H(j)| ≤ θ
2:   If (C^S(i) ≥ T_C(i)) and (C^S(j) ≥ T_C(j))
3:     C(i) ∪ C(j)
4:   else
5:     If |C^V(i) - C^V(j)| ≤ Th_V
6:       C(i) ∪ C(j)
7:     else
8:       Do nothing
9: else
10:  Do nothing

where $C^S(i)$ and $C^V(i)$ are the saturation and intensity of the ith component, respectively. The decision of whether the hue or the intensity is more pertinent to human visual perception of the color of a pixel [8] is used in the component merging algorithm, where b = 4, θ = π/10, and $Th_V$ = 0.1 are set from experience. The threshold $T_{C(i)}$ is calculated using [8]

$$T_{C(i)} = \frac{1}{1 + b \cdot C^V(i)}, \qquad b \in [1,4] \qquad (9)$$
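The merging rule, together with the threshold of Eq. (9), can be sketched as follows (function names are illustrative):

```python
import math

THETA = math.pi / 10   # hue threshold (theta)
TH_V = 0.1             # intensity threshold (Th_V)
B = 4                  # b of Eq. (9)

def t_c(intensity):
    """Saturation threshold T_C of Eq. (9): 1 / (1 + b * C_V)."""
    return 1.0 / (1.0 + B * intensity)

def should_merge(h_i, s_i, v_i, h_j, s_j, v_j):
    """Merging decision for two adjacent components i and j, given their
    average hue, saturation, and intensity."""
    if abs(h_i - h_j) > THETA:
        return False                       # hues differ too much: keep separate
    if s_i >= t_c(v_i) and s_j >= t_c(v_j):
        return True                        # both saturated enough: hue decides
    return abs(v_i - v_j) <= TH_V          # weakly saturated: intensity decides
```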

When the adjacent components have been merged, the background and foreground components are found using Eqs. (7) and (8), respectively. Then, the foreground component and non-adjacent components are merged using the component merging algorithm; the background component is processed using the same procedure. Fig. 4 shows the obtained result of image matting. Fig. 4(a) shows an image of Amira; Fig. 4(b) shows the result of component detection; and Fig. 4(c) shows the result of component classification, where the selected background component is green, the unknown components are blue, and the selected foreground component is the remaining region.

Fig. 3. Image matting of an image of a flower. (a) Original image, (b) result of component detection, (c) result of component classification and (d) obtained alpha matte. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Image matting of an image of Amira. (a) Original image, (b) result of component detection, (c) result of component classification and (d) obtained alpha matte. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 5. Component detection results for (a) the (k-1)th frame and (b) the kth frame.

Next, the corresponding matting components are obtained using a linear transformation of the smallest eigenvectors $\tilde{E} = [e_1, \ldots, e_M]$ of the matting Laplacian matrix. The $\alpha^k$ are initialized by applying the k-means algorithm on the smallest eigenvectors, and the indicator vectors of the resulting components $C_1, \ldots, C_{\tilde{K}}$ are projected onto the span of the eigenvectors $\tilde{E}$ using

$$\alpha^k = \tilde{E}\tilde{E}^T m^{C_k} \qquad (10)$$

where $m^C$ denotes the indicator vector of the component C, defined as

$$m_i^C = \begin{cases} 1, & i \in C \\ 0, & i \notin C \end{cases} \qquad (11)$$

The matting components are computed by minimizing an energy function

$$\sum_{i,k} \left|\alpha_i^k\right|^\gamma + \left|1 - \alpha_i^k\right|^\gamma, \quad \text{where } \alpha^k = \tilde{E} y^k \qquad (12)$$

subject to $\sum_k \alpha_i^k = 1$, to find a set of $\tilde{K}$ linear combination vectors $y^k$. The above energy function is minimized using Newton's method, where γ is chosen to be 0.9 for a robust measure [7].

Finally, the matting components of the foreground and unknown components are combined to form the complete alpha matte by minimizing the matte cost

$$J(\alpha) = \alpha^T L \alpha \qquad (13)$$

To perform this task efficiently, the correlations between the matting components via L are pre-computed and stored in a $\tilde{K} \times \tilde{K}$ matrix Φ:

$$\Phi(k,l) = {\alpha^k}^T L\, \alpha^l \qquad (14)$$

Then, the matting cost is computed using

$$J(\alpha) = b^T \Phi\, b \qquad (15)$$

where b is a $\tilde{K}$-dimensional binary vector indicating the selected matting components. Figs. 3(d) and 4(d) show the obtained alpha mattes of Figs. 3(a) and 4(a), respectively. Using the proposed spectral matting based on the hue difference of components, a high-accuracy alpha matte is automatically obtained. It is worth mentioning that the matting components of the foreground region are included in the matte cost and those of the background region are excluded, which greatly increases the accuracy of the obtained alpha matte. Furthermore, since only the resulting components $C_1, \ldots, C_{\tilde{K}}$ ($\tilde{K} \le K$) are processed to obtain the corresponding matting components by minimizing the energy function with Newton's method, the computational cost is effectively reduced.
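Because b varies only over the unknown components (foreground components are always selected and background components never are), the minimization of Eq. (15) can be sketched as an exhaustive enumeration (illustrative names and data layout):

```python
import itertools
import numpy as np

def complete_alpha_matte(alphas, Phi, fg_idx, unknown_idx):
    """Combine matting components by minimizing J(alpha) = b^T Phi b (Eq. (15)).
    alphas: list of per-component alpha maps (2-D arrays); Phi: the
    precomputed matrix of Eq. (14)."""
    best_b, best_cost = None, np.inf
    for choice in itertools.product((0, 1), repeat=len(unknown_idx)):
        b = np.zeros(Phi.shape[0])
        b[list(fg_idx)] = 1                  # foreground always included
        b[list(unknown_idx)] = choice        # try every subset of unknowns
        cost = b @ Phi @ b                   # Eq. (15)
        if cost < best_cost:
            best_cost, best_b = cost, b
    return sum(b_k * a_k for b_k, a_k in zip(best_b, alphas))
```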

4. Video matting using the component-matching-based spectral matting

The proposed automatic video matting method uses component-matching-based spectral matting. The foreground and background components in the first frame are given by the alpha matte automatically obtained using the proposed spectral matting based on the hue difference of components. In subsequent frames, component-matching-based component classification uses the area and color information of components in successive frames to obtain the foreground, background, and unknown components. The component-matching-based component classification is described as follows. The candidate foreground and background components are obtained using

$$C_k(i) = \begin{cases} \text{Foreground}, & \text{if } \left|C^H_{k-1,F}(j) - C^H_k(i)\right| \le \theta \ \text{and} \ Thr_1 \le C^{Area}_{k-1,F}(j)/C^{Area}_k(i) \le Thr_2 \\ \text{Background}, & \text{if } \left|C^H_{k-1,B}(j) - C^H_k(i)\right| \le \theta \ \text{and} \ Thr_1 \le C^{Area}_{k-1,B}(j)/C^{Area}_k(i) \le Thr_2 \\ \text{Unknown}, & \text{otherwise} \end{cases} \qquad (16)$$


Fig. 6. Obtained alpha mattes of (a) Fig. 5(a) and (b) Fig. 5(b).

where $C^H_{k-1,F}(j)$ and $C^{Area}_{k-1,F}(j)$ are the hue angle and area of the jth foreground component in the (k-1)th frame, respectively; $C^H_{k-1,B}(j)$ and $C^{Area}_{k-1,B}(j)$ are the hue angle and area of the jth background component in the (k-1)th frame, respectively; and $C^H_k(i)$ and $C^{Area}_k(i)$ are the hue angle and area of the ith component of the kth frame. θ = π/10, $Thr_1$ = 0.5, and $Thr_2$ = 1.5 are set from experience.

To obtain the foreground and background components, these candidate foreground and background components are respectively checked using

$$C_k(i) = \begin{cases} \text{Foreground}, & \text{if } C^S_k(i) < T_{C_k(i)} \ \text{and} \ \left|C^V_{k-1,F}(j) - C^V_k(i)\right| \le Th_V \\ \text{Unknown}, & \text{otherwise} \end{cases} \qquad (17)$$

$$C_k(i) = \begin{cases} \text{Background}, & \text{if } C^S_k(i) < T_{C_k(i)} \ \text{and} \ \left|C^V_{k-1,B}(j) - C^V_k(i)\right| \le Th_V \\ \text{Unknown}, & \text{otherwise} \end{cases} \qquad (18)$$

where $C^V_{k-1,F}(j)$ and $C^V_{k-1,B}(j)$ are the intensities of the jth foreground and background components of the (k-1)th frame, respectively; $C^S_k(i)$ and $C^V_k(i)$ are the saturation and intensity of the ith component of the kth frame, respectively. $T_{C_k(i)}$ is calculated using Eq. (9), and $Th_V$ = 0.1 is set from experience.

However, the adaptive component detection may obtain different results for two successive frames. For example, Figs. 5(a) and (b) show the (k-1)th and kth frames, respectively; the components of Amira's face are different in the two images, and thus these components are not classified as foreground components. Therefore, to obtain the candidate foreground and background components, the unknown components are further checked using

$$C_k(i) = \begin{cases} \text{Foreground}, & \text{if } \left|FC^H_{k-1}(i,j) - FC^H_k(i,j)\right| \le \theta \ \text{and} \ FC^{Area}_{k-1}(i,j)/C^{Area}_k(i) \ge Thr_3 \\ \text{Background}, & \text{if } \left|BC^H_{k-1}(i,j) - BC^H_k(i,j)\right| \le \theta \ \text{and} \ BC^{Area}_{k-1}(i,j)/C^{Area}_k(i) \ge Thr_3 \\ \text{Unknown}, & \text{otherwise} \end{cases} \qquad (19)$$

where $FC^H_{k-1}(i,j)$ and $FC^H_k(i,j)$ are the hue angles of the overlapping regions between $C_k(i)$ and $C_{k-1,F}(j)$, the jth foreground component in the (k-1)th frame and the ith unknown component in the kth frame, respectively; $BC^H_{k-1}(i,j)$ and $BC^H_k(i,j)$ are the hue angles of the overlapping regions between $C_k(i)$ and $C_{k-1,B}(j)$, the jth background component in the (k-1)th frame and the ith component in the kth frame, respectively; $FC^{Area}_{k-1}(i,j)$ is the area of the overlapping region between $C_k(i)$ and $C_{k-1,F}(j)$; and $BC^{Area}_{k-1}(i,j)$ is the area of the overlapping region between $C_k(i)$ and $C_{k-1,B}(j)$. $Thr_3$ = 0.25 is set from experience.

To obtain the foreground and background components, these candidate foreground and background components are respectively checked using

$$C_k(i) = \begin{cases} \text{Foreground}, & \text{if } C^S_k(i) < T_{C_k(i)} \ \text{and} \ \left|FC^V_{k-1}(i,j) - FC^V_k(i,j)\right| \le Th_V \\ \text{Unknown}, & \text{otherwise} \end{cases} \qquad (20)$$

$$C_k(i) = \begin{cases} \text{Background}, & \text{if } C^S_k(i) < T_{C_k(i)} \ \text{and} \ \left|BC^V_{k-1}(i,j) - BC^V_k(i,j)\right| \le Th_V \\ \text{Unknown}, & \text{otherwise} \end{cases} \qquad (21)$$

where $FC^V_{k-1}(i,j)$ and $FC^V_k(i,j)$ are the intensities of the overlapping regions between $C_k(i)$ and $C_{k-1,F}(j)$, the jth foreground component in the (k-1)th frame and the ith component in the kth frame, respectively; and $BC^V_{k-1}(i,j)$ and $BC^V_k(i,j)$ are the intensities of the overlapping regions between $C_k(i)$ and $C_{k-1,B}(j)$, the jth background component in the (k-1)th frame and the ith component in the kth frame, respectively.

Once the components of the foreground, background, and unknown regions are obtained, the components of the foreground region are merged into one foreground component, and the components of the background region are merged into one background component. The resulting components are processed to obtain the corresponding matting components. Finally, the matting components of the foreground and unknown components are combined to form the complete alpha matte by minimizing the matte cost using Eq. (15). Figs. 6(a) and (b) show the obtained alpha mattes of Figs. 5(a) and (b), respectively.
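The candidate classification of Eq. (16) can be sketched as follows (component records with an average hue and an area are an illustrative data layout, not the paper's):

```python
import math

THETA = math.pi / 10
THR1, THR2 = 0.5, 1.5

def classify_candidate(comp, prev_fg, prev_bg):
    """Match the ith component of frame k against the foreground and
    background components of frame k-1 by hue and area ratio (Eq. (16))."""
    for prev_comps, label in ((prev_fg, "Foreground"), (prev_bg, "Background")):
        for c in prev_comps:
            hue_ok = abs(c["hue"] - comp["hue"]) <= THETA
            area_ok = THR1 <= c["area"] / comp["area"] <= THR2
            if hue_ok and area_ok:
                return label
    return "Unknown"
```

Candidates would then be confirmed or demoted by the saturation/intensity checks of Eqs. (17), (18), (20) and (21), and unmatched components re-examined through the overlap test of Eq. (19).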

5. Experimental results

Experiments were conducted on a computer with an Intel Core 2 Quad Q8200 CPU (2.33 GHz) and 2 GB of RAM. The algorithms were implemented in BCB (Borland C++ Builder) 6.0 and Matlab R2008b.

In the first experiment, the performance of adaptive component detection was evaluated. 132 images with various levels of complexity were used to test the accuracy of the component detection. These test images are roughly classified into three types: low, medium, and high complexity of image segmentation. The ground truths of these test images were obtained by manually setting the components of the foreground and background. Accurate component detection is defined as each detected component belonging to only the foreground or only the background.


Fig. 7. Image matting results obtained using various methods for images of (a) Bear, (b) Face, (c) Flower, (d) Pharos, (e) Kim and (f) Fox.

In this experiment, 124 of the 132 images were accurately detected using the adaptive component detection, for an accuracy of 94%. Therefore, the adaptive component detection can automatically generate reliable components of a given image.

The second experiment was used to demonstrate the performance of the proposed spectral matting based on the hue difference of components. Six test images were used to compare spectral matting [7], spectral matting based on color information of matting components [8], spectral matting based on palette-based component classification [9], and the proposed spectral matting based on the hue difference of components. The first column of Fig. 7 shows images of Bear [17], Face [7], Flower [3], Pharos [3], Kim [7], and Fox [18], respectively. The 2nd–5th columns of Fig. 7 show the alpha mattes obtained using spectral matting, spectral matting based on color information of matting components, spectral matting based on palette-based component classification, and the proposed method, respectively. The number of clusters obtained using the proposed adaptive component detection was used in all of the image matting methods. The mean absolute error (MAE) was used for performance evaluation and is defined as

$$MAE = \frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M}\left|\alpha(i,j) - \bar{\alpha}(i,j)\right| \qquad (22)$$

where $\alpha(i,j)$ and $\bar{\alpha}(i,j)$ are the alpha mattes obtained using image matting and the ground truth, respectively.
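Eq. (22) is a direct per-pixel average and can be computed as:

```python
import numpy as np

def mae(alpha, alpha_gt):
    """Mean absolute error of Eq. (22) between an estimated alpha matte and
    the ground-truth matte (both M x N arrays with values in [0, 1])."""
    return float(np.mean(np.abs(alpha - alpha_gt)))
```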


Table 1 shows the performance evaluation results for spectral matting, spectral matting based on color information of matting components, spectral matting based on palette-based component classification, and the proposed method. The ground truths of the tested images were obtained using spectral matting with user intervention.

Table 1 Performance evaluation (MAE) of image matting.

Tested image   Spectral matting [7]   Wang and Li's method [8]   Hu et al.'s method [9]   Proposed method
Bear           0                      0                          0                        0
Face           0                      0.1436                     0                        0
Flower         0                      0                          0.1108                   0
Pharos         0                      0.2436                     0                        0
Kim            0.3711                 0.2564                     0.7616                   0
Fox            0.0142                 0                          0                        0

The proposed image matting method can obtain the same alpha mattes (that is, the same components of the foreground and background) as spectral matting for all tested images; thus, it is reasonable that the obtained MAE is zero (as shown in Table 1). If the ground truths of the tested images were obtained using other image matting methods (such as closed-form matting [3], easy matting [19], robust matting [20], Poisson matting [21], and Bayesian image matting [22]), then the MAE obtained using the proposed image matting method would not be zero. Furthermore, the alpha matte obtained using Wang and Li's method [8] will be incorrect if the colors of the foreground and background are similar or the distance between the color histograms of components is large. Moreover, an incorrect alpha matte can be obtained using Hu et al.'s method [9] when the angle between the foreground and background is not π in the hue space. Fig. 7 and Table 1 show that the proposed method outperforms state-of-the-art image matting methods based on spectral matting.

The third experiment was used to evaluate the performance of video matting. The Amira and Kim sequences (benchmarks) were used. The Amira sequence consists of 52 frames, each with a size of 310 × 240 pixels; the Kim sequence consists of 52 frames, each with a size of 360 × 240 pixels.

Fig. 8. Ground truths of the Amira sequence. (a) Original frames and (b) ground truths.

Fig. 9. Ground truths of the Kim sequence. (a) Original frames and (b) ground truths.


Fig. 10. Video matting results of the Amira sequence. (a) Results obtained using semi-automatic video matting [13], (b) results obtained using video matting based on Wang and Li’s modified spectral matting [8] and (c) results obtained using the proposed method.

Fig. 11. Video matting results of the Kim sequence. (a) Results obtained using semi-automatic video matting [13], (b) results obtained using video matting based on Wang and Li’s modified spectral matting [8] and (c) results obtained using the proposed method.

The ground truths of the Amira and Kim sequences were obtained using spectral matting with user intervention, frame by frame, as shown in Fig. 8 and Fig. 9, respectively. Figs. 8(a) and 9(a) show the 11th, 31st, and 51st frames of the Amira sequence and Kim sequence, respectively. Figs. 8(b) and 9(b) show the alpha mattes of the ground truths.


The results of video matting obtained using the semi-automatic video matting proposed by Eisemann et al. [13], video matting based on Wang and Li's modified spectral matting [8], and the proposed automatic spectral video matting are shown in Fig. 10 and Fig. 11. In the semi-automatic video matting [13], an interval of 15 frames was used for manually giving the foreground and background components. In the video matting based on Wang and Li's modified spectral matting, the video sequence was processed using modified spectral matting [8] frame by frame. Furthermore, in both the semi-automatic video matting [13] and the video matting based on Wang and Li's modified spectral matting, the number of clusters obtained using the proposed adaptive component detection was used in the spectral matting.

Figs. 10 and 11 show the results of the 11th, 31st, and 51st frames of the Amira sequence and Kim sequence, respectively. Figs. 10(a) and 11(a) show the results obtained using semi-automatic video matting [13]; Figs. 10(b) and 11(b) show the results obtained using video matting based on Wang and Li's modified spectral matting [8]; and Figs. 10(c) and 11(c) show the results obtained using the proposed automatic spectral video matting.

Fig. 12. Performance evaluation (MAE) of video matting for the Amira sequence.

Fig. 13. Performance evaluation (MAE) of video matting for the Kim sequence.

Figs. 12 and 13 show the performance evaluation of semi-automatic video matting [13], video matting based on Wang and Li's modified spectral matting [8], and the proposed method for the Amira and Kim sequences, respectively. Tables 2 and 3 list the maximum, minimum, and average MAE values over the frames for the Amira and Kim sequences, respectively. Figs. 10–13 and Tables 2 and 3 demonstrate that the proposed automatic video matting outperforms state-of-the-art video matting methods based on spectral matting.

Table 2 Performance evaluation of video matting for the Amira sequence.

             Semi-automatic video matting [13]   Video matting [8]   Proposed method
Maximum MAE  0.304 × 10^-2                       25.431 × 10^-2      0
Minimum MAE  0.087 × 10^-2                       0                   0
Average MAE  0.180 × 10^-2                       4.141 × 10^-2       0

Table 3 Performance evaluation of video matting for the Kim sequence.

             Semi-automatic video matting [13]   Video matting [8]   Proposed method
Maximum MAE  0.399 × 10^-2                       54.988 × 10^-2      0
Minimum MAE  0.099 × 10^-2                       5.723 × 10^-2       0
Average MAE  0.198 × 10^-2                       23.09 × 10^-2       0

The ground truths of the tested videos were obtained using spectral matting with user intervention, frame by frame, and the proposed video matting method obtains the same alpha mattes (that is, the same components of the foreground and background) as spectral matting in all frames of the tested videos; thus, the obtained MAE is zero. This is reasonable and not a biased selection of the dataset. If the ground truths of

the tested videos were obtained using other image matting methods (such as closed-form matting [3], easy matting [19], robust matting [20], Poisson matting [21], and Bayesian image matting [22]), then the MAE obtained using the proposed video matting method would not be zero.

Comparisons in terms of the number of clusters in spectral matting, the type of video matting, the image matting of the first frame, the image matting of subsequent frames, manual setting in subsequent frames, and the computational cost of video matting are tabulated in Table 4. As observed from this table, the proposed method outperforms the two state-of-the-art video matting methods based on spectral matting.

The proposed video matting method is unlike traditional video object segmentation methods. Video object segmentation methods can be roughly classified into two types, background construction-based and foreground extraction-based, and both can be affected by varied backgrounds. The alpha mattes obtained using the proposed video matting method are not affected by varied backgrounds, not only uniform ones. Furthermore, in contrast to automatic video matting based on closed-form matting and motion detection [4–6], the proposed automatic spectral video matting works well for dynamic backgrounds. Fig. 14 shows the video composition of the foregrounds of the Amira and Kim sequences obtained using the proposed method with new backgrounds. The results show that the proposed method produces realistic video sequences. The ground truths of all tested images and video sequences can be downloaded from http://cc15.npu.edu.tw/~wchu/research.htm.

6. Conclusion

An automatic spectral video matting method was proposed. Adaptive component detection is used to automatically generate reliable components of a given image based on the mean shift and spectral segmentation with the k-means algorithm. The obtained reliable components are used in the proposed spectral matting based on the hue difference of components to obtain an accurate alpha matte of the first frame without user intervention. Finally, the component-matching-based spectral matting is used in subsequent frames to obtain automatic video matting.

This paper makes three major contributions. (i) The adaptive component detection can automatically obtain the number of clusters to generate reliable components of a given image without user intervention, which overcomes the problem of the user having to give the number of clusters in spectral matting.


Table 4 Comparative results of the proposed method and the two state-of-the-art algorithms.

                                     Semi-automatic video matting [13]        Video matting [8]                       Proposed method
Number of clusters                   Manually set                             Manually set                            Automatically set
Type of video matting                Frame-by-frame                           Frame-by-frame                          Frame-by-frame
Image matting of the first frame     Spectral matting and user intervention   Spectral matting based on the color     Spectral matting based on the hue
                                                                              information of matting components       difference of components
Image matting of subsequent frames   Spectral matting and optical flow        Spectral matting based on the color     Component-matching-based
                                                                              information of matting components       spectral matting
Manually set in subsequent frames    Yes                                      No                                      No
Computational cost                   Medium                                   High                                    Low

Fig. 14. Video composition. (a) New Amira sequence and (b) New Kim sequence.

(ii) The proposed spectral matting based on the hue difference of components can automatically obtain an alpha matte that is more accurate than those obtained using state-of-the-art image matting methods based on spectral matting. (iii) The proposed video matting using component-matching-based spectral matting can automatically and efficiently obtain results that are more accurate than those obtained using state-of-the-art video matting methods based on spectral matting. Experimental results show that the proposed automatic spectral video matting performs well in component detection, image matting, and video matting. Therefore, it is a useful tool for video composition, video editing, and special effects for film and TV.

Acknowledgment

This paper was supported by the National Science Council, Taiwan, under grant no. NSC101-2221-E-346-011. The authors gratefully acknowledge the helpful comments and suggestions of the Associate Editor and reviewers, which have improved the presentation.

References

[1] T. Porter, T. Duff, Compositing digital images, Computer Graphics 18 (1984) 253–259.
[2] J. Wang, M.F. Cohen, Image and video matting: a survey, Foundations and Trends in Computer Graphics and Vision 3 (2) (2007) 1–78.

[3] A. Levin, D. Lischinski, Y. Weiss, A closed-form solution to natural image matting, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2) (2008) 228–242.
[4] A. Jain, M. Agrawal, A. Gupta, V. Khandelwal, A novel approach to video matting using automated scribbling by motion analysis, in: Proceedings of the IEEE International Conference on Virtual Environments, Human-Computer Interfaces, and Measurement Systems, 2008, pp. 25–30.
[5] A. Gupta, S. Mangal, P. Nagori, A. Jain, V. Khandelwal, Video matting by automatic scribbling using quadra directional filling of segmented frames, in: Proceedings of the 2nd IEEE International Conference on Computer Science and Information Technology, 2009, pp. 336–340.
[6] W.-C. Hu, D.-Y. Huang, C.-Y. Yang, J.-F. Hsu, Automatic video object segmentation with opacity estimate, in: Proceedings of the 4th International Conference on Genetic and Evolutionary Computing, 2010, pp. 683–686.
[7] A. Levin, A. Rav-Acha, D. Lischinski, Spectral matting, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (10) (2008) 1699–1712.
[8] J.-Z. Wang, C.-H. Li, Spectral matting based on color information of matting components, Advances in Wireless Networks and Information Systems 72 (2010) 119–130.
[9] W.-C. Hu, J.-J. Jhu, C.-P. Lin, Unsupervised and reliable image matting based on modified spectral matting, Journal of Visual Communication and Image Representation 23 (4) (2012) 665–676.
[10] S. Sural, G. Qian, S. Pramanik, Segmentation and histogram generation using the HSV color space for image retrieval, in: Proceedings of the 2002 International Conference on Image Processing, 2002, pp. 589–592.
[11] L. Zhengjun, Z. Shuwu, An improved image retrieval method based on the color histogram, Control and Automation Publication Group 24 (2-1) (2008) 246–247.
[12] F.-D. Jou, K.-C. Fan, Y.-L. Chang, Efficient matching of large-size histograms, Pattern Recognition Letters 25 (3) (2004) 277–286.
[13] M. Eisemann, J. Wolf, M. Magnor, Spectral video matting, in: Proceedings of Vision, Modeling and Visualization, 2009, pp. 121–126.
[14] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5) (2002) 603–619.


[15] W. Tao, H. Jin, Y. Zhang, Color image segmentation based on mean shift and normalized cuts, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37 (5) (2007) 1382–1389.
[16] A. Ng, M. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: Proceedings of Advances in Neural Information Processing Systems, 2001, pp. 849–856.
[17] I.-C. Chang, C.-J. Hsieh, Image forgery using enhanced Bayesian-based matting algorithm, Intelligent Automation and Soft Computing 17 (2) (2011) 269–281.
[18] Y.-Y. Chuang, B. Curless, D.H. Salesin, R. Szeliski, A Bayesian approach to digital matting, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, 2001, vol. 2, pp. 264–271.
[19] Y. Guan, W. Chen, X. Liang, Z. Ding, Q. Peng, Easy matting: a stroke based approach for continuous image matting, Computer Graphics Forum 25 (3) (2008) 567–576.
[20] J. Wang, M. Cohen, Optimized color sampling for robust matting, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, 2007.
[21] J. Sun, J. Jia, C.-K. Tang, H.-Y. Shum, Poisson matting, in: Proceedings of the International Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH 2004), 2004, vol. 23, no. 3, pp. 315–321.
[22] Y.-Y. Chuang, B. Curless, D.H. Salesin, R. Szeliski, A Bayesian approach to digital matting, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2001, vol. 2, pp. 264–271.

Wu-Chih Hu received his Ph.D. degree in electrical engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 1998. He has worked at National Penghu University of Science and Technology since 1998, where he is currently chairman of and an associate professor in the Department of Computer Science and Information Engineering. He has published more than 100 papers in journals and conference proceedings since 1998. His current research interests include computer vision, image processing, pattern recognition, digital watermarking, visual surveillance, and video processing.

Jung-Fu Hsu received his M.S. degree from the Graduate Institute of Electrical Engineering and Computer Science, National Penghu University of Science and Technology, Taiwan, in 2011. His recent research interests include image processing and video processing.