Optik 127 (2016) 396–400
Completion of images of historical artifacts based on salient shapes

Mang Xiao a,∗, Guangyao Li a, Lei Peng a,b, Yangjian Lv a, Yuhang Mao a

a College of Electronics and Information Engineering, Tongji University, Shanghai, PR China
b School of Information Engineering, Tai'an College, Shandong, PR China
Article history: Received 8 December 2014; accepted 9 October 2015.

Keywords: Global optimization; historical artifact; image completion; image inpainting; salient shape reconstruction.
Abstract

Protecting images of historical artifacts is of great value and cultural significance. In this study, we present a method for completing images of complex historical artifacts by referring to objects with similar shapes in other images. Our approach includes two fundamental stages. First, we reconstruct the salient shape of the damaged object using shape point set registration and curve fitting. Second, based on a shape guide map and gradients, we generate a new energy to complete damaged images of historical artifacts in terms of their shape and semantics. We obtained promising results with multiple images of historical artifacts, thereby demonstrating the superior performance of our proposed method compared with existing approaches.
1. Introduction

Image completion techniques solve the problem of filling a target region (or "hole") in an image. This is a difficult problem in computer vision because the completed image must be visually plausible in terms of its shape and texture. In particular, completing images of damaged historical artifacts remains a challenging problem.

Two main types of image completion techniques are available. The first, classical type is based on a single image. In some studies, holes have been filled with a diffusion-based method [1,2], which is known as "inpainting". However, this method produces blurring, and the results obtained after the repair process may be locally discontinuous in large-scale damaged images. Therefore, the exemplar-based method [3] is used to fill large damaged regions. The methods described in [4] proceed in a greedy manner, whereas Wexler et al. [5] employed an optimization method with a well-defined objective function, which yielded more coherent results. This approach is computationally expensive, although the fast PatchMatch method [6] relieves this problem greatly. It is difficult to obtain the full structure of an image using patch translation alone; thus, some methods [7–9] employ photometric and geometric transformations to address this issue.
The second type fills the target region with more information based on the structure and texture in multiple images [10]. For example, Hays and Efros [11] completed the target regions in scenes using graph cuts with an image of a similar scene from a huge image database, where the method was based on image retrieval technology. Mobahi et al. [12] used a similar approach to fill target regions that include a damaged foreground object. Tang et al. [13] used a boundary band map to reconstruct the structure of a damaged foreground object and to fill target regions according to a greedy strategy. However, highly accurate object reconstruction and the consistent filling of target regions based on a sample image are still challenging problems.

In the present study, we present an innovative method for completing the target regions in images of historical artifacts using similar sample images. Our method identifies similar objects based on their salient shapes. The shape of the damaged object can be reconstructed from the salient shape of the sample object using the point set registration technique [14], even when there is a relatively large degree of deformation between the shapes. Furthermore, the texture of the sample object can be used to fill the target region by histogram specification and by photometric and geometric transformation.

The three main contributions of this study are as follows. First, the salient shape of the damaged object is reconstructed precisely. Second, the damaged image is completed in a seamless and consistent manner. Third, a global optimization method is proposed for completing images of historical artifacts using sample images.
2. Salient shape reconstruction

The salient shape reconstruction process comprises two main steps. First, known shapes are extracted from the segmented foreground objects in the damaged image and the sample image. Next, the reconstructed global shape of the damaged foreground object is calculated using the non-rigid registration method based on the two known shapes.
2.1. Shape extraction

A salient shape comprises the core structure of the foreground object. The foreground objects are segmented with the GrabCut method [15] and the results are shown in Fig. 1(b and e). The shape of the foreground object is then extracted by Canny edge detection [16] and the results are shown in Fig. 1(c and f).
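To make this step concrete, the following is a minimal sketch of the extraction pipeline using OpenCV's GrabCut and Canny implementations. The bounding rectangle, edge thresholds, iteration count, and the choice to run Canny on the segmentation mask are illustrative assumptions, not values from the paper.

```python
# Sketch of the shape-extraction step: GrabCut segmentation followed by
# Canny edge detection on the resulting foreground mask.
import cv2
import numpy as np

def extract_salient_shape(image, rect, canny_lo=50, canny_hi=150):
    """Segment the foreground with GrabCut, then trace its shape with Canny."""
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # internal GMM state for GrabCut
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5,
                cv2.GC_INIT_WITH_RECT)
    # Pixels marked as definite or probable foreground form the object mask.
    fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                       255, 0).astype(np.uint8)
    foreground = cv2.bitwise_and(image, image, mask=fg_mask)
    # Edges of the segmented object give the salient shape (cf. Fig. 1(c, f)).
    shape_edges = cv2.Canny(fg_mask, canny_lo, canny_hi)
    return foreground, shape_edges

# Example usage with a hypothetical bounding box around the artifact:
# img = cv2.imread("damaged.png")
# fg, shape = extract_salient_shape(img, rect=(10, 10, 300, 400))
```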
2.2. Shape reconstruction

We use the point set registration method to reconstruct the global shape of the damaged object. First, we provide a brief introduction to non-rigid point set registration, which utilizes a robust estimator called the L2-minimizing estimate (L2E) criterion [14]. Non-rigid registration helps to determine the correct correspondence between two point sets, such as the shapes extracted from two objects. The goal is to estimate a transformation f : y_i = f(x_i) that fits the inliers. To obtain the L2E estimator for the model f(x|θ), we minimize the following function with respect to the parameter θ:

$$\mathrm{L2E}(\theta) = \int f(x \mid \theta)^2\,dx \;-\; \frac{2}{n}\sum_{i=1}^{n} f(x_i \mid \theta) \qquad (1)$$

We assume that the noise on the inliers is Gaussian distributed with uniform standard deviation σ and zero mean. Then we obtain:

$$\mathrm{L2E}(f, \sigma^2) = \frac{1}{2^{d}(\pi\sigma^2)^{d/2}} \;-\; \frac{2}{n}\sum_{i=1}^{n} \Phi\!\left(y_i - f(x_i) \mid 0, \sigma^2 I\right) \qquad (2)$$

where d is the dimension of the points, the identity matrix I has size d × d, and Φ denotes the Gaussian density. An inlier point correspondence (x_i, y_i) satisfies y_i − f(x_i) ∼ N(0, σ²I). We define a reproducing kernel Hilbert space H with a positive definite matrix-valued kernel Γ : R^d × R^d → R^{d×d}. The optimal transformation f takes the form:

$$f(x) = \sum_{i=1}^{m} \Gamma(x, \tilde{x}_i)\, c_i \qquad (3)$$

where c_i is a d × 1 coefficient vector, and the chosen point set {x̃_i}_{i=1}^{m} is somewhat analogous to a set of "control points". Substituting Eq. (3) into Eq. (2) and adding a regularization term yields:

$$\mathrm{L2E}(C, \sigma^2) = \frac{1}{2^{d}(\pi\sigma^2)^{d/2}} \;-\; \frac{2}{n}\sum_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{d/2}}\, e^{-\frac{\left\| y_i^{T} - U_{i,\cdot}\, C \right\|^2}{2\sigma^2}} \;+\; \lambda\, \mathrm{tr}\!\left(C^{T}\Gamma C\right) \qquad (4)$$

where the kernel matrix Γ ∈ R^{m×m} with Γ_ij = κ(x̃_i, x̃_j) = e^{−β‖x̃_i − x̃_j‖²} is named the Gram matrix, U ∈ R^{n×m} with U_ij = κ(x_i, x̃_j) = e^{−β‖x_i − x̃_j‖²}, U_{i,·} is the ith row of the matrix U, tr(·) is the trace, and C = (c_1, ..., c_m)^T denotes the coefficient matrix.

The shapes of the damaged foreground object and the sample foreground object are shown in Fig. 2(a). Fig. 2(b) demonstrates the result of the registration. The global shape of the damaged foreground object is reconstructed successfully, as shown in Fig. 2(c), revealing that the reconstructed outline is complete and semantically correct. We set β = 0.8 and λ = 0.1 in the algorithm. The parameters σ² and C were initialized to 0.05 and 0, respectively.
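For illustration, the sketch below evaluates the transformation model of Eq. (3) and the regularized L2E objective of Eq. (4) with NumPy. It assumes the matrix-valued kernel reduces to a scalar Gaussian kernel (as in the Gram matrix above), and it deliberately omits the full optimization over C and σ² described in [14]; all function names are ours.

```python
# Numerical sketch of Eqs. (3)-(4): Gaussian Gram matrix over control points
# and the regularized L2E energy for a candidate (C, sigma^2).
import numpy as np

def gram_matrix(pts_a, pts_b, beta=0.8):
    """Gaussian kernel kappa(a, b) = exp(-beta * ||a - b||^2), elementwise."""
    sq_dist = np.sum((pts_a[:, None, :] - pts_b[None, :, :]) ** 2, axis=2)
    return np.exp(-beta * sq_dist)

def apply_transform(x, control_pts, C, beta=0.8):
    """Evaluate f(x) = sum_i kappa(x, x~_i) c_i  (Eq. (3))."""
    U = gram_matrix(x, control_pts, beta)  # n x m kernel evaluations
    return U @ C                           # n x d transformed points

def l2e_objective(Y, X, control_pts, C, sigma2, lam=0.1, beta=0.8):
    """Regularized L2E energy of Eq. (4) for point sets X -> Y."""
    n, d = Y.shape
    residual = Y - apply_transform(X, control_pts, C, beta)
    gauss = np.exp(-np.sum(residual ** 2, axis=1) / (2.0 * sigma2))
    data_term = -(2.0 / n) * np.sum(gauss) / (2.0 * np.pi * sigma2) ** (d / 2.0)
    K = gram_matrix(control_pts, control_pts, beta)     # Gram matrix Gamma
    reg_term = lam * np.trace(C.T @ K @ C)              # smoothness penalty
    const = 1.0 / (2.0 ** d * (np.pi * sigma2) ** (d / 2.0))
    return const + data_term + reg_term
```

In the paper's setting one would initialize σ² = 0.05 and C = 0 and minimize this energy (e.g., by gradient-based optimization) to align the two extracted shapes.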
Fig. 1. Shape extraction. (a) Damaged image. (b) Damaged object. (c) Damaged object's shape. (d) Sample image. (e) Sample object. (f) Sample object's shape.
Fig. 2. Shape reconstruction. (a) Shapes of the two foreground objects. (b) Alignment of the two shapes using the CPD registration method. (c) The salient shape is reconstructed.
3. Filling the target regions

In this section, we describe our method for filling holes. We use histogram specification to transform the color of the sample foreground object before implementing texture synthesis. Next, we generate a new energy to complete the damaged image of a historical artifact in terms of its shape and semantics.

3.1. Histogram specification

Histogram specification [17] is an extension of histogram equalization. We specify the color histogram of the damaged foreground object in the sample foreground, which changes the distribution of the pixel values without affecting the structures in the image. Fig. 3 shows the specification results.

Fig. 3. Results obtained after histogram specification in the sample foreground. (a) Sample image. (b) The results of histogram specification.
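As an illustration of this step, the following is a minimal per-channel histogram-specification sketch based on the standard CDF-matching construction in [17]. The function names and the assumption that channels can be matched independently are ours, not the paper's.

```python
# Histogram specification: remap the sample foreground's pixel values so
# that each channel's CDF matches that of the damaged foreground.
import numpy as np

def match_channel(source, reference):
    """Remap 'source' values so their histogram matches 'reference'."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size   # normalized CDFs
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source intensity, pick the reference intensity whose
    # cumulative probability is closest (linear interpolation).
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx].reshape(source.shape)

def specify_histogram(sample_fg, damaged_fg):
    """Apply the specification independently to each color channel."""
    out = np.empty(sample_fg.shape, dtype=np.float64)
    for c in range(sample_fg.shape[2]):
        out[..., c] = match_channel(sample_fg[..., c], damaged_fg[..., c])
    return out
```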
3.2. Global patch-based optimization

We divide the images into the target region T and source regions S. The source regions include the region S1 of the sample object in the sample image and the known region S2 in the damaged image. The target region in the damaged image may be inconsistent due to geometric transformations or variable spatial illumination. To address the optimization problem of filling the target regions, we use the new energy function in Eq. (5):

$$E(T, S) = \sum_{q \subset T} \min_{p \subset S}\left( E(Q, P) + E(\nabla Q, \nabla P) + \beta\, E_{shape}(Q, P) \right) \qquad (5)$$
where Q = N(q) denotes a target patch of size w × w with pixel q at the patch's top-left corner, and P = f(N(p)) denotes a w × w source patch obtained by applying a photometric and geometric transformation f to a small neighborhood N around pixel p. Each patch has five channels at every pixel, (L, a, b, ∇_x L, ∇_y L), in the (L*, a*, b*) color space: (L, a, b) denotes the three color channels, and (∇_x L, ∇_y L) denotes the two gradient channels that estimate the change in luminance. For simplicity of notation, we refer to Q (or P) as the patch's three color channels and to ∇Q (or ∇P) as the luminance's two gradient channels. The transformation f includes rotation, translation, reflection, and non-uniform scaling.

We define the color matching term with the following function:

$$E(Q, P) = \sum_{i} \left\| Q_i - P_i \right\|^2 \qquad (6)$$

where Q_i is the color of the ith pixel in the target patch Q and P_i is the color of the ith pixel in the source patch P. Similarly, the gradient matching term is:

$$E(\nabla Q, \nabla P) = \sum_{i} \left\| \nabla Q_i - \nabla P_i \right\|^2 \qquad (7)$$

where ∇Q_i is the gradient of the ith pixel in the target patch Q and ∇P_i is the gradient of the ith pixel in the source patch P.
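For concreteness, the sketch below evaluates the color and gradient matching terms of Eqs. (6) and (7) for a single target/source patch pair. The sum-of-squared-differences form and the array layout are assumptions consistent with patch-based optimization methods such as [5] and [9].

```python
# Color and gradient matching terms for one patch pair (Eqs. (6)-(7)).
import numpy as np

def patch_energy(Q, P, grad_Q, grad_P):
    """E(Q, P) + E(grad Q, grad P) for a single target/source patch pair.

    Q, P           : (w, w, 3) Lab color patches
    grad_Q, grad_P : (w, w, 2) luminance gradient patches
    """
    color_term = np.sum((Q.astype(np.float64) - P.astype(np.float64)) ** 2)
    grad_term = np.sum((grad_Q - grad_P) ** 2)
    return color_term + grad_term
```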
The shape guidance term is defined as:

$$E_{shape}(Q, P) = W_{pos}(Q, P)\, D\!\left( G_{pos}(Q) - G^{j}_{pos}(P) \right) + W_{id}(Q, P)\left[\, G_{id}(Q) \neq G^{j}_{id}(P) \,\right] \qquad (8)$$

where D(·) is the ε-insensitive deviation function D(x) = max(x − ε, 0), and the indicator function [·] is 0 if its argument is false and 1 otherwise. The first term penalizes target and source patches that have different values of G_pos, where G_pos(Q) indicates the distance of the target patch Q from the reconstructed shape of the damaged foreground object, and the superscript j denotes whether the patch comes from source region S1 or S2. The weights W_pos and W_id locally enable or disable each term. The second term penalizes source patches copied from positions with different labels, as shown in Fig. 4.

Fig. 4. Examples of shape guidance maps. (a) The labels of the damaged foreground object are red, and the labels of the corresponding source regions are also shown in red in (b) and (c). The labels of the target shape in the damaged foreground object are blue in (a), and the labels of the corresponding source shape are also shown in blue in (b) and (c). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

In general, given the large solution space and the cost of evaluating the energy of one solution, it is intractable to obtain a globally optimal completion of the image. The method proposed by Wexler et al. [5] is an approximate optimization scheme, which comprises two iterative steps called patch search and pixel filling.

(1) Patch search. For every target patch in the damaged image, the nearest neighbor patch is sought in the known region or the sample image so as to minimize the value of Eq. (5). We extend the PatchMatch algorithm so that it not only handles translations, scales, and rotations, but also copes with non-uniform scaling and reflections. In addition, to gain invariance to small changes in illumination, color, and exposure, we apply bias b and gain g adjustments to each color channel of a source patch. We define the gain as g(P^c) = min(max(σ(Q^c)/σ(P^c), g_min), g_max) and the bias as b(P^c) = min(max(μ(Q^c) − g(P^c) μ(P^c), b_min), b_max), where c denotes each color channel (L, a, b), σ(·) and μ(·) are the standard deviation and mean of the patch at channel c, and [b_min, b_max] and [g_min, g_max] are the bias and gain ranges. These are applied to regulate the colors of the patch: P^c ← g(P^c) P^c + b(P^c).
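A minimal sketch of this gain/bias adjustment is given below. The clamped formulas are reconstructed following the scheme of Image Melding [9], which this step follows; the default ranges shown are those reported in Section 4, and the function name is illustrative.

```python
# Per-channel gain/bias adjustment of a source patch P toward a target
# patch Q:  P_c <- g(P_c) * P_c + b(P_c).
import numpy as np

def adjust_patch(P, Q, g_range=(0.8, 1.3), b_range=(-10.0, 10.0)):
    """Match the mean and spread of each channel of P to those of Q."""
    out = np.empty(P.shape, dtype=np.float64)
    for c in range(P.shape[2]):
        # Clamped gain from the ratio of standard deviations.
        g = np.clip(Q[..., c].std() / max(P[..., c].std(), 1e-6), *g_range)
        # Clamped bias aligning the channel means after applying the gain.
        b = np.clip(Q[..., c].mean() - g * P[..., c].mean(), *b_range)
        out[..., c] = g * P[..., c] + b
    return out
```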
(2) Voting (pixel filling). Eq. (5) is a sum over all patch terms. Thus, the optimal damaged image satisfies the following function:

$$T = \arg\min_{I}\left( E(I, \bar{T}) + E(\nabla I, \bar{T}^{\nabla}) + \beta\, E_{shape}(I, \bar{T}) \right) \qquad (9)$$

where I has the same size as T̄ and T̄^∇. The value of pixel (i, j) in T̄ or T̄^∇ is computed as follows:

$$\bar{T}(i, j) = \frac{1}{w^2}\sum_{k=0}^{w-1}\sum_{l=0}^{w-1} NN(Q_{i-k,\,j-l})(k, l) \qquad (10)$$

$$\bar{T}^{\nabla}(i, j) = \frac{1}{w^2}\sum_{k=0}^{w-1}\sum_{l=0}^{w-1} \nabla NN(Q_{i-k,\,j-l})(k, l) \qquad (11)$$

where NN(Q_{i,j}) denotes the nearest neighbor source patch of the target patch Q_{i,j}, and NN(Q_{i,j})(k, l) denotes pixel (k, l) within that patch. T̄ holds the average colors of the target region filled with the overlapping transformed patches, and T̄^∇ is computed in the same way from the gradients.
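The voting step of Eqs. (10) and (11) can be sketched as follows. The interface (a map from each target patch's top-left corner to its already-transformed nearest-neighbor source patch) is an illustrative assumption; border pixels are normalized by their actual coverage rather than by a fixed w².

```python
# Voting: every target pixel is set to the average of the colors proposed
# by all w x w nearest-neighbor patches that overlap it (Eq. (10)).
import numpy as np

def vote_colors(target_shape, nn_patches, w):
    """Average overlapping nearest-neighbor patches into a color image.

    target_shape : (H, W, 3) shape of the output image
    nn_patches   : {(i, j): (w, w, 3) patch} keyed by top-left corner
    """
    acc = np.zeros(target_shape, dtype=np.float64)
    weight = np.zeros(target_shape[:2], dtype=np.float64)
    for (i, j), patch in nn_patches.items():
        acc[i:i + w, j:j + w] += patch       # accumulate patch votes
        weight[i:i + w, j:j + w] += 1.0
    weight = np.maximum(weight, 1.0)         # avoid division by zero
    return acc / weight[..., None]           # per-pixel average

# The gradient image of Eq. (11) is obtained the same way by voting with
# the gradients of the nearest-neighbor patches instead of their colors.
```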
4. Experiments

The experiments were performed using a system with an Intel Core™ i7-4700k 3.5 GHz processor. We set the patch size to 7 × 7. We defined the search range as [0.8, 1.3] for uniform scale, [0.9, 1.1] for horizontal or vertical scales, and [−π/2, π/2] for rotation. The range of the bias for all three channels was [−10, 10], and for the gain it was [0.8, 1.3]. We set fixed values for the gradient weight and the shape weight. The histogram matching technique was implemented for each YCbCr channel to avoid color distortion. During image synthesis, the gap width w_g was set to five pixels horizontally or vertically. The PatchMatch iteration range was [20, 30] for updating the nearest neighbor field. The algorithm was fairly robust under these parameters.

We compared our image completion approach with existing techniques, i.e., the methods of Wexler et al. [5] and Hays and Efros [11], to demonstrate the efficiency and robustness of the proposed algorithm. Fig. 5 shows the results obtained using the three approaches with six damaged images. It should be mentioned that our approach obtained complementary results using sample images and source images. Therefore, our method performed better in terms of both continuity and the visual effect.
Fig. 5. Comparison of the results obtained using different methods. (a) Damaged image. (b) Sample image. (c) Results obtained using the method of Wexler et al. (d) Results obtained using the method of Hays et al. (e) Results obtained using our method.
5. Conclusion

In this study, we addressed the problem of completing images of complex historical artifacts. To avoid disrupting the structure of the damaged object, we proposed reconstructing the salient shape of the damaged object from the salient shape of a sample object based on shape point set registration. To match the texture and shape of the damaged object and the sample object, we enriched the patch search space using photometric and geometric transformations, and we defined a shape guidance term. We then solved the hole-filling problem by treating it as a patch-based optimization problem using the expectation-maximization algorithm. We evaluated our approach with many images of historical artifacts and obtained promising results in all cases.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) (60771065, 51378365).

References

[1] T.F. Chan, S.H. Kang, J. Shen, Euler's elastica and curvature-based inpainting, SIAM J. Appl. Math. (2002) 564–592.
[2] M. Bertalmio, G. Sapiro, V. Caselles, C. Ballester, Image inpainting, in: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), ACM Press/Addison-Wesley, 2000, pp. 417–424.
[3] A.A. Efros, T.K. Leung, Texture synthesis by non-parametric sampling, in: Proceedings of the Seventh IEEE International Conference on Computer Vision (ICCV), vol. 2, IEEE, 1999, pp. 1033–1038.
[4] A. Criminisi, P. Pérez, K. Toyama, Region filling and object removal by exemplar-based image inpainting, IEEE Trans. Image Process. 13 (9) (2004) 1200–1212.
[5] Y. Wexler, E. Shechtman, M. Irani, Space–time completion of video, IEEE Trans. Pattern Anal. Mach. Intell. 29 (3) (2007) 463–476.
[6] C. Barnes, E. Shechtman, A. Finkelstein, D.B. Goldman, PatchMatch: a randomized correspondence algorithm for structural image editing, ACM Trans. Graph. 28 (3) (2009) 24.
[7] A. Mansfield, M. Prasad, C. Rother, T. Sharp, P. Kohli, L.J. Van Gool, Transforming image completion, in: BMVC, 2011, pp. 1–11.
[8] J.-B. Huang, J. Kopf, N. Ahuja, S.B. Kang, Transformation guided image completion, in: 2013 IEEE International Conference on Computational Photography (ICCP), IEEE, 2013, pp. 1–9.
[9] S. Darabi, E. Shechtman, C. Barnes, D.B. Goldman, P. Sen, Image melding: combining inconsistent images using patch-based synthesis, ACM Trans. Graph. 31 (4) (2012) 82.
[10] M. Xiao, G. Li, L. Xie, Y. Tan, Y. Mao, Contour-guided image completion using a sample image, J. Electron. Imaging 24 (2) (2015) 023029.
[11] J. Hays, A.A. Efros, Scene completion using millions of photographs, ACM Trans. Graph. 26 (3) (2007) 4.
[12] H. Mobahi, S.R. Rao, Y. Ma, Data-driven image completion by image patch subspaces, in: Picture Coding Symposium, IEEE, 2009, pp. 1–4.
[13] C. Tang, X. Hu, L. Chen, G. Zhai, X. Yang, Sample-based image completion using structure synthesis, J. Vis. Commun. Image Represent. 24 (7) (2013) 1115–1123.
[14] J. Ma, J. Zhao, J. Tian, Z. Tu, A.L. Yuille, Robust estimation of nonrigid transformation for point set registration, in: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2013, pp. 2147–2154.
[15] C. Rother, V. Kolmogorov, A. Blake, GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. 23 (3) (2004) 309–314.
[16] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.
[17] R.C. Gonzalez, R.E. Woods, Digital Image Processing, 3rd ed., Prentice-Hall, 2008, pp. 122–138 (Ch. Intensity Transformations and Spatial Filtering).