Pattern Recognition Letters 128 (2019) 226–230
Exemplar based regular texture synthesis using LSTM

Xiuxia Cai (a), Bin Song (a,∗), Zhiqian Fang (b)

(a) State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China
(b) Zhengzhou Tiamaes Tech Co., Ltd., 10 Building, No. 316 Lianhua Street, Zhengzhou High-tech Industrial Development Zone, Zhengzhou, China
∗ Corresponding author. E-mail address: [email protected] (B. Song).

Article history: Received 23 November 2017; Revised 4 September 2019; Accepted 6 September 2019; Available online 6 September 2019

MSC: 41A05; 41A10; 65D05; 65D17

Abstract: Exemplar-based texture synthesis is an important technique in image processing and in texture mapping for computer graphics. Great progress has been made in this field, yet both traditional methods and modern deep learning methods make errors when synthesizing patterned textures because they fail to capture the regularity of the texture. To obtain better synthesized results, a new framework for regular texture synthesis is proposed in this paper: we use a recurrent neural network (RNN) with long short-term memory (LSTM) to produce a regular texture from an exemplar. Our method can generate textures of any size without errors, which is an improvement for texture synthesis with deep learning techniques. Compared with traditional methods as well as deep learning methods, our method is clearly better at synthesizing regular textures. © 2019 Elsevier B.V. All rights reserved.

Keywords: Deep learning; Texture synthesis; Exemplar; LSTM

1. Introduction

Texture synthesis is one of the most fundamental techniques of texture mapping in computer graphics and an important technique of image processing. It can be applied to various fields such as image melding and inpainting [2,3,16]. There are two broad categories of methods for obtaining a seamless, photo-realistic texture: non-parametric, exemplar-based texture synthesis [6], and procedural methods that rely on a parameterized procedure [13,29]. Because a specific given pattern is hard to reproduce with a parameterized procedure, numerous studies have adopted the exemplar-based approach [4,21]. The method proposed in this paper also belongs to the exemplar-based family. A new texture generated from a regular exemplar by our technique faithfully presents the main visual characteristics of the exemplar without unnatural-looking artifacts. Gatys et al. [7] proposed using convolutional neural networks (CNNs) for texture synthesis, a milestone for this technique, and many extended applications followed, such as [8,9].




However, neither traditional methods nor the recent CNN- and GAN-based methods deal well with the synthesis of regular textures, as shown in Fig. 1. The unnatural-looking artifacts arise because previous works fail to capture the regularity of patterned textures. To address this issue, we propose a recurrent neural network (RNN) with long short-term memory (LSTM) to generate a larger regular texture from an exemplar. LSTM networks can effectively handle long-range dependencies, which are central to object and scene understanding. We process the image pixel by pixel and let the LSTM handle the pixels end to end. This structure ensures that signals propagate well in space, so a regular texture can be treated as a periodic signal, and the number of hidden units of the LSTM is set to the length of the signal period. We can generate the synthesized texture at any desired size, which is also an improvement for texture synthesis with deep learning techniques.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the framework of our texture synthesis: Section 3.1 introduces the model used in this paper, and Section 3.2 explains how textures are generated without errors. Section 4 provides several synthesis examples. Section 5 concludes the paper and, given the limitations of our algorithm, outlines future work.


Image generation techniques have been applied to image processing tasks such as super-resolution and image stylization [10,37] and have achieved good results. As an excellent recurrent neural network, the LSTM is applied in this paper to generate regular textures of arbitrary size.

Fig. 1. One example of the synthesis results of a traditional method, a CNN method, and our method.

2. Related work

2.1. Texture synthesis

In general, non-parametric methods include pixel-based, block-based, and optimization-based methods [34]. Pixel-based and patch-based methods are the earliest and also the most widely used in texture synthesis. Darabi et al. [5] demonstrated state-of-the-art results in various image editing and synthesis applications, and their method is frequently used in later optimization-based work. Wu et al. [35] improved optimization-based synthesis by adding an extra curved-contours channel, which exploits the shape similarity of curvilinear features and can effectively handle multiple layers and synthesize valid instances of interaction. Kaspar et al. [14] introduced self-tuning texture synthesis, which belongs to the optimization-based methods. However, none of the methods mentioned above can generate regular textures without artifacts. Aguerrebere et al. [1] explain the traditional exemplar-based texture synthesis method in detail, and recent work [14] has improved the efficiency and performance of traditional exemplar-based texture synthesis.

Lately, deep learning has become a popular approach in image processing. Versteegen et al. [33] introduced a simple new framework for texture modelling with Markov-Gibbs random fields (MGRF), which is a further improvement of the traditional algorithms. Afterwards, motivated by the similarities between convolutional neural networks and biological vision [12,36], texture synthesis using CNNs became popular in numerous extended applications [2,15]. However, this kind of technique cannot yet generate synthesized textures of arbitrary size: CNN-based texture synthesis only produces textures of the same size as the exemplar at the same scale. In this paper, we aim to generate textures of any size using an LSTM, which also belongs to the deep learning family.

3. Our method

The framework of our method is shown in Fig. 2. First, we train the LSTM on the exemplar (the learning stage); then we use the trained LSTM to generate a new texture (the generating stage). Section 3.1 introduces the mathematical model used in this paper, and Section 3.2 details the texture generation steps.

3.1. Model

The probability distributions in Eqs. (1) and (2) are used to estimate the distribution of pixel values. We design and train our LSTM model according to these two equations and expect the model to approximate them when generating pixel values. Let x be an exemplar and x_{ij} the pixel at location (i, j). Our aim is to estimate the distribution of x, which we write as a product of conditional distributions over the pixels:

$$p(\mathbf{x}) = \prod_{i,j} p\left(x_{ij} \mid x_{<ij}\right) \tag{1}$$

where x_{<ij} denotes the set of pixels x_{mn} with m < i, or with m = i and n < j, i.e., all pixels above x_{ij} together with the pixels to its left in the same row (as shown in Fig. 3). For an RGB image, we assume each color channel is conditioned on the channels generated before it. Equation (1) can then be rewritten as follows:

$$p(\mathbf{x}) = \prod_{i,j} p\left(x_{ij,R} \mid x_{<ij}\right)\, p\left(x_{ij,G} \mid x_{<ij}, x_{ij,R}\right)\, p\left(x_{ij,B} \mid x_{<ij}, x_{ij,R}, x_{ij,G}\right) \tag{2}$$

where x_{ij,R}, x_{ij,G} and x_{ij,B} denote the pixel values of the three color channels. Each channel variable x_{ij,∗} takes a discrete value in [0, 255].
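To make the factorization of Eqs. (1) and (2) concrete, the sketch below (ours, not the authors' code) shows the raster-scan sampling order they imply: pixels are generated row by row from left to right, and within each pixel the R, G and B channels are sampled in turn, each conditioned on everything generated so far. The function sample_channel is a hypothetical stand-in for the learned conditional model.

```python
import numpy as np

def synthesize(height, width, sample_channel):
    """Generate an RGB image pixel by pixel in raster-scan order.

    sample_channel(img, i, j, c) is a placeholder for the learned
    conditional p(x_{ij,c} | x_{<ij}, earlier channels of x_{ij});
    it must return an integer intensity in [0, 255].
    """
    img = np.zeros((height, width, 3), dtype=np.uint8)
    for i in range(height):          # top to bottom
        for j in range(width):       # left to right
            for c in range(3):       # R, then G, then B, as in Eq. (2)
                img[i, j, c] = sample_channel(img, i, j, c)
    return img
```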

2.2. Image generation using deep learning

In recent years, deep learning has become popular for image generation [30], and many good works have emerged during this period. The core aim of image generation research is to restore the source image into a high-quality or new-style image; the generated image usually has the same size as the source image. Some of these methods are CNN-based, such as Neural Autoregressive Distribution Estimation (NADE) [32] and PixelCNN [25]; CNN-based image generation makes full use of the theory of Restricted Boltzmann Machines (RBM) and auto-encoding. Other techniques are RNN-based, such as the Deep Recurrent Attentive Writer (DRAW) [11] and Pixel Recurrent Neural Networks (PixelRNN) [26]. Li and Wand [18] and Radford et al. [28] proposed the use of Generative Adversarial Networks (GANs) for texture synthesis; these methods do well in style transfer, and their experimental results are impressive on the MNIST [20], SVHN [24] and CIFAR-10 [17] datasets. However, for texture image synthesis at arbitrary sizes, these methods still need further improvement.

Fig. 2. The framework of training the LSTM network and generating a new texture.

Fig. 3. Left: a new pixel is generated conditioned on all previously generated pixels above and to the left of it; the inverted-triangular region overhead marks the kernel entries whose values are not zero. Center: when i = 0, the new pixel is conditioned only on the pixels to its left. Right: when j = 0, the dependency field of the LSTM covers only the pixels above the new pixel. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


We aim to estimate a distribution over exemplars that can be used to tractably compute the likelihood of newly generated textures with respect to the exemplar. Following the PixelRNN approach, we make full use of the pixel context information and compute features for the pixels with one-dimensional convolutions over the context, proceeding from left to right and from top to bottom. Suppose the kernel size of the convolution filter is K × K; for the boundary pixels shown in the center and right sub-figures of Fig. 3, the kernel sizes are 1 × K and K × 1, respectively. For the normal K × K filter, only the coefficients covering the context are nonzero (the red region in the left sub-figure of Fig. 3, where K = 3); a sketch of such masked context kernels is given below.
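As an illustration of the masked context kernels (our reading of Fig. 3, not code from the paper), the sketch below builds a K × K binary mask that keeps only the rows above the current pixel and the pixels to its left in the same row; the 1 × K and K × 1 boundary cases are slices of it. The mask multiplies the convolution kernel so the filter never sees the pixel being predicted or anything after it in raster order.

```python
import numpy as np

def context_mask(K):
    """Binary K x K mask for the context x_{<ij}: rows strictly above the
    centre pixel, plus pixels to its left in the centre row (one reading
    of the 'inverted triangular' region in Fig. 3; K is assumed odd)."""
    m = np.zeros((K, K), dtype=np.float32)
    c = K // 2
    m[:c, :] = 1.0   # all rows above the current pixel
    m[c, :c] = 1.0   # pixels to the left in the current row
    return m

K = 3
full_mask = context_mask(K)                    # interior pixels (left of Fig. 3)
row_mask = full_mask[K // 2:K // 2 + 1, :]     # 1 x K case, first row (center of Fig. 3)
col_mask = full_mask[:, K // 2:K // 2 + 1]     # K x 1 case, first column (right of Fig. 3)
```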

3.2. Training and synthesis

In the following, we briefly describe the LSTM algorithm. Let h_{ij} and c_{ij} be the hidden and memory states, and let o_{ij}, i_{ij}, (f^r_{ij}, f^c_{ij}) and g_{ij} be the output, input, forget and content gates. Given the previous states, the new states are obtained as follows:

$$\begin{aligned}
\left(o_{ij},\, i_{ij},\, f^{r}_{ij},\, f^{c}_{ij},\, g_{ij}\right)^{T} &= \left(\sigma,\, \sigma,\, \sigma,\, \sigma,\, \tanh\right)^{T}\!\left(K^{ss} \circledast h_{i-1,j} + K^{is} \circledast x\right) \\
c_{ij} &= g_{ij} \odot i_{ij} + c_{i,j-1} \odot f^{c}_{ij} + c_{i-1,j} \odot f^{r}_{ij} \\
h_{ij} &= o_{ij} \odot \tanh\left(c_{ij}\right)
\end{aligned} \tag{3}$$

where σ is the logistic sigmoid function, ⊛ represents the convolution operation and ⊙ indicates element-wise multiplication. Each memory unit of a spatial LSTM has two preceding states, c_{i,j−1} and c_{i−1,j}, with two corresponding forget gates f^c_{ij} and f^r_{ij}: c_{i,j−1} is the memory unit of the pixel on the left, c_{i−1,j} is the memory unit of the pixel above the current one, f^c_{ij} is the horizontal forget gate and f^r_{ij} is the vertical forget gate. K^{ss} and K^{is} are the weights of the state-to-state and input-to-state components.

To generate a new texture similar to the exemplar, we first train the LSTM on the exemplar. Thanks to the LSTM's ability to capture and propagate dependencies in space, we then generate the new texture with the learned model. The whole method is shown in Fig. 2, where x̂_{ij} is the newly generated pixel and x_{ij} is the true value; the modules in the yellow region belong to the network learning stage and the modules in the grey region belong to the texture generating stage.

The steps for generating a new regular texture from an exemplar are shown in Algorithm 1. In step 1, we initialize the width and height of the new regular texture and randomly initialize the value of its first pixel. In step 2, we search the exemplar for the best-matched pixel and record its position as (i, j). In step 3, we train three LSTM networks, denoted LSTMr, LSTMc and LSTMw. Since step 2 tells us that the best-matched pixel lies in the ith row and jth column, we train LSTMr on the ith row for the case of the center sub-figure of Fig. 3, where the pixels on the left form the context of the pixel to be generated, and we train LSTMc on the jth column for the case of the right sub-figure of Fig. 3, where only the pixels above form the context. LSTMw is trained on the whole exemplar. The numbers of memory and hidden units are equal: for LSTMr they equal the width of the exemplar, for LSTMc the height of the exemplar, and for LSTMw the width multiplied by the height of the exemplar.
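The following is a minimal sketch of the spatial LSTM cell update in Eq. (3). For brevity it uses plain matrix products in place of the convolutions K^{ss} and K^{is}, and it feeds only the hidden state of the pixel above, as in the equation; all names and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch

def spatial_lstm_step(x, h_up, c_up, c_left, W_is, W_ss):
    """One cell update of the spatial LSTM in Eq. (3).

    x      : input features at the current position        (d_in,)
    h_up   : hidden state of the pixel above, h_{i-1,j}    (d_h,)
    c_up   : memory of the pixel above, c_{i-1,j}          (d_h,)
    c_left : memory of the pixel on the left, c_{i,j-1}    (d_h,)
    W_is   : input-to-state weights, shape (5*d_h, d_in)
    W_ss   : state-to-state weights, shape (5*d_h, d_h)
    """
    pre = W_ss @ h_up + W_is @ x                 # matrix-product stand-in for K^{ss}*h + K^{is}*x
    o, i, f_r, f_c, g = pre.chunk(5)             # output, input, two forget gates, content gate
    o, i, f_r, f_c = map(torch.sigmoid, (o, i, f_r, f_c))
    g = torch.tanh(g)
    c = g * i + c_left * f_c + c_up * f_r        # two preceding memory states, Eq. (3)
    h = o * torch.tanh(c)                        # new hidden state
    return h, c

# Example with illustrative dimensions:
# d_in, d_h = 3, 64
# W_is, W_ss = torch.randn(5 * d_h, d_in), torch.randn(5 * d_h, d_h)
# h, c = spatial_lstm_step(torch.rand(d_in), torch.zeros(d_h),
#                          torch.zeros(d_h), torch.zeros(d_h), W_is, W_ss)
```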

Algorithm 1. Generating a regular texture from an exemplar using LSTM.
Step 1: Randomly initialize one pixel value x0 and the size of the new texture.
Step 2: Search the exemplar for the best-matched pixel and record its position as (i, j).
Step 3: According to the matched position from step 2, train the three corresponding LSTM models (LSTMr, LSTMc, LSTMw).
Step 4: Input x0 into LSTMr to generate x01, then iterate to generate all remaining pixels of the first row.
Step 5: Input x0 into LSTMc to generate x10, then iterate to generate all remaining pixels of the first column.
Step 6: Input x01 and x10 into LSTMw to generate x11, then iterate to generate all remaining pixels of the texture image.
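Algorithm 1 can be written as a simple generation loop. In the sketch below (ours, with assumed interfaces), predict_row, predict_col and predict_full stand in for the trained LSTMr, LSTMc and LSTMw models; each is assumed to return the next RGB pixel value given the context generated so far.

```python
import numpy as np

def generate_texture(H, W, x0, predict_row, predict_col, predict_full):
    """Algorithm 1 as a loop: x0 is the randomly initialized first pixel,
    and the three callables play the roles of LSTMr, LSTMc and LSTMw."""
    tex = np.zeros((H, W, 3), dtype=np.uint8)
    tex[0, 0] = x0                               # step 1
    for j in range(1, W):                        # step 4: first row with LSTMr
        tex[0, j] = predict_row(tex[0, :j])
    for i in range(1, H):                        # step 5: first column with LSTMc
        tex[i, 0] = predict_col(tex[:i, 0])
    for i in range(1, H):                        # step 6: interior with LSTMw,
        for j in range(1, W):                    # conditioned on pixels above and to the left
            tex[i, j] = predict_full(tex, i, j)
    return tex
```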

In the final steps, we feed x0 into LSTMr to generate x01 and iterate to generate the remaining pixels of the first row of the new texture; we then feed x0 into LSTMc to generate x10 and iterate to generate the remaining pixels of the first column; finally, LSTMw generates all remaining pixels of the new texture. Example synthesis results are shown in Section 4.

4. Experiments

In this section, we describe our experiments and results, comparing them with methods [7] and [14]. To date, method [14] is the best representative of traditional texture synthesis, and method [7] is a recently developed CNN-based texture synthesis technique. Given an exemplar, we generate a texture of size 1024 × 1024 at the same scale as the exemplar. Method [7], however, can only generate a texture of the same size as the exemplar at that scale, so in the experiments we compare visual results at the same scale rather than at different sizes: the results of method [14] and of our method are 1024 × 1024, while the results of method [7] have the size of the exemplar. All exemplars used in this paper are available at http://www.cgtextures.com/.

Our models are trained with the Torch toolbox, and the algorithm is implemented in Python and IPython. The implementation runs on an NVIDIA GTX 980 Graphics Processing Unit (GPU). We use as large a batch size as the GPU memory allows; in this paper, the batch size is 64 × 64.

The exemplars in Fig. 4 are of arbitrary sizes and contain large, middle and small-scale structures, including patterned fabric, regular blocks and mixed-size tiles. The second column shows the synthesis results of method [7], the third column the results of method [14], and the last column the results of our method. From Fig. 4 we find that neither method [7] nor method [14] can generate regular textures without errors, whereas our method can, thanks to the long-range spatial dependencies captured by the LSTM. In addition, our method can generate textures of any size at the same scale as the exemplar, which is also an improvement for deep learning based texture synthesis.

Table 1 reports no-reference image quality scores of the textures synthesized by the three methods in Fig. 4; the images are numbered 1 to 9 from the first row to the last row of Fig. 4. The evaluation measures are the Spatial and Spectral Entropies Quality index (SSEQ [19]), the Blind Image Quality Index (BIQI [23]) and the Natural Image Quality Evaluator (NIQE [22]); for all three, smaller values indicate higher image quality.


Table 1. Experiment results of SSEQ, BIQI and NIQE (images 1-9 correspond, top to bottom, to the rows of Fig. 4).

Image  Method         SSEQ      BIQI      NIQE
1      Gatys et al.   55.5421   56.9001   16.5036
1      Kaspar et al.  41.2563   32.8654    5.5648
1      Our method     39.3363   32.7776    5.4645
2      Gatys et al.   54.8235   46.1162   14.8511
2      Kaspar et al.  35.3064   26.1981    5.7124
2      Our method     34.2575   25.8605    5.0367
3      Gatys et al.   53.8200   53.8012   23.8112
3      Kaspar et al.  38.8025   34.5024   13.8001
3      Our method     36.5107   33.2732   11.5168
4      Gatys et al.   30.5000   30.5022    7.5754
4      Kaspar et al.  24.5578   21.5435    5.0012
4      Our method     23.8210   20.3811    4.8838
5      Gatys et al.   48.6010   38.6111    8.6045
5      Kaspar et al.  31.3904   29.3127    5.4154
5      Our method     29.1206   28.8485    5.2236
6      Gatys et al.   26.8433   26.8674    6.8466
6      Kaspar et al.  16.9325   15.7522    4.5401
6      Our method     15.5737   15.6379    4.3254
7      Gatys et al.   37.6371   41.2019   11.0029
7      Kaspar et al.  21.9070   30.1091    8.0012
7      Our method     19.0115   29.0544    7.4112
8      Gatys et al.   50.1118   48.1738    9.5888
8      Kaspar et al.  27.9650   27.0884    6.8907
8      Our method     27.4318   26.6126    6.0773
9      Gatys et al.   32.2547   39.4100   20.1541
9      Kaspar et al.  18.5874   38.9561   10.2311
9      Our method     17.2682   37.3290    9.9323

Fig. 5. Comparison of our experimental results with the methods of [27,28,31].

Fig. 5 shows one of our experimental results compared with the CNN-based [31], MRF-based [27] and DCGAN [28] methods; qualitatively, our result is comparable or superior to the other methods.

Fig. 4. We compare experimental results with the methods of Gatys et al. (the CNN-based method) and Kaspar et al. (the traditional method).

Together with the visual comparisons in Fig. 4, Table 1 shows that our method has an advantage on all three indicators.

5. Conclusion

In this paper, we build on deep recurrent neural networks as generative models for texture synthesis and improve the quality of synthesized regular textures. We have described how we train the model by capturing the dependency relations between pixels, treating the pixel values as discrete random variables in the conditional distributions. Our method significantly improves the state of the art in regular texture synthesis, and we can now generate a texture of any size at the same scale as the exemplar. However, texture images are only one family of images, and many other image families cannot yet be handled well by our method. In future work, we plan to extend our experiments to image retrieval, which will not be limited to texture images.

Declaration of Competing Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that we have no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Exemplar based Regular Texture Synthesis Using LSTM".


Acknowledgments

We thank the anonymous reviewers and the editor for their valuable comments. This work has been supported by the National Natural Science Foundation of China (No. 61772387), the Fundamental Research Funds of the Ministry of Education and China Mobile (MCM20170202), the National Natural Science Foundation of Shaanxi Province (Grant No. 2019ZDLGY03-03), and also by the ISN State Key Laboratory.

References

[1] C. Aguerrebere, Y. Gousseau, G. Tartavel, Exemplar-based texture synthesis: the Efros-Leung algorithm, Image Processing On Line 3 (2013) 223–241.
[2] C.F. Cadieu, H. Hong, D.L. Yamins, N. Pinto, D. Ardila, E.A. Solomon, N.J. Majaj, J.J. DiCarlo, Deep neural networks rival the representation of primate IT cortex for core visual object recognition, PLoS Comput. Biol. 10 (12) (2014) e1003963.
[3] X. Cai, B. Song, Combining inconsistent textures using convolutional neural networks, J. Vis. Commun. Image Represent. 40 (2016) 366–375.
[4] K. Chen, H. Johan, W. Mueller-Wittig, Simple and efficient example-based texture synthesis using tiling and deformation, in: Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, ACM, 2013, pp. 145–152.
[5] S. Darabi, E. Shechtman, C. Barnes, D.B. Goldman, P. Sen, Image melding: combining inconsistent images using patch-based synthesis, ACM Trans. Graph. 31 (4) (2012) Article 82.
[6] A.A. Efros, T.K. Leung, Texture synthesis by non-parametric sampling, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, IEEE, 1999, pp. 1033–1038.
[7] L. Gatys, A.S. Ecker, M. Bethge, Texture synthesis using convolutional neural networks, in: Advances in Neural Information Processing Systems, 2015, pp. 262–270.
[8] L.A. Gatys, A.S. Ecker, M. Bethge, A neural algorithm of artistic style, arXiv:1508.06576 (2015).
[9] L.A. Gatys, A.S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2414–2423.
[10] J. Gauthier, Conditional generative adversarial nets for convolutional face generation, Class project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester 2014.
[11] K. Gregor, I. Danihelka, A. Graves, D.J. Rezende, D. Wierstra, DRAW: a recurrent neural network for image generation, arXiv:1502.04623 (2015).
[12] U. Güçlü, M.A. van Gerven, Deep neural networks reveal a gradient in the complexity of neural representations across the brain's ventral visual pathway, arXiv:1411.6422 (2014).
[13] J.C. Hart, Perlin noise pixel shaders, in: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, ACM, 2001, pp. 87–94.
[14] A. Kaspar, B. Neubert, D. Lischinski, M. Pauly, J. Kopf, Self tuning texture optimization, Computer Graphics Forum 34 (2015) 349–359.

[15] S.-M. Khaligh-Razavi, N. Kriegeskorte, Deep supervised, but not unsupervised, models may explain IT cortical representation, PLoS Comput. Biol. 10 (11) (2014) e1003915.
[16] J. Kopf, C.-W. Fu, D. Cohen-Or, O. Deussen, D. Lischinski, T.-T. Wong, Solid texture synthesis from 2D exemplars, ACM Transactions on Graphics (TOG) 26, 2007, Article 2.
[17] A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.
[18] C. Li, M. Wand, Precomputed real-time texture synthesis with Markovian generative adversarial networks, in: European Conference on Computer Vision, 2016, pp. 702–716.
[19] L. Liu, B. Liu, H. Huang, A.C. Bovik, No-reference image quality assessment based on spatial and spectral entropies, Signal Process. Image Commun. 29 (8) (2014) 856–863.
[20] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324.
[21] C. Ma, L.-Y. Wei, X. Tong, Discrete element textures, ACM Transactions on Graphics (TOG) 30, 2011, Article 62.
[22] A. Mittal, R. Soundararajan, A.C. Bovik, Making a completely blind image quality analyzer, IEEE Signal Process. Lett. 20 (3) (2013) 209–212.
[23] A.K. Moorthy, A.C. Bovik, A two-step framework for constructing blind image quality indices, IEEE Signal Process. Lett. 17 (5) (2010) 513–516.
[24] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, A.Y. Ng, Reading digits in natural images with unsupervised feature learning, NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[25] A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al., Conditional image generation with PixelCNN decoders, in: Advances in Neural Information Processing Systems, 2016, pp. 4790–4798.
[26] A. van den Oord, N. Kalchbrenner, K. Kavukcuoglu, Pixel recurrent neural networks, arXiv:1601.06759 (2016).
[27] J. Portilla, E.P. Simoncelli, A parametric texture model based on joint statistics of complex wavelet coefficients, Int. J. Comput. Vis. 40 (1) (2000) 49–70.
[28] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, Comput. Sci. (2015).
[29] E. Risser, C. Han, R. Dahyot, E. Grinspun, Synthesizing structured image hybrids, ACM Transactions on Graphics (TOG) 29, 2010, Article 85.
[30] L. Theis, M. Bethge, Generative image modeling using spatial LSTMs, in: Advances in Neural Information Processing Systems, 2015, pp. 1927–1935.
[31] D. Ulyanov, V. Lebedev, A. Vedaldi, V. Lempitsky, Texture networks: feed-forward synthesis of textures and stylized images, 2016.
[32] B. Uria, M.-A. Côté, K. Gregor, I. Murray, H. Larochelle, Neural autoregressive distribution estimation, arXiv:1605.02226 (2016).
[33] R. Versteegen, G. Gimel'Farb, P. Riddle, Learning high-order generative texture models, 2014, pp. 90–95.
[34] L.-Y. Wei, S. Lefebvre, V. Kwatra, G. Turk, State of the art in example-based texture synthesis, in: Eurographics 2009, State of the Art Report, EG-STAR, Eurographics Association, 2009, pp. 93–117.
[35] R. Wu, W. Wang, Y. Yu, Optimized synthesis of art patterns and layered textures, IEEE Trans. Vis. Comput. Graph. 20 (3) (2014) 436–446.
[36] D.L. Yamins, H. Hong, C.F. Cadieu, E.A. Solomon, D. Seibert, J.J. DiCarlo, Performance-optimized hierarchical models predict neural responses in higher visual cortex, Proc. Natl. Acad. Sci. 111 (23) (2014) 8619–8624.
[37] A. Zhmoginov, M. Sandler, Inverting face embeddings with convolutional neural networks, arXiv:1606.04189 (2016).