ARTICLE IN PRESS
Signal Processing: Image Communication 23 (2008) 1–13 www.elsevier.com/locate/image
Transform domain texture synthesis
D.S. Wickramanayake, E.A. Edirisinghe, H.E. Bez
Department of Computer Science, Loughborough University, UK
Received 26 October 2006; received in revised form 28 August 2007; accepted 28 August 2007
Abstract
In this paper, we propose a fast DWT-based multi-resolution texture synthesis algorithm in which coefficient blocks of the spatio-frequency components of the input texture are efficiently stitched together to form the corresponding components of the synthesised output texture. We propose the use of an automatically generated threshold to determine the visually significant coefficients, which act as elements of a matching template used in the texture quilting process. We show that the use of a limited set of visually significant coefficients, regardless of their level of resolution, not only reduces the computational cost, but also results in more realistic texture synthesis. A transform domain texture blending strategy is used to remove the remaining artefacts across edges, further improving the quality of the synthesised texture. We use popular test textures to compare our results with those of state-of-the-art techniques. Some application scenarios of the proposed algorithm are also discussed.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Texture synthesis; Image quilting; Multi-resolution; Wavelet transforms
1. Introduction
In order to enhance the visual realism of virtual scenes, fine non-geometric details need to be added. Among the methods researchers have experimented with, capturing real-world appearance in photographs and using them to create virtual scenes has been the most successful. Texturing surfaces has thus attracted much interest, with an increasing number of animated movies produced in recent years. Texture synthesis is particularly useful in
Corresponding author. Tel.: +441509635721; fax: +441509635722. E-mail addresses:
[email protected] (D.S. Wickramanayake),
[email protected] (E.A. Edirisinghe),
[email protected] (H.E. Bez).
modelling repetitive patterns such as human and animal skin, stone, wood, marble, etc. The problem of synthesising textures has been studied extensively and numerous approaches have already been proposed. So far the most common approach to texture synthesis has been to develop a statistical model which emulates the generative process of the texture that it is intended to mimic. The Markov Random Field (MRF) is a widely used texture model [1–5], which assumes that the underlying stochastic process is both local and stationary. Another common approach is the physical simulation of the texture. In this method texture generation is done by directly simulating the physical generation process of certain textures such as corrosion, weathering, etc. The inspiration for our work comes from the two patch-based algorithms proposed by Efros and
0923-5965/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.image.2007.08.001
Freeman [6] and Lin Ling and C Liu [7]. Both of these algorithms use patch-based sampling, and the second additionally addresses the problem of constrained texture synthesis. These algorithms produce reasonably good-quality results at a lower computational cost than other algorithms. In Efros and Freeman's algorithm the output texture is formed by selectively transferring randomly selected blocks of a predefined size from the input texture image. Given that the top left-hand corner block of the output image has been appropriately formed, a subset of blocks from which a good candidate for the block to its right (assuming a raster scan order) is chosen is found as follows: all possible blocks of the same size from the input image are matched to the first block (top left-hand corner) of the output image, under a certain overlap. Unfortunately this algorithm cannot be used for real-time texture synthesis, as its efficiency is relatively low. The exhaustive search used in choosing the best match wastes computational power. Because the final block to be patched next to the preceding block is picked at random, the seam between two adjacent blocks is often quite visible. Even though a minimum error boundary cut is used to smooth these sudden changes in texture, it involves computationally expensive methods such as dynamic programming and thus would not be suitable for real-time applications.
Recently Kwatra et al. [8] proposed another patch-based approach using a graph cut technique to find the optimal cutting path along the edges. Entire patch matching is carried out initially and sub-patch matching is done in refinement passes. In entire patch matching the patch is overlapped with different offsets and the best patch is selected using a probability function. Sub-patch matching involves overlapping error regions with patches of different offsets.
As these two steps involve large overlaps, high computational power is required to calculate the overlap error. Therefore, they have proposed an FFT-based acceleration to speed up the algorithm. It is stated that the use of the FFT decreases the complexity of image matching from O(n^2) to O(n log n), where n is the number of pixels in the sample image. In a typical texture synthesis process n is a constant and is not as large as it is for video textures. In order to resolve the problems discussed above, in our previous work we proposed a Discrete Wavelet Transform (DWT)-based multi-resolution image quilting algorithm [9] in which coefficient blocks of the spatio-frequency components of the input texture are efficiently stitched together to form the corresponding components of the synthesised output texture. In this paper, we propose major improvements to this algorithm in terms of speed and the quality of the synthesized texture. In particular, we adopt a modified version of the Embedded Zerotree Wavelet (EZW) coding algorithm of Shapiro [10], popularly used in progressive coding of images, to achieve progressive texture synthesis capabilities. Using theoretical and experimental analysis we show that the complexity of our algorithm is of order O(n), where n is the number of pixels in the sample texture. The proposed texture synthesis algorithm has the following features:
• A DWT framework which provides a compact multi-resolution representation of texture.
• A Zerotree [10]-based approach providing an ordered representation of perceptually significant coefficients.
• A multi-resolution construction capability of synthesized texture.
• The possibility of being used in bandwidth adaptive systems requiring dynamic quality/speed adjustments.
• Fast mapping in hardware with low processing power.
The above features specifically make it viable for the proposed texture synthesis algorithm to be used in association with progressive 3D surface coding and transmission algorithms, such as MESHGRID [11], considered within the MPEG-4 AFX standardisation activities [12]. For clarity of presentation, the rest of the paper is divided into four further sections as follows. Section 2 introduces the reader to the basics of DWT-based analysis of a texture image and gives a summary of our previously proposed multi-resolution texture synthesis algorithm. Section 3 presents the proposed multi-resolution framework. Section 4 provides experimental results and a comprehensive analysis of the results. Finally Section 5 concludes, with an insight into possible improvements and future variations.
2. Multi-resolution texture synthesis and the previously proposed algorithm
A texture image contains large amounts of perceptual data. Therefore the number of bits
required to represent one with good resolution is high. Research in image compression has shown that it is possible to produce a texture of near-original perceptual quality with only about 20% of the total image data. Unfortunately, identifying this significant data in the pixel domain is difficult. However, images consist of a wide range of frequency components spread throughout the human visual frequency band. Some of these frequency components have a significant effect on human perception while others have very low significance. Existing texture synthesis algorithms that produce near-photorealistic texture demand high computational power, often requiring hours to synthesize small areas of texture. This is because they are based on texture analysis in the pixel domain, in which de-correlation of frequency bands is not possible. The best way to speed up these algorithms is to identify the perceptually significant frequency bands and use only those frequency components in the synthesis process. This requires a good frequency analysis method. In our algorithm we use the simplest two-dimensional DWT, the Haar transform, as a means of de-correlating the frequency bands of a texture. Initially, the image is divided into four sub-bands that arise from applying horizontal and vertical DWT filters over the pixel grid. The sub-bands labelled LH1, HL1 and HH1 represent the finest-scale wavelet coefficients, whereas the sub-band labelled LL1 represents the low-resolution coefficients (see Fig. 1(a)). In order to obtain the next level of wavelet coefficients, the sub-band LL1 is further decomposed and sampled using vertical and horizontal DWT filters (see Fig. 1(b)). The process
is repeated until the required final scale is reached. By observing the sub-bands we can note that at each coarser scale, the coefficients represent a larger spatial area of the texture but a narrower band of frequencies. There are three sub-bands at each scale and the remaining lowest frequency sub-band is a representation of the information at all coarser scales. In the above multi-resolution representation, coefficients in sub-bands have parent–child relationships. The family tree can be constructed as follows (see Fig. 1(d)). Each coefficient in the LL3 sub-band has one child in each of the sub-bands HL3, LH3 and HH3. Each coefficient in HL3, LH3 and HH3 has four children in the corresponding sub-band of the next lower level. Note that in a three-level decomposition, one coefficient tree corresponds to an 8 × 8 block of pixels in the original image. In our previously proposed multi-resolution texture synthesis algorithm [9], texture synthesis starts by applying a three-level DWT to the sample image. For the purpose of clarity, the wavelet decomposition process is illustrated in Fig. 2. The three-level wavelet decompositions of the sample and output textures (the texture to be synthesized) consist of ten sub-bands each. The basic idea therefore is to synthesize each sub-band of the output texture from the corresponding sub-band of the input sample texture. The texture synthesis procedure is described in detail below. In the first phase we only synthesize the four sub-bands of the third decomposition level of the output image. The first to be considered is the synthesis of the lowest resolution sub-band (LL3). Firstly we randomly pick a block from the lowest resolution sub-band of the sample image and place it in the top
Fig. 1. (a) Application of one-level discrete wavelet decomposition, (b) second-level decomposition, (c) third-level decomposition and a coefficient tree representing the parent–child relationship between the coefficients, (d) family tree representation corresponding to a wavelet coefficient in LL3 sub-band in a third-level wavelet decomposed image.
Fig. 2. Transforming the sample texture into a multi-resolution image representation. (a) Sample texture, (b) single-level decomposition, (c) two-level decomposition, (d) three-level decomposition and (e) matching criteria.
left-hand corner of the lowest resolution sub-band of the output coefficient image. Subsequently, the corresponding (child) blocks in the three detail sub-bands of the same resolution level (i.e. level 3) of the sample image are transferred to the top left-hand corners of the associated sub-bands of the output texture. Once the first block has been randomly selected and transferred to the output representation as discussed above, all possible blocks of the same size from the lowest resolution sub-band of the input sample image are picked and matched for a good overlap with the first block. The matching criterion is as follows. In general, if $B'$ and $B$ are two blocks to be matched, we say $B'$ is the best match for $B$ if $d(B, B')$ is minimum over all possible blocks $B'$, where

\[
d(B, B') = \sum_{i \in \partial B} \left\{ \left[ \partial B_{LL3}(i) - \partial B'_{LL3}(i) \right]^2 + \left[ \partial B_{HH3}(i) - \partial B'_{HH3}(i) \right]^2 \right\}, \tag{1}
\]

where $\partial B_{LL3}$ is the edge zone of block $B_{LL3}$ (see Fig. 2) and $i$ is an element (coefficient) within the edge zone. In the proposed scheme we use the combined matching error of the blocks in the low-resolution sub-band (LL3), $B_{LL3}$, and one of the horizontal (HL3), vertical (LH3) or diagonal (HH3) sub-bands to compute the total matching error. This is
illustrated in Fig. 2. In general, the decision on which of the horizontal, vertical or diagonal sub-bands contributes to the combined matching error (as described above) could be taken by comparing the energy levels of these components. This is justified because a higher-energy detail component of a particular type, say the horizontal component, would mean that the horizontal details of the original image are more significant and visually important than the vertical or diagonal details. However, in checking the suitability of the candidate sample, we avoid using coefficients from the high-frequency sub-bands as they are more likely to contain noise, which could be falsely interpreted as high-energy visual information from the sample. In the above algorithm we used only the lowest resolution sub-band, LL3, together with one of the LH3, HL3 or HH3 sub-bands in selecting the block to be synthesized. However, as discussed in Section 1, there are perceptually important coefficients in sub-bands of higher resolution levels as well as perceptually negligible coefficients in low-frequency bands, i.e. in the considered sub-bands. As this was not accounted for in the above texture synthesis process, the quality of the synthesized texture was found to be inferior for certain types of textures. Further, computational power was wasted in considering coefficients of insignificant visual impact in the synthesis process.
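The matching error of Eq. (1) and the energy-based choice of the detail sub-band can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary layout keyed by sub-band name, and the convention that the right-hand edge zone of the placed block is compared with the left-hand columns of the candidate are all assumptions made for the example.

```python
import numpy as np

def pick_detail_band(details):
    """Name of the detail sub-band (HL3, LH3 or HH3) with the largest energy."""
    return max(details, key=lambda name: float(np.sum(np.square(details[name]))))

def matching_error(placed, candidate, detail, width):
    """Eq. (1)-style error between the right edge zone of `placed` and the
    left edge zone of `candidate`, summed over LL3 and one detail sub-band.
    The left/right convention is an assumption of this sketch."""
    error = 0.0
    for name in ("LL3", detail):
        edge = placed[name][:, -width:]        # edge zone of the block already in the output
        overlap = candidate[name][:, :width]   # columns of the candidate that would overlap it
        error += float(np.sum((edge - overlap) ** 2))
    return error

# Toy level-3 blocks (hypothetical values, 4 x 4 coefficients each).
placed = {"LL3": np.ones((4, 4)), "HL3": np.zeros((4, 4)),
          "LH3": np.zeros((4, 4)), "HH3": 2.0 * np.ones((4, 4))}
candidate = {"LL3": np.zeros((4, 4)), "HL3": np.zeros((4, 4)),
             "LH3": np.zeros((4, 4)), "HH3": np.ones((4, 4))}

detail = pick_detail_band({k: v for k, v in placed.items() if k != "LL3"})
error = matching_error(placed, candidate, detail, width=1)
print(detail, error)
```

In a full search the candidate with the smallest error over all block positions in the sample would be retained.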
In [10] Shapiro proposed the EZW algorithm, which is now extensively used in DWT-based image and video coding techniques for the prioritised coding of visually significant coefficients in a DWT decomposed image. Within the research context of our present work, we adopt a modified version of the coefficient prioritisation approach used in [10] in DWT-based multi-resolution texture synthesis. To the best of the authors' knowledge, such an attempt has not previously been made in texture synthesis. It is noted that when using the L2 norm as the matching criterion in the pixel domain, we do not get the visually best match, a problem inherited from Efros and Freeman's method [6]. The novel approach to multi-resolution texture synthesis proposed in Section 3 minimizes all of the above shortcomings.
3. Proposed multi-resolution texture synthesis algorithm
Generally, the process of texture synthesis can be mathematically represented by Eq. (2), where $F$ is the texture synthesis function, which takes the texture sample $I_{\text{sample}}$ as its input and synthesizes a texture, $I_{\text{output}}$:

\[
I_{\text{output}} = F(I_{\text{sample}}). \tag{2}
\]
3.1. Texture synthesis
The proposed multi-resolution approach to texture synthesis starts by applying a three-level 2D DWT (e.g., the Haar transform) to the sample image, $I_{\text{sample}}$. The three-level transform decomposes $I_{\text{sample}}$ into a set of component images (sub-bands):

\[
I_{\text{sample}} = f(LL_{s3}, HL_{s3}, LH_{s3}, HH_{s3}, HL_{s2}, LH_{s2}, HH_{s2}, HL_{s1}, LH_{s1}, HH_{s1}), \tag{3}
\]
where $f$ represents the inverse and $f^{-1}$ the forward discrete wavelet transform function. The first subscript indicates whether the sub-band belongs to the sample (s) or the output (o), and the second subscript indicates the level. If a wavelet decomposition similar to the above is performed on the output texture $I_{\text{output}}$ (which is yet to be generated), the following equation can be used for its mathematical representation:

\[
I_{\text{output}} = f(LL_{o3}, HL_{o3}, LH_{o3}, HH_{o3}, HL_{o2}, LH_{o2}, HH_{o2}, HL_{o1}, LH_{o1}, HH_{o1}). \tag{4}
\]
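To make the roles of the forward transform $f^{-1}$ and the inverse transform $f$ concrete, the sketch below implements a three-level orthonormal Haar decomposition with NumPy. It is an illustration under stated assumptions: the helper names are hypothetical, the image side is assumed to be divisible by 8, and the HL/LH naming (which filtering direction is called "horizontal") is one common convention that may differ from the one used in the figures.

```python
import numpy as np

def haar_step(x):
    """One level of the 2-D orthonormal Haar transform: returns LL, HL, LH, HH."""
    a = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)   # low-pass along each row
    d = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)   # high-pass along each row
    ll = (a[0::2, :] + a[1::2, :]) / np.sqrt(2)
    lh = (a[0::2, :] - a[1::2, :]) / np.sqrt(2)
    hl = (d[0::2, :] + d[1::2, :]) / np.sqrt(2)
    hh = (d[0::2, :] - d[1::2, :]) / np.sqrt(2)
    return ll, hl, lh, hh

def haar_step_inv(ll, hl, lh, hh):
    """Exact inverse of haar_step."""
    a = np.empty((2 * ll.shape[0], ll.shape[1]))
    d = np.empty_like(a)
    a[0::2, :], a[1::2, :] = (ll + lh) / np.sqrt(2), (ll - lh) / np.sqrt(2)
    d[0::2, :], d[1::2, :] = (hl + hh) / np.sqrt(2), (hl - hh) / np.sqrt(2)
    x = np.empty((a.shape[0], 2 * a.shape[1]))
    x[:, 0::2], x[:, 1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
    return x

def forward_dwt(image, levels=3):
    """Plays the role of f^{-1}: image -> dictionary of sub-bands."""
    bands, ll = {}, np.asarray(image, dtype=float)
    for level in range(1, levels + 1):
        ll, hl, lh, hh = haar_step(ll)
        bands[f"HL{level}"], bands[f"LH{level}"], bands[f"HH{level}"] = hl, lh, hh
    bands[f"LL{levels}"] = ll
    return bands

def inverse_dwt(bands, levels=3):
    """Plays the role of f: dictionary of sub-bands -> image."""
    ll = bands[f"LL{levels}"]
    for level in range(levels, 0, -1):
        ll = haar_step_inv(ll, bands[f"HL{level}"], bands[f"LH{level}"], bands[f"HH{level}"])
    return ll

image = np.random.default_rng(0).random((32, 32))   # stand-in for a sample texture
bands = forward_dwt(image)
print(sorted(bands))                                # the ten sub-band names
print(np.allclose(inverse_dwt(bands), image))       # perfect reconstruction check
```

Because this Haar variant is orthonormal, the total energy of the ten sub-bands equals the energy of the image, which is what makes coefficient magnitudes comparable across levels.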
Therefore the three-level wavelet decompositions of the sample and output textures consist of ten sub-bands each (see Fig. 1(c)). The basic idea of the proposed multi-resolution texture synthesis algorithm is to synthesize each sub-band of the output texture from the corresponding sub-band of the sample texture. This texture synthesis procedure is described in detail in the following paragraphs. In the following sections, the term block tree is used to denote a collection of adjacent coefficient trees (see Figs. 1(c) and (d)). Let $B_s(x, y)$ denote a general block tree of the decomposed sample image located at position $(x, y)$ relative to its origin. Here the subscript 's' represents the sample image. We first pick a block tree, $B_s(x_1, y_1)$, randomly and place it in the top left-hand corner of the output coefficient image (see Fig. 3(a)). Let this block tree be denoted by $B_o(0, 0)$, where the subscript 'o' represents the output (i.e. synthesized) image. Then, using the right-hand side edge zone of the block tree $B_o(0, 0)$, we create an edge tree (denoted by $\partial B_o(0, 0)$) as shown in Fig. 3(b). This edge tree is then used to create a so-called matching tree template by selecting only the visually significant coefficients of the edge tree using an EZW-based technique, as discussed in Section 3.2. Let this matching tree template be denoted by $\partial B^{t}_{o}(0, 0)$. This matching tree template (see Fig. 3(c)) is then moved around the three-level DWT representation of the sample image in search of the best match. When the best match is found (using the criteria described in the following sub-section), a block tree of a specified size next to the best matching location (Fig. 3(d)) is picked and placed in the output representation (see Fig. 3(e)). Note that the selected best matching block tree is accompanied by the area overlapped by the matching tree template and is placed so that this area overlaps the edge tree. These two overlapping areas are combined using the transform domain weighted edge blending technique proposed in Section 3.3.
In our experiments we have set the block tree size to $2^{5-l} \times 2^{5-l}$, where $l$ represents the level of decomposition. For each new block tree another matching tree template is then created, and the above process is repeated to find the next block tree until the first row is filled. From the second row onwards, an L-shaped matching tree template, created using the vertical and horizontal edge trees that come from the block above and the block to the left, is considered instead, and the same process is followed to find a good candidate block tree for the next location. This process is repeated in raster scan order until the whole output representation is filled.
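A block tree gathers, for one spatial region, the co-located coefficients from all ten sub-bands. The sketch below shows one way to cut such a tree out of a three-level decomposition, using the fact that each coefficient has four children at the next finer level. The function name, the `size3` parameter (block side measured in level-3 coefficients, set to 4 here purely for illustration) and the convention that `x` indexes columns and `y` rows are assumptions of this example, not details taken from the paper.

```python
import numpy as np

def block_tree(bands, x, y, size3=4, levels=3):
    """Cut the block tree whose level-3 corner is (x, y) out of `bands`.

    Each step to a finer level doubles the corner coordinates and the side
    length, mirroring the one-parent/four-children relationship."""
    tree = {}
    for level in range(levels, 0, -1):
        scale = 2 ** (levels - level)            # 1 at level 3, 2 at level 2, 4 at level 1
        rows = slice(y * scale, (y + size3) * scale)
        cols = slice(x * scale, (x + size3) * scale)
        names = ["HL", "LH", "HH"] + (["LL"] if level == levels else [])
        for name in names:
            key = f"{name}{level}"
            tree[key] = bands[key][rows, cols]
    return tree

# Dummy decomposition of a 64 x 64 sample: level-l sub-bands have side 64 / 2**l.
bands = {f"{name}{level}": np.arange((64 >> level) ** 2, dtype=float).reshape(64 >> level, 64 >> level)
         for level in (1, 2, 3) for name in ("HL", "LH", "HH")}
bands["LL3"] = np.zeros((8, 8))

tree = block_tree(bands, x=2, y=1)
print({key: block.shape for key, block in tree.items()})
```

With `size3=4` the tree holds 4 × 4, 8 × 8 and 16 × 16 coefficients at levels 3, 2 and 1 respectively, which together cover a 32 × 32 pixel area of the sample.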
Fig. 3. Construction of the output texture. (a) First random block tree placed at the top left-hand corner of the output texture, (b) edge zone tree of the first block tree, (c) matching tree template moved around the sample texture to find the best match, (d) best position is found and the corresponding block tree is picked and (e) first random block tree and its best match (second block tree) placed at the top left-hand corner of the output texture.
3.2. Block matching criterion and matching template creation
The general block matching criterion can be explained as follows. If $B_o(x_1, y_1)$ and $B_s(x_2, y_2)$ are two block trees to be matched, we say $B_s(x_2, y_2)$ is the best match for $B_o(x_1, y_1)$ if $d(B_o(x_1, y_1), B_s(x_2, y_2))$ is minimum over all possible block trees $B_s$, where

\[
d(B_o(x_1, y_1), B_s(x_2, y_2)) = \sum_{\text{selected } i} \left\{ \partial B_o(x_1, y_1)(i) - \partial B_s(x_2, y_2)(i) \right\}^2, \tag{5}
\]

where $\partial B_x(x, y)$ is the matching mask tree of the block tree $B_x(x, y)$ (see Fig. 3(c)) and $i$ is the $i$th element in the matching mask tree.
Block matching is the most computationally expensive sub-process in a patch-based texture synthesis algorithm as it involves a significant amount of searching. Therefore, if the number of coefficients that need to be compared in a particular search can be reduced, the efficiency of the algorithm can be significantly increased. In order to achieve this, instead of using all the coefficients $i$, we select a limited number of coefficients for comparison depending on their visual significance. The selection of a particular wavelet coefficient $i$ is determined by two important observations of the EZW algorithm [10] of Shapiro. They are:
• Natural images in general have a low-pass spectrum. When an image is wavelet transformed the energy in the sub-bands decreases as the scale decreases, so the wavelet coefficients will, on average, be smaller in the higher sub-bands than in the lower sub-bands (i.e. higher sub-bands only add details).
• Larger wavelet coefficients are more important than smaller coefficients.
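The distance of Eq. (5) is a sum of squared differences restricted to the selected coefficient positions. A minimal sketch is given below, assuming the edge trees are stored as dictionaries of arrays keyed by sub-band name and the selection is supplied as boolean masks of the same shapes; these data structures and the function names are hypothetical, and how the mask is built is a separate step.

```python
import numpy as np

def masked_distance(template, candidate, mask):
    """Sum of squared differences over the coefficients flagged in `mask`."""
    total = 0.0
    for name, selected in mask.items():
        diff = template[name][selected] - candidate[name][selected]
        total += float(np.sum(diff ** 2))
    return total

def best_match(template, mask, candidates):
    """Index of the candidate edge tree with the smallest masked distance."""
    errors = [masked_distance(template, c, mask) for c in candidates]
    return int(np.argmin(errors))

# Toy example with a single sub-band; only the two diagonal positions are compared.
template = {"LL3": np.array([[1.0, 2.0], [3.0, 4.0]])}
mask = {"LL3": np.array([[True, False], [False, True]])}
candidates = [{"LL3": np.array([[0.0, 9.0], [9.0, 4.0]])},
              {"LL3": np.array([[1.0, 0.0], [0.0, 4.0]])}]

print(best_match(template, mask, candidates))
```

The cost of one comparison is proportional to the number of `True` entries in the mask, which is why shrinking the selection speeds up the search.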
With these observations Shapiro proposed a technique for representing an image with a small number of visually significant coefficients. It was shown that if the magnitude of the parent coefficient of a tree is comparatively low, there is a high tendency for the visual significance of the whole tree to be low. Such a tree was referred to as a zerotree. In addition, Shapiro proposed the idea of isolated zeros in non-zero trees and specified a scanning order for optimum compression. The following notation was used for identifying the coefficients. If a coefficient is larger than the threshold, a symbol P (positive) is coded; if it is smaller than the negative of the threshold, a symbol N (negative) is coded. If the coefficient is the root of a zerotree, a symbol T (zerotree) is coded. If the magnitude of the coefficient is smaller than the threshold but it is not the root of a zerotree, a symbol Z (isolated zero) is coded. This occurs when there is a coefficient larger than the threshold in the sub-tree. For further details of this algorithm readers are referred to [10]. In order to exploit the above observations and increase the efficiency of the algorithm, we need to find a threshold which gives the minimum number of coefficients that can produce the best possible perceptual quality. Extensive experiments show that the optimum threshold depends on the details contained in the texture and that typically 5–20% of the wavelet coefficients provide a sufficient approximation for synthesizing most texture types. Therefore the threshold is determined as follows. Let us consider a sample texture of size $n \times n$. This is wavelet transformed and all the coefficients in all three levels are arranged in descending order of absolute magnitude. Depending on the energy content in the high-frequency bands of the texture, the magnitude of the $k$th coefficient, where $0.05\,n^2 < k < 0.2\,n^2$, is selected as the threshold.
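The threshold rule and the symbol classes described above can be sketched as follows. This is a simplified illustration, not Shapiro's coder or the authors' code: `select_threshold` and `significance_mask` are hypothetical names, `fraction` stands for k divided by the number of coefficients, only one orientation (e.g. HH3, HH2, HH1) is handled at a time, and the link between an LL3 coefficient and its level-3 children is omitted. The mask marks every position that would be coded P or N (own magnitude above the threshold) or Z (own magnitude not above it, but some descendant above it); positions belonging to zerotrees stay unmarked.

```python
import numpy as np

def select_threshold(bands, fraction=0.1):
    """Magnitude of the k-th largest coefficient, with k = fraction * (coefficient count)."""
    mags = np.sort(np.concatenate([np.abs(b).ravel() for b in bands.values()]))[::-1]
    k = max(1, int(round(fraction * mags.size)))
    return float(mags[k - 1])

def pool2x2(a):
    """Maximum over each 2 x 2 group of children, giving one value per parent."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def significance_mask(coarse_to_fine, threshold):
    """Boolean masks (same order as the input) of positions coded P, N or Z."""
    mags = [np.abs(c) for c in coarse_to_fine]
    subtree_max = [None] * len(mags)
    subtree_max[-1] = mags[-1]                    # finest level has no children
    for i in range(len(mags) - 2, -1, -1):
        subtree_max[i] = np.maximum(mags[i], pool2x2(subtree_max[i + 1]))
    # Strict '>' follows the wording "larger than the threshold".
    return [m > threshold for m in subtree_max]

# One coefficient tree: a single large coefficient at the finest level.
c3 = np.zeros((1, 1))
c2 = np.zeros((2, 2))
c1 = np.zeros((4, 4))
c1[3, 3] = 5.0
m3, m2, m1 = significance_mask([c3, c2, c1], threshold=1.0)
print(m3.sum(), m2.sum(), m1.sum())
```

Here the root and one level-2 coefficient are kept as isolated zeros because a significant coefficient lies beneath them, while the remaining three level-2 coefficients are zerotree roots and are dropped together with their children.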
Selection of $k$ will depend on the nature of the texture (stochastic: low $k$, natural: high $k$) and in turn will define the threshold so that only the 5–20% of coefficients with the highest magnitudes are considered. After finding the threshold, the edge tree is used for creating the matching template. The coefficients of the edge tree are marked using the same notation used by Shapiro [10], as described above. The locations of all the coefficients that are coded as P, N and Z are then taken into the template. These locations are the $i$ values specified for Eq. (5). When the above matching template is used, all coefficients below the threshold apart from the isolated zeros are disregarded in matching, regardless of whether they come from a low resolution sub-band or not. Isolated zeros are considered as they share information belonging to a tree which contains high-frequency information (note: large-magnitude coefficients belonging to a high-frequency band contain visually significant information).
3.3. Multi-resolution weighted edge blending
After a matching pair of blocks has been found, the quality of the boundary can be improved with the use of an edge blending algorithm. Even though many pixel domain edge blending algorithms are available [13], their use in transform domain blending is limited by the need to convert the transform domain blocks to the pixel domain for blending. As a key functionality of our texture synthesis approach is its ability to complete the entire texture synthesis process in the transform domain, an edge blending technique that is capable of operating entirely in the DWT domain is required. Drori et al. [14] proposed a DWT domain algorithm for general purpose image blending applications. However, because the blending is limited to the LL bands of the wavelet decomposed blocks/images, its application in progressive texture synthesis is not possible. Therefore we propose a technique that extends the conventional pixel-based image feathering algorithm [13] by offering additional control over the processing of the individual multi-resolution sub-bands that are used in the wavelet representation when moving from the source to the destination image. In this process, the source image is smoothly faded out and distorted towards the alignment of features in the destination image.
The destination image starts distorted to the feature geometry of the source image and, while it fades in, the distortion is gradually reduced. The resulting image contains an average of the source and the destination image adjusted to the average feature geometry. Consider two already selected adjacent wavelet coefficient blocks $b_s$ and $b_o$ (corresponding to image blocks $B_s$ and $B_o$) with corresponding vertical overlapping edge zones $\partial B_s$ (source image) and $\partial B_o$ (destination image) (see Fig. 4). The edge blending technique is to generate an intermediate
image $\partial B^{\text{blended}}(d)$ such that $\partial B^{\text{blended}}(0) = \partial B_s$ and $\partial B^{\text{blended}}(1) = \partial B_o$. We assume that the distance $d$ varies from 0 to 1 as the source image $\partial B_s$ continuously changes to the destination image $\partial B_o$. Note that the source and destination image sub-band coefficients are organized into a tree-structured representation using corresponding coefficients from blocks $b_s$ and $b_o$. It is necessary to define a set of weighting matrices, $W_{sub(level)}$ ($level \in \{1, 2, 3\}$ and $sub \in \{LL, LH, HL, HH\}$), where the weights of the matrices are defined such that when the sub-bands of the left-hand edge zone are multiplied by the corresponding weighting matrices, the left-hand edge zone's contribution to the overlapping region gradually decreases from left to right. Similarly, when the right-hand edge zone is multiplied by another set of weighting matrices derived from the above weighting matrices, its contribution to the overlapping area increases from left to right. The sizes of these weighting matrices depend on the width of the overlap. To start with, let us consider blending in the pixel domain. If the weighting matrix is $W$, the blending equation can be represented by Eq. (7). As the overlap used in the proposed algorithm is 8, $W$ is selected empirically as

\[
W = \begin{bmatrix}
0.9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.8 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.7 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.6 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.5 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.2
\end{bmatrix} \tag{6}
\]
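\[
\partial B^{\text{blended}} = \partial B_s W + \partial B_o (I - W), \tag{7}
\]

where $I$ is an identity matrix. A wavelet transform can be expressed as $f(x) = H x H^T$, where the matrix $H$ satisfies $H H^T = aI$ and $a$ is a numerical constant. Hence $H^T = aH^{-1}$. Applying the wavelet transform to both sides of Eq. (7) gives

\[
H (\partial B^{\text{blended}}) H^T = H (\partial B_s W) H^T + H [\partial B_o (I - W)] H^T .
\]

Since $H^{-1} H = I$,

\[
H (\partial B^{\text{blended}}) H^T = H (\partial B_s) H^{-1} H (W) H^T + H (\partial B_o) H^{-1} H (I - W) H^T .
\]

Substituting $H^T = aH^{-1}$ on both sides and cancelling the common factor $a$,

\[
H (\partial B^{\text{blended}}) H^{-1} = H (\partial B_s) H^{-1} H (W) H^{-1} + H (\partial B_o) H^{-1} \left[ H (I) H^{-1} - H (W) H^{-1} \right],
\]

\[
\partial b^{\text{blended}} = \partial b_s w + \partial b_o (I - w),
\]

where $\partial b_s = H \partial B_s H^{-1}$, $\partial b_o = H \partial B_o H^{-1}$, $\partial b^{\text{blended}} = H \partial B^{\text{blended}} H^{-1}$ and $w = H W H^{-1}$. Writing the block label ($s$, $o$ or blended) as a superscript and the sub-band as a subscript, this equation can be expanded as follows:

\[
\begin{pmatrix}
\partial b^{\text{blended}}_{LL} & \partial b^{\text{blended}}_{HL} \\
\partial b^{\text{blended}}_{LH} & \partial b^{\text{blended}}_{HH}
\end{pmatrix}
=
\begin{pmatrix}
\partial b^{s}_{LL} & \partial b^{s}_{HL} \\
\partial b^{s}_{LH} & \partial b^{s}_{HH}
\end{pmatrix}
\begin{pmatrix}
w_{LL} & w_{HL} \\
w_{LH} & w_{HH}
\end{pmatrix}
+
\begin{pmatrix}
\partial b^{o}_{LL} & \partial b^{o}_{HL} \\
\partial b^{o}_{LH} & \partial b^{o}_{HH}
\end{pmatrix}
\begin{pmatrix}
I_{LL} - w_{LL} & I_{HL} - w_{HL} \\
I_{LH} - w_{LH} & I_{HH} - w_{HH}
\end{pmatrix}.
\]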
Fig. 4. (a) Already selected adjacent blocks showing three-level wavelet decomposed edge zones and (b) the two blocks placed so that they overlap.
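This leads to the following set of equations, where the level of decomposition is $l = 1$ and $I_{sub(l)}$ denotes the corresponding block of the identity matrix:

\[
\begin{aligned}
\partial b^{\text{blended}}_{LL(l)} &= \partial b^{s}_{LL(l)} w_{LL(l)} + \partial b^{s}_{HL(l)} w_{LH(l)} + \partial b^{o}_{LL(l)} (I_{LL(l)} - w_{LL(l)}) + \partial b^{o}_{HL(l)} (I_{LH(l)} - w_{LH(l)}), \\
\partial b^{\text{blended}}_{HL(l)} &= \partial b^{s}_{LL(l)} w_{HL(l)} + \partial b^{s}_{HL(l)} w_{HH(l)} + \partial b^{o}_{LL(l)} (I_{HL(l)} - w_{HL(l)}) + \partial b^{o}_{HL(l)} (I_{HH(l)} - w_{HH(l)}), \\
\partial b^{\text{blended}}_{LH(l)} &= \partial b^{s}_{LH(l)} w_{LL(l)} + \partial b^{s}_{HH(l)} w_{LH(l)} + \partial b^{o}_{LH(l)} (I_{LL(l)} - w_{LL(l)}) + \partial b^{o}_{HH(l)} (I_{LH(l)} - w_{LH(l)}), \\
\partial b^{\text{blended}}_{HH(l)} &= \partial b^{s}_{LH(l)} w_{HL(l)} + \partial b^{s}_{HH(l)} w_{HH(l)} + \partial b^{o}_{LH(l)} (I_{HL(l)} - w_{HL(l)}) + \partial b^{o}_{HH(l)} (I_{HH(l)} - w_{HH(l)}).
\end{aligned}
\]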
In general, $X_{sub(l)}$ represents a particular sub-band of an image matrix $X$, in which $l \in \{1, 2, 3\}$, $X \in \{\partial b^{s}, \partial b^{o}, w, \partial b^{\text{blended}}\}$ and $sub \in \{LL, LH, HL, HH\}$. Blended coefficients can be calculated similarly for the higher levels. This derivation is based on the assumption $H^T = aH^{-1}$. Therefore the blending equations above are valid not only for Haar wavelets but also for any other wavelet matrices satisfying this assumption.
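The equivalence between pixel-domain and transform-domain blending can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a one-level orthonormal Haar matrix (so $a = 1$), square 8 × 8 edge zones filled with random values, a quadrant layout with LL in the top-left and HH in the bottom-right, and it takes $I_{sub}$ to be the matching block of the identity matrix. The helper names are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal one-level Haar analysis matrix (n even), so H @ H.T == I."""
    h = np.zeros((n, n))
    for k in range(n // 2):
        h[k, 2 * k:2 * k + 2] = [1.0, 1.0]             # averaging rows (low-pass)
        h[n // 2 + k, 2 * k:2 * k + 2] = [1.0, -1.0]   # differencing rows (high-pass)
    return h / np.sqrt(2)

def quadrants(m):
    """Split a transformed matrix into its four sub-band blocks."""
    h = m.shape[0] // 2
    return {"LL": m[:h, :h], "HL": m[:h, h:], "LH": m[h:, :h], "HH": m[h:, h:]}

def blend_subband(s, o, q, e, row, col):
    """One sub-band of the blended edge zone: a block row times a block column."""
    return sum(s[a] @ q[b] + o[a] @ (e[b] - q[b]) for a, b in zip(row, col))

W = np.diag([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])   # weights of Eq. (6)
H = haar_matrix(8)
H_inv = np.linalg.inv(H)
I = np.eye(8)

rng = np.random.default_rng(0)
dBs, dBo = rng.random((8, 8)), rng.random((8, 8))        # stand-in edge zones

blended_pixels = dBs @ W + dBo @ (I - W)                 # Eq. (7), pixel domain

dbs, dbo, w = H @ dBs @ H_inv, H @ dBo @ H_inv, H @ W @ H_inv
blended_coeffs = dbs @ w + dbo @ (I - w)                 # transform-domain form

s, o, q, e = quadrants(dbs), quadrants(dbo), quadrants(w), quadrants(I)
blended_ll = blend_subband(s, o, q, e, row=("LL", "HL"), col=("LL", "LH"))
blended_hh = blend_subband(s, o, q, e, row=("LH", "HH"), col=("HL", "HH"))

print(np.allclose(blended_coeffs, H @ blended_pixels @ H_inv))
print(np.allclose(blended_ll, quadrants(blended_coeffs)["LL"]))
```

Both checks compare a result computed purely from wavelet coefficients with the transform of the pixel-domain blend, which is the property that allows the whole synthesis to stay in the DWT domain.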
4. Experimental results and analysis
4.1. Detailed analysis
In order to analyse the performance of the proposed algorithm, experiments were performed on a widely used set of texture images, consisting of textures of both regular and stochastic nature. A typical set of sample images and the output textures obtainable using the proposed texture synthesis algorithm is illustrated in Fig. 5. Experiments were performed on a PC with a Pentium IV 1600 MHz processor with no hardware acceleration support. The proposed algorithm was implemented using MATLAB, with no particular attention given to optimizing the code. For synthesizing a 512 × 512 pixel texture from a 184 × 184 pixel sample, the effective speedup finally achieved was approximately 20-fold compared to Efros and Freeman's algorithm. In particular, when synthesizing all the textures considered in Fig. 5, Efros and Freeman's algorithm took an average of 98.26 s per texture whereas the proposed algorithm took only 2–8 s. The results clearly indicate that the proposed method is capable of providing high-quality texture synthesis for a wide variety of textures. The selection of publicly available texture images for our experiments should enable readers to compare the performance of our algorithm with that of other state-of-the-art techniques. We have chosen samples of different categories of textures (regular, near-regular, random and stochastic) to illustrate that the proposed algorithm is capable of synthesizing all texture types. Fig. 6 compares the performance of the proposed technique with that of Efros and Freeman's method [6], our previous method based on DWT [9] and Kwatra et al.'s method [8] for three test images. The comparisons clearly illustrate the improved subjective quality of the proposed algorithm compared to that of the benchmarks. Further experiments were performed to assess the variation of synthesis time with sample size, and the results were compared with the relevant performance characteristics of Kwatra et al.'s [8] algorithm. The output texture size was kept constant at 512 × 512 (262,144) pixels and the sample size was varied from 128 × 128 pixels to 1024 × 1024 pixels. The results shown in Graph 1 illustrate that synthesis time is linearly proportional to the number of pixels in the input sample. This indicates that the complexity of the proposed texture synthesis algorithm can be represented by O(n), where n is the number of pixels in the sample image. For Kwatra et al.'s algorithm this was stated to be O(n log n).
Closer inspection of the results in Fig. 4 and further experiments have revealed three factors that are crucial in determining the quality of the final texture synthesized by the proposed algorithm: (1) the number of coefficients used in the mask when selecting the matching block; (2) the size of the unit of construction (block) used; (3) the width of the mask tree.
4.1.1. Number of coefficients used
In the proposed algorithm, when the threshold is lowered, the number of coefficients included in the mask tree increases. This in turn increases the quality of the synthesized texture, but also the number of comparisons, and hence the computational cost and the synthesis time. Our observations further reveal that once the quality of the synthesized texture approaches a certain maximum level, lowering the threshold further does not significantly improve the quality of the final output texture. The threshold selection criterion detailed in Section 3 is used to select a compromise between quality and computational cost.
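The trade-off between threshold and number of comparisons can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the coefficient values in TOY_BLOCK are hypothetical, the thresholds are chosen arbitrarily, and the sum of squared differences used in masked_ssd is an assumed error measure (the matching criterion actually used is the one defined earlier in the paper).

```python
from typing import List, Sequence

# Hypothetical 4 x 4 block of wavelet coefficients (toy values, not from the paper).
TOY_BLOCK: List[List[float]] = [
    [52.0, -3.0, 10.0, 1.0],
    [-20.0, 6.0, -1.5, 0.5],
    [9.0, -2.0, 4.0, -0.2],
    [1.0, 0.8, -0.6, 0.1],
]


def significant_mask(coeffs: Sequence[Sequence[float]], threshold: float) -> List[List[bool]]:
    """Mark the coefficients whose magnitude reaches the threshold."""
    return [[abs(c) >= threshold for c in row] for row in coeffs]


def count_significant(mask: Sequence[Sequence[bool]]) -> int:
    """Number of mask elements, i.e. comparisons needed per candidate block."""
    return sum(sum(1 for m in row if m) for row in mask)


def masked_ssd(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]],
               mask: Sequence[Sequence[bool]]) -> float:
    """Sum of squared differences over masked positions only (assumed measure)."""
    total = 0.0
    for row_a, row_b, row_m in zip(a, b, mask):
        for x, y, m in zip(row_a, row_b, row_m):
            if m:
                total += (x - y) ** 2
    return total


if __name__ == "__main__":
    # Lowering the threshold admits more coefficients into the mask.
    for threshold in (32.0, 16.0, 8.0, 4.0):
        mask = significant_mask(TOY_BLOCK, threshold)
        print(f"threshold {threshold:5.1f}: {count_significant(mask)} comparisons per candidate")
```

Each halving of the threshold in this toy block adds coefficients to the mask, so every candidate block costs more comparisons, which mirrors the quality versus cost behaviour described above.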
Fig. 5. Synthesized (large) texture samples using the proposed algorithm. The corresponding small textures show the original texture samples.
Instead of randomly picking a block from the set of blocks that match within 10% of the error of the best matching block [6], the proposed texture synthesis method selects the best possible match in terms of the overlap criteria discussed above. Our experiments revealed that the technique proposed in [6] is ineffective when used in conjunction with the proposed multi-resolution-based approach. This is due to the fact that the
proposed algorithm performs block quilting (matching) using only significant wavelet coefficients. It is noted that a randomly selected block within 10% of the error of the best matching block could be vastly different from the best matching block.
4.1.2. Block size
In order to maintain the global structure of the overall texture it is important to select the block size
Fig. 6. Comparison of results: (a) Efros and Freeman’s algorithm [6], (b) our previous algorithm [9], (c) Kwatra et al. [8] and (d) proposed algorithm.
as large as possible. This also increases the efficiency of the algorithm, as the number of candidate blocks available for filling the output texture decreases, making the process faster. At the same time, selection
of large block sizes makes it increasingly difficult to find overlapping areas providing a good match, lowering the quality of the resulting texture. Selection of the optimum size of the block is
Graph 1. Input size (n) vs synthesis time. Horizontal axis: input size (pixels), 0 to 1,200,000; vertical axis: synthesis time (seconds), 0 to 30. Fitted trend line: y = 2E-05x + 3.7954, R² = 0.9996.
dependent on the repeating pattern contained in the texture to be synthesized. The use of small block sizes will increase the synthesis time. Thus, in an effective implementation of the proposed algorithm, the block size must be selected as a trade-off between image quality and efficiency. Our experiments show that a block size of 8 × 8 in the LL3 sub-band gives better results for most of the textures.
4.1.3. Width of the mask tree
In selecting the matching block, the width of the matching mask tree (corresponding to the area of overlap of the block trees to be matched) also affects the quality and the speed of synthesis. Using fewer overlapping elements (coefficients) results in increased efficiency but more visible artefacts at block boundaries. An increase in the number of overlapping elements results in better quality with fewer artefacts, at the cost of increased synthesis time. However, too large an overlapping area will again result in noticeable artefacts, as it makes it more difficult for the algorithm to make the correct decision on the perceptually best matching block. As a compromise we have adopted an overlap of a single coefficient row (or column) at decomposition level LL3. This amounts to an overlap of 8 pixel rows (or columns) in the pixel domain.
4.2. Applications
An added advantage of adopting Shapiro’s EZW coefficient prioritisation idea in the proposed algorithm is that it can be used in a variety of applications where other texture synthesis
algorithms would perform less optimally. The following is a summary of applications that benefit from the proposed algorithm.
Progressive 2D texture transmission: Within a progressive transmission scenario, data is transmitted according to significance. The design of the proposed texture synthesis algorithm allows DWT coefficient significance-based progressive creation, transmission and reconstruction of the synthesized texture.
Texture mapping of progressively transmitted 3D structures: Work on the MPEG-4 AFX standard currently addresses progressive transmission of 3D objects. Initially, data sufficient for a coarse representation of the geometry of the object is transmitted. The proposed algorithm can complement this effort by texturing the surface of the objects with minimal transmission of texture data. Thus, both the geometry and the texture can be refined progressively as more data is transmitted.
Scalable mobile applications: In mobile applications the available memory and processing power are limited, and therefore displaying complex 3D objects is difficult. Storing all the data required for re-rendering complex 3D objects at the client side is difficult. The solution is to store the data on the server side. However, this results in a texture synthesis speed penalty if conventional texture synthesis algorithms are used. The proposed algorithm can be used for progressive synthesis of these textures, hence minimizing storage, bandwidth and synthesis time requirements.
Compressed domain texture synthesis: The proposed algorithm can synthesize a compressed output texture from a compressed original texture sample. This is particularly useful in fast on-demand applications.
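The significance-ordered transmission mentioned in the first application can be sketched as a sequence of passes with a halving threshold, in the spirit of the prioritisation used in EZW. The sketch below is a simplified illustration only: it omits zerotree coding and refinement passes, it is not the authors' transmission scheme, and the function names and the toy coefficient values are assumptions introduced for this example.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Coefficient = Tuple[int, float]  # (position index, coefficient value)

# Hypothetical flattened wavelet coefficients (toy values, not from the paper).
TOY_COEFFS: List[float] = [52.0, -3.0, 10.0, -20.0, 6.0, 1.0]


def significance_passes(coeffs: Sequence[float],
                        min_threshold: float) -> Iterator[Tuple[float, List[Coefficient]]]:
    """Yield (threshold, newly significant coefficients), halving the threshold each pass."""
    largest = max((abs(c) for c in coeffs), default=0.0)
    if largest <= 0.0 or min_threshold <= 0.0:
        return
    # Start from the largest power of two not exceeding the largest magnitude.
    threshold = 2.0 ** math.floor(math.log2(largest))
    sent = set()
    while threshold >= min_threshold:
        batch = [(i, c) for i, c in enumerate(coeffs)
                 if i not in sent and abs(c) >= threshold]
        sent.update(i for i, _ in batch)
        yield threshold, batch
        threshold /= 2.0


def reconstruct(length: int, batches: Sequence[Sequence[Coefficient]]) -> List[float]:
    """Receiver side: coefficients not yet received are treated as zero."""
    out = [0.0] * length
    for batch in batches:
        for index, value in batch:
            out[index] = value
    return out


if __name__ == "__main__":
    received: List[List[Coefficient]] = []
    for threshold, batch in significance_passes(TOY_COEFFS, min_threshold=4.0):
        received.append(batch)
        print(f"pass at threshold {threshold:4.1f}:", reconstruct(len(TOY_COEFFS), received))
```

A receiver that stops after any pass still holds the most significant coefficients received so far, which is the property that makes a coarse texture available early and allows it to be refined as further passes arrive.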
5. Conclusion
In this paper, we have introduced a novel approach to synthesizing textures under a multi-resolution framework. We have provided experimental results and an in-depth analysis, showing that the proposed method works remarkably fast and produces better output texture quality compared with the methods proposed in [6,9]. The multi-resolution nature of the proposed framework also makes it easily applicable to modern imaging applications needing progressive transmission capabilities. In designing the above multi-resolution texture synthesis algorithm we have made a compromise between the synthesised texture quality and the algorithmic complexity by not performing seamless edge construction as in [6,7]. However, due to the multi-resolution approach and the novel matching criteria adopted, we have managed to obtain synthesised texture quality perceptually equivalent to (or better than) that of [6,7] at a much lower computational complexity. We are currently looking at optimising the implementation of the algorithms. We are also in the process of applying the idea to handle the texture synthesis part omitted from consideration in the fast MESHGRID coding algorithm of [11], which has been one of the key contributions to the MPEG-4 AFX coding standard. This is expected to extend the applicability of the MESHGRID algorithm to full, fast, multi-scalable 3D object/surface coding.
References
[1] A. Witkin, M. Kass, Reaction-diffusion textures, in: Computer Graphics (SIGGRAPH ’91 Proceedings), July 1991.
[2] A. Efros, T. Leung, Texture synthesis by non-parametric sampling, in: International Conference on Computer Vision, vol. 2, September 1999, pp. 1033–1038.
[3] G. Turk, Generating textures on arbitrary surfaces using reaction-diffusion, in: Computer Graphics (SIGGRAPH ’91 Proceedings), July 1991, pp. 289–298.
[4] J.P. Lewis, Texture synthesis for digital painting, in: Computer Graphics (SIGGRAPH ’84 Proceedings), July 1984, pp. 245–252.
[5] A. Fournier, D. Fussel, L. Carpenter, Computer rendering of stochastic models, Comm. ACM (June 1982) 371–384.
[6] A. Efros, W.T. Freeman, Image quilting for texture synthesis and transfer, in: Proceedings of SIGGRAPH ’01, Los Angeles, CA, August 2001, pp. 341–346.
[7] L. Ling, C.E. Liu, Y. Xing, B. Guo, H.-Y. Shum, Real time texture synthesis by patch based sampling, ACM Trans. Graphics 20 (3) (2001) 127–150.
[8] V. Kwatra, A. Schödl, I. Essa, G. Turk, A. Bobick, Graphcut textures: image and video synthesis using graph cuts, ACM Trans. Graphics 22 (3) (2003) 277–286 (SIGGRAPH 2003 proceedings).
[9] D.S. Wickramanayake, E.A. Edirisinghe, H.E. Bez, Fast wavelet transform domain texture synthesis, in: Proceedings of the SPIE International Conference on Visual Communications and Image Processing 5308, VCIP 2004, San Jose, CA, January 2004, pp. 979–987.
[10] J.M. Shapiro, Embedded image coding using zero trees of wavelet coefficients, IEEE Trans. Signal Processing 41 (12) (1993) 3445–3462.
[11] I.A. Salomie, A. Munteanu, A. Gavrilescu, G. Lafruit, P. Schelkens, R. Deklerck, J. Cornelis, Meshgrid – a compact, multiscalable and animation-friendly surface representation, IEEE Trans. CSVT 14 (7) (2004) 950–966.
[12] Moving Picture Experts Group (MPEG): ISO/IEC Standard 14496-16, a.k.a. MPEG-4 Part 16: Animation Framework eXtension (AFX), 2004.
[13] R. Szeliski, H.Y. Shum, Creating full view panoramic mosaics and environment maps, in: Proceedings of SIGGRAPH 97, August 1997, pp. 251–258.
[14] I. Drori, D. Lischinski, Fast multi-resolution image operations in the wavelet domain, IEEE Trans. Visualization Comput. Graphics (2003) 395–411.