Region level based multi-focus image fusion using quaternion wavelet and normalized cut




Signal Processing 97 (2014) 9–30


Yipeng Liu (a), Jing Jin (a), Qiang Wang (a,*), Yi Shen (a), Xiaoqiu Dong (b)

(a) Department of Control Science and Engineering, Harbin Institute of Technology, Room 602, Main Building, No. 92 Xidazhi Street, Nangang District, Harbin 150001, China
(b) The Fourth Hospital of Harbin Medical University, No. 37 Yi Yuan Street, Nangang District, Harbin 150001, China


Article history: Received 25 March 2013; received in revised form 10 October 2013; accepted 13 October 2013; available online 28 October 2013.

Abstract

Region level based methods have become popular for multifocus image fusion in recent years, as they are the most direct way to fuse. However, the fusion result is often not ideal due to the difficulty of focus region segmentation. In this paper, we propose a novel region level based multifocus image fusion method that can locate the boundary of the focus region accurately. As a novel tool of image analysis, the phases of the quaternion wavelet transform (QWT) are capable of representing the texture information in an image. We first use the local variance of the phases to detect, for every pixel, whether it is in focus or defocused. Then we segment the focus detection result with the normalized cut to remove detection errors, and the initial fusion result is acquired by copying from the source images according to the focus detection result. Next, we compare the initial fusion result with a spatial frequency weighted fusion result, using structural similarity, to accurately locate the boundary of the focus region. Finally, the fusion result is obtained by using the spatial frequency as the fusion weight along the boundary of the focus region. Furthermore, we conduct several experiments to verify the feasibility of the fusion framework; the proposed algorithm is demonstrated to be superior to the reference methods. Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved.

Keywords: Multifocus image fusion; Focus region detection; Quaternion wavelet; Normalized cut; Spatial frequency; Structural similarity

1. Introduction

In recent years, image fusion has become an important and useful technique, widely used in defense systems, remote sensing, medical imaging, etc. [1]. It is defined as the process of integrating information from multiple images of the same scene into a single enhanced composite image that is more informative and more appropriate for visual perception or for computer processing, such as segmentation and feature extraction, than any of the individual source images [2].

* Corresponding author at: Department of Control Science and Engineering, School of Astronautics, Harbin Institute of Technology, 92 XiDaZhi Street, 327 Mailbox, Zhulou Building, Room 605, Harbin 150001, PR China. Tel.: +86 451 86413411x8602; fax: +86 451 86418328. E-mail address: [email protected] (Q. Wang).

Multifocus image fusion is a classical branch of the field. In applications of digital cameras, it is usually impossible to acquire an image that contains all salient features and relevant objects in focus, owing to the limited depth of focus of optical lenses. One feasible way to solve this problem is to extend the depth of focus based on focal plane detection [3–5]; another is to exploit the multifocus image fusion technique, whereby multiple images with different focus depths are composited to form an image with every relevant object fully focused [6]. In this paper, we focus on fusion techniques, which are classified into spatial domain and transform domain based methods. Pixel based image fusion in the spatial domain is a straightforward way to combine source images [7]. It has the advantages of retaining the original information and being easy to implement, but it can lead to blurring effects and high sensitivity to noise and misregistration [8].



Fig. 1. Schematic diagram of image fusion method in [10].

To further improve the quality of the fusion result, many researchers have proposed dividing the images into blocks or segmented regions [8–11]. They employ focus measures to identify the regions in focus and then either simply copy them into a single resultant image [9] or design a weight function imposed on the source images according to local features such as spatial frequency [10] or the structural similarity index (SSIM) [11]. These methods fuse pixels as a whole and avoid the aforementioned problems of pixel based methods; however, they suffer from block artifacts around the block or region boundaries, which significantly degrade the visual perception of the fused image. Meanwhile, as is well known, segmentation results are vital to the quality of the fused image [11], but segmentation is still a challenging task. Recently, multiscale transform based methods have become popular for image fusion. The research has concentrated on exploring the advantages of various wavelets, including the discrete wavelet [12,13], steerable pyramid [14], curvelet [15], complex wavelet [16], nonsubsampled contourlet [17], shearlet transform [18] and hybrid multiresolution methods [19,20]. Coefficients in the multiscale domain represent the sharpness and edges of an image and are meaningful for detecting salient features. On the other hand, coefficients differ from pixels, and there is no one-to-one mapping between pixels and coefficients; the variation of one coefficient therefore affects several pixels. Moreover, some information of the source images may be lost during the inverse multiresolution transform [10]. Based on this, the references [21,22] proposed hybrid methods to combine the advantages of spatial region based and transform domain methods, but their post-processing step using morphological operations is not robust. In this paper, we aim to avoid the abovementioned problems to the greatest extent by exploiting the advantages of a novel image analysis tool, the quaternion wavelet [23–26]. The reference [25] proposed a pixel level multifocus image fusion method using the quaternion wavelet. The reference [26] combined quaternion algebra and the curvelet transform to fuse multifocus color images. Quaternion algebra is helpful for color image processing, where the real part of a quaternion is zero and the three imaginary parts

represent the RGB components, respectively. However, the quaternion wavelet in our paper constructs an analytic signal whose phases represent the geometric structure of the image, and we find that these phases in a local window can reflect whether a pixel is in focus or not. All of these characteristics are suitable for grayscale image processing. We therefore use the phase information of the quaternion wavelet to initially detect the focus regions of the source images. Then we remove the detection errors robustly by using the normalized cut with a fixed number of clusters, and improve the visual perception by accurately locating the boundary along the focus region. The rest of this paper is organized as follows. The related work is described in Section 2. Sections 3 and 4 present a brief introduction to the QWT and our proposed multifocus image fusion method, respectively. In Section 5, experimental results of the main current methods and of our work are given. Finally, we draw conclusions in Section 6.

2. Related work

The reference [10] proposed a region level fusion method in the spatial domain. The framework is illustrated in Fig. 1, taking two differently focused images as an example. The fusion process is accomplished by the following steps.

(1) The initial fused image is obtained by averaging the two registered source images A and B.
(2) The initial fused image is partitioned into several regions using the normalized cuts algorithm [27]. The pixels in the image are represented as nodes in a weighted undirected graph; every pair of nodes is connected by an edge, and the weight on each edge is measured by the similarity between the nodes. The graph is segmented by cutting edges, and [27] proposed a disassociation measure to compute the cut cost.
(3) Partition images A and B using the result of step (2).
(4) Compute the spatial frequency (SF) of each region of the partitioned A and B (see the sketch at the end of this section). SF is defined as

$$SF = \sqrt{(RF)^2 + (CF)^2} \qquad (1)$$


Fig. 2. Clock: (a) right focus; (b) left focus; and (c) region partition of the averaging fusion result.

where RF and CF are the row and column frequencies

$$RF = \sqrt{\frac{1}{MN}\sum_{x=1}^{M}\sum_{y=2}^{N}\left[f(x,y)-f(x,y-1)\right]^2}$$

$$CF = \sqrt{\frac{1}{MN}\sum_{x=2}^{M}\sum_{y=1}^{N}\left[f(x,y)-f(x-1,y)\right]^2}$$

where $f(x,y)$ is the gray level intensity of the pixel $(x,y)$ and $(M,N)$ is the size of the local computation window.

(5) Compare the spatial frequency of the corresponding regions of the two source images to decide which should be used to construct the fused image.
(6) Merge all the selected regions to reconstruct the final image.

There is a problem with this method: focus and defocus objects may be partitioned into one region, and then they cannot be separated in the subsequent fusion process. This is not an ideal way to fuse, as the fused image will include information from both focus and defocus regions. For example, the right and left focus images of 'Clock' are shown in Fig. 2(a) and (b), and the region partition result is shown in Fig. 2(c); the two clocks, which have different focuses in the source images, are obviously mixed together into one region.
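To make steps (4) and (5) concrete, here is a minimal NumPy sketch of a local SF map computed per Eq. (1); the function name, window size and use of scipy.ndimage.uniform_filter are our illustrative choices, not part of [10]. A region-wise SF is then obtained by averaging this map over each segmented region.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_frequency(f, win=8):
    """Local SF map per Eq. (1): RMS of horizontal and vertical first
    differences averaged over a win x win window."""
    f = f.astype(np.float64)
    rf2 = np.zeros_like(f)
    cf2 = np.zeros_like(f)
    rf2[:, 1:] = (f[:, 1:] - f[:, :-1]) ** 2  # row-frequency term
    cf2[1:, :] = (f[1:, :] - f[:-1, :]) ** 2  # column-frequency term
    return np.sqrt(uniform_filter(rf2, size=win) + uniform_filter(cf2, size=win))
```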

3. Quaternion wavelet transform and focus measure

The quaternion wavelet transform (QWT) is an extension of the complex wavelet transform that provides a richer scale-space analysis of the geometric structure of 2-D signals. In contrast to the discrete wavelet transform (DWT), it is nearly shift invariant and provides a magnitude-phase local analysis of images. For convenience of further discussion, we briefly review some basic ideas on quaternions and the construction of the QWT.

3.1. Quaternion wavelet transform

The quaternion algebra ℋ, invented by Hamilton in 1843, is a generalization of the complex algebra:

$$\mathcal{H} = \{\, q = a + bi + cj + dk \mid a, b, c, d \in \mathbb{R} \,\} \qquad (2)$$

where the orthogonal imaginary units i, j and k satisfy the rules

$$i^2 = j^2 = k^2 = -1, \quad ij = k, \quad jk = i, \quad ki = j \qquad (3)$$

An alternative representation of a quaternion is

$$q = |q|\, e^{i\phi}\, e^{k\psi}\, e^{j\theta} \qquad (4)$$

where $(\phi, \theta, \psi) \in [-\pi, \pi) \times [-\pi/2, \pi/2) \times [-\pi/4, \pi/4]$. A quaternion is thus defined by one modulus and three angles, which we call phases. When $\psi \in (-\pi/4, \pi/4)$, the computational formulae [28] are

$$\phi = \tfrac{1}{2}\arctan\frac{2(ac+bd)}{a^2+b^2-c^2-d^2}, \qquad
\theta = \tfrac{1}{2}\arctan\frac{2(ab+cd)}{a^2-b^2+c^2-d^2}, \qquad
\psi = -\tfrac{1}{2}\arcsin\big(2(ad-bc)\big) \qquad (5)$$
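As a numerical companion to Eq. (5), the following sketch converts the four components of a quaternion-valued sub-band into the three phases. It assumes the quaternion is normalized to unit modulus (we normalize defensively), uses np.arctan2 as the quadrant-aware arctangent, and the function name is ours.

```python
import numpy as np

def qwt_phases(a, b, c, d, eps=1e-12):
    """Phases (phi, theta, psi) of q = a + bi + cj + dk per Eq. (5).
    Inputs are the four real sub-band arrays; q is normalized first."""
    n = np.sqrt(a * a + b * b + c * c + d * d) + eps
    a, b, c, d = a / n, b / n, c / n, d / n
    phi = 0.5 * np.arctan2(2.0 * (a * c + b * d), a**2 + b**2 - c**2 - d**2)
    theta = 0.5 * np.arctan2(2.0 * (a * b + c * d), a**2 - b**2 + c**2 - d**2)
    psi = -0.5 * np.arcsin(np.clip(2.0 * (a * d - b * c), -1.0, 1.0))
    return phi, theta, psi
```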


Fig. 3. (a) The original image; (b) blurred image of (a); (c) the phase ϕ coefficients distribution of (a); and (d) the phase ϕ coefficients distribution of (b).


The quaternionic analytic signal is defined by its partial ($H_1$, $H_2$) and total ($H_T$) Hilbert transforms (HT):

$$f_A(x,y) = f(x,y) + i\,H_1(f(x,y)) + j\,H_2(f(x,y)) + k\,H_T(f(x,y)) \qquad (6)$$

where $H_1(f(x,y)) = f(x,y) \ast\ast (\delta(y)/\pi x)$, $H_2(f(x,y)) = f(x,y) \ast\ast (\delta(x)/\pi y)$ and $H_T(f(x,y)) = f(x,y) \ast\ast (1/\pi^2 xy)$; here $\delta(y)$ and $\delta(x)$ are impulse sheets along the x and y axes, respectively, and $\ast\ast$ denotes 2-D convolution. We start from a real separable scaling function $\varphi$ and mother wavelets $\psi^H$, $\psi^V$, $\psi^D$; for a separable wavelet, $\psi(x,y) = \psi_h(x)\psi_h(y)$.
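The partial and total Hilbert transforms in Eq. (6) can be illustrated with FFT-domain sign multipliers. This is a sketch under a periodic-boundary assumption, not the dual-tree filter-bank implementation of [23] actually used in this paper.

```python
import numpy as np

def quaternion_analytic(f):
    """Quaternionic analytic signal of Eq. (6) via FFT-domain Hilbert
    multipliers. Returns the (real, i, j, k) components."""
    F = np.fft.fft2(f.astype(np.float64))
    u = np.fft.fftfreq(f.shape[0])[:, None]   # frequencies along rows
    v = np.fft.fftfreq(f.shape[1])[None, :]   # frequencies along columns
    h1 = np.fft.ifft2(F * (-1j) * np.sign(v)).real          # partial HT along x
    h2 = np.fft.ifft2(F * (-1j) * np.sign(u)).real          # partial HT along y
    ht = np.fft.ifft2(F * (-np.sign(u) * np.sign(v))).real  # total HT
    return f, h1, h2, ht
```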

Fig. 4. Comparative sharpness metric curves for Fig. 3(a) with different Gaussian blur degrees.


Fig. 5. Local focus measure result for Fig. 2(b) (a) using high frequency coefficients of phases ðϕ; θÞ; (b) using low frequency coefficients of phases ðϕ; θÞ.

According to the definition of the quaternionic analytic signal, the QWT, i.e. the analytic 2-D wavelets, can be constructed as follows:

$$\begin{aligned}
\varphi &= \varphi_h(x)\varphi_h(y) \rightarrow \varphi + iH_1(\varphi) + jH_2(\varphi) + kH_T(\varphi)\\
\psi^H &= \psi_h(x)\varphi_h(y) \rightarrow \psi^H + iH_1(\psi^H) + jH_2(\psi^H) + kH_T(\psi^H)\\
\psi^V &= \varphi_h(x)\psi_h(y) \rightarrow \psi^V + iH_1(\psi^V) + jH_2(\psi^V) + kH_T(\psi^V)\\
\psi^D &= \psi_h(x)\psi_h(y) \rightarrow \psi^D + iH_1(\psi^D) + jH_2(\psi^D) + kH_T(\psi^D)
\end{aligned} \qquad (7)$$

For a separable wavelet, the 2-D HT equals two successive 1-D HTs, applied along rows and columns respectively. Considering the 1-D HT pair $(\psi_h, \psi_g = H\psi_h)$ and scaling function pair $(\varphi_h, \varphi_g = H\varphi_h)$, the 2-D analytic wavelets can be derived from formula (6) as products of the 1-D separable wavelets:

$$\begin{aligned}
\varphi &= \varphi_h(x)\varphi_h(y) + i\varphi_g(x)\varphi_h(y) + j\varphi_h(x)\varphi_g(y) + k\varphi_g(x)\varphi_g(y)\\
\psi^H &= \psi_h(x)\varphi_h(y) + i\psi_g(x)\varphi_h(y) + j\psi_h(x)\varphi_g(y) + k\psi_g(x)\varphi_g(y)\\
\psi^V &= \varphi_h(x)\psi_h(y) + i\varphi_g(x)\psi_h(y) + j\varphi_h(x)\psi_g(y) + k\varphi_g(x)\psi_g(y)\\
\psi^D &= \psi_h(x)\psi_h(y) + i\psi_g(x)\psi_h(y) + j\psi_h(x)\psi_g(y) + k\psi_g(x)\psi_g(y)
\end{aligned} \qquad (8)$$

The real-imaginary quaternion analytic form can be transformed into the magnitude-phase form (4) according to (5). The QWT magnitude $|q|$, with its property of near shift-invariance, represents features at any spatial position in each frequency sub-band, and the three phases $(\phi, \psi, \theta)$ describe the 'structure' of those features. More details on the implementation of the QWT used here can be found in [23].

3.2. Phase based image sharpness measure using QWT

Defocus, i.e. degradation of the sharpness of the original image, causes the distribution of pixel intensities to shrink: the pixel intensities of textures approach uniformity, and intense pixel changes attenuate. Based on this observation, we seek out the edge variations between images with different blur degrees using the phase information of the QWT [29]. From a straightforward perspective, since the phases of the QWT are capable of representing the texture information of the image, the high frequencies of the phases witness the trend from clarity to blur. Moreover, $(\phi, \theta)$ stands for vertical and horizontal edges, so the horizontal and vertical high frequency components, i.e. the HL and LH sub-bands of $(\phi, \theta)$, should inherently reflect the visual edge changes along the two directions. For example, given the image in Fig. 3(a) and its blurred edition in Fig. 3(b), their coefficient distributions of the phase $\phi$ are shown in Fig. 3(c) and (d), respectively. We find that the coefficient distribution in the HL sub-band of the phase $\phi$ becomes more concentrated, which shows that the intense pixel changes attenuate.


Fig. 6. Multifocus image fusion framework using QWT.

3.3. Local focus measure using QWT

Intuitively, the phase based image sharpness measure (PM) [29] is defined as

$$PM = \sqrt{s_{h1}^2 + s_{h2}^2} \qquad (9)$$

where h stands for high frequency and $(s_{h1}, s_{h2})$ are the standard deviations of the QWT coefficients in the HL and LH sub-bands of the phases $(\phi, \theta)$, respectively. Computationally, the local window size is 9 × 9.

The reference [13] proposed a fusion method using a wavelet-based statistical sharpness measure. We compare the QWT with the wavelet transform (WT) based method from the following perspectives.

3.3.1. Global image sharpness measure

From Fig. 4, we can see that the measurement range of the WT based image sharpness is narrow across the different blur degrees, while the QWT based measure gives an obvious distinction even when the blur degree is large, i.e. when the window size of the Gaussian blur is larger than 7.
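A minimal sketch of the local measure in Eq. (9), assuming the two phase sub-bands are already available as 2-D arrays (e.g. from the QWT decomposition); local standard deviations are computed with a box filter, and the function name is ours.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def phase_measure(phi_hl, theta_lh, win=9):
    """Local PM of Eq. (9): combine the local standard deviations of the
    HL sub-band of phi and the LH sub-band of theta over a win x win window."""
    def local_std(p):
        m = uniform_filter(p, size=win)
        m2 = uniform_filter(p * p, size=win)
        return np.sqrt(np.maximum(m2 - m * m, 0.0))
    return np.sqrt(local_std(phi_hl) ** 2 + local_std(theta_lh) ** 2)

# a pixel of source A is tentatively labeled 'in focus' where PM_A > PM_B
```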

Fig. 7. The focus region detection result combined with normalized cut.


3.3.2. Local focus measure

Here we desire to judge whether a local window is in focus or not; the result of applying formula (9) to Fig. 2(b) is shown in Fig. 5(a). The result shows that the high frequency coefficients of the phases $(\phi, \theta)$ in a local window are not sufficient to reflect the local focus condition. However, the local focus measure using the low frequency coefficients performs well in Fig. 5(b), because the distribution range of the low frequency coefficients is more centralized: even though the number of coefficients in a local window is small, it can reflect the general distribution shape. Generally, the wavelet based focus measure result in Fig. 5(c) is acceptable, but the boundary between the two focus regions is evidently incorrect.

4. Proposed framework for the multifocus image fusion

The framework diagram of the proposed method is shown in Fig. 6. Multifocus image fusion aims to integrate the focus regions together. Considering the better performance of the phase based focus measure using QWT, we first detect whether each pixel is in focus based on it. In the following subsections, the algorithm is explained in detail.

4.1. Binary image segmentation using normalized cut

In our fusion framework, after the focus measure using QWT, the general focus regions are obtained.


Fig. 8. (a) The initial fusion result using QWT; (b) The initial fusion result using SF weighted function; (c) The differences between Figs. 8(a) and 2(b); (d) The differences between Figs. 8(b) and 2(b); (e) The structural differences between Fig. 8(a) and (b); and (f) The amendment of initial focus region detection.

They are then processed with the normalized cut and hole filling, which removes undesirable detection errors. The initial detection result of the focus region in the source images, shown in Fig. 5(b), is amended by combining the local focus measure with the normalized cut; the amended result is shown in Fig. 7. Some notes on the cluster number parameter of the normalized cut segmentation:


Fig. 9. (a) Fusion result using wavelet based focus measure for Fig. 2(a) and (b); (b) Fusion result based on Fig. 9(a) improved by proposed postprocessing framework; (c) Proposed fusion result for Fig. 2(a) and (b); (d) Difference image between fused images in Fig. 9(a) and the source image Fig. 2(a); (e) Difference image between fused images in Fig. 9(b) and the source image Fig. 2(a); and (f) Difference image between fused images in Fig. 9(c) and the source image Fig. 2(a).

(1) Researchers need to set the "number of clusters" parameter, which affects the number of segmented regions in the segmentation result. The paper [10] set this parameter to 10, but from Fig. 2(c) we can see that one segmented region may then include focus and defocus pixels at the same time. Moreover, how to set the parameter for images of different sizes is also a problem.


Fig. 10. Fusion results of different methods for ‘Clock’: (a) LP; (b) NSCT; (c) BM; and (d) SC.

(2) In our paper, after detecting whether the pixels are in focus or not, the focus region map is a binary image, and we can set the parameter to 2 because we just desire to find the focus regions. The focus measure based on the quaternion wavelet provides an approximately correct map for most focus pixels, so after segmenting the binary map using the normalized cut with 2 clusters, the segmentation result includes the concentrated region of most focus pixels and avoids isolated regions.

The characteristics of the normalized cut in our framework, compared with direct image segmentation [10], are summarized as follows (a code sketch of the underlying two-way cut is given after the list):

(1) The cluster number parameter is robust, corresponding to the two different conditions in multifocus images: focus and defocus;
(2) The normalized cut helps to clear away isolated regions, benefiting from the better performance of the local focus measure based on QWT.
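For illustration, a compact two-way normalized cut [27] on a small binary focus map might look as follows; the affinity design, radius and sigma are our assumptions, and a practical implementation would operate on a downsampled map for speed.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def two_way_ncut(mask, sigma=0.1, radius=2):
    """Two-way normalized cut of a small binary focus map. Pixels are graph
    nodes; weights favor nearby pixels with equal tentative labels."""
    h, w = mask.shape
    m = mask.astype(np.float64)
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            sy = slice(max(dy, 0), h + min(dy, 0))
            sx = slice(max(dx, 0), w + min(dx, 0))
            ty = slice(max(-dy, 0), h + min(-dy, 0))
            tx = slice(max(-dx, 0), w + min(-dx, 0))
            wgt = np.exp(-((m[sy, sx] - m[ty, tx]) ** 2) / sigma
                         - (dy * dy + dx * dx) / (2.0 * radius ** 2))
            rows.append(idx[sy, sx].ravel())
            cols.append(idx[ty, tx].ravel())
            vals.append(wgt.ravel())
    W = sparse.csr_matrix((np.concatenate(vals),
                           (np.concatenate(rows), np.concatenate(cols))),
                          shape=(n, n))
    D = sparse.diags(np.asarray(W.sum(axis=1)).ravel())
    # second-smallest generalized eigenvector of (D - W) y = lam * D y
    lam, Y = eigsh(D - W, k=2, M=D, which='SM')
    y = Y[:, np.argsort(lam)[1]]
    return (y > np.median(y)).reshape(h, w)
```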

Because the window based focus measure is not accurate along the boundary, we need to further improve the visual perception of the initial fusion result. It is effective to locate the boundary through a comparative study between the initial focus region detection result and another fusion result that preserves the image edges. Spatial frequency (SF) evaluates the edge information in a given window, so we choose the SF weighted fusion result as the reference. Note that a fully weighted fusion result includes undesirable information from the defocus regions and is therefore not ideal for multifocus image fusion by itself.

4.2. Spatial frequency weighted fusion


The weight function is designed as

$$W_A(i,j) = \begin{cases}
0.5 & \text{if } SF[\Omega_A(i,j)] = 0 \text{ and } SF[\Omega_B(i,j)] = 0\\[6pt]
\dfrac{SF[\Omega_A(i,j)]}{SF[\Omega_A(i,j)] + SF[\Omega_B(i,j)]} & \text{otherwise}
\end{cases} \qquad (10)$$

where A and B stand for the two multifocus images, $(i,j)$ is the image pixel position, and $\Omega$ denotes a window centered at $(i,j)$; correspondingly, $W_B(i,j) = 1 - W_A(i,j)$.


Fig. 11. The difference images between: (a) Figs. 10(a) and 2(a); (b) Figs. 10(b) and 2(a); (c) Figs. 10(c) and 2(a); and (d) Figs. 10(d) and 2(a).

Fig. 12. Pepsi (a) left focus; and (b) right focus.

The initial fusion result, acquired by copying the corresponding focus regions from the source images, is shown in Fig. 8(a). The spatial frequency (SF) weighted fusion result is shown in Fig. 8(b); it preserves the edge information of the focus regions in the source images.
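Given local SF maps for the two sources (e.g. from the spatial_frequency sketch in Section 2), Eq. (10) reduces to a few array operations. This is a sketch, with the zero-denominator case handled exactly as in Eq. (10).

```python
import numpy as np

def sf_weight(sf_a, sf_b):
    """Per-pixel weight W_A of Eq. (10) from local SF maps of images A and B."""
    s = sf_a + sf_b
    safe = np.where(s == 0.0, 1.0, s)           # avoid division by zero
    w_a = np.where(s == 0.0, 0.5, sf_a / safe)  # 0.5 when both SFs vanish
    return w_a, 1.0 - w_a

# usage: fused = w_a * a + w_b * b  (restricted to the boundary band in Sec. 4.4)
```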


Fig. 13. Fusion results of different methods for ‘Pepsi’: (a) LP; (b) WT; (c) NSCT; (d) BM; (e) SC; and (f) proposed method.

To observe the distinctions between the fusion results using QWT and SF, respectively, the differences between Fig. 2(b) and each of them are shown in Fig. 8(c) and (d). From Fig. 8(c) we find that the right boundary of the left clock in Fig. 2(b) is not preserved in Fig. 8(a), and from Fig. 8(d) we see residues in both the focus and defocus regions, so the fusion result using only the SF weighted function in Fig. 8(b) is not ideal either.

4.3. Boundary location with structural similarity

The next key problem is how to find the differences between the two given fusion results. In [30], the structural similarity (SSIM) index was proposed to quantify the visibility of errors between a distorted image and a reference image based on the human visual system.


Fig. 14. The difference images between (a) Figs. 13(a) and 12(b); (b) Figs. 13(b) and 12(b); (c) Figs. 13(c) and 12(b); (d) Figs. 13(d) and 12(b); (e) Figs. 13(e) and 12(b); and (f) Figs. 13(f) and 12(b).


Fig. 15. Lab (a) left focus; and (b) right focus.

Fig. 16. Fusion results of different methods for ‘Lab’: (a) LP; (b) WT; (c) NSCT; (d) BM; (e) SC; and (f) proposed method.


Fig. 17. The difference images between: (a) Figs. 16(a) and 15(b); (b) Figs. 16(b) and 15(b); (c) Figs. 16(c) and 15(b); (d) Figs. 16(d) and 15(b); (e) Figs. 16(e) and 15(b); and (f) Figs. 16(f) and 15(b).

Fig. 18. Note (a) left focus; and (b) right focus.

Given two images X and Y, SSIM is defined as

$$SSIM(X,Y) = [l(X,Y)]^{\alpha}\,[c(X,Y)]^{\beta}\,[s(X,Y)]^{\gamma} \qquad (11)$$

where $\alpha$, $\beta$ and $\gamma$ are used to adjust the relative importance of the three components. The luminance $l(X,Y)$, contrast $c(X,Y)$ and structure $s(X,Y)$ components are defined as

$$l(X,Y) = \frac{2 u_x u_y + C_1}{u_x^2 + u_y^2 + C_1} \qquad (12)$$


Fig. 19. Fusion results of different methods for ‘Note’: (a) LP; (b) WT; (c) NSCT; (d) BM; (e) SC; and (f) the proposed method.

$$c(X,Y) = \frac{2 s_x s_y + C_2}{s_x^2 + s_y^2 + C_2} \qquad (13)$$

$$s(X,Y) = \frac{s_{xy} + C_3}{s_x s_y + C_3} \qquad (14)$$

where $u_x$, $u_y$ are the means of X and Y; $s_x$, $s_y$ are the standard deviations of X and Y; $s_{xy}$ is the cross correlation of the shifted images $X - u_x$ and $Y - u_y$; and the positive constants $C_1$, $C_2$ and $C_3$ are included to avoid instability when the denominators are very close to zero. We care about local differences, so SSIM is computed over the pixels of a local window. The SSIM value represents the degree of similarity between the initial fusion result and the SF weighted fusion result. Since we aim to improve the boundary of the focus region, only the SSIM values in the boundary neighborhood of the initial fusion result are calculated. A larger SSIM value means more similarity, and hence a better focus region detection result. After thresholding the SSIM map, we obtain the structural difference measure in Fig. 8(e). We focus on the differences along the boundary of the focus region in Fig. 7 and amend the focus region detection result, as shown in Fig. 8(f), by labeling and transferring the pixels corresponding to lower SSIM values.
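A sketch of this amendment step using the per-pixel SSIM map from scikit-image (structural_similarity with full=True returns the map); the threshold value and function name are illustrative assumptions, not values from the paper.

```python
import numpy as np
from skimage.metrics import structural_similarity

def boundary_relabel_mask(initial, sf_fused, boundary, thresh=0.8):
    """Flag boundary-neighborhood pixels whose local SSIM between the
    copy-based initial fusion and the SF-weighted fusion is low."""
    _, ssim_map = structural_similarity(initial.astype(np.float64),
                                        sf_fused.astype(np.float64),
                                        data_range=255.0, full=True)
    return boundary & (ssim_map < thresh)
```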

4.4. Fusion process

The fusion rule is simple: pixels are chosen from the focus regions of the source images, and along the boundary of the focus region we use formula (10) as the weight function (see the sketch after the list below). The weighted fusion along the boundary alleviates block artifacts and improves the visual perception.

4.5. Advantages of proposed fusion framework

We summarize the problems of the existing methods as follows:

(1) Blurring effects due to pixel based computation;
(2) Block artifacts due to simple copy rules between multifocus images;
(3) Information loss due to the reconstruction process of the different transform methods;


Fig. 20. The difference images between: (a) Figs. 19(a) and 18(a); (b) Figs. 19(b) and 18(a); (c) Figs. 19(c) and 18(a); (d) Figs. 19(d) and 18(a); (e) Figs. 19(e) and 18(a); and (f) Figs. 19(f) and 18(a).

Fig. 21. Plane (a) left focus; and (b) right focus.

(4) Post-processing, such as morphological operations, that is not robust for determining focus regions due to the various choices of structuring element shapes and mask sizes.
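Putting Sections 4.2 and 4.4 together, the final rule can be sketched as below; the band width and the use of morphological dilation/erosion to define the boundary band are our illustrative choices.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def fuse(a, b, focus_a, w_a, band=3):
    """Sec. 4.4 rule: copy pixels according to the focus map, but blend with
    the SF weights of Eq. (10) inside a thin band around the region boundary."""
    inner = binary_erosion(focus_a, iterations=band)
    outer = binary_dilation(focus_a, iterations=band)
    boundary = outer & ~inner
    fused = np.where(focus_a, a, b).astype(np.float64)
    fused[boundary] = (w_a[boundary] * a[boundary]
                       + (1.0 - w_a[boundary]) * b[boundary])
    return fused
```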

Generally, the proposed method is based on spatial focus region detection, so we avoid problems (1) and (3), caused by pixel level fusion methods and by the reconstruction process of multiresolution methods, respectively.


Fig. 22. Fusion results of different methods for ‘Plane’: (a) LP; (b) WT; (c) NSCT; (d) BM; (e) SC; and (f) the proposed method.

Fig. 23. The difference images between: (a) Figs. 22(a) and 21(a); (b) Figs. 22(b) and 21(a); (c) Figs. 22(c) and 21(a); (d) Figs. 22(d) and 21(a); (e) Figs. 22(e) and 21(a); and (f) Figs. 22(f) and 21(a).


In practice, we employ the spatial frequency weighted function to fuse the boundary of the finally detected focus regions, considering both visual perception and index evaluation, so problem (2) is expected to be solved. For problem (4), our post-processing method is driven by the images themselves through structural similarity, and it is universal across different multifocus images. Besides, we produce the initial focus region detection result, i.e. a binary image, and segment the binary image using the normalized cut, which does not suffer from the problem that focus and defocus regions are mixed together, as mentioned above for Fig. 2(c). Furthermore, we locate the boundary of the focus region and use the spatial frequency based weight function to preserve the edges near the boundary; thus the fusion result keeps most of the original information of the focus regions of the source images, and the visual quality of the fusion result is enhanced along the boundary of the focus region.

The wavelet based focus measure provides a weakly acceptable result in Fig. 5(c), from which we obtain the initial fusion result shown in Fig. 9(a) by simply copying from the source images in Fig. 2(a) and (b) according to the focus region detection result. If we substitute the wavelet for the QWT in our proposed fusion framework, we can improve the fusion result, as in Fig. 9(b), but it is still worse than the fusion result based on the QWT in Fig. 9(c).

Table 1
Evaluation of fusion results by MI.

Methods    Clock    Pepsi    Lab      Note     Plane
LP         7.4725   7.1848   6.3484   5.1841   7.4676
WT         9.0285   8.7242   8.6510   6.8355   8.8642
NSCT       6.7213   6.7941   7.0040   4.4495   6.9733
BM         8.8856   8.4355   8.5523   7.1201   8.6041
SC         7.1760   6.9614   7.1157   4.8710   7.6984
Proposed   8.9971   8.8599   8.7697   6.8542   8.9018

Table 2
Evaluation of fusion results by QAB/F.

Methods    Clock    Pepsi    Lab      Note     Plane
LP         0.7392   0.7817   0.5960   0.6861   0.7393
WT         0.7438   0.7840   0.7567   0.7048   0.7421
NSCT       0.7185   0.7757   0.7267   0.6467   0.7309
BM         0.7430   0.7826   0.7514   0.6788   0.7561
SC         0.6423   0.7396   0.5921   0.5260   0.7273
Proposed   0.7443   0.7927   0.7577   0.7062   0.7644

Fig. 24. Toy (a) left focus; (b) middle focus; and (c) right focus.


Fig. 25. Fusion results (a) using WT; and (b) using QWT.

From the difference images between the fusion results and the source image, shown in Fig. 9(d)–(f), we can easily observe that the proposed fusion method performs better, because it preserves most of the source information in the focus region.

5. Experimental results

We conduct experiments on five pairs of two-focus images, 'Clock', 'Pepsi', 'Lab', 'Note' and 'Plane', together with a group of three-focus images, 'Toy'. The proposed framework is compared with two pixel level and three region level fusion methods using visual perception and objective indices, namely MI [31] and QAB/F [32]. MI employs mutual information to represent the amount of information transferred from the source images to the final fused image; the overall fusion performance is the sum of the mutual information between each source image and the final fused image. QAB/F evaluates the amount of edge information transferred from the input images to the fused image; the edge detection process is based on the Sobel operator, applied both horizontally and vertically. The compared pixel level methods are the Laplacian pyramid (LP) and the nonsubsampled contourlet transform based fusion (NSCT) [17], which usually performs best [33] in contrast with the DWT, dual-tree complex wavelet, stationary wavelet, curvelet and contourlet transforms. We also compare the proposed fusion result with region level fusion methods using the DWT (substituted for the QWT in our framework), the blur measure (BM) [9] and similarity characteristics (SC) [11].

5.1. Subjective evaluation

In this section, we compare the proposed fusion method with the others in terms of visual perception. For 'Clock' in Fig. 2(a) and (b), Fig. 10(a)–(d) shows the images fused by LP, NSCT, BM and SC, respectively, while the WT and QWT based results are shown in Fig. 9(b) and (c). To make the comparison clearer, Fig. 11(a)–(d) shows the difference images between the fused images in Fig. 10 and the source image in Fig. 2(a).

For the focused regions, the difference between the source image and the ideal fused image should be zero, so fewer residual features in the difference image indicate a better ability of the fusion method to incorporate the necessary information of the source image. From Fig. 10(a), LP does not produce a continuous edge in the black square. Also, in Fig. 10(b) there are slightly undesirable edges in the black squares because of edge dislocation; this is common for pixel level fusion methods. For the region level BM method in Fig. 10(c), the window based blur measure cannot accurately locate the focus region boundary, which results in the artifacts in the black square. The visual perception of the proposed method in Fig. 9(c) is better. Among the pixel level fusion methods, LP performs well in Fig. 11(a). From Fig. 11(b), partial edge information is lost on the top of the right clock for NSCT. Finally, from Fig. 9(f) there is almost no residue for the proposed method, which is better than BM and SC.

The multifocus image pairs 'Pepsi' and 'Lab' are shown in Figs. 12 and 15, respectively. From Fig. 13(a), (b) and (f), there are no distinct visual defects in the LP, WT and proposed fusion results. The NSCT based fusion result shows misregistration artifacts in the black square in Fig. 13(c), and the BM method shows detection errors of the focus region in Fig. 13(d). From Fig. 14(a)–(f), we can see that the proposed method detects the focus region more accurately. The above observations also apply to the fusion results in Figs. 16 and 17 for 'Lab'; the black squares in Fig. 16 show obvious defects in the fusion results. When we substitute the wavelet for the QWT in the proposed fusion framework, the results become worse, mainly because the wavelet based focus measure is inferior to the QWT based one.

For 'Note' and 'Plane', shown in Figs. 18 and 21, the fusion results are illustrated in Figs. 19 and 22. In our proposed fusion framework, the WT and QWT based focus measures both perform better than the other methods, as shown by the difference images in Figs. 20 and 23 between the fusion results and the original multifocus images.


Fig. 26. The difference images between: (a) Figs. 25(a) and 24(a); (b) Figs. 25(b) and 24(a); (c) Figs. 25(a) and 24(b); (d) Figs. 25(b) and 24(b); (e) Figs. 25(a) and 24(c); and (f) Figs. 25(b) and 24(c).

5.2. Objective evaluation

For further comparison, two objective criteria are used to evaluate the fusion results; for both criteria, a larger value indicates a better fusion result. The values of MI and QAB/F for the different fusion results are listed in Tables 1 and 2, respectively.

The proposed fusion method outperforms the reference methods in terms of both MI and QAB/F, except that WT and BM score slightly higher than QWT in MI for 'Clock' and 'Note', respectively; however, Figs. 9 and 20 show obvious defects in the WT and BM fusion results, whereas QWT performs better in terms of visual perception.
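For reference, the MI index of Table 1 can be sketched as a sum of histogram-based mutual information values; the bin count and log base below are common choices, not specified by the paper.

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """Histogram-based mutual information between two grayscale images (in bits)."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))

# Table 1 style score for fused image F and sources A, B:
# MI = mutual_information(A, F) + mutual_information(B, F)
```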


Above all, the proposed fusion method performs better from the perspective of both subjective and objective evaluation. The pixel level fusion methods referenced in this paper tend to produce undesirable extra textures because of local misregistration, especially around image edges. The region level fusion method using the blur measure can eliminate most detection errors in the interior of the focus region through a filling operation, but cannot locate the focus region boundary accurately.

5.3. Three focus images fusion

Our proposed fusion framework is not limited to fusing two focus images. It is easily extended to more focus images, because no parameter in the algorithm is related to the number of foci. For the three focus images 'Toy' shown in Fig. 24, considering the better performance discussed above, the fusion results based on the wavelet based focus measure and on the proposed framework are shown in Fig. 25(a) and (b), and the difference images between the fusion results and the source images are shown in Fig. 26. In the white squares of Fig. 25(a), there is obvious blur in the wavelet based fusion result. The only defect of the proposed method appears on the top of the ball in Fig. 26(d) and (f), while Fig. 25(b), produced by the proposed fusion framework, looks good in terms of visual perception.

6. Conclusions

In this paper, we proposed a region level framework for focus region detection using the quaternion wavelet transform and the normalized cut. It combines the advantages of spatial and transform domain based methods: (1) the phase based local focus measure using the quaternion wavelet roughly detects the initial focus regions; (2) binary image segmentation using the normalized cut helps refine the focus region detection result; (3) the initial detection result is corrected by locating the boundary of the focus region with the structural similarity index; and (4) spatial frequency is used as a weight function to improve the visual perception along the boundary of the focus region. The experimental results on five pairs of two focus images and a group of three focus images demonstrate the superior performance of the proposed fusion scheme from the perspective of both subjective and objective evaluation.

Acknowledgment

This work was financially supported by the National Basic Research Program of China (973 Program, Grant no. 2012CB720000) and the National Natural Science Foundation of China (Grant nos. 60901043 and 61201307), and by the Innovation Funds of Harbin Institute of Technology (Grant no. IDGA18102011). We also thank the China Scholarship Council for financial support.


References

[1] G. Pajares, J. Cruz, A wavelet-based image fusion tutorial, Pattern Recognit. 37 (2004) 1855–1872.
[2] A. Goshtasby, S. Nikolov, Image fusion: advances in the state of the art, Inf. Fusion 8 (2007) 114–118.
[3] W. Chen, C. Quan, C.J. Tay, Extended depth of focus in a particle field measurement using a single-shot digital hologram, Appl. Phys. Lett. 95 (2009) 201103.
[4] W. Chen, X. Chen, Focal-plane detection and object reconstruction in the noninterferometric phase imaging, J. Opt. Soc. Am. A 29 (2012) 585–592.
[5] Y. Liu, J. Wang, Y. Hong, Z. Wang, K. Zhang, P.A. Williams, P. Zhu, J.C. Andrews, P. Pianetta, Z. Wu, Extended depth of focus for transmission x-ray microscope, Opt. Lett. 37 (2012) 3708–3710.
[6] J. Tian, L. Chen, L. Ma, W. Yu, Multi-focus image fusion using a bilateral gradient-based sharpness criterion, Opt. Commun. 284 (2011) 80–97.
[7] B. Yang, S. Li, Pixel-level image fusion with simultaneous orthogonal matching pursuit, Inf. Fusion 13 (2010) 10–19.
[8] G. Piella, A general framework for multiresolution image fusion: from pixels to regions, Inf. Fusion 4 (2003) 259–280.
[9] Y.J. Zhang, L. Ge, Efficient fusion scheme for multi-focus images by using blurring measure, Digital Signal Process. 19 (2009) 186–193.
[10] S. Li, B. Yang, Multifocus image fusion using region segmentation and spatial frequency, Image Vision Comput. 26 (2008) 971–979.
[11] X. Luo, J. Zhang, Q. Dai, A regional image fusion based on similarity characteristics, Signal Process. 92 (2012) 1268–1280.
[12] T. Pu, G. Ni, Contrast-based image fusion using the discrete wavelet transform, Opt. Eng. 39 (2000) 2075–2082.
[13] J. Tian, L. Chen, Adaptive multi-focus image fusion using a wavelet-based statistical sharpness measure, Signal Process. 92 (2012) 2137–2146.
[14] Z. Liu, K. Tsukada, K. Hanasaki, Y.K. Ho, Y.P. Dai, Image fusion by using steerable pyramid, Pattern Recognit. Lett. 22 (2001) 929–939.
[15] F. Nencini, A. Garzelli, S. Baronti, L. Alparone, Remote sensing image fusion using the curvelet transform, Inf. Fusion 8 (2007) 143–156.
[16] J.J. Lewis, R.J. O'Callaghan, S.G. Nikolov, D.R. Bull, N. Canagarajah, Pixel- and region-based image fusion with complex wavelets, Inf. Fusion 8 (2007) 119–130.
[17] Q. Zhang, B.L. Guo, Multi-focus image fusion using the nonsubsampled contourlet transform, Signal Process. 89 (2009) 1334–1346.
[18] P. Geng, Z. Wang, Z. Zhang, Z. Xiao, Image fusion by pulse couple neural network with shearlet, Opt. Eng. 51 (2012) 067005.
[19] S. Li, B. Yang, Multifocus image fusion by combining curvelet and wavelet transform, Pattern Recognit. Lett. 29 (2008) 1295–1301.
[20] S. Li, B. Yang, Hybrid multiresolution method for multisensor multimodal image fusion, IEEE Sens. J. 10 (2010) 1519–1526.
[21] Y. Chai, H. Li, Z. Li, Multifocus image fusion scheme using focused region detection and multiresolution, Opt. Commun. 284 (2011) 4376–4389.
[22] H. Li, Y. Chai, Z. Li, Multi-focus image fusion based on nonsubsampled contourlet transform and focused regions detection, Optik 124 (2013) 40–51.
[23] W.L. Chan, H. Choi, R.G. Baraniuk, Coherent multiscale image processing using dual-tree quaternion wavelets, IEEE Trans. Image Process. 17 (2008) 1069–1082.
[24] Y. Liu, J. Jin, Q. Wang, Y. Shen, Phase-preserving speckle reduction based on soft thresholding in quaternion wavelet domain, J. Electron. Imaging 21 (2012) 043009.
[25] Y. Liu, J. Jin, Q. Wang, Y. Shen, X. Dong, Novel focus region detection method for multifocus image fusion using quaternion wavelet, J. Electron. Imaging 22 (2013) 023017.
[26] L. Guo, M. Dai, M. Zhu, Multifocus color image fusion based on quaternion curvelet transform, Opt. Express 20 (2012) 18846–18860.
[27] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 888–905.
[28] T. Bülow, Hypercomplex spectral signal representations for the processing and analysis of images, Ph.D. dissertation, Christian Albrechts University, Kiel, Germany, 1999.


[29] Y. Liu, J. Jin, Q. Wang, Y. Shen, Phases measure of image sharpness based on quaternion wavelet, Pattern Recognit. Lett. 34 (2013) 1063–1070.
[30] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (2004) 600–612.

[31] G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion, Electron. Lett. 38 (2002) 313–315.
[32] C.S. Xydeas, V. Petrovic, Objective image fusion performance measure, Electron. Lett. 36 (2000) 308–309.
[33] S. Li, B. Yang, J. Hu, Performance comparison of different multiresolution transforms for image fusion, Inf. Fusion 12 (2011) 74–84.