
J. Vis. Commun. Image R. 40 (2016) 218–224


Image fusion based on visual salient features and the cross-contrast

Jianhua Adu a, Shenghua Xie b,*, Jianhong Gan a

a Software Department, Chengdu University of Information Technology, Chengdu 610225, China
b Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Sichuan Provincial Key Laboratory of Ultrasound in Cardiac Electrophysiology and Biomechanics, Chengdu 610072, China

Article history: Received 17 May 2016; Revised 20 June 2016; Accepted 25 June 2016; Available online 27 June 2016

Keywords: Image fusion; Nonsubsampled contourlet transform; Visual salient features; The cross-contrast

Abstract

To extract and combine the features of the original images, a novel fusion algorithm based on visual salient features and the cross-contrast is proposed in this paper. The original images are decomposed into low-frequency subband coefficients and bandpass directional subband coefficients using the nonsubsampled contourlet transform. Three visual saliency maps are constructed from the local energy, the contrast and the gradient respectively, and the fused low-frequency subband coefficients are obtained by utilizing these maps. The cross-contrast is computed as the ratio between the local gray mean of the bandpass directional subband coefficients and the local gray mean of the fused low-frequency subband coefficients, and the fused bandpass directional subband coefficients are selected according to it. Comparison experiments have been performed on different image sets, and the experimental results demonstrate that the proposed method performs better in terms of both subjective and objective quality.

© 2016 Elsevier Inc. All rights reserved.

1. Introduction

Image fusion is an active research area in optical signal processing; its objective is to combine useful information from several images of the same scene [1]. Multiple images of one scene may be acquired by different image sensors, under different optical conditions or at different times, and integrating their data yields more information [2]. Because the fused image contains the main features of several images captured by different sensors, the target object in the scene can be observed and distinguished more clearly, comprehensively and reliably. As an important image analysis and computer vision technology, image fusion has been widely applied to target recognition, computer vision, remote sensing, robotics, medical image processing, military applications, etc. Meanwhile, image fusion can provide more effective information for further computer image processing, such as high-efficiency video processing, image classification, image segmentation, and object recognition and detection [3–9]. In recent years, many effective image fusion methods have been proposed, such as methods based on the multi-scale transform (MST) [10], on ICA or PCA [11], on neural networks [12], on SIFT [13]


This paper has been recommended for acceptance by M.T. Sun.

* Corresponding author.

E-mail address: [email protected] (S. Xie).

http://dx.doi.org/10.1016/j.jvcir.2016.06.026
1047-3203/© 2016 Elsevier Inc. All rights reserved.

and on morphological component analysis [14]. Multi-scale transform (MST)-based methods are the most popular and important tools in image processing and are also effectively used for image fusion. Classical MST-based fusion methods include pyramid-based, wavelet-based and multi-scale geometric analysis (MGA)-based ones. Pyramid-based methods include the Laplacian pyramid (LP) [15,16], the ratio of low-pass pyramid (RP) [17] and the gradient pyramid (GP) [18,19]. Wavelet-based methods include the discrete wavelet transform (DWT) [10,20], the stationary wavelet transform (SWT) [21–24] and the dual-tree complex wavelet transform (DTCWT) [25]. MGA-based methods include the curvelet transform (CVT) [26,27], the ridgelet transform [28], the nonsubsampled contourlet transform (NSCT) [29–31] and the nonsubsampled shearlet transform (NSST) [32–34]. In general, MST-based fusion methods consist of the following three steps [35,36]. First, the original images are decomposed into a multi-scale transform domain. Secondly, the transformed coefficients are merged with a given fusion rule. Finally, the fused image is reconstructed by performing the corresponding inverse transform over the merged coefficients. It is therefore obvious that the fusion rules for the high-pass and low-pass subbands play a crucial role in the result of image fusion. Moreover, the transform domain also has a great impact on the fused results.
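As an editorial illustration of this three-step structure (not part of the original text), the following minimal Python/NumPy sketch treats the transform as a black box: the `decompose` and `reconstruct` callables are placeholders standing in for any of the transforms listed above, and the average/absolute-maximum rules shown here are only one example of a "given fusion rule".

```python
import numpy as np

def mst_fuse(img1, img2, decompose, reconstruct):
    """Generic three-step MST fusion: decompose, merge coefficients, invert.

    `decompose` maps an image to (lowpass, [bandpass_1, ..., bandpass_K]);
    `reconstruct` is its inverse. Both are placeholders for a concrete
    multi-scale transform (LP, DWT, NSCT, ...)."""
    low1, bands1 = decompose(img1)
    low2, bands2 = decompose(img2)

    # Example rules only: average the lowpass subband and keep the
    # bandpass coefficient with the larger magnitude at each pixel.
    low_f = 0.5 * (low1 + low2)
    bands_f = [np.where(np.abs(b1) >= np.abs(b2), b1, b2)
               for b1, b2 in zip(bands1, bands2)]

    return reconstruct(low_f, bands_f)
```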


Humans depend primarily on the visual sense to obtain information from the outside world. Studies of the human visual system (HVS) have shown that, when observing and understanding an image, the HVS is usually more concerned with the salient features of the image [37–39]. Several analysis methods based on visual saliency have also been proposed to quickly detect salient areas or targets in an image [40–43]. In this paper, three feature maps are constructed based on visual saliency, from the local energy, the contrast and the gradient respectively, and the low-frequency subband coefficients are fused by utilizing these visual saliency maps. Then, a cross-contrast fusion method is used to obtain the bandpass directional subband coefficients, where the cross-contrast is the ratio between the local gray mean of the bandpass directional subband coefficients and the local gray mean of the fused low-frequency subband coefficients. A comparative study of different MST-based methods is reported in [44], where Li et al. found that the NSCT-based method generally achieves the best results. Therefore, the NSCT is selected as the multi-scale transform in this paper.

This paper is organized as follows. The following section briefly explains the principle of the NSCT, Section 3 introduces image fusion based on the nonsubsampled contourlet transform, and Section 4 introduces the proposed fusion algorithm based on visual salient features and the cross-contrast. In Section 5, the experimental results and analysis are presented. Finally, conclusions are given in Section 6. For brevity, in the remainder of this paper we abbreviate low-frequency subband coefficients as LFS coefficients and bandpass directional subband coefficients as BDS coefficients.

2. Non-subsampled contourlet transform

Multiscale geometric analysis tools have been broadly used in image fusion. The wavelet transform is an efficient tool for representing one-dimensional (1-D) piecewise smooth signals, but for two-dimensional (2-D) signals it cannot efficiently preserve the edges of a natural image. In addition, separable wavelets capture only limited directional information of multi-dimensional signals. To overcome these drawbacks of wavelets in dealing with higher-dimensional signals, Do and Vetterli [45] pioneered a new representation named the contourlet, a true 2-D representation of images. The contourlet transform uses the Laplacian pyramid (LP) for multi-scale decomposition and the directional filter bank (DFB) for directional decomposition, and was proposed to address the lack of geometrical structure in the separable 2-D wavelet transform. Because of its filter bank structure, however, the contourlet transform is not shift-invariant. In 2006, a novel multi-scale decomposition evolving from the contourlet transform, the nonsubsampled contourlet transform (NSCT), was proposed by da Cunha et al. [46].

The NSCT not only retains the multi-scale, localized and multi-directional characteristics of the contourlet, but is also shift-invariant. The size of every subband is identical to that of the original image, so it is easy to relate coefficients across subbands, which is beneficial for designing fusion rules. The contourlet transform and the NSCT follow a similar approach to decomposition and reconstruction; in the NSCT the multiscale analysis and the multidirection analysis are likewise separate, but both are shift-invariant. The NSCT is built from nonsubsampled pyramid filter banks (NSPFB) and nonsubsampled directional filter banks (NSDFB), and each subband image has the same size as the original image. Therefore, the NSCT is a flexible multi-scale, multi-direction and shift-invariant image decomposition, as displayed in Fig. 1. First, the NSPFB produces a multiscale decomposition using two-channel nonsubsampled 2-D filter banks; this decomposition is similar to the 1-D nonsubsampled wavelet transform (NSWT) computed with the à trous algorithm [47]. Second, the NSDFB splits the bandpass subband at each scale into different directions. Finally, an inverse transform can be applied to the coefficients obtained by the NSCT to reconstruct the image. More details can be found in [46]. Consequently, introducing the NSCT into image fusion exploits its ability to represent the features of the original images effectively. Although the NSCT is computationally expensive, its results are excellent, and with the computing power of modern hardware this shortcoming matters little compared with its superior performance.
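To make the shift-invariant, same-size property concrete, the following is a minimal sketch (an editorial illustration, not the authors' implementation and not the NSCT toolbox of [46]) of an à trous-style nonsubsampled pyramid: each level filters the previous approximation with a lowpass kernel whose taps are spread by inserting zeros ("holes"), so no subsampling occurs and every subband keeps the original image size. The real NSPFB/NSDFB use designed 2-D filter banks rather than this simple B3-spline kernel.

```python
import numpy as np
from scipy.ndimage import convolve

def atrous_pyramid(img, levels=3):
    """Nonsubsampled (a trous) pyramid: every subband has the image's size."""
    # 1-D B3-spline kernel; the separable 2-D kernel is its outer product.
    h = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    approx = img.astype(float)
    details = []
    for j in range(levels):
        # Insert 2**j - 1 zeros between taps ("holes") instead of subsampling.
        hj = np.zeros(len(h) + (len(h) - 1) * (2 ** j - 1))
        hj[:: 2 ** j] = h
        kernel2d = np.outer(hj, hj)
        smoothed = convolve(approx, kernel2d, mode="mirror")
        details.append(approx - smoothed)   # bandpass subband at scale j
        approx = smoothed                    # coarser approximation
    return approx, details                  # lowpass residual + detail planes

# Perfect reconstruction by construction: img == approx + sum(details).
```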

3. The image fusion based on nonsubsampled contourlet transform

Based on the above theory, the NSCT can effectively be applied to image fusion. Image fusion based on the nonsubsampled contourlet transform is usually performed by the following steps [48].

3.1. Image decomposition

In this paper, the intensity of pixel (m, n) of a decomposed image is referred to as an "image coefficient". Firstly, assume that two original images f1 and f2 have been geometrically registered to each other. Secondly, f1 and f2 are separately decomposed into multiple scales and directions with the NSCT, yielding the coefficients $\{C^{f_1}_{i_0}(m,n),\ C^{f_1}_{i,l}(m,n)\ (i \ge i_0)\}$ and $\{C^{f_2}_{i_0}(m,n),\ C^{f_2}_{i,l}(m,n)\ (i \ge i_0)\}$ respectively. Here, $C_{i_0}(m,n)$ is obtained after the multiscale decomposition by the two-channel nonsubsampled 2-D filter banks and denotes the lowpass subband coefficient of pixel (m, n) at the coarsest scale, i.e., the $i_0$-th scale. $C_{i,l}(m,n)$ is obtained after splitting the bandpass subband at each scale into different directions and denotes the BDS coefficient of pixel (m, n) at the i-th scale and the l-th direction. In this paper, $C_{i_0}(m,n)$ and $C_{i,l}(m,n)$ are obtained with the NSCT toolbox provided by da Cunha et al. [46].

Fig. 1. Nonsubsampled contourlet transform decomposition framework.


3.2. Image fusion

After the lowpass and bandpass subband coefficients have been obtained, the NSCT coefficients $\{C^{F}_{i_0}(m,n),\ C^{F}_{i,l}(m,n)\ (i \ge i_0)\}$ of the fused image F can be obtained by applying different fusion rules.

3.3. Inverse transforms

With the inverse NSCT, the fused image is reconstructed from the fused coefficients.

4. Fusion algorithm based on visual salient features and the cross-contrast

In this section, the proposed algorithm is discussed in detail. A fused image f is assumed to be generated from a pair of original images f1 and f2 that have already been registered perfectly. In NSCT-based image fusion, the fusion rules play a decisive role in the quality of the fused image. In this paper, LFS coefficients are selected based on visual salient features, and a cross-contrast method is used to select BDS coefficients. The schematic diagram of the proposed image fusion method is shown in Fig. 2.

4.1. Fusion rule of low frequency subband coefficients

After the original images are decomposed into subband coefficients by NSCT, the LFS coefficients $C^{f_1}_{j_0}(m,n)$ and $C^{f_2}_{j_0}(m,n)$ are obtained, and they are fused into the LFS coefficients $C^{f}_{j_0}(m,n)$ of the fused image by the proposed method. In an image, the local energy, the contrast and the gradient are three important features of visual saliency. In this paper, three feature maps are constructed from these three features and thresholded by the maximum entropy thresholding method of [49], and the LFS coefficients of the fused image are obtained by utilizing the resulting visual saliency maps.

The local energy effectively describes sharp objects in a multi-focus image and infrared targets in an infrared image. In a multi-focus image, the area near the focus point is sharp and the defocused area is blurred; defocusing is essentially equivalent to low-pass filtering a sharp image with a Gaussian smoothing function G(x, y), so some high-frequency information of the original image is filtered out. Therefore, the local energy of a defocused area should be smaller than that of an in-focus area. In an infrared image, the local energy of infrared targets is significantly higher than that of other areas. The local energy is therefore used as a visual saliency feature to construct the local energy map, as shown in Eq. (1), where w is a local area of the image and f(i, j) is the coefficient of the pixel (i, j):

$$E(x,y) = \sum_{(i,j)\in w} f(i,j)^2 \qquad (1)$$

In an image, visually salient areas can also be effectively identified by the contrast, which is used in this paper to identify sharp areas or infrared targets. The contrast map is constructed as the ratio of the local gray mean to the mean of the entire image, as shown in Eq. (2):

$$R(x,y) = \frac{\dfrac{1}{mn}\sum_{i=-(m-1)/2}^{(m-1)/2}\sum_{j=-(n-1)/2}^{(n-1)/2} f(x+i,\,y+j)}{\dfrac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N} f(x,y)} \qquad (2)$$

Areas with sharp texture can be effectively identified by the gradient, so the gradient is used as the third visual saliency feature. The Sobel operator is used to construct the gradient map, as shown in Eq. (3):

$$\begin{cases} \Delta_x f(x,y) = [f(x-1,y+1)+2f(x,y+1)+f(x+1,y+1)] - [f(x-1,y-1)+2f(x,y-1)+f(x+1,y-1)] \\ \Delta_y f(x,y) = [f(x-1,y-1)+2f(x-1,y)+f(x-1,y+1)] - [f(x+1,y-1)+2f(x+1,y)+f(x+1,y+1)] \\ G(x,y) = \sqrt{\Delta_x f(x,y)^2 + \Delta_y f(x,y)^2} \end{cases} \qquad (3)$$
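The three feature maps of Eqs. (1)–(3) can be sketched as follows (an editorial NumPy illustration under our own assumptions about window handling, not the authors' code); a simplified single-threshold maximum entropy threshold in the spirit of [49] is included to turn each feature map into a binary saliency set such as $M_{ASE}$.

```python
import numpy as np
from scipy.ndimage import uniform_filter, sobel

def local_energy(f, w=3):
    """Eq. (1): sum of squared coefficients over a w x w window."""
    return uniform_filter(f.astype(float) ** 2, size=w) * (w * w)

def contrast_map(f, w=3):
    """Eq. (2): local gray mean over a w x w window divided by the global mean."""
    return uniform_filter(f.astype(float), size=w) / (f.mean() + 1e-12)

def gradient_map(f):
    """Eq. (3): Sobel gradient magnitude."""
    gx = sobel(f.astype(float), axis=1)
    gy = sobel(f.astype(float), axis=0)
    return np.hypot(gx, gy)

def max_entropy_threshold(feature, bins=256):
    """Kapur-style maximum entropy threshold [49], simplified:
    pick the bin edge maximizing background + foreground entropy."""
    hist, edges = np.histogram(feature, bins=bins)
    p = hist.astype(float) / hist.sum()
    best_t, best_h = 1, -np.inf
    for t in range(1, bins):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 <= 0 or p1 <= 0:
            continue
        q0, q1 = p[:t][p[:t] > 0] / p0, p[t:][p[t:] > 0] / p1
        h = -np.sum(q0 * np.log(q0)) - np.sum(q1 * np.log(q1))
        if h > best_h:
            best_h, best_t = h, t
    return edges[best_t]

# Binary saliency sets for one low-frequency image (e.g. M_ASE, M_ASR, M_ASG):
# M_SE = local_energy(fL) > max_entropy_threshold(local_energy(fL))
# M_SR = contrast_map(fL) > max_entropy_threshold(contrast_map(fL))
# M_SG = gradient_map(fL) > max_entropy_threshold(gradient_map(fL))
```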

In this paper, the two original images are decomposed by NSCT to obtain the low-frequency images $f^{L}_1$ and $f^{L}_2$. For simplicity, we assume that their LFS coefficients are $C^{f_1}_{j_0}(m,n)$ and $C^{f_2}_{j_0}(m,n)$, and that the LFS coefficients of the fused image are $C^{f}_{j_0}(m,n)$. The visual saliency maps of the low-frequency images $f^{L}_1$ and $f^{L}_2$ are obtained by the maximum entropy thresholding method of [49]: the local energy maps $M_{ASE}$ and $M_{BSE}$, the contrast maps $M_{ASR}$ and $M_{BSR}$, and the gradient maps $M_{ASG}$ and $M_{BSG}$. Finally, the fusion of the LFS coefficients is achieved by the following rules.

Fig. 2. Schematic diagram of the proposed image fusion method.


(1) If $C^{f_1}_{j_0}(m,n) \in M_{ASE} \cup M_{ASR} \cup M_{ASG}$ and $C^{f_2}_{j_0}(m,n) \notin M_{BSE} \cup M_{BSR} \cup M_{BSG}$, then $C^{f}_{j_0}(m,n) = C^{f_1}_{j_0}(m,n)$; on the contrary, $C^{f}_{j_0}(m,n) = C^{f_2}_{j_0}(m,n)$.

(2) If $C^{f_1}_{j_0}(m,n) \in M_{ASE}$, set $(C^{f_1}_{j_0}(m,n) \in M_{ASE}) = 1$, otherwise $(C^{f_1}_{j_0}(m,n) \in M_{ASE}) = 0$; the indicators for $M_{ASR}$, $M_{ASG}$ and for $M_{BSE}$, $M_{BSR}$, $M_{BSG}$ are defined in the same way. If $(C^{f_1}_{j_0}(m,n) \in M_{ASE}) + (C^{f_1}_{j_0}(m,n) \in M_{ASR}) + (C^{f_1}_{j_0}(m,n) \in M_{ASG}) > (C^{f_2}_{j_0}(m,n) \in M_{BSE}) + (C^{f_2}_{j_0}(m,n) \in M_{BSR}) + (C^{f_2}_{j_0}(m,n) \in M_{BSG})$, then $C^{f}_{j_0}(m,n) = C^{f_1}_{j_0}(m,n)$; on the contrary, $C^{f}_{j_0}(m,n) = C^{f_2}_{j_0}(m,n)$.

(3) If $(C^{f_1}_{j_0}(m,n) \in M_{ASE}) + (C^{f_1}_{j_0}(m,n) \in M_{ASR}) + (C^{f_1}_{j_0}(m,n) \in M_{ASG}) = (C^{f_2}_{j_0}(m,n) \in M_{BSE}) + (C^{f_2}_{j_0}(m,n) \in M_{BSR}) + (C^{f_2}_{j_0}(m,n) \in M_{BSG}) \neq 0$, then compare the contrast $R(x,y)$ of Eq. (2): if $R_1(x,y) > R_2(x,y)$, then $C^{f}_{j_0}(m,n) = C^{f_1}_{j_0}(m,n)$; otherwise $C^{f}_{j_0}(m,n) = C^{f_2}_{j_0}(m,n)$.

(4) If $C^{f_1}_{j_0}(m,n) \notin M_{ASE} \cup M_{ASR} \cup M_{ASG}$ and $C^{f_2}_{j_0}(m,n) \notin M_{BSE} \cup M_{BSR} \cup M_{BSG}$, then $C^{f}_{j_0}(m,n) = (C^{f_1}_{j_0}(m,n) + C^{f_2}_{j_0}(m,n))/2$.
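Read as a per-pixel decision, rules (1)–(4) count how many of the three saliency sets each low-frequency coefficient falls into; the following is our own illustrative restatement in NumPy (assuming the saliency maps are boolean arrays of the same size as the subbands), in which rule (1) is subsumed by the count comparison of rule (2).

```python
import numpy as np

def fuse_lfs(c1, c2, maps_a, maps_b, r1, r2):
    """Fuse low-frequency subband coefficients following rules (1)-(4).

    c1, c2        : LFS coefficients of images A and B
    maps_a/maps_b : (M_SE, M_SR, M_SG) boolean saliency maps of A and B
    r1, r2        : contrast maps R(x, y) of Eq. (2), used as the tie-breaker
    """
    score_a = sum(m.astype(int) for m in maps_a)   # 0..3 saliency "votes" for A
    score_b = sum(m.astype(int) for m in maps_b)   # 0..3 saliency "votes" for B

    fused = np.where(score_a > score_b, c1, c2)               # rules (1) and (2)
    tie = (score_a == score_b) & (score_a > 0)
    fused = np.where(tie, np.where(r1 > r2, c1, c2), fused)   # rule (3)
    neither = (score_a == 0) & (score_b == 0)
    fused = np.where(neither, 0.5 * (c1 + c2), fused)         # rule (4)
    return fused
```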

4.2. Fusion rules of bandpass directional subband coefficients

Fig. 3. Schematic diagram of the cross-contrast method.

The local contrast reflects changes of characteristics within the image, such as edges and texture; it contains high-frequency information of the image and the intensity relative to the background, so the human visual system is sensitive to changes of local contrast in the image. Usually, the greater the contrast is, the more sensitive the visual system is. In view of this, the fusion method can selectively highlight the contrast of the fused image in order to achieve a good visual effect. The bandpass subband components of an image contain abundant details and reflect the mutation characteristics of the original image. In this paper, for the fusion of BDS coefficients, we exploit the sensitivity of human vision to image contrast to extract the details of the image, and propose a cross-contrast fusion method according to the characteristics of BDS coefficients. The cross-contrast represents the ratio between the local gray mean of the BDS image and the local gray mean of the fused LFS image, as shown in Fig. 3.

Fig. 4. Multi-focus original images and fusion results. (a) right-focused image; (b) left-focused image; (c) fused image based on DWT; (d) fused image based on NMF; (e) fused image based on NSCT; (f) fused image based on proposed method.


Because the pixels within a local area are strongly correlated, the characteristics of the image are not expressed by a single pixel but jointly by the adjacent pixels of the area. The cross-contrast $R_{j,l}(x,y)$ represents the ratio between the local gray mean of the BDS image and the local gray mean of the LFS image obtained by the fusion method described in the previous section. Since the BDS image will eventually be superimposed on the low-frequency image to restore a complete image, it is more beneficial to select the BDS coefficients of the fused image by constructing the cross-contrast from the BDS coefficients and the fused low-frequency coefficients. Given the aforementioned sensitivity of the human visual system to contrast, the cross-contrast reflects, to some extent, the contribution of the pixels of the BDS image to the fused image. $R_{j,l}(x,y)$ is defined in Eq. (4), wherein $S_{j,l}(x,y)$ is the BDS coefficient at scale j and direction l, and a, b describe the size of its local area. $S^{L}(x+i,\,y+j)$ denotes the LFS coefficient of the fused image, and $\frac{1}{mn}\sum_{i=-(m-1)/2}^{(m-1)/2}\sum_{j=-(n-1)/2}^{(n-1)/2} S^{L}(x+i,\,y+j)$ is the local gray mean of the fused LFS coefficients over a window of size m × n. Since the local area of the fused LFS image is used as the background for the cross-contrast, a, b and m, n in Eq. (4) both describe local area sizes, but a × b is smaller than m × n. In this paper, a and b are both 3, and m and n are both 15.

$$R_{j,l}(x,y) = \frac{\dfrac{1}{ab}\sum_{p=-(a-1)/2}^{(a-1)/2}\sum_{q=-(b-1)/2}^{(b-1)/2} S_{j,l}(x+p,\,y+q)}{\dfrac{1}{mn}\sum_{i=-(m-1)/2}^{(m-1)/2}\sum_{j=-(n-1)/2}^{(n-1)/2} S^{L}(x+i,\,y+j)} \qquad (4)$$

The two original images f1 and f2, registered rigorously, are decomposed by NSCT; the BDS coefficients $C^{f_1}_{j,l}(m,n)$ and $C^{f_2}_{j,l}(m,n)$ are obtained after decomposing the high-frequency component at each scale into different directions, where $C^{f_1}_{j,l}(m,n)$ and $C^{f_2}_{j,l}(m,n)$ denote the BDS coefficients at pixel (m, n), the j-th scale and the l-th direction. The cross-contrasts $R^{f_1}_{j,l}(m,n)$ and $R^{f_2}_{j,l}(m,n)$ are then obtained by Eq. (4). Finally, the BDS coefficient of the fused image $C^{f}_{j,l}(m,n)$ is selected by Eq. (5).

$$C^{f}_{j,l}(m,n) = \begin{cases} C^{f_1}_{j,l}(m,n), & R^{f_1}_{j,l}(m,n) \ge R^{f_2}_{j,l}(m,n) \\ C^{f_2}_{j,l}(m,n), & R^{f_1}_{j,l}(m,n) < R^{f_2}_{j,l}(m,n) \end{cases} \qquad (5)$$
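A minimal sketch of Eqs. (4) and (5) follows (our own illustration, with a small constant guarding against division by zero, which the paper does not discuss): the cross-contrast of each bandpass subband is its 3 × 3 local mean divided by the 15 × 15 local mean of the already-fused low-frequency subband, and the coefficient with the larger cross-contrast is selected.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def cross_contrast(band, fused_low, ab=3, mn=15, eps=1e-12):
    """Eq. (4): local mean of a BDS subband (a x b window) divided by the
    local mean of the fused LFS subband (m x n window)."""
    num = uniform_filter(band.astype(float), size=ab)
    den = uniform_filter(fused_low.astype(float), size=mn)
    return num / (den + eps)

def fuse_bds(band1, band2, fused_low):
    """Eq. (5): keep the BDS coefficient whose cross-contrast is larger."""
    r1 = cross_contrast(band1, fused_low)
    r2 = cross_contrast(band2, fused_low)
    return np.where(r1 >= r2, band1, band2)

# Applied to every scale j and direction l of the NSCT decomposition:
# fused_bands[j][l] = fuse_bds(bands1[j][l], bands2[j][l], fused_low)
```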

5. Experimental results and analysis

To verify the performance of the proposed method, multi-focus images and visible-infrared images from different applications are used in this paper. For comparison purposes, other fusion methods are also evaluated: the DWT-based method, the NMF-based method and the NSCT-based method, in all of which the lowpass subband coefficients and the bandpass subband coefficients are merged by the averaging scheme and the absolute-maximum choosing scheme respectively. For the NMF-based algorithm, the values of W and H are initialized at random, and the number of iterations is set to 60. For the NSCT-based method and the proposed method, the number of stages of the multiscale decomposition in NSCT is set to 3, and the levels of the multidirection decomposition are 2, 3 and 4 respectively, from coarse resolution to fine resolution.

In order to overcome the shortcomings of purely subjective evaluation and better assess the fused images, an objective evaluation system is also used in this paper. Evaluation metrics such as the information entropy (IE), standard deviation (SD), average gradient (AG), spatial frequency (SF), root mean square cross entropy (RCE), Tenengrad and sum-modified-Laplacian (SML) are adopted; depending on the type of the original images, several of them are selected for the objective evaluation. The IE value of an image reflects the average amount of information contained in the image; the larger the IE value, the more abundant the information in the image. SD indicates the dispersion of the gray values of the pixels around the average value of the fused image; the larger the SD, the more dispersed the gray levels. The AG is sensitive to the clarity of the fused image; the greater the AG, the sharper the fused image. SF is sensitive to the overall activity of the spatial domain in an image; the larger its value, the better the fused image. RCE expresses the extent of the difference between the original images and the final fused image; a smaller RCE means that the fused image extracted more information from the original images and that the fusion effect is relatively better. Tenengrad evaluates the overall sharpness based on the gradient magnitude from the Sobel operator, and SML evaluates the overall sharpness based on the modified Laplacian; the larger the Tenengrad and SML values, the sharper the image.

In the first experiment, two multi-focus images with perfect registration, Clock A and Clock B, both of size 512 × 512, are selected to evaluate the proposed fusion algorithm. As shown in Fig. 4(a), the right clock, which is in focus, is sharp, while the left clock, out of focus, is blurred; the opposite holds in Fig. 4(b). Fig. 4(c)–(f) are obtained by the DWT method, the NMF method, the NSCT method and the proposed method respectively. From the visual point of view, Fig. 4(d)–(e) are clearly less sharp than Fig. 4(f), and the hands of the two clocks are blurred. In contrast, the overall performance of the fused image based on the proposed method, Fig. 4(f), is much better and the hands of the two clocks are sharper. Fig. 4(c) also achieves a good result, but the overall brightness of the image is a little dark, so its overall effect is worse than that of Fig. 4(f). Table 1 shows the objective evaluation of the four methods, where Tenengrad and SML are normalized. From Table 1, only for RCE does the proposed method not perform best; for IE, SD, AG and SF, the proposed method is evidently better than the other three, which indicates that it obtains better results in terms of fused information, preservation of image details and contrast. For SML and Tenengrad, the proposed method is also better than the other three, which indicates better overall sharpness. The proposed method thus obtains the best results for most evaluation metrics, and the objective evaluation results coincide well with the visual assessment above.

Table 1. Performance of different fusion methods.

Metric      DWT-based   NMF-based   NSCT-based   Proposed
IE            6.8744      7.3592      7.3957       7.5695
SD           36.2926     52.8381     58.2655      58.7883
AG            2.6887      2.6538      3.2668       3.7159
SF            7.4273      6.8378      9.7653       9.9841
RCE           1.4257      0.0973      0.7231       0.8000
Tenengrad     0.521       0.341       0.9231       1
SML           0.822       0.621       0.721        1
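For reference, a few of the objective metrics used above can be computed as in the following sketch; the paper does not give its exact formulas, so these are common textbook forms of IE, SD, AG and SF for an 8-bit grayscale image, not the authors' code.

```python
import numpy as np

def information_entropy(img, bins=256):
    """IE: Shannon entropy of the gray-level histogram (bits)."""
    p = np.histogram(img, bins=bins, range=(0, 256))[0].astype(float)
    p = p[p > 0] / p.sum()
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    """SD: dispersion of gray values around the image mean."""
    return float(np.std(img))

def average_gradient(img):
    """AG: mean magnitude of horizontal/vertical finite differences."""
    f = img.astype(float)
    dx = f[:, 1:] - f[:, :-1]
    dy = f[1:, :] - f[:-1, :]
    return float(np.mean(np.sqrt((dx[1:, :] ** 2 + dy[:, 1:] ** 2) / 2.0)))

def spatial_frequency(img):
    """SF: sqrt of row frequency squared plus column frequency squared."""
    f = img.astype(float)
    rf = np.sqrt(np.mean((f[:, 1:] - f[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((f[1:, :] - f[:-1, :]) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))
```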


Fig. 5. Visible-infrared images and fusion results. (a) infrared image; (b) visible light image; (c) fused image based on DWT; (d) fused image based on NMF; (e) fused image based on NSCT; (f) fused image based on proposed method.

The second experiment is performed on Fig. 5(a) and (b): an infrared image, Fig. 5(a), and a visible light image, Fig. 5(b), acquired of the same scene are used to evaluate the proposed fusion algorithm. Fig. 5(c)–(e) are obtained by the DWT method, the NMF method and the NSCT method respectively, and Fig. 5(f) is the result of the proposed method. Although Fig. 5 shows that all four methods achieve good results, comparison shows that the fused image obtained by the proposed method has the best visual quality. In Fig. 5(f), the ship and the car captured from the infrared image Fig. 5(a) are the sharpest among all the results, and the scene containing the road and the house from the visible light image Fig. 5(b) is also sharp. Meanwhile, fewer artifacts are introduced during the fusion process, and Fig. 5(f) appears more natural. Evidently, many fusion artifacts are introduced into the fused image Fig. 5(c) obtained by the DWT-based method, and Fig. 5(d) is clearly too dark overall. The fused image Fig. 5(e) obtained by the NSCT-based method also achieves a good result, but infrared targets such as the car and the ship are not as clear as in Fig. 5(f), where they are easier to identify. Since there is no reference image for the fusion of infrared and visible light images, IE, RCE, AG, SF and SD are adopted as the key indicators for objective evaluation. Table 2 shows the objective evaluation data of these four methods. The entropy (IE) of the fused image obtained by the proposed method is the largest, which means that the fused image contains the largest amount of information and has a relatively better fusion result than the others. The RCE of the fused image obtained by the proposed method is the smallest, which means that the fused image extracts more information from the original images. Meanwhile, the results of the proposed method are the best for AG, SF and SD, which demonstrates that the proposed method achieves a better fusion effect.

Table 2. Performance of different fusion methods.

Metric   DWT-based   NMF-based   NSCT-based   Proposed
IE         4.0122      4.2012      4.6626       4.9546
AG         2.9245      2.9812      3.0421       3.0435
RCE        3.1121      2.9071      2.8075       2.3912
SF         5.4175      5.1232      6.1032       6.0831
SD        31.9221     39.0211     40.8312      41.2832

6. Conclusion

For MST-based image fusion methods, the fusion rules for the high-pass and low-pass subband coefficients play a crucial role in the result of image fusion, and the transform domain also has a great impact on the fused results. In this paper, to extract and combine the features of the original images, a novel algorithm based on visual salient features and the cross-contrast is proposed within the multi-scale transform framework. Decomposition and reconstruction of the multiscale image and the fusion rule are the two most important components of MST-based fusion algorithms. As a new multiscale geometric analysis tool, the NSCT is well suited to image fusion because of its multiscale, localized, multi-directional and shift-invariant properties; therefore, the NSCT is selected as the multi-scale transform in this paper. After the original images are decomposed by NSCT, three visual saliency maps are constructed from the local energy, the contrast and the gradient respectively, and the LFS coefficients are obtained by utilizing these maps. The cross-contrast is obtained by computing the ratio between the local gray mean of the BDS coefficients and the local gray mean of the fused LFS coefficients, and the BDS coefficients are obtained by the cross-contrast. Multi-focus and visible-infrared images have been used to evaluate the performance of the proposed method, and visual evaluations and quantitative comparisons of the fused images indicate that the proposed method can produce better fused images.

Acknowledgments

We sincerely thank the reviewers and editors for carefully checking our manuscript and providing constructive suggestions. Project (KYTZ201322) supported by the Scientific Research Foundation of CUIT. We have benefited from the images supplied by the TNO Human Factors Research Institute in the Netherlands.

References

[1] B. Xiangzhi, Image fusion through feature extraction by using sequentially combined toggle and top-hat based contrast operator, Appl. Opt. 51 (2012) 7566–7575. [2] A. Toet, J.M. Valeton, L.J. Van Ruyven, Merging thermal and visual images by a contrast pyramid, Opt. Eng. 28 (1989) 789–792. [3] A.H.S. Solberg, A.K. Jain, T. Taxt, Multisource classification of remotely sensed data: fusion of Landsat TM and SAR images, IEEE Trans. Geosci. Remote Sens. 32 (1994) 768–778. [4] G. Bhatnagar, Z. Liu, A novel image fusion framework for night-vision navigation and surveillance, SIViP 9 (2015) 1–11. [5] U. Hoelscher-Hoebing, D. Kraus, Unsupervised image segmentation and image fusion for multi-beam/multi-aspect sidescan sonar images, Oceans 1 (1998) 571–576. [6] H. Pan, G. Xiao, Z. Jing, Feature-based image fusion scheme for satellite recognition, Information Fusion (2010) 1–6. [7] Z. Xue, R.S. Blum, Concealed weapon detection using color image fusion, in: Proceedings of the Sixth International Conference on Information Fusion, 2003, pp. 622–627. [8] C. Yan, Y. Zhang, J. Xu, F. Dai, L. Li, Q. Dai, F. Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors, IEEE Signal Process. Lett. 21 (2014) 573–576. [9] C. Yan, Y. Zhang, J. Xu, F. Dai, J. Zhang, Q. Dai, F. Wu, Efficient parallel framework for HEVC motion estimation on many-core processors, IEEE Trans. Circuits Syst. Video Technol. 24 (2014) 2077–2089. [10] H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform, in: Proceedings of ICIP-94, IEEE International Conference on Image Processing, 1994, pp. 235–245. [11] M. Gonzalez-Audicana, J.L. Saleta, R.G. Catalan, R. Garcia, Fusion of multispectral and panchromatic images using improved IHS and PCA mergers based on wavelet decomposition, IEEE Trans. Geosci. Remote Sens. 42 (2004) 1291–1299. [12] Z. Wang, Y. Ma, J. Gu, Multi-focus image fusion using PCNN, Pattern Recogn. 43 (2010) 2003–2016. [13] Y. Liu, S. Liu, Z. Wang, Multi-focus image fusion with dense SIFT, Information Fusion 23 (2015) 139–155. [14] Y. Jiang, M. Wang, Image fusion with morphological component analysis, Information Fusion 18 (2014) 107–118. [15] B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, Multispectral fusion of multisensor image data by the generalized Laplacian pyramid, in: Proceedings of IGARSS '99, IEEE International Geoscience and Remote Sensing Symposium, vol. 2, 1999, pp. 1183–1185. [16] J. Du, W. Li, B. Xiao, Q. Nawaz, Union Laplacian pyramid with multiple features for medical image fusion, Neurocomputing (2016). [17] A. Toet, Image fusion by a ratio of low-pass pyramid, Pattern Recogn. Lett. 9 (1989) 245–253. [18] V.S. Petrović, C.S. Xydeas, Gradient-based multiresolution image fusion, IEEE Trans. Image Process.
13 (2004) 228–237. [19] J. Tian, L. Chen, L. Ma, W. Yu, Multi-focus image fusion using a bilateral gradient-based sharpness criterion, Opt. Commun. 284 (2011) 80–87. [20] T. Pu, G. Ni, Contrast-based image fusion using the discrete wavelet transform, Opt. Eng. 39 (2000) 2075–2082.

[21] M. Beaulieu, S. Foucher, L. Gagnon, Multi-spectral image resolution refinement using stationary wavelet transform, Geoscience and Remote Sensing Symposium, 2003. IGARSS ’03. Proceedings. 2003 IEEE International, vol. 6, 2003, pp. 4032–4034. [22] S. Li, Multisensor remote sensing image fusion using stationary wavelet transform: effects of basis and decomposition level, Int. J. Wavelets Multiresolut. Inf. Process. 6 (2011) 37–50. [23] Y. Chai, H.F. Li, J.F. Qu, Image fusion scheme using a novel dual-channel PCNN in lifting stationary wavelet domain, Opt. Commun. 283 (2010) 3591–3602. [24] Y. Chai, H. Li, Z. Li, Multifocus image fusion scheme using focused region detection and multiresolution, Opt. Commun. 284 (2011) 4376–4389. [25] Y. Hu, J. Huang, S. Kwong, Y.K. Chan, Image Fusion Based Visible Watermarking Using Dual-Tree Complex Wavelet Transform, Digital Watermarking, Second International Workshop, IWDW 2003, Seoul, Korea, October 20-22, 2003, Revised Papers, 2003, pp. 86–100. [26] F. Nencini, A. Garzelli, S. Baronti, L. Alparone, Remote sensing image fusion using the curvelet transform, Information Fusion 8 (2007) 143–156. [27] M. Choi, R.Y. Kim, M.R. Nam, H.O. Kim, Fusion of multispectral and panchromatic satellite images using the curvelet transform, IEEE Geosci. Remote Sens. Lett. 2 (2005) 136–140. [28] T. Chen, J. Zhang, Y. Zhang, Remote sensing image fusion based on ridgelet transform, Appl. Opt. 26 (1987) 5204–5210. [29] Q. Zhang, B.-L. Guo, Multifocus image fusion using the nonsubsampled contourlet transform, Signal Processing 89 (2009) 1334–1346. [30] Xiao-Bo, Jing-Wen, XIAO, Hong-Zhi, Zi-Qian, Image fusion algorithm based on spatial frequency-motivated pulse coupled neural networks in nonsubsampled contourlet transform domain, Acta Automatica Sinica 34 (2008) 1508–1514. [31] J. Peng, Image fusion with nonsubsampled contourlet transform and sparse representation, J. Electron. Imaging 22 (2013) 6931–6946. [32] P. Ganasala, V. Kumar, Multimodality medical image fusion based on new features in NSST domain, Biomed. Eng. Lett. 4 (2015) 414–424. [33] Q.G. Miao, C. Shi, P.F. Xu, M. Yang, Y.B. Shi, A novel algorithm of image fusion using shearlets, Opt. Commun. 284 (2011) 1540–1547. [34] G. Gao, L. Xu, D. Feng, Multi-focus image fusion based on non-subsampled shearlet transform, IET Image Proc. 7 (2013) 633–639. [35] G. Piella, A general framework for multiresolution image fusion: from pixels to regions, Information Fusion 4 (2003) 259–280. [36] Y. Liu, S. Liu, Z. Wang, A general framework for image fusion based on multiscale transform and sparse representation, Information Fusion 24 (2015) 147– 164. [37] J.L. Lai, Y. Yi, Key frame extraction based on visual attention model, J. Vis. Commun. Image Represent. 23 (2012) 114–125. [38] J.A. García, R. Rodriguez-Sánchez, J. Fdez-Valdivia, Axiomatic approach to computational attention, Pattern Recogn. 43 (2010) 1618–1630. [39] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 1254–1259. [40] Y. Fang, Z. Chen, W. Lin, C.W. Lin, Saliency detection in the compressed domain for adaptive image retargeting, IEEE Trans. Image Process. A Publ. IEEE Signal Process. Soc. 21 (2012) 3888–3901. [41] Y. Fang, Z. Chen, W. Lin, C.W. Lin, Saliency-based image retargeting in the compressed domain, in: International Conference on Multimedea 2011, Scottsdale, Az, USA, November 28 – December, 2011, pp. 1049–1052. [42] V. Gopalakrishnan, Y. Hu, D. 
Rajan, Random walks on graphs for salient object detection in images, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 19 (2010) 3232–3242. [43] M. Al-Azawi, Y. Yang, H. Istance, Irregularity-based image regions saliency identification and evaluation, Multimedia Tools Appl. 549 (2014) 1–24. [44] S. Li, B. Yang, J. Hu, Performance comparison of different multi-resolution transforms for image fusion, Information Fusion 12 (2011) 74–84. [45] M.N. Do, M. Vetterli, The contourlet transform: an efficient directional multiresolution image representation, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 14 (2005) 2091–2106. [46] C.A. Da Cunha, J. Zhou, M.N. Do, The nonsubsampled contourlet transform: theory, design, and applications, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 15 (2006) 3089–3101. [47] M. Shensa, Discrete wavelet transform: wedding the a Trous and Mallat algorithms, IEEE Trans. Signal Process. 40 (1992) 2464–2482. [48] J. Adu, M. Wang, Z. Wu, Z. Zhou, Multi-focus image fusion based on the nonsubsampled contourlet transform, J. Mod. Opt. 59 (2012) 1355–1362. [49] A.K.C. Wong, P.K. Sahoo, A gray-level threshold selection method based on maximum entropy principle, Syst. Man Cyber. IEEE Trans. 19 (1989) 866–871.