Information Sciences 349–350 (2016) 25–49
Multifocus image fusion by combining with mixed-order structure tensors and multiscale neighborhood

Huafeng Li, Xiaosong Li, Zhengtao Yu∗, Cunli Mao

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Yunnan, Kunming 650500, PR China

∗ Corresponding author. Tel.: +86 13888616568. E-mail: [email protected] (Z. Yu).
Article history: Received 23 July 2014; Revised 8 February 2016; Accepted 11 February 2016; Available online 23 February 2016.

Keywords: Fractional differential; Multifocus image fusion; Multiscale focus measure; Multiscale neighborhood; Structure tensor.
Abstract

In this study, we propose a new method for multifocus image fusion that combines the structure tensors of mixed-order differentials with a multiscale neighborhood technique. In this method, the structure tensor of the integral differential is utilized to detect the high-frequency regions and the structure tensor of the fractional differential is used to detect the low-frequency regions. To improve the performance of the fusion method, we propose a new focus measure based on the multiscale neighborhood technique to generate the initial fusion decision maps by exploiting the advantages of different scales. Next, based on the multiscale neighborhood technique, a post-processing method is used to update the initial fusion decision maps. During the fusion process, the pixels located in the focused inner regions are selected to produce the fused image. In order to avoid discontinuities in the transition zone between the focused and defocused regions, we propose a new "averaging" scheme based on the fusion decision maps at different scales. Our experimental results demonstrate that the proposed method outperformed conventional multifocus image fusion methods in terms of both subjective and objective quality.

© 2016 Elsevier Inc. All rights reserved.
1. Introduction

The information captured by a single imaging sensor is usually insufficient to describe a scene in a comprehensive manner [52]. Thus, image fusion is employed to combine complementary visual information from several input images to produce a composite image for applications such as human visual perception and machine recognition. In recent years, this technique has attracted considerable research attention [1,11,12,14,25,26,34,51,59,60,71] and many novel methods have been proposed. As one of the most important aspects of multi-sensor image fusion, the aim of multifocus image fusion is to create an image that contains all of the relevant objects in focus by extracting and synthesizing the focused objects from the source images. This technique can overcome the limited depth of focus of the optical lenses in imaging cameras, and it has therefore been applied widely in many fields such as medical imaging, digital photography, microbial imaging, and military applications. In contrast to pattern recognition tasks [29,31,47] and the haze removal problem [72], the key problem during multifocus image fusion is recognizing the focused pixels or regions in the source images. Thus, a number of sharpness measures have been proposed in order to detect the focused pixels correctly. However, due to the intrinsic ambiguity that arises from textureless and edgeless regions, it is difficult to decide whether a pixel or region is blurred or not simply by using a
focus measure alone. Furthermore, the removal of visual artifacts, such as boundary seams and ringing generated in the fused border regions or along the boundaries of objects, can be quite a challenging problem during the fusion process [10,62].

Many algorithms have been developed to address these problems in recent decades, among which the transform domain-based methods are considered to be the most popular. Using a multiscale transform (MST), the input images are transformed into multiresolution representations; that is, the low-frequency subband coefficients and the high-frequency subband coefficients of the source images are obtained. To retain the salient features of the input images, different subband coefficients are then selected to reconstruct the fused image according to fusion rules, and the fused image is finally constructed by taking an inverse MST. At present, the most commonly used MSTs include the discrete wavelet transform (DWT) [22,35], the dual-tree complex wavelet transform [16,18,61], the contourlet transform [8,55], the nonsubsampled contourlet transform (NSCT) [6,49,50,63], the curvelet transform [2,23], and the log-Gabor transform [40]. In recent years, several new fusion methods have been proposed in the transform domain, such as the multiscale neighbor distance-based method (NeiDis) [68] and the support value transform-based method [69].

In addition to the MST itself, another significant factor that affects the fusion performance in the MST domain is the fusion rule used for the different subbands, where the most commonly used schemes are the maximum selection-based, local neighborhood-based, and local region-based selection principles [18]. In general, the local region-based and local neighborhood-based selection schemes outperform the maximum selection-based principle because all of the pixels in a local region or neighborhood are considered. However, finding a suitable size for the local neighborhood is quite a challenging problem. To address this problem, Redondo and Šroubek proposed multi-size window-based selection principles in the log-Gabor domain [40]; by exploiting different window sizes, more coefficients are selected from the focused regions. For region-based selection principles, the performance is highly dependent on the segmentation algorithm. However, the segmentation algorithm may not always generate a high-quality segmentation, so the quality of the fused result cannot always be guaranteed.

These MST-based approaches can combine the features from the source images in an efficient manner, even if these features are spatially overlapping [70]. Therefore, there has been much research in this area during the last decade and many fusion methods have been proposed based on different MSTs and selection principles. However, these methods cannot represent the local features of an image in an adaptive manner, and thus suboptimal fusion results may be produced [32,42]. In addition, some useful information may be lost from the source image during the implementation of the inverse MST [19,24].

Apart from the transform domain methods, various fusion algorithms have been developed in the spatial domain, including region-based and block-based methods [7,24,26]. The basic idea of region-based methods is to apply a segmentation method to the source images before employing a sharpness measure to determine which region is in focus. Finally, the fused image is produced by extracting and synthesizing the focused regions.
The most commonly used sharpness measures include the variance, the energy of image gradient [15], the sum-modified Laplacian [33], the spatial frequency [24], and Tenenbaum's algorithm [58]. The main advantage of these methods is that they allow the focused objects to be transferred successfully into the fused image. However, the fusion quality of these methods depends greatly on the segmentation algorithm. Moreover, segmentation algorithms are usually complex and time consuming, which hinders the application of the fusion method. In addition, discontinuities or erroneous results may be produced around the boundaries of the fused objects because of the uncertainty of focus estimation in the focused border regions. In block-based methods, the source images are first divided into blocks, and the fused image is then constructed by selecting the sharper blocks from the source images. However, the performance of these methods is sensitive to the block size. If the size is not suitable or a block is only partly clear, these methods may suffer from blocking artifacts, which significantly affect the appearance of the fused image.

In order to avoid these problems, new multifocus image fusion methods have been proposed based on focus detection [4,21,28,70]. The advantage of these methods is that the objects within the focused region can be copied directly into the resultant image, but this advantage depends on the detection method applied to the focused regions. It should be noted that most focused region detection methods assume that the focused regions appear sharper than the unfocused regions. In general, these methods can produce adequate results if this assumption is satisfied, but in many cases the focused regions contain some smooth areas, so the focused region detection method may yield incorrect results. To overcome this problem, post-processing methods have been developed based on morphological filters, such as "opening"/"closing" and "erosion"/"dilation" operations [4,19], although it is difficult to determine the thresholds for the morphological filters in most cases. Moreover, these methods are likely to introduce erroneous and discontinuous effects along the boundaries of the focused regions because these boundaries cannot be determined accurately. Chai and Li therefore proposed fusion methods that combine focused region detection and the MST [4,19], but these methods are inconvenient in practice because of the morphology filter.

To address the weaknesses of the traditional focused region detection-based methods, we propose a novel and simple scheme, which efficiently generates an all-in-focus image by integrating complementary information from the input images. This new scheme was inspired by a previous study [40] and comprises four components. First, using the structure tensors of the fractional and integral differentials, a novel focus measure based on the multiscale neighborhood technique (MLNT) is used to evaluate the sharpness of a pixel. Next, a series of initial decision maps (IDMs) are generated by comparing the values of the sharpness measure in different local neighborhoods. In order to detect the focused regions correctly, the MLNT and IDMs are both used to build the final decision map (FDM), and a fusion scheme for the focused border regions is generated
by applying these IDMs in the third component. To obtain the final fused result, a simple and efficient method is proposed for fusing the source images based on the FDMs and the fusion scheme for the focused border regions.

The novel contributions of this study are summarized as follows.
(1) Development of a new focus measure that combines the fractional-order structure tensor and the integer-order structure tensor, which can effectively avoid erroneous assessments of the focusing properties when the focused pixels are located in smooth regions.
(2) Proposal of a multiscale neighborhood technique for generating the IDMs and FDMs. This design integrates the advantages of local neighborhoods with different sizes in an efficient manner, which improves the reliability of the focus measure.
(3) Design of an averaging method based on decision maps at different scales to merge the borders between the focused and defocused regions, thereby suppressing the discontinuities between different focused regions in the fused results.

The remainder of this paper is organized as follows. In Section 2, we provide a detailed description of the new focus measure generated by the structure tensors of mixed-order differentials. In Section 3, we present the method employed for detecting pixels in the focused regions. In Section 4, we describe the proposed fusion strategy. Our experimental results are presented in Section 5, together with some discussion and a performance analysis. Finally, we give our conclusions in Section 6.

2. Structure tensor

2.1. Structure tensor of the integral differential

During multifocus image fusion, the key issue is detecting the focused pixels in each source image in an accurate manner. In general, a focus measure is used as a metric to identify the focused pixels, so the performance of the focus measure is vital for their successful detection. The important parts of an ideal fused image are the local geometric structures of the focused regions, which usually correspond to perceptually important features [37] and thus appear sharper than the defocused regions. Therefore, the saliency of the geometric structures of a region can reflect its focusing properties to a certain extent. In general, geometric structures can be described by the first or second derivatives, or by scale-space decompositions. Therefore, the gradient operator associated with the first derivative can be used to describe the geometric structure information in an image. For a multichannel image f(x, y) = (f1(x, y), f2(x, y), f3(x, y), ..., fm(x, y)), the squared variation of f(x, y) at position (x, y) in direction θ can be represented by Eq. (1) for any ε → 0+ [17]:
$$(df)^2 = \big\| f(x + \varepsilon\cos\theta,\, y + \varepsilon\sin\theta) - f(x, y) \big\|_2^2 \approx \sum_{i=1}^{m} \left( \frac{\partial f_i}{\partial x}\,\varepsilon\cos\theta + \frac{\partial f_i}{\partial y}\,\varepsilon\sin\theta \right)^2. \tag{1}$$

Then, the rate of change R(θ) of the image f(x, y) in direction θ at position (x, y) can be described as

$$R(\theta) = \sum_{i=1}^{m} \left( \frac{\partial f_i}{\partial x}\cos\theta + \frac{\partial f_i}{\partial y}\sin\theta \right)^2
= (\cos\theta, \sin\theta)
\begin{bmatrix}
\sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial x}\right)^2 & \sum_{i=1}^{m} \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} \\[4pt]
\sum_{i=1}^{m} \frac{\partial f_i}{\partial y}\frac{\partial f_i}{\partial x} & \sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial y}\right)^2
\end{bmatrix}
(\cos\theta, \sin\theta)^T
= (\cos\theta, \sin\theta) \sum_{i=1}^{m} \nabla f_i \nabla f_i^T (\cos\theta, \sin\theta)^T, \tag{2}$$
where $\nabla f_i = \left(\frac{\partial f_i}{\partial x}, \frac{\partial f_i}{\partial y}\right)^T$. The matrix
$$P = \sum_{i=1}^{m} \nabla f_i \nabla f_i^T =
\begin{bmatrix}
\sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial x}\right)^2 & \sum_{i=1}^{m} \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} \\[4pt]
\sum_{i=1}^{m} \frac{\partial f_i}{\partial y}\frac{\partial f_i}{\partial x} & \sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial y}\right)^2
\end{bmatrix}$$
is usually called the structure tensor. Clearly, this matrix is positive semidefinite. In various applications, the tensor technique is employed to compress images [9], recognize human gait features or targets in video or remote-sensing images [64,65], solve more general streaming problems [44], and perform binary image classification [46]. For a grayscale image (m = 1), Eq. (2) shows that the function R(θ) is maximal if the vector (cos θ, sin θ) is parallel to ∇f, whereas it is minimal if (cos θ, sin θ) is orthogonal to ∇f. We also note that maximizing (or minimizing) R(θ) is equivalent to maximizing (or minimizing) (cos θ, sin θ)∇f∇f^T (cos θ, sin θ)^T. The matrix ∇f∇f^T is positive semidefinite, so an orthogonal matrix Q exists such that
$$\nabla f \nabla f^T = Q \Lambda Q^T, \tag{3}$$
where Λ is a 2 × 2 diagonal matrix. The elements on its main diagonal are all nonnegative and are the eigenvalues of ∇f∇f^T. The column vectors of the orthogonal matrix Q form an orthonormal basis of eigenvectors of ∇f∇f^T.
Fig. 1. The “Lab” source images (with size 640 × 480) and the IDMs obtained by Eq. (6). (a) Focused on the clock, (b) focused on the student, and (c) and (d) IDMs obtained for Fig. 1(a) and (b) by Eq. (6).
Thus, for a grayscale image f(x, y), we can easily obtain the eigenvalues λ1 = |∇f|² and λ2 = 0 of the positive semidefinite matrix ∇f∇f^T. The corresponding eigenvectors are ν1 = ∇f/|∇f| and ν2 = ∇f^⊥/|∇f|. The maximum eigenvalue λ1 and the minimum eigenvalue λ2 give the maximum and minimum rates of change of image f at a given point, with the maximum attained in direction ν1. In a multifocus image, the change in gray value in the focused region is usually sharper than that in the unfocused region. Therefore, the structure tensor ∇f∇f^T can be utilized to describe information about the changes in an image. For multichannel images, the structure tensor $\sum_{i=1}^{m}\nabla f_i \nabla f_i^T$ has orthogonal eigenvectors v1, v2, with v1 parallel to the vector $\big(2F,\; G - E + \sqrt{(G - E)^2 + 4F^2}\,\big)^T$, where $E = \sum_{i=1}^{m}\big(\tfrac{\partial f_i}{\partial x}\big)^2$, $F = \sum_{i=1}^{m}\tfrac{\partial f_i}{\partial x}\tfrac{\partial f_i}{\partial y}$, and $G = \sum_{i=1}^{m}\big(\tfrac{\partial f_i}{\partial y}\big)^2$. The corresponding eigenvalues can be expressed as

$$\lambda_{1,2} = \frac{1}{2}\Big( (E + G) \pm \sqrt{(E - G)^2 + 4F^2} \Big). \tag{4}$$
The eigenvalues λ1,2 convey shape information, and the vectors v1,2 indicate the orientations that maximize and minimize the gray-value fluctuations. Therefore, in an isotropic region both eigenvalues are relatively small and λ1 ≈ λ2, whereas in an anisotropic region the eigenvalues are relatively large and λ1 ≫ λ2. For multifocus images, if both eigenvalues of a pixel are small, the pixel is located in a flat region; if one eigenvalue is large and the other is small, the pixel is likely to be in the focused region. Therefore, the difference between the maximum and minimum eigenvalues of the traditional structure tensor can be used, to some extent, to detect the focused pixels. We denote this difference by λ, which is computed as

$$\lambda(x, y) = \lambda_1(x, y) - \lambda_2(x, y). \tag{5}$$
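To make Eqs. (4) and (5) concrete, the following sketch computes the per-pixel eigenvalue difference of the integer-order structure tensor for a grayscale image (Python/NumPy; the function name and the simple finite-difference gradients are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def structure_tensor_eig_diff(img):
    """Per-pixel lambda1 - lambda2 of the structure tensor (Eqs. (4)-(5)).

    With E = fx^2, F = fx*fy, G = fy^2 per pixel, Eq. (4) gives
    lambda1 - lambda2 = sqrt((E - G)^2 + 4*F^2).
    """
    img = np.asarray(img, dtype=np.float64)
    fy, fx = np.gradient(img)                    # first-order finite differences (rows = y, cols = x)
    E, F, G = fx * fx, fx * fy, fy * fy
    return np.sqrt((E - G) ** 2 + 4.0 * F ** 2)  # Eq. (5) via Eq. (4)
```

Comparing these maps pixel-wise between two source images then yields the initial decision map of Eq. (6), introduced next.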
Then, the IDM Di (x, y) of image i can be generated by
Di (x, y ) =
1, 0,
if if
λi (x, y ) ≥ λ j (x, y ) λi (x, y ) < λ j (x, y ),
(6)
where i, j, i = j denote any two source images and Di (x, y ) = 1 indicates that the point (x, y) in image i is in focus. For the source images in Fig. 1(a) and (b), the corresponding decision maps obtained by Eq. (6) are shown in Fig. 1(c) and (d), respectively. 2.2. Fractional differential-based structure tensor Based on the analysis above and the experimental results (see Fig. 1(c) and (d)), we know that the eigenvalue of the structure tensor can be employed to detect the focused pixels. However, two challenges need to be addressed. First, most of the focus measures are designed based on the first or second-order differential, including the structure tensor. During image processing, the gray-level value is highly correlated between neighboring pixels. Most fractal structures are produced by an evolutionary process (e.g., fractal growth) to yield an evolutionary final product (e.g., a fracture). These fractal-like structures are often expressed by complex texture features [38]. For the texture regions, the integral differential values may remain close to zero and these defects have adverse effects when distinguishing the focused pixels. The second challenge is that the focus measure based on the integral differential may yield an incorrect detection result if the smooth region is focused. As a generalization of the integral differential, the fractional differential is an effective mathematical method for handling fractal problems [41,48]. The fractional differential can also be used in a similar manner to the integral differential to analyze and process images, such as inspecting and extracting singularities. Therefore, the fractional differential can be introduced into the structure tensor to describe the features of the fractal-like texture and the edge details. According to the analyses given above, the α -order fractional structure tensor can be defined as
$$P_{\alpha} = \sum_{i=1}^{m} \nabla^{\alpha} f_i(x, y)\, \nabla^{\alpha} f_i^{T}(x, y) =
\begin{bmatrix}
\sum_{i=1}^{m} \left(\frac{\partial^{\alpha} f_i}{\partial x^{\alpha}}\right)^2 & \sum_{i=1}^{m} \frac{\partial^{\alpha} f_i}{\partial x^{\alpha}}\frac{\partial^{\alpha} f_i}{\partial y^{\alpha}} \\[4pt]
\sum_{i=1}^{m} \frac{\partial^{\alpha} f_i}{\partial y^{\alpha}}\frac{\partial^{\alpha} f_i}{\partial x^{\alpha}} & \sum_{i=1}^{m} \left(\frac{\partial^{\alpha} f_i}{\partial y^{\alpha}}\right)^2
\end{bmatrix}. \tag{7}$$
In Eq. (7), the terms ∇^α f_i(x, y), ∂^α f_i/∂x^α, and ∂^α f_i/∂y^α are given by

$$\nabla^{\alpha} f_i(x, y) = \left(\frac{\partial^{\alpha} f_i}{\partial x^{\alpha}}, \frac{\partial^{\alpha} f_i}{\partial y^{\alpha}}\right)^{T}, \tag{8}$$

$$\frac{\partial^{\alpha} f_i}{\partial x^{\alpha}} = \sum_{k=0}^{K-1} (-1)^{k} C_{k}^{\alpha} f_i(x - k, y), \tag{9}$$

$$\frac{\partial^{\alpha} f_i}{\partial y^{\alpha}} = \sum_{k=0}^{K-1} (-1)^{k} C_{k}^{\alpha} f_i(x, y - k), \tag{10}$$

where α can be any real number (including fractions), $C_{k}^{\alpha} = \Gamma(\alpha + 1)/[\Gamma(k + 1)\,\Gamma(\alpha - k + 1)]$ denotes the generalized binomial coefficient, and Γ(x) is the Gamma function. In particular, if α = 1, then C_k^1 = 0 for k ≥ 2, and Eqs. (9) and (10) reduce to the usual first-order derivatives. Clearly, the fractional differentiation-based structure tensor is a generalization of P: if the order α = 1, then Eq. (7) coincides with the definition of P. From the viewpoint of image processing, in the frequency band 0 < ω < 1, the enhancement of texture features and the preservation of low-frequency contours by fractional differentials are both superior to those obtained with the first-order derivative. This is because the texture features may be strongly attenuated in a smooth area where the gray scale does not change intensively, so the value of the integral differential in such an area is nearly zero [38]. Thus, when 0 < α < 1, the fractional differential is more suitable for extracting the texture information of an image. However, in the high-frequency band ω > 1, the enhancement of high-frequency edge components by the fractional differential is inferior to that obtained with the first-order derivative, so the integral differential is better for extracting edge information. Considering the properties of the integral and fractional differentials, we propose a new detection method that integrates the advantages of differentials of different orders.

3. Multiscale neighborhood technique

For multifocus images, if the current pixel is located in the focused region, the gray values of its local neighboring pixels change significantly, whereas they may change gradually in the defocused region. This change is reflected by the eigenvalues of the structure tensor, so whether a pixel is focused can be decided from the corresponding eigenvalues of its structure tensor. In order to integrate the advantages of differentials of different orders, we propose a new focus measure that combines the eigenvalues of the structure tensors generated by the integral and fractional differentials.

3.1. Multiscale focus measure (MFM)

Let us consider the processing of two images, A and B, where we assume that the source images have been registered appropriately. It should be noted that our method can be extended easily to more than two source images. Using the eigenvalues of the matrix ∇^α f(x, y)∇^α f^T(x, y), a new focus measure is proposed, which is given as
$$\lambda^{\alpha}(x, y) = \lambda_{1}^{\alpha}(x, y) - \lambda_{2}^{\alpha}(x, y), \tag{11}$$

where $\lambda_{1}^{\alpha}(x, y)$ and $\lambda_{2}^{\alpha}(x, y)$ are the maximum and minimum eigenvalues of the structure tensor $\nabla^{\alpha} f(x, y)\, \nabla^{\alpha} f^{T}(x, y)$, respectively. Based on this focus measure, the focusing property of a pixel can be determined by comparing the corresponding eigenvalues of each source image pixel. However, this does not consider the information that might be contained in the neighborhood of the current pixel. To address this problem, we can take the pixels located in the neighborhood into account; however, this technique is sensitive to the neighborhood size, especially for pixels in focused smooth regions. To avoid false detection due to the factors mentioned above, we propose a new detection method based on a multiscale neighborhood technique. First, we introduce and define the MFM as

$$\lambda_{n,m_1}^{\alpha}(x, y) = \sum_{i=-n}^{n} \sum_{j=-m_1}^{m_1} \lambda^{\alpha}(x + i, y + j), \tag{12}$$
where (2m1 + 1) × (2n + 1) is the neighborhood size, which we call the scale factor. By adjusting the scale factor, we can obtain a series of $\lambda_{n,m_1}^{\alpha}$ values at different scales. To reduce the impact of the current neighborhood on the adjacent neighborhood, the following modified MFM is developed:

$$\tilde{\lambda}_{n,m_1}^{\alpha}(x, y) = \lambda_{n,m_1}^{\alpha}(x, y) - \lambda_{n-1,m_1-1}^{\alpha}(x, y). \tag{13}$$

To reduce the computational complexity, the parameters n and m1 are set to the same value in this study. Thus, Eq. (13) can be written as

$$\tilde{\lambda}_{n}^{\alpha}(x, y) = \lambda_{n,n}^{\alpha}(x, y) - \lambda_{n-1,n-1}^{\alpha}(x, y). \tag{14}$$
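The following sketch (Python/NumPy/SciPy) illustrates Eqs. (9)–(14): it builds Grünwald–Letnikov-style fractional difference masks from the generalized binomial coefficients, forms the α-order structure tensor eigenvalue difference of Eq. (11), and evaluates the modified multiscale focus measure of Eq. (14). The function names, the truncation length K, and the circular boundary handling are our own simplifying assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.special import gamma

def frac_diff_coeffs(alpha, K=3):
    """Coefficients (-1)^k * C_k^alpha of Eqs. (9)-(10); C_k^alpha uses the Gamma function.
    For alpha = 0.75 and K = 3 this gives [1.0, -0.75, -0.09375]."""
    k = np.arange(K)
    c = gamma(alpha + 1.0) / (gamma(k + 1.0) * gamma(alpha - k + 1.0))
    return ((-1.0) ** k) * c

def frac_gradients(img, alpha, K=3):
    """Backward fractional differences along x (columns) and y (rows), Eqs. (9)-(10)."""
    w = frac_diff_coeffs(alpha, K)
    img = np.asarray(img, dtype=np.float64)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    for k, wk in enumerate(w):
        dx += wk * np.roll(img, k, axis=1)   # f(x - k, y), circular boundary for simplicity
        dy += wk * np.roll(img, k, axis=0)   # f(x, y - k)
    return dx, dy

def eig_diff(dx, dy):
    """lambda1 - lambda2 of the structure tensor built from the given derivatives (Eq. (11))."""
    E, F, G = dx * dx, dx * dy, dy * dy
    return np.sqrt((E - G) ** 2 + 4.0 * F ** 2)

def modified_mfm(img, alpha, n):
    """Modified MFM of Eq. (14): the (2n+1)x(2n+1) neighborhood sum of the eigenvalue
    difference minus the (2n-1)x(2n-1) neighborhood sum."""
    dx, dy = frac_gradients(img, alpha)
    lam = eig_diff(dx, dy)                               # Eq. (11) per pixel
    def box_sum(a, r):                                   # Eq. (12) with m1 = n = r
        if r < 0:
            return np.zeros_like(a)
        size = 2 * r + 1
        return uniform_filter(a, size=size, mode='nearest') * size * size
    return box_sum(lam, n) - box_sum(lam, n - 1)         # Eq. (14)
```

In the method, this measure is evaluated with two orders (α1 for texture, α2 for edges) and combined across scales as described next.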
Fig. 2. IDMs for the “Lab” source images obtained by Eq. (18) under different scale factors L. (a) and (b) IDMs obtained for Fig. 1(a) and (b) by Eq. (18) at L = 0. (c) and (d) IDMs obtained for Fig. 1(a) and (b) by Eq. (18) at L = 10.
It should be noted that different neighborhood sizes have different advantages. In order to exploit the advantages of neighborhoods with different sizes, the following formulae are introduced:

$$S_{i,n}^{1}(x, y) = \big\{\, n_{xy} \mid \tilde{\lambda}_{i,n}^{\alpha_2}(x, y) \ge \tilde{\lambda}_{j,n}^{\alpha_2}(x, y) \ \text{and} \ \tilde{\lambda}_{i,n}^{\alpha_2}(x, y) > T \,\big\}, \tag{15}$$

$$S_{i,n}^{2}(x, y) = \big\{\, n_{xy} \mid \max\big(\tilde{\lambda}_{i,n}^{\alpha_2}(x, y), \tilde{\lambda}_{j,n}^{\alpha_2}(x, y)\big) < T \ \text{and} \ \tilde{\lambda}_{i,n}^{\alpha_1}(x, y) \ge \tilde{\lambda}_{j,n}^{\alpha_1}(x, y) \,\big\}, \tag{16}$$

$$\lambda_{i,L}(x, y) = \big|\, S_{i,n}^{1}(x, y) \cup S_{i,n}^{2}(x, y),\ n = 1, 2, 3, \ldots, L \,\big|, \tag{17}$$

where j ≠ i, i, j = A, B, α1 ∈ (0, 1), α2 ∈ [1, 2], and T is a threshold; the operator |·| denotes the cardinality of the set {·}; $\tilde{\lambda}_{i,n}^{\alpha_2}(x, y)$ and $\tilde{\lambda}_{i,n}^{\alpha_1}(x, y)$ denote the modified focus measure values with the two different orders at the nth scale and location (x, y) in image i. The scale factor n ranges from 0 to L, where L denotes the maximum of n. For α1 and α2, we investigated different orders and found that α1 = 0.75 and α2 = 1 were good choices for most multifocus images. In Eqs. (15) and (16), $S_{i,n}^{2}(x, y)$ and $S_{i,n}^{1}(x, y)$ can be employed to decide which pixels are focused in the complex fractal-like texture details and in the edge regions, respectively. In general, pixels with greater values of λi,L(x, y) compared with λj,L(x, y) are more likely to be in the focused regions. Therefore, an IDM that reflects the focusing property of the source image can be established as

$$D_i(x, y) = \begin{cases} 1, & \text{if } \lambda_{i,L}(x, y) \ge \lambda_{j,L}(x, y), \\ 0, & \text{if } \lambda_{i,L}(x, y) < \lambda_{j,L}(x, y), \end{cases} \tag{18}$$
where "1" in Di(x, y) indicates that the pixel at position (x, y) in image i is in focus; it can therefore be selected as the corresponding pixel of the fused image. After the IDM is generated, the pixels located in the focused regions can be determined according to the decision maps, and it is then straightforward to obtain the fused image guided by these maps. Fig. 2 shows the detection of the focused regions in the "Lab" source images (Fig. 1(a) and (b)). Fig. 2(a) and (b) are the IDMs generated for Fig. 1(a) and (b) by Eq. (18), respectively, when the threshold T and the scale factor L were set to 0. Fig. 2(c) and (d) are the processing results obtained by Eq. (18) using a threshold of T = 0.0025 and a scale factor of L = 10. According to Fig. 2, the proposed focused pixel detection method can classify the pixels correctly, even when the pixels are located in extremely low-frequency regions.

3.2. MLNT

Fig. 2 shows that comparing only the focus value, i.e., the λ value, is insufficient to determine all of the focused pixels. Indeed, some thin protrusions, thin gulfs, narrow breaks, small holes, etc., may appear in Di(x, y) due to the smooth regions in the source images. Thus, we propose a new method for generating the FDM, which is described algorithmically as follows.

(1) Compute the pixel values in a multiscale local neighborhood of the decision map Di(x, y), (i = A, B), using the following expression:
$$Z_{i,n_1}(x, y) = \sum_{l_1=-n_1}^{n_1} \sum_{l_2=-n_1}^{n_1} D_i(x + l_1, y + l_2), \tag{19}$$
where n1 is the scale factor of the local neighborhood.

(2) Compute the cardinalities of the sets Ri,n1 and Rj,n1 by the following equations:

$$Z_{i,N_1}(x, y) = |R_{i,n_1}(x, y)|, \tag{20}$$

$$Z_{j,N_1}(x, y) = |R_{j,n_1}(x, y)|, \tag{21}$$

where

$$R_{i,n_1}(x, y) = \big\{\, n_1(x, y) \mid Z_{i,n_1}(x, y) - Z_{i,n_1-1}(x, y) > Z_{j,n_1}(x, y) - Z_{j,n_1-1}(x, y) \,\big\},$$
$$R_{j,n_1}(x, y) = \big\{\, n_1(x, y) \mid Z_{j,n_1}(x, y) - Z_{j,n_1-1}(x, y) > Z_{i,n_1}(x, y) - Z_{i,n_1-1}(x, y) \,\big\},$$

(n1 = 1, 2, 3, ..., N1), i ≠ j, i, j = A, B, and N1 is the maximum of the scale factor n1.
Fig. 3. The decision maps of the “Lab” source images (Fig. 1(a) and (b)). (a) and (b) Decision maps obtained by Eq. (22) using L = 10, N1 = 10. (c) and (d) Decision maps obtained by Eq. (23) using L = 10, N1 = 10. (e) and (f) Decision maps obtained by Eq. (22) using L = 15, N1 = 15. (g) and (h) Decision maps obtained by Eq. (23) using L = 15, N1 = 15.
(3) Compare Zi,N1(x, y) and Zj,N1(x, y) to determine which pixel is in focus, and generate the modified decision maps as follows:

$$D_i(x, y) = \begin{cases} 1, & \text{if } Z_{i,N_1}(x, y) \ge Z_{j,N_1}(x, y), \\ 0, & \text{if } Z_{i,N_1}(x, y) < Z_{j,N_1}(x, y). \end{cases} \tag{22}$$
In Eq. (22), Di(x, y) = 1 indicates that the pixel at position (x, y) in image i is in focus, whereas Di(x, y) = 0 indicates that the pixel at position (x, y) in image i is defocused.

(4) Fill the small isolated regions in the largest white and black regions of image Di by using

$$D_i = \mathrm{bwareaopen}(D_i, th), \tag{23}$$
where "bwareaopen" represents the filling filter, th is a threshold, and the filtering operation removes from a binary image all connected regions containing fewer than th pixels. In our experiments, th = 13,000 was a good choice for the test images. Fig. 3 shows the detection results obtained by the proposed MLNT. The detection maps for Fig. 1(a) and (b) generated by Eq. (22) at L = 10, N1 = 10 and L = 15, N1 = 15 are shown in Fig. 3(a) and (b) and Fig. 3(e) and (f), respectively. Fig. 3(c) and (d) and Fig. 3(g) and (h) depict the detection results obtained for Fig. 3(a) and (b) and Fig. 3(e) and (f) after filling the isolated holes using the "bwareaopen" filter.

4. Proposed fusion scheme

4.1. Fusion scheme for border regions between focused and defocused areas

In general, after the FDMs have been built, most previous studies suggest that the fused image should be produced according to the following scheme:
$$f_F(x, y) = \begin{cases} f_A(x, y), & \text{if } D_A(x, y) = 1 \ \text{and} \ D_B(x, y) = 0, \\ f_B(x, y), & \text{if } D_A(x, y) = 0 \ \text{and} \ D_B(x, y) = 1, \\ \mathrm{random}\big(f_A(x, y), f_B(x, y)\big), & \text{if } D_A(x, y) = D_B(x, y), \end{cases} \tag{24}$$
where fF(x, y) denotes the pixel value of the fused image F at position (x, y). In this fusion scheme, when DA(x, y) = 1 and DB(x, y) = 0, the pixel located at (x, y) in image A is extracted to produce the corresponding pixel of the fused image F. In the other case, i.e., if DA(x, y) = 0 and DB(x, y) = 1, the corresponding pixel of source image B is selected as the pixel of the fused image F. In addition, DA(x, y) = DB(x, y) indicates that the focusing property of the pixel at location (x, y) is very difficult to determine; in this case, boundary seams may be introduced along the boundary of the focused region. In order to address this problem, an effective method was proposed that combines the spatial domain and transform domain methods [19]. Unfortunately, some details are smoothed and the brightness may be distorted because the MST is employed in these methods [20]. To avoid these problems, we propose a new scheme for fusing the focused border regions in this subsection. For clarity, an overview of the scheme employed by the proposed method is shown in Fig. 4. First, the following definitions are introduced to simplify the discussion:

$$U_{i,n}^{1}(x, y) = \arg_{i,n}\big\{\, \tilde{\lambda}_{i,n}^{\alpha_2}(x, y) \ge \tilde{\lambda}_{j,n}^{\alpha_2}(x, y) \ \text{and} \ \tilde{\lambda}_{i,n}^{\alpha_2}(x, y) > T \,\big\}, \tag{25}$$
Fig. 4. Block diagram illustrating the image fusion scheme used for processing the focused border regions.
Fig. 5. Block diagram of the proposed image fusion scheme: the MFM produces the IDMs of source images A and B at scales 1, ..., L, the MLNT converts them into the FDMs, and the fusion schemes for the focused interior regions (FSI) and the focused border regions (FSB) combine the source images into the fused image.

$$U_{i,n}^{2}(x, y) = \arg_{i,n}\big\{\, \max\big(\tilde{\lambda}_{i,n}^{\alpha_2}(x, y), \tilde{\lambda}_{j,n}^{\alpha_2}(x, y)\big) < T \ \text{and} \ \tilde{\lambda}_{i,n}^{\alpha_1}(x, y) \ge \tilde{\lambda}_{j,n}^{\alpha_1}(x, y) \,\big\}. \tag{26}$$
The decision maps Di,n(x, y) (i = A, B) at scale n are given by

$$D_{i,n}(x, y) = \begin{cases} 1, & \text{if } U_{i,n}^{1}(x, y) = (i, n) \ \text{or} \ U_{i,n}^{2}(x, y) = (i, n), \\ 0, & \text{if } U_{i,n}^{1}(x, y) \ne (i, n) \ \text{and} \ U_{i,n}^{2}(x, y) \ne (i, n). \end{cases} \tag{27}$$
Given the relationship between pixels located in the same neighborhood, the multiscale decision maps are first convolved with a Gaussian filter G_{n′,σ} of size n′ × n′ and standard deviation σ:

$$MD_{i,n} = ID_{i,n-1} + D_{i,n}, \tag{28}$$

where $ID_{i,n-1} = MD_{i,n-1} \otimes G_{n',\sigma}$ and $MD_{i,1} = D_{i,1} \otimes G_{n',\sigma}$. After the decision map MDi,n is established, the fusion rule for the focused border region at scale n can be described as

$$F_n(x, y) = \begin{cases} f_A(x, y), & \text{if } MD_{A,n}(x, y) \ge MD_{B,n}(x, y), \\ f_B(x, y), & \text{if } MD_{A,n}(x, y) < MD_{B,n}(x, y), \end{cases} \tag{29}$$
where Fn (x, y) denotes the gray value for the fusion result at scale n and location (x, y). To reduce the influence of seams on the fusion results, we propose a compromise scheme (i.e., an averaging method) as follows
F (x, y ) =
L 1 Fn (x, y ). L
(30)
n=1
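As an illustration of the border-region fusion of Eqs. (28)–(30), the sketch below assumes that the per-scale binary decision maps of Eq. (27) have already been computed, and it approximates the fixed Gaussian window G_{n′,σ} with SciPy's gaussian_filter (Python; the function name and defaults are ours, not the authors' code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_border_regions(img_a, img_b, d_a, d_b, sigma=8.0):
    """Sketch of Eqs. (28)-(30).

    d_a, d_b: lists of per-scale binary decision maps D_{A,n}, D_{B,n} (Eq. (27)), n = 1..L,
    each the same shape as the (grayscale, float) source images img_a, img_b.
    """
    L = len(d_a)
    md_a = gaussian_filter(d_a[0].astype(np.float64), sigma)   # MD_{A,1} = D_{A,1} convolved with G
    md_b = gaussian_filter(d_b[0].astype(np.float64), sigma)
    acc = np.zeros_like(md_a)
    for n in range(L):
        if n > 0:                                              # Eq. (28): MD_{i,n} = MD_{i,n-1}*G + D_{i,n}
            md_a = gaussian_filter(md_a, sigma) + d_a[n]
            md_b = gaussian_filter(md_b, sigma) + d_b[n]
        acc += np.where(md_a >= md_b, img_a, img_b)            # Eq. (29): per-scale fusion F_n
    return acc / L                                             # Eq. (30): average over the L scales
```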
It should be noted that in the present study, n′ × n′ and σ in Eq. (28) were set to 13 × 13 and 8, respectively, because these settings generated satisfactory results in most cases.

4.2. Final fusion scheme

In order to make the proposed method explicit, further details of its implementation are illustrated in Fig. 5, where FSI and FSB denote the fusion schemes for the focused interior regions and the focused border regions, respectively. For FSI, the corresponding pixels located in the focused regions of the source images can be extracted directly
Fig. 6. Decision maps obtained using different scales. (a), (e) Detection results for Fig. 1(a) and (b) generated by Eq. (18) using L = 1 and the threshold T = 0.0025, respectively. (b)–(d), (f)–(h) Detection results for the “Lab” source images using L = 4, 10, 15 and N1 = 0, respectively.
Fig. 7. Performance with different values of N1 and L when processing Fig. 1(a). (a)–(d), (e)–(h), (i)–(l), (m)–(p) Decision maps obtained after filtering by “bwareaopen” using L = 1, 4, 10, 15 and N1 = 2, 7, 10, 15, respectively.
to compose the fused image. Similarly to a previous study [4], a sliding window technique with a size of N × M is employed to determine which pixels are located in the focused interior regions. If a pixel located at (x, y) is in the focused interior regions, the following two cases occur:

$$z_i(x, y) = \sum_{l_1=-(N-1)/2}^{(N-1)/2} \; \sum_{l_2=-(M-1)/2}^{(M-1)/2} D_i(x + l_1, y + l_2) = 0, \tag{31}$$
Fig. 8. Performance with different values of N1 and L when processing Fig. 1(b). (a)–(d), (e)–(h), (i)–(l), (m)–(p) Decision maps obtained after filtering using “bwareaopen” with L = 1, 4, 10, 15 and N1 = 2, 7, 10, 15, respectively.
Algorithm 1 Multifocus image fusion with multiscale neighborhood.
Step 1: Given multifocus source images f_i(x, y), (i = A, B), set the values of the multiscale factors L and N1.
Step 2: Calculate the focus information for each source image according to Eq. (17) at each scale n, n = 1, 2, 3, ..., L.
Step 3: Detect the focused regions and construct the IDMs for each input image using Eq. (18).
Step 4: Use the multiscale local neighborhood technique to obtain the FDMs for each source image.
Step 5: Construct the multiscale decision maps for each image according to Eqs. (25)–(27).
Step 6: Construct the initial fused image at each scale n using Eq. (29).
Step 7: Apply the averaging operator to the initial fused images to obtain the fused image F.
Step 8: Use Eqs. (31) and (32) to determine the focused interior regions.
Step 9: Apply Eq. (33) to the source images and the fused image F to construct the final fused image.
$$z_j(x, y) = \sum_{l_1=-(N-1)/2}^{(N-1)/2} \; \sum_{l_2=-(M-1)/2}^{(M-1)/2} D_j(x + l_1, y + l_2) = MN, \tag{32}$$
where i ≠ j, i, j = A, B. In these conditions, the final fusion result can be generated by the following scheme:

$$f_F(x, y) = \begin{cases} f_A(x, y), & \text{if } z_A(x, y) = MN \ \text{and} \ z_B(x, y) = 0, \\ f_B(x, y), & \text{if } z_A(x, y) = 0 \ \text{and} \ z_B(x, y) = MN, \\ \dfrac{1}{L}\displaystyle\sum_{l=1}^{L} F_l(x, y), & \text{if } 0 < z_A(x, y) < MN \ \text{or} \ 0 < z_B(x, y) < MN. \end{cases} \tag{33}$$
In this fusion scheme, "zi(x, y) = MN and zj(x, y) = 0" indicates that the pixel at position (x, y) is focused in image i, so it is used to construct the fused image. In addition, "0 < zA(x, y) < MN or 0 < zB(x, y) < MN" in Eq. (33) means that the corresponding pixels in images A and B are located in the indeterminate regions; in this case, the pixels of the fused image are generated by the averaging method. To summarize the descriptions given above, the proposed method for fusing a pair of source images is presented in Algorithm 1.
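A minimal sketch of the final combination in Eqs. (31)–(33) is given below (Python/NumPy/SciPy). Here fdm_a and fdm_b stand for the final decision maps, f_avg for the averaged border-region result of Eq. (30), and the window size N × M is an illustrative choice rather than a value taken from the paper:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def final_fusion(img_a, img_b, fdm_a, fdm_b, f_avg, N=21, M=21):
    """Sketch of the final fusion scheme, Eqs. (31)-(33)."""
    win = N * M
    # Window sums of the binary decision maps (Eqs. (31)-(32)).
    z_a = np.rint(uniform_filter(fdm_a.astype(np.float64), size=(N, M), mode='nearest') * win)
    z_b = np.rint(uniform_filter(fdm_b.astype(np.float64), size=(N, M), mode='nearest') * win)

    fused = f_avg.copy()                      # default: averaged result (third case of Eq. (33))
    in_a = (z_a == win) & (z_b == 0)          # focused interior of image A
    in_b = (z_a == 0) & (z_b == win)          # focused interior of image B
    fused[in_a] = img_a[in_a]
    fused[in_b] = img_b[in_b]
    return fused
```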
Fig. 9. Illustrations of four pairs of multifocus testing images. (a) and (b) "Clock" multifocus images measuring 512 × 512. (c) and (d) "Book" multifocus images measuring 320 × 240. (e) and (f) "Pepsi" multifocus images measuring 512 × 512. (g) and (h) "Boy" multifocus images measuring 640 × 480.
Fig. 10. Performance with different scales L and N1 using different source images. (a)–(c), (f)–(h) The corresponding IDMs obtained for the “Clock” source images using L = 0, N1 = 0, L = 10, N1 = 10, and L = 15, N1 = 15. (k)–(m), (p)–(r) The corresponding IDMs obtained for the “Book” source images using L = 0, N1 = 0, L = 10, N1 = 10, and L = 15, N1 = 15. (d)−(e), (i)−(j), (n)−(o), (s)−(t) The corresponding decision maps obtained for (b)–(c), (g)−(h), (l)−(m), (q)−(r) after filling the isolated regions with “bwareaopen.”
5. Experiments and analysis

In this section, we first describe performance evaluations of the parameters, i.e., the multiscale factors L and N1, used for producing the IDMs and the FDMs. Next, we evaluate the performance of the proposed fusion method when fusing the focused border regions based on comparisons with the traditional DWT-based method, the stationary wavelet transform (SWT)-based method, and the NSCT-based method. Finally, we compare the results obtained by the proposed method, which integrates the advantages of integral and fractional differentials, with those produced by some
Fig. 11. Performance with different scales L and N1 using different source images. (a)–(c), (f)–(h) The corresponding IDMs obtained for the “Pepsi” source images using L = 0, N1 = 0, L = 10, N1 = 10, and L = 15, N1 = 15. (k)–(m), (p)–(r) The corresponding IDMs obtained for the “Boy” source images using L = 0, N1 = 0, L = 10, N1 = 10, and L = 15, N1 = 15. (d) and (e), (i) and (j), (n) and (o), (s) and (t) The corresponding decision maps obtained for (b) and (c), (g) and (h), (l) and (m), (q) and (r) after filling the isolated regions with “bwareaopen.”
state-of-the-art fusion methods. In our experiments, all of the tests were implemented in Matlab 7.01 on a laptop with a Core (TM) Duo 2.1 GHz CPU and 2 GB RAM.

5.1. Performance of the parameters

In this section, we consider the performance of different multiscale factors used for producing the IDMs and FDMs. For simplicity, we utilized the multifocus "Lab" images shown in Fig. 1(a) and (b) in this experiment. In order to evaluate the performance of different scale factors, L and N1 were varied from 0 to 15. Fig. 6(a)–(d) and Fig. 6(e)–(h) show the decision maps obtained for each "Lab" source image (see Fig. 1(a) and (b)) with L = 1, 4, 10, 15 and N1 = 0. These detection results demonstrate that the results improved to some extent as the scale factor L increased, because a larger scale factor L means that more information is contained in the local neighborhood of the current pixel. The results also demonstrate that the proposed method can distinguish focused pixels effectively even when they are located in smooth regions. However, the detection results also indicate that as the scale factor L increased, the number of erroneously detected pixels increased in the smooth regions. Thus, the scale factor L should not be set to an excessively large value in order to ensure good detection results.

Figs. 7 and 8 illustrate the estimated decision maps obtained using different values of N1 when L was fixed at 1, 4, 10, 15. From the first rows to the last in Figs. 7 and 8, the estimated results for Fig. 6(a)–(d) and Fig. 6(e)–(h) were generated using scale factors of N1 = 2, 7, 10, 15 with L = 1, 4, 10, 15, respectively. These figures demonstrate that almost ideal results were obtained when the scale factor N1 was set to 10–15 for all values of L. Thus, N1 ∈ [10, 15] was robust to the value of L (L ∈ Z+ and L ∈ [0, 15]), so the second scale factor N1 can be set to 10–15. For the "Lab" source images, Figs. 7 and 8 indicate that L = 1 was a good choice when N1 = 15. However, for many real images, the complexity of the content may hinder focused pixel detection, and thus a relatively larger scale factor L may be beneficial for evaluating the focusing property of a pixel. In addition, the detection results shown in Figs. 7 and 8 indicate that our method was robust to the values of L and N1 at relatively large scales. Based on this discussion and the computational complexity, the parameter L can also be selected from 10 to 15.

The source images shown in Fig. 9(a)–(h) were employed to further demonstrate the performance of the proposed method, thereby illustrating the rationality of the parameter settings discussed above. The corresponding initial
Fig. 12. Fusion results and comparisons of the visual effects obtained using the DWT, SWT, NSCT, and averaging-based methods. (a)–(d) "Lab" image fusion results obtained using the DWT, SWT, NSCT, and averaging-based methods. (e)–(h) "Clock" image fusion results obtained using the DWT, SWT, NSCT, and averaging-based methods. (i)–(l) Differences between (a)–(d) and Fig. 1(b). (m)–(p) Differences between (e)–(h) and Fig. 9(a).
detection results for Fig. 9(a)–(h) are shown in the first columns of Figs. 10 and 11. These detection results were generated by Eq. (18) using L = 0 and N1 = 0. The second and third columns of Figs. 10 and 11 show the initial detection results obtained using L = 10, N1 = 10 and L = 15, N1 = 15, respectively, and the corresponding FDMs generated by filtering the small isolated regions are presented in the last two columns of Figs. 10 and 11. These experimental results demonstrate that the parameter settings L ∈ [10, 15] and N1 ∈ [10, 15] are reasonable. More importantly, a large number of experiments showed that these parameter settings were effective for most images. In practice, to reduce the number of computations, the parameters L and N1 can be set to 10. For various source images, the proposed method may require a different set of parameters to obtain the best performance, but the results produced using our method were still impressive when L = 10 and N1 = 10.

5.2. Performance of the averaging method

In this section, we compare the proposed averaging method described in Section 4.1 with the DWT-based, SWT-based, and NSCT-based methods. In all of these methods, the lowpass subband coefficients and the bandpass subband coefficients were simply merged by the "averaging" scheme and the "absolute maximum choosing" scheme, respectively. The "db5" wavelet was used in the DWT-based and SWT-based methods, and the decomposition levels were set to 4 in all of these methods. For the "Lab" and "Clock" source images (see Fig. 1(a) and (b), and Fig. 9(a) and (b)), the fusion results obtained with the different methods are shown in Fig. 12(a)–(h). In order to facilitate a better comparison, the differences between the fused images and their corresponding source images (shown in Fig. 1(b) and Fig. 9(a)) are given in Fig. 12(i)–(p). For the focused regions, the difference between the source image and the fused image should be zero, which means that the entire focused region is contained within the fused image. Based on these differences, we found that many artifacts were introduced into the fusion results obtained by DWT due to its lack of shift invariance. Compared with the DWT-based method, the number
Table 1
Quantitative assessments of Fig. 12(a)–(d) and Fig. 12(e)–(h) using the MI, QAB/F, VIF, QP, QY, and QCB metrics.

Images            Metric    DWT      SWT      NSCT     Averaging
Fig. 12(a)–(d)    MI        6.5285   6.9608   7.0719   8.5060
                  QAB/F     0.5742   0.6342   0.6575   0.7252
                  VIF       0.4503   0.5011   0.5068   0.5641
                  QP        0.6367   0.7103   0.7243   0.7955
                  QY        0.8504   0.9035   0.9179   0.9686
                  QCB       0.6159   0.6614   0.6721   0.7378
Fig. 12(e)–(h)    MI        6.1493   6.5757   6.6676   8.2022
                  QAB/F     0.5120   0.5574   0.5517   0.6614
                  VIF       0.4750   0.5310   0.5400   0.5872
                  QP        0.6276   0.7084   0.7343   0.8119
                  QY        0.8040   0.8402   0.8413   0.9663
                  QCB       0.6451   0.6818   0.6936   0.7482
Fig. 13. Performance comparison based on Fig. 12(a)–(h) using MI, QAB/F , VIF, QP , QY , and QCB . (a) Bar graphs of these metrics with different methods after processing the “Lab” source images. (b) Bar graphs of these metrics with different methods after processing the “Clock” source images.
of artifacts was reduced greatly in the fusion results generated by the SWT-based and NSCT-based methods due to the shift invariance of SWT and NSCT. Moreover, Fig. 12(i)–(k) and Fig. 12(m)–(o) show clearly that the amount of information retained in the regions corresponding to the focused regions in Fig. 1(b) and Fig. 9(a) was greater than that in Fig. 12(l) and (p). This indicates that our proposed averaging method could extract more useful information from the source images and transfer it into the fused images. More importantly, fewer visual artifacts, such as boundary seams and ringing, were present along the boundaries of the focused regions using our proposed method.

However, a visual evaluation alone is not sufficient, so an objective evaluation is also required. In this section, we use popular objective criteria to quantify the quality of the fusion results. The first criterion is mutual information (MI) [39,53], a metric defined as the sum of the MI between each input image and the fused image. The second criterion is the QAB/F metric, which was proposed by Xydeas and Petrovic [36,54] for assessing the fusion performance by evaluating the amount of edge information transferred from the source images to the fused image; in this metric, a Sobel edge detector is employed to calculate the strength and orientation information for each pixel in both the source images and the fused results. The third criterion is visual information fidelity (VIF) [43], which was designed based on natural scene statistics theory and the human visual system; this metric evaluates the fusion performance by computing the information shared between the reference image and the fused image. In most cases, an ideal reference image is difficult to obtain, so we used a modified version of VIF [13], defined by averaging the VIF values between the source images and the fused results, to evaluate the fusion performance in our experiments. However, the evaluation results according to any single metric might not agree with the visual quality of the fused image. Thus, to guarantee the faithfulness and objectivity of our evaluation results, we used three further metrics, i.e., a metric based on phase congruency (QP) [27,67], Yang's metric (QY) [27,57], and the Chen-Blum metric (QCB) [5,27], to evaluate the performance of the different methods. For all of these criteria, a higher value indicates a better fusion result.
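As a reference point for the first criterion, a minimal joint-histogram estimate of MI between two images, and the corresponding fusion metric (sum over the two source images), can be sketched as follows (Python/NumPy; this is a generic estimate and may differ in normalization details from the exact definitions in [39,53]):

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """Mutual information between two grayscale images via their joint histogram."""
    hist, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def fusion_mi(img_a, img_b, fused):
    """MI fusion metric as described above: sum of MI between each source image and the fused image."""
    return mutual_information(img_a, fused) + mutual_information(img_b, fused)
```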
Fig. 14. Fused images obtained using NSCT-CON, SR, Block, NeiDis, DSIFT, and our proposed method. (a)–(r) Fusion results for "Lab", "Boy," and "Book" obtained using NSCT-CON, SR, Block, NeiDis, DSIFT, and our proposed method, respectively.
Fig. 15. Fused images obtained using NSCT-CON, SR, Block, NeiDis, DSIFT, and our proposed method. (a)–(l) Fusion results obtained for "Clock" and "Pepsi" using NSCT-CON, SR, Block, NeiDis, DSIFT, and our proposed method, respectively.
Table 1 compares the quantitative results for Fig. 12(a)–(h). To facilitate comparisons, the evaluation values obtained using different fusion methods are presented in Fig. 13. The experimental data show that the NSCT-based method could transfer more information from the source images into the fused images compared with the DWT- and SWT-based methods. However, the values also show that the proposed averaging method yielded the best fusion results in terms of the largest values for the MI, QAB/F , VIF, QP , QY , and QCB metrics. This demonstrates that the “averaging method” performed better than the other methods.
Fig. 16. Differences between fused images and their corresponding source images. (a)–(r) Differences between the “Lab”, “Boy,” and “Book” fusion results, and the corresponding source images, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 17. Differences between the fused images and their corresponding source images. (a)–(l) Differences between the “Clock” and “Pepsi” fusion results and their corresponding source images, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
5.3. Comparison with other methods

In this section, we compare the performance of our proposed approach with five other methods, i.e., the NSCT-contrast-based method (NSCT-CON) [63], the sparse representation-based method (SR) [56], the block-based method (Block) [15], NeiDis [68], and the dense SIFT-based method (DSIFT) [28]. For the first method, we used four decomposition levels, with four, eight, eight, and sixteen directions ranging from the coarser scales to the finer scales, and we adopted the fusion schemes proposed in [63] for the different subbands. In the NeiDis-based method, the level of decomposition was set to four, where the low-frequency and
Fig. 18. Performance comparison based on Fig. 14(a)–(r) and Fig. 15(a)–(l) using MI, QAB/F, VIF, QP, QY, QCB, Time/s, and PSM/MB. (a)–(h) Bar charts of these metrics for the NSCT-CON, SR, Block, NeiDis, DSIFT, and proposed methods on the "Lab," "Boy," "Book," "Clock," and "Pepsi" image sets.
Table 2
Evaluations of the fusion results obtained for Fig. 14(a)–(r) using the MI, QAB/F, VIF, QP, QY, and QCB metrics.

Images            Metric    NSCT-CON   SR       Block    NeiDis   DSIFT    Proposed
Figs. 14(a)–(f)   MI        7.3519     7.0284   8.5205   7.1305   8.5490   8.7873
                  QAB/F     0.6687     0.6404   0.7383   0.6562   0.7271   0.7326
                  VIF       0.5212     0.5323   0.4829   0.4964   0.5564   0.5684
                  QP        0.7419     0.7257   0.6898   0.7178   0.7949   0.8039
                  QY        0.9249     0.8698   0.9183   0.9011   0.9475   0.9889
                  QCB       0.6788     0.6080   0.6507   0.6736   0.7150   0.7617
Figs. 14(g)–(l)   MI        5.1856     4.2440   7.2108   5.4575   7.2252   7.2110
                  QAB/F     0.7406     0.7241   0.7560   0.7364   0.7558   0.7561
                  VIF       0.5048     0.5225   0.5218   0.5003   0.5241   0.5244
                  QP        0.6458     0.6501   0.6590   0.6405   0.6598   0.6602
                  QY        0.9895     0.9752   0.9954   0.9878   0.9959   0.9962
                  QCB       0.7946     0.7397   0.8309   0.7968   0.8334   0.8335
Figs. 14(m)–(r)   MI        7.4614     7.6445   9.3684   7.5919   9.4358   9.6394
                  QAB/F     0.6273     0.6408   0.7034   0.6346   0.6886   0.7120
                  VIF       0.7153     0.7660   0.7407   0.7212   0.7307   0.7597
                  QP        0.9246     0.9291   0.9522   0.9375   0.9473   0.9579
                  QY        0.9109     0.9056   0.9557   0.9150   0.9639   0.9770
                  QCB       0.8316     0.8094   0.8532   0.8431   0.8704   0.8764
Table 3
Evaluation of the fusion results for Fig. 15(a)–(l) using the MI, QAB/F, VIF, QP, QY, and QCB metrics.

Images            Metric    NSCT-CON   SR       Block    NeiDis   DSIFT    Proposed
Figs. 15(a)–(f)   MI        7.0695     6.8600   8.6873   6.7788   8.4701   8.5248
                  QAB/F     0.5661     0.5974   0.6736   0.5821   0.6757   0.6823
                  VIF       0.5503     0.5545   0.5277   0.5221   0.5676   0.5879
                  QP        0.7613     0.7001   0.7045   0.6936   0.8022   0.7937
                  QY        0.8575     0.7361   0.9574   0.8393   0.9665   0.9869
                  QCB       0.7033     0.4455   0.7230   0.6290   0.7563   0.7648
Figs. 15(g)–(l)   MI        7.0109     6.9990   8.3108   6.8448   8.6840   8.8331
                  QAB/F     0.7539     0.7375   0.7205   0.7095   0.7453   0.7632
                  VIF       0.6534     0.5982   0.5705   0.6356   0.6485   0.6840
                  QP        0.8454     0.7840   0.6903   0.8152   0.8421   0.8746
                  QY        0.9480     0.8765   0.9224   0.9279   0.9462   0.9645
                  QCB       0.6819     0.5314   0.6222   0.6659   0.7193   0.7558
Table 4
Computational time and memory usage rates with different fusion methods when processing the "Lab," "Boy," "Book," "Clock," and "Pepsi" source images.

Images   Metric    NSCT-CON   SR      Block     NeiDis   DSIFT     Proposed
Lab      Time/s    1104.5     16925   12.0430   3.0260   39.7180   585.8750
         PSM/MB    1622       1093    1313      1207     1146      1479
Boy      Time/s    1061.5     23266   11.8560   3.3390   29.7490   574.6590
         PSM/MB    1622       1377    1314      1322     1137      1498
Book     Time/s    254.7110   4680    4.6950    1.5600   11.7310   132.0550
         PSM/MB    1359       968     1296      1189     1132      1119
Clock    Time/s    928.0610   16803   10.5770   2.9640   33.7120   493.5070
         PSM/MB    1613       989     1307      1204     1170      1419
Pepsi    Time/s    933.6780   16863   10.3740   2.855    38.1730   499.0140
         PSM/MB    1617       1053    1308      1322     1135      1442
high-frequency coefficients were fused using the "averaging" scheme and the "maximum with consistency check choosing" scheme, respectively. Moreover, we employed the optimal parameters for the NSCT-CON, SR, Block, NeiDis, and DSIFT methods specified in previous studies. We tested the performance of the different fusion methods based on visual comparisons and quantitative assessments. In our experiments, we implemented the NSCT-CON and Block methods, and we used the original source codes provided for all of the other methods.

Figs. 14 and 15 illustrate the fusion results obtained by the different methods for the "Lab," "Boy," "Book," "Clock," and "Pepsi" multifocus image sets (see Fig. 1(a) and (b) and Fig. 9(a)–(h); most are available from http://www.imagefusion.org/). The results show that most of the useful information in the source images was preserved in the
Fig. 19. The "3Book" source images. (a) Ideal reference fused image, (b) left focused, (c) middle focused, and (d) right focused.
Table 5
Quantitative assessments of the different fusion results shown in Fig. 20(a)–(f) using the MI, QAB/F, VIF, QP, QY, QCB, Time/s, and PSM/MB metrics.

Methods    MI        QAB/F    VIF      QP       QY       QCB      Time/s     PSM/MB
NSCT-CON   10.2566   0.7465   0.4729   0.7346   1.1953   0.9459   1254.9     1680
SR         10.8295   0.7498   0.4992   0.7269   1.1933   0.9092   26906      1000
Block      11.0709   0.8270   0.4472   0.6961   1.1971   0.9320   41.7072    1196
NeiDis     10.0316   0.7378   0.4553   0.6986   1.1617   0.9408   6.0060     1136
DSIFT      11.1805   0.8373   0.4685   0.7446   1.2152   0.9536   58.0480    1105
Proposed   11.1184   0.8424   0.4779   0.7705   1.2225   0.9639   773.7930   1591
fused results. To facilitate a better comparison, Figs. 16 and 17 show the differences between the fused images (shown in Figs. 14 and 15) and their corresponding source images (see Figs. 1(b) and 9(a), (d), (f) and (h)). Moreover, the local regions enclosed by red boxes in Figs. 16 and 17 are enlarged at the bottom right corners to illustrate these differences. Figs. 16 and 17 show clearly that some of the information from the focused regions in Figs. 1(b) and 9(a), (d), (f) and (h) was still retained in Figs. 16 and 17, except for Figs. 16(f), (k), (l), (r) and 17(f) and (l). This demonstrates that some important information in the source images was not transferred into the fused images when the NSCT-CON, SR, Block, NeiDis, and DSIFT methods were applied to the real multifocus images. Moreover, the Block and DSIFT methods readily produced block effects and artifacts, which significantly affected the appearance of the fused images. However, for the synthetic images (see Fig. 9(g) and (h)), the fused results (see Fig. 16(k) and (l)) show that DSIFT obtained visual effects similar to those of our proposed method. This indicates that the performance of the proposed method is competitive with that of DSIFT for synthetic images. In particular, the fusion results produced using the proposed method and DSIFT had the same perceptual quality after processing the "Boy" source images.

Compared with the other methods, the proposed method yielded the best results in most cases, where almost all of the useful information in the source images was present in the fused results. This is because the differences corresponding to the focused regions in Figs. 1(b) and 9(a), (d), (f) and (h) were close to zero. In addition, fewer artifacts and erroneous results were introduced during the fusion process. Moreover, the proposed method performed very well in the transition areas between the focused and defocused regions because discontinuities such as boundary seams were mostly avoided, even though the boundaries of the focused regions were not determined accurately in the decision maps due to the complex patterns in the source images. Therefore, the fused images generated by our proposed method were more desirable than those produced by the other methods.

In order to assess the fusion performance in a more objective manner, we employed the MI, QAB/F, VIF, QP, QY, and QCB metrics in the quantitative evaluations. Tables 2 and 3 list the quantitative assessments of the fused images (see Fig. 14(a)–(r) and Fig. 15(a)–(l)) obtained using the different methods. Moreover, the time and space (physical memory, PSM) consumption rates of the different methods when processing the "Lab," "Boy," "Book," "Clock," and "Pepsi" source images are given in Table 4. For clarity, the values obtained with the different methods are displayed as eight bar charts in Fig. 18. In Tables 2 and 3, the largest values for MI, QAB/F, VIF, QP, QY, and QCB are highlighted in boldface. According to these values and charts, the proposed method performed slightly worse than the "Block" method when processing the "Lab" source images in terms of QAB/F and the "Clock" source images in terms of MI, while it also performed worse than the SR method for the "Book" source images in terms of VIF. Moreover, the MI value was only slightly lower than that of DSIFT when processing the "Boy" source images, and slightly smaller in terms of QP when processing the "Clock" source images.
However, achieving the highest value on a single metric does not by itself reflect the fusion quality in an objective manner, and the proposed method performed best in terms of more than half of the metrics. Furthermore, the "Book," "Lab," and "Clock" fusion results generated by the proposed method had the best visual effects. There was no apparent difference between DSIFT and our method in terms of the "Boy" fusion results, but the conclusion that our method outperformed DSIFT was supported by the objective assessment of the fusion results. However, exploiting the advantages of neighborhoods with different sizes increases the complexity of the proposed method, as reflected by the time and space consumption rates of the different methods. Nevertheless, the execution time of our method could be reduced considerably by optimizing the speed and efficiency of its implementation. Although our method did not achieve the highest value for every quantitative evaluation metric, the conclusion that the proposed method performed best is supported by considerations of both the visual effects and the objective quality.

Fig. 20. Fusion results obtained for the "3Book" source images and the differences between the fused results and Fig. 19(a). (a)–(f) Fusion results obtained using NSCT-CON, SR, Block, NeiDis, DSIFT, and the proposed method. (g)–(l) The corresponding differences between (a)–(f) and Fig. 19(a). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
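The difference-image comparison used in Figs. 16, 17, and 20 amounts to subtracting one source image from the fused result: residuals close to zero inside a focused region indicate that the region was transferred faithfully. The sketch below is a minimal illustration of this idea and assumes aligned 8-bit grayscale images; the contrast-stretching step is only for display and is our assumption, not part of the original figures.

```python
import numpy as np

def difference_map(fused, source):
    """Absolute difference between a fused image and one of its source images.

    Pixels that were in focus in `source` should appear nearly unchanged in
    `fused`, so the difference there should be close to zero; visible residual
    structure inside a focused region indicates lost information.
    """
    diff = np.abs(fused.astype(np.float64) - source.astype(np.float64))
    # Stretch to [0, 255] purely for visualization; guard against an all-zero map.
    peak = diff.max()
    return (255.0 * diff / peak).astype(np.uint8) if peak > 0 else diff.astype(np.uint8)
```

Inspecting such maps region by region is what allows the focused areas of each source image to be checked against the corresponding areas of the fused result in the qualitative comparisons above.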
To demonstrate that the proposed method can handle an image set containing more than two source images, we performed an experiment using the three "3Book" source images (shown in Fig. 19(a)–(c)). The fusion results produced by the different fusion methods are shown in Fig. 20(a)–(f). It should be noted that, for the SR, Block, and DSIFT methods, the first and second source images were merged first, and the resulting image was then merged with the remaining source image to obtain the final fusion result. For NSCT-CON, NeiDis, and the proposed method, the fused image was obtained by merging the source images in parallel. Fig. 20(g)–(l) show the differences between the fused results and the source image in Fig. 19(a). The local regions enclosed by the red boxes in Fig. 20(g)–(l) were extracted and are shown in the bottom right corners. It can be seen that the fusion result obtained using our proposed method contained fewer artifacts and exhibited the best visual effects compared with the other methods. To assess the fusion performance of the different methods in an objective manner, we used all of the evaluation criteria discussed above in this experiment. Table 5 lists the quantitative assessments of the different methods for the "3Book" source images, which indicate that our proposed method outperformed the other methods in terms of QAB/F, QP, QY, and QCB. For the SR and DSIFT methods, only one evaluation result each was higher than that obtained using the proposed method. Based on this objective evaluation and the visual comparisons, we conclude that our method has better fusion performance than the existing methods. However, our method performed worse than the Block, NeiDis, and DSIFT methods in terms of its time and space consumption rates. Despite these differences, the time and space consumption rates of the proposed method are still acceptable for software development.

According to the analysis presented above, our proposed method can detect the focused regions accurately, and boundary seams and ringing artifacts are greatly reduced. Thus, the proposed method can generate pleasing fusion results, and it outperformed conventional image fusion methods, including the NSCT-based, Block-based, SR-based, NeiDis-based, and DSIFT-based methods, according to our comprehensive assessment.

6. Conclusion

In this study, we proposed a novel multifocus image fusion method by combining the structure tensors of integral and fractional differentials. In this method, we employ a new focus measure based on the eigenvalues of the mixed-order structure tensors. In order to integrate the advantages of local neighborhoods with different sizes, we proposed a novel MFM technique and a post-processing method, which we utilize to identify the focused regions. In addition, we proposed an averaging fusion technique based on the detection results obtained for different neighborhoods to solve the fusion problem in the transition zones between focused and defocused areas. Finally, we demonstrated the validity and superior performance of the proposed method based on experimental results. Further research may address multisensor image fusion that integrates the advantages of integral and fractional differentials, as well as the application of new techniques to image fusion, such as saliency detection [45], nonnegative matrix factorization [30], dictionary learning [34,73], and sparse low-rank approximation [3,66].
Acknowledgments

We thank the anonymous reviewers for their constructive and valuable comments and suggestions. We also thank Dr Xiaohui Yuan from the University of North Texas (USA) for suggestions regarding the linguistic presentation of the manuscript. Moreover, we thank Dr Zheng Liu, Dr Yu Liu, and Dr Hengjun Zhao from the University of Ottawa, the University of Science and Technology of China, and Chongqing University, respectively, for providing the Matlab codes for their methods [27,28,68]. This research was supported by the National Natural Science Foundation of China (nos. 61302041 and 61562053), the Applied Basic Research Foundation of Yunnan Provincial Science and Technology Department (no. 2013FD011), the Talent Cultivation Foundation of Kunming University of Science and Technology (no. KKZ3201303027), the Yunnan Provincial Foundation for Personnel Cultivation (nos. KKSY201303086, KKSY201403116, and KKSY201403024), and the Major Project of the Education Department of Yunnan Province (no. 2014Z022).

References
[1] G. Bhatnagar, Q.M. Wu, Z. Liu, A new contrast based multimodal medical image fusion framework, Neurocomputing 157 (2015) 143–152.
[2] E.J. Candès, L. Demanet, D.L. Donoho, L. Ying, Fast discrete curvelet transforms, Multiscale Model. Simul. 5 (3) (2006) 861–899.
[3] E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis, J. ACM 58 (3) (2011) 1–37.
[4] Y. Chai, H.F. Li, Z.H. Li, Multifocus image fusion scheme using focused region detection and multiresolution, Opt. Commun. 284 (19) (2011) 4376–4389.
[5] Y. Chen, R.S. Blum, A new automated quality assessment algorithm for image fusion, Image Vis. Comput. 27 (10) (2009) 1421–1432.
[6] A.L. da Cunha, J. Zhou, M.N. Do, The nonsubsampled contourlet transform: Theory, design and applications, IEEE Trans. Image Process. 15 (10) (2006) 3089–3101.
[7] I. De, B. Chanda, Multi-focus image fusion using a morphology-based focus measure in a quad-tree structure, Inf. Fusion 14 (2) (2013) 136–146.
[8] M.N. Do, M. Vetterli, The contourlet transform: An efficient directional multiresolution image representation, IEEE Trans. Image Process. 14 (12) (2005) 2091–2106.
[9] B. Du, M.F. Zhang, L.F. Zhang, X.L. Li, Hyperspectral biological image compression based on multiway tensor projection, in: Proceedings of the 2014 IEEE International Conference on Multimedia and Expo, Chengdu, 2014, pp. 1–6.
[10] J.Y. Duan, G.F. Meng, S.M. Xiang, C.H. Pan, Multifocus image fusion via focus segmentation and region, Neurocomputing 140 (2014) 193–209.
[11] Q. Guo, S. Chen, H. Leung, S. Liu, Covariance intersection based image fusion technique with application to pan sharpening in remote sensing, Inf. Sci. 180 (18) (2010) 3434–3443.
[12] D. Guo, J.W. Yan, X.B. Qu, High quality multi-focus image fusion using self-similarity and depth information, Opt. Commun. 338 (2015) 138–144.
[13] Y. Han, Y. Cai, Y. Cao, X. Xu, A new image fusion performance metric based on visual information fidelity, Inf. Fusion 14 (2) (2013) 127–135.
[14] R.C. Hong, W.Y. Cao, J.X. Pang, J.G. Jiang, Directional projection based image fusion quality metric, Inf. Sci. 281 (2014) 611–619.
[15] W. Huang, Z. Jing, Evaluation of focus measures in multi-focus image fusion, Pattern Recognit. Lett. 28 (4) (2007) 493–500.
[16] S. Ioannidou, V. Karathanassi, Investigation of the dual-tree complex and shift-invariant discrete wavelet transforms on Quickbird image fusion, IEEE Geosci. Remote Sens. Lett. 4 (1) (2007) 166–170.
[17] L.H. Jin, H. Liu, X.Y. Xu, E.N. Song, Improved direction estimation for Di Zenzo's multichannel image gradient operator, Pattern Recognit. 45 (12) (2012) 4300–4311.
[18] J. Lewis, R. O'Callaghan, S. Nikolov, D. Bull, N. Canagarajah, Pixel- and region-based image fusion with complex wavelets, Inf. Fusion 8 (2) (2007) 119–130.
[19] H.F. Li, Y. Chai, Z.F. Li, A new fusion scheme for multifocus images based on focused pixels detection, Mach. Vis. Appl. 24 (8) (2013) 1167–1181.
[20] S.T. Li, X.D. Kang, J.W. Hu, Image fusion with guided filtering, IEEE Trans. Image Process. 22 (7) (2013) 2864–2874.
[21] S.T. Li, X.D. Kang, J.W. Hu, B. Yang, Image matting for fusion of multi-focus images in dynamic scenes, Inf. Fusion 14 (2) (2013) 147–162.
[22] H. Li, B. Manjunath, S. Mitra, Multisensor image fusion using the wavelet transform, Graph. Models Image Process. 57 (3) (1995) 235–245.
[23] S.T. Li, B. Yang, Multifocus image fusion by combining curvelet and wavelet transform, Pattern Recognit. Lett. 29 (9) (2008) 1295–1301.
[24] S.T. Li, B. Yang, Multifocus image fusion using region segmentation and spatial frequency, Image Vis. Comput. 26 (7) (2008) 971–979.
[25] H.F. Li, Z.T. Yu, C.L. Mao, Fractional differential and variational method for image fusion and super-resolution, Neurocomputing 171 (2016) 138–148.
[26] P.L. Lin, P.Y. Huang, Fusion methods based on dynamic-segmented morphological wavelet or cut and paste for multifocus images, Signal Process. 88 (6) (2008) 1511–1527.
[27] Z. Liu, E. Blasch, Z.Y. Xue, J.Y. Zhao, R. Laganiere, W. Wu, Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: A comparative study, IEEE Trans. Pattern Anal. Mach. Intell. 34 (1) (2012) 94–108.
[28] Y. Liu, S.P. Liu, Z.F. Wang, Multi-focus image fusion with dense SIFT, Inf. Fusion 23 (2015) 139–155.
[29] T.L. Liu, D.C. Tao, Classification with noisy labels by importance reweighting, IEEE Trans. Pattern Anal. Mach. Intell. 38 (3) (2016) 447–461.
[30] T.L. Liu, D.C. Tao, On the performance of Manhattan nonnegative matrix factorization, IEEE Trans. Neural Netw. Learn. Syst. PP (99) (2015) 1, doi:10.1109/TNNLS.2015.2458986.
[31] L. Liu, M.Y. Yu, L. Shao, Multiview alignment hashing for efficient image search, IEEE Trans. Image Process. 24 (3) (2015) 956–966.
[32] D. Looney, D.P. Mandic, Multiscale image fusion using complex extension of EMD, IEEE Trans. Signal Process. 57 (4) (2009) 1626–1630.
[33] S.K. Nayar, Y. Nakagawa, Shape from focus, IEEE Trans. Pattern Anal. Mach. Intell. 16 (8) (1994) 824–831.
[34] M. Nejati, S. Samavi, S. Shirani, Multi-focus image fusion using dictionary based sparse representation, Inf. Fusion 25 (2015) 72–84.
[35] G. Pajares, J. Cruz, A wavelet-based image fusion tutorial, Pattern Recognit. 37 (9) (2004) 1855–1872.
[36] V. Petrović, C.S. Xydeas, Sensor noise effects on signal-level image fusion performance, Inf. Fusion 4 (3) (2003) 167–183.
[37] G. Piella, Image fusion for enhanced visualization: A variational approach, Int. J. Comput. Vis. 83 (1) (2009) 1–11.
[38] Y.F. Pu, J.L. Zhou, X. Yuan, Fractional differential mask: A fractional differential-based approach for multiscale texture enhancement, IEEE Trans. Image Process. 19 (2) (2010) 491–511.
[39] G. Qu, D. Zhang, Information measure for performance of image fusion, Electron. Lett. 38 (7) (2002) 313–315.
[40] R. Redondo, F. Šroubek, S. Fischer, G. Cristóbal, Multifocus image fusion using the log-Gabor transform and a multisize windows technique, Inf. Fusion 10 (2) (2009) 163–171.
[41] A. Rocco, B. West, Fractional calculus and the evolution of fractal phenomena, Physica A 265 (3–4) (1999) 535–546.
[42] J.B. Sharman, K.K. Sharma, V. Sahula, Hybrid image fusion scheme using self-fractional Fourier functions and multivariate empirical mode decomposition, Signal Process. 100 (2014) 146–159.
[43] H.R. Sheikh, A.C. Bovik, Image information and visual quality, IEEE Trans. Image Process. 15 (2) (2006) 430–444.
[44] J.M. Sun, D.C. Tao, C. Faloutsos, Beyond streams and graphs: Dynamic tensor analysis, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 374–383.
[45] D.P. Tao, J. Cheng, M.L. Song, X. Lin, Manifold ranking-based matrix factorization for saliency detection, IEEE Trans. Neural Netw. Learn. Syst. PP (99) (2015) 1, doi:10.1109/TNNLS.2015.2461554.
[46] D.C. Tao, X.L. Li, W.M. Hu, S. Maybank, X.D. Wu, Supervised tensor learning, in: Proceedings of the Fifth IEEE International Conference on Data Mining, 2005, pp. 450–457.
[47] D.P. Tao, X. Lin, L.W. Jin, X.L. Li, Principal component 2-D long short-term memory for font recognition on single Chinese characters, IEEE Trans. Cybern. 46 (3) (2016) 756–765.
[48] F.B. Tatom, The relationship between fractional calculus and fractals, Fractals 3 (1) (1995) 217–229.
[49] X.Y. Wang, P.P. Niu, H.Y. Hong, C.P. Wang, A.L. Wang, A new robust color image watermarking using local quaternion exponent moments, Inf. Sci. 277 (2014) 731–754.
[50] X.Y. Wang, H.Y. Yang, Y. Zhang, Z.K. Fu, Image denoising using SVM classification in nonsubsampled contourlet transform domain, Inf. Sci. 246 (2013) 155–176.
[51] W. Wu, X.M. Yang, Y. Pang, J. Peng, G.G. Jeon, A multifocus image fusion method by using hidden Markov model, Opt. Commun. 287 (15) (2013) 63–72.
[52] C. Xu, D.C. Tao, M.L. Song, X. Lin, Multi-view intact space learning, IEEE Trans. Pattern Anal. Mach. Intell. 37 (12) (2015) 2531–2544.
[53] X. Xu, D.C. Tao, C. Xu, Large-margin multi-view information bottleneck, IEEE Trans. Pattern Anal. Mach. Intell. 36 (8) (2014) 1559–1572.
[54] C.S. Xydeas, V. Petrović, Objective image fusion performance measure, Electron. Lett. 36 (4) (2000) 308–309.
[55] L. Yang, B.L. Guo, W. Ni, Multimodality medical image fusion based on multiscale geometric analysis of contourlet transform, Neurocomputing 72 (1–3) (2008) 203–211.
[56] B. Yang, S. Li, Multifocus image fusion and restoration with sparse representation, IEEE Trans. Instrum. Meas. 59 (4) (2010) 884–892.
[57] C. Yang, J. Zhang, X. Wang, X. Liu, A novel similarity based quality metric for image fusion, Inf. Fusion 9 (2) (2008) 156–160.
[58] T. Yeo, S. Ong, Jayasooriah, R. Sinniah, Autofocusing for tissue microscopy, Image Vis. Comput. 11 (10) (1993) 629–639.
[59] H.T. Yin, Sparse representation with learned multiscale dictionary for image fusion, Neurocomputing 148 (2015) 600–610.
[60] X.H. Yuan, B. Buckles, A wavelet-based noise-aware method for fusing noisy imagery, in: Proceedings of the IEEE International Conference on Image Processing, San Antonio, TX, 2007, pp. 16–19.
[61] X.C. Yuan, C.M. Pun, C.L.P. Chen, Robust mel-frequency cepstral coefficients feature detection and dual-tree complex wavelet transform for digital audio watermarking, Inf. Sci. 298 (2015) 159–179.
[62] Z. Zhang, R.S. Blum, A categorization of multiscale decomposition based image fusion schemes with a performance study for a digital camera application, Proc. IEEE 87 (8) (1999) 1315–1326.
[63] Q. Zhang, B.L. Guo, Multifocus image fusion using the nonsubsampled contourlet transform, Signal Process. 89 (7) (2009) 1334–1346.
[64] L.F. Zhang, L.P. Zhang, D.C. Tao, B. Du, A sparse and discriminative tensor to vector projection for human gait feature representation, Signal Process. 106 (2015) 245–252.
[65] L.F. Zhang, L.P. Zhang, D.C. Tao, X. Huang, A multifeature tensor for remote-sensing target recognition, IEEE Geosci. Remote Sens. Lett. 8 (2) (2011) 374–378.
[66] L.F. Zhang, Q. Zhang, L.P. Zhang, D.C. Tao, X. Huang, B. Du, Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding, Pattern Recognit. 48 (2015) 3102–3112.
[67] J. Zhao, R. Laganiere, Z. Liu, Performance assessment of combinative pixel-level image fusion based on an absolute feature measurement, Int. J. Innov. Comput. Inf. Control 3 (6A) (2007) 1433–1447.
[68] H.J. Zhao, Z.W. Shang, Y.Y. Tang, B. Fang, Multi-focus image fusion based on the neighbor distance, Pattern Recognit. 46 (3) (2013) 1002–1011.
[69] S. Zheng, W.Z. Shi, J. Liu, G.X. Zhu, J.W. Tian, Multisource image fusion method using support value transform, IEEE Trans. Image Process. 16 (7) (2007) 1831–1839.
[70] Z.Q. Zhou, S. Liu, B. Wang, Multi-scale weighted gradient-based fusion for multi-focus images, Inf. Fusion 20 (2014) 60–72.
[71] H. Zhu, H. Leung, Z.S. He, A variational Bayesian approach to robust sensor fusion based on student-t distribution, Inf. Sci. 222 (2013) 201–214.
[72] Q.S. Zhu, J.M. Mai, L. Shao, A fast single image haze removal algorithm using color attenuation prior, IEEE Trans. Image Process. 24 (11) (2015) 3522–3533.
[73] F. Zhu, L. Shao, Weakly-supervised cross-domain dictionary learning for visual recognition, Int. J. Comput. Vis. 109 (1) (2014) 42–59.