Multi-focus image fusion based on optimal defocus estimation
Veysel Aslantas, Ahmet Nusret Toprak
Erciyes University, Faculty of Engineering, Department of Computer Engineering, 38039 Kayseri, Turkey
Article history: Received 2 May 2016; Revised 2 February 2017; Accepted 2 February 2017.
Keywords: Multi-focus image fusion; Extended depth of field; Defocus estimation
Abstract: One of the main drawbacks of imaging systems is their limited depth of field, which prevents them from obtaining an all-in-focus image of the environment. This paper presents an efficient, pixel-based multi-focus image fusion method that generates an all-in-focus image by combining images acquired from the same point of view with different focus settings. The proposed method first estimates the point spread function of each source image by utilizing the Levenberg–Marquardt algorithm. Then, artificially blurred versions of the source images are computed by convolving them with the estimated point spread functions. The fusion map is computed by making use of both the source and the artificially blurred images and is then improved by morphological operators. Experimental results show that the proposed method is computationally competitive with the state-of-the-art methods and outperforms them in terms of both visual and quantitative metric evaluations. © 2017 Elsevier Ltd. All rights reserved.
1. Introduction

Multi-focus image fusion is a significant challenge in the computer vision and image analysis fields. It aims to create an all-in-focus image by combining images of the same scene captured with varying focus settings [1]. The fused image provides more comprehensive information about the scene and is more suitable for human or machine perception than any of the source images [2]. Furthermore, analyzing a single fused image is easier than analyzing a collection of images; therefore, a single fused image is preferred to a series of images with different focal points. Multi-focus image fusion methods are applied in a broad range of applications such as microscopic imaging, industrial vision systems, and macro photography [3,4].

In the aforementioned applications, it is desirable to have the entire image in focus. In practice, however, the cameras used in imaging systems are not pinhole devices but are composed of convex lenses. Such a camera has a limited depth of field (DoF) and can precisely focus on only one plane at a time. Therefore, images of objects located at any other plane are blurred by an amount that depends on their distance to the plane of focus. If the amount of blur is sufficiently small, it is nearly indistinguishable from a sharp object; thus, an area of acceptable sharpness exists between two planes on either side of the plane of focus. This area is called the depth of field. To overcome this limitation, two or more images of the same scene are captured from the same point of view with different focus settings, and these images are then combined to create a single composite image that provides the desired depth of field.
An effective multi-focus image fusion method must meet the following requirements. Firstly, it should preserve all relevant and salient information of the source images. Secondly, the fusion process should not produce artifacts. Finally, it should be robust to imperfections such as noise and misregistration [1].

Recently, a large number of multi-focus image fusion methods have been presented. These methods can be classified into two classes: transform-domain and spatial-domain methods. In the transform-domain methods, transform coefficients are obtained by first applying a multi-scale transform to the source images. Then, these coefficients are fused according to particular fusion strategies. Finally, the fused image is obtained by performing an inverse transform on the fused coefficients. Many kinds of transforms have been used for image fusion, such as the Laplacian Pyramid [5] and the Discrete Wavelet Transform (DWT) [6]. Another wavelet-based approach has been proposed in [7], which uses the variance and spatial frequency of the wavelet coefficients to produce the fused coefficients. More recently developed transforms, such as the Nonsubsampled Shearlet Transform [8] and the Discrete Cosine Transform (DCT) [9], have also been employed for multi-focus image fusion. The main drawback of the transform-domain methods is that applying a transform prior to the fusion process modifies the original pixel values. Since the fusion is carried out on the transform coefficients, the original pixel values cannot be preserved, which may result in loss of information and in color and brightness distortions in the fused image [10].

By contrast, spatial-domain methods fuse the source images directly by using their spatial features. Spatial-domain techniques can be classified into two categories: region-based and pixel-based techniques. In the former category, the source images are initially segmented into regions or fixed-size blocks. Then, the corresponding regions are compared using a sharpness criterion to determine which regions are sharp. Finally, the sharpest regions are selected to form the fused image. If any of the regions contains the projections of objects located at different distances from the plane of focus, these regions will be imaged partially blurred, and the fused image will also contain blurred parts. Therefore, in region-based methods, the prior segmentation algorithm plays a crucial role. Well-known region-based methods include the block-based method [11] and the block-based method using a quadtree approach [12]. Similarly, the method based on ensemble individual features [13] first divides the images into blocks; it then calculates different features extracted from both the spatial and transform domains of each block and constructs the fused image by selecting blocks in a winner-takes-all manner. In addition, several optimization-based methods have been proposed in the literature to enhance the performance of the region-based methods [14]. The main principle of the pixel-based methods is to select the pixels with the maximum sharpness value, e.g. the PCNN-based [15] and dense scale invariant feature transform (DSIFT) based [16] methods. In these methods, a sharpness criterion is computed for each pixel by taking into account a particular neighborhood around that pixel. Then, the fused image is obtained by transferring the sharpest pixels.
On the other hand, some spatial-domain methods create the fused image by using a weighted average of the corresponding pixel values [1,17]. However, averaging pixel values often causes halo artifacts, reduced contrast and reduced sharpness. Recently, a pixel-based method that provides satisfactory fusion performance compared to the classical ones was introduced [18]. In this method, the point spread function (PSF) of each source image is first estimated analytically; the sharp pixels are then detected by using the estimated PSFs. In order to estimate the PSFs, the source images are segmented into fixed-size blocks and the sharpness values of these blocks are computed with a predefined sharpness criterion. Finally, the PSFs of the source images are obtained by averaging the PSFs calculated over the completely sharp and blurred block pairs. Computing the PSFs in this way requires entirely blurred and entirely sharp regions. However, in multi-focus image fusion applications, the source images are not entirely blurred or sharp but consist of both kinds of regions, and detecting entirely blurred and sharp region pairs over which the PSFs can be computed is a challenging task. Besides, each blurred object in the images may have a different degree of blur, since the distance between the objects and the plane of focus can differ; in this situation, each object could have a different PSF. Considering this, using an optimal PSF instead of an average PSF for each image is more reasonable.

Based on the aforementioned analysis and motivations, we propose an efficient pixel-based multi-focus image fusion method using the Levenberg–Marquardt algorithm. The Levenberg–Marquardt algorithm is utilized to determine the PSF of each source image. Then, the sharp pixels of the source images are detected by using the estimated PSFs to create the fusion map. Some post-processing operations are carried out on the fusion map to enhance the fusion performance. Finally, the fused image is obtained by transferring the detected sharp pixels. The main contributions of the paper are summarized below: (1) An optimal defocus estimation method is proposed to estimate the defocus blur of multiple focused input images. (2) An improved and efficient algorithm that uses the estimated defocus blur of the input images is presented to fuse the multiple focused images. (3) Several experiments are conducted on real multi-focus image sets; these experiments demonstrate the performance of the proposed technique and provide comparisons with previous methods.

The remainder of the paper is organized as follows. In Section 2, the proposed method is explained in detail. In Section 3, experimental results and discussions are given. Section 4 concludes the paper.
2. Fusion of multi-focus images using optimal PSFs

In this section, an efficient multi-focus image fusion method is proposed. First, the optimal PSFs of the source images are estimated by making use of the Levenberg–Marquardt optimization algorithm. Then, the fused image is produced from the source images using the estimated PSFs.

2.1. Theory of defocus

The imaging process suffers from spatially variant out-of-focus blur. A defocused image of an object can be modeled as the convolution of a PSF with the focused image of the same object [19]. Let I(x, y) represent an observed defocused image, with x and y indicating the coordinates of a pixel, and let S(x, y) be the corresponding ideal all-in-focus image. I(x, y) and S(x, y) are assumed to be related by:
I(x, y) = S(x, y) ⊗ h(x, y, σ)    (1)

where ⊗ denotes the 2D convolution operator and h(x, y, σ) represents the PSF. The PSF of an imaging system defines the shape of the blur formed when a point source is imaged. As discussed in [20], the PSF can be approximately modeled by a 2D Gaussian function as follows:
h(x, y, σ) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))    (2)

where σ is the spread parameter.
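For illustration, a minimal Python sketch of this defocus model is given below; it constructs the Gaussian PSF of Eq. (2) and applies the convolution of Eq. (1). The function names and the use of NumPy/SciPy are our own choices rather than part of an official implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_psf(sigma, radius=None):
    """2D Gaussian PSF of Eq. (2), normalized so that its samples sum to one."""
    if radius is None:
        radius = max(1, int(np.ceil(3 * sigma)))   # truncate at about 3 sigma
    x = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(x, x)
    h = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return h / h.sum()

def defocus(sharp, sigma):
    """Defocus model of Eq. (1): convolve the sharp image with the PSF."""
    return fftconvolve(sharp, gaussian_psf(sigma), mode="same")
```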
2.2. The Levenberg–Marquardt algorithm

The Levenberg–Marquardt (LM) algorithm is an iterative optimization method for minimizing non-linear real-valued functions expressed as a sum of squares. It has become a standard method in non-linear optimization and is widely used in a broad range of applications. In this section, a brief description of the LM algorithm is given; a detailed analysis can be found in [21]. Given a set of m measured data points (αi, γi) and a vector function f: Rⁿ → Rᵐ with m ≥ n, the LM algorithm aims to find the minimum of:
F(β) = (1/2) Σ_{i=1}^{m} (γi − f(αi, β))²    (3)
At each iteration of the algorithm, the parameter vector β is updated by a new estimate β + δ. The LM algorithm is based on a linear approximation to f in the neighborhood of β; the Taylor series expansion can be written as:
f(αi, β + δ) ≈ f(αi, β) + Jδ    (4)

where J is the Jacobian of f, J = ∂f(αi, β)/∂β.
In each iteration, the update δ is determined by solving the following equation:

(J^T J + μI) δ = J^T (γ − f(β))    (5)

where I is the identity matrix and μ is the non-negative damping parameter. The original algorithm proposed by Levenberg has the drawback that, when the damping parameter μ is large, the curvature information contained in J^T J is effectively not used. Hence, Marquardt replaced the identity matrix I with the diagonal matrix consisting of the diagonal elements of J^T J. Based on this, Eq. (5) can be rewritten as follows:
(J^T J + μ diag(J^T J)) δ = J^T (γ − f(β))    (6)
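A minimal sketch of the LM iteration, assuming a finite-difference Jacobian and the usual accept/reject rule for adapting μ, is given below; it illustrates the generic update of Eq. (6) and is not the implementation used in the experiments.

```python
import numpy as np

def levenberg_marquardt(f, alpha, gamma, beta0, mu=1e-3, iters=50):
    """Minimize F(beta) = 0.5 * sum((gamma - f(alpha, beta))**2) using Eq. (6)."""
    beta = np.asarray(beta0, dtype=float)
    gamma = np.asarray(gamma, dtype=float)

    def residual(b):
        return gamma - f(alpha, b)

    def jacobian(b, eps=1e-6):
        # Finite-difference Jacobian of f with respect to the parameters beta.
        J = np.empty((gamma.size, beta.size))
        for k in range(beta.size):
            step = np.zeros_like(b)
            step[k] = eps
            J[:, k] = (f(alpha, b + step) - f(alpha, b - step)) / (2.0 * eps)
        return J

    for _ in range(iters):
        r = residual(beta)
        J = jacobian(beta)
        A = J.T @ J
        # Eq. (6): (J^T J + mu * diag(J^T J)) delta = J^T (gamma - f(beta))
        delta = np.linalg.solve(A + mu * np.diag(np.diag(A)), J.T @ r)
        if np.sum(residual(beta + delta) ** 2) < np.sum(r ** 2):
            beta, mu = beta + delta, mu * 0.5   # accept the step, relax damping
        else:
            mu *= 2.0                           # reject the step, increase damping
    return beta
```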
2.3. Optimal defocus estimation on multi-focus images

In this section, we present a method that utilizes the Levenberg–Marquardt algorithm to estimate the defocus blur of the source images. In order to make the theory clearer, a linear imaging model is used in this paper [22]. The linear imaging model, in which two acquired images are expressed as linear combinations of the foreground and background layers, is illustrated in Fig. 1. Consider a scene that consists of two objects located at different distances: the foreground (f) and the background (g) objects. Due to the limited DoF, two differently focused images of the scene need to be acquired from an identical point of view to obtain an all-in-focus image of the scene. Assume that two images of the scene are acquired such that, in each image, only one of the layers is in focus. The first one is the near-focused image (I1), in which the foreground is in focus (sharp) and the background is out of focus (blurred). The other one is the far-focused image (I2), in which the background is in focus and the foreground is out of focus. As mentioned above, a defocused image can be expressed as a convolution with a PSF. Hence, the acquired images can be modeled as linear combinations of the foreground and background layers.
Fig. 1. Image formation model of acquired and fused images by linear combination of background and foreground layers.
The near-focused image (I1) can be formed as the combination of the in-focus foreground and the convolution of the background with the PSF h1. Similarly, the far-focused image (I2) can be expressed as the foreground blurred by the PSF h2 plus the in-focus background, as follows:
I1(x, y) = f(x, y) + (h1 ⊗ g(x, y))
I2(x, y) = (h2 ⊗ f(x, y)) + g(x, y)    (7)
where h1 and h2 denote the PSFs of the source images. If the convolution of the foreground with h2 is denoted as fb and the convolution of the background with h1 is denoted as gb, then Eq. (7) can be abbreviated as:
I1(x, y) = f(x, y) + gb(x, y)
I2(x, y) = fb(x, y) + g(x, y)    (8)
The aim of using this model is to obtain an estimate of the blur on each layer when the camera is focused on the other one. A schematic diagram of the proposed defocus estimation method is illustrated in Fig. 2. Consider the images I1 and I2, in which the f layer and the g layer, respectively, are in focus while the other layer is out of focus. If each image is convolved with the PSF of the other one, the amounts of blur in the corresponding regions become the same. In other words, if the near-focused image (I1) is convolved with h2, the amount of blur in the f layers of the blurred I1 and the original I2 becomes the same. If the far-focused image (I2) is convolved with h1, the g layers of the blurred I2 and the original I1 will have the same amount of blur. Based on this, the artificially blurred versions of the source images, I1* and I2*, are obtained as follows:
I1*(x, y) = fb(x, y) + (h2 ⊗ gb(x, y))
I2*(x, y) = (h1 ⊗ fb(x, y)) + gb(x, y)    (9)

In I1 and I2*, the background is blurred by the same degree; likewise, the foreground regions of I2 and I1* are blurred by the same degree. Hence, if I2* is subtracted from I1, the background region cancels out; similarly, if I1* is subtracted from I2, the foreground region cancels out. By subtracting I2* from I1 and I1* from I2, the difference images D1 and D2 are computed, respectively.
D1(x, y) = f(x, y) − (h1 ⊗ fb(x, y))
D2(x, y) = g(x, y) − (h2 ⊗ gb(x, y))    (10)
If the ideal PSFs of the source images are used, the element-wise product of the difference images has to be zero. Thus, in the proposed method, the optimization algorithm minimizes the sum of the element-wise products of the difference images. The objective function is given in the following equation:
z=
M N
D1 (x, y ) ◦ D2 (x, y )
(11)
x=1 y=1
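The estimation scheme of Fig. 2 can be sketched as follows: each observed image is blurred with the candidate PSF of the other one, the difference images are formed, and SciPy's Levenberg–Marquardt solver searches over (σ1, σ2). Treating the per-pixel products of Eq. (11) as residuals (so that their sum of squares is minimized), using gaussian_filter in place of an explicit PSF convolution, and taking the absolute value of the trial σ values are our own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import least_squares

def defocus_residuals(sigmas, I1, I2):
    """Per-pixel products D1 * D2 of Eq. (11) for a candidate (sigma1, sigma2)."""
    s1, s2 = np.abs(sigmas)                  # guard against non-positive trial values
    I1_blur = gaussian_filter(I1, s2)        # I1 blurred with h(sigma2) -> I1*
    I2_blur = gaussian_filter(I2, s1)        # I2 blurred with h(sigma1) -> I2*
    D1 = I1 - I2_blur                        # difference images
    D2 = I2 - I1_blur
    return (D1 * D2).ravel()                 # residual vector for the LM solver

def estimate_spread_parameters(I1, I2, init=(1.0, 1.0)):
    """Estimate (sigma1, sigma2) with SciPy's Levenberg-Marquardt solver."""
    res = least_squares(defocus_residuals, x0=np.asarray(init, dtype=float),
                        args=(I1, I2), method="lm", max_nfev=100)
    return np.abs(res.x)
```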
Fig. 2. Optimal PSF estimation scheme.
Finally, the Levenberg–Marquardt algorithm is utilized to compute the optimal PSFs (ĥ1, ĥ2) iteratively. As discussed in Section 2.1, the PSF can be modeled as a 2D Gaussian function and, as can be seen in Eq. (2), the only unknown parameter of this function is the spread parameter σ. Thus, the Levenberg–Marquardt algorithm is employed to estimate the spread parameters σ1 and σ2 by minimizing Eq. (11). Substituting the estimated σ1 and σ2 for σ in Eq. (2) yields the optimal PSFs.

2.4. Fusing the multi-focus images using optimal PSFs

After estimating the PSFs of the source images, the fusion map is constructed by making use of the estimated PSFs (ĥ1, ĥ2). Firstly, each source image is convolved with the estimated PSF of the other image. Thus, the artificially blurred images are obtained as in the following equations:
I1*(x, y) = I1(x, y) ⊗ ĥ2
I2*(x, y) = I2(x, y) ⊗ ĥ1    (12)
Each artificially blurred image is produced by blurring one of the source images with the estimated blur kernel of the other image. Hence, in the regions that are blurred in a given source image, the artificially blurred version of the other source image carries the same amount of blur. For example, the intensity values of the background object have to be equal in I1 and I2*. Based on this, the difference images are calculated by subtracting the artificially blurred version of the other image from each source image, as follows:
D1(x, y) = I1(x, y) − I2*(x, y)
D2(x, y) = I2(x, y) − I1*(x, y)    (13)
If the input images contain high variation around each pixel, the fusion map can be calculated by comparing the difference images pixel by pixel. To make the method more robust for images with low visual variation, the fusion map at (x, y) is determined by considering the sums of the difference values in a local window Ω(x, y) around (x, y):
M(x, y) = l1,  if Σ_{(i,j)∈Ω(x,y)} D1(i, j) ≥ Σ_{(i,j)∈Ω(x,y)} D2(i, j);  l2, otherwise    (14)
where M is the fusion map and l1 and l2 are the labels that indicate whether the pixel is to be transferred from I1 or I2, respectively.
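A sketch of the fusion map construction of Eqs. (12)–(14) is given below; the window sums are realized with a box filter, and the 9 × 9 window size and the use of absolute differences are our own choices, since they are not fixed in the text above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def fusion_map(I1, I2, sigma1, sigma2, window=9):
    """Boolean map of Eq. (14): True where the pixel is taken from I1 (label l1)."""
    I2_blur = gaussian_filter(I2, sigma1)    # Eq. (12): I2 blurred with h1_hat
    I1_blur = gaussian_filter(I1, sigma2)    # Eq. (12): I1 blurred with h2_hat
    D1 = np.abs(I1 - I2_blur)                # Eq. (13); absolute values are our choice
    D2 = np.abs(I2 - I1_blur)
    # Eq. (14): compare window sums; the box mean is equivalent for equal-size windows.
    return uniform_filter(D1, size=window) >= uniform_filter(D2, size=window)

def fuse(I1, I2, M):
    """Build the fused image by transferring the pixels selected by the map."""
    return np.where(M, I1, I2)
```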
Fig. 3. Kids image set and fusion maps (a and b) source images (c) initial fusion map (d) fusion map obtained by morphological filtering (e) final fused image obtained by the proposed method.
Although the proposed defocus estimation and image fusion methods are efficient, smooth and homogeneous regions of the focused objects may, in some cases, not be identified as in focus. This can cause small holes or thin protrusions to appear in the fusion map. In order to overcome this problem and improve the fusion quality, morphological operators are applied to the fusion map. Firstly, the opening operator is applied, which is simply an erosion of the map followed by a dilation of the result with the same structuring element; it removes thin protrusions and opens up gaps between objects that are connected by a thin bridge of pixels. Then the closing operation, a dilation followed by an erosion, is utilized to fill tiny holes in the fusion map [23]. The final fusion map M′ is obtained by applying the opening and closing operations with the same structuring element successively as follows:
M′ = (M ∘ B) • B    (15)

where B represents the structuring element; its shape is a controllable parameter of the operation and is selected as a 7 × 7 disk in this paper.
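Assuming the binary fusion map from the previous sketch, the opening-then-closing of Eq. (15) with a 7 × 7 disk could be realized as follows; SciPy's binary morphology is our choice of tooling.

```python
import numpy as np
from scipy.ndimage import binary_opening, binary_closing

def disk(radius=3):
    """Disk-shaped structuring element; radius 3 gives the 7 x 7 disk used here."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x ** 2 + y ** 2 <= radius ** 2

def refine_map(M, radius=3):
    """Eq. (15): opening followed by closing with the same structuring element B."""
    B = disk(radius)
    return binary_closing(binary_opening(M, structure=B), structure=B)
```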
An example of the morphological filtering process is given in Fig. 3 for the Kids image set.

3. Experimental results

This section reports the experimental results obtained by the proposed method and evaluates the fused images quantitatively and perceptually.

3.1. Experimental setup

In order to demonstrate the effectiveness of the proposed image fusion method, it is compared with seven state-of-the-art fusion methods: the Discrete Wavelet Transform (DWT) method [6], the Discrete Cosine Transform (DCT) method [9], the block-based (BB) spatial-domain method [11], the cross bilateral filter (CBF) based method [17], the multi-scale weighted gradient-based fusion (MWGF) method [24], the dense scale-invariant feature transform (DSIFT) based method [16] and the analytical defocus estimation (ADE) based pixel-wise spatial-domain method [18]. The parameters of the methods are chosen as follows: for the DWT method, the decomposition level is set to 4 and the 'sym8' filter is selected; for the BB method, a 16 × 16 block size and the spatial frequency (SF) sharpness criterion are selected; for the ADE method, the block size is selected as 8 × 8 and the robustness constant r is set to 0.75. The parameters of the other methods (DCT, CBF, MWGF, and DSIFT) are set to the default values recommended in the respective publications.

3.2. Objective quality metrics

In order to assess the fusion performance of the proposed and the other well-known methods objectively, three fusion quality metrics are chosen: mutual information (QNMI) [25], the similarity-based metric (QSSIM) [26] and quality of edge (QE) [27].

Normalized Mutual Information Metric (QNMI): The normalized mutual information metric represents how much information is obtained from the source images [25]. Mutual information can be defined as:
MI(I1, I2) = H(I1) + H(I2) − H(I1, I2)    (16)
where H(I1, I2) is the joint entropy of I1 and I2, and H(I1) and H(I2) are the marginal entropies of I1 and I2, respectively. The normalized mutual information fusion metric can be expressed as:
QNMI(I1, I2|IF) = 2 [ MI(I1, IF) / (H(I1) + H(IF)) + MI(I2, IF) / (H(I2) + H(IF)) ]    (17)
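For illustration only, the metric of Eqs. (16) and (17) can be estimated from grey-level histograms as in the following sketch; the number of bins and the histogram-based entropy estimates are our own choices.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a normalized histogram."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(A, B, bins=256):
    """MI of Eq. (16), computed from the joint grey-level histogram of A and B."""
    joint, _, _ = np.histogram2d(A.ravel(), B.ravel(), bins=bins)
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(pa) + entropy(pb) - entropy(joint.ravel())

def q_nmi(I1, I2, IF, bins=256):
    """Normalized mutual information metric of Eq. (17) (binning details glossed over)."""
    h1, h2, hf = (entropy(np.histogram(x.ravel(), bins=bins)[0] / x.size)
                  for x in (I1, I2, IF))
    return 2 * (mutual_information(I1, IF, bins) / (h1 + hf) +
                mutual_information(I2, IF, bins) / (h2 + hf))
```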
Fig. 4. Synthetically produced Toycars image set (a and b) source images (σ1 = 4, σ2 = 4.5) (c) reference image.
Structural Similarity Based Metric (QSSIM): A metric based on structural similarity (SSIM) was proposed by Yang et al. [26]. It can be expressed as follows:
QSSIM = λ(w) SSIM(I1, IF|w) + (1 − λ(w)) SSIM(I2, IF|w),   if SSIM(I1, I2|w) ≥ 0.75
QSSIM = max(SSIM(I1, IF|w), SSIM(I2, IF|w)),   if SSIM(I1, I2|w) < 0.75    (18)
where w is the sliding window, whose size is selected as 7 × 7, I1 and I2 are the source images and IF is the fused image. λ(w) denotes the weight factor and is calculated as follows:
λ(w) = s(I1|w) / (s(I1|w) + s(I2|w))    (19)

where s(I1|w) and s(I2|w) are the variances of the window w of I1 and I2, respectively.
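A rough sketch of this metric is given below; scikit-image's SSIM map stands in for an explicit sliding-window SSIM and the per-window values are averaged, which is our approximation rather than the reference implementation of [26].

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.metrics import structural_similarity

def q_ssim(I1, I2, IF, win=7):
    """Per-window combination of Eq. (18), averaged over all windows.
    Grayscale float images in [0, 1] are assumed (hence data_range=1.0)."""
    _, s1f = structural_similarity(I1, IF, win_size=win, data_range=1.0, full=True)
    _, s2f = structural_similarity(I2, IF, win_size=win, data_range=1.0, full=True)
    _, s12 = structural_similarity(I1, I2, win_size=win, data_range=1.0, full=True)
    # Local variances for the weight lambda(w) of Eq. (19).
    v1 = uniform_filter(I1 ** 2, win) - uniform_filter(I1, win) ** 2
    v2 = uniform_filter(I2 ** 2, win) - uniform_filter(I2, win) ** 2
    lam = v1 / (v1 + v2 + 1e-12)
    q = np.where(s12 >= 0.75,
                 lam * s1f + (1 - lam) * s2f,
                 np.maximum(s1f, s2f))
    return q.mean()
```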
Quality of Edge Metric (QE): QE evaluates the amount of edge or gradient information that is transferred to the fused image [27]. It is formulated as follows:
QE(I1, I2|IF) = [ Σ_{x=1}^{X} Σ_{y=1}^{Y} ( Q^{I1,IF}(x, y) w1(x, y) + Q^{I2,IF}(x, y) w2(x, y) ) ] / [ Σ_{x=1}^{X} Σ_{y=1}^{Y} ( w1(x, y) + w2(x, y) ) ]    (20)
where Q^{I1,IF}(x, y) = Q_g^{I1,IF}(x, y) Q_α^{I1,IF}(x, y); Q_g^{I1,IF} and Q_α^{I1,IF} are the edge strength and orientation preservation values, respectively, and Q^{I2,IF} is defined similarly. w1 and w2 weight Q^{I1,IF} and Q^{I2,IF}, respectively. The parameter values of the QE metric are selected as suggested in [27].

3.3. Analysis of the defocus estimation method

In this section, experiments are performed on artificially produced multi-focus image sets to evaluate the proposed defocus estimation method. The proposed defocus estimation method utilizes optimization algorithms to estimate the PSFs of the source images, as detailed in Section 2.3. In the experiments, three popular optimization algorithms are compared: the Genetic Algorithm (GA) [28], the Artificial Bee Colony (ABC) algorithm [29] and the Levenberg–Marquardt (LM) algorithm. For both the GA and the ABC algorithm, the population size is chosen as 20 and the maximum number of generations as 100. For the GA, the crossover rate is set to 0.3 and the mutation rate to 0.1. For the ABC algorithm, the limit parameter is chosen as 20. The LM algorithm runs for 7 iterations for all spread parameter combinations.

The Toycars image set (300 × 300) is synthetically produced based on the linear imaging model given in Eq. (7) and is shown in Fig. 4. The performance of the defocus estimation method is evaluated by using synthetically produced image pairs with predefined spread parameters (σ1, σ2), as given in Table 1. The left columns of the table indicate the real values of the spread parameters; the estimated spread parameters and the computing times of the LM, GA and ABC algorithms follow. For both GA and ABC, the table shows the mean and standard deviation (given in parentheses) obtained over 30 independent runs. It can be seen from the table that LM estimates the spread parameters slightly more accurately than the other algorithms. On the other hand, the results demonstrate that the spread parameters can be calculated with high accuracy by all the optimization methods applied in this work. Table 1 also demonstrates that the LM algorithm is much faster than the other optimization algorithms.

3.4. Experimental results and discussions

The experiments are conducted on six multi-focus image sets: Pepsi (256 × 256), Lab (320 × 240), Watch (256 × 256), Leaf (256 × 256), Kids (520 × 520) [30] and Hydrant (640 × 480). The first four image sets are grayscale and the last two are color images. The source images are illustrated in Fig. 5.
Fig. 5. Multi-focus source images.
Fig. 6. Fusion results of the tested fusion methods for the Pepsi image set.
Table 1. Comparison of the accuracy and computing times of the different optimization methods (GA and ABC values: mean (standard deviation) over 30 independent runs).

| Real σ1 | Real σ2 | LM σ1 | LM σ2 | LM Time/s | GA σ1 | GA σ2 | GA Time/s | ABC σ1 | ABC σ2 | ABC Time/s |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 1.5 | 0.0830 | 1.4947 | 0.50 | 1.0968 (0.0386) | 1.5005 (0.0328) | 19.96 (2.31) | 1.0829 (0.0001) | 1.4947 (0.0001) | 16.81 (2.35) |
| 2.0 | 2.5 | 1.9419 | 2.4544 | 0.72 | 1.9261 (0.0523) | 2.4445 (0.0432) | 29.41 (3.49) | 1.9422 (0.0001) | 2.4441 (0.0002) | 28.92 (1.39) |
| 3.0 | 3.5 | 2.9891 | 3.3984 | 1.06 | 2.9896 (0.0356) | 3.3973 (0.0234) | 36.90 (4.01) | 2.9816 (0.0003) | 3.3985 (0.0002) | 40.58 (2.33) |
| 4.0 | 4.5 | 4.0764 | 4.4973 | 1.12 | 4.0610 (0.0438) | 4.5091 (0.0254) | 37.02 (3.81) | 4.0759 (0.0003) | 4.4973 (0.0005) | 41.17 (2.34) |
Fig. 7. The magnified regions of the fused images for the Pepsi image set.
Fusion results of the Pepsi multi-focus image set obtained by the DWT, DCT, BB, CBF, MWGF, DSIFT, ADE and proposed methods are given in Fig. 6. When the fusion result of the BB method is examined, some artificially produced edges can be seen near the letter P on the can and near the edge of the table. Some distortions also occur near the letter P in the fused image of the DWT method, and some obvious errors can be seen on the card located in the background of the DWT result. For a clearer comparison, magnified versions of this region are given in Fig. 7. As can be seen in Fig. 7(a, d), the DWT and CBF methods generate shadow-like artifacts around the letters, and the CBF method also suffers from artifacts around the letter P on the can. Moreover, the edge of the table appears blurred in the fused images of the DCT (see Fig. 6(b)) and BB (see Fig. 6(c)) methods. The MWGF and DSIFT methods show better fusion performance than the methods mentioned above. On the other hand, the fused images of the ADE and the proposed method do not contain any significant artifacts.

Experimental results for the Watch image set are shown in Fig. 8. The Watch image pair contains small objects such as the crown and the push buttons of the watch, which makes it a challenging image set for the fusion methods. It is observed from the fused images of the DWT, DCT, BB, CBF and DSIFT methods (see Fig. 8(a–d, f)) that these methods produce obvious artifacts. To make the comparison clearer, the magnified regions of the fused images are presented in Fig. 9. These images also show that the DWT, DCT, BB, CBF and DSIFT methods cannot preserve the details of the watch. In particular, the crown and the push buttons of the watch are heavily distorted in the fused images of the DCT and BB methods, which reveals the basic disadvantage of the block-based methods: when the selected blocks cannot cover the small objects completely, the details cannot be integrated into the fused image. The details of the watch are well preserved in the fused image of the MWGF method; however, as can be seen from Fig. 8(e), the right side of the background is completely blurred in the MWGF result. On the contrary, the ADE and the proposed method preserve these details almost perfectly (see Fig. 8(g and h)).

Fig. 10 illustrates the fusion results for the Lab image set. If the source images of the Lab image set are examined carefully (see Fig. 5(b)), it can be observed that the head of the man has moved slightly between the two images. Due to this movement, the methods that suffer from shift variance produce artifacts around the head. In order to examine the results in more detail, magnified images of the head of the man are illustrated in Fig. 11. The artifacts near the head can easily be seen in the results of the DWT, DCT, CBF and DSIFT methods. Although the BB method gives a better result than the mentioned methods, it still produces some artifacts. On the other hand, there are no notable artifacts in the fused images of the MWGF, ADE and proposed methods (see Fig. 10(e, g and h)). This result shows that the proposed method does not suffer from shift variance.

Fusion results for the Leaf image set are given in Fig. 12. Since the Leaf image set has a complicated structure, it is hard to evaluate the fusion results visually. For a better visual evaluation, the difference images displayed in Fig. 13 are obtained by subtracting the far-focused input image from the fused images.
For an ideal fusion method, the background must cancel out in the difference image. It is observed from the figures that the DWT method cannot preserve the high-frequency details, which is seen more clearly in the difference image (see Fig. 13(a)). Besides, some blocking effects appear around the leaves in the results of the DCT and BB methods (see Fig. 13(b and c)). These two methods divide the images into blocks prior to the decision stage, and when the blocks cannot cover the objects completely, such blocking effects appear in the fused image.
Fig. 8. Fusion results of the tested fusion methods for the Watch image set.
Fig. 9. The magnified regions of the fused images for the Watch image set.
The CBF method constructs the fused image by taking a weighted average of the input images, so it does not preserve the original pixel values; this can be seen in Fig. 13(d). Some ringing effects around the leaves can also be seen in the fused images of the DSIFT and ADE methods (see Fig. 12(f and g)); these artifacts are also visible in the difference images (see Fig. 13(f and g)). Besides, from Fig. 12(e), it can clearly be seen that the leaves on the right side of the image are blurred in the fused image of the MWGF method, and the difference image of the MWGF method (see Fig. 13(e)) shows that these leaves are transferred from the wrong source image. On the contrary, the fused image of the proposed method contains all the useful information of the source images. Furthermore, it can be observed in Fig. 13(h) that the proposed method produces the closest result to the ideal fused image.

Fig. 14 shows the fusion results of the tested methods for the colored Kids image set. From Fig. 14, it can easily be seen that the DWT, CBF and DSIFT methods suffer from poor contrast (see Fig. 14(a, d and f)). Some blocking artifacts are also visible around the head of the kid in the fused images of the DCT and BB methods (see Fig. 14(b and c)).
Fig. 10. Fusion results of the tested fusion methods for the Lab image set.
Fig. 11. The magnified regions of the fused images for the Lab image set.
On the other hand, it is obvious that the proposed and MWGF methods provide better fusion performance than the other methods for the Kids image set.

The last experiment is performed on the Hydrant image set. The fused images of the Hydrant set are given in Fig. 15. As can be seen in the figure, grasses and flowers generate a complex pattern in the foreground of the image. Besides, the Hydrant image set is not perfectly registered because of the wind. To make the visual evaluation clearer, difference images are illustrated in Fig. 16. It is observed from the difference images of the DWT, DCT, BB, CBF and DSIFT methods that these methods cannot produce an accurate fused image due to the complex pattern of the source images. The silhouette of the foreground can easily be seen in the difference images of the transform-domain (DWT and DSIFT) and averaging-based (CBF) methods. As mentioned above, the transform-domain methods suffer from shift variance and also do not preserve the original pixel values; hence, these methods cannot produce satisfying fusion results. On the other hand, the blocking effect around the border between the grasses and the background can be observed in the difference images of the DCT and BB methods (see Fig. 16(b and c)). The fused image of the MWGF method is better than those of the above-mentioned methods; however, as can be seen from Fig. 16(e), one of the flowers (second from the left) is completely blurred in the fused image of the MWGF method. The ADE method produces better results than the other methods, but it contains some artifacts around the hydrant and the trees. On the contrary, the color and the detail information are well preserved in the fused image of the proposed method.
Fig. 12. Fusion results of the tested fusion methods for the Leaf image set.
Fig. 13. The difference images for the Leaf image set. Difference images are obtained by subtracting far focused input image from the fused image of the corresponding methods.
It can also be seen from Fig. 16(g) that there are no notable artifacts in the foreground of the difference image.

Overall, the visual comparison can be summarized as follows. The DWT method often fails to produce all-in-focus images, especially for imperfectly registered images. The BB and DCT methods mostly preserve the relevant information of the source images, but they also yield severe blocking artifacts. The CBF and DSIFT methods produce low-contrast fused images and, like the transform-based methods, cannot preserve the original pixel values. The MWGF method mostly produces better results than the other methods; however, in some cases, such as the Leaf and Hydrant image sets, it cannot preserve all of the focused regions. Compared with the other methods, ADE generally generates better results. However, it estimates the defocus information of the input images by averaging PSFs that are calculated for different regions of the input images; thus, in some cases, the estimated PSFs may not be suitable for every region, and ADE can make an incorrect PSF estimation and introduce artifacts into the fused image. In contrast, the proposed method successfully realizes the fusion of multi-focus images without artifacts and yields more informative images.

Finally, the quantitative evaluation of the compared methods is given in Table 2 in terms of the QNMI, QSSIM and QE quality metrics. The best value of each row is highlighted in bold. From the table it can be seen, for the Pepsi image set, that the DCT method produces the best values in terms of the QNMI and QE metrics, while the proposed method gives the second best for both of these metrics. The higher the QNMI value, the more information is transferred from the source images to the fused image.
Fig. 14. Fusion results of the tested fusion methods for the Kids image set.
Fig. 15. Fusion results of the tested fusion methods for the Hydrant image set.
However, QNMI produces a high value whether or not the transferred information is useful; in other words, if the fused image is very similar to one of the source images, it will also produce a high QNMI value. Furthermore, QE measures the amount of edge information integrated from the source images, and in some instances artificially created edges can cause QE to yield higher values. Therefore, the quality metrics should be considered together to evaluate the fusion performances. On the other hand, the proposed method gives the best quality index in terms of the QSSIM metric. For the Watch image set, the proposed method produces the highest results in terms of QNMI and QSSIM; for QE, the best result belongs to the CBF method. However, if Figs. 8(d) and 9(d) are examined, obvious artifacts can be seen. For the Lab and Leaf image sets, the proposed method gives the highest quality values. According to the quantitative results obtained for the Watch and Lab images, the ADE and DSIFT methods follow the proposed method. For the Leaf image, the MWGF method produces better results than the other methods in terms of the QSSIM metric; however, Fig. 12(e) shows that a whole leaf is blurred in the fused image of the MWGF method. For the Hydrant image set, the ADE method follows the proposed method. Besides, it should be noted that the DWT method produces the worst results for all image sets.
Fig. 16. The difference images for the Hydrant image set. Difference images are obtained by subtracting near focused input image from the fused image of the corresponding methods.

Table 2. Results of the image metrics with respect to the fusion methods (the best value of each row is in bold).

| Source images | Index | DWT | DCT | BB | CBF | MWGF | DSIFT | ADE | Proposed |
|---|---|---|---|---|---|---|---|---|---|
| Pepsi | QNMI | 0.9419 | **1.3172** | 1.2771 | 1.1159 | 1.1891 | 1.3083 | 1.3020 | 1.3087 |
| | QSSIM | 0.9750 | 0.9795 | 0.9791 | 0.9830 | 0.9786 | 0.9799 | 0.9756 | **0.9832** |
| | QE | 0.7245 | **0.7684** | 0.7641 | 0.7547 | 0.7289 | 0.7657 | 0.7651 | 0.7664 |
| Watch | QNMI | 0.7851 | 1.1895 | 1.1924 | 0.9418 | 1.0739 | 1.1904 | 1.1934 | **1.1940** |
| | QSSIM | 0.9482 | 0.9756 | 0.9753 | 0.9750 | 0.9741 | 0.9760 | 0.9761 | **0.9763** |
| | QE | 0.7187 | 0.7645 | 0.7707 | **0.7829** | 0.7334 | 0.7714 | 0.7713 | 0.7714 |
| Lab | QNMI | 0.9336 | 1.2764 | 1.2716 | 1.1166 | 1.2286 | 1.2591 | 1.2754 | **1.2768** |
| | QSSIM | 0.9670 | 0.9809 | 0.9813 | **0.9819** | 0.9818 | 0.9808 | 0.9812 | **0.9819** |
| | QE | 0.6729 | 0.7446 | 0.7478 | 0.7339 | 0.7341 | 0.7466 | 0.7477 | **0.7479** |
| Leaf | QNMI | 0.5605 | 1.0136 | 1.0457 | 0.7563 | 0.9044 | 1.0379 | 1.0485 | **1.0495** |
| | QSSIM | 0.9478 | 0.9627 | 0.9822 | 0.9785 | **0.9883** | 0.9863 | 0.9845 | 0.9864 |
| | QE | 0.6555 | 0.6901 | 0.7308 | 0.7231 | 0.6191 | 0.7375 | 0.7352 | **0.7387** |
| Kids | QNMI | 0.7566 | 1.0939 | 1.0995 | 0.7926 | 1.0306 | 0.7924 | **1.1043** | 1.1015 |
| | QSSIM | 0.9720 | 0.9839 | 0.9824 | 0.9200 | 0.9843 | 0.9199 | 0.9842 | **0.9844** |
| | QE | 0.6931 | 0.7410 | 0.7412 | 0.5459 | 0.7435 | 0.5456 | 0.7442 | **0.7456** |
| Hydrant | QNMI | 0.7153 | 1.1300 | 1.1407 | 0.8283 | 1.0644 | 0.8292 | 1.1390 | **1.1430** |
| | QSSIM | 0.9506 | **0.9749** | 0.9742 | 0.9715 | 0.9744 | 0.9421 | 0.9744 | **0.9749** |
| | QE | 0.6783 | 0.7344 | 0.7343 | 0.7111 | 0.7334 | 0.7113 | 0.7349 | **0.7356** |
From the experiments it can be concluded that the quantitative results are consistent with the visual results and that the proposed method is an effective solution for multi-focus image fusion.

3.5. Comparison of computational efficiencies

In this section, the computational efficiencies of the methods are compared. The experiments are performed on a computer equipped with an Intel Core i5 1.70 GHz CPU. The average computing times of the different image fusion methods on two image pairs of size 256 × 256 pixels are compared in Table 3. It can be seen from the table that CBF is the most time-consuming method, while the DSIFT and MWGF methods are also less efficient than the others. On the other hand, the BB, DWT and DCT methods are the most efficient. The proposed method is also very efficient, requiring about 1 s for defocus estimation and fusion.

Table 3. Average running time of different methods on two image pairs of size 256 × 256 pixels.
| Method | DWT | DCT | BB | CBF | MWGF | DSIFT | ADE | Proposed |
|---|---|---|---|---|---|---|---|---|
| Time/s | 0.7038 | 0.7163 | 0.1156 | 18.0161 | 1.7308 | 5.5263 | 1.0907 | 1.1905 |
Fig. 17. Image fusion result with more than two images (a–c) near focused, middle focused and far focused images of the Camera image set, respectively (d) fused Camera image produced by the proposed method (e–g) near focused, middle focused and far focused images of the Oilpaint image set, respectively (h) fused Oilpaint image produced by the proposed method.
3.6. Fusion of color image sequences

The proposed method can also be applied to multi-focus image sets that contain more than two images. Fig. 17 shows the Camera and Oilpaint image sets, which have three source images each, and the fusion results of the proposed method for these image sets. As shown in Fig. 17(d), all the focused regions of the source images are transferred to the fused image without introducing any artifacts. Fig. 17(h) shows the fused image obtained by the proposed method for the Oilpaint image set; from this image it can be seen that the proposed method preserves the details of the source images. These examples reveal that the proposed method also works well with more than two source images.

4. Conclusion

In this paper, an efficient, pixel-based multi-focus image fusion method has been presented. In particular, the PSFs of the source images are first computed by utilizing the Levenberg–Marquardt algorithm. Then, each source image is blurred by the estimated PSF of the other image to produce artificially blurred images. Difference images are obtained by subtracting the artificially blurred images from the source images. The fusion map is produced by comparing the difference images, and morphological operators are then employed to improve it. Finally, the fused image is constructed by transferring the sharp pixels from the source images via the fusion map. Several experiments were conducted to evaluate the fusion performance of the proposed method. Both visual and quantitative evaluations demonstrated that the proposed method outperforms the state-of-the-art multi-focus image fusion methods.

Acknowledgment

This work is supported by the Research Fund of Erciyes University under Grant Number FDK-2015-5630.

References

[1] Li S, Kang X, Hu J. Image fusion with guided filtering. IEEE T Image Process 2013;22(7):2864–75.
[2] Aslantas V, Bendes E, Kurban R, Toprak AN. New optimised region-based multi-scale image fusion method for thermal and visible images. IET Image Process 2014;8(5):289–99.
[3] Qu Y, Yang H. Optical microscopy with flexible axial capabilities using a vari-focus liquid lens. J Microsc 2015;258(3):212–22.
[4] Sannen D, Van Brussel H. A multilevel information fusion approach for visual quality inspection. Inform Fusion 2012;13(1):48–59.
[5] Burt PJ, Kolczynski RJ. Enhanced image capture through fusion. In: Proceedings of the fourth international conference on computer vision; 1993.
[6] Pajares G, de la Cruz JM. A wavelet-based image fusion tutorial. Pattern Recognit 2004;37(9):1855–72.
[7] Abdipour M, Nooshyar M. Multi-focus image fusion using sharpness criteria for visual sensor networks in wavelet domain. Comput Electr Eng 2016;51:74–88.
[8] Gao GR, Xu LP, Feng DZ. Multi-focus image fusion based on non-subsampled shearlet transform. IET Image Process 2013;7(6):633–9.
[9] Haghighat MBA, Aghagolzadeh A, Seyedarabi H. Multi-focus image fusion for visual sensor networks in DCT domain. Comput Electr Eng 2011;37(5):789–97.
[10] Aslantas V, Kurban R. A comparison of criterion functions for fusion of multi-focus noisy images. Opt Commun 2009;282(16):3231–42.
[11] Li S, Kwok JT, Wang Y. Combination of images with diverse focuses using the spatial frequency. Inform Fusion 2001;2(3):169–76.
[12] De I, Chanda B. Multi-focus image fusion using a morphology-based focus measure in a quad-tree structure. Inform Fusion 2013;14(2):136–46.
[13] Kausar N, Majid A, Javed SG. A novel ensemble approach using individual features for multi-focus image fusion. Comput Electr Eng 2016;54:393–405.
[14] Aslantas V, Kurban R. Fusion of multi-focus images using differential evolution algorithm. Expert Syst Appl 2010;37(12):8861–70.
[15] Wang ZB, Ma YD, Gu J. Multi-focus image fusion using PCNN. Pattern Recognit 2010;43(6):2003–16.
[16] Liu Y, Liu S, Wang Z. Multi-focus image fusion with dense SIFT. Inform Fusion 2015;23:139–55.
[17] Shreyamsha Kumar BK. Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process 2015;9(5):1193–204.
[18] Aslantas V, Toprak AN. A pixel based multi-focus image fusion method. Opt Commun 2014;332:350–8.
[19] Aslantas V. A depth estimation algorithm with a single image. Opt Express 2007;15(8):5024–9.
[20] Chaudhuri S, Pentland A, Rajagopalan AN. Depth from defocus: a real aperture imaging approach. New York: Springer; 2012.
[21] Madsen K, Nielsen HB, Tingleff O. Methods for non-linear least squares problems; 2004.
[22] Kubota A, Aizawa K. Reconstructing arbitrarily focused images from two differently focused images using linear filters. IEEE T Image Process 2005;14(11):1848–59.
[23] Li H, Li L, Zhang J. Multi-focus image fusion based on sparse feature matrix decomposition and morphological filtering. Opt Commun 2015;342:1–11.
[24] Zhou Z, Li S, Wang B. Multi-scale weighted gradient-based fusion for multi-focus images. Inform Fusion 2014;20:60–72.
[25] Hossny M, Nahavandi S, Creighton D. Comments on 'Information measure for performance of image fusion'. Electron Lett 2008;44(18):1066–7.
[26] Yang C, Zhang J-Q, Wang X-R, Liu X. A novel similarity based quality metric for image fusion. Inform Fusion 2008;9(2):156–60.
[27] Xydeas CS, Petrovic V. Objective image fusion performance measure. Electron Lett 2000;36(4):308–9.
[28] Goldberg DE. Genetic algorithms in search, optimization and machine learning. Addison-Wesley Longman Publishing Co., Inc.; 1989.
[29] Karaboga D, Basturk B. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Global Optim 2007;39(3):459–71.
[30] Nejati M, Samavi S, Shirani S. Multi-focus image fusion using dictionary-based sparse representation. Inform Fusion 2015;25:72–84.
Veysel Aslantas received the B.Sc. degree in Electronics Engineering from Erciyes University, Turkey, in 1988 and the Ph.D. degree in Intelligent Systems from the University of Wales Cardiff, United Kingdom, in 1997. Since 1997, he has been with the Department of Computer Engineering, Erciyes University, where he is currently a Professor. His research interests include computer vision, image processing and intelligent optimization techniques.

Ahmet Nusret Toprak received the B.E. degree in Computer Engineering from Karadeniz Technical University, Turkey, in 2009, and the M.Sc. degree in Computer Engineering from Erciyes University, Turkey, in 2012. He is currently a research assistant and Ph.D. student at Erciyes University. His current research interests include image fusion and intelligent optimization techniques.