Neurocomputing 356 (2019) 119–130
Scale-invariant structure saliency selection for fast image fusion

Yixiong Liang a, Yuan Mao a, Jiazhi Xia a, Yao Xiang a, Jianfeng Liu b,∗

a School of Computer Science and Engineering, Central South University, Changsha 410083, China
b School of Automation, Central South University, Changsha 410083, China
∗ Corresponding author: Jianfeng Liu ([email protected]).
Article history: Received 29 October 2018; Revised 10 March 2019; Accepted 27 April 2019; Available online 10 May 2019. Communicated by Jungong Han.
Keywords: Scale-space theory; Difference-of-Gaussian scale space pyramid; Scale-invariant saliency map; Guided filtering
Abstract

In this paper, we present a fast yet effective method for pixel-level scale-invariant image fusion in the spatial domain based on scale space theory. Specifically, we propose a scale-invariant structure saliency selection scheme based on the difference-of-Gaussian (DoG) scale space pyramid of images to build the weights or activity map. Due to the scale-invariant structure saliency selection, our method can keep both the details of small size objects and the integrity information of large size objects in images. In addition, our method is very efficient and easy to implement, since no complex operations are involved. Experimental results demonstrate that compared to state-of-the-art image fusion methods, the proposed method yields competitive or even better results in terms of both visual quality and objective metrics, while being much faster, and it is therefore appropriate for high-resolution and high-throughput image fusion tasks. Code is available at https://github.com/yiqingmy/Fusion.
1. Introduction

Pixel-level image fusion intends to combine different images of the same scene into a single composite image which is more comprehensive and more informative for visual perception and processing [1,2], and is widely used in various image processing and computer vision applications. For instance, in medical imaging applications, multi-modal image fusion [3] tries to fuse images which have been acquired via different sensor modalities exhibiting diverse characteristics, for a more reliable and accurate medical diagnosis. In surveillance applications, visible-infrared image fusion [4] intends to fuse the appearance information and the thermal radiation from the visible and infrared images simultaneously. Another typical application is multi-focus image fusion [5–7]. Since the depth-of-field (DoF) of bright-field microscopy is only about 1–2 μm while the specimen's profile covers a much larger range, the parts of the specimen that lie outside the object plane are blurred. Multi-focus image fusion can obtain an all-in-focus image from multiple images of the identical viewpoint taken at different distances between the object and the lens.

According to intuitive visual perception, a good image fusion method should have the following properties. First, it preserves both the details of small size objects and the integrity information of large size objects in the fused image, even in the case
where the size of the objects of interest varies greatly. For example, cervical cell images from the microscope contain both small isolated cells and large agglomerates, which are both useful for cervical cytology [8]. Second, it should be efficient enough to handle large-scale data. For instance, whole slide scanning in digital cytopathology [9] needs to process thousands of fields of view (FoV) in an acceptable time, which requires fusing a series of high-resolution images captured at each FoV in a very efficient way. Third, it should not produce visual artifacts.

In past decades, image fusion has been studied extensively and various image fusion techniques have been proposed in the literature, which can be roughly classified into two categories [1]: transform domain methods and spatial domain methods. The transform domain methods [2] are mainly based on the "decomposition-fusion-reconstruction" framework, which first transforms each source image into a new domain by some tool such as multi-scale decomposition (MSD) [6,10–12], sparse coding [13–15] or other transformations like principal component analysis (PCA) [16] and convolutional neural networks (CNN) [4,17], then constructs a composite representation in the transform domain with specific fusion rules, and finally applies the inverse transform to the composite representation to obtain the fused image. However, the fused images of these methods often suffer from global defects [18,19]. Moreover, they are often not efficient due to the complicated decomposition and inverse reconstruction. In contrast to the transform domain methods, the spatial domain methods [2] take each pixel in the fused image as a weighted average of the corresponding pixels in the input
Fig. 1. The proposed scheme.
images, where the weights or activity map are often determined according to the saliency of different pixels [20–23] and the corresponding spatial context information [18,24–26]. Various hand-crafted [20,21] or CNN-based focus measures [22,23] are used to constitute the saliency map, which is often further refined to capture the corresponding spatial context information by specific operations such as guided filtering [24], probability filtering [27] or boosted random walks [7]. Compared to the transform domain methods, the spatial domain methods are often fast and easy to implement, but the fusion performance strongly depends on an accurate estimation of the pixel weights. In addition, these methods cannot deal well with the case where the size of the objects of interest varies greatly in the image.

In this paper, we propose a simple yet effective multi-scale image fusion method in the spatial domain, which is illustrated in Fig. 1. The key idea is the scale-invariant structure saliency generation based on the difference-of-Gaussian (DoG) scale space pyramid [28]. After generating the saliency map of each image, a simple max operation is applied to them to generate the mask images, which are further refined by single-scale guided filtering [29] to exploit the spatial correlation among adjacent pixels, resulting in a scale-invariant estimation of the activity maps. With no complicated processing involved, the proposed method is very fast and can be used to fuse high-resolution images in real-time applications. Experimental results demonstrate that compared to many state-of-the-art methods, the proposed method is much faster while yielding competitive or even better results in terms of both visual and quantitative evaluations. Our contributions in this paper are as follows:
• We propose a scale-invariant structure saliency selection scheme based on the difference-of-Gaussian (DoG) scale space pyramid of images. The resulting image fusion method can keep both the details of small size objects and the integrity information of large size objects in images simultaneously.
• Our method is very efficient, easy to implement, and can be used for fast high-resolution image fusion.
• Compared to many state-of-the-art methods, our method yields competitive or even better results in terms of both visual and quantitative evaluations on four datasets.

1.1. Related works

As mentioned before, existing image fusion methods roughly fall into two groups: transform domain methods and spatial domain methods. Here we briefly review some related methods and explicitly distinguish them from the proposed method (refer to [2,3,19,30,31] for comprehensive surveys). The transform domain methods can be summarized in three steps, i.e., image transform, fusion of the transformed coefficients and inverse transform. The basic assumption is that in the transformed domain the resulting coefficients can well characterize the underlying salient information of the source images, and therefore the choice of the image transform tool is crucial. MSD is the most popular tool to extract salient features at different scales, based on which multi-scale fusion can be easily implemented in the transform domain. The pyramid transform [10,12] and the wavelet transform [32] are the two most used MSD techniques. For instance, the nonsubsampled contourlet transform (NSCT) [33] and the dual-tree complex wavelet transform (DTCWT) [34] are used to decompose the image into a series of subbands and then perform coefficient fusion in the transform domain, while the LP-SR method [35] uses the multi-scale Laplacian pyramid transform. In addition to the selection of the image transform tool, the fusion strategy for the multi-scale coefficients also plays a crucial role in the fused result. Conventionally, the coefficients are fused by choosing the maximum or the average of the absolute values of the coefficients. The multi-scale weighted gradient fusion (MWGF) [34] adopts a weighted scheme to combine all the important gradient information from the input images into an optimal weighted gradient image. Instead of performing coefficient selection at each scale individually, the cross-scale coefficient selection method [36] calculates an optimal set of coefficients for each scale in the transform domain. LP-SR [35] focuses on the fusion of the low-pass MSD bands based on sparse coding. However, because the fusion
is performed in the transform domain, the fused image often has some global defects [18,19] and is easily disturbed by mis-registration or noise [6]. Furthermore, the transform domain methods are often computationally complex, as they decompose each source image into another domain and then reconstruct the fused image, which is especially costly for a group of high-resolution source images. Different from these methods, our method performs multi-scale fusion directly in the spatial domain.

Recently, much attention has been paid to spatial domain fusion, which directly produces the fused image by calculating the weighted average of the pixels in the source images; therefore, the crucial step is the construction of the weights. Generally, the weights are determined according to how well each pixel is focused, which is usually measured by some focus metric reflecting the saliency of different pixels [2,21]. These focus metrics can be either given or learned. For instance, the absolute value of the gradient [18,26] or of the Laplacian response [7,24,25] is widely used as a pre-defined focus metric. The dense SIFT (DSIFT) [20] defines the saliency of each pixel based on its SIFT descriptor. Different from these pre-defined focus metrics, machine learning algorithms such as support vector machines [37] and neural networks [2] have also been exploited to learn the focus metric. As the focus metric of each pixel is calculated individually, the result is still sensitive to mis-registration or noise, which can be alleviated by making use of the spatial context information. There are two widely adopted strategies. One is region-based fusion [2,18], which resorts to image segmentation to identify the focused and de-focused regions of the images, but the results rely heavily on the accuracy of the segmentation. The other is to apply some post-processing technique to the initial pixel-wise weight maps [7]. The frequently-used techniques include guided filtering [24], probability filtering [27], image matting [25,26] and boosted random walks [7], etc. Most of these methods are fast and easy to implement compared with the transform domain methods [2]. However, they adopt an at-most two-scale fusion scheme and therefore cannot handle multi-scale fusion well, whereas our method searches across the entire scale space and thus can extract and combine salient features at different scales.

Due to their overwhelming success in a broad range of applications in computer vision and image processing, in the past three years many deep learning (DL)-based image fusion methods have been proposed [30], which can also be roughly divided into transform domain methods and spatial domain methods. DeepFuse [17] and DenseFuse [4] take advantage of an unsupervised encoder-decoder model to deal with image fusion. Specifically, they train an encoder to transform the original images into a new domain and then fuse the resulting coefficients; the final fused image is reconstructed by a learned decoder. On the contrary, the CNN-based [22] and p-CNN based [23] methods resort to learning the saliency map by casting multi-focus image fusion as a classification problem modeled by a CNN in the spatial domain. In particular, the former introduces a siamese convolutional network for two-class classification while the latter adopts a standard CNN for three-class classification. Very recently, an ensemble of CNNs [39] and a generative adversarial network (GAN) [38] have been exploited to improve the accuracy of the saliency map. Due to the lack of publicly available labeled data, simple operations such as Gaussian blurring and masking are used to generate the training data. Obviously, these DL-based methods involve computationally intensive training and reconstruction. Moreover, in order to deal with more than two images, the DL-based methods fuse them one by one in series, making the inference very time-consuming. By contrast, our method involves no complicated operations and can combine many images simultaneously, making it very appropriate for high-resolution and high-throughput image fusion tasks.
2. The proposed method

2.1. Scale-invariant saliency selection

Our scale-invariant saliency selection scheme is based on scale space theory [40]. It has been shown that under reasonable assumptions, the only possible scale space kernel is the Gaussian function, and the scale space of an image can be produced by convolving the image with variable-scale Gaussians [40]. Therefore, we define the scale space of an image as a function L(x, y, σ) produced by the convolution of a variable-scale Gaussian g(x, y, σ) with the image I(x, y), i.e.

L(x, y, σ) = g(x, y, σ) ∗ I(x, y),

where g(x, y, σ) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)), ∗ is the convolution operator and (x, y) indexes the position of a pixel in the image. Here we adopt the scheme in [28] to sample the scale space of the input image. The initial image I(x, y) is incrementally convolved with Gaussian functions to produce images separated by a constant factor k in scale space. Each octave of scale space is divided into an integer number s of layers, where k = 2^(1/s). Once a complete octave has been processed, the first Gaussian image of the next octave has twice the initial value σ0 and is downsampled by taking every second pixel in each row and column of the Gaussian image in the previous octave. This procedure is repeated until enough octaves have been processed. Hence the scale parameter of the sth layer in the oth octave, σ(o, s), is given by
σ(o, s) = 2^(o−1) k^(s−1) σ0,

where o and s start from 1. Based on the sampled scale space, we can easily obtain the DoG scale space D(x, y, σ) by calculating the difference of two nearby scales:
D(x, y, σ) = [g(x, y, kσ) − g(x, y, σ)] ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ).

As discussed in [28], the DoG response is robust to noise and reflects the local image structure at the current scale, and therefore we utilize it here to generate the scale-invariant saliency map S(x, y). We first define a scale-dependent saliency metric Sσ(x, y) according to the absolute value of the DoG response, which is further smoothed in the neighborhood with a Gaussian, i.e.
Sσ(x, y) = g(x, y, σI) ∗ |L(x, y, kσ) − L(x, y, σ)|,  (1)
where σI is the parameter of the smoothing Gaussian (the integration scale). The smoothing with g(x, y, σI) is introduced to make the saliency estimation more robust to noise, and in our implementation it is approximated by a 3 × 3 Gaussian low-pass spatial filter. It should be noted that instead of using the DoG-based response, we could use some derivative-based alternatives [41]. For example, we can define the following gradient-based metric:
Sσ(x, y) = g(x, y, σI) ∗ [ (∂L(x, y, σ)/∂x)² + (∂L(x, y, σ)/∂y)² ],  (2)
or the following scale-normalized metric based on the Laplacian of Gaussian (LoG):

Sσ(x, y) = g(x, y, σI) ∗ σ² |∂²L(x, y, σ)/∂x² + ∂²L(x, y, σ)/∂y²|.  (3)
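As an illustration only, a minimal Python/OpenCV sketch of how these per-scale saliency measures could be computed at a single scale σ is given below; the function names, the float conversion and the 3 × 3 integration filter are our own assumptions rather than details taken from the paper.

```python
import cv2
import numpy as np

def dog_saliency(img, sigma, k):
    """Eq. (1): Gaussian-smoothed absolute DoG response at scale sigma."""
    img = img.astype(np.float64)
    L1 = cv2.GaussianBlur(img, (0, 0), sigma)
    L2 = cv2.GaussianBlur(img, (0, 0), k * sigma)
    return cv2.GaussianBlur(np.abs(L2 - L1), (3, 3), 0)   # 3x3 integration filter

def gradient_saliency(img, sigma):
    """Eq. (2): smoothed squared gradient magnitude of L(x, y, sigma)."""
    L = cv2.GaussianBlur(img.astype(np.float64), (0, 0), sigma)
    gx = cv2.Sobel(L, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(L, cv2.CV_64F, 0, 1, ksize=3)
    return cv2.GaussianBlur(gx * gx + gy * gy, (3, 3), 0)

def log_saliency(img, sigma):
    """Eq. (3): smoothed scale-normalized absolute Laplacian response."""
    L = cv2.GaussianBlur(img.astype(np.float64), (0, 0), sigma)
    return cv2.GaussianBlur(sigma ** 2 * np.abs(cv2.Laplacian(L, cv2.CV_64F)), (3, 3), 0)
```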
Here we use the DoG-based metric (1), since the DoG operator is a close approximation of the LoG function but significantly accelerates the computation [28]. Another kind of possible alternative is based on the eigenvalues of the second moment matrix [34,41] or of the Hessian matrix [28], but it is far more complicated and less stable [42].

Fig. 2. Construction of the scale-invariant saliency map for each image.

To construct the scale-invariant saliency map S(x, y) of image I(x, y), for each position we search for the maximum saliency metric across all scales σ in the scale space:
S(x, y) = max_σ Sσ(x, y).  (4)
Since the image resolution of each octave is different, we first apply the max operation within each octave and then resize the resulting map to exactly the size of the original image. The scale-invariant saliency map is finally obtained by applying the max operation across the octaves, as shown in Fig. 2, where o is the number of octaves in the scale space. In our implementation, when generating the DoG scale space pyramid, we set the initial scale σ0 = 1 and then produce s + 1 images in the stack of Gaussian blurred images for each octave, so that the saliency comparison per octave covers a complete octave.
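To make the construction of Fig. 2 concrete, here is a minimal sketch in Python with OpenCV/NumPy (an illustration under our own assumptions, not the authors' C++ implementation); it builds the per-octave Gaussian stacks, takes the per-octave maximum of the smoothed |DoG| responses (Eq. (1)) and then the maximum across octaves after resizing (Eq. (4)). For simplicity each level is blurred directly from the octave base rather than incrementally as in [28].

```python
import cv2
import numpy as np

def scale_invariant_saliency(img, octaves=5, s=3, sigma0=1.0):
    """Scale-invariant saliency map S(x, y) of Eq. (4) for a grayscale image."""
    img = img.astype(np.float32)
    k = 2.0 ** (1.0 / s)
    H, W = img.shape[:2]
    S = np.zeros((H, W), np.float32)
    base = img
    for _ in range(octaves):
        # stack of s + 1 Gaussian-blurred images for this octave
        gauss = [cv2.GaussianBlur(base, (0, 0), sigma0 * k ** i) for i in range(s + 1)]
        # per-octave maximum of the smoothed |DoG| responses (Eq. (1))
        oct_sal = np.zeros_like(base)
        for i in range(s):
            sal = cv2.GaussianBlur(np.abs(gauss[i + 1] - gauss[i]), (3, 3), 0)
            oct_sal = np.maximum(oct_sal, sal)
        # resize to the original resolution and take the maximum across octaves (Eq. (4))
        S = np.maximum(S, cv2.resize(oct_sal, (W, H), interpolation=cv2.INTER_LINEAR))
        # next octave: the image blurred by 2*sigma0, downsampled by taking every second pixel
        base = gauss[-1][::2, ::2]
    return S
```

One such map Si(x, y) is computed for each source image and then compared pixel-wise in the next step.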
2.2. Activity maps generation and fusion

A straightforward approach is to take the obtained scale-invariant saliency of each pixel as the corresponding activity or weight. However, this will introduce blur in the fused image. We adopt a simple non-max suppression to alleviate this problem, i.e., for the ith image Ii(x, y), we determine a mask Mi(x, y) as the initial activity map by comparing the obtained scale-invariant saliency maps {Si(x, y)}, i = 1, ..., n:

Mi(x, y) = 1, if Si(x, y) = max{S1(x, y), S2(x, y), ..., Sn(x, y)}; 0, otherwise,  (5)

where n is the number of input images. However, as the above procedure compares pixels individually without considering the spatial context information, the resulting masks are usually noisy and will introduce artifacts into the fused image. Moreover, there may exist more than one maximum at a spatial position in scale space (i.e., there exist multi-scale structures). To deal with these situations, we could choose a sophisticated solution which models the pixel saliency and the spatial smoothness simultaneously in an energy function that can be globally optimized by tools such as graph-cut techniques [43], but the optimization is often relatively inefficient. Another choice is to perform a morphological smoothing operation, which is very efficient but inaccurate and likely to introduce errors or artifacts [34]. Guided image filtering [29] or joint bilateral filtering [44] is an interesting alternative, which provides a trade-off between efficiency and accuracy. Following [24], we determine the final activity map W(x, y) by applying guided filtering to the initial activity map as follows:

W(x, y) = (1/|ω|) Σ_{(u,v)∈ω(x,y)} [a(u, v) I(x, y) + b(u, v)],  (6)

where |ω| is the number of pixels in the window of size (2r + 1) × (2r + 1) centered at pixel (x, y), and a(x, y) and b(x, y) are the constant coefficients of window ω(x, y), which are determined by ridge regression:

a(x, y) = [ (1/|ω|) Σ_{(u,v)∈ω(x,y)} I(u, v) M(u, v) − μ(x, y) M̄(x, y) ] / (δ²(x, y) + ε),  (7)
b(x, y) = M̄(x, y) − a(x, y) μ(x, y).  (8)

Here μ(x, y) and δ²(x, y) are the mean and variance of image I in window ω(x, y), M̄(x, y) is the mean of the initial activity map M in window ω(x, y), and ε denotes the regularization parameter penalizing large a(x, y). The parameters of guided filtering are set to r = 2, ε = 2 in our implementation. For each input image Ii(x, y), we determine the corresponding activity map Wi(x, y) and then obtain the final fused image F(x, y) by

F(x, y) = Σ_{i=1}^{n} Wi(x, y) Ii(x, y) / Σ_{i=1}^{n} Wi(x, y).  (9)

For color input images, the activity map is repeated for the red, green and blue channels, respectively, to generate the final color fused image.
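The whole activity map generation and fusion step can be sketched as follows (again a hedged Python illustration, not the released C++ code): the masks of Eq. (5) come from a pixel-wise maximum over the saliency maps, each mask is refined by a guided filter with the corresponding source image as guidance (Eqs. (6)–(8)), and the fused image follows from the normalized weighted average of Eq. (9). The guided filter below is the standard box-filter formulation of [29], written out so the sketch does not depend on a particular library routine; for color inputs the resulting weights would simply be repeated over the channels, as stated above.

```python
import cv2
import numpy as np

def guided_filter(I, M, r=2, eps=2.0):
    """Box-filter guided filter [29]: guidance I, input M, window radius r (Eqs. (6)-(8))."""
    ksize = (2 * r + 1, 2 * r + 1)
    mean = lambda x: cv2.boxFilter(x, -1, ksize)        # normalized box filter = local mean
    mu, Mbar = mean(I), mean(M)
    var = mean(I * I) - mu * mu                          # local variance delta^2(x, y)
    cov = mean(I * M) - mu * Mbar
    a = cov / (var + eps)                                # Eq. (7)
    b = Mbar - a * mu                                    # Eq. (8)
    return mean(a) * I + mean(b)                         # Eq. (6)

def fuse(images, saliency_maps, r=2, eps=2.0):
    """Fuse single-channel images given their scale-invariant saliency maps."""
    S = np.stack([s.astype(np.float32) for s in saliency_maps])          # n x H x W
    masks = (S == S.max(axis=0, keepdims=True)).astype(np.float32)       # Eq. (5)
    imgs = [im.astype(np.float32) for im in images]
    weights = np.stack([guided_filter(im, m, r, eps) for im, m in zip(imgs, masks)])
    F = (weights * np.stack(imgs)).sum(axis=0) / (weights.sum(axis=0) + 1e-12)  # Eq. (9)
    return F
```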
3. Experiments and discussions

In this section, we conduct a set of experiments to demonstrate the effectiveness and efficiency of the proposed image fusion method for different kinds of images. The experimental setups are introduced first, and then comparisons of our method with other state-of-the-art algorithms are conducted and analyzed.

Fig. 3. Example source images.

3.1. Dataset and setups of experiments

We use four image datasets for the experiments. The first one is composed of 8 pairs of gray multi-modal medical images, the second one contains 15 pairs of multi-focus gray or color natural images, and the third one includes 8 pairs of gray visible-infrared images. These three are public datasets for image fusion and are often used in related papers. The last one is a new multi-focus cervical cell image dataset collected by ourselves, which consists of 15 groups of color images; each group contains a series of multi-focus cervical cell images with a size of 2040 × 1086 or 2448 × 2048, etc. Some examples used in our experiments are shown in Fig. 3.

We compare our method with several state-of-the-art algorithms, including dense SIFT (DSIFT) [20], the dual-tree complex wavelet transform (DTCWT) [34], guided filter fusion (GFF) [24], image matting (IM) [26], the CNN-based method [22], Laplacian sparse representation (LP-SR) [35], multi-scale weighted gradient fusion (MWGF) [34], the nonsubsampled contourlet transform (NSCT) [33], the boundary finding based method (BF) [18] and DenseFuse [4]. The implementation of DenseFuse [4] is based on TensorFlow [45] and the other compared methods are implemented in Matlab (the training of the CNN-based method [22] is claimed to be implemented in C++ (Caffe), but only Matlab code is provided for inference). Their parameters are set to the default values given by the authors. Our source code is implemented in C++ and all
the data used in the experiments are available online (https://github.com/yiqingmy/Fusion). The results are compared in terms of both visual and objective quality, along with efficiency.

For the objective quality evaluation of image fusion, there is no reference image (ground truth) in practice to make an accurate assessment of the fused result, and hence there is no single uniform metric to describe the performance of the fused image [35,46]. Usually, several metrics are combined to make a comprehensive analysis. Here, we adopt six commonly-used metrics: normalized mutual information (QNMI) [47], Yang et al.'s metric (QY) [48], the quality index (QI) [49], the edge information preservation value (QAB/F) [50], feature mutual information (QFMI) [51] and visual information fidelity (QVIF) [52]. A brief introduction of each metric is given below.

Normalized mutual information (QNMI) [47]. It is defined as
QNMI = 2 [ MI(A, F)/(H(A) + H(F)) + MI(B, F)/(H(B) + H(F)) ],  (10)
where H(·) denotes the entropy of the corresponding image and MI(·, ·) is the mutual information between two images. It is an improved version of the traditional mutual information metric [53], which is unstable and biased towards the source image with the highest entropy. QNMI reflects how well the original content of the source images is preserved in the fused image.
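For concreteness, a small sketch of how QNMI could be evaluated from 256-bin histograms is shown below (the binning and the base-2 logarithm are our assumptions; the paper does not specify these implementation details):

```python
import numpy as np

def entropy(hist):
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mutual_info(x, y, bins=256):
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

def q_nmi(A, B, F, bins=256):
    """Eq. (10): normalized mutual information between sources A, B and fused image F."""
    hA = entropy(np.histogram(A.ravel(), bins=bins)[0].astype(float))
    hB = entropy(np.histogram(B.ravel(), bins=bins)[0].astype(float))
    hF = entropy(np.histogram(F.ravel(), bins=bins)[0].astype(float))
    return 2.0 * (mutual_info(A, F, bins) / (hA + hF) + mutual_info(B, F, bins) / (hB + hF))
```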
Yang et al.'s metric (QY) [48]. It is a structural similarity (SSIM) [54] based fusion metric, which measures the level of structural information of the source images preserved in the fused image. For the source images A, B and the fused image F, it is calculated by

QY = λ(w) SSIM(A, F|w) + (1 − λ(w)) SSIM(B, F|w),  if SSIM(A, B|w) ≥ 0.75,
QY = max(SSIM(A, F|w), SSIM(B, F|w)),  if SSIM(A, B|w) < 0.75,  (11)

where SSIM(·, ·|w) is the structural similarity function, w is a 7 × 7 window, and λ(w) is the local weight, defined as

λ(w) = s(A|w) / (s(A|w) + s(B|w)),  (12)

where s(A|w) and s(B|w) are the variances of source images A and B within window w.
Quality index (QI) [49]. It is designed by modeling image distortion with the factors of loss of correlation, luminance distortion and contrast distortion. It is calculated in local regions using a sliding window approach and is defined as

QI = (1/M) Σ_{j=1}^{M} Qj,  (13)
where M is the number of local regions of the image and Qj is calculated by
Qj = 4 σAjFj · Āj F̄j / [ (σAj² + σFj²) · ((Āj)² + (F̄j)²) ],  (14)
where Aj and Fj are the jth local regions of the source image A and fused image F, respectively, σAjFj is the covariance between the two signals, Āj and F̄j are the mean values of each signal, and σAj², σFj² are the variances of the corresponding signals.
The edge information preservation value (QAB/F) [50]. It evaluates the amount of edge information that is transferred from the input images to the fused image, associating the important visual information with the "edge". For an image with resolution X × Y, the metric value is defined as

QAB/F = [ Σ_{x=1}^{X} Σ_{y=1}^{Y} (Q^AF(x, y) w^A(x, y) + Q^BF(x, y) w^B(x, y)) ] / [ Σ_{i=1}^{X} Σ_{j=1}^{Y} (w^A(i, j) + w^B(i, j)) ],  (15)
where the edge preservation value Q^AF(x, y) = Q_g^AF(x, y) Q_α^AF(x, y), with Q_g^AF(x, y) and Q_α^AF(x, y) denoting the edge strength and orientation preservation at pixel (x, y) for image A, and similarly for Q^BF(x, y) and source image B. The w^A(x, y) and w^B(x, y) are the weighting factors of Q^AF(x, y) and Q^BF(x, y), respectively.

Feature mutual information (QFMI) [51]. It estimates the joint probability distribution from the marginal distributions to calculate the mutual information based on image features. It is calculated by
QFMI = IFA/(HF + HA) + IFB/(HF + HB),  (16)
where HA , HB and HF are the histogram based entropies of the source images A, B and fused image F respectively, and the amount of feature information IFA is calculated by
IFA = Σ pFA(x, y, z, w) log2 [ pFA(x, y, z, w) / (pF(x, y) · pA(z, w)) ],  (17)
where pFA(x, y, z, w) is the joint probability distribution function, and pA(x, y) and pF(x, y) are the marginal distributions corresponding to the gradients of images A and F, respectively.

Visual information fidelity (QVIF) [52]. It quantifies the loss of image information due to the distortion process and explores the relationship between image information and visual quality. It is given by
QVIF = Σ_{j∈subbands} I(C^{N,j}; A^{N,j} | s^{N,j}) / Σ_{j∈subbands} I(C^{N,j}; F^{N,j} | s^{N,j}),  (18)
which sums over all the subbands of interest, where I(C^{N,j}; A^{N,j}|s^{N,j}) and I(C^{N,j}; F^{N,j}|s^{N,j}) represent the information that could ideally be extracted by the brain from subband j in the test and reference images, respectively. C^{N,j} represents the N elements of the random field (RF) that describes the coefficients from subband j. This metric relates image quality to the amount of information shared between the fused image and the source images. We measure all these metrics with their default parameter settings; for all of them, larger values indicate better fusion quality.

3.2. Results and discussions

We first evaluate the performance of the proposed method under a varying total number of octaves o and number of layers s sampled per octave. The fused images of a pair of multi-modal medical images with different o and s are shown in Fig. 4. In this example, on the one hand, when only 1 or 2 octaves are involved in constructing the DoG scale space pyramid, the fused images fail to keep the integrity information of large size objects (e.g., eyeballs), while by increasing the value of o, the integrity information of the eyeballs is preserved. On the other hand, although the effect is not as significant as that of increasing the octave number o, the fused image contains more details as the layer number s increases. The corresponding objective quality metrics are shown in Fig. 5. As shown in Fig. 5(a), with the layer number fixed at 3, most of the metric values improve as the number of octaves increases and tend to stabilize when the number of octaves reaches 5. From Fig. 5(b), with the octave number fixed at 5, increasing the number of layers changes the metric values only slightly, and some metrics, such as QNMI, QI and QVIF, already perform well when the number of layers is 3. Since larger values of o and s increase the computational burden, and different kinds of source images behave differently under different parameter settings, we set o = 5, s = 3 for the multi-modal dataset, o = 1, s = 1 for the natural dataset, o = 4, s = 3 for the visible-infrared dataset and o = 3, s = 5 for the multi-focus cell dataset, respectively, as a trade-off obtained by investigating the fusion performance under different parameter settings.

Fig. 6 shows the fused images obtained by different methods for the multi-modal source images shown in Fig. 3(a). As shown in these figures, the proposed method produces images which preserve the complementary information of the different source images well. Moreover, due to the scale-invariant structure saliency selection, our method keeps the integrity information of large size objects and the visual details simultaneously. It extracts the details and the structure from the different modal source images well, as shown in Fig. 6(k). For the results in Fig. 6(e), (i) and (j), the visual quality is acceptable in that most of the complementary information from the source images is maintained well, but part of the integrity of the brain is destroyed compared with our fused image, and there is serious distortion in the fused results of the compared methods in Fig. 6(a)–(d) and (f)–(h). Furthermore, from Fig.
6(l)–(v), the DTCWT, GFF, IM, NSCT and DenseFuse methods may decrease the brightness and contrast, while the proposed method preserves these features and details without producing visible artifacts or brightness distortions.

Figs. 7–9 show the fused images of different methods for the sampled natural image pairs shown in Fig. 3(b). Although all these methods generate acceptable fused images in terms of global performance, our method produces slightly better results than the others (see the halo artifacts in the magnified area of Fig. 7, with the closeup view
Fig. 4. The fused images obtained by the proposed method with varying o and s. Notice that the integrity information of large size objects (e.g., eyeballs) is preserved as the values of o and s increase.
Fig. 5. Performance of the proposed method with varying s and o.
in the bottom-left corner). In Fig. 9, our method performs better than the others: the boundaries are sharp without blur artifacts, as shown in the close-up views in the top-right of each sub-picture. The results in Fig. 8 show similar visual performance for all methods because of the simple scene, without an intricate background or objects of widely varying scale in the source images.

Fig. 10 displays the results of fusing the visible-infrared images shown in Fig. 3(c). From Fig. 10(a) to (k), it can be seen that in Fig. 10(k), with our method, the brightness of the board and pole is preserved well without any distortion, while in the other fused images there are some black blocks where the integrity information is destroyed. From Fig. 10(l) to (v), the proposed method extracts the complementary brightness from the source images well, and the contrast and details of the car are preserved, while the others produce visible artifacts and contrast distortions. The
close-up view is shown in the bottom-left corner of each sub-picture.

Figs. 11–14 show the comparative fused results of the multi-focus cell images shown in Fig. 3(d). As shown in the close-up views in the bottom-right of Fig. 11, the fused images based on the DSIFT, IM, MWGF, BF and DenseFuse methods are extremely blurred at the boundaries and fail to keep the details of the cell nuclei. Furthermore, the DTCWT and NSCT based methods produce halo artifacts in the fused images, while the GFF and CNN based methods fail to preserve the small cell nuclei. The LP-SR based method works reasonably well and keeps most of the details of the small size cells, but the integrity of the clustered large size cells is damaged. In our proposed method, the integrity of the clustered large size cells is preserved and most of the isolated small size cells are maintained from the original images, which demonstrates the best visual quality.
Fig. 6. The fused results obtained by different methods for the multi-modal medical images.
Fig. 7. Compared results of the source images “Clock”.
Fig. 8. Compared results of the source images “wine”.
Fig. 9. Source images and fusion results of the “flowers” source images in Fig. 3(b).
Fig. 10. The fused results obtained by different methods for the visible-infrared images.
Fig. 11. Fusion results of the first group multi-focus cell images in Fig. 3(d).
Fig. 12. Fusion results of the third group multi-focus cell images in Fig. 3(d).
Similarly, as shown in the close-up views of Fig. 12, the fused images from DSIFT, IM, MWGF, BF and DenseFuse are blurred and lose some nucleus details, while the results from DTCWT, GFF, CNN and NSCT produce halo artifacts. The LP-SR based method keeps the details well but also produces halo artifacts. Our method preserves the focused areas of the different source images well without introducing any artifacts. For the fused results in Fig. 13, there are some halo effects and some lost details in the compared methods in Fig. 13(a)–(j), as shown in the close-up view in the top-left corner. With the proposed method, both the integrity of the clustered cells and the details of the small cell nuclei are preserved well.

For the example illustrated in Fig. 14, the fused images generated by DSIFT, DTCWT, IM and NSCT all fail to preserve the focused areas of the different source images and are extremely blurred. The reason is that some of the input source images are severely out of focus, and these methods all rely on local regions to generate the decision map, which makes them sensitive to severely blurred pixels and leads to results inconsistent with their visual performance in the previous examples. The GFF, CNN,
Fig. 13. Fusion results of the last group multi-focus cell images in Fig. 3(d).
Fig. 14. Fusion results of the fourth group multi-focus cell images in Fig. 3(d).
MWGF, BF and DenseFuse based methods introduce a lot of color distortion in the nucleus regions and obvious halo artifacts. The result of the LP-SR based method is close to that of our method but introduces some odd color distortion. Again, our method produces a fused image which preserves the focused areas of the different source images well without introducing any artifacts.

To analyze the objective metrics for each method, we calculate the metric between each source image and the corresponding fused image one by one for each group of source images, and the final metric value is their average. All the quantitative results of the different fusion methods are shown in Table 1. It can be seen that the proposed method yields competitive objective metrics for the different kinds of datasets. For the multi-modal dataset, our QNMI value is the highest, and although the other metrics are not the best, they are close to the best scores. BF seems better on this dataset; it is based directly on boundary finding, and the test multi-modal images have clear boundaries so that the structural information can be maintained well. However, our method is the most stable one, achieving competitive metric values across the different source image datasets, such as QNMI and QY for the natural multi-focus dataset, QNMI, QY, QAB/F, QFMI and QVIF for the visible-infrared dataset, and QAB/F for the multi-focus cell dataset. This means that our method transfers the information from a series of source images into the fused image well and is robust for fusing different kinds of images, while the other methods only obtain comparably good scores on a certain dataset, such as the BF, CNN-based and DenseFuse methods.
We also compare the computational efficiency of each method on the high-resolution color cell images with a size of 2040 × 1086. Experiments are performed on a computer equipped with a 4.20 GHz CPU and 8 GB memory, and all codes are available online. The average running time of the different image fusion methods is compared in Table 2. As mentioned before, the compared methods except for DenseFuse are all implemented in Matlab, the implementation of DenseFuse is based on TensorFlow with an NVIDIA GeForce GTX TITAN Xp GPU, and our code is implemented in C++; therefore, strictly speaking, the running time comparison is unfair. Here, we re-implement the GFF-based method in C++ and also include the corresponding running time in Table 2 to reveal, to some extent, the efficiency difference between the Matlab and C++ implementations. As shown in Table 2, the guided filtering based methods, i.e., the GFF-based method and the proposed method, are the most efficient, while the CNN-based method is the most time-consuming. The efficiency of the DenseFuse method is not superior to ours even though its implementation runs on a high-performance GPU. Compared to the original Matlab implementation, the GFF method is sped up by almost 40% with the C++ implementation, but it is still much slower than our method.
Table 1
The average objective assessments of different fusion methods.

Multi-modal dataset
Index   DSIFT   DTCWT   GFF     IM      CNN     LP-SR   MWGF    NSCT    BF      DenseFuse  Proposed
QNMI    0.7132  0.5090  0.5850  0.6424  0.6741  0.5982  0.6597  0.5306  0.7148  0.6946     0.7171
QY      0.8623  0.6876  0.7921  0.8565  0.8619  0.7422  0.8729  0.7201  0.8803  0.8679     0.8708
QAB/F   0.6017  0.5037  0.5600  0.5354  0.6183  0.5610  0.5983  0.5390  0.6224  0.5239     0.5850
QI      0.5109  0.4180  0.4807  0.5134  0.5962  0.4887  0.5236  0.4428  0.5931  0.4608     0.5471
QFMI    0.8639  0.8514  0.8574  0.8560  0.8718  0.8569  0.8680  0.8514  0.8748  0.8433     0.8648
QVIF    0.3731  0.2310  0.2789  0.3000  0.4165  0.2735  0.3561  0.2417  0.4444  0.2730     0.3315

Natural multi-focus dataset
Index   DSIFT   DTCWT   GFF     IM      CNN     LP-SR   MWGF    NSCT    BF      DenseFuse  Proposed
QNMI    0.7478  0.7332  0.7569  0.7517  0.7358  0.7387  0.7435  0.7426  0.7528  0.7768     0.8112
QY      0.8342  0.8566  0.8644  0.8631  0.8396  0.8547  0.8657  0.8540  0.8673  0.8870     0.9137
QAB/F   0.6203  0.6196  0.6256  0.6226  0.6774  0.6206  0.6210  0.6233  0.6660  0.6023     0.6749
QI      0.6351  0.6182  0.6236  0.6175  0.6792  0.6239  0.6152  0.6381  0.6634  0.6987     0.6891
QFMI    0.8597  0.8612  0.8616  0.8612  0.8686  0.8615  0.8619  0.8612  0.8583  0.8539     0.8663
QVIF    0.4871  0.4820  0.4953  0.4905  0.5756  0.4827  0.4948  0.4857  0.5584  0.4782     0.5457

Visible-infrared dataset
Index   DSIFT   DTCWT   GFF     IM      CNN     LP-SR   MWGF    NSCT    BF      DenseFuse  Proposed
QNMI    0.5402  0.3506  0.4523  0.5034  0.4945  0.4301  0.5198  0.3565  0.6018  0.5242     0.6107
QY      0.8479  0.7766  0.8303  0.8478  0.8492  0.8277  0.8850  0.7979  0.8934  0.8374     0.9063
QAB/F   0.5985  0.5689  0.5880  0.5686  0.5911  0.5956  0.5889  0.5892  0.5936  0.5900     0.6402
QI      0.4838  0.4812  0.5051  0.4911  0.4968  0.5039  0.5028  0.5034  0.5050  0.5722     0.5278
QFMI    0.8565  0.8718  0.8750  0.8724  0.8757  0.8745  0.8754  0.8728  0.8765  0.8678     0.8812
QVIF    0.3809  0.3056  0.3546  0.3584  0.3788  0.3562  0.3978  0.3202  0.4062  0.3827     0.4336

Multi-focus cell dataset
Index   DSIFT   DTCWT   GFF     IM      CNN     LP-SR   MWGF    NSCT    BF      DenseFuse  Proposed
QNMI    0.6309  0.5461  0.6145  0.5723  0.5419  0.5082  0.5340  0.5385  0.5684  0.6928     0.5415
QY      0.2402  0.2236  0.2321  0.2102  0.2282  0.2712  0.2279  0.2655  0.2348  0.3609     0.3085
QAB/F   0.1870  0.2120  0.2100  0.1950  0.2051  0.2331  0.1886  0.2228  0.2102  0.2033     0.2468
QI      0.1915  0.1779  0.1844  0.1677  0.1662  0.2161  0.1814  0.2115  0.1855  0.3574     0.2474
QFMI    0.7603  0.7570  0.7622  0.7475  0.7456  0.7579  0.7152  0.7573  0.7537  0.9526     0.7605
QVIF    0.2261  0.1854  0.2205  0.1959  0.2087  0.2293  0.2225  0.1967  0.2270  0.2993     0.2388
Table 2
The average running time (s) of different fusion methods on color cell images with size of 2040 × 1086. Note that the DenseFuse method is implemented on the GPU.

Method    DSIFT    DTCWT  IM     CNN      LP-SR  MWGF    NSCT    BF      DenseFuse(GPU)  GFF    GFF(C++)  Proposed
Time (s)  1679.67  36.53  72.47  5276.09  34.29  259.30  414.24  453.39  1678.27         10.66  6.44      2.08
We attribute this to the following reasons. First, the computational burden of the DoG-based scale-invariant saliency selection step is negligible compared to that of the activity map refinement step based on guided filtering, and the GFF-based method needs to perform guided filtering twice (once each for the base and detail layers), while our method only needs to perform filtering once to refine the activity map of each source image. Second, instead of using the original color image, we use the gray one as the guided image to accelerate the activity map refinement step. Due to its extreme efficiency, our method can be applied to nearly real-time applications such as digital cytopathology [9] and can be further accelerated through GPU programming.
4. Conclusion
In this paper, based on scale space theory, we propose a very simple yet effective multi-scale image fusion method in the spatial domain. To keep both the details of small size objects and the integrity information of large size objects in the fused image, we first obtain a robust, scale-invariant structure saliency map based on the DoG scale space pyramid, which turns the detection of both details and object integrity into a strong response across zooming scales. Then the activity map is constructed by a non-max suppression scheme based on the saliency maps and refined by guided filtering to capture the spatial context. Finally, the fused image is generated by directly combining the activity maps and the original input images. Experimental results demonstrate that our method is efficient and produces high-quality all-in-focus images for different kinds of datasets, extracting the complementary information from the source images and preserving the details and the integrity of objects of very different sizes. Meanwhile, due to its low time complexity, the proposed method can deal with high-resolution images very efficiently and can be applied to real-time applications.

Conflict of Interest

None.

Acknowledgments

This research was partially supported by the Natural Science Foundation of Hunan Province, China (No. 14JJ2008), the National Natural Science Foundation of China under Grant Nos. 61602522, 61573380 and 61672542, and the Fundamental Research Funds of the Central Universities of Central South University under Grant No. 2018zzts577.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neucom.2019.04.043.

References
[1] T. Stathaki, Image Fusion: Algorithms and Applications, Elsevier, 2011. [2] S. Li, X. Kang, L. Fang, J. Hu, H. Yin, Pixel-level image fusion: a survey of the state of the art, Inf. Fusion 33 (2017) 100–112. [3] J. Du, W. Li, K. Lu, B. Xiao, An overview of multi-modal medical image fusion, Neurocomputing 215 (2016) 3–20. [4] H. Li, X.-J. Wu, DenseFuse: a fusion approach to infrared and visible images, IEEE Trans. Image Process. 28 (5) (2019) 2614–2623. [5] J. Duan, L. Chen, C.P. Chen, Multifocus image fusion with enhanced linear spectral clustering and fast depth map estimation, Neurocomputing 318 (2018) 43–54. [6] K. He, D. Zhou, X. Zhang, R. Nie, Multi-focus: focused region finding and multi-scale transform for image fusion, Neurocomputing 320 (2018) 157–170. [7] J. Ma, Z. Zhou, B. Wang, L. Miao, H. Zong, Multi-focus image fusion using boosted random walks-based algorithm with two-scale focus maps, Neurocomputing 335 (2019) 9–20. [8] R. Nayar, D.C. Wilbur, The Bethesda System for Reporting Cervical Cytology: Definitions, Criteria, and Explanatory Notes, Springer, 2015. [9] L. Pantanowitz, M. Hornish, R. Goulart, The impact of digital imaging in the field of cytopathology, CytoJournal 6 (2009) 6. [10] T. Mertens, J. Kautz, F. Van Reeth, Exposure fusion: a simple and practical alternative to high dynamic range photography, in: Computer Graphics Forum, 28, Wiley Online Library, 2009, pp. 161–171. [11] X. Liu, W. Mei, H. Du, Structure tensor and nonsubsampled Shearlet transform based algorithm for CT and MRI image fusion, Neurocomputing 235 (2017) 131–139.
[12] V.S. Petrovic, C.S. Xydeas, Gradient-based multiresolution image fusion, IEEE Trans. Image Process. 13 (2) (2004) 228–237. [13] B. Yang, S. Li, Multifocus image fusion and restoration with sparse representation, IEEE Trans. Instrum. Meas. 59 (4) (2010) 884–892. [14] Y. Liu, X. Chen, R.K. Ward, Z.J. Wang, Image fusion with convolutional sparse representation, IEEE Signal Process. Lett. 23 (12) (2016) 1882–1886. [15] Q. Zhang, T. Shi, F. Wang, R.S. Blum, J. Han, Robust sparse representation based multi-focus image fusion with dictionary construction and local spatial consistency, Pattern Recognit. 83 (2018) 299–313. [16] H.R. Shahdoosti, H. Ghassemian, Combining the spectral PCA and spatial PCA fusion methods by an optimal filter, Inf. Fusion 27 (2016) 150–160. [17] K. Ram Prabhakar, V. Sai Srikar, R. Venkatesh Babu, DeepFuse: a deep unsupervised approach for exposure fusion with extreme exposure image pairs, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4714–4722. [18] Y. Zhang, X. Bai, T. Wang, Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure, Inf. Fusion 35 (2017) 81–101. [19] A.A. Goshtasby, S. Nikolov, Image fusion: advances in the state of the art, Inf. Fusion 8 (2007) 114–118. [20] Y. Liu, S. Liu, Z. Wang, Multi-focus image fusion with dense SIFT, Inf. Fusion 23 (2015) 139–155. [21] V.N. Gangapure, S. Banerjee, A.S. Chowdhury, Steerable local frequency based multispectral multifocus image fusion, Inf. Fusion 23 (2015) 99–115. [22] Y. Liu, X. Chen, H. Peng, Z. Wang, Multi-focus image fusion with a deep convolutional neural network, Inf. Fusion 36 (2017) 191–207. [23] H. Tang, B. Xiao, W. Li, G. Wang, Pixel convolutional neural network for multifocus image fusion, Inf. Sci. 433 (2018) 125–141. [24] S. Li, X. Kang, J. Hu, Image fusion with guided filtering, IEEE Trans. Image Process. 22 (7) (2013) 2864–2875. [25] Y. Chen, J. Guan, W.-K. Cham, Robust multi-focus image fusion using edge model and multi-matting, IEEE Trans. Image Process. 27 (3) (2018) 1526–1541. [26] S. Li, X. Kang, J. Hu, B. Yang, Image matting for fusion of multi-focus images in dynamic scenes, Inf. Fusion 14 (2) (2013) 147–162. [27] X. Xia, Y. Yao, L. Yin, S. Wu, H. Li, Z. Yang, Multi-focus image fusion based on probability filtering and region correction, Signal Process. 153 (2018) 71–82. [28] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110. [29] K. He, J. Sun, X. Tang, Guided image filtering, IEEE Trans. Pattern Anal. Mach. Intell. 35 (6) (2013) 1397–1409. [30] Y. Liu, X. Chen, Z. Wang, Z.J. Wang, R.K. Ward, X. Wang, Deep learning for pixel-level image fusion: recent advances and future prospects, Inf. Fusion 42 (2018) 158–173. [31] A.P. James, B.V. Dasarathy, Medical image fusion: a survey of the state of the art, Inf. Fusion 19 (2014) 4–19. [32] S. Li, B. Yang, J. Hu, Performance comparison of different multi-resolution transforms for image fusion, Inf. Fusion 12 (2) (2011) 74–84. [33] Q. Zhang, B.-l. Guo, Multifocus image fusion using the nonsubsampled contourlet transform, Signal Process. 89 (7) (2009) 1334–1346. [34] Z. Zhou, S. Li, B. Wang, Multi-scale weighted gradient-based fusion for multi– focus images, Inf. Fusion 20 (2014) 60–72. [35] Y. Liu, S. Liu, Z. Wang, A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion 24 (2015) 147–164. [36] R. Shen, I. Cheng, A. 
Basu, Cross-scale coefficient selection for volumetric medical image fusion, IEEE Trans. Biomed. Eng. 60 (4) (2013) 1069–1079. [37] S. Li, J.-Y. Kwok, I.-H. Tsang, Y. Wang, Fusing images with different focuses using support vector machines, IEEE Trans. Neural Netw. 15 (6) (2004) 1555–1561. [38] X. Guo, R. Nie, J. Cao, D. Zhou, L. Mei, K. He, FuseGAN: learning to fuse multi focus image via conditional generative adversarial network, IEEE Trans. Multimed. 22 (2019). in press [39] M. Amin-Naji, A. Aghagolzadeh, M. Ezoji, Ensemble of CNN for multi-focus image fusion, Inf. Fusion 51 (2019) 201–214. [40] T. Lindeberg, Scale-space theory: a basic tool for analyzing structures at different scales, J. Appl. Stat. 21 (1–2) (1994) 225–270. [41] K. Mikolajczyk, C. Schmid, Scale & affine invariant interest point detectors, Int. J. Comput. Vis. 60 (1) (2004) 63–86. [42] K. Mikolajczyk, Detection of Local Features Invariant to Affines Transformations, Institut National Polytechnique de Grenoble, 2002 Ph.D. thesis. [43] V. Kolmogorov, R. Zabin, What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26 (2) (2004) 147–159. [44] G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, K. Toyama, Digital photography with flash and no-flash image pairs, in: Proceedings of the ACM Transactions on Graphics (TOG), 23, ACM, 2004, pp. 664–672. [45] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: a system for large-scale machine learning, in: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283. [46] Z. Liu, E. Blasch, V. John, Statistical comparison of image fusion algorithms: Recommendations, Inf. Fusion 36 (2017) 251–260. [47] M. Hossny, S. Nahavandi, D. Creighton, Comments on ‘information measure for performance of image fusion’, Electron. Lett. 44 (18) (2008) 1066–1067. [48] C. Yang, J.-Q. Zhang, X.-R. Wang, X. Liu, A novel similarity based quality metric for image fusion, Inf. Fusion 9 (2) (2008) 156–160. [49] Z. Wang, A.C. Bovik, A universal image quality index, IEEE Signal Process. Lett. 9 (3) (2002) 81–84.
[50] C. Xydeas, V. Petrovic, Objective image fusion performance measure, Electron. Lett. 36 (4) (2000) 308–309.
[51] M.B.A. Haghighat, A. Aghagolzadeh, H. Seyedarabi, A non-reference image fusion metric based on mutual information of image features, Comput. Electr. Eng. 37 (5) (2011) 744–756.
[52] H.R. Sheikh, A.C. Bovik, Image information and visual quality, IEEE Trans. Image Process. 15 (2) (2006) 430–444.
[53] G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion, Electron. Lett. 38 (7) (2002) 313–315.
[54] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.

Yixiong Liang is currently an Associate Professor of Computer Science at Central South University. Between 2011 and 2012, he was a visitor at the Robotics Institute, Carnegie Mellon University. From 2005 to 2007, he was a Postdoctoral Fellow at the Institute of Automation, Chinese Academy of Sciences. He received the Ph.D., M.S. and B.S. degrees from Chongqing University, China, in 2005, 2002 and 1999, respectively. His research interests include computer vision and machine learning.

Yuan Mao is currently a master student in the School of Computer Science and Engineering at Central South University, China. Her research interests include image fusion, computer vision and machine learning.

Jiazhi Xia received the B.S. and M.S. degrees in computer science from Zhejiang University and the Ph.D. degree in computer science from Nanyang Technological University. He is currently an Associate Professor at the School of Information Science and Engineering, Central South University. His research interests include visualization, visual analytics, and computer graphics.

Yao Xiang is currently a lecturer of Computer Science at Central South University. She received the Ph.D., M.S. and B.S. degrees from Central South University, China, in 2011, 2007 and 2004, respectively. Her research interests include image processing and machine learning.

Jianfeng Liu is currently an Associate Professor of Control Science at Central South University. From 2008 to 2011, he was a Postdoctoral Fellow at CRRC Zhuzhou Locomotive Co., Ltd. Between 2009 and 2010, he was a visitor at the Electrical Engineering Institute, Michigan State University. He received the M.S. and Ph.D. degrees from Central South University, Changsha, China, in 2003 and 2008, respectively. His research interests include model predictive control and machine learning.