Infrared and visible image fusion via intensity transfer and direct matrix mapping

Zhongmin Li, Huibin Yan

Infrared Physics & Technology 102 (2019) 103030

School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China

Corresponding author. E-mail address: [email protected] (H. Yan).

https://doi.org/10.1016/j.infrared.2019.103030
Received 9 March 2019; Received in revised form 2 September 2019; Accepted 2 September 2019; Available online 6 September 2019
1350-4495/ © 2019 Elsevier B.V. All rights reserved.

Keywords: Infrared and visible image; Image fusion; Intensity transfer; Direct matrix mapping

Abstract

Infrared (IR) and visible (VIS) image fusion has recently become a research hotspot in image processing owing to its wide range of applications. In this paper, in order to make the fused images retain both the thermal object information of the IR images and the background information of the VIS images while also being more in line with human vision, we propose a novel intensity transfer and direct matrix mapping-based fusion method, termed IT-DMM. Specifically, we formulate such fusion as a minimization problem whose solution is a direct matrix mapping from the input images to the fused image. The weights used in IT-DMM, which are computed from the spatial saliency map of the IR image, are vital to the performance of the proposed method and are analyzed in detail. Thanks to the direct matrix mapping and the efficient weight computation, IT-DMM has high computational efficiency and therefore potential practical value. Qualitative and quantitative experiments on eighteen IR and VIS image pairs indicate that IT-DMM outperforms eleven state-of-the-art image fusion methods, producing perceptually better fusion results that preserve almost the same thermal object information as the IR images and background information as the VIS images.

1. Introduction

IR images highlight infrared objects well because IR sensors capture thermal radiation, but their backgrounds are generally blurred and unsatisfactory. VIS images, by contrast, are more in line with human vision and present more background details owing to the reflected-light capture mechanism of VIS sensors, but the objects in them are often difficult to distinguish when they resemble the background or when the scene suffers from harsh conditions such as fog or poor illumination. IR and VIS image fusion, which combines images of different modalities of the same scene into a composite image that describes the scene more comprehensively, has become a research hotspot in image processing owing to a variety of applications and requirements such as recognition, object detection, and surveillance [1].

Numerous fusion methods for IR and VIS images have been proposed, and we classify them into four categories, namely transform domain (TD) methods [2,3], spatial domain (SD) methods [4,5], deep learning (DL)-based methods [6,7], and other methods [8,9].

Methods based on multi-scale transform (MST) are the most typical TD methods and can generally be summarized in three phases: decomposition, fusion, and reconstruction.

Although MST-based methods usually obtain satisfactory results in most cases, because the multi-scale processing mechanism is basically consistent with human visual perception [10], they face two major problems, namely how to choose the MST and how to design the fusion rules in advance. Furthermore, MST-based methods may not perform well for IR and VIS images because the information of the two modalities at the same scale belongs to different categories [5].

Recently, edge-preserving filters (EPFs) have become an efficacious tool widely used in SD image fusion methods. Classical edge-preserving filters include the bilateral filter (BF) [11], the weighted least squares filter (WLSF) [12], the guided filter (GF) [13], and the rolling guidance filter (RGF) [14]. For convenience, in this paper all EPF-based fusion methods for IR and VIS images are classified as SD methods. Zhou et al. presented a bilateral filter-based method that perceptually achieves better results than traditional MST-based methods [5]. Ma et al. proposed an RGF-based fusion method for IR and VIS images [15]. Nevertheless, designing appropriate fusion rules for these methods remains a tough task [6].

As a new field of machine learning research, DL has been introduced into image fusion owing to its good feature extraction capabilities. Zhong et al. applied a convolutional neural network (CNN) to remote sensing image fusion [16]. Liu et al. designed a CNN-based fusion method for multi-focus images that can automatically design complex fusion rules [17].



However, because it is difficult to define ground truth for IR and VIS image fusion, it is not feasible to use a CNN to train an end-to-end model suitable for such fusion. To apply CNNs to this task anyway, Liu et al. employed a slightly modified version of the model from their previous work [17] to fuse decomposed coefficients according to their local similarity [6]. Ma et al. introduced the generative adversarial network (GAN) into IR and VIS image fusion [7]. There is no doubt that DL-based methods will achieve better fusion results in the near future, but they require powerful hardware because of their computational complexity.

Other fusion methods for IR and VIS images include hybrid methods [18,19], total variation-based methods [8,20,21], fuzzy theory-based methods [22,23], and the feature extraction and visual information preservation-based method [9]. Hybrid methods usually yield better results for IR and VIS image fusion because they combine the advantages of several different methods. Although both MST-based and sparse representation (SR)-based methods are classified as TD methods in this paper, since the latter also operate in a transform domain, the two are generally considered distinct; for example, Liu et al. proposed a universal image fusion framework combining MST and SR [18]. Other fusion methods tackle IR and VIS image fusion from various novel perspectives. For example, Ma et al. proposed gradient transfer fusion (GTF), which treats IR and VIS image fusion as a minimization problem whose cost function consists of two terms: a data fidelity term that constrains the fused image to have the same pixel intensities as the IR image, and a regularization term that constrains the fused image to have the same gradient distribution as the VIS image [8]. However, loss of small-scale details and poor visual performance are the major drawbacks of their method, because the regularization term uses the pixel gradients rather than the pixel intensities of the VIS image.

In order to make the fused images retain both the thermal object information of the IR images and the background information of the VIS images while also being more in line with human vision, in this paper we formulate such fusion as a minimization problem and then propose a novel fusion method via intensity transfer and direct matrix mapping (named IT-DMM). The main novelty of this paper is that we obtain a direct matrix mapping from the input images to the fused image, which, to the best of our knowledge, is the first such formulation for this kind of fusion. Note that our work differs significantly from the work in [8]: the method in [8] cannot obtain a direct matrix mapping between the source images and the fused image, and its fusion model must be solved by total variation minimization techniques, whose iteration process is time consuming. The weights used in IT-DMM, which are computed from the spatial saliency map of the IR image, are vital to its performance. The direct matrix mapping and the efficient weight computation give IT-DMM high computational efficiency, and therefore potential practical value. Experimental results on eighteen IR and VIS image pairs demonstrate the advantages of our IT-DMM over eleven state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 briefly introduces the spatial saliency map used to compute the weights. Section 3 elaborates on the proposed IT-DMM. Section 4 presents subjective and objective evaluations of IT-DMM against eleven state-of-the-art image fusion methods. Section 5 gives some concluding remarks.

2. Spatial saliency map

Based on the observation that contrast in visual signals, such as color, intensity, and texture, is easily captured by the human perception system, Zhai et al. proposed an effective and efficient method for computing spatial saliency maps (SSMs) from the color statistics of images [24]. Let $S(I)$ denote the spatial saliency map of image $I$; we take the computation of the saliency value $S_p$ of the pixel at position $p$ as an example:

$$S_p = \sum_{k} \left| I_p - I_k \right|, \tag{1}$$

where $I_p$ and $I_k$ denote the values of the pixels at positions $p$ and $k$ in image $I$, respectively. Because different pixels in image $I$ may share the same value, Eq. (1) can be rewritten as

$$S_p = \sum_{V=0}^{255} N_V \left| I_p - V \right|, \tag{2}$$

where $N_V$ is the total number of pixels whose value equals $V$ in image $I$. Using Eq. (2), the calculation of Eq. (1) can be accelerated via the histogram of image $I$ [24]. In this paper, the spatial saliency map $S(I)$ is normalized to $[0, 1]$. Fig. 1 shows an IR image and its SSM. We can see that the significant regions (such as the person) in Fig. 1(a) are highlighted in Fig. 1(b), while the remaining parts (the background) in Fig. 1(a) are suppressed in Fig. 1(b).

Fig. 1. The “Camp” IR source image and its SSM.
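As a concrete reference for this computation, the following is a minimal NumPy sketch of the histogram-accelerated saliency of Eq. (2), assuming an 8-bit grayscale input; the function name and the normalization epsilon are our own choices, not prescribed by the paper.

```python
import numpy as np

def spatial_saliency(img: np.ndarray) -> np.ndarray:
    """Histogram-accelerated spatial saliency map (Eq. (2)), normalized to [0, 1]."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)  # N_V for V = 0..255
    levels = np.arange(256, dtype=np.float64)
    # Saliency of each gray level V0: sum over V of N_V * |V0 - V| (a 256-entry lookup table).
    level_saliency = np.abs(levels[:, None] - levels[None, :]) @ hist
    s = level_saliency[img]  # map every pixel to the saliency of its gray level
    return (s - s.min()) / (s.max() - s.min() + 1e-12)  # normalize to [0, 1]
```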

3. The proposed method

In this section, we first present the formulation of the fusion problem and give its solution, then summarize the proposed method, and finally analyze the role of the weights used in IT-DMM. For convenience, we assume throughout this paper that $m$ and $n$ are the numbers of rows and columns of the gray images, respectively, and that all values are real numbers.

3.1. Problem formulation and solution

3.1.1. Problem formulation

While there is no clear definition of the desired result of IR and VIS image fusion, a method whose fusion results meet the following three requirements may be considered better: preserving the thermal objects of the original IR images, keeping the backgrounds of the original VIS images, and being consistent with human visual perception. Based on these three requirements, we formulate IR and VIS image fusion as the following minimization problem:

$$x^{*} = \arg\min_{x} \sum_{p} \left( (x_p - u_p)^2 + w_p (x_p - v_p)^2 \right), \tag{3}$$

where $u \in \mathbb{R}^{mn \times 1}$ and $v \in \mathbb{R}^{mn \times 1}$ are the column-vector forms of the IR and VIS images, respectively, and the subscript $p$ denotes the spatial location of a pixel. Collecting the weights $w_p$ ($w_p \geq 0$) into a diagonal weight matrix $W \in \mathbb{R}^{mn \times mn}$, the vector $x$ that minimizes Eq. (3) is the column-vector form $x^{*} \in \mathbb{R}^{mn \times 1}$ of the fused image. The first term, $\sum_p (x_p - u_p)^2 = \|x - u\|_2^2$ with $\|\cdot\|_2$ the $\ell_2$ norm, encourages the fused image to have the same pixel intensities as the IR image; the second term, $\sum_p w_p (x_p - v_p)^2$, encourages the fused image to have the same pixel intensities as the VIS image; and the weight matrix $W$ controls the trade-off between the two terms. The weights $w_p$ are vital to the performance of the proposed method (named IT-DMM), and the weight $w_p$ at $p$ is computed as

$$w_p = |\log(S_p)|, \tag{4}$$

where $S_p$ denotes the saliency value of the pixel at $p$ and $|\cdot|$ denotes the absolute value.

3.1.2. Problem solution

We rewrite Eq. (3) in matrix form as

$$x^{*} = \arg\min_{x} \left\{ (x - u)^{\mathsf{T}} (x - u) + (x - v)^{\mathsf{T}} W (x - v) \right\}. \tag{5}$$

Let $F(x) = (x - u)^{\mathsf{T}} (x - u) + (x - v)^{\mathsf{T}} W (x - v)$ and set the derivative of $F(x)$ with respect to $x$ to zero; the column vector $x^{*} \in \mathbb{R}^{mn \times 1}$ must then satisfy

$$(\mathbf{I} + W)\, x^{*} - (u + W v) = 0, \tag{6}$$

where $\mathbf{I} \in \mathbb{R}^{mn \times mn}$ is the identity matrix. Since $\mathbf{I} + W$ is a diagonal matrix, the column vector $x^{*}$ of the fused image can finally be expressed as

$$x^{*} = (\mathbf{I} + W)^{-1} (u + W v), \tag{7}$$

where $(\cdot)^{-1}$ denotes the matrix inverse. The solving process of Eq. (5) is the same as that in [25,26]. Eq. (7) is the direct matrix mapping from the IR and VIS images to the fused image, which is both effective and efficient.
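Because $\mathbf{I} + W$ is diagonal, Eq. (7) can equivalently be written pixel by pixel, which makes the intensity-transfer interpretation explicit:

$$x_p^{*} = \frac{u_p + w_p v_p}{1 + w_p} = \frac{1}{1 + w_p}\, u_p + \frac{w_p}{1 + w_p}\, v_p,$$

i.e., each fused pixel is a convex combination of the corresponding IR and VIS pixels, with the two coefficients summing to one.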

3.2. Procedure of the proposed method

The procedure of the proposed method is summarized in Algorithm 1.

Algorithm 1: The proposed method for IR and VIS fusion
Input: infrared image u, visible image v
Output: fused image x
1. Calculate W according to Eqs. (2) and (4).
2. Compute the fused image x according to Eq. (7).
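As an illustration only, here is a minimal NumPy sketch of Algorithm 1 under our reading of Eqs. (4) and (7); the function name, the uint8 input assumption, the choice of the natural logarithm, and the small epsilon guarding log(0) are our own and are not specified in the paper.

```python
import numpy as np

def it_dmm_fuse(ir: np.ndarray, vis: np.ndarray, saliency: np.ndarray,
                eps: float = 1e-6) -> np.ndarray:
    """Fuse registered grayscale IR/VIS images via Eqs. (4) and (7).

    `saliency` is the spatial saliency map of the IR image in [0, 1],
    e.g. produced by the spatial_saliency sketch in Section 2.
    """
    u = ir.astype(np.float64)
    v = vis.astype(np.float64)
    # Step 1: weights w_p = |log(S_p)| (Eq. (4)); eps avoids log(0) where S_p = 0.
    w = np.abs(np.log(np.clip(saliency, eps, 1.0)))
    # Step 2: direct matrix mapping x* = (I + W)^{-1} (u + W v) (Eq. (7)),
    # evaluated element-wise because I + W is diagonal.
    fused = (u + w * v) / (1.0 + w)
    return np.clip(fused, 0, 255).astype(np.uint8)

# Example usage on a registered pair (hypothetical inputs):
#   ir  = ...   # 2-D uint8 array of the IR image
#   vis = ...   # 2-D uint8 array of the VIS image
#   fused = it_dmm_fuse(ir, vis, spatial_saliency(ir))
```

Note that as $S_p \to 0$ the weight $w_p$ grows without bound and the fused pixel is taken almost entirely from the VIS image; the clipping above is just one possible numerical guard, since the paper does not state how $S_p = 0$ is handled.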

3.3. Analysis of the weights $w_p$

Fig. 2 shows the graph of the function $f(x) = |\log(x)|$. From Fig. 2 we can see that a very small $x$ yields a large $f(x)$, and only when $x$ is large enough (close to 1) does $f(x)$ become small. Note that the range of $S_p$ is $[0, 1]$, so the range of $w_p$ is $[0, \infty)$ according to Eq. (4).

Next, we explain why the weight $w_p$ at $p$ computed by Eqs. (2) and (4) works. When $p$ lies in a significant region (a thermal object) of the IR image, the saliency value $S_p$ of the pixel at $p$ is large according to Eq. (2), and the corresponding weight $w_p$ computed by Eq. (4) is therefore small. The smaller the weight $w_p$, the more the pixel at $p$ in the fused image is transferred from the IR image and the less from the VIS image in order to minimize Eq. (3). In contrast, when $p$ lies in the remaining parts (the background) of the IR image, the saliency value $S_p$ is small, and the corresponding weight $w_p$ is large. The larger the weight $w_p$, the more the pixel at $p$ in the fused image is transferred from the VIS image and the less from the IR image in order to minimize Eq. (3).
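As a concrete numerical illustration (assuming the natural logarithm in Eq. (4)): for a pixel in a salient region with, say, $S_p = 0.9$, the weight is $w_p = |\ln 0.9| \approx 0.105$, so by the per-pixel form of Eq. (7) about $1/(1 + 0.105) \approx 91\%$ of the fused value comes from the IR image; for a background pixel with $S_p = 0.1$, $w_p = |\ln 0.1| \approx 2.303$, and about $2.303/3.303 \approx 70\%$ of the fused value comes from the VIS image.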

Fig. 2. The graph of the function $f(x) = |\log(x)|$.

4. Experiments

This section consists of three parts: instructions about the experiments, subjective results, and objective results compared with eleven state-of-the-art methods.

4.1. Instructions about the experiments

4.1.1. Dataset

Eighteen pairs of images that are popular in the IR and VIS fusion domain, shown in Fig. 3, are adopted in our experiments. The first thirteen pairs, (a)-(m), were downloaded from the TNO dataset [27]; three pairs, (n)-(p), were provided by Zhou, the first author of [5]; and the remaining two pairs were provided by Zhang, the first author of [9].

4.1.2. Compared methods

Eleven state-of-the-art IR and VIS image fusion methods are compared with our IT-DMM: three TD methods (DTCWT [28], NSCT [29], and LATLRR [30]), three SD methods (GF [4], RGF [15], and HMSD [5]), three DL-based methods (CNN [6], DF [31], and GAN [7]), and two other methods (MST-SR [18] and GTF [8]). GF and MST-SR were proposed within the past six years; LATLRR, RGF, HMSD, GTF, CNN, DF, and GAN were proposed within the past three years; and LATLRR and the three DL-based methods (CNN, DF, and GAN) were published within just the last year. The codes of the eleven compared methods are publicly available, and all their parameters are kept unchanged.

4.1.3. Evaluation metrics

As there is no evidence that any single image fusion evaluation metric is better than the others, six evaluation metrics are selected to evaluate the different methods quantitatively: entropy (EN), standard deviation (SD), two mutual information-based metrics (NMI [32] and QTE [33]), the structural similarity index measure (SSIM) [34], and the human perception inspired fusion metric (QM) [35]. EN and SD are widely used to predict the quality of the fused image; here, they are used only as auxiliary metrics because of their unstable performance [1,36]. NMI and QTE measure the information of the fused image transferred from the IR and VIS images. SSIM measures the structural similarity between two images; note that SSIM is calculated only between the VIS image and the fused image, to better demonstrate the superiority of our IT-DMM. QM is a human perception-based evaluation metric. For the first five metrics a larger value means better performance of the fusion method, while for QM the contrary is the case. The codes for all the metrics are publicly available, and we keep their parameters unchanged.
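As a concrete illustration of the two auxiliary metrics, the following is a minimal sketch of EN and SD for an 8-bit grayscale fused image, using their standard definitions; the implementations actually used in the experiments are the publicly available codes mentioned above.

```python
import numpy as np

def entropy_en(img: np.ndarray) -> float:
    """Shannon entropy (EN) of an 8-bit grayscale image, in bits."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 is well defined
    return float(-(p * np.log2(p)).sum())

def standard_deviation_sd(img: np.ndarray) -> float:
    """Standard deviation (SD) of the pixel intensities."""
    return float(img.astype(np.float64).std())
```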


Fig. 3. Eighteen source image pairs used in our experiments.

4.2. Subjective results compared with eleven methods

Figs. 4–7 show the fused images produced by all the methods on four pairs of IR and VIS images, namely “Bunker”, “Tank”, “Marne 04”, and “Bench”. Some regions of the images are labeled with red or green rectangles, and some of them are enlarged in the lower left or lower right corner of the corresponding images for better comparison. As shown in Figs. 4–7(a)-(b), the objects in the IR images are obvious, while the backgrounds in the VIS images are clear.

DTCWT, NSCT, GF, MST-SR, LATLRR, HMSD, and DF fail to retain the object of Fig. 4(a) well, as labeled by the red rectangles in Fig. 4(c)-(g), (i), and (l). GTF and GAN keep the object well, but they have poor capability in preserving the background information of Fig. 4(b), as illustrated in Fig. 4(k) and (m). RGF does not keep the intensities of the background in Fig. 4(b) well, as illustrated in Fig. 4(h). CNN fuses this source image pair well, but the object appears overexposed and the “AUTO” region labeled with a green rectangle is slightly darker than in Fig. 4(a), as shown in Fig. 4(j). Our IT-DMM copes well with all the problems described above, as illustrated in Fig. 4(n): the object in its fused image is almost the same as that in Fig. 4(a), and the background is consistent with that in Fig. 4(b).


Fig. 4. Fusion results on the “Bunker” source image pair.

As illustrated in Fig. 5, the situation is similar to that in Fig. 4, except that RGF and GAN fail to retain the object of Fig. 5(a) well, as labeled by the red rectangles in Fig. 5(h) and (m).

As illustrated in Fig. 6, none of the compared methods except GF copes well with the background details of Fig. 6(b) (such as the parts labeled with a red rectangle and enlarged in the lower left corner); GF, however, loses too much spectral information of Fig. 6(a) (see the clouds in the upper left), as illustrated in Fig. 6(e).



Fig. 5. Fusion results on the “Tank” source image pair.

Furthermore, the fused images of DTCWT, NSCT, MST-SR, HMSD, and CNN are affected by artifacts in the regions labeled with green rectangles and enlarged in the lower right corner, as illustrated in Fig. 6(c)-(d), (f), and (i)-(j). As shown in Fig. 6(n), our IT-DMM is free of both of the above problems, and its fused image is more in line with human vision.

As illustrated in Fig. 7, none of the compared methods keeps the object of Fig. 7(a) (the parts labeled with a red rectangle) as well as the proposed method, while the details of the background in Fig. 7(b) are also kept well by our IT-DMM.



Fig. 6. Fusion results on the “Marne 04” source image pair.

All in all, the fused results of the two MST-based image fusion methods (DTCWT and NSCT) tend to have three defects: lower contrast, loss of spectral information, and halo artifacts. The fused results obtained by LATLRR tend to have lower contrast and to lose some details. The fused results obtained by GF tend to lose most of the spectral information of the IR images. The fused images obtained by MST-SR are more informative, but the artifacts in them are obvious. RGF usually obtains average fusion results, but in some cases (such as Fig. 5(g)) the object information is not kept well. HMSD obtains perceptually good results; however, halo artifacts can be found in them (see Fig. 6(i)). CNN is essentially a Laplacian pyramid-based image fusion method, so halo artifacts still exist in its fused results. DF handles the details of the source images well, but it cannot handle the thermal objects in the IR images well. GTF and GAN retain the thermal radiation information well; however, these two methods ignore the intensities of the VIS images, resulting in poor visual performance. As for our IT-DMM, the thermal objects in the fused images are almost the same as those in the IR images, the backgrounds are consistent with those in the VIS images, and the fused images are more in line with human vision. The fused results obtained by our IT-DMM on the remaining fourteen pairs of source images are illustrated in Fig. 8.


Fig. 7. Fusion results on the “Bench” source image pair.



Fig. 8. The fused results obtained by our IT-DMM on the remaining fourteen pairs of source images.

4.3. Objective results compared with eleven methods

In this subsection, the six image fusion evaluation metrics discussed above are used to quantitatively compare the twelve methods on the eighteen pairs of IR and VIS images shown in Fig. 3. The value of each metric for each method on all the source images is presented in Fig. 9, and the average value of each metric for all the methods is given in Table 1, where the best value of each metric is shown in bold.

We can see from Fig. 9 and Table 1 that our IT-DMM achieves moderate results on EN and SD: its average EN is inferior to that of GF, MST-SR, HMSD, and CNN, and its average SD is inferior to that of MST-SR, HMSD, and CNN. On the remaining four metrics, NMI, QTE, SSIM, and QM, our IT-DMM achieves the best results for most image pairs, and its average values rank first among the twelve methods. Note that EN and SD reflect the amount of information contained in an image and its contrast, respectively, so their values may become larger when noise or artifacts exist in the fused image; Refs. [1,36] also mention the unstable performance of these two metrics, which is why we use them only as auxiliary metrics, as explained in Section 4.1.3. The performance of IT-DMM on the remaining four metrics indicates that its fused results are more informative and more consistent with human visual perception than those of the other eleven methods. Combining this with the discussion of the six evaluation metrics in Section 4.1.3, we conclude that our IT-DMM achieves advanced objective results.

Next, the computational efficiency of the methods is compared, except for DF and GAN, because these two methods are implemented in Python while the others are implemented in MATLAB. The experiment is conducted on a desktop computer with a 3.20 GHz CPU and 4.00 GB of memory, running MATLAB 2016a on Windows 7. The average running times of the ten methods on the eighteen pairs of IR and VIS images shown in Fig. 3 are listed in Table 2, which reveals that our IT-DMM has potential practical value owing to its high computational efficiency.
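For readers who wish to reproduce the timing comparison, a minimal sketch of how a per-image average running time can be measured is given below; this harness is our own and is not the one used to produce Table 2, and it assumes the spatial_saliency and it_dmm_fuse sketches from Sections 2 and 3.2.

```python
import time
import numpy as np

def average_runtime(ir: np.ndarray, vis: np.ndarray, repeats: int = 10) -> float:
    """Average wall-clock time (seconds) of one complete IT-DMM fusion."""
    start = time.perf_counter()
    for _ in range(repeats):
        it_dmm_fuse(ir, vis, spatial_saliency(ir))  # full pipeline, including the saliency map
    return (time.perf_counter() - start) / repeats
```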


Fig. 9. Quantitative comparisons of the twelve methods on the eighteen pairs of IR and VIS images shown in Fig. 3 using six image fusion evaluation metrics (EN, SD, NMI, QTE, SSIM, and QM). In general, the larger the first five metrics are, the better the fusion method is, while for the metric QM the contrary is the case.

5. Conclusions

In this paper, we proposed a fusion method named IT-DMM, which formulates IR and VIS image fusion as a minimization problem whose solution is a direct matrix mapping from the input images to the fused image; to the best of our knowledge, this is novel. We use the spatial saliency map of the IR image to compute the weights used in IT-DMM, which is simple and efficient. Benefiting from the efficient weight computation and the direct matrix mapping, IT-DMM has high computational efficiency and thus potential practical value. Qualitative experiments on eighteen IR and VIS image pairs demonstrate that IT-DMM achieves state-of-the-art results: its fused images retain both the thermal objects of the IR images and the backgrounds of the VIS images while being more in line with human vision. Quantitative comparisons demonstrate that the results fused by IT-DMM are more informative and more consistent with human visual perception than those of eleven state-of-the-art methods. Furthermore, the high computational efficiency of IT-DMM is verified by comparing the average running time on the eighteen pairs of IR and VIS images with nine other fusion methods.

Although IT-DMM performs well in most cases, it also has some disadvantages in certain cases. First, it tends to lose thermal object information that is not obvious in the IR images. Specifically, our basic assumption is that thermal objects are the significant regions of the IR images; when the thermal object information in an IR image is not obvious, it is suppressed in the spatial saliency map and is therefore not transferred to the fused image. Fortunately, most thermal object information in IR images is obvious. Second, when the background of the VIS image is much brighter than that of the IR image, the thermal objects may not appear as bright as they do in the IR image.

This work provides a novel direction for IR and VIS image fusion. In the future, two main issues will be studied: new ways to design and compute the weights $w_p$, and new frameworks for finding the matrix mapping between the input images and the fused image.


Table 1
The average value of each metric for all the methods on the eighteen pairs of IR and VIS images shown in Fig. 3.

Method    EN      SD       NMI     QTE     SSIM    QM
DTCWT     6.4451  27.0366  0.2623  0.3548  0.7390  486.77
NSCT      6.4757  27.6670  0.2671  0.3493  0.7439  474.76
GF        6.8543  35.6759  0.4714  0.3916  0.7590  539.32
MST-SR    7.1027  42.3625  0.3712  0.3436  0.7425  569.39
LATLRR    6.4608  27.5347  0.3026  0.4093  0.7343  438.02
RGF       6.6696  34.6019  0.3127  0.3569  0.7511  378.49
HMSD      6.9469  41.0889  0.3672  0.3685  0.7931  422.48
CNN       6.9760  42.4945  0.3622  0.3539  0.7577  389.80
GTF       6.4672  31.4284  0.3266  0.3685  0.6915  964.71
DF        6.7089  32.6315  0.3549  0.3573  0.7210  444.90
GAN       6.5789  30.6167  0.2889  0.3970  0.5837  861.06
IT-DMM    6.7203  36.0407  0.5085  0.4101  0.8355  353.78

Table 2
The average running times of the ten methods on the eighteen pairs of IR and VIS images shown in Fig. 3 (unit: second).

Method  DTCWT   NSCT    GF      MST-SR  LATLRR    RGF     HMSD    CNN       GTF      IT-DMM
Time    0.5406  5.1237  0.7439  1.8359  394.5606  4.1656  6.3126  110.7077  10.1885  0.2152

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper entitled “Infrared and Visible Image Fusion via Intensity Transfer and Direct Matrix Mapping”.

Acknowledgments

The authors would like to thank the editor Prof. Shi and the anonymous reviewers for their valuable comments. This work was partly supported by the National Natural Science Foundation of China (Grant No. 61263040), the Scientific Research Foundation of Jiangxi Provincial Education Department in China (Grant No. GJJ170602), the China Scholarship Council (File No. 201608360166), and the Postgraduate Innovation Special Foundation of Nanchang Hangkong University (Grant No. YC2018019).

References

[1] J. Ma, Y. Ma, C. Li, Infrared and visible image fusion methods and applications: a survey, Inf. Fusion 45 (2019) 153–178.
[2] B. Zhang, X. Lu, H. Pei, Y. Zhao, A fusion algorithm for infrared and visible images based on saliency analysis and non-subsampled Shearlet transform, Infrared Phys. Technol. 73 (2015) 286–297.
[3] B. Cheng, L. Jin, G. Li, General fusion method for infrared and visual images via latent low-rank representation and local non-subsampled shearlet transform, Infrared Phys. Technol. 92 (2018) 68–77.
[4] S. Li, X. Kang, J. Hu, Image fusion with guided filtering, IEEE Trans. Image Process. 22 (7) (2013) 2864–2875.
[5] Z. Zhou, B. Wang, S. Li, M. Dong, Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with Gaussian and bilateral filters, Inf. Fusion 30 (2016) 15–26.
[6] Y. Liu, X. Chen, J. Cheng, H. Peng, Z. Wang, Infrared and visible image fusion with convolutional neural networks, Int. J. Wavelets Multiresolut. Inf. Process. 16 (3) (2018) 1850018.
[7] J. Ma, W. Yu, P. Liang, C. Li, J. Jiang, FusionGAN: a generative adversarial network for infrared and visible image fusion, Inf. Fusion 48 (2019) 11–26.
[8] J. Ma, C. Chen, C. Li, J. Huang, Infrared and visible image fusion via gradient transfer and total variation minimization, Inf. Fusion 31 (2016) 100–109.
[9] Y. Zhang, L. Zhang, X. Bai, L. Zhang, Infrared and visual image fusion through infrared feature extraction and visual information preservation, Infrared Phys. Technol. 83 (2017) 227–237.
[10] G. Piella, A general framework for multiresolution image fusion: from pixels to regions, Inf. Fusion 4 (4) (2003) 259–280.
[11] C. Tomasi, R. Manduchi, Bilateral filtering for gray and color images, Proceedings of the International Conference on Computer Vision, 1998, pp. 839–846.
[12] Z. Farbman, R. Fattal, D. Lischinski, R. Szeliski, Edge-preserving decompositions for multi-scale tone and detail manipulation, ACM Trans. Graph. 27 (3) (2008) 67.
[13] K. He, J. Sun, X. Tang, Guided image filtering, Proceedings of the European Conference on Computer Vision, Springer, 2010, pp. 1–14.
[14] Q. Zhang, X. Shen, L. Xu, J. Jia, Rolling guidance filter, Proceedings of the European Conference on Computer Vision, Springer, 2014, pp. 815–830.
[15] J. Ma, Z. Zhou, B. Wang, H. Zong, Infrared and visible image fusion based on visual saliency map and weighted least square optimization, Infrared Phys. Technol. 82 (2017) 8–17.
[16] J. Zhong, B. Yang, G. Huang, F. Zhong, Z. Chen, Remote sensing image fusion with convolutional neural network, Sens. Imaging 17 (1) (2016) 1–16.
[17] Y. Liu, X. Chen, H. Peng, Z. Wang, Multi-focus image fusion with a deep convolutional neural network, Inf. Fusion 36 (2017) 191–207.
[18] Y. Liu, S. Liu, Z. Wang, A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion 24 (2015) 147–164.
[19] M. Yin, P. Duan, W. Liu, X. Liang, A novel infrared and visible image fusion algorithm based on shift-invariant dual-tree complex shearlet transform and sparse representation, Neurocomputing 226 (2017) 182–191.
[20] Y. Ma, J. Chen, C. Chen, F. Fan, J. Ma, Infrared and visible image fusion using total variation model, Neurocomputing 202 (2016) 12–19.
[21] H. Guo, Y. Ma, X. Mei, J. Ma, Infrared and visible image fusion based on total variation and augmented Lagrangian, J. Opt. Soc. Am. A 34 (11) (2017) 1961–1968.
[22] X. Bai, Infrared and visual image fusion through fuzzy measure and alternating operators, Sensors 15 (7) (2015) 17149–17167.
[23] S.K. Kashyap, IR and color image fusion using interval type 2 fuzzy logic system, 2015 International Conference on Cognitive Computing and Information Processing (CCIP), Noida, India, IEEE, 2015, pp. 1–4.
[24] Y. Zhai, M. Shah, Visual attention detection in video sequences using spatiotemporal cues, Proceedings of the 14th ACM International Conference on Multimedia, 2006, pp. 815–824.


[25] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition? Proceedings of the 2011 International Conference on Computer Vision, 2011, pp. 471–478.
[26] Z. Xia, X. Peng, X. Feng, A. Hadid, Scarce face recognition via two-layer collaborative representation, IET Biometrics 7 (1) (2018) 56–62.
[27] A. Toet, TNO Image Fusion Dataset, April 2015. http://figshare.com/articles/TNO_Image_Fusion_Dataset/1008029.
[28] J.J. Lewis, R.J. O'Callaghan, S.G. Nikolov, D.R. Bull, N. Canagarajah, Pixel- and region-based image fusion with complex wavelets, Inf. Fusion 8 (2) (2007) 119–130.
[29] Q. Zhang, B. Guo, Multifocus image fusion using the nonsubsampled contourlet transform, Signal Process. 89 (7) (2009) 1334–1346.
[30] H. Li, X. Wu, Infrared and visible image fusion using latent low-rank representation, arXiv preprint arXiv:1804.08992 (2018).
[31] H. Li, X. Wu, DenseFuse: a fusion approach to infrared and visible images, IEEE Trans. Image Process. 28 (5) (2019) 2614–2623.
[32] M. Hossny, S. Nahavandi, D. Creighton, Comments on 'Information measure for performance of image fusion', Electron. Lett. 44 (18) (2008) 1066–1067.
[33] N. Cvejic, C.N. Canagarajah, D.R. Bull, Image fusion metric based on mutual information and Tsallis entropy, Electron. Lett. 42 (11) (2006) 626–627.
[34] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[35] H. Chen, P.K. Varshney, A human perception inspired quality metric for image fusion based on regional information, Inf. Fusion 8 (2) (2007) 193–207.
[36] X. Jin, Q. Jiang, S. Yao, D. Zhou, R. Nie, J. Hai, K. He, A survey of infrared and visual image fusion methods, Infrared Phys. Technol. 85 (2017) 478–501.
