Displays 26 (2005) 191–196 www.elsevier.com/locate/displa
Fusion of visible and infrared imagery for night color vision

V. Tsagaris*, V. Anastassopoulos

Electronics and Computers Division, Physics Department, University of Patras, Greece, 26500

Received 20 May 2004; accepted 29 June 2005
Available online 26 July 2005
Abstract

A combined approach for fusing night-time infrared with visible imagery is presented in this paper. Night color vision is thus accomplished and the final scene has a natural day-time color appearance. Fusion is based either on non-negative matrix factorization or on a transformation that takes into consideration perceptual attributes. The final obtained color images possess a natural day-time color appearance due to the application of a color transfer technique. In this way inappropriate color mappings are avoided and the overall discrimination capabilities are enhanced. Two different data sets are employed and the experimental results establish the overall method as being efficient, compact and perceptually meaningful.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Night vision; Infrared imagery; Image fusion
* Corresponding author. Tel.: +302610997445; fax: +30261997456. E-mail address: [email protected] (V. Tsagaris).
0141-9382/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.displa.2005.06.007

1. Introduction

The availability of modern night vision systems, like image intensifiers and thermal cameras, enables operations during night and in adverse weather conditions. Night vision cameras deliver monochrome (gray level or greenish) images that are usually hard to interpret and may give rise to visual illusions and loss of situational awareness. The two most common night-time imaging systems display either emitted infrared (IR) radiation or dim reflected light. In this way the different imaging modalities give complementary information about the objects or area under inspection. Thus, techniques for fusing IR and intensified visual imagery, or IR with color day-time imagery, should be employed in order to provide a compact representation of the scene with increased interpretation capabilities.

In the context of thermal imaging a number of color fusion representations have been proposed [1–4]. A simple mapping of thermal bands into the three components of an RGB image can provide an immediate benefit, since the human eye can discriminate several thousand colors but only a few dozen gray levels. On the other hand, inappropriate color mappings may hinder situational awareness [5–6] due to lack of color constancy. Hence, an image fusion method for night vision imagery must result in color images with natural appearance and a high degree of similarity with the corresponding natural scenes.

The main goal of this work is to present a strategy for integrating infrared information into images with natural colors. Firstly, fusion is carried out by means of two different algorithms. The main purpose is to fuse the RGB components of the color image from the visible region of the electromagnetic spectrum with the corresponding grayscale infrared image. The first fusion method employs non-negative matrix factorization (NMF) [7,8] in order to derive a part-based additive representation of the source data. Three components are derived from NMF and mapped to RGB channels in order to form a color image. In this way a part-based representation with improved detection capabilities is obtained, since the important features of the source data can be found in different components. Alternatively, a second fusion method [9] can be used, which is based on perceptual attributes and incorporates a linear transformation that yields a color image whose covariance matrix is adjustable. A suitable choice of this matrix improves color perception, while the features that are best detected or presented in the source images are merged into a robust color representation. The result of the fusion process possesses enhanced interpretation capabilities as far as the infrared component is concerned.
V. Tsagaris, V. Anastassopoulos / Displays 26 (2005) 191–196
In order to further improve the discrimination capabilities, the fusion process is followed by a color transfer technique similar to that proposed in [10]. In this way the color constancy is increased and the final image has an overall natural color appearance, similar to that of the original color image from the visible region of the electromagnetic spectrum. Moreover, the overall color balance of the scene is sustained and lack of color constancy is avoided. The proposed approach can be applied to night-time surveillance infrared cameras when the corresponding day-time color scenes are available, or in the case of intensifying visual cameras with more than three channels for fusion with IR imagery.

This paper is organized as follows. Section 2 provides the theoretical background of the two fusion methods employed in this work. The color transfer technique that is applied to the fused images in order to provide them with a natural color appearance is presented in Section 3. In Section 4 a detailed description of the two data sets used is given, and the results of the fusion and the color transfer approaches are presented together with a discussion. Concluding remarks are provided in Section 5.
2. Image fusion methods

2.1. Non-negative matrix factorization

Unsupervised approaches for finding the ‘appropriate’ features from a data set, such as principal components analysis (PCA), can be understood as factorizing a data matrix subject to different constraints. Depending upon the constraints utilized, the resulting factors can be shown to have very different representational properties. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a part-based representation because they allow only additive, not subtractive, combinations. Principal components analysis enforces only a weak orthogonality constraint, resulting in a very distributed representation that uses cancellations to generate variability. Non-negativity is a useful constraint for matrix factorization that can learn a part representation of the data. In this way NMF is shown to be a useful decomposition for multivariate data [7–8]. The NMF requires algorithms for solving the following problem: given a non-negative matrix $V$, find non-negative matrix factors $W$ and $H$ such that

$$V \approx WH \tag{1}$$
NMF can be applied to the statistical analysis of multivariate data in the following manner. Given a set of multivariate $n$-dimensional data vectors, the vectors are regarded as being the columns of an $n \times m$ matrix $V$, where $m$ is the number of examples in the data set. This matrix is then approximately factorized into an $n \times r$ matrix $W$ and an $r \times m$ matrix $H$. Usually $r$ is chosen to be smaller than $n$ or $m$, so that $W$ and $H$ are smaller than the original matrix $V$. This results in a compressed version of the original data matrix. The approximation in (1) can be rewritten column by column as $v \approx Wh$, where $v$ and $h$ are the corresponding columns of $V$ and $H$. In other words, each data vector $v$ is approximated by a linear combination of the columns of $W$, weighted by the components of $h$. Therefore $W$ can be regarded as containing a basis that is optimized for the linear approximation of the data in $V$. Since relatively few basis vectors are used to represent many data vectors, good approximation can only be achieved if the basis vectors discover structure that is latent in the data.

NMF is usually realized through iterative updates of $W$ and $H$, which have proven very useful in practical applications. Other algorithms may possibly be more efficient concerning overall computation time, but can be considerably more difficult to implement. At each iteration of the algorithms, the new value of $W$ or $H$ is found by multiplying the current value by some factor that depends on the quality of the approximation in (1). It was proven in [7] that the quality of the approximation improves monotonically with the application of these multiplicative update rules. In practice, this means that repeated iteration of the update rules is guaranteed to converge to a locally optimal matrix factorization.

In order to find an approximate factorization $V \approx WH$, a cost function that quantifies the quality of the approximation must be defined. Such a cost function can be constructed using some measure of distance between two non-negative matrices $V$ and $WH$. One useful measure is the squared norm of the difference between $V$ and $WH$,

$$\|V - WH\|^2 = \sum_{ij} \left( V_{ij} - (WH)_{ij} \right)^2 \tag{2}$$
This is lower bounded by zero, and clearly vanishes if and only if $V = WH$. Another useful measure is

$$D(V \| WH) = \sum_{ij} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right) \tag{3}$$
Like the norm, the quantity in (3) is also lower bounded by zero, and vanishes if and only if $V = WH$. However, it cannot be called a ‘distance’ because it is not symmetric in $V$ and $WH$. Therefore, we will refer to it as the ‘divergence’ of $V$ from $WH$. It reduces to the Kullback-Leibler divergence, or relative entropy, when $\sum_{ij} V_{ij} = \sum_{ij} (WH)_{ij} = 1$, so that $V$ and $WH$ can be regarded as normalized probability density functions.

Although the functions $\|V - WH\|$ and $D(V \| WH)$ are convex in $W$ only or $H$ only, they are not convex in both variables simultaneously. Therefore, it is unrealistic to expect an algorithm to optimize these functions in the sense of finding global minima. Nevertheless, there are many techniques from numerical optimization that can be applied to find local minima. Gradient descent is perhaps the simplest technique to implement, but convergence can be slow and it always leads to the nearest local optimum. Other methods such as conjugate gradient have faster convergence, at least in the vicinity of local minima, but their implementation is more complicated compared to gradient descent. The convergence of gradient-based methods also has the disadvantage of being very sensitive to the choice of step size, which can be very inconvenient for large applications.

A multiplicative update rule can be a good compromise between speed and ease of implementation for finding local minima of the cost functions described above. The norm $\|V - WH\|$ is nonincreasing under the update rules

$$H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}, \qquad W_{ia} \leftarrow W_{ia} \frac{(V H^T)_{ia}}{(W H H^T)_{ia}} \tag{4}$$

where the subscripts denote matrix elements, so that each update acts element by element as in [7].
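The multiplicative update rules for the norm $\|V - WH\|$ described in Section 2.1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the helper names (`matmul`, `transpose`, `squared_error`, `nmf`), the random initialization, the small constant `eps` that guards the division, and the toy matrix `V` (four rows standing in for flattened R, G, B and IR bands) are all hypothetical choices made for the example.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, wh):
    """Cost of Eq. (2): sum over all entries of (V_ij - (WH)_ij)^2."""
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, r, n_iter=200, seed=0, eps=1e-12):
    """Approximate non-negative V (n x m) by W (n x r) times H (r x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start (an arbitrary choice for this sketch).
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    costs = []
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element by element.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(r)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)]
             for i in range(n)]
        costs.append(squared_error(v, matmul(w, h)))
    return w, h, costs


# Toy data: 4 rows standing in for flattened R, G, B and IR bands, 6 pixels each.
V = [[0.9, 0.8, 0.1, 0.2, 0.5, 0.4],
     [0.7, 0.9, 0.2, 0.1, 0.6, 0.3],
     [0.2, 0.1, 0.8, 0.9, 0.4, 0.5],
     [0.1, 0.1, 0.1, 0.9, 0.1, 0.1]]
W, H, costs = nmf(V, r=3)
print("cost after first / last iteration:", costs[0], costs[-1])
```

With `r=3`, the three rows of `H` could play the role of the three components that are mapped to the RGB channels, although how the bands are arranged into `V` is an assumption here. The recorded `costs` make the nonincreasing behaviour of the squared error easy to inspect.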
The norm $\|V - WH\|$ is invariant under the updates of Eq. (4) if and only if $W$ and $H$ are at a stationary point of the distance. The divergence $D(V \| WH)$ is nonincreasing under the update rules

$$H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_i W_{ia} V_{i\mu} / (WH)_{i\mu}}{\sum_k W_{ka}}, \qquad W_{ia} \leftarrow W_{ia} \frac{\sum_\mu H_{a\mu} V_{i\mu} / (WH)_{i\mu}}{\sum_\nu H_{a\nu}} \tag{5}$$

written element-wise in the notation of [7]. The divergence is invariant under these updates if and only if $W$ and $H$ are at a stationary point of the divergence. It is straightforward to see that each multiplicative factor is unity when $V = WH$, so that perfect reconstruction is necessarily a fixed point of the update rules.

2.2. Image fusion based on perceptual attributes

A different approach for color image fusion is proposed in [9]. In this method the core idea is not to totally decorrelate the data as in the case of PCA, because the result is unpleasing for the human observer, but to control the correlation among the color components of the final image. This is achieved by appropriate selection of the covariance matrix of the final fused image. The transformation distributes the energy of the source multispectral bands, so that the correlation among the RGB components of the final image is similar to that of natural color images. In this way no additional color space transformation is needed and direct representation on any RGB display can be applied. This can be achieved using a linear transformation of the form

$$y = A^T x \tag{6}$$

where $x$ and $y$ are the population vectors of the source and the final images respectively. The relation between the covariance matrices of $y$ and $x$ is

$$C_y = A^T C_x A \tag{7}$$

The resulting covariance matrix $C_y$ is chosen as the covariance matrix of a natural color image of the scene. The matrices $C_x$ and $C_y$ are of the same dimension and, if they are known, the transformation matrix $A$ can be evaluated using the Cholesky factorization method. Accordingly, a symmetric positive definite matrix $S$ can be decomposed by means of an upper triangular matrix $Q$, so that

$$S = Q^T Q \tag{8}$$

The matrices $C_x$, $C_y$ using the Cholesky factorization can be written as

$$C_x = Q_x^T Q_x, \qquad C_y = Q_y^T Q_y \tag{9}$$

and Eq. (7) becomes

$$Q_y^T Q_y = A^T Q_x^T Q_x A = (Q_x A)^T Q_x A \tag{10}$$

thus

$$Q_y = Q_x A \tag{11}$$

and the transformation matrix $A$ is

$$A = Q_x^{-1} Q_y \tag{12}$$

The final form of the transformation matrix $A$ implies that the proposed transformation depends on the statistical properties of both the original and final images. The resulting population vector $y$ is of the same order as the original population vector $x$, but only three of the components of $y$ will be used for color representation. The relation between the covariance $C_y$ and the correlation coefficient matrix $R_y$ is given by

$$C_y = S R_y S^T \tag{13}$$

where $S$ (here distinct from the generic matrix of Eq. (8)) is

$$S = \begin{bmatrix} \sigma_{y_1} & 0 & 0 & \cdots & 0 \\ 0 & \sigma_{y_2} & 0 & \cdots & 0 \\ 0 & 0 & \sigma_{y_3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_{y_K} \end{bmatrix} \tag{14}$$

the diagonal matrix with the standard deviations of the new vectors on the main diagonal, and

$$R_y = \begin{bmatrix} 1 & \rho_{R,G} & \rho_{R,B} & \cdots & 0 \\ \rho_{R,G} & 1 & \rho_{G,B} & \cdots & 0 \\ \rho_{R,B} & \rho_{G,B} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \tag{15}$$

is the desired correlation coefficient matrix. For high visual quality the final color image produced by the transformation must possess a high degree of contrast. In other words, the energy of the original data must be sustained and equally distributed in the RGB components of the final color image. This requirement is expressed as follows

$$\sum_{i=1}^{K} \sigma_{x_i}^2 = \sum_{i=1}^{3} \sigma_{y_i}^2 \tag{16}$$

with $\sigma_{y_1} = \sigma_{y_2} = \sigma_{y_3}$ approximately. The remaining $K - 3$ images should have negligible energy (contrast) and will not be used in forming the final color image. Their variance can be adjusted to small values, say $\sigma_{y_i} = 10^{-4} \sigma_{y_1}$ for $i = 4, \ldots, K$.
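The construction of the transformation matrix in Section 2.2 can be checked numerically with a small plain-Python sketch. It is only an illustration under assumptions: the source covariance `C_x`, the three correlation coefficients and all helper names (`cholesky_upper`, `solve_upper`, `target_covariance`) are hypothetical and do not come from the paper. The sketch builds a target covariance from standard deviations and correlations, obtains the upper triangular Cholesky factors, solves $Q_x A = Q_y$ for $A$, and verifies that $A^T C_x A$ reproduces the target covariance.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cholesky_upper(c):
    """Upper triangular Q with C = Q^T Q, for symmetric positive definite C."""
    k = len(c)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return transpose(low)  # Q is the transpose of the lower factor


def solve_upper(q, b):
    """Solve Q A = B for A by back substitution (Q upper triangular)."""
    k, cols = len(q), len(b[0])
    a = [[0.0] * cols for _ in range(k)]
    for c in range(cols):
        for i in reversed(range(k)):
            s = sum(q[i][p] * a[p][c] for p in range(i + 1, k))
            a[i][c] = (b[i][c] - s) / q[i][i]
    return a


def target_covariance(sigmas, rho_rg, rho_rb, rho_gb):
    """C_y = S R_y S^T with S = diag(sigmas) and R_y as in the text."""
    k = len(sigmas)
    r = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    r[0][1] = r[1][0] = rho_rg
    r[0][2] = r[2][0] = rho_rb
    r[1][2] = r[2][1] = rho_gb
    return [[sigmas[i] * r[i][j] * sigmas[j] for j in range(k)] for i in range(k)]


# Hypothetical covariance of K = 4 source bands (R, G, B, IR).
C_x = [[4.0, 1.0, 0.5, 0.3],
       [1.0, 3.0, 0.8, 0.2],
       [0.5, 0.8, 2.5, 0.4],
       [0.3, 0.2, 0.4, 1.5]]
energy = sum(C_x[i][i] for i in range(4))       # total source variance
s1 = math.sqrt(energy / 3.0)                    # equal share for R, G, B
sigmas = [s1, s1, s1, 1e-4 * s1]                # fourth component made negligible
C_y = target_covariance(sigmas, 0.8, 0.6, 0.8)  # correlations are made-up values

A = solve_upper(cholesky_upper(C_x), cholesky_upper(C_y))  # A = Q_x^{-1} Q_y
C_check = matmul(matmul(transpose(A), C_x), A)             # should equal C_y
max_err = max(abs(C_check[i][j] - C_y[i][j]) for i in range(4) for j in range(4))
print("largest deviation from C_y:", max_err)
```

Solving the triangular system by back substitution avoids forming $Q_x^{-1}$ explicitly, which is a design choice of this sketch rather than something the text prescribes. In practice the correlation values would be taken from a natural color image of the scene instead of being fixed by hand.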
3. Color transfer

The color images resulting from a color image fusion scheme often have an unnatural color appearance and lack color constancy. On the other hand, an appearance similar to that of normal daylight color images enhances the interpretation capabilities and significantly improves discrimination performance. In this section a simple technique, similar to that proposed in [10], is described in order to transfer the color characteristics from natural day-light imagery to the fused color images. However, the color transfer is applied directly in the Lab color space, whereas in [10] an additional transformation to the LMS system was employed.

In a color transfer technique, the fused color image should be transformed into a color space with reduced correlation compared to the RGB space. Lab is a perceptually uniform color space and there is little correlation between the axes, so different operations can be applied to different color channels with confidence that undesirable cross-channel artifacts will not occur. The first step of the color transfer technique is to transform the RGB tristimulus values to device-independent XYZ ones. The conversion depends on the characteristics of the display upon which the image was originally intended to be displayed. Since that information is rarely available, it is common practice to use a device-independent conversion that maps white in the chromaticity diagram to white in RGB space and vice versa. So, the RGB to XYZ conversion is defined as

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.5141 & 0.3239 & 0.1604 \\ 0.2651 & 0.6702 & 0.0641 \\ 0.0241 & 0.1228 & 0.8444 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{17}$$

and then to the Lab color space system according to

$$L = 116 \, f\!\left(\frac{Y}{Y_n}\right) - 16 \tag{18}$$

$$a = 500 \left[ f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right) \right] \tag{19}$$

$$b = 200 \left[ f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right) \right] \tag{20}$$

where the opposite colors are on the opposite sides of a line passing through the perfect white with tristimulus values $X_n$, $Y_n$, $Z_n$. In the Lab space, the mean is subtracted from the source (original image) data points:

$$\tilde{L} = L - \langle L \rangle, \qquad \tilde{a} = a - \langle a \rangle, \qquad \tilde{b} = b - \langle b \rangle \tag{21}$$

Then, the source data points are scaled with the ratio of the standard deviations of the target and source images respectively:

$$L_s' = \frac{\sigma_t^L}{\sigma_s^L} \tilde{L}, \qquad a_s' = \frac{\sigma_t^a}{\sigma_s^a} \tilde{a}, \qquad b_s' = \frac{\sigma_t^b}{\sigma_s^b} \tilde{b} \tag{22}$$

After this transformation the pixels comprising the source image have standard deviations that conform to the target color image. In this way the colors and hues of the target image are transferred to the source image. Finally, in reconstructing the Lab transform of the source image, instead of adding the previously subtracted averages, the averages computed for the target color image are added. The result is transformed back to RGB space via the XYZ color space.
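The color transfer steps of Section 3 can be sketched in plain Python as follows. This is a minimal illustration under assumptions rather than the authors' implementation. The text does not spell out the function $f$ or the numerical white point, so the sketch assumes the standard CIE Lab compression function and takes $(X_n, Y_n, Z_n)$ to be the XYZ image of RGB = (1, 1, 1) under the conversion matrix. The helper names and the two tiny pixel lists (`fused` as source, `natural` as target) are hypothetical.

```python
from statistics import mean, pstdev

# RGB -> XYZ conversion matrix given in Section 3.
RGB_TO_XYZ = ((0.5141, 0.3239, 0.1604),
              (0.2651, 0.6702, 0.0641),
              (0.0241, 0.1228, 0.8444))
# Assumption: the reference white (Xn, Yn, Zn) is the XYZ image of RGB = (1, 1, 1).
WHITE = tuple(sum(row) for row in RGB_TO_XYZ)
DELTA = 6.0 / 29.0


def f(t):
    """Standard CIE Lab compression function (assumed; the text leaves f undefined)."""
    return t ** (1.0 / 3.0) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4.0 / 29.0


def f_inv(t):
    return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4.0 / 29.0)


def inverse3(m):
    """Inverse of a 3 x 3 matrix via the adjugate."""
    (a, b, c), (d, e, g), (h, i, j) = m
    det = a * (e * j - g * i) - b * (d * j - g * h) + c * (d * i - e * h)
    adj = ((e * j - g * i, c * i - b * j, b * g - c * e),
           (g * h - d * j, a * j - c * h, c * d - a * g),
           (d * i - e * h, b * h - a * i, a * e - b * d))
    return tuple(tuple(x / det for x in row) for row in adj)


XYZ_TO_RGB = inverse3(RGB_TO_XYZ)


def apply(m, v):
    return tuple(sum(x * y for x, y in zip(row, v)) for row in m)


def rgb_to_lab(rgb):
    fx, fy, fz = (f(c / n) for c, n in zip(apply(RGB_TO_XYZ, rgb), WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_to_rgb(lab):
    fy = (lab[0] + 16) / 116
    fx, fz = fy + lab[1] / 500, fy - lab[2] / 200
    xyz = tuple(n * f_inv(t) for n, t in zip(WHITE, (fx, fy, fz)))
    return apply(XYZ_TO_RGB, xyz)  # may fall outside [0, 1]; clipping is left out


def color_transfer(source_rgb, target_rgb):
    """Give the source pixels the per-channel Lab mean and spread of the target."""
    src = [rgb_to_lab(p) for p in source_rgb]
    tgt = [rgb_to_lab(p) for p in target_rgb]
    channels = []
    for ch in range(3):
        s, t = [p[ch] for p in src], [p[ch] for p in tgt]
        mu_s, mu_t, sd_s, sd_t = mean(s), mean(t), pstdev(s), pstdev(t)
        scale = sd_t / sd_s if sd_s > 0 else 1.0
        # Remove the source mean, rescale by the target/source ratio, add the target mean.
        channels.append([scale * (v - mu_s) + mu_t for v in s])
    lab_out = list(zip(*channels))
    return lab_out, [lab_to_rgb(p) for p in lab_out]


fused = [(0.9, 0.1, 0.8), (0.2, 0.7, 0.9), (0.6, 0.6, 0.1), (0.3, 0.2, 0.5)]
natural = [(0.4, 0.5, 0.2), (0.5, 0.6, 0.3), (0.7, 0.7, 0.6), (0.2, 0.3, 0.1)]
lab_out, rgb_out = color_transfer(fused, natural)
```

After the transfer, each Lab channel of `lab_out` has the mean and standard deviation of the corresponding channel of the target pixels. The returned RGB values are not clipped to the displayable range, which a real display pipeline would need to handle.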
4. Data description and results

Two data sets corresponding to two different scenes are used in this work. The first data set comprises a color image of a scene representing a sandy path, trees and fences (Fig. 1(a)) and a midwave infrared (3–5 μm) image in which a person is standing behind the trees and close to the fence, as shown in Fig. 1(b). The second data set comprises a color image of a lakeside and a bench (Fig. 2(a)) along with a midwave infrared image in which a person is crouching next to the bench, shown in Fig. 2(b). The two data sets have been provided by TNO Human Factors, and a more detailed description of the data acquisition procedure can be found in [4].

The image fusion methods described in Section 2 are applied to both data sets in order to obtain a fused color representation of the scenes. In Figs. 1(c) and 2(c) the results of the NMF fusion method, for both data sets, are depicted. In both cases the important characteristics of the target are transferred to the final color representation, and thus the target can be more easily distinguished compared to the original grayscale thermal images of Figs. 1(b) and 2(b). NMF provides a representation of additive features as expected, and for this reason the color appearance is rather unnatural. In the case of the second fusion method, which is based on perceptual attributes, the resulting color image has a much more natural color appearance, as presented in Figs. 1(d) and 2(d). This is due to the fact that the statistical properties (covariance or correlation matrix) of the natural color image of the scene are incorporated into the method, as implied by (12). Moreover, this method seems to outperform the NMF method in a visual detection process, because in both cases the target is preserved in white and thus can be easily distinguished from the background.

In order to further improve the color appearance of the images and increase the visual discrimination capabilities, the color transfer technique described in Section 3 is employed. The source image for the color transfer technique is the image resulting from the fusion process, and the natural color image of the scene is used as the target image. In this way the color balance of the natural scene is transferred to the fused image and the salient features revealed by the fusion process are further highlighted. The color appearance of the images resulting from the NMF fusion process is improved significantly by the color transfer technique, as can be seen in Figs. 1(e) and 2(e). In both images the target is easily detected and the more natural color appearance of the images yields enhanced discrimination performance. The color transfer technique is also applied to the color images resulting from the second fusion method and the results are shown in Figs. 1(f) and 2(f). The color appearance of the scenes is improved, although not to the same degree as in the case of the NMF fused images, because the second fusion method already incorporates characteristics of the natural color scene. However, the target discrimination capabilities are enhanced.

Fig. 1. (a) Scene 1, color image covering the visible part of the EM spectrum and (b) the corresponding infrared image (c) fusion result according to method 1 (d) fusion result according to method 2 (e) and (f) final image after the use of a color transfer technique.
Fig. 2. (a) Scene 2, color image covering the visible part of the EM spectrum and (b) the corresponding infrared image (c) fusion result according to method 1 (d) fusion result according to method 2 (e) and (f) final image after the use of a color transfer technique.
5. Conclusions

An ergonomic and fully automated approach for fusing visible with infrared imagery has been presented in this work. In order to obtain color images from visual and infrared imagery with enhanced interpretation capabilities, a two-step approach has been proposed. Fusion is carried out as a first step in order to reduce data dimensionality. One of the proposed fusion approaches is based on NMF and provides an additive part-based representation of the source imagery. An alternative fusion approach incorporates attributes of color perception and thus provides increased discrimination capabilities. In the second step the first order statistics of a natural color image are transferred to the fused image in order to provide it with a natural day-time appearance. The proposed approach has been applied to two different data sets, and the results show that the method makes scene interpretation more intuitive and improves discrimination capabilities. Thus, the method is promising for night-time surveillance infrared cameras when the corresponding day-time color images are available, or in the case of intensifying visual cameras with more than three channels for fusion with IR imagery.
Acknowledgements

The authors would like to thank Alexander Toet, TNO Human Factors, for providing the data used in this work. This work was supported by the European Social Fund (ESF), Operational Program for Educational and Vocational Training II (EPEAEK II), and the Program HERAKLEITOS of the Ministry of Education and Religious Affairs, Greece.
References

[1] E.A. Essock, M.J. Sinai, J.S. McCarley, W.K. Krebs, J.K. DeFord, Perceptual ability with real-world nighttime scenes: image-intensified, infrared, and fused-color imagery, Human Factors 41 (3) (1999) 438–452.
[2] A.M. Waxman, et al., Solid-state color night vision: fusion of low-light visible and thermal infrared imagery, MIT Lincoln Laboratory Journal 11 (1999) 41–60.
[3] A.M. Waxman, A.N. Gove, D.A. Fay, J.P. Racamoto, J.E. Carrick, M.C. Seibert, E.D. Savoye, Color night vision: opponent processing in the fusion of visible and IR imagery, Neural Networks 10 (1) (1997) 1–6.
[4] A. Toet, Natural color mapping for multiband nightvision imagery, Information Fusion 4 (2003) 155–166.
[5] E. Reinhard, M. Ashikhmin, B. Gooch, P. Shirley, Color transfer between images, IEEE Computer Graphics and Applications 21 (5) (2001) 34–41.
[6] J.T. Varga, Evaluation of operator performance using true color and artificial color in natural scene perception (Report ADA363036), Naval Postgraduate School, Monterey, CA, 1999.
[7] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems 13 (2001) 556–562.
[8] D.D. Lee, H.S. Seung, Learning the parts of an object by non-negative matrix factorization, Nature 401 (1999) 788–791.
[9] V. Tsagaris, V. Anastassopoulos, Multispectral image fusion method based on perceptual attributes, Proceedings of the SPIE, vol. 5238, 2003, pp. 357–367.
[10] E. Reinhard, M. Ashikhmin, B. Gooch, P. Shirley, Color transfer between images, IEEE Computer Graphics and Applications 21 (5) (2001) 34–41.