Information Fusion 8 (2007) 119–130 www.elsevier.com/locate/inffus
Pixel- and region-based image fusion with complex wavelets

John J. Lewis *, Robert J. O'Callaghan, Stavri G. Nikolov, David R. Bull, Nishan Canagarajah

The Centre for Communications Research, University of Bristol, Bristol BS8 1UB, UK

Received 20 January 2005; received in revised form 8 September 2005; accepted 21 September 2005; available online 23 November 2005

* Corresponding author. Tel.: +44 117 3315073; fax: +44 117 9545206. E-mail address: [email protected] (J.J. Lewis).

1566-2535/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.inffus.2005.09.006
Abstract

A number of pixel-based image fusion algorithms (using averaging, contrast pyramids, the discrete wavelet transform and the dual-tree complex wavelet transform (DT-CWT) to perform fusion) are reviewed and compared with a novel region-based image fusion method which facilitates increased flexibility with the definition of a variety of fusion rules. A DT-CWT is used to segment the features of the input images, either jointly or separately, to produce a region map. Characteristics of each region are calculated and a region-based approach is used to fuse the images, region-by-region, in the wavelet domain. This method gives results comparable to the pixel-based fusion methods, as shown using a number of metrics. Despite an increase in complexity, region-based methods have a number of advantages over pixel-based methods. These include: the ability to use more intelligent semantic fusion rules; and the ability for regions with certain properties to be attenuated or accentuated.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Image fusion; Feature-based fusion; Region-based fusion; Segmentation; Complex wavelets
1. Introduction

Numerous imaging sensors have been developed; however, there are many scenarios where no single sensor will give the complete picture. For example, while images taken in the visible spectrum tend to have good contextual information, they do not easily show up targets that are camouflaged or in low-light conditions. Thus, redundant information from a set of two or more different sensors can be fused to produce a single fused image (or video stream) which preserves all relevant information from the original data. In this paper we define image fusion as in [1]: the process by which several images, or some of their features, are combined together to form a single image. Ideally, a fusion algorithm should satisfy the following requirements [2]:
• Preserve all relevant information in the fused image.
• Suppress irrelevant parts of the image and noise.
• Minimise any artefacts or inconsistencies in the fused image.

Image fusion algorithms can be considered to be at one of four levels of abstraction [3]: signal, pixel, feature and symbolic level. At pixel level, images are combined by considering individual pixel values or small arbitrary regions of pixels in order to make the fusion decision. Various types of pixel-based fusion algorithms are discussed in Section 2.

One method of achieving feature-level fusion is with a region-based fusion scheme. An image is initially segmented in some way to produce a set of regions. Various properties of these regions can be calculated and used to determine which features from which images are included in the fused image. This has advantages over pixel-based methods, as more intelligent semantic fusion rules can be considered based on actual features in the image, rather than on single or arbitrary groups of pixels. Section 3 provides an
overview of our region-based fusion scheme. A brief discussion of evaluation schemes for fused images is given in Section 4, followed by results and discussion in Section 5 and conclusions in Section 6. Throughout the paper, it is assumed that all input images are registered in advance.

2. Review of pixel-based image fusion

Pixel-based fusion schemes range from simple averaging of the pixel values of registered images to more complex multi-resolution (MR) pyramid and wavelet methods (see, for example, [1,2,4–11]).

2.1. Spatial image fusion

These methods work by combining the pixel values of the two or more images to be fused in a linear or non-linear way. The simplest form is a weighted average of the registered input images I_1, I_2, ..., I_N, with weights a_1, a_2, ..., a_N, to give the fused image F, as shown in Eq. (1):

F = a_1 I_1 + a_2 I_2 + \cdots + a_N I_N, \quad \text{where} \quad a_1 + a_2 + \cdots + a_N = 1    (1)
The values of a_n can be computed using a priori knowledge. Where this information is not available, it is common to set a_1 = a_2 = \cdots = a_N. This simple average (PB:AVE) is used in this paper for comparison with the other fusion methods. Where two or three grey-scale images are to be fused, simple colour fusion can be used, such that each image is assigned to a colour plane: for example, a visible image assigned to the green plane in RGB colour space and a co-registered IR image to the red plane. In the fused image, objects from the visible image (such as foliage) will appear green, while hot objects from the IR image will appear red. A more complex method of fusion in colour space [12], based on the human visual system and colour opponency, combines the images using neural networks. Other colour fusion methods use principal component analysis techniques to fuse remote sensing images.
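As a concrete illustration of Eq. (1), the following is a minimal sketch of weighted-average (PB:AVE-style) fusion for pre-registered, equally sized greyscale images. It is not the authors' implementation; NumPy and the function and variable names are assumptions made here for illustration.

```python
import numpy as np

def average_fusion(images, weights=None):
    """Weighted-average fusion of co-registered images (Eq. (1))."""
    images = [np.asarray(im, dtype=np.float64) for im in images]
    if weights is None:
        # No a priori knowledge: equal weights that sum to one.
        weights = [1.0 / len(images)] * len(images)
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    # Pixel-wise weighted sum of the input images.
    return sum(w * im for w, im in zip(weights, images))
```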
2.2. Multi-resolution image fusion

Transform fusion methods, defined in Eq. (2), involve transforming each of the registered input images I_1, I_2, ..., I_N from normal image space into some other domain by applying an MR transform, ω, for example wavelets or pyramids. The transformed images are fused using some fusion rule, φ, and the fused image, F, is reconstructed by performing the inverse transform, ω⁻¹ [1]:

F = \omega^{-1}(\phi(\omega(I_1), \omega(I_2), \ldots, \omega(I_N)))    (2)

2.2.1. Fusion with pyramid transforms

Multi-resolution image pyramids are constructed by successive filtering and down sampling of an image. A variety of pyramid transforms exist, such as contrast, Laplacian or Gaussian pyramids, depending on the type of filters used. A contrast or ratio of low-pass (ROLP) pyramid, introduced in [13], better fits the model of the human visual system. Fusion is performed by initially transforming the registered input images using the pyramid transform, fusing the coefficients using some fusion rule such as the maximum-selection rule, and obtaining the fused image by performing the inverse transformation, as shown in Fig. 1. However, these methods can have problems with noisy images, as noise tends to have higher contrast and will be selected over cleaner image content. A more complex fusion rule, proposed in [14] and modified in [15], uses match and saliency metrics to select important features, producing cleaner fused images. For comparison in this paper, a contrast pyramid fusion scheme (PB:PYR) described in [16] is used.

Fig. 1. The pixel-based image fusion scheme using pyramids.

2.2.2. The wavelet transform

Wavelet transforms have been successfully used in many fusion schemes. A common wavelet transform technique used for fusion is the DWT [6,17,1]. It has been found to have some advantages over pyramid schemes, such as: increased directional information [6]; none of the blocking artefacts that often occur in pyramid-fused images [6]; better signal-to-noise ratios than pyramid-based fusion [15]; and improved perception compared with pyramid-based fused images in human analysis [6,15]. The DWT is a spatial-frequency decomposition that provides flexible multi-resolution analysis of an image. The one-dimensional (1-D) DWT involves successive filtering and down sampling of the signal. Provided that the filters used are bi-orthogonal, they will have a set of related synthesis filters that can be used to perfectly reconstruct
the signal. For images, the 1-D DWT is applied in two dimensions by separately filtering and down sampling in the horizontal and vertical directions. This gives four sub-bands at each scale of the transformation, with sensitivity to vertical, horizontal and diagonal frequencies. A major problem with the DWT is its shift-variant nature, caused by the sub-sampling which occurs at each level. A small shift in the input signal results in a completely different distribution of energy between DWT coefficients at different scales [8]. A shift invariant DWT (SIDWT), described in [2], yields a very over-complete signal representation as there is no sub-sampling. The dual-tree complex wavelet transform (DT-CWT) [18,19,8] is an over-complete wavelet transform that provides both good shift invariance and better directional selectivity than the DWT, although at an increased memory and computational cost. Two fully decimated trees are produced, one for the odd samples and one for the even samples generated at the first level. The DT-CWT has reduced over-completeness compared with the SIDWT and increased directional sensitivity over the DWT, and it is able to distinguish between positive and negative orientations, giving six distinct sub-bands at each level with orientations of ±15°, ±45° and ±75°. The DT-CWT gives perfect reconstruction as the filters are chosen from a perfect-reconstruction bi-orthogonal set [8]. It is applied to images by separable complex filtering in two dimensions. The bi-orthogonal Antonini and Q-shift filters used are described in [18]. The increased shift invariance and directional sensitivity mean that the DT-CWT gives improved fusion results over the DWT [9].

2.2.3. Fusion with wavelet transforms

Discrete wavelet transform fusion (PB:DWT) was described in [6] and works in a similar way to PB:PYR, but uses the discrete wavelet transform (DWT) rather than pyramids. The dual-tree complex wavelet transform scheme (PB:CWT) is used in [9] and shown in Fig. 2. It employs the DT-CWT to obtain an MR decomposition of the input images and then combines them using some fusion rule. In general, the high-pass and low-pass coefficients are dealt with separately. For the high-pass coefficients, a maximum-selection fusion rule is often used to produce a single set of coefficients: the largest absolute wavelet coefficient at each location in the input images is selected as the coefficient at that location in the fused image. As wavelets and, to a lesser extent, pyramids tend to pick out the salient features of an image, this scheme works well and produces good results. An averaging fusion rule has also been used; however, this tends to blur images and reduce the contrast of features appearing in only one image. More complex fusion rules have been developed, such as in [6], where the maximum absolute value in an arbitrary window is used as an activity measure for the central coefficient of the window. In [5], a decision map is created based on the activity of an arbitrary block around a central coefficient as well as the similarity between the two areas of an image. In this paper a choose-maximum scheme is used for all pixel-based algorithms.

Fig. 2. The pixel-based image fusion scheme using the DT-CWT.
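To make the pixel-based wavelet fusion scheme concrete, here is a minimal sketch that substitutes the standard separable DWT from the PyWavelets package for the DT-CWT used in the paper (the DT-CWT is not available in common libraries): the maximum-selection rule is applied to the detail sub-bands and the approximation band is averaged. The function name, wavelet choice and decomposition depth are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import pywt

def fuse_dwt_max(im1, im2, wavelet='bior2.2', level=3):
    """Pixel-based wavelet fusion: choose-maximum high-pass, averaged low-pass."""
    c1 = pywt.wavedec2(im1, wavelet, level=level)
    c2 = pywt.wavedec2(im2, wavelet, level=level)
    fused = [0.5 * (c1[0] + c2[0])]                     # low-pass band: simple average
    for d1, d2 in zip(c1[1:], c2[1:]):                  # per level: (cH, cV, cD)
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(d1, d2)))    # keep the larger absolute coefficient
    return pywt.waverec2(fused, wavelet)
```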
The low-pass coefficients are usually combined as a weighted average to give the fused version. This works well with images from the same modality; however, multi-modal images can have different dynamic ranges, and averaging can significantly alter the range of the images, reducing contrast in the fused image. Fig. 3 shows the fusion of the "UN Camp" sequence using these algorithms.

3. The region level fusion scheme

The majority of applications of a fusion scheme are interested in features within the image, not in the actual pixels. Therefore, it seems reasonable to incorporate feature information into the fusion process [20]. There are a number of perceived advantages of this, including:

• Intelligent fusion rules: fusion rules are based on combining the groups of pixels which form the regions of an image. Thus, more useful tests for choosing the regions of a fused image, based on various properties of a region, can be implemented.
• Highlighting features: regions with certain properties can be either accentuated or attenuated in the fused image depending on a variety of the region's characteristics.
• Reduced sensitivity to noise: processing semantic regions rather than individual pixels or arbitrary regions can help overcome some of the problems with pixel-based methods, such as sensitivity to noise, blurring effects and mis-registration [20].
• Registration and video fusion: the feature information extracted from the input images could be used to aid the registration of the images. Region-based video fusion schemes could use motion estimation to track the fused features, allowing the majority of frames to be quickly predicted from some fully fused frames.
Fig. 3. Pixel-based fusion of the "UN Camp" images. (a) Original IR image, (b) original visible image, (c) PB:AVE fused image, (d) PB:PYR fused image, (e) PB:DWT fused image and (f) PB:CWT fused image.
A number of region-based fusion schemes have been proposed, for example [21,22,20,23,5]. These initially transform pre-registered images using an MR transform. Regions representing image features are then extracted from the transform coefficients. A grey-level clustering using a generalised pyramid linking method is used for segmentation in [20,23,5]. The regions are then fused based on a simple region property, such as average activity. These methods do not take full advantage of the wealth of information that can be calculated for each region.

The region-based fusion scheme, initially proposed in [24], is shown in Fig. 4. Firstly, the N registered images I_1, I_2, ..., I_N are transformed using ω, the DT-CWT:

[D_n, A_n] = \omega(I_n)    (3)

This gives a set of detail coefficients D_{n,(h,l)} for each image I_n, consisting of a group of six different sub-bands, h, at each level of decomposition, l. A_n is the approximation of the image at the highest level. A combination of intensity information, I_n, and textural information, D_n, is used to segment the images, either jointly or separately, with the segmentation algorithm σ, giving the segmentation maps S_1, S_2, ..., S_N, or equivalently a list of the T_n regions of each image, R_1, R_2, ..., R_N, where R_n = \{r_{n,1}, r_{n,2}, \ldots, r_{n,T_n}\}, n \in \{1, \ldots, N\}. The map is down sampled by 2 to give a decimated segmentation map of the image at each level of the transform. When down sampling, priority is always given to a pixel from the smaller of the two regions. The segmentation algorithm is discussed in Section 3.1. A multi-resolution priority map

P_n = \{p_{n,r_{n,1}}, p_{n,r_{n,2}}, \ldots, p_{n,r_{n,T_n}}\}    (4)

is then generated for each region r_{n,t} in each image I_n, based on any of a variety of tests. Calculating the priority is discussed further in Section 3.2.

Fig. 4. The region-based image fusion scheme using the DT-CWT.

Fig. 5. Priorities generated using entropy. (a) For the IR image and (b) for the visible image.

The priorities generated for the input images shown in Fig. 3(a) and (b) are shown in Fig. 5. The priority of each region is the same across all sub-bands and levels. If priority is allowed to vary across sub-bands and levels, features become blurred when a feature has low energy in some of the sub-bands. This results in some coefficients which contribute to a feature in
the original input image not being present in the fused image coefficients, causing distortion.

A feature t_n can be accentuated or attenuated by multiplying all of the wavelet coefficients corresponding to the region by a constant w_{t_n}. A weighting mask W_n, the same size as the wavelet coefficients, is created for each image, defining how much each coefficient should be multiplied by.

Regions are then either chosen or discarded based on this priority and the fusion rule φ, giving the wavelet coefficients of the fused image. A mask M is generated that specifies which image each region should come from, as shown in Eq. (5). The mask is the same size as the wavelet coefficients of the fused image. An example of the mask for one sub-band at the second level is shown in Fig. 6. The algorithm always chooses the region with the maximum priority to determine which image the coefficients representing a region t should come from:

M_t = \phi(p_{1,t}, p_{2,t}, \ldots, p_{N,t})    (5)

If S_i \neq S_j, a segmentation map S_F is created such that S_F = S_1 \cup S_2 \cup \cdots \cup S_N. Thus, where two regions r_{i,p} and r_{j,q} from images i and j overlap, both will be split into two regions, each with the same priority as the original. The low-pass wavelet coefficients can either be combined as a weighted average or region-by-region, similarly to the high-pass coefficients, dependent on the mask M. The fused image is obtained by performing the inverse transform ω⁻¹ on the fused, weighted wavelet coefficients: F = \omega^{-1}(D_F, A_F).
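Where separately produced segmentation maps differ, forming S_F in effect intersects the label maps so that overlapping regions are split into distinct regions. The following is a minimal sketch of this step for two label maps; NumPy and the function name are assumptions made for illustration.

```python
import numpy as np

def union_segmentations(s1, s2):
    """Combine two label maps: every distinct (label in S1, label in S2)
    pair becomes its own region of the combined map S_F, so partially
    overlapping regions are split."""
    pairs = s1.astype(np.int64) * (int(s2.max()) + 1) + s2.astype(np.int64)
    _, inverse = np.unique(pairs, return_inverse=True)  # relabel pairs 0..K-1
    return inverse.reshape(s1.shape)
```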
Fig. 6. The image mask: black from IR, gray from visible.
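A minimal sketch of the choose-maximum-priority rule of Eq. (5) for a single sub-band is given below, assuming a decimated joint segmentation map and precomputed region priorities are available. NumPy, the names and the optional region-weighting argument (corresponding to the weighting mask W) are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def fuse_subband(subbands, seg_map, priorities, region_weights=None):
    """subbands: one detail array per input image (same shape as seg_map);
    priorities[n, t]: priority of region t in image n;
    region_weights: optional {label: w} to accentuate/attenuate regions."""
    fused = np.zeros_like(subbands[0])
    for label in np.unique(seg_map):
        # Eq. (5): take the region from the image with the highest priority.
        best = int(np.argmax([priorities[n, label] for n in range(len(subbands))]))
        mask = seg_map == label
        fused[mask] = subbands[best][mask]
        if region_weights is not None and label in region_weights:
            fused[mask] *= region_weights[label]   # apply the weighting mask W
    return fused
```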
3.1. The segmentation algorithm

The quality of the segmentation algorithm is of vital importance to the fusion process. For the correct features to be present in the fused image, the segmentation algorithm should ideally have the following properties:

• All required features are segmented as single, separate regions. If a feature is missed, it may not be included in the fused image. If a feature is split into more than one region, each part will be treated separately, possibly introducing artefacts into the fused image.
• As small a number of regions as possible should be created, as the time taken to compute the fused image increases with the number of regions.

An adapted version of the combined morphological–spectral unsupervised image segmentation algorithm described in [25] is used, extended to handle multi-modal images. The algorithm works in two stages. The first stage produces an initial segmentation using both textured and non-textured cues. The detail coefficients of the DT-CWT are used to process texture. The gradient function is applied to all levels and orientations of the DT-CWT coefficients and up-sampled to be combined with the gradient of the intensity information, giving a perceptual gradient; the larger gradients indicate possible edge locations. The watershed transform of the perceptual gradient gives an initial segmentation. The second stage uses these primitive regions to produce a graph representation of the image, which is processed using a spectral clustering technique. Segmented IR and visible images are shown in Fig. 7.

Fig. 7. Segmentation of IR and visible image. (a) Unimodal segmentation of IR image, (b) unimodal segmentation of visible image, (c) union of both unimodal segmentations and (d) joint segmentation.

The method can use either intensity information or textural information, or both, in order to obtain the segmentation map. This flexibility is useful for multi-modal fusion where some a priori information about the sensor types is known. For example, IR images tend to lack textural information, with most features having a similar intensity value throughout the region. Intuitively, an intensity-only segmentation map should give better results than a texture-based segmentation.

Secondly, the segmentation can be performed either separately or jointly. For separate segmentation, each of the input images generates an independent segmentation map:

S_1 = \sigma(I_1, D_1), \quad \ldots, \quad S_N = \sigma(I_N, D_N)    (6)

Alternatively, information from all images can be used to produce a joint segmentation map:

S_{joint} = \sigma(I_1, \ldots, I_N, D_1, \ldots, D_N)    (7)
A multiscale watershed segmentation algorithm is used to jointly segment multivalued images in [26]. In general, jointly segmented images work better for fusion, because the joint segmentation map contains the minimum number of regions needed to represent all the features in the scene most efficiently. A problem can occur for separately segmented images, where different images contain different features, or features which appear at slightly different sizes in different modalities. Where regions partially overlap, if the overlapped region is incorrectly dealt with, artefacts will be introduced, and the extra regions created to deal with the overlap will increase the time taken to fuse the images. However, if the information from the segmentation process is going to be used to register the images, or if the input images are completely different, it can be useful to segment them separately. The effects of segmenting the images in different ways are shown in Fig. 7. In particular, the inefficient union of the two unimodal segmentation maps, which is necessary in order to fuse separately segmented images, is shown in Fig. 7(c).

3.2. Calculation of priority and fusion rules

A measure of the average energy of the wavelet coefficients in a region is generally a good measure of the importance of that region. In [5], a simple activity measure taking the absolute value of the wavelet coefficients is used. Thus, the priority of region r_{t_n} in image n, with size |r_{t_n}|, calculated from the detail coefficients d_{n(h,l)}(x,y) \in D_{n(h,l)}, is

P(r_{t_n}) = \frac{1}{|r_{t_n}|} \sum_{\forall h, \forall l, (x,y) \in r_{t_n}} |d_{n(h,l)}(x,y)|    (8)

Alternatively, the variance of the wavelet coefficients over a region can be used as the priority:

P(r_{t_n}) = \frac{1}{|r_{t_n}|} \sum_{\forall h, \forall l, (x,y) \in r_{t_n}} \left(d_{n(h,l)}(x,y) - \bar{d}_{n(h,l)}\right)^2    (9)

where \bar{d}_{n(h,l)} is the average wavelet coefficient of the region r_{t_n}. Thirdly, the entropy of the wavelet coefficients can be calculated and used as the priority. The normalised Shannon entropy of a region is

P(r_{t_n}) = \frac{1}{|r_{t_n}|} \sum_{\forall h, \forall l, (x,y) \in r_{t_n}} d^2_{n(h,l)}(x,y) \log d^2_{n(h,l)}(x,y)    (10)

with the convention 0 log(0) = 0.

A variety of other methods could be used, possibly in combination with one or more of the above, to produce a priority map. The region's size, shape or the position of its centre of mass could also contribute to the priority. For example, smaller regions could be given higher priorities than larger regions, which are more likely to contain background information. Fig. 8 shows a Delaunay triangulation map, which shows how each region is related to the surrounding regions in the image. This information could be used to generate priorities based on the position of two regions relative to each other. For example, in Fig. 8, the priority of the figure could be increased the closer the figure is to the house or road. Alternatively, this region information could be used to weight a region, increasing or decreasing its prominence in the image. A weighting system implemented to increase the weighting of a region depending on its proximity to another region is described in Eq. (11). The weighting w_i of a region r_i of an image, with centre of mass at coordinates (x_i, y_i), is inversely proportional to the distance from a region r_j; the regions need not be from the same image:

w_i \propto \frac{1}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}    (11)

While the classification of regions as objects is beyond the scope of this research, a simple threshold can often be used to extract features such as people, animals and vehicles from infra-red images, for example the figure in the IR image, Fig. 3(a).

Fig. 8. The fused image with segmentation map and triangulation map overlaid.
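As an illustration of Eqs. (8)-(11), the sketch below computes the activity, variance and normalised Shannon entropy of one region from real-valued sub-band coefficient arrays resampled to the segmentation map's resolution, together with the proximity weighting of Eq. (11). NumPy and all names are assumptions made here for illustration.

```python
import numpy as np

def region_priorities(subbands, seg_map, label):
    """Activity (Eq. 8), variance (Eq. 9) and entropy (Eq. 10) of one region,
    pooled over all sub-bands; real-valued coefficient arrays are assumed."""
    d = np.concatenate([sb[seg_map == label].ravel() for sb in subbands])
    activity = np.mean(np.abs(d))                          # Eq. (8)
    variance = np.mean((d - d.mean()) ** 2)                # Eq. (9)
    sq = d ** 2
    nz = sq > 0                                            # convention 0 log 0 = 0
    entropy = np.sum(sq[nz] * np.log(sq[nz])) / d.size     # Eq. (10)
    return activity, variance, entropy

def proximity_weight(centre_i, centre_j):
    """Eq. (11): weighting inversely proportional to the distance between
    the centres of mass of regions r_i and r_j."""
    (xi, yi), (xj, yj) = centre_i, centre_j
    return 1.0 / np.sqrt((xi - xj) ** 2 + (yi - yj) ** 2)
```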
4. Evaluation

Evaluation of fusion techniques is a non-trivial task, especially as it is often difficult to say which of two fused images is better without reference to a specific task. For example, the best image for a task involving tracking should emphasise the objects to be tracked; however, this will probably not be the best approach if perceptually natural images are required. Evaluation techniques are generally split into subjective or qualitative tests and objective or quantitative tests. The majority of testing of image fusion techniques tends to be subjective, based on psycho-visual testing. The main disadvantage of this approach is that it requires time and resources (often in the form of expert subjects). Quantitative testing overcomes these difficulties in theory; however, it is difficult to perform and the metrics must be validated. A number of assessment methods, such as root mean squared error, require a "ground truth" fused image, which does not usually exist for real images (e.g. [6]). Three metrics which do not require ground truth images are considered in this paper. Mutual information (MI), described in [27], essentially computes how much information from each of the input images is transferred to the fused image.
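As a concrete illustration of the first metric, the following sketch estimates MI from joint grey-level histograms and sums it over the inputs, in the spirit of [27]; NumPy, the bin count and the function names are assumptions made here, not the metric's reference implementation.

```python
import numpy as np

def mutual_information(x, y, bins=64):
    """MI between two images, estimated from their joint grey-level histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def fusion_mi(inputs, fused):
    """Sum of the MI between the fused image and each input image."""
    return sum(mutual_information(im, fused) for im in inputs)
```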
The Xydeas and Petrovic Q^{AB/F} metric, proposed in [28], considers the amount of edge information transferred from the input images to the fused image, using a Sobel edge detector to calculate strength and orientation information at each pixel in both the source and the fused images. The final metric, presented by Piella and Heijmans in [29], is based on an image quality metric (IQM) introduced in [30]. This uses three measures: correlation, luminance distortion and contrast distortion. To apply this to fusion, saliency information in arbitrary windows is used to weight a comparison of all input images with the fused image. In order to include edge information in the calculation, the measure is computed both from the original images and from "edge images".

5. Results and discussion

Fig. 9. Region-based fusion of the "UN Camp" sequence. (a) Using entropy to calculate priority with low-pass averaged, (b) using entropy to calculate priority with low-pass region-fused, (c) using variance to calculate priority with low-pass region-fused, (d) using activity to calculate priority with low-pass region-fused, (e) mask used as multiplier on IR coefficients and (f) region-fused image with masked IR coefficients multiplied by a factor of 2.0.

Fig. 9 shows the fused result of the IR and visible images from the "UN Camp" sequence for the region-based method. As can be observed, the region-level scheme performs comparably with the pixel-based fusion methods shown in Fig. 3. The input images have been jointly segmented, with priority based on either entropy, variance or activity. Each of these methods has successfully chosen the majority of the background of the fused image from the visible image, while the important features in the IR image, namely the figure, have also been included. With these images, there appears to be very little differentiating the three methods of calculating priority. The low-pass coefficients are combined both by averaging and by using a region-based combination based on the mask used to fuse the high-pass coefficients. The region-based image method
produces images with better contrast, more accurately reflecting the contrast of the original images, than both pixel-based fusion and region-based fusion with averaging of the low-pass bands. Fig. 9(e) shows the weighting mask generated by extracting all regions in the IR image above a certain threshold; this can be seen to successfully extract the feature representing the figure. The coefficients representing the figure are weighted by a factor of 2.0, which accentuates the presence of the figure, as shown in Fig. 9(f).

Fig. 10 shows two multi-focus "Pepsi" images. The images are jointly segmented, and the priorities calculated using entropy are shown in the figure. The parts of the image that are in focus have higher priorities than the out-of-focus parts, so the fused image is in focus throughout. There is no perceivable difference between the pixel- and region-based fusion algorithms. As the two images are from the same type of sensor, there is no advantage in performing the fusion of the low-pass coefficients in a region-based manner compared with averaging.
Fig. 11 shows three input images from the "Sea" data set, depicting a ship at sea. The data set consists of two IR images covering different spectral bands and a visible image. The pixel-fused image has pulled through some of the interlace artefacts and noise from the input images to a greater extent than the region-based method, and errors due to mis-registration of the input images are also less visible in the region-fused image. However, some detail around the boat is missing in the region-fused image. While region-based fusion of the low-pass coefficients improves contrast compared with averaging, the resulting image is more difficult to interpret. This is partly because the sea in front of the boat on the right is taken from a different image to that on the left. Table 1 shows a comparison of this region-based fusion method (RB:CWT) and a number of pixel-based fusion methods using the metrics described in Section 4. These results confirm that the region fusion algorithm performs comparably to pixel-based algorithms.
Fig. 10. Fusion of multi-focus Pepsi images. (a) Input image A, (b) input image B, (c) PB:AVE fused image, (d) PB:PYR fused image, (e) PB:DWT fused image, (f) PB:CWT fused image, (g) RB:CWT fused image, (h) entropy priority map for image A for the jointly segmented input images and (i) entropy priority map for image B for the jointly segmented input images.
Fig. 11. Fusion of the "Sea" images. (a) 2–3 µm band IR input image, (b) 8–12 µm band IR input image, (c) visible-range CCD image, (d) PB:AVE fused image, (e) PB:PYR fused image, (f) PB:DWT fused image, (g) PB:CWT fused image, (h) RB:CWT fused image with averaged low-pass coefficients and (i) RB:CWT fused image with region-based low-pass fusion.
MI consistently rates region-based fusion as producing better results than pixel-based fusion, except for the "Sea" images, where it is rated second and averaging is rated highest; perceptually, however, averaging produces the worst fused image. The Xydeas and Petrovic metric shows RB:CWT outperforming the pixel-based methods, while the Piella metric shows the PB:CWT method producing slightly better results. We have found that Q^{AB/F} and IQM generally correlate well with the results of visual analysis; however, it should be noted that these metrics are based on edges, and fused images containing significant artefacts, such as ringing introduced by the transform, can be rated highly by these metrics yet look inferior perceptually. Ideally, the fused images should be evaluated by measuring the performance of human operators at specific tasks; we hope to carry out such tests in the future.
Fig. 12 shows the fusion of the "Octec" IR and visible images. The input IR and visible images, with their segmentation overlaid to show the regions, are given in Fig. 12(a) and (b) respectively. In the pixel-fused image, Fig. 12(d), the figure visible in the IR image is obscured by the smoke. As the smoke is fairly textured, pixel-based fusion, and region-based fusion using a statistically based priority, pull the smoke through from the visible image, obscuring the figure. Fig. 12(c) shows the RB:CWT fused image with the low-pass coefficients also region-fused. This image has much better contrast than the PB:CWT fused image and perceptually looks better. However, there are some artefacts, for example in the sky in the top left of the image.
Table 1
Comparison of fusion methods

Metric    Method    UN Camp   Pepsi   Sea
MI        PB:AVE    1.000     1.320   1.163
          PB:PYR    1.000     1.318   1.078
          PB:DWT    1.000     1.288   1.100
          PB:CWT    1.056     1.314   1.109
          RB:CWT    1.060     1.402   1.123
Q^{AB/F}  PB:AVE    0.352     0.627   0.366
          PB:PYR    0.468     0.809   0.402
          PB:DWT    0.470     0.795   0.487
          PB:CWT    0.505     0.820   0.525
          RB:CWT    0.565     0.825   0.571
IQM       PB:AVE    0.580     0.753   0.665
          PB:PYR    0.700     0.956   0.736
          PB:DWT    0.745     0.952   0.790
          PB:CWT    0.772     0.956   0.808
          RB:CWT    0.721     0.954   0.766
As regions of interest tend to be small, while the background (and other less interesting features in an image) is large, the inverse of the size of a region can be used as the priority. In the region-based fused image of Fig. 12(e), using the inverse of size as the priority, the figure is clearly visible.

Fig. 13 shows some fused images from the "UN Camp" sequence. The distance between the centres of mass (see Fig. 8) of the region representing the road and of the figure is calculated, and the coefficients of the figure are weighted inversely proportionally to this distance, so that the weighting increases as the figure approaches the road. For this experiment the road was manually selected and the figure detected by thresholding the IR image. The calculated distance (given in pixels) and the weighting applied to the figure are given for each image.
These images are jointly segmented and fused using an entropy priority. The figure is seen to get brighter as he moves closer to the road. This task differs from that originally proposed in [12] for this image sequence, which was to determine the figure's position relative to the fence; however, that is a very difficult task, and the task considered here is a worthwhile exercise showing how the algorithm could be applied. While this region-weighting approach would not be applicable in all situations, the aim is to show how region-based rules can easily be applied in a number of situations to improve the resulting fused image. The more prior knowledge available about the scene, the more these rules can be tailored to make them useful. For example, the three "Sea" images in Fig. 14 show three kayaks in various formations. The distance between the kayaks can be calculated and their spatial arrangement determined, and, using some prior knowledge, the kind of boats being dealt with could be decided (e.g. terrorists, smugglers, holiday makers, etc.).

All results in this paper were obtained with a Matlab implementation of the algorithm, as part of the Image Fusion Toolbox (IFT) being developed at the University of Bristol, running on a computer with a Pentium 4 2.66 GHz processor and 1 GB of RAM. The RB:CWT fusion of the "UN Camp" images was computed in 37.90 s (0.64 s for the DT-CWT transform; 30.5 s for the segmentation; 6.38 s for the fusion; and 0.38 s for the inverse DT-CWT). To date, no effort has been made to optimise the algorithm and it is expected that substantial improvements in efficiency could be made.
Fig. 12. Octec IR and visible fused images. (a) Octec IR image with segmentation overlaid, (b) Octec visible image with segmentation overlaid, (c) region-fused image using entropy as priority, (d) PB:CWT fused image and (e) region-fused image using size as priority.
Fig. 13. The figure in the IR image from the "UN Camp" sequence is highlighted depending on closeness to the road. (a) Image no. 11, Dist = 220, weighting = 1.10; (b) image no. 13, Dist = 213, weighting = 1.13; (c) image no. 18, Dist = 207, weighting = 1.16; (d) image no. 19, Dist = 111, weighting = 1.65; (e) image no. 24, Dist = 105, weighting = 1.67 and (f) image no. 26, Dist = 101, weighting = 1.69.
Fig. 14. Overlay of regions on three images from the "Sea" sequence.
The PB:CWT fusion takes 1.60 s (0.64 s for the DT-CWT transform; 0.58 s for fusion; and 0.38 s for the inverse DT-CWT). The PB:DWT takes 0.57 s (0.31 s for the DWT; 0.15 s for fusion; and 0.12 s for the inverse DWT); the PB:PYR requires 0.43 s (0.21 s for the forward transform; 0.18 s for fusion; and 0.04 s for the inverse transform); and PB:AVE takes 0.01 s. While the RB:CWT takes significantly longer to produce results, most of this time is taken by the segmentation. Segmentation is often a post-processing step for pixel-based fusion, and this must be considered when comparing timings.

6. Conclusions

This paper has demonstrated that comparable results can be achieved with region- and pixel-based methods. While region-based fusion methods are generally more complex, there are a number of advantages of such schemes over pixel-based fusion. The advantages demonstrated here
include novel, more intelligent fusion rules and the ability to treat regions differently depending on a variety of properties, such as average activity, size and position relative to other regions in the scene. Future work should include thorough objective and subjective testing of the effects of using either joint or separate multi-modal image segmentation, and the further development of higher-level region-based fusion rules.

Acknowledgements

This work has been partially funded by the UK MOD Data and Information Fusion Defence Technology Center. The original "UN Camp" and "Sea" IR and visible images were kindly supplied by Alexander Toet of the TNO Human Factors Research Institute, the multi-focus "Pepsi" images by Rick Blum of Lehigh University, Bethlehem, and the Octec images by David Dwyer of OCTEC Ltd. These images are available online at www.imagefusion.org.
References

[1] S.G. Nikolov, Image fusion: a survey of methods, applications, systems and interfaces, Technical Report UoB-SYNERGY-TR02, University of Bristol, UK, August 1998.
[2] O. Rockinger, Pixel-level fusion of image sequences using wavelet frames, in: Proceedings of the 16th Leeds Applied Shape Research Workshop, Leeds University Press, 1996.
[3] M. Abidi, R. Gonzalez, Data Fusion in Robotics and Machine Intelligence, Academic Press, USA, 1992.
[4] A. Toet, Hierarchical image fusion, Machine Vision and Applications 3 (1990) 1–11.
[5] G. Piella, A general framework for multiresolution image fusion: from pixels to regions, Information Fusion 4 (2003) 259–280.
[6] H. Li, S. Manjunath, S. Mitra, Multisensor image fusion using the wavelet transform, Graphical Models and Image Processing 57 (3) (1995) 235–245.
[7] O. Rockinger, Image sequence fusion using a shift-invariant wavelet transform, in: Proceedings of the IEEE International Conference on Image Processing, vol. III, 1997, pp. 288–291.
[8] N. Kingsbury, The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters, in: IEEE Digital Signal Processing Workshop, vol. 86, 1998.
[9] S.G. Nikolov, P.R. Hill, D.R. Bull, C.N. Canagarajah, Wavelets for image fusion, in: A. Petrosian, F. Meyer (Eds.), Wavelets in Signal and Image Analysis, Computational Imaging and Vision Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2001, pp. 213–244.
[10] P.R. Hill, Wavelet Based Texture Analysis and Segmentation for Image Retrieval and Fusion, Ph.D. Thesis, Department of Electrical and Electronic Engineering, University of Bristol, UK, 2002.
[11] Z. Zhang, R. Blum, A categorization and study of multiscale-decomposition-based image fusion schemes, Proceedings of the IEEE (1999) 1315–1328.
[12] A. Toet, J. IJspeert, A. Waxman, M. Aguilar, Fusion of visible and thermal imagery improves situational awareness, Displays 18 (1997) 85–95.
[13] A. Toet, J. van Ruyven, J. Valeton, Merging thermal and visual images by contrast pyramids, Optical Engineering 28 (7) (1989) 789–792.
[14] P. Burt, R. Kolczynski, Enhanced image capture through fusion, in: Proceedings of the 4th International Conference on Computer Vision, 1993, pp. 173–182.
[15] T. Wilson, S. Rogers, M. Kabrisky, Perceptual based hyperspectral image fusion using multi-spectral analysis, Optical Engineering 34 (11) (1995) 3154–3164.
[16] A. Toet, Multiscale contrast enhancement with applications to image fusion, Optical Engineering 31 (May) (1992) 1026–1031.
[17] L. Chipman, T. Orr, L. Graham, Wavelets and image fusion, in: Wavelet Applications in Signal and Image Processing III, vol. 2569, 1995, pp. 208–219.
[18] N. Kingsbury, The dual-tree complex wavelet transform with improved orthogonality and symmetry properties, in: Proceedings of the IEEE Conference on Image Processing, Vancouver, 2000.
[19] N. Kingsbury, Image processing with complex wavelets, in: B. Silverman, J. Vassilicos (Eds.), Wavelets: The Key to Intermittent Information, Oxford University Press, USA, 1999, pp. 165–185.
[20] G. Piella, A region-based multiresolution image fusion algorithm, in: ISIF Fusion 2002 Conference, Annapolis, July 2002.
[21] Z. Zhang, R. Blum, Region-based image fusion scheme for concealed weapon detection, in: Proceedings of the 31st Annual Conference on Information Sciences and Systems, March 1997.
[22] B. Matuszewski, L.-K. Shark, M. Varley, Region-based wavelet fusion of ultrasonic, radiographic and shearographic non-destructive testing images, in: Proceedings of the 15th World Conference on Non-Destructive Testing, Rome, October 2000.
[23] G. Piella, H. Heijmans, Multiresolution image fusion guided by a multimodal segmentation, in: Proceedings of Advanced Concepts for Intelligent Vision Systems, Ghent, Belgium, September 2002.
[24] J.J. Lewis, R.J. O'Callaghan, S.G. Nikolov, D.R. Bull, C.N. Canagarajah, Region-based image fusion using complex wavelets, in: The Seventh International Conference on Image Fusion, Stockholm, 2004.
[25] R. O'Callaghan, D.R. Bull, Combined morphological–spectral unsupervised image segmentation, IEEE Transactions on Image Processing 14 (1) (2005) 49–62.
[26] P. Scheunders, J. Sijbers, Multiscale watershed segmentation of multivalued images, in: International Conference on Pattern Recognition, Quebec, 2002.
[27] G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion, Electronics Letters 38 (7) (2002) 313–315.
[28] V. Petrovic, C. Xydeas, On the effects of sensor noise in pixel-level image fusion performance, in: Proceedings of the Third International Conference on Image Fusion, vol. 2, 2000, pp. 14–19.
[29] G. Piella, H. Heijmans, A new quality metric for image fusion, in: International Conference on Image Processing, ICIP, Barcelona, Spain, 2003.
[30] Z. Wang, A. Bovik, A universal image quality index, IEEE Signal Processing Letters 9 (3) (2002) 81–84.