Infrared Physics & Technology 53 (2010) 43–49
Feature-level and pixel-level fusion routines when coupled to infrared night-vision tracking scheme
Yi Zhou, Abedalroof Mayyas, Ala Qattawi, Mohammed Omar *
Clemson University International Center for Automotive Research – CU-ICAR, Greenville, SC 29607, United States
Article history: Received 28 May 2009; available online 3 September 2009.
Keywords: Feature-based fusion; Pixel-level fusion; Night vision; Weighted average; Gaussian filtering.
Abstract
This manuscript quantitatively evaluates the feature-based and pixel-based fusion schemes when applied to fuse infrared LWIR and visible TV sequences. The input sequences are from a commercial night-vision module dedicated to automotive applications. The text presents an in-house feature-level fusion routine that applies three fusing relationships (intersection, disjoint, and inclusion) in addition to a new object-tracking routine. The processing is done for two specific night-driving scenarios: a passing vehicle, and an approaching vehicle with glare. The study presents the feature-level fusion details, which include a registration done at the hardware level, a Gaussian-based preprocessing, a feature extraction subroutine, and finally the fusing logic. The evaluation criteria are based on the retrieved objects' morphology and the number of features extracted. The presented comparison shows that the feature-level scheme is more robust to variations in the intensity of the input channels and provides a higher signal-to-noise ratio: 6.18 compared with 4.72 for the pixel-level case. Additionally, this study indicates that the pixel-level scheme extracts more information from the channel with higher intensity, while the feature-level scheme highlights the input with the higher number of features.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction

Sensor fusion is the process of combining the relevant information from multiple sensors into one unified output. Image fusion is one subset of this field and is currently implemented in several areas: aerial mapping [1], surveillance [2], and most recently night-vision driving modules [3]. A comprehensive definition and applications of image fusion can be found in [4]. Current image fusion schemes can be usefully classified into pixel-level, feature-level, and decision-level fusion. The distinguishing factor between these levels is the sequence and the criteria for extracting the information from each source image. In pixel-level fusion, the content of each image is combined on a pixel-by-pixel basis, followed by the information extraction step. In feature-level fusion, the information is extracted separately from each source image and then combined. In decision-level fusion, the information is extracted from each input separately and then a decision is made, based on specific criteria such as man-made vs. natural features, on the content to be combined from each input channel. Each level constitutes a different set of image processing routines, done in different domains and using different arithmetic operations.
* Corresponding author. Address: 340 Carroll Campbell Graduate Engineering Center, CU-ICAR Campus. Tel.: +864 283 7226. E-mail address: [email protected] (M. Omar).
doi:10.1016/j.infrared.2009.08.011
In [3], we proposed a new pixel-level fusion routine coupled with an in-house tracking code to detect the objects encountered in night-driving scenarios, mainly pedestrians, vehicles, and road features. The pixel-level subroutine was able to retrieve the encountered features accurately at a refresh rate of 30 Hz. This manuscript investigates the tracking routine performance when coupled with a feature-level fusion for the same driving scenario of [5]. Section 2 discusses the image computations needed for the feature-level code, explaining the code flow-diagram and contrasting it with that of a pixel-level fusion scheme. Section 3 compares the retrieved objects from the feature-level and pixel-level codes and evaluates the tracking code performance in predicting the detected objects' shape and signal-to-noise ratio. Finally, the conclusion summarizes the findings and proposes future improvements.

2. Feature-level fusion: the image processing

Pixel-level fusion was developed first, in [3], because it is considered to be of lower complexity than feature-level fusion. It fuses raw images from different inputs and domains to enhance objects that are not complete in either domain or input channel. The feature-level fusion, on the other hand, requires higher-quality raw imagery, because of the initial step of extracting features from each channel, which should be done in real time (30 Hz). However, the feature-level fusion is considered advantageous because it
requires a single processing code, while the pixel-level scheme might require a Principal Component Analysis (PCA) transformation for saturated pixels, as in the head-lamp glare case, in addition to an adaptive averaging for un-saturated pixels. This dual processing also requires a subroutine to distinguish between saturated and un-saturated input locations. The input sequences are from two channels: a Long-Wave Infrared LWIR (7.5–13 μm) feed from an un-cooled micro-bolometric array of 352 × 144 pixels (commercial name PathFinder®, a product of FLIR, MA), and a visible TV channel from a Charge-Coupled Device (CCD) detector. The refresh rate is set at 30 Hz for both detectors, and image registration is done at the hardware level using direct translation and rotation of the acquired scans based on an a priori calibrated target. Other possible registration routines, which require mapping the input pixels from both detectors, can be found in [6]. Figs. 1a and b and 2a and b display the input images; Fig. 1a shows the LWIR feed of the passing vehicle scenario and Fig. 1b shows that of the CCD detector. Fig. 2 displays the approaching vehicle with glare scenario, with (a) and (b) being the LWIR and CCD feeds, respectively. The proposed feature-level fusion code is based on the block-diagram in Fig. 3, which shows the main calculation sequence: preprocessing followed by the tracking routine developed in [5]. To improve the quality of the source images, a Gaussian filtering is applied to attenuate the noise level; the preprocessing is implemented using the operator in Eq. (1) and the convolution in Eq. (2):
G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)    (1)

S(x, y) = I(x, y) * G(x, y)    (2)
where G(x, y) is the Gaussian kernel, I(x, y) is the raw image, and S(x, y) is the convolution of G(x, y) and I(x, y). Other filters might be used for one or both of the raw images, depending on the noise type and level in the source image. Even though the Gaussian filter smooths the images, it proved to be an efficient and suitable choice for the current detectors. The main advantage of the Gaussian filter over an averaging-based filter is that it provides a good model of a fuzzy point source, whereas the averaging filter assumes a square correction of point-source images, which is not suitable for objects diffracted through defocused lenses.
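As a minimal sketch of the preprocessing in Eqs. (1) and (2), assuming OpenCV is available (the kernel size and sigma values below are illustrative, not the settings used by the authors):

    import cv2

    def preprocess(frame, sigma=1.5, ksize=5):
        """Attenuate detector noise by convolving the raw frame with a
        Gaussian kernel G(x, y), i.e. S = I * G as in Eqs. (1) and (2)."""
        # cv2.GaussianBlur builds the separable Gaussian kernel internally
        return cv2.GaussianBlur(frame, (ksize, ksize), sigma)

    # sigma may be tuned separately for the LWIR and visible feeds, e.g.:
    # smoothed_lwir = preprocess(lwir_frame, sigma=1.5)
    # smoothed_visible = preprocess(visible_frame, sigma=1.0)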
Fig. 1. Passing vehicle scenario: (a) LWIR image and (b) visible image.
Fig. 2. Approaching vehicle scenario: (a) LWIR and (b) visible image.
Additionally, the Gaussian filter is rotationally symmetric, its weights decrease monotonically from the central pixel (giving the most weight to the central pixel), it is separable, and its characteristics are easily adjusted through its standard deviation to suit each of the LWIR and visible feeds. The feature extraction code is based on a seed detection algorithm that consists of three primary steps: seed initiation, boundary detection, and finally seed growth to construct complete objects. The seed initiation step is based on a thresholding scheme that segments the objects in the field of view, thus eliminating the background and its associated noise. Lee et al. [7] utilized a thresholding routine that uses the input image's histogram to define an initial threshold value as the minimum between two peaks. In this case, however, a weighted double-threshold scheme is used to identify the threshold, based on the histograms in Fig. 4a and b of the LWIR and visible images from Fig. 1a and b, respectively. The weighted double-thresholding eliminates the temporal and spatial background and is found effective for both the visible and the thermal acquisitions. Self-referencing, proposed in [8,9], is another possible thresholding scheme. Even though the self-referencing scheme is effective in extracting targets from noisy thermal images using a 2D adaptive operator, it is mainly designed to detect BLObs (Binary Large Objects); thus it does not extract features spread in one dimension, such as lane markers and light streaks, which are both found in the current case study.
Fig. 3. Flow chart of proposed feature-level image fusion routine.
Fig. 4. Histogram of passing vehicle scenario: (a) LWIR image, with the selected threshold marked at intensity 180; (b) visible image, with the selected threshold marked at intensity 69.
The pseudo-code of the weighted double-thresholding algorithm is shown below; the threshold value for Fig. 4a is found to be 180 and that for Fig. 4b is 69. The resulting images after the double thresholding are displayed in Fig. 5a and b.

    initialize threshold_value to the intensity average of the ROI
    while threshold_value changes between iterations {
        for each pixel in the ROI {
            calculate the average of pixels less than threshold_value, average_low;
            calculate the average of pixels not less than threshold_value, average_high; }
        threshold_value = (average_low + average_high × weight) / (weight + 1); }
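For illustration, a NumPy sketch of the same weighted double-thresholding iteration is given below; the function name, the weight value, and the convergence tolerance are assumptions made for the sketch, not the authors' implementation:

    import numpy as np

    def weighted_double_threshold(roi, weight=2.0, tol=0.5):
        """Iteratively split the ROI intensities into low/high groups and update
        the threshold as a weighted mean of the two group averages."""
        threshold = float(roi.mean())                 # seed with the ROI average
        while True:
            low = roi[roi < threshold]
            high = roi[roi >= threshold]
            if low.size == 0 or high.size == 0:       # degenerate split, stop
                return threshold
            new_threshold = (low.mean() + weight * high.mean()) / (weight + 1.0)
            if abs(new_threshold - threshold) < tol:  # converged
                return new_threshold
            threshold = new_threshold

    # Usage (illustrative): binary seed map from an 8-bit LWIR frame
    # seeds = (lwir_frame >= weighted_double_threshold(lwir_frame)).astype(np.uint8) * 255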
The boundary detection is then implemented using a Sobel edge detector, described mathematically in Eqs. (3) and (4), coupled with a 4-directional flood-fill algorithm. Running the boundary detection on Fig. 5a and b results in Fig. 6a and b, respectively, which show the main boundaries of the detected objects, in this case the passing vehicle. Inspecting these figures indicates that the objects detected in the LWIR image are the vehicle's tires, due to the frictional heat generated, while Fig. 6b shows the vehicle head-lamps and the traffic signs. Thus, Fig. 6a and b indicate that the tracking routine can detect relevant information from both feeds, which can then be combined in the fusion step. If those features are not detected, a feature-level fusion will not be possible.
G(x, y) = \sqrt{G_x^2 + G_y^2}    (3)

\theta = \arctan(G_y / G_x)    (4)
where G(x, y) is the amplitude of the Sobel edge and θ is its direction. Once the seed and boundary are obtained, a seed-growth process is implemented using a combination of erosion and dilation to grow the seeds until they touch the boundaries and converge on stable edges. A feature extraction process is then applied to both the infrared and visible images to extract the targeted features, or relevant morphological information, such as the object's centroid, size, aspect ratio, and angular direction.
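A minimal OpenCV sketch of the Sobel boundary detection in Eqs. (3) and (4) and of a dilation-based seed growth is given below; the kernel sizes and iteration cap are illustrative, and the 4-directional flood-fill step is not shown:

    import cv2
    import numpy as np

    def sobel_edges(gray):
        """Return the Sobel gradient magnitude G(x, y) and direction theta, Eqs. (3)-(4)."""
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
        magnitude = np.sqrt(gx ** 2 + gy ** 2)
        theta = np.arctan2(gy, gx)          # quadrant-aware arctan(Gy / Gx)
        return magnitude, theta

    def grow_seeds(seeds, boundaries, max_iter=50):
        """Dilate the seed regions until they are stopped by the detected boundaries."""
        kernel = np.ones((3, 3), np.uint8)
        grown = seeds.copy()
        for _ in range(max_iter):
            dilated = cv2.dilate(grown, kernel, iterations=1)
            dilated[boundaries > 0] = 0     # do not grow across detected edges
            if np.array_equal(dilated, grown):
                break                       # converged on stable edges
            grown = dilated
        return grown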
The centroid is calculated using the image moments as in Eqs. (5) and (6):

m_{pq} = \sum_{x} \sum_{y} x^p y^q I(x, y)    (5)

x_c = \frac{m_{10}}{m_{00}}, \qquad y_c = \frac{m_{01}}{m_{00}}    (6)

where I(x, y) is the intensity of the pixel at location (x, y), and p and q are the orders in x and y, respectively; m_{00} is the region area, m_{10} is the region moment about the x axis, and m_{01} is the region moment about the y axis. The eccentricity is calculated from the lengths of the major and minor axes, as described through Eqs. (7)–(10). To calculate the major and minor axis lengths, the covariance matrix is first computed:

C = \begin{pmatrix} \mathrm{cov}(x, x) & \mathrm{cov}(x, y) \\ \mathrm{cov}(y, x) & \mathrm{cov}(y, y) \end{pmatrix}    (7)

where \mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}, and \bar{x} and \bar{y} are the mean values of x and y, respectively. The eigenvalues of the covariance matrix are then obtained through its trace and determinant:

\mathrm{trace}\,C = \mathrm{cov}(x, x) + \mathrm{cov}(y, y), \qquad \det C = \mathrm{cov}(x, x)\,\mathrm{cov}(y, y) - \mathrm{cov}(x, y)\,\mathrm{cov}(y, x)    (8)

Finally, the major and minor axis lengths can be computed using Eqs. (9) and (10):

\text{Major axis length} = \frac{\mathrm{trace}\,C + \sqrt{(\mathrm{trace}\,C)^2 - 4\det C}}{2}    (9)

\text{Minor axis length} = \frac{\mathrm{trace}\,C - \sqrt{(\mathrm{trace}\,C)^2 - 4\det C}}{2}    (10)

The direction of the object is calculated as the angle θ between the major axis and the horizontal axis, as described in Eq. (11):

\theta = \frac{1}{2} \tan^{-1}\left(\frac{2\,\mathrm{cov}(x, y)}{\mathrm{cov}(x, x) - \mathrm{cov}(y, y)}\right)    (11)
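For reference, a NumPy sketch of these morphological measurements (Eqs. (5)–(11)) applied to a binary object mask; this is an illustrative reimplementation rather than the authors' code, and it reports the eigenvalue-based axis lengths exactly as defined in Eqs. (9) and (10):

    import numpy as np

    def region_features(mask):
        """Centroid, major/minor axis lengths, and orientation of a binary region."""
        ys, xs = np.nonzero(mask)                    # pixel coordinates of the region
        m00 = xs.size                                # zeroth moment = region area
        xc, yc = xs.sum() / m00, ys.sum() / m00      # centroid, Eq. (6)

        # Covariance matrix of the pixel coordinates, Eq. (7)
        cov_xx = np.var(xs, ddof=1)
        cov_yy = np.var(ys, ddof=1)
        cov_xy = np.cov(xs, ys, ddof=1)[0, 1]

        trace_c = cov_xx + cov_yy                    # Eq. (8)
        det_c = cov_xx * cov_yy - cov_xy ** 2
        root = np.sqrt(max(trace_c ** 2 - 4.0 * det_c, 0.0))
        major = (trace_c + root) / 2.0               # Eq. (9)
        minor = (trace_c - root) / 2.0               # Eq. (10)

        theta = 0.5 * np.arctan2(2.0 * cov_xy, cov_xx - cov_yy)  # Eq. (11), quadrant-aware
        return (xc, yc), major, minor, theta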
Finally, the major and minor axes are used to draw the rectangles showing the reconstructed, detected objects from both feeds, as displayed in Fig. 7. After establishing that the detection code can extract the relevant information from each channel, the feature-level algorithm operates on the features from Fig. 7a and b.
Fig. 5. Seed initiation: (a) LWIR image and (b) visible image.
Fig. 6. Boundary detection results: (a) LWIR image and (b) visible image.
Fig. 7. Feature extraction results: (a) LWIR image and (b) visible image.
Fig. 8. Feature-level fusion conditions: (a) intersect, (b) disjoint, and (c) include.
The feature-level fusion can apply one of several relationships between the detected objects from each sensor; these relationships are shown graphically in Fig. 8a–c, where the circle I represents infrared tracking and the rectangle V represents visual tracking. Because the LWIR band spans 7.5–13 μm and the visible band spans 0.38–0.78 μm, the tracking information from the infrared and visual images is inherently complementary. Therefore, a series of feature-level operations is applied for the three conditions above, based on this complementary relationship. In the intersect condition, both tracks contain information about the same target, so an OR operation F = I ∪ V is applied to obtain the union of I and V and retain more of the target's features. In the disjoint condition, each detected object corresponds to a different target, so the OR operation F = I ∪ V is again applied so that both are kept. In the include condition, both tracks contain information about the same target; however, because one is included in the other, there is some redundant information, and an AND operation F = I ∩ V is applied to obtain the intersection of I and V and hence the more detailed description of the target.
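A minimal sketch of this three-condition logic on axis-aligned bounding boxes is given below; the box representation and helper names are assumptions made for the sketch, not the authors' data structures:

    from typing import List, Tuple

    Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)

    def overlap(a: Box, b: Box) -> bool:
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def contains(outer: Box, inner: Box) -> bool:
        return (outer[0] <= inner[0] and outer[1] <= inner[1]
                and outer[2] >= inner[2] and outer[3] >= inner[3])

    def union(a: Box, b: Box) -> Box:
        return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

    def intersection(a: Box, b: Box) -> Box:
        return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

    def fuse(infrared: List[Box], visible: List[Box]) -> List[Box]:
        """Apply the include (AND), intersect (OR), and disjoint (keep both) conditions."""
        fused, matched = [], set()
        for i_box in infrared:
            partner = next((v for v in visible if overlap(i_box, v)), None)
            if partner is None:
                fused.append(i_box)                         # disjoint: keep as-is
            elif contains(i_box, partner) or contains(partner, i_box):
                fused.append(intersection(i_box, partner))  # include: AND
                matched.add(partner)
            else:
                fused.append(union(i_box, partner))         # intersect: OR
                matched.add(partner)
        fused.extend(v for v in visible if v not in matched)  # unmatched visible boxes
        return fused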
Fig. 11. Pixel-level fusion of: (a) passing vehicle and (b) approaching vehicle scenario.
The fused image of the passing vehicle scenario from Fig. 1 is shown in Fig. 9. Both the emitted information from warm features, such as the tires in the infrared image, and the reflective features, such as the road sign and the vehicle head-lamps in the visual image, are retained in the final fused image. For the approaching vehicle with glare scenario, the fused image is shown in Fig. 10. The infrared input carries more detailed information about the vehicle but misses the reflective components from the ambient objects: the road lane-markings and signs. The visual image carries more information about the road sign and lane markings but is saturated with the bright glare, which prevents any prediction of the object's type (shape) or orientation. For this scenario, the feature-level fusion results in the image displayed in Fig. 10, which retains useful information from both the infrared and the visual domains while attenuating the glare to show the vehicle in detail.
Fig. 9. Feature-level fusion of LWIR image and visible image.
3. Feature-level vs. pixel-level fusion

This section discusses the implemented feature-level fusion results in light of the results from a pixel-level fusion conducted for the same scenarios. We start by demonstrating the pixel-level fusion of the images from Figs. 1 and 2; the results are displayed in Fig. 11a and b, respectively, along with the tracking routine trace. For the pixel-level fusion, an adaptive weighting algorithm, described mathematically in Eq. (12), is used to combine the LWIR and visible raw images into the images in Fig. 11. The complete details of this algorithm are given in reference [3]; its processing block-diagram is displayed in Fig. 12 for completeness.
Fig. 10. Feature-level fusion of approaching vehicle scenario.

Img_{Fused} = \frac{\alpha\,Img_{Infrared}\,Img_{Visible}}{Img_{Infrared}^2 + Img_{Visible}^2}    (12)
where Img represents each of the input images and the weighting factor α is found to have the value of 2.
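A minimal NumPy sketch of the adaptive weighting in Eq. (12), assuming registered single-channel inputs scaled to [0, 1]; the small epsilon is an added numerical safeguard, not part of the original formulation:

    import numpy as np

    def pixel_level_fuse(infrared, visible, alpha=2.0, eps=1e-6):
        """Pixel-level fusion per Eq. (12): alpha * I * V / (I^2 + V^2)."""
        infrared = infrared.astype(np.float64)
        visible = visible.astype(np.float64)
        fused = alpha * infrared * visible / (infrared ** 2 + visible ** 2 + eps)
        return np.clip(fused, 0.0, 1.0)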
Fig. 12. The block-diagram for the pixel-level fusion code (processing blocks: Infrared Image, Visual Image, Preprocessing, Image Fusion, Seed Detection, Boundary Detection, Seed Growth, and a convergence check).
Generally, both fusion codes are successful in integrating more information into the final fused image; however, the detected vehicle trace in Fig. 9 describes the shape of the vehicle more accurately than that in Fig. 11a. The feature-level result has also detected the traffic sign that is missed in the pixel-level image of Fig. 11a. Comparing Figs. 10 and 11b shows that the feature-level fusion has resulted in more detected features, including the lane-markings, the traffic signs, and the light-spots in the background, all of which are missing from the pixel-level image in Fig. 11b. On the other hand, the feature-level result is saturated with glare compared with the pixel-level image, which displays the approaching vehicle without any glare in the field of view.
To further compare the pixel-level and feature-level computations, the signal-to-noise ratio (SNR) of each of the resulting images is computed using Eq. (13), which is derived from a relation proposed by Krile et al. [10]:

SNR = \frac{\frac{1}{k}\sum_{i=1}^{k} [P_f(i)]^2}{[\sigma(N(i))]^2}    (13)

where P_f(i) is an intensity profile within the image background and the noise N(i) is the difference between two such profiles taken within the same proximity. Applying Eq. (13) to the pixel-level and feature-level fused images results in average SNR values of 4.72 and 6.18, respectively; this difference might be due to the Gaussian filter implementation sequence in each of the codes. Nevertheless, the SNR computation indicates that, for the same input channels, the feature-level scheme provides better fused images in terms of resolution than the pixel-level scheme.
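A minimal sketch of this SNR measure, assuming the two intensity profiles are sampled from nearby background rows of the fused image (the profile selection in the usage note is illustrative):

    import numpy as np

    def profile_snr(profile, nearby_profile):
        """SNR per Eq. (13): mean squared profile intensity divided by the squared
        standard deviation of the noise (difference of two nearby profiles)."""
        profile = profile.astype(np.float64)
        noise = profile - nearby_profile.astype(np.float64)
        return float(np.mean(profile ** 2) / (np.std(noise) ** 2))

    # Usage (illustrative): compare two adjacent background rows of the fused image
    # snr = profile_snr(fused[200, 50:150], fused[201, 50:150])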
To further investigate the reconstructed objects' boundaries, an intensity profile is drawn across some of the passing vehicle features in each of the fused images; the resulting profiles are compiled in Fig. 13, which shows that the reconstructed head-lamp boundaries are better defined in the feature-level case, while the tires are more detectable in the pixel-level image. Additionally, the data extracted from each input channel by each fusion routine is investigated by subtracting each raw image from the fused one. This image arithmetic subtraction yields the images in Fig. 14a and b, which correspond to the approaching vehicle with glare scenario. Investigating Fig. 14a and b indicates that the pixel-level scheme extracts more information from the channel with higher intensity, while the feature-level scheme highlights the input with the higher number of features. This is because, in feature-level fusion, the feature extraction operates on each input separately before fusing, so the difference in intensity scale between the inputs is neutralized.
Fig. 13. The intensity (pixel-level vs. feature-level) across features of interest from the approaching vehicle: (a) the vehicle head-lamp and (b) the vehicle tires.
Fig. 14. Features added from the infrared image to the visible raw image, the result of subtracting the: (a) feature-level and (b) pixel-level images from the visible image of the approaching vehicle scenario.
Other quantitative criteria, such as the Image Quality (IQ) metric proposed by Liu in [4], are more suitable for multi-spectral (two-band) fusion with a focus on edge intensity and Gaussian-noise contribution.

4. Conclusion

This manuscript presented a combined algorithm to track and fuse relevant objects and features from LWIR and visible sequences into reconstructed fused images. An in-house tracking algorithm is used to extract the relevant features from each input channel, using a double-thresholding, self-calibrated process that equalizes the histogram of each raw feed. The proposed feature-level fusion is applied successfully to the two discussed driving scenarios and can be further applied to real-time video at a 30 Hz refresh rate. Additionally, the text evaluated the results from the feature-level approach against those from a pixel-level code, which uses an adaptive weighting and a PCA transformation. The comparison indicates that the feature-level fusion produces final images with higher SNR than those obtained using the pixel-level scheme. Additionally, the pixel-level code is more sensitive to the intensity scales of objects in the input images, while the feature-level code focuses on the number of objects of interest in the raw feed. Even though the feature-level scheme applies the feature extraction subroutine to both input images, and hence requires more processing resources, it avoids the need to check each input pixel for saturation, as in the pixel-level routine.
The proposed and evaluated feature-level code shows the potential of this technique in combining the relevant data from raw, semi-processed, and processed input images. Further optimization is required to improve the code's reconstruction of detected objects' boundaries.

References
[1] L. Wald, T. Ranchin, M. Mangolini, Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images, Photogrammetric Engineering & Remote Sensing 63 (6) (1997) 691–699.
[2] W.C. Kao, Real-time image fusion and adaptive exposure control for smart surveillance systems, Electronics Letters 43 (18) (2007).
[3] Y. Zhou, M.A. Omar, Routines for fusing infrared, visible acquisitions, applied to night vision systems, International Journal of Optomechatronics 3 (1) (2009) 41.
[4] R. Blum, Z. Liu, Multi-Sensor Image Fusion and Its Applications, Marcel Dekker/CRC Press, London, 2005.
[5] M. Omar, Y. Zhou, Pedestrian tracking routine for passive automotive night vision systems, Sensor Review 27 (4) (2007) 310–316.
[6] L.G. Brown, A survey of image registration techniques, ACM Computing Surveys 24 (4) (1992) 325–376.
[7] T.S. Lee, J.-S. Shie, T.L. Hung, Application studies of a simulated low density room-temperature IRFPA, SPIE Proceedings 3361 (1998) 26–34.
[8] M. Omar, K. Chuah, K. Saito, A. Numasato, M. Sakakibara, Infrared seed inspection system (IRSIS) on painted car shells, Infrared Physics and Technology 48 (2006) 240–248.
[9] M. Omar, B. Gharaibeh, A.J. Salazar, K. Saito, Infrared thermography and ultraviolet fluorescence for the nondestructive evaluation of ballast tanks' coated surfaces, NDT & E International 40 (2007) 62–70.
[10] L. Krile, S. Mitra, Digital registration technique for sequential fundus images, Applications of Digital Image Processing X 829 (1987) 293–300.