Infrared Physics & Technology 53 (2010) 43–49
Feature-level and pixel-level fusion routines when coupled to infrared night-vision tracking scheme
Yi Zhou, Abedalroof Mayyas, Ala Qattawi, Mohammed Omar *
Clemson University International Center for Automotive Research – CU-ICAR, Greenville, SC 29607, United States
Article history: Received 28 May 2009; available online 3 September 2009.
Keywords: Feature-based fusion; Pixel-level fusion; Night vision; Weighted average; Gaussian filtering.
Abstract
This manuscript quantitatively evaluates the feature-based and pixel-based fusion schemes when applied to fuse infrared LWIR and visible TV sequences. The input sequences are from a commercial night-vision module dedicated to automotive applications. The text presents an in-house feature-level fusion routine that applies three fusing relationships (intersection, disjoint, and inclusion) in addition to a new object-tracking routine. The processing is done for two specific night-driving scenarios: a passing vehicle, and an approaching vehicle with glare. The study presents the feature-level fusion details, which include a registration done at the hardware level, a Gaussian-based preprocessing, a feature extraction subroutine, and finally the fusing logic. The evaluation criteria are based on the retrieved objects' morphology and the number of features extracted. The presented comparison shows that the feature-level scheme is more robust to variations in the intensity of the input channels and provides a higher signal-to-noise ratio: 6.18 compared with 4.72 for the pixel-level case. Additionally, this study indicates that the pixel-level scheme extracts more information from the channel with higher intensity, while the feature-level scheme highlights the input with the higher number of features.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction

Sensor fusion is the process of combining the relevant information from multiple sensors into one unified output. Image fusion is one subset of this field and is currently implemented in several areas: aerial mapping [1], surveillance [2], and most recently night-vision driving modules [3]. A comprehensive definition and applications of image fusion can be found in [4]. Current image fusion schemes can be usefully classified into pixel-level, feature-level, and decision-level fusion. The distinguishing factor between these levels is the sequence and the criteria for extracting the information from each source image. In pixel-level fusion, the content of each image is combined on a pixel-by-pixel basis, followed by the information extraction step. In feature-level fusion, the information is extracted separately from each source image and then combined. In decision-level fusion, the information is extracted from each input separately and then a decision is made, based on specific criteria such as man-made vs. natural features, on the content to be combined from each input channel. Each level constitutes a different set of image processing routines, done in different domains and using different arithmetic operations.
* Corresponding author. Address: 340 Carroll Campbell Graduate Engineering Center, CU-ICAR Campus. Tel.: +864 283 7226. E-mail address: [email protected] (M. Omar).
doi:10.1016/j.infrared.2009.08.011
In [3], we proposed a new pixel-level fusion routine coupled with an in-house tracking code to detect the objects encountered in night-driving scenarios, mainly pedestrians, vehicles, and road features. The pixel-level subroutine was able to retrieve the encountered features accurately at a refresh rate of 30 Hz. This manuscript investigates the tracking routine performance when coupled with a feature-level fusion for the same driving scenario of [5]. Section 2 discusses the image computations needed for the feature-level code, explaining the code flow-diagram and contrasting it with that of a pixel-level fusion scheme. Section 3 compares the retrieved objects from the feature-level and pixel-level codes and evaluates the tracking code performance in predicting the detected objects' shape and signal-to-noise ratio. Finally, the conclusion summarizes the findings and proposes future improvements.

2. Feature-level fusion: the image processing

Pixel-level fusion was developed first, in [3], because it is considered to be of lower complexity than feature-level fusion. It fuses raw images from different inputs and domains to enhance objects that are not complete in either domain or input channel. The feature-level fusion, on the other hand, requires higher-quality raw imagery, because of the initial step of extracting features from each channel, which should be done in real time (30 Hz). However, the feature-level fusion is considered advantageous because it
requires a single processing code, while the pixel-level scheme might require a Principal Component Analysis (PCA) transformation for saturated pixels, as in the head-lamp glare case, in addition to an adaptive averaging for un-saturated pixels. This dual processing also requires a subroutine to distinguish between saturated and un-saturated input locations. The input sequences are from two channels: a Long-Wave Infrared LWIR (7.5–13 μm) feed from an un-cooled micro-bolometric array of 352 × 144 pixels (commercial name PathFinder®, a product of FLIR, MA), and a visible TV channel from a Charge-Coupled Device (CCD) detector. The refresh rate is set at 30 Hz for both detectors, and image registration is done at the hardware level using direct translation and rotation of the acquired scans based on an a priori calibrated target. Other possible registration routines, which require mapping the input pixels from both detectors, can be found in [6]. Figs. 1a and b and 2a and b display the input images; Fig. 1a shows the LWIR feed of the passing vehicle scenario and Fig. 1b shows that of the CCD detector. Fig. 2 displays the approaching vehicle with glare scenario, with (a) and (b) being the LWIR and CCD feeds, respectively. The proposed feature-level fusion code is based on the block-diagram in Fig. 3, which shows the main calculation sequence: preprocessing followed by the tracking routine developed in [5]. To improve the quality of the source images, a Gaussian filtering is applied to attenuate the noise level; the preprocessing is implemented using the operator in Eq. (1) and the convolution in Eq. (2):
G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)    (1)

S(x, y) = I(x, y) * G(x, y)    (2)
where G(x, y) is the Gaussian kernel, I(x, y) is the raw image, and S(x, y) is the convolution of G(x, y) and I(x, y). Other filters might be used for one or both of the raw images, depending on the noise type and level in the source image. Even though the Gaussian filter smooths the images, it proved to be an efficient and suitable choice for the current detectors. The main advantage of the Gaussian filter over an averaging-based filter is that it provides a good model of a fuzzy point source, whereas the averaging filter assumes a square correction of point-source images, which is not suitable for objects diffracted through defocused lenses.
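As a minimal sketch of the preprocessing in Eqs. (1) and (2), assuming OpenCV is available (the kernel size and sigma values below are illustrative, not the settings used by the authors):

    import cv2

    def preprocess(frame, sigma=1.5, ksize=5):
        """Attenuate detector noise by convolving the raw frame with a
        Gaussian kernel G(x, y), i.e. S = I * G as in Eqs. (1) and (2)."""
        # cv2.GaussianBlur builds the separable Gaussian kernel internally
        return cv2.GaussianBlur(frame, (ksize, ksize), sigma)

    # sigma may be tuned separately for the LWIR and visible feeds, e.g.:
    # smoothed_lwir = preprocess(lwir_frame, sigma=1.5)
    # smoothed_visible = preprocess(visible_frame, sigma=1.0)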
Fig. 1. Passing vehicle scenario: (a) LWIR image and (b) visible image.
Fig. 2. Approaching vehicle scenario: (a) LWIR and (b) visible image.
Additionally, the Gaussian filter is rotationally symmetric, its weights decrease monotonically from the central pixel (giving the most weight to the central pixel), it is separable, and its characteristics are easily adjusted through its standard deviation to suit each of the LWIR and visible feeds. The feature extraction code is based on a seed detection algorithm that consists of three primary steps: seed initiation, boundary detection, and finally seed growth to construct complete objects. The seed initiation step is based on a thresholding scheme that segments the objects in the field of view, thus eliminating the background and its associated noise. Lee et al. [7] utilized a thresholding routine that uses the input image's histogram to define an initial threshold value as the minimum between two peaks. In this case, however, a weighted double-threshold scheme is used to identify the threshold, based on the histograms in Fig. 4a and b of the LWIR and visible images from Fig. 1a and b, respectively. The weighted double-thresholding eliminates the temporal and spatial background and is found effective for both the visible and the thermal acquisitions. Self-referencing, proposed in [8,9], is another possible thresholding scheme. Even though the self-referencing scheme is effective in extracting targets from noisy thermal images using a 2D adaptive operator, it is mainly designed to detect BLObs (Binary Large Objects); thus it does not extract features spread in one dimension, such as lane markers and light streaks, which are both found in the current case study.
Fig. 3. Flow chart of proposed feature-level image fusion routine.
Fig. 4. Histogram of passing vehicle scenario: (a) LWIR image, with the selected threshold marked at intensity 180; (b) visible image, with the selected threshold marked at intensity 69.
The pseudo-code of the weighted double-thresholding algorithm is shown below; the threshold value for Fig. 4a is found to be 180 and that for Fig. 4b is 69. The resulting images after the double thresholding are displayed in Fig. 5a and b.

    initialize threshold_value to the intensity average of the ROI
    while threshold_value changes between iterations {
        for each pixel in the ROI {
            calculate the average of pixels less than threshold_value, average_low;
            calculate the average of pixels not less than threshold_value, average_high; }
        threshold_value = (average_low + average_high × weight) / (weight + 1); }
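For illustration, a NumPy sketch of the same weighted double-thresholding iteration is given below; the function name, the weight value, and the convergence tolerance are assumptions made for the sketch, not the authors' implementation:

    import numpy as np

    def weighted_double_threshold(roi, weight=2.0, tol=0.5):
        """Iteratively split the ROI intensities into low/high groups and update
        the threshold as a weighted mean of the two group averages."""
        threshold = float(roi.mean())                 # seed with the ROI average
        while True:
            low = roi[roi < threshold]
            high = roi[roi >= threshold]
            if low.size == 0 or high.size == 0:       # degenerate split, stop
                return threshold
            new_threshold = (low.mean() + weight * high.mean()) / (weight + 1.0)
            if abs(new_threshold - threshold) < tol:  # converged
                return new_threshold
            threshold = new_threshold

    # Usage (illustrative): binary seed map from an 8-bit LWIR frame
    # seeds = (lwir_frame >= weighted_double_threshold(lwir_frame)).astype(np.uint8) * 255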
The boundary detection is then implemented using a Sobel edge detector, described mathematically in Eqs. (3) and (4), coupled with a 4-directional flood-fill algorithm. Running the boundary detection on Fig. 5a and b results in Fig. 6a and b, respectively, which show the main boundaries of the detected objects, in this case the passing vehicle. Inspecting these figures indicates that the objects detected in the LWIR image are the vehicle's tires, due to the frictional heat generated, while Fig. 6b shows the vehicle head-lamps and the traffic signs. Thus, Fig. 6a and b indicate that the tracking routine can detect relevant information from both feeds, which can then be combined in the fusion step. If those features are not detected, a feature-level fusion will not be possible.
G(x, y) = \sqrt{G_x^2 + G_y^2}    (3)

\theta = \arctan(G_y / G_x)    (4)
where G(x, y) is the amplitude of the Sobel edge and θ is its direction. Once the seed and boundary are obtained, a seed-growth process is implemented using a combination of erosion and dilation to grow the seeds until they touch the boundaries and converge on stable edges. A feature extraction process is then applied to both the infrared and visible images to extract the targeted features, or relevant morphological information, such as the object's centroid, size, aspect ratio, and angular direction.
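A minimal OpenCV sketch of the Sobel boundary detection in Eqs. (3) and (4) and of a dilation-based seed growth is given below; the kernel sizes and iteration cap are illustrative, and the 4-directional flood-fill step is not shown:

    import cv2
    import numpy as np

    def sobel_edges(gray):
        """Return the Sobel gradient magnitude G(x, y) and direction theta, Eqs. (3)-(4)."""
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
        magnitude = np.sqrt(gx ** 2 + gy ** 2)
        theta = np.arctan2(gy, gx)          # quadrant-aware arctan(Gy / Gx)
        return magnitude, theta

    def grow_seeds(seeds, boundaries, max_iter=50):
        """Dilate the seed regions until they are stopped by the detected boundaries."""
        kernel = np.ones((3, 3), np.uint8)
        grown = seeds.copy()
        for _ in range(max_iter):
            dilated = cv2.dilate(grown, kernel, iterations=1)
            dilated[boundaries > 0] = 0     # do not grow across detected edges
            if np.array_equal(dilated, grown):
                break                       # converged on stable edges
            grown = dilated
        return grown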
The centroid is calculated using the image moments as in Eqs. (5) and (6):

m_{pq} = \sum_{x} \sum_{y} x^p y^q I(x, y)    (5)

x_c = \frac{m_{10}}{m_{00}}, \qquad y_c = \frac{m_{01}}{m_{00}}    (6)

where I(x, y) is the intensity of the pixel at location (x, y), and p and q are the orders in x and y, respectively; m_{00} is the region area, m_{10} is the region moment about the x axis, and m_{01} is the region moment about the y axis. The eccentricity is calculated from the lengths of the major and minor axes, as described through Eqs. (7)–(10). To calculate the major and minor axis lengths, the covariance matrix is first computed:

C = \begin{pmatrix} \mathrm{cov}(x, x) & \mathrm{cov}(x, y) \\ \mathrm{cov}(y, x) & \mathrm{cov}(y, y) \end{pmatrix}    (7)

where \mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}, and \bar{x} and \bar{y} are the mean values of x and y, respectively. The eigenvalues of the covariance matrix are then obtained through its trace and determinant:

\mathrm{trace}\,C = \mathrm{cov}(x, x) + \mathrm{cov}(y, y), \qquad \det C = \mathrm{cov}(x, x)\,\mathrm{cov}(y, y) - \mathrm{cov}(x, y)\,\mathrm{cov}(y, x)    (8)

Finally, the major and minor axis lengths can be computed using Eqs. (9) and (10):

\text{Major axis length} = \frac{\mathrm{trace}\,C + \sqrt{(\mathrm{trace}\,C)^2 - 4\det C}}{2}    (9)

\text{Minor axis length} = \frac{\mathrm{trace}\,C - \sqrt{(\mathrm{trace}\,C)^2 - 4\det C}}{2}    (10)

The direction of the object is calculated as the angle θ between the major axis and the horizontal axis, as described in Eq. (11):

\theta = \frac{1}{2} \tan^{-1}\left(\frac{2\,\mathrm{cov}(x, y)}{\mathrm{cov}(x, x) - \mathrm{cov}(y, y)}\right)    (11)
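For reference, a NumPy sketch of these morphological measurements (Eqs. (5)–(11)) applied to a binary object mask; this is an illustrative reimplementation rather than the authors' code, and it reports the eigenvalue-based axis lengths exactly as defined in Eqs. (9) and (10):

    import numpy as np

    def region_features(mask):
        """Centroid, major/minor axis lengths, and orientation of a binary region."""
        ys, xs = np.nonzero(mask)                    # pixel coordinates of the region
        m00 = xs.size                                # zeroth moment = region area
        xc, yc = xs.sum() / m00, ys.sum() / m00      # centroid, Eq. (6)

        # Covariance matrix of the pixel coordinates, Eq. (7)
        cov_xx = np.var(xs, ddof=1)
        cov_yy = np.var(ys, ddof=1)
        cov_xy = np.cov(xs, ys, ddof=1)[0, 1]

        trace_c = cov_xx + cov_yy                    # Eq. (8)
        det_c = cov_xx * cov_yy - cov_xy ** 2
        root = np.sqrt(max(trace_c ** 2 - 4.0 * det_c, 0.0))
        major = (trace_c + root) / 2.0               # Eq. (9)
        minor = (trace_c - root) / 2.0               # Eq. (10)

        theta = 0.5 * np.arctan2(2.0 * cov_xy, cov_xx - cov_yy)  # Eq. (11), quadrant-aware
        return (xc, yc), major, minor, theta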
Finally, the major and minor axes are used to draw the rectangles showing the reconstructed, detected objects from both feeds, as displayed in Fig. 7. After establishing that the detection code can extract the relevant information from each channel, the feature-level algorithm operates on the features from Fig. 7a and b.
Fig. 5. Seed initiation: (a) LWIR image and (b) visible image.
Fig. 6. Boundary detection results: (a) LWIR image and (b) visible image.
Fig. 7. Feature extraction results: (a) LWIR image and (b) visible image.
Fig. 8. Feature-level fusion conditions: (a) intersect, (b) disjoint, and (c) include.
The feature-level fusion can apply one of several relationships between the detected objects from each sensor; these relationships are shown graphically in Fig. 8a–c, where the circle I represents infrared tracking and the rectangle V represents visual tracking. Because the LWIR band spans 7.5–13 μm and the visible band spans 0.38–0.78 μm, the tracking information from the infrared and visual images is inherently complementary. Therefore, a series of feature-level operations is applied for the three conditions above, based on this complementary relationship. In the intersect condition, both tracks contain information about the same target, so an OR operation F = I ∪ V is applied to obtain the union of I and V and retain more of the target's features. In the disjoint condition, each detected object corresponds to a different target, so the OR operation F = I ∪ V is again applied so that both are kept. In the include condition, both tracks contain information about the same target; however, because one is included in the other, there is some redundant information, and an AND operation F = I ∩ V is applied to obtain the intersection of I and V and hence the more detailed description of the target.
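A minimal sketch of this three-condition logic on axis-aligned bounding boxes is given below; the box representation and helper names are assumptions made for the sketch, not the authors' data structures:

    from typing import List, Tuple

    Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)

    def overlap(a: Box, b: Box) -> bool:
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def contains(outer: Box, inner: Box) -> bool:
        return (outer[0] <= inner[0] and outer[1] <= inner[1]
                and outer[2] >= inner[2] and outer[3] >= inner[3])

    def union(a: Box, b: Box) -> Box:
        return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

    def intersection(a: Box, b: Box) -> Box:
        return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

    def fuse(infrared: List[Box], visible: List[Box]) -> List[Box]:
        """Apply the include (AND), intersect (OR), and disjoint (keep both) conditions."""
        fused, matched = [], set()
        for i_box in infrared:
            partner = next((v for v in visible if overlap(i_box, v)), None)
            if partner is None:
                fused.append(i_box)                         # disjoint: keep as-is
            elif contains(i_box, partner) or contains(partner, i_box):
                fused.append(intersection(i_box, partner))  # include: AND
                matched.add(partner)
            else:
                fused.append(union(i_box, partner))         # intersect: OR
                matched.add(partner)
        fused.extend(v for v in visible if v not in matched)  # unmatched visible boxes
        return fused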
Fig. 11. Pixel-level fusion of: (a) passing vehicle and (b) approaching vehicle scenario.
The fused image of the passing vehicle scenario from Fig. 1 is shown in Fig. 9. Both the emitted information from warm features, such as the tires in the infrared image, and the reflective features, such as the road sign and the vehicle head-lamps in the visual image, are retained in the final fused image. For the approaching vehicle with glare scenario, the fused image is shown in Fig. 10. The infrared input carries more detailed information about the vehicle but misses the reflective components from the ambient objects: the road lane-markings and signs. The visual image carries more information about the road sign and lane markings but is saturated with the bright glare, which prevents any prediction of the object's type (shape) or orientation. For this scenario, the feature-level fusion results in the image displayed in Fig. 10, which retains useful information from both the infrared and the visual domains while attenuating the glare to show the vehicle in detail.
Fig. 9. Feature-level fusion of LWIR image and visible image.
3. Feature-level vs. pixel-level fusion

This section discusses the implemented feature-level fusion results in light of the results from a pixel-level fusion conducted for the same scenarios. We start by demonstrating the pixel-level fusion of the images from Figs. 1 and 2; the results are displayed in Fig. 11a and b, respectively, along with the tracking routine trace. For the pixel-level fusion, an adaptive weighting algorithm, described mathematically in Eq. (12), is used to combine the LWIR and visible raw images into the images in Fig. 11. The complete details of this algorithm are given in reference [3]; its processing block-diagram is displayed in Fig. 12 for completeness.
Fig. 10. Feature-level fusion of approaching vehicle scenario.

Img_{Fused} = \frac{\alpha\,Img_{Infrared}\,Img_{Visible}}{Img_{Infrared}^2 + Img_{Visible}^2}    (12)
where Img represents each of the input images and the weighting factor α is found to have the value of 2.
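A minimal NumPy sketch of the adaptive weighting in Eq. (12), assuming registered single-channel inputs scaled to [0, 1]; the small epsilon is an added numerical safeguard, not part of the original formulation:

    import numpy as np

    def pixel_level_fuse(infrared, visible, alpha=2.0, eps=1e-6):
        """Pixel-level fusion per Eq. (12): alpha * I * V / (I^2 + V^2)."""
        infrared = infrared.astype(np.float64)
        visible = visible.astype(np.float64)
        fused = alpha * infrared * visible / (infrared ** 2 + visible ** 2 + eps)
        return np.clip(fused, 0.0, 1.0)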
Fig. 12. The block-diagram for the pixel-level fusion code (processing blocks: Infrared Image, Visual Image, Preprocessing, Image Fusion, Seed Detection, Boundary Detection, Seed Growth, and a convergence check).
Generally, both fusion codes are successful in integrating more information into the final fused image; however, the detected vehicle trace in Fig. 9 describes the shape of the vehicle more accurately than that in Fig. 11a. The feature-level result has also detected the traffic sign that is missed in the pixel-level image of Fig. 11a. Comparing Figs. 10 and 11b shows that the feature-level fusion has resulted in more detected features, including the lane-markings, the traffic signs, and the light-spots in the background, all of which are missing from the pixel-level image in Fig. 11b. On the other hand, the feature-level result is saturated with glare compared with the pixel-level image, which displays the approaching vehicle without any glare in the field of view.
To further compare the pixel-level and feature-level computations, the signal-to-noise ratio (SNR) of each of the resulting images is computed using Eq. (13), which is derived from a relation proposed by Krile et al. [10]:

SNR = \frac{\frac{1}{k}\sum_{i=1}^{k} [P_f(i)]^2}{[\sigma(N(i))]^2}    (13)

where P_f(i) is an intensity profile within the image background and the noise N(i) is the difference between two such profiles taken within the same proximity. Applying Eq. (13) to the pixel-level and feature-level fused images results in average SNR values of 4.72 and 6.18, respectively; this difference might be due to the Gaussian filter implementation sequence in each of the codes. Nevertheless, the SNR computation indicates that, for the same input channels, the feature-level scheme provides better fused images in terms of resolution than the pixel-level scheme.
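A minimal sketch of this SNR measure, assuming the two intensity profiles are sampled from nearby background rows of the fused image (the profile selection in the usage note is illustrative):

    import numpy as np

    def profile_snr(profile, nearby_profile):
        """SNR per Eq. (13): mean squared profile intensity divided by the squared
        standard deviation of the noise (difference of two nearby profiles)."""
        profile = profile.astype(np.float64)
        noise = profile - nearby_profile.astype(np.float64)
        return float(np.mean(profile ** 2) / (np.std(noise) ** 2))

    # Usage (illustrative): compare two adjacent background rows of the fused image
    # snr = profile_snr(fused[200, 50:150], fused[201, 50:150])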
To further investigate the reconstructed objects' boundaries, an intensity profile is drawn across some of the passing vehicle features in each of the fused images; the resulting profiles are compiled in Fig. 13, which shows that the reconstructed head-lamp boundaries are better defined in the feature-level case, while the tires are more detectable in the pixel-level image. Additionally, the data extracted from each input channel by each fusion routine is investigated by subtracting each raw image from the fused one. This image arithmetic subtraction yields the images in Fig. 14a and b, which correspond to the approaching vehicle with glare scenario. Investigating Fig. 14a and b indicates that the pixel-level scheme extracts more information from the channel with higher intensity, while the feature-level scheme highlights the input with the higher number of features. This is because, in feature-level fusion, the feature extraction operates on each input separately before fusing, so the difference in intensity scale between the inputs is neutralized.
Fig. 13. The intensity (pixel-level vs. feature-level) across features of interest from the approaching vehicle: (a) the vehicle head-lamp and (b) the vehicle tires.
Fig. 14. Features added from the infrared image to the visible raw image, the result of subtracting the: (a) feature-level and (b) pixel-level images from the visible image of the approaching vehicle scenario.
Other quantitative criteria, such as the Image Quality (IQ) metric proposed by Liu in [4], are more suitable for multi-spectral (two-band) fusion with a focus on edge intensity and Gaussian-noise contribution.

4. Conclusion

This manuscript presented a combined algorithm to track and fuse relevant objects and features from LWIR and visible sequences into reconstructed fused images. An in-house tracking algorithm is used to extract the relevant features from each input channel, using a double-thresholding, self-calibrated process that equalizes the histogram of each raw feed. The proposed feature-level fusion is applied successfully to the two discussed driving scenarios and can be further applied to real-time video at a 30 Hz refresh rate. Additionally, the text evaluated the results from the feature-level approach against those from a pixel-level code, which uses an adaptive weighting and a PCA transformation. The comparison indicates that the feature-level fusion produces final images with higher SNR than those obtained using the pixel-level scheme. Additionally, the pixel-level code is more sensitive to the intensity scales of objects in the input images, while the feature-level code focuses on the number of objects of interest in the raw feed. Even though the feature-level scheme applies the feature extraction subroutine to both input images, and hence requires more processing resources, it avoids the need to check each input pixel for saturation, as in the pixel-level routine.
The proposed and evaluated feature-level code shows the potential of this technique in combining the relevant data from raw, semi-processed, and processed input images. Further optimization is required to improve the code's reconstruction of detected objects' boundaries.

References
[1] L. Wald, T. Ranchin, M. Mangolini, Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images, Photogrammetric Engineering & Remote Sensing 63 (6) (1997) 691–699.
[2] W.C. Kao, Real-time image fusion and adaptive exposure control for smart surveillance systems, Electronics Letters 43 (18) (2007).
[3] Y. Zhou, M.A. Omar, Routines for fusing infrared, visible acquisitions, applied to night vision systems, International Journal of Optomechatronics 3 (1) (2009) 41.
[4] R. Blum, Z. Liu, Multi-Sensor Image Fusion and Its Applications, Marcel Dekker/CRC Press, London, 2005.
[5] M. Omar, Y. Zhou, Pedestrian tracking routine for passive automotive night vision systems, Sensor Review 27 (4) (2007) 310–316.
[6] L.G. Brown, A survey of image registration techniques, ACM Computing Surveys 24 (4) (1992) 325–376.
[7] T.S. Lee, J.-S. Shie, T.L. Hung, Application studies of a simulated low density room-temperature IRFPA, SPIE Proceedings 3361 (1998) 26–34.
[8] M. Omar, K. Chuah, K. Saito, A. Numasato, M. Sakakibara, Infrared seed inspection system (IRSIS) on painted car shells, Infrared Physics and Technology 48 (2006) 240–248.
[9] M. Omar, B. Gharaibeh, A.J. Salazar, K. Saito, Infrared thermography and ultraviolet fluorescence for the nondestructive evaluation of ballast tanks' coated surfaces, NDT & E International 40 (2007) 62–70.
[10] L. Krile, S. Mitra, Digital registration technique for sequential fundus images, Applications of Digital Image Processing X 829 (1987) 293–300.