
Optik - International Journal for Light and Electron Optics 206 (2020) 164167


Original research article

Infrared dim target detection method inspired by human vision system


Shaoyi Li a,*, Chenhui Li a, Xi Yang a, Kai Zhang a, Jianfei Yin b

a School of Astronautics, Northwestern Polytechnical University, Xi'an, 710072, China
b Shanghai Academy of Spaceflight Technology, Shanghai, 201109, China

ARTICLE INFO

ABSTRACT

Keywords: Infrared dim target; Target detection; Scale adaptation; Visual contrast; Pipeline filtering

Infrared dim target detection has long been a key technology for various systems, such as infrared search and track (IRST) systems and the Space-Based Infrared System (SBIRS). However, it is difficult for traditional detection methods to adapt to different types of complex backgrounds. Therefore, this paper proposes an adaptive infrared dim target detection method based on human visual contrast, motion, prediction, and other characteristics. First, according to the characteristics of different types of background images, a classification preprocessing strategy is adopted to remove noise, suppress the background, and improve the target signal-to-noise ratio. Second, on the basis of the visual contrast and scale adaptation mechanisms, we propose an adaptive multi-scale local contrast method to extract the salient regions, and we then apply spectral scale analysis to further suppress the background, enhance the target central area, and construct a suspected target set. Finally, the candidate moving target set is obtained by motion region matching using the optical flow method, and a multi-frame screening strategy combined with dynamic pipeline filtering is proposed to identify the target and reduce the false positive rate. Our experimental results indicate that the proposed method can adapt to changes in the target scale and achieve stable and adaptive detection of dim targets against sky, sea-sky, and ground object backgrounds.

1. Introduction

Infrared dim target detection technology, which is widely used in infrared search and track (IRST) systems, the Space-Based Infrared System (SBIRS), infrared imaging systems, guided missile systems, etc., is a hotspot in military research today. In these applications, owing to the long imaging distance, the target imaging area is small and no shape or texture information is available. Furthermore, practical complex environments present a variety of challenging characteristics: diverse background types, dynamically changing and highly fluctuating backgrounds, low signal-to-noise ratio (SNR), fluctuating and submerged target energy, and dark or dim targets. Typical small-target imaging areas (1×1 to 5×5 pixels) and relatively low SNRs (2–6) are shown in Fig. 1. All these factors make infrared dim target detection and tracking in practical applications extremely challenging in terms of background adaptation, target scale changes, low SNR, high false positive rates, etc.

In recent decades, researchers worldwide have conducted numerous studies on infrared dim target detection algorithms for image sequences, which are mainly divided into two categories: detection before tracking (DBT) and tracking before detection (TBD).



Corresponding author. E-mail address: [email protected] (S. Li).

https://doi.org/10.1016/j.ijleo.2020.164167 Received 11 October 2019; Received in revised form 31 December 2019; Accepted 1 January 2020 0030-4026/ © 2020 Elsevier GmbH. All rights reserved.


Fig. 1. Infrared dim target environment.

Typical DBT single-frame detection methods include spatial maximum median filtering [1], morphology top-hat [2], least-squares support vector machine (LS-SVM) [3], two-dimensional least-mean-square filtering (TDLMS) [4], and wavelet transform [5]. The target screening stage includes pipeline filtering [6] and the Hough transform [7]. TBD methods include three-dimensional matched filtering [8], particle filtering [9], and dynamic programming [10].

Inspired by the powerful ability of the human visual system in dim target perception and prediction, this study proposes an adaptive infrared dim target detection method that can adapt to multiple background types and target scale changes in complex backgrounds, such as sky and sea-sky backgrounds. This method combines spatial- and frequency-domain saliency detection models to enhance the contrast between the target and the background and thus achieve adaptive candidate target region extraction. Then, combined with the target motion characteristics in the time domain, the optical flow method and a dynamic pipeline filtering screening strategy are used to reject false suspected targets.

The remainder of this article is organized as follows. Section 2 reviews related studies. Section 3 describes the steps of the proposed infrared dim target detection method. Section 4 presents and analyzes the experimental results. Finally, Section 5 summarizes the findings of the study and explores directions for future work.

2. Related work

In recent years, human visual characteristics, such as the contrast mechanism, the visual attention mechanism, scale adaptation, and multi-resolution representation, have been introduced into infrared dim target detection [11]. On the basis of these characteristics, Kim et al. [12] used the Laplacian of Gaussian (LoG) filter to simulate the contrast mechanism, which can enhance small-target brightness while suppressing background clutter; their Tune-Max of SCR (TM-SCR) algorithm [13] can further solve scale problems and achieve clutter suppression. Wang et al. [14] used the difference of Gaussians (DoG) filter to calculate the saliency map; as an approximation of LoG, DoG has the same characteristics but involves less computation. Dong et al. [15] added a Gaussian window to the saliency map produced by the DoG filter to simulate the visual attention mechanism. In addition, Chen et al. [16] proposed a detection method based on a local contrast measure (LCM), which exploits human visual system (HVS) characteristics and derived kernel (DK) models. However, this method requires pixel-by-pixel calculation, which leads to relatively low efficiency, and highlight noise can easily cause false positives. Han et al. [17] improved the method by adopting an image blocking strategy and adjusting the algorithm for highlight noise; however, its detection performance depends strongly on the window size. Other methods refine the contrast operator to improve detection performance [18–27], but their computational complexity cannot meet engineering demands.

With the increasing focus on the real-time performance of detection methods, frequency-domain methods have shown the advantages of simple principles, low computational complexity, and strong effect. Studies have indicated that, after image transformation, the spectrum contains a large amount of saliency-related information. The spectral residual (SR) method proposed by Hou et al. [28] is regarded as a classical frequency-domain saliency detection method. Guo et al. [29] found that the phase spectrum alone can achieve the same effect, added color features, and proposed a detection method based on the phase spectrum of the quaternion Fourier transform (PQFT). Qi et al. [30] used a second-order directional derivative (SODD) filter constructed from a facet model to extract second-order directional derivative maps of the original image in four directions, which are used as the quaternion data channels. The filter distinguishes the target from the background according to the Gaussian shape of the infrared dim target and the anisotropy of the background; PQFT is again used to suppress the background and enhance the target. Some recent theories have also been applied to fast infrared target detection [31–43]. It is worth noting that none of


Fig. 2. Block diagram of the proposed algorithm.

the above-mentioned methods uses spectral information. In the hypercomplex Fourier transform (HFT) saliency detection method based on spectral scale analysis proposed by Li et al. [44], the amplitude spectrum of the image reflects the difference between salient and non-salient regions: a periodic signal in the spatial domain corresponds, after the Fourier transform, to a peak at a certain frequency in the spectrum, and the peak becomes more pronounced as the number of repetitions increases; by contrast, a salient region remains relatively smooth in the frequency spectrum. Once the non-salient area is found, the detection problem translates into the problem of suppressing this part effectively, and spectral scale analysis is used to determine the optimal smoothing scale. This method is applicable to RGB images; for infrared dim target detection, however, identifying the features to be used is the key to the effectiveness of the algorithm.

3. Proposed method

In this study, we propose an infrared dim target detection method inspired by the human visual system. Fig. 2 shows the framework of this method, which consists of four parts. First, in the preprocessing part, median filtering is used to reduce noise, and whether a background difference step is required is determined according to the image condition; 2D-DoG filtering is then uniformly adopted to suppress the background and enhance the target. Second, combined with the scale adaptation concept, the local contrast is computed and an adaptive threshold is used to extract the salient regions. Then, the background is further suppressed by combining the intensity and direction characteristics and using spectral scale analysis to construct the saliency map. Finally, in the spatial domain, the motion regions extracted by the optical flow method are matched with the salient regions, and a multi-frame screening strategy based on dynamic pipeline filtering is used to obtain the real target position. The following subsections describe each part of the method in detail.

3.1. Spatial salient region extraction

According to visual physiological structure and psychological theory, human visual attention is selective, and the parts that attract attention are called salient regions. For the HVS, owing to the contrast mechanism, contrast is more important than features such as shape and color [45]. Furthermore, in visual attention, the positioning of the target is not affected by dimensional changes. In this study, on the basis of the contrast mechanism and the scale-adaptive characteristics of the HVS, we compute the multi-scale local contrast and extract the salient regions using an adaptive threshold.

3.1.1. Spatial filtering

The concentric circular antagonism model of the retinal ganglion cell receptive field can be expressed by the DoG [26]. The two-dimensional DoG operator is defined as follows:

DoG(x, y) = \frac{1}{2\pi\sigma_1^2} e^{-\frac{x^2+y^2}{2\sigma_1^2}} - \frac{1}{2\pi\sigma_2^2} e^{-\frac{x^2+y^2}{2\sigma_2^2}}    (1)

where the two Gaussian functions correspond, respectively, to the strong central excitation zone and the weak peripheral suppression zone. The DoG operator thus has central-excitation and lateral-suppression characteristics, which can simulate the contrast mechanism of the HVS. Therefore, in the preprocessing stage, the proposed method uses a 2D-DoG filter to suppress the background and improve the target-background contrast. The 2D-DoG filter combines multiple narrow band-pass Gaussian filters to form a broad band-pass filter as follows:

\sum_{n=1}^{N} DoG(x, y, \sigma_{1n}, \sigma_{2n}) = \sum_{n=1}^{N} [G(x, y, \sigma_{1n}) - G(x, y, \sigma_{2n})] = G(x, y, \sigma_{1N}) - G(x, y, \sigma_{21})    (2)

where \sigma_{1,n+1} > \sigma_{2,n+1} \geq \sigma_{1n} > \sigma_{2n}, N represents the number of filters, and the two scales \sigma_{1N} and \sigma_{21} determine the low and high cutoff frequencies of the filter. A 5 × 5 Gaussian kernel is used to simplify the operation.

3.1.2. Multi-scale local contrast feature extraction

An image blocking strategy is adopted for the filtered image. A p × p sliding window is used to traverse the entire image


Fig. 3. Schematic of multi-scale local contrast calculation.

from left to right and top to bottom with a fixed step size to obtain a series of image patches. According to the definition of a small target scale, we set a step size of 7 and p = 15 in the algorithm. Note that the sub-block size in [17] is fixed, so it cannot adapt to varying target sizes, which affects the significance of the image patches. To solve this problem, we make some improvements based on the concept of scale adaptation. As shown in Fig. 3, by sequentially changing the scale, sub-blocks of different sizes at the center of each image block are taken as the suspected target area (label 0). Its 8 neighboring sub-blocks, which are of the same size as the suspected area, form the local background (labels 1, 2, …, 8). We set σ = 3, 5, 7, …, p in the algorithm. The value of any sub-block is represented by the grayscale mean of all the pixels in it:

m = \frac{1}{N} \sum_{i,j} I_c(i, j)    (3)

where N is the number of pixels in the sub-block, I_c(i, j) is the grayscale value of the pixel at coordinates (i, j) in the sub-block, and L represents the sub-block grayscale maximum:

L = \max(I_c(i, j))    (4)

We can use the following formula to calculate the local contrast of the image block at different scales.

ILCM_\sigma = \min_k \frac{L_0^\sigma \times m_0^\sigma}{m_k^\sigma}, \quad k = 1, 2, \ldots, 8    (5)

where σ represents the scale, L_0^σ is the maximum grayscale value of the 0th sub-block, m_0^σ is its grayscale mean, and m_k^σ (k = 1, 2, …, 8) is the grayscale mean of the kth neighboring sub-block. The saliency of the corresponding image block n is obtained by taking the maximum local contrast over all scales:

C_n = \max_\sigma(ILCM_\sigma), \quad \sigma = 3, 5, \ldots, p    (6)

The saliency matrix of the entire image can be expressed as

S(a, b) = C_n    (7)

where a and b denote the row and column indices of the image block.
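A minimal sketch of the multi-scale local contrast of Eqs. (3)–(7) follows, assuming the block and step sizes stated above (p = 15, step 7); the boundary handling and the small epsilon guard are our own additions.

```python
import numpy as np

def multiscale_local_contrast(img, p=15, step=7):
    # Saliency matrix S of Eq. (7): slide a p x p window; at each position,
    # the centred s x s sub-block (s = 3, 5, ..., p) is the suspected target
    # area and its 8 same-size neighbours form the local background.
    img = img.astype(np.float64)
    H, W = img.shape
    rows, cols = (H - p) // step + 1, (W - p) // step + 1
    S = np.zeros((rows, cols))
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for a in range(rows):
        for b in range(cols):
            cy, cx = a * step + p // 2, b * step + p // 2   # block centre
            C_n = 0.0
            for s in range(3, p + 1, 2):                    # scales 3, 5, ..., p
                h = s // 2
                centre = img[cy - h:cy + h + 1, cx - h:cx + h + 1]
                L0, m0 = centre.max(), centre.mean()        # Eqs. (3)-(4)
                ratios = []
                for dy, dx in offsets:                      # 8 neighbours
                    y0, x0 = cy + dy * s - h, cx + dx * s - h
                    nb = img[max(y0, 0):y0 + s, max(x0, 0):x0 + s]
                    if nb.size:                             # clipped at borders
                        ratios.append(L0 * m0 / (nb.mean() + 1e-9))
                if ratios:
                    C_n = max(C_n, min(ratios))             # Eqs. (5)-(6)
            S[a, b] = C_n
    return S
```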

3.1.3. Adaptive salient region extraction

To remove point noise in the infrared scene image, a median filter with a 3 × 3 template is first applied to the original image. Considering the influence of different backgrounds and illumination variations on the target, we propose a classification processing strategy: for a uniform or slowly changing background, we adopt the difference and normalization operation applied to the median filtering result I_m, whereas for a non-uniform or rapidly changing background, we adopt the median filtering result itself. Low target brightness and small scale make it extremely easy for the median filter to lose target information, while the difference image effectively preserves the target. The result of this phase is used as the input to the 2D-DoG filter. First, define the difference image as follows:

I_d(i, j) = I_n(i, j) - I_m(i, j)    (8)

The part of I_d that is less than zero is set to zero, i.e.,

I_d(i, j) = \begin{cases} I_d(i, j), & I_d(i, j) \geq 0 \\ 0, & I_d(i, j) < 0 \end{cases}    (9)

Then, normalize the result as follows:

I'_d = I_d \cdot \frac{X_{max} - X_{min}}{X_{max}}    (10)

where I_n is the nth frame image, I_m is the median filtering result of I_n, and I_d is the difference image. Further, X_max and X_min are the grayscale maximum and minimum values of the image, respectively.


According to the mean of the normalized difference image I'_d, the classification processing result is obtained as

I_{temp} = \begin{cases} I_m, & \mu(I'_d) \geq T_0 \\ I_d, & \mu(I'_d) < T_0 \end{cases}    (11)

where μ(I'_d) represents the mean value of the normalized difference image and T_0 is the classification threshold, which is determined experimentally. The salient regions can then be obtained by applying threshold segmentation to the saliency matrix S. The adaptive threshold T_h is calculated as

T_h = \mu_s + k\sigma_s, \quad k = k_0 \frac{\sigma_{I'_d}}{\mu_{I'_d}}    (12)

where μ_s and σ_s are the mean and standard deviation of the saliency matrix, respectively, and k is an adaptive adjustment factor determined by the mean and standard deviation of the normalized difference map. In general, k_0 is selected according to image data tests; it is set to 2.0 in this paper.
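The classification preprocessing of Eqs. (8)–(11) and the adaptive threshold of Eq. (12) can be sketched as follows. This is a simplified reading under our assumptions: the function names and epsilon guards are ours, and T_0 must be chosen experimentally as stated above.

```python
import numpy as np
from scipy.ndimage import median_filter

def classify_preprocess(frame, T0):
    # Eqs. (8)-(11): choose between the median-filtered image and the
    # positive difference image according to mean(I'_d).
    In = frame.astype(np.float64)
    Im = median_filter(In, size=3)                         # 3 x 3 template
    Id = np.maximum(In - Im, 0.0)                          # Eqs. (8)-(9)
    Idn = Id * (Id.max() - Id.min()) / (Id.max() + 1e-9)   # Eq. (10)
    # Large mean difference -> rapidly changing background -> median result;
    # small mean difference -> uniform background -> difference image.
    Itemp = Im if Idn.mean() >= T0 else Id
    return Itemp, Idn

def adaptive_threshold(S, Idn, k0=2.0):
    # Eq. (12): T_h = mu_s + k * sigma_s, with k = k0 * sigma(I'_d) / mu(I'_d).
    k = k0 * Idn.std() / (Idn.mean() + 1e-9)
    return S.mean() + k * S.std()
```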

3.2. Frequency-domain salient region center enhancement

Background clutter still exists in the regions extracted in Section 3.1. To further highlight the target central region, drawing on the foveal characteristics of the human eye, we use a frequency-domain saliency method to enhance the contrast between the target region signal and the background. First, the directional feature is extracted for each salient region using a SODD filter [30]. Owing to the optical diffusion effect, a long-range infrared dim target approximates an isotropic Gaussian spot, while background clutter and noise present a single dominant direction in the local range. The SODD filter distinguishes between the background clutter and the target by converting them into directional texture and Gaussian-like spots, respectively. The SODD along the direction vector l⃗ at the point (x_0, y_0) is expressed as

\frac{\partial^2 f(x, y)}{\partial \vec{l}^2}\Big|_{(x_0, y_0)} = 2K_4(x_0, y_0)\cos^2\alpha + 2K_5(x_0, y_0)\cos\alpha\sin\alpha + 2K_6(x_0, y_0)\sin^2\alpha    (13)

where α is the angle between l⃗ and the X axis (image row direction). The fitting coefficients K_i(x_0, y_0) (i = 4, 5, 6) can be obtained using the following template operation:

W_4 = \frac{1}{70}\begin{bmatrix} 2 & 2 & 2 & 2 & 2 \\ -1 & -1 & -1 & -1 & -1 \\ -2 & -2 & -2 & -2 & -2 \\ -1 & -1 & -1 & -1 & -1 \\ 2 & 2 & 2 & 2 & 2 \end{bmatrix}, \quad W_5 = \frac{1}{100}\begin{bmatrix} 4 & 2 & 0 & -2 & -4 \\ 2 & 1 & 0 & -1 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ -2 & -1 & 0 & 1 & 2 \\ -4 & -2 & 0 & 2 & 4 \end{bmatrix}, \quad W_6 = W_4^T    (14)

Let α = 0°, 45°, 90°, 135°. The SODD maps in the four directions are obtained using Eq. (14). Note that in the obtained SODD maps, the grayscale value of the target is negative and that of its neighborhood is positive; therefore, grayscale values greater than zero are set to zero. Then, using an orthogonal fusion strategy, the two perpendicular directions among the four are merged pairwise to obtain two fused SODD maps. The fusion operation effectively suppresses directional background clutter from the orthogonal direction and highlights the small target region, which is not affected by the filtering direction. The specific function can be expressed as

SODD_1 = \frac{\partial^2 f(x, y)}{\partial \vec{l}^2}\Big|_{\alpha=0^\circ} \cdot \frac{\partial^2 f(x, y)}{\partial \vec{l}^2}\Big|_{\alpha=90^\circ}, \quad SODD_2 = \frac{\partial^2 f(x, y)}{\partial \vec{l}^2}\Big|_{\alpha=45^\circ} \cdot \frac{\partial^2 f(x, y)}{\partial \vec{l}^2}\Big|_{\alpha=135^\circ}    (15)

On the basis of the frequency-domain scale analysis method [44], the best-scale filter is used to smooth the spectral spikes and thus suppress the zero-frequency background and other periodic backgrounds. The detailed steps of the process are as follows. A hypercomplex matrix is constructed from the SODD maps and the preprocessed image:

q(m, n) = \lambda_1 p_1 + \lambda_2 p_2 i + \lambda_3 p_3 j + \lambda_4 p_4 k    (16)

where i, j, k represent the three imaginary axes and p_{1,2,3,4} represent the four channels, which are set as follows:

p_2 = SODD_1, \quad p_3 = SODD_2, \quad p_4 = I_s    (17)

where I_s is the image processed by 2D-DoG filtering. The corresponding channel coefficients are λ_1 = 0, λ_2 = λ_3 = 0.25, and λ_4 = 0.5. Considering that the magnitudes of the different channel feature maps are not uniform, we perform a normalization


Fig. 4. Basic process of dynamic pipeline filtering algorithm.

operation before the fusion. The Gaussian low-pass filter cluster in Eq. (18) smooths the spectrum of the hypercomplex matrix to yield the spectral scale space Λ:

\Lambda(u, v; k) = (g(\cdot, \cdot; k) * A)(u, v)    (18)

where A(u, v) is the amplitude spectrum of the hypercomplex matrix, "∗" represents the convolution operation, and g(·, ·; k) represents the Gaussian low-pass filter cluster, which is defined as follows:

g(u, v; k) = \frac{1}{\sqrt{2\pi}\, 2^{k-1} t_0} e^{-(u^2+v^2)/(2^{2k-1} t_0^2)}    (19)

where k is the scale parameter, k = 1, …, K, with K = \log_2 \min\{H, W\} + 1, where H and W are the height and width of the image. Further, t_0 is set to a constant value of 0.5, and the standard deviation of the filter is 2^{k-1}. Combining the processed spectrum with the original phase and performing the inverse Fourier transform gives the following saliency map group:

SM_k = h * |F^{-1}[\Lambda_k(u, v) e^{j\phi(u, v)}]|^2    (20)

where ϕ(u, v) denotes the original phase spectrum of the hypercomplex matrix and h denotes a Gaussian smoothing filter applied to enhance the saliency effect. The best saliency map S̃M is selected from the saliency map group using two-dimensional information entropy [44]. We believe

Fig. 5. Three-dimensional saliency maps of different detection methods. Rows (1)–(3) correspond to infrared images of sky, sea-sky, and ground object backgrounds. (a) original image and results of applying the (b) proposed, (c) top-hat, (d) ILCM, and (e) PQFT methods. 6


Table 1
Single-frame target detection results.

Background      | Original (SCRin / σin) | Proposed (SCRG / BSF) | Top-hat (SCRG / BSF) | ILCM (SCRG / BSF) | PQFT (SCRG / BSF)
Sky             | 4.48 / 50.21           | 4.07 / 16.68          | 3.50 / 6.11          | 1.25 / 2.94       | 1.09 / 12.49
Sea-sky         | 7.41 / 52.40           | 2.49 / 7.10           | 1.25 / 5.18          | 0.97 / 3.70       | 0.98 / 8.90
Ground objects  | 3.76 / 39.84           | 4.17 / 7.77           | 1.14 / 5.18          | 1.25 / 4.41       | 1.16 / 12.37

Note: Bold font in the original indicates the best performance.

Fig. 6. Sequence detection results of sky background: (a) original image and results of applying the (b) proposed, (c) top-hat, (d) ILCM, and (e) PQFT methods.

that the target and background regions of the best saliency map should be maximally distinguished, which corresponds to the minimum entropy value. The filter scale k_p of the best saliency map is obtained as

k_p = \arg\min_k \{H_{2D}(SM_k)\}    (21)

where H_{2D}(\cdot) represents the two-dimensional information entropy. Through the above-mentioned method, all the salient area centers are enhanced, and the full-size saliency map is constructed by placing all the salient p × p blocks at their original positions. The connected regions in which the grayscale value is greater than 0 are the suspected target regions, which can be expressed as

R_s = \{R_{s1}, R_{s2}, \ldots\}    (22)
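The spectral scale-space construction and minimum-entropy scale selection of Eqs. (16)–(21) can be sketched as below. For brevity we fuse the channels into a single real-valued map with the stated weights instead of performing a true hypercomplex Fourier transform, and we use a histogram (one-dimensional) entropy as a stand-in for the two-dimensional information entropy of [44]; both simplifications are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def spectral_scale_saliency(sodd1, sodd2, Is, t0=0.5):
    # Weighted channel fusion (real-valued surrogate for Eqs. (16)-(17)).
    def norm(c):
        return (c - c.min()) / (c.max() - c.min() + 1e-9)
    q = 0.25 * norm(sodd1) + 0.25 * norm(sodd2) + 0.5 * norm(Is)
    F = np.fft.fft2(q)
    A, phi = np.abs(F), np.angle(F)
    H, W = q.shape
    K = int(np.log2(min(H, W))) + 1
    best_map, best_entropy = None, np.inf
    for k in range(1, K + 1):
        Ak = gaussian_filter(A, sigma=2 ** (k - 1) * t0)   # Eqs. (18)-(19)
        sm = np.abs(np.fft.ifft2(Ak * np.exp(1j * phi))) ** 2
        sm = gaussian_filter(sm, sigma=2.0)                # smoothing h, Eq. (20)
        p, _ = np.histogram(sm, bins=64)
        p = p[p > 0] / p.sum()
        entropy = -(p * np.log2(p)).sum()                  # proxy for Eq. (21)
        if entropy < best_entropy:                         # minimum entropy wins
            best_map, best_entropy = sm, entropy
    return best_map
```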

3.3. Salient region matching and target screening

We consider that real targets exhibit continuous motion, whereas false targets in the background do not. Traditional single-frame detection methods do not use motion features, so it is difficult for them to remove false targets. Using multi-frame images is an effective way to solve this problem and reduce the false positive rate. Here, we propose that the high-accuracy (HA) optical flow method [46] be used to estimate the motion regions; the candidate targets are then obtained by matching the suspected target regions, and dynamic pipeline filtering completes the multi-frame target discrimination.

Table 2
Sequence detection statistical results of sky background.

Seq. | Total frames | Targets | Target scale | Target SNR range | Top-hat (DR / FP) | ILCM (DR / FP)  | PQFT (DR / FP) | Proposed (DR / FP)
1    | 117          | 1       | 3×3–6×6      | 2.71–5.51        | 97.4% / 0.897     | 82.1% / 0.846   | 94.9% / 0.154  | 100% / 0.559
2    | 117          | 4       | 3×3–5×5      | 2.63–5.44        | 66.0% / 0.051     | 36.5% / 0.026   | 53.2% / 0      | 96.3% / 0.007
3    | 150          | 1       | 3×3–4×4      | 4.95–5.28        | 100% / 0.440      | 100% / 0.120    | 100% / 0       | 100% / 0
4    | 150          | 1       | 3×3–4×4      | 5.15–9.26        | 100% / 3.060      | 100% / 3.220    | 100% / 2.580   | 100% / 2.133
5    | 150          | 1       | 2×2–3×3      | 2.89–4.96        | 100% / 9.540      | 46.0% / 11.280  | 88.0% / 6.920  | 100% / 0

Note: DR = detection rate; FP = false positive number per frame.


Fig. 7. Sequence detection results of sea-sky background: (a) original image and results of applying the (b) proposed, (c) top-hat, (d) ILCM, and (e) PQFT methods.

The HA optical flow method uses the global optical flow constraint equation and the smoothing-constraint regularization equation to construct a continuous energy functional, whose minimum is solved by the variational method to obtain the optical flow field. The HA algorithm is applied to the saliency map every m frames, and the optical flow fields U = {u(x, y)} and V = {v(x, y)} are obtained according to the following iteration:

u^{(n+m)} = \bar{u}^n - \frac{I_x(I_x \bar{u}^n + I_y \bar{v}^n + I_t)}{\lambda^2 + I_x^2 + I_y^2}, \quad v^{(n+m)} = \bar{v}^n - \frac{I_y(I_x \bar{u}^n + I_y \bar{v}^n + I_t)}{\lambda^2 + I_x^2 + I_y^2}    (23)

where ū and v̄ are the neighborhood averages of u and v, respectively, which are obtained using a local smoothing template. Further, I_x and I_y are the gradients of the image at a given pixel along the x and y directions, respectively, I_t represents the change in the grayscale value of the pixel over time, and λ is a weight coefficient, which we set to 2. A simple threshold segmentation of the optical flow field yields the motion region set

R_{mo} = \{R_{mo1}, R_{mo2}, \ldots\}    (24)
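For reference, a minimal sketch of the fixed-point update of Eq. (23) (a Horn–Schunck-style iteration) follows. The derivative kernels, the iteration count, and the simple magnitude threshold for Eq. (24) are our own choices; the full HA method of [46] additionally uses multi-scale warping.

```python
import numpy as np
from scipy.ndimage import convolve

AVG = np.array([[1, 2, 1],
                [2, 0, 2],
                [1, 2, 1]], float) / 12.0   # local smoothing template for u_bar, v_bar

def optical_flow(I1, I2, lam=2.0, n_iter=100):
    # Iterative solution of Eq. (23) between two consecutive saliency maps.
    I1, I2 = I1.astype(np.float64), I2.astype(np.float64)
    kx = np.array([[-1, 1], [-1, 1]], float) * 0.25
    Ix = convolve(I1, kx, mode='nearest')     # gradient along x
    Iy = convolve(I1, kx.T, mode='nearest')   # gradient along y
    It = I2 - I1                              # temporal change
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        ub = convolve(u, AVG, mode='nearest')
        vb = convolve(v, AVG, mode='nearest')
        common = (Ix * ub + Iy * vb + It) / (lam ** 2 + Ix ** 2 + Iy ** 2)
        u, v = ub - Ix * common, vb - Iy * common
    return u, v

# Eq. (24): motion regions from a simple magnitude threshold (T_flow assumed):
# motion_mask = np.hypot(u, v) > T_flow
```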

Then, the centroid of each suspected target area is calculated according to Eq. (25):

\bar{x}_k = \frac{\sum_{i,j} i \times I_{sk}(i, j)}{\sum_{i,j} I_{sk}(i, j)}, \quad \bar{y}_k = \frac{\sum_{i,j} j \times I_{sk}(i, j)}{\sum_{i,j} I_{sk}(i, j)}    (25)

where k indexes the kth suspected target area and I_{sk} denotes the saliency map corresponding to the suspected target area R_{sk}. When the centroid of a suspected target area falls inside a motion area, we consider them matched. The resulting candidate target set is expressed as

R_c = \{R_{ck} \mid (\bar{x}_k, \bar{y}_k) \in R_{mop},\ k = 1, 2, \ldots, n,\ p = 1, 2, \ldots, m\}    (26)

where there are n suspected target areas and m motion areas in total; the number of candidate target areas is at most the minimum of n and m. A minimal sketch of this matching step follows.
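The following sketch implements the centroid computation of Eq. (25) and the membership test of Eq. (26) with SciPy's labelling utilities; the function name is illustrative.

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def match_candidates(saliency, motion_mask):
    # Suspected regions R_s: connected components with positive saliency.
    lbl, n = label(saliency > 0)
    # Eq. (25): intensity-weighted centroids of the suspected regions.
    centroids = center_of_mass(saliency, lbl, range(1, n + 1))
    # Eq. (26): keep a region when its centroid falls inside a motion region.
    candidates = []
    for k, (yc, xc) in enumerate(centroids, start=1):
        if motion_mask[int(round(yc)), int(round(xc))]:
            candidates.append((k, (yc, xc)))
    return candidates
```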

Table 3
Sequence detection results of sea-sky background.

Seq. | Total frames | Targets | Target scale range | Target SNR range | Top-hat (DR / FP) | ILCM (DR / FP)  | PQFT (DR / FP) | Proposed (DR / FP)
1    | 153          | 2       | 3×3–5×5            | 3.54–4.57        | 75.5% / 0.196     | 50.0% / 0       | 50.0% / 0      | 95.6% / 0.369
2    | 150          | 1       | 2×2–3×3            | 2.41–4.67        | 100% / 0          | 100% / 0.280    | 100% / 0       | 100% / 0
3    | 288          | 2       | 2×2–4×4            | 4.46–5.36        | 39.1% / 1.115     | 66.7% / 0.646   | 66.7% / 0.604  | 98.9% / 0.416
4    | 150          | 1       | 3×3–4×4            | 3.88–4.74        | 82.0% / 10.620    | 24.0% / 12.330  | 24.0% / 4.500  | 93.3% / 6.200
5    | 147          | 1       | 2×2–3×3            | 5.77–7.83        | 100% / 1.388      | 100% / 13.796   | 100% / 0.694   | 95.5% / 1.000

Note: DR = detection rate; FP = false positive number per frame.


Fig. 8. Sequence detection results of ground object background: (a) original image and results of applying the (b) proposed, (c) top-hat, (d) ILCM, and (e) PQFT methods.

Next, a forward pipeline is established, centered on the centroid of each candidate target region, and nearest-neighbor data association is used to screen all the candidate target regions and remove the false targets. For long-distance imaging, the target centroid position changes only slightly in a short time; it can therefore be assumed that the target centroid in the next frame lies in the neighborhood of the centroid in the current frame, so the change in centroid position satisfies

d(\bar{x}_t, \bar{y}_t) \leq R, \quad d^2(\bar{x}_t, \bar{y}_t) = (\bar{x}_t - \bar{x}_{t-1})^2 + (\bar{y}_t - \bar{y}_{t-1})^2    (27)

where (x̄_t, ȳ_t) indicates the target position in the current frame. The neighborhood radius R is determined according to the actual situation; we set R = 9. The traditional pipeline filtering method must count the candidate targets after multi-frame image detection; its real-time performance is poor and requires improvement. This section therefore proposes a candidate target screening strategy based on real-time dynamic pipeline sequences. The basic flow is shown in Fig. 4, and a sketch of the screening loop is given below.

Step 1: Check whether the existing pipelines match the candidate targets of the current frame. If they match, go to Step 2; if not, go to Step 3.
Step 2: Update the pipeline centers and increase the length of the pipelines.
Step 3: Continue searching in the next frame and reduce the length of the pipeline. If nothing is found, expand the pipe radius to continue searching. If the pipe length falls to L < 3, discard the pipe.

When the length of a candidate region's pipeline reaches L ≥ 5, it is declared a true target. After performing the above-mentioned steps, a new pipeline is created for each remaining candidate target. Note that if multiple targets match the same pipeline, a template matching algorithm is introduced to determine the true target.
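A compact sketch of the dynamic pipeline screening loop follows. The initial pipe length and the radius-expansion increment are not specified in the text, so the values below are assumptions; the nearest-neighbor association and the L < 3 / L ≥ 5 rules follow the steps above.

```python
import numpy as np

class Pipe:
    def __init__(self, centre, R=9):
        self.centre = np.asarray(centre, float)
        self.length = 3      # initial length (assumed; not specified in the text)
        self.R = R           # current search radius, Eq. (27)

def update_pipes(pipes, detections, R=9):
    # One frame of the dynamic pipeline filtering (Fig. 4).
    used = set()
    for p in pipes:
        cand = [(np.hypot(*(np.asarray(c, float) - p.centre)), j)
                for j, c in enumerate(detections) if j not in used]
        if cand and min(cand)[0] <= p.R:      # Steps 1-2: nearest-neighbour match
            dist, j = min(cand)
            p.centre = np.asarray(detections[j], float)
            p.length += 1
            p.R = R
            used.add(j)
        else:                                 # Step 3: missed in this frame
            p.length -= 1
            p.R += 2                          # expand search radius (assumed step)
    pipes = [p for p in pipes if p.length >= 3]       # discard pipes with L < 3
    confirmed = [p for p in pipes if p.length >= 5]   # L >= 5: declared true target
    pipes += [Pipe(c, R) for j, c in enumerate(detections) if j not in used]
    return pipes, confirmed
```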

4. Experimental results

In our experiments, we use infrared images of sky, sea-sky, and ground object scenes, each of which includes five sets of image sequences. The image sequences were captured continuously by a long-wave infrared camera with a resolution of 640 × 512; the targets are aircraft or ships. Top-hat [2], ILCM [17], and PQFT [30] are selected as the comparison methods. All the experiments are performed in MATLAB 2014a.

Table 4
Sequence detection results of ground object background.

Seq. | Total frames | Targets | Target scale range | Target SNR range | Top-hat (DR / FP) | ILCM (DR / FP)  | PQFT (DR / FP)  | Proposed (DR / FP)
1    | 144          | 1       | 2×2–3×3            | 1.96–3.52        | 93.8% / 2.854     | 10.4% / 4.667   | 60.4% / 6.833   | 100% / 0.163
2    | 150          | 1       | 3×3–4×4            | 4.72–5.77        | 84.0% / 8.920     | 80.0% / 10.70   | 80.0% / 4.060   | 100% / 0.956
3    | 159          | 1       | 3×3–4×4            | 4.85–5.57        | 100% / 3.169      | 90.6% / 19.170  | 92.5% / 19.736  | 100% / 0.354
4    | 150          | 1       | 2×2–3×3            | 3.74–4.76        | 78.0% / 23.940    | 58.0% / 54.530  | 26.0% / 19.120  | 100% / 1.644
5    | 150          | 1       | 2×2–3×3            | 3.86–4.45        | 100% / 5.620      | 100% / 2.400    | 10.0% / 1.140   | 100% / 4.089

Note: DR = detection rate; FP = false positive number per frame.


Fig. 9. ROC curves: (a) sky background; (b) sea-sky background; (c) ground object background.

4.1. Comparative analysis of single-frame target detection performance

For the continuous detection of infrared dim targets, the single-frame detection effect is extremely important: ideal single-frame processing should enhance the target and suppress the background simultaneously. Three-dimensional saliency maps of single-frame detection in the three types of scenes are shown in Fig. 5; evidently, the proposed method significantly enhances the target and suppresses the background. Owing to the presence of noise in the third image sequence, noise reduction is performed uniformly in all the experiments. To objectively evaluate the single-frame processing effect of the different algorithms, we introduce two important evaluation indexes: the signal-to-clutter ratio gain (SCRG) and the background suppression factor (BSF). BSF evaluates the algorithm only in terms of background suppression, while SCRG comprehensively considers both background suppression and target enhancement. SCR, SCRG, and BSF can be expressed as

SCR = \frac{G_{mt} - G_{mb}}{\sigma_b}, \quad SCRG = \frac{SCR_{out}}{SCR_{in}}, \quad BSF = \frac{(\sigma_c)_{in}}{(\sigma_c)_{out}}    (28)

where G_{mt} denotes the average grayscale value of the target area, G_{mb} denotes the average grayscale value of the local background area, and σ_b represents the standard deviation of the local background area; the local background area is of size 3W × 3W, while the target area is W × W. SCRG is the ratio between the output and input SCR, and BSF is the ratio between the standard deviations of the original image and the processed image. Table 1 summarizes the evaluation results of the different methods. It can be seen that the proposed method achieves the optimal SCR gain under all three types of backgrounds. Further, the background suppression ability of the proposed method is better than that of top-hat and ILCM under the complex sea-sky and ground object backgrounds, although it is slightly lower than that of PQFT because some background clutter is also enhanced during the processing of the proposed method. Overall, the average single-frame detection effect of our method is better than that of the other methods under different backgrounds.
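For reproducibility, the single-frame metrics of Eq. (28) can be computed as in the following sketch; the box convention (top-left corner plus width W) and the epsilon guards are our own.

```python
import numpy as np

def scr(img, top_left, W):
    # Eq. (28): SCR of a W x W target whose local background is the
    # surrounding 3W x 3W area (target pixels excluded).
    y, x = top_left
    f = img.astype(np.float64)
    target = f[y:y + W, x:x + W]
    y0, x0 = max(y - W, 0), max(x - W, 0)
    bg = f[y0:y + 2 * W, x0:x + 2 * W].copy()
    bg[y - y0:y - y0 + W, x - x0:x - x0 + W] = np.nan   # mask out the target
    return (target.mean() - np.nanmean(bg)) / (np.nanstd(bg) + 1e-9)

def scrg_and_bsf(img_in, img_out, top_left, W):
    # SCRG = SCR_out / SCR_in; BSF = sigma(original) / sigma(processed).
    scrg = scr(img_out, top_left, W) / (scr(img_in, top_left, W) + 1e-9)
    bsf = img_in.std() / (img_out.std() + 1e-9)
    return scrg, bsf
```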

4.2. Comparative analysis of continuous-frame target detection performance

In this experiment, the continuous-frame detection performance of the proposed algorithm is illustrated on a total of 15 image sequences from the three types of backgrounds. It is worth noting that the thresholds of the top-hat, ILCM, and PQFT algorithms are all set to a fixed range, in contrast with our adaptive threshold method. Before the experiment, the actual target positions were marked with red circles in the original images, as shown in Figs. 6(a) and 8(a). In this study, the detection rate (DR) and the number of single-frame false positives (FP), defined below, are used as the performance indicators: the detection rate represents the fraction of targets detected among all the targets in the image sequence, and the number of single-frame false positives represents the average number of false positives per frame.

DR = \frac{\text{number of true detections}}{\text{total number of targets}}, \quad FP = \frac{\text{number of false positives}}{\text{total number of images}}    (29)

Fig. 6 and Table 2 show the results of the different methods for the sky background. Fig. 6(b) shows the results of the proposed method: the yellow frame marks a detected target, and the green frame marks an area awaiting confirmation. Fig. 6(c)–(e) show the results obtained by top-hat, ILCM, and PQFT, respectively, with the detected targets marked by yellow frames. For sequence 2, only the proposed method detects all four targets, which differ in scale and brightness, while the other methods suffer varying degrees of missed detection. In sequence 5, because of the small target and the complex cloud background, the other methods produce more single-frame false positives, whereas the candidate target screening of the proposed method reduces the number of false positives to 0. Across the five sequences, the proposed method achieves the highest detection rate and the smallest number of false positives.

Fig. 7 and Table 3 show the results of the different methods for the sea-sky background; the marking scheme is the same as above.


Sequences 1 and 3 show that the detection rate of the proposed method exceeds 90% under multi-target conditions. When the targets in one frame differ in scale and brightness, the proposed method can enhance the contrast of all of them. The results of sequences 4 and 5 show that the PQFT method produces few false positives; however, for sequence 4, the targets are relatively dim and there is considerable background interference, and because of its poor target enhancement, the detection rate of PQFT is especially low. The proposed method guarantees a detection rate of 93.3%, and although its false positive rate is slightly higher than that of PQFT, its average detection performance is better than that of the other methods.

Fig. 8 and Table 4 show the results of the different methods for the ground object background; the marking scheme is again the same. The detection rate of the proposed method reaches 100% in all five sequences. The results of the first four sequences show that the other three methods perform only moderately in complex backgrounds, which affects both their detection rates and their false positive numbers. For sequence 5 in particular, the proposed method has the same detection rate as ILCM but a higher false positive rate, because the target has lower brightness than the false positive points, so the adaptive threshold admits a larger number of false target areas during salient region extraction.

The detection results for the three types of backgrounds show that the proposed method adapts strongly to the background type: its detection rate is the highest and its false positive rate is relatively low. The detection rate of the top-hat algorithm is relatively high in most cases, but it is strongly affected by edge morphology. The block size of ILCM is fixed, which makes its performance extremely susceptible to the target scale and to the contrast between target and background; small target scale and relatively low SNR lead to missed targets. In complex backgrounds such as ground objects, PQFT performs poorly because its second-order directional derivative filter is better suited to suppressing directional backgrounds.

To quantitatively describe the performance of the algorithms, receiver operating characteristic (ROC) curves are plotted by varying the threshold; for the proposed method, the threshold is varied through the adaptive adjustment factor k. The ROC curve characterizes the relationship between the false positive rate P_f and the probability of detection P_d, which are defined as follows:

P_d = \frac{\text{number of true detections}}{\text{total number of targets}}    (30)

P_f = \frac{\text{number of false alarms}}{\text{number of pixels in the sequence}}    (31)

We select three typical sequences, one from each of the sky, sea-sky, and ground object scenes (sequence 1 for the sky background, sequence 3 for the sea-sky background, and sequence 4 for the ground background); the ROC curves are shown in Fig. 9(a)–(c). It can be seen that the proposed method outperforms the other three methods: regardless of the background, its detection rate is the highest at any given false positive rate.

5. Conclusion

This paper proposed an infrared dim target detection method inspired by the human visual system, which adapts well to sky, sea-sky, and ground object backgrounds. First, the image is preprocessed to suppress the background and enhance the target. Second, the salient regions are adaptively extracted on the basis of the contrast mechanism and scale adaptation; by combining the intensity and direction characteristics, the saliency map is constructed using spectral scale analysis and matched against the motion regions obtained by the optical flow method to yield the candidate targets. Finally, the false positive rate is reduced by the pipeline screening strategy. The detection results for three types of backgrounds showed that the proposed method can effectively enhance the target, suppress background clutter, and obtain stable detection results. The proposed method exploits temporal, spatial, and spectral characteristics, and the comprehensive use of multi-dimensional information is expected to be a trend in future studies on infrared dim target detection. In follow-up investigations, we plan to study how to use target motion direction information to further eliminate false targets. In addition, target scale estimation and precise localization are important topics for future research.

Declaration of Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61703337) and the Shanghai Academy of Spaceflight Technology Innovation Foundation (SAST2017-082).

References

[1] R. Venkateswarlu, Max-mean and max-median filters for detection of small targets, Proc. SPIE 3809 (1999) 74–83.


[2] V.T. Tom, T. Peli, M. Leung, et al., Morphology-based algorithm for point target detection in infrared backgrounds, Signal Data Process. Small Targets (1993).
[3] P. Wang, J.W. Tian, C.Q. Gao, Infrared small target detection using directional highpass filters based on LS-SVM, Electron. Lett. 45 (2009) 156–158.
[4] M.M. Hadhoud, D.W. Thomas, The two-dimensional adaptive LMS (TDLMS) algorithm, IEEE Trans. Circuits Syst. 35 (1988) 485–494.
[5] G. Boccignone, A. Chianese, A small target detection using wavelets, 1998 IEEE International Conference on Pattern Recognition, 2 (1998) 1776–1778.
[6] M. Diani, G. Corsini, A. Baldacci, Space-time processing for the detection of airborne targets in IR image sequences, IEE Proc. Vis. Image Signal Process. 148 (2001) 151–157.
[7] W.S.W. Shaolin, Z.X.Z. Xiaosong, Hough transform: its application to the linearly moving point targets detection, 1994 IEEE International Symposium on Speech, 2 (1994) 795–797.
[8] S. Reed, R.M. Gagliardi, L.B. Stotts, Optical moving target detection with 3-D matched filtering, IEEE Trans. Aerosp. Electron. Syst. 24 (2002) 327–336.
[9] D.J. Salmond, H. Birch, A particle filter for track-before-detect, 2002 IEEE American Control Conference, 5 (2002) 3755–3760.
[10] Y. Barniv, Dynamic programming solution for detecting dim moving targets, IEEE Trans. Aerosp. Electron. Syst. AES-21 (1985) 144–156.
[11] C. Wang, S. Qin, Adaptive detection method of infrared small target based on target-background separation via robust principal component analysis, Infrared Phys. Technol. 69 (2015) 123–135.
[12] S. Kim, Y. Yang, J. Lee, et al., Small target detection utilizing robust methods of the human visual system for IRST, J. Infrared Millim. Terahertz Waves 30 (2009) 994–1011.
[13] S. Kim, J. Lee, Scale invariant small target detection by optimizing signal-to-clutter ratio in heterogeneous background for infrared search and track, Pattern Recognit. 45 (2012) 393–406.
[14] X. Wang, G. Lv, L. Xu, Infrared dim target detection based on visual attention, Infrared Phys. Technol. 55 (2012) 513–521.
[15] X. Dong, X. Huang, Y. Zheng, et al., Infrared dim and small target detecting and tracking method inspired by human visual system, Infrared Phys. Technol. 62 (2014) 100–109.
[16] C.L.P. Chen, H. Li, Y. Wei, et al., A local contrast method for small infrared target detection, IEEE Trans. Geosci. Remote Sens. 52 (2014) 574–581.
[17] J. Han, Y. Ma, B. Zhou, et al., A robust infrared small target detection algorithm based on human visual system, IEEE Geosci. Remote Sens. Lett. 11 (2014) 2168–2172.
[18] H. Liu, Y. Li, Z. Zhang, S. Liu, T. Liu, Blind Poissonian reconstruction algorithm via curvelet regularization for an FTIR spectrometer, Opt. Express 26 (2018) 22837–22856.
[19] C.J. Wang, T. Sun, T.F. Wang, J. Chen, Fast contour torque features-based recognition in laser active imaging system, Optik 126 (2015) 3276–3282.
[20] Z.H. Huang, L. Chen, Y.Z. Zhang, et al., Robust contact-point detection from pantograph-catenary infrared images by employing horizontal-vertical enhancement operator, Infrared Phys. Technol. 101 (2019) 146–155.
[21] D.L. Li, Z.H. Li, Temporal noise suppression for small target detection in infrared image sequences, Optik 126 (2015) 4789–4795.
[22] H. Liu, Z. Zhang, J. Sun, S. Liu, Blind spectral deconvolution algorithm for Raman spectrum with Poisson noise, Photon. Res. 2 (2014) 168–171.
[23] Z.H. Huang, Y.Z. Zhang, et al., Joint horizontal-vertical enhancement and tracking scheme for robust contact-point detection from pantograph-catenary infrared images, Infrared Phys. Technol., https://doi.org/10.1016/j.infrared.2019.103156.
[24] H.M. Qu, Q. Zheng, Y.Y. Li, Q. Chen, Accuracy test and analysis for infrared search and track system, Optik 124 (2013) 2313–2317.
[25] H. Liu, T. Zhang, L. Yan, H. Fang, Y. Chang, A MAP-based algorithm for spectroscopic semi-blind deconvolution, Analyst 137 (2012) 3862–3873.
[26] Z.H. Huang, A. Pan, Non-local weighted regularization for optical flow estimation, Optik, https://doi.org/10.1016/j.ijleo.2019.164069.
[27] X.Z. Bai, Morphological center operator for enhancing small target obtained by infrared imaging sensor, Optik 125 (2014) 3697–3701.
[28] X. Hou, L. Zhang, Saliency detection: a spectral residual approach, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 1 (2007) 1–8.
[29] M. Oakes, D. Bhowmik, C. Abhayaratne, Visual attention-based watermarking, 2011 IEEE International Symposium of Circuits and Systems, 1 (2011) 2653–2656.
[30] S. Qi, J. Ma, C. Tao, et al., A robust directional saliency-based method for infrared small-target detection under various complex backgrounds, IEEE Geosci. Remote Sens. Lett. 10 (2013) 495–499.
[31] T. Liu, H. Liu, Y. Li, Z. Chen, Z. Zhang, S. Liu, Flexible FTIR spectral imaging enhancement for industrial robot infrared vision sensing, IEEE Trans. Ind. Inf. (2019), https://doi.org/10.1109/TII.2019.2934728.
[32] Z.H. Huang, Y.Z. Zhang, et al., Progressive dual-domain filter for enhancing and denoising optical remote sensing images, IEEE Geosci. Remote Sens. Lett. 15 (2018) 759–763.
[33] T. Liu, Y. Li, H. Liu, Z. Zhang, S. Liu, RISIR: rapid infrared spectral imaging restoration model for industrial material detection in intelligent video systems, IEEE Trans. Ind. Inf. (2019), https://doi.org/10.1109/TII.2019.2930463.
[34] Q. Sun, L. Li, Y.H. Xin, Research on the multi-scale low rank method and its optimal parameters selection strategy for infrared small target detection, Optik 192 (2019) 1–13.
[35] Z.H. Huang, Q. Li, et al., Iterative weighted sparse representation for X-ray cardiovascular angiogram image denoising over learned dictionary, IET Image Process. 12 (2018) 254–261.
[36] T. Liu, H. Liu, Y. Li, Z. Zhang, S. Liu, Efficient blind signal reconstruction with wavelet transforms regularization for educational robot infrared vision sensing, IEEE/ASME Trans. Mechatron. 24 (2019) 384–394.
[37] K. Zhang, X.G. Li, Infrared small dim target detection based on region proposal, Optik 182 (2019) 961–973.
[38] Z.H. Huang, L.K. Huang, et al., Framelet regularization for uneven intensity correction of color images with illumination and reflectance estimation, Neurocomputing 314 (2018) 154–168.
[39] T. Liu, H. Liu, Z. Chen, A.M. Lesgold, Fast blind instrument function estimation method for industrial infrared spectrometers, IEEE Trans. Ind. Inf. 14 (2018) 5268–5277.
[40] Z.H. Huang, H. Fang, et al., Optical remote sensing image enhancement with weak structure preservation via spatially adaptive gamma correction, Infrared Phys. Technol. 94 (2018) 38–47.
[41] H. Liu, L. Yan, Y. Chang, H. Fang, T. Zhang, Spectral deconvolution and feature extraction with robust adaptive Tikhonov regularization, IEEE Trans. Instrum. Meas. 62 (2013) 315–327.
[42] Z.H. Huang, Y.Z. Zhang, et al., Unidirectional variation and deep CNN denoiser priors for simultaneously destriping and denoising optical remote sensing images, Int. J. Remote Sens. 40 (2019) 5737–5748.
[43] X.Q. Zhang, H.S. Li, Research on target capture probability calculation model of composite photoelectric detection imaging sensor system, Optik 166 (2018) 161–168.
[44] J. Li, M.D. Levine, X. An, et al., Visual saliency based on scale-space analysis in the frequency domain, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2013) 996–1010.
[45] Y.P. Deng, M. Wang, The small infrared target detection based on visual contrast mechanism, 2015 IEEE International Conference on Design, 1 (2015) 664–673.
[46] T. Brox, A. Bruhn, N. Papenberg, J. Weickert, High accuracy optical flow estimation based on a theory for warping, Computer Vision – ECCV 2004, 8th European Conference on Computer Vision, Prague, Czech Republic, May 11–14, 2004, Proceedings, Part IV, Springer, Berlin, Heidelberg, 2004, pp. 25–36.
