Pedestrian detection in thermal images: An automated scale based region extraction with curvelet space validation




Accepted Manuscript

A. Lakshmi, A.G.J. Faheema, Dipti Deodhare

PII: S1350-4495(15)30152-3
DOI: http://dx.doi.org/10.1016/j.infrared.2016.03.012
Reference: INFPHY 1980
To appear in: Infrared Physics & Technology
Received Date: 13 November 2015
Revised Date: 22 March 2016
Accepted Date: 24 March 2016

Please cite this article as: A. Lakshmi, A.G.J. Faheema, D. Deodhare, Pedestrian Detection in Thermal Images: An Automated Scale based Region Extraction with Curvelet Space Validation, Infrared Physics & Technology (2016), doi: http://dx.doi.org/10.1016/j.infrared.2016.03.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Pedestrian Detection in Thermal Images: An Automated Scale based Region Extraction with Curvelet Space Validation

A. Lakshmi, A.G.J. Faheema and Dipti Deodhare

Abstract: Pedestrian detection is a key problem in night vision processing, with numerous applications that can positively impact the performance of autonomous systems. Despite significant progress, our study shows that the performance of state-of-the-art thermal image pedestrian detectors still has much room for improvement. The purpose of this paper is to overcome the challenges faced by thermal image pedestrian detectors that employ intensity based Region Of Interest (ROI) extraction followed by feature based validation. The most striking disadvantage of the first module, ROI extraction, is the failed detection of cloth insulated body parts. To overcome this setback, this paper employs a region growing algorithm tuned to the scale of the pedestrian. The statistics subtended by a pedestrian vary drastically with scale, and a deviation-from-normality approach facilitates scale detection. Further, the paper offers an adaptive mathematical threshold that subtracts the background while still extracting cloth insulated parts. The inherent false positives of the ROI extraction module are limited by the choice of good features in the pedestrian validation step. One such feature is the curvelet feature, which has been used extensively in optical images but as yet has no reported results in thermal images. It has been used here to arrive at a pedestrian detector with a reduced false positive rate. This work is the first venture to scrutinize the utility of curvelets for characterizing pedestrians in thermal images. An attempt has also been made to improve the speed of curvelet transform computation. The classification task is realized through the well known methodology of Support Vector Machines (SVMs).
The proposed method is substantiated with qualified evaluation methodologies that permit us to carry out probing and informative comparisons across state-of-the-art features, including deep learning methods, on six standard and in-house databases. With reference to deep learning, our algorithm exhibits comparable performance. More importantly, it has significantly lower requirements in terms of compute power and memory, making it more relevant for deployment on resource constrained platforms with significant size, weight and power constraints.

Keywords: Pedestrian Region of Interest, Pedestrian Validation, Curvelet Transform

1. Introduction

Pedestrian detection is a very active computer vision research area whose importance has been increasing in recent years. Computer vision applications such as surveillance, automatic recognition of people in rescue missions, content based indexing, and automotive safety can benefit strongly from accurate pedestrian detection technologies. The role of automatic pedestrian detection in helping a variety of autonomous robotic vehicles overcome their cognitive limitations has also contributed to the spurring interest and progress in this area of machine vision. Detection of pedestrians in optical images offers solutions primarily for daytime or well lit situations. As far as night situations go, pedestrian detection relies on the images captured by a thermal camera, whose output is the emission of objects in the far infrared spectrum. As the utility of computer vision areas such as robotics and surveillance extends into night time as well, more reliable night time pedestrian detection is becoming mandatory.

Infra-red Images: Optical and infra-red images share a few common characteristics. One such is the change in appearance with viewpoint change. The most salient feature of thermal

imaging over visible spectrum imaging is its lack of dependency on external ambient light. This ability to operate in no or low light conditions has been taken advantage of in the surveillance domain. The absence of shadows in infra-red scenes eliminates the need for shadow removal, which is often a major challenge in visible images. Moreover, a very valuable advantage of infrared images over optical images is that they eliminate the influence of color, texture and illumination on appearance variability. On the downside, detecting pedestrians in thermal images is far from trivial because i) in extremely low temperatures, clothes shield the body's heat emission and only the exposed parts of pedestrians may be pictured, and ii) in high temperatures, background non-human objects with inherent passive heat radiation behavior can be strongly heated, making the scene look much more cluttered and complex (4).

Pedestrian Detection in Optical Images: Detection of pedestrians in the visible spectrum is not trivial as they can appear with fairly random shapes, dress colors and textures. There are very rich research bibliographies as far as pedestrian detection using visible range light cameras is concerned. Most of the literature follows key-point based, segmentation based or

shape based approaches, the latter being robust under all resolution settings. Huge gains were achieved with shape based approaches such as Viola and Jones (36) (39), which were built on Haar-like features. Shape detectors have proliferated greatly since their introduction, from full body detectors to body part based detectors. Whether for a full body or a body part detector, features are the cues employed to achieve this goal. As optical images capture excellent texture details, it was easy to build a pedestrian model from features.

Pedestrian Detection in Thermal Images: In (13), (14), (15), (16), the authors have incorporated stereo vision to come up with a set of Regions Of Interest (ROIs) based on the estimated depth of warm clusters in the scene. A few geometric restrictions are imposed on the estimated ROIs, which then need to be further validated as pedestrians. As this paper focuses on single camera pedestrian detection, our discussion is on the three state-of-the-art strategies used for single image pedestrian detection: i) template based detectors, ii) shape (feature based) detectors and iii) intensity based ROI extraction with feature based validation. The simplest and earliest technique to obtain pedestrians is the template matching approach (7) (8), where 2D templates of pedestrians are shifted over the image at various scales and poses. This method comes with too high a computational cost, precluding its use in real time scenarios. The method can be sped up by restricting the search space through certain known constraints on the target position, but this might make the resultant solution too application specific. Due to these inherent disadvantages, not much attention has been paid to template based approaches.
Shape detectors, the well known method for pedestrian detection in visible images, have been investigated in thermal images as well by very few authors (38); nevertheless, successfully building pedestrian models in feature space has proven challenging, resulting in higher false positive rates and miss rates. This can be attributed to the lack of fine details such as texture, the absence of clear edges, and the lack of articulation of body parts induced by clothing insulation. As the performance gain achieved in feature space alone has not been profound, shape detectors have not attracted many researchers for use in thermal images. The pedestrian detection approaches with the highest performance share the standard and most suitable approach ((1), (2), (3)), namely intensity based ROI extraction followed by feature based validation. This two stage procedure has become quite popular in thermal images as it effectively exploits the advantages of thermal images over optical images. The merger of ROI extraction and validation into a one step process in optical images is due to the non-availability of embedded information in the image intensity to extract ROIs and the availability of superior features to drive pedestrian detection in the feature space alone.

1. ROI extraction: The validation process involves processor time demanding operations such as feature extraction and classification. The image might contain significant information not relevant to the current task of pedestrian identification. ROI extraction helps to do away with the irrelevant background parts. This makes the ROI extraction step essential to enable the subsequent classification step to run faster. Moreover, it permits efficient exploitation of a well known fact about thermal images, namely that pedestrians are hotter than the background, making foreground object extraction much easier and faster compared to visible images.

2. ROI validation: Having extracted the pedestrian region, the final stage is to validate the presence of pedestrians in the given ROIs. It should be noted that thermal images do not lend themselves to the extraction of distinctive features as readily as optical images. This limitation in feature space may be attributed to the modality by which images are captured by thermal sensors. Thus, bypassing the ROI extraction step and searching for pedestrians in feature space alone has not been considered a better solution in the literature on thermal pedestrian detection; yet the validation step is mandatory to remove potential false positives, thus enabling superior performance.

The aim of this paper is to explore the progress that has been made over the last decade of thermal pedestrian detection via intensity based ROI detection with feature based validation, identify the pitfalls, and enhance the areas that have the most impact on detection quality. By putting together the following contributions, the best known detection quality has been reported in this paper. For the process of pedestrian Bounding Box (BB) detection, the following three steps have been described:

1. Performance improvement through multi scale pedestrian models.
2. A formal analytic criterion to automate the identification of pedestrian scale and also to estimate the statistical parameters of pedestrians.
3. A robust probability based thresholding mechanism that has resulted in a repertoire of solutions to do away with the most prominent impediment to state-of-the-art pedestrian detectors, namely pedestrian body part splitting.

For validation of the detected bounding boxes, to reduce the number of false positives, the following two steps are proposed:

1. The use of curvelet features has been explored. To the best of our knowledge, there are no reported results on curvelet feature based pedestrian detection in the thermal literature.
2. A fast implementation of the curvelet transform to enable expeditious pedestrian detection has been described.

The outline of the paper is as follows. Section 2 reviews the various current state-of-the-art methodologies in autonomous pedestrian detection. Section 3 introduces a robust pedestrian ROI generation framework that adopts strategies to avoid splitting of ROIs while not letting false positives go beyond a threshold. Section 4 briefly covers the theory behind the curvelet transform and the specific method that has been followed to improve the speed of curvelet transform computation. This section also dwells upon the proposed curvelet feature extraction method. Section 5 presents results that allow conclusions to be drawn about the efficacy of the proposed algorithm as against state-of-the-art thermal features. A separate experimental section has been dedicated to showcase the advantage of the proposed method over deep learning based object detectors, which have gained huge popularity in the optical domain in the last few years.

2. Related Work

A fair amount of literature exists on pedestrian detection, which has been detailed in the survey paper (20). This section highlights the most representative research on the two essential steps of the most popular thermal pedestrian detectors: intensity based pedestrian ROI extraction and feature based ROI validation.

2.1. Pedestrian ROI Extraction

The most popular strategy to recover ROIs employs intensity based methods. In this paper, we are more interested in intensity based ROI extraction because of its obvious advantages in thermal images, as explained in the introduction above. As far as moving pedestrian detection goes, the technique exploited to obtain ROIs is derived from background subtraction models (5), (6), (9), (18), where object motion is used as a cue. Generalizing these techniques for ROI extraction in the context of standing pedestrians is difficult. Since the focus of this paper is on establishing a pedestrian detector for standing humans without assuming a static camera, the literature survey emphasizes papers that have published work along similar lines. The authors of (19) have used contour saliency maps to fit a bounding box around pedestrians. In order to estimate the contour saliency map, they utilized background gradient information, derived as the mean value of the training data, making it prone to the disadvantages of background modeling techniques. In (10), bright pixel vertical and horizontal projection curves were formed as the number of bright pixels along the columns/rows versus the corresponding horizontal and vertical positions. The projection curves were divided into several waves with rising and falling edges. Pedestrians were captured in an image stripe corresponding to one such wave. The main challenge this method faced was the selection of the threshold used to identify the bump. The authors of (11) tried to solve this problem by defining vertical and horizontal variance curves, defined as the variance of intensity in image columns/rows versus the corresponding horizontal/vertical positions. The variance curves are characterized by a bigger variation range, which makes it easier to select a loose threshold. In (12), a segmentation based approach was used for candidate generation. Intensity based segmentation approaches and projection curve approaches are both susceptible to missing pedestrian parts, especially limbs, though this is more evident in the latter than the former. In (35), the pedestrian ROI vertical axis was detected from a linear combination of histograms of grey level symmetry, edge symmetry and edge density. As the maximum and minimum sizes of the pedestrian bounding boxes were fixed, the processing did not allow the detection of pedestrians at all possible ranges. Fixed size bounding boxes might end up either too big, containing irrelevant image patches, or excessively small, missing out on some portion of the pedestrian. To overcome this effect, the authors approached the problem with a multi-resolution technique. The presence of a pedestrian was checked for with different sized bounding boxes placed at different positions between the estimated vertical axes based on perspective constraints. This exhaustive search raises the bar with reference to the required performance and processing speed. In (17), the authors used a two-stage thresholding to solve the problem of the missing pedestrian body. The base within the region of interest was calculated as the ground plane; in order to estimate the ground plane, a gyroscope was attached to the same base as the thermal camera. In (21), the authors adopted a two-stage process to detect the head and the body. In the P-Tile head and body detection, the authors made the assumption of head pixels being 2% and body pixels being 20% of the total number of pixels, which may not always be the case. Further, the algorithm fixed the body size as w x 3 and h x 10, where w and h are the width and height of the head respectively. In (29), to compensate for pedestrian clothing distortion, a closing operation is performed with a fixed size rectangular structuring element. This can increase the number of false positives in cluttered scenes and can further fail to cater for the loss of well insulated lower limbs in pedestrian ROI detection. The region growing strategy adopted in the later stage would lead to the inclusion of regions outside the pedestrian boundary.

This paper proposes a pedestrian ROI extraction framework to defeat the common problem of pedestrian body part splitting encountered in pedestrian detection in thermal images. The ROI extraction process formulated in this paper is a single solution suited for pedestrians at various scales.

2.2. Pedestrian ROI Validation

Having obtained the pedestrian region, researchers apply feature extraction modules to derive robust features to feed the validation process. Over the last few decades, better features have been a constant driving force for recognition quality improvement. Numerous approaches have been proposed in this direction; however, their efficacy on thermal images has not been fully explored. Throughout the literature, the Histogram of Oriented Gradients (HOG) has been used exhaustively to drive progress in recognition accuracy (22), (24). HOG represents the distribution of localized gradients, binned by their orientation and weighted by their magnitudes. The authors of (23) came up with binary patterns extracted from two confined regions, known as R-HOG. The authors of (22) explored the use of the Histogram of Phase Congruency (HPC), based on the well known measure termed phase congruency. Phase congruency is defined such that it highlights the points in the image where the Fourier components are maximally in phase.
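To make the HOG idea concrete, localized gradients binned by orientation and weighted by magnitude can be sketched in a few lines of NumPy. This is a simplified illustrative variant (no block normalization; the cell size and bin count are our assumptions, not the parameters of any cited paper):

```python
import numpy as np

def hog_like(img, cell=8, bins=9):
    """Simplified HOG-style descriptor: per-cell histograms of gradient
    orientation, weighted by gradient magnitude (no block normalization)."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                     # image gradients
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    H, W = img.shape
    feats = []
    for r in range(0, H - cell + 1, cell):
        for c in range(0, W - cell + 1, cell):
            a = ang[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            # orientation histogram of the cell, magnitude-weighted
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    f = np.concatenate(feats)
    n = np.linalg.norm(f)
    return f / n if n > 0 else f
```

On a 128 x 64 detection window with 8 x 8 cells and 9 bins, this yields 16 x 8 cells and a 1152-dimensional descriptor that could then be fed to an SVM.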

Figure 1: (a) and (d) Original images, (b) and (e) Thresholded images with vertical projection curves, (c) and (f) Head detection

Figure 2: Histograms of middle (top row) and near (bottom row) range pedestrians indicate their deviation from uni-modality. (a), (b), (e) and (f): Dark BackGround; (c), (d), (g) and (h): Bright BackGround

Multi resolution transforms have generated increasing interest within the image processing and computer vision communities. Researchers have reiterated the similarity between biological vision and multi scale representation (1). One of the first successful multi resolution features, Haar wavelets, was used in (25) and (26). Many multi resolution features, such as wavelet entropy features based on the wavelet transform (27) and the Double Density Dual Tree Complex Wavelet Transform (DD DT-CWT) (28), have found widespread use. While wavelet transforms are suitable for objects that can be represented as point singularities, they are ill-suited for line and surface singularities because of their lack of directionality (30). As natural images are composed of curves as well, the idea of analyzing the curves of images arose, resulting in a new transform named the Curvelet Transform (CT), proposed in (31), (32). This multi resolution geometric analysis technique with curvelets as basis functions has proven effective in representing images in terms of object silhouettes. To the best of our knowledge, this is the first paper to

use the curvelet transform in thermal image analysis, as against its widespread use in the long standing research record of optical images.

3. Proposed Algorithm: Pedestrian ROI Detection

This section elaborates the proposed methodology to extract pedestrian ROIs by discarding the areas with no information. ROI generation is crucial as it affects the accuracy of the pedestrian recognition as well. In order to rectify the minuscule and fragmented ROIs of traditional ROI generation methods, we propose an algorithm that incorporates the steps given below:

1. Head Detection
2. Vertical Boundary Extension
3. Region Growing
4. Bounding Box Estimation
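The four steps can be wired together as a pipeline skeleton. Everything below step 1 is a deliberately naive stub: the percentile seed rule and the fixed window are our illustrative placeholders, not the methods of the following sections:

```python
import numpy as np

def detect_heads(frame, pct=99):
    """Step 1 (sketch): seed pixels above a very high intensity percentile,
    mimicking the 'head is distinctively bright' property."""
    t = np.percentile(frame, pct)
    ys, xs = np.nonzero(frame >= t)
    return list(zip(ys, xs))

def roi_pipeline(frame):
    """Skeleton of the four-step ROI pipeline. Steps 2-4 (vertical
    extension, region growing, box estimation) are stubbed as a naive
    fixed window around each seed, purely for illustration."""
    h, w = frame.shape
    boxes = []
    for (y, x) in detect_heads(frame):
        boxes.append((max(y - 2, 0), max(x - 2, 0),
                      min(y + 8, h), min(x + 2, w)))
    return boxes
```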

3.1. Pedestrian Head Detection

Under most circumstances, not all parts of the pedestrian may be warmer than the background with good contrast, leading to differing depths of grey intensity. Under the hot sun, most heat radiating objects may also appear bright, making the scene cluttered. This may result in the absence of a single dominant mode in the intensity histogram corresponding to foreground pedestrians. These constraints complicate the process of coming up with an optimum threshold (with low inter class variance and high intra class variance) to separate the whole pedestrian as a single entity from the background. However, the human head in thermal images always has the obvious advantage of being distinctively brighter than the background pixels (21). This property of the human head enables the choice of a very high threshold to isolate it from the background and, as a consequence, permits suppression of false detections of many non-pedestrian heat emitting objects. Intensity vertical projection curves were obtained by computing the count of foreground pixels in the resultant image (Fig. 1). The curves have several heaves with rising left curves and falling right curves. Each stripe with a pair of rising left edge and declining right edge in all probability contains a pedestrian, as shown in Fig. 1.

3.2. Pedestrian Scale Identification

The solution mentioned in this section is critical for distinguishing pedestrians based on their scale, as the subsequent steps depend on it: i) vertical boundary extension is adapted to the width of the pedestrian, which in turn depends on the scale, and ii) region growing is built on the type and pattern of the resultant modes of each vertical stripe. Pedestrians are grouped by their image size into small, medium and large. This division was motivated by the requirements of the ROI extraction framework. Pedestrians at heterogeneous ranges occupy correspondingly different numbers of pixels in the extracted vertical stripe, leading to specific patterns in their histograms. Figs. 3 and 2 show the analysis of histograms of pedestrians (intensity vs. probability (prob) of intensity) at far, middle and near scales respectively. The histograms clearly indicate that pedestrians at the three ranges exhibit the following patterns:

1. Far pedestrians: Uni-modal distribution.
2. Middle and near pedestrians: Bi-modal for minimal cloth insulation and multi-modal distribution otherwise. Empirical study revealed that the background usually contributes to the first mode while all the other modes arise from foreground objects.

With dark backgrounds, the modes of background and foreground are well separated. However, the background mode merges with the foreground mode if the background is bright.

Figure 3: Histograms of far pedestrians depicting their uni-modal nature: (a) Bright BackGround (b) Dark BackGround

The distance of a pedestrian from the thermal imager influences both the number of pixels subtended by an average person and the thermal contrast between the pedestrian and the background as perceived by the camera. In summary, there is no single solution to suit pedestrian extraction at both near and far ranges. The conclusion drawn was that extraction of pedestrians at distinct scales mandates a different set of algorithms. The steps that are proposed after head detection are tailored to the scale of the pedestrians to be extracted. This demands an automatic procedure to discern the scale of the pedestrian. On account of this, an analytical solution is put forward as a procedure after head detection.

3.2.1. Analytic solution to scale identification

This section details the solution put forward to automate the procedure of recognizing the pedestrian scale.

Uni-modality analysis: Towards identifying the pedestrian scale, it has been corroborated with empirical evidence in the previous section that knowledge of the evidence of uni-modality in the histogram will suffice. Accordingly, it is vital to derive an analytical criterion to assess the pattern of the histogram of vertical stripes. To estimate the degree of uni-modality in the distribution, this paper adopts the technique proposed by the authors of (37), where the Kullback Leibler (KL) divergence (Eq. 1) is used as a measure of the deviation of the given distribution data points Y(x) from a uni-modal Gaussian distribution P(x). It has been proved in that paper that the P(x) with mean µ and sigma σ as given in Eq. 2 will best fit Y(x) of length N. For the sake of completeness, the derivation for the same is provided in this paper as Appendix A. The histogram of each vertical stripe is approximated with the estimated µ and σ (Eq. 2) and judged as deviating from uni-modality in the event of KL > Threshold. This helps us conclude whether the scale of the pedestrian is small (far: if uni-modal) or otherwise (middle or near). Samples of far and middle/near pedestrian histograms are shown in Fig. 4 with the approximated uni-modal Gaussian and the respective KL divergence.

KL = -\sum_{x=1}^{N} Y(x) \ln \frac{P(x)}{Y(x)}    (1)

\mu = \frac{\sum_{x=1}^{N} x Y(x)}{\sum_{x=1}^{N} Y(x)}, \qquad \sigma^2 = \frac{\sum_{x=1}^{N} (x - \mu)^2 Y(x)}{\sum_{x=1}^{N} Y(x)}    (2)
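The uni-modality test of Eqs. 1 and 2 can be sketched as follows. The normalization details and the decision threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def kl_unimodality(y, eps=1e-12):
    """Fit a single Gaussian to the distribution y (Eq. 2) and return the
    KL divergence of y from that Gaussian (Eq. 1); small KL => uni-modal."""
    y = np.asarray(y, dtype=float)
    y = y / y.sum()                       # treat y as a probability mass
    x = np.arange(1, len(y) + 1, dtype=float)
    mu = np.sum(x * y)                    # Eq. 2 with sum(y) = 1
    sigma2 = np.sum((x - mu) ** 2 * y)
    p = np.exp(-(x - mu) ** 2 / (2 * sigma2))
    p = p / p.sum()                       # discretized Gaussian P(x)
    return -np.sum(y * np.log((p + eps) / (y + eps)))   # Eq. 1
```

A clean single-mode histogram yields a much smaller KL than a bi-modal one, mirroring the far versus middle/near cases of Fig. 4; the scale decision then reduces to comparing KL against a threshold.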

Figure 4: Uni-modal approximation of (a) the histogram of a far range pedestrian (KL = 0.0512; a small KL indicates uni-modality) and (b) the histogram of a middle range pedestrian (KL = 0.4529; a large KL indicates deviation from uni-modality)

Figure 5: Bottom plot: Vertical projection curve (column number vs. sum of intensity) and its approximated Gaussian, Top plot: Head boundary extended as ±3σ

3.3. Vertical Boundary Estimation

The vertical stripes detected in Sec. 3.1 may not reflect the exact width of the pedestrians, as they are the result of pedestrian head segmentation. The width of the entire body can be expected to be wider than that of the head. The width of each candidate vertical stripe is expanded as follows, and further shrunk to the exact pedestrian width as explained in the following sections.

A pixel is added to the growing region if it satisfies the Region Membership Criterion

RMC: I_{ij} > T \;\text{and}\; i \in [i_h, R]    (3)

where I_{ij} is the value of the pixel under analysis at row i and column j, i_h is the row position of the pedestrian head in the vertical stripe, R is the number of rows and T is the threshold. Fig. 6 shows the region growing of pedestrians at various scales. It indicates the effectiveness of the methodology in eradicating the splitting of body parts.

3.3.1. Far Range Pedestrians

In this scenario, the difference between the width of the head and the body of a human will essentially be very small because of the resolution lost during image capture. These vertical stripes are extended a few pixels on both sides to cater for the slight increase in human torso size as against the human head. Empirically, it has been ascertained that five pixels meet the requirement.

3.3.2. Middle and Near Range Pedestrians

In this case, the left and right boundaries of the human torso will be evidently far from the estimated left (J_hst) and right (J_hend) vertical boundaries of the head. In order to get a clue of the right and left boundaries of the pedestrian as a whole, the following steps are taken:

Figure 6: (a), (c), (e), (g), (i): Vertical stripes with pedestrians. (b), (d), (f), (h), (j): Region Grown without missing cloth insulated body parts

T is estimated on the basis of the distribution of pixel intensities in the respective vertical stripe, as explained below. Estimating the thresholds locally at each vertical stripe has the advantage of adaptively countering the effect of large variations in the global background.

1. The vertical projection curve, v_p(x) (where x ∈ [J_hst, J_hend]), of the original intensity in each vertical stripe is modeled as a uni-modal Gaussian.
2. The parameters of the Gaussian are estimated as furnished in Eq. 2.
3. The vertical boundary is extended to ±3σ as shown in Fig. 5.
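The three steps above can be sketched as follows, assuming the stripe is available as a 2-D NumPy array and `j_start`, `j_end` (our names, for illustration) are the head's column bounds within it:

```python
import numpy as np

def extend_boundary(stripe, j_start, j_end):
    """Fit a 1-D Gaussian (Eq. 2) to the vertical projection curve over the
    head columns [j_start, j_end] and widen them to mean +/- 3 sigma."""
    vp = stripe.sum(axis=0).astype(float)     # vertical projection curve
    cols = np.arange(j_start, j_end + 1)
    w = vp[j_start:j_end + 1]
    mu = np.sum(cols * w) / np.sum(w)                      # Eq. 2
    sigma = np.sqrt(np.sum((cols - mu) ** 2 * w) / np.sum(w))
    left = max(int(mu - 3 * sigma), 0)                     # +/- 3 sigma
    right = min(int(mu + 3 * sigma), stripe.shape[1] - 1)
    return left, right
```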

3.4. Region Growing

This section details the methodology followed to extract the foreground region from the background in each vertical stripe with a probable presence of pedestrians. Region growing was preferred over simple threshold based segmentation in order to take advantage of the contiguity property of pedestrians and its obvious advantage over naive segmentation techniques of allowing additional constraints to be imposed. The regions were grown with the pedestrian head pixels as seed points and 8-neighborhood connectivity, depending on a Region Membership Criterion (RMC) as described below.

3.4.1. Threshold for Far Range

The statistical characteristics (as discussed in Section 3.2) of pixel intensities in vertical stripes that embed far pedestrians exhibit uni-modal behavior. This can be intuitively acknowledged by the fact that the number of foreground pedestrian pixels is far lower than that of background pixels. Hence, pedestrians contribute mainly to the tail of the histogram, as they generally tend to have higher magnitude than the surrounding background and are bound to be in the minority. The threshold, which is desired to pick a maximum of pedestrian pixels while ensuring a minimum of background pixel selection, is presumed to be located before the elongated tail. This justifies our choice to set the threshold based on the T-Point algorithm proposed in (33).
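The seeded region growing with the membership test of Eq. 3 can be sketched as a breadth-first traversal over the 8-neighborhood. This is a minimal sketch; the data layout and seed format are our assumptions:

```python
import numpy as np
from collections import deque

def grow_region(stripe, seed, T):
    """Grow a region from the head seed with 8-neighbourhood connectivity;
    pixel (i, j) joins if I_ij > T and i >= i_h (Eq. 3: rows from the
    head row i_h downwards)."""
    R, C = stripe.shape
    i_h = seed[0]
    mask = np.zeros((R, C), dtype=bool)
    mask[seed] = True
    q = deque([seed])
    while q:
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < R and 0 <= nj < C and not mask[ni, nj]
                        and ni >= i_h and stripe[ni, nj] > T):
                    mask[ni, nj] = True
                    q.append((ni, nj))
    return mask
```

The contiguity constraint built into the traversal is what separates this from plain thresholding: a hot background blob disconnected from the head seed is never reached.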

The above equation can be written using a Lagrange multiplier as

min_T { pF E_F(T) + λ pB E_B(T) }    (6)

where λ = 0.1 pB, thus λ ∈ [0, 0.1]. To find the threshold T, we differentiate Eq. 6 with respect to T and equate it to zero, which results in Eq. 7.

Figure 7: Threshold selection (µB + 3σB ) for well separated Background and Foreground modes of middle and near pedestrians



λpB σF / (pF σB) = exp( (T − µB)² / (2σB²) − (T − µF)² / (2σF²) )    (8)

Taking ln on both sides of Eq. 8, we obtain

ln( λpB σF / (pF σB) ) = (T − µB)² / (2σB²) − (T − µF)² / (2σF²)    (9)

Writing the same as a quadratic equation in T, we get

T² (σF² − σB²) + 2T (µF σB² − µB σF²) + σF² µB² − σB² µF² + 2σB² σF² ln( pF σB / (λpB σF) ) = 0    (10)

Well Separated Background and Foreground: When the variances of the foreground and background are quite low compared to the difference between their means, the influence of the background on pedestrian region growing is far less and the threshold estimation becomes straightforward as µB + 3σB. Fig. 7 shows the threshold selection for the bi-modal and multi-modal scenarios of middle and near range pedestrians.

Merged Background and Foreground: A bright background might lead to a large variance of the ROI gray intensity in foreground and background compared to their mean difference. This results in the peaks of the background and foreground distributions running into each other with a degraded valley between them, thus rendering it impossible to choose a trivial threshold as a membership criterion to grow the pedestrian ROI. This mandates a carefully chosen threshold to prevent false classification of pedestrian pixels as background pixels. The threshold is chosen to decrease the probability of erroneously classifying object pixels as background (Eq. 5). Background pixels being identified as object is kept at bay by the additional constraint imposed on the Region Membership Criterion (RMC) (refer Eq. 3).

Solving the quadratic equation, we get T as

T = 1 / (σB² − σF²) { (µF σB² − µB σF²) ± σB σF √( (µB − µF)² + 2 (σB² − σF²) ln( pF σB / (λpB σF) ) ) }    (11)
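The threshold of Eq. 11 can be evaluated numerically. The sketch below uses λ = 0.1 pB as in the text and selects the root lying between the two means; the algebraic form implemented here is a reconstruction of Eq. 11 and may differ in constants from the typeset equation, so treat it as illustrative.

```python
import math

def merged_mode_threshold(p_b, mu_b, var_b, p_f, mu_f, var_f):
    # Solve the quadratic of Eq. 10/11 for the threshold T between
    # merged background and foreground Gaussian modes.
    # Assumes var_b != var_f (the degenerate equal-variance case
    # falls through to the midpoint fallback).
    lam = 0.1 * p_b                      # lambda = 0.1 * pB, per the text
    s_b, s_f = math.sqrt(var_b), math.sqrt(var_f)
    log_term = math.log((p_f * s_b) / (lam * p_b * s_f))
    disc = (mu_b - mu_f) ** 2 + 2.0 * (var_b - var_f) * log_term
    root = s_b * s_f * math.sqrt(disc)
    num = mu_f * var_b - mu_b * var_f
    den = var_b - var_f
    for t in ((num + root) / den, (num - root) / den):
        if min(mu_b, mu_f) < t < max(mu_b, mu_f):
            return t
    return (mu_b + mu_f) / 2.0           # fallback for degenerate cases
```

With the simulation settings of Fig. 8 (σB² = 20, σF² = 19, µB = 50, µF = 150), the returned threshold lies between the two means for both low and high pB.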

Fig. 8 (top row) emphasizes the quality of the threshold estimation by analyzing it under two different simulation conditions. σB² = 20, σF² = 19, µB = 50 and µF = 150 are kept constant across the experiments: i) pB ≪ 1: a low threshold, as the confidence that a given pixel belongs to the foreground is high; ii) pB ≈ 1: a high threshold, to filter out the background effectively, as a given pixel is more likely to belong to the background than to the foreground. It can be seen from the figure that the threshold is estimated optimally so as not to mis-classify background pixels as foreground pixels, while not missing out foreground pixels. Fig. 8 (bottom row) shows the threshold estimated as per Eq. 11 for extracted pedestrian vertical stripes whose histogram overlaps considerably with the background histogram.

Estimation of bi-modal and multi-modal parameters: A priori knowledge of whether the distribution is bi-modal or multi-modal simplifies the parameter estimation required by the region growing procedure. Towards this, this paper extends the work done in (37)

min_T pF E_F(T) subject to the constraint pB E_B(T) = 0,

where E_F(T) = ∫_{−∞}^{T} P_F(t) dt,  E_B(T) = ∫_{T}^{∞} P_B(t) dt    (5)

pF P_F(T) = λpB P_B(T)    (7)

Substituting the Gaussian PDFs for P_B(t) and P_F(t), we obtain

3.4.2. Threshold for Middle and Near Range

As the object area is comparable to the background area, the histogram exhibits either bi-modality or multi-modality depending on the intensity of cloth insulation (refer Section 3.2). The threshold chosen depends on the separation between the background mode (first mode) and the foreground modes (the modes thereafter in tri- and higher-modal distributions). The modes are presumed to be well separated provided that the distance (Eq. 4) between the first (background) and second (foreground) mode is above a threshold, and vice versa. The parameters of the background and foreground, ϕB = [pB, µB, σB²] and ϕF = [pF, µF, σF²] respectively, are estimated as explained in the later part of this section.

ψ = pB |µF − µB| / σB + pF |µB − µF| / σF    (4)



Figure 9: Bi-modal approximation (with the proposed parameter estimation equations) of simulated bi-modal Gaussian and their KL.


Figure 10: Bi-modal approximation of histogram of middle/near range pedestrian (a) Two modes, (b) Three modes

(uni-modality deviation) to bi-modality deviation. The deviation from bi-modality is calculated as the KL divergence between the vertical stripe distribution and the most optimally fit bi-modal Gaussian distribution P_bi = pB P_B(x) + pF P_F(x), where P_B(x) and P_F(x) are the background and foreground modes with parameters ϕB = [pB, µB, σB²] and ϕF = [pF, µF, σF²] respectively. If the distribution is bi-modal, the parameters are estimated via Eq. 12; otherwise, through the standard iterative Expectation Maximization (EM) algorithm. By providing an analytical solution to estimate the parameters of the bi-modal distribution, this paper overcomes the impediments of EM algorithms such as slow convergence and the dependence of the solution on the initial values chosen.

µB = Σ_{x=1}^{Nm} x Y(x) / Σ_{x=1}^{Nm} Y(x),   σB² = Σ_{x=1}^{Nm} (x − µB)² Y(x) / Σ_{x=1}^{Nm} Y(x)

µF = Σ_{x=Nm+1}^{N} x Y(x) / Σ_{x=Nm+1}^{N} Y(x),   σF² = Σ_{x=Nm+1}^{N} (x − µF)² Y(x) / Σ_{x=Nm+1}^{N} Y(x)

[pB pF]^T = (A^T A)^{−1} A^T b    (12)
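The moment part of Eq. 12 can be sketched as below: the histogram Y(x) is split at the valley Nm, and each side's mean and variance are computed directly. The least-squares mixture weights (A^T A)^{−1} A^T b are omitted here for brevity; the function name is illustrative.

```python
def bimodal_params(hist, n_m):
    # Analytical parameter estimation in the spirit of Eq. 12:
    # weighted moments of the histogram Y(x) on either side of the
    # valley at x = n_m (background side: 0..n_m, foreground side:
    # n_m+1..N). Mixture weights via least squares are not shown.
    def moments(lo, hi):
        w = sum(hist[x] for x in range(lo, hi))
        mu = sum(x * hist[x] for x in range(lo, hi)) / w
        var = sum((x - mu) ** 2 * hist[x] for x in range(lo, hi)) / w
        return mu, var

    mu_b, var_b = moments(0, n_m + 1)
    mu_f, var_f = moments(n_m + 1, len(hist))
    return (mu_b, var_b), (mu_f, var_f)
```

Being closed-form, this avoids the iterative EM loop and its sensitivity to initialization, which is the point made in the text.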


Figure 8: Top Row: Simulation for low and high background probability, Bottom Row: Threshold for merged modes: (c) Two modes, (d) Three modes


4-connectivity. This finite graph was represented using an adjacency matrix. Connected pixels were extracted from this adjacency matrix through its block triangularization, which was computed using the Dulmage-Mendelsohn decomposition (34). Shrinking of the vertical stripe and extraction of the top and bottom locations of the ROI bounding box was done by examining the extracted connected pixels. A few examples of ROI extracted images are shown in Fig. 11. It is evident in the results shown that the claim made for the proposed ROI extraction module, minimal pedestrian body part splitting, has been achieved while not missing out any pedestrians, by careful stitching of the strategies proposed in Sections 3.2, 3.3 and 3.4.

4. Proposed Algorithm: Pedestrian Validation


In this section, each detected Bounding Box (BB) of Section 3 is validated in order to remove the artifacts and ambiguities behind wrong pedestrian BB detections. This is mandatory, as false positives will increase in situations where the assumptions made on the histograms of background and pedestrians, in order to extract the latter from the former, become invalid. There is a high probability of the assumptions becoming invalid in scenarios where the background is cluttered or non-pedestrian objects emit intense heat. A few examples of cases where ROI extraction alone may not suffice are given in Fig. 12 for reference. The proposed pedestrian validation method involves feature extraction for pedestrian description, followed by classification using SVM. Pedestrian description is the process of projecting the pedestrian candidate ROI into a feature space, which can then be used in the classification step to recognize the input as pedestrian or non-pedestrian. After thorough investigation of the feature space, a more recent multi resolution transform

Where, Nm is the value of x at which the background and foreground distributions form the valley, A is the matrix with P_B(x) and P_F(x) as column vectors, and b is the column vector of Y(x) (x ∈ [0...N]). Proof for the parameters of the bi-modal distribution that optimally fits the given histogram is given in Appendix A. To substantiate the derivation of the parameters for the bi-modal distribution, simulation results are displayed in Fig. 9. The results for middle pedestrians (two and three modes) with their optimally fit bi-modal Gaussian and KL divergence are given in Fig. 10.

3.5. ROI Bounding Box (BB) Estimation

The final step in ROI detection is to find the appropriate bounding box coordinates of the pedestrian ROI. A graph was constructed with the non-zero elements in the vertical stripe based on


Figure 11: Thermal images with extracted BB, (a) to (e) are images from standard databases as given in Sec. 5, (f) to (r) are images from in-house database as given in Sec. 5. Bounding boxes inscribe full body of pedestrians without any splitting in spite of intense cloth insulation

to effectively represent the BB in a feature space is presented in this work. The multi resolution information provided by the curvelet transform along different orientations has been exploited, and a new feature descriptor named the curvelet energy entropy feature has been proposed. To validate the effectiveness of the proposed feature descriptor, a comparative study has been conducted with other well known features. The tests demonstrate that with the curvelet feature representation, the classification task achieves a significant improvement in accuracy over state-of-the-art features. A comprehensive comparative study is presented in Section 5 below.

4.2. Curvelet Transform Implementation

As the implementation of the curvelet transform in the time domain involves high cost, the frequency domain implementations are in wide-spread use irrespective of the redundancy involved. In Fourier space, curvelets are supported inside a parabolic wedge. In practice, polar coordinates are replaced with Cartesian coordinates. In the Cartesian coordinate system, the curvelets are supported inside trapezoids U_jq(ω1, ω2), which isolate the frequencies inside the trapezoid such that 2^j ≤ ω1 ≤ 2^{j+1} and −2^{−j/2} ≤ ω2/ω1 ≤ 2^{−j/2}. In the Cartesian grid, equi-spaced angles become equi-spaced slopes tan θ_jq = q 2^{−⌊j/2⌋}, where q = −2^{⌊j/2⌋}, ..., 2^{⌊j/2⌋} − 1, and the curvelets at different rotation angles become sheared trapezoids U_jq(ω1, ω2), which can be modeled using the shear matrix given below:

S_{θ_jq} = [ 1, 0 ; tan θ_jq, 1 ]    (14)

4.1. Overview of Curvelet Transform

Curvelets are an optimally sparse representation of objects with discontinuities along C² edges. The space side picture of the Curvelet Transform is the representation of images with curvelet bases. The curvelet bases are waveforms ϕ(x) = ϕ(x1, x2), smooth and oscillatory in the horizontal direction x1 and non-oscillatory in the vertical direction x2, with parabolic rescaling (width = length²), rotation and translation. The curvelet coefficients (Eq. 13) are calculated as the convolution of the image f with the basic curvelet ϕ_{j,0,0} at different scales (j = 1, ..., J), rotation angles (θ_jq = 2πq 2^{−⌊j/2⌋}, where q = 0, 1, ..., such that 0 ≤ θ_jq ≤ 2π) and translations k = (k1 2^{−j}, k2 2^{−j/2}):

c(j, q, k) = ( f ⊗ 2^{3j/4} ϕ_{j,0,0} )( R_{θ_jq} x − k )    (13)

The two distinct implementation techniques (http://curvelab.org) of the Discrete Curvelet Transform (DCT), as presented in (32), are the Discrete Curvelet Transform via unequispaced FFT (usDCT) and the wrapping based Discrete Curvelet Transform (wDCT). The interpolation involved in usDCT is computationally the most expensive component of that transform. Naturally, one might be inclined to use wDCT due to its inherent advantages. In spite of these advantages and its capability to optimally represent image content, the application of the Curvelet Transform in image analysis

where R_{θ_jq} is the rotation matrix with angle θ_jq.


Figure 12: Thermal images with BB displaying the possible false positives of ROI extraction module. These images show the indispensability of pedestrian validation step to remove false positives

was limited due to its computational cost. In this direction, this paper presents a new implementation of wDCT, which needs significantly less time to compute the DCT than the conventional wDCT.

E_jq = Σ_{i=1}^{N_jq} | C^fast_jq(i) |²    (15)

Here, N_jq represents the number of coefficients in scale j and angle q. Curvelet entropy is a measure of image complexity: in a given scale and angle, the sparser the curvelet coefficients, the smaller the curvelet entropy. For example, smooth images have very sparse curvelet coefficients, indicating low complexity of the image. Given F̃_jq(ω1, ω2), the curvelet spectrum at scale j and angle q, E_jq(ω1, ω2) = |F̃_jq(ω1, ω2)|² represents the distribution of curvelet energy in the given scale j and angle q at (ω1, ω2), thus leading to the probability distribution of the curvelet spectrum in the given scale and angle as per Eq. 16.

4.3. Proposed Fast Wrapping based Discrete Curvelet Transform Algorithm

This section puts forward a fast wDCT algorithm that lowers the computational time while retaining acceptable numerical accuracy. The proposed implementation preserves the properties of the Curvelet Transform, as we have not changed the core of the Curvelet Transform algorithm. The key to reducing the time complexity is the observation that the wDCT implementation is dominated by the use of non-power-of-2 FFTs. We propose the necessary modifications to the conventional wDCT to save the time wasted in computing non-power-of-2 FFTs. We present the architecture of our fast wDCT (j = 1, ..., J scales and q = 0, ..., L_j, where L_j is the number of angles in scale j) for an image of size N1 × N2 as Algorithm 1. Experiments have been conducted on 12 images of diversified sizes to emphasize the practical improvement in the computation time of the curvelet transform. The sizes of the images are chosen such that the image size space is well sampled to include prime numbers (113, 347, 547, 953), compound numbers with large prime factors (302, 583, 889, 993) and powers of two (128, 256, 512, 1024). Fig. 13 f brings out the reduction in time complexity achieved as an outcome of the proposed tweaks incorporated into wDCT.
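The core padding trick — rounding every FFT length up to the next power of two — can be sketched in a few lines (the helper names are illustrative):

```python
def next_pow2(n):
    # Smallest power of two >= n. The fast wDCT pads every FFT length
    # to such a size so that only power-of-2 FFTs are ever computed.
    p = 1
    while p < n:
        p <<= 1
    return p

def pad_shape(n1, n2):
    # Padded image dimensions (N1 + dN1, N2 + dN2) used before the
    # initial Fourier transform in Algorithm 1.
    return next_pow2(n1), next_pow2(n2)
```

Sizes that are prime (e.g. 347) or have large prime factors (e.g. 583, 889) are exactly the cases where a generic FFT is slow, and where this padding pays off.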

P_jq(ω1, ω2) = E_jq(ω1, ω2) / E_jq    (16)

The curvelet energy entropy at scale j and angle q is defined as per Shannon's entropy theory as

H_jq = − Σ_{ω1,ω2 ∈ ω_jq} P_jq(ω1, ω2) ln[ P_jq(ω1, ω2) ]    (17)

Where ω_jq represents the frequencies present in scale j and angle q.

Curvelet Energy Entropy (CEE) Feature: To derive the CEE feature for a given ROI, each ROI was transformed to curvelet space (two scales, with eight different orientations in the finer scale). The energy distribution (along one coarse scale and eight orientations in the finer scale) and the entropy (along the eight orientations in the finer scale) of the curvelet coefficients were computed as given in Eq. 15 and Eq. 17 respectively. Thus, for a given ROI, a 17 dimensional vector was derived as the CEE feature.
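The assembly of the 17-dimensional CEE vector (1 coarse energy + 8 fine energies + 8 fine entropies) can be sketched as follows, treating each band as a flat list of coefficients. The band layout and function names are illustrative assumptions; the curvelet coefficients themselves would come from the wDCT.

```python
import math

def band_energy(coeffs):
    # Eq. 15: total squared magnitude of the coefficients in one band.
    return sum(abs(c) ** 2 for c in coeffs)

def band_entropy(coeffs):
    # Eqs. 16-17: Shannon entropy of the normalised energy
    # distribution within one band.
    e = band_energy(coeffs)
    h = 0.0
    for c in coeffs:
        p = abs(c) ** 2 / e
        if p > 0.0:
            h -= p * math.log(p)
    return h

def cee_feature(coarse_band, fine_bands):
    # 1 coarse energy + 8 fine energies + 8 fine entropies = 17 dims.
    feat = [band_energy(coarse_band)]
    feat += [band_energy(b) for b in fine_bands]
    feat += [band_entropy(b) for b in fine_bands]
    return feat
```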

4.4. Curvelet Feature Extraction

The extracted curvelet features consist of two parts: curvelet energy and curvelet entropy. Curvelet energy at scale j and angle q is defined as the sum of squares of the curvelet coefficients in that particular scale and angle, as given in Eq. 15. Curvelet energy is an indication of the distribution of image energy across scales and bands.

4.5. ROI classification

The proposed pedestrian detection method was evaluated with five diverse evaluation experiments based on the following metrics, calculated as averages over 10 different images: recall vs precision; False Positive (FP) rate vs False Negative (FN) rate or miss rate; Percentage of Wrong Classifications (PWC); Receiver Operating Characteristic (ROC) curve (FP rate vs TP rate); and accuracy (ratio between the TP rate and the sum of the TP and FP rates) across different feature extraction methods. A comparative study of the computation time of CEE with respect to other features has also been carried out. A detailed study has been done to bring out the significance of the proposed method over CNN based detectors.

BB Class Estimation: The metrics used for evaluation mandate the classification of detected BBs and ground truth BBs as True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). This classification was carried out by comparing the ground truth bounding boxes (BBgt) and the final detected bounding boxes (BBdt) of the various feature extraction methods based on the confidence value specified in Eq. 19.

The Support Vector Machine (SVM) was the classifier chosen to solve the pattern recognition problem of recognizing pedestrians. SVM is well suited to the two-class classification problem with training data set (f_r, y_r), with f_r as the CEE feature vector of positive and negative ROIs, and y_r as the label with binary value (pedestrian or non-pedestrian), where r = [1, 2, ..., R]. The optimal solution to this problem is obtained in SVM through the following optimization problem:

max_{α_r} { Σ_{r=1}^{R} α_r − (1/2) Σ_{r,s} α_r α_s y_r y_s K(f_r, f_s) }    (18)

Where 0 < α_r and K is the kernel function that maps the original feature space to a high dimensional feature space. Popular kernel functions are the Gauss kernel, the polynomial kernel and the sigmoid kernel. The Gauss kernel was chosen for the given work and its parameters were tuned empirically to obtain optimal classification results.
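The Gauss (RBF) kernel used in Eq. 18 has the standard form K(f_r, f_s) = exp(−γ ||f_r − f_s||²); a minimal sketch follows. The γ value here is an arbitrary placeholder, since the paper tunes it empirically.

```python
import math

def gauss_kernel(f1, f2, gamma=0.5):
    # RBF kernel K(f_r, f_s) = exp(-gamma * ||f_r - f_s||^2).
    # gamma = 0.5 is a placeholder; the paper tunes it empirically.
    d2 = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return math.exp(-gamma * d2)
```

K(f, f) = 1 for any f, and the kernel decays towards 0 as the feature vectors move apart, which is what makes it act as a similarity measure in the dual problem of Eq. 18.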

area(BBdt ∩ BBgt) / area(BBdt ∪ BBgt) > overlap ratio    (19)

The assignment of each BBdt to a BBgt follows these rules:

5. Experiments and Results

In this section, the impact of increasing the ROI extraction complexity as proposed in this paper, and that of the curvelet energy entropy features applied over the ROIs, is evaluated.

Pedestrian Dataset: Multiple thermal pedestrian datasets have been collected over the years. Today, OTCBVS is the predominant benchmark for pedestrian detection. This paper makes use of 6 different datasets to analyze the proposed method on pedestrians at different scales, including three databases from OTCBVS (http://www.cse.ohiostate.edu/otcbvs-bench), the Thermal Corridor dataset (TC), the BU-TIV dataset (csr.bu.edu/BU-TIV/BUTIV) and in-house datasets. The three datasets from OTCBVS are the OSU Thermal Pedestrian Database (with 155 pedestrians), the OSU Color-Thermal Database (OSUc, with 700 pedestrians) and the Terravic Motion IR Database (M-IR, with 700 pedestrians). The in-house database is composed of around i) 1000 images with pedestrians, cars and motor-bikes, captured on a cold winter evening, ii) 1000 images consisting of pedestrians in a room (night), and iii) 2000 images bearing pedestrians in a corridor (day and night), captured with a FLIR thermal camera.

A large set of feature types has been explored: Histogram Of Gradients (HOG) (11), Histogram of Phase Congruency (HPC) (22), HAAR wavelet (25), Wavelet Entropy (WE) (27), Dual Tree Complex Wavelet Transform (DT CWT) and Double Density DT CWT (DD DT CWT) (28), among others. Curvelet features have been shown to systematically improve the performance. Much of the progress in pedestrian detection can be attributed to the use of curvelet features along with the tweaks made in the proposed pedestrian detection step to reduce the loss of pedestrian parts. The good news is that the proposed method seems to perform skilfully across datasets and better than well performing state-of-the-art features. Conclusive empirical evidence has been provided indicating the meaningful gains achieved by the proposed method.

1. The BBdt with the highest score is considered the winner for a particular BBgt.
2. Once a BBgt is matched to a BBdt, both are excluded from the subsequent matching process, so that each may be paired at most once.

BBdt is assigned a class based on the following outcomes:

BBdt = { TP if matched; FP if unmatched; TN if unmatched and non-pedestrian }    (20)

BBgt = { FN if unmatched }    (21)

5.1. Results and Discussion

This section discusses the outcome of comparing the proposed method with state-of-the-art features based on the metrics mentioned in Section 5. The entire procedure is as follows:

1. Search for all possible candidate ROIs using the proposed ROI extraction method. Classify the extracted ROIs manually as pedestrian and non-pedestrian. Select around 2100 positive samples and 2300 negative samples to be used in the training phase.
2. Represent all positive and negative samples in the various feature spaces: the proposed Curvelet Energy Entropy feature (CEE), HAAR, WE, HOG, HPC, DT-CWT and DD DT-CWT.
3. Feed these features as input to the SVM classifier and generate as output the ROIs that qualify in the pedestrian validation process.

Algorithm 1: Proposed forward fast Wrapping based Discrete Curvelet Transform

Pad the input image f(n1, n2) of size N1 × N2 with δN1, δN2 to the next power of 2, such that N1 + δN1 = 2^x and N2 + δN2 = 2^y, x and y being integers. Apply the Fourier Transform to the image to get the Fourier samples F(ω1, ω2), where −ω/2 ≤ ω1, ω2 < ω/2.

for j < J and q < L_j do
1. Find the product F̃_jq(ω1, ω2) = U_jq(ω1, ω2) F(ω1, ω2).
2. To make the length of the inverse FT a power of two, embed the product in the center of a parallelogram of complex zeros of size (L1_j + δ1_j) × (L2_j + δ2_j), such that L1_j + δ1_j = 2^m and L2_j + δ2_j = 2^n, where m and n are integers, L1_j = 2^j τ1 with τ1 = N1 2^{−J}, and L2_j = 2^{j/2} τ2 with τ2 = N2 2^{−J}.
3. Tile this parallelogram along the 2D plane to get F̂_jq(ω1, ω2), such that F̂_jq(ω1, ω2) = F̃_jq(ω1 mod (L1_j + δ1_j), ω2 mod (L2_j + δ2_j)).
4. Extract the (L1_j + δ1_j) × (L2_j + δ2_j) portion of the wrapped product around the origin.
5. Compute the inverse FFT to get the interpolated curvelet coefficients C_jq(i).
6. Re-sample the interpolated curvelet coefficients to L1_j × L2_j (zero padding in frequency amounts to interpolation in space). Extract the N1 2^{−j} × N2 2^{−j/2} portion to counteract the effect of zero padding the input image and obtain the curvelet coefficients C^fast_jq(i) of the proposed fast algorithm.
end for

5.1.1. Experiment 1: Comparison with non-deep learning methods

This section furnishes the results of 6 different state-of-the-art feature extraction methods compared against CEE features.

Recall vs Precision: Fig. 13 a shows the comparison in terms of average recall vs average precision (Eq. 22). The average is calculated as the mean of the recall and precision obtained across the different databases mentioned in Section 5. Compared to other features, CEE was able to recall most of the pedestrians while being precise in judging a pedestrian.

Recall = TP / (TP + FN),   Precision = TP / (TP + FP)    (22)

PWC = (FP + FN) / (TP + TN + FP + FN)    (23)

FP rate vs TP rate: The best way to illustrate the performance of any binary classification problem is the ROC curve, a plot of FP rate against TP rate. The best classification method would give a point in the upper left corner of the plot. In order to assess the trade off between TP and FP, the comparison of the proposed method with the other feature extraction methods is given in terms of ROC curves in Fig. 13 d. The ROC curves lead to the inference that the proposed method always lies above the random guess line.

Accuracy: An assessment that can be derived from the ROC curve is the accuracy (as given in Eq. 24) of the classification problem. In order to give a more intuitive quantitative comparison, this paper also furnishes the accuracy view of the ROC curve. Fig. 13 e shows that the accuracy of the proposed method exceeds that of the other feature extraction methods.

FP Rate vs Miss Rate: Concerning the miss rate and the probability of falsely rejecting the null hypothesis, the proposed system has been tested extensively, and the outcome is plotted in Fig. 13 b. The performance of CEE features was promising, as it maintains a significant reduction in miss rate (FN) at fixed false positive rates (FP). CEE demonstrated a detection miss rate of essentially zero at an FP rate of 0.9.

PWC: In addition to recall vs precision and miss rate vs false positives, it is of significant value to measure the instances in which a pedestrian detection system goes wrong. Towards this, Percentage of Wrong Classifications (PWC), as given in Eq. 23, is employed as a common reference metric to compare the feature extraction methods. Fig. 13 c is the comparative plot of the reduction in PWC achieved by our system against the other feature extraction methods. Among all the feature extraction methods compared, CEE gave the lowest PWC.

accuracy = TP / (TP + FP)    (24)

Time Complexity: Fig. 13 f shows that the tweaks proposed in the computation of the curvelet transform (refer Sec. 4.3) lead to a drastic decrease in the computation time of the curvelet feature, making it comparable with other feature extraction methods and thus enabling the curvelet transform to be used to its full potential.
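The evaluation metrics of Eqs. 22-24 can be bundled into one helper (note that "accuracy" here follows the paper's own definition, TP/(TP+FP), rather than the conventional (TP+TN)/total):

```python
def detection_metrics(tp, fp, tn, fn):
    # Eqs. 22-24: recall, precision, PWC, and the paper's "accuracy"
    # (defined in the text as TP rate over TP + FP rate).
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    pwc = (fp + fn) / (tp + tn + fp + fn)
    accuracy = tp / (tp + fp)
    return recall, precision, pwc, accuracy
```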

5.1.2. Experiment 2: Comparison with deep learning

Given the popularity of deep learning networks, it is fair to dedicate a section to investigating deep learning based pedestrian detection to bring out the pros of the proposed method. In the past few years, deep learning architectures have been used


Figure 13: (a) Comparison of Recall vs Precision, (b) Comparison of Miss rate vs FP, (c) Comparison of PWC, (d) Comparison in terms of ROC, (e) Comparison of accuracy, (f) Comparison of time complexity

rather than shallow architectures to achieve state-of-the-art results in object detection in the optical domain. A search for papers on deep learning revealed that optical object detection underwent a major revolution with deep Convolutional Neural Networks (CNN) (44). Research efforts in object detection in optical images have recently been accelerated by CNNs. This has led this paper to restrict the study of deep learning to CNNs. A literature survey was further conducted on CNN based optical object detectors, which ended with the conclusion that Fast R-CNN (Region based CNN), proposed by the authors of (42), and Faster R-CNN, proposed by the authors of (45), have achieved good classification accuracy with reduced computational complexity. As Faster R-CNN needs a very high end NVIDIA card for training, its study is beyond the scope of this paper, thus limiting the study to Fast R-CNN. Fast R-CNN works as follows: category independent region proposal (selective search (43)), an R-CNN that extracts a feature vector from each region, followed by a class specific linear classifier.

The first and foremost advantage that we claim over the Fast R-CNN based object detector is in region proposal. The selective search used in Fast R-CNN is denser compared to the proposed ROI extraction module. Due to the inherent nature of thermal images, dense region proposals indeed reduce the detection quality by increasing spurious false positives. Fig. 14 shows a comparative plot of the number of proposals generated by selective search and by our method. It is very clear that dense proposals exert a lot of pressure on the validation step to reduce the false positives. Our sparse ROI extractor is effective in terms of computational time as well as detector performance. Experiments have been conducted across Fast R-CNN and the proposed method in terms of recall vs precision, FP rate vs miss


Figure 14: ROI extraction of (a) Selective Search of Fast R-CNN, (b) Proposed

rate, PWC, ROC and accuracy, as shown in Fig. 15. It can be concluded that the proposed method performs equally well and sometimes even better than Fast R-CNN. The following additional arguments are put forward to emphasize the preference for the proposed method:

1. Data requirement: The architecture of the CNN network used in Fast R-CNN consists of 5 convolution layers. The numbers of outputs from the layers are 96, 256, 384, 384 and 256, with kernels of size 11x11, 5x5, 3x3, 3x3 and 3x3 respectively. With this huge number of free parameters, over fitting of the CNN is highly probable. In order to prevent over fitting, extra training examples are mandatory, thus making CNNs data hungry and less useful in the thermal imagery domain, where labeled data is very scarce. This accounts for the lesser utility of deep learning methods in the thermal domain, though the authors of (12) have explored the use of CNN features for pedestrian detection in thermal images. Those authors showed their results on a training set of 8000 positive and 8000 negative examples, far greater than the number of training exemplars


Figure 15: Comparison of proposed methods with Fast R-CNN (a) Recall vs Precision, (b) Miss rate vs FP, (c) PWC, (d) ROC, (e) Accuracy

between the ROC of the proposed method (for near pedestrians) and Fast R-CNN (for far pedestrians) (refer Fig. 16 b) indicates the superiority of the proposed method. As the size of the object to be detected becomes small, the detection rate of any pedestrian detector reduces. The time analysis confirms that deep learning pedestrian detectors are constrained for use in real time pedestrian detection.

Computation time (secs for Fast R−CNN / ms for proposed)

2. Initialization requirement: A good initialization of the parameters is mandatory for convergence and to avoid the network from getting stuck in some local minima. In the experiments conducted in this paper, the parameters of CNN were initialized with the weights from a pre-trained network, failing which the CNN was resulting in very high miss rate. 3. Memory requirement: The model generated out of fast RCNN training is of the order of 500 Mb. It becomes memory expensive to port such huge models into memory constrained mini and micro robotics.

8

Secs

Secs

Secs

Fast R−CNN Proposed

Secs

6

4

mSecs mSecs

2

0

mSecs mSecs

480x640

240x320

160x213 120x160 Image Size

(a)

4. Time Requirement: Fast R-CNN is one of the faster deep learning pedestrian detector. The fast R-CNN is claimed to be 9 times faster than R-CNN (41) and 3 times faster than SPPnet (42). Fig. 16 a shows the comparison of detection time of fast R-CNN and the proposed method run on Z-210 quad core processor with RAM of 10 Gb without NVIDIA card. It is significant that the proposed method has achieved very less detection time, thus making it superior over CNN based methods, to be used in mini and micro robotics with power and compute constriants.

Secs

mSecs 96x128

1 True Positive Rate

used for the proposed method. The data requirement for the proposed algorithm is very minimal.

0.8 0.6 0.4 0.2 0.1

Proposed Fast R−CNN 0.2

0.3

0.4 0.5 0.6 False Positive Rate

0.7

0.8

0.9

(b)

Figure 16: (a) Time complexity of Fast R-CNN and proposed (b) Comparison of ROC of Fast R-CNN evaluated for far pedestrians and the proposed method evaluated for near pedestrians

6. Conclusion This paper has brought a new dimension to thermal pedestrian detectors, which follow the intensity based ROI extraction methodology coupled with feature based validation. The unacceptable problem of ROI extraction module, body part splitting, has been well minimized by tailoring the ROI extraction to the scale of the pedestrian. This choice of scale dependent ROI extraction algorithm has been intensely justified with empirical analysis of pedestrians at different scales, which clearly displays the obvious variation in pedestrian statistics with their scale. The identification of scale of the pedestrian has been automated with an analytical solution, based on assessing the deviation of the distribution of the pedestrian pixels from normality. Besides, the paper has come up with an adaptive threshold for pedestrian region growing built on the nature of their probability distribution and substantiated the utilization of proposed threshold with experimental studies. Results from voluminous experiments have been provided to validate the minimization of pedestrian body splitting achieved through the proposed ROI extraction module. Substantial decrease in false positives has

An example: consider a platform that runs at a rate of 60 Km / Hr and fitted with a high end FLIR thermal camera A615 with focal length of 41.3 mm, detector pitch of 17 micro metre and frame rate of 50 Hz. On an average, it could be seen that the proposed method consumes 2 ms per frame. As the maximum number of frames that could be captured with any high end thermal camera cannot exceed 1.2 frames in 2 ms, the proposed method could be used for any real time applications. The average computation time of Fast R-CNN is around 7s. At the mentioned speed of 60 Km / Hr, in order to avoid collision, fast R-CNN need to detect pedestrian at a distance of 116m. A pedestrian of average height 2 m at a distance of 116m will subtend approximately 41 pixels in the image. Thus, the mandate posed on Fast R-CNN is to exhibit good detection rate for far pedestrians and that on the proposed method is to have good detection rate for near pedestrians. The comparison 14

been achieved in the pedestrian validation step through the exploitation of a very capable feature, the Curvelet Energy Entropy feature. One of the major accomplishments of this paper is in venturing the use of the curvelet transform, which has not found extensive use in thermal images so far. A solution has also been established to increase the speed of the curvelet transform computation. The performance superiority achieved over different state-of-the-art features has been supplemented with comprehensive experiments conducted across six different datasets. The advantage of the proposed method over the most popular recent CNN based detectors is that the former is neither power hungry nor memory hungry, making it preferable for resource constrained environments.
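As a quick sanity check on the arithmetic of the timing example above, the pixel subtense of the pedestrian follows from the pinhole projection model. This is a minimal sketch using only values quoted in the text:

```python
# Height (in pixels) of a 2 m pedestrian imaged at 116 m by a camera with
# 41.3 mm focal length and 17 um detector pitch (values from the example).
focal_m = 41.3e-3      # focal length
pitch_m = 17e-6        # detector (pixel) pitch
height_m = 2.0         # pedestrian height
distance_m = 116.0     # detection distance needed at 60 km/h

image_height_m = focal_m * height_m / distance_m   # size on the sensor (pinhole model)
pixels = image_height_m / pitch_m
print(f"{pixels:.1f} pixels")   # ~41.9, matching the quoted ~41 pixels

# Frame budget: at 50 Hz a new frame arrives every 20 ms, so the proposed
# method's ~2 ms per frame leaves ample headroom for real-time operation.
frame_period_ms = 1000.0 / 50
assert frame_period_ms > 2.0
```

The same projection formula also explains why far pedestrians are the harder case: the subtense shrinks inversely with distance.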

7. Appendix

This appendix furnishes the details of the technique presented by the authors of (37) to assess the deviation of a given probability distribution Y(x) from uni-modality, and the proof of the analytical solution proposed in this paper for the parameters of the bi-modal distribution.

Proof of uni-modality analysis: To qualify the deviation from normality, the first step is to find the mean and variance of the uni-modal Gaussian that best fits the given data points. The parameters are estimated by minimizing the KL divergence given in Eq. 1. With the assumption of a Gaussian distribution, $\ln P(x)$ becomes
\[
\ln P(x) = -\frac{(x-\mu)^2}{2\sigma^2} - \ln\left(\sqrt{2\pi}\,\sigma\right)
\]
$\mu$ and $\sigma$ are found such that $\partial KL/\partial\mu$ and $\partial KL/\partial\sigma$ are zero:
\[
\frac{\partial KL}{\partial \mu} = \frac{1}{\sigma^2}\sum_{x=1}^{N}\left[ x\,Y(x) - \mu\,Y(x) \right] = 0
\]
The above equation simplifies to the one given in Eq. 2 for $\mu$.
\[
\frac{\partial KL}{\partial \sigma} = \sum_{x=1}^{N} Y(x)\left[ \frac{1}{\sigma} - \frac{(x-\mu)^2}{\sigma^3} \right] = 0
\]
This equation resolves to the one given in Eq. 2 for $\sigma$.

Proof for bi-modal parameters: The optimal $\phi_B$ and $\phi_F$ are arrived at by minimizing the KL divergence between $Y(x)$ and $P_{bi}(x)$,
\[
KL = -\sum_{x=1}^{N} Y(x)\ln P_{bi}(x) + \sum_{x=1}^{N} Y(x)\ln Y(x) \tag{25}
\]
In order to find the optimal parameters, KL has to be differentiated with respect to each parameter. As the second term does not contribute to the differentiated expression, it is ignored from now on, and the remaining sum is split at the mode boundary $N_m$:
\[
KL = -\sum_{x=1}^{N_m} Y(x)\ln P_{bi}(x) - \sum_{x=N_m+1}^{N} Y(x)\ln P_{bi}(x) \tag{28}
\]
Substituting $P_{bi}(x)$, $\ln P_{bi}(x)$ becomes
\[
\ln P_{bi}(x) = \ln\left(p_B P_B(x)\right) + \ln\left(1 + \frac{p_F P_F(x)}{p_B P_B(x)}\right) \tag{29}
\]
Substituting Eq. 29 in the two terms of Eq. 28 and using the fact that, in most cases, $P_F(x)/P_B(x) \ll 1$ when $x \in [1, N_m]$ and $P_F(x)/P_B(x) \gg 1$ when $x \in [N_m+1, N]$, the KL divergence simplifies to
\[
KL = A + B \tag{30}
\]
where
\[
A = -\sum_{x=1}^{N} Y(x)\ln\left(p_B P_B(x)\right), \qquad
B = -\sum_{x=N_m+1}^{N} Y(x)\ln\frac{p_F P_F(x)}{p_B P_B(x)}
\]
Substituting Gaussian distributions for $P_B(x)$ and $P_F(x)$,
\[
KL = -\sum_{x=1}^{N} Y(x)\ln p_B - \sum_{x=1}^{N} Y(x)\ln\frac{1}{\sqrt{2\pi}\,\sigma_B} + \sum_{x=1}^{N} Y(x)\frac{(x-\mu_B)^2}{2\sigma_B^2} - \sum_{x=N_m+1}^{N} Y(x)\ln\frac{\sigma_B\, p_F}{\sigma_F\, p_B} - \sum_{x=N_m+1}^{N} Y(x)\frac{(x-\mu_B)^2}{2\sigma_B^2} + \sum_{x=N_m+1}^{N} Y(x)\frac{(x-\mu_F)^2}{2\sigma_F^2} \tag{31}
\]
To find $\mu_B$, Eq. 31 is differentiated with respect to $\mu_B$ and equated to 0:
\[
-\sum_{x=1}^{N} Y(x)\frac{x-\mu_B}{\sigma_B^2} + \sum_{x=N_m+1}^{N} Y(x)\frac{x-\mu_B}{\sigma_B^2} = 0
\]
which reduces to
\[
\mu_B \sum_{x=1}^{N_m} Y(x) = \sum_{x=1}^{N_m} Y(x)\, x \tag{32}
\]
Untangling the above equation yields the analytical solution for $\mu_B$ given in Eq. 12. To find $\mu_F$, Eq. 31 is differentiated with respect to $\mu_F$ and equated to 0:
\[
-\sum_{x=N_m+1}^{N} Y(x)\frac{x-\mu_F}{\sigma_F^2} = 0 \tag{33}
\]
which reduces to the expression for $\mu_F$ provided in Eq. 12. In pursuance of a solution for $\sigma_B^2$, Eq. 31 is differentiated with respect to $\sigma_B$:
\[
\sum_{x=1}^{N} Y(x)\left[\frac{1}{\sigma_B} - \frac{(x-\mu_B)^2}{\sigma_B^3}\right] - \sum_{x=N_m+1}^{N} Y(x)\left[\frac{1}{\sigma_B} - \frac{(x-\mu_B)^2}{\sigma_B^3}\right] \tag{34}
\]
Equating it to zero and cancelling the common terms,
\[
\sigma_B^2 \sum_{x=1}^{N_m} Y(x) = \sum_{x=1}^{N_m} Y(x)\,(x-\mu_B)^2 \tag{35}
\]
The solution of the above equation leads to $\sigma_B^2$ as given in Eq. 12. To arrive at an equation for $\sigma_F^2$, Eq. 31 is differentiated with respect to $\sigma_F$:
\[
\sum_{x=N_m+1}^{N} Y(x)\left[\frac{1}{\sigma_F} - \frac{(x-\mu_F)^2}{\sigma_F^3}\right] = 0 \tag{36}
\]
The solution of this emerges into the equation for $\sigma_F^2$ provided in Eq. 12. To find $p_B$ and $p_F$, the least-squares solution of the following linear system is used as the estimate in Eq. 12:
\[
A\,x = b, \qquad A = \left[P_B \;\; P_F\right], \quad x = \left[p_B \;\; p_F\right]^{T}, \quad b = Y \tag{37}
\]
where $P_B$, $P_F$ and $Y$ are column vectors of the distributions $P_B(x)$, $P_F(x)$ and $Y(x)$.

8. References

[1] M. Bertozzi, A. Broggi, C. Caraffi, M.D. Rose, M. Felisa, G. Vezzoni, Pedestrian detection by means of far-infrared stereo vision, Computer Vision and Image Understanding, Vol. 106, 2007, pp. 194 - 204.
[2] A.D. Ciotec, V.E. Neagoe, A.P. Barar, Concurrent self-organizing maps for pedestrian detection in thermal imagery, U.P.B. Sci. Bull. Series C, Vol. 75, Issue 4, 2013, pp. 45 - 56.
[3] H. Sun, C. Wang, B. Wang, N.E. Sheimy, Pyramid binary pattern features for real-time pedestrian detection from infrared videos, Neurocomputing, Vol. 74, 2011, pp. 797 - 804.
[4] J.W. Davis, V. Sharma, Background subtraction using contour based fusion of thermal and visible imagery, Computer Vision and Image Understanding, Vol. 106, 2007, pp. 162 - 182.
[5] A.E. Maadi, X. Maldague, Outdoor infrared video surveillance: A novel dynamic technique for the subtraction of a changing background of IR images, Infrared Physics & Technology, Vol. 49, 2007, pp. 261 - 265.
[6] J. Davis, V. Sharma, Background subtraction in thermal imagery using contour saliency, International Journal of Computer Vision, Vol. 71, 2007, pp. 161 - 181.
[7] H. Nanda, L. Davis, Probabilistic template based pedestrian detection in infrared videos, Proceedings of the IEEE Intelligent Vehicles Symposium, 2002, pp. 15 - 20.
[8] N. Dalal, B. Triggs, C. Schmid, Human detection using oriented histograms of flow and appearance, Proceedings of the European Conference on Computer Vision, 2006, pp. 428 - 441.
[9] C. Dai, Y. Zheng, X. Li, Pedestrian detection and tracking in infrared imagery using shape and appearance, Computer Vision and Image Understanding, 2007, pp. 288 - 299.
[10] Y. Fang, K. Yamada, Y. Ninomiya, B.K. Horn, I. Masaki, A shape-independent method for pedestrian detection with far-infrared images, IEEE Transactions on Vehicular Technology, Vol. 53, 2004, pp. 1679 - 1697.
[11] V. John, Z. Liu, S. Mita, Pedestrian detection from thermal images with a scattered difference of directional gradients feature descriptor, IEEE Conference on Intelligent Transportation Systems, 2014, pp. 2168 - 2173.
[12] V. John, Z. Liu, S. Mita, B. Qi, Pedestrian detection in thermal images using adaptive fuzzy C-means clustering and convolutional neural networks, International Conference on Machine Vision Applications, 2015.
[13] K. Hajebi, J. Zelek, Dense surface from infrared stereo, IEEE Workshop on Applications of Computer Vision, 2007.
[14] M. Bertozzi, A. Broggi, A. Lasagni, M. Del Rose, Infrared stereo vision-based pedestrian detection, IEEE Intelligent Vehicles Symposium, 2005.
[15] M. Bertozzi, A. Broggi, C. Caraffi, M. Del Rose, M. Felisa, G. Vezzoni, Pedestrian detection by means of far-infrared stereo vision, Computer Vision and Image Understanding, 2007, pp. 194 - 204.
[16] S.J. Krotosky, M.M. Trivedi, On color-, infrared-, and multimodal-stereo approaches to pedestrian detection, IEEE Transactions on Intelligent Transportation Systems, 2007, pp. 619 - 629.
[17] D. Olmeda, A. de la Escalera, J.M. Armingol, Far infrared pedestrian detection and tracking for night driving, Robotica, Vol. 29, 2011, pp. 495 - 505.
[18] C.F. Lin, C.S. Chen, W.J. Hwang, C.Y. Chen, C.H. Hwang, C.L. Chang, Novel outline features for pedestrian detection system with thermal images, Pattern Recognition, 2015.
[19] X. Zhao, Z. He, D. Liang, Robust pedestrian detection in thermal infrared imagery using a shape distribution histogram feature and modified sparse representation classification, Pattern Recognition, Vol. 48, 2015, pp. 1947 - 1960.
[20] N.K. Negied, E.E. Hemayed, M.B. Fayek, Pedestrian detection in thermal bands: Critical survey, Journal of Electrical Systems and Information Technology, 2015.
[21] M. Yasuno, N. Yasuda, M. Aoki, Pedestrian detection and tracking in far infrared images, IEEE Intelligent Transportation Systems, 2005.
[22] B. Qi, V. John, Z. Liu, S. Mita, Use of sparse representation for pedestrian detection in thermal images, CVPR, 2014.
[23] Y. Yamauchi, C. Matsushima, T. Yamashita, H. Fujiyoshi, Relational HOG feature with wild-card for object detection, IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011, pp. 1785 - 1792.
[24] Q. Zhu, M.C. Yeh, K.T. Cheng, S. Avidan, Fast human detection using a cascade of histograms of oriented gradients, IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 1491 - 1498.
[25] H. Sun, C. Wang, B. Wang, Night vision pedestrian detection using a forward-looking infrared camera, International Workshop on Multi-Platform/Multi-Sensor Remote Sensing and Mapping (M2RSM), 2011, pp. 1 - 4.
[26] M.J. Jones, D. Snow, Pedestrian detection using boosted features over many frames, International Conference on Pattern Recognition, 2008, pp. 1 - 4.
[27] J. Li, Y. Wang, Pedestrian tracking in infrared image sequences using wavelet entropy features, Asia-Pacific Conference on Computational Intelligence and Industrial Applications, 2009, pp. 288 - 298.
[28] J. Li, W. Gong, W. Li, X. Liu, Robust pedestrian detection in thermal infrared imagery using the wavelet transform, Infrared Physics & Technology, 2010, pp. 267 - 273.
[29] R. O'Malley, E. Jones, M. Glavin, Detection of pedestrians in far infrared automotive night vision using region-growing and clothing distortion compensation, Infrared Physics & Technology, Vol. 53, 2010, pp. 439 - 449.
[30] H. Shan, J. Ma, H. Yang, Comparisons of wavelets, contourlets and curvelets for seismic denoising, Journal of Applied Geophysics, Vol. 69, 2009, pp. 103 - 115.
[31] E. Candes, D. Donoho, Curvelets: A surprisingly effective nonadaptive representation for objects with edges, in Curves and Surfaces (Saint-Malo), Vanderbilt University Press, 2000.
[32] D.L. Donoho, M.R. Duncan, Digital curvelet transform: Strategy, implementation and experiments, Wavelet Applications VII, 2000.
[33] N. Coudray, J.L. Buessler, A robust thresholding algorithm for unimodal image histograms, Pattern Recognition Letters, 2009, pp. 1010 - 1019.
[34] A. Pothen, C.J. Fan, Computing the block triangular form of a sparse matrix, ACM Transactions on Mathematical Software, 1990, pp. 303 - 324.
[35] M. Bertozzi, A. Broggi, A. Fascioli, T. Graf, M.-M. Meinecke, Pedestrian detection for driver assistance using multiresolution infrared vision, IEEE Transactions on Vehicular Technology, 2004.
[36] D. Deodhare, Facial expressions to emotions: A study of computational paradigms for facial emotion recognition, Understanding Facial Expression in Communication, 2015.
[37] D. Deodhare, M. Vidya Sagar, M.N. Murthy, Bi-modal projection based features for pattern classification, International Joint Conference on Neural Networks (IJCNN), IEEE World Congress on Computational Intelligence, Vancouver, Canada, July 2006, pp. 16 - 28.
[38] L. Zhang, B. Wu, R. Nevatia, Pedestrian detection in infrared images based on local shape features, IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1 - 8.
[39] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, IEEE Conference on Computer Vision and Pattern Recognition, 2001, pp. I-511 - I-518.
[40] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for object detection and semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[41] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, European Conference on Computer Vision, 2014.
[42] R. Girshick, Fast R-CNN, IEEE International Conference on Computer Vision, 2015.
[43] J.R. Uijlings, K.E. Van de Sande, T. Gevers, A.W.M. Smeulders, Selective search for object recognition, International Journal of Computer Vision, 2013.
[44] M. Zeiler, R. Fergus, Regularization of neural networks using DropConnect, International Conference on Machine Learning, 2013.
[45] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems, 2015.
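The closed-form estimates derived in the Appendix (Eqs. 32 - 37) amount to weighted means and variances on either side of the split point, plus a least-squares fit of the mixture weights. A minimal NumPy sketch, not the authors' implementation: the histogram `Y` (over bins x = 1..N) and the split index `Nm` are assumed inputs.

```python
import numpy as np

def bimodal_fit(Y, Nm):
    """Closed-form bi-modal Gaussian fit (sketch of Appendix Eqs. 32-37).
    Y: histogram over bins x = 1..N; Nm: split index (1-based)."""
    N = len(Y)
    x = np.arange(1, N + 1, dtype=float)
    yb, xb = Y[:Nm], x[:Nm]          # background side: x in [1, Nm]
    yf, xf = Y[Nm:], x[Nm:]          # foreground side: x in [Nm+1, N]
    mu_b = np.sum(yb * xb) / np.sum(yb)                   # cf. Eq. 32
    mu_f = np.sum(yf * xf) / np.sum(yf)                   # cf. Eq. 33
    var_b = np.sum(yb * (xb - mu_b) ** 2) / np.sum(yb)    # cf. Eq. 35
    var_f = np.sum(yf * (xf - mu_f) ** 2) / np.sum(yf)    # cf. Eq. 36
    # Gaussian components evaluated on all bins
    PB = np.exp(-(x - mu_b) ** 2 / (2 * var_b)) / np.sqrt(2 * np.pi * var_b)
    PF = np.exp(-(x - mu_f) ** 2 / (2 * var_f)) / np.sqrt(2 * np.pi * var_f)
    # cf. Eq. 37: least-squares mixture weights, A x = b with A = [PB PF]
    (p_b, p_f), *_ = np.linalg.lstsq(np.column_stack([PB, PF]), Y, rcond=None)
    return mu_b, var_b, mu_f, var_f, p_b, p_f
```

On a synthetic two-Gaussian histogram, the recovered means, variances and weights closely match the generating parameters, which is the behaviour the derivation predicts when the two modes are well separated by Nm.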


Highlights

- Performance improvement through multi-scale pedestrian models.
- A formal analytic criterion to automate the identification of pedestrian scale and also to estimate the statistical parameters of pedestrians.
- A robust probability based thresholding mechanism that yields a repertoire of solutions to the most prominent impediment to state-of-the-art pedestrian detectors, namely pedestrian body part splitting.
- The use of curvelet features has been explored; to the best of our knowledge, there are no reported results on curvelet feature based pedestrian detection in the thermal literature.
- A fast implementation of the curvelet transform to enable expeditious pedestrian detection has been described.

Corresponding author: A. Lakshmi, Centre for Artificial Intelligence and Robotics, Bangalore, Karnataka, India. email: [email protected] Authors: Faheema AGJ, Centre for Artificial Intelligence and Robotics, Bangalore, Karnataka, India. Dipti Deodhare, Centre for Artificial Intelligence and Robotics, Bangalore, Karnataka, India.