Optik 157 (2018) 1155–1165
Original research article
GPU accelerated face detection from low resolution surveillance videos using motion and skin color segmentation

Vikram Mutneja a,∗, Satvir Singh b

a Ph.D. Research Scholar, I.K. Gujral Punjab Technical University, Kapurthala, Punjab, India
b Associate Professor, Department of Electronics & Communication, I.K. Gujral Punjab Technical University Main Campus, Kapurthala, Punjab, India
Article history: Received 9 July 2017; Accepted 22 November 2017.

Keywords: Face detection; Haar features; Low resolution surveillance videos; Motion segmentation; Skin color filtering; GPU computing; Parallel processing; Pattern recognition.
Abstract

Surveillance systems are now deployed in most public places for security, and face biometrics is an essential step in such systems. However, the resolution of faces in surveillance videos is often small owing to factors such as the distance between the CCTV camera and the scene, wide coverage angles, installation problems, environmental factors, and hardware constraints, all of which make it difficult to detect and extract human faces from surveillance video. In this paper, we use motion and skin color based segmentation to extract the region of interest, which is subsequently subjected to face detection; GPU acceleration is further employed to facilitate real-time detection. This work makes the following main contributions. First, the lowest and highest values of the scaling factor used for multi-scale face detection are selected adaptively, based on an analysis of the regions segmented by motion and skin color. Second, multi-scale detection is performed with a band of pre-trained Haar classifiers combined with image scaling, unlike the Viola–Jones algorithm, which is based on detector scaling. Pre-trained classifiers with base sizes from 13 × 13 to 30 × 30 pixels detect faces in that range, while faces larger than 30 × 30 pixels are detected by scaling the image. The proposed algorithm has been successfully tested on surveillance videos from the ChokePoint dataset, detecting low-resolution faces with a minimum size of the order of 8 × 8 pixels.
1. Introduction

The detection and extraction of human faces from surveillance videos blends several streams of engineering and scientific research, including video and image signal processing, computer vision, pattern recognition, machine intelligence, mathematics, geometry, and parallel processing for speed enhancement. Over the past many years there has been substantial research and development in the field of intelligent video surveillance systems. Human identification, usually based on face biometrics, is an important task in such systems. Despite significant progress in face detection and processing technology, there remains considerable scope for the development of real-time facial image processing systems that are robust to face pose variation, image degradation, low resolution, and occlusion. Face biometric systems receive their input from the face detection module, which should be capable of extracting human faces even from low-quality or low-resolution surveillance video data.
∗ Corresponding author. E-mail address: [email protected] (V. Mutneja). URL: http://www.drsatvir.in
https://doi.org/10.1016/j.ijleo.2017.11.188
Features are an important component of any classification system. Haar features have been widely used for object detection in most works in the literature, and they were a key ingredient in the breakthrough developments in face detection introduced by the Viola–Jones algorithm [1,2]. Thereafter, innumerable works have targeted modifications and improvements of the Viola–Jones framework [3–5,25]. The number of Haar features is quite large because they vary in position and scale within the input image (refer to Table 1). The accuracy of the overall system depends on the quality as well as the quantity of the features: the more good features, the higher the expected accuracy, but at the cost of decreased efficiency. Therefore, researchers have tried to strike a balance between the number of features and the detection accuracy.

Table 1
Classifiers generated for base sizes in the range 13 × 13 to 30 × 30. N: feature pool size, W: number of weak classifiers generated, P: percentage of selected weak classifiers, cType: classifier type (1–5).

Base size  cType   N        W     P        Base size   N        W     P
13 × 13    1       1980     234   11.82    22 × 22     21,000   673   3.20
13 × 13    2       1980     186   9.39     22 × 22     21,000   493   2.35
13 × 13    3       1188     73    6.14     22 × 22     13,230   194   1.47
13 × 13    4       1188     19    1.60     22 × 22     13,230   53    0.40
13 × 13    5       900      96    10.67    22 × 22     10,000   240   2.40
13 × 13    Total   7236     608   8.40     22 × 22     78,460   1653  2.11
14 × 14    1       2808     258   9.19     23 × 23     25,410   769   3.03
14 × 14    2       2808     220   7.83     23 × 23     25,410   506   1.99
14 × 14    3       1716     92    5.36     23 × 23     16,170   212   1.31
14 × 14    4       1716     28    1.63     23 × 23     16,170   39    0.24
14 × 14    5       1296     90    6.94     23 × 23     12,100   258   2.13
14 × 14    Total   10,344   688   6.65     23 × 23     95,260   1784  1.87
15 × 15    1       3822     335   8.77     24 × 24     30,613   813   2.66
15 × 15    2       3822     269   7.04     24 × 24     30,613   512   1.67
15 × 15    3       2366     126   5.33     24 × 24     19,481   230   1.18
15 × 15    4       2366     27    1.14     24 × 24     19,481   52    0.27
15 × 15    5       1764     114   6.46     24 × 24     14,641   282   1.93
15 × 15    Total   14,140   871   6.16     24 × 24     114,829  1889  1.65
16 × 16    1       5145     387   7.52     25 × 25     36,432   822   2.26
16 × 16    2       5145     264   5.13     25 × 25     36,432   569   1.56
16 × 16    3       3150     105   3.33     25 × 25     23,184   249   1.07
16 × 16    4       3150     24    0.76     25 × 25     23,184   49    0.21
16 × 16    5       2401     130   5.41     25 × 25     17,424   294   1.69
16 × 16    Total   18,991   910   4.79     25 × 25     136,656  1983  1.45
17 × 17    1       6720     408   6.07     26 × 26     43,200   939   2.17
17 × 17    2       6720     304   4.52     26 × 26     43,200   628   1.45
17 × 17    3       4200     124   2.95     26 × 26     27,600   240   0.87
17 × 17    4       4200     23    0.55     26 × 26     27,600   52    0.19
17 × 17    5       3136     144   4.59     26 × 26     20,736   184   0.89
17 × 17    Total   24,976   1003  4.02     26 × 26     162,336  2043  1.26
18 × 18    1       8704     476   5.47     27 × 27     50,700   1010  1.99
18 × 18    2       8704     335   3.85     27 × 27     50,700   699   1.38
18 × 18    3       5440     145   2.67     27 × 27     32,500   254   0.78
18 × 18    4       5440     28    0.51     27 × 27     32,500   49    0.15
18 × 18    5       4096     163   3.98     27 × 27     24,336   239   0.98
18 × 18    Total   32,384   1147  3.54     27 × 27     190,736  2251  1.18
19 × 19    1       11,016   531   4.82     28 × 28     59,319   1010  1.70
19 × 19    2       11,016   353   3.20     28 × 28     59,319   649   1.09
19 × 19    3       6885     177   2.57     28 × 28     37,908   271   0.71
19 × 19    4       6885     37    0.54     28 × 28     37,908   63    0.17
19 × 19    5       5184     175   3.38     28 × 28     28,561   304   1.06
19 × 19    Total   40,986   1273  3.11     28 × 28     223,015  2297  1.03
20 × 20    1       13,851   604   4.36     29 × 29     68,796   1094  1.59
20 × 20    2       13,851   403   2.91     29 × 29     68,796   729   1.06
20 × 20    3       8721     177   2.03     29 × 29     44,226   240   0.54
20 × 20    4       8721     40    0.46     29 × 29     44,226   63    0.14
20 × 20    5       6561     200   3.05     29 × 29     33,124   374   1.13
20 × 20    Total   51,705   1424  2.75     29 × 29     259,168  2500  0.96
21 × 21    1       17,100   640   3.74     30 × 30     79,576   1208  1.52
21 × 21    2       17,100   443   2.59     30 × 30     79,576   759   0.95
21 × 21    3       10,830   165   1.52     30 × 30     51,156   247   0.48
21 × 21    4       10,830   38    0.35     30 × 30     51,156   67    0.13
21 × 21    5       8100     222   2.74     30 × 30     38,416   446   1.16
21 × 21    Total   63,960   1508  2.36     30 × 30     299,880  2727  0.91
Redundant and insignificant features should also be eliminated from the feature database, since they deteriorate the accuracy as well as hamper the system's efficiency.

GPU (Graphics Processing Unit) computing is the use of a GPU in conjunction with a CPU (Central Processing Unit) to accelerate general-purpose scientific and engineering applications. It offers substantial application performance by offloading the compute-intensive portions of an application to the GPU while the remainder of the code runs on the CPU. GPUs consist of thousands of small, efficient cores designed for parallel workloads. Since video processing is graphics- and compute-intensive, GPUs can significantly accelerate the automatic processing of surveillance video.

In the proposed technique, image scaling is employed to facilitate multi-scale face detection. Skin color filtering and motion segmentation select the pixel locations to which detection is applied. Image slices of the same size as the base detector are generated, processed concurrently by a GPU CUDA kernel, and marked as face or non-face areas. To avoid multiple detections of the same face area, the detection process starts with the lowest scaling factor, so that large faces are detected first and saved. While creating sub-images for the subsequent higher scaling factors, no sub-images are produced from pixels lying within the regions of previously detected faces. This reduces the number of sub-images to be processed at each scaling factor, thereby reducing the memory requirement and enhancing the time efficiency.

This paper is structured as follows: Section 2 provides the literature survey, Section 3 details the proposed technique for face detection using motion and skin color segmentation, Section 4 describes the experimental setup, Section 5 elaborates the experimental results and discussion, and Section 6 presents the conclusions and future scope.

2. Literature survey

Many researchers have worked over the past two decades on detecting and processing facial images from videos. Nasrollahi and Moeslund [6] worked on generating a good-quality frontal face image from a low-resolution video sequence: using the face detector proposed by Viola and Jones [2], features such as head pose, sharpness, brightness, and resolution are extracted, from which the face quality is estimated. In [7], auto-associative memories are used for head pose estimation, after which a high-resolution frontal face image is generated using reconstruction- and learning-based super-resolution techniques. Bagdanov et al. [8] worked on multi-face detection, tracking, logging, and facial image quality analysis. They developed a multi-pose face detector, based on the AdaBoost face detector, using lateral and frontal detectors. As reported, their face-logging system is appropriate for situations in which the face size is bounded and the illumination conditions are consistent with the images used to train the AdaBoost detectors. The system was evaluated on 10 h of realistic surveillance video, with quantitative as well as qualitative analysis.
Throughout the past many years of research on face detection, the Viola–Jones detector [2] has been highly successful. Its authors introduced novel concepts such as the "integral image" representation for efficient feature calculation, the AdaBoost learning algorithm for generating classifiers, and a cascade of classifiers for faster computation. As per the survey in [9], a growing body of research concentrates on appearance-based models for multi-view and rotation-invariant face detection. Wang [10] presented a complete analysis of the Viola–Jones face detection algorithm, together with a learning code and a trained face detector for testing on color images. As the Viola–Jones algorithm involves multiple screenings of the same face area by detectors of different sizes, a post-processing step was proposed to reduce detection redundancy using a robustness argument. Alionte and Lazar [11] described a practical implementation of a Viola–Jones face detector using the Matlab cascade object detector, built on the system object vision.CascadeObjectDetector. Eight face detectors were developed using the trainCascadeObjectDetector function, and their performance was analyzed by tuning the number of cascade layers and the target false alarm rate.

In a color image sequence, using skin color leads to faster face localization and pose estimation. Chen et al. [12] used video object and skin color segmentation for face localization and neural networks for face quality analysis. Kasturi et al. [13] proposed a very robust framework for the performance evaluation of face detection and tracking in surveillance videos. Zhu and Ramanan [14] presented a unified model for face detection, pose estimation, and landmark localization in real-world cluttered images. Sarkar et al. [15] used skin color cues for face region estimation and the subsequent localization of facial landmarks (eyes and mouth) within the detected skin area so as to confirm it as a face region; the testing covered face detection as well as tracking in low-resolution video sequences. Gaba et al. [16] proposed a pixel-oriented, non-parameterized method for moving object identification and tracking from surveillance videos, robust to changes in brightness, dynamic variations in the surrounding environment, and background noise. The system was reported to eliminate ghost objects and was tested on several open-source video databases with a single set of parameters, to overcome the shortcomings of relevant and recently developed techniques. Singh et al. [17] proposed a method based on the Discrete Fractional Fourier Transform (DFrFT) for image change detection with better precision and recall rate, which can serve as an alternative to motion segmentation for finding the regions of change in video frames. Jairath et al. [18] proposed an adaptive skin color model to reduce false positives in AdaBoost-based face detection without affecting the system's time efficiency, tested on a database of five videos containing manually annotated face images. Yan et al. [19] proposed a modified AdaBoost face detector for videos that is invariant to head pose rotation and occlusion, employing self-adaptation in the selection of the region of interest to increase the system's time efficiency.
Fig. 1. Number of motion segmented pixels versus threshold (single frame).

Fig. 2. Motion segmented pixels in video frames for different values of thresholds (curves for Threshold = 0 to 20).
[20] is a US patent on using face and motion detection for selecting the best viewing frames at a video conference endpoint. Guan et al. [21] proposed a face localization method using a fuzzy classifier, Haar features, and YCbCr skin color features; [22] exploited a combination of Haar-like, LBP (Local Binary Patterns), and SURF (Speeded Up Robust Features) descriptors together with PSO (Particle Swarm Optimization) and SVM (Support Vector Machines) for multi-view face detection; and Seyedarabi et al. [23] employed face edge information along with skin color cues in the YCbCr color space to develop fuzzy rule-based classifiers that extract head candidates from an image. Kuo et al. [24] used fuzzy c-means for the color identification of objects of interest in surveillance videos. [25–32] worked on the parallelization of the Viola–Jones face detector using GPUs.

3. Proposed algorithm

The proposed work detects and extracts human facial images from low-resolution surveillance video sequences using Haar-feature and AdaBoost based face detection, employing motion and skin color segmentation for search space reduction and GPU acceleration for faster processing. In a typical surveillance scenario, as a person enters the scene, the system starts detecting and extracting faces from the video frames being captured. The trigger to start detection is the outcome of motion segmentation: if no pixels survive segmentation, detection is not started. To handle the low resolution of faces, two major modifications have been made to Haar-feature based face detection. First, it relies on image scaling instead of detector scaling. Second, base detectors of smaller sizes have been generated and incorporated into the detection process. A further modification, introduced for improved efficiency, is the use of the product of the classifier window parameter and the True Detection Rate (TDR) for the generation and application of the AdaBoost detection cascade. The TDR is computed during training as the ratio of the total number of true detections, including faces as well as non-faces, to the total number of example images. The MIT dataset of faces (2429) and non-faces (4547) has been used for training the detectors. The detectors have been trained on example images of size 19 × 19 and their scaled versions in the range 13 × 13 to 30 × 30. Multi-scale face detection is handled by the band of trained detectors with sizes from 13 × 13 to 30 × 30 (refer to Table 1); faces in this range are detected by the corresponding individual trained classifiers, as sketched below.
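To make the detector-band strategy concrete, the following host-side C++ sketch (our illustration, not the authors' code; the struct and function names are hypothetical) maps an estimated face size, obtained from the segmented-region analysis described below, either to one of the pre-trained base detectors (for faces of 13–30 pixels) or to an image scaling factor that brings a larger face down to the 30 × 30 detector.

// Illustrative sketch only: given an estimated face size in pixels, pick
// either a pre-trained base detector (13..30 px) or an image down-scaling
// factor so that the 30x30 detector fits the face. Names are hypothetical.
#include <algorithm>

struct DetectionPlan {
    int   baseSize;  // side of the pre-trained detector to apply (13..30)
    float scale;     // factor by which the frame is resized before detection
};

DetectionPlan planForFaceSize(int estFacePx) {
    if (estFacePx <= 30) {
        // Faces from 13x13 to 30x30 are handled directly by the matching
        // member of the classifier band, with no image resizing.
        return { std::max(estFacePx, 13), 1.0f };
    }
    // Larger faces: shrink the image so the face maps onto the 30x30
    // detector (image scaling instead of Viola-Jones detector scaling).
    return { 30, 30.0f / static_cast<float>(estFacePx) };
}

The design choice mirrors the text: the detectors themselves are never resized; only the image is.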
Fig. 3. Processing without motion segmentation.
Fig. 4. Processing with motion segmentation.
However, to facilitate the detection of faces larger than 30 × 30, image scaling is performed (unlike the detector scaling of the Viola–Jones algorithm), and the detector of size 30 × 30 is applied for the subsequent detection of larger faces. The lower and upper values of the scaling factor are computed from several factors: the size of the input image, the minimum and maximum sizes of the facial images to be detected, and the maximum image size that the CUDA kernel can handle. Motion segmentation and skin color filtering serve both for search space reduction and for selecting the lower and upper scaling factors: the areas of the regions bounded by the motion- and skin-segmented pixels are computed and analyzed to estimate the smallest and largest face sizes in the input image or video frame, from which the corresponding scaling factor bounds follow. Motion segmentation is performed by inter-frame differencing and comparison with a threshold value; pixels whose difference exceeds the set threshold are marked as motion segmented. Skin color segmentation is performed in the YCbCr color space according to 97.5 ≤ Cb ≤ 142.5 and 17 ≤ Cr ≤ 134. Both motion segmentation and skin color filtering are done at the native video frame resolution; the locations of segmented pixels are translated to the scaled versions by multiplying their x and y co-ordinates by the same factor by which the image is scaled.
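A minimal CUDA kernel sketch of the per-pixel segmentation described above follows; it is illustrative rather than the authors' implementation, the buffer names and layouts are our assumptions, and the chroma planes are assumed to be pre-converted to floating point. The skin thresholds are those stated in the text.

// Illustrative CUDA kernel (a sketch, not the paper's code): one thread per
// pixel marks a location as a candidate when the inter-frame luminance
// difference exceeds a threshold AND the chroma values fall in the skin
// range given in the text. Launch e.g. with
// segmentMotionSkin<<<(numPixels + 255) / 256, 256>>>(...).
__global__ void segmentMotionSkin(const unsigned char* prevY,
                                  const unsigned char* curY,
                                  const float* Cb, const float* Cr,
                                  unsigned char* mask,
                                  int numPixels, int motionThresh)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numPixels) return;

    // Inter-frame difference on the luminance plane.
    int diff = (int)curY[idx] - (int)prevY[idx];
    if (diff < 0) diff = -diff;
    bool moved = diff > motionThresh;

    // Skin range in YCbCr as given in the text:
    // 97.5 <= Cb <= 142.5 and 17 <= Cr <= 134.
    bool skin = (Cb[idx] >= 97.5f && Cb[idx] <= 142.5f) &&
                (Cr[idx] >= 17.0f && Cr[idx] <= 134.0f);

    mask[idx] = (moved && skin) ? 1 : 0;
}

Since segmentation runs at native resolution, a surviving coordinate (x, y) is mapped into a frame scaled by factor s simply as (s·x, s·y), as described above.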
Fig. 5. Time comparison with and without motion and skin color segmentation.
Therefore, the number of segmented pixels remains largely independent of the value of the scaling factor during the upward scaling in the detection process. Algorithm 1 shows the working of the GPU accelerated face detection process.

Algorithm 1. GPU accelerated face detection process
1: Load the first frame of the video (pFrame)
2: Get the size of the video frame; NR: number of rows, NC: number of columns
3: Compute the lowest scaling factor (LR), the upper scaling factor (UR), and N = number of scaling factors
4: Generate the matrix of shift intervals (SL) based on the number and values of the scaling factors
5: Load the classifier array of the Haar-feature detection cascade and upload it to the GPU
6: Initialize and load the CUDA kernel that performs detection in parallel on sub-images; initialize and load the kernel constants
7: Loop1: Load the current frame of the video (cFrame)
8: Extract the motion and skin color segmented pixels as vectors XMS and YMS holding, respectively, the x and y co-ordinates of all pixels in the area where motion occurred
9: Initialize the array to hold the locations of detected faces
10: Loop2: For multi-scale face detection, i = 1:N
11: Scale the image and the segmented pixel vectors XMS, YMS: img1 = Img · i · LR, XMSi = XMS · i · LR, YMSi = YMS · i · LR
12: Determine xMax, yMax, the maximum pixel locations for sub-image generation based on the size of the scaled image; xMin = yMin = 2
13: Eliminate the values of XMSi and YMSi that are out of the ranges xMin to xMax and yMin to yMax
14: Initialize the array to hold the sub-images to be generated
15: Loop3: Scan the pixel locations from XMSi and YMSi for sub-image generation with shift interval SL(i)
16: if (X, Y) lies in a previously detected face area then
17: skip the rest and go to the next segmented pixel location (Loop3)
18: Create the sub-image, compute its integral image, and add it to the array of sub-images (integral versions)
19: Iterate Loop3 for the next segmented pixel location
20: Invoke the CUDA kernel object and pass the generated array of sub-images (integral versions) to detect faces
21: Update the array of detected faces for scaling factor = i · LR
22: Iterate Loop2 for the next scaling factor
23: All faces detected in the current frame; set pFrame = cFrame (previous frame = current frame) and iterate Loop1 for the next frame
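The following host-side sketch illustrates Loops 2–3 of Algorithm 1 for a single scaling factor (a simplified, hypothetical rendering: the shift-interval stepping and the GPU upload are omitted, and all types and helper names are ours). Sub-images, stored as integral images, are gathered only at segmented pixels that do not fall inside an already detected face.

// Host-side sketch of Loops 2-3 of Algorithm 1 (illustrative only).
#include <cstdint>
#include <utility>
#include <vector>

struct Rect { int x, y, w, h; };
struct SubImage { int x, y; std::vector<uint32_t> integral; };  // dws*dws values

static bool insideDetectedFace(int x, int y, const std::vector<Rect>& faces) {
    for (const Rect& f : faces)
        if (x >= f.x && x < f.x + f.w && y >= f.y && y < f.y + f.h)
            return true;
    return false;
}

// frame: scaled grayscale image (rows x cols, row-major); xs/ys: segmented
// pixel co-ordinates already translated to this scale; dws: detector window.
std::vector<SubImage> gatherSubImages(const std::vector<uint8_t>& frame,
                                      int rows, int cols, int dws,
                                      const std::vector<int>& xs,
                                      const std::vector<int>& ys,
                                      const std::vector<Rect>& detected)
{
    std::vector<SubImage> batch;
    for (size_t k = 0; k < xs.size(); ++k) {
        int x = xs[k], y = ys[k];
        if (x + dws > cols || y + dws > rows) continue;    // out of range
        if (insideDetectedFace(x, y, detected)) continue;  // prune (steps 16-17)
        SubImage s{x, y, std::vector<uint32_t>(dws * dws, 0)};
        // Standard integral image over the dws x dws window:
        // I(r,c) = pixel + I(r-1,c) + I(r,c-1) - I(r-1,c-1).
        for (int r = 0; r < dws; ++r)
            for (int c = 0; c < dws; ++c) {
                uint32_t up   = r ? s.integral[(r - 1) * dws + c] : 0;
                uint32_t left = c ? s.integral[r * dws + (c - 1)] : 0;
                uint32_t diag = (r && c) ? s.integral[(r - 1) * dws + (c - 1)] : 0;
                s.integral[r * dws + c] =
                    frame[(y + r) * cols + (x + c)] + up + left - diag;
            }
        batch.push_back(std::move(s));
    }
    return batch;  // the whole batch is then handed to the CUDA kernel at once
}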
The proposed technique is based on Haar features, which form a window-based representation. To achieve parallelization, the array of sub-images is generated for the segmented pixels only, and their integral versions are computed. No sub-images are created for pixel locations found to lie within a previously detected face area for any of the scaling factors processed so far. The generation of the sub-images and of their integral versions, as well as their classification, are each handled in parallel by a separate CUDA kernel. After the threshold values of all features have been computed for a particular sub-image within the device code, the sub-image is parsed by the detector cascade within the kernel itself and marked as face or non-face. Algorithm 2 shows the complete working of the GPU CUDA kernel that processes, in parallel, the sub-images generated from the motion and skin color segmented pixels.

Algorithm 2. CUDA kernel of the GPU accelerated face detection process
1: Load the kernel constants: detector window size (DWS), number of weak classifiers (NCF), number of stages of the cascade kernel, array of features (F), thresholds (TH)
2: Load the kernel arguments: pointer to the array of sub-images
3: Copy the classifiers to shared memory for better efficiency
4: Load the sub-image index: subImgIdx = blockIdx.x
5: Compute the absolute base address for pixel operations: BaseAddress = subImgIdx · DWS · DWS
6: Get the feature number (fNum) from the thread index: fNum = threadIdx.x
7: Array holding the signs of the terms containing the sums of sub-rectangles: sign[16] = {1, −1, −1, 1, −1, 1, 1, −1, 1, −1, −1, 1, −1, 1, 1, −1}
8: Initialize the shared memory array holding the binary results (face or no-face) of the NCF decision stumps
9: Loop1: while (fNum < fMax) (using thread re-use)
10: Fetch the co-ordinates of the sub-rectangles from the feature array and compute the Haar feature value
11: Compare it against the lower and upper threshold bounds (LL and UL)
12: Update the face/no-face array
13: fNum = fNum + blockDim.x; go back to Loop1
14: Loop2: for i = 1:numLevels (operation of the detection cascade)
15: Compute the threshold value V(i) of stage number i
16: if V(i) > TH(i) then
17: go to Loop2, to iterate to the next stage of the detection cascade
18: else status(subImgIdx) = 0 (detected as non-face); return
19: if i == numLevels then
20: status(subImgIdx) = 1 (detected as face); return
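For illustration, a simplified CUDA kernel in the spirit of Algorithm 2 is sketched below. It is not the paper's kernel: the data layouts, the Feature structure, and MAX_WEAK are our assumptions, and the sign table of step 7 is folded into per-rectangle weights. One thread block evaluates one integral sub-image; threads stride over the weak classifiers (the thread re-use of Loop 1), and thread 0 then runs the stage cascade of Loop 2.

#define MAX_WEAK 1024  // assumed upper bound on weak classifiers per detector

struct Feature {                 // one weak classifier (decision stump)
    int   rect[4][4];            // up to 4 sub-rectangles: x, y, w, h
    float weight[4];             // signed weights; unused slots carry zero
    float lo, hi;                // stump fires when lo <= value <= hi
    int   stage;                 // cascade stage the stump belongs to
};

__device__ int iiAt(const unsigned int* ii, int dws, int c, int r)
{
    // Integral image lookup with implicit zero padding above and left.
    return (c < 0 || r < 0) ? 0 : (int)ii[r * dws + c];
}

__device__ float rectSum(const unsigned int* ii, int dws,
                         int x, int y, int w, int h)
{
    // Pixel sum over [x, x+w) x [y, y+h) via the four-corner formula.
    return (float)(iiAt(ii, dws, x + w - 1, y + h - 1)
                 - iiAt(ii, dws, x - 1,     y + h - 1)
                 - iiAt(ii, dws, x + w - 1, y - 1)
                 + iiAt(ii, dws, x - 1,     y - 1));
}

__global__ void classifySubImages(const unsigned int* subImages,  // nSub x dws*dws
                                  const Feature* feats, int nFeats,  // <= MAX_WEAK
                                  const float* stageThresh, int numLevels,
                                  int dws, unsigned char* status)
{
    __shared__ float vote[MAX_WEAK];
    const unsigned int* ii = subImages + (size_t)blockIdx.x * dws * dws;

    // Loop 1 of Algorithm 2: threads stride over the stumps (thread re-use).
    for (int f = threadIdx.x; f < nFeats; f += blockDim.x) {
        float v = 0.0f;
        for (int r = 0; r < 4; ++r)
            v += feats[f].weight[r] *
                 rectSum(ii, dws, feats[f].rect[r][0], feats[f].rect[r][1],
                                  feats[f].rect[r][2], feats[f].rect[r][3]);
        vote[f] = (v >= feats[f].lo && v <= feats[f].hi) ? 1.0f : 0.0f;
    }
    __syncthreads();
    if (threadIdx.x != 0) return;

    // Loop 2 of Algorithm 2: stage-by-stage cascade with early reject.
    for (int s = 0; s < numLevels; ++s) {
        float V = 0.0f;
        for (int f = 0; f < nFeats; ++f)
            if (feats[f].stage == s) V += vote[f];
        if (V <= stageThresh[s]) { status[blockIdx.x] = 0; return; }  // non-face
    }
    status[blockIdx.x] = 1;  // survived every stage: face
}

Copying the classifier array into shared memory, as step 3 of Algorithm 2 prescribes, is omitted here for brevity.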
Fig. 6. Result of applying proposed algorithm on test video frames (1).
4. Experimental setup

A machine running Windows 8.1 (64-bit) on an Intel Core i3 at 1.9 GHz with an NVIDIA GeForce GT 740M graphics processing unit has been used to develop and test the proposed method. Development and testing were done in MATLAB version 8.2.0.701 (R2013b). The CUDA files are compiled into PTX (Parallel Thread Execution) code files by the CUDA compiler, release 6.0 (V6.0.1), with architecture support for compute capability 3.5. The PTX code is used to create and load the CUDA kernel object in the MATLAB workspace.

5. Results and discussion

The proposed algorithm is based on Haar-feature and AdaBoost face detection in conjunction with motion and skin color segmentation and GPU acceleration. In addition, the generation of sub-images is effectively reduced by eliminating pixel locations lying within the area of previously detected faces. A considerable reduction of the scanning area has been achieved by using motion and skin color segmentation. Fig. 1 shows the number of motion segmented pixels as a function of the threshold set on the inter-frame difference. Fig. 2 shows the effect of varying the threshold on the number of motion-segmented pixels across the frames of a test video. Table 2 gives the percentage search space reduction without motion segmentation, and Table 3 the reduction with motion segmentation. Tables 2 and 3 analyze in detail the reduction in the number of sub-images to be generated in each case, for all the scaling factors applied while processing a video frame from the test video. The corresponding processed frames, without and with motion segmentation, are shown in Figs. 3 and 4, respectively; comparing the two shows that motion segmentation has helped reduce false positives. The processing time per scaling factor was measured with and without motion segmentation; Fig. 5 shows the comparison. A maximum speedup of 168.6 was achieved at the scaled frame size of 570 × 760 and a minimum of 1.39 at frame size 38 × 51 (the base video frame size is 480 × 640). The proposed algorithm has been tested on seven test videos (V1–V7), each comprising 200 frames taken from low-resolution surveillance footage in the ChokePoint dataset [33]. The videos have been scaled to sizes ranging from 75 × 100 to 600 × 800.
Table 2
The number of sub-images generated for different scaling factors without using motion segmentation.

Scaling factor   Frame size     Total number of sub-images   Sub-images generated   Percentage reduction
0.08             38 × 51        464          464       0.00
0.16             76 × 102       4320         4320      0.00
0.24             114 × 152      11,960       11,960    0.00
0.32             152 × 203      23,530       23,530    0.00
0.40             190 × 254      38,976       38,976    0.00
0.48             228 × 304      58,092       58,092    0.00
0.55             266 × 355      81,252       81,252    0.00
0.63             304 × 406      108,288      108,288   0.00
0.71             342 × 456      138,880      138,880   0.00
0.79             380 × 507      173,630      173,630   0.00
0.87             418 × 558      212,256      212,256   0.00
0.95             456 × 608      254,324      254,324   0.00
1.03             494 × 659      300,664      284,121   5.50
1.11             532 × 710      350,880      285,660   18.59
1.19             570 × 760      404,424      286,281   29.21
1.27             608 × 811      462,354      287,826   37.75
1.35             646 × 862      524,160      288,570   44.95
1.43             684 × 912      589,180      289,194   50.92
1.50             722 × 963      658,700      291,374   55.77
1.58             760 × 1014     732,096      292,467   60.05
1.66             798 × 1064     808,592      292,467   63.83
1.74             836 × 1115     889,702      293,562   67.00
1.82             874 × 1166     974,688      294,030   69.83
1.90             912 × 1216     1,062,660    294,659   72.27
1.98             950 × 1267     1,155,360    294,659   74.50
2.06             988 × 1318     1,251,936    295,758   76.38
2.14             1026 × 1368    1,351,384    295,758   78.11
2.22             1064 × 1419    1,455,674    295,758   79.68
2.30             1102 × 1470    1,563,840    296,859   81.02
2.38             1140 × 1520    1,674,764    296,738   82.28
Table 3
Percentage reduction in search space versus scaling factor by using motion segmentation. SF: scaling factor.

SF      Frame size     Total sub-images   Sub-images generated (motion segmented)   Percentage reduction
0.08    38 × 51        464          35     92.46
0.16    76 × 102       4320         111    97.43
0.24    114 × 152      11,960       178    98.51
0.32    152 × 203      23,530       260    98.90
0.40    190 × 254      38,976       325    99.17
0.48    228 × 304      58,092       412    99.29
0.55    266 × 355      81,252       461    99.43
0.63    304 × 406      108,288      577    99.47
0.71    342 × 456      138,880      624    99.55
0.79    380 × 507      173,630      674    99.61
0.87    418 × 558      212,256      781    99.63
0.95    456 × 608      254,324      841    99.67
1.03    494 × 659      300,664      899    99.70
1.11    532 × 710      350,880      899    99.74
1.19    570 × 760      404,424      899    99.78
1.27    608 × 811      462,354      899    99.81
1.35    646 × 862      524,160      782    99.85
1.43    684 × 912      589,180      782    99.87
1.50    722 × 963      658,700      782    99.88
1.58    760 × 1014     732,096      782    99.89
1.66    798 × 1064     808,592      782    99.90
1.74    836 × 1115     889,702      782    99.91
1.82    874 × 1166     974,688      782    99.92
1.90    912 × 1216     1,062,660    782    99.93
1.98    950 × 1267     1,155,360    782    99.93
2.06    988 × 1318     1,251,936    782    99.94
2.14    1026 × 1368    1,351,384    782    99.94
2.22    1064 × 1419    1,455,674    782    99.95
2.30    1102 × 1470    1,563,840    782    99.95
2.38    1140 × 1520    1,674,764    782    99.95
Fig. 7. Result of applying proposed algorithm on test video frames (2).

Table 4
Performance comparison with the original Viola–Jones algorithm: detection rate (DR), false positives (FP), and processing speed in frames per second (FPS).

Video   Frame size    Viola–Jones (CPU)           Proposed algorithm (GPU)    GPU acceleration
                      DR     FP   Speed (FPS)     DR     FP   Speed (FPS)
V1      75 × 100      63.5   9    14.00           64.5   7    99.51           7.11
V2      92 × 112      71.5   8    13.25           73     7    74.53           5.62
V3      108 × 144     79.5   6    11.51           82     5    43.12           3.75
V4      168 × 224     84.5   5    8.47            87     4    38.99           4.60
V5      240 × 320     88     5    5.79            90.5   4    24.34           4.20
V6      480 × 640     92     4    1.80            93.5   3    7.27            4.04
V7      600 × 800     94     2    1.14            95.5   1    3.16            2.77
The performance of the proposed algorithm has been compared with the original Viola–Jones algorithm [2] (MATLAB support). Figs. 6–8 show the results of applying the proposed algorithm to frames from the various test videos. Table 4 compares the proposed algorithm with the original Viola–Jones algorithm in terms of detection rate, false positives, and processing speed. From the results, it is inferred that promising results have been achieved in both processing speed and detection accuracy on the test videos. The maximum GPU-over-CPU speedup, of the order of 7.11 (99.51 versus 14.00 FPS), occurs for the test video with frame size 75 × 100, and the minimum, 2.77, at size 600 × 800.

6. Conclusions and future scope

In the proposed system, we have developed an algorithm for the detection and extraction of faces from low-resolution surveillance videos by applying motion segmentation, skin color filtering, and GPU acceleration. Inter-frame differencing has been used for motion segmentation and the YCbCr model for skin color segmentation.
Fig. 8. Result of applying proposed algorithm on test video frames (3).
The processing of the segmented pixels has been parallelized through GPU computing for face detection using a modified Haar-feature and AdaBoost based technique. The lower and upper values of the scaling factor are determined dynamically by analyzing the areas of the regions formed by segmentation. Furthermore, to facilitate multi-scale face detection, we have used a band of trained classifiers with base sizes in the range 13 × 13 to 30 × 30; image scaling is applied to detect faces larger than 30 × 30. Testing has been performed on videos from the ChokePoint dataset [33], detecting low-resolution faces of size of the order of 8 × 8 pixels. The performance analysis covers the time-efficiency improvement obtained through motion and skin color segmentation and GPU acceleration, as well as the detection accuracy. From the results achieved, we contend that the proposed algorithm is very effective for the detection and extraction of faces from low-resolution surveillance videos. The proposed system can be improved by adapting the threshold level used to generate the motion-segmented pixels, and by adaptive selection of the weights in the threshold-comparison step of AdaBoost classification. We intend to improve the algorithm further by handling faces with severe head poses and occlusions, enhancing the quality of degraded videos, and integrating the proposed algorithm into an overall video surveillance system using face biometrics.
Acknowledgements

The authors gratefully acknowledge the support provided by I.K. Gujral Punjab Technical University, Kapurthala, Punjab, India, in carrying out this research. They also deeply acknowledge the immense help received from the scholars whose articles are cited in the references of this manuscript, and are grateful to the authors, editors, and publishers of all the articles, journals, and books from which the literature for this article has been reviewed. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References

[1] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1, IEEE, 2001.
[2] P. Viola, M.J. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2) (2004) 137–154.
[3] A. Egorov, A. Shtanko, P. Minin, Selection of Viola–Jones algorithm parameters for specific conditions, Bull. Lebedev Phys. Inst. 42 (8) (2015) 244–248.
[4] A.P. Mena, M.B. Mayoral, E. Díaz-López, Comparative study of the features used by algorithms based on Viola and Jones face detection algorithm, in: Bioinspired Computation in Artificial Systems, Springer, 2015, pp. 175–183.
[5] S. Pandey, S. Sharma, An optimistic approach for implementing Viola–Jones face detection algorithm in database system and in real time, in: International Journal of Engineering Research and Technology, vol. 4, ESRSA Publications, 2015.
[6] K. Nasrollahi, T.B. Moeslund, Extracting a good quality frontal face image from a low-resolution video sequence, IEEE Trans. Circ. Syst. Video Technol. 21 (10) (2011) 1353–1362.
[7] N. Gourier, J. Maisonnasse, D. Hall, J. Crowley, Head pose estimation on low resolution images, Multimodal Technol. Percept. Hum. (2007) 270–280.
[8] A.D. Bagdanov, A. Del Bimbo, F. Dini, G. Lisanti, I. Masi, Compact and efficient posterity logging of face imagery for video surveillance, IEEE Multimed. 19 (4) (2012) 48–59.
[9] R. Belaroussi, M. Milgram, A comparative study on face detection and tracking algorithms, Expert Syst. Appl. 39 (8) (2012) 7158–7164.
[10] Y.-Q. Wang, An analysis of the Viola–Jones face detection algorithm, Image Process. On Line 4 (2014) 128–148.
[11] E. Alionte, C. Lazar, A practical implementation of face detection by using Matlab cascade object detector, in: 2015 19th International Conference on System Theory, Control and Computing (ICSTCC), IEEE, 2015, pp. 785–790.
[12] T.-W. Chen, S.-C. Hsu, S.-Y. Chien, Automatic feature-based face scoring in surveillance systems, in: Ninth IEEE International Symposium on Multimedia (ISM 2007), IEEE, 2007, pp. 139–146.
[13] R. Kasturi, D. Goldgof, P. Soundararajan, V. Manohar, J. Garofolo, R. Bowers, M. Boonstra, V. Korzhova, J. Zhang, Framework for performance evaluation of face, text, and vehicle detection and tracking in video: data, metrics, and protocol, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) (2009) 319–336.
[14] X. Zhu, D. Ramanan, Face detection, pose estimation, and landmark localization in the wild, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 2879–2886.
[15] R. Sarkar, S. Bakshi, P.K. Sa, A real-time model for multiple human face tracking from low-resolution surveillance videos, Proc. Technol. 6 (2012) 1004–1010.
[16] N. Gaba, N. Barak, S. Aggarwal, Motion detection, tracking and classification for automated video surveillance, in: IEEE International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES), IEEE, 2016, pp. 1–5.
[17] S. Singh, K. Singh, Image change detection by means of discrete fractional Fourier transform, Int. J. Comput. Appl. 77 (16) (2013).
[18] S. Jairath, S. Bharadwaj, M. Vatsa, R. Singh, Adaptive skin color model to improve video face detection, in: Machine Intelligence and Signal Processing, Springer, 2016, pp. 131–142.
[19] S. Yan, H. Wang, Z. Fang, C. Wang, A face detection method combining improved AdaBoost algorithm and template matching in video sequence, in: 2016 8th International Conference on Intelligent Human–Machine Systems and Cybernetics (IHMSC), vol. 2, IEEE, 2016, pp. 231–235.
[20] G.R.G. Aarrestad, R.Ø. Aas, K. Tangeland, Use of face and motion detection for best view framing in video conference endpoint, US Patent App. 15/059,386 (Mar. 3, 2016).
[21] C.-N. Guan, C.-F. Juang, G.-C. Chen, Face localization using fuzzy classifier with wavelet-localized focus color features and shape features, Digit. Signal Process. 22 (6) (2012) 961–970.
[22] H. Pan, Y. Zhu, L. Xia, Efficient and accurate face detection using heterogeneous feature descriptors and feature selection, Comput. Vis. Image Understand. 117 (1) (2013) 12–28.
[23] H. Seyedarabi, S.M. Bakhshmand, S. Khanmohammadi, Multi-pose head tracking using colour and edge features fuzzy aggregation for driver assistant system, in: 2009 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), IEEE, 2009, pp. 385–390.
[24] J.Y. Kuo, T.Y. Lai, F.-C. Huang, K. Liu, The color recognition of objects of survey and implementation on real-time video surveillance, in: 2010 IEEE International Conference on Systems Man and Cybernetics (SMC), IEEE, 2010, pp. 3741–3748.
[25] V. Mutneja, S. Singh, Modified Viola–Jones algorithm with GPU accelerated training and parallelized skin color filtering-based face detection, J. Real-Time Image Process. (2017) 1–21.
[26] W. Wang, Y. Zhang, S. Yan, Y. Zhang, H. Jia, Parallelization and performance optimization on face detection algorithm with OpenCL: a case study, Tsinghua Sci. Technol. 17 (3) (2012) 287–295.
[27] J. Kong, Y. Deng, GPU accelerated face detection, in: 2010 International Conference on Intelligent Control and Information Processing (ICICIP), IEEE, 2010, pp. 584–588.
[28] G. Wei, C. Ming, The face detection system based on GPU + CPU desktop cluster, in: 2011 International Conference on Multimedia Technology (ICMT), IEEE, 2011, pp. 3735–3738.
[29] D. Oro, C. Fernández, C. Segura, X. Martorell, J. Hernando, Accelerating boosting-based face detection on GPUs, in: 2012 41st International Conference on Parallel Processing (ICPP), IEEE, 2012, pp. 309–318.
[30] H. Jia, Y. Zhang, W. Wang, J. Xu, Accelerating Viola–Jones face detection algorithm on GPUs, in: 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), IEEE, 2012, pp. 396–403.
[31] B. Bilgic, B.K. Horn, I. Masaki, Efficient integral image computation on the GPU, in: 2010 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2010, pp. 528–533.
[32] E. Li, B. Wang, L. Yang, Y.-T. Peng, Y. Du, Y. Zhang, Y.-J. Chiu, GPU and CPU cooperative acceleration for face detection on modern processors, in: 2012 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2012, pp. 769–775.
[33] Y. Wong, S. Chen, S. Mau, C. Sanderson, B.C. Lovell, Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition, in: IEEE Biometrics Workshop, Computer Vision and Pattern Recognition (CVPR) Workshops, IEEE, 2011, pp. 81–88.