Optik 127 (2016) 535–538


Cross border intruder detection in hilly terrain in dark environment

Jagdish Lal Raheja a, Swati Deora a, Ankit Chaudhary b,∗

a Machine Vision Lab, Digital Systems Group, CSIR–CEERI, Pilani, RJ, India
b Pilani, RJ, India

Article history: Received 10 September 2014; Accepted 29 August 2015

Keywords: Intruder detection; Cross border; HMM; Feature extraction; MS Kinect; Human surveillance

Abstract

Automatic surveillance is an important research area and has been studied for many years. In this paper, a new method of cross border intrusion detection in hilly regions is discussed. The Kinect camera ensures that intruders crossing the border can be detected during daytime as well as at night. We introduce a border surveillance system that is able to recognize intruder actions such as standing, walking, crawling, and bending in illuminated as well as dark conditions. The system is able to detect whether a moving object is a human being or an animal and activates an alarm if it detects human movement. The system also works well in plain as well as hilly terrain. Using the skeletal tracking application provided by the Kinect console, the actions are classified and recognized. The HMM-based classification makes the system robust and thus a versatile component for other applications. The proposed system achieves an overall detection accuracy of 92%. © 2015 Elsevier GmbH. All rights reserved.

1. Introduction

Protecting a country's border is a vital task for homeland security. One of the biggest challenges of border security is securing border regions of sheer magnitude. Countries like Bharat (India) have long land borders, and intrusion is a long-standing problem. Security personnel work round the clock in terrain ranging from snow to dense forest and from sharp hills to desert. During the day, visual monitoring is helpful; however, it is very difficult to detect potential intruders or smugglers in total darkness or other low-light conditions. Europe and the USA face similar intrusion issues. The security industry is growing at a rapid rate due to the increase in crime over the past few years. Intrusion detection devices have been used extensively for security purposes in order to detect the entry of unauthorized persons into a protected zone [1]. Various video surveillance systems with alarm systems are installed at public and private places; video surveillance is a great security solution for monitoring suspicious activities. There exist traditional vision-based approaches that detect intruders using colored image sequences, but their results are restricted by illumination conditions and non-stationary backgrounds [2]. Moreover, those that are able to work in dark conditions are very costly. Like other security projects,

∗ Corresponding author. E-mail addresses: [email protected] (J.L. Raheja), [email protected] (S. Deora), [email protected] (A. Chaudhary). http://dx.doi.org/10.1016/j.ijleo.2015.08.234 0030-4026/© 2015 Elsevier GmbH. All rights reserved.

the concept of achieving control over border areas is based on the organized interaction of smart sensors and guards, and its key is an accurate alarm system that guards can rely on. The existing sensors and radars used at borders are efficient, but due to foliage or animals they often trigger nuisance alarms instead of actually detecting human intruders; these sensors are not able to differentiate between humans and animals and thus frequently send wrong signals. Recently, the rapid growth of depth sensors like the Microsoft Kinect, an easily available and low-cost device, has provided enough information about the depth coordinates of the full set of human body joints for real-time full-body tracking, which enables us to explore the feasibility of skeleton-based features for cross border intruder detection. Real-time human action recognition is a challenging and complex task due to changes in human appearance, dress, height, and size, and changes in illumination [3–5]. To overcome these problems, we have used the Kinect and its SDK, which extracts only the skeleton from the full frame [6,7]. As it is based on an infrared (IR) camera and depth calculation, the illumination problem is eliminated. This paper explores the possibility of using the Kinect for cross border intruder detection for surveillance purposes. The proposed system is able to differentiate between humans and animals and thus avoids false activation of the alarm. The depth approximation feature added to the system triggers the alarm only after the detected human movement crosses a certain limit. With the proposed system, there is a greater opportunity to ensure that when the guards are alerted, they are not wasting time chasing false alarms. Moreover, the system works well in plain


regions as well as in hilly terrain. Fig. 1 shows the block diagram of the system.

[Fig. 1. Block diagram of the system: Image Capturing → Skeleton Tracking → Feature Extraction → Dimension Reduction → HMM Classification.]

2. Related work

Surveillance has been an active research field for a long time, and several camera systems have been designed to keep a check on intruders. Vittal [8] used thermal cameras that continuously scan the corresponding areas of the border and are connected to a digital signal processing unit, which compares the captured images with a reference image and previously captured images; any change in successive images indicates dubious movement, which is immediately reported by wireless communication. The system could not differentiate between animal and human intrusion and thus produced false alarms; moreover, it was unable to detect intruder movements. Barry [9] used a radar system for intrusion detection, which makes the system costly and requires installation of new equipment. The Kinect is a motion sensing device originally designed as a natural user interface, and it has the ability to capture depth data [10]. A lot of work has already been reported on human action recognition. Raheja [11] and Biswas [12] used the Kinect for recognition and interpretation of 3D scene information using projected infrared structured light; the Kinect allows capturing both skeleton information and the depth image. Sinha [13] and Raheja [14] proposed person identification methods that capture human skeleton data in real time for arbitrary walking patterns and store the skeleton joint coordinates traversing 3D space as the feature vector; static and dynamic features are extracted from the raw skeleton data for each person in order to identify that person, so the approach is limited to a specific number of persons. Lai [15] used a covariance matrix of carefully selected features that provides a remarkably discriminative representation for action recognition, but such features are prone to temporal misalignment and do not form a vector space. Monir [16] proposed a method of human posture identification in which a set of vectors and angles computed from the skeletal data represents the body postures; the classifier is trained by storing those angles for different postures, and while testing, three different matrices are applied to the candidate posture for comparison with the training postures. Although the angular representation of the data makes the system robust, the approach is more complex and time consuming. Kumar [4] used quaternion-based directional features in order to gain position and orientation invariance; speed and size invariance is achieved by dynamic time warping [17]. The dataset was created using both depth data and RGB data, which makes the dataset heavy. Several classification methods have been used to classify human actions. Patsadu [18] compared different classification methods: back-propagation neural network [5], support vector machine [3,19], decision tree, and Naive Bayes. Wang [2] uses a Hidden Markov model (HMM) for classification of different states, and there are good algorithms for training and recognizing HMMs [2,19]. When multi-class probability estimation [4] is used for classification, the HMM outperforms support vector machines in recognition performance.

3. Proposed method for intruder detection

The proposed approach for intruder action recognition uses Kinect skeletal tracking to recognize activities performed by humans [20]. The skeletal joint sequence obtained through the Kinect is an effective representation of the activities; hence, we consider the sequence of 20 human joints captured by the Kinect for feature extraction. The dataset used in this study was acquired with the Kinect and the Kinect SDK [21]. The Kinect was placed on a pole at a height of 0.8 meters with a tilt of −10◦. We collected a dataset for four possible ways of moving in a hilly region: walking, bending, crawling, and standing, with three different subjects for each. The skeletal tracking for these patterns is shown in Fig. 2. Each gesture was repeated five times by each subject. Experimentally, we found that joint features perform better than any other feature choice for our dataset. In real-world applications, different persons perform an activity at different rates in each frame; to accommodate all possible scenarios and make the system more robust, we decided to track 20 joints [22]. Thus, we have a very large feature space that is nonlinearly separable and involves complex classification tasks.
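As a small illustrative sketch (not code from the paper), the per-frame skeleton data described above — 20 joints, each an (x, y, z) coordinate — can be flattened into a 60-value raw vector, and the recording protocol (four actions, three subjects, five repetitions each) fixes the dataset size. All names here are hypothetical:

```python
import numpy as np

# One Kinect skeleton frame: 20 joints, each with (x, y, z) -> 60 raw values.
N_JOINTS = 20

def frame_to_vector(joints):
    """Flatten a (20, 3) array of joint coordinates into a 60-value vector."""
    joints = np.asarray(joints, dtype=float)
    assert joints.shape == (N_JOINTS, 3)
    return joints.reshape(-1)

# Recording protocol described above: 4 actions x 3 subjects x 5 repetitions.
actions = ["walking", "bending", "crawling", "standing"]
n_sequences = len(actions) * 3 * 5
print(n_sequences)  # 60 recorded gesture sequences

v = frame_to_vector(np.zeros((N_JOINTS, 3)))
print(v.shape)  # (60,)
```

The 60 values per frame are the raw input from which the higher-level features of the next section are derived.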

Fig. 2. (a) Bending, (b) crawling, (c) walking, (d) standing.


In our study, we used a Hidden Markov model for classification and principal component analysis for reducing the dimensionality of the high-dimensional dataset. Fig. 3 shows the working flow of the algorithm.

[Fig. 3. Flowchart showing feature extraction and classification: Depth Data Acquisition → Depth Calculation → Depth ≥ TH? (Yes: Alarm ON; No: continue); Feature Extraction → Dimensionality Reduction → Pattern Matching and Classification (with the trained HMM) → Recognized Action.]

A. Feature extraction

The Kinect has an IR camera that enables skeletal tracking, which collects joint coordinates. The joint coordinates are calculated with respect to the IR camera of the device itself; thus, they do not correspond to actual world coordinates. For each frame, coordinates for 20 joints were collected, and the feature vectors are calculated from the x, y, and z values of each joint. Thus, a total of 60 (20 × 3) values are collected from each frame, which are further used to build the feature vector. Three kinds of features were extracted from each frame: static features, dynamic features, and offset features [23]. Static features correspond to the static property of the frame and describe the distances between two corresponding joints within a frame; we calculated 190 static features. Dynamic features correspond to the motion property of the frame and are extracted by taking the difference of the current frame from the previous frame; thus, 400 dynamic features were extracted from each frame. Similarly, 400 offset features were extracted with respect to the initial frame. Hence, a total of 2970 values are extracted per frame, as each feature has a corresponding 3D coordinate. Table 1 shows the total number of features extracted, where n is equal to the total number of joints, i.e., 20. The definitions of these features are as follows:

Static:

v_cc = i − j; i, j ∈ [1, N]; i ≠ j; N = 20

Dynamic:

v_cp = i_c − j_p; i_c ∈ c, j_p ∈ p

Offset:

v_cf = i_c − j_f; i_c ∈ c, j_f ∈ f

where c, p, and f denote the current, previous, and initial frames, respectively. Thus, the feature vector contains the above three features:

f = [v_cc v_cp v_cf]

Table 1. Types and numbers of features (n = 20).

Type of feature | Formula used  | Number of features
Static          | C(n, 2)       | 190
Dynamic         | n^2           | 400
Offset          | n^2           | 400
Total           | (5n^2 − n)/2  | 990

B. Dimensionality reduction

Due to the high dimensionality of the feature vector, it becomes necessary to produce a compact low-dimensional dataset that reproduces most of the variability of the original dataset.

a. PCA: Principal component analysis is a popular dimensionality reduction technique used to compute the basis vectors for a set of vectors [24]. The eigenvectors with the largest eigenvalues correspond to the principal components. After PCA, the dimensionality of the features is reduced to 30.
b. VQ: Vector quantization is a lossy data compression method used in many applications, such as image and voice recognition. Here, we use VQ to cluster n observations into k clusters, generating a codebook of code words [25]. The value of k is selected to be 60, generating a codebook of 60 code words of dimensionality 30 each. The observation vectors and the centroids have the same feature dimension, so the movements are represented as a sequence of integers.

C. Depth measurement
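Before moving on: a minimal sketch of the static, dynamic, and offset features defined in Section A, assuming each frame is a 20 × 3 array of joint coordinates. The function names and the use of NumPy are my own assumptions, not the authors' code:

```python
import numpy as np

N = 20  # joints tracked by the Kinect, each an (x, y, z) coordinate

def static_features(cur):
    # Pairwise joint differences within the current frame: C(20, 2) = 190.
    return np.array([cur[i] - cur[j] for i in range(N) for j in range(i + 1, N)])

def cross_frame_features(cur, ref):
    # Differences between every joint of the current frame and every joint
    # of a reference frame (previous or initial): 20 * 20 = 400.
    return np.array([cur[i] - ref[j] for i in range(N) for j in range(N)])

def frame_features(cur, prev, first):
    return np.vstack([
        static_features(cur),              # 190 static features
        cross_frame_features(cur, prev),   # 400 dynamic features
        cross_frame_features(cur, first),  # 400 offset features
    ])

rng = np.random.default_rng(0)
cur, prev, first = rng.normal(size=(3, N, 3))
f = frame_features(cur, prev, first)
print(f.shape)  # (990, 3)
```

Flattening the 990 three-dimensional features yields the 2970 values per frame quoted in the text, matching the totals in Table 1.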

The depth sensor of the Kinect consists of an IR projector combined with an IR camera and is based on the structured-light principle [5]. Sometimes the depth values produced by the sensor are inaccurate because the calibration between the IR projector and the IR camera becomes invalid. The depth measurement of the Kinect is not linear, and exact calculation of depth is not possible using the Kinect; observation shows that the depth values change somewhat logarithmically with the actual distance [26]. A better approximation of the actual distance is obtained when we multiply by 10 the value given by the following equation:

Approx. distance = 0.1236 × tan(raw depth / 2842.5 + 1.1863)

where raw depth is the depth value obtained by the Kinect. Multiplying the output by 10 gives the approximate actual distance in meters.

D. Classification

Classification is done with a Hidden Markov model. An HMM is a variant of a finite state machine having a set of hidden states S, output observations O, transition probabilities A, emission probabilities B, and initial state probabilities π [27–29]. The model is characterized by the complete set of parameters λ = {A, B, π}. We used the 60 × 30 codebook matrix generated after vector quantization for training the HMM. For training, the forward–backward algorithm is used for each action separately. The three matrices used are the initial state matrix, the transition state matrix, and the emission
matrix. The Viterbi algorithm is used for decoding; the resulting state of the decoding is the recognized action.

4. Results

The proposed algorithm for intruder action recognition showed a detection accuracy of 92%. In this study, we observed that PCA reduced the dimensionality and aided in improving the classification accuracy of the system. The system was trained for four actions, i.e., walking, standing, crawling, and bending. The results reveal that the HMM used for classification works well, providing a recognition accuracy of 88.33%. Walking and bending were classified accurately. Nevertheless, misclassification was an issue for activities with similar movements, such as standing and walking at a slow pace, which are difficult to classify without information about past frames. We also found that increasing the number of training instances in the dataset enhances tolerance to variations and results in stable classification.

5. Conclusion

In this paper, we presented a cross border intruder detection system using the MS Kinect. Our dataset consists of input vectors of 20 body-joint positions. The system successfully classifies input frames into four action classes: walking, standing, crawling, and bending, and it appears fairly easy to extend the number of recognized actions. The system is robust in handling noisy Kinect data, and it serves as a promising direction for accurately detecting intruders. It works well for new users without the need for additional training. For border security, with the proposed system there is a greater chance of ensuring that when the guards are alerted by the alarm system and need to respond, they are not wasting time chasing nuisance alarms. In order to increase the field of view, we can also employ the rotating feature of the Kinect, making the system more robust.
The recently launched Microsoft Xbox One can further be used for border surveillance purposes, as it provides better performance.

Acknowledgment

This work has been done under a collaborative project jointly supported by the Department of Science and Technology (DST), Ministry of Science and Technology of the Republic of India, and the Slovenian Research Agency (ARRS), Ministry of Education, Science, and Sport of the Republic of Slovenia. The authors would like to thank DST, India for the financial assistance and the Director, CSIR-CEERI, India for his valuable advice.

References

[1] Y. Li, et al., Ultrasonic Intruder Detection System for Home Security. Intelligent Control and Automation, Springer, Berlin Heidelberg, 2006, pp. 1108–1115. [2] B. Wang, Z. Chen, C. Jing, Gesture recognition by using kinect skeleton tracking system, in: Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2013 5th International Conference on. IEEE, 1, 2013.

[3] M.D. Bengalur, Human activity recognition using body pose features and support vector machine, in: Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on. IEEE, 2013. [4] P.K. Pisharady, M. Saerbeck, Robust gesture detection and recognition using dynamic time warping and multi-class probability estimates, in: Computational Intelligence for Multimedia, Signal and Vision Processing (CIMSIVP), 2013 IEEE Symposium on. IEEE, 2013. [5] J.L. Raheja, M.B.L. Manasa, A. Chaudhary, S. Raheja, ABHIVYAKTI: hand gesture recognition using orientation histogram in different light conditions, in: Proceedings of the 5th Indian International Conference on Artificial Intelligence, India, 2011, pp. 1687–1698. [6] Z. Zhang, Microsoft Kinect sensor and its effect, MultiMedia, IEEE 19 (2) (2012) 4–10. [7] M.A. Livingston, et al., Performance measurements for the Microsoft Kinect skeleton, Virtual Reality Short Papers and Posters (VRW), 2012 IEEE. IEEE (2012). [8] K.P. Vittal, et al., Computer controlled intrusion-detector and automatic firing unit for border security, in: Computer and Network Technology (ICCNT), 2010 Second International Conference on. IEEE, 2010. [9] A.S. Barry, D.S. Mazel, The Secure Perimeter Awareness Network (SPAN) at John F. Kennedy International Airport, in: Security Technology, 2007 41st Annual IEEE International Carnahan Conference on. IEEE, 2007. [10] M.B. Manjuatha, et al., Survey on skeleton gesture recognition provided by kinect, IJAREEIE 3 (4) (2014) 8475–8483. [11] J.L. Raheja, A. Chaudhary, K. Singal, Tracking of fingertips and centre of palm using KINECT, in: Proceedings of the 3rd IEEE International Conference on Computational Intelligence, Modelling and Simulation, Malaysia, 2011, pp. 248–252, 20-22 Sep. [12] K.K. Biswas, B. Saurav Kumar, Gesture Recognition using Microsoft Kinect®, in: Automation, Robotics and Applications (ICARA), 2011 5th International Conference on. IEEE, 2011. [13] A. Sinha, K.
Chakravarty, Pose based person identification using Kinect, in: Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on. IEEE, 2013. [14] J.L. Raheja, A. Chaudhary, K. Nandhini, S. Shukla, Pre-consultation help necessity detection based on gait recognition, Signal, Image and Video Processing, 9, Springer, 2015, pp. 1357–1363, 6. [15] K. Lai, J. Konrad, P. Ishwar, A gesture-driven computer interface using Kinect, in: Image Analysis and Interpretation (SSIAI), 2012 IEEE Southwest Symposium on. IEEE, 2012. [16] S. Monir, S. Rubya, H.S. Ferdous, Rotation and scale invariant posture recognition using Microsoft Kinect skeletal tracking feature, ISDA (2012). [17] J.L. Raheja, M. Minhas, D. Prashanth, T. Shah, A. Chaudhary, Robust gesture recognition using Kinect: a comparison between DTW and HMM, Optik 126 (11–12) (2015) 1098–1104. [18] O. Patsadu, C. Nukoolkit, B. Watanapa, Human gesture recognition using Kinect camera, in: Computer Science and Software Engineering (JCSSE), 2012 International Joint Conference on. IEEE, 2012. [19] J.L. Raheja, A. Mishra, A. Chaudhary, Indian Sign Language Recognition using SVM, Pattern Recognition and Image Analysis, Springer 26 (1) (2016). [20] G.D. Forney Jr., The Viterbi algorithm, Proceedings of the IEEE 61 (3) (1973) 268–278. [21] A. Jana, Kinect for Windows SDK Programming Guide, Packt Publishing Ltd, 2012. [22] A. Kar, Skeletal tracking using Microsoft Kinect, Methodology 1 (2010) 1–11. [23] X. Yang, Y. Tian, Effective 3D action recognition using Eigen joints (2013). [24] Robospace, online; accessed 19- June http://robospace.wordpress.com/2013/10/09/object-orientation-principal-component-analysis-opencv/. [25] Vector Quantization, online; accessed 19- June http://www.mqasem.net/vectorquantization/vq.html. [26] Imaging Information, online; accessed 19- June http://openkinect.org/wiki/Imaging Information. [27] M.Z. Uddin, N.D. Thang, K.
Tae-Seong, Human Activity Recognition via 3-D joint angle features and Hidden Markov models, in: Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010. [28] B. Ozer, T. Lv, W. Wayne, A bottom-up approach for activity recognition in smart rooms, in: Multimedia and Expo, 2002. ICME’02. Proceedings. 2002 IEEE International Conference on. IEEE, 1, 2002. [29] D.J. Moore, I.A. Essa, H. Monson, I. Hayes, Exploiting human actions and object context for recognition tasks, in: Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on. IEEE, 1, 1999.