Optik 126 (2015) 386–390
Smart road vehicle sensing system based on monocular vision

Hai Wang, Chaochun Yuan, Yingfeng Cai*

School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China

* Corresponding author. Tel.: +86 18261977099. E-mail address: [email protected] (Y. Cai).
Article history: Received 30 December 2013; Accepted 15 September 2014

Keywords: Vehicle sensing; Smart system; Monocular vision; Automatic sample labeling
Abstract

In this paper, a smart monocular vision based system that senses vehicles with a camera mounted inside a moving car is developed. The "smartness" is that the sensing ability of the system improves itself during use. The system maintains an online learning ability which consists of two main stages: an initialization stage applying an offline trained classifier, and a retraining stage using queried and labeled new samples. Unlabeled examples are queried based on a "most uncertain" criterion, and an automatic labeling mechanism assigns a class label to some of the queried examples. The newly labeled training examples are then used to retrain the classifier and improve its performance continuously. Experiments show that the developed system maintains this smart learning ability and performs well in real road situations.
1. Introduction

Automotive accidents injure more than ten million people each year, two to three million of them seriously. Vehicle accident statistics show that the main threats a driver faces come from other vehicles, especially the vehicle in front. With the aim of preventing these accidents or reducing their severity, front vehicle sensing technology has therefore become a hot area among automotive manufacturers, suppliers and universities. Besides, vehicle sensing systems also play important roles in many end-use applications, such as driver-assistance systems and automatic parking systems. Although radar or laser based vehicle sensing offers higher robustness and accuracy, vision is becoming more and more popular for its low cost and its ability to capture rich environment information such as vehicles, pedestrians, lane marks and traffic signs [1–5]. With the progress of hardware computation ability, complicated sensing algorithms can gradually be implemented on vision based vehicle sensing systems and significantly improve their performance.

A typical offline trained vision-based vehicle sensing algorithm has two steps: (1) hypothesis generation (HG) and (2) hypothesis verification (HV) [6,7]. In the HG step, potential vehicles are identified in road images. In the HV step, the system verifies the hypotheses previously generated in the HG step.
In this paper, a smart monocular vision based system to sense vehicles with a camera mounted inside a moving car is built. The "smartness" is that the sensing ability of the system can improve itself while being used by the customer; in other words, the system maintains an online learning ability. The proposed smart vehicle sensing system consists of two main stages: an initialization stage applying an offline trained classifier, and a retraining stage using queried and labeled new samples. In the initialization stage, a set of training examples is collected and annotated to train an initial classifier. Once an initial classifier has been built, a query function queries unlabeled examples based on a "most uncertain" criterion, and an automatic labeling mechanism assigns a class label to some of the queried examples. The newly labeled training examples are then used to retrain the classifier.

2. System architecture

In this paper, a real-time smart vehicle sensing system employing online trained classifiers is proposed, whose sensing ability improves itself during use. Specifically, system performance measures such as the detection rate and the false alarm rate can be improved online by retraining the classifier with image samples generated online. As in much of the literature, an offline trained initial classifier is first built with manually selected initial training samples. Then, new image samples with the most uncertainty are selected from the real-time images that need to be classified. When the image samples are generated, a sample labeling system tags them as either positive or negative training samples. At last, the classifier is retrained and updated with those newly labeled
samples as well as the initial training samples. The architecture of the proposed real-time vehicle sensing system is shown in Fig. 1.

Fig. 1. Framework of the system architecture.
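To make this architecture concrete, the following is a minimal Python sketch of the initialize-query-label-retrain cycle of Fig. 1. The callables passed in (train, detect, query, label) are hypothetical placeholders for the components described in Sections 3 and 4, not code from the actual system.

```python
def online_sensing_loop(train, detect, query, label, pos, neg, frames):
    """Sketch of Fig. 1. train(pos, neg) -> classifier;
    detect(clf, frame) -> detections; query(clf, frame) -> uncertain
    samples (Section 4.1); label(sample) -> tag (Section 4.2)."""
    clf = train(pos, neg)                   # offline initialization stage
    for frame in frames:
        detections = detect(clf, frame)     # normal vehicle sensing
        # (detections would feed the driver-assistance layer downstream)
        new_labels = 0
        for sample in query(clf, frame):    # "most uncertain" samples
            tag = label(sample)             # multi-cue automatic labeling
            if tag == "positive":
                pos.append(sample)
                new_labels += 1
            elif tag == "negative":
                neg.append(sample)
                new_labels += 1
        if new_labels:                      # retraining stage
            clf = train(pos, neg)
    return clf
```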
3. Training with Haar-NMF feature and probabilistic neural network

3.1. Haar-NMF feature

Haar-like rectangular features were introduced by Viola and Jones [10] in the context of face detection, and various studies have incorporated this approach into on-road vehicle-detection systems such as [8,9,11]. The set of Haar-like rectangular features is well suited to detecting the shape of vehicles: rectangular features are sensitive to edges, bars, vertical and horizontal details, and symmetric structures. The algorithm also allows rapid object detection that can be exploited in building a real-time system, partially due to fast and efficient feature extraction using the integral image. However, there is one big obstacle to directly using Haar-like features to train the classifier: the dimension of the Haar-like feature vector generated from a sample image is extremely high. For instance, the Haar-like feature vector generated from a 24 by 24 image has more than 100,000 dimensions. This dramatically increases computation time and requires huge hardware storage, which is not suitable for our embedded system.

Here NMF (Non-negative Matrix Factorization) is utilized to reduce the dimension of Haar feature vectors and form Haar-NMF feature vectors. NMF is a matrix decomposition method under the constraint that all elements of the matrix are non-negative [12]. The algorithm can also be considered an optimization process under the constraint of certain cost functions, whose approximate solution can be calculated by iteration. The calculation process of the Haar-NMF feature is as follows:

(1) Let H be the Haar feature vector and l be the vector dimension. Since not all elements of Haar feature vectors are non-negative, the absolute value of the Haar feature is taken first. Then H is converted to a matrix C(m × n), in which l = m × n.
(2) Make an NMF decomposition of rank r of matrix C:

C = WH^T    (1)

where W is the non-negative basis matrix and H is the non-negative coefficient matrix, whose dimensions are m × r and n × r, respectively.

Next, Haar-NMF features are fed to a probabilistic neural network (PNN) to train the vehicle sensing classifier.
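As an illustration of steps (1) and (2), here is a minimal sketch using scikit-learn's NMF. The fold shape (m, n) and rank r are free choices, and since the paper does not state which factor serves as the final descriptor, flattening the basis matrix W is an assumption.

```python
import numpy as np
from sklearn.decomposition import NMF

def haar_nmf_feature(haar_vec, m, n, r):
    """Steps (1)-(2): take absolute values, fold the l-dimensional Haar
    vector into an m x n matrix C (l = m * n), then factorize C ~ W @ Ht
    with rank r. Flattening the basis matrix W into an m*r descriptor is
    an assumption; the paper does not say which factor is kept."""
    C = np.abs(np.asarray(haar_vec, dtype=float)).reshape(m, n)
    model = NMF(n_components=r, init="nndsvda", max_iter=500)
    W = model.fit_transform(C)        # non-negative basis, shape (m, r)
    # model.components_ holds H^T, the coefficient matrix transposed
    return W.ravel()

# Example: fold a 120,000-dim Haar vector into 400 x 300 and reduce to
# rank 20, giving an 8000-dim Haar-NMF descriptor instead of 120,000.
feat = haar_nmf_feature(np.random.rand(120_000), m=400, n=300, r=20)
```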
3.2. Probabilistic neural network

The probabilistic neural network (PNN) was developed by Donald Specht [13]. It is a model based on competitive learning with a 'winner takes all' attitude, and its core concept rests on multivariate probability. The PNN provides a general solution to pattern classification problems by following an approach developed in statistics, namely Bayesian classifiers. The PNN is very suitable for our application because of two good features, illustrated by the sketch after this list:

(1) Compared to back-propagation neural networks such as the BPNN, its training time is very small. This is critical because our system needs to retrain and update the classifier frequently.
(2) It can realize any nonlinear transform, and its decision surface approaches that of the Bayes optimal rule; when the training set is big enough, the decision surface approaches the optimum. In our smart vehicle sensing system, new training samples are continuously generated, so its performance should theoretically keep improving.
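A minimal numpy sketch of a Specht-style PNN is given below; the Gaussian smoothing parameter sigma is an assumed hand-tuned value, and the class-conditional score is a Parzen-window density estimate averaged over each class's training samples. Note that "training" amounts to storing the samples, which is why retraining after each batch of new samples stays cheap, in line with feature (1).

```python
import numpy as np

class SimplePNN:
    """Minimal Specht-style PNN sketch: one Gaussian Parzen kernel per
    training sample in the pattern layer, averaged per class in the
    summation layer, winner-takes-all at the output. The smoothing
    parameter sigma is an assumed hand-tuned value."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def fit(self, X, y):
        # "Training" just stores the samples, so retraining with newly
        # labeled samples is essentially free.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def class_scores(self, x):
        d2 = np.sum((self.X - x) ** 2, axis=1)      # squared distances
        k = np.exp(-d2 / (2.0 * self.sigma ** 2))   # pattern-layer outputs
        return np.array([k[self.y == c].mean() for c in self.classes])

    def predict(self, x):
        return self.classes[np.argmax(self.class_scores(x))]
```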
4. Online sample query and multi-cue based sample labeling

In traditional offline learning based vehicle sensing systems, all training samples are manually queried and labeled, which cannot satisfy the requirements of online training. To solve this problem, an online sample query and multi-cue based automatic sample labeling strategy is proposed. First, samples are queried on the principle of confidence, which means selecting the "most uncertain" samples, because uncertain samples are usually considered to carry richer information for the classifier than other samples. After that, the generated samples are judged and labeled as "positive", "negative" or "uncertain" by the proposed multi-cue based automatic sample labeling strategy. When newly generated samples are finally labeled as positive or negative, they are grouped with the existing samples and the classifier is retrained by the method proposed in Section 3. It should be noted that this labeling method cannot be applied to sense vehicles directly because of its heavy computation requirement.

4.1. Confidence-based online sample query

Confidence-based query uses a confidence metric to query examples that lie near the classification boundary in the decision space [15]. Among the many confidence-based query methods, the score-based query is the simplest and most effective: it applies a simple threshold to the value of a classifier's discriminant function evaluated on given samples, so that an implicit probability or confidence measure for binary classifiers can be obtained by feeding the value of the discriminant function to the logistic function. Applying this method in our case, each sample x whose class conditional probability is near 0.5 is queried. In the real system, the probability range P is chosen as 0.4–0.6.
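A minimal sketch of this score-based query, assuming raw discriminant scores are available from the classifier (for the PNN above, the log-ratio of the two class scores could play this role):

```python
import numpy as np

def query_uncertain_indices(scores, low=0.4, high=0.6):
    """Score-based confidence query: squash discriminant values through
    the logistic function to get pseudo-probabilities, and query samples
    whose probability falls in the paper's 0.4-0.6 uncertainty band."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return np.flatnonzero((p >= low) & (p <= high))

# Example: scores near 0 map to probabilities near 0.5 and get queried.
idx = query_uncertain_indices([-4.2, -0.1, 0.3, 5.0])   # -> indices [1, 2]
```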
4.2. Multi-cue based sample labeling

In order to automatically label the newly generated samples, this paper proposes a multi-cue based sample labeling strategy. Three cue factors are utilized to make the judgments: complexity, vertical plane and relative movement. The sample labeling realization diagram is shown in Fig. 2. The three judgment factors are described as follows.
Fig. 2. Flow chart of multi-cue based sample labeling.

Fig. 3. Sky, horizontal plane and vertical plane distinction and marking. The vertical plane is marked in brown with an "X" for visual emphasis; the horizontal plane is marked in green and the sky in purple. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
4.2.1. Complexity
The first factor that can be used to judge vehicles is the complexity of an image sample. The idea is that if an image sample has a simple pattern, it cannot be a vehicle, and a good measure for this purpose is the detected edges. Let n and ñ denote the number of pixels and the number of detected edge pixels in an image sample, respectively. The degree of complexity of an image sample can then be defined as

Cp = ñ / n    (2)

If the complexity Cp of an image sample satisfies Cp < δC, the sample can be judged as negative. Here δC is a threshold set from statistics: over 5000 vehicle images and 10,000 non-vehicle images, the Cp of all vehicle images is above 0.4, while that of non-vehicles is evenly distributed from 0.1 to 0.8. Based on these statistics, the threshold is conservatively set as δC = 0.35. Any sample whose complexity is smaller than 0.35 is considered a negative sample; otherwise, the sample needs further assessment in the next step.
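A minimal sketch of this complexity cue; the paper does not name its edge detector, so Canny (via OpenCV) and its thresholds are assumptions:

```python
import cv2
import numpy as np

def complexity_cue(sample_bgr, delta_c=0.35):
    """Eq. (2): Cp = (edge pixels) / (all pixels). Samples below the
    paper's conservative threshold delta_c = 0.35 are labeled negative;
    the Canny edge detector and its thresholds are assumptions."""
    gray = cv2.cvtColor(sample_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)              # detected edge map
    cp = np.count_nonzero(edges) / edges.size      # complexity measure Cp
    return "negative" if cp < delta_c else "undecided"
```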
4.2.2. Vertical plane
By observation, road scene geometry can be divided into three main categories: sky, horizontal plane and vertical plane. Statistics and analysis of many road scene images reveal that the horizontal plane mostly comes from the road surface, whereas items such as vehicles, trees and fences contain vertical planes. Based on this, if these three categories of objects can be distinguished through single image analysis, they provide very valuable information for judging vehicles, because every vehicle contains a certain percentage of vertical plane. In other words, an object that contains no vertical plane is definitely not a vehicle.

In [14], Hoiem et al. proposed a geometric information classification algorithm which is used in our framework. First, an image is over-segmented into superpixels [16], each of which belongs to a particular geometric class. Each superpixel is described by depth cues, including color, location, perspective, and texture. Then, a previously trained logistic regression form of AdaBoost assigns a geometric class to each superpixel. A typical road image processed by this method is shown in Fig. 3: Fig. 3(a) is the original road image, while Fig. 3(b) shows its division into the three categories, with the vertical plane marked in brown with an "X" for visual emphasis, the horizontal plane in green and the sky in purple.

If the fraction of pixels belonging to the vertical plane in a sample image is below a certain percentage δK, the sample is judged as negative; otherwise, it needs further assessment in the next step.
4.2.3. Relative movement
Sample images that have not been judged as negative still cannot be labeled as positive directly, because in some special cases, for example, a complex painting on an advertising board also satisfies both factors above. An on-road vehicle has another feature that distinguishes it from many other objects: its movement. Considering that, a strong constraint, relative movement, is used to label samples as positive. The sample to be judged is first tracked with Meanshift, a fast and effective algorithm used in many object tracking applications. Then, using previously calculated camera parameters, the relative speed Vrel between the tracked sample and the ego-vehicle is calculated. After that, by incorporating the speed of the ego-vehicle, the sample speed Vsam is obtained. If Vsam is significantly greater than zero, the sample is considered a moving object and labeled as positive (vehicle). Otherwise, the sample is labeled as uncertain rather than negative, because it may be a stationary or slow vehicle.
5. Experiments

This section demonstrates three groups of experimental results and comparisons focused on system performance: (1) the correct rate of sample labeling; (2) the learning ability of the vehicle classifier; and (3) the training time.

5.1. Experiment setup

In the initial training, the positive training samples come partially from the Caltech 1999 database, which includes 126 images containing rear-view vehicles. Another 300 vehicle images were collected by our group from recorded road videos. The negative samples are taken from 500 images containing no vehicles; the number of negative samples for initial training is 2000. Fig. 4 shows some of these positive and negative training samples.

Fig. 4. Some positive and negative training samples. (a) Positive samples and (b) negative samples.
Table 1. Correct rate of sample labeling.

Sample types        Correct labeling    Incorrect labeling    Correct rate (%)
Positive samples    1278/1279           1/1279                99.92
Negative samples    2183/2186           3/2186                99.86
Overall             3461/3465           4/3465                99.88

Table 2. Training time comparison.

Method                      With extra 100P/200N    With extra 300P/600N    With extra 900P/1800N
AdaBoost (h)                1.1                     6.7                     22.4
Our PNN based method (s)    15.7                    107.3                   422.5
5.2. System performance

5.2.1. Correct rate of sample labeling
In the first experiment, around 7600 images are processed in the "sensing, sample-generation-and-labeling, retraining" process. During this process, 3856 new samples are generated in the confidence-based online sample query step. After the multi-cue based sample labeling step, 3465 of these new samples are successfully labeled as positive or negative, and 391 as uncertain. The labeling correct rate of these 3465 samples is shown in Table 1. From the experiment, it can be seen that the labeling correct rate is almost 100% and the labeled samples can confidently be used in further classifier retraining.
5.2.2. Classifier learning ability
In the second experiment, the learning ability of the vehicle classifier is assessed using the True Positive Rate (TPR) and the False Detection Rate (FDR). TPR is the percentage of vehicles in the camera's view that are detected, obtained by dividing the number of truly detected vehicles by the total number of vehicles; it measures recall and localization:

TPR = True Positives / All Positives

FDR is the proportion of detections that are not true vehicles, obtained by dividing the number of false positives by the total number of detections; it is the percentage of erroneous detections and measures precision and localization:

FDR = False Positives / (True Positives + False Positives)
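In code, the two metrics are simple ratios over detection counts (a sketch with hypothetical counts):

```python
def tpr(true_pos, all_pos):
    """True Positive Rate: detected vehicles over all vehicles in view."""
    return true_pos / all_pos

def fdr(false_pos, true_pos):
    """False Detection Rate: false detections over all detections."""
    return false_pos / (true_pos + false_pos)

# e.g. 95 of 100 vehicles detected with 10 false boxes:
# tpr(95, 100) -> 0.95, fdr(10, 95) -> ~0.095
```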
Four ROC curves related to four trained classifiers are shown in Fig. 5, which demonstrates the performance of the classifiers when different numbers of training samples are loaded. The four classifiers are the initial classifier and the classifiers retrained with an extra 100P/200N (100 positive samples/200 negative samples), 300P/600N and 900P/1800N, respectively. From the ROC curves, it can be seen that the initial classifier performs very poorly; as extra samples are loaded continuously, the classification ability improves dramatically.

Fig. 5. ROC curves of four classifiers with different numbers of training samples.

5.2.3. Classifier training time
The correct rate of sample labeling and the classifier learning ability are obviously the two most important characteristics; however, for our resource-limited real-time embedded system there is another critical issue, the training time, because whenever new samples are generated the classifier needs to be retrained and updated. In a real-time application, a lower training time means quicker improvement of classifier performance. The training times of our proposed method and a state-of-the-art AdaBoost method are compared in Table 2. Since AdaBoost requires more storage space than our DSP system can supply, the comparison is conducted on a PC with a 2.67 GHz Core 2 Duo processor, 4 GB memory and Ubuntu Linux 11.10. It can be seen from Table 2 that, owing to the feed-forward structure of the PNN, the training time is around 200 times lower than that of the AdaBoost-based method.

Fig. 6. Some of the real road vehicle sensing results. First row: daylight highway situation; second row: rainy day highway situation; third row: daylight urban situation; fourth row: night highway with road lamps. (For interpretation of the references to color in this figure, the reader is referred to the web version of the article.)
5.3. On-road experiment results
Finally, some of the vehicle sensing results in real road situations are shown in Fig. 6. The four rows of images were taken on a daylight highway, a rainy-day highway, a daylight urban road and a night highway with road lamps, respectively. A solid green box marks a detected vehicle, and a dotted red box marks an undetected or falsely detected vehicle. Overall, most on-road vehicles can be sensed successfully, while missed detections and false detections sometimes occur in adverse situations such as partial occlusion and bad weather. We also tried the system on a night road with no road lamps at all; however, the sensing performance was extremely poor due to the very low illumination.
6. Conclusions

In this paper, a smart monocular vision based system to sense vehicles with a camera mounted inside a moving car is built. The "smartness" comes from its self-improving sensing ability during customer use, realized by the proposed online query-labeling-learning framework. Experiments demonstrate that the system works well in most circumstances except extremely low illumination conditions. Our future work is to expand the sensing ability of the system so that it can work in extremely low illumination conditions such as unlit highways.

Acknowledgments

This research was funded partly by the National Natural Science Foundation of China (51305167) and by the Jiangsu University Scientific Research Foundation for Senior Professionals (12JDG10).
References

[1] C. Blanco, F. Jaureguizar, N. García, An efficient multiple object detection and tracking framework for automatic counting and video surveillance applications, IEEE Trans. Consum. Electron. 58 (3) (2012) 857–862.
[2] R. Sun, J. Chen, J. Gao, Fast pedestrian detection based on saliency detection and HOG-NMF features, J. Electron. Inf. Technol. 35 (8) (2013) 1921–1926.
[3] Z. Cai, M. Gu, Traffic sign recognition algorithm based on shape signature and dual tree-complex wavelet transform, J. Cent. South Univ. Technol. 20 (4) (2013) 433–439 (English edition).
[4] R. Zhang, S. Zhang, S. Yu, Moving objects detection method based on brightness distortion and chromaticity distortion, IEEE Trans. Consum. Electron. 53 (3) (2007) 1177–1185.
[5] J. Kim, D. Yeom, Y. Joo, Fast robust algorithm of tracking multiple moving objects for intelligent video surveillance systems, IEEE Trans. Consum. Electron. 57 (3) (2011) 1165–1170.
[6] Z. Sun, G. Bebis, R. Miller, On-road vehicle detection: a review, IEEE Trans. Pattern Anal. Mach. Intell. 28 (5) (2006) 694–711.
[7] Q.B. Truong, H.N. Geon, B.R. Lee, Vehicle detection and recognition for automated guided vehicle, in: Proc. ICCAS-SICE, Fukuoka, Japan, 2009, pp. 671–676.
[8] A. Haselhoff, S. Schauland, A. Kummert, A signal theoretic approach to measure the influence of image resolution for appearance-based vehicle detection, in: IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 2008, pp. 822–827.
[9] D. Ponsa, A. Lopez, F. Lumbreras, J. Serrat, T. Graf, 3D vehicle sensor based on monocular vision, in: Proc. IEEE Intelligent Transportation Systems, 2005, pp. 1096–1101.
[10] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, IEEE, 2001.
[11] S. Wender, K. Dietmayer, 3D vehicle detection using a laser scanner and a video camera, IET Intell. Transp. Syst. 2 (2) (2008) 105–112.
[12] D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788–791.
[13] D. Specht, Enhancement to probabilistic neural networks, in: IEEE International Joint Conference on Neural Networks, Baltimore, MD, 1992, pp. 7–11.
[14] D. Hoiem, A. Efros, M. Hebert, Recovering surface layout from an image, Int. J. Comput. Vis. 75 (1) (2007) 151–172.
[15] M. Li, I.K. Sethi, Confidence-based active learning, IEEE Trans. Pattern Anal. Mach. Intell. 28 (8) (2006) 1251–1261.
[16] X. Ren, J. Malik, Learning a classification model for segmentation, in: Ninth IEEE International Conference on Computer Vision, 2003, pp. 10–17.