Optik 126 (2015) 386–390
Smart road vehicle sensing system based on monocular vision

Hai Wang, Chaochun Yuan, Yingfeng Cai*

School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China

* Corresponding author. Tel.: +86 18261977099. E-mail address: [email protected] (Y. Cai).
Article history: Received 30 December 2013; Accepted 15 September 2014

Keywords: Vehicle sensing; Smart system; Monocular vision; Automatic sample labeling
Abstract

In this paper, a smart monocular vision based system that senses vehicles with a camera mounted inside a moving car is developed. The "smartness" is that the sensing ability of the system improves itself during use. The system maintains an online learning ability which consists of two main stages: an initialization stage applying an offline trained classifier, and a retraining stage using queried and labeled new samples. Unlabeled examples are queried based on a "most uncertain" criterion, and an automatic labeling mechanism assigns a class label to some of the queried examples. The newly labeled training examples are then used to retrain the classifier and improve its performance continuously. Experiments show that the developed system maintains this smart learning ability and performs well in real road situations.
1. Introduction

Automotive accidents injure more than ten million people each year, two to three million of them seriously. Vehicle accident statistics show that the main threats a driver faces come from other vehicles, especially the vehicle in front. With the aim of preventing these accidents or reducing their severity, front vehicle sensing technology has therefore become a hot area among automotive manufacturers, suppliers and universities. Besides, vehicle sensing systems also play important roles in many end-use applications, such as driver-assistance systems and automatic parking systems. Although radar or laser based vehicle sensing offers higher robustness and accuracy, vision is becoming more and more popular for its low cost and its ability to capture rich environment information such as vehicles, pedestrians, lane marks and traffic signs [1–5]. With the progress of hardware computation ability, complicated sensing algorithms can gradually be implemented on vision based vehicle sensing systems and significantly improve their performance.

A typical offline trained vision-based vehicle sensing algorithm has two steps: (1) hypothesis generation (HG) and (2) hypothesis verification (HV) [6,7]. In the HG step, potential vehicles are identified in road images. In the HV step, the system verifies the hypotheses previously generated in the HG step.
In this paper, a smart monocular vision based system to sense vehicles with a camera mounted inside a moving car is built. The "smartness" is that the sensing ability of the system can improve itself while being used by the customer; in other words, the system maintains an online learning ability. The proposed smart vehicle sensing system consists of two main stages: an initialization stage applying an offline trained classifier, and a retraining stage using queried and labeled new samples. In the initialization stage, a set of training examples is collected and annotated to train an initial classifier. Once an initial classifier has been built, a query function queries unlabeled examples based on a "most uncertain" criterion, and an automatic labeling mechanism assigns a class label to some of the queried examples. The newly labeled training examples are then used to retrain the classifier.

2. System architecture

In this paper, a real-time smart vehicle sensing system employing online trained classifiers is proposed, whose sensing ability improves itself during use. Specifically, system performance measures such as the detection rate and the false alarm rate can be improved online by retraining the classifier with image samples generated online. As in much of the literature, an offline trained initial classifier is first built with manually selected initial training samples. Then, new image samples with the most uncertainty are selected from the real-time images that need to be classified. When the image samples are generated, a sample labeling system tags them as either positive or negative training samples. At last, the classifier is retrained and updated with those newly labeled
samples as well as the initial training samples. The architecture of the proposed real-time vehicle sensing system is shown in Fig. 1.

Fig. 1. Framework of the system architecture.
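To make this architecture concrete, the following is a minimal Python sketch of the initialize-query-label-retrain cycle of Fig. 1. The callables passed in (train, detect, query, label) are hypothetical placeholders for the components described in Sections 3 and 4, not code from the actual system.

```python
def online_sensing_loop(train, detect, query, label, pos, neg, frames):
    """Sketch of Fig. 1. train(pos, neg) -> classifier;
    detect(clf, frame) -> detections; query(clf, frame) -> uncertain
    samples (Section 4.1); label(sample) -> tag (Section 4.2)."""
    clf = train(pos, neg)                   # offline initialization stage
    for frame in frames:
        detections = detect(clf, frame)     # normal vehicle sensing
        # (detections would feed the driver-assistance layer downstream)
        new_labels = 0
        for sample in query(clf, frame):    # "most uncertain" samples
            tag = label(sample)             # multi-cue automatic labeling
            if tag == "positive":
                pos.append(sample)
                new_labels += 1
            elif tag == "negative":
                neg.append(sample)
                new_labels += 1
        if new_labels:                      # retraining stage
            clf = train(pos, neg)
    return clf
```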
3. Training with Haar-NMF feature and probabilistic neural network

3.1. Haar-NMF feature

Haar-like rectangular features were introduced by Viola and Jones [10] in the context of face detection, and various studies have incorporated this approach into on-road vehicle-detection systems such as [8,9,11]. The set of Haar-like rectangular features is well suited to detecting the shape of vehicles: rectangular features are sensitive to edges, bars, vertical and horizontal details, and symmetric structures. The algorithm also allows rapid object detection that can be exploited in building a real-time system, partially due to fast and efficient feature extraction using the integral image. However, there is one big obstacle to directly using Haar-like features to train the classifier: the dimension of the Haar-like feature vector generated from a sample image is extremely high. For instance, the Haar-like feature vector generated from a 24 by 24 image has more than 100,000 dimensions. This dramatically increases computation time and requires huge hardware storage, which is not suitable for our embedded system.

Here NMF (Non-negative Matrix Factorization) is utilized to reduce the dimension of Haar feature vectors and form Haar-NMF feature vectors. NMF is a matrix decomposition method under the constraint that all elements of the matrix are non-negative [12]. The algorithm can also be considered an optimization process under the constraint of certain cost functions, whose approximate solution can be calculated by iteration. The calculation process of the Haar-NMF feature is as follows:

(1) Let H be the Haar feature vector and l be the vector dimension. Since not all elements of Haar feature vectors are non-negative, the absolute value of the Haar feature is taken first. Then H is converted to a matrix C(m × n), in which l = m × n.
(2) Make an NMF decomposition of rank r of matrix C:

C = WH^T    (1)

where W is the non-negative basis matrix and H is the non-negative coefficient matrix, whose dimensions are m × r and n × r, respectively.

Next, Haar-NMF features are fed to a probabilistic neural network (PNN) to train the vehicle sensing classifier.
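As an illustration of steps (1) and (2), here is a minimal sketch using scikit-learn's NMF. The fold shape (m, n) and rank r are free choices, and since the paper does not state which factor serves as the final descriptor, flattening the basis matrix W is an assumption.

```python
import numpy as np
from sklearn.decomposition import NMF

def haar_nmf_feature(haar_vec, m, n, r):
    """Steps (1)-(2): take absolute values, fold the l-dimensional Haar
    vector into an m x n matrix C (l = m * n), then factorize C ~ W @ Ht
    with rank r. Flattening the basis matrix W into an m*r descriptor is
    an assumption; the paper does not say which factor is kept."""
    C = np.abs(np.asarray(haar_vec, dtype=float)).reshape(m, n)
    model = NMF(n_components=r, init="nndsvda", max_iter=500)
    W = model.fit_transform(C)        # non-negative basis, shape (m, r)
    # model.components_ holds H^T, the coefficient matrix transposed
    return W.ravel()

# Example: fold a 120,000-dim Haar vector into 400 x 300 and reduce to
# rank 20, giving an 8000-dim Haar-NMF descriptor instead of 120,000.
feat = haar_nmf_feature(np.random.rand(120_000), m=400, n=300, r=20)
```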
3.2. Probabilistic neural network

The probabilistic neural network (PNN) was developed by Donald Specht [13]. It is a model based on competitive learning with a 'winner takes all' attitude, and its core concept rests on multivariate probability. The PNN provides a general solution to pattern classification problems by following an approach developed in statistics, namely Bayesian classifiers. The PNN is very suitable for our application because of two good features, illustrated by the sketch after this list:

(1) Compared to back-propagation neural networks such as the BPNN, its training time is very small. This is critical because our system needs to retrain and update the classifier frequently.
(2) It can realize any nonlinear transform, and its decision surface approaches that of the Bayes optimal rule; when the training set is big enough, the decision surface approaches the optimum. In our smart vehicle sensing system, new training samples are continuously generated, so its performance should theoretically keep improving.
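A minimal numpy sketch of a Specht-style PNN is given below; the Gaussian smoothing parameter sigma is an assumed hand-tuned value, and the class-conditional score is a Parzen-window density estimate averaged over each class's training samples. Note that "training" amounts to storing the samples, which is why retraining after each batch of new samples stays cheap, in line with feature (1).

```python
import numpy as np

class SimplePNN:
    """Minimal Specht-style PNN sketch: one Gaussian Parzen kernel per
    training sample in the pattern layer, averaged per class in the
    summation layer, winner-takes-all at the output. The smoothing
    parameter sigma is an assumed hand-tuned value."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def fit(self, X, y):
        # "Training" just stores the samples, so retraining with newly
        # labeled samples is essentially free.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def class_scores(self, x):
        d2 = np.sum((self.X - x) ** 2, axis=1)      # squared distances
        k = np.exp(-d2 / (2.0 * self.sigma ** 2))   # pattern-layer outputs
        return np.array([k[self.y == c].mean() for c in self.classes])

    def predict(self, x):
        return self.classes[np.argmax(self.class_scores(x))]
```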
4. Online sample query and multi-cue based sample labeling

In traditional offline learning based vehicle sensing systems, all training samples are manually queried and labeled, which cannot satisfy the requirements of online training. To solve this problem, an online sample query and multi-cue based automatic sample labeling strategy is proposed. First, samples are queried on the principle of confidence, which means selecting the "most uncertain" samples, because uncertain samples are usually considered to carry richer information for the classifier than other samples. After that, the generated samples are judged and labeled as "positive", "negative" or "uncertain" by the proposed multi-cue based automatic sample labeling strategy. When newly generated samples are finally labeled as positive or negative, they are grouped with the existing samples and the classifier is retrained by the method proposed in Section 3. It should be noted that this labeling method cannot be applied to sense vehicles directly because of its heavy computation requirement.

4.1. Confidence-based online sample query

Confidence-based query uses a confidence metric to query examples that lie near the classification boundary in the decision space [15]. Among the many confidence-based query methods, the score-based query is the simplest and most effective: it applies a simple threshold to the value of a classifier's discriminant function evaluated on given samples, so that an implicit probability or confidence measure for binary classifiers can be obtained by feeding the value of the discriminant function to the logistic function. Applying this method in our case, each sample x whose class conditional probability is near 0.5 is queried. In the real system, the probability range P is chosen as 0.4–0.6.
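A minimal sketch of this score-based query, assuming raw discriminant scores are available from the classifier (for the PNN above, the log-ratio of the two class scores could play this role):

```python
import numpy as np

def query_uncertain_indices(scores, low=0.4, high=0.6):
    """Score-based confidence query: squash discriminant values through
    the logistic function to get pseudo-probabilities, and query samples
    whose probability falls in the paper's 0.4-0.6 uncertainty band."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return np.flatnonzero((p >= low) & (p <= high))

# Example: scores near 0 map to probabilities near 0.5 and get queried.
idx = query_uncertain_indices([-4.2, -0.1, 0.3, 5.0])   # -> indices [1, 2]
```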
4.2. Multi-cue based sample labeling

In order to automatically label the newly generated samples, this paper proposes a multi-cue based sample labeling strategy. Three cue factors are utilized to make the judgments: complexity, vertical plane and relative movement. The sample labeling realization diagram is shown in Fig. 2. The three judgment factors are described as follows.
Fig. 2. Flow chart of multi-cue based sample labeling.

Fig. 3. Sky, horizontal plane and vertical plane distinction and marking. The vertical plane is marked in brown with an "X" for visual emphasis; the horizontal plane is marked in green and the sky in purple. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
4.2.1. Complexity
The first factor that can be used to judge vehicles is the complexity of an image sample. The idea is that if an image sample has a simple pattern, it cannot be a vehicle, and a good measure for this purpose is the detected edges. Let n and ñ denote the number of pixels and the number of detected edge pixels in an image sample, respectively. The degree of complexity of an image sample can then be defined as

Cp = ñ / n    (2)

If the complexity Cp of an image sample satisfies Cp < δC, the sample can be judged as negative. Here δC is a threshold set from statistics: over 5000 vehicle images and 10,000 non-vehicle images, the Cp of all vehicle images is above 0.4, while that of non-vehicles is evenly distributed from 0.1 to 0.8. Based on these statistics, the threshold is conservatively set as δC = 0.35. Any sample whose complexity is smaller than 0.35 is considered a negative sample; otherwise, the sample needs further assessment in the next step.
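A minimal sketch of this complexity cue; the paper does not name its edge detector, so Canny (via OpenCV) and its thresholds are assumptions:

```python
import cv2
import numpy as np

def complexity_cue(sample_bgr, delta_c=0.35):
    """Eq. (2): Cp = (edge pixels) / (all pixels). Samples below the
    paper's conservative threshold delta_c = 0.35 are labeled negative;
    the Canny edge detector and its thresholds are assumptions."""
    gray = cv2.cvtColor(sample_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)              # detected edge map
    cp = np.count_nonzero(edges) / edges.size      # complexity measure Cp
    return "negative" if cp < delta_c else "undecided"
```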
4.2.2. Vertical plane
By observation, road scene geometry can be divided into three main categories: sky, horizontal plane and vertical plane. Statistics and analysis of many road scene images reveal that the horizontal plane mostly comes from the road surface, whereas items such as vehicles, trees and fences contain vertical planes. Based on this, if these three categories of objects can be distinguished through single image analysis, they provide very valuable information for judging vehicles, because every vehicle contains a certain percentage of vertical plane. In other words, an object that contains no vertical plane is definitely not a vehicle.

In [14], Hoiem et al. proposed a geometric information classification algorithm which is used in our framework. First, an image is over-segmented into superpixels [16], each of which belongs to a particular geometric class. Each superpixel is described by depth cues, including color, location, perspective, and texture. Then, a previously trained logistic regression form of AdaBoost assigns a geometric class to each superpixel. A typical road image processed by this method is shown in Fig. 3: Fig. 3(a) is the original road image, while Fig. 3(b) shows its division into the three categories, with the vertical plane marked in brown with an "X" for visual emphasis, the horizontal plane in green and the sky in purple.

If the fraction of pixels belonging to the vertical plane in a sample image is below a certain percentage δK, the sample is judged as negative; otherwise, it needs further assessment in the next step.
4.2.3. Relative movement
Sample images that have not been judged as negative still cannot be labeled as positive directly, because in some special cases, for example, a complex painting on an advertising board also satisfies both factors above. An on-road vehicle has another feature that distinguishes it from many other objects: its movement. Considering that, a strong constraint, relative movement, is used to label samples as positive. The sample to be judged is first tracked with Meanshift, a fast and effective algorithm used in many object tracking applications. Then, using previously calculated camera parameters, the relative speed Vrel between the tracked sample and the ego-vehicle is calculated. After that, by incorporating the speed of the ego-vehicle, the sample speed Vsam is obtained. If Vsam is significantly greater than zero, the sample is considered a moving object and labeled as positive (vehicle). Otherwise, the sample is labeled as uncertain rather than negative, because it may be a stationary or slow vehicle.
5. Experiments

This section demonstrates three groups of experimental results and comparisons focused on system performance: (1) the correct rate of sample labeling; (2) the learning ability of the vehicle classifier; and (3) the training time.

5.1. Experiment setup

In the initial training, the positive training samples come partially from the Caltech 1999 database, which includes 126 images containing rear-view vehicles. Another 300 vehicle images were collected by our group from recorded road videos. The negative samples are taken from 500 images containing no vehicles; the number of negative samples for initial training is 2000. Fig. 4 shows some of these positive and negative training samples.

Fig. 4. Some positive and negative training samples. (a) Positive samples and (b) negative samples.
Table 1. Correct rate of sample labeling.

Sample types        Correct labeling    Incorrect labeling    Correct rate (%)
Positive samples    1278/1279           1/1279                99.92
Negative samples    2183/2186           3/2186                99.86
Overall             3461/3465           4/3465                99.88

Table 2. Training time comparison.

Method                      With extra 100P/200N    With extra 300P/600N    With extra 900P/1800N
AdaBoost (h)                1.1                     6.7                     22.4
Our PNN based method (s)    15.7                    107.3                   422.5
5.2. System performance

5.2.1. Correct rate of sample labeling
In the first experiment, around 7600 images are processed in the "sensing, sample-generation-and-labeling, retraining" process. During this process, 3856 new samples are generated in the confidence-based online sample query step. After the multi-cue based sample labeling step, 3465 of these new samples are successfully labeled as positive or negative, and 391 as uncertain. The labeling correct rate of these 3465 samples is shown in Table 1. From the experiment, it can be seen that the labeling correct rate is almost 100% and the labeled samples can confidently be used in further classifier retraining.
5.2.2. Classifier learning ability
In the second experiment, the learning ability of the vehicle classifier is assessed using the True Positive Rate (TPR) and the False Detection Rate (FDR). TPR is the percentage of vehicles in the camera's view that are detected, obtained by dividing the number of truly detected vehicles by the total number of vehicles; it measures recall and localization:

TPR = True Positives / All Positives

FDR is the proportion of detections that are not true vehicles, obtained by dividing the number of false positives by the total number of detections; it is the percentage of erroneous detections and measures precision and localization:

FDR = False Positives / (True Positives + False Positives)
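In code, the two metrics are simple ratios over detection counts (a sketch with hypothetical counts):

```python
def tpr(true_pos, all_pos):
    """True Positive Rate: detected vehicles over all vehicles in view."""
    return true_pos / all_pos

def fdr(false_pos, true_pos):
    """False Detection Rate: false detections over all detections."""
    return false_pos / (true_pos + false_pos)

# e.g. 95 of 100 vehicles detected with 10 false boxes:
# tpr(95, 100) -> 0.95, fdr(10, 95) -> ~0.095
```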
Four ROC curves related to four trained classifiers are shown in Fig. 5, which demonstrates the performance of the classifiers when different numbers of training samples are loaded. The four classifiers are the initial classifier and the classifiers retrained with an extra 100P/200N (100 positive samples/200 negative samples), 300P/600N and 900P/1800N, respectively. From the ROC curves, it can be seen that the initial classifier performs very poorly; as extra samples are loaded continuously, the classification ability improves dramatically.

Fig. 5. ROC curves of four classifiers with different numbers of training samples.

5.2.3. Classifier training time
The correct rate of sample labeling and the classifier learning ability are obviously the two most important characteristics; however, for our resource-limited real-time embedded system there is another critical issue, the training time, because whenever new samples are generated the classifier needs to be retrained and updated. In a real-time application, a lower training time means quicker improvement of classifier performance. The training times of our proposed method and a state-of-the-art AdaBoost method are compared in Table 2. Since AdaBoost requires more storage space than our DSP system can supply, the comparison is conducted on a PC with a 2.67 GHz Core 2 Duo processor, 4 GB memory and Ubuntu Linux 11.10. It can be seen from Table 2 that, owing to the feed-forward structure of the PNN, the training time is around 200 times lower than that of the AdaBoost-based method.

Fig. 6. Some of the real road vehicle sensing results. First row: daylight highway situation; second row: rainy day highway situation; third row: daylight urban situation; fourth row: night highway with road lamps. (For interpretation of the references to color in this figure, the reader is referred to the web version of the article.)
5.3. On-road experiment results
Finally, some of the vehicle sensing results in real road situations are shown in Fig. 6. The four rows of images were taken on a daylight highway, a rainy-day highway, a daylight urban road and a night highway with road lamps, respectively. A solid green box marks a detected vehicle, and a dotted red box marks an undetected or falsely detected vehicle. Overall, most on-road vehicles can be sensed successfully, while missed detections and false detections sometimes occur in adverse situations such as partial occlusion and bad weather. We also tried the system on a night road with no road lamps at all; however, the sensing performance was extremely poor due to the very low illumination.
6. Conclusions

In this paper, a smart monocular vision based system to sense vehicles with a camera mounted inside a moving car is built. The "smartness" comes from its self-improving sensing ability during customer use, realized by the proposed online query-labeling-learning framework. Experiments demonstrate that the system works well in most circumstances except extremely low illumination conditions. Our future work is to expand the sensing ability of the system so that it can work in extremely low illumination conditions such as unlit highways.

Acknowledgments

This research was funded partly by the National Natural Science Foundation of China (51305167) and by the Jiangsu University Scientific Research Foundation for Senior Professionals (12JDG10).
References

[1] C. Blanco, F. Jaureguizar, N. García, An efficient multiple object detection and tracking framework for automatic counting and video surveillance applications, IEEE Trans. Consum. Electron. 58 (3) (2012) 857–862.
[2] R. Sun, J. Chen, J. Gao, Fast pedestrian detection based on saliency detection and HOG-NMF features, J. Electron. Inf. Technol. 35 (8) (2013) 1921–1926.
[3] Z. Cai, M. Gu, Traffic sign recognition algorithm based on shape signature and dual tree-complex wavelet transform, J. Cent. South Univ. Technol. 20 (4) (2013) 433–439 (English edition).
[4] R. Zhang, S. Zhang, S. Yu, Moving objects detection method based on brightness distortion and chromaticity distortion, IEEE Trans. Consum. Electron. 53 (3) (2007) 1177–1185.
[5] J. Kim, D. Yeom, Y. Joo, Fast robust algorithm of tracking multiple moving objects for intelligent video surveillance systems, IEEE Trans. Consum. Electron. 57 (3) (2011) 1165–1170.
[6] Z. Sun, G. Bebis, R. Miller, On-road vehicle detection: a review, IEEE Trans. Pattern Anal. Mach. Intell. 28 (5) (2006) 694–711.
[7] Q.B. Truong, H.N. Geon, B.R. Lee, Vehicle detection and recognition for automated guided vehicle, in: Proc. ICCAS-SICE, Fukuoka, Japan, 2009, pp. 671–676.
[8] A. Haselhoff, S. Schauland, A. Kummert, A signal theoretic approach to measure the influence of image resolution for appearance-based vehicle detection, in: IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 2008, pp. 822–827.
[9] D. Ponsa, A. Lopez, F. Lumbreras, J. Serrat, T. Graf, 3D vehicle sensor based on monocular vision, in: Proc. IEEE Intelligent Transportation Systems, 2005, pp. 1096–1101.
[10] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, IEEE, 2001.
[11] S. Wender, K. Dietmayer, 3D vehicle detection using a laser scanner and a video camera, IET Intell. Transp. Syst. 2 (2) (2008) 105–112.
[12] D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788–791.
[13] D. Specht, Enhancement to probabilistic neural networks, in: IEEE International Joint Conference on Neural Networks, Baltimore, MD, 1992, pp. 7–11.
[14] D. Hoiem, A. Efros, M. Hebert, Recovering surface layout from an image, Int. J. Comput. Vis. 75 (1) (2007) 151–172.
[15] M. Li, I.K. Sethi, Confidence-based active learning, IEEE Trans. Pattern Anal. Mach. Intell. 28 (8) (2006) 1251–1261.
[16] X. Ren, J. Malik, Learning a classification model for segmentation, in: Ninth IEEE International Conference on Computer Vision, 2003, pp. 10–17.