Learning full-reference quality-guided discriminative gradient cues for lane detection based on neural networks


Jingyi Liu

School of Intelligent Manufacturing and Automobile, Chongqing College of Electronic Engineering, Chongqing, China
Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia

Article history: Received 1 July 2019; Revised 6 October 2019; Accepted 6 October 2019; Available online 9 October 2019.

Keywords: Lane detection; Full-reference IQA; CNN; RNN

Abstract

Learning an intelligent lane detection system is significant to autonomous vehicles, for which it is a crucial module. Although conventional approaches have achieved impressive performance, they suffer from the following limitations: (1) lane perception is confronted with different weather conditions and varied illumination, and existing methods lack a unified framework for characterizing different sceneries; and (2) images are used inefficiently due to potential label noise. To address these limitations, we propose a lane detection framework for autonomous vehicles that learns a full-reference quality-aware discriminative gradient deep model, in which two types of deep networks are proposed. More specifically, we first design a gradient-guided deep convolutional network to detect the presence of lanes, since the gradient value along a lane edge is larger than that of other regions. We leverage a full-reference image quality assessment (FR-IQA) method to discover more discriminative gradient cues, and geometric attributes are exploited simultaneously. Subsequently, a recurrent neural layer is designed to represent the spatial distribution of detected lanes whose visual cues are difficult to define explicitly. Notably, we utilize only a small proportion of the labeled images, while noisy features are discarded using a sparsity penalty. Extensive experiments demonstrate the effectiveness of the proposed method.

1. Introduction

With the development of smart cars, autonomous vehicles have become a hot research topic in both academia and industry. As a key technique in autonomous vehicles, lane detection, whose target is to detect the position of lanes, is an indispensable module for lane departure warning and lane keeping systems [1]. Autonomous vehicles can locate their position and adjust their orientation in real time by utilizing detected lanes. Besides, lane detection can be used for vehicle route planning, which is a significant contribution to urban traffic and robot path selection. For example, with the development of e-commerce, online shopping has become a trend, which promotes the development of logistics; applying lane detection in intelligent transportation can provide route planning for autonomous trucks. In general, lane markings are either white or yellow lines. Lane detection is a big challenge in computer vision because a

lane is a small region within an image, consisting of continuous or discontinuous lines. Even worse, lane detection can become impossible due to obstacles such as vehicles, trees, street lamps or pedestrians. Bad weather is an important factor affecting lane detection: the difficulty of detecting lanes on rainy days is far greater than in good weather. Moreover, illumination is another influencing factor; under poor illumination, lane detection is extremely difficult. In summary, lane detection remains an open problem due to the following limitations: (1) Different weather conditions and illumination affect the accuracy of lane detection, and obstacles further increase the difficulty. Existing algorithms cannot exploit the visual cues of lane markings effectively; in addition, they do not encode geometric information, which provides crucial cues for lane detection. (2) In practice, when lane markings are not clearly visible or are missing, the accuracy of lane detection drops dramatically. Objects that are very similar to lane markings, such as street lamps, also affect detection.


Standard datasets for lane detection are also lacking. In addition, noise, such as lane-like clutter, is inevitably introduced during annotation. Existing lane detection algorithms mainly assume a noise-free environment, so their generalization capacity is poor.

In our research, we observe that lane markings are always white or yellow, so the gradient value along a lane edge is much larger than that of other regions. As shown in Fig. 1, we randomly sample some regions in a greyscale image; red rectangles denote lane edges and green rectangles denote other regions. Obviously, the gradient change in the red rectangular regions is much larger than in the others. Inspired by this, we propose a novel lane detection method that learns a noise-tolerant deep model. More specifically, we first design a gradient-guided deep convolutional network to detect the presence of lanes, since the gradient of a lane edge is larger than that of other regions. Besides, the two edges of a lane marking are parallel and their direction vectors point the same way, so we make full use of these geometric attributes. Subsequently, a recurrent neural layer is designed to cope with the spatial distribution of detected lanes whose visual cues are difficult to define explicitly. Notably, we utilize only a small proportion of tagged images, and noisy features are discarded via a sparsity penalty.

The main contributions of our proposed method can be summarized as follows: (1) We design two kinds of deep networks for lane detection, where both gradient information and geometric attributes are utilized to detect lane cues that are not easily visible. (2) Our proposed method is noise-tolerant: we design a sparsity penalty to remove labels contaminated with noise.

2. Related work

Our proposed method is related to three research topics: image quality assessment, deep neural networks and lane detection.

2.1. Image quality assessment

In our implementation, we leverage a full-reference image quality assessment method [5,6] to select regions that are similar to lane regions. FR-IQA requires a reference image against which the difference of the test image is computed. Sheikh et al. [4] conducted an extensive subjective IQA study in which 779 distorted images were evaluated; more than 25,000 individual judgments were leveraged to evaluate the performance of full-reference assessment algorithms across various numbers of images, human judgments and distortion types.

Fig. 1. Gradient features in a greyscale image. The gradient value of a lane edge (red rectangles) is larger than that of other regions (green rectangles). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Wang et al. [2] proposed a structural similarity metric for image quality assessment, based on the assumption that the structural information within a scene is highly perceptible to the human visual system. They designed the Structural Similarity Index (SSIM) to measure the relationship between original and evaluated images. Wang et al. [3] further developed a multi-scale SSIM (MS-SSIM) algorithm, which incorporates various viewing conditions and is more flexible than the single-scale algorithm [2].

2.2. Deep neural networks

In recent years, deep neural networks have achieved impressive performance in image classification, recognition and segmentation [29,30]. Classical networks include AlexNet [9], VGG [26], GoogleNet [27] and ResNet. Itti et al. [7] proposed one of the earliest neural saliency detection models, where multiple image features such as color, intensity and orientation are combined into a saliency map; a dynamical neural network then selects attended locations to reduce the number of salient regions to be examined. Hinton et al. [8] leveraged deep neural networks for speech recognition, outperforming HMM- and GMM-based algorithms by a large margin. Krizhevsky et al. [9] designed a deep CNN for image classification on the 1.2 million images of the ImageNet dataset, covering 1000 classes; a dropout layer was leveraged to alleviate overfitting. Silver et al. [10] designed the famous AlphaGo program based on deep neural networks and tree search in 2016: "value networks" evaluate board positions, moves are selected by "policy networks", and a Monte Carlo tree search algorithm is incorporated with both. Ciresan et al. [11] proposed multi-column deep networks for image classification, with a winner-take-all strategy for neuron training; experiments on MNIST classification demonstrated competitiveness with state-of-the-art algorithms. Graves et al. [12] designed an LSTM RNN architecture for speech recognition and achieved a 17.7% test error on the TIMIT benchmark.

2.3. Lane detection

Lane detection is an enabling technique for autonomous vehicles, assisting cars in adjusting their position to drive along the road. Yoo et al. [1] proposed a gradient-enhancing conversion algorithm for lane detection with strong illumination robustness. They first converted RGB images to greyscale; afterwards, adaptive Canny edge detection, the Hough transform and curve model fitting were leveraged to plot lane markings. The method achieved 96% lane detection accuracy under normal illumination and more than 93% under poor illumination. Hillel et al. [13] conducted a comprehensive survey on road and lane detection, a big contribution to related research. Cheng et al. [14] proposed a lane detection framework for moving vehicles in traffic scenes. They designed a color-based lane marking extraction algorithm that is robust to both illumination and occlusion by moving cars; vehicles whose color is similar to lane markings are distinguished using shape, size and motion information, and lane detection is then achieved through pixel accumulation in the extracted lane mask.
Jiang et al. [15] proposed a particle filter-based algorithm for lane detection, in which an enhanced Euclidean distance was designed to compute an edge map of a road image. In [16], Jiang et al. designed a vision system for multiple-lane detection, handling straight and curved paths with different strategies: for straight lines, the Hough transform was utilized, while a complete


perspective transformation was leveraged to cope with curved paths. Kong et al. [17] decomposed lane detection into two steps, vanishing point estimation and road area segmentation, with a soft voting scheme as the key technique. Li et al. [18] designed two kinds of deep networks for scene understanding: a multitask deep CNN architecture detects the presence of the target, while recurrent networks exploit spatial information; experiments on lane detection demonstrated the effectiveness of the method. Kaliyaperumal et al. [21] proposed lane detection under obstacles in radar images, adaptive to all weather conditions: prior information on road shape was leveraged to detect road edges, and the Metropolis algorithm was utilized to deform a template so as to exploit the underlying gradient information. Ma et al. [22] proposed a multisensor fusion algorithm for both lane and pavement detection: boundary detection is achieved with a Bayesian multisensor image fusion method, in which a deformable template model characterizes the boundary of interest and an empirical MAP estimate approximates the standard MAP estimate. Sach et al. [23] conducted road profile detection from low-texture images, where an edge map was utilized to accelerate convergence in stereo matching. Cremean et al. [24] leveraged a Kalman filter for road geometry estimation based on a clothoid model. Takagi et al. [25] utilized on-vehicle LIDAR for road environment recognition, with a model-based algorithm recognizing objects in an absolute coordinate system.

3. Proposed method

We argue that gradient information is crucial to lane detection because the gradient values along lane marking edges are much larger than those of other regions. In our implementation, we design a gradient-guided deep convolutional network to detect the presence of the target, i.e., lane markings. The pipeline of our method is shown in Fig. 2. The key techniques are a gradient-guided CNN architecture that detects cues of lane markings and an RNN architecture that copes with invisible cues. In addition, our deep model is trained in a noise-tolerant manner using a sparsity penalty.

3.1. FR-IQA gradient-guided deep networks

In general, the gradients of edges and corners are larger, so gradient information reflects the shape of objects. The histogram of oriented gradients (HOG) is a classical low-level feature, widely used in pedestrian tracking as well as vehicle recognition and tracking. Each pixel has a gradient magnitude and orientation. For RGB images, the gradient of a pixel is calculated over the three channels: the magnitude is the largest of the three channel magnitudes, and the orientation is that of the channel with the largest magnitude. For each pixel, we define its gradient information as follows:

$g = (m, \theta) \quad (1)$

where $m$ denotes the gradient magnitude and $\theta$ the gradient orientation. As shown in Fig. 3, we leverage the Sobel filter kernels $S_x$ and $S_y$ to calculate the derivative of a pixel in both the x-direction and the y-direction, by convolving the Sobel filters with the image:

$\frac{\partial f}{\partial x} = S_x * f \quad (2)$

$\frac{\partial f}{\partial y} = S_y * f \quad (3)$

where $f$ denotes the image and $*$ denotes the convolution operator. The gradient vector can then be defined as follows:

$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)^{T}. \quad (4)$

The magnitude and orientation of the gradient are calculated as:

$m = \sqrt{ \left( \frac{\partial f}{\partial x} \right)^{2} + \left( \frac{\partial f}{\partial y} \right)^{2} } \quad (5)$

$\theta = \tan^{-1}\!\left( \frac{\partial f}{\partial y} \Big/ \frac{\partial f}{\partial x} \right). \quad (6)$
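For concreteness, the following minimal NumPy/SciPy sketch computes the per-pixel gradient field of Eqs. (1)-(6), including the channel-maximum rule for RGB images described above. The 3 × 3 Sobel kernels, the symmetric boundary handling and all helper names are our own illustrative choices, not details fixed by the paper.

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 Sobel kernels S_x and S_y of Eqs. (2)-(3) (a common choice;
# the paper does not specify the kernel size).
SX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=np.float64)
SY = SX.T

def gradient_field(gray):
    """Per-pixel magnitude m (Eq. (5)) and orientation theta (Eq. (6))
    for a 2-D greyscale image."""
    dfdx = convolve2d(gray, SX, mode="same", boundary="symm")  # Eq. (2)
    dfdy = convolve2d(gray, SY, mode="same", boundary="symm")  # Eq. (3)
    m = np.hypot(dfdx, dfdy)          # Eq. (5)
    theta = np.arctan2(dfdy, dfdx)    # Eq. (6), quadrant-aware arctan
    return m, theta

def rgb_gradient_field(rgb):
    """For RGB input, keep per pixel the channel with the largest
    gradient magnitude, as described in Section 3.1."""
    mags, thetas = zip(*(gradient_field(rgb[..., c]) for c in range(3)))
    mags, thetas = np.stack(mags), np.stack(thetas)
    best = np.argmax(mags, axis=0)
    rows, cols = np.indices(best.shape)
    return mags[best, rows, cols], thetas[best, rows, cols]
```

With a sketch like this, lane edges such as the red rectangles in Fig. 1 show up as rows of large entries in the returned magnitude map.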

Fig. 3 shows that the largest gradient values lie along the edges of the lane markings. Inspired by this observation, we propose a gradient-guided CNN architecture to maximize the utilization of gradient information. To select high-quality lane regions from the original image, we leverage a full-reference image quality assessment (FR-IQA) algorithm. Specifically, we define a distance metric to capture the structural cues of lane marking gradients:

$D(x, y) = f\big( s(x, y),\, b(x, y),\, g(x, y) \big) \quad (7)$

where $s(x, y)$ denotes the structural similarity, $b(x, y)$ the brightness factor and $g(x, y)$ the gradient similarity. $s(x, y)$ is defined as follows:

$s(x, y) = \frac{2 R_x R_y}{R_x + R_y} \quad (8)$

where $R_x$ denotes the structure of the reference image and $R_y$ the structure of the test image. $b(x, y)$ is defined as:

$b(x, y) = \frac{2(1 + S)}{1 + (1 + S)^{2}} \quad (9)$

Fig. 2. The pipeline of our proposed method. We propose two kinds of deep networks to detect the visual cues of lane markings, and we leverage a quality model to choose the most discriminative gradient cues.



Fig. 3. An example of the gradient of lane markings, where the blue arrows denote the orientation of the gradient and their lengths denote its magnitude. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

where $S$ denotes the brightness change. $g(x, y)$ is defined as:

$g(x, y) = \frac{2 g_x g_y}{g_x + g_y} \quad (10)$

where $g_x$ and $g_y$ denote the gradient factors of the reference image and the test image, respectively. Formula (7) evaluates the similarity between a reference image $x$ and a test image $y$. In our implementation, regions of the test image with a high value of $D(x, y)$ are deemed high-quality, and we select these regions for further processing.
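To make the quality metric concrete, here is a hedged sketch of Eqs. (7)-(10). The paper leaves the combination $f(\cdot)$ and the extraction of $R$, $S$, $g_x$ and $g_y$ from pixels unspecified, so the product combination, the patch statistics and the `grad_mag` callable (e.g. `lambda p: gradient_field(p)[0]` from the sketch above) are all our own assumptions.

```python
import numpy as np

EPS = 1e-8  # guards divisions by zero; our own addition

def structure_factor(Rx, Ry):   # Eq. (8)
    return 2.0 * Rx * Ry / (Rx + Ry + EPS)

def brightness_factor(S):       # Eq. (9)
    return 2.0 * (1.0 + S) / (1.0 + (1.0 + S) ** 2)

def gradient_factor(gx, gy):    # Eq. (10)
    return 2.0 * gx * gy / (gx + gy + EPS)

def quality_D(ref_patch, test_patch, grad_mag):
    """Sketch of Eq. (7). Assumed here: R is the patch standard deviation,
    S the relative mean difference, g the mean gradient magnitude, and
    f(.) a simple product of the three factors."""
    Rx, Ry = ref_patch.std(), test_patch.std()
    S = (test_patch.mean() - ref_patch.mean()) / (ref_patch.mean() + EPS)
    gx, gy = grad_mag(ref_patch).mean(), grad_mag(test_patch).mean()
    return (structure_factor(Rx, Ry)
            * brightness_factor(S)
            * gradient_factor(gx, gy))
```

Under this reading, patches scoring high against a reference lane patch are kept as high-quality candidate regions.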



An advantage of our deep network is that its input is the whole image rather than a region of interest (ROI); thus, the designed network forms an end-to-end training framework. To make full use of gradient information, we design a gradient-activation layer (GAL) that computes gradients and searches for the largest ones, which are likely to represent lane marking edges. To discover the most discriminative gradient cues, we leverage a no-reference image quality assessment method following [28]. In addition, geometric attributes carry further crucial information: as shown in Fig. 3, the two edges of a lane marking have similar texture, so we expect the gradient information of the two edges to be similar or symmetric. Specifically, we concatenate each pixel whose gradient magnitude is the largest within its 8 × 8 neighborhood to form candidate lane regions. Notably, there are some inherent constraints: (1) all lane markings are either straight lines or curves; (2) for straight lane markings, the two edges are parallel, while for curved lane markings, inspired by differential calculus, we assume the two edges are parallel within a sufficiently small region. Therefore, the loss function of our gradient-guided convolutional neural network is:

$L = \frac{1}{2} \sum_{j=1}^{l} \big\| \hat{H}_j^{q} - H_j^{q} \big\|^{2} + \lambda \sum_{j=1}^{l} \big\| \hat{d}_j^{q} - d_j^{q} \big\|^{2} \quad (11)$

where $\hat{H}_j^{q}$ is the j-th output and $H_j^{q}$ is the ground truth, $\hat{d}_j^{q}$ denotes the direction of the j-th output and $d_j^{q}$ the direction of the ground truth, and $\lambda$ is a regularization parameter. In our network, we define $\{X^{p}\}_{p=1}^{P}$ as the input feature maps and $\{H^{q}\}_{q=1}^{Q}$ as the output feature maps. Our gradient-activation layer activates neurons based on gradient values; the output feature map is defined as:

$H^{q} = f\Big( \sum_{p=1}^{P} X^{p} * G^{q,p} + b^{q} \Big), \quad \text{s.t. } f(x) = \frac{1}{1 + e^{-x}} \quad (12)$

where $G^{q,p}$ denotes a bank of gradient filters, $b^{q}$ the bias of the output feature map, and $*$ the convolution operator. By minimizing the loss in formula (11), we obtain a series of candidate lane markings within the image. Afterwards, we design an RNN architecture to further detect the cues of lane markings.
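As an illustration of Eq. (12), the sketch below gives a minimal NumPy/SciPy forward pass of the gradient-activation layer; the filter-bank contents, shapes and function names are placeholders of our own, not the paper's implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gal_forward(X, G, b):
    """Eq. (12): H^q = f( sum_p X^p * G^{q,p} + b^q ).

    X: input feature maps, shape (P, H, W)
    G: gradient filter bank, shape (Q, P, kh, kw)
    b: biases, shape (Q,)
    Returns output feature maps of shape (Q, H, W).
    """
    out = []
    for q in range(G.shape[0]):
        # Sum the per-channel convolutions, then add the bias and squash.
        acc = sum(convolve2d(X[p], G[q, p], mode="same")
                  for p in range(X.shape[0]))
        out.append(sigmoid(acc + b[q]))
    return np.stack(out)
```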

3.2. Recurrent neural networks

Based on the candidate lane markings detected by the gradient-guided CNN, we propose a recurrent neural network for lane structure prediction and visual cue detection. The internal state of an RNN can represent dynamic sequences: unlike feedforward neural networks, an RNN can utilize its internal memory to process input sequences of arbitrary length. We recognize that lane marking cues share a similar spatial structure, such as orientation and position in the image; lanes tend to appear at the bottom of an image rather than the top, which generally contains sky or buildings. In addition, under poor illumination the gradient of a lane region is similar to that of a street lamp region, both having large gradient magnitudes, yet their geometric properties are quite different. Our RNN architecture can detect these visual cues. To perceive the global geometric attributes of an image, the model must preserve internal memory; we argue that as the model traverses an image, the memory of one part affects another, so meaningful geometric attributes emerge when the model is trained properly. Fig. 4 shows the structure of the designed RNN architecture. Candidate lane regions generated by the gradient-based CNN contain uncertain objects, such as street lamps. To exploit the geometric attributes of the target regions, we leverage recurrent neurons to detect visual cues; an LSTM [19] is utilized in our implementation. We define a standard recurrent neuron as follows:

$h_i^{d+1} = f\big( a_i^{d+1} \big) \quad (13)$

$a_i^{d+1} = \sum_j w_{ij} x_j^{d+1} + \sum_k u_{ik} h_k^{d} \quad (14)$

where $f(\cdot)$ is a nonlinear activation, $h_i^{d}$ denotes the state of the i-th neuron at step d, $x$ denotes a neuron of the previous layer, and $w$ and $u$ denote connection weights. A recurrent neuron cell is shown in Fig. 5. Considering the three types of gates, formula (14) can be reorganized as follows:

$a_i^{d+1} = c_i^{d+1} a_i^{d} + b_i^{d+1}\, g\Big( \sum_j w_{ij} x_j^{d+1} + \sum_k u_{ik} h_k^{d} \Big) \quad (15)$

where $c$ and $b$ denote the forget/keep gate and the input gate, respectively; like $f(\cdot)$, $g(\cdot)$ is a nonlinear activation. The net input of each gate consists of three components: signals from the previous layer, the hosting cell and the previous output. This can be formulated as:

$a_i^{\alpha, d+1} = g\Big( \sum_j w_{ij}^{\alpha} x_j^{d+1} + \sum_k v_{ik}^{\alpha} h_k^{d} + v_i^{\alpha} a_i^{d} \Big) \quad (16)$

where $\alpha$ denotes a gate in $\{b, c, d\}$.
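For illustration, the NumPy sketch below runs one step of the gated recurrent neuron of Eqs. (13)-(16). The logistic gate activation, tanh for $g(\cdot)$, the output-gated form of Eq. (13) and the parameter layout are our own assumptions in the spirit of a standard LSTM cell [19]; the paper does not fix these choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(Wa, Va, va, x_new, h_prev, a_prev):
    """Eq. (16): a gate's activation from the previous layer's signal,
    the previous output and the hosting cell's state."""
    return sigmoid(Wa @ x_new + Va @ h_prev + va * a_prev)

def recurrent_step(p, x_new, h_prev, a_prev):
    """One step of the gated recurrent neuron, Eqs. (13)-(15).
    p is a dict of weight matrices/vectors with assumed keys."""
    c = gate(p["Wc"], p["Vc"], p["vc"], x_new, h_prev, a_prev)  # forget/keep
    b = gate(p["Wb"], p["Vb"], p["vb"], x_new, h_prev, a_prev)  # input
    d = gate(p["Wd"], p["Vd"], p["vd"], x_new, h_prev, a_prev)  # output
    net = p["W"] @ x_new + p["U"] @ h_prev        # Eq. (14)
    a_new = c * a_prev + b * np.tanh(net)         # Eq. (15), g(.) = tanh
    h_new = d * np.tanh(a_new)                    # Eq. (13), output-gated
    return h_new, a_new
```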


Fig. 4. Structure of the RNN architecture, where candidate lane markings are generated by our gradient-guided CNN. Some non-lane regions, such as street lamps, are initially treated as lane markings; predicted lane markings are generated by the recurrent neurons.


Fig. 5. The structure of a recurrent neuron cell.

Fig. 6. Sample images from the tuSimple dataset, which covers different weather conditions.


Notably, to alleviate the influence of noisy labels, we design a sparsity penalty so that only a small proportion of tagged images is utilized and noisy features are discarded. For a series of labels, we minimize the following objective function:

$\min_{l_1, l_2, \ldots, l_n} \frac{1}{2} \| F - VU \|^{2} + \frac{\alpha}{2} \| V \|_{F}^{2} + \frac{\beta}{2} \| W \|_{2,1} \quad (17)$

where $F$ denotes the label matrix, $W$ the transformation matrix, and $U$ the output of the topmost layer. By minimizing formula (17), noisy labels are discarded through a large penalty. The whole framework of our proposed method is summarized as follows:

Algorithm: Lane detection by learning a noise-tolerant model
Input: A set of training images; a test image.
Output: The test image with the detected lane plotted.
Steps:
(1) Select candidate lane markings with the gradient-guided CNN architecture by minimizing formula (11);
(2) Detect visual cues with the RNN architecture based on formulas (13)-(16), with formula (17) for noise-tolerant training;
(3) Detect lanes using the learned deep model.
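As a rough illustration of the noise-tolerant objective in step (2), the sketch below evaluates formula (17); the matrix shapes are our own assumptions, and the $\ell_{2,1}$ row-norm is what drives entire noisy feature rows toward zero.

```python
import numpy as np

def l21_norm(W):
    # Sum of row-wise l2 norms; rows with small norm are pushed toward
    # exactly zero, which is how noisy features get discarded.
    return np.linalg.norm(W, axis=1).sum()

def noise_tolerant_objective(F, V, U, W, alpha, beta):
    """Value of formula (17). Shapes are illustrative only: F is the
    label matrix, U the top-layer output, V and W the factor and
    transformation matrices named in the text."""
    fidelity = 0.5 * np.linalg.norm(F - V @ U) ** 2   # data-fit term
    return (fidelity
            + 0.5 * alpha * np.linalg.norm(V, "fro") ** 2
            + 0.5 * beta * l21_norm(W))
```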

4. Experiments and analysis

This section introduces our experiments, which are conducted on a DELL Precision 5530 equipped with an i7 CPU, a 512 GB SSD and an Nvidia 1080Ti GPU. The experiments cover implementation details and comparative studies.

4.1. The implementation details

We leverage the tuSimple dataset for training and testing; it consists of over 3600 training images and 2700 testing images, covers different weather conditions, and is the largest dataset for evaluating lane detection performance. Sample images are shown in Fig. 6. Our method is implemented in the deep learning framework Caffe. The training images are first resized to 320 × 240, and the whole dataset is trained for 10 epochs with a batch size of 128. We set the initial learning rate to 0.1; after one epoch of training it is reduced to 0.05. The weight decay is set to 0.0002 and the momentum to 0.8. The whole model is trained using stochastic gradient descent (SGD); a hedged sketch of an equivalent optimizer setup is given below.
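The paper trains with Caffe; purely as an illustration of the stated hyperparameters, the same SGD schedule could be set up in PyTorch as follows. The placeholder model and the placement of the step decay are our own assumptions.

```python
import torch

# Placeholder standing in for the two-stage CNN+RNN model (hypothetical).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1))

# SGD with the Section 4.1 hyperparameters: lr 0.1 (halved to 0.05 after
# the first epoch), momentum 0.8, weight decay 2e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.8, weight_decay=2e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[1],
                                                 gamma=0.5)  # 0.1 -> 0.05

for epoch in range(10):        # 10 epochs, batch size 128 per the paper
    # ... iterate mini-batches, backpropagate the losses of
    # formulas (11) and (17), and call optimizer.step() ...
    scheduler.step()
```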

4.2. Comparative experiments

We conduct comparative experiments with other state-of-the-art algorithms to highlight the performance of our method; the results, shown in Table 1, demonstrate that our method outperforms the other competitors. Our gradient-guided CNN and noise-tolerant RNN complement each other: the CNN architecture detects candidate lane marking regions based on gradient magnitude and orientation, while the RNN architecture detects the visual cues of these candidate regions.

Table 1. Comparative experiments of different algorithms (the best performer is highlighted in bold in the original).

Method        Accuracy   False positive   False negative
Dpantoja      96.17      0.2362           0.0363
VPGNet        96.81      0.0763           0.0214
Aslarry       96.51      0.0851           0.0263
Leonardoli    96.87      0.0442           0.0193
XinggangPan   96.43      0.0627           0.0181
Jung [20]     96.32      0.0548           0.0173
Ours          96.94      0.0533           0.0212


Fig. 7. ROC curves under noise-tolerant and non-noise-tolerant learning.

Fig. 8. Lane detection results using our trained model.


Table 2. IQA accuracy of different algorithms on the LIVE and TID2008 datasets.

LIVE dataset
Method      PLCC     MAE      RMS      SRCC     KRCC
VSNR        0.9228   0.8075   10.52    0.9271   0.7613
VIF         0.9592   0.6148   7.671    0.9636   0.8272
PSNR        0.9259   0.7962   10.36    0.9291   0.7658
MAD         0.9395   0.7292   9.366    0.9439   0.7920
DW-PSNR     0.9164   0.7957   10.97    0.9192   0.7578
CTW-PSNR    0.9352   0.7204   9.689    0.9373   0.7856
SW-PSNR     0.9276   0.7782   10.17    0.9274   0.7679
IW-PSNR     0.9334   0.7328   9.842    0.9335   0.7796
DW-SSIM     0.9562   0.6213   8.049    0.9574   0.8199
Ours        0.9528   0.6478   9.371    0.9545   0.8184

TID2008 dataset
Method      PLCC     MAE      RMS      SRCC     KRCC
VSNR        0.6823   0.6911   0.9814   0.7052   0.5338
VIF         0.8098   0.5996   0.7892   0.7496   0.5862
PSNR        0.5524   0.8033   1.1187   0.5615   0.4508
MAD         0.7483   0.6644   0.8907   0.7703   0.5735
DW-PSNR     0.5848   0.8102   1.0881   0.5887   0.4393
CTW-PSNR    0.6482   0.7338   1.0218   0.6598   0.5079
SW-PSNR     0.6293   0.7581   1.0433   0.6424   0.4862
IW-PSNR     0.6668   0.7179   1.0012   0.6832   0.5262
DW-SSIM     0.8039   0.5913   0.7978   0.8166   0.6223
Ours        0.8055   0.5874   0.8886   0.7192   0.6271


Table 3. IQA accuracy compared with NR-IQA methods (the best performer is highlighted in bold in the original).

              LIVE               TID2008
Method        LCC      SROCC     LCC      SROCC
BLIINDS-II    0.916    0.912     0.628    0.536
DIIVINE       0.923    0.925     0.654    0.549
BRISQUE       0.942    0.939     0.651    0.573
NIQE          0.915    0.914     0.426    0.317
CORNIA        0.935    0.942     0.613    0.549
Ours          0.947    0.926     0.663    0.604

In addition, our noise-tolerant training strategy can discard noisy labels and minimize the use of manual labels. To highlight the advantages of the noise-tolerant learning algorithm, Fig. 7 shows ROC curves with and without noise-tolerant learning; noise-tolerant learning clearly performs better. Fig. 8 shows detection results produced by our trained model. Compared with other algorithms, our method has three advantages. First, it makes full use of gradient information, which clearly separates lane edges from other regions. Second, it leverages the CNN architecture to select candidate lane markings and the RNN to refine them further, so our training strategy combines the advantages of the two deep networks. Third, a noise-tolerant learning algorithm is designed to alleviate the influence of noisy labels.

4.3. Comparative study on IQA

In our implementation, we leverage an FR-IQA method to select high-quality regions. We conduct comparative experiments on IQA datasets against different algorithms, as shown in Table 2; our FR-IQA method achieves competitive performance. In addition, the most widely applied IQA methods, no-reference IQA (NR-IQA) methods, are also used for comparison; the results are shown in Table 3.

5. Conclusion

Lane detection is widely applied in modern intelligent systems, especially autonomous vehicles. Traditional algorithms leverage Hough transform-based approaches to detect lane markings, but they do not achieve satisfactory performance; moreover, conventional methods do not take gradient and geometric information into account, even though the gradient value along a lane edge is larger than that of other regions. In this paper, we propose a lane detection framework that learns a noise-tolerant deep model, with two kinds of deep networks designed to detect the visual cues of lane markings. More specifically, since the gradient value along a lane edge is larger than that of other regions, we first design a gradient-guided deep convolutional network to detect the presence of lanes, exploiting geometric attributes simultaneously. Subsequently, a recurrent neural layer is designed to cope with the spatial distribution of detected lanes whose visual cues are difficult to define explicitly. In our implementation, we utilize only a small proportion of labeled images, and noisy features are discarded via a sparsity penalty. Comprehensive experiments demonstrate the effectiveness of our method.

Declaration of Competing Interest

The authors declare that there is no conflict of interest.

Acknowledgements

This work was supported by the fund project "Research on Lightweight Innovation Technology of Electric Vehicle" (No. XJZK201813).

References

[1] H. Yoo, U. Yang, K. Sohn, Gradient-enhancing conversion for illumination-robust lane detection, IEEE Trans. Intell. Transp. Syst. 14 (3) (2013) 1083–1094.
[2] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.

[3] Z. Wang, E.P. Simoncelli, A.C. Bovik, Multiscale structural similarity for image quality assessment, in: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, vol. 2, IEEE, 2003, pp. 1398–1402.
[4] H.R. Sheikh, M.F. Sabir, A.C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Trans. Image Process. 15 (11) (2006) 3440–3451.
[5] W. Osberger, N. Bergmann, A. Maeder, An automatic image quality assessment technique incorporating high level perceptual factors, in: Proc. IEEE Int. Conf. Image Process., 1998, pp. 414–418.
[6] R.J. Safranek, J.D. Johnston, A perceptually tuned sub-band image coder with image dependent quantization and post-quantization data compression, in: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1989, pp. 1945–1948.
[7] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1254–1259.
[8] G. Hinton, L. Deng, D. Yu, G. Dahl, A.R. Mohamed, N. Jaitly, T. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition, IEEE Signal Process. Mag. 29 (2012).
[9] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[10] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, S. Dieleman, et al., Mastering the game of Go with deep neural networks and tree search, Nature 529 (7587) (2016) 484.
[11] D. Cireşan, U. Meier, J. Schmidhuber, Multi-column deep neural networks for image classification, arXiv preprint arXiv:1202.2745, 2012.
[12] A. Graves, A.R. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 6645–6649.
[13] A.B. Hillel, R. Lerner, D. Levi, G. Raz, Recent progress in road and lane detection: a survey, Mach. Vis. Appl. 25 (3) (2014) 727–745.
[14] H.Y. Cheng, B.S. Jeng, P.T. Tseng, K.C. Fan, Lane detection with moving vehicles in the traffic scenes, IEEE Trans. Intell. Transp. Syst. 7 (4) (2006) 571–582.
[15] R. Jiang, R. Klette, T. Vaudrey, S. Wang, New lane model and distance transform for lane detection and tracking, in: International Conference on Computer Analysis of Images and Patterns, Springer, Berlin, Heidelberg, 2009, pp. 1044–1052.
[16] Y. Jiang, F. Gao, G. Xu, Computer vision-based multiple-lane detection on straight road and in a curve, in: 2010 International Conference on Image Analysis and Signal Processing, IEEE, 2010, pp. 114–117.


[17] H. Kong, J.Y. Audibert, J. Ponce, Vanishing point detection for road detection, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 96–103.
[18] J. Li, X. Mei, D. Prokhorov, D. Tao, Deep neural network for structural prediction and lane detection in traffic scene, IEEE Trans. Neural Networks Learn. Syst. 28 (3) (2017) 690–703.
[19] A. Graves, Supervised sequence labelling with recurrent neural networks, Ph.D. dissertation, Univ. Toronto, Canada, 2009.
[20] S. Jung, J. Youn, S. Sull, Efficient lane detection based on spatiotemporal images, IEEE Trans. Intell. Transp. Syst. 17 (1) (2015) 289–295.
[21] K. Kaliyaperumal, S. Lakshmanan, K. Kluge, An algorithm for detecting roads and obstacles in radar images, IEEE Trans. Vehicular Technol. 50 (2001) 170–182.
[22] B. Ma, S. Lakshmanan, A.O. Hero III, Simultaneous detection of lane and pavement boundaries using model-based multisensor fusion, IEEE Trans. Intell. Transp. Syst. 1 (2000) 135–147.
[23] L. Sach, K. Atsuta, K. Hamamoto, S. Kondo, A robust road profile estimation method for low texture stereo images, in: International Conference on Image Processing, 2009, pp. 4273–4276.
[24] L.B. Cremean, R. Murray, Model-based estimation of off-highway road geometry using single-axis LADAR and inertial sensing, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2006, pp. 1661–1666.
[25] K. Takagi, K. Morikawa, T. Ogawa, M. Saburi, Road environment recognition using on-vehicle LIDAR, in: IEEE Intelligent Vehicles Symposium, 2006, pp. 120–125.
[26] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.
[27] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[28] A. Mittal, A.K. Moorthy, A.C. Bovik, No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. 21 (12) (2012) 4695.
[29] I. Bezzine, M. Kaaniche, S. Boudjit, A. Beghdadi, Sparse optimization of non separable vector lifting scheme for stereo image coding, J. Vis. Commun. Image Represent. 57 (2018) 283–293.
[30] N. Karimi, M.R. Taban, Nonparametric blind SAR image super resolution based on combination of the compressive sensing and sparse priors, J. Vis. Commun. Image Represent. 55 (2018) 853–865.