Copyright © IFAC Control, Computers, Communications in Transportation, Paris, France, 1989
COMPUTER VISION AS A TRAFFIC SURVEILLANCE TOOL

N. Hoose
Abstract. Traffic flows on major highways have now reached the level at which any disruption in flow can lead to severe congestion. Apart from the increased risk of accidents caused by the formation of queues and shock waves, the increase in journey times for road users represents a significant cost to the community. There is a need for systems that can monitor the current state of traffic along critical lengths of highway and report deteriorations in traffic conditions rapidly and consistently to traffic control centres. Currently the most sophisticated traffic surveillance systems combine the output from several point vehicle detectors (e.g. loop detectors) to provide an alarm based on certain combinations of flow, speed and detector occupancy. A CCTV system is often used to look at the highway and verify the alarm. There are two main drawbacks with such systems. Firstly, the traffic data is time based and, due to the random features of traffic flow, the measurements must be assessed over a period of time in order to prevent too many false alarms. Secondly, the large amount of information available from the CCTV system is largely ignored due to the expense and difficulty of having human operators continually monitoring the TV pictures. This paper describes the use of a commercially available, powerful image processing subsystem hosted by a microcomputer to provide continuous monitoring of the images from a TV camera. By utilising the particular architecture of the subsystem to carry out low level image transformations over a 512 x 512 pixel image at high speed, and using the host to perform higher level "vision", an alarm can be generated when the computer detects certain conditions. This can be used at the control centre to attract attention to a particular camera location. The algorithm is based on using the spatial information contained within the image and offers the possibility of rapid response to changes in traffic conditions within the scene. A description of the hardware and the software is given along with details of trials using video recordings of different highway locations. The types of errors found are described and their impact on the likely usage of the system assessed.

Keywords. Traffic control; road traffic; pattern recognition; image processing; computer applications.

INTRODUCTION
The automatic analysis of video images by computer in order to provide traffic data has been the subject of research for the past decade. Most research has been directed towards providing detailed data which could then be used for analysing a particular problem off-line, or be further aggregated and used in a traffic control system. This paper proposes an algorithm which is directed towards traffic surveillance rather than data collection.

When presented with a view of a highway and its traffic a human can very quickly characterise the current situation, e.g. light flow, congestion, incident, without counting or measuring the speed of individual vehicles. The aim of our approach is to mimic this ability and to provide a qualitative description of the current state of the traffic. This should provide a system capable of continuous surveillance which can draw an operator's attention to changes in traffic that may require a traffic control measure to be initiated.

The method of scene analysis described in this paper considers the scene as a whole and attempts to identify regions with common properties that can be related to the current traffic situation. By comparing the measured parameters with their expected values we can identify and locate abnormalities in the traffic flow. This method does not attempt to identify individual vehicles.
The use of image processing techniques for traffic surveillance and incident detection is being studied at a number of institutes around the world. Research carried out in Sweden (Abramczuk, 1984) processes two or three rows of pixels along the carriageway and identifies individual vehicles or platoons of vehicles. At INRETS in France (Beucher, Blosseville and Lenoir, 1987) researchers have proposed an algorithm which allows individual vehicles to be located and tracked. Two UK groups (Hoose and Willumsen, 1987; Houghton et al., 1987) have reported on methods for tracking vehicles through junctions.

Most of the research reported to date has made use of purpose built or highly sophisticated image processing hardware. The utility of the approach described here lies in its use of readily available, low cost computing equipment in combination with software based algorithms. Use is made of the standard image processing functions supported by these systems to provide "building blocks" for the algorithm. Attention is also directed towards obtaining information about "traffic" rather than aggregating individual vehicle data.

TRANSFORMING THE IMAGE

A digital image can be considered as a matrix of numbers whose values represent the average reflected light level for the projected ground area of each pixel. The size of this area is a function of the relative geometry of the camera and the scene. The matrix shows how the light is distributed across the scene but little else. More information can be extracted by considering relative pixel values, e.g. the difference in grey level values of adjacent pixels. Thus, in order to extract more meaningful information from the matrix, transformations must be performed to produce matrices of relative values in both the spatial and time domains. It is thought that biological vision systems utilise the relative light values between different objects and parts of objects rather than the absolute reflected light level.
Detecting movement

If we subtract an image taken at time t from an image taken at time t+δt the differences will be due to four possible causes: movement of the camera, movement of objects within the scene, changes in lighting, and electrical noise. For a fixed camera position subject to minimum vibration the first cause can be eliminated. If we choose δt to be sufficiently small then changes in light levels in a real world scene will be negligible, and a problem which has restricted the effectiveness of some other systems is eliminated. This leaves differences due to moving objects to be differentiated from those due to noise.

The differences caused by moving objects are a result of regions of differing brightness covering or uncovering each other, e.g. a bright region moving over a dark one. If δt is small then these differences will appear at the edges of the region. In real world images most edges are not simple steps but transitions in grey level that occur over several pixels. As the distance moved increases, the size of the difference increases both in magnitude and in area. This increase can be seen in the histogram of an image, as shown in Fig. 1. Increasing movement within the image causes the distribution of grey levels to spread. The left histogram of Fig. 1 shows the distribution of signed differences due to noise, whilst the right shows the histogram of differences when movement is taking place. A comparison of the variance of the noise-only distribution with that of noise plus movement can be used to detect and obtain a measure of the amount of movement, i.e. as the amount of movement increases, the variance of the difference histogram increases.

Detecting vehicles

If the image data is represented as a three-dimensional graph where the vertical axis represents the brightness of the pixel at coordinate x,y, then bright regions will show as peaks and plateaux and dark regions will be seen as valleys and troughs. The steepness of a slope in this graph represents how rapidly the light intensity changes and, in general, a significant gradient corresponds to an actual edge in the image. By performing a spatial convolution an image can be transformed so that the pixel intensity represents the size of the gradient at the pixel location. In traffic scenes the result of an edge detector generally highlights vehicles as complex groups of edges. An individual vehicle will be made up of several regions of differing intensity which in turn are different from the background scene. In most cases the road area in the image has a relatively low edge content, chiefly road markings and kerb lines. The presence of vehicles can be detected by the increase in edge complexity within the road area. This can be measured by analysis of the histogram in the same way as for movement detection.
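The two transformations can be sketched in software. The fragment below is an illustrative sketch only: the paper performs these operations on the image processing subsystem hardware, and the function names and use of NumPy are assumptions rather than part of the original system. It computes the signed frame difference and a simple gradient-magnitude image, and uses the variance of each as the movement and edge-content measures.

```python
import numpy as np

def movement_measure(frame_t, frame_t_dt):
    """Variance of the signed grey-level differences between two frames.

    With noise only the difference histogram stays narrow; movement within
    the scene spreads the distribution and increases the variance.
    """
    diff = frame_t_dt.astype(np.int16) - frame_t.astype(np.int16)
    return float(np.var(diff))

def edge_measure(frame):
    """Variance of a simple gradient-magnitude image.

    Vehicles appear as complex groups of edges, so an occupied road area
    gives a wider gradient distribution than empty road surface.
    """
    img = frame.astype(np.float32)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # central horizontal differences
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # central vertical differences
    return float(np.var(np.hypot(gx, gy)))
```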
ANALYSING THE SCENE

Cell map

In the previous two sections two methods of transforming image data have been described that allow movement and vehicles to be detected by analysing the histograms of the transformed images. At the next stage we need to know how these measures are distributed across the image. To do this each of the transformed images is divided into a rectangular grid of "cells", each of size dx by dy pixels. A cell size of 64 x 64 pixels has been used to date. The histogram for each cell is found and the variance calculated and stored. Thus, in the same way as a pixel represents
the light intensity over an area, each of these cell values represents a measure of the scene content in terms of edges and differences over a group of pixels.
In order to reduce the amount of computation required, and to use context to establish that changes in the cell parameters are a result of traffic, the cells processed are limited to those through which the projection of the carriageway passes. Currently these cells, as illustrated in Fig. 3, are selected interactively at the start of a program run by marking the road edges at the upper and lower edges of the screen image.
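As a rough illustration of the cell map stage (the data layout and interface are assumptions, not the subsystem's actual interface), the transformed 512 x 512 image can be divided into 64 x 64 pixel cells and the variance computed only for the cells marked as lying on the carriageway:

```python
import numpy as np

CELL = 64  # cell size in pixels, as used in the paper

def cell_variances(transformed, road_cells):
    """Per-cell variance of a transformed (difference or edge) image.

    `transformed` is a 512 x 512 array; `road_cells` is the set of
    (row, col) cell indices through which the projected carriageway
    passes, selected interactively at the start of a run.
    """
    values = {}
    for r, c in road_cells:
        block = transformed[r * CELL:(r + 1) * CELL, c * CELL:(c + 1) * CELL]
        values[(r, c)] = float(np.var(block))
    return values
```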
Then: "Moving density"
(1 )
Ne Ns "Queue density"
Ds
( 2)
Ne If we assume that some cells will be incorrectly classified and that more than 2 cells in anyone scene must be set to "stop" before an alarm is raised then a table of scene "states" can be drawn up as in Table 2. Table 2 Scene "States"
Cell "state" As a first stage the values for edges and movement in each cell are compared with a threshold and a "state" for each cell can be determined using the logical table shown in Table 1 The edge threshold, TE, represents the parameter value above which the cell is deemed to contain sufficient edges to indicate the presence of a vehicle. The value of the difference threshold, To, determines the amount of movement that is deemed to be significant within a cell. As this is related to the speed of the vehicles this threshold will govern the speed below which traffic is deemed to be queuing. Each cell has its own value for both thresholds. At this time, these values are determined by manual analysis of the parameter values recorded during a "training" run of 50 cycles. Table 1 Cell "States"
Ds 0 < Ds < 2/Ne
0 0 DM
0< DM < Ne /2
)
2 / Ne
None
Warning
Warning
Medium
Warning
ALARM
Dense
Warning
ALARM
)
Ne / 2
The scene states "Medium" and "Dense" give a qualitative description of the current state of moving traffic. The "Warning" state indicates that there are some cells in the "stop" state but that this may not accurately reflect the actual state. Finally, the "ALARM" state is intended to draw an operators attention.
Movement
)TM
None
Analyse further
Stop
Moving
Edges )TE
In the case where further analysis is required the parameter values are again compared with the threshold but this time their closeness to the threshold value is assessed. If the movement parameter is significantly more than TM then the cell state is set to moving traffic . If the edge parameter is within a set limit of the edge threshold, TE, then the cell state is judged to be "None". Scene "status" By comparing the numbers of cells in each state a description of the scene can be generated . Again a simple threshold comparison has been used. If we define the following: Ne Total number of cells NM Number of cells with state "Moving" Ns Number of cells with state "Stationary"
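A minimal sketch of the decision logic of Tables 1 and 2 follows, assuming per-cell thresholds T_E and T_D are already available and ignoring the second-pass "further analysis" refinement; it illustrates the rules above rather than reproducing the original implementation.

```python
def cell_state(edges, movement, t_e, t_d):
    """Classify one cell from its edge and movement parameters (Table 1)."""
    if edges > t_e:
        return "Moving" if movement > t_d else "Stop"
    return "Analyse further" if movement > t_d else "None"

def scene_state(states):
    """Classify the scene from a list of cell states (Table 2)."""
    n_c = len(states)
    d_m = states.count("Moving") / n_c   # moving density, Eq. (1)
    d_s = states.count("Stop") / n_c     # queue density,  Eq. (2)
    if d_s == 0:
        if d_m == 0:
            return "None"
        return "Dense" if d_m >= 0.5 else "Medium"
    if d_s > 2.0 / n_c:                  # more than two cells stopped
        return "ALARM" if d_m > 0 else "Warning"
    return "Warning"
```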
THE COMPUTER SYSTEM

The program has been implemented on a commercially available image processing subsystem connected to an 80286-based microcomputer, as shown in Fig. 2. The subsystem digitises and processes pixel data at high speed while the host controls the programming of the subsystem and processes the cell data. The traffic images are obtained from video recordings of various sites. The subsystem is modular and comprises a set of cards connected by a VME bus and a proprietary video bus. Pixel data is transferred between the different function and memory cards via the video bus. The system used in the work described in this paper comprised five cards: a digitiser and controller card, two framestore cards each with 1 MB of video storage, an arithmetic/logical processing card and a histogramming card.
The microcomputer host controls the subsystem via a card that connects the host bus to the subsystem VME bus. Instructions can be sent to, and data received from, the subsystem by this route. The subsystem processes the image data up to the histogram stage, with the host issuing the appropriate instructions for this lower level stage of the algorithm. Analysis of the cell histograms and subsequent stages of the algorithm are processed within the host microcomputer.

INITIAL RESULTS
The first analysis of the effectiveness of this algorithm has been carried out using a video recording of the scene shown in Fig. 3. The scene is that of a merging section on a motorway. The camera is mounted on a gantry and the field of view extends approximately 100 metres along the carriageway. During the recording queues form and disperse in different lanes. The cell map comprised 28 cells of 64 x 64 pixels in a square packed arrangement. Over a program run of 100 cycles, at the end of each cycle of the algorithm the state of each cell was displayed using a colour coded overlay on the scene being processed. This computer generated state was manually compared with the actual state and the number of mis-classified cells recorded. Figure 4 shows the distribution of the number of cells incorrectly set in each cycle against the number of cycles with that amount of mis-classified cells. The images processed were sampled at approximately 15 second intervals from a 30 minute recording on VHS videotape. From Fig. 4 the following observations can be made.

The mode value is 2 mis-classified cells per cycle, the median value is 1.54 cells per cycle and the mean is 1.7 cells per cycle. In over 80% of the cycles 3 or fewer cells were incorrectly classified. A more detailed analysis of the results for each cycle is shown in Fig. 5. This compares the computer classification with the manual classification and reveals the occurrence of different sorts of mis-classification. At each cycle the overall scene state was compared with that generated by the computer. At this stage only the presence or otherwise of the "ALARM" state was noted. No false alarms were generated, but 9 cycles occurred where an alarm should have been produced but was not. It was observed that these were generally where a queue had formed at the far end of the scene or where the vehicles were not completely stationary. The following observations were also recorded.

Where a cell is classified as "None" instead of "Moving" the cell is usually in the foreground and mostly occupied by a uniform grey-level region of a heavy goods vehicle (HGV).

Mis-classification of "None" cells as "Moving" occurs either when a moving vehicle is very close to the boundary of the mis-classified cell or where that cell has a large number of significant edges even when it is unoccupied.

CELL CLASSIFICATION ERRORS

The mistakes that occur in trying to set the state for each cell can mostly be attributed to three separate effects.

The "aperture" problem

This is a classic problem in image processing. If a straight edge is viewed in such a way that only a segment of the edge can be seen, then the only component of movement that can be measured is that normal to the edge. Figure 6 illustrates this effect. If this normal component is zero, or very small, then the edge will appear to be stationary. Such a condition can arise in the algorithm because of the use of cells for analysis of the scene. For example, if a cell in the test scene shown in Fig. 3 contains just the vertical edge of an HGV without any horizontal edges then, because the vehicle is moving parallel to the vertical image axis, no movement will be detected. However, the edge convolution will highlight the edge and the cell will be classified incorrectly. If the vehicle has a uniform grey level and occupies most of the cell, as in Fig. 7, then a "None" state will be set. Alternatively, if the cell is only partially occupied then a "Stop" will result (Fig. 7).

Adjacency effects

Consider the situation shown in Fig. 8. A vehicle in the image captured at time t is within cell P, but at time t+dt, when the second of the images is captured, its leading edge has entered cell Q. This will result in a change in the movement parameter for cell Q without a change in the edge parameter. On the first pass this will be designated as "Undefined" and put forward for further analysis. On the second pass, if the movement parameter has increased by more than a predetermined amount (20% in the program run) then it will be defined as "Moving". Thus, apparently empty cells can be set to this state and overestimate the value of the moving density D_M.

Threshold values

In the simple decision model used, the choice of the threshold values, T_E and T_D, is very important. The function of the edge threshold is to allow the computer to discriminate between the background situation and the presence of a vehicle. Too high a value for T_E will cause vehicles whose contrast with the background is low to be ignored, whereas if the value is too low normal variations in the parameter value caused by noise and changes in light level will trigger an incorrect response.
The amount of movement that is detected within a cell depends upon the speed of the moving object, the number of distinct grey level regions of the object within the cell in question, and the camera scene geometry. If the difference threshold, T_D, is too high the program will tend to be oversensitive and flag "Stop" cells unnecessarily. On the other hand, too low a value results in the opposite effect and the program ignores the formation of queues. In order to be able to estimate good values for these thresholds the "training" run should, ideally, contain very little traffic and that traffic should be moving freely. In the results given above this was not possible and the "training" was carried out under heavy but free flowing traffic conditions. This meant that background conditions for those cells covering the road furthest from the camera did not occur and hence suitable threshold values were difficult to determine.
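In the trials above the thresholds were set by manual analysis of the training run. Purely as an illustration of how such a calibration might be automated (an assumption, not the method used in the paper), each cell's threshold could be placed a few standard deviations above the parameter values observed for that cell during a traffic-free training period:

```python
import numpy as np

def train_threshold(background_values, k=3.0):
    """Set one cell's threshold from parameter values recorded in a training run.

    `background_values` is a sequence of the edge or difference parameter for
    the cell while the road is empty or free flowing; `k` is a hypothetical
    tuning constant.
    """
    values = np.asarray(background_values, dtype=float)
    return float(values.mean() + k * values.std())
```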
61
CONCLUDING REMARKS

The results reported above indicate that the general approach outlined in this paper shows considerable potential. The program behaves in a consistent manner and is tolerant of changes in external lighting. The use of commercially available hardware, and the image processing functions supported by that equipment, means that the costs for implementing this algorithm are significantly lower than for systems that utilise specially designed hardware to perform similarly complex processing.

The use of several levels in the algorithm (pixel, cell, scene) allows "noise" in the data at lower levels to be taken into account. Incorrect classification of cell "states" can be regarded as noise at that level, but its presence is taken into account in the scene level stage of the algorithm.

Each processing cycle in the algorithm is independent of any other cycle. Thus the concept of "real time" is not linked to that of video frame rates. The program is "real time" in that it uses images taken from a continuous video input. More important in this system is the concept of "response time", i.e. the time between the images being taken and the scene classification. This of course should be as small as possible, but the actual maximum allowable response time will depend upon the precise circumstances under which the program is being used.

At the cell level of processing there are a number of deficiencies that need to be addressed. Firstly, the form of the cell map itself. As the whole approach is at an early stage a very simple map format, a rectangular grid, was chosen. Use of a smaller cell size, or of a mapping that reflects more of the scene geometry, may overcome some of the errors outlined earlier. It should be noted that the use of rectangular cells is governed by the hardware. Next, the method for determining the cell "states" needs to be assessed to see whether a more sophisticated decision algorithm can be used to improve the accuracy of the classification.

The effectiveness of the scene characterisation has not been discussed in any detail. The characterisation method described here is rather crude and is only capable of producing a simple alarm system. Questions about the traffic itself become important at this stage. What is a queue? When does a platoon of vehicles become a queue? What sort of events does an operator require information about? These remain to be examined once the underlying cell level has been fully investigated. Moreover, the method described does not utilise any of the spatial information from the cell map.

ACKNOWLEDGEMENT

This work has been financed by the Science and Engineering Research Council. The author would like to thank Wootton Jeffreys Consultants for providing sample video tapes for analysis.
REFERENCES

Abramczuk, T. (1984). A microcomputer based TV-detector for road traffic. Symposium on Road Research Program, OECD, Tokyo, Japan.

Beucher, S., Blosseville, J.M. and Lenoir, F. (1987). Traffic spatial measurements using video image processing. SPIE, 848.

Hoose, N. and Willumsen, L.G. (1987). Real time vehicle tracking using the CLIP4 parallel processor. Seminar on Information Technology in Traffic and Transport, PTRC Summer Annual Meeting, University of Bath, UK.

Houghton, A., Hobson, G.S., Seed, L. and Tozer, R.C. (1987). Automatic monitoring of vehicles at road junctions. Traffic Engineering & Control, 28, 10.
Fig. 1  Image histograms for image difference (left: no movement; right: movement); n(I) against grey level.
Fig. 2  Computer system: image processing subsystem (video in/out, A/D and video bus control buffer, frame buffers, arithmetic/logic unit, histogramming card) connected via an AT/VME converter to the PC AT host computer.
"Cell map"
Fig.
3 Video Scene
Fig. 4  Distribution of mis-classified cells (number of cycles against number of mis-classified cells).
Fig. 5  Analysis of cell mis-classification by type (computer state against actual state: None, Moving, Stop, Undef) for the 100 cycle run.
Fig. 6  The aperture problem: actual movement against movement detected within the "aperture".
Fig. 7  Mis-classification of cells due to the aperture problem (cell classified as "None"; cell classified as "Stop").
Fig. 8  Adjacency effect: a vehicle within cell P at time t has its leading edge entering cell Q at time t+dt.