Unsupervised Video Summarization using Cluster Analysis for Automatic Vehicles Counting and Recognizing

Hana RABBOUCH^a,*, Foued SAADAOUI^b, Rafaa MRAIHI^c

^a Université de Tunis, Institut Supérieur de Gestion de Tunis, Cité Bouchoucha - 2000 Tunis, TUNISIA
^b Saudi Electronic University, Abu Bakr Seddiq Branch Road, Al-Rabiea District - 11673 Riyadh, SAUDI ARABIA
^c Université de Manouba, Ecole Supérieure de Commerce de Tunis, Campus Universitaire de La Manouba - 2010 Tunis, TUNISIA

*Corresponding author. Email addresses: [email protected] (Hana Rabbouch), [email protected] (Foued Saadaoui), [email protected] (Rafaa Mraihi).
Abstract

Automatic Vehicles Counting and Recognizing (AVCR) is a very challenging topic in transport engineering, with important implications for modern transport policies. Implementing computer-assisted AVCR in the most vital districts of a country provides a large amount of measurements which can be statistically processed and analyzed, with the purpose of optimizing decision-making in traffic operation, pavement design, and transportation planning. Since the advent of computer vision technology, video-based surveillance of road vehicles has become a key component in developing autonomous intelligent transportation systems. In this context, this paper proposes a pattern recognition system which employs an unsupervised clustering algorithm with the objective of detecting, counting and recognizing dynamic objects crossing a roadway. The strategy defines a virtual sensor whose role is similar to that of an inductive loop in a traditional mechanism, i.e. to extract from the streaming traffic video a set of signals carrying unstructured information about the road traffic. This set of signals is then filtered so as to retain only the significant motion patterns. The resulting data are subsequently processed by a statistical analysis technique in order to estimate and recognize a number of clusters corresponding to vehicles. Finite Mixture Models fitted by the EM algorithm are used to assess such clusters, which provides important information about the traffic, such as the instantaneous number of vehicles, their weights, velocities and intervehicular distances.

Keywords: Video Summarization; Multisignal Processing; Pattern Recognition; Cluster Analysis; Information Retrieval; Vehicles Counting.
1. Introduction
Vehicle Traffic Counting (VTC) is one of the best-known methods and most promising research topics in transport science [3, 17, 31, 42, 47]. VTC plays an important role in collecting raw traffic data, which are crucial in conducting any transportmetrical modelling study. Traditional counting methods are mainly based on the buried inductive-loop traffic detector [11], pneumatic tubes [44], piezoelectric sensors [32], and radars [20]. These devices are often electronic and electromagnetic communication or detection systems that can sense a set of vehicles passing or arriving at a certain point. Such technical equipment is either buried under the roadway surface or embedded into the roadway; for this reason, it is inconvenient to install and difficult to maintain. In comparison with traditional detectors, traffic cameras are installed on the roadside and are considered a major component of most Intelligent Transportation Systems (ITS). Monitoring centers receive real-time live records and perform image analyses. Today, since cameras are easily operated, controlled and maintained, traffic video data have replaced the old-fashioned measurements and are now extensively applied to resolve many other transport problems.

Computer vision is a branch of artificial intelligence that uses appropriate tools for acquiring, processing, analyzing and modelling multidimensional data, especially images, in order to produce numerical or symbolic information. In fact, image data can take many forms, such as video sequences, views from one or many cameras, or multidimensional data from traffic, pedestrian flows, driver surveys, etc. The computer-aided detection and monitoring of moving objects is among the most promising areas that could have a great importance in developing transport technologies. We can find in the literature several recent works in this new field, the most important of which are based on the Background Subtraction (BS) principle [51] and on tracking methodologies [13]. Regarding background subtraction, there are techniques that model the variation of the intensity values of background pixels with unimodal distributions [21], mixtures of Gaussians [22] and nonparametric kernel density estimation [18]. Unimodal models are simple and fast but are not able to adapt to multiple backgrounds (e.g., when there are trees moving in the wind). Gaussian mixture approaches can cope with these moving backgrounds but cannot handle fast variations accurately with a few Gaussians; therefore, this method has problems with the sensitive detection of foreground regions. Nonparametric kernel density estimation overcomes this constraint and allows quick adaptation to background changes [46]. To provide temporal coherence to the measurements, tracking methodologies are typically applied [30]. The two main variants are those considering 2-D image objects and those inferring 3-D volumes and positions using camera calibration. Nevertheless, as explained in [46], 2-D estimations often lack the required accuracy in classification strategies, while 3-D estimation alternatives are usually designed for simplified urban scenarios, with reduced vehicle speeds and no overtaking maneuvers, which makes the classification problem easier.
However, to accommodate the real needs of many applications, the two above-mentioned methodologies must be computationally inexpensive and have low memory requirements, while still being able to accurately identify moving objects in the video. On the other hand, many other modelling methodologies lack mathematical rigor, and are static and qualitative, and thus can be difficult to implement and extend. In this paper we develop a methodology for implementing a Statistical Machine Learning (SML)-based algorithm that performs unsupervised traffic video summarization with the object of automatically counting and recognizing a crowd of vehicles forming a traffic flow. This methodology simply consists in extracting lighting information from a traffic video stream and summarizing it using clustering techniques with the aim of counting and classifying vehicles along a roadway. The main contribution is to use a fictitious inductive-loop sensor to extract an as-brief-as-possible set of observations that reflects the useful traffic video information, especially the instantaneous number of vehicles. Another appealing characteristic of the proposed approach is that the input data captured by the sensor are treated as a multisignal system, which provides the opportunity to benefit from signal mathematics to rigorously formulate and easily extend the defined methodology.
Gaussian Mixture Models (GMMs) are employed within the proposed device to define significant patterns corresponding to the moving objects, thus enabling a parsimonious recognition of their shapes, velocities, weights, coordinates (intervehicular distances) and other traces that certainly help to identify vehicles. The strength of GMM clustering comes from its ability to split even partially overlapping objects, which makes it resistant to occlusion problems. Besides, the randomness, which is essentially caused by irregular shapes and sudden speed variations, does not reduce the performance of the system, provided that its important parameters are carefully chosen and verified. Moreover, since clustering algorithms often cannot determine the adequate number of groups by themselves, they must be initially supplied with this information. Below, a new approach adapted to the current framework is proposed and implemented. The method, whose principle is inspired by set theory, allows the number of disjoint sets to be easily defined before starting the estimation process. The GMM-based clustering of the traffic signals can be especially interesting when dealing with situations where day or night (lighted-area) shadows taint objects, since it allows overlapping clusters to be statistically decomposed. Moreover, due to its statistical (maximum-likelihood-type) nature, the system can efficiently operate under adverse weather conditions, such as rain, snow and fog, or in desert climates where sandy and dusty winds often occur and affect the quality of the recorded videos. However, while preserving the parsimoniousness of the approach in the current paper, coupling it with quality-enhancing schemes aiming at denoising [41] and dehazing/deraining/desnowing [24, 25, 35] could easily lead to a more universal approach (see [45] and [47]).
This paper is organized as follows. Section 2 presents a literature review of the recent works in these directions. Sections 3 and 4 describe the experimental set-up, the data collection and the mathematical formulation. The technical background of the Gaussian mixture models, which are employed in the counting and recognizing strategy, is presented in Section 5. Section 6 summarizes the above-defined methodology by outlining the practical implementation tasks. Section 7 provides all the simulation results. The last section concludes.
2. Literature Review
101
Many works contributed to the field of the automatic video-based detection
111
and counting of vehicles. The first papers were devoted to cover important top-
112
ics like violation detection [53], vehicles tracking [10], classification of moving
113
vehicles [3], vehicles counting [4], traffic light detection [29], and resolving other
114
problems for the development of a modern transportation system. The recent
115
literature also encompasses other works such as that of Bragatto et al. [6], which
116
exploits computer techniques to build-up a portable equipment able to perform
117
vehicles counting and classification in an automatic way and in real time. De
118
Oliveira and Scharcanski [17] detected vehicles are tracked using a particle filter-
119
ing algorithm to instantaneously determine their positions on the road. Also for
120
vehicles counting, Pan et al. [36] focused on regions of interest and used a num-
121
ber of self-adapting windows instead of the traditional fixed window method.
122
The context of the region is also considered in the work of Meher and Murty
123
[31], where the problem of detecting and tracking movement is determined as a
124
geometric regions between the frames. Mishra et al. [33] developed a real-time
CE
PT
ED
M
110
algorithm for the detection and classification of the different categories of ve-
126
hicles in a heterogeneous traffic video. Contextual regularities are discussed by
127
Zhao and Wang [57], where the characteristic points are tracked and grouped
128
for counting vehicles separately on each channel. Ambardekar et al. [1] in-
129
vestigated a selection of object recognition techniques and constellation-based
AC 125
5
ACCEPTED MANUSCRIPT
modelling applied to the problem of vehicle classification. Fu-min et al. [19]
131
combined a set of machine learning methods to provide an appropriate solution
132
for solving the problem of allday traffic congestion states recognition based on
133
the traffic video information. Mu et al. [34] proposed an approach to detect
134
and recognize traffic lights in vertical arrangement. Their approach has been
135
designed for autonomous vehicles to do all processing in real-time.
CR IP T
130
Other approaches have been specifically developed for counting and/or recognizing vehicles, most of which are soundly based on the Frame Differencing (FD) [54], Background Subtraction (BS) [51] and Optical Flow (OF) [28] methodologies. Unzueta et al. [46] proposed a robust multicue background subtraction procedure in which the segmentation thresholds adapt robustly to illumination changes, maintaining a high sensitivity to new incoming foreground objects and effectively removing moving cast shadows and headlight reflections on the road, including in traffic jam situations, in contrast to existing approaches that do not cover all of these functionalities at the same time. To the same end, Yang et al. [52] used an approach which discriminates the foreground from the background using a pixel-based dynamic background model obtained by updating a weighting-ordered list for each pixel. The foreground on a check-line is then collected over time to form a spatial-temporal profile image. The traffic flow is finally estimated by counting the number of connected components in the profile image. Iftikhat et al. [23] proposed an algorithm which uses a Gaussian mixture model for foreground detection and which is capable of tracking vehicle trajectories and extracting useful traffic information for vehicle counting. This stationary surveillance system uses a fixed-position overhead camera to monitor traffic. Seenouvong et al. [43] used several other computer vision techniques, including thresholding, hole filling and adaptive morphology operations, together with the BS technique to obtain an important counting accuracy. To tackle other problems, which are mainly related to the inaccuracy of these methods, some improved strategies have been proposed, such as the Feature Correlation Hypergraph (FCH) [56], which was created to model multiple features and enhance the recognition of vehicles, the multi-view based methods devised to improve the accuracy of object classification [55], and the virtual-loop method proposed in [48] to improve the quality of video-based vehicle counting. In [50], the BS method is integrated with segmentation to track and segment vehicles with the object of detecting occlusion. Moreover, traffic scenario contexts are adopted in the vehicle-counting methods.
Nevertheless, video-based detection approaches are often restricted by complex external environments, such as adverse weather and illumination conditions (e.g. moving shadows, rain/snow/fog, and vehicle headlights at night). Such situations are common in traffic scenes [12, 47]. Consequently, vehicle detection approaches should be further endowed with the ability to handle data which have been collected in adverse conditions. Many recent studies have shown that these methods tend to be disturbed by a lot of adverse factors. Chen et al. [10] proposed a multimedia data mining framework which can be efficiently applied to traffic applications under exceptional weather conditions. A statistical framework based on the mixture of Gaussians has been used by Lagorio et al. [26] to identify changes in both the spatial and temporal frequencies with the aim of characterizing specific meteorological events in traffic scenes. Wang and Yao [47] proposed a video-based vehicle detection approach with data-driven adaptive neuro-fuzzy networks. Their experiments showed that the approach can be used to effectively detect vehicles under adverse illumination and weather conditions.
3. Measuring Tools and Experimental Set-up
The aim of this work is to propose an automatic traffic data collector which can be exploited in many statistical applications, helping to understand and find solutions to many transport problems. In fact, modelling traffic information is the first stage preceding any development attempt towards an intelligent transportation system. An overview of the developed strategy is graphically represented in the simple diagram of Figure 1. It can roughly be summarized into three general processes: the traffic recording process, the data transmission process, and the information retrieval process.

The first stage consists in positioning a traffic camera at a fixed location with a predetermined focus. Traffic cameras are video cameras used to record traffic flows. They are commonly placed along major roads in developed nations and are electrically powered, either by mains power in urban areas or via solar panels in rural ones. Our experiment is set up in a manner which allows the movement of vehicles flowing in a single direction to be recorded continuously. Later, in real implementations, this task could easily be extended to treat two-way roads. Some important criteria have to be met to obtain the best recording performance, for example an overhead installation of the camera, which requires an existing structure for mounting. It is worth noticing here that a vertical position, with the camera's focus directed straight down, will always give the most accurate records. We must also consider that weather conditions that obstruct the view of the traffic can interfere with performance (e.g., snow, fog, sun glare on the camera lens at sunrise and sunset). Besides, we have to take into account that large vehicles can mask trailing smaller ones [44].
The second stage of the procedure concerns the process of transmitting the recorded information to the computer server. This server is used in the third and last stage to receive data and treat them before reporting the results, see Figure 1. In fact, the transmission of databases, especially large ones, has seen huge advances in recent decades. In this context, the evolution of wireless transmitter and receiver technology has made the wireless exchange of data easy and reliable. Besides, the high accessibility of wireless technology almost everywhere encourages its adoption for the system's data-stream transmission. The device used in our experiment can be any IP/WiFi spill-resistant camera, which realizes both the first and second stages.
The third and last stage of the procedure is the center of operations. The inputs at this stage are video data streams that will be treated as a raw database. Such a database is then processed through four essential substages:

• Focus window: As already done in the first stage when positioning the camera, this task is very important and consists in fixing a focus window within the frame of the recorded video, with an adequate location and angles. The main goal of this task is to focus the view on the zone of interest.

• Pixels line: Once the window is fixed, a static pixel line, which acts as a sensor, is suitably chosen in such a way that it intersects the vehicles' trajectory. Each pixel of such a line defines a signal whose amplitude is the intensity of the pixel at time t. Mathematically, this is equivalent to defining for each pixel a signal f(t), t ∈ Z, which yields a system of signals (a multisignal) for the whole line. It is noticeable that another pixel line, not necessarily parallel to the first one, can also be fixed and used later for validation tests.

• Multisignals processing: The main objective of this stage is to preprocess the multisignal arising from the pixel-line sensor before passing it to the data analyzer (a minimal extraction sketch is given after this list). Preprocessing consists firstly in subdividing long signals into a number of sequences to alleviate the management of the data streams. Then, we need to increase the separation between the background and the vehicles, in order to avoid color confusion. Hence, series of squared differences are generally more useful than raw intensity levels.

• Pattern recognition: The first objective of this last stage is to assess the appropriate number of vehicles crossing the chosen line. This can be done by defining the adequate number of dynamic clusters evolving in each subsequence of the multisignal. Once the number of clusters is determined, other important traffic information can be assessed, such as vehicle sizes, intervehicular distances, speeds, shapes, colors, and so on. Hence, we need to define the features of such clusters, the most important of which are their localizations, shapes and volumes. Unsupervised clustering is a well-known pattern recognition technique which can easily handle multisignal data in order to extract traffic information.
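To make the virtual sensor concrete, the following minimal Python sketch extracts the pixel-line multisignal from a video and computes the squared time-differences used downstream. It is only an illustration of the principle: OpenCV and NumPy are assumed to be available, and the file name, row index and column range are placeholders to be adapted to the actual camera set-up.

```python
import cv2
import numpy as np

def extract_multisignal(video_path, row=240, cols=slice(100, 400)):
    """Record the gray-level intensity of a fixed pixel line (the virtual
    inductive loop) at every frame of the video."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(gray[row, cols].astype(float))  # one observation y(t)
    cap.release()
    return np.asarray(frames).T  # shape: (N pixels, T frames)

# Multisignal and its squared time-differences (background/vehicle separation):
y = extract_multisignal("traffic.mp4")   # hypothetical video file
dy2 = np.diff(y, axis=1) ** 2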
The proposed procedure can easily be extended to a fully embedded system wirelessly transmitting the derived information to servers which store it in data banks. This issue could be addressed in future work. However, before carrying out any implementation, we need to formulate the process. The following section proposes a rigorous mathematical formulation of the whole procedure.
Figure 1: A diagram describing the strategy proposed for traffic information collection. The strategy can be summarized into three general tasks: traffic recording, data transmission and information retrieval.
4. Mathematical Formulation

Each frame from the video sequence is considered as a two-to-one quantitative mapping $y(x_1, x_2) \mapsto \mathbb{R}^+$, $(x_1, x_2) \in \Omega \subset \mathbb{N}^2$, where $y$ is the color intensity and $(x_1, x_2)$ are the pixel's coordinates in a Cartesian plane. Hence, the video is a time-dependent sequence that can then be considered as

$$y_{x_1,x_2}(t), \quad t = 1, 2, \ldots, \tag{1}$$

with $(x_1, x_2) \in \Omega \subset \mathbb{N}^2$. Fixing $x_1$ and $x_2$ is equivalent to focusing on a single pixel within the video, which from one frame to another yields one discrete signal. In the experimental set-up described in Section 3, the video processing
strategy consists firstly in fixing a transversal pixel line facing the oncoming vehicles. Hence, we consider that we are dealing with a $T$-length video sample and that the dimension of the focus window is $N \times M$, i.e. $x_1 = 1, \ldots, N$ and $x_2 = 1, \ldots, M$. Then, fixing $x_2$ yields an $N$-dimensional signal that we denote as follows:

$$\mathbf{y}(t) = [y_1(t), \ldots, y_x(t), \ldots, y_N(t)]^\dagger, \quad t = 1, \ldots, T, \tag{2}$$
where the superscript † denotes the vector transpose. This process is a multisignal that will be analyzed in order to extract valuable information about the traffic. To distinguish vehicles that move in space at the same time, we keep $x$ in Eq. (2) as the space index, varying along the road width. Indeed, such a localization makes it possible to avoid considering several vehicles having the same characteristics and travelling in parallel as a single one.
Besides, since the data essentially come from a discretization of continuous visual information, they carry a considerable amount of noise. Such noise is commonly amplified by other factors, such as sudden variations in vehicle speeds, shadow effects, particular conditions of luminosity and color, and possibly the presence of a number of moving artifacts in the scene. Consequently, we should consider Eq. (2) as a space-time stochastic process. Mathematically, this amounts to supposing that we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We also suppose that we have $\mathcal{X}$ and $\tau$, two totally ordered sets respectively indexing space and time. The function $y$ describing the pixels' intensities is then a process $y : \mathcal{X} \times \tau \times \Omega \to \mathbb{R}$ such that $y(x, t, \omega) = y_{x,t}(\omega)$, $\omega \in \Omega$. Abusing notation, the process will simply be denoted $\{y_{x,t},\; x \in \mathcal{X},\; t \in \tau\}$. Tracking moving objects in videos necessitates considering time-differentiated intensity signals rather than the primitives, because they accurately mark any new movement occurrence. Time-differentiating $y$ leads to the following Stochastic Differential Equation (SDE):

$$dy(x,t) = f(x,t)\,dt + \varrho(x,t)\,d\eta(t), \quad \forall x, t, \tag{3}$$

where $f(x,t)$ and $\varrho(x,t)$ are respectively the drift and the volatility, and $\eta(t)$ is a Brownian motion.
If we consider that $y$ is a bivariate discrete function which, for each couple $(x,t) \in \mathbb{N}^2$, admits a real random value $y(x,t)$, the sampled video data can be represented as an $N \times T$ matrix,

$$y_{\mathcal{X},\tau} = [y(x,t)]_{x=1,\ldots,N,\; t=1,\ldots,T}. \tag{4}$$

This provides a 3D pattern which consists of a number of signals (a vector of signals, i.e. a multisignal). These discrete signals will be processed before being summarized, with the aim of recognizing a number of characteristics of all the mobile engines that cross the focus window. Similarly, Eq. (3) discretized under some assumptions leads to

$$\Delta y(x,t) = f(x,t) + \varrho\,\varepsilon_t, \quad \forall x, t, \tag{5}$$

where $\Delta y(x,t) = y(x,t) - y(x,t-1)$ and $\varepsilon_t \sim \mathcal{N}(0,1)$. We denote the new data set

$$\dot{y}_{\mathcal{X},\tau} = [\Delta y(x,t)]_{x=1,\ldots,N,\; t=2,\ldots,T}, \tag{6}$$

which is the data set that will be mined and modelled in order to assess $f(x,t)$, our essential traffic information.
The first piece of information we need is the adequate number of vehicles crossing the chosen focus window during the video sample sequence. This is equivalent to identifying the appropriate number of clusters in the multisignal dataset (Eq. (6)), consequently allowing the size of the flows to be assessed. A cluster is a significant covariation of a set of neighboring signals, in a manner resembling wave motion. Mathematically, recovering such clusters amounts to estimating the drift in Eq. (5), or simply to defining the times and locations at which high dispersions begin and end. This is also equivalent to focusing on the scatter $S$ of all elements whose variations exceed some threshold, i.e.,

$$S = \{x \in \mathcal{X},\; t \in \tau : \varepsilon < |\Delta y(x,t)|\}, \tag{7}$$

where $\varepsilon$ is a constant that is fixed sufficiently far from zero. The following proposition gives a determination of such a threshold according to the provided initial conditions.
Proposition 4.1. Let us consider the signal system $\{\dot{y}(x,t)\}_{x\in\mathcal{X}}$ ranging between two moments $t_1$ and $t_2$, with $t_1 < t_2$, before any vehicle has crossed the fictive line. There exists a threshold $\varepsilon = \sup_{t_1 \le t \le t_2} |\dot{y}(x,t)|$, $\forall x \in \mathcal{X}$, above which any variation announces the arrival of a moving object.
Now, we need to practically extract the high-amplitude parts of the multisignals, which in fact define the occurrence of vehicles. For that, we consider an Indicator Function (IF) of a set $A \subset \mathbb{R}^+$, which for all $t \in \mathbb{R}^+$ is defined as

$$\mathbf{1}_A\{y\} = \begin{cases} 1 & \text{if } y \in A, \\ 0 & \text{if } y \notin A. \end{cases}$$

Similarly, we can define a Discrete Indicator Function (DIF) when $A$ is a countable set. Among the appealing characteristics of such functions is that they allow a representative scatter to be extracted from any signal (discrete or continuous). For example, we rather consider the following discrete composite indicator function,

$$\mathbf{1}_{]\hat\varepsilon,+\infty[}\{\dot{y}(x,t)\} = \begin{cases} 1 & \text{if } |\dot{y}(x,t)| > \hat\varepsilon, \\ 0 & \text{if } |\dot{y}(x,t)| \le \hat\varepsilon, \end{cases} \tag{8}$$

where $\hat\varepsilon$ is preliminarily fixed as defined in Proposition 4.1. For all $x \in \mathcal{X}$, $t \in \tau$, this allows all scatter elements to be split, according to their amplitudes, into two binary modalities. This is a supplementary step towards detecting moving objects on the one hand and omitting the background of the scene on the other. However, if we want to keep the pixels' intensity dimension, we need to incorporate the real amplitudes, which partially reflect the colors of vehicles¹. Hence, we rather consider the following dataset,

$$\tilde{\dot{y}}_{\mathcal{X},\tau} = \Big[\,|\Delta y(x,t)|\;\mathbf{1}_{]\hat\varepsilon,+\infty[}\{\Delta y(x,t)\}\,\Big]_{x=1,\ldots,N,\; t=2,\ldots,T}, \tag{9}$$
which finally gives for each observation four dimensions: its coordinates $(x,t)$, its absolute amplitude $\xi = |\dot{y}(x,t)|$, and a binary variable $\mathbf{1}_{]\hat\varepsilon,+\infty[}$ which indicates whether such an element is taken in the scatter $S$ or not. The dichotomy is an important fact, since it allows the multisignal flow to be decomposed into two essential areas: a road background and a pattern representing moving vehicles. In the next step, we only consider the elements $(x, t, \xi \mid \mathbf{1} = 1)$. All these elements are subsequently represented in a Cartesian coordinate system. This yields a scatter that will be analyzed with the objective of determining an optimal number of structures corresponding to the number of vehicles that have crossed the pixel line. In contrast, the complementary set, consisting of the elements $(x, t, \xi \mid \mathbf{1} = 0)$, is omitted since it represents the road background, which in fact has no interest in our study. The new dataset whose elements verify $\mathbf{1} = 1$ is a 3D set denoted $S$, and it defines a scatter that corresponds to the moving objects crossing the focus window. Now, the main objective is to determine the number of disjoint subgroups and try to identify each one.

¹Detecting vehicle colors and shapes is a very important topic for intelligent surveillance. This type of operation is most commonly used by government law enforcement and intelligence services; see Li et al. [27].
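As an illustration, the scatter construction can be sketched in a few lines of Python (NumPy only; the vehicle-free warm-up interval [t1, t2) is a placeholder that must be chosen as in Proposition 4.1):

```python
import numpy as np

def build_scatter(dy, t1=0, t2=30):
    """Apply the DIF of Eq. (8): eps is the largest absolute variation seen
    on a vehicle-free interval [t1, t2); elements above it form the scatter S."""
    eps = np.abs(dy[:, t1:t2]).max()
    x, t = np.nonzero(np.abs(dy) > eps)      # elements with indicator 1
    xi = np.abs(dy[x, t])                    # retained amplitudes, Eq. (9)
    return np.column_stack([t, x, xi]), eps  # S as (time, space, amplitude)

# Example on the time-differentiated multisignal of the previous sketch:
# S, eps = build_scatter(np.diff(y, axis=1))
```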
5. Cluster Analysis

5.1. Finite mixture models
Cluster analysis is an unsupervised pattern recognition method that splits the data space into a set of subspaces. The object of a clustering scheme is to perform a partition where elements within a cluster are similar and elements in different clusters are dissimilar. Finite Mixture Models (FMMs) are popular unsupervised statistical learning methods used for automatically classifying and modelling data. These models consider a parametrization of the variance matrix of each cluster through its spectral decomposition, leading to many geometric structures for clustering and classification [5, 39]. Estimating FMMs is often carried out using the well-known EM (expectation-maximization) algorithm or one of its many improved variants [16]. Although FMMs have been increasingly used in many other fields, such as epidemiology, biology and finance, they are used here for the first time in traffic vision. Below, we put forth the technical background of these parsimonious models, before describing their practical implementation for counting and recognizing vehicles.
In an FMM, it is assumed that the data are generated by a mixture of underlying probability distributions in which each component represents a different cluster. Given observations $y = (y_1, \ldots, y_N)$, let $\varphi(y_i|\Phi_g)$ be the density of an observation $y_i$ from the $g$-th component, where $\Phi_g$ are the corresponding parameters, and let $G$ be the adequate number of components in the mixture. The model for the composite of the clusters is generally fitted by maximizing the following likelihood:

$$L(\Phi_1, \ldots, \Phi_G; \pi_1, \ldots, \pi_G \mid y) = \prod_{i=1}^{N} \sum_{g=1}^{G} \pi_g\, \varphi(y_i|\Phi_g), \tag{10}$$
where $\pi_g$ is the probability that an observation belongs to the $g$-th component, such that $\pi_g \ge 0$ and $\sum_{g=1}^{G} \pi_g = 1$.

Let us consider the case where $\varphi(y_i|\Phi_g)$ is a multivariate Gaussian probability density function (p.d.f.), a model that has been used with considerable success in a large number of applications. In this instance, the parameters $\Phi_g$ consist of a mean vector $\mu_g$ and a covariance matrix $\Sigma_g$, and the density is given as follows:

$$\varphi(y_i|\mu_g, \Sigma_g) = (2\pi)^{-n/2}\, |\Sigma_g|^{-1/2} \exp\!\left(-\tfrac{1}{2}(y_i - \mu_g)^\dagger \Sigma_g^{-1} (y_i - \mu_g)\right). \tag{11}$$
As developed in several monographs, Gaussian clusters are ellipsoidal, centered at the means $\mu_g$. The covariances $\Sigma_g$ determine their other geometric characteristics. In fact, each covariance matrix can be written as

$$\Sigma_g = \lambda_g D_g A_g D_g^\dagger, \tag{12}$$

where $\lambda_g = |\Sigma_g|^{1/n}$, $D_g$ is the orthogonal matrix of eigenvectors, and $A_g$ is a diagonal matrix such that $|A_g| = 1$, with the normalized eigenvalues of $\Sigma_g$ on the diagonal in decreasing order. The orientation of the principal components of $\Sigma_g$ is determined by $D_g$, while $A_g$ determines the shape of the density contours; $\lambda_g$ specifies the volume of the corresponding ellipsoid. By allowing these quantities to vary between groups, parsimonious and easily interpreted models, useful in describing various clustering or classification situations, can be obtained. Varying the assumptions concerning the parameters $\lambda_g$, $D_g$ and $A_g$ leads to a selection of different models, which offers a flexible stochastic modelling toolbox.
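In practice, the decomposition of Eq. (12) is easy to compute. The following sketch (NumPy; the 2 × 2 covariance matrix at the end is an arbitrary example) recovers the volume λ, the orientation D and the shape A of a fitted cluster:

```python
import numpy as np

def volume_shape_orientation(Sigma):
    """Decompose a covariance matrix as Sigma = lambda * D A D^T (Eq. (12)):
    lambda = |Sigma|^(1/n) is the volume, D the eigenvector (orientation)
    matrix, and A a unit-determinant diagonal shape matrix."""
    n = Sigma.shape[0]
    eigvals, D = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]   # eigenvalues in decreasing order
    eigvals, D = eigvals[order], D[:, order]
    lam = np.linalg.det(Sigma) ** (1.0 / n)
    A = np.diag(eigvals / lam)          # |A| = 1 by construction
    return lam, D, A

lam, D, A = volume_shape_orientation(np.array([[4.0, 1.0], [1.0, 2.0]]))
```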
As defined by Biernacki et al. [5], we can distinguish three main categories of models. One of them consists in assuming spherical shapes, namely $A_g = I$, where $I$ is the identity matrix. These models are named spherical and are denoted $[\lambda I]$. Another family of interest consists in assuming that the variance matrices $\Sigma_g$ are diagonal, which means that the $D_g$ are permutation matrices. We write $\Sigma_g = \lambda B$, where $B$ is a diagonal matrix verifying $|B| = 1$. This category is named diagonal and is denoted $[\lambda B]$. The third category allows the volumes, shapes and orientations of the clusters to vary or to be equal between clusters. It is named general and is denoted $[\lambda DAD^\dagger]$. It is noticeable that, for each category, a subscript $g$ under a parameter means that the parameter is allowed to vary between clusters. Overall, 28 Gaussian Mixture Models can be used as automatic classifiers (see Table 1).
Weights or proportions $\pi_g$ are also important parameters that enrich GMMs, since they determine the proportions of data that a priori belong to their respective clusters. According to Biernacki et al. [5], two typical assumptions are considered with regard to the proportions: assuming either equal or free proportions over the mixture components. Therefore, models can rather be classified according to the degree of freedom allowed to the components of the quantity $\pi\lambda DAD^\dagger$. Once added to the above geometric features, this yields the twenty-eight models of Table 1. By varying such parameters, a selection of parsimonious and easily interpreted models is ready to be calibrated.
5.2. Models learning
In an unsupervised clustering context, the number of groups has to be specified before running the estimation procedure. Then, during the estimation process, data move iteratively from one cluster to another, starting from an initial position that has been chosen [8].
Table 1: Parametrization of a Gaussian Mixture Model (GMM). (The index g refers to parameters varying between clusters.)

Category    Models           Equal π's   Varying π's
General     [λDAD†]          (1)         (15)
            [λg DAD†]        (2)         (16)
            [λDAg D†]        (3)         (17)
            [λg DAg D†]      (4)         (18)
            [λDg ADg†]       (5)         (19)
            [λg Dg ADg†]     (6)         (20)
            [λDg Ag Dg†]     (7)         (21)
            [λg Dg Ag Dg†]   (8)         (22)
Diagonal    [λB]             (9)         (23)
            [λg B]           (10)        (24)
            [λBg]            (11)        (25)
            [λg Bg]          (12)        (26)
Spherical   [λI]             (13)        (27)
            [λg I]           (14)        (28)
Typically, the number of components does not change during the course of the iterations. The EM algorithm [16] or one of its variants [5] can be qualified as the main machinery that appropriately ensures a GMM calibration. The principle is described as follows. A data set is typically composed of $N$ vectors $y = (y_1, \ldots, y_N)$ in $\mathbb{R}^n$, and the aim is to estimate an unknown partition $z$ of $y$ into $G$ clusters, where $z = (z_1, \ldots, z_N)$ denotes the labels, with $z_i = (z_{i1}, \ldots, z_{iG})$ and $z_{ig} = 1$ if $y_i$ belongs to the $g$-th cluster and $0$ if not. The relevant assumptions are that the density of an observation $y_i$ given $z_i$ is expressed as

$$p(y_i; z_i) = \prod_{g=1}^{G} \left[\pi_g\, \varphi(y_i|\mu_g, \Sigma_g)\right]^{z_{ig}}, \tag{13}$$
and that each $z_i$ is independent and identically distributed (i.i.d.) under a multinomial density. The resulting complete-data log-likelihood is

$$l(\mu_g, \Sigma_g, \pi_g, z_{ig} \mid y) = \sum_{i=1}^{N} \sum_{g=1}^{G} z_{ig} \log \pi_g\, \varphi(y_i|\mu_g, \Sigma_g). \tag{14}$$

The Expectation-step of the EM algorithm for mixture models is given by

$$\hat{z}_{ig} = E[z_{ig} \mid y_i, \Phi_1, \ldots, \Phi_G] = \frac{\hat{\pi}_g\, \varphi(y_i|\hat{\mu}_g, \hat{\Sigma}_g)}{\sum_{g'=1}^{G} \hat{\pi}_{g'}\, \varphi(y_i|\hat{\mu}_{g'}, \hat{\Sigma}_{g'})}, \tag{15}$$

while the Maximization-step involves maximizing Eq. (14) in terms of the $\pi_g$ and $\Phi_g$, with $z_{ig}$ fixed at the values computed in the Expectation-step. From reasonable starting parameters [5], the EM algorithm alternates between expecting and maximizing. The routine continues until the relative difference between successive values of the mixture log-likelihood falls below a small threshold. Under certain conditions, the method can be shown to converge to a local maximum of the mixture likelihood (see [16]). For a GMM, estimates of the means and probability weights have simple closed-form expressions involving the data and the quantities $\hat{z}_{ig}$ from the Expectation-step:

$$\pi_g = \frac{n_g}{N}, \qquad \mu_g = \frac{\sum_{i=1}^{N} \hat{z}_{ig}\, y_i}{n_g} \qquad \text{and} \qquad \Sigma_g = \frac{\sum_{i=1}^{N} \hat{z}_{ig}\, (y_i - \mu_g)(y_i - \mu_g)^\dagger}{n_g}, \tag{16}$$

where $n_g = \sum_{i=1}^{N} \hat{z}_{ig}$. Details of the Maximization-step for $\Sigma_g$ parameterized by its spectral decomposition can be seen in Celeux and Govaert [8].
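For concreteness, the EM recursion of Eqs. (15)-(16) can be sketched as follows (NumPy and SciPy assumed; the random-responsibility initialization and the ridge term added to the covariances are illustrative choices, not the initialization strategy of [5]):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(y, G, n_iter=100, tol=1e-6, rng=np.random.default_rng(0)):
    """Plain EM for a Gaussian mixture: y is (N, n), G the number of clusters."""
    N, n = y.shape
    z = rng.dirichlet(np.ones(G), size=N)   # random initial responsibilities
    loglik_old = -np.inf
    for _ in range(n_iter):
        ng = z.sum(axis=0)
        pi = ng / N                                      # M-step, Eq. (16)
        mu = (z.T @ y) / ng[:, None]
        Sigma = [(z[:, g, None] * (y - mu[g])).T @ (y - mu[g]) / ng[g]
                 + 1e-6 * np.eye(n) for g in range(G)]   # small ridge for stability
        dens = np.column_stack(
            [pi[g] * multivariate_normal.pdf(y, mu[g], Sigma[g])
             for g in range(G)])                         # E-step, Eq. (15)
        z = dens / dens.sum(axis=1, keepdims=True)
        loglik = np.log(dens.sum(axis=1)).sum()
        if abs(loglik - loglik_old) < tol:               # stopping rule
            break
        loglik_old = loglik
    return pi, mu, Sigma, z
```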
5.3. Optimal number of clusters

We need to count and estimate the features of the separate subgroups that are composed of the points that are the least dispersed in both time and space. As discussed in Section 3, each one of these groups contains information about a passing vehicle. Since the components are well defined in time and space, their centers (means) inform about their localizations and their distances from one another. The dispersion of the groups is also important, because it informs about the size/velocity ratio, which allows vehicles to be classified into categories. In fact, larger vehicles are known to have a limited speed, unlike small and midsize vehicles: they are generally slowed because of the current traffic laws and regulations, in comparison with other vehicles such as light and commercial cars. Hence, the greater the dispersion of a cluster, the slower and larger the vehicle is assumed to be. Another assumption that should be considered is that no vehicle stops in the captured road videos.
To identify motion groups, it is convenient to proceed using some particular clustering tools. Many clustering algorithms are not able to determine the appropriate number of clusters by themselves, and therefore they must initially be supplied with this information. Since this information is seldom known in advance, we need to run the algorithm many times with a different value for each run. Then, the partition that best fits the data is chosen. The process of estimating how well a partition fits the structure underlying the data is known as cluster validation. Several Cluster Validation Indices (CVIs) have been proposed in the literature [2]. One approach is to focus on the partitioned data and to measure the compactness and separation of the clusters, where approaches such as those of Dunn [14], Davies-Bouldin [15], Calinski-Harabasz [7] and Rousseeuw [38] are the most prominent, as illustrated by the sketch below.
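Such a validation sweep is straightforward with standard tools. A possible sketch (scikit-learn assumed; k-means is used here merely as a generic partitioner to score candidate values of k):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

def cvi_sweep(S, k_max=20):
    """Cluster the scatter S for k = 2..k_max and score each partition with
    the Davies-Bouldin (lower is better) and silhouette (higher is better) CVIs."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(S)
        scores[k] = (davies_bouldin_score(S, labels),
                     silhouette_score(S, labels))
    return scores

# e.g. pick the k minimizing the Davies-Bouldin index over the scatter S:
# scores = cvi_sweep(S)
# k_best = min(scores, key=lambda k: scores[k][0])
```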
When using GMMs, a set of selection criteria has been used for choosing an optimal model and its appropriate number of clusters. The most popular criteria are the Bayesian Information Criterion (BIC), the Integrated Complete Likelihood (ICL) and the Normalized Entropy Criterion (NEC). It is noticeable that the NEC is only used for assessing the number of clusters of a mixture model. However, when the data distribution is significantly different from a Gaussian mixture, such criteria can provide a wrong number of components, which may lead to misleading interpretations. Hence, we develop a procedure whose role is to firstly determine the optimal number of clusters in each subsequence.
As an alternative, we propose a simple strategy for a preliminary counting of the disjoint subgroups. For each sampled subsequence, we begin by transforming the data, composing the scatter $\tilde{\dot{y}}$, and then fixing a maximum number of clusters $G_{\max}$. Then, we fit the $G_{\max}$-component GMM to the data, to obtain a set of clusters $\mathcal{C} = \{C_1, C_2, \ldots, C_{G_{\max}}\}$ with respective means $\{\hat{\mu}_1, \ldots, \hat{\mu}_{G_{\max}}\}$ and covariances $\{\hat{\Sigma}_1, \ldots, \hat{\Sigma}_{G_{\max}}\}$. The data set $\tilde{\dot{y}}$ is consequently shared between these clusters to form the set we denote $\{\tilde{\dot{y}}^{(1)}, \ldots, \tilde{\dot{y}}^{(G_{\max})}\}$, where the subsets $\tilde{\dot{y}}^{(h)}$, $h = 1, \ldots, G_{\max}$, are partitioned but not necessarily disjoint. Let us consider the structures of connected clusters $S_g = \bigcup_{C \in \mathcal{C}} C$, $g = 1, \ldots, G$ ($G \le G_{\max}$), verifying within each group $g$ that $C \cap C' \neq \emptyset$. Clusters belonging to the same structure are labelled with a specific class (the $g$-th) among the $G$ separate classes. Defining these structures (their number and components) can be ensured by checking the elements $\tilde{\dot{y}}_i$ simultaneously belonging to more than one cluster. If this is the case (i.e. $\tilde{\dot{y}}_i \in C \cap C'$), $\tilde{\dot{y}}_i$ is assigned to $S_g$. It is noticeable that the frequency of elements within intersection zones can be of great importance for defining the adequate number of vehicles and delimiting them, especially when the data are corrupted by adverse weather and illumination conditions. Moreover, it deserves to be mentioned that this approach is mathematically straightforward, since we know the equations of all $G_{\max}$ clusters. Finally, once the composed structures are well defined, they reflect the locations and areas of the scatters corresponding to the $G$ vehicles in the current video subsequence.
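One possible reading of this merging rule in code is sketched below (NumPy/SciPy assumed; membership is decided by a Mahalanobis-distance gate, an illustrative stand-in for the exact cluster-equation test described above):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import mahalanobis

def merge_connected_clusters(S, mu, Sigma, gate=3.0):
    """Chain clusters sharing at least one element (points inside the 'gate'
    ellipsoid of two components at once) into a single structure."""
    G = len(mu)
    inv = [np.linalg.inv(s) for s in Sigma]
    member = np.array([[mahalanobis(p, mu[g], inv[g]) < gate for g in range(G)]
                       for p in S], dtype=float)  # point-in-ellipsoid tests
    adj = (member.T @ member) > 0                 # clusters sharing points
    n_structures, label = connected_components(adj, directed=False)
    return n_structures, label                    # G and a cluster -> vehicle map
```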
6. Implementation
Before performing the cluster analysis, we should subdivide the traffic video streaming into a set of short sequences in order to make the data modelling possible. Practically, this amounts to dealing with subsequences and sending them one by one to the clustering center as soon as they are recorded and transformed. The length of the subsequences must be arranged so that they do not contain more than 12 vehicles. This is done by considering, for each subsequence, the time interval corresponding to a pavement portion which is assumed to carry no more than a dozen crowded vehicles. The width of the intervals can accordingly be fixed beforehand by estimating the adequate surface for such a number, also taking intervehicular distances into account. Besides, vehicles must be treated on a sequence of mutually disjoint intervals. In other words, a vehicle that appears in two successive subsequences has to be processed in only one of them.

If we consider a set of splitting-times $t_0, t_0 + \Delta t, t_0 + 2\Delta t, t_0 + 3\Delta t, \ldots$, which corresponds to the succession of periodic times marking the end of a video sequence and the beginning of a new one, a number of vehicles will probably belong to two successive sequences. Hence, we need to treat in a particular way the vehicles that intersect the frontiers. We propose a simultaneous strategy whose role is to collect and treat such cases in a parallel, independent process, see Figure 2. The strategy consists, after performing the cluster analysis on the first video subsequences, in extracting the clusters that coincide with the splitting time and merging them. This can be done by re-clustering the datasets related to such clusters.
Explicitly, a number of clusters $\{C_1, C_2, \ldots, C_g, \ldots\}$ with respective means $\{\mu_1, \mu_2, \ldots\}$ and respective covariances $\{\Sigma_1, \Sigma_2, \ldots\}$ may occur at the end of the video sequence. A cluster $C_g$ is consequently split into two parts $C_g^{(1)}$ and $C_g^{(2)}$. The parts are localized in both time and space; consequently, they can be identified as composing the same cluster coinciding with the splitting-time. In a second stage, the couples $C_g^{(1)}$ and $C_g^{(2)}$, $\forall g$, are considered in a parallel process and are consolidated through redrawing their common bounds, using a re-clustering task performed to delimit their scatters. We should notice that, when facing low-frequency traffic, such intersection situations can be straightforwardly avoided by allowing the video band enough flexibility to last until all vehicles have entirely crossed the pixel line. Unfortunately, this is seldom the case in urban areas.
Figure 2: A diagram explaining the video sequencing strategy and how to deal with vehicles that have been split between two adjacent subsequences. At the top, we consider two adjacent video subsequences A and B. At the bottom, we see how vehicles coinciding with the cutting time are considered in an independent subspace.

The overall data modelling stage can be summarized by the following main tasks: 1) the multisignal data are time-differentiated (before or after being sequentially processed); 2) a $G_{\max}$-component finite mixture model is assigned to the new dataset; 3) structure labelling is used to determine the adequate number of separated groups in each subsequence; 4) a parallel process is devoted to treating the border clusters; 5) a final task is devoted to reassigning a single cluster to each vehicle before a traffic report is rendered. Such a report contains an assessment of the total number of vehicles, as well as other information such as the vehicles' velocities and weights, the intervehicular distances and the traffic fluidity. The instruction set and programs are outlined in Tables 4 and 5 (see the Appendix).
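Assembled end-to-end, and reusing the functions from the previous sketches, the five tasks can be outlined as follows (a schematic composition only; the subsequence length and Gmax are placeholders, and the border-cluster treatment is left as a comment):

```python
import numpy as np

def process_stream(video_path, seq_len=500, G_max=15):
    """Schematic pipeline: difference, cluster and report each subsequence."""
    y = extract_multisignal(video_path)              # task 1: multisignal
    report = []
    for start in range(0, y.shape[1] - 1, seq_len):  # disjoint subsequences
        dy = np.diff(y[:, start:start + seq_len], axis=1)
        S, _ = build_scatter(dy)                     # threshold of Prop. 4.1
        pi, mu, Sigma, z = em_gmm(S, G_max)          # task 2: G_max-GMM
        G, labels = merge_connected_clusters(S, mu, Sigma)  # task 3
        # tasks 4-5: clusters cut by the subsequence border would be
        # re-clustered across the splitting time, and one cluster would be
        # reassigned per vehicle before rendering the report
        report.append({"t0": start, "vehicles": G, "means": mu})
    return report
```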
7. Simulations and Results
The following simulations are based on a short benchmark video. The objective is to test the data processing program of the proposed strategy for counting and recognizing vehicles, which in fact is the last one of the three main tasks that compose the strategy (see Figure 1 in Section 3). A server machine with a 2.3 GHz processor and 8 GB RAM is used during the experiments. As seen in Section 4, processing traffic videos begins by splitting long streams into short sequences, where the length of a sequence should depend on the frequency of vehicles on the road section. Let us recall that the main role of the proposed strategy is to ensure the automatic collection of the vehicles' information from the recorded video, in particular their number, spatial separation and categories.

Figure 3: The 157th and 170th sampled frames taken from the benchmark traffic video. (a) 157th frame. (b) 170th frame.
The benchmark sequence, whose recording appears to have been made in good conditions, properly describes the passage of a flow of vehicles of similar sizes on a one-way road. The road consists of two sub-roads, but we will only focus on the principal one, which directly faces the camera. Figure 3 shows an excerpt of two frames taken from the benchmark video. Figure 4 illustrates the focus window and the pixel-line choices. The video duration is d = 17 seconds. The sequence is firstly discretized by cutting it into 531 frames. This implies that a constant step length Δt = 0.032 seconds has been fixed beforehand. It is worth noticing that we can change the time-scale resolution Δt to either zoom in or zoom out the representation according to the nature of the traffic.

7.1. Experiment 1
As defined above, the proposed program consists firstly in carefully fixing the focus window and the pixel line in the video, which is carried out as seen in Figure 4.

Figure 4: The focus window and the fictive-loop (pixel line) choices. (a) Focus window. (b) Wireless inductive loop.
The fictive line defines the first set of signals that contain information about the traffic. Such information remains hidden behind an amount of systematic noise. The multivariate time series are firstly transformed into their squared differences, see Figure 5. Then, the program ensures the projection of such data onto a functional subspace that allows them to be split into two dichotomous levels before extracting only the higher level. The threshold $\hat\varepsilon$ is fixed according to Proposition 4.1. In the next step, two of the best-known CVIs are computed in order to determine the number of vehicles $G$ that cross the fictive wire inductive loop. The Davies-Bouldin and silhouette indices respectively attain their minimum and maximum at the fourth cluster. The same result is obtained when considering a preliminary number $G_{\max} = 15$, which is adjusted according to the cluster intersections. Consequently, an adequate 4-component mixture model is assigned to the transformed data. Such a model summarizes the video sequence, thereby defining the number of vehicles $G = 4$, their sizes (weights) and intervehicular distances (localizations), see Figure 6 and Table 2.
Figure 5: On the left-hand side, the plot of the traffic multisignal (color level against time in frames). On the right-hand side, a 3D representation of the squares of the time-differentiated multisignal. (a) Traffic multisignal. (b) Squared t-differentiated data.
Table 2: EM-estimated Gaussian mixture parameters. A four-component model is assigned to the data. The four vehicles have almost the same size.

                    Component 1   Component 2   Component 3   Component 4
Weights (πg)        0.262         0.284         0.213         0.240
Localization (µg)   172.73        252.92        345.50        415.64
                    388.67        85.32         421.01        92.78
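For reference, an equivalent off-the-shelf fit producing this kind of summary can be obtained with scikit-learn (assumed available; S denotes the scatter built by the earlier sketches):

```python
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(S)
print(gmm.weights_)   # cluster weights, cf. Table 2
print(gmm.means_)     # cluster localizations (time, space, amplitude)
```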
7.2. Experiment 2
The second experiment is also conducted on the basis of the same benchmark video, see Figure 3. The objective is the same, except that we assume that, when sequencing the video streaming, some passing vehicles are split between two adjacent subsequences. Hence, we slightly modify the initial conditions of the simulation in such a way as to create these situations. Precisely, the splitting-times are chosen at the very moments vehicles cross the pixel line. Creating such perturbations allows us to further test the robustness of the strategy. It is worth mentioning that no change is made to the focus window or the pixel line. The proposed algorithms are launched with the objective of retrieving the adequate number of vehicles. In other words, the four vehicles crossing the window have to be recounted and modelled.
Figure 6: Counting and pattern recognition. (a) Cluster validation indices: Davies-Bouldin and silhouette values for each number of clusters tested. (b) Silhouette analysis: a silhouette plot created from the clustered data. (c) The EM-estimated clusters (15 clusters). (d) The 4 clusters obtained after re-clustering.
In this experiment, we use two computers: one as a transmitter and the other as a receiver. The transmitter ensures the subdivision of the video and the transmission of the subsequences one after another. The receiver picks up the subsequences and carries out all the other tasks. This experiment simulates a synchronous implementation of the framework. The video sequence is cut at t1 = 8 s 128 ms and t2 = 13 s 216 ms, which gives the three subsequences whose scatters are plotted in Figure 7. Unlike in the first experiment, we only use Gaussian Mixture Models to both count and recognize the vehicles. As explained above, the principle is to fix in advance a maximum number of clusters (≥ 10). Once the estimation is complete, the new groups of intersected clusters in each sequence give the number of vehicles G in that sequence. A second estimation is then carried out, with a preliminarily fixed number of clusters G, and aims at recognizing the vehicle categories. Frontier clusters are treated in the particular way explained above (see Figure 8), to finally find the same results as in the first experiment (see Figure 9).
Figure 7: Scatters extracted from the three consecutive subsequences after subdividing the traffic video.
Figure 8: A first cluster analysis for identifying vehicles. Border clusters whose distance between their centers does not exceed a limited value are firstly identified to be merged into a unique cluster.
7.3. Experiment 3
The current experiment considers cases where recording conditions have not
606
been optimized. One way to simulate this situation is to use benchmark se-
607
quences such as those given in the left-hand side of Figure 10. In the first
608
sequence, the camera angle and the distance seem to be not appropriate for a
609
obtaining perfect results. Moreover, the traffic flow rate is higher than that of
610
the previous experiment, with a value exceeding 10000 vehicles per hour (see
611
Chan and Vasconcelos [9]). The second and third traffic videos respectively
612
represent cases where fog and snowfall engulf the landscape, thus affecting the
613
visibility of the scene. These adverse weather conditions are commonly encoun-
614
tered in traffic scenes. As for the first case, it is clear that shots have been
615
taken from long distance. The two latter sequences are provided by Eichberger,
616
Fleischer and Leuck2 and have been incrementally used as benchmark data in
CE
PT
ED
M
605
617
earlier works such as in Chen et al. [10] and Zhang et al. [58]. The aim of this experiment is to test the performance and usefulness of the
619
proposed approach under more or less unfavorable conditions. To meet this
620
objective, we propose a numerical simulation, in which we use the clustering
621
techniques to detect the vehicles on their roads in each one of the benchmark
AC
618
2 http://i21www.ira.uka.de/image
sequences/Karl-Wilhelm-Straße: stationary camera in-
stallation by German Eichberger, Klaus Fleischer and Holger Leuck.
Figure 9: A second analysis for merging border clusters. Border clusters whose inter-center distances do not exceed a fixed limit are merged into a unique cluster. (a) Subsequence 1; (b) Subsequence 2; (c) Subsequence 3; (d) Subsequence 4 (axes: Space versus Time (frames)).
Any one of the 28 models of Table 1 can be assigned to each of these scenes, according to the Bayesian criterion. This procedure is repeated several times in order to provide only consistent results (i.e., results corresponding to the most recurrent models). In this experiment, we assess the detection accuracy on the basis of the fit between the clusters and the actual vehicles. Each cluster surrounding the scatter corresponding to a particular vehicle is counted among the right predictions. Otherwise, the detection of the vehicle is considered wrongly predicted. Throughout the simulation study, the most recurrent model for each one of the scenes is depicted in the right-hand side of Figure 10.

For Sequence 1, almost all of the 12 vehicles are suitably detected and accurately quantified, although the shots have been taken from a long distance. The red scatter fitting represents the only exception, since the cluster is somewhat affected by a few outliers. Since the data naturally overlap in the current experiment, contrary to what has been previously done when dealing with intersecting clusters, the counting rule has been slightly modified: we rather consider nested clusters as a single vehicle. Mathematically, this condition can be straightforwardly checked by testing whether all elements of the scatter of the included cluster belong to the space defined by the including cluster. A metric function for handling overlapped clusters according to the density of the traffic can be developed as part of a future work. For sequences 2 and 3, the results seem the most affected by the exceptional recording conditions. In both cases, we only retrieve G − 1 vehicles, i.e., one less vehicle on the road. This shows that, with a distantly-installed camera and under adverse weather and illumination conditions, the system can lead to imprecise results. In the two problems, the threshold $\hat{\varepsilon}$ has to be carefully chosen. Too low a value can give rise to a number of outliers. On the other hand, too high a value can hide some significant patterns, which are then replaced by dispersed points whose effect is similar to that of outliers, as in the case of the grey car at the bottom of Figure 10(c). Thus, as seen in Figures 10(d) and 10(f), the clusters in the neighborhood are affected by the
few remaining points. The current example highlights the importance of offering suitable preliminary conditions for the video recording. Under exceptional circumstances, choosing an optimal threshold and performing adequate data preprocessing (like defogging/desnowing) should be incorporated as essential parts of the process.
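The nested-cluster rule lends itself to a direct numerical test. The sketch below is ours and assumes that each cluster is summarized by the mean and covariance of its fitted Gaussian component; the 95% chi-square level delimiting the outer region is an illustrative choice, not a value prescribed by the method.

    import numpy as np
    from scipy.stats import chi2

    def is_nested(inner_pts, outer_mean, outer_cov, level=0.95):
        # Squared Mahalanobis distance of every inner point to the
        # outer cluster's Gaussian component.
        diff = inner_pts - outer_mean
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(outer_cov), diff)
        # The inner cluster is nested (one vehicle) if all its points lie
        # inside the outer cluster's level-% confidence ellipse.
        return bool(np.all(d2 <= chi2.ppf(level, df=inner_pts.shape[1])))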
7.4. Experiment 4
The fourth application is a real study, with the objective of testing the counting and recognizing methodology in a more realistic framework. The Saudi capital, Riyadh, is chosen as the place for carrying out the tests. Recording is made at the Northern Ring Road (East), which is one of the key roads in Riyadh. The experiment is implemented following the procedure described in Section 3. A wireless IP camera (2.0 Megapixel) is fixed at a height of 7 meters for live video streaming. Data are transmitted to a portable computer to be processed. The contour plot of the pixel-line intensity in Figure 11 gives an overview of the filmed sequence. A total of 106 vehicles is counted during the short period of the experiment, 6 of them being Heavy Goods Vehicles (HGVs) while the remainder are light vehicles. The video sequence was recorded on a sunny Saturday morning in March.
The video sequence is carefully subdivided into 14 portions, in a manner that avoids border clusters. The sequence consists of 2972 frames and the width of each portion varies between 100 and 250 frames. The first estimation is launched while fixing the preliminary number of vehicles per subsequence to 20. As explained in the previous section, this number is immediately reduced once groups of intersected clusters are considered. The final estimation then assesses the vehicles' weights (categories). In the current example, the scatters have been further projected using the well-known Principal Component Analysis (PCA) method. The last estimation is carried out under general, diagonal and spherical models with, of course, varying proportions (see Table 1). The general models were the best according to the BIC measure, with a counting success exceeding 97% (see the results in Figure 12).
Figure 10: Detecting vehicles under unfavorable recording conditions. (a) Sequence 1: misplaced camera; (b) detecting vehicles in sequence 1; (c) Sequence 2: foggy landscape; (d) detecting vehicles in sequence 2; (e) Sequence 3: snowy landscape; (f) detecting vehicles in sequence 3 (detection plots: Axis 1 versus Axis 2).
Besides, all the HGVs have been well identified and localized, as seen in subplots S1, S3, S9, S10 and S13 of Figure 12. However, the sensitivity of GMMs to outliers, which is clear in subplots S3 and S10, should be noticed as one of the weaknesses of the method. Outliers, which are caused by factors such as the shadows of vehicles or any other object crossing the detection line, can introduce uncertainty into the recognition function. The system has also proven to be promising in crowded scenes. In areas where the vehicles were the least scattered, the clusters have been properly assessed. This proves that, once the system is perfectly implemented, results in crowded areas can be of great interest.
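The model-selection step just described can be reproduced with standard tools. The following sketch, in which the function name and the random seed are ours, fits G-component mixtures under the spherical, diagonal and general (full) covariance families and retains the one minimizing the BIC, mirroring the comparison reported above.

    from sklearn.mixture import GaussianMixture

    def best_gmm_by_bic(points, g, seed=0):
        # Fit a G-component GMM under each covariance family and keep
        # the model minimizing the Bayesian Information Criterion.
        best = None
        for cov in ('spherical', 'diag', 'full'):
            model = GaussianMixture(n_components=g, covariance_type=cov,
                                    random_state=seed).fit(points)
            score = model.bic(points)
            if best is None or score < best[1]:
                best = (cov, score, model)
        return best  # (family name, BIC value, fitted model)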
Figure 11: A contour plot of the intensity multisignal drawn by the detector line. Recording is made at the Northern Ring Road (East) of Riyadh (Saudi Arabia).
It is well known that the Gulf countries are characterized by a desert climate where sandy and dusty winds often occur and affect the quality of the recorded videos, in addition to other known problems like rain/snow/fog/haze conditions. Mixing such patterns could be assimilated to a randomly varying noise. Hence, as a second stage of this case study, additional experiments are carried out on the basis of a degraded version of the same video sequence. This time, the video is preliminarily blurred and contrast-modified before being corrupted (additively and multiplicatively) by a dynamic noise. We assume two different degradation models: the first model consists in blurring the video by a Point Spread Function (PSF) before injecting noise; the second model assumes a poor
contrast ratio, the video also being corrupted by a gradually increasing noise. Figure 13 gives an overview of the degradation process through some selected noisy frames. An accuracy measure is used to assess the stability of the counting and recognizing method against noise and visual degradation. The measure is equivalent to the proportion of vehicles that have been correctly recounted and roughly recognized. The resulting accuracy percentages are reported in Table 3. They show that the clustering is significantly affected by the noise level and by the quality of the resolution. This adds another point to the list of shortcomings to be overcome in future research, for instance by adopting hybrid strategies with denoising [41] and dehazing/deraining/desnowing schemes [24, 25, 35]. Such a combined strategy could be efficiently used in real complex environments (e.g. adverse illumination and weather conditions).
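For reproducibility, the two degradation models can be emulated as follows. The sketch is ours and assumes grayscale frames with values in [0, 1]; the PSF is approximated by a Gaussian blur, the noise parameters are interpreted as variances (matching the N(0, ·) and Speckle(·) notations of Table 3), and the default values are illustrative only.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def degrade(frame, sigma_psf=2.0, var_add=0.0, var_speckle=0.0):
        # Blur the frame with a Gaussian point spread function (model 1).
        out = gaussian_filter(frame, sigma=sigma_psf)
        # Additive Gaussian noise N(0, var_add).
        if var_add > 0:
            out = out + np.random.normal(0.0, np.sqrt(var_add), out.shape)
        # Multiplicative speckle noise: frame * (1 + n), n ~ N(0, var_speckle).
        if var_speckle > 0:
            out = out * (1.0 + np.random.normal(0.0, np.sqrt(var_speckle),
                                                out.shape))
        return np.clip(out, 0.0, 1.0)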
Table 3: Assessing the accuracy percentage (%) of the counting and recognizing method while degrading the test video. A noise sequence is added to the video while blurring it and varying its contrast level.

                          Original   Blurred   Poor contrast
Additive noise
  N(0, 0.00)                 97         97          97
  N(0, 0.03)                 89         82          85
  N(0, 0.05)                 43         30          40
  N(0, 0.10)                 20         16          18
  N(0, 0.15)                 14          9          11
  N(0, 0.20)                  8          3           5
Multiplicative noise
  Speckle(0.00)              97         97          97
  Speckle(0.03)              96         96          95
  Speckle(0.05)              92         88          90
  Speckle(0.10)              81         48          80
  Speckle(0.15)              76         32          71
  Speckle(0.20)              29         25          27
To recapitulate, let us mention some of the distinguishable advantages of the proposed strategy. The cheapness of its implementation and maintenance, in comparison with many classical tools in common use, is one such advantage. The device can easily be extended to an autonomous solar-powered system. We should also notice the high practicability of the method, since its principle rests on a few simple mathematical notions. In fact, visual information is converted into quantitative data, which are handled by GMM statistical clustering tools. These models, which are known to be parsimonious, are assessed using the EM algorithm, which also has many appealing features such as its simplicity and numerical stability. The originality of the approach is another advantage that deserves to be mentioned. The method is one of the rare ones that can simultaneously count and recognize vehicles. The extracted signals reflect the dynamics of the traffic and can consequently be further exploited to provide traffic information such as the degree of congestion, the types of vehicles, traffic incidents, etc. The results of the experiments show that the system can be easily implemented and extended to serve drivers' needs.
However, a number of shortcomings have appeared in this work and need to be addressed as a priority. The first issue to be critically examined is the consistency of the method, especially in the case of high-frequency traffic flows. Focusing on the robustness and accuracy of the approach for treating traffic with a considerable number of small vehicles, such as motorbikes and microcars, can also be carried out in a future work. These problems demand more sophisticated equipment to register high-resolution traffic video. Splitting the videos into subsequences is also a task that weighs down the procedure; in particular, treating frontier clusters has been somewhat intricate. Adopting a continuous strategy in future works would contribute to a more effective procedure. Besides, using Gaussian densities to describe the clusters can be too restrictive, and the framework could be extended to a more flexible one. Otherwise, the scatters' data can be transformed so as to reflect Gaussian clusters more accurately. Finally, testing the system under real-world traffic conditions, such as ambient lighting (day or night), shadows, vehicle occlusion and inclement weather, is also a very challenging topic and needs careful consideration in the future.
8. Conclusion

The flows of circulating vehicles on the main roads of a region represent a determining factor in stimulating economic growth and development. They can also hide a number of economic externalities. Defining the size of these flows and the nature of the circulating vehicles is a matter of great importance, since it may explain a great part of the urban transport problems. Hence, appropriately estimating such flows is the first stage of the process towards defining a strategy with optimal choices on the use and combination of transportation means, so as to achieve maximum effectiveness in transports and transfers. However, studying this class of problems is generally hampered by the lack of real data, owing to the high maintenance costs that the existing techniques necessitate. In this paper, we developed a strategy based on video detection and simple inference from its data streams. The proposed strategy consists in considering a virtual inductive loop within the traffic video, by focusing on a set of pixels whose intensities define a set of signals. However, several exogenous factors, such as sudden variations of the vehicles' velocity, particular conditions of luminosity, and the presence of a number of moving artifacts in the scene, may introduce a considerable amount of randomness into the studied data. Besides, the color composition of vehicles can vary greatly, since the windshield and the top of a car can have different colors, affecting the precision of a counting method. Consequently, flexible and parsimonious statistical models are applied to summarize the set of signals, with the aim of finding and characterizing particular clusters corresponding to the population of vehicles. We explained the whole strategy and the process of its implementation. A numerical validation carried out on benchmark video sequences shows the accuracy of the technique. In forthcoming works, the technique will eventually be implemented as part of a project for counting and characterizing the vehicle fleets of the busiest roads in the different areas of the Tunisian capital.
Appendix

Table 4: Extracting, transforming and clustering the motion data.

Algorithm 1
Input: $(y_{X,\tau}, G_{max}, \Theta_0, \gamma)$.
Output: A report containing the $G_{max}$ cluster parameters $\Theta^*_g$, $g = 1, \dots, G_{max}$, and a respective partition of the scatter.
1: Time-differentiate the data:
     for $t = 1$ to $T$ do
       $\dot{y}_{x,t} := y(x, t+1) - y(x, t)$
     end for
2: Fix the threshold $\hat{\varepsilon}$ (upper-bounded by $\sup_{x,t} |\dot{y}(x,t)|$).
3: Construct the dataset $S$ whose elements form the scatter. Set $S := \emptyset$:
     for $x = 1$ to $n$ do
       for $t = 2$ to $T$ do
         if $\dot{y}_{x,t} > \hat{\varepsilon}$ then
           $S := S \cup \{s\}$, where $s = (x, t, \dot{y}_{x,t})^\dagger$
         end if
       end for
     end for
4: Perform a cluster analysis on the dataset $S$. A $G_{max}$-GMM is chosen and the EM algorithm is used to calculate its parameter set $\Theta$, containing $\pi$, $\mu$ and $\Sigma$, and to partition the data into $G_{max}$ sets of similar characteristics $\tilde{\dot{y}} = \{\tilde{\dot{y}}^{(1)}, \dots, \tilde{\dot{y}}^{(G_{max})}\}$. Starting from an arbitrary solution $\Theta_0$, iteratively compute $\Theta^*$:
     Set $i := 0$ and calculate $\Theta_1 = EM(\Theta_0)$, where $EM(\cdot)$ is as defined in Eqs. (15) and (16)
     while $\|\Theta_{i+1} - \Theta_i\| \ge \gamma$ do
       $i := i + 1$
       Calculate $\Theta_{i+1} = EM(\Theta_i)$
     end while
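Under the assumption that the detector-line intensities are stored as an (n x T) array, Algorithm 1 admits the following compact Python transcription. The function names are ours, the absolute value of the variation is used in the filtering step, and the sklearn EM routine stands in for the update equations (15) and (16).

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def extract_scatter(y, eps):
        # Steps 1-3 (sketch): time-differentiate the detector-line
        # intensities y (n pixels x T frames) and keep the points whose
        # absolute variation exceeds the threshold eps.
        ydot = np.diff(y, axis=1)                # y(x, t+1) - y(x, t)
        xs, ts = np.nonzero(np.abs(ydot) > eps)  # filter motion patterns
        return np.column_stack([xs, ts + 1, ydot[xs, ts]])

    def cluster_scatter(scatter, g_max, tol=1e-3):
        # Step 4 (sketch): fit a G_max-component GMM by EM on the (x, t)
        # coordinates of the scatter; tol plays the role of gamma.
        gmm = GaussianMixture(n_components=g_max, tol=tol,
                              random_state=0).fit(scatter[:, :2])
        return gmm, gmm.predict(scatter[:, :2])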
Table 5: Clusters adjustment.

Algorithm 2
Input: $(\tilde{\dot{y}}, t_0, \delta t, G_{max}, \Theta^*, \gamma)$.
Output: A report containing the number of vehicles $G$ and their respective clusters $\{C_1, C_2, \dots, C_G\}$.
1: Reduce the number of clusters to an optimal number $G$. Set $\mathcal{G} := \emptyset$:
     for $h = 1$ to $G_{max}$ do
       if $\tilde{\dot{y}}^{(h)} \not\subset \mathcal{G}$ then
         $\mathcal{G} := \mathcal{G} \cup \{\tilde{\dot{y}}^{(h)}\}$
       end if
       while $\exists\, \tilde{\dot{y}}_{x,t} \in \tilde{\dot{y}}^{(h)} \cap \tilde{\dot{y}}^{(h')}$, $h \ne h'$, and $\tilde{\dot{y}}^{(h')} \not\subset \mathcal{G}$ do
         $G_{max} := G_{max} - 1$,
         $\mathcal{G} := \mathcal{G} \cup \{\tilde{\dot{y}}^{(h')}\}$
       end while
     end for
2: Re-cluster the elements arising from neighbouring clusters into $G$ groups. A $G$-GMM is estimated using the EM algorithm (see Algorithm 1, step 4). The result is a set of parameters $\{\Theta_1, \Theta_2, \dots, \Theta_G\}$ and their respective groups $\{\tilde{\dot{y}}^{(1)}, \dots, \tilde{\dot{y}}^{(G)}\}$.
3: Classify the clusters into two categories: border ($B$) and non-border ($NB$) clusters. Set $B := NB := \emptyset$.
     for $g = 1, \dots, G$ do
       if $\exists\, \tilde{\dot{y}}^{(g)}_{x, t_0+\delta t} \in \tilde{\dot{y}}^{(g)}$, $\forall x$, then
         $B := B \cup \{\tilde{\dot{y}}^{(g)}\}$
       else
         $NB := NB \cup \{\tilde{\dot{y}}^{(g)}\}$
       end if
     end for
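Step 3 of Algorithm 2 can likewise be transcribed directly. In the sketch below, which is ours, each group is an array of (x, t, ydot) points, and the boundary test is a deliberate simplification of the condition stated above: a cluster is flagged as a border cluster when it still contains points at the subsequence boundary $t_0 + \delta t$.

    import numpy as np

    def split_border_clusters(groups, t0, dt):
        # groups: list of (N_g, 3) arrays of (x, t, ydot) points.
        border, non_border = [], []
        for g in groups:
            # Border clusters reach the end of the observation window
            # and must be merged with their continuation in the next
            # subsequence; the others are counted directly.
            (border if np.any(g[:, 1] >= t0 + dt) else non_border).append(g)
        return border, non_border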
Acknowledgment

We would like to thank the anonymous reviewers for their insightful and constructive comments that greatly contributed to improving the paper. Our many thanks go also to the editorial staff for their generous support and assistance during the review process.
References

[1] A. Ambardekar, M. Nicolescu, G. Bebis & M. Nicolescu, (2014). Vehicle classification framework: a comparative study. EURASIP Journal on Image and Video Processing, vol. 2014, no. 29, pp. 1–13.
[2] O. Arbelaitz, I. Gurrutxaga, J. Muguerza, J.M. Pérez & I. Perona, (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, vol. 46, no. 1, pp. 243–256.
[3] F. Archetti, E. Messina, D. Toscani & L. Vanneschi, (2006). Classifying and counting vehicles in traffic control applications. Applications of Evolutionary Computing, Lecture Notes in Computer Science, vol. 3907, pp. 495–499.
[4] E. Bas, M. Tekalp & F.S. Salman, (2007). Automatic vehicle counting from video for traffic flow analysis. IEEE Intelligent Vehicles Symposium (IV), pp. 392–397.
[5] C. Biernacki, G. Celeux, G. Govaert & F. Langrognet, (2006). Model-based cluster and discriminant analysis with the mixmod software. Computational Statistics and Data Analysis, vol. 51, pp. 587–600.
[6] T. Bragatto, G. Ruas, V. Benso, M.V. Lamar, D. Aldigueri, G.L. Teixeira & Y. Yamashita, (2008). A new approach to multiple vehicle tracking in intersections using Harris corners and adaptive background subtraction. IEEE Intelligent Vehicles Symposium, pp. 548–553.
[7] T. Calinski & J. Harabasz, (1974). A dendrite method for cluster analysis. Communications in Statistics, vol. 3, pp. 1–27.
[8] G. Celeux & G. Govaert, (1995). Gaussian parsimonious clustering models. Pattern Recognition, vol. 28, pp. 781–793.
[9] A.B. Chan & N. Vasconcelos, (2005). Classification and retrieval of traffic video using auto-regressive stochastic processes. IEEE Intelligent Vehicles Symposium, Las Vegas, USA.
[10] S. Chen, M. Shyu, C. Zhang & J. Strickrott, (2002). A multimedia data mining framework: mining information from traffic video sequences. Journal of Intelligent Information Systems, vol. 19, no. 1, pp. 61–77.
[11] S.Y. Cheung, S. Coleri, B. Dundar, S. Ganesh, C. Tan & P. Varaiya, (2005). Traffic measurement and vehicle classification with single magnetic sensor. Journal of the Transportation Research Board, vol. 1917, pp. 173–181.
[12] M.V. Chitturi, J.C. Medina & R.F. Benekohal, (2010). Effect of shadows and time of day on performance of video detection systems at signalized intersections. Transportation Research Part C: Emerging Technologies, vol. 18, no. 2, pp. 176–186.
[13] B. Coifman, D. Beymer & P. McLauchlan, (1998). A real-time computer vision system for vehicle tracking and traffic surveillance. Transportation Research Part C: Emerging Technologies, vol. 6, no. 4, pp. 271–288.
[14] J.C. Dunn, (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, vol. 3, pp. 32–57.
[15] D.L. Davies & D.W. Bouldin, (1979). A clustering separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, pp. 224–227.
[16] A.P. Dempster, N.M. Laird & D.B. Rubin, (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1–38.
[17] A.B. De Oliveira & J. Scharcanski, (2010). Vehicle counting and trajectory detection based on particle filtering. Twenty-third SIBGRAPI Conference on Graphics, Patterns and Images, pp. 376–383.
[18] A. Elgammal, D. Harwood & L. Davis, (2000). Non-parametric model for background subtraction. Proceedings of the 6th European Conference on Computer Vision - Part II, pp. 751–767.
[19] Z. Fu-min, L. Lü-chao, J. Xin-hua & L. Hong-tu, (2014). An automatic recognition approach for traffic congestion states based on traffic video. Journal of Highway and Transportation Research and Development, vol. 8, no. 2, pp. 72–80.
[20] T. Gates, S. Schrock & J. Bonneson, (2004). Comparison of portable speed measurement devices. Transportation Research Record: Journal of the Transportation Research Board, vol. 1870, pp. 139–146.
[21] T. Horprasert, D. Harwood & L.S. Davis, (1999). A statistical approach for real-time robust background subtraction and shadow detection. IEEE ICCV'99 Frame-Rate Workshop.
[22] J.S. Hu & T.M. Su, (2007). Robust background subtraction with shadow and highlight removal for indoor surveillance. EURASIP Journal on Advanced Signal Processing, vol. 10, no. 1, pp. 108–132.
[23] Z. Iftikhar, P. Premaratne & P. Vial, (2014). Computer vision based traffic monitoring system for multi-track freeways. Lecture Notes in Computer Science, vol. 8589, pp. 339–349.
[24] H.G. Kim, S.J. Seo & B.C. Song, (2015). Multi-frame de-raining algorithm using a motion-compensated non-local mean filter for rainy video sequences. Journal of Visual Communication and Image Representation, vol. 26, pp. 317–328.
[25] J-H. Kim, J-Y. Sim & C-S. Kim, (2015). Video deraining and desnowing using temporal correlation and low-rank matrix completion. IEEE Transactions on Image Processing, vol. 24, no. 9, pp. 2658–2670.
[26] A. Lagorio, E. Grosso & M. Tistarelli, (2008). Automatic detection of adverse weather conditions in traffic scenes. IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, pp. 273–279.
[27] D. Li, J. Shan, Z. Shao, X. Zhou & Y. Yao, (2013). Geomatics for smart cities – concept, key techniques, and applications. Geo-spatial Information Science, vol. 16, no. 1, pp. 13–24.
[28] Y. Liu, Y. Lu, Q. Shi & J. Ding, (2013). Optical flow based urban road vehicle tracking. Ninth International Conference on Computational Intelligence and Security (CIS), pp. 391–395.
[29] K.H. Lu, C.M. Wang & S.Y. Chen, (2008). Traffic light recognition. Journal of the Chinese Institute of Engineers, vol. 31, pp. 1069–1075.
[30] J.C. McCall & M.M. Trivedi, (2006). Video-based lane estimation and tracking for driver assistance: survey, system, and evaluation. IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 1, pp. 20–37.
[31] S.K. Meher & M.N. Murty, (2011). Efficient detection and counting of moving vehicles with region-level analysis of video frames. Proceedings of the International Conference on SocProS, AISC 131, pp. 29–40.
[32] L. Mimbela, (2007). A summary of vehicle detection and surveillance technologies used in intelligent transportation systems. Federal Highway Administration, Washington, D.C.
[33] P.K. Mishra, M. Athiq, A. Nandoriya & S. Chaudhuri, (2013). Video-based vehicle detection and classification in heterogeneous traffic conditions using a novel kernel classifier. IETE Journal of Research, vol. 59, no. 5, pp. 541–550.
[34] G. Mu, Z. Xinyu, L. Deyi, Z. Tianlei & A. Lifeng, (2015). Traffic light detection and recognition for autonomous vehicles. The Journal of China Universities of Posts and Telecommunications, vol. 22, no. 1, pp. 50–56.
[35] S.G. Narasimhan & S.K. Nayar, (2003). Contrast restoration of weather degraded images. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 6, pp. 713–724.
[36] X. Pan, Y. Guo & A. Men, (2010). Traffic surveillance system for vehicle flow detection. Second International Conference on Computer Modeling and Simulation, pp. 314–318.
[37] H. Rabbouch, F. Saâdaoui & A.V. Vasilakos, (2016). A wavelet-assisted subband de-noising for the tomographic image reconstruction. Forthcoming.
[38] P. Rousseeuw, (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65.
[39] F. Saâdaoui, (2012). A probabilistic clustering method for US interest rate analysis. Quantitative Finance, vol. 12, no. 1, pp. 135–148.
[40] F. Saâdaoui & H. Rabbouch, (2014). A wavelet-based multi-scale vector ANN model for econophysical systems prediction. Expert Systems with Applications, vol. 41, no. 13, pp. 6017–6028.
[41] R. Sammouda, A.M.S. Al-Salman, A. Gumaei & N. Tagoug, (2015). An efficient image denoising method for wireless multimedia sensor networks based on DT-CWT. International Journal of Distributed Sensor Networks, vol. 2015, pp. 1–13.
[42] A. Sánchez, P.D. Suárez, A. Conci & E. Nunes, (2011). Video-based distance traffic analysis: application to vehicle tracking and counting. Computing in Science and Engineering, vol. 13, no. 3, pp. 38–45.
[43] N. Seenouvong, U. Watchareeruetai, C. Nuthong, K. Khongsomboon & N. Ohnishi, (2016). A computer vision based vehicle detection and counting system. 8th International Conference on Knowledge and Smart Technology (KST), Chiangmai, Thailand.
[44] S.L. Skszek, (2001). "State-of-the-Art" report on non-traditional traffic counting methods. Technical report no. FHWA-AZ-01-503, Arizona Department of Transportation.
[45] M. Sun, K. Wang, M. Tang, F.-Y. Wang & J. Yang, (2011). Video vehicle detection through multiple background-based features and statistical learning. Proceedings of the IEEE Intelligent Transportation Systems Conference, Washington, DC, pp. 1337–1342.
[46] L. Unzueta, M. Nieto, A. Cortés, J. Barandiaran, O. Otaegui & P. Sánchez, (2012). Adaptive multicue background subtraction for robust vehicle counting and classification. IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 2, pp. 527–540.
[47] K. Wang & Y. Yao, (2015). Video-based vehicle detection approach with data-driven adaptive neuro-fuzzy networks. International Journal of Pattern Recognition and Artificial Intelligence, vol. 29, no. 7, pp. 1–32.
[48] Y. Xia, X. Shi, G. Song, Q. Geng & Y. Liu, (2016). Towards improving quality of video-based vehicle counting method for traffic flow estimation. Signal Processing, vol. 120, pp. 672–681.
[49] Y. Xia, C. Wang, X. Shi & L. Zhang, (2014). Vehicles overtaking detection using RGB-D data. Signal Processing, vol. 112, pp. 98–109.
[50] Y. Xia, W. Xu, L. Zhang, X. Shi & K. Mao, (2014). Integrating 3D structure into traffic scene understanding with RGBD data. Neurocomputing, vol. 151, part 2, pp. 700–709.
[51] J. Yang & Y. Dai, (2012). A modified method of vehicle extraction based on background subtraction. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–5.
[52] M.T. Yang, R.K. Jhang & J.S. Hou, (2013). Traffic flow estimation and vehicle-type classification using vision-based spatial-temporal profile analysis. IET Computer Vision, vol. 7, no. 5, pp. 394–404.
[53] N.H.C. Yung & A.H.S. Lai, (2001). An effective video analysis method for detecting red light runners. IEEE Transactions on Vehicular Technology, vol. 50, pp. 1074–1084.
[54] H. Zhang & K. Wu, (2012). A vehicle detection algorithm based on three-frame differencing and background subtraction. Fifth International Symposium on Computational Intelligence and Design (ISCID), pp. 148–151.
[55] L. Zhang, M. Song, X. Liu, J. Bu & C. Chen, (2013). Fast multi-view segment graph kernel for object classification. Signal Processing, vol. 93, pp. 1597–1607.
[56] L. Zhang, Y. Gao, C. Hong, Y. Feng, J. Zhu & D. Cai, (2014). Feature correlation hypergraph: exploiting high-order potentials for multimodal recognition. IEEE Transactions on Cybernetics, vol. 44, pp. 1408–1419.
[57] R. Zhao & X. Wang, (2013). Counting vehicles from semantic regions. IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 1016–1022.
[58] J. Zhu, Y. Lao & Y.F. Zheng, (2010). Object tracking in structured environments for video surveillance applications. IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 2, pp. 223–235.
Hana Rabbouch received her Research Master's degree in Image Processing from the National School of Computer Science (ENSI) at the University of Manouba (Tunisia). She is currently enrolled at the Higher Institute of Management of Tunis (Tunis University), finishing her doctoral thesis in Business Informatics. Image processing with applications in management, transportation and medicine constitutes her principal area of research.
Foued Saâdaoui received his Ph.D. degree in Quantitative Methods from Sousse University (Tunisia). He is a member of the laboratory "Mathematical Physics, Special Functions and Applications" at the High School of Sciences and Technology of Hammam Sousse. Currently, he is an Associate Professor at the College of Sciences and Arts of the Saudi Electronic University (Riyadh, Saudi Arabia). His main areas of research include computing, statistical modeling and data analysis.
Rafaa Mraihi is an Associate Professor in Transportation Sciences at the High School of Business (University of Manouba, Tunis). He received his PhD in Economic Sciences, with a specialization in the industrial economy of transport, from the University of Toulouse 1 and the University of Sfax. His research interests address sustainable transport problems and the modelling and planning of transport.
Figure 12: Results of assessing the number of vehicles and recognizing their categories on a road of Riyadh. Among 106 vehicles, 103 have been detected and characterized, which is equivalent to a success rate exceeding 97%.
Figure 13: Frames from the degraded traffic video sequences. (a) Noised frames; (b) noised and blurred frames; (c) noised frames with a poor contrast. On the left-hand side, the sequence is corrupted by an additive Gaussian noise; on the right-hand side, it is multiplicatively corrupted by a speckle noise. The center and bottom images correspond respectively to the blurred and the contrast-modified sequences.