Unsupervised video summarization using cluster analysis for automatic vehicles counting and recognizing


Hana Rabbouch (a,*), Foued Saadaoui (b), Rafaâ Mraihi (c)

(a) Université de Tunis, Institut Supérieur de Gestion de Tunis, Cité Bouchoucha - 2000 Tunis, TUNISIA
(b) Saudi Electronic University, Abu Bakr Seddiq Branch Road, Al-Rabiea District - 11673 Riyadh, SAUDI ARABIA
(c) Université de Manouba, Ecole Supérieure de Commerce de Tunis, Campus Universitaire de La Manouba - 2010 Tunis, TUNISIA

(*) Corresponding author

Abstract

Automatic Vehicles Counting and Recognizing (AVCR) is a very challenging topic in transport engineering, with important implications for modern transport policies. Implementing computer-assisted AVCR in the most vital districts of a country provides a large amount of measurements which are statistically processed and analyzed, the purpose of which is to optimize decision-making in traffic operation, pavement design, and transportation planning. Since the advent of computer vision technology, video-based surveillance of road vehicles has become a key component in developing autonomous intelligent transportation systems. In this context, this paper proposes a pattern recognition system which employs an unsupervised clustering algorithm with the objective of detecting, counting and recognizing a number of dynamic objects crossing a roadway. This strategy defines a virtual sensor whose aim is similar to that of an inductive loop in a traditional mechanism, i.e. to extract from the traffic video stream a number of signals containing unstructured information about the road traffic. The set of signals is then filtered so as to retain only significant motion patterns. The resulting data are subsequently processed by a statistical analysis technique in order to estimate and recognize a number of clusters corresponding to vehicles. Finite mixture models fitted by the EM algorithm are used to assess such clusters, which provides important information about the traffic such as the instantaneous number of vehicles, their weights, velocities and intervehicular distances.

Keywords: Video Summarization; Multisignals Processing; Pattern Recognition; Cluster Analysis; Information Retrieval; Vehicles Counting.


1. Introduction

Vehicle Traffic Counting (VTC) is one of the best-known methods and most promising research topics in transport sciences [3, 17, 31, 42, 47]. VTC plays an important role in collecting raw traffic data, which are crucial to conducting any transport modelling study. Traditional counting methods are mainly based on the buried inductive-loop traffic detector [11], pneumatic tubes [44], piezoelectric sensors [32], and radars [20]. These devices are often electronic and electromagnetic communication or detection systems that can sense a set of vehicles passing or arriving at a certain point. This technical equipment is either buried under the roadway surface or embedded into the roadway; for this reason, it is inconvenient to install and difficult to maintain. In comparison with traditional detectors, traffic cameras are installed on the roadside and are considered a major component of most Intelligent Transportation Systems (ITS). Monitoring centers receive real-time live records and perform image analyses. Today, since cameras are easily operated, controlled and maintained, traffic video data have replaced the old-fashioned kind and are now extensively applied to resolve many other transport problems.

Computer vision is a branch of artificial intelligence that uses appropriate tools for acquiring, processing, analyzing and modelling multidimensional data, especially images, in order to produce numerical or symbolic information. Image data can take many forms, such as video sequences, views from one or many cameras, or multidimensional data from traffic, pedestrian flows, driver surveys, etc. The computer-aided detection and monitoring of moving objects is among the most promising areas of great importance in developing transport technologies. The literature contains several recent works in this new field, the most important of which are based on the Background Subtraction (BS) principle [51] and on tracking methodologies [13]. Regarding background subtraction, there are techniques that model the variation of the intensity values of background pixels with unimodal distributions [21], mixtures of Gaussians [22] and nonparametric kernel density estimation [18]. Unimodal models are simple and fast but are not able to adapt to multiple backgrounds (e.g., when there are trees moving in the wind). Gaussian mixture approaches can cope with these moving backgrounds but cannot handle fast variations accurately using a few Gaussians, and therefore this method has problems with the sensitive detection of foreground regions. Nonparametric kernel density estimation overcomes this constraint and allows quick adaptation to background changes [46]. To provide temporal coherence to the measurements, tracking methodologies are typically applied [30]. The two main variants are those considering 2-D image objects and those inferring 3-D volumes and positions using camera calibration. Nevertheless, as explained in [46], 2-D estimations often lack the required accuracy in classification strategies, while 3-D estimation alternatives are usually designed for simplified urban scenarios, with reduced vehicle speeds and no overtaking maneuvers, which makes the classification problem easier.

However, to accommodate the real needs of many applications, the two above-mentioned methodologies must be computationally inexpensive and have low memory requirements, while still being able to accurately identify moving objects in the video. On the other hand, many other modelling methodologies lack mathematical rigor, are static and qualitative, and can thus be difficult to implement and extend. In this paper, we develop a methodology for implementing a Statistical Machine Learning (SML)-based algorithm that performs unsupervised traffic video summarization with the object of automatically counting and recognizing a crowd of vehicles forming a traffic flow. This methodology simply consists in extracting lighting information from a traffic video stream and summarizing it using clustering techniques with the aim of counting and classifying vehicles along a roadway. The main contribution is to use a fictitious inductive-loop sensor to extract an as-brief-as-possible set of observations that can reflect the useful traffic video information, especially the instantaneous number of vehicles. Another appealing characteristic of the proposed approach is that the input data captured by the sensor are treated as a multisignal system, which provides the opportunity to benefit from signal mathematics to rigorously formulate and easily extend the defined methodology.

Gaussian Mixture Models (GMM) are employed within the proposed device to define significant patterns corresponding to the moving objects, thus enabling parsimonious recognition of their shapes, velocities, weights, coordinates (intervehicular distances) and other traces that help to identify vehicles. The strength of GMM clustering comes from its ability to split even partially overlapping objects, and consequently to resist occlusion problems. Besides, the randomness, which is essentially caused by irregular shapes and sudden speed variations, cannot reduce the performance of the system, as long as its important parameters are carefully chosen and verified. Moreover, since clustering algorithms often cannot determine the adequate number of groups on their own, they must be initially supplied with this information. Below, a new approach adapted to the current framework is proposed and implemented. The method, whose principle is inspired by set theory, allows the number of disjoint sets to be easily defined before starting the estimation process. GMM-based clustering of the traffic signals can be especially interesting when dealing with situations where day or night (lighted-area) shadows taint objects, since it allows overlapping clusters to be statistically decomposed. Furthermore, due to its statistical (maximum-likelihood-type) nature, the system can efficiently operate under adverse weather conditions, such as rain, snow and fog, or in desert climates where sandy and dusty winds often occur and affect the quality of the recorded videos. However, while preserving the parsimoniousness of the approach in the current paper, coupling it with quality-enhancing schemes aimed at denoising [41] and dehazing/deraining/desnowing [24, 25, 35] could easily lead to a more universal approach (see [45] and [47]).

This paper is organized as follows. In section 2, a literature review of recent works in these directions is presented. Sections 3 and 4 describe the experimental set-up, data collection and mathematical formulation. The technical background of the Gaussian mixture models employed in the counting and recognizing strategy is pointed out in section 5. Section 6 summarizes the above-defined methodology by outlining the practical implementation tasks. Section 7 provides all simulation results. The last section concludes.


2. Literature Review

Many works have contributed to the field of automatic video-based detection and counting of vehicles. The first papers were devoted to important topics like violation detection [53], vehicle tracking [10], classification of moving vehicles [3], vehicle counting [4], traffic light detection [29], and other problems arising in the development of a modern transportation system. The recent literature also encompasses works such as that of Bragatto et al. [6], which exploits computer vision techniques to build portable equipment able to perform vehicle counting and classification automatically and in real time. De Oliveira and Scharcanski [17] tracked detected vehicles using a particle filtering algorithm to instantaneously determine their positions on the road. Also for vehicle counting, Pan et al. [36] focused on regions of interest and used a number of self-adapting windows instead of the traditional fixed-window method. The context of the region is also considered in the work of Meher and Murty [31], where the problem of detecting and tracking movement is formulated in terms of geometric regions between frames. Mishra et al. [33] developed a real-time algorithm for the detection and classification of the different categories of vehicles in heterogeneous traffic video. Contextual regularities are discussed by Zhao and Wang [57], where characteristic points are tracked and grouped to count vehicles separately in each lane. Ambardekar et al. [1] investigated a selection of object recognition techniques and constellation-based modelling applied to the problem of vehicle classification. Fu-min et al. [19] combined a set of machine learning methods to provide an appropriate solution to the problem of all-day traffic congestion state recognition based on traffic video information. Mu et al. [34] proposed an approach to detect and recognize traffic lights in vertical arrangement; their approach has been designed for autonomous vehicles, with all processing done in real time.

Other approaches have been specifically developed for counting and/or recognizing vehicles, most of which are soundly based on the Frame Differencing (FD) [54], Background Subtraction (BS) [51] and Optical Flow (OF) [28] methodologies. Unzueta et al. [46] proposed a robust multicue background subtraction procedure in which the segmentation thresholds adapt robustly to illumination changes, maintaining a high sensitivity to new incoming foreground objects while effectively removing moving cast shadows and headlight reflections on the road, including in traffic jam situations, in contrast to existing approaches that do not cover all of these functionalities at the same time. To the same end, Yang et al. [52] used an approach which discriminates the foreground from the background using a pixel-based dynamic background model obtained by updating a weighting-ordered list for each pixel. The foreground on a check-line is then collected over time to form a spatial-temporal profile image, and the traffic flow is finally estimated by counting the number of connected components in the profile image. Iftikhar et al. [23] proposed an algorithm which uses a Gaussian mixture model for foreground detection and is capable of tracking vehicle trajectories and extracting useful traffic information for vehicle counting; this stationary surveillance system uses a fixed-position overhead camera to monitor traffic. Seenouvong et al. [43] used several other computer vision techniques, including thresholding, hole filling and adaptive morphology operations, together with the BS technique, to obtain a high counting accuracy. To tackle other problems, mainly related to the inaccuracy of these methods, improved strategies have been proposed, such as the Feature Correlation Hypergraph (FCH) [56], created to model multiple features and enhance vehicle recognition, the multi-view based methods devised to improve the accuracy of object classification [55], and the virtual-loop method proposed in [48] to improve the quality of video-based vehicle counting. In [50], the BS method is integrated with segmentation to track and segment vehicles with the object of detecting occlusion. Moreover, traffic scenario contexts are adopted in the vehicle-counting methods.

Nevertheless, video-based detection approaches are often restricted by complex external environments such as adverse weather and illumination conditions (e.g. moving shadows, rain/snow/fog, and vehicle headlights at night). Such situations are common in traffic scenes [12, 47]. Consequently, vehicle detection approaches should be further endowed with the ability to handle data collected in adverse conditions. Many recent studies have shown that these methods tend to be disturbed by numerous adverse factors. Chen et al. [10] proposed a multimedia data mining framework which can be efficiently applied to traffic applications under exceptional weather conditions. A statistical framework based on a mixture of Gaussians was used by Lagorio et al. [26] to identify changes in both the spatial and temporal frequencies with the aim of characterizing specific meteorological events in traffic scenes. Wang and Yao [47] proposed a video-based vehicle detection approach with data-driven adaptive neuro-fuzzy networks; their experiments showed that the approach can effectively detect vehicles under adverse illumination and weather conditions.

3. Measuring Tools and Experimental Set-up


The aim of this work is to propose an automatic traffic data collector which can be exploited in many statistical applications, allowing many transport problems to be understood and solved. In fact, modelling traffic information is the first stage preceding any development attempt towards an intelligent transportation system. An overview of the developed strategy is graphically represented in the simple diagram of Figure 1. It can roughly be summarized into three general processes: the traffic recording process, the data transmission process, and the information retrieval process.

The first stage consists in positioning a traffic camera at a fixed location with a predetermined focus. Traffic cameras are video cameras used to record traffic flows. They are commonly placed along major roads in developed nations and are electrically powered either by mains power in urban areas or via solar panels in rural ones. Our experiment is set up in a manner which allows the movement of vehicles flowing in a single direction to be recorded continuously. In later real implementations, this task could easily be extended to treat roads with opposite directions of travel. Some important criteria have to be met to allow the best camera recording performance, for example an overhead installation of the camera, which requires an existing structure for mounting. It is worth noticing here that a vertical position, with the camera's focus directed straight down, will always give the most accurate records. We must also consider that weather conditions that obstruct the view of traffic can interfere with performance (e.g., snow, fog, sun glare on the camera lens at sunrise and sunset). Besides, we have to take into account that large vehicles can mask trailing smaller ones [44].

The second stage of the procedure concerns the process of transmitting the recorded information to the computer server. This server is used in the third and last stage to receive the data and process them before reporting results, see Figure 1. In fact, the transmission of databases, especially large ones, has seen huge advances in recent decades. In this context, the evolution of wireless transmitter and receiver technology has made the wireless exchange of data easy and reliable. Besides, the high accessibility of wireless technology almost everywhere encourages its adoption for the system's data-stream transmission. The device used in our experiment can be any IP/WiFi spill-resistant camera, which realizes both the first and the second stage.

The third and last stage of the procedure is the center of operations. The inputs at this stage are video data streams that are treated as a raw database. Such a database is then processed through four essential substages:

• Focus window: As already done in the first stage when positioning the camera, this task is very important and consists in fixing a focus window within the frame of the recorded video, with an adequate location and angles. The main goal of this task is to focus the view on the zone of interest.

• Pixels line: Once the window is fixed, a static pixels line which acts as a sensor is suitably chosen in such a way that it intersects the vehicles' trajectory. Each pixel of such a line defines a signal whose amplitude is the intensity of the pixel at time t. Mathematically, this is equivalent to defining for each pixel a signal f(t), ∀ t ∈ Z, which yields a system of signals (a multisignal) for the whole line. It is noticeable that another pixels line, not necessarily parallel to the first one, can also be fixed and used later for validation tests. (A minimal code sketch of this virtual sensor is given after this list.)

• Multisignals processing: The main objective of this stage is to preprocess the multisignal arising from the pixels-line sensor before passing it to the data analyzer. Preprocessing consists firstly in subdividing long signals into a number of sequences, to ease the management of the data streams. Then, we need to increase the separation between the background and the vehicles in order to avoid color confusion. Hence, series of squared differences are generally more useful than raw intensity levels.

• Pattern recognition: The first objective of this last stage is to assess the appropriate number of vehicles crossing the chosen line. This can be done by defining the adequate number of dynamic clusters evolving in each subsequence of the multisignal. Once the number of clusters is determined, other important traffic information can be assessed, such as vehicle sizes, intervehicular distances, speeds, shapes, colors, and so on. Hence, we need to define the features of such clusters, the most important of which are their localizations, shapes and volumes. Unsupervised clustering is a well-known pattern recognition technique which can easily handle multisignal data in order to extract traffic information.
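To make the virtual sensor concrete, the following is a minimal sketch (not the authors' code) of how such a pixels line can be read from a recorded stream with OpenCV; the file name, focus-window coordinates and line row are illustrative assumptions:

```python
# Minimal sketch of the "pixels line" virtual sensor: for each frame, the gray
# intensities along a fixed transversal line of the focus window are stacked
# into an N x T multisignal. Paths and coordinates below are assumptions.
import cv2
import numpy as np

def extract_multisignal(video_path, window, line_row):
    """Return an N x T array: one intensity signal per pixel of the line."""
    x0, y0, x1, y1 = window                  # assumed focus-window coordinates
    cap = cv2.VideoCapture(video_path)
    columns = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        roi = gray[y0:y1, x0:x1]             # focus window
        columns.append(roi[line_row, :].astype(float))  # the pixels line
    cap.release()
    return np.stack(columns, axis=1)         # shape (N, T)

# y = extract_multisignal("traffic.avi", window=(0, 0, 640, 480), line_row=240)
```

Each row of the returned array is one pixel's intensity signal f(t), so the array as a whole is the multisignal described above.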

The proposed procedure can easily be extended to a fully embedded system wirelessly transmitting the derived information to servers which store it in data banks. This issue could be addressed in future work. However, before carrying out any implementation, we need to formulate the process. The following section proposes a rigorous mathematical formulation of the whole procedure.

Figure 1: A diagram describing the strategy proposed for the traffic information collection. The strategy can be summarized into three general tasks: traffic recording, data transmission and information retrieval.

4. Mathematical Formulation

Each frame from the video sequence is considered as a two-to-one quantitative application y : (x1, x2) ↦ R+, (x1, x2) ∈ Ω ⊂ N², where y is the color intensity and (x1, x2) are the pixel's coordinates in a Cartesian plane. Hence, the video is a time-dependent sequence that can then be considered as

y_{x1,x2}(t), t = 1, 2, . . . ,   (1)

with (x1, x2) ∈ Ω ⊂ N². Fixing x1 and x2 is equivalent to focusing on a single pixel within the video, which from one frame to another yields one discrete signal. In the experimental set-up described in section 3, the video processing strategy consists firstly in fixing a transversal pixel line facing the oncoming vehicles. Hence, we consider that we are dealing with a T-length video sample and that the dimension of the focus window is N × M, i.e. x1 = 1, . . . , N and x2 = 1, . . . , M. Then, fixing x2 yields an N-dimensional signal that we denote as follows:

y(t) = [y1(t), . . . , yx(t), . . . , yN(t)]†, t = 1, . . . , T,   (2)

where the superscript † denotes the vector transpose. This process is a multisignal that will be analyzed in order to extract valuable information about the traffic. To distinguish vehicles that move in space at the same time, we keep x in Eq. (2) as the space index, varying along the road width. Indeed, such a localization makes it possible to avoid considering a number of vehicles having the same characteristics and travelling in parallel as a single one.

Besides, since the data essentially come from a discretization of continuous visual information, they carry a considerable amount of noise. Such noise is commonly amplified by other factors such as sudden variations of vehicle speeds, shadow effects, particular conditions of luminosity and color, and possibly the presence of a number of moving artifacts in the scene. Consequently, we should consider Eq. (2) as a space-time stochastic process. Mathematically, this amounts to supposing that we have a probability space (Ω, F, P). We also suppose that we have X and τ, two totally ordered sets respectively indexing space and time. The function y describing the pixels' intensities is a process y : X × τ × Ω → R such that y(x, t, ω) = y_{X,τ}(ω), ω ∈ Ω. By abusing the notation, the process will simply be denoted {y_{X,τ}, x ∈ X, t ∈ τ}. Tracking moving objects in videos requires considering time-differentiated intensity signals rather than the primitives, because they accurately mark any new movement occurrence. Time-differentiating y leads to the following Stochastic Differential Equation (SDE):

dy(x, t) = f(x, t) dt + ϱ(x, t) dη(t), ∀ x, t,   (3)

where f(x, t) and ϱ(x, t) are respectively the drift and the volatility, and η(t) is a Brownian motion.

If we consider that y is a bivariate discrete function which, for each couple (x, t) ∈ N², admits a real random value y(x, t), the video sampled data can be represented as an N × T matrix,

y_{X,τ} = [y(x, t)]_{x=1,...,N, t=1,...,T}.   (4)

This provides a 3D pattern which consists of a number of signals (a vector of signals, i.e. a multisignal). These discrete signals will be processed before being summarized, with the aim of recognizing a number of characteristics of all the mobile engines that cross the focus window. Similarly, Eq. (3), discretized under some assumptions, leads to

∆y(x, t) = f(x, t) + ϱ ε_t, ∀ x, t,   (5)

where ∆y(x, t) = y(x, t) − y(x, t − 1) and ε_t ∼ N(0, 1). We denote the new data set

ẏ_{X,τ} = [∆y(x, t)]_{x=1,...,N, t=2,...,T},   (6)

which is the data set that will be mined and modelled in order to assess f(x, t), our essential traffic information.

The first piece of information we need is the adequate number of vehicles crossing the chosen focus window during the video sample sequence. This is equivalent to identifying the appropriate number of clusters in the multisignal dataset of Eq. (6), which consequently allows the size of the flows to be assessed. A cluster is a significant covariation of a set of neighboring signals in a manner resembling wave motion. Mathematically, recovering such clusters amounts to estimating the drift in Eq. (5), or simply to defining the times and locations at which high dispersions begin and end. This is also equivalent to focusing on the scatter S of all elements whose variations exceed some threshold, i.e.,

S = {x ∈ X, t ∈ τ : ε < |∆y(x, t)|},   (7)

where ε is a constant that is fixed sufficiently far from zero. The following proposition gives a determination of such a threshold according to the provided initial conditions.

Proposition 4.1. Let us consider the signals' system {ẏ(x, t)}_{x∈X} ranging between two moments t1 and t2, such that t1 < t2, before any vehicle has crossed the fictive line. There exists a threshold ε = sup_{t1≤t≤t2} |ẏ(x, t)|, ∀ x ∈ X, above which any variation announces the arrival of a moving object.

Now, we need to practically extract the high-amplitude parts of the multisignal, which in fact define the occurrence of vehicles. For that, we consider an Indicator Function (IF) of a set A ⊂ R+, which for all t ∈ R+ is defined as

1_A{y} = 1 if y ∈ A, and 0 if y ∉ A.

Similarly, we can define a Discrete Indicator Function (DIF) when A is a countable set. Among the appealing characteristics of such functions is that they allow a representative scatter to be extracted from any signal (discrete or continuous). For example, we can consider the following discrete composite indicator function,

1_{]ε̂,+∞[}{ẏ(x, t)} = 1 if |ẏ(x, t)| > ε̂, and 0 if |ẏ(x, t)| ≤ ε̂,   (8)

where ε̂ is preliminarily fixed as defined in Proposition 4.1. For all x ∈ X, t ∈ τ, this allows all scatter elements to be split, according to their amplitudes, into two binary modalities. This is a supplementary step towards detecting moving objects on the one hand and omitting the background of the scene on the other hand. However, if we want to keep the pixels' intensity dimension, we need to incorporate the real amplitudes, which partially reflect the colors of the vehicles (1). Hence, we rather consider the following dataset,

ỹ̇_{X,τ} = [ |∆y(x, t)| 1_{]ε̂,+∞[}{∆y(x, t)} ]_{x=1,...,N, t=2,...,T},   (9)

(1) Detecting vehicle colors and shapes is a very important topic for intelligent surveillance. This type of operation is most commonly used by government law enforcement and intelligence services, see Li et al. [27].

13

ACCEPTED MANUSCRIPT

which finally gives for each observation four dimensions, its coordinates (x, t),

333

its absolute amplitude ξ = |y(x, ˙ t)| and a binary variable 1]ˆε,+∞[ which indicates

334

whether such an element will be taken in the scatter S or not. The dichotomy

335

is an important fact since it allows to decompose the multisignal flow into two

336

essential areas, a road background and a pattern representing moving vehicles.

337

In the next step, we only consider elements x, t, ξ | 1 = 1. All these elements

338

are subsequently represented in a Cartesian coordinates system. This yields

339

a scatter that will be analyzed with the objective of determining an optimal

340

number of structures that corresponds to the number of vehicle that have crossed

341

the pixels line. In contrast, the complementary set that consists of elements

342

x, t, ξ | 1 = 0, will be omitted since it represents the road background that, in

343

fact, has no interest in our study. The new dataset whose elements verify 1 = 1

344

is a 3D set denoted S and it defines a scatter that corresponds to the moving

345

objects crossing the focus window. Now, the main objective is to determine the

346

number of disjoint subgroups and try to identify each one.

347

5. Clusters Analysis

348

5.1. Finite mixture models

ED

M

AN US

CR IP T

332

Cluster Analysis is an unsupervised pattern recognition method that splits

350

the data space into a set of subspaces. The object of a clustering scheme is to

351

perform a partition where elements within a cluster are similar and elements

352

in different clusters are dissimilar. Finite Mixture Models (FMMs) are popu-

353

lar unsupervised statistical learning methods used for automatically classifying

354

and modelling data. These models consider a parametrization of the variance

355

matrix of each cluster through their spectral decomposition leading to many

AC

CE

PT

349

356

geometric structures for clustering and classification [5, 39]. Estimating FMMs

357

models is often carried out using the well-known EM algorithm (expectation-

358

maximization) or any version of its many improved variants [16]. Although

359

FMMs have been incrementally used in many other fields such as in epidemiol-

360

ogy, biology and finance, they are used the first time in traffic vision. Below,

14

ACCEPTED MANUSCRIPT

we put forth the technical background of these parsimonious models, before

362

describing their practical implementation for counting and recognizing vehicles.

363

In a FMM, it is assumed that data are generated by a mixture of under-

364

lying probability distributions in which each component represents a different

365

cluster. Given observations y = (y1 , . . . , yN ), let ϕ(yi |Φg ) be the density of an

366

observation yi from the g-th component, where Φg are the corresponding pa-

367

rameters, and let G be the adequate number of components in the mixture. The

368

model for the composite of the clusters is generally formulated by maximizing

369

the following likelihood approach:

AN US

L(Φ1 , . . . , ΦG ; π1 , . . . , πG |y) = 370

371

372

N X G Y

CR IP T

361

i=1 g=1

πg ϕ(yi |Φg ),

(10)

where πg is the probability that an observation belongs to the g-th component, PG such that, πg ≥ 0 and g=1 πg = 1. Let us consider the case where ϕ(yi |Φg ) is a multivariate Gaussian probabil-

ity density function (p.d.f.), which is a model that has been used with consider-

374

able success in a large number of applications. In this instance, the parameters

375

Φg consist of a mean vector µg and a covariance matrix Σg , and the density is

376

given as follows:

ED

M

373

PT

ϕ(yi |µg , Σg ) = (2π)−n/2 |Σg |−1/2

× exp(− 21 (yi − µg )† Σ−1 g (yi − µg )).

(11)

As already developed in a monographs, Gaussian clusters are ellipsoidal, cen-

378

tered at the means µg . The covariances Σg determine their other geometric

CE

377

characteristics. In fact, each covariance matrix can be written as

AC

379

Σg = λg Dg Ag Dg† ,

(12)

1

380

where λg = |Σg | n , Dg is the orthogonal matrix of eigenvectors, Ag is a di-

381

agonal matrix such that |Ag | = 1, with the normalized eigenvalues of Σg on

382

the diagonal in a decreasing order. The orientation of the principal compo-

383

nents of Σg is determined by Dg , while Ag determines the shape of the density

384

contours; λg specifies the volume of the corresponding ellipsoid. By allowing 15

ACCEPTED MANUSCRIPT

these quantities to vary between groups, parsimonious and easily interpreted

386

models, useful in describing various clustering or classification situations, can

387

be obtained. Varying the assumptions concerning the parameters λg , Dg and

388

Ag leads to a selection of different models, which offers a flexible stochastic

389

modelling toolbox.

CR IP T

385

As defined by Biernacki et al. [5], we can distinguish 3 main categories of

391

models. One of them consists in assuming spherical shapes, namely Ag = I,

392

where I is the identity matrix. These models are named spherical and are de-

393

noted [λI]. Another family of interest consists of assuming that the variance

394

matrices Σg are diagonal. This means that Dg are permutation matrices. We

395

write Σg = λB where B is a diagonal matrix verifying |B| = 1. This category

396

is named diagonal and is denoted [λB]. The third category allows volumes,

397

shapes and orientations of clusters to vary or to be equal between clusters. It is

398

named general and is denoted [λDAD† ]. It is noticeable that for each category,

399

a subscript under a parameter means that the parameter is let varying between

400

clusters. Overall, 28 Gaussian Mixture Models can be used as automatic classi-

401

fiers (see Table 1).

M

AN US

390

Weights or proportions πg ’s are also important parameters that enrich GMMs

403

since they determine proportions of data that a priori belong to their respective

404

clusters. According to Biernacki et al. [5], two typical assumptions are consid-

405

ered with regard to the proportions: assuming either equal or free proportions

406

over the mixture components. Therefore, models can rather be classified ac-

407

cording to the degree of freedom allowed to the components of the quantity

CE

PT

ED

402

πλDAD† . Once added to the above geometric features, we can obtain the

409

twenty-eight models of Table 1. By varying such parameters, a selection of

410

parsimonious and easily interpreted models are ready to be calibrated.

AC

408

411

5.2. Models learning

412

In an unsupervised clustering context, the number of groups has to be spec-

413

ified before running the estimation procedure. Then, during the estimation

414

process, data move iteratively from one cluster to another, starting from an 16

ACCEPTED MANUSCRIPT

Table 1: Parametrization of a Gaussian Mixture Model (GMM). (The index g refers to varying parameters between clusters)

Models

Equal π 0 s

Varying π 0 s

General

[λDAD† ]

(1)

(15)

[λg DAD† ]

(2)



(3)

[λg DAg D† ]

(4)

[λDg ADg† ]

(5)

[λg Dg ADg† ]

(6)

[λDg Ag Dg† ]

(7)

(21)

[λg Dg Ag Dg† ]

(8)

(22)

[λB]

(9)

(23)

(10)

(24)

(11)

(25)

(12)

(26)

[λI]

(13)

(27)

[λg I]

(14)

(28)

[λg B] [λBg ]

M

[λg Bg ]

(17)

(18)

(19)

(20)

ED

Spherical

(16)

[λDAg D ]

AN US

Diagonal

CR IP T

Category

initial position that has been chosen [8]. Typically, the number of components does not change during the course of the iterations. The EM algorithm [16], or one of its variants [5], can be qualified as the main machinery that appropriately ensures a GMM calibration. The principle is described as follows. Consider a data set typically composed of N vectors y = (y1, . . . , yN) in R^n; the aim is to estimate an unknown partition z of y into G clusters, where z = (z1, . . . , zN) denotes the labels, with zi = (zi1, . . . , ziG) and zig = 1 if yi belongs to the g-th cluster and 0 if not. The relevant assumptions are that the density of an observation yi given zi is expressed as

p(yi; zi) = ∏_{g=1}^{G} [πg ϕ(yi|µg, Σg)]^{zig},   (13)

and that each zi is independent and identically distributed (i.i.d.) under a multinomial density. The resulting complete-data log-likelihood is

l(µg, Σg, πg, zig | y) = ∑_{i=1}^{N} ∑_{g=1}^{G} zig log[ πg ϕ(yi|µg, Σg) ].   (14)

The Expectation-step of the EM algorithm for mixture models is given by

ẑig = E[zig | yi, Φ1, . . . , ΦG] = π̂g ϕ(yi|µ̂g, Σ̂g) / ∑_{g′=1}^{G} π̂g′ ϕ(yi|µ̂g′, Σ̂g′),   (15)

while the Maximization-step involves maximizing Eq. (14) in terms of the πg and Φg, with the zig fixed at the values computed in the Expectation-step. From reasonable starting parameters [5], the EM algorithm alternates between these two steps. The routine continues until the relative difference between successive values of the mixture log-likelihood falls below a small threshold. Under certain conditions, the method can be shown to converge to a local maximum of the mixture likelihood (see [16]). For a GMM, the estimates of the means and probability weights have simple closed-form expressions involving the data and the quantities ẑig from the Expectation-step:

πg = ng/N,   µg = ( ∑_{i=1}^{N} ẑig yi ) / ng,   Σg = ( ∑_{i=1}^{N} ẑig (yi − µg)(yi − µg)† ) / ng,   (16)

where ng = ∑_{i=1}^{N} ẑig. Details of the Maximization-step for Σg parameterized by its spectral decomposition can be found in Celeux and Govaert [8].

Details of the Maximization-step for Σg parameterized

CE

PT

ˆig . i=1 z

(16)

We need to count and estimate the features of the separate subgroups that

440

are composed of points that are the least dispersed in both time and space.

441

As discussed in section 3, each one of these groups contains information about

AC

439

442

a passing vehicle. Since the components are well defined in time and space,

443

their centers (means) will inform about their localizations and distances one

444

another. The dispersion of groups is also important because it informs about

445

the size/velocity ratio, which allows to classify vehicles into categories. In fact,

446

larger vehicles are known to have a limited speed, unlike small and midsize 18

ACCEPTED MANUSCRIPT

vehicles. They are generally slowed because of the current traffic laws and

448

regulations, in comparison with other vehicles such as the light and commercial

449

cars. Hence, the greater the dispersion of the cluster, the slower and larger the

450

vehicle will be assumed. Another assumptions that should be considered is that

451

no vehicles stop in the captured road videos.

CR IP T

447

To identify motion groups, it is convenient to proceed using some particu-

453

lar clustering tools. Many clustering algorithms are not able to preliminarily

454

provide the appropriate number of clusters, and therefore they must initially

455

be supplied with this information. Since this information is seldom previously

456

known, we need to run the algorithm many times with a different value for

457

each run. Then, the partition that best fits the data is chosen. The process of

458

estimating how well a partition fits the structure underlying the data is known

459

as cluster validation. Several Cluster Validation Indices (CVIs) have been pro-

460

posed in the literature [2]. One of them is to focus on the partitioned data and to

461

measure the compactness and separation of the clusters, where approaches such

462

as those of Dunn [14], Davies-Bouldin [15], Calinski-Harabasz [7] and Rousseeuw

463

[38] are the most prominent.

M

AN US

452

When using GMMs, a set of Selection Criteria have been used for choosing

465

an optimal model and its appropriate number of clusters. The most popular

466

criteria are the Bayesian Information Criterion (BIC), the Integrated Complete

467

Likelihood (ICL) and the Normalized Entropy Criterion (NEC). It is noticeable

468

that the NEC is only used for assessing the number of clusters of a mixture

469

model. However, when data distribution is significantly different from a Gaus-

CE

PT

ED

464

sian mixture, such criteria can provide a wrong number of components, which

471

may leads to misleading interpretations. Hence, we develop a procedure whose

472

role is to firstly determine the optimal number of clusters at each subsequence.

AC

470

As an alternative, we propose a simple strategy for preliminary counting

disjoint subgroups. For each sampled subsequence, we begin by transforming data, composing the scatter y˜˙ , and then, fixing a maximum number of clusters Gmax . Then, we perform the adjustment of the Gmax -components GMM to the data, to obtain a set of clusters C = {C1 , C2 , . . . , CGmax } with respective means 19

ACCEPTED MANUSCRIPT

ˆ 1, . . . Σ ˆ G }. The data set y˜˙ is consequently {ˆ µ1 , . . . , µ ˆGmax } and covariances {Σ max

shared between these clusters to form the set we denote {y˜˙ (1) , . . . y˜˙ (Gmax ) }, where

subsets y˜˙ (h) , h = 1, . . . , Gmax , are partitioned but not necessarily disjoint. Let

Sg =

S

C∈C

CR IP T

us consider the structures of connected clusters: C, g = 1, . . . , G, (G ≤ Gmax ) T

C 6= ∅. Clusters belonging to the same structure

473

verifying within each group g,

474

are labelled to a specific class (the gth) among the G separate classes. Defining

475

these structures (their number and components) can be ensured by checking

477

elements y˜˙ i simultaneously belonging to more then one cluster. If this is the T case (i.e. y˜˙ i ∈ C), y˜˙ i is assigned to Sg . It is noticeable that the frequency

AN US

476

of elements within intersection zones can be of great importance for defining

479

the adequate number of vehicles and delimiting them, especially when data are

480

corrupted by adverse weather and illumination conditions. Moreover, it deserves

481

to be mentioned that this approach is mathematically obvious since we know

482

all Gmax clusters equations. Finally, once composed structures are well defined,

483

they will reflect the locations and areas of the scatters corresponding to the G

484

vehicles in the current video subsequence.

485

6. Implementation

PT

ED

M

478

Before performing the cluster analysis, we should subdivide the traffic video

487

steaming into a set of short sequences in order to make the data modelling

488

possible. Practically, this amounts to deal with subsequences and send them

489

one-by-one to the clustering center as soon that they are recorded and trans-

490

formed. The length of subsequences must be arranged in a way that they do

AC

CE

486

491

not contain more than 12 vehicles. This is done by considering, for each subse-

492

quence, the time interval corresponding to a pavement portion which is assumed

493

to handle not more than a dozen of crowded vehicles. The width of intervals can

494

accordingly be preliminarily fixed by estimating the adequate surface for such

495

a number, taking also into account inter-vehicular distances. Besides, vehicles

20

ACCEPTED MANUSCRIPT

496

must treated on a sequence of mutually disjoint intervals. In other words, a

497

vehicle that is considered in two successive subsequences have to be processed

498

only at one of them. If we consider a set of splitting-times t0 , t0 + ∆t, t0 + 2∆t, t0 + 3∆t, . . .,

500

which corresponds to the succession of periodic times stating the end of a video

501

sequence and the beginning of a new one, a number of vehicles will probably

502

belong to two successive sequences. Hence, we need to particularly treat vehicles

503

that intersect frontiers. We propose a simultaneous strategy whose role is to

504

collect and treat such cases in a parallel independent process, see Figure 2.

505

The strategy consists, after performing the cluster analysis on the first video

506

subsequences, in extracting clusters that coincide with the splitting time and

507

merge them. This can be done by re-clustering the datasets related to such

508

clusters.

AN US

CR IP T

499

Explicitly, a number of clusters {C1 , C2 , . . . , Cg , . . . . . .} with respective means

509

{µ1 , µ2 , . . .} and respective covariances {Σ1 , Σ2 , . . .} may occur at the end of the

511

video sequence. A cluster Cg is consequently split into two parts Cg

512

Parts are localized in both time and space, consequently, they can be identified

513

as composing the same cluster which coincides with the splitting-time. In a sec-

514

ond stage, couples Cg

515

are consolidated through redrawing their common bounds using a re-clustering

516

task performed to delimit their scatters. We should notice that, when we are

517

facing a low frequency traffic, such intersection situations can be straightfor-

518

wardly avoided by allowing more flexibility to the video band to last until all

(1)

(2)

and Cg .

ED

M

510

(2)

and Cg , ∀ g, are considered in a parallel process and

CE

PT

(1)

vehicles have entirely crossed the pixels line. Unfortunately, this is seldom the

520

case in urban areas.

AC

519

521

522

The overall data modelling stage can be summarized to the following main

523

tasks. 1- Multi-signals data are time-differentiated (before or after being se-

524

quentially processed). 2- A Gmax -components finite mixture model is assigned

525

to the new dataset. 3- Structures labelling is used to determine the adequate 21

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

Figure 2: A diagram explaining the video sequencing strategy and how to deal with vehicles that have been split between two adjacent subsequences. In the top, we consider two adjacent

ED

video subsequences A and B. In the bottom, we see how vehicles coinciding with the cutting time are considered in an independent subspace.

number of separated groups in each subsequence. 4- A parallel process is de-

527

voted to treat border clusters. 5- A final task is consecrated to reassign a single

528

cluster to each vehicle before a traffic report is rendered. Such a report contains

529

an assessment of the total number of vehicles as well as other information such

530

as the vehicles’ velocity and weight, the intervehicular distances and the traffic

531

fluidity. The instruction set and programs are outlined in Tables 4 and 5 (see

CE

PT

526

the Appendix).
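The following is a high-level sketch of the sequencing pipeline described above (an assumed decomposition, not the authors' program): each subsequence is clustered independently, and clusters touching a splitting-time are pooled and re-clustered in a parallel "border" pass. The helper count_structures is the sketch from section 5.3, and the margin parameter is hypothetical.

```python
# Pipeline sketch for tasks 1-5: segment, count, and merge border clusters.
import numpy as np

def process_stream(scatter, splits, margin, count_structures):
    """scatter: rows (t, x, xi); splits: sorted splitting-times t0 + k*Δt."""
    totals, border = 0, []
    edges = [-np.inf] + list(splits) + [np.inf]
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = scatter[(scatter[:, 0] >= lo) & (scatter[:, 0] < hi)]
        if len(seg) == 0:
            continue
        G, labels = count_structures(seg)        # tasks 2-3: GMM + labelling
        for g in range(G):
            part = seg[labels == g]
            t_min, t_max = part[:, 0].min(), part[:, 0].max()
            if any(min(abs(t_min - s), abs(t_max - s)) < margin for s in splits):
                border.append(part)              # task 4: treat in parallel
            else:
                totals += 1                      # vehicle wholly inside segment
    if border:
        g_border, _ = count_structures(np.vstack(border))  # re-clustering
        totals += g_border                       # task 5: one cluster per vehicle
    return totals
```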


7. Simulations and Results


The following simulations are based on a short benchmark video. The objective is to test the data processing program of the proposed strategy for counting and recognizing vehicles, which is in fact the last of the three main tasks that compose the strategy (see Figure 1 in section 3). A server machine with a 2.3 GHz processor and 8 GB RAM is used during the experiments. As seen in section 4, processing traffic videos begins by splitting long streams into short sequences, where the length of a sequence should depend on the frequency of vehicles on the road section. Let us recall that the main role of the proposed strategy is to ensure the automatic collection of the vehicles' information from the recorded video, in particular their number, spatial separation and categories.

Figure 3: The 157th and 170th sampled frames taken from the benchmark traffic video ((a) 157th frame; (b) 170th frame).


The benchmark sequence, whose recording appears to have been made in good conditions, properly describes the passage of a flow of vehicles of similar sizes on a one-way road. The road consists of two carriageways, but we will only focus on the principal one, which directly faces the camera. Figure 3 shows an excerpt of two frames taken from the benchmark video. Figure 4 illustrates the focus window and the pixels line choices. The video duration is d = 17 seconds. The sequence is firstly discretized by cutting it into 531 frames. This implies that a constant steplength ∆t = 0.032 seconds has been preliminarily fixed. It is worth noticing that we can change the time scale resolution ∆t to either zoom in or zoom out the representation, according to the nature of the traffic.

7.1. Experiment 1


556

As defined above, the proposed program consists firstly in carefully fixing

557

the focus window and the pixels line in the video, which is carried out as seen in 23

AN US

(a) Focus Window

CR IP T

ACCEPTED MANUSCRIPT

(b) Wireless Inductive Loop

Figure 4: The focus window and the fictive-loop (pixels line) choices.

Figure 4. The fictive line defines the first set of signals that contain information

559

about the traffic. Such an information remains hidden behind an amount of

560

systematic noise. The multivariate time series are firstly transformed to their

561

squared differences, see Figure 5. Then, the program ensures the projection

562

of such data on a functional subspace that allows to split them into two di-

563

chotomous levels before extracting only the higher level. The threshold εˆ is

564

fixed according to Proposition 4.1. In the next step, two among the best-known

565

CVIs are performed in order to determine the number of vehicles G that cross

PT

ED

M

558

the fictive wire inductive-loop. The Davies-Bouldin and Silhouette indices re-

567

spectively attain their minimal and maximal at the fourth cluster. The same

568

results is obtained when considering a preliminary number Gmax =15 which is

AC

CE 566

569

adjusted according to the clusters intersections. Consequently, an adequate 4-

570

components mixture model is assigned to the transformed data. Such a model

571

summarizes the video sequence, thereby defining the number of vehicles G = 4,

572

their size (weights) and intervehicular distances (localization), see Figure 6 and

573

Table 2. 24
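The CVI selection step can be sketched as follows (an illustrative reproduction using scikit-learn's silhouette and Davies-Bouldin implementations with k-means partitions; the paper does not specify the exact partitioning settings, so these are assumptions):

```python
# Sweep candidate cluster counts and score each partition with two CVIs.
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

def best_cluster_count(S, k_max=20, seed=0):
    """Return the k minimizing Davies-Bouldin and the k maximizing silhouette."""
    db, sil = {}, {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(S)
        db[k] = davies_bouldin_score(S, labels)
        sil[k] = silhouette_score(S, labels)
    return min(db, key=db.get), max(sil, key=sil.get)

# k_db, k_sil = best_cluster_count(S)  # in Figure 6(a) both indices point to G = 4
```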

Figure 5: Left, the plot of the traffic multisignal ((a) Traffic Multisignal; color level versus time in frames). Right, a 3D representation of the squares of the time-differentiated multisignal ((b) Squared t-Differentiated Data).

Table 2: EM-estimated Gaussian mixture parameters. A four-components model is assigned to the data. The four vehicles have almost the same size.

                     Component 1      Component 2      Component 3      Component 4
Weights (πg)         0.262            0.284            0.213            0.240
Localization (µg)    (172.73, 388.67) (252.92, 85.32)  (345.50, 421.01) (415.64, 92.78)
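Once G is known, a Table 2-style report can be produced as in the short sketch below (a plausible reproduction using scikit-learn, not the authors' exact program):

```python
# Fit a G-components GMM to the scatter and print weights and centers.
import numpy as np
from sklearn.mixture import GaussianMixture

def traffic_report(S, G=4, seed=0):
    gm = GaussianMixture(n_components=G, random_state=seed).fit(S)
    order = np.argsort(gm.means_[:, 0])      # sort components by crossing time
    for g in order:
        print(f"weight={gm.weights_[g]:.3f}, "
              f"center (t, x)={gm.means_[g][:2].round(2)}")

# traffic_report(S, G=4)   # weights ≈ π_g and centers ≈ µ_g as in Table 2
```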

7.2. Experiment 2

The second experiment is also conducted on the basis of the same video benchmark, see Figure 3. The objective is the same, except that we assume that, when sequencing the video stream, some passing vehicles are split between two adjacent subsequences. Hence, we slightly modify the initial conditions of the simulation in such a way that we create these situations. Precisely, the splitting-times are chosen at the very moments vehicles cross the pixels line. Creating such perturbations allows the robustness of the strategy to be further tested. It is worth mentioning that no change is made to either the focus window or the pixels line. The proposed algorithms are launched with the objective of retrieving the adequate number of vehicles.


Figure 6: Counting and pattern recognition. (a) Davies-Bouldin and silhouette values for each number of clusters tested. (b) A silhouette plot created from the clustered data. (c) The EM-estimated clusters (15 clusters). (d) The 4 clusters obtained after re-clustering.

number of vehicles. In other words, the four vehicles crossing the window have

586

to be recounted and modelled. In this experiment, we use two computers, one of them as a transmitter

588

and the other as a receiver. The transmitter ensures the subdivision of the

589

video and the transmission of subsequences one after another. The receiver

590

picks up subsequences and carry out all other tasks. This experiment simulates

591

a synchronous implementation of the framework. Cutting the video sequence

592

is made at t1 = 8 s : 128 ms and t2 = 13 s : 216 ms, which gives the three

593

subsequences whose scatters are plot in Figure 7. Unlike the first experiment,

594

we only use Gaussian Mixture Models to both count and recognize vehicles.

595

As explained above, the principle is to fixe in advance a maximum number of

596

clusters (≥10). Once estimation is complete, new groups of intersected clusters

597

at each sequence gives the number of vehicles G in that sequences. A second

598

estimation is carried out, with a preliminary fixed number of clusters G, and

599

aims at recognizing the vehicles categories. Frontier clusters are particularly

600

treated as explained above (see Figure 8), to finally find the same results of the

601

first experiment (see Figure 9).
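The count-then-recognize loop can be sketched as follows. The intersecting-clusters criterion is approximated here by a centroid-distance rule with a hypothetical threshold (merge_dist), so this is an illustration of the principle rather than the paper's exact rule.

# A sketch of the two-stage procedure: over-fit with Gmax components,
# merge components standing for the same vehicle, then re-fit with G.
# The centroid-distance merge rule and its threshold are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def count_then_recognize(scatter, g_max=10, merge_dist=60.0):
    # Stage 1: deliberately over-fit with g_max components.
    gmm = GaussianMixture(n_components=g_max, random_state=0).fit(scatter)
    # Greedily group components whose centers are closer than merge_dist.
    groups = []
    for i, mu in enumerate(gmm.means_):
        for grp in groups:
            if any(np.linalg.norm(mu - gmm.means_[j]) < merge_dist for j in grp):
                grp.append(i)
                break
        else:
            groups.append([i])
    g = len(groups)  # estimated number of vehicles G
    # Stage 2: re-fit with G components to recognize the categories.
    final = GaussianMixture(n_components=g, random_state=0).fit(scatter)
    return g, final.weights_, final.means_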

Figure 7: Scatters extracted from the three consecutive subsequences after subdividing the traffic video.

Figure 8: A first cluster analysis for identifying vehicles. Border clusters whose center-to-center distance does not exceed a limited value are first identified, to be merged into a unique cluster.

7.3. Experiment 3

The current experiment considers cases where the recording conditions have not been optimized. One way to simulate this situation is to use benchmark sequences such as those given in the left-hand side of Figure 10. In the first sequence, the camera angle and the distance do not seem appropriate for obtaining perfect results. Moreover, the traffic flow rate is higher than that of the previous experiment, with a value exceeding 10000 vehicles per hour (see Chan and Vasconcelos [9]). The second and third traffic videos respectively represent cases where fog and snowfall engulf the landscape, thus affecting the visibility of the scene. These adverse weather conditions are commonly encountered in traffic scenes. As in the first case, it is clear that the shots have been taken from a long distance. The two latter sequences are provided by Eichberger, Fleischer and Leuck² and have been repeatedly used as benchmark data in earlier works such as Chen et al. [10] and Zhu et al. [58].

The aim of this experiment is to test the performance and usefulness of the proposed approach under more or less unfavorable conditions. To meet this objective, we propose a numerical simulation in which we use the clustering techniques to detect the vehicles on their roads in each one of the benchmark sequences.

² http://i21www.ira.uka.de/image_sequences (Karl-Wilhelm-Straße): stationary camera installation by German Eichberger, Klaus Fleischer and Holger Leuck.

Figure 9: A second analysis for merging border clusters. Border clusters whose center-to-center distances do not exceed a limited value are merged into a unique cluster. Panels (a)-(d) plot the space-time scatters of subsequences 1-4.

Any one of the 28 models of Table 1 can be assigned to each of these scenes, and this is done according to the Bayesian criterion. This procedure is repeated several times in order to provide only consistent results (i.e., results corresponding to the most recurrent models). In this experiment, we assess the detection accuracy on the basis of the fit between clusters and actual vehicles: each cluster surrounding the scatter corresponding to a particular vehicle is counted among the right predictions; otherwise, the detection of the vehicle is considered wrongly predicted. Throughout the simulation study, the most recurrent model for each one of the scenes is depicted in the right-hand side of Figure 10.
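As a sketch of this model-assignment step, the snippet below scores candidate mixtures by the Bayesian Information Criterion and keeps the minimizer. Scikit-learn exposes only four covariance structures, which here stand in for the 28 parameterizations of Table 1, so this is an approximation of the procedure rather than a reproduction of it.

# A sketch of BIC-based model selection over a grid of mixture models;
# `scatter` is assumed to be the (N, 2) array of motion points.
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(scatter, g_candidates=range(1, 16)):
    best, best_bic = None, np.inf
    for cov in ("full", "tied", "diag", "spherical"):
        for g in g_candidates:
            gmm = GaussianMixture(n_components=g, covariance_type=cov,
                                  random_state=0).fit(scatter)
            bic = gmm.bic(scatter)  # lower BIC = better model
            if bic < best_bic:
                best, best_bic = gmm, bic
    return best, best_bic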

For Sequence 1, almost all of the 12 vehicles are suitably detected and accurately quantified, although the shots have been taken from a long distance. The red scatter fitting represents the only exception, since that cluster is somewhat affected by a few outliers. Since the data naturally overlap in the current experiment, contrary to what was previously done when dealing with intersecting clusters, the counting rule has been slightly modified: we rather consider nested clusters as a single vehicle. Mathematically, this condition can be straightforwardly checked by testing whether all elements of the scatter of the included cluster belong to the space defined by the including cluster. A metric function for handling overlapping clusters according to the traffic density could be developed as part of a future work. For Sequences 2 and 3, the results seem the most affected by the exceptional recording conditions. In both cases, we only retrieve G − 1 vehicles, i.e., one vehicle fewer than are on the road. This shows that, with a distantly installed camera and under adverse weather and illumination conditions, the system can lead to imprecise results. In both problems, the threshold ε̂ has to be chosen carefully: a too-low value can give rise to a number of outliers, while a too-high value can hide some significant patterns, which are then replaced by dispersed points whose effect is similar to that of outliers, as in the case of the grey car at the bottom of Figure 10(c). Thus, as seen in Figures 10(d) and 10(f), the clusters in the neighborhood are affected by the few remaining points.
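One way to implement this nested-cluster condition is to test whether every point of the candidate inner cluster lies inside a fixed-level Mahalanobis ellipsoid of the including component; the probability level used below is an illustrative choice, not a value prescribed by the paper.

# A sketch of the nested-cluster check; each cluster is assumed to be
# summarized by its points, plus a mean and covariance from a fitted GMM.
import numpy as np
from scipy.stats import chi2

def is_nested(inner_points, outer_mean, outer_cov, level=0.99):
    # Squared Mahalanobis distance of each inner point to the outer component.
    inv_cov = np.linalg.inv(outer_cov)
    diff = inner_points - outer_mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Inside the ellipsoid iff below the chi-square quantile at the given level.
    return bool(np.all(d2 <= chi2.ppf(level, df=outer_mean.shape[0])))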

The current example highlights the importance of providing suitable preliminary conditions for the video recording. In exceptional circumstances, choosing an optimal threshold and performing adequate data preprocessing (such as defogging/desnowing) should be incorporated as essential parts of the process.

7.4. Experiment 4

The fourth application is a real study whose objective is to test the counting and recognizing methodology in a more realistic framework. The Saudi capital, Riyadh, is chosen as the place for carrying out the tests. Recording is made at the Northern Ring Road (East), which is one of the key roads in Riyadh, and the experiment is implemented following the procedure described in Section 3. A wireless IP camera (2.0 Megapixel) is fixed at a height of 7 meters for live video streaming, and the data are transmitted to a portable computer to be processed. The contour plot of the pixel-line intensity in Figure 11 gives an overview of the filmed sequence. A total of 106 vehicles are counted during the short period of the experiment; 6 of them are Heavy Goods Vehicles (HGVs), while the remainder are light vehicles. The video sequence was recorded on a Saturday morning of a sunny day in March.

The video sequence is carefully subdivided into 14 portions, in a manner that avoids border clusters. The sequence consists of 2972 frames, and the width of each portion varies between 100 and 250 frames. The first estimation is launched while fixing the preliminary number of vehicles per subsequence to 20. As explained in the previous section, this number is immediately reduced once groups of intersecting clusters are considered, and the final estimation assesses the vehicles' weights (categories). In the current example, the scatters have been further processed using the well-known Principal Component Analysis (PCA) method. The last estimation is carried out under general, diagonal and spherical models with, of course, varying proportions (see Table 1). The general models were the best according to the BIC measure, with a counting success exceeding 97% (see the results in Figure 12).
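The PCA step can be sketched as follows; for a two-dimensional space-time scatter, the projection essentially amounts to a decorrelating rotation of the axes before the final mixture fit. The function name and component count are illustrative, not the paper's exact setting.

# A sketch of the PCA processing applied to a subsequence scatter
# before the final estimation; names and settings are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def project_and_refit(scatter, n_vehicles):
    projected = PCA(n_components=2).fit_transform(scatter)
    gmm = GaussianMixture(n_components=n_vehicles,
                          covariance_type="full",  # a "general" model
                          random_state=0).fit(projected)
    return gmm, projected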

Figure 10: Detecting vehicles under unfavorable recording conditions. (a) Sequence 1: misplaced camera; (b) detecting vehicles in sequence 1; (c) Sequence 2: foggy landscape; (d) detecting vehicles in sequence 2; (e) Sequence 3: snowy landscape; (f) detecting vehicles in sequence 3. The detection panels plot the clustered scatters in the PCA plane (Axis 1 against Axis 2).

Besides, all the HGVs have been well identified and localized, as seen in subplots S1, S3, S9, S10 and S13 of Figure 12. However, the sensitivity of GMMs to outliers, which is clear in subplots S3 and S10, should be noted as one of the weaknesses of the method: outliers, caused by factors such as vehicle shadows or any other object crossing the detection line, can introduce uncertainty into the recognition function. The system has also proven promising in crowded scenes. In the areas where the vehicles were least scattered, the clusters have been properly assessed, which shows that, once the system is perfectly implemented, results in crowded areas can be of great interest.

Figure 11: A contour plot of the intensity multisignal drawn by the detector line. Recording is made at the Northern Ring Road (East) of Riyadh (Saudi Arabia).

It is well known that the Gulf countries are characterized by a desert climate where sandy and dusty winds often occur and affect the quality of the recorded videos, in addition to other known problems such as rain/snow/fog/haze conditions. Mixing such patterns can be assimilated to a randomly varying noise. Hence, as a second stage of this case study, additional experiments are carried out on the basis of a degraded version of the same video sequence. This time, the video is preliminarily blurred and contrast-modified before being corrupted (additively and multiplicatively) by a dynamic noise. We assume two different degradation models: the first consists in blurring the video with a Point Spread Function (PSF) before injecting noise; the second assumes a poor contrast ratio, with the video also being corrupted by a gradually increasing noise.
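The two degradation models can be reproduced along the following lines; the sketch assumes grayscale frames stored as float arrays in [0, 1], with a Gaussian PSF width and a contrast factor that are illustrative, while the noise variances follow the grid of Table 3.

# A sketch of the two frame-degradation models used in this robustness test.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def degrade_blur(frame, noise_var=0.05, psf_sigma=2.0):
    # Model 1: blur with a Gaussian PSF, then add zero-mean Gaussian noise.
    blurred = gaussian_filter(frame, sigma=psf_sigma)
    noisy = blurred + rng.normal(0.0, np.sqrt(noise_var), frame.shape)
    return np.clip(noisy, 0.0, 1.0)

def degrade_contrast(frame, noise_var=0.05, contrast=0.4):
    # Model 2: compress the contrast, then apply multiplicative speckle noise.
    low = 0.5 + contrast * (frame - 0.5)
    speckled = low * (1.0 + rng.normal(0.0, np.sqrt(noise_var), frame.shape))
    return np.clip(speckled, 0.0, 1.0)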

Figure 13 gives an overview of the degradation process through some selected noisy frames. An accuracy measure is used to assess the stability of the counting and recognizing method against noise and visual damage; the measure is the proportion of vehicles that have been correctly recounted and roughly recognized. The accuracy percentages are reported in Table 3. The results show that the clustering is significantly affected by the noise level and by the quality of the resolution. This adds another point to the list of shortcomings to be overcome in future research, namely by adopting hybrid strategies with denoising [41] and dehazing/deraining/desnowing schemes [24, 25, 35]. Such a combined strategy could be efficiently used in real, complex environments (e.g., adverse illumination and weather conditions).

Table 3: Assessing the accuracy percentage (%) of the counting and recognizing method while degrading the test video. A noise sequence is added to the video while blurring it and varying its contrast level.

                          Original   Blurred   Poor contrast
Additive noise
  N(0, 0.00)              97         97        97
  N(0, 0.03)              89         82        85
  N(0, 0.05)              43         30        40
  N(0, 0.10)              20         16        18
  N(0, 0.15)              14          9        11
  N(0, 0.20)               8          3         5
Multiplicative noise
  Speckle(0.00)           97         97        97
  Speckle(0.03)           96         96        95
  Speckle(0.05)           92         88        90
  Speckle(0.10)           81         48        80
  Speckle(0.15)           76         32        71
  Speckle(0.20)           29         25        27

To recapitulate, let us mention some of the distinguishing advantages of the proposed strategy. The cheapness of its implementation and maintenance, in comparison with many commonly used classical tools, is one such advantage; the device can easily be extended to an autonomous solar-powered system. We should also note the high practicability of the method, since its principle relies mainly on simple mathematical notions. In fact, the visual information is converted into quantitative data, which are handled by GMM statistical clustering tools. These models, which are known to be parsimonious, are estimated using the EM algorithm, which also has many appealing features such as simplicity and numerical stability. The originality of the approach is another advantage that deserves to be mentioned: the method is one of the rare ones that can simultaneously count and recognize vehicles. The extracted signals reflect the dynamics of the traffic and can consequently be further exploited to provide traffic information such as the degree of congestion, the types of vehicles, traffic incidents, etc. The results of the experiments show that the system can be easily implemented and extended to serve drivers' needs.

However, a number of shortcomings have appeared in this work and need to be addressed as a priority. The first issue to be critically examined is the consistency of the method, especially in the case of high-frequency traffic flows. Focusing on the robustness and accuracy of the approach for treating traffic with a considerable number of small vehicles, such as motorbikes and microcars, can also be carried out in a near-future work; these problems demand more sophisticated equipment to register high-resolution traffic video. Splitting videos into subsequences is also a task that weighs down the procedure, and treating frontier clusters in particular has been somewhat intricate; adopting a continuous strategy in future works would contribute to a more effective procedure. Besides, using Gaussian densities to describe clusters can be too restrictive, and the framework can be extended to more flexible distributions. Alternatively, the scatter data can be transformed so as to reflect Gaussian clusters more accurately. Finally, testing the system in real-world traffic conditions, such as ambient lighting (day or night), shadows, vehicle occlusion and inclement weather, is also a very challenging topic that needs careful consideration in the future.

8. Conclusion

The flows of circulating vehicles on the main roads of a region represent a determining factor in stimulating economic growth and development; they can also hide a number of economic externalities. Determining the size of these flows and the nature of the circulating vehicles is a matter of great importance, since it may explain a great part of the urban transport problems. Hence, appropriately estimating such flows is the first stage of the process towards defining a strategy with optimal choices on the use and combination of transportation means, so as to achieve maximum effectiveness in transports and transfers. However, studying this class of problems is generally hampered by the lack of real data, due to the high maintenance costs that existing techniques necessitate. In this paper, we developed a strategy based on video detection and simple inference on the resulting data streams. The proposed strategy consists in considering a virtual inductive loop within the traffic video by focusing on a set of pixels whose intensities define a set of signals. However, several exogenous factors, such as sudden variations of vehicle velocity, particular luminosity conditions, and the presence of moving artifacts in the scene, may introduce a considerable amount of randomness into the studied data. Besides, the color composition of vehicles can vary greatly, as the windshield and the top of a car can have different colors, affecting the precision of a counting method. Consequently, flexible and parsimonious statistical models are applied to summarize the set of signals, with the aim of finding and characterizing particular clusters corresponding to the population of vehicles. We have explained the whole strategy and its implementation process. A numerical validation carried out on a benchmark video sequence shows the accuracy of the technique. In forthcoming works, the technique will eventually be implemented as part of a project for counting and characterizing the vehicle fleets of the busiest roads in the different areas of the Tunisian capital.

Appendix

Table 4: Extracting, transforming and clustering the motion data.

Algorithm 1
Input: (y_{X,τ}, G_max, Θ_0, γ).
Output: A report containing the G_max cluster parameters Θ*_g, g = 1, ..., G_max, and a respective partition of the scatter.
1: Time-differentiate the data:
     for t = 1 to T do
         ẏ_{x,t} := y(x, t+1) − y(x, t)
     end for
2: Fix the threshold ε̂ = sup_{x,t} |ẏ(x,t)|.
3: Construct the dataset S whose elements form the scatter. Set S = ∅:
     for x = 1 to n do
         for t = 2 to T do
             if ε̂ < ẏ_{x,t} then
                 S := S ∪ {s}, where s = (x, t, ẏ_{x,t})†
             end if
         end for
     end for
4: Perform a cluster analysis on the dataset S. A G_max-GMM is chosen, and the EM algorithm is used to calculate its parameter set Θ, containing π, µ and Σ, and to partition the data into G_max sets of similar characteristics ỹ˙ = {ỹ˙(1), ..., ỹ˙(G_max)}. Starting from an arbitrary solution Θ_0, iteratively compute Θ*:
     Set i := 0 and calculate Θ_1 = EM(Θ_0), where EM(·) is as defined in Eqs. (15) and (16).
     while ||Θ_{i+1} − Θ_i|| ≥ γ do
         i := i + 1
         Calculate Θ_{i+1} = EM(Θ_i)
     end while
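For concreteness, steps 1-3 reduce to a few lines of array code. The sketch below assumes the pixel-line intensities form an (n, T) array, and scales the supremum of the differentiated signal by an illustrative fraction to obtain a usable threshold (the fraction is not a value from the paper).

# A minimal sketch of Algorithm 1, steps 1-3, on an (n, T) intensity array.
import numpy as np

def extract_scatter(y, frac=0.1):
    ydot = np.diff(y, axis=1)              # step 1: time-differentiation
    eps = frac * np.abs(ydot).max()        # step 2: threshold from sup |ydot|
    x_idx, t_idx = np.nonzero(ydot > eps)  # step 3: keep significant motion
    # Each row is a scatter element s = (x, t, ydot_{x,t}).
    return np.column_stack([x_idx, t_idx + 1, ydot[x_idx, t_idx]])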


Table 5: Clusters adjustment.

Algorithm 2
Input: (ỹ˙, t_0, δt, G_max, Θ*, γ).
Output: A report containing the number of vehicles G and their respective clusters {C_1, C_2, ..., C_G}.
1: Reduce the number of clusters to an optimal number G. Set G := G_max and 𝒢 := ∅:
     for h = 1 to G_max do
         if ỹ˙(h) ⊄ 𝒢 then
             𝒢 := 𝒢 ∪ {ỹ˙(h)}
         end if
         while ∃ ỹ˙_{x,t} ∈ ỹ˙(h) ∩ ỹ˙(h′), for h ≠ h′ and ỹ˙(h′) ⊄ 𝒢 do
             G := G − 1
             𝒢 := 𝒢 ∪ {ỹ˙(h′)}
         end while
     end for
2: Re-cluster the elements arising from neighboring clusters into G groups. A G-GMM is estimated using the EM algorithm (see Algorithm 1, step 4). The result is a set of parameters {Θ_1, Θ_2, ..., Θ_G} and their respective groups {ỹ˙(1), ..., ỹ˙(G)}.
3: Classify the clusters into two categories: border (B) and non-border (NB) clusters. Set B = NB = ∅:
     for g = 1 to G do
         if ∃ ỹ˙(g)_{x, t_0+δt} ∈ ỹ˙(g), ∀x then
             B := B ∪ {ỹ˙(g)}
         else
             NB := NB ∪ {ỹ˙(g)}
         end if
     end for
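Step 3 of Algorithm 2 can be sketched as follows: a cluster is flagged as a border cluster when it contains points lying on the temporal boundary t_0 + δt of the subsequence. The points/labels layout and the tolerance are assumptions made for illustration.

# A sketch of the border/non-border classification of Algorithm 2, step 3;
# `points` rows are assumed to be (x, t, ydot) and `labels` their clusters.
import numpy as np

def split_border_clusters(points, labels, t0, dt, tol=1.0):
    border, non_border = [], []
    for g in np.unique(labels):
        times = points[labels == g, 1]  # column 1 holds the frame index t
        if np.any(np.abs(times - (t0 + dt)) <= tol):
            border.append(g)            # touches the subsequence boundary
        else:
            non_border.append(g)
    return border, non_border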

Acknowledgment

We would like to thank the anonymous reviewers for their insightful and constructive comments that greatly contributed to improving the paper. Our many thanks go also to the editorial staff for their generous support and assistance during the review process.

References

[1] A. Ambardekar, M. Nicolescu, G. Bebis & M. Nicolescu, (2014). Vehicle classification framework: a comparative study. EURASIP Journal on Image and Video Processing, vol. 2014, no. 29, pp. 1–13.

[2] O. Arbelaitz, I. Gurrutxaga, J. Muguerza, J.M. Pérez & I. Perona, (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, vol. 46, no. 1, pp. 243–256.

[3] F. Archetti, E. Messina, D. Toscani & L. Vanneschi, (2006). Classifying and counting vehicles in traffic control applications. Applications of Evolutionary Computing, Lecture Notes in Computer Science, vol. 3907, pp. 495–499.

[4] E. Bas, M. Tekalp & F.S. Salman, (2007). Automatic vehicle counting from video for traffic flow analysis. IEEE Symposium on Intelligent Vehicle – IV, pp. 392–397.

[5] C. Biernacki, G. Celeux, G. Govaert & F. Langrognet, (2006). Model-based cluster and discriminant analysis with the MIXMOD software. Computational Statistics and Data Analysis, vol. 51, pp. 587–600.

[6] T. Bragatto, G. Ruas, V. Benso, M.V. Lamar, D. Aldigueri, G.L. Teixeira & Y. Yamashita, (2008). A new approach to multiple vehicle tracking in intersections using Harris corners and adaptive background subtraction. IEEE Intelligent Vehicles Symposium, pp. 548–553.

[7] T. Calinski & J. Harabasz, (1974). A dendrite method for cluster analysis. Communications in Statistics, vol. 3, pp. 1–27.

[8] G. Celeux & G. Govaert, (1995). Gaussian parsimonious clustering models. Pattern Recognition, vol. 28, pp. 781–793.

[9] A.B. Chan & N. Vasconcelos, (2005). Classification and retrieval of traffic video using auto-regressive stochastic processes. IEEE Intelligent Vehicles Symposium, Las Vegas, USA.

[10] S. Chen, M. Shyu, C. Zhang & J. Strickrott, (2002). A multimedia data mining framework: mining information from traffic video sequences. Journal of Intelligent Information Systems, vol. 19, no. 1, pp. 61–77.

[11] S.Y. Cheung, S. Coleri, B. Dundar, S. Ganesh, C. Tan & P. Varaiya, (2005). Traffic measurement and vehicle classification with single magnetic sensor. Journal of the Transportation Research Board, vol. 1917, pp. 173–181.

[12] M.V. Chitturi, J.C. Medina & R.F. Benekohal, (2010). Effect of shadows and time of day on performance of video detection systems at signalized intersections. Transportation Research Part C: Emerging Technologies, vol. 18, no. 2, pp. 176–186.

[13] B. Coifman, D. Beymer & P. McLauchlan, (1998). A real-time computer vision system for vehicle tracking and traffic surveillance. Transportation Research Part C: Emerging Technologies, vol. 6, no. 4, pp. 271–288.

[14] J.C. Dunn, (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, vol. 3, pp. 32–57.

[15] D.L. Davies & D.W. Bouldin, (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, pp. 224–227.

[16] A.P. Dempster, N.M. Laird & D.B. Rubin, (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1–38.

[17] A.B. De Oliveira & J. Scharcanski, (2010). Vehicle counting and trajectory detection based on particle filtering. Twenty-third SIBGRAPI Conference on Graphics, Patterns and Images, pp. 376–383.

[18] A. Elgammal, D. Harwood & L. Davis, (2000). Non-parametric model for background subtraction. Proceedings of the 6th European Conference on Computer Vision, Part II, pp. 751–767.

[19] Z. Fu-min, L. Lü-chao, J. Xin-hua & L. Hong-tu, (2014). An automatic recognition approach for traffic congestion states based on traffic video. Journal of Highway and Transportation Research and Development, vol. 8, no. 2, pp. 72–80.

[20] T. Gates, S. Schrock & J. Bonneson, (2004). Comparison of portable speed measurement devices. Transportation Research Record: Journal of the Transportation Research Board, vol. 1870, pp. 139–146.

[21] T. Horprasert, D. Harwood & L.S. Davis, (1999). A statistical approach for real-time robust background subtraction and shadow detection. IEEE ICCV'99 Frame-Rate Workshop.

[22] J.S. Hu & T.M. Su, (2007). Robust background subtraction with shadow and highlight removal for indoor surveillance. EURASIP Journal on Advanced Signal Processing, vol. 10, no. 1, pp. 108–132.

[23] Z. Iftikhar, P. Premaratne & P. Vial, (2014). Computer vision based traffic monitoring system for multi-track freeways. Lecture Notes in Computer Science, vol. 8589, pp. 339–349.

[24] H.G. Kim, S.J. Seo & B.C. Song, (2015). Multi-frame de-raining algorithm using a motion-compensated non-local mean filter for rainy video sequences. Journal of Visual Communication and Image Representation, vol. 26, pp. 317–328.

[25] J.-H. Kim, J.-Y. Sim & C.-S. Kim, (2015). Video deraining and desnowing using temporal correlation and low-rank matrix completion. IEEE Transactions on Image Processing, vol. 24, no. 9, pp. 2658–2670.

[26] A. Lagorio, E. Grosso & M. Tistarelli, (2008). Automatic detection of adverse weather conditions in traffic scenes. IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, pp. 273–279.

[27] D. Li, J. Shan, Z. Shao, X. Zhou & Y. Yao, (2013). Geomatics for smart cities – concept, key techniques, and applications. Geo-spatial Information Science, vol. 16, no. 1, pp. 13–24.

[28] Y. Liu, Y. Lu, Q. Shi & J. Ding, (2013). Optical flow based urban road vehicle tracking. Ninth International Conference on Computational Intelligence and Security (CIS), pp. 391–395.

[29] K.H. Lu, C.M. Wang & S.Y. Chen, (2008). Traffic light recognition. Journal of the Chinese Institute of Engineers, vol. 31, pp. 1069–1075.

[30] J.C. McCall & M.M. Trivedi, (2006). Video-based lane estimation and tracking for driver assistance: survey, system, and evaluation. IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 1, pp. 20–37.

[31] S.K. Meher & M.N. Murty, (2011). Efficient detection and counting of moving vehicles with region-level analysis of video frames. Proceedings of the International Conference on SocProS, AISC 131, pp. 29–40.

[32] L. Mimbela, (2007). A summary of vehicle detection and surveillance technologies used in intelligent transportation systems. Federal Highway Administration, Washington, D.C.

[33] P.K. Mishra, M. Athiq, A. Nandoriya & S. Chaudhuri, (2013). Video-based vehicle detection and classification in heterogeneous traffic conditions using a novel kernel classifier. IETE Journal of Research, vol. 59, no. 5, pp. 541–550.

[34] G. Mu, Z. Xinyu, L. Deyi, Z. Tianlei & A. Lifeng, (2015). Traffic light detection and recognition for autonomous vehicles. The Journal of China Universities of Posts and Telecommunications, vol. 22, no. 1, pp. 50–56.

[35] S.G. Narasimhan & S.K. Nayar, (2003). Contrast restoration of weather degraded images. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 6, pp. 713–724.

[36] X. Pan, Y. Guo & A. Men, (2010). Traffic surveillance system for vehicle flow detection. Second International Conference on Computer Modeling and Simulation, pp. 314–318.

[37] H. Rabbouch, F. Saâdaoui & A.V. Vasilakos, (2016). A wavelet-assisted subband de-noising for the tomographic image reconstruction. Forthcoming.

[38] P. Rousseeuw, (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65.

[39] F. Saâdaoui, (2012). A probabilistic clustering method for US interest rate analysis. Quantitative Finance, vol. 12, no. 1, pp. 135–148.

[40] F. Saâdaoui & H. Rabbouch, (2014). A wavelet-based multi-scale vector ANN model for econophysical systems prediction. Expert Systems with Applications, vol. 41, no. 13, pp. 6017–6028.

[41] R. Sammouda, A.M.S. Al-Salman, A. Gumaei & N. Tagoug, (2015). An efficient image denoising method for wireless multimedia sensor networks based on DT-CWT. International Journal of Distributed Sensor Networks, vol. 2015, pp. 1–13.

[42] A. Sánchez, P.D. Suárez, A. Conci & E. Nunes, (2011). Video-based distance traffic analysis: application to vehicle tracking and counting. Computing in Science and Engineering, vol. 13, no. 3, pp. 38–45.

[43] N. Seenouvong, U. Watchareeruetai, C. Nuthong, K. Khongsomboon & N. Ohnishi, (2016). A computer vision based vehicle detection and counting system. 8th International Conference on Knowledge and Smart Technology (KST), Chiangmai, Thailand.

[44] S.L. Skszek, (2001). "State-of-the-Art" report on non-traditional traffic counting methods. Technical report no. FHWA-AZ-01-503, Arizona Department of Transportation.

[45] M. Sun, K. Wang, M. Tang, F.-Y. Wang & J. Yang, (2011). Video vehicle detection through multiple background-based features and statistical learning. Proceedings of the IEEE Intelligent Transportation Systems Conference, Washington, DC, pp. 1337–1342.

[46] L. Unzueta, M. Nieto, A. Cortés, J. Barandiaran, O. Otaegui & P. Sànchez, (2012). Adaptive multicue background subtraction for robust vehicle counting and classification. IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 2, pp. 527–540.

[47] K. Wang & Y. Yao, (2015). Video-based vehicle detection approach with data-driven adaptive neuro-fuzzy networks. International Journal of Pattern Recognition and Artificial Intelligence, vol. 29, no. 7, pp. 1–32.

[48] Y. Xia, X. Shi, G. Song, Q. Geng & Y. Liu, (2016). Towards improving quality of video-based vehicle counting method for traffic flow estimation. Signal Processing, vol. 120, pp. 672–681.

[49] Y. Xia, C. Wang, X. Shi & L. Zhang, (2014). Vehicles overtaking detection using RGB-D data. Signal Processing, vol. 112, pp. 98–109.

[50] Y. Xia, W. Xu, L. Zhang, X. Shi & K. Mao, (2014). Integrating 3D structure into traffic scene understanding with RGBD data. Neurocomputing, vol. 151, part 2, pp. 700–709.

[51] J. Yang & Y. Dai, (2012). A modified method of vehicle extraction based on background subtraction. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–5.

[52] M.T. Yang, R.K. Jhang & J.S. Hou, (2013). Traffic flow estimation and vehicle-type classification using vision-based spatial-temporal profile analysis. IET Computer Vision, vol. 7, no. 5, pp. 394–404.

[53] N.H.C. Yung & A.H.S. Lai, (2001). An effective video analysis method for detecting red light runners. IEEE Transactions on Vehicular Technology, vol. 50, pp. 1074–1084.

[54] H. Zhang & K. Wu, (2012). A vehicle detection algorithm based on three-frame differencing and background subtraction. Fifth International Symposium on Computational Intelligence and Design (ISCID), pp. 148–151.

[55] L. Zhang, M. Song, X. Liu, J. Bu & C. Chen, (2013). Fast multi-view segment graph kernel for object classification. Signal Processing, vol. 93, pp. 1597–1607.

[56] L. Zhang, Y. Gao, C. Hong, Y. Feng, J. Zhu & D. Cai, (2014). Feature correlation hypergraph: exploiting high-order potentials for multimodal recognition. IEEE Transactions on Cybernetics, vol. 44, pp. 1408–1419.

[57] R. Zhao & X. Wang, (2013). Counting vehicles from semantic regions. IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 1016–1022.

[58] J. Zhu, Y. Lao & Y.F. Zheng, (2010). Object tracking in structured environments for video surveillance applications. IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 2, pp. 223–235.

Hana Rabbouch received her Research Master's degree in Image Processing from the National School of Computer Science (ENSI) at the University of Manouba (Tunisia). She is currently enrolled at the Higher Institute of Management of Tunis (Tunis University), finishing her doctoral thesis in Business Informatics. Image processing with applications in management, transportation and medicine constitutes her principal area of research.

Foued Saâdaoui received his Ph.D. degree in Quantitative Methods from Sousse University (Tunisia). He is a member of the laboratory "Mathematical Physics, Special Functions and Applications" at the High School of Sciences and Technology of Hammam Sousse. Currently, he is Associate Professor at the College of Sciences and Arts of the Saudi Electronic University (Riyadh, Saudi Arabia). His main areas of research include computing, statistical modeling and data analysis.

Rafaa Mraihi is an Associate Professor in Transportation Sciences at the High School of Business (University of Manouba, Tunis). He received his PhD in Economic Sciences, with a specialization in the industrial economy of transport, from the University of Toulouse 1 and the University of Sfax. His research interests address sustainable transport problems and the modelling and planning of transport.

Figure 12: Results of assessing the number of vehicles and recognizing their categories on a road of Riyadh. Out of 106 vehicles, 103 have been detected and characterized, which is equivalent to a success rate exceeding 97%.

Figure 13: Frames from the degraded traffic video sequences. (a) Noised frames; (b) noised and blurred frames; (c) noised frames with a poor contrast. In the left-hand side, the sequence is corrupted by an additive Gaussian noise; in the right-hand side, the sequence is multiplicatively corrupted by a speckle noise. The center and bottom images correspond respectively to the blurred and the contrast-modified sequences.