6th IFAC Conference on Management and Control of Production and Logistics
The International Federation of Automatic Control
September 11-13, 2013. Fortaleza, Brazil
Using Background and Segmentation Algorithms Applied in Mobile Robots

Román Osorio*, Mario Peña*, Ismael López-Juárez**, Jesús Savage***, Gastón Lefranc****

* Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, México, D.F. [email protected]
** CINVESTAV, Grupo de Robótica y Manufactura Avanzada, Saltillo, Coah., México. [email protected]
*** Bio-Robotics Laboratory, School of Engineering, Universidad Nacional Autónoma de México. [email protected]
**** Escuela de Ingeniería Eléctrica, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile. [email protected]

Abstract: In this article a segmentation algorithm for detecting moving objects is presented. The aim of the research is to integrate the algorithm into applications such as car park video surveillance systems. One of the techniques used in this paper to detect motion in a sequence of images is the background model, which is widely used. The technique allows detecting which objects are moving (without identifying them), which is the first stage for further processing in tasks such as tracking and object recognition. Results from the segmentation algorithm under several parameter settings are presented that validate the approach.

Keywords: Artificial vision, image processing, video surveillance.
1. INTRODUCTION

Today, video surveillance systems are used for real-time supervision. The recognition of specific motion in a scene is a determinant factor for an intelligent surveillance system. These systems provide automated scene interpretation and are able to predict actions and interactions between observed subjects and objects. The stages of an intelligent surveillance system are moving object detection, object recognition, tracking, behavior analysis and retrieval.

Surveillance systems are in high demand, following an upward trend in domestic, military and commercial use. In major cities such as Mexico City and Santiago, several thousand surveillance cameras have been installed in public places and on public transport for security vigilance. These camera systems are monitored at the police station and, if a suspicious action or act of vandalism is detected, the police officers on duty are alerted. This is a very demanding task for humans, so an automated solution is preferable, and the integration of segmentation algorithms for detecting motion is very helpful for a comprehensive solution. The state of the art has been defined in video surveillance conferences (IEEE, 1999; IEEE, 2000a; IEE, 2003; IEE, 2004; IEEE, 2003; IJCV, 2000), in journals devoted to surveillance problems (IEEE, 2000b; IEEE, 2001; CVIU, 2001) and in work on human motion analysis (Cheung and Kamath, 2004).

The main objective of this work is to detect the background and foreground image information using segmentation algorithms and the background model representation (Cervera, 2010; Sonsoles, 2009; McKoen et al., 2000). The result of these models is the segmentation of the image features that remain static. Once the background model has been identified, the foreground model can be obtained (Elgammal et al., 2002); it is the image information that does not match the background model or, in other words, the dynamic part of the image. Several algorithms for obtaining the foreground model have been reported in the literature (Lefranc et al., 2000).

The paper has three stages: preprocessing, which includes image segmentation; feature vector extraction; and classification and recognition of the objects present in the image scene. Through the segmentation of an image it is possible to separate the different objects in a scene, using techniques such as the image histogram or edge detection. Edge detection in 3D can be used so that edges are detected as discontinuities in an image section (chromaticity values, texture values, etc.); this 3D edge detection is a key issue in characterizing the image. Feature extraction transforms the image into one or several images, allowing a transformation in intensity levels so that edge detection techniques based on the covariance model can be applied later. A well-defined edge also defines an adequate model; if the edges are imperfect, a discrete model can be obtained (Cano and Lefranc, 2002).

A segmentation technique integrating edge and region detection has been developed that can be applied to low-level images. A transformation forms image force fields, where each force vector represents the similarity of a point to the rest of the image. The interaction force among pixels depends on distance and contrast, which allows the similarity calculus (and the segmentation) to be associated with photometry and spatial scale. The extraction of the feature vector from the segmented object permits object recognition and classification (Lefranc, 2002).
In previous work, a simple way to describe different objects in motion using image sequences was presented. Segmentation is an important part of locating the object within the image. If the object moves fast, the captured scenes contain blurred edges; to improve the technique, a convolution is employed, using the Fourier transform and defining the object in its real position.

A video image sequence can be described as a discrete representation of an object moving in time within a defined space (motion space), or a vector motion space defined as a group of motion vectors used to identify the motion patterns of an object in an image sequence. This definition suggests the existence of a relationship between scene sequence pixels, known as motion fields, that can be modeled in several ways (Lefranc, 2002).

One of these models is the feature model, which extracts relevant features such as edges, occluded segments, etc. Under the assumption that the objects are rigid, nonlinear equations have to be solved. The computational cost depends on the required resolution, and good results are obtained from the dynamic analysis of the scene when a structure of scenes is used.

The optical flow model is based on the instantaneous change at a specific point with regard to the motion pattern of the same point along the image sequence. This model has problems related to noise and to the presence of independently moving objects, which sometimes results in bad segmentation and poor motion estimation.

To solve these problems, a contour-based method named extrapolation and subtraction can be used. This technique works in the flow field space, improving the precision of the measurement of these fields by using noise extrapolation and a subtraction vector that allows the difference between moving objects to be identified.

When a robot manipulator tries to grasp a moving object, a visual feedback path is needed: the information on the object location and motion is used to command the robot motion. This scheme, called visual servoing, works with image sequences. A complete system uses a sequence of stereo images to determine the location of an object, defining a new path composed of a set of points in space (Lefranc, 2002).

In this paper, the use of a segmentation algorithm to detect moving objects is presented, and the system is integrated into a car park surveillance system. The results of segmenting image sequences using the background model are presented; these results are the basis for further processing such as tracking and object recognition.

To improve the segmentation results during object motion, two independent methods are used: the first is based on differences using the background model, and the second is based on differences using the foreground model.

2. SYSTEM DESIGN

The implemented algorithms are based on the background model. With this algorithm it is possible to detect moving objects efficiently and in real time, and it is robust in distinguishing whether a certain pixel changes or not in the subsequent frames of the sequence (Cervera, 2010).

The diagram in Figure 1 illustrates the method. The input to the system is the image sequence, which is preprocessed using a mean filter; then two methods are employed, one using the background model by differences in conjunction with the foreground detection, and the other using a Mixture of Gaussians (MoG), described later, to detect the image background and foreground.

[Fig. 1 here: Image Sequence -> Preprocessing -> {Background modeling and foreground detection using differences; Background modeling and foreground detection using Mixture of Gaussians} -> Post-processing -> Foreground Mask]

Fig. 1. Block diagram for the system.

The design is implemented considering gray scale images and color images (RGB); for the RGB model there are three independent channels.

2.1. Preprocessing

With the purpose of reducing its size, and with it the computational load, the image is scaled down. The image is also filtered to reduce noise, using a mean value filter that consists in obtaining the mean value of each of the regions into which the image It(x, y) can be divided (IEEE, 2000b); the region size is the same as the filter size. The filtered image Gt(x, y) is obtained from a convolution between the [m x n] image It(x, y) and the [p x q] mean value filter Ht(x, y):

G_t(x, y) = I_t(x, y) * H_t(x, y) = \sum_{u=0}^{m-1} \sum_{v=0}^{n-1} I_t(u, v) \, H_t(x - u, y - v)    (1)

The image is scaled by factors (s_x, s_y) whose values are inversely proportional to the size of the filter, that is:

(s_x, s_y) = (1/p, 1/q)    (2)

The scaled image Et(x, y),

E_t(x, y) = \mathrm{scaling}(G_t(x, y))    (3)

has size [m/p x n/q], which is clearly equal to or smaller than that of the filtered image Gt(x, y).
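To make the preprocessing stage concrete, the following minimal sketch (in Python with OpenCV; it is not part of the paper) applies the mean value filter of Eq. (1) and the scaling of Eqs. (2)-(3). The function name and the default 4x4 filter size are illustrative assumptions.

    import cv2
    import numpy as np

    def preprocess(frame: np.ndarray, p: int = 4, q: int = 4) -> np.ndarray:
        """Mean-filter the frame with a [p x q] kernel (Eq. 1), then scale it
        down by factors (1/p, 1/q) (Eqs. 2-3)."""
        # Convolution with a normalized box kernel: the mean value filter H_t.
        filtered = cv2.blur(frame, (p, q))
        # Keep one pixel per filter-sized region.
        h, w = frame.shape[:2]
        return cv2.resize(filtered, (w // p, h // q), interpolation=cv2.INTER_NEAREST)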
2.2. Background model

The simplest method, with low computational cost, to determine the background model Bd(x, y) is by differences. To obtain the model, a first image is taken from the sequence (B_{d,1}(x, y) = E_1(x, y)) and, from that instant on, the previous image is taken as the background model:

B_{d,t}(x, y) = E_{t-1}(x, y)    (4)
2.3. Foreground model

Foreground detection by image differences is selected because it is the simplest way, with low computational cost, to find the foreground mask Fd(x, y). The foreground is obtained by comparing the difference between the background image and the current image E(x, y) with a threshold T: if the difference at a given pixel is higher than the threshold, that pixel is considered a foreground pixel; otherwise it is considered a background pixel:

F_d(x, y) = \begin{cases} 0 & \text{if } |E(x, y) - B_d(x, y)| \le T \\ 1 & \text{if } |E(x, y) - B_d(x, y)| > T \end{cases}    (5)
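As an illustration of Eqs. (4) and (5), a minimal sketch of the differences method is given below (Python with NumPy; not from the paper, and the absolute difference follows the reconstruction of Eq. (5) above).

    import numpy as np

    def foreground_by_differences(prev_frame: np.ndarray,
                                  curr_frame: np.ndarray,
                                  threshold: float = 4.0) -> np.ndarray:
        """Differences method: the previous frame is the background model (Eq. 4);
        a pixel is foreground when its difference exceeds the threshold (Eq. 5)."""
        background = prev_frame.astype(np.int16)      # B_d,t(x, y) = E_{t-1}(x, y)
        diff = np.abs(curr_frame.astype(np.int16) - background)
        return (diff > threshold).astype(np.uint8)    # F_d(x, y) in {0, 1}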
Background modeling and foreground detection by differences does not work with multimodal backgrounds; however, it helps to reduce the sensitivity of the Mixture of Gaussians.

2.4. Background modeling and foreground detection by Mixture of Gaussians (MoG)

Based on these results, the Mixture of Gaussians (MoG) is selected as the segmentation model. This method has important advantages, such as its precision and its ability to model multimodal backgrounds. It also has disadvantages, such as a high computational cost, although this is lower than that of non-parametric models and can be mitigated by using only the necessary k distributions.

In the particular case of unimodal backgrounds, only one distribution is used, so the method becomes a simple Gaussian method with a reduced computational cost. In this algorithm, each background image pixel p_t at time t is represented by k Gaussian distributions, whose combination results in the probability function F(p_t):

F(p_t) = \sum_{i=1}^{k} w_{i,t} \, \eta(p_t, \mu_{i,t}, \sigma_{i,t})    (6)

\sum_{i=1}^{k} w_{i,t} = 1    (7)

where w_{i,t} is the weight of the i-th Gaussian component at time t, and \eta(p_t, \mu_{i,t}, \sigma_{i,t}) is the i-th Gaussian component for pixel p_t, with mean \mu_{i,t} and standard deviation \sigma_{i,t}, given by:

\eta(p_t, \mu_{i,t}, \sigma_{i,t}) = \frac{1}{\sqrt{2\pi} \, \sigma_{i,t}} \exp\left( -\frac{(p_t - \mu_{i,t})^2}{2\sigma_{i,t}^2} \right)    (8)

The Gaussian components are the weighted pixel average values taken from the image sequence. The background and foreground extraction using the MoG involves parameter initialization, the background model, parameter update and foreground detection.
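The paper does not list the initialization and update equations of the MoG, so the sketch below (Python with NumPy; an assumption, not the authors' implementation) fills them in with the usual online approximation, and it simplifies by treating all k distributions as background, declaring foreground any pixel matched by none of them.

    import numpy as np

    class MixtureOfGaussians:
        """Per-pixel k-Gaussian background model in the spirit of Eqs. (6)-(8).
        shape is the (height, width) of the preprocessed frames."""

        def __init__(self, shape, k=3, alpha=0.001, sigma0=2.0, coef=1.5):
            self.alpha, self.coef = alpha, coef         # change rate, std dev coefficient
            self.w = np.full(shape + (k,), 1.0 / k)     # weights w_{i,t}, summing to 1 (Eq. 7)
            self.mu = np.zeros(shape + (k,))            # means mu_{i,t}
            self.sigma = np.full(shape + (k,), sigma0)  # standard deviations sigma_{i,t}

        def apply(self, frame):
            p = frame.astype(np.float64)[..., None]     # pixel p_t against each Gaussian
            # A Gaussian matches when the pixel lies within coef * sigma of its mean.
            match = np.abs(p - self.mu) <= self.coef * self.sigma
            # Online update of weights, means and variances with rate alpha.
            self.w = (1.0 - self.alpha) * self.w + self.alpha * match
            self.w /= self.w.sum(axis=-1, keepdims=True)   # keep Eq. (7) satisfied
            rho = self.alpha * match
            self.mu += rho * (p - self.mu)
            self.sigma = np.sqrt(self.sigma ** 2 + rho * ((p - self.mu) ** 2 - self.sigma ** 2))
            # Foreground mask: pixels matched by none of the k Gaussians.
            return (~match.any(axis=-1)).astype(np.uint8)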
2.5. Post-processing

During post-processing, the result of the foreground detection by image differences, F_{d,t}(x, y), and the result of the MoG, F_{MoG,t}(x, y), are combined with the AND operator, as illustrated in Table 1.

Table 1. AND operation between the MoG and the foreground detection by differences

F_{d,t}(x, y)   F_{MoG,t}(x, y)   F_{and,t}(x, y)
0               0                 0
0               1                 0
1               0                 0
1               1                 1

where F_{and,t}(x, y) is the foreground resulting from the AND operation. As can be observed from the table, a pixel is foreground only if both methods classify it as foreground. Using the MoG provides a higher tolerance to the noise introduced by the camera.

To improve the foreground obtained from this AND operation, a dilation is applied to the result in order to have well-defined edges. With this operation small holes are also filled; that is, we have the following mapping:

F_t(x, y) = \mathrm{dilation}(F_{and,t}(x, y))    (9)

where F_t(x, y) is the foreground, that is, the detected moving objects. In the case of color (RGB) images there are three results, and a logical OR operation can be applied to obtain the general result.
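A possible rendering of this post-processing step, assuming binary {0, 1} masks and an illustrative 3x3 structuring element with a single dilation pass (the paper does not specify these choices):

    import cv2
    import numpy as np

    def postprocess(f_diff: np.ndarray, f_mog: np.ndarray) -> np.ndarray:
        """AND the two foreground masks (Table 1), then dilate (Eq. 9)."""
        f_and = cv2.bitwise_and(f_diff, f_mog)           # foreground only if both methods agree
        kernel = np.ones((3, 3), np.uint8)               # illustrative structuring element
        return cv2.dilate(f_and, kernel, iterations=1)   # fill small holes, sharpen edges

    # For RGB images the three per-channel results are combined with OR:
    # f_rgb = cv2.bitwise_or(cv2.bitwise_or(f_r, f_g), f_b)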
3. TEST RESULTS

To carry out the tests, subjective and objective criteria are combined, taking several samples and adjusting the parameters. The test parameters are as follows:

Color Flag (CF). Indicates whether a gray scale or a color image is going to be processed.
Filter Size (FS). Indicates the size of the mean filter.
Threshold (T). Indicates the allowed difference between the image and the background model for a pixel to be considered foreground.
k distributions (k). Indicates the number of Gaussians to be used for processing.
Standard Deviation Coefficient. Provides the estimated value for the standard deviation to be used for the corresponding distribution.
Change Rate Factor. Indicates the rate of change of the mean and the variance.
Initial Standard Deviation. Indicates the initial value for the standard deviation.

Tests were performed varying only one parameter at a time, with the purpose of observing the consequences for the segmentation; a complete sweep over all values is not applied, since some parameters are known in advance. The method is tested first in a unimodal and later in a multimodal scenario. A sketch grouping these parameters into a configuration object is shown below.
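As a sketch only, the test parameters can be grouped into one configuration object; the field names are descriptive stand-ins for the parameters above, and the defaults follow the values used in Tables 2-4.

    from dataclasses import dataclass

    @dataclass
    class TestParameters:
        color_flag: bool = False        # CF: color (True) or gray scale (False)
        filter_size: tuple = (4, 4)     # FS: mean filter size
        threshold: float = 4.0          # T: allowed image/background difference
        k: int = 1                      # number of Gaussian distributions
        std_dev_coef: float = 1.5       # standard deviation coefficient
        change_rate: float = 0.001      # rate of change of mean and variance
        initial_std_dev: float = 2.0    # initial standard deviation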
The processing time is defined as the time elapsed to complete a module plus the time to find the contour and its approximation by a polygon. For the tests, a PC with an Intel(R) Pentium(R) 4 CPU at 2.8 GHz and 1.0 GB of RAM was used, running Windows XP Professional. The image information was obtained from a Logitech HD Webcam C510 at 640x480 resolution and 30 frames/s.

3.1. Unimodal Scenario Tests

Integrating the results from the different tests, the parameters are selected considering the dependence between parameters, in such a way as to obtain the best segmentation. To test the parameters, the scene is changed to the worst case, showing shadows, glare and a colored background with similar moving objects.

The results for this worst scenario are shown in Table 2. As can be observed, the color flag (CF) is both reset (0) and set (1).

Table 2. Final test for a unimodal scenario with a 4x4 filter

CF   FS    T   k   Std. dev. coef.   Change rate   Initial std. dev.   Average initial processing time (ms)
0    4x4   4   1   1.5               0.001         2                   15
1    4x4   4   1   1.5               0.001         2                   22

Fig. 2. Results from the final test in the unimodal scenario using a 4x4 filter.

In figure 2, the results for the unimodal scenario using a 4x4 filter are shown. No noise is observed, and the object shape is preserved both in gray scale and in color images, even without contour approximation by a polygon.
3.2. Multimodal Scenario Tests

In real life, scenarios are of the multimodal type, which is a more comprehensive situation. A multimodal scenario test is therefore prepared, varying the number of distributions in order to test the method under these conditions. The test is carried out by adding flashing lights to the scene in order to observe whether the algorithm can eliminate them. This can be observed in figures 3 and 4, and the corresponding results are given in Tables 3 and 4.

Table 3. Test varying k for gray scale images using a 4x4 filter

CF   FS    T   k   Std. dev. coef.   Change rate   Initial std. dev.   Average initial processing time (ms)
0    4x4   4   1   1.5               0.001         2                   16
0    4x4   4   2   1.5               0.001         2                   23
0    4x4   4   3   1.5               0.001         2                   31

Fig. 3. Image results varying k for gray scale images using a 4x4 filter.

Table 4. Test varying k for color images using a 4x4 filter

CF   FS    T   k   Std. dev. coef.   Change rate   Initial std. dev.   Average initial processing time (ms)
1    4x4   4   1   1.5               0.001         2                   24
1    4x4   4   2   1.5               0.001         2                   45
1    4x4   4   3   1.5               0.001         2                   61

Fig. 4. Image results varying k for color images using a 4x4 filter; with k = 3, the motion produced by the flashing lights cannot be detected.
4. DISCUSSION

To determine the parameter values that give better motion detection and reduce the noise, several tests were made. It is observed that there are background and object features that affect the detection, such as:

1. Interaction. The overlapping between objects, and their different speeds, makes the detection more difficult.
2. Mode. Multimodal scenarios increase the complexity of detecting moving objects.
3. Rigidity. A rigid object is easier to detect.
4. Size. A small object, or an object far away from the camera, can be confused with noise or missed by the camera; on the other hand, larger objects, or objects closer to the camera, that are occluded in the scene can also be imperceptible.
5. Texture. It is easy to identify an object if its texture is very different from the background.
6. Speed. A slow-moving object can be confused with instability or noise from the camera; on the other hand, a fast-moving object can be difficult to detect.

In addition, the following issues are found. If the filter size increases, the background threshold and the standard deviation decrease, and vice versa. In general, color image sequences give better results than gray scale ones, but their computational cost is higher. The multimodal scenario requires a higher computational load; however, the segmentation is adequate in both unimodal and multimodal scenarios. It was possible to diminish the noise by applying some of the following strategies:

a) Camouflage implies lowering the standard deviation value, and varying lighting implies the use of the filter.
b) Shadows and glare in the scene require the use of the filter.
c) To further reduce the noise, the threshold of the background model has to be adjusted (up or down).
d) If multimodal backgrounds are detected, the number of k distributions has to be increased.
e) Since real time is required for the application, gray scale image sequences are preferred; the results from color images are very similar but increase the computational cost.

5. CONCLUSIONS

In this work, a segmentation algorithm to detect moving objects, integrated within a car park surveillance system, has been presented. To detect the moving objects in a sequence of images, a background model was used. The algorithm allows the user to know whether objects are moving (without recognition) by detecting the motion. A further stage that includes an object recognition algorithm has been envisaged.

Several tests were carried out to determine the best parameter values for the model, in order to obtain the best detection with the lowest noise; the noise was mainly due to the segmentation process. It is observed that if the filter size increases then the background threshold does too, and vice versa; the same effect occurs with the standard deviation coefficient. The segmentation is adequate in unimodal and multimodal scenarios, with an increased computational cost in the latter. The results show that color images are in general better than gray scale ones; however, the higher computational cost implied by working with these images makes gray scale preferable, especially if the system is intended to work in real-time situations.

REFERENCES

Cano, F. and Lefranc, G. (2002). Sistema servoing de manipuladores pick and place. X Congreso Latinoamericano de Control Automático, Guadalajara, México.
Cervera, B. P. (2010). Integración de información de movimiento en la segmentación de secuencias de vídeo basada en el modelado de fondo. Universidad Autónoma de Madrid.
Cheung, S.-C. and Kamath, C. (2004). Robust techniques for background subtraction in urban traffic video. IS&T/SPIE Symposium on Electronic Imaging.
CVIU (2001). Special issue on human motion analysis. Computer Vision and Image Understanding, March 2001.
Elgammal, A., Duraiswami, R., Harwood, D. and Davis, L. S. (2002). Background and foreground modeling using non-parametric kernel density estimation for visual surveillance. Proceedings of the IEEE.
IEE (2003). First IEE Workshop on Intelligent Distributed Surveillance Systems, February 2003, London. ISSN 0963-3308.
IEE (2004). Second IEE Workshop on Intelligent Distributed Surveillance Systems, February 2004, London. ISBN 0-86341-392-7.
IEEE (1998). First IEEE Workshop on Visual Surveillance, January 1998, Bombay, India. ISBN 0-8186-8320-1.
IEEE (1999). Second IEEE Workshop on Visual Surveillance, January 1999, Fort Collins, Colorado. ISBN 0-7695-0037-4.
IEEE (2000a). Third IEEE International Workshop on Visual Surveillance (VS'2000), July 2000, Dublin, Ireland. ISBN 0-7695-0698-4.
IEEE (2000b). Special issue on visual surveillance. IEEE Transactions on Pattern Analysis and Machine Intelligence, August 2000. ISSN 0162-8828.
IEEE (2001). Special issue on third generation surveillance systems. Proceedings of the IEEE, October 2001.
IEEE (2003). IEEE Conference on Advanced Video and Signal Based Surveillance, July 2003. ISBN 0-7695-1971-7.
IJCV (2000). Special issue on visual surveillance. International Journal of Computer Vision, June 2000.
Lefranc, G. (2002). Visual servoing systems: a tutorial. IEEE International Symposium on Robotics and Automation, ISRA'2002, Toluca, México.
Lefranc, G., González, N. and Power, C. (2000). Detection of the movement of an object from a sequence of images. Proceedings of the IFAC/IFIP/IEEE Conference on Management and Control of Production and Logistics, Grenoble, France. Elsevier.
McKoen, K., Navarro-Prieto, R., Duc, B., Durucan, E., Ziliani, F. and Ebrahimi, T. (2000). Evaluation of video segmentation methods for surveillance applications. Proc. EUSIPCO, vol. II, pp. 1045-1048.
Sonsoles, H. M. (2009). Análisis comparativo de técnicas de segmentación de secuencias de video basadas en el modelado de fondo. Universidad Autónoma de Madrid.