Proceedings of the 18th World Congress, The International Federation of Automatic Control, Milano (Italy), August 28 - September 2, 2011
Real-Time Wide-Field Monitoring System Based on Image Mosaicing Technique and Its Application

Li-hui Zou*,**, Jie Chen*,**, Juan Zhang*,**, Jing-hua Lu*,**

* School of Automation, Beijing Institute of Technology, Beijing, 100081 China
** Key Laboratory of Complex System Intelligent Control and Decision (Beijing Institute of Technology), Ministry of Education, Beijing, 100081 China (e-mail: {zoulihui, chenjie, [email protected]})

Abstract: A common surveillance system is usually limited by its scope of observation, which may increase the probability of missing or misjudging suspicious targets. A wide-field monitoring system based on an efficient image mosaicing algorithm is proposed in this paper. It can monitor a wide field at both short and long distances in real time and highlight the detected moving objects simultaneously. The image mosaicing is not only aimed at obtaining the wide field of view, but is also utilized in motion detection to form a wide-field motion mask. A simulation platform is developed to verify the performance of the system. The experimental results show that it satisfies the demands of high resolution and real-time implementation. The system is of high practical value to transportation surveillance and other applications.

Keywords: safety analysis; image processing; transportation.

1. INTRODUCTION

With the development of computer vision, video surveillance has become increasingly important in modern society (Haering and Shafique 2007). It is closely related to our daily life and can be seen in many places, such as train and subway stations, airports, parking lots and so on. Research on video surveillance has been carried out widely: (Robert 2009) dealt with feature detection and tracking in traffic monitoring, (Vu et al. 2006) handled event recognition, and (Valera and Velastin 2005) addressed distributed video surveillance. Many aspects of video surveillance research are growing into maturity. Nevertheless, in many surveillance systems the limited monitoring area of each camera is still a problem. Undesirable situations, such as missing accident spots or losing suspicious targets, happen quite often, even in multi-camera distributed surveillance systems, because of partial visual range or blind angles. Therefore, developing a surveillance system with a wide field of view (FOV) is beneficial to system security. It can reduce the error or miss rate for suspicious targets, and it decreases construction and maintenance costs, since the required density of monitoring cameras is reduced. Basically, there are many requirements for a wide-field surveillance system. The most fundamental ones are:
- the ability of the monitoring system to provide adequate resolution for overall situational awareness of moving objects;
- the ability to run as a real-time implementation.
Common solutions for obtaining a wide FOV can be classified into three main categories: capturing with omni-directional reflective imaging devices (Winters et al. 2000), scanning with a 1D linear camera which has only a 1D-array CCD sensor (Zheng 2003), and stitching overlapped images shot by multiple cameras or a rotating camera (McLauchlan and Jaenicke 2002). The first solution, which can generate a panoramic annular image of the whole surroundings at once, is usually used on robots for indoor surveillance; its image resolution is very poor owing to the technical difficulties in manufacturing large-size CCDs. The second solution can create a large-FOV image with high resolution efficiently, as long as there is only 1D relative motion between the scene and the camera. If the camera scans the scene with 2D or more complex motion, the image is seriously distorted. The third solution is suitable for more general cases. The adequate-resolution requirement, which is associated with the discrimination capacity of pixels in the image, can be satisfied by choosing a proper focal length on the basis of Johnson's criteria for image-forming systems (Johnson 1958), but real-time image mosaicing in this solution is a great challenge. In this paper, a real-time wide-field monitoring system based on image mosaicing technique is proposed for both short-distance surveillance (e.g. train and subway stations, traffic environments) and long-distance surveillance (e.g. airports, ferry services). Many cooperative modules are explored in this system to relieve the working pressure on observers. The wide FOV is gained by an efficient image mosaicing algorithm. An advantage over other systems is that moving objects are detected simultaneously during the generation of the mosaic image, and a motion mask of the wide FOV, which indicates the motion magnitude of objects, is formed by the image mosaicing algorithm as well. A simulation platform is designed, and a series of experiments, including experiments on transport environment surveillance, are carried out to verify the functions and the efficiency of the system.
2. THE PROPOSED SCHEME FOR WIDE-FIELD SURVEILLANCE

Serving both short- and long-distance security surveillance, the solution adopted for obtaining a wide FOV falls into the third category: stitching overlapped images shot by a rotating camera, rather than by multiple cameras. Usually, the FOV depends on the focal length and on the discrimination capacity of pixels in the image. Under the same discrimination capacity, the working distance of a surveillance system for detection and recognition is proportional to the focal length: the longer the focal length, the farther the working distance. However, the FOV is inversely proportional to the focal length; that is, the FOV becomes smaller as the focal length increases. Thus, the FOV is rather small if the system works at a finer discrimination capacity for long-distance surveillance. For instance, if the working distance is up to 2 km and the discrimination capacity reaches 0.1 m/pixel, the horizontal viewing angle is only 4~5° even with a high-resolution camera, so it is impractical to cover a wide field by installing many such cameras.
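To make this trade-off concrete, the following sketch (our illustration, not part of the original system; the pixel pitch and sensor width are assumed values) estimates the horizontal viewing angle and the discrimination capacity from the focal length with a simple pinhole-camera model. With a 10 µm pixel pitch and a 200 mm lens it reproduces the figures quoted above, about 0.1 m/pixel at 2 km with a 4~5° horizontal angle.

```python
import math

def fov_and_discrimination(focal_mm, pixel_pitch_um, h_pixels, distance_m):
    """Pinhole-model estimate of horizontal viewing angle (degrees) and
    discrimination capacity (metres per pixel) at a working distance."""
    sensor_w_mm = h_pixels * pixel_pitch_um / 1000.0
    fov_deg = 2.0 * math.degrees(math.atan(sensor_w_mm / (2.0 * focal_mm)))
    gsd_m = distance_m * (pixel_pitch_um * 1e-6) / (focal_mm * 1e-3)
    return fov_deg, gsd_m

# assumed sensor: 1600 px wide with 10 um pitch; 200 mm lens at 2 km
fov, gsd = fov_and_discrimination(200.0, 10.0, 1600, 2000.0)
print(f"horizontal angle = {fov:.1f} deg, discrimination = {gsd:.2f} m/pixel")
# -> horizontal angle = 4.6 deg, discrimination = 0.10 m/pixel
```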
Besides these two essential demands of a wide-field surveillance system, it is also required to detect moving objects automatically in the wide FOV to assist observers in their work. Aiming at these objectives, a surveillance scheme for a wide field is proposed, as shown in Fig. 1.

Fig. 1. A simulated diagram of the monitoring system for wide-field surveillance

The working mechanism of the system can be described as follows. Both the panorama camera and the tracking camera are mounted on horizontal pan units and set to the same zero position. First, the control signals for the panorama camera and its pan unit are sent by the processing console. The pan unit is controlled to rotate at a constant speed, and the panorama camera is triggered to capture a scanning video of the scene synchronously. The gathered frames are then gradually stitched into a wide-field, high-resolution image by the image mosaicing algorithm. In the meantime, the detection of moving objects is conducted, and the wide-field motion mask used for emphasizing moving objects is synthesized by the same image mosaicing method. The large mosaic image marked with moving objects is then displayed on multiple screens via a multi-display controller without any squeezing. The observer watches the screens, decides which moving object is more suspicious, and locks it via human-computer interaction. After that, the system processor integrates the location of the acquired target in the mosaic image with the feedback angle information from the pan unit to calculate the target angle relative to the zero position. Finally, this angle information is provided to the tracking pan unit, and the tracking camera starts searching for and tracking the moving target in this direction.

In brief, the whole system can be summarized into the following modules: pan unit control module, image acquisition module, image mosaicing module, motion detection module, multiple displays and human-computer interaction module, object locating module, and target tracking module. According to the working mechanism, the relationships among these modules are illustrated in Fig. 2. The modules in bold are the ones we mainly focus on in this paper.

Fig. 2. The relationship diagram among the modules (processing console; pan unit & controller; image acquisition; image mosaicing; motion detection; object locating; multi-screen display & human-computer interaction; target tracking)

3. KEY TECHNIQUES IN MODULES

3.1 Image Mosaicing Module
Image mosaicing technique can solve the limitation problem of the FOV. There are two main research mainstreams in this area. One is represented by Szeliski R., who proposed a classical motion model for image mosaicing (Shum and Szeliski 2000). The other is represented by Peleg S., who proposed a manifold-based model for image mosaicing (Peleg et al. 2000). Owing to its efficiency, the image mosaicing algorithm (Zou et al. 2010a) presented in this module belongs to the latter. The spirit of manifold-based mosaicing is inspired by 1D linear camera scanning, which is similar to the second solution for obtaining a wide FOV. However, the 'scanning line', which is referred to as a strip, is determined by the relative motion, and the deformation is greatly reduced by taking advantage of the overlapping information between neighbouring frames when the motion is more than 1D. The algorithm used in this module is realized by the following steps.

Step 1: Global motion estimation based on the coarse-to-fine iterative Lucas-Kanade (LK) method

Since little calibration information is known from external devices, we work only on the premise that the major motion between neighbouring frames can be taken as a translation, owing to dense image capturing. For this reason, we compute the motion parameters and align the video sequence first. The LK method is a widely used iterative method for computing the motion between two frames (Lucas and Kanade 1981). A brief description of the LK method is as follows.
Assuming that the motion between two images I_1 and I_2 is a pure translation (u, v), it can be computed by minimizing the error

Err(u, v) = Σ_{(x,y)∈R} ( I_2(x+u, y+v) − I_1(x, y) )²    (1)

where R is the overlapping region between I_1 and I_2. This problem can be converted into solving a set of equations A[u, v]^T = b. Given an initial motion estimate (u, v), the LK method yields a type of Newton-Raphson iteration during the minimization.

In order to handle large displacements, each image of size M×M is decomposed into a Gaussian pyramid, in which level l is computed from level l−1 as

I_l(i, j) = Σ_{m=−p..p} Σ_{n=−p..p} ω(m, n) · I_{l−1}(2i+m, 2j+n)    (4)

where 0 ≤ i ≤ M/2^l, 0 ≤ j ≤ M/2^l, 0 ≤ l ≤ K, K is the number of layers, and the weighting coefficient ω(m, n) is a Gaussian kernel of size (2p+1)×(2p+1) satisfying the constraint

Σ_{m,n=−p..p} ω(m, n) = 1    (5)

After decomposing the image into a Gaussian pyramid, the motion estimation turns into a coarse-to-fine solving procedure. The estimation result of a higher level is utilized to initialize the iteration at the level below it, the initial estimate becoming [u, v]_{l−1} = 2[u, v]_l. The process ends at the lowest image level, once the result converges within a permitted residual error or the maximum number of iterations is exceeded.
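As a concrete illustration of Step 1 (and of the corner-based simplification described in Section 4), the sketch below uses OpenCV's pyramidal iterative LK tracker: Harris corners are extracted, tracked coarse-to-fine, and their motion vectors are averaged into a single global translation. This is our reconstruction under stated assumptions, not the authors' exact implementation; all parameter values are illustrative.

```python
import cv2
import numpy as np

def global_translation(prev_gray, curr_gray, levels=3):
    """Global motion (u, v) between two frames: track Harris corners with
    the pyramidal (coarse-to-fine) iterative LK method and average their
    motion vectors, as in the corner-based simplification of Section 4."""
    corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=10,
                                      useHarrisDetector=True)
    if corners is None:
        return 0.0, 0.0
    moved, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, corners, None, winSize=(21, 21),
        maxLevel=levels,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    good = status.ravel() == 1
    if not good.any():
        return 0.0, 0.0
    flow = (moved[good] - corners[good]).reshape(-1, 2)
    u, v = flow.mean(axis=0)   # global optical flow vector
    return float(u), float(v)
```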
Step 2: Eliminating the brightness difference

Brightness discontinuity between adjacent images often occurs during image acquisition and causes step changes in the mosaic image. In order to smooth the intensities of the mosaics, a rule judging brightness discontinuity is introduced:

if ∆_t ≥ T, then I_ref = I_t    (6)

∆_t = mean_{(x,y)∈I_t∩I_{t−1}} ( I_t(x_{t−1}+u, y_{t−1}+v) − I_{t−1}(x_{t−1}, y_{t−1}) )    (7)

where ∆_t is the average intensity difference over the overlapped region between the current frame I_t and the previous frame I_{t−1}. If ∆_t exceeds a threshold T, I_t is taken as a brightness reference image I_ref, and the intensity difference should be compensated (Zou et al. 2010b): all the previous or all the following frames of I_t (depending on the position of I_t in the whole sequence: if t < ref, the previous ones; otherwise, the following ones) are compensated towards the reference.
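The brightness rule of Eqs. (6)-(7) can be sketched as follows (illustrative only; the threshold value, the use of the absolute difference, and the integer rounding of the translation are our assumptions).

```python
import numpy as np

def brightness_jump(prev, curr, u, v, T=10.0):
    """Eq. (7): mean intensity difference over the overlapping region of
    two translation-aligned frames; Eq. (6): if it exceeds T, the current
    frame becomes the new brightness reference I_ref."""
    du, dv = int(round(u)), int(round(v))
    h, w = prev.shape
    # prev pixel (x, y) corresponds to curr pixel (x + du, y + dv)
    x0p, y0p = max(0, -du), max(0, -dv)
    x0c, y0c = max(0, du), max(0, dv)
    ow, oh = w - abs(du), h - abs(dv)
    p = prev[y0p:y0p + oh, x0p:x0p + ow].astype(np.float32)
    c = curr[y0c:y0c + oh, x0c:x0c + ow].astype(np.float32)
    delta = float((c - p).mean())
    return delta, abs(delta) >= T
```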
Step 3: Selecting and pasting strips

The relation between corresponding points in neighbouring frames is modelled by an affine optical flow:

u = a + b·x_{t−1} + c·y_{t−1}    (8)
v = d + e·x_{t−1} + f·y_{t−1}    (9)

where (x_{t−1}, y_{t−1}) and (x_t, y_t) are corresponding points in image I_{t−1} and image I_t, (u, v) is the optical flow vector as a function of position, and (a, b, c, d, e, f) are the parameters of the affine transformation A. Actually, the optical flow can be interpreted as the motion at every pixel position between two neighbouring frames taken at times t and t + δt. Because of the high overlap ratio among the original frames, the motion estimation result above can be taken as the global optical flow vector. We neglect the local optical flow caused by moving objects or surface changes, and select the cut lines of strips to be orthogonal to the global optical flow vector only. This simplification may cause some deformation of moving objects in the mosaic image. For example, if a moving object moves in the same direction as the camera rotation, it is elongated, since it is scanned by the camera more than once; if it moves in the opposite direction, against the camera rotation, it is shortened, since parts of it are missed owing to the relative movement. Nevertheless, this does not affect the overall impression of the final display, which is guaranteed by the high resolution.

Therefore, a cut line F(x, y) = 0 orthogonal to the global optical flow (u, v) can be computed via its normal, whose direction (∂F/∂x, ∂F/∂y) is the same as that of (u, v), i.e.

(∂F/∂x) / (∂F/∂y) = u / v = (a + bx + cy) / (d + ex + fy) = k    (10)

The solution of F(x, y) = 0 is a family of lines; the line containing the maximum number of pixels in the centre of the image, where the distortion is minimum, is chosen. A brief description of forming the mosaics is shown in Fig. 3. Given the affine transformations A_t and A_{t+1}, two cut lines, F_t(x_t, y_t) = 0 and F_{t+1}(x_{t+1}, y_{t+1}) = 0, are selected and taken as the left sides of strips S_t and S_{t+1} respectively. They also determine the right sides, F'_t(x_{t−1}, y_{t−1}) = 0 and F'_{t+1}(x_t, y_t) = 0, of the previous strips. For example, the right side of strip S_{t−1}, F'_t(x_{t−1}, y_{t−1}) = 0, is computed by projecting the left side of strip S_t, F_t(x_t, y_t) = 0, onto image I_{t−1} using the transformation A_t, which makes the right side of strip S_{t−1} have the same properties as the left side of strip S_t in image I_t. The same correspondence between F'_{t+1}(x_t, y_t) = 0 and F_{t+1}(x_{t+1}, y_{t+1}) = 0 is associated with the transformation A_{t+1}, and so on. Pasting all the strips from the video sequence together generates the mosaics of the wide scene.

Fig. 3. Schematic diagram of cutting and pasting strips onto the manifold-based mosaics
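Below is a heavily simplified sketch of strip cutting and pasting for the pure-translation, horizontal-panning case, in which the cut line of Eq. (10) degenerates to a vertical line through the image centre; the frame list, shift convention and strip widths are our assumptions. In the general affine case each strip would be bounded by the cut lines F_t = 0 and F'_{t+1} = 0 and warped by A_t before pasting.

```python
import numpy as np

def paste_strips(frames, shifts):
    """Simplified manifold mosaicing for a horizontally panning camera:
    under pure horizontal translation the cut line of Eq. (10) becomes a
    vertical line through the image centre, and the strip taken from each
    frame is as wide as the inter-frame displacement."""
    h, w = frames[0].shape[:2]
    cx = w // 2                     # centre cut line (minimum distortion)
    pieces = [frames[0][:, :cx]]    # left part of the first frame
    for frame, u in zip(frames[1:], shifts):
        du = max(1, int(round(abs(u))))   # strip width = |translation|
        pieces.append(frame[:, cx:cx + du])
    return np.hstack(pieces)
```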
3.2 Motion Detection Module

The function of this module is to detect the moving objects over the whole surveillance range. A motion detection mask of the wide FOV is constructed and highlighted on the mosaic image, which prompts the observer to pay more attention to these regions of interest.

Taking the efficiency of the system into account, the moving objects are detected based on difference images. Each frame I(t_i) is first aligned to its following frame I(t_j), combined with the global motion estimation result. Then the aligned images, I'(t_i) and I(t_j), are subtracted to obtain the difference image d_ij. A threshold is applied so that only the significant differences are retained, i.e.

d_ij(x, y) = 1 if |I'(x, y, t_i) − I(x, y, t_j)| > T_g, and 0 otherwise    (11)

where (x, y) is a point in the overlapping region between I(t_i) and I(t_j). The pixels of the difference image with non-zero values correspond to the regions where discernible changes occurred due to the movement of objects.

An empty mask manifold is created with the same size as the mosaic image. Strip selection among the difference images is done in a similar way, using the motion estimation results directly. When the difference image d_ij is aligned to I(t_j), the same cut line F_j of strip S_j from I(t_j) can be selected as the right side of the difference strip S_{d_ij} from d_ij, and its left side corresponds to the left cut line F_i of strip S_i from I(t_i), and so on. All the difference strips are aligned to the coordinates of the mosaic image and pasted onto the mask manifold, forming the motion mask for the wide FOV. Subsequently, the mosaic image marked with the moving objects is composed by overlaying the motion mask onto the mosaic image. The marked area on an object indicates the magnitude and speed of its motion: the larger the area, the faster the motion.
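A minimal sketch of the thresholded frame differencing of Eq. (11), assuming the global motion is the translation estimated in Step 1 (OpenCV-based; the threshold value is illustrative, and in practice the border introduced by the warp should be masked out):

```python
import cv2
import numpy as np

def difference_mask(prev_gray, curr_gray, u, v, Tg=25):
    """Eq. (11): warp I(t_i) onto I(t_j) with the estimated global
    translation, subtract, and keep only the significant differences."""
    h, w = prev_gray.shape
    M = np.float32([[1, 0, u], [0, 1, v]])          # translation warp
    aligned = cv2.warpAffine(prev_gray, M, (w, h))  # I'(t_i)
    diff = cv2.absdiff(aligned, curr_gray)
    _, mask = cv2.threshold(diff, Tg, 255, cv2.THRESH_BINARY)
    return mask   # non-zero pixels mark discernible changes
```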
3.3 Multiple Displays and Human-Computer Interaction

This module is the direct link between the plentiful visual information and the observer. Dot-by-dot display is recommended, to avoid squeezing the appearance of the targets of interest. A multi-screen display technique is utilized in the displaying system, which includes LCD monitors, a multi-display controller and its control software. We prefer to adopt a hardware splicing processor as the multi-display controller, instead of a commonly used multi-monitor graphics card: it works with high stability without a CPU or the support of an operating system, and thus does not slow down the processing speed of the system. The mosaic image is taken as the input signal of the multi-display controller and divided into a number of output signals for the multiple screens. The observer judges the suspicious moving objects and decides which to track through human-computer interaction, by clicking on the geometric centre of a suspicious target with the mouse.

3.4 Object Locating Module and Others

The object locating module is closely related to the pan unit control module, which not only controls the pan unit to rotate horizontally at a constant speed, but also inquires angle positions periodically and records the angles of the beginning frame and the ending frame, φ_0 and φ_n, of each inquiry interval. When a target of interest is locked, it is easy to know in which strip its geometric centre lies. If it lies in the middle frames of an inquiry period, as shown in Fig. 4, the angle φ_i relative to the zero position is calculated as follows:
φ_i = φ_0 + (i/n)(φ_n − φ_0)    (12)
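Eq. (12) is a plain linear interpolation between the two inquired pan angles; a one-line sketch (the function name and example values are ours):

```python
def target_angle(i, n, phi_0, phi_n):
    """Eq. (12): pan angle of strip i among n strips gathered between the
    inquired angles phi_0 and phi_n (linear interpolation)."""
    return phi_0 + (i / n) * (phi_n - phi_0)

print(target_angle(12, 48, 10.0, 40.0))   # -> 17.5 (degrees)
```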
Fig. 4. Angle information calculation for object locating: strips S_0, S_1, S_2, ..., S_n swept between angles φ_0 and φ_n about the rotation centre O

After determining the angle location of the target, this angle information is sent to the target tracking module, which then detects and tracks the target along the indicated direction more precisely. Due to space limitations, the tracking module is not discussed further here; please see (Yang et al. 2009, Chen et al. 2010) for our research in this respect.

4. EXPERIMENTS
An experimental platform for simulation has been developed, as seen in Fig. 5. The system runs on a PC with an Intel Core 2 Duo E8400/3 GHz CPU and 4 GB of memory, programmed in the VC++ environment. We provide 1×4 multi-screens, taking into account the visual monitoring range and the perception speed of humans. The physical resolution of each screen is 1920×1080; thus the platform can display a large mosaic image of up to 7680×1080 at a time, dot by dot. The surveillance range is enlarged with the rotation of the panorama camera, and the mosaicing work is done in real time. A series of experiments are carried out to verify the functions and the efficiency of the proposed system. Due to space limitations, we present two typical experiments here to show the performance and its application in transportation.

Fig. 5. The experimental platform for simulation

Experiment for testing real-time performance

A more general experiment for long-distance surveillance is carried out to test the real-time performance of the system. The observed scene is about 2 km away, outside our laboratory. The video sequence, 656×494×240 in size, is captured at 20 fps. As a matter of fact, the bottleneck of the system lies in the real-time performance of image mosaicing; any delay will directly affect the accuracy of target tracking. The mosaicing algorithm is simplified by reducing the amount of image data being processed. Instead of taking all the pixels of the images into the motion estimation, only the prominent corners, such as Harris corners (Harris and Stephens 1988, Chen et al. 2009), are extracted to participate in the coarse-to-fine iterative LK algorithm, and the motion between neighbouring frames is approximated by averaging the motion vectors of all the corners. Fig. 6 shows the mosaicing result after this simplification. It is visually satisfactory, and the image mosaicing finishes in real time. Moreover, the horizontal angle range of the final mosaic image reaches 29.3217°, and its size is 6248×495. The testing result shows that the proposed wide-field monitoring system satisfies the real-time performance requirement, and the resolution of the system is high enough for the observer to monitor.

Fig. 6. The mosaic image for a scene 2 km away outside our laboratory

Experiment for transport environment surveillance

This is an application instance of monitoring the transport environment on a bridge 200 metres away, as a short-distance surveillance case. Some of the original video frames in the regular FOV are shown in Fig. 7(a), and the difference images between them and their following frames are shown in Fig. 7(b). The mosaic image of the wide surveillance scene is shown in Fig. 7(c), and the mosaiced motion detection result in this wide field is shown in Fig. 7(d). The final displayed image with the motion mask is shown in Fig. 7(e). The observer can clearly notice that the bus on the bridge moves much faster than the cars under the bridge, owing to its larger marked movement area, from which it can be concluded that a traffic jam has happened under the bridge. As this example shows, the tasks of the proposed surveillance system go beyond security alone, extending for instance to traffic control and coordination of services. The system achieves the demands of motion detection in a wide FOV at the same time.

5. CONCLUSIONS

The main contribution of this paper is that we propose a real-time wide-field monitoring system based on image mosaicing technique and develop a simulation platform. The system can synthesize a high-resolution mosaic image of a wide FOV with marked moving objects in real time, without any camera calibration. It has high practical value and can be applied to transportation security surveillance and other applications, such as sea or forest monitoring, border surveillance and so on. Our next goal is to set up an automatic recognition module based on image indexing and retrieval techniques, instead of human-computer interaction.
Fig. 7. An instance of monitoring the transport environment: (a) original video frames; (b) difference images; (c) mosaic image; (d) motion detection mosaics; (e) final output of the mosaic image marked with the motion mask

ACKNOWLEDGEMENT

This work is supported by the Chinese National Science Fund for Distinguished Young Scholars, No. 60925011, and the Provincial and Ministerial Key Foundation Project, No. 9140A17051010BQ01. The authors wish to thank the Vision and Image Group for their fruitful insights and helpful suggestions. Thanks are also due to the Simulation and Modelling Group at Beijing Institute of Technology for their cooperation.

REFERENCES

Bouguet, J. (2000). Pyramidal implementation of the Lucas-Kanade feature tracker: description of the algorithm. Technical report, OpenCV Document, Intel Microprocessor Research Labs.
Chen, J., L.H. Zou, J. Zhang and L.H. Dou (2009). The comparison and application of corner detection algorithms. Journal of Multimedia, 4(6), pp. 435-441.
Chen, X., L. Dou and J. Zhang (2010). Method for maneuvering target video frequency tracking based on inductive factor of posture information. Journal of Systems Engineering and Electronics, 21(2), pp. 261-267.
Harris, C. and M. Stephens (1988). A combined corner and edge detector. Proceedings of the Fourth Alvey Vision Conference, University of Sheffield Printing Unit, Manchester, pp. 147-151.
Haering, N. and K. Shafique (2007). Automatic visual analysis for transportation security. IEEE Conference on Technologies for Homeland Security, Woburn, MA, pp. 13-18.
Johnson, J. (1958). Analysis of image forming systems. Image Intensifier Symposium, AD 220160 (Warfare Electrical Engineering Department, U.S. Army Research and Development Laboratories, Ft. Belvoir, Va.), pp. 244-273.
Lucas, B.D. and T. Kanade (1981). An iterative image registration technique with an application to stereo vision. Proceedings of the Imaging Understanding Workshop, pp. 121-130.
McLauchlan, P.F. and A. Jaenicke (2002). Image mosaicing using sequential bundle adjustment. Image and Vision Computing, 20(9-10), pp. 751-759.
Peleg, S., B. Rousso, A. Rav-Acha and A. Zomet (2000). Mosaicing on adaptive manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, pp. 1144-1154.
Rousso, B., S. Peleg and I. Finci (1997). Mosaicing with generalized strips. DARPA Image Understanding Workshop, New Orleans, United States, pp. 255-260.
Robert, K. (2009). Video-based traffic monitoring at day and night. 12th International IEEE Conference on Intelligent Transportation Systems, St. Louis, MO, pp. 1-6.
Szeliski, R. (1996). Video mosaics for virtual environments. IEEE Computer Graphics and Applications, 16(2), pp. 22-30.
Valera, M. and S.A. Velastin (2005). Intelligent distributed surveillance systems: a review. IEE Proceedings - Vision, Image and Signal Processing, 152(2), pp. 192-204.
Vu, V.-T., et al. (2006). Audio-video event recognition system for public transport security. IET Conference on Crime and Security, London, UK, pp. 414-419.
Winters, N., J. Gaspar, G. Lacey and J. Santos-Victor (2000). Omni-directional vision for robot navigation. Proceedings of the IEEE Workshop on Omnidirectional Vision, Hilton Head Island, SC, pp. 21-28.
Yang, W., L. Dou and J. Zhang (2009). Video moving object segmentation algorithm based on an improved Kirsch edge operator. Proceedings of the 2009 Chinese Conference on Pattern Recognition and the First CJK Joint Workshop on Pattern Recognition, Nanjing, China, pp. 4-8.
Zheng, J.Y. (2003). Digital route panoramas. IEEE Multimedia, 10(3), pp. 57-67.
Zou, L.H., J. Chen, J. Zhang and L.H. Dou (2010a). An image mosaicing approach for video sequences based on space-time manifolds. Proceedings of the 29th Chinese Control Conference, Beijing, China, pp. 3003-3006.
Zou, L.H., J. Chen, J. Zhang and N. García (2010b). Malaria cell counting diagnosis within large field of view. International Conference on Digital Image Computing: Techniques and Applications, Sydney, Australia, pp. 172-177.