University of Pannonia
Information Science and Technology PhD School

Thesis Booklet

Novel Probabilistic Methods for Visual Surveillance Applications

Ákos Utasi
Department of Electrical Engineering and Information Systems

Supervisor: László Czúni, Ph.D.

Veszprém, 2011
Overview, Goals

The number of surveillance systems deployed in public areas is increasing rapidly, producing such a tremendous amount of data that humans cannot cope with processing it. Therefore, automatic methods have been developed in the recent decade to aid or to replace this labour-intensive work. Applications include traffic monitoring, anomaly and unusual event detection, traffic jam detection, and human, group or crowd activity recognition. As these examples show, real-time processing capability is in most cases a mandatory condition for surveillance applications. On the other hand, although sales of digital cameras have increased, the number of low-cost analog devices is still significantly higher. The video produced by such analog camera systems is often of very poor quality: it is heavily loaded with noise, and various aberrations and artifacts are visible. This is particularly true in outdoor urban environments. Not all of this noise can be removed in real time.
Therefore, robustness is another requirement for automatic visual surveillance methods. All the methods presented in this work have these two key properties and have been tested in urban scenes using low-quality real-life recordings.

I have developed novel techniques in three main fields: scene recognition in a time-multiplexed multi-camera environment, foreground-background separation for finding moving objects, and unusual event detection for detecting traffic anomalies.

Analog multi-camera systems often produce video streams by time-division multiplexing, resulting in unsegmented time-multiplexed videos, i.e. the correspondences between the video frames and the cameras are unknown. However, most surveillance methods operate on a single static camera only. Therefore, in the first task I developed methods that can be used as a preprocessing step in a multi-camera system for recognizing the different camera views from the visual input data. The presented methods can be used either in offline mode, for efficiently segmenting time-multiplexed archive data, or in online mode, for detecting unusual camera or multiplexer events such as unusual camera order and duration, manual pan-tilt-zoom control, or device malfunction.

The second task is the separation of the foreground from a static or dynamic background. Here I examined the foreground aperture problem, which most algorithms do not handle correctly. Therefore, I presented a novel extension to one of the most widely used techniques to improve its robustness against this problem.
Finally, for the third task I presented a multi-level system that uses pixel-wise optical flow directions to learn the typical motion patterns and the fluctuation of the traffic, and to find anomalous traffic events in real time. The proposed methods can be used in cluttered urban environments, where traditional approaches based on object tracking usually fail or produce a high false alarm rate.
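The idea of learning usual motion directions per pixel and flagging improbable ones can be illustrated with a very small sketch. It is not the model used in the thesis: it assumes a plain histogram over eight quantized directions with a small prior count, and the class name `DirectionModel`, its methods, and the threshold value are hypothetical choices made only for this illustration.

```python
import math


class DirectionModel:
    """Histogram of quantized optical flow directions for one pixel (illustrative)."""

    def __init__(self, n_bins=8, prior=1.0):
        self.n_bins = n_bins
        # A small prior count keeps the probability of unseen directions above zero.
        self.counts = [prior] * n_bins

    def bin_of(self, dx, dy):
        """Map a flow vector to the index of its direction bin."""
        angle = math.atan2(dy, dx) % (2.0 * math.pi)  # wrap into [0, 2*pi)
        return int(angle / (2.0 * math.pi) * self.n_bins) % self.n_bins

    def train(self, dx, dy):
        """Count one flow vector observed during usual activity."""
        self.counts[self.bin_of(dx, dy)] += 1.0

    def probability(self, dx, dy):
        """Relative frequency of the direction of (dx, dy) in the training data."""
        return self.counts[self.bin_of(dx, dy)] / sum(self.counts)

    def is_unusual(self, dx, dy, threshold=0.05):
        """Flag a direction whose learned probability is below the threshold."""
        return self.probability(dx, dy) < threshold


# Train on traffic that moves roughly in the +x direction, then query both ways.
model = DirectionModel()
for _ in range(100):
    model.train(1.0, 0.1)
print(model.is_unusual(1.0, 0.1), model.is_unusual(-1.0, 0.1))
```

In this toy setting the learned direction is accepted and the opposite direction is flagged. Zero-length flow vectors carry no direction and would have to be filtered out by the caller.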
Research Methodology

As already discussed, real-time processing capability and robustness were two key factors during the development of my tools. Therefore, I mainly used robust probabilistic methods, which build on theorems and assertions from mathematical statistics and probability theory. The proposed models use different implementations of mixtures of Gaussians, hidden Markov models and hidden semi-Markov models. The contributions of this thesis lie in the probabilistic modeling of processes, in real-time Bayesian detectors using these models, and in local innovations in the model structure and in parameter estimation.

To demonstrate the performance of my methods I used real-life urban videos from low-quality cameras in my experiments. These videos were recorded by the Budapest Police Headquarters and by members of the Image Processing Laboratory. The methods have been implemented as single-threaded C++ applications without any GPU-based acceleration. Implementing the image processing routines in C++ was greatly facilitated by the Image Processing Library and OpenCV software libraries provided by Intel. The programs have been tested on ordinary PCs under the Microsoft Windows XP and Ubuntu Linux operating systems.
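To make the mixture-of-Gaussians modeling mentioned above more concrete, the following sketch shows a standard adaptive MoG background model for the intensity of a single pixel. It is only a baseline illustration written in Python for brevity, not the C++ implementation of the thesis, and it does not include the extension against the foreground aperture problem. The class name `PixelMoG`, all parameter values, and the simplification of using the same learning rate for weights, means and variances are assumptions.

```python
import math


class PixelMoG:
    """Adaptive mixture of Gaussians for the intensity of a single pixel."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0, bg_ratio=0.7):
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var, self.bg_ratio = init_var, min_var, bg_ratio
        self.w, self.mu, self.var = [], [], []  # weights, means, variances

    def update(self, x):
        """Update the model with intensity x and return True if x is foreground."""
        # Rank components by weight / standard deviation, most reliable first.
        order = sorted(range(len(self.w)),
                       key=lambda i: self.w[i] / math.sqrt(self.var[i]),
                       reverse=True)
        # The leading components that together reach bg_ratio are the background.
        background, acc = set(), 0.0
        for i in order:
            background.add(i)
            acc += self.w[i]
            if acc >= self.bg_ratio:
                break
        # A component matches if x lies within 2.5 standard deviations of its mean.
        matched = next((i for i in order
                        if (x - self.mu[i]) ** 2 < 6.25 * self.var[i]), None)
        self.w = [w * (1.0 - self.alpha) for w in self.w]
        if matched is None:
            # No match: add a new component or replace the least reliable one.
            if len(self.w) < self.k:
                self.w.append(0.0)
                self.mu.append(0.0)
                self.var.append(0.0)
                j = len(self.w) - 1
            else:
                j = order[-1]
            self.w[j], self.mu[j], self.var[j] = self.alpha, x, self.init_var
            foreground = True
        else:
            d = x - self.mu[matched]
            self.w[matched] += self.alpha
            self.mu[matched] += self.alpha * d
            new_var = self.var[matched] + self.alpha * (d * d - self.var[matched])
            self.var[matched] = max(self.min_var, new_var)  # keep variance positive
            foreground = matched not in background
        total = sum(self.w)
        self.w = [w / total for w in self.w]
        return foreground
```

A pixel that keeps showing the same intensity is quickly absorbed into the background, while a sudden change is reported as foreground. The same mechanism also absorbs the inner, homogeneous part of a slowly moving object, which is the kind of failure the foreground aperture problem refers to.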
Thesis Groups

1. Thesis group: analysis of time-multiplexed videos
Related publications: [1], [2], [3].

Analog multi-camera surveillance systems often produce unsegmented time-multiplexed videos, and the multiplexer is usually not synchronized with the video recorder; that is, no additional information about the cameras' temporal position in the video stream is available. However, most methods developed for security tasks work for static cameras only. Therefore, the first step in a multi-camera system is automatic scene recognition. Existing scene recognition methods do not consider the visual similarity of the images of a camera, the periodicity of the multiplexed segments, and the regularity and uncertainty of the segment durations at the same time. Therefore, I introduced novel methods based on hidden Markov models (HMM) and hidden semi-Markov models (HSMM) that take all of these into account. In offline mode they provide an efficient
tool for segmenting large amounts of archived data, while in online mode they can be used for the real-time detection of abnormal camera and multiplexer events, such as unusual camera order and duration, manual pan-tilt-zoom (PTZ) control, or device malfunction.

(a) Thesis: I designed new HMM and HSMM models for the automatic offline segmentation of time-multiplexed videos. Both methods assume two main attributes: the visual similarity of the segments of the same camera, and the periodicity of the segments in the stream. In addition, the HSMM-based method also models the uncertainty of the camera duration. In these models I used simple image features to retain high processing speed. I showed experimentally that both methods can be used efficiently for the segmentation of archived low-quality surveillance videos.

(b) Thesis: I introduced novel HMM-based and HSMM-based detectors for online scene recognition and anomalous camera event detection in time-multiplexed videos. The HMM-based method is capable of detecting anomalous camera order, manual PTZ control and device malfunction events. In addition, the HSMM-based detector can also be used for detecting unusually long or short camera durations. The proposed detectors run in real time on ordinary PCs and provide a high detection rate on both daytime and nighttime videos. I proved their practical applicability by
using low-quality real-life recordings in my experiments.

2. Thesis group: foreground-background separation
Related publications: [4], [5].

The separation of moving image parts from the background is an important task in video surveillance applications. The adaptive mixture of Gaussians (MoG) foreground-background separation method is one of the most widely used techniques for motion detection, with known deficiencies induced by the so-called foreground aperture problem. Due to this problem the original MoG approach fails in the affected scenarios. Therefore, I extended this method to improve its robustness against the foreground aperture problem, while retaining its real-time processing performance.

(a) Thesis: I introduced a novel extension to the adaptive MoG-based foreground-background separation method by modeling the foreground pixels in a separate layer using a single Gaussian. I defined a recursive method between neighboring models to propagate the high covariance values from the borders to the inner parts of homogeneous areas, thereby preventing them from becoming background. Moreover, I defined deterministic steps for the state change between the foreground and background models. According to my experiments, the improved method preserves the shapes of the moving objects more precisely, and improves the robustness of the method against the foreground
aperture problem significantly, achieving as much as a 50% decrease in the number of misclassified pixels, while decreasing the processing speed by approximately 30%.

3. Thesis group: unusual event detection in surveillance videos
Related publications: [6], [7], [8], [9], [10].

Most known methods for unusual event detection rely on the trajectories of objects. However, approaches based on object tracking produce a high false alarm rate in cluttered urban environments. Therefore, I designed new methods for the detection of anomalous traffic events and situations in urban surveillance videos. In situations where object tracking is unreliable, my proposed methods are able to model the normal traffic using pixel-wise optical flow directions. The proposed methods do not need any manual calibration or settings; they only require an automatic training phase using videos of usual activity. In my experiments I used low-quality real-life videos to demonstrate the robustness of my methods against practical problems.

(a) Thesis: I introduced novel pixel-level modeling of optical flow directions to learn the usual motion patterns of the video. The usual motion directions are estimated in an automatic training phase. I designed a novel method for estimating the probabilities of unusual motions, which takes into account the temporal Markovian property of the motion vectors. According to my
experiments, this temporal extension significantly increases the difference between the probabilities of anomalous and usual events, thereby improving the anomaly detection performance of the methods.

(b) Thesis: I introduced a regional HMM-based unusual event detector, which learns the typical motion patterns and the fluctuation of the traffic of a region in the scene. The method uses the temporal changes of the extracted pixel-wise optical flow information to model the rules of the traffic system.

(c) Thesis: The low probability values of large numbers of motion vectors at a time result in a numerical precision problem in the HMM training process. To avoid this problem, I introduced a novel scaling technique in the mathematical formulae of the HMM parameter estimation procedure. I proved that this scaling technique does not change the estimation procedure. Moreover, the proposed scaling technique can be combined with the existing scaling procedure used in the case of long training sequences, and I proved that no modification is required in the estimation procedure when this combined scaling is used. According to my tests, the proposed scaling technique significantly increases the robustness of the training procedure against the precision problem and allows an approximately five times larger motion vector set to be processed, while no decrease in the training speed is
noticeable.

(d) Thesis: I introduced a hierarchical representation of regional motion models into a discrete HMM in order to consider the joint state of several regions in the scene. This high-level model can therefore be used to describe the relations of spatially distant motion events, and it provides a tool for detecting unusual events with greater spatial extent.
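The numerical precision problem addressed in thesis (c) can be reproduced with a few lines of code. The sketch below is not the scaling technique proposed in the thesis, whose formulae are not given in this booklet. It only shows the well-known per-time-step normalization of the forward variables, i.e. the kind of existing scaling procedure used for long sequences, next to an unscaled recursion that underflows. The function names and the two-state model are hypothetical.

```python
import math


def forward_naive(pi, A, B, obs):
    """Unscaled forward recursion; returns P(obs), which underflows for long input."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def forward_scaled(pi, A, B, obs):
    """Forward recursion normalized at every time step; returns log P(obs)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)                  # scaling coefficient of step t
        alpha = [a / c for a in alpha]  # rescaled forward variables sum to one
        log_lik += math.log(c)          # log P(obs) is the sum of the log coefficients
    return log_lik


# Hypothetical two-state, two-symbol model used only for this demonstration.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
long_obs = [0, 1] * 2500

print(forward_naive(pi, A, B, long_obs))   # underflows in double precision
print(forward_scaled(pi, A, B, long_obs))  # finite log-likelihood
```

The same effect appears within a single time step when the emission probability is the product of the probabilities of many motion vectors, which is the case described in thesis (c).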
Exploitation of my Results

The presented methods are robust, do not require any special hardware architecture, and achieve real-time processing performance on an ordinary PC. These properties also make them applicable to commercial software products. The developed algorithms directly correspond to ongoing research projects with the participation of MTA-SZTAKI. In particular, the aim of the MEDUSA project of the European Defence Agency is to realize an intelligent multi-sensor data fusion grid, and several methods of the third thesis group will be integrated into the final prototype system. The unusual event detection methods of the first thesis of this group were also integrated into the system of the MONLINGV project of the Jedlik Ányos programme.
Related Publications

[1] Á. Utasi and L. Czúni. Analysis of time-multiplexed security videos. In Proceedings of The 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 547-552, Genoa, Italy, September 2-4, 2009.
[2] Á. Utasi and L. Czúni. Detecting irregular camera events in time-multiplexed videos. Electronics Letters, 45(18):937-939, 2009.
[3] Á. Utasi and L. Czúni. Idő-multiplexelt biztonsági felvételek elemzése [Analysis of time-multiplexed security recordings; in Hungarian]. In Proceedings of The 7th Conference of Hungarian Association for Image Processing and Pattern Recognition, Budapest, Hungary, January 28-30, 2009.
[4] Á. Utasi and L. Czúni. Reducing the foreground aperture problem in mixture of Gaussians based motion detection. In Proceedings of The 14th International Conference on Systems, Signals and Image Processing and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services, pages 157-160, Maribor, Slovenia, June 27-30, 2007.
[5] Á. Utasi and L. Czúni. Valós idejű mozgásdetektálás módosított mixture of Gaussians eljárással [Real-time motion detection with a modified mixture of Gaussians method; in Hungarian]. In Proceedings of The 6th Conference of Hungarian Association for Image Processing and Pattern Recognition, Debrecen, Hungary, January 25-27, 2007.
[6] Á. Utasi and L. Czúni. Anomaly detection with low-level processes in videos. In Proceedings of The 3rd International Conference on Computer Vision Theory and Applications, pages 678-681, Funchal, Madeira, Portugal, January 22-25, 2008.
[7] Á. Utasi and L. Czúni. HMM-based unusual motion detection without tracking. In Proceedings of The 19th International Conference on Pattern Recognition, pages 1-4, Tampa, FL, USA, December 8-11, 2008.
[8] Á. Utasi and L. Czúni. Visual analysis of urban road traffic. In Proceedings of The 15th International Conference on Systems, Signals and Image Processing, pages 445-448, Bratislava, Slovak Republic, June 25-28, 2008.
[9] Á. Utasi and L. Czúni. Detection of unusual optical flow patterns by multilevel hidden Markov models. Optical Engineering, 49(1), 2010.
[10] Á. Utasi and L. Czúni. Rendhagyó optikai áramlás detekciója rejtett Markov modellekkel [Detection of unusual optical flow with hidden Markov models; in Hungarian]. In Proceedings of The 8th Conference of Hungarian Association for Image Processing and Pattern Recognition, Szeged, Hungary, January 25-28, 2011.
Other Publications

[11] Á. Utasi and Cs. Benedek. A 3-D Marked Point Process Model for Multi-View People Detection. In Proceedings of The 24th IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, USA, June 21-23, 2011.
[12] Á. Utasi and A. Kovács. Recognizing Human Actions by Using Spatiotemporal Motion Descriptors. In Proceedings of Advanced Concepts for Intelligent Vision Systems, Sydney, Australia, December 13-16, 2010.
[13] Á. Utasi and Cs. Benedek. Multi-Camera People Localization and Height Estimation Using Multiple Birth-and-Death Dynamics. In Proceedings of The 10th International Workshop on Visual Surveillance (in conjunction with ACCV 2010), Queenstown, New Zealand, November 8-12, 2010.
[14] L. Kovács and Á. Utasi. Shape and Motion Fused Multiple Flying Target Recognition and Tracking. In Proceedings of Automatic Target Recognition XX, at SPIE Defense, Security, and Sensing, Orlando, USA, April 5-7, 2010.
[15] L. Kovács, Á. Utasi and T. Szirányi. VISRET - A Content Based Annotation, Retrieval and Visualization Toolchain. In Proceedings of Advanced Concepts for Intelligent Vision Systems, pages 265-276, Bordeaux, France, September 28 - October 2, 2009.
[16] Á. Utasi, L. Kovács, Z. Szlávik, L. Havasi, I. Petrás, and T. Szirányi. Digital Video Event Detector Framework for Surveillance Applications. In Proceedings of The 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 565-570, Genoa, Italy, September 2-4, 2009.
[17] L. Kovács, Á. Utasi and T. Szirányi. Extraction, Categorization, and Unusual Motion Signaling of Small Moving Objects. In Proceedings of Signal and Data Processing of Small Targets, San Diego, USA, August 2-6, 2009.
[18] Á. Utasi, Á. Kiss and T. Szirányi. Statistical Filters for Crowd Image Analysis. In Proceedings of The 11th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (in conjunction with CVPR 2009), pages 95-100, Miami, USA, June 25, 2009.
[19] L. Havasi, Z. Szlávik, L. Kovács, Á. Utasi, Cs. Benedek and T. Szirányi. Validate Privacy Constraints in Surveillance Systems. In Proceedings of ICT Solutions for Justice, Skopje, Macedonia, September 24, 2009.
[20] L. Kovács, Z. Szlávik, Cs. Benedek, L. Havasi, I. Petrás, D. Losteiner, Á. Utasi, A. Licsár, L. Czúni, and T. Szirányi. Video Surveillance Framework for Crime Prevention and Event Indexing. In Proceedings of ICT Solutions for Justice, Thessaloniki, Greece, October 24, 2008.
[21] Z. Szlávik, L. Kovács, L. Havasi, Cs. Benedek, I. Petrás, Á. Utasi, A. Licsár, L. Czúni, and T. Szirányi. Behavior and event detection for annotation and surveillance. In Proceedings of the 6th International Workshop on Content-Based Multimedia Indexing, pages 117-124, London, UK, June 18-20, 2008.
[22] Á. Utasi and L. Czúni. Unusual Event Detection in Low-Quality Urban Surveillance Videos with Modeling Motion Directions. In Proceedings of The Asia-Pacific Workshop on Visual Information Processing, Tainan, Taiwan, December 15-17, 2007.