Real-Time Event Detection System for Intelligent Video Surveillance



DLSU Engineering e-journal Vol. 1 No. 2, September 2007, pp. 31-39

Real-Time Event Detection System for Intelligent Video Surveillance

Timothy John A. Chua, Andrew Jonathan W. Co, Paolo Javier S. Ilustre
Department of Electronics and Communications Engineering, De La Salle University-Manila
email: tjcmasamune@yahoo.com, andrew.jonathan.co@gmail.com, coolblue_a14@yahoo.com

Enrique M. Manzano, Edzel R. Lapira
Department of Electronics and Communications Engineering, De La Salle University-Manila
email: manzanoe@dlsu.edu.ph, lapirae@dlsu.edu.ph

The monitoring of CCTV cameras is heavily dependent on the efficiency of security personnel, which leaves a lot to be desired when 50-100 live feeds are to be simultaneously monitored for extended periods of time. The more expensive solution is the addition of alarms to every location under surveillance. A more prudent approach, which this paper proposes, is to extend the functionality of the already available CCTV cameras by allowing a computer to analyze the live feed using digital signal processing techniques. A proof-of-concept Real-Time Event Detection Automated Surveillance System is presented here. The system takes a live feed from an analog camera via an encoder connected to the USB port of a computer. Various image processing techniques are then applied to the frames in order to separate the foreground from the background. Useful information is subsequently extracted from the foreground and is used in identifying objects of interest, establishing object correspondence across consecutive frames, and analyzing the behavior of the object itself. Detected alarm events are noted by means of a log, which is visible through the user interface of the system.

32 Chua, Co, Ilustre, Manzano and Lapira

1.0 INTRODUCTION

Security and safety are key issues which have high priority in the concerns of society today. These concerns paved the way for the rise and usage of CCTV networks, which most institutions now use for monitoring purposes. Unfortunately, statistics show that normal-sized buildings have around 50-100 cameras, all of which require constant and simultaneous monitoring. This is no easy task, considering that control rooms usually host only one or two security personnel. If a surveillance operator were tasked to monitor the camera feeds for several hours, his attention would decrease over time, and the probability of missing alarm situations would increase accordingly. Instead of adding alarms, which can be very expensive, this paper suggests automating the monitoring process by using a computer to analyze the live feed via digital signal processing techniques. Since the system detects events in real time, quick and proper detection of suspicious events is guaranteed, which eventually leads to increased safety of the people and property concerned. This is the most important benefit the system provides. The system, in addition, aims to: (1) reduce the possibility of negligence on the security personnel's part; (2) reduce the workload of the security personnel, effectively allowing them to give their attention to other important matters; and (3) improve the security measures in any entity of application. The system has been tested under two settings: a walkway or corridor setting, and a bank lobby or art gallery setting.

2.0 SYSTEM BLOCK DIAGRAM

As seen in Figure 1, there are five major blocks in the system: image acquisition, image processing, feature extraction, tracking and alarm detection. The image acquisition module includes the hardware setup (complete with the camera and accompanying peripherals) and the internal frame grabber used to acquire the input.
This module produces frames which serve as input to the next module. The image processing module performs various procedures on the input acquired from the previous module, such as preprocessing and post-processing image operations, in order to obtain useful information to send to the next module. The output of this module is the list of the blobs in the image. Blobs are collections of neighboring white pixels that are grouped together. The details of these processes will be discussed in later sections. The feature extraction module takes the blobs from the previous module and extracts important properties from them. These properties can be used in object identification, tracking, or behavior analysis (alarm detection).

Figure 1. System Block Diagram

The tracking module provides correspondence throughout the frames. In other words, it follows a blob, through its features, as it moves along the camera's field of view. The alarm detection module handles all alarm-related functions of the system. Input taken from the tracking section of the system is analyzed, and logging actions are performed in response to these events. Upon the occurrence of an alarm event, the system triggers the logging mechanism of the system. The alarm, together with its details, is noted in the log visible through the system's user interface.

3.0 THEORY AND DESIGN CONSIDERATIONS

3.1 Image Processing

The image processing block contains 3 modules: background modeling, morphological operation and region labeling, as seen in Figure 2. The system does not apply any pre-processing operation, such as median filtering or Gaussian smoothing, due to the minimal noise of the camera, but this remains an open option in the event that the system is deployed in a noisier environment. The background modeling module is further subdivided into 3 modules, as shown in Figure 3. Various background updating algorithms have been used in the literature [1-7], and this project utilizes the (Selective) Running Gaussian Average Model [3,4], which models every pixel in the frame as a Gaussian distribution. The background is modeled by two parameters: the mean (µ) and the deviation (R). They are updated according to the equations below:

µ_i(x, y) = α F_i(x, y) + (1 − α) µ_{i−1}(x, y)   (1)

R_i(x, y) = β |F_i(x, y) − µ_i(x, y)| + (1 − β) R_{i−1}(x, y)   (2)

I_i(x, y) = |F_i(x, y) − µ_i(x, y)|   (3)

where α and β are learning constants, F_i is the current pixel value, and µ_{i−1} and R_{i−1} are the previous mean and deviation values.
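One iteration of the update rules above can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' code: the learning constants and the threshold multiplier k are placeholder values chosen here for demonstration.

```python
import numpy as np

def update_background(frame, mean, dev, alpha=0.05, beta=0.05):
    """One step of the running Gaussian average model.

    frame : current grayscale frame F_i (float array)
    mean  : previous running mean, mu_{i-1}
    dev   : previous running deviation, R_{i-1}
    alpha, beta : learning constants (illustrative values, not the paper's)
    """
    mean = alpha * frame + (1.0 - alpha) * mean      # eq. (1)
    diff = np.abs(frame - mean)                      # eq. (3): I_i = |F_i - mu_i|
    dev = beta * diff + (1.0 - beta) * dev           # eq. (2)
    return mean, dev, diff

def foreground_mask(diff, dev, k=2.5):
    """Thresholding step: pixels with I_i > k*R are marked foreground."""
    return diff > k * dev
```

Feeding each incoming frame through `update_background` and thresholding the returned difference image yields the binary foreground mask that the later morphological and labeling stages operate on.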

34 Chua, Co, Ilustre, Manzano and Lapira

Image subtraction [8] is then performed between the current frame values and the reference frame values. Only the magnitude of the difference is considered, so as not to get negative values.

Figure 2. Image Processing Block

Figure 3. Subdivisions of the Background Modeling Block

Thresholding [10] is finally applied to separate the background from the foreground. The threshold value is dictated by the deviation value (R). If I_i(x, y) > kR, the pixel is considered as foreground; otherwise, it is set as background. For the post-processing operations, a simple image closing [8] operation is performed, which is a dilation-erosion pair done in cascade. This helps emphasize regions of interest while limiting noise at the same time. A 5x5 square structuring element is used, which is a compromise between computational efficiency and the ability to improve the binary images. The region labeling module groups neighboring white pixels together and labels them as blobs. This application makes use of the 8-connectivity rule, which considers diagonally-adjacent pixels as connected.

3.2 Feature Extraction

Features are properties of blobs (i.e. groups of connected white pixels) which can be used in the latter modules of the system. These modules include tracking, behavior analysis or alarm detection, and identification of objects of interest. The system makes use of four features: Area, Centroid Location, Bounding Box and Orientation [9]. Although other features can be extracted, the aforementioned four were sufficient for the alarms detected by this system. The area is the sum of all white pixels within a blob, provided that the foreground is set to white. It is used in identifying objects of interest. The centroid location is the position of the centroid of a blob. It is used in tracking objects throughout the field of view. The orientation is the angle between the major axis of a blob and the horizontal plane (x-axis).
An orientation greater than a set threshold could imply a standing up position, while the opposite could be indicative of a lying down position. Finally, the bounding box is the smallest rectangle that can contain a blob. It is defined by four points, which together denote the sides of the rectangle.
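The region labeling and feature extraction steps can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the flood-fill labeling and the second-moment orientation formula are standard techniques assumed here to match the description above.

```python
import numpy as np
from collections import deque

def label_blobs(mask):
    """Group white pixels into blobs using the 8-connectivity rule."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue                      # pixel already belongs to a blob
        count += 1
        labels[sy, sx] = count
        queue = deque([(sy, sx)])
        while queue:                      # breadth-first flood fill
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):     # 8 neighbours, diagonals included
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

def blob_features(labels, label):
    """Area, centroid, bounding box and orientation of one labeled blob."""
    ys, xs = np.nonzero(labels == label)
    area = len(ys)                                  # count of white pixels
    cy, cx = ys.mean(), xs.mean()                   # centroid location
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # smallest enclosing box
    # Orientation: angle of the major axis from the x-axis, via central
    # second moments (an assumed, standard formulation).
    mu11 = ((xs - cx) * (ys - cy)).sum()
    mu20 = ((xs - cx) ** 2).sum()
    mu02 = ((ys - cy) ** 2).sum()
    angle = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return area, (cy, cx), bbox, np.degrees(angle)
```

With this formulation a tall, thin blob (an upright person) reports an orientation near 90 degrees, while a wide, flat blob (a lying-down person) reports one near 0, matching the threshold test described above.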

3.3 Tracking

The tracking section of this project utilizes a simple Euclidean distance matching [10] algorithm to follow a blob throughout the field of view. The distance between centroids is computed with the Euclidean distance formula, the same formula used to solve for the hypotenuse of a right triangle. The same object can, in theory, only travel a limited distance between two frames, and this means of tracking is based on that principle.

3.4 Alarm Detection

The system detects five alarms, based on the features extracted: one area alarm, trespassing [11], and four behavior alarms, running, blocking, crawling and lying down [12]. Running is detected when the orientation of a blob indicates a standing-up position and the centroid displacement between two frames is greater than a set threshold. Blocking occurs if the orientation again implies a standing-up position but the centroid displacement between two frames is less than a set threshold. Crawling occurs when a blob's orientation implies a lying-down position and its centroid displacement is greater than a set threshold. Lying down occurs when the orientation is similar to crawling but the centroid displacement is less than a set threshold. Finally, trespassing, the single area alarm, occurs when the centroid of a blob is detected in a pre-defined restricted area. Alarm counters are utilized for robustness, that is, to prevent false alarms; alarm reset counters are likewise utilized to prevent even more false alarms. All of these values can be adjusted by the user, depending on the setting in which the system will be deployed. The system is real-time in that once the counters of an alarm reach the threshold set by the user, the system automatically registers the event as an alarm.

4.0 EXPERIMENTAL SETUP

The system was tested in an indoor environment, with a field of view of about 8 meters by 8 meters in area.
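The tracking and behavior rules of sections 3.3 and 3.4 can be sketched as below. The distance and orientation thresholds are assumed placeholder values, and greedy nearest-neighbour matching is one simple way to realize Euclidean distance matching; neither is claimed to be the authors' exact implementation.

```python
import math

def match_blobs(prev_centroids, curr_centroids, max_dist=50.0):
    """Greedy nearest-neighbour matching on Euclidean distance.

    An object can only travel a limited distance between two frames, so
    each current centroid is matched to the closest unclaimed previous
    centroid within max_dist (an illustrative threshold, in pixels).
    """
    matches, used = {}, set()
    for j, (cy, cx) in enumerate(curr_centroids):
        best, best_d = None, max_dist
        for i, (py, px) in enumerate(prev_centroids):
            d = math.hypot(cy - py, cx - px)   # hypotenuse formula
            if i not in used and d < best_d:
                best, best_d = i, d
        if best is not None:
            matches[j] = best
            used.add(best)
    return matches

def classify_behavior(orientation, displacement,
                      upright_angle=45.0, move_thresh=10.0):
    """Behavior alarms of section 3.4; both thresholds are assumptions."""
    standing = orientation > upright_angle     # upright vs lying-down posture
    moving = displacement > move_thresh        # centroid displacement per frame
    if standing and moving:
        return "running"
    if standing and not moving:
        return "blocking"
    if not standing and moving:
        return "crawling"
    return "lying down"
```

In a full system, `classify_behavior` would feed the per-alarm counters rather than raise an alarm directly, so that an event is only logged once its counter reaches the user-set threshold.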
The camera was mounted at a height of about 7.5 feet (2.3 meters) in one corner of the room. The system's parameters were then calibrated based on these conditions. These parameters included alarm counter limits, alarm reset counter limits, and thresholds. The system was allowed to run uninterrupted for 15 minutes at a time, as it monitored actions and alarm events patterned after normal circumstances. Two settings were considered in testing the system, and for robustness purposes, the system was tested several times at different times of the day. The first setting was a corridor setup, in which the trespassing alarm was turned off while the rest of the alarms were turned on. The second setting was a lobby (art gallery) setup, in which the blocking alarm was turned off while the rest of the alarms were turned on; this was because people tend to linger around a spot in this kind of setting.

5.0 RESULTS AND ANALYSIS

5.1 Data

Before system testing, a script containing scenes was prepared in order to have a comparison point for the data acquired from the system test. Both normal actions and alarm events were distributed evenly across the scenes. It should be noted that actions the system considers normal are not reported and are subsequently ignored.

Table 1. Summary of Data for Alarm Detection Ratings

Alarms       Actual Alarms   Misses   Accuracy
Blocking     28              0        100%
Crawling     13              1        92.31%
Running      19              3        84.21%
Lying Down   16              1        93.75%
Trespassing  14              0        100%

The table above shows the total number of actual alarm events that occurred (according to the script) compared to the misses, which are alarm events that were not detected by the system. Various reasons account for these misses, such as the innate limitations brought about by using a single fixed camera (e.g. line-of-sight problems, occlusions). An example of such a missed detection would be a subject that runs along the camera's line-of-sight. Although the subject is running, the camera would only be able to see variations in the size of the perceived blob as the subject approaches or recedes from the camera. Another example would be a subject lying down along the line-of-sight of the camera. Instead of seeing an object lying down, the camera would see an object blocking, since it is highly possible for the subject's perceived orientation (in degrees) to be greater than the preset threshold. A few other missed detections occurred when the alarm counters did not reach their preset threshold values. This would mean that the subject had not performed the alarm action long enough for the system to consider the event significant. Upon the end of the said action, the reset counters eventually reached their own threshold and reset the alarm counters. Such actions were monitored, but not reported by the system.
The accuracy of alarm detection is computed according to the following formula:

Accuracy_DETECTION = (Actual alarms − Misses) / Actual alarms × 100%   (4)

Table 2 below summarizes the false alarm ratings of the system, with respect to the five alarms.
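As a quick check, the paper's two rating formulas reproduce the percentages reported in the results tables:

```python
def detection_accuracy(actual_alarms, misses):
    """Equation (4): (actual alarms - misses) / actual alarms x 100%."""
    return (actual_alarms - misses) / actual_alarms * 100.0

def false_alarm_rate(false_alarms, alarm_hits):
    """Equation (5): false alarms / alarm hits x 100%."""
    return false_alarms / alarm_hits * 100.0

# Table 1, "Running" row: 19 actual alarms, 3 misses -> 84.21%
# Table 2, "Blocking" row: 1 false alarm out of 29 hits -> 3.45%
```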

Table 2. Summary of Data for False Alarm Ratings

Alarms       Alarm Hits   False Alarms   False Alarm Rate
Blocking     29           1              3.45%
Crawling     12           0              0.00%
Running      18           2              11.11%
Lying Down   16           1              6.25%
Trespassing  14           0              0.00%

The alarm hits represent the total number of alarm events detected by the system, while the false alarms represent detections that do not reflect the actual events dictated by the script. For example, a lying-down action could be falsely reported as a blocking alarm. Likewise, a normal event (normal walking) could be detected by the system as running. The false alarms can be attributed to line-of-sight problems brought about by the use of a single, static camera. False alarms can also occur when a subject of interest unintentionally exhibits the characteristics of an alarm event. The false alarm rate is computed according to the following formula:

Rate_FALSE ALARM = False alarms / Alarm hits × 100%   (5)

The counters and the reset counters provide the system with a means to cope with the occurrence of potential errors (missed detections and false alarms). A balance must be reached in order for the system to perform optimally: a reduction in the counter values will lessen missed alarms but will inevitably increase false alarms, and vice-versa. The system is considered real-time in that once an alarm counter reaches its threshold value, the system immediately makes a log of the associated alarm event. The proper choice of counter limits is important, since these values determine whether the system detects alarms satisfactorily in a particular setting.

5.2 Performance

The system, as shown in the block diagram, is composed of several modular stages that depend on each other only for input. In order to optimize the use of the processor, the system uses multiple threads of execution to run the aforementioned modules.
With the help of multithreading, the system is designed to allot more processor resources to the modules that are more computationally expensive, and vice versa. In addition, the premise of concurrent processing in a multithreaded program implies that the system could experience a significant performance boost in a multiple-processor system. The system processes data at an input rate of 5 frames per second. It was implemented on a computer with a 2.8-GHz microprocessor and 512 MB RAM. However, it was also successfully tested and deployed using the same frame rate of 5 frames per second on

a laptop with a 1.7-GHz Mobile Centrino processor and 512 MB of RAM. A picture of the system's user interface is shown in Figure 4.

6.0 CONCLUSIONS

In this project, an example of a real-time intelligent surveillance system capable of analyzing and detecting alarm events was created. The system utilizes input from a grayscale camera and performs various image processing operations on the acquired input. From these operations, relevant data were acquired, which were subsequently used in identifying objects of interest, tracking the aforementioned objects, and analyzing their behavior. The system was able to successfully implement a multithreaded pipeline model, with the goal of maximizing the use of available processor resources and contributing to the system's execution speed. Modules of the system logic were grouped into different worker threads. The application observes proper thread-safety practices with regard to exchanging data between threads through the use of thread-safe data structures. Various additional features were also added to the system for ease of use. One such feature is the ability to toggle between monitoring and ignoring any of the five alarm events. In addition, upon detection of an alarm event, the system saves the frame during which the said event was reported and draws a bounding box on the subject that performed the alarm event, for identification and verification purposes. Lastly, the user interface allows users to calibrate the system to different settings by providing a facility for changing setting-specific values, such as the alarm and reset counter thresholds. These features extend the variety of applications in which the system can be used. Future projects may incorporate other means of background updating, in order to compensate for possible weaknesses of the algorithm used.
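The multithreaded pipeline model described above, with modules in worker threads exchanging data through thread-safe structures, could be sketched as follows. The stage functions and queue layout are assumptions for illustration, not the actual implementation.

```python
import queue
import threading

def pipeline_stage(name, work, inbox, outbox):
    """Run one module in its own worker thread.

    Stages depend on each other only for input, so they communicate
    exclusively through thread-safe queues; a None sentinel shuts the
    stage down and is forwarded so downstream stages stop too.
    """
    def loop():
        while True:
            item = inbox.get()
            if item is None:
                outbox.put(None)
                break
            outbox.put(work(item))
    t = threading.Thread(target=loop, name=name)
    t.start()
    return t

# Illustrative stand-ins for the real image-processing modules.
frames, blobs, features = queue.Queue(), queue.Queue(), queue.Queue()
t1 = pipeline_stage("image-processing", lambda f: f * 2, frames, blobs)
t2 = pipeline_stage("feature-extraction", lambda b: b + 1, blobs, features)

for f in range(3):           # feed three "frames" through the pipeline
    frames.put(f)
frames.put(None)             # sentinel: end of input

results = []
while (item := features.get()) is not None:
    results.append(item)
t1.join()
t2.join()
```

Because each stage runs concurrently, a slow stage does not block frame acquisition, and on a multiprocessor machine the stages can genuinely execute in parallel.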
In the same light, more complex tracking algorithms may also be used to address some issues, like fully handling occlusion and splitting, or compensating for the innate weaknesses of using single, static cameras. These improvements are means towards building a complete and robust intelligent surveillance system.

7.0 REFERENCES

[1] Lo, B. P. L. & Velastin, S. A. (2000). Automatic congestion detection system for underground platforms. Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing, pp. 158-161.

[2] Cucchiara, R., Grana, C., Piccardi, M. & Prati, A. (Oct 2003). Detecting moving objects, ghosts and shadows in video streams. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10), 1337-1342.

[3] Wren, C., Azarbayejani, A., Darrell, T. & Pentland, A. (July 1997). Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 780-785.

[4] Koller, D., Weber, J., Huang, T., Malik, J., Ogasawara, G., Rao, B. & Russell, S. (1994). Towards robust automatic traffic scene analysis in real-time. Proceedings of the International Conference on Pattern Recognition, pp. 126-131.

[5] Stauffer, C. & Grimson, W. (1999). Adaptive background mixture models for real-time tracking. Proceedings of CVPR 1999, pp. 246-252.

[6] Elgammal, A., Harwood, D. & Davis, L. S. (1999). Non-parametric model for background subtraction. Proceedings of the ICCV '99 FRAME-RATE Workshop.

[7] Piccardi, M. & Jan, T. (Oct 2004). Efficient mean-shift background subtraction. Proceedings of IEEE ICIP 2004, Singapore.

[8] Shapiro, L. & Stockman, G. (2001). Computer vision. USA: Prentice Hall.

[9] Mathworks. (2004). MATLAB image processing toolbox [Help file].

[10] Xu, L. Q., Landabaso, J. L. & Lei, B. (July 2004). Segmentation and tracking of multiple moving objects for intelligent video analysis. BT Technology Journal, 22(3), 140-150.

[11] Duque, D., Santos, H. & Cortez, P. (n.d.). The OBSERVER: An intelligent and automated video surveillance system. Retrieved Oct 18, 2006 from https://repositorium.sdum.uminho.pt/bitstream/1822/5602/1/41410898.pdf

[12] Clarity Visual Intelligence. (n.d.). Retrieved Oct 18, 2006 from http://www.clarityvi.com/pdf/vss.pdf

ABOUT THE AUTHORS

Timothy John A. Chua (summa cum laude), Andrew Jonathan W. Co, and Paolo Javier S. Ilustre earned their BS Electronics and Communications Engineering degrees from De La Salle University-Manila in December 2006. Enrique M. Manzano is an Assistant Professor with the Electronics and Communications Engineering Department and the Physics Department at De La Salle University-Manila, having obtained his Bachelor's and Master's degrees in Electrical Engineering from the University of the Philippines, Diliman. His research interests include instrumentation, conducting polymers, and digital signal processing. Edzel R. Lapira is an Assistant Professor with the Electronics and Communications Engineering (ECE) Department at De La Salle University-Manila.
He obtained his BS ('99) and MS ('03) in Electronics and Communications Engineering from the same university. His research interests include digital watermarking, blind source recovery, image processing and VLSI.