Color-Based Object Tracking in Multi-Camera Environments


In Proceedings of DAGM 2003, Springer LNCS 2781, pp. 591-599, September 2003

Katja Nummiaro 1, Esther Koller-Meier 2, Tomáš Svoboda 2, Daniel Roth 2, and Luc Van Gool 1,2
1 Katholieke Universiteit Leuven, ESAT/VISICS, Belgium, {knummiar,vangool}@esat.kuleuven.ac.be
2 Swiss Federal Institute of Technology (ETH), D-ITET/BIWI, Switzerland, {ebmeier,svoboda,vangool}@vision.ee.ethz.ch

Abstract. Smart rooms provide challenging research fields for surveillance, human-computer interfacing, video conferencing, industrial monitoring or service and training applications. This paper presents our efforts toward building such an intelligent environment with a calibrated multi-camera setup. For a virtual classroom application, the best view is selected automatically from the different cameras that follow the target. The real-time object tracking needed to achieve this goal is implemented by means of color-based particle filtering. The use of multiple models for a target results in robust tracking even when the view of the target changes considerably, for example from the front to the back. Information is shared between the cameras and used for (re)initializing an object with epipolar geometry and prior knowledge of the target model characteristics. Experiments in our research environment show possible uses of the proposed system.

1 Introduction

Intelligent environments provide a wide variety of research topics such as object tracking, face/gesture recognition or speech analysis. Such multimodal sensor systems serve application areas like surveillance, human-computer interfacing, video conferencing, industrial monitoring or service and training tasks. Within this paper we focus on the autonomous processing of visual information based on a network of calibrated cameras. Each camera system comprises a recognition and tracking module which locates the target in the observed scene. Both modules operate on color distributions, while multiple color models for a target are handled simultaneously. To document interesting events on-line, an automated virtual editor is included in a central server and produces a video stream by switching to the camera with the best view in the room. Figure 1 illustrates the system architecture of our multi-camera setup.

For the integration of multiple cameras, the ViRoom (Visual Room) system by Doubek et al. [5] is used. Its modular architecture is built from low-cost digital cameras and standard computers running under Linux and allows consistent, synchronized image acquisition.

We extended the ViRoom software with an automatic camera control that keeps the target in the center of the view, as various applications are more interested in a clear view of the target than in its exact location. Accordingly, the pan and tilt angles are calculated from the detected location and the cameras are rotated in the appropriate direction. A color adjustment is applied during the calibration of the cameras, since the images in general show different brightness and color characteristics.

The research area of multi-sensor systems is very active [3, 8, 9, 13]. For instance, a flexible multi-camera system for low-bandwidth communication is presented by Comaniciu et al. [3]. Based on color tracking, the target region of the current image can be transmitted in real time at high resolution. Khan et al. [8] describe an interesting approach to tracking people with multiple uncalibrated cameras. When a person enters the field of view of one camera, the system searches for a corresponding target in all other cameras by using previously compiled field-of-view lines. Krumm et al. [9] describe tracking in an intelligent environment using two calibrated stereo cameras that provide both depth and color information. Each measurement from a camera is transformed into a common world coordinate system and submitted to a central tracking module. The work most closely related to ours is that of Trivedi et al. [13], where an overall system specification for an intelligent room is given. Their 3D tracking module operates with multiple cameras and maintains a Kalman filter for each object in the scene. In comparison, we use a more general representation of the probability distribution of the object state, which allows this distribution to be initialized along the epipolar lines when an object enters the field of view of a camera. In our system the best view is selected according to the quality of the tracking results of the individual cameras, while Trivedi et al. utilize the motion of the tracked target.

The outline of this paper is as follows. Section 2 presents a short review of the color-based tracking technique. In Section 3 the multiple target models used for the tracking are explained. Section 4 presents the exchange of information in the camera network, while Section 5 explains the selection of the optimal camera view. In Section 6 some experimental results are presented and finally, Section 7 concludes the paper.

2 Tracking

Robust real-time tracking of non-rigid objects is a challenging task. Color histograms provide an efficient feature for this kind of tracking problem, as they are robust to partial occlusion, rotation and scale invariant, and computationally efficient. The fusion of such color distributions with particle filters yields an efficient and robust tracker in the presence of clutter and occlusion. Particle filters [6] can represent non-linear dynamics and non-Gaussian densities by propagating multiple alternative hypotheses simultaneously.

The color-based particle filter [10, 11] approximates the posterior density by a set of weighted random samples $\{(s_t^{(n)}, \pi_t^{(n)})\}_{n=1}^{N}$ conditioned on the past observations.

Fig. 1. The sketch of the system architecture with multiple cameras and the virtual editor (camera image sequences and pan/tilt control, per-camera tracking modules sharing target models and tracking results, and a view-selection stage producing the final image sequence).

Each sample $s$ represents one hypothetical state of the object, with a corresponding discrete sampling probability $\pi$, where $\sum_{n=1}^{N} \pi^{(n)} = 1$. The tracked object state is specified by an elliptical region

$s = \{x, y, \dot{x}, \dot{y}, H_x, H_y, \dot{H}\} \qquad (1)$

where $x, y$ represent the location of the ellipse, $\dot{x}, \dot{y}$ the motion, $H_x, H_y$ the lengths of the half axes and $\dot{H}$ the corresponding scale change. In order to compare the histogram $p_{s_t^{(n)}}$ of such a hypothesized region with the target histogram $q$ from an initial model, a similarity measure based on the Bhattacharyya coefficient [4, 1, 7]

$\rho[p_{s_t^{(n)}}, q] = \sum_{u=1}^{m} \sqrt{p_{s_t^{(n)}}^{(u)} \, q^{(u)}} \qquad (2)$

is used, where $m$ is the number of bins of the histograms.

3 Multiple Target Models

To support multiple cameras, we have to use more than one histogram for a target, as for instance one camera might have a frontal view of the head, while an oppositely placed camera will have a view of the back of the head. Since we track independently in each camera, the correspondence between the histograms must be established.
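As a rough illustration of Eq. (2) and of the model comparison described in this section, the following Python/NumPy sketch scores a hypothesized region against stored target histograms with the Bhattacharyya coefficient. The histogram binning, function names and model labels are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Normalized RGB color histogram of an image patch (H x W x 3, uint8).
    The number of bins per channel is an illustrative choice."""
    hist, _ = np.histogramdd(
        patch.reshape(-1, 3).astype(float),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.ravel()
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms, Eq. (2)."""
    return float(np.sum(np.sqrt(p * q)))

def best_matching_model(region_hist, target_hists):
    """Compare a hypothesized region against the stored target histograms
    (e.g. facial, side and back view) and return the best match and its score."""
    scores = {name: bhattacharyya(region_hist, q) for name, q in target_hists.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Example with hypothetical data: three stored models and one candidate region.
# target_hists = {"facial": q_f, "side": q_s, "back": q_b}
# model, rho = best_matching_model(color_histogram(region), target_hists)
```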

Fig. 2. Top row: The three main views and histogram regions for the target model (frames 1, 15 and 21). Bottom row: The plotted Bhattacharyya values over a turning-head sequence (frame index on the horizontal axis), using the different target models (facial view, frame 1; side view, frame 21; back view, frame 15) and comparing them to the head region.

Three characteristic images from the front, the side and the back are selected as initial target models from a recorded sequence of the person to be tracked, and the corresponding histograms $q = \{q_f, q_s, q_b\}$ are stored. During the tracking, the similarity measures to these three histograms are included in the object state. By using a linear stochastic model for the propagation, the Bhattacharyya coefficients for the next frames can be estimated and rapid changes of these coefficients are restricted. Figure 2 shows the different values during the turning-head sequence that is used to determine the target histograms. As can be seen, the Bhattacharyya coefficients change smoothly.

The initial samples of the individual particle filters are spread over the whole image or strategically placed at positions where the target is expected to appear. A target is now recognized on the basis of the three Bhattacharyya coefficients, where the best matching model is taken as the target model. By calculating the mean value $\mu$ and the standard deviation $\sigma$ of the Bhattacharyya coefficient for elliptic regions over all positions of the background in the initialization step, we define an appearance condition as

$\rho[p_{s_t^{(n)}}, q] > \mu + 2\sigma. \qquad (3)$

This indicates a 95% confidence that a sample does not belong to the background. If a fraction $bN$ of the samples shows a high enough correspondence to one of the target histograms, the object is considered to be found and the tracking process is started.

The parameter $b = 0.1$ has proven sufficient in our experiments and is called the kick-off fraction. During tracking, the target model of each camera is adapted as described in [10].

4 Exchanging Information across Cameras

Exchanging information between the cameras is important for robust tracking. Within our system we focus on the (re)initialization of the individual trackers. As we are working with a calibrated camera setup [12], we utilize the tracking results of specific cameras to initialize the remaining trackers via epipolar geometry. The epipolar geometry is used in two different ways: 1) during initialization, when one of the target models matches an object, and 2) when the object is temporarily lost due to clutter, occlusions or other difficult tracking conditions.

Initialization: When an object is detected in one camera, we try to initialize it in the other cameras. For this purpose, the epipolar lines corresponding to the estimated target location are calculated. Samples are then placed stochastically around these lines and the velocity components are chosen from Gaussian distributions. If the object is already visible in more than one camera, the intersection of the corresponding epipolar lines is calculated and the samples are distributed around this point.

Reinitialization: If fewer than $bN$ of the samples fulfill the appearance condition of Eq. (3), we consider the object to be lost. In this case, we use the epipolar lines and their intersections to reinitialize the object during tracking. A fraction of the samples is then spread around the epipolar lines of the other cameras, while the remaining samples are propagated normally.

5 Best View Selection

The transmission of presentations in intelligent environments is attractive due to wide application areas such as surveillance or training tasks like a virtual classroom. In the context of the ViRoom, we are interested in a frontal view of the face while the person moves freely in our smart room. Accordingly, we have developed an automated virtual editor which creates a video stream by switching to the best view in the room. Given several camera inputs, it automatically chooses the one that gives the best frontal view of the target. The camera hand-over is controlled on the basis of the tracking results by evaluating the Bhattacharyya coefficient (see Eq. (2)) of the mean state of each tracker:

$\rho[p_{E[S]}, q_f] = \sum_{u=1}^{m} \sqrt{p_{E[S]}^{(u)} \, q_f^{(u)}} \qquad (4)$

$E[S] = \sum_{n=1}^{N} \pi^{(n)} s^{(n)}. \qquad (5)$
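A minimal sketch of this view-selection rule, assuming each tracker exposes its weighted sample set and that a callback extracts the color histogram of the mean-state region; the helper `region_histogram` is a hypothetical interface, not part of the paper.

```python
import numpy as np

def mean_state(samples, weights):
    """E[S] of Eq. (5): weighted average of the particle states.
    samples: (N, 7) array of states (x, y, x_dot, y_dot, Hx, Hy, H_dot),
    weights: (N,) normalized sample weights summing to one."""
    return np.asarray(weights, dtype=float) @ np.asarray(samples, dtype=float)

def select_best_view(trackers, q_frontal, region_histogram):
    """Choose the camera whose mean-state region is most similar to the
    frontal target histogram q_f, i.e. maximizes Eq. (4).
    trackers: dict mapping camera id -> (samples, weights),
    region_histogram: assumed callback (camera_id, state) -> normalized histogram."""
    best_cam, best_rho = None, -1.0
    for cam_id, (samples, weights) in trackers.items():
        state = mean_state(samples, weights)
        p = region_histogram(cam_id, state)
        rho = float(np.sum(np.sqrt(p * q_frontal)))  # Bhattacharyya coefficient
        if rho > best_rho:
            best_cam, best_rho = cam_id, rho
    return best_cam, best_rho
```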

Fig. 3. The best view selection according to the Bhattacharyya coefficients of the individual trackers. The small images on the bottom show the current tracking results of the individual camera views as white ellipses, together with the corresponding Bhattacharyya coefficients. In the top row the best target model and the output of the virtual editor are displayed.

As the Bhattacharyya coefficient represents a similarity measure with respect to the target histogram, the virtual editor always chooses the camera view which provides the highest Bhattacharyya coefficient with respect to the facial view. Figure 3 illustrates the best view selection on the basis of the individual Bhattacharyya coefficients.

6 Results

In this section, experimental results demonstrate the capabilities and limitations of our distributed tracking system. All images are captured from live video streams and have a size of 160 × 120 pixels. The application runs at 4-5 frames per second without any special optimization on Pentium II PCs at 1 GHz under Linux, where each of the three cameras is attached to its own computer.

The capability of the virtual editor is illustrated in Figure 4. It can be seen that the camera hand-over automatically chooses the best frontal view of the tracked face, even if it is partly occluded.

The initialization of the tracker plays an important role in multi-camera tracking. However, when there are several possible objects in the neighborhood of the epipolar lines, a wrong object can be selected. Such a scene is shown in Figure 5. In the top row, the situation is handled correctly, as the target is occluded in the middle camera and not initialized. In the second row, the target is not localized correctly, whereas in the third row a wrong target is selected. In both cases, the target is occluded in two cameras, so that the samples are spread along an epipolar line and not around an intersection point.
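For illustration of the epipolar-geometry step from Section 4 that underlies these initialization cases, the following sketch (under assumed names and spread parameters, not from the paper) computes the epipolar line of a detection in another camera from the fundamental matrix, scatters particle positions around it, and intersects two lines when the target is visible in several cameras.

```python
import numpy as np

def epipolar_line(F, x):
    """Epipolar line l' = F x in another camera for an image point x = (u, v)
    detected in the first camera; F is the fundamental matrix between the views."""
    l = F @ np.array([x[0], x[1], 1.0])
    return l / np.linalg.norm(l[:2])           # (a, b, c) with unit normal (a, b)

def line_intersection(l1, l2):
    """Intersection of two epipolar lines (homogeneous cross product), used when
    the object is already visible in more than one other camera."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]                        # assumes the lines are not parallel

def samples_along_line(l, n_samples, x_range, pos_sigma=5.0, vel_sigma=1.0):
    """Scatter particle positions stochastically around an epipolar line and draw
    the velocity components from zero-mean Gaussians; the spreads are assumptions."""
    a, b, c = l
    xs = np.random.uniform(x_range[0], x_range[1], n_samples)
    ys = -(a * xs + c) / b                     # points on the line (assumes b != 0)
    xs = xs + np.random.normal(0.0, pos_sigma, n_samples)
    ys = ys + np.random.normal(0.0, pos_sigma, n_samples)
    vels = np.random.normal(0.0, vel_sigma, (n_samples, 2))
    return np.column_stack([xs, ys, vels])     # (x, y, x_dot, y_dot) part of the state
```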

Fig. 4. The virtual editor automatically chooses the best frontal view of the tracked face.

7 Conclusion

The proposed system describes a step towards intelligent processing of visual data. An example setup is presented with multiple cameras and a tracker for each of them. To track a target robustly in all camera views, multiple models are recorded and the trackers automatically select the best matching model. By implementing a virtual editor, an on-line documentation is produced on the basis of the tracking results and camera control.

Information exchange between the individual trackers is currently only used for the (re)initialization process by applying epipolar geometry. We will integrate the use of a common state for all trackers based on 3D information.

Fig. 5. The initialization step can cause problems in tracking if there are several equally good candidates in the vicinity of the epipolar lines.

Furthermore, we will enhance the virtual editor. The camera selection should be made in such a way that the resulting stream is pleasant to watch; accordingly, short movie cuts should be avoided. In addition, the Bhattacharyya coefficient, which is the basis for the best view selection, will be combined with alternative decision rules. We also plan to compute virtual views from the available images to increase the information content of the video stream. A teleconferencing setup with multiple people is another interesting extension; the virtual editor should then be able to locate the person who is the main actor within the observed scene.

Acknowledgment

The authors acknowledge the support of the European Commission project STAR (IST-2000-28764). We thank Petr Doubek and Stefaan De Roeck for the multi-camera set-up and calibration, and Bart Vanluyten and Stijn Wuyts for including the active cameras.

References

1. F. Aherne, N. Thacker and P. Rockett, The Bhattacharyya Metric as an Absolute Similarity Measure for Frequency Coded Data, Kybernetika, Vol. 32(4), pp. 1-7, 1997.
2. C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, Vol. 2(2), pp. 955-974, 1998.
3. D. Comaniciu, F. Berton and V. Ramesh, Adaptive Resolution System for Distributed Surveillance, Real-Time Imaging, Vol. 8, pp. 427-437, 2002.
4. D. Comaniciu, V. Ramesh and P. Meer, Real-Time Tracking of Non-Rigid Objects using Mean Shift, CVPR, Vol. 2, pp. 142-149, 2000.
5. P. Doubek, T. Svoboda and L. Van Gool, Monkeys - a Software Architecture for ViRoom - Low-Cost Multicamera System, ICVS, pp. 386-395, 2003.
6. M. Isard and A. Blake, CONDENSATION - Conditional Density Propagation for Visual Tracking, International Journal of Computer Vision, Vol. 29(1), pp. 5-28, 1998.
7. T. Kailath, The Divergence and Bhattacharyya Distance Measures in Signal Selection, IEEE Transactions on Communication Technology, COM-15(1), pp. 52-60, 1967.
8. S. Khan, O. Javed and M. Shah, Tracking in Uncalibrated Cameras with Overlapping Field of View, PETS, 2001.
9. J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale and S. Shafer, Multi-Camera Multi-Person Tracking for EasyLiving, International Workshop on Visual Surveillance, pp. 3-10, 2000.
10. K. Nummiaro, E. Koller-Meier and L. Van Gool, An Adaptive Color-Based Particle Filter, Image and Vision Computing, Vol. 21(1), pp. 99-110, 2003.
11. P. Pérez, C. Hue, J. Vermaak and M. Gangnet, Color-Based Probabilistic Tracking, ECCV, pp. 661-675, 2002.
12. T. Svoboda, H. Hug and L. Van Gool, ViRoom - Low Cost Synchronised Multicamera System and its Self-Calibration, DAGM, pp. 515-522, 2002.
13. M.M. Trivedi, I. Mikic and S.K. Bhonsle, Active Camera Networks and Semantic Event Databases for Intelligent Environments, Proceedings of the IEEE Workshop on Human Modelling, Analysis and Synthesis, 2000.