Urban Vehicle Tracking using a Combined 3D Model Detector and Classifier



Norbert Buch, Fei Yin, James Orwell, Dimitrios Makris and Sergio A. Velastin
Digital Imaging Research Centre, Kingston University, Penrhyn Road, Kingston upon Thames, KT1 2EE, United Kingdom
{norbert.buch, fei.yin, j.orwell, d.makris, sergio.velastin}@kingston.ac.uk

Abstract. This paper presents a tracking system for vehicles in urban traffic scenes. The task of automatic video analysis for existing CCTV infrastructure is of increasing interest due to the benefits of behaviour analysis for traffic control. Based on 3D wire frame models, we use a combined detector and classifier to locate ground plane positions of vehicles. The proposed system uses a Kalman filter with variable sample time to track vehicles on the ground plane. The classification results are used in the data association of the tracker to improve consistency and noise suppression. Quantitative and qualitative evaluation is provided using videos of the public benchmarking i-LIDS data set provided by the UK Home Office. Correctly detected tracks of 94% outperform a baseline motion tracker tested under the same conditions.

Keywords: vehicle tracking, visual surveillance, motion estimation, 3D models, vehicle classification, urban traffic

1 Introduction

In recent years, there has been an increased scope for automatic analysis of urban traffic activity. This is due in part to the additional numbers of cameras and other sensors, the enhanced infrastructure and consequent accessibility, and also the advancement of analytical techniques to process the video data. Monitoring objectives include the detection of traffic violations (illegal turns, one-way streets, etc.) and the gathering of statistics about the types of road users. Using general purpose surveillance cameras, the classification of vehicles is a demanding challenge (see [9, 8, 12, 4]).
Compared to most examples of the image retrieval problem, the quality of surveillance data is generally poor, and the range of operational conditions (night-time, inclement and changeable weather that affects the auto-iris) requires robust techniques which need to be immune to errors in obtaining road user silhouettes. Those silhouettes, extracted by foreground analysis, are the input to our classifier. The classification process is based on 3D models for vehicles to give robustness against foreground noise, and can be restricted to an active region of the camera view (e.g. lanes). This allows human operators to configure monitoring objectives. The classified vehicles are tracked on the ground plane over time using a Kalman filter for variable time steps. Tracking performance is evaluated using the framework of Yin et al. [13] and compared to a state of the art OpenCV blob tracker [11] operating on the same video data.

Our novel contributions are firstly the extension of our 3D vehicle detector and classifier by tracking on the ground plane. We derive a variable sample rate Kalman filter to accommodate missed observations. The classification of vehicles is used during tracking due to our novel approach of classifying before tracking. Secondly, our tracking evaluation framework [13] is used to generate rich performance figures based on ground truth containing image bounding boxes. Thirdly, the performance of the 3D model based ground plane tracker is compared to a state of the art blob tracker.

The remainder of the paper is organised as follows: Section 2 introduces the detector and classifier used. The application of Kalman filtering to the classification results is demonstrated in section 3. An introduction to the evaluation framework and the results are given in section 4. Section 5 concludes the paper.

1.1 Related work

This review firstly introduces detection and tracking systems and continues with performance evaluation frameworks. Vehicle tracking in urban environments is performed in [12]. However, only a single 3D model for cars is used to estimate a vehicle constellation per frame, with the optimisation solved by a Markov Chain Monte Carlo (MCMC) algorithm. The reported detection rates are 96.8% and 88% for two videos, which are limited to single size vehicles. The paper of Morris and Trivedi [9] presents a combined tracking and classification approach for side views of highways, which is an extension to [8]. A single Gaussian background model is used for foreground segmentation. Classification and tracking accuracy was increased by combining tracking and classification. A Kalman filter is used to track the foreground regions based on the centroids in the image plane only. The OpenCV blob tracker [11] used as baseline here works in a similar fashion. The field of generic object recognition has recently expanded towards surveillance applications. Good examples are Leibe et al. [6, 7] for vehicle and pedestrian detection. Performance, however, is not yet comparable to state of the art surveillance systems for this specific task.

Performance evaluation has played an important role in developing, assessing and comparing object tracking algorithms. Lazarevic-McManus et al. [5] evaluated the performance of motion detection based on ROC-like curves and the F-measure. The latter allows comparison using a single value domain, but is mainly designed to operate on motion detection rather than tracking. There is a significant body of work dealing with evaluation of both motion detection and tracking. Needham and Boyle [10] proposed a set of metrics and statistics for comparing trajectories to account for detection lag, or constant spatial shift. However, taking only the trajectory (a set of points over time) as the input of evaluation may not give sufficient information about how precise the tracks are, since the size of the object is not considered. Bashir and Porikli [1] use the spatial overlap of ground truth and system bounding boxes, which is not biased towards large objects. However, the overlaps are counted per frame, which is justified when the objective is object detection. In object tracking, counting true positive (TP), false positive (FP) and false negative (FN) tracks is a more natural choice, which is consistent with the expectations of surveillance end-users. Brown et al. [3] suggest a framework for matching system track centroids with an enlarged ground truth bounding box, which favours tracks of large objects.

2 Detection and Classification using 3D Models

Joint detection and classification is performed using 3D wire frame models for vehicles with calibrated cameras. As indicated in the block diagram in Figure 1, the detector uses a Gaussian Mixture Model (GMM) for motion estimation with subsequent closed contour retrieval to generate motion silhouettes for an input video frame. Those motion silhouettes are used to generate vehicle hypotheses. The classifier matches 3D wire frame models (see Figure 2) with the motion silhouettes. To validate the hypotheses, the normalised overlap area of motion silhouettes and projected model silhouettes is calculated. Full details on the classifier can be found in a previous paper [4]. The output of the classifier is class labelled ground plane positions of vehicles. On frame to frame detection and classification of four classes, the classifier precision is 96.1% with a total system recall of 90.4% at a precision of 87.9%. Section 4 gives tracking evaluation results on the same video set.

Figure 1: Block diagram of the detector with 3D classifier and subsequent tracker. The detector computes a GMM foreground mask per frame and retrieves closed contours as motion silhouettes; the classifier projects 3D models to 2D silhouettes, scores the overlap area [4], and takes the maximum to output class labels and ground plane positions, which feed the Kalman filter to produce tracks.

Figure 2: Left: 3D wire frame models used for the classifier. Right: Example of detection and classification with ground plane tracking. The wire frame projection in red is used to estimate the bounding box for tracked vehicles.
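The hypothesis validation step above compares a motion silhouette with a projected model silhouette via a normalised overlap area. The exact normalisation is defined in [4]; the sketch below uses intersection-over-union as one plausible choice, with Boolean image masks and a hypothetical rejection threshold of 0.5.

```python
import numpy as np

def overlap_score(motion_mask: np.ndarray, model_mask: np.ndarray) -> float:
    """Normalised overlap of a motion silhouette and a projected 3D model
    silhouette (Boolean masks). Intersection-over-union is illustrative;
    the measure actually used is specified in [4]."""
    inter = np.logical_and(motion_mask, model_mask).sum()
    union = np.logical_or(motion_mask, model_mask).sum()
    return float(inter) / union if union > 0 else 0.0

def classify(motion_mask, model_masks, threshold=0.5):
    """Pick the model class with the highest overlap score and reject
    weak hypotheses. The threshold value is a hypothetical parameter."""
    scores = {label: overlap_score(motion_mask, m)
              for label, m in model_masks.items()}
    label, best = max(scores.items(), key=lambda kv: kv[1])
    return (label, best) if best > threshold else (None, best)
```

A rejected hypothesis (score below the threshold) simply produces no ground plane observation for that frame, which is exactly the situation the variable sample time tracker of section 3 is designed to absorb.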

3 Tracking

Tracking introduces temporal consistency to the detection and classification results of the previous section. Our novel contribution is the extension of the classifier by a Kalman filter with variable sample rate. The detector with joint classifier may reject valid vehicles in some frames due to noise, which requires the Kalman filter to operate on variable time intervals. Tracking is performed on the ground plane of the scene, which simplifies behaviour analysis like bus lane monitoring. We use the standard formulation of the Kalman filter for a constant velocity model of vehicles

    x_k = F x_{k-1} + B u_k + w_k,    z_k = H x_k + v_k,    with u_k = 0,    (1)

with state vector x = [v_x, x, v_y, y]^T and measurement vector z = [x, y]^T. All time and speed related constants for the filter are based on seconds rather than the sample rate or frame rate. The ground plane coordinates are in metres; all noise and position estimates are in metres or metres per second. The above is valid if the integration constant T from speed to position in the transition matrix F is defined in seconds:

    F = [ 1  0  0  0
          T  1  0  0
          0  0  1  0
          0  0  T  1 ].    (2)

The only condition to operate the Kalman filter at a variable sample rate is to update T in the transition matrix F constantly. For prediction steps, T is the time between the last update step of the filter and the current time. The state prediction x̂ and the error covariance P are therefore estimated for the correct time. If a measurement is available, the update step is performed with the same transition matrix F. If no measurement is available, no update is performed. Future prediction steps will be performed with increasing time T until an update takes place. Tracks can be discarded if the predicted error covariance P grows beyond a threshold.

The parameters for the filter are as follows. The process noise w is set to 1.1 m/s for velocity and 0.7 m for position. Those values can be derived from the expected acceleration of vehicles.
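The variable sample time mechanism of equations (1)-(2) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the noise magnitudes follow the values quoted in the text, while scaling the process noise with the elapsed time T is our own assumption.

```python
import numpy as np

class VariableRateKalman:
    """Constant-velocity Kalman filter with state [vx, x, vy, y] (m/s, m)
    whose transition matrix F is rebuilt from the elapsed time T in seconds,
    as in equations (1)-(2)."""

    def __init__(self, x0, y0, t0):
        self.x = np.array([0.0, x0, 0.0, y0])   # zero initial velocity
        self.P = np.diag([3.0, 1.0, 3.0, 1.0])  # initial error covariance
        self.Q = np.diag([1.1, 0.7, 1.1, 0.7])  # process noise (per second)
        self.R = np.eye(2) * 2.0                # measurement noise
        self.H = np.array([[0.0, 1.0, 0.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
        self.t_last = t0                        # time of last update

    def _F(self, T):
        F = np.eye(4)
        F[1, 0] = T  # x += T * vx
        F[3, 2] = T  # y += T * vy
        return F

    def predict(self, t):
        """Predict state at absolute time t; T grows for every frame
        in which the detector produced no observation."""
        T = t - self.t_last
        F = self._F(T)
        x_pred = F @ self.x
        # Scaling Q with T is an assumption; the paper states per-second noise.
        P_pred = F @ self.P @ F.T + self.Q * max(T, 0.0)
        return x_pred, P_pred

    def update(self, t, zx, zy):
        """Fold in a ground plane measurement at time t."""
        x_pred, P_pred = self.predict(t)
        z = np.array([zx, zy])
        S = self.H @ P_pred @ self.H.T + self.R
        K = P_pred @ self.H.T @ np.linalg.inv(S)
        self.x = x_pred + K @ (z - self.H @ x_pred)
        self.P = (np.eye(4) - K @ self.H) @ P_pred
        self.t_last = t
```

Because `predict` always measures T from the last successful update, frames in which the classifier rejects a vehicle simply stretch the prediction interval rather than corrupting the state, which is the behaviour described in the text.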
The measurement noise is v = 2 m, corresponding to the detection grid. The initial error covariance P is set to 3 m/s for velocity and 1 m for position. The initial position state corresponds to the detection position with zero velocity. The velocity is updated during the second detection using the first motion vector.

Observations m_i are associated with tracks based on the distance d_ij between the observation m_i and the prediction x̂_j, normalised by the diagonal elements of the predicted error covariance P. Changes between the model-id of the last observation of a track, id_i, and the current observation, id_j, are penalised. The total number of model-ids is 10. This novel approach is possible due to our system having classification before the tracking:

    d_ij = (x_i - x_j)^2 / P_x + (y_i - y_j)^2 / P_y + 1_{id_i ≠ id_j}.    (3)
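The association cost of equation (3) can be sketched directly. The greedy nearest-neighbour assignment and the gate value below are our assumptions for illustration; the paper does not specify the assignment algorithm.

```python
def association_distance(obs, track_pred, Px, Py, model_penalty=1.0):
    """Distance of equation (3): squared ground plane offsets normalised by
    the diagonal elements of the predicted error covariance, plus a penalty
    of 1 when the model-id differs (the indicator term in (3))."""
    xo, yo, id_o = obs
    xt, yt, id_t = track_pred
    d = (xo - xt) ** 2 / Px + (yo - yt) ** 2 / Py
    if id_o != id_t:
        d += model_penalty
    return d

def associate(observations, tracks, gate=9.0):
    """Greedy nearest-neighbour association (illustrative choice).
    tracks: list of (prediction (x, y, id), Px, Py); gate is hypothetical."""
    pairs, used = [], set()
    for i, obs in enumerate(observations):
        best_j, best_d = None, gate
        for j, (pred, Px, Py) in enumerate(tracks):
            if j in used:
                continue
            d = association_distance(obs, pred, Px, Py)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs
```

Because the covariance diagonals grow while a track goes unobserved, the same metric automatically widens the search region for tracks that have missed detections, complementing the variable sample rate filter.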

4 Evaluation

The object tracking performance is demonstrated by comparing our tracker with a baseline tracker (the OpenCV blob tracker [11]). The OpenCV tracker uses an adaptive mixture of Gaussians for background estimation, connected component analysis for data association, and Kalman filtering for tracking blob position and size. We use the i-LIDS benchmarking video data set provided by the UK Home Office [2] for evaluation. We run the trackers on the following sequences of the parked car data set scene 1 (PVTRA1xxxx): 1a3, 1a7, 1a13, 1a19, 1a2, 2a5, 2a1 and 2a11. Those videos contain overcast, sunny and changing weather conditions and camera saturation.

We propose a rich set of metrics, such as Correct Detected Tracks, False Detected Tracks and Track Detection Failure, to provide a general overview of the system's performance. Track Fragmentation shows whether the temporal and spatial coherence of tracks is established. ID Change is useful to test the data association module of the system. Latency indicates how quickly the system can respond to an object entering the camera view, and Track Completeness how completely the object has been tracked. Metrics such as Track Distance Error and Closeness of Tracks indicate the accuracy of estimating the position, and the spatial and temporal extent of the objects, respectively. More details about this evaluation framework can be found in Yin et al. [13].

4.1 Qualitative results

Figure 3: Correct detected tracks inside the active regions of interest (dark red boxes). Left: the proposed system with corresponding ground plane tracks. Right: OpenCV tracker result.
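The spatial metrics above can be made concrete with a small sketch. Reading Closeness of Tracks as average per-frame bounding box overlap is our interpretation for illustration; the precise definitions are given in Yin et al. [13].

```python
def bbox_overlap(a, b):
    """Spatial overlap (intersection over union) of two boxes given as
    (x1, y1, x2, y2). One plausible reading of per-frame 'closeness'."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def track_closeness(gt_boxes, sys_boxes):
    """Average overlap over frames where ground truth and system boxes
    are paired; an illustrative stand-in for the metric in [13]."""
    overlaps = [bbox_overlap(g, s) for g, s in zip(gt_boxes, sys_boxes)]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0
```

An overlap-based closeness of this kind is not biased towards large objects, which is why a shadow merged into a blob (as with the baseline tracker) lowers the score even when the vehicle itself is covered.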

Figure 4: The second car is missed due to occlusion between the vehicles. The proposed classifier on the left correctly locates the first car. The OpenCV tracker merged both cars into a large bounding box at a central position.

Figure 5: Pedestrians are correctly rejected as the "other" class by the proposed classifier, but detected by the OpenCV tracker.

4.2 Quantitative results

The ground truth used for evaluation is provided with the i-LIDS data set. It is of limited duration within the videos and does not include pedestrians on the road. The evaluation was constrained to the two regions of interest on the road (dark red boxes in Figure 3) for both trackers. The full results are provided in Table 1, indicating that the proposed system outperforms the OpenCV tracker on high level metrics such as correct detected tracks, track detection failure, false detected tracks and track fragmentation. This can mainly be attributed to the additional prior information from using 3D models to classify the content of the input video. For metrics that evaluate the motion segmentation, such as track closeness and distance error, both trackers have similar performance, which can be explained by the similar background estimation method. The track closeness of the proposed system is better than the baseline due to the 3D models, which are more robust against shadows; this can be observed for the bus in Figure 3 and the occluded car in Figure 4. The extent of the projected wire frame model is used as the bounding box for the proposed system. The false detected tracks from the OpenCV tracker are high due to systematic detection of pedestrians, which cannot be classified. Refer to Figure 5 for an example. The proposed system detected 94% of the ground truth tracks compared to

Table 1: Tracking results

Metric                                    Proposed tracker   OpenCV blob tracker
Number of ground truth tracks             100                100
Number of system tracks                   144                23
Correct detected tracks                   94                 88
Track detection failure                   6                  12
False detected tracks                     27                 90
Latency [frames]                          5                  5
Track fragmentation                       8                  18
Average track completeness [time]         64%                55%
ID change                                 1                  3
Average track closeness [bbox overlap]    54%                35%
Standard deviation of closeness           2%                 13%
Average distance error [pixels]           22                 21
Standard deviation of distance error      19                 15

88% of the baseline. Our system has half the track detection failures compared to the baseline. The higher detection rate can be explained by a more sensitive background estimation producing more complete detections, together with additional noise detections. However, the classification stage rejects many ambiguous detections. An ID change can occur if the track of an object leaving the scene is continued for a new object. This is worse for the proposed system compared to the OpenCV tracker, because the tracker is more persistent, occasionally wrongly continuing a track, but therefore generating far fewer track fragmentations.

5 Conclusions and future work

We proposed a novel system for detection, classification and ground plane tracking of vehicles in surveillance videos. The proposed system is evaluated on the i-LIDS data set against the state of the art OpenCV blob tracker. Our system performs similarly for motion related metrics but outperforms the baseline for high level metrics like detected tracks (94%) and missed tracks (6%). This indicates superior performance in the camera view, with the additional benefit of gaining ground plane locations. This can be essential to solve surveillance tasks like enforcing bus lane restrictions. Future work can be the evaluation of the classes of tracks and of the ground plane positions; both require a significant amount of ground truth. Regarding the detector and classifier, avoiding the reliance on motion estimation would be beneficial for more robustness against lighting changes and camera saturation. There is the opportunity to post-process completed tracks for retrospective behaviour analysis.

6 Acknowledgements

We are grateful to the Directorate of Traffic Operations at Transport for London for funding the work on classification and tracking, and to BARCO View, Belgium, for funding the work on tracking evaluation.

7 References

[1] F. Bashir and F. Porikli. Performance evaluation of object detection and tracking systems. In IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, PETS'06, 2006.
[2] Home Office Scientific Development Branch. Imagery library for intelligent detection systems i-LIDS. http://scienceandresearch.homeoffice.gov.uk/hosdb/cctv-imagingtechnology/video-based-detection-systems/i-lids/ [accessed 19 December 2008].
[3] L. M. Brown, A. W. Senior, Ying-Li Tian, Jonathan Connell, Arun Hampapur, Chiao-Fe Shu, Hans Merkl, and Max Lu. Performance evaluation of surveillance systems under varying conditions. In IEEE Int'l Workshop on Performance Evaluation of Tracking and Surveillance, pages 1-8, Colorado, January 2005.
[4] Norbert Buch, James Orwell, and Sergio A. Velastin. Detection and classification of vehicles for urban traffic scenes. In International Conference on Visual Information Engineering, VIE08, pages 182-187. IET, July 2008.
[5] N. Lazarevic-McManus, J. R. Renno, D. Makris, and G. A. Jones. An object-based comparative methodology for motion detection based on the F-measure. Computer Vision and Image Understanding, Sp. Is. on Intelligent Visual Surveillance, pages 74-85, 2007.
[6] B. Leibe, N. Cornelis, K. Cornelis, and L. Van Gool. Dynamic 3D scene analysis from a moving vehicle. In Computer Vision and Pattern Recognition, CVPR'07, IEEE Conference on, pages 1-8, June 2007.
[7] B. Leibe, K. Schindler, N. Cornelis, and L. Van Gool. Coupled object detection and tracking from static cameras and moving vehicles. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(10):1683-1698, Oct. 2008.
[8] B. Morris and M. Trivedi. Robust classification and tracking of vehicles in traffic video streams. In Intelligent Transportation Systems Conference, ITSC'06, IEEE, pages 178-183, 2006.
[9] Brendan Morris and Mohan Trivedi. Improved vehicle classification in long traffic video by cooperating tracker and classifier modules. In AVSS'06: Proceedings of the IEEE International Conference on Video and Signal Based Surveillance, page 9, USA, 2006.
[10] C. J. Needham and R. D. Boyle. Performance evaluation metrics and statistics for positional tracker evaluation. In International Conference on Computer Vision Systems, ICVS'03, pages 278-289, Graz, Austria, April 2003.
[11] OpenCV. Open source computer vision library. http://sourceforge.net/projects/opencvlibrary [accessed 19 December 2008].
[12] Xuefeng Song and R. Nevatia. Detection and tracking of moving vehicles in crowded scenes. In Motion and Video Computing, WMVC'07, IEEE Workshop on, pages 4-4, 2007.
[13] Fei Yin, Dimitrios Makris, and Sergio A. Velastin. Performance evaluation of object tracking algorithms. In 10th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, PETS'07, Rio de Janeiro, October 2007.