SELF-EVALUATION FOR VIDEO TRACKING SYSTEMS

Hao Wu and Qinfen Zheng
Centre for Automation Research
Dept. of Electrical and Computer Engineering
University of Maryland, College Park, MD-20742
{wh2003, qinfen}@cfar.umd.edu

ABSTRACT

In this paper, we present an algorithm for automatic performance evaluation of a video tracking system that does not require ground-truth data. Such an algorithm can play an important role in automatically determining when the underlying system loses track and needs reinitialization. The algorithm is based on measuring appearance similarity and tracking uncertainty. Several experimental results on vehicle and human tracking are presented. The effectiveness of the evaluation scheme is assessed by comparison with ground truth. The proposed self-evaluation algorithm has been used in an acoustic/video based moving vehicle detection and tracking system.

1. INTRODUCTION

An object tracking system can fail under many circumstances, for example because of illumination changes, pose variation, occlusion, and other factors. There is therefore a need for automatic performance evaluation. Most of the existing work on tracking performance evaluation has focused on overall algorithmic performance evaluation using ground-truth data. Its usefulness for determining tracking failure in real time is quite limited. In this paper, we present a tracker self-evaluation algorithm that automatically evaluates the tracking quality on the fly and does not require ground-truth data.

Online self-evaluation for keeping track of system performance has been studied for video based object segmentation. In [Erdem, 2004], segmentation and motion consistency along the object contour and histogram similarity are calculated and used to evaluate the goodness of segmentation and tracking. However, a generic tracking algorithm may not segment the object from the background, and hence the contour information may not be available. We address video tracking systems whose targets are bounded by boxes. The track assessment is mainly based on appearance similarity and trajectory smoothness.
We reduce the confidence in tracking when there is ambiguity in the result. The uncertainty is assessed through monitoring several ambiguity measurements.

The paper is organized as follows: ambiguity feature extraction and the track evaluation criterion are discussed in Sections 2 and 3, respectively; Section 4 gives several experimental results; finally, conclusions are given in Section 5.

Footnote 1: Prepared through collaborative participation in the Advanced Sensors Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-02-0008.

2. FEATURES USED FOR SELF-EVALUATION

In a common video tracker, the location and appearance of the target are represented through a representative chip specified by a bounding box in the image frame. Contour based trackers can be modified to fit into such a framework. Intuitively, one may think that the appearance change can be used for evaluation. However, it is not reliable to judge the tracking performance solely based on the appearance of the tracking box. Appearance change may be caused by two factors: (1) object pose change due to camera and/or object motion and (2) an appearance difference measure that is not consistent with subjective evaluation. Appearance change therefore does not necessarily indicate poor tracking performance. In addition, in many cases the bounding box includes some background pixels, which makes the appearance evaluation difficult.

In our experience with video surveillance using a static infrared camera, we have noticed that when tracking fails, the size and location of the bounding box change irregularly. Once the tracking bounding box locks onto background pixels, it changes randomly due to the similarity of the background clutter. Another common cause of tracking failure is that the tracking bounding box locks onto background objects. Our goal is to detect any tracking failure soon after it occurs. The following ambiguity tests are examined in our self-evaluation algorithm.
Test 1: Trajectory complexity evaluation

Normally, a moving vehicle will not change its direction and speed dramatically in a few adjacent frames. Therefore, rapid and frequent change in the object motion trajectory is a sign of tracking failure. We measure trajectory complexity as the ratio of the trajectory path length, $L_{p_1 p_2}$, to the end point distance, $D_{p_1 p_2}$, between two tracking points $p_1 = p(t-\tau)$ and $p_2 = p(t)$, as shown in Fig. 1. Normally, the larger the ratio is, the more complex the trajectory will be. We define the trajectory complexity indicator as

\[
I_1(t) =
\begin{cases}
0 & \text{if } \dfrac{L_{p_1 p_2}}{D_{p_1 p_2}} > T_1 \\
1 & \text{otherwise}
\end{cases}
\tag{1}
\]

We can further include trajectory direction change in the trajectory complexity indicator.

Fig. 1 Illustration of tracking trajectory (end points labeled P(t-τ) and P(t)).

[Report Documentation Page (Standard Form 298): report date December 2004; title: Self-Evaluation For Video Tracking Systems; performing organization: Centre for Automation Research, Dept. of Electrical and Computer Engineering, University of Maryland, College Park, MD-20742; approved for public release, distribution unlimited; security classification: unclassified; 5 pages. Supplementary notes: see also ADM001736, Proceedings for the Army Science Conference (24th), held on 29 November - 2 December 2004 in Orlando, Florida. The original document contains color images.]

Test 2: Motion smoothness evaluation

We noticed that the trajectory increment between two adjacent frames often increases when tracking fails. We define the motion step as the displacement of the object box over two consecutive frames, $D_{p(t-1)\,p(t)}$. The motion smoothness indicator is defined as

\[
I_2(t) =
\begin{cases}
0 & \text{if } D_{p(t-1)\,p(t)} > T_2 \\
1 & \text{otherwise}
\end{cases}
\tag{2}
\]

The threshold $T_2$ is determined according to prior knowledge of object motion. For object tracking from a moving camera, camera ego motion should first be estimated and removed from the object displacement computation.

Test 3: Scale constancy evaluation

In general, for medium to long range surveillance, we expect the scale change to be small. We measure target scale change as the ratio of the area of the current target bounding box, $A$, to the area of the initial bounding box, $A_0$. Both the target scale change and the scale change rate are measured and used in track evaluation. We define the scale constancy indicator as

\[
I_3(t) =
\begin{cases}
0 & \text{if } \left\{ \dfrac{A}{A_0} < T_{31} \right\} \cup \left\{ \dfrac{A}{A_0} > T_{32} \right\} \cup \left\{ \dfrac{1}{A_0}\dfrac{dA}{dt} < T_{33} \right\} \cup \left\{ \dfrac{1}{A_0}\dfrac{dA}{dt} > T_{34} \right\} \\
1 & \text{otherwise}
\end{cases}
\tag{3}
\]

Test 4: Shape similarity evaluation

Shape is an important discriminator for objects. When the tracking bounding box switches to a different object or to the background, the shape of the bounding box often also changes. We use the aspect ratio, Width/Height, of the bounding box to represent object shape and measure the shape similarity as the ratio of bounding box aspect ratios. The shape similarity indicator is defined as

\[
I_4(t) =
\begin{cases}
0 & \text{if } \left\{ \dfrac{W/H}{W_0/H_0} < T_{41} \right\} \cup \left\{ \dfrac{W/H}{W_0/H_0} > T_{42} \right\} \\
1 & \text{otherwise}
\end{cases}
\tag{4}
\]

Test 5: Appearance similarity evaluation

Although tracking evaluation should not depend solely on appearance similarity, appearance change often results in tracking failure.
Therefore, quantifying the appearance change is still important. We use three appearance change measures to evaluate the appearance stability. The first one, $D_I$, is the pixel-by-pixel difference between the current object and the initial object; the second one, $D_H$, is the difference of image intensity histograms between the current and initial objects, as used in [Erdem, 2004]; the third one, $D_M$, is the sum of weighted differences between the current appearance model and the initial appearance model. Other measurement methods can also be added. We define the appearance similarity indicator as

\[
I_5(t) =
\begin{cases}
0 & \text{if } \{ D_I > T_{51} \} \cup \{ D_H > T_{52} \} \cup \{ D_M > T_{53} \} \\
1 & \text{otherwise}
\end{cases}
\tag{5}
\]

3. EVALUATION CRITERION

In the ideal situation, good tracking should have all five tracking evaluation indicators equal to one. In practical circumstances, some unexpected factors may trigger one or two of these indicators (driving them to zero) while the tracking performance is still good. However, if three or more indicators have been triggered, we conclude that the tracking performance has deteriorated. We fuse the above five test scores to get a comprehensive tracking performance score. We first learn the uncertainty decision thresholds for each test using empirical data and then compute a weighted sum of the five indicators

\[
q(t) = \sum_{i=1}^{5} w_i I_i(t), \qquad \sum_{i=1}^{5} w_i = 1
\tag{6}
\]

In general, the larger $q(t)$ is, the better the tracking performance. When $q(t)$ drops below a threshold, we conclude that the tracking performance has deteriorated and the tracker needs to be re-initialized. The weights can be learned from training data. In our implementation, the appearance weight, $w_5$, is set slightly larger than the others. In practice, one may re-initialize the system only after $q(t)$ has remained below the threshold for a specified period of time.

4. EXPERIMENTAL RESULTS

The proposed algorithm was tested on different surveillance videos. Fig. 2 shows evaluation results on an IR vehicle surveillance sequence. The vehicle first moved straight away from the camera and then made a left turn. The results show that the self-evaluation algorithm gives a good indication of the tracking performance. In Fig. 2(a), when the bounding box does not fit the object well, the evaluation score drops. After re-initialization, the bounding box fits the object and the evaluation score rises, as shown in Fig. 2(b). We also compared the self-evaluation result with ground truth (Fig. 3). As the distance between the tracked object location and the ground truth increases, our tracking confidence score decreases, indicating deterioration in tracking performance.

Fig. 4 shows the results of evaluating pedestrian detection and tracking from a color surveillance video. The first three images are representative frames of the surveillance video with the tracking bounding box superimposed. The corresponding tracker evaluation scores are shown in the bottom row of Fig. 4. In this example, the bounding box switches to the background and wanders around at that position afterwards. Our self-evaluation criterion correctly reports the tracking failure.

Fig. 5 shows the results of evaluating pedestrian tracking with partial occlusion and reappearance. The tracked person walks behind a moving car. The tracker becomes uncertain while the person is partially occluded by the moving vehicle. The tracker regains its confidence and performance after the person reappears. Our tracker evaluation algorithm correctly scores the event.

Fig. 6 shows the evaluation results for tracking a group of pedestrians with significant occlusion. As the tracked group is blocked by the moving van, the bounding box switches to the van and loses the target. Our self-evaluation score drops when the tracker fails. We expect the confidence score to drop further if target trajectory direction is also incorporated in the evaluation measurements.

When integrated into a moving vehicle detection and tracking system [Sankaranayanan, 2004], the proposed algorithm helps the video surveillance system maintain a good target track by re-initializing the tracker whenever the tracker performance deteriorates. The tracking algorithm used in our experiments is the adaptive appearance model based tracker developed by Zhou et al. [Zhou, 2004].

5. CONCLUSIONS

In this paper, we present an algorithm for automatic performance evaluation of a video tracking system that does not require ground-truth data. The algorithm is based on measuring appearance similarity and tracking uncertainty. Several experimental results on vehicle and human tracking are reported. The effectiveness of the evaluation scheme is demonstrated by comparison with ground truth. The proposed self-evaluation algorithm has been used in an acoustic/video based moving vehicle detection and tracking system [Sankaranayanan, 2004].

6. REFERENCES

Erdem, C.E., Sankur, B., Tekalp, A.M., 2004: Performance Measures for Video Object Segmentation and Tracking, IEEE Trans. Image Processing, 13:931-951.

Sankaranayanan, A.C., et al., 2004: Vehicle Tracking using Acoustic and Video Sensors, Proc. 24th Army Science Conference (to appear).

Zhou, S., Chellappa, R., Moghaddam, B., 2004: Visual Tracking and Recognition Using Appearance-adaptive Models in Particle Filters, IEEE Trans. Image Processing (to appear).
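As an illustration of the geometric tests of Section 2 (Tests 1 to 4), the indicators can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Box` type, the function names, the one-frame finite difference used for the scale change rate, the handling of a zero end point distance, the reading of the paired thresholds as lower and upper bounds, and every threshold value in the example lines are hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Tracking box: center (cx, cy), width w and height h, in pixels."""
    cx: float
    cy: float
    w: float
    h: float


def trajectory_complexity(track, tau, t1):
    """Test 1: return 0 if path length / end point distance over the
    last tau steps exceeds t1, else 1."""
    pts = [(b.cx, b.cy) for b in track[-(tau + 1):]]
    path = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    chord = math.dist(pts[0], pts[-1])
    if chord == 0.0:
        # Assumption: a target that did not move passes; a closed loop fails.
        return 1 if path == 0.0 else 0
    return 0 if path / chord > t1 else 1


def motion_smoothness(track, t2):
    """Test 2: return 0 if the displacement over the last two frames exceeds t2."""
    a, b = track[-2], track[-1]
    return 0 if math.dist((a.cx, a.cy), (b.cx, b.cy)) > t2 else 1


def scale_constancy(track, t31, t32, t33, t34):
    """Test 3: return 0 if the area ratio leaves [t31, t32] or the
    per-frame area rate (relative to the initial area) leaves [t33, t34]."""
    a0 = track[0].w * track[0].h
    a_now = track[-1].w * track[-1].h
    a_prev = track[-2].w * track[-2].h
    ratio = a_now / a0
    rate = (a_now - a_prev) / a0  # assumption: dt is one frame
    return 0 if (ratio < t31 or ratio > t32 or rate < t33 or rate > t34) else 1


def shape_similarity(track, t41, t42):
    """Test 4: return 0 if the aspect ratio relative to the initial box
    leaves [t41, t42]."""
    r = (track[-1].w / track[-1].h) / (track[0].w / track[0].h)
    return 0 if (r < t41 or r > t42) else 1


# Example tracks (hypothetical values): steady motion, then three failure modes.
straight = [Box(10 + 2 * i, 20, 40, 20) for i in range(6)]
zigzag = [Box(0, 0, 40, 20), Box(0, 5, 40, 20), Box(1, 0, 40, 20),
          Box(1, 5, 40, 20), Box(2, 0, 40, 20)]
jumped = straight + [Box(100, 20, 40, 20)]
grown = straight + [Box(20, 20, 120, 60)]
reshaped = straight + [Box(20, 20, 20, 40)]

print(trajectory_complexity(straight, tau=5, t1=1.5),
      trajectory_complexity(zigzag, tau=4, t1=1.5))
print(motion_smoothness(straight, t2=5.0), motion_smoothness(jumped, t2=5.0))
```

Each function returns 1 for an unambiguous track and 0 when the test is triggered, matching the convention that good tracking has all indicators equal to one. All functions except `trajectory_complexity` need at least two boxes in the track.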

Fig. 2 (a), (b) Improved video tracking with track evaluation and appearance updating. Also shown are the corresponding evaluation plots.

Fig. 3 Comparison of self-evaluation score and the ground truth. The red line is the distance between GPS measurements and the tracked target center; the green line is the evaluation score reported by our algorithm.

Fig. 4 An example of pedestrian tracking. Shown in the top three rows are representative frames with the tracking bounding box superimposed. The corresponding tracker evaluation scores are shown in the bottom row. Our self-evaluation criterion correctly reports the tracking failure.

Fig. 5 An example of tracking a pedestrian with partial occlusion. The tracked person walks behind a moving car. The tracker becomes uncertain while the person is partially occluded by the moving vehicle. The tracker regains its performance after the person is cleared of occlusion. Our tracker evaluation algorithm correctly scores the event.

Fig. 6 An example of tracking a group of pedestrians with significant occlusion. As the tracked group is occluded by the moving van, the bounding box switches to the van and loses the target. Our self-evaluation score drops when the tracker fails.
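The fusion rule of Section 3 and the delayed re-initialization it suggests can be sketched as below. This is only an illustrative sketch: the paper states that the weights sum to one, that the appearance weight is slightly larger than the others, and that re-initialization may wait until the score has stayed low for some time, but the names `fused_score` and `ReinitMonitor`, the weight values, the score threshold, and the three-frame patience are all assumptions introduced here.

```python
def fused_score(indicators, weights):
    """Weighted sum q(t) of the binary indicators I_1..I_5.

    The weights are normalized here so that they sum to one.
    """
    if len(indicators) != len(weights):
        raise ValueError("need one weight per indicator")
    total = sum(weights)
    return sum(w * i for w, i in zip(weights, indicators)) / total


class ReinitMonitor:
    """Requests re-initialization once q(t) has been below q_min for
    `patience` consecutive frames."""

    def __init__(self, q_min, patience):
        self.q_min = q_min
        self.patience = patience
        self.low_count = 0

    def update(self, q):
        # Count consecutive low-score frames; any good frame resets the count.
        self.low_count = self.low_count + 1 if q < self.q_min else 0
        return self.low_count >= self.patience


# Hypothetical weights: the appearance weight (last entry) is slightly larger.
WEIGHTS = (0.18, 0.18, 0.18, 0.18, 0.28)

monitor = ReinitMonitor(q_min=0.5, patience=3)
frames = [
    (1, 1, 1, 1, 1),  # all tests pass
    (0, 0, 0, 1, 1),  # three tests triggered
    (0, 0, 0, 1, 1),
    (1, 1, 1, 1, 1),  # brief recovery resets the counter
    (0, 0, 0, 1, 1),
    (0, 0, 0, 1, 1),
    (0, 0, 0, 1, 1),  # third consecutive low score
]
decisions = [monitor.update(fused_score(f, WEIGHTS)) for f in frames]
print(decisions)
```

Requiring several consecutive low scores trades detection delay for robustness: a single frame in which one or two indicators fire does not restart the tracker.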