Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects

Hamed Pirsiavash, Deva Ramanan, Charless C. Fowlkes
Department of Computer Science, University of California, Irvine
{hpirsiav,dramanan,fowlkes}@ics.uci.edu

Abstract

We analyze the computational problem of multi-object tracking in video sequences. We formulate the problem using a cost function that requires estimating the number of tracks, as well as their birth and death states. We show that the global solution can be obtained with a greedy algorithm that sequentially instantiates tracks using shortest path computations on a flow network. Greedy algorithms allow one to embed pre-processing steps, such as nonmax suppression, within the tracking algorithm. Furthermore, we give a near-optimal algorithm based on dynamic programming which runs in time linear in the number of objects and linear in the sequence length. Our algorithms are fast, simple, and scalable, allowing us to process dense input data. This results in state-of-the-art performance.

Figure 1. We treat the problem of multi-target tracking through a perspective of spatiotemporal grouping, where both a large number of groups and their spatiotemporal extent (e.g., the number of objects and their track births and deaths) must be estimated. We show the output of an efficient, linear-time algorithm for solving this computational problem on the ETHMS dataset [9]. In this video clip our method returns hundreds of correct tracks, as evident by the overlaid track number.

1. Introduction

We consider the problem of tracking a variable number of objects in a video sequence. We approach this task as a spatiotemporal grouping problem, where all image regions must be labeled as background or as a detection belonging to a particular object track. From such a grouping perspective, one must explicitly estimate (a) the number of unique tracks and (b) the spatiotemporal extent, including the start/termination times, of each track (Fig.1). Approaches to accomplishing the above tasks typically employ heuristics or expensive algorithms that scale exponentially in the number of objects and/or super-linearly in the length of the video. In this paper, we outline a family of multi-object tracking algorithms that are:

1. Globally optimal (for common objective functions)
2. Locally greedy (and hence easy to implement)
3. Scalable: linear in the number of objects and (quasi)linear in video length

Our contribution is grounded in a novel analysis of an integer linear program (ILP) formulation of multi-object tracking [14, 25, 3, 17, 2, 18]. Our work most closely follows the min-cost flow algorithm of [25]. We show that one can exploit the special structure of the tracking problem by using a greedy, successive shortest-path algorithm to reduce the best-previous running time of O(N^3 log^2 N) to O(KN log N), where K is the unknown, optimal number of unique tracks, and N is the length of the video sequence.

The intuition behind the greedy approach stems from this surprising fact (Fig.2): the optimal interpretation of a video with k + 1 tracks can be derived by a local modification to the solution obtained for k tracks. Guided by this insight, we also introduce an approximate greedy algorithm whose running time scales linearly with sequence length (i.e., O(KN)), and is in practice several orders of magnitude faster with no observable loss in accuracy. Finally, our greedy algorithms allow for the embedding of various pre-processing or post-processing heuristics (such as non-maximum suppression) into the tracking algorithm, which can boost performance.

2. Related Work

Classic formulations of multi-object tracking focus on the data association problem of matching instance labels with temporal observations [11, 6, 7, 13]. Many approaches assume manual initialization of tracks and/or a fixed, known number of objects [14]. However, for many real-world tracking problems, such information is not available. A more general spatiotemporal grouping framework is required in which these quantities are automatically estimated from video data.

A popular approach to multi-object tracking is to run a low-level tracker to obtain "tracklets", and then stitch together tracklets using various graph-based formalisms or greedy heuristics [15, 22, 16, 1, 2]. Such graph-based algorithms include flow-networks [25], linear-programming formulations [14], and matching algorithms [15]. One of the contributions of this paper is to show that with a particular choice of low-level tracker, and a particular schedule of track instantiation, such an algorithm can be globally optimal.

We rely on an increasingly common ILP formulation of tracking [14, 25, 3, 17, 2, 18]. Such approaches restrict the set of possible object locations to a finite set of candidate windows on the pixel grid. Because standard linear programming (LP) relaxations do not scale well, many algorithms process a small set of candidates, with limited or no occlusion modeling. This can produce broken tracks, often requiring a second merging stage. Our scalable algorithm is able to process much larger problems and directly produces state-of-the-art tracks.

Our work relies heavily on the min-cost flow network introduced for temporal data association in [25]. We compare our results with the min-cost solver used in that work [12], and verified that our O(KN log N) algorithm produces identical results, and that our approximate O(KN) algorithm produces near-identical results when properly tuned.

In concurrent work, Berclaz et al. describe an O(KN log N) algorithm for multi-object tracking in [4]. It is similar in many respects, with some differences: our graph representation has a pair of nodes for each detection. This allows us to explicitly model object dynamics through transition costs, and allows for a simpler flow-based analysis. In addition, our algorithm instantiates tracks in a greedy fashion, allowing for the integration of pre-processing steps (e.g., non-max suppression) that improve accuracy. Finally, we also describe approximate O(KN) algorithms that perform near-identically in practice.

Figure 2. (Panels: 3 track estimate, 4 track estimate.) The intuition behind our optimal greedy algorithm. Assume that we are tracking the "x" location of multiple objects over time. On the left, we show the optimal estimate of 3 object trajectories. Given the knowledge that an additional object is present, one may need to adjust the existing tracks. We show that one can do this with a shortest-path/min-flow computation that pushes flow from a source to a terminal (middle). The solution can reverse flow along existing tracks to "cut and paste" segments, producing the optimal 4-track estimate (right). We further speed up this process by approximating such edits using fast dynamic programming algorithms.

3. Model

We define an objective function for multi-object tracking equivalent to that of [25]. The objective can be derived from a generative perspective by considering a Hidden Markov Model (HMM) whose state space is the set of true object locations at each frame, along with a prior that specifies likely state transitions (including births and deaths) and an observation likelihood that generates dense image features for all objects and the background.

3.1. Independent tracks

We write x for a vector-valued random variable that represents the location of a particular object, as given by a pixel position, scale, and frame number:

\[ x = (p, \sigma, t), \qquad x \in V \tag{1} \]

where V denotes the set of all spacetime locations in a video.

Prior: We write a single track as an ordered set of state vectors $T = (x_1, \ldots, x_N)$, ordered by increasing frame number. We write the collection of tracks as a set $X = \{T_1, \ldots, T_K\}$. We assume that tracks behave independently of each other, and that each follows a variable-length Markov model:

\[ P(X) = \prod_{T \in X} P(T) \quad \text{where} \quad P(T) = P_s(x_1) \left( \prod_{n=1}^{N-1} P(x_{n+1} \mid x_n) \right) P_t(x_N) \]

The dynamic model $P(x_{n+1} \mid x_n)$ encodes a smoothness prior for track location. We write $P_s(x_1)$ for the probability of a track starting at location $x_1$, and $P_t(x_N)$ for the probability of a track transitioning into a termination state from location $x_N$. If the probability of termination is low, the above prior will tend to favor longer, but fewer tracks so as to minimize the total number of terminations. If these probabilities are dependent on the spatial coordinate of x, they can model the fact that tracks tend to terminate near image borders or start near entry points such as doorways.
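As a minimal sketch of how this prior scores a set of tracks, the snippet below evaluates $-\log P(X)$ for constant start, transition, and termination probabilities. The function names and the probability values (0.1, 0.9, 0.1) are illustrative assumptions, not values taken from the paper.

```python
import math

def track_neg_log_prior(track, p_start, p_trans, p_end):
    """Return -log P(T) for one track under the variable-length Markov prior.

    track   : list of states x_1..x_N ordered by frame
    p_start : function x -> P_s(x)
    p_trans : function (x_next, x_prev) -> P(x_next | x_prev)
    p_end   : function x -> P_t(x)
    """
    cost = -math.log(p_start(track[0]))
    for prev, nxt in zip(track, track[1:]):
        cost -= math.log(p_trans(nxt, prev))
    cost -= math.log(p_end(track[-1]))
    return cost

def tracks_neg_log_prior(tracks, p_start, p_trans, p_end):
    """Tracks are independent, so -log P(X) is a sum over tracks."""
    return sum(track_neg_log_prior(t, p_start, p_trans, p_end) for t in tracks)

# Hypothetical constant probabilities, chosen only for illustration.
p_start = lambda x: 0.1
p_end = lambda x: 0.1
p_trans = lambda x_next, x_prev: 0.9

one_long = tracks_neg_log_prior([[1, 2, 3, 4]], p_start, p_trans, p_end)
two_short = tracks_neg_log_prior([[1, 2], [3, 4]], p_start, p_trans, p_end)
print(one_long, two_short)
```

With a low termination probability, explaining the same four states with one long track costs less than with two short tracks, which is the preference for fewer, longer tracks described above.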

Likelihood: We write $Y = \{y_i \mid i \in V\}$ for the set of feature vectors observed at all space-time locations in a video. For example, these could be the set of gradient histogram features that are scored by a sliding-window object detector. We now describe a likelihood model for generating Y given the set of tracks X. We make two assumptions: 1) there exists a one-to-one mapping between a putative object state x and space-time location index i, and 2) tracks do not overlap ($T_k \cap T_l = \emptyset$ for $k \neq l$). Together, both imply that a location can be "claimed" by at most one track. We write $y_x$ for the image features at location x; these features are generated from a foreground appearance model. Feature vectors for unclaimed windows are generated from a background model:

\[ P(Y \mid X) = \left( \prod_{T \in X} \prod_{x \in T} P_{fg}(y_x) \right) \prod_{i \in V \setminus X} P_{bg}(y_i) = Z \prod_{T \in X} \prod_{x \in T} l(y_x) \tag{2} \]

\[ \text{where} \quad l(y_x) = \frac{P_{fg}(y_x)}{P_{bg}(y_x)} \quad \text{and} \quad Z = \prod_i P_{bg}(y_i) \]

The likelihood is, up to a constant, only dependent on features of the windows which are part of the set of tracks. If we assume that the foreground and background likelihoods are Gaussian densities with the same covariance, $P_{fg}(y_x) = N(y_x; \mu_{fg}, \Sigma)$ and $P_{bg}(y_x) = N(y_x; \mu_{bg}, \Sigma)$, we can write the log-likelihood-ratio as a linear function ($\log l(y_x) = w \cdot y_x$), akin to a logistic regression model derived from a class-conditional Gaussian assumption. This model provides a generative motivation for the linear templates that we use as local detectors in our experiments.

3.2. Track interdependence

The above model is reasonable when the tracks do not overlap or occlude each other. However, in practice we need to deal with both occlusion and non-maxima suppression.

Occlusion: To model occlusions, we allow tracks to be composed of state vectors from non-consecutive frames, e.g., we allow the frame numbers of $x_n$ and $x_{n+1}$ to differ by up to k frames. The dynamic model $P(x_{n+1} \mid x_n)$ for such k-frame skips captures the probability of observing the given k-frame occlusion.

Non-maxima suppression: When we consider a dense set of locations V, there will be multiple tracks which score well but correspond to the same object (e.g., a good track shifted by one pixel will also have a high probability match to the appearance model). A complete generative model could account for this by producing a cluster of image features around each true object location. Inference would "explain away" evidence and enforce exclusion. In practice, the typical solution is to apply non-max suppression (NMS) as a pre-process to prune the set of candidate locations V prior to multi-object tracking [1, 6, 14, 25].

In our experiments, we also utilize NMS to prune the set V and as a heuristic for explaining away evidence. However, we show that the NMS procedure can be naturally embedded within our iterative algorithm (rather than as a pre-process). By suppressing extra detections around each track as it is instanced, we allow for the possibility that the prior can override the observation term and select a window which is not a local maximum. This allows the NMS procedure to exploit temporal coherence. The recent work of [2] makes a similar argument and adds an explicit non-overlapping constraint to their ILP, which may sacrifice tractability. We demonstrate in Sec. 6 that our simple and fast approach produces state-of-the-art results.

4. MAP Inference

The maximum a posteriori (MAP) estimate of tracks given the collection of observed features is:

\[ X^* = \arg\max_X P(X) P(Y \mid X) \tag{3} \]
\[ \phantom{X^*} = \arg\max_X \prod_{T \in X} P(T) \prod_{x \in T} l(y_x) \tag{4} \]
\[ \phantom{X^*} = \arg\max_X \sum_{T \in X} \left( \log P(T) + \sum_{x \in T} \log l(y_x) \right) \tag{5} \]

We drop the constant factor Z and take logarithms of the objective function to simplify the expression while preserving the MAP solution. The above can be re-written as an Integer Linear Program:

\[ f^* = \arg\min_f C(f) \tag{6} \]
\[ \text{with} \quad C(f) = \sum_i c_i^s f_i^s + \sum_{ij \in E} c_{ij} f_{ij} + \sum_i c_i f_i + \sum_i c_i^t f_i^t \tag{7} \]
\[ \text{s.t.} \quad f_{ij}, f_i, f_i^s, f_i^t \in \{0, 1\} \quad \text{and} \quad f_i^s + \sum_j f_{ji} = f_i = f_i^t + \sum_j f_{ij} \tag{8} \]

where $f_i$ is a binary indicator variable that is 1 when spacetime location i is included in some track. The auxiliary variables $f_{ij}$ along with the second constraint (8) ensure that at most one track claims location i, and that multiple tracks may not split or merge. With a slight abuse of notation, let us write $x_i$ for the putative state corresponding to location i:

\[ c_i^s = -\log P_s(x_i), \quad c_i^t = -\log P_t(x_i), \quad c_{ij} = -\log P(x_j \mid x_i), \quad c_i = -\log l(y_i). \tag{9} \]

These costs encode the track start, terminate, transition, and observation likelihoods respectively. We define the edge set E to span the set of permissible state transitions given by our dynamic model (Sec.3.1).

Figure 3. (Panels: frame 1, frame 2, frame 3.) The network model from [25] for three consecutive frames of video. Each space-time location $i \in V$ is represented by a pair of nodes connected by a red edge. Possible transitions between locations are modeled by blue edges. To allow tracks to start and end at any spatiotemporal point in the video, each location i is connected to both a start node s and a termination node t. All edges are directed and unit capacity. The costs are $c_i$ for red edges, $c_{ij}$ for blue edges and $c_i^s$ and $c_i^t$ for black edges.

4.1. Equivalence to network flow

To solve the above problem, we can relax the integer constraints in (8) to linear box constraints (e.g., $0 \le f_i \le 1$). This relaxation yields a unit capacity network flow problem whose constraint matrix is totally unimodular, implying that optimal solutions to the relaxed problem will still be integral [1]. In particular, assume that we knew the number of tracks in a video to be K. Let $F_K$ denote the set of flow conservation and unit capacity constraints along with the additional constraints fixing the total flow:

\[ F_K = \left\{ f_{ij}, f_i, f_i^s, f_i^t \in [0, 1], \quad f_i^s + \sum_j f_{ji} = f_i = f_i^t + \sum_j f_{ij}, \quad \sum_i f_i^s = K, \quad \sum_i f_i^t = K \right\} \]

Minimizing C(f) subject to constraints $F_K$ is an instance of a minimum cost flow problem [1, 25]. Such problems are similar to max-flow problems (commonly used in vision for solving graph-cut problems [5]), except that edges in the flow network are labeled with a cost as well as a capacity. The cost of a flow is defined to be the sum, over all edges, of the cost of each edge multiplied by the flow through that edge. Finding the MAP estimate of K tracks corresponds to finding a minimum cost flow that pushes K units of flow from the source to the sink.

Figure 3 shows an example flow network constructed from the tracking problem. Each space-time location i, or equivalently putative object state $x_i$, corresponds to a pair of nodes $(u_i, v_i)$ connected by an edge of cost $c_i$. Each transition between successive windows is represented by an edge $(v_i, u_j)$ with cost $c_{ij}$.
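The network of Figure 3 can be sketched as a plain edge list. This is a minimal illustration rather than the authors' implementation: the function names, the detection names (`a1`, `b1`, `a2`, `b2`), and all cost values are hypothetical.

```python
def build_tracking_graph(c, c_s, c_t, c_trans):
    """Return the unit-capacity edges (tail, head, cost) of the flow network.

    c       : dict i -> c_i        (observation cost of detection i)
    c_s     : dict i -> c_i^s      (cost of starting a track at i)
    c_t     : dict i -> c_i^t      (cost of terminating a track at i)
    c_trans : dict (i, j) -> c_ij  (cost of the transition i -> j)
    """
    edges = []
    for i in c:
        edges.append(("s", ("u", i), c_s[i]))        # black edge: track birth
        edges.append((("u", i), ("v", i), c[i]))     # red edge: detection i
        edges.append((("v", i), "t", c_t[i]))        # black edge: track death
    for (i, j), cost in c_trans.items():
        edges.append((("v", i), ("u", j), cost))     # blue edge: transition
    return edges

def track_cost(track, c, c_s, c_t, c_trans):
    """Cost of the s-t path that visits the detections in `track` in order."""
    total = c_s[track[0]] + c_t[track[-1]] + sum(c[i] for i in track)
    total += sum(c_trans[(i, j)] for i, j in zip(track, track[1:]))
    return total

# Toy problem: two detections in frame 1 (a1, b1) and two in frame 2 (a2, b2).
c = {"a1": -5, "b1": -3, "a2": -3, "b2": -5}
c_s = {i: 2 for i in c}
c_t = {i: 2 for i in c}
c_trans = {("a1", "a2"): 0, ("a1", "b2"): 0, ("b1", "b2"): 0}

edges = build_tracking_graph(c, c_s, c_t, c_trans)
print(len(edges), track_cost(["a1", "b2"], c, c_s, c_t, c_trans))
```

Each detection contributes three edges and each permissible transition one, so the graph size grows linearly with the number of detections when the number of transitions per detection is bounded.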
Finally, nodes s and t are introduced with edges $(s, u_i)$ corresponding to track start and edges $(v_i, t)$ for termination (with costs $c_i^s$ and $c_i^t$ respectively). All edges have unit capacity. Pushing K units of flow from s to t yields a set of K disjoint st-paths, each of which corresponds to one of the optimal tracks $T \in X$.

5. Finding min-cost flows

Zhang et al. [25] describe how to solve the above optimization problem in O(mn^2 log n) time using a push-relabel method [12], where n is the number of nodes (e.g., detection windows) in the network graph and m is the number of edges. Assuming that n and m scale linearly with the number of frames N (reasonable given a fixed number of detections per frame), the algorithm takes O(N^3 log N) to find K tracks. Furthermore, the cost of the optimal solution, $\min_{f \in F_K} C(f)$, is convex in K [25], so one can use a bisection search over K (upper-bounded by the number of detections) to find the optimal number of tracks with a total running time of O(N^3 log^2 N).

In the following, we show that one can solve the multi-object tracking problem in O(KN log N) by solving K + 1 shortest-path problems. This considerable reduction in complexity is due to two particular properties of the network in Fig.3:

1. All edges are unit capacity.
2. The network is a directed acyclic graph (DAG).

The above conditions allow one to use dynamic programming (DP) algorithms to compute shortest paths. We describe a novel DP algorithm that is necessary to construct a globally-optimal O(KN log N) algorithm. We also show that DP produces the optimal solution for K = 1 in O(N) and high-quality approximate solutions for K > 1 in O(KN). We begin by describing the optimal O(KN log N) algorithm based on successive shortest paths (introduced in Fig.2).

5.1. Successive Shortest-paths

We now describe a successive shortest path algorithm [1] for solving min-cost flow problems for DAG networks with unit-capacity links. Given a graph G with an integral flow f, define the residual graph $G_r(f)$ to be the same as the original graph except that all edges used in the flow f are reversed in direction and assigned negative their original cost. We initialize the algorithm by setting the flow f to be zero and then iterate the following two steps:

1. Find the minimum-cost path γ from s to t in $G_r(f)$.
2. If the total cost of the path C(γ) is negative, update f by pushing unit-flow along γ,

until no negative cost path can be found. Since each path has unit capacity, each iteration increases the total flow by 1 and decreases the objective by |C(γ)|. The algorithm terminates after K + 1 iterations having found a minimum cost flow. Pushing any further flow from s to t will only increase the cost.

We refer the reader to [1] for a proof of the correctness of the algorithm but give a brief outline. We say a flow f is $F_K$-feasible if it satisfies the constraint set $F_K$. A necessary and sufficient condition for f to be a minimum cost flow of size K is that it be $F_K$-feasible and that there does not exist a negative-cost directed cycle in $G_r(f)$. The successive shortest-paths algorithm above starts with an $F_0$-feasible flow and at each iteration i yields a new flow which is $F_i$-feasible. Furthermore, each step of the algorithm modifies edges along a single path and can be shown to not introduce any negative weight cycles.

Figure 4 shows example iterations of this algorithm and the resulting sequence of residual graphs. Note that the shortest path in the residual network may instance a new track and/or edit previous tracks by removing flow from them (by pushing flow through the reverse edges).

In each iteration, we need to find a shortest st-path. We would like to use Dijkstra's algorithm to compute the shortest path in O(N log N), making the overall algorithm O(KN log N) where K is the optimal number of tracks. Unfortunately, there are negative edge costs in our original network, precluding the direct application of Dijkstra's algorithm. Fortunately, one can convert any min-cost flow network to an equivalent network with non-negative costs [1]. This conversion requires computing the shortest path of every node from s in the original graph G. For general graphs with negative weights, this computation takes O(N^2) using the Bellman-Ford algorithm [1]. For DAGs, one can use an O(N) dynamic programming algorithm, which we describe below. The successive shortest path algorithm thus runs in O(KN log N) operations and returns the global minimum for our tracking problem (Equation 3).

5.2. Dynamic Programming Solution for K = 1

We now present an O(N) dynamic programming (DP) algorithm for computing the shortest path from s to every node. We will also show that this algorithm solves the min cost flow problem for K = 1.
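The successive shortest-path procedure of Section 5.1, including the conversion to non-negative costs, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: nodes are assumed to be numbered in topological order, the conversion uses node potentials (reduced costs), and the function name, node numbering, and toy costs are all hypothetical.

```python
import heapq

def successive_shortest_paths(n, edges, s, t):
    """Min-cost flow on a unit-capacity DAG, stopping when no s-t path is negative.

    n     : number of nodes, numbered 0..n-1 in topological order
    edges : list of (u, v, cost) with u < v, each of unit capacity
    Returns (total_cost, num_paths, flow) with flow[e] in {0, 1}.
    """
    INF = float("inf")
    head, cost, cap = [], [], []
    adj = [[] for _ in range(n)]
    for u, v, c in edges:  # arc 2e is edge e, arc 2e+1 is its reversal
        adj[u].append(len(head)); head.append(v); cost.append(c); cap.append(1)
        adj[v].append(len(head)); head.append(u); cost.append(-c); cap.append(0)

    # Initial potentials: shortest distances from s by one DP sweep over the DAG.
    pot = [INF] * n
    pot[s] = 0
    for u in range(n):
        if pot[u] == INF:
            continue
        for a in adj[u]:
            if cap[a] and pot[u] + cost[a] < pot[head[a]]:
                pot[head[a]] = pot[u] + cost[a]

    total, num_paths = 0, 0
    while True:
        # Dijkstra on the residual graph; reduced costs are non-negative.
        dist, prev_arc = [INF] * n, [-1] * n
        dist[s] = 0
        heap = [(0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for a in adj[u]:
                v = head[a]
                if cap[a] and d + cost[a] + pot[u] - pot[v] < dist[v]:
                    dist[v] = d + cost[a] + pot[u] - pot[v]
                    prev_arc[v] = a
                    heapq.heappush(heap, (dist[v], v))
        if dist[t] == INF:
            break
        path_cost = dist[t] + pot[t] - pot[s]  # true (unreduced) path cost
        if path_cost >= 0:
            break                              # more flow would only add cost
        v = t
        while v != s:                          # push one unit along the path
            a = prev_arc[v]
            cap[a] -= 1
            cap[a ^ 1] += 1
            v = head[a ^ 1]
        for v in range(n):                     # keep reduced costs non-negative
            if dist[v] < INF:
                pot[v] += dist[v]
        total += path_cost
        num_paths += 1
    flow = [cap[2 * e + 1] for e in range(len(edges))]
    return total, num_paths, flow

# Toy network: s=0, t=9, detection k has nodes (U[k], V[k]); costs are made up.
S, T = 0, 9
U = {"a1": 1, "b1": 3, "a2": 5, "b2": 7}
V = {k: u + 1 for k, u in U.items()}
c = {"a1": -5, "b1": -3, "a2": -3, "b2": -5}
edges = []
for k in U:
    edges += [(S, U[k], 2), (U[k], V[k], c[k]), (V[k], T, 2)]
for i, j in [("a1", "a2"), ("a1", "b2"), ("b1", "b2")]:
    edges.append((V[i], U[j], 0))

total, num_paths, flow = successive_shortest_paths(10, edges, S, T)
print(total, num_paths)
```

In this toy problem the first path instances the track a1 to b2; the second path travels backward along the a1 to b2 transition, which splits that track and yields the two tracks a1 to a2 and b1 to b2. This is the kind of edit illustrated in Figure 4.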
Because each edge in the network is of unit capacity, the minimum cost unit flow must correspond to the shortest path from node s to t. Because the original network graph is a DAG, one can construct a partial ordering of nodes and use DP to compute shortest paths by sweeping from the first to last frame. This is similar to DP algorithms for tracking but augmented to estimate both the birth and death time of a track.

Assume that nodes are ordered in time, and let cost(i) represent the minimum cost of a track passing through node i. We initialize cost(i) for detections in the first frame to be $cost(i) = c_i + c_i^s$.

Figure 4. Illustration of the successive shortest path algorithm. (a) The tracking problem modeled as a graph as described in Fig.3. The algorithm should send a given amount of flow from the source node s to the terminal t. (b) One unit of flow $f_1$ is passed through the shortest path (in red) from source to terminal. (c) The residual graph $G_r(f_1)$ produced by eliminating the shortest path and adding edges (in green) with unit capacity and negative cost in the opposite direction. (d) The shortest path found in the residual graph. In this example, this path uses previously added edges, pushing flow backwards and editing the previously instanced tracks. (e) Residual graph after passing two units of flow. At this point, no negative cost paths exist and so the algorithm terminates and returns the two tracks highlighted in (f). Note that the algorithm ultimately splits the track instanced in the first step in order to produce the final optimal set of tracks. In this example only one split happened in an iteration. It is possible for a shortest path to use edges from two or more previously instanced tracks, but this is very rare in practice. Our dynamic programming algorithm cannot resolve any splitting since the residual graph has cycles; however, the 2-pass dynamic programming algorithm can resolve the situation when any new shortest path splits at most one previously instanced track.

We can then recursively compute the cost in successive frames as:

\[ cost(i) = c_i + \min(\pi, c_i^s) \quad \text{where} \quad \pi = \min_{j \in N(i)} \left( c_{ji} + cost(j) \right) \tag{10} \]

where N(i) is the set of detections from the previous k frames that can transition to detection i.
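The recursion in (10) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name, the `preds` predecessor table, and the toy costs are assumptions, and back-pointers are stored so that the best track can be read off afterwards.

```python
def shortest_track(order, c, c_s, c_t, preds):
    """Single best track (K = 1) by a forward DP sweep over detections.

    order : detection ids sorted by frame
    c, c_s, c_t : dicts of observation, birth, and death costs
    preds : dict i -> list of (j, c_ji) for detections j that may precede i
    Returns (total_cost, track) where track is a list of detection ids.
    """
    cost, back = {}, {}
    for i in order:
        best, arg = c_s[i], None            # option 1: a track is born at i
        for j, c_ji in preds.get(i, ()):    # option 2: extend a track from j
            if cost[j] + c_ji < best:
                best, arg = cost[j] + c_ji, j
        cost[i] = c[i] + best
        back[i] = arg
    # The track may die at any node: minimise cost(i) + c_i^t over i.
    end = min(order, key=lambda i: cost[i] + c_t[i])
    total = cost[end] + c_t[end]
    track = [end]
    while back[track[-1]] is not None:      # follow cached argmins backwards
        track.append(back[track[-1]])
    return total, track[::-1]

# Toy problem with made-up costs: frame 1 has a1, b1; frame 2 has a2, b2.
order = ["a1", "b1", "a2", "b2"]
c = {"a1": -5, "b1": -3, "a2": -3, "b2": -5}
c_s = {i: 2 for i in order}
c_t = {i: 2 for i in order}
preds = {"a2": [("a1", 0)], "b2": [("a1", 0), ("b1", 0)]}

best_cost, best_track = shortest_track(order, c, c_s, c_t, preds)
print(best_cost, best_track)
```

Each detection is visited once and each permissible transition is examined once, so the sweep is linear in the size of the graph.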
The cost of the optimal track ending at node i is then $cost(i) + c_i^t$, and the overall shortest path is computed by taking a min over i. By caching the argmin values at each node, we can reconstruct the shortest path to each node in a single backward sweep.

5.3. Approximate DP solution for K > 1

We now propose a simple greedy algorithm to instance a variable, unknown number of disjoint, low-cost tracks. Start with the original network-flow graph:

1. Find the shortest path from s to t using DP.
2. If the cost of the path is negative, remove nodes on the path and repeat.

The above algorithm performs K + 1 iterations of DP to discover K tracks; the last instanced track is ignored since it increases the overall cost. Its running time is O(KN). At each iteration, we have obtained a feasible (but not necessarily minimum cost) k-unit flow. The sub-optimality lies in the fact that the above algorithm cannot adjust any previously instanced tracks based on the demand to produce additional tracks. In successive stages, it operates on a subset of the original graph rather than the residual graph used in the successive shortest paths algorithm. Unfortunately, dynamic programming cannot be directly applied to the residual graph $G_r(f)$ since the residual graph is no longer a DAG (Fig.4-(c)).

5.4. Approximate 2-pass DP solution for K > 1

We now describe a generalization of our DP-based algorithm from Section 5.3 that can also instance new tracks while performing "small" edits of previously instanced tracks. We observe that most of the time the shortest residual path does not make large edits on previous tracks. We use the same algorithm from Section 5.1, but perform an approximate shortest-path computation using a 2-pass DP algorithm rather than Dijkstra's algorithm. We perform a forward pass of DP as in (10), but on $G_r(f)$ rather than G, with cost(i) defined as the best forward-progressing path from the source to node i (ignoring reversed edges). We then use the costs as initial values for a backward pass starting from the last frame, defining N(i) to be the set of nodes connected through reverse edges to i. After this pass, cost(i) is the cost of the best forward and backward progressing path ending at i. One could add additional passes, but we find experimentally that two passes are sufficient for good performance while saving O(log N) operations over Dijkstra's approach.

5.5. Caching

Our DP algorithms repeatedly perform computations on a series of reduced or residual graphs. Much of these computations can be cached. Consider the DP computations required for the algorithm from Section 5.3. Once a track is instanced, cost(i) values for nodes whose shortest paths intersect that track are no longer valid, and it is only this small number of nodes that need to be re-evaluated in the next iteration. This set can be marked using the following fact: any paths that intersect at some node must share the same birth node. Each node can be labeled with its birth node by propagating a birth ID during message-passing in DP. We then only need to recompute cost(i) for nodes that have the same birth node as a newly instanced track. In our experiments, this decreased computation time by three orders of magnitude.

Figure 5. We show the results of our algorithm, including estimated track births and deaths, on the Caltech Pedestrian dataset [8]. We show typical results on the ETHMS dataset in Fig.1.

6. Experimental Results

Datasets: Most benchmarks for multi-object tracking (e.g., PETS [24]) are designed for stationary cameras. We are interested in moving camera applications, and so use the Caltech Pedestrian dataset [8] and ETHMS dataset [9] to evaluate our algorithms. The Caltech dataset was captured by a camera installed on a moving car. It contains 71 videos of roughly 1800 frames each, captured at 30 frames per second. Since the test set contains held-out labels, we evaluate ourselves using all annotated pedestrians on the training set. The ETHMS dataset contains footage of a busy sidewalk as seen by a camera mounted on a child stroller. Although this dataset contains both left and right views to facilitate stereo, we use only the left view in our experiments. The dataset contains four videos of roughly 1000 frames each, captured at 14 fps. Both datasets include bounding box annotations for people, while Caltech also provides track IDs. We manually annotated IDs on a portion of ETHMS. In order to compare our results with previous work, we use the same ETHMS video sequence as [25] with frames and ignore detections smaller than 24 pixels as they did.

Setup: We ran an "out-of-the-box" pre-trained part-based HOG pedestrian detector [10] with a conservative NMS threshold, generating around 1000 detections per frame of each video. We set the log-likelihood ratio (the local cost $c_i$) of each detection to be the negative score of the linear detector (the distance from the decision boundary of an SVM). We use a bounded-velocity dynamic model: we define the transition cost $c_{ij}$ to be 0, but only connect candidate windows across consecutive frames that spatially overlap. We set birth and death costs ($c_i^s$, $c_i^t$) to be 10. We experimented with applying an additional NMS step within our greedy algorithm. We also experimented with occlusion modeling by adding transitions which skip over k frames, with k up to 10.
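The greedy procedure of Section 5.3 can be sketched as repeated DP sweeps with node removal. This is a minimal sketch without the caching of Section 5.5 or any NMS step; the function name, data layout, and toy costs are hypothetical and chosen only so that the example is easy to check by hand.

```python
def greedy_dp_tracks(order, c, c_s, c_t, preds):
    """Instance tracks one at a time with DP until the best path is not negative.

    order : detection ids sorted by frame
    c, c_s, c_t : dicts of observation, birth, and death costs
    preds : dict i -> list of (j, c_ji) for detections j that may precede i
    Returns (total_cost, tracks).
    """
    alive = set(order)
    tracks, total = [], 0
    while alive:
        cost, back = {}, {}
        for i in order:                       # forward DP sweep, as in (10)
            if i not in alive:
                continue
            best, arg = c_s[i], None
            for j, c_ji in preds.get(i, ()):
                if j in alive and cost[j] + c_ji < best:
                    best, arg = cost[j] + c_ji, j
            cost[i] = c[i] + best
            back[i] = arg
        end = min(cost, key=lambda i: cost[i] + c_t[i])
        path_cost = cost[end] + c_t[end]
        if path_cost >= 0:                    # this track would raise the cost
            break
        track = [end]
        while back[track[-1]] is not None:
            track.append(back[track[-1]])
        alive.difference_update(track)        # remove the nodes on the path
        tracks.append(track[::-1])
        total += path_cost
    return total, tracks

# Toy problem with made-up costs: frame 1 has a1, b1; frame 2 has a2, b2.
order = ["a1", "b1", "a2", "b2"]
c = {"a1": -5, "b1": -3, "a2": -3, "b2": -5}
c_s = {i: 2 for i in order}
c_t = {i: 2 for i in order}
preds = {"a2": [("a1", 0)], "b2": [("a1", 0), ("b1", 0)]}

greedy_cost, greedy_tracks = greedy_dp_tracks(order, c, c_s, c_t, preds)
print(greedy_cost, greedy_tracks)
```

On this toy input the greedy procedure keeps the single track a1 to b2 at cost -6, whereas enumerating the disjoint alternatives by hand shows that the pair of tracks a1 to a2 and b1 to b2 costs -8. The gap arises because the first track is never edited, which is the source of sub-optimality described in Section 5.3.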

Figure 6. (Axes: cost versus number of tracks. Legend: DP (min 1263.6404 at 444), 2-pass DP (min 12.0164 at 516), successive shortest path (min 1284.1071 at 522).) Cost vs. iteration number for all three algorithms on the Caltech dataset. The inset shows that our 2-pass DP algorithm produces tracks whose cost is close to the optimum while being orders of magnitude faster.

Figure 7. (Axes: detection rate versus false positives per frame. Legend: DP, SSP, HOG on the left; DP, SSP, DP+NMS, HOG on the right.) Detection rate versus FPPI on the Caltech dataset [8] (left) and ETHMS dataset [9] (right). We compare our approximate 1-pass DP algorithm with the optimal successive shortest path (SSP) algorithm and a HOG-detector baseline. The DP performs as well as or even better than the shortest path algorithm, while being orders of magnitude faster. We also show that by suppressing overlapping detections after each track is instanced (DP+NMS), we can further improve performance.

Scoring criteria: We use detection accuracy (as measured by detection rate and false positives per frame) as our primary evaluation criterion, as it allows us to compare with a wide body of related work on these datasets. To directly score tracker accuracy, various other criteria (such as track fragmentation, identity switching, etc.) have been proposed [21, 20, 25]. We also use track identity to evaluate our algorithms below.

Approximation quality: We have described three different algorithms for solving the minimum cost flow problem. Figure 6 shows the flow cost, i.e., the objective function, versus iteration number for all three algorithms on the Caltech dataset. The DP algorithm follows the successive shortest path (SSP) algorithm for many iterations, but eventually it is necessary to edit a previously instanced track (as in Figure 4) and the greedy DP algorithm begins to make suboptimal choices. However, DP and SSP do not deviate much before reaching the minimum cost, and the 2-pass DP, which allows for a single edit, follows SSP quite closely. The figure inset shows a close look at the cost at the minimum.
Since the 2-pass algorithm can split at most one track in each iteration and it is very rare to see two splits at the same iteration, the cost value for the 2-pass DP algorithm is very close to the optimum one.

Rather than scoring the cost function, we can directly compare algorithms using track accuracy. Figure 7 shows detection rate versus FPPI for the baseline detector, DP, and SSP algorithms. These figures show that DP and SSP are similar in accuracy, with DP performing even better in some cases. We suspect the SSP algorithm produces (overly) short tracks because the 1st-order Markov model enforces a geometric distribution over track length. The approximate DP algorithm inadvertently produces longer tracks (that better match the ground truth distribution of lengths) because previously instanced tracks are never cut or edited. We henceforth evaluate our one-pass DP algorithm in the subsequent experiments. We also present additional diagnostic experiments on the ETHMS data, since it contains on average more objects than Caltech.

Length of allowable occlusion    % of windows with ID errors
1                                14.6
5                                13.
10.3

Table 1. Evaluating track label error as a function of the length of the allowable occlusion. We show results for our DP algorithm applied to a portion of the ETHMS dataset given ideal detected windows. Our DP algorithm scales linearly with the length of allowable occlusions. By allowing for longer occlusions (common in this dataset), the % of windows with correct track labels significantly increases.

Track identities: We evaluate track identities on the ETHMS dataset by using our tracker to compute track labels for ground-truth bounding boxes. This is equivalent to running our tracker on an ideal object detector with zero missed detections and false positives. Given a correspondence between estimated track labels and ground-truth track labels, the misclassification rate is the fraction of bounding boxes with incorrect labels. We compute the correspondence that minimizes this error by bipartite matching [15]. We found occlusion modeling to be crucial for maintaining track identities. Our algorithm can report tracks with k-frame occlusions by adding in transitions between space-time windows spaced k frames apart.
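The label-error measure described above can be sketched as follows. This is an illustration under assumptions: for clarity it searches all one-to-one label correspondences by brute force, which is only practical for a handful of tracks and stands in for the bipartite matching of [15]; the function name and the sample labels are hypothetical.

```python
from itertools import permutations

def label_error_rate(est, gt):
    """Fraction of boxes with a wrong label under the best one-to-one label mapping.

    est, gt : equal-length lists giving, for each bounding box, its estimated
              and ground-truth track label.
    """
    est_ids = sorted(set(est))
    gt_ids = sorted(set(gt))
    # Count how many boxes carry each (estimated, ground-truth) label pair.
    count = {}
    for e, g in zip(est, gt):
        count[(e, g)] = count.get((e, g), 0) + 1
    # Pad with dummy labels so every estimated label can be assigned something.
    if len(est_ids) > len(gt_ids):
        gt_ids = gt_ids + [None] * (len(est_ids) - len(gt_ids))
    best = 0
    for perm in permutations(gt_ids, len(est_ids)):
        agree = sum(count.get((e, g), 0) for e, g in zip(est_ids, perm))
        best = max(best, agree)
    return 1 - best / len(est)

# Six boxes: the tracker switches from label "a" to "b" one box too early.
gt = [1, 1, 1, 2, 2, 2]
est = ["a", "a", "b", "b", "b", "b"]
err = label_error_rate(est, gt)
print(err)
```

For realistic numbers of tracks, the permutation loop would be replaced by a polynomial-time assignment solver applied to the same agreement counts.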
Our DP algorithm scales linearly with k, and so we can readily model long 10-frame occlusions (Table 1). This greatly increases the accuracy of track labels on this data because such occlusions are common when nearby people pass the camera, occluding people further away. This result implies that, given ideal local detectors, our tracking algorithm produces track identities with 90% accuracy.

Algorithm                                            Detection rate    False positives per frame
[9] stereo algorithm                                 47                1.5
[25] algorithm 1                                     68.3              0.85
[25] algorithm 2 with occlusion handling             70.4              0.7
[23] two-stage algorithm with occlusion handling     75.2              0.3
Our DP                                               76.6              0.85
Our DP+NMS                                           7.8               0.85

Table 2. Our algorithm's performance compared to the previous state-of-the-art on the ETHMS dataset. Please see the text for further discussion.

NMS-within-the-loop: In Figure 7, we use the ETHMS dataset to examine the effect of adding an NMS step within our iterative greedy algorithm. When applying the pedestrian detector of [10], we use their default NMS algorithm as a pre-process to suppress detections that overlap other higher-scoring detections by some threshold. After instancing a track during the DP algorithm, we suppress remaining windows that overlap the instanced tracks using a lower threshold. This suppression is more reliable than the initial one because tracked windows are more likely to be true positives. Our results outperform all previously published results on this data (Table 2).

Running time: For the -frame ETHMS dataset, MATLAB's LP solver does not converge, the commercial min-cost-flow solver used in [23] takes 5 seconds, while our MATLAB DP code takes 0.5 seconds.

7. Conclusion

We have described a family of efficient, greedy but globally optimal algorithms for solving the problem of multi-object tracking, including estimating the number of objects and their track births and deaths. Our algorithms are based on a novel analysis of a min-cost flow framework for tracking. Our greedy algorithms allow us to embed pre-processing steps such as NMS within our tracking algorithm. Our scalable algorithms also allow us to process large input sequences and model long occlusions, producing state-of-the-art results on benchmark datasets.

Acknowledgements: Funding for this research was provided by NSF Grants 0540 and 0812428, and ONR-MURI Grant N00014-10-1-033.

References

[1] R. Ahuja, T. Magnanti, and J. Orlin. Network flows: Theory, Algorithms, and Applications. Prentice Hall, 2008.
[2] A. Andriyenko and K. Schindler. Globally optimal multi-target tracking on a hexagonal lattice. In ECCV, 2010.
[3] J. Berclaz, F. Fleuret, and P. Fua. Multiple object tracking using flow linear programming. In Performance Evaluation of Tracking and Surveillance (PETS-Winter), 2009 Twelfth IEEE International Workshop on, pages 1–8. IEEE, 2010.
[4] J. Berclaz, F. Fleuret, E. Türetken, and P. Fua. Multiple Object Tracking using K-Shortest Paths Optimization. IEEE Transactions on PAMI, Accepted for publication in 2011.
[5] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE PAMI, 2001.
[6] Y. Cai, N. de Freitas, and J. Little. Robust visual tracking for multiple targets. Lecture Notes in Computer Science, 354:107, 2006.
[7] W. Choi and S. Savarese. Multiple target tracking in world coordinate with single, minimally calibrated camera. ECCV 2010, pages 553 5, 2010.
[8] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In IEEE CVPR, June 2009.
[9] A. Ess, B. Leibe, and L. Van Gool. Depth and appearance for mobile scene analysis. In ICCV, 2007.
[10] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. IEEE CVPR, 2008.
[11] T. Fortmann, Y. Bar-Shalom, and M. Scheffe. Sonar tracking of multiple targets using joint probabilistic data association. IEEE Journal of Oceanic Engineering, 8(3):173–184, 1983.
[12] A. Goldberg. An efficient implementation of a scaling minimum-cost flow algorithm. Journal of Algorithms, 22(1):1–29, 1997.
[13] M. Isard and J. MacCormick. BraMBLe: A Bayesian multiple-blob tracker. In ICCV, 2001.
[14] H. Jiang, S. Fels, and J. Little. A linear programming approach for multiple object tracking. In IEEE CVPR, 2007.
[15] H. Kuhn, P. Haas, I. Ilyas, G. Lohman, and V. Markl. The Hungarian method for the assignment problem. Masthead, 23(3):151 210, 13.
[16] S. K. V. G. L. Leibe, B. Coupled detection and trajectory estimation for multi-object tracking. ICCV 2007.
[17] Y. Ma, Q. Yu, and I. Cohen. Target tracking with incomplete detection. CVIU, 2009.
[18] S. Pellegrini, A. Ess, and L. V. Gool. Improving data association by joint modeling of pedestrian trajectories and groupings. In ECCV, 2010.
[19] A. Perera, C. Srinivas, A. Hoogs, G. Brooksby, and W. Hu. Multi-object tracking through simultaneous long occlusions and split-merge conditions. In IEEE CVPR, volume 1, 2006.
[20] A. G. A. Perera, A. Hoogs, C. Srinivas, G. Brooksby, and W. Hu. Evaluation of algorithms for tracking multiple objects in video. In AIPR, page 35, 2006.
[21] K. Smith, D. Gatica-Perez, J. Odobez, and S. Ba. Evaluating multi-object tracking. In CVPR Workshops. IEEE, 2005.
[22] C. Stauffer. Estimating tracking sources and sinks. In Proc. Event Mining Workshop. Citeseer.
[23] J. Xing, H. Ai, and S. Lao. Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses. In IEEE CVPR, June 2009.
[24] D. Young and J. Ferryman. PETS metrics: On-line performance evaluation service. In Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), pages 317 4, 2005.
[25] L. Zhang, Y. Li, and R. Nevatia. Global data association for multi-object tracking using network flows. In CVPR, 2008.