Everybody needs somebody: Modeling social and grouping behavior on a linear programming multiple people tracker

Laura Leal-Taixé, Gerard Pons-Moll and Bodo Rosenhahn
Institute for Information Processing (TNT), Leibniz University Hannover, Germany
leal@tnt.uni-hannover.de

This work was partially funded by the German Research Foundation, DFG projects RO 2497/7- and RO 2524/2-.

Abstract

Multiple people tracking consists in detecting the subjects at each frame and matching these detections to obtain full trajectories. In semi-crowded environments, pedestrians often occlude each other, making tracking a challenging task. Most tracking methods assume that each pedestrian's motion is independent, thereby ignoring the complex and important interaction between subjects. In this paper, we present an approach which includes the interaction between pedestrians in two ways: first, by considering social and grouping behavior, and second, by using a global optimization scheme to solve the data association problem. Results on three challenging publicly available datasets show that our method outperforms state-of-the-art tracking systems.

Figure 1: Including social and grouping behavior in the network flow graph. (a) Constant velocity assumption. (b) Avoidance forces. (c) Group attraction forces.

1. Introduction

Multiple people tracking is a key problem for many computer vision tasks, such as surveillance, animation or activity recognition. In crowded environments occlusions and false detections are common, and although there have been substantial advances in recent years, tracking is still a challenging task. Tracking is often divided into two steps: detection and data association. Researchers have presented improvements on the object detector [5, 26] as well as on the optimization techniques [4, 6], and specific algorithms have even been developed for tracking in crowded scenes [1]. Though each object can be tracked separately, recent works have shown that tracking objects jointly and taking their interaction into consideration can give much better results in complex scenes. Current research is mainly focused on two aspects to exploit the interaction between pedestrians: the use of a global optimization strategy [4, 28] and a social motion model [22, 27]. The focus of this paper is to marry the concepts of global optimization and social and grouping behavior to obtain a robust tracker able to work in crowded scenarios.

1.1. Related work

The optimization strategy deals with the data association problem, which is usually solved on a frame-by-frame basis or one track at a time. Several methods can be used, such as Markov Chain Monte Carlo (MCMC) [15] or inference in Bayesian networks [20]. In [3] an efficient approximative Dynamic Programming (DP) scheme is presented, in which trajectories are estimated one after the other, which does not guarantee a global optimum for all trajectories. Recent works show that global optimization can be more reliable in crowded scenes, as it solves the matching problem jointly for all tracks. The multiple object tracking problem is defined as a linear constrained optimization flow problem, and Linear Programming (LP) is commonly used to find the global optimum. The idea was first used for people tracking in [12], although this method needs to know the number of targets to track a priori, which limits its application in real tracking situations. Other works formulate the tracking problem as a maximum flow [4] or a minimum cost problem [24, 28], both efficiently solved using LP and with a far superior performance when compared with DP [3].

Most tracking systems work with the assumption that the motion model for each target is independent. This simplifying assumption is especially problematic in crowded scenes. In order to avoid collisions and reach the chosen destination at the same time, a pedestrian follows a series of social rules or social forces. These have been defined in what is called the Social Force Model (SFM) [11], which has been used for abnormal crowd behavior detection [19] and crowd simulation [21], and has only recently been applied to multiple people tracking: in [25], an energy minimization approach is used to predict the future position of each pedestrian considering all the terms of the social force model. In [22] and [17], the social forces are included in the motion model of the Kalman or Extended Kalman filter. In [10] a method is presented to detect small groups of people in a crowd, but it is only recently that grouping behavior has been included in a tracking framework [7, 23, 27]. In [23] groups are included in a graphical model which contains cycles and, therefore, Dual-Decomposition is needed to find the solution, which is computationally much more expensive than using Linear Programming. Moreover, the results presented in [23] are only for short time windows. On the other hand, the formulations of [7, 27] are predictive by nature and therefore too local and unable to deal with trajectory changes (e.g. when people meet and stop to talk).

Social behavior models have only been introduced within a predictive framework, which is suboptimal due to the recursive nature of filtering. Therefore, in contrast to previous works, we propose to include social and grouping models in a global optimization framework, which allows us to better estimate the true maximum a-posteriori probability of the trajectories.

1.2. Contributions

We present a novel approach for multiple people tracking which takes into account the interaction between pedestrians in two ways: first, using global optimization for data association and second, including social as well as grouping behavior.
The key insight is that people plan their trajectories in advance in order to avoid collisions; therefore, a graph model which takes into account future and past frames is a natural framework in which to include social and grouping behavior. We formulate multiple object tracking as a minimum-cost network flow problem, and present a new graph model which yields better results than existing global optimization approaches. The social force model (SFM) and grouping behavior (GR) are included in an efficient way without altering the linearity of the problem. Results on several challenging public datasets show the improvement of the tracking results in crowded environments.

2. Multiple people tracking

Tracking is commonly divided into two steps: object detection and data association. First, the objects are detected in each frame of the sequence and second, the detections are matched to form complete trajectories. In this section we define the data association problem and describe how to convert it to a minimum-cost network flow problem, which can be efficiently solved using Linear Programming. The idea is to build a graph in which the nodes represent the pedestrian detections. These nodes are fully connected to past and future observations by edges, which determine the relation between two observations with a cost. Thereby, the matching problem is equivalent to a minimum-cost network flow problem: finding the optimal set of trajectories is equivalent to sending flow through the graph so as to minimize the cost.

2.1. Problem statement

Let $O = \{o_i\}$ be a set of object detections with $o_i = (p_i, t_i)$, where $p_i = (x, y, z)$ is the 3D position and $t_i$ is the time stamp. A trajectory is defined as a list of ordered object detections $T_k = \{o_{k_1}, o_{k_2}, \ldots, o_{k_N}\}$, and the goal of multiple object tracking is to find the set of trajectories $\mathcal{T} = \{T_k\}$ that best explains the detections. This is equivalent to maximizing the a-posteriori probability of $\mathcal{T}$ given the set of detections $O$.
Assuming detections are conditionally independent, the objective function is expressed as:

$$\mathcal{T}^* = \arg\max_{\mathcal{T}} \prod_i P(o_i \mid \mathcal{T})\, P(\mathcal{T}) \qquad (1)$$

$P(o_i \mid \mathcal{T})$ is the likelihood of the detection. In order to reduce the space of $\mathcal{T}$, we make the assumption that the trajectories cannot overlap (i.e., a detection cannot belong to two trajectories), but unlike [28], we do not define the motion of each subject to be independent; therefore, we deal with a much larger search space. We extend this space by including the following dependencies for each trajectory $T_k$:

- Constant velocity assumption: the observation $o_{k_i} \in T_k$ depends on past observations $[o_{k_{i-1}}, o_{k_{i-2}}]$.
- Grouping behavior: if $T_k$ belongs to a group, the set of members of the group $T_{k,GR}$ has an influence on $T_k$.
- Avoidance term: $T_k$ is affected by the set of trajectories $T_{k,SFM}$ which are close to $T_k$ at some point in time and do not belong to the same group as $T_k$.

The first and third dependencies are grouped into the SFM term. The sets $T_{k,SFM}$ and $T_{k,GR}$ are disjoint, i.e., a pedestrian can have an attractive effect or a repulsive effect on another pedestrian, but not both. Therefore, we decompose $P(\mathcal{T})$ as:

$$P(\mathcal{T}) = \prod_{T_k \in \mathcal{T}} P(T_k \cap T_{k,SFM} \cap T_{k,GR}) = \prod_{T_k \in \mathcal{T}} P(T_{k,SFM} \mid T_k)\, P(T_{k,GR} \mid T_k)\, P(T_k) \qquad (2)$$

where the trajectories are represented by a Markov chain:

$$P(\mathcal{T}) = \prod_{T_k \in \mathcal{T}} P_{in}(o_{k_1}) \cdots P(o_{k_i} \mid o_{k_{i-1}})\, P_{k,SFM}(o_{k_i,SFM} \mid o_{k_{i-1}}, o_{k_i})\, P_{k,GR}(o_{k_i,GR} \mid o_{k_{i-1}}, o_{k_i}) \cdots P_{out}(o_{k_n}) \qquad (3)$$

where $P_{k,SFM}$ evaluates how well the social rules are kept if $o_{k_{i-1}}$ is matched to $o_{k_i}$, and $P_{k,GR}$ describes how well the structure of the group is kept.

2.2. Tracking with Linear Programming

We linearize the objective function by defining a set of flow flags $f_{i,j} \in \{0, 1\}$ which indicate whether an edge $(i, j)$ is in the path of a trajectory or not. In a minimum cost network flow problem, the objective is to find the values of the variables that minimize the total cost of the flows over the network. Defining the costs as negative log-likelihoods, and combining Equations (1), (2) and (3), the following objective function is obtained:

$$\mathcal{T}^* = \arg\min_{\mathcal{T}} \sum_i -\log P(o_i \mid \mathcal{T}) + \sum_{T_k \in \mathcal{T}} \left[ -\log P(T_k) - \log P(T_{k,SFM} \mid T_k) - \log P(T_{k,GR} \mid T_k) \right]$$

$$= \arg\min_{\mathcal{T}} \sum_i C_{in,i} f_{in,i} + \sum_i C_{i,out} f_{i,out} + \sum_{i,j} \left( C_{i,j} + C_{SFM,i,j} + C_{GR,i,j} \right) f_{i,j} + \sum_i C_i f_i$$

subject to the following constraints:

- Edge capacities: we assume that each detection can only correspond to one trajectory; therefore, the edge capacities have an upper bound of $u_{ij} = 1$ and:

$$f_{in,i} + f_i \le 1, \qquad f_{i,out} + f_i \le 1 \qquad (4)$$

- Flow conservation at the nodes:

$$f_{in,i} + f_i = \sum_j f_{i,j}, \qquad \sum_j f_{j,i} = f_{i,out} + f_i \qquad (5)$$

To map this formulation into a cost-flow network, we define $G = (N, E)$ to be a directed network with a cost $C_{i,j}$ and a capacity $u_{ij}$ associated with every edge $(i, j) \in E$. An example of such a network is shown in Figure 2; it contains two special nodes, the source $s$ and the sink $t$; all flow that goes through the graph starts at the $s$ node and ends at the $t$ node. Thereby, each flow represents a trajectory $T_k$, and the path that each flow follows indicates which observations belong to each of the trajectories.

Figure 2: Example of a graph with the special source $s$ and sink $t$ nodes and 6 detections, each represented by two nodes: the beginning $b_i$ and the end $e_i$.

The three types of edges present in the graphical model, and the cost for each type, are detailed in the remainder of this section.
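To make the flow formulation concrete, the following minimal sketch builds the graph of Figure 2 for a handful of detections and extracts the trajectories. It is not the authors' implementation: the paper solves the problem with the Simplex algorithm, whereas this sketch swaps in successive shortest paths with Bellman-Ford so that it needs only the Python standard library. The function name `min_cost_trajectories`, the input layout, and the splitting of each node into an `_in`/`_out` pair (one way to enforce the unit capacities of Equation (4)) are all assumptions made for illustration, and the edge costs are taken as given inputs.

```python
import math

def min_cost_trajectories(num_det, det_cost, link_cost):
    """Return trajectories (lists of detection indices) of the min-cost flow.

    det_cost[i]       -- cost C_i of detection edge (b_i, e_i), usually negative
    link_cost[(i, j)] -- cost of link edge (e_i, b_j), with j in a later frame
    """
    S, T = "s", "t"
    edges = []  # [tail, head, residual capacity, cost]; edge k^1 reverses edge k

    def add(u, v, cost):
        edges.append([u, v, 1, cost])
        edges.append([v, u, 0, -cost])

    for i in range(num_det):
        add(S, ("e_in", i), 0.0)                     # entrance edge, C_in = 0
        add(("b_out", i), T, 0.0)                    # exit edge, C_out = 0
        add(("b_out", i), ("e_in", i), det_cost[i])  # detection edge
        add(("b_in", i), ("b_out", i), 0.0)          # unit capacity through b_i
        add(("e_in", i), ("e_out", i), 0.0)          # unit capacity through e_i
    for (i, j), cost in link_cost.items():
        add(("e_out", i), ("b_in", j), cost)         # link edge

    nodes = {S, T} | {(kind, i) for i in range(num_det)
                      for kind in ("b_in", "b_out", "e_in", "e_out")}
    while True:
        # Bellman-Ford on the residual graph (costs may be negative).
        dist = dict.fromkeys(nodes, math.inf)
        dist[S] = 0.0
        prev = {}
        for _ in range(len(nodes) - 1):
            changed = False
            for k, (u, v, cap, cost) in enumerate(edges):
                if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                    dist[v], prev[v], changed = dist[u] + cost, k, True
            if not changed:
                break
        if dist[T] >= -1e-12:   # one more trajectory would not lower the cost
            break
        v = T
        while v != S:           # push one unit of flow along the cheapest path
            k = prev[v]
            edges[k][2] -= 1
            edges[k ^ 1][2] += 1
            v = edges[k][0]

    # Saturated forward edges carry the flow; follow them from each entrance.
    used = [(u, v) for k, (u, v, cap, _) in enumerate(edges)
            if k % 2 == 0 and cap == 0]
    starts = sorted(v[1] for u, v in used if u == S)
    nxt = {u[1]: v[1] for u, v in used if u != S and u[0] == "e_out"}
    trajectories = []
    for i in starts:
        track = [i]
        while track[-1] in nxt:
            track.append(nxt[track[-1]])
        trajectories.append(track)
    return trajectories
```

With two pedestrians seen in three frames (detections 0/1, 2/3 and 4/5), cheap links along each true path and expensive cross links, the call `min_cost_trajectories(6, [-2.0] * 6, links)` returns the two expected tracks. Because a trajectory enters at an end node and leaves at a beginning node, a two-detection path collects no negative detection cost and is never selected.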
Each observation $o_i$ is represented with two nodes, the beginning node $b_i \in N$ and the end node $e_i \in N$ (see Figure 2). A detection edge connects $b_i$ and $e_i$.

Link edges. The edges $(e_i, b_j)$ connect the end nodes $e_i$ with the beginning nodes $b_j$ in following frames, with cost $C_{i,j}$ and flow $f_{i,j}$, which is 1 if $o_i$ and $o_j$ belong to $T_k$ and $\Delta f \le F_{max}$, and 0 otherwise. $\Delta f$ is the frame number difference between nodes $j$ and $i$, and $F_{max}$ is the maximum allowed frame gap.

The costs of the link edges represent the spatial relation between different subjects. Assuming that a subject cannot move a lot from one frame to the next, we define the likelihood of a link to be a decreasing function of the distance between detections in successive frames, so that the cost grows with the distance. The time gap between observations is also taken into account in order to be able to work at any frame rate; therefore, velocity measures are used instead of distances. The velocities are mapped to probabilities with a Gauss error function as shown in Equation (6), assuming the pedestrians cannot exceed a maximum velocity $V_{max}$. The choice of parameter $V_{max}$ is detailed in Section 4.

$$E(V_t, V_{max}) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left(\frac{-V_t + \frac{V_{max}}{2}}{\frac{V_{max}}{4}}\right) \qquad (6)$$

The advantage of using Equation (6) over a linear function is that the probability of lower velocities decreases more slowly, while the probability for higher velocities decreases more rapidly. This is consistent with the probability distribution of speed learned from training data.
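As a small illustration of Equation (6), the sketch below maps a speed to a probability and turns it into a link cost as a negative log-likelihood plus a frame-gap penalty. The function names and the argument layout are hypothetical; the gap penalty is written as $-\log(B_j^{\Delta f - 1})$, which is an assumption about the exact exponent, and the base `b_j` is passed in explicitly rather than fixed.

```python
import math

def velocity_likelihood(v, v_max):
    """Equation (6): map a speed v to a probability, given a maximum speed."""
    return 0.5 + 0.5 * math.erf((-v + v_max / 2.0) / (v_max / 4.0))

def link_cost(p_i, p_j, dt, v_max, frame_gap, b_j):
    """Negative log-likelihood of linking two detections dt seconds apart.

    p_i, p_j  -- positions as coordinate tuples
    frame_gap -- number of frames between the detections (1 = consecutive)
    b_j       -- base of the frame-gap penalty, 0 < b_j <= 1 (assumed form)
    """
    speed = math.dist(p_i, p_j) / dt
    prob = velocity_likelihood(speed, v_max)
    if prob <= 0.0:
        return math.inf  # physically implausible link; can be pruned
    gap_penalty = -math.log(b_j ** (frame_gap - 1))
    return -math.log(prob) + gap_penalty
```

A speed of half the maximum maps to a probability of exactly 0.5, speeds near zero map close to 1, and speeds near `v_max` map close to 0, so slow links are cheap and fast links are expensive.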

Therefore, the cost of a link edge is defined as:

$$C_{i,j} = -\log\left(P(o_j \mid o_i)\right) + C(\Delta f) = -\log E\left(\frac{\lVert p_j - p_i \rVert}{\Delta t}, V_{max}\right) + C(\Delta f)$$

where $C(\Delta f) = -\log\left(B_j^{\Delta f - 1}\right)$ is the cost depending on the frame difference between detections.

Detection edges. The edges $(b_i, e_i)$ connect the beginning node $b_i$ and end node $e_i$, with cost $C_i$ and flow $f_i$, which is 1 if $o_i$ belongs to $T_k$, and 0 otherwise.

$$C_i = \log\left(1 - P_{det}(o_i)\right) + \log\left(\frac{BB_{min}}{\lVert p_{BB} - p_i \rVert}\right)$$

If all the costs of the edges are positive, the solution to the minimum-cost problem is the trivial null flow. Consequently, we represent each observation with two nodes and a detection edge with negative cost. The higher the likelihood of a detection $P_{det}(o_i)$, the more negative the cost of the detection edge; hence, confident detections are likely to be in the path of the flow in order to minimize the total cost. If a map of the scene is available, we can also include this information in the detection cost. If a detection is far away from a possible entry/exit point, we add an extra negative cost to the detection edge, in order to favor that observation being matched. The added cost depends on the distance to the closest entry/exit point $p_{BB}$, and is only computed for distances higher than $BB_{min} = .5$ m.

Entrance and exit edges. The edges $(s, e_i)$ connect the source $s$ with all the end nodes $e_i$, with cost $C_{in,i}$ and flow $f_{in,i}$, which is 1 if $T_k$ starts at $o_i$ and 0 otherwise. Similarly, $(b_i, t)$ connects the beginning node $b_i$ with sink $t$, with cost $C_{i,out}$ and flow $f_{i,out}$, which is 1 if $T_k$ ends at $o_i$, and 0 otherwise. By connecting the $s$ node with the end nodes (or $t$ to the beginning nodes), we make sure that when a track starts (or ends) it does not benefit from the negative cost of the detection edge. Therefore, we define $C_{in} = C_{out} = 0$ and the flow constraints of Eqs. (4) and (5). In [28], the authors propose to create the opposite edges $(s, b_i)$ and $(e_i, t)$. The advantage of our formulation is that it does not depend on $P_{in}$ and $P_{out}$, which are data dependent terms that need to be calculated during optimization.

3. Modeling social behavior

If a pedestrian does not encounter any obstacles, the natural path to follow is a straight line. But what happens when the space gets more and more crowded and the pedestrian can no longer follow the straight path? Social interaction between pedestrians is especially important when the environment is crowded. In this section we consider how to include the social behavior, which we divide into the Social Force Model (SFM) and the Group behavior (GR), in our minimum-cost network flow problem.

3.1. Social Force Model

The social force model states that the motion of a pedestrian can be described as if they were subject to social forces. There are three main terms that need to be considered: the desire of a pedestrian to maintain a certain speed, the desire to keep a comfortable distance from other pedestrians and the desire to reach a destination. Since we cannot know the destination of the pedestrian a priori in a real tracking system, we focus on the first two terms.

Constant velocity assumption. The pedestrian tries to keep a certain speed and direction; therefore, we assume that at $t + \Delta t$ we have the same speed as at $t$ and predict the pedestrian's position at $t + \Delta t$ accordingly.

Avoidance term. The pedestrian also tries to avoid collisions and keep a comfortable distance from other pedestrians. We model this term as a repulsion field with an exponential distance-decay function with value $\alpha$ learned from training data.

$$a_i^{t+\Delta t} = \sum_{g_m \ne g_i} \exp\left(-\frac{\lVert p_i^{t+\Delta t} - p_m^{t+\Delta t} \rVert}{\alpha\, \Delta t}\right)$$

The only pedestrians that have this repulsion effect on subject $i$ are the ones which do not belong to the same group as $i$ and for which $\lVert p_i^{t+\Delta t} - p_m^{t+\Delta t} \rVert \le 1$ m. The different avoidance terms are combined linearly. Now the prediction of the pedestrian's next position is also influenced by the avoidance term from all pedestrians:

$$\tilde{p}_i^{t+\Delta t} = p_i^t + \left(v_i^t + a_i^{t+\Delta t}\, \Delta t\right) \Delta t \qquad (7)$$

The distance between prediction and real measurements is used to compute the cost:

$$C_{SFM,i,j} = -\log E\left(\frac{\lVert \tilde{p}_i^{t+\Delta t} - p_j^{t+\Delta t} \rVert}{\Delta t}, V_{max}\right)$$

In Figure 3 we plot the probability distribution computed using different terms.
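The prediction step of Equation (7) can be sketched as follows, in 2D for brevity. Several details are assumptions made for illustration and are not stated in the text: the repulsion is applied along the unit vector pointing away from the other pedestrian, the distances are evaluated on constant-velocity predictions, and the interaction radius is exposed as a parameter. The function names are hypothetical, and `others` is expected to contain only pedestrians outside the subject's group.

```python
import math

def avoidance_acceleration(p_i, v_i, others, alpha, dt, radius=1.0):
    """Sum the repulsion from nearby pedestrians outside the subject's group.

    others -- list of (position, velocity) pairs of non-group pedestrians
    radius -- assumed interaction distance in metres
    """
    pred_i = (p_i[0] + v_i[0] * dt, p_i[1] + v_i[1] * dt)
    ax = ay = 0.0
    for p_m, v_m in others:
        pred_m = (p_m[0] + v_m[0] * dt, p_m[1] + v_m[1] * dt)
        dx, dy = pred_i[0] - pred_m[0], pred_i[1] - pred_m[1]
        d = math.hypot(dx, dy)
        if d == 0.0 or d > radius:
            continue  # coincident or too far away to interact
        magnitude = math.exp(-d / (alpha * dt))
        ax += magnitude * dx / d  # push away from the other pedestrian
        ay += magnitude * dy / d
    return ax, ay

def predict_position(p_i, v_i, others, alpha, dt, radius=1.0):
    """Equation (7): constant velocity plus the avoidance acceleration."""
    ax, ay = avoidance_acceleration(p_i, v_i, others, alpha, dt, radius)
    return (p_i[0] + (v_i[0] + ax * dt) * dt,
            p_i[1] + (v_i[1] + ay * dt) * dt)
```

Without neighbours the prediction reduces to the constant velocity assumption; a pedestrian standing just beside the predicted position shifts the prediction to the opposite side. The distance between this prediction and a candidate detection would then be fed to Equation (6) to obtain the SFM cost.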
Note that this is just for visualization purposes, since we do not compute the probability for each point in the scene, but only for the positions where the detector has fired. There are 4 pedestrians in the scene, the purple one and 3 green ones walking in a group. As shown in Figure 3(b), if we only use the predicted positions (yellow heads) given the previous speeds, there is a collision between the purple pedestrian and the marked green pedestrian. The avoidance term shifts the probability mode to a more plausible position.

Figure 3: Three green pedestrians walk in a group; the predicted positions in the next frame are marked by yellow heads. The purple pedestrian's linearly predicted position (yellow head) clearly interferes with the trajectory of the group. Representation of the probability distribution (blue is 0, red is 1) for the purple pedestrian's next position using: 3(a) only distances, 3(b) only SFM (constant velocity assumption and avoidance term), 3(c) only GR (considering the purple pedestrian belongs to the group), 3(d) distances+SFM and 3(e) distances+SFM+GR.

3.2. Group Model

The social behavior [11] also includes an attraction force which occurs when a pedestrian is attracted to a friend, shop, etc. We model the attraction between members of a group. Before modeling group behavior we determine which tracks form each group and at which frame the group begins and ends (to deal with splitting and formation of groups). The idea is that if two pedestrians are close to each other over a reasonable period of time, they are likely to belong to the same group. From the training sequence in [22], we learn the distance and speed probability distributions of the members of a group $P_g$ vs. individual pedestrians $P_i$. If $m$ and $n$ are two trajectories which appear in the scene at $t = [0, N]$, we compute the flag $G_{m,n}$ that indicates whether $m$ and $n$ belong to the same group.

$$G_{m,n} = \begin{cases} 1, & \sum_{t=0}^{N} P_g(m,n) > \sum_{t=0}^{N} P_i(m,n) \\ 0, & \text{otherwise} \end{cases}$$

For every observation $o_i$, we define a group label $g_i$ which indicates to which group the observation belongs, if any. If several pedestrians form a group, they tend to keep a similar speed; therefore, if $i$ belongs to a group, we can use the mean speed of all the other members of the group to predict the next position for $i$ using Equation (7). The distance between this predicted position and the real measurements is used in (6) to obtain the probability for the grouping term.
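A minimal sketch of the two grouping steps, under stated assumptions: the per-frame likelihoods $P_g$ and $P_i$ are taken as already evaluated and passed in as plain lists, and the prediction uses only the mean velocity of the other group members, leaving out the avoidance term of Equation (7) for brevity. The function names are hypothetical.

```python
def same_group(p_group, p_indiv):
    """Flag G_{m,n}: 1 if the summed group likelihood over the shared frames
    exceeds the summed individual likelihood, 0 otherwise."""
    return 1 if sum(p_group) > sum(p_indiv) else 0

def group_prediction(p_i, member_velocities, dt):
    """Predict the next position of a pedestrian from the mean velocity
    of the other members of the group (avoidance term omitted)."""
    n = len(member_velocities)
    mean_v = [sum(v[k] for v in member_velocities) / n for k in range(len(p_i))]
    return tuple(p + mv * dt for p, mv in zip(p_i, mean_v))
```

For example, a pedestrian at the origin whose two companions move at (1, 0) and (3, 2) is predicted at (1.0, 0.5) after half a second; the distance from that point to a candidate detection is what the grouping term scores.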
An example is shown in Figure 3(c), where we can see that the maximum probability provided by the group term keeps the group configuration. In Figure 3(d) we show the combined probability of the distance and SFM information, which narrows the space of probable positions. Finally, 3(e) represents the combined probability of DIST, SFM and GR.

4. Implementation details

To compute the SFM and grouping costs, we need to have information about the velocities of the pedestrians, which can only be obtained if we already have the trajectories. We solve this chicken-and-egg problem iteratively; on the first iteration, the trajectories are estimated only with the information defined in Section 2.2. The minimum cost solution is found using the Simplex algorithm [8], with the implementation given in [18].

To reduce the computational cost, we prune the graph using the physical constraints represented by the edge costs. If any of the costs $C_{i,j}$, $C_{SFM,i,j}$ or $C_{GR,i,j}$ is infinite, the edge $(i, j)$ is erased from the graphical model. For long sequences, we divide the video into several batches and optimize for each batch. For temporal consistency, the batches have an overlap of $F_{max}$ frames. With our non-optimized code, the runtime for a sequence of 8 frames (4 seconds), 4837 detections, batches of frames and 6 iterations is 3 seconds on a 3GHz machine.

All parameters defined in the previous sections are learned from training data; in our case we use one sequence of the publicly available dataset [22]. The parameter to penalize for the frame difference is $B_j = .3$, the avoidance term $\alpha = .5$. Our approach works well for a wide range of $V_{max}$ and $F_{max}$. Values between 5 and 25 were tested for both parameters, and the difference between worst and best tracking accuracy obtained was %. For all experiments shown in the following sections, we use $V_{max} = 7$ and $F_{max} =$ .

5. Experimental results

In this section we show the tracking results of our method on three publicly available datasets and compare with existing state-of-the-art tracking approaches using the CLEAR metrics [13]: DA (detection accuracy), TA (tracking accuracy), DP (detection precision) and TP (tracking precision).

5.1. Evaluation with missing data, noise and outliers

We evaluate the impact of every component of the proposed approach with one of the sequences of the dataset [22], which contains images from a crowded public place, with several groups as well as walking and standing pedestrians. Using the ground truth (GT) pedestrian positions as the baseline for our experiments, we perform three types of tests (missing data, outliers and noise) and compare the results obtained with:

- DIST: proposed network model with velocities
- SFM: adding the Social Force Model (Section 3.1)
- SFM+GR: adding SFM and grouping behavior (Section 3.2)

Figure 4: Tracking accuracy under simulated degradation. Experiments are repeated 5 times and the average, maximum and minimum results are plotted. Blue star = results with DIST, Green diamond = results with SFM, Red square = results with SFM+GR. From left to right: experiment with simulated missing data (%), with outliers (%), and with random noise (noise level).

Figure 5: Top row: tracking results with only DIST. Bottom row: tracking results with SFM+GR. Green = correct trajectories, Blue = observation missing from the set, Red = wrong match. 5(a) Wrong match with DIST, corrected with SFM. 5(b) Missing detections cause the matches to shift due to the global optimization; correct result with SFM. 5(c) Missed detection for subject 3 on two consecutive frames. With SFM, subject 2 in the first frame (yellow arrow) is matched to subject 3 in the last frame (yellow arrow), creating an identity switch; correct result with grouping information.

Missing data. This experiment shows the robustness of our approach given missed detections. This is evaluated by randomly erasing a certain percentage of detections from the GT set. The percentages evaluated are [0, 4, 8, 12, 16, 20] of the total number of detections over the whole sequence. As we can see in Figure 4, both SFM and SFM+GR increase the tracking accuracy when compared to DIST.

Outliers. With an initial set of detections of GT with 2% missing data, tests are performed with [0, 10, 20, 30, 40, 50] percent of outliers added in random positions over the ground plane.
In Figure 4, the results show that the SFM is especially important when the tracker is dealing with outliers. With 50% of outliers, the identity switches with SFM+GR are reduced 7% w.r.t. the DIST results.

Noise. This test is used to determine the performance of our approach given noisy detections, which are very common, mainly due to small errors in the 2D-3D mapping. From the GT set with 2% missing data, random noise is added to every detection. The variances of the noise tested are [, .2, .4, .6, .8, .] of the size of the scene observed. As expected, group information is the most robust to noise; if the position of pedestrian A is not correctly estimated, other pedestrians in the group will contribute to the estimation of the true trajectory of A.

These results corroborate that having good behavioral models becomes more important as the observations deteriorate. In Figure 5 we plot the tracking results of a sequence with 2% simulated missing data. Using only distance information leads to identity switches, as shown in Figure 5(a). In Figure 5(b) we can see how missing data affects the matching results. The matches are shifted; this chain reaction is due to the global optimization. In both cases, the use of SFM allows the tracker to interpolate the necessary detections and find the correct trajectories. Finally, in Figure 5(c) we plot the wrong result which occurs because track 3 has two consecutive missing detections. Even with SFM, track 2 is switched for 3, since the switch does not create extreme changes in velocity. In this case, the grouping information is key to obtaining good tracking results. More results are shown in Figure 7, first row.

5.2. Tracking results

We evaluate the proposed algorithm on two publicly available datasets: a crowded town center [2] and the well-known PETS2009 dataset [9]. We compare results with (1) [2], using the results provided by the authors; (2) [28], a tracking algorithm based on network flows, for which we use our own implementation of the algorithm; (3) [22], which includes social behavior, using the code provided by the authors; (4) [27], which includes social and grouping behavior, using our own implementation. For a fair comparison, we do not use appearance information for any method.

5.2.1 Town Center dataset

We perform tracking experiments on a video of a crowded town center [2]. To show the importance of social behavior and the robustness of our algorithm at low frame rates, we track at 2.5fps (taking one every tenth frame).

Method                   DA     TA     DP     TP     IDsw
HOG Detections           63.    -      7.9    -      -
Benfold et al. [2]       64.9   64.8   8.5    8.4    259
Zhang et al. [28]        66.    65.7   7.5    7.5    4
Pellegrini et al. [22]   64.    63.4   7.8    7.7    83
Yamaguchi et al. [27]    64.    63.3   7.     7.9    96
Proposed                 67.6   67.3   7.6    7.5    86

Table 1: Town Center sequence.

Note that the precision reported in [2] is about 9% higher than the precision of the input detections; this is because the authors use the motion estimation obtained with a KLT feature tracker to improve the exact position of the detections, while we use the raw detections. Still, our algorithm reports 64% fewer ID switches. As shown in Table 1, our algorithm outperforms [22], which includes social behavior, and [27], which also includes grouping information, by almost 4% in accuracy and with 5% fewer ID switches.

In Figure 6 we can see an example where [22, 27] fail. The errors are created in the greedy phase of predictive approaches, where people fight for detections. The red false detection in the first frame takes the detection in the second frame that should belong to the green trajectory (which ends in the first frame). In the third frame, the red trajectory overtakes the yellow trajectory and a new blue trajectory starts where the green should have been. None of the resulting trajectories violate the SFM and GR conditions. On the other hand, our global optimization framework takes full advantage of the SFM and GR information and correctly recovers all the trajectories. More results of the proposed algorithm can be seen in Figure 7, last row.

Figure 6: Predictive approaches [22, 27] (first row) vs. proposed method (second row).

5.2.2 Results on the PETS2009 dataset

In addition, we perform monocular tracking on the PETS2009 sequence L1, View 1, and obtain the detections using the Mixture of Gaussians (MOG) background subtraction method. We obtain a tracking accuracy of 67% compared to 64.5% for Pellegrini et al. [22]. This dataset is very challenging from a social behavior point of view, because the subjects often change direction and groups form and split frequently.
Since our approach is based on a probabilistic framework, it is better suited to unexpected behavior changes (like destination changes), where predictive approaches fail [22, 27].

6. Conclusions

In this paper we argued for integrating pedestrian behavioral models in a linear programming framework. Our algorithm finds the MAP estimate of the total posterior of the trajectories, including social and grouping models, using a minimum-cost network flow with an improved novel graph structure that outperforms existing approaches. People interaction is persistent rather than transient; hence the proposed probabilistic formulation fully exploits the power of behavioral models, as opposed to standard predictive and recursive approaches such as Kalman filtering. Experiments on three public datasets reveal the importance of using social interaction models for tracking in difficult conditions, such as crowded scenes with missed detections, false alarms and noise. Results show that our approach is superior to state-of-the-art multiple people trackers. As future work, we plan on extending our approach to even more crowded scenarios where individuals cannot be detected and therefore features might be used as in [6].

Figure 7: First row: results on the BIWI dataset (Section 5.1). The scene is heavily crowded; social and grouping behavior are key to obtaining good tracking results. Last row: results on the Town Center dataset (Section 5.2.1).

References

[1] S. Ali and M. Shah. Floor fields for tracking in high density crowded scenes. ECCV, 2008.
[2] B. Benfold and I. Reid. Stable multi-target tracking in real-time surveillance video. CVPR, 2011.
[3] J. Berclaz, F. Fleuret, and P. Fua. Robust people tracking with global trajectory optimization. CVPR, 2006.
[4] J. Berclaz, F. Fleuret, E. Türetken, and P. Fua. Multiple object tracking using k-shortest paths optimization. TPAMI, 2011.
[5] M. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. van Gool. Robust tracking-by-detection using a detector confidence particle filter. ICCV, 2009.
[6] G. Brostow and R. Cipolla. Unsupervised detection of independent motion in crowds. CVPR, 2006.
[7] W. Choi and S. Savarese. Multiple target tracking in world coordinate with single, minimally calibrated camera. ECCV, 2010.
[8] G. Dantzig. Linear programming and extensions. Princeton University Press, Princeton, NJ, 1963.
[9] J. Ferryman. PETS 2009 dataset: Performance and evaluation of tracking and surveillance. 2009.
[10] W. Ge, R. Collins, and B. Ruback. Automatically detecting the small group structure of a crowd. WACV, 2009.
[11] D. Helbing and P. Molnár. Social force model for pedestrian dynamics. Physical Review E, 51:4282, 1995.
[12] H. Jiang, S. Fels, and J. Little. A linear programming approach for multiple object tracking. CVPR, 2007.
[13] R. Kasturi, D. Goldgof, P. Soundararajan, V. Manohar, J. Garofolo, M. Boonstra, V. Korzhova, and J. Zhang. Framework for performance evaluation for face, text and vehicle detection and tracking in video: data, metrics, and protocol. TPAMI, 31(2), 2009.
[14] R. Kaucic, A. Perera, G. Brooksby, J. Kaufhold, and A. Hoogs. A unified framework for tracking through occlusions and across sensor gaps. CVPR, 2005.
[15] Z. Khan, T. Balch, and F. Dellaert. MCMC-based particle filtering for tracking a variable number of interacting targets. TPAMI, 2005.
[16] B. Leibe, K. Schindler, N. Cornelis, and L. van Gool. Coupled detection and tracking from static cameras and moving vehicles. TPAMI, 30(10), 2008.
[17] M. Luber, J. Stork, G. Tipaldi, and K. Arras. People tracking with human motion predictions from social forces. ICRA, 2010.
[18] A. Makhorin. GNU linear programming kit (GLPK). http://www.gnu.org/software/glpk/, 2010.
[19] R. Mehran, A. Oyama, and M. Shah. Abnormal crowd behavior detection using social force model. CVPR, 2009.
[20] P. Nillius, J. Sullivan, and S. Carlsson. Multi-target tracking - linking identities using Bayesian network inference. CVPR, 2006.
[21] N. Pelechano, J. Allbeck, and N. Badler. Controlling individual agents in high-density crowd simulation. Eurographics/ACM SIGGRAPH Symposium on Computer Animation, 2007.
[22] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool. You'll never walk alone: modeling social behavior for multi-target tracking. ICCV, 2009.
[23] S. Pellegrini, A. Ess, and L. van Gool. Improving data association by joint modeling of pedestrian trajectories and groupings. ECCV, 2010.
[24] H. Pirsiavash, D. Ramanan, and C. Fowlkes. Globally-optimal greedy algorithms for tracking a variable number of objects. CVPR, 2011.
[25] P. Scovanner and M. Tappen. Learning pedestrian dynamics from the real world. ICCV, 2009.
[26] B. Wu and R. Nevatia. Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet part detectors. IJCV, 75(2), 2007.
[27] K. Yamaguchi, A. Berg, L. Ortiz, and T. Berg. Who are you with and where are you going? CVPR, 2011.
[28] L. Zhang, Y. Li, and R. Nevatia. Global data association for multi-object tracking using network flows. CVPR, 2008.