CHAPTER 2

Tracking with Non-Linear Dynamic Models

In a linear dynamic model with linear measurements, there is always only one peak in the posterior; very small non-linearities in dynamic models can lead to a substantial number of peaks. As a result, it can be very difficult to represent the posterior: it may be necessary to represent all the peaks to be able to compute the mean or the covariance. We discuss these difficulties in section 2.1. There is no general solution to this problem, but there is one mechanism which has proven useful in some practical problems: we present this, rather technical, mechanism in section 2.2, and show some applications in section 2.3. It is quite typical of vision applications that there is some doubt about which measurements to track; for example, a Kalman filter tracker following a series of corner points may need to decide which image measurement corresponds to which track. A poor solution to this problem may lead to apparently good tracks that bear no relationship to the underlying motions. In section ??, we discuss how to attach measurements to tracks.

2.1 NON-LINEAR DYNAMIC MODELS

If we can assume that noise is normally distributed, linear dynamic models are reasonably easy to deal with, because a linear map takes a random variable with a normal distribution to another random variable with a (different, but easily determined) normal distribution. We used this fact extensively in describing the Kalman filter. Because we knew that everything was normal, we could do most calculations by determining the mean and covariance of the relevant normal distribution, a process that is often quite easy if one doesn't try to do the integrals directly. Furthermore, because a normal distribution is represented by its mean and covariance, we knew what representation of the relevant distributions to maintain.

Many natural dynamic models are non-linear. There are two sources of problems. Firstly, in models where the dynamics have the form $x_i \sim N(f(x_{i-1}); \Sigma_d)$ (where $f$ is a non-linear function), both $P(X_i \mid y_0, \ldots, y_{i-1})$ and $P(X_i \mid y_0, \ldots, y_i)$ tend not to be normal. As section 2.1.1 will show, even quite innocuous-looking non-linearities can lead to very strange distributions indeed. Secondly, $P(Y_i \mid X_i)$ may not be Gaussian either. This phenomenon is quite common in vision; it leads to difficulties that are still neither well understood nor easily dealt with (section 2.1.2).

Dealing with these phenomena is difficult. There is not, and will never be, a completely general solution. It is always a good idea to see if a linear model can be made to work. If one does not, there is the option of linearizing the model locally
and assuming that everything is normal. This approach, known as the extended Kalman filter, tends to be unreliable in many applications. We describe it briefly in the appendix, because it is useful on occasion. Finally, there is a method that maintains a radically different representation of the relevant distributions from that used by the Kalman filter. This method is described in section 2.2. The rest of this section illustrates some of the difficulties presented by non-linear problems.

2.1.1 Unpleasant Properties of Non-Linear Dynamics

Non-linear models of state evolution can take unimodal distributions like Gaussians and create multiple, well-separated modes, phenomena that are very poorly modeled by a single Gaussian. This effect is most easily understood by looking at an example. Let us have the (apparently simple) dynamical model

$$x_{i+1} = x_i + 0.1 \sin x_i.$$

Notice that there is no random component to this dynamical model at all; now let us consider $P(X_{100})$, assuming that $P(X_0)$ is a Gaussian with very large variance (and so basically flat over a large range). The easiest way to think about this problem is to consider what happens to various points; as figure 2.1 illustrates, points in the range $((2k)\pi, (2k+2)\pi)$ move towards $(2k+1)\pi$. This means that probability must collect at the points $(2k+1)\pi$ (we ask you to provide some details in the exercises).

FIGURE 2.1: The non-linear dynamics $x_{i+1} = x_i + 0.1 \sin x_i$ cause points in the range $((2k)\pi, (2k+2)\pi)$ to move towards $(2k+1)\pi$. As the figure on the left illustrates, this is because $x_i + 0.1 \sin x_i$ is slightly smaller than $x_i$ for $x_i$ in the range $((2k+1)\pi, (2k+2)\pi)$, and is slightly larger than $x_i$ for $x_i$ in the range $((2k)\pi, (2k+1)\pi)$. In fact, the nonlinearity of this function looks small; it is hardly visible in a scaled plot. However, as figure 2.2 shows, its effects are very significant. (The axes of the plot are $x_i$ and $x_{i+1}$.)

This nonlinearity is apparently very small. Its effects are very substantial, however.
One way to see what happens is to follow a large number of different points through the dynamics for many steps. We choose a large collection of points according to $P(X_0)$, and then apply our dynamic model to them. A histogram of these points at each step provides a rough estimate of $P(X_i)$, and we can plot how they evolve, too; the result is illustrated in figure 2.2. As this figure shows, $P(X_i)$ very quickly looks like a set of narrow peaks, each with a different weight, at $(2k+1)\pi$. Representing this distribution by reporting only its mean and covariance involves a substantial degree of wishful thinking.

FIGURE 2.2: On the top, we have plotted the time evolution of the state of a set of 100 points, for 100 steps of the process $x_{i+1} = x_i + 0.1 \sin x_i$. Notice that the points all contract rather quickly to $(2k+1)\pi$, and stay there. We have joined up the tracks of the points to make it clear how the state changes. On the bottom left we show a histogram of the start states of the points we used; this is an approximation to $P(x_0)$. The histogram on the bottom center shows a histogram of the point positions after 20 iterations; this is an approximation to $P(x_{20})$. The histogram on the bottom right shows a histogram of the point positions after 70 iterations; this is an approximation to $P(x_{70})$. Notice that there are many important peaks to this histogram; it might be very unwise to model $P(x_i)$ as a Gaussian.
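To see the effect for yourself, a minimal simulation sketch (Python/NumPy) along the lines of figure 2.2 is shown below; the prior's variance, the random seed, and the histogram bins are illustrative choices, not values prescribed by the text, though the 100 points and 100 steps match the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw points from a very broad Gaussian, approximating a nearly flat P(X_0).
x = rng.normal(loc=0.0, scale=10.0, size=100)

def step(x):
    # Deterministic non-linear dynamics: x_{i+1} = x_i + 0.1 * sin(x_i)
    return x + 0.1 * np.sin(x)

snapshots = {0: x.copy()}
for i in range(1, 101):
    x = step(x)
    if i in (20, 70, 100):
        snapshots[i] = x.copy()

# Histogram each snapshot; the mass piles up near odd multiples of pi.
for i, xi in snapshots.items():
    hist, edges = np.histogram(xi, bins=np.linspace(-15, 15, 31))
    print(f"step {i:3d}: heavily populated bins start near", edges[:-1][hist > hist.max() * 0.5])
```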
2.1.2 Difficulties with Likelihoods

There is another reason to believe that $P(X_i \mid y_0, \ldots, y_i)$ may be very complicated in form. Even if the dynamics do not display the effects of section 2.1.1, the likelihood function $P(Y_i \mid X_i)$ can create serious problems. For many important cases we expect that the likelihood has multiple peaks. For example, consider tracking people in video sequences. The state will predict the configuration of an idealised human figure and $P(Y_i \mid X_i)$ will be computed by comparing predictions about the image with the actual image, in some way. As the configuration of the idealised human figure changes, it will cover sections of image that aren't generated by a person but look as though they are. For example, pretty much any coherent long straight image region with parallel sides can look like a limb; this means that as $X_i$ changes to move the arm of the idealised figure from where it should be to cover this region, the value of $P(Y_i \mid X_i)$ will go down, and then up again. The likely result is a function $P(Y_i \mid X_i)$ with many peaks in it.

We will almost certainly need to keep track of more than one of these peaks. This is because the largest peak for any given frame may not always correspond to the right peak. This ambiguity should resolve itself once we have seen some more frames; we don't expect to see many image assemblies that look like people, move like people for many frames, and yet aren't actually people. However, until it does, we may need to manage a representation of $P(X_i \mid y_0, \ldots, y_i)$ which contains several different peaks. This presents considerable algorithmic difficulties: we don't know how many peaks there are, or where they are, and finding them in a high dimensional space may be difficult. One partially successful approach is a form of random search, known as particle filtering.

2.2 PARTICLE FILTERING

The main difficulty in tracking in the presence of complicated likelihood functions or of non-linear dynamics is in maintaining a satisfactory representation of $P(x_i \mid y_0, \ldots, y_i)$. This representation should be able to handle multiple peaks in the distribution, and should be able to handle a high-dimensional state vector without difficulty. There is no completely satisfactory general solution to this problem (and there will never be). In this section, we discuss an approach that has been useful in many applications.

2.2.1 Sampled Representations of Probability Distributions

A natural way to think about representations of probability distributions is to ask what a probability distribution is for. Computing a representation of a probability distribution is not our primary objective; we wish to represent a probability distribution so that we can compute one or another expectation. For example, we might wish to compute the expected state of an object given some information; we might wish to compute the variance in the state, or the expected utility of shooting at an object, etc. Probability distributions are devices for computing expectations; thus, our representation should be one that gives us a decent prospect of computing an expectation accurately. This means that there is a strong resonance between questions of representing probability distributions and questions of efficient numerical integration.
Monte Carlo Integration using Importance Sampling. Assume that we have a collection of $N$ points $u^i$, and a collection of weights $w^i$. These points are independent samples drawn from a probability distribution $S(U)$; we call this the sampling distribution; notice that we have broken with our usual convention of writing any probability distribution with a $P$. We assume that $S(U)$ has a probability density function $s(U)$. The weights have the form $w^i = f(u^i)/s(u^i)$ for some function $f$. Now it is a fact that

$$E\left[\frac{1}{N}\sum_i g(u^i)\, w^i\right] = \int g(U)\,\frac{f(U)}{s(U)}\, s(U)\, dU = \int g(U)\, f(U)\, dU,$$

where the expectation is taken over the distribution on the collection of $N$ independent samples from $S(U)$ (you can prove this fact using the weak law of large numbers). The variance of this estimate goes down as $1/N$, and is independent of the dimension of $U$.

Representing Distributions using Weighted Samples. If we think about a distribution as a device for computing expectations (which are integrals) we can obtain a representation of a distribution from the integration method described above. This representation will consist of a set of weighted points. Assume that $f$ is non-negative, and that $\int f(U)\, dU$ exists and is finite. Then

$$\frac{f(X)}{\int f(U)\, dU}$$

is a probability density function representing the distribution of interest. We shall write this probability density function as $p_f(X)$. Now we have a collection of $N$ points $u^i \sim S(U)$, and a collection of weights $w^i = f(u^i)/s(u^i)$. Using this notation, we have that

$$E\left[\frac{1}{N}\sum_i w^i\right] = \int \frac{f(U)}{s(U)}\, s(U)\, dU = \int f(U)\, dU.$$
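As a concrete illustration, the short sketch below (Python/NumPy) estimates these quantities by importance sampling; the particular choices of $f$, $g$, and the sampling distribution are invented for the example and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Sampling distribution S(U): a standard normal, with density s(u).
u = rng.normal(size=N)
s = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

# f defines the (unnormalised) distribution of interest; g is the quantity
# whose expectation we want. Both are illustrative choices.
f = lambda u: np.exp(-0.5 * ((u - 1.0) / 0.5) ** 2)   # unnormalised Gaussian centred at 1
g = lambda u: u                                        # expectation of U itself

w = f(u) / s(u)                        # importance weights w^i = f(u^i)/s(u^i)
integral_gf = np.mean(g(u) * w)        # estimate of the integral of g(U) f(U) dU
E_pf_g = np.sum(g(u) * w) / np.sum(w)  # self-normalised estimate of E_{p_f}[g]
print(integral_gf, E_pf_g)             # E_pf_g should come out close to 1.0
```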
Algorithm 2.1: Obtaining a sampled representation of a probability distribution

Represent a probability distribution
$$p_f(X) = \frac{f(X)}{\int f(U)\, dU}$$
by a set of $N$ weighted samples
$$\{(u^i, w^i)\},$$
where $u^i \sim s(U)$ and $w^i = f(u^i)/s(u^i)$.

Now this means that

$$E_{p_f}[g] = \int g(U)\, p_f(U)\, dU = \frac{\int g(U)\, f(U)\, dU}{\int f(U)\, dU} = \frac{E\left[\frac{1}{N}\sum_i g(u^i)\, w^i\right]}{E\left[\frac{1}{N}\sum_i w^i\right]} \approx \frac{\sum_i g(u^i)\, w^i}{\sum_i w^i}$$

(where we have cancelled some $N$s). This means that we can in principle represent a probability distribution by a set of weighted samples (algorithm 2.1). There are some significant practical issues here, however. Before we explore these, we will discuss how to perform various computations with sampled representations. We have already shown how to compute an expectation (above, and algorithm 2.2). There are two other important activities for tracking: marginalisation, and turning a representation of a prior into a representation of a posterior.

Algorithm 2.2: Computing an expectation using a set of samples

We have a representation of a probability distribution
$$p_f(X) = \frac{f(X)}{\int f(U)\, dU}$$
by a set of weighted samples $\{(u^i, w^i)\}$, where $u^i \sim s(U)$ and $w^i = f(u^i)/s(u^i)$. Then:
$$\int g(U)\, p_f(U)\, dU \approx \frac{\sum_{i=1}^N g(u^i)\, w^i}{\sum_{i=1}^N w^i}.$$

Marginalising a Sampled Representation. An attraction of sampled representations is that some computations are particularly easy. Marginalisation is a good and useful example. Assume we have a sampled representation of $p_f(U) = p_f((M, N))$. We write $U$ as two components $(M, N)$ so that we can marginalise with respect to one of them. Now assume that the sampled representation consists of a set of samples which we can write as
$$\{((m^i, n^i), w^i)\}.$$
In this representation, $(m^i, n^i) \sim s(M, N)$ and $w^i = f((m^i, n^i))/s((m^i, n^i))$. We want a representation of the marginal $p_f(M) = \int p_f(M, N)\, dN$. We will use this marginal to estimate integrals, so we can derive the representation by thinking about integrals.
In particular,

$$\int g(M)\, p_f(M)\, dM = \int g(M) \left[\int p_f(M, N)\, dN\right] dM = \int\!\!\int g(M)\, p_f(M, N)\, dN\, dM \approx \frac{\sum_{i=1}^N g(m^i)\, w^i}{\sum_{i=1}^N w^i},$$

meaning that we can represent the marginal by dropping the $n^i$ components of the sample (or ignoring them, which may be more efficient!).

Algorithm 2.3: Computing a representation of a marginal distribution

Assume we have a sampled representation of a distribution $p_f(M, N)$ given by
$$\{((m^i, n^i), w^i)\}.$$
Then $\{(m^i, w^i)\}$ is a representation of the marginal, $p_f(M) = \int p_f(M, N)\, dN$.
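A minimal sketch of this (Python/NumPy, with a made-up two-component distribution) shows that marginalising really is just a matter of ignoring the $n^i$ components when the weighted sum is formed:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50_000

# Sampled representation of p_f((M, N)): here we sample the distribution of
# interest itself, so all weights are 1 (an illustrative choice).
m = rng.normal(2.0, 1.0, size=N)    # M components of the samples
n = rng.normal(-1.0, 0.5, size=N)   # N components of the samples (unused below)
w = np.ones(N)

# Expectation of g(M) under the marginal p_f(M): drop (ignore) the n^i components.
g = lambda m: m**2
E_g = np.sum(g(m) * w) / np.sum(w)
print(E_g)   # close to E[M^2] = 2^2 + 1^2 = 5 for this made-up example
```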
Transforming a Sampled Representation of a Prior into a Sampled Representation of a Posterior. Appropriate manipulation of the weights of a sampled distribution yields representations of other distributions. A particularly interesting case is representing a posterior, given some measurement. Recall that

$$p(U \mid V = v_0) = \frac{p(V = v_0 \mid U)\, p(U)}{\int p(V = v_0 \mid U)\, p(U)\, dU} = \frac{1}{K}\, p(V = v_0 \mid U)\, p(U),$$

where $v_0$ is some measured value taken by the random variable $V$. Assume we have a sampled representation of $p(U)$, given by $\{(u^i, w^i)\}$. We can evaluate $K$ fairly easily:

$$K = \int p(V = v_0 \mid U)\, p(U)\, dU = E\left[\frac{\sum_{i=1}^N p(V = v_0 \mid u^i)\, w^i}{\sum_{i=1}^N w^i}\right] \approx \frac{\sum_{i=1}^N p(V = v_0 \mid u^i)\, w^i}{\sum_{i=1}^N w^i}.$$

Now let us consider the posterior:

$$\int g(U)\, p(U \mid V = v_0)\, dU = \frac{1}{K}\int g(U)\, p(V = v_0 \mid U)\, p(U)\, dU \approx \frac{1}{K}\,\frac{\sum_{i=1}^N g(u^i)\, p(V = v_0 \mid u^i)\, w^i}{\sum_{i=1}^N w^i} \approx \frac{\sum_{i=1}^N g(u^i)\, p(V = v_0 \mid u^i)\, w^i}{\sum_{i=1}^N p(V = v_0 \mid u^i)\, w^i}$$

(where we substituted the approximate expression for $K$ in the last step). This means that, if we take $\{(u^i, w^i)\}$ and replace the weights with

$$w'^i = p(V = v_0 \mid u^i)\, w^i,$$

the result $\{(u^i, w'^i)\}$ is a representation of the posterior.

Algorithm 2.4: Transforming a sampled representation of a prior into a sampled representation of a posterior.

Assume we have a representation of $p(U)$ as
$$\{(u^i, w^i)\}.$$
Assume we have an observation $V = v_0$, and a likelihood model $p(V \mid U)$. The posterior, $p(U \mid V = v_0)$, is represented by
$$\{(u^i, w'^i)\},$$
where $w'^i = p(V = v_0 \mid u^i)\, w^i$.
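Here is a small sketch of this reweighting (Python/NumPy), using a made-up Gaussian prior and Gaussian likelihood so that the exact posterior mean is known and can be checked against the weighted estimate:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000

# Sampled representation of the prior p(U) = N(0, 1): sample it directly, weights 1.
u = rng.normal(0.0, 1.0, size=N)
w = np.ones(N)

# Observation model p(V | U) = N(U, 0.5^2); observed value v0 = 1.0 (both made up).
v0, sigma_v = 1.0, 0.5
w_post = np.exp(-0.5 * ((v0 - u) / sigma_v) ** 2) * w   # w'^i = p(V=v0|u^i) w^i

post_mean = np.sum(u * w_post) / np.sum(w_post)
print(post_mean)   # the exact posterior mean here is v0 / (1 + sigma_v^2) = 0.8
```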
2.2.2 The Simplest Particle Filter

Assume that we have a sampled representation of $P(X_{i-1} \mid y_0, \ldots, y_{i-1})$, and we need to obtain a representation of $P(X_i \mid y_0, \ldots, y_i)$. We will follow the usual two steps of prediction and correction. We can regard each sample as a possible state for the process at step $i-1$. We are going to obtain our representation by firstly representing $P(X_i, X_{i-1} \mid y_0, \ldots, y_{i-1})$ and then marginalising out $X_{i-1}$ (which we know how to do). The result is the prior for the next state, and, since we know how to get posteriors from priors, we will obtain $P(X_i \mid y_0, \ldots, y_i)$.

Prediction. Now

$$p(X_i, X_{i-1} \mid y_0, \ldots, y_{i-1}) = p(X_i \mid X_{i-1})\, p(X_{i-1} \mid y_0, \ldots, y_{i-1}).$$

Write our representation of $p(X_{i-1} \mid y_0, \ldots, y_{i-1})$ as
$$\{(u^k_{i-1}, w^k_{i-1})\}$$
(the superscripts index the samples for a given step, and the subscript gives the step). Now for any given sample $u^k_{i-1}$, we can obtain samples of $p(X_i \mid X_{i-1} = u^k_{i-1})$ fairly easily. This is because our dynamic model is
$$x_i = f(x_{i-1}) + \xi,$$
where $\xi \sim N(0, \Sigma_d)$. Thus, for any given sample $u^k_{i-1}$, we can generate samples of $p(X_i \mid X_{i-1} = u^k_{i-1})$ as
$$\{(f(u^k_{i-1}) + \xi^l,\; 1)\},$$
where $\xi^l \sim N(0, \Sigma_d)$. The index $l$ indicates that we might generate several such samples for each $u^k_{i-1}$. We can now represent $p(X_i, X_{i-1} \mid y_0, \ldots, y_{i-1})$ as
$$\{((f(u^k_{i-1}) + \xi^l,\; u^k_{i-1}),\; w^k_{i-1})\}$$
(notice that there are two free indexes here, $k$ and $l$; by this we mean that, for each sample indexed by $k$, there might be several different elements of the set, indexed by $l$). Because we can marginalise by dropping elements, the representation of $P(x_i \mid y_0, \ldots, y_{i-1})$ is given by
$$\{(f(u^k_{i-1}) + \xi^l,\; w^k_{i-1})\}$$
(we walk through a proof in the exercises). We will reindex this collection of samples (which may have more than $N$ elements) and rewrite it as
$$\{(s^{k,-}_i, w^{k,-}_i)\},$$
assuming that there are $M$ elements. Just as in our discussion of Kalman filters, the superscript $-$ indicates that this is our representation of the $i$'th state before a measurement has arrived. The superscript $k$ gives the individual sample.

Correction. Correction is simple: we need to take the prediction, which acts as a prior, and turn it into a posterior. We do this by choosing an appropriate weight for each sample, following algorithm 2.4. The weight is
$$p(Y_i = y_i \mid X_i = s^{k,-}_i)\, w^{k,-}_i$$
(you should confirm this by comparing with algorithm 2.4), and our representation of the posterior is
$$\{(s^{k,-}_i,\; p(Y_i = y_i \mid X_i = s^{k,-}_i)\, w^{k,-}_i)\}.$$

The Tracking Algorithm. In principle, we now have most of a tracking algorithm; the only missing step is to explain where the samples of $p(X_0)$ came from. The easiest thing to do here is to start with a diffuse prior of a special form that is easily sampled (a Gaussian with large covariance might do it) and give each of these samples a weight of 1. It is a good idea to implement this tracking algorithm to see how it works (exercises!); you will notice that it works poorly even on the simplest problems (figure 2.3 compares estimates from this algorithm to exact expectations computed with a Kalman filter). The algorithm gives bad estimates because most samples represent no more than wasted computation. In jargon, the samples are called particles.

If you implement this algorithm, you will notice that weights get small very fast; this isn't obviously a problem, because the mean value of the weights is cancelled in the division, so we could at each step divide the weights by their mean value.
FIGURE 2.3: The simple particle filter behaves very poorly, as a result of a phenomenon called sample impoverishment, which is rather like quantisation error. In this example, we have a point drifting on the line (i.e. $x_i \sim N(x_{i-1}, \sigma^2)$). The measurements are corrupted by additive Gaussian noise. In this case, we can get an exact representation of the posterior using a Kalman filter. In the figure on the left, we compare a representation obtained exactly using a Kalman filter with one computed from simple particle filtering. We show the mean of the posterior as a point with a one standard deviation bar (previously we used three standard deviations, but that would make these figures difficult to interpret). The mean obtained using a Kalman filter is given as an 'x'; the mean obtained using a particle filter is given as an 'o'; we have offset the standard deviation bars from one another so as to make the phenomenon clear. Notice that the mean is poor, but the standard deviation estimate is awful, and gets worse as the tracking proceeds. In particular, the standard deviation estimate woefully underestimates the standard deviation; this could mislead a user into thinking the tracker was working and producing good estimates, when in fact it is hopelessly confused. The figure on the right indicates what is going wrong; we plot the tracks of ten particles, randomly selected from the 100 used. Note that relatively few particles ever lie within one standard deviation of the mean of the posterior; in turn, this means that our representation of $P(x_{i+1} \mid y_0, \ldots, y_i)$ will tend to consist of many particles with very low weight, and only one with a high weight. This means that the density is represented very poorly, and the error propagates.

If you implement this step, you will notice that very quickly one weight becomes close to one and all others are extremely small. It is a fact that, in the simple particle filter, the variance of the weights cannot decrease with $i$ (meaning that, in general, it will increase and we will end up with one weight very much larger than all others). If the weights are small, our estimates of integrals are likely to be poor. In particular, a sample with a small weight is positioned at a point where $f(u)$ is much smaller than $s(u)$; in turn (unless we want to take an expectation of a function which is very large at this point) this sample is likely to contribute relatively little to the estimate of the integral. Generally, the way to get accurate estimates of integrals is to have samples that lie where the integral is likely to be large; we certainly don't want to miss these points. We are unlikely to want to take expectations of functions that vary quickly, and so we would like our samples to lie where $f(u)$ is large.
In turn, this means that a sample whose weight $w^i$ is small represents a waste of resources; we'd rather replace it with another sample with a large weight. This means that the effective number of samples is decreasing: some samples make no significant contribution to the expectations we might compute, and should ideally be replaced (figure 2.3 illustrates this important effect). In the following section, we describe ways of maintaining the set of particles that lead to effective and useful particle filters.

2.2.3 A Workable Particle Filter

Particles with very low weights are fairly easily dealt with: we will adjust the collection of particles to emphasize those that appear to be most helpful in representing the posterior. This will help us deal with another difficulty, too. In discussing the simple particle filter, we did not discuss how many samples there were at each stage; if, at the prediction stage, we drew several samples of $P(X_i \mid X_{i-1} = s^{k,+}_{i-1})$ for each $s^{k,+}_{i-1}$, the total pool of samples would grow as $i$ got bigger. Ideally, we would have a constant number of particles $N$. All this suggests that we need a method to discard samples, ideally concentrating on discarding unhelpful samples. There are a number of strategies that are popular.

Resampling the Prior. At each step $i$, we have a representation of $P(X_{i-1} \mid y_0, \ldots, y_{i-1})$ via weighted samples. This representation consists of $N$ (possibly distinct) samples, each with an associated weight. Now in a sampled representation, the frequency with which samples appear can be traded off against the weight with which they appear. For example, assume we have a sampled representation of $P(U)$ consisting of $N$ pairs $(s^k, w^k)$. Form a new set of samples consisting of a union of $N_k$ copies of $(s^k, 1)$, for each $k$. If

$$\frac{N_k}{\sum_k N_k} = \frac{w^k}{\sum_k w^k},$$

this new set of samples is also a representation of $P(U)$ (you should check this). Furthermore, if we take a sampled representation of $P(U)$ using $N$ samples, and draw $N$ elements from this set with replacement, uniformly and at random, the result will be a representation of $P(U)$, too (you should check this, too).

This suggests that we could (a) expand the sample set and then (b) subsample it to get a new representation of $P(U)$. This representation will tend to contain multiple copies of samples that appeared with high weights in the original representation. This procedure is equivalent to the rather simpler process of making $N$ draws with replacement from the original set of samples, using the weights $w^i$ as the probability of drawing a sample. Each sample in the new set would have weight 1; the new set would predominantly contain samples that appeared in the old set with large weights. This process of resampling might occur at every frame, or only when the variance of the weights is too high.
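A minimal sketch of the simpler draw-with-replacement procedure just described (Python/NumPy; the sample values and weights are invented for the example) is:

```python
import numpy as np

rng = np.random.default_rng(6)

# A weighted sampled representation of P(U): samples s^k with weights w^k (made up).
s = np.array([0.0, 1.0, 2.0, 3.0])
w = np.array([0.05, 0.05, 0.1, 0.8])

# Make N draws with replacement, using the weights as the drawing probabilities.
N = len(s)
idx = rng.choice(N, size=N, p=w / w.sum())
s_new = s[idx]
w_new = np.ones(N)        # each resampled particle now has weight 1

# s_new will mostly contain copies of the high-weight sample (here, 3.0).
print(s_new, w_new)
```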
Resampling Predictions. A slightly different procedure is to generate several samples of $P(X_i \mid X_{i-1} = s^{k,+}_{i-1})$ for each $s^{k,+}_{i-1}$, then make $N$ draws, with replacement, from this set using the weights $w^i$ as the probability of drawing a sample, to get $N$ particles. Again, this process will emphasize particles with larger weight over those with smaller weights.

Algorithm 2.5: A practical particle filter resamples the posterior.

Initialization: Represent $P(X_0)$ by a set of $N$ samples
$$\{(s^{k,-}_0, w^{k,-}_0)\},$$
where $s^{k,-}_0 \sim P_s(S)$ and $w^{k,-}_0 = P(s^{k,-}_0)/P_s(S = s^{k,-}_0)$.
Ideally, $P(X_0)$ has a simple form and $s^{k,-}_0 \sim P(X_0)$ and $w^{k,-}_0 = 1$.

Prediction: Represent $P(X_i \mid y_0, \ldots, y_{i-1})$ by
$$\{(s^{k,-}_i, w^{k,-}_i)\},$$
where $s^{k,-}_i = f(s^{k,+}_{i-1}) + \xi^k$, $w^{k,-}_i = w^{k,+}_{i-1}$, and $\xi^k \sim N(0, \Sigma_d)$.

Correction: Represent $P(X_i \mid y_0, \ldots, y_i)$ by
$$\{(s^{k,+}_i, w^{k,+}_i)\},$$
where $s^{k,+}_i = s^{k,-}_i$ and $w^{k,+}_i = P(Y_i = y_i \mid X_i = s^{k,-}_i)\, w^{k,-}_i$.

Resampling: Normalise the weights so that $\sum_k w^{k,+}_i = 1$ and compute the variance of the normalised weights. If this variance exceeds some threshold, then construct a new set of samples by drawing, with replacement, $N$ samples from the old set, using the weights as the probability that a sample will be drawn. The weight of each sample is now $1/N$.

The Consequences of Resampling. Figure 2.4 illustrates the improvements that can be obtained by resampling. Resampling is not a uniformly benign activity, however: it is possible, but unlikely, to lose important particles as a result of resampling, and resampling can be expensive computationally if there are many particles.
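The sketch below (Python/NumPy) is one way to implement algorithm 2.5 for the one-dimensional drift example of figures 2.3 and 2.4; the noise levels, the number of particles, and the threshold on the weight variance are illustrative choices rather than values given in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100                      # number of particles
sigma_d, sigma_m = 0.5, 1.0  # dynamic and measurement noise (illustrative)

def f(x):
    # Drift-on-the-line dynamics: x_i = x_{i-1} (noise is added in predict).
    return x

# Initialization: a diffuse Gaussian prior, sampled directly, so all weights are 1.
s = rng.normal(0.0, 10.0, size=N)
w = np.ones(N)

def predict(s, w):
    # s^{k,-}_i = f(s^{k,+}_{i-1}) + xi^k,   w^{k,-}_i = w^{k,+}_{i-1}
    return f(s) + rng.normal(0.0, sigma_d, size=s.shape), w

def correct(s, w, y):
    # w^{k,+}_i = P(Y_i = y_i | X_i = s^{k,-}_i) w^{k,-}_i, Gaussian measurement model
    like = np.exp(-0.5 * ((y - s) / sigma_m) ** 2)
    return s, w * like

def maybe_resample(s, w, var_threshold=1.0 / N):
    wn = w / w.sum()
    if np.var(wn) > var_threshold:          # resample only when the weights degenerate
        idx = rng.choice(N, size=N, p=wn)   # draw with replacement by weight
        return s[idx], np.full(N, 1.0 / N)
    return s, w

# Run the filter on synthetic data generated from the same model.
x_true, track = 0.0, []
for i in range(40):
    x_true = x_true + rng.normal(0.0, sigma_d)
    y = x_true + rng.normal(0.0, sigma_m)
    s, w = predict(s, w)
    s, w = correct(s, w, y)
    s, w = maybe_resample(s, w)
    wn = w / w.sum()
    mu = np.sum(wn * s)
    sd = np.sqrt(np.sum(wn * (s - mu) ** 2))
    track.append((mu, sd))

print(track[-1])  # posterior mean and standard deviation at the last step
```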
Algorithm 2.6: An alternative practical particle filter.

Initialization: Represent $P(X_0)$ by a set of $N$ samples
$$\{(s^{k,-}_0, w^{k,-}_0)\},$$
where $s^{k,-}_0 \sim P_s(S)$ and $w^{k,-}_0 = P(s^{k,-}_0)/P_s(S = s^{k,-}_0)$.
Ideally, $P(X_0)$ has a simple form and $s^{k,-}_0 \sim P(X_0)$ and $w^{k,-}_0 = 1$.

Prediction: Represent $P(X_i \mid y_0, \ldots, y_{i-1})$ by
$$\{(s^{k,l,-}_i, w^{k,l,-}_i)\},$$
where $s^{k,l,-}_i = f(s^{k,+}_{i-1}) + \xi^l$, $w^{k,l,-}_i = w^{k,+}_{i-1}$, and $\xi^l \sim N(0, \Sigma_d)$; the free index $l$ indicates that each $s^{k,+}_{i-1}$ generates $M$ different values of $s^{k,l,-}_i$. This means that there are now $MN$ particles.

Correction: We reindex the set of $MN$ samples by $k$. Represent $P(X_i \mid y_0, \ldots, y_i)$ by
$$\{(s^{k,+}_i, w^{k,+}_i)\},$$
where $s^{k,+}_i = s^{k,-}_i$ and $w^{k,+}_i = P(Y_i = y_i \mid X_i = s^{k,-}_i)\, w^{k,-}_i$.

Resampling: As in algorithm 2.5.

2.2.4 Ifs, Ands and Buts: Practical Issues in Building Particle Filters

Particle filters have been extremely successful in many practical applications in vision, but can produce some nasty surprises. One important issue has to do with the number of particles; while the expected value of an integral estimated with a sampled representation is the true value of the integral, it may require a very large number of particles before the variance of the estimator is low enough to be acceptable. It is difficult to say how many particles will be required to produce usable estimates. In practice, this problem is usually solved by experiment. Unfortunately, these experiments may be misleading.

You can (and should!) think about a particle filter as a form of search: we have a series of estimates of state, which we update using the dynamic model, and then compare to the data; estimates which look as though they could have yielded the data are kept, and the others are discarded. The difficulty is that we may miss good hypotheses.
FIGURE 2.4: Resampling hugely improves the behaviour of a particle filter. We now show a resampled particle filter tracking a point drifting on the line (i.e. $x_i \sim N(x_{i-1}, \sigma^2)$). The measurements are corrupted by additive Gaussian noise, and are the same as for figure 2.3. In the figure on the left, we compare an exact representation obtained using a Kalman filter with one computed from simple particle filtering. We show the mean of the posterior as a point with a one standard deviation bar. The mean obtained using a Kalman filter is given as an 'x'; the mean obtained using a particle filter is given as an 'o'; we have offset the standard deviation bars from one another so as to make the phenomenon clear. Notice that estimates of both mean and standard deviation obtained from the particle filter compare well with the exact values obtained from the Kalman filter. The figure on the right indicates where this improvement came from; we plot the tracks of ten particles, randomly selected from the 100 used. Because we are now resampling the particles according to their weights, particles that tend to reflect the state rather well usually reappear in the resampled set. This means that many particles lie within one standard deviation of the mean of the posterior, and so the weights on the particles tend to have much smaller variance, meaning the representation is more efficient.

This could occur if, for example, the likelihood function had many narrow peaks. We may end up with updated estimates of state that lie in some, but not all, of these peaks; this would result in good state hypotheses being missed. While this problem can (just!) be caused to occur in one dimension, it is particularly serious in high dimensions. This is because real likelihood functions can have many peaks, and these peaks are easy to miss in high dimensional spaces. It is extremely difficult to get good results from particle filters in spaces of dimension much greater than about 10. The problem can be significant in low dimensions, too; its significance depends, essentially, on how good a prediction of the likelihood we can make.

This problem manifests itself in the best-known fashion when one uses a particle filter to track people. Because there tend to be many image regions that are long, roughly straight, and coherent, it is relatively easy to obtain many narrow peaks in the likelihood function; these correspond, essentially, to cases where the configuration for which the likelihood is being evaluated has a segment lying over one of these long, straight, coherent image regions. While there are several tricks for addressing this problem (all involve refining some form of search over the likelihood) there is no standard solution yet.
2.3 TRACKING PEOPLE WITH PARTICLE FILTERS

Tracking people is difficult. The first difficulty is that there is a great deal of state to a human: there are many joint angles, etc. that may need to be represented. The second difficulty is that it is currently very hard to find people in an image; this means that it can be hard to initiate tracks. Most systems come with a rich collection of constraints that must be true before they can be used. This is because people have a large number of degrees of freedom: bits of the body move around, we can change clothing, etc., which means it is quite difficult to predict appearance.

People are typically modelled as a collection of body segments, connected with rigid transformations. These segments can be modelled as cylinders (in which case, we can ignore the top and bottom of the cylinder and any variations in view, and represent the cylinder as an image rectangle of fixed size) or as ellipsoids. The state of the tracker is then given by the rigid body transformations connecting these body segments (and perhaps, various velocities and accelerations associated with them). Both particle filters and (variants of) Kalman filters have been used to track people. Each approach can be made to succeed, but neither is particularly robust.

There are two components to building a particle filter tracker: firstly, we need a motion model and secondly, we need a likelihood model. We can use either a strong motion model (which can be obtained by attaching markers to a model and using them to measure the way the model's joint angles change as a function of time) or a weak motion model (perhaps a drift model). Strong motion models have some disadvantages: perhaps the individual we are tracking moves in a funny way; and we will need different models for walking, walking carrying a weight, jogging and running (say). The difficulty with a weak motion model is that we are pretty much explicitly acknowledging that each frame is a poor guide to the next.

Likelihood models are another source of difficulties, because of the complexity of the relationship between the tracker's state and the image. The likelihood function ($P(\text{image features} \mid \text{person present at given configuration})$) tends to have many local extrema. This is because the likelihood function is evaluated by, in essence, rendering a person using the state of the tracker and then comparing this rendering to the image. Assume that we know the configuration of the person in the previous image; to assess the likelihood of a particular configuration in the current image, we use the configuration to compute a correspondence between pixels in the current image and in the previous image. The simplest likelihood function can be obtained using the sum of squared differences between corresponding pixel values; this assumes that clothing is rigid with respect to the human body, that pixel values are independent given the configuration, and that there are no shading variations. These are all extremely dubious assumptions. Of course, we choose which aspects of an image to render and to compare; we might use edge points instead of pixel values, to avoid problems with illumination. Multiple extrema in the likelihood can be caused by: the presence of many extended coherent regions, which look like body segments, in images; the presence of many edge points unrelated to the person being tracked (this is a problem if we use edge points in the comparison); changes in illumination; and changes in the appearance of body segments caused by clothing swinging on the body.
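To make the shape of such a likelihood concrete, here is a minimal sketch (Python/NumPy) of a sum-of-squared-differences likelihood; the independent-Gaussian pixel-noise assumption and the noise level are assumptions made for this example, not choices taken from the text.

```python
import numpy as np

def ssd_log_likelihood(rendered, observed, sigma=10.0):
    """Log of P(image patch | configuration) under an independent Gaussian pixel model.

    rendered: the patch predicted by rendering the person at a candidate
              configuration; observed: the corresponding patch of the current image.
    """
    ssd = np.sum((rendered.astype(float) - observed.astype(float)) ** 2)
    return -0.5 * ssd / sigma**2   # larger (less negative) means a better match

# Toy usage: a candidate that matches the image scores higher than one that doesn't.
observed = np.full((20, 10), 100.0)
good, bad = observed + 1.0, observed + 30.0
print(ssd_log_likelihood(good, observed) > ssd_log_likelihood(bad, observed))  # True
```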
FIGURE 2.5: A typical likelihood computation identifies points in the image that provide evidence that a person is present. In one use of a particle filter for tracking people, Deutscher, Blake and Reid look for two types of evidence: the first is boundary information, and the second is non-background information. Boundary points are estimated using an edge detector, and non-background points are obtained using background subtraction; the figure on the left illustrates these types of point. On the far left, the image; center left, points near edges obtained using smoothed gradient estimates; near left, points where there is motion, obtained using background subtraction. Now each particle gives the state of the person, and so can be used to determine where each body segment lies in an image; this means we can predict the boundaries of the segments and their interiors, and compute a score based on the number of edge points near segment boundaries and the number of non-background points inside projected segments (near right shows sample points that look for edges and far right shows sample points that look for moving points). Figure from "Articulated Body Motion Capture by Annealed Particle Filtering," J. Deutscher, A. Blake and I. Reid, Proc. Computer Vision and Pattern Recognition 2000, © 2000, IEEE.

The result is a tendency for trackers to drift (see, for example, the conclusions in [Sidenbladh et al., 2000b]; the comments in [Yacoob and Davis, 2000]). In all of the examples we show, the tracker must be started by hand.

An alternative to using a strong motion model is to use a weak motion model and rely on the search component of particle filtering. The best example of this approach is a device, due to Deutscher et al., known as an annealed particle filter (figure 2.6), which essentially searches for the global extremum through a sequence of smoothed versions of the likelihood [Deutscher et al., 2000]. However, clutter creates peaks that can require intractable numbers of particles. Furthermore, this strategy requires a detailed search of a high dimensional domain (the number of people being tracked times the number of parameters in the person model plus some camera parameters).
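One plausible reading of the annealed search just described is the sketch below (Python/NumPy); the annealing schedule (the sequence of exponents), the drift noise, and the function names are assumptions made for illustration and are not taken from Deutscher et al.'s paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def annealed_weighting(particles, log_likelihood, betas=(0.2, 0.4, 0.7, 1.0),
                       drift_sigma=0.05):
    """One frame of an annealed particle filter, sketched.

    particles: (N, d) array of candidate configurations.
    log_likelihood: function mapping an (N, d) array to N log-likelihood values.
    betas: increasing exponents; a small beta flattens (smooths) the likelihood.
    """
    N = len(particles)
    for beta in betas:                       # coarse-to-fine sweep over smoothings
        logw = beta * log_likelihood(particles)
        w = np.exp(logw - logw.max())        # stabilise before normalising
        w /= w.sum()
        idx = rng.choice(N, size=N, p=w)     # resample according to the weights
        particles = particles[idx] + rng.normal(0.0, drift_sigma, particles.shape)
    return particles
```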
FIGURE 2.6: If a weak motion model is used to track a person with a particle filter, the likelihood function can create serious problems. This is because the state is high-dimensional, and there are many local peaks in the likelihood, even for a person on a black background, as in these images. It is quite possible that none of our particles are near the peaks, meaning that the filter's representation becomes unreliable. One way to deal with this is to search the likelihood by annealing it. This process creates a series of increasingly smooth approximations to the likelihood, whose peaks lie close to or on the peaks of the likelihood. We weight particles with a smoothed approximation, then resample the particles according to their weights, allow them to drift, then weight them with a less smooth approximation, etc. The result is a random search through the likelihood that should turn up the main peaks. This yields a tracker that can track people on simple backgrounds, but requires only very general motion models; the tracker illustrated above models human motion as drift. Figure from "Articulated Body Motion Capture by Annealed Particle Filtering," J. Deutscher, A. Blake and I. Reid, Proc. Computer Vision and Pattern Recognition 2000, © 2000, IEEE.

2.4 NOTES

Space has forced us to omit some topics, important in radar tracking and likely to become important in vision. Firstly, the radar community is accustomed to generating tracks automatically; this practice is not unknown in vision, but most trackers are initialized by hand. Secondly, the radar community is accustomed to tracking multiple targets, with multiple returns; this complicates data association significantly, which is now seen as a weighted matching problem and dealt with using one of the many exact algorithms for that problem. Thirdly, the radar community is accustomed to dealing with tracking for objects that can switch dynamic model; this leads to so-called IMM filters, where a series of filters with distinct dynamic models propose different updates, and these proposals are weighted by evidence and updated (for a good general summary of tracking practice in radar, see [Blackman and Popoli, 1999]). There is a little work on this topic in the vision community, but it tends to be somewhat difficult to do with a particle filter (actually, this means it needs an inconvenient number of particles; you can do pretty much anything with a particle filter, if you have enough particles).
Topic: Sampled representations.
What you must know: A probability distribution is a device for computing expectations, which are integrals. A set of samples from one probability distribution can be weighted such that a weighted sum of function values taken at the samples is a good estimate of the expectation of that function with respect to some (possibly different) probability distribution. Such a set of samples is called a sampled representation.

Topic: Particle filtering.
What you must know: A collection of sample states is maintained so that the samples represent the posterior distribution. Typically, these states are propagated forward in time according to the dynamics, compared with observations, weighted according to the observation likelihood, and then resampled to remove samples with low weights. The algorithm should be seen as a randomised search of the likelihood, using the prior as a proposal process. This approach is useful and powerful if (a) the state space is of low dimension and (b) the likelihood function is not too complicated.

Topic: Applications of particle filters.
What you must know: Particle filters have been widely applied to tracking kinematic representations of human figures. In this application, they have had considerable success, but currently require manual initialisation; this is a significant impediment to their adoption in practical applications.

Chapter summary for chapter 2: When object dynamics is not linear, the posterior distributions encountered in tracking problems tend to have forms that are difficult to represent. Particle filtering is an inference algorithm that is well suited to tracking non-linear dynamics.

The Particle Filter

We have been able to provide only a brief overview of a subject that is currently extremely active. We have deliberately phrased our discussion rather abstractly, so as to bring out the issues that are most problematic, and to motivate a view of particle filters as convenient approximations. Particle filters have surfaced in a variety of forms in a variety of literatures. The statistics community, where they originated, knows them as particle filters (e.g. [Kitagawa, 1987]; see also the collection [Doucet et al., 2001]). In the AI community, the method is sometimes called survival of the fittest [Kanazawa et al., 1995]. In the vision community, the method is sometimes known as condensation [Isard and Blake, 1996; Blake and Isard, 1996; Blake and Isard, 1998].

Particle filters have been the subject of a great deal of work in vision. Much of the work attempts to sidestep the difficulties with likelihood functions that we sketched in the particle filtering section (see, in particular, the annealing method of [Deutscher et al., 2000] and the likelihood corrections of [Sullivan et al., 1999]). Unfortunately, all uses of the particle filter have been relentlessly top-down, in the sense that one updates an estimate of state and then computes some comparison between an image and a rendering, which is asserted to be a likelihood. While this strategy represents an effective end-run around data association, it means that we are committed to searching rather nasty likelihoods.
There is a strong analogy between particle filters and search. This can be used to give some insight into what they do and where they work well. For example, a high dimensional likelihood function with many peaks presents serious problems to a particle filter. This is because there is no reason to believe that any of the particles each step advances will find a peak. This is certainly not an intrinsic property of the technique (which is just an algorithm) and is almost certainly a major strategic error. The consequence of this error is that one can track almost anything with a particle filter (as long as the dimension of the state space is small enough) but it has to be initialized by hand. This particular ghost needs to be exorcised from the party as soon as possible. The exorcism will probably involve thinking about how to come up with clean probabilistic models that (a) allow fast bottom-up inference and (b) don't involve tangling with likelihoods with as complex a form as those commonly used. We expect an exciting struggle with this problem over the next few years.

Particle filters are an entirely general inference mechanism (meaning that they can be used to attack complex inference problems uniting high level and low level vision [Isard and Blake, 1998b; Isard and Blake, 1998a]). This should be regarded as a sign that it can be very difficult to get them to work, because there are inference problems that are, essentially, intractable. One source of difficulties is the dimension of the state space: it is silly to believe that one can represent the covariance of a high-dimensional distribution with a small number of particles, unless the covariance is very strongly constrained. A particular problem is that it can be quite hard to tell when a particle filter is working; obviously, if the tracker has lost track, there is a problem, but the fact that the tracker seems to be keeping track is not necessarily a guarantee that all is well. For example, the covariance estimates may be poor; we need to ask for how long the tracker will keep track; etc.

One way to simplify this problem is to use tightly parametrised motion models. This reduces the dimension of the state space in which we wish to track, but at the cost of not being able to track some objects or of being compelled to choose which model to use. This approach has been extremely successful in applications like gesture recognition [Black and Jepson, 1998]; tracking moving people [Sidenbladh et al., 2000a]; and classifying body movements [Rittscher and Blake, 1999]. A tracker could track the state of its own platform, instead of tracking a moving object [Dellaert et al., 1999].

There are other methods for maintaining approximations of densities. One might, for example, use a mixture of Gaussians with a constant number of components. It is rather natural to do data association by averaging, which will result in the number of elements in the mixture going up at each step; one is then supposed to cluster the elements and cull some components. We haven't seen this method used in vision circles yet.

Starting a People Tracker

Desiderata for a tracking application are:

that tracks are initiated automatically;
that tracks can be discarded automatically, as necessary (this means that the occasional erroneous track won't affect the count of total objects);

that the tracker can be shown to work robustly over long sequences of data.

We discussed relatively few of the many kinematic human trackers, because none can meet these tests. It would be nice if this remark were obsolete by the time this book reaches its readers, but we don't think this will be the case (which is why we made it!). Tracking people on a general background remains extremely challenging; the difficulty is knowing how to initiate the track, which is hard because the variations in the appearance of clothing mean that it is generally difficult to know which pixels come from a person. Furthermore, the inference problem is very difficult, because the conditional independence assumptions that simplify finding people no longer apply: the position of the upper arm in frame $n$, say, depends on both the position of the torso in frame $n$ and the position of the upper arm in frame $n-1$. It is possible to evade this difficulty in the first instance by assembling multi-frame motion vectors [Song et al., 1999; Song et al., 2000], but these too have unpleasant dependencies over time (the motion of the upper arm in frame $n$, etc.), and the consequences of ignoring these dependencies are unknown.

Typically, current person trackers either initialize the tracker by hand, use aggressively simplified backgrounds which have high contrast with the moving person, or use background subtraction. These tricks are justified, because they make it possible to study this (extremely important) problem, but they yield rather unconvincing applications. There is currently (mid 2001) no person tracker that represents the configuration of the body and can start automatically; all such trackers use manual starting methods.

One way to start such a tracker would be to find all possible people, and then track them. But finding people is difficult, too. No published method can find clothed people in arbitrary configurations in complex images. There are three standard approaches to finding people described in the literature. Firstly, the problem can be attacked by template matching (examples include [Oren et al., 1997], where upright pedestrians with arms hanging at their side are detected by a template matcher; [Niyogi and Adelson, 1995; Liu and Picard, 1996; Cutler and Davis, 2000], where walking is detected by the simple periodic structure that it generates in a motion sequence; and [Wren et al., 1995; Haritaoglu et al., 2000], which rely on background subtraction, that is, a template that describes "non-people"). Matching templates to people (rather than to the background) is inappropriate if people are going to appear in multiple configurations, because the number of templates required is too high. This motivates the second approach, which is to find people by finding faces (sections ?? and ??, and [Poggio and Sung, 1995; Rowley et al., 1996a; Rowley et al., 1996b; Rowley et al., 1998a; Rowley et al., 1998b; Sung and Poggio, 1998]). The approach is most successful when frontal faces are visible. The third approach is to use the classical technique of search over correspondence (search over correspondences between point features is an important early formulation of object recognition; the techniques we describe have roots in [Faugeras and Hebert, 1986; Grimson and Lozano-Pérez, 1987; Thompson and Mundy, 1987; Huttenlocher and Ullman, 1987]). In this approach, we search over correspondences between image configurations and object features.
There are a variety of examples in the literature (for a variety of types of object; see, for example, [Huang et al., 1997; Ullman, 1996]). Perona and collaborators find faces by searching for correspondences between eyes, nose and mouth and image data, using a search controlled by probabilistic considerations [Leung et al., 1995; Burl et al., 1995]. Unclad people are found by [Fleck et al., 1996; Forsyth and Fleck, 1999], using a correspondence search between image segments and body segments, tested against human kinematic constraints. A much improved version of this technique, which learns the model from data, appears in [Ioffe and Forsyth, 1998].
II APPENDIX: THE EXTENDED KALMAN FILTER, OR EKF

We consider non-linear dynamic models of the form

$$x_i \sim N(f(x_{i-1}); \Sigma_d).$$

Again, we will need to represent

$$P(x_i \mid y_0, \ldots, y_{i-1})$$

(for prediction) and

$$P(x_i \mid y_0, \ldots, y_i)$$

(for correction). We take the position that these distributions can be represented by supplying a mean and a covariance. Typically, the representation works only for distributions that look rather like normal distributions: a big peak at one spot, and then a fast falloff. To obtain an extended Kalman filter, we linearize the dynamics about the current operating point, and linearize the measurement model. We do not derive the filter equations (it is a dull exercise in Laplace's approximation to integrals), but simply present them in algorithm 2.7. We write the Jacobian of a function $g$ (this is the matrix whose $l, m$'th entry is $\partial g_l / \partial x_m$) as $\mathcal{J}(g)$, and when we want to show that it has been evaluated at some point $x_j$, we write $\mathcal{J}(g; x_j)$.
Algorithm 2.7: The extended Kalman filter maintains estimates of the mean and covariance of the various distributions encountered while tracking a state variable of some fixed dimension using the given non-linear dynamic model.

Dynamic Model:
$$x_i \sim N(f(x_{i-1}), \Sigma_d)$$
$$y_i \sim N(h(x_i), \Sigma_m)$$

Start Assumptions: $x_0^-$ and $\Sigma_0^-$ are known.

Update Equations: Prediction
$$x_i^- = f(x_{i-1}^+)$$
$$\Sigma_i^- = \Sigma_d + \mathcal{J}(f; x_{i-1}^+)\, \Sigma_{i-1}^+\, \mathcal{J}(f; x_{i-1}^+)^T$$

Update Equations: Correction
$$K_i = \Sigma_i^-\, \mathcal{J}(h; x_i^-)^T \left[ \mathcal{J}(h; x_i^-)\, \Sigma_i^-\, \mathcal{J}(h; x_i^-)^T + \Sigma_m \right]^{-1}$$
$$x_i^+ = x_i^- + K_i \left[ y_i - h(x_i^-) \right]$$
$$\Sigma_i^+ = \left[ \mathrm{Id} - K_i\, \mathcal{J}(h; x_i^-) \right] \Sigma_i^-$$
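To make the update equations concrete, here is a minimal EKF sketch (Python/NumPy) for a generic model; the dynamics, measurement function, and noise levels at the bottom are invented purely for illustration.

```python
import numpy as np

def ekf_step(x_prev, P_prev, y, f, h, Jf, Jh, Sigma_d, Sigma_m):
    """One predict/correct cycle of the extended Kalman filter (algorithm 2.7).

    x_prev, P_prev: posterior mean and covariance from step i-1.
    f, h: dynamics and measurement functions; Jf, Jh: their Jacobians.
    """
    # Prediction
    x_minus = f(x_prev)
    F = Jf(x_prev)
    P_minus = Sigma_d + F @ P_prev @ F.T

    # Correction
    H = Jh(x_minus)
    S = H @ P_minus @ H.T + Sigma_m
    K = P_minus @ H.T @ np.linalg.inv(S)
    x_plus = x_minus + K @ (y - h(x_minus))
    P_plus = (np.eye(len(x_prev)) - K @ H) @ P_minus
    return x_plus, P_plus

# Illustrative example: 2D state with mildly non-linear dynamics and a
# non-linear measurement of the state's norm.
f  = lambda x: np.array([x[0] + 0.1 * np.sin(x[0]), 0.9 * x[1]])
Jf = lambda x: np.array([[1 + 0.1 * np.cos(x[0]), 0.0], [0.0, 0.9]])
h  = lambda x: np.array([np.hypot(x[0], x[1])])
Jh = lambda x: np.array([[x[0], x[1]]]) / np.hypot(x[0], x[1])

x, P = np.array([1.0, 1.0]), np.eye(2)
x, P = ekf_step(x, P, y=np.array([1.5]), f=f, h=h, Jf=Jf, Jh=Jh,
                Sigma_d=0.01 * np.eye(2), Sigma_m=np.array([[0.1]]))
print(x, P)
```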