A Background Layer Model for Object Tracking through Occlusion

Yue Zhou and Hai Tao
Department of Computer Engineering
University of California, Santa Cruz, CA 95064
{zhou,tao}@soe.ucsc.edu

Abstract

Motion layer estimation has recently emerged as a promising object tracking method. In this paper, we extend previous research on layer-based trackers by introducing the concept of background occluding layers and explicitly inferring the depth ordering of foreground layers. The background occluding layers lie in front of, behind, and in between foreground layers. Each pixel in the background regions belongs to one of these layers and occludes all the foreground layers behind it. Together with the foreground ordering, the complete information necessary for reliably tracking objects through occlusion is included in our representation. A MAP estimation framework is developed to simultaneously update the motion layer parameters, the ordering parameters, and the background occluding layers. Experimental results show that under various conditions with occlusion, including situations with moving objects undergoing complex motions or having complex interactions, our tracking algorithm is able to handle many difficult tracking tasks reliably.

1 Introduction

In recent years, dynamic motion layer estimation has emerged as a promising approach for object tracking [4],[9],[5],[8]. A motion layer is a region in an image that undergoes a coherent motion. The two chief problems in motion layer based tracking algorithms are how to represent motion layers and how to estimate the parameters associated with these layers. With the dynamic motion layer representation, the tracking problem can be formulated as the maximum a posteriori (MAP) estimation of a Hidden Markov Model (HMM) [8]. In a typical motion layer estimation process, both foreground objects and background are modeled and they compete with each other to maximize the joint posterior probability. This is one of the main reasons behind the success of layer trackers. In terms of layer representation, previous work considers only motion, segmentation, and appearance. This object representation works well for tracking multiple objects when no occlusion is present.
However, it is insufficient for accommodating occlusion caused by foreground or background objects and for effectively modeling the spatial relationship among moving objects and the background.

Previous work on motion layer analysis and motion layer based tracking employed global or local motion representations [10],[11],[12]. The object shape and appearance are often modeled as Gaussian distributions [9], Markov Random Fields (MRF) [10], or other mixture models []. To handle object occlusion in motion analysis and tracking, an explicit generative occlusion boundary model was proposed in [3]. To handle self-occlusion on the foreground objects and adaptively change the shapes of the foreground objects to allow the tracking of non-rigid motion, [4] proposed a combined parametric shape and motion model with depth ordering to represent the visibility of each layer. The Transformed Hidden Markov Model (THMM) algorithm [6] includes both motion and appearance representation as the parameters in a generative model and formulates the tracking problem as the learning of these parameters.

In this paper, we propose a novel scene representation with ordering information that contains complete information for inferring the foreground object and the background occlusion. In this representation, each moving object is modeled as a foreground layer. Some background objects such as trees may occlude foreground motion layers. To model the depth difference in the background, we introduce background layers that lie between foreground layers. This is different from the previous methods, where the background region is modeled as a single layer. In addition, the depth ordering of foreground layers is treated as a state variable to explicitly model the depth relations among foreground objects. Unlike the global shape model in [9], we also allow gradual but arbitrary changes in object shapes, which are captured in the foreground mask. Based on this new layer representation, we propose an estimation algorithm that estimates the motion layer parameters, the foreground ordering, and the background layers in a MAP framework.
The overall formulation can be written as

$$\arg\max_{\Lambda_t} P(\Lambda_t \mid \Lambda_{t-1}, I_t, \ldots, I_0) \tag{1}$$

where $\Lambda_t$ is the state of the tracker at time $t$, and $I_t$ is the image observation.

The rest of the paper is organized as follows. The details of the proposed layer representation are presented in Section 2. Section 3 describes the MAP estimation of the layer state. Section 4 describes the implementation and demonstrates the experimental results. Some discussions and conclusions can be found in Section 5.

2 Dynamic layer modeling

2.1 Depth ordering of the foreground and background layers

In our proposed approach, a dynamic scene is represented by foreground and background layers. As shown in Figure 1, foreground motion layers are ordered according to their relative depth from front to back. The front-most layer is layer 1. Some background regions, which are defined as the image areas that do not move, may lie in between foreground layers. These background regions are in front of some foreground objects and are called occluding background layers. In our model, as shown in Figure 1, there is one background layer between every two neighboring foreground layers. There is also one background layer that is behind all foreground layers and one layer that is in front of all foreground layers (layer 1). If there are L foreground layers, then there are L + 1 background layers.

Figure 1. The ordered layer model (foreground and background layers, ordered from front to back).

Figure 2. An example of the background layers.

Each foreground layer is described by its motion, shape, and appearance. Each background layer is described by its shape and appearance. If the background also moves, all background layers share a single motion. At any time $t$, the set of all layer parameters is called the state of the tracker and is denoted by $\Lambda_t$. In later sections, we will describe in detail the models for these state variables. Figure 2 shows a real example of a video frame and the top-most occluding background layer. In this example, only the shape of the front-most background layer is shown.

Using the above layer model, from the object point of view, each object belongs to one of the foreground layers. We explicitly model and estimate the layer assignment for the objects in the scene. The depth ordering of L foreground objects at time $t$ is denoted as $O_t = [o_{t,1}, o_{t,2}, \ldots, o_{t,L}]$.
The integer variable $o_{t,j} \in [1, L]$, and $o_{t,k} \neq o_{t,l}$ iff $k \neq l$. If we assume that objects do not interleave with each other, there are $L!$ possible layer assignments for L foreground objects. We further assume that the depth ordering is a random variable with a uniform distribution. This means all the permutations have the same probability and thus have the same prior probability $P(O_{t,i}) = 1/L!$, where $O_{t,i}$ is an arbitrary layer ordering configuration. It should be noted that the foreground layer ordering, together with the shape, appearance, and motion information of all foreground and background layers, provides the complete information for occlusion reasoning.

2.2 Motion models

We describe the background motion using a 2D affine model and estimate this model using the so-called direct method [1]. All background layers share the same motion. Each foreground layer undergoes a 2D rigid motion, which is described using position $\mu$, orientation $\omega$, scaling factor $s$, and their temporal derivatives. A constant velocity model is used to describe the dynamics of the foreground layers. If we denote the motion parameters of a layer as $\theta_t = [\mu_t, \omega_t, s_t, \dot\mu_t, \dot\omega_t, \dot s_t]$, then the motion dynamics is written as

$$P(\theta_t \mid \theta_{t-1}) = N(\theta_t : \Phi\theta_{t-1}, Q)$$

where $\theta_t$ is the motion parameter at time $t$, $\Phi$ is the standard transition matrix for a constant velocity model, and the notation $N(x : \mu, R)$ denotes a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $R$.

2.3 Shape models of the foreground and background layers

Each foreground or background layer is associated with a shape map. At each pixel location, the value of the shape map is the probability that the object in the layer is present at that pixel location (it may not be visible, though). For the foreground layer $j$ and position $x_i$ at time $t$, we denote the value of the shape map as $\tau_{t,j}(x_i)$. For the background layer $j$, we use the notation $\pi_{t,j}(x_i)$ to represent its shape map. One difference between the foreground shape map and the background shape map is that for the background, the probabilistic values of all shape maps at each pixel must sum up to 1. This reveals our underlying assumption that there is only one background surface for each pixel. This is a reasonable assumption because even if there are more surfaces, they will not be observable anyway.

2.3.1 Layer visibility

Once the shape maps are defined for all layers, for each pixel $x_i$, we can compute the probability that the $j$th foreground layer is visible. This is the probability of the joint event that background layers 1 to $j$ are absent, foreground layers 1 to $j-1$ are absent, and the $j$th foreground layer is present at $x_i$. The first probability is $1 - \sum_{l=1}^{j} \pi_l(x_i)$ because there is only one background surface. The second probability is $\prod_{s=1}^{j-1} [1 - \tau_s(x_i)]$, and the third probability is $\tau_j(x_i)$ (for simplicity, we ignore the subscript $t$). As a result, the probability of the $j$th foreground layer being visible at $x_i$ is

$$P_j(x_i) = \tau_j(x_i) \Big(1 - \sum_{l=1}^{j} \pi_l(x_i)\Big) \prod_{s=1}^{j-1} [1 - \tau_s(x_i)] \tag{2}$$

Similarly, the probability of observing the $j$th background layer at $x_i$ is

$$P_{B,j}(x_i) = \pi_j(x_i) \prod_{k=1}^{j-1} (1 - \tau_k(x_i)) \tag{3}$$

and the probability of observing one of the background layers is

$$P_B(x_i) = \sum_{j=1}^{L+1} \pi_j(x_i) \prod_{k=1}^{j-1} (1 - \tau_k(x_i)) \tag{4}$$

2.3.2 Shape dynamics

If we assume the shape of the foreground does not change dramatically, then we can use a constant value Gaussian model to describe the dynamics of the shape changes over time. More specifically,

$$P(\tau_{t,j}(x_i) \mid \tau_{t-1,j}) = N\big(\tau_{t,j}(x_i) : \gamma + \tau_{t-1,j}(R(\dot\omega_t)(x_i - \dot\mu_t)/\dot s_t),\ \sigma_\tau^2\big) \tag{5}$$

where $\gamma$ represents the uncertainty in the shape of the layer. The transformation $R(\dot\omega)(x - \dot\mu)/\dot s$ is used to align the shape maps.

2.4 Appearance model

The appearance of foreground layer $j$ is defined in the local coordinate system and is denoted as $A_{t,j}$. We assume that the image observation model is a Gaussian distribution with the appearance as the mean, or

$$P(I_t(x_i) \mid A_{t,j}(x_i)) = N(I_t(x_i) : A_{t,j}(x_i), \sigma_I^2) \tag{6}$$

where $\sigma_I^2$ is the variance of the image observation.
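The layer visibility probabilities defined above feed directly into this observation model, so it may help to see them computed for a whole image. The following is a minimal NumPy sketch, not the authors' implementation: the function name `layer_visibility`, the array layout, and the zero-based layer indexing (index 0 is the front-most layer) are assumptions made for illustration.

```python
import numpy as np

def layer_visibility(tau, pi):
    """Per-pixel visibility of ordered foreground and background layers.

    tau: array (L, H, W) of foreground shape maps, front-most layer first.
    pi:  array (L + 1, H, W) of background shape maps; pi[j] lies directly in
         front of foreground layer j, and pi[L] lies behind all of them.
         pi is assumed to sum to 1 over its first axis.
    Returns (p_fg, p_bg) with the same shapes as tau and pi.
    """
    L = tau.shape[0]
    # fg_clear[j] = prod_{k < j} (1 - tau[k]): no foreground layer in front of j.
    fg_clear = np.ones((L + 1,) + tau.shape[1:])
    fg_clear[1:] = np.cumprod(1.0 - tau, axis=0)
    # bg_clear[j] = 1 - sum_{l <= j} pi[l]: no background layer in front of
    # foreground layer j (a single background surface per pixel is assumed).
    bg_clear = 1.0 - np.cumsum(pi[:L], axis=0)
    p_fg = tau * bg_clear * fg_clear[:L]   # foreground layer j is visible
    p_bg = pi * fg_clear                   # background layer j is observed
    return p_fg, p_bg
```

Under the single-background-surface assumption, the foreground and background visibilities at a pixel should add up to 1, which gives a cheap sanity check on any implementation of these formulas.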
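Like the motion and shape models, we also assume that the temporal changes of the layer appearance follow a constant value Gaussian distribution. This is formulated as

$$P(A_{t,j}(x_i) \mid A_{t-1,j}(x_i)) = N(A_{t,j}(x_i) : A_{t-1,j}(x_i), \sigma_A^2) \tag{7}$$

where $\sigma_A^2$ is the appearance uncertainty that accounts for the appearance variations.

2.5 The MAP estimation

The tracking procedure can be considered as the maximization of the posterior probability

$$\arg\max_{\Lambda_t} P(\Lambda_t \mid \Lambda_{t-1}, I_t, \ldots, I_0) \tag{8}$$

Using Bayes' rule and the HMM model,

$$P(\Lambda_t \mid \Lambda_{t-1}, I_t, \ldots, I_0) \propto P(I_t \mid \Lambda_t)\, P(\Lambda_t \mid \Lambda_{t-1}) \tag{9}$$

where $P(\Lambda_t \mid \Lambda_{t-1})$ is the state prior function, and $P(I_t \mid \Lambda_t)$ is the likelihood function. Based on our models in the previous sections, the prior function is computed as

$$P(\Lambda_t \mid \Lambda_{t-1}) = P_{order}\, P_{fg\_shape}\, P_{bg\_shape}\, P_{motion}\, P_{appearance} \tag{10}$$

where

$$P_{order} = P(O_t \mid O_{t-1})$$
$$P_{fg\_shape} = \prod_{j=1}^{L} \prod_{i=1}^{N_j} P(\tau_{t,j}(x_i) \mid \tau_{t-1,j}(x_i))$$
$$P_{bg\_shape} = \prod_{j=1}^{L+1} \prod_{i=1}^{N_j} P(\pi_{t,j}(x_i) \mid \pi_{t-1,j}(x_i))$$
$$P_{motion} = \prod_{j=1}^{L} P(\theta_{t,j} \mid \theta_{t-1,j})$$
$$P_{appearance} = \prod_{j=1}^{L} \prod_{i=1}^{N_j} P(A_{t,j}(x_i) \mid A_{t-1,j}(x_i))$$

Here we assume the image has L foreground layers, L + 1 background layers, and $N_j$ pixels on the object in layer $j$.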
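To illustrate the motion term of this prior, the sketch below evaluates the constant velocity Gaussian dynamics for one foreground layer in log form. It is an illustration under stated assumptions rather than the authors' code: the helper names, the unit time step, and the state layout (parameter values first, then their temporal derivatives, with a 2D position so that the state has 8 entries) are all hypothetical.

```python
import numpy as np

def constant_velocity_matrix(n_params, dt=1.0):
    """Transition matrix for a state laid out as [values, derivatives]."""
    phi = np.eye(2 * n_params)
    # Each value advances by dt times its derivative; derivatives are kept.
    phi[:n_params, n_params:] = dt * np.eye(n_params)
    return phi

def log_motion_prior(theta, theta_prev, Q, dt=1.0):
    """log N(theta : Phi @ theta_prev, Q) for one foreground layer."""
    n_params = theta.size // 2
    resid = theta - constant_velocity_matrix(n_params, dt) @ theta_prev
    _, logdet = np.linalg.slogdet(Q)
    maha = resid @ np.linalg.solve(Q, resid)   # squared Mahalanobis distance
    return -0.5 * (maha + logdet + theta.size * np.log(2.0 * np.pi))
```

Working with log probabilities turns the product over layers in the prior into a sum and avoids numerical underflow when many terms are multiplied.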

To compute the likelihood function, we first need to obtain the probabilistic distribution of the front-most layer at each pixel based on the foreground layer ordering, the foreground and background shapes, and the appearance models. More specifically, we compute the likelihood function as

$$P(I_t \mid \Lambda_t) = \prod_{i=1}^{N} \big(P_{bgo}(x_i) + P_{fgo}(x_i)\big) \tag{11}$$

where $P_{bgo}(x_i)$ and $P_{fgo}(x_i)$ represent the likelihood that one of the background or foreground layers is visible at pixel $x_i$. They can be computed as

$$P_{bgo}(x_i) = P(I(x_i) \mid B(x_i))\, P_B(x_i) \tag{12}$$

and

$$P_{fgo}(x_i) = \sum_{j=1}^{L} \big[P(I(x_i) \mid A_j(x_i))\, P_j(x_i)\big] \tag{13}$$

$P_B(x_i)$ and $P_j(x_i)$ are defined in Eq.(2)-(4).

3 Estimation of the object state

Solving Eq.(8) is a difficult optimization problem because the state space is very large. An approximate solution can be found by first decomposing the original problem into several sub-problems (see Figure 3). Optimization is then performed to solve these sub-problems sequentially. We found that in practice this approach yields feasible solutions.

Figure 3. Estimation of the state parameters (hypothesize and determine object ordering, layer motion estimation, foreground shape estimation, background shape estimation, appearance estimation).

3.1 Foreground layer ordering hypothesis

Foreground layer ordering $O_t$ is modeled as a uniformly distributed random variable. Because of this property, the estimation of $O_t$ is rather simple: the algorithm goes through all the possible values of $O_t$ and finds the one that maximizes the posterior probability. Since all the other parameter estimation steps highly depend on the depth ordering, it is computed at the beginning of each iteration.

3.2 Motion estimation

Maximizing the posterior probability in Eq.(8) w.r.t. foreground layer motion is equivalent to optimizing the function

$$\prod_{i=1}^{N} \big(P_{bgo}(x_i) + P_{fgo}(x_i)\big)\, P_{motion} \tag{14}$$

A search algorithm can be used to find the solution around the predicted position. Rotation, translation, and the scaling factor are discretized with sufficient precision for this purpose. For sequences with a moving background, a direct method [1] is used to estimate the motion parameters.

3.3 Foreground shape estimation

From Eq.(11)-(13) and Eq.(2)-(4), it can be observed that the likelihood is a linear function of each foreground shape variable $\tau_j(x_i)$ and the prior term is a Gaussian function of $\tau_j(x_i)$.
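The linearity claim can be checked numerically at a single pixel. The sketch below is a minimal illustration under assumptions, not the authors' implementation: the function names are hypothetical, layers are zero-indexed with index 0 front-most, the shape values are plain Python lists, and a single background appearance value and a shared observation variance are assumed at the pixel.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def pixel_likelihood(obs, tau, pi, fg_app, bg_app, var):
    """Background plus foreground observation likelihood at one pixel.

    tau: list of L foreground shape values; pi: list of L + 1 background shape
    values (summing to 1); fg_app: list of L appearances; bg_app: one value.
    """
    fg_clear = 1.0   # probability that no foreground layer seen so far is present
    bg_front = 0.0   # background mass in front of the current foreground layer
    total = 0.0
    for j in range(len(tau)):
        # Background layer j sits directly in front of foreground layer j.
        total += gaussian(obs, bg_app, var) * pi[j] * fg_clear
        bg_front += pi[j]
        total += gaussian(obs, fg_app[j], var) * tau[j] * (1.0 - bg_front) * fg_clear
        fg_clear *= 1.0 - tau[j]
    # The last background layer lies behind every foreground layer.
    total += gaussian(obs, bg_app, var) * pi[-1] * fg_clear
    return total

def linear_coefficients(j, obs, tau, pi, fg_app, bg_app, var):
    """Return (a, b) with likelihood = a * tau[j] + b, all else held fixed."""
    at_zero = pixel_likelihood(obs, tau[:j] + [0.0] + tau[j + 1:], pi, fg_app, bg_app, var)
    at_one = pixel_likelihood(obs, tau[:j] + [1.0] + tau[j + 1:], pi, fg_app, bg_app, var)
    return at_one - at_zero, at_zero
```

Because every term contains the shape value of layer j either as a factor, as one minus that value, or not at all, evaluating the likelihood at 0 and at 1 is enough to recover the slope and intercept for that layer.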
If we optimize $\tau_j(x_i)$ independently for each layer, the estimation becomes the maximization of a function of the form

$$(ax + b)\, e^{-(x - x_0)^2 / 2\sigma^2} \tag{15}$$

where $a$ and $b$ are constants that can be computed using Eq.(11)-(13) and Eq.(2)-(4). The optimal solution is 0, 1, or the root of the quadratic equation

$$a x^2 + (b - a x_0)\, x - (b x_0 + a \sigma^2) = 0 \tag{16}$$

This equation is derived by taking the derivative of the function in Eq.(15) and setting it to 0.

3.4 Background shape estimation

Estimation of the background shape is similar to the estimation of the foreground shape. However, there is one additional constraint that needs to be enforced: the values of all shape maps should sum up to 1. With this constraint, the global optimization becomes complicated. However, we can use the results in the previous frame or previous iteration as the starting point for a greedy algorithm that finds a locally optimal solution. We estimate each background layer individually with the shape maps of the other layers fixed. After all shape values for all layers are estimated, they are normalized so that their sum becomes 1.

There is another difference between the background shape estimation and the foreground shape estimation. For the foreground, the object shape does not change significantly over time because of the 2D rigid model. Therefore we use the shape in the previous frame as our prior in the estimation. However, in the background shape estimation, the shape of each background layer highly depends on object motion. The occluding background shape in the same area can change quickly from time to time because of object movements. For example, a car may first pass behind a tree, turn around, and then pass in front of the tree again; in the first case the tree is part of the occluding background layer for the foreground layer of the car, while in the second case the tree belongs to a background layer that does not occlude the same foreground. So in our algorithm, if all the objects leave an area for a certain period of time, we actually lack the visual information to infer the background layer shapes. As a result, no matter what the previous background shape values are, they become obsolete and the shapes of all background layers are reset to a default value.

3.5 Appearance estimation

To estimate the appearance, we need to find the $A_{t,j}$ that maximizes the function

$$\prod_{i=1}^{N} \big(P_{bgo}(x_i) + P_{fgo}(x_i)\big)\, P_{appearance} \tag{17}$$

Since both the appearance observation model Eq.(6) and the appearance dynamics Eq.(7) are Gaussian functions, the function in Eq.(17) becomes a Gaussian mixture. The closed-form solution to this optimization problem is difficult to find. However, appearance is a discrete function and we know the solution should lie between the current observation and the previous estimate. For each pixel, we can search for the appearance value in this range to find the solution.

4 Implementation and experimental results

4.1 Initialization and deletion of objects

In addition to the tracking algorithm discussed in the previous sections, several other issues regarding the initialization and deletion of the foreground and background layers need to be addressed. In our implementation, a change image is computed to determine whether a moving object is present in the scene. A new object is initialized if a change blob is detected far away from any existing objects. In this case, we assume the center of the object is located at the center of the change blob. The value of the shape map at each pixel is proportional to the intensity of the change image. The appearance is set to the original image intensity values. An additional background layer is inserted. The new layer has the same shape map as its neighboring background layer. A normalization step is then applied to make sure these background shape maps sum up to 1 at each pixel. An object is deleted if it moves out of the image boundaries or is occluded for a very long period of time.
Then the foreground layer of this object is removed from the data structure, and the two background layers next to it merge into one layer whose shape mask value equals the sum of the original two shape maps.

4.2 Synthetic videos

We have tested the proposed algorithm using synthetic and real video clips. (Video clips of the results are available in the supplementary file.) Figure 4 and Figure 5 show the tracking results of two synthetic videos with moving objects. The videos include difficult conditions such as shadows, reflections, transparent objects (e.g. the waterfall), and out-of-plane object rotation. Our tracking algorithm locked on the moving objects successfully through occlusion in both sequences. The estimated state variables in three key frames of the second video are shown in Figure 6. It can be observed that the background shape maps in row 3 accurately describe the shape of the occluding tree.

4.3 Vehicle tracking through occlusion

We implemented a tracking system based on the proposed algorithm for handling object occlusions. Figure 7 demonstrates the tracking result on a video clip with a car occluded by the background. In this example, background objects such as trees, light poles, and the rising ground occlude the car. The proposed tracking algorithm estimates the layer parameters correctly through the sequence. The tracker found one foreground layer and two background layers. The estimated state variables in three key frames are demonstrated in Figure 8. The background shape maps are for the front-most layer.

Figure 6. Layer state variables in three frames of the video in Figure 5 (Row 1 are the original images, Row 2 are the foreground shapes, Row 3 are the background shapes, and Row 4 are the foreground appearances).

4.4 Human tracking

Although our model of layer shape is 2D rigid, our tracker is able to track moving people by adjusting the system parameters and focusing on the torso area, which is relatively rigid compared to the other parts of the human body. Figure 9 shows the tracking result of two persons passing across each other. The algorithm tracked both persons successfully through the occlusion.

Figure 10 demonstrates the tracking results on a video clip in which a walking person is occluded by background objects. Because the occluding background area is large, there is a long period of full occlusion. Since the algorithm estimates the background occluding layers, it knows which part of the foreground is occluded. As a result, the tracker is aware of the occlusion and will not update the object appearance. The tracker is able to regain correct values of the layer state soon after the object moves out of the occluding background, as observed in the last frame.

Figure 8. Layer state variables in three frames of the video in Figure 7 (Row 1 are the original images, Row 2 are the foreground shapes, Row 3 are the background shapes, Row 4 are the foreground appearances).

5 Conclusions

A novel motion layer based representation and the associated estimation algorithm have been proposed in this paper. This new approach extends the traditional layer model by introducing the background layers and layer ordering. The experimental results demonstrate the power of this representation in handling the difficult occlusion problem in tracking. One advantage of the proposed representation is that it models all possible interactions between foreground and background objects. Not only is the occlusion caused by the foreground layers modeled, but so is the occlusion caused by the background layers. Some future research topics for improving the proposed algorithm include the development of more flexible shape and motion models that can handle articulated and nonrigid motions, and the investigation of efficient optimization algorithms for finding the optimal ordering of the foreground layers.

References

[1] J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani, Hierarchical model-based motion estimation, in Proc. of 2nd European Conference on Computer Vision, pp. 237-252, 1992.
[2] M. J. Black, D. J. Fleet, and Y. Yacoob, A framework for modeling appearance change in image sequences, IEEE International Conference on Computer Vision, Mumbai, India, January 1998, pp. 660-667.
[3] M. J. Black and D. J. Fleet, Probabilistic detection and tracking of motion boundaries, Int. J. of Computer Vision, 38(3):231-245, July 2000.
[4] A. D. Jepson, D. J. Fleet, and M. J. Black, A layered motion representation with occlusion and compact spatial support, ECCV (1) 2002: 692-706.
[5] A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi, Robust online appearance models for visual tracking, IEEE Conference on Computer Vision and Pattern Recognition, Kauai, 2001, Vol. I, pp. 415-422.
[6] N. Jojic, N. Petrovic, B. Frey, and T. S. Huang, Transformed hidden Markov models: estimating mixture models of images and inferring spatial transformations in video sequences, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. (II) 26-33, 2000.
[7] N. Jojic and B. J. Frey, Learning flexible sprites in video layers, in Computer Vision and Pattern Recognition, pp. (I) 199-206, 2001.
[8] L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77(2): 257-285, 1989.
[9] H. Tao, H. Sawhney, and R. Kumar, Object tracking with Bayesian estimation of dynamic layer representations, IEEE Transactions on Pattern Analysis and Machine Intelligence, Jan. 2002.
[10] N. Vasconcelos and A. Lippman, Empirical Bayesian EM-based motion segmentation, in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 527-532, 1997.
[11] J. Y. A. Wang and E. H. Adelson, Layered representation for motion analysis, in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 361-366, 1993.
[12] Y. Weiss and E. H. Adelson, A unified mixture framework for motion segmentation: incorporating spatial coherence and estimating the number of models, in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 321-326, 1996.

Figure 4. A synthetic video sequence with a figure moving horizontally.
Figure 5. A moving figure moves behind a tree.
Figure 7. A moving car is occluded by trees and the rising ground.
Figure 9. Two people walk across each other.
Figure 10. A person walks behind trees and bushes.