Human Tracking by Fast Mean Shift Mode Seeking

JOURNAL OF MULTIMEDIA, VOL. 1, NO. 1, APRIL 2006

C. Beleznai
Advanced Computer Vision GmbH - ACV, Vienna, Austria
Email: csaba.beleznai@acv.ac.at

B. Frühstück
Siemens AG Österreich, Programm- und Systementwicklung, Graz, Austria
Email: bernhard.fruehstueck@siemens.com

H. Bischof
Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria
Email: bischof@icg.tu-graz.ac.at

Abstract - Change detection by background subtraction is a common approach to detect moving foreground. The resulting difference image is usually thresholded to obtain objects based on pixel connectedness, and the resulting blob objects are subsequently tracked. This paper proposes a detection approach that does not require binarization of the difference image. Local density maxima in the difference image - usually representing moving objects - are outlined by a fast non-parametric mean shift clustering procedure. Object tracking is carried out by updating and propagating cluster parameters over time using the mode seeking property of the mean shift procedure. For occluding targets, a fast procedure determining the object configuration that maximizes the image likelihood is presented. Detection and tracking results are demonstrated for a crowded scene, and an evaluation of the proposed tracking framework is presented.

Index Terms - automated visual surveillance, motion detection, mean shift clustering, human tracking, occlusion handling

I. INTRODUCTION

Scenes of practical interest usually contain a large number of interacting targets under difficult imaging conditions. In such circumstances the task of reliable object detection and tracking becomes non-trivial, and obtaining a meaningful high-level representation is challenging. Human detection and tracking systems proposed in recent years attempt to tackle increasingly complex scenarios.
Motion detection is an essential part of automated visual surveillance systems; however, reliable segmentation of moving regions into individual objects of interest still represents a great challenge. For instance, blob-based motion segmentation in the presence of interacting targets typically generates objects which are under- or oversegmented (see Fig. 1.a and Fig. 1.b). Blob-based motion segmentation relies on thresholding, where the threshold is a sensitive parameter leading to an immediate decision whether a pixel belongs to a moving or non-moving region. Thresholding eliminates relevant information, and motion segmentation errors are difficult to correct afterwards given the poor quality of binary images.

Figure 1. Blob analysis (a) typically produces undersegmented results for humans in groups, leading to low detection rates and poor tracking results (b). Image (c) illustrates the proposed detection and tracking approach generating significantly improved results.

In this paper we propose a novel detection and tracking scheme operating directly on the difference image obtained by background subtraction. The method shows good tracking performance in crowded scenarios, even in the presence of a large overlap between objects. A fast variant of mean shift clustering is applied to delineate objects, and mode seeking along the density gradient of the difference image is used to propagate and update object properties. When objects occlude each other, the optimal spatial arrangement, i.e. the object configuration, is determined by searching for the maximum likelihood estimate in the space of joint-object configurations. The search employs a sampling scheme relying on the mean shift procedure and on priors with respect to the number and size of involved humans.

The paper is organized as follows: section II describes related work. Section III provides a brief overview of the applied fast mean shift procedure using a uniform kernel. Section IV describes the mean shift clustering-based object detection technique. Section V gives details on the tracking algorithm based on mode propagation and describes the occlusion handling scheme using a simple human model. Section VI provides an algorithmic summary of the tracking system. Section VII presents detection and tracking results and their performance evaluation. Finally, the paper is concluded in section VIII.

Footnote: Based on Human Tracking by Mode Seeking, by C. Beleznai, B. Frühstück, and H. Bischof, which appeared in the Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, Zagreb, Croatia, September, IEEE.

II. RELATED WORK

Human detection by blob analysis is used in many approaches [1]. However, inferring the position of individual humans from the binary segmentation results by shape analysis [2] or by stochastic segmentation [3] requires good segmentation quality in order to find landmark points such as heads or shoulders. Blob-based analysis can be complemented by appearance [4] or color information [5], enabling a tracking system to better cope with occlusions. Color-based segmentation [6] in crowded scenes can also be used for tracking if colors are distinctive for different individuals.

Pece [7] proposed clustering in the difference image using mixtures of Gaussians and tracking by propagating cluster parameters. Due to the Gaussian assumption on the cluster shapes, interacting and occluding targets are often clustered together. Our approach also performs difference image clustering, however, without relying on specific assumptions with respect to the distribution of the data. Thus nearby density maxima, i.e. cluster centers, are kept separate.

Color- or histogram-based tracking [8] performs mode seeking along the gradient of a histogram similarity function. Our tracking approach adopts a similar mode seeking strategy, but mode seeking in our case is performed to track density maxima in the difference image.
In the context of multiple target tracking, particle filtering recently appeared as a promising technique [9]. It is capable of integrating different mechanisms, such as visual object recognition [10], color tracking and occlusion handling [11]. Our work proposes a simple, computationally efficient object detection and multi-target tracking framework, which can also be combined with existing detection and tracking techniques.

III. THE FAST MEAN SHIFT PROCEDURE

Object detection is performed by delineating clusters in the difference image by the mean shift mode seeking procedure. The mean shift algorithm is a nonparametric technique to locate density extrema or modes of a given distribution by an iterative procedure [12]. Starting from a location $x$, the local mean shift vector represents an offset to $x'$, which is a translation towards the nearest mode along the direction of maximum increase in the underlying density function. The local density is estimated within the local neighborhood of a kernel by kernel density estimation, where at a data point $a$ the kernel weights $K(a)$ are combined with weights $I(a)$ associated with the data. Fast computation of the new location vector $x'$ can be performed as in [13]:

$$x' = \frac{\sum_a K''(a - x)\, \mathit{ii}_x(a)}{\sum_a K''(a - x)\, \mathit{ii}(a)}, \qquad (1)$$

where $K''$ represents the second derivative of the kernel $K$, differentiated with respect to each dimension of the image space, i.e. the x- and y-coordinates. The functions $\mathit{ii}_x$ and $\mathit{ii}$ are the double integrals, i.e. two-dimensional integral images [14] in the form of:

$$\mathit{ii}_x(x) = \sum_{x_i < x} I(x_i)\, x_i \qquad (2)$$

and

$$\mathit{ii}(x) = \sum_{x_i < x} I(x_i). \qquad (3)$$

If the kernel $K$ is uniform with bounded support, its second derivative becomes sparse, containing only four impulse functions at its corners. Thus, evaluating a convolution takes only the summation of four corner values in the given integral image. To compute the mean shift vector at location $x$, the following steps are performed:

1. three integral images (defined in (2) and (3)) are precomputed in a single pass (see [14] and [15] for details);
2. the expression in (1) is evaluated using only ten arithmetic operations and twelve array accesses.
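The two steps above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`integral_images`, `box_sum`, `mean_shift_step`, `seek_mode`) are hypothetical, the kernel center is rounded to the nearest pixel, the window is clipped at the image border, and the convergence tolerance is an arbitrary choice. The three tables correspond to $\mathit{ii}$ and the two components of $\mathit{ii}_x$, and each window sum costs four array accesses, twelve in total per update.

```python
def integral_images(img):
    """Return (ii, ii_x, ii_y), each padded with a zero first row and column.

    ii[r][c]   = sum of I over rows < r and columns < c
    ii_x[r][c] = the same sum weighted by the column (x) coordinate
    ii_y[r][c] = the same sum weighted by the row (y) coordinate
    """
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    ii_x = [[0.0] * (w + 1) for _ in range(h + 1)]
    ii_y = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            v = img[r][c]
            for table, val in ((ii, v), (ii_x, v * c), (ii_y, v * r)):
                table[r + 1][c + 1] = (val + table[r][c + 1]
                                       + table[r + 1][c] - table[r][c])
    return ii, ii_x, ii_y


def box_sum(table, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 (four lookups)."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def mean_shift_step(tables, x, y, half_w, half_h):
    """One mean shift update for a uniform box kernel centered at (x, y)."""
    ii, ii_x, ii_y = tables
    h, w = len(ii) - 1, len(ii[0]) - 1
    cx, cy = int(round(x)), int(round(y))        # assumption: pixel-aligned kernel
    left, right = max(0, cx - half_w), min(w, cx + half_w + 1)
    top, bottom = max(0, cy - half_h), min(h, cy + half_h + 1)
    mass = box_sum(ii, top, left, bottom, right)  # area sum (denominator of (1))
    if mass == 0:
        return x, y, 0.0
    new_x = box_sum(ii_x, top, left, bottom, right) / mass
    new_y = box_sum(ii_y, top, left, bottom, right) / mass
    return new_x, new_y, mass


def seek_mode(tables, x, y, half_w, half_h, tol=0.5, max_iter=20):
    """Iterate until the offset is below tol; also return the visited points."""
    path = []
    for _ in range(max_iter):
        nx, ny, mass = mean_shift_step(tables, x, y, half_w, half_h)
        if abs(nx - x) < tol and abs(ny - y) < tol:
            return (nx, ny), path
        path.append((x, y, mass))
        x, y = nx, ny
    return (x, y), path
```

The cost of `mean_shift_step` does not depend on `half_w` or `half_h`, which is the property the section relies on. The starting point is assumed to lie inside the image.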
The number of operations is independent of the kernel size, given the sparse structure of $K''$.

IV. MEAN SHIFT CLUSTERING

The clustering step is facilitated by the use of a human size model $\{H(x), W(x)\}$, where $H$ and $W$ denote human height and width, respectively. This information is obtained by a simple calibration step. The principal steps of mean shift clustering are performed analogously to the steps described in [16]:

1. The difference image intensity maximum is mapped to unit intensity and its entire range is scaled proportionally.

2. A sample set of $n$ points $X_1, \dots, X_n$ is defined by locating local maxima - above a very low threshold $T_1$ - in the difference image. The final result does not depend critically on $T_1$. A very low value just increases the run time and generates more outliers, which can be eliminated during the mode tracking step.

3. The fast mean shift procedure is applied to the points of the sample set with a window size of $(H(X_i), W(X_i))$ according to the local size model. The mean shift procedure converges to the nearest mode typically within 3-4 iterations. The mode seeking process delineates a path between the initial point of the sample set and the detected local mode candidate. Each mean shift iteration defines a point on the path, which we denote as a path-point $\{PX\}$. Thus, each detected mode candidate location has an associated set of path-points $\{PX_1, \dots, PX_n\}$, not including the mode itself. When the mean shift offset vector is computed according to (1), the area sum (i.e. the sum of pixel intensities) within the kernel (the denominator of the expression in (1)) is also obtained. The set of area sum magnitudes $\{S_1, \dots, S_n\}$ is useful to have, since it provides information on the magnitude of the local density and, as we will see later, it can be used in the occlusion handling step when evaluating a given spatial configuration of kernels.

4. Given the finite size of the mean shift convergence criterion, detected mode candidate locations - obtained for the same peak of the underlying density - might slightly deviate. Detected mode candidates are linked based on spatial proximity: all detected modes within a window of the size $(W, H)$ are grouped together and a cluster center $Y_i$ is obtained by taking the mean of the linked candidate coordinates. Path-point sets belonging to grouped mode candidates are also merged, as are the sets of area sum magnitudes. The merged set of path points is used to delineate the cluster: a bounding box representation of the basin of attraction is obtained by determining the spatial extrema of the path points in x- and y-directions.

The above clustering process yields the following information for a given cluster $i$: a cluster center $Y_i$, a set of path-points $\{PX_1, \dots, PX_k\}$, the basin of attraction boundaries in the form of a bounding box, and a set of area sum magnitudes $\{S_1, \dots, S_k\}$. An illustrative example depicting these constructs is shown in Fig. 2. At this stage the detected clusters can be considered as object candidates, and probable outliers can be eliminated by imposing size constraints on the attraction basins.

Figure 2. Example for fast mean shift based clustering in a difference image (shown inverted). Obtained clusters are delineated by rectangular basins of attraction (rectangles with dashed line). Cluster centers (black dots) and connected path point sets (shown as lines running towards cluster centers) are also shown. Note that some clusters are generated by noise and motion clutter.

V. CLUSTER TRACKING

A. Tracking by Mode Seeking

Moving objects of a scene usually represent moving local density maxima in the corresponding sequence of difference images. The mode seeking property of the mean shift procedure implies that a mode can be pursued by a repetitive mean shift procedure: for each mode displacement in the difference image - assuming that the inter-frame displacement is much smaller than the kernel size - the mode location can be repeatedly found. The principal advantages of this tracking strategy are:

1. the data association problem is solved implicitly, since the mode seeking procedure is guided to the nearby mode along the steepest density gradient;
2. it represents a simple and computationally efficient technique, because only a few fast mean shift iterations are sufficient to redetect the object. Furthermore, the mode seeking process can be easily complemented by an underlying motion model.

The disadvantage of the above strategy is that such sequential mode seeking assumes the spatial distinctiveness of the available modes. When several density maxima are in spatial proximity - such as in a difference image of a crowded scene containing humans occluding each other - the distribution might locally become strongly non-Gaussian and mode candidates tend to exhibit coalescence, leading to the breakdown of the affected tracking processes.
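The tracking strategy described above, where each cluster center from the previous frame is used as the starting point of a mode search in the next difference image, can be sketched as follows. This is an illustrative sketch only: `mean_shift_mode` and `track` are hypothetical names, the window sums are computed directly rather than with integral images to keep the example short, a fixed kernel size replaces the local size model $\{H(x), W(x)\}$, and no motion model is included.

```python
def mean_shift_mode(img, x, y, half_w, half_h, tol=0.5, max_iter=20):
    """Seek the nearest mode of img from (x, y) with a uniform box kernel."""
    h, w = len(img), len(img[0])
    for _ in range(max_iter):
        cx, cy = int(round(x)), int(round(y))
        mass = sx = sy = 0.0
        for r in range(max(0, cy - half_h), min(h, cy + half_h + 1)):
            for c in range(max(0, cx - half_w), min(w, cx + half_w + 1)):
                v = img[r][c]
                mass += v
                sx += v * c
                sy += v * r
        if mass == 0.0:          # empty window: nothing to follow
            break
        nx, ny = sx / mass, sy / mass
        moved = max(abs(nx - x), abs(ny - y))
        x, y = nx, ny
        if moved < tol:          # offset below tolerance: mode reached
            break
    return x, y


def track(frames, centers, half_w, half_h):
    """Propagate each cluster center through a sequence of difference images."""
    trajectories = [[c] for c in centers]
    for img in frames:
        for traj in trajectories:
            px, py = traj[-1]    # previous center is the new starting point
            traj.append(mean_shift_mode(img, px, py, half_w, half_h))
    return trajectories
```

Data association is implicit here: each trajectory simply follows the mode nearest to its last position, which works only while the inter-frame displacement stays well below the kernel size and the modes remain spatially distinct.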
If objects in the original image and the corresponding difference image density extrema are spatially well-separated, the mean shift mode seeking procedure can reliably track them. In the initial frame the entire difference image is evaluated by the fast mean shift clustering algorithm as described in section IV. The obtained cluster centers are then used in subsequent frames as the points of a new sample set $X'$. Starting from these points the fast mean shift procedure is carried out (using the locally-scaled uniform kernel of height $H(x)$ and width $W(x)$), locating the nearby mode candidate which corresponds to the new location of the moving object, i.e. the new cluster center. For spatially isolated mode candidates we do not compute additional cluster parameters, such as the basin of attraction or path-points, since we assume that the underlying distribution varies only slightly with respect to its shape.

In the following, the different cases of the tracking framework are described and a possible solution for coalescing mode candidates is presented.

B. Occlusion Handling

If several objects meet and form a group, occlusion - partial or complete - between the objects might take place. Such an event generates overlapping or very close density extrema in the difference image. Typically one specific mode attracts all or most of the nearby mode seeking procedures. We denote this phenomenon as "mode candidate stealing". The occurrence of mode candidate stealing can be easily detected, since two or more mode candidates appear in close proximity.

Typically, before moving objects form a group, they can be tracked separately, as described in subsection A. After each tracking step, it is examined whether at least one other cluster center exists within a window of $(0.5H(x), 0.5W(x))$ around a detected cluster center. If this is the case, mode candidate stealing has occurred, implying that the local configuration of humans cannot be obtained by mode seeking. In such situations we employ a Bayesian approach - similar to the technique described in [3] - to find the local optimum configuration of humans best explaining the difference image data $I$.

This task can be stated as a model-based segmentation problem. We employ a very simple human shape model, a rectangular region. This region is equivalent to the kernel used in the mean shift procedure. All parameters (height, width and orientation) of the rectangular region are known. The tracking algorithm provides prior information on the number $N$ of objects involved in the group formation. Thus, the number of models needed to explain the data is also available. The search for $\theta^*$ - the most probable configuration consisting of $N$ objects - in the space of possible configurations $\theta$ becomes a maximum likelihood estimation problem:

$$\theta^* = \arg\max_{\theta} P(I \mid \theta). \qquad (4)$$

The unknown parameters are the locations of the humans $\{x_i, y_i\}_{i=1..N}$ in the occluded state. When we detect mode candidate stealing, we perform the following steps to find the optimum local configuration of humans:

1. A new sample set of points is generated by locating local maxima within a local image region spanned by the spatial extrema of the involved object windows.

2. Starting from these points, fast mean shift iterations are carried out until convergence (see Fig. 3, center).

3. The mean shift algorithm has the property that it runs along the path of the maximum increase in the underlying density. Furthermore, the magnitude of the mean shift offset correlates with the local magnitude of the density gradient. These properties have the following implications:
   1. the mean shift kernel becomes quickly centered on relevant data;
   2. local plateaus or ridges on the density surface are distinguished by a large number of path points, $PX$ (see section IV).
   Our sampling procedure is guided by the path of the mean shift runs. We use the path points as a candidate set of possible object locations. We also make use of the area sum magnitudes $S$, which are available at these locations, obtained as a "by-product" of the mean shift computation. This strategy significantly reduces the search space and facilitates the fast evaluation of a given configuration.

4. The likelihoods for individual human hypotheses are not independent, since inter-occlusion between humans might be present. Therefore the joint likelihood for multiple humans has to be formulated. A hypothesized configuration $\theta$ divides the difference image into two image regions: pixels explained by the configuration and pixels outside of the configuration. If $M_i$ is the image region occupied by the $i$-th model, the union of image regions $M = \bigcup_{i=1}^{N} M_i$ defines a mask containing all pixels explained by the configuration. Accordingly, $\bar{M}$ denotes the complementary region outside of the models (see Fig. 3). The local image region $R$ around the occluding objects is given by $R = M \cup \bar{M}$.

Figure 3. Example illustrating the approach searching for the most probable configuration of humans in the presence of occlusion. Left: Occlusion between two humans shown in the inverted difference image. Center: Mean shift mode seeking is performed starting from a set of sample points. Obtained path points (shown as dots) represent possible locations of a human. Right: the found optimum configuration of the two humans for the given image regions.
A configuration maximizing the likelihood should fulfill the following criteria:
   1. maximizing the sum of intensities within the model region $M$, while
   2. minimizing the sum of intensities in $\bar{M}$, outside of the models.

A log-likelihood function expressing this balance between the two quantities can be formulated as:

$$\ln P(I \mid \theta) \propto a_1 \sum_{x \in M} I(x) - a_2 \sum_{x \in \bar{M}} I(x) \propto A \sum_{x \in M} I(x) - \sum_{x \in R} I(x), \qquad (5)$$

using the complementarity between $M$ and $\bar{M}$ and the experimentally determined weight $A$.

5. The above quantity is evaluated for the configuration $\theta$. Fast evaluation of the likelihood expression of (5) can be performed as follows. The sum of pixel intensities within the kernel centered at the $i$-th path point, i.e. the area sum $S_i$, is obtained during the mean shift procedure. The first term of (5) can be computed by:
   1. taking the sum of area sums at the sampled locations and
   2. correcting for possible overlaps between hypothesized models.
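For the common case of two occluding humans, the evaluation of (5) from area sums with an overlap correction might look like the sketch below. Several things here are assumptions made for illustration: the names `integral_image`, `rect_sum`, `intersect`, `log_likelihood` and `best_pair` are hypothetical, rectangles are given as half-open pixel ranges `(left, top, right, bottom)` lying fully inside the image, the number of models is fixed at N = 2, and candidate centers are scored by exhaustive pairing.

```python
def integral_image(img):
    """Zero-padded integral image: ii[r][c] sums rows < r and columns < c."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = (img[r][c] + ii[r][c + 1]
                                + ii[r + 1][c] - ii[r][c])
    return ii


def rect_sum(ii, rect):
    """Sum of intensities in a half-open rectangle; empty rectangles give 0."""
    left, top, right, bottom = rect
    if right <= left or bottom <= top:
        return 0.0
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def intersect(a, b):
    """Intersection of two axis-parallel rectangles (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def log_likelihood(ii, models, region, weight_a):
    """Score A * sum_M I - sum_R I for exactly two rectangular models."""
    m1, m2 = models
    s1, s2 = rect_sum(ii, m1), rect_sum(ii, m2)         # area sums S_1, S_2
    inside = s1 + s2 - rect_sum(ii, intersect(m1, m2))  # count overlap once
    return weight_a * inside - rect_sum(ii, region)


def best_pair(ii, candidates, size, region, weight_a):
    """Score every pair of candidate centers and return the best one."""
    w, h = size

    def box(center):
        x, y = center
        return (x - w // 2, y - h // 2, x - w // 2 + w, y - h // 2 + h)

    best = None
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            score = log_likelihood(
                ii, (box(candidates[i]), box(candidates[j])), region, weight_a)
            if best is None or score > best[0]:
                best = (score, candidates[i], candidates[j])
    return best
```

In this sketch the area sums are recomputed from the integral image; in the scheme described in the text they are already available from the mean shift runs, so only the overlap term and the single sum over $R$ need to be looked up.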

Since the models are represented by rectangular regions with sides parallel to the image border, the overlap regions can be easily computed. The maximum number of possible overlaps between $N$ objects is $N(N-1)/2$. Then, the sum of pixel intensities in the region covered by the models (first term in (5)) can be computed as:

$$\sum_{x \in M} I(x) = \sum_{i=1}^{N} S_i - \sum_{x \in V} I(x), \qquad (6)$$

where $V$ denotes the union of overlapping regions. The union of overlapping regions is determined by examining the intersections between all overlap regions. Since pairwise overlaps span rectangular regions, the sum of pixel intensities within an overlap region can be obtained - using the integral image defined in (3) - by three arithmetic operations. The second term of (5) - representing the sum of pixel intensities in the entire region $R$ - needs to be computed only once using the integral image.

6. Generally, in our scenarios the number of occluding humans is small; usually two, rarely three objects form an occluded group. Typically 5-12 path points are used for hypothesizing object locations, thus in the worst case the evaluation of a couple of thousand configurations is necessary. The models of the best configuration are associated - using a nearest neighbor criterion - with the predicted cluster centers, and trajectories are updated accordingly.

When using a blob-based detection system, occlusions between objects often generate object merging, rendering the tracking task difficult. The presented approach provides measurements even in the case of complete occlusion between objects, due to the use of $N$ as a prior. If the prior information on $N$ is incorrect - due to detection or tracking failures - the error is propagated further, over the duration of occlusion events. This problematic issue is not handled in the present approach.

C. Appearance of New Objects

New objects are detected using a simple scheme. For all previously detected objects, the difference image is reset to zero intensity within the local kernel. The residual difference image is analyzed again for the existence of clusters, as described in section IV. Coalescence of a newly-created cluster with a nearby cluster indicates that the appearing object is identical with an existing object: in such cases the appearing object is deleted.

D. Object Disappearance

If the mean shift offset for a tracked object remains zero over a given time period, three possibilities exist:

1. the object has come to a full stop,
2. the object is generated by noise or clutter,
3. the object has disappeared.

To evaluate such a case, the difference image region within the object kernel is examined. A new set of sample points is generated by selecting local maxima, and a clustering step according to section IV is carried out. The basin of attraction of the cluster is delineated. If the dimensions of the basin of attraction significantly deviate from the local scaling of a human, the object is deleted.

VI. ALGORITHMIC SUMMARY

The algorithm of cluster center tracking proceeds according to the following main steps:

1. Integral images (2) and (3) are computed in a single pass.
2. Clustering is performed in the initial frame of the difference image. Cluster attributes (cluster center, basin of attraction, set of path points and area sums) are determined.
3. Inter-frame cluster center tracking by the fast mean shift procedure. A linear motion model is applied.
4. Testing for coalescence between mode candidates. If mode candidate stealing is detected, the most probable configuration is searched using the number of involved objects as a prior.
5. Testing for appearance of new objects.
6. Testing for disappearance of objects.

The cluster center tracking algorithm performs fast cluster center propagation for spatially isolated objects, and - in the case of occlusions - a computationally efficient scheme proposes the optimum configuration of occluding objects.

VII. RESULTS AND DISCUSSION

Background differencing was carried out applying a motion detection technique [17] using an adaptive background model.
One sequence (4600 frames, resolution: [...] pixels) depicting a scene of walking people was selected for evaluation. The humans in the scene cast shadows, and motion clutter in the form of a moving flag and moving vegetation is present (see Fig. 4). The sequence was processed by the proposed tracking approach (Fig. 4.b) and also by a common blob-based tracking algorithm (Fig. 4.a). Blob-based detection was based on the method described in [17]. The blob tracking algorithm generated a new trajectory hypothesis each time a new blob object appeared in the image. Blob objects and existing track hypotheses were matched by computing the overlap between their bounding boxes. A linear motion model was assumed. During occlusions, hypothesized trajectories were guided solely by the motion model.

Figure 4. Tracking results in the case of (a) a blob-based tracking approach and (b) the proposed tracking scheme.

Tracking results obtained for the proposed tracker show that it copes well with occlusions and shadows. Trajectories remain stable for all the targets, as can be seen in Fig. 4.b. Occlusions between two and, rarely, between three persons are resolved successfully using the proposed model-based occlusion handling scheme. Shadows are detected as isolated mode candidates. In the subsequent cluster delineation step (see section IV) these mode candidates are eliminated based on the extent of the corresponding basins of attraction. Note that shadows in this sequence are oriented nearly horizontally, thus their elimination based on geometric constraints works well. In cases where the shape and orientation of shadows is similar to those of the humans in the scene, shadows are detected and tracked as valid mode candidates.

Blob-based tracking (Fig. 4.a) yields only trajectory segments and trajectories representing the motion of humans in groups. Blob detection relies on connected component analysis, which leads to poor object segmentation quality in the case of overlapping humans and/or shadows. Segmented blob boundaries are shown in Fig. 4.a as white bounding boxes. Tracking failures arise from the under-segmented objects and from the lack of measurement updates during occlusions.

TABLE I. HUMAN DETECTION PERFORMANCE FOR A BLOB-BASED AND FOR THE PROPOSED APPROACH

Performance measure                                 Blob-based approach   Proposed approach
Detection rate                                      38%                   94%
False alarm rate                                    32%                   37%
Mean spatial deviation between detected humans
  and ground truth                                  31%                   14%
Number of evaluated frames                          3990
Total number of valid humans in the ground truth    [...]

In order to quantitatively assess the detection performance, detection results were compared to a ground truth. As ground truth, the bounding boxes of humans were determined manually for a number of frames (see Table I). Humans with more than 50% visible parts were considered as valid objects. Correct detection was assumed when the centroid of the detected human was inside of the ground truth bounding box. A one-to-one mapping between detections and ground truth data was enforced. Tracking by mode seeking achieves a high detection rate of 94%, while generating a false alarm rate of 37% (see Table I).
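The matching rule used in this evaluation (a detection counts as correct when its centroid lies inside a ground truth bounding box, with a one-to-one mapping) can be illustrated with a short sketch. The text does not say how the one-to-one mapping was enforced or how the rates were normalized, so everything specific below is an assumption: the greedy first-fit matching, the function names `match_detections` and `rates`, the detection rate taken as matched ground truth objects over all ground truth objects, and the false alarm rate taken as unmatched detections over all detections.

```python
def match_detections(detections, gt_boxes):
    """Greedy one-to-one matching of detection centroids to ground truth boxes.

    detections: list of (x, y) centroids
    gt_boxes:   list of (left, top, right, bottom) boxes
    """
    free = set(range(len(gt_boxes)))   # ground truth boxes not yet matched
    matches = []
    for d, (x, y) in enumerate(detections):
        for g in sorted(free):
            left, top, right, bottom = gt_boxes[g]
            if left <= x <= right and top <= y <= bottom:
                matches.append((d, g))
                free.remove(g)         # each box can be matched only once
                break
    return matches


def rates(detections, gt_boxes):
    """Return (detection rate, false alarm rate) under the assumed definitions."""
    matches = match_detections(detections, gt_boxes)
    detection_rate = len(matches) / len(gt_boxes)
    false_alarm_rate = (len(detections) - len(matches)) / len(detections)
    return detection_rate, false_alarm_rate
```

A greedy pass is the simplest way to keep the mapping one-to-one; an optimal assignment could give slightly different counts when several centroids fall into overlapping boxes.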
The high detection rate is due to the model-driven clustering and occlusion handling, which provide accurate locations for humans even during occlusions. The high false alarm rate is generated by the permanent motion clutter caused by flag and vegetation movements. Clusters of these moving regions are not distinguishable from humans by the proposed method. The blob-based detection approach produces poor results for the test sequence given the frequent occurrence of undersegmented groups and humans with shadows. The amount of false alarms is slightly lower, since large connected moving regions count as a single detected blob, whereas the model-based approach explains them as a group of objects.

The mean spatial deviation (see Table I) of detection results was evaluated by computing the distance in the image space between the ground truth centroids and the centroids of matching detections. This distance is normalized by the local height model $H(x)$. As can be seen from the table, blob detection locations are highly inaccurate, since detections are off by ca. 30% of the human height. The large amount of spatial errors is again due to undersegmented objects. Mean shift based detection produces smaller errors, implying that detected objects coincide well spatially with ground truth objects.

In order to evaluate the tracking performance, a particular ground truth trajectory undergoing several occlusions was selected. This ground truth was compared to the trajectories generated by the blob-based and the proposed tracking scheme. Trajectories obtained for the two different tracking algorithms are shown in Fig. 5 together with the ground truth trajectory. Errors in terms of the spatial distance measured in pixels with respect to the ground truth trajectory are shown in Fig. 6 for the blob-based and for the proposed tracking approaches. The target tracked by mean shift mode seeking remains close to the ground truth target position for the entire track duration. The trajectory obtained by the blob-based approach, however, deviates significantly from the ground truth trajectory.
In this case, the tracked target is defined most of the time by a group of people, leading to a permanent offset in the centroid position of the target.

Figure 5. Trajectories illustrating the ground truth trajectory and the trajectories obtained by blob-tracking and by mode-seeking for a particular human (marked by a rectangle).

Figure 6. The spatial tracking error (with respect to a manually determined ground truth trajectory) for blob-based tracking (gray line) and for tracking using mode seeking (dark line). The trajectories are shown in Fig. 5.

Tracking results for another image sequence (resolution: [...] pixels) are shown in Fig. 7. This scene contains occasional occlusions between scene objects and humans. The proposed tracking approach cannot cope with these short-term occlusions. In such cases trajectories terminate upon occlusion and reinitialize after occlusion.

Figure 7. Example frame showing tracking results for another sequence, where occasional short-term occlusions by scene objects occur.

The proposed tracking approach runs in real-time (8-12 fps) on a 2.5 GHz PC for all of the presented sequences.

VIII. CONCLUSIONS

This paper presents a novel approach to detect and track humans in real-time in crowded scenes based on a fast variant of the mean shift procedure. We demonstrate how mean shift-based clustering - relying on a kernel of predefined size - can efficiently delineate objects corresponding to local density maxima in the difference image. Furthermore, we show how detected mode candidates can be tracked using the mode seeking property of the mean shift algorithm. A computationally efficient strategy to locate occluding humans is presented. Stable tracking results are achieved in a challenging scene depicting frequent occlusions, showing significantly improved results when compared to a blob-based human detector.

ACKNOWLEDGMENT

This work has been carried out within the K plus Competence Center Advanced Computer Vision. This work was funded from the K plus Program.

REFERENCES

[1] I. Haritaoglu, D. Harwood and L. S. Davis, W4: Real-Time Surveillance of People and Their Activities, IEEE Trans. Pattern Anal. Mach. Intell., 22(8).
[2] Y. Kuno, T. Watanabe, Y. Shimosakoda and S. Nakagawa, Automated Detection of Human for Visual Surveillance System, Int. Conf. on Pattern Recognition, C92.2, August.
[3] T. Zhao and R. Nevatia, Bayesian Human Segmentation in Crowded Situations, IEEE Conference on Computer Vision and Pattern Recognition, June.
[4] A. W. Senior, Tracking with Probabilistic Appearance Models, ECCV Workshop on Performance Evaluation of Tracking and Surveillance Systems, June.
[5] T. Yang, Q. Pan and J. Li, Real-time multiple objects tracking with occlusion handling in dynamic scenes, IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, June.
[6] A. Elgammal, R. Duraiswami and L. S. Davis, Efficient nonparametric adaptive color modeling using fast gauss transform, Proc. IEEE Conf. Computer Vision and Pattern Recognition, December.
[7] A. E. C. Pece, Tracking by Cluster Analysis of Image Differences, Proc. 8th Int. Symposium on Intelligent Robotic Systems, July 2000.

[8] D. Comaniciu, V. Ramesh and P. Meer, Kernel-Based Object Tracking, IEEE Trans. Pattern Anal. Mach. Intell., 25(5).
[9] S. Maskell, M. Rollason, N. Gordon and D. Salmond, Efficient Multitarget Tracking using Particle Filters, Journal Image and Vision Computing, 21(10), September.
[10] K. Okuma, A. Taleghani, N. de Freitas, J.J. Little and D. G. Lowe, A Boosted Particle Filter: Multitarget Detection and Tracking, In European Conference on Computer Vision, May.
[11] Y. Cai, N. de Freitas and J.J. Little, Robust Visual Tracking for Multiple Targets, In European Conference on Computer Vision, May.
[12] D. Comaniciu and P. Meer, Mean Shift Analysis and Applications, IEEE Int. Conf. Computer Vision.
[13] C. Beleznai, B. Frühstück and H. Bischof, Tracking Multiple Humans using Fast Mean Shift Mode Seeking, IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, January.
[14] P. Viola and M. Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1.
[15] C. Beleznai, B. Frühstück, H. Bischof and W. Kropatsch, Detecting Humans in Groups using a Fast Mean Shift Procedure, Proc. of the 28th Workshop of the Austrian Association for Pattern Recognition.
[16] D. Comaniciu and P. Meer, Mean Shift: A Robust Approach Toward Feature Space Analysis, IEEE Trans. Pattern Anal. Mach. Intell., 24(5).
[17] R. Collins, A. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto and O. Hasegawa, A System for Video Surveillance and Monitoring: VSAM Final Report, Technical Report CMU-RI-TR-00-12, Robotics Institute, Carnegie Mellon University.

Csaba Beleznai graduated from the Technical University of Ilmenau and received his Ph.D. degree in physics from the Claude Bernard University, Lyon. C. Beleznai joined the K+ Competence Center Advanced Computer Vision in 2000 as a research scientist. Since 2005 he has been Key-researcher at the competence center, responsible for research activities in the area of surveillance and tracking. His research interests include surveillance, visual object recognition and statistical methods in computer vision.

Bernhard Frühstück received his M.S. degree in telematics from the Graz University of Technology. B. Frühstück joined Siemens PSE, Graz in 2000 as a research staff member working in the field of biometrics and image processing. His primary interests include image processing related to biometric and industrial applications.

Horst Bischof received his M.S. and Ph.D. degrees in computer science from the Vienna University of Technology in 1990 and 1993, respectively. In 1998 he received his Habilitation (venia docendi) for applied computer science. Currently he is Professor for Computer Vision at the Institute for Computer Graphics and Vision at Graz University of Technology, Austria. H. Bischof is Key-researcher at the K+ Competence Center Advanced Computer Vision, where he is responsible for research projects in the area of statistical methods and learning. He is a member of the scientific board of the K+ centers VrVis (Virtual reality and visualization) and Know (Knowledge management). He is vice-president of the Austrian Association for Pattern Recognition. His research interests include learning and adaptive methods for computer vision, object recognition, surveillance, robotics and medical vision, where Horst Bischof has published more than 260 reviewed scientific papers. Horst Bischof is program co-chair of ECCV 2006 to be held in Graz. He was co-chairman of international conferences (ICANN 2001, DAGM 1994), and local organizer for ICPR. Currently he is Associate Editor for the journals Pattern Recognition, and Computer and Informatics.