Bayesian Filtering with Online Gaussian Process Latent Variable Models




Yali Wang, Laval University (yali.wang.1@ulaval.ca)
Marcus A. Brubaker, TTI Chicago (mbrubake@cs.toronto.edu)
Brahim Chaib-draa, Laval University (chaib@ift.ulaval.ca)
Raquel Urtasun, University of Toronto (urtasun@cs.toronto.edu)

Abstract

In this paper we present a novel non-parametric approach to Bayesian filtering, where the prediction and observation models are learned in an online fashion. Our approach is able to handle multimodal distributions over both models by employing a mixture model representation with Gaussian Process (GP) based components. To cope with the increasing complexity of the estimation process, we explore two computationally efficient GP variants, sparse online GP and local GP, which help to manage the computation requirements of each mixture component. Our experiments demonstrate that our approach can track human motion much more accurately than existing approaches that learn the prediction and observation models offline and do not update these models with the incoming data stream.

1 INTRODUCTION

Many real-world problems involve high-dimensional data. In this paper we are interested in modeling and tracking human motion. In this setting, dimensionality reduction techniques are widely employed to avoid the curse of dimensionality. Linear approaches such as principal component analysis (PCA) are very popular as they are simple to use. However, they often fail to capture complex dependencies due to their assumption of linearity. Non-linear dimensionality reduction techniques that attempt to preserve the local structure of the manifold (e.g., Isomap [21, 8], LLE [19, 14]) can capture more complex dependencies, but often suffer when the manifold assumptions are violated, e.g., in the presence of noise.

Probabilistic latent variable models have the advantage of being able to take the uncertainties into account when learning the latent representations. Perhaps the most successful model in the context of modeling human motion is the Gaussian process latent variable model (GPLVM) [12], where the non-linear mapping between the latent space and the high-dimensional space is modeled with a Gaussian process. This provides powerful prior models, which have been employed for character animation [26, 28, 15] and human body tracking [25, 16, 29].

In the context of tracking, one is interested in estimating the state of a dynamic system. The most commonly used technique for state estimation is Bayesian filtering, which recursively estimates the posterior probability of the state of the system. The two key components of the filter are the prediction model, which describes the temporal evolution of the process, and the observation model, which links the state and the observations. A parametric form is typically employed for both models.

Ko and Fox [10] introduced the GP-BayesFilter, which defines the prediction and observation models in a non-parametric way via Gaussian processes. This approach is well suited when accurate parametric models are difficult to obtain. Its main limitation, however, resides in the fact that it requires ground truth states (as GPs are supervised), which are typically not available. GPLVMs were employed in [11] to learn the latent space in an unsupervised manner, bypassing the need for labeled data. This, however, cannot exploit the incoming stream of data available in the online setting, as the latent space is learned offline. Furthermore, only unimodal prediction and observation models can be captured, due to the fact that the models learned by GPs are non-linear but Gaussian. In this paper we extend the previous non-parametric filters to learn the latent space in an online fashion as well as to handle multimodal distributions for both the prediction and observation models.
Towards this goal, we employ a mixture model representation in the particle filtering framework. For the mixture components, we investigate two computationally efficient GP variants which can update the prediction and observation models in an online fashion, and cope with the growth in complexity as the number of data points increases over time. More specifically, the sparse online GP [3] selects an active set in an online fashion to efficiently maintain sparse approximations to the models. Alternatively, the local GP [17] reduces the computation by imposing local sparsity. We demonstrate the effectiveness of our approach on a wide variety of motions, and show that both approaches perform better than existing algorithms.

In the remainder of the paper we first present a review of Bayesian filtering and the GPLVM. We then introduce our algorithm and show our experimental evaluation, followed by the conclusions.
2 BACKGROUND

In this section we review Bayesian filtering and Gaussian process latent variable models.

2.1 BAYESIAN FILTERING

Bayesian filtering is a sequential inference technique typically employed to perform state estimation in dynamic systems. Specifically, the goal is to recursively compute the posterior distribution of the current hidden state $x_t$ given the history of observations $y_{1:t} = (y_1, \dots, y_t)$ up to the current time step:
$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1}$$
where $p(x_t \mid x_{t-1})$ is the prediction model that represents the system dynamics, and $p(y_t \mid x_t)$ is the observation model that represents the likelihood of an observation $y_t$ given the state $x_t$.

One of the most fundamental Bayesian filters is the Kalman filter, which is a maximum-a-posteriori estimator for linear and Gaussian models. Unfortunately, it is often not applicable in practice, since most real dynamical systems are non-linear and/or non-Gaussian. Two popular extensions for non-linear systems are the extended Kalman filter (EKF) and the unscented Kalman filter (UKF) [9]. However, similar to the Kalman filter, the performance of the EKF and UKF is poor when the models are multimodal. In contrast, particle filters, which are not restricted to linear and Gaussian models, have been developed by using sequential Monte Carlo sampling to represent the underlying posterior $p(x_t \mid y_{1:t})$ [5]. More specifically, at each time step, $N_p$ particles of $x_t$ are drawn from the prediction model $p(x_t \mid x_{t-1})$, and then all the particles are weighted according to the observation model $p(y_t \mid x_t)$. The posterior $p(x_t \mid y_{1:t})$ is approximated using these $N_p$ weighted particles. Finally, the $N_p$ particles are resampled for the next step.

Unfortunately, the parametric description of the dynamic models limits the estimation accuracy of Bayesian filters. Recently, a number of GP-based Bayesian filters were proposed that learn the prediction and observation models using GP regression [10, 4]. This is a promising alternative, as GPs are non-parametric and can capture complex mappings. However, training these methods requires access to ground truth data before filtering. Unfortunately, the inputs of the training set are the hidden states, which are not always known in real-world applications. Two extensions were introduced to learn the hidden states of the training set via a non-linear latent variable model [11] or a sparse pseudo-input GP regression [22]. However, these methods require offline learning procedures, which are not able to exploit the incoming data streams. In contrast, we propose two non-parametric particle filters that are able to exploit the incoming data to learn better models in an online fashion.
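As an illustration, the following is a minimal sketch (ours, not the authors') of one step of the bootstrap particle filter described above; sample_prediction and observation_likelihood are hypothetical stand-ins for the prediction model $p(x_t \mid x_{t-1})$ and the observation model $p(y_t \mid x_t)$:

    import numpy as np

    def bootstrap_pf_step(particles, y_t, sample_prediction, observation_likelihood, rng):
        """One bootstrap particle filter step. particles: (N_p, D_x) states at t-1."""
        N_p = particles.shape[0]
        # 1. Propagate each particle through the prediction model p(x_t | x_{t-1}).
        propagated = np.stack([sample_prediction(x, rng) for x in particles])
        # 2. Weight each particle by the observation model p(y_t | x_t).
        w = np.array([observation_likelihood(y_t, x) for x in propagated])
        w /= w.sum()
        # 3. Resample N_p particles with probabilities w for the next step.
        idx = rng.choice(N_p, size=N_p, p=w)
        return propagated[idx], w

Systematic or stratified resampling is often preferred in practice over the multinomial resampling shown here, as it reduces resampling variance.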
2.2 GAUSSIAN PROCESS DYNAMICAL MODEL

The Gaussian process latent variable model (GPLVM) is a probabilistic dimensionality reduction technique which places a GP prior on the observation model [12]. Wang et al. [28] proposed the Gaussian Process Dynamical Model (GPDM), which enriches the GPLVM to capture temporal structure by incorporating a GP prior over the dynamics in the latent space. Formally, the model is
$$x_t = f_x(x_{t-1}) + \eta_x$$
$$y_t = f_y(x_t) + \eta_y$$
where $y_t \in \mathbb{R}^{D_y}$ represents the observation and $x_t \in \mathbb{R}^{D_x}$ the latent state, with $D_y \gg D_x$. The noise processes are assumed to be Gaussian, $\eta_x \sim N(0, \sigma_x^2 I)$ and $\eta_y \sim N(0, \sigma_y^2 I)$. The non-linear functions $f_x^i$ and $f_y^i$ have GP priors, i.e., $f_x^i \sim GP(0, k_x(x, x'))$ and $f_y^i \sim GP(0, k_y(x, x'))$, where $k_x(\cdot,\cdot)$ and $k_y(\cdot,\cdot)$ are the kernel functions. For simplicity, we denote the hyperparameters of the kernel functions by $\theta$.

Let $x_{1:T} = (x_1, \dots, x_T)$ be the latent space coordinates from time $t = 1$ to time $t = T$. A GPDM is typically learned by minimizing the negative log posterior $-\log p(x_{1:T}, \theta \mid y_{1:T})$ with respect to $x_{1:T}$ and $\theta$ [28]. After $x_{1:T}$ and $\theta$ are obtained, a standard GP prediction is used to construct the models $p(x_t \mid x_{t-1}, \theta, X_T)$ and $p(y_t \mid x_t, \theta, Y_T)$ with data $X_T = \{(x_{k-1}, x_k)\}_{k=2}^T$ and $Y_T = \{(x_k, y_k)\}_{k=1}^T$. Tracking ($t > T$) is then performed assuming the model is fixed, and can be done using, e.g., a particle filter as described above. The major drawback of this approach is that it is not able to adapt to new observations during tracking. As shown in our experimental evaluation, this results in poor performance when the training set is small.
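For reference, the standard GP prediction used to construct these models can be sketched as follows (our illustration, not the authors' code; the RBF-plus-linear compound kernel anticipates the one used in the experiments of Section 4, and all variable names are ours):

    import numpy as np

    def compound_kernel(A, B, sf2=1.0, gamma=1.0, l=0.1):
        # k(x, x') = sf2 * exp(-0.5 * ||x - x'||^2 / gamma^2) + l * x^T x'
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf2 * np.exp(-0.5 * sq / gamma**2) + l * A @ B.T

    def gp_predict(X, Y, x_star, noise=1e-2):
        """Standard GP regression prediction: mean and variance at x_star,
        given training inputs X (N, D_in) and outputs Y (N, D_out)."""
        K = compound_kernel(X, X) + noise * np.eye(len(X))
        k_star = compound_kernel(X, x_star[None, :])[:, 0]
        mean = k_star @ np.linalg.solve(K, Y)      # k*^T K^{-1} Y
        var = (compound_kernel(x_star[None, :], x_star[None, :])[0, 0]
               - k_star @ np.linalg.solve(K, k_star) + noise)
        return mean, var

The $O(N^3)$ solve in gp_predict is exactly the cost that motivates the sparse and local approximations introduced in Section 3.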

3 ONLINE GP PARTICLE FILTER

In order to solve the above-mentioned difficulties in learning and filtering with dynamic systems, we propose an online GP particle filter framework to learn and refine the model during tracking, i.e., the prediction $p(x_t \mid x_{t-1})$ and observation $p(y_t \mid x_t)$ models are updated online within the particle filtering framework. To account for multimodality and the significant amount of uncertainty that can be present, we propose to represent the prediction and observation models by a mixture model. For each mixture component, we investigate two different GP variants. Let the prediction and observation models at time $t-1$ be
$$p(x_t \mid x_{t-1}, \Theta_{t-1,M}) = \frac{1}{R_M} \sum_{i=1}^{R_M} p(x_t \mid x_{t-1}, \Theta^i_{t-1,M}) \quad (1)$$
$$p(y_t \mid x_t, \Theta_{t-1,O}) = \frac{1}{R_O} \sum_{i=1}^{R_O} p(y_t \mid x_t, \Theta^i_{t-1,O}) \quad (2)$$
where $\Theta^i_{t-1,M}$ and $\Theta^i_{t-1,O}$ represent the parameters of the $i$-th component, and $\Theta_{t-1,M} = \{\Theta^i_{t-1,M}\}_{i=1}^{R_M}$ and $\Theta_{t-1,O} = \{\Theta^i_{t-1,O}\}_{i=1}^{R_O}$ are the parameters of all components.

At the $t$-th time step, we run a standard particle filter to obtain a number of weighted particles. The latent space representations at time $t$ can be obtained by resampling the weighted particles. Then, we assign each particle to the most likely mixture component of $p(x_t \mid x_{t-1}, \Theta_{t-1,M})$ and of $p(y_t \mid x_t, \Theta_{t-1,O})$ to capture the multi-modality of the prediction and observation models. Finally, we compute the mean latent states of the assigned particles and use these means to update the corresponding components' parameters: $\Theta^i_{t,M}$ for the prediction (or motion) model and $\Theta^i_{t,O}$ for the observation model. The whole framework is summarized in Algorithm 1.

Algorithm 1 Online GP-Particle Filter
1: Initialize model parameters $\Theta_T$ based on $y_{1:T}$
2: Initialize particle set $x_T^{(1:N_p)}$ based on $y_{1:T}$
3: for $t = T+1, T+2, \dots$ do
4:   for $i = 1$ to $N_p$ do
5:     $x_t^{(i)} \sim p(x_t \mid x_{t-1}^{(i)}, \Theta_{t-1,M})$
6:     $\hat{w}_t^{(i)} = p(y_t \mid x_t^{(i)}, \Theta_{t-1,O})$
7:   end for
8:   Normalize weights $w_t^{(i)} = \hat{w}_t^{(i)} / \sum_{i=1}^{N_p} \hat{w}_t^{(i)}$
9:   Resample particle set with probabilities $w_t^{(1:N_p)}$
10:  for $i = 1$ to $N_p$ do
11:    $\eta_M^i = \arg\max_j \, p(x_t^{(i)} \mid x_{t-1}^{(i)}, \Theta^j_{t-1,M})$
12:    $\eta_O^i = \arg\max_j \, p(y_t \mid x_t^{(i)}, \Theta^j_{t-1,O})$
13:  end for
14:  for $j = 1$ to $R_M$ do
15:    $n_j = \sum_{i=1}^{N_p} \delta(\eta_M^i = j)$
16:    $\bar{x}_{t-1}^j = \frac{1}{n_j} \sum_{i=1}^{N_p} \delta(\eta_M^i = j) \, x_{t-1}^{(i)}$
17:    $\bar{x}_t^j = \frac{1}{n_j} \sum_{i=1}^{N_p} \delta(\eta_M^i = j) \, x_t^{(i)}$
18:    Update $\Theta^j_{t,M}$ with $(\bar{x}_{t-1}^j, \bar{x}_t^j)$
19:  end for
20:  for $j = 1$ to $R_O$ do
21:    $n_j = \sum_{i=1}^{N_p} \delta(\eta_O^i = j)$
22:    $\bar{x}_t^j = \frac{1}{n_j} \sum_{i=1}^{N_p} \delta(\eta_O^i = j) \, x_t^{(i)}$
23:    Update $\Theta^j_{t,O}$ with $(\bar{x}_t^j, y_t)$
24:  end for
25: end for

What remains is to specify how the parameters of the individual components are represented and updated (lines 18 and 23 in Algorithm 1). As noted above, we aim to use a GP model for each mixture component. However, a standard implementation would require $O(t^3)$ operations and $O(t^2)$ memory. As $t$ grows linearly over time, the particle filter will quickly become too computationally and memory intensive. Thus a primary challenge is how to efficiently update the GP mixture components in the prediction and observation models. In order to efficiently update $\Theta^i_{t,M}$ and $\Theta^i_{t,O}$ in an online manner, we consider two fast GP-based strategies, the Sparse Online GP (SOGP) and Local GPs (LGP), in which the reduction in memory and/or computation is achieved by an online sparsification and a local experts mechanism, respectively. A detailed review of fast GP approaches can be found in [1, 18]. The specific contents of $\Theta^i_{t,M}$ or $\Theta^i_{t,O}$ will vary depending on the method used. In the case of SOGP it will contain some computed quantities and the active set, while for LGP it will simply be the set of all training points. While we will focus on these two strategies, we note that in principle any similar update strategy could be used instead, such as informative vector machines [13] or local regression approaches [6, 7, 20].
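A compact sketch of one step of Algorithm 1 might look as follows (our illustration of the control flow, not the authors' implementation; the component objects with sample, logpdf and update methods are hypothetical placeholders for the SOGP or LGP variants described next):

    import numpy as np

    def online_gp_pf_step(particles_prev, y_t, motion_comps, obs_comps, rng):
        """One step of the online GP particle filter (Algorithm 1 sketch)."""
        particles_prev = np.asarray(particles_prev)
        N_p = len(particles_prev)
        # Lines 4-7: sample from the equal-weight mixture prediction model (Eq. 1)
        # by first drawing a component index uniformly, then weight by Eq. (2).
        comp_idx = rng.integers(len(motion_comps), size=N_p)
        particles = np.stack([motion_comps[c].sample(x, rng)
                              for c, x in zip(comp_idx, particles_prev)])
        w = np.array([np.mean([np.exp(o.logpdf(y_t, x)) for o in obs_comps])
                      for x in particles])
        w /= w.sum()
        # Line 9: resample.
        idx = rng.choice(N_p, size=N_p, p=w)
        particles, particles_prev = particles[idx], particles_prev[idx]
        # Lines 10-13: assign each particle to its most likely component.
        eta_M = [np.argmax([m.logpdf(x, xp) for m in motion_comps])
                 for x, xp in zip(particles, particles_prev)]
        eta_O = [np.argmax([o.logpdf(y_t, x) for o in obs_comps]) for x in particles]
        # Lines 14-24: update each component with the mean of its assigned particles.
        for j, m in enumerate(motion_comps):
            sel = [i for i in range(N_p) if eta_M[i] == j]
            if sel:
                m.update(particles_prev[sel].mean(0), particles[sel].mean(0))
        for j, o in enumerate(obs_comps):
            sel = [i for i in range(N_p) if eta_O[i] == j]
            if sel:
                o.update(particles[sel].mean(0), y_t)
        return particles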
In what follows, to avoid confusion with the notation for the latent state and observation, we will use $a$ and $b$ to indicate the input and output when we describe SOGP and LGP regression, where we consider modeling a generic function $b = f(a) + \xi$, with $\xi \sim N(0, \sigma^2 I)$.

3.1 SPARSE ONLINE GAUSSIAN PROCESS

The Sparse Online Gaussian Process (SOGP) of [3, 27] is a well-known algorithm for online learning of GP models. To cope with the fact that data arrives in an online manner, SOGP trains a GP model sequentially by updating the posterior mean and covariance of the latent function values of the training set. This online procedure is coupled with a sparsification strategy which iteratively selects a fixed-size subset of points to form the active set, preventing the otherwise unbounded growth of the computation and memory load. The key of SOGP is to maintain the joint posterior over the latent function values of the fixed-size active set $D_{t-1}$, i.e., $N(\mu_{t-1}, \Sigma_{t-1})$, by recursively updating $\mu_{t-1}$ and $\Sigma_{t-1}$.

When a new observation $(a_t, b_t)$ is available, we perform the following update to take the new data point into account [3]:
$$q_t = Q_{t-1} k_{t-1}(a_t) \quad (3)$$
$$\rho_t^2 = k(a_t, a_t) - k_{t-1}(a_t)^\top Q_{t-1} k_{t-1}(a_t) \quad (4)$$
$$\hat{\sigma}_t^2 = \sigma^2 + \rho_t^2 + q_t^\top \Sigma_{t-1} q_t \quad (5)$$
$$\delta_t = \begin{bmatrix} \Sigma_{t-1} q_t \\ \rho_t^2 + q_t^\top \Sigma_{t-1} q_t \end{bmatrix} \quad (6)$$
$$\mu_t = \begin{bmatrix} \mu_{t-1} \\ q_t^\top \mu_{t-1} \end{bmatrix} + \hat{\sigma}_t^{-2} \left( b_t - q_t^\top \mu_{t-1} \right) \delta_t \quad (7)$$
$$\Sigma_t = \begin{bmatrix} \Sigma_{t-1} & \Sigma_{t-1} q_t \\ q_t^\top \Sigma_{t-1} & \rho_t^2 + q_t^\top \Sigma_{t-1} q_t \end{bmatrix} - \hat{\sigma}_t^{-2} \, \delta_t \delta_t^\top \quad (8)$$
where $k_{t-1}(a_t)$ is the kernel vector constructed from $a_t$ and the active set $D_{t-1}$, and $Q_{t-1}$ is the inverse kernel matrix of the active set $D_{t-1}$. (For simplicity of presentation, we assume that $b_t$ is a scalar; the extension to vector-valued $b_t$ is straightforward.)

One of the key steps in this algorithm is deciding when to add the new point to the active set. We employ the strategy suggested by [3, 27], and ignore the new point when $\rho_t^2 < \epsilon$ for some small value of $\epsilon$. In this case, $\mu_t$ and $\Sigma_t$ are updated as $\mu_t \leftarrow [\mu_t]_{-i}$ and $\Sigma_t \leftarrow [\Sigma_t]_{-i,-i}$, where $i$ is the index of the new point, $[\cdot]_{-i}$ removes the $i$-th entry of a vector, and $[\cdot]_{-i,-i}$ removes the $i$-th row and column of a matrix. Additionally, the inverse kernel matrix is simply $Q_t = Q_{t-1}$, because the new point is not included in the active set. When $\rho_t^2 \geq \epsilon$, we add the new point to the active set, $D_t = D_{t-1} \cup \{(a_t, b_t)\}$. Then $\mu_t$ and $\Sigma_t$ are the same as in Eqs. (7) and (8), and the inverse kernel matrix is updated as [3]
$$Q_t = \begin{bmatrix} Q_{t-1} & 0 \\ 0^\top & 0 \end{bmatrix} + \rho_t^{-2} \begin{bmatrix} q_t \\ -1 \end{bmatrix} \begin{bmatrix} q_t \\ -1 \end{bmatrix}^\top \quad (9)$$

When the size of the active set exceeds the fixed size $N_A$ because a new point was added, we must remove a point. This is done by selecting the one which minimally affects the predictions according to the squared predicted error. Following [3, 27], we remove the $j$-th data point with
$$j = \arg\min_j \frac{\left( [Q_t \mu_t]_j \right)^2}{[Q_t]_{j,j}} \quad (10)$$
where $[\cdot]_j$ selects the $j$-th entry of a vector and $[\cdot]_{j,j}$ the $j$-th diagonal entry of a matrix. Once a point has been selected for removal, $\mu_t$, $\Sigma_t$ and $Q_t$ are updated as
$$\mu_t \leftarrow [\mu_t]_{-j} \quad (11)$$
$$\Sigma_t \leftarrow [\Sigma_t]_{-j,-j} \quad (12)$$
$$Q_t \leftarrow [Q_t]_{-j,-j} - \frac{[Q_t]_{-j,j} [Q_t]_{-j,j}^\top}{[Q_t]_{j,j}} \quad (13)$$
where $[\cdot]_{-j,j}$ selects the $j$-th column of the matrix with the $j$-th row removed, and the point is removed from the active set, $D_t \leftarrow D_t \setminus \{(a_j, b_j)\}$.

The joint posterior at time $t$ can be used to construct the predictive distribution for a new input $a_*$:
$$p_{SOGP}(b_* \mid a_*, D_t, \Theta) = N(b_* \mid \bar{b}_*, \sigma_*^2) \quad (14)$$
where $\bar{b}_* = k_t(a_*)^\top Q_t \mu_t$ and $\sigma_*^2 = \sigma^2 + k(a_*, a_*) + k_t(a_*)^\top (Q_t \Sigma_t Q_t - Q_t) k_t(a_*)$. We summarize the SOGP updates in Algorithm 2.

Algorithm 2 SOGP Update
input: previous posterior quantities $\mu_{t-1}$, $\Sigma_{t-1}$, $Q_{t-1}$
input: previous active set $D_{t-1}$
input: new input-output observation pair $(a_t, b_t)$
1: Compute $\rho_t^2$, $\mu_t$ and $\Sigma_t$ as in Equations (4), (7) and (8).
2: if $\rho_t^2 < \epsilon$ then
3:   Perform update $\mu_t \leftarrow [\mu_t]_{-i}$, $\Sigma_t \leftarrow [\Sigma_t]_{-i,-i}$, where $i$ is the index of the newly added row of $\mu_t$.
4:   Set $Q_t = Q_{t-1}$, $D_t = D_{t-1}$.
5: else
6:   Compute $Q_t$ as in Equation (9).
7:   Add to active set: $D_t = D_{t-1} \cup \{(a_t, b_t)\}$.
8: end if
9: if $|D_t| > N_A$ then
10:  Select the point $j$ to remove using Equation (10).
11:  Perform update $\mu_t \leftarrow [\mu_t]_{-j}$, $\Sigma_t \leftarrow [\Sigma_t]_{-j,-j}$ and $Q_t \leftarrow [Q_t]_{-j,-j} - [Q_t]_{-j,j}[Q_t]_{-j,j}^\top / [Q_t]_{j,j}$.
12:  Remove $j$ from active set: $D_t \leftarrow D_t \setminus \{(a_j, b_j)\}$.
13: end if
output: $\mu_t$, $\Sigma_t$, $Q_t$ and $D_t$
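Under the equations above, a minimal SOGP implementation can be sketched as follows (our own illustration, not the authors' code; the first-point initialization and the scalar kernel signature k(x, x') are simplifying assumptions):

    import numpy as np

    class SOGP:
        """Sketch of the SOGP update (Eqs. 3-13) for scalar outputs."""
        def __init__(self, kernel, sigma2, eps, max_active):
            self.k, self.sigma2, self.eps, self.N_A = kernel, sigma2, eps, max_active
            self.A = []                   # active set inputs
            self.mu = np.zeros(0)         # posterior mean
            self.Sig = np.zeros((0, 0))   # posterior covariance
            self.Q = np.zeros((0, 0))     # inverse kernel matrix

        def update(self, a, b):
            if not self.A:                # first point: initialize posterior directly
                kaa = self.k(a, a)
                self.A = [a]
                self.mu = np.array([kaa * b / (kaa + self.sigma2)])
                self.Sig = np.array([[kaa - kaa**2 / (kaa + self.sigma2)]])
                self.Q = np.array([[1.0 / kaa]])
                return
            kvec = np.array([self.k(ai, a) for ai in self.A])
            q = self.Q @ kvec                                   # Eq. (3)
            rho2 = self.k(a, a) - kvec @ q                      # Eq. (4)
            shat2 = self.sigma2 + rho2 + q @ self.Sig @ q       # Eq. (5)
            delta = np.append(self.Sig @ q, rho2 + q @ self.Sig @ q)   # Eq. (6)
            mu = np.append(self.mu, q @ self.mu) \
                 + (b - q @ self.mu) / shat2 * delta            # Eq. (7)
            top = np.column_stack([self.Sig, self.Sig @ q])
            bot = np.append(q @ self.Sig, rho2 + q @ self.Sig @ q)
            Sig = np.vstack([top, bot]) - np.outer(delta, delta) / shat2  # Eq. (8)
            if rho2 < self.eps:           # do not add: drop the new row/column again
                self.mu, self.Sig = mu[:-1], Sig[:-1, :-1]
                return
            u = np.append(q, -1.0)
            self.Q = np.pad(self.Q, ((0, 1), (0, 1))) + np.outer(u, u) / rho2  # Eq. (9)
            self.mu, self.Sig, self.A = mu, Sig, self.A + [a]
            if len(self.A) > self.N_A:    # remove least informative point, Eq. (10)
                score = (self.Q @ self.mu) ** 2 / np.diag(self.Q)
                j = int(np.argmin(score))
                keep = [i for i in range(len(self.A)) if i != j]
                qj = self.Q[keep, j]
                self.Q = self.Q[np.ix_(keep, keep)] - np.outer(qj, qj) / self.Q[j, j]  # Eq. (13)
                self.mu, self.Sig = self.mu[keep], self.Sig[np.ix_(keep, keep)]
                self.A = [self.A[i] for i in keep]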
3.2 LOCAL GAUSSIAN PROCESSES

An alternative to the SOGP approach is to use Local Gaussian Processes (LGP), which were developed specifically to deal with large, multi-modal regression problems [17, 7]. In LGP, given a test input $a_*$ and a set of input-output pairs $D = \{(a_i, b_i)\}_{i=1}^N$, the $M_a$ nearest neighbors $D_a = \{(a_l, b_l)\}_{l=1}^{M_a}$ are selected based on the distance in the input space, $d_l = \|a_l - a_*\|$. Then, for each of the $M_a$ neighbors, the $M_b$ nearest neighbors $D_{b_l} = \{(a_j, b_j)\}_{j=1}^{M_b}$ are selected based on the distance in the output space to $b_l$. These neighbors are then combined to form a local GP expert, which makes a Gaussian prediction with mean and covariance
$$\mu_l = B_{D_{b_l}} K^{-1}_{D_{b_l}, D_{b_l}} k_{D_{b_l}}(a_*)$$
$$\sigma_l^2 = k(a_*, a_*) - k_{D_{b_l}}(a_*)^\top K^{-1}_{D_{b_l}, D_{b_l}} k_{D_{b_l}}(a_*) + \sigma^2$$
where $B_{D_{b_l}}$ is the matrix whose columns are the $M_b$ nearest neighbors of $b_l$, $k_{D_{b_l}}(a_*)$ is the vector of kernel function values for the input $a_*$ and the points in $D_{b_l}$, and $K_{D_{b_l}, D_{b_l}}$ is the kernel matrix for the points in $D_{b_l}$. The final predictive distribution is then formed by combining all local experts in a mixture model:
$$p_{LGP}(b_* \mid a_*, D_t, \Theta) = \sum_{l=1}^{M_a} w_l \, N(b_* \mid \mu_l, \sigma_l^2 I) \quad (15)$$
with weights $w_l \propto 1/d_l$.
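A minimal sketch of the LGP prediction in Eq. (15) under these definitions follows (our illustration; the kernel argument is a matrix-valued kernel function such as the compound_kernel sketched in Section 2.2, and the small jitter added to the kernel matrix is a numerical-stability assumption not in the equations):

    import numpy as np

    def lgp_predict(a_star, A, B, kernel, sigma2, M_a, M_b):
        """LGP prediction sketch. A: (N, D_in) inputs, B: (N, D_out) outputs."""
        d = np.linalg.norm(A - a_star, axis=1)
        nn = np.argsort(d)[:M_a]                   # M_a nearest neighbors in input space
        experts = []
        for l in nn:
            # M_b nearest neighbors of b_l in the *output* space form this expert.
            idx = np.argsort(np.linalg.norm(B - B[l], axis=1))[:M_b]
            K = kernel(A[idx], A[idx]) + 1e-8 * np.eye(len(idx))
            k_star = kernel(A[idx], a_star[None, :])[:, 0]
            mu_l = B[idx].T @ np.linalg.solve(K, k_star)
            var_l = (kernel(a_star[None, :], a_star[None, :])[0, 0]
                     - k_star @ np.linalg.solve(K, k_star) + sigma2)
            experts.append((mu_l, var_l))
        w = 1.0 / np.maximum(d[nn], 1e-12)         # weights w_l proportional to 1/d_l
        w /= w.sum()
        mean = sum(wi * mu for wi, (mu, _) in zip(w, experts))
        return mean, experts, w

Note that each expert only inverts an $M_b \times M_b$ matrix, which is what gives LGP its favorable prediction cost compared to a full GP.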
4 EXPERIMENTAL EVALUATION

To illustrate our approach we choose very different motions, i.e., walking, a golf swing, swimming, as well as an exercise sequence (composed of side twist and squat). The data consists of motion capture from the CMU dataset [2], where each observation is a vector containing the 3D rotations of all joints. We normalize the data to be zero mean and subsample the observations at a motion-specific frame rate to reduce the correlation between consecutive frames. We compute all results averaged over multiple trials and report the average root mean squared error as our measure of performance. In all the experiments, the latent space dimensionality is set to 3, as is common for human motion models [28]. We use PCA to initialize the latent space and K-means to obtain the data points used for the mixture components. We choose the compound kernel function $k(x, x') = \sigma_f^2 \exp(-0.5 \|x - x'\|^2 / \gamma^2) + l \, x^\top x'$ for both the prediction and observation mappings. Unless otherwise stated, the number of particles, the size of the initial training set, and the numbers of mixture components are fixed per motion (walking/golf/swimming/exercise), as are the number and size of the local experts for LGP and the size of the active set for SOGP. These parameter values were chosen to balance computational cost with prediction accuracy, and in our experiments we demonstrate the robustness of our approach to these parameters.

4.1 COMPARISON TO STATE-OF-THE-ART

We compare our approaches to two baselines. The first one is the approach of Ko and Fox [11], where a GPDM is learned offline with gradient descent [28] before performing particle filtering for state estimation. The second baseline is similar, but learns the GPDM offline using stochastic gradient descent [29]. We tested the baselines in two different settings: in the first, only the initial training set is available to learn the prediction and observation models; in the second, all the data (including future streamed examples) are used to learn the prediction and observation models. Note that the latter represents an oracle for Ko and Fox [11].

Number of Particles: We evaluate how the accuracy changes as a function of the number of particles, $N_p$. As expected, the prediction error is reduced in all the methods when the number of particles increases. As shown in the first row of Fig. 1, our approaches are superior to the baselines. Importantly, we outperformed the oracle baseline, as we are able to represent multi-modal distributions effectively. This is particularly important in the exercise sequence, as the dynamics are clearly multimodal due to the different motions that are performed in the sequence. Furthermore, our LGP variant outperforms SOGP. We believe this is due to the fact that SOGP has a fixed capacity while LGP is able to leverage more training data when making predictions.

Influence of noise: In this experiment we evaluate the robustness of all approaches to additive noise in the observations. The second row of Fig. 1 shows that our LGP particle filter significantly outperforms the baselines, particularly in the exercise sequence. Our SOGP variant outperforms all baselines that have access to the same training set, and is only beaten by the oracle for walking.

Size of Training Set: We next evaluate how the accuracy depends on the size of the initial training set, $T$. The first row of Fig. 2 clearly indicates that our methods perform well even when the training set is very small. In contrast, the two baselines require bigger training sets to achieve comparable performance. This is expected, as the baselines do not update the latent space to take the incoming observations into account.

4.2 QUALITATIVE EXPERIMENTS

Fig. 3 shows the latent space of both the SOGP and LGP filters, with the particles at each time step depicted in blue. From the 3D latent spaces and predicted skeletons, we find that the manifolds of both the LGP and SOGP particle filters provide a good representation of the high-dimensional human motion data.

4.3 PROPERTIES OF OUR METHODS

We next discuss various aspects of our method and evaluate the influence of the parameters of the SOGP and LGP filters. For LGP, due to the fact that the data sizes of the walking, golf and swimming motions are small, we reduced the number (size) of the local experts in order to be able to increase the size (number) of the local experts.

Computational Complexity: Overall, the computational complexity of our method (Algorithm 1) is mainly determined by the complexity of constructing a prediction distribution for each component (lines 4-7 and 11-12) and of the model updates (lines 18 and 23). Specifically, for an individual component, which is either SOGP or LGP, computing the prediction distribution is $O(N_A^2)$ or $O(M_a M_b^3 + T M_a M_b)$ respectively, where $N_A$ is the size of the active set, $M_a$ is the number of local experts, $M_b$ is the number of neighbors in the output space, and the $T M_a M_b$ term comes from the KNN search. The model updates for the mixture components (lines 18 and 23) have a computational complexity of $O(N_A^2)$ and $O(1)$ for SOGP and LGP respectively.

Figure 1: Root mean squared error as a function of (1st row) the number of particles in the particle filter and (2nd row) the standard deviation of the noise added to the observations. The columns (from left to right) represent the walking, golf, swimming and exercise motions. Note that our approach outperforms the baselines in all settings. Handling multimodal distributions is particularly important in the exercise example, as it is composed of a variety of different motions. (Methods compared: GPDM+PF, GPDM(Oracle)+PF, Stochastic GPDM+PF, Stochastic GPDM(Oracle)+PF, Our SOGP+PF, Our LGP+PF.)

Figure 2: Root mean squared error as a function of (1st row) the number of initial training points and (2nd row) the number of missing dimensions. The columns (from left to right) are respectively for the walking, golf, swimming and exercise motions.

Figure 3: 3D latent spaces learned while tracking (columns, from left to right: walk, golf, swim, exercise). The first row depicts the results of our SOGP variant, while the second row shows our LGP variant. In the walk/golf/swimming plots the red curve represents the predicted mean of the latent state sequence and the blue crosses are the particles at each step. In the plots for the exercise sequence (last column), the red, blue and green curves are the predicted means of the latent state for the three motions in the exercise sequence, and the black crosses are the particles at each step. The third row depicts the predicted skeletons, where the ground truth is shown in green, our SOGP variant in blue and our LGP variant in red.

Figure 4: Root mean squared error as a function of the number of mixture components. The columns (from left to right) are respectively for the walking, golf, swimming and exercise motions. (Methods compared: Our SOGP+PF, Our SOGP (no update)+PF, Our LGP+PF, Our LGP (no update)+PF.)

Number of Mixture Components: Fig. 4 shows performance as a function of the number of mixture components, $R_M$ and $R_O$, for both SOGP and LGP; in all cases we set $R_M = R_O$, and for LGP+PF the number and size of the local GPs are again set per motion. Note that performance typically increases with the number of mixture components for SOGP, but less so for LGP. Furthermore, our approaches outperform the baselines in which the model is not updated during filtering, indicating that the online model updating is very important in practice. Also note that while LGP generally outperforms SOGP, the difference quickly declines as the number of mixture components increases. This suggests that, when the fixed memory requirements of SOGP are desirable, a larger number of mixture components will achieve performance comparable to LGP.

Figure 5: Root mean squared error as a function of (a) the size of the active set in SOGP, (b) the number of local GP experts in LGP, and (c) the size of each local GP expert in LGP. In subplot (c), the top x-axis is for the exercise motion and the bottom one for the other motions.

Figure 6: Predicted skeletons for the missing parts (Walk: two legs; Golf, Swim and Exercise: left arm). The ground truth is shown in green, our SOGP particle filter in blue and our LGP particle filter in red. We show the predicted performance at several time steps for each motion.

Active Set Size in SOGP: To explore the effect of the size of the active set, $N_A$, on performance, we fix the number of mixture components, $R_M$ and $R_O$, per motion and use the same settings as before for the other parameters. Results are shown in Fig. 5(a). As expected, performance improves as the size of the active set increases.

Number and Size of Local Experts in LGP: Figs. 5(b) and 5(c) show the performance of our approach as a function of the number of local GP experts, $M_a$, as well as of their size, $M_b$. For this experiment we again fix the number of mixture components per motion and use the same settings as before for the other parameters, except when evaluating the size of each local GP expert, where we fix the number of local GP experts. As shown in the figure, even with a small number (size) of local GP experts, we still achieve good performance.

4.4 HANDLING MISSING DATA

In this setting, we evaluate the capability of our approaches to handle missing data. We assume that the initial training set has no missing values, but that a fixed set of joint angles is missing from all incoming frames. Our approach is able to cope with missing data with only two small modifications. First, particles are weighted based only on the observed dimensions. Furthermore, when updating the prediction and observation models, we employ mean imputation for the missing observation dimensions.

Fig. 6 shows reconstructions of the missing dimensions for all our motions, which consist of the two legs for walking and the left arm for the golf swing, swimming and exercise motions. We can see that our approach is able to reconstruct the missing parts well. Finally, to evaluate the tracking performance as a function of the number of missing dimensions, we randomly generate the indices of the missing dimensions and use the same missing dimensions for all incoming frames. The second row of Fig. 2 shows that, compared to the baselines, our methods perform well for all the motions even when a substantial fraction of the skeleton's dimensions is missing. In addition, our LGP particle filter outperforms our SOGP variant.
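These two modifications can be sketched as follows (our illustration; the isotropic Gaussian observation likelihood and the hypothetical per-particle predictive-mean function obs_mean_fn are assumptions, as the paper does not prescribe this exact form):

    import numpy as np

    def weight_and_impute(y_t, observed, particles, obs_mean_fn, obs_var):
        """Weight particles on the observed dimensions only, then mean-impute
        the missing dimensions before the model update. `observed` is a
        boolean mask over the observation dimensions."""
        # 1. Particle weights from the observed dimensions only.
        w = np.array([
            np.exp(-0.5 * np.sum((y_t[observed] - obs_mean_fn(x)[observed]) ** 2)
                   / obs_var)
            for x in particles])
        w /= w.sum()
        # 2. Mean imputation of the missing dimensions before updating the models.
        y_filled = y_t.copy()
        x_mean = (w[:, None] * particles).sum(0)      # weighted mean latent state
        y_filled[~observed] = obs_mean_fn(x_mean)[~observed]
        return w, y_filled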

5 CONCLUSION

In this paper we have presented a novel non-parametric approach to Bayesian filtering, where the observation and prediction models are constructed using a mixture model with GP components learned in an online fashion. We have demonstrated that our approach can capture multimodality accurately and efficiently through online updates. We have explored two fast GP variants for updating, which keep the memory and computation bounded for the individual mixture components. We have demonstrated the effectiveness of our approach when tracking different human motions, and explored the impact of the various parameters on performance. The local GP particle filter proved superior to our SOGP variant; however, these differences can be mitigated by using more mixture components with SOGP. In the future, we plan to investigate the usefulness of our approach in other settings such as shape deformation estimation and financial time series.

References

[1] K. Chalupka, C. K. I. Williams, and I. Murray. A framework for evaluating approximation methods for Gaussian process regression. JMLR, 2013.
[2] CMU Mocap Database. http://mocap.cs.cmu.edu/.
[3] L. Csato and M. Opper. Sparse online Gaussian processes. Neural Computation, 2002.
[4] M. P. Deisenroth, M. F. Huber, and U. D. Hanebeck. Analytic moment-based Gaussian process filtering. In ICML, 2009.
[5] A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 2000.
[6] T. Hastie and C. Loader. Local regression: automatic kernel carpentry. Statistical Science, 1993.
[7] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 1991.
[8] O. C. Jenkins and M. Matarić. A spatio-temporal extension to Isomap nonlinear dimension reduction. In ICML, 2004.
[9] S. J. Julier, J. K. Uhlmann, and H. F. Durrant-Whyte. A new approach for filtering nonlinear systems. In American Control Conference, 1995.
[10] J. Ko and D. Fox. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. In IROS, 2008.
[11] J. Ko and D. Fox. Learning GP-BayesFilters via Gaussian process latent variable models. In RSS, 2009.
[12] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. JMLR, 2005.
[13] N. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: the informative vector machine. In NIPS, 2003.
[14] C.-S. Lee and A. Elgammal. Coupled visual and kinematics manifold models for human motion analysis. IJCV, 2010.
[15] S. Levine, J. Wang, A. Haraux, Z. Popović, and V. Koltun. Continuous character control with low-dimensional embeddings. In SIGGRAPH, 2012.
[16] K. Moon and V. Pavlovic. Impact of dynamics on subspace embedding and tracking of sequences. In CVPR, 2006.
[17] D. Nguyen-Tuong, J. Peters, and M. Seeger. Local Gaussian process regression for real time online model learning and control. In NIPS, 2008.
[18] J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. JMLR, 2005.
[19] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000.
[20] S. Schaal and C. G. Atkeson. Constructive incremental learning from only local information. Neural Computation, 1998.
[21] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.
[22] R. Turner, M. P. Deisenroth, and C. E. Rasmussen. State-space inference and learning with Gaussian processes. In AISTATS, 2010.
[23] R. Urtasun and T. Darrell. Sparse probabilistic regression for activity-independent human pose inference. In CVPR, 2008.
[24] R. Urtasun, D. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking from small training sets. In ICCV, 2005.
[25] R. Urtasun, D. Fleet, and P. Fua. 3D people tracking with Gaussian process dynamical models. In CVPR, 2006.
[26] R. Urtasun, D. Fleet, A. Geiger, J. Popović, T. Darrell, and N. Lawrence. Topologically-constrained latent variable models. In ICML, 2008.
[27] S. Van Vaerenbergh, M. Lázaro-Gredilla, and I. Santamaría. Kernel recursive least-squares tracker for time-varying regression. IEEE Transactions on Neural Networks and Learning Systems, 2012.
[28] J. Wang, D. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. PAMI, 2008.
[29] A. Yao, J. Gall, L. V. Gool, and R. Urtasun. Learning probabilistic non-linear latent variable models for tracking complex activities. In NIPS, 2011.