Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style




Graham W. Taylor GWTAYLOR@CS.TORONTO.EDU
Geoffrey E. Hinton HINTON@CS.TORONTO.EDU
Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2G4, Canada

Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).

Abstract

The Conditional Restricted Boltzmann Machine (CRBM) is a recently proposed model for time series that has a rich, distributed hidden state and permits simple, exact inference. We present a new model, based on the CRBM, that preserves its most important computational properties and includes multiplicative three-way interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. We factorize the three-way weight tensor implied by the multiplicative model, reducing the number of parameters from $O(N^3)$ to $O(N^2)$. The result is an efficient, compact model whose effectiveness we demonstrate by modeling human motion. Like the CRBM, our model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve the model's ability to blend motion styles or to transition smoothly between them.

1. Introduction

Directed graphical models (or Bayes nets) have been a dominant paradigm in models of static data. Their temporal counterparts, Dynamic Bayes nets, generalize many existing models such as the Hidden Markov Model (HMM) and its various extensions. In all but the simplest directed models, inference is made difficult due to a phenomenon known as "explaining away", where observing a child node renders its parents dependent. An alternative to approximate inference in directed models is to use a special type of undirected model, the Restricted Boltzmann Machine (RBM) (Smolensky, 1987), that allows efficient, exact inference. The Restricted Boltzmann Machine has an efficient, approximate learning algorithm called contrastive divergence (CD) (Hinton, 2002).
RBMs have been used in a variety of applications (Hinton & Salakhutdinov, 2006; Salakhutdinov et al., 2007) and their properties have become better understood over the last few years (Welling et al., 2005; Carreira-Perpinan & Hinton, 2005; Salakhutdinov & Murray, 2008). The CD learning procedure has also been improved (Tieleman, 2008).

A major motivation for the use of RBMs is that they can be used as the building blocks of deep belief networks (DBN), which are learned efficiently by training greedily, layer-by-layer. DBNs have been shown to learn very good generative models of handwritten digits (Hinton et al., 2006), but they fail to model patches of natural images. This is because RBMs have difficulty in capturing the smoothness constraint in natural images: a single pixel can usually be predicted very accurately by simply interpolating its neighbours. Osindero and Hinton (2008) introduced the Semi-restricted Boltzmann Machine (SRBM) to address this concern. The constraints on the connectivity of the RBM are relaxed to allow lateral connections between the visible units in order to model the pair-wise correlations between inputs, thus allowing the hidden units to focus on modeling higher-order structure. SRBMs also permit deep networks. Each time a new level is added, the previous top layer of units is given lateral connections, so, after the layer-by-layer learning is complete, all layers except the topmost contain lateral connections between units.

SRBMs make it possible to learn deep belief nets that model image patches much better, but they still have strong limitations that can be seen by considering the overall generative model. The equilibrium sample generated at each layer influences the layer below by controlling its effective biases. The model would be much more powerful if the equilibrium sample at the higher level could control the lateral interactions at the layer below using a three-way, multiplicative relationship. Memisevic and Hinton (2007) introduced a more powerful model, the gated CRBM, which permitted such multiplicative interactions and was able to learn rich, distributed representations of image transformations.

In this paper, we explore the idea of multiplicative interactions in a different type of CRBM (Taylor et al., 2007). Instead of gating lateral interactions with hidden units, we allow a set of context variables to gate the three types of connections ("sub-models") in the CRBM shown in Fig. 1. Our modification of the CRBM architecture does not change the desirable properties related to inference and learning but makes the model context-sensitive. While our model is applicable to general time series where conditional data is available (e.g. seasonal variables for modeling rainfall occurrences, economic indicators for modeling financial instruments), we apply our work to capturing aspects of style in data captured from human motion (mocap). Taylor et al. (2007) showed that a CRBM could capture many different styles with a single set of parameters. Generation of different styles was purely based on initialization, and the model architecture did not allow control of transitions between styles nor did it permit style blending. By using style variables to gate the connections of a CRBM, we obtain a much more powerful generative model that permits controlled transitioning and blending. We demonstrate that in a conditional model, gating is superior to simply using labels to bias the hidden units, which is the technique most commonly applied to static models.

This paper is also part of a large body of work related to the separation of style and content in motion. The ability to separately specify the style (e.g. sad) and the content (e.g. walk to location A) is highly desirable for animators. Previous work has looked at applying user-specified style to an existing motion sequence (Hsu et al., 2005; Torresani et al., 2007). The drawback to these approaches is that the user must provide the content. We propose a generative model for content that adapts to stylistic controls. Recently, models based on the Gaussian Process Latent Variable Model (Lawrence, 2004) have been successfully applied to separate content and style in human motion (Wang et al., 2007).
The advantage of our approach over such methods is that our model does not need to retain the training dataset (just a few frames for initialization) and is thus suitable for low-memory devices. Furthermore, training is linear in the number of frames, and so our model can scale up to massive datasets, unlike the kernel-based methods which are cubic in the number of frames. The rich, distributed hidden state of our model means that it does not suffer from the limited representational power of HMM-based methods (e.g. Brand & Hertzmann, 2000).

2. Background

2.1. Conditional RBMs

The CRBM (Fig. 1) is a non-linear generative model for time series data that uses an undirected model with binary latent variables, $h$, connected to a collection of visible variables, $v$. The visible variables can use any distribution in the exponential family (Welling et al., 2005), but for mocap data, we use real-valued Gaussian units (Freund & Haussler, 1992). At each time step $t$, $v$ and $h$ receive directed connections from the visible variables at the last $N$ time-steps. To simplify the presentation, we will assume the data at $t-1, \ldots, t-N$ is concatenated into a "history" vector which we call $v_{<t}$. We will use $k$ to index the elements of $v_{<t}$. The model defines a joint probability distribution over $v_t$ and $h_t$, conditional on $v_{<t}$ and model parameters, $\theta$:

$$p(v_t, h_t \mid v_{<t}, \theta) = \exp\left(-E(v_t, h_t \mid v_{<t}, \theta)\right) / Z$$

$$E = \sum_i \frac{(\hat{a}_{i,t} - v_{i,t})^2}{2\sigma_i^2} - \sum_j \hat{b}_{j,t} h_{j,t} - \sum_{ij} W_{ij} \frac{v_{i,t}}{\sigma_i} h_{j,t} \qquad (1)$$

where $Z$ is a constant called the partition function which is exponentially expensive to compute exactly. The dynamic biases, $\hat{a}_{i,t} = a_i + \sum_k A_{ki} v_{k,<t}$ and $\hat{b}_{j,t} = b_j + \sum_k B_{kj} v_{k,<t}$, express the net input from the past to the visible and hidden units, respectively. As is commonly done, we set $\sigma_i = 1$. Such an architecture makes on-line inference efficient and allows us to train by minimizing contrastive divergence (for details, see Hinton, 2002). Taylor et al. (2007) applied the CRBM to synthesize novel motion and perform on-line filling in of data lost during motion capture.

Figure 1. Architecture of the CRBM (hidden layer and visible layer, shown at time steps t-2, t-1 and t).

An important feature of the CRBM is that once it is trained, we can add layers like in a Deep Belief Network (Hinton et al., 2006).
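As a concrete reading of Eq. 1 with $\sigma_i = 1$, the energy and the hidden-unit posterior can be written in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function names and the array layouts (`A` as history-by-visible, `B` as history-by-hidden, `W` as visible-by-hidden) are assumptions made for illustration. Because the energy is linear in each hidden unit, the hidden units are conditionally independent given the current frame and the history, which is why inference is exact.

```python
import numpy as np

def dynamic_biases(v_hist, a, b, A, B):
    """Dynamic biases of Eq. 1: static bias plus the net input from the history."""
    a_hat = a + v_hist @ A   # A: (n_hist, n_vis), assumed layout
    b_hat = b + v_hist @ B   # B: (n_hist, n_hid), assumed layout
    return a_hat, b_hat

def crbm_energy(v, h, v_hist, a, b, A, B, W):
    """Energy of Eq. 1 with sigma_i = 1 for every visible unit."""
    a_hat, b_hat = dynamic_biases(v_hist, a, b, A, B)
    quadratic = 0.5 * np.sum((a_hat - v) ** 2)
    hidden_bias = b_hat @ h
    interaction = v @ W @ h   # W: (n_vis, n_hid), assumed layout
    return quadratic - hidden_bias - interaction

def hidden_probs(v, v_hist, a, b, A, B, W):
    """p(h_j = 1 | v_t, v_<t): a logistic function of each hidden unit's total input."""
    _, b_hat = dynamic_biases(v_hist, a, b, A, B)
    return 1.0 / (1.0 + np.exp(-(b_hat + v @ W)))
```

The logistic form follows directly from the energy: turning a single hidden unit on changes the energy by minus its total input, so the two-state normalization gives a sigmoid.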
The previous layer CRBM is kept, and the sequence of hidden state vectors, while driven by the data, is treated as a new kind of "fully observed" data. The next level CRBM has the same architecture as the first (though it has binary visible units and we can change the number of hidden units) and is trained in the exact same way. Upper levels of the network can then model more "interesting" higher-order structure. More layers aid in capturing multiple styles of motion, and permitting transitions between these styles (see Sec. 4).

2.2. Gated Conditional Restricted Boltzmann Machines

Memisevic and Hinton (2007) introduced a way of implementing multiplicative interactions in a conditional model. The gated CRBM was developed in the context of learning transformations between image pairs. The idea is to model an observation (the output) given its previous instance (the input) (e.g. neighbouring frames of video). The gated CRBM has two equivalent views: first, as gated regression (Fig. 2a), where hidden units can blend "slices" of a transformation matrix into a linear regression, and second as modulated filters (Fig. 2b), where input units gate a set of basis functions used to reconstruct the output. In the latter view, each setting of the input units defines an RBM (which means that conditional on the input, inference and learning in a gated CRBM are tractable).

Figure 2. Two views of the Gated Boltzmann Machine: (a) gated regression, (b) modulated filters. Reproduced from (Memisevic & Hinton, 2007).

For ease of presentation, let us consider the case where all input, output, and hidden variables are binary (the extension to real-valued input and output variables is straightforward). As in Eq. 1, the gated CRBM describes a joint probability distribution through exponentiating and renormalizing an energy function. This energy function captures all possible correlations between the components of the input, $x$, the output, $v$, and the hidden variables, $h$:

$$E(v, h \mid x, \theta) = -\sum_{ijk} W_{ijk} v_i h_j x_k - \sum_i a_i v_i - \sum_{ij} c_{ij} v_i h_j - \sum_j b_j h_j \qquad (2)$$

where $a_i$, $b_j$ index the standard biases and $c_{ij}$ index the gated biases, which shift a unit conditionally. The parameters $W_{ijk}$ are the components of a three-way weight tensor. The CD weight updates for learning a gated CRBM are similar to a standard RBM.

2.3. Factoring

To model time series, we can consider the output of a gated CRBM to be the current frame of data, $v = v_t$, and the input to be the previous frame (or frames), $x = v_{<t}$. This means that the gated CRBM is a kind of autoregressive model where a transformation is composed from a set of simpler transformations. The number of possible compositions is exponential in the number of hidden units, but the componential nature of the hidden units prevents the number of parameters in the model from becoming exponential, as it would in a mixture model. Because of the three-way weight tensor, the number of parameters is cubic (assuming that the numbers of input, output and hidden units are comparable).

In many applications, including mocap, strong underlying regularities in the data suggest that structure can be captured using three-way, multiplicative interactions but with less than the cubically many parameters implied by the weight tensor. This motivates us to factorize the interaction tensor into a product of pairwise interactions. If we apply the factorization to Eq. 2, the first term becomes $-\sum_f \sum_{ijk} W^{v}_{if} W^{h}_{jf} W^{x}_{kf} v_i h_j x_k$, where $f$ indexes a set of deterministic factors. Superscripts differentiate the three types of pairwise interactions: $W^{v}_{if}$ connect output units to factors (undirected), $W^{h}_{jf}$ connect hidden units to factors (undirected), and $W^{x}_{kf}$ connect input units to factors (directed). If the number of factors is comparable to the number of other units, this reduces the number of parameters from $O(N^3)$ to $O(N^2)$. Although factoring has been motivated by the introduction of multiplicative interactions, models that only involve pairwise interaction can also be factored.

3. A Style-Gated, Factored Model

We now consider modeling multiple styles of human motion using factored, multiplicative, three-way interactions. Hinton et al. (2006) showed that a good generative model of handwritten digits could be built by connecting a series of softmax label units to the topmost hidden layer of a DBN (Fig. 3a).
Clamping a label changed the energy landscape of the autoassociative model formed by the top two layers, such that performing alternating Gibbs sampling would produce a joint sample compatible with a particular digit class. It is easy to extend this modification to the CRBM, where discrete style labels bias the hidden units. In a CRBM, however, the hidden units also condition on information from the past that is much stronger than the information coming from the label (Fig. 3b). The model has learned to respect consistency of styles between frames and so will resist a transition introduced by changing the label units.

Figure 3. a) In a deep belief network, clamping the label units changes the energy function. b) In a conditional model, label information is swamped by the signal coming from the past.

Figure 4. A factored CRBM whose interactions are gated by real-valued stylistic features.

As in the gated CRBM, we are motivated to let style change the interactions of the units as opposed to simply their effective biases. Memisevic (2008) used factored three-way interactions to allow the hidden units of a gated CRBM to control the effect of one video frame on the subsequent video frame. Figure 4 shows a different way of using factored three-way interactions to allow real-valued style features, derived from discrete style labels, to control three different sets of pairwise interactions. Like the standard CRBM (Eq. 1), the model defines a joint probability distribution over $v_t$ and $h_t$, conditional on the past $N$ observations, $v_{<t}$, and model parameters, $\theta$. However, the distribution is also conditional on the style labels, $y_t$. Similar to our discussion of the CRBM, we assume binary stochastic hidden units and real-valued visible units with additive, Gaussian noise. For notational ease, we assume $\sigma_i = 1$. The energy function is:

$$E(v_t, h_t \mid v_{<t}, y_t, \theta) = \frac{1}{2} \sum_i (\hat{a}_{i,t} - v_{i,t})^2 - \sum_f \sum_{ijl} W^{v}_{if} W^{h}_{jf} W^{z}_{lf} v_{i,t} h_{j,t} z_{l,t} - \sum_j \hat{b}_{j,t} h_{j,t}. \qquad (3)$$

The three terms in Eq. 3 correspond to the three sub-models (coloured blue, red, and green, respectively in Fig. 4). For each sub-model, what was a matrix of weights is now replaced by three sets of weights connecting units to factors. The types of weights are differentiated again by superscripts. For example, the matrix of undirected weights in the standard CRBM, $W_{ij}$, has been replaced by three matrices involved in a factorized, multiplicative interaction: $W^{v}_{if}$, $W^{h}_{jf}$, and $W^{z}_{lf}$. The same process is applied to the other two sub-models.
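The factorization in Eq. 3 can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' code: the array names (`Wv`, `Wh`, `Wz`), their units-by-factors layout, and the helper names are hypothetical. The sketch evaluates the three-way term through the factors, builds the full tensor that the factors imply, and counts parameters for both forms.

```python
import numpy as np

def factored_interaction(v, h, z, Wv, Wh, Wz):
    """Three-way term of Eq. 3 (without its minus sign), computed via the factors.

    Each factor f multiplies the total input it receives from the visible,
    hidden, and feature units; the results are summed over factors.
    """
    return np.sum((v @ Wv) * (h @ Wh) * (z @ Wz))

def implied_tensor(Wv, Wh, Wz):
    """Full tensor W[i, j, l] = sum_f Wv[i, f] * Wh[j, f] * Wz[l, f]."""
    return np.einsum("if,jf,lf->ijl", Wv, Wh, Wz)

def parameter_counts(n_vis, n_hid, n_feat, n_fac):
    """Number of weights in the unfactored tensor versus the three factor matrices."""
    full = n_vis * n_hid * n_feat
    factored = (n_vis + n_hid + n_feat) * n_fac
    return full, factored
```

With 100 units of each type and 100 factors, `parameter_counts(100, 100, 100, 100)` gives 1,000,000 weights for the full tensor against 30,000 for the factored form. The factored computation never materializes the tensor, which is the practical content of the reduction from $O(N^3)$ to $O(N^2)$.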
Note that the three sub-models may have a different number of factors (which we index by $f$, $m$, and $n$). The dynamic biases become:

$$\hat{a}_{i,t} = a_i + \sum_m \sum_{kl} A^{v}_{im} A^{v_{<t}}_{km} A^{z}_{lm} v_{k,<t} z_{l,t}, \qquad (4)$$

$$\hat{b}_{j,t} = b_j + \sum_n \sum_{kl} B^{h}_{jn} B^{v_{<t}}_{kn} B^{z}_{ln} v_{k,<t} z_{l,t} \qquad (5)$$

where the dynamic component of Eq. 4 and Eq. 5 is simply the total input to the visible/hidden unit via the factors. The total input is a three-way product between the input to the factors (coming from the past and from the style features) and the weight from the factors to the visible/hidden unit. The dynamic biases include a static component, $a_i$ and $b_j$. As in the gated CRBM, we could also add three types of gated biases, corresponding to the pairwise interactions in each of the sub-models. In our experiments, we have not used any gated biases.

The deterministic features, $z_t$, are a linear function of the one-hot encoded style labels, $y_t$:

$$z_{l,t} = \sum_p R_{pl} y_{p,t}. \qquad (6)$$

As with other models based on RBMs, the existence of the partition function means that maximum likelihood learning is intractable. Nonetheless, it is easy to compute a good approximation to the gradient of an alternative objective function called the contrastive divergence, which leads to a set of very simple gradient update rules. The updates for all $W$, $A$, and $B$ parameters take the form:

$$\Delta X_{qr} \propto \sum_t \left( \langle \alpha_{q,t} \beta_{r,t} \gamma_{r,t} \rangle_0 - \langle \alpha_{q,t} \beta_{r,t} \gamma_{r,t} \rangle_K \right) \qquad (7)$$

where $\alpha_{q,t}$; $q \in \{i, j, k, l\}$, is the unit connected to factor $r$; $r \in \{f, m, n\}$ by weight $X_{qr}$. Terms $\beta_{r,t}$ and $\gamma_{r,t}$ correspond to the total input that arrives at factor $r$ from the two other types of units involved in the three-way relationship. $\langle \cdot \rangle_0$ is an expectation with respect to the data distribution, and $\langle \cdot \rangle_K$ is an expectation with respect to the joint distribution obtained from starting with a training vector clamped to the visibles and performing $K$ steps of alternating Gibbs sampling (i.e. CD-K). Consider two concrete examples:

$$\Delta W^{v}_{if} \propto \sum_t \left( \Big\langle v_{i,t} \sum_j W^{h}_{jf} h_{j,t} \sum_l W^{z}_{lf} z_{l,t} \Big\rangle_0 - \Big\langle v_{i,t} \sum_j W^{h}_{jf} h_{j,t} \sum_l W^{z}_{lf} z_{l,t} \Big\rangle_K \right), \qquad (8)$$

$$\Delta A^{z}_{lm} \propto \sum_t \left( \Big\langle z_{l,t} \sum_i A^{v}_{im} v_{i,t} \sum_k A^{v_{<t}}_{km} v_{k,<t} \Big\rangle_0 - \Big\langle z_{l,t} \sum_i A^{v}_{im} v_{i,t} \sum_k A^{v_{<t}}_{km} v_{k,<t} \Big\rangle_K \right). \qquad (9)$$

The weights connecting labels to features, $R$, can simply be learned by backpropagating the gradients obtained by CD. Since these weights affect all three sub-models, their updates are more complicated. Applying the chain rule:

$$\Delta R_{pl} \propto \sum_t \left( \langle C_{l,t} y_{p,t} \rangle_0 - \langle C_{l,t} y_{p,t} \rangle_K \right),$$

$$C_{l,t} = \sum_f W^{z}_{lf} \sum_i W^{v}_{if} v_{i,t} \sum_j W^{h}_{jf} h_{j,t} + \sum_m A^{z}_{lm} \sum_i A^{v}_{im} v_{i,t} \sum_k A^{v_{<t}}_{km} v_{k,<t} + \sum_n B^{z}_{ln} \sum_j B^{h}_{jn} h_{j,t} \sum_k B^{v_{<t}}_{kn} v_{k,<t}. \qquad (10)$$

The updates for the hidden and visible biases are the same as in the standard CRBM (Taylor et al., 2007).

3.1. Parameter sharing

In addition to the massive reduction in the number of free parameters obtained by factorizing, further savings may be obtained by tying some sets of parameters together. In the fully parameterized model (Fig. 5a), there are 9 different sets (matrices) of weights but if we restrict the number of factors to be the same for each of the three sub-models, four sets of parameters are identical in dimension: the weights that originate from the inputs (past visible units), the outputs (visible units), the hidden units and the features. Any combination of the compatible parameters may be tied. Fig. 5b shows a fully-shared parameterization. This has slightly less than half the number of parameters of the fully parameterized model, assuming that the number of input, output, hidden, and feature units are comparable.

Figure 5. a) Fully parameterized model with each dot representing a different set of parameters and different colors denoting a different number of factors in each sub-model. b) Full parameter sharing.

In comparing different reduced parameterizations, tying only the feature-factor parameters, $W^{z}_{lf}$, $A^{z}_{lm}$, and $B^{z}_{ln}$, led to models that synthesized the highest quality motion. When sharing the autoregressive weights, $A^{v_{<t}}_{km}$ and $A^{v}_{im}$, with non-autoregressive weights, $B^{v_{<t}}_{km}$ and $W^{v}_{if}$, respectively, we found that the component of the gradient related to the autoregressive model tended to dominate the weight update early in learning. This was due to the strength of the correlation between past and present compared to hidden and present or hidden and past. Withholding the autoregressive component of the gradient for the first 100 epochs, until the hidden units were able to extract interesting structure from the data, solved this problem. In our reported experiments we trained models with only the feature-factor parameters tied.

4. Experiments

Sections 4.1-4.2 report the results of training several models with data retrieved from the CMU Graphics Lab Motion Capture Database. We extracted a series of 10 stylized walk sequences performed by subject 137. The walks were labeled as cat, chicken, dinosaur, drunk, gangly, graceful, normal, old-man, sexy and strong. We balanced the dataset by repeating the sequences 3-6 times (depending on the original length) so that our final dataset contained approximately 3000 frames of each style at 60fps. All rotations were converted to an exponential map representation. As in (Taylor et al., 2007), the root segment was expressed in a body-centred coordinate system which is invariant to ground-plane translations and rotations about the gravitational vertical. All data was scaled to have zero mean and unit variance. We refer the reader to videos of our synthesized data at http://www.cs.toronto.edu/~gwtaylor/publications/icml2009/.

4.1. Baseline: the CRBM

As a baseline, we trained two CRBM models following (Taylor et al., 2007) with the following exceptions:

- At each iteration of CD learning, we performed 10 steps of alternating Gibbs sampling (CD-10).
- We added a sparsity term to the energy function to gently encourage the hidden units, while driven by the data, to have an average activation of 0.2. This is the same kind of sparsity used in (Lee et al., 2008).
- At each iteration of CD learning, we added Gaussian noise with $\sigma = 1$ to each dimension of $v_{<t}$.

All parameters used a learning rate of $10^{-3}$, except for the autoregressive weights which used a learning rate of $10^{-5}$.

4.1.1. 1-LAYER MODEL

A single-layer CRBM with 1200 hidden units and $N = 12$ was trained on the 10-style data for 200 epochs with the parameters being updated after every 100 training cases. Each training case was a window of 13 consecutive frames and the order of the training cases was randomly permuted. In addition to the real-valued mocap data, the hidden units received input from a one-hot encoding of the matching style label. Respecting the conditional nature of our application (generation of stylized motion, as opposed to, say, classification) this label was not reconstructed during learning.
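The windowing described above (an order-12 history plus the current frame, with training cases randomly permuted) could be implemented roughly as follows. This is a sketch under assumptions the paper does not state: `frames` is a single continuous array of shape (T, D), the history is flattened oldest-frame-first, and the function name is hypothetical. Windows that would straddle two different sequences would have to be excluded separately.

```python
import numpy as np

def make_training_windows(frames, order, rng):
    """Cut a (T, D) sequence into shuffled windows of `order` + 1 consecutive frames.

    Returns the flattened histories, shape (T - order, order * D), and the
    frames that follow them, shape (T - order, D), in the same shuffled order.
    """
    n_windows = frames.shape[0] - order
    starts = rng.permutation(n_windows)  # random order of training cases
    history = np.stack([frames[s:s + order].ravel() for s in starts])
    target = np.stack([frames[s + order] for s in starts])
    return history, target
```

With `order = 12`, each case spans 13 consecutive frames, matching the window size quoted in the text; mini-batches of 100 cases can then be taken as consecutive slices of the shuffled arrays.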
After training the model, we generated motion by initializing with 12 frames of training data and holding the label units clamped to the style matching the initialization. With a single layer we can generate high-quality motion of 9/10 styles (see the supplemental videos); however, the model fails to produce good generation of the old-man style. We believe that this relates to the subtle nature of this particular motion. In examining the activity of the hidden units over time while clamped to training data, we observed that the model devotes most of its hidden capacity to capturing the more "active" styles as it pays a higher cost for failing to model more pronounced frame-to-frame changes.

4.1.2. 2-LAYER MODEL

We also learned a deeper network by first training a CRBM with 600 binary hidden units and real-valued visible units and then training a second level CRBM with 600 binary hidden and 600 binary visible units. The data for training the second level CRBM was the activations of the hidden units of the first level CRBM while driven by the training data. We added style labels to the top layer while training the second level CRBM. The first model was trained for 300 epochs, and the second level was trained for 120 epochs. After training, the 2-hidden-layer network was able to generate high-quality walks of all styles, including old-man (see the supplemental videos). The second level CRBM layer effectively replaces the prior over the first layer of hidden units, $p(h_t \mid v_{<t}, \theta)$, that is implicitly defined by the parameters of the first CRBM. This provides a better model of the subtle correlations between the features that the first level CRBM extracts from the motion.

4.2. Modeling with Discrete Style Labels

Using the same 10-styles dataset, we trained a factored CRBM with Gaussian visible units whose parameters were gated by 100 real-valued features driven by the discrete style labels (Fig. 4). This model had 600 hidden units, 200 factors per sub-model and $N = 12$. Feature-to-factor parameters were also tied between sub-models.
All parameters used a learning rate of $10^{-2}$, except for the autoregressive parameters $A^{v}_{im}$, $A^{v_{<t}}_{km}$, $A^{z}_{lm}$ and the label-to-feature parameters, $R_{pl}$, which used a learning rate of $10^{-3}$. We trained the model for 500 epochs.

After training the model, we tested its ability to synthesize realistic motion by initializing with 12 frames of training data and holding the label units clamped to the matching style. The single-layer model was able to generate stylized content as well as the 2-layer standard CRBM (see the supplemental videos). In addition, we were able to induce transitions between two or more styles by linearly blending the discrete style label from one setting to another over 200 frames.¹ We were further able to blend together styles (like sexy and strong) by applying a linear interpolation of the discrete labels. The resulting motion was more natural when a single style was dominant (e.g. a 0.8/0.2 blend). We believe this is simply a case of better performance when the desired motion more closely resembles the cases present in the training data set, so training on a few examples of blends should greatly improve their generation.

¹ The number of frames was selected empirically and provided a smooth transition, but the model is not sensitive to this number. A quick (e.g. frame-to-frame) change of labels will simply produce a "jerky" transition.
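The label manipulations described above amount to simple arithmetic on the one-hot vectors. The following sketch is illustrative only: the function names, the styles-by-features layout of `R`, and the style indices used in the usage note are assumptions, not details taken from the paper.

```python
import numpy as np

def one_hot(index, n_styles):
    """One-hot style label y with a single 1 at position `index`."""
    y = np.zeros(n_styles)
    y[index] = 1.0
    return y

def transition_labels(y_start, y_end, n_frames):
    """Linearly move from y_start to y_end, one label vector per frame."""
    w = np.linspace(0.0, 1.0, n_frames)[:, None]  # blend weight per frame
    return (1.0 - w) * y_start + w * y_end

def style_features(y, R):
    """Eq. 6: features z as a linear function of the labels; R is (n_styles, n_features)."""
    return y @ R
```

For example, `transition_labels(one_hot(0, 10), one_hot(3, 10), 200)` yields one label vector per frame for a 200-frame transition, and `0.8 * one_hot(8, 10) + 0.2 * one_hot(9, 10)` gives a 0.8/0.2 blend. Either can be passed through `style_features`, and because Eq. 6 is linear, blending labels is equivalent to blending the corresponding feature vectors.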

4.3. Modeling with Real-valued Style Parameters

The motions considered thus far have been described by a single, discrete label such as "gangly" or "drunk". Motion style, however, can be characterized by multiple discrete labels or even continuous factors such as the level of flow, weight, time and space formally defined in Laban Movement Analysis (Torresani et al., 2007). In the case of multiple discrete labels, our real-valued feature units, $z$, can receive input from multiple categories of labels. For continuous factors of style, we can connect real-valued style units to the real-valued feature units, or we can simply gate the model directly by the continuous description of style.

To test this hypothesis, we trained a model exactly as in Sec. 4.2, but instead of gating connections with 100 real-valued feature units, we gated with 2 real-valued style descriptors that were conditioned upon at every frame. Again we trained with walking data, but the data was captured specifically for this experiment. One style unit represented the speed of walking and the other, the stride length. The training data consisted of nine sequences at 60fps, each approximately 6000 frames, corresponding to the cross-product of (slow, normal, fast) speed and (short, normal, long) stride length. The corresponding labels each had values of 1, 2 or 3. These values were chosen to avoid the special case of all gating units being set at zero and nullifying the effective weights of the model.

After training for 500 epochs, the model could, as before, generate realistic motion according to the nine discrete combinations of speed and stride length with which it was trained, based on initialization and setting the label units to match the labels in the training set. Furthermore, the model supported both interpolation and extrapolation along the speed and stride length axes and did not appear overly sensitive to initialization (see the supplemental videos).

4.4. Quantitative Evaluation

In our experiments so far, we have sought a qualitative comparison to the CRBM, based on the realism of synthesized motion.
We have also focused on the ability of a factored model with multiplicative interactions to synthesize transitions as well as interpolate and extrapolate between styles present in the training data set. The application does not naturally present a quantitative comparison, but in the past, other time series models have been compared by their performance on the prediction of either full or partial held-out frames (Taylor et al., 2007; Lawrence, 2007). We use the dataset first proposed by Hsu et al. (2005), which consists of labeled sequences of seven types of walking (crouch, jog, limp, normal, side-right, sway, waddle), each at three different speeds (slow, medium, fast). We preprocessed the data to remove missing or extremely noisy sections, and smoothed with a low-pass filter before downsampling from 120 to 30fps. For each architecture (unfactored/factored CRBM, and style-gated unfactored/factored CRBM) we trained 21 different models on all style and speed pairs except one, which we held out for testing. Then, for each model, we attempted to predict every subsequence of length $M$ in the test set, given the past $N = 6$ frames. We repeated the experiments for each architecture, each time reporting results averaged over the 21 models. Prediction could be performed by initializing with the previous frame and Gibbs sampling in the same way we generated, but this approach is subject to noise. We found that in all cases, integrating out the hidden units and following the gradient of the negative free energy (the log probability of an observation plus $\log Z$) with respect to $v_t$ gave less prediction error. Details of how to compute the free energy by marginalizing out the binary hidden units can be found in Freund & Haussler (1992). The architectures were subject to different learning rates, and so the number of epochs for which to train each model was determined by setting aside 10% of the training set for validation. Fig. 6 presents the results. With almost half the number of free parameters, the 600-60 factored model performed as well as the fully parameterized CRBM.
Gating with style information gives an advantage in longer-term prediction because it prevents the model from gradually changing the style. The unfactored model with style information performed slightly worse than the factored model and was extremely slow to train (it took two days to train whereas the other models were each trained in a few hours).

[Figure 6: plot of RMS prediction error (normalized data space) against predict-ahead interval (0 to 20) for four models: CRBM-600 (201,795), FCRBM-600-60 (110,445), style-FCRBM-600-60 (110,795) and unfac-style-CRBM-100 (218,445).]

Figure 6. Prediction experiment. The number of free parameters is shown in parentheses. Error is reported in the normalized space in which the models are trained and is per-dimension, per-frame.
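The gradient-based prediction used in this section can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: visible units are Gaussian with unit variance, `a` and `b` stand in for the dynamic visible and hidden biases already computed from the past frames (and, in a gated model, `W` would be the style-dependent effective weights), and the step size and iteration count are arbitrary. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, a, b):
    """F(v) with binary hiddens summed out (unit-variance Gaussian visibles).

    W: (n_vis, n_hid) weights; a: (n_vis,) and b: (n_hid,) biases.
    """
    # log(1 + exp(x)) computed stably as logaddexp(0, x).
    return 0.5 * np.sum((v - a) ** 2) - np.sum(np.logaddexp(0.0, b + v @ W))

def free_energy_grad(v, W, a, b):
    """Gradient of F with respect to the visible vector v."""
    return (v - a) - W @ sigmoid(b + v @ W)

def predict_frame(v_init, W, a, b, step=0.05, n_steps=200):
    """Follow the gradient of the negative free energy, starting at v_init."""
    v = v_init.copy()
    for _ in range(n_steps):
        v -= step * free_energy_grad(v, W, a, b)  # descent on F
    return v

def rms_error(pred, target):
    """RMS error averaged over every dimension and frame."""
    return np.sqrt(np.mean((pred - target) ** 2))

# Toy usage with random parameters (illustrative values only).
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((5, 4))
a = rng.standard_normal(5)
b = rng.standard_normal(4)
v_prev = rng.standard_normal(5)   # previous frame used as the starting point
v_hat = predict_frame(v_prev, W, a, b)
```

Unlike Gibbs sampling, this procedure is deterministic given the starting frame, which is consistent with the lower prediction noise reported above. Predicting several frames ahead would repeat the step, recomputing the biases from the newly predicted frames each time.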

5. Conclusion

Restricted Boltzmann Machines have several attractive computational properties which carry through to the deeper architectures of which they form the core. From a generative model standpoint, however, these deep networks have a deficiency. Regardless of whether or not the layers below contain lateral interactions, sampling the higher layers can only determine the effective biases of the layer below. The gated CRBM is a model in which the hidden units influence the lateral interactions of the layer below, providing an exponential number of (non-independent) possible models at a cost that is cubic in the number of parameters. If we only let contextual information (like style) determine the effective hidden biases of a CRBM, the signal is swamped by the information coming from the past. However, if we allow for a three-way, multiplicative relationship as in the gated CRBM, context becomes a natural part of the model, determining the effective weights. The potential blow-up in the number of parameters implied by such a model is solved by factorizing the three-way tensors.

When modeling human motion, our approach permits style to change the effective weights of the network via discrete or real-valued representations. Changing these style-based factors during generation can induce natural-looking transitions and permit interpolation and extrapolation of styles in the training data.

In our experiments we always conditioned on style, and assumed that our training data had been labeled a priori. This added a supervised flavour to our otherwise unsupervised models. We believe the more interesting problem is the fully unsupervised setting, including the discovery of style in the layers of binary features. Future work will continue to focus on higher-order interactions (perhaps beyond third) but where style-based descriptors are inferred rather than provided.

References

Brand, M., & Hertzmann, A. (2000). Style machines. Proc. SIGGRAPH (pp. 183-192).
Carreira-Perpinan, M., & Hinton, G. (2005). On contrastive divergence learning. Proc. AISTATS (pp. 59-66).
Freund, Y., & Haussler, D. (1992). Unsupervised learning of distributions of binary vectors using 2-layer networks. Proc. NIPS 4 (pp. 912-919).
Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Comput., 14, 1771-1800.
Hinton, G., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Comput., 18, 1527-1554.
Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504-507.
Hsu, E., Pulli, K., & Popović, J. (2005). Style translation for human motion. Proc. SIGGRAPH (pp. 1082-1089).
Lawrence, N. (2004). Gaussian process latent variable models for visualisation of high dimensional data. Proc. NIPS 16 (pp. 329-336).
Lawrence, N. (2007). Learning for larger datasets with the Gaussian process latent variable model. Proc. AISTATS.
Lee, H., Ekanadham, C., & Ng, A. (2008). Sparse deep belief net model for visual area V2. Proc. NIPS 20.
Memisevic, R. (2008). Non-linear latent factor models for revealing structure in high-dimensional data. Doctoral dissertation, University of Toronto.
Memisevic, R., & Hinton, G. (2007). Unsupervised learning of image transformations. Proc. CVPR.
Osindero, S., & Hinton, G. (2008). Modeling image patches with a directed hierarchy of Markov random fields. Proc. NIPS 20 (pp. 1121-1128).
Salakhutdinov, R., Mnih, A., & Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. Proc. ICML (pp. 791-798).
Salakhutdinov, R., & Murray, I. (2008). On the quantitative analysis of deep belief networks. Proc. ICML (pp. 872-879).
Smolensky, P. (1987). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart, J. L. McClelland et al. (Eds.), Parallel distributed processing: Volume 1: Foundations, 194-281. Cambridge: MIT Press.
Taylor, G., Hinton, G., & Roweis, S. (2007). Modeling human motion using binary latent variables. Proc. NIPS 19 (pp. 1345-1352).
Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. Proc. ICML (pp. 1064-1071).
Torresani, L., Hackney, P., & Bregler, C. (2007). Learning motion style synthesis from perceptual observations. Proc. NIPS 19 (pp. 1393-1400).
Wang, J., Fleet, D., & Hertzmann, A. (2007). Multifactor Gaussian process models for style-content separation. Proc. ICML (pp. 975-982).
Welling, M., Rosen-Zvi, M., & Hinton, G. (2005). Exponential family harmoniums with an application to information retrieval. Proc. NIPS 17 (pp. 1481-1488).