Online Multi-Task Learning for Policy Gradient Methods


Haitham Bou Ammar, Eric Eaton
University of Pennsylvania, Computer and Information Science Department, Philadelphia, PA, USA
Paul Ruvolo
Olin College of Engineering, Needham, MA, USA
Matthew E. Taylor
Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA, USA

Abstract

Policy gradient algorithms have shown considerable recent success in solving high-dimensional sequential decision making tasks, particularly in robotics. However, these methods often require extensive experience in a domain to achieve high performance. To make agents more sample-efficient, we developed a multi-task policy gradient method to learn decision making tasks consecutively, transferring knowledge between tasks to accelerate learning. Our approach provides robust theoretical guarantees, and we show empirically that it dramatically accelerates learning on a variety of dynamical systems, including an application to quadrotor control.

1 Introduction

Sequential decision making (SDM) is an essential component of autonomous systems. Although significant progress has been made on developing algorithms for learning isolated SDM tasks, these algorithms often require a large amount of experience before achieving acceptable performance. This is particularly true in the case of high-dimensional SDM tasks that arise in robot control problems. The cost of this experience can be prohibitively expensive (in terms of both time and fatigue of the robot's components), especially in scenarios where an agent will face multiple tasks and must be able to quickly acquire control policies for each new task. Another failure mode of conventional methods is that when the production environment differs significantly from the training environment, previously learned policies may no longer be correct.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the authors.

When data is in limited supply, learning task models jointly through
multi-task learning (MTL) rather than independently can significantly improve model performance (Thrun & O'Sullivan, 1996; Zhang et al., 2008; Rai & Daumé, 2010; Kumar & Daumé, 2012). However, MTL's performance gain comes at a high computational cost when learning new tasks or when updating previously learned models. Recent work (Ruvolo & Eaton, 2013) in the supervised setting has shown that nearly identical performance to batch MTL can be achieved in online learning with large computational speedups. Building upon this work, we introduce an online MTL approach to learn a sequence of SDM tasks with low computational overhead. Specifically, we develop an online MTL formulation of policy gradient reinforcement learning that enables an autonomous agent to accumulate knowledge over its lifetime and efficiently share this knowledge between SDM tasks to accelerate learning. We call this approach the Policy Gradient Efficient Lifelong Learning Algorithm (PG-ELLA), the first (to our knowledge) online MTL policy gradient method. Instead of learning a control policy for an SDM task from scratch, as in standard policy gradient methods, our approach rapidly learns a high-performance control policy based on the agent's previously learned knowledge. Knowledge is shared between SDM tasks via a latent basis that captures reusable components of the learned policies. The latent basis is then updated with newly acquired knowledge, enabling (a) accelerated learning of new task models and (b) improvement in the performance of existing models without retraining on their respective tasks. The latter capability is especially important in ensuring that the agent can accumulate knowledge over its lifetime across numerous tasks without exhibiting negative transfer. We show that this process is highly efficient with robust theoretical guarantees. We evaluate PG-ELLA on four dynamical systems, including an application to quadrotor control, and show that PG-ELLA outperforms standard policy gradients both in the initial and final performance.

2 Related Work in Multi-Task RL

Due to its empirical success, there is a growing body of work on transfer learning approaches to reinforcement learning (RL) (Taylor & Stone, 2009). By contrast, relatively few methods for multi-task RL have been proposed. One class of algorithms for multi-task RL uses nonparametric Bayesian models to share knowledge between tasks. For instance, Wilson et al. (2007) developed a hierarchical Bayesian approach that models the distribution over Markov decision processes (MDPs) and uses this distribution as a prior for learning each new task, enabling it to learn tasks consecutively. In contrast to our work, Wilson et al. focused on environments with discrete states and actions. Additionally, their method requires the ability to compute an optimal policy given an MDP. This process can be expensive for even moderately large discrete environments, and is computationally intractable for the types of continuous, high-dimensional control problems considered here. Another example is by Li et al. (2009), who developed a model-free multi-task RL method for partially observable environments. Unlike our problem setting, their method focuses on off-policy batch MTL. Finally, Lazaric & Ghavamzadeh (2010) exploit shared structure in the value functions between related MDPs. However, their approach is designed for on-policy multi-task policy evaluation, rather than computing optimal policies.

A second approach to multi-task RL is based on Policy Reuse (Fernández & Veloso, 2013), in which policies from previously learned tasks are probabilistically reused to bias the learning of new tasks. One drawback of Policy Reuse is that it requires that tasks share common states, actions, and transition functions (but allows different reward functions), while our approach only requires that tasks share a common state and action space. This restriction precludes the application of Policy Reuse to the scenarios considered in Section 7, where the systems have related but not identical transition
functions. Also, in contrast to PG-ELLA, Policy Reuse does not support reverse transfer, where subsequent learning improves previously learned policies.

Perhaps the approach most similar to ours is by Deisenroth et al. (2014), which uses policy gradients to learn a single controller that is optimal on average over all training tasks. By appropriately parameterizing the policy, the controller can be customized to particular tasks. However, this method requires that tasks differ only in their reward function, and thus is inapplicable to our experimental scenarios.

3 Problem Framework

We first describe our framework for policy gradient RL and lifelong learning. The next section uses this framework to present our approach to online MTL for policy gradients.

3.1 Policy Gradient Reinforcement Learning

We frame each SDM task as an RL problem, in which an agent must sequentially select actions to maximize its expected return. Such problems are typically formalized as a Markov decision process (MDP) ⟨X, A, P, R, γ⟩, where X ⊆ R^d is the (potentially infinite) set of states, A ⊆ R^m is the set of possible actions, P : X × A × X → [0, 1] is a state transition probability function describing the system's dynamics, R : X × A → R is the reward function measuring the agent's performance, and γ ∈ [0, 1) specifies the degree to which rewards are discounted over time. At each time step h, the agent is in state x_h ∈ X and must choose an action a_h ∈ A, transitioning it to a new state x_{h+1} ~ p(x_{h+1} | x_h, a_h) as given by P and yielding reward r_{h+1} = R(x_h, a_h). A policy π : X × A → [0, 1] is defined as a probability distribution over state-action pairs, where π(a | x) represents the probability of selecting action a in state x. The goal of an RL agent is to find an optimal policy π* that maximizes the expected return. The sequence of state-action pairs forms a trajectory τ = [x_{0:H}, a_{0:H}] over a (possibly infinite) horizon H.

Policy gradient methods (Sutton et al., 1999; Peters & Schaal, 2008; Peters & Bagnell, 2010) have shown success in solving high-dimensional problems, such as robotic control (Peters & Schaal,
2007). These methods represent the policy π_θ(a | x) using a vector θ ∈ R^d of control parameters. The goal is to determine the optimal parameters θ* that maximize the expected average return:

    J(θ) = ∫_T p_θ(τ) R(τ) dτ ,   (1)

where T is the set of all possible trajectories. The trajectory distribution p_θ(τ) and average per-time-step return R(τ) are defined as:

    p_θ(τ) = P_0(x_0) ∏_{h=0}^{H-1} p(x_{h+1} | x_h, a_h) π_θ(a_h | x_h)
    R(τ) = (1/H) ∑_{h=0}^{H-1} r_{h+1} ,

with an initial state distribution P_0 : X → [0, 1]. Most policy gradient algorithms, such as episodic REINFORCE (Williams, 1992), PoWER (Kober & Peters, 2011), and Natural Actor-Critic (Peters & Schaal, 2008), employ supervised function approximators to learn the control parameters θ by maximizing a lower bound on the expected return of J(θ) (Eq. 1). To achieve this, one generates trajectories using the current policy π_θ, and then compares the result with a new policy parameterized by θ̃. As described by Kober & Peters (2011), the lower bound on the expected return can be attained using Jensen's inequality

and the concavity of the logarithm:

    log J(θ̃) = log ∫_T p_θ̃(τ) R(τ) dτ
              = log ∫_T (p_θ(τ) / p_θ(τ)) p_θ̃(τ) R(τ) dτ
              ≥ ∫_T p_θ(τ) R(τ) log (p_θ̃(τ) / p_θ(τ)) dτ + constant
              ∝ −D_KL( p_θ(τ) R(τ) ‖ p_θ̃(τ) ) = J_{L,θ}(θ̃) ,

where D_KL( p(τ) ‖ q(τ) ) = ∫_T p(τ) log (p(τ) / q(τ)) dτ. We see that this is equivalent to minimizing the KL divergence between the reward-weighted trajectory distribution of π_θ and the trajectory distribution p_θ̃ of the new policy π_θ̃.

3.2 The Lifelong Learning Problem

In contrast to most previous work on policy gradients, which focuses on single-task learning, this paper focuses on the online MTL setting in which the agent is required to learn a series of SDM tasks Z^(1), ..., Z^(T_max) over its lifetime. Each task t is an MDP Z^(t) = ⟨X^(t), A^(t), P^(t), R^(t), γ^(t)⟩ with initial state distribution P_0^(t). The agent will learn the tasks consecutively, acquiring multiple trajectories within each task before moving to the next. The tasks may be interleaved, providing the agent the opportunity to revisit earlier tasks for further experience, but the agent has no control over the task order. We assume that a priori the agent does not know the total number of tasks T_max, their distribution, or the task order.

The agent's goal is to learn a set of optimal policies Π* = { π*_{θ^(1)}, ..., π*_{θ^(T_max)} } with corresponding parameters Θ* = { θ*^(1), ..., θ*^(T_max) }. At any time, the agent may be evaluated on any previously seen task, and so must strive to optimize its learned policies for all tasks Z^(1), ..., Z^(T), where T denotes the number of tasks seen so far (1 ≤ T ≤ T_max).

4 Online MTL for Policy Gradient Methods

This section develops the Policy Gradient Efficient Lifelong Learning Algorithm (PG-ELLA).

4.1 Learning Objective

To share knowledge between tasks, we assume that each task's control parameters can be modeled as a linear combination of latent components from a shared knowledge base. A number of supervised MTL algorithms (Kumar & Daumé, 2012; Ruvolo & Eaton, 2013; Maurer et al., 2013) have shown this approach to be successful. Our approach incorporates the use of a shared latent basis into policy gradient learning to enable
transfer between SDM tasks. PG-ELLA maintains a library of k latent components L ∈ R^{d×k} that is shared among all tasks, forming a basis for the control policies. We can then represent each task's control parameters as a linear combination of this latent basis: θ^(t) = L s^(t), where s^(t) ∈ R^k is a task-specific vector of coefficients. The task-specific coefficients s^(t) are encouraged to be sparse to ensure that each learned basis component captures a maximal reusable chunk of knowledge.

We can then represent our objective of learning stationary policies while maximizing the amount of transfer between task models by:

    e_T(L) = (1/T) ∑_{t=1}^{T} min_{s^(t)} [ −J(θ^(t)) + µ ‖s^(t)‖₁ ] + λ ‖L‖²_F ,   (2)

where θ^(t) = L s^(t), the L1 norm of s^(t) is used to approximate the true vector sparsity, and ‖·‖_F is the Frobenius norm. The form of this objective function is closely related to other supervised MTL methods (Ruvolo & Eaton, 2013; Maurer et al., 2013), with important differences through the incorporation of J(·), as we will examine shortly.

Our approach to optimizing Eq. (2) is based upon the Efficient Lifelong Learning Algorithm (ELLA) (Ruvolo & Eaton, 2013), which provides a computationally efficient method for learning L and the s^(t)'s online over multiple tasks in the case of supervised MTL. The objective solved by ELLA is closely related to Eq. (2), with the exception that the J(·) term is replaced in ELLA with a measure of each task model's average loss over the training data. Since Eq. (2) is not jointly convex in L and the s^(t)'s, most supervised MTL methods use an expensive alternating optimization procedure to train the task models simultaneously. Ruvolo & Eaton provide an efficient alternative to this procedure that can train task models consecutively, enabling Eq. (2) to be used effectively for online MTL. In the next section, we adapt this approach to the policy gradient framework, and show that the resulting algorithm provides an efficient method for learning consecutive SDM tasks.

4.2 Multi-Task Policy Gradients

Policy gradient methods maximize the lower bound of J(θ) (Eq. 1). In order to use Eq. (2) for MTL with policy
gradients, we must first incorporate this lower bound into our objective function. Rewriting the error term in Eq. (2) in terms of the lower bound yields

    e_T(L) = (1/T) ∑_{t=1}^{T} min_{s^(t)} [ −J_{L,θ}(θ̃^(t)) + µ ‖s^(t)‖₁ ] + λ ‖L‖²_F ,

where θ̃^(t) = L s^(t). However, we can note that

    J_{L,θ}(θ̃^(t)) ∝ −∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ .
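As a sanity check on this reward-weighted bound, the Jensen step in the derivation above can be verified numerically on a small discrete stand-in for the trajectory integral. The distributions and returns below are synthetic values introduced only for this illustration; they are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                         # discrete stand-in for the trajectory space
R = rng.random(n) + 0.1                       # positive per-trajectory returns R(tau)
p_old = rng.random(n); p_old /= p_old.sum()   # p_theta(tau): the sampling policy
p_new = rng.random(n); p_new /= p_new.sum()   # p_theta_tilde(tau): the candidate policy

# Exact log-return of the candidate policy: log J(theta~) = log sum p_new * R
log_J = np.log((p_new * R).sum())

# Jensen bound: with weights w = p_old * R (total mass W),
#   log J(theta~) >= log W + (1/W) * sum_i w_i * log(p_new_i / p_old_i)
w = p_old * R
W = w.sum()
lower_bound = np.log(W) + (w * np.log(p_new / p_old)).sum() / W

assert log_J >= lower_bound  # the bound holds for any candidate policy
```

Maximizing the lower bound over the candidate policy is then exactly the reward-weighted KL minimization described above, since the `log W` term is constant in the candidate.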

Therefore, maximizing the lower bound of J_{L,θ}(θ̃^(t)) is equivalent to the following minimization problem:

    min_{θ̃^(t)} ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ .

Substituting the above result with θ̃^(t) = L s^(t) into Eq. (2) leads to the following total cost function for MTL with policy gradients:

    e_T(L) = (1/T) ∑_{t=1}^{T} { min_{s^(t)} [ ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ + µ ‖s^(t)‖₁ ] } + λ ‖L‖²_F .   (3)

While Eq. (3) enables batch MTL using policy gradients, it is computationally expensive due to two inefficiencies that make it inappropriate for online MTL: (a) the explicit dependence on all available trajectories through J(θ^(t)) = ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) dτ, and (b) the exhaustive evaluation of a single candidate L that requires the optimization of all s^(t)'s through the outer summation. Together, these aspects cause Eq. (3) (and similarly Eq. 2) to have a computational cost that depends on the total number of trajectories and the total number of tasks, complicating its direct use in the lifelong learning setting. We next describe methods for resolving each of these inefficiencies while minimizing Eq. (3), yielding PG-ELLA as an efficient method for multi-task policy gradient learning. In fact, we show that the complexity of PG-ELLA in learning a single task policy is independent of (a) the number of tasks seen so far and (b) the number of trajectories for all other tasks, allowing our approach to be highly efficient.

4.2.1 Eliminating Dependence on Other Tasks

As mentioned above, one of the inefficiencies in minimizing e_T(L) is its dependence on all available trajectories for all tasks. To remedy this problem, as in ELLA, we approximate e_T(L) by performing a second-order Taylor expansion of J_{L,θ}(θ̃^(t)) around the optimal solution:

    α^(t) = arg min_{θ̃^(t)} ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ .

As shown by Ruvolo & Eaton (2013), the second-order Taylor expansion
can be substituted into the MTL objective function to provide a point estimate around the optimal solution, eliminating the dependence on other tasks. To compute the second-order Taylor representation, the first and second derivatives of J_{L,θ}(θ̃^(t)) with respect to θ̃^(t) are required. Noting that

    log p_{θ̃^(t)}(τ) = log P_0^(t)(x_0^(t)) + ∑_{h=0}^{H_t−1} log p^(t)(x_{h+1}^(t) | x_h^(t), a_h^(t)) + ∑_{h=0}^{H_t−1} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) ,

the first derivative, ∇_{θ̃^(t)} J_{L,θ}(θ̃^(t)), is given by:

    ∇_{θ̃^(t)} J_{L,θ}(θ̃^(t)) = ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) ∑_{h=1}^{H_t} ∇_{θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) dτ
                              = E[ R^(t)(τ) ∑_{h=1}^{H_t} ∇_{θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) ] .

Policy gradient algorithms determine α^(t) = θ^(t) by following the above gradient. The second derivative of J_{L,θ}(θ̃^(t)) can be computed similarly to produce:

    ∇²_{θ̃^(t),θ̃^(t)} J_{L,θ}(θ̃^(t)) = ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) ∑_{h=1}^{H_t} ∇²_{θ̃^(t),θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) dτ .

We let Γ^(t) represent the Hessian evaluated at α^(t):

    Γ^(t) = E[ R^(t)(τ) ∑_{h=1}^{H_t} ∇²_{θ̃^(t),θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) ] evaluated at θ̃^(t) = α^(t) .

Substituting the second-order Taylor approximation into Eq. (3) yields the following:

    ê_T(L) = (1/T) ∑_{t=1}^{T} min_{s^(t)} [ ‖α^(t) − L s^(t)‖²_{Γ^(t)} + µ ‖s^(t)‖₁ ] + λ ‖L‖²_F ,   (4)

where ‖v‖²_A = vᵀ A v, the constant term was suppressed since it has no effect on the minimization, and the linear term was ignored since (by construction) α^(t) is a minimizer. Most importantly, the dependence on all available trajectories has been eliminated, remedying the first inefficiency.
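As an illustration of these two estimators, the following sketch forms Monte Carlo estimates of the gradient and of Γ^(t) for the Gaussian linear policy later described in Section 5.1, for which ∇ log π = (a − θᵀx) x / σ² and the Hessian term reduces to x xᵀ / σ². The dynamics, reward, and all constants below are toy assumptions introduced only for this example:

```python
import numpy as np

rng = np.random.default_rng(0)
d, H, sigma = 3, 10, 0.5                # state dim, horizon, policy noise (toy values)

def rollout(theta):
    """Sample one trajectory under the linear Gaussian policy a = theta^T x + eps."""
    x = rng.standard_normal(d)
    states, actions, ret = [], [], 0.0
    for _ in range(H):
        a = theta @ x + sigma * rng.standard_normal()
        states.append(x); actions.append(a)
        ret += np.exp(-x @ x)                 # toy positive reward, highest near origin
        x = 0.9 * x + 0.1 * a * np.ones(d)    # assumed toy linear dynamics
    return states, actions, ret / H           # average per-time-step return R(tau)

def estimate_grad_and_hessian(theta, n_traj=200):
    """Monte Carlo estimates of the policy gradient and of Gamma, both formed as
    expectations of R(tau) times per-step log-policy derivative terms."""
    grad, Gamma = np.zeros(d), np.zeros((d, d))
    for _ in range(n_traj):
        states, actions, R = rollout(theta)
        g = sum((a - theta @ x) / sigma**2 * x for x, a in zip(states, actions))
        grad += R * g
        Gamma += R * sum(np.outer(x, x) for x in states) / sigma**2
    return grad / n_traj, Gamma / n_traj

grad, Gamma = estimate_grad_and_hessian(np.zeros(d))
```

In PG-ELLA, `grad` would drive the base policy gradient learner toward α^(t), and `Gamma` would weight the quadratic term ‖α^(t) − L s^(t)‖²_{Γ^(t)} in Eq. (4).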

4.2.2 Computing the Latent Space

The second inefficiency in Eq. (3) arises from the procedure used to compute the objective function for a single candidate L. Namely, to determine how effectively a given value of L serves as a common basis for all learned tasks, an optimization problem must be solved to recompute each of the s^(t)'s, which becomes increasingly expensive as T grows large. To remedy this problem, we modify Eq. (3) (or equivalently, Eq. 4) to eliminate the minimization over all s^(t)'s. Following the approach used in ELLA, we optimize each task-specific projection s^(t) only when training on task t, without updating it when training on other tasks. Consequently, any changes to θ^(t) when learning on other tasks will only be through updates to the shared basis L. As shown by Ruvolo & Eaton (2013), this choice to update s^(t) only when training on task t does not significantly affect the quality of the model fit as T grows large. With this simplification, we can rewrite Eq. (4) in terms of two update equations:

    s^(t) ← arg min_s ℓ(L_m, s, α^(t), Γ^(t))   (5)
    L_{m+1} ← arg min_L (1/T) ∑_{t=1}^{T} ℓ(L, s^(t), α^(t), Γ^(t)) + λ ‖L‖²_F ,   (6)

where L_m refers to the value of the latent basis at the start of the m-th training session, t corresponds to the particular task for which data was just received, and

    ℓ(L, s, α, Γ) = µ ‖s‖₁ + ‖α − L s‖²_Γ .

To compute L_{m+1}, we null the gradient of Eq. (6) and solve the resulting equation to yield the updated column-wise vectorization of L as A⁻¹ b, where:

    A = λ I_{kd,kd} + (1/T) ∑_{t=1}^{T} ( s^(t) s^(t)ᵀ ) ⊗ Γ^(t)
    b = (1/T) ∑_{t=1}^{T} vec( s^(t)ᵀ ⊗ ( α^(t)ᵀ Γ^(t) ) ) .

For efficiency, we can compute A and b incrementally as new tasks arrive, avoiding the need to sum over all tasks.

4.3 Data Generation & Model Update

Using the incremental form (Eqs. 5-6) of the policy gradient MTL objective function (Eq. 3), we can now construct an online MTL algorithm that can operate in a lifelong learning setting. In typical policy gradient methods, trajectories are generated in batch mode by first initializing the policy and sampling trajectories from the system (Kober & Peters, 2011; Peters & Bagnell, 2010).
Given these trajectories, the policy parameters are updated, new trajectories are sampled from the system using the updated policy, and the procedure is then repeated. In this work, we adopt a slightly modified version of policy gradients to operate in the lifelong learning setting. The first time a new task is observed, we use a random policy for sampling; each subsequent time the task is observed, we sample trajectories using the previously learned α^(t). Additionally, instead of looping until the policy parameters have converged, we perform only one run over the trajectories.

Upon receiving data for a specific task t, PG-ELLA performs two steps to update the model: it first computes the task-specific projection s^(t), and then refines the shared latent space L. To compute s^(t), we first determine α^(t) and Γ^(t) using only data from task t. The details of this step depend on the form chosen for the policy, as described in Section 5. We can then solve the L1-regularized regression problem given in Eq. (5) (an instance of the Lasso) to yield s^(t). In the second step, we update L by first reinitializing any zero-columns of L and then following Eq. (6). The complete PG-ELLA is given as Algorithm 1.

Algorithm 1  PG-ELLA (k, λ, µ)
    T ← 0,  A ← zeros_{kd×kd},  b ← zeros_{kd×1},  L ← zeros_{d×k}
    while some task t is available do
        if isNewTask(t) then
            T ← T + 1
            (T^(t), R^(t)) ← getRandomTrajectories()
        else
            (T^(t), R^(t)) ← getTrajectories(α^(t))
            A ← A − ( s^(t) s^(t)ᵀ ) ⊗ Γ^(t)
            b ← b − vec( s^(t)ᵀ ⊗ ( α^(t)ᵀ Γ^(t) ) )
        end if
        Compute α^(t) and Γ^(t) from (T^(t), R^(t))
        L ← reinitializeAllZeroColumns(L)
        s^(t) ← arg min_s ℓ(L, s, α^(t), Γ^(t))
        A ← A + ( s^(t) s^(t)ᵀ ) ⊗ Γ^(t)
        b ← b + vec( s^(t)ᵀ ⊗ ( α^(t)ᵀ Γ^(t) ) )
        L ← mat( ( (1/T) A + λ I_{kd×kd} )⁻¹ (1/T) b )
    end while

5 Policy Forms & Base Learners

PG-ELLA supports a variety of policy forms and base learners, enabling it to be used in a number of policy gradient settings. This section describes how two popular policy gradient methods can be used as the base learner in PG-ELLA. In theory, any policy gradient learner that can provide an estimate of the Hessian can be incorporated.
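The two model-update steps of Algorithm 1 can be sketched as follows. The Lasso solver here (ISTA, a proximal-gradient method) is our own choice for illustration, since the paper leaves the solver for Eq. (5) open; all problem sizes and data in the example are synthetic:

```python
import numpy as np

def lasso_s_step(L, alpha, Gamma, mu, n_iter=500):
    """Eq. (5): min_s (alpha - L s)^T Gamma (alpha - L s) + mu * ||s||_1,
    solved with ISTA (gradient step on the quadratic, then soft-thresholding)."""
    Q = L.T @ Gamma @ L                     # quadratic term of the smooth part
    c = L.T @ Gamma @ alpha
    step = 1.0 / (2.0 * np.linalg.norm(Q, 2) + 1e-12)  # 1 / Lipschitz constant
    s = np.zeros(L.shape[1])
    for _ in range(n_iter):
        z = s - step * 2.0 * (Q @ s - c)                          # gradient step
        s = np.sign(z) * np.maximum(np.abs(z) - step * mu, 0.0)   # soft-threshold
    return s

def update_basis(S, alphas, Gammas, lam):
    """Eq. (6): vec(L) = A^{-1} b with A = lam*I + (1/T) sum_t (s s^T kron Gamma)
    and b = (1/T) sum_t vec(Gamma alpha s^T), using column-wise vectorization."""
    d, T, k = alphas[0].shape[0], len(alphas), S.shape[0]
    A = lam * np.eye(k * d)
    b = np.zeros(k * d)
    for t in range(T):
        s, a, G = S[:, t], alphas[t], Gammas[t]
        A += np.kron(np.outer(s, s), G) / T
        b += np.outer(G @ a, s).reshape(-1, order='F') / T   # vec(Gamma alpha s^T)
    return np.linalg.solve(A, b).reshape((d, k), order='F')

# One PG-ELLA session on toy data (all quantities below are hypothetical):
rng = np.random.default_rng(0)
d, k, T = 4, 2, 3
L = rng.standard_normal((d, k))
alphas = [rng.standard_normal(d) for _ in range(T)]
Gammas = [np.eye(d) for _ in range(T)]   # stand-in PSD Hessian estimates
S = np.column_stack([lasso_s_step(L, alphas[t], Gammas[t], mu=0.1) for t in range(T)])
L = update_basis(S, alphas, Gammas, lam=0.01)
```

In the online algorithm, the sums inside `update_basis` would instead be maintained incrementally (adding and subtracting per-task terms as in Algorithm 1), so that each session costs O(k²d³) regardless of T.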

5.1 Episodic REINFORCE

In episodic REINFORCE (Williams, 1992), the stochastic policy for task t is chosen according to a_h^(t) = θ^(t)ᵀ x_h^(t) + ε_h, with ε_h ~ N(0, σ²), and so π(a_h^(t) | x_h^(t)) ~ N(θ^(t)ᵀ x_h^(t), σ²). Therefore,

    ∇_{θ̃^(t)} J_{L,θ}(θ̃^(t)) = E[ R^(t)(τ) ∑_{h=1}^{H_t} (1/σ²) ( a_h^(t) − θ̃^(t)ᵀ x_h^(t) ) x_h^(t) ]

is used to minimize the KL divergence, equivalently maximizing the total discounted payoff. The second derivative for episodic REINFORCE is given by

    Γ^(t) = E[ R^(t)(τ) ∑_{h=1}^{H_t} (1/σ²) x_h^(t) x_h^(t)ᵀ ] .

5.2 Natural Actor-Critic

In episodic Natural Actor-Critic (eNAC), the stochastic policy for task t is chosen in a similar fashion to that of REINFORCE: π(a_h^(t) | x_h^(t)) ~ N(θ^(t)ᵀ x_h^(t), σ²). The change in the probability distribution is measured by a KL divergence that is approximated using a second-order expansion to incorporate the Fisher information matrix. Accordingly, the gradient follows:

    ∇̃_θ J(θ) = G⁻¹(θ) ∇_θ J(θ) ,

where G(θ) denotes the Fisher information matrix. The Hessian can be computed in a similar manner to the previous section. For details, see Peters & Schaal (2008).

6 Theoretical Results & Computational Cost

Here, we provide theoretical results that establish that PG-ELLA converges and that the cost (in terms of model performance) of making the simplification from Section 4.2.1 is asymptotically negligible. We proceed by first stating theoretical results from Ruvolo & Eaton (2013), and then show that these results apply directly to PG-ELLA with minimal modifications. First, we define:

    ĝ_T(L) = (1/T) ∑_{t=1}^{T} ℓ(L, s^(t), α^(t), Γ^(t)) + λ ‖L‖²_F .

Recall from Section 4.2.1 that ĝ_T specifies the cost of basis L if we leave the s^(t)'s fixed (i.e., we only update them when we receive training data for that particular task). We are now ready to state the two results from Ruvolo & Eaton (2013):

Proposition 1: The latent basis becomes more stable over time at a rate of ‖L_{T+1} − L_T‖ = O(1/T).

Proposition 2: (1) ĝ_T(L_T) converges almost surely; (2) ĝ_T(L_T) − e_T(L_T) converges almost surely to 0.

Proposition 2 establishes that the algorithm converges to a fixed per-task loss on both the approximate objective function ĝ and the
objective function that does not contain the simplification from Section 4.2.1. Further, Prop. 2 establishes that these two functions converge to the same value. The consequence of this last point is that PG-ELLA does not incur any penalty (in terms of average per-task loss) for making the simplification from Section 4.2.1.

The two propositions require the following assumptions:

1. The tuples (Γ^(t), α^(t)) are drawn i.i.d. from a distribution with compact support, bounding the entries of Γ^(t) and α^(t).

2. For all L, Γ^(t), and α^(t), the smallest eigenvalue of L_γᵀ Γ^(t) L_γ is at least κ, with κ > 0, where γ is the subset of non-zero indices of the vector s^(t) = arg min_s ‖α^(t) − L s‖²_{Γ^(t)}. In this case, the non-zero elements of the unique minimizing s^(t) are given by:

    s^(t)_γ = ( L_γᵀ Γ^(t) L_γ )⁻¹ ( L_γᵀ Γ^(t) α^(t) − µ ε_γ ) ,

where ε_γ is a vector containing the signs of the non-zero entries of s^(t).

The second assumption is a mild condition on the uniqueness of the sparse coding solution. The first assumption can be verified by assuming that there is no sequential dependency of one task on the next. Additionally, the fact that Γ^(t) is contained in a compact region can be verified for the episodic REINFORCE algorithm by looking at the form of the Hessian and requiring that the time horizon H_t is finite. Using a similar argument, we can see that the magnitude of the gradient for episodic REINFORCE is also bounded when H_t is finite. If we then assume that we make a finite number of updates for each task model, we can ensure that the sum of all gradient updates is finite, thus guaranteeing that α^(t) is contained in a compact region.

Computational Complexity: Each update begins by running a step of policy gradient to update α^(t) and Γ^(t). We assume that the cost of the policy gradient update is O(ξ(d, n_t)), where the specific cost depends on the particular policy gradient algorithm employed and n_t is the number of trajectories obtained for task t at the current iteration. To complete the analysis, we use a result from Ruvolo & Eaton (2013) showing that the cost of updating L and s^(t) is O(k²d³). This gives an overall cost of O(k²d³ + ξ(d, n_t)) for each update.

7 Evaluation

We applied PG-ELLA to learn control policies for the four dynamical systems shown in Figure 1, including three mechanical systems and an application to quadrotor control. We generated multiple tasks by varying the parameterization of each system, yielding a set of tasks from each domain with varying dynamics. For example, the simple mass spring damper system exhibits significantly higher oscillations as the spring constant increases. Notably, the optimal policies for controlling these systems vary significantly even for only slight variations in the system parameters.

Figure 1. The four dynamical systems: (a) simple mass spring damper (top-left), (b) cart-pole (top-right), (c) three-link inverted pendulum (bottom-left), and (d) quadrotor (bottom-right).

7.1 Benchmark Dynamical Systems

We evaluated PG-ELLA on three benchmark dynamical systems. In each domain, the distance between the current state and the goal position was used as the reward function.

Table 1. System parameter ranges used in the experiments.

    SM:        k ∈ [1, 10]        d ∈ [0.01, 0.2]     m ∈ [0.5, 5]
    CP & 3CP:  m_c ∈ [0.5, 1.5]   m_p ∈ [0.1, 0.2]    l ∈ [0.2, 0.8]     d ∈ [0.01, 0.09]
    3CP:       l_1 ∈ [0.3, 0.5]   l_2 ∈ [0.2, 0.4]    l_3 ∈ [0.1, 0.3]
               d_1 ∈ [0.1, 0.2]   d_2 ∈ [0.01, 0.02]  d_3 ∈ [0.1, 0.2]   I_i ∈ [10⁻⁶, 10⁻⁴]

Simple Mass Spring Damper: The simple mass (SM) system is characterized by three parameters: the spring constant k in N/m, the damping constant d in Ns/m, and the mass m in kg. The system's state is given by the position x and velocity ẋ of the mass, which vary according to a linear force F. The goal is to design a policy for controlling the mass to be in a specific state g_ref = ⟨x_ref, ẋ_ref⟩. In our experiments, the goal state varied from g_ref = ⟨0, 0⟩ to g_ref = ⟨i, 0⟩, where i ∈ {1, 2, ..., 5}.

Cart-Pole: The cart-pole (CP) system has been used extensively as a benchmark for evaluating RL algorithms (Buşoniu et al., 2010). CP dynamics are characterized by the cart's mass m_c in kg, the pole's mass m_p in kg, the pole's length l in meters, and a damping parameter d in Ns/m. The state is characterized by the position x and velocity ẋ of the cart, as well as the angle θ and angular velocity θ̇ of the pole. The goal is to design a policy capable of controlling the pole in an upright position.

Three-Link Inverted Pendulum: The three-link cart-pole (3CP) is a highly nonlinear and difficult system to control. The goal is to balance three connected rods in an
upright position by moving the cart. The dynamics are parameterized by the mass of the cart m_c, rod masses m_{p,i}, lengths l_i, inertias I_i, and damping parameters d_i, where i ∈ {1, 2, 3} indexes each of the three rods. The system's state is characterized by an eight-dimensional vector, consisting of the position x and velocity ẋ of the cart, and the angle {θ_i} and angular velocity {θ̇_i} of each rod.

7.1.1 Experimental Protocol

We first generated 30 tasks for each domain by varying the system parameters over the ranges given in Table 1. These parameter ranges were chosen to ensure a variety of tasks, including those that were difficult to control, with highly chaotic dynamics. We then randomized the task order (with repetition), and PG-ELLA acquired a limited amount of experience in each task consecutively, updating L and the s^(t)'s after each session. At each learning session, PG-ELLA was limited to 50 trajectories (for SM & CP) or 20 trajectories (for 3CP) with 150 time steps each to perform the update. Learning ceased once PG-ELLA had experienced at least one session with each task.

To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks. The step size for each task domain was determined by a line search after gathering 10 trajectories of length 150.

To evaluate the learned basis at any point in time, we initialized policies for each task using θ^(t) = L s^(t) for t ∈ {1, ..., T}. Starting from these initializations, learning on each task commenced using eNAC. The number of trajectories varied among the domains, from a minimum of 20 on the simple mass system to a maximum of 50 on the quadrotors. The length of each of these trajectories was set to 150 time steps across all domains. We measured performance using the average reward computed over 50 episodes of 150 time steps, and compared this to standard eNAC running independently with the same settings.

7.1.2 Results on the Benchmark Systems

Figure 2 compares PG-ELLA to standard policy
gradient learning using eNAC, showing the average performance on all tasks versus the number of learning iterations. PG-ELLA clearly outperforms standard eNAC in both the initial and final performance on all task domains, demonstrating significantly improved performance from MTL. We evaluated PG-ELLA's performance on all tasks using the basis L learned after observing various subsets of tasks, from observing only three tasks (10%) to observing all 30 tasks (100%). These experiments assessed the quality of the learned basis L on both known as well as unknown tasks, showing that performance increases as PG-ELLA

learns more tasks. When a particular task was not observed, the most recent L with a zero initialization of s^(t) was used. To assess the difference in the total number of trajectories between PG-ELLA and eNAC, we also tried giving eNAC an additional 50 trajectories of length 150 time steps at each iteration; however, its overall performance did not change.

Figure 2. The performance of PG-ELLA versus standard policy gradients (eNAC) on the benchmark dynamical systems: (a) simple mass spring damper, (b) cart-pole, (c) three-link inverted pendulum. Each panel plots average reward versus learning iterations for PG-ELLA after observing 10%, 30%, 50%, and 100% of the tasks, and for standard policy gradients.

7.2 Quadrotor Control

We also evaluated PG-ELLA on an application to quadrotor control, providing a more challenging domain. The quadrotor system is illustrated in Figure 1, with dynamics influenced by inertial constants around e_{1,B}, e_{2,B}, and e_{3,B}, thrust factors influencing how the rotor's speed affects the overall variation of the system's state, and the length of the rods supporting the rotors. Although the overall state of the system can be described by a nine-dimensional vector, we focus on stability and so consider only six of these state variables. The quadrotor system has a high-dimensional action space, where the goal is to control the four rotational velocities {w_i} (i = 1, ..., 4) of the rotors to stabilize the system. To ensure realistic dynamics, we used the simulated model described by Bouabdallah (2007), which has been verified on and used in the control of a physical quadrotor.

To produce multiple tasks, we generated 15 quadrotor systems by varying each of: the
inertia around the x-axis I_xx ∈ [4.5e−3, 6.5e−3], the inertia around the y-axis I_yy ∈ [4.2e−3, 5.2e−3], the inertia around the z-axis I_zz ∈ [1.5e−2, 2.1e−2], and the length of the arms l ∈ [0.27, 0.3]. In each case, these parameter values have been used by Bouabdallah (2007) to describe physical quadrotors. We used a linear quadratic regulator, as described by Bouabdallah, to initialize the policies in both the learning (i.e., determining L and s^(t)) and testing (i.e., comparing to standard policy gradients) phases. We followed a similar experimental procedure to evaluate PG-ELLA on quadrotor control, where we used 50 trajectories of 150 time steps to perform an eNAC policy gradient update each learning session.

Figure 3 compares PG-ELLA to standard policy gradients (eNAC) on quadrotor control.

Figure 3. Performance on quadrotor control: average reward versus learning iterations for PG-ELLA after observing 10%, 30%, 50%, and 100% of the tasks, and for standard policy gradients.

As on the benchmark systems, we see that PG-ELLA clearly outperforms standard eNAC in both the initial and final performance, and this performance increases as PG-ELLA learns more tasks. The final performance of the policy learned by PG-ELLA after observing all tasks is significantly better than the policy learned using standard policy gradients, showing the benefits of knowledge transfer between tasks. Most importantly for practical applications, by using the basis L learned over previous tasks, PG-ELLA can achieve high performance in a new task much more quickly (with fewer trajectories) than standard policy gradient methods.

8 Conclusion & Future Work

PG-ELLA provides an efficient mechanism for online MTL of SDM tasks while providing improved performance over standard policy gradient methods. By supporting knowledge transfer between tasks via a shared latent basis, PG-ELLA is also able to rapidly learn policies for new tasks, providing the ability for an agent to rapidly adapt to new situations. In future work, we intend to explore the potential
for cross-domain transfer with PG-ELLA.

Acknowledgements

This work was partially supported by ONR N , AFOSR FA , and NSF IIS . We thank the reviewers for their helpful suggestions.
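As an illustrative sketch (not code from the paper) of the setup in Section 7.2: the 15 quadrotor tasks vary the physical parameters within the stated ranges, and PG-ELLA factors each task's policy parameters as theta_t = L s_t, with s_t zero-initialized for tasks that have not yet been observed. The uniform sampling of the parameter ranges and the dimensions d and k below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameter ranges from Section 7.2 (inertias and arm length).
RANGES = {
    "Ixx": (4.5e-3, 6.5e-3),
    "Iyy": (4.2e-3, 5.2e-3),
    "Izz": (1.5e-2, 2.1e-2),
    "l":   (0.27, 0.30),
}

def sample_task():
    """Draw one task by sampling each physical parameter uniformly (an assumption)."""
    return {name: float(rng.uniform(lo, hi)) for name, (lo, hi) in RANGES.items()}

tasks = [sample_task() for _ in range(15)]

# PG-ELLA's factored policy representation: theta_t = L @ s_t, where L is a
# shared latent basis and s_t are task-specific coefficients.
d, k = 6, 3                                   # illustrative dimensions, not from the paper
L = rng.standard_normal((d, k))               # shared basis (learned in practice)
s = {t: np.zeros(k) for t in range(len(tasks))}  # zero init for unobserved tasks

def policy_params(t):
    """Reconstruct task t's policy parameter vector from the shared basis."""
    return L @ s[t]
```

With this factorization, observing a new task only requires fitting its small coefficient vector s_t against the shared basis L, which is why PG-ELLA needs far fewer trajectories on new tasks than learning theta_t from scratch.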

References

Bócsi, B., Csató, L., and Peters, J. Alignment-based transfer learning for robot models. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013.

Bou-Ammar, H., Taylor, M.E., Tuyls, K., Driessens, K., and Weiss, G. Reinforcement learning transfer via sparse coding. In Proceedings of the 11th Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.

Bouabdallah, S. Design and control of quadrotors with application to autonomous flying. PhD thesis, École polytechnique fédérale de Lausanne, 2007.

Buşoniu, L., Babuška, R., De Schutter, B., and Ernst, D. Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, Boca Raton, Florida, 2010.

Daniel, C., Neumann, G., Kroemer, O., and Peters, J. Learning sequential motor tasks. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA), 2013.

Deisenroth, M.P., Englert, P., Peters, J., and Fox, D. Multi-task policy search for robotics. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014.

Fernández, F. and Veloso, M. Learning domain structure through probabilistic policy reuse in reinforcement learning. Progress in AI, 2(1):13–27, 2013.

Kober, J. and Peters, J. Policy search for motor primitives in robotics. Machine Learning, 84(1–2), July 2011.

Kumar, A. and Daumé III, H. Learning task grouping and overlap in multi-task learning. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.

Kupcsik, A.G., Deisenroth, M.P., Peters, J., and Neumann, G. Data-efficient generalization of robot skills with contextual policy search. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2013.

Lazaric, A. and Ghavamzadeh, M. Bayesian multi-task reinforcement learning. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.

Li, H., Liao, X., and Carin, L. Multi-task reinforcement learning in partially observable stochastic environments. Journal of Machine Learning Research, 10, 2009.

Liu, Y. and Stone, P. Value-function-based transfer for reinforcement learning using structure mapping. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.

Maurer, A., Pontil, M., and Romera-Paredes, B. Sparse coding for multitask and transfer learning. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

Peters, J. and Bagnell, J.A. Policy gradient methods. In Encyclopedia of Machine Learning, 2010.

Peters, J. and Schaal, S. Applying the episodic natural actor-critic architecture to motor primitive learning. In Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN), 2007.

Peters, J. and Schaal, S. Natural actor-critic. Neurocomputing, 2008.

Rai, P. and Daumé III, H. Infinite predictor subspace models for multitask learning. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010.

Ruvolo, P. and Eaton, E. ELLA: An efficient lifelong learning algorithm. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

Sutton, R.S., McAllester, D.A., Singh, S.P., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems (NIPS), 1999.

Taylor, M.E. and Stone, P. Transfer learning for reinforcement learning domains: a survey. Journal of Machine Learning Research, 10, 2009.

Taylor, M.E., Whiteson, S., and Stone, P. Transfer via inter-task mappings in policy search reinforcement learning. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.

Taylor, M.E., Kuhlmann, G., and Stone, P. Autonomous transfer for reinforcement learning. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2008.

Thrun, S. and O'Sullivan, J. Discovering structure in multiple learning tasks: the TC algorithm. In Proceedings of the 13th International Conference on Machine Learning (ICML), 1996.

Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 1992.

Wilson, A., Fern, A., Ray, S., and Tadepalli, P. Multi-task reinforcement learning: a hierarchical Bayesian approach. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007.

Zhang, J., Ghahramani, Z., and Yang, Y. Flexible latent variable models for multi-task learning. Machine Learning, 73(3), 2008.


TD(0) Leads to Better Policies than Approximate Value Iteration TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract

More information

Part II: Finite Difference/Volume Discretisation for CFD

Part II: Finite Difference/Volume Discretisation for CFD Part II: Finite Difference/Volume Discretisation for CFD Finite Volume Metod of te Advection-Diffusion Equation A Finite Difference/Volume Metod for te Incompressible Navier-Stokes Equations Marker-and-Cell

More information

Pioneer Fund Story. Searching for Value Today and Tomorrow. Pioneer Funds Equities

Pioneer Fund Story. Searching for Value Today and Tomorrow. Pioneer Funds Equities Pioneer Fund Story Searcing for Value Today and Tomorrow Pioneer Funds Equities Pioneer Fund A Cornerstone of Financial Foundations Since 1928 Te fund s relatively cautious stance as kept it competitive

More information

Evaluating probabilities under high-dimensional latent variable models

Evaluating probabilities under high-dimensional latent variable models Evaluating probabilities under ig-dimensional latent variable models Iain Murray and Ruslan alakutdinov Department of Computer cience University of oronto oronto, ON. M5 3G4. Canada. {murray,rsalaku}@cs.toronto.edu

More information

Working Capital 2013 UK plc s unproductive 69 billion

Working Capital 2013 UK plc s unproductive 69 billion 2013 Executive summary 2. Te level of excess working capital increased 3. UK sectors acieve a mixed performance 4. Size matters in te supply cain 6. Not all companies are overflowing wit cas 8. Excess

More information

Determine the perimeter of a triangle using algebra Find the area of a triangle using the formula

Determine the perimeter of a triangle using algebra Find the area of a triangle using the formula Student Name: Date: Contact Person Name: Pone Number: Lesson 0 Perimeter, Area, and Similarity of Triangles Objectives Determine te perimeter of a triangle using algebra Find te area of a triangle using

More information

Regression Using Support Vector Machines: Basic Foundations

Regression Using Support Vector Machines: Basic Foundations Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering

More information

Large-scale Virtual Acoustics Simulation at Audio Rates Using Three Dimensional Finite Difference Time Domain and Multiple GPUs

Large-scale Virtual Acoustics Simulation at Audio Rates Using Three Dimensional Finite Difference Time Domain and Multiple GPUs Large-scale Virtual Acoustics Simulation at Audio Rates Using Tree Dimensional Finite Difference Time Domain and Multiple GPUs Craig J. Webb 1,2 and Alan Gray 2 1 Acoustics Group, University of Edinburg

More information

A hybrid model of dynamic electricity price forecasting with emphasis on price volatility

A hybrid model of dynamic electricity price forecasting with emphasis on price volatility all times On a non-liquid market, te accuracy of a price A ybrid model of dynamic electricity price forecasting wit empasis on price volatility Marin Cerjan Abstract-- Accurate forecasting tools are essential

More information

Pretrial Settlement with Imperfect Private Monitoring

Pretrial Settlement with Imperfect Private Monitoring Pretrial Settlement wit Imperfect Private Monitoring Mostafa Beskar Indiana University Jee-Hyeong Park y Seoul National University April, 2016 Extremely Preliminary; Please Do Not Circulate. Abstract We

More information

Abstract. Introduction

Abstract. Introduction Fast solution of te Sallow Water Equations using GPU tecnology A Crossley, R Lamb, S Waller JBA Consulting, Sout Barn, Brougton Hall, Skipton, Nort Yorksire, BD23 3AE. amanda.crossley@baconsulting.co.uk

More information

Government Debt and Optimal Monetary and Fiscal Policy

Government Debt and Optimal Monetary and Fiscal Policy Government Debt and Optimal Monetary and Fiscal Policy Klaus Adam Manneim University and CEPR - preliminary version - June 7, 21 Abstract How do di erent levels of government debt a ect te optimal conduct

More information

Motivation. Motivation. Can a software agent learn to play Backgammon by itself? Machine Learning. Reinforcement Learning

Motivation. Motivation. Can a software agent learn to play Backgammon by itself? Machine Learning. Reinforcement Learning Motivation Machine Learning Can a software agent learn to play Backgammon by itself? Reinforcement Learning Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut

More information

Free Shipping and Repeat Buying on the Internet: Theory and Evidence

Free Shipping and Repeat Buying on the Internet: Theory and Evidence Free Sipping and Repeat Buying on te Internet: eory and Evidence Yingui Yang, Skander Essegaier and David R. Bell 1 June 13, 2005 1 Graduate Scool of Management, University of California at Davis (yiyang@ucdavis.edu)

More information

Note nine: Linear programming CSE 101. 1 Linear constraints and objective functions. 1.1 Introductory example. Copyright c Sanjoy Dasgupta 1

Note nine: Linear programming CSE 101. 1 Linear constraints and objective functions. 1.1 Introductory example. Copyright c Sanjoy Dasgupta 1 Copyrigt c Sanjoy Dasgupta Figure. (a) Te feasible region for a linear program wit two variables (see tet for details). (b) Contour lines of te objective function: for different values of (profit). Te

More information

Reinforced Concrete Beam

Reinforced Concrete Beam Mecanics of Materials Reinforced Concrete Beam Concrete Beam Concrete Beam We will examine a concrete eam in ending P P A concrete eam is wat we call a composite eam It is made of two materials: concrete

More information