Online Multi-Task Learning for Policy Gradient Methods


Haitham Bou Ammar, Eric Eaton — University of Pennsylvania, Computer and Information Science Department, Philadelphia, PA, USA
Paul Ruvolo — Olin College of Engineering, Needham, MA, USA
Matthew E. Taylor — Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA, USA

Abstract

Policy gradient algorithms have shown considerable recent success in solving high-dimensional sequential decision-making tasks, particularly in robotics. However, these methods often require extensive experience in a domain to achieve high performance. To make agents more sample-efficient, we developed a multi-task policy gradient method to learn decision-making tasks consecutively, transferring knowledge between tasks to accelerate learning. Our approach provides robust theoretical guarantees, and we show empirically that it dramatically accelerates learning on a variety of dynamical systems, including an application to quadrotor control.

1. Introduction

Sequential decision making (SDM) is an essential component of autonomous systems. Although significant progress has been made on developing algorithms for learning isolated SDM tasks, these algorithms often require a large amount of experience before achieving acceptable performance. This is particularly true for the high-dimensional SDM tasks that arise in robot control problems. The cost of this experience can be prohibitively expensive (in terms of both time and fatigue of the robot's components), especially in scenarios where an agent will face multiple tasks and must be able to quickly acquire control policies for each new task. Another failure mode of conventional methods is that when the production environment differs significantly from the training environment, previously learned policies may no longer be correct.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the authors.

When data is in limited supply, learning task models jointly through
multi-task learning (MTL) rather than independently can significantly improve model performance (Thrun & O'Sullivan, 1996; Zhang et al., 2008; Rai & Daumé, 2010; Kumar & Daumé, 2012). However, MTL's performance gain comes at a high computational cost when learning new tasks or when updating previously learned models. Recent work (Ruvolo & Eaton, 2013) in the supervised setting has shown that nearly identical performance to batch MTL can be achieved in online learning with large computational speedups. Building upon this work, we introduce an online MTL approach to learn a sequence of SDM tasks with low computational overhead. Specifically, we develop an online MTL formulation of policy gradient reinforcement learning that enables an autonomous agent to accumulate knowledge over its lifetime and efficiently share this knowledge between SDM tasks to accelerate learning. We call this approach the Policy Gradient Efficient Lifelong Learning Algorithm (PG-ELLA), the first (to our knowledge) online MTL policy gradient method.

Instead of learning a control policy for an SDM task from scratch, as in standard policy gradient methods, our approach rapidly learns a high-performance control policy based on the agent's previously learned knowledge. Knowledge is shared between SDM tasks via a latent basis that captures reusable components of the learned policies. The latent basis is then updated with newly acquired knowledge, enabling (a) accelerated learning of new task models and (b) improvement in the performance of existing models without retraining on their respective tasks. The latter capability is especially important in ensuring that the agent can accumulate knowledge over its lifetime across numerous tasks without exhibiting negative transfer. We show that this process is highly efficient, with robust theoretical guarantees. We evaluate PG-ELLA on four dynamical systems, including an application to quadrotor control, and show that PG-ELLA outperforms standard policy gradients in both initial and final performance.
2. Related Work in Multi-Task RL

Due to its empirical success, there is a growing body of work on transfer learning approaches to reinforcement learning (RL) (Taylor & Stone, 2009). By contrast, relatively few methods for multi-task RL have been proposed. One class of algorithms for multi-task RL uses non-parametric Bayesian models to share knowledge between tasks. For instance, Wilson et al. (2007) developed a hierarchical Bayesian approach that models the distribution over Markov decision processes (MDPs) and uses this distribution as a prior for learning each new task, enabling it to learn tasks consecutively. In contrast to our work, Wilson et al. focused on environments with discrete states and actions. Additionally, their method requires the ability to compute an optimal policy given an MDP. This process can be expensive for even moderately large discrete environments, and is computationally intractable for the types of continuous, high-dimensional control problems considered here. Another example is by Li et al. (2009), who developed a model-free multi-task RL method for partially observable environments. Unlike our problem setting, their method focuses on off-policy batch MTL. Finally, Lazaric & Ghavamzadeh (2010) exploit shared structure in the value functions between related MDPs. However, their approach is designed for on-policy multi-task policy evaluation, rather than computing optimal policies.

A second approach to multi-task RL is based on Policy Reuse (Fernández & Veloso, 2013), in which policies from previously learned tasks are probabilistically reused to bias the learning of new tasks. One drawback of Policy Reuse is that it requires tasks to share common states, actions, and transition functions (while allowing different reward functions), whereas our approach only requires that tasks share a common state and action space. This restriction precludes the application of Policy Reuse to the scenarios considered in Section 7, where the systems have related but not identical transition functions. Also,
in contrast to PG-ELLA, Policy Reuse does not support reverse transfer, in which subsequent learning improves previously learned policies. Perhaps the approach most similar to ours is by Deisenroth et al. (2014), which uses policy gradients to learn a single controller that is optimal on average over all training tasks. By appropriately parameterizing the policy, the controller can be customized to particular tasks. However, this method requires that tasks differ only in their reward function, and thus is inapplicable to our experimental scenarios.

3. Problem Framework

We first describe our framework for policy gradient RL and lifelong learning. The next section uses this framework to present our approach to online MTL for policy gradients.

3.1. Policy Gradient Reinforcement Learning

We frame each SDM task as an RL problem, in which an agent must sequentially select actions to maximize its expected return. Such problems are typically formalized as a Markov decision process (MDP) ⟨X, A, P, R, γ⟩, where X ⊆ R^d is the (potentially infinite) set of states, A ⊆ R^m is the set of possible actions, P : X × A × X → [0, 1] is a state transition probability function describing the system's dynamics, R : X × A → R is the reward function measuring the agent's performance, and γ ∈ [0, 1) specifies the degree to which rewards are discounted over time. At each time step h, the agent is in state x_h ∈ X and must choose an action a_h ∈ A, transitioning it to a new state x_{h+1} ∼ p(x_{h+1} | x_h, a_h) as given by P and yielding reward r_{h+1} = R(x_h, a_h). A policy π : X × A → [0, 1] is defined as a probability distribution over state–action pairs, where π(a | x) represents the probability of selecting action a in state x. The goal of an RL agent is to find an optimal policy π* that maximizes the expected return. The sequence of state–action pairs forms a trajectory τ = [x_{0:H}, a_{0:H}] over a (possibly infinite) horizon H.

Policy gradient methods (Sutton et al., 1999; Peters & Schaal, 2008; Peters & Bagnell, 2010) have shown success in solving high-dimensional problems, such as robotic control (Peters & Schaal, 2007). These methods
represent the policy π_θ(a | x) using a vector θ ∈ R^d of control parameters. The goal is to determine the optimal parameters θ* that maximize the expected average return:

    J(θ) = ∫_T p_θ(τ) R(τ) dτ ,    (1)

where T is the set of all possible trajectories. The trajectory distribution p_θ(τ) and average per-time-step return R(τ) are defined as:

    p_θ(τ) = P_0(x_0) ∏_{h=0}^{H} p(x_{h+1} | x_h, a_h) π_θ(a_h | x_h)
    R(τ) = (1/H) Σ_{h=0}^{H-1} r_{h+1} ,

with an initial state distribution P_0 : X → [0, 1]. Most policy gradient algorithms, such as episodic REINFORCE (Williams, 1992), PoWER (Kober & Peters, 2011), and Natural Actor-Critic (Peters & Schaal, 2008), employ supervised function approximators to learn the control parameters θ by maximizing a lower bound on the expected return J(θ) (Eq. 1). To achieve this, one generates trajectories using the current policy π_θ, and then compares the result with a new policy parameterized by θ̃. As described by Kober & Peters (2011), the lower bound on the expected return can be attained using Jensen's inequality
and the concavity of the logarithm:

    log J(θ̃) = log ∫_T p_θ̃(τ) R(τ) dτ
              = log ∫_T (p_θ(τ) / p_θ(τ)) p_θ̃(τ) R(τ) dτ
              ≥ ∫_T p_θ(τ) R(τ) log (p_θ̃(τ) / p_θ(τ)) dτ + constant
              ∝ −D_KL( p_θ(τ) R(τ) ‖ p_θ̃(τ) ) = J_{L,θ}(θ̃) ,

where D_KL( p(τ) ‖ q(τ) ) = ∫_T p(τ) log (p(τ) / q(τ)) dτ. We see that this is equivalent to minimizing the KL divergence between the reward-weighted trajectory distribution of π_θ and the trajectory distribution p_θ̃ of the new policy π_θ̃.

3.2. The Lifelong Learning Problem

In contrast to most previous work on policy gradients, which focuses on single-task learning, this paper focuses on the online MTL setting, in which the agent is required to learn a series of SDM tasks Z^(1), ..., Z^(T_max) over its lifetime. Each task t is an MDP Z^(t) = ⟨X^(t), A^(t), P^(t), R^(t), γ^(t)⟩ with initial state distribution P_0^(t). The agent learns the tasks consecutively, acquiring multiple trajectories within each task before moving to the next. The tasks may be interleaved, providing the agent the opportunity to revisit earlier tasks for further experience, but the agent has no control over the task order. We assume that, a priori, the agent does not know the total number of tasks T_max, their distribution, or the task order. The agent's goal is to learn a set of optimal policies Π* = { π*_{θ^(1)}, ..., π*_{θ^(T_max)} } with corresponding parameters Θ* = { θ*^(1), ..., θ*^(T_max) }. At any time, the agent may be evaluated on any previously seen task, and so must strive to optimize its learned policies for all tasks Z^(1), ..., Z^(T), where T denotes the number of tasks seen so far (1 ≤ T ≤ T_max).

4. Online MTL for Policy Gradient Methods

This section develops the Policy Gradient Efficient Lifelong Learning Algorithm (PG-ELLA).

4.1. Learning Objective

To share knowledge between tasks, we assume that each task's control parameters can be modeled as a linear combination of latent components from a shared knowledge base. A number of supervised MTL algorithms (Kumar & Daumé, 2012; Ruvolo & Eaton, 2013; Maurer et al., 2013) have shown this approach to be successful. Our approach incorporates the use of a shared latent basis into policy gradient learning to enable
transfer between SDM tasks. PG-ELLA maintains a library of k latent components L ∈ R^{d×k} that is shared among all tasks, forming a basis for the control policies. We can then represent each task's control parameters as a linear combination of this latent basis: θ^(t) = L s^(t), where s^(t) ∈ R^k is a task-specific vector of coefficients. The task-specific coefficients s^(t) are encouraged to be sparse to ensure that each learned basis component captures a maximal reusable chunk of knowledge. We can then represent our objective of learning stationary policies while maximizing the amount of transfer between task models by:

    e_T(L) = (1/T) Σ_{t=1}^{T} min_{s^(t)} [ −J(θ^(t)) + μ ‖s^(t)‖_1 ] + λ ‖L‖²_F ,    (2)

where θ^(t) = L s^(t), the L1 norm of s^(t) is used to approximate the true vector sparsity, and ‖·‖_F is the Frobenius norm. The form of this objective function is closely related to other supervised MTL methods (Ruvolo & Eaton, 2013; Maurer et al., 2013), with important differences arising from the incorporation of J, as we will examine shortly.

Our approach to optimizing Eq. 2 is based upon the Efficient Lifelong Learning Algorithm (ELLA) (Ruvolo & Eaton, 2013), which provides a computationally efficient method for learning L and the s^(t)'s online over multiple tasks in the case of supervised MTL. The objective solved by ELLA is closely related to Eq. 2, with the exception that the J term is replaced with a measure of each task model's average loss over the training data. Since Eq. 2 is not jointly convex in L and the s^(t)'s, most supervised MTL methods use an expensive alternating optimization procedure to train the task models simultaneously. Ruvolo & Eaton provide an efficient alternative to this procedure that can train task models consecutively, enabling Eq. 2 to be used effectively for online MTL. In the next section, we adapt this approach to the policy gradient framework, and show that the resulting algorithm provides an efficient method for learning consecutive SDM tasks.

4.2. Multi-Task Policy Gradients

Policy gradient methods maximize the lower bound of J(θ) (Eq. 1). In order to use Eq. 2 for MTL with policy
gradients, we must first incorporate this lower bound into our objective function. Rewriting the error term in Eq. 2 in terms of the lower bound yields

    e_T(L) = (1/T) Σ_{t=1}^{T} min_{s^(t)} [ −J_{L,θ}(θ^(t)) + μ ‖s^(t)‖_1 ] + λ ‖L‖²_F ,

where θ^(t) = L s^(t). However, we note that

    J_{L,θ}(θ^(t)) ∝ −∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ .
Therefore, maximizing the lower bound of J_{L,θ}(θ^(t)) is equivalent to the following minimization problem:

    min_{θ̃^(t)} ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ .

Substituting the above result, with θ̃^(t) = L s^(t), into Eq. 2 leads to the following total cost function for MTL with policy gradients:

    e_T(L) = (1/T) Σ_{t=1}^{T} { min_{s^(t)} [ ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ + μ ‖s^(t)‖_1 ] } + λ ‖L‖²_F .    (3)

While Eq. 3 enables batch MTL using policy gradients, it is computationally expensive due to two inefficiencies that make it inappropriate for online MTL: (a) the explicit dependence on all available trajectories through J(θ^(t)) = ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) dτ, and (b) the exhaustive evaluation of a single candidate L, which requires the optimization of all s^(t)'s through the outer summation. Together, these aspects cause Eq. 3 (and similarly Eq. 2) to have a computational cost that depends on the total number of trajectories and the total number of tasks, complicating its direct use in the lifelong learning setting. We next describe methods for resolving each of these inefficiencies while minimizing Eq. 3, yielding PG-ELLA as an efficient method for multi-task policy gradient learning. In fact, we show that the complexity of PG-ELLA in learning a single task policy is independent of (a) the number of tasks seen so far and (b) the number of trajectories for all other tasks, allowing our approach to be highly efficient.

4.2.1. Eliminating Dependence on Other Tasks

As mentioned above, one of the inefficiencies in minimizing e_T(L) is its dependence on all available trajectories for all tasks. To remedy this problem, as in ELLA, we approximate e_T(L) by performing a second-order Taylor expansion of J_{L,θ}(θ̃^(t)) around the optimal solution:

    α^(t) = arg min_{θ̃^(t)} ∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) log ( p_{θ^(t)}(τ) R^(t)(τ) / p_{θ̃^(t)}(τ) ) dτ .

As shown by Ruvolo & Eaton (2013), the second-order Taylor expansion can be
substituted into the MTL objective function to provide a point estimate around the optimal solution, eliminating the dependence on other tasks. To compute the second-order Taylor representation, the first and second derivatives of the above objective with respect to θ̃^(t) are required. Noting that

    log p_{θ̃^(t)}(τ) = log P_0^(t)(x_0^(t)) + Σ_{h=0}^{H^(t)} log p^(t)(x_{h+1}^(t) | x_h^(t), a_h^(t)) + Σ_{h=0}^{H^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) ,

the first derivative is given by:

    ∇_{θ̃^(t)} J_{L,θ}(θ̃^(t)) = −∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) Σ_{h=1}^{H^(t)} ∇_{θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) dτ
                              = −E[ R^(t)(τ) Σ_{h=1}^{H^(t)} ∇_{θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) ] .

Policy gradient algorithms determine α^(t) = θ^(t)* by following the above gradient. The second derivative can be computed similarly to produce:

    ∇²_{θ̃^(t), θ̃^(t)} J_{L,θ}(θ̃^(t)) = −∫_{T^(t)} p_{θ^(t)}(τ) R^(t)(τ) Σ_{h=1}^{H^(t)} ∇²_{θ̃^(t), θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) dτ .

We let Γ^(t) denote the Hessian of the minimization objective evaluated at α^(t):

    Γ^(t) = −E[ R^(t)(τ) Σ_{h=1}^{H^(t)} ∇²_{θ̃^(t), θ̃^(t)} log π_{θ̃^(t)}(a_h^(t) | x_h^(t)) ] |_{θ̃^(t) = α^(t)} .

Substituting the second-order Taylor approximation into Eq. 3 yields the following:

    ê_T(L) = (1/T) Σ_{t=1}^{T} min_{s^(t)} [ ‖α^(t) − L s^(t)‖²_{Γ^(t)} + μ ‖s^(t)‖_1 ] + λ ‖L‖²_F ,    (4)

where ‖v‖²_A = vᵀ A v, the constant term was suppressed since it has no effect on the minimization, and the linear term was ignored since, by construction, α^(t) is a minimizer. Most importantly, the dependence on all available trajectories has been eliminated, remedying the first inefficiency.
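As a concrete illustration, the quantities α^(t) and Γ^(t) can be estimated from sampled trajectories. The following is a minimal sketch for the Gaussian policy used later in Section 5.1 (a_h = θᵀx_h + ε, ε ∼ N(0, σ²)); the function name and data layout are our own assumptions, not part of the original algorithm:

```python
import numpy as np

def estimate_gradient_and_hessian(trajectories, theta, sigma=0.5):
    """Monte-Carlo estimates of the policy gradient and the Hessian Gamma
    for a Gaussian policy a = theta^T x + eps, eps ~ N(0, sigma^2).

    trajectories: list of (states, actions, rewards), where states is an
      (H, d) array, and actions and rewards are (H,) arrays.
    Returns (grad, Gamma):
      grad  estimates E[ R(tau) sum_h grad log pi(a_h | x_h) ],
      Gamma estimates E[ R(tau) sum_h x_h x_h^T / sigma^2 ].
    """
    d = len(theta)
    grad, Gamma = np.zeros(d), np.zeros((d, d))
    for states, actions, rewards in trajectories:
        ret = rewards.mean()                 # average per-step return R(tau)
        resid = actions - states @ theta     # a_h - theta^T x_h
        # grad log pi(a|x) = (a - theta^T x) x / sigma^2 for a Gaussian policy
        grad += ret * (resid[:, None] * states).sum(axis=0) / sigma**2
        Gamma += ret * (states.T @ states) / sigma**2
    n = len(trajectories)
    return grad / n, Gamma / n
```

In this sketch, α^(t) would be obtained by repeatedly ascending the returned gradient until convergence, after which Γ^(t) is the returned Hessian estimate at that point.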
4.2.2. Computing the Latent Space

The second inefficiency in Eq. 3 arises from the procedure used to compute the objective function for a single candidate L. Namely, to determine how well a given value of L serves as a common basis for all learned tasks, an optimization problem must be solved to recompute each of the s^(t)'s, which becomes increasingly expensive as T grows large. To remedy this problem, we modify Eq. 3 (or, equivalently, Eq. 4) to eliminate the minimization over all s^(t)'s. Following the approach used in ELLA, we optimize each task-specific projection s^(t) only when training on task t, without updating it when training on other tasks. Consequently, any changes to θ^(t) when learning other tasks will occur only through updates to the shared basis L. As shown by Ruvolo & Eaton (2013), this choice to update s^(t) only when training on task t does not significantly affect the quality of the model fit as T grows large. With this simplification, we can rewrite Eq. 4 in terms of two update equations:

    s^(t) ← arg min_s ℓ(L_m, s, α^(t), Γ^(t))    (5)
    L_{m+1} ← arg min_L (1/T) Σ_{t=1}^{T} ℓ(L, s^(t), α^(t), Γ^(t)) + λ ‖L‖²_F ,    (6)

where L_m refers to the value of the latent basis at the start of the m-th training session, t corresponds to the particular task for which data was just received, and ℓ(L, s, α, Γ) = μ ‖s‖_1 + ‖α − L s‖²_Γ. To compute L_{m+1}, we null the gradient of Eq. 6 and solve the resulting equation to yield the updated column-wise vectorization of L as A⁻¹ b, where:

    A = λ I_{d·k, d·k} + (1/T) Σ_{t=1}^{T} ( s^(t) s^(t)ᵀ ) ⊗ Γ^(t)
    b = (1/T) Σ_{t=1}^{T} vec( Γ^(t) α^(t) s^(t)ᵀ ) .

For efficiency, we can compute A and b incrementally as new tasks arrive, avoiding the need to sum over all tasks.

4.3. Data Generation & Model Update

Using the incremental form (Eqs. 5–6) of the policy gradient MTL objective function (Eq. 3), we can now construct an online MTL algorithm that can operate in a lifelong learning setting. In typical policy gradient methods, trajectories are generated in batch mode by first initializing the policy and sampling trajectories from the system (Kober & Peters, 2011; Peters & Bagnell, 2010).
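The normal-equation update for L (Eqs. 5 and 6) can be sketched in a few lines. This is a minimal illustration under our own conventions (column-wise vec of the d×k matrix L, A and b kept as raw running sums); it is not the authors' implementation:

```python
import numpy as np

def update_basis(A, b, T, s, alpha, Gamma, lam):
    """One incremental update of the shared latent basis L.

    A : (k*d, k*d) running sum of (s_t s_t^T kron Gamma_t)
    b : (k*d,)     running sum of vec(Gamma_t alpha_t s_t^T)
    T : number of tasks seen so far; lam is the ridge penalty lambda.
    Returns updated (A, b, L), with L recovered by solving
    (A/T + lam*I) vec(L) = b/T and undoing the column-wise vec.
    """
    d, k = len(alpha), len(s)
    A = A + np.kron(np.outer(s, s), Gamma)   # A += (s s^T) kron Gamma
    b = b + np.kron(s, Gamma @ alpha)        # b += vec(Gamma alpha s^T)
    L_vec = np.linalg.solve(A / T + lam * np.eye(k * d), b / T)
    return A, b, L_vec.reshape((d, k), order="F")
```

The Kronecker identities ‖α − Ls‖²_Γ = vec(L)ᵀ((s sᵀ) ⊗ Γ)vec(L) − 2 vec(Γ α sᵀ)ᵀ vec(L) + const. are what make this closed-form solve possible.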
Given these trajectories, the policy parameters are updated, new trajectories are sampled from the system using the updated policy, and the procedure is then repeated. In this work, we adopt a slightly modified version of policy gradients to operate in the lifelong learning setting. The first time a new task is observed, we use a random policy for sampling; each subsequent time the task is observed, we sample trajectories using the previously learned α^(t). Additionally, instead of looping until the policy parameters have converged, we perform only one run over the trajectories.

Upon receiving data for a specific task t, PG-ELLA performs two steps to update the model: it first computes the task-specific projection s^(t), and then refines the shared latent space L. To compute s^(t), we first determine α^(t) and Γ^(t) using only data from task t. The details of this step depend on the form chosen for the policy, as described in Section 5. We can then solve the L1-regularized regression problem given in Eq. 5 (an instance of the Lasso) to yield s^(t). In the second step, we update L by first reinitializing any zero columns of L and then following Eq. 6. The complete PG-ELLA is given as Algorithm 1.

Algorithm 1  PG-ELLA(k, λ, μ)
    T ← 0,  A ← zeros(k·d, k·d),  b ← zeros(k·d, 1),  L ← zeros(d, k)
    while some task t is available do
        if isNewTask(t) then
            T ← T + 1
            (T^(t), R^(t)) ← getRandomTrajectories()
        else
            (T^(t), R^(t)) ← getTrajectories(α^(t))
            A ← A − (s^(t) s^(t)ᵀ) ⊗ Γ^(t)
            b ← b − vec( Γ^(t) α^(t) s^(t)ᵀ )
        end if
        Compute α^(t) and Γ^(t) from (T^(t), R^(t))
        L ← reinitializeAllZeroColumns(L)
        s^(t) ← arg min_s ℓ(L, s, α^(t), Γ^(t))
        A ← A + (s^(t) s^(t)ᵀ) ⊗ Γ^(t)
        b ← b + vec( Γ^(t) α^(t) s^(t)ᵀ )
        L ← mat( ((1/T) A + λ I_{k·d, k·d})⁻¹ (1/T) b )
    end while

5. Policy Forms & Base Learners

PG-ELLA supports a variety of policy forms and base learners, enabling it to be used in a number of policy gradient settings. This section describes how two popular policy gradient methods can be used as the base learner in PG-ELLA. In theory, any policy gradient learner that can provide an estimate of the Hessian can be incorporated.
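The Lasso step of Algorithm 1 (Eq. 5) can be solved by any L1 solver; the following is a simple proximal-gradient (ISTA) sketch as a stand-in, with the function name and iteration budget being our own assumptions rather than the authors' choice of solver:

```python
import numpy as np

def sparse_code(L, alpha, Gamma, mu, n_iter=200):
    """Approximately solve  min_s ||alpha - L s||^2_Gamma + mu ||s||_1
    via ISTA (proximal gradient with soft-thresholding)."""
    H = L.T @ Gamma @ L                # quadratic term of the smooth part
    g = L.T @ Gamma @ alpha
    step = 1.0 / (2 * np.linalg.eigvalsh(H).max() + 1e-12)
    s = np.zeros(L.shape[1])
    for _ in range(n_iter):
        grad = 2 * (H @ s - g)         # gradient of ||alpha - L s||^2_Gamma
        z = s - step * grad
        # soft-threshold: proximal operator of mu*||.||_1
        s = np.sign(z) * np.maximum(np.abs(z) - step * mu, 0.0)
    return s
```

Soft-thresholding drives small coefficients exactly to zero, which is what encourages each s^(t) to select only a few basis columns of L.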
5.1. Episodic REINFORCE

In episodic REINFORCE (Williams, 1992), the stochastic policy for task t is chosen according to a_h^(t) = θ^(t)ᵀ x_h^(t) + ε_h, with ε_h ∼ N(0, σ²), and so π(a_h^(t) | x_h^(t)) ∼ N(θ^(t)ᵀ x_h^(t), σ²). Therefore,

    ∇_{θ^(t)} J_{L,θ} = E[ R^(t)(τ) Σ_{h=1}^{H^(t)} (1/σ²) ( a_h^(t) − θ^(t)ᵀ x_h^(t) ) x_h^(t) ]

is used to minimize the KL divergence, equivalently maximizing the total discounted payoff. The second derivative for episodic REINFORCE is given by

    Γ^(t) = E[ R^(t)(τ) Σ_{h=1}^{H^(t)} (1/σ²) x_h^(t) x_h^(t)ᵀ ] .

5.2. Natural Actor-Critic

In episodic Natural Actor-Critic (eNAC), the stochastic policy for task t is chosen in a similar fashion to that of REINFORCE: π(a_h^(t) | x_h^(t)) ∼ N(θ^(t)ᵀ x_h^(t), σ²). The change in the probability distribution is measured by a KL divergence that is approximated using a second-order expansion incorporating the Fisher information matrix. Accordingly, the gradient follows:

    ∇̃_θ J = G⁻¹(θ) ∇_θ J(θ) ,

where G(θ) denotes the Fisher information matrix. The Hessian can be computed in a similar manner to the previous section. For details, see Peters & Schaal (2008).

6. Theoretical Results & Computational Cost

Here, we provide theoretical results establishing that PG-ELLA converges and that the cost (in terms of model performance) of making the simplification from Section 4.2.1 is asymptotically negligible. We proceed by first stating theoretical results from Ruvolo & Eaton (2013), and then show that these results apply directly to PG-ELLA with minimal modifications. First, we define:

    ĝ_T(L) = (1/T) Σ_{t=1}^{T} ℓ(L, s^(t), α^(t), Γ^(t)) + λ ‖L‖²_F .

Recall from Section 4.2.1 that the preceding equation specifies the cost of basis L if we leave the s^(t)'s fixed (i.e., we only update them when we receive training data for that particular task). We are now ready to state the two results from Ruvolo & Eaton (2013):

Proposition 1: The latent basis becomes more stable over time at a rate of ‖L_{T+1} − L_T‖ = O(1/T).

Proposition 2: (1) ĝ_T(L_T) converges almost surely; (2) ĝ_T(L_T) − ê_T(L_T) converges almost surely to 0.

Proposition 2 establishes that the algorithm converges to a fixed per-task loss on both the approximate objective function ĝ_T and the objective
function that does not contain the simplification from Section 4.2.1. Further, Prop. 2 establishes that these two functions converge to the same value. The consequence of this last point is that PG-ELLA does not incur any penalty (in terms of average per-task loss) for making the simplification from Section 4.2.1. The two propositions require the following assumptions:

1. The tuples (Γ^(t), α^(t)) are drawn i.i.d. from a distribution with compact support, bounding the entries of Γ^(t) and α^(t).
2. For all L, Γ^(t), and α^(t), the smallest eigenvalue of L_γᵀ Γ^(t) L_γ is at least κ, with κ > 0, where γ is the subset of non-zero indices of the vector s^(t) = arg min_s ‖α^(t) − L s‖²_{Γ^(t)}. In this case, the non-zero elements of the unique minimizing s^(t) are given by:

    s_γ^(t) = ( L_γᵀ Γ^(t) L_γ )⁻¹ ( L_γᵀ Γ^(t) α^(t) − μ ε_γ ) ,

where ε_γ is a vector containing the signs of the non-zero entries of s^(t).

The second assumption is a mild condition on the uniqueness of the sparse-coding solution. The first assumption can be verified by assuming that there is no sequential dependency of one task to the next. Additionally, the fact that Γ^(t) is contained in a compact region can be verified for the episodic REINFORCE algorithm by examining the form of the Hessian and requiring that the time horizon H^(t) is finite. Using a similar argument, we can see that the magnitude of the gradient for episodic REINFORCE is also bounded when H^(t) is finite. If we then assume that we make a finite number of updates for each task model, we can ensure that the sum of all gradient updates is finite, thus guaranteeing that α^(t) is contained in a compact region.

Computational Complexity: Each update begins by running a step of policy gradient to update α^(t) and Γ^(t). We assume that the cost of the policy gradient update is O(ξ(d, n^(t))), where the specific cost depends on the particular policy gradient algorithm employed and n^(t) is the number of trajectories obtained for task t at the current iteration. To complete the analysis, we use a result from Ruvolo & Eaton (2013) that the cost of updating L and s^(t) is O(k² d³). This gives an overall cost of O(k² d³ + ξ(d, n^(t))) for each
update.

7. Evaluation

We applied PG-ELLA to learn control policies for the four dynamical systems shown in Figure 1, including three mechanical systems and an application to quadrotor control. We generated multiple tasks by varying the parameterization of each system, yielding a set of tasks from each domain with varying dynamics. For example, the simple mass spring damper system exhibits significantly higher oscillations as the spring constant increases. Notably, the optimal policies for controlling these systems vary significantly even for only slight variations in the system parameters.

Figure 1. The four dynamical systems: (a) simple mass spring damper (top left), (b) cart-pole (top right), (c) three-link inverted pendulum (bottom left), and (d) quadrotor (bottom right).

Table 1. System parameter ranges used in the experiments.
    SM:        k ∈ [1, 10]       d ∈ [0.01, 0.2]    m ∈ [0.5, 5]
    CP & 3CP:  m_c ∈ [0.5, 1.5]  m_p ∈ [0.1, 0.2]   l ∈ [0.2, 0.8]   d ∈ [0.01, 0.09]
    3CP:       l_1 ∈ [0.3, 0.5]  l_2 ∈ [0.2, 0.4]   l_3 ∈ [0.1, 0.3]
               d_1 ∈ [0.1, 0.2]  d_2 ∈ [0.01, 0.02] d_3 ∈ [0.1, 0.2]  I_i ∈ [10⁻⁶, 10⁻⁴]

7.1. Benchmark Dynamical Systems

We evaluated PG-ELLA on three benchmark dynamical systems. In each domain, the distance between the current state and the goal position was used as the reward function.

Simple Mass Spring Damper: The simple mass (SM) system is characterized by three parameters: the spring constant k (in N/m), the damping constant d (in Ns/m), and the mass m (in kg). The system's state is given by the position x and velocity ẋ of the mass, which vary according to a linear force F. The goal is to design a policy for controlling the mass to be in a specific state g_ref = ⟨x_ref, ẋ_ref⟩. In our experiments, the goal state varied from g_ref = ⟨0, 0⟩ to g_ref = ⟨i, 0⟩, where i ∈ {1, 2, ..., 5}.

Cart-Pole: The cart-pole (CP) system has been used extensively as a benchmark for evaluating RL algorithms (Buşoniu et al., 2010). CP dynamics are characterized by the cart's mass m_c (in kg), the pole's mass m_p (in kg), the pole's length l (in meters), and a damping parameter d (in Ns/m). The state is characterized by the position x and velocity ẋ of the cart, as well as the angle θ and angular velocity θ̇ of the pole. The goal is to design a policy capable of controlling the pole in an upright position.

Three-Link Inverted Pendulum: The three-link CP (3CP) is a highly non-linear system that is difficult to control. The goal is to balance three connected rods in an upright
position by moving the cart. The dynamics are parameterized by the mass of the cart m_c, rod masses m_{p,i}, lengths l_i, inertias I_i, and damping parameters d_i, where i ∈ {1, 2, 3} indexes the three rods. The system's state is characterized by an eight-dimensional vector, consisting of the position x and velocity ẋ of the cart, and the angle θ_i and angular velocity θ̇_i of each rod.

7.1.1. Experimental Protocol

We first generated 30 tasks for each domain by varying the system parameters over the ranges given in Table 1. These parameter ranges were chosen to ensure a variety of tasks, including those that were difficult to control, with highly chaotic dynamics. We then randomized the task order (with repetition), and PG-ELLA acquired a limited amount of experience in each task consecutively, updating L and the s^(t)'s after each session. At each learning session, PG-ELLA was limited to 50 trajectories (for SM & CP) or 20 trajectories (for 3CP), with 150 time steps each, to perform the update. Learning ceased once PG-ELLA had experienced at least one session with each task. To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks. The step size for each task domain was determined by a line search after gathering 10 trajectories of length 150.

To evaluate the learned basis at any point in time, we initialized policies for each task using θ^(t) = L s^(t) for t ∈ {1, ..., T}. Starting from these initializations, learning on each task commenced using eNAC. The number of trajectories varied among the domains, from a minimum of 20 on the simple mass system to a maximum of 50 on the quadrotors. The length of each of these trajectories was set to 150 time steps across all domains. We measured performance using the average reward computed over 50 episodes of 150 time steps, and compared this to standard eNAC running independently with the same settings.

7.1.2. Results on the Benchmark Systems

Figure 2 compares PG-ELLA to standard policy gradient
learning using eNAC, showing the average performance on all tasks versus the number of learning iterations. PG-ELLA clearly outperforms standard eNAC in both initial and final performance on all task domains, demonstrating significantly improved performance from MTL. We evaluated PG-ELLA's performance on all tasks using the basis L learned after observing various subsets of tasks, from observing only three tasks (10%) to observing all 30 tasks (100%). These experiments assessed the quality of the learned basis L on both known and unknown tasks, showing that performance increases as PG-ELLA
learns more tasks. When a particular task was not observed, the most recent L with a zero initialization of s^(t) was used. To assess the difference in the total number of trajectories between PG-ELLA and eNAC, we also tried giving eNAC an additional 50 trajectories of length 150 time steps at each iteration; however, its overall performance did not change.

Figure 2. The performance of PG-ELLA versus standard policy gradients (eNAC) on the benchmark dynamical systems: (a) simple mass spring damper, (b) cart-pole, and (c) three-link inverted pendulum. Each plot shows average reward versus iterations for PG-ELLA after observing 10%, 30%, 50%, and 100% of the tasks, compared against standard policy gradients.

7.2. Quadrotor Control

We also evaluated PG-ELLA on an application to quadrotor control, providing a more challenging domain. The quadrotor system is illustrated in Figure 1, with dynamics influenced by inertial constants around e_{1,B}, e_{2,B}, and e_{3,B}, thrust factors influencing how the rotors' speed affects the overall variation of the system's state, and the length of the rods supporting the rotors. Although the overall state of the system can be described by a nine-dimensional vector, we focus on stability and so consider only six of these state variables. The quadrotor system has a high-dimensional action space, where the goal is to control the four rotational velocities w_1, ..., w_4 of the rotors to stabilize the system. To ensure realistic dynamics, we used the simulated model described by Bouabdallah (2007), which has been verified on, and used in the control of, a physical quadrotor. To produce multiple tasks, we generated 15 quadrotor systems by varying each of: the inertia
around the x-axis I_xx ∈ [4.5e−3, 6.5e−3], the inertia around the y-axis I_yy ∈ [4.2e−3, 5.2e−3], the inertia around the z-axis I_zz ∈ [1.5e−2, 2.1e−2], and the length of the arms l ∈ [0.27, 0.3]. In each case, these parameter values have been used by Bouabdallah (2007) to describe physical quadrotors. We used a linear quadratic regulator, as described by Bouabdallah, to initialize the policies in both the learning (i.e., determining L and s^(t)) and testing (i.e., comparing to standard policy gradients) phases. We followed a similar experimental procedure to evaluate PG-ELLA on quadrotor control, where we used 50 trajectories of 150 time steps to perform an eNAC policy gradient update each learning session.

Figure 3 compares PG-ELLA to standard policy gradients (eNAC) on quadrotor control.

Figure 3. Performance on quadrotor control (average reward versus iterations for PG-ELLA after observing 10%, 30%, 50%, and 100% of the tasks, compared against standard policy gradients).

As on the benchmark systems, we see that PG-ELLA clearly outperforms standard eNAC in both initial and final performance, and this performance increases as PG-ELLA learns more tasks. The final performance of the policy learned by PG-ELLA after observing all tasks is significantly better than the policy learned using standard policy gradients, showing the benefits of knowledge transfer between tasks. Most importantly for practical applications, by using the basis L learned over previous tasks, PG-ELLA can achieve high performance in a new task much more quickly (with fewer trajectories) than standard policy gradient methods.

8. Conclusion & Future Work

PG-ELLA provides an efficient mechanism for online MTL of SDM tasks while providing improved performance over standard policy gradient methods. By supporting knowledge transfer between tasks via a shared latent basis, PG-ELLA is also able to rapidly learn policies for new tasks, providing the ability for an agent to rapidly adapt to new situations. In future work, we intend to explore the potential for cross-domain
transfer wit PGELLA Acknowledgements is work was partially supported by ONR N , AFOSR FA , and NSF IIS We tank te reviewers for teir elpful suggestions
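To make the shared-basis idea above concrete, the following is a minimal illustrative sketch (not the authors' implementation): in PG-ELLA each task's policy parameters are factored as theta_t = L s_t, where L is a latent basis shared across tasks and s_t is a task-specific coefficient vector. The dimensions, the random basis, and the use of NumPy here are assumptions purely for illustration; the sketch shows how a zero-initialized s_t, as used for unobserved tasks in the experiments, reconstructs a starting policy from the basis.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the paper):
d = 6   # number of policy parameters per task
k = 3   # number of latent basis components

rng = np.random.default_rng(0)

# L plays the role of the shared latent basis learned over earlier tasks;
# here it is random only so the sketch is runnable.
L = rng.standard_normal((d, k))

def policy_params(L, s):
    """Reconstruct a task's policy parameters from the shared basis:
    theta_t = L @ s_t."""
    return L @ s

# For a task that has not yet been observed, s_t is zero-initialized,
# so the reconstructed parameter vector starts at zero and is then
# refined by policy-gradient updates as trajectories for that task arrive.
s_new = np.zeros(k)
theta_new = policy_params(L, s_new)
print(theta_new.shape)  # (6,); the vector itself is all zeros
```

In the full algorithm the coefficients s_t are fit (sparsely) to each task's policy-gradient solution and L is updated online, which is what lets a new task start from knowledge accumulated across previous tasks rather than from scratch.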
References

Bócsi, B., Csató, L., and Peters, J. Alignment-based transfer learning for robot models. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013.

Bou Ammar, H., Taylor, M.E., Tuyls, K., Driessens, K., and Weiss, G. Reinforcement learning transfer via sparse coding. In Proceedings of the 11th Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.

Bouabdallah, S. Design and control of quadrotors with application to autonomous flying. PhD thesis, École polytechnique fédérale de Lausanne, 2007.

Buşoniu, L., Babuška, R., De Schutter, B., and Ernst, D. Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, Boca Raton, Florida, 2010.

Daniel, C., Neumann, G., Kroemer, O., and Peters, J. Learning sequential motor tasks. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA), 2013.

Deisenroth, M.P., Englert, P., Peters, J., and Fox, D. Multi-task policy search for robotics. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014.

Fernández, F. and Veloso, M. Learning domain structure through probabilistic policy reuse in reinforcement learning. Progress in AI, 2(1):13-27, 2013.

Kober, J. and Peters, J. Policy search for motor primitives in robotics. Machine Learning, 84(1-2), July 2011.

Kumar, A. and Daumé III, H. Learning task grouping and overlap in multi-task learning. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.

Kupcsik, A.G., Deisenroth, M.P., Peters, J., and Neumann, G. Data-efficient generalization of robot skills with contextual policy search. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2013.

Lazaric, A. and Ghavamzadeh, M. Bayesian multi-task reinforcement learning. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.

Li, H., Liao, X., and Carin, L. Multi-task reinforcement learning in partially observable stochastic environments. Journal of Machine Learning Research, 10, 2009.

Liu, Y. and Stone, P. Value-function-based transfer for reinforcement learning using structure mapping. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.

Maurer, A., Pontil, M., and Romera-Paredes, B. Sparse coding for multi-task and transfer learning. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

Peters, J. and Bagnell, J.A. Policy gradient methods. Encyclopedia of Machine Learning, 2010.

Peters, J. and Schaal, S. Applying the episodic natural actor-critic architecture to motor primitive learning. In Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN), 2007.

Peters, J. and Schaal, S. Natural actor-critic. Neurocomputing, 2008.

Rai, P. and Daumé III, H. Infinite predictor subspace models for multi-task learning. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010.

Ruvolo, P. and Eaton, E. ELLA: An efficient lifelong learning algorithm. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

Sutton, R.S., McAllester, D.A., Singh, S.P., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems (NIPS), 1999.

Taylor, M.E. and Stone, P. Transfer learning for reinforcement learning domains: a survey. Journal of Machine Learning Research, 10, 2009.

Taylor, M.E., Whiteson, S., and Stone, P. Transfer via inter-task mappings in policy search reinforcement learning. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.

Taylor, M.E., Kuhlmann, G., and Stone, P. Autonomous transfer for reinforcement learning. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2008.

Thrun, S. and O'Sullivan, J. Discovering structure in multiple learning tasks: the TC algorithm. In Proceedings of the 13th International Conference on Machine Learning (ICML), 1996.

Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 1992.

Wilson, A., Fern, A., Ray, S., and Tadepalli, P. Multi-task reinforcement learning: a hierarchical Bayesian approach. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007.

Zhang, J., Ghahramani, Z., and Yang, Y. Flexible latent variable models for multi-task learning. Machine Learning, 73(3), 2008.