# Variational Policy Search via Trajectory Optimization


We follow prior work and define the probability of $\mathcal{O}_t$ as $p(\mathcal{O}_t = 1 \mid x_t, u_t) \propto \exp(-c(x_t, u_t))$ [19]. Using the dynamics distribution $p(x_{t+1} \mid x_t, u_t)$ and the policy $\pi_\theta(u_t \mid x_t)$, we can define a dynamic Bayesian network that relates states, actions, and the optimality indicator. By setting $\mathcal{O}_t = 1$ at all time steps and learning the maximum likelihood values for $\theta$, we can perform policy optimization [20]. The corresponding optimization problem has the objective

$$
p(\mathcal{O} \mid \theta) = \int p(\mathcal{O} \mid \zeta)\, p(\zeta \mid \theta)\, d\zeta = \int \exp\left(-\sum_{t=1}^{T} c(x_t, u_t)\right) p(x_1) \prod_{t=1}^{T} \pi_\theta(u_t \mid x_t)\, p(x_{t+1} \mid x_t, u_t)\, d\zeta. \tag{1}
$$

Although this objective differs from the classical minimum average cost objective, previous work showed that it is nonetheless useful for policy optimization and planning [20, 19]. In Section 5, we discuss in more detail how this objective relates to the classical objective.

## 3 Variational Policy Search

Following prior work [11], we can decompose $\log p(\mathcal{O} \mid \theta)$ using a variational distribution $q(\zeta)$:

$$
\log p(\mathcal{O} \mid \theta) = \mathcal{L}(q, \theta) + D_{\mathrm{KL}}\!\left(q(\zeta) \,\|\, p(\zeta \mid \mathcal{O}, \theta)\right),
$$

where the variational lower bound $\mathcal{L}$ is given by

$$
\mathcal{L}(q, \theta) = \int q(\zeta) \log \frac{p(\mathcal{O} \mid \zeta)\, p(\zeta \mid \theta)}{q(\zeta)}\, d\zeta,
$$

and the second term is the Kullback-Leibler (KL) divergence

$$
D_{\mathrm{KL}}\!\left(q(\zeta) \,\|\, p(\zeta \mid \mathcal{O}, \theta)\right) = -\int q(\zeta) \log \frac{p(\zeta \mid \mathcal{O}, \theta)}{q(\zeta)}\, d\zeta = -\int q(\zeta) \log \frac{p(\mathcal{O} \mid \zeta)\, p(\zeta \mid \theta)}{q(\zeta)\, p(\mathcal{O} \mid \theta)}\, d\zeta. \tag{2}
$$

We can then optimize the maximum likelihood objective in Equation 1 by iteratively minimizing the KL divergence with respect to $q(\zeta)$ and maximizing the bound $\mathcal{L}(q, \theta)$ with respect to $\theta$. This is the standard formulation for expectation maximization [9], and it has been applied to policy optimization in previous work [8, 21, 3, 11]. However, prior policy optimization methods typically represent $q(\zeta)$ by sampling trajectories from the current policy $\pi_\theta(u_t \mid x_t)$ and reweighting them, for example by the exponential of their cost. While this can improve policies that already visit regions of low cost, it relies on random policy-driven exploration to discover those low cost regions.
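To make the decomposition concrete, here is a small numerical check on a hypothetical discrete trajectory space (four candidate trajectories with made-up costs, not one of the paper's tasks): the bound $\mathcal{L}(q, \theta)$ and the KL divergence sum exactly to $\log p(\mathcal{O} \mid \theta)$ for any $q$.

```python
import numpy as np

# Hypothetical discrete toy: 4 candidate trajectories zeta with costs c(zeta)
# and prior probabilities p(zeta | theta). We verify the decomposition
# log p(O | theta) = L(q, theta) + KL(q || p(zeta | O, theta))
# for an arbitrary variational distribution q(zeta).
rng = np.random.default_rng(0)
c = np.array([1.0, 0.3, 2.5, 0.9])          # made-up trajectory costs
p_traj = rng.dirichlet(np.ones(4))          # p(zeta | theta)
q = rng.dirichlet(np.ones(4))               # arbitrary q(zeta)

p_O_given_traj = np.exp(-c)                 # p(O | zeta) = exp(-c(zeta))
p_O = np.sum(p_O_given_traj * p_traj)       # Equation 1, as a discrete sum
posterior = p_O_given_traj * p_traj / p_O   # p(zeta | O, theta)

L_bound = np.sum(q * np.log(p_O_given_traj * p_traj / q))  # variational bound
kl = np.sum(q * np.log(q / posterior))                     # KL(q || posterior)

assert np.isclose(np.log(p_O), L_bound + kl)  # exact decomposition
assert L_bound <= np.log(p_O)                 # L is a lower bound
```

Because the KL term is nonnegative, maximizing $\mathcal{L}$ after driving the KL term down tightens the bound on $\log p(\mathcal{O} \mid \theta)$, which is exactly the EM alternation described above.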
We propose instead to directly optimize $q(\zeta)$ to minimize both its expected cost and its divergence from the current policy $\pi_\theta(u_t \mid x_t)$ when a model of the dynamics is available. In the next section, we show that, for a Gaussian distribution $q(\zeta)$, the KL divergence in Equation 2 can be minimized by a variant of the differential dynamic programming (DDP) algorithm [4].

## 4 Trajectory Optimization

DDP is a trajectory optimization algorithm based on Newton's method [4]. We build off of a variant of DDP called iterative LQR, which linearizes the dynamics around the current trajectory, computes the optimal linear policy under linear-quadratic assumptions, executes this policy, and repeats the process around the new trajectory until convergence [17]. We show how this procedure can be used to minimize the KL divergence in Equation 2 when $q(\zeta)$ is a Gaussian distribution over trajectories. This derivation follows previous work [10], but is repeated and expanded here for completeness.

Iterative LQR is a dynamic programming algorithm that recursively computes the value function backwards through time. Because of the linear-quadratic assumptions, the value function is always quadratic, and the dynamics are Gaussian with mean $f(x_t, u_t)$ and noise $\epsilon$. Given a trajectory $(\bar{x}_1, \bar{u}_1), \ldots, (\bar{x}_T, \bar{u}_T)$ and defining $\hat{x}_t = x_t - \bar{x}_t$ and $\hat{u}_t = u_t - \bar{u}_t$, the dynamics and cost function are approximated as follows, with subscripts $x$ and $u$ denoting partial derivatives:

$$
\hat{x}_{t+1} \approx f_{xt}\hat{x}_t + f_{ut}\hat{u}_t + \epsilon
$$

$$
c(x_t, u_t) \approx \hat{x}_t^T c_{xt} + \hat{u}_t^T c_{ut} + \frac{1}{2}\hat{x}_t^T c_{xxt}\hat{x}_t + \frac{1}{2}\hat{u}_t^T c_{uut}\hat{u}_t + \hat{u}_t^T c_{uxt}\hat{x}_t + c(\bar{x}_t, \bar{u}_t).
$$
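The linear-quadratic expansion can be formed numerically when analytic derivatives are unavailable. A minimal finite-difference sketch around a nominal point, using illustrative pendulum-like dynamics and a quadratic cost (stand-ins, not the paper's tasks):

```python
import numpy as np

# Illustrative nonlinear dynamics f and cost c; we compute the terms of the
# linear-quadratic expansion (f_xt, f_ut, c_xt, c_ut, c_xxt, c_uut) at a
# nominal point (x_bar, u_bar) by forward finite differences.
def f(x, u):
    return x + 0.1 * np.array([x[1], -np.sin(x[0]) + u[0]])

def c(x, u):
    return 0.5 * x @ x + 0.05 * u @ u

def jacobian(fun, z, eps=1e-6):
    y0 = np.atleast_1d(fun(z))
    J = np.zeros((y0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (np.atleast_1d(fun(z + dz)) - y0) / eps
    return J

x_bar, u_bar = np.array([0.2, 0.0]), np.array([0.1])
f_xt = jacobian(lambda x: f(x, u_bar), x_bar)          # df/dx at nominal point
f_ut = jacobian(lambda u: f(x_bar, u), u_bar)          # df/du
c_xt = jacobian(lambda x: c(x, u_bar), x_bar).ravel()  # cost gradients
c_ut = jacobian(lambda u: c(x_bar, u), u_bar).ravel()
# Cost Hessians: Jacobian of the gradient.
c_xxt = jacobian(lambda x: jacobian(lambda y: c(y, u_bar), x).ravel(), x_bar)
c_uut = jacobian(lambda u: jacobian(lambda v: c(x_bar, v), u).ravel(), u_bar)

# First-order check: the linearization predicts nearby dynamics.
dx, du = np.array([1e-3, -1e-3]), np.array([1e-3])
pred = f(x_bar, u_bar) + f_xt @ dx + f_ut @ du
assert np.allclose(pred, f(x_bar + dx, u_bar + du), atol=1e-5)
```

In practice one would expand around every $(\bar{x}_t, \bar{u}_t)$ along the trajectory; the check at the end confirms that the linear model matches the true dynamics to first order.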

Under this approximation, we can recursively compute the Q-function as follows:

$$
\begin{aligned}
Q_{xxt} &= c_{xxt} + f_{xt}^T V_{xxt+1} f_{xt} & Q_{xt} &= c_{xt} + f_{xt}^T V_{xt+1} \\
Q_{uut} &= c_{uut} + f_{ut}^T V_{xxt+1} f_{ut} & Q_{ut} &= c_{ut} + f_{ut}^T V_{xt+1} \\
Q_{uxt} &= c_{uxt} + f_{ut}^T V_{xxt+1} f_{xt}, &&
\end{aligned}
$$

as well as the value function and linear policy terms:

$$
\begin{aligned}
V_{xt} &= Q_{xt} - Q_{uxt}^T Q_{uut}^{-1} Q_{ut} & k_t &= -Q_{uut}^{-1} Q_{ut} \\
V_{xxt} &= Q_{xxt} - Q_{uxt}^T Q_{uut}^{-1} Q_{uxt} & K_t &= -Q_{uut}^{-1} Q_{uxt}.
\end{aligned}
$$

The deterministic optimal policy is then given by

$$
g(x_t) = \bar{u}_t + k_t + K_t(x_t - \bar{x}_t).
$$

By repeatedly computing the optimal policy around the current trajectory and updating $\bar{x}_t$ and $\bar{u}_t$ based on the new policy, iterative LQR converges to a locally optimal solution [17].

In order to use this algorithm to minimize the KL divergence in Equation 2, we introduce a modified cost function $\tilde{c}(x_t, u_t) = c(x_t, u_t) - \log \pi_\theta(u_t \mid x_t)$. The optimal trajectory for this cost function approximately¹ minimizes the KL divergence when $q(\zeta)$ is a Dirac delta function, since

$$
D_{\mathrm{KL}}\!\left(q(\zeta) \,\|\, p(\zeta \mid \mathcal{O}, \theta)\right) = \int q(\zeta) \left[\sum_{t=1}^{T} c(x_t, u_t) - \log \pi_\theta(u_t \mid x_t) - \log p(x_{t+1} \mid x_t, u_t)\right] d\zeta + \text{const}.
$$

However, we can also obtain a Gaussian $q(\zeta)$ by using the framework of linearly solvable MDPs [16] and the closely related concept of maximum entropy control [23]. The optimal policy $\pi^G$ under this framework minimizes an augmented cost function, given by $\bar{c}(x_t, u_t) = \tilde{c}(x_t, u_t) - \mathcal{H}(\pi^G)$, where $\mathcal{H}(\pi^G)$ is the entropy of a stochastic policy $\pi^G(u_t \mid x_t)$, and $\tilde{c}(x_t, u_t)$ includes $-\log \pi_\theta(u_t \mid x_t)$ as above. Ziebart [23] showed that the optimal policy can be written as $\pi^G(u_t \mid x_t) = \exp(-Q_t(x_t, u_t) + V_t(x_t))$, where $V$ is a softened value function given by

$$
V_t(x_t) = -\log \int \exp\left(-Q_t(x_t, u_t)\right) du_t.
$$

Under linear dynamics and quadratic costs, $V$ has the same form as in the LQR derivation above, which means that $\pi^G(u_t \mid x_t)$ is a linear Gaussian with mean $g(x_t)$ and covariance $Q_{uut}^{-1}$ [10].
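As an illustrative sketch (not the paper's implementation), the backward pass above fits in a few lines of NumPy for a time-invariant double integrator with quadratic cost and $c_{uxt} = 0$; the maximum entropy policy covariance is then $Q_{uut}^{-1}$:

```python
import numpy as np

# LQR backward pass for a double integrator (illustrative, made-up system).
# Computes the Q-terms, value function, gains k_t, K_t, and the maximum
# entropy Gaussian policy covariance inv(Q_uut) at each step.
T = 20
f_x = np.array([[1.0, 0.1], [0.0, 1.0]])   # constant linearized dynamics
f_u = np.array([[0.0], [0.1]])
c_xx, c_uu = np.eye(2), 0.1 * np.eye(1)    # quadratic cost Hessians
c_x, c_u = np.zeros(2), np.zeros(1)        # gradients at the nominal trajectory

V_x, V_xx = c_x.copy(), c_xx.copy()        # terminal value function
ks, Ks, Sigmas = [], [], []
for t in reversed(range(T - 1)):
    Q_xx = c_xx + f_x.T @ V_xx @ f_x
    Q_uu = c_uu + f_u.T @ V_xx @ f_u
    Q_ux = f_u.T @ V_xx @ f_x              # c_uxt = 0 in this example
    Q_x = c_x + f_x.T @ V_x
    Q_u = c_u + f_u.T @ V_x
    Q_uu_inv = np.linalg.inv(Q_uu)
    k, K = -Q_uu_inv @ Q_u, -Q_uu_inv @ Q_ux
    V_x = Q_x - Q_ux.T @ Q_uu_inv @ Q_u
    V_xx = Q_xx - Q_ux.T @ Q_uu_inv @ Q_ux
    ks.append(k); Ks.append(K); Sigmas.append(Q_uu_inv)  # policy covariance

ks, Ks, Sigmas = ks[::-1], Ks[::-1], Sigmas[::-1]
# Sanity check: near the start of the horizon, the feedback gain stabilizes
# the system (closed-loop eigenvalues strictly inside the unit circle).
eig = np.abs(np.linalg.eigvals(f_x + f_u @ Ks[0]))
assert np.all(eig < 1.0)
```

Sampling actions from $\mathcal{N}(g(x_t), Q_{uut}^{-1})$ instead of taking the deterministic $g(x_t)$ is what turns this pass into the maximum entropy policy $\pi^G$ described above.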
Together with the linearized dynamics, the resulting policy specifies a Gaussian distribution over trajectories with Markovian independence:

$$
q(\zeta) = p(x_1) \prod_{t=1}^{T} \pi^G(u_t \mid x_t)\, p(x_{t+1} \mid x_t, u_t),
$$

where $\pi^G(u_t \mid x_t) = \mathcal{N}(g(x_t), Q_{uut}^{-1})$, $p(x_1)$ is an initial state distribution, and $p(x_{t+1} \mid x_t, u_t) = \mathcal{N}(f_{xt}\hat{x}_t + f_{ut}\hat{u}_t + \bar{x}_{t+1}, \Sigma_{ft})$ is the linearized dynamics with Gaussian noise $\Sigma_{ft}$. This distribution also corresponds to a Laplace approximation for $p(\zeta \mid \mathcal{O}, \theta)$, which is formed from the exponential of the second order Taylor expansion of $\log p(\zeta \mid \mathcal{O}, \theta)$ [15].

Once we compute $\pi^G(u_t \mid x_t)$ using iterative LQR/DDP, it is straightforward to obtain the marginal distributions $q(x_t)$, which will be useful in the next section for maximizing the variational bound $\mathcal{L}(q, \theta)$. Using $\mu_t$ and $\Sigma_t$ to denote the mean and covariance of the marginal at time $t$ and assuming that the initial state distribution at $t = 1$ is given, the marginals can be computed recursively as

$$
\mu_{t+1} = \begin{bmatrix} f_{xt} & f_{ut} \end{bmatrix} \begin{bmatrix} \mu_t \\ \bar{u}_t + k_t + K_t(\mu_t - \bar{x}_t) \end{bmatrix}
$$

$$
\Sigma_{t+1} = \begin{bmatrix} f_{xt} & f_{ut} \end{bmatrix} \begin{bmatrix} \Sigma_t & \Sigma_t K_t^T \\ K_t \Sigma_t & Q_{uut}^{-1} + K_t \Sigma_t K_t^T \end{bmatrix} \begin{bmatrix} f_{xt} & f_{ut} \end{bmatrix}^T + \Sigma_{ft}.
$$

¹ The minimization is not exact if the dynamics $p(x_{t+1} \mid x_t, u_t)$ are not deterministic, but the result is very close if the dynamics have much lower entropy than the policy and exponentiated cost, which is often the case.
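The marginal recursion above can be checked directly. A minimal 1D sketch (with made-up gains and noise levels, not values from the paper) propagates the closed-form marginals and compares them against brute-force sampling of the same linear-Gaussian policy and dynamics:

```python
import numpy as np

# 1D linear-Gaussian system with a linear-Gaussian policy; illustrative
# parameters. Closed-form marginals vs. Monte Carlo rollouts of the same
# process should agree.
rng = np.random.default_rng(1)
T = 5
f_x, f_u = np.array([[0.9]]), np.array([[0.5]])
K_t, k_t = np.array([[-0.4]]), np.array([0.2])   # made-up policy terms
Q_uu_inv = np.array([[0.05]])                    # policy covariance
Sigma_f = np.array([[0.01]])                     # dynamics noise
x_bar, u_bar = np.zeros(1), np.zeros(1)          # nominal trajectory

mu, Sigma = np.zeros(1), np.array([[0.2]])       # initial state marginal
for _ in range(T):
    act_mean = u_bar + k_t + K_t @ (mu - x_bar)
    mu = f_x @ mu + f_u @ act_mean
    joint = np.block([[Sigma, Sigma @ K_t.T],
                      [K_t @ Sigma, Q_uu_inv + K_t @ Sigma @ K_t.T]])
    F = np.hstack([f_x, f_u])
    Sigma = F @ joint @ F.T + Sigma_f

# Monte Carlo rollouts of the identical process.
xs = rng.normal(0.0, np.sqrt(0.2), size=100000)
for _ in range(T):
    us = (k_t[0] + K_t[0, 0] * xs) + rng.normal(0, np.sqrt(Q_uu_inv[0, 0]), xs.shape)
    xs = f_x[0, 0] * xs + f_u[0, 0] * us + rng.normal(0, np.sqrt(Sigma_f[0, 0]), xs.shape)

assert abs(xs.mean() - mu[0]) < 0.01
assert abs(xs.var() - Sigma[0, 0]) < 0.01
```

The block matrix inside the covariance update is the joint covariance of $(x_t, u_t)$ under the linear-Gaussian policy, which the linearized dynamics then map forward.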

**Algorithm 1: Variational Guided Policy Search**

1. Initialize $q(\zeta)$ using DDP with cost $\tilde{c}(x_t, u_t) = \alpha_0 c(x_t, u_t)$
2. **for** iteration $k = 1$ to $K$ **do**
   1. Compute marginals $(\mu_1, \Sigma_1), \ldots, (\mu_T, \Sigma_T)$ for $q(\zeta)$
   2. Optimize $\mathcal{L}(q, \theta)$ with respect to $\theta$ using standard nonlinear optimization methods
   3. Set $\alpha_k$ based on an annealing schedule, for example $\alpha_k = \exp\left(\frac{K-k}{K}\log\alpha_0 + \frac{k}{K}\log\alpha_K\right)$
   4. Optimize $q(\zeta)$ using DDP with cost $\tilde{c}(x_t, u_t) = \alpha_k c(x_t, u_t) - \log \pi_\theta(u_t \mid x_t)$
3. **end for**
4. Return the optimized policy $\pi_\theta(u_t \mid x_t)$

When the dynamics are nonlinear or the modified cost $\tilde{c}(x_t, u_t)$ is nonquadratic, this solution only approximates the minimum of the KL divergence. In practice, the approximation is quite good when the dynamics and the cost $c(x_t, u_t)$ are smooth. Unfortunately, the policy term $\log \pi_\theta(u_t \mid x_t)$ in the modified cost $\tilde{c}(x_t, u_t)$ can be quite jagged early in the optimization, particularly for nonlinear policies. To mitigate this issue, we compute the derivatives of the policy not only along the current trajectory, but also at samples drawn from the current marginals $q(x_t)$, and average them together. This averages out local perturbations in $\log \pi_\theta(u_t \mid x_t)$ and improves the approximation. In Section 8, we discuss more sophisticated techniques that could be used in future work to handle highly nonlinear dynamics for which this approximation may be inadequate.

## 5 Variational Guided Policy Search

The variational guided policy search (variational GPS) algorithm alternates between minimizing the KL divergence in Equation 2 with respect to $q(\zeta)$, as described in the previous section, and maximizing the bound $\mathcal{L}(q, \theta)$ with respect to the policy parameters $\theta$. Minimizing the KL divergence reduces the difference between $\mathcal{L}(q, \theta)$ and $\log p(\mathcal{O} \mid \theta)$, so that the maximization of $\mathcal{L}(q, \theta)$ becomes a progressively better approximation to the maximization of $\log p(\mathcal{O} \mid \theta)$. The method is summarized in Algorithm 1.
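The annealing schedule in Algorithm 1 interpolates log-linearly (i.e., geometrically) between $\alpha_0$ and $\alpha_K$. A quick numerical sketch with illustrative endpoint values (the actual $\alpha_0$, $\alpha_K$ are task-dependent and not specified here):

```python
import numpy as np

# Geometric annealing between alpha_0 and alpha_K, as in Algorithm 1.
# The endpoint values 10.0 and 1.0 are illustrative assumptions.
def alpha_schedule(k, K, alpha_0=10.0, alpha_K=1.0):
    return np.exp((K - k) / K * np.log(alpha_0) + k / K * np.log(alpha_K))

K = 10
alphas = [alpha_schedule(k, K) for k in range(K + 1)]
assert np.isclose(alphas[0], 10.0) and np.isclose(alphas[-1], 1.0)

# Log-linear interpolation means consecutive ratios are constant.
ratios = np.diff(np.log(alphas))
assert np.allclose(ratios, ratios[0])
```

Rescaling the cost by $\alpha_k$ changes how strongly the DDP step weighs task cost against matching the current policy, so annealing trades off exploration against policy agreement over the iterations.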
The bound $\mathcal{L}(q, \theta)$ can be maximized by a variety of standard optimization methods, such as stochastic gradient descent (SGD) or LBFGS. The gradient is given by

$$
\nabla \mathcal{L}(q, \theta) = \int q(\zeta) \sum_{t=1}^{T} \nabla \log \pi_\theta(u_t \mid x_t)\, d\zeta \approx \frac{1}{M} \sum_{i=1}^{M} \sum_{t=1}^{T} \nabla \log \pi_\theta(u_t^i \mid x_t^i), \tag{3}
$$

where the samples $(x_t^i, u_t^i)$ are drawn from the marginals $q(x_t, u_t)$. When using SGD, new samples can be drawn at every iteration, since sampling from $q(x_t, u_t)$ only requires the precomputed marginals from the preceding section. Because the marginals are computed using linearized dynamics, we can be assured that the samples will not deviate drastically from the optimized trajectory, regardless of the true dynamics. The resulting SGD optimization is analogous to a supervised learning task with an infinite training set. When using LBFGS, a new sample set can be generated every $n$ LBFGS iterations. We found that values of $n$ from 20 to 50 produced good results.

When choosing the policy class, it is common to use deterministic policies with additive Gaussian noise. In this case, we can optimize the policy more quickly and with many fewer samples by sampling only states and evaluating the integral over actions analytically. Letting $\mu_\theta^{x_t}, \Sigma_\theta^{x_t}$ and $\mu_q^{x_t}, \Sigma_q^{x_t}$ denote the means and covariances of $\pi_\theta(u_t \mid x_t)$ and $q(u_t \mid x_t)$, we can write $\mathcal{L}(q, \theta)$ as

$$
\mathcal{L}(q, \theta) \approx \frac{1}{M} \sum_{i=1}^{M} \sum_{t=1}^{T} \int q(u_t \mid x_t^i) \log \pi_\theta(u_t \mid x_t^i)\, du_t + \text{const}
$$

$$
= \frac{1}{M} \sum_{i=1}^{M} \sum_{t=1}^{T} \left[ -\frac{1}{2}\left(\mu_\theta^{x_t^i} - \mu_q^{x_t^i}\right)^T \Sigma_\theta^{x_t^i\,-1} \left(\mu_\theta^{x_t^i} - \mu_q^{x_t^i}\right) - \frac{1}{2}\log\left|\Sigma_\theta^{x_t^i}\right| - \frac{1}{2}\operatorname{tr}\left(\Sigma_\theta^{x_t^i\,-1} \Sigma_q^{x_t^i}\right) \right] + \text{const}.
$$

Two additional details should be taken into account in order to obtain the best results. First, although model-based trajectory optimization is more powerful than random exploration, complex tasks such as bipedal locomotion, which we address in the following section, are too difficult to solve entirely with trajectory optimization. To solve such tasks, we can initialize the procedure from a good initial
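The analytic form above is the negative cross-entropy between two Gaussians. A quick numerical check on illustrative 2D parameters (hypothetical values, not from the experiments), against a Monte Carlo estimate of the same integral:

```python
import numpy as np

# E_{q(u)}[log pi_theta(u)] for two Gaussians: closed form vs. Monte Carlo.
# All parameter values below are illustrative assumptions.
rng = np.random.default_rng(2)
mu_q, Sigma_q = np.array([0.5, -0.2]), np.array([[0.3, 0.1], [0.1, 0.2]])
mu_th, Sigma_th = np.array([0.1, 0.1]), np.array([[0.4, 0.0], [0.0, 0.5]])

d = mu_th - mu_q
Si = np.linalg.inv(Sigma_th)
# Closed form, keeping the -0.5*log det(2*pi*Sigma_th) term that the text
# absorbs into the constant.
analytic = (-0.5 * d @ Si @ d
            - 0.5 * np.log(np.linalg.det(2 * np.pi * Sigma_th))
            - 0.5 * np.trace(Si @ Sigma_q))

# Monte Carlo: sample actions from q, average log pi_theta.
us = rng.multivariate_normal(mu_q, Sigma_q, size=200000)
diffs = us - mu_th
log_pi = (-0.5 * np.einsum('ni,ij,nj->n', diffs, Si, diffs)
          - 0.5 * np.log(np.linalg.det(2 * np.pi * Sigma_th)))
mc = log_pi.mean()
assert abs(analytic - mc) < 0.02
```

Because the integral over actions is exact, only the states $x_t^i$ need to be sampled, which is what makes this variant cheaper than the fully sampled gradient in Equation 3.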

assumption that the oracle can provide good actions in all states. This assumption does not hold for DDP, which is only valid in a narrow region around the trajectory. To mitigate the locality of the DDP solution, we weighted the samples by their probability under the DDP marginals, which allowed DAGGER to solve the swimming task, but it was still outperformed by our method on the walking task, even with adaptation of the DDP solution. Unlike DAGGER, our approach is relatively insensitive to the instability of the learned policy, since the learned policy is not sampled.

Several prior methods also propose to improve policy search by using a distribution over high-value states, which might come from a DDP solution [6, 1]. Such methods generally use this restart distribution as a new initial state distribution, and show that optimizing a policy from such a restart distribution also optimizes the expected return. Unlike our approach, such methods only use the states from the DDP solution, not the actions, and they tend to suffer from the increased variance of the restart distribution, as shown in previous work [10].

## 8 Discussion and Future Work

We presented a policy search algorithm that employs a variational decomposition of a maximum likelihood objective to combine trajectory optimization with policy search. The variational distribution is obtained using differential dynamic programming (DDP), and the policy can be optimized with a standard nonlinear optimization algorithm. Model-based trajectory optimization effectively takes the place of random exploration, providing a much more effective means of finding low cost regions that the policy is then trained to visit. Our evaluation shows that this algorithm outperforms prior variational methods and prior methods that use trajectory optimization to guide policy search. Our algorithm has several interesting properties that distinguish it from prior methods.
First, the policy search does not need to sample the learned policy. This may be useful in real-world applications where poor policies might be too risky to run on a physical system. More generally, this property improves the robustness of our method in the face of unstable initial policies, where on-policy samples have extremely high variance. By sampling directly from the Gaussian marginals of the DDP-induced distribution over trajectories, our approach also avoids some of the issues associated with unstable dynamics, requiring only that the task permit effective trajectory optimization. By optimizing a maximum likelihood objective, our method favors policies with good best-case performance. Obtaining good best-case performance is often the hardest part of policy search, since a policy that achieves good results occasionally is easier to improve with standard on-policy search methods than one that fails outright. However, modifying the algorithm to optimize the standard average cost criterion could produce more robust controllers in the future. The use of local linearization in DDP results in only approximate minimization of the KL divergence in Equation 2 in nonlinear domains or with nonquadratic policies. While we mitigate this by averaging the policy derivatives over multiple samples from the DDP marginals, this approach could still break down in the presence of highly nonsmooth dynamics or policies. An interesting avenue for future work is to extend the trajectory optimization method to nonsmooth domains by using samples rather than linearization, perhaps analogously to the unscented Kalman filter [5, 18]. This could also avoid the need to differentiate the policy with respect to the inputs, allowing for richer policy classes to be used. 
Another interesting avenue for future work is to apply model-free trajectory optimization techniques [7], which would avoid the need for a model of the system dynamics, or to learn the dynamics from data, for example by using Gaussian processes [2]. It would also be straightforward to use multiple trajectories optimized from different initial states to learn a single policy that is able to succeed under a variety of initial conditions. Overall, we believe that trajectory optimization is a very useful tool for policy search. By separating the policy optimization and exploration problems into two separate phases, we can employ simpler algorithms, such as SGD and DDP, that are better suited for each phase, and can achieve superior performance on complex tasks. We believe that additional research into augmenting policy learning with trajectory optimization can further advance the performance of policy search techniques.

**Acknowledgments.** We thank Emanuel Todorov, Tom Erez, and Yuval Tassa for providing the simulator used in our experiments. Sergey Levine was supported by NSF Graduate Research Fellowship DGE

## References

[1] A. Bagnell, S. Kakade, A. Ng, and J. Schneider. Policy search by dynamic programming. In Advances in Neural Information Processing Systems (NIPS).
[2] M. Deisenroth and C. Rasmussen. PILCO: a model-based and data-efficient approach to policy search. In International Conference on Machine Learning (ICML).
[3] T. Furmston and D. Barber. Variational methods for reinforcement learning. Journal of Machine Learning Research, 9.
[4] D. Jacobson and D. Mayne. Differential Dynamic Programming. Elsevier.
[5] S. Julier and J. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In International Symposium on Aerospace/Defense Sensing, Simulation, and Control.
[6] S. Kakade and J. Langford. Approximately optimal approximate reinforcement learning. In International Conference on Machine Learning (ICML).
[7] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal. STOMP: stochastic trajectory optimization for motion planning. In International Conference on Robotics and Automation.
[8] J. Kober and J. Peters. Learning motor primitives for robotics. In International Conference on Robotics and Automation.
[9] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
[10] S. Levine and V. Koltun. Guided policy search. In International Conference on Machine Learning (ICML).
[11] G. Neumann. Variational inference for policy search in changing situations. In International Conference on Machine Learning (ICML).
[12] J. Peters and S. Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4).
[13] K. Rawlik, M. Toussaint, and S. Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. In Robotics: Science and Systems.
[14] S. Ross, G. Gordon, and A. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research, 15.
[15] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393):82–86.
[16] E. Todorov. Policy gradients in linearly-solvable MDPs. In Advances in Neural Information Processing Systems (NIPS 23).
[17] E. Todorov and W. Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In American Control Conference.
[18] E. Todorov and Y. Tassa. Iterative local dynamic programming. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[19] M. Toussaint. Robot trajectory optimization using approximate inference. In International Conference on Machine Learning (ICML).
[20] M. Toussaint, L. Charlin, and P. Poupart. Hierarchical POMDP controller optimization by likelihood maximization. In Uncertainty in Artificial Intelligence (UAI).
[21] N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis. Learning model-free robot control by a Monte Carlo EM algorithm. Autonomous Robots, 27(2).
[22] K. Yin, K. Loken, and M. van de Panne. SIMBICON: simple biped locomotion control. ACM Transactions on Graphics, 26(3).
[23] B. Ziebart. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University.


### Local Gaussian Process Regression for Real Time Online Model Learning and Control

Local Gaussian Process Regression for Real Time Online Model Learning and Control Duy Nguyen-Tuong Jan Peters Matthias Seeger Max Planck Institute for Biological Cybernetics Spemannstraße 38, 776 Tübingen,

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

### Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard

Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

### Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,

### Sanjeev Kumar. contribute

RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### A Behavior Based Kernel for Policy Search via Bayesian Optimization

via Bayesian Optimization Aaron Wilson WILSONAA@EECS.OREGONSTATE.EDU Alan Fern AFERN@EECS.OREGONSTATE.EDU Prasad Tadepalli TADEPALL@EECS.OREGONSTATE.EDU Oregon State University School of EECS, 1148 Kelley

### Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

### Visualization by Linear Projections as Information Retrieval

Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland jaakko.peltonen@tkk.fi

### Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### TD(0) Leads to Better Policies than Approximate Value Iteration

TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract

### Short Presentation. Topic: Locomotion

CSE 888.14 Advanced Computer Animation Short Presentation Topic: Locomotion Kang che Lee 2009 Fall 1 Locomotion How a character moves from place to place. Optimal gait and form for animal locomotion. K.

Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### Reinforcement Learning and Motion Planning

Reinforcement Learning and Motion Planning University of Southern California August 25, 2010 Reinforcement Learning Holy grail of learning for robotics Curse of dimensionality... Trajectory-based RL High

### Bargaining Solutions in a Social Network

Bargaining Solutions in a Social Network Tanmoy Chakraborty and Michael Kearns Department of Computer and Information Science University of Pennsylvania Abstract. We study the concept of bargaining solutions,

### An Environment Model for N onstationary Reinforcement Learning

An Environment Model for N onstationary Reinforcement Learning Samuel P. M. Choi Dit-Yan Yeung Nevin L. Zhang pmchoi~cs.ust.hk dyyeung~cs.ust.hk lzhang~cs.ust.hk Department of Computer Science, Hong Kong

### Monotonicity Hints. Abstract

Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

### Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

### Section 5. Stan for Big Data. Bob Carpenter. Columbia University

Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model

### Automated Stellar Classification for Large Surveys with EKF and RBF Neural Networks

Chin. J. Astron. Astrophys. Vol. 5 (2005), No. 2, 203 210 (http:/www.chjaa.org) Chinese Journal of Astronomy and Astrophysics Automated Stellar Classification for Large Surveys with EKF and RBF Neural

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

### CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Chapter 4: Artificial Neural Networks

Chapter 4: Artificial Neural Networks CS 536: Machine Learning Littman (Wu, TA) Administration icml-03: instructional Conference on Machine Learning http://www.cs.rutgers.edu/~mlittman/courses/ml03/icml03/

### ELICITING a representative probability distribution from

146 IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, VOL. 53, NO. 1, FEBRUARY 2006 Entropy Methods for Joint Distributions in Decision Analysis Ali E. Abbas, Member, IEEE Abstract A fundamental step in decision

### Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

### Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation

Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation