A Behavior Based Kernel for Policy Search via Bayesian Optimization


Aaron Wilson, Alan Fern, Prasad Tadepalli
Oregon State University, School of EECS, 1148 Kelley Engineering Center, Corvallis, OR

Appearing in the ICML 2011 Workshop: Planning and Acting with Uncertain Models, Bellevue, WA, USA. Copyright 2011 by the author(s)/owner(s).

Abstract

We expand on past successes applying Bayesian Optimization (BO) to the Reinforcement Learning (RL) problem. BO is a general method for searching for the maximum of an unknown objective function, and it explicitly aims to reduce the number of samples needed to identify the optimal solution by exploiting a probabilistic model of the objective. Much work in BO has focused on Gaussian Process (GP) models of the objective, whose performance relies on the design of the kernel function relating points in the solution space. Unfortunately, previous approaches adapting ideas from BO to the RL setting have used simple kernels that are not well justified in the RL context. We show that a new kernel can be motivated by examining an upper bound on the absolute difference in expected return between policies. The resulting kernel explicitly compares the behaviors of policies in terms of their trajectory probability densities. We incorporate this behavior-based kernel into a BO algorithm for policy search. Results reported on four standard benchmark domains show that our algorithm significantly outperforms alternative state-of-the-art algorithms.

1. Introduction

In the policy search setting, RL agents seek an optimal policy within a fixed set. The agent executes a sequence of policies while searching for the true optimum, and future policy selection decisions should naturally benefit from the information available in all past samples. This raises the question of how the expected return of untried policies can be estimated from the batch of samples, and how best to use the estimated returns to perform policy search. In this work we propose explicitly constructing a probabilistic model of the expected return informed by observations of past policy behaviors. We exploit this probabilistic model of the return by selecting new policies predicted to best improve on the performance of policies in the sample set.

Our approach is based on adapting black-box Bayesian Optimization (BO) to the RL problem. BO is a method of sequentially planning a sequence of queries to an unknown objective function in order to find its maximum. It is an ideal method for tackling the basic problem of policy search, as it directly confronts the fundamental issue of trading off exploration of the objective function (global searches) with exploitation (local searches). Fundamental to the application of BO techniques is the definition of a Bayesian prior distribution for the objective function. BO searches this surrogate representation of the objective function for maximal points instead of directly querying the true objective. Ideally, by using a large number of surrogate function evaluations (trading computational resources for higher-quality samples), the true maximum can be identified with few queries to the true objective. As in most Bayesian methods, the success of the BO technique rests on the quality of the modeling effort. How should the objective, the expected return in the RL case, be effectively modeled? In this work, as in past efforts applying BO to RL, we focus on GP models of the expected return.
The generalization performance of GP models, and hence the performance of the BO technique, is strongly impacted by the definition of the kernel function, which encodes a notion of relatedness between points in the function space. When applying BO to RL this means encoding a notion of similarity between policies. Past work has used simple kernels that relate policy parameters (for instance, squared exponential kernels (Lizotte et al., 2007; Wilson et al., 2010)). Unfortunately, these kernels fail to account for the special properties of the sequential decision processes typical of RL problems. A more appropriate notion of relatedness is needed for the RL context. We propose that policies are better related by their behavior than by their parameters. Below we motivate our behavior-based kernel function. We then discuss how to incorporate the kernel into a BO approach when only a sparse sample of policy trajectories is available. Empirically, we demonstrate that the behavior-based kernel significantly improves BO and outperforms a selection of standard algorithms on four benchmark domains.

2. Problem Setting

We study the Reinforcement Learning problem in the context of Markov Decision Processes (MDPs). An MDP is described by a tuple (S, A, P, P_0, R, π). We consider processes with continuous state and action values, where each state is a vector s ∈ R^n and each action is a scalar a ∈ R. The transition function P is a probability distribution P(s_t | s_{t-1}, a_{t-1}) defining the response of the process to the agent's action selections. The distribution P_0 gives the probability of beginning in a particular state. The reward function R(s, a) returns a numeric value representing the immediate reward for a state-action pair (we do not consider stochastic reward functions). Finally, the policy π is a stochastic mapping from states to actions, P_π(a | φ(s), θ); it is a function of a vector of parameters θ ∈ R^k and features of the state φ(s).

We are interested in episodic average-reward RL. Define the trajectory density

P(ξ | θ) = P_0(s_0) ∏_{t=1}^{T} P(s_t | s_{t-1}, a_{t-1}) P_π(a_{t-1} | φ(s_{t-1}), θ),

and the value of a trajectory, R(ξ) = Σ_{t=0}^{T} R(s_t, a_t). The horizon T is assumed to have a maximum value, ensuring that all trajectories have finite length. Generalizations of our approach to the infinite-horizon case are possible, but are not the focus of this work. We define the expected return as an integral over paths,

η(θ) = ∫ R(ξ) P(ξ | θ) dξ.

The basic policy search problem is to identify the policy parameters that maximize this expectation, arg max_θ η(θ).

3. Policy Search via Bayesian Optimization

Bayesian optimization addresses the general problem of maximizing a real-valued function,

θ* = arg max_θ η(θ).

BO is a global method for tackling expensive objective functions that explicitly aims to reduce the number of evaluations needed before the maximum is found. In BO the objective function is treated as a random quantity, modeled by a probability distribution P(η), and the uncertainty encoded by this distribution is employed to select which points will be used to query the objective. The principal idea is to use a large number of surrogate evaluations to reduce the number of expensive evaluations of the true objective.

The BO method is a form of active learning. Given the objective function prior distribution, BO proceeds iteratively: a point is selected according to some criterion (a function of the posterior), the point is evaluated (a policy in our case), the posterior distribution is updated using the data, a new point is selected, and so on. As an active method of learning, the criterion for selecting new query points plays a critical role in the quality of the posterior estimate of the surface and in the speed of identifying the maximum. Any selection criterion must address the trade-off between exploration and exploitation. Because the Bayes-optimal selection criterion is computationally intractable, a heuristic method of selection must be used. A common heuristic called Maximum Expected Improvement (MEI) is the method of selection used in this work.
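As an aside, the quantities defined in Section 2 are straightforward to estimate from sampled episodes. The following minimal Python sketch is ours rather than the authors' (all function and variable names are hypothetical); it computes the return R(ξ) of one trajectory and the Monte Carlo estimate of η(θ) obtained by averaging over episodes executed with a fixed policy, which is how returns are estimated later in Algorithm 1.

```python
import numpy as np

def trajectory_return(rewards):
    """R(xi) = sum_t R(s_t, a_t): the undiscounted return of one finite-length episode."""
    return float(np.sum(rewards))

def monte_carlo_return(episode_rewards):
    """Monte Carlo estimate of eta(theta): the average return over E episodes
    generated by executing the policy with parameters theta."""
    return float(np.mean([trajectory_return(r) for r in episode_rewards]))

# Toy usage with three hypothetical episodes of per-step rewards.
episodes = [np.array([-1.0, -1.0, 100.0]),
            np.array([-1.0, -1.0, -1.0]),
            np.array([-1.0, 100.0])]
eta_hat = monte_carlo_return(episodes)  # estimate of the expected return
```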
Suppose we have a collection of n points in the θ space along with their associated objective function values, and define η_max to be the highest observed return in the data set. Consider the improvement function

I(θ) = max{0, η(θ) − η_max},

which returns the amount by which the point θ exceeds the observed maximum. MEI searches for the maximum of the expectation of this improvement function with respect to the posterior uncertainty P(η(θ) | D) given observed data D,

θ_{n+1} = arg max_θ E_{P(η(θ)|D)}[I(θ)].

Conveniently, closed-form solutions exist for the expected improvement. By incorporating the posterior uncertainty into the selection process, MEI is guaranteed to explore regions of sufficiently high uncertainty: whenever the conditional posterior distribution has sufficient probability mass above the current maximum, the expected improvement is positive, pushing the algorithm to execute experiments in new regions. Due to its empirical success the MEI criterion, originally proposed by Mockus (1994), has become the standard choice in most work on BO. Recent work has also established the convergence properties of iteratively selecting points using MEI with a GP prior (Vazquez & Bect, 2010), lending further weight to its continued use.

Crucial to the performance of the BO method is the definition of the objective function prior distribution. Our objective in the policy search problem is maximization of the expected return, and it is this quantity that we model using the GP.
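Regarding the closed form mentioned above: the paper does not reproduce it, but under a Gaussian posterior with mean μ(θ) and standard deviation σ(θ) the standard GP expected-improvement identity (stated here for completeness rather than taken from this paper) is

```latex
\mathbb{E}[I(\theta)] \;=\; \sigma(\theta)\,\big[\,z\,\Phi(z) + \phi(z)\,\big],
\qquad z = \frac{\mu(\theta) - \eta_{\max}}{\sigma(\theta)},
```

where Φ and φ denote the standard normal CDF and density, and the expected improvement is taken to be zero when σ(θ) = 0.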

GPs are defined by a mean and a covariance function,

η(θ) ~ GP(m(θ), K(θ, θ′)).

The mean m(θ) encodes prior assumptions about the underlying function space and is frequently assumed to be zero. The covariance function K encodes relationships between points in the function space. Substantial engineering effort has been devoted to developing meaningful kernel functions K(θ_i, θ_j) for a variety of domains, because of the kernel's impact on generalization performance.

Algorithm 1 Bayesian Optimization Algorithm for RL
1: Let D_{1:n} = {(η̂(θ_i), ξ_i)}_{i=1}^{n}.
2: Compute the matrix of covariances K.
3: Select the next point in the policy space to evaluate: θ_{n+1} = arg max_θ E_{P(η(θ)|D_{1:n})}[I(θ)].
4: Execute the policy θ_{n+1} for E episodes.
5: Compute the Monte Carlo estimate of the expected return, η̂(θ_{n+1}) = (1/E) Σ_{ξ ∈ ξ_{n+1}} R(ξ).
6: Update D_{1:n+1} = D_{1:n} ∪ {(η̂(θ_{n+1}), ξ_{n+1})}.
7: Return to step 2.

Consider the basic Bayesian Optimization Algorithm 1. Line 1 assumes a batch of data of the form {η̂(θ_i), ξ_i}, and we denote the full set of observations from all past policies by D_{1:n} = {η̂(θ_i), ξ_i}_{i=1}^{n}. We write η̂ to indicate a Monte Carlo estimate of the expected return for policy θ, and ξ_i indicates the set of trajectories used in that estimate. Given this data, the surface of the expected return is modeled using the GP prior. For the moment we leave aside the computation of the covariance function in line 2. To maximize the expected improvement in line 3, the GP posterior distribution P(η(θ) | D_{1:n}) must be computed. In the GP model this posterior has a simple form. Given the data D_{1:n}, let y be the vector of outputs with y_i = η̂(θ_i), and let K(θ, θ) be the covariance matrix with elements K(θ_i, θ_j). The conditional posterior distribution is then Gaussian with mean

μ(η(θ_{n+1}) | D_{1:n}) = k(θ_{n+1}, θ) K(θ, θ)^{-1} y,

and variance

σ²(η(θ_{n+1}) | D_{1:n}) = k(θ_{n+1}, θ_{n+1}) − k(θ_{n+1}, θ) K(θ, θ)^{-1} k(θ, θ_{n+1}),

where k(θ_{n+1}, θ) is the vector of similarities between the new point and all previously observed points, and k(θ, θ_{n+1}) is its transpose. It is at line 3 that the selection of the kernel function has its impact. The kernel controls how the information in the sample is generalized to new points, and therefore the quality of the points returned by the optimization. Our work is an effort to improve the generalization performance of the GP by defining a meaningful kernel for the RL context. We discuss the motivation for our kernel and its estimation below.
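To make lines 2-3 of Algorithm 1 concrete, here is a minimal NumPy/SciPy sketch of the GP posterior prediction and MEI selection. It is our illustration rather than the authors' implementation: it assumes a zero prior mean, adds a small jitter term to the covariance matrix for numerical stability, and maximizes EI over a finite candidate set, whereas the paper maximizes EI with the DIRECT optimizer.

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(K, k_star, k_star_star, y, jitter=1e-8):
    """Posterior mean and variance at a new policy theta*, following the text:
    mu = k(theta*, Theta) K^-1 y and var = k(theta*, theta*) - k(theta*, Theta) K^-1 k(Theta, theta*)."""
    K_reg = K + jitter * np.eye(K.shape[0])
    mu = k_star @ np.linalg.solve(K_reg, y)
    var = k_star_star - k_star @ np.linalg.solve(K_reg, k_star)
    return float(mu), max(float(var), 1e-12)

def expected_improvement(mu, var, eta_max):
    """Closed-form expected improvement over the best observed return eta_max."""
    sigma = np.sqrt(var)
    z = (mu - eta_max) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

def select_next_policy(candidates, kernel, thetas, y):
    """Line 3 of Algorithm 1: pick the candidate policy with maximum expected improvement."""
    K = np.array([[kernel(a, b) for b in thetas] for a in thetas])
    y = np.asarray(y, dtype=float)
    eta_max = float(np.max(y))
    scores = []
    for theta in candidates:
        k_star = np.array([kernel(theta, b) for b in thetas])
        mu, var = gp_posterior(K, k_star, kernel(theta, theta), y)
        scores.append(expected_improvement(mu, var, eta_max))
    return candidates[int(np.argmax(scores))]
```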
3.1. Behavior-based Kernel

A simple bound relates the difference in returns of two policies to the KL-divergence of their trajectory densities. Consider the difference in expected returns of two policies indexed by θ_i and θ_j. The absolute value of this difference, |η(θ_i) − η(θ_j)|, has an upper bound expressed in terms of the Kullback-Leibler (KL) divergence of the trajectory probability densities,

KL(P(ξ|θ_i) || P(ξ|θ_j)) = ∫ P(ξ|θ_i) log( P(ξ|θ_i) / P(ξ|θ_j) ) dξ.

Theorem 1. For any θ_i and θ_j,

|η(θ_i) − η(θ_j)| ≤ R_max sqrt( 2 [ KL(P(ξ|θ_i) || P(ξ|θ_j)) + KL(P(ξ|θ_j) || P(ξ|θ_i)) ] ).

Proof.

|η(θ_i) − η(θ_j)| = | ∫ R(ξ) P(ξ|θ_i) dξ − ∫ R(ξ) P(ξ|θ_j) dξ |
= | ∫ R(ξ) ( P(ξ|θ_i) − P(ξ|θ_j) ) dξ |
≤ ∫ | R(ξ) | | P(ξ|θ_i) − P(ξ|θ_j) | dξ
≤ R_max ∫ | P(ξ|θ_i) − P(ξ|θ_j) | dξ
≤ R_max sqrt( 2 KL(P(ξ|θ_i) || P(ξ|θ_j)) )
≤ R_max sqrt( 2 [ KL(P(ξ|θ_i) || P(ξ|θ_j)) + KL(P(ξ|θ_j) || P(ξ|θ_i)) ] )
= R_max sqrt( 2 D(θ_i, θ_j) ),

where D(θ_i, θ_j) denotes the symmetric divergence defined in the next section. The R_max term, introduced in the fourth line, represents the maximal score of any finite-length trajectory. The first introduction of the KL-divergence is justified by Pinsker's inequality, which bounds from above the variational distance between two distributions defined on arbitrary sets by the divergence term shown above. The inequality states that (1/2) V(P, Q)² ≤ KL(P, Q), where V is the variational distance ∫ |P(x) − Q(x)| dx. It was originally proposed in (Pinsker, 1964), with recent generalizations to other variational distances given in (Reid & Williamson, 2009). The second-to-last line introduces the symmetric KL-divergence, which bounds the standard divergence from above because KL(P, Q) ≥ 0. Importantly, the resulting bound is a symmetric, positive measure of distance between policies: it bounds the absolute difference in expected value from above and reaches zero only when the divergence is zero. Additionally, although the variational bound is strictly tighter than the divergence-based bound reported here, computing the variational distance inherently requires knowledge of the domain transition model. The term inside the KL-divergence, by contrast, is a ratio of path probabilities and can be computed with no knowledge of the domain model; this characteristic is important when learned models are not available. Our goal is to incorporate this measure of policy relatedness into the surrogate representation of the expected return.

Unfortunately, the divergence function does not meet the standard requirements for a kernel (Moreno et al., 2004). To transform the bound into a valid kernel we first define the symmetric divergence

D(θ_i, θ_j) = KL(P(ξ|θ_i) || P(ξ|θ_j)) + KL(P(ξ|θ_j) || P(ξ|θ_i)),

and define the covariance function to be its negative exponential,

K(θ_i, θ_j) = exp( −α D(θ_i, θ_j) ).

The kernel has a single scalar parameter α controlling its width. This is precisely what we sought: a measure of policy similarity that depends on the action selection decisions. The kernel compares behaviors, not parameters.

3.2. Estimation of the Kernel Function Values

We propose using this kernel to improve the BO algorithm discussed above, where it plays a role in lines 2 and 3 of Algorithm 1. Computing the exact KL-divergence requires access to a model of the decision process, and even with a model in hand, computing the integral over paths is itself computationally demanding. The divergence must therefore be estimated. In this work we use a simple Monte Carlo estimate. The divergence between policies θ_i and θ_j is approximated by

D̂(θ_i, θ_j) = Σ_{ξ ∈ ξ_i} log( P(ξ|θ_i) / P(ξ|θ_j) ) + Σ_{ξ ∈ ξ_j} log( P(ξ|θ_j) / P(ξ|θ_i) ),

using a sparse sample of trajectories generated by each policy (ξ_i represents the set of trajectories generated by policy θ_i). Because of the definition of the trajectory density, the term within the logarithm reduces to a ratio of action selection probabilities,

P(ξ|θ_i) / P(ξ|θ_j) = ∏_{t=1}^{T} P_π(a_t | φ(s_t), θ_i) / P_π(a_t | φ(s_t), θ_j),

which is easily computed without a model. A second problem arises when computing the expected improvement (line 3 of Algorithm 1): computing the conditional predictive mean and covariance for new points requires evaluating the kernel for policies that have no trajectories associated with them. Because we have no access to a model, we use an importance-sampled estimate of the divergence,

D̂(θ_new, θ_j) = Σ_{ξ ∈ ξ_j} ( P(ξ|θ_new) / P(ξ|θ_j) ) log( P(ξ|θ_new) / P(ξ|θ_j) ) + Σ_{ξ ∈ ξ_j} log( P(ξ|θ_j) / P(ξ|θ_new) ).

Though the variance of this estimate can be large, it does not negatively impact exploration in our algorithm. Our empirical results show that errors in the divergence estimates, including the importance-sampled estimates, do not negatively impact performance.
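The estimates above are simple to implement given per-step action log-probabilities. The sketch below is our illustration, not the authors' code: log_pi(s, a) is an assumed callable returning a policy's log action-selection probability, a trajectory is a list of (state, action) pairs, and we average over trajectories where the text writes sums over a sparse sample.

```python
import numpy as np

def traj_log_ratio(traj, log_pi_a, log_pi_b):
    """log P(xi | theta_a) - log P(xi | theta_b): start-state and transition terms
    cancel, leaving a sum of action log-probability ratios along the trajectory."""
    return sum(log_pi_a(s, a) - log_pi_b(s, a) for (s, a) in traj)

def symmetric_kl_estimate(trajs_i, trajs_j, log_pi_i, log_pi_j):
    """Monte Carlo estimate of D(theta_i, theta_j) from trajectories sampled by each policy."""
    kl_ij = np.mean([traj_log_ratio(xi, log_pi_i, log_pi_j) for xi in trajs_i])
    kl_ji = np.mean([traj_log_ratio(xi, log_pi_j, log_pi_i) for xi in trajs_j])
    return float(kl_ij + kl_ji)

def symmetric_kl_importance(trajs_j, log_pi_new, log_pi_j):
    """Estimate D(theta_new, theta_j) using only trajectories from theta_j:
    the first KL term is importance weighted, E_j[w log w] with w = P_new / P_j."""
    log_w = np.array([traj_log_ratio(xi, log_pi_new, log_pi_j) for xi in trajs_j])
    return float(np.mean(np.exp(log_w) * log_w) + np.mean(-log_w))

def behavior_kernel(d_hat, alpha=1.0):
    """Behavior-based covariance K(theta_i, theta_j) = exp(-alpha * D_hat)."""
    return float(np.exp(-alpha * d_hat))
```

A divergence estimate computed this way can be plugged directly into the covariance matrix of line 2 of Algorithm 1; the single width parameter alpha plays the role described in the experiments below.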
4. Results

We report the performance of our algorithm on four benchmark RL tasks: mountain car, cart-pole balancing, a 3-link planar arm, and an acrobot domain. We compare BOA with our behavior-based kernel to three alternatives: BOA with a squared exponential kernel, Q-learning with CMAC function approximation (Sutton & Barto, 1998), and LSPI (Lagoudakis et al., 2003).

4.1. Experiment Setup

We detail the special requirements necessary to implement each algorithm in this section. The results reported below are averaged over 30 runs for the BOA implementations and 300 runs for Q-learning and LSPI. The initial policy is always randomly initialized, and expected returns reported for the first episode represent the average performance of randomly generated policies.

BOA with Behavior-Based Kernel. To generate data, stochastic policies are transformed into deterministic policies by executing the maximum-probability action. Single trajectories generated from these policies are provided as data to the kernel function, and the expected returns reported below are for these deterministic policies. Policies are treated as stochastic for purposes of computing the kernel function. As seen below, this sparse sample is sufficient to distinguish policies using the behavior-based kernel. The EI is maximized using DIRECT, a gradient-free black-box optimizer (Jones et al., 1993).

BOA. We compare to a BO algorithm with a squared exponential kernel, which was the kernel of choice in past work (Lizotte et al., 2007; Wilson et al., 2010):

K(θ_i, θ_j) = exp( −(1/2) (θ_i − θ_j)^T ρ (θ_i − θ_j) ).

We were able to get positive results by tuning the ρ vector for each experiment; reported results are for the best setting of this parameter. The mean function of the GP was set to zero. Data generation follows the procedure described above: stochastic policies are transformed into deterministic policies, and the expected returns reported below are for these deterministic policies. DIRECT is used to optimize the EI.

Q-Learning with CMAC function approximation. The basis function set was identified by hand in each problem, and epsilon-greedy exploration was used.

LSPI. LSPI results are reported for the cart-pole, acrobot, and mountain car tasks; we were unable to obtain reasonable results from LSPI in the planar arm domain.

4.2. Cart-Pole Domain

In the cart-pole domain the agent attempts to balance a pole for a fixed allotment of time. Successful policies keep the agent's cart within a fixed boundary and maintain stability throughout the episode. In this version of the domain the agent must keep the pole balanced for 1000 steps. The state includes the location of the cart, the cart velocity, the angle of the pole, and the angular velocity of the pole. At each step the agent receives a positive reward plus a penalty for large pole angles and speeds; this reward promotes policies that minimize deviations from the ideal position. Finally, a successfully completed episode (balancing for the full 1000 steps) gives the agent a reward of 100. The policy search algorithms optimize a linear policy.

Figure 1 shows the results for the cart-pole domain. In this case the parameter of the divergence kernel is set to 1. We have analyzed the sensitivity to this parameter, which cannot be set too small: when it is set below 0.1 the probability of convergence to the true optimum begins to fall to zero. To avoid this problem the value of the kernel parameter can always be set by maximizing the likelihood of the data (Rasmussen & Williams, 2005). This was confirmed in additional experiments with the kernel parameter set to 3 and 10, which converged but continued to explore well after finding optimal points. Setting the parameter to 1 guaranteed convergence to the optimal policy in all of our runs while avoiding unnecessary exploration. Clearly the divergence-based kernel outperforms all of the competitors, including BOA with the squared exponential kernel. Importantly, many policies generated by the standard BOA have similar behaviors in the cart-pole domain; the behavior-based kernel avoids exploring these redundant behaviors, resulting in quick convergence.

4.3. Planar Arm Domain

In this domain the agent controls an articulated arm, attempting to place the arm tip within a fixed goal location. Three arm joints are controlled by applying a small amount of torque (-1 or 1), which causes a kinematic response. Each arm segment is constrained to move through a limited range of rotation, simulating the constraints of a real machine. At each step the agent is penalized by the distance from the center of the goal to the tip of the arm. A logistic controller is used for each joint; the state for each controller is the distance between the arm tip and the target.

Figure 2 shows the results for the planar arm domain. The generalization of the divergence-based kernel is particularly powerful in this case: much of the policy space is quickly identified as redundant by our kernel. Here the performance is less sensitive to the kernel parameter; we performed experiments setting α as high as 10 and still observed quick convergence to the maximum value. The experiment reported here uses a fixed value of α.

4.4. Mountain Car Domain

The mountain car task is to accelerate a simulated car from an initial position within a basin of attraction to the peak of a slope. The problem is difficult because the car does not have sufficient power to drive directly to the goal; the agent must generate momentum by backing up one hillside to gain sufficient velocity to reach the opposite peak.
At each step the mountain car agent receives a flat -1 reward, plus a bonus reward of 100 if the agent reaches the peak. The agent controls the car with left and right actions, and a logistic function is used to select actions.

Figure 3 reports the results. In this case we have elected to leave the other results off of the graph: the other methods were unable to find competitive policies within 500 samples (the standard BOA outperforms all of the other alternatives). This is due to the flat reward structure, which leads to a plateaued objective function and makes mountain car a perfect experiment for illustrating the importance of directed exploration based on differences in policy behavior. Approaches based on random exploration or on exploration weighted by returns are poorly suited to this kind of reward structure. The divergence kernel, on the other hand, generalizes across whole regions of the plateau, directing the search toward policies likely to generate novel behaviors. Visual inspection of the standard BOA's performance shows that many of the selected policies, unrelated according to the squared exponential kernel, actually produce the same action sequences when started from the initial state. This redundant search is completely avoided by the behavior-based kernel.

We also provide plots indicating the sensitivity to the kernel parameter α. When the kernel parameter is set too low, little exploration is performed and a suboptimal 130-step policy is found. When it is set to 10, much more exploration is performed: more than 200 additional episodes are sampled before settling on an optimal 117-step policy, which compares favorably to the 119-step policy found at an intermediate setting of the parameter.

4.5. Acrobot Task

In this domain the agent controls a simulated acrobot attached by the hands to a fixed location. The goal is to apply torque (-1, 1) at the hips of the robot and swing the feet above a pre-specified threshold. The dynamics of the acrobot are constrained so that the bottom half of the agent cannot perform full revolutions.

At each step the agent receives a flat -1 penalty and a bonus of +100 if the goal height is reached. A logistic policy is used to select actions.

Figure 4 illustrates the results for the acrobot domain. The generalization advantage is less pronounced in this case, and the squared exponential kernel generalizes reasonably well. The reason can be observed in the behavior of random policies in the acrobot domain: the behaviors are erratic, and small changes in the parameters of the policy lead to very different action sequences. Therefore, more behaviors must be searched before good policies are identified. Even so, the behavior-based kernel does outperform BOA with the squared exponential kernel.

Figure 1. Cart-pole.
Figure 2. Planar Arm.

5. Related Work

Work by (Kakade, 2001) presented a metric based on the Fisher information to derive the natural policy gradient update. Kakade was able to show significant results in a difficult Tetris domain, outperforming standard gradient methods. Follow-up work by (Bagnell & Schneider, 2003) pursued a related idea within the path-integral framework for RL (the same framework used in this paper); their work considers metrics defined as functions of the distribution over trajectories generated by a fixed policy, P(ξ|θ). In contrast to our goals, both works focus on iteratively improving a policy via gradient updates, and no explicit attention is paid to using the metric information to guide the exploratory process. However, the insight that policy relationships should be functions of the trajectory density has played a key role in our work.

More closely related to our proposal is recent work in (Peters et al., 2010) and (Kober & Peters, 2010). (Peters et al., 2010) uses a divergence-based bound to control exploration: they maximize the expected reward subject to a bound proportional to the KL-divergence between the empirically observed state-action distribution and the state-action distribution of the new policy. The search for a new policy is necessarily local, restricted by the bound to stay close to the current policy. By contrast, our work uses the divergence as a measure of similarity, allowing a more aggressive search of the policy space. A related work (Kober & Peters, 2010) derives a lower bound on the importance-sampled estimate of the expected return, as was done in (Dayan & Hinton, 1997), and observes its relationship to the KL-divergence between the reward-weighted behavior policy and the target policy. From this relationship they derive an EM-based update for the policy parameters. An explicit effort is made to construct the update so that exploration is accounted for; however, their method of state-dependent exploration is still based on random perturbations of the action selection policy. Our method of exploration is instead determined by the posterior uncertainty and does not depend on the behavior policy. In fact, the mean action of a stochastic policy, treating the stochastic policy as a deterministic one, can be used by our algorithm. This is an advantage when working with real physical systems, where random perturbations can damage expensive equipment.

6. Conclusion

We have examined how to improve policy search algorithms by constructing and exploiting a probabilistic model of the expected return objective function. Our work extends BO methods for policy search problems by constructing a behavior-based kernel.
We motivate our kernel by examining a simple upper bound on the absolute difference of expected returns. The resulting bound is a symmetric, positive measure of distance between policies and reaches zero only when the divergence is zero. We use this upper bound as the basis for our kernel function, argue that the properties of the bound ensure a more reasonable measure of policy relatedness, and demonstrate empirically that the improved model of the objective substantially speeds exploration in several simple benchmark domains.

Figure 3. Mountain Car.
Figure 4. Acrobot.

Acknowledgments

This research is supported by the Army Research Office and the Office of Naval Research.

References

Bagnell, J. Andrew and Schneider, Jeff. Covariant policy search. In IJCAI, 2003.

Dayan, Peter and Hinton, Geoffrey E. Using expectation-maximization for reinforcement learning. Neural Computation, 9, 1997.

Jones, D. R., Perttunen, C. D., and Stuckman, B. E. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79(1), 1993.

Kakade, Sham. A natural policy gradient. In NIPS, 2001.

Kober, Jens and Peters, Jan. Policy search for motor primitives in robotics. Machine Learning, pp. 1-33, 2010.

Lagoudakis, Michail G. and Parr, Ronald. Least-squares policy iteration. Journal of Machine Learning Research, 4, 2003.

Lizotte, Daniel, Wang, Tao, Bowling, Michael, and Schuurmans, Dale. Automatic gait optimization with Gaussian process regression. In IJCAI, 2007.

Mockus, J. Application of Bayesian approach to numerical methods of global and stochastic optimization. Journal of Global Optimization, 4(4), 1994.

Moreno, Pedro J., Ho, Purdy P., and Vasconcelos, Nuno. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In NIPS, 2004.

Peters, Jan, Mülling, Katharina, and Altun, Yasemin. Relative entropy policy search. In AAAI, 2010.

Pinsker, M. Information and Information Stability of Random Variables and Processes. Holden-Day, San Francisco, 1964. Translated by Amiel Feinstein.

Rasmussen, Carl Edward and Williams, Christopher K. I. Gaussian Processes for Machine Learning. The MIT Press, 2005.

Reid, Mark D. and Williamson, Robert C. Generalised Pinsker inequalities. In COLT, 2009.

Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. MIT Press, 1998.

Vazquez, Emmanuel and Bect, Julien. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. Journal of Statistical Planning and Inference, 140(11), 2010.

Wilson, Aaron, Fern, Alan, and Tadepalli, Prasad. Incorporating domain models into Bayesian optimization for RL. In ECML, 2010.


More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Model-based Synthesis. Tony O Hagan

Model-based Synthesis. Tony O Hagan Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

More information

Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment Hideki Asoh 1, Masanori Shiro 1 Shotaro Akaho 1, Toshihiro Kamishima 1, Koiti Hasida 1, Eiji Aramaki 2, and Takahide

More information

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

More information

NEURAL NETWORKS A Comprehensive Foundation

NEURAL NETWORKS A Comprehensive Foundation NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments

More information

Efficient online learning of a non-negative sparse autoencoder

Efficient online learning of a non-negative sparse autoencoder and Machine Learning. Bruges (Belgium), 28-30 April 2010, d-side publi., ISBN 2-93030-10-2. Efficient online learning of a non-negative sparse autoencoder Andre Lemme, R. Felix Reinhart and Jochen J. Steil

More information

IN AVIATION it regularly occurs that an airplane encounters. Guaranteed globally optimal continuous reinforcement learning.

IN AVIATION it regularly occurs that an airplane encounters. Guaranteed globally optimal continuous reinforcement learning. GUARANTEED GLOBALLY OPTIMAL CONTINUOUS REINFORCEMENT LEARNING 1 Guaranteed globally optimal continuous reinforcement learning Hildo Bijl Abstract Self-learning and adaptable autopilots have the potential

More information

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

More information

Bayesian Image Super-Resolution

Bayesian Image Super-Resolution Bayesian Image Super-Resolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian

More information