A Behavior Based Kernel for Policy Search via Bayesian Optimization
Aaron Wilson, Alan Fern, Prasad Tadepalli
Oregon State University, School of EECS, 1148 Kelley Engineering Center, Corvallis, OR

Appearing in ICML 2011 Workshop: Planning and Acting with Uncertain Models, Bellevue, WA, USA. Copyright 2011 by the author(s)/owner(s).

Abstract

We expand on past successes applying Bayesian Optimization (BO) to the Reinforcement Learning (RL) problem. BO is a general method for searching for the maximum of an unknown objective function. It explicitly aims to reduce the number of samples needed to identify the optimal solution by exploiting a probabilistic model of the objective function. Much work in BO has focused on Gaussian Process (GP) models of the objective. The performance of these models relies on the design of the kernel function relating points in the solution space. Unfortunately, previous approaches adapting ideas from BO to the RL setting have focused on simple kernels that are not well justified in the RL context. We show that a new kernel can be motivated by examining an upper bound on the absolute difference in expected return between policies. The resulting kernel explicitly compares the behaviors of policies in terms of their trajectory probability densities. We incorporate the behavior based kernel into a BO algorithm for policy search. Results reported on four standard benchmark domains show that our algorithm significantly outperforms alternative state-of-the-art algorithms.

1. Introduction

In the policy search setting, RL agents seek an optimal policy within a fixed set. In such a setting an agent executes a sequence of policies while searching for the true optimum. Naturally, future policy selection decisions should benefit from the information available in all samples. The question is how the expected return of untried policies can be estimated from the batch of samples, and how best to use the estimated returns to perform policy search.
In this work we propose explicitly constructing a probabilistic model of the expected return informed by observations of past policy behaviors. We exploit this model by selecting new policies predicted to best improve on the performance of policies in the sample set. Our approach is based on adapting black-box Bayesian Optimization (BO) to the RL problem. BO is a method for planning a sequence of queries to an unknown objective function with the aim of finding its maximum. It is an ideal method for tackling the basic problem of policy search because it directly confronts the fundamental issue of trading off exploration of the objective function (global search) against exploitation (local search).

Fundamental to the application of BO techniques is the definition of a Bayesian prior distribution for the objective function. BO searches this surrogate representation of the objective function for maximal points instead of directly querying the true objective. The hope is that by using a large number of surrogate function evaluations (trading computational resources for higher quality samples) the true maximum can be identified with few queries to the true objective.

As in most Bayesian methods, the success of BO rests on the quality of the modeling effort. How should the objective, the expected return in the RL case, be modeled effectively? In this work, as in past efforts applying BO to RL, we focus on GP models of the expected return. The generalization performance of GP models, and hence the performance of BO, is strongly affected by the definition of the kernel function, which encodes a notion of relatedness between points in the function space. When applying BO to RL this means encoding a notion of similarity between policies.
Past work has used simple kernels to relate policy parameters, for instance squared exponential kernels (Lizotte et al., 2007; Wilson et al., 2010). Unfortunately, the selected kernels fail to account for the special properties of sequential decision processes typical of RL problems. A more appropriate notion of relatedness is needed for the RL context. We propose that policies are better related by their behavior than by their parameters. Below we motivate our behavior-based kernel function. We then discuss how to incorporate the kernel into a BO approach when only a sparse sample of policy trajectories is available. Empirically, we demonstrate that the behavior-based kernel significantly improves BO and outperforms a selection of standard algorithms on four benchmark domains.

2. Problem Setting

We study the Reinforcement Learning problem in the context of Markov Decision Processes (MDPs). An MDP is described by a tuple $(S, A, P, P_0, R, \pi)$. We consider processes with continuous state and action values, where each state and action is a vector, $s \in \mathbb{R}^n$ and $a \in \mathbb{R}$. The transition function $P$ is a probability distribution $P(s_t \mid s_{t-1}, a_{t-1})$ defining the response of the process to the agent's action selections. The distribution $P_0$ gives the probability of beginning in a particular state. The reward function $R(s, a)$ returns a numeric value representing the immediate reward for the state-action pair (we do not consider stochastic reward functions). Finally, the policy $\pi$ is a stochastic mapping from states to actions, $P_\pi(a \mid \phi(s), \theta)$. It is a function of a vector of parameters $\theta \in \mathbb{R}^k$ and of features of the state $\phi(s)$.

We are interested in episodic average reward RL. Define the trajectory density,

$$P(\xi \mid \theta) = P_0(s_0) \prod_{t=1}^{T} P(s_t \mid s_{t-1}, a_{t-1})\, P_\pi(a_{t-1} \mid \phi(s_{t-1}), \theta),$$

and the value of a trajectory,

$$R(\xi) = \sum_{t=0}^{T} R(s_t, a_t).$$

The variable $T$ is assumed to have a maximum value, ensuring that all trajectories have finite length. Generalizing our approach to the infinite horizon case is possible but is not a focus of this work. We define the expected return as an integral over paths,

$$\eta(\theta) = \int R(\xi)\, P(\xi \mid \theta)\, d\xi.$$

The basic policy search problem is to identify the policy parameters that maximize this expectation, $\arg\max_\theta \eta(\theta)$.

3. Policy Search via Bayesian Optimization

Bayesian optimization addresses the general problem of maximizing a real-valued function,

$$\theta^* = \arg\max_\theta \eta(\theta).$$

BO is a global method for tackling expensive objective functions by explicitly reducing the number of evaluations needed before the maximum is found. In BO the objective function is treated as a random variable modeled by a probability distribution $P(\eta)$, and the uncertainty encoded by this distribution is used to select the points at which the objective is queried. The principal idea is to use a large number of surrogate evaluations to reduce the number of expensive evaluations of the true objective.

BO is a form of active learning. Given the prior distribution over the objective function, BO proceeds iteratively: a point is selected according to some criterion (a function of the posterior), the point (a policy in our case) is evaluated, the posterior distribution is updated using the new data, a new point is selected, and so on. Because learning is active, the criterion for selecting new query points plays a critical role in the quality of the posterior estimate of the surface and in the speed with which the maximum is identified. Any selection criterion must address the trade-off between exploration and exploitation. Because the Bayes-optimal selection criterion is computationally intractable, a heuristic must be used. This work uses a common heuristic called Maximum Expected Improvement (MEI).

Suppose we have a collection of $n$ points in the $\theta$ space and their associated objective function values. Define $\eta_{max}$ to be the highest observed return in the data set. Consider the following function,

$$I(\theta) = \max\{0,\ \eta(\theta) - \eta_{max}\},$$

which returns the amount by which the return at $\theta$ exceeds the observed maximum.
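To illustrate why the expectation of this improvement function is cheap to evaluate: if the posterior over $\eta(\theta)$ at a candidate point is Gaussian with mean `mu` and standard deviation `sigma` (an assumption of this sketch, though it holds for GP models), the expectation of $I(\theta)$ has a standard closed form. The following is a minimal Python sketch. The function and argument names are illustrative and do not come from the paper.

```python
import math


def expected_improvement(mu, sigma, eta_max):
    """Closed-form E[max(0, eta - eta_max)] for eta ~ Normal(mu, sigma^2).

    mu, sigma: posterior mean and standard deviation at the candidate point
               (assumed Gaussian, as under a GP model).
    eta_max:   highest return observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(0.0, mu - eta_max)
    z = (mu - eta_max) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - eta_max) * cdf + sigma * pdf


# A point predicted to be worse than the incumbent still has positive
# expected improvement when its uncertainty is large enough.
low_uncertainty = expected_improvement(mu=0.0, sigma=0.1, eta_max=1.0)
high_uncertainty = expected_improvement(mu=0.0, sigma=2.0, eta_max=1.0)
print(low_uncertainty < high_uncertainty)  # True
```

The second term, `sigma * pdf`, is what rewards uncertainty: two candidates with the same predicted mean are ranked by how uncertain the model is about them.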
MEI searches for the maximum of the expectation of this improvement function with respect to the posterior uncertainty $P(\eta(\theta) \mid D)$ given observed data $D$,

$$\theta_{n+1} = \arg\max_\theta E_{P(\eta(\theta) \mid D)}[I(\theta)].$$

Conveniently, closed form solutions exist for the Expected Improvement (EI). By incorporating the posterior uncertainty into the selection process, MEI is guaranteed to explore regions of sufficiently high uncertainty. When the conditional posterior distribution has sufficient probability mass above the current maximum the EI will be positive, pushing the algorithm to execute experiments in new regions. Due to its empirical success the MEI criterion, originally proposed by (Mockus, 1994), has become the standard choice in most work on BO. Recent work has also established the convergence properties of iteratively selecting points using MEI with a GP prior (Vazquez & Bect, 2010), lending further weight to its continued use.

Crucial to the performance of BO is the definition of the prior distribution over the objective function. Our objective in the policy search problem is maximization of the expected return, and it is this quantity that we model using the GP. GPs are defined by a mean and covariance function,

$$\eta(\theta) \sim GP(m(\theta), K(\theta, \theta)).$$

The mean $m(\theta)$ encodes prior assumptions about the underlying function space (it is frequently assumed to be zero). The covariance matrix $K(\theta, \theta)$ encodes relationships between points in the function space. Substantial engineering effort has been devoted to developing meaningful kernel functions, $K(\theta_i, \theta_j)$, for a variety of domains because of the kernel's impact on generalization performance.

Algorithm 1 Bayesian Optimization Algorithm for RL
1: Let $D_{1:n} = \{\hat\eta(\theta_i), \xi_i\}_{i=1}^{n}$.
2: Compute the matrix of covariances $K$.
3: Select the next point in the policy space to evaluate: $\theta_{n+1} = \arg\max_\theta E_{P(\eta(\theta) \mid D)}[I(\theta) \mid D_{1:n}]$.
4: Execute the policy $\theta_{n+1}$ for $E$ episodes.
5: Compute the Monte Carlo estimate of the expected return: $\hat\eta(\theta_{n+1}) = \frac{1}{E} \sum_{\xi \in \xi_{n+1}} R(\xi)$.
6: Update $D_{1:n+1} = D_{1:n} \cup (\hat\eta(\theta_{n+1}), \xi_{n+1})$.
7: Return to step 2.

Consider the basic Bayesian Optimization Algorithm (BOA), Algorithm 1. Line 1 assumes a batch of data of the form $\{\hat\eta(\theta_i), \xi_i\}$, and we denote the full set of observations from all past policies by $D_{1:n} = \{\hat\eta(\theta_i), \xi_i\}_{i=1}^{n}$. We write $\hat\eta$ to indicate a Monte Carlo estimate of the expected return for policy $\theta$, and $\xi_i$ indicates the set of trajectories used in the Monte Carlo estimate. Given this data, the surface of the expected return is modeled using the GP prior. For the moment we leave aside the computation of the covariance function in line 2. To maximize the expected improvement in line 3, the GP posterior distribution $P(\eta(\theta) \mid D_{1:n})$ must be computed. In the GP model this posterior has a simple form. Given the data $D_{1:n}$, let $y$ be the vector of outputs such that $y_i = \hat\eta(\theta_i)$, and let $K(\theta, \theta)$ be the covariance matrix with elements $K(\theta_i, \theta_j)$.
Consequently, the conditional posterior distribution is Gaussian with mean and variance

$$\mu(\eta(\theta_{n+1}) \mid D_{1:n}) = k(\theta_{n+1}, \theta)\, K(\theta, \theta)^{-1}\, y,$$

$$\sigma^2(\eta(\theta_{n+1}) \mid D_{1:n}) = k(\theta_{n+1}, \theta_{n+1}) - k(\theta_{n+1}, \theta)\, K(\theta, \theta)^{-1}\, k(\theta, \theta_{n+1}),$$

where $k(\theta_{n+1}, \theta)$ is the vector of similarities between the new point and all previously observed points, and $k(\theta, \theta_{n+1})$ is its transpose. It is at line 3 of Algorithm 1 that the choice of kernel function has its impact. The kernel controls how the information in the sample is generalized to new points, and therefore the quality of the points returned by the optimization. Our work is an effort to improve the generalization performance of the GP by defining a meaningful kernel for the RL context. We discuss the motivation for our kernel and its estimation below.

3.1. Behavior-based Kernel

It turns out that a simple bound relates the difference in returns of two policies to the KL-divergence of their trajectory densities. Consider the difference in expected returns of two policies indexed by $\theta_i$ and $\theta_j$. The absolute value of this difference, $|\eta(\theta_i) - \eta(\theta_j)|$, has an upper bound expressed in terms of the Kullback-Leibler (KL) divergence of the trajectory probability densities,

$$KL(P(\xi \mid \theta_i) \,\|\, P(\xi \mid \theta_j)) = \int P(\xi \mid \theta_i) \log\left(\frac{P(\xi \mid \theta_i)}{P(\xi \mid \theta_j)}\right) d\xi.$$

Theorem 1. For any $\theta_i$ and $\theta_j$,

$$|\eta(\theta_i) - \eta(\theta_j)| \le R_{max} \sqrt{2 \left[ KL(P(\xi \mid \theta_i) \,\|\, P(\xi \mid \theta_j)) + KL(P(\xi \mid \theta_j) \,\|\, P(\xi \mid \theta_i)) \right]}.$$

Proof.

$$\begin{aligned}
|\eta(\theta_i) - \eta(\theta_j)| &= \left| \int R(\xi) P(\xi \mid \theta_i)\, d\xi - \int R(\xi) P(\xi \mid \theta_j)\, d\xi \right| \\
&= \left| \int R(\xi) \left( P(\xi \mid \theta_i) - P(\xi \mid \theta_j) \right) d\xi \right| \\
&\le \int \left| R(\xi) \left( P(\xi \mid \theta_i) - P(\xi \mid \theta_j) \right) \right| d\xi \\
&\le R_{max} \int \left| P(\xi \mid \theta_i) - P(\xi \mid \theta_j) \right| d\xi \\
&\le R_{max} \sqrt{2\, KL(P(\xi \mid \theta_i) \,\|\, P(\xi \mid \theta_j))} \\
&\le R_{max} \sqrt{2 \left[ KL(P(\xi \mid \theta_i) \,\|\, P(\xi \mid \theta_j)) + KL(P(\xi \mid \theta_j) \,\|\, P(\xi \mid \theta_i)) \right]} \\
&= R_{max} \sqrt{2\, D(\theta_i, \theta_j)}
\end{aligned}$$

The $R_{max}$ term, introduced in line 4, represents the maximal score for any finite length trajectory. The first introduction of the KL-divergence is justified by Pinsker's inequality, which bounds from above the variational distance between two distributions, defined on arbitrary sets, by the divergence term shown above.
The inequality states that

$$\tfrac{1}{2}\,(V(P, Q))^2 \le KL(P, Q),$$

where $V$ is the variational distance, $V(P, Q) = \int |P(x) - Q(x)|\, dx$. The inequality was originally proposed in (Pinsker, 1964), with recent generalizations to other variational distances given in (Reid & Williamson, 2009). The second to last line introduces the symmetric KL-divergence, which bounds the standard divergence from above because $KL(P, Q) \ge 0$. Importantly, the bound is a symmetric, positive measure of distance between policies. It bounds the absolute difference in expected value from above and reaches zero only when the divergence is zero. Additionally, though the variational bound is strictly tighter than the divergence-based bound reported here, computing the variational distance inherently requires knowledge of the domain transition models. By contrast, the log term of the KL-divergence is a ratio of path probabilities and can be computed with no knowledge of the domain model. This characteristic is important when learned models are not available.

Our goal is to incorporate the final measure of policy relatedness into the surrogate representation of the expected return. Unfortunately, the divergence function does not meet the standard requirements for a kernel (Moreno et al., 2004). To transform the bound into a valid kernel we first define the function

$$D(\theta_i, \theta_j) = KL(P(\xi \mid \theta_i) \,\|\, P(\xi \mid \theta_j)) + KL(P(\xi \mid \theta_j) \,\|\, P(\xi \mid \theta_i)),$$

and then define the covariance function to be the negative exponential of $D$,

$$K(\theta_i, \theta_j) = \exp(-\alpha\, D(\theta_i, \theta_j)).$$

The kernel has a single scalar parameter $\alpha$ controlling its width. This is precisely what we sought: a measure of policy similarity that depends on the action selection decisions. The kernel compares behaviors, not parameters.

3.2. Estimation of the Kernel Function Values

We propose using this kernel to improve the BO algorithm discussed above, where the kernel plays a role in lines 2 and 3 of Algorithm 1. Computing the exact KL-divergence requires access to a model of the decision process, and even with a model in hand computing the integral over paths is computationally demanding. The divergence must therefore be estimated. In this work we use a simple Monte Carlo estimate. The divergence between policies $\theta_i$ and $\theta_j$ is approximated by

$$\hat D(\theta_i, \theta_j) = \sum_{\xi \in \xi_i} \log\left(\frac{P(\xi \mid \theta_i)}{P(\xi \mid \theta_j)}\right) + \sum_{\xi \in \xi_j} \log\left(\frac{P(\xi \mid \theta_j)}{P(\xi \mid \theta_i)}\right),$$

using a sparse sample of trajectories generated by each policy ($\xi_i$ denotes the set of trajectories generated by policy $\theta_i$). Because of the definition of the trajectory density, the term within the logarithm reduces to a ratio of action selection probabilities,

$$\log\left(\frac{P(\xi \mid \theta_i)}{P(\xi \mid \theta_j)}\right) = \sum_{t=1}^{T} \log\left(\frac{P_\pi(a_t \mid \phi(s_t), \theta_i)}{P_\pi(a_t \mid \phi(s_t), \theta_j)}\right),$$

which is easily computed without a model.

A second problem arises when computing the Expected Improvement (line 3 of the BOA). Computing the conditional predictive mean and covariance for new points requires evaluating the kernel for policies that have no trajectories associated with them. Because we have no access to a model, we use an importance-sampled estimate of the divergence,

$$\hat D(\theta_{new}, \theta_j) = \sum_{\xi \in \xi_j} \left[ \frac{P(\xi \mid \theta_{new})}{P(\xi \mid \theta_j)} \log\left(\frac{P(\xi \mid \theta_{new})}{P(\xi \mid \theta_j)}\right) + \log\left(\frac{P(\xi \mid \theta_j)}{P(\xi \mid \theta_{new})}\right) \right].$$

Though the variance of this estimate can be large, it will not negatively impact exploration in our algorithm. Our empirical results show that errors in the divergence estimates, including the importance-sampled estimates, do not negatively impact performance.

4. Results

We report the performance of our algorithm in four benchmark RL tasks: mountain car, cart-pole balancing, a 3-link planar arm, and an acrobot domain. We compare the BOA with our behavior based kernel to three alternatives: the BOA with a squared exponential kernel, Q-Learning with CMAC function approximation (Sutton & Barto, 1998), and LSPI (Lagoudakis et al., 2003).

4.1. Experiment Setup

We detail the special requirements necessary to implement each algorithm in this section. The results reported below are averaged over 30 runs for the BOA implementations and 300 runs for Q-learning and LSPI. The initial policy is always randomly initialized. Expected returns reported for the first episode represent the average performance of randomly generated policies.

BOA with Behavior Based Kernel. To generate data, stochastic policies are transformed into deterministic policies by executing the maximum probability action. Single trajectories generated from these policies are provided as data to the kernel function. The expected returns reported below are for these deterministic policies. Policies are treated as stochastic for purposes of computing the kernel function. As seen below, this sparse sample is sufficient to distinguish policies using the behavior based kernel.
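Because the Monte Carlo divergence estimate above needs only action log-probabilities, a kernel entry can be computed directly from logged trajectories. The following is a minimal Python sketch under stated assumptions: each trajectory is represented as a list of per-step pairs `(logp_i, logp_j)`, the log-probability of the action actually taken under $\theta_i$ and under $\theta_j$. That representation and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def divergence_estimate(trajs_i, trajs_j):
    """Monte Carlo estimate of the symmetric divergence D(theta_i, theta_j).

    trajs_i: trajectories sampled from policy theta_i
    trajs_j: trajectories sampled from policy theta_j
    Each trajectory is a list of (logp_i, logp_j) pairs, one per step.
    Transition probabilities cancel in the ratio, so no model is needed.
    """
    # Sum over trajectories from theta_i of log P(xi | theta_i) / P(xi | theta_j).
    d_ij = sum(lp_i - lp_j for xi in trajs_i for (lp_i, lp_j) in xi)
    # Sum over trajectories from theta_j of log P(xi | theta_j) / P(xi | theta_i).
    d_ji = sum(lp_j - lp_i for xi in trajs_j for (lp_i, lp_j) in xi)
    return d_ij + d_ji


def behavior_kernel(trajs_i, trajs_j, alpha=1.0):
    """K(theta_i, theta_j) = exp(-alpha * D_hat(theta_i, theta_j))."""
    return math.exp(-alpha * divergence_estimate(trajs_i, trajs_j))


# One two-step trajectory from theta_i and one one-step trajectory from theta_j,
# with made-up action probabilities for illustration.
trajs_i = [[(math.log(0.9), math.log(0.5)), (math.log(0.8), math.log(0.5))]]
trajs_j = [[(math.log(0.5), math.log(0.6))]]
k = behavior_kernel(trajs_i, trajs_j, alpha=1.0)
print(round(k, 4))  # approximately 0.2894
```

Two policies that assign identical probabilities to every observed action give a divergence estimate of zero and a kernel value of one, regardless of how far apart their parameter vectors are. With a finite sample the estimate can also come out slightly negative, which this sketch does not guard against.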
Maximizing the EI is done using a gradient-free black-box optimizer called DIRECT (Jones et al., 1993).

BOA. We compare to a BO algorithm with a squared exponential kernel, which was the kernel of choice in past work (Lizotte et al., 2007; Wilson et al., 2010),

$$K(\theta_i, \theta_j) = \exp\left(-\tfrac{1}{2}\, (\theta_i - \theta_j)^T \rho\, (\theta_i - \theta_j)\right).$$

We were able to get positive results by tuning the $\rho$ vector for each experiment. Reported results are for the best setting of this parameter. The mean function of the GP was set to zero. To generate data for the BOA, stochastic policies are transformed into deterministic policies as described above. The expected returns reported below are for these deterministic policies. DIRECT is used to optimize the EI.

Q-Learning with CMAC function approximation. The basis function set was identified by hand in each problem. Epsilon-greedy exploration was used.
LSPI. LSPI results are reported for the cart-pole, acrobot, and mountain car tasks. We were unable to get reasonable results from LSPI in the planar arm domain.

4.2. Cart-Pole Domain

In the cart-pole domain the agent attempts to balance a pole for a fixed allotment of time. Successful policies keep the agent's cart within a fixed boundary and maintain stability throughout the episode. In this version of the domain the agent must keep the pole balanced for 1000 steps. The state includes the location of the cart, the cart velocity, the angle of the pole, and the angular velocity of the pole. At each step the agent receives a positive reward plus a penalty for large pole angles and speeds. This reward promotes policies that minimize deviations from the ideal position. Finally, a successfully completed episode (balancing for the full 1000 steps) gives the agent a reward of 100. The policy search algorithms optimize a linear policy.

Figure 1 shows the results for the cart-pole domain. In this case the parameter of the divergence kernel is set to 1. We analyzed the sensitivity to this parameter and found that it cannot be set too small: when it is set below 0.1 the probability of convergence to the true optimum begins to fall to zero. Of course, to avoid this problem the value of the kernel parameter can always be set by maximizing the likelihood of the data (Rasmussen & Williams, 2005). This sensitivity was confirmed in additional experiments with the kernel parameter set to 3 and 10, which continued to explore well after finding optimal points. Setting the parameter to 1 led to convergence to the optimal policy in all of our runs and avoided unnecessary exploration. The divergence-based kernel outperforms all of the competitors, including the BOA with the squared exponential kernel. Importantly, many policies generated by the standard BOA have similar behaviors in the cart-pole domain.
The behavior based kernel avoids exploring these redundant behaviors, resulting in quick convergence.

4.3. Planar Arm Domain

In this domain the agent controls an articulated arm and attempts to place the arm tip within a fixed goal location. Three arm joints are controlled by applying a small amount of torque (-1 or 1), which causes a kinematic response. Each arm segment is constrained to move through [...] of rotation, simulating the constraints of a real machine. At each step the agent is penalized by the distance from the center of the goal to the tip of the arm. A logistic controller is used for each joint. The state space for each controller is the distance between the location of the arm tip and the target.

Figure 2 shows the results for the planar arm domain. The generalization of the divergence-based kernel is particularly powerful in this case. Much of the policy space is quickly identified as redundant by our kernel. In this case the performance is not as sensitive to the kernel parameter. We performed experiments setting $\alpha$ as high as 10 and still observed quick convergence to the maximum value. The experiment reported here has the value of $\alpha$ set to [...].

4.4. Mountain Car Domain

The mountain car task is to accelerate a simulated car from an initial position within a basin of attraction to the peak of a slope. The problem is made difficult because the car does not have sufficient power to drive directly to the goal. The agent must generate momentum by backing up one hillside to gain sufficient velocity to reach the opposite peak. At each step the mountain car agent receives a flat -1 reward, and a bonus reward of 100 if the agent reaches the peak. The agent controls the car with left and right actions. A logistic function is used to select actions.

In Figure 3 we report the results. In this case we have elected to leave the other results off of the graph.
The other methods were unable to find competitive policies with fewer than 500 samples (the standard BOA outperforms all of the other alternatives). This is due to the flat reward structure, which leads to a plateaued objective function. This makes Mountain Car a perfect experiment for illustrating the importance of directed exploration based on differences in policy behavior. Approaches based on random exploration, and on exploration weighted by returns, are poorly suited to this kind of reward structure. The divergence kernel, on the other hand, generalizes between whole regions of the plateau, directing search to policies likely to generate novel behaviors. Visual inspection of the performance of the standard BOA shows that many of the selected policies, unrelated according to the squared exponential kernel, actually produce the same action sequences when started at the initial state. This redundant search is completely avoided by using the behavior based kernel.

We also provide plots indicating the sensitivity to the kernel parameter $\alpha$. When the kernel parameter is set too low, little exploration is performed and a suboptimal 130 step policy is found. When it is set to 10 much more exploration is performed: more than 200 additional episodes are sampled before settling on an optimal 117 step policy, which compares favorably to the 119 step policy found when the parameter is set to [...].

4.5. Acrobot Task

In this domain the agent controls a simulated acrobot attached by the hands to a fixed location. The goal is to apply torque (-1, 1) to the hips of the robot and swing the feet above a pre-specified threshold. The dynamics of the acrobot are constrained so that the bottom half of the agent cannot perform full revolutions. At each step the agent receives a flat -1 penalty, and a bonus of +100 if the goal height is reached. A logistic policy is used to select actions.

Figure 4 illustrates the results for the acrobot domain. The generalization performance is less pronounced in this case. The squared exponential kernel generalizes reasonably well. The reason for this can be observed in the behavior of random policies in the acrobot domain. The behaviors are erratic: small changes in the parameters of the policy lead to very different action sequences. Therefore, more behaviors must be searched before good policies are identified. Even so, the behavior based kernel does outperform the BOA with the squared exponential kernel.

Figure 1. Cart-pole.

Figure 2. Planar Arm.

5. Related Work

Work by (Kakade, 2001) presented a metric based on the Fisher information to derive the natural policy gradient update. Kakade was able to show significant results in a difficult Tetris domain, outperforming standard gradient methods. Follow-up work by (Bagnell & Schneider, 2003) proposed pursuing a related idea within the path integral framework for RL (the same framework used in this paper). Their work considers metrics defined as functions on the distribution over trajectories generated by a fixed policy, $P(\xi \mid \theta)$. In contrast to our goals, both works focus on iteratively improving a policy via gradient descent. Furthermore, no explicit attention is paid to using the metric information to guide the exploratory process. However, the insight that policy relationships should be functions of the trajectory density has played a key role in our work.

More closely related to our proposal is recent work in (Peters et al., 2010) and (Kober & Peters, 2010). (Peters et al., 2010) uses a divergence-based bound to control exploration.
Specifically, they attempt to maximize the expected reward subject to a bound proportional to the KL-divergence between the empirically observed state-action distribution and the state-action distribution of the new policy. The search for a new policy is necessarily local, restricted by the bound to be close to the current policy. By contrast, our work uses the divergence as a measure of similarity, allowing for a more aggressive search of the policy space. A related work (Kober & Peters, 2010) derives a lower bound on the importance sampled estimate of the expected return, as was done in (Dayan & Hinton, 1997), and observes the relationship to the KL-divergence of the reward weighted behavior policy and the target policy. They derive from this relationship an EM-based update for the policy parameters. An explicit effort is made to construct the update such that exploration is accounted for. However, their method of state-dependent exploration is still based on random perturbations of the action selection policy. Our method of exploration is instead determined by the posterior uncertainty and does not depend on the behavior policy. In fact, the mean action of a stochastic policy, treating the stochastic policy as a deterministic one, can be used by our algorithm. This is an advantage when working with real physical systems, where random perturbations can damage expensive equipment.

6. Conclusion

We have examined how to improve policy search algorithms by constructing and exploiting a probabilistic model of the expected return objective function. Our work extends BO methods for policy search problems by constructing a behavior based kernel. We motivate our kernel by examining a simple upper bound on the absolute difference of expected returns. The resulting bound is a symmetric, positive measure of distance between policies and reaches zero only when the divergence is zero. We use this upper bound as the basis for our kernel function, argue that the properties of the bound ensure a more reasonable measure of policy relatedness, and demonstrate empirically that the improved model of the objective substantially speeds exploration in some simple benchmark domains.

Figure 3. Mountain Car.

Figure 4. Acrobot.

Acknowledgments

This research is supported by the Army Research Office and the Office of Naval Research.

References

Bagnell, J. Andrew (Drew) and Schneider, Jeff. Covariant policy search. In IJCAI, August 2003.
Dayan, Peter and Hinton, Geoffrey E. Using expectation-maximization for reinforcement learning. Neural Computation, 9, February 1997.
Jones, D. R., Perttunen, C. D., and Stuckman, B. E. Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl., 79(1), 1993.
Kakade, Sham. A natural policy gradient. In NIPS, 2001.
Kober, Jens and Peters, Jan. Policy search for motor primitives in robotics. Machine Learning, pp. 1-33, 2010.
Lagoudakis, Michail G., Parr, Ronald, and Bartlett, L. Least-squares policy iteration. Journal of Machine Learning Research, 4, 2003.
Lizotte, Daniel, Wang, Tao, Bowling, Michael, and Schuurmans, Dale. Automatic gait optimization with Gaussian process regression. In IJCAI, 2007.
Mockus, J. Application of Bayesian approach to numerical methods of global and stochastic optimization. Global Optimization, 4(4), 1994.
Moreno, Pedro J., Ho, Purdy P., and Vasconcelos, Nuno. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In NIPS, 2004.
Peters, Jan, Mülling, Katharina, and Altun, Yasemin. Relative entropy policy search. In AAAI, 2010.
Pinsker, M. Information and Information Stability of Random Variables and Processes. Holden-Day Inc, San Francisco, 1964. Translated by Amiel Feinstein.
Rasmussen, Carl Edward and Williams, Christopher K. I. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.
Reid, Mark D. and Williamson, Robert C. Generalised Pinsker inequalities. In COLT, 2009.
Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. MIT Press, 1998.
Vazquez, Emmanuel and Bect, Julien. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. Journal of Statistical Planning and Inference, 140(11), 2010.
Wilson, Aaron, Fern, Alan, and Tadepalli, Prasad. Incorporating domain models into Bayesian optimization for RL. In ECML, 2010.
More informationGaussian Process Training with Input Noise
Gaussian Process Training with Input Noise Andrew McHutchon Department of Engineering Cambridge University Cambridge, CB PZ ajm57@cam.ac.uk Carl Edward Rasmussen Department of Engineering Cambridge University
More informationNeural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method Martin Riedmiller Neuroinformatics Group, University of Onsabrück, 49078 Osnabrück Abstract. This
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationTD(0) Leads to Better Policies than Approximate Value Iteration
TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
More informationThe Advantages and Disadvantages of Online Linear Optimization
LINEAR PROGRAMMING WITH ONLINE LEARNING TATSIANA LEVINA, YURI LEVIN, JEFF MCGILL, AND MIKHAIL NEDIAK SCHOOL OF BUSINESS, QUEEN S UNIVERSITY, 143 UNION ST., KINGSTON, ON, K7L 3N6, CANADA E-MAIL:{TLEVIN,YLEVIN,JMCGILL,MNEDIAK}@BUSINESS.QUEENSU.CA
More informationMotivation. Motivation. Can a software agent learn to play Backgammon by itself? Machine Learning. Reinforcement Learning
Motivation Machine Learning Can a software agent learn to play Backgammon by itself? Reinforcement Learning Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationInference of Probability Distributions for Trust and Security applications
Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationInductive QoS Packet Scheduling for Adaptive Dynamic Networks
Inductive QoS Packet Scheduling for Adaptive Dynamic Networks Malika BOURENANE Dept of Computer Science University of Es-Senia Algeria mb_regina@yahoo.fr Abdelhamid MELLOUK LISSI Laboratory University
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationAn Environment Model for N onstationary Reinforcement Learning
An Environment Model for N onstationary Reinforcement Learning Samuel P. M. Choi Dit-Yan Yeung Nevin L. Zhang pmchoi~cs.ust.hk dyyeung~cs.ust.hk lzhang~cs.ust.hk Department of Computer Science, Hong Kong
More informationConstrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing
Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationSampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data
Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data (Oxford) in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua) Balaji Lakshminarayanan (Gatsby) Bayesian
More informationBig Data - Lecture 1 Optimization reminders
Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
More informationTwo Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationIntroduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE
ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE YUAN TIAN This synopsis is designed merely for keep a record of the materials covered in lectures. Please refer to your own lecture notes for all proofs.
More informationThe Optimality of Naive Bayes
The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most
More informationCHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.
CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In
More informationRobotics. Chapter 25. Chapter 25 1
Robotics Chapter 25 Chapter 25 1 Outline Robots, Effectors, and Sensors Localization and Mapping Motion Planning Motor Control Chapter 25 2 Mobile Robots Chapter 25 3 Manipulators P R R R R R Configuration
More informationLearning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems
Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems Thomas Degris Thomas.Degris@lip6.fr Olivier Sigaud Olivier.Sigaud@lip6.fr Pierre-Henri Wuillemin Pierre-Henri.Wuillemin@lip6.fr
More informationLearning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
More informationA Learning Based Method for Super-Resolution of Low Resolution Images
A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method
More informationlarge-scale machine learning revisited Léon Bottou Microsoft Research (NYC)
large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven
More informationMonte Carlo Simulation
1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging
More informationA Game Theoretical Framework for Adversarial Learning
A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationMarkov Decision Processes for Ad Network Optimization
Markov Decision Processes for Ad Network Optimization Flávio Sales Truzzi 1, Valdinei Freire da Silva 2, Anna Helena Reali Costa 1, Fabio Gagliardi Cozman 3 1 Laboratório de Técnicas Inteligentes (LTI)
More informationPrinciples of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
More informationOptions with exceptions
Options with exceptions Munu Sairamesh and Balaraman Ravindran Indian Institute Of Technology Madras, India Abstract. An option is a policy fragment that represents a solution to a frequent subproblem
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationGlobally Optimal Crowdsourcing Quality Management
Globally Optimal Crowdsourcing Quality Management Akash Das Sarma Stanford University akashds@stanford.edu Aditya G. Parameswaran University of Illinois (UIUC) adityagp@illinois.edu Jennifer Widom Stanford
More information171:290 Model Selection Lecture II: The Akaike Information Criterion
171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationBayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
More informationMoral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania
Moral Hazard Itay Goldstein Wharton School, University of Pennsylvania 1 Principal-Agent Problem Basic problem in corporate finance: separation of ownership and control: o The owners of the firm are typically
More informationExact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
More informationPa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classifica6on
Pa8ern Recogni6on and Machine Learning Chapter 4: Linear Models for Classifica6on Represen'ng the target values for classifica'on If there are only two classes, we typically use a single real valued output
More informationPHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS
PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE
More informationNonparametric adaptive age replacement with a one-cycle criterion
Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk
More informationMessage-passing sequential detection of multiple change points in networks
Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
More informationAuxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
More informationNeuro-Dynamic Programming An Overview
1 Neuro-Dynamic Programming An Overview Dimitri Bertsekas Dept. of Electrical Engineering and Computer Science M.I.T. September 2006 2 BELLMAN AND THE DUAL CURSES Dynamic Programming (DP) is very broadly
More informationContinuous Time Bayesian Networks for Inferring Users Presence and Activities with Extensions for Modeling and Evaluation
Continuous Time Bayesian Networks for Inferring Users Presence and Activities with Extensions for Modeling and Evaluation Uri Nodelman 1 Eric Horvitz Microsoft Research One Microsoft Way Redmond, WA 98052
More informationCell Phone based Activity Detection using Markov Logic Network
Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart
More informationMarketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl
More informationThe equivalence of logistic regression and maximum entropy models
The equivalence of logistic regression and maximum entropy models John Mount September 23, 20 Abstract As our colleague so aptly demonstrated ( http://www.win-vector.com/blog/20/09/the-simplerderivation-of-logistic-regression/
More informationModel for dynamic website optimization
Chapter 10 Model for dynamic website optimization In this chapter we develop a mathematical model for dynamic website optimization. The model is designed to optimize websites through autonomous management
More informationVariations of Statistical Models
38. Statistics 1 38. STATISTICS Revised September 2013 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a
More informationLocal Gaussian Process Regression for Real Time Online Model Learning and Control
Local Gaussian Process Regression for Real Time Online Model Learning and Control Duy Nguyen-Tuong Jan Peters Matthias Seeger Max Planck Institute for Biological Cybernetics Spemannstraße 38, 776 Tübingen,
More informationNEURAL NETWORKS AND REINFORCEMENT LEARNING. Abhijit Gosavi
NEURAL NETWORKS AND REINFORCEMENT LEARNING Abhijit Gosavi Department of Engineering Management and Systems Engineering Missouri University of Science and Technology Rolla, MO 65409 1 Outline A Quick Introduction
More informationInvited Applications Paper
Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationBayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
More informationSemi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationBetting rules and information theory
Betting rules and information theory Giulio Bottazzi LEM and CAFED Scuola Superiore Sant Anna September, 2013 Outline Simple betting in favorable games The Central Limit Theorem Optimal rules The Game
More informationCoding and decoding with convolutional codes. The Viterbi Algor
Coding and decoding with convolutional codes. The Viterbi Algorithm. 8 Block codes: main ideas Principles st point of view: infinite length block code nd point of view: convolutions Some examples Repetition
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationLikelihood Approaches for Trial Designs in Early Phase Oncology
Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationModel-based Synthesis. Tony O Hagan
Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
More informationProbabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationAn Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment
An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment Hideki Asoh 1, Masanori Shiro 1 Shotaro Akaho 1, Toshihiro Kamishima 1, Koiti Hasida 1, Eiji Aramaki 2, and Takahide
More informationA HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT
New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:
More informationNEURAL NETWORKS A Comprehensive Foundation
NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments
More informationEfficient online learning of a non-negative sparse autoencoder
and Machine Learning. Bruges (Belgium), 28-30 April 2010, d-side publi., ISBN 2-93030-10-2. Efficient online learning of a non-negative sparse autoencoder Andre Lemme, R. Felix Reinhart and Jochen J. Steil
More informationIN AVIATION it regularly occurs that an airplane encounters. Guaranteed globally optimal continuous reinforcement learning.
GUARANTEED GLOBALLY OPTIMAL CONTINUOUS REINFORCEMENT LEARNING 1 Guaranteed globally optimal continuous reinforcement learning Hildo Bijl Abstract Self-learning and adaptable autopilots have the potential
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationCOPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
More information1 Prior Probability and Posterior Probability
Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationBayesian Image Super-Resolution
Bayesian Image Super-Resolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian
More information