Section 5. Stan for Big Data. Bob Carpenter. Columbia University
Slide 1. Section 5: Stan for Big Data. Bob Carpenter, Columbia University
Slide 2. Part I: Overview
Slide 3. Scaling and Evaluation
- [Figure: data size in bytes (1e6 up to 1e18) plotted against model complexity, contrasting the "big data" and "big model" regimes with the current state of the art; the goal is an approach that is both big model and big data]
- Types of scaling: data, parameters, models
Slide 4. Riemannian Manifold HMC
- Best-mixing MCMC method for a fixed number of continuous parameters
- Moves on a Riemannian manifold rather than a Euclidean one, adapting to position-dependent curvature
- "geonuts" generalizes NUTS to RHMC (Betancourt, arXiv)
- SoftAbs metric (Betancourt, arXiv): eigendecompose the Hessian and condition it
  - computationally feasible alternative to the original Fisher information metric of Girolami and Calderhead (JRSS Series B)
- Requires third-order derivatives and an implicit integrator
- Code complete; awaiting higher-order autodiff
Slide 5. Adiabatic Sampling
- Physically motivated alternative to simulated annealing and tempering (not really simulated!)
- Supplies an external heat bath
- Operates through a contact manifold
- System relaxes more naturally between energy levels
- Betancourt paper on arXiv
- Prototype complete
Slide 6. Black-Box Variational Inference
- Black box, so it can fit any Stan model
- Multivariate normal approximation to the unconstrained posterior
  - covariance: diagonal (mean field) or full rank
  - not a Laplace approximation: centered around the posterior mean, not the mode
  - transformed back to the constrained space (built-in Jacobians)
- Stochastic gradient-descent optimization
  - ELBO gradient estimated via Monte Carlo + autodiff
- Returns approximate posterior mean / covariance
- Returns a sample transformed to the constrained space
Slide 7. Black-Box EP
- Fast, approximate inference (like VB)
  - VB and EP minimize the KL divergence in opposite directions
  - especially useful for Gaussian processes
- Asynchronous, data-parallel expectation propagation (EP)
- Cavity distributions control subsample variance
- Prototype stage; collaborating with Seth Flaxman, Aki Vehtari, Pasi Jylänki, John Cunningham, Nicholas Chopin, Christian Robert
Slide 8. Maximum Marginal Likelihood
- Fast, approximate inference for hierarchical models
- Marginalize out the lower-level parameters
- Optimize the higher-level parameters and fix them
- Optimize the lower-level parameters given the higher-level ones
- Errors estimated as in MLE
- aka "empirical Bayes", but not fully Bayesian, and no more empirical than full Bayes
- Design complete; awaiting parameter tagging
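To make the two-stage scheme concrete, here is a minimal Python sketch (not Stan code; the model and data are made up) for a normal-normal hierarchical model, where the lower-level parameters can be marginalized analytically:

```python
# Maximum marginal likelihood (empirical Bayes) for a toy normal-normal model:
#   y_j ~ Normal(theta_j, sigma_j),  theta_j ~ Normal(mu, tau).
# Marginalizing out theta_j gives y_j ~ Normal(mu, sqrt(sigma_j^2 + tau^2)).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([2.1, -0.4, 1.3, 0.2, 3.5, 1.8])     # toy group-level estimates
sigma = np.array([1.0, 0.8, 1.2, 0.9, 1.1, 1.0])  # known group-level scales

def neg_log_marginal(phi):
    mu, log_tau = phi
    tau = np.exp(log_tau)
    return -np.sum(norm.logpdf(y, mu, np.sqrt(sigma**2 + tau**2)))

# Step 1: optimize the higher-level parameters (mu, tau) and fix them.
opt = minimize(neg_log_marginal, x0=np.zeros(2), method="L-BFGS-B")
mu_hat, tau_hat = opt.x[0], np.exp(opt.x[1])

# Step 2: optimize the lower-level parameters given the fixed higher-level ones;
# for this conjugate model that is the closed-form conditional posterior mean.
shrinkage = tau_hat**2 / (tau_hat**2 + sigma**2)
theta_hat = mu_hat + shrinkage * (y - mu_hat)
print(mu_hat, tau_hat, theta_hat)
```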
Slide 9. Part II: Posterior Modes & Laplace Approximation
Slide 10. Laplace Approximation
- Maximum (penalized) likelihood as approximate Bayes
- Laplace approximation to the posterior
- Compute the posterior mode via optimization
- Estimate the posterior as
  \theta^* = \arg\max_\theta \, p(\theta \mid y), \qquad p(\theta \mid y) \approx \mathrm{MultiNormal}(\theta \mid \theta^*, -H^{-1})
- H is the Hessian of the log posterior, H_{i,j} = \frac{\partial^2}{\partial \theta_i \, \partial \theta_j} \log p(\theta \mid y), evaluated at \theta = \theta^*
Slide 11. Stan's Laplace Approximation
- L-BFGS to compute the posterior mode \theta^*
- Automatic differentiation to compute H
  - current (R): finite differences of gradients
  - soon: second-order automatic differentiation
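The recipe is simple enough to sketch outside Stan; the following Python/SciPy example (a made-up two-parameter log posterior, with finite differences standing in for autodiff) finds the mode with L-BFGS and builds the multivariate normal approximation:

```python
# Laplace approximation sketch: L-BFGS for the mode, finite differences of the
# gradient for the Hessian of the log posterior, multivariate normal approximation.
import numpy as np
from scipy.optimize import minimize, approx_fprime

def log_posterior(theta):
    # toy unnormalized log posterior (a correlated Gaussian-like bowl)
    a, b = theta
    return -0.5 * (a**2 + 4.0 * b**2 + 2.0 * a * b) + 3.0 * a

def grad_log_posterior(theta):
    return approx_fprime(theta, log_posterior, 1e-6)

# 1. L-BFGS to compute the posterior mode theta*
opt = minimize(lambda th: -log_posterior(th), x0=np.zeros(2), method="L-BFGS-B")
theta_star = opt.x

# 2. Hessian H of the log posterior via finite differences of the gradient
eps = 1e-5
H = np.column_stack([
    (grad_log_posterior(theta_star + eps * e)
     - grad_log_posterior(theta_star - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
H = 0.5 * (H + H.T)   # symmetrize

# 3. p(theta | y) is approximated by MultiNormal(theta*, -H^{-1})
Sigma = np.linalg.inv(-H)
print(theta_star, Sigma)
```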
Slide 12. Part III: Variational Bayes
Slide 13. VB in a Nutshell
- y is the observed data, θ the parameters
- Goal: approximate the posterior p(θ | y) with a convenient approximating density g(θ | φ)
- φ is a vector of parameters of the approximating density
- Given data y, VB computes the φ* minimizing the KL divergence from the approximation g(θ | φ) to the posterior p(θ | y):
  \phi^* = \arg\min_\phi \, \mathrm{KL}[\, g(\theta \mid \phi) \,\|\, p(\theta \mid y) \,] = \arg\max_\phi \int_\Theta g(\theta \mid \phi) \, \log \frac{p(\theta \mid y)}{g(\theta \mid \phi)} \, d\theta
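A standard identity (not spelled out on the slide) explains why, on the later slides, the divergence only needs to be known up to a constant: it differs from the negative evidence lower bound (ELBO) by log p(y), which does not depend on φ, so maximizing the ELBO minimizes the KL divergence even though p(θ | y) is only available up to the normalizing constant p(y):

\mathrm{KL}[\, g(\theta \mid \phi) \,\|\, p(\theta \mid y) \,] = \log p(y) - \underbrace{\mathbb{E}_{g(\theta \mid \phi)}\big[\log p(y, \theta) - \log g(\theta \mid \phi)\big]}_{\mathrm{ELBO}(\phi)}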
Slide 14. KL-Divergence Example
- [Figure from Bishop (2006), Pattern Recognition and Machine Learning: contours of a correlated bivariate Gaussian p over variables z1 and z2, with a factorized Gaussian approximation g fit in two ways]
- Green: true distribution p; red: approximating distribution g
- (a) VB-like: minimize KL[g || p]; (b) EP-like: minimize KL[p || g], the direction used in expectation propagation
- VB systematically underestimates the posterior variance
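The variance point can be checked numerically; the following sketch (my own illustration, not from the slides) compares the mean-field optimum of KL[g || p] with the moment-matching optimum of KL[p || g] for a correlated bivariate Gaussian:

```python
# For a correlated bivariate Gaussian p and a factorized (mean-field) Gaussian g,
# minimizing KL[g || p] sets each component variance to 1 / Lambda_ii (Lambda =
# precision of p), while minimizing KL[p || g] matches the marginal variances Sigma_ii.
import numpy as np

rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])          # covariance of the "true posterior" p
Lambda = np.linalg.inv(Sigma)           # precision matrix of p

var_vb_like = 1.0 / np.diag(Lambda)     # KL[g||p] optimum: 1 - rho^2 = 0.19, too narrow
var_ep_like = np.diag(Sigma)            # KL[p||g] optimum: 1.0, matches the marginals

print("true marginal variances:    ", np.diag(Sigma))
print("VB-like (KL[g||p]) variances:", var_vb_like)
print("EP-like (KL[p||g]) variances:", var_ep_like)
```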
Slide 15. VB vs. Laplace
- [Figure from Bishop (2006), Pattern Recognition and Machine Learning: a skewed target density with its Laplace and variational approximations, shown both on the density scale and as negative log densities; solid yellow: target, red: Laplace, green: VB]
- VB approximates the posterior mean; Laplace the posterior mode
Slide 16. Stan's Black-Box VB
- Typically VB uses a custom g() per model, based on conjugacy and analytic updates
- Stan instead uses black-box VB with a multivariate Gaussian g:
  g(\theta \mid \phi) = \mathrm{MultiNormal}(\theta \mid \mu, \Sigma)
- Defined for the unconstrained posterior; e.g., scales σ are log-transformed, with the Jacobian adjustment
- Stan provides two versions
  - mean field: Σ diagonal
  - general: Σ dense
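To make the log transform and Jacobian adjustment concrete, here is a tiny sketch (a toy likelihood of my own, not Stan's generated code) of a log density rewritten in terms of u = log σ:

```python
# A constrained scale sigma > 0 handled on the unconstrained scale u = log(sigma):
# the change of variables adds log |d sigma / d u| = log(exp(u)) = u to the log density.
import numpy as np

def log_density_constrained(mu, sigma, y):
    # toy likelihood: y ~ Normal(mu, sigma), up to a constant, with flat priors
    return np.sum(-0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma))

def log_density_unconstrained(mu, u, y):
    sigma = np.exp(u)            # inverse transform back to the constrained scale
    jacobian_adjustment = u      # log |d sigma / d u| = u
    return log_density_constrained(mu, sigma, y) + jacobian_adjustment

y = np.array([1.2, -0.3, 0.7, 2.1])
print(log_density_unconstrained(0.5, np.log(1.5), y))
```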
Slide 17. Stan's VB: Computation
- Use L-BFGS optimization to optimize the variational parameters φ
- Requires a differentiable KL[g(θ | φ) || p(θ | y)], but only up to a constant (i.e., use the evidence lower bound, ELBO)
- Approximate the KL divergence and its gradient via Monte Carlo
  - the KL divergence is an expectation with respect to the approximation g(θ | φ)
  - Monte Carlo draws are i.i.d. from the approximating multivariate normal
  - only an approximate gradient calculation is needed for soundness, so a few Monte Carlo draws per iteration are enough
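The following sketch (an assumed mean-field setup of my own, not Stan's implementation) shows one way the ELBO gradient can be estimated by Monte Carlo with reparameterized draws, so that a handful of draws per gradient evaluation suffices:

```python
# Monte Carlo ELBO gradient for a mean-field Gaussian g with parameters (mu, omega),
# using the reparameterization theta = mu + exp(omega) * eps, eps ~ Normal(0, I).
# The Gaussian entropy contributes sum(omega) + const to the ELBO, hence the +1 below.
import numpy as np

def grad_log_joint(theta):
    # toy target: standard normal log joint (gradient = -theta); the optimum of the
    # approximation is then exactly mu = 0, sd = 1
    return -theta

def elbo_gradient(mu, omega, n_draws=5):
    g_mu = np.zeros_like(mu)
    g_omega = np.zeros_like(omega)
    for _ in range(n_draws):
        eps = np.random.randn(mu.shape[0])
        theta = mu + np.exp(omega) * eps
        g = grad_log_joint(theta)                     # Stan would use autodiff here
        g_mu += g / n_draws                           # d theta / d mu = 1
        g_omega += g * eps * np.exp(omega) / n_draws  # d theta / d omega
    g_omega += 1.0                                    # gradient of the entropy term
    return g_mu, g_omega

# a few noisy draws per gradient evaluation are enough to move toward the optimum
mu, omega = np.array([2.0, -1.0]), np.array([-1.0, 1.0])
for _ in range(200):
    g_mu, g_omega = elbo_gradient(mu, omega)
    mu += 0.05 * g_mu
    omega += 0.05 * g_omega
print(mu, np.exp(omega))   # drifts toward mu = (0, 0), sd = (1, 1)
```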
Slide 18. Stan's VB: Computation (cont.)
- To support compatible plug-in inference:
  - draw a Monte Carlo sample θ^(1), ..., θ^(M) with θ^(m) ~ MultiNormal(μ*, Σ*)
  - inverse transform from the unconstrained to the constrained scale
  - report to the user in the same way as MCMC draws
- Future: reweight the θ^(m) via importance sampling with respect to the true posterior to improve expectation calculations
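The proposed reweighting could look roughly like the following sketch (toy target and fitted approximation of my own): weight each draw by the joint density over the approximating density, then self-normalize:

```python
# Importance reweighting of draws from the Gaussian approximation g:
# w_m proportional to p(y, theta_m) / g(theta_m), self-normalized, then used
# to estimate posterior expectations such as E[theta | y].
import numpy as np
from scipy.stats import multivariate_normal

def log_joint(theta):
    # toy unnormalized log posterior, standing in for log p(y, theta)
    return -0.5 * np.sum(theta ** 2, axis=-1) - 0.1 * np.sum(theta ** 4, axis=-1)

mu_star = np.zeros(2)
Sigma_star = 0.8 * np.eye(2)           # the fitted variational approximation g
g = multivariate_normal(mu_star, Sigma_star)

M = 10_000
draws = g.rvs(size=M)                  # theta^(1), ..., theta^(M) ~ g
log_w = log_joint(draws) - g.logpdf(draws)
w = np.exp(log_w - np.max(log_w))
w /= np.sum(w)                         # self-normalized importance weights

posterior_mean_est = w @ draws         # weighted estimate of E[theta | y]
print(posterior_mean_est)
```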
Slide 19. Near Future: Stochastic VB
- Data-streaming form of VB; scales to billions of observations
- Hoffman et al. (2013). Stochastic variational inference. JMLR 14.
- Mashup of stochastic gradient (Robbins and Monro, 1951) and VB:
  - subsample the data (e.g., stream in minibatches)
  - upweight each minibatch to the full data set size
  - use it to make an unbiased estimate of the true gradient
  - take a gradient step to minimize the KL divergence
- Prototype code complete
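The key ingredient, upweighting a minibatch so that its gradient is an unbiased estimate of the full-data gradient, is easy to check on a toy likelihood; the sketch below (not the Stan prototype) does exactly that:

```python
# Upweighted minibatch gradients: scaling a size-B minibatch gradient by N / B gives
# an unbiased (if noisy) estimate of the full-data gradient, which stochastic VB then
# feeds into gradient steps with a Robbins-Monro step-size schedule such as a / (t + b).
import numpy as np

rng = np.random.default_rng(0)
N, B = 100_000, 100
data = rng.normal(loc=3.0, scale=1.0, size=N)  # stands in for a huge, streamed data set
theta = 1.0                                    # point at which gradients are compared

def grad_loglik(y, theta):
    # gradient of sum_i log Normal(y_i | theta, 1) with respect to theta
    return np.sum(y - theta)

full_grad = grad_loglik(data, theta)

minibatch_grads = [
    (N / B) * grad_loglik(rng.choice(data, size=B, replace=False), theta)
    for _ in range(2000)
]
# each upweighted minibatch gradient is noisy, but their average matches the full gradient
print(full_grad, np.mean(minibatch_grads))
```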
Slide 20. The End (Section 5)