Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data


 Gerard McGee
 2 years ago
 Views:
Transcription
1 Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data (Oxford) in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua) Balaji Lakshminarayanan (Gatsby)
2 Bayesian Inference! Parameter vector X. X! Data items Y = y1, y2,... yn. y 1 y 2 y 3 y 4... y N! Model:! Aim: p(x, Y )=p(x) p(x Y )= NY i=1 p(y i X) p(x)p(y X) p(y )
3 Why Bayes for Machine Learning?! An important framework to frame learning.! Quantification of uncertainty.! Flexible and intuitive construction of complex models.! Straightforward derivation of learning algorithms.! Mitigation of overfitting.
4 Big Data and Bayesian Inference?! Large scale datasets are fast becoming the norm.! Analysing and extracting understanding from these data is a driver of progress in many sectors of society.! Current successes in scalable learning are optimizationbased and nonbayesian.! What is the role of Bayesian learning in world of Big Data?
5 Generic (Machine) Learning on Big Data! Stochastic optimisation using minibatches.! Stochastic gradient descent. > Stochastic Gradient Langevin Dynamics (Welling & Teh, Teh et al)! Distributed/parallel computations on cores/clusters/gpus.! MapReduce, parameter server.! Bringing the computations to the data, not the reverse.! High communication costs. > Distributed Bayesian Posterior Sampling via Moment Sharing (Xu et al)! High synchronisation costs. > Asynchronous Anytime Sequential Monte Carlo (Paige et al)
6 Generic (Bayesian) Learning on Big Data! Stochastic optimisation using minibatches.! Stochastic gradient descent.! > Stochastic Gradient Langevin Dynamics [Welling & Teh 2011, Patterson & Teh 2013, Teh et al (forthcoming)]! Distributed/parallel computations on cores/clusters/gpus.! MapReduce, parameter server.! Bringing the computations to the data, not the reverse.! High communication costs.! > Distributed Bayesian Posterior Sampling via Moment Sharing [Xu et al 2014]! High synchronisation costs.! > Asynchronous Anytime Sequential Monte Carlo [Paige et al 2014]
7 Machine Learning on Distributed Systems! Distributed storage! Distributed computation! Network communication costs y 1i y 2i y 3i y 4i
8 Embarassingly Parallel MCMC Sampling Combine samples together. {X i } i=1...n Treat as independent inference problems. Collect samples. y 1i y 2i y 3i y 4i {X ji } j=1...m,i=1...n! Only communication at the combination stage.
9 ! where Local and Global Posteriors! Each worker machine j has access only to its data subset. p j (X y j )=p j (X) pj(x) is a local prior and pj(x yj) is local posterior. IY i=1 p(y ji X)! The (target) global posterior is p(x y) / p(x) my j=1! If prior p(x) = j pj(x), then p(x y) / p(y j X) / p(x)! Given collection of samples { Xji }i=1 n from pj(. y), how do we get { Xi }i=1 n samples from p(. y)? my j=1 p j (X y j ) my j=1 p j (X y j ) p j (X)
10 Consensus Monte Carlo! Each worker machine j collects N samples {Xmn} from: p j (X y j )=p(x) 1/m IY i=1 p(y ji X)! Master machine combines samples by weighted average: 0 mx X i W j 1 A 1 m X j=1 j=1 W j X ji [Scott et al 2013]
11
12 Consensus Monte Carlo X i = mx W j 1 A 1 m X W j X ji j=1 j=1! Combination is correct if local posteriors are Gaussian.! Weights are local posterior precisions.! If not Gaussian, makes strong assumptions and unclear what local priors and weights for it to work. [Scott et al 2013]
13 Approximating Local Posterior Densities! [Neiswanger et al 2013] proposed methods to combine estimates of local posterior densities instead of samples:! Parametric: Gaussian approximation.! Nonparametric: kernel density estimation based on samples.! Semiparametric: Product of a parametric Gaussian approximation with a nonparametric KDE correction term. p(x y) / my j=1 p j (X y j ) my j=1 1 n nx K hj (X; X ji ) i=1! Combination: Product of (approximate) densities.! Sampling: Resort to MetropoliswithinGibbs.! [Wang & Dunson 2013] s Weierstrass sampler is similar, using rejection sampling instead. [Neiswanger et al 2013, Wang & Dunson 2013]
14
15 Approximating Local Posterior Densities! Parametric approximation can be quite bad unless Bernsteinvon Mises Theorem kicks in.! Complex and expensive combination step in non and semiparametric estimates.! KDE suffers from curse of dimensionality.! Performs poorly if local posteriors differ significantly.
16 Intuition and Desiderata! Distributed system with independent MCMC sampling.! Identify regions of high (global) posterior probability mass.! Each local sampler is based on local data, but concentrate on high probability regions.! High probability regions found using samples, by allowing for some small amount of communication.
17 (Not Quite) Embarrassingly Parallel MCMC! Allow some amount of communication to align worker MCMC samplers.! High probability region defined by low order moments.! Align using Expectation Propagation (EP). y 1i y 2i y 3i y 4i! Asynchronous and infrequent updates.
18 Expectation Propagation! If N is large, the worker j likelihood term p(yj X) should be well approximated by Gaussian p(y j X) q j (X) =N (X; µ j, j )! Parameters fit iteratively using a variational approach to minimize KL divergence: p(x y) p j (X y) / p(y j X) p(x) Y k6=j q k (X) {z } p j (X) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) [Minka 2001]
19 Expectation Propagation p(x y) p j (X y) / p(y j X) p(x) Y k6=j q k (X)! Update performed as follows: {z } p j (X) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( )! Compute (or estimate) first two moments µ*, Σ* of pj( X y).! Compute µj, Σj so that N(.; µj, Σj) pj( X )/Z has moments µ*, Σ*.! Computations done on natural parameters.! Generalizes to other exponential families.
20 Expectation Propagation q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) p(x)! Variational parameters fit iteratively until convergence.! EP tends to converge very quickly (when it does).! Damping updates can help convergence. p(y1 X) q1(x) p(y2 X) q2(x) p(y3 X) q3(x) p(y4 X) q4(x)! At convergence, all local posteriors agree on their first two moments. y 1i y 2i y 3i y 4i! Generalizes to hierarchical and graphical models [infer.net, Gelman et al 2014].
21 Sampling via Moment Sharing (SMS) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) p(x)! KL minimized by matching moments of pj(x y).! Moments computed by drawing MCMC samples. p(y1 X) q1(x) p(y2 X) q2(x) p(y3 X) q3(x) p(y4 X) q4(x)! All samples from all machines can be treated as approximate samples from full posterior given all data. y 1i y 2i y 3i y 4i! Communicate only moments, synchronous or asynchronous.
22 Sampling via Moment Sharing (SMS) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) p j ( ) p(x)! KL minimized by matching moments of pj(x y).! Moments computed by drawing MCMC samples. p(y1 X) q1(x) p(y2 X) q2(x) p(y3 X) q3(x) p(y4 X) q4(x)! All samples from all machines can be treated as approximate samples from full posterior given all data. y 1i y 2i y 3i y 4i! Communicate only moments, synchronous or asynchronous.
23 Sampling via Moment Sharing (SMS) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) p j ( ) p(x)! KL minimized by matching moments of pj(x y).! Moments computed by drawing MCMC samples. p(y1 X) q1(x) p(y2 X) q2(x) p(y3 X) q3(x) p(y4 X) q4(x)! All samples from all machines can be treated as approximate samples from full posterior given all data. y 1i y 2i y 3i y 4i {X ji }! Communicate only moments, synchronous or asynchronous.
24 Sampling via Moment Sharing (SMS) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) p j ( ) p(x)! KL minimized by matching moments of pj(x y).! Moments computed by drawing MCMC samples. p(y1 X) q1(x) p(y2 X) q2(x) p(y3 X) q3(x) p(y4 X) q4(x)! All samples from all machines can be treated as approximate samples from full posterior given all data. y 1i y 2i y 3i y 4i {X ji } ) (µ, )! Communicate only moments, synchronous or asynchronous.
25 Sampling via Moment Sharing (SMS) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) p j ( ) p(x)! KL minimized by matching moments of pj(x y).! Moments computed by drawing MCMC samples. p(y1 X) q1(x) p(y2 X) q2(x) p(y3 X) q3(x) p(y4 X) q4(x)! All samples from all machines can be treated as approximate samples from full posterior given all data. y 1i y 2i y 3i y 4i {X ji } ) (µ, ) ) (µ j, j )! Communicate only moments, synchronous or asynchronous.
26 Sampling via Moment Sharing (SMS) q new j ( ) = arg min N ( ;µ, ) KL p j( y) k N ( ; µ, )p j ( ) p(x)! KL minimized by matching moments of pj(x y). p j ( ) q j ( )! Moments computed by drawing MCMC samples. p(y1 X) q1(x) p(y2 X) q2(x) p(y3 X) q3(x) p(y4 X) q4(x)! All samples from all machines can be treated as approximate samples from full posterior given all data. y 1i y 2i y 3i y 4i {X ji } ) (µ, ) ) (µ j, j )! Communicate only moments, synchronous or asynchronous.
27 Bayesian Logistic Regression k T N/m k T N/m 10 3! Simulated dataset.! d=20, # data items N=1000.! NUTS base sampler.! # workers m = 4,10,50.! # MCMC iters T = 1000,1000,10000.! # EP iters k given as vertical lines k T N/m 10 3
28 Bayesian Logistic Regression! MSE of posterior mean, as function of total # iterations SMS(s) SMS(a) SCOT NEIS(p) NEIS(n) WANG k T m x 10 5
29 Bayesian Logistic Regression! Approximate KL, MSE of predictive probabilities, as function of total # iterations SMS(s) SMS(a) SCOT WANG k T m x SMS(s) SMS(a) SCOT NEIS(n) WANG k T m x 10 5
30 Bayesian Logistic Regression! Approximate KL as function of # nodes SMS(s,s) SMS(s,e) SMS(a,s) SMS(a,e) SCOT XING(p) m=8 m=16 m=32 m=48 m=64
31 Bayesian Logistic Regression! Approximate KL, as function of # iterations per node and # likelihood evaluations SMS(s) SMS(a) m = 8 m = 16 m = 32 m = 48 m = SMS(s) SMS(a) m = 8 m = 16 m = 32 m = 48 m = k T x k T N/m x 10 8
32 SpikeandSlab Sparse Regression! Posterior mean coefficients k T N/m k T N/m 10 3
33 Some Remarks! Scalable distributed MCMC sampling.! A bit of communication goes a long way.! Issue with stochasticity of moment estimates:! EP theory does not cover stochastic updates.! Not clear what is the best stochastic update to use.! Nor how can we characterise convergence and quality of approximation.! Matlab source: https://github.com/chokkyvista/smssample
34 Other Approaches to Scalable Bayes! Median posterior [Stanislav et al 2014]:! Embeds local posteriors into an RKHS, and computes the geometric median.! Improves robustness to outliers in data.! Stochastic gradient MCMC approaches:! Reduce cost of each MCMC step by using data subset.! A distributed version have been developed.! [Welling & Teh 2011, Ahn et al 2012, 2014, Teh, Thiery & Vollmer (forthcoming), Bardenet et al 2014]! Variational approaches:! Faster convergence, with possibly significant bias.! Recent works successfully extend these to large scale datasets using stochastic approximation techniques [Hoffman et al 2010, 2013, etc] and to flexible parameterized variational distributions [Mnih & Gregor 2014, Rezende et al 2014, Kingma & Welling 2014].
35 Bigger Picture! The probabilistic modelling/bayesian inference approach offers a principled and powerful data analysis framework.! Standard methodologies do not extend easily to Big Data.! Important to develop generic methodologies allowing these approaches to be applicable on Big Data.! Bias/variance tradeoffs becoming more important.! Low bias exact methods do not scale as well to Big Data.
36 Thank you! Thanks for funding:
Distributed Bayesian Posterior Sampling via Moment Sharing
Distributed Bayesian Posterior Sampling via Moment Sharing Minjie Xu 1, Balaji Lakshminarayanan 2, Yee Whye Teh 3, Jun Zhu 1, and Bo Zhang 1 1 State Key Lab of Intelligent Technology and Systems; Tsinghua
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationBig Data, Statistics, and the Internet
Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes
More informationComputational Statistics for Big Data
Lancaster University Computational Statistics for Big Data Author: 1 Supervisors: Paul Fearnhead 1 Emily Fox 2 1 Lancaster University 2 The University of Washington September 1, 2015 Abstract The amount
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationBayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationParallelization Strategies for Multicore Data Analysis
Parallelization Strategies for Multicore Data Analysis WeiChen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management
More informationSection 5. Stan for Big Data. Bob Carpenter. Columbia University
Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model
More informationParallel & Distributed Optimization. Based on Mark Schmidt s slides
Parallel & Distributed Optimization Based on Mark Schmidt s slides Motivation behind using parallel & Distributed optimization Performance Computational throughput have increased exponentially in linear
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationExploiting the Statistics of Learning and Inference
Exploiting the Statistics of Learning and Inference Max Welling Institute for Informatics University of Amsterdam Science Park 904, Amsterdam, Netherlands m.welling@uva.nl Abstract. When dealing with datasets
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationLearning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
More informationArtificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =
Artificial Intelligence 15381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information
More informationPS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
More informationL3: Statistical Modeling with Hadoop
L3: Statistical Modeling with Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationGaussian Processes to Speed up Hamiltonian Monte Carlo
Gaussian Processes to Speed up Hamiltonian Monte Carlo Matthieu Lê Murray, Iain http://videolectures.net/mlss09uk_murray_mcmc/ Rasmussen, Carl Edward. "Gaussian processes to speed up hybrid Monte Carlo
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationParallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014
Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview
More informationNeural Networks. CAP5610 Machine Learning Instructor: GuoJun Qi
Neural Networks CAP5610 Machine Learning Instructor: GuoJun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationIntroduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
More informationLab 8: Introduction to WinBUGS
40.656 Lab 8 008 Lab 8: Introduction to WinBUGS Goals:. Introduce the concepts of Bayesian data analysis.. Learn the basic syntax of WinBUGS. 3. Learn the basics of using WinBUGS in a simple example. Next
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationBayesian Statistics in One Hour. Patrick Lam
Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationProbabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur
Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationTutorial on Markov Chain Monte Carlo
Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,
More informationA Bootstrap MetropolisHastings Algorithm for Bayesian Analysis of Big Data
A Bootstrap MetropolisHastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015 Abstract MCMC methods have proven to be a very powerful tool for analyzing
More informationBayesian Factorization Machines
Bayesian Factorization Machines Christoph Freudenthaler, Lars SchmidtThieme Information Systems & Machine Learning Lab University of Hildesheim 31141 Hildesheim {freudenthaler, schmidtthieme}@ismll.de
More informationImputing Values to Missing Data
Imputing Values to Missing Data In federated data, between 30%70% of the data points will have at least one missing attribute  data wastage if we ignore all records with a missing value Remaining data
More informationTowards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
More informationAusterity in MCMC Land: Cutting the MetropolisHastings Budget
Anoop Korattikara AKORATTI@UCI.EDU School of Information & Computer Sciences, University of California, Irvine, CA 92617, USA Yutian Chen YUTIAN.CHEN@ENG.CAM.EDU Department of Engineering, University of
More informationPrinciples of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
More informationInference on Phasetype Models via MCMC
Inference on Phasetype Models via MCMC with application to networks of repairable redundant systems Louis JM Aslett and Simon P Wilson Trinity College Dublin 28 th June 202 Toy Example : Redundant Repairable
More informationDirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid ElArini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
More informationManifold Learning with Variational Autoencoder for Medical Image Analysis
Manifold Learning with Variational Autoencoder for Medical Image Analysis Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill eunbyung@cs.unc.edu Abstract Manifold
More informationTwo Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More informationParametric Models Part I: Maximum Likelihood and Bayesian Density Estimation
Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015
More informationDeterministic Samplingbased Switching Kalman Filtering for Vehicle Tracking
Proceedings of the IEEE ITSC 2006 2006 IEEE Intelligent Transportation Systems Conference Toronto, Canada, September 1720, 2006 WA4.1 Deterministic Samplingbased Switching Kalman Filtering for Vehicle
More informationMachine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
More informationData Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
More informationarxiv:1410.4984v1 [cs.dc] 18 Oct 2014
Gaussian Process Models with Parallelization and GPU acceleration arxiv:1410.4984v1 [cs.dc] 18 Oct 2014 Zhenwen Dai Andreas Damianou James Hensman Neil Lawrence Department of Computer Science University
More informationModeling and Analysis of Call Center Arrival Data: A Bayesian Approach
Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationA Latent Variable Approach to Validate Credit Rating Systems using R
A Latent Variable Approach to Validate Credit Rating Systems using R Chicago, April 24, 2009 Bettina Grün a, Paul Hofmarcher a, Kurt Hornik a, Christoph Leitner a, Stefan Pichler a a WU Wien Grün/Hofmarcher/Hornik/Leitner/Pichler
More informationIntroduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011
Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning
More informationBayesian networks  Timeseries models  Apache Spark & Scala
Bayesian networks  Timeseries models  Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup  November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationMachine Learning Logistic Regression
Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.
More informationCheng Soon Ong & Christfried Webers. Canberra February June 2016
c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationMaster s thesis tutorial: part III
for the Autonomous Compliant Research group Tinne De Laet, Wilm Decré, Diederik Verscheure Katholieke Universiteit Leuven, Department of Mechanical Engineering, PMA Division 30 oktober 2006 Outline General
More informationAn Introduction to Using WinBUGS for CostEffectiveness Analyses in Health Economics
Slide 1 An Introduction to Using WinBUGS for CostEffectiveness Analyses in Health Economics Dr. Christian Asseburg Centre for Health Economics Part 1 Slide 2 Talk overview Foundations of Bayesian statistics
More informationFeature Engineering in Machine Learning
Research Fellow Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia August 21, 2015 Outline A Machine Learning Primer Machine Learning and Data Science BiasVariance Phenomenon
More informationBayesian Image SuperResolution
Bayesian Image SuperResolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationA Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H906
More informationDetection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
More informationLatent variable and deep modeling with Gaussian processes; application to system identification. Andreas Damianou
Latent variable and deep modeling with Gaussian processes; application to system identification Andreas Damianou Department of Computer Science, University of Sheffield, UK Brown University, 17 Feb. 2016
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationLEARNING FROM BIG DATA
LEARNING FROM BIG DATA Mattias Villani Division of Statistics and Machine Learning Department of Computer and Information Science Linköping University MATTIAS VILLANI (STIMA, LIU) LEARNING FROM BIG DATA
More informationBayesX  Software for Bayesian Inference in Structured Additive Regression
BayesX  Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, LudwigMaximiliansUniversity Munich
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationLogistic Regression for Spam Filtering
Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationNonparametric adaptive age replacement with a onecycle criterion
Nonparametric adaptive age replacement with a onecycle criterion P. CoolenSchrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK email: Pauline.Schrijner@durham.ac.uk
More informationA crash course in probability and Naïve Bayes classification
Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationBig Data need Big Model 1/44
Big Data need Big Model 1/44 Andrew Gelman, Bob Carpenter, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell,... Department of Statistics,
More informationModelbased Synthesis. Tony O Hagan
Modelbased Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
More informationJiří Matas. Hough Transform
Hough Transform Jiří Matas Center for Machine Perception Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague Many slides thanks to Kristen Grauman and Bastian
More informationScalable Machine Learning  or what to do with all that Big Data infrastructure
 or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Clickthrough prediction Personalized Spam Detection
More informationMethods of Data Analysis Working with probability distributions
Methods of Data Analysis Working with probability distributions Week 4 1 Motivation One of the key problems in nonparametric data analysis is to create a good model of a generating probability distribution,
More informationReliability estimators for the components of series and parallel systems: The Weibull model
Reliability estimators for the components of series and parallel systems: The Weibull model Felipe L. Bhering 1, Carlos Alberto de Bragança Pereira 1, Adriano Polpo 2 1 Department of Statistics, University
More informationThe Exponential Family
The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural
More informationLinear regression methods for large n and streaming data
Linear regression methods for large n and streaming data Large n and small or moderate p is a fairly simple problem. The sufficient statistic for β in OLS (and ridge) is: The concept of sufficiency is
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal  the stuff biology is not
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationMore details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a twoarmed trial comparing
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationValidation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT
Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulationbased method designed to establish that software
More informationUnsupervised Learning
Unsupervised Learning Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK zoubin@gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~zoubin September 16, 2004 Abstract We give
More informationAn Internal Model for Operational Risk Computation
An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFFRiskLab, Madrid http://www.risklabmadrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli
More informationA Probabilistic Model for Online Document Clustering with Application to Novelty Detection
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin
More informationDistance based clustering
// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means
More informationTracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking
Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local
More information