Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data


1 Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data
Yee Whye Teh (Oxford), in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua); Balaji Lakshminarayanan (Gatsby)
2 Bayesian Inference
- Parameter vector X.
- Data items Y = y_1, y_2, ..., y_N.
- Model: p(X, Y) = p(X) ∏_{i=1}^N p(y_i | X)
- Aim: the posterior p(X | Y) = p(X) p(Y | X) / p(Y)
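For conjugate models, the posterior above is available in closed form; a minimal Beta-Bernoulli sketch of the prior-times-likelihood update (the function name is ours):

```python
# Conjugate Beta-Bernoulli example: prior X ~ Beta(a, b), data y_1..y_N in
# {0, 1}. The posterior p(X | Y) = p(X) p(Y | X) / p(Y) is again a Beta,
# Beta(a + #ones, b + #zeros), so Bayes' rule reduces to two additions.
def beta_bernoulli_posterior(a, b, data):
    ones = sum(data)
    zeros = len(data) - ones
    return a + ones, b + zeros

a_post, b_post = beta_bernoulli_posterior(1.0, 1.0, [1, 1, 0, 1, 0, 1, 1])
posterior_mean = a_post / (a_post + b_post)   # E[X | Y] = a' / (a' + b')
```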
3 Why Bayes for Machine Learning?
- An important framework in which to frame learning.
- Quantification of uncertainty.
- Flexible and intuitive construction of complex models.
- Straightforward derivation of learning algorithms.
- Mitigation of overfitting.
4 Big Data and Bayesian Inference?
- Large-scale datasets are fast becoming the norm.
- Analysing and extracting understanding from these data is a driver of progress in many sectors of society.
- Current successes in scalable learning are optimization-based and non-Bayesian.
- What is the role of Bayesian learning in a world of Big Data?
5 Generic (Machine) Learning on Big Data
- Stochastic optimisation using minibatches.
  - Stochastic gradient descent. > Stochastic Gradient Langevin Dynamics (Welling & Teh, Teh et al)
- Distributed/parallel computations on cores/clusters/GPUs.
  - MapReduce, parameter server.
  - Bringing the computations to the data, not the reverse.
- High communication costs. > Distributed Bayesian Posterior Sampling via Moment Sharing (Xu et al)
- High synchronisation costs. > Asynchronous Anytime Sequential Monte Carlo (Paige et al)
6 Generic (Bayesian) Learning on Big Data
- Stochastic optimisation using minibatches.
  - Stochastic gradient descent. > Stochastic Gradient Langevin Dynamics [Welling & Teh 2011, Patterson & Teh 2013, Teh et al (forthcoming)]
- Distributed/parallel computations on cores/clusters/GPUs.
  - MapReduce, parameter server.
  - Bringing the computations to the data, not the reverse.
- High communication costs. > Distributed Bayesian Posterior Sampling via Moment Sharing [Xu et al 2014]
- High synchronisation costs. > Asynchronous Anytime Sequential Monte Carlo [Paige et al 2014]
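The Stochastic Gradient Langevin Dynamics step referenced on this slide adds a minibatch-rescaled log-posterior gradient and Gaussian noise whose variance matches the step size. Below is a toy sketch for estimating a Gaussian mean, not the tuned algorithm of the cited papers; the model, constants, and names (`sgld_step`, `prior_var`) are illustrative:

```python
import math
import random

# Minimal SGLD sketch for the mean of a N(theta, 1) likelihood with a
# N(0, 10) prior. Each step follows half the (minibatch-estimated)
# log-posterior gradient and injects Gaussian noise of variance eps.
def sgld_step(theta, minibatch, N, eps, prior_var=10.0):
    n = len(minibatch)
    grad_prior = -theta / prior_var                          # d/dtheta log p(theta)
    grad_lik = (N / n) * sum(y - theta for y in minibatch)   # rescaled minibatch gradient
    noise = random.gauss(0.0, math.sqrt(eps))
    return theta + 0.5 * eps * (grad_prior + grad_lik) + noise

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(1000)]
theta, samples = 0.0, []
for t in range(2000):
    batch = random.sample(data, 20)
    theta = sgld_step(theta, batch, N=len(data), eps=1e-3)
    if t >= 500:                       # discard burn-in
        samples.append(theta)
post_mean = sum(samples) / len(samples)   # should sit near the data mean
```

With a decaying step size the injected noise eventually dominates the minibatch gradient noise and the iterates approach posterior samples; the fixed step size here is only for brevity.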
7 Machine Learning on Distributed Systems
- Distributed storage
- Distributed computation
- Network communication costs
[Figure: data shards y_ji stored across the worker machines]
8 Embarrassingly Parallel MCMC Sampling
- Treat as m independent inference problems.
- Collect samples {X_ji} j=1...m, i=1...n on the workers.
- Combine the samples together into {X_i} i=1...n.
- Only communication at the combination stage.
9 Local and Global Posteriors
- Each worker machine j has access only to its data subset y_j:
  p_j(X | y_j) ∝ p_j(X) ∏_{i=1}^I p(y_ji | X)
  where p_j(X) is a local prior and p_j(X | y_j) is the local posterior.
- The (target) global posterior is
  p(X | y) ∝ p(X) ∏_{j=1}^m p(y_j | X) ∝ p(X) ∏_{j=1}^m p_j(X | y_j) / p_j(X)
- If the prior factorises as p(X) = ∏_j p_j(X), then p(X | y) ∝ ∏_{j=1}^m p_j(X | y_j).
- Given collections of samples {X_ji} i=1...n from p_j(· | y_j), how do we get samples {X_i} i=1...n from p(· | y)?
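The factorised-prior identity on this slide can be checked numerically in a toy Gaussian-mean model, where every local posterior is Gaussian and the product just adds precisions and precision-weighted means; a sketch under those assumptions (all names are ours):

```python
import random

# Numeric check, in a toy Gaussian-mean model, that the product of local
# posteriors recovers the global posterior: X ~ N(0, v0), y_i ~ N(X, 1).
# With the fractionated local prior p_j(X) = p(X)^(1/m), Gaussian precisions
# and precision-weighted means simply add across the m workers.
random.seed(1)
v0, m = 4.0, 5
data = [random.gauss(1.5, 1.0) for _ in range(100)]
shards = [data[j::m] for j in range(m)]          # 5 shards of 20 points

def gaussian_posterior(prior_prec, ys):
    prec = prior_prec + len(ys)                  # unit-variance likelihood
    mean = sum(ys) / prec                        # prior mean is 0
    return prec, mean

# Global posterior from all the data.
g_prec, g_mean = gaussian_posterior(1.0 / v0, data)

# Product of the m local posteriors, each using prior precision (1/v0)/m.
local = [gaussian_posterior(1.0 / (m * v0), s) for s in shards]
l_prec = sum(p for p, _ in local)
l_mean = sum(p * mu for p, mu in local) / l_prec
# l_prec == g_prec and l_mean == g_mean, up to floating point
```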
10 Consensus Monte Carlo
- Each worker machine j collects samples {X_ji} from
  p_j(X | y_j) = p(X)^{1/m} ∏_{i=1}^I p(y_ji | X)
- The master machine combines the samples by weighted average:
  X_i = (∑_{j=1}^m W_j)^{-1} ∑_{j=1}^m W_j X_ji
[Scott et al 2013]
12 Consensus Monte Carlo
  X_i = (∑_{j=1}^m W_j)^{-1} ∑_{j=1}^m W_j X_ji
- The combination is correct if the local posteriors are Gaussian.
- The weights are then the local posterior precisions.
- If the local posteriors are not Gaussian, the combination makes strong assumptions, and it is unclear what local priors and weights would make it work.
[Scott et al 2013]
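The combination rule above can be sketched for a scalar parameter, weighting each worker by its empirical posterior precision as the slide prescribes; a minimal sketch (the helper names are ours):

```python
# Sketch of the Consensus Monte Carlo combination step [Scott et al 2013]:
# the i-th combined draw is the precision-weighted average of the i-th draw
# from each worker, X_i = (sum_j W_j)^{-1} sum_j W_j X_ji. For a scalar X we
# use the inverse of each worker's empirical sample variance as its weight.
def consensus_combine(worker_samples):
    # worker_samples[j] is the list of n scalar draws from worker j.
    def precision(xs):
        mu = sum(xs) / len(xs)
        return 1.0 / (sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))
    weights = [precision(xs) for xs in worker_samples]
    total = sum(weights)
    n = len(worker_samples[0])
    return [sum(w * xs[i] for w, xs in zip(weights, worker_samples)) / total
            for i in range(n)]

# Two equally precise workers: the combined draws are plain averages.
combined = consensus_combine([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])  # -> [2.0, 3.0, 4.0]
```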
13 Approximating Local Posterior Densities
- [Neiswanger et al 2013] proposed combining estimates of the local posterior densities instead of the samples:
  - Parametric: Gaussian approximation.
  - Nonparametric: kernel density estimation (KDE) based on the samples:
    p(X | y) ∝ ∏_{j=1}^m p_j(X | y_j) ≈ ∏_{j=1}^m (1/n) ∑_{i=1}^n K_{h_j}(X; X_ji)
  - Semiparametric: product of a parametric Gaussian approximation with a nonparametric KDE correction term.
- Combination: product of the (approximate) densities.
- Sampling: resort to Metropolis-within-Gibbs.
- [Wang & Dunson 2013]'s Weierstrass sampler is similar, using rejection sampling instead.
[Neiswanger et al 2013, Wang & Dunson 2013]
15 Approximating Local Posterior Densities
- The parametric approximation can be quite bad unless the Bernstein-von Mises theorem kicks in.
- Complex and expensive combination step for the non- and semiparametric estimates.
- KDE suffers from the curse of dimensionality.
- Performs poorly if the local posteriors differ significantly.
16 Intuition and Desiderata
- Distributed system with independent MCMC sampling.
- Identify regions of high (global) posterior probability mass.
- Each local sampler is based on local data, but concentrates on the high-probability regions.
- The high-probability regions are found using samples, by allowing some small amount of communication.
17 (Not Quite) Embarrassingly Parallel MCMC
- Allow some amount of communication to align the worker MCMC samplers.
- The high-probability region is defined by low-order moments.
- Align using Expectation Propagation (EP).
- Asynchronous and infrequent updates.
18 Expectation Propagation
- If N is large, the worker j likelihood term p(y_j | X) should be well approximated by a Gaussian:
  p(y_j | X) ≈ q_j(X) = N(X; µ_j, Σ_j)
- Write the cavity distribution as p_{-j}(X) = p(X) ∏_{k≠j} q_k(X), so the tilted distribution
  p̃_j(X | y) ∝ p(y_j | X) p_{-j}(X) ≈ p(X | y)
- The parameters are fit iteratively, using a variational approach that minimises a KL divergence:
  q_j^new(·) = argmin_{N(·; µ, Σ)} KL( p̃_j(· | y) ‖ N(·; µ, Σ) p_{-j}(·) )
[Minka 2001]
19 Expectation Propagation
- The update
  q_j^new(·) = argmin_{N(·; µ, Σ)} KL( p̃_j(· | y) ‖ N(·; µ, Σ) p_{-j}(·) ),
  with tilted distribution p̃_j(X | y) ∝ p(y_j | X) p_{-j}(X) and cavity p_{-j}(X) = p(X) ∏_{k≠j} q_k(X), is performed as follows:
  - Compute (or estimate) the first two moments µ*, Σ* of p̃_j(X | y).
  - Compute µ_j, Σ_j so that N(·; µ_j, Σ_j) p_{-j}(·) / Z has moments µ*, Σ*.
- The computations are done on natural parameters.
- Generalises to other exponential families.
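In the Gaussian case, the two steps above reduce to a subtraction of natural parameters, since dividing the moment-matched Gaussian by the cavity p(X) ∏_{k≠j} q_k(X) subtracts precisions and precision-adjusted means. A 1-D sketch, in which `tilted_moments` stands in for µ*, Σ* (estimated by MCMC in SMS) and `cavity_moments` for the cavity's moments (both argument names are ours):

```python
# One EP site update for a 1-D Gaussian site, carried out in natural
# parameters (precision r = 1/s2 and precision-adjusted mean h = mu/s2),
# as the slide describes.
def ep_site_update(tilted_moments, cavity_moments):
    (mu_star, s2_star), (mu_cav, s2_cav) = tilted_moments, cavity_moments
    r_star, h_star = 1.0 / s2_star, mu_star / s2_star   # matched Gaussian
    r_cav, h_cav = 1.0 / s2_cav, mu_cav / s2_cav        # cavity
    r_j, h_j = r_star - r_cav, h_star - h_cav           # Gaussian division
    return h_j / r_j, 1.0 / r_j                         # (mu_j, s2_j) of q_j

# Tilted moments (1.0, 0.25) against a N(0, 1) cavity give a site with
# precision 4 - 1 = 3, i.e. variance 1/3, and mean 4/3.
mu_j, s2_j = ep_site_update((1.0, 0.25), (0.0, 1.0))
```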
20 Expectation Propagation
- The variational parameters are fit iteratively until convergence.
- EP tends to converge very quickly (when it does).
- Damping the updates can help convergence.
- At convergence, all local posteriors agree on their first two moments.
- Generalises to hierarchical and graphical models [infer.net, Gelman et al 2014].
[Figure: prior p(X) with site approximations q_1(X) ≈ p(y_1 | X), ..., q_4(X) ≈ p(y_4 | X) across the worker machines]
21 Sampling via Moment Sharing (SMS)
- The KL in the EP update is minimised by matching the moments of p̃_j(X | y).
- The moments are computed by drawing MCMC samples: worker j draws {X_ji}, giving moment estimates (µ*, Σ*) and hence the updated site parameters (µ_j, Σ_j).
- All samples from all machines can then be treated as approximate samples from the full posterior given all the data.
- Communicate only moments, synchronously or asynchronously.
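Putting the pieces together, the scheme can be sketched end-to-end for a toy 1-D Gaussian-mean model: each worker runs random-walk Metropolis on its tilted distribution (local likelihood times cavity), estimates two moments from its samples, and updates its site in natural parameters. A synchronous sketch under those assumptions, not the authors' released code; all names and constants are illustrative:

```python
import math
import random

# Toy synchronous SMS run for a 1-D Gaussian-mean model. Sites are stored in
# natural parameters (r = precision, h = precision * mean); a site update is
# moment matching followed by subtraction of the cavity's natural parameters.
random.seed(2)
data = [random.gauss(1.0, 1.0) for _ in range(200)]
m = 4
shards = [data[j::m] for j in range(m)]
prior_r, prior_h = 0.1, 0.0                  # prior N(0, 10) in natural params
sites = [[0.0, 0.0] for _ in range(m)]       # site j as [r_j, h_j], start flat

def log_tilted(x, ys, cav_r, cav_h):
    # log of p(y_j | x) * cavity(x), up to a constant; unit-variance likelihood
    return sum(-0.5 * (y - x) ** 2 for y in ys) - 0.5 * cav_r * x * x + cav_h * x

def mcmc_moments(ys, cav_r, cav_h, iters=3000, burn=500):
    prec_guess = cav_r + len(ys)             # exact for this Gaussian toy model
    x = (cav_h + sum(ys)) / prec_guess       # start near the tilted mean
    step = 2.4 / math.sqrt(prec_guess)       # RW step scaled to the tilted sd
    lp, xs = log_tilted(x, ys, cav_r, cav_h), []
    for t in range(iters):
        prop = x + random.gauss(0.0, step)
        lp_prop = log_tilted(prop, ys, cav_r, cav_h)
        if math.log(random.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        if t >= burn:
            xs.append(x)
    mu = sum(xs) / len(xs)
    return mu, sum((v - mu) ** 2 for v in xs) / len(xs)

for sweep in range(5):                       # synchronous EP sweeps
    for j in range(m):
        cav_r = prior_r + sum(sites[k][0] for k in range(m) if k != j)
        cav_h = prior_h + sum(sites[k][1] for k in range(m) if k != j)
        mu, s2 = mcmc_moments(shards[j], cav_r, cav_h)
        sites[j] = [1.0 / s2 - cav_r, mu / s2 - cav_h]   # moment matching

post_r = prior_r + sum(r for r, _ in sites)
post_mean = (prior_h + sum(h for _, h in sites)) / post_r
# post_mean should be close to the exact posterior mean sum(data) / (0.1 + 200)
```

An asynchronous variant would let each worker recompute its cavity from the latest sites it has received, rather than sweeping in lockstep.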
27 Bayesian Logistic Regression
- Simulated dataset.
- d = 20, # data items N = 1000.
- NUTS base sampler.
- # workers m = 4, 10, 50.
- # MCMC iterations T = 1000, 1000, 10000.
- # EP iterations k shown as vertical lines.
[Figure: posterior accuracy vs. # likelihood evaluations k·T·N/m for each setting]
28 Bayesian Logistic Regression
- MSE of the posterior mean, as a function of the total # iterations k·T·m.
[Figure: SMS(s), SMS(a), SCOT, NEIS(p), NEIS(n), WANG compared]
29 Bayesian Logistic Regression
- Approximate KL and MSE of the predictive probabilities, as functions of the total # iterations k·T·m.
[Figures: SMS(s), SMS(a), SCOT, WANG; and SMS(s), SMS(a), SCOT, NEIS(n), WANG compared]
30 Bayesian Logistic Regression
- Approximate KL as a function of the # nodes, m = 8, 16, 32, 48, 64.
[Figure: SMS(s,s), SMS(s,e), SMS(a,s), SMS(a,e), SCOT, XING(p) compared]
31 Bayesian Logistic Regression
- Approximate KL as a function of the # iterations per node (k·T) and of the # likelihood evaluations (k·T·N/m), for m = 8, 16, 32, 48, 64.
[Figures: SMS(s) and SMS(a)]
32 Spike-and-Slab Sparse Regression
- Posterior mean coefficients.
[Figure: posterior mean coefficients vs. k·T·N/m]
33 Some Remarks
- Scalable distributed MCMC sampling.
- A bit of communication goes a long way.
- Issues with the stochasticity of the moment estimates:
  - EP theory does not cover stochastic updates.
  - It is not clear what the best stochastic update to use is,
  - nor how to characterise convergence and the quality of the approximation.
- Matlab source:
34 Other Approaches to Scalable Bayes
- Median posterior [Stanislav et al 2014]:
  - Embeds the local posteriors into an RKHS and computes their geometric median.
  - Improves robustness to outliers in the data.
- Stochastic gradient MCMC approaches:
  - Reduce the cost of each MCMC step by using a data subset.
  - A distributed version has been developed.
  - [Welling & Teh 2011, Ahn et al 2012, 2014, Teh, Thiery & Vollmer (forthcoming), Bardenet et al 2014]
- Variational approaches:
  - Faster convergence, with possibly significant bias.
  - Recent work successfully extends these to large-scale datasets using stochastic approximation techniques [Hoffman et al 2010, 2013, etc.] and to flexibly parameterised variational distributions [Mnih & Gregor 2014, Rezende et al 2014, Kingma & Welling 2014].
35 Bigger Picture
- The probabilistic modelling / Bayesian inference approach offers a principled and powerful data analysis framework.
- Standard methodologies do not extend easily to Big Data.
- It is important to develop generic methodologies that make these approaches applicable to Big Data.
- Bias/variance trade-offs are becoming more important.
- Low-bias exact methods do not scale as well to Big Data.
36 Thank you! Thanks for funding:
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationBig Data need Big Model 1/44
Big Data need Big Model 1/44 Andrew Gelman, Bob Carpenter, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell,... Department of Statistics,
More informationModelbased Synthesis. Tony O Hagan
Modelbased Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
More informationJiří Matas. Hough Transform
Hough Transform Jiří Matas Center for Machine Perception Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague Many slides thanks to Kristen Grauman and Bastian
More informationScalable Machine Learning  or what to do with all that Big Data infrastructure
 or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Clickthrough prediction Personalized Spam Detection
More informationMethods of Data Analysis Working with probability distributions
Methods of Data Analysis Working with probability distributions Week 4 1 Motivation One of the key problems in nonparametric data analysis is to create a good model of a generating probability distribution,
More informationReliability estimators for the components of series and parallel systems: The Weibull model
Reliability estimators for the components of series and parallel systems: The Weibull model Felipe L. Bhering 1, Carlos Alberto de Bragança Pereira 1, Adriano Polpo 2 1 Department of Statistics, University
More informationThe Exponential Family
The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural
More informationLinear regression methods for large n and streaming data
Linear regression methods for large n and streaming data Large n and small or moderate p is a fairly simple problem. The sufficient statistic for β in OLS (and ridge) is: The concept of sufficiency is
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal  the stuff biology is not
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationMore details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a twoarmed trial comparing
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationValidation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT
Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulationbased method designed to establish that software
More informationUnsupervised Learning
Unsupervised Learning Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK zoubin@gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~zoubin September 16, 2004 Abstract We give
More informationAn Internal Model for Operational Risk Computation
An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFFRiskLab, Madrid http://www.risklabmadrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli
More informationA Probabilistic Model for Online Document Clustering with Application to Novelty Detection
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin
More informationDistance based clustering
// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means
More informationTracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking
Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local
More information