A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data


1 A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang, University of Florida, August 9, 2015
2 Abstract MCMC methods have proven to be a very powerful tool for analyzing data of complex structures. However, they are computationally intensive: they typically require a large number of iterations and a complete scan of the full dataset at each iteration, which precludes their use for big data analysis. We propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for taming powerful MCMC methods for big data analysis: the full-data log-likelihood is replaced by a Monte Carlo average of log-likelihoods calculated in parallel from multiple bootstrap samples. The BMH algorithm has an embarrassingly parallel structure and avoids repeated scans of the full dataset across iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can generally be more efficient, as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the MH algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems.
3 Big Data Big Data: data too large to handle easily on a single server, or too time-consuming to analyze using traditional statistical methods. Examples of Big Data: Genome data: using big data to find better treatments for patients through genomic sequencing technologies; Atmospheric sciences data: rapidly ballooning observations (e.g., radar, satellites, sensor networks), climate data, ensemble data; Social sciences data: social networks (Facebook, LinkedIn, network), social media data (news, telephone calls); Finance data, image data, etc.
4 Big Data Challenges Accessing, using and visualizing data; Server-side processing and distributed storage; Limited number of statistical methods: from the view of statistical inference, it is unclear how the current statistical methodology can be transported to the paradigm of big data; Modeling: with growing size typically comes a growing complexity of data structures and of the models needed to account for the structures; Missing data.
5 Strategies used in Big Data Analysis Split and Merge: Lin and Xi (2011, SII): aggregated estimating equation; Xie (2013): high-dimensional variable selection; Song and Liang (2015, JRSSB): Bayesian high-dimensional variable selection. Online Learning: stream data. Data/model reduction: using low-rank models for approximate inference of massive data. Subsampling: Liang et al. (2013, JASA). Bag of Little Bootstraps (Kleiner et al., 2012): provides an efficient way of bootstrapping for big data estimators, which functions by combining the results of bootstrapping multiple small subsets of the big original dataset.
6 Aggregated Estimating Equation The aggregated estimating equation (Lin and Xi, 2011) employs a divide-and-combine strategy. It first compresses the raw data in each partition of the full dataset into some low-dimensional statistics. It then obtains an approximation to the estimating equation estimator, called the aggregated estimating equation estimator, by solving an equation aggregated from the low-dimensional statistics saved for all partitions.
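The compress-then-solve idea can be sketched for the simplest case, least squares with one predictor, where the aggregated equation reproduces the full-data solution exactly. This is an illustrative special case, not Lin and Xi's general procedure; the function names `compress` and `solve_aggregated` and the simulated data are assumptions made for the example.

```python
import random

def compress(partition):
    """Reduce one partition of (x, y) pairs to five low-dimensional statistics."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)

def solve_aggregated(stats):
    """Sum the saved statistics over partitions and solve the normal equations."""
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*stats))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical data: y = 2 + 0.25 x + noise, split into 4 partitions.
rng = random.Random(0)
data = [(x, 2.0 + 0.25 * x + rng.gauss(0.0, 0.5))
        for x in (rng.gauss(0.0, 1.0) for _ in range(10_000))]
partitions = [data[i::4] for i in range(4)]

# Each partition is scanned once; only the compressed statistics are combined.
stats = [compress(p) for p in partitions]
intercept, slope = solve_aggregated(stats)
print(intercept, slope)
```

For nonlinear estimating equations the aggregated estimator is only an approximation, which is the case the slide describes.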
7 Resampling-based Stochastic Approximation Liang et al. (2013, JASA) proposed a new parameter estimator, the maximum mean log-likelihood estimator, for big data problems, and a resampling-based stochastic approximation method for obtaining such an estimator. The estimating equations are
$$\frac{\partial}{\partial\theta} l_n(\theta \mid X_n) = 0, \qquad \binom{n}{m}^{-1} \sum_i \frac{\partial}{\partial\theta} l_{n,m,i}(\theta \mid X_m) = 0, \qquad E\,\frac{\partial}{\partial\theta} l_m(\theta \mid X_m) = 0.$$
The resampling-based stochastic approximation method successfully avoids some difficulties involved in big data computation, such as whole data scanning.
8 Bag of Little Bootstraps The bootstrap method is a resampling-based method that has been widely used in applied statistics for assessing the quality of estimators since it was proposed by Efron (1979). The bag of little bootstraps (Kleiner et al., 2012) provides an efficient way of bootstrapping for big data estimators, which functions by combining the results of bootstrapping multiple small subsets of the big original dataset.
9 Bootstrap MH Algorithm: Motivation Markov chain Monte Carlo (MCMC) methods have been widely used in statistical data analysis, and they have proven to be a very powerful and typically unique computational tool for analyzing data of complex structures. MCMC methods are computationally intensive: they typically require a large number of iterations and a complete scan of the full dataset at each iteration. This feature precludes their use for big data analysis. We aim to develop a framework under which the powerful MCMC methods can be tamed for use in big data analysis, such as parameter estimation, optimization, and model selection.
10 Bootstrap MH Algorithm: Basic Idea The bootstrap Metropolis-Hastings (BMH) algorithm works by replacing the full-data log-likelihood with a Monte Carlo average of log-likelihoods that are calculated in parallel from multiple bootstrap samples, where a bootstrap sample is a small set of observations drawn from the full dataset at random, with or without replacement. In this way, BMH avoids repeated scans of the full dataset across iterations, while it is still able to produce sensible solutions, such as parameter estimates or posterior samples, to the problem under consideration. BMH is feasible for big data and workable on parallel and distributed architectures.
11 BMH Algorithm: Notation Let $D_i$ denote a bootstrap sample of $D$, resampled from the full dataset at random, with or without replacement. Let $m$ denote the size of $D_i = \{x_{ij} : j = 1, 2, \ldots, m\}$. If resampling is done without replacement, $D_i$ is called a subsample or $\binom{n}{m}$-bootstrap sample. Otherwise, $D_i$ is called an m-out-of-n bootstrap sample or m/n-bootstrap sample.
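As a minimal sketch of the two resampling schemes, the helper below draws one sample of size m either without replacement (a subsample) or with replacement (an m-out-of-n bootstrap sample). The function name `draw_bootstrap_sample` and the toy dataset are assumptions for illustration only.

```python
import random

def draw_bootstrap_sample(data, m, replace, rng):
    """Draw D_i of size m from the full dataset `data`."""
    if replace:
        # m-out-of-n bootstrap: observations may repeat.
        return [rng.choice(data) for _ in range(m)]
    # Subsample: m distinct positions of the dataset, no repeats.
    return rng.sample(data, m)

rng = random.Random(0)
data = list(range(100))  # stand-in for the full dataset D with n = 100
sub = draw_bootstrap_sample(data, 10, replace=False, rng=rng)
boot = draw_bootstrap_sample(data, 10, replace=True, rng=rng)
print(sub, boot)
```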
12 BMH Algorithm: Notation Let $f(D_i \mid \theta)$ denote a likelihood-like function of $D_i$, and define
$$l_{m,n,k}(D_s \mid \theta) = \frac{1}{k} \sum_{i=1}^{k} \log f(D_i \mid \theta), \qquad (1)$$
where $k$ denotes the number of bootstrap samples drawn from $D$, and $D_s = \{D_1, \ldots, D_k\}$ is the collection of the bootstrap samples. The definition of $f(D_i \mid \theta)$ depends on the features of $D$. If the observations in $D$ are independently and identically distributed (i.i.d.), then, regardless of whether $D_i$ is a $\binom{n}{m}$- or m/n-bootstrap sample, we define
$$f(D_i \mid \theta) = \prod_{j=1}^{m} f(x_{ij} \mid \theta). \qquad (2)$$
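Equations (1) and (2) can be written out directly for a concrete i.i.d. model. The sketch below assumes, purely for illustration, a N(θ, 1) observation model; the names `log_f` and `averaged_loglik` are hypothetical.

```python
import math

def log_f(sample, theta):
    """Equation (2) on the log scale: sum of log densities of N(theta, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2
               for x in sample)

def averaged_loglik(samples, theta):
    """Equation (1): average of log f(D_i | theta) over the k bootstrap samples."""
    return sum(log_f(d, theta) for d in samples) / len(samples)

# Two bootstrap samples of size m = 1, evaluated at theta = 1.
value = averaged_loglik([[0.0], [2.0]], theta=1.0)
print(value)
```

In a parallel implementation each `log_f(d, theta)` would be evaluated on a different processor, and only the k scalar values would be communicated.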
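13 BMH Algorithm: Algorithm
1. Draw $\vartheta$ from a proposal distribution $Q(\theta_t, \vartheta)$.
2. Draw $k$ bootstrap samples $D_1, \ldots, D_k$ via $\binom{n}{m}$- or m/n-bootstrapping. Let $D_s = \{D_1, \ldots, D_k\}$.
3. Calculate the BMH ratio:
$$r(\theta_t, D_s, \vartheta) = \exp\{l_{m,n,k}(D_s \mid \vartheta) - l_{m,n,k}(D_s \mid \theta_t)\}\, \frac{\pi(\vartheta)\, Q(\vartheta, \theta_t)}{\pi(\theta_t)\, Q(\theta_t, \vartheta)}.$$
4. Set $\theta_{t+1} = \vartheta$ with probability $\alpha(\theta_t, D_s, \vartheta) = \min\{1, r(\theta_t, D_s, \vartheta)\}$, and set $\theta_{t+1} = \theta_t$ with the remaining probability.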
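The four steps above can be sketched as a short sampler. This is a toy under stated assumptions, not the author's implementation: the model is N(θ, 1), the prior is N(0, 10²), the proposal is a symmetric Gaussian random walk (so the Q ratio equals 1), resampling is without replacement, and the k log-likelihoods are computed serially rather than in parallel. All names and tuning values are hypothetical.

```python
import math
import random

def log_f(sample, theta):
    """Log-likelihood of one bootstrap sample under the assumed N(theta, 1) model."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2
               for x in sample)

def log_prior(theta):
    """Assumed N(0, 10^2) prior, up to an additive constant."""
    return -theta ** 2 / (2 * 10.0 ** 2)

def bmh(data, m, k, n_iter, step, theta0, rng):
    theta, chain = theta0, []
    for _ in range(n_iter):
        # Step 1: symmetric random-walk proposal, so Q(v, t) / Q(t, v) = 1.
        proposal = theta + rng.gauss(0.0, step)
        # Step 2: k subsamples of size m, drawn without replacement.
        samples = [rng.sample(data, m) for _ in range(k)]
        # Step 3: log of the BMH ratio, using the averaged log-likelihoods.
        l_new = sum(log_f(d, proposal) for d in samples) / k
        l_old = sum(log_f(d, theta) for d in samples) / k
        log_r = (l_new - l_old) + (log_prior(proposal) - log_prior(theta))
        # Step 4: accept with probability min{1, r}.
        if log_r >= 0 or rng.random() < math.exp(log_r):
            theta = proposal
        chain.append(theta)
    return chain

rng = random.Random(1)
data = [rng.gauss(1.0, 1.0) for _ in range(2000)]
chain = bmh(data, m=50, k=5, n_iter=2000, step=0.2, theta0=0.0, rng=rng)
post = chain[500:]  # discard a burn-in period
print(sum(post) / len(post))
```

Note that each iteration touches only k·m observations, never the full dataset.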
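14 BMH Algorithm: Remarks In BMH, $\{\theta_t\}$ form a Markov chain with the transition kernel given by
$$P_{m,n,k}(\theta, d\vartheta) = \sum_{D_s \in \mathcal{D}} \alpha(\theta, D_s, \vartheta)\, Q(\theta, \vartheta)\, \psi(D_s) + \delta_\theta(d\vartheta) \Big[ 1 - \sum_{D_s \in \mathcal{D}} \int_\Theta \alpha(\theta, D_s, \vartheta')\, Q(\theta, d\vartheta')\, \psi(D_s) \Big], \qquad (3)$$
where $\mathcal{D}$ denotes the space of $D_s$, $\psi(D_s)$ denotes the probability of drawing $D_s$, and $\delta_\theta(\cdot)$ is an indicator function. For $\binom{n}{m}$-bootstrapping, $\psi(D_s) = \binom{n}{m}^{-k}$; and for m/n-bootstrapping, $\psi(D_s) = 1/n^{mk}$.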
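For a feel of how small these resampling probabilities are, they can be evaluated on the log scale, which avoids underflow for realistic n, m and k. The function names below are hypothetical, and the example sizes are only illustrative.

```python
import math

def log_psi_subsample(n, m, k):
    """log psi(D_s) for subsampling: -k * log C(n, m)."""
    return -k * math.log(math.comb(n, m))

def log_psi_m_out_of_n(n, m, k):
    """log psi(D_s) for m-out-of-n bootstrapping: -m * k * log n."""
    return -m * k * math.log(n)

# Tiny case that can be checked by hand: n = 5, m = 2, k = 1.
p_sub = math.exp(log_psi_subsample(5, 2, 1))    # 1 / C(5, 2)
p_boot = math.exp(log_psi_m_out_of_n(5, 2, 1))  # 1 / 5^2

# A larger illustrative case stays finite on the log scale.
print(log_psi_subsample(10**5, 200, 25), log_psi_m_out_of_n(10**5, 200, 25))
```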
15 BMH Algorithm: Remarks When the observations in $D$ are i.i.d., both resampling schemes, $\binom{n}{m}$- and m/n-bootstrapping, lead to the same stationary distribution of BMH. Since BMH is proposed for simulations on parallel computers, the parameter $k$ specifies the number of processors/nodes used in computing the averaged log-likelihood function. Theoretically, a large value of $k$ is preferred. However, an extremely large value of $k$ may slow down the computation because of the increased inter-node communication. In our experience, $k$ does not need to be very large for BMH to perform well. The choice of $m$ can depend on the complexity of the model under consideration, in particular the dimension of $\theta$. In general, $m$ should increase with the complexity of the model.
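16 BMH Algorithm: Convergence Let $g_m(D \mid \theta) = \exp\{E[\log f(D_i \mid \theta)]\}$, where $E[\cdot]$ denotes the expectation. Define the transition kernel
$$P_m(\theta, \vartheta) = \alpha(\theta, \vartheta)\, Q(\theta, \vartheta) + \delta_\theta(d\vartheta) \Big[ 1 - \int_\Theta \alpha(\theta, \vartheta')\, Q(\theta, \vartheta')\, d\vartheta' \Big], \qquad (4)$$
which is induced by the proposal $Q(\cdot, \cdot)$ for a MH move with the invariant distribution given by
$$\pi_m(\theta \mid D) \propto g_m(D \mid \theta)\, \pi(\theta). \qquad (5)$$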
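17 BMH Algorithm: Convergence Assume the following conditions hold:
(A) $\sup_{\theta \in \Theta} E\,|\log f(X_i \mid \theta)| < \infty$.
(B) $P_m$ defines an irreducible and aperiodic Markov chain such that $\pi_m(\cdot) P_m = \pi_m(\cdot)$. Therefore, for any starting point $\theta_0 \in \Theta$, $\lim_{t \to \infty} \| P_m^t(\theta_0, \cdot) - \pi_m(\cdot) \| = 0$, where $\|\cdot\|$ denotes the total variation norm.
(C) For any $(\theta, \vartheta) \in \Theta \times \Theta$,
$$0 < \frac{\exp\{l_{m,n,k}(D_s \mid \vartheta) - l_{m,n,k}(D_s \mid \theta)\}}{g_m(D \mid \vartheta) / g_m(D \mid \theta)} < \infty, \quad \psi(\cdot)\text{-almost surely},$$
where $\psi(D_s)$ is the resampling probability of $D_s$ from $D$.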
18 BMH Algorithm: Convergence Lemma 1. Assume that condition (A) holds and $m = O(n^\gamma)$. If $\gamma < 1/2$, then
$$U_{m,n}(D \mid \theta) - \log(g_m(D \mid \theta)) \xrightarrow{p} 0, \quad \text{as } n \to \infty. \qquad (6)$$
19 BMH Algorithm: Convergence Theorem 1 ($\binom{n}{m}$-bootstrapping). Assume the observations in $D$ are i.i.d. and conditions (A), (B) and (C) hold. Then for any $\epsilon \in (0, 1]$ and any $\theta_0 \in \Theta$, there exist $N(\epsilon, \theta_0) \in \mathbb{N}$, $K(\epsilon, \theta_0, n) \in \mathbb{N}$, and $T(\epsilon, \theta_0, n, k) \in \mathbb{N}$ such that for any $n > N(\epsilon, \theta_0)$, $k > K(\epsilon, \theta_0, n)$, and $t > T(\epsilon, \theta_0, n, k)$,
$$\| P_{m,n,k}^t(\theta_0, \cdot) - \pi_m(\cdot) \| \le \epsilon,$$
where $\pi_m(\cdot)$ is the stationary distribution of $P_m$ as defined in (5).
Theorem 2 (m/n-bootstrapping). Under conditions similar to those of Theorem 1, BMH with m/n-bootstrapping has the same stationary distribution as with $\binom{n}{m}$-bootstrapping.
20 BMH Algorithm: Bayesian Inference Some key points: By the asymptotic normality of posterior distributions (see, e.g., Chen, 1985), we have
$$\pi_n(\theta \mid D) \xrightarrow{L} N(\mu_n, \Sigma_n), \qquad (7)$$
where $\mu_n$ denotes the mode of $\pi_n(\theta \mid D)$ and $\Sigma_n = \{ -\partial^2 \log(\pi_n(\theta) L(D \mid \theta)) / \partial\theta\, \partial\theta^T \}^{-1}$. Under regularity conditions, we show that as $m \to \infty$,
$$\pi_m(\theta \mid D) \xrightarrow{L} N\Big( \mu_n, \frac{n}{m} \Sigma_n \Big). \qquad (8)$$
The properties of $\pi_n(\theta \mid D)$ can be conveniently inferred from BMH samples.
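One way to read (7) and (8) together: BMH draws are spread out by a factor of √(n/m) relative to the full-data posterior, so shrinking their deviations from the center by √(m/n) gives approximate draws from N(μ_n, Σ_n). The sketch below does this for a scalar parameter. It is an illustration of that scaling argument only; using the sample mean as the center (instead of the mode μ_n) and the name `rescale_draws` are assumptions.

```python
import math
import statistics

def rescale_draws(draws, m, n):
    """Shrink BMH draws around their mean by sqrt(m / n), following (7)-(8)."""
    center = statistics.fmean(draws)  # stand-in for the mode mu_n
    shrink = math.sqrt(m / n)
    return [center + shrink * (d - center) for d in draws]

# Toy check with m / n = 1 / 4, so deviations are halved.
rescaled = rescale_draws([0.0, 2.0, 4.0], m=1, n=4)
print(rescaled)
```

Equivalently, a variance estimated from BMH samples can be multiplied by m/n to approximate the corresponding full-data posterior variance.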
21 Simulated Example Consider the normal linear regression
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i, \quad i = 1, 2, \ldots, n,$$
where $(\beta_0, \beta_1, \beta_2, \beta_3) = (2, 0.25, 0.25, 0)$ are the regression coefficients, and $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. normal random errors with mean 0 and variance $\sigma^2$. In simulations, we set $n = 10^5$ and $\sigma^2 = 0.25$, generate both $x_1 = (x_{11}, \ldots, x_{n1})^T$ and $x_2 = (x_{12}, \ldots, x_{n2})^T$ from the multivariate normal distribution $N(0, I_n)$, and set $x_3 = (x_{13}, \ldots, x_{n3})^T$ = 0.7x z, where $z$ is also generated from $N(0, I_n)$. Let $\theta = (\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2)$, with true value $(2, 0.25, 0.25, 0, 0.25)$.
22 Simulated Example Table: Parameter estimation results of MH and BMH for the simulated example, with $\binom{n}{m}$-bootstrapping. Columns: $(k, m)$; $m/n \times 100\%$; $\beta_0$; $\beta_1$; $\beta_2$; $\beta_3$; $\log \sigma^2$. Rows: MH for the full data, $(1, 10^5)$, 100%; BMH with $\binom{n}{m}$-bootstrapping, $(25, 200)$, 0.2%; $(25, 500)$, 0.5%; $(25, 1000)$, 1%. (Numerical entries are not preserved in this transcription.)
23 Simulated Example Table: Parameter estimation results of MH and BMH for the simulated example, with m/n-bootstrapping. Columns: $(k, m)$; $m/n \times 100\%$; $\beta_0$; $\beta_1$; $\beta_2$; $\beta_3$; $\log \sigma^2$. Rows: MH for the full data, $(1, 10^5)$, 100%; BMH with m/n-bootstrapping, $(25, 200)$, 0.2%; $(25, 500)$, 0.5%; $(25, 1000)$, 1%. (Numerical entries are not preserved in this transcription.)
24 Simulated Example Table: Comparison of BMH ($k = 50$, $m = 200$) with the divide-and-combine (D&C) method and AMHT (approximated MH test; Korattikara et al., 2014) for parameter estimation. Columns: Algorithm; $\beta_0$; $\beta_1$; $\beta_2$; $\beta_3$; $\log(\sigma^2)$. Rows: BMH, D&C, and AMHT, each reporting an estimate and its SD. (Numerical entries are not preserved in this transcription.)
25 Variance Estimation Table: MH, BMH, D&C and AMHT estimates of $\sigma^2_{11}, \ldots, \sigma^2_{55}$, $\sigma^2_{12}$, $\sigma^2_{34}$ and $\rho_{\beta_2, \beta_3}$ obtained with pooled samples, where $\sigma^2_{ij}$ denotes the $(i, j)$th element of $\Sigma_0$, and $\rho_{\beta_2, \beta_3}$ denotes the correlation coefficient of $\beta_2$ and $\beta_3$. Columns: Method; $\sigma^2_{11}$; $\sigma^2_{22}$; $\sigma^2_{33}$; $\sigma^2_{44}$; $\sigma^2_{55}$; $\sigma^2_{12}$; $\sigma^2_{34}$. Rows: MH; BMH; D&C; AMHT. (Numerical entries are not preserved in this transcription.)
26 QQ-plot of Posterior Samples Figure: QQ-plots of the posterior samples generated by MH, BMH (with $k = 50$, $m = 200$ and $\binom{n}{m}$-bootstrapping), and AMHT for the simulation example. The plots in the first row, panels (a) to (e), are BMH versus MH for $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\log \sigma^2$; the plots in the second row, panels (f) to (j), are AMHT versus MH for the same parameters.
27 Histogram of Posterior Samples Figure: Histograms (frequency) of the posterior samples of $\beta_1$ generated by (a) MH, (b) AMHT, and (c) BMH for the simulation example.
28 Comments on BMH It can asymptotically integrate the whole data information into a single simulation run. Like the MH algorithm, it can serve as a basic building block for developing advanced MCMC algorithms.
29 Neural Network Universal approximation ability: A feedforward network with a single hidden layer containing a finite number of hidden units is a universal approximator among continuous functions on compact subsets, under mild assumptions on the activation function. It is potentially a good tool for big data modeling!
30 Neural Network Figure: A fully connected one-hidden-layer MLP network, with an input layer (units I1 to I4 and a unit labeled B), a hidden layer (units H1 to H3), and an output layer (units O1 to O3).
31 Tempering BMH for Learning BNN with Big Data
1. Draw $k$ bootstrap samples, $D_1, \ldots, D_k$, with or without replacement from the entire training dataset $D$.
2. Try to update each sample of the current population $(\theta_t^1, \ldots, \theta_t^\Pi)$ by the local updating operators, where $t$ indexes iterations, and the energy function is calculated by averaging the energy values calculated from the bootstrap samples $D_1, \ldots, D_k$.
3. Try to exchange $\theta_t^i$ with $\theta_t^j$ for $n - 1$ pairs $(i, j)$, with $i$ being sampled uniformly on $\{1, \ldots, n\}$ and $j = i \pm 1$ with probability $\omega_{i,j}$, where $\omega_{i,i+1} = \omega_{i,i-1} = 0.5$ for $1 < i < \Pi$ and $\omega_{1,2} = \omega_{\Pi,\Pi-1} = 1$.
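The pair selection in step 3 can be sketched as follows. Only the choice of (i, j) is shown, since the slide does not state the exchange acceptance rule; the function name `pick_exchange_pair` is hypothetical, and the population size is assumed to be at least 2.

```python
import random

def pick_exchange_pair(pop_size, rng):
    """Pick i uniformly, then a neighbour j = i +/- 1 with the stated weights."""
    i = rng.randint(1, pop_size)      # uniform on {1, ..., pop_size}
    if i == 1:
        j = 2                         # omega_{1,2} = 1
    elif i == pop_size:
        j = pop_size - 1              # omega_{Pi,Pi-1} = 1
    else:
        j = i + rng.choice((-1, 1))   # omega_{i,i+1} = omega_{i,i-1} = 0.5
    return i, j

rng = random.Random(0)
pairs = [pick_exchange_pair(5, rng) for _ in range(1000)]
print(pairs[:5])
```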
32 Tempering BMH for Learning BNN with Big Data Figure: Parallel implementation of tempering BMH: the flowchart of the tempering BMH algorithm with 3 processors. The master (rank 1) generates the bootstrap samples and works on bootstrap sample $D_1$; slave 1 (rank 2) works on $D_2$ and slave 2 (rank 3) works on $D_3$. The current parameters $\theta_t^i$ are broadcast to the processors, each processor computes $E(\theta_t^i \mid D_j)$ on its own sample, and the values are reduced to calculate $\frac{1}{3} \sum_{j=1}^{3} E(\theta_t^i \mid D_j)$ and update the parameters. The loop checks for the end of the current iteration and then for the end of the simulation, after which the simulation results are output.
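The broadcast/compute/reduce pattern in the flowchart can be imitated on one machine with a thread pool. This is only a stand-in for a message-passing implementation: CPython threads do not give real parallel speedup for this kind of work, and the energy function below (half the sum of squared residuals) is a hypothetical placeholder, not the BNN energy.

```python
from concurrent.futures import ThreadPoolExecutor

def energy(theta, sample):
    """Hypothetical energy E(theta | D_j): half the sum of squared residuals."""
    return 0.5 * sum((x - theta) ** 2 for x in sample)

def averaged_energy(theta, samples):
    """Send theta to one worker per sample, then average the returned energies."""
    with ThreadPoolExecutor(max_workers=len(samples)) as pool:
        energies = list(pool.map(lambda d: energy(theta, d), samples))
    return sum(energies) / len(energies)

# Three bootstrap samples, one per "processor".
samples = [[0.0, 2.0], [1.0, 1.0], [2.0, 2.0]]
avg = averaged_energy(1.0, samples)
print(avg)
```

Only the parameter value and one scalar per worker cross the process boundary, which is why the communication cost grows with k rather than with the dataset size.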
33 Forest Cover Type Data The goal of this study is to predict forest cover types from cartographic variables in forested areas with minimal human-caused disturbances. The data were taken from four wilderness areas located in the Roosevelt National Forest of northern Colorado and consist of 581,012 observations. Each observation was obtained from the US Geological Survey (USGS) digital elevation model data based on m raster cells, and it consists of 54 cartographic explanatory variables, including 10 quantitative variables, 4 binary wilderness area variables, and 40 binary soil type variables. These observations have been classified into seven classes according to their cover types. The respective class sizes are , , 35754, 2747, 9493, 17367, and
34 Forest Cover Type Data Table 4. BMH results for forest cover type data. The resampling rates for the 7 types of observations are 0.5%, 0.5%, 1%, 2.5%, 1.5%, 1% and 1%, respectively. The aggregated resampling rate for the training data is 0.59%. (The $k$ values and the leading entries of the network size column are not preserved in this transcription.)
Bootstrapping    k    Average network size    Training rate (%)    Prediction rate (%)    CPU (h)
m/n                   (1.18)                  72.2 (0.17)          72.4 (0.17)            32.0 (2.9)
m/n                   (0.95)                  72.2 (0.15)          72.4 (0.12)            33.7 (3.2)
$\binom{n}{m}$        (1.52)                  72.3 (0.07)          72.4 (0.07)            28.5 (2.5)
$\binom{n}{m}$        (0.83)                  72.3 (0.15)          72.4 (0.16)            31.9 (3.0)
35 Forest Cover Type Data: Efficiency of BMH For comparison, we applied parallel tempering to train the BNN with a single-threaded simulation on an Intel Nehalem server. At each local updating step of parallel tempering, the whole training dataset is scanned once. Hence, this algorithm runs extremely slowly. The first 5536 iterations of the simulation took 688 CPU hours, although the Intel Nehalem processor is much faster (approximately 2.5 times) than the processor used in the cluster machine. To finish iterations, it will take about 3000 CPU hours (125 days) on the Intel Nehalem server. Comparing 3000 hours with about 30 hours shows a great advantage of the parallelized BMH algorithm for big data problems.
36 Resampling on Distributed Architectures Let $S_i$ denote the $i$th subset of data stored in node $i$, $i = 1, \ldots, k$. For each $i$,
1. Set $j = i - 1$ or $i + 1$ with equal probability. If $j = 0$, reset $j = k$; and if $j = k + 1$, reset $j = 1$.
2. Exchange $M$ randomly selected observations between $S_i$ and $S_j$, where $M$ can be a prespecified or random number.
It follows from the standard theory of MCMC (see, e.g., Geyer, 1991) that the above procedure will ensure that each subset stored in a single node is a random subset of the whole dataset.
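The two steps above can be sketched with in-memory lists standing in for the data held by each node. The function name `exchange_step`, the fixed value of M, and the toy subsets are assumptions; indices are 0-based here, so the wrap-around is handled with a modulus, and M must not exceed the smallest subset size.

```python
import random

def exchange_step(subsets, M, rng):
    """One sweep: every node swaps M random observations with a ring neighbour."""
    k = len(subsets)
    for i in range(k):
        # Step 1: neighbour i - 1 or i + 1 with equal probability, wrapping around.
        j = (i + rng.choice((-1, 1))) % k
        if j == i:
            continue  # only possible when there is a single node
        # Step 2: choose M positions in each subset and swap them pairwise.
        a_idx = rng.sample(range(len(subsets[i])), M)
        b_idx = rng.sample(range(len(subsets[j])), M)
        for a, b in zip(a_idx, b_idx):
            subsets[i][a], subsets[j][b] = subsets[j][b], subsets[i][a]

rng = random.Random(0)
subsets = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
for _ in range(5):
    exchange_step(subsets, M=3, rng=rng)
print(subsets)
```

Subset sizes never change and no observation is duplicated or lost, so the sweep can be repeated between simulation stages to keep the local subsets well mixed.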
37 Discussion We have proposed the BMH algorithm as a basic MCMC algorithm for Bayesian analysis of big data. The BMH algorithm is workable on parallel and distributed architectures and avoids repeated scans of the full dataset across iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH is generally more efficient, as it can asymptotically integrate the whole data information into a single simulation run.
38 Discussion (continued) The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. Sampling: tempering BMH, which combines BMH with parallel tempering. Model selection: reversible jump BMH, which combines BMH with reversible jump MCMC. Optimization: simulated annealing BMH, which combines BMH with simulated annealing. Compared to existing methods, such as the aggregated estimating equation, resampling-based stochastic approximation, and the bag of little bootstraps, BMH has the unique power to tame powerful MCMC methods for use in big data analysis.
39 Acknowledgments NSF grants KAUST grant Student: Jinsu Kim for parallel programming.
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationSampling for Bayesian computation with large datasets
Sampling for Bayesian computation with large datasets Zaiying Huang Andrew Gelman April 27, 2005 Abstract Multilevel models are extremely useful in handling large hierarchical datasets. However, computation
More informationBig Data, Statistics, and the Internet
Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationTopic models for Sentiment analysis: A Literature Survey
Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.
More informationApplications of R Software in Bayesian Data Analysis
Article International Journal of Information Science and System, 2012, 1(1): 723 International Journal of Information Science and System Journal homepage: www.modernscientificpress.com/journals/ijinfosci.aspx
More informationMore details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a twoarmed trial comparing
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationWebbased Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Webbased Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationA Practical Scheme for Wireless Network Operation
A Practical Scheme for Wireless Network Operation Radhika Gowaikar, Amir F. Dana, Babak Hassibi, Michelle Effros June 21, 2004 Abstract In many problems in wireline networks, it is known that achieving
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationIncorporating cost in Bayesian Variable Selection, with application to costeffective measurement of quality of health care.
Incorporating cost in Bayesian Variable Selection, with application to costeffective measurement of quality of health care University of Florida 10th Annual Winter Workshop: Bayesian Model Selection and
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP  Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More informationPackage EstCRM. July 13, 2015
Version 1.4 Date 2015711 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationClass #6: Nonlinear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Nonlinear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Nonlinear classification Linear Support Vector Machines
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationMonte Carlo Simulation
1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.unihannover.de web: www.stochastik.unihannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationConstrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing
Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance
More informationMicroclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set
Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set Jeffrey W. Miller Brenda Betancourt Abbas Zaidi Hanna Wallach Rebecca C. Steorts Abstract Most generative models for
More informationApplied Multivariate Analysis  Big data analytics
Applied Multivariate Analysis  Big data analytics Nathalie VillaVialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of
More informationEfficiency and the CramérRao Inequality
Chapter Efficiency and the CramérRao Inequality Clearly we would like an unbiased estimator ˆφ (X of φ (θ to produce, in the long run, estimates which are fairly concentrated i.e. have high precision.
More informationCompression and Aggregation of Bayesian Estimates for Data Intensive Computing
Under consideration for publication in Knowledge and Information Systems Compression and Aggregation of Bayesian Estimates for Data Intensive Computing Ruibin Xi 1, Nan Lin 2, Yixin Chen 3 and Youngjin
More informationClustering in the Linear Model
Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple
More informationDirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid ElArini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
More informationPHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS
PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit nonresponse. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationComparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
More informationAdaptive Search with Stochastic Acceptance Probabilities for Global Optimization
Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,
More informationMessagepassing sequential detection of multiple change points in networks
Messagepassing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
More informationNeural Network Addin
Neural Network Addin Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Preprocessing... 3 Lagging...
More informationProbabilistic Methods for TimeSeries Analysis
Probabilistic Methods for TimeSeries Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:
More informationReject Inference in Credit Scoring. JieMen Mok
Reject Inference in Credit Scoring JieMen Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
More informationCentre for Central Banking Studies
Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics
More informationMATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 20092016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
More information2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)
2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationDesigning a learning system
Lecture Designing a learning system Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x48845 http://.cs.pitt.edu/~milos/courses/cs750/ Design of a learning system (first vie) Application or Testing
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationSamplingbased optimization
Samplingbased optimization Richard Combes October 11, 2013 The topic of this lecture is a family of mathematical techniques called samplingbased methods. These methods are called sampling mathods because
More informationNumerical methods for American options
Lecture 9 Numerical methods for American options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 American options The holder of an American option has the right to exercise it at any moment
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More information