A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data


1 A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015

2 Abstract MCMC methods have proven to be a very powerful tool for analyzing data of complex structures. However, they are compute-intensive, typically requiring a large number of iterations and a complete scan of the full dataset at each iteration, which precludes their use for big data analysis. We propose the bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for taming powerful MCMC methods for big data analysis: the full-data log-likelihood is replaced by a Monte Carlo average of log-likelihoods calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset across iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be more efficient as it asymptotically integrates the whole data information into a single simulation run. The BMH algorithm is also very flexible: like the MH algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems.

3 Big Data Big Data: data too large to handle easily on a single server, or too time-consuming to analyze using traditional statistical methods. Examples of big data: Genome data: using big data to find better treatments for patients through genomic sequencing technologies. Atmospheric science data: rapidly ballooning observations (e.g., radar, satellites, sensor networks), climate data, ensemble data. Social science data: social networks (Facebook, LinkedIn), social media data (news, telephone calls). Finance data, image data, etc.

4 Big Data Challenges Accessing, using and visualizing data Server-side processing and distributed storage Limited number of statistical methods: From the view of statistical inference, it is unclear how the current statistical methodology can be transported to the paradigm of big data. Modeling: With growing size typically comes a growing complexity of data structures and of the models needed to account for the structures. Missing data

5 Strategies used in Big Data Analysis Split and Merge: Lin and Xi (2011, SII): Aggregated estimating equation Xie (2013): high dimensional variable selection Song and Liang (2015, JRSSB): Bayesian high dimensional variable selection Online Learning: stream data Data/model reduction: Using low-rank models for approximate inference of massive data Subsampling: Liang et al. (2013, JASA) Bag of Little Bootstraps (Kleiner et al., 2012): provides an efficient way of bootstrapping for big data estimators, which functions by combining the results of bootstrapping multiple small subsets of the big original dataset.

6 Aggregated Estimating Equation The aggregated estimating equation method (Lin and Xi, 2011) employs a divide-and-combine strategy. It first compresses the raw data of each partition of the full dataset into low-dimensional statistics, and then obtains an approximation to the estimating-equation estimator, the aggregated estimating equation estimator, by solving an equation aggregated from the saved low-dimensional statistics of all partitions.

7 Resampling-based Stochastic Approximation Liang et al. (2013, JASA) proposed a new parameter estimator for big data problems, the maximum mean log-likelihood estimator, together with a resampling-based stochastic approximation method for obtaining it. The full-data score equation is successively replaced by its subsample analogues:
$$\nabla_\theta l_n(\theta|X_n) = 0 \;\Longrightarrow\; \binom{n}{m}^{-1} \sum_i \nabla_\theta l_{n,m,i}(\theta|X_m) = 0 \;\Longrightarrow\; E[\nabla_\theta l_m(\theta|X_m)] = 0.$$
The resampling-based stochastic approximation method successfully avoids some difficulties involved in big data computation, such as whole-data scanning.

8 Bag of Little Bootstraps The bootstrap is a resampling-based method that has been widely used in applied statistics for assessing the quality of estimators since it was proposed by Efron (1979). The bag of little bootstraps (Kleiner et al., 2012) provides an efficient way of bootstrapping for big data estimators; it works by combining the results of bootstrapping multiple small subsets of the big original dataset.

9 Bootstrap MH Algorithm: Motivation Markov chain Monte Carlo (MCMC) methods have been widely used in statistical data analysis, and they have proven to be a very powerful, and often the only feasible, computational tool for analyzing data of complex structures. However, MCMC methods are computer-intensive, typically requiring a large number of iterations and a complete scan of the full dataset at each iteration. This feature precludes their use for big data analysis. We aim to develop a framework under which powerful MCMC methods can be tamed for use in big data analysis, including parameter estimation, optimization, and model selection.

10 Bootstrap MH Algorithm: Basic Idea The bootstrap Metropolis-Hastings (BMH) algorithm works by replacing the full-data log-likelihood with a Monte Carlo average of log-likelihoods calculated in parallel from multiple bootstrap samples, where a bootstrap sample refers to a small set of observations drawn from the full dataset at random, with or without replacement. In this way, BMH avoids repeated scans of the full dataset across iterations, while still producing sensible solutions, such as parameter estimates or posterior samples, to the problem under consideration. BMH is feasible for big data and workable on parallel and distributed architectures.

11 BMH Algorithm: Notation Let $D_i$ denote a bootstrap sample of $D$, resampled from the full dataset at random, with or without replacement. Let $m$ denote the size of $D_i = \{x_{ij} : j = 1, 2, \ldots, m\}$. If resampling is done without replacement, $D_i$ is called a subsample or $\binom{n}{m}$-bootstrap sample. Otherwise, $D_i$ is called an m-out-of-n bootstrap sample or $m/n$-bootstrap sample.
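The two resampling schemes can be sketched with NumPy (the toy dataset, sizes, and variable names below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=1000)  # toy full dataset, n = 1000
m = 50                     # bootstrap sample size

# (n choose m)-bootstrap sample: subsample drawn without replacement
D_sub = rng.choice(D, size=m, replace=False)

# m-out-of-n bootstrap sample: resampled with replacement
D_mn = rng.choice(D, size=m, replace=True)

# Without replacement, the m draws are m distinct observations of D.
assert len(np.unique(D_sub)) == m
```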

12 BMH Algorithm: Notation Let $f(D_i|\theta)$ denote a likelihood-like function of $D_i$, and define
$$l_{m,n,k}(D_s|\theta) = \frac{1}{k} \sum_{i=1}^{k} \log f(D_i|\theta), \qquad (1)$$
where $k$ denotes the number of bootstrap samples drawn from $D$, and $D_s = \{D_1, \ldots, D_k\}$ is the collection of the bootstrap samples. The definition of $f(D_i|\theta)$ depends on the features of $D$. If the observations in $D$ are independently and identically distributed (i.i.d.), then, regardless of whether $D_i$ is a $\binom{n}{m}$- or $m/n$-bootstrap sample, we define
$$f(D_i|\theta) = \prod_{j=1}^{m} f(x_{ij}|\theta). \qquad (2)$$
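The averaged log-likelihood (1), with the i.i.d. factorization (2), can be sketched for a toy N(θ, 1) model (the dataset, sample sizes, and function names here are illustrative, not from the paper):

```python
import numpy as np

def log_f(D_i, theta):
    """log f(D_i | theta) under (2): sum of N(theta, 1) log-densities."""
    m = len(D_i)
    return -0.5 * np.sum((D_i - theta) ** 2) - 0.5 * m * np.log(2 * np.pi)

def l_mnk(D_s, theta):
    """Equation (1): average of the k bootstrap-sample log-likelihoods."""
    return np.mean([log_f(D_i, theta) for D_i in D_s])

rng = np.random.default_rng(1)
D = rng.normal(loc=2.0, size=10_000)  # toy full dataset with true theta = 2
D_s = [rng.choice(D, size=100, replace=False) for _ in range(25)]  # k = 25, m = 100

# The averaged log-likelihood should prefer the true value theta = 2 over theta = 0.
assert l_mnk(D_s, 2.0) > l_mnk(D_s, 0.0)
```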

13 BMH Algorithm: Algorithm 1. Draw $\vartheta$ from a proposal distribution $Q(\theta_t, \vartheta)$. 2. Draw $k$ bootstrap samples $D_1, \ldots, D_k$ via $\binom{n}{m}$- or $m/n$-bootstrapping; let $D_s = \{D_1, \ldots, D_k\}$. 3. Calculate the BMH ratio:
$$r(\theta_t, D_s, \vartheta) = \exp\{l_{m,n,k}(D_s|\vartheta) - l_{m,n,k}(D_s|\theta_t)\} \, \frac{\pi(\vartheta)\, Q(\vartheta, \theta_t)}{\pi(\theta_t)\, Q(\theta_t, \vartheta)}.$$
4. Set $\theta_{t+1} = \vartheta$ with probability $\alpha(\theta_t, D_s, \vartheta) = \min\{1, r(\theta_t, D_s, \vartheta)\}$, and set $\theta_{t+1} = \theta_t$ with the remaining probability.
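A minimal one-parameter sketch of the four steps above, assuming a symmetric random-walk proposal (so the Q terms cancel) and a flat prior (so the π terms drop out); all names and settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
D = rng.normal(loc=1.0, scale=1.0, size=100_000)  # toy full dataset
k, m = 25, 200

def log_f(D_i, theta):
    """log f(D_i | theta) for an N(theta, 1) model, up to a constant."""
    return -0.5 * np.sum((D_i - theta) ** 2)

def l_mnk(D_s, theta):
    """Averaged log-likelihood over the k bootstrap samples, eq. (1)."""
    return np.mean([log_f(D_i, theta) for D_i in D_s])

def bmh_step(theta_t, step=0.05):
    # Step 1: symmetric random-walk proposal (Q terms cancel in the ratio).
    theta_p = theta_t + step * rng.standard_normal()
    # Step 2: draw k fresh bootstrap samples of size m without replacement.
    D_s = [rng.choice(D, size=m, replace=False) for _ in range(k)]
    # Steps 3-4: log BMH ratio (flat prior) and the accept/reject decision.
    log_r = l_mnk(D_s, theta_p) - l_mnk(D_s, theta_t)
    return theta_p if np.log(rng.uniform()) < log_r else theta_t

theta, chain = 0.0, []
for _ in range(500):
    theta = bmh_step(theta)
    chain.append(theta)

# The chain should move from 0 toward the data mean (about 1.0).
print(round(float(np.mean(chain[250:])), 1))
```

Note that each iteration touches only k·m = 5,000 of the 100,000 observations, which is the source of BMH's speed-up; the price is that the chain targets π_m, whose variance is inflated relative to the full-data posterior.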

14 BMH Algorithm: Remarks In BMH, $\{\theta_t\}$ forms a Markov chain with transition kernel
$$P_{m,n,k}(\theta, d\vartheta) = \sum_{D_s \in \mathcal{D}} \alpha(\theta, D_s, \vartheta)\, Q(\theta, d\vartheta)\, \psi(D_s) + \delta_\theta(d\vartheta) \left[1 - \sum_{D_s \in \mathcal{D}} \int_\Theta \alpha(\theta, D_s, \vartheta')\, Q(\theta, d\vartheta')\, \psi(D_s)\right], \qquad (3)$$
where $\mathcal{D}$ denotes the space of $D_s$, $\psi(D_s)$ denotes the probability of drawing $D_s$, and $\delta_\theta(\cdot)$ is an indicator function. For $\binom{n}{m}$-bootstrapping, $\psi(D_s) = \binom{n}{m}^{-k}$, and for $m/n$-bootstrapping, $\psi(D_s) = 1/n^{mk}$.

15 BMH Algorithm: Remarks When the observations in $D$ are i.i.d., both resampling schemes, $\binom{n}{m}$- and $m/n$-bootstrapping, lead to the same stationary distribution of BMH. Since BMH is designed for simulation on parallel computers, the parameter $k$ specifies the number of processors/nodes used in computing the averaged log-likelihood function. Theoretically, a large value of $k$ is preferred; however, an extremely large value of $k$ may slow down the computation due to the increased inter-node communication. In our experience, $k$ does not need to be very large for BMH to achieve good performance. The choice of $m$ can depend on the complexity of the model under consideration, in particular the dimension of $\theta$; in general, $m$ should increase with the complexity of the model.

16 BMH Algorithm: Convergence Let $g_m(D|\theta) = \exp\{E[\log f(D_i|\theta)]\}$, where $E[\cdot]$ denotes the expectation. Define the transition kernel
$$P_m(\theta, d\vartheta) = \alpha(\theta, \vartheta)\, Q(\theta, \vartheta)\, d\vartheta + \delta_\theta(d\vartheta) \left[1 - \int_\Theta \alpha(\theta, \vartheta')\, Q(\theta, \vartheta')\, d\vartheta'\right], \qquad (4)$$
which is induced by the proposal $Q(\cdot, \cdot)$ for a MH move with invariant distribution
$$\pi_m(\theta|D) \propto g_m(D|\theta)\, \pi(\theta). \qquad (5)$$

17 BMH Algorithm: Convergence Assume the following conditions hold: (A) $\sup_{\theta \in \Theta} E|\log f(X_i|\theta)| < \infty$. (B) $P_m$ defines an irreducible and aperiodic Markov chain such that $\pi_m(\cdot) P_m = \pi_m(\cdot)$; therefore, for any starting point $\theta_0 \in \Theta$, $\lim_{t \to \infty} \|P_m^t(\theta_0, \cdot) - \pi_m(\cdot)\| = 0$, where $\|\cdot\|$ denotes the total variation norm. (C) For any $(\theta, \vartheta) \in \Theta \times \Theta$, $0 < \exp\{l_{m,n,k}(D_s|\vartheta) - l_{m,n,k}(D_s|\theta)\} / [g_m(D|\vartheta)/g_m(D|\theta)] < \infty$, $\psi$-almost surely, where $\psi(D_s)$ is the resampling probability of $D_s$ from $D$.

18 BMH Algorithm: Convergence Lemma 1. Assume condition (A) holds and $m = O(n^\gamma)$. If $\gamma < 1/2$, then
$$U_{m,n}(D|\theta) - \log g_m(D|\theta) \stackrel{p}{\to} 0, \quad \text{as } n \to \infty. \qquad (6)$$

19 BMH Algorithm: Convergence Theorem 1 ($\binom{n}{m}$-bootstrapping). Assume the observations in $D$ are i.i.d. and conditions (A), (B) and (C) hold. Then for any $\epsilon \in (0, 1]$ and any $\theta_0 \in \Theta$, there exist $N(\epsilon, \theta_0) \in \mathbb{N}$, $K(\epsilon, \theta_0, n) \in \mathbb{N}$, and $T(\epsilon, \theta_0, n, k) \in \mathbb{N}$ such that for any $n > N(\epsilon, \theta_0)$, $k > K(\epsilon, \theta_0, n)$, and $t > T(\epsilon, \theta_0, n, k)$, $\|P_{m,n,k}^t(\theta_0, \cdot) - \pi_m(\cdot)\| \le \epsilon$, where $\pi_m(\cdot)$ is the stationary distribution of $P_m$ as defined in (5). Theorem 2 ($m/n$-bootstrapping). Under conditions similar to those of Theorem 1, BMH with $m/n$-bootstrapping has the same stationary distribution as with $\binom{n}{m}$-bootstrapping.

20 BMH Algorithm: Bayesian Inference Some key points: By the asymptotic normality of posterior distributions (see, e.g., Chen, 1985), we have
$$\pi_n(\theta|D) \stackrel{L}{\to} N(\mu_n, \Sigma_n), \qquad (7)$$
where $\mu_n$ denotes the mode of $\pi_n(\theta|D)$ and $\Sigma_n = \{-\partial^2 \log(\pi(\theta) L(D|\theta)) / \partial\theta \partial\theta^T\}^{-1}$. Under regularity conditions, we show that as $m \to \infty$,
$$\pi_m(\theta|D) \stackrel{L}{\to} N\!\left(\mu_n, \frac{n}{m} \Sigma_n\right). \qquad (8)$$
Hence the properties of $\pi_n(\theta|D)$ can be conveniently inferred from BMH samples.
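Result (8) suggests a simple post-processing recipe: the full-data posterior covariance $\Sigma_n$ can be recovered from BMH output by rescaling the sample covariance by $m/n$. A one-dimensional sketch, with synthetic normal draws standing in for BMH output (all numerical values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 100_000, 200

# Pretend these are BMH posterior draws: by (8) they are approximately
# N(mu_n, (n/m) * Sigma_n).  Here mu_n = 0.5 and Sigma_n = 1/n are made up.
sigma_n = 1.0 / n
bmh_draws = rng.normal(0.5, np.sqrt((n / m) * sigma_n), size=50_000)

# Rescale the BMH sample variance by m/n to estimate Sigma_n.
sigma_n_hat = (m / n) * np.var(bmh_draws)
assert abs(sigma_n_hat / sigma_n - 1.0) < 0.05
```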

21 Simulated Example Consider the normal linear regression $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$, $i = 1, 2, \ldots, n$, where $(\beta_0, \beta_1, \beta_2, \beta_3) = (2, 0.25, 0.25, 0)$ are regression coefficients, and $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. normal random errors with mean 0 and variance $\sigma^2$. In the simulations, we set $n = 10^5$ and $\sigma^2 = 0.25$, generate both $x_1 = (x_{11}, \ldots, x_{n1})^T$ and $x_2 = (x_{12}, \ldots, x_{n2})^T$ from the multivariate normal distribution $N(0, I_n)$, and set $x_3 = (x_{13}, \ldots, x_{n3})^T = 0.7 x_1 + z$, where $z$ is also generated from $N(0, I_n)$. Let $\theta = (\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2)$, with true value $\theta^* = (2, 0.25, 0.25, 0, 0.25)$.
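The simulation design can be reproduced as follows, taking the collinearity term as $x_3 = 0.7 x_1 + z$ (the subscript is garbled in the source); the random seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2015)
n = 100_000
beta = np.array([2.0, 0.25, 0.25, 0.0])  # true (beta_0, beta_1, beta_2, beta_3)
sigma2 = 0.25

x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
z = rng.standard_normal(n)
x3 = 0.7 * x1 + z                        # x_3 is correlated with x_1; beta_3 = 0
X = np.column_stack([np.ones(n), x1, x2, x3])
y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Sanity check: least squares recovers the true coefficients at n = 10^5.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta, atol=0.02)
```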

22 Simulated Example Table: Parameter estimation results of MH and BMH (with $\binom{n}{m}$-bootstrapping) for the simulated example, reporting estimates (standard deviations in parentheses) of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\log\sigma^2$. Settings compared: MH for the full data with $(k, m) = (1, 10^5)$, i.e., $m/n = 100\%$, versus BMH with $(k, m) = (25, 200)$, $(25, 500)$ and $(25, 1000)$, i.e., resampling rates $m/n$ of 0.2%, 0.5% and 1%.

23 Simulated Example Table: Parameter estimation results of MH and BMH (with $m/n$-bootstrapping) for the simulated example, reporting estimates (standard deviations in parentheses) of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\log\sigma^2$. Settings compared: MH for the full data with $(k, m) = (1, 10^5)$, i.e., $m/n = 100\%$, versus BMH with $(k, m) = (25, 200)$, $(25, 500)$ and $(25, 1000)$, i.e., resampling rates $m/n$ of 0.2%, 0.5% and 1%.

24 Simulated Example Table: Comparison of BMH ($k = 50$, $m = 200$) with the divide-and-combine (D&C) method and AMHT (approximate MH test; Korattikara et al., 2014) for parameter estimation, reporting estimates and standard deviations (in parentheses) of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\log\sigma^2$ for each algorithm.

25 Variance Estimation Table: MH, BMH, D&C and AMHT estimates of $\sigma_{11}^2, \ldots, \sigma_{55}^2$, $\sigma_{12}^2$, $\sigma_{34}^2$ and $\rho_{\beta_2,\beta_3}$ obtained with pooled samples, where $\sigma_{ij}^2$ denotes the $(i,j)$th element of $\Sigma_0$ and $\rho_{\beta_2,\beta_3}$ denotes the correlation coefficient of $\beta_2$ and $\beta_3$.

26 QQ-plot of Posterior Samples Figure: QQ-plots of the posterior samples generated by MH, BMH (with $k = 50$, $m = 200$ and $\binom{n}{m}$-bootstrapping), and AMHT for the simulation example: panels (a)-(e) in the first row plot BMH versus MH for $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\log\sigma^2$; panels (f)-(j) in the second row plot AMHT versus MH for the same parameters.

27 Histogram of Posterior Samples Figure: Histograms of the posterior samples of $\beta_1$ generated by (a) MH, (b) AMHT, and (c) BMH for the simulation example.

28 Comments on BMH It can asymptotically integrate the whole data information into a single simulation run. Like the MH algorithm, it can serve as a basic building block for developing advanced MCMC algorithms.

29 Neural Network Universal approximation ability: A feed-forward network with a single hidden layer containing a finite number of hidden units is a universal approximator among continuous functions on compact subsets, under mild assumptions on the activation function. It is potentially a good tool for big data modeling!

30 Neural Network Figure: A fully connected one-hidden-layer MLP network, with input units $I_1$-$I_4$, hidden units $H_1$-$H_3$, output units $O_1$-$O_3$, and a bias unit $B$.

31 Tempering BMH for Learning BNN with Big Data 1. Draw $k$ bootstrap samples, $D_1, \ldots, D_k$, with or without replacement from the entire training dataset $D$. 2. Try to update each sample of the current population $(\theta_t^1, \ldots, \theta_t^\Pi)$ by the local updating operators, where $t$ indexes iterations and the energy function is calculated by averaging the energy values computed from the bootstrap samples $D_1, \ldots, D_k$. 3. Try to exchange $\theta_t^i$ with $\theta_t^j$ for $\Pi - 1$ pairs $(i, j)$, with $i$ sampled uniformly on $\{1, \ldots, \Pi\}$ and $j = i \pm 1$ with probability $\omega_{i,j}$, where $\omega_{i,i+1} = \omega_{i,i-1} = 0.5$ for $1 < i < \Pi$ and $\omega_{1,2} = \omega_{\Pi,\Pi-1} = 1$.
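Step 3 is the usual parallel-tempering swap, except that the energies are bootstrap averages. A sketch of one exchange attempt (the temperature ladder, chain states, and Gaussian energy function are all illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=1.0, size=50_000)    # toy training data
k, m = 10, 100
temps = np.array([1.0, 1.5, 2.25, 3.4])  # hypothetical temperature ladder, Pi = 4
Pi = len(temps)
thetas = rng.normal(size=Pi)             # current population of chain states

def energy(theta, D_s):
    """Energy = bootstrap-averaged negative log-likelihood (up to a constant)."""
    return np.mean([0.5 * np.sum((D_i - theta) ** 2) for D_i in D_s])

# Step 1: bootstrap samples shared by all chains in this iteration.
D_s = [rng.choice(D, size=m, replace=False) for _ in range(k)]

# Step 3: pick chain i uniformly; neighbour j = i +/- 1 with the boundary
# weights omega_{1,2} = omega_{Pi,Pi-1} = 1 and 0.5/0.5 in the interior.
i = int(rng.integers(Pi))
if i == 0:
    j = 1
elif i == Pi - 1:
    j = Pi - 2
else:
    j = i + 1 if rng.uniform() < 0.5 else i - 1

# Standard tempering swap ratio, with bootstrap-averaged energies.
log_r = (energy(thetas[i], D_s) - energy(thetas[j], D_s)) * (1.0 / temps[i] - 1.0 / temps[j])
if np.log(rng.uniform()) < log_r:
    thetas[i], thetas[j] = thetas[j], thetas[i]
```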

32 Tempering BMH for Learning BNN with Big Data Figure: Parallel implementation of tempering BMH: the flowchart of the tempering BMH algorithm with 3 processors. The master (rank 1) broadcasts the current parameters $\theta_t^i$; each slave generates a bootstrap sample $D_j$ and computes the energy $E(\theta_t^i|D_j)$; the energies are reduced to the master, which calculates $\frac{1}{3}\sum_{j=1}^{3} E(\theta_t^i|D_j)$, updates the parameters, and iterates until the end of the simulation.

33 Forest Cover Type Data The goal of this study is to predict forest cover types from cartographic variables in forested areas with minimal human-caused disturbances. The data were taken from four wilderness areas located in the Roosevelt National Forest of northern Colorado and consist of 581,012 observations. Each observation was obtained from the US Geological Survey (USGS) digital elevation model data based on 30 m × 30 m raster cells, and consists of 54 cartographic explanatory variables, including 10 quantitative variables, 4 binary wilderness-area variables, and 40 binary soil-type variables. The observations have been classified into seven classes according to their cover types, with respective class sizes 211840, 283301, 35754, 2747, 9493, 17367, and 20510.

34 Forest Cover Type Data Table 4. BMH results for the forest cover type data. The resampling rates for the 7 types of observations are 0.5%, 0.5%, 1%, 2.5%, 1.5%, 1% and 1%, respectively; the aggregated resampling rate for the training data is 0.59%. For each bootstrapping scheme ($m/n$ and $\binom{n}{m}$) and each value of $k$, the table reports the average network size, training rate, prediction rate, and CPU time (standard errors in parentheses): training rates were about 72.2-72.3%, prediction rates about 72.4%, and CPU times ranged from 28.5 to 33.7 hours.

35 Forest Cover Type Data: Efficiency of BMH For comparison, we applied parallel tempering to train the BNN in a single-threaded simulation on an Intel Nehalem server. At each local updating step of parallel tempering, the whole training dataset is scanned once; hence, the algorithm runs extremely slowly. The first 5536 iterations of the simulation took 688 CPU hours, even though the Intel Nehalem processor is much faster (approximately 2.5 times) than the processors used in the cluster machine. Finishing the iterations would take about 3000 CPU hours (125 days) on the Intel Nehalem server. Comparing 3000 hours with 30 hours shows the great advantage of the parallelized BMH algorithm for big data problems.

36 Resampling on Distributed Architectures Let $S_i$ denote the $i$th subset of data stored on node $i$, $i = 1, \ldots, k$. For each $i$: 1. Set $j = i - 1$ or $i + 1$ with equal probability. If $j = 0$, reset $j = k$; if $j = k + 1$, reset $j = 1$. 2. Exchange $M$ randomly selected observations between $S_i$ and $S_j$, where $M$ can be a pre-specified or random number. It follows from the standard theory of MCMC (see, e.g., Geyer, 1991) that this procedure ensures that each subset stored on a single node is a random subset of the whole dataset.
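The exchange step can be simulated in-memory to check that it preserves the partition of the data across nodes (node contents, sizes, and M below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
k, size, M = 4, 100, 10
# S_1, ..., S_k: the subsets stored on the k nodes (integer labels as stand-ins).
subsets = [list(range(size * i, size * (i + 1))) for i in range(k)]

def exchange_round(subsets, M):
    """One pass of the ring exchange: each node i swaps M randomly chosen
    observations with a uniformly chosen neighbour j = i - 1 or i + 1,
    wrapping around at the ends."""
    k = len(subsets)
    for i in range(k):
        j = (i + rng.choice([-1, 1])) % k   # ring neighbour
        idx_i = rng.choice(len(subsets[i]), size=M, replace=False)
        idx_j = rng.choice(len(subsets[j]), size=M, replace=False)
        for a, b in zip(idx_i, idx_j):
            subsets[i][a], subsets[j][b] = subsets[j][b], subsets[i][a]
    return subsets

subsets = exchange_round(subsets, M)
# The swaps preserve subset sizes and the union of the subsets.
assert all(len(S) == size for S in subsets)
assert sorted(x for S in subsets for x in S) == list(range(k * size))
```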

37 Discussion We have proposed the BMH algorithm as a basic MCMC algorithm for Bayesian analysis of big data. The BMH algorithm is workable on parallel and distributed architectures and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH is generally more efficient as it can asymptotically integrate the whole data information into a single simulation run.

38 Discussion (continued) The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. Sampling: tempering BMH, which combines BMH with parallel tempering. Model selection: reversible jump BMH, which combines BMH with reversible jump MCMC. Optimization: simulated annealing BMH, which combines BMH with simulated annealing. Compared to existing methods such as the aggregated estimating equation, resampling-based stochastic approximation, and the bag of little bootstraps, BMH has the unique power to tame powerful MCMC methods for use in big data analysis.

39 Acknowledgments NSF grants and a KAUST grant. Student: Jinsu Kim, for parallel programming.


More information

An Internal Model for Operational Risk Computation

An Internal Model for Operational Risk Computation An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFF-RiskLab, Madrid http://www.risklab-madrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli

More information

Some stability results of parameter identification in a jump diffusion model

Some stability results of parameter identification in a jump diffusion model Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss

More information

Recent Developments of Statistical Application in. Finance. Ruey S. Tsay. Graduate School of Business. The University of Chicago

Recent Developments of Statistical Application in. Finance. Ruey S. Tsay. Graduate School of Business. The University of Chicago Recent Developments of Statistical Application in Finance Ruey S. Tsay Graduate School of Business The University of Chicago Guanghua Conference, June 2004 Summary Focus on two parts: Applications in Finance:

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Monte Carlo-based statistical methods (MASM11/FMS091)

Monte Carlo-based statistical methods (MASM11/FMS091) Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 5, 2013 J. Olsson Monte Carlo-based

More information

Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia

Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia Estimating the Degree of Activity of jumps in High Frequency Financial Data joint with Yacine Aït-Sahalia Aim and setting An underlying process X = (X t ) t 0, observed at equally spaced discrete times

More information

Bayesian Methods. 1 The Joint Posterior Distribution

Bayesian Methods. 1 The Joint Posterior Distribution Bayesian Methods Every variable in a linear model is a random variable derived from a distribution function. A fixed factor becomes a random variable with possibly a uniform distribution going from a lower

More information

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010 Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte

More information

Sales forecasting # 2

Sales forecasting # 2 Sales forecasting # 2 Arthur Charpentier arthur.charpentier@univ-rennes1.fr 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting

More information

Exploiting the Statistics of Learning and Inference

Exploiting the Statistics of Learning and Inference Exploiting the Statistics of Learning and Inference Max Welling Institute for Informatics University of Amsterdam Science Park 904, Amsterdam, Netherlands m.welling@uva.nl Abstract. When dealing with datasets

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

Sampling for Bayesian computation with large datasets

Sampling for Bayesian computation with large datasets Sampling for Bayesian computation with large datasets Zaiying Huang Andrew Gelman April 27, 2005 Abstract Multilevel models are extremely useful in handling large hierarchical datasets. However, computation

More information

Big Data, Statistics, and the Internet

Big Data, Statistics, and the Internet Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes

More information

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

More information

Topic models for Sentiment analysis: A Literature Survey

Topic models for Sentiment analysis: A Literature Survey Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.

More information

Applications of R Software in Bayesian Data Analysis

Applications of R Software in Bayesian Data Analysis Article International Journal of Information Science and System, 2012, 1(1): 7-23 International Journal of Information Science and System Journal homepage: www.modernscientificpress.com/journals/ijinfosci.aspx

More information

More details on the inputs, functionality, and output can be found below.

More details on the inputs, functionality, and output can be found below. Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni 1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

A Practical Scheme for Wireless Network Operation

A Practical Scheme for Wireless Network Operation A Practical Scheme for Wireless Network Operation Radhika Gowaikar, Amir F. Dana, Babak Hassibi, Michelle Effros June 21, 2004 Abstract In many problems in wireline networks, it is known that achieving

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care.

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care. Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care University of Florida 10th Annual Winter Workshop: Bayesian Model Selection and

More information

Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

More information

Package EstCRM. July 13, 2015

Package EstCRM. July 13, 2015 Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu

More information

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore. CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

More information

Monte Carlo Simulation

Monte Carlo Simulation 1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging

More information

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance

More information

Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set Jeffrey W. Miller Brenda Betancourt Abbas Zaidi Hanna Wallach Rebecca C. Steorts Abstract Most generative models for

More information

Applied Multivariate Analysis - Big data analytics

Applied Multivariate Analysis - Big data analytics Applied Multivariate Analysis - Big data analytics Nathalie Villa-Vialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of

More information

Efficiency and the Cramér-Rao Inequality

Efficiency and the Cramér-Rao Inequality Chapter Efficiency and the Cramér-Rao Inequality Clearly we would like an unbiased estimator ˆφ (X of φ (θ to produce, in the long run, estimates which are fairly concentrated i.e. have high precision.

More information

Compression and Aggregation of Bayesian Estimates for Data Intensive Computing

Compression and Aggregation of Bayesian Estimates for Data Intensive Computing Under consideration for publication in Knowledge and Information Systems Compression and Aggregation of Bayesian Estimates for Data Intensive Computing Ruibin Xi 1, Nan Lin 2, Yixin Chen 3 and Youngjin

More information

Clustering in the Linear Model

Clustering in the Linear Model Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

More information

Dirichlet Processes A gentle tutorial

Dirichlet Processes A gentle tutorial Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.

More information

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,

More information

Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

More information

Neural Network Add-in

Neural Network Add-in Neural Network Add-in Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Pre-processing... 3 Lagging...

More information

Probabilistic Methods for Time-Series Analysis

Probabilistic Methods for Time-Series Analysis Probabilistic Methods for Time-Series Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:

More information

Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring. Jie-Men Mok Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

More information

Centre for Central Banking Studies

Centre for Central Banking Studies Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR) 2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Designing a learning system

Designing a learning system Lecture Designing a learning system Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x4-8845 http://.cs.pitt.edu/~milos/courses/cs750/ Design of a learning system (first vie) Application or Testing

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Sampling-based optimization

Sampling-based optimization Sampling-based optimization Richard Combes October 11, 2013 The topic of this lecture is a family of mathematical techniques called sampling-based methods. These methods are called sampling mathods because

More information

Numerical methods for American options

Numerical methods for American options Lecture 9 Numerical methods for American options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 American options The holder of an American option has the right to exercise it at any moment

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information