Linear regression methods for large n and streaming data

Large n with small or moderate p is a fairly simple problem. The sufficient statistic for β in OLS (and ridge regression) is the pair (X^T X, X^T Y). The concept of sufficiency is key to processing big data: in this case the sufficient statistic is a simple sum over observations, so it can be computed in batches or updated online in the obvious way. (2) Linear regression - Part 2 Page 1
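As a concrete sketch (our own illustrative example, not from the slides), the sufficient statistics X^T X and X^T y can be accumulated batch by batch, and the OLS estimate recovered from the running sums alone:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
beta_true = np.array([1.0, -2.0, 0.5])

# Accumulate the sufficient statistics X'X and X'y one batch at a time.
XtX = np.zeros((p, p))
Xty = np.zeros(p)
for _ in range(100):                  # 100 batches of 1,000 rows each
    X = rng.normal(size=(1000, p))
    y = X @ beta_true + rng.normal(size=1000)
    XtX += X.T @ X                    # running sums over observations
    Xty += X.T @ y

beta_hat = np.linalg.solve(XtX, Xty)  # OLS computed from the sums only
print(beta_hat)                       # close to beta_true
```

No batch needs to be kept in memory after its contribution to the sums, which is exactly why sufficiency matters for streaming data.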

Large n: For truly massive data, subsampling is an option, and there are many types of sampling. You can also take several subsamples, average the estimates across subsamples for a point estimate, and use quantiles of the estimates across subsamples as bootstrap-style confidence intervals.
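A minimal sketch of this subsample-and-average idea (the sample sizes and number of subsamples are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200_000, 2
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Fit OLS on several small random subsamples instead of the full data.
estimates = []
for _ in range(50):
    idx = rng.choice(n, size=5000, replace=False)  # simple random subsample
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    estimates.append(b)
estimates = np.array(estimates)

point = estimates.mean(axis=0)  # average over subsamples: point estimate
lo, hi = np.quantile(estimates, [0.025, 0.975], axis=0)  # interval endpoints
print(point, lo, hi)
```

Each fit touches only 5,000 rows, so the full design matrix never has to be processed at once.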

The problem is more interesting when the regression coefficients evolve over time. A dynamic linear model (DLM) is:

Y_t ~ Normal(X_t β_t, σ² I)
β_t | β_{t-1} ~ Normal(ρ β_{t-1}, τ² I)

There are far more general versions of the DLM; this is also called a state-space model. Here the regression relationship β_t varies over time, for example a regression whose slope drifts gradually with market or seasonal conditions. If ρ = 1 and τ ≈ 0, the parameters evolve slowly over time. If ρ = 0, the β_t are independent over time.
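The two limiting cases can be seen by simulating the state equation (the values of ρ and τ below are our own illustrative picks):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 500

def simulate_beta(rho, tau, T):
    """Simulate the state equation beta_t = rho * beta_{t-1} + Normal(0, tau^2)."""
    beta = np.zeros(T)
    for t in range(1, T):
        beta[t] = rho * beta[t - 1] + tau * rng.normal()
    return beta

slow = simulate_beta(rho=1.0, tau=0.05, T=T)  # random walk: drifts slowly
indep = simulate_beta(rho=0.0, tau=1.0, T=T)  # fresh draw at every time point

# Successive values are nearly equal when rho = 1 and tau is small,
# but unrelated when rho = 0.
print(np.mean(np.abs(np.diff(slow))), np.mean(np.abs(np.diff(indep))))
```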

The simplest approach is a weighted linear regression in which observations near the current time get the most weight. For example, the weights might come from a Gaussian pdf centered at the current time t, w_i ∝ exp[−(t_i − t)² / (2h²)], or the weights could be a moving window, w_i = 1 if |t_i − t| ≤ h and 0 otherwise. How should we pick the bandwidth h?
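A sketch of the kernel-weighted estimator with Gaussian weights (the data-generating process and the bandwidth h = 20 are our own illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
t = np.arange(T)
beta_t = np.sin(2 * np.pi * t / T)  # a slowly varying true coefficient
x = rng.normal(size=T)
y = beta_t * x + 0.1 * rng.normal(size=T)

def local_beta(t0, h):
    """Weighted least squares estimate of the coefficient at time t0.

    Gaussian kernel weights; a moving window would instead set
    w = (abs(t - t0) <= h).astype(float).
    """
    w = np.exp(-0.5 * ((t - t0) / h) ** 2)
    return np.sum(w * x * y) / np.sum(w * x * x)

est = local_beta(150, h=20.0)
print(est, beta_t[150])
```

Larger h averages over more data (less variance, more bias from the drift in β_t); smaller h tracks the drift more closely but is noisier. That trade-off is the bandwidth-selection question.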

A more elegant approach is the Kalman filter (KF). The KF can be motivated using a Bayesian approach, so before discussing the KF we will introduce Bayesian linear regression. Bayesian methods can be applied to virtually any statistical problem, but we focus here on linear models:

Y | β ~ Normal(Xβ, σ² I_n)
β ~ Normal(µ, Σ)

where σ, µ, and Σ are assumed to be known. Bayesians assume that there is truly a fixed value of β. However, we acknowledge that we do not, and never will, know what it is, so we represent our uncertainty about β by treating it as a random variable with a probability distribution. Before we observe the data, our uncertainty is captured by the prior distribution; above we select the prior β ~ Normal(µ, Σ). A Bayesian analysis combines the data and the prior to give the posterior distribution. Bayes' theorem gives the posterior:

p(β | Y) = f(Y | β) f(β) / f(Y)

That is, posterior ∝ likelihood × prior. The posterior quantifies our uncertainty about β after observing the data, and it is what we use to conduct inference and make predictions.
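The posterior for this model is available in closed form (derived below); here is a numerical sketch with illustrative values of µ, Σ, and σ chosen by us:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 100, 2, 1.0
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -0.5])
y = X @ beta_true + sigma * rng.normal(size=n)

mu = np.zeros(p)          # prior mean
Sigma = 10.0 * np.eye(p)  # prior covariance (a weak prior)

# Posterior: beta | Y ~ Normal(M, V) with
#   V = (Sigma^{-1} + X'X / sigma^2)^{-1}
#   M = V (Sigma^{-1} mu + X'y / sigma^2)
V = np.linalg.inv(np.linalg.inv(Sigma) + X.T @ X / sigma**2)
M = V @ (np.linalg.inv(Sigma) @ mu + X.T @ y / sigma**2)
print(M)  # posterior mean; close to OLS under this weak prior
```

With a diffuse prior (large Σ) the posterior mean is essentially the OLS estimate; a tight prior pulls it toward µ.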

Derivation of the posterior of β for the model Y | β ~ Normal(Xβ, σ² I) with prior β ~ Normal(µ, Σ):
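One standard way to complete this derivation (a sketch, completing the square in β) is:

```latex
p(\beta \mid Y) \propto f(Y \mid \beta)\, f(\beta)
  \propto \exp\!\Big[-\tfrac{1}{2\sigma^2}(Y - X\beta)^\top (Y - X\beta)\Big]
          \exp\!\Big[-\tfrac{1}{2}(\beta - \mu)^\top \Sigma^{-1} (\beta - \mu)\Big].
```

Dropping factors that do not involve β and collecting the quadratic and linear terms in β:

```latex
p(\beta \mid Y) \propto \exp\!\Big[-\tfrac{1}{2}\Big(
  \beta^\top \big(\tfrac{X^\top X}{\sigma^2} + \Sigma^{-1}\big)\beta
  - 2\,\beta^\top \big(\tfrac{X^\top Y}{\sigma^2} + \Sigma^{-1}\mu\big)\Big)\Big],
```

which is the kernel of a normal distribution. Hence

```latex
\beta \mid Y \sim \mathrm{Normal}(M, V), \qquad
V = \Big(\tfrac{X^\top X}{\sigma^2} + \Sigma^{-1}\Big)^{-1}, \qquad
M = V\Big(\tfrac{X^\top Y}{\sigma^2} + \Sigma^{-1}\mu\Big).
```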

Y_t ~ Normal(X_t β_t, σ² I)
β_t | β_{t-1} ~ Normal(ρ β_{t-1}, τ² I)

The KF is a sequential application of the Bayesian linear model. At the first time point we apply the usual Bayesian linear model and obtain the posterior β_1 | Y_1 ~ Normal(M_1, V_1). This posterior is used to define the prior for β_2. At the second time point, the prior is β_2 | β_1 ~ Normal(ρ β_1, τ² I). We do not know β_1 exactly, but we have its posterior distribution given all the data observed so far. Accounting for our uncertainty in β_1, the prior for β_2 is β_2 | Y_1 ~ Normal(ρ M_1, ρ² V_1 + τ² I). Applying the Bayesian linear model formulas again gives the posterior of β_2, namely β_2 | Y_1, Y_2 ~ Normal(M_2, V_2).
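This predict-then-update cycle can be sketched in code for a single coefficient (the settings ρ = 1, τ = 0.1, σ = 0.5 and the simulated data are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
T, sigma, tau, rho = 200, 0.5, 0.1, 1.0

# Simulate a scalar DLM: y_t = x_t * beta_t + noise, beta_t a random walk.
beta = np.cumsum(tau * rng.normal(size=T))
x = rng.normal(size=T)
y = x * beta + sigma * rng.normal(size=T)

# Kalman filter: one Bayesian linear-model step per observation.
M, V = 0.0, 1.0  # prior mean and variance for beta_1
filtered = np.empty(T)
for t in range(T):
    # Predict: prior for beta_t given data up to time t-1.
    M_pred = rho * M
    V_pred = rho**2 * V + tau**2
    # Update: condition on y_t (scalar Bayesian linear regression).
    V = 1.0 / (1.0 / V_pred + x[t]**2 / sigma**2)
    M = V * (M_pred / V_pred + x[t] * y[t] / sigma**2)
    filtered[t] = M

print(np.mean((filtered - beta) ** 2))  # filtering mean squared error
```

Note that each step reuses the Bayesian linear-model formulas with the previous posterior serving as the new prior, so the filter processes the stream one observation at a time.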

General Kalman filter updating rule:
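For the model above, the standard recursion alternates a predict step and an update step (a sketch, writing M_t and V_t for the filtering mean and covariance of β_t given Y_1, ..., Y_t):

```latex
\textbf{Predict:}\quad
\beta_t \mid Y_{1:t-1} \sim \mathrm{Normal}(a_t, R_t), \qquad
a_t = \rho\, M_{t-1}, \quad R_t = \rho^2 V_{t-1} + \tau^2 I.

\textbf{Update:}\quad
\beta_t \mid Y_{1:t} \sim \mathrm{Normal}(M_t, V_t), \qquad
V_t = \Big(R_t^{-1} + \tfrac{X_t^\top X_t}{\sigma^2}\Big)^{-1}, \quad
M_t = V_t\Big(R_t^{-1} a_t + \tfrac{X_t^\top Y_t}{\sigma^2}\Big).
```

The update step is exactly the Bayesian linear-model posterior from page 6, with the predicted distribution Normal(a_t, R_t) playing the role of the prior.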

We have looked at only the simplest case. A more general version of the DLM is:

Y_t ~ Normal(X_t β_t, Σ_t)
β_t | β_{t-1} ~ Normal(G_t β_{t-1}, Ω_t)

How do we estimate Σ_t, G_t, and Ω_t? So far we have assumed normality everywhere; what do we do for non-normal models, such as

Y_t ~ Poisson[exp(X_t β_t)]
β_t | β_{t-1} ~ Normal(G_t β_{t-1}, Ω_t)?

We have also assumed linear relationships between all variables; how do we handle nonlinearity, as in

Y_t ~ Normal[exp(X_t β_t), σ² I]
β_t | β_{t-1} ~ Normal(G_t β_{t-1}, Ω_t)?

These extensions have been worked out, but they are complicated (extended KF, unscented KF, ensemble KF, etc.).

Bayesian linear models for large p: While we are on the topic of Bayesian linear models, what do Bayesians do for linear (not dynamic) regression with large p? The linear regression model is Y_i ~ Normal(X_i^T β, σ²). The Bayesian model allows us to put priors on the regression coefficients β_1, ..., β_p. If we believe, before seeing the data, that most of the covariates are unimportant, we can simply specify a prior that places most of its mass near zero for each β_j. This is a very intuitive approach, and it has been shown to be very competitive.
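As a simple stand-in for such a prior (the slides' specific example is not recoverable here), a normal prior with small variance v0 concentrates its mass near zero and yields a ridge-type shrinkage estimator; all numbers below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 100, 50, 1.0
beta_true = np.zeros(p)
beta_true[:5] = 2.0  # only 5 of the 50 covariates actually matter
X = rng.normal(size=(n, p))
y = X @ beta_true + sigma * rng.normal(size=n)

# Prior concentrated near zero: beta_j ~ Normal(0, v0) with small v0.
# The posterior mean is a ridge-type shrinkage estimator.
v0 = 0.01
V = np.linalg.inv(X.T @ X / sigma**2 + np.eye(p) / v0)
M = V @ (X.T @ y / sigma**2)

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
# The prior pulls the coefficients of the unimportant covariates toward zero.
print(np.abs(M[5:]).mean(), np.abs(ols[5:]).mean())
```

The coefficients of the 45 irrelevant covariates are visibly shrunk relative to OLS, while the strong signals survive; heavier-tailed priors (used in practice for exactly this setting) shrink the noise harder while shrinking large signals less.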