# Penalized regression: Introduction


Patrick Breheny, August 30
BST 764: Applied Statistical Modeling

## 2. Maximum likelihood

Much of 20th-century statistics dealt with maximum likelihood estimation:

- One begins by specifying a distribution $f(x \mid \theta)$ for the data
- This distribution in turn induces a log-likelihood function $\ell(\theta \mid x)$, which specifies how likely the observed data are for various values of the unknown parameters
- Estimates can be found by locating the most likely values of the unknown parameters (illustrated below)
- Hypothesis tests and confidence intervals can be carried out by evaluating how the likelihood changes as we move away from these most likely values
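
The recipe above can be made concrete with a toy example. The sketch below (simulated data and a one-parameter normal model, chosen purely for illustration; not part of the slides) evaluates the log-likelihood on a grid and confirms that its maximizer is the sample mean:

```python
# A tiny illustration of the maximum likelihood recipe described above:
# specify a distribution, form the log-likelihood, and find its maximizer.
# Here the data are simulated N(mu, 1), so the MLE of mu is the sample mean.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=3.0, scale=1.0, size=200)

def loglik(mu, x):
    # log-likelihood of N(mu, 1), up to an additive constant
    return -0.5 * np.sum((x - mu) ** 2)

grid = np.linspace(0, 6, 601)
mle = grid[np.argmax([loglik(mu, x) for mu in grid])]
print(mle, x.mean())   # the two agree, up to the grid resolution
```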

## 3. Problems with maximum likelihood in regression

Maximum likelihood estimation has many wonderful properties, but it is often unsatisfactory in regression problems for two reasons:

- Large variability: when $p$ is large with respect to $n$, or when the columns of $X$ are highly correlated, the variance of $\hat\beta$ is large
- Lack of interpretability: if $p$ is large, we often desire a smaller set of predictors in order to gain insight into the most relevant relationships between $y$ and $X$

## 4. Subset selection

- Probably the most common solution to this problem is to choose a subset of important explanatory variables by ad hoc trial and error, using p-values to guide whether a variable should be in the model or not
- However, this approach is impossible to replicate, and therefore its statistical properties cannot be studied
- A more systematic approach is to exhaustively search all possible subsets and then use some criterion, such as AIC or BIC, to choose an optimal subset (see the sketch below)
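
As a rough illustration of the exhaustive-search idea (simulated data; the helper `aic_ols` and its AIC formula, stated up to an additive constant, are illustrative rather than taken from the slides), the sketch below scores every candidate subset with AIC:

```python
# A minimal sketch of exhaustive (all-subsets) selection using AIC.
# Data (X, y) are simulated here purely for illustration.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

def aic_ols(Xs, y):
    """AIC (up to a constant) for an OLS fit with an intercept."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), Xs])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    k = Xd.shape[1] + 1                           # coefficients + error variance
    return n * np.log(rss / n) + 2 * k

best = min(
    (s for r in range(1, p + 1) for s in itertools.combinations(range(p), r)),
    key=lambda s: aic_ols(X[:, list(s)], y),
)
print("Best subset by AIC:", best)
```

Even this tiny example loops over $2^p - 1$ candidate models, which is why the exhaustive search becomes infeasible as $p$ grows (next slide).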

## 5. Problem #1: Computational infeasibility

- One problem with this approach is that the number of possible subsets grows exponentially with $p$
- With today's computers, searching all possible subsets is computationally infeasible for $p$ much larger than 40 or 50
- For this reason, modifications such as forward selection, backward selection, and forward/backward hybrids have been proposed

## 6. Problem #2: Instability

- Another problem is the fact that subset selection is discontinuous, in the sense that an infinitesimally small change in the data can result in completely different estimates
- As a result, subset selection is often unstable and highly variable, especially in high dimensions
- As we will see, penalized regression allows us to accomplish the same goals as subset selection, but in a more stable, continuous, and computationally efficient fashion

## 7. Subset selection pitfalls

It should be pointed out that it is common practice to report inferential results from ordinary least squares models following subset selection as if the model had been prespecified from the beginning. This is unfortunate, as the resulting inferential procedures violate every principle of statistical estimation and hypothesis testing:

- Test statistics no longer follow $t$/$F$ distributions
- Standard errors are biased low, and confidence intervals are falsely narrow
- p-values are falsely small
- Regression coefficients are biased away from zero

## 8. Penalized maximum likelihood

A different way of dealing with this problem is to introduce a penalty: instead of maximizing the log-likelihood $\ell(\theta \mid x) = \log L(\theta \mid x)$, we maximize the function

$$M(\theta) = \ell(\theta \mid x) - \lambda P(\theta),$$

where

- $P$ is a function that penalizes what one would consider less realistic values of the unknown parameters
- $\lambda$ controls the tradeoff between the two parts

The function $M$ is called the objective function.

## 9. Penalized maximum likelihood (cont'd)

In regression, we usually think about minimizing a loss function (usually squared error loss) rather than maximizing a likelihood; an equivalent formulation is that we estimate $\theta$ by minimizing

$$M(\theta) = L(\theta \mid x) + \lambda P(\theta),$$

where $L$ here is a loss function (usually a quantity proportional to a negative log-likelihood, such as the residual sum of squares).

## 10. Penalty functions

What exactly do we mean by "less realistic" values? In the first section of this course, we mean that coefficient values around zero are more believable than those far away from zero, and $P$ is therefore a function which penalizes coefficients as they get further away from zero.

The two penalties that we will cover in depth in this section of the course are ridge regression and the lasso (sketched in code below):

- Ridge: $P(\beta) = \sum_{j=1}^{p} \beta_j^2$
- Lasso: $P(\beta) = \sum_{j=1}^{p} |\beta_j|$
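
A minimal sketch of these two penalties and of the penalized least-squares objective $M(\beta) = \mathrm{RSS}(\beta) + \lambda P(\beta)$ from the previous slide (the data and function names are illustrative, not from the slides):

```python
# Sketch of the penalized least-squares objective M(beta) = RSS(beta) + lambda * P(beta)
# for the ridge and lasso penalties. Data and names are illustrative only.
import numpy as np

def ridge_penalty(beta):
    return np.sum(beta ** 2)           # sum of squared coefficients

def lasso_penalty(beta):
    return np.sum(np.abs(beta))        # sum of absolute coefficients

def objective(beta, X, y, lam, penalty):
    rss = np.sum((y - X @ beta) ** 2)  # squared-error loss
    return rss + lam * penalty(beta)

# Tiny illustration with simulated data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=50)
beta = np.zeros(3)
print(objective(beta, X, y, lam=1.0, penalty=ridge_penalty))
print(objective(beta, X, y, lam=1.0, penalty=lasso_penalty))
```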

## 11. Penalty functions (cont'd)

- Later in the course, we will use the idea of penalization when fitting curves to data, to favor simpler, smoother curves over wiggly, erratic ones; here, $P$ is a measure of the wiggliness of the curve
- There are also plenty of other varieties of penalization in existence that we will not address in this course
- They all operate on the same basic principle, however: all other things being equal, we would tend to favor the simpler explanation over the more complicated one

## 12. The regularization parameter

As mentioned earlier, the parameter $\lambda$ controls the tradeoff between the penalty and the fit (loss/likelihood):

- When $\lambda$ is too small, we tend to overfit the data and end up with models that have high variance
- When $\lambda$ is too large, we tend to underfit the data and end up with models that are too simplistic and thus potentially biased

The parameter $\lambda$ is called the regularization parameter; changing the regularization parameter allows us to directly balance the bias-variance tradeoff (a small numerical sketch follows below). Obviously, selection of $\lambda$ is a very important practical aspect of fitting these models.
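
To illustrate the role of $\lambda$, the sketch below (simulated data; it assumes the standardized, intercept-free setting discussed later in these slides and uses the closed-form ridge solution $(X^T X + \lambda I)^{-1} X^T y$) computes ridge estimates over a grid of $\lambda$ values, showing the coefficients shrinking toward zero as $\lambda$ grows:

```python
# Sketch: ridge estimates across a grid of lambda values, illustrating that
# larger lambda shrinks coefficients toward zero (more bias, less variance).
# Assumes X has been centered/scaled and y centered, so no intercept is needed.
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)
y = y - y.mean()

for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # ridge solution
    print(f"lambda = {lam:7.1f}  ->  beta = {np.round(beta, 3)}")
```

At $\lambda = 0$ the estimates are the ordinary least squares estimates; as $\lambda$ increases they are pulled progressively toward zero.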

## 13. Bayesian interpretation

- From a Bayesian perspective, one can think of the penalty as arising from a prior distribution on the parameters
- The objective function is then (proportional to) the log of the posterior distribution of $\theta$ given the data
- By optimizing this objective function, we are finding the mode of the posterior distribution of $\theta$ (a concrete instance is worked out below)
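
As one concrete instance of this correspondence (a standard result, not spelled out on the slide): a normal prior on the coefficients yields the ridge objective. Assuming $y \mid \beta \sim N(X\beta, \sigma^2 I)$ and $\beta_j \sim N(0, \tau^2)$ independently,

$$\log p(\beta \mid y) = -\frac{1}{2\sigma^2}\lVert y - X\beta \rVert^2 - \frac{1}{2\tau^2}\lVert \beta \rVert^2 + \text{const},$$

so finding the posterior mode is equivalent to minimizing $\lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2$ with $\lambda = \sigma^2/\tau^2$.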

## 14. Constrained regression

- Yet another way to think about penalized regression is that it implies a constraint on the values of $\theta$
- Suppose we were trying to maximize the likelihood subject to the constraint that $P(\theta) \le t$
- A standard approach to solving such a problem is to introduce a Lagrange multiplier; when we do so, we arrive at the same objective function as earlier (see the sketch below)
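
To sketch the Lagrange-multiplier step alluded to above (the details here are a reconstruction, not taken from the slide): to maximize $\ell(\theta \mid x)$ subject to $P(\theta) \le t$, form the Lagrangian

$$\ell(\theta \mid x) - \lambda\bigl(P(\theta) - t\bigr), \qquad \lambda \ge 0.$$

For a fixed multiplier $\lambda$, the term $\lambda t$ does not depend on $\theta$, so maximizing the Lagrangian over $\theta$ is the same as maximizing $\ell(\theta \mid x) - \lambda P(\theta)$, which is exactly the objective $M(\theta)$ from earlier; each constraint level $t$ corresponds to some value of $\lambda$.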

## 15. Penalization and the intercept term

- Earlier, we mentioned that penalized regression revolves around an assumption that coefficient values around zero are more believable than those far away from zero
- Some care is needed, however, in the application of this idea
- First of all, it does not make sense to apply this idea to the intercept, unless you happen to have some reason to think that the mean of $y$ should be zero
- Hence, the intercept is not included in the penalty; if it were, the estimates would not be invariant to changes of location

## 16. Standardization

- A separate consideration is how to make "far from zero" mean the same thing for all the variables
- For example, suppose $x_1$ varied from 0 to 1, while $x_2$ varied from 0 to 1 million; clearly, a one-unit change in $x$ does not mean the same thing for both of these variables
- Thus, the explanatory variables are usually standardized prior to model fitting to have mean zero and standard deviation 1 (see the sketch below); i.e., for all $j$,

$$\bar{x}_j = 0, \qquad x_j^T x_j = n$$
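
A minimal sketch of this standardization (simulated data; the `standardize` helper is illustrative, not from the slides):

```python
# Sketch: standardize each column of X to have mean 0 and x_j^T x_j = n
# (i.e., standard deviation 1 using the divisor n), and center y.
import numpy as np

def standardize(X, y):
    center = X.mean(axis=0)
    scale = X.std(axis=0)              # divisor n, so x_j^T x_j = n after scaling
    Xs = (X - center) / scale
    ys = y - y.mean()
    return Xs, ys, center, scale

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=[1.0, 1000.0], size=(20, 2))
y = rng.normal(size=20)
Xs, ys, center, scale = standardize(X, y)
print(Xs.mean(axis=0))                 # ~ [0, 0]
print((Xs ** 2).sum(axis=0))           # ~ [n, n]
```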

## 17. Standardization (cont'd)

This can be accomplished without any loss of generality:

- Any location shifts for $X$ are absorbed into the intercept
- Scale changes can be reversed after the model has been fit:

$$x_{ij}\beta_j = \frac{x_{ij}}{a}\,(a\beta_j) = \tilde{x}_{ij}\tilde{\beta}_j;$$

i.e., if we had to divide $x_j$ by $a$ to standardize it, we simply divide the transformed solution $\tilde{\beta}_j$ by $a$ to obtain $\beta_j$ on the original scale.

## 18. Further benefits

Centering and scaling the explanatory variables has added benefits:

- The explanatory variables are now orthogonal to the intercept term, meaning that in the standardized covariate space, $\hat\beta_0 = \bar y$ regardless of what goes on in the rest of the model
- In other words, if we center $y$ by subtracting off its mean, we don't even need to estimate $\beta_0$
- Standardization also simplifies the solutions; recall from BST 760 that for simple linear regression,

$$\hat\alpha = \bar y - \hat\beta \bar x, \qquad \hat\beta = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sum_i (x_i - \bar x)^2}$$

- If we center and scale $x$ and center $y$, however, then we get the much simpler expression $\hat\beta = x^T y / n$ (verified numerically in the sketch below)
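
A small numerical check of this simplification (simulated data, purely illustrative), including the back-transformation to the original scale described on the previous slide:

```python
# Sketch: with x centered and scaled and y centered, the simple-regression
# slope reduces to x^T y / n; dividing by the original scale factor recovers
# the slope on the original scale.
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(loc=10.0, scale=3.0, size=n)
y = 2.0 + 0.7 * x + rng.normal(size=n)

# Textbook least-squares slope on the original scale
beta_ols = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

# Standardize x (mean 0, sd 1 with divisor n) and center y
scale = x.std()
xs = (x - x.mean()) / scale
yc = y - y.mean()

beta_std = xs @ yc / n          # the simplified expression x^T y / n
print(beta_std / scale)         # back on the original scale: matches beta_ols
print(beta_ols)
```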

## 19. Standardization summary

To summarize, centering the response and centering and scaling each explanatory variable to have mean 0 and standard deviation 1 has the following benefits:

- Estimates of $\beta$ are location-scale invariant
- Computational savings (we only need to estimate $p$ parameters, not $p + 1$)
- Simplicity
- No loss of generality, as we can transform back to the original scale once we finish fitting the model
