Section 6: Model Selection, Logistic Regression and more...


1 Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho, The University of Texas McCombs School of Business
2 Model Building Process. When building a regression model, remember that simplicity is your friend... smaller models are easier to interpret and have fewer unknown parameters to be estimated. Keep in mind that every additional parameter represents a cost!! The first step of every model building exercise is the selection of the universe of variables to be potentially used. This task is solved entirely through your experience and context-specific knowledge... Think carefully about the problem. Consult subject matter research and experts. Avoid the mistake of selecting too many variables.
3 Model Building Process. With a universe of variables in hand, the goal now is to select the model. Why not include all the variables? Big models tend to overfit and find features that are specific to the data in hand... i.e., not generalizable relationships. The results are bad predictions and bad science! In addition, bigger models have more parameters and potentially more uncertainty about everything we are trying to learn... (check the beer and weight example!) We need a strategy to build a model in a way that accounts for the trade-off between fitting the data and the uncertainty associated with the model.
4 Out-of-Sample Prediction. One idea is to focus on the model's ability to predict... How do we evaluate a forecasting model? Make predictions! Basic idea: we want to use the model to forecast outcomes for observations we have not seen before. Use the data to create a prediction problem. See how our candidate models perform. We'll use most of the data for training the model, and the leftover part for validating the model.
5 Out-of-Sample Prediction. In a cross-validation scheme, you fit a bunch of models to most of the data (the training sample) and choose the model that performed best on the rest (the left-out sample). Fit the model on the training data. Use the model to predict Ŷ_j values for all of the N_LO left-out data points. Calculate the mean squared error for these predictions:
MSE = (1/N_LO) Σ_{j=1}^{N_LO} (Y_j − Ŷ_j)²
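The three steps above can be sketched in a few lines. The course uses R, but here is an illustrative Python version; the simulated line and the 80/20 split are made up for the example:

```python
import random

random.seed(1)

# Simulate a simple linear relationship: Y = 2 + 3*X + noise
data = [(x, 2 + 3 * x + random.gauss(0, 1)) for x in [i / 10 for i in range(100)]]
random.shuffle(data)
train, leftout = data[:80], data[80:]  # training sample and left-out sample

def fit_ols(pairs):
    # Least-squares fit of Y = b0 + b1*X
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Step 1: fit on the training data
b0, b1 = fit_ols(train)

# Steps 2-3: predict the N_LO left-out points and compute
# MSE = (1/N_LO) * sum of (Y_j - Yhat_j)^2
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in leftout) / len(leftout)
```

Comparing this left-out MSE across candidate models is exactly the selection rule on the slide: pick the model with the smallest validation error, not the best in-sample fit.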
6 Example. To illustrate the potential problems of overfitting the data, let's look again at the Telemarketing example... let's look at multiple polynomial terms... [Figure: calls vs. months, with polynomial fits]
7 Example. Let's evaluate the fit of each model by their R² (on the training data). [Figure: R² vs. polynomial order]
8 Example. How about the MSE?? (on the left-out data) [Figure: RMSE vs. polynomial order]
9 BIC for Model Selection. Another way to evaluate a model is to use Information Criteria metrics, which attempt to quantify how well our model would have predicted the data (regardless of what you've estimated for the β_j's). A good alternative is the BIC: the Bayes Information Criterion, which is based on a Bayesian philosophy of statistics.
BIC = n log(s²) + p log(n)
You want to choose the model that leads to the minimum BIC.
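As a quick numeric illustration (Python rather than the course's R; the sample size and SSE values are invented), the formula charges each extra parameter a price of log(n), so a slightly better fit does not automatically win:

```python
import math

def bic(sse, n, p):
    """BIC = n*log(s^2) + p*log(n), taking s^2 as the residual variance SSE/n."""
    return n * math.log(sse / n) + p * math.log(n)

# Two hypothetical models on the same n = 100 points:
# a small one (p = 2) and a bigger one (p = 6) that fits only slightly better.
bic_small = bic(sse=250.0, n=100, p=2)
bic_big = bic(sse=240.0, n=100, p=6)

# The small model wins: the modest drop in SSE does not pay for 4 extra parameters.
best = "small" if bic_small < bic_big else "big"
```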
10 BIC for Model Selection. One (very!) nice thing about the BIC is that you can interpret it in terms of model probabilities. Given a list of possible models {M_1, M_2, ..., M_R}, the probability that model i is correct is
P(M_i) ≈ exp(−½ BIC(M_i)) / Σ_{r=1}^{R} exp(−½ BIC(M_r)) = exp(−½ [BIC(M_i) − BIC_min]) / Σ_{r=1}^{R} exp(−½ [BIC(M_r) − BIC_min])
(Subtract BIC_min = min{BIC(M_1), ..., BIC(M_R)} for numerical stability.)
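The BIC-to-probability conversion, including the BIC_min subtraction, is only a few lines (illustrative Python; the three BIC values below are made up):

```python
import math

def model_probs(bics):
    """Convert a list of BIC values into approximate posterior model
    probabilities, subtracting the minimum BIC for numerical stability."""
    bic_min = min(bics)
    weights = [math.exp(-0.5 * (b - bic_min)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical BICs for three candidate models; a BIC gap of about 6
# already translates into roughly 95% vs. 5% vs. essentially zero.
probs = model_probs([100.8, 106.7, 130.2])
```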
11 BIC for Model Selection. Thus BIC is an alternative to testing for comparing models. It is easy to calculate. You are able to evaluate model probabilities. There are no multiple-testing-type worries. It generally leads to simpler models than F-tests. As with testing, you need to narrow down your options before comparing models. What if there are too many possibilities?
12 Stepwise Regression. One computational approach to building a regression model step by step is stepwise regression. There are 3 options: Forward: adds one variable at a time until no remaining variable makes a significant contribution (or meets a certain criterion... could be out-of-sample prediction). Backward: starts with all possible variables and removes one at a time until further deletions would do more harm than good. Stepwise: just like the forward procedure, but allows for deletions at each step.
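The forward option can be sketched as a greedy loop. In this illustrative Python sketch (not R's step()), the per-subset SSE values are made-up numbers and BIC plays the role of the stopping criterion:

```python
import math

# Hypothetical in-sample SSE for each candidate subset (invented numbers):
# adding x1 helps a lot, x2 a little, x3 essentially not at all.
sse = {
    (): 500.0,
    ("x1",): 210.0, ("x2",): 420.0, ("x3",): 495.0,
    ("x1", "x2"): 190.0, ("x1", "x3"): 209.0,
    ("x1", "x2", "x3"): 189.0,
}
n = 100

def bic(subset):
    p = len(subset) + 1  # +1 for the intercept
    return n * math.log(sse[subset] / n) + p * math.log(n)

def forward_step(current, candidates):
    """One forward step: try adding each remaining variable and keep the
    addition only if it lowers BIC; return the (possibly unchanged) model."""
    best, best_bic = current, bic(current)
    for var in candidates:
        trial = tuple(sorted(current + (var,)))
        if bic(trial) < best_bic:
            best, best_bic = trial, bic(trial)
    return best

model = ()
all_vars = ["x1", "x2", "x3"]
while True:
    nxt = forward_step(model, [v for v in all_vars if v not in model])
    if nxt == model:  # no remaining variable improves BIC: stop
        break
    model = nxt
```

The loop adds x1, then x2, and then stops: x3's tiny SSE reduction cannot pay the log(n) penalty for another parameter.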
13 LASSO. The LASSO is a shrinkage method that performs automatic selection. Yet another alternative... it has similar properties to stepwise regression but is more automatic... R does it for you! The LASSO solves the following problem:
arg min_β { Σ_{i=1}^{N} (Y_i − X_i β)² + λ Σ_j |β_j| }
Coefficients can be set exactly to zero (automatic model selection). Very efficient computational method. λ is often chosen via CV.
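To see why coefficients land exactly at zero: with a single standardized predictor (x'x = 1) and the objective scaled as ½Σ(Y_i − x_i β)² + λ|β|, the LASSO solution is a closed-form "soft-thresholding" of the OLS coefficient. An illustrative Python sketch (the coefficient values are made up):

```python
def soft_threshold(b_ols, lam):
    """Univariate LASSO solution under the half-scaled objective with a
    standardized predictor: shrink the OLS coefficient toward zero by lam,
    and set it exactly to 0 whenever |b_ols| <= lam."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0

# Small coefficients are zeroed out (automatic selection);
# large ones survive but are shrunk toward zero.
coefs = [soft_threshold(b, lam=0.5) for b in [2.0, 0.3, -1.2, -0.4]]
```

This is the one-coefficient building block; for many predictors, efficient solvers apply it coordinate by coordinate.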
14 One informal but very useful idea to put it all together... I like to build models from the bottom up... Set aside a set of points to be your validation set (if the dataset is large enough). Working on the training data, add one variable at a time, deciding which one to add based on some criterion: 1. largest increase in R² while significant; 2. largest reduction in MSE while significant; 3. BIC, etc... At every step, carefully analyze the output and check the residuals! Stop when no additional variable produces a significant improvement. Always make sure you understand what the model is doing in the specific context of your problem.
15 Binary Response Data. Let's now look at data where the response Y is a binary variable (taking the value 0 or 1). Win or lose. Sick or healthy. Buy or not buy. Pay or default. Thumbs up or down. The goal is generally to predict the probability that Y = 1, and you can then do classification based on this estimate.
16 Binary Response Data. Y is an indicator: Y = 0 or 1. The conditional mean is thus
E[Y | X] = p(Y = 1 | X) · 1 + p(Y = 0 | X) · 0 = p(Y = 1 | X)
The mean function is a probability: we need a model that gives mean/probability values between 0 and 1. We'll use a transform function that takes the right-hand side of the model (xβ) and gives back a value between zero and one.
17 Binary Response Data. The binary choice model is
p(Y = 1 | X_1, ..., X_d) = S(β_0 + β_1 X_1 + ... + β_d X_d)
where S is a function that increases in value from zero to one.
18 Binary Response Data. There are two main functions that are used for this. Logistic regression: S(z) = e^z / (1 + e^z). Probit regression: S(z) = pnorm(z), the standard normal CDF. Both functions are S-shaped and take values in (0, 1). Probit is used by economists, logit by biologists, and the rest of us are fairly indifferent: they result in practically the same fit.
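The two link functions can be compared directly. In this illustrative Python sketch, the normal CDF that R calls pnorm is built from the standard library's error function:

```python
import math

def logit_S(z):
    """Logistic link: S(z) = e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

def probit_S(z):
    """Probit link: the standard normal CDF (R's pnorm), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Both links are S-shaped, map any z into (0, 1), and equal 0.5 at z = 0;
# their fitted probabilities are typically very close.
vals = {z: (logit_S(z), probit_S(z)) for z in (-2, 0, 2)}
```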
19 Logistic Regression. We'll use logistic regression, such that
p(Y = 1 | X_1, ..., X_d) = exp[β_0 + β_1 X_1 + ... + β_d X_d] / (1 + exp[β_0 + β_1 X_1 + ... + β_d X_d])
The logit link is more common, and it's the default in R. These models are easy to fit in R: glm(Y ~ X1 + X2, family=binomial). "g" stands for generalized, and binomial indicates Y = 0 or 1. Otherwise, generalized linear models use the same syntax as lm().
20 Logistic Regression. What is happening here? Instead of least squares, glm is maximizing the product of probabilities:
∏_{i=1}^{n} P(Y_i | x_i) = ∏_{i=1}^{n} ( exp[x_i b] / (1 + exp[x_i b]) )^{Y_i} ( 1 / (1 + exp[x_i b]) )^{1−Y_i}
This maximizes the likelihood of our data (which is also what least squares did).
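In practice one maximizes the log of this product. glm does so with a proper numerical optimizer; as a stand-in, here is a crude grid search over a single no-intercept coefficient b, on made-up data (illustrative Python, not the course's R):

```python
import math

def loglik(b, data):
    """Log of the product of probabilities above: the sum of
    Y_i*log(p_i) + (1 - Y_i)*log(1 - p_i), with p_i = e^{x_i b}/(1 + e^{x_i b})."""
    total = 0.0
    for x, y in data:
        p = math.exp(x * b) / (1 + math.exp(x * b))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Toy data with one covariate and no intercept (hypothetical numbers):
data = [(-2, 0), (-1, 0), (-0.5, 0), (0.5, 1), (1, 1), (2, 1), (0.3, 0), (-0.3, 1)]

# Crude maximization by grid search over b in [-3, 3]:
b_hat = max([i / 100 for i in range(-300, 301)], key=lambda b: loglik(b, data))
```

Because positive x mostly goes with Y = 1 here, the maximizer b_hat comes out positive, just as a glm fit would.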
21 Logistic Regression. The important things are basically the same as before. Individual parameter p-values are interpreted as always. extractAIC(reg, k=log(n)) will get your BICs. The predict function works as before, but you need to add type="response" to get p̂_i = exp[x b]/(1 + exp[x b]) (otherwise it just returns the linear function xβ). Unfortunately, techniques for residual diagnostics and model checking are different (but we'll not worry about that today). Also, without sums of squares there are no R², anova, or F-tests!
22 Example: Basketball Spreads. NBA basketball point spreads: we have Las Vegas betting point spreads for 553 NBA games and the resulting scores. We can use logistic regression of scores onto spread to predict the probability of the favored team winning. Response: favwin=1 if the favored team wins. Covariate: spread is the Vegas point spread. [Figure: histograms of spread for favwin=1 and favwin=0]
23 Example: Basketball Spreads. This is a weird situation where we assume there is no intercept. There is considerable evidence that betting odds are efficient. A spread of zero implies p(win) = 0.5 for each team. Thus p(win) = exp[β_0]/(1 + exp[β_0]) = 1/2 implies β_0 = 0. The model we want to fit is thus
p(favwin | spread) = exp[β spread] / (1 + exp[β spread])
24 Example: Basketball Spreads. summary(nbareg <- glm(favwin ~ spread - 1, family=binomial)). Some things are different (z, not t) and some are missing (F, R²).
25 Example: Basketball Spreads. The fitted model is
p(favwin | spread) = exp[0.156 spread] / (1 + exp[0.156 spread])
[Figure: fitted P(favwin) vs. spread]
26 Example: Basketball Spreads. We could consider other models... and compare with BIC! Our efficient-Vegas model: > extractAIC(nbareg, k=log(553)). A model that includes a nonzero intercept: > extractAIC(glm(favwin ~ spread, family=binomial), k=log(553)). What if we throw in home-court advantage? > extractAIC(glm(favwin ~ spread + favhome, family=binomial), k=log(553)). The simplest model is best. (The model probabilities are 19/20, 1/20, and zero.)
27 Example: Basketball Spreads. Let's use our model to predict the result of a game. Portland vs. Golden State: the spread is PRT by 8.
p(PRT win) = exp[0.156 × 8] / (1 + exp[0.156 × 8]) = 0.78
Chicago vs. Orlando: the spread is ORL by 4, so Chicago is the underdog.
p(CHI win) = 1 − exp[0.156 × 4] / (1 + exp[0.156 × 4]) ≈ 0.35
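These predictions are just the fitted no-intercept logit evaluated at the spread; a quick Python check of the slide's arithmetic:

```python
import math

def p_favwin(spread, b=0.156):
    """Fitted no-intercept model: p(favwin | spread) = e^{b*s} / (1 + e^{b*s})."""
    z = b * spread
    return math.exp(z) / (1 + math.exp(z))

# Portland favored by 8: probability the favorite (PRT) wins.
p_prt = p_favwin(8)       # about 0.78

# Orlando favored by 4: Chicago is the underdog, so p(CHI win) = 1 - p(favwin).
p_chi = 1 - p_favwin(4)   # about 0.35
```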
28 Example: Credit Scoring. A common business application of logistic regression is in evaluating the credit quality of (potential) debtors. Take a list of borrower characteristics. Build a prediction rule for their credit. Use this rule to automatically evaluate applicants (and track your risk profile). You can do all this with logistic regression, and then use the predicted probabilities to build a classification rule.
29 Example: Credit Scoring. We have data on 1000 loan applicants at German community banks, and judgments of the loan outcomes (good or bad). The data has 20 borrower characteristics, including: credit history (5 categories); housing (rent, own, or free); the loan purpose and duration; installment rate as a percent of income.
30 Example: Credit Scoring. We can use forward stepwise regression to build a model.
null <- glm(Y ~ history3, family=binomial, data=credit[train,])
full <- glm(Y ~ ., family=binomial, data=credit[train,])
reg <- step(null, scope=formula(full), direction="forward", k=log(n))
...
Step: AIC=
Y[train] ~ history3 + checkingstatus1 + duration2 + installment8
The null model has credit history as a variable, since I'd include this regardless, and we've left out 200 points for validation.
31 Classification. A common goal with logistic regression is to classify the inputs depending on their predicted response probabilities. For example, we might want to classify the German borrowers as having good or bad credit (i.e., do we loan to them?). A simple classification rule is to say that anyone with p(good | x) > 0.5 can get a loan, and the rest do not.
32 Example: Credit Scoring. Let's use the validation set to compare this and the full model.
> full <- glm(formula(terms(Y[train] ~ ., data=covars)), data=covars[train,], family=binomial)
> predreg <- predict(reg, newdata=covars[-train,], type="response")
> predfull <- predict(full, newdata=covars[-train,], type="response")
> # -1 = false negative, 1 = false positive
> errorreg <- Y[-train] - (predreg >= .5)
> errorfull <- Y[-train] - (predfull >= .5)
> # misclassification rates:
> mean(abs(errorreg))
> mean(abs(errorfull))
Our model classifies borrowers correctly 78% of the time.
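The misclassification-rate calculation at the bottom can be sketched generically (illustrative Python; the outcomes and predicted probabilities below are made up):

```python
def misclass_rate(y_true, p_hat, cutoff=0.5):
    """Classify p_hat >= cutoff as 1, then report the share of
    validation-set points where the classification disagrees with the truth."""
    errors = [(p >= cutoff) != bool(y) for y, p in zip(y_true, p_hat)]
    return sum(errors) / len(errors)

# Hypothetical validation-set outcomes and predicted probabilities:
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.4, 0.7, 0.6]
rate = misclass_rate(y, p)  # 2 of 5 misclassified -> 0.4
```

Comparing this rate between a stepwise-selected model and the full model on the same held-out points is exactly the validation exercise on the slide.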
More informationWeight of Evidence Module
Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,
More informationBeating the NFL Football Point Spread
Beating the NFL Football Point Spread Kevin Gimpel kgimpel@cs.cmu.edu 1 Introduction Sports betting features a unique market structure that, while rather different from financial markets, still boasts
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationModule 4  Multiple Logistic Regression
Module 4  Multiple Logistic Regression Objectives Understand the principles and theory underlying logistic regression Understand proportions, probabilities, odds, odds ratios, logits and exponents Be
More informationPredictive Modeling Techniques in Insurance
Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics
More informationMultiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.
More informationRégression logistique : introduction
Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence
More informationRegularized Logistic Regression for Mind Reading with Parallel Validation
Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, JukkaPekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN Linear Algebra Slide 1 of
More informationUsing Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015
Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised March 28, 2015 NOTE: The routines spost13, lrdrop1, and extremes are
More informationNew Work Item for ISO 35345 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 35345 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationANOVA. February 12, 2015
ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationPenalized Logistic Regression and Classification of Microarray Data
Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAGLMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification
More informationPredicting Health Care Costs by Twopart Model with Sparse Regularization
Predicting Health Care Costs by Twopart Model with Sparse Regularization Atsuyuki Kogure Keio University, Japan July, 2015 Abstract We consider the problem of predicting health care costs using the twopart
More informationMachine Learning Methods for Demand Estimation
Machine Learning Methods for Demand Estimation By Patrick Bajari, Denis Nekipelov, Stephen P. Ryan, and Miaoyu Yang Over the past decade, there has been a high level of interest in modeling consumer behavior
More information5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits
Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. HosmerLemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationLatent Class Regression Part II
This work is licensed under a Creative Commons AttributionNonCommercialShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information