Multiple Choice Models II


1 Multiple Choice Models II
Laura Magazzini, University of Verona

2 Categorical data: categorical variable models
$Y$ is the result of a single decision among more than two alternatives.
- Unordered choice set (categories, qualitative choices): multinomial logit, conditional logit, nested logit
- Ordered choice set (rankings): models for ordered data, e.g. ordered probit

3 Example: Education and Occupational Choice
Counts by occupation and education (row percentages in parentheses):

Occupation     Primary/Secondary School   University or more   Total
Menial         23 (74.19%)                8 (25.81%)           31 (100%)
Blue Collar    60 (86.96%)                9 (13.04%)           69 (100%)
Craft          65 (77.38%)                19 (22.62%)          84 (100%)
WhiteCol       27 (65.85%)                14 (34.15%)          41 (100%)
Prof           27 (24.11%)                85 (75.89%)          112 (100%)
Total          202 (59.94%)               135 (40.06%)         337 (100%)

4 Multinomial distribution
- $Y_i$: qualitative random variable with $J$ categories.
- $P_{ij} = \Pr(Y_i = j)$, $j = 1, 2, \ldots, J$: probability that individual $i$ chooses alternative $j$.
- Categories are mutually exclusive and exhaustive: $\sum_j P_{ij} = 1$, $i = 1, 2, \ldots, N$.
- Let $d_i = (d_{i1}, d_{i2}, \ldots, d_{iJ})$, where $d_{ij} = 1$ if $Y_i = j$ and 0 otherwise, so that $\sum_j d_{ij} = 1$, $i = 1, 2, \ldots, N$.

5 Multinomial logit model (MNL)
- $Y$: result of a choice among $J$ alternatives ($J > 2$).
- $d_i = (d_{i1}, d_{i2}, \ldots, d_{iJ})$, where $d_{ij} = 1$ if $Y_i = j$.
- $P_{ij} = \Pr(Y_i = j)$, with $\sum_j P_{ij} = 1$.
- Logit model:
$$\Pr(Y_i = j) = \frac{\exp(\eta_{ij})}{\sum_{l=1}^{J} \exp(\eta_{il})}$$
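The logit formula above can be evaluated directly. The following is a minimal sketch in plain Python; the function name `mnl_probabilities` and the values of the linear predictors are illustrative assumptions, not part of the slides.

```python
import math

def mnl_probabilities(eta):
    """Return P_j = exp(eta_j) / sum_l exp(eta_l) for one individual."""
    m = max(eta)  # subtracting the maximum avoids overflow and leaves the ratios unchanged
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]

eta_i = [0.0, 0.5, -1.0]        # hypothetical linear predictors, J = 3
p_i = mnl_probabilities(eta_i)
print(p_i, sum(p_i))
```

The probabilities lie strictly between 0 and 1 and sum to one, and the log ratio of any two of them equals the difference of the corresponding linear predictors.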

6 Properties of MNL
- $0 \le P_{ij} \le 1$
- $\sum_j P_{ij} = 1$ (by definition)
- For every pair of alternatives $(k, l)$, the probability ratio is
$$\frac{P_{ik}}{P_{il}} = \frac{\exp(\eta_{ik})}{\exp(\eta_{il})}, \qquad \log\frac{P_{ik}}{P_{il}} = \eta_{ik} - \eta_{il}$$
- The model can be motivated by a random utility model.

7 Random Utility Models (1)
McFadden (1973, 2001)
- $J$ alternatives: mutually exclusive, exhaustive, finite set. Examples: competing brands, different means of transport, different occupations, ...
- Categories can be ordered or unordered; different techniques are employed according to the nature of the alternatives. Here we assume non-ordered alternatives.
- A rational agent chooses the alternative that maximizes his/her utility: $Y_i = j$ if $U_{ij} > U_{ik}$ for each $k \neq j$.

8 Random Utility Models (2)
McFadden (1973, 2001)
- Linear utility model: $U_{ij} = \eta_{ij} + \epsilon_{ij}$ with $\eta_{ij} = LC(z_{ij}, \theta)$, a linear combination of the observed factors $z_{ij}$ with parameters $\theta$.
- $\eta_{ij}$ links the agent's utility to factors that can be observed.
- $\eta_{ij}$ differs from $U_{ij}$ because some factors cannot be observed by the researcher.
$$\begin{aligned}
\Pr(Y_i = j) &= \Pr(U_{ij} > U_{ik},\ \forall k \neq j) \\
&= \Pr(\eta_{ij} + \epsilon_{ij} > \eta_{ik} + \epsilon_{ik},\ \forall k \neq j) \\
&= \Pr(\epsilon_{ik} - \epsilon_{ij} < \eta_{ij} - \eta_{ik},\ \forall k \neq j) \\
&= \int_{\epsilon} I(\epsilon_{ik} - \epsilon_{ij} < \eta_{ij} - \eta_{ik},\ \forall k \neq j)\, f(\epsilon)\, d\epsilon
\end{aligned}$$
with $f$ the probability density function of $\epsilon$.
- The model is made operational by a particular choice of distribution for the disturbances.
- Closed functional forms exist only for a few specifications (e.g. logit).
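The link between utility maximization and the logit probabilities can be illustrated by simulation. The sketch below assumes independent standard type I extreme value (Gumbel) disturbances, which is the distribution that yields the logit form; the slides only say that a particular distribution must be chosen. The predictor values, the seed and the number of draws are arbitrary illustrative choices.

```python
import math
import random

def mnl_probs(eta):
    """Closed-form logit probabilities for the linear predictors eta."""
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

def gumbel(rng):
    """Draw a standard type I extreme value error as -ln(E), with E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:              # guard against log(0)
        e = rng.expovariate(1.0)
    return -math.log(e)

rng = random.Random(0)           # fixed seed, for reproducibility only
eta = [0.0, 0.5, -1.0]           # hypothetical observed utility components
n_draws = 20000
counts = [0] * len(eta)
for _ in range(n_draws):
    utilities = [e + gumbel(rng) for e in eta]    # U_j = eta_j + eps_j
    counts[utilities.index(max(utilities))] += 1  # choose the max-utility alternative
shares = [c / n_draws for c in counts]
logit = mnl_probs(eta)
print(shares, logit)
```

The simulated choice shares should be close to the closed-form probabilities, up to simulation noise.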

9 How to specify $\eta_{ij}$?
Standard MNL: $\eta_{ij} = x_i'\beta_j$
- $x_i$: individual characteristics, constant across all the alternatives $j$
Conditional logit model (CLM): $\eta_{ij} = z_{ij}'\gamma$
- $z_{ij}$: characteristics of choice $j$ and individual $i$
- Datasets typically analyzed by economists do not contain mixtures of individual and choice-specific attributes
- CLM is usually applied when the interest is in the effect of choice-specific attributes
- A custom transformation is needed for variables containing individual-specific attributes

10 Standard MNL
$$\Pr(Y_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{\sum_{l=1}^{J} \exp(x_i'\beta_l)}$$
- It is not possible to estimate all of $\beta_1, \ldots, \beta_J$: adding the same constant vector to all the $\beta$s does not change the probabilities.
- The indeterminacy is removed by letting $\beta_1 = 0$, so that $j = 1$ is the reference category:
$$\Pr(Y_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{1 + \sum_{l=2}^{J} \exp(x_i'\beta_l)}$$
- An intercept is included by setting the first element of $x_i$ equal to 1 for every $i$.
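The identification argument can be checked numerically. The sketch below, with made-up coefficient values, subtracts $\beta_1$ from every $\beta_j$ (which imposes $\beta_1 = 0$) and confirms that the probabilities are unchanged. Function and variable names are illustrative.

```python
import math

def mnl_probs(x, betas):
    """Pr(Y = j | x) = exp(x'beta_j) / sum_l exp(x'beta_l)."""
    eta = [sum(xk * bk for xk, bk in zip(x, bj)) for bj in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

x_i = [1.0, 2.0]                                  # intercept and one regressor (hypothetical)
betas = [[0.3, -0.2], [1.0, 0.4], [-0.5, 0.1]]    # unnormalized beta_1, beta_2, beta_3

# Subtract beta_1 from every beta_j, so that the first category has beta = 0.
normalized = [[b - b1 for b, b1 in zip(bj, betas[0])] for bj in betas]

p_raw = mnl_probs(x_i, betas)
p_norm = mnl_probs(x_i, normalized)
print(p_raw, p_norm)
```

Only the differences $\beta_j - \beta_1$ matter for the probabilities, which is why the coefficients of the reference category can be fixed at zero.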

11 Estimation: MLE
The log likelihood can be written as
$$\ln L = \sum_{i=1}^{n} \sum_{j=1}^{J} d_{ij} \ln \Pr(Y_i = j)$$
with $d_{ij} = 1$ if $Y_i = j$, 0 otherwise.
The derivatives have the characteristically simple form:
$$\frac{\partial \ln L}{\partial \beta_j} = \sum_i (d_{ij} - P_{ij})\, x_i = 0$$
As a consequence, if the model is estimated with an intercept, $\sum_i d_{ij} = \sum_i P_{ij}$ for each $j$: the fitted probabilities of category $j$ sum to the number of individuals observed choosing $j$.
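The log likelihood and its derivative can be written out in a few lines. The sketch below uses a made-up toy data set and arbitrary coefficient values (all assumptions for illustration), codes the categories from 0 instead of 1, and compares the analytic score with a finite-difference approximation. It does not perform the maximization itself.

```python
import math

def mnl_probs(x, betas):
    """P_ij for one individual: softmax of the linear predictors x'beta_j."""
    eta = [sum(a * b for a, b in zip(x, bj)) for bj in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

def log_lik(betas, X, y):
    """ln L = sum_i sum_j d_ij ln P_ij, with d_ij = 1 if y_i == j."""
    return sum(math.log(mnl_probs(x, betas)[yi]) for x, yi in zip(X, y))

def score(betas, X, y):
    """d ln L / d beta_j = sum_i (d_ij - P_ij) x_i, one vector per category j."""
    J, K = len(betas), len(X[0])
    g = [[0.0] * K for _ in range(J)]
    for x, yi in zip(X, y):
        p = mnl_probs(x, betas)
        for j in range(J):
            d_ij = 1.0 if yi == j else 0.0
            for k in range(K):
                g[j][k] += (d_ij - p[j]) * x[k]
    return g

# Hypothetical toy data: the first column of X is the intercept; categories are coded 0..2.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]
y = [0, 2, 1, 1]
betas = [[0.0, 0.0], [0.2, -0.1], [-0.3, 0.4]]    # reference category has beta = 0

g = score(betas, X, y)

# Central finite difference for the slope coefficient of category 1.
h = 1e-6
up = [row[:] for row in betas]
up[1][1] += h
dn = [row[:] for row in betas]
dn[1][1] -= h
numeric = (log_lik(up, X, y) - log_lik(dn, X, y)) / (2 * h)
print(g[1][1], numeric)
```

At the maximum likelihood estimate every element of the score is zero; the element belonging to the intercept gives the equality between observed and fitted category counts.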

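12 Interpretation of the parameters
The partial effects for this model are complicated:
$$\frac{\partial P_j}{\partial x_i} = P_j\Big[\beta_j - \sum_{k=1}^{J} P_k \beta_k\Big] = P_j[\beta_j - \bar\beta]$$
The coefficients in this model are difficult to interpret: $\partial P_j / \partial x_k$ need not have the same sign as $\beta_{jk}$.
A simpler interpretation is obtained by considering the log odds:
$$\ln\frac{P_{ij}}{P_{i1}} = x_i'\beta_j, \qquad \ln\frac{P_{ij}}{P_{ik}} = x_i'(\beta_j - \beta_k) \ \text{ if } k \neq 1$$
In the case of a dummy variable (coded as 0 or 1), with $\beta_j$ denoting its coefficient in equation $j$, the log odds ratio is
$$\ln\frac{P_{ij}(x_i = 1)/P_{i1}(x_i = 1)}{P_{ij}(x_i = 0)/P_{i1}(x_i = 0)} = \beta_j, \qquad \ln\frac{P_{ij}(x_i = 1)/P_{ik}(x_i = 1)}{P_{ij}(x_i = 0)/P_{ik}(x_i = 0)} = \beta_j - \beta_k \ \text{ if } k \neq 1$$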
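A small numerical case shows why the sign of a coefficient does not determine the sign of the partial effect. The sketch below assumes a single regressor and made-up coefficients; the names are illustrative.

```python
import math

def mnl_probs(eta):
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

betas = [0.0, 1.0, 2.0]      # hypothetical scalar coefficients; first category is the reference
x = 1.0                      # value of the single regressor

p = mnl_probs([b * x for b in betas])
beta_bar = sum(pk * bk for pk, bk in zip(p, betas))          # probability-weighted average
effects = [pj * (bj - beta_bar) for pj, bj in zip(p, betas)] # dP_j/dx = P_j (beta_j - beta_bar)
print(effects)
```

Here the second category has a positive coefficient but a negative partial effect, because its coefficient lies below the weighted average $\bar\beta$. The effects sum to zero across categories, since the probabilities always sum to one.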

13 Conditional logit model
$$\Pr(Y_i = j \mid z) = \frac{\exp(z_j'\beta)}{\sum_{k=1}^{J} \exp(z_k'\beta)}$$
- The model contains choice-specific attributes.
- The coefficients of individual-specific attributes (that do not vary across categories) are not identified.
- Individual-specific variables can be inserted in the model, but need to be properly transformed.
- Not all coefficients can be separately identified: adding the same constant to the linear predictor of every alternative does not change the probabilities, so the intercept is set to zero.

14 Marginal effects
$$\frac{\partial P_j(z)}{\partial z_k} = \beta_z\,[P_j(z)(I(j = k) - P_k(z))]$$
$$\frac{\partial P_j(z)}{\partial z_j} = \beta_z\,[P_j(z)(1 - P_j(z))], \qquad \frac{\partial P_j(z)}{\partial z_h} = -\beta_z\, P_j(z) P_h(z) \quad (j \neq h)$$
- $P_j$ changes monotonically with respect to $z$.
- The sign of the derivative depends on the sign of $\beta_z$.
- The effect has opposite sign depending on whether $z_j$ or $z_h$ is changed.
- Symmetry: $\partial P_j / \partial z_h = \partial P_h / \partial z_j$.
- $P_j$ does not change if the attribute changes by the same amount for every alternative (the ranking of the $U_{ij}$ is unchanged!).
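These properties can be verified numerically. The sketch below assumes a single choice-specific attribute with made-up values and a made-up coefficient; the function names are illustrative.

```python
import math

def cl_probs(z, beta_z):
    """Conditional logit probabilities with one attribute z_j per alternative."""
    eta = [beta_z * zj for zj in z]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

def cl_marginal_effects(z, beta_z):
    """Matrix of dP_j/dz_h = beta_z * P_j * (1[j == h] - P_h)."""
    p = cl_probs(z, beta_z)
    J = len(z)
    return [[beta_z * p[j] * ((1.0 if j == h else 0.0) - p[h]) for h in range(J)]
            for j in range(J)]

z = [1.0, 2.5, 0.3]          # hypothetical attribute values, e.g. a cost for each alternative
beta_z = -0.8                # hypothetical coefficient

me = cl_marginal_effects(z, beta_z)
p_before = cl_probs(z, beta_z)
p_shifted = cl_probs([zj + 1.0 for zj in z], beta_z)   # same change for every alternative
print(me)
```

Each row of the matrix sums to zero, which is the derivative counterpart of the last property: an equal change in the attribute of every alternative leaves all probabilities unchanged.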

15 Multinomial logit (MNL) vs conditional logit (CNL)
Similar response probabilities, but they differ in some important respects.
- MNL: the conditioning variables do not change across alternatives
  - Characteristics of the alternatives are unimportant or not of interest, or data are not available
  - Example, occupational choice: we do not know how much someone could make in every occupation, but we can collect data on factors affecting individual productivity and tastes, e.g. education, past experience
  - Factors can have different effects on relative probabilities (different $\beta_j$ for different choices)
- CNL: choices are made on the basis of observable attributes of each alternative
  - Common $\beta$
- MNL is a special case of CNL
- Important limitation: the independence from irrelevant alternatives assumption

16 Independence from irrelevant alternatives (logit)
For every pair of alternatives $(k, l)$, the probability ratio (odds) is
$$\omega = \frac{\Pr(Y_i = k \mid x_{ik})}{\Pr(Y_i = l \mid x_{il})} = \frac{\exp(\eta_{ik})}{\exp(\eta_{il})}$$
- $\omega$ depends only on the linear predictors ($\eta$) of the two alternatives considered, not on the whole set of alternatives.
- From the point of view of estimation, it is useful that the odds do not depend on the other choices.
- But it is not a particularly appealing restriction to place on consumer behaviour.

17 IIA: example by McFadden (1984)
- Commuters initially choose between cars and red buses with equal probabilities.
- Suppose a third mode (blue buses) is added and commuters do not care about the colour of the bus (i.e. they will choose between the two buses with equal probability).
- IIA implies that the fraction of commuters taking a car would fall from $\frac{1}{2}$ to $\frac{1}{3}$, a result that is not very realistic.
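The arithmetic behind this example is short enough to spell out. In the sketch below each alternative gets the weight $\exp(\eta_j) = 1$, which reproduces the equal initial probabilities; the function name and the dictionary keys are illustrative.

```python
from fractions import Fraction

def choice_probs(weights):
    """Logit probabilities from the weights exp(eta_j): P_j = w_j / sum_l w_l."""
    total = sum(weights.values())
    return {name: Fraction(w) / total for name, w in weights.items()}

# Equal weights reproduce the initial equal split between car and red bus.
before = choice_probs({"car": 1, "red bus": 1})

# Under IIA the car / red bus odds must stay at 1 when the blue bus is added,
# and the blue bus is as attractive as the red bus, so it gets the same weight.
after = choice_probs({"car": 1, "red bus": 1, "blue bus": 1})

print(before["car"], after["car"])
```

A more plausible outcome would leave the car share at one half and split the bus share between the two colours, which the logit form cannot deliver because it would change the car to red bus odds.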

18 Testing IIA
Hausman and McFadden (1984)
- If a subset of the choice set is truly irrelevant, omitting it from the model altogether will not change the parameter estimates systematically.
- Exclusion of these choices will be inefficient but will not lead to inconsistency.
- But if the remaining odds are not truly independent from these alternatives, then the parameter estimates obtained when these choices are included will be inconsistent.
- Therefore, Hausman's specification test can be applied.

19 Hausman's specification test
- Consider two different estimators $\hat\theta_E$ and $\hat\theta_I$.
- Under $H_0$, $\hat\theta_E$ and $\hat\theta_I$ are both consistent and $\hat\theta_E$ is efficient relative to $\hat\theta_I$.
- Under $H_1$, $\hat\theta_I$ remains consistent while $\hat\theta_E$ is inconsistent.
- Then $H_0$ can be tested by using the Hausman statistic:
$$\begin{aligned}
H &= (\hat\theta_I - \hat\theta_E)'\,[\text{Est.Asy.Var}(\hat\theta_I - \hat\theta_E)]^{-1}\,(\hat\theta_I - \hat\theta_E) \\
&= (\hat\theta_I - \hat\theta_E)'\,[\text{Est.Asy.Var}(\hat\theta_I) - \text{Est.Asy.Var}(\hat\theta_E)]^{-1}\,(\hat\theta_I - \hat\theta_E) \xrightarrow{d} \chi^2_J
\end{aligned}$$
- The appropriate degrees of freedom for the test depend on the context.
- In the case of MNL, $J$ is the number of parameters in the estimating equation of the restricted choice set.
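The statistic is a quadratic form and can be computed without any special library. The sketch below uses made-up estimates and covariance matrices for two parameters (purely hypothetical numbers) and a small hand-written linear solver; all names are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def hausman(theta_i, theta_e, var_i, var_e):
    """H = d' [Var(theta_I) - Var(theta_E)]^{-1} d, with d = theta_I - theta_E."""
    d = [a - b for a, b in zip(theta_i, theta_e)]
    V = [[vi - ve for vi, ve in zip(ri, re)] for ri, re in zip(var_i, var_e)]
    return sum(di * wi for di, wi in zip(d, solve(V, d)))

# Hypothetical numbers: restricted-choice-set (I) and full-choice-set (E) estimates.
theta_I = [0.60, -0.25]
theta_E = [0.50, -0.20]
var_I = [[0.05, 0.01], [0.01, 0.03]]
var_E = [[0.03, 0.005], [0.005, 0.02]]

H = hausman(theta_I, theta_E, var_I, var_E)
print(H)   # compare with a chi-square critical value with 2 degrees of freedom
```

In finite samples the difference of the two estimated covariance matrices need not be positive definite, in which case the statistic can turn out negative; the sketch does not handle that case.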

20 What if the IIA hypothesis is not satisfied? (1)
Multinomial probit model (MNP)
$$U_j = \beta' x_j + \epsilon_j, \quad j = 1, \ldots, J, \qquad [\epsilon_1, \epsilon_2, \ldots, \epsilon_J] \sim N(0, \Sigma)$$
$$\Pr(Y_i = j) = \Pr(U_j > U_k \ \text{ for all } k = 1, 2, \ldots, J,\ k \neq j)$$
- Main obstacle: the difficulty of computing the multivariate normal probability for any dimensionality higher than 2.
- Recent advances in accurate simulation of multinormal integrals have made estimation of the MNP more feasible.
- Simulation-based estimation.

21 What if the IIA hypothesis is not satisfied? (2)
Generalized extreme value: nested logit models
- Very appealing if it is possible to assume sequential choices.
- The $J$ alternatives are grouped into $L$ subgroups:
  (1) First the group of alternatives is chosen
  (2) Then, one alternative is chosen within the group
- IIA is maintained within groups, but does not need to hold across groups.
Main limitations
- Results can depend on the way in which groups are formed.
- There is no specification test to discriminate among different groupings.

22 Treatment of rankings: ordered data
- $Y$ can assume a limited number of categories $y_c$, $c = 0, 1, \ldots, C$.
- Categories are inherently ordered: $y_0 < y_1 < y_2 < \cdots < y_C$.
- Examples:
  - Bond rating: AAA-D
  - Symptoms: none, minor, serious
  - Drug effect: worsen, none, partial recovery, full recovery
  - Customer satisfaction: very unsatisfied, unsatisfied, satisfied, very satisfied
  - ...
- Ordered probit and logit models.
- Multinomial models would fail to account for the ordinal nature of the dependent variable.
- OLS would attach a meaning to the difference between the category codings.

23 Latent regression
- We consider a continuous latent variable $y^*$ (unobserved), a linear function of $x$ and $\epsilon$: $y^* = x'\beta + \epsilon$.
- We observe $y = c \iff \gamma_c < y^* \le \gamma_{c+1}$, with $\gamma_0 = -\infty$ and $\gamma_{C+1} = +\infty$.
- The latent response is specified by a linear regression model without the intercept.

24 Ordered Probit Model
$y^* = x'\beta + \epsilon$ with $\epsilon \sim N(0, 1)$
$$\begin{aligned}
\Pr(y_i = 0 \mid x) &= \Pr(y_i^* \le \gamma_1) = \Pr(\epsilon_i \le \gamma_1 - x'\beta \mid x) = \Phi(\gamma_1 - x'\beta) \\
\Pr(y_i = 1 \mid x) &= \Pr(\gamma_1 < y_i^* \le \gamma_2) = \Phi(\gamma_2 - x'\beta) - \Phi(\gamma_1 - x'\beta) \\
&\ \ \vdots \\
\Pr(y_i = C \mid x) &= \Pr(y_i^* > \gamma_C) = 1 - \Phi(\gamma_C - x'\beta)
\end{aligned}$$
- Usually $y^*$ has no real meaning.
- The interest is in $\Pr(y \mid x)$ rather than $E(y^* \mid x)$.
- To identify the parameters, $x$ cannot contain the intercept.
- If you have to specify a model with an intercept, set $\gamma_1 = 0$.
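The category probabilities follow mechanically from the thresholds. The sketch below computes them for one value of the index $x'\beta$, using the error function from the standard library for $\Phi$; the threshold and index values are made up and the function names are illustrative.

```python
import math

def norm_cdf(v):
    """Standard normal CDF, Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def ordered_probit_probs(xb, gammas):
    """Pr(y = c | x) for c = 0..C, given xb = x'beta and thresholds gamma_1..gamma_C."""
    # Phi evaluated at gamma_0 = -inf is 0 and at gamma_{C+1} = +inf is 1.
    cdf = [0.0] + [norm_cdf(g - xb) for g in gammas] + [1.0]
    return [cdf[c + 1] - cdf[c] for c in range(len(gammas) + 1)]

gammas = [-0.5, 0.3, 1.2]      # hypothetical increasing thresholds, so C = 3
probs = ordered_probit_probs(0.4, gammas)
print(probs, sum(probs))
```

With increasing thresholds every probability is positive and the probabilities sum to one by construction.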

25 Marginal effects
Coefficients are difficult to interpret:
- $\dfrac{\partial \Pr(y_i = 0 \mid x)}{\partial x_j} = -\beta_j\, \phi(\gamma_1 - x'\beta)$: sign opposite to the sign of $\beta_j$
- $\dfrac{\partial \Pr(y_i = c \mid x)}{\partial x_j} = -\beta_j\,[\phi(\gamma_{c+1} - x'\beta) - \phi(\gamma_c - x'\beta)]$: ambiguous sign!
- $\dfrac{\partial \Pr(y_i = C \mid x)}{\partial x_j} = \beta_j\, \phi(\gamma_C - x'\beta)$: same sign as $\beta_j$
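The three cases above are instances of one expression once the density is set to zero at the infinite thresholds $\gamma_0$ and $\gamma_{C+1}$. The sketch below evaluates it for made-up thresholds, index and coefficient; names are illustrative.

```python
import math

def norm_pdf(v):
    """Standard normal density, phi(v)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def ordered_probit_effects(xb, gammas, beta_j):
    """dPr(y = c | x)/dx_j = -beta_j * [phi(gamma_{c+1} - xb) - phi(gamma_c - xb)]."""
    # The density vanishes at the infinite thresholds gamma_0 and gamma_{C+1}.
    dens = [0.0] + [norm_pdf(g - xb) for g in gammas] + [0.0]
    return [-beta_j * (dens[c + 1] - dens[c]) for c in range(len(gammas) + 1)]

effects = ordered_probit_effects(xb=0.4, gammas=[-0.5, 0.3, 1.2], beta_j=0.9)
print(effects)
```

With a positive coefficient the effect on the lowest category is negative and the effect on the highest category is positive, while the intermediate effects can go either way. The effects sum to zero across categories.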

26 Changes in $y$ and $y^*$ in response to changes in $x$
Increasing one of the $x$'s (one with a positive coefficient) while holding $\beta$ and $\gamma$ constant is equivalent to shifting the distribution of $y^*$ to the right.
[Figure: density of $y^*$ before (solid curve) and after (dashed curve) the change in $x$]

27 Ordered Logistic Regression
$\epsilon_i \sim$ logistic
Proportional odds model:
$$\Pr(y_i > c) = \frac{\exp(x_i'\beta - \gamma_c)}{1 + \exp(x_i'\beta - \gamma_c)}, \qquad \log\left(\frac{\Pr(y_i > c)}{1 - \Pr(y_i > c)}\right) = x_i'\beta - \gamma_c$$
$$\frac{\Pr(y_i > c)/[1 - \Pr(y_i > c)]}{\Pr(y_j > c)/[1 - \Pr(y_j > c)]} = \exp[(x_i - x_j)'\beta]$$
The odds ratio between two individuals $i$ and $j$ does not depend on the threshold.
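The proportional odds property can be checked by computing the odds ratio at several thresholds. The sketch below assumes a single regressor and made-up values for the coefficient, the two individuals and the thresholds; names are illustrative.

```python
import math

def prob_above(xb, gamma_c):
    """Pr(y > c) = exp(xb - gamma_c) / (1 + exp(xb - gamma_c))."""
    return 1.0 / (1.0 + math.exp(-(xb - gamma_c)))

def odds(p):
    return p / (1.0 - p)

beta = 0.7                      # hypothetical coefficient
x_i, x_j = 2.0, 0.5             # regressor values of two hypothetical individuals
thresholds = [-1.0, 0.0, 1.5]   # hypothetical thresholds gamma_c

ratios = [odds(prob_above(beta * x_i, g)) / odds(prob_above(beta * x_j, g))
          for g in thresholds]
expected = math.exp((x_i - x_j) * beta)
print(ratios, expected)
```

The ratio is the same at every threshold and equals $\exp[(x_i - x_j)\beta]$, which is what allows the coefficients to be read in terms of odds.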

28 Ordered Probit vs. Ordered Logit
- Coefficients and threshold parameters differ because of the different scale of the error term ($\sigma_{probit} = 1$, whereas $\sigma_{logit} = \pi/\sqrt{3}$, i.e. the logistic variance is $\pi^2/3$).
- Predicted probabilities are similar.
- Marginal effects are similar.
- If the logit is chosen, estimated coefficients can be interpreted in terms of odds.


More information

Regression with a Binary Dependent Variable

Regression with a Binary Dependent Variable Regression with a Binary Dependent Variable Chapter 9 Michael Ash CPPA Lecture 22 Course Notes Endgame Take-home final Distributed Friday 19 May Due Tuesday 23 May (Paper or emailed PDF ok; no Word, Excel,

More information

Multinomial Logistic Regression

Multinomial Logistic Regression Multinomial Logistic Regression Dr. Jon Starkweather and Dr. Amanda Kay Moske Multinomial logistic regression is used to predict categorical placement in or the probability of category membership on a

More information

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL

More information

Simultaneous or Sequential? Search Strategies in the U.S. Auto. Insurance Industry

Simultaneous or Sequential? Search Strategies in the U.S. Auto. Insurance Industry Simultaneous or Sequential? Search Strategies in the U.S. Auto Insurance Industry Elisabeth Honka 1 University of Texas at Dallas Pradeep Chintagunta 2 University of Chicago Booth School of Business September

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Gender Effects in the Alaska Juvenile Justice System

Gender Effects in the Alaska Juvenile Justice System Gender Effects in the Alaska Juvenile Justice System Report to the Justice and Statistics Research Association by André Rosay Justice Center University of Alaska Anchorage JC 0306.05 October 2003 Gender

More information

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Lecture 19: Conditional Logistic Regression

Lecture 19: Conditional Logistic Regression Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

More information

Estimating the random coefficients logit model of demand using aggregate data

Estimating the random coefficients logit model of demand using aggregate data Estimating the random coefficients logit model of demand using aggregate data David Vincent Deloitte Economic Consulting London, UK davivincent@deloitte.co.uk September 14, 2012 Introduction Estimation

More information

The Probit Link Function in Generalized Linear Models for Data Mining Applications

The Probit Link Function in Generalized Linear Models for Data Mining Applications Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

More information

Machine Learning Logistic Regression

Machine Learning Logistic Regression Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Choice under Uncertainty

Choice under Uncertainty Choice under Uncertainty Part 1: Expected Utility Function, Attitudes towards Risk, Demand for Insurance Slide 1 Choice under Uncertainty We ll analyze the underlying assumptions of expected utility theory

More information

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

More information

Automated Statistical Modeling for Data Mining David Stephenson 1

Automated Statistical Modeling for Data Mining David Stephenson 1 Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires

More information

Lecture notes: single-agent dynamics 1

Lecture notes: single-agent dynamics 1 Lecture notes: single-agent dynamics 1 Single-agent dynamic optimization models In these lecture notes we consider specification and estimation of dynamic optimization models. Focus on single-agent models.

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Weight of Evidence Module

Weight of Evidence Module Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

More information

Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller

Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller Agenda Introduktion till Prediktiva modeller Beslutsträd Beslutsträd och andra prediktiva modeller Mathias Lanner Sas Institute Pruning Regressioner Neurala Nätverk Utvärdering av modeller 2 Predictive

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 4: Transformations Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture The Ladder of Roots and Powers Changing the shape of distributions Transforming

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

More information

Free Trial - BIRT Analytics - IAAs

Free Trial - BIRT Analytics - IAAs Free Trial - BIRT Analytics - IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Deterministic and Stochastic Modeling of Insulin Sensitivity

Deterministic and Stochastic Modeling of Insulin Sensitivity Deterministic and Stochastic Modeling of Insulin Sensitivity Master s Thesis in Engineering Mathematics and Computational Science ELÍN ÖSP VILHJÁLMSDÓTTIR Department of Mathematical Science Chalmers University

More information

Regression 3: Logistic Regression

Regression 3: Logistic Regression Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing

More information

Institut für Soziologie Eberhard Karls Universität Tübingen www.maartenbuis.nl

Institut für Soziologie Eberhard Karls Universität Tübingen www.maartenbuis.nl from Indirect Extracting from Institut für Soziologie Eberhard Karls Universität Tübingen www.maartenbuis.nl from Indirect What is the effect of x on y? Which effect do I choose: average marginal or marginal

More information

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Marginal Person. Average Person. (Average Return of College Goers) Return, Cost. (Average Return in the Population) (Marginal Return)

Marginal Person. Average Person. (Average Return of College Goers) Return, Cost. (Average Return in the Population) (Marginal Return) 1 2 3 Marginal Person Average Person (Average Return of College Goers) Return, Cost (Average Return in the Population) 4 (Marginal Return) 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

More information

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

More information

SUGI 29 Statistics and Data Analysis

SUGI 29 Statistics and Data Analysis Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

More information

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

Multilevel Modeling of Complex Survey Data

Multilevel Modeling of Complex Survey Data Multilevel Modeling of Complex Survey Data Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London Joint work with Anders Skrondal, London School of Economics

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information