# Multiple Choice Models II

## Transcription

Multiple Choice Models II. Laura Magazzini, University of Verona. (Slide deck, 28 slides.)

### Categorical data: categorical variable models

$Y$ is the result of a single decision among more than two alternatives.

- Unordered choice set (categories, qualitative choices): multinomial logit, conditional logit, nested logit
- Ordered choice set (rankings): models for ordered data, such as the ordered probit

### Example: Education and Occupational Choice

Counts by occupation and education, with row percentages in parentheses.

| Occupation | Primary/Secondary school | University or more | Total |
|---|---|---|---|
| Menial | 23 (74.19%) | 8 (25.81%) | 31 (100%) |
| Blue Collar | 60 (86.96%) | 9 (13.04%) | 69 (100%) |
| Craft | 65 (77.38%) | 19 (22.62%) | 84 (100%) |
| WhiteCol | 27 (65.85%) | 14 (34.15%) | 41 (100%) |
| Prof | 27 (24.11%) | 85 (75.89%) | 112 (100%) |
| Total | 202 (59.94%) | 135 (40.06%) | 337 (100%) |

### Multinomial distribution

- $Y_i$: qualitative random variable with $J$ categories
- $P_{ij} = \Pr(Y_i = j)$, $j = 1, 2, \dots, J$: probability that individual $i$ chooses alternative $j$
- Categories are mutually exclusive and exhaustive: $\sum_j P_{ij} = 1$, $i = 1, 2, \dots, N$
- Let $d_i = (d_{i1}, d_{i2}, \dots, d_{iJ})$, where $d_{ij} = 1$ if $Y_i = j$ and $0$ otherwise, so that $\sum_j d_{ij} = 1$, $i = 1, 2, \dots, N$

### Multinomial logit model (MNL)

- $Y$: result of a choice among $J$ alternatives ($J > 2$)
- $d_i = (d_{i1}, d_{i2}, \dots, d_{iJ})$, where $d_{ij} = 1$ if $Y_i = j$
- $P_{ij} = \Pr(Y_i = j)$, with $\sum_j P_{ij} = 1$

Logit model:

$$\Pr(Y_i = j) = \frac{\exp(\eta_{ij})}{\sum_{l=1}^{J} \exp(\eta_{il})}$$

### Properties of MNL

- $0 \le P_{ij} \le 1$
- $\sum_j P_{ij} = 1$ (by definition)
- For every pair of alternatives $(k, l)$, the probability ratio is

$$\frac{P_{ik}}{P_{il}} = \frac{\exp(\eta_{ik})}{\exp(\eta_{il})} \quad\Longrightarrow\quad \log\frac{P_{ik}}{P_{il}} = \eta_{ik} - \eta_{il}$$

- The model can be motivated by a random utility model
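The logit formula and the log-ratio property above can be checked numerically. The following is a minimal sketch in plain Python; the function name `mnl_probabilities` and the values in `eta_i` are illustrative assumptions, not part of the slides.

```python
import math

def mnl_probabilities(eta):
    """Multinomial logit probabilities for one individual's linear predictors."""
    m = max(eta)  # subtracting the maximum avoids overflow in exp()
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical linear predictors eta_i1, eta_i2, eta_i3 (J = 3 alternatives).
eta_i = [0.0, 0.5, -1.0]
p_i = mnl_probabilities(eta_i)

# Log probability ratio of alternatives 2 and 1 (list indices 1 and 0).
log_ratio = math.log(p_i[1] / p_i[0])

print("probabilities:", p_i)
print("sum:", sum(p_i))
print("log ratio:", log_ratio, "eta difference:", eta_i[1] - eta_i[0])
```

The probabilities lie strictly between 0 and 1, sum to one, and the log ratio equals the difference of the two linear predictors, regardless of the third alternative.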

### Random Utility Models (1)

McFadden (1973, 2001)

- $J$ alternatives: a mutually exclusive, exhaustive, finite set
- Examples: competing brands, different means of transport, different occupations, ...
- Categories can be ordered or unordered; different techniques are employed according to the nature of the alternatives
- Assume non-ordered alternatives
- A rational agent chooses the alternative that maximizes his/her utility: $Y_i = j$ if $U_{ij} > U_{ik}$ for each $k \neq j$

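### Random Utility Models (2)

McFadden (1973, 2001)

- Linear utility model: $U_{ij} = \eta_{ij} + \epsilon_{ij}$ with $\eta_{ij} = \mathrm{LC}(z_{ij}, \theta)$, a linear combination of the observables $z_{ij}$ with parameters $\theta$
- $\eta_{ij}$ links the agent's utility to factors that can be observed
- $\eta_{ij}$ differs from $U_{ij}$ because some factors cannot be observed by the researcher

$$
\begin{aligned}
\Pr(Y_i = j) &= \Pr(U_{ij} > U_{ik},\ \forall k \neq j) \\
&= \Pr(\eta_{ij} + \epsilon_{ij} > \eta_{ik} + \epsilon_{ik},\ \forall k \neq j) \\
&= \Pr(\epsilon_{ik} - \epsilon_{ij} < \eta_{ij} - \eta_{ik},\ \forall k \neq j) \\
&= \int_{\epsilon} I(\epsilon_{ik} - \epsilon_{ij} < \eta_{ij} - \eta_{ik},\ \forall k \neq j)\, f(\epsilon)\, d\epsilon
\end{aligned}
$$

with $f$ the probability density function of $\epsilon$.

- The model is made operational by a particular choice of distribution for the disturbance
- Closed functional forms exist only for a few specifications (e.g. logit)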

### How to specify $\eta_{ij}$?

- Standard MNL: $\eta_{ij} = x_i'\beta_j$, where $x_i$ contains individual characteristics, constant across all the alternatives $j$
- Conditional logit model (CLM): $\eta_{ij} = z_{ij}'\gamma$, where $z_{ij}$ contains characteristics of choice $j$ and individual $i$
  - Datasets typically analyzed by economists do not contain mixtures of individual-specific and choice-specific attributes
  - CLM is usually applied when the interest is in the effect of choice-specific attributes
  - A custom transformation is needed for variables containing individual-specific attributes

### Standard MNL

$$\Pr(Y_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{\sum_{l=1}^{J} \exp(x_i'\beta_l)}$$

- It is not possible to estimate all of $\beta_1, \dots, \beta_J$: adding the same constant vector to every $\beta_j$ leaves the probabilities unchanged
- The indeterminacy is removed by letting $\beta_1 = 0$, so that $j = 1$ is the reference category:

$$\Pr(Y_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{1 + \sum_{l=2}^{J} \exp(x_i'\beta_l)}$$

- An intercept is included by setting the first element of $x_i$ equal to 1 for every $i$
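The identification argument can be illustrated numerically: shifting every $\beta_j$ by the same vector, or normalizing so that $\beta_1 = 0$, leaves the choice probabilities unchanged. This is a minimal sketch; the helper `mnl_probs` and all coefficient values are hypothetical.

```python
import math

def mnl_probs(x, betas):
    """P_ij for one individual: x is a list, betas a list of J coefficient lists."""
    eta = [sum(xk * bk for xk, bk in zip(x, beta)) for beta in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

x_i = [1.0, 2.0]  # hypothetical: intercept plus one regressor
betas = [[0.3, -0.2], [1.0, 0.4], [-0.5, 0.1]]  # hypothetical beta_1, beta_2, beta_3

# Add the same arbitrary vector to every beta_j.
shift = [0.7, -1.3]
shifted = [[b + c for b, c in zip(beta, shift)] for beta in betas]

# Normalize by subtracting beta_1, so the reference category has beta_1 = 0.
normalized = [[b - b1 for b, b1 in zip(beta, betas[0])] for beta in betas]

p = mnl_probs(x_i, betas)
p_shifted = mnl_probs(x_i, shifted)
p_normalized = mnl_probs(x_i, normalized)

print(p)
print(p_shifted)
print(p_normalized)
```

All three parameterizations print the same probabilities (up to floating-point error), which is why only the differences $\beta_j - \beta_1$ can be estimated.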

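### Estimation: MLE

The log likelihood can be written as

$$\ln L = \sum_{i=1}^{n} \sum_{j=1}^{J} d_{ij} \ln \Pr(Y_i = j)$$

with $d_{ij} = 1$ if $Y_i = j$, and $0$ otherwise.

The derivatives have a characteristically simple form:

$$\frac{\partial \ln L}{\partial \beta_j} = \sum_i (d_{ij} - P_{ij})\, x_i = 0$$

As a consequence, if the model is estimated with an intercept (first element of $x_i$ equal to 1), the first-order conditions imply $\sum_i d_{ij} = \sum_i P_{ij}$: for each category, the fitted probabilities sum to the observed number of choices.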
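To make the likelihood and its derivative concrete, the sketch below evaluates $\ln L$ and the score $\sum_i (d_{ij} - P_{ij}) x_i$ on a tiny made-up dataset and compares one element of the score with a finite-difference derivative. The data, the coefficient values, and the 0-based category coding are assumptions for illustration only; this is not an estimation routine.

```python
import math

def mnl_probs(x, betas):
    eta = [sum(xk * bk for xk, bk in zip(x, beta)) for beta in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

def log_likelihood(X, y, betas):
    """ln L = sum_i ln Pr(Y_i = y_i), i.e. sum_i sum_j d_ij ln P_ij."""
    return sum(math.log(mnl_probs(x, betas)[yi]) for x, yi in zip(X, y))

def score(X, y, betas, j):
    """Derivative of ln L with respect to beta_j: sum_i (d_ij - P_ij) x_i."""
    grad = [0.0] * len(X[0])
    for x, yi in zip(X, y):
        p_ij = mnl_probs(x, betas)[j]
        d_ij = 1.0 if yi == j else 0.0
        for k, xk in enumerate(x):
            grad[k] += (d_ij - p_ij) * xk
    return grad

# Hypothetical data: intercept plus one regressor, categories coded 0, 1, 2.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3], [1.0, -0.7]]
y = [0, 1, 2, 1, 0]
betas = [[0.0, 0.0], [0.2, -0.4], [-0.3, 0.6]]  # reference category fixed at zero

analytic = score(X, y, betas, j=1)

# Central finite difference for the slope coefficient of category 1.
h = 1e-6
up = [list(b) for b in betas]
down = [list(b) for b in betas]
up[1][1] += h
down[1][1] -= h
numeric = (log_likelihood(X, y, up) - log_likelihood(X, y, down)) / (2 * h)

print("analytic:", analytic[1], "numeric:", numeric)
```

At the maximum likelihood estimate the score is zero for every free $\beta_j$; here the coefficients are arbitrary, so the score is generally nonzero, but the analytic and numeric values should agree closely.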

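### Interpretation of the parameters

The partial effects for this model are complicated:

$$\frac{\partial P_j}{\partial x_i} = P_j\left[\beta_j - \sum_{k=1}^{J} P_k \beta_k\right] = P_j\,[\beta_j - \bar{\beta}]$$

The coefficients in this model are difficult to interpret: $\partial P_j / \partial x_k$ need not have the same sign as $\beta_{jk}$.

A simpler interpretation is obtained by considering the log odds:

$$\ln\frac{P_{ij}}{P_{i1}} = x_i'\beta_j, \qquad \ln\frac{P_{ij}}{P_{ik}} = x_i'(\beta_j - \beta_k) \ \text{ if } k \neq 1$$

In the case of a dummy variable (coded as 0 or 1), the log odds ratio between $x_i = 1$ and $x_i = 0$ is

$$\ln\frac{P_{ij}(x_i = 1)/P_{i1}(x_i = 1)}{P_{ij}(x_i = 0)/P_{i1}(x_i = 0)} = \beta_j, \qquad \ln\frac{P_{ij}(x_i = 1)/P_{ik}(x_i = 1)}{P_{ij}(x_i = 0)/P_{ik}(x_i = 0)} = \beta_j - \beta_k \ \text{ if } k \neq 1$$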
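A small numerical case shows how a partial effect can have the opposite sign to its coefficient. The sketch assumes a single scalar regressor and hypothetical coefficients; it simply evaluates $P_j[\beta_j - \bar{\beta}]$ with $\bar{\beta} = \sum_k P_k \beta_k$.

```python
import math

def mnl_probs_scalar(x, betas):
    """MNL probabilities with a single scalar regressor x."""
    eta = [b * x for b in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

betas = [0.0, 1.0, 3.0]  # hypothetical coefficients, reference category first
x = 0.0                  # at x = 0 all three probabilities equal 1/3

p = mnl_probs_scalar(x, betas)
beta_bar = sum(pj * bj for pj, bj in zip(p, betas))  # probability-weighted mean
effects = [pj * (bj - beta_bar) for pj, bj in zip(p, betas)]

for j, (bj, ej) in enumerate(zip(betas, effects)):
    print(f"alternative {j}: beta = {bj:+.1f}, dP/dx = {ej:+.4f}")
```

The second alternative has a positive coefficient but a negative partial effect, because its coefficient is below the weighted average $\bar{\beta}$. The effects also sum to zero across alternatives, as they must when probabilities sum to one.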

### Conditional logit model

$$\Pr(Y_i = j \mid z) = \frac{\exp(z_j'\beta)}{\sum_{k=1}^{J} \exp(z_k'\beta)}$$

- The model contains choice-specific attributes
- The coefficients of individual-specific attributes (that do not vary across categories) are not identified
- Individual-specific variables can be inserted in the model, but need to be properly transformed
- A common intercept cannot be identified: adding the same constant to the linear predictor of every alternative does not change the estimated probabilities, so the intercept is set to zero

### Marginal effects (conditional logit)

$$\frac{\partial P_j(z)}{\partial z_k} = \beta_z\,\big[P_j(z)\,(I(j = k) - P_k(z))\big]$$

$$\frac{\partial P_j(z)}{\partial z_j} = \beta_z\,\big[P_j(z)\,(1 - P_j(z))\big], \qquad \frac{\partial P_j(z)}{\partial z_h} = -\beta_z\, P_j(z)\, P_h(z) \quad (j \neq h)$$

- $P_j$ changes monotonically with respect to $z$
- The sign of the derivative depends on the sign of $\beta_z$
- The effect is of opposite sign depending on whether $z_j$ or $z_h$ changes
- Symmetry: $\partial P_j / \partial z_h = \partial P_h / \partial z_j$
- $P_j$ does not change if the attribute changes by the same amount for all alternatives (the ranking of $U_{ij}$ is unchanged!)
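The own and cross marginal effects can be verified against a finite-difference derivative. The sketch below assumes one choice-specific attribute with a hypothetical coefficient `beta_z` and hypothetical attribute values `z` (for instance a cost); none of these numbers come from the slides.

```python
import math

def cl_probs(z, beta):
    """Conditional logit probabilities with one attribute z_j per alternative."""
    eta = [beta * zj for zj in z]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

def marginal_effect(z, beta, j, h):
    """dP_j/dz_h = beta * P_j * (1{j = h} - P_h)."""
    p = cl_probs(z, beta)
    return beta * p[j] * ((1.0 if j == h else 0.0) - p[h])

z = [1.0, 2.0, 0.5]  # hypothetical attribute of each alternative
beta_z = -0.8        # hypothetical coefficient

own = marginal_effect(z, beta_z, 0, 0)
cross = marginal_effect(z, beta_z, 0, 1)

# Finite-difference check of the cross effect dP_0/dz_1.
step = 1e-6
z_up, z_down = list(z), list(z)
z_up[1] += step
z_down[1] -= step
numeric_cross = (cl_probs(z_up, beta_z)[0] - cl_probs(z_down, beta_z)[0]) / (2 * step)

# Changing the attribute by the same amount for every alternative.
p_before = cl_probs(z, beta_z)
p_after = cl_probs([zj + 3.0 for zj in z], beta_z)

print("own:", own, "cross:", cross, "numeric cross:", numeric_cross)
print(p_before, p_after)
```

With a negative coefficient the own effect is negative and the cross effect positive, and a common shift in the attribute leaves all probabilities unchanged.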

### Multinomial logit (MNL) vs conditional logit (CNL)

The two models have similar response probabilities, but they differ in some important respects.

- MNL: the conditioning variables do not change across alternatives
  - Characteristics of the alternatives are unimportant or not of interest, or data are not available
  - Example: occupational choice. We do not know how much someone could make in every occupation, but we can collect data on factors affecting individual productivity and tastes, e.g. education and past experience
  - Factors can have different effects on relative probabilities (a different $\beta_j$ for each choice)
- CNL: choices are made on the basis of observable attributes of each alternative
  - Common $\beta$
- MNL can be obtained as a special case of CNL
- Important limitation: the independence from irrelevant alternatives assumption

### Independence from irrelevant alternatives (logit)

For every pair of alternatives $(k, l)$, the probability ratio (odds) is

$$\omega = \frac{\Pr(Y_i = k \mid x_{ik})}{\Pr(Y_i = l \mid x_{il})} = \frac{\exp(\eta_{ik})}{\exp(\eta_{il})}$$

- $\omega$ depends only on the linear predictors ($\eta$) of the two alternatives considered, not on the whole set of alternatives
- From the point of view of estimation, it is useful that the odds do not depend on the other choices
- But it is not a particularly appealing restriction to place on consumer behaviour

### IIA: example by McFadden (1984)

- Commuters initially choose between cars and red buses with equal probabilities
- Suppose a third mode (blue buses) is added and commuters do not care about the colour of the bus (i.e. they will choose between the two buses with equal probability)
- IIA implies that the fraction of commuters taking a car would fall from $\frac{1}{2}$ to $\frac{1}{3}$, a result that is not very realistic
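The red bus / blue bus arithmetic follows directly from the logit formula. In this minimal sketch all alternatives are given the same linear predictor (set to zero, an arbitrary choice), which reproduces the equal-probability setup of the example.

```python
import math

def mnl_probabilities(eta):
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

# Equal linear predictors: commuters are indifferent between the modes.
two_modes = mnl_probabilities([0.0, 0.0])         # car, red bus
three_modes = mnl_probabilities([0.0, 0.0, 0.0])  # car, red bus, blue bus

print("car share with two modes:  ", two_modes[0])
print("car share with three modes:", three_modes[0])
print("car/red-bus odds:", two_modes[0] / two_modes[1], three_modes[0] / three_modes[1])
```

The car to red bus odds stay at one after the blue bus is added, so the logit model forces the car share down to one third. Since the two buses are arguably the same alternative, one would rather expect the car share to stay near one half, with the bus share split between the two colours.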

### Testing IIA

Hausman and McFadden (1984)

- If a subset of the choice set is truly irrelevant, omitting it from the model altogether will not change the parameter estimates systematically
- Exclusion of these choices will be inefficient but will not lead to inconsistency
- But if the remaining odds are not truly independent of these alternatives, then the parameter estimates obtained when these choices are included will be inconsistent
- Therefore, Hausman's specification test can be applied

### Hausman's specification test

- Consider two different estimators $\hat\theta_E$ and $\hat\theta_I$
- Under $H_0$, $\hat\theta_E$ and $\hat\theta_I$ are both consistent and $\hat\theta_E$ is efficient relative to $\hat\theta_I$
- Under $H_1$, $\hat\theta_I$ remains consistent while $\hat\theta_E$ is inconsistent

Then $H_0$ can be tested by using the Hausman statistic:

$$
\begin{aligned}
H &= (\hat\theta_I - \hat\theta_E)'\,\big[\text{Est.Asy.Var}(\hat\theta_I - \hat\theta_E)\big]^{-1}(\hat\theta_I - \hat\theta_E) \\
&= (\hat\theta_I - \hat\theta_E)'\,\big[\text{Est.Asy.Var}(\hat\theta_I) - \text{Est.Asy.Var}(\hat\theta_E)\big]^{-1}(\hat\theta_I - \hat\theta_E) \ \xrightarrow{d}\ \chi^2_J
\end{aligned}
$$

- The appropriate degrees of freedom for the test depend on the context
- In the case of MNL, $J$ (here the degrees of freedom, not the number of alternatives) is the number of parameters in the estimating equation of the restricted choice set
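As a rough illustration of the statistic, the sketch below computes $H$ for two parameters with an explicit 2x2 inverse. All estimates and variance matrices are made-up numbers; in the IIA test, `theta_e` would come from the full choice set and `theta_i` from the restricted one.

```python
def hausman_2x2(theta_i, theta_e, var_i, var_e):
    """Hausman statistic for two parameters, using an explicit 2x2 inverse."""
    d = [a - b for a, b in zip(theta_i, theta_e)]
    v = [[var_i[r][c] - var_e[r][c] for c in range(2)] for r in range(2)]
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    if det <= 0 or v[0][0] <= 0:
        # In finite samples the variance difference need not be positive definite.
        raise ValueError("variance difference is not positive definite")
    inv = [[v[1][1] / det, -v[0][1] / det],
           [-v[1][0] / det, v[0][0] / det]]
    return sum(d[r] * inv[r][c] * d[c] for r in range(2) for c in range(2))

# Hypothetical estimates: restricted choice set (I) and full choice set (E).
theta_i = [0.52, -1.10]
theta_e = [0.50, -1.00]
var_i = [[0.040, 0.002], [0.002, 0.090]]
var_e = [[0.030, 0.001], [0.001, 0.050]]

H = hausman_2x2(theta_i, theta_e, var_i, var_e)
critical_5pct = 5.991  # 5% critical value of the chi-square with 2 degrees of freedom

print("H =", H, "reject IIA at 5%:", H > critical_5pct)
```

With these illustrative numbers the statistic is far below the critical value, so the null hypothesis would not be rejected. The explicit check on the variance difference is a design choice: when the difference is not positive definite the statistic is not well defined, and failing loudly seems safer than returning a negative value.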

### What if the IIA hypothesis is not satisfied? (1)

Multinomial probit model (MNP)

$$U_j = \beta' x_j + \epsilon_j, \quad j = 1, \dots, J, \qquad [\epsilon_1, \epsilon_2, \dots, \epsilon_J] \sim N(0, \Sigma)$$

$$\Pr(Y_i = j) = \Pr(U_j > U_k,\ k = 1, 2, \dots, J,\ k \neq j)$$

- Main obstacle: the difficulty of computing the multivariate normal probability for any dimensionality higher than 2
- Recent advances in accurate simulation of multinormal integrals have made estimation of the MNP more feasible
- Simulation-based estimation

### What if the IIA hypothesis is not satisfied? (2)

Generalized extreme value: nested logit models

- Very appealing if it is possible to assume sequential choices
- The $J$ alternatives are grouped into $L$ subgroups:
  1. First, the group of alternatives is chosen
  2. Then, one alternative is chosen within the group
- IIA is maintained within groups, but does not need to hold across groups
- Main limitations:
  - Results can depend on the way in which groups are formed
  - There is no specification test to discriminate among different groupings

### Treatment of rankings: ordered data

- $Y$ can assume a limited number of categories $y_c$, $c = 0, 1, \dots, C$
- Categories are inherently ordered: $y_0 < y_1 < y_2 < \dots < y_C$
- Examples:
  - Bond rating: AAA to D
  - Symptoms: none, minor, serious
  - Drug effect: worsen, none, partial recovery, full recovery
  - Customer satisfaction: very unsatisfied, unsatisfied, satisfied, very satisfied
  - ...
- Ordered probit and logit models
  - Multinomial models would fail to account for the ordinal nature of the dependent variable
  - OLS would attach a meaning to the difference between the category codings

### Latent regression

We consider a continuous latent variable $y^*$ (unobserved), a linear function of $x$ and $\epsilon$:

$$y^* = x'\beta + \epsilon$$

We observe

$$y = c \iff \gamma_c < y^* \le \gamma_{c+1}, \qquad \text{with } \gamma_0 = -\infty \text{ and } \gamma_{C+1} = +\infty$$

The latent response is specified by a linear regression model without the intercept.

### Ordered Probit Model

$$y^* = x'\beta + \epsilon \quad \text{with } \epsilon \sim N(0, 1)$$

$$
\begin{aligned}
\Pr(y_i = 0 \mid x) &= \Pr(y_i^* \le \gamma_1) = \Pr(\epsilon_i \le \gamma_1 - x'\beta \mid x) = \Phi(\gamma_1 - x'\beta) \\
\Pr(y_i = 1 \mid x) &= \Pr(\gamma_1 < y_i^* \le \gamma_2) = \Phi(\gamma_2 - x'\beta) - \Phi(\gamma_1 - x'\beta) \\
&\ \ \vdots \\
\Pr(y_i = C \mid x) &= \Pr(y_i^* > \gamma_C) = 1 - \Phi(\gamma_C - x'\beta)
\end{aligned}
$$

- Usually $y^*$ has no real meaning
- The interest is in $\Pr(y \mid x)$ rather than $E(y^* \mid x)$
- To identify the parameters, $x$ cannot contain the intercept
- If you have to specify a model with an intercept, set $\gamma_1 = 0$
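The category probabilities are differences of the standard normal CDF evaluated at consecutive thresholds. The sketch below implements this with `math.erf`; the value of $x'\beta$ and the thresholds are hypothetical.

```python
import math

def std_normal_cdf(t):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def ordered_probit_probs(xb, gammas):
    """Pr(y = c | x) for c = 0..C, given x'beta and thresholds gamma_1 < ... < gamma_C."""
    cuts = [-math.inf] + list(gammas) + [math.inf]  # gamma_0 and gamma_{C+1}
    cdf = [std_normal_cdf(g - xb) for g in cuts]
    return [cdf[c + 1] - cdf[c] for c in range(len(cuts) - 1)]

xb = 0.3              # hypothetical value of x'beta
gammas = [-0.5, 0.8]  # hypothetical thresholds gamma_1, gamma_2 (so C = 2)

probs = ordered_probit_probs(xb, gammas)
print(probs, sum(probs))
```

Padding the threshold list with $\pm\infty$ lets one formula cover the first, middle, and last categories, mirroring the convention $\gamma_0 = -\infty$, $\gamma_{C+1} = +\infty$.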

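### Marginal effects (ordered probit)

Coefficients are difficult to interpret:

$$\frac{\partial \Pr(y_i = 0 \mid x)}{\partial x_j} = -\beta_j\, \phi(\gamma_1 - x'\beta) \qquad \text{(sign opposite to the sign of } \beta_j\text{)}$$

$$\frac{\partial \Pr(y_i = c \mid x)}{\partial x_j} = -\beta_j\,\big[\phi(\gamma_{c+1} - x'\beta) - \phi(\gamma_c - x'\beta)\big] \qquad \text{(ambiguous sign!)}$$

$$\frac{\partial \Pr(y_i = C \mid x)}{\partial x_j} = \beta_j\, \phi(\gamma_C - x'\beta) \qquad \text{(same sign as } \beta_j\text{)}$$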
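These sign patterns can be seen in a quick computation. The sketch evaluates the marginal effects for every category at a hypothetical point; the coefficient, thresholds, and $x'\beta$ value are assumptions chosen for illustration.

```python
import math

def std_normal_pdf(t):
    """Standard normal density; evaluates to 0.0 at plus or minus infinity."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def ordered_probit_effects(xb, gammas, beta_j):
    """dPr(y = c | x)/dx_j for c = 0..C."""
    cuts = [-math.inf] + list(gammas) + [math.inf]
    pdf = [std_normal_pdf(g - xb) for g in cuts]
    return [-beta_j * (pdf[c + 1] - pdf[c]) for c in range(len(cuts) - 1)]

xb = 0.3                   # hypothetical value of x'beta
gammas = [-0.5, 0.8, 1.5]  # hypothetical thresholds (C = 3)
beta_j = 0.6               # hypothetical positive coefficient

effects = ordered_probit_effects(xb, gammas, beta_j)
for c, e in enumerate(effects):
    print(f"category {c}: dP/dx_j = {e:+.4f}")
print("sum of effects:", sum(effects))
```

With a positive coefficient the lowest category loses probability and the highest gains it, while the sign for the intermediate categories depends on where $x'\beta$ sits relative to the thresholds. The effects sum to zero because the probabilities always sum to one.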

### Changes in $y$ and $y^*$ in response to changes in $x$

Increasing one of the $x$'s while holding $\beta$ and $\gamma$ constant is equivalent to shifting the distribution of $y^*$ to the right.

[Figure: density of $y^*$ before (solid curve) and after (dashed curve) the shift.]

### Ordered Logistic Regression

$\epsilon_i \sim$ logistic. Proportional odds model:

$$\Pr(y_i > c) = \frac{\exp(x_i'\beta - \gamma_c)}{1 + \exp(x_i'\beta - \gamma_c)}$$

$$\log\left(\frac{\Pr(y_i > c)}{1 - \Pr(y_i > c)}\right) = x_i'\beta - \gamma_c$$

$$\frac{\Pr(y_i > c)/[1 - \Pr(y_i > c)]}{\Pr(y_j > c)/[1 - \Pr(y_j > c)]} = \exp[(x_i - x_j)'\beta]$$

The odds ratio does not depend on the threshold.
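The proportional odds property is easy to confirm numerically: the odds ratio between two individuals is the same at every threshold. The sketch assumes a single regressor; the coefficient, the two regressor values, and the thresholds are hypothetical.

```python
import math

def prob_above(xb, gamma_c):
    """Pr(y > c) = exp(x'beta - gamma_c) / (1 + exp(x'beta - gamma_c))."""
    return 1.0 / (1.0 + math.exp(-(xb - gamma_c)))

beta = 0.7                 # hypothetical coefficient
x_i, x_j = 2.0, 0.5        # hypothetical regressor values for two individuals
gammas = [-1.0, 0.4, 1.9]  # hypothetical thresholds

odds_ratios = []
for g in gammas:
    p_i = prob_above(beta * x_i, g)
    p_j = prob_above(beta * x_j, g)
    odds_ratios.append((p_i / (1.0 - p_i)) / (p_j / (1.0 - p_j)))

expected = math.exp((x_i - x_j) * beta)
print(odds_ratios, expected)
```

Every entry of `odds_ratios` matches $\exp[(x_i - x_j)\beta]$ up to rounding, whichever threshold is used.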

### Ordered Probit vs. Ordered Logit

- Coefficients and threshold parameters differ because of the different scale of the error term ($\sigma^2_{\text{probit}} = 1$, whereas $\sigma^2_{\text{logit}} = \pi^2/3$)
- Predicted probabilities are similar
- Marginal effects are similar
- If the logit is chosen, estimated coefficients can be interpreted in terms of odds


### Gender Effects in the Alaska Juvenile Justice System

Gender Effects in the Alaska Juvenile Justice System Report to the Justice and Statistics Research Association by André Rosay Justice Center University of Alaska Anchorage JC 0306.05 October 2003 Gender

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Structural Equation Models for Comparing Dependent Means and Proportions. Jason T. Newsom

Structural Equation Models for Comparing Dependent Means and Proportions Jason T. Newsom How to Do a Paired t-test with Structural Equation Modeling Jason T. Newsom Overview Rationale Structural equation

### Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

### Sampling Theory for Discrete Data

Sampling Theory for Discrete Data * Economic survey data are often obtained from sampling protocols that involve stratification, censoring, or selection. Econometric estimators designed for random samples

### Lecture notes: single-agent dynamics 1

Lecture notes: single-agent dynamics 1 Single-agent dynamic optimization models In these lecture notes we consider specification and estimation of dynamic optimization models. Focus on single-agent models.

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

### DEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests

DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also

### Semester 1 Statistics Short courses

Semester 1 Statistics Short courses Course: STAA0001 Basic Statistics Blackboard Site: STAA0001 Dates: Sat. March 12 th and Sat. April 30 th (9 am 5 pm) Assumed Knowledge: None Course Description Statistical

### Automated Statistical Modeling for Data Mining David Stephenson 1

Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### Lecture 19: Conditional Logistic Regression

Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

### The zero-adjusted Inverse Gaussian distribution as a model for insurance claims

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:

### Deterministic and Stochastic Modeling of Insulin Sensitivity

Deterministic and Stochastic Modeling of Insulin Sensitivity Master s Thesis in Engineering Mathematics and Computational Science ELÍN ÖSP VILHJÁLMSDÓTTIR Department of Mathematical Science Chalmers University

### Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller

Agenda Introduktion till Prediktiva modeller Beslutsträd Beslutsträd och andra prediktiva modeller Mathias Lanner Sas Institute Pruning Regressioner Neurala Nätverk Utvärdering av modeller 2 Predictive

### Regression with a Binary Dependent Variable

Regression with a Binary Dependent Variable Chapter 9 Michael Ash CPPA Lecture 22 Course Notes Endgame Take-home final Distributed Friday 19 May Due Tuesday 23 May (Paper or emailed PDF ok; no Word, Excel,

Lecture 4: Transformations Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture The Ladder of Roots and Powers Changing the shape of distributions Transforming

### Free Trial - BIRT Analytics - IAAs

Free Trial - BIRT Analytics - IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis

### Estimating the random coefficients logit model of demand using aggregate data

Estimating the random coefficients logit model of demand using aggregate data David Vincent Deloitte Economic Consulting London, UK davivincent@deloitte.co.uk September 14, 2012 Introduction Estimation