Multiple Choice Models II

1 Multiple Choice Models II Laura Magazzini University of Verona Laura Magazzini Multiple Choice Models II 1 / 28

2 Categorical data
Categorical variable models
Y is the result of a single decision among more than 2 alternatives.
- Unordered choice set (categories / qualitative choices): multinomial logit, conditional logit, nested logit
- Ordered choice set (rankings): models for ordered data, e.g. ordered probit

3 Example: Education and Occupational Choice
Counts by occupation and education level, with row percentages in parentheses.

Occupation     Primary/Secondary school   University or more   Total
Menial         23 (74.19%)                8 (25.81%)           31 (100%)
Blue Collar    60 (86.96%)                9 (13.04%)           69 (100%)
Craft          65 (77.38%)                19 (22.62%)          84 (100%)
WhiteCol       27 (65.85%)                14 (34.15%)          41 (100%)
Prof           27 (24.11%)                85 (75.89%)          112 (100%)
Total          202 (59.94%)               135 (40.06%)         337 (100%)

4 Multinomial distribution
$Y_i$: qualitative random variable with $J$ categories.
$P_{ij} = \Pr(Y_i = j)$, $j = 1, 2, \dots, J$: probability that individual $i$ chooses alternative $j$.
Categories are mutually exclusive and exhaustive: $\sum_j P_{ij} = 1$, $i = 1, 2, \dots, N$.
Let $d_i = (d_{i1}, d_{i2}, \dots, d_{iJ})$, where $d_{ij} = 1$ if $Y_i = j$ and 0 otherwise, so that $\sum_j d_{ij} = 1$, $i = 1, 2, \dots, N$.

5 Multinomial logit model (MNL)
$Y$: result of a choice among $J$ alternatives ($J > 2$).
$d_i = (d_{i1}, d_{i2}, \dots, d_{iJ})$, where $d_{ij} = 1$ if $Y_i = j$.
$P_{ij} = \Pr(Y_i = j)$, $\sum_j P_{ij} = 1$.
Logit model:
$$\Pr(Y_i = j) = \frac{\exp(\eta_{ij})}{\sum_{l=1}^{J} \exp(\eta_{il})}$$
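A minimal sketch of the logit formula above, using only the Python standard library. The function name `mnl_probs` and the values in `eta` are illustrative assumptions, not part of the slides.

```python
import math

def mnl_probs(eta):
    """Return Pr(Y = j) for each alternative, given the linear predictors eta."""
    m = max(eta)  # subtracting the maximum avoids overflow and leaves the ratios unchanged
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [x / total for x in w]

eta = [0.0, 0.5, 1.0]  # hypothetical linear predictors for J = 3 alternatives
p = mnl_probs(eta)
print(p, sum(p))
# The log of a probability ratio equals the difference of the two predictors.
print(math.log(p[2] / p[0]), eta[2] - eta[0])
```

The probabilities lie in [0, 1] and sum to one by construction, which is the first property listed on the next slide.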

6 Properties of MNL
$0 \le P_{ij} \le 1$ and $\sum_j P_{ij} = 1$ (by definition).
For every pair of alternatives $(k, l)$, the probability ratio is
$$\frac{P_{ik}}{P_{il}} = \frac{\exp(\eta_{ik})}{\exp(\eta_{il})}, \qquad \log\frac{P_{ik}}{P_{il}} = \eta_{ik} - \eta_{il}$$
The model can be motivated by a random utility model.

7 Random Utility Models (1)
McFadden (1973, 2001)
$J$ alternatives: mutually exclusive, exhaustive, finite set.
Examples: competing brands, different means of transport, different occupations, ...
Categories can be ordered or unordered; different techniques are employed according to the nature of the alternatives.
Assume non-ordered alternatives.
A rational agent chooses the alternative that maximizes his/her utility: $Y_i = j$ if $U_{ij} > U_{ik}$ for each $k \ne j$.

8 Random Utility Models (2)
McFadden (1973, 2001)
Linear utility model: $U_{ij} = \eta_{ij} + \epsilon_{ij}$, with $\eta_{ij} = LC(z_{ij}, \theta)$.
$\eta_{ij}$ links the agent's utility to factors that can be observed.
$\eta_{ij}$ is different from $U_{ij}$ since there are factors that cannot be observed by the researcher.
$$\begin{aligned}
\Pr(Y_i = j) &= \Pr(U_{ij} > U_{ik},\ \forall k \ne j) \\
&= \Pr(\eta_{ij} + \epsilon_{ij} > \eta_{ik} + \epsilon_{ik},\ \forall k \ne j) \\
&= \Pr(\epsilon_{ik} - \epsilon_{ij} < \eta_{ij} - \eta_{ik},\ \forall k \ne j) \\
&= \int_{\epsilon} I(\epsilon_{ik} - \epsilon_{ij} < \eta_{ij} - \eta_{ik},\ \forall k \ne j)\, f(\epsilon)\, d\epsilon
\end{aligned}$$
with $f$ the probability density function of $\epsilon$.
The model is made operational by a particular choice of distribution for the disturbances.
Closed functional forms exist only for a few specifications (e.g. logit).
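The link between utility maximization and the logit probabilities can be checked by simulation. The sketch below assumes i.i.d. standard Gumbel (type I extreme value) disturbances, which is the distribution that yields the logit closed form; the slides do not state it explicitly. The predictors in `eta`, the number of draws and the seed are arbitrary illustrative choices.

```python
import math
import random

def gumbel(rng):
    """Draw from the standard Gumbel distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_shares(eta, n_draws, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(eta)
    for _ in range(n_draws):
        utilities = [e + gumbel(rng) for e in eta]  # U_ij = eta_ij + eps_ij
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

def logit_probs(eta):
    w = [math.exp(e) for e in eta]
    total = sum(w)
    return [x / total for x in w]

eta = [0.0, 0.5, 1.0]  # hypothetical linear predictors
shares = simulate_shares(eta, 20000)
exact = logit_probs(eta)
print(shares)
print(exact)
```

With enough draws the simulated choice shares should be close to the closed-form logit probabilities.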

9 How to specify $\eta_{ij}$?
Standard MNL: $\eta_{ij} = x_i'\beta_j$
  $x_i$: individual characteristics, constant across all the alternatives $j$
Conditional logit model (CLM): $\eta_{ij} = z_{ij}'\gamma$
  $z_{ij}$: characteristics of choice $j$ and individual $i$
- Datasets typically analyzed by economists do not contain mixtures of individual and choice-specific attributes
- CLM is usually applied when the interest is in the effect of choice-specific attributes
- Custom transformation is needed for variables containing individual-specific attributes

10 Standard MNL
$$\Pr(Y_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{\sum_{l=1}^{J} \exp(x_i'\beta_l)}$$
It is not possible to estimate all of $\beta_1, \dots, \beta_J$: adding the same constant vector to all the $\beta_j$ does not change the probabilities.
The indeterminacy is removed by letting $\beta_1 = 0$, so that $j = 1$ is the reference category:
$$\Pr(Y_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{1 + \sum_{l=2}^{J} \exp(x_i'\beta_l)}$$
An intercept is allowed by letting the first element of $x_i$ equal 1 for every $i$.
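The identification argument can be illustrated numerically. In this sketch the regressor vector `x`, the coefficient vectors in `betas` and the vector `shift` are made-up values; only the invariance itself comes from the slide.

```python
import math

def mnl_probs(x, betas):
    """Pr(Y = j | x) for coefficient vectors betas[0], ..., betas[J-1]."""
    eta = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

x = [1.0, 2.0]  # first element equal to 1 plays the role of the intercept
betas = [[0.3, -0.2], [1.0, 0.1], [-0.5, 0.4]]  # hypothetical beta_1, beta_2, beta_3

# Adding the same vector to every beta_j leaves the probabilities unchanged.
shift = [5.0, -3.0]
shifted = [[bk + sk for bk, sk in zip(b, shift)] for b in betas]

# Normalization: subtract beta_1 from every beta_j so that beta_1 = 0.
normalized = [[bk - b1k for bk, b1k in zip(b, betas[0])] for b in betas]

p = mnl_probs(x, betas)
p_shifted = mnl_probs(x, shifted)
p_norm = mnl_probs(x, normalized)
print(p)
print(p_shifted)
print(p_norm)
```

All three parameter sets give the same probabilities, so only differences relative to the reference category are identified.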

11 Estimation: MLE
The log likelihood can be written as
$$\ln L = \sum_{i=1}^{n} \sum_{j=1}^{J} d_{ij} \ln \Pr(Y_i = j)$$
with $d_{ij} = 1$ if $Y_i = j$, 0 otherwise.
The derivatives have the characteristically simple form:
$$\frac{\partial \ln L}{\partial \beta_j} = \sum_i (d_{ij} - P_{ij})\, x_i = 0$$
As a consequence, if the model is estimated with an intercept, $\sum_i d_{ij} = \sum_i P_{ij}$ for each $j$: the predicted frequency of each category equals its observed frequency.
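A small sketch of the log likelihood and its derivatives, checked against finite differences. The data `X`, `y` and the coefficients `betas` are invented for illustration, and outcomes are coded 0, ..., J-1 rather than 1, ..., J. The score is evaluated at arbitrary coefficients, not at the maximum, so it is not zero here.

```python
import math

def probs(x, betas):
    eta = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

def log_lik(X, y, betas):
    """ln L = sum_i sum_j d_ij ln P_ij, where d_ij = 1 if y_i == j."""
    return sum(math.log(probs(x, betas)[yi]) for x, yi in zip(X, y))

def score(X, y, betas):
    """d ln L / d beta_j = sum_i (d_ij - P_ij) x_i, one row per alternative j."""
    J, K = len(betas), len(betas[0])
    g = [[0.0] * K for _ in range(J)]
    for x, yi in zip(X, y):
        p = probs(x, betas)
        for j in range(J):
            d = 1.0 if yi == j else 0.0
            for k in range(K):
                g[j][k] += (d - p[j]) * x[k]
    return g

X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]  # first column is the intercept
y = [0, 2, 1, 1]                                        # hypothetical observed choices
betas = [[0.0, 0.0], [0.4, 0.3], [-0.2, 0.1]]           # beta_1 = 0 is the reference
g = score(X, y, betas)
print(log_lik(X, y, betas))
print(g)
```

In practice the score equations are solved numerically, for example by Newton's method, with $\beta_1$ held at zero.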

12 Interpretation of the parameters
The partial effects for this model are complicated:
$$\frac{\partial P_j}{\partial x_i} = P_j \Big[ \beta_j - \sum_{k=1}^{J} P_k \beta_k \Big] = P_j [\beta_j - \bar\beta]$$
The coefficients in this model are difficult to interpret: $\partial P_j / \partial x_k$ need not have the same sign as $\beta_{jk}$.
A simpler interpretation is obtained by considering the odds ratio:
$$\ln \frac{P_{ij}}{P_{i1}} = x_i'\beta_j, \qquad \ln \frac{P_{ij}}{P_{ik}} = x_i'(\beta_j - \beta_k) \ \text{ if } k \ne 1$$
In case of dummy variables (coded as 0 or 1):
$$\ln \frac{P_{ij}(x_i = 1)}{P_{i1}(x_i = 1)} = \beta_j, \qquad \ln \frac{P_{ij}(x_i = 1)}{P_{ik}(x_i = 1)} = \beta_j - \beta_k \ \text{ if } k \ne 1$$
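The sign ambiguity of the partial effects can be seen in a small numerical sketch. It uses a single regressor and invented coefficients (with $\beta_1 = 0$ as reference); the names are illustrative only.

```python
import math

def probs(x, betas):
    eta = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

def partial_effects(x, betas):
    """dP_j/dx_k = P_j * (beta_jk - sum_l P_l * beta_lk), one row per alternative j."""
    p = probs(x, betas)
    J, K = len(betas), len(x)
    beta_bar = [sum(p[l] * betas[l][k] for l in range(J)) for k in range(K)]
    return [[p[j] * (betas[j][k] - beta_bar[k]) for k in range(K)] for j in range(J)]

x = [1.0]                      # one hypothetical regressor
betas = [[0.0], [0.5], [2.0]]  # hypothetical coefficients, beta_1 = 0
pe = partial_effects(x, betas)
print(pe)
# The second alternative has a positive coefficient but a negative partial effect,
# because the probability-weighted average coefficient exceeds 0.5.
print(betas[1][0], pe[1][0])
```

The partial effects sum to zero across alternatives, since the probabilities always sum to one.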

13 Conditional logit model
$$\Pr(Y_i = j \mid z_j) = \frac{\exp(z_j'\beta)}{\sum_{k=1}^{J} \exp(z_k'\beta)}$$
The model contains choice-specific attributes.
The coefficients of individual-specific attributes (that do not vary across categories) are not identified.
Individual-specific variables can be inserted in the model, but need to be properly transformed.
The coefficients of the choice-specific attributes cannot all be separately identified: adding a constant to all the coefficients does not change the estimated probability.
The intercept is set to zero.

14 Marginal effects
$$\frac{\partial P_j(z)}{\partial z_k} = \beta_z \big[ P_j(z) \big( I(j = k) - P_k(z) \big) \big]$$
$$\frac{\partial P_j(z)}{\partial z_j} = \beta_z \big[ P_j(z) \big( 1 - P_j(z) \big) \big], \qquad \frac{\partial P_j(z)}{\partial z_h} = -\beta_z P_j(z) P_h(z) \quad (j \ne h)$$
$P_j$ changes monotonically with respect to $z$.
The sign of the derivative depends on the sign of $\beta_z$.
The effect has opposite sign depending on whether $z_j$ or $z_h$ is considered.
Symmetry: $\partial P_j / \partial z_h = \partial P_h / \partial z_j$.
$P_j$ does not change if all the $z_k$, $k = 1, \dots, J$, change by the same amount (the ranking of the $U_{ij}$ is unchanged!).
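These properties can be verified numerically. The sketch below assumes a single choice-specific attribute with a common coefficient; the attribute values in `z` and the coefficient `beta` are invented.

```python
import math

def clogit_probs(z, beta):
    """Conditional logit probabilities with one attribute z_j per alternative."""
    w = [math.exp(beta * zj) for zj in z]
    total = sum(w)
    return [v / total for v in w]

def marginal_effects(z, beta):
    """Matrix with entry [j][h] = dP_j/dz_h = beta * P_j * (I(j == h) - P_h)."""
    p = clogit_probs(z, beta)
    J = len(z)
    return [[beta * p[j] * ((1.0 if j == h else 0.0) - p[h]) for h in range(J)]
            for j in range(J)]

z = [1.0, 0.2, -0.5]  # hypothetical attribute values for J = 3 alternatives
beta = 0.8            # hypothetical common coefficient
me = marginal_effects(z, beta)
for row in me:
    print(row, sum(row))
```

Each row sums to zero, which is the statement that a common shift in all the $z_k$ leaves $P_j$ unchanged, and the matrix is symmetric.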

15 Multinomial logit (MNL) vs conditional logit (CNL)
Similar response probabilities, but they differ in some important respects.
MNL: the conditioning variables do not change across alternatives.
- Characteristics of the alternatives are unimportant or not of interest, or data are not available
- Example: occupational choice. We do not know how much someone could make in every occupation, but we can collect data on factors affecting individual productivity and tastes, e.g. education, past experience
- Factors can have different effects on relative probabilities (different $\beta_j$ for different choices)
CNL: choices are made on the basis of observable attributes of each alternative.
- Common $\beta$
- MNL as a special case of CNL
Important limitation: independence from irrelevant alternatives assumption.

16 Independence from irrelevant alternatives (logit)
For every pair of alternatives $(k, l)$, the probability ratio (odds) is
$$\omega = \frac{\Pr(Y_i = k \mid x_{ik})}{\Pr(Y_i = l \mid x_{il})} = \frac{\exp(\eta_{ik})}{\exp(\eta_{il})}$$
$\omega$ depends only on the linear predictors ($\eta$) of the two alternatives considered, not on the whole set of alternatives.
From the point of view of estimation, it is useful that the odds ratio does not depend on the other choices.
But it is not a particularly appealing restriction to place on consumer behaviour.

17 IIA: example by McFadden (1984)
Commuters initially choose between cars and red buses with equal probabilities.
Suppose a third mode (blue buses) is added and commuters do not care about the colour of the bus (i.e. they will choose between the two buses with equal probability).
IIA implies that the fraction of commuters taking a car would fall from 1/2 to 1/3, a result that is not very realistic.
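The arithmetic of the example can be reproduced with the logit formula. Equal choice probabilities correspond to equal linear predictors; setting them to zero is an arbitrary normalization chosen for this sketch.

```python
import math

def logit_probs(eta):
    w = [math.exp(e) for e in eta]
    total = sum(w)
    return [v / total for v in w]

before = logit_probs([0.0, 0.0])      # car, red bus
after = logit_probs([0.0, 0.0, 0.0])  # car, red bus, blue bus

print(before)  # car share is 1/2
print(after)   # car share falls to 1/3
# The car / red bus odds are the same before and after: this is IIA.
print(before[0] / before[1], after[0] / after[1])
```

A more plausible outcome would leave the car share at 1/2 and split the bus share into 1/4 for each colour, which the logit model cannot reproduce because it forces the car to red bus odds to stay at 1.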

18 Testing IIA
Hausman and McFadden (1984)
If a subset of the choice set is truly irrelevant, omitting it from the model altogether will not change the parameter estimates systematically.
Exclusion of these choices will be inefficient but will not lead to inconsistency.
But if the remaining odds are not truly independent from these alternatives, then the parameter estimates obtained when these choices are included will be inconsistent.
Therefore, Hausman's specification test can be applied.

19 The Hausman specification test
Consider two different estimators $\hat\theta_E$ and $\hat\theta_I$.
Under H0, $\hat\theta_E$ and $\hat\theta_I$ are both consistent and $\hat\theta_E$ is efficient relative to $\hat\theta_I$.
Under H1, $\hat\theta_I$ remains consistent while $\hat\theta_E$ is inconsistent.
Then H0 can be tested by using the Hausman statistic:
$$\begin{aligned}
H &= (\hat\theta_I - \hat\theta_E)' \big[ \text{Est.Asy.Var}(\hat\theta_I - \hat\theta_E) \big]^{-1} (\hat\theta_I - \hat\theta_E) \\
&= (\hat\theta_I - \hat\theta_E)' \big[ \text{Est.Asy.Var}(\hat\theta_I) - \text{Est.Asy.Var}(\hat\theta_E) \big]^{-1} (\hat\theta_I - \hat\theta_E) \xrightarrow{d} \chi^2_J
\end{aligned}$$
The appropriate degrees of freedom for the test depend on the context.
In the case of MNL, $J$ is the number of parameters in the estimating equation of the restricted choice set.
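A sketch of how the statistic could be computed from two sets of estimates and their estimated covariance matrices. All numbers are made up for illustration, and the small Gaussian-elimination solver is a stand-in for a linear-algebra library. In finite samples the difference of the two covariance matrices need not be positive definite, in which case the statistic is not well behaved.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("variance difference is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def hausman(theta_i, theta_e, var_i, var_e):
    """H = q' [Var(theta_I) - Var(theta_E)]^{-1} q with q = theta_I - theta_E."""
    q = [a - b for a, b in zip(theta_i, theta_e)]
    v = [[vi - ve for vi, ve in zip(ri, re)] for ri, re in zip(var_i, var_e)]
    return sum(qi * xi for qi, xi in zip(q, solve(v, q)))

# Hypothetical estimates: restricted choice set (I) and full choice set (E).
theta_i = [1.0, 2.0]
theta_e = [0.8, 2.1]
var_i = [[0.05, 0.0], [0.0, 0.04]]
var_e = [[0.01, 0.0], [0.0, 0.02]]
H = hausman(theta_i, theta_e, var_i, var_e)
print(H)  # compare with a chi-square critical value with 2 degrees of freedom
```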

20 What if IIA hypothesis is not satisfied? (1)
Multinomial probit model (MNP)
$U_j = \beta' x_j + \epsilon_j$, $j = 1, \dots, J$, with $[\epsilon_1, \epsilon_2, \dots, \epsilon_J] \sim N(0, \Sigma)$
$\Pr(Y_i = j) = \Pr(U_j > U_k,\ \forall k \ne j)$, $j = 1, 2, \dots, J$
Main obstacle: difficulty in computing the multivariate normal probability for any dimensionality higher than 2.
Recent advances in accurate simulation of multinormal integrals have made estimation of the MNP more feasible (simulation-based estimation).

21 What if IIA hypothesis is not satisfied? (2)
Generalized extreme value: nested logit models
Very appealing if it is possible to assume sequential choices.
The $J$ alternatives are grouped into $L$ subgroups:
(1) First, the group of alternatives is chosen
(2) Then, one alternative is chosen within the group
IIA is maintained within groups, but does not need to hold across groups.
Main limitations:
- Results can depend on the way in which groups are formed
- There is no specification test to discriminate among different groupings

22 Treatment of rankings
Ordered data
$Y$ can assume a limited number of categories $y_c$, $c = 0, 1, \dots, C$.
Categories are inherently ordered: $y_0 < y_1 < y_2 < \dots < y_C$.
Examples:
- Bond rating: AAA-D
- Symptoms: none, minor, serious
- Drug effect: worsen, none, partial recovery, full recovery
- Customer satisfaction: very unsatisfied, unsatisfied, satisfied, very satisfied
- ...
Ordered probit and logit models:
- Multinomial models would fail to account for the ordinal nature of the dependent variable
- OLS would attach a meaning to the difference between the category codings

23 Latent regression
We consider a continuous latent variable $y^*$ (unobserved), a linear function of $x$ and $\epsilon$: $y^* = x'\beta + \epsilon$.
We observe $y = c \iff \gamma_c < y^* \le \gamma_{c+1}$, with $\gamma_0 = -\infty$ and $\gamma_{C+1} = +\infty$.
The latent response is specified by a linear regression model without the intercept.

24 Ordered Probit Model
$y^* = x'\beta + \epsilon$ with $\epsilon \sim N(0, 1)$
$$\begin{aligned}
\Pr(y_i = 0 \mid x) &= \Pr(y_i^* \le \gamma_1) = \Pr(\epsilon_i \le \gamma_1 - x'\beta \mid x) = \Phi(\gamma_1 - x'\beta) \\
\Pr(y_i = 1 \mid x) &= \Pr(\gamma_1 < y_i^* \le \gamma_2) = \Phi(\gamma_2 - x'\beta) - \Phi(\gamma_1 - x'\beta) \\
&\ \ \vdots \\
\Pr(y_i = C \mid x) &= \Pr(y_i^* > \gamma_C) = 1 - \Phi(\gamma_C - x'\beta)
\end{aligned}$$
Usually $y^*$ has no real meaning: the interest is in $\Pr(y \mid x)$ rather than $E(y^* \mid x)$.
To identify the parameters, $x$ cannot contain the intercept.
If you have to specify a model with an intercept, set $\gamma_1 = 0$.
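The category probabilities can be computed directly from the formulas above. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the index value `xb` and the thresholds are invented.

```python
from statistics import NormalDist

def ordered_probit_probs(xb, cutpoints):
    """Pr(y = c | x) for c = 0, ..., C, given xb = x'beta and
    cutpoints = [gamma_1, ..., gamma_C] in increasing order."""
    cdf = NormalDist().cdf
    upper = [cdf(g - xb) for g in cutpoints] + [1.0]  # Phi(gamma_{c+1} - x'beta)
    lower = [0.0] + upper[:-1]                        # Phi(gamma_c - x'beta)
    return [u - l for u, l in zip(upper, lower)]

cutpoints = [-1.0, 0.0, 1.0]  # hypothetical gamma_1, gamma_2, gamma_3 (C = 3)
p0 = ordered_probit_probs(0.0, cutpoints)
p1 = ordered_probit_probs(0.5, cutpoints)
print(p0, sum(p0))
print(p1, sum(p1))
```

Raising `xb` moves probability mass from the lowest category towards the highest one.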

25 Marginal effects
Coefficients are difficult to interpret:
$$\frac{\partial \Pr(y_i = 0 \mid x)}{\partial x_j} = -\beta_j\, \phi(\gamma_1 - x'\beta) \qquad \text{sign opposite to the sign of } \beta_j$$
$$\frac{\partial \Pr(y_i = c \mid x)}{\partial x_j} = -\beta_j \big[ \phi(\gamma_{c+1} - x'\beta) - \phi(\gamma_c - x'\beta) \big] \qquad \text{ambiguous sign!}$$
$$\frac{\partial \Pr(y_i = C \mid x)}{\partial x_j} = \beta_j\, \phi(\gamma_C - x'\beta) \qquad \text{same sign as } \beta_j$$
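A sketch of these marginal effects, again with invented values for `xb`, the thresholds and `beta_j`. The densities at $\gamma_0 = -\infty$ and $\gamma_{C+1} = +\infty$ are set to zero, so a single expression covers all categories.

```python
from statistics import NormalDist

def ordered_probit_effects(xb, cutpoints, beta_j):
    """dPr(y = c | x)/dx_j for c = 0, ..., C."""
    pdf = NormalDist().pdf
    # phi(gamma_c - x'beta) for c = 0, ..., C+1, with phi = 0 at the infinite thresholds
    dens = [0.0] + [pdf(g - xb) for g in cutpoints] + [0.0]
    return [-beta_j * (dens[c + 1] - dens[c]) for c in range(len(cutpoints) + 1)]

cutpoints = [-1.0, 0.0, 1.0]  # hypothetical thresholds
effects = ordered_probit_effects(0.3, cutpoints, beta_j=0.5)
print(effects, sum(effects))
```

With a positive coefficient the effect on the lowest category is negative and the effect on the highest is positive; the signs for the middle categories depend on where `xb` lies relative to the thresholds. The effects sum to zero across categories.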

26 Changes in $y^*$ and $y$ in response to changes in $x$
Increasing one of the $x$'s while holding $\beta$ and $\gamma$ constant is equivalent to shifting the distribution of $y^*$ to the right (solid to dashed curve).

27 Ordered Logistic Regression: $\epsilon_i \sim$ logistic
Proportional odds model
$$\Pr(y_i > c) = \frac{\exp(x_i'\beta - \gamma_c)}{1 + \exp(x_i'\beta - \gamma_c)}$$
$$\log\left( \frac{\Pr(y_i > c)}{1 - \Pr(y_i > c)} \right) = x_i'\beta - \gamma_c$$
$$\frac{\Pr(y_i > c) / [1 - \Pr(y_i > c)]}{\Pr(y_j > c) / [1 - \Pr(y_j > c)]} = \exp[(x_i - x_j)'\beta]$$
The odds ratio does not depend on the threshold.
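The proportional odds property can be checked numerically. The index values `xb_i`, `xb_j` and the thresholds below are made up for the sketch.

```python
import math

def prob_above(xb, gamma_c):
    """Pr(y > c) = exp(x'beta - gamma_c) / (1 + exp(x'beta - gamma_c))."""
    return 1.0 / (1.0 + math.exp(-(xb - gamma_c)))

def odds_above(xb, gamma_c):
    p = prob_above(xb, gamma_c)
    return p / (1.0 - p)

xb_i, xb_j = 1.2, 0.4          # hypothetical x_i'beta and x_j'beta
thresholds = [-1.0, 0.0, 1.5]  # hypothetical gamma_c
ratios = [odds_above(xb_i, g) / odds_above(xb_j, g) for g in thresholds]
print(ratios)
print(math.exp(xb_i - xb_j))
```

The ratio of the odds is the same at every threshold and equals $\exp[(x_i - x_j)'\beta]$.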

28 Ordered Probit vs. Ordered Logit
Coefficients and threshold parameters are different due to different scale factors ($\sigma^2_{probit} = 1$, whereas $\sigma^2_{logit} = \pi^2/3$).
Predicted probabilities are similar.
Marginal effects are similar.
If the logit is chosen, estimated coefficients can be interpreted in terms of odds.


More information

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

More information

Structural Econometric Modeling in Industrial Organization Handout 1

Structural Econometric Modeling in Industrial Organization Handout 1 Structural Econometric Modeling in Industrial Organization Handout 1 Professor Matthijs Wildenbeest 16 May 2011 1 Reading Peter C. Reiss and Frank A. Wolak A. Structural Econometric Modeling: Rationales

More information

Qualitative Choice Analysis Workshop 76 LECTURE / DISCUSSION. Hypothesis Testing

Qualitative Choice Analysis Workshop 76 LECTURE / DISCUSSION. Hypothesis Testing Qualitative Choice Analysis Workshop 76 LECTURE / DISCUSSION Hypothesis Testing Qualitative Choice Analysis Workshop 77 T-test Use to test value of one parameter. I. Most common application: to test whether

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information
