Paired Comparison Models. Paired Comparison Preference Models. The prefmod Package: Part I Some Examples. Paired Comparisons.



Similar documents
A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Notating the Multilevel Longitudinal Model. Multilevel Modeling of Longitudinal Data. Notating (cont.) Notating (cont.)

Multivariate Logistic Regression

Chapter 5 Analysis of variance SPSS Analysis of variance

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Poisson Models for Count Data

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

11. Analysis of Case-control Studies Logistic Regression

SAS Software to Fit the Generalized Linear Model

13. Poisson Regression Analysis

Nominal and ordinal logistic regression

Probability Calculator

Basic Statistical and Modeling Procedures Using SAS

SPSS Guide: Regression Analysis

Simple Linear Regression Inference

Ordinal Regression. Chapter

HLM software has been one of the leading statistical packages for hierarchical

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

MULTIPLE REGRESSION WITH CATEGORICAL DATA

3. Regression & Exponential Smoothing

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.

Additional sources Compilation of sources:

R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Statistical Models in R

Lecture 6: Poisson regression

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

Directions for using SPSS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

Normality Testing in Excel

VI. Introduction to Logistic Regression

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

Analyzing Ranking and Rating Data from Participatory On- Farm Trials

Factor Analysis. Chapter 420. Introduction

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

Multiple Linear Regression

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

Multinomial and Ordinal Logistic Regression

DATA INTERPRETATION AND STATISTICS

Standard errors of marginal effects in the heteroskedastic probit model

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Introduction to General and Generalized Linear Models

Categorical Data Analysis

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

Nominal and Real U.S. GDP

Simple Predictive Analytics Curtis Seare

Linear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil.

Parametric and non-parametric statistical methods for the life sciences - Session I

Multiple Choice Models II

Examples of Using R for Modeling Ordinal Data

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

Nonlinear Regression:

The Dummy s Guide to Data Analysis Using SPSS

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Use of deviance statistics for comparing models

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

Illustration (and the use of HLM)

Statistiek II. John Nerbonne. October 1, Dept of Information Science

Generalized Linear Models

ANALYSIS OF TREND CHAPTER 5

Row vs. Column Percents. tab PRAYER DEGREE, row col

Using Excel for inferential statistics

The Latent Variable Growth Model In Practice. Individual Development Over Time

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

Modelling and added value

Independent t- Test (Comparing Two Means)

MATH4427 Notebook 2 Spring MATH4427 Notebook Definitions and Examples Performance Measures for Estimators...

5. Linear Regression

Handling missing data in Stata a whirlwind tour

Examining a Fitted Logistic Model

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

STRUTS: Statistical Rules of Thumb. Seattle, WA

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Module 4 - Multiple Logistic Regression

Automated Biosurveillance Data from England and Wales,

Chapter 3 Quantitative Demand Analysis

Lean Six Sigma Analyze Phase Introduction. TECH QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

Logit Models for Binary Data

Lesson Outline Outline

containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

7 Generalized Estimating Equations

Simple linear regression

T-test & factor analysis

International College of Economics and Finance Syllabus Probability Theory and Introductory Statistics

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

Statistical Models in R

GLM I An Introduction to Generalized Linear Models

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

Computer exercise 4 Poisson Regression

TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics

Lecture 5 : The Poisson Distribution

Transcription:

Paired Comparison Models Paired Comparison Models Paired Comparison Models General Considerations: Paired Comparison Preference Models The prefmod Package: Part I Some Examples Regina Dittrich & Reinhold Hatzinger Department of Statistics and Mathematics, WU Vienna the paired comparison models we consider in this talk are based on the Bradley-Terry Model (Biometrika, 1952) and will be used as statistical models describing and explaining data the models are similar to regression models (ANOVA) other types of dependent variables (multivariate, nominal) Generalised Linear Models but will not be considered as measurement models in this case one should use another approach e.g. taken by Winkelmaier in the R package eba 6.11.2010 1 6.11.2010 2 Paired Comparison Models Paired Comparisons method of data collection given a set of J items Paired Comparison Models Overview LLBT models: loglinear Bradly-Terry models Basic LLBT individuals are asked to judge pairs of objects j preferred to k k preferred to j aim is to rank objects into a preference order obtain an overall ranking of the objects Extended LLBT undecided response subject covariates object specic covariates Pattern Models Paired comparison pattern models Ranking pattern models Rating pattern models 6.11.2010 3 6.11.2010 4

LLBT The Basic Bradley-Terry Model (BT) LLBT The Basic Loglinear BT Model (LLBT) for the each comparison (jk) of object j to object k we observe: n (j k)... the number of times j is preferred to k n (k j)... the number of times k is preferred to j the model can be formulated as a log-linear model following the usual Multinomial / Poisson - equivalence. the expected value m (j k) of n (j k) is m (j k) = N (jk) p (j k) N (jk) = n (j k) + n (k j) total number of responses to comparison (jk) P (j k) = π j πj = c π j + π (jk) k πk where c (jk) is constant for a given comparison the probability that j is preferred to k in comparison (jk) then our basic paired comparison model for one comparison is P (j k) = π j π j + π k π's are a called worth parameters and are non-negative numbers describing the location of the objects ln m (j k) = µ (jk) + λ j λ k λ's are the object parameters µ's are nuisance parameters this model formulation is feasible for further extensions 6.11.2010 5 6.11.2010 6 LLBT terms and relations relation between π and λ: λ j = ln π j π j = exp 2λ j identiability of πs is obtained by the restriction π J = 1 via λ J = 0 the worth parameters are calculated by π j = exp(2λ j), j = 1, 2,..., J j exp(2λ j ) where j π j = 1 LLBT LLBT Design Structure for 3 objects we have 3 comparisons PC decision counts µ λ 1 λ 2 λ 3 (12) O 1 n (1 2) 1 1-1 0 (12) O 2 n (2 1) 1-1 1 0 (13) O 1 n (1 3) 2 1 0-1 (13) O 3 n (3 1) 2-1 0 1 (23) O 2 n (2 3) 3 0 1-1 (23) O 3 n (3 2) 3 0-1 1 the design structure consists of counts (dependent variable) and the design matrix X with: µ which is a factor (dummies for µ 1, µ 2, µ 3 ) and variates for the objects O 1, O 2, O 3 6.11.2010 7 6.11.2010 8

LLBT LLBT Model Structure model formula for the rst line is: LLBT Example Wines WINE example (cticious) 4 sicilian red wines have been tasted pairwise by 1000 people (all of them prefer red to white wines) ln m (1 2) = µ 1 + λ 1 λ 2 general matrix notation: ln m (1 2) 1 0 0 1 1 0 µ 1 ln m (2 1) 1 0 0 1 1 0 µ ln m 2 (1 3) = 0 1 0 1 0 1 µ 3 ln m (3 1) 0 1 0 1 0 1 λ 1 ln m (2 3) 0 0 1 0 1 1 λ 2 ln m 0 0 1 0 1 1 λ (3 2) 3 aim of the study: preference order for dierent wines data: PC-responses about the choices of 4 selected wines the wines are: O1 = Nero D'Avola (also called 'Calabrese') O2 = Gaglioppo (a red of Calabrian origin frequently grown in Sicily O3 = Perricone (Pignatello - esoteric, robust red) O4 = Nerello (Mascalese - strong red) 6.11.2010 9 6.11.2010 10 Data - Coding Data preparation Data Coding Coding 4 objects (items) A,G,P,N number of comparisons is 4 3/2 = 6 for data input the ordering of comparisons is important wine rst in comparison wines A 1 G 2 P 3 N 4 A 1 G 2 V1 P 3 V2 V3 N 4 V4 V5 V6 V1 V2 V3 V4 V5 V6 (12) (13) (23) (14) (24) (34) (A,G) (A,P) (G,P) (A,N) (G,N) (P,N) One possible coding for each comparison (V1,... V6 ) is (jk) = { 1 if rst object is preferred to second object (j k) 1 if second object is preferred to rst object (k j) First respondent gave the following responses: NOTE: V1 V2 V3 V4 V5 V6 (12) (13) (23) (14) (24) (34) (A,G) (A,P) (G,P) (A,N) (G,N) (P,N) 1 1 1 1-1 -1 missing responses should be coded with NA if the coding is not (1, 1) but any other two numbers, the smaller number means the rst object is preferred e.g. a coding with (0, 1) 0 means rst object preferred 6.11.2010 11 6.11.2010 12

prefmod LLBT WINE data Wine data preparations > library(prefmod) > load("wine.rdata") > wnames <- c("a", "G", "P", "N") produce barplots for all comparisons > par(mfrow = c(2, 3)) > for (i in 1:6) barplot(table(wine[, i])) > par(mfrow = c(1, 1)) prefmod LLBT WINE Example Function: llbtpc.fit() preparations > library(prefmod) > load("wine.rdata") > wnames <- c("a", "G", "P", "N") t simple model > m1 <- llbtpc.fit(wine, nitems = 4, obj.names = wnames) > m1 6.11.2010 13 6.11.2010 14 prefmod LLBT Wine Example - Result > m1 Call: gnm(formula = formula, eliminate = elim, family = poisson, data = dfr) Coefficients of interest: A G P N 1.118 1.073 0.794 NA Deviance: 2.1059 Pearson chi-squared: 2.1101 Residual df: 3 prefmod LLBT Wine Example - Output model t was done by the package gnm() We see (parameter) estimates for λ 1, λ 2, λ 3 and λ 4 = NA (is set to zero because of restriction for estimation) We do not see parameter estimates for µ 1, µ 2, µ 3, µ 4 these are nuisance parameters and they are eliminated by the procedure (tted, but not displayed) Does the model t? We can use the Deviance 2.106 degrees of freedom 3 > dev1 <- round(m1$deviance, digits = 5) > df1 <- m1$df.residual > prob1 <- 1 - pchisq(dev1, df1) > print(prob1) [1] 0.55073 Model t is OK (probability is > 0.05 ) 6.11.2010 15 6.11.2010 16

deviance prefmod LLBT Wine Example - Worth Functions: llbt.worth(), plotworth() Deviance: oij ln ( o ij e ij ) = n ij ln ( n ij m ij ) calculate worth parameters under full modell n ij = m ij > worth1 <- llbt.worth(m1) χ 2 statistic: (o ij e ij ) 2 e ij = (n ij m ij ) 2 m ij Remember: π j = exp(2λj) j exp(2λj) λ A = 1.118, λ G = 1.073, λ P = 0.794, λ N = 0 or get estimates > est1 <- llbt.worth(m1, outmat = "lambda") plot worth or estimates > plotworth(worth1, ylab = "estimated worth") > plotworth(est1) 6.11.2010 17 6.11.2010 18 prefmod LLBT Wine Example - Plot extraction of parameter estimates estimated worth 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 Preferences A G P N How to get p.e. from m1 (which is a gnm object ) all parameter estimates (µ 1, µ 2,... λ 1, λ 2,...) > m1$coefficients A G P N 1.1183 1.0731 0.7935 NA attr(,"eliminated") [1] 6.2136 6.1628 6.1760 5.6880 5.7241 5.9282 How to get parameter estimates ofinterest only?(λ 1, λ 2,...) > c1 <- coef(m1) > c1 Coefficients of interest: A G P N 1.1183 1.0731 0.7935 NA estimate 6.11.2010 19 6.11.2010 20

extraction of parameter estimates extraction of parameter estimates Replace last parameter (NA) with 0 and give names > c1 <- c(c1[1:3], 0) > names(c1) <- c("a", "G", "P", "N") How to get the µ's?(µ 1, µ 2,...) How to get s.e. from m1 > cov <- vcov(m1) > se <- sqrt(diag(cov)) How to make a condence interval plot > mu <- attr(m1$coefficients, "eliminated") > mu[1:6] [1] 6.2136 6.1628 6.1760 5.6880 5.7241 5.9282 > mu[4] [1] 5.688 > up1 <- c1 + se * 1.96 > lo1 <- c1 - se * 1.96 > cbind(lo1, c1, up1) lo1 c1 up1 A 1.05548 1.1183 1.18114 G 1.01080 1.0731 1.13542 P 0.73392 0.7935 0.85308 N 0.00000 0.0000 0.00000 6.11.2010 21 6.11.2010 22 Plot estimates and condence intevals > plot(1:4, c1, type = "b", ylim = c(0, 1.4)) > for (i in 1:4) { + lines(rep(i, 2), c(lo1[i], up1[i]), col = "red", lty = "dashed") + text(rep(i, 2), c1[i], names(c1)[i], pos = 2) + } c1 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 A G P N 1.0 1.5 2.0 2.5 3.0 3.5 4.0 1:4 LLBT Interpretation of parameters Parameter Interpretation log odds for preferring A to N ln P (A N) = ln P (N A) m (A N) N (A,N) m (N B) N (A,N) = ln m (A N) = ln m m (A N) ln m (N A) (N B) insert equations ln m (A N) = µ (A,N) + λ A λ N ln m (N A) = µ (A,N) + λ A λ N log odds preferring A to N are parameter estimates are stored in c2[1] and c2[4] λ A is 1.118 and λ N is 0 (reference object) 2(λ A λ N ) is 2.237 the odds for preferring A to N are exp(2.237) = 9.362 6.11.2010 23 6.11.2010 24

LLBT expected preferences LLBT Comparing observed and expected preferences How would you calculate the expected preferences for m (A G) and for m (G A)? the λ parameter estimates are: > coef(m1) Coefficients of interest: A G P N 1.1183 1.0731 0.7935 NA the estimates for the µs are: > attr(m1$coefficients, "eliminated") [1] 6.2136 6.1628 6.1760 5.6880 5.7241 5.9282 Hint 1: Use model formula to calculate the logarithm of the expected numbers with ln m (A G) = µ (A,G) + λ A λ G Hint 2: the expected numbers are exp(ln m (A G) ) The observed preferences (counts) n (i j) are stored in > m1$y 1.V11 1.V12 1.V21 1.V22 1.V31 1.V32 1.V41 1.V42 1.V51 1.V52 1.V61 1.V62 519 481 656 344 626 374 908 92 902 98 819 181 The expected preferences m (i j) are stored in > m1$fitted.values 1.V11 1.V12 1.V21 1.V22 1.V31 1.V32 1.V41 1.V42 1.V51 1.V52 522.59 477.41 656.92 343.08 636.27 363.73 903.49 96.51 895.32 104.68 1.V61 1.V62 830.19 169.81 The raw residuals are the dierence between observed and expected preferences > m1$y - m1$fitted.values 1.V11 1.V12 1.V21 1.V22 1.V31 1.V32 1.V41-3.58549 3.58549-0.92408 0.92408-10.27047 10.27047 4.50957 1.V42 1.V51 1.V52 1.V61 1.V62-4.50957 6.68499-6.68499-11.19455 11.19455 6.11.2010 25 6.11.2010 26 LLBT Example CEMS Example: CEMS exchange programme LLBT Extensions LLBT Extensions Overview students of the WU can study abroad visiting one of currently 17 CEMS universities undecided responses (ties) subject covariates aim of the study: preference orderings of students for dierent locations identify reasons for these preferences object specic covariates data: PC-responses about their choices of 6 selected CEMS universities for the semester abroad (London, Paris, Milan, Barcelona, St.Gall, Stockholm) answer: can not decide was allowed several covariates (e.g., gender, working status, language abilities, etc.) 6.11.2010 27 6.11.2010 28

LLBT Extensions undecided Response Undecided Response undecided 3 possible responses for each comparison between two objects j and k the response can be: j k k j j = k In our CEMSexample: comparing London (LO) to Paris (PA) the response can be: LO PA PA LO LO = PA LLBT Extensions undecided Response LLBT with Undecided Response Using the respecication of the probabilities suggested by Davidson and Beaver (1977): the LLBT model formulas for the comparison (jk) are now: ln m (j k) = µ (jk) + λ j λ k ln m (k j) = µ (jk) λ j + λ k ln m (j=k) = µ (jk) + γ where γ is the parameter for undecided response (could also be γ (jk) ) λ's are the object parameters µ's are nuisance parameters 6.11.2010 29 6.11.2010 30 LLBT Interpretation of parameters Interpretation of parameter γ log odds for preferring (j k) to "no decision" (j = k) ln P (j k) = ln m P (j k) ln m (j=k) (j=k) insert equations ln m (j k) = µ (jk) + λ j λ k ln m (j=k) = µ (jk) γ for λ j = λ k log odds preferring (j k) to "no decision" (j = k) is γ if γ < 0 : there is an advantage in favour of a decision xxx LLBT Design Structure with undecided PC decision counts µ γ λ 1 λ 2 λ 3 (12) O 1 n (1 2) 1 0 1-1 0 (12) no n (1=2) 1 1 0 0 0 (12) O 2 n (2 1) 1 0-1 1 0 (13) O 1 n (1 3) 2 0 1 0-1 (13) no n (1=3) 2 1 0 0 0 (13) O 3 n (3 1) 2 0-1 0 1 (23) O 2 n (2 3) 3 0 0 1-1 (23) no n (2=3) 3 1 0 0 0 (23) O 3 n (3 2) 3 0 0-1 1 Design matrix consists of µ, λ 1, λ 2, λ 3 and additionally γ where µ is a factor and the n's are the counts 6.11.2010 31 6.11.2010 32

prefmod LLBT CEMS Example Function: llbtpc.fit() preparations > library(prefmod) > load("cpc.rdata") > cities <- c("lo", "PA", "MI", "SG", "BA", "ST") t simple model > m2 <- llbtpc.fit(cpc, nitems = 6, obj.names = cities) > summary(m2) t model including eect for undecided prefmod LLBT CEMS Example Functions: llbt.worth(), plotworth() calculate worth parameters > worth2 <- llbt.worth(m2) > worth3 <- llbt.worth(m3) > worthmat <- cbind(worth2, worth3) plot them > plotworth(worthmat) > m3 <- llbtpc.fit(cpc, nitems = 6, undec = TRUE, obj.names = cities) > summary(m3) compare models > anova(m3, m2) 6.11.2010 33 6.11.2010 34 LLBT Extension Subject Covariates Subject Covariates Are the preference orderings dierent for dierent groups of subjects? For one subject covariate on s levels we have now ln m (j k) s = µ (jk)s + λ S s + (λ O j j where λ O λ OS + λ O js js ) (λ O k k + λo ks ks ) object parameters (for subject baseline group) interaction parameter between objects and subject category xxx LLBT Design structure with 1 subject covariate PC decison counts µ λ S γ λ O 1 λ O 2 λ O 3 λ OS 12 λ OS 22 λ OS 32 (12) O 1 n (1 2) 1 1 0 0 1-1 0 0 0 0 (12) no n (1=2) 1 1 0 1 0 0 0 0 0 0 (12) O 2 n (2 1) 1 1 0 0-1 1 0..... 0 0 0 (12) O 1 n (1 2) 2 4 1 0 1-1 0 1-1 0 (12) no n (1=2) 2 4 1 1 0 0 0 0 0 0 (12) O 2 n (2 1) 2 4 1 0-1 1 0..... -1 1 0 O and OS are parameter of interest λ S s µ's xing the margin for category s of covariate S (nuisance) nuisance parameters MU S not related to objects nuisance parameters 6.11.2010 35 6.11.2010 36

prefmod LLBT: One Subject Covariates CEMS Example prefmod LLBT CEMS Example - Plot Options for llbtpc.fit(): formel, elim Preferences t null model (without subject covariates) > mb0 <- llbtpc.fit(cpc, nitems = 6, undec = TRUE, formel = ~1, + elim = ~SEX, obj.names = cities) t model with dierent preference scales for SEX > mbsex <- llbtpc.fit(cpc, nitems = 6, undec = TRUE, formel = ~SEX, + elim = ~SEX, obj.names = cities) Plot for dierent preference scales for SEX female=1 and male =2 estimated worth 0.10 0.15 0.20 0.25 0.30 0.35 0.40 LO PA MI SG BA ST LO PA SG BA ST MI > wbsex <- llbt.worth(mbsex) > colnames(wbsex) <- c("female", "male") > plotworth(wbsex, ylab = "estimated worth") female male 6.11.2010 37 6.11.2010 38 LLBT Interpretation of parameters ofinterest λ O j j parameter estimate for O j for the reference group (all subjects covariates on level = 0 in dummy coding ) λ O js js change of λ O j j model: SEX for group s estimates for object PA reference group SEX1 λ P A SEX2 λ P A +λ P A:SEX2 LLBT Interpretation of parameters ofinterest > cmsw <- coef(mbsex) > cmsw Coefficients of interest: LO PA MI SG BA ST 0.8444125 0.5177037 0.2604805 0.1505150 0.0788050 NA u LO:SEX2 PA:SEX2 MI:SEX2 SG:SEX2 BA:SEX2-1.3203958-0.0994717-0.2341781-0.3117667 0.0647604 0.0042943 ST:SEX2 NA model: SEX estimates for object PA model in mbsex λ P A +λ P A:SEX2 SEX1 0.518 λ P female A = 0.518 SEX2 0.518-0.234 λ P male A = 0.284 6.11.2010 39 6.11.2010 40