Semiparametric Multinomial Logit Models for the Analysis of Brand Choice Behaviour




Thomas Kneib
Department of Statistics, Ludwig-Maximilians-University Munich
Joint work with Bernhard Baumgartner & Winfried J. Steiner, University of Regensburg
28.3.2007

Brand Choice Data
When purchasing a specific brand, the consumer is faced with a discrete set of alternatives.
One aim of marketing analyses: identify the influence of covariates on brand choice behaviour.
Two types of covariates:
- Global covariates: fixed for all categories, e.g. age or gender of the consumer.
- Brand-specific covariates: depending on the category, e.g. loyalty to a product, price, presence of special advertisement.
We will consider data on purchases of the most frequently bought brands of coffee, ketchup and yogurt.

Main characteristics of the data sets:

                    Coffee    Ketchup   Yogurt
Number of brands    five      three     five
Market share        53%       87%       74%
Sample size         49.83     26.82     66.679

Covariates:
- Loyalty: loyalty of the consumer to a specific brand.
- Reference price: internal reference price built through experience.
- Difference between reference price and price: deviation of the actual price from the reference price.
- Promotional activity: dummy variables for the presence of special promotion.
Loyalty and reference price are estimated based on an exponentially weighted average of former purchases.
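The slides do not spell out the exponentially weighted average. The following is a minimal sketch of one common form of such a recursion, not the authors' exact specification: the function name, the smoothing weight 0.8 and the starting value 0.5 are all illustrative assumptions.

```python
def exponentially_weighted(values, smoothing, start):
    """Return, for each purchase occasion, the weighted average of earlier values.

    values:    past observations (e.g. 1/0 purchase indicators for loyalty,
               or paid prices for the reference price)
    smoothing: weight on the previous state (hypothetical value, not from the slides)
    start:     initial state before any purchase is observed (also an assumption)
    """
    state = start
    history = []
    for v in values:
        history.append(state)  # covariate value available at this occasion
        state = smoothing * state + (1.0 - smoothing) * v
    return history

# Loyalty to one brand over four occasions: bought, not bought, bought, bought.
loyalty = exponentially_weighted([1, 0, 1, 1], smoothing=0.8, start=0.5)
```

Each entry uses only purchases made before the corresponding occasion, so the covariate does not leak information about the current choice.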

Model the decision using latent utilities associated with buying a specific brand $r$:
$$L_i^{(r)}, \quad r = 1, \ldots, k.$$
Note: we do not observe the utilities but only the brand choice decisions.
Rational behaviour: the consumer chooses the product that maximizes her/his utility:
$$Y_i = r \iff L_i^{(r)} = \max_{s=1,\ldots,k} L_i^{(s)}.$$
Express the utilities in terms of covariates and an error term:
$$L_i^{(r)} = u_i'\alpha^{(r)} + w_i^{(r)\prime}\delta + \varepsilon_i^{(r)}.$$

If the error term follows a standard extreme value distribution, we obtain the multinomial logit model
$$P(Y_i = r) = \frac{\exp(\eta_i^{(r)})}{1 + \sum_{s=1}^{k-1} \exp(\eta_i^{(s)})}, \quad r = 1, \ldots, k-1,$$
with
$$\eta_i^{(r)} = u_i'\alpha^{(r)} + (w_i^{(r)} - w_i^{(k)})'\delta = u_i'\alpha^{(r)} + \bar{w}_i^{(r)\prime}\delta,$$
where $\bar{w}_i^{(r)} = w_i^{(r)} - w_i^{(k)}$ and category $k$ serves as the reference category.
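The multinomial logit probabilities can be evaluated as in the following minimal sketch. The function name and the example predictor values are illustrative assumptions. The reference category $k$ gets predictor 0 and is returned last.

```python
import math

def mnl_probabilities(eta):
    """Category probabilities for one observation in a multinomial logit model.

    eta: list of the k-1 predictors eta^(1), ..., eta^(k-1);
         the reference category k has predictor 0 and is appended last.
    """
    # Shift by the largest predictor (including the reference value 0)
    # so that exp() cannot overflow; the shift cancels in the ratio.
    shift = max(0.0, max(eta))
    exp_eta = [math.exp(e - shift) for e in eta]
    exp_ref = math.exp(-shift)
    denom = exp_ref + sum(exp_eta)
    probs = [e / denom for e in exp_eta]
    probs.append(exp_ref / denom)  # probability of the reference category
    return probs

# Three brands, hypothetical predictors for brands 1 and 2.
probs = mnl_probabilities([0.5, -1.0])
```

With all predictors equal to zero, every brand receives probability $1/k$.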

Some marketing theories suggest the possibility of nonlinear influences of some of the covariates.
Example: adaptation level theory.
- Consumers compare prices to internal reference prices built through experience.
- Around the reference point (price equals reference price) there may be a region of indifference.
- This suggests a sigmoid-shaped covariate effect.
Semiparametric extensions of the multinomial logit model allow such hypotheses to be validated.

Semiparametric Multinomial Logit Models
Extend the linear predictor to a semiparametric predictor
$$\eta_i^{(r)} = u_i'\alpha^{(r)} + \bar{w}_i^{(r)\prime}\delta + \sum_{j=1}^{q} f_j^{(r)}(x_{ij}) + \sum_{j=q+1}^{p} \bar{f}_j(x_{ij}^{(r)}),$$
where
$$\bar{f}_j(x_{ij}^{(r)}) = f_j(x_{ij}^{(r)}) - f_j(x_{ij}^{(k)}).$$
The first sum collects category-specific effects of global covariates, the second sum collects effects of brand-specific covariates relative to the reference category $k$.
The functions $f_j^{(r)}$ and $f_j$ are modelled using penalised splines.
Represent a function $f(x)$ as a linear combination of B-spline basis functions:
$$f(x) = \sum_{m=1}^{M} \beta_m B_m(x).$$
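The basis expansion can be illustrated with a small pure-Python sketch based on the Cox-de Boor recursion. This is not taken from the slides or from BayesX: the function names, the equidistant knot grid and the cubic degree are assumptions made for the example.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions B_1(x), ..., B_M(x).

    knots: non-decreasing knot sequence; M = len(knots) - degree - 1.
    """
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [1.0 if knots[m] <= x < knots[m + 1] else 0.0
             for m in range(len(knots) - 1)]
    # Cox-de Boor recursion: raise the degree one step at a time.
    for d in range(1, degree + 1):
        new = []
        for m in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[m + d] > knots[m]:
                left = (x - knots[m]) / (knots[m + d] - knots[m]) * basis[m]
            if knots[m + d + 1] > knots[m + 1]:
                right = ((knots[m + d + 1] - x)
                         / (knots[m + d + 1] - knots[m + 1]) * basis[m + 1])
            new.append(left + right)
        basis = new
    return basis

def spline_value(x, knots, degree, beta):
    """f(x) = sum_m beta_m * B_m(x)."""
    return sum(b * B for b, B in zip(beta, bspline_basis(x, knots, degree)))

# Equidistant knots -3, -2, ..., 7 give M = 7 cubic basis functions
# that form a complete basis on the interval [0, 4].
knots = [float(t) for t in range(-3, 8)]
B = bspline_basis(1.5, knots, degree=3)
f_const = spline_value(1.5, knots, 3, [2.0] * 7)
```

Inside the interval covered by the full basis, the basis functions are non-negative and sum to one, so equal coefficients reproduce a constant function.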

[Figure: three panels showing the B-spline basis, the scaled B-splines, and the resulting B-spline fit.]

Use a large number of basis functions to guarantee enough flexibility, but add a penalty term to the likelihood to ensure smoothness.
Approximate derivative penalties are obtained by difference penalties, e.g.
$$\frac{1}{2\tau^2} \sum_{m=2}^{M} (\beta_m - \beta_{m-1})^2 \quad \text{(first order differences)},$$
$$\frac{1}{2\tau^2} \sum_{m=3}^{M} (\beta_m - 2\beta_{m-1} + \beta_{m-2})^2 \quad \text{(second order differences)}.$$
The smoothing parameter $\tau^2$ controls the trade-off between fidelity to the data ($\tau^2$ large) and smoothness ($\tau^2$ small).
Penalty terms in matrix notation:
$$\frac{1}{2\tau^2} \beta' K \beta$$
with penalty matrix $K = D'D$ and appropriate difference matrices $D$.
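The link between the difference penalties and the matrix form $\beta' K \beta$ with $K = D'D$ can be checked numerically. The sketch below uses plain Python lists; the helper names and the example coefficient vector are illustrative assumptions.

```python
def difference_matrix(M, order):
    """Difference matrix D of the given order for M coefficients."""
    # Start from the identity and difference neighbouring rows `order` times.
    D = [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(M)]
             for i in range(len(D) - 1)]
    return D

def penalty_matrix(M, order):
    """Penalty matrix K = D'D."""
    D = difference_matrix(M, order)
    return [[sum(row[i] * row[j] for row in D) for j in range(M)]
            for i in range(M)]

def quadratic_form(beta, K):
    """beta' K beta."""
    n = len(beta)
    return sum(beta[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))

beta = [1.0, 3.0, 2.0, 5.0]
first_order = quadratic_form(beta, penalty_matrix(4, 1))   # sum of squared first differences
second_order = quadratic_form(beta, penalty_matrix(4, 2))  # sum of squared second differences
```

A first order penalty leaves constant coefficient sequences unpenalised, a second order penalty leaves linear sequences unpenalised.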

Inference
Two different types of parameters in the model:
- Regression coefficients describing either parametric or semiparametric effects, and
- Smoothing parameters.
Penalised likelihood for the regression coefficients:
$$l_{pen}(\alpha, \delta, \beta) = l(\alpha, \delta, \beta) - \sum_{r=1}^{k-1} \sum_{j=1}^{q} \frac{1}{2(\tau_j^{(r)})^2} \beta_j^{(r)\prime} K_j \beta_j^{(r)} - \sum_{j=q+1}^{p} \frac{1}{2\tau_j^2} \beta_j' K_j \beta_j.$$
$l(\alpha, \delta, \beta)$ is the usual log-likelihood of a multinomial logit model.
Maximisation can be achieved by a slight modification of Fisher scoring.
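The slides do not state the modified Fisher scoring step. As a sketch for a single penalised coefficient vector $\beta$ with penalty matrix $K$ and smoothing parameter $\tau^2$, one standard form adds the penalty to both the score and the information. The symbols $s(\beta) = \partial l / \partial \beta$ (score) and $F(\beta)$ (expected Fisher information) are notation introduced here, not taken from the slides.

```latex
% Gradient and negative expected Hessian of l_pen = l - beta' K beta / (2 tau^2):
%   s_pen(beta) = s(beta) - K beta / tau^2
%   F_pen(beta) = F(beta) + K / tau^2
\beta^{(t+1)} = \beta^{(t)}
  + \left( F(\beta^{(t)}) + \tfrac{1}{\tau^2} K \right)^{-1}
    \left( s(\beta^{(t)}) - \tfrac{1}{\tau^2} K \beta^{(t)} \right)
```

For $\tau^2 \to \infty$ the penalty vanishes and the step reduces to ordinary Fisher scoring.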

Estimate smoothing parameters based on marginal likelihood:
$$L(\tau^2) = \int L_{pen}(\alpha, \delta, \beta, \tau^2) \, d\alpha \, d\delta \, d\beta \to \max_{\tau^2}.$$
A Laplace approximation to the integral yields a working Gaussian model, in which the integral becomes tractable.
Apply a Fisher scoring algorithm in the working model.
Marginal likelihood corresponds to restricted maximum likelihood estimation in Gaussian regression models.

Results
[Figure: estimated effects of loyalty, one panel each for Coffee, Ketchup and Yogurt; x-axis: Loyalty.]

[Figure: estimated effects of the reference price, one panel each for Coffee, Ketchup and Yogurt; x-axis: Reference price.]
[Figure: estimated effects of the difference between reference price and price, one panel each for Coffee, Ketchup and Yogurt; x-axis: Difference between reference price and price.]

Model Evaluation & Proper Scoring Rules
We propose to use a more complicated model. Is the increased model complexity necessary?
Validate the model based on its predictive performance.
What are suitable measures of predictive performance?
What is a prediction? We consider predictive distributions
$$\hat{\pi} = (\hat{\pi}^{(1)}, \ldots, \hat{\pi}^{(k)})$$
with the model probabilities $\hat{\pi}^{(r)} = \hat{P}(Y = r)$.
A scoring rule is a real-valued function $S(\hat{\pi}, r)$ that assigns a value to the event that category $r$ is observed when $\hat{\pi}$ is the predictive distribution.

Score: sum over the individuals in a validation data set,
$$S = \sum_{i=1}^{n} S(\hat{\pi}_i, r_i).$$
Let $\pi_0$ denote the true distribution and let $S(\hat{\pi}, \pi_0) = \sum_{r=1}^{k} \pi_0^{(r)} S(\hat{\pi}, r)$ be the expected score under $\pi_0$. Then a scoring rule is called
- proper if $S(\pi_0, \pi_0) \geq S(\hat{\pi}, \pi_0)$ for all $\hat{\pi}$,
- strictly proper if equality holds only if $\hat{\pi} = \pi_0$.
Some common examples:
Hit rate (proper but not strictly proper):
$$S(\hat{\pi}, r_i) = \begin{cases} \frac{1}{n} & \text{if } \hat{\pi}^{(r_i)} = \max\{\hat{\pi}^{(1)}, \ldots, \hat{\pi}^{(k)}\}, \\ 0 & \text{otherwise.} \end{cases}$$

Logarithmic score (strictly proper):
$$S(\hat{\pi}, r_i) = \log(\hat{\pi}^{(r_i)}).$$
Brier score (strictly proper):
$$S(\hat{\pi}, r_i) = -\sum_{r=1}^{k} \left( 1(r_i = r) - \hat{\pi}^{(r)} \right)^2.$$
Spherical score (strictly proper):
$$S(\hat{\pi}, r_i) = \frac{\hat{\pi}^{(r_i)}}{\sqrt{\sum_{r=1}^{k} (\hat{\pi}^{(r)})^2}}.$$
All scores are oriented so that larger values indicate better predictions.
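The scoring rules listed above can be computed for a validation sample as in the following minimal sketch. Categories are indexed from 0 here, and the function names and the two example predictions are illustrative assumptions. All scores are positively oriented, with the Brier score taken as the negative sum of squared deviations.

```python
import math

def hit(pi_hat, r):
    """1 if the observed category r has the largest predicted probability, else 0."""
    return 1.0 if pi_hat[r] == max(pi_hat) else 0.0

def logarithmic(pi_hat, r):
    """Log of the probability assigned to the observed category."""
    return math.log(pi_hat[r])

def brier(pi_hat, r):
    """Negative squared distance between the indicator vector and pi_hat."""
    return -sum(((1.0 if s == r else 0.0) - p) ** 2
                for s, p in enumerate(pi_hat))

def spherical(pi_hat, r):
    """Probability of the observed category divided by the Euclidean norm of pi_hat."""
    return pi_hat[r] / math.sqrt(sum(p * p for p in pi_hat))

def total_score(rule, predictions, observed):
    """Sum of the individual scores over a validation data set."""
    return sum(rule(p, r) for p, r in zip(predictions, observed))

# Two validation observations with three brands each (hypothetical numbers).
predictions = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
observed = [0, 2]

hit_rate = total_score(hit, predictions, observed) / len(observed)
log_score = total_score(logarithmic, predictions, observed)
brier_score = total_score(brier, predictions, observed)
spherical_score = total_score(spherical, predictions, observed)
```

The hit rate only uses the position of the largest predicted probability, whereas the other three rules also reward well-calibrated probabilities. This is why the hit rate is proper but not strictly proper.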

In our data sets:

                      Coffee                   Ketchup                  Yogurt
                      parametric   semipar.    parametric   semipar.    parametric   semipar.
Hit rate (est.)       .7           .7          .79          .79         .82          .83
Hit rate (pred.)      .7           .66         .78          .78         .83          .81
Logarithmic (est.)    -13816.9     -13491.45   -5146.61     -524.4      -852.6       -7923.95
Logarithmic (pred.)   -13955.8     -15682.87   -5297.58     -5225.32    -2461.49     -26588.13
Brier (est.)          -6912.34     -6789.38    -5192.6      -5222.72    -4261.52     -444.11
Brier (pred.)         -693.3       -7646.83    -299.25      -2962.46    -12919.96    -12416.39
Spherical (est.)      1212.9       12181.2     6455.8       645.31      12678.38     12798.71
Spherical (pred.)     1293.57      11588.11    7688.11      771.58      37965.96     38231.85

Coffee data: parametric model seems sufficient.
Ketchup data: improved performance with the semiparametric model.
Yogurt data: some indication of a need for semiparametric extensions, but no definite answer.

Software
The proposed methodology is implemented in the software package BayesX.
Stand-alone software for additive and geoadditive regression models.
Supports exponential family regression, categorical regression and hazard regression for continuous time survival analysis.
The current version is Windows-only, but a Linux version and a connection to R are work in progress.
Available from http://www.stat.uni-muenchen.de/~bayesx

Summary
Semiparametric extension of the well-known multinomial logit model.
Fully automated fit (including smoothing parameters).
Model validation based on proper scoring rules.
Reference: Kneib, T., Baumgartner, B. & Steiner, W. J. (2007). Semiparametric Multinomial Logit Models for Analysing Consumer Choice Behaviour. Under revision for AStA Advances in Statistical Analysis.
A place called home: http://www.stat.uni-muenchen.de/~kneib