Bayesian Inference of Arrival Rate and Substitution Behavior from Sales Transaction Data with Stockouts




Benjamin Letham (1), Lydia M. Letham (2), and Cynthia Rudin (3)

(1) Operations Research Center, Massachusetts Institute of Technology, bletham@mit.edu
(2) Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, lmletham@mit.edu
(3) Computer Science and Artificial Intelligence Laboratory and Sloan School of Management, Massachusetts Institute of Technology, rudin@mit.edu

Abstract

When an item goes out of stock, sales transaction data no longer reflect the original customer demand, since some customers leave with no purchase while others substitute alternative products for the one that was out of stock. We provide a Bayesian hierarchical model for inferring the underlying customer arrival rate and choice model from sales transaction data and the corresponding stock levels. The model uses a nonhomogeneous Poisson process to allow the arrival rate to vary throughout the day, and allows for a variety of choice models including nonparametric models. Model parameters are inferred using a stochastic gradient MCMC algorithm that can scale to large transaction databases. We fit the model to data from a local bakery and show that it is able to make accurate out-of-sample predictions. The model indicates that some bakery items experienced substantial lost sales, whereas others, due to substitution, did not.

1 Introduction

An important common challenge facing retailers is to understand customer preferences in the presence of stockouts. When an item is out of stock, some customers will leave, while others will substitute a different product. From the transaction data collected by retailers, it is challenging to determine exactly what the customer's original intent was, or, because of no-purchase arrivals, even how many customers there actually were. The task that we consider here is to infer both the arrival rate of all customers, including those who left without a purchase, and the substitution model from sales transaction and stock level data. These quantities are a necessary input for inventory management and assortment planning problems.
In this paper we apply the model and inference procedure to bakery data to estimate lost sales due to stock unavailability. We will see that for some items there are substantial lost sales, while for others, due to substitution, there are not. Knowing which items are being substituted and which are not will help the retailer to better focus resources.

There are several contributions made by our model. First, we allow the model for the arrival rate to be nonhomogeneous in time. For example, in our experiments with bakery data we treat each day as a time period and model the arrival rate with a function that peaks at the busiest time for the bakery and then tapers off towards the end of the day. Nonhomogeneous arrival rates are likely to be present in many retail settings where stockouts are common. For example, in our experiments we use transaction data from a bakery, where many of the items are intended to stock out every day as they must be baked fresh the next morning. As we will see in Section 5, the daily arrival rate at the bakery is far from constant. As another example, Johnson et al. (2014) describe a relatively new industry of retailers that operate flash sales in which the most popular items quickly stock out. Using data from one of these retailers they show that the purchase rate has a peak near the start of the sale and then decreases.

The second major contribution is that our model can incorporate practically any choice model, including nonparametric models. The third is that the model allows for multiple customer segments, each with their own substitution models. We show how this can be used to borrow strength across data from multiple stores. Finally, our inference is fully Bayesian. In many cases the model parameters are not of interest per se, but are to be used for making predictions and decisions. In Section 1.2 we discuss how Bayesian inference provides a natural framework for incorporating the uncertainty in the inference into the decisions that are based on the inference.

In this paper we describe the model and the Bayesian inference procedure. We then use a series of simulations to illustrate the inference, and to show that we can recover the true, generating values. Finally, we demonstrate how the model can be fit to real transaction data obtained from a local bakery. We use the results to estimate the bakery's lost sales due to stock unavailability.

1.1 Prior Work

The primary work on estimating demand and choice from sales transaction data with stockouts was done by Vulcano et al. (2012). They model customer arrivals using a homogeneous Poisson process within each time period, meaning the arrival rate is constant throughout each time period. Customers then choose an item, or an unobserved no-purchase, according to the multinomial logit (MNL) choice model. They show that when the no-purchase customers are not observed, the MNL choice model parameters are not all identifiable. Rather, the retailer must conjecture the proportion of arrivals that do not purchase anything even when all items are in stock.
They derive an EM algorithm to solve the corresponding maximum likelihood problem. Our model uses a nonhomogeneous Poisson process for customer arrivals that allows the arrival rate to vary throughout each time period. The nonhomogeneity will prove important when we work with real data in Section 5, which are nonhomogeneous throughout the day. Our model also does not require using the MNL model and can be used with models that are entirely identifiable, thus no longer requiring the retailer to know beforehand the unobserved proportion of no-purchases. The exogenous model that we describe in Section 2.3.3 is one such model that we use. Finally, we take a Bayesian approach to inference, which comes with advantages over maximum likelihood estimation in using the model to make predictions, as we describe in Section 1.2.

Anupindi et al. (1998) also present a method for estimating demand and choice probabilities from transaction data with stockouts. Customer arrivals are modeled with a homogeneous Poisson process and purchase probabilities are modeled explicitly for each stock combination, as opposed to using a choice model. They find the maximum likelihood estimates for the arrival rate and purchase probabilities. Their model does not scale well to a large number of items as the likelihood expression includes all stock combinations found in the data.

Vulcano and van Ryzin (2014) extend the work of Vulcano et al. (2012) to incorporate nonparametric choice models, for which maximum likelihood estimation becomes a large-scale concave program that must be solved via a mixed integer program subproblem. Our model naturally incorporates nonparametric models from a pre-specified subset of relevant types. Their approach generates relevant types, but requires a constant arrival rate over time periods and involves a computationally intensive optimization.
There is work on estimating demand and choice in settings different from that which we consider here, such as discrete time (Talluri and van Ryzin, 1; Vulcano et al., 2010), panel or aggregate sales data (Campo et al., 2003; Kalyanam et al., 2007; Musalem et al., 2010), negligible no purchases (Kök and Fisher, 2007), and online learning with simultaneous ordering decisions (Jain et al., 2015).

Jain et al. (2015) provide an excellent review of the various threads of research in demand and choice estimation.

1.2 The Bayesian Approach

Suppose we have data $\mathbf{t}$ and latent model parameters $z$. A common estimation approach is the maximum likelihood estimate: $z^* \in \arg\max_z p(\mathbf{t} \mid z)$. Suppose now that there is another quantity $Q$ that we wish to predict, which depends on the model parameters according to the probability model $p(Q \mid z)$. For instance, the lost sales due to stockouts is one such quantity that we estimate. Using the maximum likelihood estimate, the estimate of $Q$ given $\mathbf{t}$ is $Q \sim p(Q \mid z^*)$, from which samples can be drawn with Monte Carlo sampling. In Bayesian inference, the objective is not a single point estimate; rather, it is to draw samples from the posterior distribution $p(z \mid \mathbf{t})$. Given these samples, we can estimate the actual posterior distribution of $Q$:

$$p(Q \mid \mathbf{t}) = \int p(Q \mid z)\, p(z \mid \mathbf{t})\, dz.$$

The posterior distribution of $Q$ incorporates all of the uncertainty in $z$ directly into the estimate of $Q$. Suppose that there is a range of values of $z$ with similar likelihood to $z^*$, but that produce very different values of $Q$. The uncertainty in $z$ that remains after observing $\mathbf{t}$ will be translated to the corresponding uncertainty in $Q$.

2 A Generative Model for Transaction Data with Stockouts

We begin by introducing the notation that we use to describe the observed data. We then introduce the nonhomogeneous model for customer arrivals, followed by a discussion of various possible choice models. Section 2.4 discusses how multiple customer segments are modeled. We then in Section 2.5 introduce the likelihood model: the probabilistic model for how the data are generated. Finally, Section 2.6 discusses the prior distributions, at which point the model is ready for inference.

2.1 The Data

We suppose that we have data from a collection of stores $\sigma = 1, \ldots, S$. For each store, data come from a number of time periods $l = 1, \ldots, L^\sigma$, throughout each of which time varies from 0 to $T$. For example, in our experiments a time period was one day. We consider a collection of items $i = 1, \ldots, n$. We suppose that we have two types of data: purchase times and stock levels.
We denote the number of purchases of item $i$ in time period $l$ at store $\sigma$ as $m_i^{\sigma,l}$. Then, we let $\mathbf{t}_i^{\sigma,l} = \{t_{i,1}^{\sigma,l}, \ldots, t_{i,m_i^{\sigma,l}}^{\sigma,l}\}$ be the observed purchase times of item $i$ in time period $l$ at store $\sigma$. For notational convenience, we let $\mathbf{t}^{\sigma,l} = \{\mathbf{t}_i^{\sigma,l}\}_{i=1}^{n}$ be the collection of all purchase times for that store and time period, and let $\mathbf{t} = \{\mathbf{t}^{\sigma,l}\}_{l=1,\ldots,L^\sigma;\ \sigma=1,\ldots,S}$ be the complete set of arrival time data. A table of all of the notation used throughout the paper is given in Appendix A.

In addition to purchase times, we suppose that we know the stock levels. We denote the known initial stock level as $N_i^{\sigma,l}$ and assume that stocks are not replenished throughout the time period. That is, $m_i^{\sigma,l} \le N_i^{\sigma,l}$ and equality implies a stockout. As before, we let $N^{\sigma,l}$ and $N$ represent respectively the collection of initial stock data for store $\sigma$ and time period $l$, and for all stores and all time periods.

Given $\mathbf{t}_i^{\sigma,l}$ and $N_i^{\sigma,l}$, we can compute a stock indicator as a function of time. We define this indicator function as

$$s_i(t \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}) = \begin{cases} 0 & \text{if item } i \text{ is out of stock at time } t, \\ 1 & \text{if item } i \text{ is in stock at time } t. \end{cases}$$
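The stock indicator can be computed directly from the purchase times and the initial stock. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, purchase times are assumed sorted, and we assume the convention that an item still counts as in stock at the instant its last unit is sold (so that every observed purchase occurs while the item is in stock).

```python
from bisect import bisect_left


def stock_indicator(purchase_times, initial_stock, t):
    """Return s_i(t): 1 if the item is in stock at time t, else 0.

    purchase_times: sorted purchase times of one item in one time period.
    initial_stock:  initial stock level N_i (no replenishment in the period).
    Assumed convention: the item is in stock at the instant of its last sale.
    """
    sold_before_t = bisect_left(purchase_times, t)  # purchases strictly before t
    return 1 if sold_before_t < initial_stock else 0


def stockout_time(purchase_times, initial_stock, T):
    """Time at which the item sells out, or T if it never sells out."""
    if initial_stock == 0:
        return 0.0
    if len(purchase_times) >= initial_stock:
        return purchase_times[initial_stock - 1]
    return T


# Hypothetical data: three purchases of an item that started with 3 units.
purchases = [1.0, 2.5, 4.0]
in_stock_at_last_sale = stock_indicator(purchases, 3, 4.0)   # last unit sold at 4.0
in_stock_afterwards = stock_indicator(purchases, 3, 4.1)     # sold out after 4.0
```

Because stocks are never replenished, the indicator for each item is a step function with at most one step, which is what makes the integrated purchase rates later in the model cheap to evaluate.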

2.2 Modeling Customer Arrivals

We model the times of customer arrivals using a nonhomogeneous Poisson process (NHPP). An NHPP is a generalization of the Poisson process that allows for the intensity to be described by a function $\lambda(t)$ as opposed to being constant. We assume that the intensity function has been parameterized, with parameters $\eta^\sigma$ potentially different for each store. For example, if we set $\lambda(t \mid \eta^\sigma) = \eta_1^\sigma$ we obtain a homogeneous Poisson process of rate $\eta_1^\sigma$. As another example, we can produce an intensity function that rises to a peak and then decays by letting

$$\lambda(t \mid \eta^\sigma) = \frac{\eta_1^\sigma \eta_3^\sigma}{\eta_2^\sigma} \left(\frac{t}{\eta_2^\sigma}\right)^{\eta_3^\sigma - 1} \left(1 + \left(\frac{t}{\eta_2^\sigma}\right)^{\eta_3^\sigma}\right)^{-2}, \tag{1}$$

which is the derivative of the Hill equation (Goutelle et al., 2008). This is the parameterization that we use in our bakery data experiments. The modeler chooses a parameterization for the rate function that is appropriate for their data source, but does not choose the actual values of $\eta^\sigma$. The posterior distribution of $\eta^\sigma$ will be inferred. To do this we use the conditional density function for NHPP arrivals, which we provide now.

Lemma 1. Consider arrival times $t_1, t_2, \ldots$ generated by an NHPP with intensity function $\lambda(t \mid \eta^\sigma)$. Then,

$$p(t_j \mid t_{j-1}, \eta^\sigma) = \exp\left(-\Lambda(t_{j-1}, t_j \mid \eta^\sigma)\right) \lambda(t_j \mid \eta^\sigma),$$

where $\Lambda(t_{j-1}, t_j \mid \eta^\sigma) = \int_{t_{j-1}}^{t_j} \lambda(t \mid \eta^\sigma)\, dt$.

The proof is given in Appendix B. We let $\eta = \{\eta^\sigma\}_{\sigma=1}^{S}$ represent the complete collection of rate function parameters to be inferred.

2.3 Models for Substitution Behavior

We have modeled customers arriving according to an NHPP described by parameters $\eta^\sigma$. In the next piece of the model, each of those customers will either purchase an item or will choose the no-purchase option. Whether they purchase an item and which item they purchase will depend on the stock availability as well as some choice model parameters which we will describe below. We define $f_i(s(t), \phi^k, \tau^k)$ to be the probability that a customer purchases product $i$ given the current stock $s(t)$ and choice model parameters $\phi^k$ and $\tau^k$. The index $k$ indicates the parameters for a particular customer segment, which we will discuss in Section 2.4. The modeler is free to choose whatever form for the choice function $f_i$ he or she finds to be most appropriate.
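As a concrete aside on the arrival model of Section 2.2, the sketch below evaluates a Hill-derivative intensity, its integral $\Lambda$, the conditional log-density of Lemma 1, and draws arrivals by thinning. It is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the parameter values in `eta`, and the choice of thinning as the sampler are all ours, and the rate is taken to be the derivative of $\eta_1 u / (1 + u)$ with $u = (t/\eta_2)^{\eta_3}$.

```python
import math
import random


def hill_rate(t, eta1, eta2, eta3):
    """Intensity lambda(t): derivative of eta1 * u / (1 + u), u = (t / eta2) ** eta3."""
    if t <= 0.0:
        return 0.0  # convention for this sketch; arrivals live on (0, T]
    u = (t / eta2) ** eta3
    return eta1 * eta3 * u / (t * (1.0 + u) ** 2)


def hill_cumulative(a, b, eta1, eta2, eta3):
    """Lambda(a, b): closed-form integral of hill_rate over [a, b], 0 <= a <= b."""
    def hill(t):
        u = (t / eta2) ** eta3
        return eta1 * u / (1.0 + u)
    return hill(b) - hill(a)


def log_interarrival_density(t_prev, t_next, eta):
    """Lemma 1: log p(t_next | t_prev) = -Lambda(t_prev, t_next) + log lambda(t_next)."""
    return -hill_cumulative(t_prev, t_next, *eta) + math.log(hill_rate(t_next, *eta))


def sample_nhpp(rate, rate_bound, T, rng):
    """Thinning: propose from a homogeneous process of rate rate_bound >= rate(t)
    and keep a proposal at time t with probability rate(t) / rate_bound."""
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(rate_bound)
        if t > T:
            return arrivals
        if rng.random() * rate_bound < rate(t):
            arrivals.append(t)


eta = (30.0, 3.0, 3.0)  # hypothetical parameter values, for illustration only
T = 10.0
# Upper bound on the rate from a fine grid, with a small safety margin.
bound = 1.01 * max(hill_rate(0.001 * k, *eta) for k in range(1, 10001))
arrivals = sample_nhpp(lambda t: hill_rate(t, *eta), bound, T, random.Random(0))
```

With this form, $\eta_1$ is the expected number of arrivals over an unbounded horizon, $\eta_2$ sets the time scale, and $\eta_3$ controls how sharply the rate peaks.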
Posterior distributions for the parameters $\phi^k$ and $\tau^k$ are then inferred. We now discuss how several common choice models fit into this framework, and we use these choice models in our simulation and data experiments.

2.3.1 Choice with No Substitution

Here we let the parameters $\phi_1^k, \ldots, \phi_n^k$ specify a preference distribution over products, that is, $\phi_i^k \ge 0$ and $\sum_{i=1}^{n} \phi_i^k = 1$. Each customer selects a product according to that distribution. If they select a product that is out of stock then there is no substitution; they leave as a no-purchase:

$$f_i^{\mathrm{ns}}(s(t), \phi^k) = s_i(t)\, \phi_i^k.$$

The distribution $\phi^k$ describes exactly the primary demand, and the parameter $\tau^k$ is not used.

2.3.2 Multinomial Logit Choice

The MNL is a popular choice model that derives from a random utility model. As in the previous model, $\phi^k$ specifies a preference distribution over products. When an item goes out of stock, substitution takes place by transferring purchase probability to the other items proportionally to their original probability, including to the no-purchase option. In order to have positive probability of customers substituting to the no-purchase option, a proportion of arrivals must be no-purchases even when all items are in stock. We let $\tau^k / (1 + \tau^k)$ be the no-purchase probability when all items are in stock, and obtain the MNL choice probabilities by normalizing with the preference vector $\phi^k$ accordingly:

$$f_i^{\mathrm{mnl}}(s(t), \phi^k) = \frac{s_i(t)\, \phi_i^k}{\tau^k + \sum_{v=1}^{n} s_v(t)\, \phi_v^k}.$$

Vulcano et al. (2012) show that the MNL model parameter $\tau^k$ is not identifiable when the arrival function is also unknown, and so they assume it to be a known, fixed parameter.

2.3.3 Single-Substitution Exogenous Model

The exogenous model overcomes some shortcomings of the MNL choice model, and allows for the no-purchase option to be chosen only if there is a stock unavailability. According to the exogenous proportional substitution model (Kök and Fisher, 2007), a customer samples a first choice from the preference distribution $\phi^k$. If that item is available, he or she purchases the item. If the first choice is not available, with probability $1 - \tau^k$ the customer leaves as no-purchase. With the remaining $\tau^k$ probability, the customer picks a second choice according to a preference vector that has been re-weighted to exclude the first choice. Specifically, if the first choice was $j$ then the probability of choosing $i$ as the second choice is $\phi_i^k / \sum_{v \ne j} \phi_v^k$. If the second choice is in stock it is purchased, otherwise the customer leaves as no-purchase. The formula for the purchase probability follows directly:

$$f_i^{\mathrm{exo}}(s(t), \phi^k, \tau^k) = s_i(t)\, \phi_i^k + s_i(t)\, \tau^k \sum_{j=1}^{n} (1 - s_j(t))\, \phi_j^k\, \frac{\phi_i^k}{\sum_{v \ne j} \phi_v^k}. \tag{2}$$

For this model, posterior distributions for both $\phi^k$ and $\tau^k$ are inferred.

2.3.4 Nonparametric Choice Model

Nonparametric models often offer a lucid description of substitution behavior. Rather than being a probability vector as in the parametric models, here the parameter $\phi^k$ is an ordered subset of the items $\{1, \ldots, n\}$. Customers purchase $\phi_1^k$ if it is in stock. If not, they purchase $\phi_2^k$ if it is in stock. If not, they continue substituting down $\phi^k$ until they reach the first item that is available. If none of the items in $\phi^k$ are available, they leave as a no-purchase. The purchase probability for this model is then

$$f_i^{\mathrm{np}}(s(t), \phi^k) = \begin{cases} 1 & \text{if } i = \phi_{j^*}^k \text{ with } j^* = \min\{ j \in \{1, \ldots, |\phi^k|\} : s_{\phi_j^k}(t) = 1 \}, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Because this model requires all customers to behave exactly the same, it is most useful when customers are modeled as coming from a number of different segments $k$, each with its own preference ranking $\phi^k$. This is precisely what we do in our model, as we describe in the next section. For the nonparametric model the rank orders for each segment $\phi^k$ are fixed and it is the distribution of customers across segments that is inferred.
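The four choice functions above translate almost line by line into code. The following Python sketch is illustrative only: the function names are hypothetical, items are indexed from 0 rather than 1, `s` is a list of 0/1 stock indicators at the current time, and the parameter values in the usage lines are chosen purely for illustration.

```python
def f_no_substitution(i, s, phi):
    """Section 2.3.1: purchase only if the sampled first choice is in stock."""
    return s[i] * phi[i]


def f_mnl(i, s, phi, tau):
    """Section 2.3.2: MNL probabilities renormalized over in-stock items."""
    return s[i] * phi[i] / (tau + sum(s_v * p_v for s_v, p_v in zip(s, phi)))


def f_exogenous(i, s, phi, tau):
    """Equation (2): first choice, then at most one proportional substitution."""
    if s[i] == 0:
        return 0.0
    total = sum(phi)
    # Sum over out-of-stock first choices j of P(first = j) * P(second = i | j).
    second = sum(phi[j] * phi[i] / (total - phi[j])
                 for j in range(len(phi)) if s[j] == 0)
    return phi[i] + tau * second


def f_nonparametric(i, s, ranking):
    """Equation (3): buy the first in-stock item of the ordered list `ranking`."""
    for item in ranking:
        if s[item] == 1:
            return 1.0 if item == i else 0.0
    return 0.0  # nothing in the ranking is available: no-purchase


phi, tau = [0.75, 0.2, 0.05], 0.75   # illustrative values
s = [0, 1, 1]                        # item 0 is out of stock
purchase_probs = [f_exogenous(i, s, phi, tau) for i in range(3)]
no_purchase = 1.0 - sum(purchase_probs)
```

In this example the no-purchase probability under the exogenous model equals $\phi_1 (1 - \tau)$: the only customers lost are those whose first choice was the out-of-stock item and who declined to substitute.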

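2.4 Segments and Mixtures of Choice Models

We model customers as each coming from one of $K$ segments $k = 1, \ldots, K$, each with its own choice model parameters $\phi^k$ and $\tau^k$. Let $\theta^\sigma$ be the customer segment distribution for store $\sigma$, with $\theta_k^\sigma$ the probability that an arrival at store $\sigma$ belongs to segment $k$, $\theta_k^\sigma \ge 0$, and $\sum_{k=1}^{K} \theta_k^\sigma = 1$. As with other variables, we denote the collection of segment distributions across all stores as $\theta$. Similarly, we denote the collections of choice model parameters across all segments as $\phi$ and $\tau$.

For the nonparametric choice model, each of these segments would have a different rank ordering of items, and multiple segments are required in order to have a diverse set of preferences. For the MNL and exogenous choice models, customer segments can be used to borrow strength across multiple stores. All stores share the same underlying segment parameters $\phi$ and $\tau$, but each store's arrivals are represented by a different mixing of these segments, $\theta^\sigma$. This model allows us to use data from all of the stores for inferring the choice model parameters, while still allowing stores to differ from each other by having a different mixture of segments.

2.5 The Likelihood Model

We now describe the underlying model for how customer segments, choice models, stock levels, and the arrival function all interact to create transaction data. Consider store $\sigma$ and time period $l$. Customers arrive according to the NHPP for this store. Let $\tilde{t}_1^{\sigma,l}, \ldots, \tilde{t}_{\tilde{m}^{\sigma,l}}^{\sigma,l}$ represent all of the arrival times; these are unobserved, as they may include no-purchases. Each arrival has probability $\theta_k^\sigma$ of belonging to segment $k$. They then purchase an item or leave as no-purchase according to the choice model $f_i$. If the $j$th arrival purchases an item then we observe that purchase at time $\tilde{t}_j^{\sigma,l}$; if they leave as no-purchase we do not observe that arrival at all. The generative model for the observed data is thus:

For store $\sigma = 1, \ldots, S$:
    For time period $l = 1, \ldots, L^\sigma$:
        Sample customer arrival times $\tilde{t}_1^{\sigma,l}, \ldots, \tilde{t}_{\tilde{m}^{\sigma,l}}^{\sigma,l} \sim \mathrm{NHPP}(\lambda(t \mid \eta^\sigma), T)$.
        For customer arrival $j = 1, \ldots, \tilde{m}^{\sigma,l}$:
            Sample this customer's segment as $k \sim \mathrm{Multinomial}(\theta^\sigma)$.
            Choose item $i$ for this customer's purchase with probability $f_i(s(\tilde{t}_j^{\sigma,l} \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)$, or the no-purchase option with probability $1 - \sum_{i=1}^{n} f_i(s(\tilde{t}_j^{\sigma,l} \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)$.
            If item $i$ is purchased, add the time to $\mathbf{t}_i^{\sigma,l}$.

We now provide the likelihood function corresponding to this generative model.

Theorem 1. The log-likelihood function of $\mathbf{t}$ is:

$$\log p(\mathbf{t} \mid \eta, \theta, \phi, \tau, N, T) = \sum_{\sigma=1}^{S} \sum_{l=1}^{L^\sigma} \sum_{i=1}^{n} \left( \sum_{j=1}^{m_i^{\sigma,l}} \log \tilde{\lambda}_i^{\sigma,l}(t_{i,j}^{\sigma,l}) - \tilde{\Lambda}_i^{\sigma,l}(0, T) \right),$$

where $\tilde{\lambda}_i^{\sigma,l}(t) = \lambda(t \mid \eta^\sigma) \sum_{k=1}^{K} \theta_k^\sigma f_i(s(t \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)$ and $\tilde{\Lambda}_i^{\sigma,l}(0, T) = \int_0^T \tilde{\lambda}_i^{\sigma,l}(t)\, dt$.

The result is actually that which would be obtained if we treated the purchases for each item as independent NHPPs with rate $\tilde{\lambda}_i^{\sigma,l}(t)$, which is the purchase rate for item $i$ incorporating stock availability and customer choice. In reality, however, they are not independent NHPPs inasmuch as they depend on each other via the stock function $s(t \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l})$. The key element of the proof is that while the purchase processes depend on each other, they do not depend on the no-purchase arrivals.

Proof of Theorem 1. We consider the density function for the complete arrivals $\tilde{\mathbf{t}}^{\sigma,l}$, which include both the observed arrivals $\mathbf{t}^{\sigma,l}$ as well as the unobserved arrivals that left as no-purchase, which we here denote $\mathbf{t}_0^{\sigma,l} = \{t_{0,j}^{\sigma,l}\}_{j=1}^{m_0^{\sigma,l}}$. Let $f_0(s(t \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k) = 1 - \sum_{i=1}^{n} f_i(s(t \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)$ be the probability that a customer of segment $k$ chooses the no-purchase option. Also, let $\pi_i(s(t \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma) = \sum_{k=1}^{K} \theta_k^\sigma f_i(s(t \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)$ be the probability that a randomly chosen arrival purchases product $i$, or the no-purchase $i = 0$. Finally, we set $\tilde{I}_j^{\sigma,l}$ equal to $i$ if the customer at time $\tilde{t}_j^{\sigma,l}$ purchased item $i$, or 0 if this customer left as no-purchase. For store $\sigma$ and time period $l$,

$$\begin{aligned}
p(\mathbf{t}^{\sigma,l}, \mathbf{t}_0^{\sigma,l} \mid \eta^\sigma, \theta^\sigma, \phi, \tau, N, T)
&= P\left(\text{no arrivals in } (\tilde{t}_{\tilde{m}^{\sigma,l}}^{\sigma,l}, T] \mid \eta^\sigma\right) p(\tilde{t}_1^{\sigma,l} \mid \eta^\sigma)\, p(\tilde{I}_1^{\sigma,l} \mid \tilde{t}_1^{\sigma,l}, \theta^\sigma, \phi, \tau, N) \\
&\quad \times \prod_{j=2}^{\tilde{m}^{\sigma,l}} p(\tilde{t}_j^{\sigma,l} \mid \tilde{t}_1^{\sigma,l}, \ldots, \tilde{t}_{j-1}^{\sigma,l}, \eta^\sigma)\, p(\tilde{I}_j^{\sigma,l} \mid \tilde{t}_1^{\sigma,l}, \ldots, \tilde{t}_j^{\sigma,l}, \theta^\sigma, \phi, \tau, N) \\
&= \exp\left(-\Lambda(\tilde{t}_{\tilde{m}^{\sigma,l}}^{\sigma,l}, T \mid \eta^\sigma)\right) \lambda(\tilde{t}_1^{\sigma,l} \mid \eta^\sigma) \exp\left(-\Lambda(0, \tilde{t}_1^{\sigma,l} \mid \eta^\sigma)\right) \pi_{\tilde{I}_1^{\sigma,l}}(s(\tilde{t}_1^{\sigma,l} \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma) \\
&\quad \times \prod_{j=2}^{\tilde{m}^{\sigma,l}} \lambda(\tilde{t}_j^{\sigma,l} \mid \eta^\sigma) \exp\left(-\Lambda(\tilde{t}_{j-1}^{\sigma,l}, \tilde{t}_j^{\sigma,l} \mid \eta^\sigma)\right) \pi_{\tilde{I}_j^{\sigma,l}}(s(\tilde{t}_j^{\sigma,l} \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma) \\
&= \exp\left(-\Lambda(0, T \mid \eta^\sigma)\right) \prod_{j=1}^{\tilde{m}^{\sigma,l}} \lambda(\tilde{t}_j^{\sigma,l} \mid \eta^\sigma)\, \pi_{\tilde{I}_j^{\sigma,l}}(s(\tilde{t}_j^{\sigma,l} \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma) \\
&= \exp\left(-\Lambda(0, T \mid \eta^\sigma)\right) \prod_{i=0}^{n} \prod_{j : \tilde{I}_j^{\sigma,l} = i} \lambda(\tilde{t}_j^{\sigma,l} \mid \eta^\sigma)\, \pi_i(s(\tilde{t}_j^{\sigma,l} \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma) \\
&= \exp\left(-\Lambda(0, T \mid \eta^\sigma)\right) \prod_{i=0}^{n} \prod_{j=1}^{m_i^{\sigma,l}} \lambda(t_{i,j}^{\sigma,l} \mid \eta^\sigma)\, \pi_i(s(t_{i,j}^{\sigma,l} \mid \mathbf{t}^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma) \\
&= \exp\left(-\tilde{\Lambda}_0^{\sigma,l}(0, T)\right) \prod_{j=1}^{m_0^{\sigma,l}} \tilde{\lambda}_0^{\sigma,l}(t_{0,j}^{\sigma,l}) \prod_{i=1}^{n} \exp\left(-\tilde{\Lambda}_i^{\sigma,l}(0, T)\right) \prod_{j=1}^{m_i^{\sigma,l}} \tilde{\lambda}_i^{\sigma,l}(t_{i,j}^{\sigma,l}).
\end{aligned}$$

The second equality uses Lemma 1, and the final uses Lemma 2 from the appendix. We have then that

$$\begin{aligned}
p(\mathbf{t}^{\sigma,l} \mid \eta^\sigma, \theta^\sigma, \phi, \tau, N, T)
&= \int p(\mathbf{t}^{\sigma,l}, \mathbf{t}_0^{\sigma,l} \mid \eta^\sigma, \theta^\sigma, \phi, \tau, N, T)\, d\mathbf{t}_0^{\sigma,l} \\
&= \left( \int \exp\left(-\tilde{\Lambda}_0^{\sigma,l}(0, T)\right) \prod_{j=1}^{m_0^{\sigma,l}} \tilde{\lambda}_0^{\sigma,l}(t_{0,j}^{\sigma,l})\, d\mathbf{t}_0^{\sigma,l} \right) \prod_{i=1}^{n} \exp\left(-\tilde{\Lambda}_i^{\sigma,l}(0, T)\right) \prod_{j=1}^{m_i^{\sigma,l}} \tilde{\lambda}_i^{\sigma,l}(t_{i,j}^{\sigma,l}) \\
&= \prod_{i=1}^{n} \exp\left(-\tilde{\Lambda}_i^{\sigma,l}(0, T)\right) \prod_{j=1}^{m_i^{\sigma,l}} \tilde{\lambda}_i^{\sigma,l}(t_{i,j}^{\sigma,l}),
\end{aligned}$$

using Corollary 1 from the appendix. Given the model parameters, data are generated independently for each $\sigma$ and $l$, thus

$$\begin{aligned}
\log p(\mathbf{t} \mid \eta, \theta, \phi, \tau, N, T)
&= \sum_{\sigma=1}^{S} \sum_{l=1}^{L^\sigma} \log p(\mathbf{t}^{\sigma,l} \mid \eta^\sigma, \theta^\sigma, \phi, \tau, N, T) \\
&= \sum_{\sigma=1}^{S} \sum_{l=1}^{L^\sigma} \sum_{i=1}^{n} \left( \sum_{j=1}^{m_i^{\sigma,l}} \log \tilde{\lambda}_i^{\sigma,l}(t_{i,j}^{\sigma,l}) - \tilde{\Lambda}_i^{\sigma,l}(0, T) \right).
\end{aligned}$$

We show in Appendix B how $\tilde{\Lambda}_i^{\sigma,l}(0, T)$ can be expressed in terms of $\Lambda(0, T \mid \eta^\sigma)$ and thus computed efficiently.

2.6 Prior Distributions and the Log-Posterior

To do Bayesian inference we must specify a prior distribution for each of the latent variables: $\eta$, $\theta$, and $\phi$ and $\tau$ as required by the choice model. The variables $\theta^\sigma$, $\phi^k$, and $\tau^k$ are all probability vectors, so the natural choice is to assign them a Dirichlet or Beta prior:

$$\theta^\sigma \sim \mathrm{Dirichlet}(\alpha), \qquad \phi^k \sim \mathrm{Dirichlet}(\beta),\ k = 1, \ldots, K, \qquad \tau^k \sim \mathrm{Beta}(\gamma),\ k = 1, \ldots, K.$$

Here $\alpha$, $\beta$, and $\gamma$ are prior hyperparameters chosen by the modeler. If there is actually some expert knowledge about the choice models and segment distributions then it can be encoded in these hyperparameters. Otherwise, a natural choice is to use a uniform prior distribution by setting each of these hyperparameters to be a vector of ones. In our experiments, we used uniform priors. Similarly, for $\eta^\sigma$, a natural choice for the prior distribution is a uniform distribution for each element:

$$\eta_v^\sigma \sim \mathrm{Uniform}(\delta_v), \qquad v = 1, \ldots, |\eta^\sigma|, \quad \sigma = 1, \ldots, S.$$

In our experiments we chose the interval $\delta_v$ large enough to not be restrictive. For the Hill rate that we use in our data experiments, $|\eta^\sigma| = 3$. We can then compute the prior probability as

$$\begin{aligned}
p(\eta, \theta, \phi, \tau \mid \alpha, \beta, \gamma, \delta)
&= \left( \prod_{\sigma=1}^{S} p(\theta^\sigma \mid \alpha) \right) \left( \prod_{k=1}^{K} p(\phi^k \mid \beta)\, p(\tau^k \mid \gamma) \right) \left( \prod_{\sigma=1}^{S} \prod_{v=1}^{|\eta^\sigma|} p(\eta_v^\sigma \mid \delta_v) \right) \\
&\propto \left( \prod_{\sigma=1}^{S} \prod_{k=1}^{K} (\theta_k^\sigma)^{\alpha_k - 1} \right) \left( \prod_{k=1}^{K} (\tau^k)^{\gamma_1 - 1} (1 - \tau^k)^{\gamma_2 - 1} \prod_{i=1}^{n} (\phi_i^k)^{\beta_i - 1} \right) \left( \prod_{\sigma=1}^{S} \prod_{v=1}^{|\eta^\sigma|} \mathbb{1}\{\eta_v^\sigma \in [\delta_v^1, \delta_v^2]\} \right).
\end{aligned} \tag{4}$$

Bayes' theorem yields:

$$\log p(\eta, \theta, \phi, \tau \mid \mathbf{t}, \alpha, \beta, \gamma, \delta, N, T) = \log p(\mathbf{t} \mid \eta, \theta, \phi, \tau, N, T) + \log p(\eta, \theta, \phi, \tau \mid \alpha, \beta, \gamma, \delta) + \text{const}, \tag{5}$$

and these two quantities are available in Theorem 1 and in (4). With this result we are now equipped to do posterior inference.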
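To make Theorem 1 concrete, the sketch below evaluates the log-likelihood contribution of a single store and time period. It is a minimal illustration under stated assumptions rather than the paper's implementation: the function and argument names are hypothetical, items are indexed from 0, an item is taken to be in stock up to and including the instant of its last sale, and the integrated purchase rates are computed by splitting $[0, T]$ at the stockout times, between which the stock vector (and hence each purchase probability) is constant.

```python
import math


def log_likelihood_period(purchases, stock, T, rate, cum_rate, theta, choice, params):
    """Log-likelihood of one store and time period, following Theorem 1.

    purchases[i]: sorted purchase times of item i; stock[i]: initial stock N_i.
    rate(t), cum_rate(a, b): arrival intensity and its integral over [a, b].
    theta[k]: segment probabilities; choice(i, s, *params[k]): probability f_i.
    """
    n = len(purchases)
    # Stockout time of each item: its N_i-th purchase, or T if it never sells out.
    out = []
    for times, n_i in zip(purchases, stock):
        if n_i == 0:
            out.append(0.0)
        elif len(times) >= n_i:
            out.append(times[n_i - 1])
        else:
            out.append(T)

    def stock_at(t):
        return [1 if t <= o else 0 for o in out]

    def pi(i, s):  # probability that a random arrival buys item i
        return sum(th * choice(i, s, *p_k) for th, p_k in zip(theta, params))

    ll = 0.0
    for i in range(n):  # sum of log purchase rates at the observed purchases
        for t in purchases[i]:
            ll += math.log(rate(t) * pi(i, stock_at(t)))
    # Integrated purchase rates: stock is constant between consecutive cuts.
    cuts = sorted(set([0.0, T] + [o for o in out if 0.0 < o < T]))
    for a, b in zip(cuts, cuts[1:]):
        s = stock_at(0.5 * (a + b))
        ll -= cum_rate(a, b) * sum(pi(i, s) for i in range(n))
    return ll


def no_substitution(i, s, phi):
    return s[i] * phi[i]


# One item that every customer wants, homogeneous rate 2, sold out at t = 3.
ll = log_likelihood_period([[1.0, 2.0, 3.0]], [3], 10.0,
                           lambda t: 2.0, lambda a, b: 2.0 * (b - a),
                           [1.0], no_substitution, [([1.0],)])
```

In the usage example the integrated purchase rate only accumulates over $[0, 3]$, the interval during which the item is available, so the result is $3 \log 2 - 6$ rather than $3 \log 2 - 20$.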

3 Stochastic Gradient MCMC Inference

We use Markov chain Monte Carlo (MCMC) techniques to simulate posterior samples, specifically the stochastic gradient Riemannian Langevin dynamics (SGRLD) algorithm of Patterson and Teh (2013). This algorithm uses a stochastic gradient that does not require the full likelihood function to be evaluated in every MCMC iteration, thus allowing posterior inference to be done even on very large transaction databases. Also, the SGRLD algorithm is well suited for variables on the probability simplex, as are $\theta^\sigma$, $\phi^k$, and $\tau^k$. Metropolis-Hastings sampling is difficult in this setting because it requires evaluating the full likelihood as well as dealing with the simplex constraints in the proposal distribution.

3.1 The Expanded-Mean Parameterization

We first transform each of the probability variables using the expanded-mean parameterization (Patterson and Teh, 2013). The latent variable $\theta^\sigma$ has as constraints $\theta_k^\sigma \ge 0$ and $\sum_{k=1}^{K} \theta_k^\sigma = 1$. Take $\tilde{\theta}^\sigma$ a random variable with support on $\mathbb{R}_+^K$. We give $\tilde{\theta}^\sigma$ a prior distribution consisting of a product of $\mathrm{Gamma}(\alpha_k, 1)$ distributions:

$$p(\tilde{\theta}^\sigma \mid \alpha) \propto \prod_{k=1}^{K} (\tilde{\theta}_k^\sigma)^{\alpha_k - 1} \exp(-\tilde{\theta}_k^\sigma).$$

The posterior sampling is done over variables $\tilde{\theta}^\sigma$ by mirroring any negative proposal values about 0. We then compute $\theta_k^\sigma = \tilde{\theta}_k^\sigma / \sum_{r=1}^{K} \tilde{\theta}_r^\sigma$. This parameterization is equivalent to sampling on $\theta^\sigma$ with a $\mathrm{Dirichlet}(\alpha)$ prior, but does not require the probability simplex constraint. The same transformation is done to $\phi^k$ and $\tau^k$.

3.2 Riemannian Langevin Dynamics

Let $z = \{\eta, \tilde{\theta}, \tilde{\phi}, \tilde{\tau}\}$ represent the complete collection of transformed latent variables whose posterior we are inferring. From state $z_w$ on MCMC iteration $w$, the next iteration moves to the state $z_{w+1}$ according to

$$z_{w+1} = z_w + \frac{\epsilon_w}{2} \left( \mathrm{diag}(z_w)\, \nabla \log p(z_w \mid \mathbf{t}, \alpha, \beta, \gamma, \delta, N, T) + \mathbf{1} \right) + \mathrm{diag}(z_w)^{1/2}\, \psi, \qquad \psi \sim \mathcal{N}(0, \epsilon_w I).$$

The iteration performs a gradient step plus normally distributed noise, using the natural gradient of the log posterior, which is the manifold direction of steepest descent using the metric $G(z) = \mathrm{diag}(z)^{-1}$. From (5),

$$\nabla \log p(z_w \mid \mathbf{t}, \alpha, \beta, \gamma, \delta, N, T) = \nabla \log p(\mathbf{t} \mid z_w, N, T) + \nabla \log p(z_w \mid \alpha, \beta, \gamma, \delta).$$
We use a stochastic gradient approximation for the likelihood gradient. On MCMC iteration $w$, rather than use all $L^\sigma$ time periods to compute the gradient we use a uniformly sampled collection of time periods $\mathcal{L}_w^\sigma$. The gradient approximation is then

$$\nabla \log p(\mathbf{t} \mid z_w, N, T) \approx \sum_{\sigma=1}^{S} \frac{L^\sigma}{|\mathcal{L}_w^\sigma|} \sum_{l \in \mathcal{L}_w^\sigma} \sum_{i=1}^{n} \nabla \left( \sum_{j=1}^{m_i^{\sigma,l}} \log \tilde{\lambda}_i^{\sigma,l}(t_{i,j}^{\sigma,l}) - \tilde{\Lambda}_i^{\sigma,l}(0, T) \right).$$

The iterations will converge to the posterior samples if the step size schedule is chosen such that $\sum_{w=1}^{\infty} \epsilon_w = \infty$ and $\sum_{w=1}^{\infty} \epsilon_w^2 < \infty$ (Welling and Teh, 2011). In our simulations and experiments we used three time periods for the stochastic gradient approximations. We followed Patterson and Teh (2013) and took $\epsilon_w = a(1 + w/b)^{-c}$, with step size parameters chosen using cross-validation to minimize out-of-sample perplexity. We drew 1, samples from each of three chains initialized at a local maximum a posteriori solution found from a random sample from the prior. We verified convergence using the Gelman-Rubin diagnostic after discarding the first half of the samples as burn-in (Gelman and Rubin, 1992), and then merged samples from all three chains to estimate the posterior. In Appendix C we give the analytical likelihood gradient, as well as the gradients for each of the choice models and rate functions previously described.
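The update of Section 3.2 and the expanded-mean transformation of Section 3.1 can be sketched on a toy problem. The example below is an assumption-laden illustration, not the paper's sampler: it replaces the transaction likelihood with a simple multinomial likelihood over observed segment counts (so the exact posterior is a known Dirichlet), uses the full gradient rather than a minibatch, and all function names and step size values are hypothetical.

```python
import math
import random


def step_size(w, a, b, c):
    """Schedule eps_w = a * (1 + w / b) ** (-c); 0.5 < c <= 1 meets both sum conditions."""
    return a * (1.0 + w / b) ** (-c)


def sgrld_step(z, grad_log_post, eps, rng):
    """z + eps/2 * (diag(z) grad + 1) + diag(z)^(1/2) psi, psi ~ N(0, eps I), mirrored at 0."""
    grad = grad_log_post(z)
    z_next = []
    for z_i, g_i in zip(z, grad):
        drift = 0.5 * eps * (z_i * g_i + 1.0)
        noise = math.sqrt(z_i) * rng.gauss(0.0, math.sqrt(eps))
        z_next.append(abs(z_i + drift + noise))  # mirror negative values about 0
    return z_next


def to_simplex(z):
    """Map expanded-mean variables to a probability vector."""
    total = sum(z)
    return [z_i / total for z_i in z]


def make_grad(counts, alpha):
    """Gradient of the log posterior of z for a toy multinomial likelihood
    prod_k (z_k / sum(z)) ** counts[k] with independent Gamma(alpha, 1) priors."""
    n_total = sum(counts)

    def grad(z):
        r = sum(z)
        return [(alpha - 1.0 + n_k) / z_k - 1.0 - n_total / r
                for n_k, z_k in zip(counts, z)]
    return grad


def run_chain(counts, alpha, n_iter, seed):
    rng = random.Random(seed)
    grad = make_grad(counts, alpha)
    z = [1.0] * len(counts)
    samples = []
    for w in range(n_iter):
        z = sgrld_step(z, grad, step_size(w, 0.005, 1000.0, 0.6), rng)
        samples.append(to_simplex(z))
    return samples


samples = run_chain([30, 10], alpha=1.0, n_iter=20000, seed=0)
kept = samples[len(samples) // 2:]  # discard the first half as burn-in
posterior_mean = sum(theta[0] for theta in kept) / len(kept)
```

For this toy likelihood the exact posterior of the first proportion is Dirichlet with mean $(1 + 30)/(2 + 40) \approx 0.74$, which gives a simple check that the chain is behaving sensibly before the same update is applied to the full model.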

Figure 1: Normalized histograms of posterior samples of $\theta_1^\sigma$ for each of the three stores used in the simulation. The vertical line indicates the true value.

4 Simulation Study

We use a collection of simulations to illustrate and analyze the model and the inference procedure. The simulation results show that, for a variety of rate functions and choice models, the posterior samples concentrate around the true, generating values. Furthermore, the posterior becomes more concentrated around the true values as the amount of data increases.

4.1 Homogeneous Rate and Exogenous Choice

The first set of simulations used the homogeneous rate function $\lambda(t \mid \eta^\sigma) = \eta_1^\sigma$ and the exogenous choice model given in (2). We set the number of segments $K = 2$, the number of items $n = 3$, and set the choice model parameters to $\tau^1 = \tau^2 = 0.75$, $\phi^1 = [0.75, 0.2, 0.05]$, and $\phi^2 = [0.33, 0.33, 0.34]$. We simulated data from three stores ($S = 3$), for each of which the segment distribution $\theta^\sigma$ was chosen independently at random from a uniform Dirichlet distribution and the arrival rate $\eta_1^\sigma$ was chosen independently at random from a uniform distribution on [, ]. For each store, we simulated 5 time periods, each of length T = 1 and with the initial stock for each item chosen uniformly between and 5, independently at random for each item, time period, and store. Purchase data were then generated according to the generative model in Section 2.5. This simulation was repeated 1 times, each with different random initializations of $\eta$ and $\theta$. Inference was done with the prior hyperparameter for $\eta_1^\sigma$, $\delta_1$, set to [, ].

To illustrate the result of the inference, Figure 1 shows the posterior density for $\theta^\sigma$ for one of the simulations, as estimated by MCMC sampling. The figure shows that the posterior samples are concentrated around the true values. The posterior densities for this same simulation for all of the parameters ($\eta$, $\theta$, $\tau$, and $\phi$) are given in Figures 1-17 in Appendix D.
Figure 2 shows the posterior means estimated from the MCMC samples across all of the 1 repeats of the simulation, showing that across the full range of parameter values used in these simulations the posterior mean was close to the true value.

4.2 Hill Rate and Exogenous Choice

In a second set of simulations, we used the same design as the first set but replaced the homogeneous arrival rate with the Hill arrival rate, given in (1). We did only one simulation, with the rate function parameters $\eta^\sigma = [3, 3, 3]$ to obtain a mean rate similar to that of the simulations in the previous section. In the inference, we used prior hyperparameters $\delta_1 = [, ]$, $\delta_2 = [, ]$, and $\delta_3 = [, ]$. Figure 18 shows the posterior distribution of $\eta^1$. Figure 3 shows posterior samples of the rate function $\lambda(t \mid \eta^1)$. The posterior estimates of the rate function closely match the rate function used

Figure 2: Markers in the top panel show, for each randomly chosen value of $\eta^\sigma_1$ used in the set of simulations (3 stores × 1 simulations), the corresponding estimate of the posterior mean. The bottom panel shows the same result for each value of $\theta^\sigma_k$ used (3 stores × 2 segments × 1 simulations).

Figure 3: Each gray line is the rate function $\lambda(t \mid \eta^1)$ evaluated using a $\eta^1$ randomly sampled from the posterior, with a total of such samples. The blue line is the true rate function for this simulation.

to generate the data.

Figure 4: Posterior density for the non-zero segment proportions from a simulation with nonparametric choice. The corresponding ordering $\phi^k$ is given below each panel. Panels: $\theta^1_1$, {1}; $\theta^1_4$, {1, 2}; $\theta^1_9$, {3, 2}.

4.3 Hill Rate and Nonparametric Choice

In the final set of simulations we use the Hill rate function with the nonparametric choice function from (3), with 3 items. We used all sets of preference rankings of size 1 and 2, which for 3 items requires a total of 9 segments. We simulated data for a single store, with the segment proportion $\theta^1_k$ set to .33 for preference rankings {1}, {1, 2}, and {3, 2}. The segment proportions for the remaining preference rankings were set to zero. With this simulation we also study the effect of the number of time periods used in the inference, $L^1$. $L^1$ was taken from {5, 1, 5, 5, 1}, and for each of these values 1 simulations were done.

Figure 4 shows the posterior densities for the non-zero segment proportions $\theta^1_k$, for one of the simulations with $L^1 = 5$. The posterior densities for the other six segment proportions are in Figure 19, and are all concentrated near zero. Figure 5 describes how the posterior depends on the number of time periods. The top panel shows that the posterior mean tends closer to the true value as more data are made available. The bottom panel shows the actual concentration of the posterior, where the interquartile range of the posterior decreases with the number of time periods. Because we use a stochastic gradient approximation, using more time periods came at no additional computational cost: we used 3 time periods for each gradient approximation regardless of the available number.

5 Data Experiments

We now provide the results of the model applied to real transaction data. We used the real data to evaluate the predictive power of the model, and to compute a posterior distribution of lost sales due to stockouts. We obtained one semester of sales data from the bakery at 100 Main Marketplace, a cafe located at MIT.
The data were for a collection of breakfast pastries (bagel, scone, and croissant) and for a collection of cookies (oatmeal, double chocolate, and chocolate chip). The data set included all purchase times for 151 days; we treated each day as a time period. For the breakfast pastries the time period was from 7: a.m. to : p.m., and for the cookies the time period was from 11: a.m. to 7: p.m. The breakfast pastries comprised a total of 39 purchases, and the cookies comprised purchases. Stock data were not available, only purchase times, so for the purpose of these experiments we set the initial stock for each time period equal to the number of purchases for the time period; thus every item was treated as stocked out after its last recorded purchase. This is a reasonable assumption given that these are perishable baked goods, whose stock levels are designed to stock out by the end of the day.
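The stock assumption above can be made concrete with a short sketch. The function name and the example times are hypothetical; the rule it encodes is the one stated in the text, namely that the initial stock equals the day's purchase count, so an item is treated as out of stock once all of its recorded purchases have occurred.

```python
def stock_indicator(t, purchase_times, initial_stock):
    """Return 1 if the item is in stock at time t, else 0.

    The item is in stock while the number of purchases strictly before t
    is smaller than the initial stock.
    """
    sold_before_t = sum(1 for u in purchase_times if u < t)
    return 1 if sold_before_t < initial_stock else 0

# Hypothetical purchase times (minutes after opening) for one item on one day.
times = [12.0, 47.5, 130.0]
stock = len(times)  # initial stock set to the number of purchases, as in the text

in_stock_early = stock_indicator(60.0, times, stock)   # before the last purchase
in_stock_late = stock_indicator(200.0, times, stock)   # after the last purchase
```

An item with no recorded purchases on a given day is treated as out of stock for that entire day under this rule.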

Figure 5: Each marker corresponds to the posterior distribution for $\theta^1_k$ from a simulation with the corresponding number of time periods $L^1$, across the 3 values of $k$ where the true value equaled .33. The top panel shows the posterior mean for each of the simulations across the different number of time periods. The bottom panel shows the interquartile range (IQR) of the posterior.

Figure 6: In black is a normalized histogram of the purchase times for the breakfast pastries (purchase rate in min$^{-1}$ against time of day), across all 151 days. Each blue line is a posterior sample for the model fit of this quantity, the average purchase rate defined in Section 5.1.

The empirical purchase rates for the two sets of items, shown in Figures 6 and 9, were markedly nonhomogeneous, so we used the Hill rate function from (1). For all of the data experiments we took the rate prior hyperparameters to be $\delta_1 = [, ]$, $\delta_2 = [1, 1]$, and $\delta_3 = [, 1]$, which we found to be a large enough range so as to be unrestrictive. For each set of items we fit the model using both the exogenous and nonparametric choice models.

5.1 Breakfast Pastries

We began by fitting the breakfast pastry data using the nonparametric choice model. Figure 6 shows the actual purchase times in the data set across all three items, along with random posterior samples from the model's predicted average purchase rate over all time periods, which equals

\[ \frac{1}{151} \sum_{l=1}^{151} \sum_{i=1}^{3} \tilde{\lambda}_i^{1,l}(t). \]

The purchase rate shows a significant morning rush, as is expected for these types of items at a bakery. Figure 7 shows the posterior densities for the segment probabilities $\theta$. The densities indicate that customers whose first choice is bagel are generally willing to substitute, those whose first choice is croissant less so, and customers seeking a scone are generally unwilling to substitute.

The model was also fit using the exogenous choice model, with $K = 1$ customer segment. The posterior densities for $\phi$ are given in Appendix E, in Figure 20. The posterior density for the substitution rate $\tau^1$ is given in Figure 8.

5.2 Cookies

We then fit the model to the cookie dataset using the nonparametric choice model. The empirical average purchase rate is given in Figure 9, along with posterior samples for the model's predicted average purchase rate, defined as in Section 5.1. The purchase rate shows a lunch time rush, followed by a sustained afternoon rate that finally tapers off in the evening. There are also significant rushes during the periods between afternoon classes.
The Hill rate function that we use is not able to capture these afternoon peaks; however, the model can incorporate any integrable rate function. Given a rate function that can produce three peaks, the inference would proceed in the same way.
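To make concrete what the model needs from a rate function, here is a minimal sketch of a Hill-type rate and its integral. The parameterization is an assumption made for illustration: the cumulative arrivals follow the Hill curve $\eta_1 u / (1 + u)$ with $u = (t/\eta_3)^{\eta_2}$, and the rate is its time derivative. The function names and parameter values are hypothetical.

```python
def hill_rate(t, eta1, eta2, eta3):
    """Derivative of the Hill curve eta1 * u / (1 + u), u = (t / eta3) ** eta2 (t > 0)."""
    u = (t / eta3) ** eta2
    return eta1 * eta2 * u / (t * (1.0 + u) ** 2)

def hill_mean_value(t1, t2, eta1, eta2, eta3):
    """Integral of hill_rate from t1 to t2, available in closed form."""
    def hill(t):
        u = (t / eta3) ** eta2
        return eta1 * u / (1.0 + u)
    return hill(t2) - hill(t1)

# Hypothetical parameters: eta1 total expected arrivals, eta2 steepness, eta3 half-point.
eta = (100.0, 3.0, 2.0)
expected_arrivals = hill_mean_value(0.5, 6.0, *eta)
```

Any other rate function can be swapped in the same way, provided both the rate and its integral over an interval can be evaluated.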

Figure 7: Normalized histograms of posterior samples for each segment proportion, for the breakfast pastries with the nonparametric choice model. The corresponding ordered list for each segment is indicated. Panels: $\theta^1_1$, {bagel}; $\theta^1_2$, {scone}; $\theta^1_3$, {croissant}; $\theta^1_4$, {bagel, scone}; $\theta^1_5$, {bagel, croissant}; $\theta^1_6$, {scone, bagel}; $\theta^1_7$, {scone, croissant}; $\theta^1_8$, {croissant, bagel}; $\theta^1_9$, {croissant, scone}.

Figure 8: Normalized histogram of posterior samples of the exogenous choice model substitution rate $\tau$, for the breakfast pastry data.

Figure 9: A normalized histogram of purchase times for the cookies (purchase rate in min$^{-1}$ against time of day), across time periods, along with posterior samples for the model's corresponding predicted purchase rate.

Figure 10: Normalized histogram of posterior samples of the exogenous choice model substitution rate $\tau$, for the cookie data.

The uncertainty in the posterior is clear from the variance in the samples in Figure 9. This uncertainty is the motivation for using the full posterior in making predictions, as described in Section 1.. The posterior density for $\theta$ is given in the appendix, in Figure 21. The model was also fit using the exogenous choice model, and the densities for $\phi$ and $\tau$ are given in Figures 22 and 10 respectively.

5.3 Predictive Performance

Now that we have fit the model to data, it is important to establish that it has predictive power. We evaluated the predictive power of the model by predicting out-of-sample purchase counts during periods of varying stock availability. We took 80% of the time periods (120 time periods) as training data and did posterior inference. The remaining 31 time periods were held out as test data. We considered each possible level of stock unavailability, i.e., $s = [1, 0, 0]$, $s = [0, 1, 0]$, etc. For each stock level, we found all of the time intervals in the test periods with that stock. The prediction task was, given only the time intervals and the corresponding stock level, to predict the total number of purchases that took place during those time intervals in the test periods. The actual number of purchases is known and thus predictive performance can be evaluated. This is a meaningful prediction task because good performance requires being able to accurately

model exactly the two main components of our model: the arrival rate as a function of time, and how the actual purchases then depend on the stock. We did this task using the nonparametric and exogenous choice models as done in Sections 5.1 and 5.2. We also did the prediction task using the maximum likelihood model with a homogeneous arrival rate and the MNL choice model, which is similar to the model used by Vulcano et al. (2012). For the MNL model we set $\tau^1 = .1$, as it cannot be inferred.

Posterior densities for the predicted counts for the breakfast pastries are given in Figure 11. These were obtained as described in Section 1.. Despite their very different natures, the predictions made by the exogenous and nonparametric models are quite similar, and are both consistent with the true values for all stock levels. The model with a homogeneous arrival rate and MNL choice is unable to accurately predict the purchase rates, most likely because of the poor model for the arrival rate. Figure 23 shows the same results for the cookies data.

Figure 11: Posterior densities for the number of purchases during test set intervals with the indicated stock availability for items [bagel, scone, croissant]. The density in blue is for the nonparametric choice, red is for the exogenous choice, and gray is for a homogeneous arrival rate with MNL choice. The vertical line indicates the true value. Panels: $s = [0,1,1]$, $[0,0,1]$, $[1,0,1]$, $[1,0,0]$, $[1,1,0]$, $[0,1,0]$.

5.4 Lost Sales Due to Stockouts

Once the model parameters have been inferred, we can estimate what the sales would have been had there not been any stockouts. We estimated posterior densities for the number of purchases of each item across 151 time periods, with full stock. In Figures 12 and 13 we compare those densities to the actual number of purchases in the data, for the cookies and breakfast pastry data respectively.
For each of the three cookies, the actual number of purchases was significantly less than the posterior density for purchases with full stock, indicating that there were substantial lost sales due to stock unavailability. With the nonparametric model, the difference between the full-stock posterior mean and the actual number of purchases was 791 oatmeal cookies, 77 double chocolate cookies, and 1535 chocolate chip cookies.
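The comparison between full-stock posterior samples and observed purchases can be summarized as in the sketch below. It is illustrative only: the helper names are hypothetical, the samples are made up, and the equal-tailed interval with linear interpolation is one common choice rather than necessarily the one used for the reported results.

```python
import math

def quantile(sorted_xs, q):
    """Quantile of pre-sorted values, with linear interpolation between ranks."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1.0 - frac) + sorted_xs[hi] * frac

def lost_sales_summary(full_stock_samples, observed, level=0.95):
    """Compare observed purchases with posterior samples of full-stock purchases."""
    xs = sorted(full_stock_samples)
    tail = (1.0 - level) / 2.0
    lower, upper = quantile(xs, tail), quantile(xs, 1.0 - tail)
    mean = sum(xs) / len(xs)
    return {
        "lost_sales_mean": mean - observed,          # full-stock mean minus observed
        "interval": (lower, upper),                  # equal-tailed credible interval
        "outside_interval": not (lower <= observed <= upper),
    }

samples = list(range(101))  # made-up posterior samples of full-stock purchases
summary = lost_sales_summary(samples, observed=40)
```

An observed count below the lower end of the interval is the pattern the text interprets as evidence of lost sales.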

Figure 12: For the cookie data, posterior densities for the number of purchases during all periods, if there had been no stockouts. The blue density is the result with the nonparametric choice model, and the red with the exogenous. The vertical line indicates the number of purchases in the data. Panels: oatmeal, double chocolate, chocolate chip.

Figure 13: For the breakfast pastry data, posterior densities for the number of purchases during all periods, if there had been no stockouts. The blue density is the result with the nonparametric choice model, and the red with the exogenous. The vertical line indicates the number of purchases in the data. Panels: bagel, scone, croissant.

Figure 13 shows the results for the breakfast pastries. Here the results do not support substantial lost sales due to stockouts. For the nonparametric model, the 95% credible interval for the full-stock number of bagel purchases is 195 951, which contains the actual value of 1 and so is not indicative of lost sales. The number of scone purchases also lies within the full-stock 95% credible interval. Only for croissants does the actual number of purchases fall outside the 95% credible interval, with a difference of 531 croissants between the full-stock posterior mean and the observed purchases.

Figures 8 and 10 give some insight into the different impact of stockouts on sales for the two sets of items. These figures show the posterior densities for the exogenous model substitution rate $\tau^1$, for the breakfast pastries and cookies respectively. The posterior mean of $\tau^1$ for the breakfast pastries was .7, whereas for the cookies it was .. These results indicate that customers are much less willing to substitute cookies, hence the lost sales.

6 Discussion

We have developed a Bayesian model for inferring primary demand and consumer choice in the presence of stockouts. The model can incorporate a realistic model of the customer arrival rate, and is flexible enough to handle a variety of different choice models.
Our model is closely related to

models like latent Dirichlet allocation, used in the machine learning community for topic modeling (Blei et al., 2003). Variants of topic models are regularly applied to very large text corpora, with a large body of research on how to effectively infer these models. That research was the source of the stochastic gradient MCMC algorithm that we used, which allows inference from even very large transaction databases. In our data experiments, sampling took just a few minutes on a standard laptop computer.

The simulation study showed that when data are actually generated from the model, we are able to recover the true, generating values. It further showed that the posterior bias and variance decrease as more data are made available, an improvement without any additional computational cost due to the stochastic gradient.

We applied the model and inference to real sales transaction data from a local bakery. The daily purchase rate in the data was clearly nonhomogeneous, with a rush of purchases. The rush of purchases illustrates the importance of modeling nonhomogeneous arrival rates in many retail settings. In a prediction task that required accurate modeling of both the arrival rate and the choice model, we showed that the model was able to make accurate predictions and significantly outperformed the baseline approach. Finally, we showed how the model can be used to estimate a specific quantity of interest: lost sales due to stockouts. For bagels and scones there was no indication of lost sales due to stockouts, whereas for cookies the posterior provided evidence of substantial lost sales. The model and inference procedure we have developed provide a new level of power and flexibility that will aid decision makers in using transaction data to make smarter decisions.

Acknowledgements. We are grateful to the staff at 100 Main Marketplace at the Massachusetts Institute of Technology who provided data for this study.

References

Anupindi, Ravi, Maqbool Dada, Sachin Gupta. 1998. Estimation of consumer demand with stockout based substitution: An application to vending machine products. Marketing Science 17(4) 406-423.
Blei, David M., Andrew Y. Ng, Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 993-1022.
Campo, Katia, Els Gijsbrechts, Patricia Nisol. 2003. The impact of retailer stockouts on whether, how much, and what to buy. International Journal of Research in Marketing 20(3) 273-286.
Gelman, Andrew, Donald B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7(4) 457-511.
Goutelle, Sylvain, Michel Maurin, Florent Rougier, Xavier Barbaut, Laurent Bourguignon, Michel Ducher, Pascal Maire. 2008. The Hill equation: A review of its capabilities in pharmacological modelling. Fundamental & Clinical Pharmacology 22(6) 633-648.
Jain, Aditya, Nils Rudi, Tong Wang. 2015. Demand estimation and ordering under censoring: Stock-out timing is (almost) all you need. Operations Research In press.
Johnson, Kris, Bin Hong Alex Lee, David Simchi-Levi. 2014. Analytics for an online retailer: Demand forecasting and price optimization. Working paper.
Kalyanam, Kirthi, Sharad Borle, Peter Boatwright. 2007. Deconstructing each item's category contribution. Marketing Science 26(3) 327-341.
Kök, A. Gürhan, Marshall L. Fisher. 2007. Demand estimation and assortment optimization under substitution: Methodology and application. Operations Research 55(6) 1001-1021.
Musalem, Andrés, Marcelo Olivares, Eric T. Bradlow, Christian Terwiesch, Daniel Corsten. 2010. Structural estimation of the effect of out-of-stocks. Management Science 56(7) 1180-1197.
Patterson, Sam, Yee Whye Teh. 2013. Stochastic gradient Riemannian Langevin dynamics on the probability simplex. Advances in Neural Information Processing Systems 26. NIPS '13, 3102-3110.
Talluri, Kalyan, Garrett van Ryzin. 2004. Revenue management under a general discrete choice model of consumer behavior. Management Science 50(1) 15-33.
Vulcano, Gustavo, Garrett van Ryzin. 2014. A market discovery algorithm to estimate a general class of nonparametric choice models. Management Science 61(2) 281-300.
Vulcano, Gustavo, Garrett van Ryzin, Wassim Chaar. 2010. Choice-based revenue management: An empirical study of estimation and optimization. Manufacturing & Service Operations Management 12(3) 371-392.
Vulcano, Gustavo, Garrett van Ryzin, Richard Ratliff. 2012. Estimating primary demand for substitutable products from sales transaction data. Operations Research 60(2) 313-334.
Welling, Max, Yee Whye Teh. 2011. Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th International Conference on Machine Learning. ICML '11.

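A Table of Notation

Here we provide a table of the notation used throughout the paper.

$\sigma = 1, \ldots, S$ : Each of $S$ stores
$l = 1, \ldots, L^\sigma$ : Each of $L^\sigma$ time periods for store $\sigma$
$T$ : Time ranges from 0 to $T$ in each time period
$i = 1, \ldots, n$ : Each of $n$ items considered
$m_i^{\sigma,l}$ : Number of purchases of item $i$ in time period $l$ at store $\sigma$
$j = 1, \ldots, m_i^{\sigma,l}$ : Each of the $m_i^{\sigma,l}$ purchases
$t_i^{\sigma,l}$ : Purchase times of item $i$ during time period $l$ at store $\sigma$
$t^{\sigma,l}$ : All observed purchases (items $i = 1, \ldots, n$) during time period $l$ at store $\sigma$
$t$ : The complete set of purchase time data
$N_i^{\sigma,l}$ : Initial stock for item $i$ in time period $l$ at store $\sigma$
$N^{\sigma,l}$ : Initial stocks of all items in time period $l$ at store $\sigma$
$N$ : The complete set of initial stock data
$s_i(t \mid t^{\sigma,l}, N^{\sigma,l})$ : Indicator function of the stock of item $i$ at time $t$, given purchase times $t^{\sigma,l}$ and initial stocks $N^{\sigma,l}$
$\eta^\sigma$ : Rate function parameters for store $\sigma$
$\eta$ : Rate function parameters for all stores
$\lambda(t \mid \eta^\sigma)$ : Arrival rate at time $t$, given parameters $\eta^\sigma$
$\Lambda(t_1, t_2 \mid \eta^\sigma)$ : Integral of arrival rate function from $t_1$ to $t_2$
$k = 1, \ldots, K$ : Each of $K$ customer segments
$\phi^k$ : Choice model parameter relating to customer preference across items, for customer segment $k$. For parametric models, this is a probability vector over the items. For the nonparametric model, this is an ordered set of items
$\phi$ : Choice model parameters $\phi^k$ for all customer segments
$\tau^k$ : Choice model parameter relating to substitution to the no-purchase option, for segment $k$
$\tau$ : Choice model parameters $\tau^k$ for all customer segments
$f_i(s(t), \phi^k, \tau^k)$ : Choice model, the probability a customer purchases item $i$ given stock $s(t)$ and choice model parameters $\phi^k$ and $\tau^k$
$\theta^\sigma$ : Customer segment distribution for store $\sigma$
$\theta$ : Customer segment distributions for all stores
$\tilde{m}^{\sigma,l}$ : Total number of arrivals in time period $l$ at store $\sigma$
$\tilde{t}_1^{\sigma,l}, \ldots, \tilde{t}_{\tilde{m}^{\sigma,l}}^{\sigma,l}$ : The arrival times in time period $l$ at store $\sigma$
$\tilde{\lambda}_i^{\sigma,l}(t)$ : The purchase rate for item $i$ at time $t$ in time period $l$ at store $\sigma$
$\tilde{\Lambda}_i^{\sigma,l}(t_1, t_2)$ : Integral of the purchase rate from $t_1$ to $t_2$, for item $i$ in time period $l$ at store $\sigma$
$t_0^{\sigma,l} = \{t_{0,j}^{\sigma,l}\}_{j=1}^{m_0^{\sigma,l}}$ : Unobserved times of arrivals that left with no purchase
$f_0(s(t \mid t^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)$ : Probability that a customer of segment $k$ chooses the no-purchase option given the stock and model parameters
$\pi_i(s(t \mid t^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma)$ : Probability that an arrival purchases item $i$, or leaves as no-purchase for $i = 0$
$\tilde{I}_j^{\sigma,l}$ : Item $\tilde{I}_j^{\sigma,l}$ was purchased by the $j$th arrival in time period $l$ at store $\sigma$, or $\tilde{I}_j^{\sigma,l} = 0$ if the $j$th arrival was no-purchase
$\alpha$ : Prior hyperparameter for $\theta$
$\beta$ : Prior hyperparameter for $\phi^k$
$\gamma$ : Prior hyperparameter for $\tau^k$
$\delta_v$ : Prior hyperparameter for $\eta_v$
$\delta$ : Collection of prior hyperparameters for $\eta$
$p(t \mid \eta, \theta, \phi, \tau, N, T)$ : The likelihood
$p(\eta, \theta, \phi, \tau \mid \alpha, \beta, \gamma, \delta)$ : The prior
$p(\eta, \theta, \phi, \tau \mid t, \alpha, \beta, \gamma, \delta, N, T)$ : The posterior
$\tilde{\theta}, \tilde{\phi}, \tilde{\tau}$ : Expanded-mean parameterizations of $\theta$, $\phi$, and $\tau$
$z$ : Complete set of transformed latent variables, the sample space for MCMC
$z^w$ : State in MCMC iteration $w$
$\epsilon_w$ : Step size at iteration $w$
$\mathcal{L}^\sigma_w$ : Set of time periods used for the stochastic gradient approximation for store $\sigma$ in MCMC iteration $w$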

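B Proofs and Results for the Likelihood Function

Here we prove several results relating to the likelihood function. We begin with the conditional density function for NHPP arrivals, given in the paper as Lemma 1.

Proof of Lemma 1. The NHPP can be defined by its counting process:

\[ P(m \text{ arrivals in the interval } (\tau_1, \tau_2]) = \frac{\Lambda(\tau_1, \tau_2)^m \exp(-\Lambda(\tau_1, \tau_2))}{m!}, \quad \text{where } \Lambda(\tau_1, \tau_2) = \int_{\tau_1}^{\tau_2} \lambda(u)\,du. \]

Let random variables $S_1, S_2, \ldots$ be the arrival process for the NHPP. Consider a pair of times $t_j$ and $t_{j-1}$, with $t_j > t_{j-1}$. The conditional distribution function for the arrival times is

\begin{align*}
F_{S_j}(t_j \mid S_{j-1} = t_{j-1}) &= 1 - P(S_j > t_j \mid S_{j-1} = t_{j-1}) \\
&= 1 - P(\text{no arrivals in the interval } (t_{j-1}, t_j]) \\
&= 1 - \exp(-\Lambda(t_{j-1}, t_j)). \tag{7}
\end{align*}

Differentiating (7) yields the corresponding density function.

Now we provide two results that are used in the proof of Theorem 1.

Lemma 2. $\Lambda(0, T \mid \eta^\sigma) = \sum_{i=0}^{n} \tilde{\Lambda}_i^{\sigma,l}(0, T)$.

Proof.

\begin{align*}
\Lambda(0, T \mid \eta^\sigma) &= \int_0^T \lambda(t \mid \eta^\sigma)\,dt \\
&= \int_0^T \lambda(t \mid \eta^\sigma) \sum_{i=0}^{n} \pi_i(s(t \mid t^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma)\,dt \\
&= \int_0^T \sum_{i=0}^{n} \tilde{\lambda}_i^{\sigma,l}(t)\,dt \\
&= \sum_{i=0}^{n} \int_0^T \tilde{\lambda}_i^{\sigma,l}(t)\,dt \\
&= \sum_{i=0}^{n} \tilde{\Lambda}_i^{\sigma,l}(0, T),
\end{align*}

where the second line uses $\sum_{i=0}^{n} \pi_i(s(t \mid t^{\sigma,l}, N^{\sigma,l}), \phi, \tau, \theta^\sigma) = 1$.

Now we derive the density function for a collection of arrivals, and then obtain a useful corollary for the proof of Theorem 1.

Lemma 3. For $t_1, \ldots, t_m \sim \text{NHPP}(\lambda(t), T)$,

\[ p(t_1, \ldots, t_m) = \exp(-\Lambda(0, T)) \prod_{j=1}^{m} \lambda(t_j). \]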

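Proof. Let random variables $S_1, S_2, \ldots$ be the NHPP arrival process.

\begin{align*}
p(t_1, \ldots, t_m) &= f_{S_1}(t_1) \prod_{j=2}^{m} f_{S_j}(t_j \mid S_{j-1} = t_{j-1}) \; P(S_{m+1} > T \mid S_m = t_m) \\
&= \lambda(t_1) \exp(-\Lambda(0, t_1)) \left( \prod_{j=2}^{m} \lambda(t_j) \exp(-\Lambda(t_{j-1}, t_j)) \right) \exp(-\Lambda(t_m, T)) \\
&= \prod_{j=1}^{m} \lambda(t_j) \exp\left( -\left( \Lambda(0, t_1) + \sum_{j=2}^{m} \Lambda(t_{j-1}, t_j) + \Lambda(t_m, T) \right) \right) \\
&= \exp(-\Lambda(0, T)) \prod_{j=1}^{m} \lambda(t_j).
\end{align*}

Corollary 1.

\[ \int \exp(-\tilde{\Lambda}_0^{\sigma,l}(0, T)) \prod_{j=1}^{m_0^{\sigma,l}} \tilde{\lambda}_0^{\sigma,l}(t_{0,j}^{\sigma,l})\,dt_0^{\sigma,l} = 1. \]

Proof. The quantity being integrated is exactly the density function for $m_0^{\sigma,l}$ arrivals from an NHPP with rate $\tilde{\lambda}_0^{\sigma,l}(t)$ over the interval $[0, T]$.

Finally, we show how $\tilde{\Lambda}_i^{\sigma,l}(0, T)$ can be expressed analytically in terms of $\Lambda(0, T \mid \eta^\sigma)$. This is done by looking at each of the time intervals where the stock $s(t \mid t^{\sigma,l}, N^{\sigma,l})$ is constant. Let the sequence of times $q_1^{\sigma,l}, \ldots, q_{Q^{\sigma,l}}^{\sigma,l}$ demarcate the intervals of constant stock. That is, $[0, T] = \bigcup_{r=1}^{Q^{\sigma,l}-1} [q_r^{\sigma,l}, q_{r+1}^{\sigma,l}]$ and $s(t \mid t^{\sigma,l}, N^{\sigma,l})$ is constant for $t \in [q_r^{\sigma,l}, q_{r+1}^{\sigma,l})$ for $r = 1, \ldots, Q^{\sigma,l} - 1$. Then,

\begin{align*}
\tilde{\Lambda}_i^{\sigma,l}(0, T) &= \int_0^T \tilde{\lambda}_i^{\sigma,l}(t)\,dt \\
&= \int_0^T \lambda(t \mid \eta^\sigma) \sum_{k=1}^{K} \theta_k^\sigma f_i(s(t \mid t^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)\,dt \\
&= \sum_{r=1}^{Q^{\sigma,l}-1} \int_{q_r^{\sigma,l}}^{q_{r+1}^{\sigma,l}} \lambda(t \mid \eta^\sigma) \sum_{k=1}^{K} \theta_k^\sigma f_i(s(q_r^{\sigma,l} \mid t^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k)\,dt \\
&= \sum_{r=1}^{Q^{\sigma,l}-1} \left( \sum_{k=1}^{K} \theta_k^\sigma f_i(s(q_r^{\sigma,l} \mid t^{\sigma,l}, N^{\sigma,l}), \phi^k, \tau^k) \right) \Lambda(q_r^{\sigma,l}, q_{r+1}^{\sigma,l} \mid \eta^\sigma).
\end{align*}

With this formula, the likelihood function can be computed for any parameterization $\lambda(t \mid \eta^\sigma)$ desired so long as it is integrable.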
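The final formula is a weighted sum over intervals of constant stock, which the following sketch evaluates for a single item. All names and numbers are hypothetical, and the choice probabilities per interval are supplied directly rather than computed from a choice model. A constant rate is used for the mean-value function, but any function returning the integral of the rate over an interval could be passed in.

```python
def mixture_prob(theta, f_by_segment):
    """Sum over segments k of theta_k * f_i for one constant-stock interval."""
    return sum(th * f for th, f in zip(theta, f_by_segment))

def item_mean_value(breakpoints, f_by_interval, theta, mean_value):
    """Integrated purchase rate of one item over [breakpoints[0], breakpoints[-1]].

    breakpoints: times q_1 < ... < q_Q where the stock changes.
    f_by_interval[r][k]: choice probability of the item for segment k on interval r.
    mean_value(t1, t2): integral of the arrival rate from t1 to t2.
    """
    total = 0.0
    for r in range(len(breakpoints) - 1):
        weight = mixture_prob(theta, f_by_interval[r])
        total += weight * mean_value(breakpoints[r], breakpoints[r + 1])
    return total

def constant_mean_value(eta1):
    """Mean-value function of a homogeneous Poisson process with rate eta1."""
    return lambda t1, t2: eta1 * (t2 - t1)

# Item in stock on [0, 4), stocked out on [4, 10]; two segments with equal weight.
purchases_expected = item_mean_value(
    breakpoints=[0.0, 4.0, 10.0],
    f_by_interval=[[0.6, 0.2], [0.0, 0.0]],
    theta=[0.5, 0.5],
    mean_value=constant_mean_value(2.0),
)
```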

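C Model Gradients

Here we provide the gradients necessary to use the SGRLD sampler for our model.

C.1 Likelihood Gradients

The derivatives of the likelihood function with respect to the transformed latent variables are given below, writing $s^{\sigma,l}(t)$ for $s(t \mid t^{\sigma,l}, N^{\sigma,l})$:

\[ \nabla_{\eta^\sigma} \log p(t \mid z, N, T) = \sum_{l=1}^{L^\sigma} \sum_{i=1}^{n} \left( \sum_{j=1}^{m_i^{\sigma,l}} \frac{\nabla_{\eta^\sigma} \lambda(t_{i,j}^{\sigma,l} \mid \eta^\sigma)}{\lambda(t_{i,j}^{\sigma,l} \mid \eta^\sigma)} - \sum_{r=1}^{Q^{\sigma,l}-1} \left( \sum_{k=1}^{K} \theta_k^\sigma f_i(s^{\sigma,l}(q_r^{\sigma,l}), \phi^k, \tau^k) \right) \nabla_{\eta^\sigma} \Lambda(q_r^{\sigma,l}, q_{r+1}^{\sigma,l} \mid \eta^\sigma) \right) \]

\[ \nabla_{\tilde{\tau}^d} \log p(t \mid z, N, T) = \sum_{\sigma=1}^{S} \sum_{l=1}^{L^\sigma} \sum_{i=1}^{n} \left( \sum_{j=1}^{m_i^{\sigma,l}} \frac{\theta_d^\sigma \nabla_{\tilde{\tau}^d} f_i(s^{\sigma,l}(t_{i,j}^{\sigma,l}), \phi^d, \tau^d)}{\sum_{k=1}^{K} \theta_k^\sigma f_i(s^{\sigma,l}(t_{i,j}^{\sigma,l}), \phi^k, \tau^k)} - \sum_{r=1}^{Q^{\sigma,l}-1} \theta_d^\sigma \Lambda(q_r^{\sigma,l}, q_{r+1}^{\sigma,l} \mid \eta^\sigma) \nabla_{\tilde{\tau}^d} f_i(s^{\sigma,l}(q_r^{\sigma,l}), \phi^d, \tau^d) \right) \]

\[ \frac{\partial}{\partial \tilde{\theta}_d^\sigma} \log p(t \mid z, N, T) = \frac{1}{\sum_{k=1}^{K} \tilde{\theta}_k^\sigma} \sum_{l=1}^{L^\sigma} \sum_{i=1}^{n} \left( \sum_{j=1}^{m_i^{\sigma,l}} \frac{f_i(s^{\sigma,l}(t_{i,j}^{\sigma,l}), \phi^d, \tau^d) - \sum_{k=1}^{K} \theta_k^\sigma f_i(s^{\sigma,l}(t_{i,j}^{\sigma,l}), \phi^k, \tau^k)}{\sum_{k=1}^{K} \theta_k^\sigma f_i(s^{\sigma,l}(t_{i,j}^{\sigma,l}), \phi^k, \tau^k)} - \sum_{r=1}^{Q^{\sigma,l}-1} \left( f_i(s^{\sigma,l}(q_r^{\sigma,l}), \phi^d, \tau^d) - \sum_{k=1}^{K} \theta_k^\sigma f_i(s^{\sigma,l}(q_r^{\sigma,l}), \phi^k, \tau^k) \right) \Lambda(q_r^{\sigma,l}, q_{r+1}^{\sigma,l} \mid \eta^\sigma) \right) \]

\[ \nabla_{\tilde{\phi}^d} \log p(t \mid z, N, T) = \sum_{\sigma=1}^{S} \sum_{l=1}^{L^\sigma} \sum_{i=1}^{n} \left( \sum_{j=1}^{m_i^{\sigma,l}} \frac{\theta_d^\sigma \nabla_{\tilde{\phi}^d} f_i(s^{\sigma,l}(t_{i,j}^{\sigma,l}), \phi^d, \tau^d)}{\sum_{k=1}^{K} \theta_k^\sigma f_i(s^{\sigma,l}(t_{i,j}^{\sigma,l}), \phi^k, \tau^k)} - \sum_{r=1}^{Q^{\sigma,l}-1} \theta_d^\sigma \Lambda(q_r^{\sigma,l}, q_{r+1}^{\sigma,l} \mid \eta^\sigma) \nabla_{\tilde{\phi}^d} f_i(s^{\sigma,l}(q_r^{\sigma,l}), \phi^d, \tau^d) \right) \]

The gradients of the rate function $\lambda(t \mid \eta^\sigma)$ and the choice model $f_i(s(t \mid t^{\sigma,l}, N^{\sigma,l}), \phi^d, \tau^d)$ depend on which rate function and choice model are chosen. We now supply those gradients for the rate functions and choice models presented in the paper.

C.2 Rate Function Gradients

We use two rate functions in our simulations and experiments: a constant rate and a Hill rate.

C.2.1 Constant Rate

When we let $\lambda(t \mid \eta^\sigma) = \eta_1^\sigma$, the NHPP reduces to a homogeneous Poisson process with rate $\eta_1^\sigma$. For this rate function, the mean-value function is $\Lambda(t_1, t_2 \mid \eta^\sigma) = \eta_1^\sigma (t_2 - t_1)$. The gradients of the rate function and mean-value function with respect to $\eta_1^\sigma$ are simply 1 and $(t_2 - t_1)$ respectively.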

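C.2.2 Hill Rate

We also use the derivative of the Hill equation as the rate function. Here, with $u(t) = (t / \eta_3^\sigma)^{\eta_2^\sigma}$,

\[ \lambda(t \mid \eta^\sigma) = \frac{\eta_1^\sigma \eta_2^\sigma}{\eta_3^\sigma} \frac{(t/\eta_3^\sigma)^{\eta_2^\sigma - 1}}{\left(1 + (t/\eta_3^\sigma)^{\eta_2^\sigma}\right)^2} \quad \text{and} \quad \Lambda(t_1, t_2 \mid \eta^\sigma) = \frac{\eta_1^\sigma}{1 + (t_1/\eta_3^\sigma)^{\eta_2^\sigma}} - \frac{\eta_1^\sigma}{1 + (t_2/\eta_3^\sigma)^{\eta_2^\sigma}}. \]

The gradients are:

\[ \nabla_{\eta^\sigma} \lambda(t \mid \eta^\sigma) = \lambda(t \mid \eta^\sigma) \left[ \frac{1}{\eta_1^\sigma}, \;\; \frac{1}{\eta_2^\sigma} + \log\!\left(\frac{t}{\eta_3^\sigma}\right) \frac{1 - u(t)}{1 + u(t)}, \;\; \frac{\eta_2^\sigma}{\eta_3^\sigma} \frac{u(t) - 1}{u(t) + 1} \right], \]

\[ \nabla_{\eta^\sigma} \Lambda(t_1, t_2 \mid \eta^\sigma) = \left[ \frac{1}{1 + u(t_1)} - \frac{1}{1 + u(t_2)}, \;\; \eta_1^\sigma \left( \frac{u(t_2) \log(t_2/\eta_3^\sigma)}{(1 + u(t_2))^2} - \frac{u(t_1) \log(t_1/\eta_3^\sigma)}{(1 + u(t_1))^2} \right), \;\; \frac{\eta_1^\sigma \eta_2^\sigma}{\eta_3^\sigma} \left( \frac{u(t_1)}{(1 + u(t_1))^2} - \frac{u(t_2)}{(1 + u(t_2))^2} \right) \right]. \]

C.3 Choice Model Gradients

Here we give the gradients for the choice models that we use in the paper: the MNL model, the single-substitution exogenous model, and the nonparametric model. These are the gradients with respect to the reparameterized variables $\tilde{\phi}^k$ and $\tilde{\tau}^k$, where $\phi_i^k = \tilde{\phi}_i^k / \sum_{r=1}^{n} \tilde{\phi}_r^k$ and $\tau^k = \tilde{\tau}_1^k / (\tilde{\tau}_1^k + \tilde{\tau}_2^k)$.

C.3.1 MNL Choice

The MNL model uses

\[ f_i(s(t), \phi^k, \tau^k) = \frac{s_i(t) \phi_i^k}{\tau^k + \sum_{v=1}^{n} s_v(t) \phi_v^k}, \]

where $\tau^k$ is a fixed, chosen constant. The derivatives are:

\[ \frac{\partial}{\partial \tilde{\phi}_i^k} f_i(s(t), \phi^k, \tau^k) = \frac{s_i(t) \left( \sum_{v=1}^{n} (\tau^k + s_v(t)) \tilde{\phi}_v^k - (1 + \tau^k) \tilde{\phi}_i^k \right)}{\left( \sum_{v=1}^{n} (\tau^k + s_v(t)) \tilde{\phi}_v^k \right)^2}, \]

\[ \frac{\partial}{\partial \tilde{\phi}_r^k} f_i(s(t), \phi^k, \tau^k) = \frac{-s_i(t) (s_r(t) + \tau^k) \tilde{\phi}_i^k}{\left( \sum_{v=1}^{n} (\tau^k + s_v(t)) \tilde{\phi}_v^k \right)^2}, \quad r \neq i. \]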
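As a quick sanity check on the MNL derivatives, the sketch below evaluates the choice probability in terms of the unnormalized preferences and compares the analytic derivative with a central finite difference. The function names, stock vector, and parameter values are hypothetical, and stock indicators are assumed to be binary.

```python
def mnl_prob(i, stock, phi_tilde, tau):
    """MNL probability of buying item i, written in the unnormalized phi_tilde."""
    denom = sum((tau + s) * p for s, p in zip(stock, phi_tilde))
    return stock[i] * phi_tilde[i] / denom

def mnl_grad(i, r, stock, phi_tilde, tau):
    """Derivative of mnl_prob(i, ...) with respect to phi_tilde[r] (binary stock)."""
    denom = sum((tau + s) * p for s, p in zip(stock, phi_tilde))
    if r == i:
        return stock[i] * (denom - (1.0 + tau) * phi_tilde[i]) / denom ** 2
    return -stock[i] * (stock[r] + tau) * phi_tilde[i] / denom ** 2

def finite_difference(i, r, stock, phi_tilde, tau, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    up = list(phi_tilde)
    down = list(phi_tilde)
    up[r] += h
    down[r] -= h
    return (mnl_prob(i, stock, up, tau) - mnl_prob(i, stock, down, tau)) / (2.0 * h)

stock = [1, 0, 1]            # item 1 is out of stock (0-based indexing)
phi_tilde = [2.0, 1.0, 3.0]  # made-up unnormalized preferences
tau = 0.1
```

Since the probability depends on `phi_tilde` only through ratios, normalizing the preferences to sum to one first gives the same value.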

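C.3.2 Exogenous Choice

The exogenous model uses

\[ f_i(s(t), \phi^k, \tau^k) = s_i(t) \phi_i^k + \tau^k s_i(t) \sum_{v=1}^{n} (1 - s_v(t)) \phi_v^k \frac{\phi_i^k}{\sum_{j \neq v} \phi_j^k}. \]

The gradients are, writing $\Sigma^k = \sum_{j=1}^{n} \tilde{\phi}_j^k$:

\[ \nabla_{\tilde{\tau}^k} f_i(s(t), \phi^k, \tau^k) = \left[ \frac{\tilde{\tau}_2^k}{(\tilde{\tau}_1^k + \tilde{\tau}_2^k)^2}, \; \frac{-\tilde{\tau}_1^k}{(\tilde{\tau}_1^k + \tilde{\tau}_2^k)^2} \right] s_i(t) \sum_{v=1}^{n} (1 - s_v(t)) \phi_v^k \frac{\phi_i^k}{\sum_{j \neq v} \phi_j^k}, \]

\[ \frac{\partial}{\partial \tilde{\phi}_i^k} f_i(s(t), \phi^k, \tau^k) = s_i(t) \frac{\Sigma^k - \tilde{\phi}_i^k}{(\Sigma^k)^2} + s_i(t) \tau^k \sum_{v \neq i} (1 - s_v(t)) \left( \frac{\tilde{\phi}_v^k}{\Sigma^k (\Sigma^k - \tilde{\phi}_v^k)} - \frac{\tilde{\phi}_v^k \tilde{\phi}_i^k (2\Sigma^k - \tilde{\phi}_v^k)}{\left( \Sigma^k (\Sigma^k - \tilde{\phi}_v^k) \right)^2} \right), \]

\[ \frac{\partial}{\partial \tilde{\phi}_r^k} f_i(s(t), \phi^k, \tau^k) = -s_i(t) \frac{\tilde{\phi}_i^k}{(\Sigma^k)^2} + s_i(t) \tau^k (1 - s_r(t)) \frac{\tilde{\phi}_i^k}{(\Sigma^k)^2} - s_i(t) \tau^k \sum_{v \neq i, r} (1 - s_v(t)) \frac{\tilde{\phi}_v^k \tilde{\phi}_i^k (2\Sigma^k - \tilde{\phi}_v^k)}{\left( \Sigma^k (\Sigma^k - \tilde{\phi}_v^k) \right)^2}, \quad r \neq i. \]

C.3.3 Nonparametric Choice

The nonparametric choice model is

\[ f_i(s(t), \phi^k, \tau^k) = \begin{cases} 1 & \text{if } i = \phi^k_{j^*}, \text{ where } j^* = \min\{ j \in \{1, \ldots, |\phi^k|\} : s_{\phi_j^k}(t) = 1 \}, \\ 0 & \text{otherwise.} \end{cases} \]

For this model there is no parameter $\tau^k$, and $\phi^k$ is a fixed, chosen ordering over products. Thus there are no choice model variables to be inferred; the inference is just over $\theta$.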
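The two choice models of this part of the appendix can be sketched as follows. This is an illustrative reading of the formulas rather than the authors' code: the function names are hypothetical, items are indexed from 0, and stock indicators are assumed to be binary.

```python
def exogenous_choice(stock, phi, tau):
    """Purchase probabilities under the single-substitution exogenous model.

    A customer whose first choice v is out of stock substitutes with
    probability tau, picking item i with probability phi[i] / sum_{j != v} phi[j],
    and leaves if that second choice is also unavailable.
    """
    total = sum(phi)
    probs = []
    for i in range(len(phi)):
        if not stock[i]:
            probs.append(0.0)
            continue
        p = phi[i]  # item i is the first choice and is in stock
        for v in range(len(phi)):
            if not stock[v]:
                p += tau * phi[v] * phi[i] / (total - phi[v])
        probs.append(p)
    return probs

def nonparametric_choice(stock, ranking):
    """Purchase probabilities for a segment with a fixed preference ranking."""
    probs = [0.0] * len(stock)
    for item in ranking:
        if stock[item]:
            probs[item] = 1.0  # buy the highest-ranked item that is in stock
            break
    return probs

exo = exogenous_choice(stock=[1, 0, 1], phi=[0.5, 0.3, 0.2], tau=0.5)
no_purchase = 1.0 - sum(exo)
ranked = nonparametric_choice(stock=[1, 1, 0], ranking=[2, 0])
```

In the exogenous example the no-purchase probability is the share of customers whose first choice was unavailable and who either declined to substitute or wanted a second item that was also out of stock.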

D Additional Simulation Figures

Here we give additional figures to illustrate the simulation results. Figures 14-17 show the estimated posterior densities for $\eta$, $\theta$, $\tau$, and $\phi$ respectively, for the same simulation used in Figure 1, Section 4.1. Figure 18 shows the posterior distribution of $\eta^1$ for the simulation in Section 4.2. Figure 19 shows the posterior distribution of the elements of $\theta^1$ for which the true value was 0, for the same simulation as Figure 4 from Section 4.3.

Figure 14: Normalized histograms of posterior samples of $\eta^\sigma_1$ for the simulation of Section 4.1 (one panel per store). The vertical line indicates the true value.

Figure 15: Normalized histograms of posterior samples of $\theta$ for the simulation of Section 4.1.

Figure 16: Normalized histogram of posterior samples of $\tau^1$ for the simulation of Section 4.1.

Figure 17: Normalized histograms of posterior samples of $\phi$ for the simulation of Section 4.1.

Figure 18: Normalized histograms of posterior samples of $\eta^1$ for the simulation in Section 4.2 (panels: $\eta^1_1$, $\eta^1_2$, $\eta^1_3$).

Figure 19: Normalized histograms of posterior samples of $\theta^1_k$, along with the corresponding ordering $\phi^k$ below the panel, for the simulation in Section 4.3. The true value for all of these parameters was 0. Panels: $\theta^1_2$, {2}; $\theta^1_3$, {3}; $\theta^1_5$, {1, 3}; $\theta^1_6$, {2, 1}; $\theta^1_7$, {2, 3}; $\theta^1_8$, {3, 1}.

E Additional Data Experiment Figures

Figure 20 shows posterior densities for the exogenous choice model parameters, for the breakfast pastry data. Figure 21 shows the posterior densities for $\theta$ for the nonparametric choice model applied to the cookie data. Figure 22 shows posterior densities for the exogenous choice model parameters, for the cookie data. Figure 23 shows the results of the prediction task for the cookie data.

Figure 20: Normalized histograms of posterior samples of $\phi$ for the exogenous choice model and breakfast pastry data. Panels: $\phi^1_1$, bagel; $\phi^1_2$, scone; $\phi^1_3$, croissant.

Figure 21: Normalized histograms of posterior samples of $\theta$ for the nonparametric choice model and cookie data. Panels: $\theta^1_1$, {oatmeal}; $\theta^1_2$, {dbl. choc.}; $\theta^1_3$, {choc. chip}; $\theta^1_4$, {oatmeal, dbl. choc.}; $\theta^1_5$, {oatmeal, choc. chip}; $\theta^1_6$, {dbl. choc., oatmeal}; $\theta^1_7$, {dbl. choc., choc. chip}; $\theta^1_8$, {choc. chip, oatmeal}; $\theta^1_9$, {choc. chip, dbl. choc.}.

Figure 22: Normalized histograms of posterior samples of $\phi$ for the exogenous choice model and cookie data. Panels: $\phi^1_1$, oatmeal; $\phi^1_2$, dbl. choc.; $\phi^1_3$, choc. chip.

Figure 23: Posterior densities for the number of purchases during test set intervals with the indicated stock availability for cookies [oatmeal, double chocolate, chocolate chip]. The density in blue is for the nonparametric choice, red is for the exogenous choice, and gray is for a homogeneous arrival rate with MNL choice. The vertical line indicates the true value.