The Pure Characteristics Discrete Choice Model with Application to Price Indices

(still preliminary)

Steven Berry, Dept. of Economics, Yale and NBER
Ariel Pakes, Dept. of Economics, Harvard and NBER

June 12, 2001

Abstract

In this paper we consider a class of discrete choice models in which consumers care about a finite set of product characteristics. These models have been used extensively in the theoretical literature on product differentiation, but have not as yet been translated into a form that is useful for empirical work. Most recent econometric applications of discrete choice models implicitly let the dimension of the characteristic space increase with the number of products. The models in this paper have very different theoretical properties. After developing those properties and comparing them to the properties of models in which there is a taste for the product per se, we provide an estimation algorithm for the parameters of our model. In this version of the paper we pay particular attention to how these modeling choices affect the calculation of ideal consumer price indices, especially in the presence of new goods.

1 Introduction. Discrete choice models have recently gained importance in the study of new goods and of differentiated products oligopoly. These models allow for a parsimonious treatment of demand, via Lancaster s (1971) idea that products be treated as bundles of characteristics. Utility is then defined over a limited set of characteristics rather than a potentially very large number of products. Griliches s (1961) work on hedonic price functions revived empirical work on models based on characteristics. Hedonic price functions were introduced as a way of accounting for quality change in the prices of new goods. The reasoning was that since newer models of goods often had more desirable characteristics, the difference between the prices of the newer and the older models should not be entirely attributed to inflation. On the other hand, if we build our price indices entirely from inter-period price comparisons of goods sold in both periods, that is if we never compare old to new goods directly, we will never capture the effect switching to new goods has on welfare, and this will bias price index calculations upward. Griliches suggests estimating a surface which relates prices to characteristics, and then using the estimated surface to obtain estimates of quality adjusted price changes for products with given sets of characteristics. This suggestion leads to a lower bound for the benefits from new goods (see Pakes 1998), but to go further than this we need a more complete analysis of characteristics based demand systems. Rosen (1974) introduced a class of equilibrium hedonic models with a continuum of products and with perfect competition on the supply side. We will start instead from the literature that uses discrete choice models of demand and which often assumes oligopoly pricing. McFadden s work on discrete choice models (e.g. McFadden 1974) introduced feasible techniques for estimating a complete characteristics-based discrete choice model of demand. However, very particular assumptions were needed and the literature that followed (including many contributions by Mc- Fadden himself) has been concerned that the structure of a specific discrete choice model would in some way restrict the range of possible outcomes, thereby providing misleading empirical results. This paper is a continuation of that tradition. In contrast to the typical empirical model, we consider estimation of a class of discrete choice models in which consumers care about a finite set of product characteristics. These models have been used extensively in the theoretical literature on product differentiation, but have not as yet been translated into a form that is useful for empirical work. Typical discrete choice empirical models implicitly assume that the di- 1

mension of the product space increases with the number of products. This assumption is often embedded in an otherwise unexplained i.i.d. additive random term in the utility function. This term might be interpreted as a direct taste for the product, as opposed to a taste for the characteristics of the products. In many cases, such models probably do a good job of approximating demand, but we worry because these models have some counter-intuitive implications as the number of products increases. Thus, they might not do a great job of answering questions that are specifically about changes in the number of goods one example being the evaluation of the benefits from introducing new goods into the market. We begin by explaining why we might want to use a pure characteristics model, with a finite set of product characteristics and no tastes for the products themselves. We then develop some of the properties of our model. These properties enable us to build an algorithm for estimating the pure characteristics model. The paper provides with some Monte Carlo evidence both on the estimation of utility parameters and on the use of those parameters to construct price indices. We conclude with a discussion of proposed empirical work on the personal computer industry. 2 Discrete Choice Models and Empirical Work. We consider models in which each consumer chooses to buy at most one product from some set of differentiated products. Consumer i s (indirect) utility from the purchase of product j is U ij = U(X j, V i, θ), (1) where X j is a vector of product characteristics (including the price of the good), V i is a vector of consumer tastes and θ is some vector of parameters. Probably the earliest model of this sort in the economic literature is the Hotelling (1929) model of product differentiation on the line. In that model X is the location of the product and V is the location of the consumer. Subsequent applications were concerned both with demand conditional on product locations and the determination of those locations. To obtain the market share, s j, of good j, we simply add up the number of consumers who prefer good j over all other goods. That is s j = P r{v i : U(X j, V i, θ) > U(X k, V i, θ), k j)} (2) To make the transition to empirical work easier we follow Berry, Levinsohn and Pakes (1998)and partition the vector of consumer attributes, V i, into z i, 2

which an econometrician with a micro data set might observe, and ν i, which the econometrician does not observe. We also partition product characteristics, X j, into x j, which is observed by the econometrician, and ξ j, which is not. All market participants are assumed to have perfect information. Typically, empirical studies write the utility function as additively separable in a deterministic function of the data and a disturbance term U ij = f(x j, z i ; θ) + µ ij, (3) where θ is a parameter to be estimated. A natural interpretation of (3) in terms of (1) is to think of the µ ij as resulting from interactions between unobserved consumer tastes (the ν) and the product characteristics X (both observed and unobserved). The specification of the model in (3) is completed by making a detailed set of assumptions on the joint distribution of the {µ i,j, X j, z i } tuples. For example, if there were K product characteristics then one might specify K µ ij ν ik X jk. (4) k=1 and assume a parametric distribution for ν conditional on (X, z). We would then have a random coefficients model. The observations on (x, z) and the distributional assumption on the ν (together with either product dummies or a distributional assumption on unobserved ξ component of X) would then generate a joint distribution for {µ i,j, X j )}. However, it is hard to interpret the typical specifications used in empirical work in this way. Empirical work typically assumes that µ ij contains an i.i.d. (across products and consumers) additive component that has support on the entire real line. This i.i.d. component insures that the distribution of random utilities in turn has full support on R J (where J is the number of products) no matter what characteristics and prices define the products. It is not important that there is literally an i.i.d. component in the model, but rather that the µ contain an additive component with the property that its density, conditional on the realizations of the additive components for the other products, is positive on the entire real line. Then for every possible set of products, there will always be some consumers who like any given product infinitely more than the others. Familiar examples of specifications that have additive components with full support include the random coefficient logit model discussed in Berry, Levinsohn and Pakes (1995) as well as the random coefficient probit (see Hausman and Wise 1978, McFadden 1981). To generate an additive component with full support from a specification like (4), we have to make the dimension of the characteristic space, K, be a 3

function of the number of products. Caplin and Nalebuff s (1991) suggestion is to think of the additive component as being formed from the interaction of a set of product-specific dummy variables and a set of i.i.d. tastes for each product. We will refer to this class of models as having including tastes for products, as opposed to just tastes for product characteristics. Though this assumption does justify empirical work, it contradicts the spirit of the literature on characteristic based demand models (which focus on demand and product location in a given characteristic space), and has several questionable implications (as outlined in the next subsection). On the other hand, models that include a taste for products have a number of important practical advantages. In particular the additive component with full support insures that all the purchase probabilities are nonzero (at every value of the parameter vector), and have particularly simple derivatives. This makes most estimation algorithms, particularly maximum likelihood, relatively easy to implement. Further, the additivity of this disturbance simplifies the limits of integration for the integrals defining the needed shares. 1 2.1 Properties of Empirical Models. Recall that models with tastes for products behave as if every new good is introduced with its own characteristic which is valued by consumers independently of the characteristics of all other characteristics and has full support. This is the source of the familiar red-bus blue-bus problem. When we introduce a new product that is virtually identical to an existing product (say product A ) we expect the combined market shares of the new product and product A to to be approximately the same as was the market share of product A before we introduced the new product. The fact that we introduce a new characteristic with the new product insures that the model will predict a larger combined share than that (consumers who value the new characteristic enough will switch from products other than A to the new product). It should not be surprising then that we are most worried about the implications of the model when we use it to compare situations which involve different amounts of products. Our biggest worry, and a leading reason to consider the model introduced in this paper, is the model s implications vis a vis the welfare gains from product introductions. This is because models with tastes for products insure that there will be consumers who like the new good infinitely more than any of the previously existing products (independent of either the ob- 1 See McFadden (1981) for a discussion of the transformation that transforms the probit model s region of integration into the positive orthant. 4

served characteristics of the new good, or of the relationship between those characteristics and the characteristics of the products already marketed). We hasten to add here that similar problems plague the other demand models that have been used to evaluate new products. This is because obtaining the change in welfare that results from the introduction of a product requires one to integrate over the marginal utility gains from every unit of the product bought. At best the data can identify the marginal utilities that observed price movements sweep over. The marginal utilities gains from units that were purchased at all observed prices are obtained by extrapolating the estimated demand system into a region for which there is no data (for a more detailed discussion of this problem, including how it manifests itself in demand systems estimated in product, in contrast to characteristic, space see Pakes (1998) and also Hausman (1997), who reports that infinite benefits are implied by some of his demand specifications). Welfare analysis of product introductions is one of the more important uses of demand systems. Still, because this type of analysis necessarily involves imputing benefits outside the range of the data, when we engage in it we should be particularly careful about the implications of the model s assumptions. Discrete choice systems have been applied to analyzing the benefits from new goods at least since the CT scanner study of Trajtenberg (1989). Pakes, Berry and Levinsohn (1993) go one step further and use the discrete choice system estimated by Berry, Levinsohn and Pakes (1995) to compute an ideal price index for autos that accounts for new good introduction, but Petrin (2000), in his study of minivans, shows how the importance of the additive component can drive the welfare implications of these models and implicitly raises a cautionary note for all of these studies. Petrin tries to reduce the impact of the additive component by using household-level data. This gives more room for differences in preferences for observable attributes to explain choices and lessens the impact of the additive component (a similar strategy is used in Berry, Levinsohn and Pakes (1998)). In this paper we provide an alternative procedure: do away the additive component entirely. At the very least we hope our alternative will give some indication of the robustness of the results from these studies to the presence of the additive component with full support. The additive component also has other implications that are suspect. Assume that the appropriate model has a finite-dimensional characteristics space (no tastes for products ). Then if we held the environment constant we would expect the space itself to fill up as the number of products grew large. 2 This has two implications that are at odds with the model with 2 The caveat on the environment is to rule out either technological changes, or changes 5

additive components with full support. First, as the number of products increases (holding population fixed) products will become increasingly good substitutes for one another and oligopolistic competition will approach the competitive case, with prices driven toward marginal cost. In models with additive components with full support there are always some consumers with a nearly infinite preference for each product. As a result as more goods are added markups do not generally go to zero but are bounded from below by some positive constant (a similar point is made by Anderson, DePalma and Thisse (1992) in the context of the logit model). This fact might lead us to worry about the implications of the model with the additive components on the incentives for product development, at least in markets with large numbers of products. Second, pure characteristics models with finite marginal preferences for each characteristic imply that the benefits that a consumer can gain from consuming a single product from the given market are bounded (no matter the number of products marketed). As we increase the number of products in a model with an additive component with full support we insure that each consumer s benefits grow without bound. This might lead us to worry about the implications of the model with additive components on estimates of the benefits to variety. 2.2 Finite Dimensional Models Here we provide a brief review of the literature on pure characteristic models. The theoretical literature on these models includes the Hotelling model of competition on a line, the vertical model of product differentiation of Mussa and Rosen (1978) (see also Gabszewicz and Thisse (1979) Shaked and Sutton (1982)) and Salop s (1979) model of competition on a circle. In all these models, demand is determined by the location or address of the products in the characteristics space and an exogenous distribution of consumer preferences over this space. As the number of products increases, the product space fills up, with products becoming very good substitutes for one another. The vertical model was first brought to data by Bresnahan (1987) and has been subsequently used by a few others including Greenstein (1996). Since we will want to explicitly allow for unobserved product characteristics we use a specification due to Berry (1994) u ij = X j β α i p j, (5) in competing and complimentary products, which alter the relative benefits of producing in different parts of the characteristic space. 6

where the unobserved and observed characteristics both enter through X j = (x j, ξ j ). In this model the quality of the good is the single index δ j X j β (6) and its value increases over the entire real line. All consumers agree on this quality ranking. The reason consumers differ in their choices is that different consumers have different marginal utilities of income (this generates the differences in their coefficients on price). Other examples of pure characteristics models are given in Caplin and Nalebuff (1991) and Anderson et al. (1992). These include the ideal point models in which consumers care about the distance between their location (v i ) and the products location (X j ) in R k : u ij = X j ν i α i p j. (7) where is some distance metric. A special case of this is Hotelling s model of competition on the line. If we interpret as Euclidean distance, expand (7) and eliminate individual specific constant terms that have the same effect on all choices (since these do not effect preference orderings), this last specification becomes u ij = X j β i α i p j. (8) Equation (8) is a pure characteristics random coefficients model. It allows consumers to differ in their tastes for different product characteristics, in addition to differences in their marginal utility of income. Note that unlike the standard random coefficients model used in the econometric literature 8 does not have tastes for the products, aside from the tastes for the X s themselves. 2.2.1 A Finite Dimensional Model for Empirical Work. We will investigate models which are special cases of (8). The extra constraint we impose on this model is that there be only one unobserved product characteristic; i.e. X = (x, ξ) R k R 1. If, in addition, we constrain the coefficient of the unobserved characteristic ξ to be the same for all consuming units, so that u ij = x j β i α i p j + ξ j, (9) our model becomes identical to the model in Berry, Levinsohn and Pakes (1995) without their additive component with full support. If we allow for coefficients on ξ which vary over consumers, so that u ij = x j β i α i p j + λ i ξ j, (10) 7
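Returning to the step from (7) to (8), one line of algebra makes it explicit. The derivation below is our reconstruction under the common convention that the penalty in (7) is the squared Euclidean distance (a monotone transformation of the distance itself); the grouping of terms is ours rather than taken verbatim from the paper.

\[
-\|X_j-\nu_i\|^2-\alpha_i p_j \;=\; -X_jX_j' \;+\; 2\,\nu_i X_j' \;-\; \nu_i\nu_i' \;-\; \alpha_i p_j .
\]

The term \(\nu_i\nu_i'\) is the same for every product for a given consumer, so it does not affect that consumer's ranking of products and can be dropped. The term \(X_jX_j'\) depends only on the product and can be treated as one additional product characteristic entering with a common coefficient. Relabeling the interaction coefficients (here \(\beta_i = 2\nu_i\), augmented by the common coefficient on \(X_jX_j'\)) delivers \(u_{ij} = X_j\beta_i - \alpha_i p_j\), which is (8).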

where λ i is an additional random coefficient, then our model is identical to the one in Das, Pakes and Olley (1995), but without an i.i.d. additive component. Our previous work has emphasized the reasons for (and the empirical importance of) allowing for unobserved product characteristics in estimating discrete choice demand systems (see, in particular, Berry (1994), Berry, Levinsohn and Pakes (1995) and Berry, Levinsohn and Pakes (1998)). Those papers note inconsistencies in estimation techniques that do not allow for unobserved product characteristics, explain the bias in price elasticities that are likely to result from omitting unobserved characteristics when they in fact are present, and show that those biases can be important. Partly because of the nature of the data we had at our disposal, we made no attempt to investigate multi-factor models that have more than one dimension of unobserved product heterogeneity (see Goettler and Schachar 1997). There may in fact be many different unobserved product characteristics with marginal utilities that differ among consumers, but the methods for estimating those models are beyond the scope of this paper. 3 Once we focus on the benefits from new goods, it might be preferable to allow for several unobserved product characteristics. Of course, if we allow for as many unobserved factors as there are products, then the pure characteristics model with multiple unobserved characteristics has the traditional models with tastes for products as special cases. We hope that, at the least, the pure characteristics model with one unobserved characteristic can provide a benchmark and a robustness test for the traditional models with product-specific tastes. Perhaps the two models together can provide bounds for the impacts of unobserved product product heterogeneity. 3 Estimating the Model. The estimation issues that arise when estimating the pure characteristics model are similar to the issues that arise when estimating more traditional discrete choice models. We use the techniques in Berry, Levinsohn and Pakes (1995) and Berry, Levinsohn and Pakes (1998) (and the vast literature cited therein) as starting points. There are, however, three differences with between the pure characteristics model and the rest of the literature: 3 Goettler and Schachar (1997), and a related literature in the field of marketing, consider the possibility that multiple unobserved product characteristics might be identified from data that observes the same consumers making a repeated set of choices. 8

- We have to modify the method of calculating the aggregate market share function conditional on the vectors of characteristics and parameters (θ);

- We have to modify both the argument that leads to the existence of a unique ξ vector conditional on any tuple of parameters for the model and any vector of observed market shares, and the method of computing that vector; and

- The limiting distribution of the parameter estimates from the pure characteristics model differs from that in the rest of the literature, and this, in turn, suggests different tradeoffs in computational burden.

We focus on these three changes, mentioning other aspects of the estimation strategy only when they impact directly on our Monte Carlo experiments or when the absence of the additive component calls into question some aspect of estimation on other types of data.

3.1 Computing Market Shares.

In the model with product-specific tastes, market shares can be calculated by: i) conditioning on preferences for the product characteristics (the β_i) and integrating out the additive component to compute market shares conditional on the β_i, and then ii) integrating over the β_i. When the additive product-specific taste follows a logit form, it can be integrated out explicitly. This both produces a smooth objective function and insures that the variance in the additive component does not contribute to computational error.

When there is no additive component we must compute market shares in a different way. A simple replacement is to use the structure of the vertical model to integrate out one of the dimensions of heterogeneity in the pure characteristics model, and then simulate over the other dimensions to obtain the aggregate share (this solves the aggregation problem by simulation, as in Pakes 1986). This procedure maintains a smooth objective function and insures that one of the components of heterogeneity does not contribute to computational error.

Recall that the simple vertical model can be written as

u_{ij} = δ_j − α_i p_j,   (11)

where δ_j is product quality, e.g.,

δ_j = x_j β + ξ_j.   (12)

We normalize the utility of the outside alternative (u_{i0}) to zero. Order the goods in terms of increasing price. Then good j is purchased iff u_{ij} > u_{ik} for all k ≠ j, or equivalently

δ_j − α_i p_j > δ_k − α_i p_k  ⟺  α_i (p_j − p_k) < δ_j − δ_k,  ∀ k ≠ j.   (13)

Recall that (p_j − p_k) is positive if j > k and negative otherwise. So a consumer endowed with α_i will buy product j iff

α_i < min_{k<j} (δ_j − δ_k)/(p_j − p_k) ≡ \overline{Δ}_j(δ, p),  and  α_i > max_{k>j} (δ_k − δ_j)/(p_k − p_j) ≡ \underline{Δ}_j(δ, p).   (14)

These formulas assume that 0 < j < J. However, if we set

\overline{Δ}_0 = ∞,  and  \underline{Δ}_J = 0,   (15)

they extend to the j = 0 (the outside good) and j = J cases. If the cdf of α is F(·), then the market share of product j is

s_j(x, p, ξ) = ( F(\overline{Δ}_j(x, p, ξ)) − F(\underline{Δ}_j(x, p, ξ)) ) · 1[ \overline{Δ}_j > \underline{Δ}_j ],   (16)

where 1[·] is the indicator function for the condition in the brackets. Note that if \overline{Δ}_j ≤ \underline{Δ}_j, then s_j(·) = 0. Since the data have positive market shares, the model should predict positive market shares at the true value of the parameters. However, the pure characteristics model behaves unlike the standard models in that it will predict zero market shares for some parameter values (e.g., any parameter vector which generates an ordering that leaves one product with a higher price but lower quality than some other product will do).

3.1.1 The Extension to K Dimensions.

Recall that the difference between the vertical model and the pure characteristics model is that in the pure characteristics model characteristics other than price can have coefficients which vary over consumers (the β_i). For example,

u_{ij} = x_j β_i − α_i p_j + ξ_j,   (17)

so that consumers have heterogeneous opinions about a product's quality, x_j β_i + ξ_j.
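The share formulas (14)-(16), and the way they are reused conditional on β in the extension developed below (and in the simulation estimator of Section 4.1), can be made concrete with a short numerical sketch. It is illustrative only: the function names, the uniform distribution assumed for α, and the normal distribution used for the β draws are our own choices, not specifications from the paper.

    # Illustrative sketch of the vertical-model cutoffs and shares, eqs. (14)-(16),
    # and of the mixture over beta draws used for the pure characteristics model.
    # Assumptions (ours): goods sorted by strictly increasing price, alpha ~ F on R+,
    # and a linear quality index delta_j(beta) = x_j'beta + xi_j.
    import numpy as np

    def vertical_shares(delta, p, F):
        """Shares of inside goods 1..J given qualities delta and prices p
        (the outside good has delta_0 = p_0 = 0); F is the cdf of alpha."""
        d = np.concatenate(([0.0], delta))      # prepend the outside good
        pr = np.concatenate(([0.0], p))
        J = len(d) - 1
        upper = np.empty(J + 1)                 # upper cutoffs in eq. (14)
        lower = np.empty(J + 1)                 # lower cutoffs in eq. (14)
        for j in range(J + 1):
            upper[j] = min((d[j] - d[k]) / (pr[j] - pr[k]) for k in range(j)) if j > 0 else np.inf
            lower[j] = max((d[k] - d[j]) / (pr[k] - pr[j]) for k in range(j + 1, J + 1)) if j < J else 0.0
        s = (F(upper) - F(lower)) * (upper > lower)   # eq. (16): zero share if cutoffs cross
        return s[1:]

    def pure_characteristics_shares(x, p, xi, beta_draws, F):
        """Average the conditional (vertical) shares over draws of beta."""
        s = np.zeros(len(p))
        for beta in beta_draws:
            s += vertical_shares(x @ beta + xi, p, F)
        return s / len(beta_draws)

    # toy usage: alpha ~ U(0, 2), two characteristics, 500 draws of beta ~ N(0, I)
    rng = np.random.default_rng(0)
    F = lambda a: np.clip(np.asarray(a, dtype=float) / 2.0, 0.0, 1.0)
    x, p, xi = rng.uniform(size=(3, 2)), np.array([0.5, 1.0, 1.5]), rng.uniform(size=3)
    print(vertical_shares(x @ np.ones(2) + xi, p, F))                      # one vertical "slice"
    print(pure_characteristics_shares(x, p, xi, rng.normal(size=(500, 2)), F))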

Conditional on β_i, the model is once again a vertical model with cut-off points in the space of α_i, but now the quality levels in those cut-offs depend on the β_i. To obtain market shares in this case we do the calculation in (16) conditional on the β_i, and then integrate over the β_i distribution. Given cut-off points \overline{Δ}_j(x, p, ξ, β) and \underline{Δ}_j(x, p, ξ, β), the market share function is then

s_j(x, p, ξ) = ∫ ( F(\overline{Δ}_j(x, p, ξ, β)) − F(\underline{Δ}_j(x, p, ξ, β)) ) · 1[ \overline{Δ}_j(x, p, ξ, β) > \underline{Δ}_j(x, p, ξ, β) ] dG(β),   (18)

where F(·) is again the cdf of α and G(·) is the cdf of β. The conditioning argument used here avoids the difficult problem of solving for the exact region of the β_i space on which a consumer prefers product j. [Footnote 4: Feenstra and Levinsohn (1995) directly calculate the region of integration, A_j ⊂ R^K, such that if (β, α) ∈ A_j then good j is purchased, but this becomes quite complicated.] Also, although the integral in (18) is typically not analytic, we can use familiar simulation techniques to approximate it.

3.2 The Inverse of the Discrete Choice Probability Function.

Now recall that to estimate their model Berry, Levinsohn and Pakes (1995) use the identifying assumption that E[ξ | x, θ] = 0 at θ = θ_0, calculate moments formed from the cross product of ξ(θ) and functions of x, and then search for the value of θ that makes the values of these moments as close as possible to zero. In order to proceed, then, they need to i) show that for every value of the parameters of the model (θ) there is a unique value of ξ which makes the model's predicted shares just equal to the shares observed in the data, say s^o, and ii) provide a way of computing that value, say ξ(s^o, θ). This paper provides an analogous estimation strategy for the pure characteristics model. Formal proofs of the consistency and asymptotic normality of both estimators are provided in Berry, Linton and Pakes (1999).

To proceed, then, we need a method of finding the ξ(s^o, θ) implied by the pure characteristics model. As in Berry, Levinsohn and Pakes (1995) we assume that s^o, the (J+1)-dimensional vector of observed market shares, is in the interior of the J-dimensional unit simplex (all market shares are strictly between zero and one), and consider the system

s(θ, ξ) = s^o,   (19)

with s(θ, ξ) obtained from (18), so that s(θ, ξ) is the vector of shares predicted by the pure characteristics model for the given value of (ξ, θ). Given the

normalization ξ 0 = 0, our goal is to show that for fixed θ this system has exactly one solution, ξ(θ, s o ), and to provide a way of finding that solution. Let the discrete choice market share, as a function of all unobserved product characteristics (including that of the outside alternative), be s j (ξ j, ξ j, ξ 0 ), (20) where ξ j is the own-product characteristic, ξ j is the vector of rival-product characteristics and ξ 0 is the characteristic of the outside good. Since we look for solutions that normalize ξ 0 to zero, we can define the element-by-element inverse for product j, r j (s j, ξ j ), as s j (r j, ξ j, 0) = s j (21) The vector of element-by-element inverses, say r(s, ξ), when viewed as a function of ξ takes R J R J. It turns out to be more convenient to work with a fixed point defined by the element-by-element inverse than to work directly with the system of equations defined by (19). In particular, the inverse of the market share function exists and is unique if there is a unique solution to the fixed point ξ = r(s, ξ). (22) Theorem. Suppose the discrete choice market share function has the following properties: 1. Monotonicity. s j is weakly increasing and continuous in ξ j and weakly decreasing in ξ j and ξ 0. Also, for all (ξ j,ξ 0 ), there must be values of ξ j that set s j arbitrarily close to zero and values of ξ j that set s j arbitrarily close to one. 2. Linearity of utility in ξ. If the ξ for every good (including the outside good) is increased by an equal amount, then no market share changes. 3. Substitutes with Some Other Good. Whenever s is strictly between 0 and 1, every product must be a strict substitute with some other good. In particular, if ξ ξ, with strict inequality holding for at least one component, then there is a product (j) such that s j (ξ j, ξ j, ξ 0 ) < s j (ξ j, ξ j, ξ 0). (23) Similarly, if ξ ξ, with strict inequality holding for at least one component, then there is a product (j) such that s j (ξ j, ξ j, ξ 0 ) > s j (ξ j, ξ j, ξ 0). (24) 12

Then, for any market share vector s that is strictly interior to the unit simplex; (i) an inverse exists, and (ii) this inverse is unique. Proof. Existence follows from the argument in Berry (1994). In providing our proof of uniqueness, we will show that the map r(ξ, s) is a weak contraction (a contraction with modulus 1), a fact which we will use in computation (see below). Take any ξ and ξ R J and let ξ ξ sup = d > 0. From (21) and Linearity s j (r j + d, ξ j + d, d) = s j. (25) By Monotonicity s j (r j + d, ξ, 0) s j, (26) and by (3) there is at least one good, say good q, which for which this inequality is strict (any good that substitutes with the outside good). By Monotonicity, this implies that for all j, r j r j + d (27) with strict inequality for good q. A symmetric argument shows that the condition s j (r j d, ξ j d, d) = s j (28) implies that for all j, r j r j d (29) with strict inequality for at least one good. Clearly then r(ξ, s) r(ξ, s) d, which proves that the inverse function is a weak contraction. Now assume that both ξ and ξ satisfy 22, but that ξ ξ = κ, i.e. that there are two distinct solutions to the fixed point. In particular let ξ q ξ q = κ. Without loss of generality assume that q substitutes to the outside good (if this were not the case then renormalize in terms of the good that substitutes with q and repeat the argument that follows). From above, s q (r q + κ, ξ q, 0) > s q. But this last expression equals s q (ξ q, ξ q, 0), which, since ξ is a fixed point, equals s q, a contradiction. It is easy to verify that the pure random coefficients model satisfies the assumptions of the theorem as long as the cdf F (α) is strictly increasing. 3.3 Limit Theorems. This section reviews results from Berry, Linton and Pakes (1999) who provide limit theorems for the parameter estimates from differentiated product models both with and without additive components with full support. The actual 13

form of the limit distributions for both models depend on the type of data available. We will focus on estimation from product level data (defined as data on market share, prices, and characteristics, possibly augmented with the distribution of consumer attributes, as in Berry, Levinsohn and Pakes (1995)), though similar types of issues arise when micro data is also available (see Berry, Levinsohn and Pakes (1998)). Again, our motive here is not only (or even primarily) to show that there are limit theorems for our models, rather we hope that the discussion in this and the next section provides some indication of the computational tradeoffs of between the different models on different types of data sets. The difference in the limit properties of the models with additive components with full support, and those without, stem from differences in the behavior of the inverse to the market share function between those two models, as J, the number of products, grows large. For clarity then we focus on the case where we have only one market and one time period and the appropriate limiting dimension is the number of products. In both models the objective function minimized in the estimation algorithm is a norm of G J (θ, s n, P ns ) = 1 J J z j ξ j (s n, x, p, θ). (30) j=1 where the ξ j are defined implicitly as the solution to the system s n j = s j (ξ, x, p, ; θ, P ns ), (31) the z are functions of the x in E[ξ x, θ 0 ] = 0, s n is the observed vector of market shares, and P ns is notation for the vector of simulation draws used to compute the market shares predicted by the model. The difference between the models is in the properties of the map s( ). In both models the objective function, G J (θ, s n, P ns ), has a distribution determined by three independent sources of randomness: randomness generated from the draws on the vectors {ξ j, x 1j }, randomness generated from the sampling distribution of s n, and that generated from the simulated distribution P ns. Analogously there are three dimensions in which our sample can grow: as n, as ns, and as J grow large. The limit theorems allow different rates of growth for each dimension. Throughout we take pathwise limits, i.e., we write n(j) and ns(j), let J, and note that our assumptions imply n(j), ns(j) at some specified rate. Note also that both s n and σ(ξ, θ, P ) take values in R J, where J is one of the dimensions that we let grow in our limiting arguments. 14
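The way the moment objective in (30)-(31) is assembled can be made concrete with the schematic below. It is our own sketch, not code from the paper: invert_shares stands in for the inversion routine of Sections 3.2 and 4.2 (with the simulation draws P_ns folded into it), and the weight matrix is an arbitrary placeholder.

    # Schematic GMM objective built from the inversion xi(theta, s^n): our own sketch,
    # with placeholder callables rather than routines defined in the paper.
    import numpy as np

    def gmm_objective(theta, s_obs, x, p, z, invert_shares, W=None):
        """z: (J, L) instrument matrix; invert_shares returns the xi vector solving
        the system s(theta, xi) = s_obs, i.e. eq. (31)."""
        xi = invert_shares(theta, s_obs, x, p)        # xi_j(s^n, x, p, theta)
        G = z.T @ xi / len(xi)                        # (1/J) sum_j z_j * xi_j, eq. (30)
        W = np.eye(z.shape[1]) if W is None else W    # GMM weight matrix (placeholder)
        return G @ W @ G                              # squared weighted norm of G_J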

A simple heuristic argument will help explain the properties of our estimators. Write ξ(θ, s n, P ns ) = (32) ξ(θ, s 0, P 0 )+ { ξ(θ, s n, P ns ) ξ(θ, s 0, P ns ) } + { ξ(θ, s 0, P ns ) ξ(θ, s 0, P 0 ) }, and assume the function σ(ξ, θ, P ) is differentiable in ξ, and its derivative has an inverse, say H 1 (ξ, θ, P ) = { } 1 σ(ξ, θ, P ). (33) Abbreviate σ o (θ, s, P ) = σ(ξ(s, θ, P ), θ, P ) and H o (θ, s, P ) = H(ξ(s, θ, P ), θ, P ). This plus some regularity conditions imply that we can rewrite (32) as ξ(θ, s n, P ns ) = ξ(θ, s 0, P 0 ) + H 1 o (θ, s 0, P 0 ) {ε n ε ns (θ)} + r(θ, s n, P ns ), (34) where r(θ, s n, P ns ) is a remainder term, while ε n = s n s 0 and ε ns (θ) = σ[ξ(θ, s 0, P ns ), θ, P ns ] σ[ξ(θ, s 0, P ns ), θ, P 0 ]. Consequently, G J (θ, s n, P ns ) = G J (θ, s 0, P 0 )+ 1 J z Ho 1 (θ, s 0, P 0 ) {ε n ε ns (θ)}+ 1 J z r(θ, s n, P ns ). (35) The limit theorems in Berry, Linton and Pakes (1999) work from this representation of G J (θ, s n, P ns ). To prove consistency they provide conditions which insure that: i) the second and third terms in this equation converge to zero in probability uniformly in θ, and ii) an estimator which minimized G J (θ, s 0, P 0 ) over θ Θ would lead to a consistent estimator of θ 0. Asymptotic normality requires, in addition, local regularity conditions of standard form, and a limiting distribution for Ho 1 (θ, s 0, P 0 ) {ε n ε ns (θ)}. The rate needed for this limit distribution depends on how the elements of the J J matrix Ho 1 (θ, s 0, P 0 ) grow, as J gets large. This differs for the two classes of models. It is perhaps easiest to see the difference when we compare the simple logit model to the simple vertical model. In the simple logit model, u i,j = δ j + ɛ i,j, with the {ɛ i,j } distributed i.i.d. type II extreme value, and δ j = x j β αp j +ξ j. Familiar arguments show that s j = exp[δ j ]/(1 + q exp[δ q ]), while s 0 = 1/(1 + q exp[δ q ]). So δ j = ln[s j ] ln[s 0 ] and the required inverse is ξ ξ j (θ) = (ln[s j ] ln[s 0 ]) x j β αp j. Thus in this simple case ξ s j = 1 s j. 15
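As a quick numerical check of the logit inversion just described (δ_j = ln s_j − ln s_0), consider the following snippet; the particular δ values are arbitrary.

    # Check of the logit share inversion delta_j = ln(s_j) - ln(s_0); toy numbers only.
    import numpy as np

    delta = np.array([0.5, 1.0, -0.2])                  # arbitrary mean utilities
    s = np.exp(delta) / (1.0 + np.exp(delta).sum())     # logit shares of the inside goods
    s0 = 1.0 / (1.0 + np.exp(delta).sum())              # outside-good share
    print(np.allclose(np.log(s) - np.log(s0), delta))   # True: the inversion recovers delta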

Now consider how randomness effects the estimate of ξ j (θ). In the simple logit model the only source of randomness is in the sampling distribution of s o. That is we observe the purchases of only a finite random sample of consumers. Letting their shares be s n we have, s n s o = ɛ n. The first order impact of this randomness on the value of our objective function at any θ will be given by Ho 1 (θ, s 0 ) ɛ n = ξ ɛ n, s s=s 0 which from above contains expressions like ɛ n j 1 s j. In the logit model as J, s j 0. So as J grows large the impact of any given sampling error grows without bound. A similar argument holds for the estimator of Berry, Levinsohn and Pakes s (1995) model, only in this more complicated model there are two sources or randomness whose impacts increase as J grows large, sampling error and simulation error. Consequently Berry, Linton and Pakes (1999) show that to obtain an asymptotically normal estimator of the parameter vector from this model both n and ns must grow at rate J 2. The computational implication is that for data sets with large J one will have to use many simulation draws, and large samples of purchasers, before one can expect to obtain an accurate estimator whose distribution is approximated well by a normal with finite variance. Now go back to the simplest pure characteristics model, the vertical model, i.e., u ij = δ j α i p j, with δ j defined as above and with u i0 = 0. Order the products so 0 = p 0 < p 1 < p 2 <..., and assume that 0 < δ 1 < δ 2,..., and that j = (δ j δ j 1 )/(p j p j 1 ) is also ordered in this way. These two latter conditions are necessary and sufficient for all products to be purchased (a fact we use below). If F ( ) is the distribution of α i, then the market share of good j, j = 1,..., J 1 is s j = F ( j ) F ( j+1 ), s J = F ( J ). Thus the derivative matrix is of the form s j = f( j ) j + f( j+1 ) j+1, ξ p ξ p ξ p where f( ) is the density of F ( ). None of these elements tend to zero as J. Indeed, Berry, Linton and Pakes (1999) provide sufficient conditions for all elements of the inverse matrix Ho 1 ( ) to stay bounded as J grows large. Consequently to obtain an asymptotically normal estimate of the parameter vector in the vertical model both n and ns need only grow at rate J. That is we should not need either as large a consumer sample, or as many simulation draws, to obtain reasonable parameter estimates from the vertical model. 16
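To see the 1/s_j amplification in the logit case concretely, consider a symmetric logit market in which every inside good has the same δ: each inside share is then of order 1/J, so the factor 1/s_j multiplying the sampling error grows roughly linearly with J. The numbers below are a toy illustration of that point, not a calculation from the paper.

    # Toy illustration: in a symmetric logit, 1/s_j (the factor multiplying sampling
    # error in the inversion) grows roughly linearly with the number of products J.
    import numpy as np

    for J in (10, 100, 1000):
        s_j = np.exp(0.0) / (1.0 + J * np.exp(0.0))   # each inside share when delta_j = 0
        print(J, 1.0 / s_j)                           # prints J + 1: grows linearly in J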

4 Computation.

We have two computational tasks. First, we have to specify how we calculate the market shares. Second, we have to develop a routine to calculate the inversion that produces ξ. Our computational methods follow directly from the expression for the market share in (18) and from the last theorem.

4.1 Market Shares.

We use a conditional Monte Carlo simulation technique to calculate market shares. The idea is to condition on all the random utility coefficients (β) except for the random coefficient on price (α). This yields a simple vertical model. We then integrate the vertical-model market shares over the distribution of the β's. Thus, we make use of the fact that the pure random coefficients model can be expressed as a mixture of pure vertical models. In particular, as suggested by (18), we take ns draws from the distribution G of the random coefficients β and then calculate the sample mean

ŝ_j(x, p, ξ) = (1/ns) Σ_{i=1}^{ns} ( F(\overline{Δ}_j(x, p, ξ, β_i)) − F(\underline{Δ}_j(x, p, ξ, β_i)) ) · 1[ \overline{Δ}_j(x, p, ξ, β_i) > \underline{Δ}_j(x, p, ξ, β_i) ].   (36)

Assuming that f(·) has positive density on R_+, this calculation is simplified by noting that a necessary and sufficient condition for the indicator in (36) to equal one for product j, conditional on a β_i, is that max_{q<j} Δ_q(·, β_i) < Δ_j(·, β_i) (recall that the j ordering is the price ordering). Our program then first draws a β, then computes the {Δ_j(·, β)}, then drops those goods for which the Δ(·, β) are out of order, and only then computes the remaining shares. Note also that unless the distribution of β is discrete (in which case a simple mean like the one in (36) provides an exact calculation of the share), the generalization from the vertical model to the pure characteristics model introduces simulation error into the estimates (just as the generalization from the simple logit to the random coefficients logit in Berry, Levinsohn and Pakes (1995) adds simulation error).

4.2 Inversion.

The second problem we face is computing the inverse market-share function. Unlike our earlier work in BLP on more traditional models, we know of no simple method for solving the fixed-point problem that defines the inverse.

We have two methods for finding the inverse. First, we could use the element-by-element inverse, r(s, ξ), shown to lead to a weak contraction in the proof of the theorem in Section 3.2. If the weak contraction had a modulus that was strictly less than one, iterating on it would be guaranteed to converge to the fixed point at a geometric rate. Unfortunately, we have been unable to prove that the modulus is strictly less than one, and in Monte Carlo exercises we find that it sometimes contracts very slowly. This contrasts with the random coefficients logit specification used in Berry, Levinsohn and Pakes (1995), for which the modulus of contraction was less than one, and we did not have these kinds of problems in using the contraction to find the inverse.

Therefore, we turn to standard fixed-point computational methods, such as homotopy [cites]. In our variant of the homotopy method, we begin with the standard fixed-point homotopy equation

h(ξ, t, ξ^0) = (1 − t)(ξ − ξ^0) + t (ξ − r(s, ξ)),

where t is a parameter that takes values between zero and one, ξ^0 is an initial guess for ξ, and the function r(s, ξ) returns the element-by-element inverse of the market share function (see equation (21)). For each value of t, we consider the value of ξ that sets h(ξ, t, ξ^0) to zero. Call this ξ(t, ξ^0). For t = 0, this solution is trivially the starting guess ξ^0. For t = 1, this solution is the fixed point that we are looking for. Homotopy methods suggest starting at t = 0, where the solution is trivial, and slowly moving t toward one. The series of solutions ξ(t) should then move toward the fixed-point solution ξ(1). If t is moved slowly enough, then by continuity the new solution should be close to the old solution and therefore easy to find (e.g., by a Newton method starting at the prior solution).

For our problem, it turns out that a version of the fixed-point homotopy is a strong contraction when t < 1. In particular, re-write the homotopy equation h(ξ(t, ξ^0), t, ξ^0) = 0 as

ξ(t, ξ^0) = (1 − t) ξ^0 + t r(s, ξ(t, ξ^0)).   (37)

This suggests a recursive solution method, taking an initial guess, ξ, for the solution ξ(t, ξ^0) and then placing this guess on the RHS of (37) to create a new guess, ξ':

ξ' = (1 − t) ξ^0 + t r(s, ξ).   (38)
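The recursion in (37)-(38) is easy to sketch, as below. The sketch is ours and is illustrative only: the element-by-element inverse r(s, ·) is passed in as a black box with the observed shares already baked in (in practice each component r_j requires a one-dimensional root find of (21) using the simulated share function), the schedule for moving t toward one is an arbitrary choice, and the switch to a Newton step near t = 1 that the text mentions is omitted.

    # Illustrative sketch of the damped fixed-point recursion (38), stepping the
    # homotopy parameter t toward one. `r` is a user-supplied element-by-element
    # inverse of the market share function (equation (21)) with s fixed inside it.
    import numpy as np

    def invert_by_homotopy(r, xi0, t_grid=(0.5, 0.9, 0.99, 1.0), tol=1e-10, max_iter=5000):
        """Solve xi = r(xi) by iterating xi' = (1 - t) * xi0 + t * r(xi) for each t,
        warm-starting each stage at the previous stage's solution."""
        xi = np.array(xi0, dtype=float)
        for t in t_grid:
            for _ in range(max_iter):
                xi_new = (1.0 - t) * np.asarray(xi0) + t * r(xi)   # recursion (38)
                if np.max(np.abs(xi_new - xi)) < tol:              # sup-norm convergence check
                    xi = xi_new
                    break
                xi = xi_new
        return xi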

For t < 1, this is a strict contraction mapping and the solution to the homotopy equation is very fast and reliable for values of t that are not very close to one (the proof relies on the properties of the best-reply function r.) As t approaches one, the modulus of contraction for fixed point in (37) also approaches one. Thus when t is very close to one (when we are very close to the final answer to the inverse), we may not be able to rely on the contraction property, but instead use a Newton method to zero the equation. In practice, we find that it is sometimes necessary to move t very slowly as it approaches one. 4.3 Computational Comparison to the Model with Tastes for Products. Gathering the results of prior sections we have two theoretical reasons for expecting the computational burden of the pure characteristics model to differ from the computational burden of the model with tastes for products. First, the number of simulation draws needed to get accurate estimates of the inverse, and hence of the moment conditions, must grow at rate J 2 in the model with a taste for products, while it need only grow at rate J in the pure characteristics model. Second the contraction mapping used to compute the inverse is expected to converge at a geometric rate for the model with tastes for products, but we do not have such a rate for the pure characteristics model. The first argument implies that computation should be easier in the pure characteristics model, the second that computation should be easier in the model with tastes for products. Of course which of the two effects turns out to dominate may well depend on features of the data being analyzed; the number of products, the number of important characteristics,.... 5 Monte Carlo Results. We have performed some limited Monte Carlo experiments to test our algorithm. Eventually, we will consider how the pure characteristics model compares to the model with tastes for products when neither model is quite correct; we are particularly interested in the case where the true model has a fixed dimension, but where there are multiple unobserved dimensions of product characteristics for which consumers have heterogeneous tastes. This seems like a realistic benchmark and each of the simple models is then just an approximation to a more complicated data generating process. For now, we have only some preliminary Monte Carlo Results to test 19

whether the algorithm is working at all. We have generated data from a model that assumes the utility function

u_{ij} = θ_1 + x_j β_i − α_i p_j + ξ_j.   (39)

There is a single x_j drawn from U(0, 1), and ξ_j is drawn from the same distribution. Price, p_j, is set equal to ξ_j^2, which obviously ensures that price and ξ_j are correlated and also helps to ensure that each randomly drawn product is purchased with positive probability. [Footnote 5: In particular, this ensures that the consumer with β_i = 0 will purchase each good.] The consumer sensitivity to price, α_i, is also drawn from U(0, 1).

In the Monte Carlo exercise, we attempt to estimate the intercept θ_1 plus the parameters of the distribution of β_i. We draw the true β_i's from a symmetric distribution with discrete support. The support is (−θ_3, θ_2, θ_3), with true values of (−1, 0, 1). The symmetric probabilities associated with these points are denoted Prob(β_i = θ_3) = Prob(β_i = −θ_3) ≡ θ_4 and Prob(β_i = θ_2) = 1 − 2θ_4. The true value of θ_4 is 1/3, so that each element of the support is drawn with equal probability. The discrete distribution for β_i reduces computational time greatly, which is particularly important in a Monte Carlo exercise. In future work, we will further investigate the use of simulated probabilities based on smooth distributions.

The data are created for T markets with N_t firms in each market. N_t is drawn randomly for each market. In our first example, N_t is drawn from the set {6, 7, 8, 9, 10}, with equal probability on each outcome. The number of markets is set equal to 40. The instruments are the firm's own x, plus the x's of the firms that are adjacent in the ordering of the x's (to capture some idea of local competition). In addition, the number of firms in the market and the sum of the x's of all firms in the market are used as instruments.

Since this is just a Monte Carlo exercise, we estimate only the intercept and the parameters of the β distribution. We impose that the taste distribution is symmetric. The parameters are then θ_1, the intercept; θ_2, the center point of the β distribution; θ_3, the spread of the β distribution (defined above); and θ_4, the probability that β_i = θ_3. A schematic of this data generating process appears below.
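The sketch of the design that follows is ours and fills in details the paper does not spell out: we follow the support values (−1, 0, 1) quoted in the text where they differ from Table 1, and the use of 5,000 simulated consumers per market to form shares is an arbitrary illustrative choice.

    # Schematic of the Monte Carlo design around eq. (39). Our own sketch: the number
    # of simulated consumers per market and the way shares are formed are assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    theta1 = 2.5                                   # intercept (true value from Table 1)
    beta_support = np.array([-1.0, 0.0, 1.0])      # support of beta_i, equal probabilities
    T = 40                                         # number of markets

    markets = []
    for t in range(T):
        J = rng.choice([6, 7, 8, 9, 10])           # number of firms in market t
        x = rng.uniform(size=J)                    # single observed characteristic, U(0,1)
        xi = rng.uniform(size=J)                   # unobserved characteristic, U(0,1)
        p = xi ** 2                                # price, correlated with xi by construction
        n_cons = 5000                              # simulated consumers per market (assumption)
        beta = rng.choice(beta_support, size=n_cons)
        alpha = rng.uniform(size=n_cons)           # price sensitivity, U(0,1)
        u = theta1 + np.outer(beta, x) - np.outer(alpha, p) + xi   # eq. (39); outside good = 0
        choice = np.where(u.max(axis=1) > 0, u.argmax(axis=1) + 1, 0)
        shares = np.bincount(choice, minlength=J + 1) / n_cons     # index 0 is the outside good
        markets.append({"x": x, "p": p, "shares": shares})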

The results from 100 Monte Carlo estimations are given in Table 1. The results are mildly encouraging that the routine is working, although the standard error of the mean estimated θ_4 is implausibly low given its difference from the true value. It may be that 100 Monte Carlo simulations is not enough, or else that some small sample effect is at work.

Table 1: Results from 100 Monte Carlo Estimations

  Parameter   Description         True Value   Mean Estimate   SD of Mean
  θ_1         intercept           2.5          2.508           0.0087
  θ_2         mean β_i            0.5          0.454           0.0089
  θ_3         spread              1.0          1.650           0.1980
  θ_4         Prob(β_i = θ_3)     0.333        0.270           0.0095

6 Calculating a Price Index

In this section, we consider the problem of calculating a true price index from the estimated demand parameters of an explicit model. Quite obviously, if the estimated demand model is incorrect, then the price index will also be incorrect; the interesting question is the direction of the bias under different conditions. Petrin (2000) has shown how the i.i.d. errors of the traditional discrete choice model may bias upward the estimated welfare gains of a new product. This effect might tend to decrease a calculated price index when new goods are introduced.

There is another effect, however, that is present when introducing goods into a finite-dimensional product space. If the product is located in the center of the existing product space, then the aggregate consumption of the goods is unlikely to increase, although the new product brings in extra utility for some consumers (because of a better match to their preferences) [cites to earlier discussions of this issue]. A logit-like model will have to explain why the new product did not increase aggregate shares despite introducing a new ε_ij. The result will be to reduce the estimated mean utility levels of all the products. This will in turn reduce the estimated consumer welfare in the post-product-introduction period. In some cases, counter to the Petrin example, the effect can be to under-estimate the benefit of the new product. [Footnote 6: Nevo (1999b) discusses an example where this effect in the logit model just offsets other specification errors and produces the correct result; however, Nevo's example does not