Ec2610 (IO) Problem Set I Differentiated Product Demand Suggested solutions

Ariel Pakes, TF Ashvin Gandhi*
Due: October 14, 2015
October 14, 2015

* agandhi@fas.harvard.edu. These solutions are attributable to Daniel Pollmann.

Preliminaries

Cut from solutions.

1 Background on demand estimation

Cut from solutions.

2 Estimation exercise

2.1 Setting

Cut from solutions.

2.2 Data description

For the empirical exercise, we are giving you data on T = 10 markets. In these markets, 11 different firms sell a total of J = 247 products. All of the products are unique, so none of them are offered in multiple markets. The dataset is simulated, but you can still think of a product as a passenger vehicle with a set of characteristics if you like, although the units do not have an interpretation. The dataset contains the following pieces of data, where products are ordered by market (1-10):

• prodsmarket: T-vector of the number of products in each market
• share: J-vector of market shares
• f: J-vector denoting the firm that sells the product
• ch: J × 4 matrix of constant and three product characteristics
• pr: J-vector of prices
• costshifters: J × 2 matrix of cost shifters

2.3 Basic summary statistics

1. Prepare a table with the following pieces of information for each market: How many firms are active? How many products do they market in total? What fraction of agents bought one of the goods in the sample period?

This question just tries to get you acquainted with the market structure in the data. The last question has some significance because it relates to the definition of the outside good, which is not innocuous.

Some of you asked about the definition of the outside good. That's a good point, since it is true that this may involve a decision on the part of the researcher. First, note that whether you care about this depends a bit on what you're trying to model/estimate. For the demand-side parameters without supply-side information, this doesn't matter that much, since it just affects the value of the constant term in the utility (verify this). One way of thinking about the potential market is how many people would purchase a product if the vector of utilities went to infinity (then the 1 in the share denominator gets dominated). This could happen if the prices of all products went to negative infinity. That seems like a sweet deal for consumers, and we'd think they'd all want one or more units. So you see the trouble here.
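For the pure logit case, the claim above that the market-size choice mostly moves the constant can be checked in one line. The notation here is introduced only for this sketch and does not appear in the problem set: $q_{jt}$ is the quantity sold of product $j$ in market $t$, $M_t$ is the assumed market size, $\tilde M_t$ is an alternative market size, and $Q_t = \sum_k q_{kt}$ is total inside sales.

```latex
\begin{aligned}
\delta_{jt} &= \ln s_{jt} - \ln s_{0t}
             = \ln\frac{q_{jt}}{M_t} - \ln\frac{M_t - Q_t}{M_t}
             = \ln q_{jt} - \ln(M_t - Q_t), \\
\tilde\delta_{jt} - \delta_{jt} &= \ln(M_t - Q_t) - \ln(\tilde M_t - Q_t)
\quad \text{for every product } j \text{ in market } t .
\end{aligned}
```

Changing the market size therefore shifts all mean utilities within a market by the same amount and leaves differences across products untouched. The shift is absorbed exactly by the constant only if it is the same in every market (or if market-specific intercepts are included); otherwise it is approximately a change in the constant. With random coefficients the inversion has no closed form, so this argument should be read as specific to the pure logit.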
Sometimes, it's okay to assume that everyone is potentially in the market; the potential market for newspapers could sensibly be the number of households, at least in a local area. In certain cases, survey data tells us who considered buying a product in a given year, so that's something (though more people might have considered with prices at negative infinity). In the new cars setting in BLP, part of the trouble stems from the fact that we're looking at sales of a durable good, so consumers may really like a product, they just don't buy every year (notice how this is different from modeling, e.g., market shares of cable TV providers, or cereal). If we don't want to set up a full-blown model of dynamic demand, we need to make some assumption on how many people were in the market for a new car in a given year. Since we're not actually interested in market shares at prices equal to negative infinity, we want to choose a value that makes sense when considering local price variation.

2. Prepare a table with summary statistics for market share, characteristics, price, and cost shifters. Please include mean, median, minimum, maximum, and standard deviation. You can inspect these statistics separately for each market, but in what you report, you may pool all markets.

This is always a useful thing to do when working with a new dataset, in part to get a sense for the variables you have at your disposal, in part to make sure there aren't any obvious errors in your data (never trust your TF).

Some of you said that it was difficult to think about which orders of magnitude made sense for the coefficients in the discrete choice model. One first guide is obviously the orders of magnitude of the characteristics. Almost more importantly though, we want to think about the variation in each variable, because this variation, scaled by its coefficient, determines the variation in utility across the different choices arising from that particular characteristic (admittedly, it is difficult to have a prior for this when nobody has told you what the characteristics are). Given the quasilinear specification, I suppose the utility we model is not even purely ordinal.
In the pure logit model, an increase in mean utility by one means that, given a large number of products in the market, the share will multiply by $e^1 \approx 2.72$, so if your characteristic is horsepower, a coefficient of 0.01 probably makes more sense than a coefficient of 1. For the purposes of the statistical model, the more important normalization arises from the fact that we have set the scale parameter of the Type-I EV distributed errors to one, resulting in logit errors. Similarly, a probit model typically assumes standard normal errors. It is important to be aware that in discrete choice, it is not just mean utility that matters for probabilities and shares, but also the variance, and the covariance with the utility of other products. Modeling the latter flexibly is one of the defining features of BLP.
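The factor of $e$ quoted above can be made explicit. In the sketch below, $D$ is shorthand introduced here for the logit denominator; $\delta_j$ is raised by one while all other mean utilities are held fixed.

```latex
\begin{aligned}
s_j &= \frac{e^{\delta_j}}{D}, \qquad D = 1 + \sum_k e^{\delta_k}, \\
s_j^{\text{new}} &= \frac{e^{\delta_j + 1}}{D + (e - 1)\,e^{\delta_j}}, \\
\frac{s_j^{\text{new}}}{s_j} &= \frac{e}{1 + (e - 1)\,s_j}
\;\longrightarrow\; e \approx 2.72 \quad \text{as } s_j \to 0 .
\end{aligned}
```

So the share multiplies by roughly $e$ only when the product's own share is small, which is the sense of "a large number of products" above. For a product with $s_j = 0.2$, for instance, the factor is $e / (1 + 0.2(e-1)) \approx 2.02$.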

2.4 Pure logit model

1. Suppose agents have the following utility function, where i denotes the agent, and j denotes the product:

$$u_{ij} = \underbrace{x_j \beta - \alpha p_j + \xi_j}_{\delta_j} + \epsilon_{ij},$$

where $\epsilon_{ij}$ is an iid error following a standard Type-I Extreme Value distribution with $F(\epsilon) = e^{-e^{-\epsilon}}$ ("logit errors"). Suppose further that the firms know $\xi$ when setting prices, but that they cannot use this information to adjust characteristics in the short run.

(a) What statistical assumptions can you make based on this? Which of your conditions, based on data provided to you, identify the parameter vector of interest, $\theta = (\alpha, \beta)$? In other words, what are valid (and relevant) instruments? Is the model over-identified?

The usual assumption is that characteristics and cost shifters are orthogonal to the unobserved product characteristic, but price is not. In this case, even with just the cost shifters as additional instruments, the model is over-identified. If we strengthen orthogonality to mean independence, we can use any function of these instruments as instruments, though they would only be helpful for identification if they brought in additional variation.

(b) Show how you can invert market shares to obtain the mean utility level $\delta_j$ for each product.

See the BLP slides.

(c) Estimate $\theta = (\alpha, \beta)$ and provide standard errors for your estimate. You can try different combinations of instruments, but please use all the different types of instruments that are included or can be constructed from the data (i.e., "BLP instruments").

See the GMM slides for estimation and inference.

2. Estimate and present the matrix of cross- and own-price elasticities for market 10 based on your model and parameter estimates. [1]

[1] The symmetry of the matrix of share derivatives with respect to price (though not of elasticities) may have reminded you of the symmetry of the Slutsky matrix of the derivative of uncompensated demand with respect to price (Mas-Colell, Whinston and Green, 1995, Proposition 3.G.2). Anderson, de Palma and Thisse (1992, p. 67) (available on Hollis) show that it also holds in discrete choice settings with constant marginal utility of income in the region of interest, e.g., with quasilinear utility. Goolsbee and Petrin (2004) take advantage of symmetry to identify a cross-price derivative which would not be identified otherwise due to lack of variation in one of the prices.
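The share inversion in 1(b) and the elasticity matrix in question 2 have closed forms under pure logit. The following is a minimal sketch for a single market, not the course solution code: the shares, prices and the value of `alpha` are made-up numbers, and all variable names are chosen here for illustration.

```matlab
% Minimal sketch for one market; all numbers are hypothetical.
s     = [0.20; 0.10; 0.05];      % inside-good market shares
p     = [1.0; 1.5; 2.0];         % prices
alpha = 0.8;                     % price coefficient (utility contains -alpha*p)

% Logit inversion: delta_j = log(s_j) - log(s_0)
s0    = 1 - sum(s);
delta = log(s) - log(s0);

% Check: shares implied by delta reproduce the observed shares
s_hat = exp(delta) ./ (1 + sum(exp(delta)));

% Share derivatives, dsdp(j,k) = ds_j/dp_k:
%   alpha*s_j*s_k for j ~= k, and -alpha*s_j*(1 - s_j) for j == k
dsdp = alpha * (s * s') - alpha * diag(s);

% Elasticities, E(j,k) = (ds_j/dp_k) * p_k / s_j
J = numel(s);
E = dsdp .* (ones(J,1) * p') ./ (s * ones(1,J));
disp(E)
```

Within each column of `E`, all off-diagonal entries are identical (equal to `alpha*s(k)*p(k)`), which is the restrictive substitution pattern that motivates the random-coefficient model. The matrix `dsdp` is symmetric while `E` is not, in line with footnote 1.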

3. In the next question, we are going to free up the substitution pattern by introducing random coefficients as in BLP. Alternatively, we could think about implementing nested logit, the pure characteristics model, or multinomial probit. Would they be appealing in this setting? Why or why not?

See the BLP slides and lecture notes.

4. Bonus question: Explain how Hausman instruments work, and under which assumptions they are valid. Can you form them in this case? How or why not?

Hausman instruments (MIT's Jerry Hausman of Hausman test fame) are prices of the same product in other markets (Hausman, 1996). They are valid provided these prices are uncorrelated with the demand shock in the market in question. They are hard to form here because we do not see the same product in different markets in our data.

2.5 Random-coefficient logit model

1. Suppose agents have the following utility function, where i denotes the agent, and j denotes the product:

$$u_{ij} = \underbrace{x_j \beta - \alpha p_j + \xi_j}_{\delta_j} + \sum_{k \in \{1,2\}} \sigma_k \nu_{i,k} x_{j,k} - \sigma_p \nu_{i,p} p_j + \epsilon_{ij},$$

where $\epsilon_{ij}$ is an iid error following a standard Type-I Extreme Value distribution, and $\nu_{i,\cdot} \overset{iid}{\sim} N(0, 1)$ is an iid standard normal error. To summarize: the model is as before, but with random coefficients on the constant, the first characteristic, and price. The orthogonality/exogeneity assumptions remain the same as before.

(a) What is the contraction mapping used here for the inner loop? Is there a way to reduce the computational burden from the contraction mapping? (Hint: take a look at page 4 of the appendix to Nevo (2000).) In the following, make sure to set the inner tolerance level for the contraction mapping very tight, in your final run ideally on the order of $10^{-14}$.

See Nevo's appendix. In practice, the exponentiated version of the contraction mapping tends to be used.

(b) Write the parameter vector of interest as $\theta = (\theta_1, \theta_2)$, where $\theta_1$ are the linear parameters, and $\theta_2$ are the nonlinear parameters. Which parameters are in $\theta_1$ and which are in $\theta_2$? What does this imply for estimation?

The linear parameters are $\alpha$ and $\beta$. The nonlinear parameters are $\sigma$.

(c) Bonus question: Explain how the variance terms $\sigma$ are identified from variation in the choice set and prices.

The ideal experiment would be to randomly remove products from certain markets and observe which fraction of consumers reallocates to the different alternatives. If they chose a product particularly close in one characteristic, but not in others, then we would think that the variance of the random coefficient on that characteristic should be particularly large. Removing a product from the choice set is the same as sending its price to infinity, so we would think that reallocation of market shares based on variation in prices is also useful, although in both cases, we should also ask ourselves why a product is missing from the choice set or its price is higher (the two would probably occur under opposite circumstances). The recent papers I refer to in the preliminary section provide a much more formal treatment.

(d) Estimate the model using 2-step optimal GMM. In addition to your point estimates, please provide standard errors. (Hint: take a look at page 6 of the appendix to Nevo (2000) for analytic standard errors, and/or use finite differences for a numerical approximation.) If you try different starting values, do your estimates change?

You may have found some sensitivity with respect to starting values. Search algorithms are at best guaranteed to find a local minimum. In practice, it is useful to check whether you have only found a local minimum by trying out different starting values. Provided you can find all the local minima, the minimum of that set is the global minimum, so try to compare the objective function values.
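The multi-start check described above can be sketched as follows. This is an illustration only: the one-dimensional function `obj` is a toy with two local minima that stands in for the GMM objective, the starting values are arbitrary, and `fminsearch` is simply a convenient derivative-free optimizer rather than necessarily the one used in the course code.

```matlab
% Toy objective with two local minima; a stand-in for the GMM objective.
obj = @(t) (t.^2 - 1).^2 + 0.3*t;

% Tighter termination rules than the defaults, and more evaluations allowed
opts = optimset('TolFun', 1e-8, 'TolX', 1e-8, ...
                'MaxFunEvals', 5000, 'MaxIter', 5000);

starts = [-2, -0.5, 0.5, 2];         % arbitrary starting values
res    = zeros(numel(starts), 3);    % columns: argmin, objective value, exit flag

for i = 1:numel(starts)
    [t_hat, f_hat, flag] = fminsearch(obj, starts(i), opts);
    res(i, :) = [t_hat, f_hat, flag];
end

disp(res)                            % compare objective values across runs
[~, best] = min(res(:, 2));          % keep the run with the lowest objective
t_best = res(best, 1);
```

Runs that end at different arguments with different objective values indicate multiple local minima; the third column reports why each run stopped.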
You should also look at and interpret the reason that the search algorithm terminated, and you might want to set its termination rules to stricter levels: continue with smaller changes in the objective function value or the value of the argument and allow more function evaluations.

Please have a look at my code to see how I use the bootstrap to adjust my standard errors for the simulation error.

2. Compare the cross- and own-price elasticities for market 10 for the RC logit and pure logit model.

We should see more substitution to products that are close in the characteristics space.

3. We are assuming here that demand in all markets is identical. With data on the distribution of income within each market, how could you let the distribution of $\alpha_i$ vary systematically across markets?

BLP parametrizes price sensitivity as a function of income. To simulate income, they draw from a lognormal distribution. If the income distribution varied by market, this would be useful to incorporate to have a better model for price sensitivity.

4. Let's assume, only for this question, that you had micro moments as in Berry, Levinsohn and Pakes (2004): the covariance of consumer attributes and product characteristics as well as the covariance between first- and second-choice characteristics. How could you integrate them into your estimation procedure to improve the precision of your estimates? Which of the coefficients would each of the different sets of moments be particularly useful in pinning down?

Your model should predict covariances similar to those in your sample, so you could add these moments to your GMM procedure. The covariance of consumer attributes and product characteristics helps with the coefficients on observed household heterogeneity (the coefficients $\beta^o$ on the interaction terms of household attributes and product characteristics), while the covariance between first- and second-choice characteristics helps with the coefficients on the unobserved heterogeneity (the standard deviations of the random coefficients based on typically normal draws).

5. Suppose in addition that each product has the following marginal cost structure:

$$mc_j = [\, x_j \;\; cs_j \,]\, \gamma + \omega_j,$$

where cs is the J × 2 matrix of cost shifters.

(a) Explain how you can estimate $\gamma$ based on your estimates for the demand side and different assumptions on the supply side (product-level profit maximization, firm-level profit maximization, collusion to maximize total profit). How could you obtain valid standard errors?

One way to estimate the supply side parameters is to solve for the marginal cost vector implied by the demand side estimates and run a simple regression of it on the characteristics and cost shifters.
You see this done in some papers, when for some reason joint estimation was too burdensome/difficult or somehow not possible. The disadvantage of this two-step estimator is a loss of efficiency relative to the joint GMM estimator.
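The first step of this two-step approach, backing out marginal cost from the pricing first-order conditions, can be sketched as below. Several things here are assumptions made for brevity rather than part of the solutions: the share derivatives are those of the pure logit (with random coefficients they would be simulated instead), the data are three made-up products in one market, and the names `O_product`, `O_firm` and `O_collude` are illustrative.

```matlab
% Hypothetical single-market example; all numbers are made up.
s     = [0.20; 0.10; 0.05];   % market shares
p     = [1.0; 1.5; 2.0];      % prices
firm  = [1; 1; 2];            % firm selling each product
alpha = 2.0;                  % price coefficient (utility contains -alpha*p)

% Pure logit share derivatives, dsdp(j,k) = ds_j/dp_k
dsdp = alpha * (s * s') - alpha * diag(s);

% Ownership matrices: O(j,k) = 1 if j and k are priced jointly
J = numel(s);
O_product = eye(J);                                        % product-level
O_firm    = double(firm * ones(1,J) == ones(J,1) * firm'); % firm-level
O_collude = ones(J);                                       % full collusion

% FOC: s + (O .* dsdp') * (p - mc) = 0  =>  mc = p + (O .* dsdp') \ s
mc_product = p + (O_product .* dsdp') \ s;
mc_firm    = p + (O_firm    .* dsdp') \ s;
mc_collude = p + (O_collude .* dsdp') \ s;

disp([mc_product, mc_firm, mc_collude])
```

For given prices and shares, a less competitive conduct assumption implies larger markups and hence lower implied marginal costs. The implied cost vector under the chosen assumption then serves as the dependent variable in the regression on characteristics and cost shifters.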

Your usual standard errors from the above regression would not be valid. Instead, you would need to account for the fact that the marginal cost plugged into the regression is an estimate of the true marginal cost $mc_j$ stated above. The true cost includes $\omega_j$ (that is not the issue), but there is an additional error that stems from the fact that the demand-side parameters plugged in to solve for marginal cost are themselves estimated. So, since marginal cost is a function of the estimates of the demand-side parameters, which are random, it is random itself. Formally, the problem arises because the derivative of the moment condition used to estimate the supply-side parameters with respect to the parameters that require plug-in estimates is different from zero. [2] If you want to or have to estimate demand-side parameters and supply-side parameters sequentially, you can correct the standard errors analytically or using a bootstrap procedure (notes). With joint GMM, you do not have this issue and your standard errors are obtained the usual way.

(b) Can you also estimate the demand side and supply side jointly? How would you do so? What are the linear and nonlinear parameters now? Can joint estimation improve the precision of your estimates for the demand side parameters? Please explain how. Is there a caveat?

The supply side provides you with additional moment conditions ($\omega$ interacted with instruments), which you could add to your GMM procedure. The supply side parameters are linear parameters, but $\alpha$ is now a nonlinear parameter because it appears nonlinearly in the marginal cost equation. This implies that $\alpha$ becomes one of the parameters which we search over in the outer loop.

Hayashi (Econometrics, p. 272) discusses conditions under which single-equation and multiple-equation estimates will be numerically or asymptotically equivalent.
Clearly, if both equations are just-identified, the results will be numerically identical, but if at least one of them is overidentified, the efficient multiple-equation GMM estimator will generally be more efficient. The exception is if the off-diagonal block matrices are zero; in our case, under conditional homoskedasticity, this would reduce to $\xi$ and $\omega$ being uncorrelated. While we did not generate them with any correlation, it is conceivable that the unobserved characteristic $\xi$ valued by consumers actually has a cost of production that appears in $\omega$.

There are several questions which I wasn't able to formally answer in spite of some serious attempts at rewriting the variance matrices applying different results on partitioned matrices and matrix equalities: if we add a just-identified system (exhibiting positive cross-correlation with the original system), will the variance of the parameters in the original over-identified system decrease? Does it depend on whether any of the parameters of the original system appear in the new system? I would like to know this because I am adding a just-identified supply side (just-identified in terms of additional parameters) to my over-identified demand side, and I would like to know whether the variance of my estimates improves.

I did a simplified numerical experiment in Matlab, since I couldn't prove things formally (yet), and it turns out that for many random draws of covariance and gradient matrices, the demand-side parameter variance never changed when we added a supply side; only the supply-side parameters had a lower variance than otherwise because of joint estimation (the latter assuming no common parameters). The demand-side variance did not decrease even when I put demand-side parameters in the supply-side equation, as is true in BLP. This is in line with some notes I have seen, but which gave no proof. The above is no proof either, but I find it fairly convincing because I randomly generated many systems and tried different configurations. Try it yourself; the code is in the appendix.

This implies that to get any asymptotic efficiency gains from using the supply side, we had better make sure it is over-identified. BLP uses the whole set of instruments on the supply side, but it is not clear whether the additional moment conditions are actually over-identifying, since the additional instruments need to actually be relevant for the supply-side regressors.

The other caveat is that additional moments only help when they are valid. Otherwise, they will bias the parameter estimates.

[2] The case where this derivative is zero is often referred to as the adaptive case, while the non-zero case is referred to as non-adaptive. However, the term adaptiveness also has a different meaning in the statistics literature on semiparametric efficiency (e.g. van der Vaart, 1998, p. 223). I digress.
In this example, if the supply side is incorrectly specified (it requires us to make an assumption on the market structure), this contamination will also feed into the estimation of the demand side parameters.

The next issue is that most of the above is a large-sample analysis, i.e., as the number of observations tends to infinity. In finite samples, we may well encounter issues related to weak or many instruments problems, in which we are essentially overfitting the first stage of the IV problem and thereby allowing too much noise to pass through nonlinearly, leading to bias.

(c) Estimate the demand side and supply side parameters jointly under the assumption that firms maximize total firm profits. You don't need to provide standard errors. (Hint: since we have two sets of moment conditions, we can use multiple-equation GMM for the linear parameters. [3])

(d) Bonus question: suppose you had to select among the different pricing assumptions; how could you do so? Hint: have a look at Nevo (2001).

Nevo compares margins from the model to accounting margins and concludes that multiproduct Bertrand-Nash cannot be rejected. Alternative tests are discussed in Nevo (2001, pp. 334-35). One issue with these statistical tests, including tests of overidentifying restrictions, but generally any test of narrow or point hypotheses, is that any model will be misspecified to some extent, since it is just a model. Failure to reject may then simply imply that we do not have enough observations.

2.6 Counterfactual (bonus exercise)

Firms 1 and 10 have announced their intention to merge. [4] As the regulator in market 9, you are concerned that this will lead to an increase in prices and a decrease in product variety. Are these concerns justified? Please explain carefully and show your estimates. What concerns do you have about this counterfactual?

For this question, you need to recompute the pricing equilibrium given your estimates of the demand side and the cost structure, but with a matrix in the FOC that takes into account the change in the ownership structure. One issue is what to do with the vector $\xi$; since we think of it not as noise, but a characteristic that households observe (just the econometrician doesn't), it probably makes sense to use the residuals $\hat\xi$. In order to recompute the pricing equilibrium, you could either solve the system of nonlinear equations given by the set of FOCs or use fixed-point iteration.

To speak to the two concerns of the regulator, you should compare prices before and after in some disciplined way. The question about product variety essentially asks whether any firm would decide to remove their product from the market. One way to check this would have been to go through all the possible combinations of sets of products which can be offered and compare implied profits.
However, removing a product from the market is the same as setting its price to $\infty$, so if the latter is not a pricing equilibrium, no firm would want to remove a product from the market. Indeed, it seems that with logit demand and in the absence of fixed costs, no product will ever be removed. This is clearly an important modelling concern, since we do not want our model to rule out situations we deem important in reality.

The impact of entry and exit on prices of incumbents is actually ambiguous with heterogeneous consumers. Suppose a low-price product exits the market. An incumbent product loses a competitor, so should it increase its price? Not necessarily, because it may now sell to a greater number of more price sensitive consumers. BLP-type models are thus also much richer in terms of the competitive responses they can rationalize, though in your specific application, you would still want to verify how restrictive your model is.

[3] See, e.g., Eric Zivot's notes on multiple-equation linear GMM; notice that he flips the notation for X and Z relative to their typical use. Hayashi's unmistakably titled textbook Econometrics (2000) devotes its entire chapter 4 to the topic, and Zivot's notes appear to be based on it.

[4] Mergers will be discussed in lecture later on in the course.

3 Feedback

Thanks to those of you who provided feedback, it's much appreciated.

References

Anderson, Simon P., André de Palma and Jacques-Francois Thisse. 1992. Discrete Choice Theory of Product Differentiation. Cambridge, Mass.: MIT Press.

Berry, Steven, James Levinsohn and Ariel Pakes. 2004. "Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market." Journal of Political Economy 112(1):68-105. URL: http://ideas.repec.org/a/ucp/jpolec/v112y2004i1p68-105.html

Goolsbee, Austan and Amil Petrin. 2004. "The Consumer Gains from Direct Broadcast Satellites and the Competition with Cable TV." Econometrica 72(2):351-81. URL: http://ideas.repec.org/a/ecm/emetrp/v72y2004i2p351-381.html

Hausman, Jerry A. 1996. "Valuation of New Goods under Perfect and Imperfect Competition." In The Economics of New Goods. NBER Chapters, National Bureau of Economic Research, Inc, pp. 207-248. URL: http://ideas.repec.org/h/nbr/nberch/6068.html

Mas-Colell, Andreu, Michael D. Whinston and Jerry R. Green. 1995. Microeconomic Theory. Oxford University Press.

Nevo, Aviv. 2000. "A Practitioner's Guide to Estimation of Random-Coefficients Logit Models of Demand." Journal of Economics & Management Strategy 9(4):513-48. URL: http://ideas.repec.org/a/bla/jemstr/v9y2000i4p513-548.html

Nevo, Aviv. 2001. "Measuring Market Power in the Ready-to-Eat Cereal Industry." Econometrica 69(2):307-42. URL: http://dx.doi.org/10.1111/1468-0262.00194

van der Vaart, A. W. 1998. Asymptotic Statistics. Cambridge University Press.

A GMM variance experiment

A.1 Introduction

Recall that given the optimal choice of weighting matrix,

$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N\Big(0, \underbrace{(\Gamma' \Delta^{-1} \Gamma)^{-1}}_{V}\Big).$$

Now we divide the set of moment conditions. In BLP, the first set could be the set of demand-side moment conditions, while the second set could be the set of supply-side moment conditions. We also divide the set of parameters into two sets. In BLP, we would put $(\alpha, \beta, \sigma)$ into the first set and $\gamma$ into the second set. Accordingly, we write

$$\Gamma = \begin{bmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{bmatrix}, \qquad \Delta = \begin{bmatrix} \Delta_{11} & \Delta_{12} \\ \Delta_{21} & \Delta_{22} \end{bmatrix}.$$

For example, each column of $\Gamma_{12}$ is the derivative of the first set of moment conditions with respect to an element of the second parameter set. Suppose in the following that $\Gamma_{12} = 0$, while $\Gamma_{21}$ need not be. In BLP, this makes sense because the demand-side parameters appear in the supply-side equation (common parameters), but not vice versa. Clearly, $\Delta_{21} = (\Delta_{12})'$, since $\Delta$ is a covariance matrix and as such symmetric. In the BLP example, under conditional homoskedasticity, $\xi$ and $\omega$ need to be correlated for $\Delta_{21} \neq 0$.

We are particularly interested in the case where the second set of moment conditions is just-identified (in the additional parameters).

A.2 Code

    %% original overidentified system

    % one parameter, two equations
    Gam1 = [1;
            -.7];

    % covariance matrix of the two original moment conditions
    Del1 = [1, -.3;
            -.3, 2];

    % optimal GMM variance
    inv(Gam1'*inv(Del1)*Gam1)

    %% add just identified system

    % new variance: b is the cross-covariance block, d the covariance of the
    % added moments (both random, symmetric by construction)
    a = rand(2);
    b = triu(a) + triu(a,1)';
    c = rand(2);
    d = triu(c) + triu(c,1)';

    Del2 = [Del1, b;
            b', d];

    % no common parameters
    Gam2 = blkdiag(Gam1,[.9, -.3; -.23 -.18]);

    % optimal GMM variance
    % result: variance of original parameter constant
    inv(Gam2'*inv(Del2)*Gam2)

    % common parameters: the original parameter enters the first added moment
    Gam2(3,1) = rand(1);

    % optimal GMM variance
    % result: variance of original parameter constant
    inv(Gam2'*inv(Del2)*Gam2)
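To repeat the experiment over many random systems, as described in the main text, a loop along the following lines could be used. This is a sketch rather than the author's code: the dimensions follow A.2 (two original moments, one original parameter, two added moments with two new parameters), the covariance matrix is built as `A*A' + I` so that it is always positive definite, and the names `G11`, `G21`, `G22` and `maxdif` are chosen here.

```matlab
nrep   = 200;
maxdif = 0;                          % largest relative change seen so far
for r = 1:nrep
    A   = randn(4);
    Del = A*A' + eye(4);             % random positive definite covariance
    G11 = randn(2,1);                % original system: 2 moments, 1 parameter
    G21 = randn(2,1);                % common parameter enters the added moments
    G22 = rand(2) + 2*eye(2);        % added system: 2 moments, 2 new parameters
    Gam = [G11, zeros(2,2); G21, G22];

    V1 = inv(G11' * (Del(1:2,1:2) \ G11));   % original system alone
    V2 = inv(Gam' * (Del \ Gam));            % joint system
    maxdif = max(maxdif, abs(V2(1,1) - V1) / V1);
end
fprintf('largest relative change in original-parameter variance: %g\n', maxdif);
```

One way to see why the variance is exactly unchanged (a sketch, not part of the original notes): premultiplying the stacked moments by the nonsingular matrix $A = \begin{bmatrix} I & 0 \\ -\Delta_{21}\Delta_{11}^{-1} & I \end{bmatrix}$ leaves $(\Gamma'\Delta^{-1}\Gamma)^{-1}$ unchanged and makes the moment covariance block-diagonal, $\mathrm{diag}(\Delta_{11}, S)$ with $S = \Delta_{22} - \Delta_{21}\Delta_{11}^{-1}\Delta_{12}$. With $B = \Gamma_{21} - \Delta_{21}\Delta_{11}^{-1}\Gamma_{11}$ and $\Gamma_{12} = 0$, the partitioned inverse gives $V_{11}^{-1} = \Gamma_{11}'\Delta_{11}^{-1}\Gamma_{11} + B'S^{-1}B - B'S^{-1}\Gamma_{22}(\Gamma_{22}'S^{-1}\Gamma_{22})^{-1}\Gamma_{22}'S^{-1}B$. If $\Gamma_{22}$ is square and invertible, then $\Gamma_{22}(\Gamma_{22}'S^{-1}\Gamma_{22})^{-1}\Gamma_{22}' = S$, the last two terms cancel, and $V_{11} = (\Gamma_{11}'\Delta_{11}^{-1}\Gamma_{11})^{-1}$ whatever $\Gamma_{21}$ and $\Delta_{21}$ are.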