Worldwide Gravity in Online Commerce
|
|
|
- Godwin Robinson
- 10 years ago
- Views:
Transcription
1 Worldwide Gravity in Online Commerce Bo Cowgill UC Berkeley, Haas Cosmina Dorobantu Department of Economics, Oxford University August 27, 2014 Abstract We analyse geographical patterns of cross-country Internet transactions using proprietary data from Google. Spanning all countries from and over 10 billion online transactions, the data allow us to examine a fast growing, but neglected area of trade. We find the elasticity of distance to be around -0.5, indicating that distance effects persist in the virtual marketplace. We also discover that cultural characteristics, such as shared languages or religions, have a large impact in e-commerce, while economic ties, such as a common currency, have an insignificant effect. Our results improve upon an unpublished study of ebay transactions by significantly expanding the set of countries and correcting known econometric biases. JEL Classifications: F10, F14, F19, L86 Keywords: Online trade; gravity; international trade; Internet economics; ecommmerce; distance; trade costs Authors are listed in alphabetical order. The authors thank Google for sharing the data used in this paper. We also thank Beata Javorcik, Marc Melitz, Bertin Martens, Andres Rodriguez- Clare, Pierre-Louis Vezinas, Noam Yuchtman and seminar participants at Oxford, the European Trade Study Group and the European Commission for helpful comments. All errors are our own. Cowgill acknowledges financial support from the Kauffman Foundation and Google. Dorobantu acknowledges financial support from the Economic and Social Research Council. Cowgill: bo [email protected]. Dorobantu: [email protected]. 1
2 1 Introduction Few empirical results in economics have been more widely confirmed than the negative relationship between distance and trade. Since the 1960s, economists have estimated gravity equations to measure the relationship of distance and trade between countries. Head and Mayer (2013) collected over 2,500 of these estimates and calculated the mean of the estimated coefficient on distance. They find that the average of the estimated distance elasticity is around -1, indicating that a 1% increase in the distance between two countries leads to a 1% decrease in trade between them, all else being equal. With the rise of the Internet, an obvious question is whether distance still matters in a world that is more connected every day. The business press has an answer: distance is no longer relevant. Frances Cairncross book, The Death of Distance: How the Communication Revolution is Changing our Lives is a New York Times Business Bestseller. Thomas Friedman s The World is Flat won the Financial Times s Book of the Year Award. Are these authors right? Is distance irrelevant in a digitised, interconnected world? We answer this question by applying traditional methods from international economics to new data about Internet commerce. Online commerce recently passed $1 trillion in transactions and its growth shows no signs of abating. Online transactions are no longer a negligible part of the global economy. Our paper makes the important contribution of establishing whether the rules governing trade in the offline world also apply online. In doing so, we improve on other recent papers on this topic whose results were limited by data availability and econometric methodology. Distance may matter differently online and offline. Exporting in the offline world requires high fixed costs. A company must conduct extensive market research to decide which countries to enter. It needs to know if there is demand for its products, how to market itself, and what prices it can expect in the importing country. Once it makes a decision to enter a particular market, the company faces more costs. It needs to establish a partnership with a distributor or a retailer, it needs to localise its products to ensure that they meet the linguistic, cultural and regulatory expectations of the importing country, and it needs to find solutions to logistical problems such as transportation and storage. All of these activities are costly for the company in both money and time. Exporting in the online world, by comparison, is much easier. As Lendle et al. (2012) argue, information frictions that plague offline trade do not exist in the 2
3 online environment. A company that has an online presence can sell to a foreign country without sending its analysts there to do market research, and without signing a lengthy contract with a distributor or a retailer. Similarly, customers wishing to buy a product from a foreign company face close to zero search costs, as all they need to do in order to connect with a foreign seller is run an Internet search. Allen (2011) estimated information costs to account for 93% of the distance effect in the offline world. In the online environment, Lendle et al. (2012) hypothesise that with no information frictions, distance should have a smaller effect on trade. The question we address in this paper is: how much does distance matter online? We answer this question using a proprietary data set from Google covering online transactions. The data set is the most comprehensive one used to conduct a worldwide gravity analysis relating to online trade to date. It covers more than 10 billion transactions conducted online over a four year timeframe, The data cover all countries, which is a vast improvement over other studies. The most extensive online gravity analysis before ours is Lendle et al. (2012), who employ data from ebay to estimate a gravity equation covering 62 countries. The researchers justify limiting their analysis to 62 countries by noting that 92% of all offline trade flows happen between these 62 states. By omitting the countries that trade little with each other, however, Lendle et al. (2012) risk underestimating the effects of distance. If distance is part of the reason why some countries trade with each other very little, or do not trade at all, excluding the locations that do not trade much with each other will bias the coefficient on distance downward. This has lead the authors to claim a 65% reduction in the effect of distance and a 29% increase in welfare attributable to online sales channels. Besides the comprehensiveness of the data set, we also improve on previous online gravity work by using the methodology proposed by Helpman, Melitz and Rubinstein (2008). All previous studies that examine the effect that distance has on online trade use linear estimators. In their seminal work, Helpman, Melitz and Rubinstein (2008) show that linear estimators suffer from selection and omitted variable biases. With their methodology, we find the elasticity of distance to be around This figure is lower than the -1.1 average coefficient reported by Head and Mayer (2013). It is also lower than the -0.8 estimate that Helpman, Melitz and Rubinstein (2008) obtain by employing the same methodology to traditional trade flows. Although lower, the elasticity of distance we estimate is certainly not 3
4 negligible. We conclude that distance may show signs of abating, but it certainly is still alive in the online world. Section 2 presents an overview of the gravity literature. It describes the studies done to date examining the effects of distance both in the online and the offline world and provides an overview of Helpman, Melitz and Rubinstein (2008). Section 3 describes the empirical approach and the data set. Section 4 presents the results of our analysis and section 5 concludes. 2 Previous Literature 2.1 Gravity Equations Gravity equations have been estimated since the 1960s, based on a simple concept borrowed from physics: trade between two countries depends positively on their sizes and negatively on the distance between them. Although gravity equations performed very well empirically, they had no theoretical foundations until Anderson and van Wincoop (2003), who convincingly showed that besides bilateral impediments, trade also faces multilateral resistance. Chaney (2008), Helpman, Melitz and Rubinstein (2008), and Melitz and Ottaviano (2008) made the second substantial theoretical contribution to the gravity literature, by integrating gravity within heterogeneous firms models. Head and Mayer (2013) provide an excellent summary of the theoretical, as well as the empirical results, of the gravity literature. The variables included in empirical gravity analyses typically measure the effect of the geographic, cultural or economic closeness between trading partners. The two authors examined over 2,500 estimates reported in 159 academic publications to evaluate the main empirical findings. We summarise their results below. Geographical factors The physical distance between two countries is a ubiquitous variable in any gravity equation. Head and Mayer (2013) report the average of the estimated distance elasticity to be around -1. Another common geographical factor considered in gravity equations is whether two countries share a land border or not. Most re- 4
5 searchers find contiguity to have a positive and significant effect on trade. Head and Mayer (2013) report the average of the estimated coefficient on a dummy variable equal to 1 if two countries share a border to be around 0.5. Other geographical characteristics considered in gravity equations are whether the trading countries are landlocked or islands. Due to the limitations these geographical characteristics place on transport options, they are generally found to negatively impact trade (see Limao and Venables (2001), Santos and Tenreyro (2006), Helpman, Melitz and Rubinstein (2008)). Cultural factors Cultural closeness between two countries has a positive effect on the amount of trade between them. Head and Mayer (2013) calculate the average estimated coefficient of a dummy variable equalling 1 if two countries speak the same language to be around 0.4. The average estimated coefficient of a dummy variable equalling 1 if two countries share a colonial link is even higher, taking a value of around 0.8. Other cultural characteristics considered by researchers in gravity settings are whether the trading partners have similar religious beliefs or whether their legal systems share the same origin. These variables are generally found to positively and significantly influence trade (see Helpman, Melitz and Rubinstein (2008)). Economic factors Free trade agreements, as well as a common currency, facilitate trade between two countries. In a gravity setting, the effect of a free trade agreement is measured by including a dummy variable that equals 1 if two countries are part of a free trade area. Head and Mayer (2013) report the average estimated coefficient of this dummy variable to be around 0.4. The effect of sharing a currency on trade is usually measured by including a dummy variable equal to 1 if two countries use the same currency. Head and Mayer (2013) calculate the mean of the estimated coefficient on this dummy variable to be around Internet Trade and Gravity Equations A small literature at the intersection of computer science, economics and quantitative marketing has studied the relationship between economic geography and the Internet. This literature has primarily focused on consumers substitution between online and offline purchasing channels (for example, see Forman, Ghose and 5
6 Goldfarb (2009), Brynjolfsson, Hu and Rahman (2009) or Goolsbee (2001)). Very few researchers have used online transaction data to address trade-related questions. Lendle et. al (2012), Hortacsu, Asis Martinez-Jerez and Douglas (2009), and Blum and Goldfarb (2006) are the most convincing studies to estimate gravity equations on data relating to online activities. However, each of these researchers have faced data limitations, some of which we are able to improve upon. Lendle et. al (2012) work with ebay data and they have access to the most comprehensive data set out of the three papers mentioned above. The authors perform a gravity analysis focusing on 62 countries and find the effect of distance to be 65% smaller on ebay s online platform than offline. In order to produce results that are directly comparable to an offline setting, the authors drop 92% of the observations available in their original data set. Their study, therefore, examines only 8% of ebay s total cross border flows. Furthermore, the researchers report results obtained using a linear estimator, which is known to produce biased results in a gravity setting. The distance coefficient, in particular, suffers from an upward bias in an OLS setting. Hortacsu, Asis Martinez-Jerez and Douglas (2009) look at geographical trade patterns using data from ebay and MercadoLibre, a South American website dedicated to e-commerce and online auctions. The ebay data cover US buyers and sellers over a four-month time frame and the location of the buyer is known for 27% of the transactions. This limits the authors to examine trade patterns only within the United States and only on a quarter of the transactions registered in their data set. Their results show that within the US, distance has a negative effect on trade, but the value of the coefficient is only about 10% of that obtained by studies looking at the offline world. The MercadoLibre data is more complete, but it only allows for an analysis of trade flows between nine South American countries. Blum and Goldfarb (2006) use data from a defunct ISP company to look at the Internet activities of 2,654 American Internet users, who visited websites from 46 countries over the span of three months. Their data are also limiting. They only cover page views, the country of origin of the website visited is difficult to determine and there are questions about the representativeness of the sample, given that there were over 100 million Internet users in the United States in the year examined by the authors. They find that distance matters more for taste-dependent differentiated products and less for homogenous products. As noted in the introduction, we are able to improve upon the existing estimates, 6
7 by using a more comprehensive data set and a methodology that corrects known econometric biases. We find the elasticity of distance in a worldwide gravity setting covering 140 countries to be around -0.5, which is close to the estimate obtained by Lendle et. al (2012). Our estimate is much higher than the one obtained by Hortacsu, Asis Martinez-Jerez and Douglas (2009). We believe this is due to our covering a much broader set of countries. Indeed, when we limit our coverage to online transactions conducted within North America, like Hortacsu, Asis Martinez- Jerez and Douglas (2009) do, we obtain an elasticity of distance around Overview of Helpman, Melitz and Rubinstein (2008) Worldwide gravity exercises have been criticised on two main grounds: first, that the estimating equation is not carefully derived from theory and second, that OLS estimates suffer from selection bias due to ignoring the zero trade flows. Helpman, Melitz and Rubinstein (2008) is the only paper to address both of these issues, which is why we choose to use their methodology in our paper. Helpman, Melitz and Rubinstein (2008) start from a Melitz (2003) model with asymmetric countries. Consumers face a CES utility function. Each country has a number of firms, and each firm produces a distinctive product. A firm produces a unit of output with a cost-minimising combination of inputs, c j a. The parameter a is firm specific and it reflects differences across firms in productivity levels. The cost c j is country specific and it reflects differences across countries in factor prices. A firm from country j can sell its products on the domestic market, or it can export to country i. If the firm wants to export to country i, however, it faces two additional costs: a fixed cost of exporting and a variable cost that is specified in the model as a melting iceberg transport cost. Profit maximisation makes it obvious that while all firms will sell on the domestic market, exporting will be feasible only for a fraction of firms that have a productivity level a above the threshold required for exports from j to i to be profitable. Based on the theoretical model, the authors derive the following estimating equation: m ij = β 0 + λ j + χ i γd ij + w ij + u ij 7
8 where: m ij = country j s exports to country i λ j = fixed effect of the exporting country, j χ i = fixed effect of the importing country, i d ij = symmetric distance between i and j w ij = fraction of firms (possibly zero) that export from j to i u ij = i.i.d. unmeasured trade frictions Traditional estimates are biased due to two econometric problems. First, omitting the fraction of firms that export from j to i creates an omitted variable bias. This fraction is endogenous; it is a function of the productivity threshold required for exports from j to i to be profitable, which in turn is a function of the variables controlling for the trade barriers between the two countries. Omitting w ij from the estimating equation induces an upward bias in the coefficient measuring the effect of the symmetric distance between i and j, as this coefficient picks up the effect of trade barriers on firm-level trade and on the proportion of exporting firms. Second, omitting zero trade flows creates a selection bias. Helpman, Melitz and Rubinstein (2008) underline the fact that there are many zero trade flows observed in the real world. In 1997, only about 40% of country pairs traded in both directions. An additional 12% traded in one direction only, and a surprising 48% of country pairs did not trade at all. We ran a similar analysis for our data from Google and found that between , 25% of all country pairs traded in both directions, about 30% traded in one direction only, and 45% did not trade at all. Given the high percentage of zero trade flows in the online world, it is just as important as it is in the traditional, offline world, to correct for the selection bias created by omitting the zero trade flows. Excluding the country pairs with zero trade flows places a downward bias on γ, the trade barrier coefficient. To correct these biases, Helpman, Melitz and Rubinstein (2008) start by defining a latent variable, Z ij : z ij = ln Z ij = γ 0 + ξ j + ζ i γd ij κφ ij + η ij 8
9 where: z ij = ratio of variable export profits for the most productive firm to the fixed export costs for exports from j to i ξ j = fixed effect of the exporting country, j ζ i = fixed effect of the importing country, i d ij = symmetric distance between i and j φ ij = observed measure of any additional country-pair specific fixed trade costs η ij = u ij + v ij, where v ij is IID N(0, σ ij ) The latent variable z ij is not observed, but positive trade between j and i is observed when z ij > 0 and zero trade is observed when z ij = 0. Let T ij be an indicator variable equal to 1 when country j exports to i and 0 otherwise, and ρ ij be the probability that j exports to i. We can then specify the Probit equation: ρ ij = Pr(T ij = 1) = Pr(z ij > 0) = Φ(γ * 0 + ξ * j + ζ * i γ * d ij κ * φ ij ) If ρˆ ij is the predicted probability of exports from j to i, we can easily obtain ẑ ij * = Φ 1 ( ρˆ ij ), the predicted value of the latent variable zij. Since W ij, the fraction of firms that export from j to i, is a monotonic function of Z ij, we can use ẑ ij * = Φ 1 ( ρˆ ij ) to obtain a consistent estimate for w ij = ln W ij, the omitted variable in the gravity equation specified above. The proposed methodology, thus far, corrects the first of the two biases identified above, namely the omitted variable bias. To correct the second bias identified above, Helpman, Melitz and Rubinstein (2008) propose to use the standard Heckman (1979) correction for sample selection. For this, they use the same Probit equation specified above to obtain the inverse Mills ratio: ˆ η ij = φ(ẑ * ij)/φ(ẑ * ij). Accounting for this term in the gravity equation addresses the selection bias created by omitting zero trade flows from the estimating equation. 3 Empirical Approach 3.1 Empirical Specification 9
10 To correctly estimate the elasticity of distance using online trade flows, we use the two-step estimation procedure proposed by Helpman, Melitz and Rubinstein (2008). The first stage focuses on the extensive margin, and it is a Probit regression that yields a prediction for the probability that country j exports to country i. The second stage focuses on the intensive margin, and it is a gravity equation estimated on the subset of positive trade flows, which examines the volume of exports from j to i, conditional on exporting. The Probit equation takes the following form: T ij = β 0 + β 1 ln (Distance) ij + β 2 Land border ij + β 3 Island ij + β 4 Landlock ij + β 5 Legal ij + β 6 Language ij + β 7 Colony ij + β 8 Currency ij + β 9 FTA ij + β 10 Religion ij + β 11 Warehouse ij + λ i + χ j + ε ij where: T ij : 1 if country j exports to country i, 0 otherwise Distance: great circle distance, in km, between the capitals of i and j Land border: 1 if two countries share a land border, 0 otherwise Island: 1 if both countries are islands, 0 otherwise Landlock: 1 if both countries have no coastline, 0 otherwise Legal: 1 if both countries legal systems share the same origin, 0 otherwise Language: 1 if both countries share an official language, 0 otherwise Colony: 1 if the two countries were in a colonial relationship, 0 otherwise Currency: 1 if both countries use the same currency, 0 otherwise FTA: 1 if both countries belong to a free trade agreement, 0 otherwise Religion: (% Protestants in i % Protestants in j) + (% Catholics in i % Catholics in j) + (% Muslims in i % Muslims in j) Warehouse: 1 if the number of procedures required to build a warehouse is above the median in both countries, 0 otherwise In the econometric model, λ i and χ j are importer and exporter fixed effects, respectively, and ε ij is the error term. With the exception of Warehouse, which requires additional explanation, the other 10
11 variables included in the Probit regression account for the geographic, cultural and economic closeness factors that have been identified by previous researchers to influence trade between two countries. The inclusion of the variable Warehouse is necessary from an econometric perspective. We estimate the first stage Probit regression above in order to obtain a predicted probability of export, ρˆ ij, which in turn allows us to obtain consistent estimates for ˆ w ij and ˆ η ij. These estimates are needed as additional controls in the second stage estimation, as they correct the omitted variable and sample selection biases. To obtain consistent estimates for ˆ w ij and ˆ η ij, it is necessary to include an exclusion restriction in the first stage estimation. The theoretical model proposed by Helpman, Melitz and Rubinstein (2008) suggests that a valid exclusion restriction is one that affects the fixed cost of exporting, so it enters with a non-zero coefficient in the first stage Probit equation, but has no effect on the variable cost of exporting. The authors propose two indicators variables as exclusion restrictions, the first of which takes the value of 1 when the cost of setting up a business in both countries is above the median, while the second takes the value of 1 when the number of days and procedures required to set up a business in both countries is above the median. We test these two indicator variables proposed by Helpman, Melitz and Rubinstein (2008) on our data set, but we find that they do not have enough explanatory power in the first stage estimation to act as valid exclusion restrictions. The number of procedures to build a warehouse, on the other hand, does very well as an exclusion restriction for our data set and satisfies the requirement of influencing the fixed, rather than the variable, cost of exporting online. Helpman, Melitz and Rubinstein (2008) propose three different ways to run the second stage estimation. The dependent variable in the second stage estimation is, invariably, m ij, which represents the exports from j to i. Since δ, the coefficient associated with ˆ w ij does not enter the model linearly, an obvious first candidate for the second stage estimation is nonlinear least squares (NLS). A second candidate for the second stage estimation is an OLS regression, which includes among the independent variables the usual gravity controls, as well as ˆ η ij and a polynomial in ˆ z ij. The authors reach this estimating equation by relaxing the assumption governing the distribution of firm heterogeneity, which eliminates the nonlinearity induced by ˆ w ij, and the need for a nonlinear estimator. Finally, the third candidate for the second stage estimation is a non-parametric functional form that emerges as a possibility after relaxing the joint normality assumption for the unobserved trade costs. Although these three estimation methods rely on different assumptions, they yield very similar results. 11
12 A Priori Hypotheses A firm wanting to export online goes through a process that is different from that undertaken by a firm looking to export offline. On Google s platforms, a US company wanting to export to France can do so instantaneously, by ticking a box in its Google account and making its online ads visible within France. In theory, therefore, a US company wanting to export to France faces zero fixed costs. In practice, however, it is hard to imagine a profit maximising firm making the decision to export without undertaking some initial investment to make its product appealing and accessible to foreign consumers. Before exporting to France, a US online retailer might want to translate its website and marketing materials into French. It might also want to ensure that it can process payments made with French credit cards, or that it can safely ship and store its products once they arrive in France. The costs outlined here are small compared to the investment a traditional firm needs to undertake to export offline, but they are unlikely to be zero. Our prior is that distance still matters online. A recent study done by an Australian hosting company reveals that 83% of respondents turn to online shopping for convenience, while 71% turn to it for lower prices. 1 Shipping physical goods to far away places can take a long time and is costly. By decreasing convenience and increasing the costs of ordering a product, large distances are likely to decrease the appeal of online exchanges. Besides the wait and the cost online shoppers face when ordering a product from a distant place, they might also worry that they have to face an additional wait and cost in case they need to return the product. The quality and the fit of a particular good are less certain when purchased online than when purchased and physically inspected in a store. This makes product returns more likely in the online environment, which further contributes to the sensitivity of online exchanges of physical goods over large distances. Similar sensitivities might apply to the exchange of services over the Internet, where the convenience of working with a provider that is close by will make transactions between online shoppers and service providers less likely over long distances. Our other strong prior is that countries that speak the same language will trade more with each other in the online world. Translating the website and the marketing materials constitutes a large part of the fixed cost incurred by a company wanting to export online. After it starts exporting, the company will likely communicate with its customers via or over the phone about the products offered 1 The survey was conducted by Rackspace Hosting in
13 or the details of the orders placed online. Speaking the same language will facilitate that communication. Our prior is that a common language, by lowering both the fixed and the variable cost of exporting, will have a positive influence on online trade. 3.2 Data Google Data Description The data we use come from Google s online advertising platforms. In order to provide the clients of its advertising program, AdWords, with useful intelligence about their ads, Google offers free conversion tracking software. This software tracks users anonymously from viewing an ad to purchasing, placing the item in a shopping cart, downloading or other forms of online conversions that are valuable to businesses. It allows clients to measure the return on their investment in advertising in terms of dollars and transaction counts. The transaction data we use in this study come from the records generated by the conversion tracking software. We are confident that our data set is the most comprehensive one used to date to examine trade related questions in the digital world. However, like any other data set, the Google data we analyse have some limitations. We consider them to be minor compared to the limitations faced by the other studies that looked at online trade, yet is is important to clearly spell them out: Counts only: The conversion tracking code reliably generates conversion counts. Counts represent the number of times users reach the sections of the sites where the advertisers placed the code. AdWords ads only: The code will only track conversions generated as a result of users clicking on an ad placed on Google or on one of Google s partner sites. Although Google partners with millions of sites around the world, 13
14 there are online transactions which will not be recorded in this data set. For example, if a user types into their browser and purchases an object directly from that transaction will not be recorded in the data set, as it happened outside of Google s advertising program. 30 day limit: The code generates a conversion count if a user clicks on an AdWords ad and reaches the page where the code is placed within 30 days of the initial click on the ad. If the user reaches the page after 30 days, the system will no longer record the conversion. Optional tool: The conversion tracking tool is optional, so advertisers are not required to enable it. However, the adoption rate is high, as it is the easiest way for advertisers to assess whether their online advertising campaign are effective. The location data associated with our transactions come from two sources. For buyers, we use estimates based on IP (Internet Protocol) address. For sellers, we use the self-reported address data required of businesses who sign up to be Google advertisers. Although we believe that this information allows us to place most buyers and sellers in the correct region, it is possible that we are attributing a small part of the buyers or sellers to the wrong location. On the buyers side, the IP address might indicate an incorrect physical location if the user is accessing the Internet using a virtual private network (VPN) connection or if the user is, for whatever reason, using software to mask his or her actual IP address. While we cannot identify these users, the level of sophistication needed to set up a VPN network makes us believe that the users that mask their IP address represent a small enough percentage of the total Internet users to make our location identification on the buyers side reliable. On the seller side, we have access to two address fields. The first one is a general mailing address that is self-reported by the sellers. This is the address field that we choose to use. The second one is the address where the credit card used to pay for the ads is registered. A data check reveals that the general mailing address reported by the sellers very rarely differs from the billing address. 14
15 Coverage In 2007, Google announced that AdWords, its advertising program, passed the one million mark in terms of number of advertising accounts. This is the last publicly available figure. It is meaningful in that it indicates that six years ago, there were already more than one million businesses opting for Google s advertising platform. Google also announced publicly that no client accounts for more than 10% of the company s revenues, indicating that its advertising program is not dominated by one large business. The data set used here covers all conversion counts recorded over a period of four years, from 2008 to Although we cannot provide precise figures, the overall number of recorded conversion counts is in excess of 10 billion. The data are available at the daily level, although we aggregate it to the yearly level to generate our main results. 2 All but 6 countries are featured in our data set. There is a trade embargo in place between the United States and Burma, Cuba, Iran, North Korean, Sudan and Syria. As a US-based company, Google cannot generate revenues in these countries and therefore, it cannot accept ads from businesses located in any of these six countries Other Variables Besides the dependent variable, the only other variable that requires explanation is our distance measure, d ij. Similar to Lendle et al. (2012), we use the CEPII Distances database, which calculates the distance between two countries based on the bilateral distances between the biggest cities of those two countries and weights those inter-city distances by the share of the city in the overall country s population. We used information reflected in CIA s World Factbook to construct all the other variables. 3 2 Unfortunately, we cannot use the panel dimension of the data, as we do not have a timevarying instrument. To use all years of data available, however, we use Lendle et al. (2102) s solution and average the observations over the four years of data that we have at our disposal. 3 Due to data unavailability for the external variables, we dropped the following countries and territories from our analysis: American Samoa, Antarctica, Bouvet Island, British Indian Ocean Territory, French Southern Territories, Guam, Heard and McDonald Islands, Liechtenstein, Mayotte, Monaco, South Georgia and the South Sandwich Islands, Svalbard and Jan Mayen, US Virgin Islands, and the Vatican. 15
16 4 Results 4.1 Comparison with Lendle et al. (2012) As a first exercise, we compare the results we obtain using our data set from Google to the results that Lendle et al. (2012) obtain using data from ebay. In column (1) of Table (1), we report the results presented by Lendle et al. (2012) on page 34 of their paper. In column (2), we report the results we obtain by running an identical specification on our data set from Google. We limit the analysis to the 62 countries covered by the Lendle et al. (2012) study, we use the same controls that they use, and we employ the same linear estimator. Finally, in column (3), we employ the same data set and methodology that we used to generate the results reported in column (2), but we consider four additional controls proposed in the Helpman, Melitz and Rubinstein (2008) paper. The elasticity of distance that we estimate using data from Google is slightly higher than the one Lendle et al. (2012) obtain using data from ebay. However, a simple test reveals that the difference between these two estimates is not statistically significant. When it comes to online trade flows, distance matters just as much on Google as it does on ebay. With the exception of the coefficient on the dummy variable that measures the effect that a common border has on trade flows, none of the other coefficients reported in column (2) are statistically different from those listed in column (1). This is an encouraging result, as it indicates that although our data come from a different online platform, the underlying trade patterns are similar to those identified by Lendle et al. (2012). Column (3) in Table (1) points to the importance of considering additional controls besides the ones included by Lendle et al. (2012). In particular, the coefficient of the variable that measures the similarities in religious beliefs across countries is highly significant. We will see in the next section that this result is persistent, and robust to different specifications. The dummy variable that takes the value of 1 when both trading partners share a currency is insignificant. We will explore this result in section
17 Table 1: OLS Results for a Data Set Limited to 62 Countries ebay Google (1) (2) (3) Distance (0.031) (0.044) (0.043) Land border (0.085) (0.140) (0.140) Island (0.131) Landlock (0.152) Legal (0.037) (0.048) (0.048) Language (0.066) (0.109) (0.109) Colony (0.094) (0.183) (0.184) Currency (0.091) FTA (0.055) (0.071) (0.075) Religion (0.135) Obs. 3,778 3,714 3,714 R Notes: Exporter and importer fixed effects. Robust standard errors (clustering by country pair). *** significant at 1%, ** significant at 5%, * significant at 10% 4.2 Main Results Table 2 reports the results we obtain when we expand the data set to 140 countries. This more than doubles the number of countries considered in the Lendle et al. (2012) paper. Column (1) in Table (2) reports the results we obtain when running a simple OLS regression on the larger data set. In terms of the controls and the methodology used, the results reported in this first column are directly comparable to those reported in column (3) of Table (1). We find that limiting the analysis to the 62 countries with high levels of trade between them biases the coefficients of the variables intended to capture the closeness between trading partners downwards. Remoteness, be it geographic, cultural or economic, discourages trade. By excluding from a data set the country pairs that trade little with each other, one excludes the observations for which remoteness might matter the most. A comparison of the results generated using the smaller sample size of 62 countries and those generated using the larger sample size of 140 countries confirms this hypothesis. With the increase in sample size, the elasticity of distance increases from to The coefficients of the other variables that measure cultural or economic closeness increase, as well. The coefficient of the dummy variable that takes the value of 1 if both countries speak the same language increases from to The coefficient on the variable that measures the similarity in religious beliefs increases from to Finally, the coefficient of the dummy variable 17
18 that equals 1 if both countries are part of a free trade agreement, an indicator meant to capture economic closeness, increases from to All of these increases are statistically significant at the 1% level and point to the importance of expanding the analysis beyond the 62 countries considered by Lendle et al. (2012). Column (2) in Table (2) confirms the validity of the exclusion restriction used in the first stage regression. As noted in the previous section, a valid exclusion restriction should have no impact on the amount of trade between two countries, but it should have a statistically significant effect on the probability of two countries trading. The coefficients on the exclusion restriction Warehouse, reported in columns (2) and (3) of Table (2) confirm that it is exactly the type of variable that we need. Column (3) of Table (2) reports the results of the Probit regression. This is the first stage estimation and it gives insight into the extensive margin of trade. The estimated coefficients reflect the effect that the independent variables have on the probability of two countries trading. Columns (4)-(6) of Table (2) report the results obtained in the second stage regressions. The estimated coefficients reflect the effect of the independent variables on the volume of trade between two countries, conditional on exporting. The results in these columns give us insight into the intensive margin of trade. We discuss the estimated coefficients below. 18
19 Table 2: Worldwide Gravity Results: Conversion Counts Google Data OLS OLS Probit NLS Polynomial Non-Param. (1) (2) (3) (4) (5) (6) Distance (0.029) (0.029) (0.011) (0.034) (0.038) (0.039) Land border (0.142) (0.142) (0.043) (0.138) (0.142) (0.143) Island (0.104) (0.104) (0.035) (0.101) (0.103) (0.104) Landlock (0.115) (0.115) (0.041) (0.114) (0.116) (0.117) Legal (0.038) (0.038) (0.014) (0.037) (0.037) (0.037) Language (0.068) (0.068) (0.017) (0.086) (0.096) (0.097) Colony (0.202) (0.202) (0.098) (0.192) (0.196) (0.198) Currency (0.151) (0.151) (0.056) (0.153) (0.156) (0.157) FTA (0.079) (0.079) (0.031) (0.079) (0.084) (0.085) Religion (0.084) (0.084) (0.031) (0.081) (0.083) (0.083) Warehouse (0.061) (0.023) δ (from ˆ ω ij ) (0.091) ˆ η ij (0.081) (0.135) ˆ z ij (0.361) ˆ z ij (0.111) ˆ z ij (0.011) Obs. 10,120 10,120 19,425 10,120 10,120 10,120 R Notes: Exporter and importer fixed effects. Marginal effects at sample means and pseudo R 2 reported for Probit. The number of procedures to build a warehouse is he excluded variable in all second stage specifications. Bootstrapped standard errors for NLS; robust standard errors (clustering by country pair) elsewhere. *** significant at 1%, ** significant at 5%, * significant at 10% Distance Distance has a negative and significant influence on both the extensive and the intensive margins of trade. This confirms our a priori belief that distance is not dead, even online. The second stage estimation results indicate that conditional on exporting, a 1% increase in the distance between two countries reduces the volume of trade between them by approximately 0.5%. We can safely conclude that among Google s advertising clients, distance lives on. The OLS coefficient on distance is biased upwards. The benchmark results, reported in column (1) of Table (2), show the elasticity of distance to be around The magnitude of this coefficient is between 13% and 17% higher than the magnitude of the coefficients reported in columns (4)-(6) of Table (2), which are estimated using the methodology suggested by Helpman, Melitz and Rubinstein (2008). It is worth nothing that the bias is much smaller in the virtual world than in the analysis conducted by Helpman, Melitz and Rubinstein (2008) on offline 19
20 data. The magnitude of the OLS coefficient they estimate using traditional trade data is between 27% and 32% higher than the magnitude of the coefficients estimated using econometric methods designed to correct the omitted variable and the sample selection biases. The coefficients we report for distance in columns (4)-(6) of Table (2) are similar in magnitude to the coefficient we report in column (3) of Table (1), where we use Lendle et al. (2012) s methodology and run a linear estimator on a sample size of 62 countries. While tempting, one cannot confidently argue based on our results that the biases in their sample, which we corrected by adding more countries and by using the methodology proposed by Helpman, Melitz and Rubinstein (2008), are similar in magnitude to the biases in our data set. Furthermore, the main contribution of the Lendle et al. (2012) study is the ability to identify the precise products being sold on ebay and to compare the estimated elasticities of distance for online trade in these products to offline trade. There is no guarantee that the omitted variable and the sample selection biases that affect Lendle et al. (2012) s online analysis are the same as those affecting their offline estimations. Other Geographic Characteristics Our results show that besides distance, sharing a land border also significantly affects the volume of trade between two countries. The magnitude of the coefficient of the contiguity dummy in columns (4)-(6) is large: it predicts that conditional on exporting, online trade between two countries that share a land border is close to 120% larger than trade between countries that do not share a border. Contiguity also affects the extensive margin on trade. Column (3) of Table (2) shows that the coefficient of the Land border dummy variable has a positive and significant impact on the probability of two countries trading online. Helpman, Melitz and Rubinstein (2008) s estimates using offline trade data, replicated in the Appendix for convenience, show similarly large contiguity effects. Countries that are islands or landlocked, characteristics which several studies done on data from the offline world found to negatively impact trade, no longer decrease the volume of trade in the virtual world. In columns (4)-(6), the coefficient of the dummy variable equalling 1 if both countries are landlocked is positive and statistically significant at the 10% level, while the coefficient on the dummy variable equalling 1 if both countries are islands is insignificant. We find some indication that the probability of exporting is negatively affected by both trading partners being islands. The coefficient on the Island dummy variable, reported in column (3) is negative and significant. This is, once more, similar to the results that Help- 20
21 man, Melitz and Rubinstein (2008) obtain on offline trade data. Language Our prior regarding the importance of language to online trade was correct. A good proportion of the fixed cost that a company faces when exporting online comes from translating its website and marketing materials. In column (3) of Table (2), we can see that the coefficient on the dummy variable measuring the effect of two countries sharing a language has a large, positive and statistically significant effect on the probability of two countries trading. Once it decides to export, a company needs to provide customer support services to its foreign clients. In columns (4)-(6), we can see that sharing an official language also has a large and positive effect on the intensive margin of trade. Conditional on exporting, the volume of trade between countries that share an official language is more than 180% higher than the volume of trade between countries that do not speak the same language. Interestingly, Helpman, Melitz and Rubinstein (2008) find that language has no statistically significant effect on the decision to export offline or on the volume of exports once the decision is made. The difference between our results and theirs implies that sharing an official language in the online world, where customers and businesses interact via a website written in a language that they can both understand, is far more important than sharing a language in the offline world. Other Cultural Characteristics Besides language, other cultural affinities, such as similarity in religious beliefs, colonial ties and common legal systems also have a positive influence on the volume of online trade. The index measuring similarity in religious beliefs, in particular, has a positive and statistically significant effect, both on the extensive and the intensive margins of online trade. The dummy variables measuring the effect of two countries having similar legal systems or colonial ties have a positive and statistically significant effect only on the intensive margin on trade. Conditional on exporting, the volume of trade between countries that have similar legal systems is 12% larger than the volume of trade between countries that do not. For countries that have colonial ties, the volume of trade conditional on exporting is 57% higher than for countries that do not. Economic Characteristics Belonging to a free trade agreement has a positive and statistically significant 21
22 effect on the probability of countries trading and on the volume of online exports between them once the decision to export is made. All else being equal, the quantity of online trade, conditional on exporting, is around 80% higher for countries that belong to a free trade agreement than for countries that do not. Using a common currency influences the extensive, but not the intensive margin of online trade. The coefficient of the dummy variable Currency, which takes the value of 1 if two countries have a common currency and 0 otherwise, is positive and significant in column (3) of Table (2), indicating that the indicator influences the probability of countries trading with each other. This is a sensible result. Before exporting to a new country, an online company has to invest in ensuring that it can accept electronic payments in the currency of the importing country. It might also need to alter the billing section on its website in order to reflect the new currency. As it influences the fixed cost of exporting, Currency should have, and does have, a positive and statistically significant effect on the extensive margin of trade. Once the initial setup is complete, Currency is unlikely to influence the volume of online trade between two countries. With electronic payments, currency conversions are done automatically and instantaneously. As algorithms take care of currency conversions online, sharing a common currency should not impact the volume of trade between two locations. Our results in columns (4)- (6) of Table (2) confirm this hypothesis. The coefficient on the dummy variable Currency is insignificant. This result is in contrast with Helpman, Melitz and Rubinstein (2008), who find currency unions to have a large and significant effect on the intensive margin of offline trade. 5 Conclusion In this study, we challenge the popular view that distance is dead in the virtual world. We use a proprietary data set from Google to perform the most extensive online worldwide gravity analysis to date. We find that a 0.5% increase in the distance between two countries lowers the volume of online trade between them by 1%. We conclude that distance continues to negatively affect trade, even for transactions done over the Internet. Although our findings are similar to Lendle et al. (2012), our study relies on a much larger database and it corrects two important econometric biases that affect linear estimators in a worldwide gravity setting. Besides the negative effect of distance on online trade, our study also uncovers 22
23 a few other interesting facts relating to online trade. We find that cultural affinities matter greatly in the virtual world. A common official language, in particular, close to triples the volume of online trade between two countries. We also find that economic ties, such as a common currency, which facilitate trade in the offline world, have an insignificant effect on the intensive margin of trade in the virtual world. Our results indicate that the economic policies that encourage trade in the offline world might not be as successful at increasing the number of cross-border transactions online. While our study answers the important question of whether distance matters online, it does not answer the question of whether it matters more or less than offline. A simple comparison between our results and those obtained by researchers studying offline trade flows, such as Helpman, Melitz and Rubinstein (2008) might not be very precise. The composition of online and offline trade flows is likely to differ, which calls into question the validity of a direct comparison between online and offline estimates. In their attempt to contribute to the online-offline debate, Lendle et al. (2012) make a step in the right direction, but as noted above, the limitations of their data set and methodology make the analysis unreliable. Since we do not have matched information about the products traded on Google, we cannot shed light on the online-offline comparison question. We certainly hope, however, that as more detailed data regarding online trade flows becomes available, researchers will be able to provide a precise answer. 23
24 6 Bibliography Anderson, James E., and Eric van Wincoop. Gravity with Gravitas: A Solution to the Border Puzzle. American Economic Review 93.1 (2003): Blum, Bernardo, and Avi Goldfarb. Does the Internet Defy the Law of Gravity? Journal of International Economics 70.2 (2006): Brynjolfsson, Erik, Hu Yu, and Mohammad Rahman. Battle of the Retail Channels: How Product Selection and Geography Drive Cross-channel Competition. Management Science (2009): Chaney, Thomas. Distorted Gravity: The Intensive and the Extensive Margins of International Trade. American Economic Review 98.4 (2008): Forman, Chris, Anindy Ghose, and Avi Goldfarb. Competition between Local and Electronic Markets: How the Benefit of Buying Online Depends on Where You Live. Management Science 55.1 (2009): Goolsbee, Austin. Competition in the Computer Industry: Online Versus Retail, Journal of Industrial Economics, 49.4 (2001): Head, Keith, and Thierry Mayer. Gravity Equations: Workhorse, Toolkit, and Cookbook. CEPII research center, Heckman, James. Sample Selection Bias as a Specification Error. Econometrica 47.1 (1979): Helpman, Elhanan, Marc Melitz, and Yona Rubinstein. Estimating Trade Flows: Trading Partners and Trading Volumes. The Quarterly Journal of Economics (2008): Hortacsu, Ali, F. Asis Martinez-Jerez, and Jason Douglas. The Geography of Trade in Online Transactions: Evidence from ebay and MercadoLibre. American Economic Journal: Microeconomics 1.1 (2009): Lendle, Andreas, Marcelo Olarreaga, Simon Schropp, and Pierre-Louis Vezina. There Goes Gravity: How ebay Reduces Trade Costs. Centre for Economic Policy Research Discussion Paper No. 9094,
25 Limao, Nuno, and Anthony Venables. Infrastructure, Geographical Disadvantage, Transport Costs, and Trade The World Bank Economic Review 15.3 (2001): Melitz, Marc, and Gianmarco Ottaviano. Market Size, Trade and Productivity. Review of Economic Studies 75.1 (2008): Melitz, Marc. The Impact of Trade on Intra-Industry Reallocations and Aggregate Industry Productivity. Econometrica 71.6 (2003):
26 7 Appendix Table (3) reproduces the results presented in the original Helpman, Melitz and Rubinstein (2008) paper. Table 3: Helpman, Melitz and Rubinstein (2008) Results Offline: 1986 OLS Probit NLS Polynomial Non Param. (1) (2) (3) (4) (5) Distance (0.040) (0.016) (0.049) (0.052) (0.088) Land border (0.165) (0.072) (0.170) (0.166) (0.170) Island (0.269) (0.078) (0.290) (0.258) (0.258) Landlock (0.189) (0.050) (0.175) (0.187) (0.187) Legal (0.064) (0.019) (0.065) (0.064) (0.065) Language (0.075) (0.021) (0.087) (0.077) (0.083) Colony (0.158) (0.130) (0.257) (0.148) (0.153) Currency (0.334) (0.038) (0.360) (0.333) (0.346) FTA (0.247) (0.009) (0.227) (0.197) (0.348) Religion (0.120) (0.034) (0.136) (0.120) (0.128) Regulation costs (0.100) (0.036) Days & proc (0.124) (0.031) δ (from ˆ ω ij ) (0.043) ˆ η ij (0.099) (0.209) ˆ z ij (0.540) ˆ z ij (0.170) ˆ z 3 ij (0.017) Obs. 6,602 12,198 6,602 6,602 6,602 R Notes: Exporter and importer fixed effects. Marginal effects at sample means and pseudo R 2 reported for Probit. Regulation costs are excluded variables in all second stage specifications. Bootstrapped standard errors for NLS; robust standard errors (clustering by country pair) elsewhere. *** significant at 1%, ** significant at 5%, * significant at 10% 26
