RESEARCH NOTE

INFERRING APP DEMAND FROM PUBLICLY AVAILABLE DATA¹

Rajiv Garg
McCombs School of Business, The University of Texas at Austin, Austin, TX 78712 U.S.A. {Rajiv.Garg@mccombs.utexas.edu}

Rahul Telang
School of Information Systems & Management, H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA 15213 U.S.A. {rtelang@andrew.cmu.edu}

With an abundance of products available online, many online retailers provide sales rankings to make it easier for consumers to find the best-selling products. Successfully implementing product rankings online was done a decade ago by Amazon, and more recently by Apple's App Store. However, neither market provides actual download data, a very useful statistic for both practitioners and researchers. In the past, researchers developed various strategies that allowed them to infer demand from rank data. Almost all of that work is based on an experiment that shifts sales or collaboration with a vendor to get actual sales data. In this research, we present an innovative method to use public data to infer the rank-demand relationship for the paid apps on Apple's iTunes App Store. We find that the top-ranked paid app for iPhone generates 150 times more downloads compared to the paid app ranked at 200. Similarly, the top paid app on iPad generates 120 times more downloads compared to the paid app ranked at 200. We conclude with a discussion on an extension of this framework to the Android platform, in-app purchases, and free apps.

Keywords: Mobile apps, app store, sales-rank calibration, app downloads, Pareto distribution, Android, Apple iTunes, in-app purchase

¹ Ravi Bapna was the accepting senior editor for this paper. Anindya Ghose served as the associate editor.

Introduction¹

The growth of mobile phones and smart phones over the last few years has been phenomenal. Based on recently published reports, there are about 106 million users of smart phones² in the United States. Globally, there are 1.1 billion active mobile subscriptions³ with over 100,000 new smart phones
being sold every quarter.⁴ As more countries deploy high speed wireless networks, users are spending an increasing amount of time on their phones. A significant reason for this growth has been attributed to the availability of mobile phone applications (apps) that are becoming ubiquitous on all mobile operating systems. In April 2012, according to Appshopper.com, there were over 787,000 apps available for the iOS platform and 50 percent of mobile phone users used the downloaded applications. Similarly, as of May 2011, the total number of apps available for the Android platform was approximately 200,000 (Barra 2011). Based on AdMob's 2010 report,⁵ smart phone users spend about 80 minutes per day on mobile applications.

² http://www.comscore.com/press_events/press_releases/2012/5/comscore_reports_march_2012_u.s._mobile_subscriber_market_share.
³ http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats.
⁴ http://www.gartner.com/it/page.jsp?id=1764714.

MIS Quarterly Vol. 37 No. 4, pp. 1253-1264/December 2013

Apple's iPhone ushered an era where developers were able to sell their innovative applications to a large consumer base through the iTunes app store platform. These apps cover a wide variety of domains including games, location services, productivity, and healthcare. In 2011, Apple announced that more than 15 billion apps had been downloaded from its app store as of July 2011.⁶ Clearly, the app market has found favor with customers.

Mobile apps attract end consumers and create diverse opportunities for additional revenue for app developers, device manufacturers, and cellular service providers. More importantly, as users start valuing these apps more and more, app stores have an opportunity to engender strong externality. Thus, these platforms lure developers, sometimes by providing deep subsidies, to write diverse applications for them. This has resulted in growth in the number of both large and small app publishing firms entering this highly dynamic market. This increase in diversity of developers results in greater variety of software applications available to consumers (Boudreau 2012). Based on recent statistics,⁷ there are 500,000 apps approved for the iOS platform developed by over 85,000 app developers.

The growth of the app market provides a great opportunity to examine important questions around software innovation, firm entry and exit strategy, software product pricing and promotion, platform leadership, and externality. However, our understanding of this market is limited due to lack of demand data. Similar to Amazon's book market, Apple, Google, and Nokia do not provide sales information on any application. In fact, even the app developers get somewhat aggregated data from these platform owners. For example, Apple may not provide details on applications downloaded on the iPad and iPhone separately to a developer.⁸ Additionally, app developers themselves are reluctant to share any details on demand for competitive reasons.
Thus, most individuals have access to aggregate numbers or the rank data. For example, Apple's App Store typically provides a list of the top 200 paid apps, top 200 free apps, or top 200 highest grossing apps. Unfortunately, having access to just an app's rank is not useful because one cannot infer the value of an app placed at a given rank. For a developer, it is hard to determine whether the cost of moving up a few ranks by promotional activities is worth the benefits. Similarly, one cannot readily determine whether a particular niche in the app market is viable or how to set various marketing mix variables. Thus, having access to demand is highly beneficial to both practitioners and researchers.

⁵ http://blog.flurry.com/bid/63907/mobile-apps-put-the-web-in-their-rearview-mirror (accessed March 2012).
⁶ http://www.apple.com/pr/library/2011/07/07Apples-App-Store-Downloads-Top-15-Billion.html.
⁷ http://www.148apps.com/news/app-store-milestone-500000-applications-approved/.
⁸ Based on our communication with an app developer (some discussions are also available on http://forums.indiegamer.com/).

Fortunately, researchers were able to infer demand from the rank data in the case of Amazon's book sales (Brynjolfsson et al. 2003, 2010; Chevalier and Goolsbee 2003; Chevalier and Mayzlin 2006). In these papers, rank and sales are assumed to be related via the power law (or Pareto distribution), implying that a small number of products capture a large share of the market. The typical Pareto distribution that has been estimated in extant research is

$$\text{sales} = b \cdot (\text{rank})^{-a} + \varepsilon \tag{1}$$

where $b$ is the scale parameter and $a$ is the shape parameter.

To estimate the model parameters, we need both rank and sales information for an app. Rank information is generally available through the publishers of those ranks. But to get demand data, researchers either conducted experiments or collaborated with publishers. Chevalier and Goolsbee (2003) conducted a creative experiment to infer demand.
They selected low-selling books (for which the demand was known or assumed to be very small) and purchased large quantities (relative to already low demand) of each book on Amazon. As the ranks for the books changed, they could infer the relationship between the sales rank and experimented demand. The downside of this approach is that such experiments are practical only for very low-selling books. Therefore, inferring the relationship at the top ranks using this approach is not entirely accurate. In their study, Brynjolfsson et al. (2003) collaborated with a book publisher to get access to demand data to establish the sales and rank relationship. In both approaches, it is imperative to get access to demand data.

Table 1 summarizes the shape parameter values as estimated in prior studies. Notice that the shape parameter is decreasing in magnitude over time. The smaller value of $a$ suggests a flatter curve (longer tail) for equation (1).

Table 1. Pareto Shape Parameters Estimated by Existing Literature

Source                           Shape Parameter Estimate
Chevalier and Goolsbee (2003)    1.199 (Data Source: Poynter (2000))
Chevalier and Goolsbee (2003)    1.05 (Data Source: Weingarten (2001))
Chevalier and Goolsbee (2003)    1.2 (Evidence from various experiments suggesting a value between 0.9 and 1.3)
Brynjolfsson et al. (2003)       0.871
Ghose et al. (2006)              0.952 (Data Source: Weingarten (2001))
Chevalier and Mayzlin (2006)     0.78
Brynjolfsson et al. (2010)       0.613

However, the most important aspect of calibrating this relationship is that it paves the way for other interesting work. For example, Ghose et al. (2006) estimated the elasticity of substitution between new and used goods. Ghose and Sundararajan (2006) used it to study the software security product marketplace. Brynjolfsson et al. (2003) and Anderson (2004) used this to establish the long-tail phenomenon on the Internet. One can also study the dynamics of demand over time. In summary, use of sales rank to compute actual sales or use in lieu of sales has become common in academic research because of the unavailability of actual sales data.

With the increasing popularity of mobile apps, we expect that many researchers will be studying the dynamics of this market. In this paper, we provide a methodology to link rank with actual downloads for mobile apps using publicly available data. While our method is similar to methods proposed in prior studies, it is different on a few key dimensions. First, to calibrate the relationship between rank and sales, the prior work needed access to demand data (either from an experiment or from a book publisher). Getting demand data is quite challenging, much more so in the app market, rendering this work quite difficult. In our paper, we calibrate the mobile apps rank-sales relation using publicly available data alone. Access to any demand data is not needed at all. Second, many prior studies calibrated this relationship using books with very low rank/baseline sales. This has the potential to introduce prediction inaccuracies for top selling books. In our case, we are calibrating the relationship for top ranking apps. We believe we can provide more accurate estimates for the top selling apps (which sell disproportionally more).

In recent work, Carare (2012) examines how the past rank of an app influences future demand.
Since demand data is unavailable, he provides a method that overcomes lack of demand data to estimate the parameters. In our paper, we provide a framework to infer demand from publicly available rank information. This direct measure of demand then allows for estimation of other variables of interest; for example, Carare's method cannot recover price elasticity (see page 732). In this paper, we will illustrate that one can calibrate the rank-sales relationship using publicly available data alone. Furthermore, to the best of our knowledge, this is the first study that tries to calibrate the relationship between app rank and sales for a mobile platform.

The next two sections discuss our estimation method and data. We then present our results and provide some validation to our method. In the final section, we provide evidence of robustness and generalizability and present our conclusions.

Model

Our subject for inferring the demand will be Apple's App Store. We will discuss Google's Android store (recently renamed Google Play Store) in a later section. A key feature of app stores is that three different rank lists are publicly available: top-free applications, top-paid applications, and top-grossing applications. The top-free list shows the most-downloaded applications that have no upfront purchase price. The top-paid list shows the most-downloaded applications that have a non-zero price. The top-grossing list ranks the applications based on revenue generation.

Like the extant literature, we assume a Pareto distribution for inferring downloads from rank. Assuming the number of downloads of an application at rank $r_p$ in the top-paid list is given by $d_{r_p}$, the Pareto distribution could be written as

$$d_{r_p} = b_p \, r_p^{-a_p}, \qquad 1 \le r_p \le 200 \tag{2}$$

Here $b_p$ defines the scale factor that is dependent on the total market size for iPad or iPhone apps, and $a_p$ defines the shape of the Pareto curve.
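The functional form in equation (2) can be sketched in a few lines of Python. This is only an illustration: the function name `pareto_downloads` and the values of `SCALE` and `SHAPE` are hypothetical and are not estimates from this paper.

```python
def pareto_downloads(rank: int, scale: float, shape: float) -> float:
    """Equation (2): downloads at a rank, d = scale * rank ** (-shape)."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return scale * rank ** (-shape)


# Hypothetical parameters, chosen only to illustrate the functional form.
SCALE = 10_000.0  # b_p: downloads of the rank-1 app
SHAPE = 1.0       # a_p: steepness of the curve

top = pareto_downloads(1, SCALE, SHAPE)
tail = pareto_downloads(200, SCALE, SHAPE)

# The ratio between two ranks depends on the shape parameter only,
# because the scale parameter cancels out.
ratio = top / tail
print(f"rank 1: {top:.0f}, rank 200: {tail:.0f}, ratio: {ratio:.0f}")
```

Because the scale cancels in the ratio, the shape parameter alone is enough to compare two ranks, while absolute download counts also require the scale parameter.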
Similarly, we define the Pareto distribution of apps in the top-grossing list, where $d_{r_g}$ is the revenue generated by the application at rank $r_g$ in the list. This revenue could also be written as the product of price ($p$) and number of downloads ($d_{r_p}$) of the same application in the top-paid list. Thus we can write the distribution for the top-grossing apps as

$$d_{r_g} = p \times d_{r_p} = b_g \, r_g^{-a_g} \tag{3}$$

Equation (3) assumes that the top-grossing apps generate their revenues from the upfront pricing only. Additionally, both free and paid apps may include additional features inside the application that users may purchase. Thus, paid apps may generate some additional revenue that is not reflected in (3). However, in-app features are most common for free apps. For paid apps, the dominant source of revenue is still upfront prices. In the Discussion section, we will discuss how our method can be adapted to in-app purchase options.

In equation (2) and in equation (3), we know the values of $p$, $r_p$, and $r_g$ from publicly available data. The unknown parameters that we need to estimate are $b_p$, $b_g$, $a_p$, $a_g$. We can rewrite (3) after taking logs as

$$\log(r_g) = \frac{1}{a_g}\log\!\left(\frac{b_g}{b_p}\right) + \frac{a_p}{a_g}\log(r_p) - \frac{1}{a_g}\log(p) \tag{4}$$

or

$$\log(r_g) = \beta_0 + \beta_1 \log(r_p) + \beta_2 \log(p) + \varepsilon \tag{5}$$

where

$$a_g = -\frac{1}{\beta_2} \tag{6}$$

$$a_p = -\frac{\beta_1}{\beta_2} \tag{7}$$

$$\frac{b_g}{b_p} = \exp\!\left(-\frac{\beta_0}{\beta_2}\right) \tag{8}$$

This could be estimated using a simple truncated ordinary least squares regression. For readability purposes, we do not index $r_p$ and $d_{r_p}$ for a given app $i$. In other words, rank and price information for an app ($i$) is treated as independent cross-sectional data even if the same app appears multiple times.

Notice from (8) that we can only recover the ratio of scale parameters ($b_g/b_p$). Estimating individual values of the scale parameters ($b_p$ and $b_g$) requires additional information. Since the information for actual downloads for an individual app is not readily available, we use aggregate downloads in a day to recover ($b_p$ and $b_g$).
To see this, notice that if we know aggregate downloads ($D_t$) then

$$D_t = \sum_{r_p=1}^{N} d_{r_p} = b_p \sum_{r_p=1}^{N} r_p^{-a_p}$$

Thus, with the knowledge of the total number of downloads of top ranked apps we can recover $b_p$ and $b_g$ from the formula above as

$$b_p = \frac{\sum_{r_p=1}^{N} d_{r_p}}{\sum_{r_p=1}^{N} r_p^{-a_p}} \tag{9}$$

$$b_g = \exp\!\left(-\frac{\beta_0}{\beta_2}\right) \frac{\sum_{r_p=1}^{N} d_{r_p}}{\sum_{r_p=1}^{N} r_p^{-a_p}} \tag{10}$$

In the equations above, the shape parameter ($a_p$) is estimated from prior equations and the sum of individual app downloads ($d_{r_p}$) defines the total downloads associated with all top ranked apps.

Data

As we mentioned earlier, the top-paid rank list and top-grossing rank list are readily available from various websites such as Apple, Appshopper, AppAnnie, and Mobilewalla. Our data period was from April 2011 to May 2011. The information collected contained the top 200 paid and the top 200 grossing app rankings recorded twice for each day during this period for both iPad and iPhone. We also collected data on prices. It should also be noted that the presented methodology could be scaled to incorporate a ranking list of any size as long as data is available.

The summary statistics are given in Table 2. We observed 20 different categories for apps where 38 percent were categorized as games, 13 percent as productivity, 7 percent as entertainment, 6 percent as utilities, 5 percent as photography, 5 percent as education, 4 percent as business, 3 percent as news, and 18 percent as the remaining 12 categories. A snapshot of the categories is presented in Figure 1.

In our calibration, we did not use any app characteristics other than the rank and price of an app.⁹ The descriptive statistics are presented only to show the differences between apps for iPad and iPhone. For example, the average app file size is smaller for the iPhone (when compared to the iPad) and app prices are lower, suggesting the possibility of fewer graphical details, potentially due to the smaller screen size on iPhones.

⁹ This is because app characteristics, if they affect demand, will already be subsumed in the rank data.
Table 2. Summary Statistics (from April 2011 to May 2011)

                          iPad (Paid)       iPad (Grossing)   iPhone (Paid)     iPhone (Grossing)
N                         23471             19857             21855             18008
Average Rank              99.63 (57.91)     100.04 (57.66)    99.98 (57.78)     99.94 (57.73)
Average Price ($)         4.31 (4.44)       12.18 (51.03)     1.73 (1.68)       6.76 (44.68)
Average File Size (MB)    73.55 (127.44)    83.98 (142.14)    51.58 (110.08)    68.26 (131.49)

Figure 1. Share of Various Categories of App in Top 200 Paid List

Figure 2. Overlapping Apps on iPad and iPhone (Top 200 Paid List)
Table 3. Summary Statistics of Apps Ranked in Both Top Paid and Top Grossing Lists

                                    iPad              iPhone
N                                   10709             8164
Average Rank on Top Paid List       74.00 (54.22)     64.46 (52.87)
Average Rank on Top Grossing List   88.67 (57.50)     93.72 (57.32)
Average app price                   6.41 (5.11)       2.45 (2.22)
Average app size (MB)               94.39 (149.80)    78.94 (149.49)

Figure 3. Overlapping Apps on Top 200 Paid and Top 200 Grossing App Lists (iPad and iPhone)

We observe that, on average, 28 percent of apps overlap between iPhone and iPad lists and the average correlation between ranks is 0.46. We also found that 128 (10%) unique apps out of 1,223 have a presence in the top paid lists for both the iPad and iPhone. Similarly, 207 (13%) unique apps out of 1,638 have a presence in the top free lists for both the iPad and iPhone. We plot the rank correlation across two different platforms in Figure 2.

For our analysis, we use apps that appear on both lists. A summary of the overlap between these two lists is provided in Table 3. On average, 53 percent of the top 200 paid apps on iPad and 46 percent of the top 200 paid apps on iPhone are also ranked among the top 200 grossing apps list. As seen in Figure 3, there is a strong correlation (average value = 0.55 on iPad and 0.49 on iPhone) between the ranks of apps on the top-paid list and top-grossing list, which suggests that a higher rank (lower numerical value) in the paid list tends to generate larger revenue.

Results

Shape Parameter (a)

Table 4 presents our estimates of coefficients recovered with equation (4). A positive and significant estimate $\beta_1$ suggests that an increase in the rank on the top-paid list increases the rank on the top-grossing list as well. The estimate for price ($\beta_2$) is negative and significant, suggesting that, all else equal, higher prices lead to more revenues and a lower numerical rank on the top grossing list.¹⁰ We are not examining the effect of price on sales (or elasticity) but are simply using price to connect the two lists. All the coefficients in the regression are highly significant and a high value of $R^2$ suggests a good fit.

¹⁰ Notice that lower numerical rank means the application is more popular.

Table 4. Coefficients from Truncated OLS

Log(rank_GROSSING)       iPhone Coef. (robust std. err.)   iPad Coef. (robust std. err.)
Log(rank_PAID) - β1      1.098 (0.008)***                  1.02 (0.009)***
Log(price) - β2          -1.163 (0.011)***                 -1.129 (0.01)***
Constant - β0            1.014 (0.024)***                  2.131 (0.025)***
R²                       0.820                             0.815
N                        8164                              10709
***p-value < 0.001 (1% significance)

As shown in equations (6), (7), and (8), once we estimate (4), we can recover the shape parameter for both the top-grossing and top-paid apps readily. They are produced in Table 5. The estimated values suggest that most sales occur in the head, so the distribution of app demand is top-heavy (even within the 200 top apps).

Table 5. Estimated Pareto Shape Parameters

         iPad     iPhone
a_p      0.903    0.944
a_g      0.886    0.860

A true benefit of the shape parameter is that we can estimate the ratio of the number of downloads of two apps that are ranked differently during any given day in Apple's app store.

$$\text{for iPad:} \quad \frac{d_1}{d_2} = \left(\frac{r_1}{r_2}\right)^{-0.903} \tag{11}$$

$$\text{for iPhone:} \quad \frac{d_1}{d_2} = \left(\frac{r_1}{r_2}\right)^{-0.944} \tag{12}$$

Therefore, an important finding of equations (11) and (12) is that one can compare the value of different ranks. We can infer the number of downloads or revenues for different ranks. For example, the number of downloads enjoyed by a top ranked iPad app is 120 times higher than the app ranked at 200. Similarly, a top ranked iPhone app gets 150 times more downloads than the app ranked at 200. Also, from the shape parameters for revenue (based on the top-grossing app list), the top-ranked app grosses 1.86 times more revenue than the second ranked app on the iPad. This relative valuation is an important factor for firms when they are investing marketing dollars in promoting their applications. One can readily infer the benefit of moving up or down in rank relative to the money spent on promotions.

Scale Parameter (b)

Usually, the shape parameter would be all we are interested in.
Estimates of the shape parameters readily allow for comparison between two ranks. Moreover, the shape parameters tend to remain more stable over time. However, we can now provide a method to estimate the scale parameter as well.

To estimate the scale parameter we need additional information. Note from equation (8) that we can only recover the ratio of scale parameters. To estimate absolute scale parameters, we would need access to actual sales volume from a vendor. However, as we showed in equations (9) and (10), even the aggregate total number of downloads is enough to recover these parameters. Fortunately, the total number of downloads is calculated and presented by various app store analytics firms. As estimated by one such analytics firm (Distimo), the total number of downloads per day for the top 300 paid iPad apps is approximately 110,680 and the total number of downloads per day for the top 300 paid iPhone apps is approximately 386,545. We can readily plug these numbers into equations (9) and (10) to estimate the scale parameter.¹¹ Given these statistics and the coefficients from the regression above, estimated scale parameters using equations (9) and (10) are provided in the first row of Table 6.

¹¹ Even though we estimated the shape parameter with the top 200 apps, extrapolating the demand to the top 300 apps is straightforward.
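The chain from regression coefficients to shape and scale parameters, equations (6) through (10), can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the coefficients are the rounded values printed in Table 4, and the aggregate totals are the Distimo figures quoted above, so the results should only approximately match the published estimates.

```python
import math


def shape_and_ratio(beta0: float, beta1: float, beta2: float):
    """Map the coefficients of equation (5) to model parameters."""
    a_g = -1.0 / beta2                       # equation (6)
    a_p = -beta1 / beta2                     # equation (7)
    scale_ratio = math.exp(-beta0 / beta2)   # equation (8): b_g / b_p
    return a_p, a_g, scale_ratio


def scale_from_aggregate(total_downloads: float, a_p: float, n_ranks: int) -> float:
    """Equation (9): b_p = total downloads / sum of r ** (-a_p) over ranks 1..N."""
    denominator = sum(r ** (-a_p) for r in range(1, n_ranks + 1))
    return total_downloads / denominator


# (beta0, beta1, beta2) from Table 4 and daily downloads for the top 300 paid apps.
INPUTS = {
    "iPad": {"beta": (2.131, 1.02, -1.129), "total": 110_680},
    "iPhone": {"beta": (1.014, 1.098, -1.163), "total": 386_545},
}

estimates = {}
for device, values in INPUTS.items():
    a_p, a_g, scale_ratio = shape_and_ratio(*values["beta"])
    b_p = scale_from_aggregate(values["total"], a_p, n_ranks=300)
    b_g = scale_ratio * b_p                  # equation (10)
    estimates[device] = {"a_p": a_p, "a_g": a_g, "b_p": b_p, "b_g": b_g}
    print(f"{device}: a_p={a_p:.3f} a_g={a_g:.3f} b_p={b_p:,.0f} b_g={b_g:,.0f}")
```

Small differences from the published values are expected here because the inputs are rounded to three decimal places.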
Table 6. Estimated Pareto Scale Parameters

         iPad       iPhone
b_p      13,516     52,958
b_g      $89,206    $126,666

Figure 4. Number of App Downloads Versus App Rank on Top Paid List (iPad and iPhone)

Thus we can now illustrate the relationship that links the total downloads of each app for any given rank. Similarly, we can specify the function that links revenues with rank. Using our estimated parameters, these functions are as follows:

$$d_{\text{iPad}} = 13{,}516 \times r^{-0.903} \tag{13}$$

$$d_{\text{iPhone}} = 52{,}958 \times r^{-0.944} \tag{14}$$

We expect the estimate of the scale parameter to change with time as more consumers buy apps on their mobile devices. Nonetheless, aggregate download numbers allow for the estimation of the scale parameter.

The graph in Figure 4 plots equations (13) and (14) for app sales as a function of app rank in the top paid list for the iPad and iPhone. Given the high value for the shape parameter, the number of downloads drops sharply. An app ranked 200 on the iPad would generate about 100 downloads per day. However, an app ranked at 1000 would generate only about 25 downloads. Given that there are more than 200,000 apps available, it is fair to conclude that most of them generate little or no demand.

Model Validation

One challenge in testing these models is the lack of available data on downloads and app revenue. We developed three different ways to test the validity of our model. First, recall that we estimate two different models, one estimating total downloads and the other estimating total revenue. The difference in the two is that the second model is price multiplied by the demand coming from the first model. Thus, we can cross-validate the models by estimating downloads from the first model and multiplying by the app price to get the estimated revenue from the second model. From Table 7, we can see that the values of estimated revenue are very close (confirmed by t-test), which provides confidence in the accuracy of the computed models.
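Equations (13) and (14), together with the cross-check between the download and revenue models, can be evaluated with a short sketch. The parameter values are those reported in Tables 4, 5, and 6; the function names and the example app (paid rank 10, price $1.99) are hypothetical. The example app is assumed to sit exactly on the fitted regression line of equation (5), so this shows internal consistency of the two fitted curves and is not a reproduction of the paper's t-test on observed apps.

```python
import math

# Estimates reported in Tables 4, 5, and 6.
PARAMS = {
    "iPad": {"b_p": 13_516.0, "a_p": 0.903, "b_g": 89_206.0, "a_g": 0.886,
             "beta": (2.131, 1.02, -1.129)},
    "iPhone": {"b_p": 52_958.0, "a_p": 0.944, "b_g": 126_666.0, "a_g": 0.860,
               "beta": (1.014, 1.098, -1.163)},
}


def daily_downloads(device: str, paid_rank: float) -> float:
    """Equations (13) and (14): downloads per day at a top-paid rank."""
    p = PARAMS[device]
    return p["b_p"] * paid_rank ** (-p["a_p"])


def daily_revenue(device: str, grossing_rank: float) -> float:
    """Revenue per day at a top-grossing rank, b_g * r_g ** (-a_g)."""
    p = PARAMS[device]
    return p["b_g"] * grossing_rank ** (-p["a_g"])


def implied_grossing_rank(device: str, paid_rank: float, price: float) -> float:
    """Grossing rank predicted by equation (5) with the error term set to zero."""
    b0, b1, b2 = PARAMS[device]["beta"]
    return math.exp(b0 + b1 * math.log(paid_rank) + b2 * math.log(price))


PAID_RANK, PRICE = 10, 1.99  # hypothetical app used only for illustration

checks = {}
for device in PARAMS:
    from_downloads = PRICE * daily_downloads(device, PAID_RANK)
    from_grossing = daily_revenue(
        device, implied_grossing_rank(device, PAID_RANK, PRICE))
    checks[device] = (from_downloads, from_grossing)
    print(f"{device}: price x downloads = {from_downloads:,.0f}, "
          f"grossing-model revenue = {from_grossing:,.0f}")

ipad_ratio = daily_downloads("iPad", 1) / daily_downloads("iPad", 200)
iphone_ratio = daily_downloads("iPhone", 1) / daily_downloads("iPhone", 200)
print(f"rank 1 vs rank 200: iPad {ipad_ratio:.0f}x, iPhone {iphone_ratio:.0f}x")
```

The two revenue figures agree only up to the rounding of the published coefficients, which is why a small gap between them is expected.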
Table 7. Summary Statistics of Estimated Revenue and Price × Estimated Downloads

                                      iPhone                        iPad
                                      N       Mean (std. dev.)      N        Mean (std. dev.)
Log (Estimated Downloads × Price)     8164    7.97 (0.877)          10709    7.23 (0.999)
Log (Estimated Revenue)               8164    8.10 (0.813)          10709    7.39 (0.930)

Additionally, to validate the model, we partnered with two separate app developers (who have requested to remain anonymous). Developer D1 shared data on an application, its rank, and total downloads (iPad + iPhone) for a month. Developer D2 shared similar data, but her app was available for iPhone only. We cannot estimate the shape parameter from D1 because the data is combined for iPhone and iPad. However, we can still compare whether the predicted total downloads from our model match the actual download numbers provided by the developer. Recall that total predicted downloads are

$$d_{\text{total}} = d_{\text{iPad}} + d_{\text{iPhone}} = 13{,}516 \times r^{-0.903} + 52{,}958 \times r^{-0.944}$$

The mean value of the actual downloads in our data is 1,737 (std. dev. = 666) and the mean value for the estimated downloads based on rank information is 1,652 (std. dev. = 543). From the paired t-test, we find the p-value = 0.388 (value for unpaired t-test is 0.735). Thus, we can conclude, with some confidence, that the estimated model is predicting values that do not have a significantly different mean from the actual download numbers. We also plot actual and predicted downloads per day for this app over the sample period, as seen in Figure 5. It is evident from the plot that the model is doing a good job of predicting the sale of a given app if the rank is known. This provides additional evidence regarding the robustness of our method.

Figure 5. Number of App Downloads Versus App Rank on Top Paid List (iPad)

Data from developer D2 is used to estimate the shape parameter. However, D2's app is a low-ranked app (average rank is 350). Typically Apple does not publish this rank. However, the app developer worked with an app analytics firm that claimed to have this information. The data spanned 5 months from January 2011 to May 2011. The average number of downloads during this period was 301 (std. dev. = 13). Using this data (N = 55),¹² we estimated the shape parameter to be 0.89. This is consistent with our estimate of 0.94. This is despite the fact that the app was in the tail while our model is estimated using top 200 data.
Thus, the actual data from two separate apps provides validation for our method in general.

Finally, Distimo also shared the average daily revenue aggregated for the top 200 apps in the top grossing list to be approximately $632,158 for iPad and $1,014,371 for iPhone. Using these numbers and the approach suggested in equation (9), we estimated $b_g$ as $80,538 for the iPad and $120,375 for the iPhone. We see that these scale parameters are close to the estimates we provided in Table 6. This further validates our approach to infer app demand from the publicly available data.

¹² The rank data was available only for a few days during each month.

Discussion

So far we have presented a methodology to infer the functional form of demand for Apple's App Store (iPad and iPhone). We believe that this approach is portable to any size of ranked lists and any platform as long as the rank data from multiple lists is available publicly. This is an important contribution as it opens doors for both researchers and practitioners to investigate interesting research and marketing investment questions for mobile platforms. Since the Android platform is another fast-growing mobile platform and in-app purchases are gaining momentum, the following discussion shows how our framework can be ported. We also discuss how our approach can be extended to the free app rank list.

Android Platform

Although the Google Android store provides crude information on the range of total lifetime downloads of an application, it does not provide any meaningful periodic demand numbers. However, similar to Apple, Google does provide both the top paid app lists and the top grossing app lists. Therefore, we can easily port our method to develop the rank-demand correlation for the Android app store. We collected similar data for one week in April 2012 and reestimated the shape parameters as shown in Table 8. We find that the shape parameter for the paid list ($a_p$) is similar to the one estimated for the iOS platforms but the shape parameter for revenue ($a_g$) is larger. It suggests that the revenue generated by apps in a top grossing app list is more skewed on the Android platform.

In-App Purchase (IAP)

Apple's iOS and Google's Android platforms allow for in-app purchases of content, functionality, services, or subscriptions,¹³ allowing app developers to generate additional revenue from either paid apps or free apps. In-app purchases allow consumers to buy these additional features after exploring the capabilities and benefits of an app. Notice that our analysis depends on tying the paid list with the grossing list via price to generate our estimates.
However, if in-app purchase options become a major source of revenue, we need to find a way to modify our estimates. In what follows, we explore this option in detail.

¹³ https://developer.apple.com/news/pdf/in_app_purchase.pdf.

To account for in-app purchases, we rewrite equation (3) as follows:

$$d_{r_g} = d_{r_p}\left(p + f(I_{IAP})\right)$$

where

$$f(I_{IAP}) = \theta\, I_{IAP} = \begin{cases} 0 & \text{if app has no IAP option } (I_{IAP} = 0) \\ \theta & \text{if app has IAP option } (I_{IAP} = 1) \end{cases}$$

Here $I_{IAP}$ is an indicator variable identifying the availability of in-app purchase options. If there is no in-app purchase available ($I_{IAP} = 0$), our analysis boils down to what we did earlier. If in-app purchases are available ($I_{IAP} = 1$), $\theta$ determines the revenue generated from in-app purchases. In short, all else equal, if an app generates large revenues from in-app purchases, its position on the top grossing list will move relative to the top paid list. Since we know which apps have in-app purchase options, it allows for ready identification of $\theta$. Taking logs of both sides reduces the regression equation to

$$\log(r_g) = \beta_0 + \beta_1 \log(r_p) + \beta_2 \log(p + \theta\, I_{IAP}) + \varepsilon \tag{15}$$

Estimating the parameters using a nonlinear regression, we find that the shape parameters are consistent ($a_p$ = 0.90 for iPad, $a_p$ = 0.939 for iPhone). Thus, the addition of in-app purchase options does not change the slope parameter of our estimated distribution. We estimate the value of $\theta$ = 0.16 (p-value < 0.03), which suggests that, on average, IAP revenue is approximately equal to 16 cents per download of an app (about 7% of the revenue). Thus, our method allows us to calibrate the rank-sales relationship readily, even in the presence of in-app purchase options.

Of course, our model's accuracy will decrease if each paid app has an in-app purchase option and generates unequal revenues from in-app purchases for unobserved reasons. Even then, if we have some observables that can predict in-app revenues, we can readily use this method.¹⁴ In short, our method can be readily modified to accommodate the in-app purchase option.
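One way to fit a model of the form of equation (15) is profile least squares: for any fixed θ the equation is linear in the β coefficients, so ordinary least squares can be run over a grid of candidate θ values and the best fit kept. The sketch below illustrates that idea on synthetic data. It is an assumption made for illustration, not the estimation routine used in this paper; the function name `fit_iap_model`, the grid, and every number in the simulated sample are hypothetical.

```python
import numpy as np


def fit_iap_model(log_rg, log_rp, price, has_iap, theta_grid):
    """Profile least squares for equation (15).

    For each candidate theta the model is linear in (beta0, beta1, beta2),
    so OLS is solved and the theta with the smallest squared error is kept.
    """
    best = None
    for theta in theta_grid:
        design = np.column_stack([
            np.ones_like(log_rp),
            log_rp,
            np.log(price + theta * has_iap),
        ])
        beta, _, _, _ = np.linalg.lstsq(design, log_rg, rcond=None)
        sse = float(np.sum((log_rg - design @ beta) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(theta), beta)
    return best[1], best[2]


# Synthetic sample (hypothetical values, not data from the paper).
rng = np.random.default_rng(0)
n = 2000
log_rp = np.log(rng.integers(1, 201, size=n).astype(float))
price = rng.choice([0.99, 1.99, 2.99, 4.99], size=n)
has_iap = rng.integers(0, 2, size=n).astype(float)
true_beta = np.array([1.0, 1.1, -1.16])
true_theta = 0.16
log_rg = (true_beta[0]
          + true_beta[1] * log_rp
          + true_beta[2] * np.log(price + true_theta * has_iap)
          + rng.normal(0.0, 0.05, size=n))

theta_hat, beta_hat = fit_iap_model(
    log_rg, log_rp, price, has_iap, np.linspace(0.0, 1.0, 101))
a_g_hat = -1.0 / beta_hat[2]           # equation (6)
a_p_hat = -beta_hat[1] / beta_hat[2]   # equation (7)
print(f"theta={theta_hat:.2f} a_p={a_p_hat:.3f} a_g={a_g_hat:.3f}")
```

A grid search keeps the sketch dependency-light and easy to inspect; a general nonlinear least squares solver would serve the same purpose and would also provide standard errors for θ.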
Table 8. Estimated Pareto Shape Parameters (Android Platform)

         Android
a_p      0.985
a_g      1.165

Free Apps

Our approach so far has focused on the demand estimation for the paid apps that are placed on the top 200 bestselling list and overlap with the top grossing app list. Another extension of this work is to estimate the demand for free apps, where the overlap with the top grossing list is purely because of the in-app purchases. Since free apps attract over 10 times more volume of downloads, it is even more lucrative for app developers to infer the extent of revenue generated using in-app purchases for free apps. We briefly discuss how our method can be applied to free apps.

¹⁴ Generally we would expect that, in future, apps on the top paid list will also generate more revenues from in-app purchase. This positive correlation will keep their position on the top-grossing app list intact, allowing our method to go through readily.

The key difference between paid and free apps is price, so the revenues for free apps are simply

$$d_{r_g} = d_{r_F} \times f(I_{IAP})$$

where

$$f(I_{IAP}) = \theta\, I_{IAP} = \begin{cases} 0 & \text{if app has no IAP option } (I_{IAP} = 0) \\ \theta & \text{if app has IAP option } (I_{IAP} = 1) \end{cases}$$

Using the overlap between top grossing apps and top free apps, our estimable form, similar to equation (5), reduces to

$$\log(r_g) = \beta_0 + \beta_1 \log(r_F) + \varepsilon$$

where the estimated parameters are related to model parameters as follows:

$$\beta_0 = -\frac{1}{a_g}\log\!\left(\frac{\theta\, b_F}{b_g}\right), \qquad \beta_1 = \frac{a_F}{a_g}$$

Recall, our grossing app parameter $a_g$ is already estimated from the paid list. Therefore, we can readily recover $a_F$ from the estimate $\beta_1$.¹⁵

In our estimation, we treat each data point as an independent observation even if the same app appears multiple times. This is because the rank of an app is driven by its sale alone. If that were not the case and some other factors affected rank, then it is possible that a higher ranked app might have lower downloads. In that case, we can rely on fixed effect models. These models assume that each app is separate and unique and has its own intercept. Our method can readily be used if Apple were to start a ranking scheme that does not entirely rely on demand alone.¹⁶

Conclusion

We have used publicly available rank data and presented a methodology to estimate the product sales from the rankings of apps listed in top-200 lists on Apple's app store for both iPhone and iPad. From our analysis, we find that the number of downloads enjoyed by an iPad app ranked first on the top paid list is 120 times the number of downloads for the app ranked 200.
Similarly, an iPhone app ranked first gets 150 times more downloads than the app ranked 200. We also show that the iPhone app ranked first on the top grossing list earns 95 times more revenue than the app ranked 200. The corresponding number for the iPad App Store is 110. Thus our model allows for comparison between any two ranked apps. We also provide a method to estimate the scale parameter from aggregate data. Thus, we show that the top ranked app on the iPad is downloaded 13,516 times per day and the top ranked app on the iPhone is downloaded 52,958 times per day (April-May 2011). These estimates should help app developers and marketing professionals guide their marketing efforts.

We have validated our model in different ways, including gathering data from two separate app developers. We also consider various extensions. For example, we extend our method to Google's Android platform to calibrate a similar relationship. We consider how in-app purchase options could affect our estimates. We show how our method can be readily adapted to account for the expected growth of in-app purchase options in the future. We also show how our method can be extended to the free app rank list.

Most importantly, we believe that inferring demand data from rank is highly valuable for researchers. It will open doors for more exciting and interesting research that has not been possible due to the absence of reasonable demand estimates. Mobile apps are an important and fast growing technology market. Understanding this market and the opportunities it offers is important for different stakeholders. We believe our research method and results have many important implications and hence make a very useful contribution not only to academic literature, but also to managers and entrepreneurs. We also believe our methods can generalize to any platform that provides top paid and top grossing rankings. Top mobile platforms indeed provide such rankings.

Our research can be improved on many different dimensions. Our dataset is limited to the top 200 apps. Having access to a larger number of ranked apps (say, the top 1000) should provide a better fit. Similarly, while we verified our estimates with data from only two app providers, there are significant opportunities to expand the scope of this data collection by collecting data from more apps across different categories. Also, we assumed that both download-based and revenue-based rankings follow a power law, but alternate and more precise distributions could also be developed if more data is available from app producers. We hope future research will extend and refine our methods.

¹⁵ We estimate $a_F$ = 0.45 and 0.62, respectively, for iPhone and iPad from our data. These estimates are lower than those for paid apps, suggesting that the curve for free apps is not as skewed as for paid apps.

¹⁶ If we used the fixed effect approach, our estimate for the shape parameter increases to 1.33 for iPhone and 1.64 for iPad. These are significantly higher than the estimates without fixed effects or the shape parameter estimated from the actual sales data from developer D2.

Acknowledgments

First, we would like to thank Distimo for sharing the aggregate sales and demand numbers, and our confidential sources for sharing the application sales data for Apple's App Store. We thank Vanlal Peka for his help during the data acquisition phase. We thank Michael Smith for his support and feedback on this research project. Finally, we are grateful to the senior editor, the associate editor, and the three anonymous reviewers for helpful feedback and suggestions.

References

Anderson, C. 2004. "The Long Tail," Wired Magazine (12:10) (http://www.wired.com/wired/archive/12.10/tail.html).
Barra, H. 2011. "Android: Momentum, Mobile and More at Google I/O," Official Google Blog, May 10 (http://googleblog.blogspot.com/2011/05/android-momentum-mobile-and-more-at.html).
Boudreau, K. J. 2012. "Let a Thousand Flowers Bloom? An Early Look at Large Numbers of Software App Developers and Patterns of Innovation," Organization Science (23:5), pp. 1409-1427.
Brynjolfsson, E., Hu, Y. J., and Smith, M. D. 2003. "Consumer Surplus in the Digital Economy: Estimating the Value of Increased Product Variety at Online Booksellers," Management Science (49:11), pp. 1580-1596.
Brynjolfsson, E., Hu, Y. J., and Smith, M. D. 2010. "The Longer Tail: The Changing Shape of Amazon's Sales Distribution Curve" (available at SSRN: http://ssrn.com/abstract=1679991).
Carare, O. 2012. "The Impact of Bestseller Rank on Demand: Evidence from the App Market," International Economic Review (54:2), pp. 717-742.
Chevalier, J., and Goolsbee, A. 2003. "Measuring Prices and Price Competition Online: Amazon.com and BarnesandNoble.com," Quantitative Marketing and Economics (1:2), pp. 203-222.
Chevalier, J., and Mayzlin, D. 2006. "The Effect of Word of Mouth on Sales: Online Book Reviews," Journal of Marketing Research (43:3), pp. 345-354.
Ghose, A., Smith, M. D., and Telang, R. 2006. "Internet Exchanges for Used Books: An Empirical Analysis of Product Cannibalization and Welfare Impact," Information Systems Research (17:1), pp. 3-19.
Ghose, A., and Sundararajan, A. 2006. "Evaluating Pricing Strategy Using e-Commerce Data: Evidence and Estimation Challenges," Statistical Science (21:2), pp. 131-142.
Poynter, D. 2000, June. Publishing Poynters, Para Publishing.
Weingarten, G. 2001. "Below the Beltway," Washington Post, September 2, p. W03 (http://www.washingtonpost.com/wp-dyn/articles/a21499-2001aug30.html).

About the Authors

Rajiv Garg is an assistant professor in the McCombs School of Business at the University of Texas at Austin. He received his Ph.D. in Information Systems and Management from the Heinz College at Carnegie Mellon University.
He also has graduate degrees in Computer Science and Electrical Engineering, both from the University of Southern California, and an undergraduate degree in Electrical Engineering from the Indian Institute of Technology, Banaras Hindu University in India. His research interests are at the intersection of economics, marketing, and information systems with a focus on digital, social, and mobile platforms. Rajiv is a senior member of IEEE and for the past decade has served on the boards of various small corporations. Rajiv's research work has appeared in Journal of Management Information Systems and various peer-reviewed conference proceedings.

Rahul Telang is a professor of Information Systems and Management at the Heinz College, Carnegie Mellon University. He received his Ph.D. in Information Systems from the Tepper School of Business, Carnegie Mellon University. Rahul's research interests lie in two major domains: the digital media industry and the economics of information security and privacy (for which he received an NSF CAREER award). Currently, he is working on a large NSA-funded project examining home users' security and privacy behavior. Rahul has published extensively in many top journals including Management Science, Marketing Science, Information Systems Research, MIS Quarterly, and Journal of Marketing Research. He is a senior editor at Information Systems Research and MIS Quarterly. His work has been cited in major media outlets and many of his papers have received top honors at journals and conferences.