INDICATORS THROUGH MOBILE PHONE RECORDS. How mobile phone calling patterns can help us measure poverty, income and more.


SECOND YEAR POLICY ANALYSIS

TOWARDS REAL-TIME MONITORING OF SOCIO-ECONOMIC INDICATORS THROUGH MOBILE PHONE RECORDS

The case of Rwanda: How mobile phone calling patterns can help us measure poverty, income and more.

3/13/2014

Written in fulfillment of the requirements for the degree of Master in Public Administration in International Development (MPA/ID), John F. Kennedy School of Government, Harvard University

Alexis Eggermont
Section Leader: Ishac Diwan
Advisors: Ricardo Hausmann and Jukka-Pekka Onnela

All models are wrong, but some are useful. -- George E. P. Box

If you torture the data enough, nature will always confess. -- Ronald Coase

Contents

Executive summary
Research question
Context
Advantages of mobile phone metadata
Literature review and state of the art
Clients
Data description
Variable generation
  A. Linking mobile phone users and surveys to geography
  B. Mobility variables (independent variables)
  C. Social variables (independent variables)
  D. Usage variables (independent variables)
  E. Socio-Economics variables (dependent variables)
Predictive model and results
  At the most granular level
  At lower granularity levels
Policy applications
  A. Possible uses for data
  B. Institutional arrangement
Assumptions and limitations
  A. Limitations to the overall approach
  B. Limitations of the current data
Conclusion
Glossary
Bibliography
Code

Table of illustrations

Figure 1 - Total mobile phone penetration in Africa
Figure 2 - Mobile penetration in Rwanda by carrier
Figure 3 - BTS and Voronoi polygons - national
Figure 4 - BTS and Voronoi polygons - zoom on Kigali
Figure 5 - Real vs predicted poverty (R² = 16%)
Figure 6 - Evolution of R-squared with addition of new variables (dependent variable: bottom wealth quintile)
Figure 7 - Real vs predicted wealth (R² = 64%)
Figure 8 - Evolution of R-squared with addition of new variables (dependent variable: top wealth quintile)
Figure 9 - R² including mobile penetration
Figure 10 - R² including penetration
Figure 11 - Initial 637 polygons
Figure 12 - Subsequent 118 polygons

Acknowledgements

I would like to thank my advisors Ricardo Hausmann and Jukka-Pekka Onnela as well as my section leader Ishac Diwan for their expertise and guidance in writing this paper. It could also not have been done without access to the rich dataset kindly provided by Real Impact Analytics and the mobile phone operator. Precious technical help was provided by Frank Vanden Berghen and his incredibly fast ETL software Anatella, which I have grown to appreciate more and more as I faced the challenges of large datasets. Finally, Carol Finney has been a second mother and formidable program director. I am very grateful for each one of the pieces of this puzzle that these people brought together.

Executive summary

The spread of mobile technology throughout the world has created new opportunities for data collection. In this paper, we examine mobile phone call records and their potential for predicting socio-economic indicators. We review the literature on the recent topic of Big Data, focusing on its use for the public good and development. We then dive into a concrete example using 1.9 billion mobile phone records from a large operator in Rwanda.

By combining this data with socio-economic variables from household surveys and census data, we create a model that attempts to predict socio-economic variables, such as poverty rates, based on i) mobile phone usage variables, such as total calling time per user, ii) mobility variables, such as the fraction of users who have never called from outside their home village or town, and iii) social variables, such as the number of contacts and the mobility of these contacts.

We use regression techniques with k-fold cross-validation and find statistically significant (p-value < 0.0001) models for the prediction of the proportion of households in each village or neighborhood in the top and bottom national wealth quintiles, as well as mobile phone ownership and bank account ownership, at the BTS level (Base Transceiver Station, i.e. usually a village or city block). Out-of-sample R-squared values are 16% for poverty prediction and 64% for wealth prediction. By adding mobile penetration, these predictive powers jump to 51% and 81% respectively. Due to high levels of noise in the dependent variable at village level, we further examine the performance of the models on aggregated geographic areas but do not find improved results.

We then examine the current limitations of our models and ways in which they could be improved. This provides us with sufficient confidence to map socio-economic variables such as income with much higher spatial and temporal resolution than through traditional means and at a much lower cost.

We conclude by recommending ways in which these techniques could be implemented by governments to monitor socio-economic variables for regional planning or poverty-targeting programs.

Research question

How can mobile phone metadata be used to monitor or enrich socio-economic data? In particular, can it improve poverty mapping in Rwanda?

Context

The widespread adoption of mobile phones in developing economies has occurred faster than that of any other communication technology. As mobile phones have become increasingly affordable over the past decade, penetration rates have soared to cover most of the adult population on the African continent. The resulting increased capacity for communication has undoubtedly enhanced the social and economic integration of the region: migrant workers can now stay in contact with their families more easily than before, markets are made more transparent and efficient (Jensen, 2007) and mobile money services now facilitate business transactions, remittances and even micro-credit. In the past 3 to 5 years, these high levels of penetration and usage have become very significant even in the most rural areas of the least developed countries (GSMA, 2012).

Figure 1 - Total mobile phone penetration in Africa 1,2

Figure 2 - Mobile penetration in Rwanda by carrier (Rwandatel, Airtel, Tigo, MTN) 3
Note that the late 2013 decrease is due to a change in the definition of active subscriber rather than a change in trend.

1 GSMA (2012)
2 World Development Indicators
3 Rwanda Utilities Regulatory Authority

Rwanda is no exception to this trend. Mobile penetration rates, defined as the number of active mobile phone numbers divided by population, have soared in the past 3 years to reach almost 65% 4 of the total population. We must note two things in order to better understand this number. On one hand, many Rwandans use more than one SIM card simultaneously in order to benefit from lower on-network calling rates, implying that the penetration rate defined above overestimates the fraction of the population that actually uses cell phones. This will be referred to as multi-simming henceforth. On the other hand, the proportion of children below the age of 15 in Rwanda in 2010 was 42.6% (UN 2012). If we make the assumption that children under 15 do not have their own SIM card, the adult unadjusted penetration rate can be inferred to reach 113% nationwide. While the average number of SIM cards per mobile user was not available for Rwanda, this figure has been estimated at 1.96 for Africa in general 5. In the absence of multi-simming figures specific to Rwanda, our best estimate of multi-simming adjusted adult mobile phone ownership gathered from SIM registrations is 113%/1.96, or 57%.

More accurate figures of phone ownership obtained through direct surveys reveal that the percentage of households with at least one mobile phone dramatically increased between 2005/2006 and 2010/2011 in urban areas (from 26.5% to 71.5%) as well as rural areas (from 2.2% to 40.6%). In Kigali City in particular, the ownership rate was 79.6% in 2010/2011, up from 33.2% in 2005/2006, an increase of 46.4 percentage points in only five years 6. Since the overall number of active SIM cards in the country has almost doubled between December 2010 and November 2013, we can reasonably assume household ownership rates above 60% and rising at the time of writing. We note some gender asymmetry in mobile phone ownership, with households headed by females having an ownership rate of 35.1% against 49% for households headed by males.
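The adjustment from raw SIM penetration to adult phone ownership can be retraced step by step. The sketch below is illustrative only: it plugs in the rounded figures quoted in the text (65% raw penetration, 42.6% of the population under 15, 1.96 SIM cards per user), and the function name is ours rather than part of the original analysis. Because the inputs are rounded, the results agree with the quoted 113% and 57% only to within about one percentage point.

```python
def adjusted_adult_ownership(raw_penetration, share_under_15, sims_per_user):
    """Return (adult_unadjusted, adult_adjusted) penetration rates.

    Assumes, as in the text, that children under 15 hold no SIM card and
    that every adult phone user holds `sims_per_user` SIM cards on average.
    """
    # Spread all active SIM cards over the adult population only.
    adult_unadjusted = raw_penetration / (1.0 - share_under_15)
    # Correct for multi-simming: several SIM cards per actual user.
    adult_adjusted = adult_unadjusted / sims_per_user
    return adult_unadjusted, adult_adjusted


adult_raw, adult_adj = adjusted_adult_ownership(0.65, 0.426, 1.96)
print(f"Adult unadjusted penetration: {adult_raw:.1%}")
print(f"Multi-simming adjusted adult ownership: {adult_adj:.1%}")
```

The estimate is sensitive to the multi-simming factor: the 1.96 figure is an Africa-wide average, so a Rwanda-specific value would shift the result proportionally.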
4 Rwanda Utilities Regulatory Authority
5 GSMA Intelligence
6 National Institute of Statistics of Rwanda, Integrated Survey on Life Conditions

These multi-simming adjusted adult mobile phone ownership rates of above 50% open the door for remarkable data gathering possibilities. Cell phone metadata such as call detail records (CDR), which include call location, length, call recipient and emitter, are usually stored by mobile phone operators for several years for billing and business intelligence purposes. They can be used to locate individuals and populations, identify their interactions and social network, their mobile phone credit patterns, etc. The possible applications of such data are countless. Just to name a few: migration flows can be identified, analyzed and possibly predicted, commuting times and traffic can be measured, communities can be identified, and, as we will show, economic activity can be inferred. We will detail some of the documented possibilities in the literature review section.

Given such a wide coverage of the population, one of the main inferential concerns of the past few years, namely bias towards the mobile-phone-owning minority, is rapidly receding. This is not to say that bias does not exist (penetration is still higher in some areas than in others), but that it exists in a much more controllable way. In the example that we will formalize below, estimates of penetration rates by village or neighborhood provide an interesting variable to attempt to close the gap left by imperfect penetration. For example, when attempting to predict poverty rates, low rates of phone ownership can safely be assumed to be a significant predictor.

Advantages of mobile phone metadata

The technical part of the paper will be dedicated to showing whether reliable estimates of poverty and other social indicators can be inferred from mobile phone usage patterns. If this is the case, we identify four ways in which such data collection would improve on current poverty mapping methods:

1. Granularity: Most household surveys are constrained by relatively small sample sizes. These usually allow poverty estimations to be statistically significant at the level of the first subnational division. In the case of Rwanda, these data are available for 5 provinces with populations of one to three million each. A mobile phone pattern-based approach, if shown to be sufficiently accurate, would allow such estimations to be made at the level of each village or neighborhood 7, a 1000-fold increase in the level of detail.

2. Cost: Whereas household surveys are costly endeavors, with costs proportional to the sample size, mobile phone usage data is already being collected by mobile phone operators for every mobile phone user. The cost of exploiting this data would be minimal and the exercise easily reproducible (for example in other countries).

3. Frequency: Because of the cumbersome process required for household surveys, they are typically only performed every few years. The approach suggested in this paper could easily be performed daily at almost no cost.

7 In our proof-of-concept, the median number of mobile phone users per BTS was 700.

4. Time lag: Household surveys take a long time to be performed and processed. The mobile phone data approach would allow poverty to be monitored as it happens, rather than with months or years of delay.

The maps below, depicting electoral outcome in the United States, provide an illustration of the objective of our project. As of now, poverty mapping in many African countries resembles the map on the left. The cost of drawing this map allows it to be updated only about every 5 years. Our objective is to refine it into something that looks more like the map on the right, updated in real time and at very little cost.

Literature review and state of the art

Use of big data for the public good

The use of large databases to predict characteristics of a population is a fairly recent trend dependent on technologies that often did not exist just one or two decades ago. The term Big Data, which refers to collections of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications, is difficult to trace back to a specific origin, but has become a common occurrence in the literature since the mid-to-late 2000s. The field was first driven by the private sector, as multinational companies eager to better monitor their performance, track their customers and understand their needs adopted the emerging technologies necessary for it. In the past few years, governmental bodies (outside of the intelligence community, presumably a very early adopter of Big Data techniques) have jumped on the bandwagon.

We provide below a few striking but non-exhaustive examples of current usage of Big Data for social or public purposes, by private or public actors.

1. Google.org, the charitable arm of Google, studies the timing and location of millions of search engine queries to predict flu outbreaks and unemployment before official statistics come out. These estimates have been generally consistent with conventional surveillance data collected by health agencies, both nationally and regionally 8. The large scale of the data collection is instrumental to the isolation of the signal (higher incidence of flu-related searches) from the noise (unpredictable variations due to people making flu-related searches without being infected) (Pervaiz et al 2012).

2. Telefonica Research, the R&D arm of Spanish telecommunications giant Telefonica, has developed two tools that use call records for public usage: one to compute more affordable census maps and one to evaluate the impact of public health alerts on epidemic spreading (Frias-Martinez et al 2012).

3. Recently, a new class of geo-localized data has emerged, letting companies make useful inferences about people's lives and economic activity. For example, the length of time that people are willing to travel to shopping malls, which can easily be inferred from tracking the location of cell phones, is an excellent proxy for measuring consumer demand in the economy (Bollier et al 2010).

4. Crimson Hexagon, a Harvard start-up, has developed tools to identify signals present in social media that can shed light on how populations cope with global crises, such as commodity price volatility. By classifying a population's tweets into categories associated with relevant topics, the tools can detect anomalies such as spikes or drops in the number of tweets about particular topics (e.g. comments about power outages in Indonesia or student loans in the U.S.), observe weekly and monthly trends in Twitter conversations (e.g. discussions around debt in the U.S.), etc. This research has pointed to the potential use of Twitter data for understanding the immediate worries, fears and concerns of populations. Their research has also highlighted the limitations of this approach for gauging people's long-term aspirations 9.

5. SAS and the United Nations Global Pulse recently investigated whether useful signals could be found in public social media data in order to predict unemployment and indicate how people cope with rising unemployment in the US and Ireland 10. They found strong and statistically significant results. For example, conversations in Ireland showing a confused mood preceded the unemployment rate variations by 3 months.

6. A collaborative study between the United Nations Global Pulse and PriceStats focused on the construction of a daily bread price index for six Latin American countries. The results show that online retail prices reveal offline street price changes weeks before official sales numbers reflect the inflation. This could allow policy makers to better prepare for the negative effects of inflation on consumers 11.

7. A multi-partner project with the Complex Systems Institute of Paris Ile-de-France & IFRIS examined thousands of news items related to food security issues in the French-language press. The findings show a shift in news coverage from a focus on humanitarian issues to food price volatility discussions as the 2008 global economic crisis unfolded. Data from 2011 revealed that the news focus had by then shifted to social unrest 12.

8 Google.org
9 United Nations Global Pulse
10 United Nations Global Pulse
11 United Nations Global Pulse
12 United Nations Global Pulse

8. NASA satellite images of the Earth at night are routinely used to estimate economic development and population density 13.

9. Large Big Data service providers like IBM are already providing services tailored for governments using government data sources. These services encompass threat prediction, detection of social program fraud, tax fraud, crime prediction, etc. 14

13 United Nations Global Pulse
14 IBM

Prediction of economic and social indicators using mobile phones

The prediction of macro-economic indicators using mobile phone records requires great computing power, the widespread use of mobile phones in order to minimize bias, and the consent of at least one network operator to give access to its call records (or more aggregated data). As a result of these three constraints, it is a field that did not exist a decade ago and is still very much embryonic today. We highlight below the papers that have explored this new field.

It has been suggested in theoretical work that the structure of social relations between individuals may affect a community's economic development. In particular, highly clustered, or insular, social ties are predicted to limit access to social and economic prospects. Eagle, N., Macy, M., & Claxton, R. (2010) quantify this relationship and find that the diversity of individuals' relationships is strongly correlated with the economic development of communities in the United Kingdom.

Closely related to our topic, Smith, C., Mashhadi, A. & Capra, L. (2013) highlight correlations between the Multidimensional Poverty Index and a number of aggregated variables of traffic between antennas in Côte d'Ivoire. This is then used to compute a more precise map of poverty than previously available. We aim to improve on this method through the use of much more granular cell phone data (individual call records instead of antenna traffic) and poverty data (320 measures of cluster poverty instead of 11), allowing for more robust models.

Based on field interviews, Blumenstock, J., & Eagle, N. (2012) also note large disparities in patterns of phone use and in the structure of social networks by socio-economic status. Blumenstock, J., Shen, Y. and Eagle, N. (2010) create a method for estimating household wealth based on call history and find an R-squared of 0.21 between expenditures and phone use in Rwanda, which is indicative of a strong relationship between these variables, although not as strong as may have been expected. They also find patterns in the relationship between a person's social network and his or her predicted expenditures. Namely, the number of unique international contacts has a strong positive association with expenditures. They note that predicting expenditures and predicting poverty are fairly different exercises, and expect the correlation between expenditures and phone usage to be stronger among the poor than among the rich, who are expected to have lower elasticity of demand (their phone expenditures being limited less by their wealth and more by their social network).

As discussed further in the paper, we also expect higher predictive power when predicting poverty rates at the level of a cluster or municipality than when predicting poverty at the level of a specific individual. This is because errors average out on aggregation, for the same reason that the relationship between life expectancy and GDP per capita is very strong at the country level (R-squared 0.75), whereas income explains a much smaller part of the variability in life expectancy for a specific individual.

Eagle et al. (2010) developed a metric to capture the social diversity of communication ties within a user's social network. Higher diversity scores mean that a user splits his or her time more evenly among social ties. The authors show in the same paper that such diversity scores are highly correlated with socio-economic rank in the UK (R-squared = 0.5). They also demonstrate that structural holes (links that people have with different communities of people) have a positive correlation with the socio-economic level of neighborhoods in the United Kingdom.

Also using social variables, Wang et al. (2013) develop a reciprocity index to capture the degree of communicative imbalance between the nodes.

Finally, Soto et al (2011) take an approach somewhat similar to the one we will favor below. Limiting their analysis to an urban population of around 500,000 citizens in Latin America, they present predictive models constructed with Support Vector Machines and Random Forest algorithms that use multiple aggregated behavioral variables at the Base Transceiver Station (BTS) level to predict socio-economic levels, and which show correct categorization in around 80% of cases.
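To make the notion of a diversity score concrete, the sketch below uses normalized Shannon entropy of a user's call volume across contacts, a common way to express how evenly communication is split. This is an assumption about the general form of such a metric, not a reproduction of the exact computation in Eagle et al. (2010); the function name and the input format are hypothetical.

```python
import math


def social_diversity(call_volumes):
    """Normalized Shannon entropy of call volume across a user's contacts.

    call_volumes: dict mapping contact id -> number of calls (or minutes).
    Returns a value in [0, 1]: 1 when volume is split evenly among contacts,
    close to 0 when nearly all volume goes to one contact. Users with fewer
    than two contacts get 0 by convention (an assumption of this sketch).
    """
    volumes = [v for v in call_volumes.values() if v > 0]
    k = len(volumes)
    if k < 2:
        return 0.0
    total = sum(volumes)
    entropy = -sum((v / total) * math.log(v / total) for v in volumes)
    return entropy / math.log(k)  # divide by the maximum entropy for k contacts


even = social_diversity({"a": 10, "b": 10, "c": 10})
skewed = social_diversity({"a": 98, "b": 1, "c": 1})
print(round(even, 3), round(skewed, 3))
```

Averaging such per-user scores over all users based at a BTS would give a variable at the same geographic level as the survey data.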

Clients

The primary client of the paper is the National Institute of Statistics of Rwanda. This paper aims to be a proof of concept of Big Data tools for governmental and public use. We advocate that the mission of the National Institute of Statistics could be broadened from the collection and distribution of statistics to the detection of digital signals used for real-time monitoring of socio-economic variables. This new type of data would, for example, lead to potential improvement of targeting for poverty-reduction transfers and expenditures through a finer, cheaper and quicker understanding of the geographical determinants of poverty. The inclusion of mobile phone operators in that process is discussed in the policy proposal part of the paper.

The secondary client of the paper will be the United Nations Global Pulse, a recent initiative launched by the Executive Office of the United Nations Secretary-General. The Global Pulse is exploring how new, digital data sources and real-time analytics technologies can help policymakers understand human well-being and emerging vulnerabilities in real time, in order to better protect populations from shocks. The implementation of the suggestion made in this paper could serve as a pilot program for the Global Pulse on the use of mobile data for macroeconomic predictions, which could then be implemented in other countries.

Data description

Cell phone data

Our main dataset for this exercise consists of about 1.9 billion call detail records (CDR) for calls made using pre-paid SIM cards from the largest cell phone operator in Rwanda for the 7 months between the beginning of December 2011 and the end of June 2012. This dataset will be used to characterize mobility, social and expenditure behavior and will become the source for the construction of most independent variables. Each CDR contains the following fields:

- Anonymized phone number of the caller
- Anonymized phone number of the receiver
- Date of the call
- Time of the call
- ID of the cell on which the caller started the call (i.e. part of a cell phone antenna)
- Duration of the call
- Cost of the call

A separate data set allows us to locate each cell to a specific site (i.e. cell phone tower/mast) with given GPS coordinates. 15

Household survey data

Household surveys will be the source of the socio-economic data which will form our dependent variables. We use a public data set from MeasureDHS, which implemented its 2010 survey in partnership with the National Institute of Statistics of Rwanda (NISR) and the Ministry of Health (MOH). It features interview responses at the household level for 12,540 households. About 26 households were randomly selected in each of 492 clusters (villages or neighborhoods), which are a Rwandan administrative unit dividing the country in 14,837 parts. Each cluster was randomly selected with probability proportional to its population.

15 Note that in the current draft version of this paper, all computations were done using the first 6 days of data for each of the 7 months.
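As an illustration of the data layout described above, the following sketch models one CDR and the separate cell-to-site lookup. All class, field and identifier names are hypothetical, as are the sample values; the operator's actual schema is not given in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    caller_id: str        # anonymized phone number of the caller
    receiver_id: str      # anonymized phone number of the receiver
    start: datetime       # date and time of the call
    cell_id: str          # cell on which the caller started the call
    duration_s: int       # duration of the call, in seconds
    cost_francs: float    # cost of the call


@dataclass(frozen=True)
class Site:
    site_id: str          # cell phone tower/mast
    lat: float
    lon: float


def locate_call(record, cell_to_site, sites):
    """Return the Site a call started from, or None if the cell is unknown."""
    site_id = cell_to_site.get(record.cell_id)
    return sites.get(site_id) if site_id is not None else None


# Made-up sample data for illustration only.
sites = {"S1": Site("S1", -1.95, 30.06)}
cell_to_site = {"C1a": "S1", "C1b": "S1"}  # several cells can share one site
record = CallDetailRecord("u001", "u002", datetime(2012, 1, 3, 21, 15), "C1b", 42, 30.0)
print(locate_call(record, cell_to_site, sites))
```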

The target groups in these surveys were women age and men age who were randomly selected from households across the country. For privacy reasons, the GPS location of each interview is not given precisely but instead approximated by the geographical center of the cluster.

Population data

The final source of information is the WorldPop project 16. In order to match population figures to the Voronoi polygons that will be used to link users to BTS, we cannot use census data directly, since it uses different, overlapping administrative divisions. The WorldPop project solves this problem by providing population estimates with a 100-meter resolution for most of the world, including Rwanda. This dataset is obtained by combining census population with a combination of widely available, remotely-sensed and geospatial datasets (e.g. settlement locations, land cover, roads, building maps, health facility locations, infrared nightlights, vegetation, topography) using Random Forest models.

Variable generation

A. Linking mobile phone users and surveys to geography

Since we will need to map our dependent and independent variables to the same geographical units, the first step in generating any mobile phone usage variables will be to link users to a specific BTS. For this, we sum the number of times each BTS was used by each user and attribute a higher weight to night calls (5PM to 5AM) than to day calls 17, since they are more likely to have been made from a home location than from a work location.

We then create geographical polygons around each BTS in such a way that each point in the country is inside the polygon of its closest BTS 18. This process, called Voronoi tessellation, reflects fairly accurately the real process of cell phones connecting to a specific BTS.

Figure 3 - BTS and Voronoi polygons - national

Figure 4 - BTS and Voronoi polygons - zoom on Kigali

Each household survey respondent can then be mapped to the generated Voronoi polygon that it falls into using its approximate GPS coordinates.

17 We use a 5 times higher weight for night calls than day calls in this case.
18 As of the publication of the first draft, location is based on the most used site at all times for the first 6 days of each month for which we have data.
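The two steps described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the paper: the input tuples, function names and tie-breaking rule are hypothetical, and polygon membership is computed through the defining property of a Voronoi tessellation (a point belongs to the polygon of its closest BTS) instead of building the polygons explicitly. The sample coordinates are approximate and for illustration only.

```python
import math
from collections import defaultdict

NIGHT_WEIGHT = 5.0  # night calls weigh 5 times more than day calls (footnote 17)


def is_night(hour):
    """True for calls started between 5PM and 5AM."""
    return hour >= 17 or hour < 5


def home_bts(calls):
    """calls: iterable of (user_id, bts_id, hour_of_day).
    Returns {user_id: BTS with the highest weighted call count}.
    Ties are broken by BTS id, an arbitrary choice made for this sketch."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, bts, hour in calls:
        scores[user][bts] += NIGHT_WEIGHT if is_night(hour) else 1.0
    return {user: max(by_bts.items(), key=lambda kv: (kv[1], kv[0]))[0]
            for user, by_bts in scores.items()}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def voronoi_cell(lat, lon, bts_coords):
    """Return the BTS whose Voronoi polygon contains (lat, lon), i.e. the closest BTS.
    bts_coords: {bts_id: (lat, lon)}."""
    return min(bts_coords, key=lambda b: haversine_km(lat, lon, *bts_coords[b]))


calls = [("u1", "A", 10), ("u1", "A", 11), ("u1", "A", 12), ("u1", "B", 22)]
homes = home_bts(calls)  # A scores 3 day calls, B scores one night call worth 5
bts_coords = {"kigali": (-1.944, 30.062), "huye": (-2.597, 29.739)}
cluster_bts = voronoi_cell(-1.95, 30.06, bts_coords)
print(homes, cluster_bts)
```

For a full country, a spatial index or a dedicated Voronoi routine would be preferable to this brute-force nearest-neighbor search, which is linear in the number of BTS for every point.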

These Voronoi polygons will become the basic unit of our models, i.e. each Voronoi polygon for which we have survey data will be one data point.

B. Mobility variables (independent variables)

Given that each call can be traced to a specific BTS and that we calculate the home BTS for each user, we can compute the distance between each call made and the home of the user. Using this, we define the following variables:

- Number and proportion of calls made between 75km and 300km from home
- Number and proportion of calls made between 30km and 300km from home
- Number and proportion of calls made between 10km and 300km from home
- Mean by BTS of mean distance of calls for each user
- Number and proportion of users based at the BTS who have never made a call further than 10km from home
- Number and proportion of users based at the BTS who have never made a call further than 30km from home
- Number and proportion of users based at the BTS who have never made a call further than 75km from home
- Number of BTS used per week

C. Social variables (independent variables)

- Mean by BTS of proportion of top 5 closest ties of each user who have never made a call further than 10 km from home

Herfindahl index of contacts, Shannon entropy and Reciprocity index could be added in a further exploration.
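One of the mobility variables listed above, the proportion of users based at a BTS who have never made a call further than a given distance from home, can be computed along the following lines. This is a sketch under assumptions: the input format and function names are hypothetical, distances are great-circle distances between BTS coordinates, and the sample coordinates are made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def share_never_beyond(calls, home, coords, threshold_km):
    """calls: iterable of (user_id, bts_id); home: {user_id: home bts_id};
    coords: {bts_id: (lat, lon)}.
    Returns {home_bts: fraction of its users whose calls were all made
    within threshold_km of their home BTS}."""
    max_dist = {}
    for user, bts in calls:
        d = haversine_km(*coords[home[user]], *coords[bts])
        max_dist[user] = max(max_dist.get(user, 0.0), d)
    totals, stayers = {}, {}
    for user, d in max_dist.items():
        h = home[user]
        totals[h] = totals.get(h, 0) + 1
        stayers[h] = stayers.get(h, 0) + (d <= threshold_km)
    return {h: stayers[h] / totals[h] for h in totals}


coords = {"A": (0.0, 0.0), "B": (0.0, 0.05), "C": (0.0, 1.0)}  # B ~5.6 km, C ~111 km from A
home = {"u1": "A", "u2": "A"}
calls = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "C")]
within_10 = share_never_beyond(calls, home, coords, 10)
within_300 = share_never_beyond(calls, home, coords, 300)
print(within_10, within_300)
```

The same per-user maximum distance can be reused for the 30km and 75km thresholds, so the call records need to be scanned only once.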

D. Usage variables (independent variables)

- Number of users based on that BTS
- For home BTS and call BTS:
  o Average cost of call
  o Average of fraction of each user's calls that last:
      under 5 seconds
      under 15 seconds
      under 60 seconds
      above 300 seconds
  o Average of fraction of each user's calls that cost:
      Exactly 0
      Under 25 francs
      Under 100 francs
      Under 500 francs
      Above 500 francs

Note that Recharge and Mobile Money variables, such as average value of recharge, monthly amount sent and received using mobile money, are expected to exhibit strong predictive power, but could not be linked to geographical units and are therefore not included in the model presented in this version of the paper.

E. Socio-Economics variables (dependent variables)

From the DHS, we retain the following variables:

- Fraction of the population in the Voronoi polygon in the bottom quintile in wealth
  o Nationally
  o Using rural wealth quintiles
  o Using urban wealth quintiles
- Fraction of the population in the Voronoi polygon in the bottom two quintiles in wealth
  o Nationally
  o Using rural wealth quintiles
  o Using urban wealth quintiles
- Fraction of the population in the Voronoi polygon in the top quintile in wealth
- Fraction of the population in the Voronoi polygon in the top two quintiles in wealth
- Average median quintile of the population in the Voronoi polygon
- Fraction of the population in the Voronoi polygon with a bank account
- Fraction of the population in the Voronoi polygon with a mobile phone

This variable may also be used as an independent variable as a way to compensate for the bias towards
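The dependent variables of section E are simple shares computed per Voronoi polygon. A minimal sketch, assuming each surveyed household has already been mapped to a polygon and carries a wealth quintile from 1 (poorest) to 5 (richest); the input format and function name are hypothetical, and survey sampling weights are ignored here for simplicity.

```python
from collections import defaultdict


def quintile_shares(households, quintile):
    """households: iterable of (polygon_id, wealth_quintile in 1..5).
    Returns {polygon_id: fraction of surveyed households in `quintile`}."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for polygon, q in households:
        counts[polygon] += 1
        hits[polygon] += (q == quintile)
    return {p: hits[p] / counts[p] for p in counts}


# Made-up survey rows for illustration only.
households = [("P1", 1), ("P1", 1), ("P1", 3), ("P1", 5), ("P2", 5), ("P2", 5)]
bottom = quintile_shares(households, 1)  # share in the bottom wealth quintile
top = quintile_shares(households, 5)     # share in the top wealth quintile
print(bottom, top)
```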

21 Predictive model and results At the most granular level In the first iteration of this paper, we use multivariate linear regression models to predict the fraction of the population in each Voronoi polygon which is in the bottom and top quintile of national wealth. Given the high number of possible dependent variables (68 for about 250 data points), we use two techniques to avoid overfitting and maximize the power of the model. First, we validate our model using k-fold cross-validation. The accuracy of the model is thus never measured using the data that was used to fit the model. In other words, we are not counting the inclusion of spurious correlation into our model as real accuracy. As we add variables to our model, the R-squared on our training set logically increases, however the real accuracy of the model starts decreasing after a point, since the model starts putting too much weight on spurious correlation and decreases the weight of real correlations, as can be seen on Figure 6 and Figure 8 below. In the case of this proof of concept, we observe a rise in our out-of-sample accuracy up to the selection of the 14 th most predictive variable in the prediction of the fraction of the population in the bottom wealth quintile (henceforth poverty prediction) and up to the 20 th most predictive variable for the prediction of the fraction of the population in the top wealth quintile (henceforth wealth prediction). We thus limit our model to the selection of the respectively 14 and 20 most predictive variables, and find out-ofsample R-squared values of 16% and 64% respectively. These initial results mean that the model is very good at predicting where the top quintile in wealth live based on mobile phone data (keeping in mind that our dependent variable is noisy at the level of a Voronoi polygon since only 26 households were usually surveyed in each village). However, the model in its present form performs fairly poorly for poverty prediction. 
This could be attributed to lower cell phone penetration in lower-income areas, which would mean that mobile phones capture the behavior of too small a fraction of the population, or of too income-biased a sub-population, to enable extraction of good poverty data. We must note, however, that some key variables, such as all transactional data, including recharges and mobile money, are yet to be added.
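The variable-selection procedure described above can be sketched in a few lines. This is a minimal illustration and not the code used for the paper: the data are synthetic (250 rows and 68 candidate variables, of which only three truly matter), the function names `kfold_r2` and `forward_select` are hypothetical, and greedy forward selection is an assumed reading of "most predictive variable".

```python
import numpy as np

def kfold_r2(X, y, k=5, seed=0):
    """Out-of-sample R-squared of an OLS fit, pooled over k held-out folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    pred = np.empty(n)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        # Fit intercept + slopes on the training rows only.
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Predict the rows that were held out of the fit.
        pred[test] = np.column_stack([np.ones(len(test)), X[test]]) @ beta
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def forward_select(X, y, max_vars, k=5):
    """Greedy forward selection; returns chosen columns and the CV R-squared path."""
    chosen, path = [], []
    remaining = list(range(X.shape[1]))
    for _ in range(max_vars):
        scores = [(kfold_r2(X[:, chosen + [j]], y, k), j) for j in remaining]
        best_score, best_j = max(scores)
        chosen.append(best_j)
        remaining.remove(best_j)
        path.append(best_score)
    return chosen, path

# Synthetic stand-in for the real data (assumption): 68 candidate variables,
# 250 polygons, and an outcome driven by the first three columns plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(250, 68))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(size=250)

chosen, path = forward_select(X, y, max_vars=10)
best_size = int(np.argmax(path)) + 1  # model size with the best CV R-squared
print("selected columns:", chosen[:best_size])
print("cross-validated R-squared:", round(path[best_size - 1], 2))
```

Because the same folds are used both to choose the variables and to score them, the resulting R-squared is somewhat optimistic. A stricter variant would repeat the selection inside each training fold.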

Figure 5 - Real vs predicted poverty (R² = 16%)
Figure 6 - Evolution of R-squared with addition of new variables (dependent variable: bottom wealth quintile)
Figure 7 - Real vs predicted wealth (R² = 64%)
Figure 8 - Evolution of R-squared with addition of new variables (dependent variable: top wealth quintile)

Adding mobile phone penetration (as measured by the DHS) for poverty prediction gives us an out-of-sample R-squared of about 51%. However, this predictive power is captured almost exclusively by this one variable (48%), with the mobile phone usage variables adding very little to the poverty prediction (the R-squared only reaches 51% with 10 variables). Although the high predictive power of mobile penetration seems to defeat the purpose of the prediction, since we are taking this variable from the DHS, it can also be extracted from mobile phone data if we can associate each Voronoi polygon with a population. This is an exercise we will perform in the second draft of this paper.

Including this variable, we have demonstrated a capacity to monitor the geographical spread of the top wealth quintile at a high spatial resolution with excellent predictive power using mobile phone patterns, and of the bottom quintile with lower but significant predictive power. We expect these predictive powers to go up with the addition of transactional variables.

Figure 9 - R² including mobile penetration (poverty prediction)
Figure 10 - R² including penetration (wealth prediction)

At lower granularity levels

Although the most precise way to construct the model is at the most granular level, the most precise way to measure its power is not, since the dependent variable data is fairly noisy due to the small DHS sample size at the village level (on average 26 households per cluster). One way to reduce the noise level would be to group neighboring polygons in a way that reduces the standard error to acceptable levels but still allows the user to extract granular predictions from the map.

We therefore grouped adjacent polygons together in a way that strove to respect the geographical homogeneity of the area but included at least 3 household survey data points (corresponding to a minimum of 3 × 26 households surveyed). We ended up with 118 polygons instead of the 637 original ones, as pictured below.

Figure 11 - Initial 637 polygons
Figure 12 - Subsequent 118 polygons

However, we find no improvement in the explanatory power of the model after performing this transformation, as the gain in precision on the dependent variable is most likely offset by the loss in sample size.
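To give a sense of the precision gained by grouping, the sketch below computes the standard error of an estimated quintile share for one cluster of 26 households and for a group of three clusters. It assumes simple random sampling and a true share of 20%, so it ignores the design effect of the survey's cluster sampling. The function name `proportion_se` is hypothetical.

```python
import math

def proportion_se(p, n_households):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n_households)

p = 0.20  # assumed true share; a quintile holds 20% of households nationally
se_single = proportion_se(p, 26)       # one survey cluster
se_grouped = proportion_se(p, 3 * 26)  # three clusters grouped together

print(round(se_single, 3), round(se_grouped, 3))  # 0.078 0.045
```

Grouping three clusters divides the standard error by the square root of 3, while the number of observations available to the regression falls from about 250 to 118. This is the trade-off described above.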

Policy applications

A. Possible uses for data

The model above illustrates the predictive power, and the limits, of data already collected today by companies in Rwanda for aggregated macro-economic indicators. In this section, we focus on direct policy applications of such models and on similar applications of mobile phone metadata analysis for the government of Rwanda.

Tax collection targeting

The proof of concept presented in this paper shows a strong capacity to identify neighborhoods where citizens in the top wealth quintile reside. This information could be used to prioritize tax collection initiatives or to locate systematic tax evasion by the upper class by predicting tax revenue potential. We note that this would not be a desirable or reliable tool for identifying the tax potential of individuals, but could be useful at the neighborhood level.

Poverty targeting and poverty maps

Location is a very powerful determinant of poverty, and geographical poverty information is thus often one of the cornerstones of poverty-alleviation efforts: knowing the geographical distribution of the poor and the rich helps to ensure that antipoverty spending reaches those it was designed to reach and that leakage towards those who are not poor is limited. Beyond being a useful tool for poverty targeting, poverty mapping has a wider impact. It can inform the planning process at the subnational level and inform us about the geographical determinants of poverty. Combining geographical poverty information with other data sets, such as those relating to climate or infrastructure, may help us to better understand some of the drivers of poverty.

Although we have shown in the above proof of concept that the diffusion of mobile technology among the poor is still too low for mobile phone data to have significant predictive power, it is expected that penetration will continue to climb, reducing the bias in the data that makes prediction difficult.
In parallel, prediction models will become better as experience builds up. We therefore think that there are reasons to be optimistic about the potential for poverty mapping in the future.

Traffic and infrastructure monitoring

Mobile phone metadata can also be used to monitor mobility, including the speed at which individuals move from one BTS to another. This enables monitoring of traffic conditions and provides a way to measure the impact of infrastructure development on mobility between cities or neighborhoods. While call records, in particular for calls handed over from one BTS to the next, could be sufficient for this exercise, another mobile data source, the Visitor Location Register, may provide more accurate data since it registers location information without the need for a call to be made. This information is often not stored by mobile phone operators as of now and may present additional privacy challenges.

Immigration monitoring for infrastructure planning and emergency response

As with traffic monitoring, mobile phone metadata provides an easy way to monitor migration in near real time (a waiting period is necessary to distinguish real migration from temporary travel). This may be useful for urban planning, allowing cities to adapt faster to inflows or outflows of population. It can also be used by authorities to track population movements after a natural disaster (as was done after the earthquake in Haiti). This would help the authorities to direct emergency aid and predict the diffusion of epidemics, which frequently break out after natural disasters.

Selective dissemination of public health information

Information regarding communicable diseases can be selectively transmitted to phone numbers identified as at-risk. This could be geography-based (inform all residents of an at-risk area, or those who have recently traveled there) or social-network-based (identify communities at risk).
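As an illustration of the traffic-monitoring idea above, the following sketch estimates travel speed from the sequence of BTS that handle one user's events. It is a rough sketch under stated assumptions: the coordinates are made up, the user's position is approximated by the tower location, and the function names `haversine_km` and `segment_speeds_kmh` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def segment_speeds_kmh(events):
    """Speeds between consecutive BTS changes.

    events: list of (timestamp_seconds, bts_lat, bts_lon), sorted by time.
    Pairs with no tower change or no elapsed time are skipped.
    """
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(events, events[1:]):
        if t1 <= t0 or (la0, lo0) == (la1, lo1):
            continue
        hours = (t1 - t0) / 3600.0
        speeds.append(haversine_km(la0, lo0, la1, lo1) / hours)
    return speeds

# Hypothetical trace: two towers 0.1 degree of latitude apart, 10 minutes apart.
trace = [(0, -1.95, 30.06), (300, -1.95, 30.06), (900, -2.05, 30.06)]
speeds = segment_speeds_kmh(trace)
print([round(s, 1) for s in speeds])
```

Tower-to-tower distances overstate or understate the true path, so such speeds are better read as relative indicators (for instance before and after a road is built) than as exact measurements.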

B. Institutional arrangement

This paper aims to be a proof of concept for a new method of gathering and interpreting information on socio-economic variables, including poverty levels. The policy proposal that ensues from this proof of concept is the setup of a framework agreement between the National Institute of Statistics of Rwanda, the three main mobile phone operators in Rwanda, and possibly a neutral third-party aggregator such as the United Nations Global Pulse, allowing the sharing of anonymized and aggregated cell phone records. Three options can be imagined:

1. Government-led approach: This implies the setup of a small digital signals team within the National Institute of Statistics dedicated to running, testing and calibrating poverty level predictions based on mobile phone data to which they are given access by the mobile phone operators.

2. Operator-led approach: In this model, the mobile phone operator(s) would take the lead in aggregating and analyzing their own data sets. They would then provide the results to the National Institute of Statistics or directly to the Ministries for which the information is relevant. This poses the problem of giving mobile phone operators an incentive to provide these analytics to government agencies, and may also render the aggregation of data from several operators more difficult or less insightful. A simple solution to the first problem would be for the operators to charge the national government for this service as a client, at a very small fraction of the cost of traditional data gathering methods.

3. Third-party-led approach: This approach aims to bridge the gap between the two other approaches. A neutral third party could be entrusted to aggregate data from all operators and provide the analytical output to the relevant government agencies.
This would allow the aggregation of data from several operators and ease some of the privacy and strategic concerns and restrictions linked to sharing even anonymized and partly aggregated call records with government institutions.

The administrative or political obstacles to the implementation of this method are expected to be minimal. The main concerns are expected to lie in the privacy risks associated with sharing mobile phone records outside of the mobile operator. If the operators themselves can be convinced (e.g. compensated) to aggregate this data up to a level where it poses no more privacy or business risk, the institutional change required for this implementation is not expected to face significant political or administrative resistance.

The choice of the right option requires a balance between three different types of risk:

1. Privacy risk, or the risk of a stakeholder abusing the information it is entrusted with.

2. Strategic risk, or the risk that information strategic to the mobile phone operators, such as the location of the cell phone towers, is leaked and made available to their competitors.

3. Incompleteness risk, or the risk that the information gathered will not be as accurate as it could be because only part of the data is available (for example, data from only one operator).

Privacy poses the most serious risk when granular data is freely available to the government and the least risk when it is only accessed by the operator, which already accesses it for business purposes. Strategic risk is present any time that sensitive operator information is passed on to a third party. This includes BTS locations, number of subscribers by region, etc. Incompleteness risk is greatest when the operator aggregates all information into the relevant metrics. This incompleteness risk can be reduced by having all three operators aggregate their data and share the metrics with the government or third parties, but information is still lost in the process. The table below summarizes those risks.

                   Privacy risk   Strategic risk   Incompleteness risk
Government-led
Operator-led
Third-party led

The best balance is achieved when information is aggregated by the operator to the level of the BTS, ensuring personal user information is never shared. The government or third-party aggregator can then aggregate data from the different operators into one single geography-based data set. This setup necessitates the operators sharing the location of their BTS with the government, but this is usually already the case for purposes of urban planning and telecommunications regulation.

We note that this is a conservative scenario in which some information is lost, since links between the users of different operators cannot be made after the information is geographically aggregated. For example, the computation of network variables across all operators, such as centrality variables, would not be possible in this setup since they can only be computed based on data at the level of individual call records or aggregated user pairs.
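The two-stage aggregation described above could look roughly like the sketch below. Everything in it is illustrative: the record layout `(user_id, bts_id, duration_s)`, the operator and area labels, and the function names `aggregate_by_bts` and `merge_by_area` are assumptions, not an existing operator interface.

```python
from collections import defaultdict

def aggregate_by_bts(call_records):
    """Operator side: reduce individual records to per-BTS totals.

    User identifiers are read but never written to the output, so nothing
    personal leaves the operator.
    """
    totals = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for _user_id, bts_id, duration_s in call_records:
        totals[bts_id]["calls"] += 1
        totals[bts_id]["seconds"] += duration_s
    return dict(totals)

def merge_by_area(per_operator_totals, bts_to_area):
    """Aggregator side: combine BTS-level totals from several operators by area."""
    merged = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for operator, totals in per_operator_totals.items():
        for bts_id, stats in totals.items():
            area = bts_to_area[(operator, bts_id)]
            merged[area]["calls"] += stats["calls"]
            merged[area]["seconds"] += stats["seconds"]
    return dict(merged)

# Hypothetical records from two operators.
records_a = [("u1", "A-17", 60), ("u2", "A-17", 120), ("u1", "A-18", 30)]
records_b = [("v9", "B-03", 45)]
shared = {"op_a": aggregate_by_bts(records_a), "op_b": aggregate_by_bts(records_b)}

# Mapping from each operator's BTS to a common geographic unit (assumed known).
bts_to_area = {("op_a", "A-17"): "cell_1", ("op_a", "A-18"): "cell_2",
               ("op_b", "B-03"): "cell_1"}
by_area = merge_by_area(shared, bts_to_area)
print(by_area)
```

The output of `aggregate_by_bts` contains no user pairs, which is why cross-operator network variables such as centrality cannot be rebuilt at the aggregator's end.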

Assumptions and limitations

A. Limitations to the overall approach

1. Mobile phone penetration: Although mobile phone penetration (number of mobile phone subscriptions) in the country is estimated to be about 65% among the entire population and above 113% for the adult population, our best estimate of multi-simming adjusted adult penetration in rural areas is only 41% in 2011, the period of our mobile phone dataset. This makes it likely that the behavior of the poor is still not adequately captured in current mobile phone datasets. However, extrapolating from Africa-wide trends, this penetration rate can be conservatively inferred to have surpassed 50% in 2014 and to still be growing rapidly. We would thus likely increase the precision of our results, in particular poverty prediction, if the study were done on a current dataset, and further improve this accuracy as time goes by.

2. Changing habits and technologies: Another important point to keep in mind if this method is to be implemented by the Rwandan government or external parties such as the Global Pulse is that the mobile phone signals passively collected are changing rapidly. It is entirely possible that specific signals that are highly predictive of a socio-economic indicator today will not be so tomorrow. It is therefore important to periodically assess and calibrate the model.

3. Although we cannot as of today empirically validate our predictions other than on the clusters for which we have data, the split of our data into a training set (used to fit the model) and a test set (used to calculate the predictive power of the model) allows us to gauge how well our model does on data that was not used to construct it, thereby avoiding overfitting.

4. We have attempted to avoid the problem of cherry-picking by discussing the results of regressions for all socio-economic variables (so far only two) instead of just the most predictive ones.
We firmly believe that the low fraction of variance explained by our regression in the poverty prediction model is interesting information in itself.

B. Limitations of the current data

Many of the limitations of the approach demonstrated in this paper diminish the precision and robustness of poverty predictions with the current data, but could easily be addressed if such a prediction method were rigorously implemented through collaboration between mobile phone operators and the Government of Rwanda.

1. Market share: Our mobile phone data comes from only one operator, with a market share of approximately 40%. At this stage in our analysis, any geographic bias of the operator is not known, although it is not expected to be large since the operator is established and is pursuing a strong national coverage strategy. An ideal implementation of our method by the Government of Rwanda should include all 3 operators if possible.

2. Number of poverty observations: The number of data points used in our models depends on the availability of data for both the dependent and independent variables. While we have 637 active sites, and thus 637 Voronoi polygons, only 255 of them intersect with DHS surveys.

3. Noisy survey data: Our socio-economic variables are constructed using data from a household survey. In order to gain spatial resolution, we use the disaggregated household survey data set rather than the published data, which only gives us poverty numbers at the province level (5 in Rwanda). This allows us to work with 255 data points instead of 5, but increases the standard error of each one of these data points, since each data point is now constructed using an average of 26 randomly selected households.
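The count of polygons that contain survey data (item 2 above) can be illustrated with a short sketch. A point lies in the Voronoi polygon of the site it is closest to, so assigning each survey cluster to its nearest BTS is equivalent to a point-in-polygon test. Treating each DHS cluster as a single point in projected (planar) coordinates is an assumption made here for illustration, the coordinates are made up, and `nearest_site` is a hypothetical helper.

```python
import numpy as np

def nearest_site(points, sites):
    """Index of the closest site for each point (Euclidean distance).

    A point falls in the Voronoi polygon of site i exactly when site i is
    its nearest site, so this also gives polygon membership.
    """
    d2 = ((points[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical projected coordinates (km): three BTS sites, three survey clusters.
sites = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
clusters = np.array([[1.0, 1.0], [2.0, 0.0], [9.0, 1.0]])

owner = nearest_site(clusters, sites)
clusters_per_polygon = np.bincount(owner, minlength=len(sites))
polygons_with_data = int((clusters_per_polygon > 0).sum())
print(owner.tolist(), clusters_per_polygon.tolist(), polygons_with_data)
```

In this toy example one of the three polygons receives no survey cluster and therefore drops out of the regression sample, which is the same mechanism that reduces 637 polygons to 255 usable data points.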

Conclusion

The spread of mobile technology to very large parts of the population in the developing world creates new opportunities for data collection and remote sensing. We first briefly highlighted the many ways in which governments and private organizations already take advantage of these new data sources and the potential uses they could make of this data in the future.

We then proceeded to develop a proof of concept for the monitoring of socio-economic variables with high spatial resolution and in real time at very low cost. We find very good out-of-sample predictive power of mobile phone penetration and calling patterns when the predicted variable is the fraction of the population in a village or neighborhood in the top national wealth quintile. Prediction of the fraction in the bottom quintile is much less accurate using only mobile phone metadata at this point, given the still low penetration of mobile phones among the poor. By adding mobile phone penetration, which can also be estimated in real time, as an explanatory variable, we find much better poverty predictions.

We end by describing three different models for governments and NGOs to take advantage of the wealth of information contained in mobile phone metadata without compromising individual privacy or the strategic information of mobile phone operators. We highlight the neutral aggregator as the most appropriate one.


More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

2.1 Net enrolment ratio in primary education

2.1 Net enrolment ratio in primary education 2.1 Net enrolment ratio in primary education GOAL AND TARGET ADDRESSED Goal 2: Achieve universal primary education Target 2.A: Ensure that, by 2015, children everywhere, boys and girls alike, will be able

More information

The Challenges of Geospatial Analytics in the Era of Big Data

The Challenges of Geospatial Analytics in the Era of Big Data The Challenges of Geospatial Analytics in the Era of Big Data Dr Noordin Ahmad National Space Agency of Malaysia (ANGKASA) CITA 2015: 4-5 August 2015 Kuching, Sarawak Big datais an all-encompassing term

More information

A Proven Approach to Stress Testing Consumer Loan Portfolios

A Proven Approach to Stress Testing Consumer Loan Portfolios A Proven Approach to Stress Testing Consumer Loan Portfolios Interthinx, Inc. 2013. All rights reserved. Interthinx is a registered trademark of Verisk Analytics. No part of this publication may be reproduced,

More information

Integrated data and information management in social protection

Integrated data and information management in social protection BRIEFING Integrated data and information management in social protection Key messages > Integrating data and information management of social protection programs through a Single Registry and associated

More information

ABSTRACT. Key Words: competitive pricing, geographic proximity, hospitality, price correlations, online hotel room offers; INTRODUCTION

ABSTRACT. Key Words: competitive pricing, geographic proximity, hospitality, price correlations, online hotel room offers; INTRODUCTION Relating Competitive Pricing with Geographic Proximity for Hotel Room Offers Norbert Walchhofer Vienna University of Economics and Business Vienna, Austria e-mail: norbert.walchhofer@gmail.com ABSTRACT

More information

Executive Summary. Abstract. Heitman Analytics Conclusions:

Executive Summary. Abstract. Heitman Analytics Conclusions: Prepared By: Adam Petranovich, Economic Analyst apetranovich@heitmananlytics.com 541 868 2788 Executive Summary Abstract The purpose of this study is to provide the most accurate estimate of historical

More information

How To Help The World Coffee Sector

How To Help The World Coffee Sector ICC 105 19 Rev. 1 16 October 2012 Original: English E International Coffee Council 109 th Session 24 28 September 2012 London, United Kingdom Strategic action plan for the International Coffee Organization

More information

Mobile Data for Development

Mobile Data for Development Mobile Data for Development By Ed Naef, Philipp Muelbert, Syed Raza, Raquel Frederick Earlier this year, Cartesian released a study written in collaboration with the Financial Services for the Poor team

More information

e-health Initiative Lina Abou Mrad MBA, PMP Director, National E-Health Program Health Insight 4 -March 2014

e-health Initiative Lina Abou Mrad MBA, PMP Director, National E-Health Program Health Insight 4 -March 2014 e-health Initiative Lina Abou Mrad MBA, PMP Director, National E-Health Program Health Insight 4 -March 2014 What is E-Health? The term e-health was barely in use before 1999 Terms such as medical informatics,

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

The primary goal of this thesis was to understand how the spatial dependence of

The primary goal of this thesis was to understand how the spatial dependence of 5 General discussion 5.1 Introduction The primary goal of this thesis was to understand how the spatial dependence of consumer attitudes can be modeled, what additional benefits the recovering of spatial

More information

Concept Note and. Call for Papers

Concept Note and. Call for Papers Concept Note and Call for Papers AFRICAN ECONOMIC CONFERENCE 2015 ADDRESSING POVERTY AND INEQUALITY IN THE POST 2015 DEVELOPMENT AGENDA Kinshasa, Democratic Republic of Congo 2-4 November, 2015 1 1. Introduction

More information

Progress and prospects

Progress and prospects Ending CHILD MARRIAGE Progress and prospects UNICEF/BANA213-182/Kiron The current situation Worldwide, more than 7 million women alive today were married before their 18th birthday. More than one in three

More information

The Impact of Big Data on Social Research David Rhind Sharon Witherspoon

The Impact of Big Data on Social Research David Rhind Sharon Witherspoon The Impact of Big Data on Social Research David Rhind Sharon Witherspoon 1 www.nuffieldfoundation.org The landscape to be covered What is Big Data? Just consultants hype? Key questions for SRA Technology

More information

The Household Level Impact of Public Health Insurance. Evidence from the Urban Resident Basic Medical Insurance in China. University of Michigan

The Household Level Impact of Public Health Insurance. Evidence from the Urban Resident Basic Medical Insurance in China. University of Michigan The Household Level Impact of Public Health Insurance Evidence from the Urban Resident Basic Medical Insurance in China University of Michigan Jianlin Wang April, 2014 This research uses data from China

More information

Workshop Discussion Notes: Housing

Workshop Discussion Notes: Housing Workshop Discussion Notes: Housing Data & Civil Rights October 30, 2014 Washington, D.C. http://www.datacivilrights.org/ This document was produced based on notes taken during the Housing workshop of the

More information

FREQUENTLY ASKED QUESTIONS

FREQUENTLY ASKED QUESTIONS FREQUENTLY ASKED QUESTIONS July 2015 THE DATA REVOLUTION FAQs opendatawatch.com WHAT IS THE DATA The data revolution is about both the supply of and demand for data. Recent years have seen an exponential

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

Green Power Accounting Workshop: Concept Note For discussion during Green Power Accounting Workshop in Mexico City, May 13th 2011

Green Power Accounting Workshop: Concept Note For discussion during Green Power Accounting Workshop in Mexico City, May 13th 2011 Introduction As more companies and government organizations prepare greenhouse gas (GHG) inventories that reflect the emissions directly and indirectly associated with their operations, they increasingly

More information

On The Relationship Between Socio-Economic Factors and Cell Phone Usage

On The Relationship Between Socio-Economic Factors and Cell Phone Usage On The Relationship Between Socio-Economic Factors and Cell Phone Usage Vanessa Frias-Martinez Telefonica Research Madrid, Spain vanessa@tid.es Jesus Virsesa Telefonica Research Madrid, Spain jvjerez@tid.es

More information

Multiple Critical Illness Benefits from market needs to product solutions

Multiple Critical Illness Benefits from market needs to product solutions June 2013 Newsletter Multiple Critical Illness Benefits from market needs to product solutions Asia-Pacific experience Author Karsten de Braaf Head of Product Development Asia-Pacific R&D Centre - Disability

More information

Sauti za Wananchi Collecting national data using mobile phones

Sauti za Wananchi Collecting national data using mobile phones Sauti za Wananchi Brief No. 1 February 2013 Sauti za Wananchi Collecting national data using mobile phones Introduction Sauti za Wananchi (Voices of Citizens) is a new initiative that uses mobile phones

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

GEOGRAPHIC INFORMATION SYSTEMS (GIS): THE BEDROCK OF NG9-1-1

GEOGRAPHIC INFORMATION SYSTEMS (GIS): THE BEDROCK OF NG9-1-1 GEOGRAPHIC INFORMATION SYSTEMS (GIS): THE BEDROCK OF NG9-1-1 THE TIME IS NOW FOR PSAPS AND REGIONAL AGENCIES TO TAKE ADVANTAGE OF THE ACCURATE GEOSPATIAL DATABASES THAT WILL BE KEY TO NEXT GENERATION EMERGENCY

More information

SOCIAL MEDIA MONITORING AND SENTIMENT ANALYSIS SYSTEM

SOCIAL MEDIA MONITORING AND SENTIMENT ANALYSIS SYSTEM Kuwait National Assembly Media Department SOCIAL MEDIA MONITORING AND SENTIMENT ANALYSIS SYSTEM Dr. Salah Alnajem Associate Professor of Computational Linguistics and Natural Language Processing, Kuwait

More information

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities The first article of this series presented the capability model for business analytics that is illustrated in Figure One.

More information

Opportunities and Limitations of Big Data

Opportunities and Limitations of Big Data Opportunities and Limitations of Big Data Karl Schmedders University of Zurich and Swiss Finance Institute «Big Data: Little Ethics?» HWZ-Darden-Conference June 4, 2015 On fortune.com this morning: Apple's

More information

Statistical Sales Forecasting using SAP BPC

Statistical Sales Forecasting using SAP BPC Statistical Sales Forecasting using SAP BPC Capgemini s unique statistical sales forecasting solution integrated with SAP BPC 10.0 helps global fortune 1000 company built robust & accurate sales forecasting

More information

Applying Data Science to Sales Pipelines for Fun and Profit

Applying Data Science to Sales Pipelines for Fun and Profit Applying Data Science to Sales Pipelines for Fun and Profit Andy Twigg, CTO, C9 @lambdatwigg Abstract Machine learning is now routinely applied to many areas of industry. At C9, we apply machine learning

More information

The big data revolution

The big data revolution The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

More information

Issue Brief. Access to Capital for Women- and Minority-owned Businesses: Revisiting Key Variables. Advocacy: the voice of small business in government

Issue Brief. Access to Capital for Women- and Minority-owned Businesses: Revisiting Key Variables. Advocacy: the voice of small business in government Issue Brief Advocacy: the voice of small business in government Issue Brief Number 3 Access to Capital for Women- and Minority-owned Businesses: Revisiting Key Variables By Christine Kymn At A Glance Small

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

Discussion of Capital Injection, Monetary Policy, and Financial Accelerators

Discussion of Capital Injection, Monetary Policy, and Financial Accelerators Discussion of Capital Injection, Monetary Policy, and Financial Accelerators Karl Walentin Sveriges Riksbank 1. Background This paper is part of the large literature that takes as its starting point the

More information

CELL PHONE TRACKING. Index. Purpose. Description. Relevance for Large Scale Events. Options. Technologies. Impacts. Integration potential

CELL PHONE TRACKING. Index. Purpose. Description. Relevance for Large Scale Events. Options. Technologies. Impacts. Integration potential CELL PHONE TRACKING Index Purpose Description Relevance for Large Scale Events Options Technologies Impacts Integration potential Implementation Best Cases and Examples 1 of 10 Purpose Cell phone tracking

More information

Grand Challenges Making Drill Down Analysis of the Economy a Reality. John Haltiwanger

Grand Challenges Making Drill Down Analysis of the Economy a Reality. John Haltiwanger Grand Challenges Making Drill Down Analysis of the Economy a Reality By John Haltiwanger The vision Here is the vision. A social scientist or policy analyst (denoted analyst for short hereafter) is investigating

More information

Analytics: The Path to Business Intelligence and Decision Making

Analytics: The Path to Business Intelligence and Decision Making 1 Analytics: The Path to Business Intelligence and Decision Making According to the Harvard Business Review, the way for companies to pull ahead of the pack is to use sophisticated data collection technology

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information

Belief Formation in the Returns to Schooling

Belief Formation in the Returns to Schooling Working paper Belief Formation in the Returns to ing Jim Berry Lucas Coffman March 212 Belief Formation in the Returns to ing March 31, 212 Jim Berry Cornell University Lucas Coffman Ohio State U. & Yale

More information

TravelOAC: development of travel geodemographic classifications for England and Wales based on open data

TravelOAC: development of travel geodemographic classifications for England and Wales based on open data TravelOAC: development of travel geodemographic classifications for England and Wales based on open data Nick Bearman *1 and Alex D. Singleton 1 1 Department of Geography and Planning, School of Environmental

More information

Cell Phone Analytics: Scaling Human Behavior Studies into the Millions

Cell Phone Analytics: Scaling Human Behavior Studies into the Millions Cell Phone Analytics FRIAS-MARTINEZ, VIRSEDA Research Article Cell Phone Analytics: Scaling Human Behavior Studies into the Millions Vanessa Frias-Martinez vanessa@tid.es Researcher Telefonica Research

More information

Visualizing of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques

Visualizing of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques Visualizing of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques Robert Rohde Lead Scientist, Berkeley Earth Surface Temperature 1/15/2013 Abstract This document will provide a simple illustration

More information

Cross Validation. Dr. Thomas Jensen Expedia.com

Cross Validation. Dr. Thomas Jensen Expedia.com Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract

More information

Beef Demand: What is Driving the Market?

Beef Demand: What is Driving the Market? Beef Demand: What is Driving the Market? Ronald W. Ward Food and Economics Department University of Florida Demand is a term we here everyday. We know it is important but at the same time hard to explain.

More information

Satellite Solutions for Emergency Relief and Disaster Recovery Management. May 2009

Satellite Solutions for Emergency Relief and Disaster Recovery Management. May 2009 Satellite Solutions for Emergency Relief and Disaster Recovery Management May 2009 Introduction Disasters can occur anytime and anywhere. Whether the emergency is an act of nature or an act of man, the

More information

By CDG 450 Connectivity Special Interest Group (450 SIG)

By CDG 450 Connectivity Special Interest Group (450 SIG) Economics of 450 MHz band for the Smart Grid and Smart Metering By CDG 450 Connectivity Special Interest Group (450 SIG) September 2013 1. Introduction Alliander in The Netherlands is the first utility

More information

Target and Acquire the Multichannel Insurance Consumer

Target and Acquire the Multichannel Insurance Consumer Neustar Insights Whitepaper Target and Acquire the Multichannel Insurance Consumer Increase Conversion by Applying Real-Time Data Across Channels Contents Executive Summary 2 Are You Losing Hot Leads?

More information

BIG DATA + ANALYTICS

BIG DATA + ANALYTICS An IDC InfoBrief for SAP and Intel + USING BIG DATA + ANALYTICS TO DRIVE BUSINESS TRANSFORMATION 1 In this Study Industry IDC recently conducted a survey sponsored by SAP and Intel to discover how organizations

More information

WiMAX technology. An opportunity that can lead African Countries to the NET Economy. Annamaria Raviola SVP - Marketing and Business Development

WiMAX technology. An opportunity that can lead African Countries to the NET Economy. Annamaria Raviola SVP - Marketing and Business Development WiMAX technology An opportunity that can lead African Countries to the NET Economy Annamaria Raviola SVP - Marketing and Business Development Agenda Telecommunications in Africa: the present picture Wi-MAX:

More information

Draft WGIG Issue paper on Affordable and Universal Access

Draft WGIG Issue paper on Affordable and Universal Access Draft WGIG Issue paper on Affordable and Universal Access This paper is a 'draft working paper' reflecting the preliminary findings of the drafting team. It has been subject to review by all WGIG members,

More information

11 th World Telecommunication/ICT Indicators Symposium (WTIS-13)

11 th World Telecommunication/ICT Indicators Symposium (WTIS-13) 11 th World Telecommunication/ICT Indicators Symposium (WTIS-13) Mexico City, México, 4-6 December 2013 Contribution to WTIS-13 Document C/21-E 6 December 2013 English SOURCE: TITLE: LIRNEasia Leveraging

More information