Is big data the new oil fuelling development?
12th National Convention on Statistics
Manila, Philippines, 2 October 2013
Johannes Jütting, PARIS21
The future?
Linked data: is this the future?
Outline
1. WHAT IS BIG DATA?
2. HOW CAN IT BE USED FOR POLICY?
3. WHAT'S NEXT?
1. WHAT IS BIG DATA?
Big data
- New wealth of digital data: 90% of the world's digital data has been created in just the last two years, and the volume is doubling every 20 months.
- Qualitatively: digital translations of human actions
- Largely "data exhaust": passively collected data derived from the daily use of digital devices
- Examples: electronic transactions, social media activity, internet searches
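The doubling rate can be made concrete with a simple exponential model. This is a worked sketch, assuming the stock of digital data V(t) grows at one constant rate, that nothing is ever deleted, and that "doubling every 20 months" holds over the whole period; the symbols V_0 (today's baseline) and T (a generic doubling time) are introduced here for illustration.

```latex
\begin{aligned}
V(t) &= V_0 \, 2^{t/20}, \qquad t \text{ in months} \\
\frac{V(t)}{V(t-24)} &= 2^{24/20} = 2^{1.2} \approx 2.30 \\
\text{share created in the last 24 months} &= 1 - 2^{-1.2} \approx 0.56 \\
1 - 2^{-24/T} = 0.9 \quad &\Longrightarrow \quad T = \frac{24}{\log_2 10} \approx 7.2 \text{ months}
\end{aligned}
```

Under these assumptions the two figures describe different growth rates: a 20-month doubling time means roughly 56% of today's data is less than two years old, while a 90% share corresponds to a doubling time of about 7 months. Both imply that the stock more than doubles every two years.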
Big data
Big data comes in a number of forms (variety):
- Data exhaust: collected passively from devices (phones, credit cards, web searches, etc.) as sensors of human behaviour
- Online information (blogs, tweets, news articles, etc.): sensors of human sentiment
- Physical sensors (pollution, light emission, etc.): remote sensors of human activity
- Citizen reporting: information actively produced via phone surveys, hotlines, etc.
Big data isn't...
- Open data, though big data from governments and businesses is one form of data that has been included in open data protocols. Key challenge: anonymization.
- Limited to developed countries, thanks to the rapid spread of technology such as mobile phones
- Always big; it is also different data
- A clear-cut concept! There is a lot of confusion.
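Anonymization is flagged as the key challenge for sharing such data. One common first step is pseudonymization: replacing direct identifiers (such as phone numbers in call records) with a keyed hash, so records can still be linked without revealing the raw value. The sketch below uses HMAC-SHA256 from Python's standard library; the key handling, function names (`pseudonymize`, `pseudonymize_records`) and record fields are hypothetical and for illustration only.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key held only by the data owner (e.g. a mobile operator).
# If analysts had the key, they could re-identify people by hashing candidate numbers.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex digest).

    Under one key the same identifier always maps to the same token,
    so records can still be linked, but the raw value is not exposed.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, key: bytes = SECRET_KEY):
    """Return copies of the records with the 'caller' field replaced by a token."""
    out = []
    for rec in records:
        cleaned = dict(rec)  # copy, so the input records are left unchanged
        cleaned["caller"] = pseudonymize(rec["caller"], key)
        out.append(cleaned)
    return out


if __name__ == "__main__":
    # Hypothetical call-detail records; field names are illustrative only.
    cdrs = [
        {"caller": "+63 912 345 6789", "cell_tower": "T17", "hour": 9},
        {"caller": "+63 912 345 6789", "cell_tower": "T04", "hour": 18},
        {"caller": "+63 998 765 4321", "cell_tower": "T17", "hour": 10},
    ]
    for row in pseudonymize_records(cdrs):
        print(row["caller"][:12], row["cell_tower"], row["hour"])
```

This is pseudonymization, not full anonymization: the pattern of towers and times attached to a token can still single out individuals, so shared datasets are usually also aggregated (for example, counts per tower per hour).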
2. HOW CAN IT BE USED FOR POLICY?
3 main areas of application
Early warning systems
- Health: detecting outbreaks/spikes based on internet search trends
- Food security: detecting price hikes based on social media or text messages
Monitoring of trends
- Economy: social media activity as a proxy for prices/inflation
- Socioeconomic indicators: filling in data gaps with cell phone usage trends
3 main areas of application (cont'd)
- Migration: tracking large-scale movement with satellite imagery
- Unemployment: nowcasting agricultural unemployment based on Google Trends analysis
Evaluation
- Disaster relief: detecting the return of electricity using satellite imagery or cell phone logs
- Public awareness: measuring changes in public awareness via social media and internet searches
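Nowcasting means estimating the current value of an indicator before the official figure is published. The slide does not describe the model used for unemployment; the sketch below assumes the simplest possible version, an ordinary least squares line fitted on past months where both a search index and the official rate exist, then applied to this month's search index. The function names and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("search index has no variation; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def nowcast(search_history, official_history, search_now):
    """Estimate the not-yet-published official figure for the current period
    from the current search index, using a line fitted on past periods for
    which both the search index and the official figure are available."""
    a, b = fit_line(search_history, official_history)
    return a + b * search_now


if __name__ == "__main__":
    # Hypothetical monthly data: a search index for job-related queries and
    # the official agricultural unemployment rate (%) published later.
    search = [40, 45, 50, 55, 60]
    unemployment = [5.0, 5.5, 6.0, 6.5, 7.0]
    # This month's search index is already known; the official rate is not.
    print(f"nowcast: {nowcast(search, unemployment, 65):.1f}%")  # prints "nowcast: 7.5%"
```

A single linear term keeps the idea visible; real nowcasting models usually add the indicator's own past values and seasonal effects, and should be re-checked each time the official figure is released.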
4 main areas
- Technology: advanced computing, machine learning
- Training: specialized skills in data mining and analysis
- Legal: ability to anonymize data
- Knowledge of context: ability to relate trends and/or anomalies in big data to a development objective
3. WHAT'S NEXT?
Big data seen as relevant by most statisticians
- PARIS21 conducted a (non-representative) big data survey of statisticians (early 2013)
- 70 responses, 35% from Asia & Pacific
54% are talking about big data in their institutions
94% feel big data can supplement national statistics
78% said big data should supplement national statistics
79% think big data will play an important or central role in their job over the next 15-20 years
Consensus on main challenges
Big data and Post-2015 Development Agenda
The revolution in information technology over the last decade provides an opportunity to strengthen data and statistics for accountability and decision-making purposes. There have been innovative initiatives to use mobile technology and other advances to enable real-time monitoring of development results. But this movement remains largely disconnected from the traditional statistics community at both global and national levels. The post-2015 process needs to bring them together and start now to improve development data.
Oil for revolutions?
- Crude: investments needed
- Complementary: not the same, but faster!
- Data AND institutions: NSOs' (national statistical offices') key role
- It's political: revolution versus digital divide
Need for more pilots
- Beginning of a data revolution
- More evidence, tests and pilots needed
- Trial and error
- Philippines
For more information: www.paris21.org, contact@paris21.org
Early warning systems
- Researchers at Johns Hopkins University tracked Google Flu Trends data along with the frequency of flu-related hospital visits, and were the first to show a strong correlation between Google searches on the flu and emergency room activity.
- Traditional reports based on admissions data, clinical symptoms and lab results take weeks to process.
- Potential application: a surveillance model to predict spikes in flu-like cases, allowing hospitals to allocate additional resources (staff, space).
- Data limitation: the data cannot distinguish between actual flu cases, people with flu-like symptoms, and people who are just curious about the flu.
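The surveillance idea on this slide needs a rule for deciding when a weekly value counts as a "spike". The sketch below assumes a simple trailing-window z-score rule: flag a week whose value exceeds the mean of the previous few weeks by more than a chosen number of standard deviations. It detects spikes as they occur rather than forecasting them; the function name, window, threshold and data are hypothetical.

```python
from statistics import mean, stdev


def detect_spikes(series, window=8, threshold=3.0):
    """Return (index, z) pairs where a value exceeds the mean of the previous
    `window` observations by more than `threshold` sample standard deviations.

    series:    e.g. weekly counts of flu-related searches or ED visits
    window:    number of preceding observations used as the baseline (>= 2)
    threshold: z-score above which a value is flagged as a spike
    """
    alerts = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)  # sample standard deviation of the baseline
        if sigma == 0:
            continue  # flat baseline: z-score undefined, so no alert is raised
        z = (series[i] - mu) / sigma
        if z > threshold:
            alerts.append((i, z))
    return alerts


if __name__ == "__main__":
    # Hypothetical weekly search-volume index; the jump starts at week 10.
    weekly = [20, 22, 19, 21, 23, 20, 22, 21, 24, 22, 60, 75, 30]
    for week, z in detect_spikes(weekly):
        print(f"week {week}: z = {z:.1f}")
```

On the sample series this flags weeks 10 and 11. Once a spike enters the baseline window it inflates the standard deviation, so a sustained outbreak can be under-flagged; operational systems generally also account for seasonality.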
Figure: Correlation between Google Flu Trends and the hospital-wide rate of influenza
Source: Rothman, R. et al. (2012), "Google Flu Trends: Correlation With Emergency Department Influenza Rates and Crowding Metrics", Clinical Infectious Diseases, Oxford University Press. Available online at http://www.pacercenter.org/media/19059/cid.cir883.full.pdf.
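The "strong correlation" reported in this study is a statement about how closely two weekly series move together. A minimal sketch of the standard Pearson correlation coefficient is below; the series names and values are hypothetical and are not the study's data, and the study's own statistical methods may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical weekly values: a search-based flu index and the share of
    # emergency department visits with a confirmed influenza diagnosis.
    flu_index = [1.2, 1.5, 2.1, 3.4, 4.8, 4.1, 2.9, 1.8]
    ed_flu_rate = [0.8, 1.0, 1.6, 2.7, 3.9, 3.5, 2.2, 1.3]
    print(f"r = {pearson(flu_index, ed_flu_rate):.2f}")
```

A value of r close to 1 means the two series rise and fall together; it does not by itself resolve the limitation noted above, since searches by curious but healthy people would raise both the index and the apparent correlation during publicized outbreaks.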
Monitoring inflation using social media
- UN Global Pulse and Crimson Hexagon monitored tweets associated with, among other things, the price of rice in Indonesia.
- This revealed a close correlation between changes in the frequency of tweets about the price of rice and actual food price inflation.
- Potential application: real-time monitoring of price inflation
- Data limitation: limited sample (digital divide, youth dominance, etc.); the specific culture of Twitter
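The finding here concerns *changes* in tweet frequency rather than raw counts, and its value for real-time monitoring depends on whether tweets move before the official figures. The slide does not describe the method actually used; the sketch below assumes a simple approach: convert monthly tweet counts to percentage changes, then correlate them with inflation at several lags. All names and numbers are hypothetical.

```python
from math import sqrt


def pct_change(series):
    """Period-over-period percentage change; output is one element shorter."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(series, series[1:])]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)


def lagged_correlations(leading, target, max_lag=3):
    """Correlate leading[t] with target[t + k] for k = 0..max_lag.

    Both series must have the same length. A peak at some k > 0 suggests
    that movements in `leading` (e.g. tweet volume) precede movements in
    `target` (e.g. food price inflation) by k periods.
    """
    if len(leading) != len(target):
        raise ValueError("series must have equal length")
    return {k: pearson(leading[: len(leading) - k], target[k:])
            for k in range(max_lag + 1)}


if __name__ == "__main__":
    # Hypothetical monthly counts of tweets mentioning the price of rice,
    # and monthly food price inflation (%) over the same months.
    tweets = [1200, 1350, 1300, 1600, 2100, 1900, 1700, 1650, 1800, 2300]
    inflation = [0.4, 0.5, 0.9, 0.7, 1.1, 1.8, 1.4, 1.0, 1.0, 1.3]
    tweet_growth = pct_change(tweets)  # 9 values, one per month from month 2
    inflation_now = inflation[1:]      # drop month 1 so both series align
    for lag, r in lagged_correlations(tweet_growth, inflation_now).items():
        print(f"lag {lag}: r = {r:.2f}")
```

Each extra lag shortens the overlapping sample, so with short series the correlations at larger lags rest on very few points; the sampling limitations on the slide (digital divide, youth dominance) apply on top of this.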