Conference by STATEC and EUROSTAT
Savoir pour agir: la statistique publique au service des citoyens (Knowledge for action: public statistics at the service of citizens)
Big data in the European Statistical System
Michail SKALIOTIS, EUROSTAT, Head of Task Force 'Big Data'
Datafication, digital footprint, sensors
Proclamation of Pope Benedict, 2005
Proclamation of Pope Francis, 2013
African proverb: "When the music changes, so does the dance."
If we fail to listen, we will be out of step! (Denise Lievesley)
Big data @ ESS: key points
ESS (European Statistical System) Scheveningen Memorandum, September 2013:
- Examine the potential of big data sources for official statistics
- Official statistics big data strategy as part of wider government strategy
- Address privacy and data protection
- Collaboration at European and global level
- Address need for skills
- Partnerships between different stakeholders (government, academics, private sector)
- Developments in methodology, quality assessment and IT
- Adopt action plan and roadmap for the ESS
Big data @ ESS: key points
- ESS (European Statistical System) Scheveningen Memorandum, Sep 2013
- Task Force Big Data
- Big Data Action Plan and Roadmap 1.0, Sept. 2014
- ESS Pilots 2016-2019
- Implementation of ESS Vision 2020: Big Data project integral part of the portfolio
- European Commission Communication "Towards a thriving data-driven economy"
- Public Private Partnership on big data
- International cooperation (UNSD, UNECE, etc.)
Areas in the big data roadmap
- Policy
- Quality framework
- Skills
- Experience sharing
- Legislation
- IT infrastructures
- Methods
- Ethics / Communication
- Pilots
Challenges
- Cooperation, sharing of know-how
- Development of a sound methodology ("from design-based to model-based approach")
- Exploration & tentative implementation
Action (example)
- Pilot projects, carried out by the Member States (ESSnet)
- 2015-2019 (FPA / SGA construction)
- Exploring different big data sources (but also IT architecture, partnerships), developing generic guidelines and frameworks
- Enable the ESS to gradually integrate big data sources into the production of European and national statistics?
Challenges
- New skills for NSI staff: statisticians vs. data scientists?
- Computing capacity, hardware?
- Analytical tools, software?
- Storage?
Action (example)
- Training program for European statisticians (ESTP)
- In the coming years: dedicated courses on big data
- Focus on big data sources and on big data tools
- Acquiring the skills needed to assess sources and their quality, and the skills to use tools and explore big data sources
Challenges
- Integrating official statistics in big data strategies
- Getting access to data & continuity of access
- Data security & privacy concerns
- Pay for data?
Action (example)
- Project on the analysis of legislation and strategy (but also ethics and communication)
- 2015-2017 (22 months)
- Analysis for EU and for Member States at national level
- See also the Feasibility study on the use of mobile positioning data for tourism statistics (report on feasibility of access)
Challenges
- Transversal challenges to all big data activities: quality and ethics & communication
- Big data vs. statistics: "goodness of fit" (concepts, representativeness, ...)
- Impact of privacy and security concerns on public opinion?
Action (example)
- Cooperation with UN (lead) on a quality framework for big data
- Project on the analysis of ethics and communication (but also legislation and strategy)
- 2015-2017 (22 months)
- Analysis for EU and for Member States at national level
Big data = multiple sources & multiple outputs
[Diagram linking sources to outputs. Sources: mobile phone data, satellite images, smart meters, VGI websites. Outputs: tourism, commuting, population, traffic and migration statistics.]
Statistical domains
- Tourism
- Employment
- Population
- Migration
- Balance of payments
- Regional and GIS
- Transport
- ICT usage
- Prices and inflation
- Land use
- Agriculture
National initiatives as a driver
- CBS Netherlands
- ISTAT Italy
- ONS UK
- CSO Ireland
- Statistics Finland
- SURS Slovenia
Insights for world heritage sites from Wikipedia use
Source
- Hourly page views for each Wikipedia article
- Content of Wikipedia articles
- High timeliness, temporal detail and transparency; no geographical information
Processing
- Big Data Sandbox: computer cluster with 4 nodes
- Tools: Pig, Map-Reduce, Python, R
- Association of Wikipedia articles to specific world heritage sites (WHS)
Output
- Exposure of world heritage via Wikipedia
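As an illustration of the processing step, the following is a minimal sketch of how hourly page views could be rolled up to daily totals per world heritage site once articles have been associated with sites. It is not the pipeline run on the Big Data Sandbox (which used Pig and Map-Reduce); the mapping `ARTICLE_TO_WHS`, the function `aggregate_views` and the sample records are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical association of Wikipedia article titles with world heritage sites.
ARTICLE_TO_WHS = {
    "Acropolis_of_Athens": "Acropolis, Athens",
    "Parthenon": "Acropolis, Athens",
    "Stonehenge": "Stonehenge, Avebury and Associated Sites",
}


def aggregate_views(hourly_records, article_to_whs):
    """Sum hourly page views into (site, day) totals.

    hourly_records: iterable of (article, "YYYY-MM-DDTHH", views) tuples.
    Articles that are not associated with any site are skipped.
    """
    totals = defaultdict(int)
    for article, hour_stamp, views in hourly_records:
        site = article_to_whs.get(article)
        if site is None:
            continue  # article is not linked to a world heritage site
        day = hour_stamp[:10]  # keep only the date part of the timestamp
        totals[(site, day)] += views
    return dict(totals)


# Made-up sample records, for illustration only.
sample = [
    ("Acropolis_of_Athens", "2015-01-01T00", 120),
    ("Parthenon", "2015-01-01T01", 80),
    ("Stonehenge", "2015-01-01T00", 50),
    ("Main_Page", "2015-01-01T00", 99999),
    ("Parthenon", "2015-01-02T13", 30),
]
result = aggregate_views(sample, ARTICLE_TO_WHS)
for (site, day), views in sorted(result.items()):
    print(site, day, views)
```

Several articles can map to the same site, so the exposure of a site is the sum over all of its associated articles; the same grouping logic carries over to a Map-Reduce job, with the site and day as the key.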
Insights for world heritage sites from Wikipedia use Page views of English Wikipedia articles related to World Heritage Sites
Nowcasting unemployment
Source
- Google Trends (others to be explored)
- High timeliness, geo info available, low transparency
Processing
- Low computing power required
- Time-series modelling (machine learning to be explored)
- Tools: R
Output
- Nowcasting of unemployment from 1 month lag to current time
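To make the idea of nowcasting concrete, here is a minimal sketch in Python. It swaps in a plain least-squares regression of the unemployment rate on a search-interest index for the time-series model mentioned on the slide, and it is not the model used in the pilot (which was built in R). The functions `fit_ols` and `nowcast` and all figures are hypothetical, chosen only to show the mechanics: fit on months where the official figure is already published, then predict the current month from the search index that is available immediately.

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def nowcast(intercept, slope, x_now):
    """Predict the target for the current period from the timely indicator."""
    return intercept + slope * x_now


# Made-up data for illustration only: a monthly search-interest index and the
# published unemployment rate (%) for the same past months.
search_index = [40, 45, 50, 55, 60]
unemployment = [8.0, 8.5, 9.0, 9.5, 10.0]

intercept, slope = fit_ols(search_index, unemployment)

# The search index for the current month is already known, while the official
# unemployment figure for that month is not yet published.
current_index = 65
estimate = nowcast(intercept, slope, current_index)
print(round(estimate, 2))
```

A real application would need to handle seasonality, autocorrelation and revisions of the search index, which is why the slide lists time-series modelling and, as a further option, machine learning.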
The statistical office of the future
- Data flows in addition to surveys and censuses
- Embedded in data flows: statistics 'everywhere'
- Product designers in addition to data collection designers
- Statistical modelling will be a major activity
- From descriptive indicators to nowcasting (and forecasting)
- Trust and quality will be key
- New role in teaching digital literacy
- Accreditation and certification instead of pure production
- Address issues linked to quality & transparency, privacy & confidentiality, access to third party data sources & data sharing, scientific standards & methodology, professional ethics, skills, ...
The NSI of the future: official statistics in a full-fledged IoT world
Svein Nordbotten: Use of electronically observed data in official statistics
Thank you for your attention!