Big data for official statistics



Similar documents
Modernization of European Official Statistics through Big Data methodologies and best practices: ESS Big Data Event Roma 2014

WHAT DOES BIG DATA MEAN FOR OFFICIAL STATISTICS?

big data in the European Statistical System

The use of Big Data for statistics

Will big data transform official statistics?

Big Data (and official statistics) *

HLG - Big Data Sandbox for Statistical Production

RATIONALISING DATA COLLECTION: AUTOMATED DATA COLLECTION FROM ENTERPRISES

ESS event: Big Data in Official Statistics

Report of the 2015 Big Data Survey. Prepared by United Nations Statistics Division

D&B Data Manager Your Data Management process in the Cloud. Transparent, Complete & Up-To-Date Master Data

Big data coming soon... to an NSI near you. John Dunne. Central Statistics Office (CSO), Ireland

Dynamic M2M Event Processing Complex Event Processing and OSGi on Java Embedded

New Frontiers for Official Statistics

Use of Mobile Positioning Data for Tourism Statistics

United Nations Global Working Group on Big Data for Official Statistics Task Team on Cross-Cutting Issues

Annex: Concept Note. Big Data for Policy, Development and Official Statistics New York, 22 February 2013

22 nd Meeting of the European Statistical System Committee

BIG DATA + ANALYTICS

/ WHITEPAPER / THE BIMODAL IT

PDF PREVIEW EMERGING TECHNOLOGIES. Applying Technologies for Social Media Data Analysis

OpenAIRE Research Data Management Briefing paper

Using Predictive Maintenance to Approach Zero Downtime

BANKING ON CUSTOMER BEHAVIOR

Inside Track Research Note. In association with. Hyper-Scale Data Management. An open source-based approach to Software Defined Storage

Benefits Realization from IS & IT, and Change Management of roles and the working practices of individuals and teams.

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN LIBRARY AND INFORMATION MANAGEMENT (MSc[LIM])

BIG DATA AND ANALYTICS

Item rd International Transport Forum. Big Data to monitor air and maritime transport. Paris, March 2016

Strategy for data collection

M2M Analytics: A New Wave of Innovation

Integrated Data Collection System on business surveys in Statistics Portugal

The Cyber Threat Profiler

The structured application of advanced logging techniques for SystemVerilog testbench debug and analysis. By Bindesh Patel and Amanda Hsiao.

ICT Perspectives on Big Data: Well Sorted Materials

Probes and Big Data: Opportunities and Challenges

The future of your fleet begins with connect:

Big Data for Social Good. Nuria Oliver, PhD Scientific Director User, Data and Media Intelligence Telefonica Research

Sales forecasting with SAS Advanced Analytics for the Pharmaceutical sector. A business case

EL Program: Smart Manufacturing Systems Design and Analysis

Are You Big Data Ready?

LEGISLATIVE ACTS AND OTHER INSTRUMENTS Subject : Council Resolution on the implementation of the eeurope 2005 Action Plan

How To Use Big Data Effectively

A. Background. In this Communication we can read:

On the use of Honeypots for Detecting Cyber Attacks on Industrial Control Networks

M2M Communications and Internet of Things for Smart Cities. Soumya Kanti Datta Mobile Communications Dept.

Faculty of Applied Sciences. Bachelor s degree programme. Nanobiology. Integrating Physics with Biomedicine

KEY GUIDE. The key stages of financial planning

How To Improve Information Security

Service Performance Management: Pragmatic Approach by Jim Lochran

QUEEN S UNIVERSITY BELFAST. e-learning and Distance Learning Policy

New technologies applied to Carrier Monitoring Software Systems. Juan Carlos Sánchez Integrasys S.A.

E-COMMERCE - TOWARD AN INTERNATIONAL DEFINITION AND INTERNATIONALLY COMPARABLE STATISTICAL INDICATORS

Big Data andofficial Statistics Experiences at Statistics Netherlands

Developing a successful Big Data strategy. Using Big Data to improve business outcomes

DATA COLLECTION IN THE STATISTICAL OFFICE OF THE REPUBLIC OF SLOVENIA (SURS)

Machine Data Analytics with Sumo Logic

Data for the Public Good. The Government Statistical Service Data Strategy

KEY GUIDE. The key stages of financial planning

estatistik.core: COLLECTING RAW DATA FROM ERP SYSTEMS

DEGREE PROGRAMME IN INFORMATION TECHNOLOGY

Action Plan towards Open Access to Publications

Principles and Guidelines on Confidentiality Aspects of Data Integration Undertaken for Statistical or Related Research Purposes

STAGE 1 COMPETENCY STANDARD FOR ENGINEERING ASSOCIATE

Case Study: ICICI BANK INTERNAL AUDIT DEPARTMENT PENTANA AUDIT WORK SYSTEM IMPLEMENTATION

COMMISSION OF THE EUROPEAN COMMUNITIES COMMUNICATION FROM THE COMMISSION TO THE COUNCIL AND THE EUROPEAN PARLIAMENT

Keywords: big data, official statistics, quality, Wikipedia page views, AIS.

Sensing, monitoring and actuating on the UNderwater world through a federated Research InfraStructure Extending the Future Internet SUNRISE

BIG DATA: IT MAY BE BIG BUT IS IT SMART?

Cyber intelligence exchange in business environment : a battle for trust and data

Big Data Strategy Issues Paper

A SOA visualisation for the Business

SYSTEMS, CONTROL AND MECHATRONICS

Meeting TeTech. Version: 1.8, 15-July-2013, Author: Wim Telkamp, language: English

Guidelines for the implementation of quality assurance frameworks for international and supranational organizations compiling statistics

Cloud computing for maintenance of railway signalling systems

UNCLASSIFIED. Briefing to Critical Infrastructure Sector Organizations on the Canadian Cyber Incident Response Centre (CCIRC)

Transcription:

Big data for official statistics Strategies and some initial European applications Martin Karlberg and Michail Skaliotis, Eurostat 27 September 2013 Seminar on Statistical Data Collection WP 30 1

Big Data what is it? Volume Velocity Variety (social network data, RFID sensor data, satellite image data, business system data ) Generally Organic, i.e. not designed (in the survey design sense) by official statisticians Sometimes Open, i.e. freely available to anybody online (but sometimes proprietary) 27 September 2013 Seminar on Statistical Data Collection 2

Overview 1. Big data and data science 2. Some recent international initiatives 3. Actual applications: Internet as a Data Source (ICT statistics) Mobile positioning data (tourism statistics) Price collection via the Internet (price statistics) 4. Conclusions 27 September 2013 Seminar on Statistical Data Collection 3

1. Big Data and data science Data science vs. statistics multidisciplinary extracting meaning from data (mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing) Area dominated by computer scientists Official statisticians behind on the learning curve Who could become a data scientist? Trained statisticians with programming skills? IT specialists lacking training in statistics? Learning by doing is key 27 September 2013 Seminar on Statistical Data Collection 4

2. Big Data and official statistics some recent international initiatives Big Data Task Team (coordinated by UNECE) project proposal (http://www1.unece.org/stat/platform/display/collection/draft+hlg+project+proposal+on+big+data): identify, examine and provide guidance regarding main strategic and methodological issues demonstrate the feasibility of efficient production of both novel products and mainstream official statistics using Big Data sources facilitate the sharing across organisations of knowledge, expertise, tools and methods for the production of statistics using Big Data sources. Scheveningen memorandum (DGINS) 27 September 2013 Seminar on Statistical Data Collection 5

3. Learning by doing is key The time has come to demystify Big Data No data science skills without practice! Emphasised in BD Task Team project proposal Relevant activities launched by Eurostat in the domain (details provided in our paper): Internet as a Data Source (ICT statistics) Mobile positioning data (tourism statistics) Price collection via the Internet (price statistics) 27 September 2013 Seminar on Statistical Data Collection 6

Using the Internet for the collection of (information society) statistics Current situation: Traditional surveys to individuals and enterprises Need: rapidly evolving phenomena (moving target) particularly true for ICT Opportunity: digital footprint available by definition Study commissioned by Eurostat: analysis of methodologies for using the Internet for the collection of ICT and other statistics 27 September 2013 Seminar on Statistical Data Collection 7

IaD study for ICT statistics Approach: Internet as a data source (IaD) User-centric, network-centric, site centric Feasibility study: Which items could be collected over the Internet? Collection mode: Households: Respondents asked to download monitoring program ; transmissions to NSIs Enterprises: Data harvested from enterprise websites (2 nd phase: complemented with monitoring program or server log files) 27 September 2013 Seminar on Statistical Data Collection 8

IaD study (cont.) Going beyond ICT statistics: use of federated open data for official statistics: evaluation of Big Data repositories regarding their - usefulness as a supplementary source for traditional official statistics - capacity of replacing official statistical indicators study steps: - Identify Big Data sources - Assess potential - Assess practical feasibility - Analyse conditions for opening the Big Data sources 27 September 2013 Seminar on Statistical Data Collection 9

Use of mobile positioning data for tourism statistics Example: Anonymised signal data from one Mobile Network Operator (MNO) recalculation of the sample of signal data to a number of people (based on calibration) domestic tourism: based on home anchor point, according to repeated occurrence in the same cell (night-time) International tourism: based on roaming data 27 September 2013 Seminar on Statistical Data Collection 10

Use of mobile positioning data for tourism statistics (cont.) Meets a need: a more accurate and in-depth data source new aspects and indicators for describing the time-space behaviour of people a highly quantitative data source better time-space insights reduced costs possibility to register trips to sparsely populated locations such as natural parks 27 September 2013 Seminar on Statistical Data Collection 11

Collecting prices on the internet Potential: Increased volume of e-commerce Prices available in the public domain High frequency automated collection possible Feasibility studies (Statistics Netherlands): Generic (country, product) software modules (open source license) Collection via internet robots 27 September 2013 Seminar on Statistical Data Collection 12

Issues Data provision: Ethical and image issues Technical issues Continuity issues Quality: Representativity issues Comparability vs. relevance 27 September 2013 Seminar on Statistical Data Collection 13

Data: ethical, image & legal issues ICT monitoring tool : Similarity with phishing? (Outweighed by response burden reduction?) Mobile positioning data: Individuals monitored (personal data protection?) MNO concerns: cost (actual, opportunity cost) risks (strain on real-time systems; business secrets divulged) Price collection: Enterprises explicitly prohibiting bots combing their websites 27 September 2013 Seminar on Statistical Data Collection 14

Data: Technical issues ICT monitoring tool : Multiple operating systems, various configurations of each system (correlated with survey variables ) Mobile positioning data: High-frequency data taxing on the real-time system on MNOs Standardisation needed (but easier ) Price collection: Multiple e-commerce site designs 27 September 2013 Seminar on Statistical Data Collection 15

Data: Business continuity issues ICT monitoring tool : Evolving operating systems Mobile positioning data: If basis for official statistics: legislation inevitable? Price collection: Evolving webshops If basis for official statistics: legislation necessary? 27 September 2013 Seminar on Statistical Data Collection 16

Representativity and comparability New concepts, new collection mode time series break inevitable One person one device? One company one website? Cell phone penetration a non-issue? Online prices different from offline prices! Possible solutions: Calibrate using traditional statistics Parallel measurement (for bridging the gap) Accept break on grounds of increased relevance 27 September 2013 Seminar on Statistical Data Collection 17

Lessons learned so far Promising results but issues remain Trade-off between price/volume and representativity/quality however, big data alternative not always cheaper but worse : Non-response vs. Big Data representativity Narrow scope vs. Big Data granularity The variety is hard to tackle (obviously, more structured Big Data are easier to analyse) Increased ICT, cell phone and e-commerce use increased scope, relevance (and acceptance?) 27 September 2013 Seminar on Statistical Data Collection 18

4. The way forward Official statistics community now on the move! Collaboration Big Data goes beyond statistics authorities national/ government strategies needed Public-Private Partnerships (could help build-up) Multidisicplinary teams including legal expertise Action Demystify data science build up skills Applications-driven approach learn by doing 27 September 2013 Seminar on Statistical Data Collection 19