Big Data as a data source for official statistics




Piet Daas, Marco Puts, Bart Buelens and Paul van den Hurk, Statistics Netherlands

Overview: data sources and statistics; more and more data becomes available; the effect on statistics production; how we study Big Data, with 2 examples: traffic loop detection data and social media messages.

Introduction: Statistics Netherlands produced about 5000 official publications and tables in 2012. For this we need DATA.

Data sources for official statistics: primary data (our own surveys), secondary data (administrative sources) and data from others (new data sources).

Statistics Netherlands law: Statistics Netherlands aims to reduce the administrative burden on companies and the public as much as possible by (re-)using existing administrative registrations of both government and government-funded organizations, and by studying potential new sources of information.

Data, data everywhere!

Statistics Netherlands and data: data is generated in increasing amounts and at increasing frequencies. We are moving from data scarcity (sample surveys) to data abundance (administrative sources and Big Data). Ever-increasing amounts of data need to be checked, processed and analyzed, and more sources of information become available. This creates opportunities to produce statistics faster ("real-time statistics"), but also a need for new methods and tools:
1. Methods to quickly uncover information from the massive amounts of data available, such as visualisation methods and data, text and stream mining techniques ("making Big Data small"), and high-performance computing.
2. Methods capable of integrating this information into the statistical process, e.g. linking at massive scale, macro/meso-integration, and estimation methods suited to large datasets.
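One simple form of "making Big Data small" is aggregation: many fine-grained records are summed into far fewer coarser ones. The sketch below assumes a hypothetical record layout of (loop_id, minute_of_day, vehicle_count) tuples, which is not the actual format of the loop data, and sums per-minute counts into 5-minute blocks per loop, cutting the number of records by up to a factor of five.

```python
from collections import defaultdict


def aggregate_to_blocks(records, block_minutes=5):
    """Sum per-minute vehicle counts into fixed-size time blocks per loop.

    records: iterable of (loop_id, minute_of_day, vehicle_count) tuples
    (a hypothetical layout chosen for illustration).
    Returns a dict mapping (loop_id, block_index) to the summed count.
    """
    totals = defaultdict(int)
    for loop_id, minute, count in records:
        # Minutes 0-4 fall in block 0, minutes 5-9 in block 1, and so on.
        totals[(loop_id, minute // block_minutes)] += count
    return dict(totals)


records = [("A", 0, 3), ("A", 1, 2), ("A", 5, 4), ("B", 2, 1)]
print(aggregate_to_blocks(records))
```

Here four minute-level records shrink to three block totals; on a full day of data from more than 10,000 loops the reduction is much larger.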

Two Big Data case studies: research findings from studying Big Data sources from a statistical point of view.
1. Traffic loop detection data: 80 million records/day, 90 days studied so far, number of vehicles detected each minute.
2. Dutch social media messages: 1 to 2 million public messages/day, up to 2 billion records studied, content and sentiment.

1. Traffic loop detection data. Traffic loops: every minute (24/7), the number of passing vehicles is counted by more than 10,000 road sensors and cameras in the Netherlands, both in total and in different length classes. This is an interesting source for producing traffic and transport statistics (and more). It involves huge amounts of data, about 100 million records a day. [Map: loop locations]

Number of detected vehicles on a single day, summed over all loops: total ≈ 295 million.

Traffic loop detection activity (first 10 minutes only)

Correcting for missing data: the data were corrected in blocks of 5 minutes. Before correction the daily total is about 295 million vehicles; after correction it is about 330 million (+12%).
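The slides do not state how missing minutes within a 5-minute block were filled in. As an illustration only, the sketch below assumes a simple rule: each missing minute is imputed with the mean of the observed minutes in the same block, and a block with no observations is left missing. The function name `correct_block` is hypothetical. As a sanity check on the reported figures, going from about 295 to about 330 million adds 35 million vehicles, an increase of 35/295 ≈ 11.9%, which rounds to the 12% on the slide.

```python
def correct_block(minute_counts, block_minutes=5):
    """Estimate a block total when some per-minute counts are missing.

    minute_counts: list of length block_minutes; None marks a missing minute.
    Assumed rule (not necessarily the one used by Statistics Netherlands):
    each missing minute gets the mean of the observed minutes in the block.
    Returns None when every minute in the block is missing.
    """
    observed = [c for c in minute_counts if c is not None]
    if not observed:
        return None
    mean = sum(observed) / len(observed)
    missing = block_minutes - len(observed)
    return sum(observed) + mean * missing


print(correct_block([10, None, 12, None, 8]))
```

Scaling a partially observed block up to its full length is the simplest choice; a production method would more likely borrow strength from neighbouring blocks or nearby loops.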

Total vehicles during the day (snapshots)

For different vehicle lengths: loops report counts in 1, 3 or 5 length categories.
- 1 category: total only
- 3 categories: <= 5.6 m; > 5.6 m and <= 12.2 m; > 12.2 m
- 5 categories: > 1.85 m and <= 2.4 m; > 2.4 m and <= 5.6 m; > 5.6 m and <= 11.5 m; > 11.5 m and <= 12.2 m; > 12.2 m
These are harmonized into small vehicles (<= 5.6 m), medium-sized vehicles (> 5.6 m and <= 12.2 m) and large vehicles (> 12.2 m).
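Because the class boundaries of the 3- and 5-category schemes both line up at 5.6 m and 12.2 m, each reported class falls entirely inside one harmonized class. The sketch below assumes a hypothetical input format, a list of (upper_bound_m, count) pairs with `None` for the open-ended top class, and sums the counts into small/medium/large. Loops that report only a total cannot be split and yield `None`.

```python
SMALL_MAX_M = 5.6    # upper bound of "small vehicles"
MEDIUM_MAX_M = 12.2  # upper bound of "medium-sized vehicles"


def harmonize(class_counts):
    """Map counts per reported length class to small/medium/large.

    class_counts: list of (upper_bound_m, count) pairs (hypothetical format);
    upper_bound_m is None for the open-ended top class (> 12.2 m).
    Returns None for loops that only report a single total.
    """
    if len(class_counts) == 1:
        return None
    result = {"small": 0, "medium": 0, "large": 0}
    for upper, count in class_counts:
        if upper is not None and upper <= SMALL_MAX_M:
            result["small"] += count
        elif upper is not None and upper <= MEDIUM_MAX_M:
            result["medium"] += count
        else:
            result["large"] += count
    return result


five = [(2.4, 50), (5.6, 20), (11.5, 5), (12.2, 1), (None, 4)]
three = [(5.6, 70), (12.2, 6), (None, 4)]
print(harmonize(five), harmonize(three))
```

Both example loops map to the same harmonized counts, which is what makes series from differently configured loops comparable.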

Small vehicles: about 75% of the total

Small and medium-sized vehicles

Small, medium-sized and large vehicles

Volatile behaviour at the micro level

2. Social media messages. The Dutch are very active on social media platforms. Smartphones are almost always carried and almost always switched on, and more and more people have one! A possible information source for: which topics are current (the number of messages and the sentiment about them). Usable as a measuring instrument for: [Map by Eric Fischer (via Fast Company)]

2. Social media messages. The Dutch are very active on social media platforms. Potential information source for: the topics discussed and the sentiment about these topics (quickly available!), and probably more. We investigated the source to determine its potential use:
2a. Content: a selection of 12 million collected Dutch Twitter messages was studied.
2b. Sentiment: sentiment in all ~2 billion Dutch social media messages was studied.

Social media: Dutch Twitter topics in 12 million messages [Figure: share of messages per topic; topic labels not recoverable; shares shown: 3%, 7%, 3%, 10%, 7%, 3%, 5%, 46%]

Sentiment in social media: access to the Coosto database, with more than 2 billion publicly available messages from Twitter, Facebook, Hyves, web forums, blogs, etc. The sentiment of each message is classified as positive, negative or neutral. Interesting finding: determining the so-called "mood of the nation" and comparing it with the consumer confidence indicator of Statistics Netherlands.

Consumer confidence, survey data: (positive - negative) as % of the total. Sentiment towards the economic climate, ~1000 respondents/month.
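The slides do not describe how the social media series was aggregated. A minimal sketch, assuming messages are already labelled positive, negative or neutral, applies the same (positive - negative) as % of total formula used for consumer confidence to each period, and then measures agreement between the two monthly series with a Pearson correlation. The function names and the toy input are hypothetical; no actual results are reproduced here.

```python
import math


def sentiment_indicator(labels):
    """(positive - negative) as a percentage of all messages in one period.

    labels: list of "positive", "negative" or "neutral" strings (assumed
    output of an upstream sentiment classifier).
    """
    pos = labels.count("positive")
    neg = labels.count("negative")
    return 100.0 * (pos - neg) / len(labels)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy example: one period with 1 positive, 2 negative and 1 neutral message.
print(sentiment_indicator(["positive", "negative", "neutral", "negative"]))
```

Using the same formula for both series keeps them on a comparable scale; the correlation then says how closely the monthly movements track each other, not whether they measure the same concept.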

Final remarks: Big Data and statistics.
Preparing Big Data for statistics is time-consuming; the exploration phase in particular takes a lot of time. Try to reduce the amount of data without losing information ("making Big Data small", noise reduction). Risk: garbage in, garbage statistics out.
The traditional approach does not suffice. Big Data sources are definitely not large sample surveys or administrative data: often a large but selective part of the population is included, and events are registered, not units! Be careful with traditional statistical analysis (everything is significant!).
There is more need for: visualisation methods (to gain insight rapidly); methods and models specific to large datasets (fast and robust); learning from computational statistics and (trying to) use dedicated hardware.
Beware of privacy issues!
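Why "everything is significant" with Big Data can be seen from the standard test for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)): for a fixed, practically negligible correlation, t grows with the square root of n. The sketch below uses a normal approximation to the t distribution for the two-sided p-value, which is reasonable for the sample sizes shown; the function name is hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: correlation = 0.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) and a normal approximation
    to the t distribution (adequate for large n).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


# The same tiny correlation, r = 0.01, at survey-like and Big Data-like sizes.
for n in (100, 1_000_000):
    print(n, correlation_p_value(0.01, n))
```

With r = 0.01 the p-value is about 0.92 for n = 100 but far below 1e-20 for n = 1,000,000, so with Big Data the question shifts from "is the effect significant?" to "is it large enough to matter?".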

The future of Statistics Netherlands?