Big Data, Official Statistics and Social Science Research: Emerging Data Challenges




Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Professor Paul Cheung Director, United Nations Statistics Division

Building the Global Information System
- Elements of a Global Information System: Common Standard, Data Exchange Protocol, Quality Assurance Mechanism, Universal Dissemination Platform, Global Governance Arrangement
- Working with National Statistical Offices to evolve a global statistical system: many achievements over 65 years
- Now working with National Geospatial Information Authorities to evolve a global geospatial information platform with common practices and standards
- Imperative to bring these two communities, and other data communities, together to advance an integrated system

Big Data: A BIG Deal?
[Figure: Google search trend, 2004 to 2012, comparing the search terms big data and official statistics.]
Source: Google Trends (as of 18 December 2012)

What is Big Data?
- No fixed definition, still debated
- Unstructured, unregulated
- Four Vs:
  - Volume: from Terabyte to Geopbyte
  - Velocity: high speed of data in and out
  - Variety: different formats, integration difficult
  - Variability: data flows highly inconsistent
- Complexity: requires data cleansing, linking, and matching the data across systems

Multiple Sources of Data: Everything!
- Social: networking, commenting
- Internet uses: online searches, online page-view
- Administrative: hospital visits, sales receipts, traffic monitoring
- Commercial: cell phone usages, credit card transactions, insurance records, product searches
- Health information: electronic medical records, medical monitoring
- Satellite imagery
- Monitoring systems

Google: Predicting the Present Source: Predicting the Present with Google Trends, Choi & Varian, April 2009

Hedonometrics and Twitter Source: Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter, Dodds et al., 2011

National Mood (UK) and Twitter
[Figure: normalized mood scores for JOY, SADNESS, ANGER and FEAR.]
Source: Mood of Nation [Beta] (http://geopatterns.enm.bris.ac.uk/mood/)

Over 1,000,000 outpatient visits per year by MHC Asia
Source: http://www.mhcasia.com/managedcare/
A. ONE THOUSAND CLINICS in Singapore
B. Adopted by 90% of insurers in Singapore
C. Linked by Web & Smartphone Apps
D. Smartphone Apps: virtual membership card & clinic locator
1. Reports: Diagnosis, Financial & Statistical Data
2. Disease pattern & management
3. Infectious Disease Alert
4. Cost Control
5. Drug usage data lead to bulk purchase
6. Sick Leave control
7. Audit & fraud detection
8. Email Alerts (High Claim, Sick Leave Alert)

Electronic Road Pricing (Singapore)

Electronic Road Pricing (Singapore) Source: Interactive map ERP, http://www.onemotoring.com.sg

Big Data: Everywhere, Anywhere
- The amount of data grows rapidly (approximately 2.5 quintillion bytes created per day)
- Everything will be, in some sense, a geospatial beacon, referencing or generating location information
- A hyper-connected environment: estimates suggest over 50 billion things connected by 2020

Real-time Tracking of Population Movement
[Figure panels: Regular; July 4 Macy's fireworks. Hypothetical data.]

Big Data: Are They Really Useful?
- A lot of hype, but used mainly in commercial and security applications
- Research and development work is ongoing, with great potential
- Commercial applications are developing the fastest:
  - Detecting fraud / risk
  - Generating consumer profiles
  - Reducing medical care costs
  - Changing travelling and consumption patterns

New Data, New Methods
- Data deluge makes scientific methods obsolete?
- Official statistics depends on classical statistical methods?
- Are social science data models and methods obsolete?

Big Data vs Official Statistics
Official Statistics are Structured Data with Unique Identity
- Population Characteristics: Population Census; Census Questionnaire; Statistical Analysis
- Company Profits/Losses: Survey of Companies; Company Balance Sheet; Statistical Analysis

Big Data and Social Sciences Research

Statistical vs Structural Inference

Incorporating Big Data in Official Statistics
- Could Big Data replace traditional data sources?
  - Not a reliable source at this moment
  - Limitations (non-representativeness, unreliability)
  - Important as corroborating evidence
  - Huge potential: faster, cheaper data
- New data sources could replace traditional sources?
- Data mining with multiple sources of data for new insights

Improving Data Sources in Official Statistics
- A lot of work has been done in official statistics: Common Standard, Data Exchange Protocol, Quality Assurance Mechanism, Universal Dissemination Platform
- New emphasis on data sources:
  - Multi-mode data collection
  - Internet-based surveys
  - Administrative sources
- Too much emphasis on surveys and traditional approaches
- Imperative to review the appropriateness of Big Data and assess its fitness for purpose in official statistics

University of Michigan Consumer Sentiment Index: Google Prediction
[Figure panels: Consumer Sentiment Index; Current Economic Condition Index; Consumer Expectations Index.]
Source: Consumer Sentiment with Google Trends, Choi, Google Inc., Conference on Empirical Macroeconomics Using Geographical Data, March 2011

Predicting Consumer Sentiment Index
[Figure: monthly series, Jan-04 to May-12. Series: Consumer Sentiment Index; Google Search - Starbucks Franchise.]

Google Trend and Unemployment Rate Source: Consumer Sentiment with Google Trends, Choi, Google Inc. Conference on Empirical Macroeconomics Using Geographical Data, March 2011

Predicting Insurance Claims
[Figure: annual axis, 2004 to 2012. Series: Initial claim of Unemployment Insurance; Google search unemployment+social security+welfare.]
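One simple way to check whether a search series leads an official series, as the chart above suggests, is a lagged correlation. The sketch below is not the model used in the cited Google work; the function names and all numbers are hypothetical and only illustrate the idea.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(search, claims, lag):
    """Correlate search[t] with claims[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(search, claims)
    return pearson(search[:-lag], claims[lag:])

# Hypothetical data: claims follow the search index one period later.
search = [10, 30, 20, 50, 40, 60]
claims = [15000, 10000, 30000, 20000, 50000, 40000]

same_period = lagged_correlation(search, claims, 0)
one_period_lead = lagged_correlation(search, claims, 1)
```

In this made-up series the correlation at lag 1 is higher than at lag 0, which is the pattern one would look for before treating searches as an early indicator.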

The Billion Prices Project @ MIT
- Pricing Behavior: What drives price stickiness around the world? How much can be explained by current inflation and inflation histories? How much by competition and industry structure?
- Daily Inflation and Asset Prices: Construct daily inflation indexes across countries and sectors and study their ability to match official statistics.
- Pass-Through: How much do prices adjust internally when the exchange rate or the international price of commodities changes?
- Markups: What premium is paid in stores for green or organic products? With data from multinational retailers, compute premium differences, for exactly the same items, in different places.
Source: The Billion Prices Project @ MIT, http://bpp.mit.edu/
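To make "construct daily inflation indexes" concrete, here is a minimal sketch of a chained geometric-mean (Jevons-type) index over matched items. The slide does not say which formula the project uses, so the formula, the function name and the prices are assumptions for illustration only.

```python
import math

def daily_chained_index(prices_by_day, base=100.0):
    """Chain a geometric-mean price index over consecutive days.

    prices_by_day: list of dicts mapping item id -> price, one dict per day.
    Only items observed on both consecutive days enter each daily link.
    """
    index = [base]
    for prev, curr in zip(prices_by_day, prices_by_day[1:]):
        matched = [item for item in curr if item in prev]
        if not matched:
            index.append(index[-1])  # no matched items: carry the level forward
            continue
        mean_log_rel = sum(math.log(curr[i] / prev[i]) for i in matched) / len(matched)
        index.append(index[-1] * math.exp(mean_log_rel))
    return index

# Hypothetical online prices for three items over three days.
days = [
    {"milk": 1.00, "bread": 2.00, "rice": 4.00},
    {"milk": 1.10, "bread": 2.00, "rice": 4.00},
    {"milk": 1.10, "bread": 2.20},  # rice was not observed on day 3
]
series = daily_chained_index(days)
```

Matching items day by day is one way to cope with products that appear and disappear online; comparing the resulting series with an official consumer price index is the validation step the slide describes.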

Argentina Aggregate Inflation Series Source: www.pricestats.com/arindex.html

Mobile Phone Positioning Data for Tourism Statistics Source: Mobile Telephones and Mobile Positioning data as source for statistics: Estonian Experiences, Ahas et al. (2011)

Intuit Small Business Employment Indexes Source: http://index.intuit.com/

Big Data as Data Source for Research
- Traditional Data on Social Network: snow-ball approach, from person to person; rich information on inter-personal relations
- Big Data on Social Network: large number of people and connections
Source: Reality Mining, http://reality.media.mit.edu/soc.php

Real-time Community Crime Data Source: https://www.crimereports.com/

Big Data and Representativeness
- What is the population?
- Who generates the data?
- Can we draw a sample and infer population traits?
- Patterns may reflect what is happening, but the reference population is not clear
- Inferential statistics not possible; hence the use of non-parametric analytics
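The representativeness problem can be illustrated with a small, fully hypothetical example: a platform over-represents one group, so the naive average is biased. Reweighting to known population shares (post-stratification, a technique brought in here for illustration, not taken from the slides) corrects it, but only when the reference population is known, which is exactly what the slide says is often unclear.

```python
def naive_mean(records):
    """Unweighted mean of the outcome over all platform users."""
    return sum(value for _, value in records) / len(records)

def poststratified_mean(records, population_shares):
    """Weight each group's mean by its known share in the target population.

    Raises KeyError if a population group has no records at all.
    """
    totals, counts = {}, {}
    for group, value in records:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return sum(share * totals[g] / counts[g] for g, share in population_shares.items())

# Hypothetical platform data: 80 young users (60% say yes), 20 older users (20% say yes).
records = (
    [("young", 1)] * 48 + [("young", 0)] * 32
    + [("old", 1)] * 4 + [("old", 0)] * 16
)
# Assumed census shares for the same two groups.
shares = {"young": 0.3, "old": 0.7}

biased = naive_mean(records)
adjusted = poststratified_mean(records, shares)
```

Here the naive estimate is 0.52 while the reweighted estimate is 0.32; without the census shares there would be no way to tell how far off the platform figure is.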

Big Data: Who Generates the Data? Representative?
[Figure: Demographics of Twitter Users.]
Source: The State of Twitter 2012 [STATS], 3 August 2012

Big Data and Social Reality
- Does Big Data reflect social reality?
- Do the data reveal random or real patterns?
- Are the data representative?
- What is the real meaning of the data?
- Do the data reflect social patterns or structures?
- An example: social network study
  - Articulated social networks: list of friends on Facebook
  - Behavioural network: communication patterns and cell coordinates

Big Data and Verifiability
- Can the data be verified and re-tested?
- Many big data sets are considered private and are not available to the larger academic community for repeated analysis
- Equal data access is needed for:
  - Making scientific replication studies
  - Preventing fraudulent publications

Big Data and Confidentiality
- Confidentiality is a big issue; traditional anonymization might not work well
- Geocoding statistical information creates new concerns
- Sharing continuous time and cell phone location information from a city is a problem
- Google privacy policy update (1 March 2012): linking a person via multiple Google products; collecting across platforms information on health, political opinions and financial concerns
- Demands for precise, location-based information push the boundary of confidentiality
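A rough way to see why precise location data strains confidentiality is to count how many people share each combination of place and time. The sketch below uses a k-anonymity style check, which is an assumption made for illustration and not a method named in the slides; the records and field layout are hypothetical.

```python
from collections import Counter

def smallest_group(records, key):
    """Size of the smallest group of records sharing the same key.

    A result of 1 means at least one person is unique on that key.
    """
    counts = Counter(key(record) for record in records)
    return min(counts.values())

# Hypothetical pings: (user, cell id, hour of day).
pings = [
    ("u1", "A1", 8), ("u2", "A1", 8), ("u3", "A2", 8),
    ("u4", "A2", 9), ("u5", "B1", 9), ("u6", "B1", 9),
]

# Fine-grained release: exact cell and hour.
k_fine = smallest_group(pings, key=lambda r: (r[1], r[2]))
# Coarser release: only the first letter of the cell id (a larger zone), no hour.
k_coarse = smallest_group(pings, key=lambda r: r[1][0])
```

With exact cell and hour, some users are unique (k = 1); coarsening to zones raises k to 2 at the cost of detail. For continuous location traces even coarsened data can remain identifying, which fits the slide's warning that traditional anonymization might not work well.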

New types of research data about human behavior and society pose many opportunities if crucial infrastructural challenges are tackled. G King Science 2011;331:719-721

Using Big Data in Social Science
New tools and procedures required for:
- Data preparation/cleaning
- Data reduction
- Data mining
  - Searching for patterns and/or relationships
  - Building the best model
  - Applying the best model to a new dataset to classify or estimate (machine learning)
- How/what to teach the machine?
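The steps listed above can be sketched end to end on toy data. The nearest-centroid classifier below is only one possible stand-in for "the best model"; every function name and value is hypothetical.

```python
def clean(rows):
    """Data preparation: drop rows that contain a missing feature value."""
    return [(x, label) for x, label in rows if None not in x]

def reduce_features(x, keep):
    """Data reduction: keep only the feature positions listed in `keep`."""
    return [x[i] for i in keep]

def fit_centroids(rows):
    """Model building: average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, label in rows:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Model application: assign the class whose centroid is closest."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], x))
    return min(centroids, key=squared_distance)

# Hypothetical labelled data; the third feature is treated as noise.
raw = [
    ([1.0, 0.9, 5.0], "low"), ([1.2, 1.1, 3.0], "low"), ([None, 1.0, 4.0], "low"),
    ([4.0, 4.2, 5.0], "high"), ([3.8, 4.1, 2.0], "high"),
]
keep = [0, 1]
training = [(reduce_features(x, keep), label) for x, label in clean(raw)]
model = fit_centroids(training)

# Classify records from a new dataset using the same feature reduction.
first = predict(model, reduce_features([1.0, 1.2, 9.0], keep))
second = predict(model, reduce_features([4.1, 3.9, 0.0], keep))
```

The question "how/what to teach the machine?" shows up here as the choice of labels, of which features to keep, and of the distance measure.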

Big Data and Computational Challenge
Computational challenge:
- Generating manageable structured data from unstructured data
- Integrating big data processing with statistical analysis tools
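As a small illustration of both points, the sketch below turns free-text messages into structured records and then hands them to a standard statistical routine. The message format, the pattern and the field names are all hypothetical; real unstructured data would need far more robust parsing.

```python
import re
from statistics import mean

# Assumed message shape: "$<amount> for <item> in <City>".
PATTERN = re.compile(
    r"\$(?P<amount>\d+(?:\.\d{2})?)\s+for\s+(?P<item>\w+)\s+in\s+(?P<city>[A-Z][a-z]+)"
)

def to_records(messages):
    """Extract one structured record per message that matches the pattern."""
    records = []
    for text in messages:
        match = PATTERN.search(text)
        if match:
            records.append({
                "item": match["item"],
                "city": match["city"],
                "amount": float(match["amount"]),
            })
    return records

messages = [
    "Paid $4.50 for coffee in Boston today",
    "lol nothing to see here",
    "just spent $12 for lunch in Austin",
]
records = to_records(messages)
average_amount = mean(r["amount"] for r in records)
```

Messages that do not match are silently dropped, which is itself a source of the non-representativeness discussed earlier.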

Learning to Use Big Data
Training required:
- Nonstandard data types
- Computational methods
- Protection of data confidentiality
- Legal protocols
- Data sharing norms
- Statistical tools

The Way Forward
- Big Data will become more prominent in years to come.
- Statisticians and social scientists should take advantage of new data sources.
- Computational and quantitative analytical skills become important.
- Data must generate insights and knowledge: this is the ultimate goal.
- We must decipher truth vs falsehood.