1 Big Data, Official Statistics and Social Science Research: Emerging Data Challenges. Professor Paul Cheung, Director, United Nations Statistics Division
2 Building the Global Information System. Elements of a global information system: common standards, a data exchange protocol, a quality assurance mechanism, a universal dissemination platform and a global governance arrangement. Working with National Statistical Offices to evolve a global statistical system: many achievements over 65 years. Now working with National Geospatial Information Authorities to evolve a global geospatial information platform with common practices and standards. It is imperative to bring these two communities, and other data communities, together to advance an integrated system.
3 Big Data: A BIG Deal? [Figure: Google search trends for "big data" and "official statistics"] Source: Google Trends (as of 18 December 2012)
4 What is Big Data? There is no fixed definition, and it is still debated. Big data is unstructured and unregulated. The four Vs, plus complexity: Volume: from terabytes to geopbytes. Velocity: high speed of data in and out. Variety: different formats, which make integration difficult. Variability: data flows can be highly inconsistent. Complexity: requires data cleansing, linking and matching of the data across systems.
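The complexity item above (cleansing, linking and matching data across systems) can be made concrete with a minimal deterministic record-linkage sketch. It assumes two hypothetical sources, such as hospital and insurer records, that share a name and a date of birth. The field names `id`, `name` and `dob`, and the normalization rules, are illustrative assumptions and not a method described in this presentation. Production linkage usually adds probabilistic matching for records that differ slightly.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Cleanse a name: strip accents, turn punctuation into spaces, lower-case, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(letters.lower().split())


def link_records(source_a, source_b):
    """Match records from two systems on (normalized name, date of birth).

    Each source is a list of dicts with hypothetical keys 'id', 'name' and 'dob'.
    Returns (id_in_a, id_in_b) pairs; keys matching several records in source_b are skipped.
    """
    index_b = {}
    for rec in source_b:
        key = (normalize_name(rec["name"]), rec["dob"])
        index_b.setdefault(key, []).append(rec["id"])

    links = []
    for rec in source_a:
        key = (normalize_name(rec["name"]), rec["dob"])
        candidates = index_b.get(key, [])
        if len(candidates) == 1:  # link only unambiguous matches
            links.append((rec["id"], candidates[0]))
    return links
```

This is a deliberately conservative choice. A key that matches several records in the second source is left for review rather than guessed.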
5 Multiple Sources of Data: Everything! Social: networking, commenting. Internet use: online searches, online page views. Administrative: hospital visits, sales receipts, traffic monitoring. Commercial: cell phone usage, credit card transactions, insurance records, product searches. Health information: electronic medical records, medical monitoring. Satellite imagery. Monitoring systems.
6 Google: Predicting the Present Source: Predicting the Present with Google Trends, Choi & Varian, April 2009
7 Hedonometrics and Twitter Source: Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter, Dodds et al., 2011
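Dodds et al. measure the happiness of text by averaging crowd-sourced happiness ratings of individual words, weighted by how often each word appears. Their labMT list rates roughly 10,000 words on a 1 to 9 scale. They can also ignore near-neutral words. The following is a minimal sketch of that averaging step. The `RATINGS` values are made up for illustration and are not the labMT scores, and tokenization is simplified to whitespace splitting.

```python
from collections import Counter

# Hypothetical happiness ratings on a 1-9 scale (5 = neutral); NOT the real labMT values.
RATINGS = {"love": 8.4, "happy": 8.3, "rain": 5.1, "the": 5.0, "sad": 2.4, "hate": 2.3}


def average_happiness(text: str, ratings: dict, delta_h: float = 0.0):
    """Frequency-weighted average happiness of the rated words in `text`.

    Words whose rating lies strictly within delta_h of the neutral value 5 are
    excluded. Returns None if no rated word remains.
    """
    counts = Counter(text.lower().split())  # simplistic whitespace tokenization
    total_weight = 0
    weighted_sum = 0.0
    for word, freq in counts.items():
        h = ratings.get(word)
        if h is None or abs(h - 5.0) < delta_h:
            continue  # unrated, or too close to neutral
        weighted_sum += h * freq
        total_weight += freq
    return weighted_sum / total_weight if total_weight else None
```

Setting `delta_h=1.0` drops words rated between 4 and 6. This sharpens the signal at the cost of using fewer words.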
8 National Mood (UK) and Twitter [Figure: normalized mood scores for JOY, SADNESS, ANGER and FEAR over time] Source: Mood of Nation [Beta] (http://geopatterns.enm.bris.ac.uk/mood/)
9 Over 1,000,000 Outpatient Visits per Year by MHC Asia. (A) One thousand clinics in Singapore; (B) adopted by 90% of insurers in Singapore; (C) linked by web and smartphone apps; (D) smartphone apps providing a virtual membership card and clinic locator. Uses of the data: (1) reports on diagnosis, financial and statistical data; (2) disease patterns and management; (3) infectious disease alerts; (4) cost control; (5) drug usage data leading to bulk purchases; (6) sick leave control; (7) audit and fraud detection; (8) alerts (high claims, sick leave).
12 Big Data: Everywhere, Anywhere. The amount of data grows rapidly: approximately 2.5 quintillion bytes are created per day. Everything will be, in some sense, a geospatial beacon, referencing or generating location information. It is a hyper-connected environment; estimates suggest over 50 billion things will be connected by 2020.
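The daily figure above can be put in more familiar units with a back-of-the-envelope conversion. This uses the short-scale quintillion (10^18) and decimal prefixes (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes). It also assumes, purely for the arithmetic, that the daily rate stays constant over a year.

```latex
\begin{aligned}
2.5 \times 10^{18}\ \text{bytes/day} &= 2.5\ \text{EB/day},\\
2.5\ \text{EB/day} \times 365\ \text{days} &= 912.5\ \text{EB/year} \approx 0.91\ \text{ZB/year}.
\end{aligned}
```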
13 Real-time Tracking of Population Movement [Figure: a regular day compared with July 4 (Macy's fireworks); hypothetical data]
14 Big Data: Are They Really Useful? There is a lot of hype, but big data is used mainly in commercial and security applications. Research and development work is ongoing, with great potential. Commercial applications are developing the fastest: detecting fraud and risk, generating consumer profiles, reducing medical care costs, and changing travel and consumption patterns.
15 New Data, New Methods. Does the data deluge make scientific methods obsolete? Official statistics depends on classical statistical methods; will these still suffice? Are social science data models and methods obsolete?
16 Big Data vs Official Statistics. Official statistics are structured data with a unique identity. For example, a population census collects census questionnaires, which statistical analysis turns into population characteristics; a survey of companies collects company balance sheets, which statistical analysis turns into company profits/losses.
17 Big Data and Social Sciences Research
18 Statistical vs Structural Inference
19 Incorporating Big Data in Official Statistics. Could big data replace traditional data sources? It is not a reliable source at this moment, given its limitations (non-representativeness, unreliability), but it is important as corroborating evidence. Huge potential: faster, cheaper data. Could new data sources replace traditional sources? Data mining with multiple sources of data can yield new insights.
20 Improving Data Sources in Official Statistics. A lot of work has been done in official statistics on common standards, a data exchange protocol, a quality assurance mechanism and a universal dissemination platform. New emphasis on data sources: multi-mode data collection, Internet-based surveys and administrative sources. There has been too much emphasis on surveys and traditional approaches. It is imperative to review the appropriateness of big data and assess its fitness for purpose in official statistics.
21 University of Michigan Consumer Sentiment Index: Google Prediction [Figure panels: Consumer Sentiment Index; Current Economic Condition Index; Consumer Expectations Index] Source: Consumer Sentiment with Google Trends, Choi, Google Inc., Conference on Empirical Macroeconomics Using Geographical Data, March 2011
23 Google Trends and the Unemployment Rate Source: Consumer Sentiment with Google Trends, Choi, Google Inc., Conference on Empirical Macroeconomics Using Geographical Data, March 2011
24 Predicting Insurance Claims [Figure: initial claims for unemployment insurance compared with Google searches for "unemployment + social security + welfare"]
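Choi and Varian's approach, roughly, is to fit a simple autoregressive baseline to the official series. They then test whether adding a contemporaneous Google Trends index improves the fit. The sketch below compares y[t] = b0 + b1*y[t-1] with y[t] = b0 + b1*y[t-1] + b2*x[t] using ordinary least squares in plain Python. The function names are hypothetical. The published models, which use logs and seasonal lags, are not reproduced exactly.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def fit_nowcast(official, trends, use_trends=True):
    """Fit y_t = b0 + b1*y_{t-1} (+ b2*x_t); return (coefficients, sum of squared errors).

    `official` is the official series (e.g. initial claims) and `trends` a search
    index of the same length; both are hypothetical inputs.
    """
    rows, target = [], []
    for t in range(1, len(official)):
        row = [1.0, official[t - 1]]
        if use_trends:
            row.append(trends[t])
        rows.append(row)
        target.append(official[t])
    beta = ols(rows, target)
    sse = sum((yt - sum(coef * v for coef, v in zip(beta, row))) ** 2
              for row, yt in zip(rows, target))
    return beta, sse
```

An extra regressor can never increase the in-sample sum of squared errors. A fair comparison would therefore look at out-of-sample nowcast errors.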
25 The Billion Prices Project (MIT). Pricing behavior: what drives price stickiness around the world? How much can be explained by current inflation and inflation histories, and how much by competition and industry structure? Daily inflation and asset prices: construct daily inflation indexes across countries and sectors and study their ability to match official statistics. Pass-through: how much do prices adjust internally when the exchange rate or the international price of commodities changes? Markups: what premium is paid in stores for green or organic products? With data from multinational retailers, compute premium differences for exactly the same items in different places.
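The Billion Prices Project builds daily price indexes from prices scraped from online retailers. The function below is a minimal sketch of that idea. It computes an unweighted chained Jevons index: the geometric mean of day-to-day price changes for products observed on both days. This generic index was chosen for illustration. The project's actual methodology, with category weights and rules for products entering and leaving, is not reproduced here.

```python
import math


def chained_jevons_index(daily_prices, base=100.0):
    """Build a daily price index from scraped prices.

    `daily_prices` is a list (one entry per day, in order) of dicts mapping a
    product ID to its price that day. Between consecutive days the index moves
    by the geometric mean of price relatives p_t / p_{t-1} over products seen
    on both days. Returns the list of index levels, starting at `base`.
    """
    levels = [base]
    for prev, curr in zip(daily_prices, daily_prices[1:]):
        common = [pid for pid in curr if pid in prev]
        if not common:
            levels.append(levels[-1])  # no overlapping products: carry the index forward
            continue
        mean_log_relative = sum(math.log(curr[pid] / prev[pid]) for pid in common) / len(common)
        levels.append(levels[-1] * math.exp(mean_log_relative))
    return levels
```

Chaining lets the basket change from day to day as products appear and disappear online. Fixed-basket indexes used in official CPIs do not allow this.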
26 Argentina Aggregate Inflation Series Source:
27 Mobile Phone Positioning Data for Tourism Statistics Source: Mobile Telephones and Mobile Positioning data as source for statistics: Estonian Experiences, Ahas et al. (2011)
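Mobile positioning studies such as the Estonian one can rely on roaming records. A foreign phone using a domestic network reveals its home country through the mobile country code (MCC) of its SIM. The following is a minimal sketch of the basic counting step, assuming records of (anonymized device ID, MCC, date). The data layout is hypothetical. Real tourism statistics also need rules for separating visitors from residents and transit traffic.

```python
from collections import defaultdict

# Mobile country codes (MCC) identify the home network of a SIM card.
# 248 is Estonia; the foreign codes below are a small illustrative subset.
HOME_MCC = "248"
MCC_TO_COUNTRY = {"244": "Finland", "247": "Latvia", "250": "Russia"}


def count_foreign_visitors(records, home_mcc=HOME_MCC):
    """Count distinct roaming (foreign) devices per (date, origin country).

    `records` is an iterable of (device_id, mcc, date) tuples; device IDs are
    assumed to be anonymized already. Domestic SIMs are excluded and unknown
    MCCs are grouped as 'Other'.
    """
    devices = defaultdict(set)
    for device_id, mcc, date in records:
        if mcc == home_mcc:
            continue
        country = MCC_TO_COUNTRY.get(mcc, "Other")
        devices[(date, country)].add(device_id)  # a set counts each phone once per day
    return {key: len(ids) for key, ids in devices.items()}
```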
28 Source: Intuit Small Business Employment Indexes
29 Big Data as a Data Source for Research. Traditional data on social networks: a snowball approach, from person to person, yielding rich information on interpersonal relations. Big data on social networks: a large number of people and connections. Source: Reality Mining
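The traditional snowball approach described above can be sketched as a breadth-first walk over referrals. Seed respondents name contacts, who are then interviewed and name their own contacts, for a fixed number of waves. The `contacts` mapping and the optional referral cap are illustrative assumptions. By contrast, a call or messaging log observes all of these links at once, without wave-by-wave recruitment.

```python
from collections import deque


def snowball_sample(contacts, seeds, waves, max_referrals=None):
    """Snowball (chain-referral) sample over a contact graph.

    `contacts` maps each person to the people they name; `seeds` form wave 0.
    Each wave recruits the not-yet-sampled contacts of the previous wave,
    optionally capping referrals per respondent. Returns {person: wave}.
    """
    wave_of = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if wave_of[person] == waves:
            continue  # last wave reached: this person is interviewed but not asked for referrals
        named = contacts.get(person, [])
        if max_referrals is not None:
            named = named[:max_referrals]
        for contact in named:
            if contact not in wave_of:
                wave_of[contact] = wave_of[person] + 1
                queue.append(contact)
    return wave_of
```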
30 Real-time Community Crime Data Source: https://www.crimereports.com/
31 Big Data and Representativeness. What is the population? Who generates the data? Can we draw a sample and infer population traits? Patterns may reflect what is happening, but the reference population is not clear. Inferential statistics is therefore not possible; hence the use of non-parametric analytics.
32 Big Data: Who Generates the Data? Is It Representative? [Figure: demographics of Twitter users] Source: The State of Twitter 2012 [STATS], 3 August 2012
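A first check on "who generates the data" is to compare the demographic composition of a data source with reference population shares, for example from a census. The following is a minimal sketch. The group labels and shares in the usage are hypothetical and are not the 2012 Twitter figures cited above.

```python
def representation_ratios(sample_counts, population_shares):
    """Compare a data source's composition with reference population shares.

    `sample_counts` maps group -> number of users in the data source;
    `population_shares` maps group -> share of the reference population
    (summing to 1). A ratio above 1 means the group is over-represented in
    the data; below 1, under-represented.
    """
    total = sum(sample_counts.values())
    return {
        group: (sample_counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
    }
```

Such ratios only describe groups that can be observed. They do not resolve the deeper problem raised on slide 31: the reference population behind big data is often unclear.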
33 Big Data and Social Reality. Does big data reflect social reality? Do the data reveal random or real patterns? Are the data representative? What is the real meaning of the data? Do the data reflect social patterns or structures? An example from a social network study: articulated social networks (lists of friends on Facebook) versus behavioural networks (communication patterns and cell coordinates).
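The distinction between articulated and behavioural networks can be quantified by comparing the two edge sets. The following is a minimal sketch, assuming a hypothetical friend list and call log. Behavioural ties are pairs that communicated at least `min_contacts` times, and overlap is measured with the Jaccard index. The threshold and the choice of Jaccard similarity are assumptions made for illustration.

```python
from collections import Counter


def behavioural_edges(call_records, min_contacts=1):
    """Build an undirected behavioural network from (caller, callee) records.

    Pairs that communicated at least `min_contacts` times (in either
    direction) become edges, stored as frozensets.
    """
    counts = Counter(frozenset((a, b)) for a, b in call_records if a != b)
    return {pair for pair, n in counts.items() if n >= min_contacts}


def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity |A & B| / |A | B| between two undirected edge sets."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0
```

A low overlap would suggest that declared friendships and observed communication capture different social realities.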
34 Big Data and Verifiability. Can the data be verified and re-tested? Much big data is considered private and is not available to the larger academic community for repeated analysis. Equal data access is needed to make scientific replication studies possible and to prevent fraudulent publications.
36 New types of research data about human behavior and society pose many opportunities if crucial infrastructural challenges are tackled. G King Science 2011;331:
37 Using Big Data in Social Science. New tools and procedures are required for: data preparation/cleaning; data reduction; data mining; searching for patterns and/or relationships; building the best model; applying the best model to a new dataset to classify or estimate (machine learning). How, and what, should we teach the machine?
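The last steps in the list above (building the best model, then applying it to new data) can be illustrated with a k-nearest-neighbour classifier whose k is chosen on a held-out validation set. k-NN is a convenient stand-in here, not a method the presentation prescribes. The data in the test are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training examples.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    Vote ties go to the label first encountered among the nearest neighbours.
    """
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(k, train, data):
    """Share of (features, label) pairs in `data` that k-NN classifies correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


def select_k(train, validation, candidates=(1, 3, 5)):
    """'Build the best model': pick the k with the highest validation accuracy."""
    return max(candidates, key=lambda k: accuracy(k, train, validation))
```

Once `best_k = select_k(train, validation)` is chosen, the model is applied to a new dataset with `[knn_predict(train, x, best_k) for x in new_points]`.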
38 Big Data and the Computational Challenge. Computational challenges include generating manageable structured data from unstructured data, and integrating big data processing with statistical analysis tools.
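Generating structured data from unstructured data often starts with plain pattern matching. The following is a minimal sketch, assuming a hypothetical post format `YYYY-MM-DD HH:MM @user: text`. Each matching line becomes a record with a timestamp, user, hashtags and word count. Lines that do not fit the pattern are counted rather than silently dropped, so the loss can be reported alongside any statistical analysis.

```python
import re
from datetime import datetime

# Hypothetical line format: "YYYY-MM-DD HH:MM @user: free text ..."
LINE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) @(?P<user>\w+): (?P<text>.*)$")
HASHTAG = re.compile(r"#(\w+)")


def to_records(lines):
    """Turn raw post lines into structured rows; return (records, number_rejected)."""
    records, rejected = [], 0
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            rejected += 1  # keep track of what could not be parsed
            continue
        records.append({
            "time": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M"),
            "user": m["user"],
            "hashtags": [tag.lower() for tag in HASHTAG.findall(m["text"])],
            "n_words": len(m["text"].split()),
        })
    return records, rejected
```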
39 Learning to Use Big Data. Training is required in: nonstandard data types; computational methods; protection of data confidentiality; legal protocols; data sharing norms; statistical tools.
40 The Way Forward. Big data will become more prominent in the years to come. Statisticians and social scientists should take advantage of new data sources. Computational and quantitative analytical skills will become important. Data must generate insights and knowledge: this is the ultimate goal. We must distinguish truth from falsehood.