The Impact of Big Data on Social Research
David Rhind, Sharon Witherspoon


www.nuffieldfoundation.org

The landscape to be covered
- What is Big Data? Just consultants' hype?
- Key questions for SRA
- Technology + other drivers of change
- New sources of data and their uses
- Big challenges
- Back to the future: the next Census
- Presentation also matters
- Conclusions

What is/are Big Data?
- VOLUME: too large to handle with standard contemporary analytical tools, i.e. a subjective / relative measure. The total amount of data has grown exponentially: it has been estimated that more data was harvested between 2010 and 2012 than in all of preceding human history. Source: http://www.bbc.co.uk/news/business-17682304 (claim certainly made by Mike Lynch; original source IBM?)
- VELOCITY: how fast data is being produced and how fast it must be processed to meet demand
- VARIETY: many different forms of data are used, structured and unstructured (the majority), held in different types of databases as text documents, emails, imagery, videos and much else
- PROBLEMS: hype, bias in (large) samples, focus on correlations not causality, understanding the results

Context and key questions for SRA
- Current practice mostly survey-based
- Divide exists between expertise in data collection and analysis skills
- National shortfall in quantitative analytical skills
- Will Big Data, etc. change the ground rules of research practice?
- Are established practices becoming obsolete?
- Or do we need to assimilate what's new into established principles of research?

Drivers of change
- Extraordinary rate of technological enhancement
- Austerity: better value for money (vfm) sought
- Transparency
- Job creation / increased wealth
- Calls for better / more up-to-date data / info / evidence
- Threats to traditional approaches, e.g. EU Parliament and Data Protection: "specific and explicit consent"
- Public sector manifestations of change: data scientists sought by government, support of Open Data Institute, ONS exploration of options, data.gov, ESRC £64m funding & ADRCs

Technology change
- Apollo 11 (1969) versus the iPhone 4S (2012): more computing power than Apollo, in my pocket
- iPhone 4S: 3000 x the storage of the IBM 305 disk drive (1956)
- Cost: IBM 305 leased for $35,000/year, versus $150/year


New(ish) sources of data
- Mobile phone sensors
- Proxy: satellite remote sensing at 31cm resolution (how to reflect people data?)
- Proxy: web scraping (e.g. inflation measures)
- Crowd sourcing, e.g. OpenStreetMap
- Management / administrative data (public and private sector)
- Modelling starting from historic data

[Figure: Visitors and locals in Paris. Source: Eric Fischer]

Uses of different data types
- Obtaining data about things: easy? See remote sensing examples
- People:
  - location and movement of people: technically easy via CCTV, smartphones
  - ethnicity, age data: approximations from names
  - profiles from private sector data or linked governmental administrative data: technically easy
- Best solution usually is a combination of data types, e.g. land cover and use from imagery and company records

Real-time data collection now routine for some applications
[Figure. Source: UK MoD under the Open Government Licence, Google and US Geological Survey]

Different uses of imagery at different resolutions
- 10m resolution: see roads and water features
- 1 to 2 metres resolution: see some cars and individual houses
- 30 to 60cm resolution: see all visible cars, manholes
Source: DigitalGlobe 2014

Extreme crowd sourcing: Pyongyang in OpenStreetMap. Also MH370
[Figure. Source: UK MoD under the Open Government Licence, Google and US Geological Survey]

Admin data / management information
- Obvious advantages: already exists, often continuously maintained; linkage of personal admin data facilitates valuable research and fraud reduction
BUT
- You get (at best) what is created for other purposes
- Content or classification changes mess up time series
- Personal admin data sharing and privacy debate
- Has raw data quality been audited properly (English police recorded crime statistics)?

[Figure: Ratio between CSEW incidents and crime recorded by the police]

Adding value = a commercial asset
- Can have huge value, e.g. Climate Corporation:
  - 2006 start-up by 2 ex-Google staff
  - Linked US government weather, crop yield and soil data
  - Provides yield forecasting and planting advice, weather and crop insurance
  - Bought by Monsanto October 2013 for $930m

Big Challenges
- Trade-off between data integrity and currency. How good is good enough? How fast is fast enough?
- Want to anticipate the future as well as know the past
- Private sector increasingly active in data collection and exploitation, e.g. Markit surveys used by Bank of England
- Internationalisation of data collection / assembly growing
- Public understanding: problem with use of technical language, e.g. the public doesn't really understand the "n year flood" concept; PM confusion of deficit and debt
- Changed role of data constructor / statistician? Mentors and advocates?
- This all a matter for the very young?
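The "n year flood" concept mentioned above can be made concrete with a short worked calculation. This is a minimal sketch, not part of the original slides: it assumes each year is independent and carries the same 1/n chance of a flood of that size, and the function name `prob_at_least_one` is purely illustrative.

```python
def prob_at_least_one(return_period_years: float, horizon_years: int) -> float:
    """Chance of at least one 'n year' flood within a given horizon.

    Assumption (not from the slides): years are independent and each
    has the same annual probability of 1 / return_period_years.
    """
    annual_p = 1.0 / return_period_years
    # Probability of no flood in every year, subtracted from 1.
    return 1.0 - (1.0 - annual_p) ** horizon_years


if __name__ == "__main__":
    # A "100 year flood" over horizons of increasing length.
    for horizon in (1, 10, 30, 100):
        p = prob_at_least_one(100, horizon)
        print(f"{horizon:>3} years: {p:.1%}")
```

Under these assumptions a "100 year flood" has roughly a one-in-four chance of occurring at least once in 30 years, which is far from the "once a century" reading the label invites.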


Back to the future with surveys?

The 2011 Census
- 2011 Census survey data collection went well, but total cost was £480m
- Basically very similar to what has been done for decades; 16% completed online
- Results started to become available 15 months after the survey, but much still being published after 3.5 years
- Changing society: more difficult to get forms completed
- Statistics Commission, Treasury Select Committee and UKSA said "no more traditional census"

LFS response rates 1993 to 2008
[Figure. Source: ONS]
- US experience is similar: an average of 20% reduction in 20 years

The 2021 Census
- Very strong support from public consultation for continuation of some form of Census
- ONS plan now accepted in principle by government
- Model is for an online "Census+":
  - aim to achieve a high rate (e.g. 65%) of online completion of forms
  - aim to enrich census data by adding variables derived from admin data wherever possible
  - much research under way
- US Bureau of Census experimenting with use of smartphone-derived data

[Figure. Source: ONS]

Data presentation also matters!

[Figure: Basic arithmetical error: it should be "almost 400", not "almost 4000"!]
[Figure: PM confusing deficit and debt]
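The deficit/debt confusion is easy to show with arithmetic: the deficit is the annual shortfall, while the debt is the running total of past shortfalls. The sketch below uses entirely hypothetical figures (they are not from the slides or from any official source), and `debt_path` is an illustrative name.

```python
from itertools import accumulate


def debt_path(opening_debt: float, annual_deficits: list[float]) -> list[float]:
    """Debt at the end of each year: opening debt plus cumulative deficits."""
    return [opening_debt + total for total in accumulate(annual_deficits)]


if __name__ == "__main__":
    # Hypothetical numbers: the deficit falls every year...
    deficits = [150, 120, 100, 80]
    # ...yet the debt keeps rising, because each deficit adds to it.
    for year, (deficit, debt) in enumerate(zip(deficits, debt_path(1000, deficits)), start=1):
        print(f"year {year}: deficit {deficit}, debt {debt}")
```

A falling deficit therefore means debt is growing more slowly, not that debt is being paid down.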

National Infrastructure Plan: pipeline value by sector (£m)
- Moral: how information is presented can seriously mislead (note log scale on Chart 2)
[Chart: Pipeline value by sector. Vertical axis: capital value, £ million, up to 250,000. Sectors labelled include Communications, Flood, Transport, Water]
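The point about the log scale can be illustrated numerically. The sketch below compares how tall one bar looks relative to another on a linear axis and on a log axis. The two values are hypothetical (the chart's actual figures are not recoverable from this transcript), and both the function name and the choice of axis origin are assumptions made for illustration.

```python
import math


def bar_height_ratio(a: float, b: float, scale: str = "linear", axis_min: float = 1.0) -> float:
    """How many times taller bar `a` appears than bar `b`.

    On a linear axis starting at zero, height is proportional to the value.
    On a log axis starting at `axis_min`, height is proportional to
    log10(value) - log10(axis_min).
    """
    if scale == "linear":
        return a / b
    if scale == "log":
        base = math.log10(axis_min)
        return (math.log10(a) - base) / (math.log10(b) - base)
    raise ValueError(f"unknown scale: {scale}")


if __name__ == "__main__":
    # Hypothetical sector values in £ million: one is 100 times the other.
    big, small = 200_000, 2_000
    print("linear:", round(bar_height_ratio(big, small, "linear"), 2))
    print("log:   ", round(bar_height_ratio(big, small, "log"), 2))
```

With these assumed values, a hundredfold difference in spending appears as a bar well under twice as tall on the log axis, which is how a reader skimming the chart can be seriously misled.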

Conclusions
- Much Big Data hype, but a revolution is under way
- This will change the way we assemble data and do social science to extract added value
- Much more work will be done by multi-disciplinary teams with higher-level analytic, quantitative and presentational skills in various disciplines
- Greater focus still needed on data quality issues
- Need focus on data sharing governance, ethics and safeguards, and on advocacy of benefits
- Q-Step will help a BIT. But organisations like the SRA and its members have an important role!

Thank you