BIG DATA AND OFFICIAL STATISTICS. Filomena Maggino, Monica Pratesi




What are the risks, needs, and challenges of big data in the context of measuring wellbeing?

«Data are widely available, what is scarce is the ability to extract wisdom from them» (Hal Varian, Google chief economist) http://www.economist.com/node/15557443

Challenge, risk, need

Risk: losing the way
BIG: the more we have, the better it is
meaningful mass of information
Big should represent an opportunity for transversal reading (this is, in a nutshell, the idea behind the multipurpose project at ISTAT)

Need: a system
Exploiting all data sources in order to describe a consistent frame of the community's wellbeing, through a transversal and horizontal approach, creating a large and heterogeneous patrimony from which to generate an overall view

Challenge: heterogeneity
BIG: heterogeneity of its components
Not [only] integration of different sources but [also] building and re-building paths of transversal meaning

The definition of new indicators of countries' progress and wellbeing has introduced new data needs.

BIG DATA

Instruments to manage big data

In order to avoid indigestible mixtures, a consistent conceptual framework is needed

conceptual framework + big data + analytic instruments = measuring a country's wellbeing

In this perspective, we need to take into account the conceptual dimensions describing a country's progress and communities' wellbeing

1. Wellbeing
   quality of life:
      o living conditions
      o subjective wellbeing
   quality of society: social cohesion (participation, trust, social relations, identity)
2. Equity
   distribution of wellbeing
   inequalities, regional disparities
   social exclusion
3. Sustainability
   relationship between the previous levels, the environment and the future

The conceptual dimensions need to be observed and analyzed at the micro level (individual / household) (*)
(*) See Stiglitz J. E., A. Sen & J.-P. Fitoussi, eds. (2009), Report by the Commission on the Measurement of Economic Performance and Social Progress, Paris. http://www.stiglitz-senfitoussi.fr/en/index.htm

Our aim is to bring BIG DATA, and the information they can potentially carry, into the domain of social indicators in the field of official statistics

Our challenge is to construct complex indicators able to (i) monitor communities' wellbeing and (ii) support the definition of better policies, by introducing new descriptions captured by big data.

Our challenge is to construct complex indicators by meeting the required characteristics

Identifying indicators
An indicator should be able to:
   define and describe
   observe unequivocally and stably
   record with a degree of distortion as low as possible
   adhere to the principle of objectivity
   reflect adequately the conceptual model
   meet current and potential users' needs
   be observed through realistic efforts and costs
   reflect the length of time between its availability and the event or phenomenon it describes
   be analyzed in order to record differences and disparities
   be spread
These requirements fall under four quality dimensions:
   (I) METHODOLOGICAL SOUNDNESS
   (II) INTEGRITY
   (III) SERVICEABILITY
   (IV) ACCESSIBILITY

In other words, our goal is to extract consistent knowledge, new insights and meaningful pictures of our societies' progress and wellbeing from BIG DATA.

Introduction to Small Area Estimation
Population of interest (or target population): the population for which the survey is designed; direct estimators should be reliable for the target population
Domains: sub-populations of the population of interest, which may or may not be planned in the survey design
   Geographic areas (e.g. regions, provinces, municipalities, health service areas)
   Socio-demographic groups (e.g. sex, age, race within a large geographic area)
   Other sub-populations (e.g. the set of firms belonging to an industry subdivision)
We do not know the reliability of direct estimators for the domains that have not been planned in the survey design

Often direct estimators are not reliable for some domains of interest
In these cases we have two choices:
   oversampling in those domains
   applying statistical techniques that allow for reliable estimates in those domains
Small Domain or Small Area: geographical area or domain where direct estimators do not reach a minimum level of precision
Small Area Estimator (SAE): an estimator created to obtain reliable estimates in a Small Area
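As a rough illustration of what "not reaching a minimum level of precision" can mean in practice, the sketch below computes a weighted direct estimate of a mean for each domain together with its coefficient of variation (CV), and flags domains whose CV exceeds a threshold. The function name, the 20% threshold and the simple with-replacement variance approximation are assumptions made for this example; they are not taken from the slides.

```python
import math
from collections import defaultdict


def direct_estimates(records, cv_threshold=0.2):
    """Direct (weighted mean) estimate per domain, with its CV.

    records: iterable of (domain, y, weight) tuples from a survey.
    cv_threshold: hypothetical precision requirement (assumption).
    Returns {domain: (mean, cv, is_small_area)}.
    """
    by_domain = defaultdict(list)
    for domain, y, w in records:
        by_domain[domain].append((y, w))

    out = {}
    for domain, obs in by_domain.items():
        n = len(obs)
        w_sum = sum(w for _, w in obs)
        mean = sum(w * y for y, w in obs) / w_sum
        if n < 2 or mean == 0:
            # Variance or CV cannot be computed: treat as unreliable.
            out[domain] = (mean, float("inf"), True)
            continue
        # Linearised residuals of the weighted mean (they sum to zero).
        z = [w * (y - mean) / w_sum for y, w in obs]
        # With-replacement variance approximation (assumption).
        var = n / (n - 1) * sum(zk ** 2 for zk in z)
        cv = math.sqrt(var) / abs(mean)
        out[domain] = (mean, cv, cv > cv_threshold)
    return out


sample = [("A", y, 1.0) for y in (10, 12, 11, 9, 13, 10, 12, 11)]
sample += [("B", 2, 1.0), ("B", 20, 1.0)]
for domain, (mean, cv, small) in direct_estimates(sample).items():
    print(domain, round(mean, 2), round(cv, 3), "small area" if small else "ok")
```

Domains flagged here are the ones for which oversampling or a small area estimator would be considered.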

Small Area Estimation and Big Data
Our aim is to use the huge source of data coming from human activities, the big data, to make accurate inference at a small area level
We identified three possible approaches:
1. Use big data as covariates in small area models
2. Use survey data to remove self-selection bias from estimates obtained using big data
3. Use big data to validate small area estimates

Use Big Data as Covariates in Small Area Models
Big data often provide unit level data
The outcome variable has to be linked to auxiliary variables in order to use unit level data in a small area model
Due to technical challenges and legal restrictions, it is unfeasible at this stage to have unit level big data that can be linked with administrative archives, census or survey data
Big data can be aggregated at area level and then used in an area level model, with d_i a vector of p variables gathered from big data sources
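The area level model itself is not reproduced in this transcription. A common choice is the Fay-Herriot model, which is assumed here purely for illustration: the direct estimate for area i is written as d_i'beta + u_i + e_i, where u_i is an area effect with unknown variance and e_i is a sampling error with known variance psi_i. The sketch below computes the resulting EBLUP, with the area-effect variance obtained by solving the Fay-Herriot moment equation by bisection. All names (`fay_herriot_eblup`, `theta_hat`, `psi`, `D`) are hypothetical.

```python
import numpy as np


def fay_herriot_eblup(theta_hat, psi, D, tol=1e-8, max_iter=200):
    """EBLUP under an assumed Fay-Herriot area level model.

    theta_hat: direct estimates, one per area (length m).
    psi: known sampling variances of the direct estimates (length m).
    D: m x p matrix whose rows are the big-data covariate vectors d_i
       (include a column of ones for an intercept); requires m > p.
    Returns (eblup, beta, sigma2_u, gamma).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    psi = np.asarray(psi, dtype=float)
    D = np.asarray(D, dtype=float)
    m, p = D.shape

    def wls(sigma2_u):
        # Weighted least squares for beta, given the area-effect variance.
        w = 1.0 / (sigma2_u + psi)
        beta = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * theta_hat))
        resid = theta_hat - D @ beta
        return beta, float(np.sum(w * resid ** 2))

    # Moment equation: weighted residual sum of squares equals m - p.
    # The left-hand side decreases in sigma2_u, so bisection applies.
    lo, hi = 0.0, max(float(np.var(theta_hat)), 1e-12)
    if wls(lo)[1] <= m - p:
        sigma2_u = 0.0
    else:
        while wls(hi)[1] > m - p:
            hi *= 2.0
        for _ in range(max_iter):
            mid = 0.5 * (lo + hi)
            if wls(mid)[1] > m - p:
                lo = mid
            else:
                hi = mid
            if hi - lo < tol:
                break
        sigma2_u = 0.5 * (lo + hi)

    beta, _ = wls(sigma2_u)
    gamma = sigma2_u / (sigma2_u + psi)  # shrinkage factor per area
    eblup = gamma * theta_hat + (1.0 - gamma) * (D @ beta)
    return eblup, beta, sigma2_u, gamma
```

Each EBLUP is a weighted average of the direct estimate and the regression prediction d_i'beta: areas with a large sampling variance lean more on the prediction driven by big data.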

Use Survey Data to Remove Self-Selection Bias from Estimates Obtained Using Big Data
An option is to use big data directly to measure poverty and social exclusion
It is realistic to think that the big data are not representative of the whole population of interest (self-selection problem)
Using a quality survey we can check the differences in the distribution of common variables between big data and survey data
If there are no common variables, we can use known correlated data to check the differences in the distributions
Given these differences, we can compute weights that reduce the bias due to the self-selection of the big data
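One simple way to turn such distributional differences into weights is post-stratification on a common categorical variable: each big-data unit receives the ratio between the share of its category in the survey and the share of that category in the big data. This is a minimal sketch of that idea, not necessarily the weighting scheme the authors have in mind; the function names and the toy figures are hypothetical.

```python
from collections import Counter


def poststratification_weights(big_data_cells, survey_shares):
    """One weight per big-data unit: survey share / big-data share.

    big_data_cells: category of the common variable for each unit.
    survey_shares: {category: population share estimated from the survey}.
    Categories absent from the big data cannot be corrected this way.
    """
    counts = Counter(big_data_cells)
    n = len(big_data_cells)
    return [survey_shares[cell] / (counts[cell] / n) for cell in big_data_cells]


def weighted_mean(values, weights):
    """Weighted mean of the big-data outcome values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy example: young people are over-represented in the big data.
cells = ["young"] * 8 + ["old"] * 2
values = [1] * 8 + [0] * 2
weights = poststratification_weights(cells, {"young": 0.5, "old": 0.5})
print(sum(values) / len(values))       # unweighted mean
print(weighted_mean(values, weights))  # mean after reweighting
```

The reweighted mean matches what would be obtained if the big data had the same category shares as the survey. It only corrects the bias that can be explained by the chosen common variable.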

Use Big Data to Validate Small Area Estimates
Poverty and deprivation measures obtained from big data can be compared with similar measures obtained from official survey data
If big data estimates and survey data estimates agree, the level of poverty and deprivation is supported by two independent sources of evidence
If there is a discrepancy, further investigation is needed
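The comparison can be made operational in many ways. As one hedged possibility, the sketch below checks, area by area, whether the big-data estimate falls inside an approximate confidence interval around the survey-based estimate. The function name, the 1.96 multiplier and the toy figures are assumptions for illustration only.

```python
def compare_estimates(survey, big_data, z=1.96):
    """Label each area as 'agreement' or 'discrepancy'.

    survey: {area: (estimate, standard_error)} from official survey data.
    big_data: {area: estimate} obtained from big data.
    z: normal quantile for the interval (assumption: about 95%).
    Only areas present in both sources are compared.
    """
    result = {}
    for area, (estimate, se) in survey.items():
        if area not in big_data:
            continue
        lower, upper = estimate - z * se, estimate + z * se
        inside = lower <= big_data[area] <= upper
        result[area] = "agreement" if inside else "discrepancy"
    return result


survey = {"A": (0.20, 0.02), "B": (0.30, 0.01)}
big_data = {"A": 0.22, "B": 0.40, "C": 0.10}
print(compare_estimates(survey, big_data))
```

Areas labelled as discrepancies are the candidates for the further investigation mentioned above.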