BIG DATA AND OFFICIAL STATISTICS. Filomena Maggino, Monica Pratesi
1 BIG DATA AND OFFICIAL STATISTICS Filomena Maggino, Monica Pratesi
2 What about risks, needs, and challenges of big-data in the context of measuring wellbeing?
3 «Data are widely available, what is scarce is the ability to extract wisdom from them» (Hal Varian, Google chief economist)
4 challenge risk need
5 Risk: losing the way
6 BIG: "the more we have, the better it is". Risk: losing the way
7 BIG: "the more we have, the better it is". Risk: losing the way. Meaningful mass of information
8 Big should represent an opportunity for transversal reading (this idea is what the multipurpose project at ISTAT has in a nutshell). Risk: losing the way
9 Need: system
10 Need: system. Exploiting all data sources in order to describe a consistent frame of the community's wellbeing
11 Need: system. Through a transversal and horizontal approach, creating a big and heterogeneous patrimony from which to generate an overall view
12 Challenge: heterogeneity
13 Challenge: heterogeneity. BIG: heterogeneity of its components
14 Challenge: heterogeneity. Not [only] integration of different sources but [also]
15 Challenge: heterogeneity. Building and re-building paths of transversal senses
16 The definition of new indicators of countries' progress and wellbeing introduced new data needs.
17 BIG DATA
18 Instruments to manage big data
19 In order to avoid indigestible mixtures
20 ... a consistent conceptual framework is needed
21 conceptual framework + big data + analytic instruments = measuring a country's wellbeing
22 In this perspective, we need to take into account the conceptual dimensions describing a country's progress and communities' wellbeing
23
1. Wellbeing
   quality of life:
     o living conditions
     o subjective wellbeing
   quality of society:
     social cohesion (participation, trust, social relation, identity)
2. Equity
   distribution of wellbeing
   inequalities, regional disparities
   social exclusion
3. Sustainability
   Relationship between the previous levels, the environment and the future
24 The conceptual dimensions need to be observed and analyzed at micro level (individual / household) (*)
(*) See Stiglitz J. E., A. Sen & J.-P. Fitoussi, eds. (2009), Report by the Commission on the Measurement of Economic Performance and Social Progress, Paris.
25 Our aim is to introduce BIG DATA and their potential informative load into the dimension of social indicators in the field of official statistics
26 Our challenge is to construct complex indicators able to (i) monitor communities' wellbeing and (ii) support the definition of better policies, by introducing new descriptions captured by big data.
27 Our challenge is to construct complex indicators by meeting the required characteristics
28 Identifying indicators
An indicator should be able to:
  o define and describe
  o observe unequivocally and stably
  o record with a degree of distortion as low as possible
  o adhere to the principle of objectivity
  o reflect adequately the conceptual model
  o meet current and potential users' needs
  o be observed through realistic efforts and costs
  o reflect the length of time between its availability and the event or phenomenon it describes
  o be analyzed in order to record differences and disparities
  o be spread
(I) METHODOLOGICAL SOUNDNESS (II) INTEGRITY (III) SERVICEABILITY (IV) ACCESSIBILITY
29 In other words, our goal is to extract consistent knowledge, new insights and meaningful pictures of our societies' progress and wellbeing from BIG DATA.
30 Introduction to Small Area Estimation
  o Population of interest (or target population): the population for which the survey is designed; direct estimators should be reliable for the target population
  o Domains: sub-populations of the population of interest; they may or may not be planned in the survey design
    o Geographic areas (e.g. Regions, Provinces, Municipalities, Health Service Areas)
    o Socio-demographic groups (e.g. Sex, Age, Race within a large geographic area)
    o Other sub-populations (e.g. the set of firms belonging to an industry subdivision)
  o We don't know the reliability of direct estimators for the domains that have not been planned in the survey design
31 Introduction to Small Area Estimation
  o Often direct estimators are not reliable for some domains of interest
  o In these cases we have two choices:
    o oversampling in those domains
    o applying statistical techniques that allow for reliable estimates in those domains
  o Small Domain or Small Area: a geographical area or domain where direct estimators do not reach a minimum level of precision
  o Small Area Estimator (SAE): an estimator created to obtain reliable estimates in a Small Area
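The notion of a domain where the direct estimator does not reach "a minimum level of precision" can be illustrated with a minimal sketch. It assumes simple random sampling within each domain, the sample mean as direct estimator, and the coefficient of variation (CV) as precision measure; the function name `direct_estimates` and the 20% CV threshold are hypothetical choices made for illustration, not taken from the slides.

```python
from math import sqrt
from statistics import mean, variance


def direct_estimates(sample, cv_threshold=0.2):
    """Compute a direct estimate and its CV for every domain.

    sample: dict mapping a domain name to the list of values observed there.
    cv_threshold: hypothetical precision limit; domains above it are flagged
    as small areas.
    """
    results = {}
    for domain, values in sample.items():
        n = len(values)
        estimate = mean(values)
        # Standard error of the mean under simple random sampling
        # (finite population correction ignored in this sketch).
        se = sqrt(variance(values) / n) if n > 1 else float("inf")
        cv = se / abs(estimate) if estimate != 0 else float("inf")
        results[domain] = {
            "n": n,
            "estimate": estimate,
            "cv": cv,
            "small_area": cv > cv_threshold,
        }
    return results


if __name__ == "__main__":
    survey = {
        "Region A": [10, 11, 9, 10, 10, 11, 9, 10],  # well-sampled domain
        "Municipality B": [2, 30],                   # unplanned, tiny sample
    }
    for name, row in direct_estimates(survey).items():
        print(name, row)
```

In this toy input the well-sampled region stays below the threshold, while the two-unit municipality is flagged and would be a candidate for a small area estimator.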
32 Small Area Estimation and Big Data
Our aim is to use the huge source of data coming from human activities - the big data - to make accurate inference at a small area level.
We identified three possible approaches:
1. Use big data as covariates in small area models
2. Use survey data to remove self-selection bias from estimates obtained using big data
3. Use big data to validate small area estimates
33 Use Big Data as Covariates in Small Area Models
  o Big data often provide unit level data
  o The outcome variable has to be linked to auxiliary variables in order to use unit level data in a small area model
  o Due to technical challenges and legal restrictions, it is unfeasible at this stage to have unit level big data that can be linked with administrative archives, census or survey data
  o Big data can be aggregated at area level and then used in an area level model, with d_i a vector of p variables gathered from big data sources
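A common area level model is the Fay-Herriot model; assuming that is the kind of model meant here, the sketch below shows how an aggregated big data covariate could enter it. For brevity it uses a single covariate d_i plus an intercept (p = 1), treats the sampling variances psi_i of the direct estimates as known and strictly positive, and estimates the area-effect variance with the Fay-Herriot moment equation solved by bisection. All function names are hypothetical.

```python
def wls_line(y, d, w):
    """Weighted least squares fit of y = b0 + b1 * d with weights w."""
    sw = sum(w)
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))
    sxy = sum(wi * (di - d_bar) * (yi - y_bar) for wi, di, yi in zip(w, d, y))
    slope = sxy / sxx
    return y_bar - slope * d_bar, slope


def moment_gap(s2u, y, d, psi):
    """Weighted residual sum of squares minus its expectation (m - 2)."""
    w = [1.0 / (s2u + p) for p in psi]
    b0, b1 = wls_line(y, d, w)
    rss = sum(wi * (yi - b0 - b1 * di) ** 2 for wi, yi, di in zip(w, y, d))
    return rss - (len(y) - 2)


def fit_area_level(y, d, psi, tol=1e-8):
    """Fit the assumed model y_i = b0 + b1 * d_i + u_i + e_i.

    y: direct survey estimates per area; d: big data covariate per area;
    psi: known sampling variances of y (assumed > 0).
    """
    if moment_gap(0.0, y, d, psi) <= 0:
        s2u = 0.0  # no evidence of between-area variance
    else:
        lo, hi = 0.0, 1.0
        while moment_gap(hi, y, d, psi) > 0:  # bracket the root
            hi *= 2
        while hi - lo > tol:  # the gap decreases in s2u, so bisection works
            mid = (lo + hi) / 2
            if moment_gap(mid, y, d, psi) > 0:
                lo = mid
            else:
                hi = mid
        s2u = (lo + hi) / 2
    b0, b1 = wls_line(y, d, [1.0 / (s2u + p) for p in psi])
    gamma = [s2u / (s2u + p) for p in psi]
    synthetic = [b0 + b1 * di for di in d]
    # Shrink each direct estimate towards the regression prediction.
    eblup = [g * yi + (1 - g) * si for g, yi, si in zip(gamma, y, synthetic)]
    return {"b0": b0, "b1": b1, "sigma2_u": s2u, "gamma": gamma,
            "synthetic": synthetic, "eblup": eblup}
```

Areas with a large sampling variance get a small gamma_i, so their estimate leans on the big data covariate; areas with precise direct estimates stay close to the survey value.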
34 Use Survey Data to Remove Self-Selection Bias from Estimates Obtained Using Big Data
  o An option is to use big data directly to measure poverty and social exclusion
  o It is realistic to think that the big data are not representative of the whole population of interest (self-selection problem)
  o Using a quality survey we can check the differences in the distribution of common variables between big data and survey data
  o If there aren't common variables, we can use known correlated data to check the differences in the distributions
  o Given these differences, we can compute weights that reduce the bias due to the self-selection of the big data
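One simple way to turn these distribution differences into weights is post-stratification on a common categorical variable; this is an illustrative choice, not necessarily the weighting scheme the authors have in mind. The sketch assumes the survey gives the population share of each category, and the names `selection_weights` and `weighted_mean` are hypothetical.

```python
from collections import Counter


def selection_weights(big_data_groups, survey_shares):
    """Weight per category = survey share / share observed in the big data.

    big_data_groups: category of the common variable for each big data unit.
    survey_shares: dict category -> population share estimated by the survey.
    """
    counts = Counter(big_data_groups)
    n = len(big_data_groups)
    return {g: survey_shares[g] / (counts[g] / n) for g in counts}


def weighted_mean(values, groups, weights):
    """Mean of the big data values after reweighting each unit by its group."""
    numerator = sum(weights[g] * v for v, g in zip(values, groups))
    denominator = sum(weights[g] for g in groups)
    return numerator / denominator


if __name__ == "__main__":
    # Toy big data source where young users are over-represented.
    groups = ["young"] * 8 + ["old"] * 2
    values = [1] * 8 + [0] * 2           # some binary indicator
    shares = {"young": 0.4, "old": 0.6}  # shares taken from a quality survey
    w = selection_weights(groups, shares)
    print(sum(values) / len(values))          # naive big data mean
    print(weighted_mean(values, groups, w))   # reweighted mean
```

The weights only correct for self-selection that is explained by the common variable; categories absent from the big data cannot be recovered this way.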
35 Use Big Data to Validate Small Area Estimates
  o Poverty and deprivation measures obtained from big data can be compared with similar measures obtained from official survey data
  o If big data estimates and survey data estimates agree, there is double-checked evidence of the level of poverty and deprivation
  o If there is a discrepancy, further investigation is needed
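The comparison could be sketched as a per-area check of whether the two estimates differ by more than their combined uncertainty. This assumes that a standard error is available for both sources and that the two estimates are independent, neither of which is stated in the slides; the function name `compare_estimates` and the 1.96 cut-off are hypothetical.

```python
from math import sqrt


def compare_estimates(big_data, survey, z_crit=1.96):
    """Compare two sets of area estimates.

    big_data, survey: dict area -> (estimate, standard_error).
    Only areas present in both sources are compared.
    """
    report = {}
    for area in sorted(set(big_data) & set(survey)):
        b, se_b = big_data[area]
        s, se_s = survey[area]
        # z statistic for the difference of two independent estimates.
        z = (b - s) / sqrt(se_b ** 2 + se_s ** 2)
        report[area] = {
            "difference": b - s,
            "z": z,
            "agree": abs(z) <= z_crit,  # False means: investigate further
        }
    return report


if __name__ == "__main__":
    poverty_big_data = {"A": (0.20, 0.02), "B": (0.10, 0.01)}
    poverty_survey = {"A": (0.22, 0.02), "B": (0.25, 0.03)}
    for area, row in compare_estimates(poverty_big_data, poverty_survey).items():
        print(area, row)
```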
