ESS event: Big Data in Official Statistics





Parallel sessions 2A and 2B
LEARNING AND DEVELOPMENT: CAPACITY BUILDING AND TRAINING FOR ESS HUMAN RESOURCES
FACILITATOR: JOSÉ CERVERA-FERRI

Session 2: Related Scheveningen challenges
- [SCH5] Short-term Human Resources needs: recruitment, professional training, secondment/re-deployment
- [SCH5] Long-term needs: academic curricula for Data Scientists
- [SCH6] Collaboration with academia for training Data Scientists for official statistics

Session 2: Topics for discussion
- Skills for Big Data
- Opportunities for building skills
- Proposal for a key input to the roadmap to be established by the ESS Task Force
- Cross-cutting: short-term vs long-term

Session 2: Organization
Short-term, Long-term
Skills for Big Data; Opportunities for acquiring skills; Proposal for a roadmap to acquire skills for Big Data in the ESS
Session 2A, Session 2A, Session 2B, Session 2B

Parallel session 2A
SKILLS FOR BIG DATA
OPPORTUNITIES FOR ACQUIRING SKILLS

Session 2A: Preliminary considerations (1): Can NSIs rely on existing skills?
Non-traditional set of skills to develop
- Trained statisticians and IT staff in statistics are already close to the data science skills required for Big Data (data cleaning, cubes, analytical software, data mining, etc.).
- Staff are well trained in methodology and statistical domains (UNECE Sprint paper, SWOT analysis strength).
- The Official Statistics Community has less knowledge of Big Data than many important players like Google.
- The Official Statistics Community has limited skills and limited IT resources when it comes to the new, non-traditional technologies used to gather, process and analyse Big Data (UNECE Sprint paper, SWOT analysis weakness).

Session 2A: Preliminary considerations (1): Can NSIs rely on existing skills? (cont.)
- Young staff coming in from universities may be very innovative, may already have a personal relationship with Big Data (Facebook, Google, Twitter trends) and may be less constrained by traditional IT and analysis (UNECE Sprint paper, SWOT analysis opportunity).
- Failure to permit innovative methods might render OSC organizations less attractive workplaces for top talent (UNECE Sprint paper, SWOT analysis threats).
- Cultural change: the prevailing culture values high-quality, accurate information and holds that the best way to achieve it is through methods whose design can be controlled. Big Data doesn't allow this luxury.
- Innovative thinking, risk-taking (is this the realm of civil servants?)

Session 2A: Preliminary considerations (2): Learning methods
- Learning by doing in OS
- Training individuals, or teams?
  - The business analyst and project manager
  - The mathematician who builds algorithms
  - The data architect
  - The statistician (data collection, editing, processing)
  - The communicator (visualization)
  - Data analyst
  - Data scientist
  - Data engineer
  - Data integrator
  - System manager

Session 2A: Preliminary considerations (3): Competition
- Competition with industry: better salaries in the private sector for Data Scientists?
- How to retain the talent?

Session 2A: Skills for Big Data
- Data Scientist vs. Statistician
- Data Scientist as the connective tissue between data-processing technologies and data-driven decision making
- Necessary skills: math/statistics, IT, visualization, subject matter specialization
  - Math/stat: data mining techniques
  - IT: Hadoop, MongoDB, NoSQL, etc.

Session 2A: IT Skills for Big Data
- R-SAS-SPSS
- Business Intelligence, Visual Analytics, Excel
- MapReduce
- Pig, Java
- SQL
- ETL (extract, transform, load)
- Linux
Which are the priorities?

Session 2A: Statistical Skills for Big Data
- Computational statistics
- Analytical methods: correlations & causality, modelling, network analysis, information reduction
- Dissemination: data visualization
Which are the priorities?

Session 2A: Opportunities in the ESS
- ESS Learning and Development Framework
- ESTP 2014 course "Big Data: Effective Processing and Analysis of Very Large and Unstructured Data for Official Statistics"
  - Contents: classification of various massive data sets, ETL (extract, transform, load), specific challenges, privacy and statistical disclosure issues, computing base, overview of statistical methods. Focus on concrete examples.
  - Course requirements:
    - Database fundamentals and data manipulation languages
    - Data collection and integration tools
    - Data mining techniques for large data sets
    - Object-oriented design and programming
    - Probability and random variables
  - Is there anyone with such a complete background in Official Statistics?
- European Masters in Official Statistics (EMOS): ESS certification of programmes offered by universities
- EMOS workshop 2014 (Helsinki, June 2014)
- Other methods for transfer of know-how within the ESS?

Parallel session 2B
OPPORTUNITIES FOR ACQUIRING SKILLS (CONT.)
KEY INPUT TO THE ROADMAP TO BE ESTABLISHED BY THE ESS TASK FORCE

Session 2B: Opportunities outside the ESS
Grasping the opportunities outside:
- Diversity of academic programmes on Big Data, Business Analytics, Data Science (certification?)
- Training offer from private companies (certification?)
- Opportunities within Horizon 2020

Session 2B: [SCH6] Collaboration with Academia
- Academic collaborators: use of existing expertise in statistical analysis of large sets of data: astronomy, remote sensing, genetics, image processing.
- Source of training: need for mapping academic programmes on Big Data
- How can academics be integrated with NSI staff?
- How can training be financed? National or ESS level?

Session 2B: Horizon 2020
- Marie Skłodowska-Curie actions: support for innovative training networks, mobility of researchers, inter-sectoral cooperation
- ICT 15-2014: Big data and Open Data Innovation and take-up
  - Objective: To contribute to capacity-building by designing and coordinating a network of European skills centres for big data analytics technologies and business development. The network is expected to identify knowledge/skills gaps in the European industrial landscape and produce effective learning curricula and documentation to train large numbers of European data analysts and business developers, capable of (co)operating across national borders on the basis of a common vision and methodology.
  - Expected impact: Availability of deployable educational material for data scientists and data workers, and thousands of European data professionals trained in state-of-the-art data analytics technologies and capable of (co)operating in cross-border, cross-lingual and cross-sector European data supply chains.
- Call on Training and educating Data Scientists
- More detailed linkages in Horizon 2020?

Session 2B: Input to the Roadmap: The actions
Ideas for actions (which term?):
- Identify existing skills in the ESS
- Recruit Data Scientists with the missing skills
- Establish a network of providers of Big Data skills within the ESS
- Map the offer of Data Science training programmes in the private sector and their applicability to OS
- Establish a repository of assessed training materials
- Establish agreements with private sector and academia as providers of training
Who? NSIs, Eurostat, international organizations, private sector, academia?
Working Groups? Gexp (EMOS), HLG, ESTP, ???
Which source of financing? Horizon 2020? Eurostat? National budgets?

Session 2B: Input to the Roadmap: The actors
Ideas for actors:
- NSIs
- Eurostat
- International organizations
- Universities
- Private sector

Session 2B: Input to the Roadmap for Big Data training
- Brainstorming of ideas for building skills
- Assessment: sort by impact and ease of implementation
- Discussion of term, actors and level (national/EU/global)
- Proposal of responsibilities and time frame for the input to the Rome Roadmap