ESS event: Big Data in Official Statistics
Parallel sessions 2A and 2B
LEARNING AND DEVELOPMENT: CAPACITY BUILDING AND TRAINING FOR ESS HUMAN RESOURCES
FACILITATOR: JOSÉ CERVERA-FERRI
Session 2 Related Scheveningen challenges
  [SCH5] Short-term Human Resources needs: recruitment, professional training, secondment/re-deployment
  [SCH5] Long-term needs: academic curricula for Data Scientists
  [SCH6] Collaboration with academia for training Data Scientists for official statistics
Session 2: Topics for discussion
  Skills for Big Data
  Opportunities for building skills
  Proposal for a key input to the roadmap to be established by the ESS Task Force
  Cross-cutting: short-term vs long-term
Session 2: Organization
  Short-term, Long-term
  Skills for Big Data
  Opportunities for acquiring skills
  Proposal for a roadmap to acquire skills for Big Data in the ESS
  Session 2A, Session 2A, Session 2B, Session 2B
Parallel session 2A
SKILLS FOR BIG DATA
OPPORTUNITIES FOR ACQUIRING SKILLS
Session 2A Preliminary considerations (1): Can NSIs rely on existing skills?
  Non-traditional set of skills to develop
  Trained statisticians and IT staff in statistics are already close to the data science skills required for Big Data (data cleaning, cubes, analytical software, data mining, etc.).
  Staff well-trained in methodology and statistical domains (UNECE Sprint paper, SWOT analysis strength).
  The Official Statistics Community has less knowledge of Big Data than many important players like Google.
  The Official Statistics Community has limited skills and limited IT resources when it comes to the new, non-traditional technologies used to gather, process and analyse Big Data (UNECE Sprint paper, SWOT analysis weakness).
Session 2A Preliminary considerations (1): Can NSIs rely on existing skills? (cont.)
  Young staff coming in from universities may be very innovative, may already have a personal relationship with Big Data (Facebook, Google, Twitter trends) and may be less constrained by traditional IT and analysis (UNECE Sprint paper, SWOT analysis opportunity).
  Failure to permit innovative methods might render OSC organizations less attractive workplaces for top talent (UNECE Sprint paper, SWOT analysis threats).
  Cultural change: the existing culture values high-quality, accurate information and regards methods whose design can be controlled as the best way to achieve it. Big Data doesn't allow this luxury.
  Innovative thinking, risk-taking (is it the realm of Civil Servants?)
Session 2A Preliminary considerations (2): Learning methods
  Learning by doing in OS
  Training individuals, or teams?
    The business analyst and project manager
    The mathematician who builds algorithms
    The data architect
    The statistician (data collection, editing, processing)
    The communicator (visualization)
  Data analyst
  Data scientist
  Data engineer
  Data integrator
  System manager
Session 2A Preliminary considerations (3): Competition
  Competition with industry: better salaries in the private sector for Data Scientists?
  How to retain the talent?
Session 2A Skills for Big Data
  Data Scientist vs. Statistician
  Data Scientist as the connective tissue between data-processing technologies and data-driven decision making
  Necessary skills: math/statistics, IT, visualization, subject matter specialization
    Math/stat: data mining techniques
    IT: Hadoop, MongoDB, NoSQL
Session 2A: IT Skills for Big Data
  R, SAS, SPSS
  Business Intelligence, Visual Analytics, Excel
  MapReduce
  Pig, Java
  SQL
  ETL (extract, transform, load)
  Linux
  Which are the priorities?
Session 2A Statistical Skills for Big Data
  Computational statistics
  Analytical methods: correlations & causality, modelling, network analysis, information reduction
  Dissemination: data visualization
  Which are the priorities?
Session 2A Opportunities in the ESS
  ESS Learning and Development Framework
  ESTP 2014 course "Big Data: Effective Processing and Analysis of Very Large and Unstructured Data for Official Statistics"
    Contents: classification of various massive data sets, ETL (extract, transform, load), specific challenges, privacy and statistical disclosure issues, computing base, overview of statistical methods. Focus on concrete examples.
    Course requirements:
      Database fundamentals and data manipulation languages
      Data collection and integration tools
      Data mining techniques for large data sets
      Object-oriented design and programming
      Probability and random variables
    Is there anyone with such a complete background in Official Statistics?
  European Masters in Official Statistics (EMOS): ESS certification of programmes offered by universities
  EMOS workshop 2014 (Helsinki, June 2014)
  Other methods for transfer of know-how within the ESS?
Parallel session 2B
OPPORTUNITIES FOR ACQUIRING SKILLS (CONT.)
KEY INPUT TO THE ROADMAP TO BE ESTABLISHED BY THE ESS TASK FORCE
Session 2B Opportunities outside the ESS
  Grasping the opportunities outside:
    Diversity of academic programmes on Big Data, Business Analytics, Data Science (certification?)
    Training offer from private companies (certification?)
    Opportunities within Horizon 2020
Session 2B [SCH6] Collaboration with Academia
  Academic collaborators: use of existing expertise in statistical analysis of large sets of data (astronomy, remote sensing, genetics, image processing).
  Source of training: need for mapping academic programmes on Big Data
  How can academics be integrated with NSI staff?
  How can training be financed? National or ESS level?
Session 2B Horizon 2020
  Marie Skłodowska-Curie actions: support for innovative training networks, mobility of researchers, inter-sectoral cooperation
  ICT 15-2014: Big data and Open Data Innovation and take-up
    Objective: To contribute to capacity-building by designing and coordinating a network of European skills centres for big data analytics technologies and business development. The network is expected to identify knowledge/skills gaps in the European industrial landscape and produce effective learning curricula and documentation to train large numbers of European data analysts and business developers, capable of (co)operating across national borders on the basis of a common vision and methodology.
    Expected impact: Availability of deployable educational material for data scientists and data workers and thousands of European data professionals trained in state-of-the-art data analytics technologies and capable of (co)operating in cross-border, cross-lingual and cross-sector European data supply chains.
  Call on Training and educating Data Scientists
  More detailed linkages in Horizon 2020?
Session 2B Input to the Roadmap: The actions
  Ideas for actions (which term?):
    Identify existing skills in the ESS
    Recruit Data Scientists with the missing skills
    Establish a network of providers of Big Data skills within the ESS
    Map the offer of Data Science training programmes in the private sector and their applicability to OS
    Establish a repository of assessed training materials
    Establish agreements with private sector and academia as providers of training
  Who? NSIs, Eurostat, international organizations, private sector, academia?
  Working Groups? Gexp (EMOS), HLG, ESTP, ?
  Which source of financing? Horizon 2020? Eurostat? National budgets?
Session 2B Input to the Roadmap: The actors
  Ideas for actors:
    NSIs
    Eurostat
    International organizations
    Universities
    Private sector
Session 2B Input to the Roadmap for Big Data training
  Brainstorming of ideas for building skills
  Assessment: sort by impact and ease of implementation
  Discussion of term, actors and level (national/EU/global)
  Proposal of responsibilities and time frame for the Input Rome Roadmap
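The assessment step ("sort by impact and ease of implementation") could be run as a simple ranking once participants have scored each idea. The sketch below is only an illustration under stated assumptions: the `Idea` class, the `rank_ideas` function, the 1 to 5 scale and all scores are hypothetical placeholders, not results from the session. The idea names are taken from the list of proposed actions.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int  # hypothetical score, 1 (low) to 5 (high)
    ease: int    # hypothetical score, 1 (hard) to 5 (easy)


def rank_ideas(ideas):
    """Return ideas sorted by impact, then by ease, both descending."""
    return sorted(ideas, key=lambda i: (i.impact, i.ease), reverse=True)


# Placeholder scores for illustration only; real values would come
# from the brainstorming participants.
ideas = [
    Idea("Identify existing skills in the ESS", impact=3, ease=5),
    Idea("Establish a repository of assessed training materials", impact=4, ease=3),
    Idea("Recruit Data Scientists with the missing skills", impact=4, ease=2),
]

for idea in rank_ideas(ideas):
    print(f"impact={idea.impact} ease={idea.ease}  {idea.name}")
```

Sorting on impact first and using ease only as a tie-breaker is one possible choice; a group that prefers quick wins could swap the two keys.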