EDISON Education for Data Intensive Science to Open New science frontiers

H2020 INFRASUPP-4 CSA Project
EDISON: Education for Data Intensive Science to Open New Science Frontiers
Yuri Demchenko, University of Amsterdam

Outline
- Consortium members
- EDISON Project Concept and Objectives
- Challenges
- Project structure: WPs and Tasks
- EDISON Liaison Group and working with communities
- EDISON workshops
- Next steps
How to contact EDISON project team?
- Project coordinator: Yuri Demchenko <y.demchenko At uva.nl>
- EDISON project invites wide cooperation and contribution to Data Science profession definition
- Join RDA Interest Group on Education and Training on Research Data Training (IG-ETRD): https://www.rd-alliance.org/node/971
- Join IG-ETRD mailing list <rda-edu-ig@rda-groups.org>
EDISON Project: EDISON Overview

EDISON Consortium Members
Universities
1. University of Amsterdam (UvA) - Coordinator
2. University of Stavanger (UiS)
3. University of Southampton (UK)
Community related partners
4. European Grid Initiative (EGI)
5. FTK Research Institute for Telecommunication and Cooperation (DE) and APARSEN (Alliance Permanent Access to Records of Science in European Network)
Industry partner
6. Engineering (Italy)
SME
7. InMark (Spain)

Project Objectives
Objective 1: Promote the creation of Data Scientist curricula by an increasing number of universities and professional training organisations. Increasing the number of universities offering Data Science programs is essential in order to respond to the increased demand of the European research, industry and public sectors.
Objective 2: Provide conditions and an environment for re-skilling and certifying Data Scientist expertise for graduates, practitioners and researchers throughout their careers. To provide conditions for a sustainable increase in the number of Data Scientists in Europe, EDISON will also support the training of self-made experts by professional organisations, with a view to reaching the target number of graduates that will gradually satisfy the level of research and industry demand in the coming 5-8 years.
Objective 3: Develop a sustainable business model and a roadmap to achieve a high degree of competitiveness for European education and training on Data Science technologies, and provide a basis for the formal recognition of the Data Scientist as a new profession, including supporting the future establishment of formal certification for practicing self-made Data Scientists.

Objective 1: Data Science model curriculum
Objective 1: Promote the creation of Data Scientist curricula by an increasing number of universities and professional training organisations. Increasing the number of universities offering Data Science programs is essential in order to respond to the increased demand of the European research, industry and public sectors.
- A general model/framework of the required competences and skills, starting from the needs of existing and emerging e-infrastructures, different scientific domains and industry sectors (Competence Framework for Data Scientist - CF-DS), including a Glossary and Taxonomy of competences and skills.
- The Body of Knowledge (BoK) for the Data Scientist Profession (DS-BoK) that will be used to map required competencies/skills to existing academic, research and technology disciplines.
- A Model Curriculum for Data Science (MC-DS) that will provide a reference implementation of the proposed CF-DS and DS-BoK.
- A promotional and networking framework (called EDISON-Net) that establishes communication with top-level European institutions in European countries.

Objective 2: Certification and re-skilling
Objective 2: Provide conditions and an environment for re-skilling and certifying Data Scientist expertise for graduates, practitioners and researchers throughout their careers. Provide conditions for a sustainable increase in the number of Data Scientists in Europe, with a view to reaching the target number of graduates that will gradually satisfy the level of research and industry demand in the coming 5-8 years.
- Seamless access to Education and Training e-infrastructures, to support specialist training on Data Intensive Science and technologies, which will include access to data sets for educational purposes, data management and analytical tools, and cloud based virtual labs and classrooms.
- The creation of an EDISON Marketplace of existing and developing education and training courses, education materials and other resources, by leveraging the EGI Knowledge Commons and the Training Market initiative.
- A model and a framework for Data Science education programme accreditation and professional certification, also targeting practicing self-made Data Scientists who want to obtain a formal certification via an academic or professional training institution.
- A network of experts, champion universities, professional training organisations and other stakeholders that will ensure consistent definition of the CF-DS, the DS-BoK and the MC-DS by sharing their experiences.

Objective 3: Sustainability model
Objective 3: Develop a sustainable business model and a roadmap to achieve a high degree of competitiveness for European education and training on Data Science technologies, and provide a basis for the formal recognition of the Data Scientist as a new profession, including supporting the future establishment of formal certification for practicing self-made Data Scientists.
- An education model suitable for all major learning paths (piloting cases) and stakeholders for education and training in Data Science from the consumer, operator and professional community sides. It will include multiple learning models (e.g. residential, lifelong learning, e-learning) and will support Data Science education evolution and individual career path management as technologies evolve.
- A business model that incorporates the proposed education model into a sustainable cycle of the research data value chain, including the major stakeholders and actors from research, industry and government; produce a roadmap for sustainable education and training services as a part of overall data driven technologies development in Europe.
- Establish EDISON Liaison Group(s) (ELG) of independent experts who represent the major stakeholders in Data Science. The ELG will work as a consulting body for the project and will create a basis for a future independent expert group for universities and for the EC.

Data Scientist skill definition
A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of expertise in business needs, domain knowledge, analytical skills, and programming and systems engineering expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle.

Gaps: Education and Training for Data Science and e-infrastructure
- Universities in Europe and the USA (and worldwide) do not offer sufficient possibilities for educating this new type of specialist.
- There are few Data Science programs offered in Europe and the US.
- Some of the currently offered programs are based on slight modification or simple re-branding of former curricula on Business Analytics, Data Analytics or Machine Learning.
- Existing academic disciplines and education programs can create a strong basis for Data Science education and profiling.
- There is no common vision or model of how to combine theoretical and practical components in university curricula and create the necessary education and training infrastructure for practical skills acquisition.
- There is no (or only limited) coordination between industry and academia to achieve sustainability of the education and training of new specialists, which could avoid the current competition between industry and research for new skills and competencies.
- The new profession of the Data Scientist is not well defined and not recognised as a distinct profession, which leads to inconsistent education and training programs and incorrect expectations from employers and job agencies.

Figure: Basic methodology of EDISON, inputs and work packages

WPs Structure
WP1 Coordination and Management
WP2 Educational Focus and DS-BoK
- T2.1 Education and training needs and competencies
- T2.2 Inventory Education and Training resources
- T2.3 Data Scientist: DS-BoK and CF-DS profiles
WP3 Development and Reference Implementation
- T3.1 Development of the EDISON MC-DS
- T3.2 Piloting Use Cases A and B: Data Science Education and Training
- T3.3 Use Case C: Course and Programme accreditation
- T3.4 EDISON Online Educational Environment
WP4 Sustainability and certification of the DSP
- T4.1 Strategic Market analysis for sustainability
- T4.2 Business models definition
- T4.3 Sustainability: Plan and implementation
- T4.4 Definition of a certification scheme
WP5 Dissemination and Engagement
- T5.1 Management of the DEP
- T5.2 Production and distribution of dissemination material

WPs interaction and main tasks
(Diagram; legend: EDISON activities links, community and feedback links)
WP2 Educational Focus and DS-BoK
- Data Scientist Competences and Skills Framework (CF-DS)
- Body of Knowledge Data Science (BoK-DS)
- Education and Training Resources Inventory and Taxonomy (ETR-Tax)
- Education and Training Resources Directory (ETR-Dir)
WP3 Development and Reference Implementation Strategy
- Data Scientist Model Curricula (MC-DS) and Profiles
- Education Training Model (Residential, On-Line, Life-Long Learning)
- Data Science Accreditation Framework
- Education & Training Marketplace
- Education & Training Environment and Infrastructure
- Education and Training Environment
WP4 Sustainability and certification of the Data Scientist Profession
- Business Model
- Sustainability Model and Implementation Plan
- Data Scientist Certification Scheme Plan
- Education and Training Model Roadmap
WP5 Dissemination and Engagement
- EDISON Liaison Groups (Univ, Employers, Experts)
- Promotional and dissemination material
Target community: Students, Researchers, Practitioners, ICT staff
- Education & Training Infrastructure
- Students/Teachers
- Champion Organisations and Pioneer Universities
- Self-made Data Scientists
- Universities and Professional Training Organisations

Ongoing Education Programs Development Timeline and EDISON Project Lifespan
- The proposal involves partners that run individual educational and training programs development
- Enthusiastic universities and community driven activities
- EGI training program/marketplace development
- RDA Interest Group on Edu & Training, IG on Big Data Infrastructure, WG on Big Data Analytics and others
- Funding is requested for coordination of these activities and building a sustainable Data Science education environment and infrastructure
Timeline (figure), Sept 2014 to Sept 2017:
- EDISON Project: Sept 2015 to August 2017
- Other Projects
- RDA and community activity
- EGI community activity
- Universities Data Science Programs Development

EDISON Dimensions: Stakeholders, Scientific Discovery Stages, Data Management Functions
(Diagram labels)
- Stakeholders: Libraries, Universities, Sci/Prof Associations, Research Infra (RI), E-Infra Operators
- Data Management Functions: e-infrastructure Management, Data/Infra Management, Data Analytics, Sci Data Publication
- Scientific Discovery Stages: Experiment, Observation, Data Collection, Data Processing, Data Visualisation, Data Preservation, Publication
- Other labels: Partners with experience, Data Science components, Scientific domains, Domains, Tasks/Roles/Skills

Data Science Curriculum Definition
- Education Infrastructure/Environment
- Skills and competencies, Common Body of Knowledge (CBK)
- Stakeholders
- Training/education infrastructure
- Datasets
- Approaches: Bottom up vs Top down
Specialisations
- Data Analytics and Tools
- Scientific Experiment Automation and Scientific Data Infrastructure
- Profiling for scientific domains
- Data Archiving, Preservation, Librarian
Example CBK Master Program
1. Big Data Definition and Big Data Architecture Framework, Data driven and data centric applications model, Stakeholders and Roles
2. Big Data use cases and application domains taxonomy and requirements, Big Data in industry and science
3. Data structures, SQL and NoSQL databases
4. Data Analytics Methods and Tools, Knowledge Presentation
5. Big Data Management and curation, Big Data Lifecycle, Data Preservation and Sharing, Enterprise Data Warehouses, Agile Data Driven Enterprise
6. Cloud based Big Data infrastructure and computing platforms, Data Analytics application and new Data Scientist skills required
7. Computing models: High Performance Computing (HPC), Massively Parallel Processing (MPP), Grid, Cluster Computing
8. Big Data Security and Privacy, Certification and Compliance

Education & Training Programs Development Approaches
- Reuse experience from existing related programs
  - Cloud Computing Technologies and Tools
  - Theoretical Informatics, Data Analytics and Artificial Intelligence
- Collect examples/projects and build a taxonomy
  - Science/subject domain and long-tail science
  - Libraries and data archives
- General (inter)active education principles
  - Common Body of Knowledge
  - Bloom's taxonomy
  - Pedagogy vs Andragogy
- MOOC: possibilities and limitations
  - Discussion forum for online education, and Research and Reading assignments for on-campus education

How to contact EDISON project team?
- EDISON project invites wide cooperation and contribution to Data Science profession definition
- Project coordinator: University of Amsterdam, Yuri Demchenko <y.demchenko At uva.nl>
- Join RDA Interest Group on Education and Training on Research Data Training (IG-ETRD): https://www.rd-alliance.org/node/971
- Join IG-ETRD mailing list <rda-edu-ig@rda-groups.org>

Additional Information
- McKinsey Institute on Big Data Jobs
- Strata Survey Skills and Data Scientist Self-ID
- Big Data technology domain definition

Data Scientist: New Profession and Opportunities
McKinsey Institute on Big Data Jobs: http://www.mckinsey.com/mgi/publications/big_data/index.asp
- There will be a shortage of talent necessary for organizations to take advantage of Big Data.
- By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
Source: US Bureau of Labor Statistics; US Census; Dun & Bradstreet; company interviews; McKinsey analysis

Strata Survey Skills and Data Scientist Self-ID
Analysing the Analysers. O'Reilly Strata Survey. Harris, Murphy & Vaisman, 2013
- Based on how data scientists think about themselves and their work
- Identified four Data Scientist clusters

Skills and Self-ID Top Factors
- Business
- ML/BigData
- Math/OR
- Programming
- Statistics
ML: Machine Learning; OR: Operations Research

Visionaries and Drivers: Seminal works and High level reports
- The Fourth Paradigm: Data-Intensive Scientific Discovery. By Jim Gray, Microsoft, 2009. Edited by Tony Hey, et al. http://research.microsoft.com/en-us/collaboration/fourthparadigm/
- Riding the wave: How Europe can gain from the rising tide of scientific data. Final report of the High Level Expert Group on Scientific Data. October 2010. http://cordis.europa.eu/fp7/ict/einfrastructure/docs/hlg-sdi-report.pdf
- https://www.rd-alliance.org/
- NIST Big Data Working Group (NBD-WG) https://www.rd-alliance.org/
- AAA Study: Study on AAA Platforms For Scientific data/information Resources in Europe, TERENA, UvA, LIBER, UnivDeb.

The Fourth Paradigm of Scientific Research
1. Theory and logical reasoning
   - Based on observation and imagination of ancient philosophers
2. Observation or Experiment
   - E.g. Newton observed apples falling from the tree to design his theory of mechanics, verified with other observations, validated with experiments
   - Galileo Galilei made experiments with falling objects from the Leaning Tower of Pisa
3. Simulation of theory or model
   - Digital simulation can prove theory or model
4. Data-driven Scientific Discovery (aka Data Intensive Science)
   - e-Science as computing and Information Technologies empowered science
   - More data beats hypothesized theory
   - But the microscope should not be used blindly