Data Intensive Research Initiative for South Africa (DIRISA)



Similar documents
Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing.

The IT Strategic Plan

Workprogramme

A Policy Framework for Canadian Digital Infrastructure 1

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)

Analytics Strategy Information Architecture Data Management Analytics Value and Governance Realization

European Data Infrastructure - EUDAT Data Services & Tools

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21)

S O U T H A F R I C A. Prepared by: Vinny Pillay Occasion: EUREKA - ITEA Co- Summit 2015 Date: 10 March 2015

Using Enterprise Content Management Principles to Manage Research Assets. Kelly Mannix, Manager Deloitte Consulting Perth, WA.

Stewarding Big Data: Perspectives on Public Access to Federally Funded Scientific Research Data

RESEARCH DATA MANAGEMENT POLICY

Library and University Collections Vision, Values, Strategic Goals and Key Performance Areas,

Advancing cyber security. Cyber Security Academy

OpenAIRE Research Data Management Briefing paper

Building Out BPM/SOA Centers of Excellence Business Driven Process Improvement

National Framework for Doctoral Education

Digital Asset Manager, Digital Curator. Cultural Informatics, Cultural/ Art ICT Manager

H2020-LEIT-ICT WP Big Data PPP

THE 7 STEPS TO A SUCCESSFUL CRM IMPLEMENTATION DEPLOYING CRM IN THE NEW ERA OF CONNECTED CUSTOMERS

Strategic Activities to Support Sustainability of Canada s Geospatial Data Infrastructure

NIST Big Data Phase I Public Working Group

DATA GOVERNANCE AT UPMC. A Summary of UPMC s Data Governance Program Foundation, Roles, and Services

White Paper. Enterprise Information Governance. Date Released: September Author/s: Astral Consulting.

Autonomy Consolidated Archive

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

Mastering Big Data. Steve Hoskin, VP and Chief Architect INFORMATICA MDM. October 2015

This Symposium brought to you by

Federal Enterprise Architecture and Service-Oriented Architecture

IDC MaturityScape Benchmark: Big Data and Analytics in Government. Adelaide O Brien Research Director IDC Government Insights June 20, 2014

How to bridge the gap between business, IT and networks

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

IBM Solution Framework for Lifecycle Management of Research Data IBM Corporation

IDC MaturityScape Benchmark: Big Data and Analytics in Government

Synergies between the Big Data Value (BDV) Public Private Partnership and the Helix Nebula Initiative (HNI)

Big Data and Official Statistics The UN Global Working Group

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

EDISON Education for Data Intensive Science to Open New science frontiers

Government Enterprise Architecture

Big Data and Big Data Governance

8970/15 FMA/AFG/cb 1 DG G 3 C

Certified Information Professional 2016 Update Outline

SUMMARY PROFESSIONAL EXPERIENCE. IBM Canada, Senior Business Transformation Consultant

Big Data Executive Survey

Breaking Down the Silos: A 21st Century Approach to Information Governance. May 2015

9360/15 FMA/AFG/cb 1 DG G 3 C

Whitepaper Data Governance Roadmap for IT Executives Valeh Nazemoff

SURFsara Data Services

Strategy Providing resources for staff and students in higher and further education in the UK and beyond

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Transforming CliniCal Trials: The ability to aggregate and Visualize Data Efficiently to make impactful Decisions

ESS event: Big Data in Official Statistics

Developing Excellence in Leadership, Training and Science

Information and Communications Technology Strategy

How To Be An Architect

The amount of data you have doubles every 12 to 18 months. Information Asset Management that Drives Business Performance Jeremy Pritchard 10/06/2015

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

INVENTING THE FUTURE HITACHI DATA SYSTEMS BIG DATA ROADMAP MICHAEL HAY

Science and Engineering Professional Framework

Head of Engineering Job Description

Bruce Rogers. Forbes. Chief Insights Officer and Head of the CMO Practice

Cambridge University Library. Working together: a strategic framework

The Next Wave of Data Management. Is Big Data The New Normal?

Enabling Data Quality

Griffith IT Plan Delivering alignment and agility

Are You Big Data Ready?

INTERIM LIBRARY STRATEGIC PLAN

Explore the Possibilities

Visual Enterprise Architecture

BIG DATA GOVERNANCE: BALANCING BIG DATA VELOCITY & INFORMATION GOVERNANCE

High-Performance Analytics

Manager Service Transition

Data Management Emerging Trends. Sourabh Mukherjee Data Management Practice Head, India Accenture

UNITED STATES AIR FORCE. Air Force Product Support Enterprise Vision

The Future of Census Bureau Operations

From Stored Knowledge to Smart Knowledge

National Cybersecurity Challenges and NIST. Donna F. Dodson Chief Cybersecurity Advisor ITL Associate Director for Cybersecurity

High Performance Computing

What to Look for When Selecting a Master Data Management Solution

Transcription:

Data Intensive Research Initiative for South Africa (DIRISA) A Reinterpreted Vision A. Vahed 25 November 2014

Outline Background Data Landscape Strategy & Objectives Activities & Outputs Organisational Structure & Implementation November 2014 CSIR, 2014 2

NICIS NICIS National data integrative enabler supporting MTSF RDP SARIR, Overarching coordination & national strategy National (Tier1) Institutional (Tier2) Amalgamated, physically distributed cyber platform for data intensive research Data Networking Computing Crosscut S&T Physical-Service Support Application Computing Services (CHPC +) Phy Sci & Eng. Earth & Environment Materials & Manuf. Energy Networking Services (SANReN) Health, Bio & Food Humans & Society Data Services (DIRISA) Academia Science councils : : RI s Data intensive research environments (SA_Grid Cloud) Skills & expertise Core Services Networked resources November 2014 CSIR, 2014 3

Other views Capability Industrial Sector Awareness Connections Computing Hardware Software Data Networks Security Computing & Data Skills Sector Domain Knowledge D. Tildesley: Vision of integrated e-infrastructure ecosystem November 2014 CSIR, 2014 4

NICIS: DIRISA evolution DIRISA (Infrastructure) VLDB (Repository) DIRISA (Stewardship) NICIS Recommendations expanded Data Services (DIRISA) ambitious proposal on data services [ ] predicated on economic competitiveness, human resource development and industrial benefit Innovation for socio-economic development & knowledge economy Step change: NDP, MTSF, NSI, November 2014 CSIR, 2014 5

Data landscape Extreme Data Global, massive, well-typed, homogeneous volumes LHC & SKA Research Big Data Large, mixed-typed volumes Imagery, text, audio, etc Business Big Data Lots of (closed) transactional, serialised data Sentiment data (Facebook, Twitter, etc) Long Tail Data Lots of (poorly managed) relatively small data sets Data volume Extreme Data Big Data The underlying attributes and requirements are very different! Business Big Data Research Big Data Long Tail Number of data sets November 2014 CSIR, 2014 6

Data class characteristics Class Ownership Big Data Vs Technology Skills Research Env Extreme International Vol, Vel, Open Big Data Business Big Data Research Businesses National, Institutional Vol, Vel, Var, Closed Vol, Vel, Var, Ver, Open access Exascale Comp Maths / Stats / Astro, Visual Clusters, SAS, Cloud, Hadoop HPC, Clusters, Grid, Cloud, data transfer Data Engineers Data Scientists, Domain Researchers, Comp Scientists, Maths, Model Distributed teams Team VRE, multi-disc, RIs Long Tail Department, Individual Var, Ver Grid, cloud Stats, Comp Science Individuals, PhD, PD, Ris November 2014 CSIR, 2014 7

Vision & Strategy Vision: Vibrant communities of research and industry access, share, reuse, combine data in a cohesive network of data repositories, governed by sound data stewardship policies and principles, supported by robust services and environments, managed by expert and skilled people, and produce data intensive research output that support innovation for socioeconomic growth and improved service delivery Strategic principles Provide national capstone coordination Data intensive research initiatives; Stakeholder engagement Promote & support data intensive research Higher education; PPPs Data stewardship (more than DAAS) Robust infrastructure & enabling environments; E2E research data lifecycle Strategy & leadership Priority domains; Cross-cutting (inter- and multi-disciplinary) November 2014 CSIR, 2014 8

Skills and expertise Advocacy & Outreach Value proposition Services & Environments Rich world of discipline, cross- and multi disciplinary data analytics (enrichment, annotation, ) Data stewardship Harmonised world of data management (preservation, workflows, ) Data Acquisition Standards & Policies Literature Sharing Federated world of data generators and sensory observations (models and measurements) November 2014 CSIR, 2014 9

Key Objectives 1. Provide robust infrastructure and services Federate Tier 1 & Tier 2 repositories Enabling environments Journal licencing 2. Ensure good data stewardship Policies, protocols & standards Internationally benchmarked 3. Develop capacity & expertise Data intensive research and Infrastructure & Services Policies & Standards Capacity & Expertise Data Stewardship Data science programmes with HEIs & private sector 4. Advocacy & outreach Data stewardship and data sharing Stakeholder engagement establish and leverage existing forums 5. Coordination & strategy National data intensive research activities Inform on and guide aligned & consolidated strategic agenda Advocacy & Outreach Coordination & Strategy November 2014 CSIR, 2014 10

Scope Primarily a national capstone orchestration, enabling, supporting and facilitation role Coordinate, not prescribe, data science capacity development Funded capacity development limited to DIRISA s remit Promote & support priority research BUT with caveats of data stewardship plan and capacity building Guide research strategy and funding (Big Data, NRF, ) Provide services and research environments BUT not a domain research funder Promote, not enforce, data contribution and adoption of Open standards & Open data where feasible Support data stewardship in federated context November 2014 CSIR, 2014 11

Issues Research data lifecycle Observation / Generation : Preservation/ Expunction Ethics & privacy Re-identification Discriminative profiling Who watches the watchers? Access spectrum Trust & security IP & Copyright Data sharing mind-set (What s in it for me?) Laws have borders; data does not Sharing & reuse Ethics & IP Metadata Long term preservation Collection & formats Organising & storing November 2014 CSIR, 2014 12

Stakeholder Engagement Stakeholders RIs & Champions Academia & research councils Funders Industry International forums Engagement is critical Strategic research agenda Data stewardship policy & frameworks Coordination of initiatives Contribution & participation Impact Outcomes Outputs Activities Inputs & Enablers MTSF, NDP & NSI goals achievement World class grand science Innovation supporting knowledge economy Globally competitive industry Good data stewardship Vibrant research communities National science strategy achievements Maximised value from national data investment Data policies & federated data Capacity development programmes Services & environments Coordinated data intensive research activities Develop infrastructure & services Implement standards & policy Develop capacity & expertise Advocate & promote Coordinate strategic agenda Skills Funding Infrastructure Stakeholders November 2014 CSIR, 2014 13

DIRISA Roadmap Year 1 (2014/15) Survey & assess Year 2 (2015/16) Develop & build Year 3 (2016/17) Grand research Year 5+ (> 2017) Global competitive Institutional arrangements Infrastructure, policies & capacity Proposals Early adopters Infrastructure & services Big projects Policies & processes Capacity building Federated network Open data & publishing Business partnerships E2E Data Mngt Beyond eresearch Long running & real-time science Fused & streamed data Action/Task - Institutional arrangements - Set up forums & events - Engage & consult - Survey, assess As-Is situation - Prioritise areas & needs - Coordinate new & ongoing projects Year 1 Outputs - Tier 1 & core services - Data stewardship policies & framework (RDA, etc) - University data science programmes - Solicited proposals in data stewardship - Data intensive research strategy coordinated with funders, strategies and key initiatives November 2014 CSIR, 2014 14

Conclusion Business plan further provides Detailed activities, outputs detailed over 3-year timeframe Governance and managerial structure; institutional arrangements and organizational structure Major premises, risks and contingencies DIRISA s new remit being formalised Implementation plan with stakeholders November 2014 CSIR, 2014 15

Thank you The good thing about data is that there s so much of it The bad thing about data is that there s so much of it

Organisational structure November 2014 CSIR, 2014 17

Science strategies Research Communities Innovation & Discovery Collaboration Capacity Development Services & Standards Support & Enablement Advocacy & Outreach CHPC SANReN DIRISA Infrastructure November 2014 CSIR, 2014 18