The National Consortium for Data Science (NCDS)


The National Consortium for Data Science (NCDS)
A Public-Private Partnership to Advance Data Science
Ashok Krishnamurthy, PhD
Deputy Director, RENCI
University of North Carolina, Chapel Hill

What is NCDS?
NCDS is a public-private partnership to advance data science.
Mission: Leadership in data science research & education; help industry use the power of data to drive economic growth.
Vision: A focused multi-sector, multidisciplinary data science community to solve big data challenges and drive the field forward.
Goals:
- Engage broad communities of data experts
- Coordinate data science research priorities that span disciplines and industries
- Facilitate development of education & training programs
- Support development of technical, ethical & policy standards
- Apply NCDS expertise to data challenges in science, business and government

NCDS Members

The Big Data Frontier

Why a Consortium?
- Time: A consortium can plant a stake in the ground quickly; significant funding and full-time staff are not essential to launch.
- Participation: Shared vision, the ability to have your voice heard and to define the issues to be tackled.
- Flexibility: Able to try different models, different key projects, and different core foci to see what works best and to respond to changing and varied needs and interests.
- Community: A consortium is a way of building a community that can eventually become the foundation for a center (a physical place).

NCDS Components
Data Lab & Observatory:
- Shared, distributed infrastructure housing large organized data
- Serves as a platform for data R&D and data science education (graduate certificates, MS)
Data Fellows Program:
- Seed grants for faculty to work on consortium-approved projects; NCDS review panel evaluates proposals
- Industry internships for graduate students
- Visiting industry data scientists at member universities
Working Groups:
- Year-long deep dive into topics of interest to members
- Produces position papers, workshops, software, events, etc.
Data Science Events:
- Leadership Summits (Spring)
- Data Matters Short Courses (Summer)
- Student Career events (Fall/Spring)
- Invited lectures and outreach (ongoing)

Accomplishments: 2013-2014
Organizational:
- Bylaws passed
- Steering committee
- Kickoff featuring Dr. Eric Green (Director, NHGRI) and US Rep. David Price
- 10 paid memberships so far
Programmatic:
- NCDS Leadership Summit (April 2013)
- Five Faculty Fellows appointed (October 2013)
- Student-Industry-Faculty career awareness event (April 2014)
- Data Innovation Showcase (May 2014)
- Data Matters short course series (June 2014)
- Observatory active with data sets (June 2014)
Upcoming:
- Tech Talks with UNC Computer Science and UNC Career Services (October 2014)
- New Data Fellows CFP (October 2014)
- Working Groups (Fall 2014)

NCDS Data Cyberinfrastructure
- Secure Research Workspace / Secure Medical Research Workspace: Secured virtual environments
- ExoGENI/ADAMANT: Federated Infrastructure as a Service
- iRODS: Policy-driven data management
- DataBridge: Social media-like discovery of useful data sets
- Genomic Medical Workflow Engine: Informatics and HPC in High Throughput Sequencing
Key: Infrastructure that adapts to problems

What is iRODS?
iRODS is open source data grid middleware: free to use, free to modify, free to contribute. It sits between the files and the user.
iRODS provides:
- Data Discovery: metadata
- Workflow Automation: policies (any condition; any action)
- Secure Collaboration: sharing without losing control
- Data Virtualization: file system flexibility
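The phrase "policies: any condition; any action" can be pictured with a small sketch. The Python below is not the iRODS rule language or any iRODS API. It is a hypothetical, self-contained model in which each policy pairs a condition on a file's metadata with an action. Every name in it (`Policy`, `apply_policies`, `POLICIES`, the metadata keys and thresholds) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metadata record for one file (not an iRODS data structure).
Metadata = Dict[str, object]


@dataclass
class Policy:
    """One illustrative policy: run `action` whenever `condition` holds."""
    name: str
    condition: Callable[[Metadata], bool]
    action: Callable[[Metadata], None]


def apply_policies(meta: Metadata, policies: List[Policy]) -> List[str]:
    """Evaluate every policy in order; return the names of those that fired."""
    fired = []
    for policy in policies:
        if policy.condition(meta):
            policy.action(meta)
            fired.append(policy.name)
    return fired


def mark_for_archive(meta: Metadata) -> None:
    # Assumed action: tag the file for a (hypothetical) archive storage tier.
    meta["tier"] = "archive"


def require_two_replicas(meta: Metadata) -> None:
    # Assumed action: record a minimum replica count for redundancy.
    meta["min_replicas"] = 2


# Example policies; the thresholds and key names are made up for this sketch.
POLICIES = [
    Policy("archive-large-files",
           lambda m: m.get("size_bytes", 0) > 1_000_000_000,
           mark_for_archive),
    Policy("replicate-sequencing-data",
           lambda m: m.get("project") == "sequencing",
           require_two_replicas),
]

if __name__ == "__main__":
    record = {"size_bytes": 5_000_000_000, "project": "sequencing"}
    print(apply_policies(record, POLICIES))
    print(record)
```

In this sketch the conditions and actions are ordinary functions, so a new policy is just one more entry in the list. That loosely mirrors the "any condition; any action" idea without claiming anything about how iRODS implements it.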

iRODS 4.0: Ready for Enterprise
Product of nearly 20 years of research and development, funded by DARPA, DOE, NASA, NSF, NARA, and NOAA.
Sustainability: Formation of the iRODS Consortium
- 6 members, presently: developers, users, storage vendors
- Provides interaction between user/developer community
- Professional integration services, technical support, training and certification
Enterprise Quality: Starting with iRODS 4.0, the entire codebase has been reviewed and restructured.
- Plug-in architecture
- Each change is verified with a test case in a continuous integration suite
- Pre-compiled binary packages are available for several Linux distributions and multiple database management systems

Who Uses iRODS?
The Wellcome Trust Sanger Institute manages 2 PB of data with iRODS.
- Data discovery and workflow automation: data is tagged with processing history and checksums
- Secure collaboration: workgroups can share with each other while independently maintaining archiving and access policies
- Data virtualization: data is replicated for redundancy and high availability

Who Uses iRODS?
The iPlant Collaborative uses iRODS to manage over 112M files (>750 TB) with over 20,000 users.
- Data discovery: Templates guide application of metadata according to international data curation standards
- Workflow automation: Fine-grained user permissions conditioned on domain, group, file size, metadata
- Data virtualization: Data is easily moved between storage and compute resources, always maintaining a specified level of redundancy

Data Science Education
Modular courses for 11 month program
- Graduate Certificate in Data Science (Half time)
- MS in Data Science (Full time)

Conclusion
Developing Data Science Will:
- Develop the next generation of data science experts and leaders
- Create strategies, practices and scientific methods for understanding data
- Enable more collaborations among data and domain scientists, business, academia and government
- Assist those who are struggling to collect, analyze, manage and use data
- Establish methodologies for measuring the value and impact of data

THANK YOU!