National Big Data R&D Initiative




Suzi Iacono, PhD, National Science Foundation
Co-chair, NITRD Big Data Senior Steering Group
CASC Spring Meeting, April 23, 2014

Why Is Big Data Important?
- Transformative implications for commerce and the economy
- Critical to accelerating the pace of discovery in almost every science and engineering discipline
- Potential for addressing some of society's most pressing challenges
(Image credit: Chi Birmingham)

A National Imperative
PCAST calls on the Federal government to increase R&D investments for collecting, storing, preserving, managing, analyzing, and sharing the increasing quantities of data. PCAST further observed that the ability to gain new insights, moving from data to knowledge to action, has tremendous potential to transform all areas of national priority.
Source: PCAST (December 2010), Report to the President and Congress: Designing a Digital Future, a periodic congressionally mandated review of the Federal Networking and Information Technology Research and Development (NITRD) Program. Available at: http://www.whitehouse.gov/administration/eop/ostp/pcast

National Big Data R&D Launch, March 29, 2012
- Led by a cross-agency Big Data Senior Steering Group chartered by the White House OSTP in spring 2011:
  - Co-chaired by NSF and NIH
  - Members from 18 agencies
  - Charged with developing a framework and a plan
- Major announcements: NSF, NIH, USGS, DoD, DARPA, DOE
- Cornerstone announcement: the Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) solicitation
  - All NSF directorates and 8 NIH institutes
  - Research thrusts: Collection, Storage, and Management; Data Analytics; Research in Data Sharing and Collaboration
More information available at: http://nsf.gov/news/news_summ.jsp?org=cise&cntn_id=123607&preview=false

Framework for Investments
- Foundational research to develop new techniques and technologies to derive knowledge from data
- New cyberinfrastructure to manage, curate, and serve data to domain research communities
- New approaches for education and workforce development
- New types of inter-disciplinary collaborations, grand challenges, and competitions

Foundational Research

Big Data Core Technologies
[Figure: BIGDATA awards by research area, percent by number of projects. Areas: Data Collection, Management, Mining and Machine Learning; Health and Bio Informatics; Social Networks; Physical Sciences and Engineering; Algorithmic Foundations; Cyberinfrastructure.]
In 2012 and 2013, NSF and NIH awarded 45 projects, ranging from $250K/year for up to 3 years to $1M/year for up to 5 years.

Critical Techniques and Technologies for Advancing Big Data Science and Engineering (NSF 14-543)
Two categories for submission:
- Foundational: encourages fundamental, novel techniques, theories, methodologies, and technologies of broad applicability
- Innovative Applications: encourages novel techniques, theories, methodologies, and technologies of interest to at least one specific application
Due date: June 9, 2014
Size: up to $500K per year for up to 4 years
Not joint with NIH this year

Domain Specific Research & Infrastructure

Agency Program Highlights
- DARPA is continuing programs in Big Data and anomaly detection, machine learning, visualization, and text extraction, and is starting 3 new programs (Big Mechanism, Memex, and Big Data Capstone) focused on moving from knowledge to discovery and action.
- NIST's Big Data Public Working Group is a community of interest for industry, academia, and government; it will develop consensus definitions, taxonomies, secure reference architectures, and a technology roadmap.
- DOE's science portfolio supporting extreme-scale science focuses on multidisciplinary R&D for data reduction, scalability, data fusion across multiple experiments, simulations, and collaboration infrastructures.
- NSF's cross-directorate investment ($155M) includes broad programs in core technologies, domain science areas, and infrastructure. NSF is also making specific investments in Big Data centers at Berkeley (AMPLab) and MIT (Brains, Minds & Machines), in Earth science (EarthCube), and in Big Data infrastructure through the DIBBs program.
- NIH's 7-year, $100M/year BD2K initiative focuses on facilitating broad use of biomedical big data, developing and disseminating analysis methods and software, enhancing training, and establishing Centers of Excellence.

Agency Program Highlights (continued)
- USAID continues major programs such as FEWS NET and is focusing on techniques to integrate multi-resolution datasets and to convert historical data to machine-readable formats (e.g., from PDF).
- The US Group on Earth Observation (GEO) is continuing work on data integration and interoperability of earth science data across agencies and has compiled inventories and prioritized lists of earth datasets.
- NASA projects a dramatic increase in simulation data holdings; new programs will focus on managing exascale science, improving modeling results with data scale-up, and improving small-scale data science and collaborative workbenches.
- NOAA's Big Data priorities focus on weather and climate (specifically hurricane) modeling and prediction, federal and public-private partnerships, and leveraging HPC. In particular, NOAA is leveraging ARRA investments that dramatically increased data resolution and granularity to improve modeling.
- Data.gov is launching an update to streamline and automate agency submission of datasets to the platform, which should dramatically increase the number of datasets available on the site.

Building a Big Data R&D Pipeline: Foundational Research → Domain-Specific Application Research → Cyberinfrastructure Pilots

Education and Workforce Development

Education, Learning, Workforce Development, Computational and Data-enabled Science
McKinsey & Company: By 2018 the United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.[1]
[1] McKinsey & Company (May 2011), Big data: The next frontier for innovation, competition, and productivity. Available at: http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation

Workforce Recommendations
TechAmerica, Demystifying Big Data: A Practical Guide to Transforming the Business of Government (2012)
- Expand the talent pool by creating a formal career track for line of business and IT managers and establish a leadership academy to provide Big Data and related training and certification. (p. 8)
- Leverage the data science talent by establishing and expanding college-to-government service internship programs focused specifically on analytics and the use of Big Data. (p. 8)
IBM Center for The Business of Government, From Data to Decisions III: Lessons from Early Analytics Programs (2013)
- To encourage data use and spark insight, enable employees to easily see, combine, and analyze it. (p. 27)
- Leaders and managers should demand and use data, and provide employees with targeted on-the-job training. (p. 30)
McKinsey Global Institute, Big data: The Next Frontier for Innovation, Competition, and Productivity (2011)
- Build human capital for big data. (p. 117)
McKinsey Global Institute, Open Data: Unlocking Innovation and Performance with Liquid Information (2013)
The Center for Data Innovation, Data Innovation 101: An Introduction to the Technologies and Policies Supporting Data-Driven Innovation (2013)
- Human capital: Government agencies at the federal, state, and local levels should continue to engage directly with the data science community and participate in civic hackathons, public coding challenges, and other events hosted by the data science community. (p. 10)
- Government can also help spur the development of the necessary human capital by becoming a leader, rather than a laggard, in the adoption of data-driven innovation. (p. 10)
- Additionally, public-sector agencies can help address the need for workers with strong data and analytical skills through education and immigration policies. (p. 13)

Workforce Recommendations (continued)
Software & Information Industry Association, Data-Driven Innovation, A Guide for Policymakers: Understanding and Enabling the Economic and Social Value of Data (2013)
- Policies must continue to balance the need to protect the privacy of students with enabling data-driven innovation (DDI) to greatly enhance the teaching and learning experience. (p. 24)
OECD, Exploring Data-Driven Innovation as a New Source of Growth (2013)
- Skills and employment. (p. 26)
Bottom line: Expand the talent pool!

Big Data (Data Science) Programs in the USA
- Bachelor's degree programs: 6
- Master's degree programs: 57
- PhD programs: 2
- Certificate programs: 10

NSF Research Traineeship (NRT): Preparing professionals in emerging STEM fields vital to the nation
- Priority research theme: data-enabled science and engineering
- Purpose: create and promote new, innovative, effective, and scalable models for STEM graduate student training, and prepare the scientists and engineers of the future, particularly in emerging STEM fields vital to the nation
- Anticipated award amount: up to $3M over 5 years
- Due dates: letter of intent (LOI) due May 20; full proposal due June 24
- CISE investment: $7.59M

Data Scientist Match-Making Service
- DataKind partners with Pivotal to bring the industry's top data analytics talent to bear on society's greatest challenges currently being tackled by non-profit organizations, and is also exploring similar partnership opportunities with Teradata.
- Partnership with The Mission Continues to better understand the effects of its volunteer programs on improving veterans' lives
- Partnership with Medic Mobile to quantitatively measure the impact of its many health initiatives helping under-served and disconnected communities around the world
(Image credit: R. Morris / Chattanooga)

Summer School: Data Science for Social Good (Eric & Wendy Schmidt Foundation and the University of Chicago)
We're training data scientists to tackle problems that really matter.
- Apply to be a Summer Fellow: deadline February 1, 2014
- Apply to be a Summer Mentor: deadline February 1, 2014
- Apply to be a Project Partner: deadline January 10, 2014
Application deadlines for the 2014 Summer Program are over. Get in touch with us if you want to work with us before or after the summer as a fellow, mentor, or project partner.

New Types of Collaborations

Data to Knowledge to Action: White House event encouraging public-private partnerships across the country, November 12, 2013

Big Data Challenge: engage a broader audience and create a useful product
Types of competition:
- Tool Design: Can we create a new tool that processes particular types of Big Data of interest?
- Information Extraction/Data Visualization: Can we design a system to help users make sense of data that is important to them?
- Data Collection: What interesting data are not being collected right now, and how can we collect and use these data?
- System Design: How do we design a platform that addresses collection, processing, and/or collaboration for a particular type of Big Data?
- Improving Core Technology: How can we improve the performance of an existing Big Data tool or technique?
- Education: Can we create an app that teaches or improves student or public understanding of data analytics?
Topical domains: Healthcare; Energy; Earth Science/Climate; Education (e.g., MOOC data, cyberlearning); Fraud, Waste, and Abuse (Data Forensics); Government transparency; Privacy; Cybersecurity; Human in the Loop/Human as Data Integrator/Crowdsourcing; Scientific discovery
Goals:
- Engagement: increase awareness among the general public; improve federal agency buy-in on interagency collaborations; engage and improve public-private partnerships
- Usefulness: create a solution for a broader Big Data issue (e.g., a platform for processing PDF data); create an application that addresses a specific usage of interest; develop a new tool or technique of interest to the Big Data R&D community (e.g., algorithms)
- Novelty: solve a new, unique problem; address an issue that isn't or hasn't been a good fit for traditional mechanisms (e.g., grants, contracts)

Design choices for the challenge depend on the type and topic chosen.
Implementation Partner
- Should we partner with a challenge platform, such as TopCoder, InnoCentive, Kaggle, or Mozilla?
- How do we determine the implementation partner? Should it be through a competitive process?
Phases and Outputs
- Ideation: a short white paper describing a concept and implementation
- System Design: a specification for a Big Data process/technique/platform implementation; can be …
- Development: a built end-product tool or platform, or a test prototype
Participants
- Who is the target participant(s) in the challenge? General public, academic researchers, graduate and undergraduate students, corporate teams
- There could be multiple tracks for different types of participants (e.g., student, corporate), each with a track winner, along with an ultimate winner.
Industry Partners
- Should we engage industry partners, and if so, what would they contribute? Prize money sponsorship, a platform for data processing, marketing and outreach, proprietary datasets
- Partners benefit by getting publicity for their company or their contributed technology, potential access to the IP developed in the competition, and interaction with agencies.
Timeline for Rollout
- What agencies are interested in participating?
- What is a reasonable timeline for rollout given budgets and agency clearance processes?
- Are there other major events planned where synchronizing with them, or avoiding overlap, would be useful?
Funding Level
- What is the approximate funding level for a prize?
- The size of prizes for different topical domains can vary dramatically for the same type of work (i.e., ideation, design, development).

Policy Issues

Public Access
The February 22, 2013 White House memo directs United States federal agencies to develop a plan to support increased public access to the results of federally funded research:
- Peer-reviewed publications should be stored for long-term preservation and be publicly accessible to search, retrieve, and analyze in ways that maximize the impact and accountability of the Federal research investment.
- Digitally formatted scientific data resulting from unclassified research should be stored and publicly accessible to search, retrieve, and analyze.
Implementation plans for public access could vary by discipline, and new business models for universities, libraries, publishers, and scholarly and professional societies could emerge.

Federal Investments: Past Focus (Data → Knowledge → Action)
- Data collection, fusion, integration
- Large-scale/distributed analytics
- Data mining

Federal Investments: New Paradigms (Data → Knowledge → Action)
- Discovery informatics
- Human in the loop
- Validation, statistics, and heuristics

Game-Changing Themes
- Discovery informatics: accelerating the rate of discovery
- Human in the loop: knowing what is in the black box; next-gen bar charts
- Systems that reason: à la Big Mechanism
- Broadening participation: including women and underrepresented minorities
- Data Science for Public Good: more partnerships

Big Opportunities for the Future
- Transformative implications for commerce and the economy
- Critical to accelerating the pace of discovery and innovation
- Enhancing quality of life and societal wellbeing
But the future won't happen the way we intend it if we don't work together!

Thanks! siacono@nsf.gov