National Big Data R&D Initiative
Suzi Iacono, PhD, National Science Foundation
Co-chair, NITRD Big Data Senior Steering Group
for CASC Spring Meeting, April 23, 2014
Why is Big Data Important?
- Transformative implications for commerce and the economy
- Critical to accelerating the pace of discovery in almost every science and engineering discipline
- Potential for addressing some of society's most pressing challenges
Image credit: Chi Birmingham
A National Imperative
PCAST calls on the Federal government to increase R&D investments for collecting, storing, preserving, managing, analyzing, and sharing the increasing quantities of data. Furthermore, PCAST observed that the ability to gain new insights (to move from data to knowledge to action) has tremendous potential to transform all areas of national priority.
Source: PCAST (December 2010), Report to the President and Congress: Designing a Digital Future, a periodic, congressionally mandated review of the Federal Networking and Information Technology Research and Development (NITRD) Program. Available at:
National Big Data R&D Launch: March 29, 2012
- Led by the cross-agency Big Data Senior Steering Group, chartered in spring 2011 by the White House OSTP:
  - Co-chaired by NSF and NIH
  - Members from 18 agencies
  - Charged with developing a framework and a plan
- Major announcements: NSF, NIH, USGS, DoD, DARPA, DOE
- Cornerstone announcement: Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIG DATA) solicitation
  - All NSF directorates and 8 NIH institutes
  - Research thrusts: Collection, Storage, and Management; Data Analytics; Research in Data Sharing and Collaboration
More information available at:
Framework for Investments
- Foundational research to develop new techniques and technologies to derive knowledge from data
- New cyberinfrastructure to manage, curate, and serve data to domain research communities
- New approaches for education and workforce development
- New types of interdisciplinary collaborations, grand challenges, and competitions
Foundational Research
Big Data Core Technologies
[Pie chart: percent of projects, by number, across Data Collection, Management, Mining and Machine Learning; Health and Bioinformatics; Social Networks; Physical Sciences and Engineering; Algorithmic Foundations; and Cyberinfrastructure.]
In 2012 and 2013, NSF and NIH awarded 45 projects, ranging from $250K/year for up to 3 years to $1M/year for up to 5 years.
Critical Techniques and Technologies for Advancing Big Data Science and Engineering (NSF )
Two categories for submission:
- Foundational: encourages fundamental, novel techniques, theories, methodologies, and technologies of broad applicability
- Innovative Applications: encourages novel techniques, theories, methodologies, and technologies of interest to at least one specific application
Due date: June 9, 2014
Size: up to $500K per year for up to 4 years
Not joint with NIH this year
Domain Specific Research & Infrastructure
Agency Program Highlights
- DARPA is continuing programs in Big Data and anomaly detection, machine learning, visualization, and text extraction, and is starting three new programs (Big Mechanism, Memex, and Big Data Capstone) focused on moving from knowledge to discovery and action.
- NIST's Big Data Public Working Group is a community of interest for industry, academia, and government; it will develop consensus definitions, taxonomies, secure reference architectures, and a technology roadmap.
- DOE's Science portfolio supporting Extreme Scale Science focuses on multidisciplinary R&D for data reduction, scalability, data fusion across multiple experiments and simulations, and collaboration infrastructures.
- NSF's cross-directorate investment ($155M) includes broad programs in core technologies, domain science areas, and infrastructure. NSF is also making specific investments in Big Data centers at Berkeley (AMPLab) and MIT (Brains, Minds, & Machines), in Earth science (EarthCube), and in Big Data infrastructure through the DIBBs program.
- NIH's 7-year, $100M/year BD2K initiative focuses on facilitating broad use of biomedical big data, developing and disseminating analysis methods and software, enhancing training, and establishing Centers of Excellence.
Agency Program Highlights
- USAID continues major programs such as FEWS-Net and is focusing on techniques to integrate multi-resolution datasets and to convert historical data into machine-readable formats (e.g., from PDF).
- The US Group on Earth Observation (GEO) is continuing work on data integration and interoperability of Earth science data across agencies and has compiled inventories and prioritized lists of Earth datasets.
- NASA projects a dramatic increase in simulation data holdings; new programs will focus on managing exascale science and improving modeling results through data scale-up, as well as on improving small-scale data science and collaborative workbenches.
- NOAA's Big Data priorities focus on weather and climate (specifically hurricane) modeling and prediction, federal and public-private partnerships, and leveraging HPC. Specifically, NOAA is leveraging ARRA investments that dramatically increased data resolution/granularity to improve modeling.
- Data.gov is launching an update that streamlines and automates agency submission of datasets to the platform, which should dramatically increase the number of datasets available on the site.
Building a Big Data R&D Pipeline
Foundational Research → Domain Specific Application Research → Cyberinfrastructure Pilots
Education and Workforce Development
Education, Learning, Workforce Development, Computational and Data-enabled Science
McKinsey & Company: By 2018, the United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.[1]
[1] McKinsey & Company (May 2011), Big data: The next frontier for innovation, competition, and productivity. Available at:
Workforce Recommendations
TechAmerica, Demystifying Big Data: A Practical Guide to Transforming the Business of Government (2012)
- Expand the talent pool by creating a formal career track for line-of-business and IT managers, and establish a leadership academy to provide Big Data and related training and certification. (Page 8)
- Leverage data science talent by establishing and expanding college-to-government service internship programs focused specifically on analytics and the use of Big Data. (Page 8)
IBM Center for the Business of Government, From Data to Decisions III: Lessons from Early Analytics Programs (2013)
- To encourage data use and spark insight, enable employees to easily see, combine, and analyze it. (Page 27)
- Leaders and managers should demand and use data, and provide employees with targeted on-the-job training. (Page 30)
McKinsey Global Institute, Big Data: The Next Frontier for Innovation, Competition, and Productivity (2011)
- Build human capital for big data. (Page 117)
McKinsey Global Institute, Open Data: Unlocking Innovation and Performance with Liquid Information (2013)
The Center for Data Innovation, Data Innovation 101: An Introduction to the Technologies and Policies Supporting Data-Driven Innovation (2013)
- Human capital: Government agencies at the federal, state, and local levels should continue to engage directly with the data science community and participate in civic hackathons, public coding challenges, and other events hosted by the data science community. (Page 10)
- Government can also help spur the development of the necessary human capital by becoming a leader, rather than a laggard, in the adoption of data-driven innovation. (Page 10)
- Additionally, public-sector agencies can help address the need for workers with strong data and analytical skills through education and immigration policies. (Page 13)
Workforce Recommendations
Software & Information Industry Association, Data-Driven Innovation, A Guide for Policymakers: Understanding and Enabling the Economic and Social Value of Data (2013)
- Policies must continue to balance the need of protecting the privacy of students, while enabling DDI to greatly enhance the teaching and learning experience. (Page 24)
OECD, Exploring Data-Driven Innovation as a New Source of Growth (2013)
- Skills and employment. (Page 26)
Bottom line: Expand the talent pool!
Big Data (Data Science) Programs in the USA
- Bachelor's degree programs: 6
- Certificate programs: 10
- PhD programs: 2
- Master's degree programs: 57
NSF Research Traineeship (NRT)
Preparing professionals in emerging STEM fields vital to the nation
- Priority research theme: data-enabled science and engineering
- Purpose: create and promote new, innovative, effective, and scalable models for STEM graduate student training, and prepare scientists and engineers of the future, particularly in emerging STEM fields vital to the nation.
- Anticipated award amount: up to $3M over 5 years
- Due dates: LOI due May 20; full proposal due June 24
$7.59M CISE investment
Data Scientist Match-Making Service
- DataKind partners with Pivotal to bring industry's top data analytics talent to bear on society's greatest challenges currently being tackled by non-profit organizations; it is also exploring similar partnership opportunities with Teradata.
- Partnership with The Mission Continues to better understand the effects of its volunteer programs on improving veterans' lives
- Partnership with Medic Mobile to quantitatively measure the impact of its many health initiatives helping under-served and disconnected communities around the world
Image credit: R. Morris / Chattanooga
Summer School: Data Science for Social Good
Eric & Wendy Schmidt Foundation & the University of Chicago
We're training data scientists to tackle problems that really matter.
- Apply to be a Summer Fellow (deadline: February 1, 2014)
- Apply to be a Summer Mentor (deadline: February 1, 2014)
- Apply to be a Project Partner (deadline: January 10, 2014)
Application deadlines for the 2014 Summer Program are over. Get in touch with us if you want to work with us before or after the summer as a fellow, mentor, or project partner.
New Types of Collaborations
Data to Knowledge to Action: White House event encouraging public-private partnerships across the country, November 12, 2013
Big Data Challenge: Engage a broader audience and create a useful product
Types of Competition
- Tool Design: Can we create a new tool that processes particular types of Big Data of interest?
- Information Extraction/Data Visualization: Can we design a system to help users make sense of data that is important to them?
- Data Collection: What interesting data are not being collected right now, and how can we collect and use these data?
- System Design: How do we design a particular platform that addresses collection, processing, and/or collaboration for a particular type of Big Data?
- Improving Core Technology: How can we improve the performance of an existing Big Data tool or technique?
- Education: Can we create an app that teaches or improves student or public understanding of data analytics?
Topical Domains
- Healthcare
- Energy
- Earth Science/Climate
- Education (e.g., MOOC data, cyberlearning)
- Fraud, Waste, Abuse (Data Forensics)
- Government transparency
- Privacy
- Cybersecurity
- Human in the Loop/Human as Data Integrator/Crowdsourcing
- Scientific discovery
Goals
- Engagement
  - Increase awareness among the general public
  - Improve federal agency buy-in on interagency collaborations
  - Engage and improve public-private partnerships
- Usefulness
  - Create a solution for a broader Big Data issue (e.g., a platform for processing PDF data)
  - Create an application that addresses a specific usage of interest
  - Develop a new tool or technique of interest to the Big Data R&D community (e.g., algorithms)
- Novelty
  - Solve a new, unique problem
  - Address an issue that isn't or hasn't been a good fit for traditional mechanisms (e.g., grants, contracts)
Design choices for the challenge depend on the type and topic chosen
Implementation Partner
- Should we partner with a challenge platform, such as TopCoder, InnoCentive, Kaggle, or Mozilla?
- How do we determine the implementation partner? Should it be through a competitive process?
Phases and Outputs
- Ideation: short white paper describing a concept and implementation
- System Design: specification for a Big Data process/technique/platform implementation; can be
- Development: a built end-product tool or platform, or a test prototype
Participants
- Who are the target participants in the challenge? General public, academic researchers, graduate and undergraduate students, corporate teams
- Can potentially have multiple tracks for different types of participants (e.g., student, corporate), each with a track winner, along with an ultimate winner.
Industry Partners
- Should we engage industry partners, and if so, what would they contribute? Prize money sponsorship, a platform for data processing, marketing and outreach, proprietary datasets
- Partners benefit by getting publicity for their company or their contributed technology, potential access to the IP developed in the competition, and interaction with agencies.
Timeline for Rollout
- What agencies are interested in participating?
- What is a reasonable timeline for rollout given budgets and agency clearance processes?
- Are there other major events planned where synchronizing with them or avoiding overlap would be useful?
Funding Level
- What is the approximate funding level for a prize?
- Prize sizes for different topical domains can vary dramatically for the same type of work (i.e., ideation, design, development).
Policy Issues
Public Access
- A White House memo issued on February 22, 2013, directs United States federal agencies to develop plans to support increased public access to the results of federally funded research.
- Peer-reviewed publications should be stored for long-term preservation and be publicly accessible to search, retrieve, and analyze in ways that maximize the impact and accountability of the Federal research investment.
- Digitally formatted scientific data resulting from unclassified research should be stored and publicly accessible to search, retrieve, and analyze.
- Implementation plans for public access could vary by discipline, and new business models for universities, libraries, publishers, and scholarly and professional societies could emerge.
Federal Investments: Past Focus
Data → Knowledge → Action
- Data collection, fusion, integration
- Large-scale/distributed analytics
- Data mining
Federal Investments: New Paradigms
Data → Knowledge → Action
- Discovery informatics
- Human in the loop
- Validation, statistics, and heuristics
Game-Changing Themes
- Discovery informatics: accelerating the rate of discovery
- Human in the loop: knowing what is in the black box; next-gen bar charts
- Systems that reason: à la Big Mechanism
- Broadening participation: including women and underrepresented minorities
- Data science for public good: more partnerships
Big Opportunities for the Future
- Transformative implications for commerce and the economy
- Critical to accelerating the pace of discovery and innovation
- Enhancing quality of life and societal well-being
But the future won't happen the way we intend if we don't work together!