Suzi Iacono CISE Directorate National Science Foundation



Similar documents
Big Data R&D Initiative

Big Data R&D Initiative

Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) NSF

National Big Data R&D Initiative

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21)

Good morning. It is a pleasure to be with you here today to talk about the value and promise of Big Data.

CC-NIE PI Workshop Plenary

NITRD and Big Data. George O. Strawn NITRD

Big Data. George O. Strawn NITRD

Information Technology R&D and U.S. Innovation

Big Data a threat or a chance?

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) $100,070,000 -$32,350,000 / %

How To Use Hadoop For Gis

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)

NITRD: National Big Data Strategic Plan. Summary of Request for Information Responses

Testimony of. Before the

COGNITIVE SCIENCE AND NEUROSCIENCE

How To Teach Data Science

Government Technology Trends to Watch in 2014: Big Data

Data Driven Discovery In the Social, Behavioral, and Economic Sciences

Training for Big Data

Challenges in e-science: Research in a Digital World

SECURE AND TRUSTWORTHY CYBERSPACE (SaTC)

The National Consortium for Data Science (NCDS)

Collaborations between Official Statistics and Academia in the Era of Big Data

Data Centric Computing Revisited

Big-Data Computing: Creating revolutionary breakthroughs in commerce, science, and society

Information Visualization WS 2013/14 11 Visual Analytics

Big Data to Knowledge (BD2K)

New Jersey Big Data Alliance

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

NASA Earth Science Research in Data and Computational Science Technologies Report of the ESTO/AIST Big Data Study Roadmap Team September 2015

Panel on Emerging Cyber Security Technologies. Robert F. Brammer, Ph.D., VP and CTO. Northrop Grumman Information Systems.

SDN Security Challenges. Anita Nikolich National Science Foundation Program Director, Advanced Cyberinfrastructure July 2015

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

Big Data Challenges in Bioinformatics

UC AND THE NATIONAL RESEARCH COUNCIL RATINGS OF GRADUATE PROGRAMS

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Knowledge Discovery from patents using KMX Text Analytics

Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing.

The Tonnabytes Big Data Challenge: Transforming Science and Education. Kirk Borne George Mason University

Standards for Big Data in the Cloud

Outline. Who conducts research related to CIIP in the U.S.? Universities. What is Critical Information Infrastructure? Who sponsors this research?

Astrophysics with Terabyte Datasets. Alex Szalay, JHU and Jim Gray, Microsoft Research

Big Data in the context of Preservation and Value Adding

12/7/2015. Data Science Master s programs

Proposal for the Theme on Big Data. Analytics. Qiang Yang, HKUST Jiannong Cao, PolyU Qi-man Shao, CUHK. May 2015

Databases & Data Infrastructure. Kerstin Lehnert

Big Data Hope or Hype?

Doing Multidisciplinary Research in Data Science

High Performance Computing Initiatives

Anuradha Bhatia, Faculty, Computer Technology Department, Mumbai, India

Data Publishing Workflows with Dataverse

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

Analyzing Big Data: The Path to Competitive Advantage

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer

Standard Big Data Architecture and Infrastructure

Understanding Big Data Analytics for Research

Make the Most of Big Data to Drive Innovation Through Reseach

Learning from Big Data in

A Capability Maturity Model for Scientific Data Management

We are Big Data A Sonian Whitepaper

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May ISSN BIG DATA: A New Technology

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

Transcription:

Big Data R&D Initiative Suzi Iacono CISE Directorate National Science Foundation National Academies of Science Integrating Environmental Health Data to Advance Discovery January 11, 2013 Image Credit: Exploratorium.

Advances in information technologies are transforming the fabric of our society and data represents a transformative new currency for science, engineering, education and commerce. Image Credit: CCC and SIGACT CATCS

Where do the data come from? Why do we have a national initiative?

The Big Data Landscape I: Big Science Science gathers data at an ever-increasing rate across all scales and complexities of natural phenomena Sloan Digital Sky Survey in 2000 collected more data in its 1 st few weeks than had been amassed in the entire history of astronomy Within a decade, over 140 terabytes of information collected Large Hadron Collider generates scores of petabytes a year The proposed Large Synoptic Survey Telescope (3.3 gigapixel digital camera) will generate 40 terabytes of data nightly By 2015, the world will generate the equivalent of approximately 93 million Libraries of Congress

The Big Data Landscape II: Smart Sensing, Reasoning and Decisionmaking Environment Sensing Emergency Response Percepts (sensors) Agent (Reasoning) Situation Awareness: Humans as sensors feed multimodal data streams Credit: Photo by US Geological Survey People Centric Sensing Actions (controllers ) Pervasive Computing Social Informatics Smart Health Care Evaluate Sense Personal Sensing Public Sensing Social Sensing Intervene Assess Identify Source: Sajal Das, Keith Marzullo

The Big Data Landscape III: New Paradigms for Communications 1988 Remarkable Pace of Innovation Today EMAIL MOBILE SOCIAL NETWORKS VIDEO VOIP BLOGS

Communications Volume & Traffic Diversity VoIP Video Twitter Broadband 663M registered Skype users in 2011. Represents 20% of long distance minutes world-wide. If Skype were a carrier, it would be the 3rd largest in the world (behind China Mobile and Vodaphone). Largest provider of cross-border communication. Recent estimates as high as 60% of internet traffic is video and music sharing; 35 hours of new videos are uploaded every minute in 2011; 2 billion views per day. Currently 175 million registered users. 20% of global internet users have residential broadband; d; 68% in US subscribe be to broadband. d Mobile 5.3 billion mobile phone subscribers; 85% of new handsets will be able to access the mobile web; 1 in 5 has access to fast service, 3G or better; IM, MMS, SMS expected to exceed 10 trillion message by 2013.

The Big Data Landscape IV: The Long Tail of fscience Hundreds of thousands of scientists and engineers work individually or in small, distributed, disconnected groups all generating data that collectively represent an enormous, largely untapped scientific resource From running simulations, experiments, etc. Making heterogeneous data across many areas of science more homogeneous could give way to breakthroughs across all areas of science and engineering Estimated 40 exabytes of unique new information generated worldwide in 2010 Only 5% of the information created is structured, however, in a standard format of words or numbers; the rest are unstructured text, voice, images, etc.

How Big is Big? Big Data : Datasets whose size are beyond the ability of typical database software tools to capture, store, manage, and analyze -McKinsey Global Institute, Big data: the next frontier for innovation, competition, and productivity, May 2011. Image Credit: Sigrid Knemeyer

Not Just Volumes of Data The science of big data is not just about volumes and velocity of data, but also Heterogeneity and diversity Levels of granularity Media formats Scientific ifi disciplines i Complexity Uncertainty Incompleteness Representation types

Why is Big Data Important? Critical to transforming how science is done and to accelerating the pace of discovery in almost every science and engineering discipline Transformative implications for commerce and economy Potential for addressing some of society s most pressing challenges Image Credit: Chi Birmingham

Paradigm Shift: from Hypothesis-driven to Data-driven Discovery The Economist, The data deluge and how to handle it: A 14 page special report (Feb 25, 2010). The Fourth Paradigm: Data Intensive Scientific Discovery (2009, Microsoft Corporation). http://www.sciencemag.org/site/special/data/ http://www.economist.com/node/15579717 http://research.microsoft.com/enus/collaboration/fourthparadigm/

The Age of Data: From Data to Knowledge to Action Data-driven discovery is revolutionizing i i scientific exploration and engineering innovations Automatic extraction of new knowledge about the physical, biological and cyber world continues to accelerate Multi-cores, concurrent and parallel algorithms, virtualization and advanced server architectures will enable data mining and machine learning, and discovery and visualization of Big Data

Potential for Transformational Science & Engineering: From Data to Knowledge to Action Integration ti of discipline i (or media format ) specific data, examine for relationships Disaster informatics 3D toxic fume images Simulations of gas spread Maps of census concentrations First responder on-the- ground findings Evacuation routing

From Data to Knowledge to Action Researchers seek to fundamentally transform understanding of spinning giants The task involves assembling data from more storm variables-such as updraft, downdraft and vorticity or g p, y regions of spin--than what can be observed from ground tornado chasers or even actually produced in the atmosphere. For example, researchers need to better understand how changes in wind direction with height cause the updrafts in a storm to rotate, preceding the formation of a tornado. To solve the quandary, Amy McGovern, an associate professor in OU's School of Computer Science, and her team create tornado models with super computers that can process vast amounts of data. McGovern and her colleagues use the models to analyze how storm variables interact in order to identify tornadic and nontornadic storms.

Examples of Research Challenges More data are being collected than we can store Analyze the data as it becomes available Decide what to archive and what to discard Many data sets are too large to download Analyze the data wherever it resides Many data sets are too poorly organized to be usable Better organize and retrieve data Many data sets are heterogeneous in type, structure, semantics, organization, granularity, accessibility Integrate and customize access to federate data Utility of data is limited by our ability to interpret and use it Extract and visualize actionable knowledge Evaluate results Large and linked datasets may be exploited to identify individuals Design management and analysis with built-in i privacy preserving characteristics

A National Imperative PCAST calls on the Federal government to increase R&D investments for collecting, storing, preserving, managing, g, analyzing, and sharing the increasing quantities of data. Furthermore, PCAST observed that the potential to gain new insights to move from data to knowledge to action has tremendous potential to transform all areas of national priority. Source: PCAST (December 2010), Report to the President and Congress: Designing a Digital Future a periodic congressionally-mandated review of the Federal Networking and Information Technology Research and Development (NITRD) Program.

Administration s Big Data Research and Development Initiative Big Data Senior Steering Group chartered in spring 2011 under the Networking and Information Technology R&D (NITRD) Program Members from DARPA, DOD OSD, DHS, DOE-Science, HHS, NARA, NASA, NIST, NOAA, NSA, OFR, USGS, etc. Co-chaired by NSF (and NIH) Initial charge was to come up with a plan, a strategy 18 Image Credit: Fuqing Zhang and Yonghui Weng, Pennsylvania State University; Frank Marks, NOAA; Gregory P. Johnson, Romy Schneider, John Cazes, Karl Schulz, Bill Barth, The University of Texas at Austin

Big Data Membership Biven Laura DOE Science Laura.Biven@science.doe.gov Blatecki Alan NSFNational Science Foundation ablateck@nsf.gov Collica Leslie NISTNational Institute of Standards and Technology leslie.collica@nist.gov Deift Abby NSFNational Science Foundation adeift@nsf.gov Downing Gregory HHSDepartment of Health and Human Services gregory.downing@hhs.gov Espina Pedro OSTPWhite House Office of Science and Technology Policy Pedro_I_Espina@ostp.eop.gov Gerr Neil DARPADefense Advanced Research Projects Agency neil.gerr.ctr@darpa.mil Gundersen Linda USGSU.S. Geological Survey lgundersen@usgs.gov NOAANational Oceanic and Atmospheric Hall Alan alan.hall@noaa.gov Administration Iacono Suzanne NSFNational Science Foundation siacono@nsf.gov Jakubek David OSDOffice of the Secretary of Defense ATL david.jakubek@osd.mil Kaufman Daniel DARPADefense Advanced Research Projects Agency daniel.kaufman@darpa.mil Larson OSTPWhite House Office of Science and Technology Phillip P. Policy Phillip_P._Larson@OSTP.eop.gov Lee Tsengdar NASANational Aeronautics and Space Administration tsengdar.j.lee@nasa.gov NIHNational Institutes of Health /NLMNIH s National Lipman David lipman@ncbi.nlm.nih.gov Library of Medicine /NCBI Little Michael NASANational Aeronautics and Space Administration m.m.little@nasa.gov Luker Mark NCONational Coordination Office for NITRD /NITRDNetworking and Information Technology Research and Development luker@nitrd.gov Marth Lisa NISTNational Institute of Standards and Technology lisa.marth@nist.gov Muoio Patricia A. DNI patricia.a.muoio@dni.gov Pantula Sastry NSFNational Science Foundation spantula@nsf.gov NIHNational Institutes of Health /NLMNIH s National Preuss Don don.preuss@nih.gov Library of Medicine /NCBI Quade Brittany NSFNational Science Foundation bquade@nsf.gov Romine Charles NISTNational Institute of Standards and Technology charles.romine@nist.gov Smith Darren NOAANational Oceanic and Atmospheric Administration Darren.Smith@noaa.gov Spengler Sylvia NSFNational Science Foundation sspengle@nsf.gov Statler Tom NSFNational Science Foundation tstatler@nsf.gov Strawn George NCONational Coordination Office for NITRD /NITRDNetworking and Information Technology gstrawn@nitrd.gov Research and Development Suskin Mark NSFNational Science Foundation msuskin@nsf.gov Villani Jennifer NIHNational Institutes of Health /NIGMS villanij@nigms.nih.gov Wigen Wendy NCONational Coordination Office for NITRD /NITRDNetworking and Information Technology wigen@nitrd.gov Research and Development Zhao Fen NSFNational Science Foundation fzhao@nsf.gov Nowell Lucy DOE Science Lucy.Nowell@science.doe.gov

Big Data Membership Bristol Sky USGSU.S. G eological Survey sbristol@ usg s.gov Kielm an Joseph DHSDepartment o f H om eland Security jo seph.kielman@dhs.gov Petters Jonathan DOE Science Jonathan.Petters@ science.doe.gov Carver Doris NSFNational Science Found ation d carver@nsf.go v Adolfie Laura O SD O ffice of the Secretary of Defense laura.adolfie@ osd.mil Chadduck Robert NSFNational Science Foundation rchadduc@ nsf.gov Crow der Grace NSANational Security Agency gacrowd@ nsa.gov Dunn M ich e lle NIHNational In st itu t es of Health dunnm 3@ mail.nih.gov Florance V a le rie NIHN a tional In stitu t es o f H ealth florancev@ mail.n ih.go v Frehill Lisa O SD O ffice of the Secretary of Defense lisa.fr eh ill.c TR @ d ar p a.m il Helland Barbara DOEDepartment of Energy Science barbara.helland@ science.doe.gov Hoang Thuc DOEDepartment of Energy NNSA thuc.hoang@ nnsa.doe.gov Kannan Nandini NSFNational Science Foundation nkannan@nsf.gov W arnow T and y NSFN a tional S c ie nce F ound a tion t warnow@ nsf.g o v Dean David DOE Science david.dean@ science.doe.gov Ly ster Peter NIHNational In st itu t es of Health lysterp @ m ail.n ih.go v M ille m a ci John O SD O ffice of the Secretary of Defense jo hn.millemaci.ctr@ osd.m il Pearce Claudia NSANational Security A gency cepearce@ nsa.gov Allen M arc NASANationalAeronautics Aeronautics and Space Administration marcallen@nasa marc.a nasa.gov Pearl Jennifer NSFNational Science Foundation jslim ow i@ nsf.gov Blaszkowsky David TreasuryDepartment of the Treasury OFR david.blaszkow sky@ treasury.gov Szykm an James EPAEnvironm ental Protection Agency ja mes.j.szykman@nasa.gov Tom pkins Jerry NSANational Security Agency gsto m p k@ rad ium.ncs c.m il Flood M ark TreasuryDepartment of the Treasury OFR markflood@treasurygov mark.flood@ ry.g ov Holm Jeanne Data.gov je anne.m.holm.jpl.nasa.gov Misawa Eduardo NSFNational Science Foundation emisawa@nsf.gov

Big Data Launch Federal Big Data R&D Initiative launched by White House OSTP on March 29, 2012 at AAAS Federal Announcements: NSF Subra Suresh NIH Francis Collins USGS Marcia McNutt DoD Zach Lemnios DARPA - Ken Gabriel DOE William Brinkman Panel Discussion: Moderator - Steve Lohr, New York Times Daphne Koller, Stanford University James Manyika, McKinsey & Company Lucila Ohno-Machado, UC San Diego Alex Szalay, Johns Hopkins University Image Credit: National Science Foundation More information available at: http://nsf.gov/news/news_summ.jsp?org=cise&cntn_id=123607&preview=false

Strategy to Address Big Data Foundational research to develop new techniques and technologies to derive knowledge from data New cyberinfrastructure to manage, curate, and serve data to research communities ii Policy New approaches for education and workforce development New types of inter disciplinary collaborations, grand challenges, and competitions

Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIG DATA) Program Solicitation: NSF 12-499 Foundational research to extract knowledge from data Foundational research to advance the core techniques and dtechnologies for managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets. Image Credit: Jurgen Schulze, Calit2, UC San Diego Cross Directorate t Program: NSF Wide Multi agency Commitment: NSF and NIH

BIG DATA Research Thrusts Collection, Storage, and Management of Big Data 3 awards Foundations of big data management Mitigating tradeoffs among speed of data ingestion, quicker answers and the freshness of data through the design of new storage devices with extreme capacities 4 awards Data Analytics Novel machine learning where multi-dimensional vector data points are replaced by distributions Design and test mathematical and statistical techniques for large-scale heterogeneous data in DNA repositories Research in Data Sharing and Collaboration 1 award (+1 shared with data collection) Open source tools for infrastructure for improving discovery through use of social analytic data Databridge linking data, human interactions, and usage practices for the long-tail of science Databridge linking data, human interactions and usage practices for the long-tail of science Data analytics problems in next generation sequencing Theory and algorithms fro couples tensors and associated software toolkits to make analysis possible Credit: Fermilab Photo Eight mid scale (up to $1M a year) awards out of over 136 projects announced on Oct. 3.

Award Citations DCM: Dan Suciu University of WA A formal foundation for big data management Michael Bender SUNY at Stony Brook & Martin Farach-Colton Rutgers University Eliminating the data ingestion bottleneck in big data applications Arcot Rajasekar University of North Carolina, Chapel Hill & Gary King Harvard University & Justin Zhan North Carolina Agriculture & Technical State University it Databridge A sociometric system for long-tail science data collections 25

Data Analytics Award Citations Eli Upfal Brown University Analytic approaches to massive data computation with applications to genomics Aarti Singh Carnegie-Mellon University Distribution-based machine learning for high dimensional datasets Srinvas Aluru Iowa State University & Wuchun Feng Virginia Polytechnic Institute & State University & Oyekunie Olukotun Stanford University Genomes Galore Core techniques, libraries, and domain specific languages for high throughput DNA sequencing

Award Citations Data Analytics (continued) Christos Faloutsos Carnegie Mellon University & Nikolaos Sidiropoulos University of Minnesota Twin Cities Big Tensor Mining: Theory Scalable Algorithms and Applications 27

Award Citations E-Science Collaboration Environments Thorsten Joachim Cornell University & Paul Kantor Rutgers University Discovery and social analysis for large-scale scientific literature 28

Ideation Contest Launch Opportunity to expand the innovation ecosystem Joint among NASA, NSF and DOE Office of Science A contest focused on How to make heterogeneous data seem more homogeneous? 5 judges 5 criteria Launched on Challenge.gov and the Top Coder platform on Oct. 3 with a two week window http://challenge.gov/nasa/425-big-data-challengeseries http://community.topcoder.com/coeci/nitrd/ topcoder com/coeci/nitrd/

Ongoing Big Data Programs at NSF Dear Colleague Letters: Encourage CIF21 IGERTs to educate and support a new generation of researchers able to address fundamental Big Data challenges: http://www.nsf.gov/pubs/2012/nsf12555/nsf12555.htm Data-Intensive t Education-Related t d Research Funding Opportunities announcing an Ideas Lab, for which cross disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments: http://www.nsf.gov/pubs/2012/nsf12060/nsf12060.jsp Data Citation to the Geosciences Community to encourage transparency and increased opportunities for the use and analysis of data sets: http://www.nsf.gov/pubs/2012/nsf12058/nsf12058.jsp

Earthcube: GEO Science Infrastructure EAGER awards announced as part of White House Big Data Launch Integrates geosciences data and high-performance computing technologies in an open, adaptable and sustainable framework to enable transformative research and education in Earth System Science Innovative Model: Community designed, community owned, community governed Interdisciplinary research: Building and sustaining new communities Workshops to bring together (GEO, SBE, CISE) communities EAGER awards to seed new research

A Complex Policy Setting Researchers want data. Public policy requires access to data. Public policy also requires protection of privacy and intellectual property and other sensitive information. Much more to be done: Policy on data management and data access.

Data Privacy Never more important than today However, not all data contain people s identities (as in data landscape III) Not a Big Brother scenario Government (NSF) invests in privacy research Values in design research community: Identity cloaking, anonymization Do-not-track cookie management Obfuscation, blurring Privacy preserving data mining, search, payment Just-in-time crypto Secure data distribution. Privacy in technology; privacy inspired technology

Opportunities for the Future Our investments in research and education have already returned exceptional dividends to the Nation. Many of tomorrow s breakthroughs will occur as a result of new techniques and technologies for advancing Big Data science and engineering. In turn, Big Data scientific discovery and technological innovation are at the core of our response to national and societal challenges from environment, energy, transportation, sustainability, and healthcare to cyber security and national defense.

Thanks! siacono@nsf.gov