BIG DATA REGIONAL INNOVATION HUBS & SPOKES: Accelerating the Big Data Innovation Ecosystem. Fen Zhao, Staff Associate, Strategic Innovation, CISE Directorate, Office of the Assistant Director, National Science Foundation
AGENDA FOR TODAY: 01 THE PROGRAM: A brief overview of the Hubs & Spokes Program (Fen Zhao, NSF). 02 THE HUBS: Presentations by the NorthEast Hub, South Hub, MidWest Hub, and West Hub. 03 Q&A: Questions? Ask the Hub coordinators and NSF.
THE NSF BIG DATA PORTFOLIO OF PROGRAMS: Within the broader NSF portfolio, BD Hubs and BD Spokes focus on building partnerships around Big Data. RESEARCH: Critical Techniques & Technologies for Big Data (BIGDATA). INFRASTRUCTURE: Data Infrastructure Building Blocks (DIBBS). EDUCATION: National Research Traineeship (NRT). PARTNERSHIPS: Big Data Regional Innovation Hubs & Spokes.
THE BD HUBS AND SPOKES NETWORK: Hub and Spoke, a Nation-Wide Network for Data Innovation. 1 Hubs: local stakeholders guide activities locally and nationally. 2 Spokes: the Hub selects some local priority areas (e.g., transportation, manufacturing). 3 Nodes: partnerships formed to drive specific end goals in priority areas.
RECENT TIMELINE FOR THE PROGRAM: BD Spokes is the latest phase of a long-term NSF agenda for Big Data partnerships.
MAR 2015, BD Hubs Launched: the BD Hubs solicitation to fund four regional Hubs is released.
APR 2015, Big Data Regional Charrettes Held: industry, academia, and government representatives gathered in four charrettes around the country.
JUN 2015, Hubs Proposals Submitted: large collaborative proposals submitted to NSF.
SEPT 2015, Hubs Awards Made: partnerships update NITRD on midterm outcomes from announced projects.
NOV 2015, BD Spokes: BD Spokes solicitation released before the 5th national charrette in DC.
BD Hubs (map). Alaska & Hawaii are part of the West region; US territories can participate in any region. Points indicate affiliations of individuals named as steering council members and/or task leads (*South points indicate Senior Personnel). Legend: University, HPC Center, Non-profit, Government, Industry.
NORTHEAST: 193 personnel, 99 institutions, 9 states. Columbia (PI).
SOUTH*: 116 personnel, 95 organizations, 15 states + DC. Georgia Tech (PI), UNC/RENCI (PI).
MIDWEST: 106 personnel, 79 organizations, 12 states. UIUC/NCSA (PI), Indiana U (co-PI), U of M (co-PI), Iowa State (co-PI), UND (co-PI).
WEST: 86 personnel, 47 organizations, 13 states. Berkeley (PI), UW (PI), UCSD/SDSC (PI).
BIG DATA SPOKES (NSF 16-510): BD Spokes proposals will be designed in partnership with Hubs, and all Spoke proposals must have a letter of support from their Hub. Hubs will discuss their initial steps in managing the Spoke design process for their region. BD Spokes are neither R&D projects nor mini Hubs.
SPOKES: MAJOR THEMES. Three different ways of slicing the Big Data Innovation problem. Spokes to directly address:
MISSION-DRIVEN SPOKES: BD Spokes proposals must articulate a clear focus within a specific Big Data topic or application area, while highlighting their Big Data Innovation theme. All BD Spokes must have clearly defined mission statements with goals and corresponding metrics of success.
AREAS OF EMPHASIS: Some NSF priority areas include: NEUROSCIENCE; REPLICABILITY & REPRODUCIBILITY IN DATA SCIENCE; SMART & CONNECTED COMMUNITIES; DATA PRIVACY; DATA INTENSIVE RESEARCH IN THE SOCIAL, BEHAVIORAL, & ECONOMIC SCIENCES; EDUCATION.
Specifications & Limits: AWARD INFORMATION & DUE DATES. Each Spoke project will be funded up to a maximum of $1,000,000 for up to 3 years, subject to the availability of funds. Each Planning Grant will be funded up to a maximum of $100,000 for up to 1 year, subject to the availability of funds. Total anticipated funding amount: $10M. An estimated 9 Spokes and 10 Planning Grants are anticipated. Letter of Intent Deadline: Jan 12, 2016 (5 pm proposer's local time). Full Proposal Deadline: Feb 25, 2016 (5 pm proposer's local time).
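As a quick consistency check on these figures, assuming (hypothetically) that every award were funded at its stated maximum, the estimated award counts would exactly account for the anticipated total. Actual award sizes may be smaller; this is only arithmetic on the numbers stated above.

```latex
\[
\underbrace{9 \times \$1{,}000{,}000}_{\text{Spokes}}
+ \underbrace{10 \times \$100{,}000}_{\text{Planning Grants}}
= \$9{,}000{,}000 + \$1{,}000{,}000
= \$10{,}000{,}000
\]
```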
Specifications & Limits: ELIGIBILITY INFORMATION. Proposals may be submitted by: Universities and Colleges; Non-profit, non-academic organizations; State and Local Governments. NSF welcomes collaborative proposals that include for-profit organizations and FFRDCs; however, such organizations may only participate as subawardees. An individual may serve as PI or co-PI on at most one submission.
FOR FURTHER QUESTIONS CONTACT: Fen Zhao, fzhao@nsf.gov, 703-292-7344, NSF Headquarters, Arlington, VA
Northeast Big Data Innovation Hub: Kathleen McKeown (PI), Data Science Institute, Department of Computer Science, Columbia University
[Map of BD Hubs] NORTHEAST: 193 personnel, 99 institutions, 9 states. Legend: University, HPC Center, Non-profit, Government, Industry.
Vision: To provide opportunities for data scientists across organizations, sectors, and related fields to come together for collaboration. To exploit data of different kinds to solve problems.
Steering Committee: Executive Committee; Advisory Sub-group; Task Leads Sub-group
Executive Committee: PI: Kathleen McKeown. Members: Carla Brodley, Northeastern; Vasant Honavar, Penn State; Andrew McCallum, UMass Amherst; Howard Wactlar, CMU
Steering Advisory Sub-group.
Academic members: Nabil Adams, Rutgers Univ., Newark (NJ); Mark Gerstein, Yale Univ. (CT); Tim Kraska, Brown Univ. (RI); Theresa Pardo, SUNY Albany (NY); John Storey, Princeton Univ. (NJ); Lorenzo Torresani, Dartmouth (NH).
Industry, non-profits, government: Josh Greenberg, Sloan Foundation; Amen Ra Mashariki, Mayor's Office of Data Analytics (NY); Micah Adler, Fiksu (MA); Jennifer Chayes, Microsoft Research (MA); Yann LeCun, Facebook, NYU (NY); Cheryl Begandy, Pittsburgh Supercomputing Center (PA); John Goodhue, Mass Green HPCC (MA).
Spokes: Cities/Regions; Energy; Health; Finance; Data-driven Education; Scientific Discovery
Cities/Regions (Constantine Kontokosta, NYU; Sanjay Seth, Regional Plan Association): Better understanding of the science of cities; use of big data for operational efficiencies and better quality of life; a regional perspective to encourage cooperation.
Energy (Abani Patra, SUNY Buffalo): Bringing together three distinct communities and approaches: generation and storage; distribution; consumption.
Health (George Hripcsak, Columbia Univ.): Acquisition of data, especially social media, quantified self, healthcare workflow, and mobile health; advanced analytics; precision medicine; DNA sequencing on a chip.
Big Data for Education (Beverly Woolf, UMass Amherst; Ryan Baker, Columbia University): Online education yields large data; enable reasoning about students' learning styles; adaptive curricula; train people who work with smaller data how to use it.
Discovery Science (Chris Hill, MIT): Neuro-engineering, materials design, environmental sustainability; data infrastructures; data inference tools.
Finance (Michael Kearns, UPenn): Automation of every aspect of electronic data -> abundance of data; subject to poorly understood crises and collapses; explore collaborations that bring data science tools to bear.
Connectors: Data sharing (Sam Madden, MIT); Ethics and policy (Jennifer Stromer-Galley, Syracuse); Privacy and security (Adam Smith, Penn State); Data science education (Jim Hendler, RPI).
Upcoming Workshops: Inaugural Workshop of the Northeast Big Data Innovation Hub, Dec. 16th, Columbia University (register now!). Cryptography for Big Data, Dec. 14th-15th, Columbia University. Data Science, Learning and Applications to Biomedical and Health Sciences, Jan. 7-8, NY Academy of Science.
Spokes Solicitation: Northeast Timeline. Dec. 10th: submit letter of intent draft. Dec. 16th: letters of collaboration decided (maximum of 25 submissions from the Northeast). Jan. 6th: letters of collaboration provided.
Success guidelines, such as: clear metrics/timeline for success; multi-state; multi-institution; a mix of industry, academic, government, and NGO partners; highly collaborative; high impact; includes a workforce component; articulates expected collaboration with the Hub.
Possible Directions: Data sharing and collaboration across regions; a central access point; learning across populations and scales; workshops in an interdisciplinary area, followed by hackathons and pilot projects.
Join the Northeast Hub: http://northeastbdhub.dsi.columbia.edu (check the workshop tab to register for the inaugural workshop); northeastbdhub@columbia.edu
NSF South Big Data Hub Initial Research Themes, Governance, & Collaboration
South BD Hub PIs: Ashok Krishnamurthy, PhD; Srinivas Aluru, PhD
South BD Hub Partner Institutions: 2 PIs: Aluru at Georgia Tech; Krishnamurthy at UNC/RENCI. 27 senior personnel. 59 academic/nonprofit/government partners. 25 industry partners.
Initial Spoke and Ring Research Priorities. SPOKES: Coastal Hazards; Health Disparities, Precision Medicine, and Healthcare Analytics; Materials and Manufacturing; Industrial Big Data; Habitat Planning. RINGS: Big Data Sharing and Infrastructure; Big Data Economic Modeling, Security & Privacy, and Policy.
Spoke: Precision Medicine, Healthcare Analytics, and Health Disparities. POTENTIAL RESEARCH AIMS: To establish data registries and develop analytical models and approaches to inform and alleviate disparities in health status, health outcomes, and access to healthcare resources. To apply genomics data for precision medicine aimed at improving the health outcomes of minority populations. Why? Substantial minority population; significant health disparities; Stroke Belt spanning 11 Southern states.
Spoke: Coastal Hazards. POTENTIAL RESEARCH AIMS: To develop approaches to integrate large amounts of disparate data across wide-ranging spatial and temporal scales. To advance computational approaches and analytical models to predict the impact of and response to coastal hazards. Why? Long coastline (Atlantic and Gulf); tropical cyclones and hurricanes; disproportionate share of natural hazards.
Spoke: Industrial Big Data. POTENTIAL RESEARCH AIMS: To develop real-time analytic capabilities for monitoring system performance and provisioning online advisory and security systems. To advance analytical methods and approaches aimed at maximizing the distribution efficiency of products to customers. Why? Hub for power generation and utility companies; oil and gas production.
Spoke: Materials and Manufacturing. POTENTIAL RESEARCH AIMS: To improve the additive manufacturing of components for high-temperature structural applications. To advance the manufacturing of unconventionally formed lightweight metal alloys. Why? The Southeast is a major manufacturing hub; the U.S. supply chain for aerospace and automotive OEMs has a substantial presence in the South.
Spoke: Habitat Planning. POTENTIAL RESEARCH AIMS: To investigate new methods and approaches to collect and integrate population-scale data on infrastructure and behavior. To develop complex simulations to test proposed infrastructure-based policies and guide decision making. Why? Food deserts; urban-rural co-dependent systems; strengths in transportation and disease epidemics.
Ring: Big Data Sharing and Infrastructure. POTENTIAL RESEARCH AIMS: To establish domain-specific National Data Collections. To bring together national cyberinfrastructure providers, representative industries, and the broader community to deliberate on optimal domain-specific cyberinfrastructure and reach consensus regarding recommendations.
Ring: Big Data Economic Modeling, Security & Privacy, and Policy. POTENTIAL RESEARCH AIMS: To improve the efficiency of the manufacturer-distributor-retailer network through improvements in regional transportation hubs. To work with stakeholders to reach consensus on mutually acceptable, domain- and project-specific policies, protocols, and procedures for data use agreements, regulatory compliance, and other security and privacy issues.
South BD Hub: Governance Structure. Steering Council; Executive Committee; Co-Executive Director (Georgia Tech); Co-Executive Director (RENCI/UNC); South BD Hub Operations Office.
Research Committees (Spokes): Health Disparities; Coastal Hazards; Industrial Big Data; Materials & Manufacturing; Habitat Planning.
Research Committees (Rings): BD Sharing & Infrastructure; BD Economic Modeling, Security, & Privacy.
Operations Committees: Community Engagement, Diversity & Partnerships; Data Sharing & Analytics Infrastructure; Education, Training, & Workforce Development.
South BD Hub: Core Hub Activities. 1 South BD Hub Website: Hub activity calendar, Hub news, announcements. 2 Collaboration Space: document repository, working space/wiki. 3 Communication Infrastructure: web conferencing, teleconferencing. 4 Facilitation and Outreach: workshop logistics, meeting facilitation, webinars, YouTube channel, social media presence.
South BD Hub: Collaboration Model. Birds of a Feather Groups -> Working Groups -> Outcomes: Spokes, Rings, Workshops, Research Coordination Networks, Infrastructure.
Requesting a Letter of Collaboration from the SouthBDHub: We will be sending out information on how potential Spoke PIs can request Letters of Collaboration from the SouthBDHub. We have a mailing list; if you aren't already getting email from us, please send email to info@southbdhub.org and we'll get you on the mailing list! We will ALSO post this information at http://www.southbdhub.org/
South BDHub Meeting: The South BDHub will be holding a Spoke Organizing Meeting and a General Hub Meeting on Dec. 7th and 8th on the Georgia Tech campus, Atlanta, GA. During this meeting we will introduce the NSF concept for Hubs and Spokes, allow time for team formation and informal discussions, and answer questions about Hub-Spoke interactions. We have a mailing list; if you aren't already getting email from us, please send email to info@southbdhub.org and we'll get you on the mailing list! We will ALSO post this information at http://www.southbdhub.org/
South BD Hub: Contact Information. info@southbdhub.org. Dr. Srinivas Aluru, PI, aluru@cc.gatech.edu, 404-385-1486. Jennifer Salazar, jsalazar@gatech.edu. Dr. Ashok Krishnamurthy, PI, ashok@renci.org, 919-445-9643. Dr. Stanley Ahalt, ahalt@renci.org, 919-360-6131. David Knowles, Interim Co-Executive Director, dknowles@renci.org, 919-445-9677.
Midwest Big Data Hub. Edward Seidel, Director, NCSA; Founder Professor of Physics; Professor of Astronomy. On behalf of the Midwest Big Data Hub: Brian Athey, Sarah Nusser, Beth Plale, Josh Riedy, Ed Seidel.
SEEDCorn: Sustainable Enabling Environment for Data Collaboration (aka MBDH). A partnership of academia, government, industry, and nonprofits, with over 100 partners already: colleges, universities, and medical centers of all types; industry, non-profits, NGOs; states, cities, communities.
Spokes Currently Identified by MBDH (leaders across the Midwest for each area!): Network Science, including Data Intensive Research in the Social, Behavioral, and Economic Sciences; Urban Science, including Smart and Connected Communities; Business Analytics; Digital Agriculture; Transportation; Advanced Manufacturing; Food, Energy, Water; Healthcare & Biomedical Research, including Neuroscience; others as proposed, including Data Privacy. Spokes are supported by the Hub.
Crosscutting Rings Supported by MBDH: Data Science, including Data Intensive Research in the Social, Behavioral, and Economic Sciences; Replicability and Reproducibility in Data Science; Education, including new approaches to STEM learning environments; Data Tools and Services. Rings are cross-cutting, supporting all Spokes.
Goals and Outcomes/Impacts Expected: Strengthening, creating, and securing funding for numerous new public-private partnerships. Additional funding from agencies (NSF, NIH, DOE, NIST, USDA), NGOs, governments, and industry will be sought. Accelerating technology transfer projects. Introducing new Big Data educational activities into universities, industry, and government. Data policies, management, and best practices with real data for real impact. Will involve and train many young data scientists.
Goals and Outcomes/Impacts Expected: Starting pilots in data environments (SEEDCorn!). Collaborations will come together to develop and test new approaches to data sharing, policies, and algorithms. We will work with various organizations to test pilots with real data; for example, helping farmers balance productivity and sustainability with detailed data on crop growth, soil conditions, and environment. Research Data Alliance (RDA), National Data Service (NDS), other orgs.; HPC centers supporting pilots. Developing and implementing new sustainability models: models for long-term data stewardship, private-public partnerships, and educational practice. Different approaches will be needed!
We are just getting started! We are bootstrapping our way to function. Executive Director sought! You are invited to join! Our initial planning involves a series of startup meetings beginning in January. Check out our website at midwestbigdatahub.org for white papers, interim steering group leadership, and more.
Let the Hubs begin!! TIMELINE
Key Near-term Dates for Hub Actions:
Nov 6: Steering Committee mails.
Nov 9: Mailing to the entire collaboration with guidelines for how to propose MBDH activities. If you are not already involved and want to receive this, sign up on the website.
Dec 9: Draft LOI due. Proposers should coordinate with Spoke leads if appropriate. Include an additional document describing how you leverage Hub assets, what you expect, what data collections you might bring, and more; see the website and mailing for details.
Dec 16: Hub governance team meets in Chicago to plan.
Mid January: First All Hands meeting.
Summary: MBDH is highly integrative across all sectors, from academia, governments, and NGOs to industry. Building collaborations around grand challenges in science and society, focused on the Midwest region. Helping to automate the big data life cycle. Enabling access to and increasing use of data assets. A regional structure to bring together communities across the region and nation. Focus on problems of the region, and beyond. Working groups formed, interim steering committee in place, workshops planned. We are starting to prepare for the first NSF Spoke solicitation. Join us!
West Big Data Innovation Hub 11.5.2015
West Hub Leadership + Representatives + more! Mike Franklin (UC Berkeley), Bill Howe (UW), Christine Kirkpatrick (UCSD), Ed Lazowska (UW), Meredith Lee (UC Berkeley / West Hub), Mike Norman (SDSC), Erin Robinson (FES), Ariel Rokem (UW), Sarah Stone (UW), K. Selcuk Candan (ASU), Huiping Cao (NMSU), Thomas Hauser (CU-Boulder), Sharat Israni (Stanford), Julie Meier Wright (CCST), Dane Skow (UWYO), Harsh Verma (R Systems)
Strategic Vision; Upcoming Opportunities; Communications
Building multi-sector and multi-state partnerships to address societal challenges with Big Data innovation
Big Data Technology; Metro Data Science; Precision Medicine; Natural Resources + Hazards; Data-enabled Scientific Decisions
Building multi-sector and multi-state partnerships to address societal challenges
Educate, Connect, Facilitate, Incubate
WestBigDataHub.org; info@westbigdatahub.org; @westbigdatahub #bdhubs. Spokes 2-pager by Dec. 18.