Exascale Computing Project (ECP) Update



Similar documents
NA-ASC-128R-14-VOL.2-PN

Solvers, Algorithms and Libraries (SAL)

Visualization and Data Analysis

EXECUTIVE ORDER CREATING A NATIONAL STRATEGIC COMPUTING INITIATIVE. By the authority vested in me as President by the

Achieving Performance Isolation with Lightweight Co-Kernels

Next-Generation Networking for Science

Data Centric Systems (DCS)

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

High Performance Computing Initiatives

U.S. DEPARTMENT OF ENERGY CONTRACT AND PROJECT MANAGEMENT IMPROVEMENT. Closure Report

NIST Cloud Computing Program Activities

Risk Management Techniques and Practice Workshop. December Compiled by Terri Quinn and Mary Zosel Lawrence Livermore National Laboratory

State of XXXXX. Project Name/Phase (Planning Development Implementation) Status Report. Date Ending: Mm/dd/yy

NITRD: National Big Data Strategic Plan. Summary of Request for Information Responses

Cybersecurity Framework. Executive Order Improving Critical Infrastructure Cybersecurity

Doing Business with Lawrence Livermore National Laboratory (LLNL)

Lawrence Livermore National Laboratory

Graduate Research and Education: New Initiatives at ORNL and the University of Tennessee

INCITE Overview Lattice QCD Computational Science Workshop April 30, Julia C. White, INCITE Manager

California Community Colleges Online Education Initiative

DOE s Fuel Cycle Technologies Program - An Overview

SAND P Issued by Sandia National Laboratories for NNSA s Office of Advanced Simulation & Computing, NA-114.

Customer Service Center and Violation Processing System. June 27, 2013

How To Support High Performance Computing

Program Lifecycle Methodology Version 1.7

NICE and Framework Overview

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) $100,070,000 -$32,350,000 / %

Development, Acquisition, Implementation, and Maintenance of Application Systems

TECHNOLOGY ENTERPRISE RESOURCE PLAN (TERP)

Relations with ISV and Open Source. Stephane Requena GENCI

Summit and Sierra Supercomputers:

Briefing for NFAC February 5, Sharlene Weatherwax Associate Director Office of Biological and Environmental Research U.S. Department of Energy

ADVISORY MEMORANDUM REPORT ON DEVELOPMENT OF THE LOAN MONITORING SYSTEM ADVISORY REPORT NUMBER A1-03 FEBRUARY 23, 2001

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Software Defined Networking for Extreme- Scale Science: Data, Compute, and Instrument Facilities

The National Consortium for Data Science (NCDS)

Workforce Development for Teachers and Scientists Funding Profile by Subprogram and Activity

HIGH AVAILABILITY LINUX ARCHITECTURE FOR MISSION CRITICAL WORKLOADS

California Community Colleges Educational Planning Tool (EPT) & Degree Audit System (DAS) Request for Information (RFI)

Integrated Technology Plan (FY10 FY12)

GTA Board of Directors September 4, 2014

U.S. Department of Energy Office of Inspector General Office of Audits and Inspections

Vendor Management System Implementation. Planning to Maximize ROI

CIO and Cyber Security Overview Argonne National Laboratory. Michael A. Skwarek CIO Matthew A. Kwiatkowski CISO Oct. 12, 2011

SECURE AND TRUSTWORTHY CYBERSPACE (SaTC)

Cisco Network Optimization Service

Make the Most of Big Data to Drive Innovation Through Reseach

Lecture Objectives. Software Life Cycle. Software Engineering Layers. Software Process. Common Process Framework. Umbrella Activities

STATEMENT OF WORK (SOW) for Web Content Management System Professional Services

Accelerate Your Enterprise Private Cloud Initiative

ESnet Support for WAN Data Movement

PROJECT: Online Shop STATUS REPORT September 2015

Department of Technology Services

Exascale Operating Systems and Runtime Software Report

UNCLASSIFIED. R-1 Program Element (Number/Name) PE A / High Performance Computing Modernization Program. Prior Years FY 2013 FY 2014 FY 2015

Template K Implementation Requirements Instructions for RFP Response RFP #

STATEMENT BY MR. CHRISTOPHER A. MILLER PROGRAM EXECUTIVE OFFICER DOD HEALTHCARE MANAGEMENT SYSTEMS BEFORE THE SENATE APPROPRIATIONS COMMITTEE

Software Defined Infrastructure The Next Wave of Workload Portability Vinod Eswaraprasad Principal Architect, Wipro

Big Data to Knowledge (BD2K)

In today s acquisition environment,

HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK

Information Technology R&D and U.S. Innovation

CYBERSECURITY AND RESILIENCE

Colorado Department of Health Care Policy and Financing

The GPU Accelerated Data Center. Marc Hamilton, August 27, 2015

UNCLASSIFIED R-1 ITEM NOMENCLATURE FY 2013 OCO

Department of Defense DIRECTIVE

Introduction to the ITS Project Management Methodology

Long Term Storage and Transportation Issues for Used Nuclear Fuel. Office of Used Nuclear Fuel Disposition R &D Mission

Transcription:

Exascale Computing Project (ECP) Update Presented to ASCAC Paul Messina Project Director Stephen Lee Deputy Director Washington, D.C. April 4, 2016

ECP is in the process of making the transition from an initiative to a formal DOE project Quite a bit has happened since July last year 2 Exascale Computing Project

ECP mission need On July 29, 2015 the President established the National Strategic Computing Initiative (NSCI) to maximize the benefits of HPC for US economic competitiveness and scientific discovery. DOE is a lead agency within NSCI with the responsibility that the DOE Office of Science and DOE National Nuclear Security Administration will execute a joint program focused on advanced simulation through a capable exascale computing program emphasizing sustained performance on relevant applications. 3 Exascale Computing Project

In 2015, ASCAC made the following recommendations to the then nascent ECI 1. Develop a detailed management and execution plan that defines clear responsibilities and decision-making authority to manage resources, risks, and dependencies appropriately across vendors, DOE laboratories, and other participants. 2. Unlike other elements of the hardware/software ecosystem, application performance and stability are mission critical, necessitating continued focus on hardware/software co-design to meet application needs. 3. As part of the execution plan, clearly distinguish essential system attributes (e.g., sustained performance levels) from aspirational ones (e.g., specific energy consumption goals) and focus effort accordingly. 4. Mitigate software risks by developing evolutionary alternatives to more innovative, but risky alternatives. 5. Remain cognizant of the need for the ECI to support both data intensive and computation intensive workloads. 6. Given the scope, complexity, and potential impact of the ECI, conduct periodic external reviews by a carefully constituted advisory board. 7. Where appropriate, work with other federal research agencies and international partners on workforce development and long-term research needs, while not creating dependences that could delay or imperil the execution plan. 4 Exascale Computing Project

The ECP is a lab-led Project transitioning from the DOE exascale research activities DOE has been funding research related to exascale challenges for 5+ years The Exascale Computing Project (ECP) is being launched as a joint SC/NNSA partnership using DOE s formal project management processes The ECP is managed by DOE laboratories following DOE Order 413.3B 5 Exascale Computing Project

Programmatic components of the ECP It is a partnership between SC and NNSA, addressing science and national security missions Relies on investments by SC/ASCR and NNSA/ASC NNSA/ASC Advanced Technology Development and Mitigation (ATDM) supports activities for the delivery of exascale applications, software, and technology ECP does not procure exascale systems ECP includes only activities required for the delivery of the exascale computing capability (procurements of exascale systems will follow SC and NNSA processes and timelines) Relationship of the ECP to the National Strategic Computing Initiative On July 29, 2015, an executive order established the National Strategic Computing Initiative (NSCI) to ensure a coordinated Federal strategy in HPC research, development, and deployment. DOE, along with the DoD and NSF, co-leads the NSCI. Within DOE, SC and NNSA execute the ECP, which is the primary DOE contribution to the NSCI. 6 Exascale Computing Project

ECP Goals Develop a broad set of modeling and simulation applications that meet the requirements of the scientific, engineering, and nuclear security programs of the Department of Energy and the NNSA Develop a productive exascale capability in the US by 2023, including the required software and hardware technologies Prepare two or more DOE Office of Science and NNSA facilities to house this capability Maximize the benefits of HPC for US economic competitiveness and scientific discovery 7 Exascale Computing Project

ECP goals will be tracked and accomplished via familiar DOE processes ECP will fund and manage work at the national laboratories, industry (including medium and small businesses), and universities In most cases ECP will provide incremental funding to teams that already have a funding base Build on existing activities incremental does not mean small There is a formal solicitation and selection process There are major deliverables and various reviews of major milestones and deliverables 8 Exascale Computing Project

ECP Technical Approach ECP will pursue a ten-year plan structured into four focus areas: Application Development deliver scalable science and mission performance on a suite of ECP applications that are ready for efficient execution on the ECP exascale systems. Software Technology enhance the software stack that DOE SC and NNSA applications rely on to meet the needs of exascale applications and evolve it to utilize efficiently exascale systems. Conduct R&D on tools and methods that enhance productivity and facilitate portability. Hardware Technology fund supercomputer vendors to do the research and development of hardware-architecture designs needed to build and support the exascale systems. Exascale Systems fund testbeds, advanced system engineering development (NRE) by the vendors, incremental site preparation, and cost of system expansion needed to acquire capable exascale systems. 9 Exascale Computing Project

Current ECP status Project office established at ORNL Project team leadership selected Project scope determined and detailed WBS created Successfully completed an Independent Cost Review and an Independent Design Review Solicited proposals in applications, co-design centers, and software technology for exascale Vendor information meeting for hardware technology April 6, 2016 draft RFP for PathForward (Hardware Technology R&D) is online Awaiting CD-0 approval Preparing for CD-1/3a review in May 10 Exascale Computing Project

ECP Laboratory Team Project Director Deputy Project Management Director Applications Development Director Deputy Software Technology Director Deputy Hardware Technology Director Deputy Exascale Systems Director Deputy CTO Integration Manager Paul Messina, ANL Stephen Lee, LANL Kathlyn Boudwin, ORNL Doug Kothe, ORNL Bert Still, LLNL Rajeev Thakur, ANL Pat McCormick, LANL Jim Ang, SNL John Shalf, LBNL Terri Quinn, LLNL Susan Coghlan, ANL Al Geist, ORNL Julia White, ORNL 11 Exascale Computing Project

ECP Scope is based on Mission Needs and Requirements Scope was determined based on Breadth of the mission-critical DOE and NNSA applications Historical and current software requirements of DOE and NNSA applications Input on future needs from 133 DOE and NNSA lab responses to the applications RFI Reports of DOE and NNSA workshops on application and software needs for exascale Reports of workshops and analyses of hardware requirements Analyses of computing technology trends Identifying gaps in vendor product plans for DOE mission applications Experiences from the NNSA ASCI program 12 Exascale Computing Project

ECP WBS Exascale Computing Project 1. Project Management 1.1 Application Development 1.2 Software Technology 1.3 Hardware Technology 1.4 Exascale Systems 1.5 Project Planning and Management 1.1.1 Project Controls & Risk Management 1.1.2 Business Management 1.1.3 Procurement Management 1.1.4 Information Technology and Quality Management 1.1.5 Communications & Outreach 1.1.6 Integration 1.1.7 DOE Science and Energy Apps 1.2.1 DOE NNSA Applications 1.2.2 Other Agency Applications 1.2.3 Developer Training and Productivity 1.2.4 Co-Design and Integration 1.2.5 Programming Models and Runtimes 1.3.1 Tools 1.3.2 Mathematical and Scientific Libraries and Frameworks 1.3.3 Data Management and Workflows 1.3.4 Data Analytics and Visualization 1.3.5 System Software 1.3.6 Resilience and Integrity 1.3.7 Co-Design and Integration 1.3.8 PathForward Vendor Node and System Design 1.4.1 Design Space Evaluation 1.4.2 Co-Design and Integration 1.4.3 Site Preparation 1.5.1 System NRE 1.5.2 Testbeds 1.5.3 Base System Expansion 1.5.4 Co-Design and Integration 1.5.5 13 Exascale Computing Project

ECP Holistic Structure Capable exascale computing requires close coupling and coordination of key development and technology R&D areas. 14 Exascale Computing Project

ECP Integration/Co-Design is an essential function of the leadership team ECP leadership team members participate in evaluation of all major decisions Focus area directors will work closely to ensure that the ECP applications will be ready to use the exascale systems productively the supporting software will meet the needs of the applications and run effectively and efficiently on the exascale architectures the architectures and the hardware technologies of the exascale systems are designed to support a broad range of ECP application computational characteristic 15 Exascale Computing Project

ECP has a detailed integrated timeline 2016 ü AD: first set of S&E/ other apps selected ü AD: first CD centers selected ü AD: proxy apps start ü ST: projects selected ü HT: PF RFP, SOW ü HT: RFI for arch analysis 2018 ü AD: 3 rd set apps selected ü ST: APEX/CORAL testing ü ST, AD, HT: deliver rqmts to one another ü HT: PFWD work package revisions ü ES: testbed/prototype (CORAL #1) 2020 ü AD: 4 th set of apps selected ü AD: FOMs determined ü ST, AD: assessments and releases ü HT: architecture analysis to guide AD, ST co-design ü ES: NRE status review 2022 ü AD, ST: assessments and releases ü AD: CD releases ü AD: deliver reqmts to ST, HT, ES ü ST: APEX/CORAL testing ü ES: testbed/prototype (pre-exascale) ü ES: site prep review 2024 ü AD: assessment and releases on exascale platform ü ST: testing and tuning on exascale platform ü ST: final report 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2017 ü AD: 2 nd set apps selctd ü AD, ST: assessments and releases ü AD: rqmts to ST, HT ü ST: RFP to univs, ISVs ü ST: testbed/prototype testing ü ST, AD: workshops (annual) ü HT: PFWD kickoff 2019 ü AD, ST: deliver requirements to HT, ES ü ST: down select projects for production ü ES: PFWD semiannual review ü ES: testbed/prototype (CORAL#2) 2021 ü AD, ST: assessments and releases ü AD: deliver reqmts to ST,HT, ES ü ES: system (NRE and build) review ü ES: site prep begins 2023 ü AD: AD: tests on exascale platforms (proxy apps, apps, acceptance testing) ü ST: initial testing on exascale system platforms (acceptance testing) ü ES: testing exascale systems 2025 ü AD; final assessment and releases using exascale platforms ü AD: final CD releases on exascale platforms ü AD: final report ü AD, ST: final all hands workshop (project closeout) 16 Exascale Computing Project exascale systems ECP all-hands meetings Cross-Focus Area workshops or reviews

ECP Project Management Structure Project Management Executive Deputy Secretary of Energy Federal Agency Council Department of Energy UnderSecretary for Science and Energy Office of Science Director ASCR Director ASCR Program Manager Defense Programs Deputy Administrator ASC Director ASC Program Manager Federal Project Director (SC) Deputy Project Director (NNSA) UnderSecretary for Nuclear Security OSO/OR-ISC Support Science Council Industry Council Board of Directors Project Director Deputy Director CTO Integration Manager Project Office Team Director PM Reporting & Controls; Risk and Quality Assessment; Business Services; Procurement; IT; Outreach Application Development Director Software Technology Director Hardware Technology Director Exascale Systems Director 17 Exascale Computing Project

ECP Timeline The Project has three phases: Phase 1 R&D before DOE facilities exascale systems RFP in 2019 Phase 2 Exascale architectures and NRE are known. Targeted development Phase 3 Exascale systems delivered. Meet Mission Challenges Application Development Software Technology Hardware Technology System Build NRE Site Prep System expansion Testbeds & Prototypes Exascale Systems FY 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 18 Exascale Computing Project

ECP activity solicitation and selection process The call, review, and selection process for ECP activities is generally: Request for Information (RFI): 2 3 page white papers, Review against published criteria, Down selection to a reduced number for Request for Proposal (RFP), Review against published criteria, and Selection Selection criteria include quality and makeup of the team, relevance to exascale, match to mission needs, technical feasibility. Process steps, RFIs, RFPs, and criteria are tuned as appropriate for the technical project under consideration and target (academia, industry). The ECP focus area leaders select teams of subject matter expert reviewers. Review results inform the decision process, which is finalized within the ECP to ensure integration across the project. 19 Exascale Computing Project

ECP has developed a detailed set of technical risks Inability of applications to achieve their science goals Vendor issues could result in project delays, technical challenges or financial losses Inability to inject needed software technologies into production Unavailability of math libraries and software frameworks or insufficient application integrity for adequate throughput and productivity Above are top technical risks; over 70 risks currently in Risk Register 20 Exascale Computing Project

ECP is off to a strong start ü Develop a detailed management and execution plan that defines clear responsibilities and decision-making authority to manage resources, risks, and dependencies appropriately across vendors, DOE laboratories, and other participants. Mostly developed (pending MOA among Labs & CD-1/3a review) ü Unlike other elements of the hardware/software ecosystem, application performance and stability are mission critical, necessitating continued focus on hardware/software co-design to meet application needs. Part of the ECP WBS ü As part of the execution plan, clearly distinguish essential system attributes (e.g., sustained performance levels) from aspirational ones (e.g., specific energy consumption goals) and focus effort accordingly. Planned as part of project execution ü Mitigate software risks by developing evolutionary alternatives to more innovative, but risky alternatives. Planning focuses on evolution selectively, including innovation ü Remain cognizant of the need for the ECI to support both data intensive and computation intensive workloads. Planned as part of project execution ü Given the scope, complexity, and potential impact of the ECI, conduct periodic external reviews by a carefully constituted advisory board. Planned in the ECP PEP ü Where appropriate, work with other federal research agencies and international partners on workforce development and long-term research needs, while not creating dependences that could delay or imperil the execution plan. Such considerations are included in ECP planning and programmatic discussions 21 Exascale Computing Project

Questions?