Canadian Meteorological Centre HPC Renewal Initiative

Similar documents
Cluster Implementation and Management; Scheduling

High Performance Computing OpenStack Options. September 22, 2015

EARTH SYSTEM SCIENCE ORGANIZATION Ministry of Earth Sciences, Government of India

VDI: What Does it Mean, Deploying challenges & Will It Save You Money?

CEDA Storage. Dr Matt Pritchard. Centre for Environmental Data Archival (CEDA)

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

Transformation Initiatives: Status Update

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

The Hartree Centre helps businesses unlock the potential of HPC

NOAA Big Data Project. David Michaud Acting Director, Office of Central Processing Office Monday, August 3, 2015

Cluster, Grid, Cloud Concepts

Big Data and the Earth Observation and Climate Modelling Communities: JASMIN and CEMS

Data Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015

SURFsara HPC Cloud Workshop

Procurement Outsourcing and. Shared Service Centers. SANFORD INTERNATIONAL I Global Sourcing I Supply Chain I Procurement I.

Compute Canada Technology Briefing

Distributed Computing. Mark Govett Global Systems Division

Data Centre Networks Overview

The Copernicus Marine Enviroment Monitoring Service

IBM Deep Computing Visualization Offering

Kriterien für ein PetaFlop System

How Cineca supports IT

Request for Information PM CONTRACT MANAGEMENT SOLUTION

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

Information Growth: Challenge or Opportunity for the Storage User?

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Red Hat Storage Server

The CLASS Cloud Access Pilot

Title: Contract Management Software Solutions (CMS) and Procurement Front-End System

Low-cost

Interactive Data Visualization with Focus on Climate Research

Introduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber

Copernicus Information Day Q&A presentation

SURFsara HPC Cloud Workshop

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

HPC Storage Solutions at transtec. Parallel NFS with Panasas ActiveStor

Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future

Welcome to NASA Applied Remote Sensing Training (ARSET) Webinar Series

Scientific Computing Data Management Visions

Cloud Computing. Alex Crawford Ben Johnstone

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

Big Data Assimilation Revolutionizing Weather Prediction

Virtualization s Evolution

Data Centric Systems (DCS)

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Mapping the Universe; Challenge for the Big Data science

EMC BACKUP MEETS BIG DATA

Infrastructure Technical Support Services. Request for Proposal

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012

PRACE the European HPC Research Infrastructure. Carlos Mérida-Campos, Advisor of Spanish Member at PRACE Council

DESWAT project (Destructive Water Abatement and Control of Water Disasters)

High Performance Computing and Big Data: The coming wave.

Parallel Computing with MATLAB

Next generation models at MeteoSwiss: communication challenges

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

Continuous Availability for Business-Critical Data

Request for Information (RFI) Supply of information on an Enterprise Integration Solution to CSIR

Cloud Computing through Virtualization and HPC technologies

Climate Change: A Local Focus on a Global Issue Newfoundland and Labrador Curriculum Links

EVERYTHING THAT MATTERS IN ADVANCED ANALYTICS

Nexenta Performance Scaling for Speed and Cost

Armenian State Hydrometeorological and Monitoring Service

Improving Grid Processing Efficiency through Compute-Data Confluence

Barry Bolding, Ph.D. VP, Cray Product Division

Parallel Simplification of Large Meshes on PC Clusters

Virtualisation Cloud Computing at the RAL Tier 1. Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013

Summit and Sierra Supercomputers:

High Performance Computing in CST STUDIO SUITE

Understanding Virtualization and Cloud in the Enterprise

Amazon Web Services Annual ALGIM Conference. Tim Dacombe-Bird Regional Sales Manager Amazon Web Services New Zealand

National Dam Safety Program Technical Seminar #22. When is Flood Inundation Mapping Not Applicable for Forecasting

REQUEST FOR INFORMATION. Hosted Website Solution and Services RFI #E Closing: March 24, 2015 at 2:00 pm local time.

Deploying and managing a Visualization Onera

Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4

Very High Resolution Arctic System Reanalysis for

Joe Young, Senior Windows Administrator, Hostway

ENVIRONMENTAL MONITORING Vol. I - Remote Sensing (Satellite) System Technologies - Michael A. Okoye and Greg T. Koeln

HETEROGENEOUS SYSTEM COHERENCE FOR INTEGRATED CPU-GPU SYSTEMS

Performance Analysis of a Numerical Weather Prediction Application in Microsoft Azure

Basic Climatological Station Metadata Current status. Metadata compiled: 30 JAN Synoptic Network, Reference Climate Stations

Transcription:

Canadian Meteorological Centre HPC Renewal Initiative icas2015, September 16 2015 Laurent Chardon, Environment Canada Luc Corbeil, Shared Services Canada

Agenda Background information on the project Environmental Forecasting Requirements Supercomputing Procurement Conclusion Page 2 October-5-15

Background Contract signed in 2003 for the Scientific Computing Facility (SCF) with IBM, now soon to expire. Additional funding recently awarded to the Meteorological Service of Canada Component 1: Monitoring Networks Component 2: Supercomputing capacity Component 3: Weather Warnings and Forecast System As a result: HPC Renewal Initiative launched Page 3 October-5-15

Variety of customers of SCF Weather forecasting Research Development Operations Climate Air quality Environmental emergency response Page 4 October-5-15

Environmental Forecasting Requirements Science drives the requirements User Requirement Document (URD) produced 10 year plan outlining the expected scientific path (EC) Translated in computing needs (EC&SSC) RFP (SSC lead with EC representation) Page 5 October-5-15

Meteorological Research Division RPN-A Atmospheric modelling RPN-E Environmental modelling RPN-SI Informatics section ARMA Data assimilation ARMP Cloud physics and severe weather Page 6 October-5-15

Atmospheric Numerical Prevision Research HR global forecast 15 km, 4/day 80 levels -> 120 HR North America LAM 250m New 3D microphysics scheme Turbulence Ensemble forecast to HR 50km -> 25 km -> 15 km Increase radiation calls Cloud physics renewal Page 7 October-5-15

Environmental Numerical Prevision Research Ocean, ice, waves, urban, ground, vegetation, hydrology, lakes Global coupled Ocean/Atm forecast 15 km Global ice forecast Global wave forecast Ensemble arctic ocean forecast (20 + 20 mercator) 4-5 days HR arctic ocean deterministic forecast (1/36 º) < 4 days Coupled HR Lake-Ice-Atm over the great lakes Page 8 October-5-15

Data Assimilation Data assimilation at same resolution as forecast Add aeolus, GOES-R data Increase ensemble # of members 256 -> 512 Surface data assimilation -> soil moisture SMAP O(1000) -> O(15,000,000) AMSU-A, 6-h period, 7 satellites, temp profiles Page 9 October-5-15

Shared Services Canada Created in 2012, to take responsibility of email, networks and data center for the whole Government of Canada. Supercomputing IT people working for EC transferred to SSC. Scope of the HPC team expanded to all science departments As in any reorganization, there are challenges and opportunities! Page 10 October-5-15

EC Supercomputing Procurement Contract for Hosted HPC Solution: 8.5 years + one 2.5 year option (three systems + one optional) Connectivity between HPC Solution, Data Halls and Dorval Solution Data Hall A Inter-Hall Link (x2) Solution Data Hall B No more than 70km between Hall A, Hall B & Dorval Flexible Options for additional GC needs Site A Inter-Hall Link (x2) NCF Inter-Hall Link (x2) Site B Page 11 October-5-15

Scope Supercomputer Pre/Post-processing Global Parallel Storage (high-bandwidth low latency) Near-line Storage High Performance Interconnect Software & tools Maintenance & Support Training & Conversion support On-going Availability Page 12 October-5-15

HPC Solution: Fully Redundant Home Home Scratch Supercomputer A Site Store Site Store Scratch Supercomputer B Pre/Post Processing A Pre/Post Processing B HP-NLS HP-NLS Cache Cache HPN Data Transfer Solution Data Hall A DATA Feeds DATA Feeds Solution Data Hall B Storage Synchronization Out-of-Band Management NCF SCF Data Flow Logical View LPT, HPN/DADS, SSC 2014-10-07 Page 13 October-5-15

Collaborative Procurement Process Use of (more) Collaborative Procurement Process RFI Invitation to Qualify 4 Qualified Respondents (QR): Cray, Dell, HP & IBM Review and Refine Requirements Phase Bid Solicitation Phase Contract Award We are here Due Diligence Industry Engagement Request for Responses for Evaluation (RFRE) Phase Review and Refine Requirements Phase Bid Solicitation Phase Contract Award and Acceptance Clarification Response + Evaluate Clarification Response + Evaluate Qualify All Successful Respondents Select Top Bidder(s) Page 14 October-5-15

Technology Quite a bit of freedom on how the vendor can achieve the performance targets A lot of restrictions came from applications that aren t in the benchmark suite. They need to perform as well No GPUs nor other accelerators on 1 st system, strong possibility on the 2 nd Page 15 October-5-15

Bid Evaluation One number that drives almost everything: Fixed Performance Level (FPL) From that number, we derived floor amounts for memory, storage, benchmark performance, tape qty, etc. Minimum value is 0.25 (lower bid == non compliant) It means ~5X performance increase (with FPL= 0.25. ~20X for FPL=1) for 1 st system on the main GEM benchmark over the Power7, then 2.6X for each upgrade Requirements increased automatically as closing date of the RFP got pushed On an exponential growth curve, the starting point matters a lot! Page 16 October-5-15

Reality Check Tender issued late November Supply Chain Integrity (Security Phase) QR s supply chain is vetted by GoC for security purposes Took much longer than anticipated Market environment changes Procurement risks, technology, foreign exchange rate, etc. And Federal Elections Longest election campaign ever in the country s history launched in August (78 days). Most main decisional processes are paralyzed until November at the earliest Page 17 October-5-15

HPC Implementation Milestones: Delivery to Acceptance Data Hall and Hosting Site Certification Functionality Testing (IT infra) Security Accreditation Performance testing Conversion of Operational codes (Automated Environmental Analysis & Production (AEAPPS) Meeting the above triggers a 30 day availability test *Scheduling Details in next slide Inspection Functionality Testing Performance Testing Conversion RFU Acceptance Page 18 October-5-15

Once the Ink Dries Page 19 October-5-15 1 9

General Purpose Science Cluster (GPSC) Very similar to UK NCAS approach Federating users of many computing communities Using Linux containers (LXC, not docker) Works well, but not out of the box Workloads extremely heterogeneous Month(s)-long jobs? 4TB RAM on the node? 100Mio 32k files? Or one 3.2TB file? Or a 3PB database? Linux and Windows? In dev mode. More to report in a year Page 20 October-5-15

Summary Canada is investing massively in HPC to support the Environmental Prediction program RFP almost completed Evaluation of the bids still in progress It will eventually bring EC to O(100) Petaflops and O(1) Exabyte Page 21 October-5-15