Petascale Software Challenges
Piyush Chaudhary, piyushc@us.ibm.com
High Performance Computing

Fundamental Observations
- Applications are struggling to realize growth in sustained performance at scale; the reasons need to be understood and addressed.
- Productivity enhancements are needed to make supercomputing accessible.
- The domain of operation for the HPC software stack needs to expand beyond a single supercomputer cluster to the entire data center (or even the enterprise); petascale systems co-exist with other large systems in the data center.
- The operational efficiency of supercomputers is often overlooked. MTBF (mean time between failures) is increasingly critical at scale, and continuous operation (always being able to serve applications) becomes critical.
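To see why MTBF becomes so critical at scale, a rough rule of thumb, assuming independent and identically distributed node failures (an assumption for illustration, not a figure from the slide), is that the system-level MTBF shrinks roughly linearly with node count:

```latex
% Rule of thumb under the independence assumption:
\mathrm{MTBF}_{\mathrm{system}} \approx \frac{\mathrm{MTBF}_{\mathrm{node}}}{N}
% Example: nodes with a 5-year MTBF (~43,800 h) in a 10,000-node system
% give a system-level MTBF of roughly 4-5 hours.
```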

Sustained Performance vs. Peak Computational Efficiency
[Chart: Linpack sustained scaling, ideal vs. actual, over 0-256 nodes; peak scaling efficiency is average, while sustained-performance scaling (architectural + algorithmic) is very poor and is the goal for improvement.]
- Key problem: the gap between peak and sustained performance continues to widen.
- Goal: understand the reasons for the gap and attack the problems.
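The gap referred to here is commonly quantified as sustained (computational) efficiency, the ratio of measured to theoretical peak performance:

```latex
\eta = \frac{R_{\mathrm{max}}}{R_{\mathrm{peak}}}
% R_max: sustained performance (e.g. the measured Linpack rate)
% R_peak: theoretical peak of the machine
% The widening gap corresponds to \eta falling as node counts grow.
```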

Ideas to Address Sustained Performance
- Communication latency: burst MMIO, cache injection, lock-overhead reduction; integrated I/O (Express and Ethernet closer to the micro-processing elements)
- Overlap of computation and communication: asynchronous data mover (see the sketch below)
- Collective communication overheads: COE and daemon squashers (co-scheduling)
- RDMA exploitation
- Memory latency: pre-fetch hints to subsystems and applications (SMT assist threads); drive towards zero-cache-miss execution in the latency-critical paths (compiler approaches)
- Continuous Program Optimization
- Algorithmic improvements
- Better tools to assist in smarter application development (locking & message-passing impacts)
These are only some of the ideas; many more are being explored.
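As an illustration of the computation/communication overlap idea only (not the asynchronous data mover mentioned on the slide), here is a minimal MPI sketch in C: nonblocking sends and receives are posted first, independent work proceeds while the transfer can progress, and the wait is deferred until the halo data is actually needed.

```c
#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv) {
    int rank, size;
    double send_buf[N], recv_buf[N], interior[N];
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    for (int i = 0; i < N; i++) { send_buf[i] = rank; interior[i] = i; }

    /* Post the halo exchange first ... */
    MPI_Irecv(recv_buf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send_buf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... then do work that does not depend on recv_buf while the
       transfer can progress in the background. */
    double sum = 0.0;
    for (int i = 0; i < N; i++) sum += interior[i] * interior[i];

    /* Only wait when the boundary data is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0)
        printf("interior sum %.1f, halo[0] from rank %d = %.1f\n",
               sum, left, recv_buf[0]);
    MPI_Finalize();
    return 0;
}
```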

Productivity Domains
- Programmer: develop applications, debug applications, tune applications
- Administrator: storage management, network management, installs and upgrades, system monitoring
- System operational efficiency: maximize system throughput, maximize enterprise efficiency, ensure system balance
- RAS: continuous operation, problem isolation, FFDC, serviceability

Programmer Productivity
- Developing new parallel applications: IDE environments, libraries
- Debugging parallel applications: tracing and parsing hooks, static analysis tools, Rational
- Performance tuning of parallel applications: HPC Toolkit, FDPR, compiler feedback, static analysis tools
- Application/system tuning parameters: CPO

IDE: Programming Model Specific Editor

Programming Models Support
- Models supported today (the two-sided vs. one-sided styles are contrasted in the sketch below):
  - MPI: two-sided Message Passing Interface
  - LAPI: one-sided programming model; platform for middleware and for exploring other programming models
  - SHMEM: shared-memory programming model
- Future models:
  - X10: concepts being applied to HPLS initiatives
  - UPC: Unified Parallel C
- Component libraries: ESSL, PESSL enhancements, COIN-OR
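To make the two-sided vs. one-sided distinction concrete, the following minimal C sketch uses standard MPI for both styles, with MPI's RMA (window/put) interface standing in for the one-sided semantics that LAPI and SHMEM provide; it is an illustration, not IBM's LAPI API, and needs at least two ranks.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }   /* needs >= 2 ranks */

    /* Two-sided: sender and receiver both participate explicitly. */
    if (rank == 0) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* One-sided: the origin writes directly into a window exposed by
       the target; the target only opens and closes the exposure epoch. */
    int win_buf = 0;
    MPI_Win win;
    MPI_Win_create(&win_buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    if (rank == 0) {
        int msg = 99;
        MPI_Put(&msg, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);
    if (rank == 1)
        printf("two-sided got %d, one-sided got %d\n", value, win_buf);
    MPI_Win_free(&win);

    MPI_Finalize();
    return 0;
}
```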

MetaCluster Provides Application Mobility
[Diagram: applications App1-App4 in containers, movable between OS 1 and OS 2 running on separate hardware.]
- MetaCluster: application containers enabling on-demand mobility of workloads
- Lowest overhead of all virtualization technologies (<1%)
- Finest grain: application-level; enables app-centric management
- Largest scope of supported platforms (Intel, AMD, Linux, SMP, 32-bit, 64-bit)
- Enterprise-ready, with preservation of all state including TCP
- MetaCluster application containers complement virtual machines.

Facility-Wide File Sharing
- Each file system belongs to one server cluster, which is responsible for providing storage, administration, lock management, and recovery.
- Client nodes can mount a file system from the server cluster, request locks, and access data & metadata directly over the SAN.
- External access is also enabled through NFS and Samba.
[Diagram: remote mount from client clusters to the owning cluster.]

Multiple Active NSD Servers
[Diagram: a server cluster (NSD server nodes N1..Nx with disks) connected over networks F, G, and M to "red" and "blue" client clusters.]
- Client nodes favor NSD servers on the same subnet (presumably the same fabric).
- High availability: when the NSD server for a LUN is not reachable, fall back first to other NSD servers on the same subnet, then to NSD servers on other routable subnets (sketched below).
- Multi-cluster implications:
  - NSD servers for a file system must all be in the same cluster.
  - If using multiple active NSD servers in a multi-cluster environment, configure an NSD node on the client subnet as a member of the server cluster exporting the file system.
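The fall-back order described above can be summarized in a small illustrative C sketch; this is not GPFS source code, and the structure and function names are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for one NSD server candidate. */
struct nsd_server {
    const char *name;
    int         subnet_id;   /* subnet/fabric the server sits on            */
    bool        reachable;   /* current reachability as seen by the client  */
};

/* Pick a server for a LUN: prefer reachable servers on the client's own
 * subnet, then fall back to reachable servers on any routable subnet. */
const struct nsd_server *
choose_nsd_server(const struct nsd_server *servers, size_t n, int client_subnet)
{
    for (size_t i = 0; i < n; i++)        /* first pass: same subnet */
        if (servers[i].reachable && servers[i].subnet_id == client_subnet)
            return &servers[i];
    for (size_t i = 0; i < n; i++)        /* second pass: any routable subnet */
        if (servers[i].reachable)
            return &servers[i];
    return NULL;                          /* no NSD server reachable */
}
```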

Resource Management Vision
- Exploit GPFS WAFS and MetaCluster capability to make enterprise-wide scheduling and resource management efficient.
- Multi-platform: AIX and Linux already available; extending to System p (AIX, Linux), x86 Linux, and Blue Gene.
- Invest in enhancing LoadLeveler (LL) through advanced technologies:
  - Advance-reservation enhancements
  - Flexible usage model and management for various large-page pools
  - Exploitation of SMT in a more flexible fashion
  - Stronger affinity controls and usage limits
  - Faster and more scalable job launch
  - Modern GUI with a user-friendly, intuitive interface

Example Demonstrating Enterprise Operational Efficiency and the Scale-Across Vision
A new job is submitted to the enterprise job queue.
Without checkpoint/restart (C/R) and relocation:
- Cluster 1: 4 jobs running, 2 nodes free
- Cluster 2: 2 jobs running, 6 nodes free
- Utilization: Cluster 1: 87%, Cluster 2: 63%; enterprise utilization: 75%
With C/R and relocation (jobs relocated to a different cluster):
- Cluster 1: 3 jobs running, 0 nodes free
- Cluster 2: 4 jobs running, 0 nodes free
- Utilization: Cluster 1: 100%, Cluster 2: 100%; enterprise utilization: 100%
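These utilization figures are consistent with two 16-node clusters; the node count is an assumption for illustration, since the slide does not state it.

```latex
% Without C/R and relocation (assuming 16 nodes per cluster):
\frac{16-2}{16} = 87.5\% \approx 87\%, \qquad
\frac{16-6}{16} = 62.5\% \approx 63\%, \qquad
\frac{(16-2)+(16-6)}{32} = 75\%
% With C/R and relocation all nodes are busy: 32/32 = 100\%.
```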

System Operational Efficiency
Goal: maximize utilization and throughput of the systems in the enterprise.
Issues:
- Applications see inconsistent performance when run on a production system: each application shares resources with other applications, and OS jitter impacts application run time, especially at scale (see the measurement sketch below).
- Ensuring an intelligent balance of applications and utilization of the various components (CPU, memory, network, storage) is critical; the system needs a workload mix that utilizes all resources in a balanced fashion.
What is needed:
- Increased intelligence for system tuning parameters; continuous program optimization
- Ability to pre-empt, checkpoint, and relocate applications seamlessly throughout the enterprise
- Ability to share files seamlessly across the enterprise
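OS jitter is typically quantified with a fixed-work microbenchmark: time many iterations of identical work and examine the spread, since outliers reveal interruptions from daemons and other OS activity. A minimal C sketch of that technique follows (an illustration, not IBM's tooling).

```c
#include <stdio.h>
#include <time.h>

#define ITERS 10000      /* number of timed iterations        */
#define WORK  200000     /* fixed amount of work per iteration */

static double now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void) {
    volatile double x = 0.0;
    double min_us = 1e30, max_us = 0.0, total_us = 0.0;

    for (int i = 0; i < ITERS; i++) {
        double t0 = now_us();
        for (int j = 0; j < WORK; j++)   /* identical work every iteration */
            x += j * 0.5;
        double dt = now_us() - t0;
        if (dt < min_us) min_us = dt;
        if (dt > max_us) max_us = dt;
        total_us += dt;
    }
    /* On a quiet node, min and mean are close; a large max/min ratio
       indicates jitter from OS daemons and interrupts. */
    printf("min %.1f us, mean %.1f us, max %.1f us\n",
           min_us, total_us / ITERS, max_us);
    return 0;
}
```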

Systems Management
- Cluster install: robust, fast, push-button capabilities
- Concurrent and seamless upgrades
- Verification capabilities
- Monitoring of the network interconnect, power, fans, and other hardware components
- Drive towards zero administration and zero downtime

RAS
Reliability and availability:
- Storage: failures of controllers, disks, network, I/O servers, and compute nodes are tolerated
- Programming models: packet drops, link errors, adapter failures, network reachability; checkpoint, restart, and relocation
- Resource manager: compute node, network, and reachability challenges
Serviceability:
- Diagnostic tools
- Monitoring infrastructure, errpt, FSP, trace data, network diagnostics, storage, resource management, systems management

Balanced System Architecture & Design
[Diagram: hardware, operating systems, compilers & tools, resource management, and file I/O & data management, balanced across programmer productivity, system productivity, and admin productivity.]

BACKUP