Cluster Grid Interconnects. Tony Kay, Chief Architect, Enterprise Grid and Networking

Agenda: Cluster Grid Interconnects; The Upstart - InfiniBand; The Empire Strikes Back - Myricom; Return of the King - 10 Gigabit Ethernet; Summary

Agenda: Cluster Grid Interconnects

Cluster Grid Problems. Performance is often determined by the interconnect. Third-party interconnects (IB, Myrinet, Quadrics) are expensive (card + port) and have limitations: ISV support, OS support, scalability. Not all of them address the storage issue; NAS still rules, and DAS is expensive.

Workload Performance Factors: processor speed, capacity and throughput; memory capacity; system interconnect latency and bandwidth; network and storage I/O; operating system scalability; visualization performance and quality; optimized applications; network service availability. The interconnect is the #1 issue for real-world cluster performance and scaling.

The Grid Architecture Dilemma: Scale Vertically or Scale Horizontally? Scale vertically: parallel applications (OpenMP), large shared memory, top performance, higher acquisition cost, lower development and management complexity and cost. Scale horizontally: serial and parallel applications (MPI), throughput, lower acquisition cost, higher development and management complexity and cost. The deciding factors: $/CPU and what the workloads require.
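
To make the programming-model split concrete, here is a minimal shared-memory sketch in C using OpenMP. It is an illustration added to this transcript, not code from the slides; the message-passing (MPI) counterpart appears later in the Myrinet section.

    /* Scale-vertically model: one address space, many threads (OpenMP).
     * Compile with, e.g., cc -fopenmp sum.c (the file name is a placeholder). */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;

        /* All threads share 'sum'; the reduction clause handles the update
         * safely. The programmer writes no explicit communication at all. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= n; i++)
            sum += 1.0 / i;

        printf("threads=%d, harmonic sum=%f\n", omp_get_max_threads(), sum);
        return 0;
    }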

Right Interconnect for the Job. 900 dual-CPU nodes, built as 7 x 256-CPU clusters: 3 x 256-CPU I/O-intense clusters on dual Gigabit Ethernet; 4 x 256-CPU MPI-intense clusters on Gigabit (TCP/IP) plus Myrinet (MPI). Choose the right interconnect for the job. Sun's CRS built 38 racks containing the 1,800-CPU cluster (6 lorry loads); final delivery: 1 lorry load.

Good News. Interconnects are getting faster, lighter (greater throughput per % of CPU), and cheaper, with multi-protocol support: InfiniBand is FC-AL and Ethernet friendly; Myrinet is Ethernet friendly.

Bad News: what to buy in 2005? There is no clear winner in 2005. Myrinet: the 'de facto' legacy standard? IB: the promise of a 'single fabric'. 10 Gigabit Ethernet: the undisputed networking world champion? Ethernet has already seen off Token Ring (1995), FDDI and ATM (late 1990s); Myrinet and InfiniBand: TBA?

Agenda: The Upstart - InfiniBand

Why InfiniBand? An open 'standard' with prices dropping quickly and a number of players: Mellanox, InfiniCon, Topspin, Voltaire. Optimised for low latency and high bandwidth: dual-port 10 Gb/s on PCI-X now (850 MB/s sustained), 30 Gb/s over PCI Express coming. One port, three roles: cluster (MPI), LAN (TCP/IP), storage.

InfiniBand Protocols - Rich. Host to host: SDP (Sockets Direct Protocol). Storage: SRP (SCSI RDMA Protocol, for IB and IB-to-FC bridges). Network: TCP termination and IP over IB. Also used for MPI, DAFS, NDIS, VIA. Rich ISV support for the APIs; the only 'open' RDMA. (Slide diagram: end nodes connected through a fabric of InfiniBand switches.)
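
As a concrete, hedged illustration of the RDMA model these protocols share, the sketch below sets up the basic resources (device context, protection domain, completion queue, registered buffer) using the OpenFabrics libibverbs API. The slides predate OpenFabrics and the vendor stacks of the day exposed different APIs, so treat this purely as an illustration of the concepts, not as the API referenced in the talk.

    /* Minimal RDMA resource setup with libibverbs (illustrative sketch only;
     * most error checking omitted for brevity). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        struct ibv_device **devs = ibv_get_device_list(NULL);
        if (!devs || !devs[0]) { fprintf(stderr, "no IB device found\n"); return 1; }

        struct ibv_context *ctx = ibv_open_device(devs[0]);          /* HCA handle        */
        struct ibv_pd *pd = ibv_alloc_pd(ctx);                       /* protection domain */
        struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);   /* completion queue  */

        /* Register a buffer so the HCA can DMA into it without kernel copies;
         * this is what lets SDP, SRP and MPI bypass the host TCP stack. */
        void *buf = malloc(4096);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (!mr) { fprintf(stderr, "memory registration failed\n"); return 1; }

        printf("registered 4 KB buffer, lkey=0x%x rkey=0x%x\n", mr->lkey, mr->rkey);

        /* A real program would now create a queue pair, exchange keys with the
         * peer, and post RDMA read/write or send/receive work requests. */
        ibv_dereg_mr(mr);
        free(buf);
        ibv_destroy_cq(cq);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }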

InfiniBand and Oracle 10g RAC Release 1. Currently on Red Hat Enterprise Linux; a Solaris port is underway, as are Topspin and Voltaire ports. (Slide diagram: application servers reach the DB servers via SQL*Net over SDP/IB; the RAC cluster interconnect runs over uDAPL.)

Agenda: The Empire Strikes Back - Myricom

Myrinet. The dominant MPI interconnect today: good OS and ISV support, good performance, light on host hardware and software, and an improving road-map. InfiniBand and the threat of 10G Ethernet are forcing innovation and big cost reductions.

Myricom Keeping Itself Relevant? Myrinet 10G has the 'look and feel' of 10 Gigabit Ethernet (10 GbE): the same physical infrastructure and 10 Gbit/s full duplex, but with lower latency, lower host CPU overhead and protocol offload. It is compatible with Myrinet-2000: same APIs and application support, and similar switches at a different data rate.

Myrinet Protocol Support. Low-level APIs: GM 1 (legacy), GM 2 (current standard), MX (new), and third-party APIs (e.g., SCore PM). TCP/IP and UDP/IP: Ethernet emulation included in all GM and MX releases; 1.95 Gb/s TCP/IP netperf benchmarks on Linux (2.4, 2.6). MPICH-GM and MPICH-MX: implementations of the Argonne MPICH directly over GM or MX; third-party MPI implementations over Myrinet are also available. VI-GM and VI-MX: implementations of the VI Architecture API directly over GM or MX. Sockets-GM and Sockets-MX: implementations of UNIX or Windows sockets (or DCOM) over GM or MX, completely transparent to application programs - use the same binaries! Additional middleware is in development, e.g., uDAPL and kDAPL (for distributed databases and file systems: DB2, Oracle 10g, ...).
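
To illustrate the 'same binaries' point, here is a minimal MPI ping-pong latency sketch, added for this transcript rather than taken from the slides: being plain MPI code, the same source builds against MPICH-GM, MPICH-MX or a TCP/IP MPI, and only the measured latency changes.

    /* Minimal MPI ping-pong latency sketch (illustrative).
     * Run with two ranks, e.g.: mpirun -np 2 ./pingpong */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 1000;
        char byte = 0;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("half round-trip latency: %.1f us\n",
                   (t1 - t0) / iters / 2.0 * 1e6);

        MPI_Finalize();
        return 0;
    }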

Agenda: Return of the King - 10 Gigabit Ethernet

10 Gigabit Ethernet (10 GbE). Gigabit Ethernet is ubiquitous, with a track record of 'killing' all comers: Token Ring, FDDI, ATM. Myrinet next? InfiniBand TBA? 10 GbE is faster (obviously!), has lower latencies, and is more capable, with start-up activity and storage capabilities.

10 GbE: Faster. 10x the throughput, and latencies are falling, e.g. (figures changing): Gigabit Ethernet sees 40-60 us for MPI over TCP/IP with the standard Linux stack and 16 us for MPI over TCP/IP with a user-mode stack; 10 GbE sees around 9 us over TCP/IP and 7 us over MPI. Lots of work around TOE, RDMA and Etherfabric.
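
Latency figures like these are typically taken from a small-message round-trip test. Below is a minimal sketch of the client side of such a test, added for this transcript to make clear what a 'us over TCP/IP' number measures; the peer address and port are placeholders, and it assumes a trivial echo server on the other end.

    /* Minimal TCP round-trip latency probe (illustrative sketch, not the
     * benchmark behind the slide's figures). Assumes a peer at
     * PEER_ADDR:PEER_PORT that echoes each byte back. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    #define PEER_ADDR "192.168.0.2"   /* placeholder */
    #define PEER_PORT 5001            /* placeholder */

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one); /* disable Nagle */

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof peer);
        peer.sin_family = AF_INET;
        peer.sin_port = htons(PEER_PORT);
        inet_pton(AF_INET, PEER_ADDR, &peer.sin_addr);
        if (connect(fd, (struct sockaddr *)&peer, sizeof peer) != 0) {
            perror("connect");
            return 1;
        }

        const int iters = 1000;
        char byte = 0;
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        for (int i = 0; i < iters; i++) {
            write(fd, &byte, 1);   /* 1-byte request  */
            read(fd, &byte, 1);    /* 1-byte response */
        }
        gettimeofday(&t1, NULL);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("mean round-trip: %.1f us\n", us / iters);
        close(fd);
        return 0;
    }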

10 GbE Momentum. Lots of start-ups: Level 5 Networks, Ammasso, Chelsio, S2io. Plus Myricom 'support'. What will the impact of iSCSI be?

Agenda: Summary

Summary. Good news: everything is getting faster and cheaper (both in cost and in impact on the host), more flexible, with better interoperability and richer protocols. Advice: look to the future and ask what you need; buy what best supports your applications; and look at the new features - can you take advantage of them?

Cluster Grid Interconnects Tony.Kay@Sun.Com