Infiniband Monitoring of HPC Clusters using LDMS (Lightweight Distributed Monitoring System)




OpenFabrics Software User Group Workshop
Infiniband Monitoring of HPC Clusters using LDMS (Lightweight Distributed Monitoring System)
Christopher Beggio, Sandia National Laboratories

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXXP

Goals
- Understand how users/applications are utilizing network resources:
  - Intra-job communications
  - File system communications (PFS, NFS)
- Understand network congestion: intensity, longevity, extent
- Drive job scheduling and resource allocation based on:
  - Dynamic state information
  - Knowledge of user/application historic utilization
  - File system locations

Current Impediments
- Understanding of user/application network needs is occluded by possible congestion both internal and external to the application.
- The opacity of core switches lends itself only to a vague notion of congestion characteristics.
- No snapshot capability, because counter readings aren't synchronized (see the alignment sketch below).
- Rapid sampling of the core fabric adversely impacts application traffic.
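On the hosts, LDMS mitigates the synchronization problem by sampling on wall-clock-aligned interval boundaries, so samples taken on different nodes line up in time. A minimal sketch of interval-aligned sampling in C (illustrative only, not the LDMS implementation):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Sleep until the next wall-clock instant that is a multiple of
     * interval_us, so every node wakes at (nearly) the same moment. */
    static void sleep_until_aligned(uint64_t interval_us)
    {
        struct timespec now, next;
        clock_gettime(CLOCK_REALTIME, &now);
        uint64_t now_us  = (uint64_t)now.tv_sec * 1000000 + now.tv_nsec / 1000;
        uint64_t next_us = (now_us / interval_us + 1) * interval_us;
        next.tv_sec  = next_us / 1000000;
        next.tv_nsec = (next_us % 1000000) * 1000;
        clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
    }

    int main(void)
    {
        for (int i = 0; i < 5; i++) {
            sleep_until_aligned(1000000);   /* 1 s aligned sampling */
            printf("sample %d\n", i);       /* read counters here */
        }
        return 0;
    }

Because every node rounds up to the same absolute boundary, the collected readings approximate a fabric-wide snapshot without any coordination traffic.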

Deployment Configuration: Data Collection, Transport, and Storage
[Diagram: ldmsd daemons exchanging metric data over RDMA and socket transports.]

Chama: 1232-node TLCC2 cluster (SNL)
[Diagram: LDMS data moves via RDMA over IB and via sockets over the private internal network.]
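As a concrete reference point, a sampler daemon on a Chama compute node might be configured as below. This is a hypothetical sketch: the component_id, set name, and ports option are illustrative, and the exact configuration keywords vary across LDMS versions (see the OVIS documentation).

    # Load the InfiniBand sysfs sampler, configure it, and sample at 1 Hz.
    load name=sysclassib
    config name=sysclassib component_id=466 set=chama466/sysclassib ports=qib0.1
    start name=sysclassib interval=1000000    # interval in microseconds

An aggregator ldmsd on a service node would then pull chama466/sysclassib over the RDMA transport and hand it to a storage plug-in such as the CSV store.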

LDMS Architecture
[Diagram: the LDMS daemon holds metric sets in memory. Sampler plug-ins (e.g., a memory sampler and an HSN sampler) populate metric sets through the sampler plug-in interface and the LDMS API (libldms). The transport driver interface provides RDMA and socket transports. The storage plug-in interface feeds stores (e.g., a CSV store and others) that write to CSV, MySQL, or flat-file storage.]

Metric Set Format
[Diagram of the metric set memory layout.]

IB Host Based Metric Set

chama466/sysclassib: consistent, last update: Tue Jan 20 20:36:00 2015 [6129us]
U64          0  ib.symbol_error#qib0.1
U64          0  ib.link_error_recovery#qib0.1
U64          0  ib.link_downed#qib0.1
U64          0  ib.port_rcv_errors#qib0.1
U64          0  ib.port_rcv_remote_physical_errors#qib0.1
U64          0  ib.port_rcv_switch_relay_errors#qib0.1
U64          0  ib.port_xmit_discards#qib0.1
U64          0  ib.port_xmit_constraint_errors#qib0.1
U64          0  ib.port_rcv_constraint_errors#qib0.1
U64          0  ib.local_link_integrity_errors#qib0.1
U64          0  ib.excessive_buffer_overrun_errors#qib0.1
U64          0  ib.vl15_dropped#qib0.1
U64 5526619018  ib.port_xmit_data#qib0.1
U64 5502039184  ib.port_rcv_data#qib0.1
U64   17515642  ib.port_xmit_packets#qib0.1
U64   17586331  ib.port_rcv_packets#qib0.1
U64          0  ib.port_xmit_wait#qib0.1
U64       3851  ib.port_unicast_xmit_packets#qib0.1
U64       7711  ib.port_unicast_rcv_packets#qib0.1
U64        118  ib.port_multicast_xmit_packets#qib0.1
U64     122831  ib.port_multicast_rcv_packets#qib0.1
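On the host side, these metrics come from the Linux sysfs counter files, e.g. /sys/class/infiniband/qib0/ports/1/counters/port_xmit_data. A minimal standalone reader in C (device, port, and counter name hardcoded for illustration):

    #include <inttypes.h>
    #include <stdio.h>

    /* Read one counter exported by the IB driver under
     * /sys/class/infiniband/<dev>/ports/<port>/counters/<name>. */
    static int read_ib_counter(const char *dev, int port, const char *name,
                               uint64_t *val)
    {
        char path[256];
        snprintf(path, sizeof(path),
                 "/sys/class/infiniband/%s/ports/%d/counters/%s",
                 dev, port, name);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        int rc = fscanf(f, "%" SCNu64, val);
        fclose(f);
        return rc == 1 ? 0 : -1;
    }

    int main(void)
    {
        uint64_t v;
        if (read_ib_counter("qib0", 1, "port_xmit_data", &v) == 0)
            printf("port_xmit_data = %" PRIu64 "\n", v);
        return 0;
    }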

Host Side Collection (would also like this from every core link)

Errors: symbol_error, link_error_recovery, port_rcv_errors, port_rcv_remote_physical_errors, port_rcv_switch_relay_errors, port_xmit_constraint_errors, port_rcv_constraint_errors, local_link_integrity_errors, excessive_buffer_overrun_errors

Data info (see the bandwidth sketch below): port_xmit_data, port_rcv_data, port_xmit_packets, port_rcv_packets

Misc. info: port_xmit_wait, port_xmit_discards, vl15_dropped, link_downed
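A routine computation over the data counters is per-link bandwidth from successive samples. Per the InfiniBand spec, port_xmit_data and port_rcv_data count 4-byte words, so deltas are multiplied by 4 to get bytes. A small C sketch with hypothetical sample values:

    #include <stdint.h>
    #include <stdio.h>

    /* Estimate transmit bandwidth in MB/s from two samples of
     * port_xmit_data, which counts 32-bit words (4 bytes each). */
    static double xmit_mb_per_sec(uint64_t prev, uint64_t curr, double dt_sec)
    {
        uint64_t delta_words = curr - prev;   /* assumes no wraparound */
        return (double)delta_words * 4.0 / dt_sec / 1e6;
    }

    int main(void)
    {
        /* Two hypothetical readings taken 1 s apart. */
        printf("%.2f MB/s\n",
               xmit_mb_per_sec(5526619018ULL, 5527619018ULL, 1.0));
        return 0;
    }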

Other LDMS Metric Set Examples

shuttle-cray.ran.sandia.gov_1/meminfo
U64  160032  MemFree
U64  181728  Buffers
U64 3443332  Cached
U64   33076  SwapCached
U64 2987544  Active

shuttle-cray.ran.sandia.gov_1/procstatutil
U64   1826564  cpu0_user_raw
U64    699631  cpu0_sys_raw
U64 663843760  cpu0_idle_raw
U64    201018  cpu0_iowait_raw

shuttle-cray.ran.sandia.gov_1/vmstat
U64  40008  nr_free_pages
U64 122286  nr_inactive_anon
U64 321902  nr_active_anon
U64 465532  nr_inactive_file
U64 424986  nr_active_file

Metric sets are (datatype, value, metricname) tuples, with optional per-metric user metadata (e.g., a component id).
API: ldms_get_set, ldms_get_metric, ldms_get_u64. The same API is used for on-node and off-node transport.
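To make the tuple structure concrete, below is a toy in-memory model of a metric set with a name-based accessor in the spirit of ldms_get_metric. This is not the libldms representation; every type and field here is invented for illustration (the real API lives in the OVIS source tree):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy model: (datatype, value, name) tuples plus optional
     * per-metric user metadata such as a component id. */
    enum metric_type { LDMS_U64 };   /* real LDMS supports more types */

    struct metric {
        enum metric_type type;
        uint64_t         value;
        const char      *name;
        uint64_t         component_id;  /* optional user metadata */
    };

    struct metric_set {
        const char   *instance;         /* e.g. "chama466/sysclassib" */
        size_t        count;
        struct metric metrics[32];
    };

    /* Name-based lookup, analogous in spirit to ldms_get_metric(). */
    static const struct metric *get_metric(const struct metric_set *s,
                                           const char *name)
    {
        for (size_t i = 0; i < s->count; i++)
            if (strcmp(s->metrics[i].name, name) == 0)
                return &s->metrics[i];
        return NULL;
    }

    int main(void)
    {
        struct metric_set s = {
            .instance = "chama466/sysclassib",
            .count = 1,
            .metrics = {{ LDMS_U64, 5526619018ULL,
                          "ib.port_xmit_data#qib0.1", 466 }},
        };
        const struct metric *m = get_metric(&s, "ib.port_xmit_data#qib0.1");
        if (m)
            printf("%s = %llu\n", m->name, (unsigned long long)m->value);
        return 0;
    }

Because a consumer only ever navigates the set through accessors like these, it does not care whether the set's memory was filled by a local sampler or replicated over RDMA, which is what "same API for on-node and off-node transport" means in practice.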

Would Like
- Synchronized counter collection across all switches:
  - Configurable at boot (fixed set)
  - On-the-fly collection frequency changes
- Offload IB counter collection from compute nodes (see the MAD query sketch below)
- Currently working on a reference implementation using the LDMS protocol
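Offloading collection from the computes is essentially what Performance Management MADs already allow: any node with fabric access can query PortCounters on a remote LID, which is what perfquery from infiniband-diags does. A hedged C sketch using libibmad; the calls follow perfquery's usage, but exact signatures and constants may differ across library versions:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <infiniband/mad.h>

    int main(int argc, char **argv)
    {
        /* Query the PortCounters attribute on a remote LID/port with a
         * PerfMgt MAD, roughly what `perfquery <lid> <port>` does. */
        int mgmt_classes[] = { IB_SMI_CLASS, IB_SA_CLASS,
                               IB_PERFORMANCE_CLASS };
        struct ibmad_port *srcport =
            mad_rpc_open_port(NULL, 0, mgmt_classes, 3);
        if (!srcport)
            return 1;

        ib_portid_t portid = { 0 };
        portid.lid = (argc > 1) ? atoi(argv[1]) : 1;  /* target LID */
        int port   = (argc > 2) ? atoi(argv[2]) : 1;  /* target port */

        uint8_t pc[1024] = { 0 };
        if (!pma_query_via(pc, &portid, port, 0 /* default timeout */,
                           IB_GSI_PORT_COUNTERS, srcport))
            return 1;

        /* PortCounters fields are 32-bit; data counters are 4-byte words. */
        uint32_t xmit_data = 0;
        mad_decode_field(pc, IB_PC_XMT_BYTES_F, &xmit_data);
        printf("port_xmit_data = %u (4-byte words)\n", xmit_data);

        mad_rpc_close_port(srcport);
        return 0;
    }

Build against libibmad (-libmad); like perfquery, it needs access to the local umad device. A switch port's counters are read the same way using the switch's LID, which is the basis for offloading collection from the compute nodes.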

IB Traffic, 12/9/2014 [figure]

Lustre Traffic, 12/9/2014 [figure]

IB Traffic, 1/22/2015 [figure]

Lustre Traffic, 1/22/2015 [figure]

IB Traffic Information
[Fabric diagram: links are marked to distinguish where traffic information is available ("have information") from where it is not ("no information").]

Summary
- Goals
- Impediments
- LDMS architecture
- IB host metrics
- IB traffic
- Lustre traffic
- Path forward

Questions? https://github.com/ovis-hpc/ovis

Thank You OpenFabrics Software User Group Workshop #OFSUserGroup