High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand




Hari Subramoni *, Ping Lai *, Raj Kettimuthu **, Dhabaleswar K. (DK) Panda *
* Computer Science and Engineering Department, The Ohio State University, USA
** Mathematics and Computer Science Division, Argonne National Laboratory, USA

Outline
- Introduction & Motivation
- Designing Globus-XIO ADTS Driver
- Experimental Results
- Conclusions & Future Work

Introduction
- Increasing demand in high-end computing has led to the deployment of compute and storage nodes on a global scale
- Efficient bulk data transfer within and across such clusters is important for data-set distribution, content replication, remote-site backup, etc.
- GridFTP, designed using the Globus XIO framework, is the most popular mechanism for this in Grids
- However, the performance of GridFTP over WAN is limited by the relatively low communication bandwidth of existing network protocols

Introduction (cont.)
- System Area Networks (SANs) are gaining momentum
  - InfiniBand, 10 Gigabit Ethernet/iWARP, etc.
  - High bandwidth, low latency, and unique capabilities
- InfiniBand
  - Open industry standard
  - High performance
    - High bandwidth (~40 Gbps)
    - Low latency (~1 us)
  - Multiple transport modes, including RC and UD
  - Two communication semantics
    - Channel semantics: send/recv
    - Memory semantics: RDMA operations
- WAN capabilities
  - Obsidian Longbow routers
  - Bay company products

InfiniBand WAN
[Figure: Cluster A and Cluster B, each attached to an Obsidian WAN router (variable delay), joined by a WAN link]
- Point-to-point inter-cluster links
- Single Data Rate (SDR)
- Varying the delay emulates the WAN distance
- Links emulate each km of WAN link length with an increase of 5 us to each packet latency

Delay (us)    Distance emulated (km)
0             0
10            2
100           20
1000          200
10000         2000
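The delay-to-distance mapping on this slide is a single linear rule (5 us of added packet latency per km). A minimal sketch of that conversion follows; the function and constant names are illustrative and not taken from the slides or from any Obsidian tooling.

```python
# Sketch of the slide's rule: each km of emulated WAN link adds 5 us of latency.
# Names here (US_PER_KM, emulated_distance_km, ...) are hypothetical.
US_PER_KM = 5


def emulated_distance_km(delay_us: float) -> float:
    """Distance emulated by a given router delay setting."""
    return delay_us / US_PER_KM


def delay_for_distance_us(distance_km: float) -> float:
    """Router delay needed to emulate a given WAN distance."""
    return distance_km * US_PER_KM


if __name__ == "__main__":
    # Reproduces the delay/distance pairs listed on the slide.
    for delay in (0, 10, 100, 1000, 10000):
        print(f"{delay:>6} us -> {emulated_distance_km(delay):>6.0f} km")
```

The same rule can be inverted to pick a delay setting for a target distance, for example `delay_for_distance_us(200)` for a 200 km link.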

Motivation
- Historically, wide-area data transport has been handled mostly by TCP
- Performance achieved using TCP is only a small fraction of the available bandwidth
- Although multiple efforts have been made to improve the performance of TCP or to implement reliable transfer on top of UDP, none of them is able to utilize the complete network bandwidth
- InfiniBand, on the other hand, is able to saturate the network
- Previous work has shown that IB performs better over WAN than optimized versions of TCP [1]

[Figure: "IB RC Verbs Bandwidth". Bandwidth (MBps) versus message size (bytes, 2 to 2M) for IBVerb(DDR), IPoIB(DDR), SDP(DDR), 10GigE, and IPoIB(SDR)]

[1] N. Rao, S. Poole, P. Newman, and S. Hicks, Wide Area InfiniBand RDMA: Experimental Evaluation, HPiDC'09

Motivation (cont.)
- Other interesting developments have been taking place in parallel
- The authors previously designed a File Transfer Protocol over IB (FTP-ADTS [2]) with features such as zero-copy and pipelined data transfers
- However, the initial version of FTP-ADTS was not designed for disk-based data transfers, a necessity for the large-volume data transfers used by real-world applications such as UltraViz and CCSM

[2] P. Lai, H. Subramoni, S. Narravula, A. Mamidala, and D. K. Panda, Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand, ICPP'09

Problem Statement
- Multiple solutions exist for data transfers in Grid
  - GridFTP
    - Well accepted, well defined, and easy to use
    - Low performance
  - FTP-ADTS
    - High performance
    - Basic FTP interface and less support for disk-based data transfers
- A hybrid approach, combining the ease of use of GridFTP with the high performance offered by FTP-ADTS, could be optimal
- Can we provide native IB support to GridFTP by designing a new Globus-XIO ADTS driver for GridFTP?
- Can we enhance the design of FTP-ADTS for disk-based data transfers by using separate Disk-IO threads to stage data from slower disks to memory?


Communication Options in Grid
[Figure: protocol stack. GridFTP and High Performance Computing Applications sit on the Globus XIO Framework, which runs over IB Verbs, IPoIB, RoCE, or TCP/IP, across Obsidian routers or a 10 GigE network]
- Multiple options exist to perform data transfer on Grid
- The Globus-XIO framework currently does not support IB natively
- We create the Globus-XIO ADTS driver and add native IB support to GridFTP

Globus-XIO Framework with ADTS Driver
[Figure: layered architecture, top to bottom]
- User
- Globus XIO Interface
  - Transform drivers: Globus XIO Transform Driver #1 to #n
  - Transport driver: Globus-XIO ADTS Driver
    - Data Connection Management
    - Persistent Session Management
    - Data Transport Interface
    - Buffer & File Management (attached to the File System)
    - Zero Copy Channel
    - Flow Control
    - Memory Registration
- Modern WAN Interconnects: InfiniBand / RoCE, 10GigE/iWARP
- Network

Buffer and File Management
- Two types of buffers
  - File buffer
    - Used to stage data from the source, by the network or Disk-IO threads
    - Size varied from 2 MB to 64 MB
  - Network buffer
    - Used by the network thread to inject data into the network
    - Set to 1 MB based on the results of a previous evaluation
- The disk read thread pre-fetches a set of locations from the disk into a circular buffer
  - Size of one element of the circular buffer = size of the file buffer
  - Low and high watermarks limit the minimum and maximum number of file buffers read by the disk read thread before it yields to the network thread
- The write thread follows a similar design
- The low and high watermark values, as well as the size of the file buffer, need to be carefully tuned to achieve the best performance
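The staging scheme described on this slide can be illustrated with a small producer/consumer sketch. This is not the ADTS implementation: the class and function names are hypothetical, an in-memory byte stream stands in for the disk, a Python list stands in for the network, and the watermark behaviour is one plausible reading of the slide (the reader fills until the high watermark is reached, then pauses until the network thread has drained the buffer down to the low watermark). Buffer sizes are scaled down from MB to KB so the example runs instantly.

```python
import io
import threading
from collections import deque


class StagingBuffer:
    """Queue of file buffers with low/high watermark hysteresis (assumed semantics)."""

    def __init__(self, low: int, high: int):
        assert 0 <= low < high
        self.low, self.high = low, high
        self._slots = deque()
        self._cond = threading.Condition()
        self._filling = True   # True while the disk reader is allowed to add buffers
        self._done = False     # set once the source is exhausted

    def put(self, chunk: bytes) -> None:
        with self._cond:
            while not self._filling:          # paused after reaching the high watermark
                self._cond.wait()
            self._slots.append(chunk)
            if len(self._slots) >= self.high:
                self._filling = False         # yield to the network thread
            self._cond.notify_all()

    def close(self) -> None:
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._slots and not self._done:
                self._cond.wait()
            if not self._slots:
                return None                   # source exhausted and buffer drained
            chunk = self._slots.popleft()
            if len(self._slots) <= self.low:
                self._filling = True          # let the disk reader resume
                self._cond.notify_all()
            return chunk


def disk_reader(src, staging: StagingBuffer, file_buffer_size: int) -> None:
    """Pre-fetch file-buffer-sized chunks from the source into the staging buffer."""
    while chunk := src.read(file_buffer_size):
        staging.put(chunk)
    staging.close()


def network_sender(staging: StagingBuffer, sink: list, network_buffer_size: int) -> None:
    """Drain staged file buffers and emit them in network-buffer-sized pieces."""
    while (chunk := staging.get()) is not None:
        for off in range(0, len(chunk), network_buffer_size):
            sink.append(chunk[off:off + network_buffer_size])


def transfer(data: bytes, file_buffer_size: int, network_buffer_size: int,
             low: int, high: int) -> list:
    staging, sent = StagingBuffer(low, high), []
    reader = threading.Thread(target=disk_reader,
                              args=(io.BytesIO(data), staging, file_buffer_size))
    sender = threading.Thread(target=network_sender,
                              args=(staging, sent, network_buffer_size))
    reader.start()
    sender.start()
    reader.join()
    sender.join()
    return sent


if __name__ == "__main__":
    payload = bytes(range(256)) * 64          # 16 KB stand-in for a file
    pieces = transfer(payload, file_buffer_size=4096, network_buffer_size=1024,
                      low=1, high=3)
    print(len(pieces), "network buffers sent; intact:", b"".join(pieces) == payload)
```

In this sketch the three tunables named on the slide appear as `file_buffer_size`, `low`, and `high`, which makes it easy to see how they interact: a larger gap between the watermarks lets the reader run longer between pauses, at the cost of more staged memory.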

Buffer and File Management (cont.)
[Figure: data path. File System, Disk Read Thread and Disk Write Thread, Circular Buffer, Network Communication Thread, InfiniBand Host Channel Adapter, Physical Link]


Experimental Setup
- Experimental testbed
  - Dual quad-core Xeon Nehalem processors at 2.40 GHz with 12 GB RAM and PCIe 2.0 interface
  - Red Hat Enterprise Linux Server release 5.3, Linux kernel 2.6.18-128
  - InfiniBand (IB) QDR ConnectX HCAs with OFED 1.4.2
- For all IPoIB (TCP/IP) based tests, auto-tuning of the socket buffers was enabled
- The HTCP congestion control mechanism was used because of a bug in the CUBIC congestion control protocol on the 2.6.18 kernels
- Nodes are connected with Obsidian WAN routers
- Numbers shown are for the FTP get operation

Performance of Memory Based Data Transfer
[Figure: two panels, "ADTS Driver" and "UDT Driver". Bandwidth (MBps, 0 to 1200) versus network delay (0, 10, 100, 1000 us) for file buffer sizes of 2 MB, 8 MB, 32 MB, and 64 MB]
- Performance numbers obtained while transferring 128 GB of aggregate data in chunks of 256 MB files at application level
- The ADTS-based implementation is able to saturate the link bandwidth
- Best performance for ADTS is obtained when transferring data with a file buffer of size 32 MB
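To make the workload size concrete, the arithmetic behind "128 GB in chunks of 256 MB" can be sketched as follows. The helper names are illustrative, binary units (GiB/MiB) are an assumption since the slide does not say which convention it uses, and the rate passed to `ideal_transfer_seconds` is a placeholder rather than a measured result.

```python
# Workload arithmetic sketch; units and helper names are assumptions.
MiB = 1024 ** 2
GiB = 1024 ** 3


def file_count(total_bytes: int, chunk_bytes: int) -> int:
    """Number of files needed to carry total_bytes in chunk_bytes pieces (ceiling)."""
    return -(-total_bytes // chunk_bytes)


def ideal_transfer_seconds(total_bytes: int, bytes_per_second: float) -> float:
    """Lower bound on transfer time at a sustained rate, ignoring all overheads."""
    return total_bytes / bytes_per_second


if __name__ == "__main__":
    print("files per run:", file_count(128 * GiB, 256 * MiB))
    # Placeholder rate of 1000 MB/s, chosen only to illustrate the formula.
    print("ideal seconds at 1000 MB/s:", ideal_transfer_seconds(128 * GiB, 1000e6))
```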

Performance of Disk Based Data Transfer
[Figure: two panels, "Synchronous Mode" and "Asynchronous Mode". Bandwidth (MBps, 0 to 400) versus network delay (0, 10, 100, 1000 us) for ADTS-8MB, ADTS-16MB, ADTS-32MB, ADTS-64MB, and IPoIB-64MB]
- Performance numbers obtained while transferring 128 GB of aggregate data in chunks of 256 MB files at application level
- Predictable as well as better performance when Disk-IO threads assist the network thread (asynchronous mode)
- Best performance for ADTS is obtained with a circular buffer whose individual buffers (also the file buffers) are 64 MB each

Application Level Performance
[Figure: Bandwidth (MBps, 0 to 300) for the target applications CCSM and Ultra-Viz, comparing ADTS and IPoIB]
- Application performance for the FTP get operation for disk-based transfers
- Community Climate System Model (CCSM)
  - Part of the Earth System Grid Project
  - Transfers 160 TB of total data in chunks of 256 MB
  - Network latency: 30 ms
- Ultra-Scale Visualization (Ultra-Viz)
  - Transfers files of size 2.6 GB
  - Network latency: 80 ms
- The ADTS driver outperforms the UDT driver using IPoIB by more than 100%
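For readers unsure how to interpret "by more than 100%", the usual relative-improvement formula is sketched below; under that reading, an improvement above 100% means the ADTS bandwidth is more than double the baseline. The function name and the sample values are hypothetical and are not numbers from the slide.

```python
def percent_improvement(new_mbps: float, baseline_mbps: float) -> float:
    """Relative improvement of new over baseline, in percent."""
    return (new_mbps - baseline_mbps) / baseline_mbps * 100.0


if __name__ == "__main__":
    # Hypothetical bandwidths, chosen only to illustrate the formula.
    baseline, new = 100.0, 210.0
    print(f"{percent_improvement(new, baseline):.0f}% improvement")
```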


Conclusions & Future Work
- Designed and implemented an ADTS-based Globus-XIO driver for GridFTP, capable of utilizing the optimizations done in ADTS, including the memory registration cache and pipelined data transfer
- Proposed and designed a simple data staging mechanism for the ADTS library, allowing us to achieve better and more predictable file transfer performance
- GridFTP running over the new Globus-XIO ADTS driver achieves significantly better performance (up to 100% improvement) for both disk-based and memory-based transfers
- Future work
  - Conduct experiments on actual wide area networks and study the impact of cross traffic on the performance of the ADTS driver
  - Explore the impact of various transport-level features, such as the retransmission timeout, on the performance of the ADTS driver
  - Investigate whether InfiniBand, as a protocol, is suited for WAN data transfers

Thank you
{subramon, laipi, panda}@cse.ohio-state.edu
{kettimut}@mcs.anl.gov
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
Mathematics and Computer Science Division: http://mcs.anl.gov/

Questions?