GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid



Similar documents
GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid

Grid Data Management. Raj Kettimuthu

GridFTP: A Data Transfer Protocol for the Grid

GridCopy: Moving Data Fast on the Grid

Data Movement and Storage. Drew Dolgert and previous contributors

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

Real Time Analysis of Advanced Photon Source Data

Web Service Robust GridFTP

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand

Information Sciences Institute University of Southern California Los Angeles, CA {annc,

Globus XIO Pipe Open Driver: Enabling GridFTP to Leverage Standard Unix Tools

Data Management. Network transfers

A Tutorial on Configuring and Deploying GridFTP for Managing Data Movement in Grid/HPC Environments

globus online Integrating with Globus Online Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory

Chapter 7. Using Hadoop Cluster and MapReduce

globus online Reliable, high-performance file transfer as a service

Open Source File Transfers

Integration of Network Performance Monitoring Data at FTS3

Comparisons between HTCP and GridFTP over file transfer

DSS. Data & Storage Services. Cloud storage performance and first experience from prototype services at CERN

Information Sciences Institute University of Southern California Los Angeles, CA {annc,

Data Storage and Data Transfer. Dan Hitchcock Acting Associate Director, Advanced Scientific Computing Research

Scala Storage Scale-Out Clustered Storage White Paper

EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage

Addressing research data challenges at the. University of Colorado Boulder

Sector vs. Hadoop. A Brief Comparison Between the Two Systems

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Axceleon s CloudFuzion Turbocharges 3D Rendering On Amazon s EC2

File transfer in UNICORE State of the art

Science DMZs Understanding their role in high-performance data transfers

The glite File Transfer Service

Scalable Multi-Node Event Logging System for Ba Bar

Roadmap for Applying Hadoop Distributed File System in Scientific Grid Computing

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

globus online Cloud-based services for (reproducible) science Ian Foster Computation Institute University of Chicago and Argonne National Laboratory

A File Transfer Component for Grids

Using Globus Toolkit

Storage Techniques for RHIC Accelerator Data*

Measurement of BeStMan Scalability

File Transfer Best Practices

XSEDE Service Provider Software and Services Baseline. September 24, 2015 Version 1.2

Concepts and Architecture of the Grid. Summary of Grid 2, Chapter 4

Diagram 1: Islands of storage across a digital broadcast workflow

Panasas at the RCF. Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory. Robert Petkus Panasas at the RCF

13.1 Backup virtual machines running on VMware ESXi / ESX Server

IPv6 Traffic Analysis and Storage

On the features and challenges of security and privacy in distributed internet of things. C. Anurag Varma CpE /24/2016

File Transfer Examples. Running commands on other computers and transferring files between computers

DTI Image Processing Pipeline and Cloud Computing Environment

September, Collider-Accelerator Department. Brookhaven National Laboratory P.O. Box 5000 Upton, NY www. bnl.gov

1 Product. Open Text is the leading fax server vendor in the world. *

EXPLORATION TECHNOLOGY REQUIRES A RADICAL CHANGE IN DATA ANALYSIS

Considerations In Developing Firewall Selection Criteria. Adeptech Systems, Inc.

Scalable Windows Storage Server File Serving Clusters Using Melio File System and DFS

HADOOP, a newly emerged Java-based software framework, Hadoop Distributed File System for the Grid

Monitoring Clusters and Grids

Michał Jankowski Maciej Brzeźniak PSNC

globus online Cloud Based Services for Science Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory

The GRID and the Linux Farm at the RCF

A Reliable and Fast Data Transfer for Grid Systems Using a Dynamic Firewall Configuration

How To Build A Clustered Storage Area Network (Csan) From Power All Networks

EMC EXAM - E Backup and Recovery - Avamar Specialist Exam for Storage Administrators. Buy Full Product.

Topaz: Extending Firefox to Accommodate the GridFTP Protocol

Optimizing Data Management at the Advanced Light Source with a Science DMZ

An objective comparison test of workload management systems

bbc Adobe LiveCycle Data Services Using the F5 BIG-IP LTM Introduction APPLIES TO CONTENTS

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Australian Synchrotron, Storage Gateway

Globus Toolkit: Authentication and Credential Translation

Campus Network Design Science DMZ

Enabling petascale science: data management, troubleshooting and scalable science services

How To Test The Bandwidth Meter For Hyperv On Windows V (Windows) On A Hyperv Server (Windows V2) On An Uniden V2 (Amd64) Or V2A (Windows 2

Summer Student Project Report

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Get Success in Passing Your Certification Exam at first attempt!

Administering the Web Server (IIS) Role of Windows Server

EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise

Amazon Cloud Storage Options

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

A Web Services Data Analysis Grid *

StorReduce Technical White Paper Cloud-based Data Deduplication

How to Choose your Red Hat Enterprise Linux Filesystem

Research Data Storage, Sharing, and Transfer Options

Log Mining Based on Hadoop s Map and Reduce Technique

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

Bridgit Conferencing Software: Security, Firewalls, Bandwidth and Scalability

Cisco and EMC Solutions for Application Acceleration and Branch Office Infrastructure Consolidation

Ciphermail Gateway Separate Front-end and Back-end Configuration Guide

Distributed Systems Architectures

UDR: UDT + RSYNC. Open Source Fast File Transfer. Allison Heath University of Chicago

Directory and File Transfer Services. Chapter 7

BMC CONTROL-M Agentless Tips & Tricks TECHNICAL WHITE PAPER

Managing your Red Hat Enterprise Linux guests with RHN Satellite

Cloud Computing. Lecture 5 Grid Case Studies

How SafeVelocity Improves Network Transfer of Files

From Centralization to Distribution: A Comparison of File Sharing Protocols


White Paper. Amazon in an Instant: How Silver Peak Cloud Acceleration Improves Amazon Web Services (AWS)

Distributed File Systems

Internet Information TE Services 5.0. Training Division, NIC New Delhi

Transcription:

GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid Wantao Liu 1,2 Raj Kettimuthu 2,3, Brian Tieman 3, Ravi Madduri 2,3, Bo Li 1, and Ian Foster 2,3 1 Beihang University, Beijing, China 2 The University of Chicago, Chicago, USA 3 Argonne National Laboratory, Argonne, USA

Outline GridFTP overview GridFTP Challenges Commonly used GridFTP clients Zero configure GUI client Experimental results

GridFTP A secure, robust, fast, efficient, standards based, widely accepted data transfer protocol We also supply a reference implementation: Server Client tools (globus-url-copy) Development Libraries Multiple independent implementations can interoperate University of Virginia and Fermi Lab have home grown servers that work with ours. Lots of people have developed clients independent of the Globus Project.

GridFTP Two channel protocol like FTP Control Channel Communication link (TCP) over which commands and responses flow Low bandwidth; encrypted and integrity protected by default Data Channel Communication link(s) over which the actual data of interest flows High Bandwidth; authenticated by default; encryption and integrity protection optional

Striping GridFTP offers a powerful feature called striped transfers (cluster-to-cluster transfers)

GridFTP Servers Around the World Created by Lydia Prieto ; G. Zarrate; Anda Imanitchi (Florida State University) using MaxMind's GeoIP technology ( http://www.maxmind.com/app/ip-locate).

GridFTP in production Many Scientific communities rely on GridFTP High Energy Physics tiered data movement infrastructure for the LHC computing Grid LIGO routinely uses GridFTP to move 1 TB a day Southern California Earthquake Center (SCEC), Earth Systems Grid (ESG), Relativistic Heavy Ion Collider (RHIC), European Space Agency, BBC use GridFTP for data movement GridFTP facilitates an average of more than 5 million data transfers every day

Challenges Past success Standard big selling point for adoption Throughput GridFTP was sold on speed Robustness has to work all the time Current and future Ease-of-use Zero configuration clients Firewall Scalable Extensible

Globus-url-copy Commonly used command line scriptable client globus-url-copy [options] srcurl dsturl URL format - protocol://[user:pass@] [host]/path Users can do client/server and 3 rd party transfers using globus-url-copy

Other clients UberFTP Reliable file transfer service Custom clients using globus C and Java client libraries All these clients require non-trivial configuration Security setup None of these clients provide graphical user interface

Drag and drop Zero configuration GridFTP GUI Integrated with myproxy Automatically trusts the CAs part of IGTF distribution Fault tolerant Transfer status monitoring Optimized for performance

Snapshot of the GUI

Fault tolerant Better fault tolerance than other GridFTP clients Like other clients, GUI can recover from transient server and network failures Globus-url-copy can not recover from its own failures GUI can recover from its own failures Unlike RFT, stores information on the local file system

Lots of small files Scientific experiments produce huge volume of data the individual file size is modest, on the order of kilobytes or megabytes hundreds of thousands of files to transfer every day the size of the entire dataset is tremendous, from hundreds of gigabytes to hundreds of terabytes

Advanced Photon Source Advanced Photon Source at Argonne dozens of samples may be acquired for one experiment every day each sample generates about 2,000 raw data files after processing, each sample produces additional 2,000 reconstructed files each file is 8 to 16 MB in size

Lots of small files Transfer threads pool Move multiple files concurrently Maximize the utilization of network bandwidth Improve the transfer performance Two windows for status information Directory window lists all directories and their transfer status File window lists all files under the active directory

Experiment Setup We conducted all of our experiments using TeraGrid NCSA nodes and the University of Chicago nodes GridFTP GUI is compared with scp and globus-url-copy TCP is configured as the underlying data transport protocol

Experiment Results 180 160 globus-url-copy Transfer Time(Seconds) 140 120 100 80 60 40 globus-url-copy(p=4) scp GridFTP GUI 20 0 1 5 10 50 100 300 500 1000 File Size(MB) GridFTP GUI(p=4)

Experiment Results(cont.)

Questions 20