Comparisons between HTCP and GridFTP over file transfer




Andrew McNab and Yibiao Li

Abstract: This paper compares the file transfer speed of the GridFTP [1] and HTCP [2] protocols, based on experimental results collected by transferring files to various destinations with each protocol. The mean value and the standard deviation are used in the statistical analysis of the results, and a percentage offset analysis is introduced for analysing several sets of experimental data together. A comparison between the HTTP [3] and FTP [4] protocols for file transfer is also discussed. The results show that HTCP file transfer is, on average, slightly faster than GridFTP file transfer.

Keywords: HTCP, GridFTP, file transfer, percentage offset

Background

Both HTCP and GridFTP were developed as client-side tools for grid computing [3]. GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. The GridFTP protocol is based on FTP and extends it with facilities such as multistreamed transfer, autotuning and Globus-based security. GridFTP supports the following features: Grid Security Infrastructure (GSI) and Kerberos support, third-party control of data transfer, parallel data transfer, striped data transfer, partial file transfer, automatic negotiation of TCP buffer/window sizes, and reliable and restartable data transfers.

HTCP was developed as part of a family of Unix commands for performing file operations on remote HTTP and HTTPS file servers. In particular, these commands provide client-side support for the GridSite extensions to the Apache web server, including GSI proxies, authorization by X.509 certificate, PUT / DELETE / MOVE support, and GridHTTP bulk data transfers. HTCP also supports third-party control of data transfer, partial file transfer and parallel data transfer. There are therefore great similarities between these two protocols (software tools).
The fundamental difference is that GridFTP is based on the FTP protocol whereas HTCP is based on the HTTP(S) protocol. The difference in file transfer mode between HTTP and FTP carries over to HTCP and GridFTP. GridFTP, following FTP, uses two TCP channels to transfer a file: one for control and one for data. HTCP, following HTTP, normally uses a single TCP channel. Because HTCP and GridFTP are built on HTTP and FTP respectively, and because there are few articles or reports comparing these two protocols for file transfer, we start by comparing HTTP with FTP and hope that the comparison can be extended to HTCP and GridFTP.

Due to the complexity of the HTTP and FTP protocols [5] [6], as well as of HTCP and GridFTP, it is almost impossible for us to compare each pair in theory and then conclude which protocol is better for file transfer. Instead, we run speed tests to collect as much data as we can and base the result on statistical analysis. Network traffic varies from time to time, and the capacity of the hardware (CPU, memory and network devices) affects network transfers, so test results may vary with the time of the test and with the test computer. Building a fair test environment is therefore critical. For the same reason, we cannot carry out a precise quantitative analysis that yields an exact result. Also, both FTP and HTTP are general protocols, and many software tools are derived from each. A comparison of the two protocols in general would be unwise, but as we demonstrate in later sections, a comparison between a particular pair of tools derived from them is feasible.

Mathematical analysis methods

To analyse the test results, we use the following mathematical methods. The mean value is the average of the numbers, while the standard deviation tells how much a set of values varies from its average.

Percentage offset and distribution

To analyse a single set of data, the mean value and the standard deviation may be enough. However, as the following sections show, we combine many sets of data in the analysis. Because each set of data has its own range, it is difficult to put different sets together and apply one mathematical method to all of them. We therefore introduce the percentage offset, defined as the offset of a file transfer speed from the mean value of its set of data, expressed as a percentage of that mean.
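The verbal definitions above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `percentage_offsets`, the sign convention (value minus reference, divided by the reference), the use of the population standard deviation, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def percentage_offsets(values, reference=None):
    """Return each value's offset from `reference`, as a percentage of it.

    Assumption: if no reference is given, the mean of `values` is used,
    following the verbal definition of the percentage offset.
    """
    if reference is None:
        reference = mean(values)
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return [(v - reference) / reference * 100.0 for v in values]


# Hypothetical measurements for one test (not data from the paper).
sample = [8.0, 10.0, 12.0, 10.0]

print("mean:", mean(sample))
# Assumption: population form; the source does not say which form it uses.
print("standard deviation:", pstdev(sample))
print("percentage offsets:", percentage_offsets(sample))
```

Because every offset is expressed relative to its own set's reference value, offsets from sets with very different ranges can be placed on one common percentage scale.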
Mean value and standard deviation

Suppose that N tests have been run and the test results (file transfer times) are $x_1, x_2, \ldots, x_N$. The mean value of the test results is then defined as:

[Equation not preserved in the source.]

and the standard deviation is defined as:

[Equation not preserved in the source.]

As an example of the percentage offset, suppose we collect a set of file transfer speed data over HTTP for a specified server and a specified client:

[Equation not preserved in the source.]

where j stands for the j-th test of file transfer over HTTP, and $n_j$ is the number of data points collected in the j-th test. Once we specify a value $V_{j1}$, the percentage offsets of this sequence from $V_{j1}$ are:

[Equation not preserved in the source.]

Similarly, for the j-th test of file transfer over FTP, the speed data are:

[Equation not preserved in the source.]

where j stands for the j-th test of file transfer over FTP, and $m_j$ is the number of data points collected in the j-th test. Once we specify a value $V_{j2}$, the percentage offsets of this sequence from $V_{j2}$ are:

[Equation not preserved in the source.]
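The displayed equations of this section did not survive in the source. One plausible reading of the surrounding text is sketched below, purely as an assumption: the element symbols $h_{ji}$ and $f_{ji}$, the population form of the standard deviation, and the sign convention of the offsets are illustrative choices, and the source may use $H_j$ and $F_j$ for the offset sequences rather than the raw data.

```latex
\begin{gather*}
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2} \\
H_j = \{h_{j1}, h_{j2}, \ldots, h_{jn_j}\}, \qquad
\alpha_{ji} = \frac{h_{ji} - V_{j1}}{V_{j1}} \times 100\%, \quad i = 1, \ldots, n_j \\
F_j = \{f_{j1}, f_{j2}, \ldots, f_{jm_j}\}, \qquad
\beta_{ji} = \frac{f_{ji} - V_{j2}}{V_{j2}} \times 100\%, \quad i = 1, \ldots, m_j
\end{gather*}
```

Under this reading, choosing $V_{j1}$ as the mean of the HTTP data makes the offsets $\alpha_{ji}$ sum to zero, which matches the verbal definition of the offset as a deviation from the mean of its own data set.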

To compare the two sequences $H_j$ and $F_j$, we define:

[Equation not preserved in the source.]

so that the comparison is based on the same value $V_j$ for the j-th test. Based on the percentage sequences, we can make two comparisons. The first is the general offset, calculated as the sum of the sequence, which gives the speed comparison. The second is the distribution comparison: we specify a percentage value P and count the entries of the H and F sequences whose $\alpha_{ji}$ or $\beta_{ji}$ is smaller than P. This comparison gives the likelihood that transfer over one protocol is faster than over the other. Let us now apply this method to one of the tests.

HTTP vs. FTP

The original HTTP (version 0.9) provided a way to publish and retrieve hypertext pages; the protocol (now at version 1.1) has since been extended to transfer information on intranets and the World Wide Web. FTP, the File Transfer Protocol, is used to transfer data from one computer to another over the Internet or another network. Specifically, FTP is a commonly used protocol for exchanging files over any network that supports TCP/IP (such as the Internet or an intranet).

FTP has drawn a lot of criticism over its security issues, such as its use of two TCP/IP connections and its conflicts with firewalls. As a result, people have turned to HTTP(S) for file transfer. The advantages of file transfer over HTTP(S) are clear: it occupies only one TCP/IP connection, the connection can be persistent, and it adds no extra cost on the server. Although there are many arguments about file transfer speed over HTTP(S) compared with FTP, we have not found a quantitative analysis of the issue so far. The common answer on the Internet is that FTP is more efficient for transferring larger files, whereas HTTP is more efficient for smaller files.
Theoretically, both HTTP and FTP sit in the application layer of the ISO/OSI model [12], and in practice both are implemented on top of TCP connections. (HTTP(S) is not restricted to TCP/IP and can use other transports, but the most popular HTTP(S) products use TCP/IP.) If two packages of the same size are transferred through the same TCP connection under the same network traffic conditions, there should in theory be no difference in delivery time. However, HTTP and FTP use the TCP connection through different mechanisms, which causes a difference in transfer speed.

Measuring the difference in file transfer speed between HTTP and FTP is quite difficult. An analysis [9] of RTT (round-trip time), comparing the FTP sequence diagram [7] with the HTTP sequence diagram [8], shows that HTTP is slightly faster than FTP in the TCP exchanges, but each protocol involves many other mechanisms during file transfer, so the RTT analysis alone is not conclusive. Instead of struggling to compare the two protocols in theory, we run experiments in practice. After building a test environment and collecting file transfer speed data over HTTP and FTP, we analyse the data and try to draw a conclusion.

To build a fair environment for the test, we equipped one computer with both an HTTP server and an FTP server to act as the test server. We then wrote a client test tool in C, using the libcurl [10] API and multithreading [11]; the flowchart of the test tool is shown in Fig. 1. libcurl provides low-level programming support for file transfer over HTTP and FTP, and multithreading enables us to start the HTTP and FTP file transfers simultaneously. Before running the tests, we copy the file to be transferred into two directories, one that can be fetched through HTTP and one that can be fetched through FTP. The test tool then transfers the two files (actually the same file) to a destination over HTTP and FTP, saving them under different names or in different locations. This ensures that there is no conflict in file I/O on either the source server or the destination, and that both copies are transferred under the same network traffic conditions. The test is therefore conducted under fair conditions for HTTP and FTP.

Figure 1: A flowchart of the test tool that collects data from file transfers over HTTP and FTP.

Local network test

Before running the remote network test, we run the test in a local network, which has little traffic congestion. We execute the client test tool on a computer within the server's local network; the collected data are plotted in Figure 2.
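The client test tool described above was written in C with libcurl, and its source is not reproduced in the paper. As a rough illustration of the same idea (two transfers started at the same time, saved to different files and timed separately), the following Python sketch uses only the standard library instead of libcurl. The function names, the `opener` parameter and the URLs in the usage note are hypothetical.

```python
import threading
import time
import urllib.request


def timed_transfer(url, dest_path, results, key, opener=urllib.request.urlopen):
    """Download `url` to `dest_path` and store the elapsed seconds in results[key].

    `opener` is a hypothetical hook so that the network call can be replaced;
    by default it is urllib.request.urlopen, which accepts http:// and ftp:// URLs.
    """
    start = time.perf_counter()
    with opener(url) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
    results[key] = time.perf_counter() - start


def run_pair(http_url, ftp_url, http_dest, ftp_dest, opener=urllib.request.urlopen):
    """Start the HTTP and FTP transfers together and return both elapsed times.

    The two copies are written to different paths so that the transfers do not
    conflict in file I/O. If a transfer fails, its key is missing from the result.
    """
    results = {}
    threads = [
        threading.Thread(target=timed_transfer,
                         args=(http_url, http_dest, results, "http", opener)),
        threading.Thread(target=timed_transfer,
                         args=(ftp_url, ftp_dest, results, "ftp", opener)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A call such as `run_pair("http://server.example/file.bin", "ftp://server.example/file.bin", "http_copy.bin", "ftp_copy.bin")` (placeholder host and file names) would return a dictionary with one elapsed time per protocol. Repeating the call gives the kind of paired series that is plotted per transfer in the figures.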

Figure 2: Results for the comparison of 1 megabyte file transfers over the HTTP and FTP protocols. The red line shows HTTP transfers and the blue line shows FTP transfers.

Fig. 2 shows the results when a 1 megabyte file was transferred 100 times within a local network over HTTP and FTP. At points 3, 22, 31 and 86, the curves for both HTTP and FTP jump to levels far from their averages: the file transfers slow down because there is other network traffic at these points. Apart from these points, the curves for HTTP and FTP are both nearly flat, with few oscillations. In this test, file transfer over HTTP is faster than over FTP, and the two curves follow each other closely.

Another test, in which a 100 megabyte file is transferred within a local network, leads to similar conclusions, as shown in Fig. 3. Because transferring a 100 megabyte file takes longer than transferring a 1 megabyte file and adds more network traffic, there are more oscillations in Fig. 3 than in Fig. 2. It can still be seen clearly that the file transfer speed over HTTP follows the speed over FTP and that, on average, HTTP is faster than FTP.

Figure 3: Results for the comparison of 100 megabyte file transfers over the HTTP and FTP protocols. The red line shows HTTP transfers and the blue line shows FTP transfers.

From the above two tests, we know that:
1. The transfer speeds over HTTP and FTP are stable if there is no disturbance.
2. The transfer speed over HTTP should coincide with the transfer speed over FTP.
Although the above two tests show that file transfer over HTTP is faster than over FTP, we cannot draw a conclusion from these two tests alone, because we cannot predict the result after the environment changes, for example if the server is changed, the network devices are changed or the test tool is reprogrammed.

Grid network test

The previous section presented the local network tests over HTTP and FTP. Before extending the tests to the comparison of GridFTP and HTCP, we present tests over HTTP and FTP in a grid network. As before, we equip a grid node with an HTTP server and an FTP server, use the client test tool described in the previous section to download a specific file over HTTP and FTP at the same time, and then collect the

speed data. In practice, we use GridPP as the test environment. The grid nodes are distributed worldwide, so the routes over which the files are transferred may consist of tens or hundreds of computers or network devices, which means that network traffic is more complicated in the grid network than in a local network. With a different client or server, the test results may vary over quite a wide range.

Figure 4: Results of 100 megabyte file transfers over HTTP and FTP in the grid network. Red triangles show HTTP transfer data and blue diamonds show FTP transfer data.

CURL command tests

The previous sections presented comparisons of file transfer between HTTP and FTP. Those results may not be convincing, because the test tool was built by the authors. We now turn to the CURL command.

The test environment is similar to that of the local network test, but here the clients are chosen from grid nodes that have the CURL command available (it comes with most recent popular Linux/Unix systems). We put the same file under the HTTP and FTP directories, transfer it over HTTP and FTP in turn with the CURL command, and record the respective transfer times. We repeat this a specified number of times, for example 20 times, and collect the speed data of the transfers. Once enough tests have been run and enough data collected, we can analyse the data and see which of CURL HTTP and CURL FTP is better for file transfer. However, a test on a particular client can only be regarded as a particular case; to draw a general conclusion, we must run tests on as many different clients as we can. In practice, tens of clients in different geographic locations were chosen for the tests. Fig. 5 shows a plot of the mean speeds of the tests. Although the transfer speeds go up and down across the tests, the speeds over HTTP and FTP coincide well, just as in the previous section.
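The alternating CURL measurements described above could be scripted along the following lines. This is a sketch under assumptions rather than the authors' procedure: the paper does not list the curl options it used, so the options here (`-s`, `-o /dev/null`, `-w "%{time_total}"`) are one common way to obtain a transfer time from curl, and the function names and the `run` parameter are hypothetical.

```python
import subprocess


def build_curl_command(url):
    """Build a curl call that discards the body and prints the total time in seconds."""
    return ["curl", "-s", "-o", "/dev/null", "-w", "%{time_total}", url]


def time_transfers(http_url, ftp_url, repeats=20, run=subprocess.run):
    """Fetch the same file over HTTP and FTP in turn, `repeats` times each.

    Returns a dict with one list of transfer times (seconds) per protocol.
    `run` is a hypothetical hook so that the curl call can be replaced.
    """
    times = {"http": [], "ftp": []}
    for _ in range(repeats):
        for key, url in (("http", http_url), ("ftp", ftp_url)):
            done = run(build_curl_command(url),
                       capture_output=True, text=True, check=True)
            # Assumes curl prints the time with a decimal point.
            times[key].append(float(done.stdout.strip()))
    return times
```

Each list returned by `time_transfers` is one data set for one client, from which the mean value and standard deviation of that test can be computed.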
CURL is a command-line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP and other protocols. Because it is open source and has long been in widespread use, it should conform well to the HTTP and FTP protocols, which is why we test the file transfer speed with the CURL command instead of our own test tool.

Figure 5: Mean speeds for the tests using CURL HTTP and CURL FTP. Red stars show the mean values collected with CURL HTTP, and blue circles show the mean values collected with CURL FTP.

From the comparison of the mean transfer speeds over HTTP and FTP using the CURL command, we can see that in most cases the speeds are equivalent. Occasionally, the mean

speeds of CURL HTTP and CURL FTP differ, for example in tests 3 and 6 in Fig. 5. Comparing with the standard deviations (Fig. 6), we find that when the mean values of CURL HTTP and CURL FTP differ noticeably, as in tests 3 and 6 in Fig. 5, the corresponding tests have large standard deviations. A large deviation is normally caused by abnormal data (values far from the mean). Here the large standard deviations are caused by unusual file transfer speeds resulting from unstable network conditions.

Figure 6: Standard deviations for the speed tests using CURL HTTP and CURL FTP. Red stars show the standard deviations calculated from the CURL HTTP data, and blue circles those calculated from the CURL FTP data.

Figure 7: Percentage offsets of the speeds collected from the CURL HTTP and CURL FTP tests. Red stars show the CURL HTTP speed offsets, and blue circles show the CURL FTP speed offsets.

Figure 8: Distribution of the file transfer speed for the CURL command over HTTP and FTP. Red bars show the percentage data of the speeds over HTTP, and blue bars show the percentage data of the speeds over FTP.

HTCP vs. GridFTP

Following the CURL tests in the grid network, we present the HTCP and GridFTP tests in the grid network. Using the same test environment as in the previous section, and replacing the CURL command over HTTP with HTCP and the CURL command over FTP with GridFTP, we perform the tests. The plots of mean values, standard deviations, percentage offsets and distribution are shown in Figures 9 to 12, respectively.

Figure 9: Mean values of the file transfer speeds collected from tests fetching the same file with HTCP and GridFTP from various grid nodes. Red stars show the mean speeds from the HTCP tests, and blue circles show the mean speeds from the GridFTP tests.

Figure 10: Standard deviations for the speed tests using HTCP and GridFTP. Red stars show the standard deviations calculated from the HTCP data, and blue circles those calculated from the GridFTP data.

Figure 9 clearly shows that the HTCP and GridFTP mean speeds are quite close and that, at most nodes, the HTCP mean speed is slightly faster than the GridFTP mean speed. From Figure 10, however, we see that HTCP transfers have a larger standard deviation than GridFTP transfers. Mathematically, a larger standard deviation arises either because a few individual values lie far from the mean or because most values lie away from the mean. Figure 11 shows that only a few of the HTCP percentage offsets lie far from the mean value, which suggests that the larger HTCP standard deviation comes from a few individual transfers that deviate strongly from the mean.

Figure 11: Percentage offsets of the speed data collected from the HTCP and GridFTP tests. Red stars show the HTCP speed offsets, and blue circles show the GridFTP speed offsets.

Figure 12: Distribution of the file transfer speed for HTCP and GridFTP. Red bars show the percentage data of the speeds over HTCP, and blue bars show the percentage data of the speeds over GridFTP.
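As an illustration of how a distribution comparison of the kind shown in Figure 12 might be computed, the sketch below counts, for each threshold P, the share of percentage offsets that are smaller than P, following the verbal description of the distribution comparison in the Mathematical analysis methods section. The function names, the thresholds and the sample offsets are hypothetical, and whether the paper's bars are cumulative shares like these or separate bins is not stated in the source.

```python
def fraction_below(offsets, threshold):
    """Return the share of percentage offsets that are smaller than `threshold`."""
    if not offsets:
        raise ValueError("offsets must not be empty")
    return sum(1 for offset in offsets if offset < threshold) / len(offsets)


def distribution(offsets, thresholds):
    """Map each threshold P to the share of offsets smaller than P."""
    return {p: fraction_below(offsets, p) for p in thresholds}


# Hypothetical percentage offsets for two tools (not data from the paper).
htcp_offsets = [-20.0, 0.0, 20.0, 5.0]
gridftp_offsets = [-5.0, 0.0, 5.0, 10.0]

for p in (0.0, 10.0, 25.0):
    print(p, fraction_below(htcp_offsets, p), fraction_below(gridftp_offsets, p))
```

Comparing the two shares at the same threshold indicates how often each tool stays below a given offset, which is the comparison the distribution plots are meant to support.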

Conclusions

In the previous sections, we compared file transfer speeds between HTTP and FTP, between CURL HTTP and CURL FTP, and between HTCP and GridFTP. The comparison between HTTP and FTP indicates that the two protocols, both application-layer protocols sharing the underlying TCP connection during file transfer, are affected similarly by network traffic. The comparison between CURL HTTP and CURL FTP shows that the speeds of the CURL command over HTTP and over FTP are almost level, and that the speeds under unstable network conditions cannot be predicted. The comparison between HTCP and GridFTP shows that, as a Unix/Linux command, HTCP can transfer files slightly faster than GridFTP on average.

GridFTP can only operate on files on or between GridFTP servers, on which the GridFTP package has been installed, and the user's computer is also required to have a GridFTP client package installed. The HTCP command, by contrast, can operate on files on a normal website or GridSite node; the user's computer only needs the HTCP command installed, which makes HTCP easier to use than GridFTP.

[1] http://www.globus.org/grid_software/data/gridftp.php, GridFTP information entry
[2] https://www.gridsite.org/wiki/htcp_command, HTCP information entry
[3] http://en.wikipedia.org/wiki/http, HTTP information page
[4] http://en.wikipedia.org/wiki/file_transfer_protocol, FTP information page
[5] RFC 2616 Hypertext Transfer Protocol HTTP/1.1, UC Irvine, J. Gettys, J. Mogul, DEC & H. Frystyk, T. Berners-Lee, MIT/LCS, Jan 1997
[6] RFC 959 File Transfer Protocol (FTP). J. Postel, J. Reynolds. Oct 1985.
[7] http://www.eventhelix.com/realtimemantra/networking/ftp.pdf, FTP Sequence Diagram
[8] http://www.eventhelix.com/realtimemantra/networking/http_sequence_diagram.pdf, HTTP Sequence Diagram
[9] http://www.isi.edu/lsam/publications/http perf/, RTT analysis to compare HTTP and FTP
[10] http://curl.haxx.se/libcurl/, libcurl introduction
[11] http://www.webopedia.com/term/m/multithreading.htm, Multithreading technology
[12] http://en.wikipedia.org/wiki/osi_model, ISO/OSI model and some protocols