Comparisons between HTCP and GridFTP over file transfer

Andrew McNab and Yibiao Li

Abstract: This paper compares the file transfer speeds of the GridFTP [1] and HTCP [2] protocols, based on experimental results collected by transferring files to various destinations with each protocol. The mean value and standard deviation are used in the statistical analysis of the results, and a percentage offset analysis is introduced for analysing several sets of experimental data together. A comparison between the HTTP [3] and FTP [4] protocols for file transfer is also discussed. The results show that HTCP file transfer is slightly faster than GridFTP file transfer on average.

Keywords: HTCP, GridFTP, file transfer, percentage offset

Background

Both HTCP and GridFTP were developed as client-side tools for grid computing [3]. GridFTP is a high performance, secure, reliable data transfer protocol optimized for high bandwidth wide area networks. The GridFTP protocol is based on FTP and extends it with facilities such as multistreamed transfer, autotuning and Globus based security. GridFTP supports the following features: Grid Security Infrastructure (GSI) and Kerberos support, third party control of data transfer, parallel data transfer, striped data transfer, partial file transfer, automatic negotiation of TCP buffer/window sizes, and support for reliable and restartable data transfers.

HTCP is developed as part of a family of Unix commands for performing file operations on remote HTTP and HTTPS fileservers. In particular, these commands provide client-side support for the GridSite extensions to the Apache web server, including GSI proxies, authorization by X.509 certificate, PUT / DELETE / MOVE support, and GridHTTP bulk data transfers. HTCP also supports third party control of data transfer, partial file transfer and parallel data transfer. So there are great similarities between these two protocols (software tools).
However, the fundamental difference is that GridFTP is based on the FTP protocol whereas HTCP is based on the HTTP(S) protocol. The difference in file transfer mode between HTTP and FTP carries over to HTCP and GridFTP. GridFTP, following FTP, occupies two TCP channels to transfer a file: one channel is used for control and the other for data transfer. HTCP, following HTTP, normally uses a single TCP channel for file transfer.

Because HTCP and GridFTP are built on HTTP and FTP, respectively, and there are few articles or reports comparing these two protocols for file transfer, we start by comparing HTTP with FTP and hope that the comparison can be extended to HTCP and GridFTP. Owing to the complexity of the HTTP and FTP protocols [5] [6], as well as of the HTCP and GridFTP protocols, it is
almost impossible for us to compare each pair in theory and then conclude which protocol is better for file transfer. Instead, we run speed tests to collect as much data as we can and base our result on a statistical analysis. Network traffic varies from time to time, and the capacity of the hardware, such as the CPU, memory and network devices, affects network transfers, so the test results may vary with the time of the test and with the test computer. Building a fair test environment is therefore critical. For the same reason, we cannot carry out a fully quantitative analysis and give an exact result. Also, both FTP and HTTP are general protocols from which many software tools are derived. Any attempt to compare the two protocols in general is unwise, but, as we demonstrate in later sections, comparing a particular pair of tools derived from them is feasible.

Mathematical analysis methods

To analyse the test results, we use the following mathematical methods.

Mean value and standard deviation

The mean value is the average of a set of numbers, while the standard deviation tells how much the values vary from that average. Suppose that N tests have been done and the test results (file transfer times) are $x_1, x_2, \ldots, x_N$. The mean value of the test results is defined as:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

and the standard deviation is defined as:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2}$$

Percentage offset and distribution

To analyse a single set of data, the mean value and the standard deviation may be enough. However, as the following sections show, we combine many sets of data in the analysis. Because each set of data has its own range, it is difficult to put different sets together and apply a single mathematical method to them. We therefore introduce a percentage offset, defined as the percentage offset of a file transfer speed from the mean value of its set of data.

For example, suppose we collect a set of file transfer speed data over HTTP for a specified server and a specified client:

$$H_j = \{h_{j1}, h_{j2}, \ldots, h_{j n_j}\}$$

where j stands for the j-th test of file transfer over HTTP and $n_j$ is the number of data points collected in the j-th test. Once we specify a value $V_{j1}$, the percentage offsets of this sequence from $V_{j1}$ are:

$$\alpha_{ji} = \frac{h_{ji} - V_{j1}}{V_{j1}} \times 100\%, \quad i = 1, 2, \ldots, n_j$$

Similarly, for the j-th test of file transfer over FTP, the speed data are:

$$F_j = \{f_{j1}, f_{j2}, \ldots, f_{j m_j}\}$$

where j stands for the j-th test of file transfer over FTP and $m_j$ is the number of data points collected in the j-th test. Once we specify a value $V_{j2}$, the percentage offsets of this sequence from $V_{j2}$ are:

$$\beta_{ji} = \frac{f_{ji} - V_{j2}}{V_{j2}} \times 100\%, \quad i = 1, 2, \ldots, m_j$$

To compare the two sequences $H_j$ and $F_j$, we take a common reference value:

$$V_{j1} = V_{j2} = V_j$$

so that the comparison is based on the same value $V_j$ for the j-th test.

Based on the percentage sequences, we can make two comparisons. One is the general offset, calculated as the sum of each sequence, which gives the speed comparison. The other is the distribution comparison: we specify a percentage value P and count the entries of the H and F sequences for which $\alpha_{ji}$ or $\beta_{ji}$ is smaller than P. This comparison gives the probability that transfer over one protocol is faster than over the other. Let us now apply this method to one of the tests.

HTTP vs. FTP

The original HTTP (version 0.9) provided a way to publish and retrieve hypertext pages; it has since been extended (version 1.1) to convey information on intranets and the World Wide Web. FTP, the File Transfer Protocol, is used to transfer data from one computer to another over the Internet or another network. Specifically, FTP is a commonly used protocol for exchanging files over any network that supports TCP/IP (such as the Internet or an intranet).

FTP has attracted a lot of criticism over security-related issues such as its use of two TCP/IP connections and its conflicts with firewalls, so people have turned to HTTP(S) for file transfer. The advantages of file transfer over HTTP(S) are clear: it occupies only one TCP/IP connection, supports persistent connections, adds no extra cost on the server, and so on. Although there is much debate about file transfer speed over HTTP(S) compared with FTP, we have not found a quantitative analysis of this issue so far. The common answer on the Internet is that FTP is more efficient for transferring larger files, whereas HTTP is more efficient for smaller files.
Theoretically, both HTTP and FTP sit in the application layer of the ISO/OSI standard model [12], and in practice both are implemented on top of TCP connections. (HTTP(S) is not constrained to TCP/IP and can use other transports, but the most popular HTTP(S) products use TCP/IP.) If two packages of the same size are transferred through the same TCP connection under the same network traffic conditions, there should in theory be no difference in delivery time. However, HTTP and FTP use the TCP connection through different mechanisms, which causes a difference in transfer speed.

Measuring the difference in file transfer speed between HTTP and FTP is quite difficult. An analysis [9] of RTT (round trip time), comparing the FTP sequence diagram [7] with the HTTP sequence diagram [8], shows that HTTP is slightly faster than FTP in terms of TCP round trips. There are, however, many other mechanisms involved in file transfer over each of the two protocols, so the RTT analysis alone cannot settle the question. Instead of struggling to compare the two protocols in theory, we run experiments in practice. After building an environment and collecting file transfer speed data over HTTP and FTP, we analyse the data and try to reach a conclusion.
To build a fair environment for the test, we equipped one computer with both an HTTP server and an FTP server to act as the test server. We then wrote a client test tool in C, using the libcurl [10] API and multithreading [11]; the flowchart of the test tool is shown in Figure 1. libcurl provides low level programming support for file transfer over HTTP and FTP, and multithreading enables us to start the HTTP and FTP file transfers simultaneously. Before running the tests, we copy the file to be transferred into two directories, one that can be fetched through HTTP and one that can be fetched through FTP. The test tool then transfers the two files (actually the same file) to a destination over HTTP and FTP, saving them under different names or in different locations. This ensures that there is no file IO conflict on either the source server side or the destination side, and that both transfers take place under the same network traffic conditions. The test conditions are therefore fair to both HTTP and FTP.

Figure 1: a flow chart of the test tool used to collect data from file transfer over HTTP and FTP, respectively.

Local network test

Before running the remote network test, we run the test in a local network, which has little network congestion. We execute the client test tool on a computer within the server's local network; the collected data are plotted in Figure 2.
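The simultaneous-transfer idea behind the test tool described earlier in this section can be sketched as follows. This is only an illustrative Python sketch, not the authors' C/libcurl implementation: the function names `timed_fetch` and `run_pair`, the result keys, and the use of `urllib.request` in place of libcurl are all assumptions made for the example.

```python
import shutil
import threading
import time
import urllib.request


def timed_fetch(url, dest_path, results, key):
    """Download url into dest_path and store the elapsed seconds in results[key].

    Hypothetical helper: the original tool uses libcurl from C instead.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    results[key] = time.perf_counter() - start


def run_pair(http_url, ftp_url, http_dest, ftp_dest):
    """Start both transfers together so they see the same network conditions."""
    results = {}
    threads = [
        threading.Thread(target=timed_fetch,
                         args=(http_url, http_dest, results, "http")),
        threading.Thread(target=timed_fetch,
                         args=(ftp_url, ftp_dest, results, "ftp")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Each copy is saved to its own destination, so the two transfers
    # never write to the same file.
    return results
```

Calling `run_pair` repeatedly with an HTTP URL and an FTP URL that point at copies of the same file would yield one pair of transfer times per repetition, which is the kind of data plotted in the figures of this section.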
Figure 2: a plot of results for comparison of 1 megabyte file transfer over HTTP and FTP protocols. File transfer speed over HTTP is drawn in red and over FTP in blue.

Figure 2 shows the file transfer speed when a 1 megabyte file was transferred 100 times within a local network over HTTP and FTP. At points 3, 22, 31 and 86, the curves for both HTTP and FTP jump to levels far from their averages: the transfers slow down because of other network traffic at these points. Apart from these points, the curves for HTTP and FTP are both nearly flat, with few oscillations. In this test, file transfer over HTTP is faster than over FTP, and the HTTP curve follows the FTP curve closely.

Another test, in which a 100 megabyte file is transferred within a local network, leads to similar conclusions, as shown in Figure 3. Because transferring a 100 megabyte file takes longer than transferring a 1 megabyte file and adds more network traffic, there are more oscillations in Figure 3 than in Figure 2. It can still be seen clearly that the file transfer speed over HTTP coincides with the speed over FTP and that, on average, file transfer over HTTP is faster than over FTP.

Figure 3: a plot of results for comparison of 100 megabyte file transfer over HTTP and FTP protocols. File transfer speed over HTTP is drawn in red and over FTP in blue.

From the above two tests, we know that:
1. The transfer speeds over HTTP and FTP are stable if there is no disturbance.
2. The transfer speed over HTTP should coincide with the transfer speed over FTP.
Although the two tests above show that file transfer over HTTP is faster than over FTP, we cannot draw a general conclusion from them alone, because we cannot predict the result once the environment changes, for example if the server or the network devices are changed or the test tool is reprogrammed.

Grid network test

The previous section presented local network tests over HTTP and FTP. Before extending the tests to a comparison of GridFTP and HTCP, we present tests over HTTP and FTP in a grid network. As before, we equip a grid node with an HTTP server and an FTP server, use the client test tool described in the previous section to download a specific file over HTTP and FTP at the same time, and then collect the
speed data. In practice, we use GridPP as the test environment. Grid nodes are distributed worldwide, so the routes over which files are transferred may consist of tens or hundreds of computers and network devices, which makes network traffic more complicated in the grid network than in a local network. With a different client or server, the test results may vary over quite a wide range.

Figure 4: a plot of results of 100 megabyte file transfer over HTTP and FTP in the grid network. Red triangles show HTTP transfer data and blue diamonds show FTP transfer data.

CURL command tests

The previous sections presented comparisons of file transfer between HTTP and FTP. Those results may not be convincing, because the test tool was built by the authors. We therefore turn to the CURL command. CURL is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP, etc. Because it is open source and has long been widely used, it should conform well to the HTTP and FTP protocols. So we test the file transfer speed using the CURL command instead of the test tool.

The test environment is similar to that of the local network test, but here the clients are chosen from grid nodes that have the CURL command available (it comes with most recent Linux/Unix distributions). We put the same file under the HTTP and FTP directories, transfer it over HTTP and over FTP in turn with the CURL command, and record the transfer time of each. We repeat this a specified number of times, for example 20 times, and collect the speed data for the transferred files. Once enough tests have been run and enough data collected, we can analyse the data and see which of CURL HTTP and CURL FTP is better for file transfer. However, a test on one particular client is only a particular case; to draw a general conclusion, we must run tests on as many different clients as we can. In practice, tens of clients in different geographic locations were chosen for the tests.

Figure 5 shows a plot of the mean speeds of the tests. Although the transfer speeds go up and down across the tests, the speeds over HTTP and FTP coincide well, just as in the previous section.

Figure 5: A plot of mean speeds for the tests using CURL HTTP and CURL FTP. Red stars stand for the mean values collected by CURL HTTP, while blue circles stand for the mean values collected by CURL FTP.

Comparing the mean values of the transfer speed over HTTP and FTP using the CURL command, we can see that in most cases the speeds are equivalent. Occasionally the mean speeds of CURL HTTP and CURL FTP differ, for example in tests 3 and 6 in Figure 5. Comparing these with the standard deviations (Figure 6), we find that the tests in which the CURL HTTP and CURL FTP mean values differ noticeably also have large standard deviations. A large deviation is normally caused by abnormal data points far from the mean value. Here the large standard deviations are caused by unusual file transfer speeds resulting from unstable network conditions.

Figure 6: A plot of standard deviations for the speed tests using CURL HTTP and CURL FTP. The red stars show the standard deviations calculated from the data collected by CURL HTTP, the blue circles those from the data collected by CURL FTP.

Figure 7: A plot of percentage offsets of the speeds collected from the CURL HTTP and CURL FTP tests. The red stars stand for the CURL HTTP speed offsets, the blue circles stand for the CURL FTP speed offsets.

Figure 8: A plot of the distribution of the file transfer speed for the CURL command over HTTP and FTP. The red bars stand for the percentage data of the speeds over HTTP, the blue bars stand for the percentage data of the speeds over FTP.

HTCP vs. GridFTP

Following the CURL tests in the grid network, we now present the HTCP and GridFTP tests in the grid network. We use the same test environment as in the previous section, replacing the CURL command for file transfer over HTTP with HTCP and the CURL command for file transfer over FTP with GridFTP. Plots of the mean values, standard deviations, percentage offsets and distribution are shown in Figures 9 to 12, respectively.
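As a rough illustration, the four quantities plotted for each pair of tools (mean, standard deviation, percentage offset and distribution) could be computed along the following lines. The function names and sample numbers are hypothetical, the standard deviation divides by N, and the pooled mean used as the common reference value is an assumption, since the text only requires that both sequences share the same reference.

```python
import math


def mean(values):
    """Arithmetic mean of a list of measurements."""
    return sum(values) / len(values)


def std_dev(values):
    """Standard deviation dividing by N (an assumption about the convention)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def percentage_offsets(values, reference):
    """Offset of each measurement from the reference value, in percent."""
    return [100.0 * (v - reference) / reference for v in values]


def fraction_below(offsets, p):
    """Share of offsets that are smaller than the threshold p (in percent)."""
    return sum(1 for o in offsets if o < p) / len(offsets)


# Hypothetical measurements for one test j; not data from the paper.
htcp_samples = [10.0, 11.0, 9.0, 10.0]
gridftp_samples = [11.0, 12.0, 10.0, 11.0]

# Assumption: the shared reference value is the mean of the pooled data.
reference = mean(htcp_samples + gridftp_samples)

alpha = percentage_offsets(htcp_samples, reference)
beta = percentage_offsets(gridftp_samples, reference)

# General offset: the sum of each percentage sequence.
print("general offsets:", sum(alpha), sum(beta))
# Distribution: share of offsets below a chosen threshold P (here P = 0).
print("share below P:", fraction_below(alpha, 0.0), fraction_below(beta, 0.0))
```

Because both sequences are measured against the same reference, sets of data with different ranges can be placed on one percentage scale and compared directly.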
Figure 9: A plot of mean values of file transfer speeds collected from the tests fetching the same file using HTCP and GridFTP from the various grid nodes. The red stars show the mean speeds collected from the HTCP tests and the blue circles show the mean speeds collected from the GridFTP tests.

Figure 10: A plot of standard deviations for the speed tests using HTCP and GridFTP. The red stars show the standard deviations calculated from the data collected by HTCP, the blue circles those from the data collected by GridFTP.

Figure 9 clearly shows that the HTCP and GridFTP mean speeds are quite close and that, at most nodes, the HTCP mean speed is slightly faster than the GridFTP mean speed. From Figure 10, the plot of standard deviations, we see that HTCP transfers have a larger standard deviation than GridFTP transfers. Mathematically, a larger standard deviation is caused either by a few individual values lying far from the mean value or by most values lying away from the mean value. Figure 11 shows that a few HTCP percentage offsets lie far from the mean value, so HTCP file transfers are more likely than GridFTP transfers to deviate from the mean value.

Figure 11: A plot of percentage offsets of the speed data collected from the HTCP and GridFTP tests. The red stars stand for the HTCP speed offsets, the blue circles stand for the GridFTP speed offsets.

Figure 12: a plot of the distribution of the file transfer speed for HTCP and GridFTP. The red bars stand for the percentage data of the speeds over HTCP, the blue bars stand for the percentage data of the speeds over GridFTP.
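The two causes of a large standard deviation distinguished above can be told apart numerically. The following sketch uses two made-up samples with the same standard deviation, one containing a single outlier and one that is widely spread; the helper name `share_far_from_mean` and the half-deviation threshold are assumptions chosen only for illustration.

```python
import statistics


def share_far_from_mean(values, k=0.5):
    """Fraction of values more than k standard deviations from the mean."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)  # divides by N
    return sum(1 for v in values if abs(v - m) > k * s) / len(values)


# Hypothetical measurements; not data from the paper.
one_outlier = [10.0] * 9 + [20.0]   # nine typical values, one far away
wide_spread = [7.0, 13.0] * 5       # every value away from the mean

for name, sample in (("one outlier", one_outlier),
                     ("wide spread", wide_spread)):
    print(name,
          "std:", statistics.pstdev(sample),
          "share far from mean:", share_far_from_mean(sample))
```

Both samples have the same standard deviation, but only a small share of the first sample lies far from its mean, which is why a plot of percentage offsets is needed alongside the standard deviation.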
Conclusions

In the previous sections, we compared the file transfer speeds of HTTP and FTP, of CURL HTTP and CURL FTP, and of HTCP and GridFTP. The comparison between HTTP and FTP shows that the two protocols, both application layer protocols that share the underlying TCP connection during file transfer, are affected similarly by network traffic. The comparison between CURL HTTP and CURL FTP shows that the speeds of the CURL command over HTTP and over FTP are almost level, although the speeds under unstable network conditions cannot be predicted. The comparison between HTCP and GridFTP shows that, as a UNIX/Linux command, HTCP transfers files slightly faster than GridFTP on average.

As is well known, GridFTP can only operate on files on or between GridFTP servers, which must have the GridFTP package installed, and the user's computer must also have a GridFTP client package installed. The HTCP command, in contrast, can operate on files on a normal website or a GridSite node, and the user's computer needs only the HTCP command installed, so HTCP is easier to use than GridFTP.

References

[1] http://www.globus.org/grid_software/data/gridftp.php, GridFTP information entry
[2] https://www.gridsite.org/wiki/htcp_command, HTCP information entry
[3] http://en.wikipedia.org/wiki/http, HTTP information page
[4] http://en.wikipedia.org/wiki/file_transfer_protocol, FTP information page
[5] RFC 2616 Hypertext Transfer Protocol HTTP/1.1, R. Fielding, UC Irvine, J. Gettys, J. Mogul, DEC & H. Frystyk, T. Berners-Lee, MIT/LCS, Jan 1997
[6] RFC 959 File Transfer Protocol (FTP). J. Postel, J. Reynolds. Oct 1985.
[7] http://www.eventhelix.com/realtimemantra/networking/ftp.pdf, FTP Sequence Diagram
[8] http://www.eventhelix.com/realtimemantra/networking/http_sequence_diagram.pdf, HTTP Sequence Diagram
[9] http://www.isi.edu/lsam/publications/http perf/ RTT analysis to compare HTTP and FTP
[10] http://curl.haxx.se/libcurl/, libcurl introduction
[11] http://www.webopedia.com/term/m/multithreading.htm, Multithreading technology
[12] http://en.wikipedia.org/wiki/osi_model, ISO/OSI model and some protocols