File Transfer Best Practices

Size: px
Start display at page:

Download "File Transfer Best Practices"

Transcription

1 File Transfer Best Practices David Turner User Services Group NERSC User Group Meeting October 2, 2008

2 Overview Available tools ftp, scp, bbcp, GridFTP, hsi/htar Examples and Performance LAN WAN Reliability Future 2

3 File Transfer Protocol (ftp) Benefits Client everywhere Familiar Good performance pftp/pftp_gsi Liabilities Security Password transmitted as plain text Servers don t accept inbound connections 3

4 Secure Copy (scp) Benefits Client everywhere Familiar Based on ssh Liabilities Performance data encrypted static buffer size; HPN-SSH (PSC) eliminates Some usability issues need silent dot-files no filename completion 4

5 BaBar Copy (bbcp) Benefits Peer-to-peer root access not needed to install Performance uses 4 data streams by default many other tuning options Can use ssh for authentication Liabilities Must be installed at both ends Not widely available but easy to build 5

6 Building bbcp Download bbcp from Stanford % wget bbcp.tar.z % tar zxvf bbcp.tar.z % cd bbcp 6

7 Building bbcp (cont.) Modify Makefile Replace: LIBZSO = -L/usr/local/lib -lz LIBZ = /usr/local/lib/libz.a with: LIBZSO = -L/usr/lib -lz LIBZ = /usr/lib/libz.a or: LIBZSO = -L/lib64 -lz LIBZ = /lib64/libz.so.1 or: something similar 7

8 Build % make Building bbcp (cont.) Copy to final location % cp bin/i386_linux26/bbcp ~/bin 8

9 GridFTP Benefits Variety of clients globus-url-copy, uberftp, GUIs Performance many tuning options Liabilities Complex grid infrastructure steep learning curve Not widely available but fairly easy to install 9

10 Installing GridFTP Clients Download Pacman from Boston University % wget tarballs/pacman-latest.tar.gz % tar zxvf pacman-latest.tar.gz % cd pacman-3.26/ % source setup.csh 10

11 Installing GridFTP Clients (cont.) Install VDT clients from NERSC cache % mkdir ~/VDT % cd ~/VDT % pacman -get pacman:nersc-grid-client Answer y for trusted cache questions Answer y if you agree to the licenses Answer n for install CA files question disabled, so it doesn t matter get certificates via myproxy-logon 11

12 Installing GridFTP Clients (cont.) Configure grid environment % source setup.csh OR % grep setup ~/.tcshrc.ext source ~/VDT/setup.csh OR % env > env.before % source setup.csh % env > env.after % diff env.before env.after Then add differences to appropriate dotfiles. 12

13 hsi/htar Benefits Shell-like syntax wildcards, recursive operations Performance Available for many platforms download from NERSC website Liabilities Shell-like syntax some peculiar differences Performance suffers in presence of firewalls 13

14 HPSS Access From within NERSC hsi/htar ftp/pftp/pftp_gsi globus-url-copy/uberftp From outside NERSC hsi/htar (download from website) ftp globus-url-copy/uberftp No ssh/scp/sftp or bbcp 14

15 Installing Local hsi Client Download from NERSC website Click I agree box, choose platform FreeBSD 6.3, FreeBSD 7.0, RedHat/Centos 4.6, RedHat/Centos 5, SUSE 9.4, SUSE 10.2 Install in local directory % tar zxvf hsi rhel i386.tar.gz % cd hsi rhel i386 % make INSTALL_PREFIX=/home/dpturner install % ~/bin/hsi 15

16 Tools Available at NERSC All compute platforms ftp/pftp/pftp_gsi, scp, bbcp, globus-url-copy, uberftp, hsi/htar Except Bassi bbcp not available (yet?) 16

17 Concerning Hostnames Many platforms use DNS round-robin bbcp and GridFTP need static names franklingrid bassigrid jacquardgrid HPSS garchive DaVinci Single node, single name 17

18 Examples: ftp HPSS only Dummy password for authentication server % module show www % ssh auth@auth.nersc.gov auth@auth.nersc.gov's password: <enter dummy password> [auth]: ftppass DCE Principal: dpturner DCE Password: login 013bQpWk0CMHt0eVtBCpIkKXlplWRke3OqEwB1EF2s+tnfkyCRvY+IUg== password 013bQpWk0CMHt0eVtBCpIkKXlplWRke3OqEwB1EF2s+tnfkyCRvY+IUg== [auth]: quit This will all be changing soon (Q1-09?) HPSS tokens from NIM 18

19 Examples: ftp (cont.) Create.netrc file % cat >.netrc machine archive.nersc.gov login 013bQpWk0CMHt0eVtBCpIkKXlplWRke3OqEwB1EF2s+tnfkyCRvY+IUg== password 013bQpWk0CMHt0eVtBCpIkKXlplWRke3OqEwB1EF2s+tnfkyCRvY+IUg== Verify owner permissions only % ls -l.netrc -rw dpturner dpturner 160 Oct 1 21:33.netrc Can be used internally or externally Only valid on subnet where created 19

20 Examples: ftp (cont.) Connect to HPSS (archive) % ftp archive.nersc.gov Connected to heart-g0.nersc.gov. 220 heart-g0 FTP server (HPSS 5.1 PFTPD V1.1.1 Tue Aug 26 10:06:28 PDT 2008) ready. 334 Using authentication type GSSAPI; ADAT must follow GSSAPI accepted as authentication type GSSAPI error major: Miscellaneous failure GSSAPI error minor: No credentials cache found GSSAPI error: initializing context GSSAPI authentication failed 504 Unknown authentication type: KERBEROS_V4 KERBEROS_V4 rejected as an authentication type 331 Password required for /.../dce.nersc.gov/dpturner. 230 User /.../dce.nersc.gov/dpturner logged in. Remote system type is UNIX. Using binary mode to transfer files. ftp> pwd 257 "/nersc/ccc/dpturner" is current directory. 20

21 NERSC File Systems Seconds to cp 5 GB file within file system franklin bassi jacquard davinci /scratch /project

22 Examples: scp General form: scp Allows third-party transfers All ssh authentication methods available public/private key pairs facilitate file transfer More common: scp file.5g franklin: Don t forget : Watch for chatty dotfiles 22

23 Examples: bbcp Messy command line bbcp -T ssh -x -a -ofallbacktorsh=no %I -l %U %H /usr/common/usg/bin/bbcp file.5m davinci:/scratch/scratchdirs/dpturner Can use configuration file to simplify Can specify TCP/IP socket buffer size To set window to 1 MB: bbcp w 1M T Options for recursion, permissions, diagnostics, performance, and restart 23

24 24 Example: Internal Transfers bbcp scp 5 GB File Seconds MB/s jacquard bassi franklin davinci davinci bassi franklin jacquard davinci jacquard franklin bassi davinci jacquard bassi franklin MB/s Seconds To From

25 Examples: globus-url-copy Get a short-term proxy certificate myproxy-logon -T -s nerscca.nersc.gov Jaguar to DaVinci globus-url-copy file:///tmp/work/dpturner/foo gsiftp://davinci.nersc.gov/scratch/scratchdirs/dpturner/foo Jaguar to HPSS (archive) globus-url-copy file:///tmp/work/dpturner/foo gsiftp://garchive.nersc.gov/nersc/ccc/dpturner/foo To set 4 streams and 1 MB window globus-url-copy -p 4 -tcp-bs 1MB... Must specify absolute pathnames 25

26 Examples: uberftp Get a short-term proxy certificate myproxy-logon -T -s nerscca.nersc.gov Jaguar to DaVinci uberftp davinci.nersc.gov Jaguar to HPSS (archive) uberftp garchive.nersc.gov To set 4 streams and 1 MB window uberftp> parallel 4 uberftp> tcpbuf

27 Example: Jaguar to Franklin 100 MB file scp bbcd globus-url-copy uberftp Option s B/s s B/s s B/s s B/s none K M M M -w 1M M -w 500K M -p M -p 4 tcp-bs 1MB M parallel M parallel 4 tcpbuf M 27

28 Example: Jaguar to Jacquard 100 MB file scp bbcd globus-url-copy uberftp Option s B/s s B/s s B/s s B/s none K M K -w 1M M -w 500K M -p M -p 4 tcp-bs 1MB M parallel M parallel 4 tcpbuf M 28

29 Example: Jaguar to DaVinci 100 MB file scp bbcd globus-url-copy uberftp Option s B/s s B/s s B/s s B/s none M M M -w 1M M -w 500K M -p M -p 4 tcp-bs 1MB M parallel M parallel 4 tcpbuf M 29

30 Example: Jaguar to Archive 100 MB file ftp hsi globus-url-copy uberftp Option s B/s s B/s s B/s s B/s none K M M M tcp-bs 1MB M tcpbuf M 30

31 Reliability bbcp can restart failed transfers -k keeps any partially created target files. The k option allows full recovery after a copy failure. By default, partial files are removed after a copy fails. -a appends data to the end of the target file if the target is found to be incomplete due to a previously failed copy operation. Could script reliable transport of large number of files 31

32 Future bbcp on Bassi Transfer nodes with good network, NGF, and HPSS performance provide external access to NGF provide true parallel GridFTP into HPSS Automated transfer benchmarks to provide real-time monitoring SRB/SRM New and improved web pages! 32

33 Resources

34 34