Quick Introduction to HPSS at NERSC



Similar documents
Introduction to Archival Storage at NERSC

HSI Best Practices for NERSC Users

File Transfer Best Practices

NERSC Archival Storage: Best Practices

Large File System Backup NERSC Global File System Experience

High Performance Storage System. Overview

High Performance Storage System Interfaces. Harry Hulen

Archival Storage At LANL Past, Present and Future

Research Technologies Data Storage for HPC

Unix Sampler. PEOPLE whoami id who

SAS 9.4 In-Database Products

HPSS Best Practices. Erich Thanhardt Bill Anderson Marc Genty B

HPSS User's Guide. High Performance Storage System Release 7.4. March 2013 ( Revision 1.0) HPSS User's Guide March 2013 Release 7.4 (Revision 1.

The Einstein Depot server

CASHNet Secure File Transfer Instructions

Computing Service G72. File Transfer Using SCP, SFTP or FTP. many leaflets can be found at:

Performing Database and File System Backups and Restores Using Oracle Secure Backup

Product Brief: XenData X2500 LTO-6 Digital Video Archive System

NERSC File Systems and How to Use Them

StreamServe Persuasion SP4

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

USEFUL UNIX COMMANDS

1. Product Information

Online Backup Client User Manual Linux

MySQL Backups: From strategy to Implementation

RecoveryVault Express Client User Manual

SWsoft Plesk 8.3 for Linux/Unix Backup and Restore Utilities

List of FTP commands for the Microsoft command-line FTP client

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

SWsoft Plesk 8.2 for Linux/Unix Backup and Restore Utilities. Administrator's Guide

Online Backup Linux Client User Manual

Online Backup Client User Manual

Fundamentals of UNIX Lab Networking Commands (Estimated time: 45 min.)

UCRL-WEB FTP Reference Manual. FTP Reference Manual - 1

LBNC and IBM Corporation Document: LBNC-Install.doc Date: Path: D:\Doc\EPFL\LNBC\LBNC-Install.doc Version: V1.0

System Administration. Backups

Backing Up TestTrack Native Project Databases

Parallels Cloud Server 6.0

Cisco Networking Academy Program Curriculum Scope & Sequence. Fundamentals of UNIX version 2.0 (July, 2002)

ITCertMaster. Safe, simple and fast. 100% Pass guarantee! IT Certification Guaranteed, The Easy Way!

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Installation Guide for WebSphere Application Server (WAS) and its Fix Packs on AIX V5.3L

XenData Archive Series Software Technical Overview

Adobe Marketing Cloud Using FTP and sftp with the Adobe Marketing Cloud

Thirty Useful Unix Commands

Data storage services at CC-IN2P3

Online Backup Client User Manual

Software infrastructure and remote sites

EVault for Data Protection Manager. Course 361 Protecting Linux and UNIX with EVault

Online Backup Client User Manual

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata

Application Note. UDO Archive Appliance and C2C Archive One

Recommended File System Ownership and Privileges

INF-110. GPFS Installation

SmartFiler Backup Appliance User Guide 2.1

CA ARCserve Backup for Windows

Hopper File Management Tool

Linux System Administration on Red Hat

Linux command line. An introduction to the Linux command line for genomics. Susan Fairley

Use Enterprise SSO as the Credential Server for Protected Sites

WS_FTP Professional 12

Secure File Transfer Installation. Sender Recipient Attached FIles Pages Date. Development Internal/External None 11 6/23/08

SQL Server Instance-Level Benchmarks with DVDStore

Configuring MailArchiva with Insight Server

Data Movement and Storage. Drew Dolgert and previous contributors

IBM WebSphere Application Server Version 7.0

Automating FTP with the CP IT

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

Maurice Askinazi Ofer Rind Tony Wong. Cornell Nov. 2, 2010 Storage at BNL

RSA ACE/Agent 5.2 for UNIX Installation and Configuration Guide

Installing a Symantec Backup Exec Agent on a SnapScale Cluster X2 Node or SnapServer DX1 or DX2. Summary

Fred Hantelmann LINUX. Start-up Guide. A self-contained introduction. With 57 Figures. Springer

emedny FTP Batch Dial-Up Number emedny SUN UNIX Server ftp

IBM Campaign and IBM Silverpop Engage Version 1 Release 2 August 31, Integration Guide IBM

Guide to the Configuration and Use of SFTP Clients for Uploading Digital Treatment Planning Data to IROC RI

Time Navigator Migration Guide Upgrading Time Navigator Version or 4.1 to Version 4.2

PRiSM Security. Configuration and considerations

AFS Usage and Backups using TiBS at Fermilab. Presented by Kevin Hill

11. Oracle Recovery Manager Overview and Configuration.

RSA Authentication Manager 7.1 Basic Exercises

EMC Data Protection Search

Beyond Windows: Using the Linux Servers and the Grid

Setting-Up an Open-Source Backup Software Amanda Community in About 15 Minutes

SmartFiler Backup Appliance User Guide 2.0

Programming for GCSE Topic H: Operating Systems

EVault Software. Course 361 Protecting Linux and UNIX with EVault

File Transfer Protocol - FTP

TS-800. Configuring SSH Client Software in UNIX and Windows Environments for Use with the SFTP Access Method in SAS 9.2, SAS 9.3, and SAS 9.

Online Backup Client User Manual Mac OS

Online Backup Client User Manual Mac OS

VMware vsphere Data Protection Advanced 5.5

Using SSH Secure Shell Client for FTP

GeBro-BACKUP. Die Online-Datensicherung. Manual Pro Backup Client on a NAS

PageR Enterprise Monitored Objects - AS/400-5

Understanding Disk Storage in Tivoli Storage Manager

Transcription:

Quick Introduction to HPSS at NERSC Nick Balthaser NERSC Storage Systems Group nabalthaser@lbl.gov Joint Genome Institute, Walnut Creek, CA Feb 10, 2011

Agenda NERSC Archive Technologies Overview Use Cases for the Archive Hands-on: Authentication Client Usage and Examples Client Installation 2

NERSC Archive Has 2 Levels, Fast Frontend Disk Cache and Enterprise Tape Current data volume: 12PB in 100M files written to 26k tapes (user system) Permanent storage is magnetic tape, disk cache is transient All data written to HPSS goes through the disk cache Disk to tape migration occurs every 30 minutes Data retained on disk approximately one week, on average Tapes and tape drives are contained in robotic libraries Cartridges are loaded/unloaded into tape drives by sophisticated library robotics 110 tape drives in user (archive) system 3 cartridge and drive technologies in use: Oracle T10KB/T10KC (1TB/5TB, high capacity) and 9840D (fast access, 80GB) 3

Front-ending the Tape Subsystem is 240TB Fast-access disk Disk cache hardware: Data Direct Networks 9550 FC and 9900 SAS disk arrays User system has 13 server nodes, IBM p4/p5/p7 running AIX 12 IO nodes called data movers: read/write to network, disk and tape devices 1 core server: coordinates system activity and serves metadata HPSS storage application is under active development by IBM, LBNL, LLNL, LANL, SNL, and ORNL. NERSC has 2 full-time HPSS developers on staff New features, stability improvements, and bug fixes are continually being developed. 4

Approximately 50% data growth per year NERSC has 4 dedicated DTN nodes for high-speed transfers Transfer rates over 1GB/sec are possible 5

It is Important to Know How to Store and Retrieve Data Efficiently HPSS clients can emulate file system qualities FTP-like interfaces can be deceiving: the archive is backed by tape, robotics, and a single DB2 database instance for metadata Operations that would be slow on a file system, e.g. lots of random IO, can be impractical on the archive HPSS does not stop users from making mistakes It is possible to store data in such a way as to make it difficult to retrieve Tape storage systems do not work well with small files The archive has no batch system. Inefficient use affects others. 6

Use Cases for the Archive Typical use case: long-term storage and retrieval of very large raw data sets Good for incremental processing Long-term storage of result data Data migration between compute platforms Backups (/project and system/server backups) 7

Authentication is easy NERSC storage uses a token-based authentication method User places encrypted authentication token in ~/.netrc file at the top level of the home directory on the compute platform Token information is verified in the NERSC LDAP user database All NERSC HPSS clients can use the same token Tokens are username and IP specific must generate a different token for use offsite 8

Authentication Hands-on Authentication tokens can be generated in 2 ways: Automatic NERSC auth service (recommended): Log into any NERSC compute platform Type hsi Enter NERSC password Manual https://nim.nersc.gov/ website Under Actions dropdown, select Generate HPSS Token Copy/paste content into ~/.netrc chmod 600 ~/.netrc Use NIM website to generate token for alternate IP address 9

~/.netrc example machine archive.nersc.gov! login joeuser! password 02UPMUezYJ/Urc7ypflk7M8KHLITsoGN6ZIcfOBdBZBxn+BViShg==! machine ftp.nersc.gov! login anonymous! password joeuser@nersc.gov! Check permissions on this file Should be rw for user only 10

HPSS Client Overview Parallel, threaded, high performance: HSI Unix shell-like interface HTAR Like Unix tar, for aggregation of small files PFTP Parallel FTP Non-parallel: FTP Ubiquitous, many free scripting utilities and APIs GridFTP interface (garchive) Connect to other grid-enabled storage systems 11

Hands-on Examples: HSI Most flexibility, many features and options Can cause problems if not used correctly (supports recursive transfers of small files/ directories) Features: Parallel, high speed transfers Interactive and non-interactive modes Common shell commands: chown, chmod, ls, rm, etc. Recursion Command-line editing and history Wildcards Connecting to the archive: type hsi bash-4.0$ hsi [Authenticating] A:/home/j/joeuser-> 12

Interactive HSI Transfer A:/home/j/joeuser-> put myfile put 'myfile' : '/home/j/joeuser/myfile' ( 2097152 bytes, 31445.8 KBS (cos=4)) Retrieve A:/home/j/joeuser-> get myfile get 'myfile' : '/home/j/joeuser/myfile' (2010/12/19 10:26:49 2097152 bytes, 46436.2 KBS ) Full pathname or rename A:/home/j/joeuser-> put local_file : hpss_file A:/home/j/joeuser-> get local_file : hpss_file Wildcards A:/home/j/joeuser-> prompt prompting turned off A:/home/j/joeuser-> mput.bash* 13

Non-interactive HSI One-line mode bash-4.0$ hsi mkdir mydir; cd mydir; put myfile; ls l Command File bash-4.0$ cat mycommands.txt put myfile ls -l quit bash-4.0$ hsi in mycommands.txt Here Document bash-4.0$ hsi <<EOF put myfile ls -l quit EOF Standard Input bash-4.0$ echo 'mkdir mydir; cd mydir; put myfile; ls -l; quit' hsi 14

Hands-on Examples: HTAR Similar to Unix tar Parallel, high speed transfers, like HSI Recommended utility for archiving small files Faster/safer than running Unix tar via pipeline Creates index for fast file retrieval HTAR traverses subdirectories to create tarcompatible aggregate file in HPSS No staging space required Limitations: Aggregate file can be any size, recommend 500GB max Aggregates limited to 5M member files Individual HTAR member files max size 64GB 155/100 character prefix/filename limitation 15

HTAR, Continued Create archive bash-4.0$ htar cvf /home/n/nickb/mytarfile.tar./mydir HTAR: a./mydir/ HTAR: a./mydir/foofile HTAR: a /scratch/scratchdirs/nickb/htar_cf_chk_50212_1297706778 HTAR Create complete for /home/n/nickb/mytarfile.tar. 2,621,442,560 bytes written for 1 member files, max threads: 3 Transfer time: 11.885 seconds (220.566 MB/s) List archive bash-4.0$ htar tvf /home/n/nickb/mytarfile.tar Extract member file(s) bash-4.0$ htar xvf /home/n/nickb/mytarfile.tar./mydir/foofile 16

PFTP and FTP PFTP Standard FTP-like interface distributed with HPSS Implements parallel transfers for performance FTP-compatible syntax Scriptable with some effort (Here doc or command file) NERSC compute platforms only bash-4.0$ pftp i < cmds.txt FTP Available everywhere, but non-parallel, low performance Free utilities such as ncftp, curl, and Perl Net::FTP add flexibility for scripting Both interfaces implement ALLO64 <filesize> for writing files to the correct COS 17

GridFTP GridFTP uses a certificate based authentication method not ~/.netrc Users can use grid credentials to transfer data between other grid-enabled sites GridFTP is the server Clients include uberftp and globus-url-copy Clients often support user-tunable parameters for WAN transfer 18

HPSS Client Download and Installation HPSS clients are provided on NERSC systems (hopper, franklin, etc.) No download/installation necessary HSI and HTAR are now installed on JGI system phoebe HSI and HTAR are licensed for binary download for NERSC users (workstations, servers, offsite platforms) Go to the NERSC software download page https://www.nersc.gov/users/data-and-networking/hpss/storing-and-retrieving-data/ software-downloads/ Select appropriate version for your hardware/os (NERSC username/ password required) Minor OS version differences may be Ok FTP client is usually available on most operating systems Lower performance on high-speed networks Problems with authentication on Windows7 19

Reporting Problems NERSC Staff: Contact Storage Systems Email ssg@nersc.gov 24x7 NERSC Operations: 510-486-6821 NERSC Users: Contact NERSC Consulting Toll-free 800-666-3772 510-486-8611, #3 Email consult@nersc.gov. 20

Further Reading NERSC Website http://www.nersc.gov/users/data-and-networking/hpss/ NERSC Grid documentation http://www.nersc.gov/users/software/grid/data-transfer/ HSI, HTAR, PFTP man pages should be installed on NERSC compute platforms Gleicher Enterprises Online Documentation (HSI, HTAR) http://www.mgleicher.us/gel/ HSI Best Practices for NERSC Users LBNL Report #LBNL-4745E 21