High Performance Storage System (HPSS) Interfaces
Harry Hulen
281-488-2473
hulen@us.ibm.com
Outline
- Introduction to HPSS
- HPSS Interfaces
- The HPSS Collaboration
How a SAN File System Works
1. Client issues READ to Metadata Controller (MDC)
2. MDC accesses metadata on disk
3. MDC sends lock and ticket back to client
4. Client reads data from shared disk over SAN
Examples: IBM SAN FS, ADIC SNFS. HPSS is similar but adds tape (see next slide).
How HPSS Works
1. Client issues READ to HPSS Core Server (CS)
2. CS accesses metadata on disk
3. CS commands a Mover to stage the file from tape to disk
4. Mover stages the file from tape to disk
5. CS sends lock and ticket back to client
6a. Client reads data from shared disk over the SAN, or
6b. Mover reads the data and sends it to the client over the LAN
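As a concrete illustration of the flow above, here is a minimal sketch in C. Every function in it is a hypothetical stub standing in for the real Core Server, Mover, and client logic; it only models the ordering of control traffic (LAN) and data movement (SAN or LAN), not any actual HPSS API.

/* Illustrative sketch only: hypothetical stubs modeling the HPSS staged-read
 * flow described above. None of these functions belong to the real HPSS API. */
#include <stdio.h>
#include <stdbool.h>

/* Steps 1-2: client sends READ to the Core Server (CS) over the LAN;
 * the CS looks the file up in its metadata (DB2 in HPSS). */
static bool cs_lookup(const char *path, bool *on_disk) {
    printf("CS: metadata lookup for %s\n", path);
    *on_disk = false;               /* pretend the file is only on tape */
    return true;
}

/* Steps 3-4: CS commands a Mover to stage the file from tape to disk. */
static void mover_stage_from_tape(const char *path) {
    printf("Mover: staging %s from tape to shared disk\n", path);
}

/* Step 5: CS returns a lock and ticket to the client. */
static void cs_grant_ticket(const char *path) {
    printf("CS: lock + ticket for %s sent to client\n", path);
}

/* Steps 6a/6b: client reads over the SAN, or a Mover pushes data over the LAN. */
static void client_read(const char *path, bool san_attached) {
    if (san_attached)
        printf("Client: reading %s from shared disk over the SAN\n", path);
    else
        printf("Mover: reading %s and sending it to the client over the LAN\n", path);
}

int main(void) {
    const char *path = "/hpss/project/results.dat";   /* hypothetical path */
    bool on_disk;

    if (!cs_lookup(path, &on_disk))
        return 1;
    if (!on_disk)
        mover_stage_from_tape(path);
    cs_grant_ticket(path);
    client_read(path, /* san_attached = */ true);
    return 0;
}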
HPSS Architecture
- Hierarchical global file system
- Distributed, cluster architecture provides horizontal growth
- SAN and/or LAN connected
- Metadata engine is IBM DB2
- Multiple storage classes
- Striped disks and tapes for higher data rates
- Multi-petabyte capability in a single name space
- Supports IBM AIX, Linux, Sun Solaris, and some SGI Irix components; mix and match
(Diagram: client computers on LAN and SAN; HPSS Core and backup Core; metadata disks; tape-disk Movers; disk arrays; robotic tape libraries)
Multiple Storage Systems
- Shared-nothing architecture
- Inefficient use of resources
(Diagram: separate silos, each with its own disk cache, storage manager, and tape drives)
Shared-Resource Archive
- A single distributed storage solution with all resources shared
(Diagram: one disk cache pool and one tape drive pool, managed by a single storage manager with Movers)
Outline
- Introduction to HPSS
- HPSS Interfaces
- The HPSS Collaboration
HPSS Interface Summary
The following charts illustrate these interfaces, with emphasis on data ingestion (transferring data into HPSS):
- HPSS Write or Put over TCP/IP network
- FTP flow
- SAN-enabled Write or Put
- Pull from client SAN to HPSS disk or tape
- POSIX VFS interface
- NFS interface via VFS agent
- Windows interface via VFS agent
Other possibilities:
- HSI (Hierarchical Storage Interface), a useful 3rd-party suite of easy-to-use interfaces
- Reading data from transportable media ("sneaker net")
HPSS Write or Put over TCP/IP Network
1. Client issues Write or Put to HPSS Core
2. HPSS transfers the file to HPSS disk or tape over the TCP/IP LAN or WAN using an HPSS Mover
(Diagram: client cluster computers with the Client API on an IP network and FC SAN; HPSS domain with cluster computers running Core and Movers, metadata disks, SAN disk store, and tape libraries)
Client API
- hpss_Write() etc.
- Optional list form to access discontiguous segments
- Parallel, gigabyte/s capability
- Use for performance-critical applications
PFTP (Parallel FTP)
- FTP-like get/put semantics
- Parallel, gigabyte/s capability
- Most-used interface
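The following is a minimal sketch of what a Client API write might look like, based on the POSIX-like call names cited above (hpss_Write() etc.). The header name, the exact hpss_Open() signature with its Class of Service hint arguments, and the error conventions are assumptions here; the HPSS Client API programmer's reference is the authority.

/* Sketch of an HPSS Client API write. The POSIX-like names follow the
 * hpss_Write() reference above, but the header name and exact signatures
 * (including the Class of Service hint arguments passed as NULL) are
 * assumptions; consult the HPSS Client API documentation before use. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include "hpss_api.h"      /* assumed header name for the HPSS Client API */

int main(void) {
    const char buf[] = "example payload written into HPSS\n";

    /* Open (create) a file in the HPSS name space. The NULLs stand in for
     * the optional Class of Service hints/priorities arguments. */
    int fd = hpss_Open("/hpss/project/ingest/sample.dat",
                       O_WRONLY | O_CREAT | O_TRUNC, 0644,
                       NULL, NULL, NULL);
    if (fd < 0) {
        fprintf(stderr, "hpss_Open failed: %d\n", fd);
        return 1;
    }

    /* Transfer the data; a Mover moves it to HPSS disk or tape per the
     * file's Class of Service. A list form exists for discontiguous
     * segments, per the slide above. */
    if (hpss_Write(fd, (void *)buf, strlen(buf)) < 0) {
        fprintf(stderr, "hpss_Write failed\n");
        hpss_Close(fd);
        return 1;
    }

    hpss_Close(fd);
    return 0;
}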
FTP Flow
1. Client issues FTP Put to the HPSS FTP Daemon
2. HPSS transfers the file to HPSS disk or tape over the TCP/IP LAN or WAN using an HPSS Mover and the FTP Daemon
(Diagram: client cluster computers with standard UNIX FTP on an IP network; HPSS domain with cluster computers running Core, Movers, and FTP Daemon, metadata disks, SAN disk store, and tape libraries)
- Uses conventional UNIX, Linux, and Windows ftp semantics
- No API to install
- Most universal interface
- Performance commensurate with ftp and the underlying network
- Not a parallel interface
SAN-Enabled Write or Put
1. Client issues Write or Put to HPSS Core
2. HPSS transfers the file to HPSS disk or tape directly over the SAN
(Diagram: client cluster computers with the Client API on an IP network and FC SAN; HPSS domain with cluster computers running Core and Movers, metadata disks, SAN disk store, and tape libraries)
- Data is transferred directly between the client and HPSS disk over the SAN
- Control is over the TCP/IP network (separation of control and data)
- Supported by the Client API and PFTP
- Currently supported on AIX and Linux
- Used internally by HPSS to move data between disk and tape
Pull from Client SAN to HPSS Disk or Tape
1. Client issues LFPUT via the Client API
2. HPSS reads the file over the client SAN
3. HPSS writes to disk and/or tape as indicated by the Class of Service
(Diagram: client SAN disks and cluster computers with the Client API on an IP network and FC SAN; HPSS domain with cluster computers running Core and Movers, metadata disks, SAN disk store, and tape libraries)
- The Local File Mover accesses data on the client SAN
- Examples of client SAN: IBM SAN FS, ADIC SNFS, IBM GPFS
- Activated by PFTP LFPUT/LFGET, with more options coming
- Client CPU overhead entirely offloaded to HPSS Movers
- Parallel capability and/or direct tape access via Class of Service options
POSIX VFS Interface
- HPSS accessed using standard UNIX/POSIX semantics
- Run standard products on HPSS, such as IBM DB2
- HPSS VFS currently only on Linux
(Diagram: a UNIX/POSIX application on a Linux client uses the POSIX file system interface; the HPSS VFS extensions and daemons use the Client API for control, with an optional SAN data path to the HPSS Movers and Core cluster on AIX or Linux)
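Because the VFS presents HPSS through the standard POSIX file system interface, ordinary file I/O works unchanged. The sketch below uses only standard POSIX calls; the /hpss mount point is an assumed example chosen for illustration, not a fixed HPSS path.

/* Ordinary POSIX I/O against an HPSS VFS mount. Only standard calls are
 * used; the /hpss mount point is an assumed example, not a fixed path. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *path = "/hpss/project/vfs_demo.txt";   /* assumed VFS mount */
    const char msg[] = "written through the HPSS VFS with plain POSIX calls\n";

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, msg, strlen(msg)) < 0) {
        perror("write");
        close(fd);
        return 1;
    }
    /* fsync pushes the data through the VFS daemons toward HPSS storage. */
    if (fsync(fd) < 0)
        perror("fsync");
    close(fd);
    return 0;
}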
NFS Interface via VFS Agent
(Diagram: UNIX/Linux applications on any UNIX/Linux host access HPSS over NFS; the NFS server is a Linux agent running the HPSS VFS extensions and daemons, which present the POSIX file system interface and use the Client API for control, with an optional SAN data path to the HPSS Movers and Core cluster on AIX or Linux)
Windows Interface via VFS Agent
(Diagram: a Windows application accesses HPSS over CIFS to Samba on a Linux agent running the HPSS VFS extensions and daemons; Samba uses the agent's POSIX file system interface, and the agent uses the Client API for control, with an optional SAN data path to the HPSS Movers and Core cluster on AIX or Linux)
Outline
- Introduction to HPSS
- HPSS Interfaces
- The HPSS Collaboration
The HPSS Collaboration (since 1992)
U.S. Department of Energy laboratories are co-developers:
- Lawrence Livermore National Laboratory
- Sandia National Laboratories
- Los Alamos National Laboratory
- Oak Ridge National Laboratory
- Lawrence Berkeley National Laboratory
IBM Global Services in Houston, Texas:
- Access to IBM technology (DB2, for example)
- Project management
- Quality assurance and testing (SEI CMM Level 3)
- Commercial sales and service
Advantages of collaborative development:
- Developers are users: focus on what is needed and what works
- Keeps focus on the high end: the largest data stores
- A limited open-source model for collaboration members and users
Some Large HPSS Sites
- 2+ PB: Brookhaven National Laboratory (BNL)
- 1+ PB: Commissariat à l'Énergie Atomique / Direction des Applications Militaires (CEA/DAM) compute center in France
- 1 PB: European Centre for Medium-Range Weather Forecasts (ECMWF) in England
- 1.1 PB: Lawrence Livermore National Laboratory (LLNL) open system
- 2+ PB: Los Alamos National Laboratory (LANL)
- 1+ PB: National Energy Research Scientific Computing Center (NERSC)
- 1 PB: San Diego Supercomputer Center (SDSC)
- 1.4 PB: Stanford Linear Accelerator Center (SLAC)