dCache Introduction
Forschungszentrum Karlsruhe GmbH, Institute for Scientific Computing
P.O. Box 3640, D-76021 Karlsruhe, Germany
Dr. http://www.gridka.de
What is dCache?
- Developed at DESY and FNAL
- Disk pool management with or without a tape backend
- Data may be distributed across a large number of disk servers
- Fine-grained configuration of the pool attraction scheme
- Automatic load balancing via a cost metric and inter-pool transfers
- Data is removed only when space is needed
Pool Selection Mechanism
[Diagram: clients, tape backend, and dCache pools; every read or write request requires a pool to be selected.]
Pool selection is done in two steps:
1. Query the configuration database: which pools are allowed for the requested operation?
2. Query the allowed pools for their vital statistics: find the pool with the lowest cost for the requested operation.
Pool Selection Mechanism: Tuning Space vs. Load
For each request, the central cost module generates two cost values per pool:
- Space cost: based on available space or the LRU time stamp
- CPU cost: based on the number of active movers (in, out, ...)
The final cost, used to determine the best pool, is a linear combination of the space and CPU costs; the coefficients need to be configured.
- Space coefficient << CPU coefficient
  - Pro: movers are distributed evenly among the pools.
  - Con: old files are removed rather than filling up empty pools.
- Space coefficient >> CPU coefficient
  - Pro: empty pools are filled before any old file is removed.
  - Con: movers 'clump' on pools with very old files or much free space.
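The two weightings above can be sketched as follows; this is an illustrative model only, and the function and field names are ours, not the actual dCache cost-module API:

```python
# Sketch of dCache-style pool selection: the final cost is a linear
# combination of space cost and CPU cost, and the cheapest pool wins.

def total_cost(space_cost, cpu_cost, w_space, w_cpu):
    """Linear combination of the two per-pool cost values."""
    return w_space * space_cost + w_cpu * cpu_cost

def best_pool(pools, w_space, w_cpu):
    """Pick the allowed pool with the lowest combined cost."""
    return min(pools, key=lambda p: total_cost(p["space"], p["cpu"],
                                               w_space, w_cpu))

pools = [
    {"name": "pool1", "space": 0.9, "cpu": 0.1},  # nearly full, but idle
    {"name": "pool2", "space": 0.2, "cpu": 0.8},  # mostly empty, but busy
]

# CPU-dominated weighting spreads movers, so the idle pool wins:
assert best_pool(pools, w_space=0.1, w_cpu=1.0)["name"] == "pool1"
# Space-dominated weighting fills empty pools first:
assert best_pool(pools, w_space=1.0, w_cpu=0.1)["name"] == "pool2"
```

The two assertions reproduce the trade-off from the slide: the same two pools are ranked differently depending only on the coefficients.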
dCache Properties
- Automatic migration to tape
- Supports read-ahead buffering and deferred writes
- Uses the standard 'ssh' protocol for the administration interface
- Supports ssl, Kerberos, and gsi security mechanisms
LCG Storage Element
- The DESY dcap library integrates with the CERN GFAL library
- SRM version ~1 (1.7) supported
- gsiftp supported
Multiple Access to One File
[Diagram: the same file (File 1) is replicated on two of three pools (Pool 1, Pool 2), allowing concurrent access.]
dCache Environment
[Diagram: compute nodes and the login server reach the pools via gridftp or an NFS mountpoint; the head node handles srm, gsiftp, and srmcp transfers; the pools transfer files to the TSM server, which writes them to tape.]
PNFS: Perfectly Normal File System
- Pool and tape hold the real data under PNFS IDs such as 0000000000000000000014F0, 000000000000000000001510, 0000000000000000000015A0, ...
- The pnfs database maps user file names to these IDs and holds the metadata.
Databases
- gdbm databases
- Experiment-specific databases with independent access
- Content of the metadata:
  - User file name
  - File name within dCache
  - Information about the tape location (storage class)
  - Name of the pool where the file is located
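The metadata fields listed above can be pictured as one record per file; this is a sketch only, and the field names are ours, not the actual pnfs/gdbm schema:

```python
# Illustrative record of the per-file metadata dCache keeps.
from dataclasses import dataclass

@dataclass
class FileMetadata:
    user_path: str      # user file name as seen under the pnfs mountpoint
    pnfs_id: str        # file name within dCache (PNFS ID)
    storage_class: str  # information about the tape location
    pool: str           # pool where the file is currently located

# Hypothetical example entry:
entry = FileMetadata(
    user_path="/pnfs/gridka.de/data/user/file1",
    pnfs_id="0000000000000000000014F0",
    storage_class="gridka:tape",
    pool="pool1",
)
```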
Access to dCache
- Mountpoint: ls, mv, rm, checksum, ...
- dcap: dccp <source> <destination>, or the library calls dc_open(...), dc_read(...)
- Gridftp: problematic when the file needs to be staged first
- SRMCP
dCache Access Protocol (dcap): writing a file to dCache
- On the login server: dccp -d2 <source file> <pnfs mountpoint>
- The connection to the head node returns an available pool node
- The file is copied directly to that pool node
- On the pool, the data is first 'precious' (cannot be deleted); after the flush to tape it becomes 'cached' (may be deleted from the pool)
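The precious-to-cached lifecycle on the pool can be sketched as a tiny state machine; the state names follow the slide, but the class and method names are ours, not dCache's:

```python
# Minimal sketch of the pool-side file lifecycle after a dcap write.

class PoolFile:
    def __init__(self, name):
        self.name = name
        self.state = "precious"  # freshly written: must not be deleted

    def flush_to_tape(self):
        # once the tape copy exists, the disk copy is only a cache
        self.state = "cached"

    def may_delete(self):
        # only cached copies may be removed when space is needed
        return self.state == "cached"

f = PoolFile("file1")
assert not f.may_delete()  # precious data cannot be deleted
f.flush_to_tape()
assert f.may_delete()      # cached data may be removed from the pool
```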
dCache Access Protocol (dcap): copying a file out of dCache
- On the login server: dccp -d2 <pnfs mountpoint> <dest. file>
- The connection to the head node checks whether the file is in any pool
- If not, the data is staged from tape
- The file is copied from the pool to the destination
gsiftp
- For registered dCache users only!
- grid-proxy-init
- globus-url-copy -dbg \
    file:///grid/fzk.de/mounts/nfs/home/ressmann/testfile \
    gsiftp://dcacheh1.gridka.de/ressmann/file1
srmcp
- For registered dCache users only!
- grid-proxy-init
- srmcp -debug=true -gsissl=true \
    srm://castorgrid.cern.ch:80//castor/cern.ch/grid/dteam/castorfile \
    srm://srm1.fzk.de:8443//pnfs/gridka.de/data/ressmann/file2
- srmcp -debug=true \
    srm://srm1.fzk.de:8443//pnfs/gridka.de/data/ressmann/file2 \
    file:////tmp/file2
Tape Management at Forschungszentrum Karlsruhe
- Tivoli Storage Manager (TSM) handles library management
- TSM was not developed for archive workloads:
  - if a TSM archive run is interrupted, there is no control over what has actually been archived
dCache Tape Access
- Convenient HSM connectivity (done for Enstore, OSM, and TSM; poor for HPSS)
- Creates a separate session for every file
- Transparent access
- Allows transparent maintenance of the HSM
dCache Tape Management
- Precious data is collected separately per storage class
- Each storage-class queue has individual parameters steering the tape flush operation:
  - maximum time a file is allowed to stay 'precious', per storage class
  - maximum number of precious bytes per storage class
  - maximum number of precious files per storage class
- The maximum number of simultaneous tape flush operations can be configured
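The three per-storage-class flush limits can be sketched as a single trigger check; this is an illustrative model, and the parameter names are ours, not actual dCache configuration keys (the example values echo the 20 GB / 1 h figures from the pool-node slide):

```python
# Sketch of the per-storage-class tape-flush trigger: flush when any
# limit is exceeded (age of oldest precious file, total bytes, or count).
import time

def should_flush(queue, max_age_s, max_bytes, max_files, now=None):
    """queue: list of precious files, each {'written': ts, 'size': bytes}."""
    now = time.time() if now is None else now
    oldest = min((f["written"] for f in queue), default=now)
    return (now - oldest >= max_age_s
            or sum(f["size"] for f in queue) >= max_bytes
            or len(queue) >= max_files)

queue = [{"written": 0, "size": 5 * 2**30}]  # one 5 GB precious file
# After 1 h the age limit alone forces a flush:
assert should_flush(queue, max_age_s=3600, max_bytes=20 * 2**30,
                    max_files=100, now=3600)
# Shortly after writing, no limit is reached yet:
assert not should_flush(queue, max_age_s=3600, max_bytes=20 * 2**30,
                        max_files=100, now=100)
```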
dCache Pool Node
[Diagram: pool node configuration — 800 GB pool, 20 GB flush threshold, 1 h maximum precious time.]
Tivoli Storage Manager (TSM)
Tivoli Storage Manager (TSM) after dCache tuning
Problematic Hardware
- 3ware RAID controller with 1.6 TB
- Always in degraded mode
- Rebuild speed: 70 kB/s or 10 MB/s
- Lost data
Installation Experience
- Little documentation
- RPMs created for Tier-2 centres (e.g. read access from tape was removed)
- New versions appear in new locations
- Every new installation gets easier
- Great support from DESY, especially Patrick Fuhrmann
User Problems
- Use of cp
- Overwriting of existing files
- Third-party transfers with globus-url-copy
Other File Management Systems
- CERN Advanced STORage manager (CASTOR)
- Jefferson Lab Asynchronous Storage Manager (JASMine)
- xrootd (SLAC)
- Cluster file systems (SAN FS, AFS, GPFS)
Conclusion
- Low-cost read pools, reliable write pools
- Fast file access with low-cost hardware
- Write once: never change a dCache file
- Single point of failure
- Future work: use cheap disks to pre-stage a whole tape