GridKa User Meeting
Forschungszentrum Karlsruhe GmbH
Central Information and Communication Technologies Department
Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen
Dr. Holger Marten
http://grid.fzk.de
Content
1. FZ Karlsruhe & GridKa
2. User accounts & certificates
3. Cluster installation
4. A rack-based cooling solution
5. Data management
6. Final remarks & summary
1. FZ Karlsruhe & GridKa
Forschungszentrum Karlsruhe
- 40 institutes and divisions
- 3,500 employees
- 13 research programs in the areas of Structure of Matter, Earth & Environment, Health, Energy, Key Technologies
- many close collaborations with TU Karlsruhe
Central Information and Communication Technologies Dept. (Hauptabteilung Informations- und Kommunikationstechnik, HIK) - Your Key to Success
HIK provides the institutes of the Research Centre with state-of-the-art high-performance computers and IT solutions for every purpose:
vector computers, parallel computers, Linux clusters, workstations, ~2500 PCs, online storage, tape robots, networking infrastructure, printers and printing services, central software, user support, ... + R&D
About 90 persons in 7 departments
HIK Organisational Structure (head: K.-P. Mickel)
- Zentralabteilung und Sekretariat (central division & secretariat): IT innovations, accounting, billing, budgeting, training coordination, other central tasks
- DASI - Datendienste, Anwendungen, Systemüberwachung, Infrastruktur (data services, applications, system monitoring, infrastructure) (R. Kupsch)
- HLR - Hochleistungsrechnen (high-performance computing) (F. Schmitz)
- GIS - Grid-Computing, Infrastruktur und Service (Grid computing, infrastructure and service): GridKa (H. Marten)
- GES - Grid-Computing und e-Science Competence Centre (M. Kunze)
- NiNa - Netzinfrastruktur und Netzanwendungen (network infrastructure and network applications) (K.-P. Mickel)
- PC/BK - PC-Betreuung und Bürokommunikation (PC support and office communication) (A. Lorenz)
- Repro - Reprografie (reprographics) (G. Dech)
Grid Computing Centre Karlsruhe - the mission
- German Tier-1 Regional Centre for the 4 LHC HEP experiments:
  2001-04 test phase, 2005-07 main set-up phase, 2007+ production phase
- German computing centre for 4 non-LHC HEP experiments:
  2001-04+ production environment for BaBar, CDF, D0, Compass
Regional Data and Computing Centre Germany - requirements
For LHC (Alice, Atlas, CMS, LHCb) + BaBar, Compass, D0, CDF
2001-2004: test phase; 2005-2007: LHC set-up phase; 2007+: operation
Including resources for BaBar Tier-A; started in 2001!

month/year     11/2001  4/2002  4/2003  4/2004  2005  2006  2007
CPU (kSI95)          1      10      25      60   150   325   900
disk (TByte)         7      45     113     210   440   850  1500
tape (TByte)         7     111     211     350   800  2000  3700

+ services ... + other sciences
Organization of GridKa
- Project Leader & Deputy: H. Marten, M. Kunze
- Overview Board: controls execution & financing, arbitrates in case of conflicts
- Technical Advisory Board: defines technical requirements
It is a project with 41 user groups from 19 German institutions.
GridKa Overview Board (OB)
- Chairman: R. Maschuw (FZK board of directors)
- Representative of BMBF: J. Richter
- Project Leader & Deputy: H. Marten, M. Kunze (FZK)
- Head of FZK Computing Department: K.-P. Mickel (FZK)
- Chairman of TAB: P. Malzacher (GSI)
- Representatives of KET & KHK: R.-D. Heuer (DESY), P. Braun-Munzinger (GSI)
- 2 representatives of LHC/non-LHC each:
  L. Köppke (U Mainz) for Atlas, T. Müller (TU Karlsruhe) for CMS, M. Schmelling (MPI Heidelberg) for Alice & LHCb, P. Mättig (U Wuppertal) for CDF/D0, B. Spaan (TU Dresden) for BaBar & Compass
GridKa Technical Advisory Board (TAB)
- Chairmen: K.-P. Mickel (FZK), P. Malzacher (GSI)
- Project Leader & Deputy: H. Marten, M. Kunze (FZK)
- Representatives of KET & KHK: R.-D. Heuer (DESY), P. Braun-Munzinger (GSI)
- Representative of DESY: R. Mankel (DESY)
- 8 representatives of the LHC experiments:
  Alice: K. Schwarz (GSI Darmstadt), P. Malzacher (GSI Darmstadt)
  Atlas: G. Duckeck (LMU München), L. Köppke (U Mainz)
  CMS: G. Quast (TU Karlsruhe), F. Raupach (RWTH Aachen)
  LHCb: M. Schmelling (MPI Heidelberg), U. Uwer (U Heidelberg)
- 8 representatives of the non-LHC experiments:
  BaBar: H. Lacker (TU Dresden), M. Steinke (RU Bochum)
  CDF: T. Müller (TU Karlsruhe), K. Rinnert (TU Karlsruhe)
  Compass: F.-H. Heinsius (U Freiburg), L. Schmitt (TU München)
  D0: D. Wicke (U Wuppertal), Ch. Zeitnitz (U Mainz)
Role of the 16 HEP representatives in the TAB
For each of the 8 HEP experiments they are
- the primary interface between the experiment & the GridKa staff
- responsible for installation of experiment-specific software (exp-admin)
- responsible for data import & data management per experiment
- responsible for information flow and Web pages per experiment
- responsible for allocation/de-allocation of user accounts per experiment
2. User accounts & certificates
GridKa - how to get an account?
- application form at http://grid.fzk.de -> GridKa Info -> user
- insert your address, institute, e-mail, ...
- select the experiment(s) and/or project(s) you are working in (projects = CrossGrid, DataGrid, LCG)
- print the form, sign it yourself, and have it signed by your experiment representative(s)!
- send the application form back to FZK by fax or official post
- login by ssh on port 24 only (example below)
- note the password rules (e.g. maximum age = 180 days)
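A login then looks as follows (the host name is an example; use the login server named in your account confirmation):

    ssh -p 24 <login>@gridka-login.fzk.de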
Grid Certification Authority
Basic problem: I have a job here. Shall I give this job access to my resources? Is it really a job of user A?
Solution: Grid users are identified by their certificate. A certificate is a public key signed by a Certification Authority (CA). Everybody can verify that the certificate has not been corrupted by a man in the middle.
- the user applies for a certificate (grid-cert-request / Web interface); this generates the user's private/public key pair
- the CA guarantees that a user's public key belongs to the respective natural person:
  - the user sends a copy of his identity card with his own signature and the signature of the IL
  - the CA calls back by phone to the user and the IL
  - the CA signs the user's public key with the CA's private key (off-line!!)
- everybody can verify the user's public key using the public key of the CA
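In practice the request step uses the standard Globus 2.x tooling; a minimal sketch (the CA certificate file name below is an assumed example, the real file is published by the CA):

    grid-cert-request
    # writes ~/.globus/userkey.pem (the private key, which stays with the user)
    # and ~/.globus/usercert_request.pem, which is sent to the CA;
    # the signed certificate comes back as ~/.globus/usercert.pem

    # anyone can then check the signature chain against the CA's public certificate:
    openssl verify -CAfile /etc/grid-security/certificates/germangrid-ca.pem ~/.globus/usercert.pem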
GermanGrid CA
- delivers X.509 certificates for German Grid research activities of Alice, Atlas, CMS, LHCb, BaBar, CDF, Compass, D0, CrossGrid, DataGrid, LCG
- has already delivered certificates to GridLab
- Certificate Policy and Certification Practice Statement available at http://grid.fzk.de/ca
- certificates accepted by DataGrid and CrossGrid; certificates delivered by globus.org are not accepted!
3. GridKa Cluster Installation
Support for multiple experiments
Strategy as starting point:
- compute nodes = general-purpose, shared resources; the GridKa Technical Advisory Board agreed on RedHat 7.2
- experiment-specific software servers: arbitrary development environments & login at the beginning; pure software servers and/or Globus gatekeepers later on
Experiment-Specific Software Servers
- 8x dual PIII for Alice, Atlas, BaBar, ..., each with 2 GB ECC RAM, 4x 80 GB IDE RAID5, 2x Gbit Ethernet
- Linux & basic software on demand: RH 6.2, 7.1.1, 7.2, Fermi Linux, SuSE 7.2
Used as:
- development environment per experiment
- interactive login & Globus gatekeeper per experiment
- basic administration (root) by FZK; specific software installation by the experiment admin
Grid LAN Backbone
Extreme BlackDiamond 6808
- redundant power supply
- redundant management board
- 128 Gbit/s backplane
- max. 96 Gbit ports, currently 80 ports available
Compute Nodes
124x dual PIII, each with:
- 1 GHz or 1.26 GHz
- 1 GB ECC RAM
- 40 GB IDE HDD
- 100 Mbit Ethernet
Total numbers: 5 TB local disk, 124 GByte RAM, Rpeak > 270 GFlops
RedHat 7.2, OpenPBS, automatic installation with NPACI Rocks
OpenPBS batch system
- provides the batch commands qsub, qstat, qdel, qalter
- provides the job-status monitoring commands xpbs, xpbsmon
- examples for job submission and example scripts on the Web (a sketch follows below)
- default (test) queue for jobs < 20 min
- fair-share queues short/long/extralong with 1/10/48 h limits
- one job per processor
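A minimal job script for these queues might look like this (the executable name is a placeholder):

    #!/bin/sh
    #PBS -q short              # short (1 h), long (10 h) or extralong (48 h)
    #PBS -l walltime=00:50:00  # stay below the queue limit
    cd $PBS_O_WORKDIR          # PBS starts jobs in $HOME; go back to the submit directory
    ./my_analysis              # placeholder for your program

Submit with "qsub job.sh", monitor with "qstat -u $USER", remove with "qdel <job-id>".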
Cluster Installation, Monitoring & Management
- scalability: many nodes to install and maintain (~2000)
- heterogeneity: different (Intel-based?) hardware over time
- consistency: software must be consistent on all nodes
- manpower: administration by a few persons only
This is for administrators, not for a Grid resource broker.
Architecture for Scalable Cluster Administration
[Diagram: cabinets 1...n, each with its compute nodes (Nodes C1...Cn) on a private compute network and a per-cabinet manager (Manager C1...Cn) on a separate management network; a master connects the management network to the public net.]
Naming scheme: C01-002-001...064; F01-003-002, ...
Installation - NPACI Rocks with FZK extensions
[Diagram: the master on the public net runs a DHCP server for the managers C1...Cn and delivers RedHat kickstart (incl. IP configuration) and rpms to them; each manager in turn runs a DHCP server for the compute IPs in its cabinet subnet and installs the nodes.]
http://rocks.npaci.edu - reinstall all nodes in < 1 h
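With the standard Rocks tooling, bringing a node back to a defined state is a single command; a sketch (the node name follows the GridKa scheme above and is an example):

    # reboot one compute node into a kickstart reinstallation
    shoot-node c01-002-001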
System Monitoring with Ganglia
- also installed on fileservers
- CPU usage, bytes I/O, packets I/O, disk space, ...
- published on the Web
http://ganglia.sourceforge.net
System Monitoring with Ganglia - combined push-pull
[Diagram: a Ganglia daemon on each node announces its data via multicast within the cabinet subnet (no routing); the managers C1...Cn collect it; the Ganglia master requests reports from the managers, writes them into a round-robin database (~300 kB per node) and publishes them to the Web.]
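Each Ganglia daemon also answers TCP requests on its default port 8649 with an XML metric report, which is what the pull step uses; an administrator can do the same by hand (host name is an example):

    nc c01-002-001 8649 | head    # dump the first lines of the node's XML report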
Cluster Management with Nagios - combined push-pull
[Diagram: the managers C1...Cn probe their cabinet's nodes via ping, SNMP and syslog, analyse the data and handle local events; they report to the Nagios master, which analyses the data, handles events and publishes to the Web.]
GPL, http://www.nagios.org
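The individual checks are ordinary Nagios plugins and can also be run by hand for debugging; a sketch of the ping check (plugin path and host name are examples):

    /usr/lib/nagios/plugins/check_ping -H c01-002-001 -w 100.0,20% -c 500.0,60%
    # WARNING above 100 ms round-trip time or 20% loss, CRITICAL above 500 ms or 60% loss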
Software Installation & Central Services
- software installation service (NPACI Rocks), http://rocks.npaci.edu
- system monitoring (Ganglia Cluster Toolkit), http://ganglia.sourceforge.net
- Globus MDS, LDAP, ... (Globus 2.0)
- system management (Nagios), http://www.nagios.org
- user statistics (home-made)
4. GridKa rack-based cooling system
Infrastructure
We all want to build a Linux cluster ...
... do we have the cooling capacity?
Closed rack-based cooling system - a joint development of FZK and Knürr
- 19-inch technology, 38 height units usable
- 70x120 cm floor space
- 10 kW cooling capacity
- redundant DC fans
- temperature-controlled CPU shut-down
- internal heat exchanger
Closed rack-based cooling system
- no air-conditioning ducts in the building required; estimated cost reduction > 70%
- heat is carried off to an external heat exchanger
Build a cluster of clusters (or a Campus Grid?)
[Diagram: racks in buildings A, B, C and D, connected via the campus LAN.]
5. GridKa data management
Available Disk Space for HEP: 36 TByte net
[Bar chart: target (soll) vs. installed (ist) disk space in TByte, 0-18, for ALICE, ATLAS, CMS, LHCb, BaBar, CDF, D0, Compass.]
Online Storage
59 TB gross, 45 TB net capacity; ~500 disk drives, mixed IDE, SCSI, FC; ext3 & ufs file systems
- DAS: 2.6 TB gross, 7.2k SCSI 120 GB, attached to a Sun Enterprise 220R
- SAN: 2x 5.1 TB gross, 10k FC 73.4 GB, IBM FAStT500
- NAS: 42.2 TB gross, 19x IDE systems, dual PIII 1.0/1.26 GHz, dual 3ware RAID controllers, 16x 5.4k IDE 100/120/160 GB
- SAN-IDE: 3.8 TB gross, 2 systems, 12x 5.4k IDE 160 GB, driven by a Linux PC
Scheme of (Disk) Storage
[Diagram: cluster nodes (clients) reach the fileservers (MDC) over the Grid backbone; the fileservers attach through a SAN (shared with FZK) to the disk subsystems and tape drives.]
Tests with IDE NAS (1.4...2 TB/system) and with Linux servers & FC/IDE SAN were successful - SAN goes commodity!
Available Tape Space for HEP: 106 TByte native
[Bar chart: target (soll) vs. installed (ist) tape space in TByte, 0-20, for ALICE, ATLAS, CMS, LHCb, BaBar, CDF, D0, Compass.]
GridKa Tape System
- IBM 3584 library with ~2400 slots, LTO Ultrium, 100 GB/tape
- 106 TB native available
- 8 drives, 15 MByte/s each
- attached to the FZK SAN (FZK Tape 1 / FZK Tape 2)
- backup/archive with Tivoli Storage Manager
GridKa Backup Policy
- backup = short-term storage of (frequently) changing data (on tape)
- last version of a file: kept 60 days after the online data is removed
- extra versions: kept max. 30 days; removed if the online data is removed
- automatic incremental backup of all file servers every night
- reverse process: restore = recreate the online copy from tape
- restore only by system administrators (trouble-shooting)
- no warnings before versions are removed from tape!
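Although restores are done by the administrators, users can list the backup versions of their own files with the standard TSM client (the path is an example):

    dsmc query backup "/home/<login>/*" -subdir=yes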
GridKa Archive Policy
- archive = long-term storage of data (on tape)
- archival for 1/2/5/10 years (default = 2 years)
- reverse process: retrieve = recreate the online copy from tape
- archival/retrieval to be done by the users themselves (no automatic process)
- may be used to store multiple versions for historical reasons
- the command dsmc is available on each login server
- online data may be removed after archival
- no warnings before archives are removed from tape!
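A typical archive/retrieve cycle with the standard dsmc syntax (paths and description are examples; how the non-default retention periods are selected is site-specific):

    # archive a results directory (default retention = 2 years)
    dsmc archive "/home/<login>/results/*" -subdir=yes -description="reco pass 1"

    # retrieve it later to its original location
    dsmc retrieve "/home/<login>/results/*" -subdir=yes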
Hierarchical Storage Management (HSM)
- HSM = automatic migration & recall
- automatically migrates data from online to secondary storage (disk or tape); migration based on file age, available disk volume, ...; keeps a stub file online
- automatically recalls the file to its original location if it is accessed
- data is lost if the stub file is deleted or corrupted! -> needs an additional backup
HSM is not available at GridKa! (but tests are currently under discussion)
Storage & Data Management - does it scale?
- >600 automount operations per second measured for 150 processors
- measured IDE NAS throughput (simple test sketch below): >150 MB/s local read (2x RAID5 + RAID0); 30-40 MB/s write/read over NFS ... but <10 MB/s with multiple I/O streams and multiple users (150 jobs writing to a single NAS box)
- Linux file-system size limit: 2 TB
- disk volumes of >50 TB with flexible volume management desirable; a mature system is needed now! Tests of GFS & GPFS are under discussion.
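Numbers like these can be reproduced with a simple sequential test (mount point and sizes are examples; use a file larger than the client's RAM to limit cache effects):

    dd if=/dev/zero of=/nfs/scratch/testfile bs=1M count=2048   # sequential write, 2 GB
    dd if=/nfs/scratch/testfile of=/dev/null bs=1M              # sequential read-back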
Storage & Data Management - under discussion
- interfaces between the GridKa disk storage and DataGrid, Castor, SAM, dCache
- backup with TSM: robotics, tape error handling, tape recycling
6. GridKa - a few final remarks
Internet Connection
- 34 Mbps currently available
- 155 Mbps available from January 2003
- 1 Gbps test connection to CERN via DFN & Geant available
- flexibility will increase due to a dedicated Grid firewall
- transfer of 1 TByte of data needs ~4 days at 34 Mbps, ~1 day at 155 Mbps, ~4 hours at 1 Gbps (arithmetic below)
Still: high networking costs for bandwidth and transfer volume!
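For reference, the ideal-case arithmetic behind these figures: 1 TByte = 8 x 10^12 bits, so a transfer takes 8 x 10^12 / (34 x 10^6 bit/s) ≈ 235,000 s ≈ 2.7 days at 34 Mbps, ≈ 14 h at 155 Mbps and ≈ 2.2 h at 1 Gbps; the quoted times include protocol overhead and link sharing.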
CrossGrid / DataGrid Testbed
- dedicated CrossGrid installation available at FZK
- EDG 1.2.2 installed
- FZK + Portugal + Spain were the first to connect to the DataGrid testbed
- GermanGrid certificates accepted
- CrossGrid-DataGrid demo at IST2002, Nov. 4, Copenhagen
Our first goals: test middleware stability, transfer the middleware to GridKa a.s.a.p.
Summary - the GridKa Installation
- Gbit backbone
- 250 CPUs, + 130 PIV this week for testing, + ~60 PIV until April 2003
- 8 experiment-specific servers
- 35 TB net disk, + 40 TB net until April 2003
- 100 TB tape, + 110 TB until April 2003 (or on demand)
- a few central servers: Globus, batch, installation, management, ...
- WAN: Gbit test Karlsruhe-CERN
- FZK Grid-CA
- testbed ...
... exclusively for HEP and Grid Computing
GIS - Your GridKa Support Team
- Holger Marten: Division & Project Leader
- Manfred Alef: login server, security, file server, Linux
- Bruno Hoeft: LAN/WAN network infrastructure, cluster management
- Axel Jaeger: Globus, batch, user accounts
- Melanie Knoch: Globus, Web, user accounts, certificates
- Ingrid Schäffner: cluster management, Linux
- Bernhard Verstege, Jos van Wezel: data management, file server, Linux
- Ursula Epting (NiNa) + a few persons from other divisions: GermanGrid CA
Contact: grid@hik.fzk.de; user forum: rdccg-user@hik.fzk.de; HyperNews coming soon.
We appreciate the continuous interest and support by the Federal Ministry of Education and Research, BMBF.