GridKa: Roles and Status



GridKa: Roles and Status. Forschungszentrum Karlsruhe GmbH, Institute for Scientific Computing, P.O. Box 3640, D-76021 Karlsruhe, Germany. Holger Marten. http://www.gridka.de

History
10/2000: First ideas about a German Regional Centre for LHC Computing; planning and cost estimates
05/2001: Start of a BaBar Tier-B with Univ. Bochum, Dresden, Rostock
07/2001: German HEP communities send "Requirements for a Regional Data and Computing Centre in Germany" (RDCCG); more planning and cost estimates
12/2001: Launching committee establishes the RDCCG (later renamed Grid Computing Centre Karlsruhe, GridKa)
04/2002: First prototype
10/2002: GridKa inauguration meeting

High Energy Physics experiments served by GridKa: the LHC experiments (CERN) and the non-LHC experiments BaBar (SLAC, USA), CDF and D0 (FNAL, USA) and Compass (CERN), which already have real data today and are committed to Grid computing. Other sciences will follow later.

GridKa Project Organization (organigram): an Overview Board and a Technical Advisory Board (TAB) with representatives of Alice, Atlas, CMS, LHCb, BaBar, CDF, D0 and Compass; links to BMBF, DESY, the physics committees, the HEP experiments, LCG and the FZK management; the head of the FZK Computing Centre, the chairman of the TAB and the GridKa project leader, who is responsible for planning, development, technical realization and operation.

German users of GridKa: 22 institutions, 44 user groups, 350 scientists. Aachen (4), Bielefeld (2), Bochum (2), Bonn (3), Darmstadt (1), Dortmund (1), Dresden (2), Erlangen (1), Frankfurt (1), Freiburg (2), Hamburg (1), Heidelberg (1) (6), Karlsruhe (2), Mainz (3), Mannheim (1), München (1) (5), Münster (1), Rostock (1), Siegen (1), Wuppertal (2).

GridKa in the network of international Tier-1 centres:
France: IN2P3, Lyon
Germany: GridKa (FZK), Karlsruhe
Italy: CNAF, Bologna
Japan: ICEPP, University of Tokyo
Spain: PIC, Barcelona
Switzerland: CERN, Geneva
Taiwan: Academia Sinica, Taipei
UK: Rutherford Laboratory, Chilton
USA: Fermi Laboratory, Batavia, IL
USA: BNL
Warning: this list is not yet final.

The fifth LHC subproject: the global LHC Computing Grid. Tier 0 at CERN; Tier 1 national centres, e.g. Germany (FZK), France (IN2P3), Italy (CNAF), UK (RAL), USA (Fermi, BNL); Tier 2 (university and laboratory computing centres); Tier 3 (institute computers); Tier 4 (desktops). The virtual organizations (ATLAS, CMS, LHCb, ...) and their working groups span the participating labs and universities.

LHC Computing Model (simplified!) - Les Robertson, GDB, May 2004.
Tier-0, the accelerator centre (CERN): filter raw data; reconstruction of summary data (ESD); record raw data and ESD; distribute raw data and ESD to the Tier-1s.
Tier-1 (e.g. BNL, CNAF, FNAL, Forschungszentrum Karlsruhe/FZK, ICEPP, IN2P3, NIKHEF, PIC, RAL, Taipei, TRIUMF): permanent storage and management of raw, ESD, calibration data, meta-data, analysis data and databases; grid-enabled data service; data-heavy analysis; re-processing raw → ESD; national and regional support; online to the data acquisition process; high availability (24h x 7d); managed mass storage; long-term commitment; resources: 50% of average.
[Diagram also shows small Tier-2 centres - e.g. Santiago, Weizmann, MSU, IC, IFCA, UB, Cambridge, Budapest, Prague, Legnaro, CSCS, Rome, CIEMAT, Krakow, USC - and desktops/portables.]

Tier-2: well-managed, grid-enabled disk storage; simulation; end-user analysis, batch and interactive; high-performance parallel analysis (PROOF?).
Each Tier-2 is associated with a Tier-1 that serves as its primary data source, takes responsibility for long-term storage and management of all of the data generated at the Tier-2 (grid-enabled mass storage), and may also provide other support services (grid expertise, software distribution, maintenance, ...). CERN will not provide these services for Tier-2s except by special arrangement.
(Les Robertson, GDB, May 2004; presented at GridKa School 2004, September 20-23, 2004, Karlsruhe, Germany.)

GridKa planned resources (status: Jan 2004). [Chart: planned growth of CPU capacity (kSI95) and of disk and tape capacity (TByte) from 2002 to 2009, spanning LCG Phases I to III.]

Distribution of planned resources at GridKa (status: Jan 2004). [Charts: LHC vs. non-LHC shares of CPU, disk and tape from 2002 to 2009; the non-LHC experiments (BaBar, CDF, D0) receive significant contributions through GridKa's role as a regional centre.]

GridKa Environment

[Site view: IWR buildings 441 and 442, the main building and the tape storage.]

Worker Nodes & Test Beds
Production environment:
97x dual PIII, 1.26 GHz, 1 GB RAM, 40 GB HD: 97 kSI2000
64x dual PIV, 2.2 GHz, 1 GB RAM, 40 GB HD: 102 kSI2000
72x dual PIV, 2.667 GHz, 1 GB RAM, 40 GB HD: 130 kSI2000
267x dual PIV, 3.06 GHz, 1 GB RAM, 40/80 GB HD: 534 kSI2000
36x dual Opteron 246, 2 GB RAM, 80 GB HD: 90 kSI2000
Total: 536 nodes, 1072 CPUs, 953 kSI2000; installed with RH 7.3 and LCG 2.2.0 (except for the Opterons).
Test environment: an additional 30 machines in several test beds.
Next OS: Scientific Linux, once middleware and applications are ready.

PBSPro fair share according to requirements (1 Oct 2004):
Experiment   kSI2000   share    percentage
Alice          143     14 300      13.2
Atlas          150     15 000      13.9
CMS            140     14 000      12.9
LHCb            56      5 600       5.2
BaBar          210     21 000      19.4
CDF             50      5 000       4.6
Dzero          283     28 300      26.2
Compass         50      5 000       4.6
In total: 45% LHC, 55% non-LHC.
The default (test) queue is not handled by the fair share; these 20-30 CPUs are kept free for test jobs.
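The percentages above are simply each experiment's requested kSI2000 divided by the grand total (1082 kSI2000). As a purely illustrative cross-check (not part of the original slides), a few lines of Python reproduce the numbers:

    # Illustrative cross-check of the PBSPro fair-share table above.
    # Each experiment's share is its requested kSI2000 over the total.
    requested_ksi2000 = {
        "Alice": 143, "Atlas": 150, "CMS": 140, "LHCb": 56,
        "BaBar": 210, "CDF": 50, "Dzero": 283, "Compass": 50,
    }

    total = sum(requested_ksi2000.values())            # 1082 kSI2000
    for exp, ksi in requested_ksi2000.items():
        print(f"{exp:8s} {100 * ksi / total:5.1f} %")  # e.g. Alice  13.2 %

    lhc = sum(requested_ksi2000[e] for e in ("Alice", "Atlas", "CMS", "LHCb"))
    print(f"LHC {100 * lhc / total:.0f} %, non-LHC {100 * (total - lhc) / total:.0f} %")  # 45 % / 55 %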

Disk space available for the HEP experiments: 202 TB (Oct 04), 29% LHC / 71% non-LHC. [Bar chart: TByte per experiment for ALICE, ATLAS, CMS, LHCb, BaBar, CDF, D0 and Compass.]

Online Storage I
About 40 TB stored in NAS (better: DAS) boxes: dual CPU, 16 EIDE disks, 3Ware controller.
Experience: the hardware is cheap, but not very reliable; RAID software & management messages are not always useful; good throughput for a few simultaneous jobs, but it does not scale to a few hundred simultaneous file accesses.
Workarounds: disk mirroring; management software ("managed disks": file copies on multiple boxes); more reliable disks plus a parallel file system.

Online Storage: I/O design with NAS (DAS). [Diagram: compute nodes access per-experiment NAS/DAS boxes (Alice, Atlas, ...) via TCP/IP/NFS; bottlenecks at roughly 30 MB/s read/write per box and at disk access; expansion by adding boxes.]

Online Storage II
About 160 TB stored in a SAN: SCSI disks (10k rpm) with redundant controllers; a parallel file system on a file server cluster, exported via NFS from the file server cluster to the worker nodes (WNs).

Online Storage: scalable I/O design. [Diagram: compute nodes connect via TCP/IP/NFS to a file server cluster, which accesses RAID 5 storage over SAN (SCSI / Fibre Channel); striping plus a parallel file system; 350-400 MB/s aggregate I/O measured; expansion by adding servers and storage.]

Online Storage II (continued)
Advantages: high availability through multiple redundant servers; load balancing via an automounter program map.
Experience: many teething problems (bugs, learning how to configure, ...); the ratio of CPU time to wall-clock time is close to 1 in some applications; more expensive, so the next step is to try cheaper S-ATA systems.
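As an aside for readers unfamiliar with automounter program maps: a program map is an executable that autofs calls with the mount key as its single argument and that prints a map entry on stdout, so the script can pick one of several redundant NFS servers per key. The sketch below only illustrates that idea; the hostnames and export paths are invented and do not describe the actual GridKa configuration.

    #!/usr/bin/env python3
    # Sketch of an autofs "program" map that spreads NFS mounts across a
    # cluster of redundant file servers (the load-balancing idea mentioned
    # above). Hostnames and export paths are hypothetical examples.
    import sys
    import zlib

    # redundant NFS servers all exporting the same parallel file system
    SERVERS = ["fs01.example.org", "fs02.example.org", "fs03.example.org"]

    def main():
        if len(sys.argv) != 2:
            sys.exit(1)                    # autofs passes exactly one key
        key = sys.argv[1]                  # mount key, e.g. "alice" or "atlas"
        # deterministic choice so repeated lookups of a key hit the same server
        server = SERVERS[zlib.crc32(key.encode()) % len(SERVERS)]
        # autofs expects one map entry on stdout: "[-options] location"
        print(f"-fstype=nfs,rw,hard,intr {server}:/export/{key}")

    if __name__ == "__main__":
        main()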

Why are we telling you all this? Because we need your experience and feedback as users!

Tape space available for the HEP experiments: 374 TB (Oct 04), 27% LHC / 73% non-LHC. [Bar chart: TByte per experiment for ALICE, ATLAS, CMS, LHCb, BaBar, CDF, D0 and Compass.]

Tape Storage
Tape library IBM 3584: LTO Ultrium, 8 LTO-1 drives and 4 LTO-2 drives; 375 TB native (uncompressed); Tivoli Storage Manager (TSM) for backup and archive.
Installation of dCache in progress: tape backend interfaced to Tivoli Storage Manager; installation with 1 head node and 3 pool nodes, currently tested by CMS & CDF.
Other: SAM station caches for D0 and CDF; JIM (Job Information Management) station for D0; tape connection via scripts (D0); CORBA naming service (for CDF).

GridKa plan for WAN connectivity: growth from 34 Mbps (2001) via 155 Mbps and 2 Gbps to 10 Gbps tests, then 10 Gbps and 20 Gbps by 2008; discussions with Dante to be started.
In September 2004 DFN upgraded the capacity from Karlsruhe to Géant to 10 Gbps, and tests have been started. Routing (full 10 Gbps): GridKa → DFN (Karlsruhe) → DFN (Frankfurt) → Géant (Frankfurt) → Géant (Milano) → Géant (Geneva) → CERN.

Further services & sources of information

GGUS (Global Grid User Support) www.ggus.org

User information
www.gridka.de → GridKa Info: user registration, Globus installation, batch system PBS, backup & archive, getting a certificate from the GermanGrid CA, listserver / mailing lists, monitoring status with Ganglia.
www.gridka.de → HEP experiments: experiment-specific information.
www.ggus.org: FAQ, documentation, ...

Tools: gridmon.fzk.de/ganglia
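For completeness: the Ganglia web frontend is fed by gmond daemons that publish the cluster state as XML on TCP port 8649 by default. The following Python sketch, purely an illustration with an invented hostname, shows how such a state dump could be read and summarized programmatically; regular users would simply browse gridmon.fzk.de/ganglia.

    #!/usr/bin/env python3
    # Illustrative sketch: read the XML state a Ganglia gmond publishes on its
    # default TCP port and print one standard metric per host.
    import socket
    import xml.etree.ElementTree as ET

    GMOND_HOST = "gmond.example.org"   # hypothetical collector host
    GMOND_PORT = 8649                  # Ganglia's default XML port

    def fetch_gmond_xml(host, port):
        chunks = []
        with socket.create_connection((host, port), timeout=10) as sock:
            while True:
                data = sock.recv(65536)
                if not data:           # gmond closes the connection when done
                    break
                chunks.append(data)
        return b"".join(chunks)

    def main():
        root = ET.fromstring(fetch_gmond_xml(GMOND_HOST, GMOND_PORT))
        for host in root.iter("HOST"):
            load = host.find("METRIC[@NAME='load_one']")
            if load is not None:
                print(f"{host.get('NAME')}: load_one = {load.get('VAL')}")

    if __name__ == "__main__":
        main()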

Final remarks

Europe on the way to e-science
EU project EGEE, April 2004 to March 2006: 32 million euros for personnel; 70 partner institutes in 27 countries (including Russia), organized in 9 federations; applications include the LHC grid, Biomed, ...
Goal: "Provide distributed European research communities with a common market of computing, offering round-the-clock access to major computing resources, independent of geographic location, ..."

Status of LCG / EGEE http://goc.grid-support.ac.uk/lcg2

Last but not least
We want to help:
- our users on our systems
- support/discuss cluster installations at other institutes
- support/discuss middleware installations at other centres
- create a German Grid infrastructure
and ...
We will continue the balancing act between:
- testing & data challenges
- production with real data

No equipment without people. Thanks! We appreciate the continuous interest and support from the Federal Ministry of Education and Research (BMBF).