ATLAS GridKa T1/T2 Status




ATLAS GridKa T1/T2 Status
GridKa TAB, FZK, 19 Oct 2007, München

Outline:
- GridKa T1/T2 status
- Production and data management operations
- Computing team & cloud organization
- T1/T2 meeting summary
- Site monitoring / GangaRobot
- Tier-3 / NAF

GridKa Tier-1/Tier-2 overview

6 Tier-2s for ATLAS-GridKa (>= 11 centres) in Germany, Poland, the Czech Republic, Switzerland and Austria.

Resources (unofficial best guess, Oct 07; second figure is the planned 2008 value):

Site                  CPU (kSI2k) '07/'08   Disk (TB) '07/'08   WLCG cert   Prod   DDM ops
GridKa-T1             400 / 2500            280 / 1400          X           X      X
DESY-ZN/HH (+Gött?)   80 / 550              40 / 260            X           X      X
Wuppertal             100 / 290             30 / 130            X           X      X
Freiburg              100 / 290             20 / 130            X           X
LMU                   190 / 290             40 / 130            X           X      X
MPI                   60 / 330              30 / 150            X           (X)
CH/CSCS               60 / 198              20 / 90             X           X      X
CZ/Prague             60 / 700?             11 / 400?           X           X      X
PL/CYF Krakow         120 / 465             30 / 90             X           X      X
A/Innsbruck/Wien      - / 100?              - / 10?             X           X

- Tier-2s mostly in operation and used for ATLAS production and DDM operations
- German T2 funding by the HGF Alliance until 2012; additional resources from 'D-Grid Sonderinvestitionen' in 2007
- Massive jump in resources from 2007 to 2008 (quantified in the sketch below)
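
The "massive jump" can be quantified directly from the table. A minimal Python sketch, using only the numbers quoted above (two sites shown for brevity):

    # Growth factors 2007 -> 2008, numbers copied from the resource table.
    resources = {  # site: (cpu_07, cpu_08, disk_07, disk_08) in kSI2k / TB
        "GridKa-T1":  (400, 2500, 280, 1400),
        "DESY-ZN/HH": (80, 550, 40, 260),
    }
    for site, (c07, c08, d07, d08) in resources.items():
        print(f"{site}: CPU x{c08 / c07:.1f}, disk x{d08 / d07:.1f}")
    # GridKa-T1: CPU x6.2, disk x5.0
    # DESY-ZN/HH: CPU x6.9, disk x6.5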

Production in GridKa cloud - 2007

Walltime:

Sites   Finished   Failed   Eff (%)
DE      58115      8947     87
CZ      5178       332      94
PL      11816      2675     85
CSCS    2252       283      89

Jobs:

Sites   Finished   Failed   Eff (%)
DE      310563     159905   66
CZ      15164      6689     69
PL      57830      29622    66
CSCS    10894      12354    47
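
The efficiency column is simply finished / (finished + failed), rounded to whole percent. A minimal sketch recomputing the job efficiencies from the numbers above:

    # Recompute per-cloud job efficiencies from the table above.
    jobs = {  # cloud: (finished, failed)
        "DE":   (310563, 159905),
        "CZ":   (15164, 6689),
        "PL":   (57830, 29622),
        "CSCS": (10894, 12354),
    }
    for cloud, (finished, failed) in jobs.items():
        eff = 100.0 * finished / (finished + failed)
        print(f"{cloud:5s} {eff:4.1f}%")  # DE 66.0, CZ 69.4, PL 66.1, CSCS 46.9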

Distributed data management in ATLAS

Distributed data management in GridKa cloud

Continuous MC-data replication within the GridKa cloud (figure).

Distributed data management - dataset monitoring

DDM -- 1st real ATLAS data to T1s (M4): Aug 22 - Sep 1

- Further cosmic runs until LHC startup
- Reprocessing at the Tier-2s
- Calibration/analysis at dedicated Tier-2s

GridKa cloud computing team - organization of main tasks

- MC production team: participate in ATLAS production on EGEE resources
  J. Kennedy (M), M. Schroers (W), K. Reeves (W), M. Jahoda (CZ), J. Guenther
- Distributed data management: data replication, clean-up and integrity checks
  C. Serfon (M), K. Leffhalm (DESY), A. Olszewski (PL), J. Chudoba (CZ)
- GridKa ATLAS technical contact: help sort out T1 problems, provide a direct link to the T1 admins
  D. Meder (W) (until July), S. Nderitu (BN) (since May)
- Cloud coordination: knowledgeable in all fields & active in ATLAS computing meetings
  John Kennedy (M)
- Monthly phone meetings + mailing list + wiki

Manpower status

- Substantial improvement in the last 6 months: GridKa ATLAS contact, FR & MPPM support persons
- Still missing manpower for central tasks: DDM, production, user support
- Hard or impossible to find people with a solid grid-computing background;
  mostly new personnel with professional experience from non-computing fields
  -> extended start-up phase
- ATLAS GridKa cloud organization and distribution of tasks in progress

ATLAS T1/T2 meeting - yesterday

~15 people from 8 sites: site & problem reports, common operations, discussions

Issues:
- Site usage in ATLAS production: more direct involvement in central production
- Data management & distribution: big trouble with incomplete datasets,
  i.e. a data-analysis mess and huge manual effort to accomplish completion
  - an ATLAS-wide issue; the GridKa cloud is about the best at completion
  - needed: monitoring, integrity checks, ... (see the sketch below)
  - garbage piles up in the grey zone between DDM, LFC, FTS/SRM and dCache
  - needed: more manpower and a refined distribution of tasks
- Data distribution policies & coordination
- User support within the GridKa cloud:
  Wiki: https://twiki.cern.ch/twiki/bin/view/atlas/gridkacloudusers
  Hypernews: https://hypernews.cern.ch/hypernews/atlas/get/gridkacloudusersupport.html
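
For illustration, a minimal sketch of the kind of completion check meant here: compare the files a dataset should contain (per the DDM catalogue) with the replicas actually registered at a site. How the two GUID lists are obtained (DQ2 client, LFC query, ...) is deliberately left out, and the example lists are hypothetical:

    # Dataset-completion check: expected files vs. registered replicas.
    def completion_report(expected_guids, replica_guids):
        """Return (missing, orphaned) file sets for one dataset at one site."""
        expected, replicas = set(expected_guids), set(replica_guids)
        missing = expected - replicas    # files still needed for completion
        orphaned = replicas - expected   # 'garbage' no longer in the dataset
        return missing, orphaned

    # Hypothetical 4-file dataset: one replica missing, one stale file
    # left behind on the storage element.
    missing, orphaned = completion_report(
        expected_guids=["g1", "g2", "g3", "g4"],
        replica_guids=["g1", "g2", "g4", "g9"],
    )
    print(f"complete: {not missing}, missing: {missing}, orphaned: {orphaned}")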

ATLAS site availability monitoring

- Not yet a standard suite of ATLAS SAM tests (cf. CMS/LHCb); under development/discussion
- Indirect monitoring via the production system, ...
- Recently started GangaRobot: site-functionality tests for ATLAS distributed analysis
  - complementary requirements w.r.t. production:
    - direct access to data on the close SE (dcap, xrootd, rfiod)
    - compile & link user code on the WNs
  - 1st prototype implemented to test availability/functionality of EGEE sites
    (a sketch of such a worker-node probe follows below)
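
For illustration only, a minimal sketch of what such a worker-node probe could do: check that the local data-access clients exist and that user code compiles and runs on the WN. The tool names probed here (dccp, xrdcp, rfcp, gcc) are assumptions matching the protocols listed above, not GangaRobot's actual implementation:

    import os, shutil, subprocess, tempfile

    def check_tools(tools=("dccp", "xrdcp", "rfcp", "gcc")):
        """Report which data-access/compile tools are in PATH on this WN."""
        return {t: shutil.which(t) is not None for t in tools}

    def check_compile():
        """Compile and run a trivial C program, as a user-code test."""
        with tempfile.TemporaryDirectory() as d:
            src, exe = os.path.join(d, "t.c"), os.path.join(d, "t")
            with open(src, "w") as f:
                f.write("int main(void) { return 0; }\n")
            built = subprocess.run(["gcc", src, "-o", exe]).returncode == 0
            return built and subprocess.run([exe]).returncode == 0

    print("tools:", check_tools())
    print("compile & run:", check_compile())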

Tier-3 for ATLAS and user support

- ATLAS-D internal meeting in Bonn last May:
  ATLAS computing model, GridKa cloud, distributed-analysis needs, Tier-3 options,
  plans/status of the ATLAS-D groups
- Tier-3 grid installations workshop in Wuppertal (July 07), about 20 participants
  (BN, DD, DO, GÖ, MZ, SI):
  gLite/dCache installation and configuration, networking requirements, ...,
  ATLAS software installation, ...

Helmholtz Alliance - NAF setup

- Requirements draft prepared:
  - interactive login & storage space
  - AOD & ntuple (batch) analysis
  - PROOF/ntuple (interactive) analysis
  - dedicated RAW/ESD samples for calibration & development
  - TAG database
- Discussion/feedback with DESY in progress
- Further iterations needed: role of PROOF for ATLAS analysis, startup scenarios, ...

Issues for GridKa

- Mostly smooth running of production
- Decent performance of the dCache disk pools (production, data replication, T0 tests, ...),
  but no systematic longer-term performance/crash test yet
- Some hiccups/failures of other services: LFC, RB, BDII, ...
- Need for further systematic tests of the FTS configuration for the individual T2s,
  and a coordinated effort to debug transfer failures to T2s
- Biggest current worry is the availability and performance of the tape system:
  - mass-staging input rate << 10 MB/s (cf. the measurement sketch below)
  - lack of monitoring & transparency
  - admittedly we have used it mainly write-only so far
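
To make the staging number concrete, a sketch of how such a rate can be measured: stage a batch of files from tape-backed storage and divide the bytes moved by the wall time. stage_file() is a placeholder wrapping whatever the site uses (dccp is shown, but SRM prestaging would do as well); nothing here is GridKa's actual tooling:

    import os, subprocess, time

    def stage_file(src_url, dest_path):
        """Placeholder: fetch one file from tape-backed storage."""
        subprocess.run(["dccp", src_url, dest_path], check=True)

    def staging_rate(file_urls, dest_dir="/tmp"):
        """Aggregate staging throughput in MB/s for a list of file URLs."""
        t0, total_bytes = time.time(), 0
        for url in file_urls:
            dest = os.path.join(dest_dir, os.path.basename(url))
            stage_file(url, dest)
            total_bytes += os.path.getsize(dest)
        return total_bytes / 1e6 / (time.time() - t0)

    # e.g. staging_rate(["dcap://door.example.org/pnfs/.../file1", ...])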

Conclusions

- Manpower for the computing team substantially improved, but still critical gaps to fill
  - organization & distribution of tasks within the GridKa cloud in progress
  - more people needed for central services
- Massive CPU & storage increase next year at the T1 & T2s:
  potential scaling problems in various areas
- In ATLAS computing, data management is most critical:
  - baseline performance meets the specs
  - continuous struggle with dataset completion despite the manpower effort in DDM ops
- At GridKa, tape I/O usage, performance and optimization is the most critical current issue