ANALYSIS FUNCTIONAL AND STRESS TESTING


Dan van der Ster, CERN IT-ES-DAS
for the HC team: Johannes Elmsheuser, Federica Legger, Mario Úbeda García
WLCG Workshop, 8 July 2010

Outline
- Overview: what should we test in Distributed Analysis?
- HammerCloud and GangaRobot: tools for stress and functional testing
- Recent developments: HammerCloud v3 deployment
- Next steps: the AFT, integration with SSB

DA Testing Goals
- Functional Testing:
  - Test the basic infrastructure: SAM, Nagios. Not covered in this talk.
  - Basic test of the complete analysis workflow: Client -> Workload Mgmt -> ... -> Site -> Worker Node -> Storage
  - Special workflows:
    - Complete chain test with Frontier/Squid access
    - Tier 3 analysis
- Stress Testing:
  - On-demand tests to help commission, tune, and benchmark the analysis sites
  - Standardized tests: end-to-end tests with real analyses of real data

Intro to HammerCloud
HammerCloud (HC) is a Distributed Analysis testing system serving these two use cases:
- Robot-like Functional Testing: frequent ping jobs to all sites to perform end-to-end DA testing
- DA Stress Testing: on-demand (large-scale) stress tests using real analysis jobs to test one or many sites simultaneously, in order to:
  - Help commission new sites
  - Evaluate changes to site infrastructure
  - Evaluate SW changes
  - Compare site performances
ATLAS has already made a big investment in HC stress testing:
- ~210,000 CPU-wallclock days (that's 576 CPU-years)
- But this is only a few percent of the global DA resources

HammerCloud Web UI
http://hammercloud.cern.ch/atlas/

Implementation
- The HC UI is implemented as a Django web app:
  - View test results
  - View cloud/site evolution
  - DB Admin
- State and results are maintained in MySQL
- HC logic (job submission, monitoring, resubmission) is implemented on top of the Ganga Grid Programming Interface (GPI)

HammerCloud v3
HammerCloud v3 was recently deployed. What's new?
- Test Templates: standardized tests are templated. Templates are instantiated as a Test.
- Functional Testing: automatic instantiation of functional Templates at a defined frequency (these are the GangaRobot tests)
- Robot Report: graphical display of site efficiencies for the functional tests
- Behind-the-scenes refactoring: move to SL5, RPMs, generalizing the code for non-Athena/non-ATLAS tests
- Plus many small interface changes
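The template model can be pictured with a minimal sketch. It is an illustration only, not HammerCloud's actual code: the names Template, HCTest, instantiate and due_tests are hypothetical, and the one-hour period in the usage note is an assumed value, since the slides do not state the frequency.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Template:
    """A standardized test definition (hypothetical fields)."""
    name: str
    athena_release: str
    dataset_pattern: str
    backend: str
    period: Optional[timedelta] = None  # None: on-demand only (stress test)


@dataclass
class HCTest:
    """One concrete run created from a Template."""
    template: Template
    sites: tuple
    start: datetime


def instantiate(template, sites, start):
    """Create a Test from a Template without altering the Template."""
    return HCTest(template=template, sites=tuple(sites), start=start)


def due_tests(templates, last_run, sites, now):
    """Instantiate every functional Template whose period has elapsed.

    last_run maps template name -> datetime of the previous instantiation
    and is updated in place.
    """
    tests = []
    for template in templates:
        if template.period is None:
            continue  # on-demand templates are never scheduled automatically
        previous = last_run.get(template.name)
        if previous is None or now - previous >= template.period:
            tests.append(instantiate(template, sites, now))
            last_run[template.name] = now
    return tests
```

Calling due_tests from a periodic loop with, say, Template("UserAnalysis", "15.6.9", "mc0*.merge.aod.e*_r*", "Panda", timedelta(hours=1)) would create one new Test per hour, while a template with no period is only ever instantiated on request.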

HC Ops Functional Tests
Currently active Functional Tests:
- UserAnalysis, Athena 15.6.9 with mc0*.merge.aod.e*_r*
  - on Panda and LCG
  - Data access: Panda schedconfig, local direct, FileStager
  - 4 tests in total
- D3PDMaker, Athena 15.6.10.6 (Frontier/Squid test)
  - on Panda and LCG
  - Data access: Panda schedconfig, local direct
  - 2 tests in total
Each test is set to keep 1 job running at all sites continuously
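The "keep 1 job running at all sites" policy amounts to a top-up rule. The sketch below is one hedged reading of it, not the HammerCloud implementation; the function name jobs_to_submit and the shape of its inputs are assumptions made for illustration.

```python
def jobs_to_submit(sites, active_jobs, target=1):
    """Return {site: number of jobs to submit} to reach the target.

    sites       -- iterable of site names covered by the test
    active_jobs -- dict mapping site name -> jobs currently submitted or running
                   (sites missing from the dict are treated as having none)
    target      -- jobs to keep alive per site (1 on the slide)
    """
    shortfall = {}
    for site in sites:
        missing = target - active_jobs.get(site, 0)
        if missing > 0:
            shortfall[site] = missing
    return shortfall
```

Run once per monitoring cycle, this submits a replacement only where the previous job has finished or failed, so a site never holds more than the target number of test jobs.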

HC Robot Report

HC in SAM / HC Email Report
- HC in SAM: http://dashb-sam-atlas.cern.ch
- Email robot report (currently sent to DAST)
- And available on the web: http://gangarobot.cern.ch/blacklist_hammercloud.html

HC in Panda Monitor
- Panda processingtypes:
  - Functional tests use processingtype=gangarobot
  - Stress tests use processingtype=hammercloud
- Browse the HC results in the Panda monitor:
  - http://tiny.cc/panda-gangarobot
  - http://tiny.cc/panda-hammercloud

HC Ops Stress Tests
- A number of test templates are ready for site or cloud admins to schedule on demand
  - An HC account is needed. Contact us if you want one.
- On Panda or LCG using any data access method (including Panda FileStager or direct access):
  - Muon Analysis, Athena 15.6.6, mc09*merge.aod*.e*r12*
  - D3PDMaker, Athena 15.6.10.6, data10_7tev*physics_*aod*, Frontier/Squid
- Panda Tier 3 Test:
  - Muon Analysis, Athena 15.6.6
  - you mail us a list of PFNs

Example Stress Test

Next Steps
- Fix the Frontier/Squid test
  - ~25% of the jobs are currently crashing
  - The crashes are correlated in a non-obvious way to the desd dataset used
  - Currently replicating a known working dataset to all DE sites (globally later)
- Integrate with Site Status Board
  - This is the long-discussed ADC Analysis Functional Test
  - SSB implements the policy for site exclusion
  - Will provide better communication to sites in case of exclusion
- Auto-approval for some test requests
  - Manual approval isn't needed if the test is simply an instance of one of the approved Templates
- Site Ranking Tool
  - Provides a score (per test template) to make comparisons
- Robot Web display to present results separated by Test Type (template)
  - E.g. see only the Frontier/Squid test results
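The auto-approval rule can be sketched as a simple check. This is a hypothetical illustration: the Request class, its fields, and the interpretation that "simply an instance" means "no template parameter overridden" are all assumptions, not something the slides specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    """A test request submitted by a site or cloud admin (hypothetical)."""
    template_name: str
    overrides: dict = field(default_factory=dict)  # parameters changed from the template


def needs_manual_approval(request, approved_templates):
    """Return True if a human must approve the request.

    Assumption: a request is auto-approved only when it names an approved
    template and overrides none of that template's parameters.
    """
    if request.template_name not in approved_templates:
        return True
    return bool(request.overrides)
```

With approved = {"Muon Analysis", "D3PDMaker"}, the call needs_manual_approval(Request("D3PDMaker"), approved) returns False, while a request that changes a template parameter or names an unknown template still goes to a human.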

Conclusions
- HammerCloud has taken over responsibility for all central DA tests: functional and stress testing
- The new template model makes usage more user-friendly
  - There is not much room for error, which enables automatic test approval
- Test results are available in a variety of places:
  - HC web, Email reports, SAM Dashboard, Panda Monitor (SSB coming soon)
- Acknowledgements to the HC team: Johannes Elmsheuser, Federica Legger, Mario Úbeda García