Inca User-level Grid Monitoring



Similar documents
Inca User-level Grid Monitoring

A Survey Study on Monitoring Service for Grid

XSEDE Service Provider Software and Services Baseline. September 24, 2015 Version 1.2

Monitoring Clusters and Grids

Cloud Computing. Lecture 5 Grid Case Studies

Installing and Administering VMware vsphere Update Manager

GRID COMPUTING Techniques and Applications BARRY WILKINSON

S3 Monitor Design and Implementation Plans

SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems. Ed Simmonds and Jason Harrington 7/20/2009

PACE Predictive Analytics Center of San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D.

NetWrix SQL Server Change Reporter. Quick Start Guide

SOFTWARE TESTING TRAINING COURSES CONTENTS

IBM Endpoint Manager Version 9.1. Patch Management for Red Hat Enterprise Linux User's Guide

BI xpress Product Overview

Connecting to the University Wireless Network

globus online Globus Online for Research Data Management Rachana Ananthakrishnan Great Plains Network Annual Meeting 2013

Selenium An Effective Weapon In The Open Source Armory

FioranoMQ 9. High Availability Guide

Quick Start Guide. Citrix XenServer Hypervisor. Server Mode (Single-Interface Deployment) Before You Begin SUMMARY OF TASKS

NetWrix File Server Change Reporter. Quick Start Guide

GRMS Features and Benefits

[D-View 7 Advanced Hands-On Practice] Version 1.0

Protect Your Cisco UCS Domain with Symantec NetBackup

Web services with WebSphere Studio: Deploy and publish

[Jet-Magento Integration]

Data Movement and Storage. Drew Dolgert and previous contributors

CHAPTER 4 PROPOSED GRID NETWORK MONITORING ARCHITECTURE AND SYSTEM DESIGN

Change Manager 5.0 Installation Guide

GT 6.0 GRAM5 Key Concepts

NEFSIS DEDICATED SERVER

About Network Data Collector

Monitor Solution Best Practice v3.2 part of Symantec Server Management Suite

Anar Manafov, GSI Darmstadt. GSI Palaver,

Umbraco Courier 2.0. Installation guide. Per Ploug Hansen 5/24/2011

CME ClearPort (CP) API v.2.0. Certification Process Checklist

ESX 4 Patch Management Guide ESX 4.0

Apache Hama Design Document v0.6

VELOCITY. Quick Start Guide. Citrix XenServer Hypervisor. Server Mode (Single-Interface Deployment) Before You Begin SUMMARY OF TASKS

An approach to grid scheduling by using Condor-G Matchmaking mechanism

Patch Management. Module VMware Inc. All rights reserved

Tutorial: Packaging your server build

Deploying Business Virtual Appliances on Open Source Cloud Computing

Virtual Appliance Setup Guide

USER S MANUAL Cloud Firewall Cloud & Web Security

MALAYSIAN PUBLIC SECTOR OPEN SOURCE SOFTWARE (OSS) PROGRAMME. COMPARISON REPORT ON NETWORK MONITORING SYSTEMS (Nagios and Zabbix)

IBM Security Access Manager, Version 8.0 Distributed Session Cache Architectural Overview and Migration Guide

DEPLOYMENT GUIDE Version 1.0. Deploying the BIG-IP LTM with the Nagios Open Source Network Monitoring System

Getting Started with the Ed-Fi ODS and Ed-Fi ODS API

Configuring Apache HTTP Server as a Reverse Proxy Server for SAS 9.3 Web Applications Deployed on Oracle WebLogic Server

What makes Panda Cloud Protection different? Is it secure? How messages are classified... 5

File S1: Supplementary Information of CloudDOE

Understanding Task Scheduler FIGURE Task Scheduler. The error reporting screen.

Monitoring Oracle Enterprise Performance Management System Release Deployments from Oracle Enterprise Manager 12c

Web based training for field technicians can be arranged by calling These Documents are required for a successful install:

BMC BladeLogic Client Automation Installation Guide

Copyrighted , Address :- EH1-Infotech, SCF 69, Top Floor, Phase 3B-2, Sector 60, Mohali (Chandigarh),

: Test 217, WebSphere Commerce V6.0. Application Development

Support a New Class of Applications with Cisco UCS M-Series Modular Servers

Debugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma

GSI Credential Management with MyProxy

Smart Call Home Quick Start Configuration Guide

Use of Continuous Integration Tools for Application Performance Monitoring

globus online Integrating with Globus Online Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory

THE CCLRC DATA PORTAL

Tracking Network Changes Using Change Audit

Installing and Configuring vcenter Support Assistant

Data Center Automation with SUSE Manager Federal Deployment Agency Bundesagentur für Arbeit Data Center Automation Project

UCS. Amazing tools suite in CORBA world

Grid Computing With FreeBSD

Web Content Management System

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

HP Service Manager. Software Version: 9.40 For the supported Windows and Linux operating systems. Processes and Best Practices Guide (Codeless Mode)

Using Network Application Development (NAD) with InTouch

MSI Admin Tool User Guide

CDAT Overview. Remote Managing and Monitoring of SESM Applications. Remote Managing CHAPTER

Proposal Submission System - A Content Management System Approach for Proposal Submission

Sisense. Product Highlights.

Database FAQs - SQL Server

NETWRIX CHANGE NOTIFIER

Transcription:

Inca User-level Grid Monitoring Shava Smallen ssmallen@sdsc.edu SC 09 November 17, 2009

Goal: reliable grid software and services for users Over 750 TF Over 30 PB of online and archival data storage Connected via dedicated multi- Gbps links 30-63 software packages and 6-23 services per resource 11 TeraGrid sites, 21 resources

Related Grid monitoring tools Inca s primary objective: user-level Grid monitoring

User-level grid monitoring Runs from a standard user account Executes using a standard GSI credential Uses tests that are developed and configured based on user documentation Automates periodic execution of tests Verifies user-accessible Grid access points Centrally manages monitoring configuration Easily updates and maintains monitoring deployment

Who benefits from user-level grid monitoring? Grid managers Verify requirements are fulfilled by resource providers Identify failure trends System administrators Email notification Debugging support System administrators End users Debug user account/ environment issues Advanced users: feedback to Grid/VO

Inca provides user-level grid monitoring Stores and archives a wide variety of monitoring results Captures context of monitoring result as it is collected Eases the writing, deploying, and sharing of new tests or benchmarks Flexible and comprehensive web status pages Secure

Reporters collect monitoring data Executable programs that measure some aspect of the system or installed software Supports a set of command-line options and writes XML to stdout Schema supports multiple types of data Extensive library support for perl and python scripts (most reporters < 30 lines of code) Independent of other Inca components

Libraries support common reporter tasks Reporter Purpose Perl Library Python Library General report Inca::Reporter inca.reporter Software version testing Inca::Reporter::Version inca.versionreporter Software unit testing Inca::Reporter::SimpleUnit inca.simpleunitreporter Globus unit testing Inca::Reporter::GlobusUnit inca.globusunitreporter System performance testing Inca::Reporter::Performance inca.performancereporter Documentation http://inca.sdsc.edu/releases/latest/repdocs/perl.html http://inca.sdsc.edu/releases/latest/repdocs/python.html

Repositories support sharing Collection of reporters available via a URL Supports package dependencies Packages versioned to allow for automatic updates Inca project repository contains 150+ reporters Version, unit test, performance benchmark reporters Grid middleware and tools, compilers, math libraries, data tools, and viz tool

Agent provides centralized configuration and management Implements the configuration specified by Inca administrator Stages and launches a reporter manager on each resource - local, SSH, GRAM, WS-GRAM Sends package and configuration updates Manages proxy information Administration via GUI interface (incat) Screenshot of Inca GUI tool, incat, showing the reporters that are available from a local repository

Depot stores and publishes data Stores configuration information and monitoring results Provides full archiving of reports Uses relational database backend via Hibernate Supports HQL and predefined queries Supports plug-in customization (e.g., email notifications, downtimes) Web services - Query data from depot and return as XML

Consumer displays data Current and historical views Web application packaged with Jetty JSP 2.0 pages/tags to query data and format using XSLT CeWolf/JFreeChart to graph data Ability to fetch Inca data in HTML or XML format via REST URLs *new* Allow run nows from the Inca web status pages *new*

Tests Summary Weekly status report Cumulative test status by resource Resource status history Error history summary Related test histories Inca s status pages provide multiple levels of details Test status by package and resource Test Details Individual test history Individual test result details Historical Current status

New features Depot peering provides fault tolerance Approval mode allows system administrators greater control over testing on their resources

Software status and deployments Current software version: 2.5 (available from Inca website) http://inca.sdsc.edu

Running since 2003 Resource registration in Inca TeraGrid deployment information services (IIS) Total of ~2200 tests running on 18 login nodes, 2 grid nodes, and 3 servers Email notifications for critical services Cross-site tests CA certificate and CRL checking Screenshot of Inca status pages for TeraGrid http://inca.teragrid.org/ IIS paper to be given on Friday at GCE 09 by JP Navarro

Using Inca and IPM to measure performance variation on TeraGrid Joint work with Nick Wright (LBL) and PMaC group Performance variation makes code optimization challenging Benchmarks: Paratec, WRF, PingPong, and HPCC (naturally ordered ring and random ring) Machines: NCSA Abe, NICS Kraken, and TACC Ranger 256 processors twice a day during March 2009 Variance due to network contention; higher for WRF Mean MPI ping pong bandwidth history Wright, N., Smallen, S., Olschanowsky, C., Hayes, J., Snavely, A. Measuring and Understanding Variation in Benchmark Performance, published in HPCMP UGC 2009

Inca UC Grid deployment University of California campuses Lead by UCLA First deployed in August and now includes 4 campuses Total of 71 tests running on 5 portal servers, 4 appliance nodes, and the myproxy server http://inca.ucgrid.org

Inca monitoring benefits end users Tests resources and services used by LEAD. E.g. Pings service every 3 mins Verifies batch job submission every hour Automatically notifies admins of failures Show week of history in custom status pages Inca reported errors mirror failures we ve observed and as they are addressed we ve noticed an improvement in TeraGrid s stability. -- Suresh Marru (LEAD developer)

Benefits of using Inca Detect problems before the users notice them Easy to write and share tests and benchmarks Easy to deploy and maintain Flexible and comprehensive displays

Inca Information Announcements: inca-users@sdsc.edu Supported by: Email: inca@sdsc.edu Website: http://inca.sdsc.edu