ScicomP13 / SP-XXL 2007, Garching (Munich)

Monitoring Infrastructure for Superclusters: Experiences at MareNostrum

Ernest Artiaga
Performance Group, BSC-CNS Operations
Outline

- BSC-CNS and MareNostrum Overview
- Monitoring Requirements
- Building Blocks for a Monitoring System
- GGCollector: Architecture and Implementation
- Lessons Learned and Future Work
The BSC-CNS Consortium

The BSC-CNS Consortium includes:
- Spanish Government (MEC)
- Catalonian Government (DURSI - Generalitat)
- Technical University of Catalonia (UPC)

Started operations in 2005
Located at the UPC North Campus
Top500 (ranking chart)
MareNostrum Overview

Computing nodes:
- 2560 JS21 blades, 2.3 GHz
- 4 cores per board
- 8 GBytes RAM
- 36 GBytes disk

Network:
- Myrinet: 2 Spine 1280, 10 Clos 256, 2560 Myrinet cards
- Gigabit Ethernet
- 10/100 Ethernet

Racks:
- Blade centers
- Myrinet racks
- Storage servers
- Operations rack
- Gigabit switch
- 10/100 switches

Totals:
- 94.20 peak TFlops
- 10240 processors
- 20 TBytes memory
- 280 + 90 TBytes disk
MareNostrum Overview

- Novell SuSE Linux (kernel 2.6) on each node
- OS image loaded via the Gigabit network
- Batch system: Slurm + Moab (central manager node)
- File systems:
  - NFS & GPFS provided by 41 storage servers
  - Local disk for scratch
Outline

- BSC-CNS and MareNostrum Overview
- Monitoring Requirements
- Building Blocks for a Monitoring System
- GGCollector: Architecture and Implementation
- Lessons Learned and Future Work
Supercluster Specifics

Increased risk of error conditions:
- Applications span hundreds (or thousands) of nodes
- Probability of hardware/software/network component failure

Heterogeneous workload:
- In practice, "general purpose" means no fixed patterns
- Application/system behaviour changes with individual projects
- Per-processor allocation vs. per-node allocation

Goals:
- Detect, process and report component malfunctions
- Analyze the performance of MareNostrum
- Support diagnostics and decision-making
Requirements

Minimize impact on applications:
- Small nodes need light probes

High scalability:
- Big clusters do not tolerate broadcasts and synchronization well

Heterogeneous data sources:
- Operating system and scripts on each node
- Management modules (from blade center technology)
- Storage and external server logs and tools
- Batch system controller, job prologs and epilogs
- Network equipment
- Redundant information to find discrepancies
Flexibility

Different data views for humans and systems:
- Overall, synthetic high-level view for system administrators
- Detailed status for diagnostics
- Historical records for performance and management
- Batch system interface for taking scheduler decisions
- Low-level format for alert systems and automatic preventive/corrective actions

Able to deal with new software:
- Take advantage of existing tools

Location independent:
- Decouple data acquisition/processing/storage/view
Outline

- BSC-CNS and MareNostrum Overview
- Monitoring Requirements
- Building Blocks for a Monitoring System
- GGCollector: Architecture and Implementation
- Lessons Learned and Future Work
The GGCollector Framework

Good monitoring tools already exist, but:
- They do not fulfil all our requirements (mainly scalability/footprint)
- Interoperation with other tools is difficult
- Though some components of these tools may be nice!

Additionally, we would like a single tool for everything:
- At least a common interface

Our solution: GGCollector
- A framework to integrate components from other tools
- Provides collection/storage/view independence
- Compensates for the weak points of other tools
Building Blocks: Ganglia

http://ganglia.sourceforge.net/

Widely used monitoring system with three components:
- Gmond: per-node information
- Gmetad: central collector for all cluster data
- Web interface: visualization tool

Interesting features:
- Gmond's small footprint and extensible metrics
- Easy-to-parse XML data exchange (see the sketch below)

Drawbacks:
- Poor scalability on big flat networks
- Visualization tightly coupled with data collection
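The XML exchange is what makes Gmond easy to reuse as a building block: by default it dumps the full cluster state as XML to any client that connects to its TCP port (8649 by default). A minimal polling sketch; the host name is illustrative:

```python
# Minimal sketch: poll a gmond daemon's XML dump and list its metrics.
# Assumes gmond's default XML TCP port (8649); host name is illustrative.
import socket
import xml.etree.ElementTree as ET

def poll_gmond(host, port=8649):
    """Read the full XML state dump that gmond emits on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(8192)
            if not data:          # gmond closes after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)

root = ET.fromstring(poll_gmond("bladecenter01"))   # hypothetical host
for host in root.iter("HOST"):
    for metric in host.iter("METRIC"):
        print(host.get("NAME"), metric.get("NAME"), metric.get("VAL"))
```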
Building Blocks: RRDtool

http://oss.oetiker.ch/rrdtool

Common tool for handling historical data, based on large circular buffers.

Features:
- Can handle several data sources
- Provides simple statistical functions for collected data
- Limited source aggregation capacity (scalability limits)
- Requires predefined fixed-size time slices (see the sketch below)

Lots of front-ends, including web-based graphical interfaces like drraw:
http://web.taranis.org/drraw/
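The fixed-size time slices are set when a round-robin database is created: the step and the number of rows in each archive cannot grow afterwards. A sketch driving the rrdtool command line from Python; the file name, data-source name and archive sizes are illustrative:

```python
# Sketch of RRDtool's fixed-size time slices via the CLI.
import subprocess
import time

# One data source sampled every 300 s; keep 288 averaged slots (~1 day)
# and 168 two-hour averages (~2 weeks). Sizes are fixed at creation time.
subprocess.run([
    "rrdtool", "create", "node_load.rrd", "--step", "300",
    "DS:load:GAUGE:600:0:U",
    "RRA:AVERAGE:0.5:1:288",
    "RRA:AVERAGE:0.5:24:168",
], check=True)

# Feed one sample (timestamp:value); a collector would loop over this.
subprocess.run(
    ["rrdtool", "update", "node_load.rrd", f"{int(time.time())}:1.25"],
    check=True,
)
```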
Building Blocks: Other Sources

Batch system:
- Job wrapper scripts on nodes
- Controller daemon data on the head server
- Data available via API

Air conditioning:
- Chiller machine status from ad-hoc scripts

More to come...
- Management module data (node temperature, status, ...)
- Network equipment (interface to MRTG): http://oss.oetiker.ch/mrtg
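As an illustration of how a job wrapper might feed the system, a Slurm epilog could push a per-job metric toward the collector. The ggcpush command name and its options are assumptions made for this sketch; the real GGCPush client appears later in these slides but its interface is not documented here.

```python
# Hypothetical epilog snippet: push a job-completion metric toward
# GGCollector. The "ggcpush" command line and its options are
# assumptions for illustration only.
import os
import subprocess

node = os.uname().nodename
jobid = os.environ.get("SLURM_JOB_ID", "unknown")  # set by Slurm epilogs
subprocess.run(
    ["ggcpush", "--metric", "job_finished",
     "--host", node, "--value", jobid],
    check=True,
)
```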
Outline

- BSC-CNS and MareNostrum Overview
- Monitoring Requirements
- Building Blocks for a Monitoring System
- GGCollector: Architecture and Implementation
- Lessons Learned and Future Work
GGCollector

Core of the MareNostrum monitoring infrastructure:
- Developed at BSC-CNS
- Framework to integrate/complement other tools
- Avoids unnecessary data movements

Metric: the basic piece of data
- Applicable host (may differ from the producer)
- Time stamp
- Type
- Value

Metric type attributes:
- Data type (integer/float/string)
- TTL (expiration time)
- Threshold (value changes relevant for reporting)
- Source
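A sketch of this metric model; the field names follow the slide, but the concrete representation is an assumption (GGCollector itself exposes C and Perl APIs, not Python):

```python
# Sketch of the metric model described above; representation is assumed.
from dataclasses import dataclass
import time

@dataclass
class MetricType:
    name: str
    data_type: type     # int, float, or str
    ttl: float          # seconds before the value expires
    threshold: float    # minimum change worth reporting
    source: str         # which collector produces it

@dataclass
class Metric:
    host: str           # applicable host (may differ from the producer)
    mtype: MetricType
    value: float        # numeric metrics only, in this sketch
    timestamp: float = 0.0

    def expired(self, now=None):
        return ((now or time.time()) - self.timestamp) > self.mtype.ttl

    def relevant_change(self, new_value):
        """True if the change exceeds the reporting threshold."""
        return abs(new_value - self.value) >= self.mtype.threshold
```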
GGCollector Metric Synthesizer

Scalable aggregation facility for data from different nodes:
- Provides simple operations: count, average, max, min, sum

Subscription mode:
- Clients are notified only when relevant changes occur
- Only requested data is sent
- Full query mode is also available for one-time requests

Other features:
- Low resource consumption
- Basic CLI interface (extensible via API)
- Distributed client/server model
- Scalable by chaining (either as proxy or peer)
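A toy in-process version of the subscription idea: clients register interest in an aggregate and are called back only when it changes by more than a threshold. The real synthesizer is a separate daemon speaking a text-based protocol; everything here is illustrative.

```python
# Toy sketch of subscription-mode aggregation: notify clients only on
# relevant changes. Not GGCollector's actual protocol or API.
class MetricSynthesizer:
    OPS = {"count": len, "sum": sum, "max": max, "min": min,
           "average": lambda v: sum(v) / len(v)}

    def __init__(self):
        self.values = {}   # host -> latest value
        self.subs = []     # list of subscription records

    def subscribe(self, op, threshold, callback):
        self.subs.append({"op": op, "threshold": threshold,
                          "callback": callback, "last": None})

    def update(self, host, value):
        self.values[host] = value
        for sub in self.subs:
            result = self.OPS[sub["op"]](list(self.values.values()))
            last = sub["last"]
            if last is None or abs(result - last) >= sub["threshold"]:
                sub["last"] = result
                sub["callback"](result)  # notify only on relevant change

synth = MetricSynthesizer()
synth.subscribe("average", 0.5, lambda v: print("avg load now", v))
synth.update("node1", 1.0)   # fires: first value
synth.update("node2", 1.2)   # suppressed: change below threshold
```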
GGCollector: Architecture

(Diagram) Data providers, such as the Ganglia collectors (Gmond), feed the Metric Synthesizer and the Data Keeper; client support is offered through a C API, a Perl API, and a text-based protocol used by clients.
GGCollector: Configuration

Ganglia configuration:
- No Gmetad
- Data shared only inside a blade center

GGCollector configuration:
- Both metric types and applicable hosts are pre-defined
- Tuneable number of threads

Other sources:
- Use the standard GGCPush client
- Need data translation into the GGCollector format

Data acquisition: "when necessary", as opposed to "as fast as possible" (see the sketch below)
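One way to read the acquisition policy: a value is fetched from its source only when a client asks for it and the cached copy has expired, instead of polling every source continuously. A minimal sketch, assuming a generic fetch function:

```python
# Sketch of "when necessary" acquisition: hit the data source only when
# a client asks and the cached value is stale. fetch_from_source() is a
# placeholder for a real Ganglia/GGCPush read.
import time

class LazyCache:
    def __init__(self, fetch, ttl):
        self.fetch, self.ttl = fetch, ttl
        self.value, self.stamp = None, 0.0

    def get(self):
        if time.time() - self.stamp > self.ttl:  # stale: refresh now
            self.value = self.fetch()
            self.stamp = time.time()
        return self.value

def fetch_from_source():
    return 42.0   # placeholder for an actual probe

temp = LazyCache(fetch_from_source, ttl=60)
print(temp.get())  # fetches once; calls within 60 s reuse the cache
```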
GGCollector: Deployment

(Diagram) Components: Gmond on the blade centers, storage servers and head server; GGCPush feeding data from selected nodes and from SlurmCtld and Moab on the head server; GGCollector with a blade center helper on the monitoring servers; GGC2RRD, the head monitor and drraw producing views; the web interface served by an Apache server; hooks for external monitoring software.
GGCollector: Scalability

GGCollector chains:
- Peer mode
- Proxy mode
- Specialization

(Diagram) Peer GGCollectors serve their own data sources directly to clients; a proxy GGCollector forwards client requests to peer GGCollectors that handle further data sources.
Examples

Public status page at http://www.bsc.es (screenshot)
Examples

Internal usage screen built with drraw (screenshot)
Examples

Ggstatus: command-line tool (text-based) (screenshot)
Outline

- BSC-CNS and MareNostrum Overview
- Monitoring Requirements
- Building Blocks for a Monitoring System
- GGCollector: Architecture and Implementation
- Lessons Learned and Future Work
Lessons Learned

Minimize impact on running applications:
- Tools with a small footprint (memory and CPU) on nodes
- Collect information only when necessary: delays are acceptable!

Hardware resources for monitoring are limited:
- Move the data out and process it elsewhere
- Decouple data acquisition/processing/storage/view

Scalability: think thousands, not hundreds!
- Avoid synchronization
- Avoid broadcasts

Simple, generic, extensible framework:
- Easy to integrate, easy to extend
Future Work

GGCollector is currently in production at BSC-CNS:
- It holds together the MareNostrum monitoring infrastructure

Still room for improvement:
- Handling of thresholds and timeouts (per client?)
- Improvement of chained configurations

Integration with other tools (take advantage of existing software):
- Nagios alert system (http://nagios.org)
- MRTG (http://oss.oetiker.ch/mrtg)
- Database backend for historical data

Application performance analysis:
- Cross-referencing collected data with batch system job information
Spanish Supercomputing Network: GGCollector Deployment

Sites connected through RedIRIS:

MareNostrum (BSC-CNS):
- Processors: 10240 PowerPC 970, 2.3 GHz
- Memory: 20 TBytes
- Disk: 280 + 90 TBytes
- Network: Myrinet, Gigabit, 10/100
- System: Linux

CeSViMa:
- Processors: 2408 PowerPC 970, 2.2 GHz
- Memory: 4.7 TBytes
- Disk: 63 + 47 TBytes
- Network: Myrinet, Gigabit, 10/100
- System: Linux

IAC, UMA, UNICAN, UNIZAR, UV:
- Processors: 512 PowerPC 970, 2.2 GHz
- Memory: 1 TByte
- Disk: 14 + 10 TBytes
- Network: Myrinet, Gigabit, 10/100
- System: Linux
Thank you!

www.bsc.es