Scalable System Monitoring



Scalable System Monitoring with Eclipse Parallel Tools Platform (PTP)
Wolfgang Frings, Jülich Supercomputing Centre (member of the Helmholtz Association)
September 2012, CHANGES Workshop
W.Frings@fz-juelich.de

Why System Monitoring in an IDE?
- Eclipse/PTP: development, execution and debugging of parallel programs on remote systems
- Information about remote system status:
  - Where is my job running?
  - What is going on on the remote system?
  - Why is my job not running?
- Monitoring parallel programs on remote systems:
  - Eclipse/PTP version of llq / qstat
  - LLview batch system monitoring tool
- LLview: graphical monitoring, mapping of jobs to system resources

September 2012, CHANGES Workshop, Jülich. Scalable System Monitoring with PTP

Sample: LLview client, JUGENE

LLview and Eclipse PTP
- PTP: Eclipse Parallel Tools Platform
  - Remote system monitoring
  - New implementation using LML, LLview components and LML adapters
- Project "A Scalable Development Environment for Peta-Scale Computing"
  - 3-year project, started September 2009, DOE funded
  - JSC contribution: system and application monitoring
- LLview components integration, part of Eclipse/PTP (since June 2011)
- since June 12

PTP Monitoring: Example JUQUEEN

PTP Monitoring: Further Examples
- Trestles (XSEDE)
- Juropa (Juelich)
- Lincoln (NCSA)
- Forge (XSEDE)

Monitoring System Dataflow
- LML_da: implements a distributed workflow for data acquisition and data processing
- LML-Request: describes the data to visualize in the client
- Implements server-side filtering and support for levels of detail

Large-scale system Markup Language
- LML: markup language for describing a supercomputer's status
- Interface between LML_da and visualization clients
- Describes all graphical components available in LLview (job-list, nodedisplay, charts)
- Logical and system-independent description of the current status, which can easily be converted into graphical output
- Implemented in XML, validated against an XML Schema
[Diagram: LML, Eclipse/PTP, LML_da (data access)]
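Because the status description is XML, a client can read it with an ordinary XML parser before rendering. The following is only a minimal sketch of that idea: the tag and attribute names (`status`, `job`, `owner`, `nodes`) and the helper `read_jobs` are invented for illustration and are not the real LML schema. Validation against an XML Schema is left out because the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Hypothetical status document. Tag and attribute names are assumptions
# made for this sketch, NOT the actual LML schema.
STATUS_XML = """
<status system="demo">
  <job id="j1" owner="alice" state="running" nodes="4"/>
  <job id="j2" owner="bob" state="queued" nodes="16"/>
</status>
"""


def read_jobs(xml_text):
    """Return one dict per <job> element, with the node count as an int."""
    root = ET.fromstring(xml_text.strip())
    jobs = []
    for elem in root.iter("job"):
        job = dict(elem.attrib)
        job["nodes"] = int(job["nodes"])
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    for job in read_jobs(STATUS_XML):
        print(job["id"], job["state"], job["nodes"])
```

A client built this way stays independent of the batch system: it only needs to understand the status document, not the scheduler that produced it.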

LML Structure
- Global data: intermediate data format, scheduling objects
- Graphical components: data for visualization components
- Layout: hints for visualization

Node Display
- Description of the supercomputer's architecture
- Maps jobs to compute resources (nodes, CPUs)
- Challenge: description of large systems (e.g. JUQUEEN: 131072 CPUs)
- Targets: data compression, avoiding redundancy

Node Display Compression

Node Display Compression
- Attribute inheritance
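Attribute inheritance can be pictured as a tree in which a child takes over its parent's attributes unless it sets its own, so a value shared by many nodes is written only once. The sketch below illustrates that general idea. The dictionary layout and the function `resolve` are hypothetical and do not reproduce the LML node display format.

```python
def resolve(node, inherited=None):
    """Yield (path, attributes) for every leaf, merging parent attributes downward.

    A child's own attributes override the inherited ones.
    """
    attrs = dict(inherited or {})
    attrs.update(node.get("attrs", {}))
    children = node.get("children", [])
    if not children:
        yield node["name"], attrs
        return
    for child in children:
        for name, leaf_attrs in resolve(child, attrs):
            yield node["name"] + "/" + name, leaf_attrs


# Hypothetical example: the rack-level job applies to every node
# except node1, which overrides it.
tree = {
    "name": "rack0",
    "attrs": {"job": "j1"},
    "children": [
        {"name": "node0"},
        {"name": "node1", "attrs": {"job": "j2"}},
        {"name": "node2"},
    ],
}

if __name__ == "__main__":
    for path, attrs in resolve(tree):
        print(path, attrs)
```

Only the exception (`node1`) carries its own attribute; the two remaining nodes are described entirely by the parent entry.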

Node Display Compression
- Scheme: defines the architecture of the empty system

Node Display Compression
- Ranges: 3 objects instead of 16
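The "3 objects instead of 16" effect can be reproduced with a simple run-length style encoding: consecutive nodes with the same content collapse into one range object. This is a minimal sketch under assumed data. The job names and the functions `compress` and `expand` are illustrative and are not taken from the LML implementation.

```python
def compress(values):
    """Collapse consecutive equal values into (first_index, last_index, value) ranges."""
    ranges = []
    for index, value in enumerate(values):
        if ranges and ranges[-1][2] == value:
            first, _, _ = ranges[-1]
            ranges[-1] = (first, index, value)  # extend the current range
        else:
            ranges.append((index, index, value))  # start a new range
    return ranges


def expand(ranges):
    """Inverse of compress: rebuild the per-node list from the ranges."""
    out = []
    for first, last, value in ranges:
        out.extend([value] * (last - first + 1))
    return out


# Hypothetical 16-node system: 8 nodes run j1, 4 run j2, 4 are empty.
nodes = ["j1"] * 8 + ["j2"] * 4 + ["empty"] * 4

if __name__ == "__main__":
    print(compress(nodes))
```

The saving depends on how contiguous the job placement is: a fully fragmented system would still need one object per node.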

Scalable Visualization
- Row level

Scalable Visualization
- Nodeboard level

Scalable Visualization
- CPU Node level
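One way to picture the row, nodeboard and CPU node levels is as the same data summarized at different depths of the hardware hierarchy. The sketch below assumes that each CPU is addressed by a (row, board, cpu) tuple. That addressing scheme and the function `summarize` are assumptions for illustration, not the mechanism used by LLview or PTP.

```python
from collections import Counter


def summarize(cpu_jobs, depth):
    """Group CPUs by the first `depth` components of their location
    and count how many CPUs each job occupies in every group."""
    summary = {}
    for location, job in cpu_jobs.items():
        key = location[:depth]
        summary.setdefault(key, Counter())[job] += 1
    return summary


# Hypothetical system with two rows; locations are (row, board, cpu).
cpu_jobs = {
    ("row0", "board0", "cpu0"): "j1",
    ("row0", "board0", "cpu1"): "j1",
    ("row0", "board1", "cpu0"): "j2",
    ("row1", "board0", "cpu0"): "j2",
}

if __name__ == "__main__":
    for depth, label in ((1, "row"), (2, "nodeboard"), (3, "cpu")):
        print(label, {k: dict(v) for k, v in summarize(cpu_jobs, depth).items()})
```

At depth 1 a client would draw one element per row, and it only needs the finer levels once the user zooms in.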

Table Filtering
- Scalability of job data display
- Filtering by specification of attribute values/ranges
- Server- and client-side filtering
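Filtering by attribute values or ranges can be sketched as a predicate applied to each job row. The same predicate could run on the server to cut down the transferred data, or on the client to narrow the table. The criteria format (an exact value, or a `(low, high)` tuple for a range) and the names `matches` and `filter_jobs` are assumptions made for this example.

```python
def matches(job, criteria):
    """True if every criterion holds.

    A criterion is either an exact value or a (low, high) tuple
    describing an inclusive range.
    """
    for key, wanted in criteria.items():
        value = job.get(key)
        if isinstance(wanted, tuple):
            low, high = wanted
            if value is None or not (low <= value <= high):
                return False
        elif value != wanted:
            return False
    return True


def filter_jobs(jobs, criteria):
    """Return the jobs that satisfy all criteria, in their original order."""
    return [job for job in jobs if matches(job, criteria)]


# Hypothetical job table.
jobs = [
    {"id": "j1", "owner": "alice", "nodes": 4},
    {"id": "j2", "owner": "bob", "nodes": 16},
    {"id": "j3", "owner": "alice", "nodes": 64},
]

if __name__ == "__main__":
    print(filter_jobs(jobs, {"owner": "alice", "nodes": (1, 32)}))
```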

Prediction of system usage (JuFo)
- Online simulation for job schedulers on supercomputers (JuFo: Juelich Forecast)
- Independent module using intermediate LML as data interface
- Target-system-independent prediction of job start dates
- Visualisation: Gantt chart
- Carsten Karbach, JSC
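To illustrate what predicting job start dates by simulation means, the sketch below replays a queue against a fixed pool of nodes using strict first-come-first-served scheduling without backfilling. This is a deliberately simplified stand-in and not the JuFo algorithm. The function `predict_starts`, its input format and the assumption that runtimes are known exactly are all hypothetical.

```python
import heapq


def predict_starts(total_nodes, running, queued, now=0):
    """Predict start times under strict FCFS (no backfilling).

    running: list of (end_time, nodes) for jobs already on the system
    queued:  list of (job_id, nodes, runtime) in queue order
    Returns {job_id: predicted_start_time}.
    """
    free = total_nodes - sum(nodes for _, nodes in running)
    events = list(running)  # min-heap of (end_time, nodes) releases
    heapq.heapify(events)
    clock = now
    starts = {}
    for job_id, nodes, runtime in queued:
        if nodes > total_nodes:
            raise ValueError(f"job {job_id} needs more nodes than the system has")
        # Advance time until enough nodes have been released.
        while free < nodes:
            end, released = heapq.heappop(events)
            clock = max(clock, end)
            free += released
        starts[job_id] = clock
        free -= nodes
        heapq.heappush(events, (clock + runtime, nodes))
    return starts


if __name__ == "__main__":
    # Hypothetical 8-node system with one 6-node job running until t=10.
    print(predict_starts(8, [(10, 6)], [("a", 4, 5), ("b", 2, 3), ("c", 8, 1)]))
```

The resulting start and end times per job are the kind of data a Gantt chart view would plot. A real scheduler simulation also has to model priorities, backfilling and inaccurate runtime estimates.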

Conclusion
- Eclipse PTP: integration tool to develop parallel codes on remote systems
  - Code + Execution + Debugging
  - System monitoring to control program execution
- Scalability
  - Scalable data acquisition
  - Scalable data format
  - Scalable data presentation
- Flexibility and portability
  - System-independent data format
  - Small system- and scheduler-related drivers
  - Eclipse/PTP integration

Questions?