LMT Lustre Monitoring Tools



Similar documents
Cray Lustre File System Monitoring

New and Improved Lustre Performance Monitoring Tool. Torben Kling Petersen, PhD Principal Engineer. Chris Bloxham Principal Architect

Lustre & Cluster. - monitoring the whole thing Erich Focht

TEST AUTOMATION FRAMEWORK

Monitoring the Lustre* file system to maintain optimal performance. Gabriele Paciucci, Andrew Uselton

Jason Hill HPC Operations Group ORNL Cray User s Group 2011, Fairbanks, AK

Lustre Monitoring with OpenTSDB

Monitoring Tools for Large Scale Systems

Ganglia & Nagios. Maciej Lasyk 11. Sesja Linuksowa Wrocław, /25. Maciej Lasyk, Ganglia & Nagios

April 8th - 10th, 2014 LUG14 LUG14. Lustre Log Analyzer. Kalpak Shah. DataDirect Networks. ddn.com DataDirect Networks. All Rights Reserved.

How To Write A Libranthus (Libranthus) On Libranus 2.4.3/Libranus 3.5 (Librenthus) (Libre) (For Linux) (

Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications

Tools and strategies to monitor the ATLAS online computing farm

Open Source Data Protection. Bareos Open Source Data Protection

Developing High-Performance, Scalable, cost effective storage solutions with Intel Cloud Edition Lustre* and Amazon Web Services

ALERT installation setup

Lustre * Filesystem for Cloud and Hadoop *

SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems. Ed Simmonds and Jason Harrington 7/20/2009

DTWMS Required Software Engineers. 1. Senior Java Programmer (3 Positions) Responsibilities:

Dynamic Honeypot Construction

Analyzing large flow data sets using. visualization tools. modern open-source data search and. FloCon Max Putas

High Performance Computing OpenStack Options. September 22, 2015

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

COURSE CONTENT Big Data and Hadoop Training

Monitoring Clusters and Grids

Figure 1. perfsonar architecture. 1 This work was supported by the EC IST-EMANICS Network of Excellence (#26854).

Graylog2 Lennart Koopmann, OSDC /

Using Moab Service Manager. Steve Hurst September 17, 2009

Improved metrics collection and correlation for the CERN cloud storage test framework

Peers Techno log ies Pv t. L td. HADOOP

ITG Software Engineering

Client Overview. Engagement Situation. Key Requirements

Backup with rdiff-backup and rsnapshot

Presented by: CSIR-KNOWGATE. KNOWGATE KNOWGATE Website: knowgate.niscair.res.in

TPAf KTl Pen source. System Monitoring. Zenoss Core 3.x Network and

STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM

Customer Case Study. Sharethrough

Yahoo! Communities Architectures Ian Flint

Implementing a Data Warehouse with Microsoft SQL Server 2012

Uncovering degraded application performance with LWM 2. Aamer Shah, Chih-Song Kuo, Lucas Theisen, Felix Wolf November 17, 2014

Oak Ridge National Laboratory Computing and Computational Sciences Directorate. File System Administration and Monitoring

PowerShell. It s time to own. David Kennedy (ReL1K) Josh Kelley (Winfang) Twitter: dave_rel1k

MALAYSIAN PUBLIC SECTOR OPEN SOURCE SOFTWARE (OSS) PROGRAMME. COMPARISON REPORT ON NETWORK MONITORING SYSTEMS (Nagios and Zabbix)

高 通 量 科 学 计 算 集 群 及 Lustre 文 件 系 统. High Throughput Scientific Computing Clusters And Lustre Filesystem In Tsinghua University

Wait, How Many Metrics? Monitoring at Quantcast

USE OF PYTHON AS A SATELLITE OPERATIONS AND TESTING AUTOMATION LANGUAGE

Introduction to Highly Available NFS Server on scale out storage systems based on GlusterFS

A Performance Analysis of Distributed Indexing using Terrier

Request Tracker/RTx::AssetTracker at DigitalGlobe

Forensic Clusters: Advanced Processing with Open Source Software. Jon Stewart Geoff Black

Using Redis as a Cache Backend in Magento

BIRT Application and BIRT Report Deployment Functional Specification

I don t intend to cover Python installation, please visit the Python web site for details.

Continuous Integration

OpenITSM - IT Service Management with Open Source Software

Chatbots 3.3. Chatbots in Web Applications with RiveScript. Presented by Noah Petherbridge

Mercy Baggot Street Canopy Intranet

How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations

An Introduction to High Performance Computing in the Department

Katta & Hadoop. Katta - Distributed Lucene Index in Production. Stefan Groschupf Scale Unlimited, 101tec. sg{at}101tec.com

Efficient Management of System Logs using a Cloud Radoslav Bodó, Daniel Kouřil CESNET. ISGC 2013, March 2013

ZABBIX. An Enterprise-Class Open Source Distributed Monitoring Solution. Takanori Suzuki MIRACLE LINUX CORPORATION October 22, 2009

the missing log collector Treasure Data, Inc. Muga Nishizawa

Setting Up a CLucene and PostgreSQL Federation

Basic TCP/IP networking knowledge of client/server concepts Basic Linux commands and desktop navigation (if don't know we will cover it )

OpenITSM - IT Service Management with Open Source Software


LDAPCON Sébastien Bahloul

Nagios. cooler than it looks. Wednesday, 31 October 2007

Analysis of Network Beaconing Activity for Incident Response

Sample. WebCenter Sites. Go-Live Checklist

Lustre Operations Manual Version man-v33 (08/01/2006)

/ Administrator Training PAT-2012

A Filesystem Layer Data Replication Method for Cloud Computing

PZVM1 Administration Guide. V1.1 February 2014 Alain Ganuchaud. Page 1/27

Lustre Networking BY PETER J. BRAAM

Finding Order(s) in the Chaos

Security Information Management

Monitoring systems: Concepts and tools

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo Database And Data Mining Research Group

Parallel Analysis and Visualization on Cray Compute Node Linux

VoIP Fraud and Misuse

Web Intrusion Detection with ModSecurity. Ivan Ristic

GT 6.0 GRAM5 Key Concepts

VCE Vision Intelligent Operations Version 2.5 Technical Overview

Current Status of FEFS for the K computer

Running Native Lustre* Client inside Intel Xeon Phi coprocessor

Transcription:

LMT Lustre Monitoring Tools April 13, 2011 Christopher Morrone, P. O. Box 808, Livermore, CA 94551 This work performed under the auspices of the U.S. Department of Energy by under Contract DE-AC52-07NA27344

LMT Mission Collect and display real-time and historical information about Lustre filesystem activity. 2

LMT History LMT Version 1 In-house python prototype LMT Version 2 Rewrite of LMT in C, Java, Perl, Sh Cerebro for data collection Incorporated MySQL for logging data, proved very useful [Uselton 2009 CUG] ltop text utility lwatch GUI in Java LMT Version 3 Major improvments by Jim Garlick 3

Cerebro overview Cluster monitoring daemon, tools and libraries Inspired by ganglia Uses multicast Dynamic module interface (plugins) for adding metrics /usr/lib/cerebro Current metrics: cerebro_metric_lmt_mdt cerebro_metric_lmt_ost cerebro_metric_lmt_router cerebro_metric_lmt_osc 4

LMT 2 architecture lmt_agg.cron MDS MDS MySQL MySQL Database Database lwatch lwatch (GUI) (GUI) ltop ltop (curses) (curses) cerebro cerebro multicast multicast OSS OSS lnet lnet Router Router 5

Problems in LMT version 2 Lustre config must be expressed in an odd language, then pre-loaded into MySQL. Nothing functions until both MySQL and cerebro are up. Poor error handling and logging make debug difficult. There are two overlapping config files in odd locations. The cerebro module code is prototype quality and brittle. 6

Improved in Version 3 Lustre config is automatically determined on the fly. ltop now functions as soon as cerebro is up. MySQL is actually optional now. Error handling and logging are rewritten/improved. Cerebro module code has been refactored/rewritten. There is a single new config file: /etc/lmt/lmt.conf More data is collected/shown in ltop 7

LUA-based lmt.conf lmt_cbr_debug = 0 lmt_proto_debug = 0 lmt_db_debug = 0 lmt_db_host = nil lmt_db_port = 0 lmt_db_rouser = "lwatchclient" lmt_db_ropasswd = nil lmt_db_rwuser = "lwatchadmin" f = io.open("/etc/lmt/rwpasswd") if (f) then lmt_db_rwpasswd = f:read("*all") f:close() else lmt_db_rwpasswd = nil end 8

Unchanged in Version 3 The architecture is the same (except ltop). The database schema is unchanged. The lwatch/lstat java clients are unchanged (moved to separate lmt-gui package). Cron aggregation scripts that convert high low-res MySQL sample data still exist (kludge!). 9

LMT 3 architecture lmt_agg.cron MDS MDS MySQL MySQL Database Database lwatch lwatch (GUI) (GUI) cerebro cerebro multicast multicast ltop ltop (curses) (curses) OSS OSS lnet lnet Router Router 10

Simple setup for ltop MDS MDS cerebro lmt-server ltop ltop OSS OSS cerebro lmt-server-agent 11

ltop screenshot 12

ltop ost compressed view 13

ltop recovery example 14

ltop screenshot 15

Get LMT LMT Code https://github.com/chaos/lmt https://github.com/chaos/lmt-gui LMT Wiki https://github.com/chaos/lmt/wiki Cerebro Code https://github.com/chaos/cerebro 16