FEPA Project status and further steps




ERLANGEN REGIONAL COMPUTING CENTER
FEPA Project status and further steps
J. Eitzinger, T. Röhl, W. Hesse, A. Jeutter, E. Focht
15.12.2015

Motivation
FEPA: a flexible framework for energy and performance analysis of highly parallel applications in the computing center
- Cluster administrators employ monitoring to
  - detect errors or faulty operation
  - observe total system utilization
- Application developers use (mostly GUI) tools to do performance profiling
Primary target
- Provide a monitoring infrastructure that allows continuous, system-wide application performance and energy profiling based on hardware performance counter measurements

Objectives
- Detect applications with pathological performance behavior
- Help to identify applications with large optimization potential
- Give users feedback about application performance
- Ease access to hardware performance counter data

STATUS

RRZE (Thomas Röhl)
- Support for new architectures: Intel Silvermont, Intel Broadwell and Broadwell-EP, Intel Skylake
- Improved overflow detection (including RAPL)
- Improved documentation with many new examples (Cilk+, C++11 threads)
- More performance groups and validated metrics for many architectures
- Improvements in likwid-bench and likwid-mpirun
- New access layer to support platform-independent code (x86, Power, ARM)

NEC (Andreas Jeutter)
AggMon
- Componentized
- Fully distributed
- Separate processes: truly parallel
- Implemented in Python
- Connected through ZeroMQ
Components (labels recovered from the slide diagram)
- Collector: one per group
- Tagger: programmed per job
- Aggregator: one per job; instantiated at job start (triggers aggregation), killed when the job stops
- Controller: instantiates the components; receives job start/stop from the resource scheduler
- Store: NoSQL DB instances with sharding + replication
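The per-job aggregator lifecycle on this slide can be illustrated with a minimal sketch. It is not AggMon code: the class names, the `job` and `metric` message fields, and the choice of a per-metric mean computed at job stop are all assumptions made for illustration. AggMon runs these components as separate processes connected through ZeroMQ, whereas the sketch uses plain in-process objects and a dictionary in place of the NoSQL store.

```python
class Aggregator:
    """Hypothetical per-job aggregator: running sum and count per metric."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.sums = {}
        self.counts = {}

    def consume(self, msg):
        name = msg["metric"]
        self.sums[name] = self.sums.get(name, 0.0) + msg["value"]
        self.counts[name] = self.counts.get(name, 0) + 1

    def result(self):
        # Mean value per metric over the lifetime of the job.
        return {name: self.sums[name] / self.counts[name] for name in self.sums}


class Controller:
    """Hypothetical controller reacting to job start/stop events."""

    def __init__(self):
        self.aggregators = {}  # job id -> live Aggregator
        self.store = {}        # stand-in for the NoSQL store

    def job_start(self, job_id):
        # Instantiate one aggregator when the scheduler reports a job start.
        self.aggregators[job_id] = Aggregator(job_id)

    def deliver(self, msg):
        # Forward a tagged message to the aggregator of its job, if any.
        agg = self.aggregators.get(msg.get("job"))
        if agg is not None:
            agg.consume(msg)

    def job_stop(self, job_id):
        # Remove the aggregator at job stop and persist its result.
        agg = self.aggregators.pop(job_id)
        self.store[job_id] = agg.result()


controller = Controller()
controller.job_start("42")
controller.deliver({"job": "42", "metric": "flops", "value": 2.0})
controller.deliver({"job": "42", "metric": "flops", "value": 4.0})
controller.deliver({"job": "7", "metric": "flops", "value": 9.0})  # no such job: dropped
controller.job_stop("42")
print(controller.store)
```

Keeping one aggregator per job means that its state can be discarded as soon as the job ends, which matches the "kill when job stops" note on the slide.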

AggMon: Collector
Data flow (from the slide diagram)
- modified gmond -> ZMQ PUSH, O(50k) msg/s -> ZMQ PULL -> queue -> tagger -> match & publish -> ZMQ PUSH, O(10k) msg/s
RPC interface of the collector
- Add tag
- Remove tag
- Subscribe
- Unsubscribe
Notes
- Messages: JSON-serialized dicts/maps
- Tagger: adds a key-value pair to a message, based on a match condition
- Subscribe: based on a match condition (key-value, key-value regex)
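The tagging and subscription rules described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AggMon implementation: the `Tagger` class, its method names, and the `host`, `metric`, and `job` fields are hypothetical, and callbacks stand in for the ZeroMQ PUSH sockets so that the example runs with the standard library alone.

```python
import json
import re


class Tagger:
    """Hypothetical tagger: applies tag rules, then publishes to subscribers."""

    def __init__(self):
        self.tag_rules = {}    # rule name -> (match condition, key, value)
        self.subscribers = {}  # subscriber name -> (match condition, callback)

    @staticmethod
    def matches(msg, cond):
        """cond maps a key to a required value or to a compiled regex."""
        for key, expected in cond.items():
            if key not in msg:
                return False
            actual = str(msg[key])
            if isinstance(expected, re.Pattern):
                if not expected.search(actual):
                    return False
            elif actual != str(expected):
                return False
        return True

    # The four operations mirror the RPC calls named on the slide.
    def add_tag(self, name, cond, key, value):
        self.tag_rules[name] = (cond, key, value)

    def remove_tag(self, name):
        self.tag_rules.pop(name, None)

    def subscribe(self, name, cond, callback):
        self.subscribers[name] = (cond, callback)

    def unsubscribe(self, name):
        self.subscribers.pop(name, None)

    def handle(self, raw):
        """Decode one JSON message, tag it, and publish it to matching subscribers."""
        msg = json.loads(raw)
        for cond, key, value in self.tag_rules.values():
            if self.matches(msg, cond):
                msg[key] = value
        for cond, callback in self.subscribers.values():
            if self.matches(msg, cond):
                callback(msg)
        return msg


tagger = Tagger()
# Tag every message from node01 or node02 as belonging to job 42.
tagger.add_tag("job42", {"host": re.compile(r"^node0[12]$")}, "job", "42")
received = []
tagger.subscribe("agg42", {"job": "42"}, received.append)
tagger.handle(json.dumps({"host": "node01", "metric": "cpu_user", "value": 0.93}))
tagger.handle(json.dumps({"host": "node07", "metric": "cpu_user", "value": 0.10}))
print(received)
```

Because subscriptions are evaluated after tagging, a subscriber can select messages by a tag that the original sender never set, which is how a per-job consumer could receive only the messages of its own job.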

AggMon: Data Store
- TokuMX: MongoDB compatible
- Collections can be sharded
  - Documents are spread across different mongod instances
  - Entry point: any mongos instance
- Replication (for example master-slave) is possible
Layout (from the slide diagram)
- Each group master sends O(10k) msg/s to a mongos instance
- Shard key: { group:rack1, }
- configsvr holds the cluster configuration
- mongod instances: rack1, rack2, rack3, ...
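As a toy illustration of how a shard key such as `group` decides where a document is stored, the sketch below routes documents to per-rack buckets. It only mimics the idea: a real mongos routes by chunk ranges of the shard key using metadata from the config servers, and the shard names, host names, and metric values here are made up for the example.

```python
from collections import defaultdict

# Hypothetical mapping of shard-key values to mongod instances. In a real
# cluster this metadata lives on the config servers, not in a static dict.
SHARD_MAP = {
    "rack1": "mongod-rack1",
    "rack2": "mongod-rack2",
    "rack3": "mongod-rack3",
}


def route(doc, shard_map=SHARD_MAP, shard_key="group"):
    """Return the name of the shard that should store doc."""
    try:
        return shard_map[doc[shard_key]]
    except KeyError:
        raise ValueError(f"no shard for document {doc!r}") from None


def insert_many(docs):
    """Group documents by target shard, as a router would before forwarding."""
    shards = defaultdict(list)
    for doc in docs:
        shards[route(doc)].append(doc)
    return dict(shards)


docs = [
    {"group": "rack1", "host": "node01", "metric": "mem_bw", "value": 41.5},
    {"group": "rack2", "host": "node17", "metric": "mem_bw", "value": 12.0},
    {"group": "rack1", "host": "node02", "metric": "mem_bw", "value": 39.8},
]
placement = insert_many(docs)
print(placement)
```

With the group as shard key, all messages of one rack land on the same mongod instance, so the write load of each group master stays on one shard.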

LRZ (Wolfram Hesse, Carla Guillen)
- C. Guillen successfully completed her doctorate
  - Knowledge-based Performance Monitoring for Large Scale HPC Architectures; Dissertation C. Guillen Carias; 2015; http://mediatum.ub.tum.de?id=1237547
- Validation of the performance patterns used
- Statistical evaluation of the performance patterns
- Documentation of the PerSyst monitoring system

LRZ: PerSyst Status
- PerSyst monitoring is in production on SuperMUC Phase I + II
- Definition and implementation of the performance patterns for Phase 1 (Westmere-EX, SandyBridge-EP) and Phase 2 (Haswell-EP)
- Use and verification by:
  - LRZ application support group and IBM staff
    - Notification of users when obvious bottlenecks are present, plus suggestions for optimization
    - Screening of applications for Extreme Scaling and benchmarks
  - SuperMUC users
    - Positive feedback regarding usefulness
- Implementation of the PerSyst web frontend at RRZE

ONGOING WORK
- Integrate complete stack at RRZE
- Validate Performance Patterns from profiling data

Current Questions
- How to deal with established monitoring infrastructure (Ganglia)?
  - Easy: use existing monitoring infrastructures
  - Target: replace existing software with the FEPA stack
- Concerns about large overhead of continuous HPM profiling
  - Overhead could be lower with a better interface to HPM (ISA, OS)
  - Missing knowledge about overheads in general
- Picking the right building blocks
  - Backend daemon: diamond (https://github.com/python-diamond/diamond)
  - Communication protocol: ZeroMQ (http://zeromq.org)
  - Storage: TokuMX (NoSQL)

Integration of FEPA components
Target system: 80- Nehalem cluster system in normal production use
Objectives
- Sort out issues between components
- Validate and benchmark solution:
  - diamond
  - mongodb/tokumx
  - Liferay-framework-based PerSyst frontend
- Experiment on application profiling data
  - Required granularity for phase detection
  - Performance Pattern validation on a set of known codes

Conclusion and Outlook
- Layers are ready to be integrated into complete stack
- Convergence for finding external building blocks
- LRZ PerSyst System in production use
Next:
- Continue integrating stack to make FEPA ready to be distributed at associated HPC centers
- Validate FEPA on a set of known benchmarks (Mantevo, NPB, SPEC)

Thank you.
Regionales Rechenzentrum Erlangen
NEC Deutschland GmbH
Leibniz-Rechenzentrum