ERLANGEN REGIONAL COMPUTING CENTER
FEPA: Project status and further steps
J. Eitzinger, T. Röhl, W. Hesse, A. Jeutter, E. Focht
15.12.2015
Motivation
Cluster administrators employ monitoring to:
- Detect errors or faulty operation
- Observe total system utilization
Application developers use (mostly GUI) tools for performance profiling.
FEPA: Ein flexibles Framework zur Energie- und Performanceanalyse hochparalleler Applikationen im Rechenzentrum (a flexible framework for energy and performance analysis of highly parallel applications in the computing center)
Primary target: Provide a monitoring infrastructure that allows continuous, system-wide application performance and energy profiling based on hardware performance counter measurements.
Objectives
- Detect applications with pathological performance behavior
- Identify applications with large optimization potential
- Give users feedback about application performance
- Ease access to hardware performance counter data
STATUS
RRZE (Thomas Röhl)
- Support for new architectures: Intel Silvermont, Intel Broadwell and Broadwell-EP, Intel Skylake
- Improved overflow detection (including RAPL)
- Improved documentation with many new examples (Cilk+, C++11 threads)
- More performance groups and validated metrics for many architectures
- Improvements in likwid-bench and likwid-mpirun
- New access layer to support platform-independent code (x86, Power, ARM)
NEC (Andreas Jeutter): AggMon
- Componentized, fully distributed architecture: collector groups with taggers feed per-job aggregators and stores
- Separate processes per job, so aggregation runs truly in parallel
- Aggregator implemented in Python
- Components of a job are connected through ZeroMQ
- A controller instantiates the aggregation at job start (triggered by the resource scheduler) and kills it when the job stops (a minimal life-cycle sketch follows below)
- Stores feed NoSQL DB instances with sharding + replication
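A minimal, hypothetical life-cycle sketch of this scheme in Python: a controller spawns one aggregator process when the resource scheduler signals a job start and terminates it at job stop. The script name aggmon_aggregator.py, the port, and the command-line options are assumptions for illustration, not the AggMon implementation.

    import subprocess

    class Controller:
        """Tracks one aggregator process per running job."""

        def __init__(self):
            self.aggregators = {}  # job id -> aggregator process

        def on_job_start(self, job_id, pull_port):
            # Spawn a dedicated aggregator; the job's collectors connect
            # to its ZeroMQ PULL endpoint (script name is hypothetical).
            proc = subprocess.Popen(
                ["python", "aggmon_aggregator.py",
                 "--job", str(job_id), "--pull", "tcp://*:%d" % pull_port])
            self.aggregators[job_id] = proc

        def on_job_stop(self, job_id):
            # Kill the aggregator once the job has finished.
            proc = self.aggregators.pop(job_id, None)
            if proc is not None:
                proc.terminate()
                proc.wait()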
AggMon: Collector
- A modified gmond pushes messages to the collector over ZMQ PUSH (order of 50k msg/s); the collector receives them on a ZMQ PULL queue
- Messages are JSON-serialized dicts/maps
- Tagger: adds a key-value pair to a message, based on a match condition
- Control operations via RPC: add tag, remove tag, subscribe, unsubscribe
- Subscribe: based on a match condition (key-value, key-value regex); matching messages are forwarded over ZMQ PUSH (order of 10k msg/s); see the sketch below
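A minimal sketch (not the AggMon source) of this pipeline using pyzmq: JSON dicts arrive on a PULL socket, a tagger adds a key/value when a match condition holds, and matching messages are pushed on to subscribers. The ports, the match rule, and the tag are illustrative assumptions.

    import json
    import re
    import zmq

    MATCH_KEY, MATCH_RE = "host", re.compile(r"^rack1-")   # match condition
    TAG_KEY, TAG_VALUE = "group", "rack1"                   # tag to add

    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)   # modified gmond pushes messages here
    pull.bind("tcp://*:5555")
    push = ctx.socket(zmq.PUSH)   # subscribers receive tagged messages here
    push.bind("tcp://*:5556")

    while True:
        msg = json.loads(pull.recv())                # JSON-serialized dict
        if MATCH_RE.match(msg.get(MATCH_KEY, "")):   # tagger: match condition
            msg[TAG_KEY] = TAG_VALUE                 # add key-value tag
            push.send(json.dumps(msg).encode())      # publish to subscribers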
AggMon: Data Store
- TokuMX: MongoDB-compatible NoSQL database
- Collections can be sharded: documents are spread over different mongod instances (e.g. per rack: rack1, rack2, rack3, ...)
- Shard key, e.g. { group: rack1 }, managed via the config server (configsvr)
- Entry point: any mongos instance (one per group master, order of 10k msg/s)
- Replication (for example master-slave) is possible
- A small usage sketch follows below
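A sketch of how a client could use the sharded store, assuming pymongo against a mongos router of the TokuMX/MongoDB cluster; the database and collection names and the group shard key follow the slide but are otherwise illustrative.

    from pymongo import MongoClient

    # Any mongos instance works as entry point.
    client = MongoClient("mongodb://mongos-host:27017")

    # Enable sharding and shard the collection on the "group" field so
    # that documents of one group (e.g. a rack) stay on the same shard.
    client.admin.command("enableSharding", "fepa")
    client.admin.command("shardCollection", "fepa.metrics", key={"group": 1})

    # Insert a measurement; mongos routes it to the shard owning "rack1".
    client.fepa.metrics.insert_one({"group": "rack1", "host": "rack1-node07",
                                    "metric": "membw", "value": 38.5})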
LRZ (Wolfram Hesse, Carla Guillen)
- Successful completion of C. Guillen's PhD: Knowledge-based Performance Monitoring for Large Scale HPC Architectures; dissertation, C. Guillen Carias, 2015; http://mediatum.ub.tum.de?id=1237547
- Validation of the performance patterns in use
- Statistical evaluation of the performance patterns
- Documentation of the PerSyst monitoring system
LRZ: PerSyst Status
- PerSyst monitoring is in production on SuperMUC Phase I + II
- Definition and implementation of the performance patterns for Phase 1 (Westmere-EX, SandyBridge-EP) and Phase 2 (Haswell-EP)
- Used and verified by the LRZ application support group and IBM staff
- Users are notified when obvious bottlenecks are present, together with optimization suggestions
- Screening of applications for extreme scaling and benchmarks
- Positive feedback from SuperMUC users regarding usefulness
- Deployment of the PerSyst web frontend at RRZE
ONGOING WORK
- Integrate the complete stack at RRZE
- Validate performance patterns from profiling data
Current Questions
- How to deal with established monitoring infrastructure (Ganglia)?
  - Easy path: use the existing monitoring infrastructure
  - Target: replace the existing software with the FEPA stack
- Concerns about the large overhead of continuous HPM profiling
  - Overhead could be lower with a better interface to HPM (ISA, OS)
  - Knowledge about overheads in general is still missing
- Picking the right building blocks (a collector sketch follows below):
  - Backend daemon: diamond (https://github.com/python-diamond/diamond)
  - Communication protocol: ZeroMQ (http://zeromq.org)
  - Storage: TokuMX (NoSQL)
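As an illustration of the diamond choice, a minimal custom collector sketch: diamond collectors subclass diamond.collector.Collector and publish values from collect(). The metric source (a counter value read from a file) is a hypothetical placeholder for hardware performance counter data.

    import diamond.collector

    class FepaFlopsCollector(diamond.collector.Collector):
        """Hypothetical collector feeding one HPM-derived metric into diamond."""

        def collect(self):
            # Placeholder data source; the real stack would read hardware
            # performance counter measurements here.
            try:
                with open("/tmp/fepa_flops") as f:
                    value = float(f.read())
            except (IOError, ValueError):
                return
            # publish(name, value) hands the sample to diamond's handlers.
            self.publish("flops", value)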
Integration of FEPA components
- Target system: 80-node Nehalem cluster at RRZE in normal production use
- Objectives:
  - Sort out issues between components
  - Validate and benchmark the solution: diamond, mongodb/tokumx, Liferay-framework-based PerSyst frontend
- Experiment on application profiling data:
  - Required granularity for phase detection
  - Performance pattern validation on a set of known codes
Conclusion and Outlook
- Layers are ready to be integrated into the complete stack
- Convergence on the choice of external building blocks
- LRZ PerSyst system is in production use
Next steps:
- Continue integrating the stack to make FEPA ready for deployment at associated HPC centers
- Validate FEPA on a set of known benchmarks (Mantevo, NPB, SPEC)
ERLANGEN REGIONAL COMPUTING CENTER
Regionales Rechenzentrum Erlangen · NEC Deutschland GmbH · Leibniz-Rechenzentrum
Thank You.