Correlating Multiple TB of Performance Data to User Jobs

Size: px

Start display at page:

Download "Correlating Multiple TB of Performance Data to User Jobs"

Hugh Hubbard
7 years ago
Views:

1 Michael Kluge, ZIH Correlating Multiple TB of Performance Data to User Jobs Lustre User Group 2015, Denver, Colorado Zellescher Weg 12 Willers-Bau A 208 Tel Michael Kluge (michael.kluge@tu-dresden.de)

2 About us ZIH: about us HPC and service provider Research Institute Big Data Competence Center Main research areas: Performance Analysis Tools Energy Efficiency Computer Architecture Standardization Efforts (OpenACC, OpenMP) Michael Kluge, ZIH 2

ZIH HPC and BigData Infrastructure Erlangen, Potsdam TU Dresden ZIH- Backbone 100 Gbit/s Cluster Uplink Home Archiv HTW, Chemnitz, Leipzig, Freiberg Parallel Machine BULL HPC Haswell processors 2 x

3 ZIH HPC and BigData Infrastructure Erlangen, Potsdam TU Dresden ZIH- Backbone 100 Gbit/s Cluster Uplink Home Archiv HTW, Chemnitz, Leipzig, Freiberg Parallel Machine BULL HPC Haswell processors 2 x 612 Nodes ~ Cores High Throughput BULL Islands Westmere, Sandy Bridge, und Haswell processors, GPU-Nodes 700 Nodes ~ Cores other clusters Megware Cluster IBM idataplex Shared Memory SGI Ultraviolet cores SandyBridge 8 TB RAM NUMA Storage Michael Kluge, ZIH 3

HRSK-II, Phase 2, currently installed GPU Isle 64 nodes Haswell 2 x 12 cores 64 GB RAM + 2 x Nvidia K 80 Troughput Isle 232 nodes Haswell 2 x 12 cores 64/128/256 GB RAM 128 GB SSD HPC Isle 612 nodes

4 HRSK-II, Phase 2, currently installed GPU Isle 64 nodes Haswell 2 x 12 cores 64 GB RAM + 2 x Nvidia K 80 Troughput Isle 232 nodes Haswell 2 x 12 cores 64/128/256 GB RAM 128 GB SSD HPC Isle 612 nodes Haswell 2 x 12 cores 64 GB RAM 128 GB SSD HPC Isle 612 nodes Haswell 2 x 12 cores 64 GB RAM 128 GB SSD 64 GB RAM 24 Lustre Server 2 x Export 4 x Login 2 x SMP nodes 2 TB RAM, 48 cores FDR InfiniBand 5 PB NL- SATA 80 GB/s IOPS 40 TB SSD 10 GB/s 1 Mio. IOPS to Campus 4 x 10GE Uplink, Home, Backup ~ 1,2 PF peak in CPUs, ~ cores total ~ 300 TF peak in GPUs, 128 cards Michael Kluge, ZIH 4

Architecture of our storage concept (FASS) SCRATCH Switch 2 Switch 1 ZIH-Infrastructure Flexible Storage System (FASS) Login Batch-System Export HPC-Component User A User Z User A User Z

5 Architecture of our storage concept (FASS) SCRATCH Switch 2 Switch 1 ZIH-Infrastructure Flexible Storage System (FASS) Login Batch-System Export HPC-Component User A User Z User A User Z Server/File systems Server 1 Transaction Checkpoint. Server 2 Scratch Server N Net Storage SSD SAS SATA Access statistics Analysis Throughput-Component Optimization and control Michael Kluge, ZIH 5

6 Performance Analysis for I/O Activities (1) What is of interest Application requests, Interface types Access sizes/patterns Network/Server/Storage utilization Level of detail: Record everything Record data every 5 to 10 seconds Challenge: How to analyze this? How to deal with anonymous data? Michael Kluge, ZIH 6

7 I/O and Job Correlation: Architecture/Software HPC File System SLURM Batch System node node node node I/O network application events node local events utilization errors MDS OSS OSS OSS SAN backend, cache RAID RAID RAID throughput, IOPS Global I/O Data Correlation System Batch System Log Files Cluster Formation Job Clusters Agent Interface Agent Interface Agent Interface Agent Interface Job Clusters with high I/O demands Job Clusters with low I/O demands Database Storage Data Processing Database Storage Database Storage Data Request s Dataheap Data Collection System Michael Kluge, ZIH 7

8 Visualization of Live Data for Phase 1 Michael Kluge, ZIH 8

9 Visualization of Live Data for Phase 2 Michael Kluge, ZIH 9

10 Performance Analysis for I/O Activities (2) Amount of collected data: 240 OSTs, 20 OSS servers, Looking at stats and brw_stats (how to store a histogram in database?) ~ tables about 1 GB of data per hour Analysis is supposed to cover 6 month 4 TB data No way to analyze this with any serial approach Michael Kluge, ZIH 10

11 Analysis Process map operations to a set of tables modify all tables (align data, scale, aggregate over time) create a new table from an input set of tables (aggregation) plot (time series, histograms) currently: single thread Haskell program Swift/T work in progress 4 TB means 50 seconds on an 80 GB/s file system for the initial read Swift/T has nice ways to describe data dependencies and to parallelize this kind of workflows (open for suggestions!) Michael Kluge, ZIH 11

12 Approach to look at the Raw Data analysis of the raw performance data take original data and create derived metrics look at plots and histograms correlate metrics with user jobs take job log and split jobs into classes check for correlations between job classes and performance metrics Michael Kluge, ZIH 12

13 Metric Example: NetApp vs. OST Bandwidth (20s) Michael Kluge, ZIH 13

14 Metric Example: NetApp vs. OST Bandwidth (60s) Michael Kluge, ZIH 14

15 Metric Example: NetApp vs. OST Bandwidth (10 minutes) Michael Kluge, ZIH 15

16 Metric Example: 3 Month bandwidth vs. IOPS Michael Kluge, ZIH 16

17 I/O and Job Correlation: Starting Point (Bandwidth) Michael Kluge, ZIH 17

Building Job Classes a class is defined by: user id, core count similar runtime similar start time (time isolation) sort all jobs by runtime, split list at

18 Building Job Classes a class is defined by: user id, core count similar runtime similar start time (time isolation) sort all jobs by runtime, split list at gaps split these groups again if they are not overlapping (long gap between start) challenges: runtime fluctuation, large run time variations Michael Kluge, ZIH 18

19 I/O and Job Correlation: Preliminary Result (1) Michael Kluge, ZIH 19

20 I/O and Job Correlation: Preliminary Result (1) Michael Kluge, ZIH 20

21 I/O and Job Correlation: Correlation Factors Michael Kluge, ZIH 21

22 Open Topics How to store a histogram in database? Other approaches than Swift/T Deal with the zeros Cache intermediate results Michael Kluge, ZIH 22

23 Conclusion / Questions it is possible to map anonymous performance data back to user jobs advance towards storage controller analysis (caches, backend bandwidth, ) Michael Kluge, ZIH 23

Performance Tools for System Monitoring

Center for Information Services and High Performance Computing (ZIH) 01069 Dresden Performance Tools for System Monitoring 1st CHANGES Workshop, Jülich Zellescher Weg 12 Tel. +49 351-463 35450 September