Parallel file I/O bottlenecks and solutions
1 Parallel file I/O bottlenecks and solutions
- Views to Parallel I/O: Hardware, Software, Application
- Challenges at Large Scale
- Introduction SIONlib
- Pitfalls, Darshan, I/O-Strategies
Wolfgang Frings, Jülich Supercomputing Centre (Member of the Helmholtz Association)
13th VI-HPS Tuning Workshop, BSC, Barcelona 2014
2 Overview
- Parallel I/O from different views
  - Hardware: Example: IBM BG/Q I/O infrastructure
  - System Software: IBM GPFS, I/O-forwarding
  - Application: Parallel I/O libraries
- Pitfalls
  - Small blocks, I/O to individual files, false sharing
  - Tasks per shared file, portability
- SIONlib Overview
- I/O characterization with Darshan
- I/O strategies
3 IBM Blue Gene/Q (JUQUEEN) & I/O
- IBM Blue Gene/Q JUQUEEN
  - IBM PowerPC A2 1.6 GHz, 16 cores per node
  - 28 racks (7 rows of 4 racks)
  - 28,672 nodes (458,752 cores)
  - 5D torus network
  - 5.9 Pflop/s peak, 5.0 Pflop/s Linpack
  - Main memory: 448 TB
- I/O nodes: 248 (27x8 + 1x32)
- Network: 2x CISCO Nexus 7018 switches (connect the I/O nodes)
  - Total ports: GigEthernet
4 Blue Gene/Q: I/O-node cabling (8 ION/Rack)
[Figure: cabling of the I/O nodes between the internal torus network and the 10GigE network; source: IBM]
5 I/O-Network & File Server (JUST)
[Figure: JUQUEEN connected through CISCO Nexus switches (10GigE ports) to the file servers JUST4 and JUST4-GSS]
- 18 storage controllers (16 x DCS3700, 2 x DS3512)
- 20 GPFS NSD servers
- 8 TSM servers (p720)
6 Software View to Parallel I/O: GPFS Architecture and I/O Data Path
[Figure: GPFS software stack and data path, from the compute nodes to the storage subsystem; source: IBM]
- Compute nodes: application on top of the GPFS NSD client (I/O size, parallelism, pagepool (streams), prefetch threads)
- Network (TCP/IP or IB): network transfer size
- GPFS NSD servers: pagepool (disks), NSD workers
- SAN: adapter / disk device driver (hdisk dd, adapter dd)
- Storage subsystem: GPFS server building blocks BB1, BB2, ... BBn with their NSDs, attached through controllers A and B
7 Software View to Parallel I/O: GPFS on IBM Blue Gene/Q (I)
[Figure: I/O forwarding in the Blue Gene/Q software stack; source: IBM]
8 Software View to Parallel I/O: GPFS on IBM Blue Gene/Q (II)
[Figure: I/O forwarding: the POSIX I/O calls of the parallel application are forwarded and reissued as POSIX I/O against the parallel file system; source: IBM 2012]
9 Application View to Parallel I/O
[Figure: I/O software layers, with the labels "shared" and "local" on the library layer]
- Parallel application
- HDF5, NetCDF, MPI-I/O, SIONlib
- POSIX I/O
- Parallel file system
10 Application View: Data Formats
11 Application View: Data Distribution
[Figure: software view (parallel application, HDF5, NetCDF, MPI-I/O, POSIX I/O, parallel file system) next to the data view]
- Data view: the distributed, local view of the application is transformed into a shared, global view in the file
- Post-processing: convert utility
12 Parallel Task-local I/O at Large Scale
- Usage fields:
  - Checkpoint files, restart files
  - Result files, post-processing
  - Parallel performance tools
- Data types:
  - Simulation data (domain decomposition)
  - Trace data (parallel performance tools)
- Bottlenecks:
  - File creation
  - File management
- #files: O(10^5)
[Figure: tasks t1, t2, ... tn, each writing its own file ./checkpoint/file.0001 ... ./checkpoint/file.nnnn]
13 The Showstopper for Task-local I/O: Parallel Creation of Individual Files
- Jugene + GPFS, file create+open: > 33 minutes with one file per task versus < 10 seconds with one file per I/O-node
- All tasks create their files in the same directory, so every create adds an entry to the same directory i-node and its FS blocks
[Figure: tasks creating files f ... f as entries of one directory i-node, stored in FS blocks]
14 SIONlib: Shared Files for Task-local Data
- The application (or a serial program) sees logical task-local files for the tasks t1, t2, ... tn
- SIONlib maps them to a physical multi-file in the parallel file system: ./checkpoint/file.0001 ... ./checkpoint/file.nnnn
- #files: O(10)
[Figure: I/O software layers: parallel application; HDF5, NetCDF, MPI-I/O, SIONlib; POSIX I/O; parallel file system]
15 The Showstopper for Shared File I/O: Concurrent Access & Contention
- File system block locking: when the data of task 1 and the data of task 2 share a file system block, each write has to take the lock on that block, which serializes the writes
- SIONlib: logical partitioning of the shared file
  - Dedicated data chunks per task
  - Alignment to boundaries of file system blocks, hence no contention
[Figure: shared file layout for tasks t1, t2, ... tn: metablock 1, block 1 with chunk 1 ... chunk n (data followed by gaps up to the FS block boundary), metablock 2]
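The alignment rule on this slide can be illustrated with a little offset arithmetic. The following is a minimal sketch in plain C, not SIONlib code: the helper names `align_up` and `chunk_offset`, the parameter names, and the assumption that the chunks of one block are laid out contiguously in task order after an aligned metablock are all hypothetical and only serve to show why aligned chunks never share a file system block.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of the file system block size. */
int64_t align_up(int64_t size, int64_t fsblksize)
{
    assert(fsblksize > 0 && size >= 0);
    return ((size + fsblksize - 1) / fsblksize) * fsblksize;
}

/*
 * Start offset of the chunk owned by `task` in block number `block`.
 * Assumed layout (hypothetical): aligned metablock, then per block one
 * aligned chunk per task in task order.
 */
int64_t chunk_offset(const int64_t *chunksizes, int ntasks, int task,
                     int block, int64_t metablock_size, int64_t fsblksize)
{
    int64_t block_span = 0; /* bytes covered by one block of chunks   */
    int64_t before = 0;     /* aligned chunks of lower-numbered tasks */

    assert(chunksizes != 0 && task >= 0 && task < ntasks && block >= 0);
    for (int t = 0; t < ntasks; t++) {
        int64_t aligned = align_up(chunksizes[t], fsblksize);
        if (t < task)
            before += aligned;
        block_span += aligned;
    }
    return align_up(metablock_size, fsblksize)
           + (int64_t)block * block_span + before;
}
```

Because every offset returned by `chunk_offset` is a multiple of `fsblksize`, two tasks never write into the same file system block, at the price of the gaps shown in the figure.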
16 SIONlib: Architecture & Example
- Extension of the I/O API (ANSI C or POSIX)
- C and Fortran bindings, implementation language C
- Current version: 1.4p3
- Open source license
[Figure: architecture: the application uses the SION OpenMP, hybrid and MPI APIs, which plug via callbacks into the parallel generic API and the serial API; SIONlib builds on OpenMP, MPI and ANSI C or POSIX I/O]

    /* fopen() */
    sid = sion_paropen_mpi(filename, "bw", &numfiles, &chunksize,
                           gcom, &lcom, &fileptr, ...);
    /* fwrite(bindata, 1, nbytes, fileptr) */
    sion_fwrite(bindata, 1, nbytes, sid);
    /* fclose() */
    sion_parclose_mpi(sid);
17 SIONlib in a Nutshell: Task-local I/O

    /* Open: one file per task, named by task number */
    sprintf(tmpfn, "%s.%06d", filename, my_nr);
    fileptr = fopen(tmpfn, "wb", ...);
    ...
    /* Write */
    fwrite(bindata, 1, nbytes, fileptr);
    ...
    /* Close */
    fclose(fileptr);

- Original ANSI C version
- No collective operation, no shared files
- Data: stream of bytes
18 SIONlib in a Nutshell: Add SIONlib calls

    /* Collective Open */
    nfiles = 1; chunksize = nbytes;
    sid = sion_paropen_mpi(filename, "bw", &nfiles, &chunksize,
                           MPI_COMM_WORLD, &lcomm, &fileptr, ...);
    ...
    /* Write */
    fwrite(bindata, 1, nbytes, fileptr);
    ...
    /* Collective Close */
    sion_parclose_mpi(sid);

- Collective (SIONlib) open and close
- Ready to run...
- Parallel I/O to one shared file
19 SIONlib in a Nutshell: Variable Data Size

    /* Collective Open */
    nfiles = 1; chunksize = nbytes;
    sid = sion_paropen_mpi(filename, "bw", &nfiles, &chunksize,
                           MPI_COMM_WORLD, &lcomm, &fileptr, ...);
    ...
    /* Write */
    if (sion_ensure_free_space(sid, nbytes)) {
        fwrite(bindata, 1, nbytes, fileptr);
    }
    ...
    /* Collective Close */
    sion_parclose_mpi(sid);

- Writing more data than defined at the open call
- SIONlib moves forward to the next chunk if the data is too large for the current block
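The "move forward to the next chunk" behaviour can be pictured with a small bookkeeping model. This is a sketch under stated assumptions and not SIONlib's implementation: the struct `chunk_cursor` and the functions `cursor_ensure_free_space` and `cursor_advance` are hypothetical names, and the model assumes that a request larger than a whole chunk cannot be satisfied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task write cursor; not SIONlib's internal state. */
typedef struct {
    int64_t chunksize; /* bytes reserved per chunk at open time   */
    int64_t block;     /* index of the chunk currently written to */
    int64_t pos;       /* bytes already used in the current chunk */
} chunk_cursor;

/*
 * Make sure nbytes fit contiguously at the cursor. If the rest of the
 * current chunk is too small, skip to the start of the next chunk.
 * Returns 1 on success, 0 if nbytes exceeds a whole chunk.
 */
int cursor_ensure_free_space(chunk_cursor *c, int64_t nbytes)
{
    assert(c != NULL && c->chunksize > 0 && nbytes >= 0);
    if (nbytes > c->chunksize)
        return 0;
    if (c->pos + nbytes > c->chunksize) {
        c->block += 1; /* the unused tail of the old chunk stays a gap */
        c->pos = 0;
    }
    return 1;
}

/* Record that nbytes were written at the cursor. */
void cursor_advance(chunk_cursor *c, int64_t nbytes)
{
    assert(c != NULL && nbytes >= 0 && c->pos + nbytes <= c->chunksize);
    c->pos += nbytes;
}
```

In this model, two 600-byte writes into 1000-byte chunks end up in chunk 0 and chunk 1, leaving a 400-byte gap at the end of chunk 0.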
20 SIONlib in a Nutshell: Wrapper function

    /* Collective Open */
    nfiles = 1; chunksize = nbytes;
    sid = sion_paropen_mpi(filename, "bw", &nfiles, &chunksize,
                           MPI_COMM_WORLD, &lcomm, &fileptr, ...);
    ...
    /* Write */
    sion_fwrite(bindata, 1, nbytes, sid);
    ...
    /* Collective Close */
    sion_parclose_mpi(sid);

- sion_fwrite includes the check for free space in the current chunk
- Parameter of fwrite: fileptr is replaced by sid
21 SIONlib: Applications
- Applications
  - DUNE-ISTL (multigrid solver, Univ. Heidelberg)
  - ITM (fusion community)
  - LBM (fluid flow/mass transport, Univ. Marburg)
  - PSC (particle-in-cell code)
  - OSIRIS (fully explicit particle-in-cell code)
  - PEPC (Pretty Efficient Parallel Coulomb Solver)
  - Profasi (protein folding and aggregation simulator)
  - NEST (human brain simulation)
  - MP2C: mesoscopic hydrodynamics + MD; speedup and higher particle numbers through SIONlib integration
- Tools/Projects
  - Scalasca: performance analysis
  - Score-P: Scalable Performance Measurement Infrastructure for Parallel Codes
  - DEEP-ER: adaptation to a new platform and parallelization paradigm
[Figure: MP2C, time (s) over number of particles (Mio.), write and read with 1k and 16k tasks, with and without SION]
[Figure: Scalasca workflow: instrumented application, local event traces, parallel analysis, global analysis result]
22 Are there more Bottlenecks?
- Increasing #tasks further
- Bottleneck: the file metadata (i-node, indirect blocks, FS blocks) is managed by the first GPFS client that opened the file
- SIONlib, e.g. on the IBM BG/Q I/O infrastructure: tasks P1 ... Pn behind I/O-node 1 ... I/O-node m, in front of the parallel file system
[Figure: JUGENE, bandwidth per ION: comparison of individual files (POSIX), one file per ION (SION) and one shared file (POSIX)]
23 SIONlib: Multiple Underlying Physical Files
- Parallelization of file metadata handling using multiple physical files
- Mapping Files : Tasks
  - 1 : n
  - p : n
  - n : n
- IBM Blue Gene: one file per I/O-node (locality)
[Figure: tasks t1 ... tn/2 mapped to shared file 1 and tasks tn/2+1 ... tn mapped to shared file 2, each file with its own metablock 1 and metablock 2]
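A p : n mapping needs a rule that assigns every task to one physical file and gives it a rank inside that file. The sketch below uses a contiguous block distribution, which matches the figure (first half of the tasks in file 1, second half in file 2) but is only an assumption; SIONlib may choose the mapping differently, for example by I/O-node. The type `file_map` and the function `map_task_to_file` are hypothetical names.

```c
#include <assert.h>

/* Result of the mapping for one task (illustrative only). */
typedef struct {
    int filenum; /* index of the physical file, 0 .. nfiles-1    */
    int lrank;   /* rank of the task among the tasks of the file */
    int lsize;   /* number of tasks sharing this file            */
} file_map;

/*
 * Contiguous block distribution of ntasks tasks over nfiles files.
 * The first (ntasks % nfiles) files receive one extra task.
 */
file_map map_task_to_file(int rank, int ntasks, int nfiles)
{
    file_map m;
    int base, rem;

    assert(nfiles > 0 && nfiles <= ntasks);
    assert(rank >= 0 && rank < ntasks);
    base = ntasks / nfiles;
    rem = ntasks % nfiles;
    if (rank < rem * (base + 1)) {
        /* task falls into one of the larger files */
        m.filenum = rank / (base + 1);
        m.lrank = rank % (base + 1);
        m.lsize = base + 1;
    } else {
        /* task falls into one of the remaining files */
        int r = rank - rem * (base + 1);
        m.filenum = rem + r / base;
        m.lrank = r % base;
        m.lsize = base;
    }
    return m;
}
```

With `nfiles = 1` this degenerates to the 1 : n case (one shared file), and with `nfiles = ntasks` to the n : n case (one file per task).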
24 SIONlib: Scaling to Large # of Tasks
- Preliminary tests on JUQUEEN with up to 1.8 Mio. tasks
- Old (Just3) vs. new (Just4) GPFS file system
[Figure: JUGENE, total bandwidth (write), one file per I/O-node (ION), varying the number of tasks doing the I/O and the number of I/O-nodes]
[Figure: JUQUEEN, total bandwidth (write/read), one file per I/O-bridge (IOB)]
25 Other Pitfalls: Frequent flushing on small blocks
- Modern file systems in HPC have large file system blocks
- A flush on a file handle forces the file system to perform all pending write operations
- If the application writes small data blocks into the same file system block and flushes in between, that block has to be read and written multiple times
- Performance degrades because several write calls can no longer be combined
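One way to avoid this pitfall in an ANSI C code is to let the C library collect the small writes in a large buffer and to flush only once. The following is a minimal sketch: the function name `write_records_buffered` and its parameters are hypothetical, and the buffer size should be chosen with the file system block size of the target system in mind, which this sketch does not query.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Write nrec records of recsize bytes each through a user-space buffer
 * of bufsize bytes. There is no fflush() inside the loop; the data is
 * flushed when the buffer fills up and once more at fclose().
 * Returns the number of bytes written, or -1 on error.
 */
long write_records_buffered(const char *path, const void *rec,
                            size_t recsize, size_t nrec, size_t bufsize)
{
    FILE *fp;
    char *buf;
    long total = 0;
    int failed = 0;

    assert(path != NULL && rec != NULL && recsize > 0 && bufsize > 0);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    buf = malloc(bufsize);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }
    /* setvbuf must be called before any other operation on the stream */
    if (setvbuf(fp, buf, _IOFBF, bufsize) != 0)
        failed = 1;
    for (size_t i = 0; !failed && i < nrec; i++) {
        if (fwrite(rec, 1, recsize, fp) != recsize)
            failed = 1;
        else
            total += (long)recsize;
    }
    if (fclose(fp) != 0) /* fclose flushes the remaining data */
        failed = 1;
    free(buf);           /* the buffer must outlive the stream */
    return failed ? -1 : total;
}
```

Calling `fflush(fp)` after every `fwrite` in the loop would instead hand each small record to the file system separately, which is the pattern the slide warns against.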
26 Other Pitfalls: Portability
- Endianness (byte order) of binary data
- Conversion of files might be necessary and expensive
- Solution: choosing a portable data format (HDF5, NetCDF)
[Figure: example of a 32-bit value and its byte layout by address in little-endian and big-endian order]
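The byte-order problem can be made concrete with a few lines of C. This is a minimal sketch with hypothetical helper names (`host_is_little_endian`, `swap32`, `store_be32`, `load_be32`); the value 0x0A0B0C0D used below is an illustration and not the example from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1); /* look at the byte at the lowest address */
    return first == 1;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Store v in big-endian order, independent of the host byte order. */
void store_be32(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Load a 32-bit value that was stored in big-endian order. */
uint32_t load_be32(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Writing through `store_be32` fixes the byte order in the file, so a reader on any architecture can use `load_be32` without a separate conversion pass. Portable formats such as HDF5 and NetCDF take care of this for the application.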
27 Darshan I/O Characterization
- Darshan: scalable HPC I/O characterization tool (ANL), version 2.2.8
- Profiling of I/O calls (POSIX, MPI-I/O, ...) during runtime
- Instrumentation
  - dynamically linked binaries: LD_PRELOAD=<path>/libdarshan.so
  - static binaries: wrapper for the compiler calls
- Log files: <uid><binname><jobid><ts>.darshan.gz
  - Path: set by the environment variable DARSHANLOGDIR, e.g. mpirun -x DARSHANLOGDIR=$HOME/darshanlog
- Reports: PDF file or text files
  - Extract information: darshan-parser <logfile> > ~/job-characterization.txt
  - Generate a PDF report from the logfile: darshan-job-summary.pl <logfile>
28 Darshan on MareNostrum-III
- Installation directory:
    DARSHANDIR=/gpfs/projects/nct00/nct00001/UNITE/packages/darshan/2.2.8-intel-openmpi
- Program start:
    mpirun -x DARSHANLOGDIR=${HOME}/darshanlog \
           -x LD_PRELOAD=${DARSHANDIR}/lib/libdarshan.so
- Parser:
    $DARSHANDIR/bin/darshan-parser <logfile>
  Output format: see documentation
- Generate PDF report on the local system (needs pdflatex):
    $LOCALDARSHANDIR/bin/darshan-job-summary.pl <logfile>
29 How to choose an I/O strategy?
- Performance considerations
  - Amount of data
  - Frequency of reading/writing
  - Scalability
- Portability
  - Different HPC architectures
  - Data exchange with others
  - Long-term storage
- E.g. use two formats and converters:
  - Internal: write/read data as-is (restart/checkpoint files)
  - External: write/read data in a non-decomposed format that is portable, system-independent and self-describing (workflows, pre- and post-processing, data exchange, ...)
30 Questions?
[Figure: application tasks t1 ... tn (or a serial program) access logical task-local files, which SIONlib maps to a physical multi-file in the parallel file system]
Thank You!
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
More informationPart I Courses Syllabus
Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment
More informationClimate-Weather Modeling Studies Using a Prototype Global Cloud-System Resolving Model
ANL/ALCF/ESP-13/1 Climate-Weather Modeling Studies Using a Prototype Global Cloud-System Resolving Model ALCF-2 Early Science Program Technical Report Argonne Leadership Computing Facility About Argonne
More informationChapter 6, The Operating System Machine Level
Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General
More informationPetascale Visualization: Approaches and Initial Results
Petascale Visualization: Approaches and Initial Results James Ahrens Li-Ta Lo, Boonthanome Nouanesengsy, John Patchett, Allen McPherson Los Alamos National Laboratory LA-UR- 08-07337 Operated by Los Alamos
More informationSAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
More informationPerformance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
More informationFD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling
Center for Information Services and High Performance Computing (ZIH) FD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling Symposium on HPC and Data-Intensive Applications in Earth
More informationHigh Performance Computing in Aachen
High Performance Computing in Aachen Christian Iwainsky iwainsky@rz.rwth-aachen.de Center for Computing and Communication RWTH Aachen University Produktivitätstools unter Linux Sep 16, RWTH Aachen University
More informationEvaluating parallel file system security
Evaluating parallel file system security 1. Motivation After successful Internet attacks on HPC centers worldwide, there has been a paradigm shift in cluster security strategies. Clusters are no longer
More informationKriterien für ein PetaFlop System
Kriterien für ein PetaFlop System Rainer Keller, HLRS :: :: :: Context: Organizational HLRS is one of the three national supercomputing centers in Germany. The national supercomputing centers are working
More information(51) Int Cl.: G06F 11/14 (2006.01)
(19) (12) EUROPEAN PATENT SPECIFICATION (11) EP 1 08 414 B1 (4) Date of publication and mention of the grant of the patent: 04.03.09 Bulletin 09/ (1) Int Cl.: G06F 11/14 (06.01) (21) Application number:
More informationAutomatic Tuning of HPC Applications for Performance and Energy Efficiency. Michael Gerndt Technische Universität München
Automatic Tuning of HPC Applications for Performance and Energy Efficiency. Michael Gerndt Technische Universität München SuperMUC: 3 Petaflops (3*10 15 =quadrillion), 3 MW 2 TOP 500 List TOTAL #1 #500
More informationOverlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer
More informationEOFS Workshop Paris Sept, 2011. Lustre at exascale. Eric Barton. CTO Whamcloud, Inc. eeb@whamcloud.com. 2011 Whamcloud, Inc.
EOFS Workshop Paris Sept, 2011 Lustre at exascale Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com Agenda Forces at work in exascale I/O Technology drivers I/O requirements Software engineering issues
More informationHPC enabling of OpenFOAM R for CFD applications
HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,
More informationUsing the Windows Cluster
Using the Windows Cluster Christian Terboven terboven@rz.rwth aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster
More informationPerformance Tools for System Monitoring
Center for Information Services and High Performance Computing (ZIH) 01069 Dresden Performance Tools for System Monitoring 1st CHANGES Workshop, Jülich Zellescher Weg 12 Tel. +49 351-463 35450 September
More informationCan High-Performance Interconnects Benefit Memcached and Hadoop?
Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
More informationHigh Productivity Computing With Windows
High Productivity Computing With Windows Windows HPC Server 2008 Justin Alderson 16-April-2009 Agenda The purpose of computing is... The purpose of computing is insight not numbers. Richard Hamming Why
More informationRunning on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF)
Running on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF) ALCF Resources: Machines & Storage Mira (Production) IBM Blue Gene/Q 49,152 nodes / 786,432 cores 768 TB of memory Peak flop rate:
More informationHands-on exercise (FX10 pi): NPB-MZ-MPI / BT
Hands-on exercise (FX10 pi): NPB-MZ-MPI / BT VI-HPS Team 14th VI-HPS Tuning Workshop, 25-27 March 2014, RIKEN AICS, Kobe, Japan 1 Tutorial exercise objectives Familiarise with usage of VI-HPS tools complementary
More informationPerformance Analysis and Optimization Tool
Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL andres.charif@uvsq.fr Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop
More informationOpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware
OpenMP & MPI CISC 879 Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware 1 Lecture Overview Introduction OpenMP MPI Model Language extension: directives-based
More informationCloud Computing Where ISR Data Will Go for Exploitation
Cloud Computing Where ISR Data Will Go for Exploitation 22 September 2009 Albert Reuther, Jeremy Kepner, Peter Michaleas, William Smith This work is sponsored by the Department of the Air Force under Air
More informationPart V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts
Part V Applications Cloud Computing: General concepts Copyright K.Goseva 2010 CS 736 Software Performance Engineering Slide 1 What is cloud computing? SaaS: Software as a Service Cloud: Datacenters hardware
More information