Introduction: Physics at CSC
Tomasz Malkiewicz, Jan Åström
CSC Autumn School in Computational Physics 2013 -- programme
Time: Monday November 25 | Tuesday November 26
9.00-9.30 / 9.30-10.15: Course intro, physics@csc (T. Malkiewicz, J. Åström) | Round robin: how CSC can help your research (T. Malkiewicz)
10.15-10.45: Coffee break | Coffee break
10.45-11.30 / 11.30-12.00: Massively parallel computations (K. Rummukainen) | Computational physics with Xeon Phi and GPU (F. Robertsén)
12.00-13.00: Lunch | Lunch
13.00-14.30: Advanced unix for physicists (J. Lento); Debugging and code optimization (S. Ilvonen / J. Enkovaara) | Advanced unix for physicists (J. Lento); Introduction to glaciology and numerical modelling of glacier dynamics, example: Vestfonna ice-cap, Svalbard (M. Schäfer)
14.30-14.45: Coffee break | Coffee break
14.45-15.15: On the diversity of particle-based methods (J. Åström) | Continuum models and assumptions (T. Zwinger)
15.15-16.30: Archive and IO + demo on FGI and Cloud (K. Mattila, R. Laurikainen) | Scientific visualization, focus on geophysics (J. Hokkanen)
+ supercomputer guided tour on Tuesday at 12:40
Aims
- Lecture-style rather than conference-oriented presentations
- Slides/abstracts available in advance
- Try to make potentially difficult topics look relatively easy to learn and understand
- Skip items that have less significance in the everyday work of physicists
- Hands-on sessions, included in most lecture sessions, allow you to practice the subjects just learned
Physics at CSC
- Physics on supercomputers
- Resources available for physicists
- What's new
- Future
- Why and when to use supercomputers
- Courses of interest for physicists
- Physics people at CSC
- Q/A
Physics on supercomputers
- "Physics is a branch of science concerned with the nature, structure and properties of matter, ranging from the smallest scale of atoms and sub-atomic particles to the Universe as a whole. Physics includes experiment and theory and involves both fundamental research driven by curiosity, as well as applied research linked to technology." (EPS report, 2013)
- A supercomputer is a computer at the frontline of contemporary processing capacity, particularly speed of calculation.
- Fastest supercomputer: China's Tianhe-2, with 33.86 petaflop/s (quadrillions of calculations per second) on the LINPACK benchmark
Usage of processor time by discipline, 1H/2013 (pie chart; total 84.5 million billing units). Disciplines: Physics, Nanoscience, Chemistry, Astrophysics, Computational fluid dynamics, Biosciences, Grid usage, Materials sciences, Computational drug design and Other, with Physics the largest single share.
Application software usage (software maintained by CSC) according to processor time, 1H/2013 (pie chart; total 22.3 million core hours). Applications: GPAW, Gromacs, CP2K, Gaussian, Molpro, NAMD, ADF, VASP, Matlab, Turbomole and Other, with GPAW and Gromacs the two largest shares.
New projects by discipline, 1H/2013 (pie chart; total 195 new projects). Disciplines: Biosciences, Computer science, Language research, Physics, Grid usage, Chemistry, Structural analysis, Social sciences, Medical sciences, Computational fluid dynamics and Other.
Users of computing servers by organization, 2012 (pie chart; total 1463 users). Organizations: University of Helsinki, Aalto University, University of Jyväskylä, University of Turku, University of Oulu, University of Eastern Finland, Tampere University of Technology, CSC (PRACE), University of Tampere, CSC (Projects) and Other.
Foreign user accounts in CSC's server environment, 1H/2013 (pie chart; total 1121 users from 69 countries). Largest countries: Germany, France, the U.K., Italy, India, Poland, China, Russia, the USA, Spain and the Netherlands; the remaining 58 countries are grouped as Other.
Currently available computing resources
- Sisu (Cray): massive computational challenges; > 10 000 cores, > 23 TB memory, theoretical peak performance > 240 Tflop/s
- Taito (HP cluster, + Vuori by 1/2014): small and medium-sized tasks; theoretical peak performance 180 Tflop/s (Vuori: 40)
- Hippu (application server): interactive usage without a job scheduler; postprocessing, e.g. visualization
- FGI
- CSC cloud services
- Last site-level blackout in the early 1980s
- Power distribution: Fingrid
- CSC started ITI curve monitoring in early February 2012
Sisu now
Sisu rear view
Taito (HP) hosted in SGI Ice Cube R80
SGI Ice Cube R80
Taito
Cray Dragonfly topology
- All-to-all network between groups
- Two-dimensional all-to-all network within a group
- Optical uplinks to the inter-group network
Source: Robert Alverson, Cray, Hot Interconnects 2012 keynote
Performance of numerical libraries (bar chart: DGEMM 1000x1000 single-core performance, Gflop/s)
- Libraries compared: ATLAS 3.8, ATLAS 3.10, ACML 5.2, ACML 4.4.0, Ifort 12.1, RedHat 6.2 RPM matmul, MKL 12.1, MKL 11 and LibSci
- Platforms: Sandy Bridge 2.7 GHz (peak 2.7 GHz x 8 flop/Hz; turbo peak 3.5 GHz x 8 flop/Hz when only one core is used) and Opteron Barcelona 2.3 GHz, Louhi (peak 2.3 GHz x 4 flop/Hz)
- MKL is the best choice on Sandy Bridge, for now (on Cray, LibSci is a good alternative)
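As a quick, hedged way to see what the BLAS behind your own build delivers, the sketch below times a 1000x1000 double-precision matrix multiplication with NumPy, which hands the work to whatever BLAS it was linked against (MKL, ACML, ATLAS, LibSci, ...). It is only a rough check on one node, not the benchmark behind the chart; for a true single-core figure, pin the library to one thread first (e.g. OMP_NUM_THREADS=1, MKL_NUM_THREADS=1).

```python
# Minimal DGEMM-style timing sketch (assumes NumPy is available).
# NumPy delegates the multiply to the BLAS it was built against, so this
# gives a quick feel for that library's GEMM performance.
import time
import numpy as np

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a @ b                      # double-precision matrix-matrix multiply
elapsed = time.perf_counter() - t0

flops = 2.0 * n**3             # ~2*n^3 floating-point operations for GEMM
print(f"DGEMM {n}x{n}: {flops / elapsed / 1e9:.1f} Gflop/s")
```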
Sisu & Taito vs. Louhi & Vuori vs. FGI vs. local cluster
- Sisu & Taito (Phase 1): available; Intel Sandy Bridge, 2 x 8 cores, 2.6 GHz, Xeon E5-2670; Aries / FDR IB interconnect; 11776 / 9216 cores; 2 / 4 GB RAM per core (16 fat nodes with 256 GB/node); 244 / 180 Tflop/s; GPU nodes in Phase 2; 2.4 PB disk space
- Vuori: available (by 1/2014); 2.6 GHz AMD Opteron and Intel Xeon; QDR IB interconnect; 3648 cores; 1 / 2 / 8 GB RAM per core; 33 Tflop/s; 8 GPU nodes; 145 TB disk space
- FGI: available; Intel Xeon, 2 x 6 cores, 2.7 GHz, X5650; QDR IB interconnect; 7308 cores; 2 / 4 / 8 GB RAM per core; 95 Tflop/s; 88 GPU nodes; 1+ PB disk space
- Merope: available; 748 cores; 4 / 8 GB RAM per core; 8 Tflop/s; 6 GPU nodes; 100 TB disk space
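The peak Tflop/s numbers above follow directly from cores x clock rate x floating-point operations per cycle. A small worked sketch of that arithmetic for the Sandy Bridge machines, using the figures from these slides (16 cores per node, 2.6 GHz, 8 flops/cycle per core in double precision):

```python
# Theoretical peak of a Sisu/Taito Sandy Bridge node and of Sisu as a whole.
# Numbers taken from the slides: 16 cores/node, 2.6 GHz, 8 flops/cycle (AVX).
cores_per_node = 16
clock_hz = 2.6e9
flops_per_cycle = 8            # double precision with AVX on Sandy Bridge

node_peak = cores_per_node * clock_hz * flops_per_cycle
print(f"Per node: {node_peak / 1e9:.1f} Gflop/s")   # ~332.8 Gflop/s

sisu_cores = 11776
sisu_peak = sisu_cores * clock_hz * flops_per_cycle
print(f"Sisu:     {sisu_peak / 1e12:.0f} Tflop/s")   # ~245 Tflop/s, cf. 244 in the table
```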
What's new
Future
Phase 1 (deployment done):
- Cray: Intel Sandy Bridge, 16 cores @ 2.6 GHz; Aries interconnect; 11776 cores; 244 Tflop/s
- HP: Intel Sandy Bridge, 16 cores @ 2.6 GHz; FDR InfiniBand (56 Gbps); 9216 cores; 180 Tflop/s (5x Vuori)
- Total: 424 Tflop/s
Phase 2 (probably 2014):
- Cray: next-generation processors; Aries interconnect; ~40000 cores; 1700 Tflop/s
- HP: next-generation processors; EDR InfiniBand (100 Gbps); ~17000 cores; 515 Tflop/s (15x Vuori)
- Total: 2215 Tflop/s
CSC computing capacity 1989-2012 (chart: standardized processors, showing maximum capacity (80%) and capacity used). The timeline runs from the Cray X-MP/416 and Convex C220 (1989) through the Convex C3840, SGI R4400, IBM SP1/SP2, Cray C94, Cray T3E (192 proc, expanded to 224 and later 512 proc, decommissioned 12/2002), SGI Origin 2000, IBM SP Power3, Compaq Alpha servers (Lempo, Hiisi, Clux; decommissioned 2/2005), IBM eServer Cluster 1600, Sun Fire 25K, HP DL145 Proliant, HP CP4000 BL Proliant (Murska, decommissioned 6/2012), Cray XT4 (DC and QC), Cray XT5 and HP Proliant SL230s to the Cray XC30.
Second panel: Top500 ratings of CSC systems 1993-2012; the Top500 lists (http://www.top500.org/) were started in 1993.
IT summary
Cray XC30 supercomputer (Sisu):
- Fastest computer in Finland
- Phase 1: 385 kW, 244 Tflop/s, 16 cores x 2 GB per compute node, 4 login nodes x 256 GB
- Phase 2: ~1700 Tflop/s
- Very high density, large racks
PRACE prototype (coming late 2013 and 2014):
- Intel Xeon Phi coprocessors
- NVIDIA next-generation GPUs
IT summary (cont.)
HP cluster (Taito):
- 1152 Intel CPUs
- 16 cores x 4 GB per node
- 16 fat nodes with 16 cores x 16 GB per node
- 6 login nodes x 64 GB
- 180 Tflop/s
- 30 kW, 47 U racks
HPC storage:
- 1 + 1.4 + 1.4 PB of fast parallel storage
- Supports both the Cray and HP systems
Why and when to use HPC? (Chart: Gromacs lipid MD benchmark, 120k atoms, PME; ns/day vs. number of cores up to ~600 on Louhi, Vuori, Taito and Sisu, with an inset for 0-32 cores.)
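The way such curves flatten as the core count grows is the usual strong-scaling picture: once the parallel work per core gets small, serial and communication overheads dominate. The sketch below is a purely illustrative Amdahl-style estimate; the single-core rate and serial fraction are made-up parameters, not values fitted to the Gromacs data on the slide.

```python
# Illustrative strong-scaling estimate (Amdahl's law), not the measured
# Gromacs data. The serial fraction and single-core throughput are
# assumed numbers for the sketch only.

def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Ideal speedup on `cores` cores when `serial_fraction` of the
    runtime cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

single_core_ns_per_day = 0.5   # assumed single-core throughput
serial_fraction = 0.005        # assumed non-parallel fraction (0.5 %)

for cores in (1, 16, 64, 256, 512):
    ns_day = single_core_ns_per_day * amdahl_speedup(cores, serial_fraction)
    print(f"{cores:4d} cores: ~{ns_day:5.1f} ns/day")
```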
Courses at CSC
- CSC courses: http://www.csc.fi/courses
- CSC HPC Summer School
- Sisu (Cray) workshops
- Taito (HP) workshops
- December 2013: Intel Xeon Phi programming
Physics people at CSC
- Particle-based methods: Jan Åström
- Geophysics/glaciology: Thomas Zwinger
- Nanoscience/semiconductors: Jura Tarus
- Nuclear/particle physics: Tomasz Malkiewicz
- Partial differential equations / Elmer: Peter Råback
- A few with a background in DFT, e.g. Juha Lento
- Quantum chemistry: Nino Runeberg
- A few with a numerical mathematics background
- Several with advanced code optimisation skills
- Everything related to HPC in general
Q/A: Need disk space?
- 3.8 PB on DDN
- $HOME, $USERAPPL: 20 GB
- $WRKDIR (not backed up): soft quota 5 TB
- HPC archive: 2 TB per user, common between the Cray and HP systems
- /tmp (around 1.8 TB): to be used for compiling codes
- Additional disk space through IDA
Disks at Kajaani (diagram): the sisu.csc.fi and taito.csc.fi login and compute nodes each have a local $TMPDIR; $WRKDIR, $HOME and $USERAPPL live on the shared storage; $ARCHIVE sits on new tape in Espoo behind an iRODS interface with a disk cache, used with the i-commands (icp, iput, ils, irm). Your workstation connects via an iRODS client or SUI.
Datasets served by TTA
- Projects funded by the Academy of Finland (Academy projects, centres of excellence, research programmes and research infrastructures): 1 PB capacity
- Universities and polytechnics: 1 PB capacity
- ESFRI projects (e.g. BBMRI, CLARIN), FSD, pilots and additional shares: 1 PB capacity
- Other important research projects via a special application process
Q/A: Is there a single place to look for info regarding the supercomputers?
- User manuals: http://research.csc.fi/guides
- Support: helpdesk@csc.fi
Q/A: Need large capacity? -> Grand Challenges
- Normal GC call (every half a year / year): new CSC resources available for a year; no lower limit on the number of cores, up to 50%
- Special GC call (mainly for the Cray, when needed): possibility for short (a day or less) runs with the whole Cray
- Remember also PRACE/DECI: http://www.csc.fi/english/csc/news/news/pracecalls
Q/A: Is cloud something for me? Example: Taito
- The Taito cluster has two types of nodes: HPC and cloud
- HPC nodes run on the host OS (RHEL)
- Cloud nodes host virtual machines with a guest OS of the user's choice, e.g. Ubuntu or Windows
Q/A: How fast is the I/O?
- InfiniBand interconnect: 56 Gbit/s, tested to give 20 GB/s (peak, on DDN)
- i-commands: 100 MB/s = 1 Gbit/s (10-16 threads; files larger than 32 MB are spread, the kernel schedules)
- SUI: 11 MB/s, i.e. 1 GB in about 1 min
- Fast laptop: 120 MB/s, with a disk write speed of about 40 MB/s
- 10 Gbit/s Ethernet = 1.2 GB/s
- Lustre metadata operations take long, so it is not good to have many small files
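To put those rates into perspective, here is a small sketch converting them into transfer times for an assumed 100 GB dataset (the dataset size is just an example; sustained rates in practice depend on the filesystem and load):

```python
# Rough transfer-time estimates for the nominal speeds quoted on the slide.
speeds_mb_s = {
    "SUI": 11,            # MB/s
    "i-commands": 100,    # MB/s
    "10 GbE": 1200,       # ~1.2 GB/s
    "DDN peak": 20000,    # 20 GB/s
}

data_gb = 100  # example dataset size in GB (assumed)

for name, mb_s in speeds_mb_s.items():
    seconds = data_gb * 1000 / mb_s
    print(f"{name:>12}: {seconds / 60:7.1f} min for {data_gb} GB")
```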
Q/A: Fastest way to connect?
- NoMachine NX server for remote access
Q/A: How to get access to the CSC supercomputers?
- Sign up at sui.csc.fi (HAKA authentication)
Quick summary: performance comparison
- Per-core performance ~2x compared to Vuori
- Better interconnects enhance scaling
- Larger memory
- Smarter collective communications
- The most powerful computer(s) in Finland
- A big investment
(Chart: Gromacs performance, ns/day vs. number of cores up to ~80, on Taito, Sisu, FGI, Vuori and Louhi.)
Round robin
Round robin
- What are your research interests?
- What are your needs in terms of computing?
- How can CSC help?
- Any comments towards CSC?