CALMIP: a Computing Meso-centre for Mid-range Academic HPC Users
CALMIP: Calcul en Midi-Pyrénées, Méso-centre de Calcul
Boris Dintrans, President of the Operational Board, CALMIP - Laboratoire d'Astrophysique de Toulouse-Tarbes, Observatoire Midi-Pyrénées
Pierrette Barbaresco, Toulouse University - Université Paul Sabatier
Nicolas Renon, Toulouse University - Université Paul Sabatier
The Midi-Pyrénées Region: space & aeronautics industry, food & wine
[Map: Toulouse and the Midi-Pyrénées region]
Summary
- CALMIP: a computing mesocentre - objectives; organisation / funding
- CALMIP: academic community and regional SMEs - CALMIP figures; CALMIP and SMEs
- Computing system - distributed part / shared-memory part; performance
- Perspectives
Scientific Grouping CALMIP: History
- Started in 1994: 17 laboratories in the Midi-Pyrénées region share computing resources
- Supported by the University of Toulouse (6 universities)
Purposes:
- Promote High Performance Computing: training in parallel computing and code optimisation; exchange of experience (thematic days)
- Access to a competitive computing system: purchase a system; achieve performance / ease of use / stability
- Support users: from the basics to developing parallel code
HPC Policy: Hierarchical Tiers
- European tier (PRACE): O(1000) TFlop/s
- National tier (GENCI): O(100) TFlop/s
- Regional tier (mesocentre; CALMIP in Midi-Pyrénées, Toulouse): O(10) TFlop/s
GENCI: Grands Equipements Nationaux en Calcul Intensif
HPC Policy: the Computing Mesocentre
- European tier: O(1000) TFlop/s
- National tier: O(100) TFlop/s
- Regional tier (Midi-Pyrénées, CALMIP): O(10) TFlop/s
- The mesocentre is the interface between the national tier and the laboratory tier (proximity to the labs)
- Scientific Grouping CALMIP: production constraints; multiple scientific domains
Scientific Grouping CALMIP: Organisation
- Governing: Scientific Council, representing the universities, the Midi-Pyrénées Region and local companies
- Granting resources: Operational Board; scientific project review and resource attribution, 2 calls per year
- Operation: Toulouse University (Université Paul Sabatier); purchases and operates the computing system; user support, training, communication
2010: 3rd system in production
- 1999, first system (#1): 0.04 TFlop/s, 64 cores / 72 GB RAM - national + university funding
- 2004, renewal (#2), upgraded in 2007: 1.5 TFlop/s, 512 cores / 256 GB RAM
- 2010, renewal (#3): 33 TFlop/s, 2912 cores / 14 TB RAM - regional (local) + national + university funding
- 15% of the computing system is devoted to SMEs
HYPERION, the CALMIP computational system
- 2912 Intel Nehalem cores
- 33.57 TFlop/s peak
- Ranked 223rd in the TOP500 (November 2009)
CALMIP: evolution of the figures, Y2005-Y2011 (growth rates apply to CPU hours requested)

Year  | CPU hours requested | Total available | Ratio (avail./req.)
2005  |    760 000          |     400 000     |  52%
2006  |  1 200 000 (+58%)   |     800 000     |  66%
2007  |  1 800 000 (+50%)   |     800 000     |  44%
2008  |  2 345 000 (+30%)   |   1 600 000     |  68%
2009  |  2 520 000 (+7%)    |   1 600 000     |  64%
2010  | 10 279 000 (+307%)  |  17 000 000     | 171%   <- new 33 TF system HYPERION
2011* | 15 073 000 (+50%)   |  18 000 000     | 120%

* Y2011: first call only
CALMIP: evolution of the science distribution
[Chart: breakdown of demand by scientific domain. Total demand: Y2009: 2 500 000 h; Y2010: 10 000 000 h]
CALMIP and SMEs
- Y2007: regional funding. Main idea: give regional companies access to HPC resources
- Target: SMEs (special rates for SMEs)
- How: 15% of the CALMIP computing system devoted to them; 15% = 3 000 000 CPU hours in Y2011
Challenges to face:
- Clients outside the academic community
- Many SMEs need HPC but don't know how to use it; is numerical simulation fully embedded in their production process?
- Technical challenges: data safety, security, transfer
- What kind of service? Hardware, software (ISV), expertise
CALMIP and SMEs: a service for parallel scientific computation
- Focus on SMEs in space & aerodynamics
- Access to a better time-to-solution for their numerical simulation problems
- Access to simulations they can't run on their own systems
- Give SMEs access to a full computation solution: CPU / RAM / storage / graphics
CALMIP: a three-cornered relationship
- Part 1: the SME
- Part 2: consulting and engineering services
- Part 3: CALMIP, the computing resource
HYPERION
Shared memory: Altix UV
- 96 cores, 2.6 GHz Nehalem EX (6-core sockets)
- 1 TB RAM, ccNUMA architecture
Distributed memory: Altix ICE
- 2816 cores, 2.8 GHz Nehalem EP (quad-core sockets)
- 36 GB RAM per node
- Interconnect: InfiniBand, dual-rail, DDR
- Water cooling
Permanent storage: enhanced NFS, 38 TB
Temporary storage: Lustre, 2 MDS, 4 OSS, 3 GB/s, 200 TB
Remote visualisation solution: 4 nodes (8 NHM cores, 48 GB RAM, NVIDIA FX 4800 GPU each), VirtualGL / TurboVNC
Distributed-memory example in CFD: industrial fluidised bed
- Neptune_CFD: 3 000 000 CPU hours used on HYPERION in Y2010
- Number of cores used in production: 68 to 512
Courtesy of O. Simonin, H. Neau, Laviéville - Institut de Mécanique des Fluides de Toulouse - Université de Toulouse / CNRS
Distributed-memory example in CFD: industrial fluidised bed - time-to-solution and speed-up on HYPERION
[Plot: speed-up curves for the following configurations]
- C1a: Altix ICE Harpertown, Intel MPI
- C1b: Altix ICE Harpertown, MPT
- C2a: HYPERION Altix ICE NHM, Intel MPI
- C2b: HYPERION Altix ICE NHM, MPT
- C3: InfiniBand cluster, AMD Shanghai, OpenMPI
Courtesy of O. Simonin, H. Neau, Laviéville - Institut de Mécanique des Fluides de Toulouse - Université de Toulouse / CNRS
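Speed-up curves like these are obtained by timing the same fixed-size run over a range of core counts and taking S(p) = T(ref)/T(p). A minimal sketch of such a timing harness in C with MPI is shown below; the compute_step() workload is a hypothetical stand-in for one solver iteration, not part of Neptune_CFD.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for one solver iteration (hypothetical). */
static void compute_step(double *u, int n)
{
    for (int i = 1; i < n - 1; i++)
        u[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_global = 1 << 24;        /* fixed global size: strong scaling */
    int n_local = n_global / size;       /* each rank works on a slice */
    double *u = calloc(n_local, sizeof *u);

    MPI_Barrier(MPI_COMM_WORLD);         /* synchronise before timing */
    double t0 = MPI_Wtime();
    for (int step = 0; step < 100; step++)
        compute_step(u, n_local);
    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d cores: time-to-solution %.3f s\n", size, t1 - t0);

    free(u);
    MPI_Finalize();
    return 0;
}

Running the same harness under the different MPI libraries (Intel MPI, MPT, OpenMPI) at each core count yields comparisons like C1-C3 above.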
Shared-memory example in theoretical physics: N-body physics
- Exact resolution of the Schrödinger equation => eigenvalue problem (N-body physics)
- Shared-memory approach, needs > 200 GB of RAM
- Parallelized with OpenMP => 64 threads
- Roughly 326 000 CPU hours used on HYPERION in Y2010
Courtesy of Sylvain Capponi - Laboratoire de Physique Théorique - Université Paul Sabatier / CNRS
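The core of such an eigensolver is typically a large matrix-vector product inside an iterative method (e.g. Lanczos). Below is a minimal, hypothetical OpenMP sketch of that kernel in C; it is not Capponi's code, just an illustration of the shared-memory pattern that needs the UV's large RAM.

/* Hypothetical OpenMP kernel: dense matrix-vector product y = A*x,
 * the building block of iterative eigensolvers such as Lanczos.
 * With n ~ 150 000, the matrix alone takes ~180 GB in double
 * precision, hence the > 200 GB shared-memory requirement above. */
#include <omp.h>

void matvec(const double *A, const double *x, double *y, long n)
{
    #pragma omp parallel for schedule(static)  /* one row block per thread */
    for (long i = 0; i < n; i++) {
        double s = 0.0;
        for (long j = 0; j < n; j++)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

Run with OMP_NUM_THREADS=64 to match the configuration above; on the ccNUMA UV, initialising A with the same loop structure (first touch) helps keep memory accesses local to each socket.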
Runs on the shared-memory part: Altix UV 100
[Plot: scalability of the N-body physics code DO36 on the Altix UV]
CALMIP and new HPC users: an example in life science
- Fish population fragmentation studies: counting fish in two rivers near Toulouse
- 12 000 CPU hours used, mostly for exploring the data set (code in C!)
- Species studied: chevaine (chub), gandoise (dace), vairon (minnow)
CALMIP Perspectives: CALMIP and the EQUIP@MESO project
[Map: partner sites - Rouen, Paris, Reims, Strasbourg, Saclay, Lyon, Grenoble, Toulouse, Marseille]
CALMIP (Toulouse) is a partner of GENCI in the EQUIP@MESO project (10 partners):
- Amplify, with a global coherency, the HPC capacities of the 10 regional centres
- Carry on training actions and allow reciprocal transfers of knowledge among the regional centres and their universities
- Relay at the regional level the national "HPC for SMEs" initiative set up by GENCI, INRIA and OSEO (www.initiativehpc-pme.org) to help SMEs use HPC to improve their competitiveness
GENCI: Grands Equipements Nationaux en Calcul Intensif
CALMIP and the EQUIP@MESO project
CALMIP (Toulouse), partner of GENCI in the EQUIP@MESO project, will upgrade the shared-memory part (Altix UV):
- 1 SSI with 400 cores (Westmere EX) + 3 TB of shared RAM
- 2 TFlop/s of GPU (NVIDIA)
Meet users' needs and go beyond them:
- Big memory: plasma physics, theoretical physics, astrophysics, data processing
- Improve time-to-solution for medium-sized parallel jobs
- "Easy to use" development environment

Current load on the UV queue:

renon@uvcalmip:~> qstat -a @uvcalmip
Job ID          Username Queue    Jobname    NDS TSK Memory  Time S  Time
--------------- -------- -------- ---------- --- --- ------ ----- -- -----
103089.service0 hubert   uv       BiH.runCC4   1   1  225gb 230:0 R  41:27
103519.service0 roudier  uv       cst          1   8  200gb 100:0 R  13:25
103584.service0 capponi  uv       in.80.0-0.   1  48  560gb 140:0 Q     --
103600.service0 didierp  uv       peps         1   8  100gb 240:0 R  16:02
103638.service0 capponi  uv       in.80.0-0.   1  48  600gb 140:0 Q     --
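For reference, a job like those in the listing above would be submitted with a PBS script along the following lines. This is a hypothetical sketch (job name, executable and resource values are illustrative); the exact resource-request syntax depends on the site's PBS configuration.

#!/bin/bash
#PBS -N nbody_uv                      # job name (hypothetical)
#PBS -q uv                            # shared-memory (Altix UV) queue, as in the listing
#PBS -l select=1:ncpus=48:mem=560gb   # 48 cores and 560 GB RAM on one SSI
#PBS -l walltime=140:00:00            # matches the Time column above

cd "$PBS_O_WORKDIR"                   # run from the submission directory
export OMP_NUM_THREADS=48             # one OpenMP thread per reserved core
./my_solver input.dat                 # hypothetical executable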
CALMIP Perspectives: Y2013
Shared facilities: the "Espace Clément Ader" building
- Partners: University of Toulouse / Météo-France (French weather forecast) / Midi-Pyrénées Region
- Different computing systems, one single plant: 3.5 MW in Y2013, 1000 m² per floor
CALMIP: Science, Y2010
Scientific topics (Operational Board): theoretical & molecular physics; biology & molecules; material chemistry and physics; CFD; quantum chemistry; numerical algorithms; astrophysics & earth science
~30 labs, 150 scientific projects
Magnitude of the changes - CALMIP project #2: fish population fragmentation (P1003 / 12 000 h; Paz / Loot / Blanchet, EdB)
[Figure: estimated divergence times of 1500-9000 years (per-site ranges 1500-6300, 2000-9000 and 3000-4000 years) for fragmented vs non-fragmented populations]
- The divergence times do not correspond to the fragmentation events
- They correspond to the post-glacial colonisations!
Altix UV 100 and MKL DGEMM scalability
Scalability tests on multithreaded DGEMM (MKL, ifort v11), matrix size 10000 x 10000 (double precision):
- NHM 2.8 GHz EP, bi-socket node (ICE node)
- NHM 2.6 GHz EX, 16-socket node (UV node)
- Itanium 1.5 GHz, 128-socket node (Altix 3700), DGEMM from SCSL
[Plot: performance vs thread count; hyperthreading region marked]
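A benchmark of this kind boils down to timing a cblas_dgemm call at increasing thread counts. A minimal sketch in C is given below, assuming MKL's C interface; the size mirrors the 10000 x 10000 double-precision test, but this is an illustration, not the script actually used for the plot.

/* Minimal DGEMM scalability probe: times C = A*B with MKL's
 * multithreaded BLAS. Vary the thread count between runs, e.g.
 * MKL_NUM_THREADS=1,2,4,...  GFlop/s = 2*n^3 / time. */
#include <stdio.h>
#include <mkl.h>          /* cblas_dgemm, mkl_malloc, dsecnd */

int main(void)
{
    const MKL_INT n = 10000;           /* 10000 x 10000, double precision */
    double *A = mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *B = mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *C = mkl_malloc((size_t)n * n * sizeof(double), 64);
    for (size_t i = 0; i < (size_t)n * n; i++) {
        A[i] = 1.0; B[i] = 2.0; C[i] = 0.0;
    }

    double t0 = dsecnd();              /* MKL wall-clock timer */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
    double t = dsecnd() - t0;

    printf("%d threads: %.2f GFlop/s\n",
           mkl_get_max_threads(), 2.0 * n * n * n / t / 1e9);

    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}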
Distributed-memory example: the Méso-NH atmospheric model
- Méso-NH: the non-hydrostatic mesoscale atmospheric model of the French research community
- Developed and maintained at:
  - Laboratoire d'Aérologie, CNRS / University of Toulouse (Université Paul Sabatier, UMR 5560)
  - CNRM-GAME, URA 1357, CNRS / Météo-France (French weather forecast)
Méso-NH performance on HYPERION
[Plot: Méso-NH performance on HYPERION]
Courtesy of J.-P. Pinty & J. Escobar - Laboratoire d'Aérologie - Observatoire Midi-Pyrénées - Université Paul Sabatier / CNRS