High Performance Computing Infrastructure in Japan
Kento Aida
National Institute of Informatics
Overview of HPCI
Introduction
- High Performance Computing Infrastructure (HPCI)
  - national project promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) in Japan
  - distributed computing infrastructure for high performance computing
    - K computer, supercomputers, and high performance storage
  - first production-level infrastructure for high performance computing in Japan
- roadmap
  - Mar 2011: basic design
    - network, authentication, user management, shared storage, testbed for advanced software
  - Apr-Dec 2011: detailed design
  - Jan-Aug 2012: test operation
  - Sep 2012: production-level operation
Services
(1) account registration
  - application
  - account (HPCI account, cert.)
(2) single sign-on
  - input HPCI account and password
  - operation through a web browser
(3) login to resources
  - no password
  - run jobs on supercomputers
  - access files on shared storage
[Figure labels: computer, HPCI shared storage]
System Overview
[Figure: HPCI system overview]
- user management: HPCI Secretariat (RIST); HPCI ID registration, review of proposals, account registration, helpdesk
- authentication: single sign-on portal, CA system, certificate repository, Shibboleth SPs and IdPs; users hold an HPCI account and apply for a certificate
- computer resources: AICS (K computer), supercomputer centers in 9 universities
- shared storage: AICS, U. Tokyo
- network infrastructure: NII
More resources will be connected after 2012.
Computing Resources
As of Nov. 2012 (source: M. Hirakawa, AICS)
- RIKEN AICS: K computer (10.62 PF, 1.27 PiB/30 PiB)
- Hokkaido Univ.: SR16000/M1 (51.6 TF/172 TF, 6.6 TB/22 TB); BS2000 (5.76 TF/44 TF, 1.92 TB/14 TB); RENKEI-VPE: VM hosting
- Tohoku Univ.: SX-9 (29.4 TF, 18 TB); Express5800 (1.74 TF, 3 TB)
- Univ. of Tsukuba: T2K (95.4 TF, 20 TB); HA-PACS (802 TF, 34.3 TB); FIRST (36.1 TF, 1.6 TB)
- Univ. of Tokyo: FX10 (1.13 PF, 150 TB); SR16000/M1 (54.9 TF, 10.94 TB); T2K (75.36 TF/140 TF, 16 TB/31.25 TB); EastHubPCCluster (10 TF/13 TF, 5.71 TB/8.15 TB); GPU Cluster (CPU 4.5 TF, GPU 16.48 TF, 1.5 TB); WestHubPCCluster (12.37 TF, 8.25 TB); RENKEI-VPE: VM hosting
- Tokyo Institute of Technology: TSUBAME2.0 (0.24 PF/2.4 PF, 10 TB/100 TB); RENKEI-VPE: VM hosting
- Nagoya Univ.: FX1 (30.72 TF, 24 TB); HX600 (25.6 TF, 10 TB); M9000 (3.84 TF, 3 TB)
- Kyoto Univ.: XE6 (300.8 TF, 59 TB); GreenBlade8000 (242.5 TF, 38 TB); 2548X (10.6 TF, 24 TB)
- Osaka Univ.: SX-9 (16 TF, 10 TB); SX-8R (5.3 TF, 3.3 TB); PCCluster (6.1 TF, 2.0 TB)
- Kyushu Univ.: FX10 (68.1 TF/181.6 TF, 9.2 TB/24 TB); CX400 (44.2 TF/510.1 TF, 16.4 TB/184.5 TB); SR16000 L2 (25.3 TF, 5.5 TB)
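The performance figures in this list mix TF and PF (and, in the original slide text, the spellings "Tflops" and "TFlops"). When comparing or summing them it helps to convert everything to one unit first. The following is a minimal sketch of such a conversion; the function name `peak_tflops` is hypothetical and not part of any HPCI tool.

```python
import re

# Multipliers to TFLOPS for the two prefixes that appear in the list.
_TO_TFLOPS = {"T": 1.0, "P": 1000.0}

_FIGURE = re.compile(r"\s*([0-9]*\.?[0-9]+)\s*([TP])(?:F|flops)\s*", re.IGNORECASE)


def peak_tflops(figure: str) -> float:
    """Convert a figure such as '10.62PF', '300.8 TF' or '802Tflops' to TFLOPS."""
    match = _FIGURE.fullmatch(figure)
    if match is None:
        # Memory figures such as '9.2TB' are rejected rather than misread.
        raise ValueError(f"not a performance figure: {figure!r}")
    value, prefix = match.groups()
    return float(value) * _TO_TFLOPS[prefix.upper()]


if __name__ == "__main__":
    # The three Univ. of Tsukuba systems from the list.
    tsukuba = ["95.4Tflops", "802Tflops", "36.1TFlops"]
    print(round(sum(peak_tflops(f) for f in tsukuba), 1), "TF")
```

Entries that give two values separated by a slash need to be split before conversion; the slide does not say what the two values denote, so the sketch leaves that choice to the caller.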
Storage
- HPCI WEST HUB: 10 PB+ storage (AICS, RIKEN)
- HPCI EAST HUB: 12 PB+ storage (University of Tokyo)
- Gfarm2 is used as the global shared file system
- sites shown: Hokkaido University, Tohoku University, University of Tsukuba, Tokyo Institute of Technology, Nagoya University, Osaka University, Kyoto University, Kyushu University
source: Y. Ishikawa, Univ. of Tokyo
Network (SINET4)
SINET4: Science Information NETwork 4
SINET4 (cont'd)
- connection to 700+ academic sites
- 80 Gbps backbone between Tokyo and Osaka
- IX for commercial networks
  - 134 30Gbps in Tokyo
  - 22 11Gbps in Osaka
- L3VPN, L2VPN/VPLS, QoS
[Figure labels: CA, portal, users, resource providers (university and AICS LANs, each with users, storage, and compute resources), VPN, QoS, IX (Tokyo), IX (Osaka), commercial network, non-commercial network]
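To put the 80 Gbps backbone figure in perspective, the sketch below estimates how long a bulk transfer between sites would take in the ideal case. It assumes the whole backbone is available to one transfer, no protocol overhead, and decimal units (1 PB = 10^15 bytes); real transfers would be slower. The function name is hypothetical.

```python
def ideal_transfer_hours(size_bytes: float, link_gbps: float) -> float:
    """Lower bound on transfer time in hours for a given link speed."""
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9)  # 1 Gbps = 10^9 bits per second
    return seconds / 3600


if __name__ == "__main__":
    PB = 10**15  # decimal petabyte, an assumption
    hours = ideal_transfer_hours(1 * PB, 80)
    print(f"1 PB over 80 Gbps: at least {hours:.1f} hours")
```

Under these assumptions, moving 1 PB takes 100,000 seconds, a little under 28 hours.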
Cloud Service
- VM hosting
  - repository for research results
  - pre/post processing
  - testbed for prototype system software
source: S. Takizawa, Tokyo Tech.
Authentication System
Overview of Authentication System
- access to web portals: Shibboleth
  - management of certificates, user support, cloud service
- access to remote computers: GSI
  - login to remote computers, access to shared storage
- bridge between Shibboleth and GSI: web portal
[Figure: a user with an HPCI account and password at the IdP signs on through the single sign-on portal, then reaches remote computers and shared storage]
(1) sign on to the portal (cert. issuing system)
(2) generate a proxy certificate and download it
(3) ssh login to remote computers
    % gsi-ssh host.univ.ac.jp
  - no need to give a local account name and password
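Steps (2) and (3) of this flow can be sketched as the commands a user might run after signing on to the portal. This is a minimal sketch under stated assumptions: the slides only say that a proxy certificate is generated through the portal and downloaded, and list MyProxy as the proxy certificate repository, so retrieving it with `myproxy-logon` is an assumption. The server name and account are hypothetical placeholders. GSI-OpenSSH installs its client as `gsissh`, which is used here for the slide's `gsi-ssh`. The functions only build command lines; nothing is executed.

```python
import shlex
from typing import List


def myproxy_logon_cmd(server: str, account: str, hours: int = 12) -> List[str]:
    """Build a myproxy-logon call that fetches a proxy certificate.

    -s: MyProxy server, -l: account name, -t: proxy lifetime in hours.
    """
    if hours <= 0:
        raise ValueError("proxy lifetime must be positive")
    return ["myproxy-logon", "-s", server, "-l", account, "-t", str(hours)]


def gsissh_cmd(host: str) -> List[str]:
    """Build a GSI-SSH login; no local account name or password is passed."""
    return ["gsissh", host]


def login_plan(server: str, account: str, host: str) -> List[str]:
    """Return shell lines for steps (2) and (3) of the flow."""
    steps = [myproxy_logon_cmd(server, account), gsissh_cmd(host)]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    # Hypothetical server and account; host.univ.ac.jp is the slide's example host.
    for line in login_plan("myproxy.example.jp", "hpci-user", "host.univ.ac.jp"):
        print("% " + line)
```

Because the proxy certificate carries the user's identity, the login command names only the host, which matches the slide's point that no local account name or password is needed.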
Architecture
[Figure: authentication architecture]
- user: browser (apply for user cert., single sign-on); GSI-SSH client (login to resources)
- NII: portal (Shib. SP), proxy cert. repository, Shib. DS, cert. management system, cert. repository, CA system (Shib. SP)
- supercomputer centers, RIKEN: portal (Shib. SP), proxy cert. repository, GSI-SSH server, Shib. IdP, account DB
- sites are connected over SINET4
Software
role                                              system                           software
Certificate Authority                             CA system                        NAREGI-CA
Portal (NII)                                      certificate management           custom software
                                                  certificate repository           MyProxy
                                                  ID federation                    Shibboleth
Portal (supercomputer centers)                    portal (cert. issuing system)    custom software
                                                  proxy certificate repository     MyProxy
                                                  ID federation                    Shibboleth
Identity Provider (supercomputer centers, AICS)   ID federation                    Shibboleth
Resource Provider (supercomputer centers, AICS)   middleware to access resources   GSI-SSH, Gfarm
Summary and Future Plan
- Summary
  - This talk presented the design of HPCI, focusing on the authentication mechanism.
  - HPCI started production-level operation in Sep. 2012.
- Issues
  - interoperation with overseas infrastructure
    - review of the operation of the HPCI CA to obtain approval from the International Grid Trust Federation (IGTF)
  - federation with other authentication systems
    - discussion about federation with other web authentication systems, e.g. OpenID
https://www.hpci-office.jp/