Self service for software development tools



Self service for software development tools & Scalable HPC at CERN
Michal Husejko, on behalf of colleagues in CERN IT/PES
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

Agenda
- Self-service portal for software development tools
- Scalable HPC at CERN
- Next steps

Services for (computing) projects
- A place to store code and configuration files under revision control
- A place to document development and to provide instructions for users
- A place to track bugs and plan new features
- Other services such as build and testing frameworks, but these are outside the scope of this talk

Version Control Services at CERN
- SVN: still the main version control system; 2100 repositories, over 50000 commits per month (SVN statistics)
- Git: available as a service since spring, about 700 repositories

Collaborative web documentation
- Typically TWiki for web documentation; in use at CERN since 2003, ~10000 users
- Documentation divided into project Webs and Topics
- Currently running TWiki 5.1; many plugins and an active user community
- Could also be on Drupal or another web site for the project; a link provides flexibility

JIRA issue tracking service
- Central JIRA instance for CERN: SSO, e-groups, Service-now link
- Plugins: Git, SVN, Agile, Issue Collector, Gantt charts
- 165 projects and ~2000 users on the central instance, and growing (10 projects/week)
- More users in the other JIRA instances that have been migrated to the central Issue Tracking infrastructure
- Savannah migration to JIRA on-going (e.g. ROOT migrated)

The case for automation
- Creating a project on a self-service basis is easy for the user and avoids a repetitive task for the support teams
- Project administrators would benefit if they could set e.g. authorization rules for multiple tools at the same time
- Certain settings of a Git or SVN project are beyond the scope of a project administrator if an intervention requires access to the server infrastructure
- JIRA project administrators have restricted rights to change their configuration; providing more freedom would be desirable

Example: Git service
- git.cern.ch: https access via a DNS load balancer to Apache front ends (ports 8443/443), backed by AFS
- Gitolite: permissions, integrated with e-groups; world or CERN visibility option
- Gitweb: browser access
- Infrastructure: Puppet, DNS LB, Apache LB, HTTP, AFS (FS agnostic)
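As a sketch of the Gitolite layer above, a repository's access rules could look like the following gitolite.conf fragment. The repository and e-group names are invented for illustration; the slide only states that permissions are e-group integrated and that world/CERN visibility is an option.

```
# hypothetical gitolite.conf entry; all names are illustrative
repo eng/linac4-sim
    RW+ = @linac4-admins      # e-group mapped to full (rewind) access
    RW  = @linac4-developers  # e-group mapped to push access
    R   = @all                # the "world visibility" option
```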

CERN Forge admin portal
- Django web interface, with a backend DB for service metadata
- Uses the REST APIs of JIRA and other services for administrative actions
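To illustrate the REST-driven approach, a portal backend can drive JIRA with plain HTTP calls. The sketch below targets JIRA's standard issue-creation endpoint (/rest/api/2/issue); the instance URL, project key, and authorization header are assumptions for illustration, not details from the talk.

```python
import json
import urllib.request

JIRA_URL = "https://its.cern.ch/jira"  # assumed instance URL


def build_issue_payload(project_key, summary, issue_type="Task"):
    """JSON body for POST /rest/api/2/issue (standard JIRA REST API)."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "issuetype": {"name": issue_type},
        }
    }


def create_issue(payload, auth_header):
    """One administrative action a portal could perform on a user's behalf."""
    req = urllib.request.Request(
        JIRA_URL + "/rest/api/2/issue",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": auth_header},  # e.g. Basic auth; assumed
    )
    return urllib.request.urlopen(req)  # raises urllib.error.HTTPError on failure
```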

CERN Forge admin portal - 2
In the future, users will be able to configure links between VCS and issue tracking, as well as other settings (public/private etc.)

More in the next chapter! Stay tuned for updates on http://cernforge.cern.ch
Now over to another challenge where we are working towards on-demand flexibility: scalable High Performance Computing.

Scalable HPC (1/2)
- Some 95% of our applications are served well by bread-and-butter machines
- We (CERN IT) have invested heavily in AI, including a layered approach to responsibilities, virtualization, and a private cloud
- Certain applications, traditionally called HPC applications, have different requirements
- Even though these applications sail under the common HPC name, they differ from one another and have different requirements
- These applications need detailed requirements analysis

Scalable HPC (2/2)
- We contacted our user community and started gathering user requirements on a continuous basis
- We have started a detailed system analysis of our HPC applications to gain knowledge of their behavior
- In this talk I would like to present the progress and the next steps
- At a later stage we will look at how the HPC requirements can fit into the IT infrastructure

CERN HPC applications
- Engineering applications: used at CERN in different departments to model and design parts of the LHC machine. Tools used for structural analysis, fluid dynamics, electromagnetics, and recently multiphysics. Major commercial tools: Ansys, Fluent, HFSS, Comsol, CST; also open source: OpenFOAM (fluid dynamics)
- Physics simulation applications: simulation applications developed by the HEP community for theory and accelerator physics

One of many engineering cases: Ansys Mechanical
LINAC4 beam dump system, single-cycle simulation. Time to obtain a single-cycle solution:
- 8 cores -> 63 hours to finish the simulation
- 64 cores -> 17 hours to finish the simulation
The user is interested in 50 cycles: this would need about 130 days on 8 cores, or about 35 days on 64 cores. It is impossible to obtain simulation results for this case in a reasonable time on a standard engineering workstation.
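The slide's figures can be sanity-checked with a few lines of arithmetic (the 63 h and 17 h per-cycle times are from the slide; note that the 64-core total works out to roughly 35 days):

```python
# Per-cycle solution times from the LINAC4 beam dump case
HOURS_8_CORES = 63
HOURS_64_CORES = 17
CYCLES = 50

days_8 = HOURS_8_CORES * CYCLES / 24    # total wall time on 8 cores
days_64 = HOURS_64_CORES * CYCLES / 24  # total wall time on 64 cores
speedup = HOURS_8_CORES / HOURS_64_CORES  # gain from 8x more cores

print(round(days_8), round(days_64), round(speedup, 1))  # prints: 131 35 3.7
```

A speedup of about 3.7x for 8x the cores is a parallel efficiency below 50%, which is one reason the later slides look closely at interconnect latency.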

Challenges
Why do we care? Problem size and complexity are challenging our users' workstations, and this extrapolates to other engineering HPC applications.
How to solve the problem? Can we use the current infrastructure, or do we need something completely new? And if something new, how could it fit into our IT infrastructure? We are running a data center, not a supercomputing center.

Current infrastructure: CERN lxbatch
- Optimized for HEP data processing; definitely not designed with MPI applications in mind
- Example lxbatch node: network mostly 1 Gb/s Ethernet, rarely 10 Gb/s (a very limited number with low latency); 8-48 cores, 3-4 GB RAM per core; local disk, AFS or EOS/CASTOR HEP storage
Can the current lxbatch service provide good-enough HPC capability?
- How does the interconnect affect performance (MPI-based distributed computing)?
- How much RAM per core is needed?
- What type of temporary storage?
- Can we utilize multicore CPUs to decrease time to solution (increase jobs/week)?

Multi-core scalability
- We know that some of our tools scale well (example: Ansys Fluent). What about others heavily used at CERN (example: Ansys Mechanical)?
- One of many test cases: LINAC4 beam dump system, single-cycle simulation. Results: scales well beyond a single multi-core box, with a greatly improved number of jobs/week, i.e. simulation cycles/week (chart: number of jobs/week vs. number of cores, up to ~80 cores)
- Conclusion: multi-core distributed platforms are needed to finish the simulation in a reasonable time
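The jobs/week metric on the chart's y-axis is simply the week's 168 hours divided by the per-job runtime. Using the LINAC4 per-cycle times quoted earlier as an example:

```python
HOURS_PER_WEEK = 168  # 7 days * 24 h


def jobs_per_week(hours_per_job):
    """Throughput metric used on the scalability charts."""
    return HOURS_PER_WEEK / hours_per_job


# Per-cycle runtimes from the LINAC4 case:
print(round(jobs_per_week(63), 1))  # 8 cores  -> 2.7 jobs/week
print(round(jobs_per_week(17), 1))  # 64 cores -> 9.9 jobs/week
```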

Interconnect latency impact
- Ansys Fluent, a commercial CFD application
- Speedup beyond a single node can be diminished by a high-latency interconnect. The graph shows good scalability beyond a single box with low-latency 10 Gb Ethernet, and dips in performance when MPI traffic is switched to 1 Gb (chart: jobs/week vs. number of cores for a 16-core and a 32-core machine, up to 72 cores)
- Conclusion: currently 99% of the nodes in the CERN batch system are equipped with 1 Gb/s NICs; a low-latency interconnect solution is needed

Memory requirements
- In-core/out-of-core simulations (avoiding costly file I/O): in-core = most temporary data is stored in RAM (the solver can still write to disk during the simulation); out-of-core = temporary data is stored in files on the file system
- The preferable mode is in-core, to avoid costly disk I/O accesses, but this requires more RAM and more memory bandwidth
- Ansys Mechanical (and some other engineering applications) has limited scalability; this depends heavily on the solver and the user's problem, and it limits the possibility of splitting the problem among multiple nodes
- All commercial engineering applications use some licensing scheme, which can skew the choice of platform
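The in-core/out-of-core trade-off can be sketched in a few lines. Real solvers such as Ansys Mechanical manage this internally; the toy functions below only illustrate the pattern: in-core keeps the whole working set in RAM, while out-of-core spills it to a scratch file and streams it back in bounded chunks at the cost of disk I/O.

```python
import os
import struct
import tempfile


def sum_in_core(values):
    # In-core: the entire working set is held in memory.
    scratch = list(values)
    return sum(scratch)


def sum_out_of_core(values, chunk=4):
    # Out-of-core: spill doubles to a scratch file, then stream them back
    # a few at a time -- RAM use stays bounded, disk I/O goes up.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for v in values:
            f.write(struct.pack("d", v))
        path = f.name
    total, size = 0.0, struct.calcsize("d")
    with open(path, "rb") as f:
        while True:
            buf = f.read(size * chunk)
            if not buf:
                break
            total += sum(struct.unpack("%dd" % (len(buf) // size), buf))
    os.unlink(path)
    return total
```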

Temporary storage
- Impact of disk IOPS on performance (jobs/week). Test case: Ansys Mechanical, BE CLIC test system; disk I/O impact on speedup, two configurations compared
- Using an SSD (better IOPS) instead of an HDD increases jobs/week by almost 100% (chart: jobs/week vs. number of cores, up to 36 cores)
- Conclusion: we need to investigate more cases to see whether this is a marginal case or something more common

Analysis done so far for engineering applications
Conclusions:
- More RAM is needed for in-core mode; this seems to solve the potential problem of disk I/O access times
- Multicore machines are needed to decrease time to solution
- A low-latency interconnect is needed for scalability beyond a single node, which in turn is needed to decrease simulation times even further
Next steps:
- Perform scalability analysis on many-node clusters
- Compare low-latency 10 Gb/s Ethernet with InfiniBand
- A cluster for all CERN HPC engineering applications

Last but not least: if a low-latency interconnect is needed, Ethernet or InfiniBand?

Lattice QCD
- Lattice QCD: a highly parallelized MPI application
- Main objectives: investigate the impact of the interconnection network on system-level performance (comparison of 10 Gb Ethernet iWARP and InfiniBand QDR) and the scalability of clusters with different interconnects
- Is Ethernet (iWARP) good enough for MPI-heavy applications?

IB QDR vs. iWARP 10 Gb/s
Two clusters, same compute nodes, same BIOS settings:
- Dual-socket Xeon Sandy Bridge E5-2630L, 12 cores total per compute node; 64 GB RAM @ 1333 MT/s per compute node
Different interconnect networks:
- QLogic (Intel) InfiniBand QDR (40 Gb/s): 12 compute nodes + 1 frontend node (max. 128 cores)
- NetEffect (Intel) iWARP Ethernet (10 Gb/s) + HP 10 Gb low-latency switch: 16 compute nodes + 1 frontend node (max. 192 cores)

IB QDR vs. iWARP 10 Gb/s (1/3)
Share of application time spent in computation vs. MPI communication, per number of tasks (cores); the IB cluster tops out below 192 cores:

Tasks (cores) | %COMP IB | %MPI IB | %COMP iWARP | %MPI iWARP
12            | 93.2     | 6.8     | 93.2        | 6.8
24            | 87.7     | 12.3    | 83.0        | 17.0
48            | 77.8     | 22.3    | 70.9        | 29.1
96            | 65.2     | 34.8    | 58.2        | 41.8
128           | 60.9     | 39.1    | 51.7        | 48.3
192           | -        | -       | 42.5        | 57.6

(A second chart overlays the %MPI curves for IB and iWARP against the number of tasks, 0-216 cores.)

IB QDR vs. iWARP 10 Gb/s (2/3)
(Charts: computation time per core and communication time per core, IB vs. iWARP, in thousands of seconds, against the number of tasks, 0-192 cores; less is better.)

IB QDR vs. iWARP 10 Gb/s (3/3)
(Chart: time per trajectory in seconds on the IB and iWARP test clusters against the number of cores, 0-216; less is better. Values range from about 5.05E+03 s at low core counts down to about 1.08E+03 s at high core counts.)

Conclusions and outlook
- An activity has started to better understand the requirements of CERN HPC applications
- We performed the first steps to measure the performance of CERN HPC applications on an Ethernet (iWARP) based cluster; the results are encouraging
Next steps:
- Refine our approach and our scripts to work at higher scale (next target is 40-60 nodes) and with real-time monitoring
- Compare results between a Sandy Bridge 2-socket system and a Sandy Bridge 4-socket system, both iWARP
- Gain more knowledge about the impact of the Ethernet interconnect network and tuning parameters on MPI jobs
- Investigate the impact of virtualization (KVM, Hyper-V) on latency and bandwidth for low-latency iWARP NICs

Q&A