CAS2K5 Jim Tuccillo jtuccillo@lnxi.com 912.576.5215

Agenda
- Corporate Overview
- System Architecture
- Node Design
- Processor Options
- Interconnect Options
- High Performance File Systems: Lustre
- System Management
- System Software
- Services and Training
- WRF Scalability

Linux Networx Corporate Overview
- Mission
  - Focus exclusively on Linux clustered solutions, tools, and services
  - High performance production computing systems
  - Government, commercial, and academic customers
- Growth/Funding
  - Revenues have grown an average of over 100% per year for the past four years
  - Similar growth rates projected for 2005 and 2006
  - $40 million in funding during 2004
  - Significant additions of key management and technical personnel from the HPC industry
  - 40% increase in integration capacity in 2005
- Future
  - Continued growth
  - Expertise in system engineering, software, and networking
  - Balanced introduction of dependable, leading-edge technologies
  - Development of new technologies focused on production HPC

Company Background
- Incorporated in 1989
- Headquarters: Salt Lake City, Utah
- Privately held
- Operations: Americas, Europe, Asia

What We Do
- Hardware Integration
  - Commodity components: processors, memory, motherboard
  - Purpose-built chassis, cooling system, and power distribution units
- Software Integration
  - Standard Linux OS distributions from Red Hat or SuSE
  - Clusterworx cluster management software
  - LinuxBIOS
  - Open source tools, MPI libraries, and applications software
- Installation & Overall System Integration
  - Test, validate, and integrate hardware and software components prior to shipment
  - Verify cooling and power requirements for your specific configuration and layout

Position in the Industry
- 1997 - First company to deliver a Linux cluster
- 1997 - First cluster management tools
- 2000 - First design & patent on vertical nodes
- 2002 - Built the world's most powerful Linux cluster, the first cluster to be ranked as a top-5 computer
- 2003 - Designed & delivered a 2,800-CPU cluster in under three months, an industry record
- Over 1,000 systems at 200+ sites

Linux Networx Systems

Site                             | Computer Name      | Rmax         | Best Top500 Ranking | System Model | Processors
US Army Research Laboratory      | John Von Neumann   | 8.770 TFlops | 13                  | Evolocity II | 2048 Xeon 3.4 GHz
Los Alamos National Laboratory   | Lightning          | 8.051 TFlops | 6                   | Evolocity    | 2816 Opteron 2 GHz
Lawrence Livermore National Lab. | MCR                | 7.634 TFlops | 3                   | Evolocity II | 2304 Xeon 2.4 GHz
Grid Technology Research Center  | AIST Super Cluster | 1.997 TFlops | 97                  | Evolocity II | 512 Xeon 3.06 GHz
Sandia National Laboratories     | Catalyst           | 1.076 TFlops | 111                 | Evolocity II | 256 P4 Xeon 3.06 GHz
US Army Research Laboratory      | Powell             | 1.060 TFlops | 113                 | Evolocity II | 256 Pentium4 Xeon 3.06 GHz

[Chart: Linux Networx positioned relative to white box vendors and proprietary vendors.]

System Architecture

Node Overview
- Compute Nodes
  - Provide the primary computational resource for applications
- Administration Nodes
  - Provide system management functions
  - Host the system management software
- Login Nodes
  - Provide the user access point to the cluster
  - Host authentication services, such as a DoD Kerberos ticketing system
  - Host development tools for users
- I/O Storage Nodes
  - Lustre MetaData Server (MDS) nodes manage file names and locations; there are 1 or 2 MDS nodes per cluster
  - Lustre Object Storage Server (OSS) nodes provide high-speed, scalable access to files

Node Design: Evolocity II
- Designed for HPC clustering
- Compact form factor (0.8U): up to 50 nodes per rack
- Uses standard motherboards
- Optimized for maximum air flow and cooling
- 2 processor sockets: 2-way or 4-way nodes

Node Design
- Vertical rack-mount design benefits system cooling
  - A horizontal design captures heat; a vertical design allows for better air flow
  - This matters even more as density rises
- Evolocity decreases temperature by 12 °C compared to a traditional solution

Processor Options
- Single-core AMD Opteron (64-bit)
- Single-core Intel (64-bit EM64T)
- Dual-core AMD
- Dual-core Intel

Processor Selection
- Typically we benchmark both AMD and Intel processors. A rough guideline:
  - AMD Opteron for memory-bandwidth-limited codes (see the sketch below)
  - Intel for codes that have high cache/register residency
- We can fine-tune price/performance through clock speed selection
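
To make the first case concrete, a memory-bandwidth-limited code spends its time streaming arrays through memory rather than computing. A minimal C sketch in the spirit of the STREAM triad (array size and the bandwidth arithmetic are illustrative, not a Linux Networx benchmark):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    #define N (8L * 1024 * 1024)   /* 8M doubles per array: far larger than any cache */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1.0e-6 * tv.tv_usec;
    }

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        const double s = 3.0;
        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = now();
        for (long i = 0; i < N; i++)
            a[i] = b[i] + s * c[i];   /* STREAM-style triad: little arithmetic per byte */
        double t1 = now();

        /* Three arrays of 8-byte doubles are touched once each, so the loop moves
           3*N*8 bytes; runtime is set by memory bandwidth, not floating-point peak. */
        printf("triad: %.2f GB/s (check %g)\n",
               3.0 * N * sizeof(double) / (t1 - t0) / 1e9, a[N / 2]);
        free(a); free(b); free(c);
        return 0;
    }

On a loop like this the Opteron's integrated memory controller is the advantage; a cache-resident kernel would instead favor the Intel parts.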

Interconnect Fabric
- InfiniBand
- Myrinet
- Gigabit Ethernet

Lustre Architecture
[Diagram: Lustre clients (compute and login nodes) reach the I/O nodes over InfiniBand. The Lustre MetaData Servers, MDS 1 (active) and MDS 2 (standby), are paired for failover; the Lustre Object Storage Servers connect over dual Fibre Channel to DDN storage.]

Lustre Glossary
- MetaData Server (MDS)
  - Exports one or more MetaData Targets (MDTs)
  - Stores file system metadata
- Object Storage Server (OSS)
  - Exports one or more Object Storage Targets (OSTs)
  - Stores file contents
- Lustre Client
  - Writes to and reads from files
  - Installed on each compute and login node for filesystem access
- Logical Object Volume (LOV)
  - Manages file striping across multiple OSTs
  - Stripe size and number of OSTs selectable per directory or per file (example below)
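
Striping is typically set with the lfs utility. A hypothetical example (the mount point and values are placeholders, and flag spellings vary across Lustre versions; recent releases use -S for stripe size, older ones a positional or --size form):

    lfs setstripe -c 4 -S 1m /mnt/lustre/results   # new files in this directory stripe across 4 OSTs in 1 MB chunks
    lfs getstripe /mnt/lustre/results              # inspect the layout that was set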

Facilities, Power, and Cooling
- System engineering approach: facilities are treated as part of the system
- All major interfaces are included
  - Power
  - Thermal/cooling
  - Physical requirements (space and floor loading)
  - Networking
  - Logistics
- Linux Networx has a wide range of experience deploying clusters in a variety of data centers
- We work closely with facilities engineers throughout the design process
- Linux Networx provides Fluent airflow modeling to ensure proper system cooling at the customer facility
- 208V 3-phase power to racks
  - Most racks require two 50A circuits (see the rough budget below)
- Low-power options are available, with reduced performance
- Temperature-controlled variable-speed fans cool the system with reduced noise and airflow
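
As a rough budget for those circuits: three-phase power is P = sqrt(3) x V x I, so one 50 A circuit at 208 V can deliver about 1.73 x 208 x 50 ≈ 18 kVA, or roughly 36 kVA for the pair per rack. That is a ceiling rather than a typical draw: it assumes fully loaded circuits at unity power factor, and continuous-load derating of the breakers (typically to 80%) reduces the usable figure further.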

System Management
- Consists of the ICE Box appliance, Clusterworx software, and LinuxBIOS
- Remotely controls each node's power cycling, booting, and software image management
- Provides remote BIOS modifications
- Ongoing monitoring of software processes and physical characteristics, including temperature, memory ECC, disk drive activity, and fan functionality

System Management: Clusterworx
- Integrated disk cloning installs the Linux OS and applications on a cluster system of any size in minutes
- Image Manager allows for easy creation of disk or NFS-boot images
- Integrated version control system for cluster node payloads
- Historical graphing of system statistics creates dynamically generated charts to show cluster performance
- Monitoring of system properties including CPU usage, memory usage, disk I/O, and network bandwidth
- Automatic administration actions and notification
- Remotely accessible, easy-to-use GUI

System Management: ICE Box
- Designed specifically for HPC clusters
- Serial terminal server and concentrator
- Remote power/reset
- Each ICE Box supports up to 10 nodes and 2 auxiliary devices
- Unique zero-U design that mounts to the back of a standard 19" rack
- Each box has a network connection and a unique IP address, allowing multiple boxes to form a highly scalable IP-based communication fabric
- Monitors temperatures within nodes and adjusts fan speeds
- Remotely resets motherboards through internally placed probes
- Emergency beacons
- Fully integrated with Clusterworx via an easy-to-use GUI

System Management: LinuxBIOS
- An open source BIOS alternative
  - Boots quickly
  - Remotely accessible
  - Designed specifically for cluster systems
- LinuxBIOS performs the same basic functions as a commercial (factory-installed) BIOS, only 10-20 times faster
  - Initializes the hardware
  - Checks for valid memory
  - Begins loading the operating system in about three seconds (most commercial BIOSes require about 30-60 seconds)
- LinuxBIOS can be configured and accessed from within the Linux operating system
  - Changes and inspection of the BIOS can be made remotely, on a single node or on all nodes in a cluster
  - Can be configured without rebooting
  - Can be upgraded/rewritten without booting to DOS

Software
- SuSE Linux Enterprise Server (SLES) 9 or Red Hat
- Compilers
  - Intel: OpenMP and auto-parallelization
  - Portland Group (PGI): C/C++/Fortran, OpenMP and auto-parallelization, HPF
  - PathScale (Opteron only): C/C++/Fortran compilers, OpenMP and auto-parallelization
  - GNU compilers: C/C++/Fortran77
- Programming models
  - MPI
  - OpenMP, auto-parallelizing compilers
  - Sockets (IPoIB)
  - Hybrid MPI/OpenMP (sketch below)
  - SysV shared memory (on the same node)
  - PVM
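
A minimal hybrid MPI/OpenMP sketch of the kind this stack supports, with MPI spanning nodes and OpenMP threads filling the sockets within each node (illustrative only; compile with mpicc plus the compiler's OpenMP flag, e.g. -mp for PGI or -openmp for Intel):

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's id across the cluster */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* total number of MPI tasks */

        #pragma omp parallel                    /* threads share this node's memory */
        printf("task %d of %d, thread %d of %d\n",
               rank, nprocs, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }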

Software (cont.)
- Libraries
  - Intel MKL: optimized math libraries including BLAS, LAPACK, and FFT
  - AMD ACML
  - FFTW, ATLAS, etc. are available (DGEMM sketch below)
- Debuggers
  - GNU gdb
  - PGDBG graphical debugger
  - TotalView
- Development tools
  - Integrated development environments included with SLES
- Profiling tools
  - Intel VTune
  - Intel Trace Collector / Intel Trace Analyzer
  - mpiP, MPE, PAPI are available
  - PGPROF parallel performance profiler
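
Because these libraries implement the standard BLAS interface, application code stays portable across them; only the link line changes. A minimal sketch calling DGEMM through the CBLAS binding (MKL and ATLAS ship cblas.h; ACML's equivalent is the Fortran-style dgemm_ call):

    #include <stdio.h>
    #include <cblas.h>

    int main(void)
    {
        /* C = alpha*A*B + beta*C with 2x2 row-major matrices */
        double A[] = {1, 2, 3, 4};
        double B[] = {5, 6, 7, 8};
        double C[] = {0, 0, 0, 0};
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,     /* M, N, K */
                    1.0, A, 2,   /* alpha, A, lda */
                    B, 2,        /* B, ldb */
                    0.0, C, 2);  /* beta, C, ldc */
        printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);  /* [19 22; 43 50] */
        return 0;
    }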

Batch Schedulers
- LSF
- PBS Pro
- OpenPBS (sample submission script below)
- SLURM
- Checkpoint/restart is currently being developed
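
For the PBS-family schedulers, a minimal submission script looks like the following (node counts, walltime, and the executable name are placeholders; LSF and SLURM use their own directive syntax, and exact mpirun flags depend on the MPI stack):

    #!/bin/bash
    #PBS -N wrf_run               # job name
    #PBS -l nodes=8:ppn=2         # 8 nodes, 2 processors per node
    #PBS -l walltime=01:00:00     # one-hour limit
    #PBS -j oe                    # merge stdout and stderr

    cd $PBS_O_WORKDIR             # run from the submission directory
    mpirun -np 16 -machinefile $PBS_NODEFILE ./wrf.exe

Jobs are submitted with qsub and monitored with qstat.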

Certified Cluster Services
- Linux cluster training
- Project management
- Comprehensive support
- Datacenter modeling and design
- Site planning
- Application consulting

Training Offerings
- Cluster System Fundamentals
- Migrating HPC Applications to Clusters
- Linux Cluster Administration
- Storage and File Systems
- Resource and Job Management
- Parallel Programming with MPI

Announcements
- Several new product announcements at SC05 in November
  - Mountaineer
  - Trailblazer

WRF Scalability
[Plot: WRF 2.0 on a 425x300x34 domain; speedup relative to 16 tasks (y-axis, 0-18) versus MPI tasks (x-axis, 0-300) for the Opteron cluster with InfiniBand, shown against a linear-scaling curve.]
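
Read in the conventional way, the plotted speedup is S(n) = T(16) / T(n) for n MPI tasks, so ideal scaling follows the line S(n) = n / 16; the closer the measured curve tracks that line out toward 300 tasks, the better the cluster and its interconnect are holding up.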

Thank You