Microsoft Windows Compute Cluster Server 2003 Evaluation



Similar documents
Windows HPC 2008 Cluster Launch

Using the Windows Cluster

Microsoft Compute Clusters in High Performance Technical Computing. Björn Tromsdorf, HPC Product Manager, Microsoft Corporation

walberla: A software framework for CFD applications on Compute Cores

walberla: A software framework for CFD applications

Microsoft Windows Compute Cluster Server 2003 Getting Started Guide

Technical Overview of Windows HPC Server 2008

Interoperability between Sun Grid Engine and the Windows Compute Cluster

How to Run Parallel Jobs Efficiently

System Requirements Table of contents

Parallel Debugging with DDT

Configuring and Launching ANSYS FLUENT Distributed using IBM Platform MPI or Intel MPI

High Performance Computing in Aachen

virtualization.info Review Center SWsoft Virtuozzo (for Windows) //

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005

Pcounter Web Report 3.x Installation Guide - v Pcounter Web Report Installation Guide Version 3.4

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Running applications on the Cray XC30 4/12/2015

The Asterope compute cluster

Experiences with HPC on Windows

Assignment # 1 (Cloud Computing Security)

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003

Asta Powerproject Enterprise

Multicore Parallel Computing with OpenMP

HPC at IU Overview. Abhinav Thota Research Technologies Indiana University

Virtualization Guide. McAfee Vulnerability Manager Virtualization

Debugging with TotalView

Acronis Backup & Recovery 10 Advanced Server Virtual Edition. Quick Start Guide

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

7.x Upgrade Instructions Software Pursuits, Inc.

WebSphere Business Monitor V7.0 Configuring a remote CEI server

Freshservice Discovery Probe User Guide

Debugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma

ANSYS Remote Solve Manager User's Guide

High Productivity Computing With Windows

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

DIABLO VALLEY COLLEGE CATALOG

High Performance Computing in CST STUDIO SUITE

Log Analyzer Reference

Using NeSI HPC Resources. NeSI Computational Science Team

MPI / ClusterTools Update and Plans

Microsoft HPC. V 1.0 José M. Cámara (checam@ubu.es)

Multi-Threading Performance on Commodity Multi-Core Processors

XenClient Enterprise Synchronizer Installation Guide

Toward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster

Quick Start Guide for VMware and Windows 7

PHD Virtual Backup for Hyper-V

INSTALLATION GUIDE ENTERPRISE DYNAMICS 9.0

- An Essential Building Block for Stable and Reliable Compute Clusters

Synchronizer Installation

Setup Cisco Call Manager on VMware

Request Manager Installation and Configuration Guide

StarWind Virtual SAN Hyper-Converged Platform Quick Start Guide

Kaseya Server Instal ation User Guide June 6, 2008

Pedraforca: ARM + GPU prototype

October Gluster Virtual Storage Appliance User Guide

Virtual Desktop Infrastructure (VDI) made Easy

RA MPI Compilers Debuggers Profiling. March 25, 2009

Introduction to the NI Real-Time Hypervisor

Table of Contents. Introduction...9. Installation Program Tour The Program Components...10 Main Program Features...11

Streamline Computing Linux Cluster User Training. ( Nottingham University)

Installation and Configuration Guide for Windows and Linux

Windows Clients and GoPrint Print Queues

Site Configuration SETUP GUIDE. Windows Hosts Single Workstation Installation. May08. May 08

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

e-business Suite Server Install Guide

Clusters: Mainstream Technology for CAE

IceWarp Server. Log Analyzer. Version 10

MSSQL quick start guide

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

Requirements & Install. Module 2 Single Engine Installation

Microsoft Visual Basic Scripting Edition and Microsoft Windows Script Host Essentials

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Deploying Microsoft Operations Manager with the BIG-IP system and icontrol

Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC

TANDBERG MANAGEMENT SUITE 10.0

Installation and Configuration Guide for Windows and Linux

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Quick Start - NetApp File Archiver

Numerical Calculation of Laminar Flame Propagation with Parallelism Assignment ZERO, CS 267, UC Berkeley, Spring 2015

Deploying Windows Streaming Media Servers NLB Cluster and metasan

UltraBac Documentation. UBDR Gold. Administrator Guide UBDR Gold v8.0

13.1 Backup virtual machines running on VMware ESXi / ESX Server

HPC Wales Skills Academy Course Catalogue 2015

VMware vrealize Operations for Horizon Administration

INSTALL AND CONFIGURATION GUIDE. Atlas 5.1 for Microsoft Dynamics AX

Introduction to HPC Workshop. Center for e-research

Transcription:

Microsoft Windows Compute Cluster Server 2003 Evaluation Georg Hager,, Johannes Habich (RRZE) Stefan Donath (Lehrstuhl für Systemsimulation) Universität Erlangen-Nürnberg rnberg ZKI AK Supercomputing 25./26.10.2007, GWDG

Outline Administrative issues Installing the Head Node Cluster Network Topology RIS-Unattended Installation Domain integration User Environment Benchmarks for Intel MPI PingPong Current user projects Evaluating Excel integration LINPACK sample Homebrew VBA macros for simple Jacobi benchmark NUMA and affinity issues Conclusions 2

Opteron Test Platform 7 quad Opteron nodes (Dual Core Dual Socket) 4 GB per node 8GB on head node Windows 2003 Enterprise + Compute Cluster Pack Visual Studio 2005, Intel compilers, MKL, ACML Star-CD Gbit Ethernet Access via RDP or ssh (sshd from Cygwin) GUI tool for job control: Cluster Job Manager CLI: job.cmd script 3

Cluster Layout RRZE ccsmaster ccsfront Plan: ccsfront as VMWare virtual machine on master node Only machine accessible to users Easily movable However: Serious problems with Visual Studio and Intel compilers inside VM Same procedures work on ccsmaster Being investigated For now, people work on ccsmaster Switch ccs002 ccs003 ccs004 ccs005 ccs006 ccs007 ccs008 4

Installing the Head Node Preinstalled headnode and cluster nodes from transtec with evaluation version of WinCCS2003 Preliminary network connection bandwidth evaluation with Intel IMB benchmarks No SUA support Clean installation of Win2003 Enterprise R2 x64 Installation of Intel Fortran and C compiler with Visual Studio 2005 integration Intel C++ Compiler 9.1 and 10.0 Intel Fortran Compiler 9.1 Intel MKL 9.0 5

Cluster Network Topology Separate private and public 1GE networks available DHCP Server could not separate the two scopes to two physical network adapters DHCP Server is reconfigured without warning for Remote Installation Services (RIS) to install nodes unattended only private network was used, with NAT translation for outside communication 6

RIS Unattended Installation Creating the RIS (Remote Installation Server) Image from Win2003 installation CD By-hand inlining of R2 necessary packets and settings from Win2003 installation CD 2 Adjustment of wrong paths inside the configuration files First attempt to install RIS caused a complete DHCP breakdown, as RIS changes complete DHCP configuration and launches without proper rights for standard scope 192.0.0.x After that, RIS installed compute nodes flawlessly 7

RIS Unattended Installation Domain Administrator rights are necessary for creating automatically new computer accounts inside domain RIS deploy wizard warns user that password is visible in plain text during deployment huge security risk during each setup! Suggestions: No plain text passwords Easy wizard creation of standard Win2003 images DHCP Server with multiple instances for each network interface Failure messages should be more elaborate 8

Domain Integration Headnode and cluster nodes were integrated into UNI-Erlangen ADS Problem of supplying a working directory which is both available as a network share via Windows Explorer and available to the jobs running on the cluster User works on mounted network share Job works on complete UNC path leading to same network share One common path for both, but UNC for users is not a preferred choice Automatic mapping of this location at job start and user login Problems of some Windows software to work with UNC (CMD.exe, partly Visual Studio 2005) 9

Intel MPI Ping Pong benchmark Compiling inside SUA and Visual Studio 2005 Issuing jobs with standard MPIEXEC causes 4 threads to run on each node internode communication measured Hence: Fun and interesting ways to run MPI Jobs on CCS from http://windowshpc.net/ Now: pernode.bat script hacks %CCP_NODES% system variable and only one task is executed per compute node Measuring real internode connection bandwidth and latency Desirable: flexible specification for processors per node at job submission and runtime 11

Intel MPI Ping Pong benchmark PingPong INTEL MPI Benchmark on CCS (small packets / latency test) Latency [µsec] 200 180 160 140 120 100 80 60 40 20 0 1 10 100 1000 Message length [bytes] MS: Variations due to interrupt coalescing in GE driver No option to switch off in our driver 12

Projects on the WinCCS Cluster Crack propagation (FAU, Materials Science) F90/OpenMP, C++ (OpenMP to come) VirtualFluids (TU Braunschweig) C++/MPI Star-CD (FAU, Fluid Mechanics) walberla (FAU, System Simulation) C++/MPI 13

walberla project @. Widely applicable lattice Boltzmann from Erlangen 25.10.2007 hpc@rrze.uni-erlangen.de CFD project based on lattice Boltzmann method Modular software concept Supports various applications, currently planned: Blood flow in aneurysms Moving particles and agglomerates Free surfaces to simulate foams, fuel cells, a.m.m. Charged colloids Arbitrary combinations of above Integration in efficient massive-parallel environment Standardized input and output routines User-friendly interface Platform independency with CMAKE WinCCS@RRZE 14

walberla project @. Widely applicable lattice Boltzmann from Erlangen Porting issues concerning CMake: CMake has to be configured to find MPI Not possible to specify Cluster Debugger Configurations via CMake (overwrites settings when project is built) Visual Studio & Queues Not possible to automatically submit and debug parallel job via Visual Studio Debugging issues MPI Cluster Debugger: Configuration pain to run jobs on remote sites Remote Debugger not able to connect to queued jobs 15

Goals WCCS as a development environment Visual Studio Parallel debugging Different compilers Some issues (Intel project system, parallel debugging), but ok in general Coupling of CCS cluster to MS Excel by use of VBA Job construct, submit Result retrieval Visualization Behaviour of Windows on a ccnuma architecture Locality & affinity issues Buffer cache 18

Excel-CCS coupling CCS API available for C++ and VBA Documentation issues for VBA, but many examples online: http://www.microsoft.com/technet/scriptcenter/ scripts/ccs/job/default.mspx?mfr=true API is part of Office Professional only Example Excel Sheet and VBA macros for LINPACK parameter scans provided Toy project: Make Excel sheet for simple heat equation solver with graphical output of performance numbers 19

Evaluating Excel Integration (LINPACK Eample) Taking precompiled binaries offered by MS with ACML Tuning LINPACK parameters as suggested in: Hands-On Lab Building HPC LINPACK Tool Issuing jobs directly to Job Manager Querying jobs Results are instantly plotted in Excel 12.00 10.00 8.00 6.00 4.00 2.00 0.00 Gflops Matrix 44000 60 64 68 72 76 80 84 88 92 96 Blocksize 20

Excel-CCS coupling Principles of operation Provide Excel worksheet with necessary parameters Binary name, working dir Number of CPUs, walltime limit Input parameters for application Position active elements (buttons, ) linked to VBA macros VBA communicates with CCS using XML First time generation of XML structure from XML Schema template file (saved from Job Manager app.) scratch Link entries to worksheet cells Use VBA-CCS API to construct and submit jobs Many options possible Collect and parse output data and fill cells Simple visualization possible using Excel graphs 21

job.xml template <?xml version="1.0" encoding="utf-8" standalone="yes"?> <Job xmlns:ns1="http://www.microsoft.com/computecluster/" Name="job1" MaximumNumberOfProcessors="4" MinimumNumberOfProcessors="4" Owner="unrz55" Priority="Normal" Project="Jacobi" Runtime="Infinite"> <ns1:tasks> <ns1:task Name="task1" CommandLine="echo 10 10 5 heat_ccs.exe" MaximumNumberOfProcessors="4" MinimumNumberOfProcessors="4" Runtime="Infinite" WorkDirectory="\\Ccsmaster\ccsshare\unrz55\xlstest\" Stderr="3700-3700-50-err.out" Stdout="3700-3700-50-heat.out /> </ns1:tasks> </Job> 22

Excel-CCS coupling 23

Submitting a Job with VBA in Excel connect to cluster, defined by xls cell Cluster Set objcluster = CreateObject("Microsoft.ComputeCluster.Cluster") objcluster.connect (Range("Cluster").Value) Job object from XML description Set Job = objcluster.createjobfromxml(strxml) obtain user credentials Set WshNetwork = CreateObject("WScript.Network") UserName = WshNetwork.UserDomain & "\" & WshNetwork.UserName submit job to queue ID = objcluster.queuejob((job), UserName, "", False, 0) Many other CCS-API and general VBA functions available Status query, job cancel etc. VBA: Regexp package 24

Result retrieval e.g. by VBScript Regexp Package 25

Windows on ccnuma Locality and congestion problems on ccnuma First touch policy for memory pages ensures local placement Watch OpenMP init loops P P Even if placement is correct, make sure it stays that way pin threads/processes to initial sockets Issue with OpenMP and MPI To make matters worse, FS buffer cache can fill LDs so that apps must use nonlocal memory Use memory sweeper before production How does all this work on Windows? C C MI C C Memory LD0 data data BC P P C C C C MI Memory LD1 data BC 26

NUMA Placement and Pinning with Heat Conduction Solver (Relaxation) placement+pinning placement only no placement MLUPs/s 800 700 600 500 400 300 Pinning benefit is only due to better NUMA locality! additional pinning: +30% 200 100 0 4MB L2 limit NUMA placement: +60% 100 250 400 550 700 850 1000 1150 1300 1450 1600 1750 1900 2050 2200 2350 2500 2650 2800 27 N

Buffer Cache and Page Placement 400 350 No file write 300 250 MLUPs/s 200 150 100 50 0 How to limit FS buffer cache in Windows? 1GB file write in LD0 before benchmark 2GB working set limit 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 N 28

Future plans Test of Ansys CFX 11 Test of StarCD Test of Allinea DDTLite for Visual Studio WalbErla (CFD) development and evaluation of Windows Performance (see Projects) Customized Excel sheets for standard production applications (see above) 29

Conclusions Well-known development environment with HPC add-ons Batch system/scheduler is not enterprise-class Ease of use (develop/compile/debug/job submit) Room for improvement with parallel debugging Similar ccnuma issues as with Linux, same remedies Process/thread pinning absolutely essential CCS VBA API is extremely fun to play with May be attractive to production-only users Still lacks some coherent documentation Only available with Office Professional 30