Microsoft HPC. V 1.0 José M. Cámara (checam@ubu.es)



Similar documents
Windows HPC Server 2008 R2 Service Pack 3 (V3 SP3)

DigitalPersona Pro Server for Active Directory v4.x Quick Start Installation Guide

"Charting the Course to Your Success!" MOC A Understanding and Administering Windows HPC Server Course Summary

WINDOWS AZURE AND WINDOWS HPC SERVER

MATLAB Distributed Computing Server with HPC Cluster in Microsoft Azure

Windows Compute Cluster Server Top 5 Most Common Challenges

CLUSTER COMPUTING TODAY

Deploying Cisco Unified Contact Center Express Volume 1

Microsoft Compute Clusters in High Performance Technical Computing. Björn Tromsdorf, HPC Product Manager, Microsoft Corporation

Eine CAE Infrastruktur für LS-DYNA. unter Verwendung von. Microsoft Windows HPC Server 2008

Batch Systems. provide a mechanism for submitting, launching, and tracking jobs on a shared resource

LSKA 2010 Survey Report Job Scheduler

Quick Start Guide. User Manual. 1 March 2012

Deploying Windows Streaming Media Servers NLB Cluster and metasan

Optimizing Shared Resource Contention in HPC Clusters

Desktop Central Managing Windows Computers in WAN

Chapter 2: Getting Started

Chapter 1 - Web Server Management and Cluster Topology

Experience with Server Self Service Center (S3C)

Microsoft Technical Computing The Advancement of Parallelism. Tom Quinn, Technical Computing Partner Manager

Introduction. Before you begin. Installing efax from our CD-ROM. Installing efax after downloading from the internet

FileMaker Pro 13. Using a Remote Desktop Connection with FileMaker Pro 13

Server Manager Performance Monitor. Server Manager Diagnostics Page. . Information. . Audit Success. . Audit Failure

Uptime Infrastructure Monitor. Installation Guide

SAM XFile. Trial Installation Guide Linux. Snell OD is in the process of being rebranded SAM XFile

Administrator s Guide

Hardware/Software Guidelines

Private cloud computing advances

Alfresco Enterprise on Azure: Reference Architecture. September 2014

ACTIVE DIRECTORY DEPLOYMENT

NZQA Expiring unit standard 6857 version 4 Page 1 of 5. Demonstrate an understanding of local and wide area computer networks

A High Performance Computing Scheduling and Resource Management Primer

Deploying System Center 2012 R2 Configuration Manager

ArcGIS for Server in the Cloud

Mitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform

Develop a process for applying updates to systems, including verifying properties of the update. Create File Systems

Scalability and Classifications

LogMeIn Hamachi. Getting Started Guide

Active-Active and High Availability

Network Station - Thin Client Computing - Overview

Silect Software s MP Author

Position Description. Job Summary: Campus Job Scope:

Vess A2000 Series HA Surveillance with Milestone XProtect VMS Version 1.0

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

For Active Directory Installation Guide

Administrator s Guide

The Windows Server 2003 Environment. Introduction. Computer Roles. Introduction to Administering Accounts and Resources. Lab 2

Aqua Connect Load Balancer User Manual (Mac)

Drobo How-To Guide. Topics. What You Will Need. Prerequisites. Deploy Drobo B1200i with Microsoft Hyper-V Clustering

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

An HPC Application Deployment Model on Azure Cloud for SMEs

Avoiding Performance Bottlenecks in Hyper-V

Deployment Planning Guide

locuz.com HPC App Portal V2.0 DATASHEET

THE WINDOWS AZURE PROGRAMMING MODEL

The Nuts and Bolts of Autodesk Vault Replication Setup

Why is the V3 appliance so effective as a physical desktop replacement?

Windows Server 2008 R2 Hyper-V Live Migration

Configuring and Launching ANSYS FLUENT Distributed using IBM Platform MPI or Intel MPI

System Center 2012 R2 Configuration Manager Toolkit

Cisco Active Network Abstraction Gateway High Availability Solution

Citrix Application Streaming. Universal Application Packaging and Delivery Breaking Away from Traditional IT

Using Windows HPC Server 2008 Job Scheduler

Load Balancing Exchange 2007 SP1 Hub Transport Servers using Windows Network Load Balancing Technology

Configuring a Custom Load Evaluator Use the XenApp1 virtual machine, logged on as the XenApp\administrator user for this task.

Installing GFI Network Server Monitor

Client/Server Computing Distributed Processing, Client/Server, and Clusters

Getting Started with HC Exchange Module

AdminToys Suite. Installation & Setup Guide

Deploying and Optimizing SQL Server for Virtual Machines

Configuring RemoteFX on Windows Server 2012 R2

Cisco UCS CPA Workflows

Course Syllabus. Microsoft Dynamics GP Installation & Configuration. Key Data. Introduction. Audience. At Course Completion

Desktop Authority and Group Policy Preferences

PC Power Management FAQ

ArcGIS for Server: In the Cloud

Installing GFI Network Server Monitor

Wyse Device Manager TM

Getting Started Guide

Assignment # 1 (Cloud Computing Security)

DataCentred Cloud Storage

How to monitor AD security with MOM

Administration Guide. . All right reserved. For more information about Specops Deploy and other Specops products, visit

DB2 Connect for NT and the Microsoft Windows NT Load Balancing Service

Setting Up Exchange. In this chapter, you do the following tasks in the order listed:

EnduraData Cross Platform File Replication and Content Distribution (November 2010) A. A. El Haddi, Member IEEE, Zack Baani, MSU University

Web based training for field technicians can be arranged by calling These Documents are required for a successful install:

Work Environment. David Tur HPC Expert. HPC Users Training September, 18th 2015

Virtual desktops made easy

ION EEM 3.8 Server Preparation

Cloud Computing. Adam Barker

1. Server Microsoft FEP Instalation

WHITE PAPER SETTING UP AND USING ESTATE MASTER ON THE CLOUD INTRODUCTION

virtualization.info Review Center SWsoft Virtuozzo (for Windows) //

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip

Cloud & Datacenter Monitoring with System Center Operations Manager

EView/400i Management Pack for Systems Center Operations Manager (SCOM)

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

VEEAM ONE 8 RELEASE NOTES

Windows Server 2008 R2 Hyper V. Public FAQ

Transcription:

Microsoft HPC V 1.0 José M. Cámara (checam@ubu.es)

Introduction Microsoft High Performance Computing Package addresses computing power from a rather different approach. It is mainly focused on commodity clusters. It is meant to make the most of existing resources thus minimizing investment. The system is highly scalable due to both on premises and cloud resources allocation. Overall performance is affected by customer s investment on the architectural subsystems: compute nodes hardware & interconnect.

Cluster architecture Head node Windows Server + HPC Pack WAN Compute nodes Windows 7, 8, (Pro. Ent.), Server + HPC Pack Cloud nodes Windows Azure LAN / SAN Workstation nodes Windows 7, 8, (Pro. Ent.), Server + HPC Pack

Cluster architecture II Types of nodes Head node: no particular hardware features are required (double network adapter). Compute node: devoted to HPC. Workstation: general purpose computer. It joins the cluster to perform calculation when not used for other purposes. They may be brought online manually or according to a timetable. Cloud node: virtual computer hired to Microsoft. Operating system The head node must run Windows Server. The rest nodes in premises may be servers but they also admit conventional Microsoft operating systems usually restricted to professional, enterprise or ultimate versions. Cloud nodes are integrated under Windows Azure. Network architecture A conventional WAN connection is used for both remote management and connecting cloud resources. Obviously the wider bandwidth, the better. On premises resources are connected by means of either LAN or SAN solutions; most vendor systems are supported. Nodes can be linked though private networks, corporate or both.

Cluster deployment Server deployment OS installation. Active domain configuration. HPC Pack Installation. Head node configuration Node deployment Definition of cluster topology. Configuration of communications. User policies. OS installation. Entering the active domain. HPC Pack Installation. Entering the cluster.

Active domain For all the cluster operations and services to work safely, all nodes must belong to a common domain or at least to different domains with trust relations established. The domain is set as a new forest and usually the server is promoted to domain controller, unless other server is in charge. All nodes and users will set as domain members so they can join the cluster.

HPC Pack The high performance computing package (HPC Pack) is provided by Microsoft free of charge. The same package is valid for any type of node to join the cluster. Depending on the type of node selected a different set of tools is installed. In all cases, a group of software tools is made available to the user for cluster management.

Cluster topologies Microsoft uses this concept to define a series of in premises network connection alternatives. It is not related to the network graph actually. Enterprise network Enterprise network Topology 1: compute & workstation nodes only connected to private network. Private network Enterprise network Private network Topology 1: compute nodes connected to private network. Workstations only connected to enterprise network Enterprise network Topology 2: compute nodes connected to enterprise & private networks. Workstations connected to both or just to private network. Private network Private network Enterprise network Topology 3: compute nodes connected to private & application networks. Workstations connected to both or to enterprise network. Private network Application network (SAN)

Cluster topologies II Enterprise network Topology 4: compute nodes connected to enterprise, private & application networks. Workstations connected to all or only to enterprise network. Private network Application network (SAN) Enterprise network Topology 5: no private or application network. All nodes connected to enterprise network The user is not compelled to match the actual network configuration when setting the topology. Enterprise network may exist on workstations but may not be used for HPC.

Cluster management The system administrator is in charge of configuration, supervision and cluster management. The Cluster Manager is the tool that makes all these operations possible: Node management. Users management. Scheduler configuration.

Node management Add / remove nodes. Bring workstations online/offline: only if this operation has previously been set as manual and not according to a certain schedule. Form node groups to facilitate management. For instance, set certain types of jobs to be allocated to certain node groups. Monitoring nodes: node health, properties, network, metrics

Users management Add / remove users. Manage user groups. Set user roles: User: allowed to manage its own jobs. Administrator: allowed to manage jobs and resources. Job administrator: allowed to manage jobs but not resources. Job operator: allowed to manage jobs in a restricted manner (view, cancel, finish, re-queue).

Scheduler management The scheduler decides what jobs to launch, when and what resources must be assigned to them. This is done under some policies: Queued. Balanced. Backfilling is permitted: smaller jobs can be launched before others ahead of the queue when they are waiting for enough resources and as long as this operation does not delay the awaiting jobs.

Preemption: higher priority jobs are allowed to take away resources from lower priority ones already started. Job scheduling policies Sub-policies Dynamic resource allocation: resources allocated to a job can be modified during execution. Graceful: resources are taken away from already finished tasks. Immediate: all job s tasks are cancelled at run time. Task level: tasks are cancelled individually. Automatic increase: new resource allocation is preferred over starting lower priority jobs. Automatic decrease: take away unused resources from jobs with no tasks to be started. Queued Politicies Balanced Tries to start jobs in queue order. Optimized for large jobs. Default graceful preemption. Default automatic increase / decrease enabled. Jobs are started as soon as they have the minimum required resources. New available resources are allocated to running jobs according to their priority. Optimized for small and interactive jobs. Default immediate preemption.

Job Manager The users submit jobs to the cluster. Jobs are made up from tasks. The Job Manager is the tool used to create and submit jobs.

Task types Basic Runs a single instance of a series or parallel application. Parametric sweep Runs a number of instances of the same tasks. The exact number is determined by a command that takes different values. The command s value itself is meaningful to each instance. Node preparation A command or script to be run on each node. It is executed prior to any other task in the job. Node release A command or script to be run on each node. It is executed when the node is released from the job. Service Runs a command or service on each resource allocated to a job. If an instance of the command exits and the resource is still allocated, another one starts.

Job types Job It is the most general type. The two others can be made on this form as well. The jobs may comprise multiple tasks and parametric sweeps. Single-task job Meant to make the configuration of simple jobs easier. It is used for single task jobs. Parametric sweep job Meant to make the configuration of simple jobs easier. It is used for a single parametric sweep task.

Job properties Job ID Job name Job template Priority Run time Memory Licenses Numeric ID of the job. Assigned by the Job Manager. Text name of the job. User assigned. Name of the job template used to submit the job. Templates define default values and constraints to the jobs and are created by the administrator. Numeric value to set the priority of the job. Ranges from 0 to 4000, being 4000 the highest priority. Maximum time to complete the job. If exceeded the job is cancelled. The minimum amount of memory on a node to run the job. Licenses requested to run the job. Apart from the name, setting the rest is not mandatory. There are some more properties. For a complete list refer to: https://technet.microsoft.com/es-es/library/ff919649

Job submission & monitoring. Job submission using the Job Manager. Jobs can be created and submitted logging on to a node and starting the Job Manager. Choose the type of job and fill the fields in the form. Set the command line and the output files. Default values may be left unchanged if not needed. Jobs can be submitted remotely. A Remote Desktop Connection to a cluster node must be stablished first. For a complete reference on how to submit and monitor jobs remotely refer to: https://technet.microsoft.com/en-us/library/gg315415%28v=ws.10%29.aspx

Queued scheduling. Aims to provide as many resources as possible to running jobs. Different preemption options can be selected. It is also possible to decide how allocated resources are adjusted. Main queued scheduling options.

Balanced scheduling. Main balanced scheduling options. Tries to start as many jobs as possible even though the maximum required resources are not available. Different preemption options can be selected. The bias applied to priority when rebalancing is also selectable. Rebalancing time is configurable. The goal is to balance flexibility and efficiency.

References Cluster network topologies: https://technet.microsoft.com/enus/library/gg145543.aspx Node management: https://technet.microsoft.com/es-es/library/ff919378 Managing cluster users: https://msdn.microsoft.com/enus/library/ff919335.aspx New cluster user roles: http://blogs.technet.com/b/windowshpc/archive/2013/09/04/hpc-pack- 2012-sp1-available.aspx Job scheduler: https://technet.microsoft.com/es-es/library/ff919436 Job manager: https://technet.microsoft.com/es-es/library/ff919691