Step-by-Step Guide for Monitoring in Windows HPC Server 2008 Beta 2



Similar documents
Creating and Deploying Active Directory Rights Management Services Templates Step-by-Step Guide

Customizing Remote Desktop Web Access by Using Windows SharePoint Services Stepby-Step

Lab Answer Key for Module 1: Installing and Configuring Windows Server Table of Contents Lab 1: Configuring Windows Server

How To Install Outlook Addin On A 32 Bit Computer

Deploying Remote Desktop IP Virtualization Step-by-Step Guide

Pipeliner CRM Phaenomena Guide Sales Target Tracking Pipelinersales Inc.

Deploying Personal Virtual Desktops by Using RemoteApp and Desktop Connection Step-by-Step Guide

Exclaimer Alias Manager for Exchange Deployment Guide - Exclaimer Alias Manager for Exchange Outlook Add-In

Pipeliner CRM Phaenomena Guide Add-In for MS Outlook Pipelinersales Inc.

Management Reporter Integration Guide for Microsoft Dynamics GP

Microsoft Corporation. Status: Preliminary documentation

Overview of Microsoft Office 365 Development

Management Reporter Integration Guide for Microsoft Dynamics AX

Managing Linux Servers with System Center 2012 R2

Distributed File System Replication Management Pack Guide for System Center Operations Manager 2007

Deploying the Workspace Application for Microsoft SharePoint Online

Pipeliner CRM Phaenomena Guide Administration & Setup Pipelinersales Inc.

Improving Performance of Microsoft CRM 3.0 by Using a Dedicated Report Server

Pipeliner CRM Phaenomena Guide Opportunity Management Pipelinersales Inc.

Deploying Microsoft RemoteFX on a Single Remote Desktop Virtualization Host Server Step-by-Step Guide

Deploying Remote Desktop Web Access with Remote Desktop Connection Broker Step-by- Step Guide

Step-by-Step Guide for Microsoft Advanced Group Policy Management 4.0

Pipeliner CRM Phaenomena Guide Sales Pipeline Management Pipelinersales Inc.

Troubleshooting File and Printer Sharing in Microsoft Windows XP

Master Data Services. SQL Server 2012 Books Online

Windows BitLocker Drive Encryption Step-by-Step Guide

Security Explorer 9.5. User Guide

Spotlight on Messaging. Evaluator s Guide

Pipeliner CRM Phaenomena Guide Lead Management Pipelinersales Inc.

Pipeliner CRM Phaenomena Guide Getting Started with Pipeliner Pipelinersales Inc.

Writers: Joanne Hodgins, Omri Bahat, Morgan Oslake, and Matt Hollingsworth

Update and Installation Guide for Microsoft Management Reporter 2.0 Feature Pack 1

Implementing and Supporting Windows Intune

Integrating Business Portal 3.0 with Microsoft Office SharePoint Portal Server 2003: A Natural Fit

Lab Answer Key for Module 11: Managing Transactions and Locks

Migrating Active Directory to Windows Server 2012 R2

Windows Small Business Server 2003 Upgrade Best Practices

Product Guide for Windows Home Server

Introduction to DirectAccess in Windows Server 2012

Microsoft Dynamics GP. Pay Steps for Human Resources Release 9.0

SQL Server 2005 Reporting Services (SSRS)

Introduction to Hyper-V High- Availability with Failover Clustering

Hands-On Lab: WSUS. Lab Manual Expediting WSUS Service for XP Embedded OS

Windows Server Update Services 3.0 SP2 Step By Step Guide

File and Printer Sharing with Microsoft Windows

ArchestrA Log Viewer User s Guide Invensys Systems, Inc.

UPGRADE. Upgrading Microsoft Dynamics Entrepreneur to Microsoft Dynamics NAV. Microsoft Dynamics Entrepreneur Solution.

Hyper-V Server 2008 Setup and Configuration Tool Guide

How To Set Up A Load Balancer With Windows 2010 Outlook 2010 On A Server With A Webmux On A Windows Vista V (Windows V2) On A Network With A Server (Windows) On

Dell Enterprise Reporter 2.5. Configuration Manager User Guide

RedBlack CyBake Online Customer Service Desk

Microsoft Dynamics GP SQL Server Reporting Services Guide

Windows Azure Pack Installation and Initial Configuration

Microsoft Dynamics CRM Adapter for Microsoft Dynamics GP

Silect Software s MP Author

Lab Answer Key for Module 6: Configuring and Managing Windows SharePoint Services 3.0. Table of Contents Lab 1: Configuring and Managing WSS 3.

Microsoft Dynamics GP. Electronic Signatures

8.7. Resource Kit User Guide

Redeploying Microsoft CRM 3.0

Using Apple Remote Desktop to Deploy Centrify DirectControl

Version Provance Technologies, Inc. All rights reserved. Provance Technologies Inc. 85 Bellehumeur Gatineau, Quebec CANADA J8T 8B7

Sage CRM Connector Tool White Paper

Microsoft Dynamics TM NAV Installation & System Management: Application Server for Microsoft Dynamics NAV

Microsoft Dynamics GP. Check Printing

SQL Server 2014 BI. Lab 04. Enhancing an E-Commerce Web Application with Analysis Services Data Mining in SQL Server Jump to the Lab Overview

BizTalk Server Business Activity Monitoring. Microsoft Corporation Published: April Abstract

Overview of Active Directory Rights Management Services with Windows Server 2008 R2

Using SQL Reporting Services with Amicus

Connector for Microsoft Dynamics Configuration Guide for Microsoft Dynamics SL

Windows Server 2012 R2 Storage Infrastructure

Lab 05: Deploying Microsoft Office Web Apps Server

A SharePoint Developer Introduction

Windows Server ,500-user pooled VDI deployment guide

Integrating Campaign Data with WebTrends

Lab 02 Working with Data Quality Services in SQL Server 2014

NetIQ Aegis Adapter for Databases

EventTracker: Support to Non English Systems

Dell MessageStats for Lync and the MessageStats Report Pack for Lync & OCS 7.3. User Guide

Spotlight on Active Directory Quick Start Guide

ENHANCE. The Style Sheet Tool for Microsoft Dynamics NAV. Microsoft Dynamics NAV 5.0. User s Guide

Veeam Task Manager for Hyper-V

Installing Windows Rights Management Services with Service Pack 2 Step-by- Step Guide

Financial consolidations and currency translation

All other trademarks are property of their respective owners.

Integrating Symantec Endpoint Protection

How To Configure A Windows 8.1 On A Windows (Windows) With A Powerpoint (Windows 8) On A Blackberry) On An Ipad Or Ipad (Windows 7) On Your Blackberry Or Black

CA Nimsoft Monitor Snap

Microsoft Dynamics GP. Audit Trails

Foglight Cartridge for Active Directory Installation Guide

Getting Started Guide

CA Nimsoft Monitor. Probe Guide for Performance Collector. perfmon v1.5 series

Dell Recovery Manager for Active Directory 8.6. Quick Start Guide

Active Directory Deployment and Management Enhancements

Audit Management Reference

How to Secure a Groove Manager Web Site

Lab Answer Key for Module 9: Active Directory Domain Services. Table of Contents Lab 1: Exploring Active Directory Domain Services 1

Business Portal for Microsoft Dynamics GP. Key Performance Indicators Release 10.0

Deploying Microsoft RemoteFX for Personal Virtual Desktops Step-by-Step Guide

Implementing and Supporting Windows Intune

Microsoft Dynamics GP. Engineering Data Management Integration Administrator s Guide

Transcription:

Step-by-Step Guide for Monitoring in Windows HPC Server 2008 Beta 2 Microsoft Corporation Published: May 2008 Abstract This guide provides procedures and guidance for monitoring compute nodes in Windows HPC Server 2008 Beta 2. This guide focuses on cluster monitoring through the HPC Cluster Manager. It does not cover monitoring through System Center Operations Manager.

Copyright This document supports a preliminary release of a software product that may be changed substantially prior to final commercial release, and is the confidential and proprietary information of Microsoft Corporation. It is disclosed pursuant to a non-disclosure agreement between the recipient and Microsoft. This document is provided for informational purposes only and Microsoft makes no warranties, either express or implied, in this document. Information in this document, including URL and other Internet Web site references, is subject to change without notice. The entire risk of the use or the results from the use of this document remains with the user. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, and Windows PowerShell are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are property of their respective owners.

Contents Step-by-Step Guide for Monitoring in Windows HPC Server 2008 Beta 2... 4 Overview... 4 Requirements... 4 Steps for monitoring... 4 Step 1: View cluster status at a glance... 4 Step 2: Drill down into individual node details... 5 Step 3: Customize metric collection... 5 Step 4: Correlate the status of nodes, jobs, operations, and diagnostics... 6 Step 5: Monitoring ongoing operation progress... 6 Logging bugs and feedback... 7 Appendix... 7 MetricsFileFormat.xsd... 7 Example: XML file to add percent CPU Idle Time counter to Heat Map and Performance Charts... 11

Step-by-Step Guide for Monitoring in Windows HPC Server 2008 Beta 2 This document provides procedures and guidance for monitoring compute nodes in Windows HPC Server 2008 Beta 2. Note This document focuses on cluster monitoring through the HPC Cluster Manager. It does not cover monitoring through System Center Operations Manager. Overview A key step in monitoring and maintaining cluster health is to find any deviance from the normal operational state or performance. Windows HPC Server 2008 Beta 2 enables cluster administrators to view cluster and node status, identify problem areas, and drill down into the details for further investigation. Another monitoring aspect that is critical to administrating a cluster is the status of cluster operations. Cluster administrators need to track the changes that have recently occurred or are still happening to their cluster. Windows HPC Server 2008 Beta 2 archives recent operations and allows the administrator to find the operation of interest and observe its progress. Requirements To monitor compute nodes, you need a cluster with Microsoft HPC Pack 2008 Beta 2 deployed. Steps for monitoring This section describes the procedures that are used for monitoring compute nodes: Step 1: View cluster status at a glance Step 2: Drill down into individual node details Step 3: Customize metric collection Step 4: Correlate status of nodes, jobs, operations, and diagnostics Step 5: Monitoring ongoing operation progress Step 1: View cluster status at a glance To view cluster status at a glance 1. In Node Management, in the navigation tree, click Nodes to see all nodes in the cluster. You can click a filtered view to narrow the nodes to a subset of interest.

2. To see the node properties and metric values in List view, in the middle pane, select List. To customize the columns that are shown, in the Tool pane, click Column Chooser. Click column headers to sort the List view by a specific column. 3. In the List view, icons in each row indicate the Nodes that require administrator attention. There are three icons: A red cross indicates unreachable and failed provisioning A yellow warning sign indicates failed diagnostics tests A blue circle indicates ongoing operations 4. To view a metrics heat map for the nodes of interest, in the middle pane, select Heat Map. In the heat map, every cell represents a cluster node, and values for the tracked metrics are represented by depth of color. You can view up to three metrics at a time on the heat map. Current displayed metrics with their color maps are shown at the top of the heat map. In the List view, you can add metrics. Note To customize the range and color of a heat map metrics, on the Toolbar, click Options, and then click Customize View. This opens a dialog box where you can set the Maximum and Minimum metrics for the heat map, the Rollup method, and the Color map for each metric. To view status and performance charts 1. To view charts that show the aggregated status and performance information for the cluster, in the navigation tree, in Charts and Reports, click Monitoring Charts. 2. You can add additional charts by selecting a metric from the list. Step 2: Drill down into individual node details To view detailed information for an individual node 1. In Node Management, in the Node List view, click a node to view details in the details pane (for example, hardware, operating system properties, or current metric numbers). You can also doubleclick the node to open a separate dialog box that shows this information. 2. To view a node's performance over time, in the Actions pane, click View Performance Charts. On this page, you can select individual metrics to chart. 3. To view events that are generated by Windows HPC Server 2008 services on a selected node, click Open Event Viewer. 4. To open a remote desktop session to the node(s) that allows you to access the compute node(s) directly, click Remote Desktop. Step 3: Customize metric collection To add any perfmon counter to the heat map and performance charts 1. Create an XML file that describes the metric to add. (See MetricsFileFormat.xsd in the Appendix for the XML schema. Also included in the Appendix is a sample XML file that adds the metric CPU % Idle

Time. You can see more examples by exporting an XML file that contains current heat map metrics: Use the Windows PowerShell TM cmdlet Export-HpcMetric. 2. Import the XML file that describes the new metric by using the Windows PowerShell cmdlet Import- HpcMetric. 3. Within five minutes, the new metric appears in Heat Map and Performance Charts. 4. To change an existing metric collection (for example, sampling frequency), import the XML file that describes the metric to overwrite an existing setting. 5. To remove a metric from the collection, use the Windows PowerShell cmdlet Remove-HpcMetric. Note To view usage for the Windows PowerShell cmdlet mentioned above, in the Windows PowerShell, open a Command Prompt window, and type Get-Help <cmdlet> -Detailed. Step 4: Correlate the status of nodes, jobs, operations, and diagnostics Correlating the monitoring information between nodes, jobs, and other aspects of the cluster has traditionally been difficult for cluster administrators. Windows HPC Server 2008 solves this issue by allowing the administrator to pivot through this information while keeping the same context. To correlate the status of nodes, jobs, operations, and diagnostics 1. In Node Management, from the Node List view or Heat Map view, select the nodes you want status on. 2. In the Actions pane, click Pivot To, and then choose Jobs, Failed diagnostics, or Operations for the selected nodes. Pivoting paths that are supported include: Nodes Jobs, Failed Diagnostics, Operations Jobs Nodes Diagnostics Failed Nodes, Operations (Progress of the test) Step 5: Monitoring ongoing operation progress To monitor the progress of ongoing operations 1. In Node Management, in the navigation tree, click Operations. A list of all cluster operations from the past seven days appears. 2. Use the filter fields to filter the operations by Definition, Node, and (last) Updated. You can also click the filters in the navigation tree to view operations by Status. 3. Click an operation, and in the Details pane, you can see a detailed log of the operation that you selected. If the operation is still in progress, the log is updated in real time. Alternatively In Node view, select one or more nodes, and then in the Actions pane, click Operations for the node. This takes you to an operations view that is filtered by the selected nodes.

The following table defines the operation status reports: Operation Status Archived Committed Executing Failed Reverted Description Operation is more than 24 hours old or diagnostics test has been cleared. When an operation is archived, it is removed from other status reports. Operation has completed successfully Operation is still in progress Operation has failed to execute, is being reverted, or has failed to revert Operation has reverted after failure or cancellation Logging bugs and feedback To report bugs about monitoring features, send Microsoft the management logs from both the head node and problematic compute nodes. To create management logs 1. On the head node, and any compute nodes that are experiencing problems, set the tracelevel value under the HKEY_LOCAL_MACHINE\Software\Microsoft\hpc registry key to 4. 2. Restart the service. 3. Copy the management log from the following folder: C:\Program Files\Microsoft HPC Pack\Data\LogFiles\HpcManagement.log. Appendix MetricsFileFormat.xsd <?xml version="1.0" encoding="utf-8"?> <xs:schema id="metricsfileformat" targetnamespace=http://schemas.microsoft.com/hpcmetricsfileformat/2007/12 elementformdefault="qualified" xmlns="http://schemas.microsoft.com/hpcmetricsfileformat/2007/12" xmlns:mstns="http://schemas.microsoft.com/hpcmetricsfileformat/2007/12" xmlns:xs="http://www.w3.org/2001/xmlschema" blockdefault="#all">

<xs:element name="metrics" type="metricslist" /> <xs:simpletype name="samplerate"> <xs:restriction base="xs:string"> <xs:enumeration value="second"/> <xs:enumeration value="minute"/> <xs:enumeration value="hour"/> <xs:enumeration value="day"/> </xs:restriction> </xs:simpletype> <xs:simpletype name="storerate"> <xs:restriction base="xs:string"> <xs:enumeration value="never"/> <xs:enumeration value="minute"/> <xs:enumeration value="hour"/> <xs:enumeration value="day"/> </xs:restriction> </xs:simpletype> <xs:simpletype name="metrictarget"> <xs:restriction base="xs:string"> <xs:enumeration value="computenode"/> <xs:enumeration value="headnode"/> <xs:enumeration value="cluster"/> </xs:restriction> </xs:simpletype> <xs:simpletype name="hardwaresensortype"> <xs:restriction base="xs:string"> <xs:enumeration value="temperature"/> <xs:enumeration value="fanspeed"/> <xs:enumeration value="voltage"/> </xs:restriction> </xs:simpletype>

<xs:simpletype name="metriccalculation"> <xs:restriction base="xs:string"> <xs:enumeration value="averagepernode"/> <xs:enumeration value="averageallnodes"/> <xs:enumeration value="sumpernode"/> <xs:enumeration value="sumallnodes"/> </xs:restriction> </xs:simpletype> <xs:complextype name="metricslist"> <xs:choice maxoccurs="unbounded"> <xs:element name="performancecountermetric" type="performancecountermetric" minoccurs="0" maxoccurs="unbounded"/> <xs:element name="hardwaremetric" type="hardwaremetric" minoccurs="0" maxoccurs="unbounded"/> <xs:element name="calculatedmetric" type="calculatedmetric" minoccurs="0" maxoccurs="unbounded"/> </xs:choice> </xs:complextype> <xs:complextype name="metrictype" abstract="true"> <xs:attribute name="name" type="xs:string" use="required"/> <xs:attribute name="displayname" type="xs:string" use="required"/> <xs:attribute name="samplerate" type="samplerate" use="required"/> <xs:attribute name="storerate" type="storerate" use="required"/> <xs:attribute name="description" type="xs:string" use="required"/> <xs:attribute name="unit" type="xs:string" use="required"/> <xs:attribute name="metrictarget" type="metrictarget" use="required"/> <xs:attribute name="minimum" type="xs:float" use="optional"/> <xs:attribute name="maximum" type="xs:float" use="optional"/> </xs:complextype> <xs:complextype name="performancecountermetric"> <xs:complexcontent> <xs:extension base="metrictype">

<xs:attribute name="category" type="xs:string" use="required"/> <xs:attribute name="counter" type="xs:string" use="optional"/> <xs:attribute name="instance" type="xs:string" use="optional"/> </xs:extension> </xs:complexcontent> </xs:complextype> <xs:complextype name="hardwaremetric"> <xs:complexcontent> <xs:extension base="metrictype"> <xs:attribute name="sensortype" type="hardwaresensortype" use="optional"/> <xs:attribute name="sensorinstance" type="xs:string" use="optional"/> </xs:extension> </xs:complexcontent> </xs:complextype> <xs:complextype name="calculatedmetric"> <xs:complexcontent> <xs:extension base="metrictype"> <xs:attribute name="basemetric" type="xs:string" use="required"/> <xs:attribute name="counterinstance" type="xs:string" use="required"/> <xs:attribute name="calculation" type="metriccalculation" use="required"/> </xs:extension> </xs:complexcontent> </xs:complextype> </xs:schema> Attribute Name DisplayName SampleRate StoreRate Description Unit Description Name used to refer to the metric in Windows PowerShell commands. Name displayed in heat map and performance charts. Rate at which counter is sampled and refreshed on the heat map. Rate at which counter is stored and displayed on performance charts. Description of the metric. Unit of the metric value.

Attribute MetricTarget Minimum, Maximum PerformanceCounterMetric: Category, Counter; Instance HardwareMetric: SensorType; SensorInstance CalculatedMetric: BaseMetric; CounterInstance; Calculation Description Where the metric is collected. When the MetricTarget is Cluster, it is displayed in the Cluster Overall charts under Charts and Reports. When the MetricTarget is ComputeNode, it is displayed on heat map and Compute Node performance charts. When the MetricTarget is HeadNode, it is still displayed under Charts and Reports. Metric value range. This is used to define heat map and performance chart ranges. Performance counter definition as shown in perfmon. If Counter and Instance are not defined, all instances are displayed on performance charts, and the average is displayed on the heat map. Hardware counter definition as defined by WMI. If not defined, all instances are displayed on performance charts, and the average is displayed on the heat map. Defines the formula for calculating the counter. Example: XML file to add percent CPU Idle Time counter to Heat Map and Performance Charts <?xml version="1.0" encoding="utf-8"?> <Metrics xmlns:xsi=http://www.w3.org/2001/xmlschema-instance xmlns:xsd=http://www.w3.org/2001/xmlschema xmlns="http://schemas.microsoft.com/hpcmetricsfileformat/2007/12"> <PerformanceCounterMetric Name="CPUIdleTime" DisplayName="CPU Idle Time (%)" SampleRate="Second" StoreRate="Minute" Description="Percentage CPU Idle Time" MetricTarget="ComputeNode" Minimum="0" Maximum="100" Category="Processor" Counter="% Idle Time"

</Metrics> Instance="_Total"/>