
WHITE PAPER
APRIL 2002

Cluster Management Solution: ClusterWorX 2.1 from Linux NetworX

www.linuxnetworx.com 1.800.214.9100

CONTENTS
Introduction
Overview
Features
Architecture
Monitoring
ICE Box
Events
Plug-ins
Image Manager
Benchmarks
ClusterWorX Lite

INTRODUCTION
Linux clusters have become the high-performance computing (HPC) engine of choice for many industries seeking raw number-crunching power with greater flexibility, reliability, scalability, and price/performance than traditional supercomputers. Early adopters of the technology have viewed setting up an efficient and powerful Linux cluster as a challenge, and have typically had a dedicated administration staff to keep the cluster stable. As cluster systems scale from dozens to hundreds and even thousands of processors, management grows far more complex and can be a daunting challenge for any organization. Keeping software up to date, monitoring hardware and software status, and even performing routine maintenance require significant effort.

To reduce this effort, Linux NetworX has developed ClusterWorX, a simple, user-friendly solution for cluster administration. ClusterWorX makes organizations more efficient, allowing them to dedicate resources to their applications instead of to system management. This white paper describes the features and benefits of the ClusterWorX 2.1 cluster management software from Linux NetworX.

CLUSTERWORX 2.1 OVERVIEW
ClusterWorX gives administrators an easy way to manage and control their cluster. It provides secure remote monitoring and management down to the component level of each individual node. Its flexible, scalable architecture lets users add compute nodes easily as system demands increase, and it is designed to be customized and extended to accommodate administrator-specific application requirements. An event engine lets users set up automatic notification and automatic administration actions based on a variety of monitored values. Integrated cloning allows administrators to load or update the operating system and other cluster applications on single nodes or on the entire cluster simultaneously.

CLUSTERWORX 2.1 FEATURES
- Visual monitoring of node groups or the entire cluster
- Plug-in support for administrator-defined monitors and actions
- Three-tiered application: GUI, server, and agent
- Event engine for automatic system administration
- Secure, remote access via SSL over Apache
- Complete node power control
- Monitoring of CPU, disk I/O, network bandwidth, and swap usage
- Automatic administrator notification upon node failure
- Integrated disk cloning and image management
- Easy-to-use, customizable GUI with tabs for easy configuration
- Full integration with ICE Box hardware; also works without proprietary management hardware
- Licensing management
- ClusterWorX Lite version available for clusters of 16 or fewer nodes
- IP communication to ICE Box for greater scalability
- LM sensors support for monitoring
- SSH and serial communication with nodes
- Parallel, SSH-based command-line (CLI) controls

ARCHITECTURAL OVERVIEW
ClusterWorX consists of three tiers of software: a browser-based GUI, a Java servlet engine residing on a "host" node, and an agent residing on each node in the cluster (see Figure 1). The GUI client, built with Java Swing components, provides a professional, portable interface to administrators through their browser. To optimize start-up performance, the client caches the GUI, so after the initial download ClusterWorX starts quickly regardless of connection speed. The server software consists of Java servlets and a database that stores monitoring information. The ClusterWorX agent, written in C++, resides on the nodes and performs monitoring and administration functions as directed by the host. The agent provides an easily extensible plug-in interface through an embedded Perl interpreter.

The three tiers are developed and maintained as independent modules for portability, so any tier can be updated or replaced independently. This modular design creates a platform on which customers can write their own applications to work within the ClusterWorX environment. The three-tiered architecture is also vital to making secure, remote access possible. A diagram of the ClusterWorX architecture is shown in Figure 1.

Figure 1: The ClusterWorX architecture.
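The agent tier described above runs plug-ins through its embedded Perl interpreter and reports their values to the host. The following minimal Perl sketch pictures that loop. It is not the ClusterWorX agent (which is written in C++); the plug-in registry, the `collect_samples` and `poll_plugins` helpers, and the callback standing in for the send to the host are all hypothetical.

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# Hypothetical registry of monitor plug-ins (name => code reference).
# The names and calling convention are assumptions for illustration only.
my %monitor_plugins = (
    hostname => sub { hostname() },
    failing  => sub { die "sensor not present\n" },   # simulates a broken plug-in
);

# Run every registered plug-in once and collect name => value pairs.
# Each call is wrapped in eval so one failing plug-in cannot stop the
# agent; a failure is recorded as undef.
sub collect_samples {
    my ($plugins) = @_;
    my %samples;
    for my $name (sort keys %$plugins) {
        my $value = eval { $plugins->{$name}->() };
        $samples{$name} = $@ ? undef : $value;
    }
    return \%samples;
}

# Poll the plug-ins $rounds times, $interval seconds apart, handing each
# set of samples to $report (a stand-in for forwarding them to the host).
sub poll_plugins {
    my ($plugins, $interval, $rounds, $report) = @_;
    for my $round (1 .. $rounds) {
        $report->(collect_samples($plugins));
        sleep($interval) if $round < $rounds;
    }
}

# Usage: two polling rounds 30 seconds apart, printing each sample set.
# poll_plugins(\%monitor_plugins, 30, 2, sub {
#     my ($s) = @_;
#     print join(', ', map { "$_=" . ($s->{$_} // 'n/a') } sort keys %$s), "\n";
# });
```

Wrapping each plug-in call in `eval` is one simple way to keep a misbehaving plug-in from taking the rest of the collection loop down with it.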

MONITORING
ClusterWorX's advanced monitoring lets administrators view more than 50 specific system details, including CPU usage, CPU type, network bandwidth, memory usage, disk I/O, and system uptime. System details are grouped into tabs, so various monitoring functions are consolidated while remaining highly configurable (see Figure 2).

Figure 2: Administrators can customize the ClusterWorX GUI to monitor more than 50 different values.

By default, ClusterWorX monitors power, temperature, and network connectivity; network connectivity is checked through the UDP echo port. Monitored values are sent at administrator-defined intervals from the agents running on the individual nodes to servlets on the host server. These values are stored on the server and forwarded to clients on request. In contrast to purely server-based monitoring technologies, the client does all of the work of graphing the monitoring data and calculating averages. This keeps monitoring quick and up to date, even over a slow modem connection.

ClusterWorX also gives administrators expanded information about the health of their clusters through LM sensors readings, which let administrators monitor voltage, power consumption, fan speed, and the internal temperature of the motherboard.

ICE BOX SUPPORT
ICE Box, a Linux NetworX hardware management tool, provides ClusterWorX with its advanced power monitoring and power control. Through ICE Box, ClusterWorX hard-powers nodes up and down, performs power resets, and monitors the temperature of individual nodes.

ClusterWorX and ICE Box 2.1 provide improved compatibility. Instead of using a serial daisy chain, ClusterWorX creates a virtual IP daisy chain over the network connection, a more reliable and scalable solution than the serial network used previously. A serial connection can still serve as a backup: in case of network failure, administrators can retrieve information about an individual ICE Box or node through the serial port. Either method allows administrators to create a network of up to 256 ICE Boxes, creating a powerful and scalable system management solution.

ClusterWorX can function with ICE Box or without proprietary management hardware. On installation, ClusterWorX detects the available management hardware and installs and configures itself accordingly. Although ClusterWorX works with other management hardware, its full benefits and features are available only when it is connected through ICE Box. ClusterWorX users whose systems use ICE Box 2.1 also benefit from managing ICE Box through the ClusterWorX GUI: settings such as the ICE Box ID number, IP address, netmask, gateway address, and ICE Box password can be updated quickly through ClusterWorX.

EVENTS/AUTOMATIC ADMINISTRATION
When cluster problems arise, administrators can configure ClusterWorX to automatically power down, reboot, or halt any malfunctioning node. This is accomplished through an event engine, which lets administrators set thresholds on any of the monitored values described above (see Figure 3), so corrective action can be taken before problems become critical.

Figure 3: A sample ClusterWorX event.

If an administrator-defined threshold is exceeded, ClusterWorX automatically triggers an action. Default actions include node power down and node reboot.
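The threshold mechanism just described can be illustrated with a minimal Perl sketch. It assumes a simple event record holding a monitored property, a comparison operator, a threshold, and an optional action callback; these field names and the `triggered_nodes` and `fire_event` helpers are hypothetical rather than the ClusterWorX event format, and e-mail notification is not modeled.

```perl
use strict;
use warnings;

# Comparison operators an event threshold may use (an assumed subset).
my %ops = (
    '>=' => sub { $_[0] >= $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
);

# Return [node, value] pairs, sorted by node name, for every node whose
# reported value of $event->{property} satisfies the threshold comparison.
sub triggered_nodes {
    my ($event, $samples_by_node) = @_;
    my $cmp = $ops{ $event->{op} } or die "unknown operator $event->{op}\n";
    my @hits;
    for my $node (sort keys %$samples_by_node) {
        my $value = $samples_by_node->{$node}{ $event->{property} };
        next unless defined $value;    # node did not report this property
        push @hits, [ $node, $value ] if $cmp->($value, $event->{threshold});
    }
    return @hits;
}

# Evaluate the event once: run its action (if any) on each triggered node
# and return the triggered nodes so the caller can send one notification.
sub fire_event {
    my ($event, $samples_by_node) = @_;
    my @hits = triggered_nodes($event, $samples_by_node);
    if ($event->{action}) {
        $event->{action}->($_->[0]) for @hits;
    }
    return @hits;
}

# Example with the load values shown in the sample notification e-mail
# (Figure 4); pj4 is a hypothetical node below the threshold.
my %load = (
    pj1 => { load_5min => 4.18 },
    pj2 => { load_5min => 4.4 },
    pj3 => { load_5min => 4.03 },
    pj4 => { load_5min => 1.2 },
);
my $event = { name => 'Load Average', property => 'load_5min',
              op => '>=', threshold => 4.0, action => undef };
printf "Node: %s Triggering Value: %s\n", @$_ for fire_event($event, \%load);
```

Returning all triggered nodes from a single evaluation, rather than acting node by node, is what would let a caller report them together in one message per event.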
For example, the event engine can report and take an administrator-defined action, such as powering down a node, when processors rise above a certain temperature or when the load average is too high. Administrators configure events and can choose whether to receive a notification when an event occurs. Events are also extensible: they can monitor administrator-defined values and execute administrator-defined plug-ins. Custom actions can be implemented as shell scripts, Perl scripts, symbolic links, programs, and more.

Using a smart notification algorithm, ClusterWorX notifies administrators of problems without swamping them with unnecessary e-mail. The e-mail tells administrators which cluster is malfunctioning, the name of the triggered event, the node(s) experiencing problems, and the action (if any) that was taken (see Figure 4). One e-mail is sent per triggered event. If a node is fixed by an administrator but later fails again, the event re-fires automatically, without administrative intervention. E-mail can also be directed to most wireless devices, such as pagers and cellular phones.

This is an automated message to inform you that the following event was triggered on the LNXICluster cluster.
The "Load Average" event occurred on March 18, 2002 at 4:25:40 PM MST.
The Load 5 Min Property is >= 4.0 on the following nodes:
Node: pj3 Triggering Value: 4.03
Node: pj2 Triggering Value: 4.4
Node: pj1 Triggering Value: 4.18
This event has no action.

Figure 4: A sample notification e-mail message.

PLUG-INS/EXTENSIBLE ARCHITECTURE
Through plug-ins, administrators can add features and functionality to ClusterWorX. A plug-in is an extension to a program that can be added at run time. Plug-ins are either Perl scripts or binary programs and are executed by the ClusterWorX agent. All communication between the nodes and the host is handled by the plug-in framework, so administrators do not need to program any of the communication. To keep nodes running cleanly, zombie plug-ins are automatically removed after a set amount of time.

ClusterWorX supports three types of plug-ins: execute, monitor, and start-up.

Execute plug-ins take an action on each of the nodes and serve as customizable actions to be taken when the event engine triggers an event. Execute plug-ins can run without administrative intervention. On the ClusterWorX GUI, execute plug-ins are identified by the gear icon.

Monitor plug-ins continuously gather the vital statistics of a cluster. They include CPU percent usage, network connection, and percent memory usage, and their values are updated at administrator-defined intervals. Administrators who want to create their own monitor plug-ins must write them in Perl.
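As an illustration of a custom monitor plug-in written in Perl, here is a sketch of a percent-memory-usage monitor that reads the Linux /proc/meminfo file. It assumes a plug-in is simply a Perl sub returning one value per call; the sub name `memory_percent`, its optional argument, and counting "used" memory as MemTotal minus MemAvailable (MemFree on older kernels) are assumptions, and this is not the percent memory usage plug-in that ships with ClusterWorX.

```perl
use strict;
use warnings;

# Hypothetical monitor plug-in: percent of memory in use, read from
# /proc/meminfo (values there are in kB). "Used" is assumed to mean
# MemTotal minus MemAvailable, falling back to MemFree when absent.
sub memory_percent {
    my ($source) = @_;            # file path, or a scalar ref for in-memory input
    $source //= '/proc/meminfo';
    open(my $fh, '<', $source) or return;
    my %kb;
    while (my $line = <$fh>) {
        $kb{$1} = $2 if $line =~ /^(\w+):\s+(\d+)/;
    }
    close($fh);
    my $total = $kb{MemTotal};
    my $free  = $kb{MemAvailable} // $kb{MemFree};
    return unless $total && defined $free;
    return sprintf('%.1f', 100 * ($total - $free) / $total);
}

# Usage on a Linux node: print the current value.
# print memory_percent(), "\n";
```

Accepting a scalar reference as well as a path is only a convenience for exercising the parser without a live /proc file.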
Monitor plug-ins are identified on the ClusterWorX GUI by the health monitor icon.

Start-up plug-ins gather static information each time the node is booted, and this information is stored in the ClusterWorX database. Static values include CPU type, operating system version, and memory information. Administrators who want to create their own start-up plug-ins must write them in Perl. Start-up plug-ins are identified on the ClusterWorX GUI by the magnifying glass icon.

Adding a Plug-in
Using Perl scripts, administrators can create and add their own monitor plug-ins to meet the unique needs of their system (see Figure 5). Linux NetworX continues to develop additional plug-ins and encourages customers to make useful plug-ins available to others as well.

```perl
#!/usr/bin/perl
#***********************************************************
# Copyright (c) 2001 Linux NetworX. All Rights Reserved.
#
# ClusterWorX uptime Plugin
use strict;
use warnings;

sub uptime {
    my $uptime;
    # Run /usr/bin/uptime and read its single line of output through a pipe.
    open(my $fh, '-|', '/usr/bin/uptime') or return;
    if (defined($uptime = <$fh>)) {
        $uptime =~ s/^.*up //;   # remove everything up to and including "up "
        $uptime =~ s/,.*$//;     # remove everything after the comma
        $uptime =~ s/^\s*//;     # remove leading whitespace
        $uptime =~ s/\s*$//;     # remove trailing whitespace
    }
    close($fh);
    return $uptime;
}
```

Figure 5: The uptime plug-in. This plug-in ships with ClusterWorX and uses /usr/bin/uptime to get the uptime of each cluster node.

Once a Perl plug-in is written, integrating it with ClusterWorX is easy. Through the GUI, administrators configure plug-in parameters such as the plug-in type, how often the plug-in should run, and which nodes the plug-in should be copied to. After configuration is complete, the agent automatically fetches the new plug-in for use on the node, and the plug-in is ready to be used for monitoring or events.

IMAGE MANAGER
Disk image consistency is achieved through disk cloning, a process of quickly copying a system image from the ClusterWorX host to individual nodes within a cluster. The ClusterWorX Image Manager allows administrators to load the operating system and other cluster applications on single nodes or on the entire cluster simultaneously. Administrators build the functionality they want into an image, then load the operating system and applications onto a node.
ClusterWorX then clones the image to selected nodes, saving administrators time and effort compared with interactive install methods. In the Image Manager, administrators can create different types of hard drive images, some of which are specifically targeted to cluster node configurations. Images are stored on the host, ready for use. Creating the image on the host usually takes 15 minutes or less. Once the image file is created, individual nodes can be cloned in as little as three minutes, and because up to 70 nodes can be cloned at once, configurations can be updated or replaced very quickly. ClusterWorX also supports NFS root, which allows nodes to run disklessly.

CLUSTERWORX AT WORK
ClusterWorX uses few system resources. It was benchmarked on a cluster designed to represent an average-size Linux cluster, running the 11 default plug-ins at the default plug-in refresh interval of 30 seconds. During benchmarking, ClusterWorX used little network bandwidth, little memory, and little user CPU time. System CPU usage depends on the number of plug-ins that are run and the interval at which they run, so it can be lowered by adjusting either.

Table 1: Resources on the ClusterWorX host with one client logged in and 1 node
Server: 700 MHz Pentium III with 128 MB RAM
Memory usage: 13 MB
CPU usage: 2.7%

Table 2: Resources on the ClusterWorX host with eleven clients logged in and 11 nodes
Server: 650 MHz AMD-K7 with 384 MB RAM
Memory usage: 14 MB
CPU usage: 3.5%

Table 3: Resources used on nodes
Node: 550 MHz Pentium III
Memory usage: 2.5 MB
User CPU usage: 1.1%
System CPU usage: < 10%

Table 4: Network bandwidth usage
Traffic from node to ClusterWorX host: 5808 bytes every 30 seconds per node (average for 100 nodes: 20 KB/second)
Traffic from ClusterWorX host to each client: 4.5 KB burst every 30 seconds per node (average for 100 nodes: 103 KB/second)

CLUSTERWORX LITE
ClusterWorX Lite is available to users of cluster systems with 16 or fewer nodes. It includes the basic management capabilities of the full version of ClusterWorX but does not give users access to key monitoring features such as the event engine and the ability to create and control sub-groups of nodes within the cluster. ClusterWorX Lite can be upgraded to the full version at any time through the licensing management feature.
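Returning to the benchmark figures, the node-to-host average in Table 4 follows from simple arithmetic: bytes per node per interval, times the number of nodes, divided by the interval. The short Perl sketch below checks that row; the helper name `avg_bytes_per_sec` is hypothetical, and since the table does not say whether "KB" means 1000 or 1024 bytes, both are printed. The result, about 19.4 KB/s (or 18.9 KiB/s), is consistent with the rounded 20 KB/second in the table.

```perl
use strict;
use warnings;

# Average bandwidth = bytes per node per interval * number of nodes / interval.
sub avg_bytes_per_sec {
    my ($bytes_per_interval, $interval_s, $nodes) = @_;
    return $bytes_per_interval * $nodes / $interval_s;
}

# Node-to-host row of Table 4: 5808 bytes per node every 30 seconds, 100 nodes.
my $bps = avg_bytes_per_sec(5808, 30, 100);
printf "%.0f bytes/s = %.1f KB/s (1 KB = 1000 bytes) = %.1f KiB/s\n",
    $bps, $bps / 1000, $bps / 1024;
```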

NOTICE
The information in this publication is subject to change without notice. LINUX NETWORX SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL.

© 2001-2002. Linux NetworX and ClusterWorX are registered trademarks. Evolocity, ICE Box, and the cube logo are trademarks of Linux NetworX. All Rights Reserved. Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be the trademarks of others.