CATS-i : LINUX CLUSTER ADMINISTRATION TOOLS ON THE INTERNET




Jiyeon Kim, Yongkwan Park, Sungjoo Kwon, Jaeyoung Choi
{heaven, psiver, lithmoon}@ss.ssu.ac.kr, choi@comp.ssu.ac.kr
School of Computing, Soongsil University
1-1, Sangdo-Dong, Dongjak-Ku, Seoul 156-743, KOREA

Abstract

As the number of nodes in a Linux cluster system increases, administering the cluster becomes challenging because of the increased complexity of the system. The lack of powerful tools for administrative tasks makes matters worse. In this paper, we present CATS-i, Cluster Administration Tools on the Internet, which provides automatic and convenient installation of operating systems and software packages as well as efficient monitoring and management of clusters.

Keywords: Cluster Management Software, OS Installation Tools, Linux Cluster, PBS

1. Introduction

As computer technology develops, high-performance processors and high-bandwidth, low-latency networks have become cheap and easy to obtain. One of the important trends in cluster systems is the Linux cluster, which emphasizes the use of commodity hardware and open source software to deliver very high performance at an extremely low cost [2]. The Linux cluster has become one of the mainstream computing systems in the high-performance computing community.

Recently, very large cluster systems have started to appear. These systems consist of hundreds to thousands of nodes. Installing operating systems and software packages on many nodes, and monitoring and managing such a huge system, is a tedious and challenging task, since workstations and PCs are typically designed to work as standalone systems rather than as parts of a cluster. Moreover, most cluster management software focuses on managing and monitoring cluster nodes after the operating systems and application software packages have been installed.
In this paper, we present CATS-i, Cluster Administration Tools on the Internet, which provides automatic and convenient installation of operating systems and application packages, efficient monitoring and management of cluster nodes with simple operations on the Internet, and an easy-to-use graphical interface to PBS. That is, CATS-i allows users to install operating systems and application packages, monitor system activities and resource utilization of the various components of PC clusters, and submit and manage jobs.

CATS-i consists of setup tools, management tools, and a PBS interface. Operating systems and fundamental application packages are installed on cluster nodes using the setup tools, which clone the same preinstalled and optimized system image to many slave nodes simultaneously and efficiently with a multicast facility. After the setup, the administrator can monitor system activities and resource utilization and manage many cluster nodes with the management tools, which offer real-time monitoring and management of clusters on the Internet using Java.

CATS-i currently supports the following functions: aggregation of cluster resources, resource activities, file management, system log monitoring, RPM package management, alert services, and JPBS (Java Portable Batch System) services.

The important features of CATS-i compared with other cluster management software are as follows.

First of all, CATS-i provides automatic and convenient installation of operating systems. It includes a method to clone the system image of a target node to many other nodes simultaneously with efficient multicasting, which reduces I/O load. It is suitable for installing the same operating system on large clusters, and it enables users to install MS Windows as well as Linux.

CATS-i provides monitoring of a cluster at the node level, the group level, and the entire system level.

The CATS-i console provides a powerful graphical user interface, implemented in Java, which offers real-time monitoring and management of a cluster on the Internet.

CATS-i is expandable and scalable since it is built on a 3-tier internal framework.

CATS-i also provides a convenient user interface to PBS [9], which makes it possible to submit and manage jobs from one user interface.

2. Cluster Setup Tools

The COCOA Beowulf cluster project [4] presented 16 steps of package installation and configuration. As the number of nodes in a cluster increases, this work becomes tedious and time-consuming. CATS-i provides a facility to clone the disk image of one node to other nodes. First, a boot diskette must be prepared, which contains a specially tailored kernel image with a device driver for the NIC, DHCP support, and NFS support.
To make a compressed system image of a target node, on which an operating system is preinstalled and application software packages are installed and optimized, the target node is booted with the boot diskette and connected to the master node, on which the CATS-i server is installed. After receiving a command from the master node, the target node sends its disk partitioning information and its compressed contents to the master node. This image can then be shipped to and installed on other slave nodes using CATS-i. To install the clone image of the target node, slave nodes must be booted with the same boot diskette and connected to the master node. After receiving a request from the slave nodes, the server sends the system image to them.

CATS-i provides a method to clone the system image of the target node, including its operating system, by multicasting, which reduces I/O load. First, each slave node obtains a multicast module over NFS; this module can receive multicast UDP packets from the server. Each node installs the multicast module in its own RAM drive and joins a class D IP address (224.0.0.0 to 239.255.255.255). Then the master node multicasts the system image to the slave nodes. IP multicast is unreliable since it uses UDP datagrams. The CATS-i setup tools compensate for the unreliability of UDP with timeouts and retransmission, using sequence numbers to handle lost datagrams effectively. Figure 1 shows how an operating system is installed using the multicast method. All the slave nodes receive identical images. Before they are rebooted, the IP addresses in the DHCP configuration file must be correctly configured.

[Figure 1 diagram: the master node (node DB, network configuration info, GUI, error/flow control, multicast server module) sends UDP datagrams to a class D IP address (224.0.0.0 to 239.255.255.255), from which Node 1, Node 2, Node 3, ..., Node N receive them.]

Figure 1. Multicast and cloning process

Cluster setup procedures usually rely on text-based commands. CATS-i provides a graphical user interface implemented in Java, which makes the setup process easy and friendly. Figure 2 shows the interface for OS installation, including the node database and network configuration.
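The loss handling described for the multicast cloning step can be sketched as follows. This is a minimal illustration in Python, not the CATS-i implementation: the paper does not give the packet format, so the `ImageReceiver` class, its method names, and the idea of numbering fixed chunks from zero are all assumptions. The sketch only shows the bookkeeping a slave node might do with sequence numbers so that, after a receive timeout, it can ask the master to retransmit exactly the missing chunks.

```python
import ipaddress
from typing import Dict, List


def is_class_d(address: str) -> bool:
    """Return True if the IPv4 address lies in 224.0.0.0 to 239.255.255.255."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_multicast


class ImageReceiver:
    """Hypothetical receiver-side bookkeeping for a multicast system image."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.chunks: Dict[int, bytes] = {}

    def on_datagram(self, seq: int, payload: bytes) -> None:
        # Duplicates and out-of-range sequence numbers are ignored,
        # so a retransmitted chunk never overwrites a good one.
        if 0 <= seq < self.total_chunks and seq not in self.chunks:
            self.chunks[seq] = payload

    def missing(self) -> List[int]:
        # Sequence numbers to request again once the receive timeout expires.
        return [s for s in range(self.total_chunks) if s not in self.chunks]

    def assemble(self) -> bytes:
        # Only a complete image may be written to disk.
        if self.missing():
            raise ValueError("image incomplete; retransmission required")
        return b"".join(self.chunks[s] for s in range(self.total_chunks))
```

In a real receiver, `on_datagram` would be fed from a UDP socket that has joined the multicast group, and the list returned by `missing()` would be sent back to the master over a unicast channel; both of those parts are omitted here.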

Figure 2. Interface related to OS installation, including network configuration

3. Cluster Monitoring and Management Tools

CATS-i offers management tools for the maintenance of cluster nodes. The features of the cluster monitoring and management tools in CATS-i are briefly summarized as follows. First, it is possible to bind many nodes into one cluster group and to manage multiple cluster groups in one place. Second, it is possible to apply the same operation efficiently to all or selected nodes. Third, the CATS-i console is implemented in Java, so it can be executed on various platforms without recompiling the code. Fourth, the console is implemented with the Java Swing set, which provides a convenient graphical user interface. The interface is interactive and easy to use.

CATS-i consists of three parts: a user interface (console), a server daemon, and a node daemon. The console applet is used to monitor and manage the cluster system. It is implemented in Java and provides a graphical user interface that presents a large amount of graphical information about the cluster system. It requests information about cluster nodes from the server daemon. When the console issues a job, the server daemon sends it to all or selected nodes. This 3-tier model works well with multiple platforms.

The server daemon and the node daemon are implemented in C. The server daemon usually runs on one of the cluster nodes or on a front-end. A node daemon runs on each node. Because the CATS-i console is implemented in Java, it can be executed on several different platforms as an application or an applet. The console applet simply sends commands to and receives results from a server daemon. The console consists of a node tree and a window that displays results. Figure 3 shows an applet of the CATS-i console.
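The 3-tier flow described above (console, server daemon, node daemons) can be sketched in a few lines. The sketch below is illustrative only: the real server and node daemons are written in C and communicate over the network, whereas here all three tiers live in one Python process. The class names `NodeDaemon` and `ServerDaemon`, the `dispatch` method, and the command name used in the usage note are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional


class NodeDaemon:
    """Tier 3: runs on each node and answers commands (hypothetical API)."""

    def __init__(self, name: str, handlers: Dict[str, Callable[[], str]]) -> None:
        self.name = name
        self.handlers = handlers

    def execute(self, command: str) -> str:
        handler = self.handlers.get(command)
        if handler is None:
            return f"{self.name}: unsupported command"
        return f"{self.name}: {handler()}"


class ServerDaemon:
    """Tier 2: stores the node tree and relays console requests to nodes."""

    def __init__(self) -> None:
        self.groups: Dict[str, List[NodeDaemon]] = {}

    def register(self, group: str, node: NodeDaemon) -> None:
        self.groups.setdefault(group, []).append(node)

    def dispatch(self, command: str, group: Optional[str] = None,
                 selected: Optional[Iterable[str]] = None) -> List[str]:
        # With no filter the command goes to every node; otherwise only to
        # the named group and/or the explicitly selected node names.
        wanted = set(selected) if selected is not None else None
        results: List[str] = []
        for group_name, nodes in self.groups.items():
            if group is not None and group_name != group:
                continue
            for node in nodes:
                if wanted is None or node.name in wanted:
                    results.append(node.execute(command))
        return results
```

A console (tier 1) would call, for example, `server.dispatch("uptime", group="Heaven_Group")` and render the returned strings; keeping the node tree in the server daemon is what lets several consoles share the same view of the cluster.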

Figure 3. Main screen of CATS-i

The left side of Figure 3 shows a node tree, which displays the status and structure of the nodes. Each cluster group may consist of several nodes. In Figure 3, three nodes are grouped as Heaven_Group and six nodes are grouped as Sea_Group. If a node is not working, its icon in the tree is displayed as a red rectangle. The node icons indicate the operating system installed on each node; the penguin icons represent Linux. The tree can be customized by the administrator, and the tree information is stored on the server daemon. Users select nodes from the tree to execute an operation. In Figure 3, users can execute operations on nodes, such as basic information, performance monitoring, user management, process management, and disk usage, from the job menu, and commands such as login, logout, reboot, and shutdown from the system menu.

Monitoring functions in the current version of CATS-i are classified into 6 groups according to their characteristics.

a) Aggregation of cluster resources
This function is called total view. Total view shows real-time information about the CPU and memory of all the nodes graphically, as in Figure 4. It also shows node information for each group.

b) Resource activities
This function shows resource information of cluster nodes, such as CPU, memory, accounts, users, real-time CPU and memory monitoring, and process monitoring and management.

Basic Information: shows basic information about the selected nodes, such as the type and load of the CPU, the type and version of the operating system, and the amount of memory.

Performance Monitoring: shows real-time information about the CPU and memory of the selected nodes graphically.

User List: lists the users who are currently logged on to the nodes.

Figure 4. Total view

Process List: shows a list of the processes that are currently running. Users can control the processes with the right mouse button. Clicking the right button opens a pop-up menu from which users can kill, stop, or resume a process.

Disk Usage: displays disk usage information. It is similar to the df command, but with a convenient graphical interface. Users can see text information and graphical pie charts simultaneously.

Account Control: enables the administrator to manage the accounts on the selected nodes, and shows a list of the accounts on those nodes.

c) File Management
A general file manager works on only one node. To move one or more files from one node to another, users usually use the ftp command. CATS-i provides file management functions for a cluster group. Users can copy one or more files from one node and then simply paste them to another node. The copy-and-paste method is easy to use and is the same as in general Linux file managers. To perform file-related jobs, users click the right mouse button to open a pop-up menu.

d) System Log Monitoring
Log information is very useful in various situations. This function collects log information from each node. Users can select the log files to trace with a customized command. It supports real-time monitoring.

e) RPM Package Management
This function shows the list of RPM packages installed on the selected nodes. Users can install, remove, upgrade, and query RPM packages. The setup tool of CATS-i is used only for installing operating systems on specific nodes to build a cluster system. However, cluster management tools are also required to install application packages in order to add, delete, and upgrade the services of the cluster system. Although the nodes of a cluster are physically separate, the management tool of CATS-i provides functions to install, remove, upgrade, and query application packages on all selected nodes. Figure 5 shows the interface for installing an application.
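The four package operations named above map directly onto the standard `rpm` command-line modes (`-i` install, `-e` erase, `-U` upgrade, `-q` query). The following sketch, which is not taken from CATS-i, only builds the argument lists that a node daemon might run for each selected node; it does not execute anything. The helper names `rpm_command` and `plan`, and the package and node names in the usage note, are hypothetical.

```python
from typing import Dict, Iterable, List

# Standard rpm mode flags for the four operations listed in the text.
_RPM_FLAGS = {"install": "-i", "remove": "-e", "upgrade": "-U", "query": "-q"}


def rpm_command(action: str, target: str) -> List[str]:
    """Build the argv for one rpm operation on a package file or name."""
    try:
        flag = _RPM_FLAGS[action]
    except KeyError:
        raise ValueError(f"unsupported action: {action}") from None
    return ["rpm", flag, target]


def plan(action: str, target: str, nodes: Iterable[str]) -> Dict[str, List[str]]:
    """Map every selected node to the command it should run."""
    return {node: rpm_command(action, target) for node in nodes}
```

For example, `plan("upgrade", "example-1.0.rpm", ["node1", "node2"])` yields the same `rpm -U` command for both nodes. Passing the command as a list rather than a shell string avoids quoting problems if a node daemon later hands it to a process-spawning call.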

Figure 5. Application packages installation

CATS-i currently supports Red Hat Linux, so it can use the rpm system call and the RPM device library. The master node creates threads to connect to the slave nodes and sends an RPM file to them; the slave nodes receive it and install the application using the rpm system call, as shown in Figure 6.

[Figure 6 diagram: the CATS-i daemon on the master node uses Thread 1, Thread 2, and Thread 3 to send *.rpm files over TCP/IP to the CATS-i daemons on Node 1, Node 2, and Node 3, which return error and log messages over TCP/IP. Labeled components: interface, installing module, package DB and info, install log database, monitoring command, install log, package.]

Figure 6. Application packages installation mechanism

f) Alert Services
The administrator can send urgent messages to users. This service also allows events to be defined and triggered automatically whenever an event condition is met at run time. Notification is done through e-mail or system functions.
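The automatic triggering described for the alert service can be illustrated with a small sketch. The paper does not specify how event conditions are expressed, so the minimum/maximum range used here, the `AlertRule` class, the `check` function, and the metric names are all assumptions made for illustration. The `notify` callback stands in for the e-mail or system notification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    """Hypothetical event condition: a metric must stay within a range."""
    metric: str
    minimum: float
    maximum: float


def check(rules: List[AlertRule], sample: Dict[str, float],
          notify: Callable[[str], None]) -> int:
    """Fire notify() once per violated rule and return how many fired."""
    fired = 0
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # the node did not report this metric
        if value < rule.minimum or value > rule.maximum:
            notify(f"{rule.metric}={value} outside [{rule.minimum}, {rule.maximum}]")
            fired += 1
    return fired
```

A monitoring loop would call `check` each time a node daemon reports a new sample; passing `notify` as a parameter keeps the rule evaluation independent of whether the message goes out by e-mail or through a system function.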

g) JPBS (Java Portable Batch System) Services
The management tool of CATS-i supports PBS through JPBS, a Java interface to PBS. It enables users to use a general PBS with the same CATS-i interface on the Internet. JPBS can submit batch jobs, run them, and deliver the output back to the submitter from a simple user interface, as in Figures 7 and 8.

Figure 7. Main screen of CATS-i JPBS

Figure 8. Snapshot of JPBS Job Submission Dialog

4. Related Works

There are several tools for OS installation, such as NodeCloner [8] and Beoboot [3]. NodeCloner was developed by CACR (Center for Advanced Computing Research) at CalTech to make OS installation easy. To make a cluster system as easy to maintain as possible, it is better to make all nodes identical, that is, to give all nodes the same disk partitions, the same application packages, and the same configuration files. This makes the installation packages easier to develop. There are various methods to clone a preinstalled system image. The easiest way is to physically connect the hard disk of another node and copy the image to it. Another way, which may be more elegant, is to boot the nodes as diskless clients first and let setup scripts partition the disk and copy the system from tar files or from another running system. Cloning the system in this way requires bootp or DHCP to boot the diskless nodes. The NodeCloner package helps users set up the NFS root on the front-end machine. However, NodeCloner does not provide a GUI. Users must edit the setup files related to NodeCloner, which is quite complicated and requires advanced skills.

Beoboot is a software product by Rembo Technology SaRL, Switzerland. Beoboot is bootrom-based software that automatically installs Linux on an arbitrarily large number of Intel-based computers. Beoboot is particularly well suited for managing clusters of PCs working together as a supercomputer. A bootrom-enabled PC differs from a stand-alone PC: when it is turned on, it asks the server for a boot program instead of looking for one on the boot sector of the hard disk. Beoboot uses this mechanism to boot Linux directly from the network server, without any preliminary work on each individual client computer. Beoboot provides a command-based interface, so users must know the complicated usage of the program.

LUI (Linux Utility for cluster Installation) [5] from IBM is an open source utility for installing Linux workstations remotely over an Ethernet network. LUI provides tools to manage installation resources on the server, which can be allocated and applied to installing clients, allowing users to select just the resources that are right for each client. LUI supports both the BOOTP protocol for diskette-based client installation and true network installation using DHCP and PXE. Users must define resource objects to operate LUI, which is not easy.
LUI uses RPM and tar files to install the OS and applications on each client node; these files are transferred from a server node through TFTP. With the TFTP protocol, however, the transfer rate deteriorates as the number of client nodes increases.

There are also many tools for cluster administration. Alert System [1] from the University of Virginia allows several workstations to monitor a cluster of computers and report regularly on their condition. The report is delivered through e-mail and the web. Alert is portable and available on Linux, Solaris, and FreeBSD. In Alert, each node runs its own daemon to collect node information. This type of tool is based on a client/server architecture and provides a simple graphical user interface with which users can view the status of cluster nodes. It is relatively simple and easy to use. However, CGI-based systems such as Alert display the information in text mode or with simple graphics, and they display it statically. Users must click the reload button to view new information.

The VACM (VA Cluster Management) administration tool [11] is included in VA-Linux. VACM runs on Linux and performs hardware-related monitoring, such as the status of network traffic, CPU temperature, and fan speed. This system has a 3-tier structure. Every node has a node daemon. A server daemon collects information from the node daemons and sends it to a VACM console. The console is implemented with the GTK library. It is fast and provides a convenient graphical user interface. However, it is closely tied to special hardware, Intel's Platform Management Interface.

The MAT cluster administration tool [6] is a console application written in interpreted languages such as Tcl/Tk. Such systems are very common and popular, but they incur a lot of overhead when displaying rapidly changing data.

The SMILE administration tool [10] is called K-CAP. The K-CAP user interface uses Java applets to connect to the K-CAP management node through a predefined URL in the cluster. It consists of a real-time monitoring system, parallel Unix commands, and numerous system administration utilities. SCMS is currently freely available as open source software on the Internet and is used by many organizations around the world.

The M3C (Managing Multiple Multi-User Clusters) administration tool [7] from Oak Ridge National Laboratory enables users to manage multiple nodes simultaneously. It is implemented in Java and offers a concept called 'cluster grouping'. Users can schedule jobs and install software on selected nodes or cluster groups through M3C.

5. Conclusion and Future Works

In this paper, we presented CATS-i, Cluster Administration Tools on the Internet, which covers installation of operating systems and application packages as well as cluster monitoring and management. The reduced complexity will save much of the expense of operating and maintaining the system.

We are preparing the next version of CATS-i. The new version will show basic hardware information such as CPU temperature, voltage, and fan speed. It will support extended aggregation services and network analysis. By saving real-time data such as CPU, memory, and network traffic into a database, CATS-i will be able to detect network bottlenecks among cluster nodes. The administrator will be able to specify conditions such as maximum and minimum thresholds while viewing CPU and memory statistics, so that CATS-i can report emergency status through e-mail or a log.

6. References

[1] Alert, http://www.cs.virginia.edu/~jdm2d/alert/index.shtml.
[2] BAKER, M.A., FOX, G.C. and YAU, H.W. (1995): Cluster Computing Review. Northeast Parallel Architectures Center.
[3] Beoboot, http://www.rembo.com/beoboot.
[4] COCOA Beowulf cluster, http://www.beowulf-underground.org/doc_project.
[5] LUI, http://oss.software.ibm.com/developerworks/projects/lui/.
[6] MAT (Monitoring and Administration Tool), http://www.ee.ryerson.ca:8080/~sblack/mat/.
[7] M3C, http://www.csm.ornl.gov/torc.
[8] NodeCloner, http://www.cacr.caltech.edu/beowulf/tutorial/building.html.
[9] PBS, http://www.openpbs.org/.
[10] UTHAYOPAS, P., MANEESILP, J. and INGONGNAM, P. (2000): SCMS: An Integrated Cluster Management Tool for Beowulf Cluster System. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2000 (Las Vegas, Nevada, USA).
[11] VA-Linux Cluster Management System, http://www.valinux.com/software/vacm.