A System for Monitoring and Management of Computational Grids
Warren Smith
Computer Sciences Corporation
NASA Ames Research Center

Abstract

As organizations begin to deploy large computational grids, it has become apparent that systems for observation and control of the resources, services, and applications that make up such grids are needed. Administrators must observe resources and services to ensure that they are operating correctly and must control resources and services to ensure that their operation meets the needs of users. Users are also interested in the operation of resources and services so that they can choose the most appropriate ones to use. In this paper we describe a prototype system to monitor and manage computational grids and describe the general software framework for control and observation in distributed environments that it is based on.

1. Introduction

A recent trend in government and academic research is the development and deployment of computational grids [14, 22]. Computational grids are large-scale distributed systems that typically consist of high-performance compute, storage, and networking resources. Examples of such computational grids are the DOE Science Grid [3], the NSF Partnerships for Advanced Computational Infrastructure [6, 7], and the NASA Information Power Grid [29]. Most of the work to deploy these grids is in developing the software services that allow users to execute applications on large and diverse sets of distributed resources. These services include security, execution of remote applications, management of remote data, access to information about resources and services, and so on. Several toolkits provide these services, such as Globus [21], Legion [26], and Condor [30]. NASA is building a computational grid called the Information Power Grid (IPG) that is based upon the Globus toolkit.
The IPG currently consists of resources and users at four NASA centers, and our attempt to deploy a production grid of this size has highlighted the need for systems to observe and control the resources, services, and applications that make up such grids. We have found it difficult to ensure that the many resources in the IPG and the grid services executing on those resources are performing correctly. We have also found it cumbersome to perform administrative tasks such as adding grid users to our resources. These observations have led to our development of a system to address these needs. This paper provides an overview of our system for monitoring and managing a computational grid. It allows administrators to observe the status of the resources and services that make up a Globus-based computational grid, to perform actions to correct failures, and to perform day-to-day administrative functions. This system is constructed using the CODE toolkit [35], which provides a secure, scalable, and extensible framework for making observations on remote computer systems, transmitting this observational data to where it is needed, performing actions on remote computer systems, and analyzing observational data to determine what actions should be taken. We begin our discussion with an overview of the CODE framework. Section 3 describes the current functionality of our grid monitoring and management system. Section 4 describes related work and Section 5 summarizes our work and presents future work.

2. CODE Framework

We have developed a software framework for Control and Observation in Distributed Environments, called CODE for obvious reasons [35]. We are using this framework to implement several useful grid services, including our grid monitoring and management system. This section provides an overview of the framework.

2.1 Architecture

We call CODE a framework because it contains the core code that is necessary for performing monitoring and management.
Users only need to add components to this framework and start the framework running. For example, if a user wants to create a host monitor, she would create components to monitor processes, files, network communications, and so on. The user would then add these components to the framework and tell the framework to begin monitoring the host. This same process is used for adding components to perform management actions. In fact, the typical process will be
easier, because CODE provides a set of commonly used components for observing various properties and performing various actions; all a user will have to do is select which of these components to use.

The CODE architecture is shown in Figure 1. The components that are shown with a solid outline are supplied by our framework, the components shown with a dashed outline are provided by the user, and the gray boxes show the logical grouping of the components into entities that may be on different hosts. The logical components of our framework are observers, which perform and report observations; actors, which perform actions; managers, which receive observations, make decisions, and request actions; and a directory service for locating observers and actors.

An observer is a process on a computer system that provides information that can be measured from the system it is executing on. This could be information about the computer system, about services or applications running on that computer system, or information that is not related to the computer system but is accessible from it. Examples of this last type of information are scheduling queue information from a front-end system and the current use of a local area network. An observer provides information in the form of events. An event has a type and contains data in the form of <name, value> pairs. The values are typically of simple types such as strings or integers, but can also be structures. An observer allows a manager to query for an event or to subscribe for a set of events.
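As a concrete sketch of this event model (the class and method names below are invented for illustration and are not CODE's actual API), an event can be represented as a type plus a map of named values:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of a CODE-style event: a type plus <name, value> pairs.
// Class, method, and event-type names are hypothetical, not taken from CODE.
public class Event {
    private final String type;
    private final Map<String, Object> data = new LinkedHashMap<>();

    public Event(String type) {
        this.type = type;
    }

    // Add one <name, value> pair; values may be strings, numbers, or structures.
    public Event put(String name, Object value) {
        data.put(name, value);
        return this;
    }

    public String getType() { return type; }

    public Object get(String name) { return data.get(name); }

    public static void main(String[] args) {
        // A CPU-load observation expressed as an event.
        Event e = new Event("host.cpu.load")
                .put("host", "lomax.nas.nasa.gov")
                .put("load1min", 2.5);
        System.out.println(e.getType() + " load=" + e.get("load1min"));
    }
}
```

A query would return one such event immediately, while a subscription would cause the observer to deliver a stream of them, periodically or when a condition fires.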
A subscription is useful, for example, if a user wants to be notified of the load on a system periodically or to be notified whenever some fault condition occurs. Access to events is controlled, based on user identity and user location, on both a per-observer and a per-event-type basis.

Figure 1. Architecture of the CODE framework.

An observer consists of the following components:

Sensor. A sensor is used to sense or measure some property. For example, a CPU load sensor would measure the CPU load of a host. A sensor is a passive component that performs measurements only when the sensor manager requests them. We provide a set of sensors as part of our framework, but users will most likely need to implement sensors for their specific purposes.

Sensor Manager. The sensor manager receives event requests or subscriptions from the event producer interface, uses the appropriate sensor at the appropriate time to perform a measurement, and sends the result of the measurement to the event producer interface in the form of an event.

Event Producer Interface. The event producer interface provides an interface for observers to access a distributed event service. This event service allows event subscriptions to be established between producers and consumers, allows consumers to query for events from producers, and allows producers to send events to consumers.

An actor is a process on a computer system that can be asked to perform actions. These actions are made from the actor process and could affect local or remote resources, services, and applications. Access to actions is controlled, based on user identity and user location, on both a per-actor and a per-action-type basis. An actor consists of the following components:
Actuator. An actuator is a component that can be used to perform a specific action. For example, an actuator can be used to start a daemon. An actuator is a passive component that performs actions only when the actuator manager requests them. We provide a set of actuators as part of our framework, but users will most likely need to implement actuators for their own specific purposes.

Actuator Manager. The actuator manager receives requests to perform actions from the actor interface, uses the appropriate actuator to perform the action, and sends the results of the action back to the actor interface.

Actor Interface. The actor interface provides an interface to a distributed action service that transmits requests for actions and their results.

A manager is a process that asks observers for information, reasons upon that information, and asks actors to take actions when the observations indicate that actions need to be taken. A manager consists of the following components:

Management Logic. The management logic receives events from the event consumer interface, reasons upon this information to determine whether any actions need to be taken, and then takes any actions using the director interface. There are two ways to implement the management logic:
o Write C++ or Java code that contains a series of if and case statements, a state machine, or whatever code is needed to decide what actions to perform.
o Use an expert system and write management rules. We are experimenting with using the CLIPS expert system [25] to simplify the writing of managers.

Event Consumer Interface. The event consumer interface is used to request events from observers and to receive those events.

Director Interface. The director interface is used to request that actors perform actions and to receive the results of those actions.

A common component of a computational grid is a directory service or grid information service [20].
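Before turning to the directory service, the first of the two management-logic styles above (hand-written if-statements) can be sketched as follows; the event types, field names, thresholds, and action names here are invented for illustration and are not part of CODE:

```java
// Sketch of hand-coded management logic: map an incoming observation to an
// action request. All names and thresholds are hypothetical examples.
public class ManagementLogic {
    /** Decide which action, if any, an observation calls for. */
    public static String decide(String eventType, double value) {
        if (eventType.equals("host.cpu.load") && value > 8.0) {
            return "email.admin";           // host overloaded; a person must look
        } else if (eventType.equals("gram.reporter.status") && value == 0.0) {
            return "gram.reporter.restart"; // reporter not running; restart it
        }
        return "no.action";                 // nothing to do for this event
    }

    public static void main(String[] args) {
        System.out.println(decide("host.cpu.load", 12.0));
    }
}
```

The CLIPS-based alternative would express the same conditions as declarative rules, so that management policy could be changed without recompiling the manager.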
For our purposes, a directory service is a distributed database that is accessed using the Lightweight Directory Access Protocol (LDAP) [28]. We use a directory service to store the locations of observers and actors, to describe what types of observations and actions they provide, and to allow managers to search for the observers and actors they need.

2.2 Implementation

We have implemented the CODE framework in C++ and in Java so that it can be used from a variety of programming languages. At this point, the CODE framework supports communication using TCP, UDP, and SSL. The SSL communication is implemented using the Globus Grid Security Infrastructure [23]. The CODE framework is an implementation of the Grid Monitoring Architecture [40] defined in the Global Grid Forum and supports encoding of communication messages with an extension of the event protocol [37, 38] that is being defined in the Global Grid Forum. This protocol encodes data using the Extensible Markup Language (XML), and CODE uses the Xerces XML parsers to decode messages. Further, the format of the data CODE places in the directory service is compatible with the LDAP schemas [36] being defined in the Global Grid Forum.

3. Grid Management System

As computational grids grow, it becomes very difficult to ensure the correct operation of the large number of resources and services that make up a grid and to configure the services that are available on a grid. We have developed a prototype Grid Management System (GMS) to assist with these tasks in a Globus-based grid such as the NASA Information Power Grid. Figure 2 shows the high-level architecture of this system; we describe the components of this architecture next.

3.1 GRAM Management Agent

The Globus toolkit includes a service called the Globus Resource Allocation Manager (GRAM) [17] that allows remote users to execute applications on a computer system. Our system has an agent on each host that has a GRAM server.
The purpose of this agent is to ensure that the GRAM service is available to users, that the computer system it is associated with is operating correctly, and that there is network connectivity to other GRAM hosts. The GRAM management agent contains a CODE observer, actor, and manager. The observer is used to monitor the following properties:
1. The network latency between the GRAM host and other GRAM hosts. These latencies are used to detect network problems. The ping sensor measures round-trip times using the Unix ping command.
2. The available network bandwidth between the GRAM host and other GRAM hosts. These bandwidth measurements are also used to detect network problems and to help users select resources. The Iperf [42] network measurement tool is used to make these measurements.
3. The CPU load, which is measured to determine if the computer system is overloaded and unusable. This measurement is made using three different sensors: one uses the Unix uptime program, a second uses the PBS qstat command, and the third uses the
LSF bjobs command. The sensor that is used depends on how access to the computer system is scheduled.
4. The memory statistics, which are measured using the Unix vmstat command, or similar commands, to determine if the memory subsystem is overloaded.
5. The available disk space, which is measured using the Unix df command. The GRAM servers require some minimal amount of disk space to operate.
6. The status of the GRAM reporter. The IPG is currently running the Globus MDS in classic mode. In that mode, a GRAM reporter daemon executes on each GRAM host and writes data into a remote LDAP server.
7. The GRAM log files. These log files contain information about usage of the GRAM service and about any problems that occur; they are watched for such problems.
8. The GRAM grid map file. This file specifies which grid users can execute applications through the GRAM service and maps grid user identifiers to local Unix user identifiers. This information is provided so that remote administrators can determine and modify which grid users can use the GRAM service.

Figure 2. Architecture of our grid monitoring and management system.

The actor has actuators to perform the following actions:
1. Start and stop the Iperf server. An Iperf server is needed so that Iperf clients can connect and perform Iperf experiments.
2. Send e-mail. This actuator is used to send e-mail to administrators when a problem cannot be handled automatically, but must be corrected immediately.
3. Modify the GRAM grid map file.
This actuator is used by the remote management GUI so that access to the GRAM service on the host can be given to or taken away from grid users, and so that the mapping of grid users to local user identities can be modified.
4. Start, stop, or restart the GRAM reporter. If the GRAM reporter is not running or is not responding, it can be restarted.

The GRAM management agent also includes a CODE manager. At this time, this manager does not receive any observations or perform any actions; this approach assumes that the management of a grid takes place in the management GUIs. In the future, this manager will receive observations and perform actions so that management functionality can be offloaded to the GRAM management agent, making the system more scalable. When this agent begins executing, it locates the event archive (described further in Section 3.3) using the directory service and initiates subscriptions with the archive, in which the agent is the producer of events. These subscriptions indicate that the GRAM management agent will send events to the archive when problems occur. These problems include excessive CPU or memory use, failure of the GRAM reporter, or problems reported in the GRAM log files. At any time, management GUIs can contact this agent to receive information or to request that actions be performed.

3.2 GIS Management Agent

A Globus-based computational grid also has a distributed database, called a Grid Information Service (GIS), that contains information about the resources, users, services, and applications that are part of the computational grid. The Globus GIS is called the Metacomputing Directory Service [16]. A GIS typically consists of multiple servers running on multiple hosts. Our architecture includes an agent to monitor the operation of GIS servers, to determine if there are any problems, and to take actions to attempt to address those problems.
Similarly to the GRAM agent, this agent provides observational data to remote agents and allows authorized remote agents to request actions. The GIS management agent for each
GIS server consists of an observer, an actor, and a manager. The observer monitors the following properties:
1. The network connectivity between the GIS hosts. LDAP servers typically refer searches for information to other LDAP servers, so connectivity between GIS hosts matters.
2. The CPU load of the host.
3. The available memory of the host.
4. The available disk space.
5. The status of the LDAP server itself. This is measured in two ways: first, the existence of the LDAP server process is observed; second, the time to request a search and receive a reply is measured.

Figure 3. Management GUI displaying the status of a subset of the resources on the NASA IPG.

The actor that is part of a GIS management agent is relatively simple: it has only two actuators at the current time. One actuator is used to send e-mails. The other actuator is used to start, stop, and restart the LDAP server. The GIS management agent also includes a CODE manager. At this time, this manager does not receive any observations or perform any actions. When the GIS management agent begins executing, it locates the event archive using the directory service and initiates subscriptions with the archive, in which the agent is the producer of events. These subscriptions indicate that the GIS management agent will send events to the archive when problems occur.

3.3 Event Archive

The event archive stores events so that management GUIs can obtain information about problems that have occurred in the past. The archive acts as an event consumer for events generated by GRAM and GIS management agents and acts as a producer of events for management GUIs. GRAM and GIS management agents use the directory service to find the archive and then initiate a subscription to the archive. The agents then send events to the archive whenever problems occur. Management GUIs also find the archive using the directory service. They then query for events from the
archive. This query contains an event filter that is used to select which events to return. At the current time, we are using the XPath [15] language as our filter language and the Xindice [12] XML database to store our events.

Figure 4. Detailed information about the load on the SGI Origin lomax.nas.nasa.gov and about the connection between lomax.nas.nasa.gov (in California) and rogallo.larc.nasa.gov (in Virginia).

3.4 Directory Service

As described in Section 2.1, the observers and actors on the GRAM and GIS hosts register themselves in the directory service so that managers can find them. The event archive also registers itself so that management GUIs, GRAM managers, and GIS managers can find it.

3.5 Management GUI

The final component of our grid monitoring and management system is a graphical user interface that is used by grid administrators. Instances of this interface can be started and stopped at any time by multiple administrators. This interface allows grid administrators to view the current status of a grid, be notified when problems occur on the grid, examine problems that have occurred in the past, and perform grid administrative tasks. Figure 3 shows the management GUI being used to show the status of a subset of the resources on the NASA IPG. While the colors can't be seen here, the boxes around each computer system icon indicate whether status information from the machine has been received recently. The boxes are colored green when information has been received recently, yellow when an expected event has not been received, and red when two expected events have not been received. The vertical progress bars next to each computer system icon show what fraction of the CPUs in that computer system is being used. The lines between computer systems indicate whether the machines can ping each other; they are also colored green, yellow, and red to indicate whether pings have been successful.
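The traffic-light coloring rule just described is simple enough to capture directly in code; this sketch (the class and method names are hypothetical, not taken from the GUI's source) maps the number of consecutively missed events to a color:

```java
// Sketch of the GUI's traffic-light rule: green if the latest expected event
// arrived, yellow after one missed event, red after two or more.
public class StatusColor {
    public static String color(int missedEvents) {
        if (missedEvents <= 0) return "green";
        if (missedEvents == 1) return "yellow";
        return "red";
    }

    public static void main(String[] args) {
        System.out.println(color(0) + " " + color(1) + " " + color(2));
    }
}
```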
The progress bars next to the lines show the fraction of the maximum measured bandwidth, in both directions, that was available during the last bandwidth experiment. The computers and networks to monitor, along with the icon selection and placement of all of the graphical components, are stored in a configuration file that is loaded when the GUI starts. As can be seen from the figure, a user of the management GUI can quickly understand the status of some of the major IPG systems and the networks that connect them. Users can also click on any of the machines or network connections to display more detailed information, such as that shown in Figure 4. This interface can also be used to perform administrative tasks. At the current time, the interface allows an administrator to add, remove, and modify users in the GRAM grid map files on the remote computer systems. This interface provides several different ways to view user access. One display shows the grid-user-to-local-user-name mappings for a single computer system and allows modifications to these mappings. Another display shows all of the computers that a grid user has access to and all of the local user names the grid user maps to. An administrator can use this display to add or remove access to computer systems and to specify which local user name a grid user should map to.

4. Related Work

There are many existing systems for remote monitoring and management of networks and computer systems. A few commercial systems are OpenView [4] from Hewlett-Packard, Sun Management Center [10] from Sun, CiscoWorks2000 [2] from Cisco, Unicenter Network and Systems Management [11] from Computer Associates, Tivoli Enterprise Console [5] and related products from IBM, PATROL [8] from BMC Software, and SiteAssure [9] from Platform Computing. These systems typically provide a wide range of monitoring and management services for a variety of resources. There are several problems with using these tools in our current grid environments.
First, these products have a cost associated with them, which may be difficult for all of the participants in a multi-institution research grid to afford.
Other problems are the lack of standards and compatibility between products and the lack of portability caused by the unavailability of source code. Further, such tools do not support the grid security infrastructure. There are other systems that are free for noncommercial use or open source, but these tools tend to lack the functionality of the commercial tools previously mentioned and would need to be extended to manage grid services. Examples of such tools are NetSaint [24] and Big Brother [1]. Many of these tools are based on the Simple Network Management Protocol [39], which can provide information on networking and other types of resources and services.

Another area of related work is distributed event services, one of the core components of our framework. There has been a large amount of work in this area. CORBA has defined an event service [31] and a notification service [32]; the problem with these services is that they are part of CORBA, which is not commonly used in grid computing. The Java Message Service [27] supports distributed events, but only from the Java language. There are also research projects to develop distributed event services, such as Siena [13], Elvin [33], Echo [18], OIF [19], and XEvents [34], and research projects to perform various types of monitoring, such as NWS [43], JAMM [41], and MDS [16]. Many of these services are quite usable; the main advantage of the one that is part of our framework is that it will continue to be compatible with the standards defined in the Global Grid Forum. Each implementation of an event service or monitoring system will have strengths and weaknesses in terms of programming language, performance, and usability; standards allow users to select the best implementations for their needs and still communicate with other implementations that are optimized for different purposes.
5. Summary and Future Work

Our efforts to deploy a computational grid at NASA have demonstrated the need for tools to observe and control the resources, services, and applications that make up grids. This has led to our development of CODE, which provides a secure, scalable, and extensible framework for making observations on remote computer systems, transmitting this observational data to where it is needed, performing actions on remote computer systems, and analyzing observational data to determine what actions should be taken. A prototype of our framework is complete and we are continuing to improve it. In addition to the core framework, we have implemented sensors for measuring various properties such as process status, file characteristics, disk space, CPU load, network interface characteristics, and LDAP search performance. We have also implemented a few simple actuators. We have used this framework to develop a prototype grid management system that allows administrators to observe the current status of the resources and services that make up a grid, to correct problems when they appear, and to perform administrative tasks such as modifying which grid users can access which computer systems. The grid management system also records failures that occur so that they can be examined at a later time.

In the future we will continue to improve and extend the CODE framework, and we will improve our Grid Management System as we gain experience from its use. Further, we will track the standards for grid event services that are being developed in the Global Grid Forum and will strive to be compatible with those standards.

Acknowledgments

We gratefully acknowledge the help of Abdul Waheed, who participated in the early phases of this project, and of Jerry Yan, who provided the initial motivation.
We also wish to thank Dan Gunter, Ruth Aydt, Brian Tierney, Dennis Gannon, and Valerie Taylor for the many useful discussions we have had related to this work, both inside and outside of the Grid Forum. This work has been supported by the NASA HPCC and CICT programs.

References

[1] "Big Brother."
[2] "CiscoWorks2000."
[3] "The DOE Science Grid."
[4] "HP OpenView."
[5] "IBM Tivoli Enterprise Console."
[6] "The National Computational Science Alliance."
[7] "The National Partnership for Advanced Computing Infrastructure."
[8] "PATROL Console."
[9] "Platform SiteAssure."
[10] "Sun Management Center."
[11] "Unicenter Network and Systems Management."
[12] "Xindice."
[13] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf, "Achieving Scalability and Expressiveness in an Internet-Scale Event Notification Service." In Proceedings of the Nineteenth ACM Symposium on Principles of Distributed Computing, Portland, OR, 2000.
[14] C. Catlett and L. Smarr, "Metacomputing," Communications of the ACM, vol. 35, 1992.
[15] J. Clark and S. DeRose, "XML Path Language (XPath) Version 1.0," World Wide Web Consortium, November 1999.
[16] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, "Grid Information Services for Distributed Resource Sharing." In Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing, 2001.
[17] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke, "A Resource Management Architecture for Metasystems," Lecture Notes in Computer Science, vol. 1459, 1998.
[18] G. Eisenhauer, F. Bustamante, and K. Schwan, "Event Services for High Performance Computing." In Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing, 2000.
[19] R. E. Filman and D. D. Lee, "Managing Distributed Systems with Smart Subscriptions." In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV.
[20] S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, and S. Tuecke, "A Directory Service for Configuring High-Performance Distributed Computations." In Proceedings of the 6th IEEE International Symposium on High Performance Distributed Computing, 1997.
[21] I. Foster and C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit," International Journal of Supercomputer Applications, vol. 11, 1997.
[22] I. Foster and C. Kesselman, eds., The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.
[23] I. Foster, C. Kesselman, G. Tsudik, and S. Tuecke, "A Security Architecture for Computational Grids." In Proceedings of the 5th ACM Conference on Computer and Communications Security, 1998.
[24] E. Galstad, "NetSaint Network Monitor."
[25] J. Giarratano, Expert Systems: Principles and Programming. Brooks/Cole Publishing.
[26] A. Grimshaw, W. Wulf, J. French, A. Weaver, and P. Reynolds
Jr., "Legion: The Next Logical Step Toward a Nationwide Virtual Computer," Department of Computer Science, University of Virginia, Technical Report CS-94-21, June 1994.
[27] M. Hapner, R. Burridge, R. Sharma, J. Fialli, and K. Stout, "Java Message Service Specification v1.1," Sun Microsystems.
[28] T. Howes and M. Smith, LDAP: Programming Directory-Enabled Applications with Lightweight Directory Access Protocol. Macmillan Technical Publishing, 1997.
[29] W. Johnston, D. Gannon, and B. Nitzberg, "Grids as Production Computing Environments: The Engineering Aspects of NASA's Information Power Grid." In Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing, 1999.
[30] M. Litzkow and M. Livny, "Experience with the Condor Distributed Batch System." In Proceedings of the IEEE Workshop on Experimental Distributed Systems, 1990.
[31] OMG, "Event Service Specification v1.1."
[32] OMG, "Notification Service Specification v1.0."
[33] B. Segall, D. Arnold, J. Boot, M. Henderson, and T. Phelps, "Content Based Routing with Elvin4." In Proceedings of AUUG2k, Canberra, Australia, 2000.
[34] A. Slominski, M. Govindaraju, D. Gannon, and R. Bramley, "SoapRMI Events: Design and Implementation," Computer Science Department, Indiana University, TR-549.
[35] W. Smith, "A Framework for Control and Observation in Distributed Environments," NASA Advanced Supercomputing Division, NASA Ames Research Center, Moffett Field, CA, NAS technical report.
[36] W. Smith and D. Gunter, "Simple LDAP Schemas for Grid Monitoring," The Global Grid Forum, GWD-Perf-13-1.
[37] W. Smith, D. Gunter, and D. Quesnel, "A Simple XML Producer-Consumer Protocol," The Global Grid Forum, GWD-Perf-8-2.
[38] W. Smith, D. Gunter, and D. Quesnel, "An XML-Based Protocol for Distributed Event Services." In Proceedings of the 2001 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, 2001.
[39] W.
Stallings, SNMP, SNMPv2, and CMIP: The Practical Guide to Network-Management Standards. Reading, Massachusetts: Addison-Wesley, 1993.
[40] B. Tierney, R. Aydt, D. Gunter, W. Smith, V. Taylor, R. Wolski, and M. Swany, "A Grid Monitoring Service Architecture," Global Grid Forum Performance Working Group.
[41] B. Tierney, B. Crowley, D. Gunter, M. Holding, J. Lee, and M. Thompson, "A Monitoring Sensor Management System for Grid Environments." In Proceedings of the 9th IEEE Symposium on High Performance Distributed Computing, Pittsburgh, PA, 2000.
[42] A. Tirumala and J. Ferguson, "Iperf Version 1.2."
[43] R. Wolski, "Forecasting Network Performance to Support Dynamic Scheduling Using the Network Weather Service." In Proceedings of the 6th IEEE Symposium on High Performance Distributed Computing, 1997.