Monitoring Message Passing Applications in the Grid with GRM and R-GMA

Norbert Podhorszki and Peter Kacsuk
MTA SZTAKI, Budapest, H-1528 P.O.Box 63, Hungary
pnorbert@sztaki.hu, kacsuk@sztaki.hu

Abstract. Although there are several tools for monitoring parallel applications running on clusters and supercomputers, they cannot be used in the grid without modifications. GRM, a message-passing parallel application monitoring tool for clusters, has been connected to the infrastructure of R-GMA, the information and monitoring system of the EU-DataGrid project, in order to collect trace information about message-passing parallel applications executed in the grid. This paper describes their connection.

1 Introduction

The monitoring of grid applications is a new area for research. Existing tools developed for clusters and supercomputers are not usable without redesign. One of the main reasons is that they cannot be set up on the grid for monitoring at all. Setting up such a tool on a cluster or supercomputer requires direct access rights to the resources and a priori knowledge of the machines where the target application is executed. On the grid this knowledge is not available, so the tools cannot be started and used for collecting information about the application.

This is also the case for GRM [1], a semi-on-line monitor for message-passing parallel applications. To adapt it to the grid, both the start-up mechanism and the data transfer had to be modified. Within the EU-DataGrid project [2], GRM is connected to R-GMA, the grid information and monitoring system. GRM uses R-GMA as a service to publish trace information about the monitored application and to transfer the trace to the user's site.

In this paper, the connection of the two tools is described. First, GRM and R-GMA are briefly introduced. Then their connection is presented. Finally, the visualised trace of a small MPI example is shown as a result.

2 GRM Application Monitor

GRM [3] is an on-line monitoring tool for performance monitoring of message passing parallel applications running in the grid. PROVE is a performance visualisation tool for GRM traces. When requested, GRM collects trace data from all machines where the application is running and transfers it to the machine where the trace is visualised by PROVE.

To enable monitoring of an application, the user should first instrument the application with trace generation functions. GRM provides an instrumentation API and library for tracing. The instrumentation API is available for C/C++ and Fortran. The basic instrumentation functions cover the start and exit, send and receive, multicast, and block begin and end events. More general tracing is possible through user-defined events. For this purpose, the format string of a new user event is defined first, similarly to C printf format strings. The predefined event format can then be used for trace event generation, passing only the arguments each time. The instrumentation is explained in detail in [4].

The trace is event record oriented. One record (line) in the trace file represents one trace event of the application. Each record starts with a header containing the type of the event, its generation time and the id of the generating process. The remainder of the record contains the values for that given type of event.

PROVE is a trace visualisation tool that presents traces of parallel programs (see Fig. 1) collected by GRM. Its main purpose is to show a time-space diagram of the trace, but it also generates several statistics from the trace that help to discover performance problems, e.g. a Gantt chart, communication statistics among the processes/hosts and detailed run-time statistics for the different blocks in the application process.

Fig. 1. Visualisation of trace and statistics in PROVE

These tools have been the basis for the development of a grid application monitor that supports on-line monitoring and visualisation of parallel/distributed applications in the grid. GRM can be used as a stand-alone tool for grid application monitoring; its architecture is described in [3]. However, the problem of firewalls cannot be overcome
by GRM itself. If a firewall blocks a connection between the components of GRM, the tool is not able to collect the trace from the application processes. To solve this problem, a proxy-like solution is needed which connects two components through a chain of connections from one component towards the other, via some hops in the network. Such a solution should be a service which is always available in the grid. Instead of creating a new GRM service, we turned to R-GMA, which is a continuously running grid monitoring service and can also be used to transfer trace data through the network.

3 R-GMA, a Relational Grid Monitoring Architecture

R-GMA (Relational Grid Monitoring Architecture, [5]) is being developed as a Grid Information and Monitoring System for both the grid itself and for use by applications. It is based on the GMA concept [6] from the Global Grid Forum, which is a simple Consumer-Producer model. The special strength of this implementation comes from the power of the relational model. It offers a global view of the information as if each Virtual Organisation had one large relational database. It provides a number of different Producer types with different characteristics; for example, some of them support streaming of information. It also provides combined Consumer/Producers, which are able to combine information and republish it. At the heart of the system is the mediator, which for any query is able to find and connect to the best Producers to do the job.

R-GMA is not a general distributed RDBMS, but it provides a way of using the relational data model in a Grid environment [7]. All the producers of information are quite independent. It is relational in the sense that Producers announce what they have to publish via an SQL CREATE TABLE statement and publish with an SQL INSERT, and that Consumers use an SQL SELECT to collect the information they need.
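
The relational usage pattern described above can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module as a local stand-in for R-GMA; it is not the R-GMA API, and the table name trace_event, its columns and the helper functions open_store, publish and consume are hypothetical names chosen for this illustration. The columns loosely follow the trace record header described in Section 2 (event type, generation time, process id).

```python
import sqlite3


def open_store():
    """Create an in-memory database that stands in for R-GMA (assumption)."""
    conn = sqlite3.connect(":memory:")
    # Producer side: announce what will be published (CREATE TABLE).
    # The table and column names are hypothetical, not taken from R-GMA.
    conn.execute(
        "CREATE TABLE trace_event ("
        " process_id INTEGER,"
        " event_type TEXT,"
        " timestamp REAL,"
        " payload TEXT)"
    )
    return conn


def publish(conn, process_id, event_type, timestamp, payload=""):
    """Producer side: publish one trace event (INSERT)."""
    conn.execute(
        "INSERT INTO trace_event VALUES (?, ?, ?, ?)",
        (process_id, event_type, timestamp, payload),
    )


def consume(conn, event_type=None):
    """Consumer side: collect the events of interest (SELECT)."""
    query = "SELECT process_id, event_type, timestamp, payload FROM trace_event"
    params = ()
    if event_type is not None:
        query += " WHERE event_type = ?"
        params = (event_type,)
    query += " ORDER BY timestamp"
    return conn.execute(query, params).fetchall()


if __name__ == "__main__":
    store = open_store()
    publish(store, 0, "send", 1.5, "to=1 size=4")
    publish(store, 1, "recv", 2.0, "from=0 size=4")
    for row in consume(store):
        print(row)
```

In R-GMA itself the Producers and Consumers are independent and distributed, and the mediator decides which Producers can answer a query; the single local database here only mirrors the global view that a Consumer sees.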

R-GMA is built using servlet technology and is being migrated rapidly to web services, specifically to fit into an OGSA (Open Grid Services Architecture, [8]) framework.

It is important to emphasize here that R-GMA is a grid service providing an infrastructure that enables developers to create special producers and consumers for specific tasks; it is not in itself a tool usable for a particular purpose such as application monitoring.

4 Connection of GRM and R-GMA

GRM uses R-GMA to deliver trace data from the application process to the machine where the user is running the visualisation tool. The basic structure of the connection can be seen in Fig. 2. The application processes contain the instrumentation library that produces events. The main monitor of GRM runs on the user's host (where PROVE is running as well) and reads trace data from R-GMA. The instrumentation library is a Producer while GRM's main monitor is a Consumer of R-GMA.

R-GMA is distributed among several hosts. It consists of servlets: Registry servlets are placed somewhere in the grid, providing a fault-tolerant service for publishing information about available producers. Other servlets connect to the registry to find a way to communicate with each other. ProducerServlets are placed on several machines. Any producer of data should
connect to one of the ProducerServlets (whose address is set on the host where the producer is running). Similarly, every consumer connects to a ConsumerServlet. The configuration of R-GMA is very flexible so that it can fit the current grid infrastructure. For more detailed information see the architecture documentation of R-GMA [9].

Fig. 2. Structure of GRM in R-GMA

R-GMA is always running in the grid as a service, while GRM's main monitor is started by the user when the job is submitted. The application processes start to behave as producers of R-GMA when they are launched. This way, the monitoring chain is built up as the application starts. The instrumentation functions automatically connect to R-GMA at the start of the processes, and trace events are published to R-GMA. GRM's main monitor acts as a Consumer of R-GMA, looking for trace data and receiving it from R-GMA. The delivery of data from the machines of the running processes to the collection host is now the task of R-GMA.

As can be seen in Fig. 3, R-GMA uses several servlets and buffers to deliver the trace data to the consumers. There is a local buffer in the application process itself that can be used to store data temporarily if a large amount of trace is generated quickly. The processes are connected to ProducerServlets, which are in turn connected to ConsumerServlets. Both kinds of servlets create a separate buffer for each Producer/Consumer that connects to them.

The mediator functionality of R-GMA ensures that all matching information for a specific query is merged from several data sources and that the consumer receives all information in one data stream. Thus, GRM's main monitor receives the whole application trace data in one single stream.
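
The buffering and merging behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not R-GMA code: the class ProducerBuffer and the function merged_stream are hypothetical names, and merging by timestamp is an assumption made here for clarity, since the text only states that the consumer receives all matching data in one stream.

```python
import heapq
from collections import deque


class ProducerBuffer:
    """Hypothetical stand-in for the buffer a servlet keeps per producer."""

    def __init__(self, name):
        self.name = name
        self.events = deque()

    def publish(self, timestamp, event_type):
        # Each producer is assumed to publish its events in time order.
        self.events.append((timestamp, self.name, event_type))

    def drain(self):
        # Hand over buffered events one by one, emptying the buffer.
        while self.events:
            yield self.events.popleft()


def merged_stream(buffers):
    """Merge several per-producer buffers into one stream.

    heapq.merge requires every input to be sorted already, which holds
    here because each producer appends in timestamp order (assumption).
    """
    return heapq.merge(*(buf.drain() for buf in buffers))


if __name__ == "__main__":
    p0, p1 = ProducerBuffer("P0"), ProducerBuffer("P1")
    p0.publish(1.0, "start")
    p1.publish(2.0, "start")
    p0.publish(3.0, "send")
    p1.publish(4.0, "recv")
    for event in merged_stream([p0, p1]):
        print(event)
```

The consumer iterates over a single sequence and does not need to know how many producers contributed to it, which is the property GRM's main monitor relies on.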

Fig. 3. Buffering and delivery of trace data within R-GMA

The traces of different applications are distinguished by a unique id for each application. This id works as a key in the relational database schema, and one instance of GRM looks for one application with a given id/key. A suitable id would be the global job id of the application, which is defined by the grid brokering system. Currently, however, there is no defined way for the instrumentation functions within the application processes to obtain this id. In the current version of GRM, the user therefore has to define a unique id for the application.

After the application is submitted and GRM's main monitor is started, the main monitor connects to R-GMA immediately and subscribes for traces with the id of the application. When R-GMA gives a positive response, GRM starts reading the trace continuously from R-GMA.

4.1 Monitoring of MPI applications

As an example, the code of the systest demo application of the MPICH package is instrumented and the generated trace is shown in PROVE. The systest program performs two different tests. In the Hello test each process sends a short message to all the others. In the Ring test, the processes form a ring based on their ranks: process 0 is connected to processes 1 and N-1, where N is the number of processes. Starting from process 0, messages of ever increasing size are sent around the ring, finally arriving at process 0 again.

In the top window of the screenshots in Fig. 4, PROVE presents the full execution. The arrows on the left side of the picture represent the messages of the Hello test, while the many arrows on the right side of the picture represent the Ring test. In between, the large section with light color represents an artificially inserted sleep statement in the program, which makes the different phases clearly distinguishable.

The bottom left screenshot is a zoom into the left part. In this test each process sent a message to all the others, one by one. The first blocks with light color represent the barrier in the program. The triangle symbol representing the start of process P0 can also be seen on the left. The blocks with light color on the right are the sleeping sections in the processes.

The bottom right screenshot shows the right side of the full trace. In this test messages with sizes 1, 2, 4, ..., 524288 bytes are sent around the processes. The communication time grows with the size of the message. The sending and receiving phases in the processes are distinguished by the alternating lighter and darker blocks. The triangles representing the exit statement in the instrumentation can also be seen on the right side of the picture.

Fig. 4. Trace of MPI systest example application

5 Related work

R-GMA is deployed within the EU-DataGrid project [2]. Other grid projects are mostly based on MDS [10], the LDAP-based information system of Globus; the GridLab [11] project, for example, is extending MDS to provide an information system and is developing a new monitoring system [12]. OGSA [8] specifications and developments also address the issue of information systems, and all projects above (including R-GMA)
will have to redesign their information systems according to OGSA specifications in the future.

In the area of application monitoring, the OMIS [13] on-line monitoring interface, developed for clusters and supercomputers (similarly to the case of GRM), is the basis for a grid application monitoring system within the CrossGrid [14] project.

Netlogger is used for monitoring distributed applications in the grid rather than parallel programs. Its time-space visualisation concept is orthogonal to that of PROVE: the vertical axis shows different types of events, whereas in PROVE it shows the processes of the parallel program. Netlogger can be used for finding performance/behaviour problems in a communicating group of distributed applications/services, while PROVE is suited to a parallel program that communicates heavily within itself. Netlogger, PROVE and other tools such as Network Weather Service and Autopilot were compared in detail at the beginning of the DataGrid project, see [15].

GRM and R-GMA are the first tools that can be used for on-line monitoring of parallel applications running in the grid.

6 Conclusion

R-GMA is a relational Grid Monitoring Architecture delivering the information generated by the resources, services and application processes in the grid. GRM/PROVE is a parallel application monitoring toolset that is now connected to R-GMA. The two systems together can be used for on-line monitoring and performance analysis of message-passing parallel applications running in the grid environment.

7 Acknowledgement

We would like to thank the developers of R-GMA in the EU-DataGrid project for their efforts in helping us to use their system together with GRM to monitor applications. The development of the tools described in this paper has been supported by the following grants: EU DataGrid IST-2000-25182, Hungarian DemoGrid OMFB-01549/2001 and OTKA T042459.

References

1. N. Podhorszki. Semi-on-line Monitoring of P-GRADE Applications.
PDPC Journal, to appear in 2003
2. EU DataGrid Project Home Page: http://www.eu-datagrid.org
3. Z. Balaton, P. Kacsuk, N. Podhorszki, F. Vajda. From Cluster Monitoring to Grid Monitoring based on GRM. Proc. of EuroPar 2001, Manchester, pp. 874-881
4. GRM User's Manual. Available at http://hepunx.rl.ac.uk/edg/wp3/documentation/
5. S. Fisher et al. R-GMA: A Relational Grid Information and Monitoring System. 2nd Cracow Grid Workshop, Cracow, Poland, 2002
6. B. Tierney, R. Aydt, D. Gunter, W. Smith, V. Taylor, R. Wolski and M. Swany. A grid monitoring architecture. GGF Informational Document, GFD-I.7, GGF, 2001, URL: http://www.gridforum.org/documents/gfd/gfd-i.7.pdf
7. Steve Fisher. Relational Model for Information and Monitoring. GGF Technical Report GWD-Perf-7-1, 2001. URL: http://www-didc.lbl.gov/ggf-perf/gma-wg/papers/gwd-GP-7-1.pdf
8. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, and P. Vanderbilt. Grid service specification. GGF Draft Document, 2002, URL: http://www.gridforum.org/meetings/ggf6/ggf6_wg_papers/draft-ggf-ogsi-gridservice-04_2002-10-04.pdf
9. The R-GMA Relational Monitoring Architecture. DataGrid WP3 Report, DataGrid-01-D1.2-0112-0-3, 2001, Available at http://hepunx.rl.ac.uk/edg/wp3/documentation/
10. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Grid Information Services for Distributed Resource Sharing. Proc. of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
11. GridLab project. URL http://www.gridlab.org
12. G. Gombás and Z. Balaton. A Flexible Multi-level Grid Monitoring Architecture. 1st European Across Grids Conference, Universidad de Santiago de Compostela, Spain, Feb. 2003
13. T. Ludwig and R. Wismüller. OMIS 2.0 - A Universal Interface for Monitoring Systems. In M. Bubak, J. Dongarra, and J. Wasniewski, eds., Recent Advances in Parallel Virtual Machine and Message Passing Interface, Proc. 4th European PVM/MPI Users' Group Meeting, LNCS vol. 1332, pp. 267-276, Cracow, Poland, 1997. Springer Verlag.
14. B. Balis, M. Bubak, W. Funika, T. Szepienic, and R. Wismüller. An Infrastructure for Grid Application Monitoring. In D. Kranzlmüller, P. Kacsuk, J. Dongarra, and J. Volkert, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, volume 2474 of Lecture Notes in Computer Science, pp. 41-49, Linz, Austria, September 2002. Springer-Verlag.
15. Information and Monitoring: Current Technology. DataGrid Deliverable DataGrid-03-D3.1. URL: https://edms.cern.ch/document/332476/2-0