CHAPTER 2

GRID MONITORING ARCHITECTURE AND TOOLS USED FOR GRID MONITORING

This chapter presents a literature survey of Grid computing, Grid standards, the Globus Toolkit architecture, the Grid monitoring process, Grid monitoring models, mobile agents, Grid network monitoring, the role of agents in Grid monitoring, job monitoring, automated deployment of services in the Grid, and Grid network performance prediction.

2.1 INTRODUCTION

The Grid is a heterogeneous collection of resources that provides reliable access to all of its users, thus enabling scalable virtual organizations for resource sharing among geographically distributed communities. Grid monitoring involves monitoring the available resources and the network. Grid resources may join and leave dynamically, which makes resource monitoring a challenging task. Existing monitoring strategies significantly increase the system overhead as the size of the computing facility grows (Zanikolas and Sakellariou 2005).

2.2 GRID STANDARDS

The Grid is a complex system in which many services collaborate to solve complex scientific applications. The Open Grid Services Architecture (OGSA), developed by the Global Grid Forum (GGF), aims to
define a common, standard, and open architecture for Grid-based applications. The main goal of OGSA is to standardize the services used in Grid applications by specifying a set of standard interfaces for them. OASIS developed a specification called the Web Services Resource Framework (WSRF), which defines what a web service must implement to be WSRF compliant. It is a joint effort of the Grid and Web Services communities. WSRF is the infrastructure on which the OGSA architecture is built: it provides the stateful services that OGSA needs. The OGSA architecture is shown in Figure 2.1.

Figure 2.1 OGSA Architecture

2.2.1 OGSA

OGSA, developed within GGF, describes an architecture for a service-oriented Grid computing environment for business and scientific use. It has been described as a refinement of the emerging Web Services architecture, specifically designed to support Grid requirements. OGSA is a distributed interaction and computing architecture based around services; it assures interoperability on heterogeneous systems so that different types of resources can communicate and share information.
2.2.2 OGSI

The Open Grid Services Infrastructure (OGSI) was published by GGF as a proposed recommendation. It was intended to provide an infrastructure layer for OGSA. OGSI addresses the statelessness of Web services (along with other issues) by extending Web services to accommodate Grid computing resources that are both transient and stateful.

2.2.3 WSRF

A web service by itself is nominally stateless, i.e., it retains no data between invocations. This limits what can be done with web services, although workarounds exist, such as having the web service read from a database. WSRF offers a set of operations that web services may implement to become stateful; web service clients communicate with resource services, which allow data to be stored and retrieved. When clients talk to the web service, they include in the request the identifier of the specific resource to be used, encapsulated within the WS-Addressing endpoint reference. This may be a simple URI address, or it may be complex XML content that identifies or even fully describes the specific resource in question. Alongside the notion of an explicit resource reference comes a standardized set of web service operations to get and set resource properties. These can be used to read and perhaps write resource state, in a manner somewhat similar to having member variables of an object alongside its methods. The Globus Toolkit version 4 includes Java and C implementations of WSRF.

2.2.4 Grid Service

A Grid service is a Web service that conforms to a set of conventions (interfaces and behaviours) that define how a client interacts with
a Grid service. A client gains access to a Grid service instance through Grid Service Handles (GSH) and Grid Service References (GSR). OGSI provides a mechanism, the Handle Resolver, to support client resolution of a GSH into a GSR. OGSI is based on Web services and, in particular, uses the Web Services Description Language (WSDL) to describe the public interfaces of Grid services. WSDL is extensible, allowing the description of endpoints and their messages regardless of the message formats or network protocols used to communicate. WSDL recognizes the need for rich type systems for describing message formats and supports the XML Schema specification (XSD) as its canonical type system.

2.3 GRID MIDDLEWARE

The Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. Grid middleware systems provide users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. To provide such a seamless computing environment, Grid middleware systems need to solve several challenges that originate from the inherent features of the Grid. One of the main challenges is the heterogeneity of Grid environments. Another involves multiple administrative domains and autonomy, because Grid resources are geographically distributed across administrative domains and owned by different organizations. Currently, many organizations and research institutes use such middleware for their Grid operations. Some of the popular Grid middleware systems are the Globus Toolkit (GT), gLite, UNICORE, Gridbus Broker and Condor-G.
2.4 GLOBUS TOOLKIT 4 (GT4) ARCHITECTURE AND ITS COMPONENTS

The Globus Alliance, a community of organizations and individuals developing the fundamental technologies behind the Grid, developed the Globus Toolkit together with many other communities (Globus 2004). The Globus Toolkit (GT) is a community-based, open-architecture, open-source set of services and software libraries used for building Grid systems and applications. The toolkit addresses issues of security, information discovery, communication, resource management, fault detection and portability. Globus Toolkit Version 4.0 (GT4) is a recent implementation of OGSA and includes a complete implementation of the WSRF specification. The Globus middleware was initially designed to enable applications that aggregate distributed resources, whether computers, storage, data, services, networks, or sensors. Globus was motivated by the demands of Virtual Organizations (VOs). It has been developed as a layered architecture in which higher-level services can be built on the lower-level core services. Resource discovery in Globus is done by querying the Monitoring and Discovery Service (MDS4) in GT4, and the LDAP (Lightweight Directory Access Protocol) based information store called the Metacomputing Directory Service (MDS2) in Globus Toolkit 2 (GT2).

2.5 GRID MONITORING

Grid monitoring is characterized by significant requirements including, among others, scalable support for both pull and push data delivery models applied over vast amounts of current and past monitoring data that may be distributed across organizations. The data format of a monitoring system has to balance extensibility and self-description on the one hand against compactness on the other. The former is required to accommodate the
ever expanding types of monitored resources, whereas the latter is a prerequisite for non-intrusive and scalable behavior. The problem is further complicated by the continuous evolution of Grid middleware and the lack of consensus regarding data representation, protocols and semantics, leading to ad-hoc solutions of limited interoperability. Existing proprietary network and host monitoring applications lack the openness required for interoperability and customization, and they also impose significant financial costs.

The main goal of Grid monitoring is to measure and publish the state of resources at a particular point in time. Li and Baker (2005) stated that efficient monitoring must be end-to-end, meaning that it covers all components: software (e.g. applications, services, processes and operating systems), hosts (e.g. CPUs, disks, memory, etc.), and networks (e.g. routers, switches, bandwidth, latency, etc.). In addition to Grid information services (GIS) (Czajkowski et al 2001), monitoring is also crucial in a variety of cases such as scheduling, data replication, accounting, performance analysis and optimization of distributed systems or individual applications, self-tuning applications, and many more (Balaton et al 2001). Monitoring provides background measurements of network performance, which are of value to network managers and to those tasked with providing network services for Grid applications. It also helps in understanding the impact of Grid applications on the operation of networks. This includes storing performance figures for subsequent use by applications, which can adjust their behaviour accordingly. Network monitoring is needed to characterize and quantify network behaviour and to inform Grid applications, via the middleware, of the current status of the network.
It acts as a publication mechanism that helps identify fault conditions in the operation of the Grid and provides input to network configuration and to the development of the Grid infrastructure. There are different
levels of monitoring needed in Grid environments, such as application-specific, node level, cluster/site level, and Grid level. Monitoring Grids includes four stages (Mansouri-Samani and Sloman 1993): Generation of events, in which sensors query entities and encode the measurements according to a given schema; Processing of generated events, which is application-specific and may take place during any stage of the monitoring process, for example, filtering the events according to some predefined criteria; Distribution, which refers to the transmission of the events from the source to any interested parties; and Presentation, which carries out further processing so that the overwhelming number of received events is presented as a series of abstractions, enabling an end-user to draw conclusions about the operation of the monitored system.

2.6 AN OVERVIEW OF GRID MONITORING MODELS

Numerous monitoring architectures have been developed to make Grid monitoring more efficient and scalable.

2.6.1 GMA

An open architecture was proposed by the GGF's Grid Monitoring Architecture Working Group (GMA-WG) (Tierney et al 2002); it is shown in Figure 2.2. It is a producer-consumer-registry model that also defines the interactions between producers and consumers. A producer registers a description of its event stream with the directory service. A consumer contacts the directory service to locate producers that have data relevant to its query. A communication link is then set up directly with each producer to acquire data, either by a publish/subscribe protocol or by a query/response protocol.
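The producer, consumer and directory service interactions described above can be illustrated with a minimal in-memory sketch. This is not part of the GMA specification, which defines no API; the class names `Producer` and `DirectoryService`, the event type string `cpu.load` and the host name `nodeA` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Producer:
    """Hypothetical producer that holds the latest value of one event type."""
    name: str
    event_type: str
    latest: float = 0.0
    subscribers: List[Callable[[str, float], None]] = field(default_factory=list)

    def query(self) -> float:
        # Query/response: the consumer pulls the current value on demand.
        return self.latest

    def subscribe(self, callback: Callable[[str, float], None]) -> None:
        # Publish/subscribe: the consumer registers a callback once.
        self.subscribers.append(callback)

    def publish(self, value: float) -> None:
        # Store the new value and push it to every subscribed consumer.
        self.latest = value
        for callback in self.subscribers:
            callback(self.name, value)


class DirectoryService:
    """Hypothetical registry that maps event types to registered producers."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[Producer]] = {}

    def register(self, producer: Producer) -> None:
        self._by_type.setdefault(producer.event_type, []).append(producer)

    def lookup(self, event_type: str) -> List[Producer]:
        # Only producer descriptions pass through the registry, never data.
        return list(self._by_type.get(event_type, []))


# A producer registers itself; a consumer locates it and talks to it directly.
registry = DirectoryService()
cpu = Producer(name="nodeA", event_type="cpu.load")
registry.register(cpu)

received = []
for producer in registry.lookup("cpu.load"):
    producer.subscribe(lambda name, value: received.append((name, value)))

cpu.publish(0.75)                                # push path
pulled = registry.lookup("cpu.load")[0].query()  # pull path
```

In this sketch the registry is consulted only for discovery, and the monitoring data itself flows directly between producer and consumer, which is the separation the architecture relies on.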
Figure 2.2 Components of GMA and interactions

Consumers may also register with the directory service; they are then notified whenever new producers become available. Intermediary components consist of both a consumer and a producer. Intermediaries may be used to forward, broadcast, filter, aggregate or archive data from other producers. The intermediary then makes that data available to other consumers from a single point in the Grid. The Grid Monitoring Architecture (GMA) is scalable because it separates the tasks of information discovery, enquiry, and publication. However, GMA does not define a data model, a query language or a protocol for data transmission, nor does it specify how information should be stored in the directory service.

2.6.2 R-GMA

The Relational Grid Monitoring Architecture (R-GMA) (Cooke et al 2004), built as part of the EU DataGrid project, is a framework that combines Grid monitoring and information services based on the relational model. R-GMA provides an implementation of GMA that uses a relational model. Database producers are used for static data stored in databases, whereas stream producers are used for dynamic data stored in memory-resident circular buffers. New producers announce their relations using an SQL create table query, offer them via an SQL insert statement, and drop their
tables when they finish. A consumer is defined as an SQL select query. For a component to act as either a consumer or a producer, it has to instantiate a remote object (agent) and invoke methods from the appropriate (consumer or producer) API. The global schema comprises a core set of relations, while new relations can be created and dropped by producers dynamically. Republishers are defined as one or more SQL queries that provide a relational view on data received from producers or other republishers. The registry holds the relations and views provided by database producers, stream producers and republishers. A mediator uses the information available in the registry and cooperates with consumers to dynamically construct query plans for queries. R-GMA provides access to information about a Virtual Organization's resources as if it were stored in a single RDBMS, and it can be extended across VOs. Overall, the system supports good scalability through replication of the global schema and the registry, and through the combination of data sources into a hierarchy of republishers. Two properties distinguish R-GMA from GMA. First, to supply or obtain information from R-GMA, a user does not need to know about the directory service, because the consumer and producer handle it. Second, the information and monitoring system appears as one large relational database and can be queried as such.

2.6.3 MapCenter

MapCenter (Bonnassieux et al 2002), developed as part of the EU DataGrid project, is used to monitor and display the availability of services across the Grid. It builds and periodically updates a model of the network services available in the Grid and provides this information in several logical views (sites, Virtual Organizations (VOs), applications, geographical) through a web interface. It supports automatic polling and
permits users to probe services on demand. MapCenter does not keep details concerning the configuration and utilization of resources. However, it does allow users to dynamically query an MDS server through a PHP-based LDAP client.

2.6.4 NWS

The Network Weather Service (NWS) provides non-intrusive performance monitoring and forecasting within distributed systems (Wolski et al 1999). It supports scheduling and dynamic resource allocation for distributed computational environments (Wolski 1998). NWS sensors estimate CPU load, memory utilization, and end-to-end network bandwidth and latency for all possible sensor pairs (NWS 2004). Sensors combine passive and active monitoring to obtain accurate measurements, and they are stateless, both to improve robustness and to reduce the intrusiveness of network monitoring (Swany and Wolski 2002). An NWS sensor must be installed and configured on each host so that it can periodically collect local information. Network sensors use a set of techniques to avoid conflicts among contending sensors. Sensors are managed through a sensor control process and their events are sent to a memory service, both of which can be replicated for load distribution and fault tolerance. All components subscribe to an LDAP-based registry (the name service) using a soft-state protocol, so Globus users are able to query the NWS name server to obtain performance data.

2.6.5 Ganglia

Ganglia (Massie et al 2004) is an open source hierarchical monitoring system for high performance computing systems such as clusters and Grids. At the cluster level, membership is determined with a broadcast, soft-state protocol, i.e., membership must be periodically renewed by explicit messages
or otherwise expires. All nodes run a multi-threaded daemon (the Ganglia monitoring daemon) that performs the following tasks: it collects and broadcasts External Data Representation (XDR) encoded events from the local host, it listens to the broadcasts sent by other Ganglia nodes, and it answers consumer requests about any node in the local cluster using XML-encoded messages. Ganglia introduces considerable, although linear, overhead at both hosts and networks, at the cluster and hierarchy levels, because of the multicast updates and the XML event encoding. Republishers connected through Wide Area Network (WAN) links add network intrusiveness and increase the associated costs. Other concerns include the reliance on IP multicast messaging and the lack of a registry, since Ganglia was primarily intended for clusters, which are fairly static compared to Grids.

2.6.6 MDS

The Monitoring and Discovery Service (MDS) provides an LDAP-based information infrastructure suited to Grid environments (Czajkowski et al 2001). The MDS implementation has two components, GRIS and GIIS. The Grid Resource Information Service (GRIS) implements a uniform means of querying resources on a Grid for their current status and configuration. The Grid Index Information Service (GIIS) provides a framework for forming an index over various GRISs or other GIISs. This combines the information of an entire system, thereby giving a way to search a coherent system image. MDS (MDS 2005) uses the concept of information providers that act as probe utilities or sensors for the smaller components of a Grid. Since it is based on LDAP, MDS needs a schema that represents the hierarchy and rules of the information to be retrieved from the
components of a Grid. Caching mechanisms are used to make the retention of data more efficient, based on the time-sensitivity of each piece of information. MDS uses an extensible framework with a hierarchical structure for managing static and dynamic information about the status of Grid components generated by information providers. Index services aggregate lower-level data using a soft-state registration protocol, with caching to minimize the transfer of data that has not yet become stale. MDS2 is built on top of LDAP, which enables only simple query processing (Wahl et al 1997). MDS3 is based on the Globus Toolkit 3 (GT3) information services component and uses service data elements defined in OGSA. The well-known Grid middleware GT4 (Borja and Lisa 2005) has MDS4, which monitors the resources in the Grid using an LDAP service to access the resource metrics, but MDS does not provide network metrics and does not support complex querying (Schopf et al 2005).

2.6.7 NetLogger

The Network Application Logger Toolkit (NetLogger) presents an overall view of performance bottlenecks, but with significant intrusiveness, because the activation nodes periodically poll the activation manager and the applications periodically check their configuration files (Gunter et al 2003). It includes wrappers for UNIX system and network monitoring tools, which generate monitoring events. Applications can transparently and dynamically log events to a single destination over a WAN. The NetLogger visualization tool provides an interactive graphical representation of system-level and application-level events, so it is mainly intended for application monitoring.

2.6.8 JAMM

Java Agents for Monitoring and Management (JAMM) uses sensors to collect and publish host monitoring data and uses LDAP servers for
the sensor directory. Sensors can be updated and new sensors can be deployed dynamically; the sensors include host, network, process and application sensors, which monitor CPU, memory, network usage and application errors respectively (Tierney et al 2001). JAMM uses sensors to filter the incoming events according to consumer queries, but it does not support replication. Configurations with multiple sensors potentially increase the load on the event gateway and also increase intrusiveness.

2.6.9 GridRM

GridRM (Grid Resource Monitoring) is based on the producer-consumer model and provides an SQL query language through which clients interact with emerging resource monitoring technologies via drivers (Baker and Smith 2003). Consumers use the registry to find the resources in a Grid site and then query those resources. Resources can be searched using the naming schemas that their drivers provide and also through SQL queries. Fault tolerance is achieved with multiple GridRM gateways and a replica of the registry, but the system suffers from scalability issues.

2.6.10 Autopilot

Autopilot is based on GMA and provides an interface for real-time adaptive control of parallel and distributed computing resources. Sensors are configured to return specific resource information (Ribler et al 1998). Clients must be configured in advance to retrieve information, because the sensors do not support a query interface.

2.6.11 GridICE

GridICE (Andreozzi et al 2005) is mainly used to monitor Grid resources in order to analyze their use, behavior and performance. Multiple users
can view the status information of resources concurrently. GridICE does not have an event mechanism to provide notification of new resources. It can act as an information provider to MDS2, but this is not supported for MDS3 and MDS4.

2.6.12 Mercury

Mercury is a monitoring system that supports application monitoring, self-tuning, performance analysis and prediction (Mercury 2005). It is based on GGF's GMA and on Autopilot. Multiple clients can monitor multiple resources concurrently. It monitors host and application information using sensors installed on the hosts. The host sensors are operating system specific and provide a defined range of host information.

2.6.13 GridMon

GridMon is a network performance monitoring toolkit for identifying faults and inefficiencies (GridMon 2004). It consists of a set of tools, such as PingER and IPerf, that report RTT, packet loss, TCP and UDP throughput, and jitter. The bbcp and bbftp tools are used to copy files between Grid sites.

2.6.14 SCALEA-G

SCALEA-G is a GMA-based system that provides resource monitoring and performance analysis in the Grid (Truong and Fahringer 2003). It comprises several Grid services, such as registry, client, archival, sensor and sensor manager services. The system sensors monitor the network links, hard disks, memory usage and CPU availability. The performance analyzer analyzes the collected data and provides the result to the user.
2.6.15 Remos

The Resource Monitoring System (Remos) provides performance measurements of local and wide area networks to network-aware applications (Dinda et al 2001). It has a query-based interface and several types of collectors that support heterogeneous networks through SNMP. Remos not only provides current load measurements but also supports predictions of host load and of network measures such as bandwidth utilization and latency.

2.6.16 TopoMon

TopoMon is a Grid network monitoring tool based on NWS (Den Burger et al 2002). It considers two network characteristics, latency and bandwidth, between the cooperating computers in a Grid environment. Its network topology sensors explore the structure of the Grid network and make it available to applications, which can then assess the performance of all end-to-end network paths.

2.7 MOBILE AGENTS

Mobile agents are considered one of the most powerful forms of code mobility (Danny 2002). They can exploit the high processing power available in server machines by shifting computations to the server side. Mobile agent technology is well suited to coping with system heterogeneity and to deploying user-customized procedures on remote sites. Mobile agents are autonomous, intelligent programs that move through a network, searching for and interacting with services on the user's behalf, and they possess inherent navigational autonomy (Carzaniga 2002). The role of mobile agents is vital in the Grid because they support local monitoring, local correlation on management devices, dynamic deployment of agents, good
scalability, continued execution on a device when a link is down or unreliable, and the return of results when the link becomes available (Puliafito and Tomarchio 2000). Mobile agents provide an effective way of migrating code and data together and returning the results to the original user. Unlike RMI, which transfers data alone using stubs and skeletons, mobile agents migrate both the code and the data. Mobile agent based network monitoring in the Grid helps to use the idle resources available in geographically separated areas more effectively. The choice between the mobile agent and client-server paradigms is discussed by Carzaniga et al (1997), and Fuggetta et al (1998) compared their performance. The comparison between mobile agent and RMI-based applications has also attracted growing attention, probably because mobile agents have better fault tolerance than RMI (Aderounumu et al 2006). El-Gamal et al (2007) presented distributed information processing with agent technology. In a mobile computing scenario with low-bandwidth connection links, agents are a better fit because the code to be executed leaves the portable device, migrates into the network and performs the necessary actions at the remote execution site. Only the result is returned to the mobile device (Gray et al 2001). In the field of network management, the mobile agent paradigm has the following advantages: reduction of the network load; the opportunity to analyze monitoring data where it is collected; filtering of monitoring data at several abstraction levels; asynchronous and independent execution of user-defined tasks; integration of heterogeneous network monitoring tools; and on-demand enabling of services without high overheads for the system.

2.7.1 Aglets

Aglets are Java objects that can move from one host on the network to another. An aglet executing on one host can suddenly halt
execution, dispatch itself to a remote host, and resume executing there. When the aglet moves, it carries its program code as well as the states of all its objects. Its basic operations include creation, cloning and dispatching. A built-in security mechanism makes it safe to host untrusted aglets. Aglets support dynamic and powerful communication, which enables agents to communicate with unknown agents as well as well-known agents (Aglets 2004).

2.8 GRID NETWORK MONITORING

Grid environments provide an integrating infrastructure for distributed high-performance scientific applications (Foster and Kesselman 1999). Although many ideas have been presented in the field of network management for Grid environments, most of the techniques suffer from scalability-related issues. The IETF IP Performance Metrics Working Group (IPPM-WG) and the GGF Network Measurement Working Group (NM-WG) categorize a set of network characteristics in order to identify the various types of network measurements. The proposed network monitoring system follows the guidelines for network measurement classification and identification specified by Lowekamp et al (2004). Network monitoring in the Grid is an active research area, and different architectures have been proposed (Tierney et al 2001, Gunter et al 2003). This thesis mainly focuses on network monitoring in the Grid, based on GMA and network performance measurements, to improve resource utilization and reduce the load on Grid resources. Users, services, and data all need to communicate over networks, so network information is necessary for Grid schedulers to perform their tasks efficiently (Tomás et al 2008).

Den Burger et al (2002) described a monitoring tool for Grid network topology called TopoMon, which uses NWS sensors to monitor the
performance of the end-to-end paths between all sites of a Grid. It focuses on two network characteristics, latency and bandwidth, whereas the proposed Grid network monitoring system deals with four: bandwidth, RTT, packet loss and jitter. Vicat-Blanc Primet (2003) proposed a network monitoring architecture for the European DataGrid (EDG) project to view network performance from the perspective of Grid applications. It comprises monitoring tools that generate network metrics and provides access to derived metrics such as round trip delay, packet loss, total traffic volume, TCP and UDP throughput, site connectivity and service availability. Vicat-Blanc Primet et al (2004) developed the MapCenter tool to visualize the Grid status in real time and to provide access to the EDG network measurement infrastructure. In practice, network measurement and monitoring often take the form of active measurement, passive measurement and SNMP-based measurement (Shin et al 2007). A Network-based Grid Optimization Service has an optimizer that provides a network quality estimation service between Grid nodes for Grid applications, but it considers only a small set of basic network metrics, namely the average round trip time, latency, the packet loss probability and throughput (Ferrari and Giacomini 2004). Richard et al (2007) presented a meta-scheduling approach called Data Intensive and Network Aware (DIANA), which considers network characteristics when making scheduling decisions in the Grid. It uses a non-preemptive mode of execution, so once a job gets a CPU it is difficult to abort that job and move it to another Grid site. The system can become overloaded by the bulk submission of large jobs. Agustín et al (2011) proposed a strategy for peer-to-peer-based meta-scheduling in Grids that considers the network characteristics.
The nodes forward job queries to all neighbors using a routing indices approach, but the peer-to-peer approach relies on the physical topology, which does not provide the most efficient query forwarding and limits scalability.
Developers of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. To determine the source of these problems, detailed end-to-end monitoring data from applications, networks, operating systems, and hardware must be correlated across time and space, and it is desirable to compare these data in detail from a variety of angles. To address this problem, this thesis presents a monitoring system that uses mobile agents to handle high-volume streams of monitoring data efficiently.

The Resource Monitoring Framework (RMF) provides a Grid monitoring service by using NWS with LDAP facilities together with active and passive monitoring measurements (Ruoyun et al 2005). With passive and active measurements in a Grid environment, it is easy to produce large amounts of raw data. Unplanned network monitoring is performed when a job is submitted to the resource broker, and it does not need a measurement database. Planned network monitoring relies on the availability of a powerful information repository of network measurements, i.e. past experience (Ciuffoletti and Polychronakis 2006). The proposed approach explores both monitoring techniques.

2.9 AGENTS IN GRID MONITORING

Agents can provide a useful abstraction in all layers of the Grid (Rana and Moreau 2000). They are able to adapt to current circumstances, so they can be used to provide services dynamically. They can be used to extend existing computational infrastructures and are suitable for a Grid environment (Chunlin and Layuan 2004). Agents are able to cooperate to provide service advertisement and discovery for the scheduling of applications and to increase Grid resource utilization (Cao et al 2002). Mobile agent technology supports system heterogeneity and deploys procedures, with their code and data, on remote sites. Mobile agents reduce the network load and use less bandwidth by moving logic near data,
and their actions depend on the state of the host environment (Aversa et al 2006). They are capable of working without a dynamic connection between nodes, hence are not affected by network failures, and are much more flexible. Mobile agents are therefore extensively used in resource discovery and monitoring applications for information retrieval (Tomarchio et al 2000). Mobile-agent-based network management equips agents with network management capabilities and allows them to issue requests to managed devices after migrating to those nodes (Puliafio and Tomarchio 2000).

Network performance monitoring and prediction provide the information needed to improve scheduling onto the best resources (for example, where to get or put the data and where to execute the job), as well as for fault detection and troubleshooting, bottleneck identification, and performance analysis and tuning. Existing monitoring strategies significantly increase the system overhead as the size of the computing facility grows (Zanikolas and Sakellariou 2005). Activating the monitoring tools on different grid resources, collecting their data, and filtering it to obtain useful information is a major issue in network monitoring. In this scenario, mobile agent technology can play a vital role because it copes with the system's heterogeneity and reduces the amount of information or queries that must be sent over the network to perform a function and return the result (Dong and Tong 2007).

2.10 JOB MONITORING IN GRID

Balaton and Gombas (2003) presented the importance of monitoring grid jobs in order to know their status and progress. This gives the user control over the job and lets the user know the different job attributes. MASSIVE Monitoring System (MMS) monitors grid jobs on multiple sites and supports multiple users. It is capable of monitoring and steering file operations (Luan et al 2005). G-Monitor is a web portal to
monitor jobs in the Grid. It retrieves information from the resource broker and hence is restricted to data from the submission process (Placek and Buyya 2003). OCM-G is a job monitoring tool for the Grid. It deals with parallel jobs that are distributed across multiple sites (Bali et al 2004).

2.11 AUTOMATED DEPLOYMENT OF SERVICE IN GRID

The dynamic nature of the grid environment raises a need for monitoring the grid. As new resources join the grid, they should be monitored so that resources are used efficiently. System configuration is a major source of error in deployment, and its management dominates system administration costs. Automating the deployment of the monitoring system improves correctness and speed. The script-based approach in Automated Deployment of Monitoring Service (ADMS) utilizes peer-to-peer protocols for automated deployment (Yiduo et al 2007).

2.12 PERFORMANCE PREDICTION

According to He et al (2007), TCP throughput prediction techniques fall into two categories: Formula-Based (FB) and History-Based (HB). FB prediction is based on a mathematical model and can perform well with relatively simple, non-intrusive network measurements, whereas the HB approach predicts from a history of previous measurements. Network Weather Service (NWS) is based on the HB approach (Swany and Wolski 2002). NWS measurements do not agree well with real performance measurements because the predictive engine uses a formula to scale the measurements. NWS is also not well adapted to measuring high-performance links, but it provides a good forecasting technique (Primet et al 2002). The forecasting technique predicts future performance from historic and current data (Vimala Devi et al 2009).
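
As an illustration of the two categories, the following Python sketch contrasts an FB estimate with an HB forecast. It is not taken from He et al (2007) or from the NWS code. The FB function uses the widely cited square-root approximation of Mathis et al. as one example of a throughput formula. The HB function chooses, among three simple forecasters, the one with the lowest accumulated one-step-ahead error over the history, which is only loosely in the spirit of NWS forecasting. All function names, the set of forecasters and the window of five samples are assumptions made for this example.

```python
import math
from statistics import mean, median


def formula_based_throughput(mss_bytes, rtt_s, loss_rate):
    """FB estimate in bytes/s using the square-root approximation:
    throughput ~ (MSS / RTT) * sqrt(3 / (2 * p))."""
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_s must be > 0 and loss_rate in (0, 1]")
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)


# Hypothetical forecaster set: each maps a non-empty history to a prediction.
FORECASTERS = {
    "last_value": lambda h: h[-1],
    "running_mean": lambda h: mean(h),
    "window_median": lambda h: median(h[-5:]),  # window size is an assumption
}


def history_based_forecast(history):
    """HB forecast: replay the history, accumulate each forecaster's
    absolute one-step-ahead error, and predict with the best one.
    Returns (forecaster_name, predicted_next_value)."""
    if len(history) < 2:
        raise ValueError("need at least two past measurements")
    errors = {name: 0.0 for name in FORECASTERS}
    for i in range(1, len(history)):
        past, actual = history[:i], history[i]
        for name, forecaster in FORECASTERS.items():
            errors[name] += abs(forecaster(past) - actual)
    best = min(errors, key=errors.get)  # ties go to the first forecaster listed
    return best, FORECASTERS[best](history)


if __name__ == "__main__":
    # FB: 1460-byte segments, 100 ms round-trip time, 1% loss (made-up inputs).
    print(formula_based_throughput(1460, 0.1, 0.01))
    # HB: made-up throughput samples in Mbit/s.
    print(history_based_forecast([10, 20, 10, 20, 10, 20]))
```

The FB function needs only the segment size, round-trip time and loss rate, while the HB function needs no path model at all but cannot predict until some transfers have been measured. This mirrors the trade-off between the two categories described above.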