Implementing Probes for J2EE Cluster Monitoring



Emmanuel Cecchet, Hazem Elmeleegy, Oussama Layaida, Vivien Quéma
LSR-IMAG Laboratory (CNRS, INPG, UJF) - INRIA
INRIA Rhône-Alpes, 655 av. de l'Europe, 38334 Saint-Ismier Cedex, France
First.Last@inrialpes.fr

1 Introduction

Clusters have become the de facto platform for large J2EE application servers. In production environments, it is necessary to constantly monitor the state of the system to detect failures or performance degradations that may lead to violations of Service Level Agreements. The LeWYS project (http://lewys.objectweb.org) is an open source initiative aiming at building such a monitoring infrastructure. It relies on Julia, a Java implementation of the Fractal component model. In this paper, we focus on the implementation of efficient probes in Java that reify the state of various J2EE cluster components with a controllable intrusiveness.

We first describe in Section 2 the overall architecture of the LeWYS framework. Then, Section 3 presents the design and implementation of hardware and operating system probes for Linux and Windows. J2EE-specific components are probed using the JMX probes introduced in Section 4. Section 5 provides a performance evaluation and Section 6 discusses related work. Section 7 concludes the paper.

2 The LeWYS framework

LeWYS (LeWYS is Watching Your System) is a component-based framework for distributed system monitoring. LeWYS is based on Fractal [1], a Java-based component model featuring hierarchical composition, component sharing and component binding. LeWYS allows building different kinds of monitoring systems for various configurations of distributed systems. Figure 1 depicts the main components commonly found in a monitoring system built using LeWYS: observers receive monitoring data from a set of monitored nodes. A monitoring pump is deployed on each monitored node.
Monitoring pumps allow observers to subscribe to the different probes that are in charge of collecting monitoring data on both hardware and software resources. The monitoring pump timestamps monitoring data and sends it to observers using dynamic event channels. Event channels are made of communication components (e.g. marshallers, TCP sockets) and processing components (e.g. filters, aggregators). Note that LeWYS provides a particular observer, called the monitoring repository, that stores monitoring data in databases. Data can be retrieved from the repository and processed later (e.g. to correlate various indicators).

3 Hardware and OS probes

3.1 Linux probes

Linux, as many other Unix operating systems, reifies the kernel state by way of a virtual filesystem directly mapped onto its in-memory data structures. This filesystem is commonly mounted under /proc and is referenced as the /proc filesystem.
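The /proc filesystem can be read like an ordinary file. As a minimal illustration (not LeWYS code), the sketch below reads the aggregate CPU line of /proc/stat, whose first fields are the user, nice, system and idle tick counters; when /proc is unavailable (non-Linux host), it falls back to a sample line with made-up values:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch (not LeWYS code) of reading CPU jiffies from /proc/stat.
// The first line has the form "cpu  user nice system idle ..." where each
// field is a monotonically increasing tick counter.
public class ProcStatReader {

    /** Parses the aggregate "cpu" line into {user, nice, system, idle}. */
    static long[] parseCpuLine(String line) {
        String[] fields = line.trim().split("\\s+");
        // fields[0] is the "cpu" label; the next four are the counters.
        return new long[] {
            Long.parseLong(fields[1]),   // user
            Long.parseLong(fields[2]),   // nice
            Long.parseLong(fields[3]),   // system (kernel)
            Long.parseLong(fields[4])    // idle
        };
    }

    public static void main(String[] args) throws IOException {
        Path procStat = Paths.get("/proc/stat");
        String cpuLine = Files.exists(procStat)
            ? Files.readAllLines(procStat).get(0)     // real counters on Linux
            : "cpu  4705 150 1120 16250";             // sample values elsewhere
        long[] jiffies = parseCpuLine(cpuLine);
        System.out.println("user=" + jiffies[0] + " idle=" + jiffies[3]);
    }
}
```

A production probe would avoid re-opening and re-parsing the file on every sample, which is exactly the optimization discussed next.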

Fig. 1. Architecture of a monitoring system built with LeWYS (observers, a monitoring repository, and a monitoring pump with its pump thread and memory, network and disk probes on each monitored node).

[2] describes the benefits of several optimizations to gather information from the /proc filesystem. All LeWYS Linux probes follow the same design. When the probe is loaded, the relevant files of the /proc filesystem are parsed to find the list of available resources and the indices of the values within the files. The optimizations include keeping the file descriptor open between reads, reading the /proc files as a whole block (values are regenerated on each access), and processing raw bytes using pre-computed indices without conversion through UTF strings.

We have implemented Linux-specific probes that report monitoring data on the CPU consumption (user, nice, kernel, idle), on the memory (total, used, free, shared, cached, etc.), on disks (issued reads and writes, merged reads and writes, read and write sectors, etc.), on the network (received bytes, received packets, collisions, etc.), and on the kernel (interrupts, context switches, etc.). Note that CPU information is reported for each processor in the system and network usage is reported for each network interface handled by the kernel. Most information is retrieved from few files (one file per probe, except for the memory probe) but units remain resource specific.

Resources that report a monotonically increasing value require further processing to compute throughput. For example, network throughput can be computed from 2 samples (s1 and s2) of the bytes received and bytes transmitted counters, divided by the time elapsed between the 2 samples:

throughput = ((s2.bytes_received + s2.bytes_transmitted) - (s1.bytes_received + s1.bytes_transmitted)) / (s2.timestamp - s1.timestamp)

3.2 Windows probes

Windows provides a performance monitoring infrastructure ensuring that applications have up-to-date information on performance.
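The throughput computation above can be sketched in Java as follows. This is a minimal illustration, not the LeWYS implementation; the Sample class and its field names are assumptions:

```java
// Minimal sketch of throughput computation from two monotonic counter
// samples, as described above. Class and field names are illustrative,
// not the actual LeWYS API.
public class NetworkThroughput {

    /** One reading of the cumulative byte counters, e.g. from /proc/net/dev. */
    static final class Sample {
        final long timestampMillis;
        final long bytesReceived;
        final long bytesTransmitted;

        Sample(long timestampMillis, long bytesReceived, long bytesTransmitted) {
            this.timestampMillis = timestampMillis;
            this.bytesReceived = bytesReceived;
            this.bytesTransmitted = bytesTransmitted;
        }
    }

    /** Throughput in bytes per second between samples s1 and s2 (s2 taken later). */
    static double throughput(Sample s1, Sample s2) {
        long deltaBytes = (s2.bytesReceived + s2.bytesTransmitted)
                        - (s1.bytesReceived + s1.bytesTransmitted);
        double deltaSeconds = (s2.timestampMillis - s1.timestampMillis) / 1000.0;
        return deltaBytes / deltaSeconds;
    }

    public static void main(String[] args) {
        Sample s1 = new Sample(0, 1_000, 500);
        Sample s2 = new Sample(2_000, 151_000, 50_500);
        // (151000 + 50500 - 1000 - 500) bytes / 2 s = 100000 bytes/s
        System.out.println((long) throughput(s1, s2));
    }
}
```

Such post-processing can run in the observer rather than the probe, keeping the probe itself cheap.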
Performance data concerns: (i) physical components, such as processors, disks, and memory; (ii) system objects, such as processes, threads and interrupts; and (iii) application-level services that use Windows mechanisms to provide information about themselves. Performance data is structured in three hierarchy levels called objects, instances, and counters. An object refers to any resource, service or application providing performance data. Examples of performance objects are disk, processor and memory.

An instance is a unique copy of a particular object type relative to a sub-part of this object. For example, the network object may have an instance for each available network interface. Each object (or instance) has several counters that are used to measure various metrics. For instance, metrics of a network instance are: bandwidth utilization, packet-loss ratio, number of active connections and so on.

Fig. 2. Overview of the performance monitoring structure in Windows (monitoring applications such as Perfmon access system and extension performance components through PDH or the registry interface; the LeWYS probes use a native library, accessed through JNI, that queries HKEY_PERFORMANCE_DATA via RegQueryValueEx).

As shown in Figure 2, several system performance components are responsible for system and resource monitoring. On top of these components, Windows provides a registry interface to collect performance data using the Windows registry API. The data is not stored in the registry database; instead, querying the registry key HKEY_PERFORMANCE_DATA causes the system to collect the data from the appropriate performance components. The system returns a uniform data structure containing the requested set of performance data, structured as objects, instances and counter values. Based on this interface, LeWYS implements Windows-specific probes used to gather all performance data made available by the system. On top of the registry interface, a native LeWYS library (LeWYS LIB in Figure 2) provides generic routines to collect performance information, which are then made accessible to LeWYS probes through the Java Native Interface (JNI).
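The object/instance/counter hierarchy can be illustrated with the counter-path notation used by Windows tools such as Perfmon, where a metric is addressed as \Object(Instance)\Counter. The helper below is an illustrative sketch, not part of LeWYS:

```java
// Illustrative sketch of the Windows object/instance/counter hierarchy,
// using the "\Object(Instance)\Counter" path notation known from Perfmon.
// This helper is not part of LeWYS.
public class CounterPath {

    /** Builds a counter path; instance may be null for single-instance objects. */
    static String path(String object, String instance, String counter) {
        if (instance == null || instance.isEmpty()) {
            return "\\" + object + "\\" + counter;          // e.g. Memory has no instances
        }
        return "\\" + object + "(" + instance + ")\\" + counter;
    }

    public static void main(String[] args) {
        // One counter of one instance of the network object:
        System.out.println(path("Network Interface", "eth0", "Bytes Received/sec"));
        // The memory object exposes its counters without an instance:
        System.out.println(path("Memory", null, "Available Bytes"));
    }
}
```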
Unlike other probe types, this information cannot be obtained from the underlying operating system, but rather from the applications themselves. The JMX architecture is basically a 3-tier architecture, in which the back-end tier (known as the instrumentation level) represents the managed applications; the middle tier (or the agent level) is constituted by the JMX agent

(including the MBeans server and the services it provides); while the front-end tier (referred to as the distributed services level) represents the JMX clients and their communication channels with the agent. Our JMX probes represent the clients in the architecture described above. Each JMX probe is responsible for a single JMX agent. During its initialization, it inquires from the agent about all the MBeans registered in it, together with their attributes. JMX probes use RMI connectors to communicate with the agent, so they can be deployed either on the same machine as the agent, or on a remote machine. Currently, LeWYS incorporates both a generic JMX probe and a specialized probe that is responsible for monitoring the C-JDBC [4] middleware, as its JMX interface is based on retrieving values through special operations rather than getter methods.

5 Performance evaluation

5.1 Linux probe performance

Table 1 reports the average time needed to collect resource usage for each LeWYS Linux probe. The measurements were performed using Sun JVM 1.4.2 on a Pentium 4 1.8 GHz machine with 512 MB RAM and a 40 GB IDE disk with 6 partitions, running Linux kernel 2.4.20.

Probe     Number of resources   Average time to probe 1 resource   Average time to probe all resources
CPU       8                     22.9 µs                            23.4 µs
Memory    13                    40.3 µs                            40.7 µs
Disk      66                    30.4 µs                            31.5 µs
Network   48                    25.3 µs                            27.8 µs
Kernel    3                     23.0 µs                            23.0 µs

Table 1. LeWYS Linux probes collection time.

The dominant cost is reading the /proc filesystem, and we perform this read only once regardless of the number of resources to probe. Therefore, there is little difference between gathering the values of all resources or only one. The gathering cost ranges from 23 to 41 µs to probe all resources of a probe. The memory probe costs more than the other probes because it has to collect information from 2 different /proc files. The low probing cost allows a single thread to gather all hardware information of a node in 146 µs.

5.2 Windows probe performance

Table 2 reports the average time needed to collect resource usage for each LeWYS Windows probe.
The measurements were performed using Sun JVM 1.4.2 on a Pentium 4 2 GHz machine with 512 MB RAM and a 40 GB IDE disk with 2 partitions, running Windows 2000. The cost to collect one resource varies from 332 µs for memory to 12093 µs for network, and is relatively high compared to the Linux probes. This cost is mainly due to JNI calls but also to the access to performance data through the registry interface. However, the variation between different probes is due to the fact that some

performance components in Windows (e.g. network) are costly in terms of processing and memory requirements.

Probe     Number of resources   Average time to probe 1 resource   Average time to probe all resources
CPU       10                    661 µs                             725 µs
Memory    9                     332.8 µs                           354.7 µs
Disk      42                    1939.1 µs                          1956.2 µs
Network   32                    12093 µs                           12265 µs
Kernel    9                     1184.4 µs                          1251.6 µs

Table 2. LeWYS Windows probes collection time.

We also observe only a slight difference between probing one resource or all of them, thanks to some optimization features applied in LeWYS. Indeed, because Windows provides thousands of performance objects, retrieving all object data is too costly in terms of processor and memory requirements. Experiments made on Windows 2000 have revealed that retrieving the whole performance data generates more than 95 kilobytes of data. This is due to the fact that querying the performance registry key causes the system to collect data from all monitoring components, which is costly. LeWYS reduces the cost of the native monitoring routines by minimizing access to the registry: only the required subset of information is requested from the registry. For instance, the CPU probe retrieves only the counter values of the processor object, which requires less than 700 bytes of data. Probes requiring several objects group them within the same request so that only one request is made on the registry.

6 Related work

Closely related to our work, the WatchTower [5] system provides a monitoring framework specific to Windows. WatchTower builds upon PDH (Performance Data Helper), an abstract API that hides the complexity of the registry interface. Although this interface gives access to all performance data, it is oriented more towards operations on a single counter rather than groups of counters (each counter value requires an individual access to the registry). Using this interface to collect a large amount of performance information would be too costly. In LeWYS, this cost is reduced by accessing the raw registry interface in an intelligent way.
The JMX specification [3] defines a monitoring service that does not match the requirements of our framework, which aims at capturing the complete behavior of the monitored resources. In particular, the three offered types of monitors have the following inconsistencies with our goals: the String Monitors are not of interest to us since they do not monitor numerical data; the Counter Monitors are suitable only for monitoring integer-typed, continuously increasing variables, which is a special case that cannot be guaranteed, especially when the nature of the monitored resources is not known beforehand; the Gauge Monitors only send notifications when the value of the monitored resources gets out of a predefined range, but no information is obtained regarding the behavior of the resource inside or outside that range.

Ganglia [6] is a toolkit aimed at monitoring clusters and clusters of clusters. It relies on the use of monitor daemons (gmond) and meta daemons (gmetad). A gmond, running on every monitored node, probes a fixed set of hardware resources at a fixed sample rate and multicasts the values on the network to gmetad. Ganglia does not offer any probe for J2EE monitoring. Moreover, it is not as flexible as LeWYS.

Researchers at Charles University have developed Xampler [7], a benchmarking suite that focuses on extensive evaluation of CORBA environments. It provides probes for the Solaris, Windows, and Linux operating systems. LeWYS is complementary to Xampler and targets J2EE environments.

7 Conclusion and future work

In this paper, we have briefly introduced LeWYS, a Java-based monitoring framework targeting J2EE clusters. We have described the implementation of hardware and operating system specific probes, as well as generic J2EE probes (based on JMX). Our preliminary experiments show that it is possible to implement efficient Java-based probes that collect exhaustive resource usage within a J2EE cluster. We are currently evaluating the overhead of the monitoring infrastructure on real e-commerce applications running on J2EE clusters.

Availability

LeWYS is freely available under an LGPL license at the following URL: http://lewys.objectweb.org.

References

1. Eric Bruneton, Thierry Coupaye, Matthieu Leclercq, Vivien Quéma, and Jean-Bernard Stefani. An Open Component Model and its Support in Java. In Proceedings of the International Symposium on Component-Based Software Engineering (CBSE 2004), Edinburgh, Scotland, 2004.
2. C. Smith and D. Henry. High-Performance Linux Cluster Monitoring Using Java. In Proceedings of the 3rd Linux Cluster International Conference, 2002.
3. Java Management Extensions (JMX) Specification, version 1.2, March 2004. Sun Microsystems, http://java.sun.com/products/javamanagement/.
4. E. Cecchet, J. Marguerite, and W. Zwaenepoel. C-JDBC: Flexible Database Clustering Middleware. In Proceedings of FREENIX, 2004.
5. M. Knop, J. Schopf, and P. Dinda. Windows Performance Monitoring and Data Reduction Using WatchTower. In Proceedings of the Workshop on Self-Healing, Adaptive and self-MANaged Systems (SHAMAN), New York City, 2002.
6. Ganglia Toolkit 2.5.0, September 2002. University of California, Berkeley, USA, http://ganglia.sourceforge.net/.
7. Middleware Benchmarking, 2004.
Distributed Systems Research Group, Charles University Prague, http://nenya.ms.mff.cuni.cz/.