theguard! ApplicationManager Operating System Data Collector for Solaris 10 with Virtualization (Zones)
Status: 9/25/2006
Introduction
Performance Features of the ApplicationManager Data Collector for UNIX Operating Systems
Overview of the UNIX Operating System
Special Features of the Solaris 10 Operating System
The UNIX Operating System Data Collector Solaris
Monitoring Hardware and Software Resources with the Solaris DC
Hardware Information (Host)
Information and Monitoring of Hard Disk (Disk)
Information and Monitoring of Memory
Information and Monitoring of Processors (Processor)
Information About the Operating System
Information and Monitoring of File Systems (Filesystem)
Information and Monitoring of Application Processes (Process Groups)
Information and Monitoring of System Swap Space (Memory)
Information and Monitoring of Virtual Operating Systems (Zone)
Standard Reporting and Service Level Analysis
Application Processes and Business Systems
Case Study of a Monitored System
Availability and Platforms
Copyright REALTECH 2006 Side 2 of 2
Introduction
There is more to efficiently managing an application than maximizing its availability. Targeted tuning can increase the performance and stability of business-critical applications without investing in additional hardware (processors, RAM, disk space). A number of data collectors have been developed for theguard! ApplicationManager that provide comprehensive monitoring and generate detailed data analyses.
Data collectors do more than simply collect events according to predefined rules. They deliver every performance value and the current status of each application object in real time. They also provide insight into configuration attributes, such as the release status or the application's parameter settings. Data collectors model an application as objects and sub-objects, so that alerts, monitoring, and status messages can be handled per object. This model keeps the information clearly structured and makes it easy to relate messages to a problem.
Predefined, reusable policies for each type of application simplify implementing the solution and adapting monitoring to dynamic landscapes. Because thresholds are easy to set, potential errors are recognized early. Comprehensive reaction management enables flexible alerting to more than 100 different devices and alarm consoles. The automatic discovery of new application instances and objects, including the automatic allocation of policies, keeps monitoring in place even when system administrators have completely reconfigured the application, for example by adding new instances or objects. Central reporting at the application instance and application object level provides detailed and effective capacity management of all resources.
Integrated Service Level Management ensures that the service levels for application availability and performance are achieved, while Operational Level Agreements (OLAs) can easily be defined at the application object level.
Performance Features of the ApplicationManager Data Collector for UNIX Operating Systems
The operating system, its components, and their reliability, availability, and usability (in short, the system's overall "availability") form the basis of a properly functioning server. A company's business-critical applications run on one or more servers and depend on a functioning operating system and its resources. If downtime occurs because of technical problems, or if processing bottlenecks arise because operating system resources are defective or too small (for example, a defective CPU or hard disk, or too little swap space), applications fail or can no longer be served. This causes follow-up costs.
The operating system data collector for UNIX operating systems (hereafter OS DC) enables comprehensive monitoring of a running UNIX server. With the OS DC, a number of operating system parameters, such as the processes and process groups running on a server, can be monitored in parallel and compared with each other. All hardware and software resources of a server, such as processors, memory, virtual memory, physical hard disks, logical hard disks, services, and processes, are defined and analyzed individually as Managed Objects (MOs) within the framework of the CIM model. This structures the information and allows policies to be controlled individually. With ApplicationManager's event and threshold monitoring, bottlenecks and system overloads can be detected and reported at an early stage.
Together with two other data collectors, the OS DC provides the best security for your system:
o File Parser DC: management of configuration files and security-related system files
o Custom DC: monitoring of user-defined system objects
Administrators and other authorized users can access a machine by simply clicking on the managed node.
The Application Launcher function, which is integrated in the interface, enables the context-based configuration of a detailed-analysis application, in this case a Telnet session.
The OS DC is a very powerful application consisting of a number of objects, event calendars, and performance counters. All of this information is described in the data collector's online documentation. This document provides an overview of the data collector's most important functions.
Overview of the UNIX Operating System
UNIX is a multi-user, multiprocess system: several users can work on the same computer at the same time, and several processes (executable programs) can run at the same time under one user ID (userid).
Applications on UNIX
Most business-critical applications are client/server applications (2-, 3- or 4-tier). The application server processes (hereafter application processes) run on UNIX, whereas the client interface is either a browser or a Windows application on a front-end PC. Applications consist of several application processes, which may communicate across machine boundaries. These processes are started under a system user, and their services must be available at all times with high accessibility and performance so that client requests are answered as well as possible.
Figure 1: Multi-user and multiprocess system (schematic)
Using the UNIX System
Each user who logs in to a UNIX system opens a so-called shell, the environment in which the user's commands are interpreted and executed.
General structure
The operating system is the interface between the hardware and the users and applications.
Figure 2: General structure of UNIX
The hardware is the basis. It consists mainly of the main memory, where processes are executed (temporary storage on the hard disk is possible), the CPU, the hard disks, where files and programs are stored permanently, and the network, which connects the machine you work on to other machines.
The UNIX operating system runs on the hardware. Its main task is to control and manage the hardware and to provide all programs with a standard system call interface. System calls, which run only in kernel mode, allow user programs to create and manage processes, files, and other resources.
The basic tasks of the UNIX operating system are:
Process Management
  o Load, start, interrupt, and end processes
  o Allocate processor time to processes (scheduling)
  o Synchronize the workload distribution on multiprocessor machines
Memory Management
  o Allocate, manage, and release memory for processes
  o Monitor memory and protect it from access by other processes
  o Temporarily move processes to mass storage devices (swapping)
Device and file management, and I/O control
  o Organize and manage the data on mass storage devices in a hierarchical file structure
  o Manage access permissions to the file system
  o Efficiently allocate I/O devices and switching units (data channels, control units) to prevent conflicts; initialize and monitor execution, schedule I/O processes, convert data, and logically control the file system
  o Manage computer resources such as terminals, mass storage devices, I/O ports, and the network
  o Translate common read/write operations into device-specific control signals
  o Coordinate concurrent accesses to I/O resources
Authentication and Access Control
  o Manage users and their access permissions to system resources
  o Log users in to and out of the system
  o Protect system resources from users who lack the required permissions
Log Processing and Error Handling
  o Record important functions and events, such as error statuses, in the system log
  o Recognize error statuses and handle them accordingly
User Interfaces
  o The shell and other service programs, including administration programs
  o Text-based (console) and graphical user interfaces for communication between human and machine
Special Features of the Solaris 10 Operating System
With Solaris 10, the concept of resource management was extended by zones. Zones let you install several virtual Solaris 10 instances on one physical server, which helps to utilize the server's hardware resources optimally. You can assign disks, processors (so-called processor sets), and computing power to each zone. This concept is of particular interest to outsourcers and web hosters, because they can now create a new zone per application and user until the server's hardware resources are optimally utilized.
To monitor the resource requirements of all zones, the Solaris data collector must be installed in the "global" zone. If it is installed in a "local" zone, the data collector sees only that zone, i.e. that one Solaris instance. Because resources are assigned to zones manually, complete monitoring and permanent recording of the load on all physical and virtual resources is essential. This is what the new Solaris data collector provides.
The Solaris DC is licensed per zone. Because all other data collectors for databases, SAP, Net Service, etc. can also run within a zone, the Solaris 10 platform is fully supported by theguard! ApplicationManager.
The UNIX Operating System Data Collector Solaris
The Solaris 10 operating system data collector is divided into two Managed Object Types (MOTs). The first, the Host MOT, represents the physical view of the machine. The second, the OS MOT, provides information that relates directly to the operating system. This clear separation of hardware and software makes it easier to integrate future developments of this platform into the REALTECH CIM model.
The following view shows the structure within the global zone:
Host: the name of this object indicates the model of the hardware in use
  o Disk: the physically available hard disks of this computer
  o Memory: the physical memory of this computer
  o Processor: the physically available processors of this computer
OS: the installed operating system
  o Filesystem: the file systems administered by the operating system (zone)
  o Processor: the processors used by this operating system (zone)
  o ProcessGroup: the application processes, grouped individually by the users
  o SwapSpace: the swap areas assigned by the operating system
  o Zone: the zones installed within the operating system. Note that zones are only visible from the global zone.
Figure 3: DC tree (overview)
The Managed Objects, the concrete physical objects of the MOTs, are arranged hierarchically and clearly structured. The figure above shows the Managed Object sd0 of the MOT "Disk". This presentation shows the current status at a glance. Monitoring parameters can be set individually for each component.
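To make the two-branch structure more concrete, here is a minimal Python sketch of such a tree. It assumes a hypothetical `ManagedObject` class and sample MO names (sd0, sd1, "Total", and so on); it is not the DC's internal data model. It only models the rule stated above that Zone objects are visible solely from the global zone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ManagedObject:
    """One concrete Managed Object (MO) of a Managed Object Type (MOT)."""
    mot: str    # Managed Object Type, e.g. "Disk"
    name: str   # concrete MO name, e.g. "sd0"
    children: list[ManagedObject] = field(default_factory=list)

    def add(self, mot: str, name: str) -> ManagedObject:
        """Create a child MO and attach it below this node."""
        child = ManagedObject(mot, name)
        self.children.append(child)
        return child

    def find(self, mot: str) -> list[ManagedObject]:
        """Return all descendants of the given MOT in depth-first order."""
        found: list[ManagedObject] = []
        for child in self.children:
            if child.mot == mot:
                found.append(child)
            found.extend(child.find(mot))
        return found


def build_tree(current_zone: str, zones: list[str]) -> tuple[ManagedObject, ManagedObject]:
    """Build the Host branch (physical view) and the OS branch (software view).

    All MO names are sample data. Zone MOs are only created when the
    collector runs in the global zone.
    """
    host = ManagedObject("Host", "sample-hardware-model")
    for disk in ("sd0", "sd1"):
        host.add("Disk", disk)
    host.add("Memory", "memory")
    for cpu in ("0", "1"):
        host.add("Processor", cpu)

    os_node = ManagedObject("OS", "Solaris 10")
    os_node.add("Filesystem", "/")
    os_node.add("Processor", "0")
    os_node.add("ProcessGroup", "Total")
    os_node.add("SwapSpace", "swap")
    if current_zone == "global":
        for zone in zones:
            os_node.add("Zone", zone)
    return host, os_node
```

Called with `current_zone="dev"`, the OS branch contains no Zone objects, which mirrors a collector installed in a local zone.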
Monitoring Hardware and Software Resources with the Solaris DC
To guarantee trouble-free operation of the operating system and the productive applications, monitoring hardware and software resources such as processors, main memory, hard disks, virtual memory, and processes is essential. These resources must be monitored for sufficient capacity and performance. If one of them is overloaded or exhausted, the performance of the entire system drops drastically, which can disrupt productive server operation. In many cases this leads to unforeseeable malfunctions of the productive applications; in the worst case, the stability of the entire system is endangered.
By continuously monitoring system resources, the Solaris DC warns at an early stage, before important resources are overloaded or exhausted. This enables the system administrator to react early to resource bottlenecks and remove them.
The Solaris DC provides numerous statistical values on the current utilization and performance of the monitored server's hardware and software resources. Thresholds that trigger alerts when they are exceeded can be defined for all statistical values, so resource bottlenecks are detected early and can be removed. Function and performance monitoring can be performed at the same time. All statistical values can be monitored and compared in the Real-time Performance Monitor. These features provide valuable support for optimizing performance and capacity.
Figure 4: Current usage of the system resources memory, processors, and swap space, in percent
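The threshold mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two-level warning/critical scheme, the counter names, and the `Threshold` and `check` names are hypothetical and not the ApplicationManager policy format. "Exceeded" is taken to mean reaching or passing a level.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold rule for one statistical value (e.g. used swap %)."""
    counter: str
    warning: float
    critical: float

    def evaluate(self, value: float) -> Severity:
        # Reaching or passing a level counts as exceeding it.
        if value >= self.critical:
            return Severity.CRITICAL
        if value >= self.warning:
            return Severity.WARNING
        return Severity.OK


def check(sample: dict[str, float], rules: list[Threshold]) -> list[tuple[str, Severity]]:
    """Return (counter, severity) for every rule whose threshold is exceeded."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.counter)
        if value is None:
            continue  # counter not collected in this sample
        severity = rule.evaluate(value)
        if severity is not Severity.OK:
            alerts.append((rule.counter, severity))
    return alerts
```

For example, with a memory rule of 80/95 percent, a sample of 85 percent used memory yields a single WARNING for `memory_used_pct`, while a swap value below its warning level produces no alert.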
The resource usage of different servers and zones can be compared, which provides substantial support for optimization tasks such as distributing load more efficiently across servers or zones. The statistical values can also be collected in the ApplicationManager database and evaluated with REALTECH Reporting. The data can be used to generate long-term trend analyses, for example of the usage of system resources such as CPU, memory, and disk space, and can serve as a basis for cost and capacity planning.
Hardware Information (Host)
Figure 5: Properties of the object "Host"
Host is an information object that provides data on the hard disks, memory, and processors of a computer.
Information and Monitoring of Hard Disk (Disk)
Figure 6: Properties of the object "Disk"
The properties displayed here refer to the system's available hard disks. The progression of statistical values, for example the fill level, can be used in the appropriate policy to generate an alert or send a notification.
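A long-term trend analysis of a value such as the disk fill level can be as simple as fitting a straight line through historical samples and extrapolating it. The sketch below uses ordinary least squares; this is a common technique chosen for illustration, not necessarily what REALTECH Reporting does, and the function names and sample data are hypothetical.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples need distinct time stamps")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days: list[float], fill_pct: list[float], limit: float = 100.0):
    """Days after the last sample until the fitted trend reaches `limit`.

    Returns None when the fill level is flat or falling.
    """
    slope, intercept = linear_fit(days, fill_pct)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - days[-1]
```

With daily fill levels of 50, 52, 54, and 56 percent, the fitted slope is 2 percent per day, so the disk is projected to reach 100 percent 22 days after the last sample. A linear fit ignores seasonality and step changes, so such projections are rough planning input only.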
The Performance Criteria for Monitoring a Disk Object are:
o Number of read accesses since the hard disk became available to the operating system
o Number of write accesses since the hard disk became available to the operating system
o Kilobytes read since the hard disk became available to the operating system
o Kilobytes written since the hard disk became available to the operating system
o Current read rate (accesses per second)
o Current write rate (accesses per second)
o Current read/write rate (accesses per second)
o Current read transfer rate (kilobytes per second)
o Current write transfer rate (kilobytes per second)
o Current read/write transfer rate (kilobytes per second)
o Current access time (milliseconds)
o Disk utilization (percent)
o Disk wait state (percent)
Example: Performance statistics
Figure 7: Disk sd1 has an access time of 3.7 ms and a utilization of approx. 1%
Example: Throughput statistics
Figure 8: Counters of the category "Throughput statistics"
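Most "current" rates in this list can be derived from two successive samples of the cumulative counters, which is also how tools such as iostat compute per-second values. The sketch assumes a hypothetical `DiskSample` record that additionally carries cumulative busy and wait times in milliseconds; the DC's actual data source and formulas are not documented here. Access time is approximated as busy time per access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskSample:
    """Cumulative counters since the disk became available (hypothetical layout)."""
    t: float          # sample time in seconds
    reads: int        # number of read accesses
    writes: int       # number of write accesses
    kb_read: int      # kilobytes read
    kb_written: int   # kilobytes written
    busy_ms: float    # cumulative time the disk was busy servicing requests
    wait_ms: float    # cumulative time requests were waiting in the queue


def disk_rates(prev: DiskSample, cur: DiskSample) -> dict[str, float]:
    """Derive current rates and percentages from two cumulative samples."""
    dt = cur.t - prev.t
    if dt <= 0:
        raise ValueError("samples must be in chronological order")
    reads = cur.reads - prev.reads
    writes = cur.writes - prev.writes
    ops = reads + writes
    kb_r = cur.kb_read - prev.kb_read
    kb_w = cur.kb_written - prev.kb_written
    busy = cur.busy_ms - prev.busy_ms
    return {
        "reads_per_s": reads / dt,
        "writes_per_s": writes / dt,
        "ops_per_s": ops / dt,
        "kb_read_per_s": kb_r / dt,
        "kb_written_per_s": kb_w / dt,
        "kb_per_s": (kb_r + kb_w) / dt,
        # Average busy time per access; 0 if no access happened in the interval.
        "access_ms": busy / ops if ops else 0.0,
        # Share of the interval in which the disk was busy / requests were queued.
        "utilization_pct": 100.0 * busy / (dt * 1000.0),
        "wait_pct": 100.0 * (cur.wait_ms - prev.wait_ms) / (dt * 1000.0),
    }
```

For example, 740 ms of busy time spread over 200 accesses in a 10-second interval yields an access time of 3.7 ms and a utilization of 7.4 percent.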
Information and Monitoring of Memory
Figure 9: Properties of the object "Memory"
The properties displayed here refer to the system memory. The progression of statistical values, for example the fill level, can be used in the appropriate policy to generate an alert or send a notification.
The Performance Criteria for Monitoring a Memory Object are:
o Size of the physical memory (kilobytes)
o Used memory (kilobytes)
o Free memory (kilobytes)
o Used memory (percent)
Example:
Figure 10: Counters of the memory object
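The four memory counters are related by simple arithmetic. Solaris commonly accounts for memory in pages (the base page size is 8 KB on SPARC and 4 KB on x86), so the sketch below converts page counts to kilobytes first. The function name and inputs are assumptions; this document does not say how the DC obtains these values.

```python
def memory_counters(total_pages: int, free_pages: int, page_kb: int) -> dict[str, float]:
    """Derive the four memory counters from page counts and the page size in KB."""
    if total_pages <= 0 or not 0 <= free_pages <= total_pages:
        raise ValueError("free pages must lie between 0 and the total")
    size_kb = total_pages * page_kb
    free_kb = free_pages * page_kb
    used_kb = size_kb - free_kb
    return {
        "size_kb": size_kb,
        "used_kb": used_kb,
        "free_kb": free_kb,
        "used_pct": 100.0 * used_kb / size_kb,
    }
```

For example, 262,144 pages of 8 KB (2 GB) with 65,536 free pages gives 1,572,864 KB used, i.e. 75 percent.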
Information and Monitoring of Processors (Processor)
Figure 11: Properties of the object "Processor"
The Performance Criteria for Monitoring a Processor Object are:
o Processor utilization (percent)
o Processor in user mode (percent)
o Processor in kernel mode (percent)
o Processor in idle mode (percent)
Example:
Figure 12: Counters of the object "Processor"
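Processor percentages are typically computed from the difference between two samples of cumulative per-mode tick counters. The sketch assumes hypothetical counters named `user`, `kernel`, and `idle`, matching the three modes listed above; any other modes are ignored. Utilization is taken as everything that is not idle.

```python
def cpu_percentages(prev: dict[str, int], cur: dict[str, int]) -> dict[str, float]:
    """Turn two samples of cumulative CPU tick counters into percentages."""
    modes = ("user", "kernel", "idle")
    delta = {m: cur[m] - prev[m] for m in modes}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between the samples")
    pct = {m: 100.0 * delta[m] / total for m in modes}
    # Utilization: share of ticks not spent idle.
    pct["utilization"] = 100.0 - pct["idle"]
    return pct
```

For example, tick deltas of 600 (user), 200 (kernel), and 200 (idle) yield 60, 20, and 20 percent, and a utilization of 80 percent.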
Information About the Operating System
Figure 13: Properties of the object "OS"
OS is an information object that provides data on the file systems, process groups, processors, and zones within the operating system.
Information and Monitoring of File Systems (Filesystem)
Figure 14: Properties of the object "Filesystem"
The Performance Criteria for Monitoring a File System Object are:
o Total size (kilobytes)
o Used space (kilobytes)
o Available space (kilobytes)
o Available space for user root (kilobytes)
o Capacity (used space in percent)
o Number of inodes
o Number of used inodes
o Number of available inodes
o Number of inodes available to root
o Percentage of used inodes
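Most of these counters map naturally onto the fields returned by the POSIX statvfs() call, which df also relies on. The mapping below is an assumption for illustration, not the DC's documented implementation: "available space" uses f_bavail (blocks available to ordinary users), "available for root" uses f_bfree (which also includes blocks reserved for root), and capacity is computed df-style as used / (used + available to users), without df's rounding. Python's `os.statvfs` is used because it exposes these fields directly.

```python
import os


def filesystem_counters(path: str) -> dict[str, float]:
    """Derive file system counters from statvfs(2) for the file system holding `path`."""
    st = os.statvfs(path)
    frsize = st.f_frsize or st.f_bsize      # block counts are in units of f_frsize
    used_blocks = st.f_blocks - st.f_bfree
    usable_blocks = used_blocks + st.f_bavail  # used plus what ordinary users may still allocate
    used_inodes = st.f_files - st.f_ffree
    return {
        "total_kb": st.f_blocks * frsize // 1024,
        "used_kb": used_blocks * frsize // 1024,
        "available_kb": st.f_bavail * frsize // 1024,      # for ordinary users
        "root_available_kb": st.f_bfree * frsize // 1024,  # includes root-reserved blocks
        "capacity_pct": 100.0 * used_blocks / usable_blocks if usable_blocks else 0.0,
        "inodes": st.f_files,
        "used_inodes": used_inodes,
        "available_inodes": st.f_favail,
        "root_available_inodes": st.f_ffree,
        "used_inodes_pct": 100.0 * used_inodes / st.f_files if st.f_files else 0.0,
    }
```

Calling `filesystem_counters("/")` returns the counters for the root file system. Some file systems report no fixed inode count (f_files = 0); the sketch then reports 0 percent instead of dividing by zero.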
Example:
Figure 15: Usage statistics of a file system
Figure 16: Inode statistics of a file system
Information and Monitoring of Application Processes (Process Groups)
Figure 17: Application process groups; total for all processes
Process groups, such as SAP, Siebel, or Oracle, can be freely defined. The process filters look as follows:
Figure 18: Process filter of the process group 'Total', to which all processes of the operating system belong
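A process filter decides which processes belong to a group. The sketch below assumes a hypothetical filter made of a regular expression on the command name plus an optional user name; the DC's actual filter criteria are not described in this document. A filter with default values matches every process, which corresponds to the 'Total' group.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessFilter:
    """Hypothetical filter: regex on the command name, optionally restricted to a user."""
    command: str = ".*"          # regular expression matched against the whole command name
    user: Optional[str] = None   # None matches any user

    def matches(self, proc: dict) -> bool:
        if self.user is not None and proc["user"] != self.user:
            return False
        return re.fullmatch(self.command, proc["command"]) is not None


def assign_groups(procs: list[dict], groups: dict[str, list[ProcessFilter]]) -> dict[str, list[int]]:
    """Map each group name to the PIDs of processes matching any of its filters."""
    return {
        name: [p["pid"] for p in procs if any(f.matches(p) for f in filters)]
        for name, filters in groups.items()
    }
```

Because membership is evaluated per group, a process such as an Oracle background process appears both in 'Total' and in a user-defined 'Oracle' group.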
The Performance Criteria for Monitoring a Process Group Object are:
o Number of processes in the group
o CPU utilization of the group (percent)
o Memory utilization of the group (kilobytes)
Example:
Figure 19: Counters of a process group
For each process group, the performance values of the resources described above, such as memory and CPU, can be evaluated, and alerts can be displayed in real time. Reports can also be generated from the long-term log. This allows the availability and performance of individual applications to be evaluated and their performance to be improved.
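Once the members of a group are known, the three group counters are sums over its processes. The `Proc` record and its fields are hypothetical. Summing per-process memory such as resident size can double-count memory that processes share (for example, shared memory segments), so group memory totals computed this way tend to overstate the real footprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    cpu_pct: float   # share of total CPU capacity used by this process
    mem_kb: int      # memory attributed to this process (e.g. resident size)


def group_counters(procs: list[Proc], members: set[int]) -> dict[str, float]:
    """Aggregate the three process-group counters over the member PIDs."""
    selected = [p for p in procs if p.pid in members]
    return {
        "process_count": len(selected),
        "cpu_pct": sum(p.cpu_pct for p in selected),
        "mem_kb": sum(p.mem_kb for p in selected),
    }
```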
Information and Monitoring of System Swap Space (Memory)
Figure 20: Properties of the system's swap space displayed in the ApplicationManager's Managed Monitor
The Performance Criteria for Monitoring a Swap Space Object are:
o Total size (kilobytes)
o Free space (kilobytes)
o Used space (kilobytes)
o Free space (percent)
o Used space (percent)
Example:
Figure 21: Counters of a process group
The properties displayed here refer to the system's swap space. The progression of statistical values, for example the fill level, can be used in the appropriate policy to generate an alert or send a notification.
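A system can have several swap areas (on Solaris, for example, `swap -l` lists them, with sizes in 512-byte blocks). The swap counters are then sums over all areas. The tuple layout and area names in this sketch are hypothetical, and sizes are assumed to be already converted to kilobytes.

```python
def swap_counters(areas: list[tuple[str, int, int]]) -> dict[str, float]:
    """Sum (name, total_kb, free_kb) over all swap areas and derive percentages."""
    total = sum(total_kb for _, total_kb, _ in areas)
    free = sum(free_kb for _, _, free_kb in areas)
    if total == 0:
        raise ValueError("no swap space configured")
    used = total - free
    return {
        "total_kb": total,
        "free_kb": free,
        "used_kb": used,
        "free_pct": 100.0 * free / total,
        "used_pct": 100.0 * used / total,
    }
```

For example, two areas of 3,000 KB (1,500 KB free) and 1,000 KB (all free) give 4,000 KB in total with 37.5 percent used.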
Information and Monitoring of Virtual Operating Systems (Zone)
Figure 22: Properties of the object "Zone"
The properties displayed here relate to the system's available zones. The progression of statistical values, for example CPU or memory utilization, can be used in the appropriate policy to generate an alert or send a notification.
The Performance Criteria for Monitoring a Zone Object are:
o Number of processes in the zone
o Number of existing zombie processes in the zone
o Total "image size" of all zone processes (kilobytes)
o Total "resident size" of all zone processes (kilobytes)
o Memory requirement of the zone (percent)
o CPU requirement of the zone (percent)
Example:
Figure 23: Counters of a zone object
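The zone counters resemble the per-zone summary that `prstat -Z` prints: they aggregate process data by zone. In this sketch, the `ZoneProc` record is hypothetical, a zombie is recognized by the process state "Z" (as in ps output), and the memory requirement is assumed to be the zone's total resident size relative to physical memory. The DC's exact definitions are not given in this document.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneProc:
    zone: str
    state: str       # process state; "Z" marks a zombie
    size_kb: int     # image (virtual) size
    rss_kb: int      # resident size
    cpu_pct: float   # share of the machine's total CPU capacity


def zone_counters(procs: list[ZoneProc], physical_kb: int) -> dict[str, dict[str, float]]:
    """Aggregate per-zone counters; memory % = resident size / physical memory."""
    if physical_kb <= 0:
        raise ValueError("physical memory size must be positive")
    zones = defaultdict(lambda: {"processes": 0, "zombies": 0, "size_kb": 0,
                                 "rss_kb": 0, "cpu_pct": 0.0})
    for p in procs:
        z = zones[p.zone]
        z["processes"] += 1
        z["zombies"] += 1 if p.state == "Z" else 0
        z["size_kb"] += p.size_kb
        z["rss_kb"] += p.rss_kb
        z["cpu_pct"] += p.cpu_pct
    for z in zones.values():
        z["mem_pct"] = 100.0 * z["rss_kb"] / physical_kb
    return dict(zones)
```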
Standard Reporting and Service Level Analysis
In addition to monitoring performance and displaying and triggering alarms for critical statuses, theguard! ApplicationManager can evaluate historical data to verify service levels or plan capacity.
Service Level Analysis
o Configuration of Operational Level Agreements (OLAs) for an application's processes and/or process groups
o Measurement of availability against rules, and reports
Standard Reporting (Performance and Capacity)
o Availability of an application's processes and/or process groups
o Load and resource usage of an application's processes and/or process groups, by the parameters mentioned above
o Total processor load
o Single processor load
o Memory
Application Processes and Business Systems
Applications in the form of process groups are Managed Objects that can be used in Business Systems to map distributed applications and business processes in theguard! ServiceCenter.
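Measuring availability against an OLA can be reduced to counting the monitoring samples in which a process group was available, optionally restricted to an agreed service window. The sketch assumes evenly spaced samples and a window given in whole hours; the `Sample` record, the window parameters, and the target values are hypothetical and not the ApplicationManager OLA format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    at: datetime
    available: bool   # e.g. the process group had its required processes running


def availability_pct(samples: list[Sample], start_hour: int = 0, end_hour: int = 24) -> float:
    """Availability in percent over samples inside the service window [start, end)."""
    in_window = [s for s in samples if start_hour <= s.at.hour < end_hour]
    if not in_window:
        raise ValueError("no samples inside the service window")
    return 100.0 * sum(s.available for s in in_window) / len(in_window)


def meets_ola(samples: list[Sample], target_pct: float,
              start_hour: int = 0, end_hour: int = 24) -> bool:
    """True if measured availability reaches the agreed target."""
    return availability_pct(samples, start_hour, end_hour) >= target_pct
```

For example, with samples at 08:00, 09:00, and 10:00 of which two were available, availability inside an 8:00 to 18:00 window is about 66.7 percent, which meets a 60 percent target but not a 99 percent one.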
Case Study of a Monitored System
The following screen sequence shows how to display load measurements on a system with the Performance Monitor. First, the status "Idle" (unused processor capacity, idle processors):
Figure 24: Performance Monitor - all processors are idle
The following elements were selected for monitoring:
o Processors 0 and 1, to clearly show the resulting processor load
o Zone dev (development), in which a compiler is started
o Memory, to clearly show the memory required
o Swap space, to clearly show the swap space required
The goal is to examine which system resources are involved when load is produced.
The following figure clearly shows the start of the load: starting the compiler has a clearly visible impact on the system's resources.
Figure 25: Performance Monitor, start and end of the "compiler process"
The processors are almost fully utilized during the compiling process. The CPU utilization of zone dev rises noticeably, the memory requirement rises insignificantly, and the swap space requirement does not rise at all. This indicates that the machine still has sufficient resources. The processor load peaks before and after compiling are caused by other zones.
Availability and Platforms
The Solaris DC is available now. It supports Solaris 10 and later, on both SPARC and x64 systems.
For more information about REALTECH's software products, see:
REALTECH AG
Industriestr. 39c
69190 Walldorf, Germany
Tel +49.6227.837.880
Fax +49 6227 837 837
customer-services@realtech.com
http://www.realtech.com