Tools and strategies to monitor the ATLAS online computing farm
|
|
|
- Juliana Mitchell
- 10 years ago
- Views:
Transcription
1 Tools and strategies to monitor the ATLAS online computing farm S. Ballestrero 1,2, F. Brasolin 3, G. L. Dârlea 1,4, I. Dumitru 4, D. A. Scannicchio 5, M. S. Twomey 1,6, M. L. Vâlsan 4 and A. Zaytsev 7 1 CERN CH-1211 Genève 23, Switzerland 2 University of Johannesburg Department of Physics, PO Box 524 Auckland Park 2006, South Africa 3 INFN Sezione di Bologna, Viale Berti Pichat 6/2, Bologna, Italy 4 Politehnica University of Bucharest, Splaiul Independenţei 313, cod , sector 6, Bucharest, Romania 5 University of California, Irvine, CA 92697, USA 6 University of Washington Department of Physics, Box Seattle WA , USA 7 Russian Acad. Sci., Siberian Div., Budker Inst. of Nuclear Physics (BINP), 11 Academician Lavrentiev prospect, Novosibirsk, , Russia [email protected] ATL-DAQ-PROC June Abstract. In the ATLAS experiment the collection, processing, selection and conveyance of event data from the detector front-end electronics to mass storage is performed by the ATLAS online farm consisting of nearly 3000 PCs with various characteristics. To assure the correct and optimal working conditions the whole online system must be constantly monitored. The monitoring system should be able to check up to health parameters and provide alerts on a selected subset. In this paper we present the assessment of a new monitoring and alerting system based on Icinga. This is an open source monitoring system derived from Nagios, granting backward compatibility with already known configurations, plugins and add-ons, while providing new features. We also report on the evaluation of different data gathering systems and visualization interfaces Introduction The ATLAS [1] Online Farm, consisting of nearly 3000 PCs, must be continuously monitored to ensure the optimal working conditions. The monitoring system should be able to check up to health parameters and provide alerts on a selected subset: the health status of the OS, hardware, critical services and network components. The monitoring system is not critical for the ATLAS data taking, but it is very useful for promptly reacting to potentially fatal issues that may arise in the system. Nagios v2.5 [2] was chosen in 2007 to design and implement the monitoring system and it is still being used. It has proven to be robust and scalable with the increase in size of the farm; to cope with the large amount of checks and the consequently high work load, the checks have been distributed across many Nagios instances on separate servers (up to 80 now). Therefore from a configuration point of view the system is too complex to be managed manually. The many configuration files must be instead generated in an automated way; the
2 resulting files consequently have a simplified, standardized structure, and cannot take advantage of the full flexibility native to Nagios. As new tools have recently become available (e.g. Gearman [3] and mod gearman [4]) to nicely distribute the work load on worker nodes, we have begun to evaluate the possibility of updating the current system to take advantage of these new features and at the same time simplify the current schema. 2. Nagios Nagios is the monitoring and alerting system adopted and used since 2007 to implement the monitoring of the ATLAS online farm. The main requirements were to have a robust and reliable system capable of scaling up together with the planned growth of the farm. Information and alert notifications are successfully provided by monitoring many different services, for example: ping and SSH connectivity NTP synchronization kernel version temperature HDD state (if present) and ramdisk usage automount status filesystem status CPU load memory usage For events related to critical services, which are of crucial importance for the proper functioning of the ATLAS Trigger and Data Acquisition (TDAQ) infrastructure, Nagios provides and/or SMS alerts to the concerned experts Current implementation Due to the large amount of checks and hosts ( 3000 hosts with up to 40 checks each) it was necessary to distribute the work load on many servers. Consequently, a Nagios server has been installed on each of the 80 Local File Servers (in the ATLAS Online Farm architecture [5], an LFS is a server which provides DHCP, PXE, NFS and other services to a defined subset of clients, typically 32 or 40 hosts). Since the number of servers (and nodes to be monitored by each server) is too large to be handled manually, the configuration files, describing the hosts and services to be checked, have to be generated automatically by ConfDBv2 [6]. This is a tool developed by TDAQ SysAdmin Team to manage network, host and Nagios configuration in the Online Farm. The automatically generated configuration files have a simplified and standardized structure; as a consequence the system cannot use all the features and flexibility native to Nagios (e.g. host and service checks dependencies). Eventually the data resulting from the checks is stored on a MySQL database which represents the current status; the database is hosted on a MySQL Cluster for performance and reliability reasons. The same data is also accumulated in RRD (Round-Robin Database, [7]) format and stored on a centralized file storage, to be used later to produce graphs showing the historical evolution of the monitored parameters. However, because of Nagios design, the log files, used by its web interface, can only be stored locally. This implementation, sketched in Figure 1, has the obvious disadvantage of having access to the information scattered across all the LFSes: the standard Nagios GUI of each server can
3 Figure 1. Schema of the current implementation of the monitoring system only display the data collected locally. To overcome this limitation, a web interface has been developed in-house to group all the information in a single page: the interface displays summaries and detailed data of the checks performed by each of the 80 servers, and provides, for each monitored host, quick links to the GUI of its Nagios server. Figures 2 and 3 show screenshots of the main summary page and of the in-house Nagios web page showing a particular group of hosts being monitored. This custom interface has proven effective, but it does not offer all the advanced functionalities that are available in other Web GUIs developed for different, more standard Nagios-based monitoring systems, and requires maintenance to follow the evolution of the system. Besides the checks provided via the standard Nagios plugins, a few customizations have also been introduced: some of the existing plugins have been adapted to suit the needs of the monitoring system. For example, the plugin used to check the status and the traffic of various ethernet interfaces of a node has been adapted to be able to simultaneously monitor all interfaces, regardless of their names. a full system has been put in place to monitor all the significant IPMI [8] sensor information, which is provided by the Baseboard Management Controllers (BMC) 1 of the PCs in the farm; in general more than 20 hardware sensors are monitored for each node. Amongst the most important are CPU and system temperatures, fan speeds, power supplies statuses, various currents and voltages from the system. These checks run as an independent service which then exports the result in a format which is easily interpreted by Nagios. Despite the described disadvantages of the complex, custom distributed implementation, the system has been working well since the start-up of LHC operation in Ganglia As a first step to improve the monitoring system, we have recently introduced Ganglia [9]. Ganglia is a software package designed for monitoring the workload and performance of multiple large, and possibly geographically distributed, high performance computing clusters; contrary to Nagios, it does not have advanced alerting capabilities. We use it to monitor about 300 hosts, mainly servers, for which it provides detailed information on CPU, disk and network performance; the advanced functionalities of its Web User Interface help in performance analysis and forecast. We are also evaluating the option of using it as a data source for the Nagios alerting system [10]. 1 The Baseboard Management Controller is described in the IPMI standard, see [8].
4 Figure 3. GUI. Example of the in-house Nagios Figure 2. Example of the status summary page Icinga Icinga [11] is a fork of Nagios v3 and is backward compatible: Nagios configurations, plugins and addons can all be used with Icinga. Though Icinga retains all the existing features of its predecessor, it builds on them to add many patches and features requested by the user community as described on its web site. Moreover a very active community provides regular major releases, bug fixes and new features. All these characteristics, and the backward compatibility with Nagios configurations, convinced us to test it. The main goals in updating the current system are to simplify the current setup by reducing the number of distributed servers, while maintaining the same performances, and to increase the flexibility of the configuration Tests performed The first evaluations have been performed installing Icinga on a single server: 1100 hosts and services have been configured to be monitored; this load corresponds to about one third of the online farm. For this test, the configuration files have been prepared manually by copying the existing ones from the multiple Nagios servers. This was a very important proof of the backward compatibility. Therefore the same system currently in use (ConfDB [6]) can be easily adapted to generate configurations for a system based on a single Icinga server. A local MySQL server has been used to store the data produced by Icinga. Icinga, being a more recent and actively updated software package for system monitoring, behaves well with a high number of hosts and checks and provides new and useful options. For example one may benefit from the use large installation tweaks configuration option that allows
5 Figure 4. Schema representing how mod gearman integrates with Nagios/Icinga the Icinga daemon to take certain shortcuts (e.g. better memory management and usage of forks), which result in a reduced system load and increased performance. The results obtained show that Icinga copes better with the high number of checks performed than the currently used Nagios v2.5 implementation: the work load on a single node shows an average check execution time and latency below 1 second Gearman and mod gearman Gearman [3] is a generic application framework that distributes work from a server to other machines. Mod gearman [4] is an easy way of distributing active Nagios/Icinga checks across the network and of increasing scalability. It can even help to reduce the load on a single monitoring server, because it is much smaller and more efficient in executing checks. It consists of three parts (see Figure 4): a module for the Nagios or Icinga core which sends service and host checks to a Gearman queue one or more worker nodes to execute the checks at least one Gearman Job Server 4.3. Test performed using Gearman Some tests have also been performed using Gearman and mod gearman (see section 4.2) to distribute the work load of the thousands of checks on multiple servers. In the test setup we have defined two workers: one on the Icinga server itself and the second on another node. Figure 5 is a snapshot from the Gearman status tool showing the number of workers available, jobs waiting and jobs running. For the time being the customized scripts (described in Section 2) have not been used. As they make use of files saved locally to generate the check results, they are not suitable to be used in a distributed environment like the one provided with Gearman/mod gearman.
6 Figure 5. Snapshot showing gearman status tool with two workers nodes Figures 6 and 7 respectively show the CPU load of the worker node and of the server (which runs also a worker process). Figure 6. Snapshot of the CPU usage on a Gearman worker node. Figure 7. Snapshot of the CPU usage on a server running Icinga, MySQL and a Gearman worker Migration strategy A smooth migration from the current to the new monitoring system is highly desirable in order to maintain the current level of reliability and robustness and avoid disruption to the system during the LHC operation. The schema of the foreseen monitoring system implementation based on Icinga is shown in Figure 8. Figure 8. Sketch of the new possible implementation of the monitoring system based on Icinga Different upgrade strategies have been considered and all foresee having Icinga running in parallel with Nagios until the end of the current year (2012). The final step of putting the new monitoring system into production can safely be performed next year, taking advantage of the LHC Long Shutdown (2013). For this upgrade ConfDBv2 will have to be adapted to generate the configuration file on a single server. Moreover different tools will still need access to the centralized MySQL database and RRD storage and will likely need to be adapted for the new system environment. The
7 centralized Icinga server will provide a unified web user interface which may replace most of the functionality of our dedicated in-house Nagios GUI. One drawback of an Icinga implementation with a single central server is that losing the central server results in no access to the monitoring information. In theory it is possible to build a high availability system, but the increased complexity may not be justified since the monitoring is not a critical system for the ATLAS data taking. 6. Conclusions The current monitoring and alerting system for the ATLAS Online Farm is based on Nagios v2.5, with the addition of in-house developed web interface and specialised plugins. It has proven its reliability and effectiveness in production, providing alerting and basic performance monitoring. To better support the evolution of the Farm in the long term we are evaluating an upgrade of the system, based on Icinga and mod gearman. Initial tests performed using a standalone Icinga server have shown good performance in monitoring 1100 hosts with services. Adding mod gearman, with 2 workers, provides the expected increase in performance, pointing to a very good scalability of the Icinga Gearman combination. The coming months will see the completion of the scalability tests, the porting of certain data collection plugins, and the choice of one of the possible upgrade scenarios. References [1] ATLAS Collaboration, The ATLAS Experiment at the CERN Large Hadron Collider, JINST 3 (2008) S08003 [2] Nagios: [3] Gearman: [4] mod gearman: [5] ATLAS Collaboration, ATLAS High Level Trigger, Data Acquisition and Controls: Technical Design Report, CERN/LHCC/ (2003), ISBN [6] ATLAS TDAQ Sysadmin, Centralized configuration system for a large scale farm of network booted computers, to be published in proceedings of CHEP 2012 [7] RRDTool: [8] Intelligent Platform Management Interface (IPMI): [9] Ganglia: [10] ATLAS TDAQ Sysadmin, Upgrade and integration of the configuration and monitoring tools for the ATLAS Online farm, to be published in proceedings of CHEP 2012 [11] Icinga:
A SURVEY ON AUTOMATED SERVER MONITORING
A SURVEY ON AUTOMATED SERVER MONITORING S.Priscilla Florence Persis B.Tech IT III year SNS College of Engineering,Coimbatore. [email protected] Abstract This paper covers the automatic way of server
SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems. Ed Simmonds and Jason Harrington 7/20/2009
SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems Ed Simmonds and Jason Harrington 7/20/2009 Introduction For FEF, a monitoring system must be capable of monitoring thousands of servers and tens
World-wide online monitoring interface of the ATLAS experiment
World-wide online monitoring interface of the ATLAS experiment S. Kolos, E. Alexandrov, R. Hauser, M. Mineev and A. Salnikov Abstract The ATLAS[1] collaboration accounts for more than 3000 members located
Maintaining Non-Stop Services with Multi Layer Monitoring
Maintaining Non-Stop Services with Multi Layer Monitoring Lahav Savir System Architect and CEO of Emind Systems [email protected] www.emindsys.com The approach Non-stop applications can t leave on their
Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module
Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module June, 2015 WHITE PAPER Contents Advantages of IBM SoftLayer and RackWare Together... 4 Relationship between
CATS-i : LINUX CLUSTER ADMINISTRATION TOOLS ON THE INTERNET
CATS-i : LINUX CLUSTER ADMINISTRATION TOOLS ON THE INTERNET Jiyeon Kim, Yongkwan Park, Sungjoo Kwon, Jaeyoung Choi {heaven, psiver, lithmoon}@ss.ssu.ac.kr, [email protected] School of Computing, Soongsil
MALAYSIAN PUBLIC SECTOR OPEN SOURCE SOFTWARE (OSS) PROGRAMME. COMPARISON REPORT ON NETWORK MONITORING SYSTEMS (Nagios and Zabbix)
MALAYSIAN PUBLIC SECTOR OPEN SOURCE SOFTWARE (OSS) PROGRAMME COMPARISON REPORT ON NETWORK MONITORING SYSTEMS (Nagios and Zabbix) JANUARY 2010 Phase II -Network Monitoring System- Copyright The government
SEE-GRID-SCI. www.see-grid-sci.eu. SEE-GRID-SCI USER FORUM 2009 Turkey, Istanbul 09-10 December, 2009
SEE-GRID-SCI Grid Site Monitoring tools developed and used at SCL www.see-grid-sci.eu SEE-GRID-SCI USER FORUM 2009 Turkey, Istanbul 09-10 December, 2009 V. Slavnić, B. Acković, D. Vudragović, A. Balaž,
MONITORING RED HAT GLUSTER SERVER DEPLOYMENTS With the Nagios IT infrastructure monitoring tool
TECHNOLOGY DETAIL MONITORING RED HAT GLUSTER SERVER DEPLOYMENTS With the Nagios IT infrastructure monitoring tool INTRODUCTION Storage system monitoring is a fundamental task for a storage administrator.
Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module
Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module June, 2015 WHITE PAPER Contents Advantages of IBM SoftLayer and RackWare Together... 4 Relationship between
Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH
Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH CONTENTS Introduction... 4 System Components... 4 OpenNebula Cloud Management Toolkit... 4 VMware
automates system administration for homogeneous and heterogeneous networks
IT SERVICES SOLUTIONS SOFTWARE IT Services CONSULTING Operational Concepts Security Solutions Linux Cluster Computing automates system administration for homogeneous and heterogeneous networks System Management
How To Monitor Your Computer With Nagiostee.Org (Nagios)
Host and Service Monitoring at SLAC Alf Wachsmann Stanford Linear Accelerator Center [email protected] DESY Zeuthen, May 17, 2005 Monitoring at SLAC Alf Wachsmann 1 Monitoring at SLAC: Does not really
Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Network (RHN) Satellite server is an easy-to-use, advanced systems management platform
Managing your Red Hat Enterprise Linux guests with RHN Satellite
Managing your Red Hat Enterprise Linux guests with RHN Satellite Matthew Davis, Level 1 Production Support Manager, Red Hat Brad Hinson, Sr. Support Engineer Lead System z, Red Hat Mark Spencer, Sr. Solutions
Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Satellite server is an easy-to-use, advanced systems management platform for your Linux infrastructure.
WHITE PAPER. ClusterWorX 2.1 from Linux NetworX. Cluster Management Solution C ONTENTS INTRODUCTION
WHITE PAPER A PRIL 2002 C ONTENTS Introduction 1 Overview 2 Features 2 Architecture 3 Monitoring 4 ICE Box 4 Events 5 Plug-ins 6 Image Manager 7 Benchmarks 8 ClusterWorX Lite 8 Cluster Management Solution
Scaling Graphite Installations
Scaling Graphite Installations Graphite basics Graphite is a web based Graphing program for time series data series plots. Written in Python Consists of multiple separate daemons Has it's own storage backend
Shoal: IaaS Cloud Cache Publisher
University of Victoria Faculty of Engineering Winter 2013 Work Term Report Shoal: IaaS Cloud Cache Publisher Department of Physics University of Victoria Victoria, BC Mike Chester V00711672 Work Term 3
Gigabyte Management Console User s Guide (For ASPEED AST 2400 Chipset)
Gigabyte Management Console User s Guide (For ASPEED AST 2400 Chipset) Version: 1.4 Table of Contents Using Your Gigabyte Management Console... 3 Gigabyte Management Console Key Features and Functions...
Nagios and Cloud Computing
Nagios and Cloud Computing Presentation by William Leibzon ([email protected]) Nagios Thanks for being here! Open Source System Management Conference May 10, 2012 Bolzano, Italy Cloud Computing What
What s new in Hyper-V 2012 R2
What s new in Hyper-V 2012 R2 Carsten Rachfahl MVP Virtual Machine Rachfahl IT-Solutions GmbH & Co KG www.hyper-v-server.de Thomas Maurer Cloud Architect & MVP itnetx gmbh www.thomasmaurer.ch Before Windows
Better Integration of Systems Management Hardware with Linux
Better Integration of Systems Management Hardware with Linux LINUXCON NORTH AMERICA Aug 2014 Charles Rose Engineer Dell Inc. Agenda Introduction Systems Management Hardware/Software Information Available
The GRID and the Linux Farm at the RCF
The GRID and the Linux Farm at the RCF A. Chan, R. Hogue, C. Hollowell, O. Rind, J. Smith, T. Throwe, T. Wlodek, D. Yu Brookhaven National Laboratory, NY 11973, USA The emergence of the GRID architecture
Feature Comparison. Windows Server 2008 R2 Hyper-V and Windows Server 2012 Hyper-V
Comparison and Contents Introduction... 4 More Secure Multitenancy... 5 Flexible Infrastructure... 9 Scale, Performance, and Density... 13 High Availability... 18 Processor and Memory Support... 24 Network...
ovirt and Gluster hyper-converged! HA solution for maximum resource utilization
ovirt and Gluster hyper-converged! HA solution for maximum resource utilization 31 st of Jan 2016 Martin Sivák Senior Software Engineer Red Hat Czech FOSDEM, Jan 2016 1 Agenda (Storage) architecture of
ovirt and Gluster hyper-converged! HA solution for maximum resource utilization
ovirt and Gluster hyper-converged! HA solution for maximum resource utilization 21 st of Aug 2015 Martin Sivák Senior Software Engineer Red Hat Czech KVM Forum Seattle, Aug 2015 1 Agenda (Storage) architecture
Management of VMware ESXi. on HP ProLiant Servers
Management of VMware ESXi on W H I T E P A P E R Table of Contents Introduction................................................................ 3 HP Systems Insight Manager.................................................
Monitoring Infrastructure for Superclusters: Experiences at MareNostrum
ScicomP13 2007 SP-XXL Monitoring Infrastructure for Superclusters: Experiences at MareNostrum Garching, Munich Ernest Artiaga Performance Group BSC-CNS, Operations Outline BSC-CNS and MareNostrum Overview
RED HAT ENTERPRISE VIRTUALIZATION FOR SERVERS: COMPETITIVE FEATURES
RED HAT ENTERPRISE VIRTUALIZATION FOR SERVERS: COMPETITIVE FEATURES RED HAT ENTERPRISE VIRTUALIZATION FOR SERVERS Server virtualization offers tremendous benefits for enterprise IT organizations server
Zabbix 1.8 Network Monitoring
Zabbix 1.8 Network Monitoring Monitor your network's hardware, servers, and web performance effectively and efficiently Rihards Olups - PUBLISHING - 1 BIRMINGHAM - MUMBAI Preface 1 Chapter 1: Getting Started
PANDORA FMS NETWORK DEVICES MONITORING
NETWORK DEVICES MONITORING pag. 2 INTRODUCTION This document aims to explain how Pandora FMS can monitor all the network devices available in the market, like Routers, Switches, Modems, Access points,
FOR SERVERS 2.2: FEATURE matrix
RED hat ENTERPRISE VIRTUALIZATION FOR SERVERS 2.2: FEATURE matrix Red hat enterprise virtualization for servers Server virtualization offers tremendous benefits for enterprise IT organizations server consolidation,
Improved metrics collection and correlation for the CERN cloud storage test framework
Improved metrics collection and correlation for the CERN cloud storage test framework September 2013 Author: Carolina Lindqvist Supervisors: Maitane Zotes Seppo Heikkila CERN openlab Summer Student Report
Veritas Storage Foundation High Availability for Windows by Symantec
Veritas Storage Foundation High Availability for Windows by Symantec Simple-to-use solution for high availability and disaster recovery of businesscritical Windows applications Data Sheet: High Availability
WÜRTHPHOENIX NetEye Version 3
WÜRTHPHOENIX NetEye Release Note WÜRTHPHOENIX NetEye Version 3 Release date: March 2009 Overview of the updates and newly introduced functionalities in VS 3 In the following summaries, you can obtain a
UNISOL SysAdmin. SysAdmin helps systems administrators manage their UNIX systems and networks more effectively.
1. UNISOL SysAdmin Overview SysAdmin helps systems administrators manage their UNIX systems and networks more effectively. SysAdmin is a comprehensive system administration package which provides a secure
PANDORA FMS NETWORK DEVICE MONITORING
NETWORK DEVICE MONITORING pag. 2 INTRODUCTION This document aims to explain how Pandora FMS is able to monitor all network devices available on the marke such as Routers, Switches, Modems, Access points,
Wait, How Many Metrics? Monitoring at Quantcast
Wait, How Many Metrics? Monitoring at Quantcast Count what is countable, measure what is measurable, and what is not measurable, make measurable. Galileo Galilei Quantcast offers free direct audience measurement
RED HAT ENTERPRISE VIRTUALIZATION
Giuseppe Paterno' Solution Architect Jan 2010 Red Hat Milestones October 1994 Red Hat Linux June 2004 Red Hat Global File System August 2005 Red Hat Certificate System & Dir. Server April 2006 JBoss April
PLUMgrid Toolbox: Tools to Install, Operate and Monitor Your Virtual Network Infrastructure
Toolbox: Tools to Install, Operate and Monitor Your Virtual Network Infrastructure Introduction The concept of Virtual Networking Infrastructure (VNI) is disrupting the networking space and is enabling
Khóa học dành cho các kỹ sư hệ thống, quản trị hệ thống, kỹ sư vận hành cho các hệ thống ảo hóa ESXi, ESX và vcenter Server
1. Mục tiêu khóa học. Khóa học sẽ tập trung vào việc cài đặt, cấu hình và quản trị VMware vsphere 5.1. Khóa học xây dựng trên nền VMware ESXi 5.1 và VMware vcenter Server 5.1. 2. Đối tượng. Khóa học dành
Rackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
"FRAMEWORKING": A COLLABORATIVE APPROACH TO CONTROL SYSTEMS DEVELOPMENT
10th ICALEPCS Int. Conf. on Accelerator & Large Expt. Physics Control Systems. Geneva, 10-14 Oct 2005, P-O1.049-6 (2005) "FRAMEWORKING": A COLLABORATIVE APPROACH TO CONTROL SYSTEMS DEVELOPMENT ABSTRACT
Software Scalability Issues in Large Clusters
Software Scalability Issues in Large Clusters A. Chan, R. Hogue, C. Hollowell, O. Rind, T. Throwe, T. Wlodek Brookhaven National Laboratory, NY 11973, USA The rapid development of large clusters built
WhatsUp Gold vs. Orion
Gold vs. Building the network management solution that will work for you is very easy with the Gold family just mix-and-match the Gold plug-ins that you need (WhatsVirtual, WhatsConnected, Flow Monitor,
System Monitoring Using NAGIOS, Cacti, and Prism.
System Monitoring Using NAGIOS, Cacti, and Prism. Thomas Davis, NERSC and David Skinner, NERSC ABSTRACT: NERSC uses three Opensource projects and one internal project to provide systems fault and performance
AGENDA: INTRODUCTION: 1. How is our cloud monitoring setup? 2. Which are the tools used? 3. How do we access monitoring dashboard?
Nagios Introduction AGENDA: INTRODUCTION: 1. How is our cloud monitoring setup? 2. Which are the tools used? 3. How do we access monitoring dashboard? 4. What are the user id / password? 5. How to check
Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms
EXECUTIVE SUMMARY Intel Cloud Builder Guide Intel Xeon Processor-based Servers Red Hat* Cloud Foundations Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms Red Hat* Cloud Foundations
OnCommand Performance Manager 1.1
OnCommand Performance Manager 1.1 Installation and Setup Guide For Red Hat Enterprise Linux NetApp, Inc. 495 East Java Drive Sunnyvale, CA 94089 U.S. Telephone: +1 (408) 822-6000 Fax: +1 (408) 822-4501
NETWORK MONITORING & ALERTING SERVICES SERVICE DEFINITION
NETWORK MONITORING & ALERTING SERVICES Complete IT Support for Business Westgate IT Network Monitoring & Alerting Services: Service Definition Service Name Network Monitoring & Alerting Services Overview
Using EonStor FC-host Storage Systems in VMware Infrastructure 3 and vsphere 4
Using EonStor FC-host Storage Systems in VMware Infrastructure 3 and vsphere 4 Application Note Abstract This application note explains the configure details of using Infortrend FC-host storage systems
White paper: Unlocking the potential of load testing to maximise ROI and reduce risk.
White paper: Unlocking the potential of load testing to maximise ROI and reduce risk. Executive Summary Load testing can be used in a range of business scenarios to deliver numerous benefits. At its core,
Tk20 Network Infrastructure
Tk20 Network Infrastructure Tk20 Network Infrastructure Table of Contents Overview... 4 Physical Layout... 4 Air Conditioning:... 4 Backup Power:... 4 Personnel Security:... 4 Fire Prevention and Suppression:...
Cloud Server. Parallels. Key Features and Benefits. White Paper. www.parallels.com
Parallels Cloud Server White Paper Key Features and Benefits www.parallels.com Table of Contents Introduction... 3 Key Features... 3 Distributed Cloud Storage (Containers and Hypervisors)... 3 Rebootless
Database Services for Physics @ CERN
Database Services for Physics @ CERN Deployment and Monitoring Radovan Chytracek CERN IT Department Outline Database services for physics Status today How we do the services tomorrow? Performance tuning
MS SQL Server 2014 New Features and Database Administration
MS SQL Server 2014 New Features and Database Administration MS SQL Server 2014 Architecture Database Files and Transaction Log SQL Native Client System Databases Schemas Synonyms Dynamic Management Objects
Analysis of Issues with Load Balancing Algorithms in Hosted (Cloud) Environments
Analysis of Issues with Load Balancing Algorithms in Hosted (Cloud) Environments Branko Radojević *, Mario Žagar ** * Croatian Academic and Research Network (CARNet), Zagreb, Croatia ** Faculty of Electrical
ATLAS job monitoring in the Dashboard Framework
ATLAS job monitoring in the Dashboard Framework J Andreeva 1, S Campana 1, E Karavakis 1, L Kokoszkiewicz 1, P Saiz 1, L Sargsyan 2, J Schovancova 3, D Tuckett 1 on behalf of the ATLAS Collaboration 1
Best Practices for Optimizing Your Linux VPS and Cloud Server Infrastructure
Best Practices for Optimizing Your Linux VPS and Cloud Server Infrastructure Q1 2012 Maximizing Revenue per Server with Parallels Containers for Linux www.parallels.com Table of Contents Overview... 3
Neverfail for Windows Applications June 2010
Neverfail for Windows Applications June 2010 Neverfail, from Neverfail Ltd. (www.neverfailgroup.com), ensures continuity of user services provided by Microsoft Windows applications via data replication
A recipe using an Open Source monitoring tool for performance monitoring of a SaaS application.
A recipe using an Open Source monitoring tool for performance monitoring of a SaaS application. Sergiy Fakas, TOA Technologies Nagios is a popular open-source tool for fault-monitoring. Because it does
SYNERGY SOFTWARE DATA SHEET. www.the-imcgroup.com
SYNERGY SOFTWARE DATA SHEET www.the-imcgroup.com SYNERGY SOFTWARE Synergy is the software platform system to support many current and all future Hanwell hardware and replaces the current RadioLog software.
High Availability of the Polarion Server
Polarion Software CONCEPT High Availability of the Polarion Server Installing Polarion in a high availability environment Europe, Middle-East, Africa: Polarion Software GmbH Hedelfinger Straße 60 70327
Automating Management for Weblogic Server on JRockit-VE. Marius Sandu-Popa
Automating Management for Weblogic Server on JRockit-VE Marius Sandu-Popa CERN openlab 8 August 2010 otn-2010-01 openlab Summer Student Report Automating Management for WebLogic Server on JRockit-VE Marius
A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing
A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South
EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications
ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani
CSS ONEVIEW G-Cloud CA Nimsoft Monitoring
CSS ONEVIEW G-Cloud CA Nimsoft Monitoring Service Definition 01/04/2014 CSS Delivers Contents Contents... 2 Executive Summary... 3 Document Audience... 3 Document Scope... 3 Information Assurance:... 3
Configuring Windows Server Clusters
Configuring Windows Server Clusters In Enterprise network, group of servers are often used to provide a common set of services. For example, Different physical computers can be used to answer request directed
NOCTUA by init.at THE FLEXIBLE MONITORING WEB FRONTEND
NOCTUA by init.at THE FLEXIBLE MONITORING WEB FRONTEND init.at informationstechnologie GmbH - Tannhäuserplatz 2 - A-1150 Wien - www.init.at Dieses Dokument und alle Teile von ihm bilden ein geistiges Eigentum
Monitoring individual traffic flows within the ATLAS TDAQ network
Home Search Collections Journals About Contact us My IOPscience Monitoring individual traffic flows within the ATLAS TDAQ network This content has been downloaded from IOPscience. Please scroll down to
DevOps Course Content
DevOps Course Content INTRODUCTION TO DEVOPS What is DevOps? History of DevOps Dev and Ops DevOps definitions DevOps and Software Development Life Cycle DevOps main objectives Infrastructure As A Code
Red Hat Network Satellite (On System z) 18-JUNE CAVMEN Meeting
Red Hat Network Satellite (On System z) 18-JUNE CAVMEN Meeting Shawn D. Wells System z Platform Manager (+1) 443 534 0130 Why are we here? PROBLEM SCENARIO SysAdmin wants to automate Linux
VMware vsphere: Install, Configure, Manage [V5.0]
VMware vsphere: Install, Configure, Manage [V5.0] Gain hands-on experience using VMware ESXi 5.0 and vcenter Server 5.0. In this hands-on, VMware -authorized course based on ESXi 5.0 and vcenter Server
Network Management Deployment Guide
Smart Business Architecture Borderless Networks for Midsized organizations Network Management Deployment Guide Revision: H1CY10 Cisco Smart Business Architecture Borderless Networks for Midsized organizations
VMware Server 2.0 Essentials. Virtualization Deployment and Management
VMware Server 2.0 Essentials Virtualization Deployment and Management . This PDF is provided for personal use only. Unauthorized use, reproduction and/or distribution strictly prohibited. All rights reserved.
Windows Server 2008 R2 Hyper-V Server and Windows Server 8 Beta Hyper-V
Features Comparison: Hyper-V Server and Hyper-V February 2012 The information contained in this document relates to a pre-release product which may be substantially modified before it is commercially released.
The enterprise class monitoring solution for everyone
The enterprise class monitoring solution for everyone Today's topics Zabbix overview We met 2 years ago what's new? Uncommon uses 2 years for Zabbix Zabbix 2.0 out in May 13 maintenance releases of 1.8
Gigabyte Content Management System Console User s Guide. Version: 0.1
Gigabyte Content Management System Console User s Guide Version: 0.1 Table of Contents Using Your Gigabyte Content Management System Console... 2 Gigabyte Content Management System Key Features and Functions...
LSKA 2010 Survey Report Job Scheduler
LSKA 2010 Survey Report Job Scheduler Graduate Institute of Communication Engineering {r98942067, r98942112}@ntu.edu.tw March 31, 2010 1. Motivation Recently, the computing becomes much more complex. However,
Best Practices for Deploying and Managing Linux with Red Hat Network
Best Practices for Deploying and Managing Linux with Red Hat Network Abstract This technical whitepaper provides a best practices overview for companies deploying and managing their open source environment
Compare E SPIN NMS SaaS Plan & Addon
E-SPIN NMS System as a Service Subscription (Saas) Screenshot Depend on the final NMS system and/or subscribed service plan, the available dashboard component is different, can be generalize in the follow
Solution Brief Availability and Recovery Options: Microsoft Exchange Solutions on VMware
Introduction By leveraging the inherent benefits of a virtualization based platform, a Microsoft Exchange Server 2007 deployment on VMware Infrastructure 3 offers a variety of availability and recovery
