Icinga and Puppet
Dominik Schulz, Head of Datacenter and Operations, Magic Internet / MyVideo




A company of ProSiebenSat.1 Media AG
Berlin, May 2014

Our Stack
- Icinga: 300 hosts and over 4000 services
- Linux (Ubuntu, Debian)
- Managed by Puppet
- Heterogeneous infrastructure
  - Private cloud
  - Public cloud
  - Dedicated servers

Our Environments
- Several environments: development, integration, staging, live
- Several locations: private cloud, on-premise, public clouds, ...

Our History
- Introduced Puppet in 2012
- Introduced Zabbix in 2013
  - Not in use anymore
- Introduced PuppetDB in the second half of 2013
- Icinga since the end of 2013

Reflections
- Puppet knows everything about our environment
- Why not let it feed our monitoring?
- GitHub does it
- Puppet has native Nagios resources
- Sounds good?

The Plan
- Distributed Puppet / Icinga
- One Puppetmaster per environment
  - Also Icinga Satellite Master (dubbed "Server")
  - Performing active checks on its nodes
  - Submitting results to the Master
  - And Graphite Relay
- One central Icinga Server (dubbed "Master")
  - Only passive checks
  - Receiving check results from satellite masters

The Hierarchy
- Icinga Master
  - Icinga Server Live
    - Webserver Live
  - Icinga Server Staging
    - Webserver Staging
  - Icinga Server Integration
    - Webserver Integration

Excursus: Our Puppet Hierarchy
- We use modules, services and roles with Hiera
  - The well-known components, profiles, roles pattern with different names
- Each module knows about the package it handles and the supported OS
  - No data, no business logic
- Each service uses dumb modules to implement the business logic (wiring, folders, monitoring, backup, ...)
- Each role includes different services (no logic here)

Example Hierarchy
- Roles: Frontend Web, Backend Search
- Services: Nginx, Solr
- Modules: Nginx, Redis, Tomcat
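The three layers could be sketched roughly as follows. This is only an illustration: the slide does not show which module belongs to which service, so the wiring (Frontend Web role, Nginx service, Nginx module), the class names `role_frontend_web` and `module_nginx`, and the default vhost are assumptions.

```puppet
# Module: knows only its package and the supported OS.
# No data, no business logic.
class module_nginx {
  case $::osfamily {
    'Debian': {
      package { 'nginx':
        ensure => installed,
      }
    }
    default: {
      fail("module_nginx does not support ${::osfamily}")
    }
  }
}

# Service: uses dumb modules and adds the business logic
# (wiring, folders, monitoring, backup, ...).
class service_nginx_frontend (
  $vhost_name = 'www.example.com',  # hypothetical default
) {
  include module_nginx
  # Monitoring and backup resources for this service would go here.
}

# Role: only includes services, no logic.
class role_frontend_web {
  include service_nginx_frontend
}
```

A node would then be assigned exactly one role, and everything else follows from the services that role includes.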

The Configuration
- Each Puppet service exports its Icinga checks to the PuppetDB
- The Master and each Icinga Server realize these resources
(Diagram: Agent, PuppetDB, Icinga Master, Icinga Server)

Example: Distributed Check I

    class service_nginx_frontend (... ) {
      ...
      service_monitoring::check { "HTTP-localhost-81-${vhost_name}":
        service_description => "HTTP VHost for ${vhost_name} on Port 81",
        host_name           => $fqdn,
        command             => "check_http_81_${vhost_name}",
        nrpe_command        => "check_http -H ${vhost_name} -I 127.0.0.1 -p 81 -4 -k 'X-SECRET: 42' -u /ping/ -e 'HTTP/1.1 200'",
        process_perf_data   => 0,
        ...
      }
    }

Example: Distributed Check II

    define service_monitoring::check (
      $host_name,
      $command,
      $nrpe_command,
      ...
    ) {
      # Server: exported for the satellite Icinga Server
      @@module_icinga::server::check { "${fqdn}-${name}":
        ensure    => $ensure,
        host_name => $host_name,
        ...
      }
      # Master: exported for the central Icinga Master
      @@module_icinga::master::check { "${fqdn}-${name}":
        ensure    => $ensure,
        host_name => $host_name,
        ...
      }
      # Client: local NRPE command on the monitored node
      if $nrpe_command {
        module_icinga::client::check { $command:
          ...
        }
      }
    }

Example: Distributed Checks III

    # Master: collects everything
    class module_icinga::master::config (... ) {
      ...
      Module_icinga::Master::Check <<| |>>
      Module_icinga::Master::Host <<| |>>
    }

    # Server: collects only the resources of its own location
    class module_icinga::server::config ( $location, ... ) {
      ...
      Module_icinga::Server::Check <<| location == $location |>>
      Module_icinga::Server::Host <<| location == $location |>>
    }
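The slides do not show what a collected `module_icinga::server::check` turns into. One plausible shape, assuming the define simply wraps Puppet's `nagios_service` type (built in up to Puppet 5, a separate module afterwards), is sketched below. Every parameter other than `host_name` and `ensure`, the target path and the `generic-service` template are assumptions.

```puppet
# Hypothetical body of the define realized on a satellite "Server".
define module_icinga::server::check (
  $host_name,
  $service_description,
  $check_command,
  # Not used in the body; it exists so that the collector can
  # filter with <<| location == $location |>>.
  $location,
  $ensure = 'present',
) {
  nagios_service { $name:
    ensure              => $ensure,
    host_name           => $host_name,
    service_description => $service_description,
    check_command       => $check_command,
    # Assumes a service template with this name is already defined.
    use                 => 'generic-service',
    # Assumed path; it must lie in a directory Icinga reads via cfg_dir.
    target              => '/etc/icinga/objects/puppet_services.cfg',
    # Assumes Service['icinga'] is declared elsewhere in the catalog.
    notify              => Service['icinga'],
  }
}
```

Because the resource title is `"${fqdn}-${name}"` on the exporting side, titles stay unique across all nodes collected by the same Server.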

Example: Checking Backups I
- People don't want backups, they want recovery
  - We don't have checks for recoverability yet
  - But making sure backups actually succeed is important, too
- We use pull-backups
- Only a few hosts, most hosts don't need backups
  - Automated Provisioning, Deployment and Source Control
- Idea: Export backup jobs to the backup servers
  - Icinga checks should be exported as well

Example: Checking Backups II

    class service_gitlab (... ) {
      service_backup::vault { $fqdn:
        path => "/home/git",
        ...
      }
    }

- Every service should ensure it is backed up
  - Only if necessary
- Backup resource (vault) takes care of backup and monitoring

Example: Checking Backups III

    define service_backup::vault( ... ) {
      # Backup job, exported to the backup server
      @@module_revobackup::vault { "$name":
        source => "${user}@${host}:${path}",
        server => $backup_server,
        sudo   => 1,
      }
      # Check plugin, installed on the client
      module_icinga::client::plugin { 'check_backup':
        file => "puppet:///.../plugins/check_backup",
      }
      # Icinga check for the backup
      service_monitoring::check { 'Backup':
        nrpe_command => "check_backup -w $warn -c $crit -p $path",
        ...
      }
    }
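The collecting side on the backup server is not shown on the slides. A minimal sketch, assuming the `server` parameter of the exported vault holds the FQDN of the backup server and using a hypothetical class name, could look like this:

```puppet
# Hypothetical class applied on every backup server.
class module_revobackup::server {
  # Realize only the vaults that name this host as their backup server.
  Module_revobackup::Vault <<| server == $::fqdn |>>
}
```

The filter keeps each backup server from pulling jobs that were exported for a different one.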

Awesome!
- Sounds great and easy
- In practice it was a little more difficult
- Let's look at some of the issues we had with this setup

Issues
- Checks becoming stale
- Modeling Host- and Servicegroups
- Metrics
- Notifications
- Migrating old Nagios checks
- Breaking Icinga
- Removing checks
- Disabling certain hosts / environments
- Puppet / Ruby performance

Checks becoming stale
- NSCA / send_nsca are pretty old
  - Do not scale very well
- We often got batches of stale checks
- First we raised the freshness interval
- Once this was maxed out we tried nsca-ng
  - Out of the box with Ubuntu 14.04 LTS
- Since then we have not looked back
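For reference, a passive service with a freshness threshold might be declared on the Master roughly like this, using Puppet's `nagios_service` type. The host name, the 900-second threshold, the target path, the `generic-service` template and the `check_dummy` command object are all illustrative assumptions, not values from the talk.

```puppet
nagios_service { 'web01.example.com-HTTP':
  ensure                 => present,
  host_name              => 'web01.example.com',
  service_description    => 'HTTP',
  use                    => 'generic-service',
  # The Master never runs the check itself ...
  active_checks_enabled  => '0',
  passive_checks_enabled => '1',
  # ... but flags the service when no result arrives in time.
  check_freshness        => '1',
  freshness_threshold    => '900',
  # Executed only when the result is stale. Assumes a command
  # object named check_dummy that wraps the check_dummy plugin.
  check_command          => 'check_dummy!3!"No passive result received"',
  target                 => '/etc/icinga/objects/puppet_services.cfg',
}
```

Raising `freshness_threshold` only hides slow result delivery for a while, which matches the experience above that the transport itself had to be replaced.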

Modeling Host- and Servicegroups
- Host- and Servicegroups are very nice
- Modeling them in Puppet is a little difficult
- Hint: Use fallback groups which are always defined
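One reading of the fallback-group hint is sketched below: a hostgroup that the Icinga server always declares itself, used as the default for every exported host. The group name, the define's parameters and the file paths are hypothetical.

```puppet
# Always declared on the Icinga server, independent of exported resources.
nagios_hostgroup { 'ungrouped-hosts':
  ensure => present,
  alias  => 'Hosts without a more specific group',
  target => '/etc/icinga/objects/puppet_hostgroups.cfg',
}

# Hypothetical wrapper for exported hosts.
define module_icinga::server::host (
  $address,
  # Not used in the body; available for collector filters.
  $location,
  $hostgroups = 'ungrouped-hosts',  # the fallback group above
  $ensure     = 'present',
) {
  nagios_host { $name:
    ensure     => $ensure,
    address    => $address,
    hostgroups => $hostgroups,
    # Assumes a host template with this name is already defined.
    use        => 'generic-host',
    target     => '/etc/icinga/objects/puppet_hosts.cfg',
  }
}
```

With this default, a host exported without any group still references a group that is guaranteed to exist.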

Gathering Metrics
- Started feeding Nagios perfdata to Graphite
- Quickly became clear that we want a finer resolution
- Switched to Diamond + Graphite Relays
  - Works quite well
- StatsD / collectd may be even better suited
  - If we want to switch, Puppet makes this pretty easy

Notifications
- Sending notifications (Email / SMS) is still an issue
- A large environment tends to produce quite a lot of false positives
  - If only for a short period of time
- aNag works quite well, but it's no push notification
- Suggestions?

Migrating old Nagios Checks
- When we introduced Icinga we still had an old Nagios instance running pretty much unattended
- How do you migrate those checks to a Puppet-managed Icinga Master?
- Easy: Add this to your /etc/icinga/icinga.conf:

    cfg_dir=/etc/icinga/legacy.d/

- Put the old configuration in there
- Some minor adjustments and you're good to go
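If the legacy directory should itself be under Puppet's control, the step above could be expressed along these lines. This is a sketch: it assumes the puppetlabs-stdlib module (for `file_line`), a `Service['icinga']` declared elsewhere, and a hypothetical module path for the old configuration files. The main configuration path is the one given on the slide; on many installations the file is named `icinga.cfg`, so adjust it as needed.

```puppet
# Path as given on the slide; adjust to your installation.
$icinga_main_cfg = '/etc/icinga/icinga.conf'

# Ship the old, hand-written Nagios objects unchanged.
file { '/etc/icinga/legacy.d':
  ensure  => directory,
  recurse => true,
  # Hypothetical location of the legacy files on the Puppetmaster.
  source  => 'puppet:///modules/module_icinga/legacy.d',
  notify  => Service['icinga'],
}

# Make Icinga read the directory in addition to the generated objects.
file_line { 'icinga-legacy-cfg-dir':
  path    => $icinga_main_cfg,
  line    => 'cfg_dir=/etc/icinga/legacy.d/',
  require => File['/etc/icinga/legacy.d'],
  notify  => Service['icinga'],
}
```

Keeping the legacy objects in their own directory makes it easy to delete them one by one as the checks are re-implemented as exported resources.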

Removing Checks
- Puppet / PuppetDB is great tooling
- But sometimes it complicates things a bit
- Removing hosts or services is not as easy as it used to be w/o Puppet
- Removing a host: puppet node deactivate <fqdn>
- Removing a service: Export an Icinga check resource with ensure => absent
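Applied to the HTTP check from the earlier example, removing a service might look like the sketch below. It assumes that `service_monitoring::check` accepts an `ensure` parameter (the define passes `$ensure` on to the exported resources) and that `$vhost_name` is set in the surrounding class.

```puppet
# Inside the service class that used to declare the check:
# keep the declaration, but flip it to absent.
service_monitoring::check { "HTTP-localhost-81-${vhost_name}":
  ensure       => absent,
  host_name    => $::fqdn,
  command      => "check_http_81_${vhost_name}",
  nrpe_command => "check_http -H ${vhost_name} -I 127.0.0.1 -p 81",
}
```

Simply deleting the declaration would leave the already written service definition on the Icinga servers. Once the node and the Icinga servers have each completed a Puppet run with `ensure => absent`, the declaration can be dropped from the manifest.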

Disabling Hosts
- Having all hosts in monitoring is great
- But certain hosts don't need to be monitored
  - Reduce noise and distraction
- Using $enable flags on exported resources is the key
  - May take a few iterations to get it right
  - Icinga doesn't like services referencing non-existing hosts
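A minimal sketch of such a flag is shown below, assuming one switch drives the host and all of its checks so that no service is left pointing at a missing host. The define name and all parameters of the exported resources are hypothetical.

```puppet
# Hypothetical wrapper: one flag for the host and its checks.
define service_monitoring::node (
  $location,
  $enable = true,
) {
  if $enable {
    $ensure = 'present'
  } else {
    $ensure = 'absent'
  }

  # Host and service share the same ensure value, so Icinga never
  # sees a service that references a removed host.
  @@module_icinga::server::host { $::fqdn:
    ensure   => $ensure,
    address  => $::ipaddress,
    location => $location,
  }

  @@module_icinga::server::check { "${::fqdn}-PING":
    ensure              => $ensure,
    host_name           => $::fqdn,
    service_description => 'PING',
    check_command       => 'check_ping!100.0,20%!500.0,60%',
    location            => $location,
  }
}
```

The flag itself would typically come from Hiera, so that a whole environment can be silenced with a single value.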

Puppet / Ruby Performance
- As the number of resources grows things slow down
- Puppet is written in Ruby
  - Ruby is optimized for developers
  - Ruby is NOT optimized for execution speed
- Puppet tends to get really slow with huge catalogs
- Solution: Raise your Puppet timeouts VERY high
- We're really eager to see how Ruby 2.x will perform
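The talk does not say which timeouts were raised or to what value. As one example, on a Puppet 3 agent the `configtimeout` setting limits how long the agent waits for its catalog. It could be managed with the puppetlabs-inifile module as sketched here; the module, the file path and the value of 600 seconds are assumptions.

```puppet
# Assumes the puppetlabs-inifile module and a Puppet 3 agent.
ini_setting { 'puppet agent configtimeout':
  ensure  => present,
  path    => '/etc/puppet/puppet.conf',
  section => 'agent',
  setting => 'configtimeout',
  # Illustrative value in seconds; the slides give no number.
  value   => '600',
}
```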

Questions? Suggestions?