Kickstart & Puppet @ Booking. Kristian Köhntopp, booking.com



What Booking does Facilitates Hotel Room Bookings between Travelers and Hotels. Just that.

Booking Data Hotel Base Data, Brochures, Reviews & Score, Availability by Room, Rate and Date. A large history of stuff.

Booking Tech Frontends with Linux, Apache, mod_perl, in different functional classes. Databases: MySQL, also differentiated by function. Lots of infrastructure systems.

Booking Size Frontend-to-database ratio of about 4-6 to 1. About 160 slaves, about a dozen schemata. About 1000 hosts. Growing fast.

Building a new DC Build a Business Continuity Facility! You are not allowed to touch! Completely automated installation and configuration.

ServerDB MAC addresses pre-announced by vendor. Or gathered from OOB maintenance interface for installed machines. Enter it into ServerDB, Assign function and status.

pxebooting Generate a PXE boot config and a Kickstart (KS) file. PXE-boot the box the first time. Boot order: disk, then net. Menu as additional safeguard unless marked in ServerDB.

pxelinux.cfg pxelinux loads pxelinux.cfg/01-$mac. atftpd has been patched: it calls a script for nonexistent files, and the script acts on ServerDB flags.
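The file name pxelinux asks for can be derived from the MAC address. A minimal sketch, assuming the standard pxelinux convention (ARP hardware type 01 for Ethernet, then the MAC in lowercase hex separated by dashes); the function name is hypothetical and not part of Booking's tooling:

```python
import re


def pxelinux_config_name(mac: str) -> str:
    """Return the per-host config path pxelinux requests for a MAC address."""
    # Accept any common separator (":", "-", ".") by keeping hex digits only.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets = [digits[i:i + 2].lower() for i in range(0, 12, 2)]
    # "01" is the ARP hardware type for Ethernet, which pxelinux prepends.
    return "pxelinux.cfg/01-" + "-".join(octets)


if __name__ == "__main__":
    print(pxelinux_config_name("00:1E:68:0F:46:F8"))
    # prints pxelinux.cfg/01-00-1e-68-0f-46-f8
```

A patched TFTP server could use the inverse of this mapping to find the ServerDB record for a requested file.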

pxelinux.cfg
    [root@bkbuild-01 bin]# tftp_generator --file kstest
    Serving pxelinux.cfg file for 00:1E:68:0F:46:F8/kstest
    # Generated from data in the serverdb
    # See https://wiki/ /ServerDB
    PROMPT 1
    TIMEOUT 50
    DEFAULT co5-x86_64
    LABEL local
      LOCALBOOT 0x80
    LABEL co5-x86_64
      kernel vmlinuz-co5-x86_64
      append initrd=initrd-co5-x86_64 lang=us pci=bfsort nofb text devfs=nomount ramdisk_size=7168 network ksdevice=eth0 ks=http:// /kick/kstest.dqs.lhr1.booking.com
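A generator of this kind might look roughly like the following sketch. The function name, its parameters and the rule that hosts in state live or standby default to the local disk are assumptions made for illustration; the real tftp_generator is not shown in the slides. The append line is shortened, and the Kickstart URL is passed in rather than guessed.

```python
def render_pxelinux_cfg(profile: str, ks_url: str, state: str) -> str:
    """Render a pxelinux config for one host from ServerDB-like values."""
    # Assumption: hosts already in service boot from disk by default,
    # so a stray PXE boot cannot reinstall them unattended.
    default = "local" if state in ("live", "standby") else profile
    lines = [
        "# Generated from data in the serverdb",
        "PROMPT 1",
        "TIMEOUT 50",
        f"DEFAULT {default}",
        "LABEL local",
        "  LOCALBOOT 0x80",
        f"LABEL {profile}",
        f"  kernel vmlinuz-{profile}",
        # Shortened: the slide passes several more kernel options here.
        f"  append initrd=initrd-{profile} ksdevice=eth0 ks={ks_url}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # example.invalid is a placeholder host, not the real Kickstart server.
    print(render_pxelinux_cfg("co5-x86_64",
                              "http://example.invalid/kick/kstest",
                              "install"))
```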

Kickstart Load .ks file via HTTP. Dynamically generated in Apache from ServerDB.

Kickstart
    part /boot --fstype ext3 --size 100 --asprimary
    part swap --size 1000
    part pv.01 --size=100 --grow
    volgroup VolGroup00 pv.01
    logvol / --fstype ext3 --name=root --vgname=VolGroup00 --size=100 --grow
    %post
    /bin/rm -f /etc/yum.repos.d/*
    /bin/cat > /etc/yum.repos.d/booking.repo <<EOF
    yum -y install puppet ruby-rdoc
    /sbin/chkconfig --level 345 puppet on
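Generating the partitioning section per host can be sketched with a plain string template. The record keys (boot_mb, swap_mb, vg) and their defaults are hypothetical stand-ins for whatever ServerDB actually stores; only the Kickstart directives themselves come from the slide.

```python
from string import Template

# Partitioning section of the Kickstart file, with per-host placeholders.
KS_PARTITIONING = Template("""\
part /boot --fstype ext3 --size $boot_mb --asprimary
part swap --size $swap_mb
part pv.01 --size=100 --grow
volgroup $vg pv.01
logvol / --fstype ext3 --name=root --vgname=$vg --size=100 --grow
""")


def render_partitioning(host: dict) -> str:
    """Fill the template from a host record, falling back to defaults."""
    return KS_PARTITIONING.substitute(
        boot_mb=host.get("boot_mb", 100),
        swap_mb=host.get("swap_mb", 1000),
        vg=host.get("vg", "VolGroup00"),
    )


if __name__ == "__main__":
    print(render_partitioning({"swap_mb": 2000}))
```

Using one placeholder for the volume group name keeps the volgroup and logvol lines in agreement, which matters because LVM names are case-sensitive.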

Overrides If a file exists, the generator scripts are not called. This applies at the pxeboot level and at the kickstart level. Alternative: set the state to live or standby in ServerDB, and you get the menu.
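The override rule amounts to "a file on disk wins, otherwise generate". A minimal sketch, assuming a flat directory of hand-written files; the function and parameter names are made up for illustration:

```python
from pathlib import Path
from typing import Callable


def serve(root: Path, name: str, generate: Callable[[str], str]) -> str:
    """Return a hand-written file if one exists, else the generated content."""
    static = root / name
    if static.is_file():
        # Manual override: the generator is never called for this name.
        return static.read_text()
    return generate(name)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(serve(root, "kstest", lambda n: f"generated for {n}"))
        (root / "kstest").write_text("hand-written override")
        print(serve(root, "kstest", lambda n: f"generated for {n}"))
```

The same lookup works for both the TFTP and the HTTP side, which is what gives special cases an easy way out of the automation.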

Lessons so far Automate everything. Use a database. Provide an easy way out: Optimize the common case, Forward special cases.

Puppet Migrate to Puppet gradually. Run puppetd everywhere, on existing and new hosts. Have it do nothing at first. Roll out node-by-node, service-by-service.

Puppet Right now: 318 nodes in site.pp. 141 databases in site.pp. LDAP planned.

Migration to Puppet Test a new service definition. Roll out to individual nodes via site.pp. If fine, make part of base::common, if applicable to all nodes.

base::common Common services: Cron, Nagios, nsswitch, LDAP, NTP, Puppet, resolver, ssh, SNMP, sudo, sysctl, syslog. Package Management and common packages.

Differentiation Apache (lots of flavors). Service definitions according to function. Databases (partial): MySQL and Merlin deploys, requires storage configuration. Memcaches.

Differentiation
    node "mc01lb-01.prod.lhr1.booking.com" { include "s_lb" }
    node "sc01static-01.prod.lhr1.booking.com" { include "s_webstatic::static" }
    node "mc01avrdb-02.prod.lhr1.booking.com" { include "s_db::avrdb" }
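With several hundred nodes, stanzas like these could be emitted from a host-to-service table instead of being typed by hand. This is only a sketch of that idea; the slides do not say how site.pp is maintained, and both function names are hypothetical.

```python
def node_stanza(fqdn: str, service_class: str) -> str:
    """Format one Puppet node definition that includes a service class."""
    return f'node "{fqdn}" {{ include "{service_class}" }}'


def render_site_pp(nodes: dict[str, str]) -> str:
    """Render all node definitions, sorted by host name for stable diffs."""
    stanzas = [node_stanza(host, cls) for host, cls in sorted(nodes.items())]
    return "\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(render_site_pp({
        "mc01lb-01.prod.lhr1.booking.com": "s_lb",
        "mc01avrdb-02.prod.lhr1.booking.com": "s_db::avrdb",
    }))
```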

Differentiation Service definitions vary wildly in size: Load balancer: 10 lines. Database: 541 lines. Not even complete yet. About 2 dozen services. About 2 dozen modules.

Benefits Works. Pretty. Cross-platform. Deploy time from power-on: 20 min through Kickstart, plus an additional 6 to 20 min through Puppet.

Possible problems In creating puppet structure, we ran into a number of obstacles. For some of these, solutions exist. For others, workarounds are needed.

Problems: Conceptual Declarative syntax: tell Puppet what you want, not how it is done. Hard to do for some services. Task: generate a my.cnf. No way out? Generator script → Deploy.
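The generator-script route can be sketched as follows. This is not Booking's generator: the function name, the chosen settings and the 0.7 buffer pool fraction are all assumptions for illustration. Only the option names (server-id, datadir, innodb_buffer_pool_size) are real MySQL settings.

```python
def render_my_cnf(server_id: int, ram_mb: int,
                  datadir: str = "/var/lib/mysql",
                  pool_fraction: float = 0.7) -> str:
    """Render a small [mysqld] section sized for the host's memory."""
    # Assumption: give a fixed share of RAM to the InnoDB buffer pool.
    pool_mb = int(ram_mb * pool_fraction)
    return (
        "[mysqld]\n"
        f"server-id = {server_id}\n"
        f"datadir = {datadir}\n"
        f"innodb_buffer_pool_size = {pool_mb}M\n"
    )


if __name__ == "__main__":
    print(render_my_cnf(server_id=102, ram_mb=32768))
```

The script runs on the node, where memory size and storage layout are known, and Puppet only has to deploy the script and trigger it.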

Problems: Facter Facter → Server/Template → Node. Facts are scalars. Templating happens at the server. Task: generate a my.cnf, manage LVM facts.
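Because facts are scalars, structured data such as an LVM layout has to be flattened into many individually named facts before a template can use it. A hedged sketch of one possible flattening scheme; the naming convention and the sample data are invented here, not taken from the talk:

```python
def flatten_facts(prefix: str, value) -> dict[str, str]:
    """Flatten nested dicts and lists into scalar, string-valued facts."""
    facts: dict[str, str] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            facts.update(flatten_facts(f"{prefix}_{key}", child))
    elif isinstance(value, (list, tuple)):
        # Lists become one comma-separated string the template can split.
        facts[prefix] = ",".join(str(item) for item in value)
    else:
        facts[prefix] = str(value)
    return facts


if __name__ == "__main__":
    layout = {"vgs": ["VolGroup00"],
              "VolGroup00": {"size_gb": 136, "lvs": ["root", "data"]}}
    for name, val in flatten_facts("lvm", layout).items():
        print(f"{name} = {val}")
```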

Problems: Performance Puppet performs as if it was written in Ruby. mod_ruby is a must. splay does not help a lot.

Problems: Large files As a file transfer service, Puppet sucks. Task: deploy one of several 18 MB .bin files for Merlin, run a bunch of setup scripts. Lazy solution: Filebucket → OOM. Pseudo-RPM → yum. Fixed in upcoming release.

Problems: Instability Logrotate during Puppet run: Puppet crashes. High load during Facter run: crashing facts are cached → server poisoned. All of these are Heisenbugs.

Problems: Ordering Puppet reorders and could parallelize. Dependencies must be declared. That is hard to do and debug. Parse puppet and drop into graphviz: --graph option.
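Why declared dependencies matter can be shown with a small model: given only "requires" edges, any topological order is valid, and a cycle is an error. The sketch below uses Python's standard graphlib (3.9+) and writes Graphviz DOT text by hand. It illustrates the concept and is not how Puppet computes its order; the resource names are examples.

```python
from graphlib import CycleError, TopologicalSorter


def apply_order(requires: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which dependencies come first."""
    # Keys are resources, values are the resources they require.
    return list(TopologicalSorter(requires).static_order())


def to_dot(requires: dict[str, set[str]]) -> str:
    """Render the dependency graph as Graphviz DOT text."""
    lines = ["digraph resources {"]
    for resource in sorted(requires):
        for dep in sorted(requires[resource]):
            lines.append(f'  "{dep}" -> "{resource}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    deps = {
        "Service[mysqld]": {"File[/etc/my.cnf]", "Package[mysql-server]"},
        "File[/etc/my.cnf]": {"Package[mysql-server]"},
    }
    print(apply_order(deps))
    print(to_dot(deps))
    try:
        apply_order({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        print("cycle:", err.args[1])
```

Drawing the graph makes a missing edge visible: two resources with no path between them may be applied in either order.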