Scalable System Operations. Joshua Hoffman



Similar documents
Table 1-1. PXE Server Side Requirements

Deploying Foreman in Enterprise Environments 2.0. best practices and lessons learned. Nils Domrose Cologne, August,

NOC PS manual. Copyright Maxnet All rights reserved. Page 1/45 NOC-PS Manuel EN version 1.3

SUSE Linux Enterprise Server 11 SP2 for UEFI Clients

Installing RHEL 6.x from beginning to end using PXE and Kickstart

Achieve Automated, End-to-End Firmware Management with Cisco UCS Manager

Open Source Datacenter Conference 2011 System Management with RHN Satellite. Dirk Herrmann, Solution Architect, Red Hat

Aspen Cloud Server Management Console

Parallels Cloud Server 6.0

THE HONG KONG POLYTECHNIC UNIVERSITY Department of Electronic and Information Engineering

Deploying Baremetal Instances with OpenStack

Red Hat System Administration 1(RH124) is Designed for IT Professionals who are new to Linux.

Release Notes for Fuel and Fuel Web Version 3.0.1

DKTCOMEGA AONode Management Module User Documentation

Event Listener based UCS Integration with OpenStack

Open Network Install Environment (ONIE) LinuxCon North America 2015

ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK

Installing and Using the vnios Trial

SUSE Manager in the Public Cloud. SUSE Manager Server in the Public Cloud

How To Backup In Cisco Uk Central And Cisco Cusd (Cisco) Cusm (Custodian) (Cusd) (Uk) (Usd).Com) (Ucs) (Cyse

GL-250: Red Hat Linux Systems Administration. Course Outline. Course Length: 5 days

Installation of the On Site Server (OSS)

ovirt self-hosted engine seamless deployment

LOCKSS on LINUX. CentOS6 Installation Manual 08/22/2013

SOA Software API Gateway Appliance 7.1.x Administration Guide

HP CloudSystem Enterprise

FOG Guide. IPBRICK International. July 17, 2013

Closing the Loop: Automating Server Setups. Vanessa Vasile, InMotion Hosting

Cork Institute of Technology Master of Science in Computing in Education National Framework of Qualifications Level 9

Optimally Manage the Data Center Using Systems Management Tools from Cisco and Microsoft

Home Linux Networking Lab (202) This Howto shows how to recreate the CIS Lab environment at home.

File Transfers. Contents

Setting up a Linux PXE server and integrating clients

Cloud on TIEN Part I: OpenStack Cloud Deployment. Vasinee Siripoonya Electronic Government Agency of Thailand Kasidit Chanchio Thammasat

System Admin Module User Guide. Schmooze Com Inc.

Advanced Linux System Administration on Red Hat

VoIP Zone Controller: 4-Port Audio Out Operations Guide

ENTERPRISE LINUX SYSTEM ADMINISTRATION

RH033 Red Hat Linux Essentials or equivalent experience with Red Hat Linux..

APPLICATION NOTE. CC5MPX Digital Camera and IPn3Gb Cellular Modem 10/14. App. Note Code: 3T-Z

pp=pod number, xxx=static IP address assigned to your pod

Intelligent Power Protector User manual extension for Microsoft Virtual architectures: Hyper-V 6.0 Manager Hyper-V Server (R1&R2)

Application Notes for the Ingate SIParator with Avaya Converged Communication Server (CCS) - Issue 1.0

Dell UPS Local Node Manager USER'S GUIDE EXTENSION FOR MICROSOFT VIRTUAL ARCHITECTURES Dellups.com

CommandCenter Secure Gateway

Dell EqualLogic Red Hat Enterprise Linux 6.2 Boot from SAN

Apache CloudStack 4.x (incubating) Network Setup: excerpt from Installation Guide. Revised February 28, :32 pm Pacific

Preparation Guide. How to prepare your environment for an OnApp Cloud v3.0 (beta) deployment.

An Oracle White Paper July Oracle VM 3: Building a Demo Environment using Oracle VM VirtualBox

Setting up your K12LTSP or LTSP loadbalancing

LOCKSS on LINUX. Installation Manual and the OpenBSD Transition 02/17/2011

Field Installation Guide

Installing Operating Systems

Red Hat Linux Networking

BUFFALO LS421DE NAS. 1) Go to Buffalo web-site, buffalotech.com, download the following: - firmware update, comes with LSupdater program - unzip file

How To Install Eucalyptus (Cont'D) On A Cloud) On An Ubuntu Or Linux (Contd) Or A Windows 7 (Cont') (Cont'T) (Bsd) (Dll) (Amd)

Tracking Network Changes Using Change Audit

Cloud.com CloudStack Community Edition 2.1 Beta Installation Guide

Managing the System Event Log

VERITAS NetBackup Bare Metal Restore 6.0

Implementing and Troubleshooting the Cisco Cloud Infrastructure **Part of CCNP Cloud Certification Track**

PVFS High Availability Clustering using Heartbeat 2.0

Note: This case study utilizes Packet Tracer. Please see the Chapter 5 Packet Tracer file located in Supplemental Materials.

V310 Support Note Version 1.0 November, 2011

An Oracle White Paper June Oracle Linux Management with Oracle Enterprise Manager 12c

Active Directory - User, group, and computer account management in active directory on a domain controller. - User and group access and permissions.

How To Install An Org Vm Server On A Virtual Box On An Ubuntu (Orchestra) On A Windows Box On A Microsoft Zephyrus (Orroster) 2.5 (Orner)

OpenStack Tutorial. IEEE CloudCom Bret Piatt Community Stacker

Install Guide for JunosV Wireless LAN Controller

Managing the System Event Log

REDEFINING THE ENTERPRISE OS RED HAT ENTERPRISE LINUX 7

CT LANforge-FIRE VoIP Call Generator

Hijacking the Cloud: Systematic Risk in Datacenter Management Networks

How to Create a Virtual Switch in VMware ESXi

ACP ThinManager Tech Notes Troubleshooting Guide

Red Hat Linux Administration II Installation, Configuration, Software and Troubleshooting

Deploy and Manage Hadoop with SUSE Manager. A Detailed Technical Guide. Guide. Technical Guide Management.

Deployment of Private, Hybrid & Public Clouds with OpenNebula

Self-Service Cloud Infrastructure For Dynamic IT Environments

DeployStudio Server Quick Install

SUSE Cloud 2.0. Pete Chadwick. Douglas Jarvis. Senior Product Manager Product Marketing Manager

Service Bulletin [ ]

Managing the System Event Log

RedHat (RHEL) System Administration Course Summary

VMware Auto Deploy Administrator s Guide

Virtual Appliance Setup Guide

PARALLELS SERVER 4 BARE METAL README

MANAGE INFRASTRUCTURE AND DEPLOY SERVICES WITH EASE USING DELL ACTIVE SYSTEM MANAGER

Configuring the Edgewater 4550 for use with the Bluestone Hosted PBX

How To Use 1Bay 1Bay From Awn.Net On A Pc Or Mac Or Ipad (For Pc Or Ipa) With A Network Box (For Mac) With An Ipad Or Ipod (For Ipad) With The

IPMI overview. Power. I/O expansion. Peripheral UPS logging RAID. power control. recovery. inventory. Hugo CERN-FIO-DS

Deskpool Quick Start. Version: V2.1.x. Based on Hyper-V Server 2012 R2. Shenzhen Jieyun Technology Co., Ltd (

Automation Tools for UCS Sysadmins

Virtualization Management the ovirt way

Transcription:

Scalable System Operations Joshua Hoffman 1

About This Talk Set of principles Operations Engineering Tumblr project Server management Massively automated Software Techniques Example code Best practices Open source 2

About Me 3

About Me 1995: CompUSA Intro to The Internet 2000: Guru Labs Sun, Cisco, Red Hat 2002: Red Hat Sys admin courseware 4

About Me 2004: Fortress Systems Anti-spam/malware 2005: Red Hat Virtualization cert Remote learning Defined cloud 2011: Tumblr Lead Systems Eng 5

The Problem Deploying new servers is very repetitive and slow. (and we hate that) http://www.flickr.com/people/mc4army/ 6

The Job We Don t Want http://www.flickr.com/people/redjar/ 7

The Solution 8

Install OS Configure OS Install software Configure software Add to DNS Add to monitoring Add to trending Firmware Configure BIOS Set up BMC Inventory Stress testing Network config 9

The Goal Is Clear 10

Time To Strategize http://www.flickr.com/people/irishwildcat/ 11

Time To Strategize Use open source? Which? Buy software? Which? Write software? Mix and match? http://www.flickr.com/people/irishwildcat/ 12

The Choice Principle The time to make a decision is a function of the possible choices. http://www.flickr.com/people/3059349393/ 13

Rapid Software Research 14

Rapid Software Research 1. Define 2. Gather 3. Disqualify 4. Rank 15

Rank http://www.flickr.com/people/nirak/ 16

Rank Modularity Compliance Novelty Disruption http://www.flickr.com/people/nirak/ 17

My Requirements Asset inventory State management Robust API Event triggers 18

My Requirements Modular Flexible Extensible Fast 19

My Requirements Manage physical hardware as easily as virtual machines. 20

The Usual Suspects Cobbler Foreman Satellite Orchestra Racktables Clusto 21

But Wait! 22

Data Entry Just import the data supplied by the hardware vendor... http://www.flickr.com/people/mwichary/ 23

Missing Requirements Firmware Configure BIOS Set up BMC Inventory Stress testing Network config Add to monitoring Add to trending 24

We have to write software! http://www.flickr.com/people/argen/ 25

We have to write software! Delivery Schedule Scope Creep Maintenance Documentation http://www.flickr.com/people/argen/ 26

Tumblr Management Stack ipxe Invisible Touch Collins Phil Kickstart Puppet 27

The Glue Principle Unix Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do. http://www.flickr.com/people/kodomut/ 28

The Standards Principle The nice thing about standards is that you have so many to choose from. -Andrew Tanenbaum http://www.flickr.com/people/usfwssoutheast/ 29

The Simplicity Principle Unix Rule of Simplicity: Design for simplicity; add complexity only where you must. http://www.flickr.com/people/haldanemartin/ 30

The 3:00 AM Principle It must be obvious to someone woken up from a sound sleep at 3:00 am. http://www.flickr.com/people/joi/ 31

The Don t Break The OS Principle The software should NOT prevent the OS from working as expected. http://www.flickr.com/photos/philmanker/ 32

The Amnesia Principle Given enough time, you WILL forget why you did that. http://www.flickr.com/people/zach_a/ 33

Tumblr Management Stack ipxe Invisible Touch Collins Phil Kickstart Puppet 34

Why not pxelinux? http://www.flickr.com/people/digiart2001/ 35

Why not pxelinux? TFTP Flat files http://www.flickr.com/people/digiart2001/ 36

ipxe http://www.flickr.com/people/chiotsrun/ 37

ipxe HTTP, FTP, iscsi Scriptable Variables Dynamic http://www.flickr.com/people/chiotsrun/ 38

ISC DHCP For ipxe # subnet for the provisioning vlan subnet <%= subnet %> netmask <%= netmask %> { option domain-name "<%= option_domain_name %>"; option routers <%= option_routers %>; option domain-name-servers <%= option_dns_servers.map{ i "#{i}"}.join(", ") -%>; option subnet-mask <%= option_subnet_mask %>; default-lease-time 21600; max-lease-time 43200; range <%= range_start %> <%= range_end %>; # If a pxe request comes in from ipxe send the config url if exists user-class and option user-class = "ipxe" { filename "<%= ipxe_config_url %>"; # http://foo.example.com/ipxe/${net0/mac} # For all other pxe requests send ipxe } else { next-server <%= next_server %>; # tftp server filename "<%= filename %>"; # path to ipxe binary on tftp server } } } 39

Fedora LiveCD Tools lang en_us.utf-8 keyboard us timezone US/Eastern auth --useshadow --enablemd5 selinux --enforcing firewall --disabled repo --name=centos --baseurl=http://127.0.0.1/pub/repo/centos/os/6.2 repo --name=infra --baseurl=http://127.0.0.1/pub/repo/infra/6.2 repo --name=epel --baseurl=http://127.0.0.1/repo/epel/6/x86_64/ %packages --excludedocs @core dracut dracut-kernel device-mapper device-mapper-event %end 40

Invisible Touch Kickstart # Invisible Touch Live OS image %include centos-6.2-livecd-minimal.ks %packages --excludedocs it %end %post cat > /etc/issue <<EoF Invisible Touch Live OS v0.0.4 Kernel \r EoF # set ipmi to start at boot up /sbin/chkconfig ipmi on # configure rsyslog cat >> /etc/rsyslog.conf <<EoF # invisible touch local0.* local0.* EoF %end /var/log/it.log /dev/tty7 41

Invisible Touch Utilities lshw lldpd Breakin ipmitool Bash scripts 42

lshw <node id="disk:1" claimed="true" class="disk" handle="scsi:04:00:01:00"> <description>ata Disk</description> <product>st91000640ns</product> <vendor>seagate</vendor> <physid>0.1.0</physid> <businfo>scsi@4:0.1.0</businfo> <logicalname>/dev/sdf</logicalname> <dev>8:80</dev> <version>n/a</version> <serial>9xg0etb8</serial> <size units="bytes">1000204886016</size> <configuration> <setting id="ansiversion" value="5" /> <setting id="signature" value="000e1763" /> </configuration> <capabilities> <capability id="partitioned">partitioned disk</capability> <capability id="partitioned:dos">ms-dos partition table</capability> </capabilities> </node> lshw generates hardware info XML 43

lldpd <interface label="interface" name="eth0" via="lldp" rid="1" age="0 day, 00:01:03"> <chassis label="chassis"> <id label="chassisid" type="mac">78:19:f7:88:60:c0</id> <name label="sysname">core01.dfw01</name> <descr label="sysdescr">juniper Networks, Inc. ex4500-40f</descr> <capability label="capability" type="bridge" enabled="on" /> <capability label="capability" type="router" enabled="on" /> </chassis> <port label="port"> <id label="portid" type="local">608</id> <descr label="portdescr">ge-0/0/3.0</descr> <mfs label="mfs">1514</mfs> <auto-negotiation label="pmd autoneg" supported="no" enabled="yes"> <advertised label="adv" type="10base-t" hd="no" fd="yes" /> <current label="mau oper type">unknown</current> </auto-negotiation> </port> <vlan label="vlan" vlan-id="666" pvid="yes">dfw01-provisioning</vlan> <lldp-med label="lldp-med"> <device-type label="device Type">Network Connectivity Device</device-type> <capability label="capability" type="capabilities" /> </lldp-med> </interface> lldpctl outputs network info in XML 44

Breakin Stress testing framework 45

Breakin Standard tools LINPACK Extensible Bash scripts 46

Invisible Touch Firmware Configure BIOS Set up BMC Inventory Stress testing Network config 47

Collins Asset management system in Scala REST API Client libraries in Ruby, Python and Bash Shell tool for scripting and automation Callback system for hooking into events Granular permissions model Flexible web and API based provisioning Remote power management IP Address allocation and management Distributed mode for spanning data centers 48

Collins Docs 49

Collins Search 50

Collins Asset Details 51

Collins Provisioning 52

Phil ipxe dispatcher Kickstart generator Light Ruby app Collins API client 53

Server Intake Workflow 1. Rack and stack 2. Power on 3. Enter physical data 54

Server Intake Process 1. Server boots ipxe via DHCP/PXE 2. ipxe gets config from Phil 3. Phil sends Invisible Touch 4. IT updates firmware (if needed) 5. IT configures BIOS 6. IT configures BMC 7. IT uploads inventory data to Collins 8. IT starts stress tests 9. IT powers down server 55

Provisioning Workflow 1. Search Collins 2. Choose Profile, Role, Pool 3. Click button 56

Provisioning Process 1. Server boots ipxe via DHCP/PXE 2. ipxe gets config from Phil 3. Phil sends install image 4. Install image gets Kickstart from Phil 5. Install runs Puppet in %post 6. End of %post calls back to Collins 7. Collins triggers vlan update 8. Collins triggers monitoring/trending 9. Added to production if all green 57

Result Fast, scalable, no hassle provisioning! http://www.flickr.com/people/mc4army/ 58

Hurdles http://www.flickr.com/people/ligynnek/ 59

Hurdles PXE kickstart w/ multiple NICs Network set up in %post Virident SSD set up in %post http://www.flickr.com/people/ligynnek/ 60

PXE Kickstart / Multiple NICs Phil ipxe config initrd <%= os_install_url %>/images/initrd.img kernel <%= os_install_url %>/images/vmlinuz ip=dhcp ksdevice=${mac} Phil kickstart snippet # network network --bootproto=dhcp 61

%post Network Set Up Phil kickstart snippet # Bond Interface: <%= bond.name %> cat > /etc/sysconfig/network-scripts/ifcfg-<%= bond.name %> <<EoF DEVICE=<%= bond.name %> BONDING_OPTS="<%= bond.options %>" BOOTPROTO=static IPADDR=<%= bond.address %> NETMASK=<%= bond.netmask %> GATEWAY=<%= bond.gateway %> EoF 62

%post Virident SSD Set Up # Start the virident daemon /etc/init.d/vgcd start # create a device node mknod /dev/vgca0 b 252 0 # create a mount point mkdir -p /var/lib/mysql # create partitions parted -s /dev/vgca0 mklabel msdos parted -s /dev/vgca0 unit s mkpart primary ext2 2048 100% # make another device node mknod /dev/vgca0p1 b 252 1 # make the filesystem /sbin/mkfs.xfs -f -d su=64k,sw=3 -l size=32m,su=16k /dev/vgca0p1 # create fstab entry echo "/dev/vgca0p1 /var/lib/mysql xfs noauto 0 0" >>/etc/fstab # create virident config cat > /etc/sysconfig/vgcd.conf << EoF RESCAN_MD=1 RESCAN_LVM=1 MOUNT_POINTS="/var/lib/mysql" RESCAN_MOUNT=1 EoF # mount the virident mount /var/lib/mysql 63

Lessons Learned Modularity is very important Hardware always has issues at scale Use modern Bash syntax 4 hour burn-in is not enough 64

Yes, we re hiring! Joshua Hoffman joshua@tumblr.com tumblr.com/jobs 65

66