@jenkinsconf. Maintaining huge Jenkins clusters - Have we reached the limit of Jenkins?



Similar documents
Practicing Continuous Delivery using Hudson. Winston Prakash Oracle Corporation

Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH

SUCCESFUL TESTING THE CONTINUOUS DELIVERY PROCESS

Mastering Continuous Integration with Jenkins

Lab 0 (Setting up your Development Environment) Week 1

Developer Workshop Marc Dumontier McMaster/OSCAR-EMR

Continuous Integration and Bamboo. Ryan Cutter CSCI Spring Semester

Continuous Integration (CI) for Mobile Applications

Building, testing and deploying mobile apps with Jenkins & friends

Continuous Integration: Put it at the heart of your development

How Liferay Is Improving Quality Using Hundreds of Jenkins Servers

November 12 th 13 th London: Mastering Continuous Integration with Jenkins

Continuous Integration and Delivery at NSIDC

Continuous Integration. CSC 440: Software Engineering Slide #1

Version Uncontrolled! : How to Manage Your Version Control

SOFTWARE DEVELOPMENT BASICS SED

Massively! Continuous Integration! A case study for Jenkins at cloud-scale

Copyrighted , Address :- EH1-Infotech, SCF 69, Top Floor, Phase 3B-2, Sector 60, Mohali (Chandigarh),

StriderCD Book. Release 1.4. Niall O Higgins

Continuous Integration: Improving Software Quality and Reducing Risk. Preetam Palwe Aftek Limited

Continuous Integration. Wellcome Trust Centre for Gene Regulation & Expression College of Life Sciences, University of Dundee Dundee, Scotland, UK

Continuous Integration and Automatic Testing for the FLUKA release using Jenkins (and Docker)

Continuous Integration Optimizing Your Release Management Process

1.1 SERVICE DESCRIPTION

Continuous Delivery on AWS. Version 1.0 DO NOT DISTRIBUTE

Building, Testing & Deploying Android Apps with Jenkins

TestOps: Continuous Integration when infrastructure is the product. Barry Jaspan Senior Architect, Acquia Inc.

Jenkins User Conference Herzelia, July #jenkinsconf. Testing a Large Support Matrix Using Jenkins. Amir Kibbar HP

DevOps Best Practices for Mobile Apps. Sanjeev Sharma IBM Software Group

DevOps Course Content

Successful PaaS and CI in the Cloud

Mobile Development with Git, Gerrit & Jenkins

Version Control Your Jenkins Jobs with Jenkins Job Builder

Software configuration management

Build Automation for Mobile. or How to Deliver Quality Apps Continuously. Angelo Rüggeberg

Surround SCM Best Practices

Jenkins Continuous Build System. Jesse Bowes CSCI-5828 Spring 2012

SUCCESFUL TESTING THE CONTINUOUS DELIVERY PROCESS

Automated Performance Testing of Desktop Applications

WA2102 Web Application Programming with Java EE 6 - WebSphere RAD 8.5. Classroom Setup Guide. Web Age Solutions Inc. Web Age Solutions Inc.

Continuous Integration: A case study

VIPERVAULT STORAGECRAFT SHADOWPROTECT SETUP GUIDE

Agile Austin Dev SIG. June Continuous Integration (CI)

CSE 70: Software Development Pipeline Version Control with Subversion, Continuous Integration with Bamboo, Issue Tracking with Jira

Course Agenda: Managing Active Directory with NetIQ Directory and Resource Administrator and NetIQ Exchange Administrator

Actualtests.C questions

Software Development In the Cloud Cloud management and ALM

Implementing Continuous Integration Testing Prepared by:

ALERT installation setup

Jenkins: The Definitive Guide

2012 Nolio Ltd. All rights reserved

Yocto Project Eclipse plug-in and Developer Tools Hands-on Lab

Our Puppet Story. Martin Schütte. May

ACCELERATE DEVOPS USING OPENSHIFT PAAS

Pipeline Orchestration for Test Automation using Extended Buildbot Architecture

WA1791 Designing and Developing Secure Web Services. Classroom Setup Guide. Web Age Solutions Inc. Web Age Solutions Inc. 1

Jenkins and Chef Infrastructure CI and Application Deployment

Installation Guide for WebSphere Application Server (WAS) and its Fix Packs on AIX V5.3L

Beginners guide to continuous integration. Gilles QUERRET Riverside Software

CISC 275: Introduction to Software Engineering. Lab 5: Introduction to Revision Control with. Charlie Greenbacker University of Delaware Fall 2011

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

Content. Development Tools 2(63)

EXPLORING LINUX KERNEL: THE EASY WAY!

OS X Modular Imaging and Deployment using Free and Open Source Tools

Lab Exercise Part II: Git: A distributed version control system

WA1916 WebSphere ESB 7.0 Programming Using WID. Classroom Setup Guide. Web Age Solutions Inc. Copyright 2011 Web Age Solutions Inc.

Escaping the Works-On-My-Machine badge Continuous Integration with PDE Build and Git

Azure Day Application Development

Installing and Administering VMware vsphere Update Manager

using version control in system administration

II. Installing Debian Linux:

WHITE PAPER. Ready, Set, Go Upgrade to ERP Cloud Release 10

Troubleshooting. Palo Alto Networks. Panorama Administrator s Guide Version 6.0. Copyright Palo Alto Networks

The Deployment Production Line

Jenkins World Tour 2015 Santa Clara, CA, September 2-3

Software Configuration Management (SCM)

Nexus Professional Whitepaper. Repository Management: Stages of Adoption

Continuous Integration and Delivery. manage development build deploy / release

Big Data with Component Based Software

Automated build service to facilitate Continuous Delivery

Installing and Configuring vcenter Support Assistant

GitLab as an Alternative Development Platform for Github.com

Continuous integration End of the big bang integration era

Department of Veterans Affairs. Open Source Electronic Health Record (EHR) Services

Paul Barham Program Manager - Java. David Staheli (dastahel@microsoft.com) Software Development Manager - Java

Solution for private cloud computing

owncloud Configuration and Usage Guide

Deployment Guide: Unidesk and Hyper- V

Type-C Ubuntu Product & Strategy Canonical Ltd.

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

Comodo One Software Version 1.8

Continuous Integration using Docker & Jenkins

What is new in Switch 12

Overview Customer Login Main Page VM Management Creation... 4 Editing a Virtual Machine... 6

SAP Web IDE Hybrid App Toolkit Add-on

Interworks. Interworks Cloud Platform Installation Guide

VMware vsphere Data Protection 6.0

This Deployment Guide is intended for administrators in charge of planning, implementing and

Delivering Quality Software with Continuous Integration

Deploying and updating VMware vsphere 5.0 on HP ProLiant Servers

Transcription:

Maintaining huge Jenkins clusters - Have we reached the limit of Jenkins? Robert Sandell Sony Mobile Communications www.sonymobile.com www.rsandell.com @jenkinsconf 1

TOC! How We Work! Jenkins & topography setup! What can our users do! Automated maintenance! Future plans 2

About Me Robert Bobby Sandell a.k.a. rsandell I work in the Development Environment section within the Software R&D division at Sony Mobile. We are forging the tools used by the Developers e.g. SCM, IDE, emulator, compilers, build, CI and some verification tools. My current role is Tool Manager** for Jenkins as a member of the Build & CI Team. Maintainer of several Jenkins plugins e.g. Trigger, Build Failure Analyzer and Multi Slave Config. ** TM is management speak for: Besides coding I also prioritize and manage the backlog and act as ambassador to the users. 3

The Tool chain Jenkins is over here Mostly OSS OSS based homebrew 4

How We Work The tale of adding a small feature A change needs to get votes before it can be merged; Verified +1 and Code Review +2. Developer makes changes Jenkins is the only user in allowed to set Verified +1 Only after the change has been automatically built (on real devices or emulators), tested, analyzed and accepted, the change is Verified +1 Pretty much every GIT needs to have a corresponding Jenkins build project To help us enable this workflow we created the Trigger Plug-in for Jenkins Verified -1 Qualified +1 Jenkins Change uploaded to Peer Review Verified +1 Code Review +1 Sub Architect review Code Review +2 Submit Code Review -1 Code Review -1 5

Some typical builds! Besides our build farm most teams have one or more Jenkins slaves located near their desks, so called Team Slaves. The slaves have physical phones connected to them via USB.! The teams have Jenkins jobs configured for periodic builds, which Flash the latest Android Platform onto the phone Build the application from HEAD of the branch Install it on the phone Run several tests/static analysis on it! When builds are triggered from, teams can choose to run the build on their team slave(s) (testing using a real device) or on a build farm machine (testing using an emulator).! The System team also do full Android platform builds on every commit for their own integration branches.! We also do Official builds Full build of the Android platform Labeled and uploaded to the binary repository Periodically triggered several times per day Multi configuration project All phone configurations and variants for current release branch A successful official build will trigger a chain of other builds that perform automatic smoke tests in the verification lab. 6

Jenkins Masters Jenkins Regular @ SELD 24 Cores 64GB RAM 6TB Storage 300 Slaves (70 TEAM) 99 Plugins 5k Jobs 6k builds/day 350k tests/day Jenkins Regular @ CNBJ 16 Cores 48GB RAM 5TB Storage 180 Slaves (130 TEAM) 82 Plugins 3k Jobs 1.7k builds/day 20k tests/day Jenkins Platform @ CNBJ 24 Cores 64GB RAM 5TB Storage 180 Slaves 65 Plugins 2k Jobs 2k builds/day 0** tests/day Jenkins Platform @ SELD 24 Cores 64GB RAM 8TB Storage 200 Slaves (40 TEAM) 70 Plugins 2k Jobs 2.5k builds/day 0 tests**/day Jenkins CM @ SELD Jenkins Regular @ JPTO Jenkins CM @ JPTO Totals: 16K Builds/day 600K Tests/day 32 Cores 128GB RAM 3TB Storage 70 Slaves 70 Plugins 400 Jobs 1.5k builds/day 24 Cores 48GB RAM 10TB Storage 300 Slaves (190 TEAM) 90 Plugins 7k Jobs 3.5k builds/day 230k tests/day 24 Cores 48GB RAM 6TB Storage 260 Slaves 106 Plugins 2k Jobs 7.5k builds/day 1500 (Non virtual) Slaves All of them acts local reference mirrors (updated and repacked daily)! Around 25-30% of all builds are triggered by ** There are more tests executed, but the count is based on recorded test results by Jenkins 7

Jenkins/ Setup Review seld01 seld02 seld03 seld04 jpto01 jpto02 jpto03 cnbj-dev cnbj01 seld load balancer jpto load balancer Jenkins Slaves Jenkins Master Jenkins Slaves Jenkins Master Jenkins Slaves Jenkins Master Servers: Ubuntu 10.04 64 Server Slaves: Ubuntu 12.04 64 Workstation 8

What can users do If they can login (and everyone can), they are allowed to: Create and configure jobs Delete any job or build Run any build Basically do anything except administer But certain privileged/power users can also administer Mostly Freestyle or Matrix jobs Lots of bash and python scripts. Applications are mostly built using an in-house developed plug-in, wrapping the functionality our in-house application build system. We provide job templates and a user guide Deviations occur quickly Users usually end up copying a previous team job But many teams have one or two power users that helps explaining and setting up builds We encourage users to keep to our project naming standard This way we can sort various projects in different views depending on project, branch, etc. Helps with the collection of Metrics Is the build triggered by? Periodically? Experimental? 9

Maintaining the clusters Split responsibility between IT, application managers and our team The IT guys takes care of everything up to the OS level on the slaves and servers. The AM team takes care of the every day Jenkins work and first line Jenkins/Git/ support We take second line support, larger development work and try to plan ahead. Keep them cool and connected Replace/upgrade hardware OS Patching Etc. User Support Manage Jenkins Analyze and fix transient build errors Etc. Plan Jenkins upgrades Manage/develop plug-ins Handle user requests Educate users Engage with the community Etc. 10

Automated Maintenance Problem #1: Disk Space! Disk space on the slaves is between 1 and 2 TB! Some pimped slaves with 240GB SSD! A full Android build is ~50GB! The Android SDK is ~1.5GB! Each slave has a git reference mirror of about 100GB Cron script on Master Running once a day Periodic Jenkins job on Master Running every hour or 30 min while free_disk_space < 20% removeoldestartifact() We can keep artifacts stored for about 2 weeks sometimes 2 days depending on load. foreach slave ssh-connect and check disk-space if < 80GB remove workspace for all non-building jobs if nothing is building rm rf /tmp/* A workspace might be available for a few hours after the build completes. 11

Automated Maintenance Keeping slaves up to date without disturbing builds! CF Engine and Puppet can t handle the load.! Small tools and maintenance scripts needs to be distributed.! Don t want a lib updated in the middle of an official build. Update local git ref mirror every evening: Launching a slave: rsynch the needful launch slave.jar Update packages every morning: Put the slave temporary offline Wait for builds to finish list = gerrit ls-projects grep f white-list foreach pos in list git clone -bare -git-dir ~/.repo-mirror/$pos Jobs can then clone with --reference $REPO_MIRROR Run cf-agent and apt-get upgrade If reboot needed Force fsck on next reboot Reboot Mark the slave as online 12

Automated Maintenance Surveying and measuring Slave sanity ----- SLAVE: Check that there is >= 50 GB free disk space on. ----- ----- SLAVE: Check that there is >= 1 GB free disk space on /tmp ----- ----- SLAVE: Check that there is 1 instance of slave.jar ----- ----- MASTER: Check that there is 1 connection to each slave ----- ----- SLAVE: Check that the.ssh directory is correct ----- ----- SLAVE: Check that slave home dir doesn't contain a.repo directory ----- SLAVE: Check that the hosts are pingable ----- ----- SLAVE: Check that /tmp isn't readonly ----- Package sanity Check versions of vital packages like: Semc-build Python signtool Periodic counting with the plot plugin 13

How to install a plugin 1. Check the source code Any global hooks? RunListener, ItemListener, QueueSorter, PageDecorator etc. Any big iterations? getitems(), getcomputers(), getbuilds() etc. General feeling about the code Complexity, Unit tests, does it look like crap? i.e. how hard would it be if we need to fix a bug? 2. Play with it in a local instance 3. Install on test server(s) 4. Install on one of the production servers 5. Deploy world wide 6. Still keep a paranoid eye on it for another week or so. 14

Plugins worth mentioning for a maintainer! Multi Slave Config Plugin For handling multiple slaves in one click! Sectioned View Plugin For news! Scriptler For debug and quick fixes! Job Config History To find someone to blame! Disk Usage Plugin To find someone else to blame! Build Failure Analyzer So we don t have to find the same issue again 15

Bumps in the road! Politics on Jenkins as integration blocker! Hudson 1.377 to Jenkins 1.404 Remoting API Sell-in of Jenkins fork to management! Regressions in plug-ins The CVS plugin is staying at version 1.6!! Performance NFS -> SAN 40min startup to 10min Ext4 vs. XFS 40min startup to 20min Spring cleaning 1h 40min startup to 1h 10min Queue synchronization Why is the UI so slow at lunch time? 1.447.2 -> 1.480.2 1h 40min startup to 15min Nothing in the change log about this except Misc performance improvements in 1.452!? How to test lazy loading penalty/gain in 1.509? 16

Future! Metadata How to find something on such an open master.! Templates and pipelines! Security! Move to the Cloud? Or just stick our heads up there every once in a while?! Increasing the cluster size? Do we need to? 17

Thank You To Our Sponsors Platinum Gold Silver 18

19