Continuous Integration Using Cruise Control. Presented by Tom Grant, PlatinumSolutions, Inc. Tuesday, May 24th, 2005. Northern VA Java User's Group, Core SIG. 2005 PlatinumSolutions, Inc. The Freedom to Achieve Independent, Open Solutions for Business and Government
What does it cost? Free, Open Source Software. One-time costs: $500 for a dedicated build machine; 4 hours of configuration time for a first-timer, 2 hours for an experienced Cruise Control user. Recurring costs: 20 minutes to set up a new project; electricity; disk space.
Continuous Testing in an Agile Environment Nan Krull Manager, Software Quality 2006 Macrovision Corporation Company Confidential
Assessing Quality: 83%* of bugs originate in the Requirements and Design stages of a project. Approximately half of software development effort is spent on testing. [Chart: Project Development Costs**; categories: Development, Prevent Bugs, Find/Fix Bugs, Release Bugs; values shown: 40%, 50%, 7%, 3%] *Quality Assurance Institute, 2006 **Economics of Continuous Testing, Stephen A. Bender, The Quality Connection, 2006
AdminStudio Improved Product Quality. [Chart: % of Critical Failures in System Test; x-axis: 6.0, 7.0, 7.5]
AdminStudio Improved Product Quality. [Chart: Customer-Reported Critical/Major Bugs, 30 days post-release; x-axis: 6.0, 7.0, 7.5]
Results: higher quality product; lower risk of releasing defects; lower defect handling costs; earlier flagging of issues which impact the schedule. Better Quality = Satisfied Customers, Increased Demand. Everybody Wins!
Accelerate Software Delivery with Continuous Integration and Testing. JaSST'08 Tokyo. Jeffrey Fredrick, jtf@agitar.com. Agitar Software, 2009
Model of Finding Bugs Manually. Simple assumption: 1. Testing a feature takes 2 min. 2. Investigating and reporting a bug takes 10 min. Setup, bug investigation, and reporting take time away from test design and execution. Simple conclusion: in a 90 min session, we can run 45 feature tests as long as we don't find any bugs... (example from Michael Bolton, DevelopSense.com)
Finding Bugs Manually Costs Time. 90-minute testing session, 45 feature tests. Number of bugs found / time spent reporting bugs (minutes) / increase in testing time: 0 / 0 / 0; 1 / 10 / 11%; 9 / 90 / 100%. And what about other costs of those bugs? The engineering time to investigate; the time to retest; the bugs hidden behind bugs. [Chart: time reporting and increase in testing vs. number of bugs found]
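The arithmetic behind the table can be reproduced with a short sketch. The three constants come from the slides; the function name `reporting_overhead` is only an illustrative choice, and the "increase in testing time" is assumed to be reporting time divided by the session length, which matches the 11% and 100% figures shown.

```python
SESSION_MIN = 90   # length of the testing session (from the slide)
TEST_MIN = 2       # minutes to test one feature (from the slide)
BUG_MIN = 10       # minutes to investigate and report one bug (from the slide)


def reporting_overhead(bugs_found):
    """Return (minutes spent reporting, percent increase in testing time)."""
    reporting = bugs_found * BUG_MIN
    # Assumption: the increase is measured relative to the 90-minute session.
    increase_pct = round(100 * reporting / SESSION_MIN)
    return reporting, increase_pct


# With no bugs found, the whole session goes to feature tests.
baseline_tests = SESSION_MIN // TEST_MIN

for bugs in (0, 1, 9):
    minutes, pct = reporting_overhead(bugs)
    print(f"{bugs} bugs: {minutes} min reporting, +{pct}% testing time")
```

Nine bugs already double the time needed to cover the same 45 feature tests, before counting the engineering and retest costs the slide lists.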
Use Multiple Builds to Minimize Feedback Cycles. Quick build: compiles + fast tests, feedback < 10 minutes. System tests: feedback < 1 hour. Nightly tests (official build).
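One way to picture the tiers is as plain data with a feedback budget attached to each. This is a minimal sketch, not CruiseControl configuration: the `BuildTier` type, the tier names, and the `tiers_within` helper are hypothetical, and the nightly tier carries no budget because the slide states none.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BuildTier:
    name: str
    steps: Tuple[str, ...]
    feedback_budget_min: Optional[int]  # None: the slide states no target


# Tiers as listed on the slide; budgets are the stated upper bounds.
TIERS = (
    BuildTier("quick build", ("compile", "fast tests"), 10),
    BuildTier("system tests", ("system tests",), 60),
    BuildTier("nightly", ("nightly tests", "official build"), None),
)


def tiers_within(deadline_min):
    """Names of tiers whose stated feedback budget fits the given deadline."""
    return [
        tier.name
        for tier in TIERS
        if tier.feedback_budget_min is not None
        and tier.feedback_budget_min <= deadline_min
    ]


print(tiers_within(15))  # only the quick build answers this fast
```

Keeping slow tests out of the quick tier is what keeps its feedback under the ten-minute budget.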
Agitar Makes Extensive Use of Multiple Build Cycles. 14 servers dedicated to CI: 8 instances of CruiseControl, 6 machines for distributed agitation. 106 builds defined. Average 7 builds per day per module. Coverage on multiple branches and operating systems. Continuous releases to QA. No special test builds required.
CruiseControl Report for Build Metrics
Management Email Gives Visibility into High Level Trends
Impact of Continuous Integration in the Literature: 36% reduction in defect rate when integration/regression testing at each code check-in. "Trade-offs between Productivity and Quality in Selecting Software Development Practices", IEEE Software, Sept-Oct 2003.
Impact of Continuous Integration with Agitar Customers: 90% reduction in bugs reaching QA (major municipal gas utility); 95% cut in cost of bugs (large retail web site); 90% cut in defect remediation cost (global supplier of healthcare equipment).
Impact of Continuous Integration at Agitar Software: faster time-to-market; more features and higher quality; agility in the marketplace (added new functionality 2 weeks before ship, shipped 1 week early); confidence in the process, "Oozing Confidence" (Mike Clark).
Development at the Speed and Scale of Google. Ashish Kumar, Engineering Tools
Speed and Scale of Google: more than 5000 developers in more than 40 offices; more than 2000 projects under active development; more than 50000 builds per day on average; more than 100 million test cases run per day; 20+ code changes per minute; 50% of the code changes every month; single monolithic code tree with mixed language code; development on head; all releases from source.
Single monolithic code tree... Develop at head. Build everything from source. Extensive automated tests running at each changelist. Need strong enforcement of coding style and guidelines. Can make changes to kernel, Gmail and Buzz in the same changelist. Complex dependency graph across products and libraries.
Estimating build tools savings, 2008 to 2009. Rough use case estimates. [Chart: estimated time waiting on build tools] Estimated savings: ~600 person years.
Source code at scale... How to allow 1000s of engineers to sync source code on a single tree with massive dependencies? A full checkout would take tens of minutes: would easily choke any corporate network; other companies create developer branches per feature. Developers change < 10% of code they actually check out: builds and tests often need the rest of the code to run; deliver the rest of the code as a read-only copy, on demand; implemented as a FUSE-based file system that tracks changes to main source depot and caches aggressively.
Keeping the code tree consistent. Mandatory code reviews with central tool: need code readability for languages (enforces style guide); need owners for code sub-tree that maintain consistency and correctness; higher code transparency and code contributions across teams. Reduce code review costs, provide lots of signals to reviewers: lint errors; code analysis and build warnings / errors; code coverage data; test results; easy, web-based access (full graphical diffs available, easy to add comments); future: integrate with IDEs.
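The on-demand, read-only delivery described above can be illustrated with a toy overlay. This is a sketch of the idea only, not the FUSE implementation: the `OverlayTree` class, its method names, and the `fetch_from_depot` callable are all hypothetical.

```python
class OverlayTree:
    """Toy model: local edits shadow a read-only depot that is fetched lazily."""

    def __init__(self, fetch_from_depot):
        self._fetch = fetch_from_depot  # hypothetical callable: path -> bytes
        self._local = {}                # the small share of files being edited
        self._cache = {}                # read-only copies fetched on demand
        self.depot_fetches = 0          # counts round trips to the depot

    def write(self, path, data):
        """Record a local modification; it shadows the depot copy."""
        self._local[path] = data

    def read(self, path):
        """Serve local edits first, then the cache, then the depot."""
        if path in self._local:
            return self._local[path]
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
            self.depot_fetches += 1
        return self._cache[path]

    def invalidate(self, path):
        """Drop a cached copy after the depot reports a change to the file."""
        self._cache.pop(path, None)


depot = {"lib/util.py": b"v1", "app/main.py": b"main"}
tree = OverlayTree(depot.__getitem__)
tree.read("lib/util.py")
tree.read("lib/util.py")            # second read is served from the cache
tree.write("app/main.py", b"edited")
```

Only files that a build or test actually reads ever cross the network, which is the point of not doing a full checkout.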
Keep code reviews efficient Code review breakdown for one package
Code Review turnaround by size
Measure the tool itself Box-plots for the Code Review tool latencies
Build Systems require strong CS skills. Deal with massive scale: 20 Million+ builds per year. Massive distributed execution: more than 10000 cores using > 50TB of memory; ~1 PB 7-day cached object output.
Durable metrics. Remember this? Mostly flat between 2009 and 2010: files for each (measured) target grew by 54% to 191%; doing significantly more work in the same time. Needed durable metrics across time; bucket builds by: count of discrete actions and inputs; office; incrementality; ...
Builds by incrementality Many builds are clean, but most are in the 90-100% incrementality range!
Builds by action size Most builds are small, but long tail (mostly by our own automated systems)
Clean Build times
Build times by office
Action Cache
How much did we save?
Object caching wins. Statistics from a single day: ~500M build actions; 94% action cache hit rate; 30M cache misses; 800 CPU days (just build and test); 66% of actions from automated builds.
Continuous Integration at Scale: 120K test suites in the code base; run 7.5M test suites per day; 120M individual test cases / day and growing; 1800+ continuous integration builds. Mountains of data == opportunity for data mining and research.
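An action cache can be sketched as a map from a digest of the command and its inputs to the stored output. The slides do not describe how the real cache is keyed, so the SHA-256 key, the `ActionCache` class, and the `execute` callback are assumptions made for illustration.

```python
import hashlib


class ActionCache:
    """Toy action cache: identical command + inputs reuse the stored output."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command, inputs):
        """Digest the command line and every input file's name and contents."""
        digest = hashlib.sha256()
        digest.update(command.encode())
        for name in sorted(inputs):  # sort so dict ordering cannot change the key
            digest.update(b"\0" + name.encode() + b"\0" + inputs[name])
        return digest.hexdigest()

    def run(self, command, inputs, execute):
        """Return the cached output, or execute the action and store the result."""
        k = self.key(command, inputs)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = execute(command, inputs)
        return self._store[k]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Sanity check on the slide's numbers: 6% of ~500M actions miss the cache.
misses_per_day = 500_000_000 * (100 - 94) // 100
print(misses_per_day)
```

The slide's figures are consistent with each other: a 94% hit rate on roughly 500M actions leaves about 30M misses to execute.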
Continuous Integration to DevOps: Co-Evolution of Agile and Automation. Jeffrey Fredrick, Technical Evangelist, urbancode.com
CI improves productivity and quality: [percentage illegible] rise in LOC output/programmer when performing builds at least daily; 36% reduction in defect rate when integration/regression testing at each code check-in. "Trade-offs between Productivity and Quality in Selecting Software Development Practices", IEEE Software, Sept-Oct 2003.
Continuous Integration on a Dollar a Day
Doing the impossible 50 times a day
Case Study - Continuous Deployment at Outbrain. Itai Hochman, VP Engineering at Outbrain
Fun Numbers: 5-50 production changes a day!!! More than 5000 unit tests running in less than 5 minutes. More than 8000 regression tests running every hour. It takes ~15 minutes from code complete to ~50 machines deployed.
Continuous Integration at Google Scale By John Micco
Speed and Scale: 15000+ developers in 40+ offices; 4000+ projects under active development; 5500+ submissions per day on average; single monolithic code tree with mixed language code; development on one branch, submissions at head; all builds from source; 20+ sustained code changes per minute with 60+ peaks; 50% of code changes monthly; 75+ million test cases run per day. Google Confidential and Proprietary
Benefits: identifies failures sooner; identifies culprit change precisely; avoids divide-and-conquer and tribal knowledge; lowers compute costs using fine grained dependencies; keeps the build green by reducing time to fix breaks; accepted enthusiastically by product teams; enables teams to ship with fast iteration times; supports submit-to-production times of less than 36 hours for some projects.
Costs: requires enormous investment in compute resources (it helps to be at Google), which grows in proportion to: submission rate; average test time; variants (debug, opt, valgrind, etc.); increasing dependencies on core libraries; branches. Requires updating dependencies on each change: takes time to update, which delays start of testing.
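Using fine grained dependencies to limit test runs amounts to walking the dependency graph backwards from whatever changed. The sketch below is one plausible reading of that idea, not Google's system: the `affected_tests` function and the target names are invented for illustration.

```python
from collections import deque


def affected_tests(changed_targets, deps, tests):
    """Return the tests that transitively depend on any changed target.

    deps maps each target to the set of targets it depends on directly.
    """
    # Invert the edges so we can walk from a target to its dependents.
    dependents = {}
    for target, needs in deps.items():
        for need in needs:
            dependents.setdefault(need, set()).add(target)

    # Breadth-first walk over everything reachable from the change.
    seen = set(changed_targets)
    queue = deque(changed_targets)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen & set(tests))


# Hypothetical targets: app -> lib -> core, each with its own test.
DEPS = {
    "app": {"lib"},
    "lib": {"core"},
    "app_test": {"app"},
    "lib_test": {"lib"},
    "core_test": {"core"},
    "other_test": {"other"},
}
TESTS = {"app_test", "lib_test", "core_test", "other_test"}

print(affected_tests(["lib"], DEPS, TESTS))
```

A change to `core` in this toy graph selects three of the four tests, which hints at why low-level changes are expensive to verify.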
Test Growth. Conclusions: quadratic execution time growth; ultimately cannot run every affected test @ every change; no incentive for teams to optimize use of shared test resources; engineering time is expensive! Future plans: Enforce execution quotas for teams? Provide incentive for teams to optimize resources. Smarter scheduling? Periodic green builds; culprit finding; flake hiding.
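The slide does not say why growth is quadratic. One simple explanation, offered here purely as an assumption, is that submissions per day and tests affected per submission each grow roughly linearly, so their product grows with the square of time. All numbers below are made up for illustration.

```python
def daily_test_runs(month, base_submissions=100, submission_growth=10,
                    base_tests=50, test_growth=5):
    """Test executions per day if both factors grow linearly (assumed model)."""
    submissions = base_submissions + submission_growth * month
    tests_per_submission = base_tests + test_growth * month
    return submissions * tests_per_submission


runs = [daily_test_runs(m) for m in range(5)]
first_diff = [b - a for a, b in zip(runs, runs[1:])]
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]

# A constant second difference is the signature of quadratic growth.
print(runs, second_diff)
```

Under this model, no linear increase in compute keeps up, which is consistent with the slide's conclusion that not every affected test can run at every change.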
Large Scale Test Automation in the Cloud. John Penix, Developer Infrastructure
Speed and Scale of Google: more than 10000 developers in more than 40 offices; more than 2000 projects under active development; more than 50000 builds per day on average; single monolithic code tree with mixed language code; development on head; all releases from source; 20+ code changes per minute; 50% of code changes monthly; more than 50 million test cases run per day.
Source code at scale... How to allow 1000s of engineers to sync source code on a single tree with massive dependencies? A full checkout would take tens of minutes. Would easily choke any corporate network. Other companies create developer branches per feature. Insights: developers change < 10% of code they check out; builds and tests often need the rest of the code to run. Solution: deliver the rest of the code as a read-only copy, on demand. Implemented as a FUSE-based file system that tracks changes to main source depot and caches aggressively.
Building at Scale. Deal with massive scale: 20 Million+ builds per year. Massive distributed execution: more than 10000 cores using > 50TB of memory.
Building in the cloud has costs... Large builds have large outputs. Corp-Cloud network is not as efficient as Cloud-Cloud. Solution: don't send the build outputs to the workstation until they are actually needed or read. Implemented as a FUSE-based file system. Aggressive caching for build outputs. Cache test results too! ~1 PB 7-day cached object output.
Continuous Integration at Scale: 200K test suites in the code base; run 10M test suites per day; > 60M individual test cases / day and growing; > 4000 continuous integration builds.
Practical Issues: Load Growth and Spikes. Bigger and bigger tests: long running tests with lots of dependencies trigger often. Load spikes: most submits are during west-coast US working hours; low-level changes can trigger many thousands of tests; everyone submits before going to lunch.