Development at the Speed and Scale of Google. Ashish Kumar Engineering Tools



Similar documents
Continuous Integration Using Cruise Control

Software Development In the Cloud Cloud management and ALM

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Machine Data Analytics with Sumo Logic

Understanding Infrastructure as Code. By Michael Wittig and Andreas Wittig

Distributed Development With Perforce Software. Tony Vinayak Perforce Software

Visualizing an OrientDB Graph Database with KeyLines

DJANGOCODERS.COM THE PROCESS. Core strength built on healthy process

SavvyDox Publishing Augmenting SharePoint and Office 365 Document Content Management Systems

Antelope Enterprise. Electronic Documents Management System and Workflow Engine

Successfully managing geographically distributed development

Oracle Big Data SQL Technical Update

Optimizing Application Performance with CUDA Profiling Tools

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

Eucalyptus-Based. GSAW 2010 Working Group Session 11D. Nehal Desai

Automated Performance Testing of Desktop Applications

Surround SCM Best Practices

Distributed File Systems Part I. Issues in Centralized File Systems

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

A Case for Static Analyzers in the Cloud (Position paper)

Hierarchical Bloom Filters: Accelerating Flow Queries and Analysis

Assignment # 1 (Cloud Computing Security)

JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights Copyright Metric insights, Inc.

PORTAL ADMINISTRATION

Cloud Computing Backgrounder

Cloudera Manager Training: Hands-On Exercises

Web project proposal. European e-skills Association

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Ansible in Depth WHITEPAPER. ansible.com

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

MOBILE METRICS REPORT

Revision control systems (RCS) and

A new version of Firefox is available

XpoLog Center Suite Data Sheet

Gladinet Cloud Backup V3.0 User Guide

Massive Data Storage

Testing & Assuring Mobile End User Experience Before Production. Neotys

Big Fast Data Hadoop acceleration with Flash. June 2013

Distributed Systems. Lec 2: Example use cases: Cloud computing, Big data, Web services

Load Testing at Yandex. Alexey Lavrenuke

System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks

RUN THE RIGHT RACE. Keep pace with quickening release cycles. Discover automation with the human touch. CHOOSE A TEST TO RUN BELOW

Cognos Performance Troubleshooting

Data processing goes big

Cloud Computing. Up until now

Bringing Big Data Modelling into the Hands of Domain Experts

Welcome to the Force.com Developer Day

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Mission-Critical Database with Real-Time Search for Big Data

Parallels Cloud Server 6.0

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

EMC NETWORKER AND DATADOMAIN

Enabling Continuous Delivery by Leveraging the Deployment Pipeline

Cloud and Big Data Summer School, Stockholm, Aug Jeffrey D. Ullman

TotalFlow Prep, Print Manager and Production Manager. Improve efficiencies, reduce costs and simplify workflows

Instructional Design Framework CSE: Unit 1 Lesson 1

Traditional v/s CONVRGD

PHP on IBM i: What s New with Zend Server 5 for IBM i

Practical Online Filesystem Checking and Repair

EMC BACKUP MEETS BIG DATA

For more about patterns & practices: My blog:

Hadoop & SAS Data Loader for Hadoop

ANSYS EKM Overview. What is EKM?

Outline. What is Big data and where they come from? How we deal with Big data?

Top Five Ways Any Business Can Benefit from Box

Dashboard Reporting Business Intelligence

SOFTWARE TESTING TRAINING COURSES CONTENTS

Hadoop Ecosystem B Y R A H I M A.

NEXT GENERATION ARCHIVE MIGRATION TOOLS

Key Benefits of Microsoft Visual Studio 2008

PIVOTAL CRM ARCHITECTURE

Pcounter. counting on cost efficiency. Pcounter Applications

Meister Going Beyond Maven

In-Database Analytics

The Future of Recruiting: The Ultimate Guide to Sales and Recruiting Software

Oracle Database 12c Plug In. Switch On. Get SMART.

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

What s New in MATLAB and Simulink

Database Fundamentals

white paper Modernizing the User Interface: a Smarter View with Rumba+

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Simple Tips to Improve Drupal Performance: No Coding Required. By Erik Webb, Senior Technical Consultant, Acquia

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

2012 AKAMAI FASTER FORWARD TM

owncloud Architecture Overview

A programming model in Cloud: MapReduce

WHAT S NEW IN SAS 9.4

DATABASE VIRTUALIZATION AND INSTANT CLONING WHITE PAPER

What's New in SAS Data Management

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

The Benefits of Utilizing a Repository Manager

Delivery. Continuous. Jez Humble and David Farley. AAddison-Wesley. Upper Saddle River, NJ Boston Indianapolis San Francisco

Continuous Delivery Workshop

Yocto Project Eclipse plug-in and Developer Tools Hands-on Lab

Mobile Testing in a Fast Paced World

Datacenter Operating Systems

Transcription:

Development at the Speed and Scale of Google Ashish Kumar Engineering Tools

The Challenge

Speed and Scale of Google More than 5000 developers in more than 40 offices More than 2000 projects under active development More than 50000 builds per day on average More than 100 million test cases run per day 20+ code changes per minute; 50% of the code changes every month Single monolithic code tree with mixed language code Development on head; all releases from source

Single monolithic code tree... Develop at head Build everything from source Extensive automated tests running at each changelist Need strong enforcement of coding style and guidelines Can make changes to kernel, gmail and buzz in the same changelist Complex dependency graph across products and libraries

Why do we care?

Rough developer workflow

Estimating build tools savings 2008 to 2009 Rough use case estimates Estimated Time waiting on build tools Estimated Savings: ~600 person years

Who we are

Engineering Tools and Engineering Productivity Google Focus Area: Engineering Productivity o Focus on Accelerating Google o Includes Test Engineering, Release Engineering, Engineering Docs and Education,..., and Engineering Tools Engineering Tools o Focused on providing tools that accelerate Google engineers from idea to production o 100+ team of engineers spread across 4 major sites o Builds and manages tools related to Source Control, Developer Tools and IDEs, Test Infrastructure, Build Tools and Infrastructure, Project Management Tools, and others

What's Unique? Significant investment in infrastructure for developers o Core infrastructure technologies like GFS, BigTable etc. that developer can quickly build systems on o Core tools that developers can quickly build, test and release their products / projects with o Tools leverage the same production infrastructure that our products do Continuous Improvement with Tools o "We can't improve what we can't measure" o Data-driven culture: strong focus on metrics for improvement o Our goal: make the tools disappear from the workflow

How we do it

Building for scale

Our version "Free" infrastructure for all teams o Transparency of code changes through centralized code review service o Developers can run affected tests before submitting code o Run every affected test at every code change o Run tests on all major OS / browser combinations o Transparently store all build and test results (including build, code analysis, and linter warnings) o Provide comprehensive UI, API and notification o Move all "compute-intensive work" to the cloud

Key Goals and Principles Speed: Developers spend lesser and lesser time waiting on tools e.g. builds, test systems, code analysis,... High Quality Feedback: Deliver high quality feedback; more signal, less noise. Simplicity: Developers will ideally not need to know or understand how the underlying tools and systems work. Measure everything

Source code at scale... How to allow 1000s of engineers to sync source code on a single tree with massive dependencies? A full checkout would take tens of minutes o Would easily choke any corporate network o Other companies create developer branches per feature Developers change < 10% of code they actually check out o Builds and tests often need the rest of the code to run o Deliver the rest of the code as a read-only copy, on demand o Implemented as a FUSE-based file system, tracks changes to main source depot and caches aggressively

Keeping the code tree consistent Mandatory code reviews with central tool o Need code readability for languages (enforces style guide) o Need owners for code sub-tree that maintain consistency and correctness o Higher code transparency and code contributions across teams Reduce code review costs, provide lots of signals to reviewers o Lint errors o Code Analysis and Build warnings / errors o Code coverage data o Test results o Easy, web-based access - full graphical diffs available, easy to add comments o Future: integrate with IDEs

Keep code reviews efficient Code review breakdown for one package

Code Review turnaround by size

Measure the tool itself Box-plots for the Code Review tool latencies

The Build System is important Builds are glamour-less at most companies Problems with builds can result in huge productivity losses o Debugging build problems o Waiting for builds to finish o Feedback best attached to build systems; e.g. run tests, code analysis as part of builds Build metadata is equally important as source code o Needs to analyze and enforce dependencies, validate inputs o Needs to be correct and fast o Builds need to be hermetic to be distributed o Full knowledge of inputs, dependencies and outputs can allow massive parallelization of actions

Build Systems require strong CS skills Deal with massive scale o 20 Million+ builds per year Massive distributed execution o More than 10000 cores using > 50TB of memory o ~1 PB 7-day cached object output

Durable metrics Remember this? Mostly flat between 2009 and 2010 o Files for each (measured) target grew by 54% to 191% o Doing significant more work in the same time Needed durable metrics across time; bucket builds by: o Count of discrete actions and inputs o Office o Incrementality o...

Builds by incrementality Many builds are clean, but most are in the 90-100% incrementality range!

Builds by action size Most builds are small, but long tail (mostly by our own automated systems)

Clean Build times

Build times by office

Action Cache

How much did we save?

Object caching wins Statistics from a single day ~ 500M build actions 94% action cache hit rate 30M cache misses 800 CPU days (just build and test) 66% of actions from automated builds

Building in the cloud has costs... Large builds have large outputs Corp-Cloud network is not as efficient as Cloud-Cloud network, transferring bits can be a significant time sink and network hog Solution: don't send the build outputs to the workstation till they are actually needed or read. o Implemented as a Fuse-based file system that allows directory operations on the output. o Aggressive caching for build outputs by office and workstation

Distributed builds have costs... Link actions require all the input object files o Requires moving all object files that are built on different distributed nodes to the one node where the link action occurs o Can be expensive and on the critical path Solution: Incremental link o Store additional information in a binary o Use old binary + modified object files to build new binary o Only process modified object files symbol tables and relocations o expected 10x improvement in link speed

Continuous Integration at Scale Fail fast, report clearly, root cause Test early at every stage Reduce defect identification to fix time Use feedback and data to stay healthy Reduce complexity "... the key is to practice continual improvement and think of (it) as a system, not as bits and pieces." Dr. W. Edwards Deming

Continuous Integration at Scale 120K test suites in the code base Run 7.5M test suites per day 120M individual test cases / day and growing 1800+ continuous integration builds Mountains of data == Opportunity for data mining and research

Scale requires Search Also provides a SQL interface to query build and test results for further analysis

Test results repository

Integrated coverage view

Faster time to fix

Faster time to fix

And of course, we need more... IDEs that can work at scale Code visualization and search Code Analysis and Documentation... many more

Summary

What we do different Invest in our developer infrastructure o Developers can build upon common technologies o Significant investment in central tools team results in a measurable boost in engineer productivity Parallelize and Distribute where possible o Compute intensive operations leverage the cloud, while UIsensitive work stays closer to the developer Hire the best / Design for scale o Developer Tools and Build Systems are tough computer science and systems problems; they need the best developers Measure Everything o Cannot improve what we don't measure